Slow groupCounts #12

Closed
bwalsh opened this issue Jun 5, 2017 · 3 comments

@bwalsh
Collaborator

bwalsh commented Jun 5, 2017

{"query":[{"has":{"key":"gid","value":{"s":"type:Individual"}}},{"out":{"labels":["hasInstance"]}},{"groupCount":{"key":"info:gender"}}]} took 7964 ms

{"query":[{"has":{"key":"gid","value":{"s":"type:Individual"}}},{"out":{"labels":["hasInstance"]}},{"groupCount":{"key":"info:tumor_status"}}]} took 15424 ms
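Both queries above appear to read as a three-step traversal: select the vertex whose gid is `type:Individual`, follow its outgoing `hasInstance` edges, then count the reached vertices by the value of one property. The sketch below mimics those semantics on a toy in-memory graph, assuming that reading is right. The `group_count` function, the dict/tuple graph layout, and the sample data are all hypothetical; this is not how Janus executes the query.

```python
from collections import Counter

# Toy graph layout (hypothetical, not BMEG's storage format):
#   vertices: gid -> dict of properties
#   edges:    list of (from_gid, label, to_gid) tuples
def group_count(vertices, edges, start_gid, edge_label, key):
    """Mimic has(gid == start_gid) -> out(edge_label) -> groupCount(key)."""
    # "has": keep only the vertex whose gid matches.
    frontier = {gid for gid in vertices if gid == start_gid}
    # "out": follow outgoing edges carrying the requested label.
    reached = [dst for src, label, dst in edges
               if src in frontier and label == edge_label]
    # "groupCount": tally the values of `key` on the reached vertices,
    # skipping vertices that lack the property.
    return Counter(vertices[v][key] for v in reached if key in vertices[v])

vertices = {
    "type:Individual": {},
    "individual:1": {"info:gender": "female"},
    "individual:2": {"info:gender": "male"},
    "individual:3": {"info:gender": "female"},
}
edges = [("type:Individual", "hasInstance", f"individual:{i}") for i in (1, 2, 3)]

counts = group_count(vertices, edges, "type:Individual", "hasInstance", "info:gender")
print(dict(counts))  # {'female': 2, 'male': 1}
```

Even this naive version has to visit every `hasInstance` edge and every reached vertex before it can return a single count, so its cost grows with the number of instances.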


prismofeverything self-assigned this Jun 9, 2017
@prismofeverything
Contributor

Yes, this is vexing. Something about groupCount in Janus is slow in general. I have spent as much time on this issue as I can for now, but I think a solution within Janus is possible. We can use Elasticsearch for this in the short term, but I don't think that is a final solution.

I built a test using the Mongo aggregation pipeline to do this, and with exactly the same data, these queries (for gender and tumor_status) returned in ~200 ms. Unfortunately, the query for which samples have a given mutation took about 150000 ms (~150 s). I think this approach could work, but the information needed to filter the edges is not available until the traversal actually reaches the Gene vertex. Pulling the gene symbol back toward the sample, maybe all the way to variantInBiosample, could fix that. Does distributing the data this widely throughout the graph compromise the "purity" of the graph? Ideally we could compute group counts on properties that are several hops away, across millions of edges. In practice, we may have to compromise there.
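The aggregation-pipeline test and the denormalization idea above can be sketched as plain pipeline definitions. This is a minimal sketch under loudly stated assumptions: it assumes one MongoDB document per vertex or edge with a `label` field and properties stored as top-level fields, and it assumes the gene symbol has been copied onto each `variantInBiosample` edge document as a hypothetical `gene_symbol` field, with the sample stored in a hypothetical `to` field. The function names are illustrative, not part of BMEG or of the test described above.

```python
# Hypothetical schema (not BMEG's real layout):
#   - one document per vertex/edge, with a "label" field
#   - properties stored as top-level fields
#   - variantInBiosample edges carry a denormalized "gene_symbol"
#     field and the sample id in a "to" field

def group_count_pipeline(vertex_label, field):
    """Count vertices of one label by the value of `field` (like groupCount)."""
    return [
        {"$match": {"label": vertex_label}},
        # Group on the field's value; $sum: 1 counts documents per group.
        {"$group": {"_id": "$" + field, "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ]

def samples_with_gene_pipeline(gene_symbol):
    """List distinct samples with a variant in `gene_symbol`.

    Works only if the symbol was pulled back onto the edge documents,
    so no hop to the Gene vertex is needed to filter them.
    """
    return [
        {"$match": {"label": "variantInBiosample", "gene_symbol": gene_symbol}},
        # Grouping on the sample id alone yields one document per sample.
        {"$group": {"_id": "$to"}},
    ]

print(group_count_pipeline("Individual", "gender"))
print(samples_with_gene_pipeline("TP53"))
```

With PyMongo, either list can be passed to `Collection.aggregate`, e.g. `db.vertices.aggregate(group_count_pipeline("Individual", "gender"))`, where `db.vertices` is a hypothetical collection. The design point is that the second pipeline filters on a single indexed-friendly field instead of traversing to the Gene vertex, at the cost of duplicating data across the graph.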

The other option, besides Janus or Mongo aggregation pipelines, is to use Kyle's Arachne: https://github.com/bmeg/arachne. I tried installing it to load our data and test the same queries, but ran into this issue: bmeg/grip#2 @kellrott

Postponing until I can devote the time I really need to this one.

@bwalsh
Collaborator Author

bwalsh commented Jun 22, 2017

@prismofeverything @kellrott Hey guys. Can we take a few minutes tomorrow to review bmeg/bmeg-proxy#34

@kellrott
Member

Issue redefined in bmeg/bmeg-proxy#36
