-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Graph#query is faster than using Queryable#query #169
Conversation
The shared specs are failing because they expect inheritance instead of aggregation. @no-reply What should we do? |
Just a note - this is significantly faster with even just a few hundred triples. |
Should this be fixed upstream? I can't look at it closely just yet (travel), but it seems like it might be a problem further up? Either way, I'll take a look in the next 24 hours and either resolve the spec or figure out what needs to happen in Queryable/Graph. |
What's odd is that |
It was calling query here: https://github.com/ruby-rdf/rdf/blob/develop/lib/rdf/mixin/queryable.rb#L44 |
That's true only if you're using an in-memory graph. Also grep is simply the fallback implementation, which might happen if you do If you're using an RDF::Graph, you should see that |
@gkellogg The fact of the matter is, our performance was awful before this change ( O(n^2) ) because it's iterating over every statement in the graph for each resource and not using RDF::Repository#query_pattern. All the tests are passing except the shared examples that are expecting a particular method to get called. Can you provide some other test that shows this is not a good solution? |
Here's the old backtrace:
|
Here's the new:
|
So we're cutting out:
and replacing it with:
|
This must depend on some other factor. In the first case, something is causing the statements to be turned into an array, which invokes the grep version of query_pattern. In the second case, you're avoiding an array representation, so you can stick with the Graph version of query_pattern. I can't tell specifically from the call graph why this would be. It would be useful to see some stand-alone code which reproduced each case. It would also be useful to see if query_pattern can be eliminated entirely; I'm not clear on why adding all statements involves a query. |
@gkellogg Is this because AT::RDFSource includes Queryable and has a graph (rather than is a graph)? Since it's already delegating |
Sure, that's why. If you implemented your own query_pattern that delegates to the graph, that would probably do the trick too. You'd also need to consider query_execute. Just delegating directly to the graph seems reasonable. The point is, if you are extending Queryable, you need to consider the methods it should implement. |
Following this discussion, it seems like the failing tests are indeed testing an implementation detail, rather than the expected behavior. If they can be fixed quickly, I think that's best. Otherwise, I'm.okay with opening a ticket, suppressing the failure, and merging. |
@gkellogg Graph#query_pattern is private, so I can't delegate to it: https://github.com/ruby-rdf/rdf/blob/develop/lib/rdf/model/graph.rb#L285 |
@gkellogg So are you 👍 to delegating to the inner graph? The problem is effectively that AT::Resources include RDF::Enumerable, which greps over #each_statement, which for AT::Resources is obtained from the graph powering the resource. |
In this case, it seems like the right thing to do. I was mostly concerned if the speed issues were central to RDF.rb, which it seems they are not. So +1 to your delegation mechanism. |
There are still speed issues, but I'm not sure if I can fix them - RDF::Repository's #query_pattern implementation seems to be close to as fast as it can be. |
Specs for `Queryable#query` tested internal implementation rather than behavior, this caused issues for classes mixing in `Queryable`, but delegating `#query` to some collaborator. See, e.g. ActiveTriples/ActiveTriples#169. This tests the contract directly, instead.
There's still another fix to go in related to ruby-rdf/rdf-spec#33. See the comments there. A PR is welcome. Otherwise, I'm probably getting to it this evening. |
See especially: ruby-rdf/rdf-spec#33 (comment) |
Queryable#query filters the result of #each which is calling Graph#query anyway.
RDF.rb is now more aggressive about rejecting statements with empty subject, predicate, or object on attempted insertion to a Graph. This acknowledges that change and adjusts its specs to use a different statement.
Graph#query is faster than using Queryable#query
Queryable#query filters the result of #each which is calling Graph#query
anyway.