
Graph#query is faster than using Queryable#query #169

Merged: 3 commits, merged Nov 4, 2015


@jcoyne (Contributor) commented Oct 29, 2015

Queryable#query filters the result of #each, which calls Graph#query anyway.

@jcoyne
Contributor Author

jcoyne commented Oct 29, 2015

The shared specs are failing because they expect inheritance instead of aggregation. @no-reply What should we do?

@tpendragon
Contributor

Just a note - this is significantly faster with even just a few hundred triples.

@no-reply
Member

Should this be fixed upstream? I can't look at it closely just yet (I'm traveling), but it seems like the problem might be further up the stack.

Either way, I'll take a look in the next 24 hours and either fix the spec or figure out what needs to happen in Queryable/Graph.

@gkellogg

What's odd is that RDF::Graph#query is implemented using RDF::Queryable#query, unless you have your own Graph implementation. I also don't see where Queryable#query always filters solutions. I agree with @no-reply that if a speedup is needed, it should probably go into RDF.rb.

@jcoyne
Contributor Author

jcoyne commented Oct 30, 2015

It was calling query here: https://github.com/ruby-rdf/rdf/blob/develop/lib/rdf/mixin/queryable.rb#L44
which calls query_pattern here: https://github.com/ruby-rdf/rdf/blob/develop/lib/rdf/mixin/queryable.rb#L145 (read the documentation for this method),
which calls grep (which does the linear scan), which calls each, which is finally delegated to Graph#query_pattern. I was able to remove all of that and go directly to Graph#query_pattern, which uses an index instead of checking whether each statement matches (as the grep does).
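For illustration, a minimal Ruby sketch of the two lookup strategies described above. `Statement`, `ToyGraph`, `scan_subject`, and `index_subject` are hypothetical names, not RDF.rb's API: the scan tests every statement the way a grep over #each does, while the index only reads the statements already filed under the requested subject.

```ruby
# Hypothetical toy model, not RDF.rb's API: statements are plain Structs,
# and the graph keeps a subject index alongside the flat statement list.
Statement = Struct.new(:subject, :predicate, :object)

class ToyGraph
  include Enumerable

  def initialize
    @statements = []
    @by_subject = Hash.new { |hash, key| hash[key] = [] }
  end

  # Store the statement and file it under its subject in the index.
  def <<(statement)
    @statements << statement
    @by_subject[statement.subject] << statement
    self
  end

  def each(&block)
    @statements.each(&block)
  end

  # Linear scan: every statement is tested, like a grep over #each.
  def scan_subject(subject)
    select { |st| st.subject == subject }
  end

  # Indexed lookup: only the statements filed under this subject are read.
  def index_subject(subject)
    @by_subject.fetch(subject, []).dup
  end
end

graph = ToyGraph.new
graph << Statement.new("ex:a", "ex:title", "A") << Statement.new("ex:b", "ex:title", "B")
graph.scan_subject("ex:a") == graph.index_subject("ex:a") # => true
```

Both methods return the same statements; the difference is that the indexed version's cost depends on how many statements share the subject rather than on the size of the whole graph.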

@gkellogg

That's true only if you're using an in-memory graph. Also, grep is simply the fallback implementation, which is what you'd get if you did [].extend(RDF::Queryable); it's implemented in Queryable in case an actual class doesn't implement it. Otherwise, it ends up using Repository's #query_pattern, which does the indexing you updated.

If you're using an RDF::Graph, you should see that RDF::Graph#query calls RDF::Graph#query_pattern, which calls RDF::Repository#query_pattern. If you're seeing some other behavior, that would be worth investigating.
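A hypothetical sketch of the fallback pattern being described: a mixin (`ToyQueryable`, not RDF::Queryable itself) whose default query_pattern is a grep over #each, so it works on anything enumerable, including a plain Array extended at runtime. `ToyPattern` is likewise made up; it defines `===` because Enumerable#grep matches elements with `pattern === element`.

```ruby
# Hypothetical stand-in for a Queryable-style mixin, not RDF.rb's code.
module ToyQueryable
  def query(pattern)
    query_pattern(pattern).to_a
  end

  # Fallback: Enumerable#grep tests `pattern === element` for every element
  # yielded by #each, so this is always a linear scan.
  def query_pattern(pattern)
    grep(pattern)
  end
end

# Hypothetical pattern: a nil position acts as a wildcard.
ToyPattern = Struct.new(:subject, :predicate, :object) do
  def ===(statement)
    to_a.zip(statement).all? { |want, got| want.nil? || want == got }
  end
end

# Any Enumerable can pick up the fallback, even a plain Array.
statements = [["ex:a", "ex:title", "A"], ["ex:b", "ex:title", "B"]].extend(ToyQueryable)
statements.query(ToyPattern.new("ex:a", nil, nil)) # => [["ex:a", "ex:title", "A"]]
```

A class that defines its own query_pattern shadows this fallback, which is how an indexed implementation avoids the scan.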

@jcoyne
Contributor Author

jcoyne commented Oct 30, 2015

@gkellogg The fact of the matter is, our performance was awful before this change (O(n^2)) because it was iterating over every statement in the graph for each resource instead of using RDF::Repository#query_pattern. All the tests pass except the shared examples that expect a particular method to be called. Can you provide another test that shows this is not a good solution?
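A back-of-the-envelope cost model for that claim, assuming a graph of $N$ statements and a reload that looks up statements for each of $R$ resources (the symbols are introduced here for illustration, and $k_r$ is the number of statements whose subject is resource $r$, so $\sum_r k_r \le N$):

$$
C_{\text{scan}} = \sum_{r=1}^{R} N = R\,N,
\qquad
C_{\text{index}} = \sum_{r=1}^{R} \bigl(1 + k_r\bigr) \le R + N
$$

If each resource carries roughly $c$ statements, then $N \approx cR$, giving $C_{\text{scan}} \approx cR^2 = O(R^2)$ for the grep path versus $C_{\text{index}} = O(R)$ for the indexed path.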

@jcoyne
Contributor Author

jcoyne commented Oct 30, 2015

Here's the old backtrace:

--> #0  RDF::Repository::Implementation.query_pattern(pattern#RDF::Query::Pattern, &block#Proc) at /Users/justin/workspace/rdf/lib/rdf/repository.rb:363
    #1  RDF::Queryable.query(pattern#RDF::Query::Pattern, options#Hash, &block#Proc) at /Users/justin/workspace/rdf/lib/rdf/mixin/queryable.rb:86
    #2  RDF::Graph.each(&block#Proc) at /Users/justin/workspace/rdf/lib/rdf/model/graph.rb:262
    #3  ActiveTriples::Resource.each(*args#Array, &block#Proc) at /Users/justin/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/active-triples-0.7.2/lib/active_triples/rdf_source.rb:76
    ͱ-- #4  Enumerable.grep() at /Users/justin/workspace/rdf/lib/rdf/mixin/queryable.rb:151
    #5  RDF::Queryable.query_pattern(pattern#RDF::Query::Pattern, options#Hash, &block#Proc) at /Users/justin/workspace/rdf/lib/rdf/mixin/queryable.rb:151
    #6  block in RDF::Queryable.block in enum_for(method#Symbol, *args#Array) at /Users/justin/workspace/rdf/lib/rdf/mixin/queryable.rb:303
    ͱ-- #7  Enumerator::Generator.each(*args) at /Users/justin/workspace/rdf/lib/rdf/mixin/enumerable.rb:159
    ͱ-- #8  Enumerator.each(*args) at /Users/justin/workspace/rdf/lib/rdf/mixin/enumerable.rb:159
    #9  RDF::Enumerable.each_statement(&block#Proc) at /Users/justin/workspace/rdf/lib/rdf/mixin/enumerable.rb:159
    #10 RDF::Writable.insert_statements(statements#RDF::Queryable::Enumerator) at /Users/justin/workspace/rdf/lib/rdf/mixin/writable.rb:129
    #11 RDF::Writable.<<(data#RDF::Queryable::Enumerator) at /Users/justin/workspace/rdf/lib/rdf/mixin/writable.rb:32
    #12 RDF::Mutable.<<(data#RDF::Queryable::Enumerator) at /Users/justin/workspace/rdf/lib/rdf/mixin/mutable.rb:77
    #13 ActiveTriples::RDFSource.reload at /Users/justin/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/active-triples-0.7.2/lib/active_triples/rdf_source.rb:295

@jcoyne
Contributor Author

jcoyne commented Oct 30, 2015

Here's the new:

--> #0  RDF::Repository::Implementation.query_pattern(pattern#RDF::Query::Pattern, &block#Proc) at /Users/justin/workspace/rdf/lib/rdf/repository.rb:363
    #1  RDF::Queryable.query(pattern#RDF::Query::Pattern, options#Hash, &block#Proc) at /Users/justin/workspace/rdf/lib/rdf/mixin/queryable.rb:86
    #2  RDF::Graph.query_pattern(pattern#RDF::Query::Pattern, options#Hash, &block#Proc) at /Users/justin/workspace/rdf/lib/rdf/model/graph.rb:290
    #3  block in RDF::Queryable.block in enum_for(method#Symbol, *args#Array) at /Users/justin/workspace/rdf/lib/rdf/mixin/queryable.rb:303
    ͱ-- #4  Enumerator::Generator.each(*args) at /Users/justin/workspace/rdf/lib/rdf/mixin/enumerable.rb:159
    ͱ-- #5  Enumerator.each(*args) at /Users/justin/workspace/rdf/lib/rdf/mixin/enumerable.rb:159
    #6  RDF::Enumerable.each_statement(&block#Proc) at /Users/justin/workspace/rdf/lib/rdf/mixin/enumerable.rb:159
    #7  RDF::Writable.insert_statements(statements#RDF::Queryable::Enumerator) at /Users/justin/workspace/rdf/lib/rdf/mixin/writable.rb:129
    #8  RDF::Writable.<<(data#RDF::Queryable::Enumerator) at /Users/justin/workspace/rdf/lib/rdf/mixin/writable.rb:32
    #9  RDF::Mutable.<<(data#RDF::Queryable::Enumerator) at /Users/justin/workspace/rdf/lib/rdf/mixin/mutable.rb:77
    #10 ActiveTriples::RDFSource.reload at /Users/justin/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/active-triples-0.7.2/lib/active_triples/rdf_source.rb:297

@jcoyne
Contributor Author

jcoyne commented Oct 30, 2015

So we're cutting out:

    #2  RDF::Graph.each(&block#Proc) at /Users/justin/workspace/rdf/lib/rdf/model/graph.rb:262
    #3  ActiveTriples::Resource.each(*args#Array, &block#Proc) at /Users/justin/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/active-triples-0.7.2/lib/active_triples/rdf_source.rb:76
    ͱ-- #4  Enumerable.grep() at /Users/justin/workspace/rdf/lib/rdf/mixin/queryable.rb:151
    #5  RDF::Queryable.query_pattern(pattern#RDF::Query::Pattern, options#Hash, &block#Proc) at /Users/justin/workspace/rdf/lib/rdf/mixin/queryable.rb:151

and replacing it with:

    #2  RDF::Graph.query_pattern(pattern#RDF::Query::Pattern, options#Hash, &block#Proc) at /Users/justin/workspace/rdf/lib/rdf/model/graph.rb:290

@gkellogg

This must depend on some other factor. In the first case, something is causing the statements to be turned into an array, which invokes the grep version of query_pattern. In the second case, you're avoiding an array representation, so you can stick with the Graph version of query_pattern. I can't tell specifically from the call graph why this would be.

It would be useful to see some stand-alone code which reproduced each case. It would also be useful to see if query_pattern can be eliminated entirely; I'm not clear on why adding all statements involves a query.

@jcoyne
Contributor Author

jcoyne commented Oct 30, 2015

@gkellogg Is this because AT::RDFSource includes Queryable and has a graph (rather than is a graph)? Since it's already delegating each to the graph, why not query too?
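A minimal sketch of that has-a arrangement in plain Ruby, using the standard library's Forwardable. `ToyGraph` and `ToyResource` are hypothetical stand-ins for RDF::Graph and an ActiveTriples resource: the resource forwards both #each and #query to the graph it wraps.

```ruby
require "forwardable"

# Hypothetical stand-in for a graph that can answer queries itself.
class ToyGraph
  include Enumerable

  def initialize(statements)
    @statements = statements
  end

  def each(&block)
    @statements.each(&block)
  end

  # Filters by subject; a real graph could consult an index here instead.
  def query(subject)
    select { |st| st.first == subject }
  end
end

# A resource that has a graph (aggregation) rather than is a graph
# (inheritance). A query on the resource is answered by the graph rather
# than by a generic scan of the resource's own #each.
class ToyResource
  extend Forwardable
  include Enumerable

  def_delegators :@graph, :each, :query

  def initialize(graph)
    @graph = graph
  end
end

graph = ToyGraph.new([["ex:a", "ex:p", "1"], ["ex:b", "ex:p", "2"]])
resource = ToyResource.new(graph)
resource.query("ex:a") # => [["ex:a", "ex:p", "1"]]
```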

@gkellogg

Sure, that's why. If you implemented your own query_pattern that delegates to the graph, that would probably do the trick too. You'd also need to consider query_execute. Just delegating directly to the graph seems reasonable.

The point is, if you are extending Queryable, you need to consider the methods it should implement.
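The alternative of overriding query_pattern in the including class works because Ruby looks up methods on the class before the modules it includes. A hypothetical sketch (`ToyQueryable`, `ToyGraph#lookup`, and the call counter are illustrative only, and query_execute is not modeled):

```ruby
# Hypothetical mixin: #query dispatches to #query_pattern, whose default
# implementation is a linear grep over #each.
module ToyQueryable
  def query(pattern)
    query_pattern(pattern).to_a
  end

  def query_pattern(pattern)
    grep(pattern)
  end
end

# Hypothetical graph exposing a public lookup method.
class ToyGraph
  def initialize(statements)
    @statements = statements
  end

  def each(&block)
    @statements.each(&block)
  end

  def lookup(pattern)
    @statements.select { |st| pattern === st }
  end
end

class ToyResource
  include Enumerable
  include ToyQueryable

  attr_reader :pattern_calls

  def initialize(graph)
    @graph = graph
    @pattern_calls = 0
  end

  def each(&block)
    @graph.each(&block)
  end

  # Defined on the class itself, so method lookup finds it before the
  # mixin's grep fallback; it hands the pattern to the graph instead of
  # grepping the resource's own #each.
  def query_pattern(pattern)
    @pattern_calls += 1
    @graph.lookup(pattern)
  end
end

resource = ToyResource.new(ToyGraph.new([["ex:a", "p", "1"], ["ex:b", "p", "2"]]))
resource.query(->(st) { st.first == "ex:a" }) # => [["ex:a", "p", "1"]]
```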

@no-reply
Member

Following this discussion, it seems like the failing tests are indeed testing an implementation detail, rather than the expected behavior.

If they can be fixed quickly, I think that's best. Otherwise, I'm okay with opening a ticket, suppressing the failure, and merging.

@jcoyne
Contributor Author

jcoyne commented Oct 30, 2015

@gkellogg Graph#query_pattern is private, so I can't delegate to it: https://github.com/ruby-rdf/rdf/blob/develop/lib/rdf/model/graph.rb#L285
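A small Ruby illustration of that constraint, with a hypothetical `ToyGraph` whose query_pattern is private: calling it with an explicit receiver raises NoMethodError, `send` would bypass the check but couples the caller to an internal method, and delegating to the public #query avoids both.

```ruby
# Hypothetical graph whose pattern lookup is private; only #query is public.
class ToyGraph
  def initialize(statements)
    @statements = statements
  end

  # Public entry point, safe for collaborators to delegate to.
  def query(subject)
    query_pattern(subject)
  end

  private

  def query_pattern(subject)
    @statements.select { |st| st.first == subject }
  end
end

graph = ToyGraph.new([["ex:a", "p", "1"], ["ex:b", "p", "2"]])

# An explicit receiver cannot call a private method.
begin
  graph.query_pattern("ex:a")
rescue NoMethodError => e
  puts "refused: #{e.message}"
end

# send bypasses visibility, but ties the caller to an internal method;
# the public #query gives the same answer through the supported interface.
graph.send(:query_pattern, "ex:a") == graph.query("ex:a") # => true
```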

@tpendragon
Contributor

@gkellogg So are you 👍 on delegating to the inner graph? The problem is effectively that AT::Resources include RDF::Enumerable, which greps over #each_statement, which for AT::Resources is obtained from the graph powering the resource.

@gkellogg

In this case, it seems like the right thing to do. I was mostly concerned if the speed issues were central to RDF.rb, which it seems they are not. So +1 to your delegation mechanism.

@tpendragon
Contributor

> I was mostly concerned if the speed issues were central to RDF.rb, which it seems they are not.

There are still speed issues, but I'm not sure I can fix them: RDF::Repository's #query_pattern implementation seems close to as fast as it can be.

no-reply pushed a commit to ruby-rdf/rdf-spec that referenced this pull request Oct 31, 2015
Specs for `Queryable#query` tested internal implementation rather than
behavior; this caused issues for classes mixing in `Queryable` but
delegating `#query` to some collaborator. See, e.g.,
ActiveTriples/ActiveTriples#169.

This tests the contract directly, instead.
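A sketch of what testing the contract (rather than the call chain) can look like, in plain Ruby without RSpec. `check_query_contract`, `Pattern`, `SelfQuerying`, and `Delegating` are hypothetical: the check only inspects what #query returns, so a class that answers queries itself and one that delegates #query to a collaborator both pass. A spec that instead expected a particular internal method to be called would reject the delegating class.

```ruby
# Hypothetical pattern: a nil subject matches everything.
Pattern = Struct.new(:subject) do
  def ===(statement)
    subject.nil? || statement.first == subject
  end
end

# Behavioral contract for #query: checks results only, never which
# internal methods were called to produce them.
def check_query_contract(queryable)
  all = queryable.query(Pattern.new(nil)).to_a
  raise "need at least one statement" if all.empty?
  raise "wildcard query must return every statement" unless all.sort == queryable.to_a.sort

  subject = all.first.first
  hits = queryable.query(Pattern.new(subject)).to_a
  raise "non-matching statement returned" unless hits.all? { |st| st.first == subject }
  raise "matching statement missing" unless hits.size == all.count { |st| st.first == subject }
  true
end

# Answers queries itself.
class SelfQuerying
  include Enumerable

  def initialize(statements)
    @statements = statements
  end

  def each(&block)
    @statements.each(&block)
  end

  def query(pattern)
    grep(pattern)
  end
end

# Delegates #each and #query to a collaborator.
class Delegating
  include Enumerable

  def initialize(inner)
    @inner = inner
  end

  def each(&block)
    @inner.each(&block)
  end

  def query(pattern)
    @inner.query(pattern)
  end
end

inner = SelfQuerying.new([["ex:a", "p", "1"], ["ex:b", "p", "2"], ["ex:a", "q", "3"]])
check_query_contract(inner)                 # => true
check_query_contract(Delegating.new(inner)) # => true
```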
@tpendragon
Contributor

@jcoyne It looks like @no-reply fixed rdf-spec. Can you try it out?

@no-reply
Member

no-reply commented Nov 2, 2015

There's still another fix to go in related to ruby-rdf/rdf-spec#33. See the comments there.

A PR is welcome. Otherwise, I'll probably get to it this evening.

@no-reply
Member

no-reply commented Nov 2, 2015

See especially: ruby-rdf/rdf-spec#33 (comment)

jcoyne and others added 2 commits November 4, 2015 11:44
RDF.rb is now more aggressive about rejecting statements with empty
subject, predicate, or object on attempted insertion to a Graph. This
acknowledges that change and adjusts its specs to use a different
statement.
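A hypothetical sketch of that kind of insertion guard. `StrictGraph` is made up, and treating a nil or empty-string term as "empty" and raising ArgumentError are assumptions; the commit message doesn't say how RDF.rb defines an empty term or how it reports the rejection.

```ruby
# Hypothetical graph that refuses statements with an empty term.
# Assumption: "empty" means nil or an empty string.
class StrictGraph
  include Enumerable

  def initialize
    @statements = []
  end

  def each(&block)
    @statements.each(&block)
  end

  def insert(subject, predicate, object)
    terms = { subject: subject, predicate: predicate, object: object }
    empty = terms.select { |_, term| term.nil? || term.to_s.empty? }.keys
    raise ArgumentError, "empty #{empty.join(', ')}" unless empty.empty?

    @statements << [subject, predicate, object]
    self
  end
end

graph = StrictGraph.new
graph.insert("ex:a", "ex:p", "1")
graph.count # => 1
```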
no-reply pushed a commit that referenced this pull request Nov 4, 2015
Graph#query is faster than using Queryable#query
@no-reply merged commit 8442154 into develop on Nov 4, 2015
4 participants