Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic support of SERVICE clause #793

Merged
merged 5 commits into from
Feb 22, 2023
Merged

Basic support of SERVICE clause #793

merged 5 commits into from
Feb 22, 2023

Conversation

hannahbast
Copy link
Member

@hannahbast hannahbast commented Sep 18, 2022

First working implementation of SERVICE

Basic principle: Send the query inside the SERVICE clause to the remote endpoint and turn the result into a VALUES clause, which is then processed using the (in the meantime already committed) code from #820 and #822 and the (soon to be committed) code from #823.

  1. Update parsedQuery::Values to hold a table of TripleComponents
    instead of std::strings as before. Modify the SparqlQleverVisitor accordingly.

  2. Update the Values : Operation class to add OOV entries to the
    LocalVocab (any row containing an oov entry was ignored so far).
    Update the JOIN operation to propagate the localVocab if only of the
    operands has one (which is a frequent use case).

  3. Used the occasion to create a separate LocalVocab class with basic
    functionality for adding words to a local vocabulary once.

  4. Add a new class HttpClient which can open HTTP and HTTPS connections
    to a given host, and which can then send GET or POST requests, receive the result synchronously, and write it to a (potentially very large)
    string. Uses Boost.Beast, just like our HttpServer.

  5. Wrote a test for our HttpServer and HttpClient. However, the test so far only tests the HTTP case because our HttpServer just does HTTP so
    far. Also used the occasion to rename util/HttpServer to util/http
    because httpServer was simply a misnomer. This change required very many
    small changes (in includes and in the various CMakeLists.txt).

@hannahbast hannahbast force-pushed the support-service-clause branch 3 times, most recently from 690aa00 to e504cc6 Compare November 9, 2022 03:01
hannahbast pushed a commit that referenced this pull request Nov 9, 2022
The LocalVocab used to be an alias for std::vector<std::string> (defined
inside of the ResultTable class). New words were pushed to this vector
without checking whether they were already in the vocabulary (exception:
when a sequence of identical new words were encountered in a SPARQL
expression result, the words was only added once per sequence).

The basic data is still a std::vector<std::string>, but there is now a
phase, started by LocalVocab::startConstructionPhase() and ended by
LocalVocab::endConstrutionPhase(), where new words can be added and a
words-to-ID map records which words have already been added. After that
phase, the map is deleted and the local vocabulary is read-only. Check
test/LocalVocabTest for how it works in prinicple.

This is also used by the Values class in PR #793 (and I tested it
already quite extensively and it works).
hannahbast pushed a commit that referenced this pull request Nov 9, 2022
The LocalVocab used to be an alias for std::vector<std::string> (defined
inside of the ResultTable class). New words were pushed to this vector
without checking whether they were already in the vocabulary (exception:
when a sequence of identical new words were encountered in a SPARQL
expression result, the words was only added once per sequence).

The basic data is still a std::vector<std::string>, but there is now a
phase, started by LocalVocab::startConstructionPhase() and ended by
LocalVocab::endConstrutionPhase(), where new words can be added and a
words-to-ID map records which words have already been added. After that
phase, the map is deleted and the local vocabulary is read-only. Check
test/LocalVocabTest for how it works in prinicple.

This is also used by the Values class in PR #793 (and I tested it
already quite extensively and it works).
hannahbast pushed a commit that referenced this pull request Nov 9, 2022
The LocalVocab used to be an alias for std::vector<std::string> (defined
inside of the ResultTable class). New words were pushed to this vector
without checking whether they were already in the vocabulary (exception:
when a sequence of identical new words were encountered in a SPARQL
expression result, the words was only added once per sequence).

The basic data is still a std::vector<std::string>, but there is now a
phase, started by LocalVocab::startConstructionPhase() and ended by
LocalVocab::endConstrutionPhase(), where new words can be added and a
words-to-ID map records which words have already been added. After that
phase, the map is deleted and the local vocabulary is read-only. Check
test/LocalVocabTest for how it works in prinicple.

This is also used by the Values class in PR #793 (and I tested it
already quite extensively and it works).
@hannahbast hannahbast force-pushed the support-service-clause branch 3 times, most recently from 294124b to d8f9c32 Compare November 11, 2022 02:11
@hannahbast hannahbast changed the title Support service clause Support SERVICE clause Nov 11, 2022
@hannahbast hannahbast changed the title Support SERVICE clause Basic support of SERVICE clause Nov 11, 2022
@hannahbast hannahbast force-pushed the support-service-clause branch 7 times, most recently from db4774f to 0026bae Compare November 18, 2022 04:06
@hannahbast
Copy link
Member Author

hannahbast commented Nov 18, 2022

I have rebased and adapted this PR to the current master and it works (it's currently active on https://qlever.cs.uni-freiburg.de/wikidata and https://qlever.cs.uni-freiburg.de/osm-germany). But the code still needs quite a bit of work. Here is a list of TODOs:

  1. The following query gives a parse error, probably because it was not fully received: https://qlever.cs.uni-freiburg.de/osm-germany/1K6qfC

  2. Specifying a LIMIT in the SERVICE query has no effect because the LIMIT is ignored in subqueries (fix: check if the SERVICE query has a LIMIT and if yes, remove it and append it to the query sent to the endpoint): https://qlever.cs.uni-freiburg.de/osm-germany/EkOHgA

  3. There is no proper error message when the connection to the specified endpoint failed (for example because of a malformed IRI): https://qlever.cs.uni-freiburg.de/osm-germany/pK0eDB

  4. Create result table directly and not via VALUES clause

  5. Fix the time measurement for the SERVICE operation, it currently only measures the time for the part in computeResult (I think)

  6. Refactor the HttpClient, so that we can just pass the Url to it and don't need to differentiate between HTTP and HTTPS when using the class (the class can hide that from us because we don't really care).

@hannahbast hannahbast marked this pull request as ready for review November 18, 2022 05:03
@hannahbast hannahbast marked this pull request as draft November 18, 2022 05:04
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a first detailed look on this code and noticed,

  1. Some smaller suggestions
  2. A general conceptual question (how much of the remaining work will you do for this PR and how much should we leave to someone else)
  3. The lack of unit tests (start with the parsing, there it should be possible, and think about how we can test/ mock the HTTP stuff in the Service clause (but then again, we now have gained experience with writing unit tests for Http Client/Server pairs, so maybe here we can do something similar.

CMakeLists.txt Outdated
######################################
# SSL
######################################
find_package(OpenSSL REQUIRED)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that the one with the heartbleed?

src/engine/ResultTable.cpp Outdated Show resolved Hide resolved
// The prologue (prefix definitions).
std::string servicePrologue_;
// The body of the SPARQL query for the remote endpoint.
std::string serviceQueryBody_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, the graphPattern_ and the serviceQueryBody contain the same content, but both are needed (one parsed, one as a string). This should be commented.
Also the names are confusing, maybe graphPattern_ and graphPatternAsSparql_.

Parser::GraphGraphPatternContext* ctx) {
reportNotSupported(ctx, "Named Graphs (FROM, GRAPH) are");
parsedQuery::Service Visitor::visit(Parser::ServiceGraphPatternContext* ctx) {
std::string serviceVarOrIri = ctx->varOrIri()->getText();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this (in the grammar) can really be a var or an Iri, then we should catch the var case here (I don't think you support variable endpoints at this time, is that correct?
In this case we should check (maybe explicitly visit the var or Iri, and then check for a variable),
and report a useful error here.

Also then maybe the member in the Service clause should be called serviceIri if there is no implementation of a var.
(Or maybe, look up in the standard, maybe it is easy to implement, what are the semantics if that variable is bound to zero or more than one value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now consider this in SparqlQLeverVisitor.cpp, but stumbled over the peculiarity that visit(VarOrIriContext*) returns VarOrTerm (which is a std::variant<Variable, GraphTerm>, where GraphTerm is a std::variant<Literal, BlankNode, Iri>) and not the expected std::variant<Variable, Iri>.

src/parser/sparqlParser/SparqlQleverVisitor.cpp Outdated Show resolved Hide resolved
src/engine/Service.cpp Outdated Show resolved Hide resolved
src/engine/Service.cpp Outdated Show resolved Hide resolved
src/engine/Service.cpp Show resolved Hide resolved
src/engine/Service.cpp Outdated Show resolved Hide resolved
src/engine/Service.cpp Outdated Show resolved Hide resolved
@hannahbast hannahbast force-pushed the support-service-clause branch 3 times, most recently from 1463e12 to 4a348d4 Compare November 28, 2022 12:58
@hannahbast hannahbast marked this pull request as ready for review November 28, 2022 13:02
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now already very clean and understandable.
Most of my suggestions are small structural improvements,
there is one discussion to be held,
and the unit tests of the actual SERVICE functionality are yet missing.

src/engine/Service.cpp Show resolved Hide resolved
src/engine/Service.cpp Outdated Show resolved Hide resolved
src/engine/Service.cpp Outdated Show resolved Hide resolved
src/engine/Service.cpp Outdated Show resolved Hide resolved
src/engine/Service.cpp Outdated Show resolved Hide resolved
src/util/Algorithm.h Outdated Show resolved Hide resolved
src/util/http/HttpClient.cpp Outdated Show resolved Hide resolved
src/util/http/HttpUtils.h Outdated Show resolved Hide resolved
test/AlgorithmTest.cpp Outdated Show resolved Hide resolved
src/parser/sparqlParser/SparqlQleverVisitor.cpp Outdated Show resolved Hide resolved
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now very close to merging, I just left some comments on small details.
Let me know, for which aspects you need helpt.

src/engine/Service.cpp Outdated Show resolved Hide resolved
src/engine/Service.cpp Outdated Show resolved Hide resolved
src/engine/Values.cpp Outdated Show resolved Hide resolved
src/engine/Values.h Outdated Show resolved Hide resolved
src/parser/TripleComponent.h Outdated Show resolved Hide resolved
test/HttpTestHelpers.h Outdated Show resolved Hide resolved
test/HttpTestHelpers.h Outdated Show resolved Hide resolved
test/ServiceTest.cpp Outdated Show resolved Hide resolved
Comment on lines 60 to 68
std::string_view body = match.template get<2>();
std::string response = absl::StrCat(
absl::StrJoin(absl::StrSplit(variables, " "), "\t"), "\n", body);
// std::cout << "Reponse (TSV): " << std::endl << response;
co_return co_await send(ad_utility::httpUtils::createOkResponse(
response, request, ad_utility::MediaType::tsv));
});
httpServer.runInOwnThread();
Iri serviceIri{absl::StrCat("<http://localhost:", httpServer.getPort(), ">")};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should definitely write this test without using the Server.
Given your implementation of Service (it takes a plain function pointer and is not templated) you would have to
write a free function here with the correct interface, and to specify the test query for multiple cases you would have to make it access a (file-scoped) local variable.

The alternative is to store not a plain function pointer in the Service class, but a std::function<istringstream(whatEverTheArguments)>. This can hold a plain function pointer as well as a capturing lambda, then you can also work with a lambda that takes the query string and returns a lambda here.

(The overhead of std::function is ok for the Service class, since we have to send HttpRequests afterwards.

test/ServiceTest.cpp Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Jan 21, 2023

Codecov Report

Base: 69.28% // Head: 69.44% // Increases project coverage by +0.16% 🎉

Coverage data is based on head (62832f9) compared to base (0e4a076).
Patch coverage: 92.77% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #793      +/-   ##
==========================================
+ Coverage   69.28%   69.44%   +0.16%     
==========================================
  Files         238      240       +2     
  Lines       23481    23651     +170     
  Branches     3079     3093      +14     
==========================================
+ Hits        16269    16425     +156     
- Misses       5914     5919       +5     
- Partials     1298     1307       +9     
Impacted Files Coverage Δ
src/engine/LocalVocab.h 100.00% <ø> (ø)
src/engine/QueryExecutionTree.h 57.69% <ø> (ø)
src/parser/GraphPatternOperation.cpp 37.64% <0.00%> (-1.38%) ⬇️
src/parser/GraphPatternOperation.h 63.63% <ø> (ø)
src/parser/ParsedQuery.cpp 57.45% <0.00%> (-0.16%) ⬇️
src/parser/sparqlParser/SparqlQleverVisitor.h 95.00% <ø> (ø)
src/engine/CheckUsePatternTrick.cpp 93.75% <50.00%> (+0.08%) ⬆️
src/engine/Service.h 75.00% <75.00%> (ø)
src/engine/Service.cpp 94.87% <94.87%> (ø)
src/parser/sparqlParser/SparqlQleverVisitor.cpp 85.97% <97.50%> (+0.24%) ⬆️
... and 6 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

hannahbast pushed a commit that referenced this pull request Feb 20, 2023
Factored out from PR #793 (Basic support of SERVICE clause)
hannahbast pushed a commit that referenced this pull request Feb 20, 2023
Factored out from PR #793 (Basic support of SERVICE clause)
New version rebased to the current master and with additional tests.

The improvements of the handling of `Values` and `LocalVocab` (PR #880)
and the improvements in `http/util` (PR #900) have already been factored
out. So what is remaining here directly pertains to the implementation
of SERVICE.

TODO: Tests in `test/SparqlAntlrParserTest.cpp` are still missing, maybe
Johannes can help me with that.
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is now very clean and also well-tested.
I have sent you a message about your questions on the missing coverage,
feel free to integrate them, otherwise this can be merged.

We can't, with a reasonable effort, help the other four places.
@hannahbast hannahbast merged commit 857099c into master Feb 22, 2023
@hannahbast
Copy link
Member Author

@joka921 The end of a mini journey, thank you for your valuable help, Johannes!

@hannahbast hannahbast deleted the support-service-clause branch February 8, 2024 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants