All changes needed for the Evaluation paper #355

Closed
wants to merge 202 commits into from


joka921
Member

@joka921 joka921 commented Oct 21, 2020

Everything merged together that also exists as single PRs and is needed for the autocompletion to work.

DO NOT MERGE AS IS

-- now we have completely removed the dependency from google::sparsehash and migrated to absl.
-- Note that so far, no unescaping is performed.
- Transitive Paths use HashSets which now also have undefined orderings with absl. Tests don't rely on this anymore

- The case where a prefix was used with an empty "content" (e.g. <a> wd: <b>) was broken before.
  Luckily there was a unit test, and this is now fixed.
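
Since the iteration order of the hash sets is unspecified, a test can compare contents instead of order. The following is only a minimal sketch of that idea: `sortedElements` is a hypothetical helper, and `std::unordered_set` stands in for the absl hash set so that the snippet has no external dependencies.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not part of QLever): copy the elements of a hash set
// into a sorted vector, so that a test can compare the contents without
// depending on the iteration order of the set.
// std::unordered_set is used here as a stand-in for the absl hash set.
std::vector<uint64_t> sortedElements(const std::unordered_set<uint64_t>& set) {
  std::vector<uint64_t> result(set.begin(), set.end());
  std::sort(result.begin(), result.end());
  return result;
}
```

A test would then compare `sortedElements(actual)` against a sorted vector of the expected ids.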
…TL file.

- Previously there was a "parsing of ttl has failed, but there is still content left" warning, although
  the remainder of the ttl input was only whitespace.

- This was due to a bug in the Parser's skipWhitespace() function which failed if the input consisted of ONLY whitespace. This is now fixed.
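
The whitespace-only case can be illustrated with a small sketch. This is not the actual QLever parser code; the signature, the return value and the set of whitespace characters are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for the parser's skipWhitespace(): removes leading
// whitespace from `input` and returns true if anything was removed.
bool skipWhitespace(std::string_view& input) {
  // Assumption: these four characters count as whitespace.
  constexpr std::string_view kWhitespace = " \t\r\n";
  const std::size_t firstNonWs = input.find_first_not_of(kWhitespace);
  // If no non-whitespace character exists, the input is empty or consists
  // ONLY of whitespace. In that case the whole input has to be consumed.
  const std::size_t toSkip =
      (firstNonWs == std::string_view::npos) ? input.size() : firstNonWs;
  input.remove_prefix(toSkip);
  return toSkip > 0;
}
```

With this handling, an input that is only whitespace ends up empty, so no "content left" warning would be triggered by it.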
…ashSet

# Conflicts:
#	src/parser/TurtleParser.cpp
#	src/parser/TurtleParser.h
- also apply the normalization of literals correctly during index build time
- Adapt the Index unit tests to "legal" knowledge bases
- Disable the CTRE parser for now, since it becomes awfully slow with the PnameNS and PnLocal changes for some reason.
- TODO: Maybe we want to re-enable the CTRE parser with the old "wrong"
  behavior as a very fast way to parse Wikidata
- The previous version put all triples and filters that appeared directly in a WHERE clause together.
- This is not correct, since a new scope is started whenever an OPTIONAL, UNION, SUBQUERY etc. appears.
  For example in
  SELECT ?x ?y WHERE {
    ?x <is-a> <Scientist>
    OPTIONAL { ?x <Spouse> ?y}
    FILTER (?x < <Ada_Lovelace>)
  }
  the filter should have no effect, because it is not in the same scope as the "?x is a scientist" triple.
  The probably more useful query would be
  SELECT ?x ?y WHERE {
    ?x <is-a> <Scientist> .
    FILTER (?x < <Ada_Lovelace>)
    OPTIONAL { ?x <Spouse> ?y}
  }
  where the filter is actually applied to the entities that are scientists.
  However, QLever previously handled both of these queries in the second way.

- To correct this, a BasicGraphPattern class was implemented (triples, filters and values that are "directly" in the body of another graph pattern or a WHERE clause).
  BasicGraphPatterns are also children of their parent body.

- In addition, Optionals were also treated incorrectly: the GraphPatterns must be processed in order, which makes a difference for Optionals.

- This version of the QueryPlanner does some degree of optimization. However, one could additionally optimize across multiple adjacent GraphPatterns that are not Optional, as long as their
  scopes are respected.

- TODO: There are still bugs in the handling of optionals, as the SPARQL standard knows "unbound values" which can be bound later, e.g.

  SELECT ?x ?name WHERE {
    ?x <is-a> <human>
    OPTIONAL {?x <TwitterAccount> ?name}
    OPTIONAL {?x rdfs:label ?name }
  }
  which should add the names only for humans that don't have Twitter accounts. There are also still bugs in the handling of the empty query
  SELECT ?x WHERE {} (1 result where ?x is unbound/NULL)

- TODO: Some candidate plans found by optimizing a TransitivePath currently cannot be joined with adjacent patterns.
        It seems that we can always find a suitable candidate so that this works, but we should still check whether this is the correct behavior.
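
The scoping described in this commit message can be sketched as an ordered tree of pattern operations. The type names `BasicGraphPattern` and `GraphPatternOperation` appear in these commit messages, but the members and the functions `buildExample` and `hasFilterNextToTriple` below are hypothetical and only illustrate the idea; they are not QLever's actual classes.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Triples and filters that appear directly next to each other in one scope.
// Assumption: both are kept as plain strings for the sake of the sketch.
struct BasicGraphPattern {
  std::vector<std::string> triples;
  std::vector<std::string> filters;
};

struct GraphPattern;

// An OPTIONAL starts a new scope with its own body.
struct Optional {
  std::unique_ptr<GraphPattern> body;
};

using GraphPatternOperation = std::variant<BasicGraphPattern, Optional>;

// The children are stored in the order in which they appear in the query.
struct GraphPattern {
  std::vector<GraphPatternOperation> children;
};

// Builds the WHERE clause of the two example queries from the commit message.
GraphPattern buildExample(bool filterBeforeOptional) {
  GraphPattern optionalBody;
  optionalBody.children.emplace_back(
      BasicGraphPattern{{"?x <Spouse> ?y"}, {}});
  Optional optional{std::make_unique<GraphPattern>(std::move(optionalBody))};

  GraphPattern where;
  BasicGraphPattern first{{"?x <is-a> <Scientist>"}, {}};
  if (filterBeforeOptional) {
    // Second query: triple and filter share one BasicGraphPattern.
    first.filters.push_back("?x < <Ada_Lovelace>");
    where.children.emplace_back(std::move(first));
    where.children.emplace_back(std::move(optional));
  } else {
    // First query: the OPTIONAL separates the triple from the filter.
    where.children.emplace_back(std::move(first));
    where.children.emplace_back(std::move(optional));
    where.children.emplace_back(
        BasicGraphPattern{{}, {"?x < <Ada_Lovelace>"}});
  }
  return where;
}

// True if some BasicGraphPattern directly inside `pattern` contains both a
// triple and a filter, i.e. the filter shares its scope with a triple.
bool hasFilterNextToTriple(const GraphPattern& pattern) {
  for (const auto& child : pattern.children) {
    if (const auto* basic = std::get_if<BasicGraphPattern>(&child)) {
      if (!basic->triples.empty() && !basic->filters.empty()) return true;
    }
  }
  return false;
}
```

In this sketch the first query yields three children and the second query yields two, which mirrors the distinction between the two scopes drawn in the commit message.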
…type of GraphPatternOperation.

The parser already compiles.
TODO: clean up, self-review, commit messages and squash/rewrite history.
…identical again.

Next step: make merge call join.
TODO<Hannah> check out how this performs on the Wikidata evaluation.
TODO: squash, git history and then self-review (but first let them do the ParsedQuery stuff.)
…ern trick in cases where it was actually legal.
TODO: merge with the new optimizer to properly include it,
and make this all work and test it.
# Conflicts:
#	src/parser/ParsedQuery.h
f compile. Not yet done at all.
…rVariable).

- There is also a stub for BIND(<something> + ?else as ?sum), but that is not yet fully integrated.
- No more doubled application of Filters in case of BIND() clauses
- now the parser accepts the optional '.' after a bind clause.
- It keeps track of the maximum number of allowed bytes using a synchronized, shared state and checks whether an allocation is OK.
- The actual allocation is still done by a std::allocator.
- The IdTable now also internally uses an allocator template instead of malloc and free.
- Currently only working in tests
- seem to have found a compiler bug in g++ 7.5
Required minor adaptations for a lot of Unit tests.
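
The idea of an allocator that checks a shared byte budget before delegating to `std::allocator` can be sketched as follows. The names `MemoryBudget` and `LimitedAllocator` are hypothetical, and throwing `std::bad_alloc` when the budget is exceeded is an assumption; the actual implementation in this PR may differ in all of these points.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <new>
#include <utility>

// Hypothetical shared state: the number of bytes that may still be allocated,
// protected by a mutex so that several threads can share it.
class MemoryBudget {
 public:
  explicit MemoryBudget(std::size_t bytes) : remaining_(bytes) {}

  // Assumption: exceeding the budget is reported via std::bad_alloc.
  void reserve(std::size_t bytes) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (bytes > remaining_) throw std::bad_alloc{};
    remaining_ -= bytes;
  }
  void release(std::size_t bytes) {
    std::lock_guard<std::mutex> lock(mutex_);
    remaining_ += bytes;
  }
  std::size_t remaining() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return remaining_;
  }

 private:
  mutable std::mutex mutex_;
  std::size_t remaining_;
};

// Hypothetical allocator: checks the budget first, then lets std::allocator
// perform the actual allocation.
template <typename T>
class LimitedAllocator {
 public:
  using value_type = T;

  explicit LimitedAllocator(std::shared_ptr<MemoryBudget> budget)
      : budget_(std::move(budget)) {}
  template <typename U>
  LimitedAllocator(const LimitedAllocator<U>& other) noexcept
      : budget_(other.budget()) {}

  T* allocate(std::size_t n) {
    budget_->reserve(n * sizeof(T));
    try {
      return std::allocator<T>{}.allocate(n);
    } catch (...) {
      // Give the bytes back if the underlying allocation failed.
      budget_->release(n * sizeof(T));
      throw;
    }
  }
  void deallocate(T* p, std::size_t n) noexcept {
    std::allocator<T>{}.deallocate(p, n);
    budget_->release(n * sizeof(T));
  }
  const std::shared_ptr<MemoryBudget>& budget() const { return budget_; }

 private:
  std::shared_ptr<MemoryBudget> budget_;
};

// Two allocators are interchangeable if they share the same budget.
template <typename T, typename U>
bool operator==(const LimitedAllocator<T>& a, const LimitedAllocator<U>& b) {
  return a.budget() == b.budget();
}
template <typename T, typename U>
bool operator!=(const LimitedAllocator<T>& a, const LimitedAllocator<U>& b) {
  return !(a == b);
}
```

A container such as `std::vector<int, LimitedAllocator<int>>` constructed with such an allocator would then fail with an exception as soon as it tries to grow beyond the shared budget, and return its bytes to the budget when it is destroyed.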

TODO: test whether we can indeed avoid the bad_allocs on galera.

TODO: more detailed unit tests.
TODO: limit the number of threads and make it configurable.
This greatly increases the time needed to start QLever
- The different steps (Handling the prefix Compression and the escaping) can be trivially parallelized
- The HashMaps in the Transitive Path operation previously contained
  one unnecessary level of indirection which is removed by this commit.

- This increases the performance of the transitive paths.
The reason was an invalid dereference of an end iterator.
joka921 added 10 commits June 9, 2021 09:34
…to pr328+pr330+pr332+pr342+pr34

# Conflicts:
#	src/engine/CancelableSort/SortExampleMain.cpp
#	src/engine/RuntimeInformation.h
#	src/engine/TransitivePath.cpp
#	src/index/Index.cpp
#	src/index/PrefixHeuristic.h
#	src/index/StringSortComparator.h
#	src/index/Vocabulary.h
#	src/index/VocabularyGeneratorImpl.h
@joka921 joka921 closed this Feb 28, 2024
@joka921 joka921 deleted the pr328+pr330+pr332+pr342+pr34 branch February 28, 2024 13:06