Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gravsearch): Optimise Gravsearch queries using topological sort (DSP-1327) #1813

Merged
merged 42 commits into from Mar 2, 2021

Conversation

@benjamingeer
Copy link
Collaborator

@benjamingeer benjamingeer commented Feb 4, 2021

resolves DSP-1327
resolves DSP-1361

benjamingeer and others added 2 commits Feb 5, 2021
@SepidehAlassi SepidehAlassi marked this pull request as draft Feb 8, 2021
SepidehAlassi and others added 22 commits Feb 8, 2021
…ated by topologicalOrder
…tatements first.

- Fix DSP-1361.
@benjamingeer benjamingeer marked this pull request as ready for review Feb 22, 2021
@benjamingeer
Copy link
Collaborator Author

@benjamingeer benjamingeer commented Feb 23, 2021

@SepidehAlassi In NonTriplestoreSpecificGravsearchToPrequeryTransformerSpec, in the test reorder query patterns in where clause with union, the input UNION looks like this:

{
    ?thing anything:hasRichtext ?richtext .
    FILTER knora-api:matchText(?richtext, "test")
    ?thing anything:hasInteger ?int .
    ?int knora-api:intValueAsInt 1 .
}
UNION
{
    ?thing anything:hasText ?text .
    FILTER knora-api:matchText(?text, "test")
    ?thing anything:hasInteger ?int .
    ?int knora-api:intValueAsInt 3 .
}

After topological sort, it looks like this:

{
    ?thing anything:hasRichtext ?richtext .
    ?int knora-api:intValueAsInt 1 .
    ?thing anything:hasInteger ?int .
    FILTER(knora-api:matchText(?richtext, "test"))
} UNION {
    ?thing anything:hasText ?text .
    ?int knora-api:intValueAsInt 3 .
    ?thing anything:hasInteger ?int .
    FILTER(knora-api:matchText(?text, "test"))
}

I'm wondering if there's a way to make ?int knora-api:intValueAsInt 1 . come before ?thing anything:hasRichtext ?richtext .

@benjamingeer benjamingeer requested a review from SepidehAlassi Feb 23, 2021
Copy link
Contributor

@SepidehAlassi SepidehAlassi left a comment

thanks for all your work on this!

@benjamingeer
Copy link
Collaborator Author

@benjamingeer benjamingeer commented Feb 25, 2021

Thanks for doing most of it and for reviewing!

Copy link
Contributor

@SepidehAlassi SepidehAlassi left a comment

I just remembered we have not added any documentation about topological sorting and its feature toggle. Let's add that before merging this.

@SepidehAlassi
Copy link
Contributor

@SepidehAlassi SepidehAlassi commented Feb 26, 2021

BTW,

I'm wondering if there's a way to make ?int knora-api:intValueAsInt 1 . come before ?thing anything:hasRichtext ?richtext .

well, topological sorting works with the degree of nodes; here both ?thing and ?int are of 2nd degree, so both permutations (?thing, ?int, ?richtext, 1) and (?thing, ?int, 1, ?richtext) are valid orders. We can add a logic to consider the degree of nodes with respect to the number of outgoing edges instead of the number of total edges. In that case, ?int would be of 1st degree and ?thing of 2nd degree.
An easy approach to do this without changing the logic of findAllTopologicalOrders would be the following:
after calculating all permutations of topological sorting and obtaining results of findOrdersNotEndingWithObjectOfRdfType, instead of choosing any of ordersNotEndingWithObjectOfRdfType, we choose the one ordered with the number of outgoing edges.
That means; considering the example above, ordersNotEndingWithObjectOfRdfType would contain o1=(?thing, ?int, ?richtext, 1) and o2=(?thing, ?int, 1, ?richtext). Putting the root element, ?thing aside (because it is not an object of any statement), we take each node in these orders, find the statement it is the object of, calculate the number of outgoing edges of the node representing the subject of this statement. That would give for o1 -> (2, 2, 1) and for o2 -> (2,1,2). Then we choose the permutation which ends with the lowest number, which would be o1.

@SepidehAlassi
Copy link
Contributor

@SepidehAlassi SepidehAlassi commented Mar 1, 2021

@benjamingeer So I adjusted the code to get all permutations of the topological order according to Kahn's algorithm. For the following query

{
    ?thing anything:hasRichtext ?richtext .
    FILTER knora-api:matchText(?richtext, "test")
    ?thing anything:hasInteger ?int .
    ?int knora-api:intValueAsInt 1 .
}
UNION
{
    ?thing anything:hasText ?text .
    FILTER knora-api:matchText(?text, "test")
    ?thing anything:hasInteger ?int .
    ?int knora-api:intValueAsInt 3 .
}

After the topological sort, now it looks like this:

{
   ?int knora-api:intValueAsInt 1 .
    ?thing anything:hasRichtext ?richtext .
    ?thing anything:hasInteger ?int .
    FILTER(knora-api:matchText(?richtext, "test"))
} UNION {
    ?int knora-api:intValueAsInt 3 .
    ?thing anything:hasText ?text .
    ?thing anything:hasInteger ?int .
    FILTER(knora-api:matchText(?text, "test"))
}

As you can see with the corrected version of the code, ?int knora-api:intValueAsInt 1 . comes before ?thing anything:hasRichtext ?richtext .

For the first block of the union, there are two valid topological sorting permutations: O1=(?thing, ?richtext, ?int, 1) and O2=(?thing, ?int, ?richtext, 1). As can be seen above, O2 is chosen to sort the statements. However, I believe maybe we should define logic to prefer O1 instead to make the query even faster. What do you think?

@benjamingeer
Copy link
Collaborator Author

@benjamingeer benjamingeer commented Mar 1, 2021

So I adjusted the code to get all permutations of the topological order according to Kahn's algorithm.

Excellent!

However, I believe maybe we should define logic to prefer O1 instead to make the query even faster. What do you think?

Sounds good to me, how do we do that?

@SepidehAlassi
Copy link
Contributor

@SepidehAlassi SepidehAlassi commented Mar 1, 2021

Sounds good to me, how do we do that?

I wrote down the idea in DSP-1327. This can be implemented in a separate PR.

…h topological sorting
@SepidehAlassi
Copy link
Contributor

@SepidehAlassi SepidehAlassi commented Mar 1, 2021

@benjamingeer I added the documentation, can you please proof-read it and add any missing points?

@benjamingeer
Copy link
Collaborator Author

@benjamingeer benjamingeer commented Mar 2, 2021

This looks good to me, I proofread the docs. If you approve the PR we can merge it.

@SepidehAlassi SepidehAlassi self-requested a review Mar 2, 2021
Copy link
Contributor

@SepidehAlassi SepidehAlassi left a comment

@benjamingeer thank you.

@benjamingeer benjamingeer merged commit efbecee into main Mar 2, 2021
11 checks passed
11 checks passed
Build Everything
Details
Prepare next release
Details
Test client-test-data
Details
API Unit Tests
Details
API E2E Tests
Details
API Integration Tests
Details
Upgrade Integration Tests
Details
Docs Build Test
Details
Publish (on release only)
Details
Deploy docs (on release only)
Details
Google chat notification about release and published version
Details
@benjamingeer benjamingeer deleted the wip/DSP-1327-gravsearch branch Mar 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants