Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(gravsearch): Prevent duplicate results #1626

Merged
merged 47 commits into from Apr 16, 2020
Merged

Conversation

@loicjaouen
Copy link
Contributor

@loicjaouen loicjaouen commented Mar 10, 2020

fixes #1588

a simple query in the available anything data set:

PREFIX knora-api: <http://api.knora.org/ontology/knora-api/v2#>
PREFIX anything: <http://0.0.0.0:3333/ontology/0001/anything/v2#>
CONSTRUCT {
    ?thing knora-api:isMainResource true .
    ?thing anything:hasInteger ?int .
} WHERE {
    ?thing a knora-api:Resource .
    ?thing a anything:Thing .
    {
        ?thing anything:hasRichtext ?richtext .
        ?richtext knora-api:valueAsString ?richtextLitteral
        FILTER knora-api:match(?richtextLitteral, "test")

		?thing anything:hasInteger ?int .
		?int knora-api:intValueAsInt 1
    }
    UNION
    {
        ?thing anything:hasText ?text .
        ?text knora-api:valueAsString ?textLitteral
        FILTER knora-api:match(?textLitteral, "test")

		?thing anything:hasInteger ?int .
		?int knora-api:intValueAsInt 1
    }
}
order by (?int)
  • gives a match count of one with the route: /v2/searchextended/count
  • gives a result set of two with the route: /v2/searchextended

the in and outs of this have been discussed in the above mentionned #1588

if ever it helps, here are the generated sparql requests, for the count:

SELECT DISTINCT (COUNT(DISTINCT ?thing) AS ?count)
WHERE {
    ?thing <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#Resource> .
    GRAPH <http://www.ontotext.com/explicit> {
        ?thing <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
    }
    ?thing <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/0001/anything#Thing> .
    {
        ?thing <http://www.knora.org/ontology/0001/anything#hasRichtext> ?richtext .
        GRAPH <http://www.ontotext.com/explicit> {
            ?richtext <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
        }
        ?richtext <http://www.knora.org/ontology/knora-base#valueHasString> ?richtextLitteral .
        ?thing <http://www.knora.org/ontology/0001/anything#hasInteger> ?int .
        GRAPH <http://www.ontotext.com/explicit> {
            ?int <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
        }
        ?int <http://www.knora.org/ontology/knora-base#valueHasInteger> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
        GRAPH <http://www.ontotext.com/explicit> {
            ?richtextLitteral <http://www.ontotext.com/owlim/lucene#fullTextSearchIndex> "test"^^<http://www.w3.org/2001/XMLSchema#string> .
        }
    } UNION {
        ?thing <http://www.knora.org/ontology/0001/anything#hasText> ?text .
        GRAPH <http://www.ontotext.com/explicit> {
            ?text <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
        }
        ?text <http://www.knora.org/ontology/knora-base#valueHasString> ?textLitteral .
        ?thing <http://www.knora.org/ontology/0001/anything#hasInteger> ?int .
        GRAPH <http://www.ontotext.com/explicit> {
            ?int <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
        }
        ?int <http://www.knora.org/ontology/knora-base#valueHasInteger> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
        GRAPH <http://www.ontotext.com/explicit> {
            ?textLitteral <http://www.ontotext.com/owlim/lucene#fullTextSearchIndex> "test"^^<http://www.w3.org/2001/XMLSchema#string> .
        }
    }
}
LIMIT 1

for the search:

SELECT DISTINCT ?thing (GROUP_CONCAT(DISTINCT(?text); SEPARATOR='') AS ?text__Concat) (GROUP_CONCAT(DISTINCT(?int); SEPARATOR='') AS ?int__Concat) (GROUP_CONCAT(DISTINCT(?richtext); SEPARATOR='') AS ?richtext__Concat)
WHERE {
    ?thing <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/knora-base#Resource> .
    GRAPH <http://www.ontotext.com/explicit> {
        ?thing <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
    }
    ?thing <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.knora.org/ontology/0001/anything#Thing> .
    {
        ?thing <http://www.knora.org/ontology/0001/anything#hasRichtext> ?richtext .
        GRAPH <http://www.ontotext.com/explicit> {
            ?richtext <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
        }
        ?richtext <http://www.knora.org/ontology/knora-base#valueHasString> ?richtextLitteral .
        ?thing <http://www.knora.org/ontology/0001/anything#hasInteger> ?int .
        GRAPH <http://www.ontotext.com/explicit> {
            ?int <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
        }
        GRAPH <http://www.ontotext.com/explicit> {
            ?int <http://www.knora.org/ontology/knora-base#valueHasInteger> ?int__valueHasInteger .
        }
        ?int <http://www.knora.org/ontology/knora-base#valueHasInteger> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
        GRAPH <http://www.ontotext.com/explicit> {
            ?richtextLitteral <http://www.ontotext.com/owlim/lucene#fullTextSearchIndex> "test"^^<http://www.w3.org/2001/XMLSchema#string> .
        }
    } UNION {
        ?thing <http://www.knora.org/ontology/0001/anything#hasText> ?text .
        GRAPH <http://www.ontotext.com/explicit> {
            ?text <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
        }
        ?text <http://www.knora.org/ontology/knora-base#valueHasString> ?textLitteral .
        ?thing <http://www.knora.org/ontology/0001/anything#hasInteger> ?int .
        GRAPH <http://www.ontotext.com/explicit> {
            ?int <http://www.knora.org/ontology/knora-base#isDeleted> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
        }
        ?int <http://www.knora.org/ontology/knora-base#valueHasInteger> "1"^^<http://www.w3.org/2001/XMLSchema#integer> .
        GRAPH <http://www.ontotext.com/explicit> {
            ?textLitteral <http://www.ontotext.com/owlim/lucene#fullTextSearchIndex> "test"^^<http://www.w3.org/2001/XMLSchema#string> .
        }
    }
}
GROUP BY ?thing ?int__valueHasInteger
ORDER BY ASC(?int__valueHasInteger) ASC(?thing)
LIMIT 25
benjamingeer and others added 22 commits Feb 19, 2020
# Conflicts:
#	webapi/src/main/scala/org/knora/webapi/util/StringFormatter.scala
@loicjaouen loicjaouen added this to the Backlog milestone Mar 10, 2020
@loicjaouen loicjaouen linked an issue that may be closed by this pull request Mar 10, 2020
@loicjaouen
Copy link
Contributor Author

@loicjaouen loicjaouen commented Mar 10, 2020

so for now, this is only the test case demonstrating the bug of the issue #1588

Sorry @benjamingeer, I assign this to you as you are also the assignee on #1588

@benjamingeer
Copy link
Collaborator

@benjamingeer benjamingeer commented Mar 10, 2020

@loicjaouen What would you like to change in the generated SPARQL SELECT?

We need the GROUP_CONCAT for each variable whose values you want to receive in the API response.

@benjamingeer
Copy link
Collaborator

@benjamingeer benjamingeer commented Mar 19, 2020

After talking with @tobiasschweizer, we decided to handle further refactoring in a separate PR (see issue #1632).

# Conflicts:
#	webapi/src/main/scala/org/knora/webapi/responders/v2/search/MainQueryResultProcessor.scala
- Simplify Gravsearch paging.
- Add design docs.
# Conflicts:
#	docs/src/paradox/05-internals/design/api-v2/gravsearch.md
# Conflicts:
#	docs/src/paradox/05-internals/design/api-v2/gravsearch.md
#	webapi/src/main/scala/org/knora/webapi/responders/v2/SearchResponderV2.scala
#	webapi/src/main/scala/org/knora/webapi/responders/v2/search/MainQueryResultProcessor.scala
#	webapi/src/main/scala/org/knora/webapi/responders/v2/search/QueryTraverser.scala
#	webapi/src/main/scala/org/knora/webapi/responders/v2/search/gravsearch/prequery/AbstractPrequeryGenerator.scala
#	webapi/src/main/scala/org/knora/webapi/responders/v2/search/gravsearch/prequery/NonTriplestoreSpecificGravsearchToPrequeryGenerator.scala
@benjamingeer benjamingeer mentioned this pull request Apr 6, 2020
7 of 7 tasks complete
benjamingeer added 3 commits Apr 8, 2020
# Conflicts:
#	docs/src/paradox/05-internals/design/api-v2/gravsearch.md
#	webapi/src/main/scala/org/knora/webapi/messages/v2/responder/resourcemessages/ResourceMessagesV2.scala
#	webapi/src/main/scala/org/knora/webapi/responders/v2/ResourcesResponderV2.scala
#	webapi/src/main/scala/org/knora/webapi/responders/v2/SearchResponderV2.scala
#	webapi/src/main/scala/org/knora/webapi/responders/v2/StandoffResponderV2.scala
#	webapi/src/main/scala/org/knora/webapi/util/ConstructResponseUtilV2.scala
#	webapi/src/test/scala/org/knora/webapi/util/ConstructResponseUtilV2Spec.scala
@tobiasschweizer
Copy link
Contributor

@tobiasschweizer tobiasschweizer commented Apr 9, 2020

@benjamingeer I will review this PR today in the afternoon.

@tobiasschweizer
Copy link
Contributor

@tobiasschweizer tobiasschweizer commented Apr 9, 2020

Doing this review will also help me understand our own paper :-)

Copy link
Contributor

@tobiasschweizer tobiasschweizer left a comment

I had a first look, please see my comments. However, I would appreciate if I could talk to you on video chat some time next week to make sure I understand the changes introduced by this PR.

The prequery's SELECT clause is built by
`NonTriplestoreSpecificGravsearchToPrequeryTransformer.getSelectColumns`,
based on the variables used in the input query's `CONSTRUCT` clause.
The resulting SELECT clause looks as follows:

```sparql
SELECT DISTINCT

This comment has been minimized.

@tobiasschweizer

tobiasschweizer Apr 9, 2020
Contributor

I am unaware of the construct (GROUP_CONCAT(DISTINCT(IF(BOUND(?book), STR(?book), "")); SEPARATOR='') AS ?book__Concat) (82f8a55). I presume this means that if the variable is bound, an IRI is returned, otherwise an empty string.

This comment has been minimized.

@benjamingeer

benjamingeer Apr 9, 2020
Collaborator

Yes. This is explained in the paragraph starting on line 211:

Each GROUP_CONCAT checks whether the concatenated variable is bound in each result in the group; if a variable is unbound, we concatenate an empty string. This is necessary because, in Apache Jena (and perhaps other triplestores), "If GROUP_CONCAT has an unbound value in the list of values to concat, the overall result is 'error'" (see this Jena issue).

@@ -281,7 +282,8 @@ class SearchResponderV2(responderData: ResponderData) extends ResponderWithStand

// Create a Select prequery

nonTriplestoreSpecificConstructToSelectTransformer: NonTriplestoreSpecificGravsearchToCountPrequeryGenerator = new NonTriplestoreSpecificGravsearchToCountPrequeryGenerator(
nonTriplestoreSpecificConstructToSelectTransformer: NonTriplestoreSpecificGravsearchToCountPrequeryTransformer = new NonTriplestoreSpecificGravsearchToCountPrequeryTransformer(
constructClause = inputQuery.constructClause,

This comment has been minimized.

@tobiasschweizer

tobiasschweizer Apr 9, 2020
Contributor

I assume you need to know about the Construct clause because you only want to return the IRIs of the resources and values that the client asks for.

This comment has been minimized.

@benjamingeer

benjamingeer Apr 9, 2020
Collaborator

Yes.

* the search criteria. This query will be used to get resource IRIs for a single page of results. These IRIs will be included in a CONSTRUCT
* query to get the actual results for the page.
*
* @param typeInspectionResult the result of type inspection of the original query.

This comment has been minimized.

@tobiasschweizer

tobiasschweizer Apr 9, 2020
Contributor

Could you add all args to the doc string?

This comment has been minimized.

@benjamingeer

benjamingeer Apr 9, 2020
Collaborator

Done in 8d4fdfb.


/**
* Transforms a preprocessed CONSTRUCT query into a SELECT query that returns only the IRIs and sort order of the main resources that matched
* the search criteria. This query will be used to get resource IRIs for a single page of results. These IRIs will be included in a CONSTRUCT

This comment has been minimized.

@tobiasschweizer

tobiasschweizer Apr 9, 2020
Contributor

I think the comment should include additional information such as:

Transforms a preprocessed CONSTRUCT query into a SELECT query that returns only the IRIs and sort order of the main resources that matched the search criteria and are requested by client in the input query's WHERE clause.

This comment has been minimized.

@benjamingeer

benjamingeer Apr 9, 2020
Collaborator

Done in 8d4fdfb.

@@ -0,0 +1,345 @@
/*

This comment has been minimized.

@tobiasschweizer

tobiasschweizer Apr 9, 2020
Contributor

Has NonTriplestoreSpecificGravsearchToPrequeryGenerator.scala been renamed / moved to NonTriplestoreSpecificGravsearchToPrequeryTransformer? If yes, I don't see a diff (it looks like the files are unrelated) and it's hard for me to see what has changed exactly.

This comment has been minimized.

@benjamingeer

benjamingeer Apr 9, 2020
Collaborator

Yes, will attach the diff below.

@benjamingeer
Copy link
Collaborator

@benjamingeer benjamingeer commented Apr 9, 2020

Here's the diff you asked for:

prequery.diff.zip

@benjamingeer
Copy link
Collaborator

@benjamingeer benjamingeer commented Apr 9, 2020

I would appreciate if I could talk to you on video chat some time next week to make sure I understand the changes introduced by this PR.

No problem. I'll take a couple of days of holiday next week. How about we talk on Friday 16 April?

@tobiasschweizer
Copy link
Contributor

@tobiasschweizer tobiasschweizer commented Apr 9, 2020

I would appreciate if I could talk to you on video chat some time next week to make sure I understand the changes introduced by this PR.

No problem. I'll take a couple of days of holiday next week. How about we talk on Friday 16 April?

Works for me. I would prefer to have an online meeting before 12. Would 10 o'clock work for you?

@benjamingeer
Copy link
Collaborator

@benjamingeer benjamingeer commented Apr 9, 2020

Would 10 o'clock work for you?

Perfect!

@tobiasschweizer
Copy link
Contributor

@tobiasschweizer tobiasschweizer commented Apr 9, 2020

Would 10 o'clock work for you?

Perfect!

The appointment you created is on a Thursday, but that's also ok for me.

Copy link
Contributor

@tobiasschweizer tobiasschweizer left a comment

Thanks for the additional comments and adaptions.

@benjamingeer
Copy link
Collaborator

@benjamingeer benjamingeer commented Apr 16, 2020

Thanks for the review!

@benjamingeer benjamingeer merged commit 9313b88 into develop Apr 16, 2020
8 checks passed
8 checks passed
Build Everything
Details
API Unit Tests
Details
API E2E Tests
Details
API Integration Tests
Details
JS Lib Tests (11.x)
Details
Upgrade Unit Tests
Details
Upgrade Integration Tests
Details
Docs Build Test
Details
@benjamingeer benjamingeer deleted the wip/1588_duplicates branch Apr 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Linked issues

Successfully merging this pull request may close these issues.

3 participants