fix(gravsearch): Prevent duplicate results (#1626)

loicjaouen committed Apr 16, 2020
1 parent e80bf52 commit 9313b88082e3d235bb5a9235fdee243c987c736b
Showing with 1,032 additions and 841 deletions.
  1. +20 −12 docs/src/paradox/05-internals/design/api-v2/gravsearch.md
  2. +12 −8 ...i/src/main/scala/org/knora/webapi/messages/v2/responder/resourcemessages/ResourceMessagesV2.scala
  3. +2 −2 webapi/src/main/scala/org/knora/webapi/responders/v1/ValuesResponderV1.scala
  4. +2 −0 webapi/src/main/scala/org/knora/webapi/responders/v2/ResourcesResponderV2.scala
  5. +49 −26 webapi/src/main/scala/org/knora/webapi/responders/v2/SearchResponderV2.scala
  6. +1 −0 webapi/src/main/scala/org/knora/webapi/responders/v2/StandoffResponderV2.scala
  7. +23 −213 webapi/src/main/scala/org/knora/webapi/responders/v2/search/MainQueryResultProcessor.scala
  8. +3 −18 webapi/src/main/scala/org/knora/webapi/responders/v2/search/QueryTraverser.scala
  9. +1 −1 webapi/src/main/scala/org/knora/webapi/responders/v2/search/SparqlTransformer.scala
  10. +118 −72 ...ala/org/knora/webapi/responders/v2/search/gravsearch/mainquery/GravsearchMainQueryGenerator.scala
  11. +53 −90 ...n/scala/org/knora/webapi/responders/v2/search/gravsearch/prequery/AbstractPrequeryGenerator.scala
  12. +17 −27 ...oCountPrequeryGenerator.scala → NonTriplestoreSpecificGravsearchToCountPrequeryTransformer.scala}
  13. +0 −206 ...esponders/v2/search/gravsearch/prequery/NonTriplestoreSpecificGravsearchToPrequeryGenerator.scala
  14. +378 −0 ...ponders/v2/search/gravsearch/prequery/NonTriplestoreSpecificGravsearchToPrequeryTransformer.scala
  15. +21 −0 ...scala/org/knora/webapi/responders/v2/search/gravsearch/types/GravsearchTypeInspectionResult.scala
  16. +13 −7 ...n/scala/org/knora/webapi/responders/v2/search/gravsearch/types/GravsearchTypeInspectionUtil.scala
  17. +5 −2 webapi/src/main/scala/org/knora/webapi/util/ConstructResponseUtilV2.scala
  18. +2 −1 webapi/src/main/scala/org/knora/webapi/util/StringFormatter.scala
  19. +4 −3 webapi/src/test/resources/test-data/searchR2RV2/IncomingLinksForBook.jsonld
  20. +9 −8 webapi/src/test/resources/test-data/searchR2RV2/LinkObjectsToBooks.jsonld
  21. +5 −4 webapi/src/test/resources/test-data/searchR2RV2/RegionsForPage.jsonld
  22. +104 −0 webapi/src/test/resources/test-data/searchR2RV2/ThingFromQueryWithUnion.jsonld
  23. +19 −18 webapi/src/test/resources/test-data/searchR2RV2/regionsOfZeitgloecklein.jsonld
  24. +52 −5 webapi/src/test/scala/org/knora/webapi/e2e/v2/SearchRouteV2R2RSpec.scala
  25. +7 −7 webapi/src/test/scala/org/knora/webapi/responders/v2/ResourcesResponderV2Spec.scala
  26. +3 −2 ...equeryGeneratorSpec.scala → NonTriplestoreSpecificGravsearchToCountPrequeryTransformerSpec.scala}
  27. +108 −109 ...hToPrequeryGeneratorSpec.scala → NonTriplestoreSpecificGravsearchToPrequeryTransformerSpec.scala}
  28. +1 −0 webapi/src/test/scala/org/knora/webapi/util/ConstructResponseUtilV2Spec.scala
@@ -174,16 +174,10 @@ PREFIX knora-api: <http://api.knora.org/ontology/knora-api/simple/v2#>
 }
 ```

-The prequery's SELECT clause is built using the member variables defined in `AbstractPrequeryGenerator`.
-State of member variables after transformation of the input query into the prequery:
-
-- `mainResourceVariable`: `QueryVariable(page)`
-- `dependentResourceVariables`: `Set(QueryVariable(book))`
-- `dependentResourceVariablesGroupConcat`: `Set(QueryVariable(book__Concat))`
-- `valueObjectVariables`: `Set(QueryVariable(book__LinkValue), QueryVariable(seqnum))`: `?book` represents the dependent resource and `?book__LinkValue` the link value connecting `?page` and `?book`.
-- `valueObjectVariablesGroupConcat`: `Set(QueryVariable(seqnum__Concat), QueryVariable(book__LinkValue__Concat))`
-
-The resulting SELECT clause of the prequery looks as follows:
+The prequery's SELECT clause is built by
+`NonTriplestoreSpecificGravsearchToPrequeryTransformer.getSelectColumns`,
+based on the variables used in the input query's `CONSTRUCT` clause.
+The resulting SELECT clause looks as follows:

 ```sparql
 SELECT DISTINCT
@@ -219,6 +213,11 @@ is unbound, we concatenate an empty string. This is necessary because, in Apache
 triplestores), "If `GROUP_CONCAT` has an unbound value in the list of values to concat, the overall result is 'error'"
 (see [this Jena issue](https://issues.apache.org/jira/browse/JENA-1856)).

+If the input query contains a `UNION`, and a variable is bound in one branch
+of the `UNION` and not in another branch, it is possible that the prequery
+will return more than one row per main resource. To deal with this situation,
+`SearchResponderV2` merges rows that contain the same main resource IRI.
+
 ### Main Query

 The purpose of the main query is to get all requested information about the main resource, dependent resources, and value objects.
@@ -233,8 +232,17 @@ The classes involved in generating the main query can be found in

 The main query is a SPARQL CONSTRUCT query. Its generation is handled by the
 method `GravsearchMainQueryGenerator.createMainQuery`.
-It takes three arguments: `mainResourceIris: Set[IriRef], dependentResourceIris:
-Set[IriRef], valueObjectIris: Set[IRI]`. From the given Iris, statements are
+It takes three arguments:
+`mainResourceIris: Set[IriRef], dependentResourceIris: Set[IriRef], valueObjectIris: Set[IRI]`.
+
+These sets are constructed based on information about variables representing
+dependent resources and value objects in the prequery, which is provided by
+`NonTriplestoreSpecificGravsearchToPrequeryTransformer`:
+
+- `dependentResourceVariablesGroupConcat`: `Set(QueryVariable(book__Concat))`
+- `valueObjectVariablesGroupConcat`: `Set(QueryVariable(seqnum__Concat), QueryVariable(book__LinkValue__Concat))`
+
+From the given Iris, statements are
 generated that ask for complete information on *exactly* these resources and
 values. For any given resource Iri, only the values present in
 `valueObjectIris` are to be queried. This is achieved by using SPARQL's
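
The row merging described in the `gravsearch.md` change above (a `UNION` can make the prequery return several rows for one main resource, and `SearchResponderV2` merges rows that contain the same main resource IRI) can be illustrated with a minimal sketch. This is not the code from this commit: the `Row` type, the `mergeRowsByMainResource` helper, and the use of `\u001F` as the separator inside `GROUP_CONCAT` columns are assumptions made for illustration only.

```scala
object MergeRowsSketch {
  // Hypothetical simplified row: SPARQL variable name -> value ("" if unbound).
  type Row = Map[String, String]

  // Assumed separator between IRIs inside a GROUP_CONCAT column.
  val Separator: Char = '\u001F'

  // Merges all rows sharing the same value of `mainVar` into one row,
  // keeping the order in which main resources first appear.
  // Assumes every row contains `mainVar`.
  def mergeRowsByMainResource(rows: Seq[Row], mainVar: String): Seq[Row] = {
    val orderedIris: Seq[String] = rows.map(row => row(mainVar)).distinct
    val rowsPerIri: Map[String, Seq[Row]] = rows.groupBy(row => row(mainVar))

    orderedIris.map { iri =>
      val group: Seq[Row] = rowsPerIri(iri)
      val columns: Seq[String] = group.flatMap(_.keys).distinct

      columns.map { column =>
        if (column == mainVar) {
          column -> iri
        } else {
          // Collect the distinct non-empty IRIs found in this column across the group.
          val iris: Seq[String] = group
            .flatMap(_.get(column))
            .filter(_.nonEmpty)
            .flatMap(_.split(Separator).toSeq)
            .distinct
          column -> iris.mkString(Separator.toString)
        }
      }.toMap
    }
  }
}
```

In this sketch, two rows for the same `?page` that each bind only one of `?book__Concat` and `?seqnum__Concat` collapse into a single row with both columns filled, so each main resource is represented once before the main query is generated.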
@@ -824,6 +824,17 @@ case class ReadResourcesSequenceV2(resources: Seq[ReadResourceV2],
 )
 }

+private def getOntologiesFromResource(resource: ReadResourceV2): Set[SmartIri] = {
+val propertyIriOntologies: Set[SmartIri] = resource.values.keySet.map(_.getOntologyFromEntity)
+
+val valueOntologies: Set[SmartIri] = resource.values.values.flatten.collect {
+case readLinkValueV2: ReadLinkValueV2 =>
+readLinkValueV2.valueContent.nestedResource.map(nested => getOntologiesFromResource(nested))
+}.flatten.flatten.toSet
+
+propertyIriOntologies ++ valueOntologies + resource.resourceClassIri.getOntologyFromEntity
+}
+
 // #generateJsonLD
 private def generateJsonLD(targetSchema: ApiV2Schema, settings: SettingsImpl, schemaOptions: Set[SchemaOption]): JsonLDDocument = {
 // #generateJsonLD
@@ -843,14 +854,7 @@ case class ReadResourcesSequenceV2(resources: Seq[ReadResourceV2],
 // Make JSON-LD prefixes for the project-specific ontologies used in the response.

 val projectSpecificOntologiesUsed: Set[SmartIri] = resources.flatMap {
-resource =>
-val resourceOntology = resource.resourceClassIri.getOntologyFromEntity
-
-val propertyOntologies = resource.values.keySet.map {
-property => property.getOntologyFromEntity
-}
-
-propertyOntologies + resourceOntology
+resource => getOntologiesFromResource(resource)
 }.toSet.filter(!_.isKnoraBuiltInDefinitionIri)

 // Make the knora-api prefix for the target schema.
@@ -744,7 +744,7 @@ class ValuesResponderV1(responderData: ResponderData) extends Responder(responde
 // If we're updating a link, findResourceWithValueResult will contain the IRI of the property that points to the
 // knora-base:LinkValue, but we'll need the IRI of the corresponding link property.
 val propertyIri = changeValueRequest.value match {
-case linkUpdateV1: LinkUpdateV1 => stringFormatter.linkValuePropertyIri2LinkPropertyIri(findResourceWithValueResult.propertyIri)
+case linkUpdateV1: LinkUpdateV1 => stringFormatter.linkValuePropertyIriToLinkPropertyIri(findResourceWithValueResult.propertyIri)
 case _ => findResourceWithValueResult.propertyIri
 }

@@ -1075,7 +1075,7 @@ class ValuesResponderV1(responderData: ResponderData) extends Responder(responde
 case (p, o) => p == OntologyConstants.KnoraBase.HasPermissions
 }.map(_._2).getOrElse(throw InconsistentTriplestoreDataException(s"Value ${deleteValueRequest.valueIri} has no permissions"))

-val linkPropertyIri = stringFormatter.linkValuePropertyIri2LinkPropertyIri(findResourceWithValueResult.propertyIri)
+val linkPropertyIri = stringFormatter.linkValuePropertyIriToLinkPropertyIri(findResourceWithValueResult.propertyIri)

 for {
 // Get project info
@@ -1210,6 +1210,7 @@ class ResourcesResponderV2(responderData: ResponderData) extends ResponderWithSt
 apiResponse: ReadResourcesSequenceV2 <- ConstructResponseUtilV2.createApiResponse(
 mainResourcesAndValueRdfData = mainResourcesAndValueRdfData,
 orderByResourceIri = resourceIrisDistinct,
+pageSizeBeforeFiltering = resourceIris.size, // doesn't matter because we're not doing paging
 mappings = mappingsAsMap,
 queryStandoff = queryStandoff,
 versionDate = versionDate,
@@ -1264,6 +1265,7 @@ class ResourcesResponderV2(responderData: ResponderData) extends ResponderWithSt
 apiResponse: ReadResourcesSequenceV2 <- ConstructResponseUtilV2.createApiResponse(
 mainResourcesAndValueRdfData = mainResourcesAndValueRdfData,
 orderByResourceIri = resourceIrisDistinct,
+pageSizeBeforeFiltering = resourceIris.size, // doesn't matter because we're not doing paging
 mappings = Map.empty[IRI, MappingAndXSLTransformation],
 queryStandoff = false,
 versionDate = None,
