Configure on a namespace basis the fields to index in the thing-search #1521

Closed
thjaeckle opened this issue Oct 27, 2022 · 5 comments · Fixed by #1870

@thjaeckle
Member

Currently, the complete JSON of a thing (including all attributes and features) is automatically indexed in the search.
We learned from community feedback that this is not always desired, as it can put a lot of load on the search DB. That load could be avoided, for example, for fields which are never used in searches.

thjaeckle added the "outlook" label (something which could be done in the future) on Oct 27, 2022
@an1310
Contributor

an1310 commented Jul 10, 2023

This is of strategic interest to us -- when would be a good time to go over requirements/design ideas for a possible contribution?

@thjaeckle
Member Author

@an1310 I will try to find out where this would best be placed.
As far as I remember, there was an extension point for just that purpose, used by Bosch's commercial offering of Ditto.

@thjaeckle
Member Author

@an1310 I took a closer look myself and I think I found the mentioned extension point for indexing only custom fields in namespaces.

I seem to remember that we explicitly added the CachingSignalEnrichmentFacade extension point in Ditto to selectively retrieve only parts of "things" (based on their namespace's "search index configuration") when processing, i.e. retrieving a "partial thing" in order to update the search index:

public CompletionStage<JsonObject> retrievePartialThing(final ThingId thingId,
        @Nullable final JsonFieldSelector jsonFieldSelector, final DittoHeaders dittoHeaders,
        @Nullable final Signal<?> concernedSignal) {
    final List<ThingEvent<?>> thingEvents =
            (concernedSignal instanceof ThingEvent) && !(ProtocolAdapter.isLiveSignal(concernedSignal)) ?
                    List.of((ThingEvent<?>) concernedSignal) : List.of();
    // as second step only return what was originally requested as fields:
    final var cachingParameters =
            new CachingParameters(jsonFieldSelector, thingEvents, true, 0);
    return doRetrievePartialThing(thingId, dittoHeaders, cachingParameters)
            .thenApply(jsonObject -> applyJsonFieldSelector(jsonObject, jsonFieldSelector));
}

/**
 * Retrieve parts of a thing.
 *
 * @param thingId ID of the thing.
 * @param jsonFieldSelector the selected fields of the thing.
 * @param dittoHeaders Ditto headers containing authorization information.
 * @param concernedSignals the Signals which caused that this partial thing retrieval was triggered
 * (e.g. a {@code ThingEvent})
 * @param minAcceptableSeqNr minimum sequence number of the concerned signals to not invalidate the cache.
 * @return future that completes with the parts of a thing or fails with an error.
 */
@SuppressWarnings("java:S1612")
public CompletionStage<JsonObject> retrievePartialThing(final EntityId thingId,
        final JsonFieldSelector jsonFieldSelector,
        final DittoHeaders dittoHeaders,
        final Collection<? extends Signal<?>> concernedSignals,
        final long minAcceptableSeqNr) {
    final List<ThingEvent<?>> thingEvents = concernedSignals.stream()
            .filter(signal -> signal instanceof ThingEvent && !Signal.isChannelLive(signal))
            .map(signal -> (ThingEvent<?>) signal)
            .collect(Collectors.toList());
    // as second step only return what was originally requested as fields:
    final var cachingParameters =
            new CachingParameters(jsonFieldSelector, thingEvents, true, minAcceptableSeqNr);
    return doRetrievePartialThing(thingId, dittoHeaders, cachingParameters)
            .thenApply(jsonObject -> applyJsonFieldSelector(jsonObject, jsonFieldSelector));
}

We would need a custom JsonFieldSelector configured per namespace name, defining which fields of a "thing" to retrieve (and cache) for updating the search index.
The configuration could look like:

  • a list/map of namespaces
    • namespace
      • indexedFields as JsonFieldSelector
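
As a rough illustration of that structure, here is a minimal, self-contained Java sketch of a namespace-to-fields lookup. It does not use any Ditto classes: `NamespaceIndexConfigSketch`, `namespaceOf` and `indexedFieldsFor` are hypothetical names, the field selector is simplified to a list of strings, and namespaces are matched exactly. Treating "no entry" as "index everything" is also an assumption, chosen to mirror the current behaviour.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not Ditto code: maps a namespace name to the fields to index.
final class NamespaceIndexConfigSketch {

    // namespace name -> field pointers (stand-in for a JsonFieldSelector)
    private final Map<String, List<String>> indexedFieldsByNamespace;

    NamespaceIndexConfigSketch(final Map<String, List<String>> indexedFieldsByNamespace) {
        this.indexedFieldsByNamespace = Map.copyOf(indexedFieldsByNamespace);
    }

    // Thing IDs have the form "<namespace>:<name>"; the namespace is the part before the first colon.
    static String namespaceOf(final String thingId) {
        final int separator = thingId.indexOf(':');
        if (separator < 0) {
            throw new IllegalArgumentException("Not a namespaced thing ID: " + thingId);
        }
        return thingId.substring(0, separator);
    }

    // Empty result means: nothing configured for this namespace (assumed to mean "index everything").
    Optional<List<String>> indexedFieldsFor(final String thingId) {
        return Optional.ofNullable(indexedFieldsByNamespace.get(namespaceOf(thingId)));
    }

    public static void main(final String[] args) {
        final NamespaceIndexConfigSketch config = new NamespaceIndexConfigSketch(Map.of(
                "org.eclipse.ditto.foo", List.of("thingId", "policyId", "attributes/indexed-one")));

        System.out.println(config.indexedFieldsFor("org.eclipse.ditto.foo:my-thing"));
        System.out.println(config.indexedFieldsFor("org.example.other:my-thing"));
    }
}
```

In a real integration, the looked-up fields would presumably be turned into the JsonFieldSelector that is passed to retrievePartialThing.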

Example of what I have in mind (to be put in the search.conf as configuration of the mentioned extension point):

caching-signal-enrichment-facade-provider = org.eclipse.ditto.thingsearch.service.persistence.write.streaming.DittoCachingSignalEnrichmentFacadeProvider

namespaces = [
  {
    namespace-name = "org.eclipse.ditto.foo"
    indexed-fields = [
      "thingId",
      "policyId",
      "_revision",
      "_created",
      "_modified",
      "attributes/indexed-one",
      "attributes/complex-jsonobj"
    ]
  }
]
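
To illustrate what such an indexed-fields list would mean for a single thing, the following sketch reduces a thing (modelled as nested maps) to the listed fields. This is plain Java under stated assumptions and not Ditto's implementation: `PartialThingSketch` and `select` are hypothetical names, and the slash-separated strings only imitate JSON pointers.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Ditto code: keeps only the configured fields of a thing.
final class PartialThingSketch {

    static Map<String, Object> select(final Map<String, Object> thing, final List<String> indexedFields) {
        final Map<String, Object> result = new LinkedHashMap<>();
        for (final String field : indexedFields) {
            // tolerate an optional leading slash, e.g. "/attributes/indexed-one"
            final String pointer = field.startsWith("/") ? field.substring(1) : field;
            copyPath(thing, result, pointer.split("/"), 0);
        }
        return result;
    }

    @SuppressWarnings("unchecked")
    private static void copyPath(final Map<String, Object> source, final Map<String, Object> target,
            final String[] segments, final int index) {
        final String key = segments[index];
        if (!source.containsKey(key)) {
            return; // field absent in this thing: nothing to index
        }
        final Object value = source.get(key);
        if (index == segments.length - 1) {
            target.put(key, value); // last segment: take the whole value, including nested objects
            return;
        }
        if (!(value instanceof Map)) {
            return; // pointer goes deeper than the actual data
        }
        final Object existing = target.get(key);
        if (existing == value) {
            return; // the whole subtree was already selected by a shorter pointer
        }
        final Map<String, Object> child = existing instanceof Map
                ? (Map<String, Object>) existing
                : new LinkedHashMap<>();
        copyPath((Map<String, Object>) value, child, segments, index + 1);
        if (!child.isEmpty()) {
            target.put(key, child);
        }
    }

    public static void main(final String[] args) {
        final Map<String, Object> thing = Map.of(
                "thingId", "org.eclipse.ditto.foo:my-thing",
                "policyId", "org.eclipse.ditto.foo:my-policy",
                "attributes", Map.of("indexed-one", 42, "not-indexed", "ignored"),
                "features", Map.of("lamp", Map.of("properties", Map.of("on", true))));

        System.out.println(select(thing, List.of("thingId", "policyId", "attributes/indexed-one")));
    }
}
```

Fields which are not listed (here `attributes/not-indexed` and all of `features`) never reach the result, which is the load reduction the proposal is after.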

I hope that helps a little. Maybe a Ditto committer still at Bosch can share more details on how this can be done, based on the identified extension point.

thjaeckle added the "help wanted" and "community-interest" (issues which were explicitly asked for by the Ditto community) labels and removed the "outlook" (something which could be done in the future) label on Sep 12, 2023
@thjaeckle
Member Author

@an1310 are you working on this? Any update or PR for providing early feedback?

@an1310
Contributor

an1310 commented Nov 7, 2023

Yes I am.

thjaeckle added this to the 3.5.0 milestone on Jan 23, 2024
thjaeckle added a commit that referenced this issue Jan 24, 2024
#1521: Configure which fields are indexed in the Ditto search index per namespace pattern