Consider adding a context to bindings? #1203

surilindur · 2023-04-17T20:03:58Z

Issue type:

➕ Feature request

Description:

There may be use cases where metadata should be maintained per bindings instance, and carried with the bindings all the way to the query engine output (I do not know how to express this properly, sorry). One such example use case could be when multiple sources may contribute to a single result binding, and information about the exact sources contributing to a specific result from the query engine are needed, to help trace back any potentially weird results or filter them as needed.

To help address this use case and potential other ones, perhaps a context could be added to bindings, with functionality similar to the context being passed to the query engine itself and then passed around internally to actors. For example, a Bindings instance could have a context property where members could be accessed using a IBindingsContextKey:

function exampleSourceAddition(bindings: Bindings): void {
  const sourcesKey: IBindingsContextKey = new BindingsContextKey('urn:comunica:something:sources')
  const sourcesValue: Set<string> = bindings.context.getSafe<Set<string>>(sourcesKey)
  sourcesValue.add('https://example.com/sparql')
  bindings.context = bindings.context.set(sourcesKey, sourcesValue)
}

Maybe the existing context-related work could be generalised, to have an IContextKey, IContext, etc.? That way, any future aspects of the engine also needing context could be taken into consideration, and it would help reduce the amount of types because there are already a lot of them.

I am just creating this issue to open the discussion around the topic. 🙂

The text was updated successfully, but these errors were encountered:

github-actions · 2023-04-17T20:04:22Z

Thanks for the suggestion!

rubensworks · 2023-04-18T09:06:43Z

We could probably indeed directly use the IContext interface, and add it as a context field to bindings.

I guess we'll want the context to be only assignable during creation of the bindings? This will require us to go beyond the bindings factory interface of RDF/JS.

Another question is how to handle joins, where 2 bindings are merged (see the merge functions). We may want to require the caller of the merge function to explicitly say how the contexts of 2 bindings should be merged, as there may not be a generic way that is good for all cases.

maartyman · 2023-04-22T10:41:06Z

In the current version of incremunica (an incremental version of comunica that can process both additions and deletions) I added a diff boolean to the Bindings class to define if a Binding has been deleted or added. A context field for the bindings as described here could perfectly allow for this functionality without the need to change this core comunica package.

Indeed, when merging two bindings, these contexts need to be combined. In my case, if either one of the bindings has a negative diff (meaning it has been deleted) the resulting merge binding has to have a negative diff too (logical and). (At this moment, this has not been correctly implemented in the master branch of incremunica)

rubensworks · 2023-04-22T12:58:47Z

It sounds like we may want to introduce some kind of merge-handlers for bindings context keys, which define what to do when specific context keys are merged. Perhaps we could even introduce a new bus for this.

maartyman · 2023-05-04T10:00:35Z

An important consideration is to also alter the @comunica/jest to take into account the context when comparing two bindings.

This allows annotating Bindings objects with custom metadata. This also adds a new bus (`bus-merge-bindings-context`) to determine how context entries in other Bindings contexts should be merged with each other during operations such as joins. Closes #1203

* Support relative and absolute paths for COMUNICA_CONFIG Closes #1276 * Support building with esModuleInterop TS flag This makes sure that Comunica can be used in projects with esModuleInterop either be set to true or false. * Convert raw source contexts to ActionContexts Closes #1003 * Log expression error messages As discussed in #1279 * Enforce `bindingsToString` key ordering with a unit test * Update dependency @types/node to v20 * Fix wrong cardinality for zero-or-one paths with one variable This could cause issues where our optimizer would think that a subquery has no results, while it has one result, and thereby incorrectly producing no results. Closes #1285 * Bump to release version v2.10.0 * Bump Github-Release-Action version * Generalize fromBindings in BindingsFactory To implement `RDF.BindingsFactory`, `fromBindings` must accept any `RDF.Bindings`. The implementation already does, but the type doesn't. * Fix updates with body streams failing in Node >= 18 * Bump to release version v2.10.1 * Fix broken links in READMEs Closes #903 * Migrate to different release action * Update dependency node-polyfill-webpack-plugin to v3 * Migrate to @smessie/readable-web-to-node-stream Closes comunica/comunica-feature-link-traversal#121 * Bump to release version v2.10.2 * Allow Bindings to have contexts This allows annotating Bindings objects with custom metadata. This also adds a new bus (`bus-merge-bindings-context`) to determine how context entries in other Bindings contexts should be merged with each other during operations such as joins. Closes #1203 * Generalize sources to accept all query operations Sources used to be hard-coupled to quad patterns, which severely limited the optimization possibilities. This commit undoes this hard-coupling by removing the bus-rdf-resolve-quad-pattern in favor of bus-query-source-identify. Instead of determining source types lazily during query execution, they will now be identified early as a context-preprocess step, which allows source properties to be used during query optimization. For example, query sources now expose "selector shapes", which describe the shape of query operations that are supported by sources. This information is used now during query planning to assign query operations to sources that are "as large as possible". Concretely, this commit allows query operations to be annotated with query sources. Optimize actors will modify source assignment to operations. Thanks to these modifications, this commit adds support for the following: - brTPF sources - Grouping operations into sources during federated querying (FedX) - Bound joins: pushing bindings into sources (FedX) - Pushing down filters into sources Closes #609 Closes #1218 Closes #1250 Closes #1251 Closes #1274 Closes #881 Related to #1245, #1292 * Add cache to query source identify actor * Add timeout for sparql endpoint count queries * Restore sending original queries to SPARQL endpoints Related to #844 * Fix CONSTRUCT queries over SPARQL endpoints not emitting metadata * Move sequential query process steps to separate bus and actor Concretely, the sequential parsing, optimizing, and evaluation steps have been moved to a separate actor so that we can more easily create alternative processing approaches in the future. * Restore handling of raw source contexts See 8f685d8 * Move context shortcut handling to preprocess actor * Process source contexts to support shortcuts * Restore aggregateStore being scoped per query execution * Remove unused context import in configs * Add option to disable caching (#1297) This can be done using the `noCache` context key or `--noCache` CLI flag. Closes #618 * Rename sparqlee-specific package names * Mark packages as version 3.0.0 * Ignore erroring actor tests in rdf-metadata-extract bus * Enforce order in optimize actors * Add initial query to context before optimization * Fix hypermedia iterators could stop before all links are processed This could occur rarely during link traversal. * Fix hypermedia iterator not starting if derived iterators start first This could occur during link traversal with aggregatedStore enabled where derived iterators from the aggregatedStore would be consumed first in full during joining, but would never start due to the main hypermedia iterator not having been started yet. * Fix hypermedia iterator incorrectly ignoring empty queue check During link traversal, it could occur that an hypermedia iterator would be forcefully closed (not destroyed), and that the iterator would effectively close (incorrectly) while the link queue was not empty yet, but there was a temporary period where no sub-iterators were running. * Allow sources to be annotated with traverse flag * Fix aggregated iterators sometimes closing before they started During link traversal, it could occur that for a single link following, the hypermedia iterator and derived iterators are closed, but the match call to the derived iterators has not started putting quads in the stream before it is closed. This could result in queries not producing all results. * Let hypermedia iterators with aggStore run in flow after start This fixes problems in traversal where a closed and unused hypermedia iterator will not fill up anymore, which stalls derived iterators, and makes the results iterator not continue anymore without end event. * Make lenient errors log to warn instead of error log level * Bump to typescript 5.3.3 * Migrate to @rubensworks/eslint-config v3 * Don't run CI on Node 16 anymore * Fix GROUP_CONCAT losing languages if equal * Enable noImplicitOverride in tsconfig * Enable strictFunctionTypes in tsc Closes #1289 * Remove node-polyfill-webpack-plugin from default webpack config * Ensure QueryEngine instances are fully independent This ensures that actor-specific caches will never be reused across different query engine instances. * Update to Components.js v6 * Bump to lerna v8 * Fix missing variables in projects after pruning * Update to asynciterator 3.9.0 * Add async iterable examples in README * Fix typo in filter-pushdown README * Fix broken CLI tools * Fix typo in query-sparql README * Bump to release version v3.0.1 * Reference non-next integration tests manifest * Lower q-value of application/json in fetch requests This fixes problems where raw JSON was being preferred over HTML. * Bump to release version v3.0.2 * Add generic type to LinkQueueWrapper (#1322) * Fix broken LIMITs on CONSTRUCT queries on SPARQL endpoints, Closes #1319 * Bump to release version v3.0.3 * Update link to HDT repo * Catch 404 of non-existing resource when fetching the selector shape (#1324) This fixes the problem where no data can be inserted into non-existent destinations. Closes #1317 * Fix linter error in ActorOptimizeQueryOperationAssignSourcesExhaustive * Avoid race conditions during DELETE/INSERT operations Closes #1301 * Migrate from readable-web-to-node-stream to readable-from-web * Use yarn over npm CLI where possible * Move validateHttpResponse helper to avoid interdependencies * Reset lockfile to reduce duplicate dependencies * Add @comunica/query-sparql-rdfjs-lite that is optimized for bundle size Closes #722 Related to #1323 * Throw error if bundle size becomes too large, Closes #722 * Fix package name in query-sparql-rdfjs-lite README * Always add preprocessed context to unidentified query sources (#1337) Closes #1336 --------- Co-authored-by: surilindur <16085353+surilindur@users.noreply.github.com> Co-authored-by: RubenEschauzier <61841193+RubenEschauzier@users.noreply.github.com> Co-authored-by: Aron Buzogany <108480125+AronBuzogany@users.noreply.github.com> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: Ruben Taelman <ruben.taelman@ugent.be> Co-authored-by: Ruben Taelman <rubensworks@users.noreply.github.com> Co-authored-by: Petra Jaros <peeja@peeja.com> Co-authored-by: Simon Van Braeckel <33571080+simonvbrae@users.noreply.github.com> Co-authored-by: smessie <smessie@users.noreply.github.com> Co-authored-by: Karel Klíma <karelklima@gmail.com> Co-authored-by: surilindur <surilindur@users.noreply.github.com>

rubensworks added the feature ➕ label Apr 17, 2023

rubensworks added this to Triage in Development Apr 17, 2023

RubenEschauzier mentioned this issue Aug 14, 2023

Add context to bindings and allow specification of functions that merge specific entries of context during joins #1240

Closed

RubenEschauzier mentioned this issue Oct 24, 2023

Add context to bindings and allow specification of functions that merge specific entries of context during joins #1284

Merged

rubensworks closed this as completed in aeaa306 Mar 19, 2024

Development automation moved this from Triage to Done Mar 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider adding a context to bindings? #1203

Consider adding a context to bindings? #1203

surilindur commented Apr 17, 2023 •

edited

Loading

github-actions bot commented Apr 17, 2023

rubensworks commented Apr 18, 2023

maartyman commented Apr 22, 2023

rubensworks commented Apr 22, 2023

maartyman commented May 4, 2023

Consider adding a context to bindings? #1203

Consider adding a context to bindings? #1203

Comments

surilindur commented Apr 17, 2023 • edited Loading

Issue type:

Description:

github-actions bot commented Apr 17, 2023

rubensworks commented Apr 18, 2023

maartyman commented Apr 22, 2023

rubensworks commented Apr 22, 2023

maartyman commented May 4, 2023

surilindur commented Apr 17, 2023 •

edited

Loading