Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding a context to bindings? #1203

Closed
surilindur opened this issue Apr 17, 2023 · 5 comments
Closed

Consider adding a context to bindings? #1203

surilindur opened this issue Apr 17, 2023 · 5 comments

Comments

@surilindur
Copy link
Contributor

surilindur commented Apr 17, 2023

Issue type:

  • ➕ Feature request

Description:

There may be use cases where metadata should be maintained per bindings instance, and carried with the bindings all the way to the query engine output (I do not know how to express this properly, sorry). One such example use case could be when multiple sources may contribute to a single result binding, and information about the exact sources contributing to a specific result from the query engine are needed, to help trace back any potentially weird results or filter them as needed.

To help address this use case and potential other ones, perhaps a context could be added to bindings, with functionality similar to the context being passed to the query engine itself and then passed around internally to actors. For example, a Bindings instance could have a context property where members could be accessed using a IBindingsContextKey:

function exampleSourceAddition(bindings: Bindings): void {
  const sourcesKey: IBindingsContextKey = new BindingsContextKey('urn:comunica:something:sources')
  const sourcesValue: Set<string> = bindings.context.getSafe<Set<string>>(sourcesKey)
  sourcesValue.add('https://example.com/sparql')
  bindings.context = bindings.context.set(sourcesKey, sourcesValue)
}

Maybe the existing context-related work could be generalised, to have an IContextKey, IContext, etc.? That way, any future aspects of the engine also needing context could be taken into consideration, and it would help reduce the amount of types because there are already a lot of them.

I am just creating this issue to open the discussion around the topic. 🙂

@github-actions
Copy link

Thanks for the suggestion!

@rubensworks rubensworks added this to Triage in Development Apr 17, 2023
@rubensworks
Copy link
Member

We could probably indeed directly use the IContext interface, and add it as a context field to bindings.

I guess we'll want the context to be only assignable during creation of the bindings? This will require us to go beyond the bindings factory interface of RDF/JS.

Another question is how to handle joins, where 2 bindings are merged (see the merge functions). We may want to require the caller of the merge function to explicitly say how the contexts of 2 bindings should be merged, as there may not be a generic way that is good for all cases.

@maartyman
Copy link
Contributor

In the current version of incremunica (an incremental version of comunica that can process both additions and deletions) I added a diff boolean to the Bindings class to define if a Binding has been deleted or added. A context field for the bindings as described here could perfectly allow for this functionality without the need to change this core comunica package.

Indeed, when merging two bindings, these contexts need to be combined. In my case, if either one of the bindings has a negative diff (meaning it has been deleted) the resulting merge binding has to have a negative diff too (logical and). (At this moment, this has not been correctly implemented in the master branch of incremunica)

@rubensworks
Copy link
Member

It sounds like we may want to introduce some kind of merge-handlers for bindings context keys, which define what to do when specific context keys are merged. Perhaps we could even introduce a new bus for this.

@maartyman
Copy link
Contributor

An important consideration is to also alter the @comunica/jest to take into account the context when comparing two bindings.

rubensworks pushed a commit that referenced this issue Feb 2, 2024
This allows annotating Bindings objects with custom metadata.

This also adds a new bus (`bus-merge-bindings-context`) to determine
how context entries in other Bindings contexts should be merged with
each other during operations such as joins.

Closes #1203
Development automation moved this from Triage to Done Mar 19, 2024
rubensworks added a commit that referenced this issue May 23, 2024
* Support relative and absolute paths for COMUNICA_CONFIG

Closes #1276

* Support building with esModuleInterop TS flag

This makes sure that Comunica can be used in projects with
esModuleInterop either be set to true or false.

* Convert raw source contexts to ActionContexts

Closes #1003

* Log expression error messages

As discussed in #1279

* Enforce `bindingsToString` key ordering with a unit test

* Update dependency @types/node to v20

* Fix wrong cardinality for zero-or-one paths with one variable

This could cause issues where our optimizer would think that a subquery
has no results, while it has one result, and thereby incorrectly
producing no results.

Closes #1285

* Bump to release version v2.10.0

* Bump Github-Release-Action version

* Generalize fromBindings in BindingsFactory

To implement `RDF.BindingsFactory`, `fromBindings` must accept any
`RDF.Bindings`. The implementation already does, but the type doesn't.

* Fix updates with body streams failing in Node >= 18

* Bump to release version v2.10.1

* Fix broken links in READMEs

Closes #903

* Migrate to different release action

* Update dependency node-polyfill-webpack-plugin to v3

* Migrate to @smessie/readable-web-to-node-stream

Closes comunica/comunica-feature-link-traversal#121

* Bump to release version v2.10.2

* Allow Bindings to have contexts

This allows annotating Bindings objects with custom metadata.

This also adds a new bus (`bus-merge-bindings-context`) to determine
how context entries in other Bindings contexts should be merged with
each other during operations such as joins.

Closes #1203

* Generalize sources to accept all query operations

Sources used to be hard-coupled to quad patterns, which severely limited
the optimization possibilities. This commit undoes this hard-coupling
by removing the bus-rdf-resolve-quad-pattern in favor of
bus-query-source-identify.

Instead of determining source types lazily during query execution, they
will now be identified early as a context-preprocess step, which allows
source properties to be used during query optimization.
For example, query sources now expose "selector shapes", which describe
the shape of query operations that are supported by sources.
This information is used now during query planning to assign query
operations to sources that are "as large as possible".

Concretely, this commit allows query operations to be annotated
with query sources. Optimize actors will modify source assignment to
operations.

Thanks to these modifications, this commit adds support for the following:
- brTPF sources
- Grouping operations into sources during federated querying (FedX)
- Bound joins: pushing bindings into sources (FedX)
- Pushing down filters into sources

Closes #609
Closes #1218
Closes #1250
Closes #1251
Closes #1274
Closes #881
Related to #1245, #1292

* Add cache to query source identify actor

* Add timeout for sparql endpoint count queries

* Restore sending original queries to SPARQL endpoints

Related to #844

* Fix CONSTRUCT queries over SPARQL endpoints not emitting metadata

* Move sequential query process steps to separate bus and actor

Concretely, the sequential parsing, optimizing, and evaluation steps
have been moved to a separate actor so that we can more easily create
alternative processing approaches in the future.

* Restore handling of raw source contexts

See 8f685d8

* Move context shortcut handling to preprocess actor

* Process source contexts to support shortcuts

* Restore aggregateStore being scoped per query execution

* Remove unused context import in configs

* Add option to disable caching (#1297)

This can be done using the `noCache` context key or `--noCache` CLI flag.

Closes #618

* Rename sparqlee-specific package names

* Mark packages as version 3.0.0

* Ignore erroring actor tests in rdf-metadata-extract bus

* Enforce order in optimize actors

* Add initial query to context before optimization

* Fix hypermedia iterators could stop before all links are processed

This could occur rarely during link traversal.

* Fix hypermedia iterator not starting if derived iterators start first

This could occur during link traversal with aggregatedStore enabled
where derived iterators from the aggregatedStore would be consumed first
in full during joining, but would never start due to the main hypermedia
iterator not having been started yet.

* Fix hypermedia iterator incorrectly ignoring empty queue check

During link traversal, it could occur that an hypermedia iterator would
be forcefully closed (not destroyed), and that the iterator would
effectively close (incorrectly) while the link queue was not empty yet,
but there was a temporary period where no sub-iterators were running.

* Allow sources to be annotated with traverse flag

* Fix aggregated iterators sometimes closing before they started

During link traversal, it could occur that for a single link following,
the hypermedia iterator and derived iterators are closed, but the match
call to the derived iterators has not started putting quads in the
stream before it is closed. This could result in queries not producing
all results.

* Let hypermedia iterators with aggStore run in flow after start

This fixes problems in traversal where a closed and unused hypermedia
iterator will not fill up anymore, which stalls derived iterators,
and makes the results iterator not continue anymore without end event.

* Make lenient errors log to warn instead of error log level

* Bump to typescript 5.3.3

* Migrate to @rubensworks/eslint-config v3

* Don't run CI on Node 16 anymore

* Fix GROUP_CONCAT losing languages if equal

* Enable noImplicitOverride in tsconfig

* Enable strictFunctionTypes in tsc

Closes #1289

* Remove node-polyfill-webpack-plugin from default webpack config

* Ensure QueryEngine instances are fully independent

This ensures that actor-specific caches will never be reused across
different query engine instances.

* Update to Components.js v6

* Bump to lerna v8

* Fix missing variables in projects after pruning

* Update to asynciterator 3.9.0

* Add async iterable examples in README

* Fix typo in filter-pushdown README

* Fix broken CLI tools

* Fix typo in query-sparql README

* Bump to release version v3.0.1

* Reference non-next integration tests manifest

* Lower q-value of application/json in fetch requests

This fixes problems where raw JSON was being preferred over HTML.

* Bump to release version v3.0.2

* Add generic type to LinkQueueWrapper (#1322)

* Fix broken LIMITs on CONSTRUCT queries on SPARQL endpoints, Closes #1319

* Bump to release version v3.0.3

* Update link to HDT repo

* Catch 404 of non-existing resource when fetching the selector shape (#1324)

This fixes the problem where no data can be inserted into non-existent destinations.

Closes #1317

* Fix linter error in ActorOptimizeQueryOperationAssignSourcesExhaustive

* Avoid race conditions during DELETE/INSERT operations

Closes #1301

* Migrate from readable-web-to-node-stream to readable-from-web

* Use yarn over npm CLI where possible

* Move validateHttpResponse helper to avoid interdependencies

* Reset lockfile to reduce duplicate dependencies

* Add @comunica/query-sparql-rdfjs-lite that is optimized for bundle size

Closes #722
Related to #1323

* Throw error if bundle size becomes too large, Closes #722

* Fix package name in query-sparql-rdfjs-lite README

* Always add preprocessed context to unidentified query sources (#1337)

Closes #1336

---------

Co-authored-by: surilindur <16085353+surilindur@users.noreply.github.com>
Co-authored-by: RubenEschauzier <61841193+RubenEschauzier@users.noreply.github.com>
Co-authored-by: Aron Buzogany <108480125+AronBuzogany@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Ruben Taelman <ruben.taelman@ugent.be>
Co-authored-by: Ruben Taelman <rubensworks@users.noreply.github.com>
Co-authored-by: Petra Jaros <peeja@peeja.com>
Co-authored-by: Simon Van Braeckel <33571080+simonvbrae@users.noreply.github.com>
Co-authored-by: smessie <smessie@users.noreply.github.com>
Co-authored-by: Karel Klíma <karelklima@gmail.com>
Co-authored-by: surilindur <surilindur@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development
  
Done
Development

No branches or pull requests

3 participants