GH-1387 Improved custom service executor extension system #1388

Aklakan · 2022-06-15T12:21:27Z

GitHub issue resolved #1387.
This fixes #1399.

Pull request Description: This PR adds the following changes to the custom service executor plugin system:

Improvements:

chaining: There is now a ChainingServiceExecutor, which allows transparently forwarding a request to the next ChainingServiceExecutor instance in the registry. This way a custom service executor can modify the request and have it processed by the remainder of the chain.
bulk processing: The plugin system introduced with Jena 4.5.0 only supports lookups with individual bindings. This PR adds support for plugins that want to process bindings in bulk. The main difference between the APIs is whether the createExecution method accepts a single Binding or a QueryIterator. There is now a ServiceExecutorRegistryBulk, which by default has a registration that delegates to the non-bulk one.
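
The chaining idea can be illustrated with a minimal, self-contained sketch. It does not use the Jena API: SimpleExecutor, ChainingExecutor and ChainRegistry are hypothetical stand-ins, and requests and results are plain strings instead of OpService and QueryIterator. It only shows how one link can rewrite a request and hand it to the remainder of the chain.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a non-chaining executor: request in, result out. */
interface SimpleExecutor {
    String execute(String request);
}

/** Hypothetical stand-in for a chaining executor: may rewrite the request and pass it on. */
interface ChainingExecutor {
    String execute(String request, SimpleExecutor chain);
}

/** Hypothetical registry; links added later are consulted first. */
class ChainRegistry {
    private final List<ChainingExecutor> links = new ArrayList<>();

    void add(ChainingExecutor link) {
        links.add(0, link); // prepend, so new links run before existing ones
    }

    /** Builds an executor that starts at position idx and forwards to idx + 1. */
    private SimpleExecutor from(int idx) {
        if (idx >= links.size()) {
            return request -> {
                throw new IllegalStateException("No executor handled: " + request);
            };
        }
        return request -> links.get(idx).execute(request, from(idx + 1));
    }

    String execute(String request) {
        return from(0).execute(request);
    }
}

class ChainingSketch {
    static String demo() {
        ChainRegistry registry = new ChainRegistry();
        // Default behaviour at the end of the chain: "execute" the request.
        registry.add((request, chain) -> "result(" + request + ")");
        // Custom link: rewrite the request, then let the rest of the chain handle it.
        registry.add((request, chain) ->
                chain.execute(request.replace("alias:", "http://example.org/")));
        return registry.execute("alias:sparql");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "result(http://example.org/sparql)"
    }
}
```

The custom link never needs to know what comes after it; it only sees a handle for "the rest of the chain".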

Breaking changes:

ServiceExecutorRegistry.getFactories() now returns a List<ChainingServiceExecutor> because this is its internal storage; previously it was List<ServiceExecutorFactory>. I am afraid re-establishing compatibility in this regard would require a completely new registry with a new context attribute.

Deprecations:

Deprecated ServiceExecutorFactory in favor of ServiceExecutor (ServiceExecutorFactory extends from the latter). Use of the old code delegates to the new one.
- In the new code the method that creates a ServiceExecution is now called createExecution (rather than createExecutor)
~~Deprecated ServiceExecutorRegistry.add in favor of prepend. Both methods add executors to the beginning of the registry's list so they are considered before the default behavior.~~ QueryEngineFactory also uses add to prepend; so add is consistent with existing naming.
Legacy add method for ServiceExecutorFactory exists and wraps it as a ChainingServiceExecutor.
Legacy remove method for ServiceExecutorFactory finds previously wrapped elements.
Deprecated ServiceExecution because it doesn't add anything over QueryIterator.
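
One way the legacy add and remove methods could work is sketched below. This is not the code from the PR: LegacyFactory, ChainLink, LegacyAdapter and LegacyRegistry are hypothetical simplifications, and the convention that a legacy factory returns null when it does not handle a request is an assumption of the sketch. The adapter defines equality by the wrapped factory so that remove can find a previously wrapped element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical legacy-style factory: returns null when it does not handle the request. */
interface LegacyFactory {
    String createExecutor(String request);
}

/** Hypothetical chain link: may answer itself or forward to the rest of the chain. */
interface ChainLink {
    String createExecution(String request, UnaryOperator<String> chain);
}

/** Adapter that lets a legacy factory take part in the chain. */
final class LegacyAdapter implements ChainLink {
    private final LegacyFactory delegate;

    LegacyAdapter(LegacyFactory delegate) {
        this.delegate = delegate;
    }

    @Override
    public String createExecution(String request, UnaryOperator<String> chain) {
        String result = delegate.createExecutor(request);
        return result != null ? result : chain.apply(request);
    }

    /** Two adapters are equal when they wrap the same factory instance. */
    @Override
    public boolean equals(Object other) {
        return other instanceof LegacyAdapter && ((LegacyAdapter) other).delegate == delegate;
    }

    @Override
    public int hashCode() {
        return System.identityHashCode(delegate);
    }
}

class LegacyRegistry {
    private final List<ChainLink> links = new ArrayList<>();

    void add(LegacyFactory factory) {
        links.add(0, new LegacyAdapter(factory)); // wrap and prepend
    }

    boolean remove(LegacyFactory factory) {
        return links.remove(new LegacyAdapter(factory)); // matched via equals()
    }

    int size() {
        return links.size();
    }

    String execute(String request) {
        return from(0).apply(request);
    }

    private UnaryOperator<String> from(int idx) {
        if (idx >= links.size()) {
            return request -> "default(" + request + ")";
        }
        return request -> links.get(idx).createExecution(request, from(idx + 1));
    }
}

class LegacySketch {
    static String demo() {
        LegacyRegistry registry = new LegacyRegistry();
        LegacyFactory factory =
                request -> request.startsWith("custom:") ? "handled(" + request + ")" : null;
        registry.add(factory);
        String a = registry.execute("custom:x");
        String b = registry.execute("http://example.org/");
        boolean removed = registry.remove(factory);
        return a + " " + b + " " + removed + " " + registry.size();
    }
}
```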

Javadoc exists and is up-to-date
Tests are included.
Documentation change and updates are provided for the Apache Jena website
Commits have been squashed to remove intermediate development commit messages.
Key commit messages start with the issue number (GH-xxxx or JENA-xxxx)

By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.

See the Apache Jena "Contributing" guide.

afs · 2022-06-17T09:15:32Z

jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterRepeatApply.java

-            return null;
-        }
-
+    protected QueryIterator nextStage(QueryIterator input) {


Please explain what is going on here.

If a change is needed, then having a superclass is not necessary. Just two nextStage methods with different signatures.

QueryIterApplyBulk's nextStage(QueryIterator) method flat-maps the input QueryIterator to an output QueryIterator until the input iterator is consumed - i.e. implementations of nextStage can "per stage" consume an arbitrary amount > 0 from the input iterator.

QueryIterApply implements QueryIterApplyBulk.nextStage(QueryIterator) such that it delegates to its own usual QueryIterApply.nextStage(Binding) method.
In other words, QueryIterApplyBulk is all methods of QueryIterApply minus this one delegate.
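
The relationship between the two classes can be sketched as follows. This is an illustrative simplification, not the PR's code: ApplyBulk and Apply are hypothetical names, strings stand in for Binding, and plain java.util.Iterator stands in for QueryIterator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Flat-maps the input: each stage may consume one or more items from it. */
abstract class ApplyBulk implements Iterator<String> {
    private final Iterator<String> input;
    private Iterator<String> current = Collections.emptyIterator();

    ApplyBulk(Iterator<String> input) {
        this.input = input;
    }

    /** Must consume at least one item from input and return that stage's results. */
    protected abstract Iterator<String> nextStage(Iterator<String> input);

    @Override
    public boolean hasNext() {
        while (!current.hasNext()) {
            if (!input.hasNext()) {
                return false;
            }
            current = nextStage(input);
        }
        return true;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}

/** The per-item variant is the bulk variant plus a single delegating method. */
abstract class Apply extends ApplyBulk {
    Apply(Iterator<String> input) {
        super(input);
    }

    protected abstract Iterator<String> nextStage(String item);

    @Override
    protected Iterator<String> nextStage(Iterator<String> input) {
        return nextStage(input.next()); // consume exactly one item per stage
    }
}

class StageDemo {
    static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        it.forEachRemaining(out::add);
        return out;
    }

    static List<String> perItem() {
        return drain(new Apply(List.of("a", "b").iterator()) {
            @Override
            protected Iterator<String> nextStage(String item) {
                return List.of(item + "1", item + "2").iterator();
            }
        });
    }

    static List<String> batched() {
        return drain(new ApplyBulk(List.of("a", "b", "c").iterator()) {
            @Override
            protected Iterator<String> nextStage(Iterator<String> input) {
                // Consume up to two items per stage and answer them together.
                StringBuilder sb = new StringBuilder(input.next());
                if (input.hasNext()) {
                    sb.append('+').append(input.next());
                }
                return List.of(sb.toString()).iterator();
            }
        });
    }
}
```

Here perItem() yields [a1, a2, b1, b2] and batched() yields [a+b, c].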

Note that QueryIterApplyBulk is also the base class for QueryIterServiceBulk. The latter only operates on the QueryIterator and therefore does not need a nextStage(Binding) method. I find this separation cleaner than having both nextStage methods on the same class together with an override that causes the binding-level one to no longer be called. Other than that, having both methods on the same class would work too.

The only drawback I see with this separation is that reflection code that relied on QueryIterApply.class.getDeclaredMethods() would break.

I find this separation cleaner

My concern is that this is making the bulk case "normal" when in fact it is special to your service work.

jena-arq/src/main/java/org/apache/jena/sparql/ARQConstants.java

Aklakan · 2022-06-17T11:45:58Z

Just to clarify, my next step on this PR is to update and write javadoc, especially to explain the purpose of some of the classes. This should make reviewing easier.

jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterRepeatApply.java

Aklakan · 2022-06-18T16:22:57Z

My latest attempt at cleaning up and simplifying things:

Reverted QueryIterRepeatApply (it is used inside of ServiceExecutorBulkToSingle, which acts as the bridge between bulk and non-bulk)
Combined bulk and non-bulk registries
Deprecated ServiceExecution in favor of using QueryIterator directly
OpExecutor now directly calls the plugin system
This way QueryIterService is actually not used anymore - service execution via the registry eventually by default ends up with a QueryIterRepeatApply. The still existing QueryIterService only uses the non-bulk part of the registry.

If there should be a dedicated QueryIterService which calls the registry rather than OpExecutor doing that, then actually the proper base class would be something like a QueryIterDeferred / LazyInitialization: That iterator just takes the iterator returned by the service handler and serves from it. No repeat apply / batching / stages needed at that time.
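
A deferred iterator of that kind might look roughly like the sketch below. DeferredIterator is a hypothetical name, not a class from Jena or this PR, and a java.util.function.Supplier stands in for the call to the service handler.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Obtains the underlying iterator on first use and then simply serves from it. */
final class DeferredIterator<T> implements Iterator<T>, AutoCloseable {
    private final Supplier<? extends Iterator<T>> supplier;
    private Iterator<T> delegate; // null until the first hasNext()/next() call

    DeferredIterator(Supplier<? extends Iterator<T>> supplier) {
        this.supplier = supplier;
    }

    private Iterator<T> delegate() {
        if (delegate == null) {
            delegate = supplier.get();
        }
        return delegate;
    }

    @Override
    public boolean hasNext() {
        return delegate().hasNext();
    }

    @Override
    public T next() {
        return delegate().next();
    }

    /** Closes the underlying iterator only if it was created and is closeable. */
    @Override
    public void close() {
        if (delegate instanceof AutoCloseable) {
            try {
                ((AutoCloseable) delegate).close();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }
}

class DeferredDemo {
    /** Returns the number of supplier calls before and after iterating. */
    static int[] demo() {
        AtomicInteger calls = new AtomicInteger();
        DeferredIterator<String> it = new DeferredIterator<>(() -> {
            calls.incrementAndGet();
            return List.of("x", "y").iterator();
        });
        int before = calls.get(); // nothing has been requested yet
        it.hasNext();
        it.next();
        it.hasNext();
        it.close();
        return new int[] {before, calls.get()};
    }
}
```

The supplier runs once, on the first call that needs data, so no repeat-apply or batching logic is involved.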

afs · 2022-06-27T16:39:43Z

jena-arq/src/main/java/org/apache/jena/sparql/engine/QueryIterator.java


 /** Root of query iterators in ARQ. */

-public interface QueryIterator extends Closeable, Iterator<Binding>, PrintSerializable
+public interface QueryIterator extends ClosableIterator<Binding>, PrintSerializable


This change isn't necessary but if it is made, please use IteratorCloseable. See the comments in ClosableIterator which refer to the Model API and ExtendedIterator - let's keep ClosableIterator for jena-core.

Updated. It seemed reasonable to add one of the interfaces that already combined iterator with close - it was a 50/50 chance 😄

afs · 2022-06-27T16:40:33Z

jena-arq/src/main/java/org/apache/jena/sparql/engine/main/iterator/QueryIterService.java

@@ -32,8 +32,8 @@
 import org.apache.jena.sparql.engine.main.QC ;
 import org.apache.jena.sparql.exec.http.Service;
 import org.apache.jena.sparql.service.ServiceExecution;


This is now an unused import and can be removed.

I removed the import and added a deprecation annotation because QueryIterService is now no longer used but existing code may have sub-classed from it.

afs · 2022-06-27T16:52:23Z

Looks good - 2 small comments.

Then it's the checklist items: the javadoc item and how you want the commits to look when merged to the codebase, e.g. any squashing and any use of "GH-1387: " to pick out key commits.

Aklakan · 2022-06-28T19:48:02Z

I noticed that the overloaded add methods in ServiceExecutorRegistry would result in ambiguity with lambdas, which broke some of my code based on the 4.5.0 API.
Before the naming gets frozen, I have now renamed the methods to:

add(ServiceExecutor) // Compatible with 4.5.0
addSingleLink(ChainingServiceExecutor) // New method; adds a link to the chain
addBulkLink(ChainingServiceExecutorBulk) // Also new

I updated javadoc and added a couple of @Deprecated(since = "4.6.0") annotations.
I hope it's final now.
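
The lambda problem can be shown in isolation. The sketch below uses hypothetical interfaces (SingleLink, BulkLink) and a hypothetical NamedRegistry rather than the Jena types. Two functional interfaces with the same number of parameters cannot be told apart by an implicitly typed lambda, so overloading one method name for both makes such calls ambiguous, while distinct method names keep them compiling.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-binding link: two parameters. */
interface SingleLink {
    String exec(String request, String binding);
}

/** Hypothetical bulk link: also two parameters, but a list of bindings. */
interface BulkLink {
    String exec(String request, List<String> bindings);
}

class NamedRegistry {
    final List<SingleLink> singles = new ArrayList<>();
    final List<BulkLink> bulks = new ArrayList<>();

    // If both methods were overloads named add(...), a call such as
    //     registry.add((request, b) -> request + "|" + b);
    // would be rejected by javac as ambiguous: the lambda has two untyped
    // parameters and fits either functional interface.

    void addSingleLink(SingleLink link) {
        singles.add(0, link);
    }

    void addBulkLink(BulkLink link) {
        bulks.add(0, link);
    }
}

class NamingDemo {
    static String demo() {
        NamedRegistry registry = new NamedRegistry();
        // The method name selects the target type, so plain lambdas work.
        registry.addSingleLink((request, binding) -> request + "|" + binding);
        registry.addBulkLink((request, bindings) -> request + "|" + bindings.size());
        return registry.singles.get(0).exec("q", "b1")
                + " " + registry.bulks.get(0).exec("q", List.of("b1", "b2"));
    }
}
```

With overloads, callers would have to add a cast or explicit parameter types to every lambda; distinct names avoid that.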

Aklakan · 2022-06-30T11:05:09Z

Squashed

afs · 2022-07-03T15:26:11Z

Are there any documentation changes for jena-site?

Also - 2 or 3 sentences for the 4.6.0 release announcement would be good.

Aklakan marked this pull request as draft June 15, 2022 12:21

This was referenced Jun 15, 2022

Improved custom service executor extension system #1387

Closed

Bulk retrieval and caching with SERVICE clauses #1314

Closed

afs reviewed Jun 17, 2022

jena-arq/src/main/java/org/apache/jena/sparql/engine/iterator/QueryIterRepeatApply.java

Aklakan marked this pull request as ready for review June 20, 2022 17:52

Aklakan mentioned this pull request Jun 23, 2022

SERVICE fails in empty context. #1399

Closed

afs reviewed Jun 27, 2022

apachegh-1387: Bulk extension for service executor plugin system

b2ed4de

Aklakan force-pushed the gh-1387 branch from 03a7f98 to b2ed4de Compare June 30, 2022 11:01

afs approved these changes Jul 3, 2022

afs merged commit db4a750 into apache:main Jul 3, 2022
