Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-1387 Improved custom service executor extension system #1388

Merged
merged 1 commit into from Jul 3, 2022

Conversation

Aklakan
Copy link
Contributor

@Aklakan Aklakan commented Jun 15, 2022

GitHub issue resolved #1387.
This fixes #1399.

Pull request Description: This PR adds the following changes to the custom service executor plugin system:

Improvements:

  • chaining There is now ChainingServiceExecutor which allows for transparently forwarding a request to the next ChainingServiceExecutor instance in the registry. This way a custom service executor can modify the request and have it processed by the remainder of the chain.
  • bulk processing The plugin system introduced with jena 4.5.0 only supports lookups with individual bindings. This PR adds support for plugins that want to process bindings in bulk. The main difference in the APIs is whether the createExecution method accepts a single Binding or a QueryIterator. There is now a ServiceExecutorRegistryBulk which by default has an registration that delegates to the non-bulk one.

Breaking changes:

  • ServiceExecutorRegistry.getFactories() now returns a List<ChainingServiceExecutor> because this is its internal storage; previously it was List<ServiceExecutorFactory>. I am afraid re-establishing compatibility in this regard would require a completely new registry with a new context attribute.

Deprecations:

  • Deprecated ServiceExecutorFactory in favor of ServiceExecutor (ServiceExecutorFactory extends from the latter). Use of the old code delegates to the new one.
    • In the new code the method that creates a ServiceExecution is now called createExecution (rather than createExecutor)
  • Deprecated ServiceExecutorRegistry.add in favor of prepend. Both methods add executors to the beginning of the registry's list so they are considered before the default behavior. QueryEngineFactory also uses add to prepend; so add is consistent with existing naming.
  • Legacy add method for ServiceExecutorFactory exists and wraps it as a ChainingServiceExecutor.
  • Legacy remove method for ServiceExecutorFactory finds previously wrapped elements..
  • Deprecated ServiceExecution because it doesn't add anything over QueryIterator.

  • Javadoc exists and is up-to-data
  • Tests are included.
  • Documentation change and updates are provided for the Apache Jena website
  • Commits have been squashed to remove intermediate development commit messages.
  • Key commit messages start with the issue number (GH-xxxx or JENA-xxxx)

By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.


See the Apache Jena "Contributing" guide.

return null;
}

protected QueryIterator nextStage(QueryIterator input) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain what is going on here.

If a change is needed, then having a superclass is not necessary. Just two nextStage methods with different signatures.

Copy link
Contributor Author

@Aklakan Aklakan Jun 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QueryIterApplyBulk's nextStage(QueryIterator) method flat-maps the input QueryIterator to an output QueryIterator until the input iterator is consumed - i.e. implementations of nextStage can "per stage" consume an arbitrary amount > 0 from the input iterator.

QueryIterApply implements QueryIterApplyBulk.nextStage(QueryIterator) such that it delegates to its own usual QueryIterApply.nextStage(Binding) method.
In other words, QueryIterApplyBulk is all methods of QueryIterApply minus this one delegate.

Note that QueryIterApplyBulk is also the base class for QueryIterServiceBulk. The latter only operates on the QueryIterator and therefore does not need a nextStage(Binding) method. I find this separation cleaner than having both nextStage methods on the same class together with an override that causes the binding-level one to no longer be called. Other than that, having both methods on the same class would work too.

The only drawback I see with this separation is that reflection code that relied on QueryIterApply.class.getDeclaredMethods() would break.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this separation cleaner

My concern is that this is making the bulk case "normal" when in fact it is special to your service work.

@Aklakan
Copy link
Contributor Author

Aklakan commented Jun 17, 2022

Just to clarify, my next step on this PR is to update and write javadoc; especially for what purpose some of the classes are for. This should make reviewing easier.

@Aklakan
Copy link
Contributor Author

Aklakan commented Jun 18, 2022

May latest attempt at cleaning up and simplifying things:

  • Reverted QueryIterRepeatApply (it used inside of ServiceExecutorBulkToSingle which acts as the bridge between bulk and non-bulk)
  • Combined bulk and non-bulk registries
  • Deprecated ServiceExecution in favor using QueryIterator directly
  • OpExecutor now directly calls the plugin system
  • This way QueryIterService is actually not used anymore - service execution via the registry eventually by default ends up with a QueryIterRepeatApply. The still existing QueryIterService only uses the non-bulk part of the registry.

If there should be a dedicated QueryIterService which calls the registry rather than OpExecutor doing that, then actually the proper base class would be something like a QueryIterDeferred / LazyInitialization: That iterator just takes the iterator returned by the service handler and serves from it. No repeat apply / batching / stages needed at that time.

@Aklakan Aklakan marked this pull request as ready for review June 20, 2022 17:52

/** Root of query iterators in ARQ. */

public interface QueryIterator extends Closeable, Iterator<Binding>, PrintSerializable
public interface QueryIterator extends ClosableIterator<Binding>, PrintSerializable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change isn't necessary but if it is made, please use IteratorCloseable See the comments in ClosableIterator which refer to the Model API and ExtendedIterator - let's keep ClosableIterator for jena-core.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. It seemed reasonable to add one of the interfaces that already combined iterator with close - it was a 50/50 chance 😄

@@ -32,8 +32,8 @@
import org.apache.jena.sparql.engine.main.QC ;
import org.apache.jena.sparql.exec.http.Service;
import org.apache.jena.sparql.service.ServiceExecution;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now an unused import and can be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the import and added a deprecation annotation because QueryIterService is now no longer used but existing code may have sub-classed from it.

@afs
Copy link
Member

afs commented Jun 27, 2022

Looks good - 2 small comments.

Then it's the checklist items : javadoc item and how you want to commits to look when merged to the codebase. e.g any squashing and and any use of "GH-1387: " to pick out key commits.

@Aklakan
Copy link
Contributor Author

Aklakan commented Jun 28, 2022

I noticed that the overloaded add methods in ServiceExecutorRegistry would result in ambiguity with lambdas - which broke with some of my code based on the 4.5.0 api.
Before the naming gets frozen I now renamed the methods to:

add(ServiceExecutor) // Compatible with 4.5.0
addSingleLink(ChainingServiceExecutor) // New method; adds a link to the chain
addBulkLink(ChainingServiceExecutorBulk) // Also new

I updated javadoc and added a couple of @deprecated(since = "4.6.0") annotations.
I hope its final now.

@Aklakan
Copy link
Contributor Author

Aklakan commented Jun 30, 2022

Squashed

@afs
Copy link
Member

afs commented Jul 3, 2022

Are there any documentation changes for jena-site?

Also - 2 or 3 sentences for the 4.6.0 release announcement would be good.

@afs afs merged commit db4a750 into apache:main Jul 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SERVICE fails in empty context. Improved custom service executor extension system
2 participants