Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jena-core: add more javadocs about Graph-mem thread-safety and ConcurrentModificationException #1994

Merged
merged 1 commit into from
Oct 22, 2023
Merged

Conversation

sszuev
Copy link
Contributor

@sszuev sszuev commented Aug 17, 2023

…rentModificationException (see #1992, #1961)

GitHub issue resolved #

Pull request Description:


  • Tests are included.
  • Documentation change and updates are provided for the Apache Jena website
  • Commits have been squashed to remove intermediate development commit messages.
  • Key commit messages start with the issue number (GH-xxxx, or if in JIRA, JENA-xxxx)

By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.


See the Apache Jena "Contributing" guide.

Copy link
Member

@rvesse rvesse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, one minor suggestion

* This may happen even if the queried data does not relate directly to modified data
* (i.e. when triple search pattern does not match added or deleted triple).
* The good practice is to explicitly close iterators immediately after a read operation or
* use terminate operations such as {@link ExtendedIterator#toList()}.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* use terminate operations such as {@link ExtendedIterator#toList()}.
* use materialising operations such as {@link ExtendedIterator#toList()}.

Copy link
Contributor Author

@sszuev sszuev Aug 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In java-stream API such operations are called terminal, so it's typo.

maybe ... materializing (or terminal, in terms of Java Stream API) ... ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"terminal" is slightly different - it does not necessary read the whole stream (e.g. findFirst) but there is an internal "steam close" call .

ExtendedIterator does not this hidden close for partially consumed iterators. It's run-to-end or call ExtendedIterator.close. materialising is running to the end.

Copy link
Contributor Author

@sszuev sszuev Aug 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see, there is still no org.apache.java.util.iterator.ExtendedIterator.findFirst method and other terminal methods. Such operations are available only in the form of utilities (org.apache.java.util.iterator.Iter.findFirst) and they do not call close, it makes them not so useful.
The existing ExtendedIterator materializing methods are also terminal.
In addition, Graph now supports Java Streams, so we also have to say a few words about Streams.
I think some words could be added directly to the Graph interface, and additional explanations could be added to GraphMemFactory as well, to make it less ambiguous. (UPD: added)

By the way, why we use org.apache.jena.atlas.lib.Closeable instead of java.lang.AutoCloseable?
I think we can do something like this:

package org.apache.jena.atlas.lib;
interface Closeable extends java.lang.AutoCloseable {

    @Override
    void close(); // no exception
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AutoCloseable.close is required. Jena allows exhausting the iterator.

Copy link
Contributor Author

@sszuev sszuev Aug 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like I still don't quite understand the functional difference between ExtendedIterator and Stream.

Both closeable, both have terminal and intrermediate operations (e.g. intermediate: ExtendedIterator.filterKeep, Stream.filter), both can be exhausted or not, both lazy.
What am I missing out on?

The only functional difference I see - we can call Iter.findFirst several times, for Stream.findFirst it is not possible. BTW, maybe the method ExternalIterator.findFirst with close would be more convenient than Iter.findFirst without close.
Anyway this difference is not so important (IMO), maybe considered as a feature, which makes ExtendedIterator more suitable choice in some cases. And Stream is more heavy and complicated.

AutoCloseable is not required to be closed, it is also "good practice", but in Java we can use try-with-resources to make our code safer more elegant.
Also, switching to AutoCloseable will not break user code. (Java itself did this when try-with-resources was introduced in version 7). Maybe org.apache.jena.atlas.lib.Closeable was appeared before java7?

@sszuev
Copy link
Contributor Author

sszuev commented Aug 26, 2023

@afs

Would it be a good idea to add a mention to the factory-method and directly to the particular implementations documentation that the ConcurrentModificationException may not occur in some cases? A client application (such as ReadWriteLockingGrapg) may use this circumstance for its own purposes. For example, for optimization (see issues #1992 and sszuev/concurrent-rdf-graph#1)

For factory-method docs could look like this:

<strong>Note</strong>: Interrupting Graph Iterators with modification operations 
in general may result in a `ConcurrentModificationException` due to structural changes.
It is <strong>strongly</strong> not recommended in the client code violate this rule. 
However, interruption is possible in some cases, but it depends on the implementation. 

See the documentation for the specific `GraphMem` implementation. 
This behavior can be changed at any time, 
so please read the documentation of specific implementations before
if your code still requires interrupting iterations contrary to the recommendations.

@afs
Copy link
Member

afs commented Aug 28, 2023

IMO the javadoc should focus on the API contract. If it hints at more, a few application writers will chance it!

For iterators, testing or seeing modifications work isn't enough to guarantee the code. Different amounts of data change whether a modification inside the processing loop succeeds or crashes. Blank nodes mean two runs on the same data files may behave differently.

Implementations can change - but code writing to the contract will work.

When the material gets into discussion, for example when it says "may", that feels more like advice which is more suited for a How To on the subject. The javadoc can link to the material.

The contract is:

  • run to exhaustion or close the iterator
  • use finally to be sure
  • No modification inside the processing loop of the iterator
  • ExtendedIterator.remove is not guaranteed and for general code, should not be relied upon.

@afs afs merged commit 40597a4 into apache:main Oct 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants