
JENA-1656: Make DatasetGraphMonitor transaction-aware. #519

Closed
pipcet wants to merge 1 commit into apache:master from pipcet:datasetgraphmonitor-txn

pipcet commented Jan 4, 2019

This adds transaction information to the DatasetChanges interface. (It uses default methods; is that okay?)

afs commented Jan 4, 2019 (Member)

I agree: start/finish don't really do the job needed, and there is not much use of DatasetGraphMonitor/DatasetGraphChanges either.

Because Jena has regular(ish) releases, development sometimes needs to mature in the system using it.

There is an alternative interface, RDFChanges, with an implementation, DatasetGraphChanges, which handles transactions. Read transactions are not passed to the changes monitor; write transactions are, as are transaction promotion events. The changes monitor only sees changes. start and finish are for batches of transactions.
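A minimal sketch of the dispatch rule described above. ChangesMonitor, MonitoredDataset and Mode are hypothetical simplifications, not the real RDFChanges or DatasetGraphChanges signatures, and reporting a READ to WRITE promotion as txnBegin is an assumption about how promotion events reach the monitor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified monitor interface (not the real RDFChanges API).
interface ChangesMonitor {
    void txnBegin();
    void add(String s, String p, String o);
    void txnCommit();
    void txnAbort();
}

enum Mode { READ, WRITE }

// Hypothetical wrapper: read transactions are never passed to the monitor;
// write transactions are, and a READ -> WRITE promotion is reported as txnBegin.
class MonitoredDataset {
    private final ChangesMonitor monitor;
    private Mode mode = null;          // null means "no active transaction"

    MonitoredDataset(ChangesMonitor monitor) { this.monitor = monitor; }

    void begin(Mode m) {
        mode = m;
        if (m == Mode.WRITE) monitor.txnBegin();
    }

    void promote() {
        if (mode == Mode.READ) {
            mode = Mode.WRITE;
            monitor.txnBegin();        // the monitor first hears of it here
        }
    }

    void add(String s, String p, String o) {
        if (mode != Mode.WRITE) throw new IllegalStateException("not in a write transaction");
        monitor.add(s, p, o);          // the monitor only sees changes
    }

    void commit() {
        if (mode == Mode.WRITE) monitor.txnCommit();
        mode = null;
    }

    void abort() {
        if (mode == Mode.WRITE) monitor.txnAbort();
        mode = null;
    }
}

// Records the events it receives, for inspection.
class RecordingMonitor implements ChangesMonitor {
    final List<String> events = new ArrayList<>();
    public void txnBegin()  { events.add("begin"); }
    public void add(String s, String p, String o) { events.add("add " + s + " " + p + " " + o); }
    public void txnCommit() { events.add("commit"); }
    public void txnAbort()  { events.add("abort"); }
}
```

With this wrapper a committed read transaction produces no events at all, while begin(READ), promote(), add(...), commit() produces begin, add, commit.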

Maybe RDFChanges is too patch-centric; maybe it misses things for other use cases. What do you think?

pipcet commented Jan 5, 2019 (Author)

There are a few places in the tree that use DatasetGraphMonitor (or DatasetChanges), so it seemed like the best fit for my situation.

One problem I see with RDFChanges is that txnCommit is called before super.commit, so the change would not yet be visible to a thread that's woken up by txnCommit.
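A single-threaded sketch of that ordering problem. ToyStore is hypothetical; its onCommit callback stands in for waking another thread and is handed the snapshot such a thread would read at that moment.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical toy store: a writer stages changes privately and commit()
// publishes them. Other threads only ever read the published snapshot.
class ToyStore {
    private volatile Set<String> published = Set.of();
    private final Set<String> staged = new HashSet<>();
    private final boolean notifyBeforeCommit;
    private final Consumer<Set<String>> onCommit;   // stands in for waking a waiting thread

    ToyStore(boolean notifyBeforeCommit, Consumer<Set<String>> onCommit) {
        this.notifyBeforeCommit = notifyBeforeCommit;
        this.onCommit = onCommit;
    }

    // What a reader on another thread would see right now.
    Set<String> snapshot() { return published; }

    void add(String triple) { staged.add(triple); }

    void commit() {
        // Notify-then-commit ordering: the woken reader still sees the old state.
        if (notifyBeforeCommit) onCommit.accept(snapshot());
        Set<String> next = new HashSet<>(published);
        next.addAll(staged);
        published = Set.copyOf(next);
        staged.clear();
        // Commit-then-notify ordering: the change is already visible.
        if (!notifyBeforeCommit) onCommit.accept(snapshot());
    }
}
```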

My current code also uses a new interface that monitors reads rather than writes, to determine which part of a graph has been looked at by a query (so we know when to repeat it to simulate a continuous query).

So there are probably three use cases:

  • be informed that the graph has potentially changed and the changes, if any, are visible to other threads
  • be informed that the graph is about to change, allowing you to veto transactions
  • be informed that the graph is being looked at

I think combining the first two in DatasetGraphMonitor/DatasetGraphChanges makes sense, at least if we allow the relatively new use of default interface methods. I suspect different applications have different requirements for false positives, though.
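A rough sketch of how default interface methods could let one monitor cover these use cases while each application overrides only what it needs. DatasetMonitor, its three methods and RequeryTrigger are hypothetical names for illustration; they are not part of Jena.

```java
// Hypothetical monitor covering the three use cases. Default no-op methods
// let an implementation override only the hooks it cares about.
interface DatasetMonitor {
    // Use case 2: the graph is about to change ("inside", before commit);
    // throwing here would veto the transaction.
    default void aboutToCommit() {}

    // Use case 1: the graph may have changed, and any changes are visible.
    default void changesVisible() {}

    // Use case 3: part of the graph is being looked at (e.g. a triple pattern).
    default void accessed(String pattern) {}
}

// An application that only needs use case 1, e.g. to re-run a continuous query.
// It is notified after every commit, whether or not anything actually changed.
class RequeryTrigger implements DatasetMonitor {
    int requeries = 0;

    @Override
    public void changesVisible() { requeries++; }
}
```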

afs commented Jan 6, 2019 (Member)

The important difference is in the visibility, whether the code can see the transaction as it happens (it is "inside" the transaction) or react to changes when they become generally visible ("outside" the transaction).

If "inside", the code has to be careful (!) and run on the same thread.

RDFChanges and DatasetChanges are "inside" the transaction. They see the changes as they happen.

For RDFChanges, it is important that it is called before the local commit happens. In a distributed RDF Delta system, this point is the system-wide commit point and has to happen before the local commit. Any loss between the two is recovered when the global commit is replayed locally.
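A toy model of that ordering, assuming a single replica and an in-memory log. PatchLog and Replica are hypothetical stand-ins, not RDF Delta classes: appending to the log is the global commit point, the local apply comes second, and a replica that lost a local apply catches up by replaying the log.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical global patch log: appending is the system-wide commit point.
class PatchLog {
    final List<String> patches = new ArrayList<>();

    // Returns the version of the new patch (1-based).
    int append(String patch) {
        patches.add(patch);
        return patches.size();
    }
}

// Hypothetical local replica that tracks the last version it has applied.
class Replica {
    final List<String> applied = new ArrayList<>();
    int version = 0;

    // Global commit first, then local commit. crashBeforeLocal simulates
    // a loss between the two.
    void commit(PatchLog log, String patch, boolean crashBeforeLocal) {
        int v = log.append(patch);
        if (crashBeforeLocal) return;
        applied.add(patch);
        version = v;
    }

    // Recovery: replay every global commit not yet applied locally.
    void catchUp(PatchLog log) {
        for (int v = version + 1; v <= log.patches.size(); v++) {
            applied.add(log.patches.get(v - 1));
            version = v;
        }
    }
}
```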

Use case 1 is "outside"; 2 is "inside". 3 is "inside" if you mean a real-time "being looked at", but maybe it is actually "start request, end request" at some higher level.

The only way to veto a transaction is from the inside and before commit. If it is "inside" and on the same thread, the code can throw an exception at any time up to commit and the transaction will abort (if used properly; Txn wraps this all up).
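A sketch of the veto path, assuming a simplified store with a single pending write transaction. VetoStore, its allowCommit predicate and executeWrite are hypothetical; executeWrite only mimics the idea behind Jena's Txn helper (run the action, commit, abort on any exception) and is not its real signature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical store with an "inside" hook that runs on the writer's thread
// just before commit. Failing the check vetoes the transaction.
class VetoStore {
    final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private final Predicate<List<String>> allowCommit;

    VetoStore(Predicate<List<String>> allowCommit) { this.allowCommit = allowCommit; }

    void add(String triple) { pending.add(triple); }

    void commit() {
        if (!allowCommit.test(pending))   // "inside", same thread, before commit
            throw new IllegalStateException("commit vetoed");
        committed.addAll(pending);
        pending.clear();
    }

    void abort() { pending.clear(); }

    // Any exception up to and including commit aborts the transaction
    // and is rethrown to the caller.
    static void executeWrite(VetoStore store, Runnable action) {
        try {
            action.run();
            store.commit();
        } catch (RuntimeException e) {
            store.abort();
            throw e;
        }
    }
}
```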

We could add "before" and "after" hooks to RDFChanges/DatasetChanges, but the "before"/"inside" case has to be there. The "outside" case can be done with DatasetWrapper but would be nice to have in the monitor lifecycle.

What about having a txnFinishes(TxnType) method, called once only, after commit/abort/end?
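A minimal sketch of the once-only guard such a hook would need, since end() commonly follows commit() or abort(). TxnHandle and FinishMonitor are hypothetical, and the TxnType enum here is a simplified stand-in for Jena's (the real one has more members).

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for Jena's TxnType; the real enum has more members.
enum TxnType { READ, WRITE }

// Hypothetical "outside" hook, as proposed: called exactly once, after the
// transaction has finished, whichever of commit/abort/end gets there first.
interface FinishMonitor {
    default void txnFinishes(TxnType type) {}
}

// Hypothetical transaction handle that guards the once-only call.
class TxnHandle {
    private final TxnType type;
    private final FinishMonitor monitor;
    private boolean finished = false;

    TxnHandle(TxnType type, FinishMonitor monitor) {
        this.type = type;
        this.monitor = monitor;
    }

    void commit() { /* make changes visible here */ finish(); }
    void abort()  { /* discard changes here */ finish(); }
    void end()    { finish(); }   // end() after commit/abort must not notify again

    private void finish() {
        if (finished) return;
        finished = true;
        monitor.txnFinishes(type);   // runs after the outcome is settled
    }
}
```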

There is GraphChanges in RDF Patch for apps wanting the same functionality applied to graphs only, including the Graph API's different, older transaction API.

The only uses I found, aside from tests, were in jena-text and jena-spatial (and jena-spatial is likely to be retired). Getting Lucene to coordinate a commit with the triple store needs to happen; Lucene commits first.

afs commented Jan 6, 2019 (Member)

Ping @ajs6f, who has an interest in relating sub-sections of the graph to actions; in his case, locking.

Default methods are fine; they are being used in various places.

pipcet commented Jan 6, 2019 (Author)

I think it would indeed be best to have handlers for both the "inside" and "outside" cases. As you said, your application needs the inside case; mine needs the outside case (or a custom DatasetWrapper).

I don't think the distinction is particularly useful for read transactions, though: those will probably have to deal with false positives anyway.

afs commented Jan 10, 2019 (Member)

I'm not sure what you mean by a "false positive" here.

afs commented Mar 16, 2021 (Member)

Sorry this has gone on for so long.

I went back to this and investigated the relationship to jena-text. It's hard to untangle because jena-text is involved in the transaction and coordinates with Lucene.

The contract for DatasetChanges is rather unclear; it predates full transaction support for all datasets.

I've raised JENA-2071 to deprecate DatasetChanges so it can be removed from general use, freeing up space to do an "outside" transaction monitor with a new, definite contract.

If this is still relevant, or you have further thoughts, please add to the ticket JENA-1656.

afs closed this Mar 16, 2021

2 participants