[question] concurrent org.apache.jena.graph.Graph implementations #1961

Open
sszuev opened this issue Jul 18, 2023 · 28 comments · Fixed by #1964

@sszuev
Contributor

sszuev commented Jul 18, 2023

Version

4.x.x

Question

In ONT-API we need a thread-safe graph.
For this purpose, I created a separate simple library: https://github.com/sszuev/concurrent-rdf-graph
It contains SynchronizedGraph & ReadWriteLockingGraph.
It would be convenient to have only Jena in the dependencies (instead of Jena + this library).
So, the question to the Apache Jena community: do we need such kinds of Graphs in Jena?
Even if we don't, I think it is good to have this issue on record.

@sszuev changed the title from "[question] additional org.apache.jena.graph.Graph implementations" to "[question] concurrent org.apache.jena.graph.Graph implementations" on Jul 19, 2023
@rvesse
Member

rvesse commented Jul 19, 2023

Any Graph or Dataset that uses our Transactional should already be thread-safe if the calling code uses transactions around the read/write operations on that instance, i.e. the Jena Txn API.

Now, different implementations offer differing levels of concurrent access, so the choice of implementation is going to depend on your usage pattern. For example, DatasetGraphFactory.createTxnMem() will give you an in-memory transactional DatasetGraph with multiple-reader single-writer (MRSW) concurrency. Alternatively, TDB2Factory.createDataset() gives you an in-memory transactional Dataset with multiple-reader and single-writer concurrency (MR+SW).
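
For illustration, a minimal sketch of that pattern (a hedged example, assuming the Txn helper from org.apache.jena.system and placeholder URIs):

import org.apache.jena.graph.NodeFactory;
import org.apache.jena.graph.Triple;
import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.sparql.core.DatasetGraphFactory;
import org.apache.jena.system.Txn;

public class TxnPatternSketch {
    public static void main(String[] args) {
        DatasetGraph dsg = DatasetGraphFactory.createTxnMem();
        Triple t = Triple.create(NodeFactory.createURI("urn:ex:s"),
                NodeFactory.createURI("urn:ex:p"),
                NodeFactory.createURI("urn:ex:o"));
        // Write transaction: the whole block is atomic and isolated.
        Txn.executeWrite(dsg, () -> dsg.getDefaultGraph().add(t));
        // Read transaction: sees a consistent view of the data.
        Txn.executeRead(dsg, () ->
                dsg.getDefaultGraph().find().forEachRemaining(System.out::println));
    }
}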

If you can provide more detail on how you use the Graph API currently, we can try to give more specific advice on how to adapt your code to use one of our Transactional implementations.

@sszuev
Contributor Author

sszuev commented Jul 19, 2023

Well, the code is open for both ONT-API and concurrent-rdf-graph.
ONT-API is an implementation of the OWL-API; we can't use transactions, there is an RW-lock mechanism instead. We have to follow the OWL-API rules. OWL-API supports concurrent access to OWL via java.util.concurrent.locks.ReadWriteLock.

Also, I don't quite understand how transactions could help if we have Streams and don't want to put everything in memory.
Usually transactions are used at the top level, maybe with the help of AOP.
In ONT-API we have neither a top level nor AOP. And we can't use TDB in the library itself; it is out of scope. If someone wants to use TDB they can, but it is not the responsibility of the library.
ReadWriteLockingGraph offers such a way. If we have a lot of read operations and few write ones, not much memory is required (it uses a snapshot mechanism, but only if there are open Streams; the remaining items are placed into memory in that case. There are also some other attempts to reduce memory consumption).
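
For illustration, the general shape of such a wrapper - a much-simplified sketch, not the actual library code (the real ReadWriteLockingGraph avoids this eager copy and releases still-open iterators to snapshots only when a write happens):

import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

import org.apache.jena.graph.Graph;
import org.apache.jena.graph.Node;
import org.apache.jena.graph.Triple;
import org.apache.jena.sparql.graph.GraphWrapper;
import org.apache.jena.util.iterator.ExtendedIterator;
import org.apache.jena.util.iterator.WrappedIterator;

/** Sketch: single calls guarded by an RW lock; find() returns an iterator
 *  detached from the base graph (other mutators would need the same guard). */
public class RWLockingGraphSketch extends GraphWrapper {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public RWLockingGraphSketch(Graph base) { super(base); }

    @Override public void add(Triple t) {
        lock.writeLock().lock();
        try { super.add(t); } finally { lock.writeLock().unlock(); }
    }

    @Override public void delete(Triple t) {
        lock.writeLock().lock();
        try { super.delete(t); } finally { lock.writeLock().unlock(); }
    }

    @Override public ExtendedIterator<Triple> find(Node s, Node p, Node o) {
        lock.readLock().lock();
        try {
            // Materialize under the read lock so the result is a stable snapshot.
            List<Triple> snapshot = super.find(s, p, o).toList();
            return WrappedIterator.create(snapshot.iterator());
        } finally { lock.readLock().unlock(); }
    }
}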

There are tests in concurrent-rdf-graph.

In any case, it seems to me that an additional mechanism will not hurt.

@afs
Member

afs commented Jul 19, 2023

Thanks for raising this issue.

Jena provides GraphTxn from GraphFactory.createGraphTxn (new name: Factory -> GraphFactory).

GraphTxn is thread-safe on each single method call, even without the application calling transaction begin-commit/abort.

I don't quite understand how transactions could help if we have Streams and don't want to put everything in memory.

With transactions, iterators from this graph are consistent - they iterate over the data as it was at the time find() was called, regardless of changes by other threads. It is not a copy. It is MR+SW - readers and writers can run at the same time. TDB2 and TIM (the in-memory dataset for transactional use behind GraphTxn) do not throw away the state of the database until all usage of that epoch of the database has finished with it.
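
A sketch of how this looks from application code (a hedged example - it assumes GraphFactory.createGraphTxn() returns the concrete GraphTxn type and that GraphTxn implements Transactional, as the later discussion in this thread suggests):

import org.apache.jena.graph.NodeFactory;
import org.apache.jena.graph.Triple;
import org.apache.jena.sparql.graph.GraphFactory;
import org.apache.jena.sparql.graph.GraphTxn;
import org.apache.jena.system.Txn;

public class GraphTxnSketch {
    public static void main(String[] args) {
        GraphTxn graph = GraphFactory.createGraphTxn();
        Triple t = Triple.create(NodeFactory.createURI("urn:ex:s"),
                NodeFactory.createURI("urn:ex:p"),
                NodeFactory.createURI("urn:ex:o"));
        graph.add(t); // safe on its own: each single call is thread-safe
        // An explicit read transaction gives a consistent view over several
        // operations; a concurrent writer is not blocked (MR+SW).
        Txn.executeRead(graph, () ->
                graph.find().forEachRemaining(System.out::println));
    }
}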

ONT-API is an implementation of the OWL-API; we can't use transactions, there is an RW-lock mechanism instead.

Could you expand on that? Why "can't"? There are various problems that arise which ACID transactions address. Protecting the graph data structures is one part of that - having a consistent view of the data is another.

SynchronizedGraph & ReadWriteLockingGraph

These seem to protect individual operations but not a sequence of operations (the A in ACID), e.g. adding several triples. So the data structures are protected, but the application can see half-completed changes.

I'm not sure the remember() function helps - it seems to only protect each step of an iterator. But if the base data changes during an iterator's life between calls to next(), it may throw ConcurrentModificationException, and if it doesn't, it may not give correct iteration, missing out or duplicating items.

@sszuev
Contributor Author

sszuev commented Jul 19, 2023

Could you expand on that? Why "can't"? There are various problems that arise which ACID transactions address. Protecting the graph data structures is one part of that - having a consistent view of the data is another.

OWL-API provides an R/W locking mechanism, so we should also support this functionality in ONT-API. We already have a concurrent OWLOntology, so it would be nice to have a protected RDF view.
Transactions can be used by those who use this library; they can protect calls to ONT-API in any way they like. ONT-API is just an advanced wrapper of an RDF Graph, a view; it can wrap any Graph.
So it is no problem for clients to implement their own transaction protection.
But I think providing this functionality in the library itself would be out of scope.
ONT-API is intended for OWL2 only. But maybe, some day, as an extension.

These seem to protect individual operations but not a sequence of operations (the A in ACID), e.g. adding several triples. So the data structures are protected, but the application can see half-completed changes.

You can comment out this method and run the tests; I think the tests will fail. The remember method creates a wrapper for the iterator (OuterTriplesIterator) with its own lock (a ReentrantLock, not the outer ReadWriteLock), which protects next, hasNext & close. When there is a Graph write operation, this lock plays its main role, see releaseToSnapshot (i.e. wrapper.lock.withLock {}). When this happens, the iterator is collected into a snapshot (a List), but outer iteration is locked; all other iterators continue to run until their turn comes. After an iterator is "released to snapshot" (i.e. collected into the snapshot List), nothing blocks it from further iteration (against the snapshot data, not the underlying data). The whole write operation is protected by the R/W lock, so no new iterators will appear (creating an iterator is a read operation), but already-collected iterators are not locked any more.
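
For illustration, a condensed sketch of that release-to-snapshot idea (hypothetical, simplified code; the real OuterTriplesIterator in the repository does more):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Sketch: an iterator that a write operation can switch from the live
 *  base iterator to an in-memory snapshot of its remaining items. */
class SnapshotableIterator<T> implements Iterator<T> {
    private final ReentrantLock lock = new ReentrantLock(); // per-iterator lock, not the graph's RW lock
    private Iterator<T> source; // live base iterator, or the snapshot after release

    SnapshotableIterator(Iterator<T> base) { this.source = base; }

    @Override public boolean hasNext() {
        lock.lock();
        try { return source.hasNext(); } finally { lock.unlock(); }
    }

    @Override public T next() {
        lock.lock();
        try { return source.next(); } finally { lock.unlock(); }
    }

    /** Called by the graph's write operation (under the graph's write lock):
     *  drain the remaining items so further iteration runs against the
     *  snapshot, not against the data being mutated. */
    void releaseToSnapshot() {
        lock.lock();
        try {
            List<T> remaining = new ArrayList<>();
            source.forEachRemaining(remaining::add);
            source = remaining.iterator();
        } finally { lock.unlock(); }
    }
}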

But of course there could be mistakes and possible improvements; this implementation can be considered a draft.
(Honestly, I'm not sure if anyone is using this feature, but it should be in the ONT-API library for compatibility with OWLAPI-impl.)

These seem to protect individual operations but not a sequence of operations (the A in ACID), e.g. adding several triples. So the data structures are protected, but the application can see half-completed changes.

Yes, there is an issue about protecting the RDF Model view: owlcs/ont-api#46
Also, I consider the RW Graph as a collection, similar to java.util.concurrent.CopyOnWriteArrayList but for RDF.
So yes, it is not about ACID (only the A).

@afs
Member

afs commented Jul 20, 2023

From the Jena project's point of view, it's not desirable to add implementations that support only one or two uses if there is a way to use the general machinery. The project fills up with random, unloved code with a maintenance cost.

Jena already has GraphTxn. Can this be used?

It is better than the MRSW that ReadWriteLock provides (multiple readers OR a single writer), where an iterator either blocks writers or is inconsistent and may cause a ConcurrentModificationException.

GraphTxn provides multiple readers AND a single writer, so while an iterator is in use, writers are not held up.

It is thread-safe at the moment: iterators will see complete write transactions atomically appear between next() calls. This is a form of "read committed" semantics.

The only thing I see missing is a consistent Graph.find(), so that iterators do not see overlapping writes.
This can be added -- illustration: #1964. It assumes that ExtendedIterator.close is called.

It isn't even at risk of blocking other threads if the iterator is not closed; a write on the same thread will fail at the point of the write.

If the application uses explicit transactions itself, it will work as well.

Fuseki operations all use transactions.

@sszuev
Contributor Author

sszuev commented Jul 20, 2023

It is better than the MRSW that ReadWriteLock provides (multiple readers OR a single writer), where an iterator either blocks writers or is inconsistent and may cause a ConcurrentModificationException.

The ReadWriteLockingGraph solution does not prevent iteration of existing iterators even if someone forgets to close an iterator (and note: all iterators are detached from the graph, so the probability of memory leaks is low), but it does prevent creation of new iterators. During releaseToSnapshot, only the one iterator being processed is blocked from iterating (I think there could be a solution to overcome this limitation, plus I see some more possible improvements).

I would not compare GraphTxn and ReadWriteLockingGraph; they have different scopes and usages.
GraphTxn is great but complicated and transaction-oriented; ReadWriteLockingGraph is simple and lightweight and could be used as a concurrent collection.

Jena already has GraphTxn. Can this be used?

I think ONT-API library users can use GraphTxn as a base graph (ONT-API is just an enhanced wrapper).
But that's not what ONT-API was created for.

From the Jena project's point of view, it's not desirable to add implementations that support only one or two uses if there is a way to use the general machinery.

Ok, I got the point. If that is the final resolution, then the issue, I think, can be closed.

@afs
Member

afs commented Jul 21, 2023

The trouble with locks is deadlocks - both absolute ("deadly embrace") and under load ("deadlock by congestion").

Don't mix "simple" and "lightweight" :-) Java locks are not lightweight. Their contract on the memory model is expensive to the whole application. It's the first generation memory model for Java. It's not a great fit to model processor architectures.

GraphTxn - if the Txn troubles you, ignore the Txn.
It works without needing explicit application control - c.f. SQL's "auto commit".

The data structures behind GraphTxn are a Java port of the persistent structures from Scala.

I don't know ONT-API in detail - what is the contract for the application?
(OWL API doesn't mention it)

@sszuev
Contributor Author

sszuev commented Jul 21, 2023

In OWL-API (and ONT-API), there is a concurrent OWLOntologyManager in addition to the standard one (e.g. the OWL-API impl: https://github.com/owlcs/owlapi/blob/version5/impl/src/main/java/uk/ac/manchester/cs/owl/owlapi/concurrent/ConcurrentOWLOntologyImpl.java).
All ontologies in a manager, and the manager itself, use a single ReadWriteLock instance, so it is difficult for me to imagine a scenario with deadlocks. The same goes for ONT-API - the RW lock is not only about Graph operations, but also about managing references between ontologies, etc.

what is the contract for the application?

Maybe "contract" is not quite the right term, sorry for the misleading.
The goal of ONT-API is to support all the features of the classic OWL-API-impl, having an RDF as the basis (with arbitrary nature).

The concurrent manager is used, for example, in Protege (btw, I have an RDF-based fork of Protege with ONT-API inside, as an example of ONT-API use). I am not sure the concurrent manager in Protege is a correct solution (there are only two threads - main & EDT; it is a Swing application).
Also, I raised a question: owlcs/owlapi#1110
So far we know of a couple of usages in old projects. Maybe someone will give more examples.

GraphTxn - if the Txn troubles you, ignore the Txn.

I should take a closer look at it. Maybe it could become a new feature in ONT-API.

+

Java locks are not lightweight. Their contract on the memory model is expensive for the whole application. It's the first-generation memory model for Java. It's not a great fit for modern processor architectures.

Maybe it is the second generation? From the wiki (https://en.wikipedia.org/wiki/Java_memory_model):
"The original Java memory model, developed in 1995, was widely perceived as broken, preventing many runtime optimizations and not providing strong enough guarantees for code safety. It was updated through the Java Community Process, as Java Specification Request 133 (JSR-133), which took effect back in 2004, for Tiger (Java 5.0)."

java.util.concurrent.locks.ReentrantReadWriteLock was also introduced in Java 5.0.
The first generation of the JMM supported only primitives like synchronized, wait, notify.
Atomics, concurrent collections, barriers, and locks appeared later; usually they use the CAS approach.

@afs
Member

afs commented Jul 24, 2023

it is difficult for me to imagine a scenario with deadlocks

The discussion here is about general support. If graph implementations live in ONT-API, they can do whatever is required for that module.

@sszuev
Contributor Author

sszuev commented Jul 27, 2023

Jena already has GraphTxn. Can this be used?

I see that GraphTxn is an in-memory Graph:
https://github.com/apache/jena/blob/main/jena-arq/src/main/java/org/apache/jena/sparql/graph/GraphFactory.java#L45

I think, due to this fact, it is not very suitable for the ONT-API library, which is intended to wrap any graph, providing concurrent access out of the box (maybe important: in ONT-API there are two modes, concurrent and non-concurrent).

+ I see a lack of documentation: nothing is said about the fact that this graph is thread-safe.

@sszuev
Contributor Author

sszuev commented Jul 30, 2023

I did some quick research; it shows that in some cases GraphTxn has worse performance than the RW and Synchronized Graphs. But of course JMH benchmarks are needed to prove or disprove this conclusion. Note that apparently the performance and memory consumption of the RW Graph can be improved (sszuev/concurrent-rdf-graph#1).

[screenshot: benchmark comparison table]

afs added a commit to afs/jena that referenced this issue Jul 31, 2023
@afs afs closed this as completed in #1964 Jul 31, 2023
afs added a commit that referenced this issue Jul 31, 2023
GH-1961: Thread-safe and consistent GraphTxn.find
@sszuev
Contributor Author

sszuev commented Jul 31, 2023

@afs

Is it closed intentionally or accidentally (due to the commit message)? It is OK with me if Jena doesn't need this thread-safe wrapper (lightweight and simple, in my opinion), but it might be better if an explicit final resolution were written.

@afs
Member

afs commented Jul 31, 2023

Side-effect of the PR. Reopened.

@afs afs reopened this Jul 31, 2023
@afs
Member

afs commented Jul 31, 2023

GraphTxn does more. Because that includes multiple truly concurrent views of the data, it will cost more.

It does not require the Kotlin runtime.

This is like "autocommit" for SQL databases. It has all the consequences of that as well. It has more overhead, and does give application change consistency (e.g. a read-modify-write) is two transactions and other transactions/thread may change or view the database between the "read" and the "write").

A bank account holds $1000. To add to the account, an app reads the amount, adds the extra amount, deletes the old value and writes the new value (let's assume there is a modify that does delete-add atomically, and not get into the fact that Model doesn't have such a thing).

Thread A adds $500, thread B adds $250; what is the account now? With just safe read-modify operations, the answer is one of $1250, $1500, or $1750, depending on the way the operations interleave.

Any solution to the concurrent read-modify-write needs the application to indicate the start and finish of a block of code, so it can read the data, decide on the change, and write the data without any other code invalidating the decision.

Model provides enterCriticalSection and leaveCriticalSection. This is a rather old-fashioned way to deal with the problem, but it at least gives a block of code spanning several operations.
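
For reference, a sketch of the bank-account read-modify-write using that pattern (placeholder URIs; not code from this thread):

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.shared.Lock;

public class CriticalSectionSketch {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        Resource account = model.createResource("urn:ex:account");
        Property balance = model.createProperty("urn:ex:balance");
        model.addLiteral(account, balance, 1000L);

        // The whole read-modify-write runs inside one critical section,
        // so no other thread can interleave between the read and the write.
        model.enterCriticalSection(Lock.WRITE);
        try {
            Statement old = account.getProperty(balance);
            long current = old.getLong();
            old.remove();                                      // delete the old value
            model.addLiteral(account, balance, current + 500); // write the new value
        } finally {
            model.leaveCriticalSection();
        }
    }
}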

@sszuev
Contributor Author

sszuev commented Aug 1, 2023

I understand your example.

Of course, thread-safe Graphs cannot replace transactional ones; I think they have different fields of application. An analogy: Redis vs ConcurrentHashMap.

It seems to me that the main question in this issue is whether it will be useful to someone or not. If not, then perhaps we shouldn't add this functionality to Apache Jena.
That's why I'm asking here: we need more examples of possible use. I also asked in the OWLCS group.

There is definitely one example of use: ONT-API. I'm thinking of other examples, but so far they are purely speculative.

For example, we could collect a graph from various sources simultaneously and display it as a tree or a set of axioms on various devices without strict consistency. Perhaps we could even co-edit it (but it seems a transactional graph is more suitable in that case).

Maybe we could use a thread-safe graph as a kind of cache. The Caffeine cache uses ConcurrentHashMap, so why not?

And of course, a thread-safe graph wrapper has some advantages over a transactional one - performance and the ability to wrap any graph. Maybe it would be useful for someone.

I asked a colleague about additional examples; if we come up with something, I will edit this comment.

UPD-1: Found one real example from a business project: we collect data from different sources (CockroachDB) in parallel and then write the data to an RDF Graph. Maybe right now the performance of GraphTxn is not the bottleneck, since the DB-operation costs must be more expensive. But we may change the algorithm, e.g. process that graph on the fly. From the DB we get entities, attributes and links (in OWL terms: class assertions, object & property assertions), so no conflicts are expected, since we process in a stream not single triples one by one, but a set of connected triples for each entity. Maybe we will change this solution not to use a single-graph algorithm.

UPD-2: Asked ChatGPT:

  • Question: given a concurrent RDF Graph implementation with no transactions (no ACID), provide use-cases where the concurrent implementation will be better than an ACID implementation. Note: the ACID impl has worse performance.
  • Answer:
Concurrent RDF Graph implementations without the ACID properties (Atomicity, Consistency, Isolation, and Durability) trade data consistency guarantees for improved performance and efficiency, which can make them suitable for specific use cases. Below are some scenarios where a concurrent implementation might be a better choice than an ACID implementation, despite the latter providing stronger consistency guarantees:

1. Data ingestion and analysis: In scenarios where large volumes of data need to be ingested and analyzed rapidly, such as in the IoT or social media streams, a concurrent RDF Graph implementation can handle the high throughput more efficiently, as long as occasional inconsistencies are tolerable.

2. Real-time recommendations and collaborative filtering: When providing real-time recommendations for items like news articles, movies, or products, a concurrent RDF implementation might offer better performance for query processing. Since recommendations don't require transactions or strict consistency, this can be suitable for large-scale applications that necessitate quick response times.

3. Cache maintenance: Concurrent implementation can be used as a frontend cache to regularly update synchronized RDF graphs, providing faster access to recently updated data. The non-ACID nature is not critical in this case, as long as the source of truth remains an ACID-based store.

4. Data exploration, visualization, and dashboarding: Since these use cases often require low latency and a flexible query capability, a concurrent RDF implementation can be more suitable. In these scenarios, eventual consistency is often acceptable, as users are more focused on overall trends and insights rather than strict accuracy.

5. Scientific simulations and modeling: Concurrent implementations can be particularly useful for large-scale simulations or modeling scenarios, such as climate studies or genomics research, where the sheer volume of data and frequent updates demand improved performance over strict consistency.

6. High availability (A/B testing, load balancing, and failover): In systems where high availability and redundancy are important, a concurrent RDF implementation can ensure that replicas of the graph continue to operate despite occasional inconsistencies. In these cases, the speed of replication outweighs the need for perfect consistency.

As with any data management system, the choice of implementation depends on the specific use case and requirements. When absolute data consistency is less critical than performance, a concurrent RDF Graph implementation can offer various advantages over an ACID-compliant implementation.

@arne-bdt
Contributor

arne-bdt commented Aug 2, 2023

Although I initially didn't plan to share this, given our ongoing discussion, I believe it's appropriate...

I'm currently developing a thread-safe graph for Jena, which supports a multiple-reader plus single-writer (MR+SW) model. The existing implementation in Jena is based on Dexx collections, which are "a port of Scala's immutable, persistent collection classes to pure Java." This approach has numerous advantages, such as simple implementation, minimal locking requirements, and robustness. However, its main drawback is its poor performance and heavy reliance on the garbage collector.

My goal is to offer an alternative that should provide better performance. However, as this new model isn't based on immutable persistent collections, it's quite a delicate endeavor.

These graphs could form the foundation for an alternative thread-safe Dataset implementation. Therefore, they could potentially be used in Fuseki.

Please note that this is a work in progress and may take a few more weeks (or potentially months) to develop. It's still in an early stage and may need several rounds of refactoring. The current state of development suggests that my plan can be successful. Even considering the overhead of locking, thread-local variables, and some duplicate operations, this implementation can significantly outperform existing ones (as always, this will depend on specific use cases).

For anyone who can't resist a sneak peek, you can check out the progress at:
https://github.com/arne-bdt/jena/blob/SWMR_Dataset/jena-arq/src/main/java/org/apache/jena/sparql/core/mem2/GraphWrapperTransactional.java

Please be patient with me if my approach ends up not being successful. I'm doing my best to tackle this, but there's a high risk of failure.

@sszuev
Contributor Author

sszuev commented Aug 26, 2023

I wrote benchmarks:

# Run complete. Total time: 09:03:24

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                                                               (factory)   Mode  Cnt         Score         Error  Units
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_D_W_2x420              TXN_GRAPH  thrpt   25         1,446 ±       0,041  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_D_W_2x420     SYNCHRONIZED_GRAPH  thrpt   25        18,554 ±       0,204  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_D_W_2x420       RW_LOCKING_GRAPH  thrpt   25        20,452 ±       0,315  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_F_RW_5x5               TXN_GRAPH  thrpt   25         4,954 ±       0,198  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_F_RW_5x5      SYNCHRONIZED_GRAPH  thrpt   25         3,900 ±       0,303  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_F_RW_5x5        RW_LOCKING_GRAPH  thrpt   25         2,894 ±       0,144  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_G_R_4x21               TXN_GRAPH  thrpt   25         0,459 ±       0,005  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_G_R_4x21      SYNCHRONIZED_GRAPH  thrpt   25       124,310 ±       1,384  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_G_R_4x21        RW_LOCKING_GRAPH  thrpt   25       120,663 ±      11,241  ops/s
FunctionalBenchmarks.ADD                                       SYNCHRONIZED_GRAPH  thrpt   25  10327745,424 ±  324216,746  ops/s
FunctionalBenchmarks.ADD                                         RW_LOCKING_GRAPH  thrpt   25   9642466,664 ±  545900,257  ops/s
FunctionalBenchmarks.ADD                                                TXN_GRAPH  thrpt   25    568284,348 ±   14906,968  ops/s
FunctionalBenchmarks.ADD                                                MEM_GRAPH  thrpt   25   9868883,985 ±  351432,363  ops/s
FunctionalBenchmarks.CONTAINS                                  SYNCHRONIZED_GRAPH  thrpt   25  17414202,069 ±  838847,529  ops/s
FunctionalBenchmarks.CONTAINS                                    RW_LOCKING_GRAPH  thrpt   25  16596013,888 ±  360388,978  ops/s
FunctionalBenchmarks.CONTAINS                                           TXN_GRAPH  thrpt   25      2617,453 ±     639,489  ops/s
FunctionalBenchmarks.CONTAINS                                           MEM_GRAPH  thrpt   25  17735128,066 ±  332657,672  ops/s
FunctionalBenchmarks.COUNT                                     SYNCHRONIZED_GRAPH  thrpt   25  34871132,016 ±   54212,935  ops/s
FunctionalBenchmarks.COUNT                                       RW_LOCKING_GRAPH  thrpt   25  30702008,214 ±  124360,601  ops/s
FunctionalBenchmarks.COUNT                                              TXN_GRAPH  thrpt   25     12754,607 ±   12240,855  ops/s
FunctionalBenchmarks.COUNT                                              MEM_GRAPH  thrpt   25  35705376,831 ±   58186,669  ops/s
FunctionalBenchmarks.DELETE                                    SYNCHRONIZED_GRAPH  thrpt   25   6132754,574 ±  128806,329  ops/s
FunctionalBenchmarks.DELETE                                      RW_LOCKING_GRAPH  thrpt   25   6006734,462 ±  158163,662  ops/s
FunctionalBenchmarks.DELETE                                             TXN_GRAPH  thrpt   25    466415,443 ±   34619,335  ops/s
FunctionalBenchmarks.DELETE                                             MEM_GRAPH  thrpt   25   6378279,427 ±   74230,896  ops/s
FunctionalBenchmarks.FIND_ALL                                  SYNCHRONIZED_GRAPH  thrpt   25   1995921,417 ±  197773,961  ops/s
FunctionalBenchmarks.FIND_ALL                                    RW_LOCKING_GRAPH  thrpt   25    731483,422 ±    4945,364  ops/s
FunctionalBenchmarks.FIND_ALL                                           TXN_GRAPH  thrpt   25     27210,697 ±   11762,668  ops/s
FunctionalBenchmarks.FIND_ALL                                           MEM_GRAPH  thrpt   25   6250382,955 ±  558544,461  ops/s
FunctionalBenchmarks.FIND_SOME                                 SYNCHRONIZED_GRAPH  thrpt   25   9185112,808 ±  167003,190  ops/s
FunctionalBenchmarks.FIND_SOME                                   RW_LOCKING_GRAPH  thrpt   25   1891931,627 ±   24180,754  ops/s
FunctionalBenchmarks.FIND_SOME                                          TXN_GRAPH  thrpt   25      2585,932 ±     405,423  ops/s
FunctionalBenchmarks.FIND_SOME                                          MEM_GRAPH  thrpt   25  19784548,579 ± 1761162,017  ops/s
FunctionalBenchmarks.MIXED_OPERATIONS                          SYNCHRONIZED_GRAPH  thrpt   25    371974,804 ±    6627,910  ops/s
FunctionalBenchmarks.MIXED_OPERATIONS                            RW_LOCKING_GRAPH  thrpt   25    160969,040 ±    1328,173  ops/s
FunctionalBenchmarks.MIXED_OPERATIONS                                   TXN_GRAPH  thrpt   25      1864,759 ±     911,797  ops/s
FunctionalBenchmarks.MIXED_OPERATIONS                                   MEM_GRAPH  thrpt   25    511414,126 ±    4680,192  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_A_RW_5x5             TXN_GRAPH  thrpt   25        53,449 ±       0,348  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_A_RW_5x5    SYNCHRONIZED_GRAPH  thrpt   25       455,535 ±       3,287  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_A_RW_5x5      RW_LOCKING_GRAPH  thrpt   25        40,529 ±       0,714  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_A_RW_8x4             TXN_GRAPH  thrpt   25        42,045 ±       0,159  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_A_RW_8x4    SYNCHRONIZED_GRAPH  thrpt   25       352,476 ±       1,480  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_A_RW_8x4      RW_LOCKING_GRAPH  thrpt   25        25,387 ±       0,759  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_B_R_5x5              TXN_GRAPH  thrpt   25        10,475 ±       0,086  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_B_R_5x5     SYNCHRONIZED_GRAPH  thrpt   25       261,004 ±       2,216  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_B_R_5x5       RW_LOCKING_GRAPH  thrpt   25       257,982 ±       1,757  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_B_R_8x4              TXN_GRAPH  thrpt   25         8,194 ±       0,168  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_B_R_8x4     SYNCHRONIZED_GRAPH  thrpt   25       203,590 ±       0,816  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_B_R_8x4       RW_LOCKING_GRAPH  thrpt   25       209,688 ±       4,162  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_C_RW_2x42            TXN_GRAPH  thrpt   25         8,184 ±       0,346  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_C_RW_2x42   SYNCHRONIZED_GRAPH  thrpt   25       183,167 ±       2,297  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_C_RW_2x42     RW_LOCKING_GRAPH  thrpt   25        45,588 ±       0,512  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_C_RW_8x14            TXN_GRAPH  thrpt   25        17,841 ±       0,509  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_C_RW_8x14   SYNCHRONIZED_GRAPH  thrpt   25       380,125 ±       3,195  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_C_RW_8x14     RW_LOCKING_GRAPH  thrpt   25        81,118 ±       0,655  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_DB_RW_6x14           TXN_GRAPH  thrpt   25        12,359 ±       0,188  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_DB_RW_6x14  SYNCHRONIZED_GRAPH  thrpt   25       241,269 ±       2,998  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_DB_RW_6x14    RW_LOCKING_GRAPH  thrpt   25        57,992 ±       0,415  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_D_W_5x5              TXN_GRAPH  thrpt   25       134,988 ±       5,140  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_D_W_5x5     SYNCHRONIZED_GRAPH  thrpt   25      1165,756 ±      15,443  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_D_W_5x5       RW_LOCKING_GRAPH  thrpt   25      1144,433 ±       8,823  ops/s
SmallGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_A_RW_5x5             TXN_GRAPH  thrpt   25       861,604 ±      11,766  ops/s
SmallGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_A_RW_5x5    SYNCHRONIZED_GRAPH  thrpt   25      2317,995 ±      14,237  ops/s
SmallGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_A_RW_5x5      RW_LOCKING_GRAPH  thrpt   25      1361,093 ±       9,248  ops/s

I think the concurrent benchmarks are not fully correct (if somebody knows how to write them correctly, please give me feedback).
These benchmarks show that SynchronizedGraph is really fast and ReadWriteLockingGraph is number two; GraphTxn has the worst performance, but not always.

@arne-bdt please let me know when you finish your implementation; I will include it in these benchmarks.

benchmarks-260823-01.txt

UPD: added ExtendedSynchronizedGraph; it is faster than ReadWriteLockingGraph.

@arne-bdt
Contributor

@sszuev, I would greatly appreciate it if you could include my implementation in your benchmarking tests.

My own benchmarks and tests suggest that this implementation of a SWMR+ graph will soon be viable for deployment with our clients. However, it's important to note that testing and documentation are still works in progress, and some refactoring is anticipated. Despite these ongoing developments, the implementation is functioning as intended and meets our speed requirements.

In our use case, we manage a graph with approximately 600,000 triples. We face a demanding scenario where 375,000 triples are updated every three seconds, alongside smaller updates involving about 100 triples every second. This occurs concurrently with eight threads, each reading all triples once per second. On my new notebook, which has sufficient processing power, I've been able to simulate this scenario in a single unit test. You can find these tests here:

To run these tests, you'll need to check out my SWMR_Dataset branch.

@sszuev
Contributor Author

sszuev commented Jan 14, 2024

@afs
after switching from 4.10.0 to 5.0.0-SNAPSHOT, I see that GraphTxn no longer works properly. Perhaps we now need to explicitly use the transaction mechanism even for simple reads like Graph#find?

@arne-bdt
The same question applies to GraphWrapperTransactional2.

In my tests I can overcome this limitation by wrapping the txn-graph so that find = openTxn(); delegate.find().toList().iterator(); closeTxn()
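
For illustration, that workaround as a graph wrapper (a hypothetical sketch; it materializes the matches inside a read transaction and returns a detached iterator):

import java.util.List;

import org.apache.jena.graph.Graph;
import org.apache.jena.graph.Node;
import org.apache.jena.graph.Triple;
import org.apache.jena.query.TxnType;
import org.apache.jena.sparql.core.Transactional;
import org.apache.jena.sparql.graph.GraphWrapper;
import org.apache.jena.util.iterator.ExtendedIterator;
import org.apache.jena.util.iterator.WrappedIterator;

/** Sketch: isolate find() by consuming it inside a read transaction. */
public class IsolatedFindGraph extends GraphWrapper {
    public IsolatedFindGraph(Graph base) { super(base); }

    @Override public ExtendedIterator<Triple> find(Node s, Node p, Node o) {
        Transactional txn = (Transactional) get(); // the wrapped graph must be transactional
        txn.begin(TxnType.READ);
        try {
            List<Triple> snapshot = super.find(s, p, o).toList();
            return WrappedIterator.create(snapshot.iterator());
        } finally {
            txn.end();
        }
    }
}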

[screenshot: stacktrace]

@afs
Member

afs commented Jan 14, 2024

I see that GraphTxn no longer works properly

Is that stacktrace for the GraphTxn case? I can't see where the "find" step is.

Why is the code writing inside a find()? Even if it worked, it could be inconsistent, because the changes can land before or after the point the iterator has got to.

The most robust solution is to isolate the find. It adds cost only if iterators are opened and not fully used (e.g. implementations of "contains" would need checking).

@arne-bdt
Contributor

arne-bdt commented Jan 14, 2024

@sszuev: GraphWrapperTransactional2 is designed to track all write and read transactions. Additionally, opening a transaction explicitly is crucial for ensuring data consistency when multiple calls to the graph are required.

You would also need to consume the iterator within the transaction, just like you did with the .toList() call. After the transaction is closed, the iterator is no longer valid. I wanted to mention this because it is not true for GraphTxn, as far as I know.

@sszuev
Contributor Author

sszuev commented Jan 14, 2024

@afs

Is that stacktrace for the GraphTxn case? I can't see where the "find" step is.

Yes. You can find the code here: https://github.com/sszuev/concurrent-rdf-graph
and run the tests against 4.10.0 and 5.0.0-SNAPSHOT.
Usually, if some functionality worked well in a previous version and no longer works in the next version, it may be a regression.

The stacktrace above is from a single-thread test in a scenario where read operations are followed by a write one:
https://github.com/sszuev/concurrent-rdf-graph/blob/main/src/test/kotlin/scenarious/CommonScenarios.kt#L98
There are no transactional blocks in the tests and no explicit closing of iterators, but every read operation is closed implicitly by terminal operations like toList.

@arne-bdt
I can try to adapt the tests and benchmarks to use the transactional mechanism and ExtendedIterator#close.
The concurrent implementations provided by the concurrent-rdf-graph project do not require such mechanisms (more precisely, close is still required for "short-circuiting terminal operations" like findFirst).
For the adaptation, I can use some kind of graph wrapper with a no-op transactional mechanism for my implementations.


Let me remind you that I have neither implementations nor tests of full-fledged ACID.
The goal of the project is to provide thread-safe implementations, and any transactional graph must also be thread-safe.
Therefore, comparing such graphs is quite valid.

@sszuev
Contributor Author

sszuev commented Jan 15, 2024

@afs
after adding start/end/commit I no longer see failures in 5.0.0-SNAPSHOT,
but the tests started to fail on the new code for version 4.10.0 (calling toList in a transaction). As far as I understand, these are bugs in the old version of GraphTxn, so that is OK.

@arne-bdt
Contributor

@afs I did the same thing :-)
But the estimated runtime for the benchmarks is > 12 hours, so I did not wait for them to finish. I only ran the unit tests, which work fine.

@sszuev Your benchmarks seem to test for throughput in different scenarios. If I understand the code correctly, there is always one graph to start with, and then multiple threads doing almost the same operations (sometimes the triples depend on the thread or iteration number) working on that same graph over multiple iterations. Most of them mix read and write operations.
For all tests with write operations, the threads basically just wait until it is their turn, so all these operations are executed sequentially, but in an undetermined order.
Did I miss something?

In my implementation, the semaphore for the write locks is not fair (maybe it should be).
With your current benchmarks, my graph would need longer timeouts and/or a fair semaphore, I guess.

@sszuev
Contributor Author

sszuev commented Jan 16, 2024

The tests are based on various scenarios that are executed in multiple threads (via Java executors and Kotlin coroutines in the tests - it does not matter) and in cycles. A CyclicBarrier and a CountDownLatch provide simultaneous execution of the scenarios.
There are also checks for invariants.
This is a classic scheme; without barriers we can't be sure that execution is really parallel. A sketch of this skeleton follows below.
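
The skeleton of such a test looks roughly like this (a generic Java sketch, not the project's actual Kotlin code):

import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BarrierTestSketch {
    public static void main(String[] args) throws Exception {
        int threads = 5, iterations = 5;
        CyclicBarrier barrier = new CyclicBarrier(threads);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Callable<Void> scenario = () -> {
            for (int i = 0; i < iterations; i++) {
                barrier.await(); // all threads enter each round together
                // ... run the scenario's graph operations and check invariants here ...
            }
            return null;
        };
        List<Future<Void>> results = pool.invokeAll(Collections.nCopies(threads, scenario));
        for (Future<Void> f : results) {
            f.get(); // re-throws any failure from a worker thread
        }
        pool.shutdown();
    }
}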
The scenarios are different: some are read-only, some write-only, most are read-write.
These scenarios were written without any particular understanding of what needs to be tested, i.e. they are sets of operations collected into one scenario based on intuition.

The benchmarks also contain functional, i.e. classic, benchmarks, where a single operation is measured.


@arne-bdt I added your implementation to the benchmark and test suites;
see the branch jena-5.x.x-SWMR: https://github.com/sszuev/concurrent-rdf-graph/tree/jena-5.x.x-SWMR
Note that I changed the project version to 5.0.0-SNAPSHOT-SWMR in order to work with the official 5.0.0-SNAPSHOT and your locally installed branch without re-installing.


I ran the benchmarks with the parameters -wi 2 -i 3; total time: 08:00:12.
The classic SYNCHRONIZED_GRAPH_V1 is the fastest one; WRAPPER_TRANSACTIONAL2_GRAPH is in last place.
SYNCHRONIZED_GRAPH_V2 is in second place; this is an advanced implementation which does not store the iterator's data in memory.

An important note: there are failures when running WRAPPER_TRANSACTIONAL2_GRAPH, so not all benchmarks have been collected.
See the attached logs.

Benchmark                                                                         (factory)   Mode  Cnt         Score        Error  Units
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_D_W_6x21                         TXN_GRAPH  thrpt   15        64,930 ±      7,678  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_D_W_6x21             SYNCHRONIZED_GRAPH_V1  thrpt   15       232,215 ±     38,918  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_D_W_6x21             SYNCHRONIZED_GRAPH_V2  thrpt   15       241,507 ±     26,096  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_D_W_6x21               RW_LOCKING_GRAPH_V1  thrpt   15       231,456 ±     49,945  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_D_W_6x21               RW_LOCKING_GRAPH_V2  thrpt   15       232,684 ±     34,847  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_D_W_6x21      WRAPPER_TRANSACTIONAL2_GRAPH  thrpt   15         0,069 ±      0,001  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_F_RW_5x4                         TXN_GRAPH  thrpt   15         6,328 ±      0,291  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_F_RW_5x4             SYNCHRONIZED_GRAPH_V1  thrpt   15         4,606 ±      1,331  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_F_RW_5x4             SYNCHRONIZED_GRAPH_V2  thrpt   15         2,954 ±      0,220  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_F_RW_5x4               RW_LOCKING_GRAPH_V1  thrpt   15         4,845 ±      0,465  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_F_RW_5x4               RW_LOCKING_GRAPH_V2  thrpt   15         4,765 ±      0,541  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_F_RW_5x4      WRAPPER_TRANSACTIONAL2_GRAPH  thrpt   15         0,067 ±      0,001  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_G_R_4x11                         TXN_GRAPH  thrpt   15         1,131 ±      0,016  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_G_R_4x11             SYNCHRONIZED_GRAPH_V1  thrpt   15       159,767 ±     17,675  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_G_R_4x11             SYNCHRONIZED_GRAPH_V2  thrpt   15       156,280 ±     16,911  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_G_R_4x11               RW_LOCKING_GRAPH_V1  thrpt   15       161,223 ±     26,616  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_G_R_4x11               RW_LOCKING_GRAPH_V2  thrpt   15       158,052 ±     20,801  ops/s
BigGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_G_R_4x11      WRAPPER_TRANSACTIONAL2_GRAPH  thrpt   15       174,225 ±     40,285  ops/s
FunctionalBenchmarks.ADD                                              SYNCHRONIZED_GRAPH_V1  thrpt   15   9507833,994 ± 869229,620  ops/s
FunctionalBenchmarks.ADD                                              SYNCHRONIZED_GRAPH_V2  thrpt   15   9364786,667 ± 375206,535  ops/s
FunctionalBenchmarks.ADD                                                RW_LOCKING_GRAPH_V1  thrpt   15   9484691,317 ± 181337,215  ops/s
FunctionalBenchmarks.ADD                                                RW_LOCKING_GRAPH_V2  thrpt   15   9560358,404 ± 215175,600  ops/s
FunctionalBenchmarks.ADD                                                          TXN_GRAPH  thrpt   15    578039,756 ±  19580,417  ops/s
FunctionalBenchmarks.ADD                                                          MEM_GRAPH  thrpt   15  10340640,978 ± 728912,121  ops/s
FunctionalBenchmarks.ADD                                       WRAPPER_TRANSACTIONAL2_GRAPH  thrpt   15      3252,095 ±    781,614  ops/s
FunctionalBenchmarks.CONTAINS                                         SYNCHRONIZED_GRAPH_V1  thrpt   15  13899056,559 ± 739607,369  ops/s
FunctionalBenchmarks.CONTAINS                                         SYNCHRONIZED_GRAPH_V2  thrpt   15  13335406,849 ± 775383,488  ops/s
FunctionalBenchmarks.CONTAINS                                           RW_LOCKING_GRAPH_V1  thrpt   15  12737718,035 ± 264032,537  ops/s
FunctionalBenchmarks.CONTAINS                                           RW_LOCKING_GRAPH_V2  thrpt   15  12967674,455 ± 733900,276  ops/s
FunctionalBenchmarks.CONTAINS                                                     TXN_GRAPH  thrpt   15   1011517,407 ±  35875,156  ops/s
FunctionalBenchmarks.CONTAINS                                                     MEM_GRAPH  thrpt   15  14939271,773 ± 139467,875  ops/s
FunctionalBenchmarks.CONTAINS                                  WRAPPER_TRANSACTIONAL2_GRAPH  thrpt   15      3733,116 ±   1102,551  ops/s
FunctionalBenchmarks.COUNT                                            SYNCHRONIZED_GRAPH_V1  thrpt   15  20713402,946 ± 495506,189  ops/s
FunctionalBenchmarks.COUNT                                            SYNCHRONIZED_GRAPH_V2  thrpt   15  20743633,609 ± 276745,250  ops/s
FunctionalBenchmarks.COUNT                                              RW_LOCKING_GRAPH_V1  thrpt   15  18677039,197 ± 422703,739  ops/s
FunctionalBenchmarks.COUNT                                              RW_LOCKING_GRAPH_V2  thrpt   15  18545840,762 ± 583214,365  ops/s
FunctionalBenchmarks.COUNT                                                        TXN_GRAPH  thrpt   15    113679,704 ±   2759,275  ops/s
FunctionalBenchmarks.COUNT                                                        MEM_GRAPH  thrpt   15  22498971,119 ± 182178,216  ops/s
FunctionalBenchmarks.DELETE                                           SYNCHRONIZED_GRAPH_V1  thrpt   15   6531783,863 ± 111434,950  ops/s
FunctionalBenchmarks.DELETE                                           SYNCHRONIZED_GRAPH_V2  thrpt   15   5980168,481 ± 138766,198  ops/s
FunctionalBenchmarks.DELETE                                             RW_LOCKING_GRAPH_V1  thrpt   15   6004025,859 ± 188190,779  ops/s
FunctionalBenchmarks.DELETE                                             RW_LOCKING_GRAPH_V2  thrpt   15   5828131,807 ±  71189,885  ops/s
FunctionalBenchmarks.DELETE                                                       TXN_GRAPH  thrpt   15    457443,775 ±   2200,891  ops/s
FunctionalBenchmarks.DELETE                                                       MEM_GRAPH  thrpt   15   6794662,701 ± 141559,073  ops/s
FunctionalBenchmarks.DELETE                                    WRAPPER_TRANSACTIONAL2_GRAPH  thrpt   15      3492,775 ±    628,057  ops/s
FunctionalBenchmarks.FIND_ALL                                         SYNCHRONIZED_GRAPH_V1  thrpt   15   2436745,158 ±  47707,699  ops/s
FunctionalBenchmarks.FIND_ALL                                         SYNCHRONIZED_GRAPH_V2  thrpt   15    785148,924 ±  12140,536  ops/s
FunctionalBenchmarks.FIND_ALL                                           RW_LOCKING_GRAPH_V1  thrpt   15    777309,448 ±   8278,041  ops/s
FunctionalBenchmarks.FIND_ALL                                           RW_LOCKING_GRAPH_V2  thrpt   15    780578,613 ±   6091,941  ops/s
FunctionalBenchmarks.FIND_ALL                                                     TXN_GRAPH  thrpt   15    113831,962 ±   5882,211  ops/s
FunctionalBenchmarks.FIND_ALL                                                     MEM_GRAPH  thrpt   15   4071771,498 ±  48533,407  ops/s
FunctionalBenchmarks.FIND_ALL                                  WRAPPER_TRANSACTIONAL2_GRAPH  thrpt    3      4205,814 ±  12821,396  ops/s
FunctionalBenchmarks.FIND_SOME                                        SYNCHRONIZED_GRAPH_V1  thrpt   15   9878544,973 ± 130398,809  ops/s
FunctionalBenchmarks.FIND_SOME                                        SYNCHRONIZED_GRAPH_V2  thrpt   15   1895768,662 ±  21455,858  ops/s
FunctionalBenchmarks.FIND_SOME                                          RW_LOCKING_GRAPH_V1  thrpt   15   1868735,058 ±  22717,077  ops/s
FunctionalBenchmarks.FIND_SOME                                          RW_LOCKING_GRAPH_V2  thrpt   15   1864042,205 ±  33819,169  ops/s
FunctionalBenchmarks.FIND_SOME                                                    TXN_GRAPH  thrpt   15    821360,951 ±  17775,343  ops/s
FunctionalBenchmarks.FIND_SOME                                                    MEM_GRAPH  thrpt   15  16829918,234 ± 190392,137  ops/s
FunctionalBenchmarks.FIND_SOME                                 WRAPPER_TRANSACTIONAL2_GRAPH  thrpt    3      2870,380 ±  10246,865  ops/s
FunctionalBenchmarks.MIXED_OPERATIONS                                 SYNCHRONIZED_GRAPH_V1  thrpt   15    429112,584 ±   5577,188  ops/s
FunctionalBenchmarks.MIXED_OPERATIONS                                 SYNCHRONIZED_GRAPH_V2  thrpt   15    166994,601 ±   2825,367  ops/s
FunctionalBenchmarks.MIXED_OPERATIONS                                   RW_LOCKING_GRAPH_V1  thrpt   15    164384,369 ±   2256,327  ops/s
FunctionalBenchmarks.MIXED_OPERATIONS                                   RW_LOCKING_GRAPH_V2  thrpt   15    163507,147 ±   1040,422  ops/s
FunctionalBenchmarks.MIXED_OPERATIONS                                             TXN_GRAPH  thrpt   15     39756,133 ±   1235,483  ops/s
FunctionalBenchmarks.MIXED_OPERATIONS                                             MEM_GRAPH  thrpt   15    502636,605 ±  11395,114  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_A_RW_5x6                       TXN_GRAPH  thrpt   15       173,808 ±      6,032  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_A_RW_5x6           SYNCHRONIZED_GRAPH_V1  thrpt   15       416,793 ±      4,074  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_A_RW_5x6           SYNCHRONIZED_GRAPH_V2  thrpt   15       196,966 ±      2,568  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_A_RW_5x6             RW_LOCKING_GRAPH_V1  thrpt   15       176,334 ±      1,397  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_A_RW_5x6             RW_LOCKING_GRAPH_V2  thrpt   15       179,059 ±      6,782  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_B_R_5x5                        TXN_GRAPH  thrpt   15       216,288 ±      5,114  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_B_R_5x5            SYNCHRONIZED_GRAPH_V1  thrpt   15       278,412 ±      3,210  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_B_R_5x5            SYNCHRONIZED_GRAPH_V2  thrpt   15       219,668 ±      1,491  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_B_R_5x5              RW_LOCKING_GRAPH_V1  thrpt   15       266,005 ±      3,070  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_B_R_5x5              RW_LOCKING_GRAPH_V2  thrpt   15       267,921 ±      2,225  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_B_R_5x5     WRAPPER_TRANSACTIONAL2_GRAPH  thrpt    6       345,026 ±     98,806  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_C_RW_8x14                      TXN_GRAPH  thrpt   15        83,603 ±      0,789  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_C_RW_8x14          SYNCHRONIZED_GRAPH_V1  thrpt   15       422,064 ±     10,011  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_C_RW_8x14          SYNCHRONIZED_GRAPH_V2  thrpt   15       123,621 ±      1,448  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_C_RW_8x14            RW_LOCKING_GRAPH_V1  thrpt   15        83,957 ±      1,430  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_C_RW_8x14            RW_LOCKING_GRAPH_V2  thrpt   15        82,960 ±      1,118  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_C_RW_8x14   WRAPPER_TRANSACTIONAL2_GRAPH  thrpt   15         5,725 ±      0,117  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_DC_RW_6x14                     TXN_GRAPH  thrpt   15        47,662 ±      1,784  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_DC_RW_6x14         SYNCHRONIZED_GRAPH_V1  thrpt   15       273,537 ±      8,762  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_DC_RW_6x14         SYNCHRONIZED_GRAPH_V2  thrpt   15        88,320 ±      1,513  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_DC_RW_6x14           RW_LOCKING_GRAPH_V1  thrpt   15        58,757 ±      0,697  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_DC_RW_6x14           RW_LOCKING_GRAPH_V2  thrpt   15        58,869 ±      0,793  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_DC_RW_6x14  WRAPPER_TRANSACTIONAL2_GRAPH  thrpt   12         4,383 ±      0,085  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_D_W_5x5                        TXN_GRAPH  thrpt   15       301,122 ±     27,625  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_D_W_5x5            SYNCHRONIZED_GRAPH_V1  thrpt   15      1217,857 ±     11,667  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_D_W_5x5            SYNCHRONIZED_GRAPH_V2  thrpt   15      1190,543 ±     15,668  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_D_W_5x5              RW_LOCKING_GRAPH_V1  thrpt   15      1202,480 ±     20,628  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_D_W_5x5              RW_LOCKING_GRAPH_V2  thrpt   15      1201,385 ±     16,531  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_D_W_5x5     WRAPPER_TRANSACTIONAL2_GRAPH  thrpt   15       200,902 ±      7,643  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_H_RW_4x8                       TXN_GRAPH  thrpt   15       140,139 ±      3,358  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_H_RW_4x8           SYNCHRONIZED_GRAPH_V1  thrpt   15       215,947 ±      2,394  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_H_RW_4x8           SYNCHRONIZED_GRAPH_V2  thrpt   15       126,584 ±      1,533  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_H_RW_4x8             RW_LOCKING_GRAPH_V1  thrpt   15       126,836 ±      1,002  ops/s
PizzaGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_H_RW_4x8             RW_LOCKING_GRAPH_V2  thrpt   15       127,282 ±      1,476  ops/s
SmallGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_K_RW_5x5                       TXN_GRAPH  thrpt   15       546,588 ±      4,167  ops/s
SmallGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_K_RW_5x5           SYNCHRONIZED_GRAPH_V1  thrpt   15      2224,895 ±     30,334  ops/s
SmallGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_K_RW_5x5           SYNCHRONIZED_GRAPH_V2  thrpt   15      1599,416 ±     14,426  ops/s
SmallGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_K_RW_5x5             RW_LOCKING_GRAPH_V1  thrpt   15      1384,214 ±     31,430  ops/s
SmallGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_K_RW_5x5             RW_LOCKING_GRAPH_V2  thrpt   15      1380,612 ±     22,594  ops/s
SmallGraphConcurrentBenchmarks.CONCURRENT_SCENARIO_K_RW_5x5    WRAPPER_TRANSACTIONAL2_GRAPH  thrpt    8        61,845 ±      6,815  ops/s

benchmarks-160124-01.txt

@arne-bdt
Contributor

@sszuev
Thank you very much for your benchmark and test suite! It helped me find a race condition in my code.

From my understanding, your tests and benchmarks are tailored for graphs that remain stable when accessed concurrently by multiple threads. However, I'm puzzled about their practicality without transaction support. For instance, consider the following code snippet, which lacks a transactional context and therefore has no guaranteed consistency between successive method calls to the graph:

var verTriple = g.find(versionTriple.getSubject(), versionTriple.getPredicate(), Node.ANY).next();
final var ver = (Integer) verTriple.getObject().getLiteralValue() + 1;
g.add(versionTriple.getSubject(), versionTriple.getPredicate(), NodeFactory.createLiteralByValue(ver));
g.delete(verTriple);

In my branch, transactional, I've incorporated a transaction context around your benchmark and test code where it seemed fitting.

The wrapper is coded like this:

internal fun execInWriteTransaction(graph: Graph, runnable: Runnable) {
    if (graph is Transactional) {
        if (graph.isInTransaction) {
            runnable.run()
            return
        }
        graph.begin(ReadWrite.WRITE)
    }
    try {
        runnable.run()
        if (graph is Transactional) {
            graph.commit()
        }
    } catch (e: Exception) {
        if (graph is Transactional) {
            graph.abort()
        }
        throw e
    }
}

Below are my benchmark results for GRAPH_WRAPPER_TRANSACTIONAL and TXN_GRAPH:
[screenshot: benchmark results for GRAPH_WRAPPER_TRANSACTIONAL and TXN_GRAPH]

They look pretty bad for GRAPH_WRAPPER_TRANSACTIONAL and do not match my experience in real-life scenarios.
Interestingly, except for one unit test, GRAPH_WRAPPER_TRANSACTIONAL demonstrates commendable performance in all your tests.

@sszuev
Contributor Author

sszuev commented Jan 19, 2024

However, I'm puzzled about their practicality without transaction support. For instance, consider the following code snippet, which lacks a transactional context and therefore has no guaranteed consistency between successive method calls to the graph

Practical use is discussed above. I suggested considering adding a new piece of functionality to Jena - a concurrent graph. It is needed for ONT-API. Additionally, I have provided the answer from ChatGPT about when a non-transactional concurrent graph can be used.

In short, the relationship between a concurrent and a transactional graph is similar to the relationship between the JDK's ConcurrentHashMap and Redis.
We still use ConcurrentHashMap, although there is no guarantee of consistency.
Moreover, based on a concurrent graph, you can build a cache, or even a transactional graph.

The wrapper is coded like this

This code is already present in the jena-5.x.x & jena-5.x.x-SWMR branches; otherwise I would not be able to collect benchmarks due to JenaTransactionException:

It is done via Kotlin's extension functions,
see https://github.com/sszuev/concurrent-rdf-graph/blob/jena-5.x.x-SWMR/src/test/kotlin/TestGraphs.kt#L83 :

fun <X> Graph.transactionWrite(action: Graph.() -> X): X {
    try {
        startWrite()
        return action()
    } finally {
        endWrite()
    }
}
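
// Note: startWrite()/endWrite() (not shown in this excerpt) presumably mirror
// startRead()/endRead() below, using TxnType.WRITE plus commit() before end().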

private fun Graph.startRead() {
    if (this is Transactional) {
        this.begin(TxnType.READ)
    }
}

private fun Graph.endRead() {
    if (this is Transactional) {
        this.end()
    }
}

fun createFrom(source: Graph): Graph {
    val res = createNew()
    res.transactionWrite {
        source.find().forEach { res.add(it) }
    }
    return res
}

@arne-bdt
Contributor

@sszuev
I am truly sorry that I needlessly raised the topic of transactions again.
(I hadn't remembered it and hadn't re-read the thread before replying).
Thank you for your patient explanation and response.
My transactional implementation is, as the benchmarks show, a very poor choice for simple thread safety.
I was hoping the benchmark suite would also be suitable, with a few tweaks, for comparing transactional graphs; unfortunately, this does not seem to be the case.

The idea of treating your thread-safe graphs like concurrent collections has also inspired me for my current projects at work. I want to cache real-time measurements in graphs. Maybe I don't need transaction safety in this scenario, and then hopefully your graphs and benchmarks will be helpful. Thanks again!
