GraphMem2 #1912
Labels
enhancement
Incrementally add new feature
Comments
arne-bdt
pushed a commit
to arne-bdt/jena
that referenced
this issue
Jun 22, 2023
New in-memory, general-purpose, non-transactional graphs as successors of GraphMem:

All variants strictly use term-equality and do not support Iterator#remove.
(GraphMem uses value-equality for object nodes)

GraphMem2Legacy:
- Purpose: Use this graph implementation if you want to maintain the 'old' behavior of GraphMem or if your memory constraints prevent you from utilizing more memory-intensive solutions.
- Slightly improved performance compared to GraphMem
- Simplified implementation, primarily due to lack of support for Iterator#remove
- The heritage of GraphMem:
  - Same basic structure
  - Same memory consumption
  - Also based on HashCommon

GraphMem2Fast:
- Purpose: GraphMem2Fast is a strong candidate for becoming the new default in-memory graph in the upcoming Jena 5, thanks to its improved performance and relatively minor increase in memory usage.
- Faster than GraphMem2Legacy (specially Graph#add, Graph#find and Graph#stream)
- Memory consumption is about 6-35% higher than GraphMem2Legacy
- Maps and sets are not based on HashCommon, but use a faster custom alternative (only #remove is a bit slower)
- Benefits from multiple small optimizations
- The heritage of GraphMem:
  - Also uses 3 hash-maps indexed by subjects, predicates, and objects
  - Values of the maps also switch from arrays to hash sets for the triples

GraphMem2Roaring:
- Purpose: GraphMem2Roaring is ideal for handling extremely large graphs. If you frequently work with such massive data structures, this implementation could be your top choice.
- Graph#contains is faster than GraphMem2Fast
- Better performance than GraphMem2Fast for operations with triple matches for the pattern S_O, SP_, and _PO on large graphs, due to bit-operations to find intersecting triples
- Memory consumption is about 7-99% higher than GraphMem2Legacy
- Suitable for really large graphs like bsbm-5m.nt.gz, bsbm-25m.nt.gz, and possibly even larger
- Simple and straightforward implementation
- No heritage of GraphMem
- Internal structure:
  - One indexed hash set (same as GraphMem2Fast uses) that holds all triples
  - Three hash maps indexed by subjects, predicates, and objects with RoaringBitmaps as values
  - The bitmaps contain the indices of the triples in the central hash set

Other Changes:
- org.apache.jena.graph.test.TestGraph
  - added GraphMem2Fast, GraphMem2Legacy and GraphMem2Roaring to the suite
- GraphMem:
  - moved property "TripleStore store" from GraphMemBase to GraphMem --> needed this to make a clean GraphMem2, which also extends GraphMem but the TripleStore interface is slightly different.
- pom.xml:
  - added dependency roaringbitmap 0.9.44
- jena-benchmarks-jmh
  - added the three new graph implementations to the benchmarks
  - randomized the order of test data in some benchmarks to prevent them from showing order dependent behaviour
  - added benchmarks for sets and maps comparing
    - HashCommonSet vs. FastHashSet vs. Java HashSet
    - HashCommonMap vs. FastHashMap vs. Java HashMap
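The internal structure described for GraphMem2Roaring (one indexed set of triples plus per-term bitmaps of triple indices) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses `java.util.BitSet` from the standard library as a stand-in for RoaringBitmap, reduces triples to three strings, omits removal, and every class and method name here (`BitmapIndexSketch`, `findSP`, and so on) is hypothetical rather than taken from the Jena source.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the GraphMem2Roaring implementation.
final class BitmapIndexSketch {
    // A triple reduced to three strings for the sake of the example.
    record Triple(String s, String p, String o) {}

    // Central store: a triple's position in this list is its index.
    private final List<Triple> triples = new ArrayList<>();
    private final Map<Triple, Integer> indexOf = new HashMap<>();

    // One bitmap per term; bit i is set when triples.get(i) uses that term.
    // BitSet stands in for RoaringBitmap here (assumption for self-containment).
    private final Map<String, BitSet> bySubject = new HashMap<>();
    private final Map<String, BitSet> byPredicate = new HashMap<>();
    private final Map<String, BitSet> byObject = new HashMap<>();

    // Adds a triple and records its index in all three bitmap maps.
    boolean add(Triple t) {
        if (indexOf.containsKey(t)) {
            return false; // already present
        }
        int i = triples.size();
        triples.add(t);
        indexOf.put(t, i);
        bySubject.computeIfAbsent(t.s(), k -> new BitSet()).set(i);
        byPredicate.computeIfAbsent(t.p(), k -> new BitSet()).set(i);
        byObject.computeIfAbsent(t.o(), k -> new BitSet()).set(i);
        return true;
    }

    // Containment is a single lookup in the central set.
    boolean contains(Triple t) {
        return indexOf.containsKey(t);
    }

    // Two-term patterns are answered by intersecting two bitmaps.
    List<Triple> findSP(String s, String p) {
        return intersect(bySubject.get(s), byPredicate.get(p));
    }

    List<Triple> findPO(String p, String o) {
        return intersect(byPredicate.get(p), byObject.get(o));
    }

    List<Triple> findSO(String s, String o) {
        return intersect(bySubject.get(s), byObject.get(o));
    }

    // Bitwise AND of two bitmaps, then map the surviving indices back to triples.
    private List<Triple> intersect(BitSet a, BitSet b) {
        List<Triple> result = new ArrayList<>();
        if (a == null || b == null) {
            return result; // one of the terms never occurs
        }
        BitSet both = (BitSet) a.clone();
        both.and(b);
        for (int i = both.nextSetBit(0); i >= 0; i = both.nextSetBit(i + 1)) {
            result.add(triples.get(i));
        }
        return result;
    }
}
```

In this sketch the cost of an SP_, _PO, or S_O lookup is dominated by the bitwise AND, which is the kind of bit-operation the commit message credits for the better performance on large graphs.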
arne-bdt
pushed a commit
to arne-bdt/jena
that referenced
this issue
Jun 25, 2023
arne-bdt
pushed a commit
to arne-bdt/jena
that referenced
this issue
Jun 25, 2023
arne-bdt
pushed a commit
to arne-bdt/jena
that referenced
this issue
Jun 25, 2023
arne-bdt
pushed a commit
to arne-bdt/jena
that referenced
this issue
Jun 27, 2023
arne-bdt
pushed a commit
to arne-bdt/jena
that referenced
this issue
Jun 27, 2023
afs
added a commit
that referenced
this issue
Jun 28, 2023
afs
added a commit
to afs/jena
that referenced
this issue
Jun 28, 2023
afs
added a commit
to afs/jena
that referenced
this issue
Jun 28, 2023
afs
added a commit
to afs/jena
that referenced
this issue
Jun 28, 2023
afs
added a commit
to afs/jena
that referenced
this issue
Jun 28, 2023
afs
added a commit
that referenced
this issue
Jun 30, 2023
GH-1912: GraphMemFactory functions for GraphMem2
cnanjo
pushed a commit
to fhircat/jena
that referenced
this issue
Mar 2, 2024
cnanjo
pushed a commit
to fhircat/jena
that referenced
this issue
Mar 2, 2024
cnanjo
pushed a commit
to fhircat/jena
that referenced
this issue
Mar 2, 2024
Version
4.9.0.SNAPSHOT
Feature
New in-memory, general-purpose, non-transactional graphs as successors of GraphMem:
All variants strictly use term-equality and do not support Iterator#remove. (GraphMem uses value-equality for object nodes)
(see also #1867 (comment))
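To make the difference between term-equality and value-equality concrete, here is a minimal sketch. It does not use Jena's node classes; `Literal`, `sameTerm`, and `sameValue` are hypothetical names, and value comparison is only sketched for decimal-like literals via `BigDecimal`.

```java
import java.math.BigDecimal;

// Hypothetical illustration; Jena's own Node and literal classes are not used here.
final class EqualitySketch {
    // Simplified literal: lexical form plus a datatype name.
    record Literal(String lexicalForm, String datatype) {}

    // Term-equality: two literals are the same term only if
    // lexical form and datatype match exactly.
    static boolean sameTerm(Literal a, Literal b) {
        return a.equals(b);
    }

    // Value-equality (sketched for decimal-like literals only):
    // compare the numbers the lexical forms denote.
    static boolean sameValue(Literal a, Literal b) {
        BigDecimal x = new BigDecimal(a.lexicalForm());
        BigDecimal y = new BigDecimal(b.lexicalForm());
        return x.compareTo(y) == 0;
    }
}
```

With `"1.0"` and `"1.00"` as decimals, `sameTerm` returns false while `sameValue` returns true. Under strict term-equality, a graph would treat the two as distinct object nodes.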
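GraphMem2Legacy
- Slightly improved performance compared to GraphMem
- Simplified implementation, primarily due to lack of support for Iterator#remove
- Same basic structure and memory consumption as GraphMem; also based on HashCommon
GraphMem2Fast
- Faster than GraphMem2Legacy (especially Graph#add, Graph#find and Graph#stream)
- Maps and sets are not based on HashCommon, but use a faster custom alternative (only #remove is a bit slower)
GraphMem2Roaring
- Graph#contains is faster than GraphMem2Fast
- Better performance than GraphMem2Fast for the triple patterns S_O, SP_, and _PO on large graphs, due to bit-operations to find intersecting triples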
Memory consumption
I measured the memory consumption before and after adding a previously loaded list of triples to the graph. Therefore, the measurement should only account for the additional memory used by the graph's indexing structures, not the space occupied by the triples themselves:
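A before/after heap measurement of this kind might look roughly like the sketch below. It is not the benchmark code used for the issue: a `HashSet` of strings stands in for a graph, `System.gc()` is only a hint to the JVM, and the names `MemoryDeltaSketch`, `usedHeapBytes`, and `measureIndexOverhead` are hypothetical. Results from such a measurement are approximate and vary between runs.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of a before/after heap measurement.
final class MemoryDeltaSketch {
    // Approximate heap currently in use; System.gc() is only a request.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    // The elements are created by the caller beforehand, so the difference
    // should mostly reflect the index structure, not the elements themselves.
    static long measureIndexOverhead(List<String> preloaded) {
        long before = usedHeapBytes();
        Set<String> index = new HashSet<>(preloaded); // stand-in for adding to a graph
        long after = usedHeapBytes();
        // Touch the index after measuring so it stays reachable until here.
        if (index.size() > preloaded.size()) {
            throw new IllegalStateException("index cannot hold more than was added");
        }
        return after - before;
    }
}
```

Loading the data before the first reading is what keeps the elements out of the difference, matching the intent described above of counting only the indexing structures.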
Graph#add
Graph#find ANY ANY ANY
(Note: This is not a fair comparison with other 'find' patterns because GraphMem2Roaring uses a single set to return the results. Additionally, while GraphMem2Legacy has faster iterators, it does not offer faster streaming than GraphMem.)
Would these three graph implementations be appreciated by the Jena community?
(If so, I will still have to write some documentation and unit tests.)
Are you interested in contributing a solution yourself?
Yes