-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add possibility to execute multiple slice queries together to KeyColumnValueStore [cql-tests] [tp-tests] #3825
Add possibility to execute multiple slice queries together to KeyColumnValueStore [cql-tests] [tp-tests] #3825
Conversation
628b54b
to
b9678db
Compare
b9678db
to
50f0266
Compare
Benchmarks of
Current PR:
Conclusion: Another observation, even so |
039546b
to
9cb30f3
Compare
2f0db4d
to
3aabab5
Compare
Latest benchmarks.
This PR:
Conclusion: Ideally, we need to aim to reach the same performance over fetching all vertices with a single slice query vs fetching all properties separately during same fully loaded CQL driver. I.e. The follow up performance improvement issue is #3816 , but that issue is less trivial. |
I would like to merge this PR using lazy consensus on Thursday. In case anyone needs more time for review, please, let me know. |
3aabab5
to
c7eb33d
Compare
Made some refactoring and improved queries grouping in Re-ran benchmarks just in case. The benchmarks stayed the same as in the previous run. Latest benchmark run for this PR:
|
c7eb33d
to
2e2723d
Compare
…mnValueStore [cql-tests] [tp-tests] - Adds possibility to execute groups of same key sets slice queries together - Implement parallel execution of all provided Slice queries to CQL storage backend - Adds a basic implementation (i.e. current non-optimized implementation) to any other storage backend which doesn't have optimized implementation right now - Adds queries grouping algorithm (`MultiSliceQueriesGroupingUtil`) (required for JanusGraph#3816 and also mimimizes keys duplicate collections creation) Fixes JanusGraph#3824 Related to JanusGraph#3816 Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
2e2723d
to
1da8876
Compare
Merging by following lazy consensus. |
Follow-up to JanusGraph#3825 Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
Follow-up to #3825 Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
What this PR does:
Detailed explanation:
As for now when JanusGraph performs multi-query it executes slice queries for multiple vertices one by one.
I.e. assuming multi-query is enabled
g.V(v1,v2,v3,v4,v5).values("prop1", "prop2", "prop3")
this step fetchesprop1
for all provided vertices (v1
,v2
,v3
,v4
,v5
), awaits for the result, then sends another slice query to fetchprop2
for the same set of vertices, then awaits for the result again and sends the last slice query to fetchprop3
for the provided set of vertices.This behavior currently exists in
master
branch as well as in the PR #3803 forrequired_properties_only
mode. However, #3803 PR adds another modeall_properties
which fetches all vertex properties in a single slice query which is usually much faster. The downside ofall_properties
mode is that it fetches all properties of the vertex and not only requested properties.This PR will change the way we execute multi-queries and instead of processing all slice queries one by one - we send all the necessary slice queries as well as all the necessary keys (vertices) to the storage implementation. Now the storage implementation is going to decide how those queries should be executed.
First obvious CQL optimization I did - execute all those slice queries in parallel. Thus, for CQL storage backend we will not wait for the previous property to be fetched anymore. Instead, we will fetch all properties in parallel which should significantly speedup properties fetching for cases when several specific properties are requested and the user uses
required_properties_only
mode.However, I'm adding this optimization to CQL storage implementation only as for now. All other storage backends won't see any difference (i.e. if you use HBase then the properties fetching using
required_properties_only
mode will be the same as currently inmaster
- non-optimized / blocking). We can improve other storage backends later as well.There are 2 breaking changes expected:
KeyColumnValueStore
adds a new method which storage backend implementations (adapters) will need to implement. However, I'm adding the utility method (KeyColumnValueStoreUtil.getMultiSliceNonOptimized
) which can be used to have the same implementation as it's currently onmaster
branch (i.e. non-optimized version of the method). This should help any storage adapter developers to quickly implement this method in case they are not interested in multi-slice optimizations.Fixes #3824
Related to #3816
Thank you for contributing to JanusGraph!
In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:
For all changes:
master
)?For code changes:
For documentation related changes: