Add possibility to execute multiple slice queries together to KeyColumnValueStore [cql-tests] [tp-tests] #3825

porunov · 2023-06-14T20:46:52Z

What this PR does:

Adds the ability to execute groups of slice queries that share the same key set together
Implements parallel execution of all provided slice queries for the CQL storage backend
Adds a basic implementation (i.e. the current non-optimized implementation) for every other storage backend that doesn't have an optimized implementation yet

Detailed explanation:
Currently, when JanusGraph performs a multi-query, it executes the slice queries for multiple vertices one by one.
For example, with multi-query enabled, the step g.V(v1,v2,v3,v4,v5).values("prop1", "prop2", "prop3") fetches prop1 for all provided vertices (v1, v2, v3, v4, v5) and waits for the result, then sends another slice query to fetch prop2 for the same set of vertices and waits again, and finally sends the last slice query to fetch prop3 for the same set of vertices.
This behavior currently exists in the master branch as well as in PR #3803 for the required_properties_only mode. However, PR #3803 adds another mode, all_properties, which fetches all vertex properties in a single slice query and is usually much faster. The downside of the all_properties mode is that it fetches all properties of the vertex, not only the requested ones.

This PR changes the way we execute multi-queries: instead of processing all slice queries one by one, we send all the necessary slice queries together with all the necessary keys (vertices) to the storage implementation. The storage implementation then decides how those queries should be executed.
The first obvious CQL optimization I made is to execute all those slice queries in parallel. Thus, for the CQL storage backend we no longer wait for the previous property to be fetched. Instead, we fetch all properties in parallel, which should significantly speed up property fetching when several specific properties are requested and the user uses the required_properties_only mode.
However, for now I'm adding this optimization to the CQL storage implementation only. All other storage backends won't see any difference (i.e. if you use HBase, property fetching in required_properties_only mode will be the same as currently in master: non-optimized / blocking). We can improve the other storage backends later as well.
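
To illustrate the difference between the two execution styles described above, here is a minimal, self-contained sketch. It does not use any JanusGraph classes: `SliceFetchSketch`, `fetchSlice`, `fetchSequentially` and `fetchInParallel` are hypothetical names, and the "backend" is just an in-memory function. It only shows the shape of "wait after each slice query" versus "submit all slice queries and wait once".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a storage backend; not a JanusGraph class.
final class SliceFetchSketch {

    // Simulates one slice query (one property) executed for a set of vertex keys.
    static List<String> fetchSlice(String property, List<String> keys) {
        List<String> result = new ArrayList<>();
        for (String key : keys) {
            result.add(key + "." + property);
        }
        return result;
    }

    // Sequential style: each slice query is sent only after the previous one returned.
    static List<List<String>> fetchSequentially(List<String> properties, List<String> keys) {
        List<List<String>> results = new ArrayList<>();
        for (String property : properties) {
            results.add(fetchSlice(property, keys));
        }
        return results;
    }

    // Parallel style: all slice queries are submitted first, then the caller waits once.
    static List<List<String>> fetchInParallel(List<String> properties, List<String> keys) {
        ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, properties.size()));
        try {
            List<CompletableFuture<List<String>>> futures = new ArrayList<>();
            for (String property : properties) {
                futures.add(CompletableFuture.supplyAsync(() -> fetchSlice(property, keys), executor));
            }
            List<List<String>> results = new ArrayList<>();
            for (CompletableFuture<List<String>> future : futures) {
                results.add(future.join()); // keeps the order of the requested properties
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> keys = List.of("v1", "v2", "v3", "v4", "v5");
        List<String> properties = List.of("prop1", "prop2", "prop3");
        // Both styles return the same data; only the waiting pattern differs.
        System.out.println(fetchSequentially(properties, keys).equals(fetchInParallel(properties, keys)));
    }
}
```

In this sketch the results are identical in both styles; the expected benefit of the parallel style comes only from overlapping the round trips to the backend.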

There are 2 breaking changes expected:

KeyColumnValueStore gains a new method which storage backend implementations (adapters) will need to implement. However, I'm adding a utility method (KeyColumnValueStoreUtil.getMultiSliceNonOptimized) which provides the same behavior as the current master branch (i.e. a non-optimized version of the method). This should help storage adapter developers implement this method quickly if they are not interested in multi-slice optimizations.
As all multi-query slice queries are now executed together, we can't profile individual slice queries. Thus, for multi-slice queries, profiling is now grouped across all slice queries instead of being reported per slice query.
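
The first breaking change can be pictured with a simplified model. The sketch below deliberately does not reproduce the real `KeyColumnValueStore` signatures: `SimpleStore`, `NonOptimizedMultiSlice` and `InMemoryStore` are hypothetical names, and keys, queries and entries are plain strings. It only shows the idea of an adapter delegating the new multi-slice method to a shared non-optimized fallback that runs the slice queries one by one.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of a key-column-value store.
// The real KeyColumnValueStore interface uses different types and signatures.
interface SimpleStore {

    // Existing style of call: one slice query for many keys.
    Map<String, List<String>> getSlice(List<String> keys, String sliceQuery);

    // New style of call: several slice queries for the same key set.
    Map<String, Map<String, List<String>>> getMultiSlices(List<String> keys, List<String> sliceQueries);
}

// Hypothetical shared fallback, playing the role of a "non-optimized" utility.
final class NonOptimizedMultiSlice {

    // Runs the slice queries one after another and collects results per query.
    static Map<String, Map<String, List<String>>> getMultiSlices(
            SimpleStore store, List<String> keys, List<String> sliceQueries) {
        Map<String, Map<String, List<String>>> result = new LinkedHashMap<>();
        for (String query : sliceQueries) {
            result.put(query, store.getSlice(keys, query));
        }
        return result;
    }
}

// An adapter that is not interested in multi-slice optimizations.
final class InMemoryStore implements SimpleStore {

    @Override
    public Map<String, List<String>> getSlice(List<String> keys, String sliceQuery) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String key : keys) {
            result.put(key, List.of(key + ":" + sliceQuery));
        }
        return result;
    }

    @Override
    public Map<String, Map<String, List<String>>> getMultiSlices(List<String> keys, List<String> sliceQueries) {
        // Delegate to the shared fallback instead of writing a custom implementation.
        return NonOptimizedMultiSlice.getMultiSlices(this, keys, sliceQueries);
    }

    public static void main(String[] args) {
        System.out.println(new InMemoryStore().getMultiSlices(List.of("v1", "v2"), List.of("prop1", "prop2")));
    }
}
```

An optimized backend would replace the body of `getMultiSlices` with its own strategy while keeping the same contract.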

Fixes #3824
Related to #3816

Thank you for contributing to JanusGraph!

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

Is there an issue associated with this PR? Is it referenced in the commit message?
Does your PR body contain #xyz where xyz is the issue number you are trying to resolve?
Has your PR been rebased against the latest commit within the target branch (typically master)?
Is your initial contribution a single, squashed commit?

For code changes:

Have you written and/or updated unit tests to verify your changes?
If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
If applicable, have you updated the LICENSE.txt file, including the main LICENSE.txt file in the root of this repository?
If applicable, have you updated the NOTICE.txt file, including the main NOTICE.txt file found in the root of this repository?

For documentation related changes:

Have you ensured that format looks appropriate for the output in which it is rendered?

porunov · 2023-06-15T10:52:21Z

Benchmarks of CQLMultiQueryMultiSlicesBenchmark

master branch:

Benchmark                                                                                                          (verticesAmount)  Mode  Cnt     Score     Error  Units
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection                   5000  avgt    5    74.543 ±   7.149  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection                  50000  avgt    5   832.826 ±  48.409  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithUnlimitedBatch                                                     5000  avgt    5    68.687 ±   4.614  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithUnlimitedBatch                                                    50000  avgt    5   739.975 ±  16.342  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection              5000  avgt    5   350.945 ±  15.501  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection             50000  avgt    5  3684.055 ± 125.631  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithUnlimitedBatch                                                5000  avgt    5   278.037 ±   9.099  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithUnlimitedBatch                                               50000  avgt    5  3177.082 ± 127.442  ms/op

Current PR:

Benchmark                                                                                                          (verticesAmount)  Mode  Cnt     Score     Error  Units
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection                   5000  avgt    5    81.126 ±   5.636  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection                  50000  avgt    5   813.297 ±  24.664  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithUnlimitedBatch                                                     5000  avgt    5    67.659 ±   2.313  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithUnlimitedBatch                                                    50000  avgt    5   737.676 ±  18.572  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection              5000  avgt    5   266.972 ±  13.312  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection             50000  avgt    5  2796.589 ±  60.584  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithUnlimitedBatch                                                5000  avgt    5   269.096 ±   7.659  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithUnlimitedBatch                                               50000  avgt    5  3135.563 ± 174.468  ms/op

Conclusion:
As expected, getValuesAllPropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection and getValuesAllPropertiesWithUnlimitedBatch have the same performance in the master branch and in this PR.
We see that getValuesMultiplePropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection is 24% faster in this PR than in the master branch. This is because in this PR the CQL backend executes each slice query for each key in parallel, whereas in the master branch the slice queries are executed sequentially.
Why didn't we see any performance improvement for getValuesMultiplePropertiesWithUnlimitedBatch then? In this test performance is equal between the master branch and the current PR because CQL is configured to accept up to 1024 CQL queries in parallel: in the master branch we send 50000 CQL requests 10 times sequentially, and in this PR we send 500000 CQL requests in parallel.
Logically, if the driver can process only 1024 requests in parallel, it makes no difference whether we send 50000 CQL requests in 10 sequential batches or 500000 CQL requests at once.
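
The argument above can be checked with a little arithmetic. The sketch below uses an idealized model (an assumption, not how the driver actually schedules work): requests are processed in "waves" of at most 1024, and each wave takes the same time. `RequestWaves` and `waves` are hypothetical names; the numbers 50000, 10 and 1024 come from the explanation above.

```java
final class RequestWaves {

    // Number of waves needed to push `requests` through a driver that handles
    // at most `maxInFlight` requests at a time (idealized model).
    static long waves(long requests, long maxInFlight) {
        return (requests + maxInFlight - 1) / maxInFlight; // ceiling division
    }

    public static void main(String[] args) {
        long vertices = 50_000;
        long sliceQueries = 10;    // the 50000 requests are sent 10 times on master
        long maxInFlight = 1024;   // parallel request limit mentioned above

        // master: 10 sequential batches of 50000 requests each
        long sequentialBatches = sliceQueries * waves(vertices, maxInFlight);
        // this PR: one batch of 500000 requests
        long singleBatch = waves(vertices * sliceQueries, maxInFlight);

        System.out.println("master-style waves: " + sequentialBatches);
        System.out.println("PR-style waves:     " + singleBatch);
    }
}
```

Under this model the two approaches need 490 and 489 waves respectively, which is consistent with the benchmark showing no meaningful difference for the unlimited batch case.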

Another observation: even though getValuesMultiplePropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection is 24% faster than in the master branch, this test is still not as fast as getValuesAllPropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection (which is about 3.4 times faster).
This shows that sending a single CQL request which retrieves all data is more beneficial than sending multiple CQL requests which retrieve the same data. Thus, after this PR is merged, we should work on #3816 to group all slice queries into a single slice query for the respective keys.

porunov · 2023-06-17T16:41:48Z

Latest benchmarks.

master branch:

Benchmark                                                                                                          (verticesAmount)  Mode  Cnt      Score     Error  Units
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection                   5000  avgt    5     76.508 ±   6.666  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection                  50000  avgt    5    844.627 ±  45.378  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithUnlimitedBatch                                                     5000  avgt    5     68.332 ±   1.970  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithUnlimitedBatch                                                    50000  avgt    5    746.349 ±  26.421  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection              5000  avgt    5    352.094 ±  22.025  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection             50000  avgt    5   3776.992 ± 202.947  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithSmallBatch                                                    5000  avgt    5   1002.613 ±  46.442  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithSmallBatch                                                   50000  avgt    5  10208.184 ± 133.210  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithUnlimitedBatch                                                5000  avgt    5    289.104 ±  35.600  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithUnlimitedBatch                                               50000  avgt    5   3188.636 ± 270.826  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesThreePropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection                 5000  avgt    5    111.704 ±   8.093  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesThreePropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection                50000  avgt    5   1186.497 ±  69.885  ms/op
CQLMultiQueryMultiSlicesBenchmark.vertexCentricPropertiesFetching                                                              5000  avgt    5   4291.133 ± 144.476  ms/op
CQLMultiQueryMultiSlicesBenchmark.vertexCentricPropertiesFetching                                                             50000  avgt    5  42870.648 ± 978.082  ms/op

This PR:

Benchmark                                                                                                          (verticesAmount)  Mode  Cnt      Score     Error  Units
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection                   5000  avgt    5     76.598 ±   5.908  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection                  50000  avgt    5    838.959 ±  44.656  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithUnlimitedBatch                                                     5000  avgt    5     69.015 ±   1.115  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithUnlimitedBatch                                                    50000  avgt    5    748.797 ±  41.574  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection              5000  avgt    5    267.932 ±  16.651  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection             50000  avgt    5   2859.473 ± 111.065  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithSmallBatch                                                    5000  avgt    5    352.522 ±  12.514  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithSmallBatch                                                   50000  avgt    5   3740.955 ± 138.963  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithUnlimitedBatch                                                5000  avgt    5    272.325 ±   9.806  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithUnlimitedBatch                                               50000  avgt    5   3112.767 ± 324.889  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesThreePropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection                 5000  avgt    5     96.613 ±   5.981  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesThreePropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection                50000  avgt    5    993.698 ±  57.113  ms/op
CQLMultiQueryMultiSlicesBenchmark.vertexCentricPropertiesFetching                                                              5000  avgt    5   1002.535 ±  14.568  ms/op
CQLMultiQueryMultiSlicesBenchmark.vertexCentricPropertiesFetching                                                             50000  avgt    5  10329.903 ± 176.009  ms/op

Conclusion:
When the CQL driver is overloaded with other parallel requests, performance stays the same as in the master branch. However, performance improves considerably for multi-slice scenarios when the CQL driver is less busy.

Ideally, we should aim for the same performance when fetching all properties with a single slice query as when fetching each property separately under the same fully loaded CQL driver. I.e. getValuesMultiplePropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection and getValuesAllPropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection would ideally perform the same.
However, this isn't the case for now. In the master branch the difference between those tests is 4.6 times. In this PR the difference is 3.5 times. This PR is just a step toward achieving the same performance for these cases. Nevertheless, this PR brings at least a 24% improvement, and in less loaded scenarios the difference is even greater (e.g. 2.85 times for getValuesMultiplePropertiesWithSmallBatch or 4.2 times for vertexCentricPropertiesFetching).
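
For reference, the ratios quoted above can be recomputed from the 5000-vertex rows of the benchmark tables in this comment. The sketch below only does that arithmetic; `BenchmarkRatios` and its methods are hypothetical helper names, and the scores (in ms/op) are copied from the tables.

```java
import java.util.Locale;

final class BenchmarkRatios {

    // How many times slower `slower` is compared to `faster`.
    static double ratio(double slower, double faster) {
        return slower / faster;
    }

    // Relative improvement in percent when going from `before` to `after`.
    static double improvementPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Scores in ms/op for verticesAmount = 5000.
        double masterAllProps = 76.508, masterMultiProps = 352.094;
        double prAllProps = 76.598, prMultiProps = 267.932;
        double masterSmallBatch = 1002.613, prSmallBatch = 352.522;
        double masterVertexCentric = 4291.133, prVertexCentric = 1002.535;

        print("master: multiple vs all properties", ratio(masterMultiProps, masterAllProps));
        print("PR: multiple vs all properties", ratio(prMultiProps, prAllProps));
        print("small batch: master vs PR", ratio(masterSmallBatch, prSmallBatch));
        print("vertex-centric: master vs PR", ratio(masterVertexCentric, prVertexCentric));
        print("multiple properties improvement, %", improvementPercent(masterMultiProps, prMultiProps));
    }

    private static void print(String label, double value) {
        System.out.println(String.format(Locale.ROOT, "%-40s %.2f", label, value));
    }
}
```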

The follow-up performance improvement issue is #3816, but that issue is less trivial.

porunov · 2023-06-19T21:03:39Z

I would like to merge this PR using lazy consensus on Thursday. If anyone needs more time to review, please let me know.

porunov · 2023-06-21T22:12:59Z

Did some refactoring and improved query grouping in ExpirationKCVSCache so that queries are grouped by the key sets that actually need to be fetched when some of the initial keys are already cached for a query. In that case, re-grouping is triggered for any query whose initial key set group has changed. Thus, we now guarantee that all queries are grouped together when their key sets are the same (this works with the cache enabled or disabled).
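
The grouping idea can be sketched as follows. This is not the algorithm from this PR: `KeySetGroupingSketch` and `groupByRemainingKeys` are hypothetical names, and queries and keys are plain strings. The sketch only shows the principle that, after cached keys are removed, queries with equal remaining key sets end up in the same group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class KeySetGroupingSketch {

    // Groups query names by the set of keys that still has to be fetched
    // from the backend once the cached keys of each query are removed.
    static Map<Set<String>, List<String>> groupByRemainingKeys(
            Map<String, Set<String>> requestedKeysPerQuery,
            Map<String, Set<String>> cachedKeysPerQuery) {
        Map<Set<String>, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : requestedKeysPerQuery.entrySet()) {
            Set<String> remaining = new TreeSet<>(entry.getValue());
            remaining.removeAll(cachedKeysPerQuery.getOrDefault(entry.getKey(), Set.of()));
            if (remaining.isEmpty()) {
                continue; // fully served from the cache, nothing to send
            }
            groups.computeIfAbsent(remaining, k -> new ArrayList<>()).add(entry.getKey());
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> requested = new LinkedHashMap<>();
        requested.put("prop1", Set.of("v1", "v2", "v3"));
        requested.put("prop2", Set.of("v1", "v2", "v3"));
        requested.put("prop3", Set.of("v1", "v2", "v3"));

        // prop3 is already cached for v3, so its remaining key set differs.
        Map<String, Set<String>> cached = Map.of("prop3", Set.of("v3"));

        System.out.println(groupByRemainingKeys(requested, cached));
    }
}
```

In this example prop1 and prop2 stay in one group for (v1, v2, v3), while prop3 moves to a separate group for (v1, v2).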

Re-ran benchmarks just in case. The benchmarks stayed the same as in the previous run.

Latest benchmark run for this PR:

Benchmark                                                                                                          (verticesAmount)  Mode  Cnt      Score     Error  Units
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection                   5000  avgt    5     79.521 ±   2.086  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection                  50000  avgt    5    834.695 ±  48.625  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithUnlimitedBatch                                                     5000  avgt    5     68.136 ±   1.526  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesAllPropertiesWithUnlimitedBatch                                                    50000  avgt    5    732.760 ±  19.484  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection              5000  avgt    5    267.611 ±  11.612  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection             50000  avgt    5   2740.878 ±  84.204  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithSmallBatch                                                    5000  avgt    5    353.603 ±  23.581  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithSmallBatch                                                   50000  avgt    5   3682.449 ±  93.293  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithUnlimitedBatch                                                5000  avgt    5    271.618 ±  24.563  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesMultiplePropertiesWithUnlimitedBatch                                               50000  avgt    5   2924.436 ± 109.638  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesThreePropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection                 5000  avgt    5     95.498 ±   6.907  ms/op
CQLMultiQueryMultiSlicesBenchmark.getValuesThreePropertiesWithAllMultiQuerySlicesUnderMaxRequestsPerConnection                50000  avgt    5    995.552 ±  75.749  ms/op
CQLMultiQueryMultiSlicesBenchmark.vertexCentricPropertiesFetching                                                              5000  avgt    5   1005.805 ±  19.179  ms/op
CQLMultiQueryMultiSlicesBenchmark.vertexCentricPropertiesFetching                                                             50000  avgt    5  10373.545 ± 388.756  ms/op

…mnValueStore [cql-tests] [tp-tests]

- Adds possibility to execute groups of same key sets slice queries together
- Implement parallel execution of all provided Slice queries to CQL storage backend
- Adds a basic implementation (i.e. current non-optimized implementation) to any other storage backend which doesn't have optimized implementation right now
- Adds queries grouping algorithm (`MultiSliceQueriesGroupingUtil`) (required for JanusGraph#3816 and also minimizes keys duplicate collections creation)

Fixes JanusGraph#3824
Related to JanusGraph#3816

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

porunov · 2023-06-22T22:58:02Z

Merging by following lazy consensus.
The classes KeyMultiQuery and MultiKeyMultiQuery are redundant in this PR, but to avoid re-triggering the TP tests I will open a new PR with CTR to remove them.


porunov added kind/performance backport/skip labels Jun 14, 2023

porunov added this to the Release v1.0.0 milestone Jun 14, 2023

janusgraph-bot added the cla: external Externally-managed CLA label Jun 14, 2023

porunov mentioned this pull request Jun 14, 2023

Enable multiQuery optimization for PropertyMapStep and ElementMapStep [cql-tests] [tp-tests] #3803

Merged

9 tasks

porunov force-pushed the feature/multi-range-slice-queries branch from 628b54b to b9678db Compare June 15, 2023 01:12

porunov changed the title ~~Add possibility to execute multiple slice queries together to KeyColumnValueStore~~ Add possibility to execute multiple slice queries together to KeyColumnValueStore [cql-tests] [tp-tests] Jun 15, 2023

porunov force-pushed the feature/multi-range-slice-queries branch from b9678db to 50f0266 Compare June 15, 2023 09:53

porunov marked this pull request as ready for review June 15, 2023 11:00

porunov requested a review from a team June 15, 2023 11:00

porunov force-pushed the feature/multi-range-slice-queries branch 2 times, most recently from 039546b to 9cb30f3 Compare June 16, 2023 23:42

porunov mentioned this pull request Jun 17, 2023

Implement multi-range slice queries for CQL storage implementation #3816

Open

porunov force-pushed the feature/multi-range-slice-queries branch 3 times, most recently from 2f0db4d to 3aabab5 Compare June 17, 2023 16:24

porunov mentioned this pull request Jun 19, 2023

Group Slice queries into a single CQL query for Cardinality.SINGLE properties fetching [cql-tests] [tp-tests] #3844

Merged

9 tasks

porunov force-pushed the feature/multi-range-slice-queries branch from 3aabab5 to c7eb33d Compare June 21, 2023 22:05

porunov force-pushed the feature/multi-range-slice-queries branch from c7eb33d to 2e2723d Compare June 22, 2023 00:43

porunov force-pushed the feature/multi-range-slice-queries branch from 2e2723d to 1da8876 Compare June 22, 2023 14:45

porunov merged commit 98d409b into JanusGraph:master Jun 22, 2023

porunov added a commit to porunov/janusgraph that referenced this pull request Jun 22, 2023

Refactoring: remove unused classes from JanusGraph#3825 CTR

28fb685

Follow-up to JanusGraph#3825 Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>

porunov mentioned this pull request Jun 22, 2023

Refactoring: remove unused classes from #3825 CTR #3851

Merged

9 tasks

porunov added a commit that referenced this pull request Jun 23, 2023

Refactoring: remove unused classes from #3825 CTR

36bf485

Follow-up to #3825 Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
