Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable multiQuery optimization for has step #3244

Closed
li-boxuan opened this issue Oct 5, 2022 · 4 comments · Fixed by #3724
Closed

Enable multiQuery optimization for has step #3244

li-boxuan opened this issue Oct 5, 2022 · 4 comments · Fixed by #3724

Comments

@li-boxuan
Copy link
Member

See #3236

In short, the has step within out('some_other_edge').has('other_indexed_prop','other_value') is not parallelized, but if they rewrote it using where step, the query gets parallelized.

Related issues:
#2996
#2444

Maybe we should make it a project?

@porunov
Copy link
Member

porunov commented Oct 24, 2022

Created project for multiQuery steps optimization: https://github.com/orgs/JanusGraph/projects/5/views/1

porunov added a commit to porunov/janusgraph that referenced this issue Apr 20, 2023
Fixes JanusGraph#3244

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
porunov added a commit to porunov/janusgraph that referenced this issue Apr 20, 2023
Fixes JanusGraph#3244

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
porunov added a commit to porunov/janusgraph that referenced this issue Apr 28, 2023
Fixes JanusGraph#3244

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
porunov added a commit to porunov/janusgraph that referenced this issue Apr 28, 2023
Fixes JanusGraph#3244

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
porunov added a commit to porunov/janusgraph that referenced this issue Apr 29, 2023
Fixes JanusGraph#3244

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
porunov added a commit to porunov/janusgraph that referenced this issue Apr 30, 2023
Fixes JanusGraph#3244

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
@porunov porunov self-assigned this Apr 30, 2023
@porunov
Copy link
Member

porunov commented Apr 30, 2023

I'm working on this ticket right now, but for anyone who is will be using JanusGraph prior 1.0.0 version the text configuration can be a workaround: query.batch-property-prefetch = true.
With batch-property-prefetch = true and query.batch = true any access to has step on adjacent vertex traversal will be pre-fetching all properties of those vertices to the transaction-level cache. Make sure you have sufficient cache.tx-cache-size, otherwise pre-fetch won't be efficient.
For 1.0.0 I would like to completely remove query.batch-property-prefetch with the fix of this issue.
Instead I would like to propose a configuration option which can control different modes of properties pre-fetching for has step.

Currently query.batch-property-prefetch = true means to pre-fetch all properties during adjacent vertex traversal if and only if it follows at least one has step. Otherwise no properties are going to be pre-fetched on adjacent vertex traversal.
If we move properties pre-fetching logic to has step itself, there won't be any reason to use previous logic. Thus query.batch-property-prefetch becomes obsolete.
Usually it makes sense to deprecate such things, but knowing that we are moving towards 1.0.0 release, I think it makes sense to completely remove that logic to not carry on obsolete code into the major version.

As replacement solution for query.batch-property-prefetch I propose the query.has-step-batch-mode configuration option with the next possible modes:

  • all_properties Pre-fetch all vertex properties on any property access
  • required_properties_only Pre-fetch necessary vertex properties for the whole chain of foldable has steps
  • required_and_next_properties Prefetch the same properties as with required_properties_only mode, but also prefetch properties which may be needed in the next properties access step like values, properties, valueMap, or elementMap. In case the next step is not one of those properties access steps then this mode behaves same as required_properties_only. In case the next step is one of the properties access steps with limited scope of properties, those properties will be pre-fetched together in the same multi-query. In case the next step is one of the properties access steps with unspecified scope of property keys then this mode behaves same as all_properties.
  • required_and_next_properties_or_all Prefetch the same properties as with required_properties_only, but in case the next step is not values, properties, valueMap, or elementMap then acts like all_properties.
  • none Skips has step batch properties pre-fetch optimization.

For anyone who previously used query.batch-property-prefetch = true they can make a drop in replacement of that configuration to query.has-step-batch-mode = all_properties.
For anyone who previously used query.batch-property-prefetch = false they can make a drop in replacement of that configuration to query.has-step-batch-mode = none.

For default value in 1.0.0 I think either required_and_next_properties, required_and_next_properties_or_all, or all_properties I think could be a good option. Nevertheless as we have query.fast-property=true it means that we will load all properties for a given vertex anyway (no matter which mode we select).

P.S. I think we could later decide to make query.fast-property as false by default and better leverage different strategy modes. We can also extend the above modes with more modes (for example, if there is only one HasContainer and the next step is a property step which returns the same property, it makes sense to pre-fetch only that specific property even when all_properties is used). Those are just thoughts for later development.

@FlorianHockmann
Copy link
Member

Thanks a lot for working on this, @porunov! I think this is really an important area where we can improve the performance for many users.

I don't fully grasp everything needed here and especially which steps we can optimize in what way exactly, so my questions maybe don't make that much sense 😄:

Are we sure that this optimizations can only be applied for has steps and not maybe later on also for other steps? Because then I wouldn't include the step name in the config option. But if we are sure that it only applies to the has step, then being specific with the name probably helps users to understand what this config does.

Nevertheless as we have query.fast-property=true it means that we will load all properties for a given vertex anyway (no matter which mode we select).

Except for none, right? I would at least expect in that case that all properties are fetched, but without using multi-query.

If query.fast-property=true already means that all properties are pre-fetched, then that means that this new option doesn't make any difference by default unless users disable the option themselves?

I think it's a bit confusing how these two options relate to each other so we should definitely try to document that clearly and maybe write some suggestions for common use cases about how to set both properties together.

@porunov
Copy link
Member

porunov commented May 3, 2023

Are we sure that this optimizations can only be applied for has steps and not maybe later on also for other steps? Because then I wouldn't include the step name in the config option. But if we are sure that it only applies to the has step, then being specific with the name probably helps users to understand what this config does.

At this moment it's applied to has step only, I'm not sure if we reuse this config for other steps, but I think that other steps may get their own configurations later. It feels that we need somehow name all these configs properly so that it would be easily explainable to what optimizations these configs relay.
Potentially we could start with something similar to:

query.step.has.batch-property-prefetch = all_properties

I'm not sure, but I suspect we may need to introduce similar config for properties steps (values, properties, valueMap, or elementMap). Maybe something like:

query.step.properties.batch-property-prefetch = all_properties / required_properties_only

Except for none, right? I would at least expect in that case that all properties are fetched, but without using multi-query.

Yes, right. When none is used then the optimization won't be applied and the behavior will be similar as right now. I.e. with query.fast-property=true all properties will be fetched for accessed vertices but without using multi-query (i.e. not in batches).

If query.fast-property=true already means that all properties are pre-fetched, then that means that this new option doesn't make any difference by default unless users disable the option themselves?

query.fast-property=true applies to a single vertex, but the new option is applied when multi-query (batch.enabled = true) is used.
With query.fast-property=true is means that anytime we access any property of a vertex (doesn't matter if in Gremlin queries or JanusGraph API) then is will pre-fetch all properties for the accessed vertex.
With query.batch-property-prefetch = true it means it will pre-fetch all properties for a batch of vertices using multi-query.
multi-query depending on the storage backend acts differently. In case storage backend supports multi-key slice queries then we pass these multi-key slice query to the storage backend implementation which then executes the query however they think is the most optimal execution. In case storage backend doesn't support multi-key slice queries (like with CQL storage backend implementation right now, but see the discussion #3170 and work-in-progress PR #3760) then multi query will reuse parallel-backend-ops executor service for parallel queries execution for each key.

I think it's a bit confusing how these two options relate to each other so we should definitely try to document that clearly and maybe write some suggestions for common use cases about how to set both properties together.

Totally agree. Currently documentation doesn't clearly tells the relation between these configurations and when to use them. We will definitely need to document them but I don't want to document the relation between query.batch-property-prefetch = true and query.fast-property=true right now because the first option will most likely be replaced and the second option's default value could potentially be changed.

All the multi-query steps optimization tickets are tracked here: https://github.com/orgs/JanusGraph/projects/5
We can add documentation improvement ticket to that board as well I think.

porunov added a commit to porunov/janusgraph that referenced this issue May 8, 2023
Fixes JanusGraph#3244

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
porunov added a commit to porunov/janusgraph that referenced this issue May 8, 2023
Fixes JanusGraph#3244

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
porunov added a commit to porunov/janusgraph that referenced this issue May 9, 2023
Fixes JanusGraph#3244

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
porunov added a commit to porunov/janusgraph that referenced this issue May 9, 2023
Fixes JanusGraph#3244

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
@porunov porunov added this to the Release v1.0.0 milestone May 29, 2023
porunov added a commit to porunov/janusgraph that referenced this issue Jun 5, 2023
Fixes JanusGraph#3244

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
porunov added a commit to porunov/janusgraph that referenced this issue Jun 6, 2023
Fixes JanusGraph#3244

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
porunov added a commit to porunov/janusgraph that referenced this issue Jun 6, 2023
Fixes JanusGraph#3244

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
porunov added a commit to porunov/janusgraph that referenced this issue Jun 6, 2023
Fixes JanusGraph#3244

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
porunov added a commit to porunov/janusgraph that referenced this issue Jun 6, 2023
Fixes JanusGraph#3244

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
porunov added a commit that referenced this issue Jun 6, 2023
Fixes #3244

Signed-off-by: Oleksandr Porunov <alexandr.porunov@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging a pull request may close this issue.

3 participants