Jump to conversation
Unresolved conversations (1)
@li-boxuan li-boxuan May 24, 2021
Why is this needed? The signature of JanusGraphMultiVertexQuery::properties does not say it throws exceptions.
...timize/step/JanusGraphPropertiesStep.java
li-boxuan rngcntr
Resolved conversations (47)
@li-boxuan li-boxuan Jun 9, 2021
nitpick: I feel this H2 heading is unnecessary. The H3 headings can be H2 headings.
Outdated
docs/operations/batch-processing.md
rngcntr
@li-boxuan li-boxuan Jun 9, 2021
Shall line 93 & 94 be removed?
Outdated
...ize/JanusGraphMultiQueryStrategyTest.java
@li-boxuan li-boxuan Jun 9, 2021
nitpick: it seems if `multiQueryAllowed` becomes false, we can `break` or `return` right away.
Outdated
...trategy/JanusGraphMultiQueryStrategy.java
rngcntr
@li-boxuan li-boxuan Jun 9, 2021
I think we might want to put `LazyBarrierStrategy` as a prior strategy here. Reasons: 1) Both strategies insert barrier steps, so there is a possibility that they interfere with each other. It might be better to keep the behavior consistent. 2) I can see `LazyBarrierStrategy` has some priors. I don't know but I guess maybe this strategy should have them too.
...trategy/JanusGraphMultiQueryStrategy.java
li-boxuan rngcntr
@li-boxuan li-boxuan Jun 9, 2021
Do you know how come L78 - L80 are not covered by tests? Seems they are executed as long as limitBatchSize is enabled, and no explicit NoOpBarrierStep is present.
...trategy/JanusGraphMultiQueryStrategy.java
rngcntr
@li-boxuan li-boxuan Jun 9, 2021
When `limitBatchSize` is false, is this still needed? Seems all vertices would have been registered in the `initialize` method already.
...timize/step/JanusGraphMultiQueryStep.java
rngcntr
@porunov porunov Jun 1, 2021
Below we have a method `flatMap`. In that method I see that if `batchPropertyPrefetching` is `true` than it will pre fetching vertices. In that method it casts all elements to vertices. I.e. `vertices.add((Vertex) v);`. If so, I'm just curious, in case `multiQueryResults` here was assigned with `multiQuery.edges()`, how will `flatMap` method work when `batchPropertyPrefetching` is `true`? Will it throw cast exception or do we somehow sure that `flatMap` method can be executed only when `multiQueryResults` was populated with vertices?
...p/optimize/step/JanusGraphVertexStep.java
porunov rngcntr
@porunov porunov May 31, 2021
Not sure if `verticesToPrefetch.clear();` is better than `verticesToPrefetch = new HashSet<>(verticesToPrefetch.size());` but probably it's hard to compare. `clear` - `O(n)` to execute + probably `O(n)` to garbage collect. `new` - probably `O(n+1)` to garbage collect + some time to allocate a new array of `n` size. Don't want to go deep into the tradeoffs, just wanted to add random thoughts. These thoughts are probably out of scope of this PR, so I automatically resolve this comment.
...p/optimize/step/JanusGraphVertexStep.java
rngcntr
@rngcntr rngcntr May 31, 2021
Codacy is wrong about this method being unused. It is used to generate the parameters for the two methods above.
...ptimize/JanusGraphMultiQueryStepTest.java
li-boxuan rngcntr
porunov
@porunov porunov May 25, 2021
What is the purpose of collecting Steps into a new List? Will there be concurent modification if we don't collect them? Otherwise, I would prefer to returning just a stream (i.e. to save cycles from collection overhead): ``` public static Stream<Step> getSteps(Predicate<Step> predicate, Traversal.Admin<?,?> traversal) { return traversal.getSteps().stream().filter(predicate); } ```
...pop/optimize/JanusGraphTraversalUtil.java
porunov rngcntr
@li-boxuan li-boxuan May 24, 2021
From your comment, I presume this method may need modifications once https://github.com/JanusGraph/janusgraph/pull/2619 is merged, right?
Outdated
...trategy/JanusGraphMultiQueryStrategy.java
rngcntr
@li-boxuan li-boxuan May 24, 2021
Any reason for this change?
...anusGraphLocalQueryOptimizerStrategy.java
rngcntr
@li-boxuan li-boxuan May 24, 2021
I noticed that the old comment said "it is guaranteed that all elements are vertices", while here you have a condition to check whether traversal.get() is a vertex. Any reason for that?
...timize/step/JanusGraphPropertiesStep.java
rngcntr
@li-boxuan li-boxuan May 24, 2021
will this ever be true?
...timize/step/JanusGraphPropertiesStep.java
rngcntr
@li-boxuan li-boxuan May 24, 2021
I am having some trouble understanding this class. Would be very helpful if you could document: 1) What are `List<MultiQueriable> clientSteps` 2) What does `registerFutureVertexForPrefetching` do
...timize/step/JanusGraphMultiQueryStep.java
rngcntr
@li-boxuan li-boxuan May 24, 2021
Can currentStep be null here? If it cannot be null, then the null check condition in the while loop is redundant. Otherwise, null check is missing here.
Outdated
...pop/optimize/JanusGraphTraversalUtil.java
rngcntr
@li-boxuan li-boxuan May 24, 2021
Any possibility to add a test for this condition?
...pop/optimize/JanusGraphTraversalUtil.java
rngcntr
@li-boxuan li-boxuan May 24, 2021
I feel it might be safer and more efficient to use `previousStep instanceof EmptyStep`
Outdated
...pop/optimize/JanusGraphTraversalUtil.java
@li-boxuan li-boxuan May 24, 2021
typo: queriable -> queryable. To be honest I cannot find both words in dictionary (I looked up in Merriam Webster and Cambridge), but seems "queryable" is preferred, because: 1) When I searched "queriable" on Google, it suggested me "queryable" instead. 2) I found "queryable" to be more popular in the programming world, e.g. https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/queryable_state.html EDIT: never mind, I can see this MultiQueriable was created long time ago.
...pop/optimize/JanusGraphTraversalUtil.java
@li-boxuan li-boxuan May 24, 2021
Any specific reason for this change (reversing the order)?
Outdated
...pop/optimize/JanusGraphTraversalUtil.java
rngcntr
@li-boxuan li-boxuan May 24, 2021
(nitpick) can you make the formatting consistent with other options in this file? That is, adding indentations for lines 279-281 (every line but the first).
Outdated
...iguration/GraphDatabaseConfiguration.java
@li-boxuan li-boxuan May 24, 2021
Apart from having a test in JanusGraphTest, can we add a similar test in JanusGraphIndexTest? Just want to see and make sure it works correctly when indexing backends are involved too. Theoretically it doesn't matter, but since it is a big change, I would prefer having more tests for different scenarios.
...rg/janusgraph/graphdb/JanusGraphTest.java
rngcntr li-boxuan
@li-boxuan li-boxuan May 24, 2021
Apart from counting the number of backend queries, can we also compare the results with and without LIMIT_BATCH_SIZE? This can give us more confidence in its correctness.
...rg/janusgraph/graphdb/JanusGraphTest.java
rngcntr
@porunov porunov May 21, 2021
Probably could be replaced with: ``` public static Stream<Step> getSteps(Predicate<Step> predicate, Traversal.Admin<?,?> traversal) { return traversal.getSteps().stream().filter(predicate); } ```
Outdated
...pop/optimize/JanusGraphTraversalUtil.java
@porunov porunov May 21, 2021
Probably the below check will be faster and we don't create a new spliterator but on another hand your code is cleaner. Thus, I don't have strong preference. It's up to you. ``` return step instanceof BranchStep || step instanceof OptionalStep || step instanceof TraversalFilterStep || step instanceof RepeatStep; ```
Outdated
...pop/optimize/JanusGraphTraversalUtil.java
porunov rngcntr
li-boxuan
@porunov porunov May 21, 2021
I think it's better to put `!(parentStep instanceof RepeatStep)` before `isMultiQueryCompatibleParent(parentStep)` because `isMultiQueryCompatibleParent` probably will be slower to check
Outdated
...pop/optimize/JanusGraphTraversalUtil.java
@porunov porunov May 21, 2021
`&& !(previousStep instanceof InjectStep)` - redundant condition. if `previousStep` is instance of `SideEffectStep` or `ProfileStep` it cannot be instance of `InjectStep`
Outdated
...pop/optimize/JanusGraphTraversalUtil.java
@li-boxuan li-boxuan Apr 27, 2021
Thanks for explaining the current behavior! I am just curious: do you know if this "blocking operator" works for `valueMap()` step as well (when `query.batch` is true)? AFAIK, batch query does not work for `valueMap` step, but I don't know whether it follows BFS or DFS. For example, consider below queries (assuming the index query hits Elasticsearch and edge store query hits Cassandra): `g.V().has("someKeyIndexed", "value").values()` Likely this query will fetch all results from ES (i.e., exhausting the iterator returned by `JanusGraphStep`) and then fire a single (logical) batch query against Cassandra, if I am understanding correctly. `g.V().has("someKeyIndexed", "value").valueMap()` - Does JanusGraph get all results from ES first and then fire multiple queries against Cassandra?
Outdated
docs/operations/batch-processing.md
li-boxuan rngcntr
@rngcntr rngcntr Apr 16, 2021
Does anybody have a better solution here? What I actually want is `currentStep != step`, but Codacy doesn't allow that and suggests to use `equals()`. However, multiple steps within a traversal can be equal without being _the same_.
Outdated
...pop/optimize/JanusGraphTraversalUtil.java
porunov rngcntr
li-boxuan
@mad mad Apr 15, 2021
Can we re-initialize multiQueryResult instead of putAll? What is case when some elements will be retrieved repeatedly? If we can re-initialize it significantly reduce memory usage for OLAP query. In some cases more preferable execute another query for repeated elements instead of OOM
Outdated
...timize/step/JanusGraphPropertiesStep.java
rngcntr
@rngcntr rngcntr Apr 15, 2021
Someone please review this. I have no idea, why I suddenly need `gcc` to build the docs. But otherwise ``` docker build -t doc-site:mkcos -f docs.Dockerfile . ``` fails with `no such file or directory: gcc`
Outdated
docs.Dockerfile
rngcntr li-boxuan
@farodin91 farodin91 Apr 13, 2021
2021
Outdated
...ph/blueprints/JanusGraphTestListener.java
@li-boxuan li-boxuan Apr 13, 2021
This comment now loses its context, since ```java final boolean useMultiQuery = !TraversalHelper.onGraphComputer(traversal) && janusGraph.getConfiguration().useMultiQuery(); ``` now has been removed.
Outdated
...anusGraphLocalQueryOptimizerStrategy.java
@li-boxuan li-boxuan Apr 13, 2021
What happens if JanusGraphLocalQueryOptimizerStrategy is excluded by the user? I guess at least this strategy won't properly work. Could it be even worse?
Outdated
...trategy/JanusGraphMultiQueryStrategy.java
rngcntr li-boxuan
@li-boxuan li-boxuan Apr 13, 2021
nitpick: this line seems redundant.
Outdated
...nkerpop/optimize/step/MultiQueriable.java
rngcntr
@li-boxuan li-boxuan Apr 13, 2021
Before looking at the code, I wonder how this mode (limited batch processing) works if barrier step is not present. This could happen when users exclude the [LazyBarrierStrategy](https://tinkerpop.apache.org/javadocs/current/full/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/LazyBarrierStrategy.html), and they don't explicitly insert a barrier() step.
docs/operations/batch-processing.md
rngcntr
@li-boxuan li-boxuan Apr 13, 2021
Just curious: now (after this PR), when barrier() step is present, does it always accumulate all previous results, no matter what barrier size is? If this is the case, does it mean explicit barrier() step would be redundant? If so, *maybe* you could mention to avoid confusion. Otherwise, users might wonder why `g.V().barrier(1000).out()` still fires a single (logical) batch query.
docs/operations/batch-processing.md
li-boxuan rngcntr
@li-boxuan li-boxuan Apr 13, 2021
typo: accordingly
Outdated
docs/operations/batch-processing.md
@li-boxuan li-boxuan Apr 13, 2021
This seems to be inconsistent with the example below, where you use `barrier(<size>)` rather than `limit(<size>)`.
Outdated
docs/operations/batch-processing.md
@li-boxuan li-boxuan Apr 13, 2021
Shouldn't it be "If limit step is applied very late, there ..."? Users could place limit(x) step anywhere they want in the query. Correct me if I am wrong, but per my understanding, if they have g.V().limit(10).out().limit(10).out().limit(10) then this is not a problem.
Outdated
docs/operations/batch-processing.md
rngcntr
@li-boxuan li-boxuan Apr 13, 2021
Might be worth adding that the first option also reduces memory footprint because intermediate results don't have to be cached. This follows the lazy evaluation philosophy of Tinkerpop.
Outdated
docs/operations/batch-processing.md
rngcntr
@li-boxuan li-boxuan Apr 13, 2021
Might be worth noting that real implementations differ. For example, in the CQL driver, we use a thread pool to fire backend queries concurrently rather than fire a single batch query to achieve the optimal performance. Maybe we could call it `a logical batch query`?
Outdated
docs/operations/batch-processing.md
li-boxuan rngcntr
@li-boxuan li-boxuan Apr 12, 2021
Rather than "assertTrue(true)", you could just put a comment here saying that this test is skipped, e.g. "//Do nothing".
Outdated
...h/graphdb/inmemory/InMemoryGraphTest.java
li-boxuan rngcntr
@rngcntr rngcntr Apr 12, 2021
Can I disable Codacy somehow? Alternative would be to add an `assertTrue(true)` to every disabled test.
...h/graphdb/inmemory/InMemoryGraphTest.java
porunov
@li-boxuan li-boxuan Apr 10, 2021
Might be worth noting that this option does not take effect if query.batch = false. Only looking at the doc, I am not 100% sure how this could affect my use case. Can you maybe also explain in the doc what the default size of the TinkerPop barrier is? Does this apply to any place that MULTIQUERY is applicable, i.e. if this is enabled, all multi-queries will have an upper bound for batch size?
Outdated
...iguration/GraphDatabaseConfiguration.java
rngcntr
@rngcntr rngcntr Apr 9, 2021
Normally, strategies are applied for the parent traversal first and then called for child traversals automatically. At this point, I'm not quite happy with the solution for the following reason: This strategy relies on the fact that all `VertexStep`s have been converted to `JanusGraphVertexStep`s by `JanusGraphLocalQueryOptimizerStrategy`, even in all child traversals. The strategy _does_ that, but only by manually applying that recursive search manually. (See the refactoring of `getStepsOfClass` to `getStepsOfAssignableClassRecursively` in `JanusGraphLocalQueryOptimizerStrategy`.) Ideally, this strategy would look at the child traversals when they have passed all other strategies and are completely optimized. It would then potentially insert `JanusGraphMultiQuerySteps` at higher levels of the traversal if necessary. However, the bottom-up approach is not supported by TP (and that's for good reason in most cases!). In general, one can not modify parent traversals in the optimization phase of their child traversals because the parents are already locked. The only other solution I came up with is to insert `JanusGraphMultiQueryStep`s everywhere they could _potentially_ be needed later and wait for the execution phase to let every `MultiQueriable` search their favorite `JanusGraphMultiQueryStep` bottom-up and on their own. - Pro: Optimization strategies are cleaner - Con: Unnecessary steps could remain within the traversal
Outdated
...trategy/JanusGraphMultiQueryStrategy.java
rngcntr
@rngcntr rngcntr Apr 9, 2021
As discussed on the [gremlin-users](https://groups.google.com/g/gremlin-users/c/93dssDZVJEM) mailing list, this class allows us to exclude certain strategies for specified tests. The reason this is needed is that `ProfileTest` asserts that the number of steps in a traversal does not change. To achieve that, the test itself disables `LazyBarrierStrategy`. And since that happens within the TinkerPop codebase, we need this mechanism of achieving the same result for `JanusGraphMultiQueryStrategy`.
...ph/blueprints/JanusGraphTestListener.java