Improved contains check for bulkset with elements #2425

Merged
vkagamlyk merged 2 commits into apache:3.6-dev from steigma:bulkset-contains-check-improvement-3.6-dev
Jan 18, 2024

steigma commented Jan 4, 2024

Improved the within test check for bulksets containing elements (i.e., Vertex, Edge, VertexProperty) by using the contains method. Due to changes in the Gremlin comparison semantics (cf. https://tinkerpop.apache.org/docs/3.7.0/dev/provider/#gremlin-semantics-concepts), this check was no longer performed efficiently, which led to some regressions (see the query/example below). In some cases, however, we can ensure that the contains of the bulkset (using hash code and Object.equals) leads to the same results as GremlinValueComparator.COMPARABILITY.equals. In fact, for elements, both checks are done only on the ids of these elements.
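
The equivalence described above can be illustrated with a small sketch. It does not use any TinkerPop classes: `Node` is a hypothetical stand-in for an element whose equals and hashCode depend only on its id, and `sameById` is a hypothetical stand-in for a comparator-style equality check. Under those assumptions, a hash lookup and a linear scan agree on membership, while the lookup avoids visiting every entry.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a graph element: identity is defined by the id alone.
final class Node {
    final Object id;
    final String name; // deliberately ignored by equals/hashCode

    Node(final Object id, final String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(final Object other) {
        return other instanceof Node && Objects.equals(id, ((Node) other).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }
}

final class MembershipDemo {
    // Hypothetical comparator-style equality: it also looks at the ids only.
    static boolean sameById(final Node a, final Node b) {
        return Objects.equals(a.id, b.id);
    }

    // Linear scan: one comparison per entry in the worst case.
    static boolean containsByScan(final Set<Node> nodes, final Node probe) {
        for (final Node n : nodes) {
            if (sameById(n, probe)) return true;
        }
        return false;
    }

    // Hash lookup: relies on hashCode and equals.
    static boolean containsByHash(final Set<Node> nodes, final Node probe) {
        return nodes.contains(probe);
    }

    public static void main(final String[] args) {
        final Set<Node> friends = new HashSet<>();
        for (int i = 0; i < 10000; i++) friends.add(new Node("f" + i, "friend " + i));
        final Node known = new Node("f42", "different name, same id");
        final Node unknown = new Node("fof1", "friend of friend");
        System.out.println(containsByScan(friends, known) == containsByHash(friends, known));
        System.out.println(containsByScan(friends, unknown) == containsByHash(friends, unknown));
    }
}
```

The two strategies only agree because both notions of equality look at the same field; if the comparator considered anything beyond the id, the hash lookup would not be a safe replacement.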

This change re-enables an efficient check for elements (if the bulkset also contains such elements and contains only this kind of element). This is realized via a transient attribute (allContainedElementsSameClass) in the bulkset class that represents whether all elements are of the same type/class, which is checked by the within test method. The attribute is computed lazily when accessed to avoid overhead if the information is not required.
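
A minimal sketch of the lazy-attribute idea, under stated assumptions: `CountingBag` and `commonClassOrNull` are hypothetical names and this is not the BulkSet code from the PR. It only shows one way to cache a "do all entries share one class" answer, compute it on first access, and invalidate it when the collection changes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified bag that counts occurrences; not the TinkerPop BulkSet.
final class CountingBag<S> {
    private final Map<S, Long> counts = new LinkedHashMap<>();

    // Cached result of the class check; marked transient to mirror the description above.
    private transient boolean classChecked = false;
    private transient Class<?> commonClass = null;

    void add(final S item) {
        counts.merge(item, 1L, Long::sum);
        classChecked = false; // any mutation invalidates the cached answer
    }

    boolean contains(final Object item) {
        return counts.containsKey(item);
    }

    // Returns the class shared by all entries, or null if the bag is empty,
    // holds a null entry, or mixes classes. The loop runs only when the cache is stale.
    Class<?> commonClassOrNull() {
        if (!classChecked) {
            classChecked = true;
            commonClass = null;
            for (final S key : counts.keySet()) {
                if (key == null) {
                    commonClass = null;
                    break;
                }
                if (commonClass == null) {
                    commonClass = key.getClass();
                } else if (commonClass != key.getClass()) {
                    commonClass = null;
                    break;
                }
            }
        }
        return commonClass;
    }

    public static void main(final String[] args) {
        final CountingBag<Object> bag = new CountingBag<>();
        bag.add("f1");
        bag.add("f2");
        System.out.println(bag.commonClassOrNull()); // class check runs here, not in add
        bag.add(42);                                 // mixing classes resets the cache
        System.out.println(bag.commonClassOrNull());
    }
}
```

Callers that never ask for the common class pay only for resetting a boolean on each add, which is the overhead the lazy computation is meant to avoid.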

Pseudo code for sample data:

final Vertex x1 = G.addVertex(T.id, "x1", T.label, "person", "age", 27, "name", "x1");
// many friends for x1
for (int i = 1; i < 10000; ++i) {
    final Vertex x1fi = G.addVertex(T.id, "f"+i, T.label, "person", "age", 27, "name", "f"+i);
    x1.addEdge("knows", x1fi, T.id, "e-x1-f"+i, "weight", 0.5);
}
// one special friend that also has many other friends
final Vertex x1f0 = G.addVertex(T.id, "f0", T.label, "person", "age", 27, "name", "f0");
x1.addEdge("knows", x1f0, T.id, "e-x1-f0", "weight", 0.5);

// adding these many other friends, so friends of friends for x1
for (int i = 1; i < 10000; ++i) {
    final Vertex x1f0ofi = G.addVertex(T.id, "fof"+i, T.label, "person", "age", 27, "name", "fof"+i);
    x1f0.addEdge("knows", x1f0ofi, T.id, "e-f0-f"+i, "weight", 0.5);
}

Sample query (which is very inefficiently executed without this change):

g.V("x1").as("root").aggregate("directFriends")
    .select("root").out().aggregate("directFriends")
    .select("directFriends").limit(1).unfold().out().where(without("directFriends"))

The query is obviously not optimally formulated, but it reproduces the issue.

steigma commented Jan 4, 2024

The JavaScript build errors seem unrelated:

Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.15.0:install-node-and-npm (install node and npm) on project gremlin-javascript: Could not download Node.js: Got error code 522 from the server. -> [Help 1]

Cole-Greer left a comment

Thanks for the submission @steigma. I see this approach as a good step towards recovering some of the efficiency of the old comparison semantics. I left a few comments with minor tweaks to simplify it slightly.

Comment on lines +57 to +59
((BulkSet<?>)second).allContainedElementsSameClass() &&
((BulkSet<?>)second).getAllContainedElementsClass() != null &&
Element.class.isAssignableFrom(((BulkSet<?>)second).getAllContainedElementsClass()) &&

Contributor

Could these checks be removed? I believe they are redundant. If any of these checks fail, then the final first.getClass() == ((BulkSet<?>)second).getAllContainedElementsClass() check would also fail.

Suggested change
- ((BulkSet<?>)second).allContainedElementsSameClass() &&
- ((BulkSet<?>)second).getAllContainedElementsClass() != null &&
- Element.class.isAssignableFrom(((BulkSet<?>)second).getAllContainedElementsClass()) &&

Contributor Author

Thanks, I think you are right, these checks are redundant. Removed them in the new revision.
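
The redundancy argument can be checked with a small sketch. `Elem`, `Vtx`, `Edg` and `GuardDemo` are hypothetical stand-ins, not TinkerPop classes, and the sketch assumes that the probe has already been tested to be an element before the class comparison, which is not shown in the excerpt above. A null common class models the case where the entries do not all share one class.

```java
// Hypothetical stand-ins for the element type hierarchy; not TinkerPop classes.
interface Elem {}
final class Vtx implements Elem {}
final class Edg implements Elem {}

final class GuardDemo {
    // Verbose form: explicit null and assignability guards before the class comparison.
    // commonClass == null models "the entries do not all share one class".
    static boolean verbose(final Object first, final Class<?> commonClass) {
        return first instanceof Elem
                && commonClass != null
                && Elem.class.isAssignableFrom(commonClass)
                && first.getClass() == commonClass;
    }

    // Concise form: the class comparison alone implies both guards,
    // because first is known to be a non-null Elem at that point.
    static boolean concise(final Object first, final Class<?> commonClass) {
        return first instanceof Elem && first.getClass() == commonClass;
    }

    public static void main(final String[] args) {
        final Object[] probes = {new Vtx(), new Edg(), "not an element"};
        final Class<?>[] classes = {null, Vtx.class, Edg.class, String.class};
        boolean agree = true;
        for (final Object p : probes) {
            for (final Class<?> c : classes) {
                agree &= verbose(p, c) == concise(p, c);
            }
        }
        System.out.println("forms agree: " + agree);
    }
}
```

Since `getClass()` never returns null, comparing it with a null common class is always false, and equality with the class of an element instance already guarantees that the common class is an element class.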

Comment on lines +86 to +90
boolean hadNull = false;
for (final S key : this.map.keySet()) {
if ((key == null || key.getClass() == null)) {
if (allContainedElementsClass != null) {
allContainedElementsClass = null;
break;
}
hadNull = true;
} else if (hadNull) {
break;

Contributor

Can be simplified. If a single null is found in the set, then we can return false.

Suggested change
- boolean hadNull = false;
- for (final S key : this.map.keySet()) {
- if ((key == null || key.getClass() == null)) {
- if (allContainedElementsClass != null) {
- allContainedElementsClass = null;
- break;
- }
- hadNull = true;
- } else if (hadNull) {
- break;
+ for (final S key : this.map.keySet()) {
+ if ((key == null || key.getClass() == null)) {
+ allContainedElementsClass = null;
+ return false;

Contributor Author

Thanks, simplified it in new revision as you suggested.

allContainedElementsClassChecked = true;
boolean hadNull = false;
for (final S key : this.map.keySet()) {
if ((key == null || key.getClass() == null)) {

Contributor

In what situation can key.getClass() == null be true?

Contributor Author

Thanks, getClass probably never returns null, removed it in new revision.

}

@Test
public void shouldNotHaveSameClassForNull() {

Contributor

Missing test case for null in the middle.
Something like Vertex1, null, Vertex2

Contributor Author

Thanks, I added a second case where null is added in the "middle", but I am not sure whether it tests something different, as the entries are stored in a different "random" order in the bulkset.
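
As a rough illustration of the null-position question, the following sketch runs a same-class check over ordered lists with null at the start, in the middle, and at the end. `NullPositionCheck` and `commonClassOrNull` are hypothetical names, strings stand in for vertices, and this is not the test added in the PR.

```java
import java.util.Arrays;
import java.util.List;

final class NullPositionCheck {
    // Hypothetical helper: the class shared by all items,
    // or null if any item is null or the classes differ.
    static Class<?> commonClassOrNull(final Iterable<?> items) {
        Class<?> common = null;
        for (final Object item : items) {
            if (item == null) return null;
            if (common == null) {
                common = item.getClass();
            } else if (common != item.getClass()) {
                return null;
            }
        }
        return common;
    }

    public static void main(final String[] args) {
        // Lists keep insertion order, so the position of null is controlled here.
        final List<List<String>> cases = Arrays.asList(
                Arrays.asList(null, "v1", "v2"),
                Arrays.asList("v1", null, "v2"),
                Arrays.asList("v1", "v2", null));
        for (final List<String> c : cases) {
            System.out.println(commonClassOrNull(c) == null);
        }
        System.out.println(commonClassOrNull(Arrays.asList("v1", "v2")) == String.class);
    }
}
```

With an early return on the first null, the result does not depend on where the null sits in the iteration order, so such a check behaves the same whatever order the underlying collection uses.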

allContainedElementsClass = null;
break;
}
hadNull = true;

Contributor

probably hadNull = true; should be before if in line 89

Contributor Author

Thanks, simplified it as you suggested below.

xiazcy commented Jan 10, 2024

Thanks for opening the PR with the improvement. I don't have much to add to what's already been commented, but having a CHANGELOG entry would be helpful.

steigma force-pushed the bulkset-contains-check-improvement-3.6-dev branch from 1588ad8 to a7781f5 Compare January 10, 2024 12:26
steigma force-pushed the bulkset-contains-check-improvement-3.6-dev branch from a7781f5 to 7bdc66d Compare January 10, 2024 12:37
@codecov-commenter

Codecov Report

Attention: 234 lines in your changes are missing coverage. Please review.

Comparison is base (e86eed2) 75.16% compared to head (a7781f5) 76.17%.

❗ Current head a7781f5 differs from pull request most recent head 7bdc66d. Consider uploading reports for the commit 7bdc66d to get more accurate results

| Files | Patch % | Lines |
| ...in/language/grammar/DefaultGremlinBaseVisitor.java | 0.00% | 131 Missing ⚠️ |
| ...rpop/gremlin/language/grammar/ArgumentVisitor.java | 76.47% | 21 Missing and 7 partials ⚠️ |
| ...e/tinkerpop/gremlin/console/GremlinGroovysh.groovy | 52.94% | 6 Missing and 10 partials ⚠️ |
| ...emlin/language/grammar/TraversalMethodVisitor.java | 94.56% | 12 Missing and 1 partial ⚠️ |
| ...remlin/language/grammar/GenericLiteralVisitor.java | 78.18% | 7 Missing and 5 partials ⚠️ |
| ...in/process/traversal/AnonymousTraversalSource.java | 25.00% | 9 Missing ⚠️ |
| ...pache/tinkerpop/gremlin/jsr223/JavaTranslator.java | 60.00% | 4 Missing and 2 partials ⚠️ |
| ...p/gremlin/language/grammar/WithOptionsVisitor.java | 87.50% | 5 Missing ⚠️ |
| .../gremlin/language/grammar/NoOpTerminalVisitor.java | 0.00% | 4 Missing ⚠️ |
| ...p/gremlin/language/grammar/GremlinAntlrToJava.java | 83.33% | 2 Missing ⚠️ |
... and 6 more
Additional details and impacted files
@@              Coverage Diff              @@
##             3.6-dev    #2425      +/-   ##
=============================================
+ Coverage      75.16%   76.17%   +1.01%     
- Complexity     12316    13135     +819     
=============================================
  Files           1057     1084      +27     
  Lines          63470    65044    +1574     
  Branches        6936     7264     +328     
=============================================
+ Hits           47706    49548    +1842     
+ Misses         13191    12799     -392     
- Partials        2573     2697     +124     

@vkagamlyk

Thanks for the contribution, @steigma.

VOTE +1

@Cole-Greer

Thanks @steigma, VOTE +1

vkagamlyk merged commit a1a3009 into apache:3.6-dev Jan 18, 2024