Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade TinkerPop to 3.5.0 [tp-tests] #2619

Merged
merged 1 commit into from
May 29, 2021

Conversation

li-boxuan
Copy link
Member

@li-boxuan li-boxuan commented May 16, 2021

  1. Bump TinkerPop dependency from 3.4.10 to 3.5.0 and resolve
    dependency conflicts. Several jackson packages are ignored from
    requireUpperBoundDeps maven enforce plugin now, because Spark 3.0.0
    requires a lower version of jackson (2.10.0) while other libraries,
    including CQL driver, require a higher version of jackson.
  2. Support null value mutation semantics to comply with TinkerPop 3.5.0.
    property(single, prop, null) removes the property while
    property(list/set, prop, null) is simply ignored.
  3. Introduce a workaround for traversal strategies which can be removed
    once TINKERPOP-2568 is fixed.
  4. Upgrade apache common configuration to configuration2 to comply
    with the same TinkerPop breaking change. Now users won't experience the
    problem reported at ConfigurationGraphFactory configuration with multiple storage.hostname not working #1447 anymore, since configuration2 by default does
    not read comma delimited string as a list of values.
  5. Other minor changes mostly caused by breaking changes in TinkerPop.

Closes #2611, #2624, #1447


Thank you for contributing to JanusGraph!

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there an issue associated with this PR? Is it referenced in the commit message?
  • Does your PR body contain #xyz where xyz is the issue number you are trying to resolve?
  • Has your PR been rebased against the latest commit within the target branch (typically master)?
  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you written and/or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE.txt file, including the main LICENSE.txt file in the root of this repository?
  • If applicable, have you updated the NOTICE.txt file, including the main NOTICE.txt file found in the root of this repository?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

@janusgraph-bot janusgraph-bot added the cla: external Externally-managed CLA label May 16, 2021
@li-boxuan li-boxuan marked this pull request as draft May 16, 2021 11:43
@farodin91 farodin91 added this to the Release v0.6.0 milestone May 16, 2021
@li-boxuan li-boxuan force-pushed the bump-tp-to-3.5.0 branch 4 times, most recently from 3630c22 to 3eb6c6a Compare May 19, 2021 03:26
@li-boxuan li-boxuan changed the title Upgrade TinkerPop to 3.5.0 [tp-tests] Upgrade TinkerPop to 3.5.0 May 19, 2021
@li-boxuan li-boxuan force-pushed the bump-tp-to-3.5.0 branch 3 times, most recently from f89bbc9 to 63ce476 Compare May 21, 2021 07:52
@li-boxuan li-boxuan changed the title Upgrade TinkerPop to 3.5.0 Upgrade TinkerPop to 3.5.0 [tp-tests] May 21, 2021
@li-boxuan li-boxuan force-pushed the bump-tp-to-3.5.0 branch 2 times, most recently from b4b8ffd to e8738f5 Compare May 22, 2021 08:14
@farodin91

This comment has been minimized.

@li-boxuan li-boxuan marked this pull request as ready for review May 22, 2021 09:34
@li-boxuan
Copy link
Member Author

I cleaned up the settings parsing.

@farodin91 Do you want to create a separate PR for that clean-up?

@farodin91
Copy link
Contributor

I cleaned up the settings parsing.

@farodin91 Do you want to create a separate PR for that clean-up?

It requires tp 3.5. I can wait that this PR is merged and create a new PR. I think should be part of jg 0.6.

@li-boxuan
Copy link
Member Author

It requires tp 3.5. I can wait that this PR is merged and create a new PR

Got it, I think that would be better. Just want to avoid unnecessary changes in this PR since it's already large.

Copy link
Contributor

@farodin91 farodin91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update mkdocs.yml and docs/changelog.md?

@li-boxuan li-boxuan force-pushed the bump-tp-to-3.5.0 branch 2 times, most recently from 73a1e74 to 78323ec Compare May 22, 2021 17:29
Copy link
Contributor

@farodin91 farodin91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you for upgrading TP?

I just have some questions.

@@ -269,9 +270,11 @@ public static Configuration getJobConf(List<SliceQuery> queries, Long modulus) {
}

public static Configuration getJobConf(List<SliceQuery> queries, Long modulus, Long modVal) {
BaseConfiguration baseConfiguration = new BaseConfiguration();
baseConfiguration.setListDelimiterHandler(new DefaultListDelimiterHandler(','));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need this setting?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To maintain backward compatibility. In the old version of commons-configuration, comma-separated values are parsed into a list of values, while in the new version (configuration2), commas are read as they are.

ref:

  1. http://tinkerpop.apache.org/docs/3.5.0-SNAPSHOT/upgrade/#_versions_and_dependencies
  2. https://commons.apache.org/proper/commons-configuration/userguide/upgradeto2_0.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want the old behavior?

Copy link
Member Author

@li-boxuan li-boxuan May 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't do baseConfiguration.setListDelimiterHandler(new DefaultListDelimiterHandler(','));, then comma-delimited values, e.g. "host1,host2,host3" will be read as a single string, rather than a string array ["host1","host2","host3"] (which is the old behavior).

In theory, we can read it as a single string. When we need it, we split it into a string array. The drawback is, this string splitting might happen many times on runtime and thus inefficient. Another problem is, I am not sure if I could implement that change correctly and not miss any place.

@@ -88,7 +88,8 @@
<scylladb.version>1.7.1</scylladb.version>
<!-- align with org.apache.hbase:hbase -->
<jackson1.version>1.9.13</jackson1.version>
<jackson2.version>2.12.2</jackson2.version>
<!-- align with org.apache.spark:spark-core_2.12 -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should upgrade spark in a next pr to 3 or not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I got you. spark-gremlin 3.5.0 depends on org.apache.spark:spark-core_2.12 3.0.0: https://mvnrepository.com/artifact/org.apache.tinkerpop/spark-gremlin/3.5.0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2.12 for jackson or not?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spark package name is org.apache.spark:spark-core_2.12. The version of it is 3.0.0. It requires Jackson 2.10.0, otherwise, it throws runtime exceptions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the 2.12 is for the support scale version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean scala right? Yes, the "2.12" suffix of "spark_core" is seemingly to match the scala version.

Copy link
Member

@porunov porunov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @li-boxuan for this upgrade!
I have just some nitpicks mostly

@li-boxuan li-boxuan force-pushed the bump-tp-to-3.5.0 branch 4 times, most recently from eac5087 to 1a6d8cc Compare May 26, 2021 16:27
1. Bump TinkerPop dependency from 3.4.10 to 3.5.0 and resolve
dependency conflicts. Several jackson packages are ignored from
requireUpperBoundDeps maven enforce plugin now, because Spark 3.0.0
requires a lower version of jackson (2.10.0) while other libraries,
including CQL driver, require a higher version of jackson.
2. Support null value mutation semantics to comply with TinkerPop 3.5.0.
property(single, prop, null) removes the property while
property(list/set, prop, null) is simply ignored.
3. Introduce a workaround for traversal strategies which can be removed
once TINKERPOP-2568 is fixed.
4. Upgrade apache common configuration to configuration2 to comply
with the same TinkerPop breaking change. Now users won't experience the
problem reported at JanusGraph#1447 anymore, since configuration2 by default does
not read comma delimited string as a list of values.
5. Other minor changes mostly caused by breaking changes in TinkerPop.

Closes JanusGraph#2611, JanusGraph#2624, JanusGraph#1447

Signed-off-by: Boxuan Li <liboxuan@connect.hku.hk>
Copy link
Member

@porunov porunov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you @li-boxuan for this upgrade!

@li-boxuan
Copy link
Member Author

Anyone who is still reviewing, or intends to review, please let me know within 24 hours. Otherwise, I'll merge this PR after that.

@rngcntr
Copy link
Contributor

rngcntr commented May 28, 2021

The version compatibility matrix should list TP 3.5.z instead of 3.4.z for JanusGraph 0.6, LGTM otherwise

@porunov
Copy link
Member

porunov commented May 28, 2021

The version compatibility matrix should list TP 3.5.z instead of 3.4.z for JanusGraph 0.6, LGTM otherwise

I think it makes sense to add a separate commit which won't trigger full tp tests because it's just a documentation change.
I would either add a separate commit to this PR or add a CTR commit after the merge

@li-boxuan li-boxuan merged commit 3da125d into JanusGraph:master May 29, 2021
@li-boxuan li-boxuan deleted the bump-tp-to-3.5.0 branch May 29, 2021 04:08
li-boxuan added a commit to li-boxuan/janusgraph that referenced this pull request Jun 1, 2021
This adds test cases to demonstrate that comma-delimited values can
be loaded by HadoopGraph in Spark usecase.

Prior to TinkerPop 3.5.0 upgrade, if config file has
`gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph`,
which is typically the case for MapReduce/Spark traversals,
then `GraphFactory.open(filename)` would first read all comma delimited values
into a list, and then only retain the last value. A workaround is to
create a `PropertiesConfiguration` config object, disable comma delimiter, and
then call `GraphFactory.open(config)`.

This is auto-resolved by JanusGraph#2619 because since TinkerPop 3.5.0, comma-delimited
values are by default loaded as they are. This commit adds some test cases
to demonstrate this behaviour. The same tests fail without JanusGraph#2619.

Signed-off-by: Boxuan Li <liboxuan@connect.hku.hk>
li-boxuan added a commit to li-boxuan/janusgraph that referenced this pull request Jun 2, 2021
This adds test cases to demonstrate that comma-delimited values can
be loaded by HadoopGraph in Spark usecase.

Prior to TinkerPop 3.5.0 upgrade, if config file has
`gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph`,
which is typically the case for MapReduce/Spark traversals,
then `GraphFactory.open(filename)` would first read all comma delimited values
into a list, and then only retain the last value. A workaround is to
create a `PropertiesConfiguration` config object, disable comma delimiter, and
then call `GraphFactory.open(config)`.

This is auto-resolved by JanusGraph#2619 because since TinkerPop 3.5.0, comma-delimited
values are by default loaded as they are. This commit adds some test cases
to demonstrate this behaviour. The same tests fail without JanusGraph#2619.

Signed-off-by: Boxuan Li <liboxuan@connect.hku.hk>
li-boxuan added a commit that referenced this pull request Jun 2, 2021
This adds test cases to demonstrate that comma-delimited values can
be loaded by HadoopGraph in Spark usecase.

Prior to TinkerPop 3.5.0 upgrade, if config file has
`gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph`,
which is typically the case for MapReduce/Spark traversals,
then `GraphFactory.open(filename)` would first read all comma delimited values
into a list, and then only retain the last value. A workaround is to
create a `PropertiesConfiguration` config object, disable comma delimiter, and
then call `GraphFactory.open(config)`.

This is auto-resolved by #2619 because since TinkerPop 3.5.0, comma-delimited
values are by default loaded as they are. This commit adds some test cases
to demonstrate this behaviour. The same tests fail without #2619.

Signed-off-by: Boxuan Li <liboxuan@connect.hku.hk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: external Externally-managed CLA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update TinkerPop to 3.5.0
6 participants