Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-9162] Upgrade Jackson to version 2.10.2 #10643

Merged
merged 1 commit into from Feb 21, 2020

Conversation

iemejia
Copy link
Member

@iemejia iemejia commented Jan 21, 2020

@lukecwik
Copy link
Member

Run JavaPortabilityApi PreCommit

@lukecwik
Copy link
Member

Can you run the linkage checker against all the Beam modules that have a direct dependency on Jackson and enumerate the before and after similar to the PR description in #10631

@iemejia
Copy link
Member Author

iemejia commented Jan 24, 2020

$ grep -l -R "library.java.jackson" --include \*.gradle . | sort
./runners/apex/build.gradle
./runners/core-construction-java/build.gradle
./runners/core-java/build.gradle
./runners/direct-java/build.gradle
./runners/extensions-java/metrics/build.gradle
./runners/flink/flink_runner.gradle
./runners/gearpump/build.gradle
./runners/google-cloud-dataflow-java/build.gradle
./runners/google-cloud-dataflow-java/worker/build.gradle
./runners/google-cloud-dataflow-java/worker/legacy-worker/build.gradle
./runners/samza/build.gradle
./runners/spark/build.gradle
./sdks/java/core/build.gradle
./sdks/java/extensions/google-cloud-platform-core/build.gradle
./sdks/java/extensions/jackson/build.gradle
./sdks/java/extensions/sql/build.gradle
./sdks/java/harness/build.gradle
./sdks/java/io/amazon-web-services2/build.gradle
./sdks/java/io/amazon-web-services/build.gradle
./sdks/java/io/cassandra/build.gradle
./sdks/java/io/elasticsearch/build.gradle
./sdks/java/io/elasticsearch-tests/elasticsearch-tests-common/build.gradle
./sdks/java/io/google-cloud-platform/build.gradle
./sdks/java/io/hadoop-file-system/build.gradle
./sdks/java/io/hcatalog/build.gradle
./sdks/java/io/kafka/build.gradle
./sdks/java/io/kinesis/build.gradle
./sdks/java/io/synthetic/build.gradle
./sdks/java/maven-archetypes/examples/build.gradle
./sdks/java/testing/nexmark/build.gradle

@iemejia
Copy link
Member Author

iemejia commented Jan 24, 2020

is there any generic way to analyze just all the modules? Artifact ids don't coincide with directories, so doing a manual artifact list is painful and error-prone.

@iemejia
Copy link
Member Author

iemejia commented Jan 24, 2020

We could create a jenkins task for that and then just take a look at that in the future if we get to do it that way, no?

@iemejia
Copy link
Member Author

iemejia commented Jan 28, 2020

@suztomo maybe you know a way I can run the linkagechecker analysis in the full set of Beam modules? I think is more scalable to have a task for that that we invoke during PRs to validate that no regressions are included as suggested by Luke. (I can do that in Maven but my gradle-fu is still not good enough).

@suztomo
Copy link
Contributor

suztomo commented Jan 28, 2020

I want that job too! The challenge is that because of the many existing linkage errors, I'd have to compare

  • linkage errors in a PR, and
  • linkage errors in origin/master

Like a code coverage report. As I don't know how to do that, I'm still doing it diff command in my local environment.

@iemejia
Copy link
Member Author

iemejia commented Jan 28, 2020

Well the manual comparison is not ideal but we can cope with that for the moment, what I don't want is to type the command for the 31 modules of this PR and then have to change it for other dependency upgrade. I just want some sort of ./gradlew :checkJavaLinkage that works for the whole set of modules of the project. Is this 'feasible' with gradlew + Beam?

@suztomo
Copy link
Contributor

suztomo commented Jan 28, 2020

Let me think about that this week. https://issues.apache.org/jira/browse/BEAM-9206

For this PR, I would only check the modules that use jackson: find . -name 'build.gradle' | xargs grep library.java.jackson_.

@iemejia
Copy link
Member Author

iemejia commented Jan 28, 2020

Hehe so the 31 that I mentioned above, mmm not an easy to sell proposition. On the other hand I can help with the jenkins part if you get to do an incantation that works locally for all modules.

@lukecwik
Copy link
Member

If the linkage checker had a way to ignore pre-existing linkage failures, we could turn it into a test by enumerating all known failures statically. The linkage checker would complain if there was a new failure that wasn't pre-existing or if the pre-existing failure wasn't being reported anymore (allowing us to maintain the list over time).

The vendored gRPC 1.26.0 reduced the number of warnings in beam-sdks-java-core down to 4. Also, running the linkage checker per module would be useful and I can help with the Gradle bit if there was some good way to have linkage checker main return non zero status code on linkage errors and also if it supported enumerating pre-existing somehow.

@iemejia
Copy link
Member Author

iemejia commented Jan 29, 2020

@lukecwik do you think we can just get the gradle part that just runs the linkageChecker and outputs the errors for all modules in the meantime? That would allow me to run and do the manual comparison so we can get progress in this PR.

@lukecwik
Copy link
Member

lukecwik commented Jan 30, 2020

If you want to check linkage for all relevant modules then use:

./gradlew -Ppublishing -PjavaLinkageArtifactIds=beam-runners-apex,beam-runners-core-construction-java,beam-runners-core-java,beam-runners-direct-java,beam-runners-extensions-java-metrics,beam-runners-flink-1.7,beam-runners-flink-1.7-job-server,beam-runners-flink-1.8,beam-runners-flink-1.8-job-server,beam-runners-flink-1.9,beam-runners-flink-1.9-job-server,beam-runners-gearpump,beam-runners-google-cloud-dataflow-java,beam-runners-java-fn-execution,beam-runners-jet,beam-runners-local-java-core,beam-runners-portability-java,beam-runners-samza,beam-runners-samza-job-server,beam-runners-spark,beam-runners-spark-job-server,beam-sdks-java-bom,beam-sdks-java-core,beam-sdks-java-extensions-euphoria,beam-sdks-java-extensions-google-cloud-platform-core,beam-sdks-java-extensions-join-library,beam-sdks-java-extensions-json-jackson,beam-sdks-java-extensions-kryo,beam-sdks-java-extensions-protobuf,beam-sdks-java-extensions-sketching,beam-sdks-java-extensions-sorter,beam-sdks-java-extensions-sql,beam-sdks-java-extensions-sql-datacatalog,beam-sdks-java-extensions-sql-hcatalog,beam-sdks-java-extensions-sql-jdbc,beam-sdks-java-extensions-sql-perf-tests,beam-sdks-java-extensions-sql-zetasql,beam-sdks-java-extensions-zetasketch,beam-sdks-java-fn-execution,beam-sdks-java-harness,beam-sdks-java-io-amazon-web-services,beam-sdks-java-io-amazon-web-services2,beam-sdks-java-io-amqp,beam-sdks-java-io-cassandra,beam-sdks-java-io-clickhouse,beam-sdks-java-io-common,beam-sdks-java-io-elasticsearch,beam-sdks-java-io-google-cloud-platform,beam-sdks-java-io-hadoop-common,beam-sdks-java-io-hadoop-file-system,beam-sdks-java-io-hadoop-format,beam-sdks-java-io-hbase,beam-sdks-java-io-hcatalog,beam-sdks-java-io-jdbc,beam-sdks-java-io-jms,beam-sdks-java-io-kafka,beam-sdks-java-io-kinesis,beam-sdks-java-io-kudu,beam-sdks-java-io-mongodb,beam-sdks-java-io-mqtt,beam-sdks-java-io-parquet,beam-sdks-java-io-rabbitmq,beam-sdks-java-io-redis,beam-sdks-java-io-solr,beam-sdks-java-io-synthetic,beam-sdks-java-io-tika,beam-sdks-java-io-xml,beam-sdks-java-nexmark :checkJavaLinkage

I believe that is the entire set of modules that aren't archetypes, model, or vendored dependencies.

I think there is an issue with beam-sdks-java-extensions-sql-zetasql and beam-sdks-java-extensions-zetasketch which you might need to exclude from the list above.

@suztomo
Copy link
Contributor

suztomo commented Jan 30, 2020

I wouldn't rely on the result of the linkage check for an artifact list. LinkageCheck(artifactA,artifactB) is not the same as LinkageCheck(artifactA) + LinkageCheck(artifactB), where I define LinkageCheck(artifact...) as a function that takes list of Maven artifacts and returns list of linkage errors for a Java project that has dependencies of the artifacts. This is because linkage errors in one artifact may be hidden by classes in another artifact. For example, the missing javax.annotation.Nullable problem (BEAM-8917) would be undetected if we run Linkage Check on the beam-sdks-java-core artifact and another artifact that happens to depend on jsr305 (which contains javax.annotation.Nullable).

To ensure each Beam artifact is checked independently, I created a shell script that run "checkJavaLinkage" for each artifact:
https://gist.github.com/suztomo/ec4834151a81f8e1d26aa0afb278818b
What do you think about making this part of the Beam Jenkins jobs? (BEAM-9206)

an issue with beam-sdks-java-extensions-sql-zetasql

For this issue, today I made a release of Linkage Checker 1.1.3 and I just raised a PR #10721 .

@iemejia
Copy link
Member Author

iemejia commented Feb 21, 2020

Ok so finally back to this one after soooo long.
It took literally ages to run the linkage check for so many modules. I did not see anything weird
https://gist.github.com/iemejia/1b840e8ca248d51dd649b83612d8b330

Can you PTAL @lukecwik I really would like to have this one as part of 2.20.0. Thx

@iemejia
Copy link
Member Author

iemejia commented Feb 21, 2020

@suztomo I found that when we run the linkage checks on the hcatalog module it takes wwwayyy too long time and then ends up with a weird OOM error I am wondering if there is something odd like the checker assuming some recursive dependency. Can you PTAL?
https://gist.github.com/iemejia/863f5c112b77be1053e73d01f1ec9f0f

@suztomo
Copy link
Contributor

suztomo commented Feb 21, 2020

I've been working on the slowness of Linkage Checker. GoogleCloudPlatform/cloud-opensource-java#1145
I'm hoping I can fix that this week or next.

@iemejia
Copy link
Member Author

iemejia commented Feb 21, 2020

Awesome @suztomo my issue here was more about the OOM fact but probably related. Eager to retest when ready.

@lukecwik
Copy link
Member

We use a mix of jackson-dataformat-csv / jackson-dataformat-xml since it is brought in transitively through our dependencies such as org.apache.beam:beam-sdks-java-io-rabbitmq:2.20.0-SNAPSHOT (compile) / com.rabbitmq:amqp-client:5.7.3 (compile) / io.micrometer:micrometer-core:1.2.0 (compile, optional) / org.apache.logging.log4j:log4j-core:2.12.0

To improve our current usage we need to ensure that we declare 2.10.2 versions of jackson libraries in

> jackson-dataformat-xml-2.8.7.jar is at:
>   org.apache.beam:beam-sdks-java-extensions-sql:2.20.0-SNAPSHOT (compile) / com.alibaba:fastjson:1.2.49 (compile) / org.springframework:spring-webmvc:4.3.7.RELEASE (provided, optional) / com.fasterxml.jackson.dataformat:jackson-dataformat-xml:2.8.7 (compile, optional)

> jackson-dataformat-xml-2.9.9.jar is at:
>   org.apache.beam:beam-sdks-java-io-rabbitmq:2.20.0-SNAPSHOT (compile) / com.rabbitmq:amqp-client:5.7.3 (compile) / io.micrometer:micrometer-core:1.2.0 (compile, optional) / org.apache.logging.log4j:log4j-core:2.12.0 (compile, optional) / com.fasterxml.jackson.dataformat:jackson-dataformat-xml:2.9.9 (compile, optional)

< jackson-dataformat-csv-2.10.0.jar is at:
org.apache.beam:beam-sdks-java-io-kafka:2.20.0-SNAPSHOT (compile) / io.confluent:kafka-avro-serializer:5.3.2 (compile) / org.apache.kafka:kafka_2.12:5.3.2-ccs (provided) / com.fasterxml.jackson.dataformat:jackson-dataformat-csv:2.10.0 (provided)

I would say that this PR is a net positive since the prior version of Jackson didn't match the dataformat versions anyway but your analysis points to some simple additional changes we could do to improve consistency because of what a downstream dependency is bringing in.

Filed https://issues.apache.org/jira/browse/BEAM-9352 for further improvements.

@lukecwik lukecwik merged commit bdf0be1 into apache:master Feb 21, 2020
@iemejia
Copy link
Member Author

iemejia commented Feb 21, 2020

Thanks Luke will take a look at the other ticket.

@suztomo
Copy link
Contributor

suztomo commented Feb 21, 2020

Regarding the output of Linkage Checker, the conflict from optional dependencies shouldn't cause the problem. The checker is conservative to find many errors even in optional or provided dependencies.

@lukecwik
Copy link
Member

Agreed

@iemejia
Copy link
Member Author

iemejia commented Feb 24, 2020

Forgot I had one last question around this one. @lukecwik what command did you invoke to output the list of modules that you suggested above, I would like to be aware of this and document it somewhere for the future.

@iemejia
Copy link
Member Author

iemejia commented Feb 24, 2020

Another suggestion for the script @suztomo since I had multiple intermediary failures while trying to validate this PR modules, being able to produce output eagerly could help, currently we execute the validation for all modules but produce no output until the end.

@suztomo
Copy link
Contributor

suztomo commented Feb 24, 2020

@iemejia How about running diff command while waiting for the result? (The script outputs the file name).

I thought about the comparing the result early (see "Early Exist on Failure" below) but it would increase overall execution time.

Current

As of now the script consists of (1) Install artifacts and (2) run multiple checkJavaLinakge, for 2 branches.

for branch in [master, PR]:
  install artifacts   // This takes ~ 5 minutes
  for beamArtifact in ["beam-sdks-java-core", "beam-sdks-java-io-google-cloud-platform" ... ]
    checkJavaLinkage beamArtifact  // This takes 1 ~ 4 minutes
compare_diff

Total time for "install artifacts" is ~10 minutes (2 * 5)

Early Exit on Failure

I considered changing the loop order as below, but the number of "install artifacts" increases if the for-loop for the Beam artifacts is outside:

for beamArtifact in ["beam-sdks-java-core", "beam-sdks-java-io-google-cloud-platform", "beam-runners-google-cloud-dataflow" ]:
  for branch in [master, PR]:
    install artifacts   // This takes ~ 5 minutes
    checkJavaLinkage beamArtifact  // This takes 1 ~ 4 minutes
  compare_diff

Total time for "install artifacts" is ~30 minutes (3 * 2 * 5).

@lukecwik
Copy link
Member

Forgot I had one last question around this one. @lukecwik what command did you invoke to output the list of modules that you suggested above, I would like to be aware of this and document it somewhere for the future.

I looked through the gist that you provided and copied out the relevant lines.

@iemejia iemejia deleted the BEAM-9162-jackson-update branch February 25, 2020 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants