[ADAM-1517] Move to Parquet 1.8.2 in preparation for moving to Spark 2.2.0 #1518

Closed: wants to merge 2 commits into master

4 participants
fnothaft (Member) commented May 9, 2017

Bumps Parquet version to 1.8.2. Relocates org.apache.avro to org.bdgenomics.avro. Resolves #1517 pending release of and bump to Spark 2.2.0.
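The relocation described above (org.apache.avro to org.bdgenomics.avro) is what shading tools such as the maven-shade-plugin do with a pattern/shadedPattern rule. The Java below is a simplified, hypothetical model for intuition only: `RelocationSketch` is a made-up name, and real shading rewrites class references inside compiled bytecode rather than renaming strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of package relocation ("shading").
 * Real shading tools rewrite class references inside compiled bytecode;
 * this sketch only renames fully qualified class names held as strings.
 */
final class RelocationSketch {
    // Package prefix -> relocated prefix, checked in insertion order.
    private final Map<String, String> rules = new LinkedHashMap<>();

    RelocationSketch addRule(String pattern, String shadedPattern) {
        rules.put(pattern, shadedPattern);
        return this;
    }

    /** Returns the relocated class name, or the input unchanged if no rule matches. */
    String relocate(String className) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            String pattern = rule.getKey();
            // Match whole package segments only, so "org.apache.avro"
            // does not capture "org.apache.avrox.Foo".
            if (className.startsWith(pattern + ".")) {
                return rule.getValue() + className.substring(pattern.length());
            }
        }
        return className;
    }

    public static void main(String[] args) {
        RelocationSketch shade =
            new RelocationSketch().addRule("org.apache.avro", "org.bdgenomics.avro");
        // Avro's own classes move under the new prefix...
        System.out.println(shade.relocate("org.apache.avro.Schema"));
        // ...while parquet-avro's package name is untouched by this rule.
        System.out.println(shade.relocate("org.apache.parquet.avro.AvroWriteSupport"));
    }
}
```

In principle, only classes bundled into the shaded JAR get their references rewritten, so code outside it (such as Spark's own Avro usage) keeps seeing the original org.apache.avro names, which is what lets two Avro versions coexist.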

AmplabJenkins commented May 9, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1980/

Build result: FAILURE

[...truncated 16 lines...]
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1518/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains add8189 # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1518/merge^{commit} # timeout=10
Checking out Revision add8189 (origin/pr/1518/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f add8189a144f110fa5fb3ee371340b388995c31f
First time build. Skipping changelog.
Triggering ADAM-prb » 2.3.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.10,2.0.0,centos
Triggering ADAM-prb » 2.6.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.6.0,2.10,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,1.6.1,centos
ADAM-prb » 2.3.0,2.11,1.6.1,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,2.0.0,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.10,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,2.0.0,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.10,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.6.1,centos completed with result SUCCESS
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

heuermh (Member) commented May 9, 2017

From Spark 1.6.1 Jenkins test failure:

java.lang.NoSuchMethodError: org.apache.parquet.io.api.Binary.fromCharSequence(Ljava/lang/CharSequence;)Lorg/apache/parquet/io/api/Binary;
	at org.apache.parquet.avro.AvroWriteSupport.fromAvroString(AvroWriteSupport.java:367)
	at org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:342)
	at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:274)
	at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:187)
	at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:161)
	at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
	at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
	at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
	at org.apache.spark.rdd.InstrumentedRecordWriter$$anonfun$write$1.apply$mcV$sp(InstrumentedOutputFormat.scala:55)

That InstrumentedRecordWriter is ours from bdg-utils, right?
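A `NoSuchMethodError` like this one means the calling class (here parquet-avro's `AvroWriteSupport`) was compiled against a `Binary` that has `fromCharSequence(CharSequence)`, but the `Binary` actually loaded at runtime lacks it. A minimal reflective probe makes that check explicit; this is a sketch assuming you can run code on the same classpath as the failing job, and `LinkageProbe` is a hypothetical name.

```java
/**
 * Sketch: does the class visible to this class loader expose a public
 * method with the given parameter types? Hypothetical helper, not part of
 * ADAM, bdg-utils, or Parquet.
 */
final class LinkageProbe {
    static boolean hasPublicMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            // initialize = false: look the class up without running static initializers.
            Class<?> cls = Class.forName(className, false, LinkageProbe.class.getClassLoader());
            cls.getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Per the trace above, parquet-avro 1.8.2 needs this method. Prints false
        // if parquet-column is absent or too old to provide it.
        System.out.println(hasPublicMethod(
            "org.apache.parquet.io.api.Binary", "fromCharSequence", CharSequence.class));
    }
}
```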

fnothaft (Member) commented May 9, 2017

@heuermh I believe this error indicates that moving to Parquet 1.8.2 breaks backwards compatibility with Spark 1.x.
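One way to confirm which Parquet JAR wins on a given Spark runtime is to ask the JVM where it loaded each class from. The sketch below is hypothetical (`ClasspathOrigin` is not part of ADAM or bdg-utils); it uses `ProtectionDomain.getCodeSource()` and the manifest's `Implementation-Version`, which some JARs do not set.

```java
import java.security.CodeSource;

/** Hypothetical helper: report which JAR (and manifest version) supplied a class. */
final class ClasspathOrigin {
    static String describe(String className) {
        try {
            Class<?> cls = Class.forName(className, false, ClasspathOrigin.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader usually have no CodeSource.
            String location = (src == null || src.getLocation() == null)
                ? "<bootstrap or unknown>"
                : src.getLocation().toString();
            Package pkg = cls.getPackage();
            // Read from the JAR manifest; null when the manifest does not set it.
            String version = (pkg == null) ? null : pkg.getImplementationVersion();
            return className + " from " + location + " (Implementation-Version: " + version + ")";
        } catch (ClassNotFoundException e) {
            return className + " not found on classpath";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.parquet.io.api.Binary",         // parquet-column
            "org.apache.parquet.avro.AvroWriteSupport", // parquet-avro
            "org.apache.avro.Schema"                    // avro
        };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```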

AmplabJenkins commented Jul 17, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2224/

Build result: FAILURE

[...truncated 15 lines...]
 > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1518/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 695be809b4ceb2e71f851ecf1d400cdab9b7ce3f # timeout=10
Checking out Revision 695be809b4ceb2e71f851ecf1d400cdab9b7ce3f (origin/pr/1518/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f 695be809b4ceb2e71f851ecf1d400cdab9b7ce3f
First time build. Skipping changelog.
Triggering ADAM-prb » 2.3.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.10,2.0.0,centos
Triggering ADAM-prb » 2.6.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.10,2.0.0,centos
ADAM-prb » 2.3.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,2.0.0,centos completed with result FAILURE
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

AmplabJenkins commented Jul 17, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2226/

Build result: FAILURE

[...truncated 15 lines...]
 > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1518/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 7dc8012 # timeout=10
Checking out Revision 7dc8012 (origin/pr/1518/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f 7dc8012a419fd3b690398efd2f4380511e4b258a
First time build. Skipping changelog.
Triggering ADAM-prb » 2.3.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.10,2.0.0,centos
Triggering ADAM-prb » 2.6.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.10,2.0.0,centos
ADAM-prb » 2.3.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,2.0.0,centos completed with result FAILURE
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

fnothaft referenced this pull request Jul 31, 2017 in "does not run" #304 (Open)

[ADAM-1517] Move to Parquet 1.8.2 in preparation for moving to Spark 2.2.0.

Bumps Parquet version to 1.8.2. Shades org.apache.avro as org.bdgenomics.avro.
Resolves #1517 pending release of and bump to Spark 2.2.0.

AmplabJenkins commented Aug 24, 2017

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2326/

Build result: FAILURE

[...truncated 15 lines...]
 > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1518/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains b1854a63487857748b082c9067790865eba04796 # timeout=10
Checking out Revision b1854a63487857748b082c9067790865eba04796 (origin/pr/1518/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f b1854a63487857748b082c9067790865eba04796
First time build. Skipping changelog.
Triggering ADAM-prb » 2.6.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,2.1.0,centos
Triggering ADAM-prb » 2.3.0,2.10,2.1.0,centos
Triggering ADAM-prb » 2.6.0,2.10,2.1.0,centos
Triggering ADAM-prb » 2.6.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,2.1.0,centos
ADAM-prb » 2.6.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,2.1.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,2.1.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,2.1.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,2.1.0,centos completed with result FAILURE
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

coveralls commented Aug 24, 2017

Coverage remained the same at 83.403% when pulling 7f76693 on fnothaft:issues/1517-parquet-1.8.2-shade-avro into 9b51df5 on bigdatagenomics:master.

AmplabJenkins commented Aug 24, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2329/
Test PASSed.

heuermh (Member) commented Aug 25, 2017

So the bad things are depending on 1.8.2 and shading 1.8.1 for inclusion at runtime? Yeah, that sucks. :)

fnothaft (Member) commented Aug 25, 2017

Yeah, I'm not proud of what I've done. It's a real mess; there's a lot going on. Here are the big four:

  • spark-sql depends on parquet-column and whatnot but not on parquet-avro, while spark-core separately depends on avro. Because Spark depends on avro but not on parquet-avro, a transitive dependency conflict effectively exists that Spark never notices. spark-sql does include parquet-avro in test scope, but it pins parquet-avro to 1.8.1 even after parquet-column moved to 1.8.2 (IIRC; I'd need to double-check this).
  • parquet-avro kind of broke binary compatibility going from 1.8.1 to 1.8.2. I feel a bit weird calling it "breaking binary compat," because parquet-avro 1.8.2 relies on a new public method in parquet-column 1.8.2 and thus cannot be linked against parquet-column 1.8.1. Also, that interface is logically contained wholly inside of Parquet (see below). (shrug) I don't fault the Parquet folks, since this is fairly reasonable behavior, but there's a dependency management story between Parquet, Spark, and Avro that people haven't really thought through.
  • We've depended on a higher version of Avro than Spark does for a while. The one thing this seems to break is persisting the sorted partition map to disk; otherwise it seems to work OK. I'm not 100% sure how we're getting away with this, but we have been.
  • Arguably, the least bad solution to all of the above problems is to relocate both Parquet and Avro in a shaded JAR before building adam-core; then we don't have the dependency version mismatches in test scope (this is bad, right?). Again, this is a pretty bad "least bad" solution, but I digress. If you try to do this, though, you need to relocate both parquet-avro (and its transitive dependencies) and parquet-scala, and relocating parquet-scala appears to break the org.apache.parquet.filter2.dsl.Dsl companion object link, which means that you can't create predicates.

WRT the last point, we wind up relocating Parquet in the shaded überjar, which is probably bad either for people who use just adam-core or, alternatively, for people who use the adam-assembly überjar.
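Since the failure mode here comes down to Parquet modules resolving to different versions, a crude guard is to compare the versions of every resolved `org.apache.parquet` artifact. The sketch below is hypothetical: it assumes you already have the resolved coordinates as `groupId:artifactId:version` strings, and it treats any mismatch as suspect, which may be stricter than needed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical guard: collect versions of resolved org.apache.parquet
 * artifacts and report whether they all agree. Input strings are assumed
 * to be "groupId:artifactId:version" coordinates.
 */
final class ParquetVersionCheck {
    static Map<String, String> parquetVersions(List<String> coordinates) {
        Map<String, String> versions = new TreeMap<>(); // artifactId -> version, sorted
        for (String coord : coordinates) {
            String[] parts = coord.trim().split(":");
            if (parts.length == 3 && parts[0].equals("org.apache.parquet")) {
                versions.put(parts[1], parts[2]);
            }
        }
        return versions;
    }

    /** True when every Parquet artifact resolved to the same version (or none were found). */
    static boolean consistent(Map<String, String> versions) {
        return versions.values().stream().distinct().count() <= 1;
    }

    public static void main(String[] args) {
        // Hypothetical resolved set reproducing the mismatch discussed above.
        List<String> resolved = Arrays.asList(
            "org.apache.parquet:parquet-avro:1.8.2",
            "org.apache.parquet:parquet-column:1.8.1",
            "org.apache.spark:spark-sql_2.11:2.1.0");
        Map<String, String> versions = parquetVersions(resolved);
        System.out.println(versions + " consistent=" + consistent(versions));
    }
}
```

Run as-is, it prints `{parquet-avro=1.8.2, parquet-column=1.8.1} consistent=false`.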

Here's the Parquet binary compat issue. parquet-avro 1.8.2 calls a method (Binary.fromCharSequence) that isn't present in parquet-column 1.8.1:

java.lang.NoSuchMethodError: org.apache.parquet.io.api.Binary.fromCharSequence(Ljava/lang/CharSequence;)Lorg/apache/parquet/io/api/Binary;
 at org.apache.parquet.avro.AvroWriteSupport.fromAvroString(AvroWriteSupport.java:367)
 at org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:342)
 at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:274)
 at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:187)
 at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:161)
 at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
 at 

heuermh (Member) commented Sep 14, 2017

Closing in favor of #1722

heuermh closed this Sep 14, 2017

heuermh added this to the 0.23.0 milestone Dec 7, 2017

heuermh added this to Completed in Release 0.23.0 Jan 4, 2018
