New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAM-808] build an assembly cli jar with maven shade plugin #813

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
4 participants
@heuermh
Member

heuermh commented Aug 26, 2015

Another approach to #808.

Still in progress (e.g. I'm not sure how adam-pyspark works), thought I'd should push it early for review/discussion.

Will depend on a fix for #812.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Aug 26, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/889/
Test PASSed.

AmplabJenkins commented Aug 26, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/889/
Test PASSed.

@massie

This comment has been minimized.

Show comment
Hide comment
@massie

massie Aug 26, 2015

Member

Thanks for this PR, Michael.

We originally created an uber jar for distribution but ran into file limitations with zip/jar files (see http://stackoverflow.com/questions/9616250/what-is-the-maximum-number-of-files-per-jar). Now that we've have Spark and Hadoop as provided dependencies, the number of files in our jar file should have dropped significantly.

How much head room do we have? How many files are in this uber jar, e.g. what does jar -tvf adam-uber.jar | wc -l output?

Member

massie commented Aug 26, 2015

Thanks for this PR, Michael.

We originally created an uber jar for distribution but ran into file limitations with zip/jar files (see http://stackoverflow.com/questions/9616250/what-is-the-maximum-number-of-files-per-jar). Now that we've have Spark and Hadoop as provided dependencies, the number of files in our jar file should have dropped significantly.

How much head room do we have? How many files are in this uber jar, e.g. what does jar -tvf adam-uber.jar | wc -l output?

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Aug 27, 2015

Member

I think it'd make sense to move back to the uberjar if we've got enough head room.

Member

fnothaft commented Aug 27, 2015

I think it'd make sense to move back to the uberjar if we've got enough head room.

<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>

This comment has been minimized.

@fnothaft

fnothaft Aug 27, 2015

Member

OOC, I went back to the ol' days before we moved to appassembler and saw a few more exclusions (see at https://github.com/bigdatagenomics/adam/blob/adam-parent-0.6.0/adam-cli/pom.xml#L43). Might be worth doing a diff of the old maven-shade-plugin section versus the new one.

@fnothaft

fnothaft Aug 27, 2015

Member

OOC, I went back to the ol' days before we moved to appassembler and saw a few more exclusions (see at https://github.com/bigdatagenomics/adam/blob/adam-parent-0.6.0/adam-cli/pom.xml#L43). Might be worth doing a diff of the old maven-shade-plugin section versus the new one.

This comment has been minimized.

@heuermh

heuermh Aug 27, 2015

Member

Yep, and there are quite a few other transformers that e.g. the Spark project uses, will have to look further

@heuermh

heuermh Aug 27, 2015

Member

Yep, and there are quite a few other transformers that e.g. the Spark project uses, will have to look further

This comment has been minimized.

@massie

massie Aug 31, 2015

Member

@heuermh Did you see any changes that needed to by made to the PR when you looked at the old maven-shade-plugin xml?

This RP looks good to me but, since it's a big change (and I've hit the merge button too quickly a few times lately), I wanted to check.

@massie

massie Aug 31, 2015

Member

@heuermh Did you see any changes that needed to by made to the PR when you looked at the old maven-shade-plugin xml?

This RP looks good to me but, since it's a big change (and I've hit the merge button too quickly a few times lately), I wanted to check.

This comment has been minimized.

@fnothaft

fnothaft Aug 31, 2015

Member

I'll build this and run on the cluster and see how it goes.

@fnothaft

fnothaft Aug 31, 2015

Member

I'll build this and run on the cluster and see how it goes.

This comment has been minimized.

@massie

massie Aug 31, 2015

Member

Thanks, Frank.

@massie

massie Aug 31, 2015

Member

Thanks, Frank.

This comment has been minimized.

@fnothaft

fnothaft Aug 31, 2015

Member

removal of compute-adam-classpath.sh and compute-adam-jars.sh; I haven't done this yet because I'm not sure how adam-pyspark is supposed to work

I think it is OK to remove both of those, and possibly adam-pyspark? CC @laserson

decision on whether to continue using cli module as the assembly artifact or create a separate assembly module as in the Spark build

What exactly would the difference be here?

test assembly artifact on YARN cluster, to confirm issues such as #663 don't resurface

This seems to be OK.

test assembly artifact in build for downstream applications

I don't think this change should impact downstream applications, except possibly plugins. Are plugins your concern?

@fnothaft

fnothaft Aug 31, 2015

Member

removal of compute-adam-classpath.sh and compute-adam-jars.sh; I haven't done this yet because I'm not sure how adam-pyspark is supposed to work

I think it is OK to remove both of those, and possibly adam-pyspark? CC @laserson

decision on whether to continue using cli module as the assembly artifact or create a separate assembly module as in the Spark build

What exactly would the difference be here?

test assembly artifact on YARN cluster, to confirm issues such as #663 don't resurface

This seems to be OK.

test assembly artifact in build for downstream applications

I don't think this change should impact downstream applications, except possibly plugins. Are plugins your concern?

This comment has been minimized.

@fnothaft

fnothaft Aug 31, 2015

Member

adam-submit and adam-shell are both OK on YARN (the issue seen in #663 does not reproduce).

@fnothaft

fnothaft Aug 31, 2015

Member

adam-submit and adam-shell are both OK on YARN (the issue seen in #663 does not reproduce).

This comment has been minimized.

@heuermh

heuermh Aug 31, 2015

Member

test assembly artifact in build for downstream applications

I don't think this change should impact downstream applications, except possibly plugins. Are plugins your concern?

avocado, rice, qc-metrics, etc. all use classpath-determining scripts similar to those being removed here in their bin directories.

@heuermh

heuermh Aug 31, 2015

Member

test assembly artifact in build for downstream applications

I don't think this change should impact downstream applications, except possibly plugins. Are plugins your concern?

avocado, rice, qc-metrics, etc. all use classpath-determining scripts similar to those being removed here in their bin directories.

This comment has been minimized.

@fnothaft

fnothaft Aug 31, 2015

Member

avocado, rice, qc-metrics, etc. all use classpath-determining scripts similar to those being removed here in their bin directories.

Oh, sure, but this change doesn't change the adam-{core,apis,cli} artifact that is published to Maven Central.
We can change the packaging in the downstream projects at a later date.

@fnothaft

fnothaft Aug 31, 2015

Member

avocado, rice, qc-metrics, etc. all use classpath-determining scripts similar to those being removed here in their bin directories.

Oh, sure, but this change doesn't change the adam-{core,apis,cli} artifact that is published to Maven Central.
We can change the packaging in the downstream projects at a later date.

This comment has been minimized.

@heuermh

heuermh Sep 28, 2015

Member

sorry, I've lost the threading for this comment.

I went back to the old shade plugin configuration and don't see those exclusions as being necessary any more. I don't see reference.conf either, so I'm not sure what that transformer line was for.

@heuermh

heuermh Sep 28, 2015

Member

sorry, I've lost the threading for this comment.

I went back to the old shade plugin configuration and don't see those exclusions as being necessary any more. I don't see reference.conf either, so I'm not sure what that transformer line was for.

@@ -28,8 +26,34 @@ if [[ -z $@ && -n "$ADAM_OPTS" ]]; then
echo "Run adam-shell instead as adam-shell <spark-args>"
fi
# Find ADAM cli assembly jar
ADAM_CLI_JAR=
if [ -f "$SCRIPT_DIR/repo" ]; then

This comment has been minimized.

@fnothaft

fnothaft Aug 27, 2015

Member

Why are we checking for a /repo directory here? Is this for backwards compatibility with appassembler?

@fnothaft

fnothaft Aug 27, 2015

Member

Why are we checking for a /repo directory here? Is this for backwards compatibility with appassembler?

This comment has been minimized.

@heuermh

heuermh Aug 27, 2015

Member

I updated the distribution assembly.xml to copy the adam-cli jar to repo, so this check is to see if the user is trying to run from the dist. The Spark build uses a separate assembly module and has a more involved find (in the spark-class script), we may wish to crib from there.

@heuermh

heuermh Aug 27, 2015

Member

I updated the distribution assembly.xml to copy the adam-cli jar to repo, so this check is to see if the user is trying to run from the dist. The Spark build uses a separate assembly module and has a more involved find (in the spark-class script), we may wish to crib from there.

This comment has been minimized.

@fnothaft

fnothaft Aug 27, 2015

Member

Ah, OK! Got it.

@fnothaft

fnothaft Aug 27, 2015

Member

Ah, OK! Got it.

# Find ADAM cli assembly jar
ADAM_CLI_JAR=
if [ -f "$SCRIPT_DIR/repo" ]; then
ASSEMBLY_DIR="$SCRIPT_DIR/repo"

This comment has been minimized.

@fnothaft

fnothaft Aug 27, 2015

Member

Same question as above here.

@fnothaft

fnothaft Aug 27, 2015

Member

Same question as above here.

Show outdated Hide outdated bin/adam-submit
@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Aug 27, 2015

Member
$ jar -tvf adam-cli/target/adam-cli_2.10-0.17.2-SNAPSHOT.jar | wc -l
   37210

and the limitation was only with JDK6 and earlier, it shouldn't be a problem with JDK7 and later.

Member

heuermh commented Aug 27, 2015

$ jar -tvf adam-cli/target/adam-cli_2.10-0.17.2-SNAPSHOT.jar | wc -l
   37210

and the limitation was only with JDK6 and earlier, it shouldn't be a problem with JDK7 and later.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Aug 27, 2015

Member

If we are at half of the limit, then I would go with this.

and the limitation was only with JDK6 and earlier, it shouldn't be a problem with JDK7 and later.

Is ZIP64 the default in JDK7 and onwards?

Member

fnothaft commented Aug 27, 2015

If we are at half of the limit, then I would go with this.

and the limitation was only with JDK6 and earlier, it shouldn't be a problem with JDK7 and later.

Is ZIP64 the default in JDK7 and onwards?

@massie

This comment has been minimized.

Show comment
Hide comment
@massie

massie Aug 27, 2015

Member

Agree. We have plenty of headroom. Zip64 is supported from JDK7 onward. It appears reading online that zip is default and, if there are >65535 files, that zip64 is used. I guess they did that for backward compatibility (since pre-JDK7 can't read zip64).

Member

massie commented Aug 27, 2015

Agree. We have plenty of headroom. Zip64 is supported from JDK7 onward. It appears reading online that zip is default and, if there are >65535 files, that zip64 is used. I guess they did that for backward compatibility (since pre-JDK7 can't read zip64).

@massie

This comment has been minimized.

Show comment
Hide comment
@massie

massie Sep 9, 2015

Member

@heuermh You mentioned that the following needs to be done. How's it going?

  • removal of compute-adam-classpath.sh and compute-adam-jars.sh; I haven't done this yet because I'm not sure how adam-pyspark is supposed to work
  • a careful look at maven-shade-plugin excludes/transformer configuration as mentioned above
  • decision on whether to continue using cli module as the assembly artifact or create a separate assembly module as in the Spark build
  • cleanup of changes to adam-shell.sh and adam-submit.sh per review notes
  • test assembly artifact on YARN cluster, to confirm issues such as #663 don't resurface
  • test assembly artifact in build for downstream applications

@fnothaft Are you using this on the cluster with success?

Member

massie commented Sep 9, 2015

@heuermh You mentioned that the following needs to be done. How's it going?

  • removal of compute-adam-classpath.sh and compute-adam-jars.sh; I haven't done this yet because I'm not sure how adam-pyspark is supposed to work
  • a careful look at maven-shade-plugin excludes/transformer configuration as mentioned above
  • decision on whether to continue using cli module as the assembly artifact or create a separate assembly module as in the Spark build
  • cleanup of changes to adam-shell.sh and adam-submit.sh per review notes
  • test assembly artifact on YARN cluster, to confirm issues such as #663 don't resurface
  • test assembly artifact in build for downstream applications

@fnothaft Are you using this on the cluster with success?

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Sep 28, 2015

Member

Pushing after rebase with additional changes: removed adam-pyspark and compute-* scripts; removed NOTICE.txt as cofoja is no longer a dependency; and cleaned up per review above.

I couldn't get this to do the right thing, even with some twiddling

jars=({ASSEMBLY_DIR}/adam-cli*.jar)
num_jars=${#jars[*]}
Member

heuermh commented Sep 28, 2015

Pushing after rebase with additional changes: removed adam-pyspark and compute-* scripts; removed NOTICE.txt as cofoja is no longer a dependency; and cleaned up per review above.

I couldn't get this to do the right thing, even with some twiddling

jars=({ASSEMBLY_DIR}/adam-cli*.jar)
num_jars=${#jars[*]}
@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Sep 29, 2015

Member

I couldn't get this to do the right thing, even with some twiddling

What (wrong thing) does it do instead?

Member

fnothaft commented Sep 29, 2015

I couldn't get this to do the right thing, even with some twiddling

What (wrong thing) does it do instead?

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Sep 29, 2015

Member

What (wrong thing) does it do instead?

If I change it to

jars=({$ASSEMBLY_DIR}/adam-cli*.jar)

then bash doesn't complain but

num_jars=0

when it shouldn't be. Maybe an OSX thing?

Member

heuermh commented Sep 29, 2015

What (wrong thing) does it do instead?

If I change it to

jars=({$ASSEMBLY_DIR}/adam-cli*.jar)

then bash doesn't complain but

num_jars=0

when it shouldn't be. Maybe an OSX thing?

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Sep 29, 2015

Member

I will check this out on my end.

Member

fnothaft commented Sep 29, 2015

I will check this out on my end.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Sep 29, 2015

Member

Also, the above RE: num_jars looks fine by me as it is.

Member

fnothaft commented Sep 29, 2015

Also, the above RE: num_jars looks fine by me as it is.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Sep 29, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/947/
Test PASSed.

AmplabJenkins commented Sep 29, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/947/
Test PASSed.

@fnothaft fnothaft added this to the 0.18.0 milestone Oct 2, 2015

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Oct 2, 2015

Member

@heuermh since this is a big build change, can you rebase this on ToT? Once it passes @AmplabJenkins, I will merge.

Member

fnothaft commented Oct 2, 2015

@heuermh since this is a big build change, can you rebase this on ToT? Once it passes @AmplabJenkins, I will merge.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Oct 2, 2015

Member

I might want to remove <mainClass>org.bdgenomics.adam.cli.ADAMMain</mainClass> as well. Per issue #808 running the adam cli jar will fail. WDYT?

Member

heuermh commented Oct 2, 2015

I might want to remove <mainClass>org.bdgenomics.adam.cli.ADAMMain</mainClass> as well. Per issue #808 running the adam cli jar will fail. WDYT?

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Oct 2, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/955/
Test PASSed.

AmplabJenkins commented Oct 2, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/955/
Test PASSed.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Oct 2, 2015

Member

@heuermh that sounds like a good idea to me. Can you make that change and push it?

Member

fnothaft commented Oct 2, 2015

@heuermh that sounds like a good idea to me. Can you make that change and push it?

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Oct 2, 2015

Member

ok

Member

heuermh commented Oct 2, 2015

ok

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Oct 2, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/956/
Test PASSed.

AmplabJenkins commented Oct 2, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/956/
Test PASSed.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Oct 2, 2015

Member

Thanks @heuermh! I've verified this on the cluster, and have merged it as f39de57.

Member

fnothaft commented Oct 2, 2015

Thanks @heuermh! I've verified this on the cluster, and have merged it as f39de57.

@fnothaft fnothaft closed this Oct 2, 2015

@heuermh heuermh deleted the heuermh:shade branch Oct 3, 2015

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Oct 3, 2015

Member

Thanks!

Member

heuermh commented Oct 3, 2015

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment