Use hadoop-bam BAMInputFormat to do loadIndexedBam #953

Closed
wants to merge 1 commit into
from

4 participants
@andrewmchen
Member

andrewmchen commented Feb 21, 2016

I changed the loadIndexedBam function to use the new InputFormat released in hadoop-bam 7.4.0, which filters reads using an index file. Previously we used an InputFormat that @erictu wrote.

In order to make this change, I had to upgrade our htsjdk library to version 2.1.0. Hope this doesn't break anything.

I haven't tested this on the cluster but I wrote a small test here. When I get a chance I'll try testing this on the cluster as well.

@AmplabJenkins

AmplabJenkins Feb 21, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1091/

Build result: FAILURE

GitHub pull request #953 of commit 13f8b6a automatically merged.
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/
 > git rev-parse origin/pr/953/merge^{commit} # timeout=10
 > git branch -a --contains 8311854 # timeout=10
 > git rev-parse remotes/origin/pr/953/merge^{commit} # timeout=10
Checking out Revision 8311854 (origin/pr/953/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 83118549c4f1557462c5d8a811327cc74fd8cb8d
First time build. Skipping changelog.
Triggering ADAM-prb » 2.6.0,2.10,1.4.1,centos
Triggering ADAM-prb » 2.6.0,2.11,1.4.1,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@fnothaft

Member

fnothaft commented Feb 21, 2016

Thanks for opening this, @andrewmchen! I will review soon. Moving to the new Hadoop-BAM/HTSJDK releases sounds good, but I would like to defer merging this to after the 0.19.0 release, as the version bump requires a move to Java 8. Since 0.19.0 has a lot of fixes in it, I'd like to keep that as widely available as possible.

@fnothaft added this to the 0.20.0 milestone Feb 21, 2016

@andrewmchen

Member

andrewmchen commented Feb 21, 2016

OK great thanks! I'm guessing then that the compilation problems on Jenkins are because of the mismatch between Java 7 and 8?

@fnothaft

Member

fnothaft commented Feb 21, 2016

The unit test failure is showing a Java version mismatch, but I'd have to look into it more closely to say something more intelligent.
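
For context on what such a mismatch typically looks like: a class compiled for Java 8 carries class-file major version 52, while a Java 7 JVM only accepts versions up to 51 and refuses to load anything newer. The sketch below reads that version from a class-file header. It is a generic diagnostic and not taken from this build; `ClassVersion` and its method names are made up for illustration, and the thread does not confirm that this is what Jenkins hit.

```scala
import java.io.{DataInputStream, InputStream}

// Hypothetical helper for inspecting which Java release a class file targets.
object ClassVersion {
  // A class file starts with: u4 magic (0xCAFEBABE), u2 minor, u2 major.
  def majorVersion(in: InputStream): Int = {
    val data = new DataInputStream(in)
    val magic = data.readInt()
    require(magic == 0xCAFEBABE, "not a Java class file")
    data.readUnsignedShort() // minor version, not needed here
    data.readUnsignedShort() // major version
  }

  // Major versions 49, 50, 51, 52 correspond to Java 5, 6, 7, 8.
  def javaRelease(major: Int): Int = major - 44
}
```

Pointing `majorVersion` at a `.class` entry from a dependency jar shows whether that dependency needs a newer JVM than the one running the tests.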

@heuermh

Member

heuermh commented Feb 22, 2016

+1 to waiting until 0.20 or later

@@ -41,4 +42,19 @@ trait Interval {
*/
def width: Long = end - start
+ /**
+ * Need to implement getStart function from Locatable. (1-based start position. closed)

@heuermh

heuermh Mar 22, 2016

Member

Mixing 0- and 1-based coordinate systems in the same class gives me hives. Can we hide this as an implementation detail in an adapter somewhere in the I/O code?
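
One way to read the adapter suggestion, as a minimal sketch: keep 0-based, half-open coordinates in the core region type and convert to 1-based, closed coordinates in a small wrapper used only at the I/O boundary. Every name here (`Region`, `LocatableLike`, `LocatableRegion`) is hypothetical. `LocatableLike` is a local stand-in that only mirrors the three accessors of htsjdk's `Locatable`, so the block has no htsjdk dependency, and none of this is the code from the pull request.

```scala
// Hypothetical stand-in mirroring the accessors of htsjdk's Locatable
// (1-based start, closed end). This is not the real htsjdk interface.
trait LocatableLike {
  def getContig: String
  def getStart: Int
  def getEnd: Int
}

// Hypothetical core region type: 0-based start, half-open end.
final case class Region(contig: String, start: Long, end: Long) {
  require(start >= 0 && end >= start, s"invalid region [$start, $end)")
  def width: Long = end - start
}

// Adapter meant to be used only at the I/O boundary, so the core type
// never exposes 1-based coordinates.
final class LocatableRegion(region: Region) extends LocatableLike {
  def getContig: String = region.contig
  // 0-based inclusive start -> 1-based inclusive start.
  def getStart: Int = Math.toIntExact(region.start + 1)
  // A 0-based exclusive end has the same value as a 1-based inclusive end.
  def getEnd: Int = Math.toIntExact(region.end)
}

object LocatableRegion {
  // Inverse conversion, for values coming back from a 1-based API.
  def toRegion(l: LocatableLike): Region =
    Region(l.getContig, l.getStart.toLong - 1L, l.getEnd.toLong)
}
```

With this split, the width is the same in both views (`getEnd - getStart + 1` equals `width`), and only the adapter needs to know about the off-by-one at the start position.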

@@ -181,6 +181,7 @@
<dependency>
<groupId>org.seqdoop</groupId>
<artifactId>hadoop-bam</artifactId>
+ <version>7.4.0</version>

@heuermh

heuermh Mar 22, 2016

Member

We'll probably do the update to Hadoop-BAM and HTSJDK in a separate pull request after taking a little time to solicit feedback on the move to JDK8.

@@ -511,7 +511,7 @@
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
- <version>16.0.1</version> <!-- note: version 17.0 breaks hadoop 2.6+ at runtime -->
+ <version>19.0</version> <!-- note: version 17.0 breaks hadoop 2.6+ at runtime -->

@heuermh

heuermh Mar 22, 2016

Member

I don't think we can make this change until Hadoop fixes the problem upstream. Will try to find the relevant issue; I had a reference to it before.

@fnothaft

Member

fnothaft commented May 18, 2016

Jenkins, test this please.

@AmplabJenkins

AmplabJenkins May 18, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1235/

Build result: FAILURE

GitHub pull request #953 of commit 13f8b6a.
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
 > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse 13f8b6a^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains 13f8b6a # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/953/head^{commit} # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/953/merge^{commit} # timeout=10
Checking out Revision 13f8b6a (origin/pr/953/merge, origin/pr/953/head)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f 13f8b6ab35faeb4b16d990451d7166f35c23c63a
First time build. Skipping changelog.
Triggering ADAM-prb » 2.6.0,2.10,1.5.2,centos
Triggering ADAM-prb » 2.6.0,2.11,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@fnothaft

Member

fnothaft commented May 20, 2016

Superseded by #1036.

@fnothaft closed this May 20, 2016
