[ADAM-993] Support loading files using globs and from directory paths. #1117

Closed

fnothaft commented Aug 18, 2016

Resolves #993. Currently based on #1116.

  • Add private helper functions in ADAMContext to elaborate out globs and
    directory paths when loading files.
  • Eliminate unused functions for elaborating paths and loading mixtures
    of read files, and some redundant dictionary loading functions.
  • Add tests to cover loading directories/globs of:
    • Parquet files
    • BAM files (with/without using indices)
    • VCF files
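
To make the first bullet concrete, here is a minimal sketch of what "elaborating" a path into a list of files can look like. It is not the code from this PR: the object and method names are hypothetical, and it uses `java.nio.file` on the local filesystem as a stand-in so that it runs without extra dependencies. It also assumes the glob appears only in the last path component.

```scala
import java.nio.file.{ Files, Path, Paths }
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper, for illustration only (not ADAM's implementation).
object ElaboratePaths {

  // Lists the entries of `dir` whose file names match `glob`, sorted by name.
  private def list(dir: Path, glob: String): Seq[Path] = {
    val stream = Files.newDirectoryStream(dir, glob)
    try {
      val buf = ArrayBuffer.empty[Path]
      val it = stream.iterator()
      while (it.hasNext) buf += it.next()
      buf.sortBy(_.toString).toList
    } finally {
      stream.close()
    }
  }

  // Expands a plain file, a directory, or "dir/<glob>" into concrete paths.
  def elaborate(pathName: String): Seq[Path] = {
    val isGlob = pathName.exists(c => "*?[{".indexOf(c) >= 0)
    if (isGlob) {
      // Assumption: only the final path component contains glob characters.
      val idx = pathName.lastIndexOf('/')
      val dir = if (idx < 0) "." else if (idx == 0) "/" else pathName.substring(0, idx)
      list(Paths.get(dir), pathName.substring(idx + 1))
    } else {
      val p = Paths.get(pathName)
      if (Files.isDirectory(p)) list(p, "*") else Seq(p)
    }
  }

  def main(args: Array[String]): Unit = {
    // Prints every path the argument expands to, one per line.
    args.foreach(a => elaborate(a).foreach(println))
  }
}
```

On a Hadoop filesystem the same shape would presumably be built on `FileSystem.globStatus` and `FileSystem.listStatus` rather than `java.nio.file`; the per-format filtering discussed in the review comments below would then be applied to the returned list.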

AmplabJenkins Aug 18, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1389/

Build result: FAILURE

GitHub pull request #1117 of commit 2ed6056 automatically merged.
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
 > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1117/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains 864c987c375b176d5aac27d791c30696d88f400f # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1117/merge^{commit} # timeout=10
Checking out Revision 864c987c375b176d5aac27d791c30696d88f400f (origin/pr/1117/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f 864c987c375b176d5aac27d791c30696d88f400f
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

+    val bamFiles = getFsAndFiles(path)
+    val filteredFiles = bamFiles.filter(p => {
+      val pPath = p.getName()
+      pPath.endsWith(".bam") || pPath.endsWith(".sam") || pPath.startsWith("part-")

fnothaft Aug 18, 2016

Member

I'd like close review of this line. Should it be different?
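
To help with reviewing that line, here is a small standalone sketch of the same predicate applied to a few file names. The object name, method name, and sample file names are hypothetical and chosen only for illustration; the condition itself is copied from the diff above.

```scala
// Hypothetical wrapper around the condition shown in the diff above.
object ReadFileFilter {

  // True if the file name would be kept by the filter in the diff.
  def looksLikeReadFile(name: String): Boolean =
    name.endsWith(".bam") || name.endsWith(".sam") || name.startsWith("part-")

  def main(args: Array[String]): Unit = {
    // Sample names are made up; they are not taken from the ADAM test resources.
    val names = Seq(
      "reads.bam",
      "reads.bam.bai",
      "reads.sam",
      "part-r-00000",
      "part-r-00000.bai",
      "_SUCCESS")
    names.foreach(n => println(s"$n -> ${looksLikeReadFile(n)}"))
  }
}
```

Under this predicate an index such as `reads.bam.bai` is dropped, but any name beginning with `part-` is kept regardless of its extension, so a name like `part-r-00000.bai` (if such files ever occur) would pass. Whether that matters depends on what actually lands in these directories.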

  private[rdd] def loadVcfMetadata(filePath: String): (SequenceDictionary, Seq[Sample]) = {
+   // get the paths to all vcfs
+   val files = getFsAndFiles(new Path(filePath))

fnothaft Aug 18, 2016

Member

Also, similar to the .bam/.sam logic, thoughts here? I don't want to go with the same logic, since .vcfs commonly show up GZIPped/etc, but .tbi files abound...
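
One possible answer to the question above, sketched under assumptions: strip a known compression suffix first, then require `.vcf`. This accepts compressed VCFs while rejecting `.tbi` indices. The suffix list, the object name, and the method name are hypothetical and are not what this PR implements.

```scala
// Hypothetical filter, for discussion only.
object VcfFileFilter {

  // Assumed set of compression suffixes; adjust to whatever the loader supports.
  val compressionSuffixes: Seq[String] = Seq(".gz", ".bgz", ".bz2")

  // True if the name is a VCF, optionally followed by one compression suffix.
  def looksLikeVcf(name: String): Boolean = {
    val base = compressionSuffixes
      .find(s => name.endsWith(s))
      .map(s => name.dropRight(s.length))
      .getOrElse(name)
    base.endsWith(".vcf")
  }

  def main(args: Array[String]): Unit = {
    // Sample names are made up for illustration.
    Seq("calls.vcf", "calls.vcf.gz", "calls.vcf.bgz", "calls.vcf.gz.tbi")
      .foreach(n => println(s"$n -> ${looksLikeVcf(n)}"))
  }
}
```

Note that, unlike the .bam/.sam filter, this sketch does not accept `part-` files; that would need a separate decision.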

AmplabJenkins Aug 18, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1390/
Test PASSed.

fnothaft referenced this pull request Aug 18, 2016: Clean up ADAMContext #1118 (Merged)

fnothaft Aug 25, 2016

Member

Ping for review.

fnothaft Aug 31, 2016

Member

Rebased. Ping for review/merge.

AmplabJenkins Aug 31, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1445/

Build result: FAILURE

GitHub pull request #1117 of commit 24e6ce9 automatically merged.
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
 > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1117/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains 01b3db6 # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1117/merge^{commit} # timeout=10
Checking out Revision 01b3db6 (origin/pr/1117/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f 01b3db687bea2cb42bd628c58a05136eff02257e
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

[ADAM-993] Support loading files using globs and from directory paths.
Resolves #993.

* Add private helper functions in ADAMContext to elaborate out globs and
  directory paths when loading files.
* Eliminate unused functions for elaborating paths and loading mixtures
  of read files, and some redundant dictionary loading functions.
* Add tests to cover loading directories/globs of:
  * Parquet files
  * BAM files (with/without using indices)
  * VCF files

AmplabJenkins Aug 31, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1447/

Build result: FAILURE

GitHub pull request #1117 of commit 555c5a1 automatically merged.
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
 > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1117/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains 3c1110b # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1117/merge^{commit} # timeout=10
Checking out Revision 3c1110b (origin/pr/1117/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f 3c1110bca20be7fced7b4f6465c286e953f6b1e7
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

fnothaft Sep 1, 2016

Member

Jenkins, retest this please.

AmplabJenkins Sep 1, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1449/
Test PASSed.

heuermh Sep 6, 2016

Member

This can be closed since the commit was included in #1118, correct?

fnothaft Sep 6, 2016

Member

@heuermh let me double check...

fnothaft Sep 6, 2016

Member

> This can be closed since the commit was included in #1118, correct?

The answer is yes! Just went through and rebased and made sure we didn't lose anything. Closing this now...

fnothaft closed this Sep 6, 2016

fnothaft deleted the fnothaft:issues/993-glob branch Sep 6, 2016