[ADAM-1192] Correctly handle other whitespace in FASTA description. #1198

Merged
merged 1 commit into from Oct 8, 2016

Conversation

Projects
None yet
3 participants
@fnothaft
Member

fnothaft commented Oct 6, 2016

Resolves #1192. More broadly handles whitespace in the FASTA description, and also disregards FASTA metadata which is inlined.

Testing this on the cluster as we speak.

@@ -30,8 +30,8 @@ class Fasta2ADAMSuite extends ADAMFunSuite {
val contigFragments = sc.loadParquetContigFragments(convertPath)
assert(contigFragments.rdd.count() === 26)
val first = contigFragments.rdd.first()
- assert(first.getContig.getContigName === "gi|224384749|gb|CM000682.1|")
- assert(first.getDescription === "Homo sapiens chromosome 20, GRCh37 primary reference assembly")
+ assert(first.getContig.getContigName === null)

This comment has been minimized.

@heuermh

heuermh Oct 6, 2016

Member

Not sure I understand, the string before the first space is typically the sequence name in FASTA format. You might want to compare to the various open-bio projects.

@heuermh

heuermh Oct 6, 2016

Member

Not sure I understand, the string before the first space is typically the sequence name in FASTA format. You might want to compare to the various open-bio projects.

This comment has been minimized.

@fnothaft

fnothaft Oct 6, 2016

Member

Well, gi is the Entrez ID, and gb is the GenBank ID. I assure you that they are not the sequence name.

@fnothaft

fnothaft Oct 6, 2016

Member

Well, gi is the Entrez ID, and gb is the GenBank ID. I assure you that they are not the sequence name.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Oct 6, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1523/

Build result: FAILURE

GitHub pull request #1198 of commit 0c98c3c automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1198/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains 73f1a1cb0d8e8dee23d6ebf6b35bcb9b8881060b # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1198/merge^{commit} # timeout=10Checking out Revision 73f1a1cb0d8e8dee23d6ebf6b35bcb9b8881060b (origin/pr/1198/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 73f1a1cb0d8e8dee23d6ebf6b35bcb9b8881060bFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1523/

Build result: FAILURE

GitHub pull request #1198 of commit 0c98c3c automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1198/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains 73f1a1cb0d8e8dee23d6ebf6b35bcb9b8881060b # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1198/merge^{commit} # timeout=10Checking out Revision 73f1a1cb0d8e8dee23d6ebf6b35bcb9b8881060b (origin/pr/1198/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 73f1a1cb0d8e8dee23d6ebf6b35bcb9b8881060bFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

[ADAM-1194] Correctly handle other whitespace in FASTA description.
Resolves #1192. More broadly handles whitespace in the FASTA description, and
also disregards FASTA metadata which is inlined.
@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Oct 6, 2016

Member

Whoops! Forgot to commit the test resource. Just pushed.

Member

fnothaft commented Oct 6, 2016

Whoops! Forgot to commit the test resource. Just pushed.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Oct 6, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1524/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1524/
Test PASSed.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Oct 7, 2016

Member

With this, recomputing the MD tags worked on the cluster with GRCh38.

Member

fnothaft commented Oct 7, 2016

With this, recomputing the MD tags worked on the cluster with GRCh38.

@heuermh

heuermh approved these changes Oct 7, 2016

Not fully convinced of the changes here, but if they properly support the GRCh38 MD tag use case, then I'm +1.

@fnothaft fnothaft referenced this pull request in bigdatagenomics/bdg-formats Oct 7, 2016

Merged

Add sequence, slice, and read schema #83

@heuermh heuermh merged commit d70e68b into bigdatagenomics:master Oct 8, 2016

1 check passed

default Merged build finished.
Details
@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Oct 8, 2016

Member

Thank you, @fnothaft!

Member

heuermh commented Oct 8, 2016

Thank you, @fnothaft!

@ryan-williams ryan-williams referenced this pull request Oct 8, 2016

Closed

master won't compile #1200

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment