Clean up ADAMContext #1118

Merged
merged 4 commits into from Sep 6, 2016

Conversation

Projects
None yet
5 participants
@fnothaft
Member

fnothaft commented Aug 18, 2016

Based on #1117.

Cleaning up org.bdgenomics.adam.rdd.ADAMContext:

  • Made constructor private.
  • Removed dead implicits.
  • Made loadParquet package private.
  • Broke long lines.
  • Added scaladoc.
  • Removed non-idiomatic uses of Options.
  • Removed unused overloaded loadFeatures definition.
  • Removed loadGenes and PrintGenes command.
  • Added support for .vcf.gz in loadGenotypes and loadVariants.
  • Removed Java implicits conversions in ADAMContext.

I've separated that last one out into a second commit (040396e) for now. I'm OK with both squashing it or not.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Aug 18, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1391/

Build result: FAILURE

GitHub pull request #1118 of commit 040396e automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1118/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains f9816046bc0ed5b3e93079da3590fef9599d19ee # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1118/merge^{commit} # timeout=10Checking out Revision f9816046bc0ed5b3e93079da3590fef9599d19ee (origin/pr/1118/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f f9816046bc0ed5b3e93079da3590fef9599d19eeFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1391/

Build result: FAILURE

GitHub pull request #1118 of commit 040396e automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1118/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains f9816046bc0ed5b3e93079da3590fef9599d19ee # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1118/merge^{commit} # timeout=10Checking out Revision f9816046bc0ed5b3e93079da3590fef9599d19ee (origin/pr/1118/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f f9816046bc0ed5b3e93079da3590fef9599d19eeFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Aug 18, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1392/

Build result: FAILURE

GitHub pull request #1118 of commit 4d7e2c0 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1118/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains cfde76c76eb2c6b79db2b4907c99b3e6c4fb47df # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1118/merge^{commit} # timeout=10Checking out Revision cfde76c76eb2c6b79db2b4907c99b3e6c4fb47df (origin/pr/1118/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f cfde76c76eb2c6b79db2b4907c99b3e6c4fb47dfFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1392/

Build result: FAILURE

GitHub pull request #1118 of commit 4d7e2c0 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1118/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains cfde76c76eb2c6b79db2b4907c99b3e6c4fb47df # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1118/merge^{commit} # timeout=10Checking out Revision cfde76c76eb2c6b79db2b4907c99b3e6c4fb47df (origin/pr/1118/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f cfde76c76eb2c6b79db2b4907c99b3e6c4fb47dfFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Aug 18, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1393/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1393/
Test PASSed.

@jpdna

This comment has been minimized.

Show comment
Hide comment
@jpdna

jpdna Aug 19, 2016

Member

+1 to these, will plan to merge later in the day 8/22 unless more comments

Member

jpdna commented Aug 19, 2016

+1 to these, will plan to merge later in the day 8/22 unless more comments

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Aug 23, 2016

Member

Would like a chance to test the glob and directory path changes.

I wonder if @akmorrow13 and @tdanford are ok with removing the gene-related functionality?

Member

heuermh commented Aug 23, 2016

Would like a chance to test the glob and directory path changes.

I wonder if @akmorrow13 and @tdanford are ok with removing the gene-related functionality?

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Aug 23, 2016

Member

@heuermh Agreed; I'm dropping a -1 here to block the merge, since this depends on #1117 which still needs more review.

Member

fnothaft commented Aug 23, 2016

@heuermh Agreed; I'm dropping a -1 here to block the merge, since this depends on #1117 which still needs more review.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Aug 23, 2016

Member

I wonder if @akmorrow13 and @tdanford are ok with removing the gene-related functionality?

I hadn't realized until now that we're using the Gene models downstream in Mango. IMO, I think we should drop the hierarchical gene models in favor of extracting flat transcripts/exons from feature files. @heuermh I think you had suggested this a while ago. The hierarchical gene models have always had iffy performance, since they rely on something like 4 consecutive groupBys. I think the flat feature approach is sufficient for what Mango is doing, but indeed, @akmorrow13 is the correct person to comment!

Member

fnothaft commented Aug 23, 2016

I wonder if @akmorrow13 and @tdanford are ok with removing the gene-related functionality?

I hadn't realized until now that we're using the Gene models downstream in Mango. IMO, I think we should drop the hierarchical gene models in favor of extracting flat transcripts/exons from feature files. @heuermh I think you had suggested this a while ago. The hierarchical gene models have always had iffy performance, since they rely on something like 4 consecutive groupBys. I think the flat feature approach is sufficient for what Mango is doing, but indeed, @akmorrow13 is the correct person to comment!

@akmorrow13

This comment has been minimized.

Show comment
Hide comment
@akmorrow13

akmorrow13 Aug 23, 2016

Contributor

By flat feature approach do you mean having an object of (transcript, genid..) without the exons? Whould the exons be a separate object?

Best,

Alyssa Morrow
akmorrow@berkeley.edu mailto:akmorrow@berkeley.edu
414-254-6645

On Aug 23, 2016, at 8:42 AM, Frank Austin Nothaft notifications@github.com wrote:

I wonder if @akmorrow13 https://github.com/akmorrow13 and @tdanford https://github.com/tdanford are ok with removing the gene-related functionality?

I hadn't realized until now that we're using the Gene models downstream in Mango. IMO, I think we should drop the hierarchical gene models in favor of extracting flat transcripts/exons from feature files. @heuermh https://github.com/heuermh I think you had suggested this a while ago. The hierarchical gene models have always had iffy performance, since they rely on something like 4 consecutive groupBys. I think the flat feature approach is sufficient for what Mango is doing, but indeed, @akmorrow13 https://github.com/akmorrow13 is the correct person to comment!


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub #1118 (comment), or mute the thread https://github.com/notifications/unsubscribe-auth/AGM7-EbxkwA6IQXpWaesl_-dhn3XkLKbks5qixTXgaJpZM4JnR_q.

Contributor

akmorrow13 commented Aug 23, 2016

By flat feature approach do you mean having an object of (transcript, genid..) without the exons? Whould the exons be a separate object?

Best,

Alyssa Morrow
akmorrow@berkeley.edu mailto:akmorrow@berkeley.edu
414-254-6645

On Aug 23, 2016, at 8:42 AM, Frank Austin Nothaft notifications@github.com wrote:

I wonder if @akmorrow13 https://github.com/akmorrow13 and @tdanford https://github.com/tdanford are ok with removing the gene-related functionality?

I hadn't realized until now that we're using the Gene models downstream in Mango. IMO, I think we should drop the hierarchical gene models in favor of extracting flat transcripts/exons from feature files. @heuermh https://github.com/heuermh I think you had suggested this a while ago. The hierarchical gene models have always had iffy performance, since they rely on something like 4 consecutive groupBys. I think the flat feature approach is sufficient for what Mango is doing, but indeed, @akmorrow13 https://github.com/akmorrow13 is the correct person to comment!


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub #1118 (comment), or mute the thread https://github.com/notifications/unsubscribe-auth/AGM7-EbxkwA6IQXpWaesl_-dhn3XkLKbks5qixTXgaJpZM4JnR_q.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Aug 23, 2016

Member

By flat feature approach do you mean having an object of (transcript, genid..) without the exons? Whould the exons be a separate object?

You would have an RDD[Feature], so in that sense it is flat. The structure of the features themselves is rich, including nesting via the parentIdsfeatureId relationship and typing via featureType.

Member

heuermh commented Aug 23, 2016

By flat feature approach do you mean having an object of (transcript, genid..) without the exons? Whould the exons be a separate object?

You would have an RDD[Feature], so in that sense it is flat. The structure of the features themselves is rich, including nesting via the parentIdsfeatureId relationship and typing via featureType.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Aug 23, 2016

Member

What @heuermh said. ;)
For an example, see how the transcriptome: RDD[Feature] is used here.

Member

fnothaft commented Aug 23, 2016

What @heuermh said. ;)
For an example, see how the transcriptome: RDD[Feature] is used here.

@@ -849,6 +1119,14 @@ class ADAMContext(@transient val sc: SparkContext) extends Serializable with Log
DatabaseVariantAnnotationRDD(rdd, sd)
}
+ /**
+ * Loads NucleotideContigFragments stored in Parquet, with metadata.
+ *

This comment has been minimized.

@heuermh

heuermh Aug 23, 2016

Member

comment doesn't match method

@heuermh

heuermh Aug 23, 2016

Member

comment doesn't match method

This comment has been minimized.

@fnothaft

fnothaft Aug 31, 2016

Member

Sorry, what here doesn't match the method?

@fnothaft

fnothaft Aug 31, 2016

Member

Sorry, what here doesn't match the method?

This comment has been minimized.

@heuermh

heuermh Aug 31, 2016

Member

Something is wonky with the github web UI, look at line 1136 in view whole file, it should be "Load VariantAnnotations ...". Then line 1126 def loadParquetVariantAnnotations is missing doc.

https://github.com/fnothaft/adam/blob/27b769c69e7a2a864ba7a99d824213df72909acc/adam-core/src/main/scala/org/bdgenomics/adam/rdd/ADAMContext.scala#L1136

@heuermh

heuermh Aug 31, 2016

Member

Something is wonky with the github web UI, look at line 1136 in view whole file, it should be "Load VariantAnnotations ...". Then line 1126 def loadParquetVariantAnnotations is missing doc.

https://github.com/fnothaft/adam/blob/27b769c69e7a2a864ba7a99d824213df72909acc/adam-core/src/main/scala/org/bdgenomics/adam/rdd/ADAMContext.scala#L1136

- filePath: String): DatabaseVariantAnnotationRDD = {
+ /**
+ * Loads variant annotations from a VCF.
+ *

This comment has been minimized.

@heuermh

heuermh Aug 23, 2016

Member

Loads variant annotations stored in VCF format.

@heuermh

heuermh Aug 23, 2016

Member

Loads variant annotations stored in VCF format.

This comment has been minimized.

@fnothaft

fnothaft Aug 23, 2016

Member

OOC, why do you see that text as preferable?

@fnothaft

fnothaft Aug 23, 2016

Member

OOC, why do you see that text as preferable?

This comment has been minimized.

@heuermh

heuermh Aug 24, 2016

Member

Little minds, consistency, etc. 😉 Reads the same as the other similar methods.

@heuermh

heuermh Aug 24, 2016

Member

Little minds, consistency, etc. 😉 Reads the same as the other similar methods.

def loadGenotypes(
filePath: String,
projection: Option[Schema] = None): GenotypeRDD = {
- if (filePath.endsWith(".vcf")) {
+ if (filePath.endsWith(".vcf") ||
+ filePath.endsWith(".vcf.gz")) {

This comment has been minimized.

@heuermh

heuermh Aug 23, 2016

Member

Should also support bgzf and bgz file extensions

@heuermh

heuermh Aug 23, 2016

Member

Should also support bgzf and bgz file extensions

def loadVariants(
filePath: String,
projection: Option[Schema] = None): VariantRDD = {
- if (filePath.endsWith(".vcf")) {
+ if (filePath.endsWith(".vcf") ||
+ filePath.endsWith(".vcf.gz")) {

This comment has been minimized.

@heuermh

heuermh Aug 23, 2016

Member

same as above

@heuermh

heuermh Aug 23, 2016

Member

same as above

@@ -906,6 +1207,21 @@ class ADAMContext(@transient val sc: SparkContext) extends Serializable with Log
}
}
+ /**
+ * Auto-detects the file type and loads contigs as an RDD.

This comment has been minimized.

@heuermh

heuermh Aug 24, 2016

Member

an RDD → a NucleotideContigFragmentRDD

@heuermh

heuermh Aug 24, 2016

Member

an RDD → a NucleotideContigFragmentRDD

@@ -1011,6 +1355,19 @@ class ADAMContext(@transient val sc: SparkContext) extends Serializable with Log
}
}
+ /**
+ * Loads alignments from a given path, and infers the input type.

This comment has been minimized.

@heuermh

heuermh Aug 24, 2016

Member

should this be alignment fragments? Or better yet: Auto-detects the file type and loads a FragmentRDD.

@heuermh

heuermh Aug 24, 2016

Member

should this be alignment fragments? Or better yet: Auto-detects the file type and loads a FragmentRDD.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Aug 31, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1444/

Build result: FAILURE

GitHub pull request #1118 of commit 27b769c automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1118/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains b995fcd # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1118/merge^{commit} # timeout=10Checking out Revision b995fcd (origin/pr/1118/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f b995fcdbb9436e4613066ecebc768ae0b7070e11First time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1444/

Build result: FAILURE

GitHub pull request #1118 of commit 27b769c automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1118/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains b995fcd # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1118/merge^{commit} # timeout=10Checking out Revision b995fcd (origin/pr/1118/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f b995fcdbb9436e4613066ecebc768ae0b7070e11First time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Aug 31, 2016

Member

For #1129, do you want to leave those changes out, have me PR against this, or add on to this PR?

In addition to what you have here, we would want to remove

adam-core/src/main/scala/org/bdgenomics/adam/rdd/features/GeneRDD.scala
adam-core/src/main/scala/org/bdgenomics/adam/models/Gene.scala

FeatureRDD.rddToGeneRDD, FeatureRDD.reassignParentIds, and FeatureRDD.toGenes methods
https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/rdd/features/FeatureRDD.scala#L144

and relevant unit tests.

Member

heuermh commented Aug 31, 2016

For #1129, do you want to leave those changes out, have me PR against this, or add on to this PR?

In addition to what you have here, we would want to remove

adam-core/src/main/scala/org/bdgenomics/adam/rdd/features/GeneRDD.scala
adam-core/src/main/scala/org/bdgenomics/adam/models/Gene.scala

FeatureRDD.rddToGeneRDD, FeatureRDD.reassignParentIds, and FeatureRDD.toGenes methods
https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/org/bdgenomics/adam/rdd/features/FeatureRDD.scala#L144

and relevant unit tests.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Aug 31, 2016

Member

How soon do you think you'll have #1129 done? If by the end of the week, let's tack it on here. If it'd be next week, let's just open it whenever you're ready.

Member

fnothaft commented Aug 31, 2016

How soon do you think you'll have #1129 done? If by the end of the week, let's tack it on here. If it'd be next week, let's just open it whenever you're ready.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Sep 1, 2016

Member

It just takes application of the delete key :)

Member

heuermh commented Sep 1, 2016

It just takes application of the delete key :)

+ sparkTest("load parquet with globs") {
+ val inputPath = resourcePath("small.sam")
+ val reads = sc.loadAlignments(inputPath)
+ val outputPath = tempLocation()

This comment has been minimized.

@heuermh

heuermh Sep 1, 2016

Member

This fails to compile for me on your branch. Are you depending on a different version of utils?

@heuermh

heuermh Sep 1, 2016

Member

This fails to compile for me on your branch. Are you depending on a different version of utils?

@heuermh

This comment has been minimized.

Show comment
Hide comment
Member

heuermh commented Sep 1, 2016

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Sep 1, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1452/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1452/
Test PASSed.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Sep 6, 2016

Member

I've tested this and haven't run into any problems.

LGTM once the one remaining doc fix above (lines 1126 and 1136 in AdamContext.scala) is in.

Member

heuermh commented Sep 6, 2016

I've tested this and haven't run into any problems.

LGTM once the one remaining doc fix above (lines 1126 and 1136 in AdamContext.scala) is in.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Sep 6, 2016

Member

Thanks @heuermh! Let me get those fixes and rebase this now.

Member

fnothaft commented Sep 6, 2016

Thanks @heuermh! Let me get those fixes and rebase this now.

fnothaft added some commits Aug 17, 2016

[ADAM-993] Support loading files using globs and from directory paths.
Resolves #993.

* Add private helper functions in ADAMContext to elaborate out globs and
  directory paths when loading files.
* Eliminate unused functions for elaborating paths and loading mixtures
  of read files, and some redundant dictionary loading functions.
* Add tests to cover loading directories/globs of:
  * Parquet files
  * BAM files (with/without using indices)
  * VCF files
Cleaning up `org.bdgenomics.adam.rdd.ADAMContext`:
* Made constructor private.
* Removed dead implicits.
* Made `loadParquet` package private.
* Broke long lines.
* Added scaladoc.
* Removed non-ideomatic uses of `Option`s.
* Removed unused overloaded `loadFeatures` definition.
* Removed `loadGenes` and `PrintGenes` command.
* Added support for .vcf.gz in `loadGenotypes` and `loadVariants`.
@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Sep 6, 2016

Member

Just rebased and addressed comments. Thanks @heuermh!

Member

fnothaft commented Sep 6, 2016

Just rebased and addressed comments. Thanks @heuermh!

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Sep 6, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1456/

Build result: FAILURE

GitHub pull request #1118 of commit ea62ad8 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1118/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains adffcf2562521de78054bea2c3ec26d72ccf4920 # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1118/merge^{commit} # timeout=10Checking out Revision adffcf2562521de78054bea2c3ec26d72ccf4920 (origin/pr/1118/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f adffcf2562521de78054bea2c3ec26d72ccf4920First time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1456/

Build result: FAILURE

GitHub pull request #1118 of commit ea62ad8 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1118/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains adffcf2562521de78054bea2c3ec26d72ccf4920 # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1118/merge^{commit} # timeout=10Checking out Revision adffcf2562521de78054bea2c3ec26d72ccf4920 (origin/pr/1118/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f adffcf2562521de78054bea2c3ec26d72ccf4920First time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Sep 6, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1457/

Build result: FAILURE

GitHub pull request #1118 of commit c720f45 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1118/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains b001ce2059e41a5c830b132d006c52b20d3a554b # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1118/merge^{commit} # timeout=10Checking out Revision b001ce2059e41a5c830b132d006c52b20d3a554b (origin/pr/1118/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f b001ce2059e41a5c830b132d006c52b20d3a554bFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1457/

Build result: FAILURE

GitHub pull request #1118 of commit c720f45 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1118/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains b001ce2059e41a5c830b132d006c52b20d3a554b # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1118/merge^{commit} # timeout=10Checking out Revision b001ce2059e41a5c830b132d006c52b20d3a554b (origin/pr/1118/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f b001ce2059e41a5c830b132d006c52b20d3a554bFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Sep 6, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1458/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1458/
Test PASSed.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Sep 6, 2016

Member

Ready to merge? Commits the way you'd like them?

Member

heuermh commented Sep 6, 2016

Ready to merge? Commits the way you'd like them?

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Sep 6, 2016

Member

All is good from my side! Feel free to merge, @heuermh.

Member

fnothaft commented Sep 6, 2016

All is good from my side! Feel free to merge, @heuermh.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Sep 6, 2016

Member

Great, will do!

I feel like this merge should come with a thunderbolt or some other cool sound effects.

Member

heuermh commented Sep 6, 2016

Great, will do!

I feel like this merge should come with a thunderbolt or some other cool sound effects.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Sep 6, 2016

Member

I feel like this merge should come with a thunderbolt or some other cool sound effects.

Could be a nice addition to the ./scripts/commit-pr.sh script... ;)

Member

fnothaft commented Sep 6, 2016

I feel like this merge should come with a thunderbolt or some other cool sound effects.

Could be a nice addition to the ./scripts/commit-pr.sh script... ;)

@heuermh heuermh merged commit 9ea03a5 into bigdatagenomics:master Sep 6, 2016

1 check passed

default Merged build finished.
Details

@fnothaft fnothaft deleted the fnothaft:cleanup-adam-context branch Sep 6, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment