[ADAM-1011] Refactor to add GenomicRDDs for all Avro types #1051

Merged
merged 13 commits on Jul 18, 2016

5 participants
@fnothaft
Member

fnothaft commented Jun 15, 2016

A bit of work towards #1011. I'll be dropping the rest of #1011 onto here. Not ready for merge yet, but ready for first review.

@AmplabJenkins

AmplabJenkins Jun 15, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1275/
Test PASSed.

+ replaceRdd(tFn(rdd))
+ }
+
+ protected def replaceRdd(newRdd: RDD[T]): U
}

@heuermh

heuermh Jun 17, 2016

Member

Could you explain (again, sorry) what transform and replaceRdd are for?

In the context of #1040, I was thinking the pattern would be

val foos: FooRDD = sc.loadFoo("...")
val rdd: RDD[Foo] = foos.rdd
rdd.map(/* ... */) // general Spark RDD methods
foos.adamSpecificMethods() // ADAM-specific methods from FooRDD
foos.saveAsBar()

I.e. that the caller wouldn't mix RDD and FooRDD method calls. If I understand correctly, transform allows the caller to

foos.adamSpecificMethod().transform(_.map(/*... general Spark RDD methods */))

Is that a common enough pattern in Scala? Will this be obvious to clients of our APIs? Will this work when adapted for Java APIs?

@fnothaft

fnothaft Jun 17, 2016

Member

This riffs off the spark.ml pipelines work: http://spark.apache.org/docs/latest/ml-guide.html#transformers

In:

val foos: FooRDD = sc.loadFoo("...")
val rdd: RDD[Foo] = foos.rdd
rdd.map(/* ... */) // general Spark RDD methods
foos.adamSpecificMethods() // ADAM-specific methods from FooRDD
foos.saveAsBar()

You'd need to create a new FooRDD with the product of rdd.map(...).
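
To make the trade-off concrete, here is a minimal sketch of the transform/replaceRdd pattern. It assumes a plain Seq as a stand-in for Spark's RDD so that it runs without a cluster; GenomicData, FooData, Foo, and sortByStart are hypothetical names for illustration and are not ADAM's API. Only the transform and replaceRdd signatures mirror the diff quoted above.

```scala
// Stand-in sketch: Seq[T] plays the role of RDD[T]; all names are hypothetical.
case class Foo(name: String, start: Long)

trait GenomicData[T, U <: GenomicData[T, U]] {
  // The wrapped collection (an RDD[T] in the real code).
  val rdd: Seq[T]

  // Apply a general collection function, then rewrap so the metadata is kept.
  def transform(tFn: Seq[T] => Seq[T]): U = {
    replaceRdd(tFn(rdd))
  }

  // Each concrete wrapper knows how to rebuild itself around new data.
  protected def replaceRdd(newRdd: Seq[T]): U
}

case class FooData(rdd: Seq[Foo], sequences: Seq[String])
    extends GenomicData[Foo, FooData] {

  protected def replaceRdd(newRdd: Seq[Foo]): FooData = copy(rdd = newRdd)

  // A wrapper-specific method, analogous to an ADAM-specific method.
  def sortByStart(): FooData = copy(rdd = rdd.sortBy(_.start))
}

object TransformSketch {
  val foos = FooData(
    Seq(Foo("b", 20L), Foo("a", 10L), Foo("c", 5L)),
    Seq("chr1"))

  // Mix general collection calls with wrapper-specific calls in one chain.
  val result: FooData = foos
    .transform(_.filter(_.start >= 10L))
    .sortByStart()
}
```

Without transform, the caller would have to unwrap foos.rdd, apply the function, and rebuild a FooData by hand, passing the metadata (here, sequences) along each time.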

@fnothaft

Member

fnothaft commented Jun 27, 2016

@akmorrow13

akmorrow13 Jun 29, 2016

Contributor

+1

fnothaft added some commits Jun 14, 2016

Add NucleotideContigFragmentRDD.
* Add NucleotideContigFragmentRDD to replace RDD[NucleotideContigFragment] and
  updated various ADAMContext functions.
* Added test suite for Fasta2ADAM to validate round trip loading.

Added FeatureRDD and GeneRDD.
* Added Feature and Gene RDDs and updated loading functions.
* Modified Features2ADAMSuite to use sc.loadFeatures to test end-to-end
  conversion.

Added FragmentRDD.
* Added FragmentRDD to replace RDD[Fragment]; refactored load/save methods
* Moved toFragments method from implicit on RDD[AlignmentRecord] to AlignmentRecordRDD
* Added `-single` option to Fragments2Reads.
* Added test suites for Fragments2Reads and Reads2Fragments.

@AmplabJenkins

AmplabJenkins Jul 7, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1330/
Test PASSed.

Refactored GenomicRDD support for overlaps.
* Made region join implementations package private to `org.bdgenomics.adam.rdd`.
* Added support for `broadcastRegionJoin`, `shuffleRegionJoin`, and
  `filterByOverlappingRegion` to `GenomicRDD`.
* Added private `GenericGenomicRDD` class.
* Cleaned up `CalculateDepth`, which used the old `broadcastRegionJoin`
  implementation.
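
For readers unfamiliar with region overlap filtering, the idea behind a method like `filterByOverlappingRegion` can be sketched as follows. This is an illustration only, not the implementation added in this commit: Region, filterByOverlap, and the half-open [start, end) coordinate convention are assumptions, and a Seq stands in for an RDD.

```scala
// Hypothetical stand-ins; not ADAM's region or GenomicRDD classes.
case class Region(contig: String, start: Long, end: Long) {
  // Half-open [start, end) overlap test on the same contig (assumed convention).
  def overlaps(other: Region): Boolean =
    contig == other.contig && start < other.end && other.start < end
}

object OverlapSketch {
  // Keep only the records whose region overlaps the query region.
  def filterByOverlap[T](records: Seq[T], query: Region)(regionOf: T => Region): Seq[T] =
    records.filter(r => regionOf(r).overlaps(query))

  val reads = Seq(
    Region("chr1", 0L, 100L),
    Region("chr1", 100L, 200L),
    Region("chr2", 50L, 150L))

  // Only the first read overlaps chr1:[50, 100) under the half-open convention.
  val hits = filterByOverlap(reads, Region("chr1", 50L, 100L))(identity)
}
```

A region join generalizes this from a single query region to a whole collection of regions on the other side.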
@AmplabJenkins

AmplabJenkins Jul 8, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1331/
Test PASSed.

@AmplabJenkins

AmplabJenkins Jul 8, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1332/

Build result: FAILURE

GitHub pull request #1051 of commit 5fc859b automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1051/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains 89c7196aa36e4344beceac2be6d127015b394ede # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1051/merge^{commit} # timeout=10Checking out Revision 89c7196aa36e4344beceac2be6d127015b394ede (origin/pr/1051/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 89c7196aa36e4344beceac2be6d127015b394edeFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@fnothaft

fnothaft Jul 8, 2016

Member

Poof! AlignmentRecordRDDFunctions is gone!

@heuermh your hunch about this making the Java API much easier to implement seems to be accurate.

@AmplabJenkins

AmplabJenkins Jul 8, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1333/
Test PASSed.

@fnothaft

fnothaft Jul 8, 2016

Member

@heuermh see bd56167 for the compound genomic RDD bit we were talking about.

@fnothaft

fnothaft Jul 9, 2016

Member

Just org.bdgenomics.adam.rdd.variation to go... So. Close.

@AmplabJenkins

AmplabJenkins Jul 9, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1334/
Test PASSed.

@fnothaft

fnothaft Jul 9, 2016

Member

Almost done! I have all the *RDDFunctions classes factored out, and I'm going to finish #1040 tomorrow morning.

@fnothaft

fnothaft Jul 9, 2016

Member

I would say that this is good for final review. The work for #1040 is going to be a single commit and will not be very intrusive. It will touch a lot of files, but not in any sort of important way, and it mostly touches test suites.

@AmplabJenkins

AmplabJenkins Jul 9, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1335/
Test PASSed.

@fnothaft

fnothaft Jul 9, 2016

Member

All done! Resolves #1011 and #1040. Please review.

@AmplabJenkins

AmplabJenkins Jul 9, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1336/
Test PASSed.

@heuermh

heuermh Jul 9, 2016

Member

Nice work! This is a big one, will take a day or two to review.

@tdanford @jpdna @akmorrow13 @ryan-williams @laserson @massie Calling in for backup!

@fnothaft

fnothaft Jul 9, 2016

Member

> This is a big one, will take a day or two to review.

Yeah, with a change this big, I definitely want to get it right. Perhaps we can target having it reviewed by Wednesday in time for the next BDG standup?

@jpdna

jpdna Jul 14, 2016

Member

+1 on this based on my review so far.
Are there any specific aspects of this PR that we feel need further attention/review?

@jpdna

jpdna Jul 15, 2016

Member

Can we set a goal for Monday to either:
a) merge this, or
b) create a checklist of remaining issues?

@heuermh

heuermh Jul 15, 2016

Member

I'm not sure there are any remaining issues; I'd just like more eyeballs on it, particularly from downstream users, since it is a breaking change from an API point of view.

EOD Monday sounds good for a deadline.

@fnothaft

fnothaft Jul 15, 2016

Member

+1 @ EOD Monday deadline.

+ val aRdd = sc.loadAlignments(path)
+ assert(aRdd.jrdd.count() === 20)
+
+ val newRdd = JavaADAMReadConduit.conduit(aRdd, sc)

@heuermh

heuermh Jul 18, 2016

Member

I would prefer including the code from JavaADAMReadConduit inline here, and similarly for the other types. Having the *Conduit classes separate and public leads one to think they are more useful than they are.

@fnothaft

fnothaft Jul 18, 2016

Member

Alas, they can't be inlined because they're written in Java and the test suite is written in Scala. I was unaware that they were public. Perhaps the best route forward is to make them package private?

@heuermh

heuermh Jul 18, 2016

Member

Ah right, I missed the language jump!

It might be useful to have a Java version of ADAMFunSuite for downstream users of the API module, and then the unit tests for JavaADAMContext could bridge both languages. This would require running both junit and scalatest test runners (which shouldn't be a problem) and would allow us to create a test-jar artifact for use downstream.

For now I would suggest making the *Conduit classes package private, and we can expand unit test support in Java as part of #855.

@fnothaft

fnothaft Jul 18, 2016

Member

+1, that sounds like a good approach!

@@ -74,7 +74,7 @@ class AlleleCount(val args: AlleleCountArgs) extends BDGSparkCommand[AlleleCount
def run(sc: SparkContext) {
- val adamVariants: RDD[Genotype] = sc.loadGenotypes(args.adamFile)
+ val adamVariants: RDD[Genotype] = sc.loadGenotypes(args.adamFile).rdd

@heuermh

heuermh Jul 18, 2016

Member

Alternatively, should countAlleles below accept GenotypeRDD as a parameter?

@@ -67,77 +63,32 @@ class CalculateDepth(protected val args: CalculateDepthArgs) extends BDGSparkCom
val proj = Projection(contigName, start, cigar, readMapped)
- val adamRDD: RDD[AlignmentRecord] = sc.loadAlignments(args.adamInputPath, projection = Some(proj))

@heuermh

heuermh Jul 18, 2016

Member

+1 to removing the variant name stuff, thanks

- def apply(rdd: RDD[AlignmentRecord],
- sd: SequenceDictionary,
- rgd: RecordGroupDictionary): RDD[AlignmentRecord] = {
+ def apply(rdd: AlignmentRecordRDD): AlignmentRecordRDD = {

@heuermh

heuermh Jul 18, 2016

Member

rdd and adamRecords -> alignmentRecords here and below, to reduce confusion about rdd.rdd.context

@fnothaft

fnothaft Jul 18, 2016

Member

Is it OK with you to punt this cleanup to #1081? I can make a note there to clean that up after the merge and rebase.

@heuermh

heuermh Jul 18, 2016

Member

Yep

@@ -335,8 +334,11 @@ class Transform(protected val args: TransformArgs) extends BDGSparkCommand[Trans
(rdd ++ t.rdd, sd ++ t.sequences, rgd ++ t.recordGroups)
})
+ // make a new alignment record rdd
+ val newRdd = AlignedReadRDD(mergedRdd, mergedSd, mergedRgd)

@heuermh

heuermh Jul 18, 2016

Member

alignment record vs. aligned read? I need to go see what the difference is. The comment and classname should match.

@@ -29,7 +30,7 @@ class Features2ADAMSuite extends ADAMFunSuite {
ignore("can convert a simple BED file") {

@heuermh

heuermh Jul 18, 2016

Member

if this test works, un-ignore it

@@ -44,10 +44,8 @@ class ViewSuite extends ADAMFunSuite {
)
val aRdd = sc.loadBam(inputSamPath)

@heuermh

heuermh Jul 18, 2016

Member

aRdd -> alignmentRecords

+ /**
+ * Java friendly save function. Automatically detects the output format.
+ *
+ * If the filename ends in ".bed", we write a BED file. If the file name ends

@heuermh

heuermh Jul 18, 2016

Member

To be clear, these actually write a directory full of part... files in the various feature formats, since we're using Spark saveAsTextFile. Support for single files is #1058.

}
+ /**
+ * Saves this FeatrueRDD as a interval list file.

@heuermh

heuermh Jul 18, 2016

Member

FeatrueRDD -> FeatureRDD

+ }
+
+ /**
+ * Groups all reads by record group and read name

@heuermh

heuermh Jul 18, 2016

Member

name -> name.

- def transform(tFn: RDD[VariantContext] => RDD[VariantContext]): VariantContextRDD = {
- VariantContextRDD(tFn(rdd), sequences, samples)
+ /**
+ * Left outer join database variant annotations

@heuermh

heuermh Jul 18, 2016

Member

annotations -> annotations.

- assert(features.count === 15)
+ val path = testFile("Homo_sapiens.GRCh37.75.trun20.gtf")
+ val features: RDD[Feature] = sc.loadFeatures(path).rdd
+ assert(features.rdd.count === 15)

@heuermh

heuermh Jul 18, 2016

Member

if features is already of type RDD[Feature] then why features.rdd.count instead of features.count?

val expected = sc.loadGff3(inputPath)
val outputPath = tempLocation(".gff3")
expected.saveAsGff3(outputPath)
// grab all partitions, may not necessarily be in order; sort by reference
val actual = sc.loadGff3(outputPath + "/part-*")
- val pairs = expected.coalesce(1).sortByReference().zip(actual.coalesce(1).sortByReference()).collect
+ val pairs = expected.transform(_.coalesce(1)).sortByReference().rdd.zip(actual.transform(_.coalesce(1)).sortByReference().rdd).collect

@heuermh

heuermh Jul 18, 2016

Member

eesh, this line keeps getting longer! I wish I could've separated it out into a utility method or class, but I couldn't get that to work due to serialization issues. If you would like to take a shot at it, go fer it!

}
- def artificial_reads: RDD[AlignmentRecord] = {
+ def artificial_reads_rdd: AlignmentRecordRDD = {

@heuermh

heuermh Jul 18, 2016

Member

artificial_reads_rdd -> artificialReads

}
+ /**
+ * Saves this FeatureRDD to a GTF file.

@heuermh

heuermh Jul 18, 2016

Member

How about
Save this FeatureRDD in GTF format.
and
@param fileName The path to save GTF formatted text file(s) to.

and similar for below?

@heuermh

heuermh Jul 18, 2016

Member

Done with my review.

heuermh modified the milestone: 0.20.0 Jul 18, 2016

@AmplabJenkins

AmplabJenkins Jul 18, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1342/

Build result: FAILURE

GitHub pull request #1051 of commit a6dbf86 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1051/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains 2a96d1f9d793c6da5c758c0fdaf0f5a7d4555de7 # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1051/merge^{commit} # timeout=10Checking out Revision 2a96d1f9d793c6da5c758c0fdaf0f5a7d4555de7 (origin/pr/1051/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 2a96d1f9d793c6da5c758c0fdaf0f5a7d4555de7First time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@fnothaft

fnothaft Jul 18, 2016

Member

@heuermh all review comments should be addressed. Thanks for making a last pass!

fnothaft added some commits Jul 8, 2016

Removed AlignmentRecordRDDFunctions.
* Removed `AlignmentRecordRDDFunctions` and moved all functions into
  `AlignmentRecordRDD`. Moved test suite along with the main class move.
* Removed `countTag` functions and `PrintTags` CLI.
* Removed `JavaAlignmentRecordRDD`. The Java API just uses the
  `AlignmentRecordRDD` now. To do this, I moved the Java friendly save methods
  and JavaRDD to `GenomicRDD`.

Removed NucleotideContigFragmentRDDFunctions.
* Removed `NucleotideContigFragmentRDDFunctions` and moved all functions to
  `NucleotideContigFragmentRDD`. Moved test suite to accompany refactor.
* Added Java helper functions, and added functions for loading sequence data
  from the `JavaADAMContext`.
* Refactored JavaADAMContext test code to make it easier to test Java API.
* Fixed `ADAMContext.loadSequences` so that it correctly identifies `.fa.gz`
  and `.fasta.gz` file extensions.
* Fixed `ADAMFunSuite.copyResource` so that it correctly copies resource files
  with compound file extensions.

Moderate additional refactor to FragmentRDD.
* Adding documentation to FragmentRDD.
* Added Java-friendly save method to FragmentRDD.
* Added method for loading Fragments from the `JavaADAMContext`.

Removed FeatureRDDFunctions class.
* Removed `FeatureRDDFunctions` class and moved all methods into `FeatureRDD`.
  Renamed test suite to match, and added Java helper functions.
* Moved all `adam-*/src/test/resources/features/*` files into their respective
  `resources` directory. This is necessary due to an issue when using
  `copyResources` with subdirectories.
* Removed `ignore` on test in `org.bdgenomics.adam.cli.Features2ADAMSuite`.

Removed VariationRDDFunctions.
* Removed `VariationRDDFunctions` and moved remaining methods to
  `VariantContextRDD`. Moved test suite code as well.
* Removed abstract class `ADAMSequenceDictionaryRDDAggregator`.
* Added Java helper methods to various variation RDDs and accompanying loader
  methods in `JavaADAMContext`.
* Miscellaneous documentation cleanup in `org.bdgenomics.adam.rdd.variation`.

Resolves #1011.
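
The compound-extension fixes listed in the commits above come down to stripping an optional compression suffix before checking the format suffix. A small sketch of that idea, assuming a hypothetical helper named looksLikeFasta; this is not the code in `ADAMContext.loadSequences`, and the case-sensitive matching is an assumption:

```scala
object ExtensionSketch {
  // Hypothetical helper: true if the path looks like FASTA, optionally gzipped.
  def looksLikeFasta(path: String): Boolean = {
    // Drop a trailing ".gz" first so "ref.fa.gz" is tested as "ref.fa".
    val name = path.stripSuffix(".gz")
    name.endsWith(".fa") || name.endsWith(".fasta")
  }
}
```

Checking only the final extension would classify `ref.fa.gz` as a generic gzip file, which is the kind of miss the commit message describes.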

fnothaft referenced this pull request Jul 18, 2016

Closed

Clean up packages #1083

6 of 6 tasks complete
@AmplabJenkins

AmplabJenkins Jul 18, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1343/
Test PASSed.

@AmplabJenkins

AmplabJenkins Jul 18, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1344/
Test PASSed.

@heuermh heuermh changed the title from Refactor to add GenomicRDDs for all Avro types to [ADAM-1011] Refactor to add GenomicRDDs for all Avro types Jul 18, 2016

AmplabJenkins Jul 18, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1346/

Build result: FAILURE

[...truncated 24 lines...]
Triggering ADAM-prb » 2.6.0,2.11,1.4.1,centos
Triggering ADAM-prb » 2.6.0,2.10,1.4.1,centos
Triggering ADAM-prb » 2.3.0,2.11,1.4.1,centos
Triggering ADAM-prb » 2.3.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.10,1.5.2,centos
Triggering ADAM-prb » 2.3.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,1.3.1,centos
Triggering ADAM-prb » 2.6.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,1.3.1,centos
ADAM-prb » 2.6.0,2.10,1.3.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.10,1.3.1,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.10,1.6.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.10,1.4.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,1.5.2,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.11,1.4.1,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.10,1.4.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,1.4.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.6.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.10,1.5.2,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,1.6.1,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.11,1.3.1,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.11,1.6.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,1.3.1,centos completed with result SUCCESS
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

fnothaft Jul 18, 2016

Member

Jenkins, retest this please.

heuermh Jul 18, 2016

Member

Did the pull request rename retrigger Jenkins? Sorry if so...

fnothaft Jul 18, 2016

Member

Seems so! Alas.

fnothaft Jul 18, 2016

Member

Can't run too many builds, though, right?

AmplabJenkins Jul 18, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1347/
Test PASSed.

heuermh Jul 18, 2016

Member

LGTM. Shall I merge now or did you want to squash some commits first?

fnothaft Jul 18, 2016

Member

@heuermh I think I'd like to keep the history as is here. I'm cool with this one being merged via the good ol' green merge button.

heuermh Jul 18, 2016

Member

<click>

@heuermh heuermh merged commit 810806c into bigdatagenomics:master Jul 18, 2016

1 check passed

default Merged build finished.

heuermh Jul 18, 2016

Member

Thank you, @fnothaft!

fnothaft Jul 18, 2016

Member

w00t! Thank you as well, @heuermh!
