
[ADAM-651] Hive-style partitioning of parquet files by genomic position #1878

Closed
wants to merge 5 commits into bigdatagenomics:master from jpdna:hive_partitioned_v5

Conversation

6 participants
@jpdna
Member

jpdna commented Jan 19, 2018

Fixes #651

Manually merged changes for the "hive-style" partitioning branch as a single commit on top of master.

@coveralls


coveralls commented Jan 19, 2018

Coverage Status

Coverage decreased (-0.2%) to 82.6% when pulling 7cbe2f3 on jpdna:hive_partitioned_v5 into 4223f56 on bigdatagenomics:master.

@AmplabJenkins


AmplabJenkins commented Jan 19, 2018

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2578/
Test PASSed.

@heuermh

Thanks for cleaning up the rebase/merge stuff, @jpdna!

* @param pathName The path name to load alignment records from.
* Globs/directories are supported.
* @param regions Optional list of genomic regions to load.
* @param addChrPrefix Flag to add "chr" prefix to contigs


@heuermh

heuermh Jan 19, 2018

Member

I don't think this should be part of the API, and in fact simply adding or removing "chr" is not sufficient for converting between the different styles. See e.g. https://github.com/heuermh/dishevelled-bio/blob/master/tools/src/main/java/org/dishevelled/bio/tools/RenameReferences.java#L125 and below


@jpdna

jpdna Jan 19, 2018

Member

Agreed, I don't like having it in the API either.
I'll try to push any needed conversion into the application code (Mango) by having it inspect the sequence dictionary to see whether a conversion is needed, so that by the time a ReferenceRegion reaches ADAM code it already uses the contig naming convention of the underlying dataset.
@heuermh - I'll plan to use the replacement logic you pointed to - thanks!
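
As a first cut, the application-level check might look roughly like this (a hypothetical helper, not part of this PR, and only handling the simple prefix case; the fuller renaming logic would follow the RenameReferences approach linked above):

def toDatasetNamingConvention(region: ReferenceRegion, sd: SequenceDictionary): ReferenceRegion = {
  // Does the dataset's sequence dictionary use "chr"-prefixed contig names?
  val usesChrPrefix = sd.records.exists(_.name.startsWith("chr"))
  val name = region.referenceName
  if (usesChrPrefix && !name.startsWith("chr")) {
    region.copy(referenceName = "chr" + name)
  } else if (!usesChrPrefix && name.startsWith("chr")) {
    region.copy(referenceName = name.stripPrefix("chr"))
  } else {
    region
  }
}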


@fnothaft

fnothaft Jan 21, 2018

Member

+1 towards not including it and pushing it to user level. FYI, you can't link against dishevelled-bio as it is LGPL.


@heuermh

heuermh Jan 22, 2018

Member

You can't link against dishevelled-bio as it is LGPL.

I'm the only copyright holder in dishevelled-bio, so I could relicense stuff in there if necessary. I don't think this bit is interesting enough to do so, and it only covered the one use case I was interested in. That's why I haven't submitted something identical as a solution for #1757.

* @param pathName The path name to load alignment records from.
* Globs/directories are supported.
* @param regions Optional list of genomic regions to load.
* @param addChrPrefix Flag to add "chr" prefix to contigs


@heuermh

heuermh Jan 19, 2018

Member

Remove as above

* @param pathName The path name to load alignment records from.
* Globs/directories are supported.
* @param regions Optional list of genomic regions to load.
* @param addChrPrefix Flag to add "chr" prefix to contigs


@heuermh

heuermh Jan 19, 2018

Member

Remove as above

* @param addChrPrefix Flag to add "chr" prefix to contigs
* @return Returns a FeatureRDD.
*/


@heuermh

heuermh Jan 19, 2018

Member

Remove extra whitespace

}
datasetBoundFeatureRDD


@heuermh

heuermh Jan 19, 2018

Member

Remove extra whitespace

.option("spark.sql.parquet.compression.codec", compressCodec.toString.toLowerCase())
.save(filePath)
writePartitionedParquetFlag(filePath)
//rdd.context.writePartitionedParquetFlag(filePath)


@heuermh

heuermh Jan 19, 2018

Member

Remove commented out code

@@ -925,6 +925,33 @@ class FeatureRDDSuite extends ADAMFunSuite {
assert(rdd3.dataset.count === 4)
}
sparkTest("load paritioned parquet to sql, save, re-read from avro") {


@heuermh

heuermh Jan 19, 2018

Member

paritioned → partitioned

@@ -638,6 +638,41 @@ class AlignmentRecordRDDSuite extends ADAMFunSuite {
assert(rdd3.dataset.count === 20)
}
sparkTest("load from sam, save as partitioend parquet, and and re-read from partitioned parquet") {


@heuermh

heuermh Jan 19, 2018

Member

partitioend → partitioned

@@ -128,6 +128,15 @@ class GenotypeRDDSuite extends ADAMFunSuite {
assert(starts(752790L))
}
sparkTest("round trip to paritioned parquet") {


@heuermh

heuermh Jan 19, 2018

Member

paritioned → partitioned

"Options other than compression codec are ignored.")
val df = toDF()
df.withColumn("posBin", floor(df("start") / partitionSize))


@heuermh

heuermh Jan 19, 2018

Member

"posBin" → "position" or "positionBin" or "bin"

@coveralls


coveralls commented Jan 20, 2018

Coverage Status

Coverage decreased (-0.09%) to 82.616% when pulling 5a2be53 on jpdna:hive_partitioned_v5 into adff336 on bigdatagenomics:master.

@AmplabJenkins


AmplabJenkins commented Jan 20, 2018

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2580/
Test PASSed.

@fnothaft

Thanks @jpdna! It looks like this is close to ready!

* @param pathName The path name to load alignment records from.
* Globs/directories are supported.
* @param regions Optional list of genomic regions to load.
* @param addChrPrefix Flag to add "chr" prefix to contigs


@fnothaft

fnothaft Jan 21, 2018

Member

+1 towards not including it and pushing it to user level. FYI, you can't link against dishevelled-bio as it is LGPL.

val reads: AlignmentRecordRDD = ParquetUnboundAlignmentRecordRDD(sc, pathName, sd, rgd, pgs)
val datasetBoundAlignmentRecordRDD: AlignmentRecordRDD = regions match {
case Some(x) => DatasetBoundAlignmentRecordRDD(reads.dataset.filter(referenceRegionsToDatasetQueryString(x)), reads.sequences, reads.recordGroups, reads.processingSteps)


@fnothaft

fnothaft Jan 21, 2018

Member

Nit: Break long lines.

* @param addChrPrefix Flag to add "chr" prefix to contigs
* @return Returns a VariantRDD
*/


@fnothaft

fnothaft Jan 21, 2018

Member

Extra whitespace.

* @param addChrPrefix Flag to add "chr" prefix to contigs
* @return Returns an AlignmentRecordRDD.
*/
def loadPartitionedParquetAlignments(pathName: String, regions: Option[Iterable[ReferenceRegion]] = None): AlignmentRecordRDD = {


@fnothaft

fnothaft Jan 21, 2018

Member

Anywhere you have Option[Iterable[ReferenceRegion]] = None should be Iterable[ReferenceRegion] = Iterable.empty.
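
i.e., for each of these loaders, the suggested change is along these lines:

// current
def loadPartitionedParquetAlignments(pathName: String, regions: Option[Iterable[ReferenceRegion]] = None): AlignmentRecordRDD
// suggested
def loadPartitionedParquetAlignments(pathName: String, regions: Iterable[ReferenceRegion] = Iterable.empty): AlignmentRecordRDD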

* @param addChrPrefix Flag to add "chr" prefix to contigs
* @return Returns a GenotypeRDD.
*/
def loadPartitionedParquetGenotypes(pathName: String, regions: Option[Iterable[ReferenceRegion]] = None): GenotypeRDD = {


@fnothaft

fnothaft Jan 21, 2018

Member

See above comment RE: Option[Iterable[ReferenceRegion]] = None.

* @param addChrPrefix Flag to add "chr" prefix to contigs
* @return Returns a NucleotideContigFragmentRDD
*/
def loadPartitionedParquetFragments(pathName: String, regions: Option[Iterable[ReferenceRegion]] = None): NucleotideContigFragmentRDD = {


@fnothaft

fnothaft Jan 21, 2018

Member

See above comment RE: Option[Iterable[ReferenceRegion]] = None.

* @return Return True if partitioned flag found, False otherwise.
*/
def checkPartitionedParquetFlag(filePath: String): Boolean = {

*/
def checkPartitionedParquetFlag(filePath: String): Boolean = {
val path = new Path(filePath, "_isPartitionedByStartPos")


@fnothaft

fnothaft Jan 21, 2018

Member

Yeah, I'd suggest using the getFsAndFilesWithFilter function above. Behavior should be undefined if you have a glob but not all the paths are partitioned.

def referenceRegionsToDatasetQueryString(regions: Iterable[ReferenceRegion], partitionSize: Int = 1000000): String = {
var regionQueryString = "(contigName=" + "\'" + regions.head.referenceName + "\' and posBin >= \'" +


@fnothaft

fnothaft Jan 21, 2018

Member

This will throw if regions.isEmpty, suggest:

regions.map(r => {
  // logic for a single reference region goes here
}).mkString(" or " )
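
Filled out, the shape I have in mind is roughly the following (the per-region predicate below is illustrative only, assuming posBin = floor(start / partitionSize) as in the diff; callers would still check regions.nonEmpty first):

def referenceRegionsToDatasetQueryString(regions: Iterable[ReferenceRegion],
                                         partitionSize: Int = 1000000): String = {
  regions.map(r => {
    // prune partitions on the bin column, then apply a position overlap check
    val startBin = r.start / partitionSize
    val endBin = r.end / partitionSize
    s"(contigName = '${r.referenceName}' and posBin >= '$startBin' and posBin <= '$endBin'" +
      s" and start < ${r.end} and end > ${r.start})"
  }).mkString(" or ")
}
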
def writePartitionedParquetFlag(filePath: String): Boolean = {
val path = new Path(filePath, "_isPartitionedByStartPos")
val fs = path.getFileSystem(toDF().sqlContext.sparkContext.hadoopConfiguration)


@fnothaft

fnothaft Jan 21, 2018

Member

+1, should just be rdd.context.hadoopConfiguration
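
// instead of
val fs = path.getFileSystem(toDF().sqlContext.sparkContext.hadoopConfiguration)
// prefer
val fs = path.getFileSystem(rdd.context.hadoopConfiguration)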

@fnothaft fnothaft referenced this pull request Jan 21, 2018

Closed

HBase seperate module PR #1388

@fnothaft fnothaft added this to the 0.24.0 milestone Jan 21, 2018

@heuermh heuermh changed the title from hive-style partitioning of parquet files by genomic position to [ADAM-651] Hive-style partitioning of parquet files by genomic position Jan 22, 2018

@AmplabJenkins


AmplabJenkins commented Jan 23, 2018

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2586/
Test PASSed.

@AmplabJenkins


AmplabJenkins commented Jan 23, 2018

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2587/
Test PASSed.

@AmplabJenkins


AmplabJenkins commented Jan 23, 2018

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2588/
Test PASSed.

@AmplabJenkins


AmplabJenkins commented Jan 23, 2018

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2589/
Test PASSed.

@jpdna


Member

jpdna commented Jan 23, 2018

I believe I have addressed the reviewer requests above, except for the following:

  1. Add loadPartitionedParquetFragments returning a FragmentRDD and loadPartitionedParquetCoverage returning a CoverageRDD.
    I agree these should be added for completeness, but I'd rather not hold up the existing changes for them, since I don't believe Mango needs them for the current demo project.

  2. A Java-friendly method.
    I'd like to defer this to a future update.

  3. Make referenceRegionsToDatasetQueryString private.
    This requires a bit of work: as part of the effort to optimize latency in Mango by "caching" a dataset handle for re-use with multiple filters, Mango currently calls referenceRegionsToDatasetQueryString directly:
    https://github.com/jpdna/mango/blob/18369e43354f0de3f6804ab7ed83b5923d001538/mango-core/src/main/scala/org/bdgenomics/mango/models/AlignmentRecordMaterialization.scala#L224
    I agree, though, that this is an implementation detail that doesn't need to be in the API, and we should instead expose in the API the dataset-handle caching optimization that Mango uses.
    I think we could solve this by adding another version of the partitioned Parquet load functions that takes an existing dataset as input rather than a path name, for example:

def filterPartitionedParquetAlignmentRecordRDDbyRegions(dataset: AlignmentRecordRDD, regions: Iterable[ReferenceRegion] = Iterable.empty): AlignmentRecordRDD = {
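
Filled out, that might look roughly like the following (a sketch only, not the final implementation):

def filterPartitionedParquetAlignmentRecordRDDbyRegions(
  dataset: AlignmentRecordRDD,
  regions: Iterable[ReferenceRegion] = Iterable.empty): AlignmentRecordRDD = {

  if (regions.nonEmpty) {
    // Re-use the partition-aware predicate on the already loaded dataset
    // instead of going back to disk.
    DatasetBoundAlignmentRecordRDD(
      dataset.dataset.filter(referenceRegionsToDatasetQueryString(regions)),
      dataset.sequences,
      dataset.recordGroups,
      dataset.processingSteps)
  } else {
    dataset
  }
}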

I'll go ahead and start making those changes, and testing Mango with them, but I'd like to get the rest of this PR through a second pass in parallel.

Or, since we do need to get this merged sooner rather than later for Mango, what if we leave referenceRegionsToDatasetQueryString public for now but mark it as deprecated?

@AmplabJenkins


AmplabJenkins commented Jan 23, 2018

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2590/
Test PASSed.

@@ -857,4 +888,5 @@ class NucleotideContigFragmentRDDSuite extends ADAMFunSuite {
checkSave(variantContexts)
}


@akmorrow13

akmorrow13 Jan 23, 2018

Contributor

Remove line


@jpdna

jpdna Jan 24, 2018

Member

done

assert(sequenceRdd.sequences.containsRefName("aSequence"))
}
val inputPath = testFile("small.1.bed")


@akmorrow13

akmorrow13 Jan 23, 2018

Contributor

There are a lot of asserts here. Can you comment their purpose or break them into separate tests?


@jpdna

jpdna Jan 24, 2018

Member

done, removed the intermediate-step asserts, which were redundant.

genotypes.saveAsPartitionedParquet(outputPath)
val unfilteredGenotypes = sc.loadPartitionedParquetGenotypes(outputPath)
assert(unfilteredGenotypes.rdd.count === 18)


@akmorrow13

akmorrow13 Jan 23, 2018

Contributor

remove line


@jpdna

jpdna Jan 24, 2018

Member

done

assert(unfilteredVariants.rdd.count === 6)
assert(unfilteredVariants.dataset.count === 6)
val regionsVariants = sc.loadPartitionedParquetVariants(outputPath, List(ReferenceRegion("2", 19000L, 21000L), ReferenceRegion("13", 752700L, 752750L)))


@akmorrow13

akmorrow13 Jan 23, 2018

Contributor

line break


@jpdna

jpdna Jan 24, 2018

Member

done

@akmorrow13


Contributor

akmorrow13 commented Jan 23, 2018

@jpdna it may be good to update the Mango PR bigdatagenomics/mango#344 and make sure we have all the functionality we need in Mango included in this PR.

@AmplabJenkins


AmplabJenkins commented Jan 24, 2018

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2592/
Test PASSed.

@akmorrow13


Contributor

akmorrow13 commented Jan 28, 2018

Jenkins, retest this please.

@AmplabJenkins


AmplabJenkins commented Jan 28, 2018

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2611/

Build result: FAILURE

[...truncated 7 lines...] > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/:refs/remotes/origin/ # timeout=15 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1878/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 7894e89 # timeout=10Checking out Revision 7894e89 (origin/pr/1878/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 7894e89 > /home/jenkins/git2/bin/git rev-list 8f53bfe # timeout=10Triggering ADAM-prb ? 2.6.2,2.10,2.2.1,centosTriggering ADAM-prb ? 2.6.2,2.11,2.2.1,centosTriggering ADAM-prb ? 2.7.3,2.10,2.2.1,centosTriggering ADAM-prb ? 2.7.3,2.11,2.2.1,centosADAM-prb ? 2.6.2,2.10,2.2.1,centos completed with result FAILUREADAM-prb ? 2.6.2,2.11,2.2.1,centos completed with result FAILUREADAM-prb ? 2.7.3,2.10,2.2.1,centos completed with result FAILUREADAM-prb ? 2.7.3,2.11,2.2.1,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@AmplabJenkins


AmplabJenkins commented Jan 30, 2018

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2621/

Build result: FAILURE

[...truncated 7 lines...] > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/:refs/remotes/origin/ # timeout=15 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1878/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 345e43f # timeout=10Checking out Revision 345e43f (origin/pr/1878/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 345e43f > /home/jenkins/git2/bin/git rev-list 7894e89 # timeout=10Triggering ADAM-prb ? 2.6.2,2.10,2.2.1,centosTriggering ADAM-prb ? 2.6.2,2.11,2.2.1,centosTriggering ADAM-prb ? 2.7.3,2.10,2.2.1,centosTriggering ADAM-prb ? 2.7.3,2.11,2.2.1,centosADAM-prb ? 2.6.2,2.10,2.2.1,centos completed with result FAILUREADAM-prb ? 2.6.2,2.11,2.2.1,centos completed with result FAILUREADAM-prb ? 2.7.3,2.10,2.2.1,centos completed with result FAILUREADAM-prb ? 2.7.3,2.11,2.2.1,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

Paschall added some commits:
hive-style partitioning of parquet files by genomic position
removed addChrPrefix parameter

Address PR comments - part 1

Address PR comments - part 2

fix nits

Rebased

Address review comments - part 3

address reviewer comments - white space and redundant asserts

fixed isPartitioned
@AmplabJenkins


AmplabJenkins commented Jan 30, 2018

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2622/
Test PASSed.

@AmplabJenkins


AmplabJenkins commented Jan 31, 2018

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2626/
Test PASSed.

@jpdna


Member

jpdna commented Feb 2, 2018

Ping for further review.

*
* @param filePath Path to save the file at.
*/
def writePartitionedParquetFlag(filePath: String): Boolean = {


@akmorrow13

akmorrow13 Feb 2, 2018

Contributor

should this be private?


@jpdna

jpdna Feb 2, 2018

Member

agree, done.

@AmplabJenkins


AmplabJenkins commented Feb 2, 2018

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2632/
Test PASSed.

@heuermh


Member

heuermh commented Feb 7, 2018

As discussed earlier, here are two alternatives for style changes for the load methods:

Always return dataset bound RDD (always-return-dataset-bound.patch.txt)

def loadPartitionedParquetAlignments(
  pathName: String,
  regions: Iterable[ReferenceRegion] = Iterable.empty): AlignmentRecordRDD = {

  require(isPartitioned(pathName), s"Input Parquet files ($pathName) are not partitioned.")

  val reads = loadParquetAlignments(pathName, optPredicate = None, optProjection = None)

  val dataset = if (regions.nonEmpty) {
    reads.dataset.filter(referenceRegionsToDatasetQueryString(regions))
  } else {
    reads.dataset
  }

  DatasetBoundAlignmentRecordRDD(dataset, reads.sequences, reads.recordGroups, reads.processingSteps)
}

Return unbound or dataset bound RDD (return-unbound-or-dataset-bound.patch.txt)

def loadPartitionedParquetAlignments(
  pathName: String,
  regions: Iterable[ReferenceRegion] = Iterable.empty): AlignmentRecordRDD = {

  val reads = loadParquetAlignments(pathName, optPredicate = None, optProjection = None)

  val filteredReads = if (regions.nonEmpty) {
    require(isPartitioned(pathName), s"Input Parquet files ($pathName) are not partitioned.")

    DatasetBoundAlignmentRecordRDD(
      reads.dataset.filter(referenceRegionsToDatasetQueryString(regions)),
      reads.sequences,
      reads.recordGroups,
      reads.processingSteps
    )
  } else {
    reads
  }

  filteredReads
}
@heuermh


Member

heuermh commented Feb 7, 2018

Sorry, GitHub isn't allowing me to upload the patches referred to above; I'll send them via email.

@jpdna


Member

jpdna commented Feb 9, 2018

Thanks @heuermh!
I'm going with the first option, which always returns a DatasetBoundAlignmentRecordRDD, because we cannot allow the possibility of .rdd being called directly on a ParquetUnboundAlignmentRecordRDD backed by partitioned Parquet: partitioned Parquet can only be read as a dataset, and an attempt to read it as an RDD will cause an error. Happily, once you have a DatasetBoundAlignmentRecordRDD, .rdd will work to convert it to an RDD.
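
As a usage sketch of the chosen behaviour (the path and region below are illustrative):

val regions = Iterable(ReferenceRegion("1", 100000L, 200000L))
val alignments = sc.loadPartitionedParquetAlignments("sample.alignments.adam", regions)
// Always dataset-bound, so dropping down to an RDD is safe:
val reads = alignments.rdd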

Paschall added a commit
@AmplabJenkins


AmplabJenkins commented Feb 9, 2018

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2657/
Test PASSed.

@heuermh


Member

heuermh commented Feb 9, 2018

I'm going with the first option, which always returns a DatasetBoundAlignmentRecordRDD, because we cannot allow the possibility of .rdd being called directly on a ParquetUnboundAlignmentRecordRDD backed by partitioned Parquet: partitioned Parquet can only be read as a dataset, and an attempt to read it as an RDD will cause an error.

We must not have good enough unit test coverage then, because both patches passed all unit tests. :)

@AmplabJenkins


AmplabJenkins commented Feb 9, 2018

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2660/

Build result: FAILURE

[...truncated 7 lines...] > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/:refs/remotes/origin/ # timeout=15 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1878/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains d22b9ff # timeout=10Checking out Revision d22b9ff (origin/pr/1878/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f d22b9ff > /home/jenkins/git2/bin/git rev-list cba6e71 # timeout=10Triggering ADAM-prb ? 2.6.2,2.10,2.2.1,centosTriggering ADAM-prb ? 2.6.2,2.11,2.2.1,centosTriggering ADAM-prb ? 2.7.3,2.10,2.2.1,centosTriggering ADAM-prb ? 2.7.3,2.11,2.2.1,centosADAM-prb ? 2.6.2,2.10,2.2.1,centos completed with result FAILUREADAM-prb ? 2.6.2,2.11,2.2.1,centos completed with result FAILUREADAM-prb ? 2.7.3,2.10,2.2.1,centos completed with result FAILUREADAM-prb ? 2.7.3,2.11,2.2.1,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@heuermh heuermh added this to Triage in Release 0.24.0 Feb 10, 2018

@jpdna


Member

jpdna commented Feb 10, 2018

FYI - I have a working branch where the filtering has been moved into a filterByOverlappingRegions in GenomicDataset as discussed.
I'll push it tomorrow.

@jpdna


Member

jpdna commented Feb 13, 2018

Replaced by #1911

@jpdna jpdna closed this Feb 13, 2018

@heuermh heuermh moved this from Triage to Completed in Release 0.24.0 Feb 14, 2018
