New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAM-587] Clean up loading checks. #588

Merged
merged 1 commit into from Mar 5, 2015

Conversation

Projects
None yet
4 participants
@fnothaft
Member

fnothaft commented Feb 25, 2015

I think this resolves what we were discussing on #587 last night. CC @laserson @ryan-williams for thoughts and review.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Feb 25, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/609/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/609/
Test PASSed.

Show outdated Hide outdated adam-core/src/main/scala/org/bdgenomics/adam/rdd/ADAMContext.scala
if (projection.isDefined) {
log.warn("Projection is ignored when loading a BAM file")
}
val reads = adamBamLoad(filePath)

This comment has been minimized.

@laserson

laserson Feb 25, 2015

Contributor

why is there a loadBam and an adamBamLoad? Can we eliminate one?

@laserson

laserson Feb 25, 2015

Contributor

why is there a loadBam and an adamBamLoad? Can we eliminate one?

This comment has been minimized.

@fnothaft

fnothaft Feb 25, 2015

Member

Probably. Let me take a look.

@fnothaft

fnothaft Feb 25, 2015

Member

Probably. Let me take a look.

@@ -133,19 +133,13 @@ class ADAMContext(val sc: SparkContext) extends Serializable with Logging {
* @tparam T The type of records to return
* @return An RDD with records of the specified type
*/
def adamLoad[T, U <: UnboundRecordFilter](filePath: String, predicate: Option[Class[U]] = None, projection: Option[Schema] = None)(implicit ev1: T => SpecificRecord, ev2: Manifest[T]): RDD[T] = {
private[rdd] def adamLoad[T, U <: UnboundRecordFilter](filePath: String, predicate: Option[Class[U]] = None, projection: Option[Schema] = None)(implicit ev1: T => SpecificRecord, ev2: Manifest[T]): RDD[T] = {

This comment has been minimized.

@laserson

laserson Feb 25, 2015

Contributor

Is there still a use for keeping this public?

@laserson

laserson Feb 25, 2015

Contributor

Is there still a use for keeping this public?

This comment has been minimized.

@fnothaft

fnothaft Feb 25, 2015

Member

You need it to be private[rdd] so that you can unit test the function itself.

@fnothaft

fnothaft Feb 25, 2015

Member

You need it to be private[rdd] so that you can unit test the function itself.

Show outdated Hide outdated adam-core/src/main/scala/org/bdgenomics/adam/rdd/ADAMContext.scala
if (filePath.endsWith(".ifq")) {
def loadInterleavedFastq[U <: ADAMPredicate[AlignmentRecord]](

This comment has been minimized.

@laserson

laserson Feb 25, 2015

Contributor

In cases where we don't push the predicates into Parquet, does it make sense for the load function to take a predicate? What's the point? The user could just apply the filter themselves afterwards, and everything is more transparent.

@laserson

laserson Feb 25, 2015

Contributor

In cases where we don't push the predicates into Parquet, does it make sense for the load function to take a predicate? What's the point? The user could just apply the filter themselves afterwards, and everything is more transparent.

This comment has been minimized.

@fnothaft

fnothaft Feb 25, 2015

Member

The predicate is optional with a default of None; if the user doesn't want to use a predicate they just don't provide a predicate.

@fnothaft

fnothaft Feb 25, 2015

Member

The predicate is optional with a default of None; if the user doesn't want to use a predicate they just don't provide a predicate.

This comment has been minimized.

@laserson

laserson Feb 27, 2015

Contributor

I understand they don't have to provide a predicate. But you're essentially replicating the API that the RDD already gives them, namely .filter(). If they want to filter, they should call .filter().

@laserson

laserson Feb 27, 2015

Contributor

I understand they don't have to provide a predicate. But you're essentially replicating the API that the RDD already gives them, namely .filter(). If they want to filter, they should call .filter().

}
private def maybeLoadFasta[U <: ADAMPredicate[NucleotideContigFragment]](
def loadFasta[U <: ADAMPredicate[NucleotideContigFragment]](
filePath: String,
predicate: Option[Class[U]] = None,

This comment has been minimized.

@laserson

laserson Feb 25, 2015

Contributor

Does this fn even use the predicate?

@laserson

laserson Feb 25, 2015

Contributor

Does this fn even use the predicate?

This comment has been minimized.

@fnothaft

fnothaft Feb 25, 2015

Member

Good catch, will fix.

@fnothaft

fnothaft Feb 25, 2015

Member

Good catch, will fix.

Show outdated Hide outdated adam-core/src/main/scala/org/bdgenomics/adam/rdd/ADAMContext.scala
ContextUtil.getConfiguration(job))
if (Metrics.isRecording) records.instrument() else records
applyPredicate(records.map(p => vcc.convertToAnnotation(p._2.get)), predicate)

This comment has been minimized.

@laserson

laserson Feb 25, 2015

Contributor

Can we delete the applyPredicate fn? Shouldn't that be taken care of by RDD.filter()?

@laserson

laserson Feb 25, 2015

Contributor

Can we delete the applyPredicate fn? Shouldn't that be taken care of by RDD.filter()?

This comment has been minimized.

@fnothaft

fnothaft Feb 25, 2015

Member

applyPredicate is a convenience function to avoid needing to RDD.filter() over and over again.

@fnothaft

fnothaft Feb 25, 2015

Member

applyPredicate is a convenience function to avoid needing to RDD.filter() over and over again.

This comment has been minimized.

@laserson

laserson Feb 27, 2015

Contributor

I'm not sure I understand how it helps. Could you give an example?

@laserson

laserson Feb 27, 2015

Contributor

I'm not sure I understand how it helps. Could you give an example?

filePath: String,
predicate: Option[Class[U]] = None,
projection: Option[Schema] = None): RDD[DatabaseVariantAnnotation] = {
adamLoad[DatabaseVariantAnnotation, U](filePath, predicate, projection)
}
def loadVariantAnnotations[U <: ADAMPredicate[DatabaseVariantAnnotation]](

This comment has been minimized.

@laserson

laserson Feb 25, 2015

Contributor

Should this fn be targeted only at ADAM/Parquet data? Or is this the fn that does some "magic" to try to load whatever works?

@laserson

laserson Feb 25, 2015

Contributor

Should this fn be targeted only at ADAM/Parquet data? Or is this the fn that does some "magic" to try to load whatever works?

This comment has been minimized.

@fnothaft

fnothaft Feb 25, 2015

Member

This is the magical one. The ADAM/Parquet one is loadParquetVariantAnnotations right above this.

@fnothaft

fnothaft Feb 25, 2015

Member

This is the magical one. The ADAM/Parquet one is loadParquetVariantAnnotations right above this.

This comment has been minimized.

@laserson

laserson Feb 27, 2015

Contributor

Should we have a more distinct naming scheme for "magic" vs specific load fns? Or it'll be clear enough from docs?

@laserson

laserson Feb 27, 2015

Contributor

Should we have a more distinct naming scheme for "magic" vs specific load fns? Or it'll be clear enough from docs?

This comment has been minimized.

@fnothaft

fnothaft Feb 27, 2015

Member

We should probably have a more distinct scheme.

@fnothaft

fnothaft Feb 27, 2015

Member

We should probably have a more distinct scheme.

new GeneFeatureRDDFunctions(maybeLoadGTF(filePath)
.fold(adamLoad[Feature, U](filePath, predicate, projection))(applyPredicate(_, predicate)))
.asGenes()
import ADAMContext._

This comment has been minimized.

@laserson

laserson Feb 25, 2015

Contributor

What does this do?

@laserson

laserson Feb 25, 2015

Contributor

What does this do?

This comment has been minimized.

@fnothaft

fnothaft Feb 25, 2015

Member

It is necessary to bring in .asGenes().

@fnothaft

fnothaft Feb 25, 2015

Member

It is necessary to bring in .asGenes().

@tdanford

This comment has been minimized.

Show comment
Hide comment
@tdanford

tdanford Feb 26, 2015

Contributor

This isn't ready to merge yet, @fnothaft , you're still addressing uri's concerns, right?

Contributor

tdanford commented Feb 26, 2015

This isn't ready to merge yet, @fnothaft , you're still addressing uri's concerns, right?

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Feb 26, 2015

Member

Yeah, there are a few small updates needed; I'll get to them this PM.
This needs a rebase anyways.

Member

fnothaft commented Feb 26, 2015

Yeah, there are a few small updates needed; I'll get to them this PM.
This needs a rebase anyways.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Feb 27, 2015

Member

Rebased and updated.

Member

fnothaft commented Feb 27, 2015

Rebased and updated.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Feb 27, 2015

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/614/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/614/
Test PASSed.

@laserson

This comment has been minimized.

Show comment
Hide comment
@laserson

laserson Mar 2, 2015

Contributor

@fnothaft is this ready to go? I'll open a new issue about eliminating applyPredicate where we can discuss (and also that only loading functions that use Parquet should accept predicates).

Contributor

laserson commented Mar 2, 2015

@fnothaft is this ready to go? I'll open a new issue about eliminating applyPredicate where we can discuss (and also that only loading functions that use Parquet should accept predicates).

@laserson

This comment has been minimized.

Show comment
Hide comment
@laserson

laserson Mar 2, 2015

Contributor

Reference #608 and #609 for the predicate issues.

Contributor

laserson commented Mar 2, 2015

Reference #608 and #609 for the predicate issues.

@laserson

This comment has been minimized.

Show comment
Hide comment
@laserson

laserson Mar 4, 2015

Contributor

@fnothaft let me know when you rebase this

Contributor

laserson commented Mar 4, 2015

@fnothaft let me know when you rebase this

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Mar 5, 2015

Member

@laserson rebased.

Member

fnothaft commented Mar 5, 2015

@laserson rebased.

laserson added a commit that referenced this pull request Mar 5, 2015

Merge pull request #588 from fnothaft/public-loaders
[ADAM-587] Clean up loading checks.

@laserson laserson merged commit 6f967b9 into bigdatagenomics:master Mar 5, 2015

@laserson

This comment has been minimized.

Show comment
Hide comment
@laserson

laserson Mar 5, 2015

Contributor

Thanks, @fnothaft!

Contributor

laserson commented Mar 5, 2015

Thanks, @fnothaft!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment