[ADAM-1439] Add inferSequenceDictionary ctr to FeatureRDD. #1447

Merged
merged 2 commits into from Mar 21, 2017

Member

heuermh commented Mar 21, 2017

Fixes #1439.

Member

fnothaft left a comment

LGTM! I have one syntax nit repeated 21 times, and a second syntax nit, but otherwise LGTM.


def run(sc: SparkContext) {
-sc.loadFeatures(args.featuresFile, storageLevel, None, Option(args.numPartitions))
+sc.loadFeatures(args.featuresFile, optStorageLevel, None, Option(args.numPartitions))

fnothaft Mar 21, 2017

Member

We should use the named syntax (optStorageLevel = optStorageLevel) for all of these non-default parameters.
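
As a rough illustration of the named-argument style requested here, the sketch below compares a positional call with a named one. Everything in it is a hypothetical stand-in rather than ADAM's API: the `Loaded` case class, the `String`-typed storage level, and the `optProjection` parameter name are assumptions made so the example runs without Spark.

```scala
// Hypothetical stand-ins; this is not ADAM's actual loadFeatures signature.
object NamedArgsSketch {
  // Records which arguments the loader received, so calls can be compared.
  final case class Loaded(
    path: String,
    optStorageLevel: Option[String],
    optProjection: Option[String],
    minPartitions: Option[Int])

  // A loader with several optional parameters that all default to None.
  def loadFeatures(
    path: String,
    optStorageLevel: Option[String] = None,
    optProjection: Option[String] = None,
    minPartitions: Option[Int] = None): Loaded =
    Loaded(path, optStorageLevel, optProjection, minPartitions)

  def main(args: Array[String]): Unit = {
    // Positional: the reader must remember what the bare None stands for.
    val positional = loadFeatures("features.bed", Some("MEMORY_ONLY"), None, Some(4))

    // Named: each non-default argument is labeled and defaults can be skipped.
    val named = loadFeatures(
      "features.bed",
      optStorageLevel = Some("MEMORY_ONLY"),
      minPartitions = Some(4))

    assert(positional == named)
    println(named)
  }
}
```

Both calls build the same value; the named form simply keeps working if a new defaulted parameter is later added in the middle of the list.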

minPartitions: Option[Int] = None,
stringency: ValidationStringency = ValidationStringency.LENIENT): FeatureRDD = {
val records = sc.textFile(filePath, minPartitions.getOrElse(sc.defaultParallelism))
.flatMap(new GFF3Parser().parse(_, stringency))
if (Metrics.isRecording) records.instrument() else records
-FeatureRDD(records, storageLevel)
+FeatureRDD.inferSequenceDictionary(records, optStorageLevel)

fnothaft Mar 21, 2017

Member

Ditto here RE: named parameter.

minPartitions: Option[Int] = None,
stringency: ValidationStringency = ValidationStringency.LENIENT): FeatureRDD = {
val records = sc.textFile(filePath, minPartitions.getOrElse(sc.defaultParallelism))
.flatMap(new GTFParser().parse(_, stringency))
if (Metrics.isRecording) records.instrument() else records
-FeatureRDD(records, storageLevel)
+FeatureRDD.inferSequenceDictionary(records, optStorageLevel)

fnothaft Mar 21, 2017

Member

Ditto here RE: named parameter.

minPartitions: Option[Int] = None,
stringency: ValidationStringency = ValidationStringency.LENIENT): FeatureRDD = {
val records = sc.textFile(filePath, minPartitions.getOrElse(sc.defaultParallelism))
.flatMap(new BEDParser().parse(_, stringency))
if (Metrics.isRecording) records.instrument() else records
-FeatureRDD(records, storageLevel)
+FeatureRDD.inferSequenceDictionary(records, optStorageLevel)

fnothaft Mar 21, 2017

Member

Ditto here RE: named parameter.

minPartitions: Option[Int] = None,
stringency: ValidationStringency = ValidationStringency.LENIENT): FeatureRDD = {
val records = sc.textFile(filePath, minPartitions.getOrElse(sc.defaultParallelism))
.flatMap(new NarrowPeakParser().parse(_, stringency))
if (Metrics.isRecording) records.instrument() else records
-FeatureRDD(records, storageLevel)
+FeatureRDD.inferSequenceDictionary(records, optStorageLevel)

fnothaft Mar 21, 2017

Member

Ditto here RE: named parameter.

@@ -457,7 +457,7 @@ class FeatureRDDSuite extends ADAMFunSuite with TypeCheckedTripleEquals {
val f2 = fb.setGeneId("gene2").build()
val f3 = fb.clearGeneId().build() // nulls last

-val features = FeatureRDD(sc.parallelize(Seq(f3, f2, f1)))
+val features = FeatureRDD.inferSequenceDictionary(sc.parallelize(Seq(f3, f2, f1)), None)

fnothaft Mar 21, 2017

Member

Ditto here RE: named parameter.

@@ -473,7 +473,7 @@ class FeatureRDDSuite extends ADAMFunSuite with TypeCheckedTripleEquals {
val f4 = fb.setGeneId("gene2").setTranscriptId("transcript2").build()
val f5 = fb.setGeneId("gene2").clearTranscriptId().build() // nulls last

-val features = FeatureRDD(sc.parallelize(Seq(f5, f4, f3, f2, f1)))
+val features = FeatureRDD.inferSequenceDictionary(sc.parallelize(Seq(f5, f4, f3, f2, f1)), None)

fnothaft Mar 21, 2017

Member

Ditto here RE: named parameter.

@@ -495,7 +495,7 @@ class FeatureRDDSuite extends ADAMFunSuite with TypeCheckedTripleEquals {
val f8 = fb.setGeneId("gene2").setTranscriptId("transcript1").setAttributes(ImmutableMap.of("rank", "2")).build()
val f9 = fb.setGeneId("gene2").setTranscriptId("transcript1").clearAttributes().build() // nulls last

-val features = FeatureRDD(sc.parallelize(Seq(f9, f8, f7, f6, f5, f4, f3, f2, f1)))
+val features = FeatureRDD.inferSequenceDictionary(sc.parallelize(Seq(f9, f8, f7, f6, f5, f4, f3, f2, f1)), None)

fnothaft Mar 21, 2017

Member

Ditto here RE: named parameter.

@@ -517,7 +517,7 @@ class FeatureRDDSuite extends ADAMFunSuite with TypeCheckedTripleEquals {
val f4 = fb.setAttributes(ImmutableMap.of("rank", "2")).build()
val f5 = fb.clearAttributes().build() // nulls last

-val features = FeatureRDD(sc.parallelize(Seq(f5, f4, f3, f2, f1)))
+val features = FeatureRDD.inferSequenceDictionary(sc.parallelize(Seq(f5, f4, f3, f2, f1)), None)

fnothaft Mar 21, 2017

Member

Ditto here RE: named parameter.

@@ -532,7 +532,7 @@ class FeatureRDDSuite extends ADAMFunSuite with TypeCheckedTripleEquals {
val f2 = Feature.newBuilder().setContigName("chr1").setStart(15).setEnd(20).setScore(2.0).build()
val f3 = Feature.newBuilder().setContigName("chr2").setStart(15).setEnd(20).setScore(2.0).build()

-val featureRDD: FeatureRDD = FeatureRDD(sc.parallelize(Seq(f1, f2, f3)))
+val featureRDD: FeatureRDD = FeatureRDD.inferSequenceDictionary(sc.parallelize(Seq(f1, f2, f3)), None)

fnothaft Mar 21, 2017

Member

Ditto here RE: named parameter.
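
For context on what inferring a sequence dictionary from features might involve, here is a minimal sketch using plain Scala collections instead of Spark. It is an assumption for illustration, not the implementation in this pull request: the simplified `Feature` and `SequenceRecord` case classes are hypothetical, and taking the largest `end` per contig as its length is only one plausible lower bound.

```scala
// A sketch under stated assumptions; not ADAM's actual inferSequenceDictionary.
object InferSequenceDictionarySketch {
  // Hypothetical simplified record carrying only the fields used below.
  final case class Feature(contigName: String, start: Long, end: Long)

  // Hypothetical dictionary entry: a contig name and an inferred length.
  final case class SequenceRecord(name: String, length: Long)

  // Assumption: the largest feature end seen on a contig serves as a
  // lower bound on that contig's length.
  def inferSequenceDictionary(features: Seq[Feature]): Seq[SequenceRecord] =
    features
      .groupBy(_.contigName)
      .map { case (name, fs) => SequenceRecord(name, fs.map(_.end).max) }
      .toSeq
      .sortBy(_.name)

  def main(args: Array[String]): Unit = {
    val features = Seq(
      Feature("chr1", 10L, 20L),
      Feature("chr1", 15L, 25L),
      Feature("chr2", 15L, 20L))
    inferSequenceDictionary(features).foreach(println)
  }
}
```

On a distributed dataset the same idea needs a pass over the records before any downstream work, which is presumably why an optional storage level is passed alongside the records so they can be cached first.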

coveralls commented Mar 21, 2017

Coverage Status

Coverage increased (+0.08%) to 80.583% when pulling 01af076 on heuermh:infer-seqdict into dbf4f85 on bigdatagenomics:master.

AmplabJenkins commented Mar 21, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1892/

coveralls commented Mar 21, 2017

Coverage Status

Coverage decreased (-0.1%) to 80.367% when pulling 3a0d2a5 on heuermh:infer-seqdict into dbf4f85 on bigdatagenomics:master.

AmplabJenkins commented Mar 21, 2017

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1893/

@fnothaft fnothaft merged commit fd0cb6e into bigdatagenomics:master Mar 21, 2017
1 of 3 checks passed
codacy/pr: Not so good... This pull request quality could be better.
coverage/coveralls: Coverage decreased (-0.1%) to 80.367%
default: Merged build finished.
Member

fnothaft commented Mar 21, 2017

Merged! Thanks @heuermh!

@heuermh heuermh deleted the heuermh:infer-seqdict branch Mar 22, 2017