added loadIndexedVcf and loadIndexedBam for multiple ReferenceRegions #1096

Merged
merged 1 commit into from Aug 5, 2016

Conversation

Projects
None yet
5 participants
@akmorrow13
Contributor

akmorrow13 commented Jul 31, 2016

No description provided.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Jul 31, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1367/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1367/
Test PASSed.

+ * @return Returns a VariantContextRDD.
+ */
+ def loadIndexedVcf(filePath: String,
+ viewRegion: ReferenceRegion,

This comment has been minimized.

@heuermh

heuermh Aug 1, 2016

Member

Would it be useful to support multiple view regions?

@heuermh

heuermh Aug 1, 2016

Member

Would it be useful to support multiple view regions?

This comment has been minimized.

@akmorrow13

akmorrow13 Aug 1, 2016

Contributor

I'm not sure, but then we should support multiple regions for loadIndexedBam as well

@akmorrow13

akmorrow13 Aug 1, 2016

Contributor

I'm not sure, but then we should support multiple regions for loadIndexedBam as well

This comment has been minimized.

@heuermh

heuermh Aug 1, 2016

Member

Good point. If this were a Java API, I would propose two methods for both, one with vararg and one with Iterable

void loadIndexedVcf(String filePath, ReferenceRegion... viewRegions) { ... }
void loadIndexedVcf(String filePath, Iterable<ReferenceRegion> viewRegions) { ... }

I'm not sure what to do in Scala, and then there's the complication of the defaulted sdOpt parameter, and that varargs parameters need to be last . . .

@heuermh

heuermh Aug 1, 2016

Member

Good point. If this were a Java API, I would propose two methods for both, one with vararg and one with Iterable

void loadIndexedVcf(String filePath, ReferenceRegion... viewRegions) { ... }
void loadIndexedVcf(String filePath, Iterable<ReferenceRegion> viewRegions) { ... }

I'm not sure what to do in Scala, and then there's the complication of the defaulted sdOpt parameter, and that varargs parameters need to be last . . .

This comment has been minimized.

@akmorrow13

akmorrow13 Aug 2, 2016

Contributor

I don't know how to get around this either. We could require that the user defines an Iterable[ReferenceRegion] to being with, but that is messy. Anyone else have a recommendation for this issue? There is some talk of supporting overloading in Scala 2.12, but that is a ways off.

@akmorrow13

akmorrow13 Aug 2, 2016

Contributor

I don't know how to get around this either. We could require that the user defines an Iterable[ReferenceRegion] to being with, but that is messy. Anyone else have a recommendation for this issue? There is some talk of supporting overloading in Scala 2.12, but that is a ways off.

This comment has been minimized.

@heuermh

heuermh Aug 2, 2016

Member

Well, the vararg thing is just a convenience. For Scala, couldn't we just have two methods

def loadIndexedVcf(filePath: String, viewRegion: ReferenceRegion, sdOpt: Option[SequenceDictionary] ...
def loadIndexedVcf(filePath: String, viewRegions: Iterable[ReferenceRegion], sdOpt: Option[SequenceDictionary] ...
@heuermh

heuermh Aug 2, 2016

Member

Well, the vararg thing is just a convenience. For Scala, couldn't we just have two methods

def loadIndexedVcf(filePath: String, viewRegion: ReferenceRegion, sdOpt: Option[SequenceDictionary] ...
def loadIndexedVcf(filePath: String, viewRegions: Iterable[ReferenceRegion], sdOpt: Option[SequenceDictionary] ...

This comment has been minimized.

@akmorrow13

akmorrow13 Aug 2, 2016

Contributor

No, you cannot overload methods in scala with the same number of parameters.

@akmorrow13

akmorrow13 Aug 2, 2016

Contributor

No, you cannot overload methods in scala with the same number of parameters.

This comment has been minimized.

@heuermh

heuermh Aug 2, 2016

Member

Ah, well that's dumb. 😄

@heuermh

heuermh Aug 2, 2016

Member

Ah, well that's dumb. 😄

This comment has been minimized.

@akmorrow13

akmorrow13 Aug 2, 2016

Contributor

Correction, the issue is sdOpt: Option[SequenceDictionary] = None. This throws an error for overloaded methods defining default arguments.

@akmorrow13

akmorrow13 Aug 2, 2016

Contributor

Correction, the issue is sdOpt: Option[SequenceDictionary] = None. This throws an error for overloaded methods defining default arguments.

This comment has been minimized.

@fnothaft

fnothaft Aug 4, 2016

Member

I would remove sdOpt (here and everywhere).

@fnothaft

fnothaft Aug 4, 2016

Member

I would remove sdOpt (here and everywhere).

This comment has been minimized.

@fnothaft

fnothaft Aug 4, 2016

Member

Also, I would make a ticket for the multiple reference regions. I think it is a logical thing for us to support.

@fnothaft

fnothaft Aug 4, 2016

Member

Also, I would make a ticket for the multiple reference regions. I think it is a logical thing for us to support.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Aug 2, 2016

Member

LGTM

Member

heuermh commented Aug 2, 2016

LGTM

@jpdna

This comment has been minimized.

Show comment
Hide comment
@jpdna

jpdna Aug 4, 2016

Member

LGTM

Member

jpdna commented Aug 4, 2016

LGTM

+
+ val conf = ContextUtil.getConfiguration(job)
+ if (viewRegion.isDefined) {
+ val interval: Interval = new Interval(viewRegion.get.referenceName, viewRegion.get.start.toInt, viewRegion.get.end.toInt)

This comment has been minimized.

@fnothaft

fnothaft Aug 4, 2016

Member

IIRC, this is off by 1. I always lose track of this, but I am 95% sure that HTSJDK is 1-based closed intervals, while we're 0-based open intervals.

@fnothaft

fnothaft Aug 4, 2016

Member

IIRC, this is off by 1. I always lose track of this, but I am 95% sure that HTSJDK is 1-based closed intervals, while we're 0-based open intervals.

This comment has been minimized.

@heuermh

heuermh Aug 4, 2016

Member

+1

This comment has been minimized.

@akmorrow13

akmorrow13 Aug 4, 2016

Contributor

It doesn't seem that this is a problem for either vcf or bam indexes. If I start a region at 10 and the variant also starts at position 10, it will still retrieve that variant.

@akmorrow13

akmorrow13 Aug 4, 2016

Contributor

It doesn't seem that this is a problem for either vcf or bam indexes. If I start a region at 10 and the variant also starts at position 10, it will still retrieve that variant.

+
+ sparkTest("loadIndexedVcf") {
+ val path = resourcePath("bqsr1.vcf")
+ val refRegion = ReferenceRegion("22", 0, 16050822)

This comment has been minimized.

@fnothaft

fnothaft Aug 4, 2016

Member

Can we use a more specific predicate than this? Ideally, we'd use one that illustrates that there is not a coordinate conversion issue.

@fnothaft

fnothaft Aug 4, 2016

Member

Can we use a more specific predicate than this? Ideally, we'd use one that illustrates that there is not a coordinate conversion issue.

+ sparkTest("loadIndexedVcf") {
+ val path = resourcePath("bqsr1.vcf")
+ val refRegion = ReferenceRegion("22", 0, 16050822)
+ val reads2 = sc.loadIndexedVcf(path, refRegion)

This comment has been minimized.

@fnothaft

fnothaft Aug 4, 2016

Member

s/reads/variants/g | tr -d 2

@fnothaft

fnothaft Aug 4, 2016

Member

s/reads/variants/g | tr -d 2

@akmorrow13 akmorrow13 changed the title from added loadIndexedVcf to added loadIndexedVcf and loadIndexedBam for multiple ReferenceRegions Aug 4, 2016

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Aug 4, 2016

Member

Thanks for confirming that the index bit is not an issue, @akmorrow13!

Member

fnothaft commented Aug 4, 2016

Thanks for confirming that the index bit is not an issue, @akmorrow13!

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Aug 4, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1368/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1368/
Test PASSed.

@fnothaft fnothaft merged commit 2544a3d into bigdatagenomics:master Aug 5, 2016

1 check passed

default Merged build finished.
Details
@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Aug 5, 2016

Member

Merged! Thanks @akmorrow13!

Member

fnothaft commented Aug 5, 2016

Merged! Thanks @akmorrow13!

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Aug 5, 2016

Member

Also, just to confirm, there wasn't an issue for this, right? If there was, can you close the issue? I looked through the tracker but a cursory search didn't point me to one.

Member

fnothaft commented Aug 5, 2016

Also, just to confirm, there wasn't an issue for this, right? If there was, can you close the issue? I looked through the tracker but a cursory search didn't point me to one.

@akmorrow13 akmorrow13 referenced this pull request in bigdatagenomics/mango Oct 19, 2016

Closed

Support VCF Indexing #45

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment