New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added ReferenceMapping for Genotype, filterByOverlappingRegion for GenotypeRDDFunctions #470

Merged
merged 1 commit into from Nov 14, 2014

Conversation

Projects
None yet
4 participants
@erictu
Member

erictu commented Nov 9, 2014

No description provided.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Nov 9, 2014

Can one of the admins verify this patch?

AmplabJenkins commented Nov 9, 2014

Can one of the admins verify this patch?

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Nov 9, 2014

Member

Jenkins, add to whitelist.

Member

fnothaft commented Nov 9, 2014

Jenkins, add to whitelist.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Nov 9, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/386/
Test PASSed.

AmplabJenkins commented Nov 9, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/386/
Test PASSed.

@erictu erictu changed the title from added ReferenceMapping for Genotype to added ReferenceMapping for Genotype, filterByOverlappingRegion for GenotypeRDDFunctions Nov 9, 2014

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Nov 9, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/387/
Test PASSed.

AmplabJenkins commented Nov 9, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/387/
Test PASSed.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Nov 10, 2014

Member

Thanks @erictu! This looks great. Can you squash down to a single commit?

Member

fnothaft commented Nov 10, 2014

Thanks @erictu! This looks great. Can you squash down to a single commit?

def filterByOverlappingRegion(query: ReferenceRegion): RDD[Genotype] = {
def overlapsQuery(rec: Genotype): Boolean =
rec.getVariant.getContig.getContigName.toString == query.referenceName &&

This comment has been minimized.

@massie

massie Nov 10, 2014

Member

Seeing this chain of getters makes me wonder if we should protect against null values? Using Option() and orNull?

@massie

massie Nov 10, 2014

Member

Seeing this chain of getters makes me wonder if we should protect against null values? Using Option() and orNull?

This comment has been minimized.

@fnothaft

fnothaft Nov 10, 2014

Member

I think the general direction we're moving in (e.g., #469) is to require the API caller to provide input with valid mappings.

From a practical standpoint though, variants are meaningless without being anchored to a reference position. If you get to the point where you have variants without mappings attached to them, you've done something wrong.

@fnothaft

fnothaft Nov 10, 2014

Member

I think the general direction we're moving in (e.g., #469) is to require the API caller to provide input with valid mappings.

From a practical standpoint though, variants are meaningless without being anchored to a reference position. If you get to the point where you have variants without mappings attached to them, you've done something wrong.

This comment has been minimized.

@massie

massie Nov 10, 2014

Member

For example, since you use the variant three times, you can define it once using...

val variant = Option(rec.getVariant)

And then use pattern matching or a simple conditional to skip genotypes without a variant

@massie

massie Nov 10, 2014

Member

For example, since you use the variant three times, you can define it once using...

val variant = Option(rec.getVariant)

And then use pattern matching or a simple conditional to skip genotypes without a variant

This comment has been minimized.

@fnothaft

fnothaft Nov 10, 2014

Member

Genotypes without a variant are ill formed... IMO, we shouldn't check for that case. If we did want to validate that genotypes had a variant attached, it should only be via an assert.

@fnothaft

fnothaft Nov 10, 2014

Member

Genotypes without a variant are ill formed... IMO, we shouldn't check for that case. If we did want to validate that genotypes had a variant attached, it should only be via an assert.

This comment has been minimized.

@fnothaft

fnothaft Nov 10, 2014

Member

Also, variants are not nullable in the genotype schema:

record Genotype {
  /**
   The variant called at this site.
   */
  Variant variant;

https://github.com/bigdatagenomics/bdg-formats/pull/39/files changes this, although it is arguable as to whether that is a desirable change (desirable because it makes everything nullable, undesirable because genotypes without variants are meaningless).

@fnothaft

fnothaft Nov 10, 2014

Member

Also, variants are not nullable in the genotype schema:

record Genotype {
  /**
   The variant called at this site.
   */
  Variant variant;

https://github.com/bigdatagenomics/bdg-formats/pull/39/files changes this, although it is arguable as to whether that is a desirable change (desirable because it makes everything nullable, undesirable because genotypes without variants are meaningless).

This comment has been minimized.

@massie

massie Nov 10, 2014

Member

Would you ever want to project a Genotype without the variant? If so, then you need to make it nullable. If not, then we don't need to worry about the null check here as you say.

@massie

massie Nov 10, 2014

Member

Would you ever want to project a Genotype without the variant? If so, then you need to make it nullable. If not, then we don't need to worry about the null check here as you say.

This comment has been minimized.

@fnothaft

fnothaft Nov 10, 2014

Member

If so, then you need to make it nullable. If not, then we don't need to worry about the null check here as you say.

Since it isn't nullable now, no reason to do the check.

Would you ever want to project a Genotype without the variant?

Let's discuss this on the PR that makes the change. Conceivably, you could, but not sure if it makes sense to allow people to.

@fnothaft

fnothaft Nov 10, 2014

Member

If so, then you need to make it nullable. If not, then we don't need to worry about the null check here as you say.

Since it isn't nullable now, no reason to do the check.

Would you ever want to project a Genotype without the variant?

Let's discuss this on the PR that makes the change. Conceivably, you could, but not sure if it makes sense to allow people to.

This comment has been minimized.

@massie

massie Nov 10, 2014

Member

+1

-Matt

On Sun, Nov 9, 2014 at 9:38 PM, Frank Austin Nothaft <
notifications@github.com> wrote:

In
adam-core/src/main/scala/org/bdgenomics/adam/rdd/variation/VariationRDDFunctions.scala:

@@ -101,4 +102,12 @@ class GenotypeRDDFunctions(rdd: RDD[Genotype]) extends Serializable with Logging

 bySample

}
+

  • def filterByOverlappingRegion(query: ReferenceRegion): RDD[Genotype] = {
  • def overlapsQuery(rec: Genotype): Boolean =
  •  rec.getVariant.getContig.getContigName.toString == query.referenceName &&
    

If so, then you need to make it nullable. If not, then we don't need to
worry about the null check here as you say.

Since it isn't nullable now, no reason to do the check.

Would you ever want to project a Genotype without the variant?

Let's discuss this on the PR
https://github.com/bigdatagenomics/bdg-formats/pull/39/files that makes
the change. Conceivably, you could, but not sure if it makes sense to
allow people to.


Reply to this email directly or view it on GitHub
https://github.com/bigdatagenomics/adam/pull/470/files#r20064839.

@massie

massie Nov 10, 2014

Member

+1

-Matt

On Sun, Nov 9, 2014 at 9:38 PM, Frank Austin Nothaft <
notifications@github.com> wrote:

In
adam-core/src/main/scala/org/bdgenomics/adam/rdd/variation/VariationRDDFunctions.scala:

@@ -101,4 +102,12 @@ class GenotypeRDDFunctions(rdd: RDD[Genotype]) extends Serializable with Logging

 bySample

}
+

  • def filterByOverlappingRegion(query: ReferenceRegion): RDD[Genotype] = {
  • def overlapsQuery(rec: Genotype): Boolean =
  •  rec.getVariant.getContig.getContigName.toString == query.referenceName &&
    

If so, then you need to make it nullable. If not, then we don't need to
worry about the null check here as you say.

Since it isn't nullable now, no reason to do the check.

Would you ever want to project a Genotype without the variant?

Let's discuss this on the PR
https://github.com/bigdatagenomics/bdg-formats/pull/39/files that makes
the change. Conceivably, you could, but not sure if it makes sense to
allow people to.


Reply to this email directly or view it on GitHub
https://github.com/bigdatagenomics/adam/pull/470/files#r20064839.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Nov 11, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/388/
Test PASSed.

AmplabJenkins commented Nov 11, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/388/
Test PASSed.

@massie

This comment has been minimized.

Show comment
Hide comment
@massie

massie Nov 12, 2014

Member

If you squash this down to a single commit and rebase on master, I think we're ready to merge this.

Member

massie commented Nov 12, 2014

If you squash this down to a single commit and rebase on master, I think we're ready to merge this.

added ReferenceMapping for Genotype
added filterByOverlappingRegion for GenotypeRDDFunctions
@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Nov 14, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/401/
Test PASSed.

AmplabJenkins commented Nov 14, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/401/
Test PASSed.

fnothaft added a commit that referenced this pull request Nov 14, 2014

Merge pull request #470 from erictu/master
added ReferenceMapping for Genotype, filterByOverlappingRegion for GenotypeRDDFunctions

@fnothaft fnothaft merged commit 17cf66b into bigdatagenomics:master Nov 14, 2014

1 check passed

default Merged build finished.
Details
@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Nov 14, 2014

Member

Merged! Thanks @erictu!

Member

fnothaft commented Nov 14, 2014

Merged! Thanks @erictu!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment