[ADAM-1044] Support VCF annotation ANN field #1069

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
4 participants
@heuermh
Member

heuermh commented Jul 6, 2016

Work towards #1044.
Will need further discussion around conversion from VariantContext as mentioned in #1062.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Jul 6, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1301/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1301/
Test PASSed.

@heuermh heuermh modified the milestone: 0.20.0 Jul 18, 2016

@heuermh heuermh changed the title from Support VCF annotation ANN field to [ADAM-1044] Support VCF annotation ANN field Jul 18, 2016

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Jul 19, 2016

Member

Rebased. This can go in as-is, the further work described above will go in #1078.

Member

heuermh commented Jul 19, 2016

Rebased. This can go in as-is, the further work described above will go in #1078.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Jul 19, 2016

Member

Thanks @heuermh! I will make a review pass today.

Member

fnothaft commented Jul 19, 2016

Thanks @heuermh! I will make a review pass today.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Jul 19, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1353/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1353/
Test PASSed.

+import org.bdgenomics.utils.misc.Logging
+import scala.collection.JavaConverters._
+
+object VariantAnnotations extends Serializable with Logging {

This comment has been minimized.

@fnothaft

fnothaft Jul 19, 2016

Member

VariantAnnotations --> VariantAnnotationsConverter, to be consistent with the rest of the package?

@fnothaft

fnothaft Jul 19, 2016

Member

VariantAnnotations --> VariantAnnotationsConverter, to be consistent with the rest of the package?

This comment has been minimized.

@heuermh

heuermh Jul 19, 2016

Member

In #1078 the code in this class will be merged with the already existing VariantAnnotationConverter. Would you still want that change here?

@heuermh

heuermh Jul 19, 2016

Member

In #1078 the code in this class will be merged with the already existing VariantAnnotationConverter. Would you still want that change here?

This comment has been minimized.

@fnothaft

fnothaft Jul 19, 2016

Member

Oh, good point. Nevermind then!

@fnothaft

fnothaft Jul 19, 2016

Member

Oh, good point. Nevermind then!

+
+import htsjdk.variant.vcf.VCFConstants
+import htsjdk.variant.variantcontext.VariantContext
+import org.bdgenomics.formats.avro.{ TranscriptEffect, Variant, VariantAnnotation, VariantAnnotationMessage }

This comment has been minimized.

@fnothaft

fnothaft Jul 19, 2016

Member

Split long line.

@fnothaft

fnothaft Jul 19, 2016

Member

Split long line.

+import htsjdk.variant.vcf.VCFConstants
+import htsjdk.variant.variantcontext.VariantContext
+import org.bdgenomics.formats.avro.{ TranscriptEffect, Variant, VariantAnnotation, VariantAnnotationMessage }
+import org.bdgenomics.formats.avro.VariantAnnotationMessage._

This comment has been minimized.

@fnothaft

fnothaft Jul 19, 2016

Member

Prefer ._ imports above imports from same package.

@fnothaft

fnothaft Jul 19, 2016

Member

Prefer ._ imports above imports from same package.

+ private[converters] def parseTranscriptEffect(s: String): Seq[TranscriptEffect] = {
+ val fields = s.split("\\|", -1)
+ if (fields.length < 16) {
+ if (log.isWarnEnabled) log.warn("Invalid VCF ANN attribute, expected at least 16 fields, found {}: {}", fields.length, fields)

This comment has been minimized.

@fnothaft

fnothaft Jul 19, 2016

Member

What's the log.isWarnEnabled for? Might this be a better place to use a ValidationStringency check?

@fnothaft

fnothaft Jul 19, 2016

Member

What's the log.isWarnEnabled for? Might this be a better place to use a ValidationStringency check?

This comment has been minimized.

@heuermh

heuermh Jul 19, 2016

Member

We don't have log level checks in o.b.utils.Logging methods, so this uses log directly.

ValidationStringency would be nice to have, but would need to be pushed down all the way through VCF parsing, and I'm not sure Hadoop-BAM supports it.

@heuermh

heuermh Jul 19, 2016

Member

We don't have log level checks in o.b.utils.Logging methods, so this uses log directly.

ValidationStringency would be nice to have, but would need to be pushed down all the way through VCF parsing, and I'm not sure Hadoop-BAM supports it.

This comment has been minimized.

@fnothaft

fnothaft Jul 19, 2016

Member

We don't have log level checks in o.b.utils.Logging methods

Really? Drat!

ValidationStringency would be nice to have, but would need to be pushed down all the way through VCF parsing, and I'm not sure Hadoop-BAM supports it.

We call these functions in a map over the RDD of VariantContexts we get from Hadoop-BAM, so we can pass it down. I think it'd be good to have here, as it is likely desirable to fail on a badly formatted annotation line.

@fnothaft

fnothaft Jul 19, 2016

Member

We don't have log level checks in o.b.utils.Logging methods

Really? Drat!

ValidationStringency would be nice to have, but would need to be pushed down all the way through VCF parsing, and I'm not sure Hadoop-BAM supports it.

We call these functions in a map over the RDD of VariantContexts we get from Hadoop-BAM, so we can pass it down. I think it'd be good to have here, as it is likely desirable to fail on a badly formatted annotation line.

This comment has been minimized.

@heuermh

heuermh Jul 19, 2016

Member

ok, will try to push it down then

@heuermh

heuermh Jul 19, 2016

Member

ok, will try to push it down then

+ }
+ }
+
+ private def notEmpty(s: String): Boolean = Option(s) match {

This comment has been minimized.

@fnothaft

fnothaft Jul 19, 2016

Member

Nit: prefer fold over match. E.g.:

Option(s).fold(false)(s => s.nonEmpty)

nonEmpty should come in implicitly from http://www.scala-lang.org/api/2.10.3/index.html#scala.collection.immutable.StringOps.

@fnothaft

fnothaft Jul 19, 2016

Member

Nit: prefer fold over match. E.g.:

Option(s).fold(false)(s => s.nonEmpty)

nonEmpty should come in implicitly from http://www.scala-lang.org/api/2.10.3/index.html#scala.collection.immutable.StringOps.

This comment has been minimized.

@fnothaft

fnothaft Jul 19, 2016

Member

Also, perhaps you could factor this a bit further into:

def setIfNotEmpty(s: String, setFn: String => Unit) {
  Option(s).filter(_.nonEmpty)
    .foreach(setFn)
}
@fnothaft

fnothaft Jul 19, 2016

Member

Also, perhaps you could factor this a bit further into:

def setIfNotEmpty(s: String, setFn: String => Unit) {
  Option(s).filter(_.nonEmpty)
    .foreach(setFn)
}

This comment has been minimized.

@heuermh

heuermh Jul 19, 2016

Member

Avro-generated classes track whether or not a field has been set (enabling compression for un-set fields I would presume) so it is important not to set a field where the value is null or empty. This reads nice; for null the option won't be true, then _.nonEmpty won't NPE, and setFn is only called if s is non-empty.

@heuermh

heuermh Jul 19, 2016

Member

Avro-generated classes track whether or not a field has been set (enabling compression for un-set fields I would presume) so it is important not to set a field where the value is null or empty. This reads nice; for null the option won't be true, then _.nonEmpty won't NPE, and setFn is only called if s is non-empty.

This comment has been minimized.

@fnothaft

fnothaft Jul 19, 2016

Member

Exactly! I could wax philosophical about how I love the Option type in Scala...

@fnothaft

fnothaft Jul 19, 2016

Member

Exactly! I could wax philosophical about how I love the Option type in Scala...

+ val messages = parseMessages(fields(15))
+
+ val te = TranscriptEffect.newBuilder()
+ if (notEmpty(alternateAllele)) te.setAlternateAllele(alternateAllele)

This comment has been minimized.

@fnothaft

fnothaft Jul 19, 2016

Member

If you do https://github.com/bigdatagenomics/adam/pull/1069/files#r71388866, then you can eliminate all of these if statements.

@fnothaft

fnothaft Jul 19, 2016

Member

If you do https://github.com/bigdatagenomics/adam/pull/1069/files#r71388866, then you can eliminate all of these if statements.

+ cdsLength.foreach(te.setCdsLength(_))
+ proteinPosition.foreach(te.setProteinPosition(_))
+ proteinLength.foreach(te.setProteinLength(_))
+ if (notEmpty(distance)) te.setDistance(Integer.parseInt(distance))

This comment has been minimized.

@fnothaft

fnothaft Jul 19, 2016

Member

I might try catch around this to throw a more informative error message if parsing fails.

@fnothaft

fnothaft Jul 19, 2016

Member

I might try catch around this to throw a more informative error message if parsing fails.

This comment has been minimized.

@heuermh

heuermh Jul 20, 2016

Member

I've added the validation stringency stuff for invalid ANN lines but am not sure what to do for NumberFormatExceptions. They aren't explicitly handled elsewhere.

@heuermh

heuermh Jul 20, 2016

Member

I've added the validation stringency stuff for invalid ANN lines but am not sure what to do for NumberFormatExceptions. They aren't explicitly handled elsewhere.

+ if (notEmpty(distance)) te.setDistance(Integer.parseInt(distance))
+ if (!messages.isEmpty) te.setMessages(messages.asJava)
+
+ Seq(te.build())

This comment has been minimized.

@fnothaft

fnothaft Jul 19, 2016

Member

Why are we returning a Seq here?

@fnothaft

fnothaft Jul 19, 2016

Member

Why are we returning a Seq here?

This comment has been minimized.

@heuermh

heuermh Jul 19, 2016

Member

Same pattern used in feature parsing, TranscriptEffect association is an array in VariantAnnotation schema and a java.util.List<TranscriptEffect> in the Avro-generated object.

@heuermh

heuermh Jul 19, 2016

Member

Same pattern used in feature parsing, TranscriptEffect association is an array in VariantAnnotation schema and a java.util.List<TranscriptEffect> in the Avro-generated object.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Jul 19, 2016

Member

Generally looks good. Could you add scaladoc inline?

Also, where will this code attach in? Will it get connected up in #1078?

Member

fnothaft commented Jul 19, 2016

Generally looks good. Could you add scaladoc inline?

Also, where will this code attach in? Will it get connected up in #1078?

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Jul 20, 2016

Member

Also, where will this code attach in? Will it get connected up in #1078?

Yes, I'll also add scaladoc at that point, since the method signatures will probably change to look more like the other converter classes.

Member

heuermh commented Jul 20, 2016

Also, where will this code attach in? Will it get connected up in #1078?

Yes, I'll also add scaladoc at that point, since the method signatures will probably change to look more like the other converter classes.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Jul 20, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1357/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1357/
Test PASSed.

@jpdna

This comment has been minimized.

Show comment
Hide comment
@jpdna

jpdna Jul 26, 2016

Member

+1 - this looks good and seems self contained at least looking backwards, I am going to merge now so that related PRs can build on top of.

Member

jpdna commented Jul 26, 2016

+1 - this looks good and seems self contained at least looking backwards, I am going to merge now so that related PRs can build on top of.

@jpdna

This comment has been minimized.

Show comment
Hide comment
@jpdna

jpdna Jul 26, 2016

Member

Merged: c23aace
Thanks @heuermh !

Member

jpdna commented Jul 26, 2016

Merged: c23aace
Thanks @heuermh !

@jpdna jpdna closed this Jul 26, 2016

@heuermh heuermh deleted the heuermh:vcf-ann branch Jul 26, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment