Cleanup in org.bdgenomics.adam.converters package. #1062

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
3 participants
@fnothaft
Member

fnothaft commented Jul 1, 2016

  • Remove unused GenotypesToVariantsConverter.
  • Made all classes package private to [converters] if they had no external references, or package private to [adam] if they were referenced elsewhere in ADAM.
  • Added documentation where necessary.
  • Added SupportedHeaderLines object. This contains code previously in VariantAnnotationConverter that was called in the ADAMVCFOutputFormat. This allows us to make VariantAnnotationConverter package private to converters and allows us to make the AttrKeys class and object private.
  • Additionally, pulled mergeAnnotations out to org.bdgenomics.adam.models.VariantContext, which is also needed to make VariantAnnotationConverter package private to converters. This also cleaned up some code in org.bdgenomics.adam.rdd.variation.VariationRDDFunctions.
@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Jul 1, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1290/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1290/
Test PASSed.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Jul 1, 2016

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1291/

Build result: FAILURE

[...truncated 24 lines...]Triggering ADAM-prb ? 2.6.0,2.10,1.3.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.3.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.3.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.10,1.3.1,centosADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.4.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.3.1,centos completed with result SUCCESSNotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1291/

Build result: FAILURE

[...truncated 24 lines...]Triggering ADAM-prb ? 2.6.0,2.10,1.3.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.3.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.3.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.10,1.3.1,centosADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.4.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.3.1,centos completed with result SUCCESSNotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Jul 1, 2016

Member

Jenkins, retest this please.

Member

fnothaft commented Jul 1, 2016

Jenkins, retest this please.

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Jul 1, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1292/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1292/
Test PASSed.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Jul 1, 2016

Member

Thanks, I will probably want this merged before I pull request on the VariantAnnotation stuff (which unfortunately has some name collisions with what is there already). Will review this afternoon.

Member

heuermh commented Jul 1, 2016

Thanks, I will probably want this merged before I pull request on the VariantAnnotation stuff (which unfortunately has some name collisions with what is there already). Will review this afternoon.

- def convert(vc: VariantContext, annotation: DatabaseVariantAnnotation): DatabaseVariantAnnotation = {
+ /**
+ * Remaps fields from an htsjdk variant context into a site annotation.
+ *

This comment has been minimized.

@heuermh

heuermh Jul 1, 2016

Member

I've been searching for a better name for DatabaseVariantAnnotation, could VariantSiteAnnotation be it?

@heuermh

heuermh Jul 1, 2016

Member

I've been searching for a better name for DatabaseVariantAnnotation, could VariantSiteAnnotation be it?

This comment has been minimized.

@fnothaft

fnothaft Jul 4, 2016

Member

I would +1 that! DatabaseVariantAnnotation is totes clunky.

@fnothaft

fnothaft Jul 4, 2016

Member

I would +1 that! DatabaseVariantAnnotation is totes clunky.

+ * @param alt The alternate allele to use for the site. If not provided, no
+ * alternate allele will be defined.
+ * @return Returns an Avro description of the genotyped site.
+ */
private def createADAMVariant(vc: BroadVariantContext, alt: Option[String]): Variant = {

This comment has been minimized.

@heuermh

heuermh Jul 1, 2016

Member

rename BroadVariantContextHtsjdkVariantContext?

@heuermh

heuermh Jul 1, 2016

Member

rename BroadVariantContextHtsjdkVariantContext?

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Jul 1, 2016

Member

For #1044 I currently have a method

def createVariantAnnotations(variant: Variant, vc: VariantContext): VariantAnnotation = {
  ...

that pulls the ANN keyed INFO field attribute out of the variant context and populates an avro VariantAnnotation record (which contains an array of TranscriptEffect records).

This could be wrapped in a third convert method in VariantAnnotationConverter, however going back to bigdatagenomics/bdg-formats#41 and related discussion around bigdatagenomics/bdg-formats#67, is it really a good idea to have three different code paths around data associated with Variant? Do we need three different sets of schema records?

Member

heuermh commented Jul 1, 2016

For #1044 I currently have a method

def createVariantAnnotations(variant: Variant, vc: VariantContext): VariantAnnotation = {
  ...

that pulls the ANN keyed INFO field attribute out of the variant context and populates an avro VariantAnnotation record (which contains an array of TranscriptEffect records).

This could be wrapped in a third convert method in VariantAnnotationConverter, however going back to bigdatagenomics/bdg-formats#41 and related discussion around bigdatagenomics/bdg-formats#67, is it really a good idea to have three different code paths around data associated with Variant? Do we need three different sets of schema records?

Cleanup in org.bdgenomics.adam.converters package.
* Remove unused GenotypesToVariantsConverter.
* Made all classes package private to [converters] if they had no external
  references, or package private to [adam] if they were referenced elsewhere
  in ADAM.
* Added documentation where necessary.
* Added SupportedHeaderLines object. This contains code previously in
  VariantAnnotationConverter that was called in the ADAMVCFOutputFormat. This
  allows us to make VariantAnnotationConverter package private to converters
  and allows us to make the AttrKeys class and object private.
* Additionally, pulled `mergeAnnotations` out to
  org.bdgenomics.adam.models.VariantContext, which is also needed to make
  VariantAnnotationConverter package private to converters. This also cleaned up
  some code in org.bdgenomics.adam.rdd.variation.VariationRDDFunctions.
* Rename `BroadVariantContext` to `HtsjdkVariantContext`.
@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Jul 4, 2016

Member

Addressed comment RE: HtsjdkVariantContext. That's a better name, given the repackaging in htsjdk. VariantContext used to live in org.broadinstitute.variant, IIRC.

Member

fnothaft commented Jul 4, 2016

Addressed comment RE: HtsjdkVariantContext. That's a better name, given the repackaging in htsjdk. VariantContext used to live in org.broadinstitute.variant, IIRC.

@fnothaft

This comment has been minimized.

Show comment
Hide comment
@fnothaft

fnothaft Jul 4, 2016

Member

Thanks for bringing the following up, @heuermh:

This could be wrapped in a third convert method in VariantAnnotationConverter, however going back to bigdatagenomics/bdg-formats#41 and related discussion around bigdatagenomics/bdg-formats#67, is it really a good idea to have three different code paths around data associated with Variant? Do we need three different sets of schema records?

I agree. I like having annotation and genotype separate, but I don't think we need two different annotation records (for 3 total paths). Perhaps we can sit down this next week and go through the paths with a fine toothed comb? I hadn't gone through this part of the codebase line-by-line in a while, and it was harrier than I remembered...

Member

fnothaft commented Jul 4, 2016

Thanks for bringing the following up, @heuermh:

This could be wrapped in a third convert method in VariantAnnotationConverter, however going back to bigdatagenomics/bdg-formats#41 and related discussion around bigdatagenomics/bdg-formats#67, is it really a good idea to have three different code paths around data associated with Variant? Do we need three different sets of schema records?

I agree. I like having annotation and genotype separate, but I don't think we need two different annotation records (for 3 total paths). Perhaps we can sit down this next week and go through the paths with a fine toothed comb? I hadn't gone through this part of the codebase line-by-line in a while, and it was harrier than I remembered...

@AmplabJenkins

This comment has been minimized.

Show comment
Hide comment
@AmplabJenkins

AmplabJenkins Jul 4, 2016

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1298/
Test PASSed.

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1298/
Test PASSed.

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Jul 7, 2016

Member

LGTM

Member

heuermh commented Jul 7, 2016

LGTM

@heuermh

This comment has been minimized.

Show comment
Hide comment
@heuermh

heuermh Jul 7, 2016

Member

Merged commit 038680c

Thank you, @fnothaft!

Member

heuermh commented Jul 7, 2016

Merged commit 038680c

Thank you, @fnothaft!

@heuermh heuermh closed this Jul 7, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment