Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sample code snippet showing how to specify combine operations for annotations #4993

Merged
merged 1 commit into from
Oct 3, 2018

Conversation

kgururaj
Copy link
Collaborator

Sample code to show how to modify INFO field combine operation while querying GenomicsDB using the Protobuf API

#4541

ping @ldgauthier @droazen @lbergelson @jamesemery

@kgururaj kgururaj changed the title DO NOT MERGE - #4541 DO NOT MERGE - sample code snippet Jul 10, 2018
@kgururaj kgururaj force-pushed the genomicsdb_info_field_combine_op branch 2 times, most recently from 0c7e9fa to 4cbaa66 Compare August 10, 2018 17:51
@kgururaj
Copy link
Collaborator Author

FeatureDataSource is the relevant file - the rest consists of changes related to other areas.

@droazen droazen self-requested a review August 13, 2018 16:26
@droazen droazen self-assigned this Aug 13, 2018
@codecov-io
Copy link

codecov-io commented Aug 13, 2018

Codecov Report

Merging #4993 into master will decrease coverage by 0.046%.
The diff coverage is 70.423%.

@@              Coverage Diff               @@
##             master     #4993       +/-   ##
==============================================
- Coverage      86.8%   86.754%   -0.046%     
+ Complexity    29781     29770       -11     
==============================================
  Files          1822      1825        +3     
  Lines        137723    137739       +16     
  Branches      15181     15181               
==============================================
- Hits         119544    119494       -50     
- Misses        12662     12727       +65     
- Partials       5517      5518        +1
Impacted Files Coverage Δ Complexity Δ
...ute/hellbender/tools/FixCallSetSampleOrdering.java 73.043% <100%> (+0.971%) 25 <0> (+1) ⬆️
.../hellbender/tools/genomicsdb/GenomicsDBImport.java 76.667% <61.111%> (-0.985%) 52 <1> (ø)
...institute/hellbender/engine/FeatureDataSource.java 76.087% <67.568%> (-0.983%) 49 <4> (+3)
...der/tools/genomicsdb/GenomicsDBImportUnitTest.java 86.207% <80%> (-6.101%) 9 <0> (ø)
...engine/spark/datasources/ReferenceSparkSource.java 0% <0%> (-100%) 0% <0%> (-1%)
...itute/hellbender/engine/spark/KnownSitesCache.java 0% <0%> (-91.667%) 0% <0%> (-8%)
...r/engine/spark/BroadcastJoinReadsWithVariants.java 50% <0%> (-12.5%) 3% <0%> (-1%)
...ls/spark/BaseRecalibratorSparkIntegrationTest.java 39.45% <0%> (-11.63%) 5% <0%> (-1%)
...itute/hellbender/engine/spark/ReadWalkerSpark.java 72.222% <0%> (-5.197%) 8% <0%> (-2%)
...formers/PalindromeArtifactClipReadTransformer.java 94.872% <0%> (-5.128%) 23% <0%> (ø)
... and 22 more

@ldgauthier
Copy link
Contributor

@kgururaj I just tried this out with my annotations and it worked right out of the box! The update was very simple on my end. Ideally it might be nice to define the combine operations as static Strings in the annotation classes, but we can do that on the GATK side.

@ldgauthier
Copy link
Contributor

Just to clarify, the variable_length_descriptor gets set automatically from the VCF header, right? So for allele-specific annotations, the vidmap already knows that they're allele-specific?

@kgururaj
Copy link
Collaborator Author

kgururaj commented Aug 22, 2018

GenomicsDB is too dumb to do anything automatically :)

For the allele specific annotation fields that we know of, we define the type and length descriptor here and this information gets stored in the vid JSON file.

This happens the first time the array is defined and data is imported. Subsequent reads of the vid file obtain the length descriptor and type of the allele specific annotation fields.

@ldgauthier
Copy link
Contributor

So if I needed a new combine operation for an allele-specific annotation, how would I specify that the annotation is allele-specific? Do we need a updateINFOFieldLengthDescriptor like updateINFOFieldCombineOperation?

@kgururaj
Copy link
Collaborator Author

  • If you are adding a new allele specific INFO field, yeah, we would need a updateINFOFieldDescriptor() function. You would need to call this the first time an array is created. You can take the vid that GenomicsDB creates and modify the specific INFO fields of interest.
  • If you wish to specify a new combine operation that doesn't exist in GenomicsDB yet (say element_wise_median), that would involved modifying the C++ code. I haven't documented that anywhere. If this is something that you wish to do, please let me know. I'll try to find a way of supporting such operations.

@ldgauthier
Copy link
Contributor

I don't anticipate needing new combine operations, but being able to specify them for AS annotations would be necessary.

@kgururaj
Copy link
Collaborator Author

kgururaj commented Aug 24, 2018

  • If you are planning to add more allele specific annotations (other than the ones listed here), then I can provide more example code in GATK showing how to set the type and length descriptors.
  • If you simply wish to change the combine operation for existing annotations, the example code in this PR should suffice

@ldgauthier
Copy link
Contributor

ldgauthier commented Aug 24, 2018 via email

@droazen droazen changed the title DO NOT MERGE - sample code snippet DO NOT MERGE - sample code snippet showing how to specify combine operations for annotations Aug 24, 2018
@droazen droazen assigned ldgauthier and kgururaj and unassigned droazen Aug 24, 2018
Copy link
Contributor

@ldgauthier ldgauthier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the interest of being able to add more allele-specific annotations in the future we need a updateINFOFieldLengthDescriptor or similar.

@droazen droazen self-assigned this Aug 24, 2018
@droazen droazen changed the title DO NOT MERGE - sample code snippet showing how to specify combine operations for annotations sample code snippet showing how to specify combine operations for annotations Aug 24, 2018
@droazen
Copy link
Collaborator

droazen commented Oct 1, 2018

@ldgauthier Could you rebase this branch and make any changes you require? Then we can merge it and rebase your other branch onto this patch.

@ldgauthier ldgauthier force-pushed the genomicsdb_info_field_combine_op branch from 4cbaa66 to fe1fb05 Compare October 1, 2018 18:56
@ldgauthier
Copy link
Contributor

I started looking into allele-specific annotation combine setting and it should be possible from the GATK-side, but then I realized I don't want to write all the tests for it right now. I'll put it in a branch and open an issue, but the gist is that this PR is useful, I want it merged, and my feature request can be addressed by a GATK dev at some point in the future if necessary.

public static GenomicsDBVidMapProto.VidMappingPB getProtobufVidMappingFromJsonFile(final File vidmapJson)
throws IOException {
GenomicsDBVidMapProto.VidMappingPB.Builder vidMapBuilder = GenomicsDBVidMapProto.VidMappingPB.newBuilder();
JsonFormat.merge(new FileReader(vidmapJson), vidMapBuilder);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try(FileReader reader:new FileReader(vidmapJson){
JsonFormat.merge..
}

public static HashMap<String, Integer> getFieldNameToListIndexInProtobufVidMappingObject(
final GenomicsDBVidMapProto.VidMappingPB vidMapPB) {
HashMap<String, Integer> fieldNameToIndexInVidFieldsList = new HashMap<String, Integer>();
for(int fieldIdx=0;fieldIdx<vidMapPB.getFieldsCount();++fieldIdx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{}

}
}
catch(URISyntaxException e) {
throw new UserException("Malformed URI "+e.toString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include e as the cause of the exception

final Set<Path> gvcfPathsFromSampleNameMap = new HashSet<>(sampleNameMapFromGenomicsDBImport.values());
final Set<URI> gvcfURIsFromSampleNameMap = new HashSet<>(sampleNameMapFromGenomicsDBImport.values());
final Set<Path> gvcfPathsFromSampleNameMap = new HashSet<>();
for(final URI currEntry : gvcfURIsFromSampleNameMap)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{}

…querying GenomicsDB using the Protobuf API

Use URI instead of Path in import Map objects
@ldgauthier ldgauthier force-pushed the genomicsdb_info_field_combine_op branch from 035394e to e290670 Compare October 3, 2018 13:43
@ldgauthier ldgauthier merged commit 50e0185 into master Oct 3, 2018
* @param vidmapJson vid JSON file
* @return Protobuf object
*/
public static GenomicsDBVidMapProto.VidMappingPB getProtobufVidMappingFromJsonFile(final File vidmapJson)
Copy link
Collaborator

@droazen droazen Oct 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ldgauthier Can you pull out these GenomicsDB-specific utility methods (createExportConfiguration(), getProtobufVidMappingFromJsonFile(), etc.) into a separate GenomicsDBUtils class, instead of cluttering FeatureDataSource with them?

Could you also update Karthik's comments to no longer say things like "Sample code snippet", and remove commented-out code?

@droazen droazen deleted the genomicsdb_info_field_combine_op branch October 3, 2018 17:14
EdwardDixon pushed a commit to EdwardDixon/gatk that referenced this pull request Nov 9, 2018
…ations

Sample code to show how to modify INFO field combine operation while querying GenomicsDB using the Protobuf API (broadinstitute#4993)
Also use URI instead of Path in import Map objects
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants