sample code snippet showing how to specify combine operations for annotations #4993
0c7e9fa to 4cbaa66
FeatureDataSource is the relevant file; the rest consists of changes related to other areas.
Codecov Report
@@               Coverage Diff                @@
##             master     #4993       +/-   ##
==============================================
- Coverage       86.8%    86.754%   -0.046%
+ Complexity     29781      29770       -11
==============================================
  Files           1822       1825        +3
  Lines         137723     137739       +16
  Branches       15181      15181
==============================================
- Hits          119544     119494       -50
- Misses         12662      12727       +65
- Partials        5517       5518        +1
@kgururaj I just tried this out with my annotations and it worked right out of the box! The update was very simple on my end. Ideally it might be nice to define the combine operations as static Strings in the annotation classes, but we can do that on the GATK side.
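As a rough illustration of the "static Strings" idea in the comment above, the sketch below collects combine-operation names as constants and builds a per-field override map. The class name, the constant names, the placeholder field names, and the string values themselves are all assumptions made for illustration; none of them is taken from this PR or checked against the list of operations GenomicsDB accepts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for combine-operation names. The string values are
// illustrative assumptions, not a verified list of GenomicsDB operations.
final class CombineOperations {
    static final String SUM = "sum";
    static final String MEDIAN = "median";
    static final String ELEMENT_WISE_SUM = "element_wise_sum";

    private CombineOperations() {
    }

    // Maps an INFO field name to the combine operation requested for it.
    // The field names are placeholders, not real annotation keys.
    static Map<String, String> defaultOverrides() {
        final Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("EXAMPLE_INFO_FIELD", SUM);
        overrides.put("EXAMPLE_AS_FIELD", ELEMENT_WISE_SUM);
        return overrides;
    }

    public static void main(final String[] args) {
        defaultOverrides().forEach((field, op) -> System.out.println(field + " -> " + op));
    }
}
```

Keeping the names in one place means a typo in an operation name becomes a compile error at the call site instead of a silent mismatch at query time.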
Just to clarify, the variable_length_descriptor gets set automatically from the VCF header, right? So for allele-specific annotations, the vidmap already knows that they're allele-specific?
GenomicsDB is too dumb to do anything automatically :) For the allele-specific annotation fields that we know of, we define the type and length descriptor here and this information gets stored in the vid JSON file. This happens the first time the array is defined and data is imported. Subsequent reads of the vid file obtain the length descriptor and type of the allele-specific annotation fields.
So if I needed a new combine operation for an allele-specific annotation, how would I specify that the annotation is allele-specific? Do we need an updateINFOFieldLengthDescriptor like updateINFOFieldCombineOperation?
I don't anticipate needing new combine operations, but being able to specify them for AS annotations would be necessary.
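To make the question above concrete, here is a minimal sketch of what a length-descriptor update could look like, written against a plain-Java stand-in and not against the GenomicsDB protobuf classes. `VidMapSketch`, `FieldEntry`, and `updateLengthDescriptor` are hypothetical names; the descriptor strings borrow the VCF `Number=` codes (for example `R`, one value per allele including the reference) purely as an assumption about what such a descriptor might hold. Because protobuf messages are immutable, the sketch returns a rebuilt list and leaves its input untouched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java stand-in for the vid mapping; NOT the GenomicsDB protobuf API.
final class VidMapSketch {

    // Hypothetical immutable view of one field entry in the vid mapping.
    static final class FieldEntry {
        final String name;
        final String combineOperation;
        final String lengthDescriptor;

        FieldEntry(final String name, final String combineOperation, final String lengthDescriptor) {
            this.name = name;
            this.combineOperation = combineOperation;
            this.lengthDescriptor = lengthDescriptor;
        }
    }

    private VidMapSketch() {
    }

    // Returns a new list in which the named field carries the new length
    // descriptor. Entries for other fields are reused as-is, and the input
    // list is never modified.
    static List<FieldEntry> updateLengthDescriptor(final List<FieldEntry> fields,
                                                   final String fieldName,
                                                   final String newLengthDescriptor) {
        final List<FieldEntry> updated = new ArrayList<>(fields.size());
        for (final FieldEntry field : fields) {
            if (field.name.equals(fieldName)) {
                updated.add(new FieldEntry(field.name, field.combineOperation, newLengthDescriptor));
            } else {
                updated.add(field);
            }
        }
        return Collections.unmodifiableList(updated);
    }
}
```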
I would like to know how to add more allele-specific annotations. Example code would be great.
…On Thu, Aug 23, 2018 at 8:12 PM, Karthik Gururaj ***@***.***> wrote:
- If you are planning to add more allele-specific annotations (other than the ones listed here <https://github.com/Intel-HLS/GenomicsDB/blob/master/src/main/java/com/intel/genomicsdb/importer/Constants.java>), then I can provide more example code in GATK showing how to set the type and length descriptors.
- If you simply wish to change the combine operation for existing annotations, the example code in this PR should suffice.
--
Laura Doyle Gauthier, Ph.D.
Associate Director, Germline Methods
Data Sciences Platform
gauthier@broadinstitute.org
Broad Institute of MIT & Harvard
320 Charles St.
Cambridge MA 0214
In the interest of being able to add more allele-specific annotations in the future, we need an updateINFOFieldLengthDescriptor or similar.
@ldgauthier Could you rebase this branch and make any changes you require? Then we can merge it and rebase your other branch onto this patch.
4cbaa66 to fe1fb05
I started looking into allele-specific annotation combine setting and it should be possible from the GATK-side, but then I realized I don't want to write all the tests for it right now. I'll put it in a branch and open an issue, but the gist is that this PR is useful, I want it merged, and my feature request can be addressed by a GATK dev at some point in the future if necessary.
public static GenomicsDBVidMapProto.VidMappingPB getProtobufVidMappingFromJsonFile(final File vidmapJson)
        throws IOException {
    GenomicsDBVidMapProto.VidMappingPB.Builder vidMapBuilder = GenomicsDBVidMapProto.VidMappingPB.newBuilder();
    JsonFormat.merge(new FileReader(vidmapJson), vidMapBuilder);
try (FileReader reader = new FileReader(vidmapJson)) {
    JsonFormat.merge..
}
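The try-with-resources form suggested above closes the reader even when parsing throws. The following is a self-contained sketch of that pattern using only the standard library; `VidJsonReader` and `readAll` are hypothetical names, and the body simply reads the file into a String where the PR code would call `JsonFormat.merge`.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper illustrating try-with-resources; not part of the PR.
final class VidJsonReader {

    private VidJsonReader() {
    }

    // Reads the whole file into a String. The reader declared in the try
    // header is closed automatically, whether or not reading succeeds.
    static String readAll(final File vidmapJson) throws IOException {
        final StringBuilder contents = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new FileReader(vidmapJson))) {
            final char[] buffer = new char[4096];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                contents.append(buffer, 0, read);
            }
        }
        return contents.toString();
    }
}
```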
public static HashMap<String, Integer> getFieldNameToListIndexInProtobufVidMappingObject(
        final GenomicsDBVidMapProto.VidMappingPB vidMapPB) {
    HashMap<String, Integer> fieldNameToIndexInVidFieldsList = new HashMap<String, Integer>();
    for(int fieldIdx=0;fieldIdx<vidMapPB.getFieldsCount();++fieldIdx)
{}
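For readers without the GenomicsDB classes at hand, the name-to-index lookup built in the snippet above can be sketched over a plain list of field names, with the braces the reviewer asks for. `FieldIndex` and `nameToIndex` are hypothetical names, and the `List<String>` stands in for the protobuf field list.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: a list of field names replaces the protobuf field list.
final class FieldIndex {

    private FieldIndex() {
    }

    // Maps each field name to its position in the list, so a field can later
    // be looked up by name and updated by index.
    static Map<String, Integer> nameToIndex(final List<String> fieldNames) {
        final Map<String, Integer> index = new HashMap<>();
        for (int fieldIdx = 0; fieldIdx < fieldNames.size(); ++fieldIdx) {
            index.put(fieldNames.get(fieldIdx), fieldIdx);
        }
        return index;
    }
}
```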
    }
}
catch(URISyntaxException e) {
    throw new UserException("Malformed URI "+e.toString());
include e as the cause of the exception
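Passing the caught exception as the cause keeps the original stack trace and message available to whoever handles the wrapper. A minimal sketch follows; `BadInputException` is a hypothetical stand-in for GATK's UserException, assumed here to accept a message and a cause.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical example of exception chaining; not code from the PR.
final class UriParsing {

    // Stand-in for GATK's UserException (an assumption for illustration).
    static final class BadInputException extends RuntimeException {
        BadInputException(final String message, final Throwable cause) {
            super(message, cause);
        }
    }

    private UriParsing() {
    }

    // Parses a URI and rethrows syntax errors with the original exception
    // attached as the cause, so no diagnostic detail is lost.
    static URI parse(final String raw) {
        try {
            return new URI(raw);
        } catch (final URISyntaxException e) {
            throw new BadInputException("Malformed URI " + raw, e);
        }
    }
}
```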
final Set<Path> gvcfPathsFromSampleNameMap = new HashSet<>(sampleNameMapFromGenomicsDBImport.values());
final Set<URI> gvcfURIsFromSampleNameMap = new HashSet<>(sampleNameMapFromGenomicsDBImport.values());
final Set<Path> gvcfPathsFromSampleNameMap = new HashSet<>();
for(final URI currEntry : gvcfURIsFromSampleNameMap)
{}
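The loop in the snippet above converts a set of URIs into a set of Paths. A self-contained sketch of that conversion, with braces around the loop body, is shown below; `UriToPath` and `toPaths` are hypothetical names. It relies on `Paths.get(URI)`, which resolves `file:` URIs through the default file system and needs an installed provider for any other scheme.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper mirroring the URI-to-Path loop under review.
final class UriToPath {

    private UriToPath() {
    }

    // Converts each URI to a Path. Paths.get(URI) throws if no file system
    // provider is installed for the URI's scheme.
    static Set<Path> toPaths(final Set<URI> uris) {
        final Set<Path> paths = new HashSet<>();
        for (final URI uri : uris) {
            paths.add(Paths.get(uri));
        }
        return paths;
    }
}
```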
…querying GenomicsDB using the Protobuf API Use URI instead of Path in import Map objects
035394e to e290670
 * @param vidmapJson vid JSON file
 * @return Protobuf object
 */
public static GenomicsDBVidMapProto.VidMappingPB getProtobufVidMappingFromJsonFile(final File vidmapJson)
@ldgauthier Can you pull out these GenomicsDB-specific utility methods (createExportConfiguration(), getProtobufVidMappingFromJsonFile(), etc.) into a separate GenomicsDBUtils class, instead of cluttering FeatureDataSource with them?
Could you also update Karthik's comments to no longer say things like "Sample code snippet", and remove commented-out code?
…ations Sample code to show how to modify INFO field combine operation while querying GenomicsDB using the Protobuf API (broadinstitute#4993) Also use URI instead of Path in import Map objects
Sample code to show how to modify INFO field combine operation while querying GenomicsDB using the Protobuf API
#4541
ping @ldgauthier @droazen @lbergelson @jamesemery