New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make filtering on mutation types configurable #2670
Conversation
a0c096b
to
43e63f4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some minor change requests/improvements, but overall looks good 👍
(I just need to review documentation changes)
|
||
//Filter by types if specified in the meta file, else filter for the default types | ||
if (filteredMutations != null) { | ||
Set<String> variantFilteredMutations = new HashSet<String>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please make this global and initialize it (i.e. the loop on lines 130-133) in constructor of this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise this initialization code will execute for each mutation
} | ||
|
||
public void addRejectedVariant(Map<String, Integer> countMap, String mutation) { | ||
if (this.countMap.get(mutation) != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the computeIfPresent
made the if / else unnecessary....is this not the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the value for the specified key is present and non-null, attempts to compute a new mapping given the key and its current mapped value.
If the function returns null, the mapping is removed. If the function itself throws an (unchecked) exception, the exception is rethrown, and the current mapping is left unchanged.
So it only works if the key is already present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try combining it with computeIfAbsent()
@@ -254,6 +235,18 @@ public int getRedactedRejects() | |||
{ | |||
return this.redactedRejects; | |||
} | |||
|
|||
public Map<String, Integer> getMapRejections() { | |||
return this.countMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please call this mapRejections
or rename getter to getCountMap
@@ -275,4 +276,10 @@ public static String loadGenePanelInformation(File file) throws Exception { | |||
properties.load(new FileInputStream(file)); | |||
return properties.getProperty("gene_panel"); | |||
} | |||
|
|||
public static String loadFilteredMutations(File file) throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call this getVariantClassificationFilter. Please document here what the method does (gets a setting) and briefly what the purpose of this variant_classification_filter setting is .
@@ -363,7 +363,7 @@ def main(args): | |||
print 'Data loading step using: {}\n'.format(args.jar_path) | |||
|
|||
# process the options | |||
jvm_args = "-Dspring.profiles.active=dbcp -cp " + args.jar_path | |||
jvm_args = "-Dspring.profiles.active=dbcp -cp " + args.jar_path #-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove this
@@ -277,7 +277,16 @@ public static String loadGenePanelInformation(File file) throws Exception { | |||
return properties.getProperty("gene_panel"); | |||
} | |||
|
|||
public static String loadFilteredMutations(File file) throws Exception { | |||
/** | |||
* Gets the information of "variant_classification"filter" in the file, if it exists. Otherwise, it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo "variant_classification"filter" -> "variant_classification_filter"
* which types of mutations want to be filtered. | ||
* | ||
* @param file | ||
* @return a string with the types of mutations that want to be filtered, comma-separated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe rephrase "types of mutations that want to be filtered" -> types of mutations that should be filtered
} | ||
|
||
public void addRejectedVariant(Map<String, Integer> countMap, String mutation) { | ||
if (this.countMap.get(mutation) != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try combining it with computeIfAbsent()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
almost there, but a few small changes
} else { | ||
this.countMap.put(mutation, 1); | ||
this.rejectionMap.put(mutation, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of if/else try using computIfAbsent()
for (String filteredMutation : (Arrays.asList(filteredMutations.split(",")))) { | ||
filteredMutation = filteredMutation.trim(); | ||
this.filteredMutations.add(filteredMutation); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be simpler:
this.filteredMutations = new HashSet<String>(Arrays.asList(filteredMutations.split(",")));
that should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I also think this would be better done in getVariantClassificationFilter
immediately. This makes it clear that this setting is about a set of values.
|
||
if( safeStringTest( mutation.getMutationType(), "5'Flank" ) ) { | ||
if (whiteListGenesForPromoterMutations.contains(mutation.getEntrezGeneId())){ | ||
mutation.setProteinChange("Promoter"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is important and should apply to all 5'flank mutations
@oplantalech to finish, can you please setup unit tests with the following scenarios: Please also check that the events filtered out in (1) above are the same when running the previous version of the code (before your changes). Report these test results here (the output of the loading script for previous version and for test (1)). ℹ️ see |
@pieterlukasse I only needed to add tests to check for the filtering property. I already added this piece of code which allows to reuse the tests that are already there as having the default filtering:
|
@pieterlukasse
PR:
With the new system if they are no rejections that are not |
Nice, I like the new filtering table better. |
f264b07
to
66c0be3
Compare
I just squashed all the commits into one, so now the PR is ready to merge. |
temporarily changed to "do not merge"...one part for the validation script is missing. |
@n1zea144 or @sheridancbio could you please also review it? |
f399fff
to
553ac26
Compare
@@ -1258,6 +1258,10 @@ def skipValidation(self, data): | |||
"""Test whether the mutation is silent and should be skipped.""" | |||
is_silent = False | |||
variant_classification = data[self.cols.index('Variant_Classification')] | |||
if 'variant_classification_filter' in self.meta_dict: | |||
self.SKIP_VARIANT_TYPES = [x.strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if the configured types are not valid types? Then the warning from checkVariantClassification
is not given anymore for applicable cases, right?
# we expect 6 infos: 4 about filtered mutations, 2 general info messages: | ||
self.assertEqual(len(record_list), 6) | ||
# First 3 INFO messages should be something like: "Line will not be loaded due to the variant classification filter. Filtered types:" | ||
for record in record_list[:3]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 or 4?
docs/File-Formats.md
Outdated
@@ -499,6 +499,9 @@ The mutation metadata file should contain the following fields: | |||
8. **data_filename**: your data file | |||
9. **gene_panel**: optional gene panel stable id | |||
10. **swissprot_identifier (optional)**: either `accession` or `name`, indicating the type of identifier in the `SWISSPROT` column | |||
11. **variant_classification_filter (optional)**: mutation types to be filtered out, separated by commas, using the allowed values specified below. By default, cBioPortal filters out the following mutation types: Silent, Intron, IGR, 3'UTR, 5'UTR, 3'Flank and 5'Flank, except the promoter mutations for the TERT gene. If no types want to be filtered out, leave the field empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you leave it empty, will no types be filtered out or will the default cBioPortal filtering apply? Please clarify.
@@ -1121,7 +1121,7 @@ def test_customized_variants_skipped(self): | |||
# we expect 6 infos: 4 about filtered mutations, 2 general info messages: | |||
self.assertEqual(len(record_list), 6) | |||
# First 3 INFO messages should be something like: "Line will not be loaded due to the variant classification filter. Filtered types:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 or 4?
|
||
# check that the values written by the user are valid values, otherwise raise a warning | ||
for value in self.SKIP_VARIANT_TYPES: | ||
self.checkVariantClassification(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not give a warning here, since these values will be skipped/ filtered. The warning given here is anyway not accurate for this scenario.
871af62
to
a0e3716
Compare
@oplantalech Nice PR. Could you add I'm getting this warning in my validation report: |
267ace3
to
f2fc74a
Compare
@sandertan Done. |
addRejectedVariant(rejectionMap, mutation.getMutationType()); | ||
return false; | ||
} else { | ||
if( safeStringTest( mutation.getMutationType(), "5'Flank" ) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a sanity check here - based on PR description, I think you are intentionally skipping the check for TERT promotor before the call to setProteinChange()? Do we need to support a whitelist for 5'Flank mutations when a variant_classification_filter is specified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. Indeed, my idea here was to maintain the functionality exactly as it was if no variant_classification_filter
is specified. That also means skipping the promoter mutations of the TERT gene. However, it might be a good idea to discuss with the rest of the community whether we want to remove this specific check for the TERT promoter mutations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for the record: I think that when a variant_classification_filter is specified it should be leading and there should be no whitelist. IMO a whitelist would lead to confusion in this case. So yes, the whitelist check is omitted on purpose here.
} | ||
return filteredMutations; | ||
} else { | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor - extra tab.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for noticing! I've fixed it.
f2fc74a
to
8e395cd
Compare
@jjgao Can you merge this PR? Thanks. |
What? Why?
Currently, cBioPortal filters out the following mutation types:
Silent
,Intron
,IGR
,3'UTR
,5'UTR
,3'Flank
and5'Flank
, except the promoter mutations for the TERT gene. We want to make this filtering by mutation type configurable by the user that loads studies in cBioPortal.Changes proposed in this pull request:
The parameter is called
variant_classification_filter
. The mutation types need to be written separated by a comma and following the MAF nomenclature for Variant Classification.More information about this PR.