Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make filtering on mutation types configurable #2670

Merged
merged 1 commit into from Jul 26, 2017

Conversation

oplantalech
Copy link
Contributor

@oplantalech oplantalech commented Jun 28, 2017

What? Why?

Currently, cBioPortal filters out the following mutation types: Silent, Intron, IGR, 3'UTR, 5'UTR, 3'Flank and 5'Flank, except the promoter mutations for the TERT gene. We want to make this filtering by mutation type configurable by the user that loads studies in cBioPortal.

Changes proposed in this pull request:

  • Add an optional field in the mutation meta file that specifies the mutation types that want to be filtered out. If the field is left empty, no mutation types are filtered. If the field is not written in the meta file, the default filtering done by cBioPortal is applied.

The parameter is called variant_classification_filter. The mutation types need to be written separated by a comma and following the MAF nomenclature for Variant Classification.

More information about this PR.

Copy link
Member

@pieterlukasse pieterlukasse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor change requests/improvements, but overall looks good 👍
(I just need to review documentation changes)


//Filter by types if specified in the meta file, else filter for the default types
if (filteredMutations != null) {
Set<String> variantFilteredMutations = new HashSet<String>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make this global and initialize it (i.e. the loop on lines 130-133) in constructor of this class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise this initialization code will execute for each mutation

}

public void addRejectedVariant(Map<String, Integer> countMap, String mutation) {
if (this.countMap.get(mutation) != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the computeIfPresent made the if / else unnecessary....is this not the case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the value for the specified key is present and non-null, attempts to compute a new mapping given the key and its current mapped value.

If the function returns null, the mapping is removed. If the function itself throws an (unchecked) exception, the exception is rethrown, and the current mapping is left unchanged.

So it only works if the key is already present.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try combining it with computeIfAbsent()

@@ -254,6 +235,18 @@ public int getRedactedRejects()
{
return this.redactedRejects;
}

public Map<String, Integer> getMapRejections() {
return this.countMap;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please call this mapRejections or rename getter to getCountMap

@@ -275,4 +276,10 @@ public static String loadGenePanelInformation(File file) throws Exception {
properties.load(new FileInputStream(file));
return properties.getProperty("gene_panel");
}

public static String loadFilteredMutations(File file) throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call this getVariantClassificationFilter. Please document here what the method does (gets a setting) and briefly what the purpose of this variant_classification_filter setting is .

@@ -363,7 +363,7 @@ def main(args):
print 'Data loading step using: {}\n'.format(args.jar_path)

# process the options
jvm_args = "-Dspring.profiles.active=dbcp -cp " + args.jar_path
jvm_args = "-Dspring.profiles.active=dbcp -cp " + args.jar_path #-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove this

@@ -277,7 +277,16 @@ public static String loadGenePanelInformation(File file) throws Exception {
return properties.getProperty("gene_panel");
}

public static String loadFilteredMutations(File file) throws Exception {
/**
* Gets the information of "variant_classification"filter" in the file, if it exists. Otherwise, it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo "variant_classification"filter" -> "variant_classification_filter"

* which types of mutations want to be filtered.
*
* @param file
* @return a string with the types of mutations that want to be filtered, comma-separated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rephrase "types of mutations that want to be filtered" -> types of mutations that should be filtered

}

public void addRejectedVariant(Map<String, Integer> countMap, String mutation) {
if (this.countMap.get(mutation) != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try combining it with computeIfAbsent()

Copy link
Member

@pieterlukasse pieterlukasse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

almost there, but a few small changes

} else {
this.countMap.put(mutation, 1);
this.rejectionMap.put(mutation, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of if/else try using computIfAbsent()

for (String filteredMutation : (Arrays.asList(filteredMutations.split(",")))) {
filteredMutation = filteredMutation.trim();
this.filteredMutations.add(filteredMutation);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be simpler:

this.filteredMutations = new HashSet<String>(Arrays.asList(filteredMutations.split(",")));

that should be enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I also think this would be better done in getVariantClassificationFilter immediately. This makes it clear that this setting is about a set of values.


if( safeStringTest( mutation.getMutationType(), "5'Flank" ) ) {
if (whiteListGenesForPromoterMutations.contains(mutation.getEntrezGeneId())){
mutation.setProteinChange("Promoter");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is important and should apply to all 5'flank mutations

@pieterlukasse
Copy link
Member

pieterlukasse commented Jun 29, 2017

@oplantalech to finish, can you please setup unit tests with the following scenarios:
(prerequisite) setup dummy mutation file with the various filtering cases
1- check that the expected events are filtered out when running in default mode
2- check that the expected events are filtered out when using the filter property

Please also check that the events filtered out in (1) above are the same when running the previous version of the code (before your changes). Report these test results here (the output of the loading script for previous version and for test (1)).

ℹ️ see /cbioportal/core/src/test/java/org/mskcc/cbio/portal/scripts/TestImportExtendedMutationData.java; there are already some unit tests there.

@oplantalech
Copy link
Contributor Author

@pieterlukasse I only needed to add tests to check for the filtering property. I already added this piece of code which allows to reuse the tests that are already there as having the default filtering:

public ImportExtendedMutationData(File mutationFile, int geneticProfileId, String genePanel) {
        this(mutationFile, geneticProfileId, genePanel, null);
    }

@jjgao jjgao temporarily deployed to cbioportal-pr-2670 July 5, 2017 15:27 Inactive
@jjgao jjgao temporarily deployed to cbioportal-pr-2670 July 6, 2017 12:57 Inactive
@oplantalech
Copy link
Contributor Author

oplantalech commented Jul 7, 2017

@pieterlukasse
rc:

Filtering table:
-----------------
Mutation filter decisions: 87707
Rejects: 23477
Mutation Status 'None' Rejects:  0
Silent or Intron Rejects:  21543
UTR Rejects:  1934
IGR Rejects:  0
LOH or Wild Type Rejects:  0
Empty Annotation Rejects:  0
Missense Germline Rejects:  0

PR:

Filtering table:
-----------------
Mutation filter decisions: 87707
Rejects: 23477
Mutation Status 'None' Rejects:  0
LOH or Wild Type Rejects:  0
Empty Annotation Rejects:  0
Missense Germline Rejects:  0
5'Flank Rejects: 343
3'Flank Rejects: 288
Intron Rejects: 2875
Silent Rejects: 18668
5'UTR Rejects: 386
3'UTR Rejects: 917

Silent or Intron Rejects = Intron Rejects + Silent Rejects
UTR Rejects = 5'Flank Rejects + 3'Flank Rejects + 5'UTR Rejects + 3'UTR Rejects

With the new system if they are no rejections that are not Mutation Status 'None', LOH or Wild Type, Empty Annotation and Missense Germline, the category does not appear in the table. Therefore, there are no IGR Rejects in the PR.

@pieterlukasse
Copy link
Member

Nice, I like the new filtering table better.

@jjgao jjgao temporarily deployed to cbioportal-pr-2670 July 7, 2017 10:33 Inactive
@oplantalech
Copy link
Contributor Author

I just squashed all the commits into one, so now the PR is ready to merge.

@pieterlukasse
Copy link
Member

temporarily changed to "do not merge"...one part for the validation script is missing.

@jjgao
Copy link
Member

jjgao commented Jul 7, 2017

@n1zea144 or @sheridancbio could you please also review it?

@jjgao jjgao temporarily deployed to cbioportal-pr-2670 July 7, 2017 15:40 Inactive
@@ -1258,6 +1258,10 @@ def skipValidation(self, data):
"""Test whether the mutation is silent and should be skipped."""
is_silent = False
variant_classification = data[self.cols.index('Variant_Classification')]
if 'variant_classification_filter' in self.meta_dict:
self.SKIP_VARIANT_TYPES = [x.strip()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if the configured types are not valid types? Then the warning from checkVariantClassification is not given anymore for applicable cases, right?

# we expect 6 infos: 4 about filtered mutations, 2 general info messages:
self.assertEqual(len(record_list), 6)
# First 3 INFO messages should be something like: "Line will not be loaded due to the variant classification filter. Filtered types:"
for record in record_list[:3]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 or 4?

@@ -499,6 +499,9 @@ The mutation metadata file should contain the following fields:
8. **data_filename**: your data file
9. **gene_panel**: optional gene panel stable id
10. **swissprot_identifier (optional)**: either `accession` or `name`, indicating the type of identifier in the `SWISSPROT` column
11. **variant_classification_filter (optional)**: mutation types to be filtered out, separated by commas, using the allowed values specified below. By default, cBioPortal filters out the following mutation types: Silent, Intron, IGR, 3'UTR, 5'UTR, 3'Flank and 5'Flank, except the promoter mutations for the TERT gene. If no types want to be filtered out, leave the field empty.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you leave it empty, will no types be filtered out or will the default cBioPortal filtering apply? Please clarify.

@jjgao jjgao temporarily deployed to cbioportal-pr-2670 July 10, 2017 13:57 Inactive
@jjgao jjgao temporarily deployed to cbioportal-pr-2670 July 10, 2017 13:58 Inactive
@jjgao jjgao temporarily deployed to cbioportal-pr-2670 July 10, 2017 14:07 Inactive
@@ -1121,7 +1121,7 @@ def test_customized_variants_skipped(self):
# we expect 6 infos: 4 about filtered mutations, 2 general info messages:
self.assertEqual(len(record_list), 6)
# First 3 INFO messages should be something like: "Line will not be loaded due to the variant classification filter. Filtered types:"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 or 4?


# check that the values written by the user are valid values, otherwise raise a warning
for value in self.SKIP_VARIANT_TYPES:
self.checkVariantClassification(value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not give a warning here, since these values will be skipped/ filtered. The warning given here is anyway not accurate for this scenario.

@jjgao jjgao temporarily deployed to cbioportal-pr-2670 July 10, 2017 15:34 Inactive
@jjgao jjgao temporarily deployed to cbioportal-pr-2670 July 11, 2017 14:40 Inactive
@sandertan sandertan self-requested a review July 12, 2017 20:43
@sandertan
Copy link
Contributor

sandertan commented Jul 12, 2017

@oplantalech Nice PR. Could you add variant_classification_filter to MetaFileTypes.MUTATION in cbioportal_common.py?

I'm getting this warning in my validation report:
Unrecognized field in meta file: variant_classification_filter

@jjgao jjgao temporarily deployed to cbioportal-pr-2670 July 13, 2017 07:58 Inactive
@jjgao jjgao temporarily deployed to cbioportal-pr-2670 July 13, 2017 08:56 Inactive
@oplantalech
Copy link
Contributor Author

@sandertan Done.

addRejectedVariant(rejectionMap, mutation.getMutationType());
return false;
} else {
if( safeStringTest( mutation.getMutationType(), "5'Flank" ) ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a sanity check here - based on PR description, I think you are intentionally skipping the check for TERT promotor before the call to setProteinChange()? Do we need to support a whitelist for 5'Flank mutations when a variant_classification_filter is specified?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. Indeed, my idea here was to maintain the functionality exactly as it was if no variant_classification_filter is specified. That also means skipping the promoter mutations of the TERT gene. However, it might be a good idea to discuss with the rest of the community whether we want to remove this specific check for the TERT promoter mutations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for the record: I think that when a variant_classification_filter is specified it should be leading and there should be no whitelist. IMO a whitelist would lead to confusion in this case. So yes, the whitelist check is omitted on purpose here.

}
return filteredMutations;
} else {
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor - extra tab.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noticing! I've fixed it.

@oplantalech
Copy link
Contributor Author

@jjgao Can you merge this PR? Thanks.

@jjgao jjgao merged commit 3590314 into cBioPortal:rc Jul 26, 2017
@oplantalech oplantalech deleted the promoter_mutations branch July 27, 2017 09:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants