edit to reference gene list #539

kgaonkar6 · 2020-02-14T18:51:17Z

Purpose/implementation Section

Adds an updated kinase gene list based on Pfam domain annotation. The kinase domain names found are:
names(table(pfamGene[grep("kinase",pfamGene$NAME),"NAME"]))
[1] "AA_kinase" "Alpha_kinase" "APS_kinase" "Carb_kinase" "Choline_kinase" "DAG_kinase_N"
[7] "Flavokinase" "Fucokinase" "GHMP_kinases_C" "GHMP_kinases_N" "Hexokinase_1" "Hexokinase_2"
[13] "NAD_kinase" "P-mevalo_kinase" "PI3_PI4_kinase" "Pkinase" "Pkinase_C" "Pkinase_Tyr"

All genes with the above kinase domains are added to genereferencelist.txt.
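A minimal sketch of this filtering step, assuming a `pfamGene` table shaped like the one in the snippet above. The `hgnc_symbol` column name and the toy rows are assumptions for illustration only, not the real annotation:

```r
# Toy stand-in for the pfamGene table; column names and rows are assumptions
# modeled on the snippet above (NAME = Pfam domain name).
pfamGene <- data.frame(
  hgnc_symbol = c("BRAF", "PIK3CA", "TP53", "NTRK2", "BRAF"),
  NAME = c("Pkinase_Tyr", "PI3_PI4_kinase", "P53", "Pkinase_Tyr", "RBD"),
  stringsAsFactors = FALSE
)

# The match is case-sensitive: it catches "Pkinase_Tyr" or "Hexokinase_1",
# but would skip a domain name spelled with a capital "Kinase".
is_kinase <- grepl("kinase", pfamGene$NAME, fixed = TRUE)

# Distinct kinase domain names and the genes that carry at least one of them.
kinase_domains <- sort(unique(pfamGene$NAME[is_kinase]))
kinase_genes <- sort(unique(pfamGene$hgnc_symbol[is_kinase]))

print(kinase_domains)
print(kinase_genes)
```

A gene is kept if any one of its domains matches, so a gene with both kinase and non-kinase domains still lands in the list.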

What scientific question is your analysis addressing?

Previously, certain key genes were not annotated as Kinase in results/pbta-fusion-putative-oncogenic.tsv. This updated gene list fixes the issue and also adds kinase gene fusions through the annotation-based filtering in

OpenPBTA-analysis/analyses/fusion_filtering/04-project-specific-filtering.Rmd

Line 241 in 22066ad

# filter for putative driver genes and mutifused genes per sample

What was your approach?

I'm using the Pfam location file from http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/ucscGenePfam.txt.gz and the Pfam description file from http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/pfamDesc.txt.gz. I use the biomaRt R package to get the pfamIDs associated with each gene and match them to the pfamIDs from UCSC.

Filtering for "kinase" in the NAME column gets the hgnc_symbol values associated with kinase domains:
https://gist.github.com/kgaonkar6/02b3fbcfeeddfa282a1cdf4803704794#file-format_reference_gene_list-r-L84
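A rough sketch of the matching step, not the Gist itself. It uses small hypothetical data frames in place of the UCSC description file and the biomaRt query result; the column names (`pfamAC`, `NAME`, `pfam`, `hgnc_symbol`) and the rows are assumptions for illustration:

```r
# Hypothetical stand-in for the Pfam description file; column names are assumptions.
pfam_desc <- data.frame(
  pfamAC = c("PF00069", "PF07714", "PF00870"),
  NAME = c("Pkinase", "Pkinase_Tyr", "P53"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for a biomaRt result: one row per gene/Pfam ID pair.
gene_pfam <- data.frame(
  hgnc_symbol = c("CDK4", "NTRK2", "TP53", "MYB"),
  pfam = c("PF00069", "PF07714", "PF00870", NA),
  stringsAsFactors = FALSE
)

# Inner join on the Pfam accession; genes without a matching Pfam ID drop out.
pfamGene <- merge(gene_pfam, pfam_desc, by.x = "pfam", by.y = "pfamAC")

# Keep the gene symbols whose domain name contains "kinase".
kinase_genes <- sort(unique(pfamGene$hgnc_symbol[grepl("kinase", pfamGene$NAME)]))
print(kinase_genes)
```

If the two sources write accessions differently (for example, with and without a version suffix), the join key would need to be normalized first; that case is not handled here.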

What GitHub issue does your pull request address?

#530

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

@jaclyn-taroni @jharenza

Which areas should receive a particularly close look?

Should the script that generates the kinase gene list from pfamIDs be part of this analysis?

Is there anything that you want to discuss further?

Are these Pfam domains OK to treat as kinase domains? We reviewed them with a simple Google search; please comment if you have additional input on this domain selection.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Results

What types of results are included (e.g., table, figure)?

results/
FilteredFusion.tsv
pbta-fusion-recurrent-fusion-byhistology.tsv
pbta-fusion-recurrently-fused-genes-byhistology.tsv
pbta-fusion-putative-oncogenic.tsv
pbta-fusion-recurrent-fusion-bysample.tsv
pbta-fusion-recurrently-fused-genes-bysample.tsv

What is your summary of the results?

3315 fusions are retained as putative_oncogenic_fusions. The recurrent fusions and recurrently fused genes per histology are unchanged.

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
[x] This analysis has been added to continuous integration.

Documentation Checklist

This analysis module has a README and it is up to date.
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.

jaclyn-taroni · 2020-02-17T21:06:10Z

@kgaonkar6 - it doesn't look like there's a link to that Gist in the README of the module: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/analyses/fusion_filtering/README.md

I would include a link to that Gist even if you are not adding that to the repository.

jaclyn-taroni · 2020-02-17T21:41:23Z

@jharenza and @kgaonkar6 – a more general question/comment, why do we see changes in the following diffs:

https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/539/files#diff-686952d409ef921b805d47fe4186f5a9
https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/539/files#diff-96cb177fa4f37ad5ec6eafb0e9489a8f

Take BS_277SBSCP, which is being removed in this diff, as an example. BS_277SBSCP appears to be a poly-A stranded sample that is not present in the current histologies file (v14) and was likely removed in v13, when all of the poly-A stranded samples were removed. However, BS_277SBSCP is in the pbta-fusion-recurrently-fused-genes-bysample.tsv file in v14. This file did not change between v13 and v14.

release-v13-20200116 md5sum:

aac74635d577f20fdc47e427cb119233  pbta-fusion-recurrently-fused-genes-bysample.tsv

release-v14-20200203 md5sum:

aac74635d577f20fdc47e427cb119233  pbta-fusion-recurrently-fused-genes-bysample.tsv

Is this a matter of these fusion files lagging behind by 1(ish) release?
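The checksum comparison above can also be scripted. This is a minimal sketch in R using `tools::md5sum()`; the two temporary files and their contents are toy stand-ins for the same results file in two releases, not the real data:

```r
# Two temporary files with identical contents stand in for the same results
# file in two releases (hypothetical paths, toy contents).
v13_file <- tempfile(fileext = ".tsv")
v14_file <- tempfile(fileext = ".tsv")
writeLines(c("Gene\tcount", "BRAF\t3"), v13_file)
writeLines(c("Gene\tcount", "BRAF\t3"), v14_file)

# tools::md5sum() returns a character vector of checksums named by file path.
checksums <- tools::md5sum(c(v13_file, v14_file))

# Identical checksums mean the file was carried over unchanged.
unchanged <- unname(checksums[1]) == unname(checksums[2])
print(unchanged)
```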

kgaonkar6 · 2020-02-17T23:53:39Z

Hi, yes, I think those files didn't get updated because 05-recurrent-fusions-per-histology.R was using data/pbta-fusion-putative-oncogenic.tsv (which was v13) as input when I created the v14 files.
https://github.com/kgaonkar6/OpenPBTA-analysis/blob/88b848b8f5a8389c7159b009d9f981e331c36dd1/analyses/fusion_filtering/run_fusion_merged.sh#L29

I modified the run script to read results/pbta-fusion-putative-oncogenic.tsv in this run, so these results use the updated fusion calls. Should I edit the run script in the repo as well, or is it easier to run everything from the data folder for tests/CI etc.?

All the results folder files need to be updated in v15 #543

jaclyn-taroni

This approach seems reasonable to me! Thanks for updating the README!

kgaonkar6 added 2 commits February 13, 2020 13:56

edit to reference gene list

b4813f3

updated kinase list

88b848b

added GIST for reference list

d07342f

kgaonkar6 mentioned this pull request Feb 18, 2020

Planned release: v15 #543

Closed

5 tasks

jaclyn-taroni approved these changes Feb 18, 2020


jaclyn-taroni added 2 commits February 18, 2020 11:06

Merge branch 'master' into add_kinase_genelist

2ac38a7

Merge branch 'master' into add_kinase_genelist

0c6be82

jaclyn-taroni merged commit c6e6261 into AlexsLemonade:master Feb 18, 2020

jaclyn-taroni mentioned this pull request Feb 19, 2020

Updated analysis: Update kinase reference gene list for fusion filtering #530

Closed

kgaonkar6 deleted the add_kinase_genelist branch December 8, 2020 23:04
