edit to reference gene list #539
@kgaonkar6 - it doesn't look like there's a link to that Gist in the README of the module (https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/analyses/fusion_filtering/README.md). I would include a link to that Gist even if you are not adding it to the repository.
@jharenza and @kgaonkar6 – a more general question/comment, why do we see changes in the following diffs: https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/539/files#diff-686952d409ef921b805d47fe4186f5a9 Taking
Is this a matter of these fusion files lagging behind by 1(ish) release?
Hi, yes, I think those files didn't get updated because 05-recurrent-fusions-per-histology.R was using data/pbta-fusion-putative-oncogenic.tsv (which was v13) as input when I created the v14 files. I modified the run script to read results/pbta-fusion-putative-oncogenic.tsv in this run, so these results use the updated fusion calls. Should I edit the run script for the repo as well, or is it easier to run everything from the data folder for tests/CI etc.? All the results folder files need to be updated in v15 #543
This approach seems reasonable to me! Thanks for updating the README!
Purpose/implementation Section
To add updated kinase list from pfam domain annotation
names(table(pfamGene[grep("kinase",pfamGene$NAME),"NAME"]))
[1] "AA_kinase" "Alpha_kinase" "APS_kinase" "Carb_kinase" "Choline_kinase" "DAG_kinase_N"
[7] "Flavokinase" "Fucokinase" "GHMP_kinases_C" "GHMP_kinases_N" "Hexokinase_1" "Hexokinase_2"
[13] "NAD_kinase" "P-mevalo_kinase" "PI3_PI4_kinase" "Pkinase" "Pkinase_C" "Pkinase_Tyr"
All genes with the above kinase domains are added to the genereferencelist.txt.
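As a rough illustration of that step, the sketch below appends kinase genes to a reference table without duplicating genes that are already annotated as Kinase. The column names (`Gene_Symbol`, `type`), the gene names, and the non-kinase type label are hypothetical placeholders, not the actual layout of the reference list.

```r
# Hypothetical reference list; column names and rows are assumptions
# made for illustration only.
reference <- data.frame(
  Gene_Symbol = c("GENE_A", "GENE_X"),
  type = c("Kinase", "TumorSuppressorGene"),
  stringsAsFactors = FALSE
)

# Hypothetical genes found to carry a kinase pfam domain.
kinase_genes <- c("GENE_A", "GENE_B", "GENE_D")

# Only add genes that are not already annotated as Kinase.
already_kinase <- reference$Gene_Symbol[reference$type == "Kinase"]
new_rows <- data.frame(
  Gene_Symbol = setdiff(kinase_genes, already_kinase),
  type = "Kinase",
  stringsAsFactors = FALSE
)

updated_reference <- rbind(reference, new_rows)
```

Using `setdiff()` before `rbind()` keeps the table free of repeated Kinase rows when the script is rerun against a list that already contains some of the genes.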
What scientific question is your analysis addressing?
Previously, certain key genes were not annotated as Kinase in results/pbta-fusion-putative-oncogenic.tsv. This updated gene list fixes the issue and also adds kinase gene fusions through the annotation-based filtering in
OpenPBTA-analysis/analyses/fusion_filtering/04-project-specific-filtering.Rmd
Line 241 in 22066ad
What was your approach?
I'm using the pfam location file from http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/ucscGenePfam.txt.gz and the pfam description file from http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/pfamDesc.txt.gz
I use the biomaRt R package to get the pfamIDs associated with each gene and match them to the pfamIDs from UCSC.
Filtering for "kinase" in the NAME column gets the hgnc_symbol values associated with kinase domains:
https://gist.github.com/kgaonkar6/02b3fbcfeeddfa282a1cdf4803704794#file-format_reference_gene_list-r-L84
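A minimal sketch of the matching and filtering idea, using small toy tables in place of the downloaded UCSC files and the biomaRt query. The `pfam_id` column, the placeholder IDs, and the gene names are assumptions for illustration; only `NAME`, `hgnc_symbol`, `pfamGene`, and the `grep("kinase", ...)` filter come from the description above, and the linked Gist remains the actual implementation.

```r
# Toy stand-in for the pfam description table; IDs are placeholders.
pfam_desc <- data.frame(
  pfam_id = c("PFAM_1", "PFAM_2", "PFAM_3", "PFAM_4"),
  NAME = c("Pkinase", "Pkinase_Tyr", "SH2", "PI3_PI4_kinase"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the per-gene pfam IDs that a biomaRt query would return.
gene_pfam <- data.frame(
  hgnc_symbol = c("GENE_A", "GENE_B", "GENE_B", "GENE_C", "GENE_D"),
  pfam_id = c("PFAM_1", "PFAM_2", "PFAM_1", "PFAM_3", "PFAM_4"),
  stringsAsFactors = FALSE
)

# Match genes to domain names through the shared pfam ID.
pfamGene <- merge(gene_pfam, pfam_desc, by = "pfam_id")

# Keep rows whose domain name contains "kinase" (case-sensitive match).
kinase_rows <- pfamGene[grep("kinase", pfamGene$NAME), ]

# Unique gene symbols carrying at least one kinase domain.
kinase_genes <- sort(unique(kinase_rows$hgnc_symbol))

# Distinct kinase domain names that were matched.
kinase_domains <- names(table(kinase_rows$NAME))
```

Note that `grep("kinase", ...)` is case-sensitive, so a domain named with a capital "Kinase" would only be caught by adding `ignore.case = TRUE`.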
What GitHub issue does your pull request address?
#530
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
@jaclyn-taroni @jharenza
Which areas should receive a particularly close look?
Should the kinase gene list from pfamIDs generation script be part of this analysis?
Is there anything that you want to discuss further?
Are these pfam domains OK to consider as kinase domains? We reviewed them with a simple Google search; please provide additional comments on this domain selection if applicable.
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Results
What types of results are included (e.g., table, figure)?
results/
FilteredFusion.tsv
pbta-fusion-recurrent-fusion-byhistology.tsv
pbta-fusion-recurrently-fused-genes-byhistology.tsv
pbta-fusion-putative-oncogenic.tsv
pbta-fusion-recurrent-fusion-bysample.tsv
pbta-fusion-recurrently-fused-genes-bysample.tsv
What is your summary of the results?
3315 fusions are retained as putative_oncogenic_fusions. The recurrent fusions and recurrently fused genes per histology are unchanged.
Reproducibility Checklist
Documentation Checklist
README and it is up to date.
analyses/README.md and the entry is up to date.