This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

edit to reference gene list #539

Merged
merged 5 commits into from
Feb 18, 2020

Conversation

kgaonkar6
Collaborator

@kgaonkar6 kgaonkar6 commented Feb 14, 2020

Purpose/implementation Section

To add updated kinase list from pfam domain annotation
names(table(pfamGene[grep("kinase",pfamGene$NAME),"NAME"]))
[1] "AA_kinase" "Alpha_kinase" "APS_kinase" "Carb_kinase" "Choline_kinase" "DAG_kinase_N"
[7] "Flavokinase" "Fucokinase" "GHMP_kinases_C" "GHMP_kinases_N" "Hexokinase_1" "Hexokinase_2"
[13] "NAD_kinase" "P-mevalo_kinase" "PI3_PI4_kinase" "Pkinase" "Pkinase_C" "Pkinase_Tyr"

All genes with the above kinase domains are added to genereferencelist.txt.

What scientific question is your analysis addressing?

Previously, certain key genes were not annotated as Kinase in results/pbta-fusion-putative-oncogenic.tsv. This updated gene list fixes the issue and also adds kinase gene fusions through the annotation-based filtering in

# filter for putative driver genes and mutifused genes per sample

What was your approach?

I'm using the Pfam location file from http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/ucscGenePfam.txt.gz and the Pfam description file from http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/pfamDesc.txt.gz. I use the biomaRt R package to get the pfamIDs associated with each gene and match them to the pfamIDs from UCSC.

Filtering for "kinase" in the NAME column gets the hgnc_symbol values associated with kinase domains:
https://gist.github.com/kgaonkar6/02b3fbcfeeddfa282a1cdf4803704794#file-format_reference_gene_list-r-L84
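
For readers who do not open the Gist, here is a minimal sketch of the join-and-filter step on toy data. It is not the Gist's code: the data frames, the pfam_id column, the placeholder Pfam IDs and the GENE_* symbols are all assumptions made for illustration. Only the NAME and hgnc_symbol column names and the grep("kinase", ...) filter come from this pull request.

```r
# Toy stand-in for the Pfam description table (pfamDesc.txt.gz).
# The pfam_id column name and the ID values are hypothetical.
pfam_desc <- data.frame(
  pfam_id = c("PFAM_1", "PFAM_2", "PFAM_3"),
  NAME = c("Pkinase", "Pkinase_Tyr", "SH3_1"),
  stringsAsFactors = FALSE
)

# Toy stand-in for a gene-to-pfamID mapping, e.g. one retrieved with biomaRt.
# Gene symbols are placeholders, not real annotations.
gene_pfam <- data.frame(
  hgnc_symbol = c("GENE_A", "GENE_A", "GENE_B", "GENE_C"),
  pfam_id = c("PFAM_1", "PFAM_3", "PFAM_2", "PFAM_3"),
  stringsAsFactors = FALSE
)

# Attach the domain name to every gene/domain pair.
pfamGene <- merge(gene_pfam, pfam_desc, by = "pfam_id")

# Keep rows whose domain name contains "kinase" (case-sensitive),
# then reduce to a sorted, de-duplicated list of gene symbols.
kinase_rows <- pfamGene[grep("kinase", pfamGene$NAME), ]
kinase_genes <- sort(unique(kinase_rows$hgnc_symbol))
print(kinase_genes)
```

Because grep() is case-sensitive by default, a domain name written as "Kinase" would not match this pattern. All 18 domain names listed in the purpose section contain lowercase "kinase".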

What GitHub issue does your pull request address?

#530

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

@jaclyn-taroni @jharenza

Which areas should receive a particularly close look?

Should the script that generates the kinase gene list from pfamIDs be part of this analysis?

Is there anything that you want to discuss further?

Are these Pfam domains OK to treat as kinase domains? We reviewed them with a simple Google search; please provide additional comments on this domain selection if applicable.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Results

What types of results are included (e.g., table, figure)?

results/
FilteredFusion.tsv
pbta-fusion-recurrent-fusion-byhistology.tsv
pbta-fusion-recurrently-fused-genes-byhistology.tsv
pbta-fusion-putative-oncogenic.tsv
pbta-fusion-recurrent-fusion-bysample.tsv
pbta-fusion-recurrently-fused-genes-bysample.tsv

What is your summary of the results?

3315 fusions are retained as putative_oncogenic_fusions. The recurrent fusions and recurrently fused genes per histology are unchanged.

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • [X] This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@jaclyn-taroni
Member

@kgaonkar6 - it doesn't look like there's a link to that Gist in the README of the module: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/analyses/fusion_filtering/README.md

I would include a link to that Gist even if you are not adding that to the repository.

@jaclyn-taroni
Member

@jharenza and @kgaonkar6 – a more general question/comment: why do we see changes in the following diffs?

https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/539/files#diff-686952d409ef921b805d47fe4186f5a9
https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/539/files#diff-96cb177fa4f37ad5ec6eafb0e9489a8f

Take BS_277SBSCP, which is being removed in this diff, as an example. BS_277SBSCP appears to be a poly-A stranded sample that is not present in the current histologies file (v14) and was likely removed in v13, when all of the poly-A stranded samples were removed. However, BS_277SBSCP is in the pbta-fusion-recurrently-fused-genes-bysample.tsv file in v14. This file did not change between v13 and v14.
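
One way to spot such leftover samples is to compare the sample IDs in a results file against the histologies file. The sketch below uses toy data frames. The column names (Kids_First_Biospecimen_ID, Sample, GeneSymbol) and every ID other than BS_277SBSCP are assumptions for illustration and may not match the real files.

```r
# Toy stand-in for the current histologies file; column name is assumed.
histologies <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_AAAAAAAA", "BS_BBBBBBBB"),
  stringsAsFactors = FALSE
)

# Toy stand-in for pbta-fusion-recurrently-fused-genes-bysample.tsv;
# column names and gene symbols are hypothetical.
fused_by_sample <- data.frame(
  Sample = c("BS_AAAAAAAA", "BS_277SBSCP"),
  GeneSymbol = c("GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

# Samples that appear in the results file but not in the histologies file.
stale_samples <- setdiff(
  fused_by_sample$Sample,
  histologies$Kids_First_Biospecimen_ID
)
print(stale_samples)
```

A non-empty result would suggest the results file was generated from an older release than the histologies file.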

release-v13-20200116 md5sum:

aac74635d577f20fdc47e427cb119233  pbta-fusion-recurrently-fused-genes-bysample.tsv

release-v14-20200203 md5sum:

aac74635d577f20fdc47e427cb119233  pbta-fusion-recurrently-fused-genes-bysample.tsv
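
The same comparison can be done from R with tools::md5sum(). This is a minimal sketch, assuming two local copies of the file. The temporary files and their contents below are placeholders, not the actual release files.

```r
# Write two placeholder files with identical contents to stand in for
# the v13 and v14 copies of the results file.
release_v13 <- tempfile(fileext = ".tsv")
release_v14 <- tempfile(fileext = ".tsv")
toy_lines <- c("Sample\tGeneSymbol", "BS_277SBSCP\tGENE_A")
writeLines(toy_lines, release_v13)
writeLines(toy_lines, release_v14)

# tools::md5sum() returns a character vector of checksums named by file path.
checksums <- unname(tools::md5sum(c(release_v13, release_v14)))

# Identical checksums mean the file did not change between the two copies.
unchanged <- checksums[1] == checksums[2]
print(unchanged)
```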

Is this a matter of these fusion files lagging behind by 1(ish) release?

@kgaonkar6
Collaborator Author

Hi, yes, I think those files didn't get updated because 05-recurrent-fusions-per-histology.R was using data/pbta-fusion-putative-oncogenic.tsv as input (which was v13) when I created the v14 files.
https://github.com/kgaonkar6/OpenPBTA-analysis/blob/88b848b8f5a8389c7159b009d9f981e331c36dd1/analyses/fusion_filtering/run_fusion_merged.sh#L29

I modified the run script to read results/pbta-fusion-putative-oncogenic.tsv in this run, so these results use the updated fusion calls. Should I edit the run script for the repo as well, or is it easier to run everything from the data folder for tests, CI, etc.?

All the files in the results folder need to be updated in v15 (#543).

@kgaonkar6 kgaonkar6 mentioned this pull request Feb 18, 2020
5 tasks
Member

@jaclyn-taroni jaclyn-taroni left a comment


This approach seems reasonable to me! Thanks for updating the README!

@jaclyn-taroni jaclyn-taroni merged commit c6e6261 into AlexsLemonade:master Feb 18, 2020
@kgaonkar6 kgaonkar6 deleted the add_kinase_genelist branch December 8, 2020 23:04
2 participants