This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Standard annotation and reference files #241

Closed
jashapiro opened this issue Nov 6, 2019 · 6 comments

jashapiro commented Nov 6, 2019

File(s)

*.gtf

Release

v9

Link to OpenPBTA-manuscript

Put a link to the relevant section of the OpenPBTA manuscript here.

Question/issue

As far as I am aware, we do not currently provide gtf or fasta files for downstream analysis in a standardized way with the repository or data download. Providing such files would ease analysis by new contributors (see #198 (comment) for example) and improve reproducibility. We do currently install a txdb version, but that may not be completely aligned with the upstream analysis.

Since much of the upstream analysis relies on Gencode v27 (e.g. [Gene expression abundance](https://alexslemonade.github.io/OpenPBTA-manuscript/#gene-expression-abundance-estimation)), this would seem a logical file to include.

Including the relevant reference fasta file could also be useful, though perhaps at a lower priority.

jashapiro added the data label Nov 6, 2019

jharenza commented Nov 7, 2019

will add to the next release

@jashapiro

Are there any other reference files that you think should be added as well?

jharenza commented Nov 7, 2019

The only other thing I can think of offhand is a mapping file from BioMart for Ensembl to HUGO to Entrez IDs. This was useful for me with the PDX paper to harmonize gene symbols (updating old symbols to current ones where algorithms used old mappings), but I'm not sure this will be the case for us, since we use the more recent hg38 and the same reference for all DNA algorithms, and the same reference goes into STAR for both fusion algorithms. This was an issue with the MAF and the 4 different fusion algorithms, which used different reference genomes. E.g., the former MLL is now KMT2A.
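
For illustration, symbol harmonization of this kind could look roughly like the sketch below. It assumes a mapping table with one row per gene holding the current symbol and a `|`-separated list of previous symbols. The column names (`current_symbol`, `previous_symbols`) and the helper names are hypothetical and do not describe an actual BioMart export. The only row used is the MLL to KMT2A example from this comment.

```python
def build_alias_lookup(rows):
    """Map each current and previous symbol to the current symbol.

    `rows` is an iterable of dicts with the hypothetical keys
    'current_symbol' and 'previous_symbols' ('|'-separated, may be empty).
    """
    lookup = {}
    for row in rows:
        current = row["current_symbol"]
        lookup[current] = current
        for old in row["previous_symbols"].split("|"):
            if old:
                lookup[old] = current
    return lookup


def harmonize(symbols, lookup):
    """Replace outdated symbols with current ones; unknown symbols pass through."""
    return [lookup.get(symbol, symbol) for symbol in symbols]


# Illustrative mapping row, taken from the example in the comment above.
mapping_rows = [{"current_symbol": "KMT2A", "previous_symbols": "MLL"}]
lookup = build_alias_lookup(mapping_rows)

# Symbols as they might come out of tools built on different annotations.
harmonized = harmonize(["MLL", "KMT2A", "TP53"], lookup)
print(harmonized)
```

Symbols missing from the table are left unchanged rather than dropped, so gaps in the mapping do not silently remove genes from downstream results.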

migbro commented Nov 13, 2019

RNA reference used: https://www.gencodegenes.org/human/release_27.html
Specifically this fasta: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_27/GRCh38.primary_assembly.genome.fa.gz

And this gtf: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_27/gencode.v27.primary_assembly.annotation.gtf.gz
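
As a rough sketch of how a contributor might use the GTF once it ships with the data release, the snippet below builds a gene ID to gene name table from `gene` records. It assumes the standard 9-column GTF layout with quoted `key "value";` attributes. The function names and the sample record (with made-up IDs) are hypothetical and are not taken from the actual file.

```python
import gzip
import re

# Matches quoted attributes such as: gene_id "ENSG...";
# Unquoted attributes (e.g. `level 2;`) are ignored, and repeated keys
# keep only the last value, which is acceptable for gene_id/gene_name.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(attr_field):
    """Parse the 9th GTF column into a dict of quoted key/value pairs."""
    return dict(ATTR_RE.findall(attr_field))


def gene_id_to_name(lines):
    """Build {gene_id: gene_name} from the 'gene' records of a GTF."""
    mapping = {}
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue
        attrs = parse_attributes(fields[8])
        if "gene_id" in attrs and "gene_name" in attrs:
            mapping[attrs["gene_id"]] = attrs["gene_name"]
    return mapping


def read_gtf(path):
    """Read a gzip-compressed GTF from a local path (assumed already downloaded)."""
    with gzip.open(path, "rt") as handle:
        return gene_id_to_name(handle)


# Made-up record in GTF format, for illustration only.
sample = [
    "##description: illustrative header line\n",
    "\t".join([
        "chr1", "HAVANA", "gene", "100", "200", ".", "+", ".",
        'gene_id "ENSG00000000001.1"; gene_type "protein_coding"; '
        'gene_name "GENE1"; level 2;',
    ]) + "\n",
]
genes = gene_id_to_name(sample)
print(genes)
```

Calling `read_gtf` on a local copy of the annotation file would yield the same kind of table for the full annotation.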

@jharenza

thanks @migbro - we can add to #254

yuankunzhu mentioned this issue Nov 15, 2019
@jharenza

Added with #273
