This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Proposed Analysis: quantify telomerase activity across pediatric brain tumors #148

Closed
syzheng opened this issue Oct 4, 2019 · 13 comments
Labels: in progress, proposed analysis, transcriptomic

Comments

@syzheng

syzheng commented Oct 4, 2019

Scientific goals

The goal is to quantify telomerase activity and correlate it with telomere length and molecular alterations (TERTp mutation, ATRX mutation, etc.).

Proposed methods

We will use our newly developed method, EXTEND (EXpression based Telomerase ENzymatic activity Detection).

Required input data

Gene expression from RNA-seq (any of TPM, RPKM, or counts).
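
The units listed above are related by simple rescaling. As a minimal sketch, and not a statement about what EXTEND itself requires, the snippet below converts one sample's RPKM values to TPM by dividing each value by the sample total and multiplying by one million. The function name and the expression values are made up for illustration.

```python
def rpkm_to_tpm(rpkm_by_gene):
    """Rescale one sample's RPKM (or FPKM) values so that they sum to one million."""
    total = sum(rpkm_by_gene.values())
    if total <= 0:
        raise ValueError("sample has no expressed genes")
    return {gene: value / total * 1e6 for gene, value in rpkm_by_gene.items()}


# Toy values for a single sample; these are illustrative, not real data.
sample_rpkm = {"TERT": 2.0, "TERC": 1.0, "ACTB": 997.0}
sample_tpm = rpkm_to_tpm(sample_rpkm)
```

Raw counts cannot be converted this way without gene lengths, so a counts input would need a length-normalization step first.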

Proposed timeline

One to two weeks.

Relevant literature

Barthel et al. Nat Genet, 2017; Zheng et al. Cancer Cell, 2016; Ackermann et al. Science, 2016

@cgreene
Collaborator

cgreene commented Oct 5, 2019

This sounds exciting! As I think about potential caveats, does it matter if the RNA-seq samples are poly-A selected or rRNA depleted?

@jharenza added the in progress label Oct 7, 2019
@syzheng
Author

syzheng commented Oct 7, 2019

That is a great point. We currently use data from the regular polyA-enriched protocol, mostly because our primary input data is TCGA. This does impact the method, because a key gene in our signature, TERC, is a non-coding RNA that is not properly captured by polyA methods. PCR shows this gene is abundantly expressed across tissues; however, RNA-seq data from TCGA and GTEx show only very low expression of this gene. EXTEND demonstrates reasonable performance with TCGA, CCLE, and GTEx, but we have not tested data from total RNA-seq or rRNA depletion.

@cgreene
Collaborator

cgreene commented Oct 7, 2019

@syzheng : Ok! This dataset contains both poly-A and rRNA depleted samples. #120 and https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/selection-strategy-comparison took a dive into the implications for gene expression analyses based on some earlier work by @cbethell.

I'm a bit confused by "a key gene in our signature, TERC, is a non-coding RNA that is not properly captured by polyA methods" and also "we currently use data from regular polyA enriched protocol, mostly because our primary input data is TCGA". Did you mean that you are better off with poly-A? There are many fewer poly-A samples here than rRNA-depleted.

As something that may be helpful in extending an analysis across both sets: @jharenza is looking to determine whether or not we can generate some that are matched (sequenced with both protocols).

@syzheng
Author

syzheng commented Oct 7, 2019

EXTEND was developed using polyA data. We essentially do not know whether it works for rRNA depletion, because we did not have this type of data when we benchmarked the method. The key is TERC, a non-coding RNA that is part of both our gene signature and the telomerase complex. It would be great if we had a few cases sequenced by both methods. Otherwise, we can examine the distribution of TERC expression in the dataset to see whether it behaves similarly to polyA datasets.
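
The distribution check described above could be sketched roughly as follows. This is an illustration under stated assumptions, not part of EXTEND: expression is assumed to be held as a nested dict (sample ID to gene to value), the helper name `summarize_gene` is hypothetical, and the numbers are toy values.

```python
import math
import statistics


def summarize_gene(expression, gene="TERC"):
    """Summarize log2(x + 1) expression of one gene across samples.

    expression: dict mapping sample ID -> dict mapping gene symbol -> value.
    Samples that lack the gene are treated as having zero expression.
    """
    values = [math.log2(sample.get(gene, 0.0) + 1.0) for sample in expression.values()]
    return {
        "n": len(values),
        "median_log2": statistics.median(values),
        "fraction_zero": sum(v == 0.0 for v in values) / len(values),
    }


# Toy values only; real input would come from the polyA and rRNA-depleted matrices.
polya = {"s1": {"TERC": 0.0}, "s2": {"TERC": 1.0}, "s3": {"TERC": 3.0}}
depleted = {"s4": {"TERC": 7.0}, "s5": {"TERC": 15.0}, "s6": {"TERC": 31.0}}

polya_summary = summarize_gene(polya)
depleted_summary = summarize_gene(depleted)
```

A large gap in the median or in the fraction of zero-expression samples between the two library types would suggest that scores from the two sets should not be pooled without further checks.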

@cgreene
Collaborator

cgreene commented Oct 7, 2019

Gotcha! You'll find both sets of files in the data download, processed in a few different ways:

pbta-gene-expression-kallisto.polya.rds
pbta-gene-expression-kallisto.stranded.rds
pbta-gene-expression-rsem-fpkm.polya.rds
pbta-gene-expression-rsem-fpkm.stranded.rds
pbta-gene-counts-rsem-expected_count.polya.rds
pbta-gene-counts-rsem-expected_count.stranded.rds

For now, it will be interesting to see if the distribution is different and/or if TERC matches the estimates from the method in the stranded ones. Hopefully we'll have the set with both in the not terribly distant future, but we shouldn't wait for them to get started. Thanks for taking this on!
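
As a small sketch of how the files listed above pair up, the snippet below groups them by quantification and library type based only on the file-name pattern. It assumes the `.stranded` files hold the rRNA-depleted samples, as the thread implies; the helper name `parse_expression_file` is hypothetical.

```python
def parse_expression_file(filename):
    """Split '<quantification>.<library>.rds' into its two labelled parts."""
    stem, library, extension = filename.rsplit(".", 2)
    if extension != "rds":
        raise ValueError(f"unexpected file type: {filename}")
    return {"quantification": stem, "library": library}


files = [
    "pbta-gene-expression-kallisto.polya.rds",
    "pbta-gene-expression-kallisto.stranded.rds",
    "pbta-gene-expression-rsem-fpkm.polya.rds",
    "pbta-gene-expression-rsem-fpkm.stranded.rds",
    "pbta-gene-counts-rsem-expected_count.polya.rds",
    "pbta-gene-counts-rsem-expected_count.stranded.rds",
]

# Map each quantification to its polya and stranded file names.
by_quantification = {}
for name in files:
    info = parse_expression_file(name)
    by_quantification.setdefault(info["quantification"], {})[info["library"]] = name
```

Each quantification then has one `polya` and one `stranded` entry, which makes it straightforward to run the same analysis on both library types.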

@jaclyn-taroni added the transcriptomic label Oct 26, 2019
@jharenza
Collaborator

Hi @syzheng ! Checking in on this analysis - do you have an idea of when you or your team would file a PR for this? Thanks!

@syzheng
Author

syzheng commented Oct 28, 2019 via email

@jharenza
Collaborator

jharenza commented Dec 12, 2019

Hi @syzheng! Wanted to update you that with V12 (#326) of the data release, we will provide stranded RNA-seq for 45 samples for which we also have polyA RNA-seq, so it would be interesting to determine whether telomerase predictions differ between these two sets of data. Stay tuned for the end of this week or early next week. Also looking forward to your PR!
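
One way to compare predictions on the matched samples is a rank correlation between the two sets of scores. The sketch below computes a Spearman correlation in plain Python over the samples present in both sets. The score dictionaries, sample IDs, and function names are hypothetical placeholders, not output from EXTEND.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sd_x * sd_y)


def matched_spearman(scores_a, scores_b):
    """Spearman correlation over the sample IDs shared by both score dicts."""
    shared = sorted(set(scores_a) & set(scores_b))
    if len(shared) < 3:
        raise ValueError("need at least three matched samples")
    x = [scores_a[s] for s in shared]
    y = [scores_b[s] for s in shared]
    return pearson(average_ranks(x), average_ranks(y)), shared


# Hypothetical scores keyed by a shared sample ID; toy values only.
polya_scores = {"p1": 0.1, "p2": 0.5, "p3": 0.9, "p4": 0.3}
stranded_scores = {"p1": 0.2, "p2": 0.6, "p3": 0.8, "p4": 0.4, "p5": 0.7}
rho, matched_ids = matched_spearman(polya_scores, stranded_scores)
```

A rank-based measure is used here because the two protocols may shift the scale of the scores even when the ordering of samples is preserved.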

@syzheng
Author

syzheng commented Dec 12, 2019 via email

@jharenza
Collaborator

jharenza commented Jan 4, 2020

Hi @syzheng ! Happy New Year! Do you think your team will be able to submit a PR on this analysis sometime soon? We are starting to wrap up/finalize analyses and determine manuscript figures. Thanks!

@syzheng
Author

syzheng commented Jan 4, 2020 via email

@jharenza
Collaborator

jharenza commented Jan 4, 2020

No worries, glad to hear!

@jaclyn-taroni
Member

Addressed through #494, #506, #511, and #516
