Ensure all datasets have the same transcriptomics normalization metric #154

Closed
jjacobson95 opened this issue Apr 24, 2024 · 3 comments

@jjacobson95
Collaborator

All transcriptomic, proteomic, etc., data should be harmonized across datasets. This should be done before we explore batch effects for the paper.

@jjacobson95
Collaborator Author

@moonchangin I heard you have some details on this; feel free to share them in this thread.

@moonchangin
Collaborator

Issue Description

The MPNST and BeatAML datasets use different transcriptomics normalization metrics (TPM for MPNST, RPKM for BeatAML), so their values cannot be compared directly.

Details

  • MPNST Data

  • BeatAML Data

Proposed Solution

To harmonize the data, we propose converting the MPNST TPM values to RPKM, the metric used in the BeatAML dataset. The conversion can be done with the following Python snippet, which requires gene lengths in kilobases (kb):

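tpm['gene_length_kb'] = tpm['gene_length'] / 1000  # Convert gene length (bp) to kilobases
# TPM is already length-normalized, so RPKM is a per-sample rescaling of TPM:
#   RPKM_i = TPM_i * 1e6 / sum_j(TPM_j * gene_length_kb_j)
# Assumes `tpm` holds a single sample; with several samples, compute `scale` per sample.
scale = (tpm['transcriptomics'] * tpm['gene_length_kb']).sum()
tpm['rpkm'] = tpm['transcriptomics'] * 10**6 / scale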
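TPM is already normalized for gene length. Within one sample, moving from TPM to RPKM is therefore a rescaling rather than a second division by length:

RPKM_i = TPM_i × 10^6 / Σ_j (TPM_j × L_j)

Here L_j is the gene length in kb, and the sum runs over all genes in that sample. The pure-Python sketch below checks this on made-up counts by computing RPKM in two ways: directly from counts, and from TPM. The function names and toy numbers are illustrative only.

The sketch assumes the lengths are the same ones used to compute the TPM values. If a different length annotation is used, the recovered RPKM is only approximate.

```python
import math
from typing import Dict


def counts_to_rpkm(counts: Dict[str, float], length_kb: Dict[str, float]) -> Dict[str, float]:
    """RPKM for one sample: reads / gene length (kb) / total mapped reads (millions)."""
    total_millions = sum(counts.values()) / 1e6
    return {g: counts[g] / length_kb[g] / total_millions for g in counts}


def counts_to_tpm(counts: Dict[str, float], length_kb: Dict[str, float]) -> Dict[str, float]:
    """TPM for one sample: length-normalize first, then scale the rates to sum to 1e6."""
    rate = {g: counts[g] / length_kb[g] for g in counts}
    per_million = sum(rate.values()) / 1e6
    return {g: r / per_million for g, r in rate.items()}


def tpm_to_rpkm(tpm: Dict[str, float], length_kb: Dict[str, float]) -> Dict[str, float]:
    """Recover RPKM from TPM for one sample: RPKM_i = TPM_i * 1e6 / sum_j(TPM_j * L_j)."""
    denom = sum(tpm[g] * length_kb[g] for g in tpm)
    return {g: tpm[g] * 1e6 / denom for g in tpm}


# Toy sample with made-up counts and lengths (kb).
counts = {"A": 100, "B": 300, "C": 600}
length_kb = {"A": 1.0, "B": 2.0, "C": 3.0}

rpkm_from_tpm = tpm_to_rpkm(counts_to_tpm(counts, length_kb), length_kb)
rpkm_direct = counts_to_rpkm(counts, length_kb)
print(all(math.isclose(rpkm_from_tpm[g], rpkm_direct[g]) for g in counts))  # True
```

Because the denominator Σ_j (TPM_j × L_j) depends on every gene in the sample, the conversion must be applied sample by sample. It also needs the full gene set that the TPM values were computed over.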

Requirements

  • Gene Length Information: We need the gene lengths from GENCODE version 29, the version previously used in the RNA-seq workflow. The lengths should be extracted from the GTF file and saved as a CSV of gene-symbol/gene-length pairs.
    gencode_v29_gene_lengths.csv
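One way to produce a symbol-to-length CSV from the GENCODE v29 GTF is sketched below. The sketch makes several assumptions:

- Gene length means the union of a gene's exons, with overlapping exons counted once. This is a common choice for RNA-seq normalization, but the attached CSV may use a different definition, such as the gene span or a transcript length.
- The function names and the output column names `gene_symbol` and `gene_length` are hypothetical.
- A symbol attached to more than one gene ID (for example, pseudoautosomal copies on chrX and chrY) keeps its longest length.

```python
import csv
import gzip
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

GENE_ID = re.compile(r'gene_id "([^"]+)"')
GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def merged_length(intervals: List[Tuple[int, int]]) -> int:
    """Bases covered by closed [start, end] intervals, counting overlaps once."""
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend the run
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def exonic_gene_lengths(gtf_lines: Iterable[str]) -> Dict[str, int]:
    """Map gene symbol -> union-of-exons length (bp) from GTF 'exon' records."""
    exons: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    symbol: Dict[str, str] = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        gid = GENE_ID.search(fields[8])
        if gid is None:
            continue
        name = GENE_NAME.search(fields[8])
        symbol[gid.group(1)] = name.group(1) if name else gid.group(1)
        exons[gid.group(1)].append((int(fields[3]), int(fields[4])))  # GTF is 1-based, inclusive
    lengths: Dict[str, int] = {}
    for gene_id, intervals in exons.items():
        name = symbol[gene_id]
        # Assumption: a symbol shared by several gene IDs keeps the longest length.
        lengths[name] = max(merged_length(intervals), lengths.get(name, 0))
    return lengths


def write_gene_length_csv(gtf_path: str, out_path: str) -> None:
    """Write a two-column CSV (hypothetical header: gene_symbol, gene_length)."""
    opener = gzip.open if gtf_path.endswith(".gz") else open
    with opener(gtf_path, "rt") as fh:
        lengths = exonic_gene_lengths(fh)
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["gene_symbol", "gene_length"])
        for name, length in sorted(lengths.items()):
            writer.writerow([name, length])
```

Assuming the standard GENCODE file name, a call would look like `write_gene_length_csv("gencode.v29.annotation.gtf.gz", "gencode_v29_gene_lengths.csv")`. Divide the resulting lengths by 1000 before using them as kb.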

Code and resources used for the TPM-to-RPKM conversion:

code_for_tpm_to_rpkm.zip

@sgosline
Member

Everything is in TPM except BeatAML. @jjacobson95, can you please dig up the TPM values for BeatAML? If they aren't available, please add a conversion step.
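If BeatAML TPM values cannot be found, a conversion step in the RPKM-to-TPM direction does not need gene lengths. Each sample is rescaled so its values sum to 10^6:

TPM_i = RPKM_i × 10^6 / Σ_j RPKM_j

Below is a minimal sketch over a long-format table. It assumes each row is a dict with a sample ID and a `transcriptomics` value. The sample-ID column name `improve_sample_id` and the function name are hypothetical.

One caveat: if genes were filtered out before this step, the sum covers only the genes kept. The result is then TPM relative to that subset rather than to the full transcriptome.

```python
from collections import defaultdict
from typing import Dict, List


def rpkm_to_tpm_by_sample(
    rows: List[dict],
    sample_key: str = "improve_sample_id",  # hypothetical sample-ID column name
    value_key: str = "transcriptomics",
) -> List[dict]:
    """Return copies of `rows` with `value_key` rescaled from RPKM to TPM within each sample."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row[sample_key]] += row[value_key]  # per-sample RPKM sum
    converted = []
    for row in rows:
        total = totals[row[sample_key]]
        new_row = dict(row)  # leave the input rows unchanged
        # TPM_i = RPKM_i * 1e6 / sum_j RPKM_j; an all-zero sample stays at zero.
        new_row[value_key] = row[value_key] * 1e6 / total if total > 0 else 0.0
        converted.append(new_row)
    return converted


# Made-up RPKM values for two samples.
rows = [
    {"improve_sample_id": "S1", "transcriptomics": 100000.0},
    {"improve_sample_id": "S1", "transcriptomics": 150000.0},
    {"improve_sample_id": "S1", "transcriptomics": 200000.0},
    {"improve_sample_id": "S2", "transcriptomics": 5.0},
    {"improve_sample_id": "S2", "transcriptomics": 15.0},
]
for r in rpkm_to_tpm_by_sample(rows):
    print(r["improve_sample_id"], r["transcriptomics"])
```

With pandas, the same per-sample sum can be taken with a `groupby` on the sample column followed by `transform("sum")`.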

Projects
Status: Done
Development

No branches or pull requests

3 participants