The MPNST and BeatAML datasets use different normalization metrics, so results from the two datasets cannot be compared directly.
Details
MPNST Data
Normalization Metric: TPM (Transcripts Per Million)
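For reference, these are the standard definitions. Here C_i is the read count for gene i, L_i is its length in kb, and N is the sample's total mapped reads in millions. The second pair of equations follows by substitution. The RPKM-from-TPM line assumes that N counts only the reads assigned to the genes in the table. The reverse, TPM-from-RPKM identity holds without that assumption.

```latex
\mathrm{RPKM}_i = \frac{C_i}{L_i\,N},
\qquad
\mathrm{TPM}_i = 10^6 \cdot \frac{C_i / L_i}{\sum_j C_j / L_j}

% Assuming N = \sum_j C_j / 10^6 (library size = reads assigned to the listed genes):
\mathrm{RPKM}_i = \frac{10^6\,\mathrm{TPM}_i}{\sum_j \mathrm{TPM}_j\, L_j},
\qquad
\mathrm{TPM}_i = 10^6 \cdot \frac{\mathrm{RPKM}_i}{\sum_j \mathrm{RPKM}_j}
```

Because TPM is already length-normalized, going from TPM to RPKM needs a per-sample sum over all genes, not just a division by the gene's own length.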
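To harmonize the data, we propose converting the MPNST TPM values to RPKM (Reads Per Kilobase per Million mapped reads), the metric used in the BeatAML dataset. The conversion can be done with the following Python snippet, which requires gene lengths in kilobases (kb). TPM is already length-normalized, so the conversion uses a per-sample scale factor rather than dividing each value by its gene length a second time. The per-sample column name 'sample_id' below is an assumption.
# Assumes long format: one row per (sample, gene); 'sample_id' is a hypothetical column name
tpm['gene_length_kb'] = tpm['gene_length'] / 1000  # convert gene length to kilobases
# RPKM_i = TPM_i * 1e6 / sum_j(TPM_j * length_kb_j), summing over genes in the same sample;
# assumes each sample's library size equals the reads assigned to the genes in this table
per_sample_sum = (tpm['transcriptomics'] * tpm['gene_length_kb']).groupby(tpm['sample_id']).transform('sum')
tpm['rpkm'] = tpm['transcriptomics'] * 10**6 / per_sample_sum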
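The following is a minimal, self-contained check of the per-sample conversion RPKM_i = TPM_i × 10^6 / Σ_j (TPM_j × length_kb_j), using made-up counts for two samples. It computes RPKM directly from the counts, derives TPM from the same counts, converts that TPM back to RPKM, and confirms that the two results agree. The function name tpm_to_rpkm and its default column names (sample_id, transcriptomics, gene_length_kb) are hypothetical. If the MPNST TPM values were computed from effective transcript lengths (as Salmon and RSEM report), then using GENCODE gene lengths makes the result an approximation rather than an exact inverse.

```python
import pandas as pd


def tpm_to_rpkm(df: pd.DataFrame, sample_col: str = "sample_id",
                value_col: str = "transcriptomics",
                length_kb_col: str = "gene_length_kb") -> pd.Series:
    """Convert long-format TPM values to RPKM, one sample at a time.

    Assumes each sample's library size equals the reads assigned to the genes
    present in `df`. Column names default to hypothetical placeholders.
    """
    weighted = df[value_col] * df[length_kb_col]
    # Sum of TPM * length over all genes of the same sample, broadcast back to each row.
    per_sample_sum = weighted.groupby(df[sample_col]).transform("sum")
    return df[value_col] * 1e6 / per_sample_sum


# Toy data: raw counts for three genes in two samples (made up for illustration).
toy = pd.DataFrame({
    "sample_id": ["s1", "s1", "s1", "s2", "s2", "s2"],
    "gene": ["A", "B", "C", "A", "B", "C"],
    "counts": [100, 300, 600, 50, 50, 900],
    "gene_length_kb": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
})

# Reference RPKM straight from counts: count / (length in kb * library size in millions).
lib_millions = toy.groupby("sample_id")["counts"].transform("sum") / 1e6
toy["rpkm_from_counts"] = toy["counts"] / (toy["gene_length_kb"] * lib_millions)

# TPM from the same counts, then back to RPKM with the per-sample formula.
rate = toy["counts"] / toy["gene_length_kb"]
toy["transcriptomics"] = rate / rate.groupby(toy["sample_id"]).transform("sum") * 1e6
toy["rpkm"] = tpm_to_rpkm(toy)

print(toy[["sample_id", "gene", "rpkm_from_counts", "rpkm"]])
```

Both RPKM columns should match row by row. A constant placeholder depth combined with a per-gene division would not reproduce them.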
Requirements
Gene Length Information: We need gene lengths from GENCODE version 29, the version used previously in the RNA-seq workflow. The annotation should be converted from GTF format to a CSV file of gene-symbol/gene-length pairs: gencode_v29_gene_lengths.csv
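The sketch below shows one way to build that CSV, under several assumptions. Gene length is taken as the number of bases covered by the union of a gene's exons; other definitions, such as the longest transcript or the full gene span, are also in use. Genes are keyed by the GTF's gene_id and labelled with gene_name. The function names and the output column names gene_symbol and gene_length are hypothetical. Gene symbols are not unique in GENCODE (for example, pseudoautosomal genes appear on both chrX and chrY), so duplicate symbols may need resolving before the CSV is merged with the TPM table.

```python
import csv
import gzip
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: gene_name "TP53"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def exon_union_lengths(gtf_lines):
    """Return {gene_id: (gene_name, length_bp)} from the union of each gene's exons."""
    exons = defaultdict(list)  # gene_id -> list of (start, end), 1-based inclusive
    names = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records contribute to length
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs["gene_id"]
        names[gene_id] = attrs.get("gene_name", gene_id)
        exons[gene_id].append((int(fields[3]), int(fields[4])))

    lengths = {}
    for gene_id, intervals in exons.items():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        total = 0
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlapping exons from different transcripts: extend block
                cur_end = max(cur_end, end)
            else:  # gap: close the current block and start a new one
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        lengths[gene_id] = (names[gene_id], total)
    return lengths


def write_gene_lengths(gtf_path, csv_path):
    """Write a gene_symbol,gene_length CSV from a (optionally gzipped) GTF file."""
    opener = gzip.open if gtf_path.endswith(".gz") else open
    with opener(gtf_path, "rt") as handle:
        lengths = exon_union_lengths(handle)
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["gene_symbol", "gene_length"])
        for gene_name, length in lengths.values():
            writer.writerow([gene_name, length])
```

For example, write_gene_lengths("gencode.v29.annotation.gtf.gz", "gencode_v29_gene_lengths.csv") would produce the file, assuming that input name is the GENCODE release 29 comprehensive annotation. Lengths are in base pairs, matching the 'gene_length' column that the conversion snippet divides by 1000.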
Code and resources used for the TPM-to-RPKM conversion
All transcriptomic, proteomic, and other omics data should be harmonized across datasets before we explore batch effects for the paper.