See associated publication: https://www.nature.com/articles/s42003-021-02792-w

Pipeline to produce mitochondrial-nuclear correlation matrices from TPM matrices:
Function_file | Function_name | Description | Input | Output | Dataset specific |
---|---|---|---|---|---|
filterNullGenesAndSamps.py | filterNullGenesAndSamps | Removes samples with TPM=0 in all genes and retains only genes where TPM>0 in all samples. | One or more .csv files in the format: rows=samples, columns=genes | Filtered .csv file | No |
log10MedNormalise.R | log10MedNormalise | Log10-transforms and median-normalises expression values. The log10 transformation makes each sample's distribution approximately normal; median normalisation then aligns the sample medians so that samples are comparable. | One or more .csv files in the format: columns=samples, rows=genes | Log10 median-normalised .csv file | No |
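maskGeneOutliers.py | maskGeneOutliers | Masks per-gene outlier values by replacing them with NaN. A value is treated as an outlier if it lies below LQ - 3×IQR or above UQ + 3×IQR (LQ = lower quartile, UQ = upper quartile, IQR = interquartile range). | One or more .csv files in the format: rows=samples, columns=genes | Masked outliers .csv file | No |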
gtex_regress_covariates.py | gtex_regress_covariates | Runs a linear model to regress out (hardcoded) covariates from GTEx CNS data | 1. One or more .csv files in the format: rows=samples, columns=genes; 2. Metadata from the GTEx portal; 3. Phenotype data from the GTEx portal | Covariate-corrected residuals .csv file | Yes: GTEx V6p CNS |
rosmap_regress_covariates.py | rosmap_regress_covariates | Runs a linear model to regress out (hardcoded) covariates from ROS/MAP case-control frontal cortex data | 1. One or more .csv files in the format: rows=samples, columns=genes; 2. Metadata from the Synapse portal, including the ROS/MAP ID table, clinical metadata and RNAseq metadata (preprocessed using ROSMAP_preprocess_and_covs.ipynb) | Covariate-corrected residuals .csv file | Yes: ROS/MAP case-control frontal cortex |
genCorrs.py | genCorrs | Generates all pairwise gene-gene Spearman or Pearson correlations | One or more .csv files in the format: rows=samples, columns=genes | 1. Correlation matrix .csv file; 2. P-value matrix .csv file | No |
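
The non-dataset-specific steps in the table can be illustrated with a minimal pandas sketch. This is not the code in the listed files: the function names, the toy gene and sample names, and the choice to align every sample to the mean of the sample medians are all assumptions made for illustration. The outlier rule follows the reading "below LQ - 3×IQR or above UQ + 3×IQR". Every function here takes rows=samples, columns=genes (note that log10MedNormalise.R expects the transposed layout), and the p-value matrix produced by genCorrs is omitted.

```python
import numpy as np
import pandas as pd


def filter_null_genes_and_samples(tpm: pd.DataFrame) -> pd.DataFrame:
    """Drop samples that are 0 in every gene, then keep genes with TPM > 0 in all samples."""
    tpm = tpm.loc[(tpm != 0).any(axis=1)]
    return tpm.loc[:, (tpm > 0).all(axis=0)]


def log10_median_normalise(tpm: pd.DataFrame) -> pd.DataFrame:
    """Log10-transform, then shift each sample so that all sample medians match.

    Assumption: samples are aligned to the mean of the sample medians.
    """
    logged = np.log10(tpm)
    sample_medians = logged.median(axis=1)
    return logged.sub(sample_medians - sample_medians.mean(), axis=0)


def mask_gene_outliers(expr: pd.DataFrame, k: float = 3.0) -> pd.DataFrame:
    """Replace values outside [LQ - k*IQR, UQ + k*IQR] of each gene with NaN."""
    lq = expr.quantile(0.25)
    uq = expr.quantile(0.75)
    iqr = uq - lq
    in_range = expr.ge(lq - k * iqr, axis=1) & expr.le(uq + k * iqr, axis=1)
    return expr.where(in_range)


# Toy TPM matrix (hypothetical values): rows=samples, columns=genes.
tpm = pd.DataFrame(
    {
        "MT-ND1": [10.0, 20.0, 40.0, 80.0, 0.0],
        "GENE_A": [1.0, 2.0, 4.0, 8.0, 0.0],
        "GENE_B": [5.0, 0.0, 3.0, 2.0, 0.0],
        "GENE_C": [8.0, 4.0, 2.0, 1.0, 0.0],
    },
    index=["s1", "s2", "s3", "s4", "s5"],
)

filtered = filter_null_genes_and_samples(tpm)
normalised = log10_median_normalise(filtered)
masked = mask_gene_outliers(normalised)

# Pairwise gene-gene correlations; pandas ignores NaN (masked) values pairwise.
corr = masked.corr(method="spearman")
print(corr)
```

In the pipeline itself the covariate regression step (GTEx or ROS/MAP) sits between outlier masking and correlation; it is left out here because it depends on dataset-specific metadata.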