Differential expression analysis of circular RNAs (circRNAs) using Generalized Linear Mixed Models.
Here we present a novel approach to detect differentially expressed circRNAs (DECs). Our approach combines the expression of circRNAs from different detection tools (CIRI, DCC, CircExplorer2 and findcirc). We evaluate it in terms of:
- the ability of differential expression detection methods to control the Type I error;
- the consistency of differential expression detection methods;
- the power of differential expression detection methods.
The data used in this analysis were retrieved from the GEO repository and consist of ribo-depleted RNA-seq samples from two or more conditions (GSE53697, GSE86356, GSE52463).
To build the combined matrix (CMAT) on which the GLMM is fitted, we used the function get_combined_matrix from the egaffo/CREART R package.
The function takes as input x one of the following: the list of the methods' outputs to be combined, the path to the CirComPara2 results, or the full path to the circrnas.gtf file from the CirComPara output. Use the select_methods option to specify the names of the detection tools to keep when composing the CMAT.
The directory ./robustness/ contains:
- getPheno.R, which creates a .txt file containing B combinations of samples for the creation of synthetic datasets;
- glm_glmm_paired.R and DEscripts.R, which fit the Negative Binomial and GLMM models on each synthetic dataset and save the results as .RData;
- SensitivityPrecision.Rnw, which computes specificity, sensitivity and other measures from the p-values generated by each method in the simulations;
- plot_eval.R, which puts the information from all datasets together and then plots the results.
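As an illustration of the first step, the sketch below draws B random combinations of samples, balanced by condition, and writes them to a .txt file. It is not the code in getPheno.R: the function name draw_sample_combinations, its arguments, the toy sample names and the one-combination-per-line layout are all assumptions made for this example.

```r
# Hypothetical helper (not from getPheno.R): draw B random subsets of samples,
# taking n_per_group samples from each condition.
draw_sample_combinations <- function(sample_ids, condition, n_per_group, B, seed = 1) {
  stopifnot(length(sample_ids) == length(condition))
  set.seed(seed)
  groups <- split(sample_ids, condition)
  stopifnot(all(lengths(groups) >= n_per_group))
  combos <- replicate(B, {
    unlist(lapply(groups, function(ids) sample(ids, n_per_group)), use.names = FALSE)
  })
  # One row per synthetic dataset, one column per drawn sample
  t(combos)
}

# Toy example: 12 samples, 6 per condition (names are made up)
samples <- paste0("S", 1:12)
condition <- rep(c("control", "case"), each = 6)
combos <- draw_sample_combinations(samples, condition, n_per_group = 3, B = 5)

# Write one combination per line to a plain-text file
out_file <- tempfile(fileext = ".txt")
write.table(combos, out_file, quote = FALSE, row.names = FALSE, col.names = FALSE)
```

Each row of combos can then be used to subset the full count matrix into one synthetic dataset.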
The directory ./parametric_sim/ contains:
- datasets_and_models.R and sampling_func_glmm.R, which estimate the Negative Binomial parametric distributions used as templates for the simulations in the datasets;
- simulator_New.R, which creates the simulation framework for evaluating both GLM and GLMM models;
- eval_function_call.R, which tests the differential expression detection methods;
- evalPVals.R, which computes specificity, sensitivity and other measures from the p-values generated by each method in the simulations.
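To make the evaluation step concrete, here is a minimal sketch of how sensitivity, specificity and related measures can be derived from a vector of p-values when the truly differentially expressed features are known, as they are in a simulation. The function eval_pvalues, the 0.05 threshold and the toy values are assumptions for this example and are not taken from evalPVals.R.

```r
# Hypothetical helper (not from evalPVals.R): confusion-matrix measures
# for p-values against a known simulation truth.
eval_pvalues <- function(pvals, is_de, alpha = 0.05) {
  stopifnot(length(pvals) == length(is_de), is.logical(is_de))
  # A feature is "called" when its p-value is below alpha; NA counts as not called
  called <- !is.na(pvals) & pvals < alpha
  tp <- sum(called & is_de)
  fp <- sum(called & !is_de)
  fn <- sum(!called & is_de)
  tn <- sum(!called & !is_de)
  # Ratios are NaN when their denominator is zero (e.g. no feature called)
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    precision   = tp / (tp + fp),
    fdr         = fp / (tp + fp))
}

# Toy example: 8 features, the first 3 simulated as differentially expressed
pvals <- c(0.001, 0.20, 0.03, 0.04, 0.60, 0.80, 0.01, 0.50)
is_de <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
res <- eval_pvalues(pvals, is_de)
print(round(res, 3))
```

The same function can be applied to adjusted p-values if the comparison is meant to reflect calls made after multiple-testing correction.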
The directory ./consinstency/ contains:
- consistency_replicability.Rmd, which loads the DEC results from the robustness evaluation and then tests the differential expression detection methods in terms of Concordance At the Top.
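For readers unfamiliar with the measure, Concordance At the Top compares two ranked lists of features: for each rank i, it is the fraction of features shared by the top i entries of both lists. The sketch below is a plain base R version written for illustration only. The function name concordance_at_top and the toy circRNA identifiers are assumptions, and consistency_replicability.Rmd may compute the measure differently.

```r
# Hypothetical helper: Concordance At the Top for two ranked feature lists.
# For each rank i, returns |top_i(a) intersect top_i(b)| / i.
concordance_at_top <- function(ranked_a, ranked_b,
                               max_rank = min(length(ranked_a), length(ranked_b))) {
  stopifnot(max_rank >= 1, max_rank <= min(length(ranked_a), length(ranked_b)))
  vapply(seq_len(max_rank), function(i) {
    length(intersect(ranked_a[seq_len(i)], ranked_b[seq_len(i)])) / i
  }, numeric(1))
}

# Toy example: the same 5 circRNAs ranked by p-value in two data subsets
list_a <- c("circ1", "circ2", "circ3", "circ4", "circ5")
list_b <- c("circ2", "circ1", "circ5", "circ3", "circ4")
cat_curve <- concordance_at_top(list_a, list_b)
print(round(cat_curve, 3))
```

A curve that stays close to 1 across ranks indicates that a method ranks the same features at the top in both subsets.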
The directory ./type_I_error_control/ contains the TIEC.Rmd file, which loads the DEC results estimated using glm_glmm_paired.R and DEscripts.R and evaluates the ability of differential expression detection methods to control the Type I error. The evaluation uses mock datasets without differentially abundant features, generated with the getSampleShuffle.R script.
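One common way to build mock datasets without true differences is to take samples from a single condition and randomly assign them to two artificial groups. Any feature called significant is then a false positive. The sketch below illustrates that idea under this assumption. It is not the code in getSampleShuffle.R or TIEC.Rmd, and the names make_mock_labels and observed_type1_rate are hypothetical.

```r
# Hypothetical helper: randomly split samples from one condition into two mock groups.
make_mock_labels <- function(sample_ids, n_mock, seed = 1) {
  stopifnot(length(sample_ids) >= 2)
  set.seed(seed)
  half <- floor(length(sample_ids) / 2)
  lapply(seq_len(n_mock), function(i) {
    shuffled <- sample(sample_ids)
    data.frame(sample = shuffled,
               group = rep(c("mock1", "mock2"),
                           times = c(half, length(sample_ids) - half)),
               stringsAsFactors = FALSE)
  })
}

# Hypothetical helper: fraction of p-values below alpha.
# On a mock dataset this estimates the observed Type I error rate.
observed_type1_rate <- function(pvals, alpha = 0.05) {
  mean(pvals < alpha, na.rm = TRUE)
}

# Toy example: 8 control samples, 3 mock group assignments
control_samples <- paste0("ctrl_", 1:8)
mocks <- make_mock_labels(control_samples, n_mock = 3)

# Uniform p-values stand in for the output of a well-calibrated method
set.seed(42)
null_pvals <- runif(1000)
rate <- observed_type1_rate(null_pvals)
```

For a method that controls the Type I error, the observed rate on mock datasets should be close to the nominal alpha.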
Since producing all the data took a long time, the ./data/ directory contains several outputs from all the analyses, which should make it easier to replicate the results.
To replicate the analyses, we strongly suggest cloning or downloading the entire GitHub repository. Some of the functions used in this paper are adapted from the work "Assessment of statistical methods from single cell, bulk RNA-seq and metagenomics applied to microbiome data"; the original code is available at https://github.com/mcalgaro93/sc2meta. The analyses were run on several versions of R during development; R 4.1.2 was the final version on which the methods worked.