This repository tracks the RRBS analysis pipeline for the Fedulov lab. All analysis steps are recorded in analysis.md. The rrbs_data folder contains a scripts subfolder, which holds all of the scripts used to run the various parts of the analysis. The portions of analysis.md that use bash were run on Oscar, while the R sections were run on a local machine.
Initial QC of raw reads was run using FastQC (Andrews S. (2010). FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc). Reads were trimmed with Trim Galore using the following settings: --quality 20 --adapter AGATCGGAAGAGC --stringency 1 --length 50 (Andrews S. (2012). Available online at: http://www.bioinformatics.babraham.ac.uk/projects/trim_galore). Bismark was used to prepare the reference genome (Ensembl Release 89 of GRCm38) and to align trimmed reads using the --bowtie2 parameter. The Bismark methylation extractor was used to extract methylation information with the following parameters: --bedGraph --comprehensive --ignore 3 -s --merge_non_CpG.
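The preprocessing steps above can be sketched as a dry run that only prints the commands. This is a minimal illustration, not the scripts in the scripts subfolder: the file names and the genome folder are hypothetical placeholders, the trimmed and aligned file names assume the tools' default output naming for single-end input, and passing --bowtie2 to the genome preparation step is an assumption (the text only states it for alignment).

```shell
#!/usr/bin/env bash
# Dry-run sketch of the preprocessing steps: commands are printed, not executed.
# All paths and file names are hypothetical placeholders.
GENOME_DIR="genome/GRCm38_r89"                # hypothetical folder with the reference FASTA
READS="sample1.fastq.gz"                      # hypothetical raw single-end read file
TRIMMED="sample1_trimmed.fq.gz"               # assumed Trim Galore default output name
ALIGNED="sample1_trimmed_bismark_bt2.bam"     # assumed Bismark/Bowtie 2 default output name

# Print a command line; change the body to "$@" to actually run it.
run() { printf '%s\n' "$*"; }

pipeline_commands() {
  # Initial QC of the raw reads.
  run fastqc "$READS"
  # Quality and adapter trimming with the settings given in the text.
  run trim_galore --quality 20 --adapter AGATCGGAAGAGC --stringency 1 --length 50 "$READS"
  # Bisulfite-convert and index the reference genome (assumed to use --bowtie2).
  run bismark_genome_preparation --bowtie2 "$GENOME_DIR"
  # Align the trimmed reads.
  run bismark --bowtie2 "$GENOME_DIR" "$TRIMMED"
  # Extract methylation calls with the parameters given in the text.
  run bismark_methylation_extractor --bedGraph --comprehensive --ignore 3 -s --merge_non_CpG "$ALIGNED"
}

pipeline_commands
```

Printing the commands first makes it easy to check the flags against the settings listed above before submitting anything on a cluster.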
The edgeR package was used to find differentially methylated loci (Chen et al. 2018). The glmFit function was used to fit a negative binomial generalized log-linear model. The design matrix was constructed using modelMatrixMeth with a factorial experimental design (~hormone * treatment). The glmLRT function was used to test for differentially methylated loci in the comparisons of interest, which were specified by constructing contrast vectors. Individual CpG sites were considered differentially methylated if the nominal p-value was < 0.02 and the CpG was within 5 kilobases upstream or downstream of a transcription start site. Interaction term contrasts were only explored for loci where including the interaction terms improved the fit to the data (e.g., p-values were < 0.02 for the glmLRT with both interaction coefficients).
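The two site-level criteria above (nominal p-value < 0.02 and within 5 kb of a transcription start site) can be illustrated as a simple filter. This is a sketch only: it assumes the glmLRT results have already been exported to a tab-separated file with a header row and three hypothetical columns (site, PValue, and signed distance to the nearest TSS in bp), and it treats a distance of exactly 5000 bp as within range. The function name filter_dml is also hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: keep CpG sites with nominal p < 0.02 that lie within 5 kb of a TSS.
# Assumed input layout (hypothetical): site <TAB> PValue <TAB> distance_to_TSS
filter_dml() {
  awk -F'\t' -v p_max=0.02 -v max_dist=5000 '
    NR == 1 { print; next }            # pass the header row through unchanged
    {
      p = $2 + 0                       # force numeric comparison (handles 1e-05 etc.)
      d = $3 + 0
      if (d < 0) d = -d                # upstream and downstream are treated alike
      if (p < p_max && d <= max_dist) print
    }
  ' "$1"
}
```

For example, filter_dml results.tsv > significant_sites.tsv would write the retained sites (file names hypothetical). The interaction-term screening described above is a separate step and is not covered by this filter.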