# <span> Module 2: DNA Methylation Analysis </span>

Watch [this video](https://youtu.be/_T46fuV7qYw) to learn more about this submodule.

## **Introduction**

### <span> What is Epigenetics? </span>
+ Changes in gene expression caused by mechanisms other than changes in the underlying DNA sequence.
+ Enables a cell/organism to respond to its dynamic external environment during development and throughout life.
+ Epigenetic changes to the genome can be inherited if these changes occur in cells giving rise to gametes.
    
### <span> Epigenetics Mechanisms </span>
+ DNA Methylation
+ Histone Modification
    
### <span> DNA Methylation </span>
DNA methylation is an epigenetic mechanism that can regulate gene expression. It involves the transfer of a methyl group to DNA and, in vertebrates, typically occurs at CpG dinucleotides. The locations of hyper- and hypomethylated DNA can carry information about different diseases, allowing researchers to study and predict them. Bisulfite sequencing is a technique that determines DNA methylation patterns at single-base resolution, so methylation can be studied at the level of individual bases.

DNA methylation adds a methyl group to cytosine, mostly at CpG sites, converting it to 5-methylcytosine. CpG sites are regions of DNA where a cytosine nucleotide is followed by a guanine nucleotide in the linear sequence of bases. "CpG" stands for cytosine and guanine separated by a phosphate (C-phosphate-G), which links the two nucleosides together in DNA. Methyl groups attached to DNA affect the accessibility of genes to transcription proteins, and methylation at promoter regions can impede expression of the target gene. Highly methylated DNA stays tightly wound around histones, preventing RNA polymerase binding and gene transcription. Low methylation loosens the coils and makes the DNA accessible to RNA polymerase, allowing gene transcription.
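To make the definition of a CpG site concrete, the following base R sketch locates every "CG" dinucleotide in a short sequence. The sequence `dna` is a made-up toy string used only for illustration; it is not part of the module's data.

```r
# Toy sequence (hypothetical, for illustration only)
dna <- "ACGTTCGCGATCCGGA"

# gregexpr() returns the start position of every "CG" match;
# fixed = TRUE treats the pattern as a literal string
cpg_pos <- as.integer(gregexpr("CG", dna, fixed = TRUE)[[1]])

# Number of CpG sites found in the toy sequence
n_cpg <- length(cpg_pos)

print(cpg_pos)
print(n_cpg)
```

Each reported position is the cytosine of a CpG, which is the base whose methylation state is measured in the rest of this module.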

<figure>
<img src="../../images/epigenetic-mech.jpeg" width="700" height="500">
<figcaption align = "center"> <b> Fig 1: Effect of epigenetic mechanisms on health. [1] </b> </figcaption>
    
</figure>
    
<div class="alert alert-block alert-warning">
    <i class="fa fa-pencil" aria-hidden="true"></i>
    <b>[1] Reference:</b> https://commonfund.nih.gov/epigenomics/figure </div>


### <span> DNA Methylation Flashcards </span>


In [None]:
IRdisplay::display_html('<iframe src = "../../docs/quiz_files/methylation.html" width=95% height=600></iframe>')

### <span> Analysis Architecture for Differential Methylation Analysis </span>

<figure>
<img src="../../images/methylation-flowchart.jpeg" width="700" height="500">
<figcaption align = "center"><b>Fig 2: Analysis Architecture for Differential Methylation Analysis. </b></figcaption>
</figure>  
  
This figure represents the analysis architecture followed in this module. The module has been designed according to the resources and the availability of data. The blue box represents the pipeline that can be implemented using the Nextflow nf-core/methylseq module. The purple box represents the data that can be directly extracted from GEO. Both the blue and purple boxes produce a methylation coverage file, and either route can be used to obtain the input for the downstream analysis. However, the Nextflow pipeline requires a lot of storage and processing power, so it is recommended to extract the data from GEO if it is available. If the required data is not available from GEO, the Nextflow pipeline can be used to generate it. The downstream analysis is carried out using the R kernel of a Jupyter notebook, and all the steps are discussed in detail in this module.

### <span>Raw Reads to Methylation Coverage File (Optional)<a name="RR"></a> </span>
<figure>
<img src="../../images/nfcore-methylseq.png" width="700" height="500">
<figcaption align = "center"><b>Fig 3: Flowchart for converting raw read to methylation coverage. </b></figcaption>
</figure>
    
This figure shows the analysis steps of the Nextflow nf-core/methylseq pipeline, which converts raw reads into a methylation coverage file. The pipeline can be run with one of two workflows: the Bismark workflow or the bwa-meth workflow. For each workflow, the figure lists the tools that can be used at each step of the analysis. Both are popular choices for implementing a methylseq pipeline.
    
A sample command that runs the nf-core/methylseq pipeline to generate quality control reports and extract the methylation calls and coverage file is provided below.

**This step is <u>optional</u>.** It is a preprocessing step that lets you experience generating your own methylation coverage file. To save on computational and storage resources, we have already provided the methylation coverage files you will use in the downstream analysis in step 3.
    
If you choose to generate your own methylation coverage file then refer to the instructions outlined in the RNAseq submodule, and refer to the nf-core [methylseq](https://nf-co.re/methylseq). Again, you will need to modify the config file to include your bucket and project ID. 

## **Learning Objectives**

* **Understanding Epigenetics and DNA Methylation:** Learners will gain a foundational understanding of epigenetics, focusing specifically on DNA methylation as a key epigenetic mechanism.  This includes learning about CpG sites, the impact of methylation on gene expression, and the relationship between methylation patterns and disease.

* **Familiarity with DNA Methylation Analysis Techniques:** The notebook introduces bisulfite sequencing as a method for determining DNA methylation patterns.  It explains the principle behind the technique and its single-base resolution capabilities.

* **Mastering the Methylation Analysis Workflow:** The core objective revolves around learning a complete workflow for differential methylation analysis.  This includes:

    * **Data Acquisition:** Understanding how to obtain methylation data, either through public databases (GEO) or by running a Nextflow pipeline (nf-core/methylseq – although this is presented as optional to reduce computational demands).
    * **Data Preprocessing:** Performing quality control, filtering, and normalization of methylation data.  This involves using tools within the methylKit R package to filter by coverage and handle potential biases.
    * **Differential Methylation Analysis:** Applying statistical methods (Fisher's Exact Test and Logistic Regression) to identify differentially methylated regions (DMRs) between experimental groups.  The notebook also mentions a Betabinomial test but doesn't fully implement it.
    * **Data Interpretation and Visualization:** Creating and interpreting visualizations such as histograms, scatter plots, dendrograms, PCA plots, and volcano plots.  This allows for assessing data quality, identifying outliers, and visualizing methylation differences.
    * **Annotation and Interpretation:** Annotating differentially methylated CpGs using the genomation package to understand their genomic context (e.g., proximity to genes, location within gene features) for biological interpretation.

* **Developing Practical Skills in R and Nextflow:** Learners will gain practical experience using R (specifically the methylKit and genomation packages) for data analysis.  The optional Nextflow component introduces a bioinformatics pipeline for processing raw sequencing data, enhancing the learning experience for those with more advanced bioinformatics skills and resources.

## **Prerequisites**

**Software and Packages:**

* **Nextflow:** For running the optional nf-core/methylseq pipeline.
* **R and several R packages:**  `ggplot2`, `ggforce`, `tidyverse`, `BiocManager`, `methylKit`, `GenomicRanges`, `genomation`, `matrixStats`, `IRdisplay`.  These are crucial for the differential methylation analysis and visualization steps.

**APIs**

* **Amazon S3:** The notebook extensively uses `aws s3` commands, so it needs access to Amazon S3 for downloading input files and potentially for storing intermediate and output files from the Nextflow pipeline.  The notebook suggests creating a bucket (`aws s3 mb s3://UNIQUE-BUCKET-NAME`).

* **AWS Batch Compute Environment and Job Queue:** You must have an AWS Batch compute environment and job queue configured. The CloudFormation template automates this. You can set up one manually following the instructions in the link provided in the notebook, but using the template is *recommended* for ease of setup.

## **Get started**

Run the following to create a kernel with all required packages installed. It will take about 10 minutes to install the packages and create the kernel.

In [None]:
system('chmod +x install_rrbs_packages.sh' , intern=TRUE)
system('bash install_rrbs_packages.sh >> logs.txt' , intern=TRUE) # This creates "R-RRBS" kernel

### **Important: Choose "R-RRBS" kernel for the rest of the notebook.**

### AWS Batch Setup

AWS Batch will create the needed permissions, roles, and resources to run Nextflow in a serverless manner. You can set up AWS Batch manually or deploy it **automatically** with a stack template. The Launch Stack button below takes you to the CloudFormation create stack webpage, with the template and its required resources already linked.

If you prefer to skip manual deployment and deploy automatically in the cloud, click the Launch Stack button below. For a walkthrough of the screens during automatic deployment please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/HowToLaunchAWSBatch.md). The deployment should take ~5 min and then the resources will be ready for use. 

[![Launch Stack](../../images/LaunchStack.jpg)](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?stackName=aws-batch-nigms&templateURL=https://nigms-sandbox.s3.us-east-1.amazonaws.com/cf-templates/AWSBatch_template.yaml )


Before beginning this tutorial, if you do not have the required roles, policies, permissions, or compute environment and would like to set them up **manually**, please click [here](https://github.com/NIGMS/NIGMS-Sandbox/blob/main/docs/AWS-Batch-Setup.md).

### Install Nextflow

In [None]:
#Install Nextflow, make it executable, and update it
system('curl https://get.nextflow.io | bash' , intern=TRUE)
system('chmod +x nextflow' , intern=TRUE)
system('./nextflow self-update' , intern=TRUE)

The output data generated by Nextflow is large. We can mitigate that by storing the temporary and output files in a bucket, setting `workDir` and `params.outdir` to an existing bucket:
 
`workDir = 's3://your_bucket_name/meth-tmp'`  
`params.outdir = 's3://your_bucket_name/meth-outputs'`

In [None]:
#This step can take up to 20 min depending on the machine-type and input files.
system('./nextflow run nf-core/methylseq -c rrbs-aws.config -profile test,aws', intern=TRUE)

<div class="alert alert-block alert-info">
    <i class="fa fa-lightbulb-o" aria-hidden="true"></i>
    <b>Tip: </b> If you don't immediately see any output on your screen, check the output directory you pointed to in your config file to ensure that Nextflow is running. You should see some output directories/files.
</div>

### Methylation Coverage to Differential Methylation

### <span> Load packages </span>

In [None]:
library("methylKit")
library("GenomicRanges")
library("genomation")

### <span> Reading Methylation Call Files and Design Experiment</span>
The sample files are collected in an R list object and then loaded into methylKit using the methRead function. methRead loads all of the methylation files into a methylRawList object. The file locations, sample IDs, assembly, treatment, and context should be supplied to this function.

In [None]:
# download data files from storage bucket
file.list=list("GSM5266860_CD_NP1.txt.gz",
                "GSM5266861_CD_NP2.txt.gz",
                "GSM5266862_CD_NP3.txt.gz", 
                "GSM5266863_CD_P1.txt.gz", 
                "GSM5266864_CD_P2.txt.gz", 
                "GSM5266865_CD_P3.txt.gz", 
                "GSM5266866_BN_NP1.txt.gz", 
                "GSM5266867_BN_NP2.txt.gz",
                "GSM5266868_BN_NP3.txt.gz", 
                "GSM5266869_BN_P1.txt.gz",
                "GSM5266870_BN_P2.txt.gz",
                "GSM5266871_BN_P3.txt.gz")

for (f in file.list) {
    system(paste0("aws s3 cp ","s3://nigms-sandbox/nosi-und/RRBS/",f , " . "), intern=TRUE)
}

In [None]:
myobj=methRead(file.list,
               sample.id=list("CD_NP1","CD_NP2","CD_NP3","CD_P1","CD_P2","CD_P3","BN_NP1","BN_NP2","BN_NP3","BN_P1","BN_P2","BN_P3"),
               assembly="Rnor_6.0",
               treatment=c(0,0,0,0,0,0,1,1,1,1,1,1),
               context="CpG"
)
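The sample IDs and treatment vector passed to methRead can also be derived from the file names rather than typed by hand, which makes mismatches easier to spot. The sketch below assumes the `GSM<number>_<group>_<replicate>.txt.gz` naming used above and the same coding as the treatment vector (CD = 0, BN = 1). The `design` table is a hypothetical helper for checking the design, not a methylKit object, and only four of the twelve files are shown.

```r
# A subset of the file names used in this module
files <- c("GSM5266860_CD_NP1.txt.gz", "GSM5266863_CD_P1.txt.gz",
           "GSM5266866_BN_NP1.txt.gz", "GSM5266869_BN_P1.txt.gz")

# Strip the GSM accession prefix and the file extension to get the sample ID
sample_id <- sub("^GSM[0-9]+_(.*)\\.txt\\.gz$", "\\1", files)

# Code the groups as in the methRead call: CD = 0, BN = 1
treatment <- ifelse(startsWith(sample_id, "BN_"), 1, 0)

# Hypothetical helper table for checking the experimental design
design <- data.frame(file = files, sample_id = sample_id, treatment = treatment)
print(design)

# Number of samples per treatment group
print(table(design$treatment))
```

Printing the table before calling methRead is a quick way to confirm that each file is paired with the intended sample ID and group.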

<div class="alert alert-block alert-success">
    <i class="fa fa-hand-paper-o" aria-hidden="true"></i>
    <b>Note: </b>  If you've used Nextflow to produce your methylation coverage files and would like to use them for the downstream analysis instead of the test data provided, enter your own files into the two previous code cells by copying them from the <b>bismark</b> subdirectory within your Nextflow outputs directory.
</div>

### <span> Data Filtration and Exploratory Analysis </span>
#### Descriptive Statistics
Once the data has been collected into a single object, we look at the basic statistics for each sample, such as the percentage methylation and the coverage. Percentage methylation histograms normally have peaks at both ends of the distribution, because within a single cell each cytosine is either methylated or unmethylated. Using this knowledge, we can determine whether many cells share a similar pattern at locations with high, low, and intermediate methylation. Typically, there should be a higher number of locations with high or low methylation and a lower number of locations with intermediate methylation. Bisulfite sequencing has a relatively high error rate. Because of this, sites with methylation between 0% and 10% are typically classified as "unmethylated" and sites between 90% and 100% as "fully methylated", though these thresholds are not fixed.

In [None]:
# Get a histogram of the methylation percentage per sample
# Here for sample 2
getMethylationStats(myobj[[2]],plot=TRUE,both.strands=FALSE)
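The percentage methylation shown in the histogram is the share of reads that support a methylated cytosine at each site. The sketch below computes it for a few made-up sites and applies the 10% and 90% thresholds described above. The counts are invented for illustration, and the variable names `numCs` and `numTs` are chosen here for the example rather than taken from the module's objects.

```r
# Hypothetical per-CpG read counts for one sample
numCs <- c(0, 1, 12, 30, 19, 50)   # reads supporting a methylated C
numTs <- c(25, 19, 12, 10, 1, 0)   # reads supporting an unmethylated C (read as T)

# Coverage is the total number of reads at each site
coverage <- numCs + numTs

# Percentage methylation per site
perc_meth <- 100 * numCs / coverage

# Classify sites with the (non-fixed) 10% and 90% thresholds
status <- cut(perc_meth,
              breaks = c(0, 10, 90, 100),
              labels = c("unmethylated", "intermediate", "fully methylated"),
              include.lowest = TRUE)

print(data.frame(coverage, perc_meth, status))
print(table(status))
```

In real data, most sites are expected to fall in the two outer classes, which is what produces the bimodal histogram.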

Experiments that are suffering from PCR duplication bias will have a secondary peak towards the right hand side of the coverage histogram.

In [None]:
#Histogram of methylation coverage
getCoverageStats(myobj[[2]],plot=TRUE,both.strands=FALSE)

### Filter Step
Filtering bases by coverage can often be useful. Specifically, if samples have overamplification or PCR bias, it can be useful to discard bases that have a very high read coverage. Bases with a very low read coverage should also be discarded because they tend to produce unreliable and unstable statistics in the downstream analyses. The code shown below filters a methylRawList and discards bases that have coverage below 10 reads, which was already done when the files were read in. Additionally, it discards bases whose coverage is above the 99.9th percentile in each sample.

In [None]:
filtered.myobj=filterByCoverage(myobj,lo.count=10,lo.perc=NULL,
                                      hi.count=NULL,hi.perc=99.9)
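The logic of this filter can be sketched in base R on a toy coverage vector. This illustrates the idea only and is not methylKit's implementation: whether the cutoffs are inclusive is an assumption made here, and the coverage values are invented.

```r
# Toy coverage values for 1000 sites: mostly moderate coverage,
# three low-coverage sites and one extreme outlier (e.g. PCR overamplification)
coverage <- c(rep(c(12, 20, 35, 50), length.out = 996), 3, 5, 8, 1500)

lo_count <- 10                                                # minimum read count
hi_cut <- quantile(coverage, probs = 0.999, names = FALSE)    # 99.9th percentile

# Keep sites that pass both cutoffs (inclusive bounds are an assumption)
keep <- coverage >= lo_count & coverage <= hi_cut

print(hi_cut)
print(sum(keep))            # number of sites retained
print(coverage[!keep])      # sites removed by the filter
```

Because the upper cutoff is a percentile rather than a fixed count, it adapts to the sequencing depth of each sample.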

### <span> Normalization </span>
Basic normalization of the coverage values between samples can be performed using a scaling factor. This scaling factor is derived from differences in the median coverage distributions.

In [None]:
myobj.filt.norm <- normalizeCoverage(filtered.myobj, method = "median")
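The idea of a median-based scaling factor can be sketched as follows. This is a simplified illustration with invented coverage values. Scaling every sample up to the largest median is an assumption of the sketch, and the details of methylKit's own computation may differ.

```r
# Toy coverage vectors for three samples with different sequencing depth
cov_list <- list(
  s1 = c(10, 20, 30, 40, 50),
  s2 = c(20, 40, 60, 80, 100),
  s3 = c(15, 30, 45, 60, 75)
)

# Median coverage per sample
med <- sapply(cov_list, median)

# Scaling factor that brings each sample's median up to the largest median
scale_factor <- max(med) / med

# Apply the factor to each sample and round to whole read counts
norm_cov <- Map(function(x, f) round(x * f), cov_list, scale_factor)

print(scale_factor)
print(sapply(norm_cov, median))   # medians are now equal across samples
```

After scaling, the samples have comparable coverage distributions, so later steps are less affected by differences in sequencing depth.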

### Merging samples into a single table
Before further analysis can be performed, the bases that are covered by reads in all samples need to be extracted. The unite() function merges all of the samples into one object that covers the base-pair locations present in every sample. Setting destrand=TRUE (the default is FALSE) merges reads on both strands of a CpG dinucleotide. This provides better coverage, but is only advised when looking at CpG methylation.

In [None]:
## we use :: notation to make sure the unite() function from the methylKit package is called
meth=methylKit::unite(myobj.filt.norm, destrand=FALSE)

In [None]:
# creates a methylBase object that keeps CpGs covered in at least 1 sample per group
meth.min=methylKit::unite(myobj,min.per.group=1L)
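Conceptually, the default merge keeps only the positions that are present in every sample. The base R sketch below shows that intersection on three invented samples. The data frames and the `chr:pos` key are hypothetical constructs for illustration and do not reflect methylKit's internal data structures.

```r
# Three toy samples, each covering a slightly different set of CpG positions
s1 <- data.frame(chr = c("1", "1", "2"), pos = c(100, 250, 75), coverage = c(12, 30, 18))
s2 <- data.frame(chr = c("1", "2", "2"), pos = c(100, 75, 90),  coverage = c(22, 15, 40))
s3 <- data.frame(chr = c("1", "1", "2"), pos = c(100, 250, 75), coverage = c(17, 11, 25))
sites <- list(s1 = s1, s2 = s2, s3 = s3)

# Build a "chr:pos" key for every site in every sample
keys <- lapply(sites, function(d) paste(d$chr, d$pos, sep = ":"))

# Positions covered in all samples
common <- Reduce(intersect, keys)

# Restrict each sample to the shared positions
united <- lapply(sites, function(d) d[paste(d$chr, d$pos, sep = ":") %in% common, ])

# How many samples cover each position (useful when relaxing the requirement)
n_samples <- table(unlist(keys))

print(common)
print(n_samples)
```

Positions covered in only some samples, such as `1:250` here, are dropped by the strict intersection. A relaxed rule like min.per.group retains more of them.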

### <span> Filtering CpGs </span>
High-throughput methylation data often contain many CpG sites with little to no variation among study subjects, and these sites are not very informative for downstream analyses. Filtering by the standard deviation of the methylation ratio values (equivalent to Beta values) is the simplest and most commonly used method. It has been shown to be consistent and robust across different real datasets and will suffice on most occasions.

In [None]:
# get percent methylation matrix
pm=percMethylation(meth) 

# calculate standard deviation of CpGs
sds=matrixStats::rowSds(pm)

# Visualize the distribution of the per-CpG standard deviation
# to determine a suitable cutoff
hist(sds, breaks = 100, col="cornflowerblue", xlab="Std. dev. per CpG")

# keep only CpG with standard deviations larger than 2%
meth <- meth[sds > 2]

# Check the remaining number of CpGs
nrow(meth)

C -> T mutations can be further removed because they do not represent true bisulfite-treatment-associated conversions. We can store mutation locations in a GRanges object and we can use the object to remove the overlapping CpGs with the mutations. To perform the overlap operation, we convert the methylKit object to a GRanges object and perform filtering using the %over% function. This results in a returned methylKit object.

In [None]:
library(GenomicRanges)
# example SNP
mut=GRanges(seqnames=c("chr21","chr21"),
            ranges=IRanges(start=c(9853296, 9853326),
                           end=c( 9853296,9853326)))

# select CpGs that do not overlap with mutations
sub.meth=meth[! as(meth,"GRanges") %over% mut,]
nrow(meth)
nrow(sub.meth)

### <span> Data Structures and Outlier Detection </span>
We can check the correlation between samples using getCorrelation. This function will plot scatter plots with Pearson correlation coefficients.

In [None]:
getCorrelation(meth,plot=TRUE)

### <span> Clustering Analysis </span>
The data structure can additionally be visualized in a dendrogram using hierarchical clustering of distance measures derived from each sample's percentage methylation.

In [None]:
clusterSamples(meth, dist="correlation", method="ward.D2", plot=TRUE)
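The same kind of clustering can be reproduced in base R on a percent methylation matrix. The sketch below uses simulated data with two groups of three samples. Using `1 - Pearson correlation` as the distance is an assumption about what a correlation-based distance means here, and the simulated matrix `pm_toy` is purely illustrative.

```r
set.seed(42)

# Simulate percent methylation for 200 CpGs: samples within a group share
# a profile, with a little noise added per sample and values kept in [0, 100]
noisy <- function(x) pmin(pmax(x + rnorm(length(x), sd = 3), 0), 100)
profile_a <- runif(200, 0, 100)
profile_b <- runif(200, 0, 100)
pm_toy <- cbind(A1 = noisy(profile_a), A2 = noisy(profile_a), A3 = noisy(profile_a),
                B1 = noisy(profile_b), B2 = noisy(profile_b), B3 = noisy(profile_b))

# Correlation-based distance between samples (columns)
d <- as.dist(1 - cor(pm_toy, method = "pearson"))

# Hierarchical clustering with Ward's method
hc <- hclust(d, method = "ward.D2")

# Cut the tree into two clusters and inspect the membership
groups <- cutree(hc, k = 2)
print(groups)

plot(hc, main = "Toy sample clustering")
```

Samples from the same group should end up on the same branch. A sample that clusters with the wrong group is a candidate outlier or sample swap.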

### <span> Principal Component Analysis </span>
We can also visualize the data by plotting the samples in a principal component space. Multidimensional data (i.e., we have as many dimensions in this data as there are CpG loci in meth) can be projected into the 2- or 3-dimensional space of a PCA plot while maintaining as much variation as possible. In the PCA space, samples that are more alike cluster together. With this plot we can identify the largest sources of variation in the data and check for sample swaps or outlier samples.

In [None]:
pc=PCASamples(meth,obj.return = TRUE, scale=FALSE, screeplot = FALSE, comp=c(1,2), transpose=TRUE)
summary(pc)
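For intuition, the projection can be sketched with base R's `prcomp()` on a simulated percent methylation matrix. The matrix `pm_toy` and its two groups are invented for illustration, and samples are placed in rows by transposing the matrix before the PCA.

```r
set.seed(7)

# Simulate percent methylation for 200 CpGs in two groups of three samples
noisy <- function(x) pmin(pmax(x + rnorm(length(x), sd = 3), 0), 100)
profile_a <- runif(200, 0, 100)
profile_b <- runif(200, 0, 100)
pm_toy <- cbind(A1 = noisy(profile_a), A2 = noisy(profile_a), A3 = noisy(profile_a),
                B1 = noisy(profile_b), B2 = noisy(profile_b), B3 = noisy(profile_b))

# PCA with samples as rows and CpGs as columns; centered but not scaled
pca <- prcomp(t(pm_toy), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained[1:2], 3))

# Sample coordinates on the first two components
print(pca$x[, 1:2])

plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2",
     pch = 19, col = rep(c("steelblue", "tomato"), each = 3))
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3)
```

In this toy example the first component separates the two groups. In real data, the variable that drives the first component is worth identifying before testing for differential methylation.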

### <span> Differential Methylation </span>
### Single CpG Sites
Once we have confirmed that the basic statistics and data structures of the samples are reasonable, we can proceed to differential methylation. Differential DNA methylation is usually calculated by comparing the proportion of methylated Cs in a test sample relative to a control. Fisher's Exact Test and similar methods can be applied when there are no replicates for the test and control cases. This allows simple comparisons between pairs of samples, such as test and control. When replicates are present, regression-based methods are typically used to model the methylation levels relative to the sample groups and the variation between the replicates. Regression methods have an additional advantage over Fisher's Exact Test: they allow for the inclusion of sample-specific covariates (categorical or continuous) and can adjust for confounding variables.

Three options are provided to obtain the differential methylation results: Fisher's Exact Test, the Betabinomial Distribution Based Test, and the Logistic Regression Based Test. Only Fisher's Exact Test and the Logistic Regression Based Test are explored below. The code for the Betabinomial Distribution Based Test is included but commented out.

### Fisher’s Exact Test

In [None]:
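# Pool the replicates within each treatment group; the pooled samples follow the
# order of the treatment groups in meth (0 = control first, then 1 = test)
pooled.meth=pool(meth,sample.ids=c("control","test"))
dm.pooledf=calculateDiffMeth(pooled.meth)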
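For a single CpG, Fisher's Exact Test compares the methylated and unmethylated read counts of the two pooled groups in a 2x2 table. The base R sketch below uses invented counts to show the test and the corresponding methylation difference. It illustrates the principle rather than the exact computation methylKit performs.

```r
# Hypothetical pooled read counts at one CpG
counts <- matrix(c(80, 20,
                   40, 60),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group = c("test", "control"),
                                 call  = c("methylated", "unmethylated")))

# Fisher's Exact Test on the 2x2 table
ft <- fisher.test(counts)

# Methylation difference in percentage points (test minus control)
prop_meth <- counts[, "methylated"] / rowSums(counts)
meth_diff <- 100 * (prop_meth[["test"]] - prop_meth[["control"]])

print(counts)
print(ft$p.value)
print(meth_diff)
```

A positive difference corresponds to hypermethylation in the test group and a negative difference to hypomethylation.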

In [None]:
# get differentially methylated bases/regions with specific cutoffs
all.diff=getMethylDiff(dm.pooledf,difference=25,qvalue=0.01,type="all")

# get hyper-methylated
hyper=getMethylDiff(dm.pooledf,difference=25,qvalue=0.01,type="hyper")

# get hypo-methylated
hypo=getMethylDiff(dm.pooledf,difference=25,qvalue=0.01,type="hypo")

#using [ ] notation
hyper2=dm.pooledf[dm.pooledf$qvalue < 0.01 & dm.pooledf$meth.diff > 25,]

In [None]:
head(dm.pooledf)
nrow(dm.pooledf)

In [None]:
#Check the results
head(hyper)
nrow(hyper)
head(hypo)
nrow(hypo)

### Optional: Betabinomial-Distribution-Based Tests
The beta-binomial model for calculating differential methylation can be accessed through the code below. It accounts for both sampling and epigenetic variability, which gives a better model of the variance. Like logistic regression, it treats the number of methylated reads as binomially distributed. In addition, it uses a beta distribution to let the methylation proportion vary across samples.

If you plan to use Betabinomial Distribution Based Test or compare the results of all three types of tests, the code can be uncommented. 

In [None]:
#dm.dss=calculateDiffMethDSS(meth)

### Logistic Regression Based Tests
The following code tests for differential methylation in our dataset, i.e., it compares methylation levels between the two groups. If the data has replicates, logistic regression should be used.

In [None]:
# Test for differential methylation... This might take a few minutes.
dm.lr=calculateDiffMeth(meth,overdispersion = "MN",test ="Chisq")
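The per-CpG model behind a logistic regression test can be sketched with base R's `glm()`. The counts below are invented, with three replicates per group. This plain binomial model does not include the overdispersion correction requested above with `overdispersion = "MN"`, so it illustrates the approach rather than reproducing methylKit's result.

```r
# Hypothetical read counts at one CpG for three control and three test replicates
cpg <- data.frame(
  treatment = c(0, 0, 0, 1, 1, 1),
  numCs     = c(8, 10, 9, 20, 22, 18),    # methylated reads
  numTs     = c(22, 20, 21, 10, 8, 12)    # unmethylated reads
)

# Model with a treatment effect versus a null model with an intercept only
fit  <- glm(cbind(numCs, numTs) ~ treatment, family = binomial, data = cpg)
null <- glm(cbind(numCs, numTs) ~ 1,         family = binomial, data = cpg)

# Chi-squared test comparing the two models
lrt <- anova(null, fit, test = "Chisq")
p_value <- lrt[2, "Pr(>Chi)"]

print(coef(fit))
print(p_value)
```

A positive treatment coefficient means higher methylation in the test group. Covariates can be added to the model formula in the same way as the treatment term.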

In [None]:
# Simple volcano plot to get an overview of differential methylation
plot(dm.lr$meth.diff, -log10(dm.lr$qvalue))
abline(v=0)

Next, we can visualize the number of hyper- and hypomethylation events per chromosome, as a percent of the sites with minimum coverage and minimal differential methylation. By default this is a 25% change in methylation and all samples with 10X coverage.

### <span> Explore Results </span>

In [None]:
# Overview of percentage hyper and hypo CpGs per chromosome.
diffMethPerChr(dm.lr)

After q-value calculation, we can select the differentially methylated regions/bases based on q-value and percent methylation difference cutoffs of Treatment versus Control. The following code selects the bases that have a q-value < 0.01 and a percent methylation difference larger than 25%. If you specify the type="hyper" or type="hypo" option, you will extract the hyper-methylated or hypo-methylated regions/bases.

If necessary, covariates (such as age, sex, smoking status, …) can be included in the regression analysis. The function will then try to separate the influence of the covariates from the treatment effect via the logistic regression model.

In [None]:
# get hyper methylated bases and order by qvalue
myDiff25p.hyper <- getMethylDiff(dm.lr,
                              difference=25,
                              qvalue=0.01,
                              type="hyper")
myDiff25p.hyper <- myDiff25p.hyper[order(myDiff25p.hyper$qvalue),]

# get hypo methylated bases and order by qvalue
myDiff25p.hypo <- getMethylDiff(dm.lr,
                             difference=25,
                             qvalue=0.01,
                             type="hypo")
myDiff25p.hypo <- myDiff25p.hypo[order(myDiff25p.hypo$qvalue),]

# get all differentially methylated bases (hyper and hypo) and order by qvalue
myDiff25p <- getMethylDiff(dm.lr,
                        difference=25,
                        qvalue=0.01)
myDiff25p <- myDiff25p[order(myDiff25p$qvalue),]
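The cutoff logic can be illustrated on a plain data frame. The table `res` below is invented, and Benjamini-Hochberg adjustment via `p.adjust()` is used as a stand-in for the multiple-testing correction, which is an assumption of this sketch and may not match the q-value method methylKit applies.

```r
# Hypothetical per-CpG test results
res <- data.frame(
  chr       = c("1", "1", "2", "3", "X"),
  start     = c(100, 2500, 75, 900, 4100),
  pvalue    = c(1e-6, 0.2, 1e-5, 1e-4, 0.03),
  meth.diff = c(40, 30, -35, 12, -50)
)

# Multiple-testing correction (Benjamini-Hochberg used here as a stand-in)
res$qvalue <- p.adjust(res$pvalue, method = "BH")

# Apply the same cutoffs as above: q-value < 0.01 and |difference| > 25
hyper <- res[res$qvalue < 0.01 & res$meth.diff >  25, ]
hypo  <- res[res$qvalue < 0.01 & res$meth.diff < -25, ]

# Order each set by q-value
hyper <- hyper[order(hyper$qvalue), ]
hypo  <- hypo[order(hypo$qvalue), ]

print(hyper)
print(hypo)
```

Sites with a large difference but a weak q-value, or a strong q-value but a small difference, are excluded from both sets.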


In [None]:
#Explore the results
head(dm.lr)
nrow(dm.lr)
head(myDiff25p.hyper)
nrow(myDiff25p.hyper)
head(myDiff25p.hypo)
nrow(myDiff25p.hypo)
head(myDiff25p)
nrow(myDiff25p)

### <span> CpG Annotation </span>
Annotation of the differentially methylated regions and bases using the genomation package can help with biological interpretation of the data. A common annotation task looks at where the CpGs of interest are relative to genes, gene parts, and regulatory regions. The code below shows an example of reading the gene annotation information from a BED file (Browser Extensible Data - file format containing genome coordinates and associated annotations), and the following annotation of the differentially methylated regions using genomation functions. This annotation file can be downloaded from the UCSC TableBrowser.

In [None]:
# download data files from storage bucket
system("aws s3 cp s3://nigms-sandbox/nosi-und/RRBS/rn6_ensGene.bed .", intern=TRUE)


In [None]:
# First load the annotation data; i.e the coordinates of promoters, TSS, intron and exons
gene.obj <- readTranscriptFeatures("rn6_ensGene.bed")

In [None]:
head(gene.obj)

Next, annotate the differentially methylated calls. Some data wrangling is required to make the data compatible with the annotateWithGeneParts function. Here the "chr" prefix is added to the chromosome names, and the data is then converted into a GRanges object.

In [None]:
anot.diff <- myDiff25p
anot.diff$chr <- sapply(anot.diff$chr, function(x) paste('chr', x, sep = ""))
head(anot.diff)
class(anot.diff)
anot.diff <- as(anot.diff,"GRanges")

The final data (anot.diff) is then used in the next step for annotation using annotateWithGeneParts function.

In [None]:
myDiff25p.all.anot <- annotateWithGeneParts(anot.diff, gene.obj)

In [None]:
# Summary of target set annotation
myDiff25p.all.anot

In [None]:
# View the distance to the nearest Transcription Start Site; the target.row column in the output indicates the row number in the initial target set
dist_tss <- getAssociationWithTSS(myDiff25p.all.anot)
head(dist_tss)

# See whether the differentially methylated CpGs are within promoters,introns or exons; the order is the same as the target set
head(getMembers(myDiff25p.all.anot))

# This can also be summarized for all differentially methylated CpGs
plotTargetAnnotation(myDiff25p.all.anot, main = "Differential Methylation Annotation")
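The distance-to-TSS idea can be sketched in base R. The coordinates, gene names, and column names below are invented for illustration. The sketch ignores strand, so the sign of the distance only indicates whether the CpG lies before or after the TSS on the forward coordinate axis, which may differ from genomation's convention.

```r
# Hypothetical differentially methylated CpGs and transcription start sites
cpgs <- data.frame(chr = c("chr1", "chr1", "chr2"), pos = c(1050, 5200, 300))
tss  <- data.frame(chr  = c("chr1", "chr1", "chr2"),
                   tss  = c(1000, 6000, 900),
                   gene = c("geneA", "geneB", "geneC"))

# For each CpG, find the nearest TSS on the same chromosome
nearest <- lapply(seq_len(nrow(cpgs)), function(i) {
  cand <- tss[tss$chr == cpgs$chr[i], ]   # TSSs on the same chromosome
  d <- cpgs$pos[i] - cand$tss             # signed distance, strand ignored
  j <- which.min(abs(d))                  # index of the closest TSS
  data.frame(target.row = i, dist = d[j], gene = cand$gene[j])
})
nearest <- do.call(rbind, nearest)

print(nearest)
```

CpGs that lie close to a TSS are candidates for promoter-associated methylation changes, which are the ones most often linked to altered gene expression.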

### <span> (Optional) Make a data frame with TSS distances, p-values, and q-values for future analysis. </span>

In [None]:
bs_results <- cbind(dist_tss, qvalue = anot.diff$qvalue, pvalue = anot.diff$pvalue)

### <span> (Optional) Write the Results to a Text File. </span>

In [None]:
#Write results to a text file. 
write.table(bs_results, "bs_results.txt", sep = "\t")

## **Conclusion**

This module provided a comprehensive overview of DNA methylation analysis, starting with an introduction to epigenetics and DNA methylation mechanisms. We explored the practical aspects of differential methylation analysis, including data acquisition from public sources or through the optional nf-core/methylseq pipeline, quality control, normalization, and statistical testing using the methylKit R package. The module emphasized data interpretation through various visualization techniques and biological annotation using the genomation package, enabling the identification of differentially methylated regions and their genomic context. By working through this notebook, learners have gained valuable skills in R and Nextflow for analyzing DNA methylation data and can now apply these techniques to explore the role of epigenetics in various biological processes and diseases.

## **Clean up**

Remember to stop your notebook instance when you are finished.

<hr style="border:2px solid Orange">

### <span> References and useful links </span>

- #### https://www.bioconductor.org/packages/release/bioc/vignettes/methylKit/inst/doc/methylKit.html#4_Annotating_differentially_methylated_bases_or_regions
- #### https://nbis-workshop-epigenomics.readthedocs.io/en/stable/content/tutorials/methylationSeq/Seq_Tutorial.html
- #### https://compgenomr.github.io/book/bsseq.html