# Submodule 2: Preprocessing and Quality Control

<img src="images/LessonPlan2.jpg" alt="Drawing" width=1000 />

## Overview & Purpose

### Overview Video 1 - Methodology and Example Data

<video controls width=500 src="videos/ChromOccupancyVid1_wsubtitles.mp4">animation </video>

### Overview Video 2 - How to Use these Lessons

<video controls width=500 src="videos/ChromOccupancyVid2_wsubtitles.mp4">animation </video>

## Filtering, Signal Visualization, and Peak Identification
This submodule introduces concepts for post-mapping filtering, signal visualization, identifying binding sites by calling peaks, detection of sequence motifs, and differential peak analysis.

To demonstrate the process, we will build on the analysis performed in submodule 1, where we preprocessed and mapped ChIP-seq, CUT&RUN, or CUT&Tag data comparing a BAF inhibitor or mutant to a control sample.

As a reminder, this module covers the processing of data from three distinct but similar methods, using downsampled data to improve runtime. The original data were published in:

Weber CM, et al. mSWI/SNF promotes Polycomb repression both directly and through genome-wide redistribution. Nat Struct Mol Biol. 2021  PMID: [34117481](https://pubmed.ncbi.nlm.nih.gov/34117481/)

Brahma S, Henikoff S. The BAF chromatin remodeler synergizes with RNA polymerase II and transcription factors to evict nucleosomes. Nat Genet. 2024 PMID: [38049663](https://pubmed.ncbi.nlm.nih.gov/38049663/)

Note that to allow faster processing, we have limited the reads to those mapping to a single chromosome (chr4).

### Ways to use this module
If you used submodule 1, you may recall how to navigate through the module. Throughout this module, commands are color-coded by method: ChIP-seq, CUT&RUN, or CUT&Tag. You can therefore use this module to learn the processing of each method individually, to compare the methods with one another, or to follow a single color and process only one data type.
Commands for each method are marked by that method's logo placed before the command, as in the following examples:

<img src="images/ChIPseqLogo.jpg" alt="Drawing" style="width: 250px;" align="left"/>

In [None]:
#run this cell for ChIP-seq
print("Code for ChIP-seq will be placed after the above image. Run these cells if performing ChIP-seq analysis.")

<img src="images/CUT&RUNLogo.jpg" alt="Drawing" style="width: 250px;" align="left"/>

In [None]:
#run this cell for CUT&RUN
print("Code for CUT&RUN will be placed after the above image. Run these cells if performing CUT&RUN analysis.")

<img src="images/CUT&TagLogo.jpg" alt="Drawing" style="width: 250px;" align="left"/>

In [None]:
#run this cell for CUT&Tag
print("Code for CUT&Tag will be placed after the above image. Run these cells if performing CUT&Tag analysis.")

<div class="alert alert-block alert-success" style="font-size:100%">
<span style="color:black"> By following the colors/images, you can run one, two, or all three types of analyses.</span>
</div>

### Required Files
In this stage of the module, you will use the sam files that are the output from submodule 1 (we also provide them in case you skipped submodule 1). You can also use this module on your own data or any published ChIP-seq, CUT&RUN, or CUT&Tag dataset.

<div class="alert-info" style="font-size:200%">
STEP 1: Set Up Environment
</div>

First, configure your cloud environment. In this step we will use conda to install the following packages:

Quality filtering and deduplication:
[samtools](https://anaconda.org/bioconda/samtools), [picard](https://anaconda.org/bioconda/picard)

Paired-end filtering and computing coverage:
[bedtools](https://bedtools.readthedocs.io/en/latest/index.html#)

Visualization:
[igv-notebook](https://igv.org/), [deeptools](https://deeptools.readthedocs.io/en/develop/)

Peak identification:
[macs3](https://github.com/macs3-project/MACS), [seacr](https://github.com/FredHutch/SEACR)

Motif analysis:
[Homer](http://homer.ucsd.edu/homer/motif/)

Differential peak calling:
[MAnorm](https://github.com/shao-lab/MAnorm)

In [None]:
#First let's install Miniforge, which provides mamba, to configure our environment
! curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
! bash Miniforge3-$(uname)-$(uname -m).sh -u -b -p $HOME/mambaforge
print("done")

In [None]:
#now let's install several required packages
!mamba install -c bioconda picard samtools deeptools bedtools==2.27 ucsc-bedgraphtobigwig macs3 seacr meme manorm -y
!pip install jupyterquiz==2.0.7 jupytercards
!pip install --user igv-notebook
print("done")

<div class="alert alert-block alert-warning" style="font-size:100%">
<span style="color:black"> The next cell will restart the kernel, so you will see the following message once you run the cell. When this image appears, click okay, and then keep going onto the next cell.</span>
</div>

<img src="./images/restartkernel.jpg" alt="Drawing" style="width: 500px;" align="left"/>

In [None]:
#Restarts the kernel to implement changes 
import os
os._exit(00)

<div class="alert alert-block alert-warning" style="font-size:100%">
<span style="color:black"> After the kernel restarts continue from here.</span>
</div>

In [None]:
#Now let's import packages that we installed
numthreads=!lscpu | grep '^CPU(s)'| awk '{print $2-1}'
numthreadsint = int(numthreads[0])
import sys
import os
from jupyterquiz import display_quiz
from IPython.display import IFrame
#from IPython.display import display
from IPython.display import Image
from jupytercards import display_flashcards
import igv_notebook
import pandas as pd
#import modules for matching-type quiz
%cd questions
from quiz_module import run_quiz
%cd ../
import json
import ipywidgets as widgets
from IPython.display import display
import random
print("done")

In [None]:
wd="~/SageMaker/SandboxChromatinOccupancy"
%cd $wd
#show which folder you are working in. 
!pwd

In [None]:
# These commands move into our Tutorial 2 directory and create our subdirectory structure.
!mkdir -p $wd/Submodule2/
%cd $wd/Submodule2/
!mkdir -p $wd/Submodule2/Filtering
!mkdir -p $wd/Submodule2/Visualization
!mkdir -p $wd/Submodule2/Peaks
!mkdir -p $wd/Submodule2/Motifs

In [None]:
#Let's copy and extract our tutorial files
!wget https://chromatinoccupancytutorial.s3.us-east-2.amazonaws.com/Submodule2.zip
!unzip Submodule2.zip
print("done")

<div class="alert-info" style="font-size:200%">
STEP 2: Filter and convert sam files to bam or bed files
</div>

In submodule 1, we introduced the sam format. 

<img src="images/samformat.jpg" alt="Drawing" style="width: 800px;"/>

We'll start by filtering out low-quality mappings, sorting the file by chromosomal coordinate, and writing the result as a bam file or a paired-end bed file. We will use the bed files particularly for CUT&RUN and CUT&Tag analysis.

<img src="images/ChIPseqLogo.jpg" alt="Drawing" style="width: 100px;" align="left" /> Run the following command for ChIP-seq, which lists the sam files we will use.

In [None]:
#list the files in ChIPseqSam which we just downloaded.
!ls $wd/Submodule2/Submodule2/ChIPSeqSam

In [None]:
# This converts the sam file to bam using samtools view with the -b option. The -h option includes the header in the output and the -S option marks the input as sam format. We pipe the output to samtools sort. Note the "-" at the end of the sort command, which tells samtools to read from stdin.
!samtools view -q 10 -bhS $wd/Submodule2/Submodule2/ChIPSeqSam/H3K27ac_ChIPseq_noaux.sam | samtools sort -o $wd/Submodule2/Filtering/H3K27ac_ChIPseq_noaux.bam - 
print("done with H3K27ac_ChIPseq_noaux")
#We'll do the same command for our bafi sample.
!samtools view -q 10 -bhS $wd/Submodule2/Submodule2/ChIPSeqSam/H3K27ac_ChIPseq_aux.sam | samtools sort -o $wd/Submodule2/Filtering/H3K27ac_ChIPseq_aux.bam - 
print("done with H3K27ac_ChIPseq_aux")

You may have noticed the parameters -bhS and -q 10 in the above commands. Briefly, -bhS tells samtools to output a bam file (the b option), to include the header in the output (the h option), and that the input is in sam format (the S option; recent samtools versions detect the input format automatically). We also specified -q 10, which removes reads with a mapping quality (MAPQ) below 10.
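
To make the effect of `-q 10` concrete, here is a minimal pure-Python sketch of the same rule applied to a few toy sam records. The read names, positions, and MAPQ values are made up for illustration, and `filter_by_mapq` is a hypothetical helper, not part of samtools. Per the samtools documentation, `-q` skips alignments whose MAPQ is smaller than the given value, so a read with MAPQ exactly 10 is kept.

```python
# Toy sam lines (tab-separated). Column 5 holds the mapping quality (MAPQ).
# All names and values below are invented for illustration.
toy_sam = [
    "@SQ\tSN:chr4\tLN:1000",
    "read1\t0\tchr4\t100\t42\t50M\t*\t0\t0\t*\t*",
    "read2\t16\tchr4\t200\t9\t50M\t*\t0\t0\t*\t*",
    "read3\t0\tchr4\t300\t10\t50M\t*\t0\t0\t*\t*",
    "read4\t0\tchr4\t400\t0\t50M\t*\t0\t0\t*\t*",
]


def filter_by_mapq(sam_lines, min_mapq=10):
    """Keep header lines and alignments whose MAPQ is at least min_mapq."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)  # header lines are always passed through
            continue
        mapq = int(line.split("\t")[4])
        if mapq >= min_mapq:
            kept.append(line)
    return kept


kept_names = [
    line.split("\t")[0]
    for line in filter_by_mapq(toy_sam)
    if not line.startswith("@")
]
print(kept_names)
```

Here `read2` (MAPQ 9) and `read4` (MAPQ 0) are dropped, while `read3` (MAPQ 10) survives the filter.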

In the cell below, let's view the first few lines of one of our bam files:

In [None]:
!samtools view $wd/Submodule2/Filtering/H3K27ac_ChIPseq_noaux.bam | head -n 3
# Note that there will be an error message because we are breaking a pipe by printing only the first 3 lines. Please ignore the error message.

<img src="images/CUT&RUNLogo.jpg" alt="Drawing" style="width: 120px;" align="left"/> Run the following command for CUT&RUN, which lists the sam files we will use.

In [None]:
#list the files in CUTnRUNSam which we just downloaded.
!ls $wd/Submodule2/Submodule2/CUTnRUNsam

You will notice that in addition to the mm39 sam files (mapped to the mouse genome), we have additional sam files mapped to the R64-1-1 (*Saccharomyces cerevisiae*, a.k.a. budding yeast) genome. These are for spike-in reads. In CUT&RUN and CUT&Tag, users add a constant, tiny amount of pre-fragmented heterologous DNA as "spike-in" (e.g., yeast spike-in for experiments targeting a protein in mouse cells) to quantitatively compare signals or reads from the target genome. This helps to account for differences in cell number, DNA purification, library preparation, and sequencing efficiencies between control and treatment sets. See PMID: [34117481](https://pubmed.ncbi.nlm.nih.gov/34117481/)

In [None]:
# This first converts the mm39 (mouse) sam files to bam files using samtools view with the -b option. The -h option tells samtools to include the header. We then convert the bam files to paired-end bed files using bedtools bamtobed with the -bedpe option.
!samtools view -b -h $wd/Submodule2/Submodule2/CUTnRUNsam/BRG1_CnR_con.mm39_chr4.sam -o $wd/Submodule2/Filtering/BRG1_CnR_con.mm39_chr4.bam
!bedtools bamtobed -bedpe -i $wd/Submodule2/Filtering/BRG1_CnR_con.mm39_chr4.bam | cut -f 1,2,6 | sort -k1,1 -k2,2n -k3,3n > $wd/Submodule2/Filtering/BRG1_CnR_con.mm39_chr4.bed
print("done with BRG1_CnR_con")

In the cell below, let's view the first few lines of the bed file to see how they look.

Note: the first column ("chrom") gives the chromosome to which the fragment mapped; the second and third columns ("chromStart" and "chromEnd") give the first and last base-pair positions of the mapped fragment. Because we used paired-end sequencing, we can obtain the coordinates of both the beginning and the end of each mapped fragment. These three columns are the minimum requirement for a bed file; there can be up to 9 more optional columns. What other information can a bed file hold? Learn more: [UCSC BED format](https://genome.ucsc.edu/FAQ/FAQformat.html#format1)
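
The conversion commands in this step pipe `bedtools bamtobed -bedpe` through `cut -f 1,2,6` and `sort`. As a rough illustration of what that pipeline does, the sketch below takes a few made-up bedpe lines and keeps column 1 (chromosome of the first mate), column 2 (start of the first mate), and column 6 (end of the second mate), which together span the whole fragment. The helper name `bedpe_to_fragments` and the toy coordinates are assumptions for illustration only.

```python
# Toy bedpe lines: chrom1, start1, end1, chrom2, start2, end2, name, score,
# strand1, strand2. All values are invented for illustration.
toy_bedpe = [
    "chr4\t3050\t3100\tchr4\t3180\t3230\tfragA\t42\t+\t-",
    "chr4\t1000\t1050\tchr4\t1120\t1170\tfragB\t30\t+\t-",
    "chr4\t1000\t1040\tchr4\t1060\t1100\tfragC\t30\t+\t-",
]


def bedpe_to_fragments(bedpe_lines):
    """Reduce each read pair to one (chrom, fragment_start, fragment_end)."""
    fragments = []
    for line in bedpe_lines:
        fields = line.rstrip("\n").split("\t")
        # Equivalent to: cut -f 1,2,6
        chrom, start, end = fields[0], int(fields[1]), int(fields[5])
        fragments.append((chrom, start, end))
    # Equivalent to: sort -k1,1 -k2,2n -k3,3n
    return sorted(fragments)


fragments = bedpe_to_fragments(toy_bedpe)
for chrom, start, end in fragments:
    print(chrom, start, end, sep="\t")
```

This relies on the first mate in each bedpe record being the leftmost one, which is the assumption the `cut -f 1,2,6` shortcut also makes.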

In [None]:
!head $wd/Submodule2/Filtering/BRG1_CnR_con.mm39_chr4.bed

In [None]:
#Now let's make the bam and bed files for the treatment and IgG data.
#This may take a few minutes: 🕘

!samtools view -b -h $wd/Submodule2/Submodule2/CUTnRUNsam/BRG1_CnR_FLV.mm39_chr4.sam -o $wd/Submodule2/Filtering/BRG1_CnR_FLV.mm39_chr4.bam
!bedtools bamtobed -bedpe -i $wd/Submodule2/Filtering/BRG1_CnR_FLV.mm39_chr4.bam | cut -f 1,2,6 | sort -k1,1 -k2,2n -k3,3n > $wd/Submodule2/Filtering/BRG1_CnR_FLV.mm39_chr4.bed
print("done with BRG1_CnR_FLV")

!samtools view -b -h $wd/Submodule2/Submodule2/CUTnRUNsam/IgG_CnR.mm39_chr4.sam -o $wd/Submodule2/Filtering/IgG_CnR.mm39_chr4.bam
!bedtools bamtobed -bedpe -i $wd/Submodule2/Filtering/IgG_CnR.mm39_chr4.bam | cut -f 1,2,6 | sort -k1,1 -k2,2n -k3,3n > $wd/Submodule2/Filtering/IgG_CnR.mm39_chr4.bed
print("done with IgG_CnR")

In [None]:
#Let's make the spike-in bed files.
!samtools view -b -h $wd/Submodule2/Submodule2/CUTnRUNsam/BRG1_CnR_con.R64-1-1.sam -o $wd/Submodule2/Filtering/BRG1_CnR_con.R64-1-1.bam
!bedtools bamtobed -bedpe -i $wd/Submodule2/Filtering/BRG1_CnR_con.R64-1-1.bam | cut -f 1,2,6 | sort -k1,1 -k2,2n -k3,3n > $wd/Submodule2/Filtering/BRG1_CnR_con.R64-1-1.bed
print("done with BRG1_CnR_con")

!samtools view -b -h $wd/Submodule2/Submodule2/CUTnRUNsam/BRG1_CnR_FLV.R64-1-1.sam -o $wd/Submodule2/Filtering/BRG1_CnR_FLV.R64-1-1.bam
!bedtools bamtobed -bedpe -i $wd/Submodule2/Filtering/BRG1_CnR_FLV.R64-1-1.bam | cut -f 1,2,6 | sort -k1,1 -k2,2n -k3,3n > $wd/Submodule2/Filtering/BRG1_CnR_FLV.R64-1-1.bed
print("done with BRG1_CnR_FLV")

!samtools view -b -h $wd/Submodule2/Submodule2/CUTnRUNsam/IgG_CnR.R64-1-1.sam -o $wd/Submodule2/Filtering/IgG_CnR.R64-1-1.bam
!bedtools bamtobed -bedpe -i $wd/Submodule2/Filtering/IgG_CnR.R64-1-1.bam | cut -f 1,2,6 | sort -k1,1 -k2,2n -k3,3n > $wd/Submodule2/Filtering/IgG_CnR.R64-1-1.bed
print("done with IgG_CnR")

<img src="images/CUT&TagLogo.jpg" alt="Drawing" style="width: 120px;" align="left"/> Run the following command for CUT&Tag, which lists the sam files we will use.

In [None]:
#list the files in CUTnTagSam which we just downloaded.
!ls $wd/Submodule2/Submodule2/CUTnTagSam

You will notice that in addition to the mm39 sam files (mapped to the mouse genome), we have additional sam files mapped to the *E. coli* genome. These are for spike-in reads. In CUT&RUN and CUT&Tag, users add, or take advantage of, a constant, tiny amount of pre-fragmented heterologous DNA as "spike-in" (e.g., the *E. coli* DNA that is carried over with the pA-Tn5 in CUT&Tag experiments) to quantitatively compare signals or reads from the target genome. This helps to account for differences in cell number, DNA purification, library preparation, and sequencing efficiencies between control and treatment sets. See PMID: [31036827](https://pubmed.ncbi.nlm.nih.gov/31036827/)

In [None]:
# This first converts the mm39 (mouse) sam files to bam files using samtools view with the -b option. The -h option tells samtools to include the header. We then convert the bam files to paired-end bed files using bedtools bamtobed with the -bedpe option.
!samtools view -b -h $wd/Submodule2/Submodule2/CUTnTagSam/RNAPII-S5P_CnT_con.mm39_chr4.sam -o $wd/Submodule2/Filtering/RNAPII-S5P_CnT_con.mm39_chr4.bam
!bedtools bamtobed -bedpe -i $wd/Submodule2/Filtering/RNAPII-S5P_CnT_con.mm39_chr4.bam | cut -f 1,2,6 | sort -k1,1 -k2,2n -k3,3n > $wd/Submodule2/Filtering/RNAPII-S5P_CnT_con.mm39_chr4.bed
print("done with RNAPII-S5P_CnT_con")

In the cell below, let's view the first few lines of the bed file to see how they look.

Note: the first column ("chrom") gives the chromosome to which the fragment mapped; the second and third columns ("chromStart" and "chromEnd") give the first and last base-pair positions of the mapped fragment. Because we used paired-end sequencing, we can obtain the coordinates of both the beginning and the end of each mapped fragment. These three columns are the minimum requirement for a bed file; there can be up to 9 more optional columns. What other information can a bed file hold? Learn more: [UCSC BED format](https://genome.ucsc.edu/FAQ/FAQformat.html#format1)

In [None]:
!head $wd/Submodule2/Filtering/RNAPII-S5P_CnT_con.mm39_chr4.bed

In [None]:
#Let's make the bam and bed files for the treatment and IgG data.
#This may take a few minutes: 🕘

!samtools view -b -h $wd/Submodule2/Submodule2/CUTnTagSam/RNAPII-S5P_CnT_FLV.mm39_chr4.sam -o $wd/Submodule2/Filtering/RNAPII-S5P_CnT_FLV.mm39_chr4.bam
!bedtools bamtobed -bedpe -i $wd/Submodule2/Filtering/RNAPII-S5P_CnT_FLV.mm39_chr4.bam | cut -f 1,2,6 | sort -k1,1 -k2,2n -k3,3n > $wd/Submodule2/Filtering/RNAPII-S5P_CnT_FLV.mm39_chr4.bed
print("done with RNAPII-S5P_CnT_FLV")

!samtools view -b -h $wd/Submodule2/Submodule2/CUTnTagSam/IgG_CnT.mm39_chr4.sam -o $wd/Submodule2/Filtering/IgG_CnT.mm39_chr4.bam
!bedtools bamtobed -bedpe -i $wd/Submodule2/Filtering/IgG_CnT.mm39_chr4.bam | cut -f 1,2,6 | sort -k1,1 -k2,2n -k3,3n > $wd/Submodule2/Filtering/IgG_CnT.mm39_chr4.bed
print("done with IgG_CnT")

In [None]:
#Now let's make the spike-in bed files.
!samtools view -b -h $wd/Submodule2/Submodule2/CUTnTagSam/RNAPII-S5P_CnT_con.ecoli.sam -o $wd/Submodule2/Filtering/RNAPII-S5P_CnT_con.ecoli.bam
!bedtools bamtobed -bedpe -i $wd/Submodule2/Filtering/RNAPII-S5P_CnT_con.ecoli.bam | cut -f 1,2,6 | sort -k1,1 -k2,2n -k3,3n > $wd/Submodule2/Filtering/RNAPII-S5P_CnT_con.ecoli.bed
print("done with RNAPII-S5P_CnT_con")

!samtools view -b -h $wd/Submodule2/Submodule2/CUTnTagSam/RNAPII-S5P_CnT_FLV.ecoli.sam -o $wd/Submodule2/Filtering/RNAPII-S5P_CnT_FLV.ecoli.bam
!bedtools bamtobed -bedpe -i $wd/Submodule2/Filtering/RNAPII-S5P_CnT_FLV.ecoli.bam | cut -f 1,2,6 | sort -k1,1 -k2,2n -k3,3n > $wd/Submodule2/Filtering/RNAPII-S5P_CnT_FLV.ecoli.bed
print("done with RNAPII-S5P_CnT_FLV")

!samtools view -b -h $wd/Submodule2/Submodule2/CUTnTagSam/IgG_CnT.ecoli.sam -o $wd/Submodule2/Filtering/IgG_CnT.ecoli.bam
!bedtools bamtobed -bedpe -i $wd/Submodule2/Filtering/IgG_CnT.ecoli.bam | cut -f 1,2,6 | sort -k1,1 -k2,2n -k3,3n > $wd/Submodule2/Filtering/IgG_CnT.ecoli.bed
print("done with IgG_CnT")

<div class="alert-info" style="font-size:200%">
Interactive Quiz Question: Click on the correct answer in the following cell.
</div>

In [None]:
%cd $wd/Submodule2
display_quiz("../questions/mappingquality.json")

<div class="alert-info" style="font-size:200%">
STEP 3: Removal of Duplicates
</div>
It's important to remove duplicates from our reads because part of the ChIP-seq method includes a PCR step for library amplification. This can create biases in the data resulting from PCR duplicates. To understand how PCR duplicates can affect the analysis, let's jump ahead a bit. Occupancy is represented by "peaks" of signal.

<img src="images/peaksofsignal.jpg" alt="Drawing" style="width: 500px;"/>
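
To see how PCR duplicates can inflate a peak, consider the toy example below. It is only a simplified sketch: fragments are treated as duplicates when they share the same chromosome, start, end, and strand, which is an assumption made for illustration and not a description of Picard's exact criteria. The coordinates are invented.

```python
# Toy fragments as (chrom, start, end, strand). The first three are identical,
# as would happen if one original molecule was amplified by PCR.
toy_fragments = [
    ("chr4", 100, 250, "+"),
    ("chr4", 100, 250, "+"),
    ("chr4", 100, 250, "+"),
    ("chr4", 120, 260, "+"),
    ("chr4", 500, 640, "-"),
]


def remove_duplicates(fragments):
    """Keep the first copy of each identical fragment, preserving order."""
    seen = set()
    unique = []
    for frag in fragments:
        if frag not in seen:
            seen.add(frag)
            unique.append(frag)
    return unique


def pileup_at(fragments, position):
    """Count fragments covering a position (half-open intervals)."""
    return sum(1 for _, start, end, _ in fragments if start <= position < end)


unique_fragments = remove_duplicates(toy_fragments)
before = pileup_at(toy_fragments, 150)
after = pileup_at(unique_fragments, 150)
print("pileup at position 150 before:", before, "after:", after)
```

In this toy case the apparent signal at position 150 drops from 4 to 2 once the repeated copies are collapsed, which shows how duplicates can exaggerate a peak's height.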

<div class="alert-warning" style="font-size:200%">
    
    
<img src="images/CUT&RUNLogo.jpg" alt="Drawing" style="width: 120px;" align="left"/>
    
<div class="alert-warning" style="font-size:50%">
</div>
NOTE for CUT&RUN and CUT&Tag
</div>
It is less important to remove duplicates from CUT&RUN or CUT&Tag data because their library preparation typically involves fewer PCR cycles, so the chance of PCR duplicates is low.


<div class="alert-info" style="font-size:200%">
Interactive Quiz Question: Click on the correct answer in the following cell.
</div>

In [None]:
%cd $wd/Submodule2
display_quiz("../questions/duplicateQuiz.json")

<img src="images/ChIPseqLogo.jpg" alt="Drawing" style="width: 100px;" align="left" /> Run the following command for ChIP-seq, which removes duplicate reads.

Okay, let's remove these duplicates using picard.

In [None]:
%cd $wd/Submodule2
# This will take the sorted bam file and remove duplicates, saving a new bam file and a summary in a text file.
!picard MarkDuplicates REMOVE_DUPLICATES=TRUE I=Filtering/H3K27ac_ChIPseq_noaux.bam O=Filtering/H3K27ac_ChIPseq_noauxdedup.bam METRICS_FILE=Filtering/H3K27ac_ChIPseq_ctldedup_metrics.txt 2> Filtering/PicardLog.txt
print("done with H3K27ac_ChIPseq_noaux")
#Let's do it again, this time for bafi. We append to the log (2>>) so the log from the first run is not overwritten.
!picard MarkDuplicates REMOVE_DUPLICATES=TRUE I=Filtering/H3K27ac_ChIPseq_aux.bam O=Filtering/H3K27ac_ChIPseq_auxdedup.bam METRICS_FILE=Filtering/H3K27ac_ChIPseq_auxdedup_metrics.txt 2>> Filtering/PicardLog.txt
print("done with H3K27ac_ChIPseq_aux")

<div class="alert-info" style="font-size:200%">
Interactive Quiz Question: Match the tool that we used for each step in the analysis.
</div>

In [None]:
#Run for the quiz
%cd $wd/Submodule2
run_quiz("../questions/postprocessingtools.json", instant_feedback=True, shuffle_questions=False, shuffle_answers=True)

<div class="alert-info" style="font-size:200%">
STEP 4: Visualization
</div>


### We have previously learned that we can see the coordinates of each read in the bam or bed files. However, this can be tedious and is not practical for datasets with millions of reads. We need a better way of visualizing the results.
### In this step we will create files that summarize the pileup of reads at each base pair along our genome, in [bedgraph](https://genome.ucsc.edu/goldenpath/help/bedgraph.html) or [bigwig](https://genome.ucsc.edu/goldenpath/help/bigWig.html) formats.

For ChIP-seq datasets, we will create bigwig files using the command bamCoverage, part of the [deeptools](https://deeptools.readthedocs.io/en/develop/) package.

For CUT&RUN and CUT&Tag datasets, we will create paired-end bedgraph files using the command genomecov, part of the [bedtools](https://bedtools.readthedocs.io/en/latest/content/tools/genomecov.html) package.
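
A bedgraph stores coverage as runs of consecutive positions that share the same value. The sketch below is a simplified, single-chromosome illustration of that idea: it turns a handful of made-up fragments into such runs and multiplies the depth by a scale factor, which is conceptually what `bedtools genomecov -bg -scale` reports. The function name `scaled_bedgraph` and the toy intervals are assumptions for illustration, not the bedtools implementation.

```python
def scaled_bedgraph(fragments, scale=1.0):
    """Convert half-open (start, end) fragments into bedgraph-style runs.

    Returns a list of (start, end, scaled_depth) for regions with coverage.
    """
    # Record +1 where a fragment starts and -1 where it ends.
    events = {}
    for start, end in fragments:
        events[start] = events.get(start, 0) + 1
        events[end] = events.get(end, 0) - 1

    runs = []
    depth = 0
    prev = None
    for pos in sorted(events):
        if prev is not None and depth > 0 and pos > prev:
            value = depth * scale
            if runs and runs[-1][1] == prev and runs[-1][2] == value:
                # Extend the previous run when the depth did not change.
                runs[-1] = (runs[-1][0], pos, value)
            else:
                runs.append((prev, pos, value))
        depth += events[pos]
        prev = pos
    return runs


toy_runs = scaled_bedgraph([(0, 10), (5, 15), (20, 30)], scale=0.5)
for start, end, value in toy_runs:
    print("chr4", start, end, value, sep="\t")
```

Positions 5 to 10 are covered by two fragments, so their scaled depth is twice that of the flanking runs, and the uncovered gap between 15 and 20 is simply omitted.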

<img src="images/balance.jpg" alt="Drawing" style="width: 100px;"/>

But first, let's talk about normalization.  What would these look like if we sequenced 100 million reads for one sample but only 1 million reads for another? 

We need to normalize  or calibrate our signal.

For ChIP-seq, we can normalize signal by the depth of sequencing. We can do so by dividing the count of reads at each coordinate by the total (per million). We can do this at the same time as creating the bigwig.
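
As a worked example of depth normalization, the sketch below applies the per-million scaling described above to the 100 million versus 1 million read scenario. The read counts at the position of interest (500 and 5) are made-up numbers, and `per_million` is a hypothetical helper name.

```python
def per_million(count_at_position, total_mapped_reads):
    """Scale a raw read count to reads per million mapped reads."""
    return count_at_position * 1_000_000 / total_mapped_reads


# Deeply sequenced sample: 500 reads at a site out of 100 million total.
deep = per_million(500, 100_000_000)
# Shallow sample: 5 reads at the same site out of 1 million total.
shallow = per_million(5, 1_000_000)

print("deep sample:", deep, "shallow sample:", shallow)
```

The raw counts differ 100-fold, yet after scaling both samples show the same normalized signal, which is what makes tracks from different sequencing depths comparable.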

For CUT&RUN and CUT&Tag, we will calibrate signals by factoring spike-in reads. We will do this at the same time as creating the bedgraph.

<img src="images/ChIPseqLogo.jpg" alt="Drawing" style="width: 100px;" align="left" /> Run the following command for ChIP-seq datasets.
<div>This may take a few minutes: 🕘</div>

In [None]:
# First we need to create an index of our bam file.
!samtools index $wd/Submodule2/Filtering/H3K27ac_ChIPseq_noauxdedup.bam

# Then we can create a bigwig file of the control sample.
!bamCoverage -b $wd/Submodule2/Filtering/H3K27ac_ChIPseq_noauxdedup.bam -o $wd/Submodule2/Visualization/H3K27ac_ChIPseq_noauxdedup.bw -bs 50 --normalizeUsing BPM 2> $wd/Submodule2/Visualization/bamCovLog_noaux.txt
print("done with H3K27ac_ChIPseq_noaux")

# Now let's rerun the commands for our other sample.
!samtools index $wd/Submodule2/Filtering/H3K27ac_ChIPseq_auxdedup.bam
!bamCoverage -b $wd/Submodule2/Filtering/H3K27ac_ChIPseq_auxdedup.bam -o $wd/Submodule2/Visualization/H3K27ac_ChIPseq_auxdedup.bw -bs 50 --normalizeUsing BPM 2> $wd/Submodule2/Visualization/bamCovLog_aux.txt

print("done with H3K27ac_ChIPseq_aux")

In the above examples we specify the bam file name after -b and the output file name after -o. 

We specified -bs 50, which tells bamCoverage to summarize the reads at 50 bp resolution.
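
To illustrate what summarizing at 50 bp resolution means, here is a simplified sketch that assigns reads to fixed-width bins by their start position. This is an assumption made to keep the example short; bamCoverage itself works from read coverage rather than start positions alone. The positions and the helper name `bin_counts` are invented for illustration.

```python
from collections import Counter


def bin_counts(read_starts, bin_size=50):
    """Count reads per fixed-width bin, keyed by the bin's start coordinate."""
    counts = Counter(pos // bin_size for pos in read_starts)
    return {index * bin_size: n for index, n in sorted(counts.items())}


# Five made-up read start positions.
toy_bins = bin_counts([3, 49, 50, 120, 149])
print(toy_bins)
```

A smaller bin size gives finer resolution at the cost of a larger output file and longer runtime, while a larger bin size smooths the signal.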

You could also specify the number of threads to use with -p.

Lastly, we specified --normalizeUsing BPM. BPM stands for Bins Per Million mapped reads. What do you think this normalization does?

<div class="alert-info" style="font-size:200%">
Interactive Quiz Question: Click on the correct answer in the following cell.
</div>

In [None]:
%cd $wd/Submodule2
display_quiz("../questions/BPMnorm.json")

<div class="alert-info" style="font-size:150%">
How does spike-in calibration work?
</div>
In the example below, although sample A and sample B have about the same amounts of experimental DNA, sample B has more spike-in DNA, which results in lower enrichment of experimental reads.
For spike-in calibration, we need to first count the number of spike-in reads in each sample.

<img src="images/spike-in.jpg" alt="Drawing" style="width: 300px;"/>

<img src="images/CUT&RUNLogo.jpg" alt="Drawing" style="width: 120px;" align="left"/> Run the following command for CUT&RUN datasets.
<div>We will first need to calculate the calibration factor for each sample by dividing a large constant number, such as 10,000, by the count of spike-in reads.</div>
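
The same calculation can be sketched in Python. The function below counts the lines of a spike-in bed file (one fragment per line) and divides the constant by that count. The helper name `calibration_factor` and the four-line demo file are hypothetical; the demo uses a temporary file so it runs without any tutorial data.

```python
import os
import tempfile


def calibration_factor(spike_in_bed, constant=10_000):
    """Return constant / number of spike-in fragments in a bed file."""
    path = os.path.expanduser(spike_in_bed)  # allow paths that start with ~
    with open(path) as handle:
        n_fragments = sum(1 for _ in handle)
    if n_fragments == 0:
        raise ValueError("no spike-in fragments found in " + path)
    return constant / n_fragments


# Demo on a throwaway file holding four made-up spike-in fragments.
with tempfile.NamedTemporaryFile("w", suffix=".bed", delete=False) as tmp:
    tmp.write("chrI\t10\t160\nchrI\t40\t200\nchrII\t5\t150\nchrII\t90\t240\n")
demo_factor = calibration_factor(tmp.name)
os.remove(tmp.name)
print(demo_factor)
```

A sample with more spike-in fragments gets a smaller factor, so its signal is scaled down relative to a sample with fewer spike-in fragments. In the notebook, the function could be pointed at a file such as `Submodule2/Filtering/BRG1_CnR_con.R64-1-1.bed` under the working directory.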

In [None]:
#In order to calibrate mapped mm39 reads by spike-in reads, we will compute a calibration factor by dividing a large constant number, such as 10,000, by the count of spike-in reads
print("calibration factor for BRG1_CnR_con is")
!bc -l <<< "10000 / $(cat $wd/Submodule2/Filtering/BRG1_CnR_con.R64-1-1.bed | wc -l)"
print("calibration factor for BRG1_CnR_FLV is")
!bc -l <<< "10000 / $(cat $wd/Submodule2/Filtering/BRG1_CnR_FLV.R64-1-1.bed | wc -l)"
print("calibration factor for IgG_CnR is")
!bc -l <<< "10000 / $(cat $wd/Submodule2/Filtering/IgG_CnR.R64-1-1.bed | wc -l)"

In [None]:
#🕘 This may take a few minutes 
#Let's make the bedgraph files using bedtools genomecov. Note that we have plugged in the calibration factors as "scale". The bedgraph files will show the scaled coverage of reads at each genomic coordinate.
#At the same time, we will convert the bedgraph files to bigWig format for downstream visualization

!bedtools genomecov -bg -scale .12734635662073708071 -i $wd/Submodule2/Filtering/BRG1_CnR_con.mm39_chr4.bed -g $wd/Submodule2/Submodule2/GenomeAnnotations/chr_lens > $wd/Submodule2/Visualization/BRG1_CnR_con.mm39_chr4.bedgraph
!bedGraphToBigWig $wd/Submodule2/Visualization/BRG1_CnR_con.mm39_chr4.bedgraph $wd/Submodule2/Submodule2/GenomeAnnotations/chr_lens $wd/Submodule2/Visualization/BRG1_CnR_con.mm39_chr4.bw
print("done with BRG1_CnR_con")
!bedtools genomecov -bg -scale .11316569721386053459 -i $wd/Submodule2/Filtering/BRG1_CnR_FLV.mm39_chr4.bed -g $wd/Submodule2/Submodule2/GenomeAnnotations/chr_lens > $wd/Submodule2/Visualization/BRG1_CnR_FLV.mm39_chr4.bedgraph
!bedGraphToBigWig $wd/Submodule2/Visualization/BRG1_CnR_FLV.mm39_chr4.bedgraph $wd/Submodule2/Submodule2/GenomeAnnotations/chr_lens $wd/Submodule2/Visualization/BRG1_CnR_FLV.mm39_chr4.bw
print("done with BRG1_CnR_FLV")
!bedtools genomecov -bg -scale .39950461427829491430 -i $wd/Submodule2/Filtering/IgG_CnR.mm39_chr4.bed -g $wd/Submodule2/Submodule2/GenomeAnnotations/chr_lens > $wd/Submodule2/Visualization/IgG_CnR.mm39_chr4.bedgraph
!bedGraphToBigWig $wd/Submodule2/Visualization/IgG_CnR.mm39_chr4.bedgraph $wd/Submodule2/Submodule2/GenomeAnnotations/chr_lens $wd/Submodule2/Visualization/IgG_CnR.mm39_chr4.bw
print("done with IgG_CnR")

<img src="images/CUT&TagLogo.jpg" alt="Drawing" style="width: 120px;" align="left"/> Run the following command for CUT&Tag datasets.
<div>We will first need to calculate the calibration factor for each sample by dividing a large constant number, such as 10,000, by the count of spike-in reads.</div>

In [None]:
#In order to calibrate mapped mm39 reads by spike-in reads, we will compute a calibration factor by dividing a large constant number, such as 10,000, by the count of spike-in reads
print("calibration factor for RNAPII-S5P_CnT_con is")
!bc -l <<< "10000 / $(cat $wd/Submodule2/Filtering/RNAPII-S5P_CnT_con.ecoli.bed | wc -l)"
print("calibration factor for RNAPII-S5P_CnT_FLV is")
!bc -l <<< "10000 / $(cat $wd/Submodule2/Filtering/RNAPII-S5P_CnT_FLV.ecoli.bed | wc -l)"
print("calibration factor for IgG_CnT is")
!bc -l <<< "10000 / $(cat $wd/Submodule2/Filtering/IgG_CnT.ecoli.bed | wc -l)"

In [None]:
#🕘 This may take a few minutes 
#Let's make the bedgraph files using bedtools genomecov. Note that we have plugged in the calibration factors as "scale". The bedgraph files will show the scaled coverage of reads at each genomic coordinate.
#At the same time, we will convert the bedgraph files to bigWig format for downstream visualization

!bedtools genomecov -bg -scale 1.15633672525439407955 -i $wd/Submodule2/Filtering/RNAPII-S5P_CnT_con.mm39_chr4.bed -g $wd/Submodule2/Submodule2/GenomeAnnotations/chr_lens > $wd/Submodule2/Visualization/RNAPII-S5P_CnT_con.mm39_chr4.bedgraph
!bedGraphToBigWig $wd/Submodule2/Visualization/RNAPII-S5P_CnT_con.mm39_chr4.bedgraph $wd/Submodule2/Submodule2/GenomeAnnotations/chr_lens $wd/Submodule2/Visualization/RNAPII-S5P_CnT_con.mm39_chr4.bw
print("done with RNAPII-S5P_CnT_con")
!bedtools genomecov -bg -scale 8.81057268722466960352 -i $wd/Submodule2/Filtering/RNAPII-S5P_CnT_FLV.mm39_chr4.bed -g $wd/Submodule2/Submodule2/GenomeAnnotations/chr_lens > $wd/Submodule2/Visualization/RNAPII-S5P_CnT_FLV.mm39_chr4.bedgraph
!bedGraphToBigWig $wd/Submodule2/Visualization/RNAPII-S5P_CnT_FLV.mm39_chr4.bedgraph $wd/Submodule2/Submodule2/GenomeAnnotations/chr_lens $wd/Submodule2/Visualization/RNAPII-S5P_CnT_FLV.mm39_chr4.bw
print("done with RNAPII-S5P_CnT_FLV")
!bedtools genomecov -bg -scale 1.21832358674463937621 -i $wd/Submodule2/Filtering/IgG_CnT.mm39_chr4.bed -g $wd/Submodule2/Submodule2/GenomeAnnotations/chr_lens > $wd/Submodule2/Visualization/IgG_CnT.mm39_chr4.bedgraph
!bedGraphToBigWig $wd/Submodule2/Visualization/IgG_CnT.mm39_chr4.bedgraph $wd/Submodule2/Submodule2/GenomeAnnotations/chr_lens $wd/Submodule2/Visualization/IgG_CnT.mm39_chr4.bw
print("done with IgG_CnT")

<div class="alert-info" style="font-size:150%">
Genome Browser
</div>

Now that we have our bigwig files, we can visualize the signal in a genome browser. We'll use [igv](https://igv.org/) in this example.

This will load in the signal into IGV and allow you to browse the genome. Feel free to play around with this. More instructions can be found on the [IGV](https://igv.org/) website.

Click the "gear" icon on the right of each track to customize the color, name, height, etc... 

The default is to autoscale the tracks. But this means that our samples have different y-axis scales! Because we BPM normalized, we should set the data range to be the same for each sample.

<img src="images/igvdatarange.jpg" alt="Drawing" style="width: 1000px;"/>
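
The data range can also be fixed in code rather than through the gear menu. The sketch below builds track configurations that share one y-axis range, assuming the igv.js wig-track options `autoscale`, `min`, and `max` are passed through by igv-notebook. The helper `wig_track` and the range of 0 to 5 are hypothetical choices for illustration; pick a maximum that suits your own data.

```python
def wig_track(name, url, y_max, y_min=0):
    """Build a bigwig track config with a fixed, shared data range."""
    return {
        "name": name,
        "url": url,
        "format": "bigwig",
        "type": "wig",
        "autoscale": False,  # turn off per-track autoscaling
        "min": y_min,
        "max": y_max,
    }


# Same range for both ChIP-seq tracks so their heights are comparable.
shared_max = 5  # hypothetical value
tracks = [
    wig_track("H3K27ac_ChIPseq_noaux",
              "Submodule2/Visualization/H3K27ac_ChIPseq_noauxdedup.bw",
              shared_max),
    wig_track("H3K27ac_ChIPseq_aux",
              "Submodule2/Visualization/H3K27ac_ChIPseq_auxdedup.bw",
              shared_max),
]
for track in tracks:
    print(track["name"], track["min"], track["max"])
```

Each dictionary could then be passed to the browser's `load_track` method in place of the hand-written configurations.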

<img src="images/ChIPseqLogo.jpg" alt="Drawing" style="width: 100px;" align="left" /> Run the following command for ChIP-seq datasets.

In [None]:
%cd $wd/
igv_notebook.init()
myigv = igv_notebook.Browser(
    {
        "genome": "mm39",
        "locus": "chr4:44,000,000-47,000,000"
    }
)
myigv.load_track(
{
        "name": "H3K27ac_ChIPseq_noaux",
        "url": "Submodule2/Visualization/H3K27ac_ChIPseq_noauxdedup.bw",
        "format": "bigwig",
        "type": "wig"
    }
    
)
myigv.load_track(
{
        "name": "H3K27ac_ChIPseq_aux",
        "url": "Submodule2/Visualization/H3K27ac_ChIPseq_auxdedup.bw",
        "format": "bigwig",
        "type": "wig"
    }
    
)

<img src="images/CUT&RUNLogo.jpg" alt="Drawing" style="width: 120px;" align="left"/> Run the following command for CUT&RUN datasets.

In [None]:
%cd $wd/
igv_notebook.init()
myigv = igv_notebook.Browser(
    {
        "genome": "mm39",
        "locus": "chr4:10,000,000-150,000,000"
    }
)
myigv.load_track(
{
        "name": "BRG1_CnR_con",
        "url": "Submodule2/Visualization/BRG1_CnR_con.mm39_chr4.bw",
        "format": "bigwig",
        "type": "wig"
    }
    
)
myigv.load_track(
{
        "name": "BRG1_CnR_FLV",
        "url": "Submodule2/Visualization/BRG1_CnR_FLV.mm39_chr4.bw",
        "format": "bigwig",
        "type": "wig"
    }
    
)
myigv.load_track(
{
        "name": "IgG_CnR",
        "url": "Submodule2/Visualization/IgG_CnR.mm39_chr4.bw",
        "format": "bigwig",
        "type": "wig"
    }
    
)

<img src="images/CUT&TagLogo.jpg" alt="Drawing" style="width: 120px;" align="left"/> Run the following command for CUT&Tag datasets.

In [None]:
%cd $wd/
igv_notebook.init()
myigv = igv_notebook.Browser(
    {
        "genome": "mm39",
        "locus": "chr4:10,000,000-150,000,000"
    }
)
myigv.load_track(
{
        "name": "RNAPII-S5P_CnT_con",
        "url": "Submodule2/Visualization/RNAPII-S5P_CnT_con.mm39_chr4.bw",
        "format": "bigwig",
        "type": "wig"
    }
    
)
myigv.load_track(
{
        "name": "RNAPII-S5P_CnT_FLV",
        "url": "Submodule2/Visualization/RNAPII-S5P_CnT_FLV.mm39_chr4.bw",
        "format": "bigwig",
        "type": "wig"
    }
    
)
myigv.load_track(
{
        "name": "IgG_CnT",
        "url": "Submodule2/Visualization/IgG_CnT.mm39_chr4.bw",
        "format": "bigwig",
        "type": "wig"
    }
    
)

<div class="alert-info" style="font-size:150%">
Average Profiles
</div>

In addition to browsing, we can make average profiles of signal across specific regions. For example, let's check if there is any signal enrichment around genes. Let's test this using [deeptools](https://anaconda.org/bioconda/deeptools).

Deeptools takes in a bigwig file representing the signal. It also takes a bed file representing the features across which one wants to average the signal. In our case the bed file will be composed of gene annotations. Creating the profile will occur in two steps. The first is to create the summarized matrix, while the second plots that data.

In the following examples, we'll use options that either scale genes to a similar size and plot the average signal across the entire region (ChIP-seq example), or plot the average signal only around the transcription start sites (TSSs) of genes (CUT&RUN and CUT&Tag examples).
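To make the two steps concrete, here is a minimal sketch of the idea behind an average profile, assuming a toy per-base signal array and hypothetical TSS positions (the function name `average_profile` and all values are illustrative, not part of deeptools). It builds a regions-by-positions matrix and then takes the column-wise mean. It ignores gene strand and bin size, which the real tool handles.

```python
import numpy as np

def average_profile(signal, anchors, flank):
    """Average per-base signal in a window of +/- flank around each anchor."""
    windows = []
    for pos in anchors:
        start, end = pos - flank, pos + flank + 1
        if start < 0 or end > len(signal):
            continue  # skip windows that run off the toy chromosome
        windows.append(signal[start:end])
    matrix = np.vstack(windows)  # one row per region (the "matrix" step)
    return matrix.mean(axis=0)   # column-wise mean (the "plot" step)

# Toy data (invented): background of 1.0 with a bump of height 5.0 at each TSS
signal = np.ones(1000)
tss_positions = [200, 500, 800]  # hypothetical TSS coordinates
for tss in tss_positions:
    signal[tss - 10:tss + 11] += 4.0

profile = average_profile(signal, tss_positions, flank=50)
print(profile.shape)            # one value per position in the window
print(profile[50], profile[0])  # signal at the TSS vs. 50 bp upstream
```

The center of the profile corresponds to the TSS, so enrichment at promoters shows up as a bump in the middle of the plot.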

This may take a few minutes: 🕘

<img src="images/ChIPseqLogo.jpg" alt="Drawing" style="width: 100px;" align="left" /> Run the following command for ChIP-seq datasets.

In [None]:
%cd $wd
# -S option specifies the bigwig signal file, where we can specify multiple separated by spaces. -R option specifies the genome annotation bed file. -a and -b specify how many bp to plot on either side. 
!computeMatrix scale-regions -S Submodule2/Visualization/H3K27ac_ChIPseq_noauxdedup.bw Submodule2/Visualization/H3K27ac_ChIPseq_auxdedup.bw -R Submodule2/Submodule2/GenomeAnnotations/mm39v36_chr4_genes.bed -o Submodule2/Visualization/ChIPseq_GeneprofileMatrix -a 2000 -b 2000 --regionBodyLength 2000
print("done creating the matrix... plotting....")
!plotProfile -m Submodule2/Visualization/ChIPseq_GeneprofileMatrix -o Submodule2/Visualization/ChIPseq_Geneprofile.png --perGroup
print("done")

Let's view the output:

In [None]:
%cd $wd
Image(url= "Submodule2/Visualization/ChIPseq_Geneprofile.png", width=400, height=400)

<img src="images/CUT&RUNLogo.jpg" alt="Drawing" style="width: 120px;" align="left"/> Run the following command for CUT&RUN datasets.

This may take a few minutes: 🕘

In [None]:
%cd $wd
# -S option specifies the bigwig signal file, where we can specify multiple separated by spaces. -R option specifies the genome annotation bed file. -a and -b specify how many bp to plot on either side. 
!computeMatrix reference-point -S Submodule2/Visualization/BRG1_CnR_con.mm39_chr4.bw Submodule2/Visualization/BRG1_CnR_FLV.mm39_chr4.bw -R Submodule2/Submodule2/GenomeAnnotations/mm39v36_chr4_genes.bed -o Submodule2/Visualization/CUTnRUN_GeneprofileMatrix -a 5000 -b 5000
print("done creating the matrix... plotting....")
!plotProfile -m Submodule2/Visualization/CUTnRUN_GeneprofileMatrix -o Submodule2/Visualization/CUTnRUN_Geneprofile.png --perGroup
print("done")

In [None]:
%cd $wd
Image(url= "Submodule2/Visualization/CUTnRUN_Geneprofile.png", width=400, height=400)

<img src="images/CUT&TagLogo.jpg" alt="Drawing" style="width: 120px;" align="left"/> Run the following command for CUT&Tag datasets.

This may take a few minutes: 🕘

In [None]:
%cd $wd
# -S option specifies the bigwig signal file, where we can specify multiple separated by spaces. -R option specifies the genome annotation bed file. -a and -b specify how many bp to plot on either side. 
!computeMatrix reference-point -S Submodule2/Visualization/RNAPII-S5P_CnT_con.mm39_chr4.bw Submodule2/Visualization/RNAPII-S5P_CnT_FLV.mm39_chr4.bw -R Submodule2/Submodule2/GenomeAnnotations/mm39v36_chr4_genes.bed -o Submodule2/Visualization/CUTnTag_GeneprofileMatrix -a 5000 -b 5000
print("done creating the matrix... plotting....")
!plotProfile -m Submodule2/Visualization/CUTnTag_GeneprofileMatrix -o Submodule2/Visualization/CUTnTag_Geneprofile.png --perGroup
print("done")

In [None]:
%cd $wd
Image(url= "Submodule2/Visualization/CUTnTag_Geneprofile.png", width=400, height=400)

Later, we'll do some downstream analysis to identify sites of differential occupancy and determine where they lie relative to genomic features.

<div class="alert-info" style="font-size:200%">
STEP 3: Peak Detection
</div>

<img src="images/mountain.jpg" alt="Drawing" style="width: 100px;"/>

Occupied sites are loci where reads pile up into "peaks". In the next steps, we'll identify these sites genome-wide.

<img src="images/peaksofsignal.jpg" alt="Drawing" style="width: 400px;"/>

To call peaks from ChIP-seq data, we'll use macs3. To call peaks from CUT&RUN and CUT&Tag data, we'll use SEACR, which is designed for CUT&RUN and CUT&Tag data (see [PMID 31300027](https://pubmed.ncbi.nlm.nih.gov/31300027/)).

<div class="alert-warning" style="font-size:200%" color="black">
A note about controls
</div>
In ChIP-seq, CUT&RUN, and CUT&Tag it is important to control for non-specific signal enrichment. The processing steps up until this point are exactly the same for those control samples, but we'll use the control samples during peak calling to provide a background level.

### Note that control samples are of different kinds for ChIP-seq (input) and CUT&RUN or CUT&Tag (IgG/non-specific antibody).
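As a rough illustration of why a control matters, the sketch below calls "peaks" on toy binned counts by requiring a minimum fold enrichment of treatment over control. This is only a conceptual toy, not the statistical model used by macs3 or SEACR; the function name `call_toy_peaks`, the threshold, and the counts are all invented for illustration.

```python
import numpy as np

def call_toy_peaks(treatment, control, min_fold=3.0, pseudocount=1.0):
    """Return half-open (start, end) bin intervals where treatment/control >= min_fold."""
    fold = (treatment + pseudocount) / (control + pseudocount)
    enriched = fold >= min_fold
    # Pad with False so every enriched run has a detectable start and end
    padded = np.concatenate(([False], enriched, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[::2], changes[1::2])]

# Invented counts per bin: bins 2-4 are enriched only in the treatment,
# while bins 7-8 are high in both treatment and control (non-specific signal)
treatment = np.array([1, 1, 9, 12, 10, 1, 1, 8, 8, 1], dtype=float)
control = np.array([1, 1, 1, 1, 1, 1, 1, 8, 8, 1], dtype=float)

toy_peaks = call_toy_peaks(treatment, control)
print(toy_peaks)
```

Bins 7-8 have high treatment signal but are not reported, because the control is just as high there. Without a control, that region would look like a binding site.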


<img src="images/ChIPseqLogo.jpg" alt="Drawing" style="width: 100px;" align="left" /> Run the following command for ChIP-seq datasets.
We've prepared the BAM files for the Input control samples for you and specify them in the following command.  

In [None]:
!macs3 callpeak -t $wd/Submodule2/Filtering/H3K27ac_ChIPseq_noauxdedup.bam -c $wd/Submodule2/Submodule2/ChIPSeqInputBam/Input_ChIPseq_noaux_dedup.bam -f BAM -g mm -n $wd/Submodule2/Peaks/H3K27ac_ChIPseq_noaux --nomodel
#Do it again for the other sample.
!macs3 callpeak -t $wd/Submodule2/Filtering/H3K27ac_ChIPseq_auxdedup.bam -c $wd/Submodule2/Submodule2/ChIPSeqInputBam/Input_ChIPseq_aux_dedup.bam -f BAM -g mm -n $wd/Submodule2/Peaks/H3K27ac_ChIPseq_aux --nomodel

#### Let's count how many "peaks" it called. 

In [None]:
!wc -l $wd/Submodule2/Peaks/H3K27ac_ChIPseq_noaux_peaks.narrowPeak
!wc -l $wd/Submodule2/Peaks/H3K27ac_ChIPseq_aux_peaks.narrowPeak

From the counts above, it looks like auxin-mediated degradation of BAF results in fewer identified H3K27ac sites. Later, we'll use a more rigorous statistical analysis to identify differences quantitatively.
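Beyond line counts, it can help to look inside the peak file. The sketch below parses narrowPeak records (tab-separated, with columns chrom, start, end, name, score, strand, signalValue, pValue, qValue, and summit offset; the p- and q-values are stored as -log10) and summarizes peak widths. The three records are invented for illustration; to use it on real output, one would pass an opened file such as `open("Submodule2/Peaks/H3K27ac_ChIPseq_noaux_peaks.narrowPeak")` instead of the `io.StringIO` object. The helper name `read_narrowpeak` is hypothetical.

```python
import io
import statistics

def read_narrowpeak(handle):
    """Parse narrowPeak lines into dictionaries with a few useful fields."""
    peaks = []
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        peaks.append({
            "chrom": fields[0],
            "start": int(fields[1]),
            "end": int(fields[2]),
            "signal": float(fields[6]),       # signalValue column
            "neg_log10_q": float(fields[8]),  # qValue column, stored as -log10
        })
    return peaks

# Invented example records, not real results
toy_file = io.StringIO(
    "chr4\t1000\t1500\tpeak_1\t120\t.\t8.5\t15.2\t12.1\t250\n"
    "chr4\t5000\t5200\tpeak_2\t45\t.\t3.1\t6.0\t4.2\t90\n"
    "chr4\t9000\t9800\tpeak_3\t300\t.\t15.0\t32.5\t29.0\t400\n"
)

peaks = read_narrowpeak(toy_file)
widths = [p["end"] - p["start"] for p in peaks]
strong = [p for p in peaks if p["neg_log10_q"] >= 5]

print(len(peaks), "peaks; median width", statistics.median(widths), "bp")
print(len(strong), "peaks with -log10(q) >= 5")
```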

#### Let's also visualize the peak calls.

In [None]:
%cd $wd
igv_notebook.init()
myigv = igv_notebook.Browser(
    {
        "genome": "mm39",
        "locus": "chr4:39,000,000-42,000,000"
    }
)
myigv.load_track(
{
        "name": "H3K27ac_ChIPseq_noaux",
        "url": "Submodule2/Visualization/H3K27ac_ChIPseq_noauxdedup.bw",
        "format": "bigwig",
        "type": "wig"
    }
    
)
myigv.load_track(
{
        "name": "H3K27ac -auxin peaks",
        "url": "Submodule2/Peaks/H3K27ac_ChIPseq_noaux_peaks.narrowPeak",
        "format": "bed",
        "type": "annotation"
    }
    
)
myigv.load_track(
{
        "name": "H3K27ac_ChIPseq_aux",
        "url": "Submodule2/Visualization/H3K27ac_ChIPseq_auxdedup.bw",
        "format": "bigwig",
        "type": "wig"
    }
    
)
myigv.load_track(
{
        "name": "H3K27ac +auxin peaks",
        "url": "Submodule2/Peaks/H3K27ac_ChIPseq_aux_peaks.narrowPeak",
        "format": "bed",
        "type": "annotation"
    }
    
)

<img src="images/CUT&RUNLogo.jpg" alt="Drawing" style="width: 120px;" align="left"/> Run the following command for CUT&RUN datasets.


In [None]:
#For CUT&RUN, we will use SEACR to call peaks in the BRG1_con and BRG1_FLV data against spike-normalized IgG control using the "non" and "stringent" modes. You may want to try calling peaks in the "relaxed" mode to compare how different they are.
%cd $wd/Submodule2/Peaks
!SEACR_1.3.sh $wd/Submodule2/Submodule2/Viz/BRG1_CnR_con.bedgraph $wd/Submodule2/Submodule2/Viz/IgG_CnR.bedgraph non stringent BRG1_CnR_con_peaks
print("done with BRG1_CnR_con")
!SEACR_1.3.sh $wd/Submodule2/Submodule2/Viz/BRG1_CnR_FLV.bedgraph $wd/Submodule2/Submodule2/Viz/IgG_CnR.bedgraph non stringent BRG1_CnR_FLV_peaks
print("done with BRG1_CnR_FLV")

#### Let's count how many "peaks" it called. 

In [None]:
!wc -l $wd/Submodule2/Peaks/BRG1_CnR_con_peaks.stringent.bed
!wc -l $wd/Submodule2/Peaks/BRG1_CnR_FLV_peaks.stringent.bed

#### In the cell below, try calling BRG1_con and BRG1_FLV peaks in the "relaxed" mode 

<img src="images/CUT&TagLogo.jpg" alt="Drawing" style="width: 120px;" align="left"/> Run the following command for CUT&Tag datasets.

In [None]:
#For CUT&Tag, we will use SEACR to call peaks in the RNAPII-S5P_con and RNAPII-S5P_FLV data against spike-normalized IgG control using the "non" and "stringent" modes. You may want to try calling peaks in the "relaxed" mode to compare how different they are.
%cd $wd/Submodule2/Peaks
!SEACR_1.3.sh $wd/Submodule2/Submodule2/Viz/RNAPII-S5P_CnT_con.bedgraph $wd/Submodule2/Submodule2/Viz/IgG_CnT.bedgraph non stringent RNAPII-S5P_CnT_con_peaks
print("done with RNAPII-S5P_CnT_con")
!SEACR_1.3.sh $wd/Submodule2/Submodule2/Viz/RNAPII-S5P_CnT_FLV.bedgraph $wd/Submodule2/Submodule2/Viz/IgG_CnT.bedgraph non stringent RNAPII-S5P_CnT_FLV_peaks
print("done with RNAPII-S5P_CnT_FLV")

#### Let's count how many "peaks" it called. 

In [None]:
!wc -l $wd/Submodule2/Peaks/RNAPII-S5P_CnT_con_peaks.stringent.bed
!wc -l $wd/Submodule2/Peaks/RNAPII-S5P_CnT_FLV_peaks.stringent.bed

#### In the cell below, try calling RNAPII-S5P_con and RNAPII-S5P_FLV peaks in the "relaxed" mode 

<div class="alert-info" style="font-size:200%">
STEP 4: Differential Peaks
</div>

We want to identify differences in occupancy. Many people new to this analysis will simply intersect peak calls, but we strongly recommend against that approach! 

Peak identification is not perfect, and identifying a peak in one sample does not mean there was no enriched signal in the other! It may only mean that the tool had difficulty identifying that site. Instead of intersecting peaks, let's use a more quantitative approach for differential analysis using [MAnorm](https://github.com/shao-lab/MAnorm).

MAnorm makes a master list of peaks (merging peaks called from either sample) and compares the actual signal at these peaks. In doing so, it uses the peaks common to both samples to fit a normalization model, which helps correct for differences in immunoprecipitation efficiency.
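The name MAnorm comes from the M and A values it computes for each peak: M is the log2 ratio of signal between the two samples, and A is the average log2 signal. The sketch below illustrates the idea on invented read counts. It is a simplification under stated assumptions: MAnorm itself works on read densities in peak windows and fits a robust regression, whereas ordinary least squares via `np.polyfit` is swapped in here, and the function names are hypothetical.

```python
import numpy as np

def ma_values(counts1, counts2, pseudocount=1.0):
    """M = log2 ratio of the two samples; A = mean log2 intensity."""
    c1 = np.asarray(counts1, dtype=float) + pseudocount
    c2 = np.asarray(counts2, dtype=float) + pseudocount
    m = np.log2(c1 / c2)
    a = 0.5 * np.log2(c1 * c2)
    return m, a

def normalize_m(m, a, is_common):
    """Fit M ~ A on common peaks only, then subtract the fitted trend from all peaks."""
    slope, intercept = np.polyfit(a[is_common], m[is_common], deg=1)
    return m - (slope * a + intercept)

# Invented counts: sample 1 is about 2x higher everywhere (as if its IP were
# more efficient), and only the last peak differs beyond that global offset
counts1 = [15, 31, 63, 127, 255]
counts2 = [7, 15, 31, 63, 7]
is_common = np.array([True, True, True, True, False])  # peaks called in both samples

m, a = ma_values(counts1, counts2)
m_norm = normalize_m(m, a, is_common)
print(np.round(m, 2))       # raw M values: every peak looks "higher" in sample 1
print(np.round(m_norm, 2))  # normalized M values: only the last peak stands out
```

A normalized M value of 1 or more in absolute terms corresponds to at least a two-fold difference, which matches the `M_above_1.0` and `M_below_-1.0` output file names used in the next cell.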

In [None]:
#We specify both peak files (--p1 and --p2), the format of which is narrowpeak (--pf). We also specify the reads in bam format (--r1 and --r2, and --rf).
!manorm --p1 $wd/Submodule2/Peaks/H3K27ac_ChIPseq_noaux_peaks.narrowPeak --p2 $wd/Submodule2/Peaks/H3K27ac_ChIPseq_aux_peaks.narrowPeak --pf narrowpeak --r1 $wd/Submodule2/Filtering/H3K27ac_ChIPseq_noauxdedup.bam --r2 $wd/Submodule2/Filtering/H3K27ac_ChIPseq_auxdedup.bam --rf bam --n1 H3K27ac_noauxin --n2 H3K27ac_auxin -o $wd/Submodule2/MANorm_H3K27ac_ChIPseq
#Let's copy these files to have an easier name
!cp $wd/Submodule2/MANorm_H3K27ac_ChIPseq/output_filters/H3K27ac_noauxin_vs_H3K27ac_auxin_M_above_1.0_biased_peaks.bed $wd/Submodule2/MANorm_H3K27ac_ChIPseq/H3K27ac_higher_in_noauxin.bed
!cp $wd/Submodule2/MANorm_H3K27ac_ChIPseq/output_filters/H3K27ac_noauxin_vs_H3K27ac_auxin_M_below_-1.0_biased_peaks.bed $wd/Submodule2/MANorm_H3K27ac_ChIPseq/H3K27ac_higher_in_auxin.bed

<div class="alert-warning" style="font-size:150%; color:black">
⬇️ Now, within the following block, try to type out the command to create a plot of the signal on loci with higher H3K27ac occupancy in the noauxin sample. ⬇️
</div>

<div class="alert alert-block alert-success" style="font-size:120%">
<span style="color:black">Congrats! You have successfully performed some filtering and downstream analysis! Using what you learned above, try adding some cells to this notebook to visualize the differential peaks in IGV and plot the average signal across them. 
In the next tutorial, we will compare the data to ATAC-seq and RNA-seq data. We highly recommend going through the NIH/NIGMS Sandbox modules on ATAC-seq and RNA-seq, which teach how to generate the differentially accessible sites and differentially expressed genes. We'll provide those files and focus on the comparison to chromatin occupancy from ChIP-seq, CUT&RUN, and CUT&Tag.</span>