<h1 style="font-size: 40px; margin-bottom: 0px;">8.2 Chromatin domains</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 600px;"></hr>

If we have time, we'll take a look at chromatin markers that signify an active enhancer region, specifically around our focused set of overlapping peaks. This will give us an idea of which of our peaks overlap with enhancers that have been marked as active based on the region's epigenetic signature. We'll continue to make use of HOMER, particularly the <code>annotatePeaks.pl</code> program, but we'll provide another option to slightly change the way it runs.

<strong>Learning objectives:</strong>

<ul>
    <li>Continue to practice using HOMER to analyze ChIP-seq data</li>
    <li>Generate heatmaps of read coverage around a region of interest</li>
    <li>Infer enhancer or promoter activity based on epigenetic signatures</li>
    <li>Plot TF read coverage alongside histone read coverage</li>
</ul>

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

<h1 style="font-size: 40px; margin-bottom: 0px;">Analyze epigenetic signatures</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 600px;"></hr>

I have a <code>histone</code> directory located within our shared <code>chip</code> directory, containing the bedGraph files from <a href="https://doi.org/10.1186/1471-2164-15-331" rel="noopener noreferrer" target="_blank"><u>Rhie et al 2014</u></a> (<a href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE49651" rel="noopener noreferrer" target="_blank"><u>GSE49651</u></a>), which Zanconato et al 2015 also used in their analyses. We'll be using their read coverage data to get an idea of the epigenetic signature around our peaks. 

Here, we'll call up the <code>annotatePeaks.pl</code> program and make use of its histogram mode. But rather than have it simply provide us with the read density along all peaks in a single column, we can tell it to provide us with a data matrix, where we can then see the read coverage for each peak individually.

<h3>Create <code>heatmap</code> directory and enter it</h3>

Let's create a new directory called <code>heatmap</code> in this week's directory and change into it.

<pre style="width: 450px; margin-top: 15px; margin-bottom: 15px; color: #000000; background-color: #EEEEEE; border: 1px solid; border-color: #AAAAAA; padding: 10px; border-radius: 15px; font-size: 12px;">mkdir heatmap</pre>

<pre style="width: 450px; margin-top: 15px; margin-bottom: 15px; color: #000000; background-color: #EEEEEE; border: 1px solid; border-color: #AAAAAA; padding: 10px; border-radius: 15px; font-size: 12px;">cd heatmap</pre>

<hr style="border: 1px solid; border-color: #BBBBBB;"></hr>

<h3>Obtain data matrix of read densities for H3K4me1</h3>

<pre style="width: 450px; margin-top: 15px; margin-bottom: 15px; color: #000000; background-color: #EEEEEE; border: 1px solid; border-color: #AAAAAA; padding: 10px; border-radius: 15px; font-size: 12px;">annotatePeaks.pl \
~/MCB201B_F2024/Week_8/top_concordant_peaks.narrowPeak \
hg19 \
-bedGraph ~/shared/course/mcb201b-shared-readwrite/chip/histone/h3k4me1-1.bedGraph \
-size 2000 \
-hist 10 \
-ghist \
> h3k4me1_heatmap.txt</pre>

<strong>Let's break down the code:</strong>

<code>annotatePeaks.pl</code>

This calls up the annotatePeaks program.

<hr style="border: 1px solid; border-color: #AAAAAA;"></hr>

<code>~/MCB201B_F2024/Week_8/top_concordant_peaks.narrowPeak</code>

Here, we provide HOMER with our focused set of overlapping peaks.

<hr style="border: 1px solid; border-color: #AAAAAA;"></hr>

<code>-bedGraph ~/shared/course/mcb201b-shared-readwrite/chip/histone/h3k4me1-1.bedGraph</code>

We provide it with a coverage map of the reads associated with one of our histone marks. This will give us an idea of the activity of the peak region, since each of these marks is associated with an active enhancer or promoter site.


<hr style="border: 1px solid; border-color: #AAAAAA;"></hr>

<code>-size 2000</code>

Like yesterday, we use the <code>-size</code> option to specify the size of the region around our peaks that we are interested in looking at.

<hr style="border: 1px solid; border-color: #AAAAAA;"></hr>

<code>-hist 10</code>

We provide the bin size and also switch to histogram mode, so the annotatePeaks program will provide us with information on read distributions.
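As a rough check on what <code>-size 2000</code> and <code>-hist 10</code> imply together, the sketch below works out the bin positions we might expect in the output. It assumes HOMER reports one bin every 10 bp from -1000 to +1000 relative to the peak center, inclusive; the variable names are ours and are not part of HOMER.

```python
import numpy as np

size = 2000      # value passed to -size (total window width in bp)
bin_width = 10   # value passed to -hist (bin size in bp)

# Assumption: bins run from -size/2 to +size/2 inclusive, one every bin_width bp.
offsets = np.arange(-size // 2, size // 2 + 1, bin_width)

print("number of bins:", len(offsets))
print("first bins:", offsets[:3])
print("last bins:", offsets[-3:])
```

If the number of position columns in your output file differs from this count, it is worth checking the options you passed before moving on.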

<hr style="border: 1px solid; border-color: #AAAAAA;"></hr>

<code>-ghist</code>

You can kind of think of the <code>-ghist</code> option as switching to a different sub-mode of the histogram mode. Instead of providing us with the read coverage for all peaks at once, it will give us the read coverage for each peak in the form of a data matrix, where the rows correspond to different peaks, and the columns provide us with the read densities along those positions for each peak.
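To make the rows-by-columns layout concrete, here is a small sketch using a made-up toy matrix (3 peaks by 5 position bins; the numbers are hypothetical and not taken from our data). Averaging down the columns collapses the per-peak rows into a single profile, which is similar in spirit to what histogram mode reports when <code>-ghist</code> is left out.

```python
import numpy as np

# Hypothetical toy matrix: rows are peaks, columns are position bins.
toy_matrix = np.array([
    [0.0, 1.0, 4.0, 1.0, 0.0],
    [0.0, 2.0, 6.0, 2.0, 0.0],
    [1.0, 0.0, 2.0, 0.0, 1.0],
])

# One value per position bin, averaged over all peaks.
mean_profile = toy_matrix.mean(axis=0)

print("matrix shape (peaks, bins):", toy_matrix.shape)
print("mean profile:", mean_profile)
```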

<hr style="border: 1px solid; border-color: #AAAAAA;"></hr>

<code>&gt; h3k4me1_heatmap.txt</code>

We specify our output file that we want the data matrix to be stored in. This redirects the annotatePeaks output from the Terminal window to a .txt file that we can open up in Python to visualize the data.

<h2>Plot heatmap of read densities</h2>

Now let's make use of seaborn's heatmap function to visualize the read densities of H3K4me1 marks around our peaks.

First import our data matrix:
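One way to do this is sketched below. It assumes the <code>-ghist</code> output is a tab-delimited text file with a header row, and that you run it from inside the <code>heatmap</code> directory; the helper name <code>load_ghist_matrix</code> is ours, chosen for illustration.

```python
from pathlib import Path

import pandas as pd


def load_ghist_matrix(path):
    """Read a tab-delimited annotatePeaks.pl -ghist output into a DataFrame."""
    return pd.read_csv(path, sep="\t")


# Only load the file if it is present in the current directory.
heatmap_path = Path("h3k4me1_heatmap.txt")
if heatmap_path.exists():
    h3k4me1 = load_ghist_matrix(heatmap_path)
    print(h3k4me1.shape)           # (number of peaks, number of columns)
    print(h3k4me1.iloc[:5, :5])    # peek at the top-left corner
```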

Then, let's drop the first column to keep things simple for us later on.
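A minimal sketch of this step follows. It assumes the first column holds the peak identifiers and every remaining column holds read densities; the helper name <code>drop_id_column</code> is ours.

```python
from pathlib import Path

import pandas as pd


def drop_id_column(matrix):
    """Return the matrix without its first column (assumed to hold peak IDs)."""
    return matrix.iloc[:, 1:]


heatmap_path = Path("h3k4me1_heatmap.txt")
if heatmap_path.exists():
    h3k4me1 = pd.read_csv(heatmap_path, sep="\t")
    h3k4me1_values = drop_id_column(h3k4me1)
    print(h3k4me1_values.shape)
```

Selecting by position with <code>iloc</code> avoids having to know what the first column is called in the header.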

Now, let's plot the heatmap using seaborn.
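The sketch below shows one possible way to draw it. The colormap, figure size, tick spacing, and axis labels are our own choices rather than anything required by HOMER or seaborn, and the helper name <code>plot_coverage_heatmap</code> is hypothetical. We assume the first column of the file holds peak identifiers and skip it.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_coverage_heatmap(values, title):
    """Draw read densities as a heatmap: one row per peak, one column per bin."""
    fig, ax = plt.subplots(figsize=(6, 8))
    sns.heatmap(
        values,
        cmap="viridis",
        robust=True,          # clip the color range to limit the effect of outliers
        xticklabels=20,       # label every 20th bin to keep the axis readable
        yticklabels=False,    # too many peaks to label individually
        ax=ax,
    )
    ax.set_xlabel("Distance from peak center (bp)")
    ax.set_ylabel("Peaks")
    ax.set_title(title)
    return ax


heatmap_path = Path("h3k4me1_heatmap.txt")
if heatmap_path.exists():
    h3k4me1_values = pd.read_csv(heatmap_path, sep="\t").iloc[:, 1:]
    plot_coverage_heatmap(h3k4me1_values, "H3K4me1")
    plt.show()
```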

And we can reorder it so that the peaks with the highest mean reads are up top.
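One way to do the reordering is sketched here: compute the mean read density of each row, sort those means from highest to lowest, and use the resulting order to rearrange the rows. The helper name <code>sort_by_mean_coverage</code> is ours, and we again assume the first column of the file holds peak identifiers.

```python
from pathlib import Path

import pandas as pd


def sort_by_mean_coverage(values):
    """Reorder rows so the peaks with the highest mean read density come first."""
    order = values.mean(axis=1).sort_values(ascending=False).index
    return values.loc[order]


heatmap_path = Path("h3k4me1_heatmap.txt")
if heatmap_path.exists():
    h3k4me1_values = pd.read_csv(heatmap_path, sep="\t").iloc[:, 1:]
    h3k4me1_sorted = sort_by_mean_coverage(h3k4me1_values)
    print(h3k4me1_sorted.head())
```

The sorted DataFrame can then be passed to seaborn's heatmap function in place of the unsorted one.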

<h1 style="font-size: 40px; margin-bottom: 0px;">Analyze other histone marks</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 600px;"></hr>

If we have time, see if you can do the same for the other histone marks.

<h3>Obtain data matrix of read densities for H3K4me3</h3>

<pre style="width: 450px; margin-top: 15px; margin-bottom: 15px; color: #000000; background-color: #EEEEEE; border: 1px solid; border-color: #AAAAAA; padding: 10px; border-radius: 15px; font-size: 12px;">annotatePeaks.pl \
~/MCB201B_F2024/Week_8/top_concordant_peaks.narrowPeak \
hg19 \
-bedGraph ~/shared/course/mcb201b-shared-readwrite/chip/histone/h3k4me3-1.bedGraph \
-size 2000 \
-hist 10 \
-ghist \
> h3k4me3_heatmap.txt</pre>

<hr style="border: 1px solid; border-color: #BBBBBB;"></hr>

<h3>Obtain data matrix of read densities for H3K27Ac</h3>

<pre style="width: 450px; margin-top: 15px; margin-bottom: 15px; color: #000000; background-color: #EEEEEE; border: 1px solid; border-color: #AAAAAA; padding: 10px; border-radius: 15px; font-size: 12px;">annotatePeaks.pl \
~/MCB201B_F2024/Week_8/top_concordant_peaks.narrowPeak \
hg19 \
-bedGraph ~/shared/course/mcb201b-shared-readwrite/chip/histone/h3k27ac-1.bedGraph \
-size 2000 \
-hist 10 \
-ghist \
> h3k27ac_heatmap.txt</pre>

<h2>Plot their heatmaps</h2>
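A possible approach is sketched below: load each matrix, skip the first column (assumed to hold peak identifiers), and draw the heatmaps side by side. The layout, colormap, and helper name <code>plot_mark_heatmaps</code> are our own choices for illustration. Each panel keeps its own row order and color scale here, so compare the panels with that in mind.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_mark_heatmaps(matrices):
    """Draw one heatmap per histone mark, side by side, from a dict of DataFrames."""
    n = len(matrices)
    fig, axes = plt.subplots(1, n, figsize=(4 * n, 8), squeeze=False)
    for ax, (mark, values) in zip(axes[0], matrices.items()):
        sns.heatmap(values, cmap="viridis", robust=True,
                    xticklabels=False, yticklabels=False, ax=ax)
        ax.set_title(mark)
        ax.set_xlabel("Position around peak")
    fig.tight_layout()
    return fig


files = {
    "H3K4me1": "h3k4me1_heatmap.txt",
    "H3K4me3": "h3k4me3_heatmap.txt",
    "H3K27Ac": "h3k27ac_heatmap.txt",
}
# Only plot once all three output files exist in the current directory.
if all(Path(path).exists() for path in files.values()):
    matrices = {mark: pd.read_csv(path, sep="\t").iloc[:, 1:]
                for mark, path in files.items()}
    plot_mark_heatmaps(matrices)
    plt.show()
```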

<h1 style="font-size: 40px; margin-bottom: 0px;">References</h1>

<hr style="margin-left: 0px; border: 0.25px solid; border-color: #000000; width: 400px;"></hr>

<a href="https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-15-331" rel="noopener noreferrer" target="_blank"><u>Rhie et al 2014 BMC Genomics:</u></a> Nucleosome positioning and histone modifications define relationships between regulatory elements and nearby gene expression in breast epithelial cells