Hunter Bennett | Glass Lab | Brain Aging Project | 19 Feb 2021

This notebook uses the differential peaks called in the previous notebook, along with H3K27Ac signal, to identify nucleosome-free regions (NFRs) within differential H3K27Ac peaks. We have not yet settled on a method for calling motifs; three options are currently under consideration:

1. Call motifs across 1000 bp broad peaks.
2. Call motifs on HOMER-identified nucleosome-free regions.
3. Call motifs on HisTrader-identified nucleosome-free regions.

See the analysis in microglia for a comparison of the relative benefits of each method. There, HOMER appeared to minimize false positives, while HisTrader maximized sensitivity at the cost of more false positives.

This notebook does the following:
1. Call NFRs using HisTrader  
    a. Make bedGraph files from the relevant merged tag directories.  
    b. Use HisTrader, the bedGraphs, and the differential peaks to identify NFRs.  
    c. Use HisTrader and the full set of variable-width peaks to identify background NFRs (takes a long time).  
    d. Use annotatePeaks.pl to rename the HisTrader peaks (create unique peak IDs) so that they play nicely with HOMER.  
2. Use bedtools to identify HOMER NFRs that lie within differential peaks.

In [1]:
### header ###
__author__ = "Hunter Bennett"
__license__ = "BSD"
__email__ = "hunter.r.bennett@gmail.com"
%load_ext autoreload
%autoreload 2
### imports ###
import sys
%matplotlib inline
import os
import re
import glob
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt 
import seaborn as sns
matplotlib.rcParams['savefig.dpi'] = 200
sns.set(font_scale=1)
sns.set_context('talk')
sns.set_style('white')

In [2]:
dataDirectory = '/data/mm10/Brain_MPSIIIA/ChIP/H3K27AC/NeuN/WT/'
inputDirectory = '/data/mm10/Brain_MPSIIIA/ChIP/input/NeuN/'
workingDirectory = '/home/h1bennet/brain_aging/results/01_NeuN_H3K27Ac/'
if not os.path.isdir(workingDirectory):
    os.mkdir(workingDirectory)
os.chdir(workingDirectory)

# Prep NFR predictions using HisTrader
___
HisTrader requires two inputs to predict NFRs:
1. BedGraph-style signal data.
2. A BED file of "broadPeaks" or HOMER -region peaks within which to perform the NFR predictions.

In my view, the easiest way to evaluate the NFR predictions is to predict NFRs within the differentially acetylated variable-width regions defined by HOMER. We can then compare those predictions, as well as the HOMER-defined NFRs, to the ATAC-seq peaks as "ground truth".
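
The comparison against ATAC-seq could be summarized with two overlap fractions. The following is a minimal sketch, not the method used in the microglia analysis: the function names `overlaps` and `fraction_overlapping` and the toy coordinates are hypothetical, and intervals are assumed to be BED-style half-open `(chrom, start, end)` tuples.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def fraction_overlapping(query, reference):
    """Fraction of query intervals overlapping at least one reference interval."""
    if not query:
        return 0.0
    # group reference intervals by chromosome to avoid cross-chromosome checks
    ref_by_chrom = defaultdict(list)
    for chrom, start, end in reference:
        ref_by_chrom[chrom].append((start, end))
    hits = 0
    for chrom, start, end in query:
        if any(overlaps(start, end, r_start, r_end)
               for r_start, r_end in ref_by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(query)


# toy data (made up for illustration, not from this project)
nfrs = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
atac = [("chr1", 150, 300), ("chr2", 80, 120)]

# share of predicted NFRs supported by an ATAC peak (a precision-like number)
nfr_supported = fraction_overlapping(nfrs, atac)
# share of ATAC peaks recovered by at least one NFR (a sensitivity-like number)
atac_recovered = fraction_overlapping(atac, nfrs)
print(nfr_supported, atac_recovered)
```

The pairwise scan is quadratic per chromosome, so for genome-scale peak sets an interval tool such as bedtools is likely the better choice; the sketch is only meant to make the two quantities explicit.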

## Generate bedGraphs using HOMER


In [4]:
if not os.path.isdir('./bedGraphs'):
    os.mkdir('./bedGraphs')

In [None]:
%%bash

# makeUCSCfile gzips the file passed to -o, so decompress it afterwards
makeUCSCfile ./merged_tagdirs/04_NeuN_4Month_ChIP_H3K27Ac_merged/ \
-o ./bedGraphs/04_NeuN_4Month_ChIP_H3K27Ac_merged.bedGraph

gunzip ./bedGraphs/04_NeuN_4Month_ChIP_H3K27Ac_merged.bedGraph.gz

makeUCSCfile ./merged_tagdirs/06_NeuN_20MonthPlus_ChIP_H3K27Ac_merged/ \
-o ./bedGraphs/06_NeuN_20MonthPlus_ChIP_H3K27Ac_merged.bedGraph

gunzip ./bedGraphs/06_NeuN_20MonthPlus_ChIP_H3K27Ac_merged.bedGraph.gz
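
Before handing a bedGraph to a downstream tool, it can be worth checking that it parses as four-column records. The sketch below is a hedged illustration: `parse_bedgraph` is a hypothetical helper, the example text is made up, and whether HisTrader tolerates the `track` header line that makeUCSCfile writes is not something this notebook establishes.

```python
import io


def parse_bedgraph(lines):
    """Parse bedGraph lines into (chrom, start, end, value) tuples.

    Header lines ("track", "browser") and comments are skipped.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        records.append((chrom, int(start), int(end), float(value)))
    return records


# made-up example mimicking a bedGraph with a track header line
example = """track type=bedGraph name="example"
chr1\t0\t25\t1.5
chr1\t25\t50\t3.0
"""
records = parse_bedgraph(io.StringIO(example))
print(len(records), "data rows parsed")
```

For a real file, the same function could be called on an open file handle, for example `parse_bedgraph(open(path))`, although a full bedGraph may be large enough that streaming is preferable to building a list.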

# Convert merged peaks to bed

In [15]:
if not os.path.isdir('./merged_bed/'):
    os.mkdir('./merged_bed/')

In [None]:
%%bash
# delete the existing script file (-f: no error if it does not exist yet)
rm -f ./peak_to_bed_merge.sh
# create a script file
touch ./peak_to_bed_merge.sh

In [14]:
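%%bash
# write one pos2bed.pl command per merged peak file into the script;
# the script is only generated here and still has to be run separately
for peakfile in ./merged_peaks/*txt;
do
    # swap the extension, then point the output at ./merged_bed/
    out=${peakfile/.txt/.bed}
    out=${out/merged_peaks/merged_bed}
    echo "pos2bed.pl -o $out $peakfile" >> ./peak_to_bed_merge.sh
done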

## Run HisTrader

In [13]:
if not os.path.isdir('./4month_vs_25month/histrader/'):
    os.mkdir('./4month_vs_25month/histrader/')

Run HisTrader for specific peaks

In [None]:
%%bash

perl ~/code/HisTrader/Histrader.pl \
--bedGraph ./bedGraphs/04_NeuN_4Month_ChIP_H3K27Ac_merged.bedGraph \
--peaks ./4month_vs_25month/bed_files/00_neun_4month_union_act_peaks.bed \
--out ./00_neun_4month_act_histrader

mv ./00_neun_4month_act_histrader* ./4month_vs_25month/histrader/

perl ~/code/HisTrader/Histrader.pl \
--bedGraph ./bedGraphs/06_NeuN_20MonthPlus_ChIP_H3K27Ac_merged.bedGraph \
--peaks ./4month_vs_25month/bed_files/01_neun_25month_union_act_peaks.bed \
--out ./01_neun_25month_act_histrader

mv ./01_neun_25month_act_histrader* ./4month_vs_25month/histrader/

Run HisTrader for all peaks

In [None]:
%%bash

perl ~/code/HisTrader/Histrader.pl \
--bedGraph ./bedGraphs/04_NeuN_4Month_ChIP_H3K27Ac_merged.bedGraph \
--peaks ./merged_bed/04_NeuN_4Month_vw_peaks_merged.bed \
--out ./00_neun_4month_bg_histrader

mv ./00_neun_4month_bg_histrader* ./4month_vs_25month/histrader/

perl ~/code/HisTrader/Histrader.pl \
--bedGraph ./bedGraphs/06_NeuN_20MonthPlus_ChIP_H3K27Ac_merged.bedGraph \
--peaks ./merged_bed/06_NeuN_20Month_vw_peaks_merged.bed \
--out ./01_neun_25month_bg_histrader

mv ./01_neun_25month_bg_histrader* ./4month_vs_25month/histrader/

I suppose we might have to run HisTrader on all peaks in the samples to get a reliable background set.

# Post-processing so these play nicely with HOMER
___
Unfortunately, the HisTrader peaks are not uniquely named; they retain the name of the original broadPeak they come from. We therefore need to rename them so that we can easily trace motifs back to particular peaks.

In [None]:
%%bash

annotatePeaks.pl ./4month_vs_25month/histrader/00_neun_4month_act_histrader.nfr.bed mm10 \
-size given > ./4month_vs_25month/histrader/00_neun_4month_act_histrader.nfr.txt

annotatePeaks.pl ./4month_vs_25month/histrader/01_neun_25month_act_histrader.nfr.bed mm10 \
-size given > ./4month_vs_25month/histrader/01_neun_25month_act_histrader.nfr.txt
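
As a complement to the annotatePeaks.pl call above, unique IDs could also be assigned directly to the BED rows. This is a minimal sketch under stated assumptions: `make_unique_names` is a hypothetical helper, the rows are made up, and the parent peak name is assumed to sit in the fourth BED column of the HisTrader output.

```python
from collections import Counter


def make_unique_names(bed_rows):
    """Return copies of BED rows with a running index appended to each name.

    bed_rows: list of lists such as [chrom, start, end, name, ...].
    Assumes the (non-unique) parent peak name is in column 4.
    """
    seen = Counter()
    renamed = []
    for row in bed_rows:
        name = row[3]
        seen[name] += 1
        # keep the parent name so each NFR can still be traced to its broad peak
        renamed.append(row[:3] + [f"{name}_nfr{seen[name]}"] + row[4:])
    return renamed


# made-up rows: two NFRs from the same parent peak, one from another
rows = [
    ["chr1", "100", "150", "peakA"],
    ["chr1", "300", "360", "peakA"],
    ["chr2", "10", "60", "peakB"],
]
unique_rows = make_unique_names(rows)
for row in unique_rows:
    print("\t".join(row))
```

Keeping the parent name as a prefix means a motif hit in, say, `peakA_nfr2` can be mapped back to the differential peak `peakA` by stripping the suffix.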

# Select HOMER NFRs

Since we have already made the BED files, this is just a simple intersect command.

In [3]:
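%%bash

# make sure the output directory exists before redirecting into it
mkdir -p ./4month_vs_25month/homer_nfr

bedtools intersect -wa -a ./merged_bed/04_NeuN_4Month_nfr_peaks_merged.bed \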
-b ./4month_vs_25month/bed_files/00_neun_4month_union_act_peaks.bed \
> ./4month_vs_25month/homer_nfr/00_neun_4month_homer_nfr_act_peaks.bed

bedtools intersect -wa -a ./merged_bed/06_NeuN_20Month_nfr_peaks_merged.bed \
-b ./4month_vs_25month/bed_files/01_neun_25month_union_act_peaks.bed \
> ./4month_vs_25month/homer_nfr/01_neun_25month_homer_nfr_act_peaks.bed
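
With `-wa`, bedtools writes the A entry once for each overlapping B feature, so an NFR that touches more than one differential peak would appear more than once in the output. Whether that happens in these files has not been checked here; the sketch below, with the hypothetical helper `drop_duplicate_intervals` and made-up lines, shows one way to collapse such repeats after the fact.

```python
def drop_duplicate_intervals(bed_lines):
    """Keep the first occurrence of each (chrom, start, end) in BED lines."""
    seen = set()
    unique = []
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        key = tuple(fields[:3])
        if key not in seen:
            seen.add(key)
            unique.append(line)
    return unique


# made-up intersect output in which the first NFR was reported twice
lines = [
    "chr1\t100\t150\tnfr1\n",
    "chr1\t100\t150\tnfr1\n",
    "chr1\t300\t360\tnfr2\n",
]
deduplicated = drop_duplicate_intervals(lines)
print(len(lines), "->", len(deduplicated))
```

An alternative that avoids the extra step is the `-u` option of `bedtools intersect`, which reports each A entry at most once when it overlaps anything in B.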