Hunter Bennett | Glass Lab | Brain Aging Project | 12 Feb 2021  

The goal of this notebook is to compare the nucleosome-free regions (NFRs) called by HOMER and HisTrader to the nucleosome-free regions defined by ATAC-seq. This will inform the analysis in other cell types and determine whether we can use the H3K27Ac-defined NFRs for motif finding.

In [3]:
### header ###
__author__ = "Hunter Bennett"
__license__ = "BSD"
__email__ = "hunter.r.bennett@gmail.com"
%load_ext autoreload
%autoreload 2
### imports ###
import sys
%matplotlib inline
import os
import re
import glob
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt 
import seaborn as sns
matplotlib.rcParams['savefig.dpi'] = 200
sns.set(font_scale=1)
sns.set_context('talk')
sns.set_style('white')

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [4]:
dataDirectory = '/data/mm10/Brain_MPSIIIA/ATAC/Microglia/'
workingDirectory = '/home/h1bennet/brain_aging/results/00_Microglia_CompareATACNFR/'
if not os.path.isdir(workingDirectory):
    os.mkdir(workingDirectory)
os.chdir(workingDirectory)

# Copy in peak files from other analyses

In [5]:
if not os.path.isdir('./peak_files'):
    os.mkdir('./peak_files')

### First copy in the NFR peaks

In [None]:
%%bash
cp ../00_PU1_H3K27Ac_4month_vs_25month/merged_peaks/nfr_peaks_merged.txt \
./peak_files/

### Next copy in the IDR peaks

In [None]:
%%bash
cp ../00_Microglia_ATAC/merged_peaks/idr_peaks_merged.txt \
./peak_files/

# Use HOMER mergePeaks to check for overlap

In [7]:
%%bash

cd ./peak_files/

mergePeaks -prefix merge -matrix overlap_matrix \
idr_peaks_merged.txt nfr_peaks_merged.txt \
> idr_nfr_peaks_merged.txt

	Max distance to merge: direct overlap required (-d given)
	Merging peaks... 
	Comparing idr_peaks_merged.txt (119988 total) and idr_peaks_merged.txt (119988 total)
	Comparing idr_peaks_merged.txt (119988 total) and nfr_peaks_merged.txt (105306 total)
	Comparing nfr_peaks_merged.txt (105306 total) and idr_peaks_merged.txt (119988 total)
	Comparing nfr_peaks_merged.txt (105306 total) and nfr_peaks_merged.txt (105306 total)

idr_peaks_merged.txt	nfr_peaks_merged.txt	Total	Name
	X	61683	nfr_peaks_merged.txt
X		77918	idr_peaks_merged.txt
X	X	38516	idr_peaks_merged.txt|nfr_peaks_merged.txt


The overlap here is pretty bad: only about 1/3 of the IDR ATAC peaks overlap HOMER NFR peaks. This led us to look for other tools to predict ATAC-seq peaks, which is how we found HisTrader!
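To make the "about 1/3" figure explicit, here is a minimal sketch that recomputes the overlap fractions from the Venn counts printed by mergePeaks above. The helper name `overlap_fractions` is mine, and note that the counts are merged regions, so they will not sum exactly to the raw peak totals reported in the log.

```python
# Counts copied from the mergePeaks Venn output above (merged regions, not raw peaks).
nfr_only = 61683
idr_only = 77918
shared = 38516


def overlap_fractions(only_a, only_b, both):
    """Return (fraction of A regions shared with B, fraction of B regions shared with A)."""
    return both / (only_a + both), both / (only_b + both)


frac_idr, frac_nfr = overlap_fractions(idr_only, nfr_only, shared)
print(f"ATAC IDR regions overlapping a HOMER NFR: {frac_idr:.1%}")
print(f"HOMER NFR regions overlapping an ATAC IDR peak: {frac_nfr:.1%}")
```

By this count roughly 33% of the ATAC IDR regions and roughly 38% of the HOMER NFR regions fall in the shared set.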

# Prep NFR predictions using HisTrader
___
HisTrader requires two inputs to do NFR predictions:
1. BedGraph style peak data.
2. Bed file "broadPeaks" or homer -regions within which to perform NFR predictions.

In my view, the easiest way to evaluate the NFR predictions is to predict NFRs within the differentially acetylated variable-width regions defined by HOMER. We can then compare both those predictions and the HOMER-defined NFRs to the ATAC-seq peaks as "ground truth".
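Since both inputs are plain tab-delimited interval files, a quick format check before running HisTrader can save a failed run. The sketch below is my own helper (`count_valid_intervals` is a hypothetical name, not part of HisTrader or HOMER) and it assumes standard BED (at least 3 columns) and bedGraph (4 columns) layouts, skipping UCSC-style `track`/`browser` header lines. The toy rows are made up for illustration.

```python
def count_valid_intervals(lines, min_columns=3):
    """Count interval rows, raising ValueError on the first malformed row.

    Use min_columns=3 for BED and min_columns=4 for bedGraph
    (chrom, start, end, value).
    """
    n_rows = 0
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        # Skip blank lines and UCSC-style header/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        if len(fields) < min_columns:
            raise ValueError(f"line {line_number}: expected >= {min_columns} columns")
        start, end = int(fields[1]), int(fields[2])
        if start < 0 or end <= start:
            raise ValueError(f"line {line_number}: bad interval {start}-{end}")
        if min_columns >= 4:
            float(fields[3])  # bedGraph signal must be numeric
        n_rows += 1
    return n_rows


# Toy rows (made up for illustration, not taken from the real files).
toy_bedgraph = ["track type=bedGraph name=toy", "chr1\t100\t200\t3.5", "chr1\t200\t300\t0"]
toy_bed = ["chr1\t50\t400"]
n_bedgraph = count_valid_intervals(toy_bedgraph, min_columns=4)
n_bed = count_valid_intervals(toy_bed)
print(n_bedgraph, n_bed)
```

For the real files the same function can be given an open file handle, e.g. `with open(path) as fh: count_valid_intervals(fh, min_columns=4)`.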

## Generate bedGraphs using HOMER

In [None]:
if not os.path.isdir('./bedGraphs'):
    os.mkdir('./bedGraphs')

In [None]:
%%bash

makeUCSCfile ../00_Microglia_H3K27Ac/merged_tagdirs/00_all_microglia_H3K27Ac_4months/ \
-o ./bedGraphs/00_all_microglia_H3K27Ac_4months

gunzip ./bedGraphs/00_all_microglia_H3K27Ac_4months.bedGraph.gz

makeUCSCfile ../00_Microglia_H3K27Ac/merged_tagdirs/01_all_microglia_H3K27Ac_25months/ \
-o ./bedGraphs/01_all_microglia_H3K27Ac_25months

gunzip ./bedGraphs/01_all_microglia_H3K27Ac_25months.bedGraph.gz

## Copy over differentially acetylated regions (in bed format)

In [8]:
if not os.path.isdir('./differential_bed'):
    os.mkdir('./differential_bed')

In [9]:
!ls differential_bed/

00_pu1_4month_act_peaks.bed  01_pu1_25month_act_peaks.bed


In [12]:
ls ../00_PU1_H3K27Ac_4month_vs_25month/bed_files/

00_pu1_4month_nfr_act_background.bed
00_pu1_4month_nfr_act_background_distal.bed
00_pu1_4month_nfr_act_peaks.bed
00_pu1_4month_nfr_act_peaks_distal.bed
00_pu1_4month_union_act_peaks.bed
00_pu1_4month_union_act_peaks_distal.bed
01_pu1_25month_nfr_act_background.bed
01_pu1_25month_nfr_act_background_distal.bed
01_pu1_25month_nfr_act_peaks.bed
01_pu1_25month_nfr_act_peaks_distal.bed
01_pu1_25month_union_act_peaks.bed
01_pu1_25month_union_act_peaks_distal.bed
fw_peaks_merged.bed
nfr_peaks_merged.bed
vw_peaks_merged.bed


In [15]:
%%bash

cp ../00_PU1_H3K27Ac_4month_vs_25month/bed_files/00_pu1_4month_union_act_peaks.bed \
./differential_bed/00_pu1_4month_act_peaks.bed

cp ../00_PU1_H3K27Ac_4month_vs_25month/bed_files/01_pu1_25month_union_act_peaks.bed \
./differential_bed/01_pu1_25month_act_peaks.bed

## Run HisTrader

In [16]:
%%bash

perl ~/code/HisTrader/Histrader.pl \
--bedGraph ./bedGraphs/00_all_microglia_H3K27Ac_4months.bedGraph \
--peaks ./differential_bed/00_pu1_4month_act_peaks.bed \
--out ./00_pu1_4month_act_histrader

# make sure the output directory exists before moving results into it
mkdir -p ./histrader
mv ./00_pu1_4month_act_histrader* ./histrader/



########################################################################################################
##                                                                                                    ##
##    HISTRADER: A tool to identify nucleosome free regions from ChIP-Seq of Histone Modifications    ##
##                                                                                                    ##
##                                                                                                    ##
##                                Written by Yifei Yan and Swneke D. Bailey                           ##
##                                   Copyright 2020 Swneke D. Bailey                                  ##
##                                                                                                    ##
########################################################################################################


Identifying Valleys in ./bedGraphs/00_all_microglia

In [None]:
%%bash

perl ~/code/HisTrader/Histrader.pl \
--bedGraph ./bedGraphs/01_all_microglia_H3K27Ac_25months.bedGraph \
--peaks ./differential_bed/01_pu1_25month_act_peaks.bed \
--out ./01_pu1_25month_act_histrader

# make sure the output directory exists before moving results into it
mkdir -p ./histrader
mv ./01_pu1_25month_act_histrader* ./histrader/

## TODO:
1. Pull ATAC-seq peaks within these regions using bedtools
2. Use mergePeaks to compare NFRs called by HOMER and HisTrader to ATAC-seq peaks.
3. Visualize on browser.
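
As a placeholder for TODO step 1, here is a rough pure-Python sketch of what "pull ATAC-seq peaks within these regions" means in terms of interval overlap. This is not bedtools and is not meant to replace it (for the real files `bedtools intersect` is the tool to use); the function name `overlapping` and the toy intervals are made up for illustration, and intervals are assumed to be half-open as in BED.

```python
from collections import defaultdict


def overlapping(query, reference):
    """Return query intervals (chrom, start, end) that overlap any reference interval.

    Intervals are half-open, as in BED: [start, end).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in reference:
        by_chrom[chrom].append((start, end))
    hits = []
    for chrom, start, end in query:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < r_end and r_start < end for r_start, r_end in by_chrom.get(chrom, ())):
            hits.append((chrom, start, end))
    return hits


# Toy intervals (made up): three "ATAC peaks" and two "acetylated regions".
toy_atac = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
toy_regions = [("chr1", 150, 400), ("chr2", 300, 400)]
toy_hits = overlapping(toy_atac, toy_regions)
print(toy_hits)
```

The scan is quadratic per chromosome, which is fine for a toy check but too slow for the ~100k-peak files here; that is another reason to do the real comparison with bedtools or mergePeaks.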