layout | permalink | title | header1 | header2 | image | home | description | author |
---|---|---|---|---|---|---|---|---|
tutorial_page |
/EPI_2021_Module2_lab |
EPI 2021 Module 2 Lab |
Workshop Pages for Students |
Epigenomic Analysis 2021 |
/site_images/CBW_Epigenome-data_icon.jpg |
Introduction to ChIP-seq and analysis |
Martin Hirst and Edmund Su |
by Martin Hirst and Edmund Su
Tool | Website | Info |
---|---|---|
BWA | http://bio-bwa.sourceforge.net/bwa.shtml | Genomic Sequence Read alignment tool |
PICARD | https://broadinstitute.github.io/picard/ | Toolkit for BAM file manipulation |
SAMTOOLS | http://www.htslib.org/doc/samtools.html | Toolkit for BAM file manipulation |
BEDTOOLS | https://bedtools.readthedocs.io/en/latest/index.html | Toolkit for BED/BEDGRAPH file manipulation |
MACS2 | https://github.com/macs3-project/MACS | Enriched region identifier for ChIP-seq |
UCSCtools | http://genome.ucsc.edu/goldenPath/help/bigWig.html | Manipulation of Wigs and Bed/Bedgraph to binary forms |
File name | File location | Source |
---|---|---|
BWA INDEX | ~/CourseData/EPI_data/Module1/BWA_index/ | |
Enhancer file.bed | ~/CourseData/EPI_data/Module1/QC_resources/ | download various state7 and merge https://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final/ |
TSS.bed | ~/CourseData/EPI_data/Module1/QC_resources/ | Generated by downloading Ensemblv79 GTF convert to Bed +/-2kb of TSS. See https://www.biostars.org/p/56280/ |
HOX regions.bed | ~/CourseData/EPI_data/Module1/QC_resources/ | Generated by downloading Ensemblv79 GTF convert to Bed then selecting for HOX |
Hg38 Black list regions | ~/CourseData/EPI_data/Module1/QC_resources/ | https://www.encodeproject.org/files/ENCFF356LFX/@@download/ENCFF356LFX.bed.gz |
1. Dedup BAM file
2. Running a peak caller
3. Remove blacklist regions
4. Visualization
mkdir ~/workspace/peaks
samtools index -@4 ~/CourseData/EPI_data/Module1/MCF10A_resources/*.bam
treatment_bam=~/CourseData/EPI_data/Module1/MCF10A_resources/MCF10A_H3K27ac.bam
treatment_dedup=~/workspace/peaks/MCF10A_H3K27ac.dedup.bam
samtools view -@4 ${treatment_bam} -bh -q10 -F1028 -o ${treatment_dedup}
input_bam=~/CourseData/EPI_data/Module1/MCF10A_resources/MCF10A_Input.bam
input_dedup=~/workspace/peaks/MCF10A_Input.dedup.bam
samtools view -@4 ${treatment_bam} -bh -q10 -F1028 -o ${treatment_dedup}
samtools view -@4 ${treatment_bam} -bh -q10 -F1028 chr19# {"-F1028": removes reads that unmapped and duplicated,
"-@4": uses two threads,
"-bh": outputs in binary,
"q10": MAPQ must be >=10
This step will take a while, copies have been prepared in resources. Smallest file taking 3 minutes. Longest taking 10.
mkdir -p ~/workspace/peaks
treatment_frag=~/CourseData/EPI_data/Module1/MCF10A_resources/MCF10A_H3K27ac.dedup.bam
input_frag=~/CourseData/EPI_data/Module1/MCF10A_resources/MCF10A_Input.dedup.bam
name=MCF10A_H3K27ac
macs3 callpeak -t ${treatment_frag} -c ${input_frag} -f BAMPE -g hs -n ${name} --keep-dup all --outdir ~/workspace/peaks/ --bdg 1> ~/workspace/peaks/${name}.out.log 2> ~/workspace/peaks/${name}.err.log
macs3 callpeak \ #General purpose peak calling mode
-t ${treatment_frag} \ #Treatment file, can provide more than one to "pool"
-c ${input_frag} \ #Control File/Input
-f BAMPE \ #Instructs MACS2 on what kind of file to expect. Single/Paired-end bed/bam
-g hs \ #Sets appropriate genome size for background sampling hs=human, mm=humouse
-n ${name} \ #name
#-q 0.05 #FDR q value default
#-broad #"broad mode" for broad marks - stitches small peaks together
--outdir ~/workspace/peaks/ \ #where to output files otherwise stores in current working directory
--bdg \ #outputs pileup into bedgraphs
1> ~/workspace/peaks/${name}.out.log \ #output log
2> ~/workspace/peaks/${name}.err.log #error log
List for common marks but is not exhaustive. Best practice would be to observe trends on genome browser and then decide.
Narrow Marks:
H3K27ac
H3K4me3
Broad Marks:
H3K27me3
H3K4me1
H3K36me3
H3K36me2
H3K9me3
blacklist=~/CourseData/EPI_data/Module1/QC_resources/hg38_blacklist.bed
sample="MCF10A_H3K27ac_peaks"
bedtools intersect -v -a ~/workspace/peaks/${sample}.narrowPeak -b ${blacklist} > ~/workspace/peaks/${sample}.blacklistRemoved.narrowPeak
bedtools intersect -u -a ~/workspace/peaks/${sample}.narrowPeak -b ${blacklist} > ~/workspace/peaks/${sample}.blacklist.narrowPeak
sample="MCF10A_H3K27ac_peaks"
wc -l ~/workspace/peaks/${sample}.blacklistRemoved.narrowPeak
head ~/workspace/peaks/${sample}.blacklistRemoved.narrowPeak -n5
32384 /home/ubuntu/workspace/peaks/MCF10A_H3K27ac_peaks.blacklistRemoved.narrowPeak
chr1 10003 10467 MCF10A_H3K27ac_peak_1 2160 . 47.2436 221.596 216.028 179
chr1 17283 17514 MCF10A_H3K27ac_peak_2 27 . 3.9983 5.1706 2.76136 82
chr1 180312 180997 MCF10A_H3K27ac_peak_3 803 . 28.8766 85.1306 80.3433 461
chr1 777953 778720 MCF10A_H3K27ac_peak_4 210 . 10.1154 24.4429 21.0334 503
chr1 778984 779223 MCF10A_H3K27ac_peak_5 44 . 4.87285 7.00538 4.45379 69
chromosome name
peak start
peak stop
peak name
int(-10*log10pvalue)
N/A
Fold change at peak summit
-log10 P-value at Peak
-log10 Q-value at Peak
Summit position relative to peak
sample="MCF10A_H3K27ac"
query_bam=~/CourseData/EPI_data/Module1/MCF10A_resources/${sample}.bam
samtools view -@4 -q 10 -F 1028 $query_bam -c
samtools view -@4 -q 10 -F 1028 $query_bam -L ~/CourseData/EPI_data/Module1/QC_resources/encode_enhancers_liftover.bed -c
samtools view -@4 -q 10 -F 1028 $query_bam -L ~/workspace/peaks/${sample}_peaks.blacklistRemoved.narrowPeak -c
19632094
10752392
2028377
samtools view #Viewing entirety or subset of BAM file
-q 10 \ #Parsing for reads with 5>=MAPQ
-F 1028 \ # Remove reads that have the following flags:UNMAP,DUP #https://broadinstitute.github.io/picard/explain-flags.html
$query_bam \ # Bam of interest
-c # Count number of reads instead of displaying
-L # Region of interest. Bed File
H3K4me1 | H3K27me3 | H3K27ac | H3K4me3 | |
---|---|---|---|---|
Reads Count | ||||
Total | 71,545,746 | 55,438,916 | 19,632,094 | 26,381,511 |
Enhancer | 48,466,245 | 25,742,977 | 10,752,392 | 12,302,265 |
HOX | 13,866 | 13,085 | 4,504 | 28,674 |
TSS | 7,303,584 | 5,111,631 | 2,702,194 | 18,041,080 |
In Peaks | 27,539,876 | 25,072,097 | 2,028,377 | 20,920,281 |
H3K4me1 | H3K27me3 | H3K27ac | H3K4me3 | |
Percent of Total | ||||
Total | 100.00 | 100.00 | 100.00 | 100.00 |
Enhancer | 67.74 | 46.43 | 54.77 | 46.63 |
HOX | 0.02 | 0.02 | 0.02 | 0.11 |
TSS | 10.21 | 9.22 | 13.76 | 68.39 |
In Peaks | 38.49 | 45.22 | 10.33 | 79.30 |
mkdir -p ~/workspace/{bigBed,bigWig}
sample="MCF10A_H3K27ac"
input_bedgraph=~/workspace/peaks/${sample}_treat_pileup.bdg
output_bigwig=~/workspace/bigWig/${sample}_treat_pileup.bw
chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes
sort -k1,1 -k2,2n ${input_bedgraph} > ~/workspace/bigWig/tmp
bedGraphToBigWig ~/workspace/bigWig/tmp ${chrom_sizes} ${output_bigwig}
rm ~/workspace/bigWig/tmp
input_bedgraph=~/workspace/peaks/${sample}_control_lambda.bdg
output_bigwig=~/workspace/bigWig/${sample}_control_lambda.bw
chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes
sort -k1,1 -k2,2n ${input_bedgraph} > ~/workspace/bigWig/tmp
bedGraphToBigWig ~/workspace/bigWig/tmp ${chrom_sizes} ${output_bigwig}
rm ~/workspace/bigWig/tmp
mkdir -p ~/workspace/{bigBed,bigWig}
sample="MCF10A_H3K27ac" # Specify variables
input_bedgraph=~/workspace/peaks/${sample}_treat_pileup.bdg #
output_bigwig=~/workspace/bigWig/${sample}_treat_pileup.bw #
chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes #
sort -k1,1 -k2,2n ${input_bedgraph} > ~/workspace/bigWig/tmp # Sorts peaks as to match alphabetical order in col 1 and numerical in col 2 #$(chrom_sizes) into a temporary file
bedGraphToBigWig ~/workspace/bigWig/tmp ${chrom_sizes} ${output_bigwig} # Conversion step
rm ~/workspace/bigWig/tmp #remove temporary file
mkdir -p ~/workspace/{bigBed,bigWig}
sample="MCF10A_H3K27ac"
input_bed=~/workspace/peaks/${sample}_peaks.blacklistRemoved.narrowPeak
output_bigbed=~/workspace/bigBed/${sample}_peaks.blacklistRemoved.bb
chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes
cut -f1-3 ${input_bed} | sort -k1,1 -k2,2n > ~/workspace/bigBed/tmp
bedToBigBed ~/workspace/bigBed/tmp ${chrom_sizes} ${output_bigbed}
rm ~/workspace/bigBed/tmp
sample="MCF10A_H3K27ac"
input_bed=~/workspace/peaks/${sample}_peaks.blacklistRemoved.narrowPeak
output_bigbed=~/workspace/bigBed/${sample}_peaks.blacklistRemoved.bb
chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes
cut -f1-3 ${input_bed} | sort -k1,1 -k2,2n > ~/workspace/bigBed/tmp
bedToBigBed ~/workspace/bigBed/tmp ${chrom_sizes} ${output_bigbed} # Largely the same as above, except we trim our narrowPeak file to just 3 columns
rm ~/workspace/bigBed/tmp
- Download tracks via public url
- Upload onto IGV
- Explore!
Check out the provided regions!
Filtered out region due to intersect w/ blacklist
chr16:34,145,433-34,160,335
Artifact region:
chr16:34514779-34732558
HOXA region:
chr7:26981738-27365875
Try Running H3K27me3, H3K4me1 and H3K4me3 on your own! When running H3K27me3 and H3K4me1 note the comments and code breakdown of step 2
mkdir -p ~/workspace/peaks
for mark in H3K27ac H3K4me3 H3K27me3 H3K4me1 Input;
do
bam=~/CourseData/EPI_data/Module1/MCF10A_resources/MCF10A_${mark}.bam
dedup=~/workspace/peaks/MCF10A_${mark}.dedup.bam
samtools view -@4 ${bam} -bh -q10 -F1028 -o ${dedup}
done
input_frag=~/workspace/peaks/MCF10A_Input.dedup.bam
for narrow_mark in H3K27ac H3K4me3;
do
treatment_frag=~/workspace/peaks/MCF10A_${narrow_mark}.dedup.bam
macs3 callpeak -t ${treatment_frag} -c ${input_frag} -f BAMPE -g hs -n MCF10A_${narrow_mark} --keep-dup all --outdir ~/workspace/peaks/ --bdg 1> ~/workspace/peaks/${narrow_mark}.out.log 2> ~/workspace/peaks/${narrow_mark}.err.log
done
for broad_mark in H3K4me1 H3K27me3;
do
treatment_frag=~/workspace/peaks/MCF10A_${broad_mark}.dedup.bam
macs3 callpeak -t ${treatment_frag} -c ${input_frag} -f BAMPE -g hs -n MCF10A_${broad_mark} --keep-dup all --outdir ~/workspace/peaks/ --bdg --broad 1> ~/workspace/peaks/${broad_mark}.out.log 2> ~/workspace/peaks/${broad_mark}.err.log
done
blacklist=~/CourseData/EPI_data/Module1/QC_resources/hg38_blacklist.bed
for narrow_mark in H3K27ac H3K4me3;
do
bedtools intersect -v -a ~/workspace/peaks/MCF10A_${narrow_mark}_peaks.narrowPeak -b ${blacklist} > ~/workspace/peaks/MCF10A_${narrow_mark}_peaks.blacklistRemoved.narrowPeak
bedtools intersect -u -a ~/workspace/peaks/MCF10A_${narrow_mark}_peaks.narrowPeak -b ${blacklist} > ~/workspace/peaks/MCF10A_${narrow_mark}_peaks.blacklist.narrowPeak
done
for broad_mark in H3K4me1 H3K27me3;
do
bedtools intersect -v -a ~/workspace/peaks/MCF10A_${broad_mark}_peaks.broadPeak -b ${blacklist} > ~/workspace/peaks/MCF10A_${broad_mark}_peaks.blacklistRemoved.broadPeak
bedtools intersect -u -a ~/workspace/peaks/MCF10A_${broad_mark}_peaks.broadPeak -b ${blacklist} > ~/workspace/peaks/MCF10A_${broad_mark}_peaks.blacklist.broadPeak
done
mkdir -p ~/workspace/{bigBed,bigWig}
chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes
for mark in H3K27ac H3K4me3 H3K27me3 H3K4me1
do
input_bedgraph=~/workspace/peaks/MCF10A_${mark}_treat_pileup.bdg
output_bigwig=~/workspace/bigWig/MCF10A_${mark}_treat_pileup.bw
sort -k1,1 -k2,2n ${input_bedgraph} > ~/workspace/bigWig/tmp
bedGraphToBigWig ~/workspace/bigWig/tmp ${chrom_sizes} ${output_bigwig}
rm ~/workspace/bigWig/tmp
done
sample="MCF10A_H3K27ac"
input_bedgraph=~/workspace/peaks/${sample}_control_lambda.bdg
output_bigwig=~/workspace/bigWig/${sample}_control_lambda.bw
chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes
sort -k1,1 -k2,2n ${input_bedgraph} > ~/workspace/bigWig/tmp
bedGraphToBigWig ~/workspace/bigWig/tmp ${chrom_sizes} ${output_bigwig}
rm ~/workspace/bigWig/tmp
mkdir -p ~/workspace/{bigBed,bigWig}
chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes
for narrow_mark in H3K27ac H3K4me3;
do
input_bed=~/workspace/peaks/MCF10A_${narrow_mark}_peaks.blacklistRemoved.narrowPeak
output_bigbed=~/workspace/bigBed/MCF10A_${narrow_mark}_peaks.blacklistRemoved.bb
chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes
cut -f1-3 ${input_bed} | sort -k1,1 -k2,2n > ~/workspace/bigBed/tmp
bedToBigBed ~/workspace/bigBed/tmp ${chrom_sizes} ${output_bigbed}
done
for broad_mark in H3K4me1 H3K27me3;
do
input_bed=~/workspace/peaks/MCF10A_${broad_mark}_peaks.blacklistRemoved.broadPeak
output_bigbed=~/workspace/bigBed/MCF10A_${broad_mark}_peaks.blacklistRemoved.bb
chrom_sizes=~/CourseData/EPI_data/Module1/QC_resources/hg38.chrom.sizes
cut -f1-3 ${input_bed} | sort -k1,1 -k2,2n > ~/workspace/bigBed/tmp
bedToBigBed ~/workspace/bigBed/tmp ${chrom_sizes} ${output_bigbed}
done
for narrow_mark in H3K27ac H3K4me3;
do
bam=~/CourseData/EPI_data/Module1/MCF10A_resources/MCF10A_${narrow_mark}.bam
peak=~/workspace/peaks/MCF10A_${narrow_mark}_peaks.blacklistRemoved.narrowPeak
echo total reads in ${narrow_mark} $(samtools view -@4 ${bam} -q10 -F1028 -c )
done
for broad_mark in H3K4me1 H3K27me3;
do
bam=~/CourseData/EPI_data/Module1/MCF10A_resources/MCF10A_${broad_mark}.bam
peak=~/workspace/peaks/MCF10A_${broad_mark}_peaks.blacklistRemoved.narrowPeak
echo total reads in ${broad_mark} $(samtools view -@4 ${bam} -bh -q10 -F1028 -c )
done
for narrow_mark in H3K27ac H3K4me3;
do
bam=~/CourseData/EPI_data/Module1/MCF10A_resources/MCF10A_${narrow_mark}.bam
peak=~/workspace/peaks/MCF10A_${narrow_mark}_peaks.blacklistRemoved.narrowPeak
echo enriched reads in ${narrow_mark} $(samtools view -@4 ${bam} -q10 -F1028 -c -L ${peak})
done
for broad_mark in H3K4me1 H3K27me3;
do
bam=~/CourseData/EPI_data/Module1/MCF10A_resources/MCF10A_${broad_mark}.bam
peak=~/workspace/peaks/MCF10A_${broad_mark}_peaks.blacklistRemoved.broadPeak
echo enriched reads in ${broad_mark} $(samtools view -@4 ${bam} -bh -q10 -F1028 -c -L ${peak})
done
for file in $(ls ~/CourseData/EPI_data/Module1/QC_resources/* | grep bed | grep -v blacklist);
do
for bam in $(ls ~/CourseData/EPI_data/Module1/MCF10A_resources/*bam );
do
echo $(basename ${bam}) $(basename ${file}) $(samtools view -@4 -q 10 -F 1028 ${bam} -L $file -c )
done
done
for file in $(ls ~/workspace/peaks/*.blacklistRemoved.*Peak);
do
wc -l $file
done
for file in $(ls ~/CourseData/EPI_data/Module1/CEEHRC_resources/*.bed);
do
wc -l $file
done
for histone_mark in H3K27ac H3K27me3 H3K4me3 H3K4me1;
do
for fileA in $(ls ~/workspace/peaks/*${histone_mark}*.blacklistRemoved*Peak);
do
for fileB in $(ls ~/CourseData/EPI_data/Module1/CEEHRC_resources/*${histone_mark}*.bed);
do
echo $(basename $fileA) $(basename $fileB) AuB $(bedtools intersect -u -a $fileA -b $fileB | wc -l)
echo $(basename $fileA) $(basename $fileB) A-B $(bedtools intersect -v -a $fileA -b $fileB | wc -l)
echo $(basename $fileA) $(basename $fileB) B-A $(bedtools intersect -v -b $fileA -a $fileB | wc -l)
done
done
done