## BIOM262: ChIP-Seq workshop – Part 2

---
Start by grabbing an interactive node: 
```
qsub -I -l nodes=1:ppn=8 -l walltime=4:00:00 -q hotel
```

---

Following steps 1-5, which we did on Tuesday, you should have a tag directory for each of the datasets (6 in total), and you should also have the UCSC files.

As we discussed, each dataset was created with a different antibody, and they can be divided into three types: TFs (transcription factors), HMs (histone modifications), and global input (background). Since we will need to treat each type differently, I recommend making a directory for each (`input`, `tfs` and `hms`) and moving the tag directories to the relevant one (e.g. `tfs/oct4-esc/`, etc.).
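The reorganization described above could be sketched as follows. This is only a minimal sketch: `WORKDIR` and the helper `move_tagdir` are hypothetical names introduced for illustration, and only `oct4-esc` and `input-esc` are tag directory names taken from this workshop; substitute your own names for the rest.

```shell
# Assumption: WORKDIR is the directory that currently holds the tag directories.
WORKDIR="${WORKDIR:-$PWD}"

# move_tagdir <tag directory name> <type directory: input, tfs or hms>
# Moves one tag directory into its type directory, creating the latter if needed.
move_tagdir() {
    if [ -d "$WORKDIR/$1" ]; then
        mkdir -p "$WORKDIR/$2"
        mv "$WORKDIR/$1" "$WORKDIR/$2/"
    else
        echo "skipping $1: not found in $WORKDIR" >&2
    fi
}

move_tagdir oct4-esc tfs
move_tagdir input-esc input
# Repeat for the remaining tag directories, sending each histone dataset to hms.
```

Directories that are not found are skipped with a message, so the script can be re-run safely after a partial move.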

---
**6. One of the most common tasks with ChIP-seq data is to find ‘enriched’ regions, commonly called “peaks”.** HOMER contains a command called `findPeaks`, which is used to analyze tag directories for peaks. There are two common ways to use the command:

```
findPeaks <tag directory> -i <control tag directory>  -style factor  -o auto
```

or

```
findPeaks <tag directory> -i <control tag directory>  -style histone  -o auto
```

The difference between the two is in the “-style factor/histone” argument, which tells the program to look for focal, fixed-width peaks (`factor`) vs. variable-length regions (`histone`); the latter is more common in the case of histone modifications. To find Oct4 peaks in the data, run the following command:

```
findPeaks <path>/oct4-esc/ -i <path>/input-esc/ -style factor -o auto
```
  
This command will look for enriched regions and filter them based on several criteria, including ensuring that they have at least 4-fold more reads in peak regions relative to the control experiment (in this case `input-esc/`). The output will be stored in a HOMER-style peak file located in the Oct4 tag directory (`oct4-esc/peaks.txt`). The beginning of this file contains summary and QC statistics from the peak finding, including the number of peaks, the number of peaks lost to input filtering, etc.

One field worth paying attention to is the **“Approximate IP efficiency”** which reports what fraction of reads from the experiment were actually found in peaks. For most decent experiments this value ranges from 1% to >30% (remember ChIP is an enrichment strategy... there is plenty of background in the data too!). Below this are the peaks along with enrichment statistics for each region.
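To look at the header statistics without scrolling through the whole file, something like the following may help. It is a sketch that assumes the header lines of the peak file start with `#`; `PEAKS`, `peak_header` and `count_peaks` are hypothetical names used only here.

```shell
# Assumption: PEAKS points at a findPeaks output file; adjust the path to your own.
PEAKS="${PEAKS:-oct4-esc/peaks.txt}"

# peak_header <peak file>: print only the '#'-prefixed header lines.
peak_header() {
    grep '^#' "$1"
}

# count_peaks <peak file>: count the lines that are neither header nor empty.
count_peaks() {
    grep -v -e '^#' -e '^$' "$1" | wc -l | tr -d ' '
}

if [ -f "$PEAKS" ]; then
    peak_header "$PEAKS" | grep -i 'IP efficiency' || echo "no IP efficiency line found"
    echo "peaks: $(count_peaks "$PEAKS")"
fi
```

The count from `count_peaks` can be compared with the number of peaks reported in the header as a quick sanity check.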

One other thing to note is that HOMER reports the results in a ‘peak’ file, which has a slightly different format from a traditional BED file. To create a BED file from the peak file, use the tool `pos2bed.pl` (e.g. `pos2bed.pl oct4-esc/peaks.txt > oct4-esc.bed`). The tool sends its results to stdout, and the “`> output.txt`” part at the end is used to capture that output in a file. BED files can be loaded into IGV just like a bedGraph file. Also, most HOMER programs will work with either BED or peak files as input.
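Before loading a converted file into IGV, a quick structural check can catch a failed or truncated conversion. The sketch below relies only on the general BED convention (tab-separated chromosome, start, end, with start not greater than end); `check_bed` and `BED` are hypothetical names, and this is not a full BED validator.

```shell
# Assumption: BED points at a file produced by pos2bed.pl; adjust as needed.
BED="${BED:-oct4-esc.bed}"

# check_bed <BED file>: succeed only if every data line has at least three
# tab-separated fields with integer start and end, and start <= end.
check_bed() {
    awk -F '\t' '
        /^(#|track|browser)/ || /^$/ { next }   # skip header, comment and empty lines
        NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || ($2 + 0) > ($3 + 0) {
            printf "line %d looks malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

if [ -f "$BED" ]; then
    check_bed "$BED" && echo "$BED passed the basic BED check"
fi
```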


Next we will find peaks for all samples using two ‘for loops’ – for the two types of data:

```
for dir in <path>/hms/*; do
    findPeaks "$dir" -i <path>/input-esc/ -style histone -o auto
done
```

and

```
for dir in <path>/tfs/*; do
    findPeaks "$dir" -i <path>/input-esc/ -style factor -o auto
done
```
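After the loops finish, it is worth confirming that every tag directory actually received an output file (`peaks.txt` for the factor style, `regions.txt` for the histone style). A minimal sketch, in which `DATA` stands in for `<path>` and `missing_outputs` is a hypothetical helper:

```shell
# Assumption: DATA is the directory that holds tfs/ and hms/ (the <path> above).
DATA="${DATA:-.}"

# missing_outputs <parent directory> <file name>
# Prints every subdirectory of <parent directory> that lacks a non-empty <file name>.
missing_outputs() {
    for tagdir in "$1"/*/; do
        [ -d "$tagdir" ] || continue
        [ -s "$tagdir$2" ] || echo "${tagdir%/}"
    done
}

missing_outputs "$DATA/tfs" peaks.txt
missing_outputs "$DATA/hms" regions.txt
```

No output means every tag directory has its result file; any directory that is printed should be re-run with `findPeaks`.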

Make a directory for the annotation files (e.g., “annotations”) and convert the `peaks.txt` and `regions.txt` files to BED files:

```
for dir in <path>/hms/*; do
    dirname=${dir##*/}
    pos2bed.pl "$dir/regions.txt" > <path>/annotations/"$dirname".bed
done
```

and

```
for dir in <path>/tfs/*; do
    dirname=${dir##*/}
    pos2bed.pl "$dir/peaks.txt" > <path>/annotations/"$dirname".bed
done
```
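Both conversion loops rely on the `${dir##*/}` parameter expansion to turn a full path into the tag directory name. A short illustration, using a made-up path:

```shell
dir="/data/tfs/oct4-esc"      # hypothetical path, for illustration only
name="${dir##*/}"             # remove the longest prefix matching '*/'
echo "$name"                  # oct4-esc

# A trailing slash would leave nothing after the last '/', so strip it first.
dir_with_slash="/data/tfs/oct4-esc/"
trimmed="${dir_with_slash%/}"
echo "${trimmed##*/}"         # oct4-esc
```

The glob `<path>/tfs/*` yields paths without a trailing slash, so the loops above do not need the extra trimming step.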

Copy the BED files to your local computer using `scp` or FileZilla and load them into IGV. Explore the original `peaks.txt` and `regions.txt` files (before conversion to BED) with `less -S` on the command line. For instance, each peak gets a score; compare how the peaks with high scores look in IGV vs. the ones with low scores.
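To pick candidate peaks for that comparison, the peak file can be sorted by its score column. In the sketch below the score column number is passed as an argument; column 8 is my assumption for where findPeaks puts its score, so check the `#PeakID ...` header line of your own file first. `top_peaks` and `bottom_peaks` are hypothetical helper names.

```shell
# Assumption: PEAKS points at a findPeaks output file; adjust the path to your own.
PEAKS="${PEAKS:-oct4-esc/peaks.txt}"

# top_peaks <peak file> <score column> <n>: the n highest-scoring peaks.
top_peaks() {
    tab="$(printf '\t')"
    grep -v -e '^#' -e '^$' "$1" | sort -t "$tab" -k "$2,$2nr" | head -n "$3"
}

# bottom_peaks <peak file> <score column> <n>: the n lowest-scoring peaks.
bottom_peaks() {
    tab="$(printf '\t')"
    grep -v -e '^#' -e '^$' "$1" | sort -t "$tab" -k "$2,$2n" | head -n "$3"
}

if [ -f "$PEAKS" ]; then
    # Column 8 is assumed to be the score; columns 2-4 are chr, start, end.
    top_peaks "$PEAKS" 8 5 | cut -f 1-4,8
    bottom_peaks "$PEAKS" 8 5 | cut -f 1-4,8
fi
```

The chromosome, start and end values printed for each peak can be typed into the IGV locus box as `chr:start-end` to jump to that region.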