# Generation of Hi-C map by Juicer

> Modified on Nov 24th, 2021 by Meng Zhang (mzhang@mpipz.mpg.de)

Juicer is a powerful tool for Hi-C data processing and analysis. Here we only use the part of its pipeline that creates Hi-C files from raw sequencing data. For Juicer's full functionality, see the [Aiden Lab GitHub repository](https://github.com/aidenlab/juicer).

## Step 1: Install Juicer
For detailed guidance, see the [Juicer installation page](https://github.com/aidenlab/juicer/wiki/Installation#quick-start).

```shell
# clone Juicer into the installation directory
cd /path/to/install/
git clone https://github.com/theaidenlab/juicer.git

# create a symbolic link named "scripts" in the working directory
# (the CPU version is taken as an example)
cd /path/to/your/HiC/analysis/working/directory/
ln -s /path/to/install/juicer/CPU scripts

# juicer_tools is required in the Hi-C map generation step
cd scripts/common
wget https://hicfiles.tc4ga.com/public/juicer/juicer_tools.1.9.9_jcuda.0.8.jar
ln -s juicer_tools.1.9.9_jcuda.0.8.jar juicer_tools.jar
```
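
Before moving on, it can help to confirm that the installation is usable. The sketch below is a hypothetical helper (`check_juicer_install` is not part of Juicer). It assumes only that `bwa` and `java` must be on your `PATH` and that the `scripts` link and `juicer_tools.jar` were created as above; your platform may need additional tools.

```shell
# Hypothetical helper, not part of Juicer: sanity-check a Juicer working directory.
# Assumption: bwa and java are the external programs that must be on PATH.
check_juicer_install() {
  local workdir="${1%/}" status=0 tool f
  # external programs used by the pipeline
  for tool in bwa java; do
    if ! command -v "$tool" > /dev/null; then
      echo "not on PATH: $tool" >&2
      status=1
    fi
  done
  # files created in Step 1 (-e follows the juicer_tools.jar symbolic link)
  for f in "$workdir/scripts/juicer.sh" "$workdir/scripts/common/juicer_tools.jar"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_juicer_install /path/to/your/HiC/analysis/working/directory` prints nothing and returns success when everything is in place.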

## Step 2: Configure the directory structure

Juicer expects a particular directory structure. A typical layout is shown below. We strongly recommend that you organize your files and directories the same way so that Juicer can find them. The [Juicer wiki](https://github.com/aidenlab/juicer/wiki/Installation#quick-start) also explains the directory structure clearly.

![Recommended directory structure](./juicer_directory_structure.png)

All files for a Juicer analysis should be kept under one working directory. `scripts` is a symbolic link to the directory containing the main Juicer scripts. This directory differs between the CPU and cluster versions, so choose your platform before creating the link, and refer to the examples on the [Juicer wiki](https://github.com/aidenlab/juicer/wiki/Installation#quick-start). Note that the installation commands above use the CPU version. `reference/` holds the reference genome and its BWA index files. `restriction_sites/` contains the restriction site files; see below for how to generate them. The sequenced Hi-C reads go in `fastq/` and can stay gzipped. Juicer writes its output to `aligned/` and `split/`. You do not need to create these two folders because Juicer creates them when it runs.
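
The layout described above can be created with a few `mkdir -p` calls. This is a minimal sketch that assumes a working directory in `WORKDIR` (it falls back to a placeholder `./hic_project` if unset) and the replicate names `replicate1` and `replicate2` used in Step 4. Adapt both names to your own data.

```shell
# Working directory; falls back to ./hic_project if WORKDIR is not set (placeholder name)
WORKDIR="${WORKDIR:-$PWD/hic_project}"

# folders you fill yourself: reference genome + BWA index, restriction-site files
mkdir -p "$WORKDIR/reference" "$WORKDIR/restriction_sites"

# one top-level folder per replicate, each with its own fastq/ subfolder;
# aligned/ and split/ are created later by Juicer itself
for rep in replicate1 replicate2; do
  mkdir -p "$WORKDIR/HiCwork/$rep/fastq"
done
```

The `scripts` link from Step 1 then sits next to these folders in the same working directory.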

## Step 3: Preprocessing
If the reference genome has not been indexed yet, index it with BWA inside the `reference/` directory.

```shell
JSD='/path/to/your/HiC/analysis/working/directory'  # Juicer working directory that contains scripts/, reference/ and restriction_sites/
REF="$JSD/reference/reference.fasta"                # reference genome FASTA
genomeID='your_reference_name'                      # should match the name of your reference genome
thread=64                                           # number of threads; adjust to your machine
```

```shell
# index the reference genome (BWA writes the index files next to $REF)
cd "$JSD/reference"
bwa index "$REF"
```
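
With its default prefix, `bwa index` writes five files next to the FASTA: `.amb`, `.ann`, `.bwt`, `.pac` and `.sa`. The hypothetical helper below is not part of BWA or Juicer. It checks that all five exist and are non-empty, which is a quick way to catch an interrupted indexing run before alignment starts.

```shell
# Hypothetical helper, not part of BWA: confirm that `bwa index` produced
# its five index files next to the FASTA file.
check_bwa_index() {
  local ref="$1" ext status=0
  for ext in amb ann bwt pac sa; do
    if [ ! -s "${ref}.${ext}" ]; then
      echo "missing or empty: ${ref}.${ext}" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_bwa_index "$REF" && echo "BWA index found"`.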

The Python script `generate_site_positions.py` has been more stable under Python 2. If unexpected errors appear, check which Python interpreter you are running. This issue may already be fixed in the latest Juicer version. Also note which restriction enzyme was used to generate your Hi-C data. If it is not defined in `generate_site_positions.py`, see the [Juicer usage page](https://github.com/aidenlab/juicer/wiki/Usage) to learn how to modify the script or set the corresponding juicer flag. If you don't want any fragment-level analysis (as with a DNase experiment), set the site to "none", as in `juicer.sh -s none`.

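```shell
# create the DpnII restriction-site file (${genomeID}_DpnII.txt in the current directory)
cd "$JSD/restriction_sites"
python /path/to/juicer/misc/generate_site_positions.py DpnII "$genomeID" "$REF"

# generate ${genomeID}.chrom.sizes: the first field of each line is the
# chromosome name and the last field is the chromosome length
awk 'BEGIN{OFS="\t"}{print $1, $NF}' "${genomeID}_DpnII.txt" \
  > "${genomeID}.chrom.sizes"
```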
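
The `awk` one-liner works because each line of the restriction-site file starts with the chromosome name and ends with the chromosome length. The toy run below uses a made-up two-chromosome file (the coordinates are invented for illustration) to show what ends up in `chrom.sizes`.

```shell
# Toy restriction-site file with made-up coordinates: each line holds the
# chromosome name, the site positions, and finally the chromosome length.
tmpdir="$(mktemp -d)"
printf 'chr1 120 980 2450 5000\nchr2 300 1700 3200\n' > "$tmpdir/toy_DpnII.txt"

# same one-liner as above: keep the first and the last field of each line
awk 'BEGIN{OFS="\t"}{print $1, $NF}' "$tmpdir/toy_DpnII.txt" > "$tmpdir/toy.chrom.sizes"

# prints "chr1<TAB>5000" and "chr2<TAB>3200"
cat "$tmpdir/toy.chrom.sizes"
```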

## Step 4: Run juicer for each replicate

If your Hi-C data consist of multiple replicates, whether biological or technical, treat each replicate as an independent run and execute Juicer separately for each one. For example, the Hi-C data I used came from two different libraries (1M and 2M, biological replicates), and each library was amplified in two separate PCRs (R1 and R2, technical replicates). As a result, four alignment jobs were run separately.

```shell
# run the Juicer pipeline separately for each replicate
cd "$JSD"
"$JSD"/scripts/juicer.sh -g "$genomeID" -s DpnII -t "$thread" \
  -d "$JSD/HiCwork/replicate1/" -D "$JSD" -z "$REF" \
  -p "$JSD/restriction_sites/${genomeID}.chrom.sizes" \
  -y "$JSD/restriction_sites/${genomeID}_DpnII.txt"

"$JSD"/scripts/juicer.sh -g "$genomeID" -s DpnII -t "$thread" \
  -d "$JSD/HiCwork/replicate2/" -D "$JSD" -z "$REF" \
  -p "$JSD/restriction_sites/${genomeID}.chrom.sizes" \
  -y "$JSD/restriction_sites/${genomeID}_DpnII.txt"
```

...
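
With many replicates, the repeated commands can be generated instead of typed. The bash sketch below defines a hypothetical helper, `juicer_cmd_for`, which is not part of Juicer. It assumes `JSD`, `genomeID`, `REF` and `thread` are set as in Step 3, that DpnII is the enzyme, and that reads follow Juicer's default `_R1`/`_R2` naming. It checks a replicate's `fastq/` folder and prints, rather than runs, the matching `juicer.sh` command, so you can review the command first. It uses the bash builtin `compgen`.

```shell
# Hypothetical helper, not part of Juicer. Assumes JSD, genomeID, REF and
# thread are set as in Step 3 and that DpnII is the enzyme.
juicer_cmd_for() {
  local topdir="${1%/}"
  # by default Juicer looks for paired reads whose names contain _R1 and _R2
  if ! compgen -G "$topdir/fastq/*_R1*.fastq*" > /dev/null ||
     ! compgen -G "$topdir/fastq/*_R2*.fastq*" > /dev/null; then
    echo "no *_R1*/*_R2* fastq files in $topdir/fastq" >&2
    return 1
  fi
  # print (not run) the command so it can be reviewed first
  echo "$JSD/scripts/juicer.sh -g $genomeID -s DpnII -t $thread" \
    "-d $topdir/ -D $JSD -z $REF" \
    "-p $JSD/restriction_sites/${genomeID}.chrom.sizes" \
    "-y $JSD/restriction_sites/${genomeID}_DpnII.txt"
}
```

For example, `for rep in "$JSD"/HiCwork/*/; do juicer_cmd_for "$rep"; done` prints one command per replicate folder. The printed commands are not quoted, so this sketch assumes paths without spaces.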

## Step 5: Merge replicates (optional)

If your Hi-C data come from multiple replicates, merge them to generate a combined Hi-C map. For this, Juicer provides `mega.sh`, which builds a "mega" map; see the [Juicer manual](https://github.com/aidenlab/juicer/wiki/Usage#creating-a-mega-map) for details. If you have no replicates, i.e. Juicer was run only once in Step 4, skip this step.
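
Before merging, it is worth checking that every replicate run finished. This hypothetical helper is not part of Juicer. It assumes that each per-replicate run wrote `aligned/merged_nodups.txt`, as the Juicer 1.x CPU scripts do. It lists the replicate folders under a directory such as `$JSD/HiCwork` and skips any existing `mega/` folder.

```shell
# Hypothetical helper, not part of Juicer: before merging, check that every
# replicate folder under a directory (e.g. $JSD/HiCwork) has a non-empty
# aligned/merged_nodups.txt. An existing mega/ folder is skipped.
check_replicates_done() {
  local workdir="${1%/}" d status=0
  for d in "$workdir"/*/; do
    d="${d%/}"
    [ "$(basename "$d")" = mega ] && continue
    if [ -s "$d/aligned/merged_nodups.txt" ]; then
      echo "ready:   $d"
    else
      echo "missing: $d/aligned/merged_nodups.txt"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `check_replicates_done "$JSD/HiCwork"` returns success only when all replicates are ready to merge.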

```shell
# merge all replicates under HiCwork/ into a single "mega" map
cd "$JSD/HiCwork"
"$JSD"/scripts/common/mega.sh -D "$JSD" -g "$genomeID" -s DpnII
```

Finally, the merged ("mega") map and its statistics are written to `$JSD/HiCwork/mega/aligned/`. Note that Juicer produces two Hi-C maps: one from reads with mapping quality (MAPQ) of at least 1 and one from reads with MAPQ of at least 30. They are named `inter.hic` and `inter_30.hic`, and the corresponding statistics files are `inter.txt` and `inter_30.txt`.
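
To confirm that a run produced all four files, you can use a small check. The helper below is hypothetical and not part of Juicer. It only tests that each expected file exists and is non-empty in a given `aligned/` directory.

```shell
# Hypothetical helper, not part of Juicer: report which of the four expected
# files exist (and are non-empty) in an aligned/ directory.
check_juicer_outputs() {
  local dir="${1%/}" f status=0
  for f in inter.hic inter_30.hic inter.txt inter_30.txt; do
    if [ -s "$dir/$f" ]; then
      echo "ok:      $f"
    else
      echo "missing: $f"
      status=1
    fi
  done
  return "$status"
}
```

For example, run `check_juicer_outputs "$JSD/HiCwork/mega/aligned"`. The same call should also work on a single replicate's `aligned/` folder, where Juicer uses the same file names.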