# Day 2, session 3: Detecting features in Hi-C maps

In this session we will look at ways to automatically find regions containing features of interest.
Depending on the question, this can be done with supervised or unsupervised methods.

## Unsupervised detection

### Differential contacts
The classic approach, much like in differential gene expression analysis, is to look for regions whose contact counts differ between conditions.
There are some well-established tools for this type of analysis, such as diffHic or ACCOST.
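
To make the idea concrete, here is a minimal sketch of a differential contact map, computed as a depth-scaled log2 ratio between two toy matrices. It is not how diffHic or ACCOST work internally: dedicated tools model count variability statistically, which this sketch does not attempt. The function name `log2_ratio`, the pseudocount and the toy counts are all assumptions made for illustration.

```python
import math


def log2_ratio(map_a, map_b, pseudocount=1.0):
    """Element-wise log2(A / B) after scaling B to the total count of A."""
    total_a = sum(sum(row) for row in map_a)
    total_b = sum(sum(row) for row in map_b)
    scale = total_a / total_b  # crude correction for sequencing depth
    return [
        [
            math.log2((a + pseudocount) / (b * scale + pseudocount))
            for a, b in zip(row_a, row_b)
        ]
        for row_a, row_b in zip(map_a, map_b)
    ]


# Toy 3x3 symmetric contact maps (made-up counts, hypothetical samples)
control = [[10, 4, 1], [4, 10, 4], [1, 4, 10]]
treated = [[10, 4, 7], [4, 10, 4], [7, 4, 10]]  # extra contacts between bins 0 and 2

ratio = log2_ratio(treated, control)
for row in ratio:
    print(" ".join(f"{value:6.2f}" for value in row))
```

Positive values mark bin pairs with more contacts in `treated` than in `control` after depth scaling, negative values the opposite.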

### Structural changes

A more recent approach, implemented in CHESS (Python package chess-hic), relies on the notion of structural changes. It is also unsupervised, but instead of comparing individual contact counts it attempts to find differential features as "vignettes" in the map.

Those features can then be clustered by similarity so that the user can identify what they represent (loops, stripes, borders, etc.).
More info about CHESS in the official docs: https://chess-hic.readthedocs.io/en/latest/?badge=latest
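
As an illustration of what a structural comparison measures, the sketch below computes a single-window structural similarity index (SSIM) between two small sub-matrices. This is only the textbook SSIM formula applied to toy values; the CHESS package does considerably more (see its documentation), and the function name `ssim`, the constants and the example windows are assumptions, not part of the chess-hic API.

```python
def ssim(window_x, window_y, c1=1e-4, c2=9e-4):
    """Global SSIM between two equally sized 2D windows (lists of lists)."""
    xs = [v for row in window_x for v in row]
    ys = [v for row in window_y for v in row]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((v - mean_x) ** 2 for v in xs) / n
    var_y = sum((v - mean_y) ** 2 for v in ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / n
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Toy windows (made-up normalised contacts): one with a focal enrichment, one without
with_loop = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.2], [0.1, 0.2, 0.1]]
without_loop = [[0.1, 0.2, 0.1], [0.2, 0.2, 0.2], [0.1, 0.2, 0.1]]

same_score = ssim(with_loop, with_loop)
changed_score = ssim(with_loop, without_loop)
print(same_score, changed_score)
```

Identical windows score 1, and the score drops as the structure of the two windows diverges, which is what makes low-similarity regions candidates for differential features.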

## Supervised pattern detection

There are many methods to detect a specific type of pattern, especially loops and TADs.
Most of these tools are listed here: https://github.com/mdozmorov/HiC_tools#loop-callers


### Chromosight

Much like other tools, Chromosight can detect patterns in Hi-C contact maps. Instead of being limited to loops or TADs, it uses template matching (a computer vision algorithm) to detect various patterns.

More info about chromosight in the official docs: https://chromosight.readthedocs.io/en/latest/
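
The principle of template matching can be sketched in a few lines: slide a small kernel over the map and score every window by its Pearson correlation with the kernel. This is a simplified illustration under stated assumptions, not Chromosight's implementation; the helper names (`pearson`, `match_template`), the 3x3 `loop_kernel` and the toy map are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long sequences (0 if one is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def match_template(matrix, kernel):
    """Score every kernel-sized window; keys are the window centre (row, col)."""
    k = len(kernel)
    n = len(matrix)
    flat_kernel = [v for row in kernel for v in row]
    scores = {}
    for i in range(n - k + 1):
        for j in range(n - k + 1):
            window = [matrix[i + di][j + dj] for di in range(k) for dj in range(k)]
            scores[(i + k // 2, j + k // 2)] = pearson(window, flat_kernel)
    return scores


# Hypothetical loop-like kernel: a focal enrichment with weaker neighbours
loop_kernel = [[0, 1, 0], [1, 4, 1], [0, 1, 0]]

# Toy 7x7 map: flat background plus one loop-like blob centred on (2, 5)
contact_map = [[1.0] * 7 for _ in range(7)]
contact_map[2][5] += 4
for r, c in [(1, 5), (3, 5), (2, 4), (2, 6)]:
    contact_map[r][c] += 1

scores = match_template(contact_map, loop_kernel)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Swapping the kernel for a border-like or hairpin-like template changes which pattern is detected, while the scanning logic stays the same.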

In [None]:
%%bash

chromosight detect --pattern=loops data/g1.cool data/g1_loops
chromosight detect --pattern=borders data/g1.cool data/g1_borders
chromosight detect --pattern=hairpins data/g1.cool data/g1_hairpins

## Using external tracks

Often, instead of detecting those features solely from the Hi-C signal, we use external tracks, such as ChIP-seq of proteins of interest, to locate the regions.

These tracks can be used, for example, to train machine learning methods such as [Peakachu](https://github.com/tariks/peakachu).

There are also helpful tools like [coolpuppy](https://github.com/open2c/coolpuppy) which can do 2D aggregation of Hi-C maps using 1D signals.
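
The idea behind 2D aggregation (a "pileup") can be sketched as follows: take every pair of 1D positions, cut out a small window of the map centred on that pair, and average the windows. This is a bare-bones illustration, not coolpuppy's implementation; in particular it skips any correction for the decay of contacts with genomic distance. The function name `pileup`, the `flank` parameter and the toy data are assumptions.

```python
from itertools import combinations


def pileup(matrix, anchors, flank=1):
    """Average the windows centred on all pairs (i < j) of anchor bins."""
    n = len(matrix)
    size = 2 * flank + 1
    total = [[0.0] * size for _ in range(size)]
    count = 0
    for i, j in combinations(sorted(anchors), 2):
        if i - flank < 0 or j + flank >= n:
            continue  # skip windows that would run off the map
        for di in range(size):
            for dj in range(size):
                total[di][dj] += matrix[i - flank + di][j - flank + dj]
        count += 1
    if count == 0:
        raise ValueError("no anchor pair fits inside the matrix")
    return [[value / count for value in row] for row in total]


# Toy 10x10 map: flat background, enriched contacts between anchor bins 2, 5 and 8
anchors = [2, 5, 8]
toy_map = [[1.0] * 10 for _ in range(10)]
for i, j in combinations(anchors, 2):
    toy_map[i][j] = toy_map[j][i] = 3.0

pile = pileup(toy_map, anchors, flank=1)
for row in pile:
    print(row)
```

A signal that is weak at individual positions becomes visible in the centre of the averaged window if it is shared across many pairs.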


In [None]:
%%bash
coolpup.py data/g1.cool input/scer_cohesin_peaks.bed

Chromosight can also quantify correlation scores with a given pattern at a set of 2D positions, given as pairs of genomic intervals in a BED2D file.

In [None]:
%%bash
# We first need to turn the 1D positions into 2D combinations (pairs of peaks).
# E.g. to generate all pairs of positions more than 10 kb but less than 50 kb apart
# (this assumes a 3-column BED, so that columns 4-6 describe the second peak):

MINDIST=10000
MAXDIST=50000
bedtools window -a input/scer_cohesin_peaks.bed \
                -b input/scer_cohesin_peaks.bed \
                -w $MAXDIST \
    | awk -vmd=$MINDIST '$1 == $4 && ($5 - $2) >= md {print}' \
    | sort -k1,1 -k2,2n -k4,4 -k5,5n \
    > input/scer_cohesin_peaks.bed2d
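
For readers less familiar with `bedtools` and `awk`, the same pairing logic can be sketched in Python. This is an approximation rather than an exact re-implementation: it measures distance between peak starts only, whereas `bedtools window` extends the whole interval. The function name `pair_peaks` and the example peaks are made up for illustration.

```python
def pair_peaks(peaks, min_dist, max_dist):
    """Return BED2D-like tuples for same-chromosome peaks spaced min_dist..max_dist apart."""
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))

    pairs = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        for idx, (start_1, end_1) in enumerate(intervals):
            for start_2, end_2 in intervals[idx + 1:]:
                dist = start_2 - start_1
                if dist > max_dist:
                    break  # intervals are sorted, later ones are even further away
                if dist >= min_dist:
                    pairs.append((chrom, start_1, end_1, chrom, start_2, end_2))
    return pairs


# Made-up peaks as (chrom, start, end)
peaks = [
    ("chrI", 1000, 1200),
    ("chrI", 5000, 5200),
    ("chrI", 30000, 30200),
    ("chrI", 90000, 90200),
    ("chrII", 2000, 2200),
]

pairs = pair_peaks(peaks, min_dist=10000, max_dist=50000)
for pair in pairs:
    print("\t".join(str(field) for field in pair))
```

Each output line has six columns (chrom1, start1, end1, chrom2, start2, end2), which is the layout of the BED2D file written by the shell command.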

In [None]:
%%bash
chromosight quantify --pattern=loops input/scer_cohesin_peaks.bed2d data/g1.cool data/cohesin_loops