# Set up cLoops for annotating loops in HiC data processed with Juicer

Notebook by Frank Grenn  

Juicer pipeline by the Aiden Lab:   
[juicer github](https://github.com/aidenlab/juicer)  
cLoops pipeline by YaqiangCao:  
[cLoops github](https://github.com/YaqiangCao/cLoops)

This notebook depends on HiC data that the Juicer pipeline has already processed into [long format](https://github.com/aidenlab/juicer/wiki/Pre#file-format). The file is usually named "merged_nodups.txt".
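Before converting anything, it can help to confirm that the input really is in long format. The sketch below assumes the 16-column order given on the Juicer wiki page linked above; the `LongRecord` name, the `parse_long_line` helper, and the example line are made up for illustration and are not part of Juicer or cLoops.

```python
from collections import namedtuple

# Column order assumed from the Juicer "long format" description linked above.
LONG_FIELDS = [
    "str1", "chr1", "pos1", "frag1",
    "str2", "chr2", "pos2", "frag2",
    "mapq1", "cigar1", "sequence1",
    "mapq2", "cigar2", "sequence2",
    "readname1", "readname2",
]
LongRecord = namedtuple("LongRecord", LONG_FIELDS)


def parse_long_line(line):
    """Split one whitespace-delimited line into a LongRecord (hypothetical helper)."""
    fields = line.split()
    if len(fields) != len(LONG_FIELDS):
        raise ValueError(
            "expected {} fields, found {}".format(len(LONG_FIELDS), len(fields))
        )
    record = LongRecord(*fields)
    # Positions and mapping qualities are numeric; everything else stays a string.
    return record._replace(
        pos1=int(record.pos1),
        pos2=int(record.pos2),
        mapq1=int(record.mapq1),
        mapq2=int(record.mapq2),
    )


# Made-up example line, not real data.
example = "0 chr21 10000 5 16 chr21 250000 130 60 4M ACGT 60 4M TGCA read1/1 read1/2"
rec = parse_long_line(example)
print(rec.chr1, rec.pos1, rec.chr2, rec.pos2)
```

Running `parse_long_line` over the first few lines of merged_nodups.txt is a cheap way to catch a file that is in a different Juicer format before a long conversion job is submitted.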

In [14]:
USERDIR="/folder/username" # directory that contains the conda install, which will hold the custom python environment

### (1) Create environment for cLoops

In [9]:
print("cd {}".format(USERDIR))

cd /data/$USER


In [10]:
print("source {}/conda/etc/profile.d/conda.sh".format(USERDIR))
print("conda activate base")
print("which python")
print("conda update conda")
print("conda clean --all --yes")

source /data/$USER/conda/etc/profile.d/conda.sh
conda activate base
which python
conda update conda
conda clean --all --yes


In [8]:
print("cd temp")
print("git clone https://github.com/YaqiangCao/cLoops")
print("cd cLoops")
print("conda env create -n cLoops --file cLoops_env.yaml")

cd temp
git clone https://github.com/YaqiangCao/cLoops
cd cLoops
conda env create -n cLoops --file cLoops_env.yaml


---
#### troubleshooting:
If this fails on the ```joblib``` package, open cLoops_env.yaml, delete the joblib line, and rerun the ```conda env create -n cLoops --file cLoops_env.yaml``` command above.

Then run:

In [9]:
print("conda activate cLoops")
print("which pip")
print("pip install joblib")

conda activate cLoops
which pip
pip install joblib


That should fix it. If joblib is already installed in the environment, pip simply reports the requirement as satisfied.
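To confirm that joblib is importable from the environment, a small check like the one below can be run with the cLoops environment's python. It is only a sketch: `module_available` is a hypothetical helper, written so that it works under both Python 2.7 (which the cLoops environment uses) and Python 3.

```python
def module_available(name):
    """Return True if `name` can be imported by the interpreter running this code."""
    try:
        __import__(name)
        return True
    except ImportError:
        return False


# Report on the package that the troubleshooting step above installs.
for name in ("joblib",):
    print("{}: {}".format(name, "ok" if module_available(name) else "missing"))
```

Note that the result describes whichever interpreter runs the code, so running it in the notebook kernel says nothing about the cLoops environment.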

---

In [1]:
print("conda activate cLoops")
print("python setup.py install")


conda activate cLoops
python setup.py install


---
#### troubleshooting:
If this fails, you may need to set the PYTHONPATH environment variable and add --prefix to the install command:


In [12]:
print("export PYTHONPATH={}/conda/envs/cLoops/lib/python2.7/site-packages/".format(USERDIR))
print("python setup.py install --prefix={}/conda/envs/cLoops".format(USERDIR))

export PYTHONPATH=/data/$USER/conda/envs/cLoops/lib/python2.7/site-packages/
python setup.py install --prefix=/data/$USER/conda/envs/cLoops


---

In [13]:
WRKDIR="/folder/for/cLoops/run/files"
OUT_PREFIX="35236"
CLOOPS_OUTDIR="/folder/for/output/{}".format(OUT_PREFIX)
BEDPE_FILE="{}/{}_PET.bedpe".format(CLOOPS_OUTDIR,OUT_PREFIX)
JUICERLONG_FILE="/path/to/sample/aligned/merged_nodups.txt"
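The job scripts are written into `WRKDIR/scripts`, and the conversion and cLoops jobs write into `CLOOPS_OUTDIR`, but nothing in this notebook creates those directories. A minimal sketch for creating them up front follows; `ensure_dirs` is a hypothetical helper, and the demonstration uses a throwaway temporary directory rather than the real paths.

```python
import os
import tempfile


def ensure_dirs(*paths):
    """Create each directory (with parents) if it is missing; return the paths."""
    for path in paths:
        os.makedirs(path, exist_ok=True)
    return list(paths)


# Demonstration in a throwaway directory. In this notebook the call would be
# ensure_dirs(os.path.join(WRKDIR, "scripts"), CLOOPS_OUTDIR)
demo_root = tempfile.mkdtemp()
created = ensure_dirs(
    os.path.join(demo_root, "scripts"),
    os.path.join(demo_root, "output", "35236"),
)
print(created)
```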

### (2) Write bash job script to run the python conversion script


In [83]:
# The three USERDIR placeholders fill in, in order: the conda.sh path,
# the PYTHONPATH, and the location of the cLoops clone.
with open("{}/scripts/juicerLong2bedpe_{}.sh".format(WRKDIR, OUT_PREFIX), "w") as text_file:
    print("#!/bin/bash \n\
source {}/conda/etc/profile.d/conda.sh \n\
module load python \n\
conda activate cLoops \n\
export PYTHONPATH={}/conda/envs/cLoops/lib/python2.7/site-packages/ \n\
{}/temp/cLoops/scripts/juicerLong2bedpe.py -i {} -o {} \n\
echo 'done'".format(USERDIR, USERDIR, USERDIR, JUICERLONG_FILE, BEDPE_FILE), file=text_file)
# No explicit close() is needed: the with block closes the file on exit.
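Positional `{}` placeholders are easy to misorder when the same value is repeated. As an alternative, the same script text can be produced from a template with named placeholders. This is only a sketch: `CONVERT_TEMPLATE` and `render_convert_script` are hypothetical names, and the script body mirrors the cell above.

```python
# Same job script as above, with named placeholders instead of positional ones.
CONVERT_TEMPLATE = """#!/bin/bash
source {userdir}/conda/etc/profile.d/conda.sh
module load python
conda activate cLoops
export PYTHONPATH={userdir}/conda/envs/cLoops/lib/python2.7/site-packages/
{userdir}/temp/cLoops/scripts/juicerLong2bedpe.py -i {juicer_long} -o {bedpe}
echo 'done'
"""


def render_convert_script(userdir, juicer_long, bedpe):
    """Fill in the template; each argument is used wherever its name appears."""
    return CONVERT_TEMPLATE.format(
        userdir=userdir, juicer_long=juicer_long, bedpe=bedpe
    )


script = render_convert_script(
    "/folder/username",
    "/path/to/sample/aligned/merged_nodups.txt",
    "/folder/for/output/35236/35236_PET.bedpe",
)
print(script)
```

The rendered string can then be written to the job script path with a plain `with open(...) as text_file: text_file.write(script)`.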

In [4]:
print("sbatch --mem=200g --cpus-per-task=10 --mail-type=ALL --time=24:00:00 {}/scripts/juicerLong2bedpe_{}.sh".format(WRKDIR,OUT_PREFIX))

sbatch --mem=200g --cpus-per-task=10 --mail-type=ALL --time=24:00:00 /data/LNG/Frank/HiC_project/cLoops/scripts/juicerLong2bedpe_35236.sh
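Once the conversion job has finished, a quick look at the BEDPE file is worthwhile before committing to the much longer cLoops run. The sketch below assumes the standard BEDPE layout, in which the first six tab-separated columns are chrom1, start1, end1, chrom2, start2, end2; it is not taken from the cLoops source, and `check_bedpe_lines` is a hypothetical helper.

```python
import io


def check_bedpe_lines(lines, max_lines=1000):
    """Check up to max_lines non-empty lines; return how many were checked.

    Raises ValueError on the first line with fewer than six columns or with
    a non-integer start/end coordinate.
    """
    checked = 0
    for line in lines:
        if checked >= max_lines:
            break
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            raise ValueError(
                "line {}: expected at least 6 tab-separated columns, found {}".format(
                    checked + 1, len(fields)
                )
            )
        for idx in (1, 2, 4, 5):
            int(fields[idx])  # int() raises ValueError on non-numeric text
        checked += 1
    return checked


# Made-up records for demonstration, not real data.
demo = io.StringIO(
    "chr21\t100\t200\tchr21\t5000\t5100\tread1\t60\t+\t-\n"
    "chr21\t300\t400\tchr21\t9000\t9100\tread2\t60\t-\t+\n"
)
print(check_bedpe_lines(demo))
```

For the real file, the call would look like `with open(BEDPE_FILE) as handle: check_bedpe_lines(handle)`.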


Example cLoops runs (step 3 below uses the second form, with `-m 3`):

cLoops -f GSM1551552_GM12878_HiC_chr21_hg38.bedpe.gz -o hic -w -j -eps 5000,7500,10000 -minPts 20,30 -s -hic
cLoops -f GSM1551552_GM12878_HiC_chr21_hg38.bedpe.gz -o hic -w -j -s -m 3

### (3) Write bash job script to run cLoops

In [85]:
# The two USERDIR placeholders fill in the conda.sh path and the PYTHONPATH.
with open("{}/scripts/run_cLoops_{}.sh".format(WRKDIR, OUT_PREFIX), "w") as text_file:
    print("#!/bin/bash \n\
source {}/conda/etc/profile.d/conda.sh \n\
module load python \n\
conda activate cLoops \n\
export PYTHONPATH={}/conda/envs/cLoops/lib/python2.7/site-packages/ \n\
cd {} \n\
cLoops -f {} -o {} -w -j -s -m 3 -plot -p -1 \n\
echo 'done'".format(USERDIR, USERDIR, CLOOPS_OUTDIR, BEDPE_FILE, OUT_PREFIX), file=text_file)
# No explicit close() is needed: the with block closes the file on exit.

In [4]:
print("sbatch --mem=800g --partition=largemem --cpus-per-task=10 --mail-type=ALL --time=10-0 {}/scripts/run_cLoops_{}.sh".format(WRKDIR,OUT_PREFIX))

sbatch --mem=800g --partition=largemem --cpus-per-task=10 --mail-type=ALL --time=10-0 /data/LNG/Frank/HiC_project/cLoops/scripts/run_cLoops_35236.sh
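Both submission commands in this notebook share the same shape, so they can be assembled by one small function. The sketch below only builds the command string and does not submit anything; `sbatch_command` is a hypothetical helper, and the flags are the ones already used above.

```python
import shlex


def sbatch_command(script, mem, cpus, time, partition=None, mail_type="ALL"):
    """Build an sbatch command line using the flags from the cells above."""
    args = ["sbatch", "--mem={}".format(mem)]
    if partition:
        args.append("--partition={}".format(partition))
    args += [
        "--cpus-per-task={}".format(cpus),
        "--mail-type={}".format(mail_type),
        "--time={}".format(time),
        script,
    ]
    # shlex.quote leaves plain arguments untouched and quotes paths with spaces.
    return " ".join(shlex.quote(arg) for arg in args)


# Hypothetical script paths, for illustration only.
print(sbatch_command("/x/scripts/juicerLong2bedpe_35236.sh", "200g", 10, "24:00:00"))
print(sbatch_command("/x/scripts/run_cLoops_35236.sh", "800g", 10, "10-0", partition="largemem"))
```

Keeping the memory, time, and partition as arguments makes it easier to adjust the resource request per sample without editing the command text by hand.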
