Skip to content

batxes/4Cin

Repository files navigation

Important Note

If you have any problem with installation, running the code, or ANYTHING, write an email (batxes@gmail.com) so that I can help you. There is nothing more annoying than a program that does not work.

4Cin

4Cin is a suite of scripts that lets you generate 3D models of the chromatin of your favourite locus, using 4C-seq data as the only input. These 3D models are visualized with UCSF Chimera. The pipeline creates Hi-C-like contact maps (or virtual Hi-C) of these 3D models and additional scripts are provided to analyze the region further.

Using Docker to execute 4Cin

A docker image can be downloaded from Docker Hub with all dependencies installed and ready to work. Steps are:

Install Docker: https://docs.docker.com/engine/installation/

Pull the image:

docker pull batxes/4cin_ubuntu

Now, you can use all the scripts like this:

docker run -it batxes/4cin_ubuntu 4Cin.py
docker run -it batxes/4cin_ubuntu Mut_comp.py
docker run -it batxes/4cin_ubuntu calculate_boundaries.py
...

We can use volumes so docker can access the data that is in your computer:

docker run -it -v /path/in/my/computer/to/4C/data:/data/ batxes/4cin_ubuntu data_manager.py /data/

To run the modeling, we can mount two volumes: one will be for docker to access the 4C data. The other one to get the models that are generated in the docker container.

sudo docker run -it -v <path_where_my_4C_data_is>:/data/ -v <path_where_I_want_my_new_models>:/opt/4Cin/<Name_of_my_models> batxes/4cin_ubuntu 4Cin.py /data/ <Name_of_my_models> --cpu 10 --fragments_in_each_bead 20 --colormap magma_r

You can test with the data that it is provided:

sudo docker run -it -v /home/<name>/Programs/4Cin-master/data/Six_zebra/:/data/Six_zebra -v /home/<name>/MyNewModels:/opt/4Cin/Six_zebra_models batxes/4cin_ubuntu 4Cin.py /data/Six_zebra/ Six_zebra_models --cpu 10 --fragments_in_each_bead 20 --colormap magma_r

The first volume links a folder in our computer with /data/Six_zebra. We use the later as a 4Cin parameter, the directory where the data is.

The second volumne links a folder in our computer (MyNewModels will be a directory that it will be created when we run the program) with /opt/4Cin/Six_zebra_models. /opt/4Cin/Six_zebra_models is a directory that it will be created in the docker container because we specified Six_zebra_models as the name for the directory where our models will be created as a parameter. If we want our models to be called mouse_WT_rep2, then we will have to link the second volume like this: /home//MyNewModels:/opt/4Cin/mouse_WT_rep2

These scripts are available now:

4Cin.py

data_manager.py

paint_model.py

prepare_data.py

calculate_boundaries.py

prepare_data.py

calculate_vhic_from_realdata.py

Mut_comp.py

Evo_comp.py

HiC_comp.py

If you want to use it in your system or cluster without docker, go to https://github.com/batxes/4Cin/blob/master/README.md#dependencies.

Execute 4Cin without docker (The normal way)

First we need to install all the dependencies (https://github.com/batxes/4Cin/blob/master/README.md#dependencies). Then u can check the fast usage (https://github.com/batxes/4Cin/blob/master/README.md#fast-usage) or the explained usage (https://github.com/batxes/4Cin/blob/master/README.md#explained-usage)

Fast Usage

1 - python src/prepare_data.py /path/to/data/all_4C_data
2 - Provide a primers.txt file with name of primers and position:
    viewpoint1 chr2:423234
    viewpoint2 chr2:426351
    viewpoint3 chr2:449584
    viewpoint4 chr2:501421    
3 - python 4Cin.py /path/to/data/ Name_of_your_locus

A log.txt is generated in the modeling directory with all the variables used in the modeling.

[Optionals]

1.5 - python src/data_manager.py /path/to/my/data/ Name_of_your_locus
4 - python src/paint_model.py Representative.py 
                              /path/to/my/data/ 
                              epigenetic_data colormap
5 - python src/calculate_boundaries.py vhic.txt
6 - python src/Evo_comp.py /path/to/my/data/ 
                            Name_of_your_locus 
                            vhic.txt 
                            max_distance_of_locus
                            /path/to/my/otherdata/ 
                            Name_of_your_other_locus 
                            other_vhic.txt 
                            max_distance_of_other_locus
7 - python src/Mut_comp.py /path/to/my/data/ 
                            locus vhic.txt 
                            max_distance_of_locus
                            mutant_locus 
                            mutant_vhic.txt 
                            max_distance_of_mutant_locus

Examples

Six Zebrafish models with 10 cores, 20 4C-seq fragments in each bead and magma_r colormap for the virtual Hi-C:

python 4Cin.py data/Six_zebra/ Six_zebra_models --cpu 10 --fragments_in_each_bead 20 --colormap magma_r

Same as before but generating 10000 models and getting the best 50 models:

python 4Cin.py data/Six_zebra/ Six_zebra_models --cpu 10 --Nmodels 10000 --subset 50 --fragments_in_each_bead 20 --colormap magma_r

Same as before but jumping the pre-modeling step (we need to set also the max_distance, uZ and lZ:

python 4Cin.py data/Six_zebra/ Six_zebra_models --cpu 10 --Nmodels 10000 --subset 50 --jump_step pre_modeling --max_distance 8000 --uZ 0.2 --lZ -0.5 --fragments_in_each_bead 20 --colormap magma_r

If we only want to change the virtual Hi-C scale

python 4Cin.py data/Six_zebra/ Six_zebra_models --repaint_vhic --maximum_hic_value 5000 --max_distance 8000 --uZ 0.2 --lZ -0.5 --fragments_in_each_bead 20 --colormap magma_r

python 4Cin.py --help

usage: 4Cin.py [-h] [--preNmodels PRE_NUMBER_OF_MODELS]
               [--from_dist FROM_DIST] [--to_dist TO_DIST]
               [--dist_bins DIST_BINS] [--from_zscore FROM_ZSCORE]
               [--to_zscore TO_ZSCORE] [--zscore_bins ZSCORE_BINS]
               [--Nmodels NUMBER_OF_MODELS]
               [--ignore_beads IGNORE_BEADS [IGNORE_BEADS ...]]
               [--subset SUBSET] [--std_dev STD_DEV]
               [--cut_off_percentage CUT_OFF_PERCENTAGE] [--k_value K_MEAN]
               [--maximum_hic_value MAXIMUM_HIC_VALUE] [--repaint_vhic]
               [--colormap COLORMAP] [--cpu NUMBER_OF_CPUS] [--verbose]
               [--working_dir WORKING_DIR]
               [--fragments_in_each_bead FRAGMENTS_IN_EACH_BEAD]
               [--jump_step {pre_modeling,modeling,analysis,vhic,representative}]
               [--uZ UZ] [--lZ LZ] [--max_distance MAX_DISTANCE]
               data_dir prefix

Program that generates 3D models and a virtual Hi-C of your favourite region.

optional arguments:
  -h, --help            show this help message and exit

Pre-modeling:
  Parameters used in the pre-modeling

  --preNmodels PRE_NUMBER_OF_MODELS
                        number of models that will be generated in the pre-
                        modeling phase
  --from_dist FROM_DIST
                        minimum max-distance that will be used in the pre-
                        modeling phase
  --to_dist TO_DIST     maximum max-distance that will be used in the pre-
                        modeling phase
  --dist_bins DIST_BINS
                        size of jump between from_dist and to_dist
  --from_zscore FROM_ZSCORE
                        minimum Z-score that will be used in the pre-modeling
                        phase
  --to_zscore TO_ZSCORE
                        maximum Z-score that will be used in the pre-modeling
                        phase
  --zscore_bins ZSCORE_BINS
                        size of jump between from_zscore and to_zscore

Modeling:
  Parameters used in the modeling

  --Nmodels NUMBER_OF_MODELS
                        number of models that will be generated in the
                        modeling phase
  --ignore_beads IGNORE_BEADS [IGNORE_BEADS ...]
                        Beads that are not gonna have distance restraints.
                        Also usable in the pre-modeling

Analysis and clustering:
  Parameters used in the analysis and clustering

  --subset SUBSET       Number of best models out of the Modeling process
  --std_dev STD_DEV     Standard deviation of the distances between beads, to
                        be considered fulfilled
  --cut_off_percentage CUT_OFF_PERCENTAGE
                        Percetange of fulfilled distances in each model to be
                        a good model
  --k_value K_MEAN      Number of cluster to expect in the clustering.

Virtual Hi-C:
  Parameters used in the generation of the virtual Hi-C

  --maximum_hic_value MAXIMUM_HIC_VALUE
                        The virtual Hi-C gradient color will be from 0 to
                        maximum_hic_value.
  --repaint_vhic        repaint_vhic True to generate the virtual Hi-C again.
                        Modify the --maximum_hic_value also.
  --colormap COLORMAP   The colormap of the virtual Hi-C. Matplotlib colormap.

Global:
  Global parameters used in the modeling

  data_dir              location of the 4C data. primers.txt needs tobe in
                        there also
  prefix                Name of the models
  --cpu NUMBER_OF_CPUS  number of CPUs that will be used in this script
  --verbose             Verbose True for more information while executing the
                        script
  --working_dir WORKING_DIR
                        location where the models will be generated
  --fragments_in_each_bead FRAGMENTS_IN_EACH_BEAD
                        Number of fragments that will be represented with each
                        bead
  --jump_step {pre_modeling,modeling,analysis,vhic,representative}
                        Jump the step and the previous ones. The steps in
                        order are: Pre-Modeling, Modeling, Analysis &
                        Clustering, virtual Hi-C calculation, most
                        representative model
  --uZ UZ               Upper bound Z score (Only needed if jumping pre-
                        modeling steps)
  --lZ LZ               Lower bound Z score (Only needed if jumping pre-
                        modeling steps)
  --max_distance MAX_DISTANCE
                        Maximum distance (Only needed if jumping pre-modeling
                        steps)

Apart from the 4C data, a primers.txt file is needed in that folder, which has
4C file name and position of the gene. Optionaly, a primers_vhic.txt file can
be also created to paint interesting positions in the virtual Hi-C. File needs
to be like this: gene_name chrN:position color. Color written as yellow, red,
green or other colors.

Explained Usage

1 - run "prepare_data.py" to homogenize the 4C files. All 4C files need to have the same number of fragments and the same length.

Example: python src/prepare_data.py /path/to/data/all_4C_data

2 - Generate primers.txt file, with name of genes and position. Optional, primers_vhic.txt (the same format as in primers.txt) with more positions to paint in the virtual Hi-C. color can be added afterwards.

Example primers.txt:

viewpoint1 chr2:423234
viewpoint2 chr2:426351
viewpoint3 chr2:449584
viewpoint4 chr2:501421    

Example primers_vhic.txt:

viewpoint1 chr2:423234 red
viewpoint2 chr2:426351 cyan
viewpoint3 chr2:449584
viewpoint4 chr2:501421 
geneA chr2:526921 lightgreen
geneB chr2:439558 darkviolet
enhancer1 chr2:468954 lightgreen

if no color stated, default will be yellow.

3 - Do the modeling.

Example:

python 4Cin.py /path/to/data/ Name_of_your_locus

A log.txt is generated in the modeling directory with all the variables used in the modeling.

Optional Steps:

11 - run "paint_model.py". It will map epigenetic marks in a model of our choice. We will set the path of the bed or bam file and the colormap (matplotlib).

Example:

      python src/paint_model.py /home/user/4Cin/MyModels/MyModels_final_output/Representative.py 
                                /home/user/4Cin/data/my_data/  
                                /home/user/4Cin/data/epigenetic_data.bam 
                                Blues

12 - run "calculate_boundaries.py". Given the virtual Hi-C matrix it plots the directionality index plot.

Example:

python src/calculate_boundaries.py /home/user/4Cin/MyModels/MyModels_final_output/vhic_MyModels.txt

13 - run "Evo_comp.py". Evolutive comparison. Given two vhics, it creates a hi-c like matrix with the relative positions of both loci. The conserved regions will be set in two primers_evocomp.txt, each under each data directory.

Example primers_evocomp.txt:

a chr2:423234 red
b chr2:426351 cyan
c chr2:449584
d chr2:501421 
geneA chr2:526921 lightgreen
geneB chr2:439558 darkviolet
e chr2:468954 lightgreen

Example

      python src/Evo_comp.py /home/user/4Cin/data/my_zebra_data/ 
                             Zebra_models 
                             /home/user/4Cin/Zebra_models/Zebra_models_final_output/vhic_Zebra_models.txt 
                             max_distance_of_Zebra_models_locus
                             /home/user/4Cin/data/my_mouse_data/ 
                             Mouse_models 
                             /home/user/4Cin/Mouse_models/Mouse_models_final_output/vhic_Mouse_models.txt 
                             max_distance_of_Mouse_models_locus

14 - run "Mut_comp.py". Mutation comparison. The same as Evo_comp.py, but this time the same locus is compared. Useful to study structural genomic variations like inversions, truncations, deletions... primers_vhic.txt will be used to paint positions in the vhic.

Example

      python src/Mut_comp.py /home/user/4Cin/data/my_locus/ 
                             wt_models 
                             /home/user/4Cin/wt_models/wt_models_final_output/vhic_wt_models.txt 
                             max_distance_of_wt_models
                             mutant_models 
                             /home/user/4Cin/mutant_models/mutant_models_final_output/vhic_mutant_models.txt 
                             max_distance_of_mutant_models

15 - Input data can be checked calling data_manager.py. Shows 3 plots for each 4C file, showing read counts, Z scores and the conversion into distance restraints that would be used in the modeling.

Example:

python src/data_manager.py /home/user/4Cin/data/my_locus/ [0.2 -0.4 8000]  

Dependencies

Note: Tested only in Linux, of course.

python 2.7
matplotlib
scipy
numpy
UCSF Chimera (Download from https://www.cgl.ucsf.edu/chimera/download.html)
IMP 2.5, 2.4 (newer versions crash) (Download from http://integrativemodeling.org/old-versions.html)
pysam (for paint_model.py)

Go to Installing dependencies(https://github.com/batxes/4Cin/blob/master/README.md#installing-dependencies) to install them. If you have no sudo, go to Installing without SUDO(https://github.com/batxes/4Cin/blob/master/README.md#installing-without-sudo).

Installing dependencies

Matplotlib, scipy and numpy:

apt-get install python-matplotlib
apt-get install python-scipy
apt-get install python-numpy

To install Chimera:

Download from: https://www.cgl.ucsf.edu/chimera/download.html

make it executable:

chmod +x chimera-installer.bin

run it:

./chimera-installer.bin

#in the installation process, set a symbolic link wherever you want. If it was not set, you can generate one afterwards, for example:

ln -s /pathtoCHIMERAfiles/bin/chimera /usr/local/bin/chimera 

if problems visit: https://www.cgl.ucsf.edu/chimera/data/downloads/1.11.2/linux.html

Install IMP from source:

sudo apt-get install cmake
sudo apt-get install libboost-all-dev
sudo apt-get install libhdf5-dev
sudo apt-get install libcgal-dev
sudo apt-get install python-dev
sudo apt-get install libpcre3-dev

Swig can be installed with: sudo apt-get install swig Or can be installed from source. Check in: http://swig.org/

tar xvf swig-version.tar.gz
cd swig-version
./configure --prefix=/path/to/swig
make & make install

Download the IMP tarball file from http://salilab.org/imp/ and uncompress it:

wget https://integrativemodeling.org/2.5.0/download/imp-2.5.0.tar.gz -O imp-2.5.0.tar.gz
tar xzvf imp-2.5.0.tar.gz

Create a directory for the IMP instalation.

mkdir IMP

Move into the IMP directory and compile the code (Note: the -j option stands for the number of CPUs you want to assign to the compiler; the higher the faster).

cd IMP
cmake ../imp-2.5.0 -DCMAKE_BUILD_TYPE=Release -DIMP_MAX_CHECKS=NONE -DIMP_MAX_LOG=SILENT
make -j4

Once the compilation has finished, open the file setup_environment.sh in your IMP directory and copy the first lines into your >~/.bashrc file (if this file in not present in your home directory, create it). These lines should look like:

LD_LIBRARY_PATH="/path/to/IMP/lib:/path/to/IMP/src/dependency/RMF/:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH
PYTHONPATH="/path/to/IMP/lib:/path/to/IMP/src/dependency/RMF/:$PYTHONPATH"
export PYTHONPATH

Important note: Do not copy the lines above, copy them from setup_environment.sh, where SOMETHING is replaced by your real path to IMP.

Installation process of IMP taken from: https://3dgenomes.github.io/TADbit/install.html#imp-3d-modeling

Install pysam (only for paint_model.py)

sudo apt-get install python-pip
pip install pysam. 

If does not work try this:

git clone git@github.com/pysam-developers/pysam
python setup.py build
python setup.py install (libcurl4-gnutls-dev )
!if u get an error saying regcompA was not found, rename regex.h from the boost library (in my case /usr/local/include/regex.h) to something else before building (like, "regex.heyho"). Then change it back to "regex.h"!

Installing without SUDO

Note: python and its libraries are not explained how to install. If you are installing these programs in a cluster it is very likely that python and its libraries are already installed.

install latest cmake (currently 3.8.0)

tar xvf cmake-3.8.0-rc1.tar.gz
cd cmake-3.8.0-rc1
./configure --prefix=/path/to/cmake
make & make install

Include cmake in the $PATH environment variable

export PATH=$PATH:/path/to/cmake/bin

install latest boost C++ (currently 1.63.0)

tar xvf boost_1_63_0.tar.gz
cd boost_1_63_0
./bootstrap.sh
./b2 install --prefix=/path/to/boost

install latest swig currently(3.0.12)

tar xvf swig-3.0.12.tar.gz
cd swig-3.0.12
./configure --prefix=/path/to/swig
make & make install

install latest hdf5 currently (1.8.18)

tar xf hdf5-1.8.18.tar
cd hdf5-1.8.18
./configure --prefix=/path/to/hdf5
make & make install

Download IMP 2.5.0 and unpack

tar xvf imp-2.5.0.tar.gz 
mkdir IMP
cd IMP 

Prepare the environmental variables with paths to boost and hdf5

export BOOST_ROOT=/path/to/boost_dir
export HDF5_ROOT=/path/to/hdf5_dir

COMPILE IMP

cmake ../imp-2.5.0 -DCMAKE_BUILD_TYPE=Release -DIMP_MAX_CHECKS=NONE -DIMP_MAX_LOG=SILENT -DSWIG_EXECUTABLE=/path/to/swig-3.0.12/swig
make -j4

Additional scripts

Getting 4C data like from Hi-C

python data/get_data.py -> generates 4C like files from the Hi-C file

calculate_vhic_from_realdata.py convert_HiC_data.py HiC_comp.py

Notes

  • If you want to concatenate the beads with a tube, after openning the model in UCSF-Chimera, write this in its command line: "shape tube #X-Y radius Z bandlength 10000" (X and Y being the first and last beads, Z being the thickness of tube in Angstroms.)

  • All the data will be stored under a directory with the same name as the prefix, unless we set it under --working_dir parameter

  • bam files need to be sorted and indexed before using.

  • --: samtools sort mouse_h3k4me3_ES_bingren_rep1.bam mouse_h3k4me3_ES_bingren_rep1.sorted

  • --: samtools index mouse_h3k4me3_ES_bingren_rep1.sorted

FAQ

I have UCSF Chimera installed but it does not work. Create a link to chimera with "ln -s" and give permissions to whole python2.7 inside chimera/bin

"AttributeError: 'Model' object has no attribute 'this'" install Swig 3.0.7. For this perhaps u need to install sudo apt-get install libpcre3 libpcre3-dev

matplotlib colors are here: http://matplotlib.org/examples/color/colormaps_reference.html

GenomePainting does not work. Check the file generated coloring.cmd and fix the path. Chimera does not like if the first line is something like "open ../myModel.py". Change to something like /home/user/myModel.py.

Modeling can't find my viewpoints or I can't show positions in the vHi-Cs. Check that your primers files (primers.txt, primers_vhic.txt and primers_evocomp.txt) are as follows: NAME chrX:position

references

ref1. Tjong H, Gong K, Chen L, Alber F. Physical tethering and volume exclusion determine higher-order ge- nome organization in budding yeast. Genome Res. 2012; 22: 1295–1305. doi: 10.1101/gr.129437.111 PMID: 22619363

ref2. Bystricky K, Heun P, Gehlen L, Langowski J, Gasser SM. Long-range compaction and flexibility of inter- phase chromatin in budding yeast analyzed by high-resolution imaging techniques. Proc Natl Acad Sci U S A. 2004; 101: 16495–16500. PMID: 15545610

Overview

alt tag

More Technical figure: alt tag

About

3D chromatin modeler using 4C-seq data

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages