process_hicup
This pipeline takes as input two FASTQ files, the RE sites, the genome indexed with GEM and the same genome as a FASTA file. It uses TADbit to map and filter the reads and to produce a BED file, which is later used to produce a BAM file compatible with the CHiCAGO algorithm. More information about filtering and mapping: https://3dgenomes.github.io/TADbit/
- config : str
Configuration JSON file
- in_metadata : str
Location of input JSON metadata for files
- out_metadata : str
Location of output JSON metadata for files
- Wd : folders and files
Path to the working directory where the output files are written
REQUIREMENT - Needs two single-end FASTQ files, the genome in FASTA format and the genome indexed with bowtie2.
When running the pipeline on a local machine without COMPSs:
python process_hicup.py \
--config tests/json/config_hicup.json \
--in_metadata tests/json/input_hicup.json \
--out_metadata tests/json/output_hicup.json \
--local
When using a local version of the [COMPSs virtual machine](https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar/):
runcompss \
--lang=python \
--library_path=${HOME}/bin \
--pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
--log_level=debug \
process_hicup.py \
--config tests/json/config_hicup.json \
--in_metadata tests/json/input_hicup.json \
--out_metadata tests/json/output_hicup.json
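Every pipeline in this document is launched in one of the two ways shown above. The sketch below rebuilds both command lines as argument lists so they can be passed to `subprocess`. The helper name `build_command` is hypothetical, and the default `--pythonpath` keeps the `<pyenv_virtenv_dir>` placeholder from the text, which you must replace with your own virtualenv path.

```python
import os


def build_command(script, config, in_metadata, out_metadata, local=True,
                  pythonpath="/<pyenv_virtenv_dir>/lib/python2.7/site-packages/"):
    """Return the documented invocation of `script` as an argv list.

    Hypothetical helper: local=True gives the plain `python ... --local` form,
    local=False gives the `runcompss` form with the flags used in this document.
    """
    args = ["--config", config,
            "--in_metadata", in_metadata,
            "--out_metadata", out_metadata]
    if local:
        return ["python", script] + args + ["--local"]
    return ["runcompss",
            "--lang=python",
            # Equivalent of ${HOME}/bin in the shell example.
            "--library_path=" + os.path.join(os.path.expanduser("~"), "bin"),
            "--pythonpath=" + pythonpath,
            "--log_level=debug",
            script] + args
```

For example, `subprocess.check_call(build_command("process_hicup.py", "tests/json/config_hicup.json", "tests/json/input_hicup.json", "tests/json/output_hicup.json"))` would run the local variant.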
process_hicup.process_hicup
process_rmap
This pipeline creates the .rmap file, one of the inputs of CHiCAGO. The file consists of the columns
<chr> <start> <end> <numeric ID> and is a virtual digest of the genome with a restriction enzyme (RE).
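A virtual digest can be sketched as: find every occurrence of the RE recognition site, cut at a fixed offset inside it, and number the resulting fragments. This is a minimal illustration, not the code of process_rmap.py. It assumes the genome is already loaded as a `{chrom: sequence}` dict, uses 0-based half-open coordinates and assigns IDs consecutively from 1; the real .rmap file may use a different coordinate convention. The function name `virtual_digest` is hypothetical.

```python
import re


def virtual_digest(genome, site, cut_offset):
    """Yield (chrom, start, end, frag_id) rows for an .rmap-style table.

    genome: {chrom: sequence}
    site: RE recognition sequence, e.g. "AAGCTT" for HindIII
    cut_offset: cut position inside the site (1 for HindIII, A^AGCTT)
    """
    # Lookahead so that overlapping occurrences of the site are also found.
    pattern = re.compile("(?=%s)" % re.escape(site.upper()))
    frag_id = 0
    for chrom in sorted(genome):
        seq = genome[chrom].upper()
        cuts = [m.start() + cut_offset for m in pattern.finditer(seq)]
        bounds = [0] + cuts + [len(seq)]
        for start, end in zip(bounds, bounds[1:]):
            if end > start:  # a cut at the very start or end gives no fragment
                frag_id += 1
                yield chrom, start, end, frag_id
```

For instance, digesting `{"chr1": "CCAAGCTTGGAAGCTTTT"}` with HindIII gives three fragments, cut at positions 3 and 11.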
- config : str
Configuration JSON file
- in_metadata : str
Location of input JSON metadata for files
- out_metadata : str
Location of output JSON metadata for files
output_files : .rmap file
Rtree_files : Rtree file with information about the RE fragments in the genome; it is used by process_baitmap.py
REQUIREMENT - Needs the FASTA file of the genome and an RE defined in the config file.
When running the pipeline on a local machine without COMPSs:
python process_rmap.py \
--config tests/json/config_rmap.json \
--in_metadata tests/json/input_rmap.json \
--out_metadata tests/json/output_rmap.json \
--local
When using a local version of the [COMPSs virtual machine](https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar/):
runcompss \
--lang=python \
--library_path=${HOME}/bin \
--pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
--log_level=debug \
process_rmap.py \
--config tests/json/config_rmap.json \
--in_metadata tests/json/input_rmap.json \
--out_metadata tests/json/output_rmap.json
process_rmap.process_rmap
process_baitmap
This pipeline creates the .baitmap file, one of the inputs of CHiCAGO. The file consists of the columns <chr> <start> <end> <numeric ID> <annotation> and is a subset of the .rmap file, containing the RE fragments that overlap with the baits provided by the user. Baits are the RE fragments that are captured during the experimental protocol.
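The subsetting step can be sketched as an interval-overlap filter over the .rmap rows. In this sketch a linear scan stands in for the Rtree index that the pipeline builds, and the bait intervals are assumed to be already known (for example, from the probe alignments); the annotation column is assumed to be the comma-joined names of the overlapping baits. `make_baitmap` is a hypothetical name.

```python
def make_baitmap(rmap_rows, baits):
    """Return .baitmap-style rows (chrom, start, end, frag_id, annotation).

    rmap_rows: iterable of (chrom, start, end, frag_id), half-open intervals
    baits: iterable of (chrom, start, end, name) probe/bait intervals
    """
    baits_by_chrom = {}
    for chrom, start, end, name in baits:
        baits_by_chrom.setdefault(chrom, []).append((start, end, name))

    out = []
    for chrom, start, end, frag_id in rmap_rows:
        # Two half-open intervals overlap when each starts before the other ends.
        names = [name for b_start, b_end, name in baits_by_chrom.get(chrom, [])
                 if b_start < end and start < b_end]
        if names:
            out.append((chrom, start, end, frag_id, ",".join(sorted(set(names)))))
    return out
```

A spatial index such as an Rtree replaces the inner scan with a logarithmic lookup, which matters for genome-wide RE maps.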
- config : str
Configuration JSON file
- in_metadata : str
Location of input JSON metadata for files
- out_metadata : str
Location of output JSON metadata for files
out_baitmap : .baitmap file
out_sam : .sam file used to generate the .baitmap file
- REQUIREMENT - Needs the Rtree file generated by process_rmap.py,
the genome indexed with bwa and a file with the probes used.
When running the pipeline on a local machine without COMPSs:
python process_baitmap.py \
--config tests/json/config_baitmap.json \
--in_metadata tests/json/input_baitmap.json \
--out_metadata tests/json/output_baitmap.json \
--local
When using a local version of the [COMPSs virtual machine](https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar/):
runcompss \
--lang=python \
--library_path=${HOME}/bin \
--pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
--log_level=debug \
process_baitmap.py \
--config tests/json/config_baitmap.json \
--in_metadata tests/json/input_baitmap.json \
--out_metadata tests/json/output_baitmap.json
process_baitmap.process_baitmap
process_design
This script uses the .rmap and .baitmap files as input and generates the design files:
- NPerBin file (.npb): <baitID> <Total no. valid restriction fragments in distance bin 1> ... <Total no. valid restriction fragments in distance bin N>, where the bins map within the "proximal" distance range from each bait (0; maxLBrownEst] and the bin size is defined by the binsize parameter.
- NBaitsPerBin file (.nbpb): <otherEndID> <Total no. valid baits in distance bin 1> ... <Total no. valid baits in distance bin N>, where the bins map within the "proximal" distance range from each other end (0; maxLBrownEst] and the bin size is defined by the binsize parameter.
- Proximal Other End (ProxOE) file (.poe): <baitID> <otherEndID> <absolute distance> for all combinations of baits and other ends that map within the "proximal" distance range from each other (0; maxLBrownEst].
The data in each file is preceded by a comment line listing the input parameters used to generate it.
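The three tables can be illustrated with a small, unoptimized sketch. It is not the algorithm behind process_design.py and rests on stated assumptions: distances are measured between fragment midpoints on the same chromosome, a "valid" fragment is one whose length lies within hypothetical `min_len`/`max_len` limits, and bin k (counting from 0) covers distances (k·binsize, (k+1)·binsize]. `Fragment` and `design_tables` are hypothetical names.

```python
from collections import namedtuple

# Hypothetical in-memory form of one .rmap row: <chr> <start> <end> <numeric ID>
Fragment = namedtuple("Fragment", "chrom start end frag_id")


def midpoint(frag):
    return (frag.start + frag.end) // 2


def is_valid(frag, min_len, max_len):
    # Assumption: "valid" means the fragment length lies in [min_len, max_len].
    return min_len <= frag.end - frag.start <= max_len


def design_tables(fragments, bait_ids, bin_size, max_l_brown_est, min_len, max_len):
    """Return (npb, nbpb, poe).

    npb:  {baitID: [valid fragments per distance bin]}
    nbpb: {otherEndID: [valid baits per distance bin]}
    poe:  [(baitID, otherEndID, absolute distance)]
    """
    n_bins = -(-max_l_brown_est // bin_size)  # ceiling division
    npb = {b: [0] * n_bins for b in bait_ids}
    nbpb = {f.frag_id: [0] * n_bins for f in fragments}
    poe = []

    by_chrom = {}
    for frag in fragments:
        by_chrom.setdefault(frag.chrom, []).append(frag)

    for frags in by_chrom.values():
        for bait in frags:
            if bait.frag_id not in bait_ids:
                continue
            for other in frags:
                dist = abs(midpoint(bait) - midpoint(other))
                # Only the "proximal" range (0; maxLBrownEst] is considered.
                if not 0 < dist <= max_l_brown_est:
                    continue
                poe.append((bait.frag_id, other.frag_id, dist))
                k = (dist - 1) // bin_size  # bin 0 covers (0, bin_size]
                if is_valid(other, min_len, max_len):
                    npb[bait.frag_id][k] += 1
                if is_valid(bait, min_len, max_len):
                    nbpb[other.frag_id][k] += 1
    return npb, nbpb, poe
```

The nested loop is quadratic per chromosome; a real implementation would sort fragments by position and only scan neighbours within maxLBrownEst.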
- config : str
Configuration JSON file
- in_metadata : str
Location of input JSON metadata for files
- out_metadata : str
Location of output JSON metadata for files
"nbpb" : .nbpb file "npb" : .npb file "poe" : .poe file
REQUIREMENT - Needs RMAP and BAITMAP files
When running the pipeline on a local machine without COMPSs:
python process_design.py \
--config tests/json/config_design.json \
--in_metadata tests/json/input_design.json \
--out_metadata tests/json/output_design.json \
--local
When using a local version of the [COMPSs virtual machine](https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar/):
runcompss \
--lang=python \
--library_path=${HOME}/bin \
--pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
--log_level=debug \
process_design.py \
--config tests/json/config_design.json \
--in_metadata tests/json/input_design.json \
--out_metadata tests/json/output_design.json
process_design.process_design
process_bam2chicago_tool
This pipeline converts the BAM file produced by process_bed2bam.py into a .chinput file, the input of process_run_chicago.py.
- config : str
Configuration JSON file
- in_metadata : str
Location of input JSON metadata for files
- out_metadata : str
Location of output JSON metadata for files
chrRMAP : .rmap file with chromosome names in chr# format
chrBAITMAP : .baitmap file with chromosome names in chr# format
sample_name : .chinput output
- REQUIREMENT - Needs the BAM file produced by hicup.py,
a .rmap file and a .baitmap file
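The chrRMAP and chrBAITMAP outputs are described as .rmap/.baitmap files in "chr#" format. Assuming this means chromosome names such as `1` are rewritten as `chr1` so that they match the names in the BAM file, the conversion could look like the sketch below. Both function names are hypothetical, and the input is assumed to be tab-separated with the chromosome in the first column.

```python
def add_chr_prefix(line):
    """Return a tab-separated .rmap/.baitmap line with 'chr' prepended to the
    chromosome column, unless it already starts with 'chr'."""
    fields = line.rstrip("\n").split("\t")
    if not fields[0].startswith("chr"):
        fields[0] = "chr" + fields[0]
    return "\t".join(fields)


def convert_file(src_path, dst_path):
    """Write a copy of src_path with chr-prefixed chromosome names, skipping blank lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(add_chr_prefix(line) + "\n")
```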
When running the pipeline on a local machine without COMPSs:
python process_bam2chicago_tool.py \
--config tests/json/config_bam2chicago.json \
--in_metadata tests/json/input_bam2chicago.json \
--out_metadata tests/json/output_bam2chicago.json \
--local
When using a local version of the [COMPSs virtual machine](https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar/):
runcompss \
--lang=python \
--library_path=${HOME}/bin \
--pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
--log_level=debug \
process_bam2chicago_tool.py \
--config tests/json/config_bam2chicago.json \
--in_metadata tests/json/input_bam2chicago.json \
--out_metadata tests/json/output_bam2chicago.json
process_bam2chicago_tool.process_bam2chicago
process_run_chicago
This pipeline normalizes the data and calls the genuine chromatin interactions.
- config : str
Configuration JSON file
- in_metadata : str
Location of input JSON metadata for files
- out_metadata : str
Location of output JSON metadata for files
output_dir: directory with all output folders and files
- REQUIREMENT - Needs a reference genome
- A file with the capture sequences in FASTA format
- A settings file
- A design directory containing the .rmap, .baitmap, .npb, .nbpb and .poe files
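Before launching the run, it can help to confirm that the design directory holds all five files listed above. The sketch assumes exactly one file per extension is expected; `check_design_files` and `check_design_dir` are hypothetical helpers, not part of the pipeline.

```python
import os

DESIGN_EXTENSIONS = (".rmap", ".baitmap", ".npb", ".nbpb", ".poe")


def check_design_files(filenames):
    """Map each design extension to its file name.

    Raises ValueError if any extension is missing or appears more than once.
    """
    found = {ext: [] for ext in DESIGN_EXTENSIONS}
    for name in sorted(filenames):
        ext = os.path.splitext(name)[1]
        if ext in found:
            found[ext].append(name)
    problems = [ext for ext in DESIGN_EXTENSIONS if len(found[ext]) != 1]
    if problems:
        raise ValueError("expected exactly one file for: " + ", ".join(problems))
    return {ext: found[ext][0] for ext in DESIGN_EXTENSIONS}


def check_design_dir(path):
    """Apply check_design_files to the entries of a directory."""
    return check_design_files(os.listdir(path))
```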
When running the pipeline on a local machine without COMPSs:
python process_run_chicago.py \
--config tests/json/config_chicago.json \
--in_metadata tests/json/input_chicago.json \
--out_metadata tests/json/output_chicago.json \
--local
When using a local version of the [COMPSs virtual machine](https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar/):
runcompss \
--lang=python \
--library_path=${HOME}/bin \
--pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
--log_level=debug \
process_run_chicago.py \
--config tests/json/config_chicago.json \
--in_metadata tests/json/input_chicago.json \
--out_metadata tests/json/output_chicago.json
process_run_chicago.process_run_chicago
process_CHiC
This pipeline runs the whole Capture Hi-C pipeline.
- config : str
Configuration JSON file
- in_metadata : str
Location of input JSON metadata for files
- out_metadata : str
Location of output JSON metadata for files
output_dir: directory with all output folders and files
- REQUIREMENT - Needs a reference genome
- A folder with the reference genome indexed with bowtie2
- A folder with the reference genome indexed with bwa
- Two FASTQ files
- A CHiCAGO settings file
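The requirements listed for the individual pipelines imply an order in which the stages can run: the baitmap needs the Rtree from process_rmap, the design files need the .rmap and .baitmap files, process_bam2chicago_tool needs the BAM file plus both maps, and process_run_chicago needs the .chinput file and the design directory. The sketch encodes those dependencies and orders them with Kahn's topological sort. It is only an illustration derived from this document and does not claim that process_CHiC.py schedules its steps this way.

```python
# Dependencies taken from the REQUIREMENT notes of each pipeline in this document.
STAGE_DEPENDENCIES = {
    "process_rmap": [],
    "process_baitmap": ["process_rmap"],
    "process_design": ["process_rmap", "process_baitmap"],
    "process_hicup": [],
    "process_bam2chicago_tool": ["process_hicup", "process_rmap", "process_baitmap"],
    "process_run_chicago": ["process_bam2chicago_tool", "process_design"],
}


def execution_order(deps):
    """Return the stages in a dependency-respecting order (Kahn's algorithm).

    Stages that become ready at the same time are emitted alphabetically,
    so the result is deterministic.
    """
    remaining = {stage: set(reqs) for stage, reqs in deps.items()}
    order = []
    while remaining:
        ready = sorted(stage for stage, reqs in remaining.items() if not reqs)
        if not ready:
            raise ValueError("cyclic dependencies among: %s" % sorted(remaining))
        for stage in ready:
            order.append(stage)
            del remaining[stage]
        for reqs in remaining.values():
            reqs.difference_update(ready)
    return order
```

Stages that become ready in the same round (for example process_hicup and process_rmap) have no mutual dependency, so a scheduler such as COMPSs could run them in parallel.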
When running the pipeline on a local machine without COMPSs:
python process_CHiC.py \
--config tests/json/config_CHiC.json \
--in_metadata tests/json/input_CHiC.json \
--out_metadata tests/json/output_CHiC.json \
--local
When using a local version of the [COMPSs virtual machine](https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar/):
runcompss \
--lang=python \
--library_path=${HOME}/bin \
--pythonpath=/<pyenv_virtenv_dir>/lib/python2.7/site-packages/ \
--log_level=debug \
process_CHiC.py \
--config tests/json/config_CHiC.json \
--in_metadata tests/json/input_CHiC.json \
--out_metadata tests/json/output_CHiC.json
process_CHiC.process_CHiC