This tool marks as chromothriptic those breakpoints that form a spatial cluster. This assumption is based on this article. To select, from the whole set of structural variations, those that form a spatial cluster, the following algorithm is used. A weighted graph is constructed whose vertices are the breakpoints and whose edge weights are the values from the Hi-C matrix for the corresponding pairs of breakpoints. The resulting graph is searched for a dense subgraph. Since a chromothripsis event occurs only once in one patient, a single subgraph will contain all the chromothripsis breakpoints. The vertices of the resulting subgraph are translated back into breakpoints and annotated with the structural variations they belong to. Structural variations with both breakpoints marked as chromothriptic are labeled chromothriptic and form the final result of the algorithm.
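The idea can be illustrated with a minimal sketch. It is not the tool's own implementation: it swaps in a simple greedy peeling heuristic for the dense-subgraph search, and the names `densest_subgraph_greedy`, `label_svs` and the toy `contacts` matrix are hypothetical. Density is assumed here to be total edge weight divided by the number of vertices.

```python
def densest_subgraph_greedy(weights):
    """Return (vertices, density) of an approximately densest subgraph.

    weights: symmetric square matrix (list of lists); weights[i][j] stands
    for the Hi-C contact value between breakpoints i and j (assumption).
    The diagonal is ignored.
    """
    n = len(weights)
    alive = set(range(n))
    # Weighted degree of every vertex within the current subgraph.
    degree = [sum(weights[i][j] for j in range(n) if j != i) for i in range(n)]
    total = sum(degree) / 2.0  # each edge was counted twice
    best_density, best_set = -1.0, set()
    while alive:
        density = total / len(alive)
        if density > best_density:
            best_density, best_set = density, set(alive)
        # Peel off the vertex that contributes least to the subgraph.
        v = min(alive, key=lambda i: (degree[i], i))
        alive.remove(v)
        total -= degree[v]
        for u in alive:
            degree[u] -= weights[v][u]
    return best_set, best_density


def label_svs(svs, cluster):
    """Keep structural variations whose two breakpoints are both in the cluster."""
    return [(a, b) for a, b in svs if a in cluster and b in cluster]


# Toy data: breakpoints 0-2 are in strong contact, 3 and 4 are not.
contacts = [
    [0, 9, 8, 1, 0],
    [9, 0, 7, 0, 1],
    [8, 7, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
]
cluster, density = densest_subgraph_greedy(contacts)
svs = [(0, 1), (1, 2), (2, 3), (3, 4)]  # each SV is a pair of breakpoint indices
chromothriptic = label_svs(svs, cluster)
print(sorted(cluster), chromothriptic)
```

On this toy matrix the three strongly connected breakpoints are selected, and only the structural variations with both ends inside that cluster are kept.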
- Clone the repository
git clone https://github.com/Dv1t/hic_chromo_detection
- Download example .mcool file here and place it in the tool folder.
- Install required packages
pip install pandas PyMaxflow argparse numpy cooler
or
pip install pandas PyMaxflow argparse numpy
conda install -c conda-forge -c bioconda cooler
- Run main script:
python hic_chromo.py --sv example_data/prostata_sv.csv --hic Prostata40к80k200к400к800к.mcool --o result.csv
The main script ties all modules together and can be run from anywhere as a console tool. It accepts the following arguments:
--sv: Path to the file with structural variations data, .csv extension.
--hic: Path to the file with the Hi-C matrix, .mcool extension. A .cool file can be converted to .mcool with cooler zoomify, which generates a multi-resolution cooler file by coarsening.
--o: Path to the output file, .csv extension.
Hi-C matrix resolution in bases. Default is 40000.
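As a rough illustration of what the resolution means, the sketch below maps a breakpoint coordinate to a Hi-C bin on its chromosome. It assumes fixed-width bins that start at position 0; the helper names `bin_index` and `bin_pair` are hypothetical and are not part of the tool.

```python
def bin_index(position, resolution=40000):
    """Index of the fixed-width bin that contains a base-pair position."""
    if position < 0 or resolution <= 0:
        raise ValueError("position must be >= 0 and resolution must be > 0")
    return position // resolution


def bin_pair(pos_a, pos_b, resolution=40000):
    """Bin indices for the two breakpoints of one structural variation."""
    return bin_index(pos_a, resolution), bin_index(pos_b, resolution)


# Two breakpoints on the same chromosome at the default 40 kb resolution.
print(bin_pair(1_250_000, 39_999))
```

With a coarser resolution, more breakpoints fall into the same bin, so nearby breakpoints share the same Hi-C value.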
The output file is a table in .csv format. It consists of 24 columns:
- patient_id: ID of the patient with structural variations from the input data.
- 1chr to 22chr, Xchr: arrays of coordinates of the breakpoints that the algorithm marked as chromothriptic.
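A result table with this layout could be loaded and checked as in the sketch below. It assumes a comma-delimited file with a header row that uses exactly the column names listed above; `load_result` is a hypothetical helper, and the single row is made-up placeholder data, not real tool output.

```python
import csv
import io

# Column names as described above: patient_id, 1chr ... 22chr, Xchr.
EXPECTED_COLUMNS = ["patient_id"] + [f"{n}chr" for n in range(1, 23)] + ["Xchr"]


def load_result(handle):
    """Read a result table and return its rows as dictionaries.

    Raises ValueError if any of the 24 expected columns is missing.
    """
    reader = csv.DictReader(handle)
    present = reader.fieldnames or []
    missing = [c for c in EXPECTED_COLUMNS if c not in present]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# In-memory stand-in for result.csv with one placeholder patient row.
header = ",".join(EXPECTED_COLUMNS)
row = ",".join(["patient_1"] + [""] * 23)
rows = load_result(io.StringIO(header + "\n" + row + "\n"))
print(len(EXPECTED_COLUMNS), rows[0]["patient_id"])
```

To read a real file, pass an open file handle instead of the `io.StringIO` object, for example `open("result.csv", newline="")`.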
Example structural variations data is available in the example_data folder. An example .mcool file can be downloaded here.
If you use this tool in your research, please cite:
Petukhova N., Zabelkin A., Dravgelis V., Aganezov S., Alexeev N. Chromothripsis Rearrangements Are Informed by 3D-Genome Organization. Lecture Notes in Computer Science. 2022. Vol. 13234. pp. 221-231. https://doi.org/10.1007/978-3-031-06220-9_13