This is the repository for DETOPT, a combinatorial optimization method for DETermining Optimal Placement in Tumor progression history of single nucleotide variants (SNVs) from the genomic regions impacted by copy-number aberrations (CNAs) using multi-sample bulk DNA sequencing data.
Clone the repository locally
$ git clone https://github.com/algo-cancer/DETOPT.git
$ cd DETOPT
DEPENDENCIES
- Python >= 3.10
- Required packages are managed by conda. If conda has not been previously installed, follow the installation directions for conda. Create the conda environment from the `.yaml` file, then activate it using the following commands.
$ conda env create -f env/detopt.yaml
$ conda activate detopt
Note
DETOPT requires an installation of, and a valid license for, Gurobi (freely available for academic users). Follow the instructions for retrieving and setting up a license here.
(detopt) foo@bar:$ python detopt.py --help
usage: DETOPT [+h] [-p [CNA_REG]] [-h [N_SAMPLES]] -d DATA_DIR -o OUT -s SNV_FILE -t TREE_FILE
(DETermining Optimal Placement in Tumor progression history)
options:
+h, ++help show this help message and exit
-p [CNA_REG], --cna_reg [CNA_REG] regularization weight (default: 0.25)
-h [N_SAMPLES], --n_samples [N_SAMPLES] number of samples
-d DATA_DIR, --data_dir DATA_DIR directory containing required `snv_file` and `tree_file` files
-o OUT, --out OUT output filename prefix, optionally with filepath
-s SNV_FILE, --snv_file SNV_FILE `snv_file` file containing information about read counts and allele-copy number calls of SNVs
-t TREE_FILE, --tree_file TREE_FILE `tree_file` file containing information about the base tree
Flag | Argument | Symbol Used | Description |
---|---|---|---|
-s | snv_file | - | refer to extended description of required input files |
-t | tree_file | - | refer to extended description of required input files |
-p | cna_reg | ρ | regularization parameter, denoted as ρ in the manuscript, introduced to balance the two objective terms. By default, ρ is set to 0.25. |
-h | n_samples | h | the total number of samples from the multi-sample bulk DNA sequencing data |
-d | data_dir | - | a relative path to the directory in which the input .snvs.input and .tree files are located |
-o | out | - | the filename prefix under which DETOPT writes its output files in the current working directory, e.g., <out>.detopt.tsv . By prepending a filepath to the filename, the files can be written to another directory, e.g., </path/to/out>.detopt.tsv |
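
As a minimal sketch of how the flags above fit together, the following Python helper assembles a DETOPT command line from its arguments. The helper name `build_detopt_command` and the default script path `src/detopt.py` are assumptions made for illustration (the path is taken from the demo command in this README); the helper only builds the command string and does not run DETOPT.

```python
import shlex


def build_detopt_command(data_dir, out, snv_file, tree_file,
                         cna_reg=None, n_samples=None,
                         script="src/detopt.py"):
    """Build the argument list for a DETOPT run.

    Hypothetical helper: only the flags (-d, -o, -s, -t, -p, -h) come
    from the README; the function itself is not part of DETOPT.
    """
    argv = ["python", script,
            "-d", data_dir,
            "-o", out,
            "-s", snv_file,
            "-t", tree_file]
    # Optional flags are appended only when a value is supplied,
    # so DETOPT's own defaults (e.g. 0.25 for -p) stay in effect otherwise.
    if cna_reg is not None:
        argv += ["-p", str(cna_reg)]
    if n_samples is not None:
        argv += ["-h", str(n_samples)]
    return argv


demo_argv = build_detopt_command(
    data_dir="real_data/demo/demo_inputs",
    out="real_data/demo/demo_outputs/4355",
    snv_file="4355.snvs.input",
    tree_file="4355.tree",
)
demo_command = shlex.join(demo_argv)
print(demo_command)
```

The resulting list can be passed to `subprocess.run` inside the activated `detopt` environment if a scripted run is preferred over typing the command by hand.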
extended description of required input files
DETOPT requires two (2) input files. The first contains the read counts and copy-number states (as obtained from HATCHet [1], or any allele- and clone-specific copy-number caller) for each mutation, including both copy-number neutral and aberrant SNVs, in each sample. The second describes the base tree topology, constructed using only copy-number neutral SNVs, with inferred sample node (subclone) frequencies.
SNV file. --snv_file <SNV_FILE>
This file contains, for each SNV, the read counts mapping to the variant and reference alleles and the allele- and clone-specific copy number calls. `hatchetconvert.py` is provided for the conversion of a `.seg.ucn` file from HATCHet (example) and a tab-separated mutation `.tsv` file (example) into an input file for DETOPT with the following fields,
mut_index sample var_reads ref_reads normal_state normal_prop tumor1_state tumor1_prop tumor2_state tumor2_prop
mut_1 sample_1 42 123 1|1 0.23 2|1 0.54 2|0 0.23
- `mut_index`: unique mutation identifier
- `sample`: unique sample identifier
- `var_reads`: number of reads mapping to the variant allele
- `ref_reads`: number of reads mapping to the reference allele
- `normal_state`: normal copy number state, '1|1'
- `normal_prop`: proportion of cells in the sample that have the `normal_state` copy number state
- `tumor_state`: aberrant copy number state, 'A|B' where A and B are the numbers of copies of allele A and B, respectively. Values of A and B are not allele-specific.
- `tumor_prop`: proportion of cells in the sample that have the `tumor_state` copy number state
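
To make the field layout concrete, here is a minimal Python sketch that parses the example row above. It is not part of DETOPT. It assumes the file is tab-separated with a header line, and it assumes that the state proportions in a row should sum to roughly 1; neither assumption is stated in this README. The function names `parse_snv_rows` and `variant_allele_fraction` are hypothetical.

```python
import csv
import io

# Header and row copied from the example above; tab separation is an assumption.
EXAMPLE = (
    "mut_index\tsample\tvar_reads\tref_reads\tnormal_state\tnormal_prop\t"
    "tumor1_state\ttumor1_prop\ttumor2_state\ttumor2_prop\n"
    "mut_1\tsample_1\t42\t123\t1|1\t0.23\t2|1\t0.54\t2|0\t0.23\n"
)


def parse_snv_rows(handle):
    """Yield one record per (mutation, sample) row with typed fields."""
    for row in csv.DictReader(handle, delimiter="\t"):
        states = []
        # Pair every *_state column (normal, tumor1, tumor2, ...) with its *_prop column.
        for key, value in row.items():
            if key.endswith("_state"):
                copies_a, copies_b = (int(x) for x in value.split("|"))
                prop = float(row[key[: -len("_state")] + "_prop"])
                states.append((copies_a, copies_b, prop))
        yield {
            "mut_index": row["mut_index"],
            "sample": row["sample"],
            "var_reads": int(row["var_reads"]),
            "ref_reads": int(row["ref_reads"]),
            "states": states,
        }


def variant_allele_fraction(record, tol=1e-3):
    """Return var_reads / (var_reads + ref_reads) after a sanity check.

    The check that proportions sum to 1 is an assumption of this sketch.
    """
    total_prop = sum(prop for _, _, prop in record["states"])
    if abs(total_prop - 1.0) > tol:
        raise ValueError(f"proportions sum to {total_prop}, expected 1")
    return record["var_reads"] / (record["var_reads"] + record["ref_reads"])


records = list(parse_snv_rows(io.StringIO(EXAMPLE)))
vaf = variant_allele_fraction(records[0])
print(records[0]["states"], round(vaf, 4))
```

Pairing columns by their `_state` / `_prop` suffix lets the same loop handle rows with any number of tumor states.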
Tree file. --tree_file <TREE_FILE>
For each subclone, represented by a node in the base tree, this file contains the following fields,
NODE_ID PARENT_ID MUTATIONS_AT_NODE SAMPLE_IDS NODE_FREQUENCIES
0 1 mut_1, ..., mut_i sample_1, ..., sample_j f_1, ..., f_j
- `NODE_ID`: `node` in the tree, representing a subclone
- `PARENT_ID`: parental node (also a subclone) of `node`
- `MUTATIONS_AT_NODE`: copy number neutral mutations assigned to `node`
- `SAMPLE_IDS`: list of samples in which `node` (subclone) is present
- `NODE_FREQUENCIES`: list of the inferred sample node (subclone) frequencies. For each sample in `SAMPLE_IDS`, the node (subclone) frequency is the fraction of all cells in a sample, including normal cells, that belong to that subclone.
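
The sketch below shows one way a single line of the tree file could be read into a Python dictionary. It is illustrative only: it assumes the five columns are tab-separated and that the list-valued columns are comma-separated, as the example row suggests. The function name `parse_tree_line`, the mutation and sample names, and the frequency values are made up for the example.

```python
def parse_tree_line(line):
    """Parse one tree-file line into a node record.

    Assumes tab-separated columns and comma-separated lists
    (an assumption of this sketch, not a documented guarantee).
    """
    node_id, parent_id, muts, samples, freqs = line.rstrip("\n").split("\t")
    mutations = [m.strip() for m in muts.split(",") if m.strip()]
    sample_ids = [s.strip() for s in samples.split(",") if s.strip()]
    frequencies = [float(f) for f in freqs.split(",") if f.strip()]
    # SAMPLE_IDS and NODE_FREQUENCIES are parallel lists, one frequency per sample.
    if len(sample_ids) != len(frequencies):
        raise ValueError("SAMPLE_IDS and NODE_FREQUENCIES differ in length")
    return {
        "node_id": node_id,        # kept as strings; the ID type is not specified
        "parent_id": parent_id,
        "mutations": mutations,
        "frequency": dict(zip(sample_ids, frequencies)),
    }


# Hypothetical line with two mutations and two samples.
example_line = "0\t1\tmut_1, mut_2\tsample_1, sample_2\t0.35, 0.10\n"
node = parse_tree_line(example_line)
print(node)
```

Storing the frequencies as a mapping from sample identifier to value makes the per-sample lookup explicit and catches rows where the two lists are out of step.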
Important
DETOPT returns a tab-separated `_assignments.tsv` file containing the inferred placements of copy number gain and loss events. If an SNV was impacted by more than one copy-number aberration event, each copy-number state and its inferred placement are designated by a unique index that has no particular significance other than to distinguish these events.
DETOPT produces two (2) additional files. `_mut_copy_numbers.tsv` contains the allele-specific number of copies of the mutant allele (on A,B) for each mutation (row) in each node (column). `_tot_copy_numbers.tsv` contains the allele-specific number of total copies of each allele (on A,B) for each mutation (row) in each node (column).
Example | Description | Output |
---|---|---|
4355 | Demo of DETOPT on metastatic breast cancer patient 4355 [2] with 18 samples | here |
(detopt) $ python src/detopt.py -d real_data/demo/demo_inputs -o real_data/demo/demo_outputs/4355 -s 4355.snvs.input -t 4355.tree
The software and corresponding documentation are maintained by the research group of Dr. S. Cenk Sahinalp. If you encounter issues with the tool or have inquiries, please raise them on the issue forum or contact Chih Hao Wu and Salem Malikić. Please always refer to the GitHub repository of DETOPT for the most up-to-date version of the software and relevant documentation.
Footnotes
-
Zacharakis, N., Huq, L.M., Seitter, S.J., Kim, S.P., Gartner, J.J., Sindiri, S., Hill, V.K., Li, Y.F., Paria, B.C., Ray, S., et al.: Breast cancers are immunogenic: immunologic analyses and a phase II pilot clinical trial using mutation-reactive autologous lymphocytes. Journal of Clinical Oncology 40(16), 1741–1754 (2022) https://doi.org/10.1200/jco.21.02170