MEM-Rearrange is an implementation of the algorithms described in the paper "New Algorithms for Structure Informed Genome Rearrangement". The data used in the paper's experiment is in the folder "input_families". In the paper we define two problems: Constrained TreeToString Divergence (CTTSD) and TreeToString Divergence (TTSD).
The input to CTTSD consists of two signed permutations of length
Generalizing CTTSD, in TTSD we do not assume that the input strings are permutations, and we allow deletions. The input to TTSD consists of two signed strings,
In our experiment, we set the parameters of the algorithms as follows.
The input file format is as follows. For each CSB in the cluster, there is one line in the following format:
<CSB_ID><\t><\t><\t><Instance_Count><\t><\t><Main_Category><\t><Family_ID>
Please see an example in the folder "input_families".
Place all input files (in the correct format) in a folder named "input_families".
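As a rough illustration of the line format, the sketch below splits each line of an input file on tabs. It relies only on what the format line states: the first column is the CSB ID and the last two columns are the main category and the family ID. The remaining columns are kept as raw strings and not interpreted. The function name `read_csb_lines` and the sample line are hypothetical and are not part of MEM-Rearrange; the sketch also assumes the file has no header row.

```python
import io


def read_csb_lines(lines):
    """Yield one dict per non-empty, tab-separated line.

    `lines` can be an open file handle or any iterable of strings.
    Only the first and the last two columns are given names; the
    full list of columns is kept under "columns".
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 tab-separated columns: {line!r}")
        yield {
            "csb_id": fields[0],
            "main_category": fields[-2],
            "family_id": fields[-1],
            "columns": fields,
        }


# Hypothetical sample line, made up for illustration only.
sample = io.StringIO("csb_1\t\t\t12\t\tMetabolism\t7\n\n")
records = list(read_csb_lines(sample))
```

For a real file, the same function can be used as `records = list(read_csb_lines(open(path)))`, where `path` points to a file inside "input_families".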
To run the algorithm for CTTSD, import get_all_MEM4_dicts from MEM and call 'get_all_MEM4_dicts(bp_qnode_penal, qnode_flip_penal, jump_penal=1)'.
bp_qnode_penal: The penalty for a breakpoint inside a Q-node (
qnode_flip_penal: The penalty for a flip operation (
jump_penal: The penalty for jumping is calculated accordingly. Currently, the value is always 1.
To run the algorithm for TTSD, import get_all_MEM_Rearrange_dicts from MEM_general and call 'get_all_MEM_Rearrange_dicts(bp_qnode_penal, qnode_flip_penal, d_T, d_S, delete_T_penal, delete_S_penal, jump_penal=1)'.
bp_qnode_penal: The penalty for a breakpoint inside a Q-node (
qnode_flip_penal: The penalty for a flip operation (
jump_penal: The penalty for jumping is calculated accordingly. Currently, the value is always 1.
d_T: The number of allowed deletions from the tree.
d_S: The number of allowed deletions from the string.
delete_T_penal: The penalty for a deletion from the tree.
delete_S_penal: The penalty for a deletion from the string.
See examples in the file "main.py".
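One possible way to keep the TTSD arguments together is sketched below. The field names and their order are taken from the signature given above; the class `TTSDParams`, the helper `run_ttsd`, the non-negativity check on d_T and d_S, and all numeric values are assumptions made for illustration and are not the settings used in the paper. The sketch also assumes that the function accepts its arguments by keyword.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TTSDParams:
    """Hypothetical container for the arguments of get_all_MEM_Rearrange_dicts."""
    bp_qnode_penal: float
    qnode_flip_penal: float
    d_T: int
    d_S: int
    delete_T_penal: float
    delete_S_penal: float
    jump_penal: float = 1

    def __post_init__(self):
        # Assumption: a negative number of allowed deletions is meaningless.
        if self.d_T < 0 or self.d_S < 0:
            raise ValueError("d_T and d_S must be non-negative")


def run_ttsd(params, ttsd_function):
    """Call the given TTSD entry point with the parameters as keyword arguments."""
    return ttsd_function(**asdict(params))


# Placeholder values for illustration only.
params = TTSDParams(
    bp_qnode_penal=1.0,
    qnode_flip_penal=0.5,
    d_T=2,
    d_S=2,
    delete_T_penal=1.0,
    delete_S_penal=1.0,
)

# With the repository on the import path, the call would look like:
# from MEM_general import get_all_MEM_Rearrange_dicts
# scores = run_ttsd(params, get_all_MEM_Rearrange_dicts)
```

Passing the function as an argument keeps the sketch importable without the repository; in a real script, calling get_all_MEM_Rearrange_dicts directly is just as good.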
The methods return the pairwise divergence scores for each input file as a dictionary in the following format.
The keys of the dictionary are the input file names. Each value is a dictionary with the pairwise scores in the following format:
For each CSB, there is a dictionary whose keys are the names of the CSBs in the family and whose values are the divergence scores.
See an example in "main.py".
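The nested dictionary described above can be flattened into rows for sorting or export. The following is a minimal sketch; the function name `flatten_scores`, the file name, the CSB names, and the scores are all made up for illustration, and the sketch does not assume that the scores are symmetric.

```python
def flatten_scores(scores_by_file):
    """Turn {file: {csb_a: {csb_b: score}}} into (file, csb_a, csb_b, score) tuples."""
    rows = []
    for file_name, pairwise in scores_by_file.items():
        for csb_a, row in pairwise.items():
            for csb_b, score in row.items():
                rows.append((file_name, csb_a, csb_b, score))
    return rows


# Hypothetical result with the shape described above.
example = {
    "family_1.txt": {
        "csb_1": {"csb_2": 3, "csb_3": 5},
        "csb_2": {"csb_1": 3, "csb_3": 1},
        "csb_3": {"csb_1": 4, "csb_2": 1},
    }
}

rows = flatten_scores(example)
# Pair with the largest score in the example data.
largest = max(rows, key=lambda r: r[3])
```

Each tuple can then be written to a CSV file or filtered by input file as needed.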