This project implements several algorithms for the secure computation of Edit Distance using the semi-honest two-party computation based on garbled circuits model provided by the EMP-toolkit.
There are six different algorithms implemented in plain C++.
- Wagner-Fischer (Matrix): "The string-to-string correction problem". 1974.
- Wagner-Fischer (Optimized): "The string-to-string correction problem". 1974.
- Ukkonen (Threshold): "Algorithms for approximate string matching". 1985.
- Ukkonen (Generalized): "Algorithms for approximate string matching". 1985.
- Wu et al.: "An O(NP) sequence comparison algorithm". 1990.
- Enhanced Ukkonen: "An Extension of Ukkonen's Enhanced Dynamic Programming ASM Algorithm". 1996.
- Myers' Bit-Vector: "A Fast Bit-Vector Algorithm for Approximate String Matching Based on Dynamic Programming". 1999.
The first four of these algorithms are also implemented with the EMP-toolkit. For the others a direct implementation is not convenient due to unefficient management of secret array indexes in secure computation.
After compiling, just run:
./EditDistance <input_length> <threshold>
- Install the EMP-toolkit
- Run
./bin/EditDistance 1 12345 <input_length> <threshold> & ./bin/EditDistance 2 12345 <input_length> <threshold>
The <input_length>
(must be the same for the programs in Secure
directory) and the <threshold>
are optional.
Possible <input_length>
are: 5, 7, 50, 100, 200, 250, 500, 1000, 2000, 3000, 3500, 4000, 5000, 10000, 50000. These are randomly generated DNA (A, C, G, T) strings.
It is also possible to use IDash_1
, IDash_2
, and IDash_3
, which are real DNA sequences of approximately 3400 characters from a genome database released by iDASH Security and Privacy Workshop 2016
Further details about EMP options are here
Distributed under the GPL 3.0 License. See LICENSE
for more information.