Similarity join and search algorithms for edit distance and Jaccard
MassJoinCode
adaptjoin
allpair-ed
bitriejoin-ed
common
edjoin-ed
fastjoin
fastss-ed
flamingo
hstree
mtree-ed
papers
partenum-ed
partenum-token
passjoin-ed
passjoin-token
ppjoin-token
qchunk-indexchunk-ed
qchunk-indexgram-ed
triejoin-ed
LICENSE
README.md
fastjoin.tar.gz
makefile
passjoin-openrefine.zip
similarity.zip

README.md

How to build:

Prerequisites:

  1. g++ >= 4.8
  2. Boost >= 1.5
  3. GNU make

Then run 'make' from the top-level directory.

How to run:

  1. For edit distance (ed), run './PATH_TO_EXECUTABLE data_file_name threshold q', where threshold is the maximum allowed edit distance and q is the q-gram length.
  2. For token-based metrics, run './PATH_TO_EXECUTABLE metric data_file_name threshold', where metric is one of 'jaccard', 'cosine', or 'dice'.