Arbitrary length Dice coefficients #63

Merged: 56 commits merged on Mar 14, 2018.
Changes from 45 commits

Commits (56):
- f5444a3 (Jan 15, 2018): Refactor main C++ function to avoid use "constant" memory and avoid n…
- 5d5337b (Jan 15, 2018): Implement popcount on (almost) arbitrary length arrays.
- 3864028 (Jan 15, 2018): First pass at integrating arbitrary length keys. Slows things down a …
- 5d5338f (Feb 1, 2018): Refactor Dice coefficient calculation.
- 88e3625 (Feb 1, 2018): Temporary fiddling with benchmark code.
- a705de8 (Feb 1, 2018): Calculate and report popcount speed from native code implementation.
- cff1cb6 (Feb 2, 2018): Give some values more sensible variable names.
- 603b6d4 (Feb 2, 2018): Remove unused import.
- de33a67 (Feb 2, 2018): Add documentation.
- a458ed0 (Feb 2, 2018): Expand reporting of various measurements.
- 7d2e66c (Feb 5, 2018): Comments.
- 9666eae (Feb 5, 2018): Update README.
- 6fe3663 (Feb 5, 2018): Bring test suite up-to-date.
- 66d9b6e (Jan 15, 2018): Refactor main C++ function to avoid use "constant" memory and avoid n…
- 05246f2 (Feb 5, 2018): Merge branch 'hlaw-refactor-cpp' of github.com:n1analytics/anonlink i…
- 8181752 (Feb 6, 2018): Merge branch 'hlaw-refactor-cpp' into hlaw-arbitrary-length-dice-coeff
- 3a55dc4 (Feb 6, 2018): Screw everything up by unrolling with C++ templates, apparently.
- b94c555 (Feb 7, 2018): Magical argument that makes the compiler generate the correct (perfor…
- 166f6e9 (Feb 7, 2018): Address Brian's comments.
- 9cbc243 (Feb 7, 2018): Update tests; also test native code version.
- cf26901 (Feb 8, 2018): Print popcount throughput; give some variables better names.
- c986021 (Feb 8, 2018): Merge branch 'hlaw-fix-issue-56' into hlaw-refactor-cpp
- e978834 (Feb 9, 2018): Merge branch 'hlaw-refactor-cpp' into hlaw-arbitrary-length-dice-coeff
- d02f23a (Feb 9, 2018): Make some functions static inline.
- 888e989 (Feb 9, 2018): Tidy up some expressions.
- c6780f0 (Feb 9, 2018): Put some braces in the right place; make fn inline.
- 3f1104f (Feb 9, 2018): Reinstate comment on origin of popcount assembler.
- edc7c2b (Feb 9, 2018): Make constant a template parameter.
- 892c599 (Feb 15, 2018): Comment.
- f500231 (Feb 15, 2018): Complete version working with multiples of 1024 bits.
- 063115a (Feb 18, 2018): Add -march=native compiler option.
- c9134d0 (Feb 18, 2018): Implementation of arbitrary length CLKs.
- b2435f9 (Feb 19, 2018): Fix dumb mistakes in updating array pointer and popcounts.
- 4acd62f (Feb 19, 2018): Tests for arbitrary length popcounts.
- 5730617 (Feb 19, 2018): Merge branch 'develop' into hlaw-arbitrary-length-dice-coeff
- 38ca3ce (Feb 19, 2018): Update some comments.
- e8c77bc (Feb 19, 2018): Arbitrary length Dice coefficient.
- 1febd65 (Feb 19, 2018): Rename function.
- 21390c4 (Feb 20, 2018): Move native dicecoeff calculation into its own function.
- 75cef8e (Feb 20, 2018): Add tests for native Dice coefficient calculation.
- c338c32 (Feb 20, 2018): Move dicecoeff tests to bloommatcher tests; move common bitarray util…
- 4d74b1c (Feb 20, 2018): Simplify slow path / reduce branches in fast path.
- 2d6b5f7 (Feb 20, 2018): Adapt entitymatcher to arbitrary length CLK interface.
- ab45ea8 (Feb 20, 2018): Remove unused function.
- 9ccaa8d (Feb 20, 2018): Update README.
- e515b34 (Feb 21, 2018): Address Brian's comments.
- 446033f (Feb 21, 2018): Exit early if filter is zero.
- dea0a0d (Feb 23, 2018): Specialise popcount arrays calls on array length.
- d3671a2 (Mar 2, 2018): Fix performance regression.
- 93abfae (Mar 2, 2018): Remove storage class specifiers from explicit template specialisations.
- e9706ff (Mar 2, 2018): Update README and requirements.txt files.
- 8e673fb (Mar 2, 2018): Merge branch 'feature-use-pytest' into hlaw-arbitrary-length-dice-coeff
- bea2ad5 (Mar 8, 2018): Merge branch 'develop' into hlaw-arbitrary-length-dice-coeff
- ed09687 (Mar 8, 2018): Disable unused function.
- 63cc6e0 (Mar 9, 2018): Put stars in their proper place.
- ef82759 (Mar 9, 2018): Add documentation.
README.rst: 2 changes (0 additions, 2 deletions)

@@ -135,8 +135,6 @@ Limitations
 - The linkage process has order n^2 time complexity - although algorithms exist to
 significantly speed this up. Several possible speedups are described
 in http://dbs.uni-leipzig.de/file/P4Join-BTW2015.pdf
-- The C++ code makes an assumption of 1024 bit keys (although this would be easy
-to change).
 
 
 License
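Dropping this limitation means the Dice coefficient is computed over keys whose length the caller supplies (`array_bytes` in the new `dice_coeff` declaration in `build_matcher.py`) instead of a fixed 1024 bits. The following is a minimal pure-Python reference for that calculation, not the PR's C++ code. It assumes the usual definition 2 * popcount(A AND B) / (popcount(A) + popcount(B)), and it assumes that two all-zero keys score 0.0 (the commit list mentions an early exit when a filter is zero, but the exact convention is not shown in this diff). The helper `popcount_bytes` is hypothetical.

```python
def popcount_bytes(data: bytes) -> int:
    """Return the number of set bits in a byte string of any length."""
    return sum(bin(byte).count("1") for byte in data)


def dice_coeff(array1: bytes, array2: bytes, array_bytes: int) -> float:
    """Hypothetical pure-Python reference for an arbitrary-length Dice coefficient.

    Compares the first `array_bytes` bytes of each key and returns
    2 * popcount(a & b) / (popcount(a) + popcount(b)).
    Returning 0.0 for two all-zero keys is an assumption of this sketch.
    """
    a = array1[:array_bytes]
    b = array2[:array_bytes]
    if len(a) != array_bytes or len(b) != array_bytes:
        raise ValueError("both arrays must hold at least array_bytes bytes")
    total = popcount_bytes(a) + popcount_bytes(b)
    if total == 0:
        # Assumed convention: empty filters share nothing, so score 0.0.
        return 0.0
    common = popcount_bytes(bytes(x & y for x, y in zip(a, b)))
    return 2.0 * common / total


# Usage: 13-byte (104-bit) keys, deliberately not a multiple of 1024 bits.
key_a = bytes([0b11110000]) + bytes(12)
key_b = bytes([0b11000000]) + bytes(12)
print(dice_coeff(key_a, key_b, 13))  # 2 * 2 / (4 + 2)
```

Nothing in this sketch depends on the key length being a multiple of 8 bytes or of 1024 bits; speed is the only reason a native implementation would specialise on particular lengths.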
_cffi_build/build_matcher.py: 10 changes (5 additions, 5 deletions)

@@ -15,14 +15,14 @@
 "_entitymatcher",
 source,
 source_extension='.cpp',
-extra_compile_args=['-Wall', '-Wextra', '-Werror', '-O3', '-std=c++11', '-mssse3', '-mpopcnt'],
+extra_compile_args=['-Wall', '-Wextra', '-Werror', '-O3', '-std=c++11', '-march=native', '-mssse3', '-mpopcnt', '-fvisibility=hidden'
+],
 ),
 
 ffibuilder.cdef("""
-int match_one_against_many_dice(const char * one, const char * many, int n, double * score);
-int match_one_against_many_dice_1024_k_top(const char *one, const char *many, const uint32_t *counts_many, int n, uint32_t k, double threshold, int *indices, double *scores);
-double dice_coeff_1024(const char *e1, const char *e2);
-double popcount_1024_array(const char *many, int n, uint32_t *counts_many);
+int match_one_against_many_dice_k_top(const char *one, const char *many, const uint32_t *counts_many, int n, int keybytes, uint32_t k, double threshold, int *indices, double *scores);
+double dice_coeff(const char *array1, const char *array2, int array_bytes);
+double popcount_arrays(uint32_t *counts, const char *arrays, int narrays, int array_bytes);
 """)
 
 
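The new declarations replace the fixed 1024-bit entry points with versions that take the key length (`array_bytes`, `keybytes`) at run time. Below is a hypothetical pure-Python model of what `popcount_arrays` and `match_one_against_many_dice_k_top` appear to compute, inferred only from their signatures. It assumes the keys are stored back to back in one buffer, that a score qualifies when it is `>= threshold`, and that ties go to the lower index. It returns Python lists instead of writing into the `counts`, `indices` and `scores` output pointers, and it does not model the `double` returned by `popcount_arrays` or the `int` returned by the matcher, since neither is explained in this diff.

```python
import heapq
from typing import List, Sequence, Tuple


def popcount_bytes(data: bytes) -> int:
    """Return the number of set bits in a byte string of any length."""
    return sum(bin(byte).count("1") for byte in data)


def popcount_arrays(arrays: bytes, narrays: int, array_bytes: int) -> List[int]:
    """Popcount of each of `narrays` keys stored back to back in `arrays`."""
    if len(arrays) < narrays * array_bytes:
        raise ValueError("buffer holds fewer than narrays * array_bytes bytes")
    return [
        popcount_bytes(arrays[i * array_bytes:(i + 1) * array_bytes])
        for i in range(narrays)
    ]


def match_one_against_many_dice_k_top(
    one: bytes,
    many: bytes,
    counts_many: Sequence[int],
    n: int,
    keybytes: int,
    k: int,
    threshold: float,
) -> Tuple[List[int], List[float]]:
    """Hypothetical model: indices and Dice scores of the k best matches >= threshold."""
    query = one[:keybytes]
    query_count = popcount_bytes(query)
    candidates = []  # (score, index) pairs that pass the threshold
    for i in range(n):
        key = many[i * keybytes:(i + 1) * keybytes]
        # Reuse the precomputed popcount of each candidate key.
        total = query_count + counts_many[i]
        if total == 0:
            score = 0.0  # assumed convention for two all-zero keys
        else:
            common = popcount_bytes(bytes(x & y for x, y in zip(query, key)))
            score = 2.0 * common / total
        if score >= threshold:  # assumed inclusive threshold
            candidates.append((score, i))
    # Highest score first; ties go to the lower index.
    best = heapq.nlargest(k, candidates, key=lambda c: (c[0], -c[1]))
    return [i for _, i in best], [s for s, _ in best]


# Usage: three 2-byte keys stored contiguously.
many = bytes([0xFF, 0x00, 0x0F, 0x00, 0x00, 0xFF])
counts = popcount_arrays(many, 3, 2)  # [8, 4, 8]
indices, scores = match_one_against_many_dice_k_top(
    bytes([0xFF, 0x00]), many, counts, n=3, keybytes=2, k=2, threshold=0.5
)  # indices == [0, 1]
```

Passing `counts_many` in precomputed means each candidate's popcount is calculated once and reused across every query, so only the popcount of the AND has to be computed per pair.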