Refactor main C++ function to avoid use "constant" memory and avoid new/delete #55

unzvfu · 2018-01-15T01:15:08Z

This is a low-priority PR that came up while looking into data61/anonlink-entity-service#59.

The improvement is to use "essentially" constant stack space of size k rather than heap space proportional to the input length n. This avoids memory allocation/deallocation and heap usage, and the code is also somewhat shorter.

Also included are some minor syntactical and semantic clean-ups, and some comments for future improvements.

Details:

Previous behaviour was to:

1. allocate a big array all_scores on the heap
2. calculate (almost) all the scores in turn and put each one in all_scores
3. scan through all_scores to pick the top k matches (using the essentially fixed-size priority queue)
4. deallocate all_scores

Now we skip all_scores altogether and just calculate (almost) all the scores in turn, inserting each one into the fixed-size priority queue.
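As an illustration only (not the code from this PR), the streaming approach described above can be sketched as follows. The names `Scored`, `top_k_streaming` and `score_of` are hypothetical, and the scoring function is passed in as a callback so the sketch stays independent of the Dice/popcount details. Note that `std::priority_queue` over a `std::vector` still allocates from the heap, but it never holds more than k elements here; storage that lives purely on the stack would need something like a fixed-size array-backed heap instead.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical type for this sketch: (score, index of the candidate).
using Scored = std::pair<double, int>;

// Keep the k best scores out of n without storing all n of them.
// score_of(j) is an assumed callback returning the score of candidate j.
template <typename ScoreFn>
std::vector<Scored> top_k_streaming(int n, std::size_t k, ScoreFn score_of) {
    if (k == 0) return {};

    // Min-heap: top() is the weakest candidate currently retained.
    std::priority_queue<Scored, std::vector<Scored>, std::greater<Scored>> best;

    for (int j = 0; j < n; ++j) {
        const double s = score_of(j);
        if (best.size() < k) {
            best.emplace(s, j);            // queue not yet full
        } else if (s > best.top().first) {
            best.pop();                    // evict the current weakest
            best.emplace(s, j);
        }
    }

    // Drain the heap (weakest first), then reverse so the best comes first.
    std::vector<Scored> out;
    out.reserve(best.size());
    while (!best.empty()) {
        out.push_back(best.top());
        best.pop();
    }
    std::reverse(out.begin(), out.end());
    return out;
}
```

Called as `top_k_streaming(n, k, [&](int j) { return scores[j]; })`, this touches each score once and keeps at most k entries alive, which is the memory behaviour the description above is aiming for.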

hardbyte · 2018-01-15T06:14:15Z

_cffi_build/dice_one_against_many.cpp

@@ -139,14 +139,17 @@ extern "C"
        const uint64_t *comp1 = (const uint64_t *) one;
        const uint64_t *comp2 = (const uint64_t *) many;

+        // TODO: Given that k is 10 by default, often 5 in practice,
+        // and probably never ever more than 20 or so, the use of a


This statement isn't (well shouldn't be) correct - anonlink needs to support computing all raw similarity scores.

hardbyte · 2018-01-15T06:18:04Z

_cffi_build/dice_one_against_many.cpp

-
-            if(max_k_scores.size() > k) max_k_scores.pop();
+                // TODO: double precision is overkill for this
+                // problem; just use float.


Sounds good

hardbyte · 2018-01-15T06:31:23Z

_cffi_build/dice_one_against_many.cpp

@@ -155,39 +158,32 @@ extern "C"

        for (int j = 0; j < n; j++) {
            const uint64_t *current = comp2 + j * KEYWORDS;
+            const uint32_t counts_many_j = counts_many[j];


It always surprises me what you have to help the compiler with

Yeah, this can be more or less important depending on the context. Here it's more about showing intent to fellow programmers than to the compiler.

Refactor main C++ function to avoid use "constant" memory and avoid n…

f5444a3

…ew/delete.

unzvfu requested a review from hardbyte January 15, 2018 01:15

hardbyte approved these changes Jan 15, 2018

unzvfu self-assigned this Jan 15, 2018

hardbyte added this to the Sprint 2018-01-29 milestone Jan 24, 2018

Hamish Ivey-Law added 16 commits February 1, 2018 11:03

Refactor Dice coefficient calculation.

5d5338f

Temporary fiddling with benchmark code.

88e3625

Calculate and report popcount speed from native code implementation.

a705de8

Give some values more sensible variable names.

cff1cb6

Remove unused import.

603b6d4

Add documentation.

de33a67

Expand reporting of various measurements.

a458ed0

Comments.

7d2e66c

Update README.

9666eae

Bring test suite up-to-date.

6fe3663

Refactor main C++ function to avoid use "constant" memory and avoid n…

66d9b6e

…ew/delete.

Merge branch 'hlaw-refactor-cpp' of github.com:n1analytics/anonlink i…

05246f2

…nto hlaw-refactor-cpp

Address Brian's comments.

166f6e9

Update tests; also test native code version.

9cbc243

Print popcount throughput; give some variables better names.

cf26901

Merge branch 'hlaw-fix-issue-56' into hlaw-refactor-cpp

c986021

hardbyte modified the milestones: Sprint 2018-01-29, Sprint 2018-02-12 Feb 9, 2018

Hamish Ivey-Law and others added 2 commits February 9, 2018 16:35

Merge remote-tracking branch 'origin/develop' into hlaw-refactor-cpp

ff3bf51

Merge branch 'develop' into hlaw-refactor-cpp

c8c11c0

unzvfu merged commit 5cab824 into develop Feb 9, 2018

unzvfu deleted the hlaw-refactor-cpp branch February 9, 2018 05:48
