-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert accelerated dice coefficient wrapper code to Cython #363
Conversation
Create a cython wrapper for the similarity comparison CPP code. Closes #168
Codecov Report
@@ Coverage Diff @@
## master #363 +/- ##
==========================================
- Coverage 94.63% 94.57% -0.07%
==========================================
Files 16 16
Lines 802 792 -10
==========================================
- Hits 759 749 -10
Misses 43 43 |
- drop numpy in favour of using array.array
Raise the threshold dynamically. Once we've found k scores above the threshold use the lowest of those as the new threshold.
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
Update libpopcount.h
So it looks like all the tests now pass on Windows, OSX, and Linux. The CI is failing due to sending code coverage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for that. Looks pretty good, well, not that I am an authority when it comes to cython...
You may want to add a line to the changelog.
Otherwise, see comments.
setup.py
Outdated
@@ -34,46 +33,59 @@ | |||
|
|||
current_os = platform.system() | |||
if current_os == "Windows": | |||
extra_compile_args = ['/std:c++17', '/O2', '/arch:AVX512'] | |||
# '/arch:AVX512' or '/arch:AVX2' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you take the arch:AVX512
out? Maybe expand the comment somewhat...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the compilation instructions in https://github.com/kimwalisch/libpopcnt - turns out the flags are not required as the CPU support checks are done at runtime.
anonlink/similarities/_dice.pyx
Outdated
# Note array.extend_buffer requires the gil to extend the array | ||
# `.data.as_chars` gives us direct access to the underlying contiguous C array | ||
# array.extend_buffer(result_sims, c_scores.data.as_chars, matches) | ||
# array.extend_buffer(result_indices0, i_buffer.data.as_chars, matches) | ||
# array.extend_buffer(result_indices1, c_indices.data.as_chars, matches) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's with this section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something I was trying... added more complexity and didn't allow me to make this function gil-less or faster so went back to the simpler array.extend
method below. I'll remove the commented out code.
anonlink/similarities/_dice_x86.py
Outdated
carr1.frombytes(b''.join(memoryview(f) for f in filters1)) | ||
|
||
# Only worth popcounting in C for a large number of filters1 | ||
if len(filters1) < 10000: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why 10000?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a comment, but just found by benchmarking and noting at what size the throughput in cmp/s dropped
top_k_scores.pop(); | ||
// threshold can now be raised | ||
dynamic_threshold = temp_node.score; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait, the top_k_scores
is a priority queue where the lowest score has the highest priority? Thank god that's so well documented, otherwise this would be quite confusing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment where the priority queue is created
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. That looks great now.
Creates a cython wrapper for the similarity comparison C++ code and removes the cffi compiler.
I noticed that the hot loop includes three dynamically resizing arrays (similarities and two indicies arrays), not trivial but it would speed things up if that was all done in the C/C++ code.
Closes #168