Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert accelerated dice coefficient wrapper code to Cython #363

Merged
merged 8 commits into from
Nov 2, 2020
Merged

Convert accelerated dice coefficient wrapper code to Cython #363

merged 8 commits into from
Nov 2, 2020

Conversation

hardbyte
Copy link
Collaborator

Creates a cython wrapper for the similarity comparison C++ code and removes the cffi compiler.

I noticed that the hot loop includes three dynamically resizing arrays (similarities and two indicies arrays), not trivial but it would speed things up if that was all done in the C/C++ code.

Closes #168

Create a cython wrapper for the similarity comparison CPP code.

Closes #168
@codecov
Copy link

codecov bot commented Oct 19, 2020

Codecov Report

Merging #363 into master will decrease coverage by 0.06%.
The diff coverage is 93.75%.

@@            Coverage Diff             @@
##           master     #363      +/-   ##
==========================================
- Coverage   94.63%   94.57%   -0.07%     
==========================================
  Files          16       16              
  Lines         802      792      -10     
==========================================
- Hits          759      749      -10     
  Misses         43       43              

- drop numpy in favour of using array.array
Raise the threshold dynamically. Once we've found k scores above the threshold use the lowest of those as the new threshold.
@wilko77
Copy link
Collaborator

wilko77 commented Oct 20, 2020

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@hardbyte
Copy link
Collaborator Author

So it looks like all the tests now pass on Windows, OSX, and Linux. The CI is failing due to sending code coverage.

Copy link
Collaborator

@wilko77 wilko77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that. Looks pretty good, well, not that I am an authority when it comes to cython...
You may want to add a line to the changelog.
Otherwise, see comments.

setup.py Outdated Show resolved Hide resolved
anonlink/benchmark.py Show resolved Hide resolved
setup.py Outdated
@@ -34,46 +33,59 @@

current_os = platform.system()
if current_os == "Windows":
extra_compile_args = ['/std:c++17', '/O2', '/arch:AVX512']
# '/arch:AVX512' or '/arch:AVX2'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you take the arch:AVX512 out? Maybe expand the comment somewhat...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the compilation instructions in https://github.com/kimwalisch/libpopcnt - turns out the flags are not required as the CPU support checks are done at runtime.

anonlink/similarities/_dice.pyx Outdated Show resolved Hide resolved
Comment on lines 174 to 178
# Note array.extend_buffer requires the gil to extend the array
# `.data.as_chars` gives us direct access to the underlying contiguous C array
# array.extend_buffer(result_sims, c_scores.data.as_chars, matches)
# array.extend_buffer(result_indices0, i_buffer.data.as_chars, matches)
# array.extend_buffer(result_indices1, c_indices.data.as_chars, matches)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's with this section?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I was trying... added more complexity and didn't allow me to make this function gil-less or faster so went back to the simpler array.extend method below. I'll remove the commented out code.

carr1.frombytes(b''.join(memoryview(f) for f in filters1))

# Only worth popcounting in C for a large number of filters1
if len(filters1) < 10000:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why 10000?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a comment, but just found by benchmarking and noting at what size the throughput in cmp/s dropped

top_k_scores.pop();
// threshold can now be raised
dynamic_threshold = temp_node.score;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, the top_k_scores is a priority queue where the lowest score has the highest priority? Thank god that's so well documented, otherwise this would be quite confusing...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment where the priority queue is created

@hardbyte hardbyte requested a review from wilko77 October 31, 2020 19:41
Copy link
Collaborator

@wilko77 wilko77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. That looks great now.

@wilko77 wilko77 merged commit d8ec9df into data61:master Nov 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convert dice_coefficient_x86 glue code to Cython
2 participants