Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature arbitrary size encodings #366

Merged
merged 8 commits into from
Nov 15, 2020
Merged

Feature arbitrary size encodings #366

merged 8 commits into from
Nov 15, 2020

Conversation

hardbyte
Copy link
Collaborator

Builds on #363 as suggested by @nbgl in #163

Implements in C a path to popcount and dice one to many when the encoding size is not word aligned. This means you can finally work on encodings of just 1 byte or 13 bytes (if you are so inclined).

I haven't done any benchmarking although I've expanded and added several tests.

Encodings of any number of bytes are now supported.

Closes #163
@codecov
Copy link

codecov bot commented Oct 21, 2020

Codecov Report

Merging #366 (47ca356) into master (03ac3d0) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #366   +/-   ##
=======================================
  Coverage   94.57%   94.57%           
=======================================
  Files          16       16           
  Lines         792      792           
=======================================
  Hits          749      749           
  Misses         43       43           

@hardbyte
Copy link
Collaborator Author

hardbyte commented Nov 9, 2020

@wilko77 did you find a way to make the coverage work for external PRs? As far as I can tell this is passing all the tests on each system and is ready for review.

@wilko77
Copy link
Collaborator

wilko77 commented Nov 9, 2020

I started looking into the pipeline (#365). I made the coverage optional, that works. However, we then have another step where we upload the artifacts to a feed. And this still fails. It complains that the Anonlink build service hasn't got the permissions to publish there.
That's where I got stuck. As far as I can see, the Build Service has contributor permissions, thus should be fine to twine upload to the feed.
Maybe we can get the release pipeline to do the publishing to the feed.
Or can we do without that feed?

@hardbyte
Copy link
Collaborator Author

hardbyte commented Nov 9, 2020

My guess is that "Publish package to test feed" stage could be only enabled for the main branches - https://docs.microsoft.com/en-us/azure/devops/pipelines/process/conditions?view=azure-devops&tabs=yaml

Maybe we can get the release pipeline to do the publishing to the feed.
Or can we do without that feed?

Yeah I think if you could get the release pipeline to pick up all the artifacts directly you wouldn't need the test feed. I made it so we could test linked feature branches across the multiple projects. Primarily so anonlink-entity-service could test a not-yet released version of the anonlink library by pulling from the test feed.

Copy link
Collaborator

@wilko77 wilko77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.
A few minor things. See comments.
Thank you.

If you merge master, then you'll get the updated CI definition, and with it the satisfaction of seeing the CI pipeline succeed.

anonlink/similarities/_dice_x86.py Show resolved Hide resolved
anonlink/similarities/dice.cpp Show resolved Hide resolved
Comment on lines 27 to 29
# To permit arbitrary size input_data, we may need to pad with zeros to align to a word boundary?
# Eg say our input is 32 bits and our array size is 8, our output_size is already correct at 4
# but we may want to make the input 64 bits?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand this. I though you implemented arbitrary sized inputs? What is this issue here exactly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was me recording my thinking early on. As you know, before this PR the CPP code assumed inputs would be a multiple of WORD_SIZE (8 bytes) in length, at least for supporting popcount we could have padded the input with zeros up to the nearest word size boundary and avoided modifying the CPP code.

Ultimately I didn't take that approach anywhere so I'll remove the comment.

@hardbyte hardbyte merged commit 6d4aba7 into data61:master Nov 15, 2020
@hardbyte hardbyte deleted the feature-arbitrary-size-encodings branch November 15, 2020 03:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants