Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removed hard-coded constant in sortition.go. #3558

Merged
merged 5 commits into from
Feb 4, 2022

Conversation

Olshansk
Copy link
Contributor

@Olshansk Olshansk commented Feb 3, 2022

Summary

Remove a hardcoded constant in sortition.go which was
used as the denominator in determining the selection ratio.

This clarifies what the maximum possible output size is of
the output VRF based on the SHA algorithm used to generate it.

Purpose

This is simply a code quality change to make it easier to read and
understand without any focus on new functionality or performance.

It is unlikely that the VRF SHA length used by Algorand will changed,
so the generalization to a dynamic length based on the length of the
Crypto digest is only for the reader.

The use of an init function to initialize the maxFloat once rather than
anytime sortition runs is a minor optimization, but likely to be neglibile
in the grand scheme of things.

Test Plan

Due to the usage of C src code, a simple go test -v ./data/committee/sortition/
cannot be executed without the proper configurations. However, circle CI
should be able to execute the unit tests automatically and verify the changes.

Recommendation: Running the recommended make targets to test the changes
did not automatically due to missing dependencies, so potentially a Dockerfile
could help resolve this for future contributors.

Remove a hardcoded constant in sortition.go which was
used as the denominator in determining the selection ratio.

This clarifies what the maximum possible output size is of
the output VRF based on the SHA algorithm used to generate it.

Due to the usage of C src code, a simple `go test -v ./data/committee/sortition/`
cannot be executed without the proper configurations. However, circle CI
should be able to execute the unit tests automatically and verify the changes.
@CLAassistant
Copy link

CLAassistant commented Feb 3, 2022

CLA assistant check
All committers have signed the CLA.

precision := uint(8 * (len(vrfOutput) + 1))
max, b, err := big.ParseFloat("0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff", 0, precision, big.ToNearestEven)
maxFloatString := fmt.Sprintf("0x%s", strings.Repeat("f", crypto.DigestSize*2+1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't be right, can it? Surely an odd number of hex digits is not what you want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. This was something I added while manually testing. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the update in my comment, might be easier to add Dockerfile so all the dependencies don't have to be installed locally to run the unit tests. I'm relying on circle CI right now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to see the fmt.Sprintf done once, rather than each time, as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a local var since you can't assign the output of formatting to a constant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For that matter, perhaps you could move max out of the function as well, so it's not reparsed each time. I suspect the PR would be more palatable if it had a performance improvement, rather than only making the constant explicit. (On the explicitness front, why not strings.Repeat("ff", crypto.DIgestSize)?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@tsachiherman
Copy link
Contributor

@Olshansk - can I suggest a small improvement for your PR ?
I've implemented these proposals in #3559.

@tsachiherman
Copy link
Contributor

@Olshansk - you'll need to sign the CLAassistant before the proposed code changes could be merged in.

@tsachiherman tsachiherman self-requested a review February 3, 2022 15:29
@tsachiherman tsachiherman added the quality Improve code quality or style label Feb 3, 2022
@Olshansk
Copy link
Contributor Author

Olshansk commented Feb 4, 2022

@tsachiherman Thanks for the suggestion!

TIL about init in Golang so this was super useful. I also updated the description of the PR with a Purpose to provide some extra context for the need for this change. It's very much a nice-to-have.

Also signed the agreement.

@codecov-commenter
Copy link

codecov-commenter commented Feb 4, 2022

Codecov Report

Merging #3558 (1645da6) into master (6230dc5) will decrease coverage by 0.00%.
The diff coverage is 62.50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3558      +/-   ##
==========================================
- Coverage   47.58%   47.58%   -0.01%     
==========================================
  Files         370      370              
  Lines       60118    60121       +3     
==========================================
  Hits        28606    28606              
- Misses      28198    28203       +5     
+ Partials     3314     3312       -2     
Impacted Files Coverage Δ
data/committee/sortition/sortition.go 82.35% <62.50%> (-3.37%) ⬇️
network/wsPeer.go 65.83% <0.00%> (-3.34%) ⬇️
catchup/peerSelector.go 98.95% <0.00%> (-1.05%) ⬇️
ledger/acctupdates.go 66.53% <0.00%> (-0.20%) ⬇️
catchup/service.go 69.38% <0.00%> (+0.74%) ⬆️
ledger/tracker.go 75.75% <0.00%> (+1.29%) ⬆️
crypto/merkletrie/node.go 93.48% <0.00%> (+1.86%) ⬆️
crypto/merkletrie/trie.go 68.61% <0.00%> (+2.18%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6230dc5...1645da6. Read the comment docs.

Copy link
Contributor

@jannotti jannotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

Copy link
Contributor

@tsachiherman tsachiherman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes.

@tsachiherman tsachiherman merged commit c765903 into algorand:master Feb 4, 2022
@Olshansk
Copy link
Contributor Author

Olshansk commented Feb 7, 2022

@tsachiherman @jannotti I was wondering if either of you could help me understand the logic behind sortition_binomial_cdf_walk.

I've looked at the paper and read through some online articles (example) but am still struggling to understand this:

In short, I understand that we use VRF to compute a proof and a hash, which is normalized to a value in [0, 1) using the hashlen (i.e. the ratio). The binomial distribution is built using the total voting power, and the number of expected leaders. We then compute the CDF for each value j less than the specific node's voting power and compare it boundary of the CDF.

    // Found the correct boundary, break
    if (ratio <= boundary) {
      return j;
    }

However, I'm still struggling to understand the significance of j in the check above. Are we saying that a single node has the opportunity to vote with at most money tokens (i.e. their voting power), but based on the random value of the VRF hash, it is going to be some value less than that based on the CDF?

@jannotti
Copy link
Contributor

jannotti commented Feb 7, 2022

It's not something I'm super familiar with, but I think you've got the idea. Suppose I have 1M algos. I don't "win" the VRF lottery will all of them, or lose with all of them. Some fraction of them win and some fraction loses, based on the outcome of the VRF.

@Olshansk
Copy link
Contributor Author

Olshansk commented Feb 8, 2022

Yea, the high level idea makes sense and even most of the intricacies except for that one… Maybe it's something we could figure out together?

I've looked at section 5 of the whitepaper and my understanding is:

Assume / let the following:

  • Each user as a Vk (Verification Key) and a Sk (Secret Key)
  • Vk of each user has been broadcasted to all other users in the network
  • Seed S is determined and publically known (but only computed AFTER Vk has been broadcasted)
  • Every user i has a voting power Wi where W is the voting power of the whole network

p, hash = VRFsk(seed)

  • p is the proof that can be verified with VRFvk(seed)
  • hash is uniformly distributed and is of size hashlen

The binomial distribution is set to:

  • Number of trials is the individual nodes voting power (each unit is an individual attempt to be selected to protect against Sybil attacks)
  • The probably of each trial is number of expected candidates in committee / total voting power (each unit is an individual sub-user)

After re-reading it, I actually think the core of the “magic” is here:

Screen Shot 2022-02-07 at 10 41 25 PM

  1. Hash / 2^hashlen normalizes the uniformly distributed VRF output
  2. Using the binomial distribution, we can find out if EXACTLY j sub-users are selected.

The maximum value for j is the maximum number of their sub-users that may be selected. However, if exactly j sub-users are not selected (based on the normalized VRF output), it doesn’t mean that a value less than j might not be selected. E.g. if the total voting power is 1000 ALGO, and a single node has 100, they should still have an opportunity to be a potential committee member if only 50 of their ALGO is elected as their voting round for that view change.

My understanding is that this is the reason we’re using a CDF of the binomial distribution and accepting any value of j smaller than their individual voting power as the value to be a candidate.

tsachiherman added a commit that referenced this pull request Feb 8, 2022
## Summary

Following up on #3558, this PR update the THANKS.md file.

## Test Plan

Not required for this change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
quality Improve code quality or style
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants