Split Alpha into AlphaPreference and AlphaConfidence #2125

StephenButtolph · 2023-10-03T00:24:08Z

Background

The Avalanche network uses the Snow* consensus protocols to reach distributed agreement. Specifically, each blockchain runs its own isolated instance of Snowman.

Snowman

Snowman is a multi-decree multivariate chain-based consensus process built from the single-decree binary consensus protocol, Snowball. It runs (potentially many) dependent Snowball protocols concurrently, which form a binary tree with the root being the last accepted Snowball decision.

As an example:

graph BT
  A[LastAccepted]
  B[Block A]
  C[Block B]
  D[Block C]

  B-->A
  C-->A
  D-->C

The tree above has 3 processing blocks, A, B, and C. For simplicity, we'll ignore that blocks can have many conflicts. Since A and B are conflicting there is a Snowball instance that will determine which block to accept. Even though C is built on top of B, this does not impact the likelihood of B being accepted over A. The only mechanism that decides between A and B is the Snowball instance.
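To make the conflict relationship concrete, here is a minimal sketch that derives conflicting pairs from parent pointers. The `parents` mapping and the `conflicting_pairs` helper are hypothetical names used only for illustration, and the sketch keeps the simplification above that a conflict is a pair of blocks sharing a parent.

```python
from collections import defaultdict
from itertools import combinations

# Parent pointers for the example tree; names mirror the diagram labels.
parents = {
    "A": "LastAccepted",
    "B": "LastAccepted",
    "C": "B",
}

def conflicting_pairs(parents):
    """Return pairs of blocks that share a parent and therefore conflict."""
    children = defaultdict(list)
    for block, parent in parents.items():
        children[parent].append(block)
    pairs = []
    for siblings in children.values():
        pairs.extend(combinations(sorted(siblings), 2))
    return sorted(pairs)

print(conflicting_pairs(parents))  # [('A', 'B')]
```

Block C has no sibling, so it does not appear in any pair; only the A versus B decision needs a Snowball instance in this example.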

Snowball

Snowball's goal is to have a network agree on one of two values. Typically, we refer to these values as Red and Blue. This process can itself be broken into 2 primary mechanisms.

Color selection - Each node attempts to match its own color to the network's majority color. Over time, the network should converge so that every correct node reports the same color.
Finalization detection - Each node determines when to stop changing its color.

It's critical that these mechanisms are able to tolerate arbitrary failures. Specifically, nodes reporting incorrect colors, or not reporting any color at all, must not impact network progress.

The existing Snowball specification is:

def onQuery(peer, peerColor):
  respond(peer, currentColor) // Send our current color to the peer, which might be ⊥
  if currentColor = ⊥:
    currentColor = peerColor // If we do not currently have a color, adopt the peer's color

def snowball():
  confidence = 0
  preferenceStrength = {Red:0, Blue:0}
  while confidence < β:
    if currentColor = ⊥:
      continue // Wait until we hear of a color from a peer

    sampledNodes = sample(N, k) // Sample k nodes from the network
    sampledPreferences = [query(node, currentColor) ∀ node ∈ sampledNodes]
    sampledPreferenceCounts = count(sampledPreferences) // Map each color to the number of times it was sampled
    sampledMajorityColor = argmax(sampledPreferenceCounts)
    if sampledMajorityColor ∉ {Red, Blue}: // The network is still initializing its preferences
      confidence = 0
      continue

    majorityCount = sampledPreferenceCounts[sampledMajorityColor]
    if majorityCount < α:
      confidence = 0 // No one got an alpha majority
      continue

    preferenceStrength[sampledMajorityColor]++
    minorityColor = otherColor(sampledMajorityColor)
    if preferenceStrength[sampledMajorityColor] > preferenceStrength[minorityColor]:
      currentColor = sampledMajorityColor // Update the color we report to other peers

    if sampledMajorityColor != lastMajorityColor:
      confidence = 0
    lastMajorityColor = sampledMajorityColor
    confidence++
  currentColor = lastMajorityColor

We can see the two mechanisms at play.

The node determines its currentColor by adopting whichever color has had the most alphaMajority samples.
The node determines that the network will finalize lastMajorityColor once it sees β consecutive samples where lastMajorityColor had an alphaMajority.

Snowball Simulation

With this specification, we can run simulations on how many rounds of sampling it takes for the network to finalize.

Parameters:

N = 250
K = 21
β = 30

The network is initialized with 125 Red and 125 Blue nodes. Nodes perform sampling in lockstep.

For small values of α, most of the sampling time is spent triggering the finalization mechanism: the finalization detector requires at least β = 30 rounds before termination. However, as α increases, the number of rounds grows well beyond β. This is because the color selection mechanism becomes less aggressive as α approaches K.

This is a major reason why α = 15 and K = 20 are the current Mainnet consensus parameters.

When nodes are randomly selected to perform the sampling, versus in lockstep, the average rounds until termination decreases.

However, the growth pattern is similar as α increases.
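One way to see why color selection slows down as α grows is to compute how often a single poll reaches an α majority at all. The sketch below is a back-of-the-envelope calculation, not the simulation used for the graphs. It assumes the initial 125/125 split, that every node already reports a color, and that the k peers are drawn uniformly without replacement; the function names are hypothetical.

```python
from math import comb

def tail(population, marked, k, threshold):
    """P(at least `threshold` marked nodes in a sample of k without replacement)."""
    total = comb(population, k)
    hits = sum(
        comb(marked, i) * comb(population - marked, k - i)
        for i in range(threshold, min(k, marked) + 1)
    )
    return hits / total

def poll_success_probability(n, red, k, alpha):
    """P(either color reaches alpha in one sample); valid when alpha > k / 2."""
    blue = n - red
    return tail(n, red, k, alpha) + tail(n, blue, k, alpha)

for alpha in (11, 13, 15, 17, 19, 21):
    p = poll_success_probability(250, 125, 21, alpha)
    print(f"alpha={alpha:2d}  P(successful poll)={p:.6f}")
```

With K = 21 and two colors, one color always receives at least 11 votes, so α = 11 makes every poll count toward color selection. Larger α values discard a growing share of polls while the network is still evenly split.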

Conflicting Optimization

As the above graph shows, increasing α can significantly increase the required rounds until termination. This is because, with a high α, nodes can take an extended amount of time to adopt the majority color. However, increasing α is often desirable to increase the safety of the decision.

Increasing α trades off performance for safety.

Suggested change

Snowball's color selection and finalization detection mechanisms should be better separated. Specifically, the usage of α in the Snowball specification should be modified to have two values: alphaPreference and alphaConfidence.

It is required that alphaPreference <= alphaConfidence.
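The constraint can be sketched as a parameter check. `SnowballParameters` and its field names are hypothetical and are not the node's actual configuration type. Only the first check comes from this proposal; the remaining checks are assumptions added to keep the sketch self-consistent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SnowballParameters:
    """Hypothetical parameter bundle; field names are illustrative."""
    k: int
    alpha_preference: int
    alpha_confidence: int
    beta: int

    def validate(self):
        # Stated in the proposal: alphaPreference <= alphaConfidence.
        if self.alpha_preference > self.alpha_confidence:
            raise ValueError("alpha_preference must be <= alpha_confidence")
        # Assumption: a threshold above k can never be met by k responses.
        if self.alpha_confidence > self.k:
            raise ValueError("alpha_confidence must be <= k")
        # Assumption: requiring more than half of the sample means the two
        # colors cannot both reach the threshold in the same poll.
        if 2 * self.alpha_preference <= self.k:
            raise ValueError("alpha_preference must be > k / 2")
        # Assumption: at least one successful poll is needed to finalize.
        if self.beta < 1:
            raise ValueError("beta must be >= 1")

SnowballParameters(k=21, alpha_preference=11, alpha_confidence=15, beta=30).validate()
```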

The updated Snowball specification would be:

def onQuery(peer, peerColor):
  respond(peer, currentColor) // Send our current color to the peer, which might be ⊥
  if currentColor = ⊥:
    currentColor = peerColor // If we do not currently have a color, adopt the peer's color

def snowball():
  confidence = 0
  preferenceStrength = {Red:0, Blue:0}
  while confidence < β:
    if currentColor = ⊥:
      continue // Wait until we hear of a color from a peer

    sampledNodes = sample(N, k) // Sample k nodes from the network
    sampledPreferences = [query(node, currentColor) ∀ node ∈ sampledNodes]
    sampledPreferenceCounts = count(sampledPreferences) // Map each color to the number of times it was sampled
    sampledMajorityColor = argmax(sampledPreferenceCounts)
    if sampledMajorityColor ∉ {Red, Blue}: // The network is still initializing its preferences
      confidence = 0
      continue

    majorityCount = sampledPreferenceCounts[sampledMajorityColor]
-    if majorityCount < α:
+    if majorityCount < alphaPreference:
      confidence = 0 // No one got an alpha majority
      continue

    preferenceStrength[sampledMajorityColor]++
    minorityColor = otherColor(sampledMajorityColor)
    if preferenceStrength[sampledMajorityColor] > preferenceStrength[minorityColor]:
      currentColor = sampledMajorityColor // Update the color we report to other peers

+    if majorityCount < alphaConfidence:
+      confidence = 0 // No one got an alphaConfidence majority
+      continue
+
    if sampledMajorityColor != lastMajorityColor:
      confidence = 0
    lastMajorityColor = sampledMajorityColor
    confidence++
  currentColor = lastMajorityColor
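The per-poll update from the specification above can be written as runnable Python. This is an illustrative sketch rather than the implementation in this PR: the `SnowballNode` class and `record_poll` method are hypothetical names, polls are passed in as vote counts instead of being gathered over the network, and the ⊥ (no color yet) case is left out.

```python
RED, BLUE = "Red", "Blue"

class SnowballNode:
    """One node's state under the split-alpha rule (illustrative sketch)."""

    def __init__(self, color, alpha_preference, alpha_confidence, beta):
        assert alpha_preference <= alpha_confidence
        self.color = color
        self.alpha_preference = alpha_preference
        self.alpha_confidence = alpha_confidence
        self.beta = beta
        self.confidence = 0
        self.last_majority = None
        self.strength = {RED: 0, BLUE: 0}

    @property
    def finalized(self):
        return self.confidence >= self.beta

    def record_poll(self, counts):
        """Apply one sample; `counts` maps each color to its number of votes."""
        if self.finalized:
            return
        majority = max((RED, BLUE), key=lambda c: counts.get(c, 0))
        majority_count = counts.get(majority, 0)
        if majority_count < self.alpha_preference:
            self.confidence = 0  # not even a preference majority
            return

        # Color selection: driven by alpha_preference only.
        self.strength[majority] += 1
        minority = BLUE if majority == RED else RED
        if self.strength[majority] > self.strength[minority]:
            self.color = majority

        # Finalization detection: requires the stricter alpha_confidence.
        if majority_count < self.alpha_confidence:
            self.confidence = 0
            return
        if majority != self.last_majority:
            self.confidence = 0
        self.last_majority = majority
        self.confidence += 1
        if self.finalized:
            self.color = majority

node = SnowballNode(BLUE, alpha_preference=11, alpha_confidence=15, beta=2)
node.record_poll({RED: 12, BLUE: 9})   # preference majority only
print(node.color, node.confidence)     # Red 0
node.record_poll({RED: 16, BLUE: 5})
node.record_poll({RED: 17, BLUE: 4})
print(node.finalized, node.color)      # True Red
```

Setting `alpha_preference` equal to `alpha_confidence` makes the second threshold check redundant, which recovers the single-α behavior of the existing specification.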

Updated Snowball Simulation

Parameters:

N               = 250
K               = 21
β               = 30
AlphaPreference = 11

The network is initialized with 125 Red and 125 Blue nodes.

Nodes performing sampling in lockstep:

Randomly selecting which node performs the sampling:

These simulations show that keeping alphaPreference constant while increasing alphaConfidence increases the rounds until termination significantly less than increasing α does in the current protocol specification.

Safety Impact

While modifying the color selection does impact the safety analysis, the impact is not expected to be significant.

As a simple argument for why safety shouldn't be drastically impacted, consider the case where one virtuous node has accepted Red. If byzantine nodes switching their color to Blue causes polls to see more Blue than Red, safety will eventually be violated. This is because every node except the one that has accepted Red will eventually switch its preference to Blue. Once enough nodes have updated their preferences to Blue, another virtuous node may accept Blue. Notice that this is independent of α; in this situation, α only impacts the speed at which such a failure would occur.

The actual impact on the safety analysis is due to the variance in sampling. Larger α values protect more against unlikely samples of a disproportionate number of byzantine nodes. Before modifying the default consensus parameters, this should be included in the formal safety analysis.

Additionally, this change can unblock using a larger alphaConfidence than currently configured without significantly degrading performance.
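The sampling-variance point can be illustrated with a hypergeometric tail: the probability that a single sample contains at least α byzantine nodes. This is only a rough sketch and not the formal safety analysis. The 20% byzantine share is an assumed figure, peers are assumed to be drawn uniformly without replacement, and `prob_at_least` is a hypothetical helper.

```python
from math import comb

def prob_at_least(n, byzantine, k, alpha):
    """P(a sample of k distinct nodes out of n contains >= alpha byzantine ones)."""
    total = comb(n, k)
    favorable = sum(
        comb(byzantine, i) * comb(n - byzantine, k - i)
        for i in range(alpha, min(k, byzantine) + 1)
    )
    return favorable / total

# Assumed scenario: 50 of 250 nodes (20%) are byzantine, k = 21.
for alpha in (11, 13, 15, 17):
    print(f"alpha={alpha}: {prob_at_least(250, 50, 21, alpha):.3e}")
```

The probability shrinks as α grows, which is the sense in which a larger alphaConfidence guards against unlucky samples.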

Why this should be merged

This change meaningfully improves the performance of the core consensus mechanism.
This change does not actually change the behavior of the node. The node will only behave differently if alphaPreference is set to a different value than alphaConfidence. Changing the default value of these parameters is left to a future proposal.

How this was tested

Performance simulations
CI

marun

Where can I find documentation about avalanchego consensus? I'm curious about the different implementations (binary, nnary, flat, etc) and what they are used for.

snow/consensus/snowball/binary_snowball_test.go

StephenButtolph · 2023-10-03T17:47:23Z

Where can I find documentation about avalanchego consensus? I'm curious about the different implementations (binary, nnary, flat, etc) and what they are used for.

In general - the only thing that is used from the snowball package is snowball.NewTree. Flat (and correspondingly all the nnary implementations) could be moved into test files.

As for binary - the spec has only been slightly modified since the original paper: https://assets.website-files.com/5d80307810123f5ffbb34d6e/6009805681b416f34dcae012_Avalanche%20Consensus%20Whitepaper.pdf

subnets/config_test.go

abi87 · 2023-10-04T14:38:15Z

@StephenButtolph can you elaborate on the difference between lockstep vs sampled simulations? They seem to represent a synchronous vs asynchronous environment, but I'm not sure.

StephenButtolph · 2023-10-04T15:55:50Z

can you elaborate on the difference between lockstep vs sampled simulations?

When run in lockstep:

every (virtuous) node samples k peers and stores their color
after the sampling, all of the samples are then applied (where nodes update their colors).

This means that in any given round, the nodes observe the same colors of virtuous nodes. This was meant to behave as if all the nodes are very equal and there is some (constant) network delay.

When running by sampling nodes:

a random (non-finalized + virtuous) node is selected (uniformly randomly)
the selected node samples k peers and applies those votes (may update its color)

This is why the last graph, Rounds Until Termination by alphaConfidence (Sampled Node), grows more with a higher alphaConfidence than Rounds Until Termination by alphaConfidence (Lockstep). Specifically, the network is waiting for the last few virtuous nodes to be selected in the first step so that their colors update. This roughly emulates a network where some nodes are "slower" than others.
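The two scheduling modes described above could be sketched as follows. This is not the simulation code used for the graphs: the network here is small, has no byzantine nodes, draws peers uniformly, and all function names and default parameters are assumptions chosen so the example runs quickly.

```python
import random

RED, BLUE = 0, 1

def make_nodes(n):
    # Half Red, half Blue; each node tracks color, strengths, confidence, last majority.
    return [{"color": i % 2, "strength": [0, 0], "conf": 0, "last": None} for i in range(n)]

def poll(nodes, me, k, rng):
    """Sample k other nodes and count the colors they currently report."""
    peers = rng.sample([i for i in range(len(nodes)) if i != me], k)
    counts = [0, 0]
    for p in peers:
        counts[nodes[p]["color"]] += 1
    return counts

def apply_poll(node, counts, alpha_pref, alpha_conf, beta):
    """Split-alpha update for one poll result."""
    if node["conf"] >= beta:
        return  # finalized nodes no longer change
    major = RED if counts[RED] >= counts[BLUE] else BLUE
    if counts[major] < alpha_pref:
        node["conf"] = 0
        return
    node["strength"][major] += 1
    if node["strength"][major] > node["strength"][1 - major]:
        node["color"] = major
    if counts[major] < alpha_conf:
        node["conf"] = 0
        return
    if major != node["last"]:
        node["conf"] = 0
    node["last"] = major
    node["conf"] += 1
    if node["conf"] >= beta:
        node["color"] = major

def simulate(mode, n=50, k=11, alpha_pref=6, alpha_conf=8, beta=5, seed=0, max_polls=50_000):
    """Return the total number of polls until every node finalizes, or None if the cap is hit."""
    rng = random.Random(seed)
    nodes = make_nodes(n)
    polls = 0
    while any(nd["conf"] < beta for nd in nodes):
        if polls >= max_polls:
            return None
        if mode == "lockstep":
            # Every node samples first; results are applied only afterwards.
            results = [poll(nodes, i, k, rng) for i in range(n)]
            for nd, counts in zip(nodes, results):
                apply_poll(nd, counts, alpha_pref, alpha_conf, beta)
            polls += n
        else:
            # One non-finalized node, chosen uniformly, samples and applies immediately.
            pending = [i for i, nd in enumerate(nodes) if nd["conf"] < beta]
            i = rng.choice(pending)
            apply_poll(nodes[i], poll(nodes, i, k, rng), alpha_pref, alpha_conf, beta)
            polls += 1
    return polls

for mode in ("lockstep", "sampled"):
    print(mode, simulate(mode))
```

In lockstep mode the number of rounds is the returned poll count divided by `n`. In sampled mode the tail of the run is spent waiting for the remaining non-finalized nodes to accumulate their own β successful polls.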

snow/consensus/snowball/tree.go

abi87

Nits and possibly a typo in a UT. Still looking at this.

snow/consensus/snowball/tree.go

snow/consensus/snowball/parameters.go

snow/consensus/snowman/topological.go

snow/consensus/snowball/binary_snowflake.go

snow/consensus/snowball/consensus_performance_test.go

snow/consensus/snowball/nnary_snowflake.go

gyuho

lgtm

just nits to make sure I understand the test changes :)

snow/consensus/snowball/flat_test.go

snow/consensus/snowball/nnary_snowflake_test.go

addressed

abi87

lgtm

gyuho · 2023-10-05T22:55:40Z

Lgtm 🫡

commit 188f2b2 Author: David Boehm <91908103+dboehm-avalabs@users.noreply.github.com> Date: Mon Oct 16 13:10:00 2023 -0400 MerkleDB Reduce buffer creation/memcopy on path construction (#2124) Co-authored-by: Dan Laine <daniel.laine@avalabs.org>
commit 9d44ec2 Author: Alberto Benegiamo <alberto.benegiamo@gmail.com> Date: Mon Oct 16 08:53:59 2023 -0700 Validator Diffs: docs and UTs cleanup (#2037) Co-authored-by: Stephen Buttolph <stephen@avalabs.org>
commit 50f131e Author: Patrick O'Grady <prohb125@gmail.com> Date: Thu Oct 12 23:04:02 2023 -0700 [x/merkledb] `Prefetcher` interface (#2167)
commit 007f98d Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Thu Oct 12 22:33:23 2023 -0700 Bump google.golang.org/grpc from 1.55.0 to 1.58.3 (#2159) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Stephen Buttolph <stephen@avalabs.org>
commit 8d2c4f2 Author: Stephen Buttolph <stephen@avalabs.org> Date: Thu Oct 12 19:44:04 2023 -0400 Use set.Of rather than set.Add (#2164)
commit 9da2e62 Author: Stephen Buttolph <stephen@avalabs.org> Date: Thu Oct 12 17:41:27 2023 -0400 Remove context lock from API VM interface (#2165)
commit 9dbf82a Author: Stephen Buttolph <stephen@avalabs.org> Date: Wed Oct 11 19:48:54 2023 -0400 Remove write lock option from the avm rpc API (#2156)
commit 2eb6e84 Author: Stephen Buttolph <stephen@avalabs.org> Date: Wed Oct 11 19:39:44 2023 -0400 Remove write lock option from the platformvm API (#2157)
commit 9725095 Author: Dhruba Basu <7675102+dhrubabasu@users.noreply.github.com> Date: Wed Oct 11 14:12:29 2023 -0700 Remove aliasing of `math` standard lib (#2163)
commit 18fbdef Author: Stephen Buttolph <stephen@avalabs.org> Date: Wed Oct 11 16:57:11 2023 -0400 Remove lock options from the admin API (#2150)
commit aae7260 Author: Stephen Buttolph <stephen@avalabs.org> Date: Wed Oct 11 16:56:13 2023 -0400 Remove lock options from the IPCs api (#2151)
commit 8247f74 Author: Stephen Buttolph <stephen@avalabs.org> Date: Wed Oct 11 16:56:05 2023 -0400 Remove write lock option from the xsvm API (#2152)
commit 1bc63d4 Author: Dhruba Basu <7675102+dhrubabasu@users.noreply.github.com> Date: Wed Oct 11 16:34:12 2023 -0400 Rename `removeSubnetValidatorValidation` to `verifyRemoveSubnetValidatorTx` (#2162)
commit 99fc926 Author: Stephen Buttolph <stephen@avalabs.org> Date: Wed Oct 11 12:39:01 2023 -0400 Fix json marshalling of Sets (#2161)
commit 0f95f13 Author: Stephen Buttolph <stephen@avalabs.org> Date: Wed Oct 11 11:37:57 2023 -0400 Remove write lock option from the avm wallet API (#2155)
commit e6dab5d Author: Stephen Buttolph <stephen@avalabs.org> Date: Wed Oct 11 11:30:41 2023 -0400 Remove write lock option from the avm static API (#2154)
commit 7f61fee Author: Stephen Buttolph <stephen@avalabs.org> Date: Tue Oct 10 23:17:17 2023 -0400 Remove lock options from the info api (#2149)
commit c50ea11 Author: Stephen Buttolph <stephen@avalabs.org> Date: Tue Oct 10 23:13:19 2023 -0400 Marshal blocks and transactions inside API calls (#2153)
commit 0ac1937 Author: kyoshisuki <143475866+kyoshisuki@users.noreply.github.com> Date: Tue Oct 10 20:12:25 2023 -0700 Fix typo in block formation logic documentation (#2158)
commit 145dfb0 Author: Stephen Buttolph <stephen@avalabs.org> Date: Tue Oct 10 22:18:27 2023 -0400 Update versions for v1.10.12 (#2139)
commit 6d53e51 Author: Stephen Buttolph <stephen@avalabs.org> Date: Fri Oct 6 18:08:15 2023 -0400 Split Alpha into AlphaPreference and AlphaConfidence (#2125)
commit 1fc8973 Author: Stephen Buttolph <stephen@avalabs.org> Date: Thu Oct 5 17:50:20 2023 -0400 Add additional payload.Hash examples (#2145) Signed-off-by: Stephen Buttolph <stephen@avalabs.org> Co-authored-by: Dhruba Basu <7675102+dhrubabasu@users.noreply.github.com> Signed-off-by: Joshua Kim <20001595+joshua-kim@users.noreply.github.com>

hsk81 · 2023-10-28T23:33:06Z

In my own simulations, the number of rounds until finalization used to go higher when the underlying stake distribution was weird with no expected mean. See "Law of Large Numbers":

"The average of the results obtained from a large number of trials may fail to converge in some cases. For instance, the average of n results taken from the Cauchy distribution or some Pareto distributions (α<1) will not converge as n becomes larger; the reason is heavy tails."

What distribution and parametrization did you assume during the simulations? See my repository if you're interested:

https://github.com/hsk81/avalanche-sim

Split Alpha into AlphaPreference and AlphaConfidence

d148dcd

StephenButtolph added the enhancement and consensus labels Oct 3, 2023

StephenButtolph requested a review from abi87 as a code owner October 3, 2023 00:24

StephenButtolph self-assigned this Oct 3, 2023

StephenButtolph requested review from danlaine and gyuho as code owners October 3, 2023 00:24

marun reviewed Oct 3, 2023

aaronbuchwald approved these changes Oct 3, 2023

abi87 reviewed Oct 4, 2023

StephenButtolph added this to the v1.10.13 milestone Oct 5, 2023

abi87 reviewed Oct 5, 2023

abi87 previously requested changes Oct 5, 2023

gyuho approved these changes Oct 5, 2023

Apply suggestions from code review

06e7f7e

StephenButtolph requested a review from abi87 October 5, 2023 18:41

abi87 approved these changes Oct 5, 2023

StephenButtolph changed the milestone from v1.10.13 to v1.10.12 Oct 6, 2023

StephenButtolph added 2 commits October 6, 2023 14:21

Keep compatibility with --snow-quorum-size flag users

3f35fba

Merge branch 'dev' into alpha-preference

3ec3333

StephenButtolph added this pull request to the merge queue Oct 6, 2023

Merged via the queue into dev with commit 6d53e51 Oct 6, 2023
16 checks passed

StephenButtolph deleted the alpha-preference branch October 6, 2023 22:32
