Split Alpha into AlphaPreference and AlphaConfidence #2125

Merged
merged 4 commits into from Oct 6, 2023

Conversation

StephenButtolph
Contributor

Background

The Avalanche network uses the Snow* consensus protocols to reach distributed agreement. Specifically, each blockchain runs its own isolated instance of Snowman.

Snowman

Snowman is a multi-decree multivariate chain-based consensus process built from the single-decree binary consensus protocol, Snowball. It runs (potentially many) dependent Snowball protocols concurrently, which form a binary tree with the root being the last accepted Snowball decision.

As an example:

graph BT
  A[LastAccepted]
  B[Block A]
  C[Block B]
  D[Block C]

  B-->A
  C-->A
  D-->C

The tree above has 3 processing blocks, A, B, and C. For simplicity, we'll ignore that blocks can have many conflicts. Since A and B are conflicting there is a Snowball instance that will determine which block to accept. Even though C is built on top of B, this does not impact the likelihood of B being accepted over A. The only mechanism that decides between A and B is the Snowball instance.
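
As a rough illustration of where Snowball instances arise in the tree above, the sketch below groups processing blocks by parent. Any parent with more than one child has conflicting children that need a decision. The dictionary layout and the `conflict_sets` helper are assumptions made for this example and are not avalanchego's data structures.

```python
from collections import defaultdict

# Child -> parent edges from the example tree (hypothetical representation).
PARENTS = {
    "Block A": "LastAccepted",
    "Block B": "LastAccepted",
    "Block C": "Block B",
}

def conflict_sets(parents):
    """Return {parent: sorted children} for every parent with more than one child."""
    children = defaultdict(list)
    for child, parent in parents.items():
        children[parent].append(child)
    return {p: sorted(c) for p, c in children.items() if len(c) > 1}

print(conflict_sets(PARENTS))
```

Block C has no sibling, so it does not add a conflict of its own; only the A-versus-B decision appears.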

Snowball

Snowball's goal is to have a network agree on one of two values. Typically, we refer to these values as Red and Blue. This process can itself be broken into 2 primary mechanisms.

  1. Color selection - Each node attempts to match its own color to the network's majority color. Over time, the network should converge so that every correct node reports the same color.
  2. Finalization detection - Each node determines when to stop changing its color.

It's critical that these mechanisms are able to tolerate arbitrary failures. Specifically, nodes reporting incorrect colors, or not reporting any color at all, must not impact network progress.

The existing Snowball specification is:

def onQuery(peer, peerColor):
  respond(peer, currentColor) // Send our current color to the peer, which might be ⊥
  if currentColor = ⊥:
    currentColor = peerColor // If we do not currently have a color, adopt the peer's color

def snowball():
  confidence = 0
  lastMajorityColor = ⊥
  preferenceStrength = {Red:0, Blue:0}
  while confidence < β:
    if currentColor = ⊥:
      continue // Wait until we hear of a color from a peer

    sampledNodes = sample(N, k) // Sample k nodes from the network
    sampledPreferences = [query(node, currentColor) ∀ node ∈ sampledNodes]
    sampledPreferenceCounts = count(sampledPreferences) // Map each color to the number of times it was sampled
    majorityColor = argmax(sampledPreferenceCounts)
    if majorityColor ∉ {Red, Blue}: // The network is still initializing its preferences
      confidence = 0
      continue

    majorityCount = sampledPreferenceCounts[majorityColor]
    if majorityCount < α:
      confidence = 0 // No color got an alpha majority
      continue

    preferenceStrength[majorityColor]++
    minorityColor = otherColor(majorityColor)
    if preferenceStrength[majorityColor] > preferenceStrength[minorityColor]:
      currentColor = majorityColor // Update the color we report to other peers

    if majorityColor != lastMajorityColor:
      confidence = 0 // The alpha majority changed color, so the streak restarts
    lastMajorityColor = majorityColor
    confidence++
  currentColor = lastMajorityColor

We can see the two mechanisms at play.

  1. The node determines its currentColor by adopting whichever color has had the most alphaMajority samples.
  2. The node determines that the network will finalize lastMajorityColor once it sees β consecutive samples where lastMajorityColor had an alphaMajority.
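
The two mechanisms can also be sketched as runnable Python that applies one poll result at a time. This is a minimal sketch of the specification above, not the avalanchego implementation; the `SnowballNode` class, its field names, and the `record_poll` method are names chosen for this example, and networking is replaced by passing in the vote counts directly.

```python
from dataclasses import dataclass, field
from typing import Optional

RED, BLUE = "Red", "Blue"

@dataclass
class SnowballNode:
    alpha: int                      # votes needed for an alpha majority
    beta: int                       # consecutive successful polls needed to finalize
    current_color: str              # color reported to peers
    confidence: int = 0
    last_majority: Optional[str] = None
    strength: dict = field(default_factory=lambda: {RED: 0, BLUE: 0})

    @property
    def finalized(self):
        return self.confidence >= self.beta

    def record_poll(self, counts):
        """Apply one sample; counts maps each color to its number of votes."""
        if self.finalized:
            return
        majority = max((RED, BLUE), key=lambda c: counts.get(c, 0))
        if counts.get(majority, 0) < self.alpha:
            self.confidence = 0  # no color reached an alpha majority
            return
        # Color selection: follow the color with the most successful polls.
        self.strength[majority] += 1
        other = BLUE if majority == RED else RED
        if self.strength[majority] > self.strength[other]:
            self.current_color = majority
        # Finalization detection: count consecutive polls for the same color.
        if majority != self.last_majority:
            self.confidence = 0
        self.last_majority = majority
        self.confidence += 1
        if self.finalized:
            self.current_color = self.last_majority

node = SnowballNode(alpha=15, beta=2, current_color=BLUE)
node.record_poll({RED: 16, BLUE: 4})   # successful poll for Red
node.record_poll({RED: 10, BLUE: 10})  # failed poll, confidence resets
print(node.current_color, node.confidence, node.finalized)
```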

Snowball Simulation

With this specification, we can run simulations on how many rounds of sampling it takes for the network to finalize.

Parameters:

N = 250
K = 21
β = 30

The network is initialized with 125 Red and 125 Blue nodes. Nodes perform sampling in lockstep.

[Figure: Small Single Alpha Lockstep]

For small values of α, most of the sampling time is spent triggering the finalization mechanism, since the finalization detector requires at least β = 30 rounds before termination. However, as α increases, the number of rounds rises significantly above β. This is because the color selection mechanism becomes less aggressive as α increases towards K.

[Figure: Large Single Alpha Lockstep]

This is a major reason why α = 15 and K = 20 are the current Mainnet consensus parameters.

When nodes are randomly selected to perform the sampling, rather than sampling in lockstep, the average number of rounds until termination decreases.

[Figure: Small Single Alpha Sampled Node]

However, the growth pattern is similar as α increases.

[Figure: Large Single Alpha Sampled Node]

Conflicting Optimization

As the above graph shows, increasing α can significantly increase the required rounds until termination. This is because, with a high α, nodes can take an extended amount of time to adopt the majority color. However, increasing α is often desirable to increase the safety of the decision.

Increasing α trades off performance for safety.

Suggested change

Snowball's color selection and finalization detection mechanisms should be better separated. Specifically, the usage of α in the Snowball specification should be modified to have two values: alphaPreference and alphaConfidence.

It is required that alphaPreference <= alphaConfidence.
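
A minimal sketch of how this constraint could be checked is shown below. Only `alphaPreference <= alphaConfidence` comes from this proposal. The other two bounds (that a majority threshold must exceed K/2 and cannot exceed K) are assumptions added for the example, and `validate_alphas` is a hypothetical helper, not an avalanchego function.

```python
def validate_alphas(k, alpha_preference, alpha_confidence):
    """Raise ValueError unless k/2 < alpha_preference <= alpha_confidence <= k."""
    if 2 * alpha_preference <= k:
        # Assumption: otherwise both colors could reach the threshold in one poll.
        raise ValueError("alphaPreference must be greater than K/2")
    if alpha_preference > alpha_confidence:
        # The constraint stated in this proposal.
        raise ValueError("alphaPreference must be <= alphaConfidence")
    if alpha_confidence > k:
        # Assumption: a threshold above K could never be met.
        raise ValueError("alphaConfidence must be <= K")

validate_alphas(k=21, alpha_preference=11, alpha_confidence=15)  # accepted
```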

The updated Snowball specification would be:

def onQuery(peer, peerColor):
  respond(peer, currentColor) // Send our current color to the peer, which might be ⊥
  if currentColor = ⊥:
    currentColor = peerColor // If we do not currently have a color, adopt the peer's color

def snowball():
  confidence = 0
  lastMajorityColor = ⊥
  preferenceStrength = {Red:0, Blue:0}
  while confidence < β:
    if currentColor = ⊥:
      continue // Wait until we hear of a color from a peer

    sampledNodes = sample(N, k) // Sample k nodes from the network
    sampledPreferences = [query(node, currentColor) ∀ node ∈ sampledNodes]
    sampledPreferenceCounts = count(sampledPreferences) // Map each color to the number of times it was sampled
    majorityColor = argmax(sampledPreferenceCounts)
    if majorityColor ∉ {Red, Blue}: // The network is still initializing its preferences
      confidence = 0
      continue

    majorityCount = sampledPreferenceCounts[majorityColor]
-    if majorityCount < α:
+    if majorityCount < alphaPreference:
      confidence = 0 // No color got an alphaPreference majority
      continue

    preferenceStrength[majorityColor]++
    minorityColor = otherColor(majorityColor)
    if preferenceStrength[majorityColor] > preferenceStrength[minorityColor]:
      currentColor = majorityColor // Update the color we report to other peers

+    if majorityCount < alphaConfidence:
+      confidence = 0 // No color got an alphaConfidence majority
+      continue
+
    if majorityColor != lastMajorityColor:
      confidence = 0 // The alphaConfidence majority changed color, so the streak restarts
    lastMajorityColor = majorityColor
    confidence++
  currentColor = lastMajorityColor
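
The effect of the two thresholds is easiest to see on a poll whose majority falls between them. The sketch below is a simplified Python rendering of the loop body above, assuming vote counts are supplied directly; `new_state` and `record_poll` are names invented for this example.

```python
RED, BLUE = "Red", "Blue"

def new_state(color):
    return {"color": color, "confidence": 0, "last": None,
            "strength": {RED: 0, BLUE: 0}}

def record_poll(state, counts, alpha_preference, alpha_confidence):
    """Apply one poll result using separate preference and confidence thresholds."""
    majority = max((RED, BLUE), key=lambda c: counts.get(c, 0))
    votes = counts.get(majority, 0)
    if votes < alpha_preference:
        state["confidence"] = 0  # too weak to influence anything
        return state
    # Color selection only needs an alphaPreference majority.
    state["strength"][majority] += 1
    other = BLUE if majority == RED else RED
    if state["strength"][majority] > state["strength"][other]:
        state["color"] = majority
    # Finalization detection needs the stricter alphaConfidence majority.
    if votes < alpha_confidence:
        state["confidence"] = 0
        return state
    if majority != state["last"]:
        state["confidence"] = 0
    state["last"] = majority
    state["confidence"] += 1
    return state

state = new_state(BLUE)
# 12 of 21 votes: enough for alphaPreference = 11, not for alphaConfidence = 15.
record_poll(state, {RED: 12, BLUE: 9}, alpha_preference=11, alpha_confidence=15)
print(state["color"], state["confidence"])
```

In this sketch the node switches its reported color after the 12-vote poll but gains no confidence, which is the separation the proposal is after.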

Updated Snowball Simulation

Parameters:

N               = 250
K               = 21
β               = 30
AlphaPreference = 11

The network is initialized with 125 Red and 125 Blue nodes.

Nodes performing sampling in lockstep:
[Figure: Dual Alpha Lockstep]

Sampling which node to perform the sampling:
[Figure: Dual Alpha Sampled Node]

These simulations show that keeping alphaPreference constant while increasing alphaConfidence increases the rounds until termination significantly less than increasing α in the current protocol specification.

Safety Impact

While modifying the color selection does impact the safety analysis, the impact is not expected to be significant.

As a simple argument for why safety shouldn't be drastically impacted, consider the case where one virtuous node has accepted Red. If byzantine nodes switching their color to Blue causes polls to see more Blue than Red, safety will eventually be violated. This is because every node, except for the one that has already accepted Red, will eventually switch its preference to Blue. Once enough nodes have updated their preferences to Blue, another virtuous node may accept Blue. Notice that this argument is independent of α; in this situation, α only impacts the speed at which such a failure would occur.

The actual impact on the safety analysis is due to the variance in sampling. Larger α values protect more against unlikely samples of a disproportionate number of byzantine nodes. Before modifying the default consensus parameters, this should be included in the formal safety analysis.
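
To get a feel for the sampling-variance point, the sketch below computes the probability that a single poll contains at least a threshold number of byzantine nodes. It assumes uniform sampling without replacement (a hypergeometric distribution) and an illustrative 50 byzantine nodes out of 250. Neither assumption comes from this proposal, and the real network samples by stake, so this is not a substitute for the formal analysis.

```python
from math import comb

def prob_at_least(n, bad, k, threshold):
    """P[X >= threshold] where X is the number of bad nodes in a uniform sample of k from n."""
    total = comb(n, k)
    favorable = sum(
        comb(bad, i) * comb(n - bad, k - i)  # comb returns 0 when k - i > n - bad
        for i in range(threshold, min(k, bad) + 1)
    )
    return favorable / total

# Hypothetical setting: N = 250, K = 21, 50 byzantine nodes.
for threshold in (11, 15, 18):
    print(threshold, prob_at_least(250, 50, 21, threshold))
```

The probability can only shrink as the threshold grows, which is the sense in which a larger α protects against unlikely samples.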

Additionally, this change can unblock using a larger alphaConfidence than currently configured without significantly degrading performance.

Why this should be merged

  • This change meaningfully improves the performance of the core consensus mechanism.
  • This change does not actually change the behavior of the node. The node will only behave differently if alphaPreference is set to a different value than alphaConfidence. Changing the default value of these parameters is left to a future proposal.

How this was tested

  • Performance simulations
  • CI

@StephenButtolph StephenButtolph added enhancement New feature or request consensus This involves consensus labels Oct 3, 2023
@StephenButtolph StephenButtolph self-assigned this Oct 3, 2023
Contributor

@marun marun left a comment

Where can I find documentation about avalanchego consensus? I'm curious about the different implementations (binary, nnary, flat, etc) and what they are used for.

@StephenButtolph
Contributor Author

StephenButtolph commented Oct 3, 2023

Where can I find documentation about avalanchego consensus? I'm curious about the different implementations (binary, nnary, flat, etc) and what they are used for.

In general - the only thing that is used from the snowball package is snowball.NewTree. Flat (and correspondingly all the nnary implementations) could be moved into test files.

As for binary - the spec has only been slightly modified since the original paper: https://assets.website-files.com/5d80307810123f5ffbb34d6e/6009805681b416f34dcae012_Avalanche%20Consensus%20Whitepaper.pdf

@abi87
Contributor

abi87 commented Oct 4, 2023

@StephenButtolph can you elaborate on the difference between lockstep vs sampled simulations? They seem to represent a synchronous vs asynchronous environment, but I'm not sure.

@StephenButtolph
Contributor Author

StephenButtolph commented Oct 4, 2023

can you elaborate on the difference between lockstep vs sampled simulations?

When run in lockstep:

  1. every (virtuous) node samples k peers and stores their color
  2. after the sampling, all of the samples are then applied (where nodes update their colors).

This means that, in any given round, every node observes the same snapshot of the virtuous nodes' colors. This is meant to model a network in which all nodes run at the same speed and there is some (constant) network delay.

When running by sampling nodes:

  1. a random (non-finalized + virtuous) node is selected (uniformly randomly)
  2. the selected node samples k peers and applies those votes (may update its color)

This is why the last graph, Rounds Until Termination by alphaConfidence (Sampled Node), grows more with a higher alphaConfidence than Rounds Until Termination by alphaConfidence (Lockstep). Specifically, the network is waiting for the last few virtuous nodes to be selected in the first step so that their colors update. This roughly emulates a network in which some nodes are "slower" than others.
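
The difference between the two scheduling modes can be sketched as follows. This is not the simulator used for the graphs: the update rule is deliberately simplified (a node adopts the polled majority as soon as it reaches `alpha`, with no preference strength or finalization), byzantine nodes are omitted, and all function names are invented for this example.

```python
import random

def poll(colors, k, rng):
    """Sample k nodes and return the majority color and its vote count."""
    sample = rng.sample(range(len(colors)), k)
    red = sum(colors[i] == "Red" for i in sample)
    return ("Red", red) if red >= k - red else ("Blue", k - red)

def lockstep_round(colors, k, alpha, rng):
    """Every node polls the same snapshot; all updates are applied afterwards."""
    results = [poll(colors, k, rng) for _ in colors]
    return [c if n >= alpha else old for old, (c, n) in zip(colors, results)]

def sampled_node_round(colors, k, alpha, rng):
    """One uniformly chosen node polls and applies its result immediately."""
    i = rng.randrange(len(colors))
    c, n = poll(colors, k, rng)
    updated = list(colors)
    if n >= alpha:
        updated[i] = c
    return updated

rng = random.Random(0)
colors = ["Red"] * 125 + ["Blue"] * 125
after_lockstep = lockstep_round(colors, k=21, alpha=11, rng=rng)
after_sampled = sampled_node_round(colors, k=21, alpha=11, rng=rng)
print(sum(a != b for a, b in zip(colors, after_lockstep)),
      sum(a != b for a, b in zip(colors, after_sampled)))
```

A lockstep round can change many nodes at once, while a sampled-node round changes at most one, so a few unlucky nodes can hold up termination in the second mode.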

@StephenButtolph StephenButtolph added this to the v1.10.13 milestone Oct 5, 2023
abi87
abi87 previously requested changes Oct 5, 2023
Contributor

@abi87 abi87 left a comment

Nits, and possibly a typo in a UT. Still looking at this.

snow/consensus/snowball/tree.go (outdated)
snow/consensus/snowball/parameters.go
snow/consensus/snowman/topological.go
snow/consensus/snowball/binary_snowflake.go
snow/consensus/snowball/nnary_snowflake.go
Contributor

@gyuho gyuho left a comment

lgtm

just nits to make sure I understand the test changes :)

snow/consensus/snowball/flat_test.go
snow/consensus/snowball/flat_test.go (outdated)
snow/consensus/snowball/nnary_snowflake_test.go (outdated)
snow/consensus/snowball/nnary_snowflake_test.go (outdated)
Contributor

@abi87 abi87 left a comment

lgtm

@gyuho
Contributor

gyuho commented Oct 5, 2023

Lgtm 🫡

@StephenButtolph StephenButtolph modified the milestones: v1.10.13, v1.10.12 Oct 6, 2023
@StephenButtolph StephenButtolph added this pull request to the merge queue Oct 6, 2023
Merged via the queue into dev with commit 6d53e51 Oct 6, 2023
16 checks passed
@StephenButtolph StephenButtolph deleted the alpha-preference branch October 6, 2023 22:32
joshua-kim added a commit that referenced this pull request Oct 17, 2023
@hsk81

hsk81 commented Oct 28, 2023

In my own simulations, the number of rounds until finalization tended to increase when the underlying stake distribution was unusual, with no defined expected mean. See the "Law of Large Numbers":

《The average of the results obtained from a large number of trials may fail to converge in some cases. For instance, the average of n results taken from the Cauchy distribution or some Pareto distributions (α<1) will not converge as n becomes larger; the reason is heavy tails.[4] 》

What distribution and parametrization did you assume during the simulations? See my repository if you're interested:

https://github.com/hsk81/avalanche-sim
