Account for ouroboros-network changes
* Update `ouroboros-consensus-diffusion` to `ouroboros-network-0.16.0.0`

* GSM: use diffusion layer info for HAA
  Concretely, this addresses #974 in the context of bootstrap peers, but this
  code will also work for the "actual" Honest Availability Assumption of
  Genesis once implemented on the Network side.

* BootstrapPeers.md: include pseudo-HAA

* Remove `ShowProxy` instance of `SlotNo`
  It is now provided by `ouroboros-network`.
Lucsanszky committed May 9, 2024
1 parent 4faa552 commit 3f873d6
Showing 9 changed files with 82 additions and 66 deletions.
10 changes: 10 additions & 0 deletions cabal.project
Original file line number Diff line number Diff line change
@@ -58,6 +58,16 @@ source-repository-package
eras/byron/crypto
eras/byron/crypto/test

source-repository-package
type: git
location: https://github.com/IntersectMBO/ouroboros-network
tag: 821feaa29b9f82d1364345348b304484bc189283
--sha256: sha256-yQEj74pnTVJQeeHNRr5ejW1KSKp89OFbNROtOdthG/o=
subdir: ouroboros-network
ouroboros-network-api
ouroboros-network-framework
ouroboros-network-protocols

-- We want to always build the test-suites and benchmarks
tests: true
benchmarks: true
42 changes: 32 additions & 10 deletions docs/website/contents/for-developers/BootstrapPeersIER.md
Original file line number Diff line number Diff line change
@@ -12,32 +12,48 @@ The following state machine depicts the desired behavior of the node.

```mermaid
graph
OnlyBootstrap[OnlyBootstrap]
CaughtUp[CaughtUp]
subgraph OnlyBootstrap
direction TB
PreSyncing[PreSyncing]
Syncing[Syncing]
PreSyncing -- "Honest Availability Assumption\nis satisfied" --> Syncing
Syncing -- "Honest Availability Assumption\nis no longer satisfied" --> PreSyncing
end
OnlyBootstrap -- "no peers claim to have\nsubsequent headers,\nand its selection is ≥\nthe best header" --> CaughtUp
CaughtUp -- "vol tip became older than X" --> OnlyBootstrap
CaughtUp[CaughtUp]
Syncing -- "no peers claim to have\nsubsequent headers,\nand its selection is ≥\nthe best header" --> CaughtUp
CaughtUp -- "vol tip became older than X" ----> PreSyncing
StartUp[[Node start-up]]
StartUp -- "node was most recently in CaughtUp\nand vol tip is younger than X" --> CaughtUp
StartUp -- "otherwise" --> OnlyBootstrap
StartUp -- "otherwise" --> PreSyncing
```

- `OnlyBootstrap` state - All upstream peers must reside in a centralized set of trusted _bootstrap peers_.
- `OnlyBootstrap` - All upstream peers must be trusted.

In the context of bootstrap peers, as all peers are trusted, the _Honest Availability Assumption_ is satisfied in the following cases:

- The node is configured to connect to bootstrap peers, and it has established a connection to a bootstrap peer.

- The node is not configured to connect to bootstrap peers. This is the case for eg block producers and hidden relays. They will only be connected to trusted local root peers (eg the relays for a block-producing node).

- `CaughtUp` state - The peers are chosen according to the P2P design, including the _ledger peers_ etc.
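The two cases above in which the pseudo Honest Availability Assumption holds can be sketched as a small predicate. This is only an illustration under stated assumptions: `BootstrapConfig` and `haaSatisfiedFor` are hypothetical names invented here, not part of `ouroboros-network` or `ouroboros-consensus`, and the real condition is decided by the Diffusion Layer.

```haskell
-- Hypothetical type for illustration; not the ouroboros-network API.
data BootstrapConfig
  = UseBootstrapPeers      -- the node is configured to connect to bootstrap peers
  | DontUseBootstrapPeers  -- eg block producers and hidden relays
  deriving (Eq, Show)

-- | Whether the pseudo-HAA holds, given the configuration and the number of
-- established connections to bootstrap peers (an assumed input).
haaSatisfiedFor :: BootstrapConfig -> Int -> Bool
haaSatisfiedFor DontUseBootstrapPeers _ = True   -- only trusted local roots are used
haaSatisfiedFor UseBootstrapPeers     n = n > 0  -- at least one bootstrap peer connected
```

A node configured with bootstrap peers but not yet connected to any of them is the only case in this sketch that leaves the assumption unsatisfied.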

**Desideratum 2.**
In particular, the transitions should happen promptly.

- `CaughtUp -> OnlyBootstrap` should be prompt in order to minimize the duration that the node is exposed to untrusted peers (aka non-bootstrap peers) while its stale volatile tip is making it vulnerable.
- `CaughtUp -> PreSyncing` should be prompt in order to minimize the duration that the node is exposed to untrusted peers (aka non-bootstrap peers) while its stale volatile tip is making it vulnerable.
Delays here would directly threaten the security of the node.

- `OnlyBootstrap -> CaughtUp` should be prompt so that the centralized, relatively-few bootstrap peers are relieved of load as soon as possible.
- `Syncing -> CaughtUp` should be prompt so that the centralized, relatively-few bootstrap peers are relieved of load as soon as possible.
Delays here would not directly threaten the security of the node.
However, wasting the centralized resources would threaten the ability of nodes to join the net, ie the availability of the whole net.
Determining the exact load constraints for the bootstrap peers is not yet finalized.

- `PreSyncing -> Syncing` should be prompt to allow the node to conclude that it is caught up as a follow-up.

- `Syncing -> PreSyncing` should be prompt to prevent the node from concluding that it is caught up while it is not actually connected to a bootstrap peer.
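The transitions in the updated diagram can be read as a pure step function. The following is a minimal sketch, not the actual GSM implementation: `GsmState`, `Observation`, `step` and `startUp` are names assumed here for illustration, and the choice to check the loss of the Honest Availability Assumption before the caught-up condition in `Syncing` is likewise an assumption.

```haskell
-- Hypothetical model of the state machine in the diagram above.
data GsmState = PreSyncing | Syncing | CaughtUp
  deriving (Eq, Show)

-- | What the node currently observes (assumed inputs).
data Observation = Observation
  { obsHaaSatisfied :: Bool  -- Honest Availability Assumption holds
  , obsNoPeerAhead  :: Bool  -- no peer claims subsequent headers, selection >= best header
  , obsVolTipTooOld :: Bool  -- volatile tip became older than X
  }

-- | One transition of the state machine.
step :: GsmState -> Observation -> GsmState
step PreSyncing o
  | obsHaaSatisfied o       = Syncing
  | otherwise               = PreSyncing
step Syncing o
  | not (obsHaaSatisfied o) = PreSyncing  -- assumption: HAA loss takes priority
  | obsNoPeerAhead o        = CaughtUp
  | otherwise               = Syncing
step CaughtUp o
  | obsVolTipTooOld o       = PreSyncing
  | otherwise               = CaughtUp

-- | Initial state at node start-up: arguments are "was most recently in
-- CaughtUp" and "vol tip is younger than X".
startUp :: Bool -> Bool -> GsmState
startUp wasCaughtUp volTipYoung
  | wasCaughtUp && volTipYoung = CaughtUp
  | otherwise                  = PreSyncing
```

Note that in this sketch `CaughtUp` can only be reached from `Syncing` (or directly at start-up), which reflects why both transitions between `PreSyncing` and `Syncing` need to be prompt.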

**Desideratum 3.**
The node should not return to `OnlyBootstrap` every time it restarts/briefly loses network/etc.
Such unnecessary connections would also put unnecessary load on the centralized, relatively-few bootstrap peers.
@@ -59,11 +75,11 @@ This is the point of the "Node start-up" pseudo state in the diagram above.

As the volatile tip age approaches X, the Consensus Layer could forewarn the Diffusion Layer, eg "it seems like the transition back to OnlyBootstrap will be necessary soon; please prepare", if that would be helpful.

- For similar reasons, the Diffusion Layer should also manage the disconnections from all peers upon the `OnlyBootstrap -> CaughtUp` transition.
- For similar reasons, the Diffusion Layer should also manage the disconnections from all (bootstrap) peers upon the `OnlyBootstrap -> CaughtUp` transition.

## Anticipated Interface

See [IntersectMBO/ouroboros-network#4555](https://github.com/IntersectMBO/ouroboros-network/pull/4555) for the definition/implementation of this interface on the Network side.
See [IntersectMBO/ouroboros-network#4555](https://github.com/IntersectMBO/ouroboros-network/pull/4555) and [IntersectMBO/ouroboros-network#4846](https://github.com/IntersectMBO/ouroboros-network/pull/4846) for the definition/implementation of this interface on the Network side.

- The Diffusion Layer should monitor a `TVar State` (maybe via a `STM State` action).
The Consensus Layer will update that state promptly.
@@ -75,6 +91,12 @@ See [IntersectMBO/ouroboros-network#4555](https://github.com/IntersectMBO/ourobo
Here, `YoungEnough` signals that the ledger state's distribution among stake relays is sufficiently close to that of the actual real world.
For now, we conservatively return `YoungEnough` only when the node concludes it has fully caught-up, and `TooOld` otherwise.

- The Diffusion Layer will inform the Consensus Layer whether the Honest Availability Assumption is satisfied.
```haskell
data OutboundConnectionsState = TrustedStateWithExternalPeers | UntrustedState
daUpdateOutboundConnectionsState :: OutboundConnectionsState -> STM m ()
```

- Whenever necessary, the Diffusion Layer can ask the Consensus Layer for the ledger peer information, eg
```haskell
lpGetLedgerPeers :: STM m [(PoolStake, NonEmpty RelayAccessPoint)]
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
### Non-Breaking

- Implemented the Honest Availability Assumption properly (both for
Praos/"Genesis Lite" and Genesis) based on newly exposed state by the
diffusion layer.
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
### Non-Breaking

- Upgraded to `ouroboros-network-0.16`
Original file line number Diff line number Diff line change
@@ -89,10 +89,10 @@ library
io-classes ^>=1.4.1,
mtl,
ouroboros-consensus ^>=0.17,
ouroboros-network ^>=0.14,
ouroboros-network-api ^>=0.7,
ouroboros-network-framework ^>=0.12,
ouroboros-network-protocols ^>=0.8,
ouroboros-network ^>=0.16,
ouroboros-network-api ^>=0.7.2,
ouroboros-network-framework ^>=0.13,
ouroboros-network-protocols ^>=0.8.1,
random,
safe-wild-cards ^>=1.0,
serialise ^>=0.2,
Original file line number Diff line number Diff line change
@@ -22,7 +22,6 @@ module Ouroboros.Consensus.Node (
-- * Standard arguments
, StdRunNodeArgs (..)
, stdBfcSaltIO
, stdChainSyncTimeout
, stdGsmAntiThunderingHerdIO
, stdKeepAliveRngIO
, stdLowLevelRunNodeArgsIO
@@ -60,6 +59,7 @@ import qualified Codec.CBOR.Encoding as CBOR
import Codec.Serialise (DeserialiseFailure)
import qualified Control.Concurrent.Class.MonadSTM.Strict as StrictSTM
import Control.DeepSeq (NFData)
import Control.Monad (when)
import Control.Monad.Class.MonadTime.SI (MonadTime)
import Control.Monad.Class.MonadTimer.SI (MonadTimer)
import Control.Tracer (Tracer, contramap, traceWith)
@@ -110,9 +110,9 @@ import Ouroboros.Consensus.Util.ResourceRegistry
import Ouroboros.Consensus.Util.Time (secondsToNominalDiffTime)
import Ouroboros.Network.BlockFetch (BlockFetchConfiguration (..))
import qualified Ouroboros.Network.Diffusion as Diffusion
import qualified Ouroboros.Network.Diffusion.Configuration as Diffusion
import qualified Ouroboros.Network.Diffusion.NonP2P as NonP2P
import qualified Ouroboros.Network.Diffusion.P2P as P2P
import qualified Ouroboros.Network.Diffusion.Policies as Diffusion
import Ouroboros.Network.Magic
import Ouroboros.Network.NodeToClient (ConnectionId, LocalAddress,
LocalSocket, NodeToClientVersionData (..), combineVersions,
@@ -129,15 +129,14 @@ import Ouroboros.Network.PeerSelection.PeerMetric (PeerMetrics,
import Ouroboros.Network.PeerSelection.PeerSharing (PeerSharing)
import Ouroboros.Network.PeerSelection.PeerSharing.Codec
(decodeRemoteAddress, encodeRemoteAddress)
import Ouroboros.Network.Protocol.Limits (shortWait)
import Ouroboros.Network.RethrowPolicy
import qualified SafeWildCards
import System.Exit (ExitCode (..))
import System.FilePath ((</>))
import System.FS.API (SomeHasFS (..))
import System.FS.API.Types
import System.FS.IO (ioHasFS)
import System.Random (StdGen, newStdGen, randomIO, randomRIO, split)
import System.Random (StdGen, newStdGen, randomIO, split)

{-------------------------------------------------------------------------------
The arguments to the Consensus Layer node functionality
@@ -626,7 +625,11 @@ runWith RunNodeArgs{..} encAddrNtN decAddrNtN LowLevelRunNodeArgs{..} =
lpGetLatestSlot = getImmTipSlot kernel,
lpGetLedgerPeers = fromMaybe [] <$> getPeersFromCurrentLedger kernel (const True),
lpGetLedgerStateJudgement = getLedgerStateJudgement kernel
}
},
Diffusion.daUpdateOutboundConnectionsState =
let varOcs = getOutboundConnectionsState kernel in \newOcs -> do
oldOcs <- readTVar varOcs
when (newOcs /= oldOcs) $ writeTVar varOcs newOcs
}

localRethrowPolicy :: RethrowPolicy
@@ -732,7 +735,7 @@ mkNodeKernelArgs
, blockFetchSize = estimateBlockSize
, mempoolCapacityOverride = NoMempoolCapacityBytesOverride
, miniProtocolParameters = defaultMiniProtocolParameters
, blockFetchConfiguration = defaultBlockFetchConfiguration
, blockFetchConfiguration = Diffusion.defaultBlockFetchConfiguration bfcSalt
, gsmArgs = GsmNodeKernelArgs {
gsmAntiThunderingHerd
, gsmDurationUntilTooOld
@@ -744,15 +747,6 @@ mkNodeKernelArgs
, peerSharingRng = psRng
, publicPeerSelectionStateVar
}
where
defaultBlockFetchConfiguration :: BlockFetchConfiguration
defaultBlockFetchConfiguration = BlockFetchConfiguration
{ bfcMaxConcurrencyBulkSync = 1
, bfcMaxConcurrencyDeadline = 1
, bfcMaxRequestsInflight = fromIntegral $ blockFetchPipeliningMax defaultMiniProtocolParameters
, bfcDecisionLoopInterval = 0.01 -- 10ms
, bfcSalt
}

-- | We allow the user running the node to customise the 'NodeKernelArgs'
-- through 'llrnCustomiseNodeKernelArgs', but there are some limits to some
@@ -800,33 +794,6 @@ stdGsmAntiThunderingHerdIO = newStdGen
stdKeepAliveRngIO :: IO StdGen
stdKeepAliveRngIO = newStdGen

stdChainSyncTimeout :: IO NTN.ChainSyncTimeout
stdChainSyncTimeout = do
-- These values approximately correspond to false positive
-- thresholds for streaks of empty slots with 99% probability,
-- 99.9% probability up to 99.999% probability.
-- t = T_s [log (1-Y) / log (1-f)]
-- Y = [0.99, 0.999...]
-- T_s = slot length of 1s.
-- f = 0.05
-- The timeout is randomly picked per bearer to avoid all bearers
-- going down at the same time in case of a long streak of empty
-- slots.
-- To avoid global synchronosation the timeout is picked uniformly
-- from the interval 135 - 269, corresponds to the a 99.9% to
-- 99.9999% thresholds.
-- TODO: The timeout should be drawn at random everytime chainsync
-- enters the must reply state. A static per connection timeout
-- leads to selection preassure for connections with a large
-- timeout, see #4244.
mustReplyTimeout <- Just <$> realToFrac <$> randomRIO (135,269 :: Double)
return NTN.ChainSyncTimeout
{ canAwaitTimeout = shortWait
, intersectTimeout = shortWait
, mustReplyTimeout
, idleTimeout = Just 3673
}

stdVersionDataNTN :: NetworkMagic
-> DiffusionMode
-> PeerSharing
@@ -887,7 +854,7 @@ stdLowLevelRunNodeArgsIO RunNodeArgs{ rnProtocolInfo
llrnKeepAliveRng <- stdKeepAliveRngIO
pure LowLevelRunNodeArgs
{ llrnBfcSalt
, llrnChainSyncTimeout = fromMaybe stdChainSyncTimeout srnChainSyncTimeout
, llrnChainSyncTimeout = fromMaybe Diffusion.defaultChainSyncTimeout srnChainSyncTimeout
, llrnChainSyncLoPBucketConfig = ChainSyncLoPBucketDisabled
, llrnCustomiseHardForkBlockchainTimeArgs = id
, llrnGsmAntiThunderingHerd
Original file line number Diff line number Diff line change
@@ -94,6 +94,8 @@ import Ouroboros.Network.NodeToNode (ConnectionId,
import Ouroboros.Network.PeerSelection.Bootstrap (UseBootstrapPeers)
import Ouroboros.Network.PeerSelection.LedgerPeers.Type
(LedgerStateJudgement (..))
import Ouroboros.Network.PeerSelection.LocalRootPeers
(OutboundConnectionsState (..))
import Ouroboros.Network.PeerSharing (PeerSharingAPI,
PeerSharingRegistry, newPeerSharingAPI,
newPeerSharingRegistry, ps_POLICY_PEER_SHARE_MAX_PEERS,
@@ -149,6 +151,9 @@ data NodeKernel m addrNTN addrNTC blk = NodeKernel {
, setBlockForging :: [BlockForging m blk] -> m ()

, getPeerSharingAPI :: PeerSharingAPI addrNTN StdGen m

, getOutboundConnectionsState
:: StrictTVar m OutboundConnectionsState
}

-- | Arguments required when initializing a node
@@ -204,6 +209,8 @@ initNodeKernel args@NodeKernelArgs { registry, cfg, tracers
, varLedgerJudgement
} = st

varOutboundConnectionsState <- newTVarIO UntrustedState

do let GsmNodeKernelArgs {..} = gsmArgs
gsmTracerArgs =
( castTip . either AF.anchorToTip tipFromHeader . AF.head . fst
@@ -239,9 +246,12 @@ initNodeKernel args@NodeKernelArgs { registry, cfg, tracers
gsmMarkerFileView
, GSM.writeGsmState = \x -> atomically $ do
writeTVar varLedgerJudgement $ GSM.gsmStateToLedgerJudgement x
, -- In the context of bootstrap peers, it is fine to always
-- return 'True' as all peers are trusted during syncing.
GSM.isHaaSatisfied = pure True
, GSM.isHaaSatisfied = do
readTVar varOutboundConnectionsState <&> \case
-- See the upstream Haddocks for the exact conditions under
-- which the diffusion layer is in this state.
TrustedStateWithExternalPeers -> True
UntrustedState -> False
}
judgment <- readTVarIO varLedgerJudgement
void $ forkLinkedThread registry "NodeKernel.GSM" $ case judgment of
@@ -278,6 +288,8 @@ initNodeKernel args@NodeKernelArgs { registry, cfg, tracers
, getTracers = tracers
, setBlockForging = \a -> atomically . LazySTM.putTMVar blockForgingVar $! a
, getPeerSharingAPI = peerSharingAPI
, getOutboundConnectionsState
= varOutboundConnectionsState
}
where
blockForgingController :: InternalState m remotePeer localPeer blk
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
### Breaking

- Remove `ShowProxy` instance of `SlotNo` (now provided by `ouroboros-network`)
Original file line number Diff line number Diff line change
@@ -64,12 +64,6 @@ instance Serialise (VerKeyDSIGN MockDSIGN) where
encode = encodeVerKeyDSIGN
decode = decodeVerKeyDSIGN

{-------------------------------------------------------------------------------
ShowProxy
-------------------------------------------------------------------------------}

instance ShowProxy SlotNo where

{-------------------------------------------------------------------------------
NoThunks
-------------------------------------------------------------------------------}