Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] - Inconsistency between MsgHasTx / MsgNextTx in the LocalTxMonitor #1009

Closed
KtorZ opened this issue Mar 20, 2024 · 9 comments · Fixed by #1017
Closed

[BUG] - Inconsistency between MsgHasTx / MsgNextTx in the LocalTxMonitor #1009

KtorZ opened this issue Mar 20, 2024 · 9 comments · Fixed by #1017

Comments

@KtorZ
Copy link
Contributor

KtorZ commented Mar 20, 2024

Internal/External

Summary

The LocalTxMonitor protocol doesn't always respond appropriately to the MsgHasTx message. I've hunted down the issue and found that the root cause comes from the hard-fork combinator applied to the GenTxId.

Steps to reproduce

  1. Create a valid transaction
  2. Submit the transaction through the local submit protocol
  3. Before the transaction is removed from the mempool, acquire a mempool snapshot
  4. Ask whether the transaction is in the mempool (MsgHasTx) using a TxId wrapped in any other era than the current one.
  5. Observe that the reply is False
  6. Ask the next transaction present in the mempool (MsgNextTx) to confirm that the transaction is indeed in the snapshot.
Here is a minimal reproduction example in Haskell (requires a serialized transaction to be passed as 1st argument)
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE QuantifiedConstraints #-}
module Main where

import Prelude

import qualified Codec.CBOR.Encoding as Cbor
import qualified Codec.CBOR.Write as Cbor
import Control.Monad.Class.MonadTimer
import Control.Tracer
import Data.ByteString
import Data.ByteString.Base16
import qualified Data.ByteString.Lazy as LBS
import Data.Map
import Data.Proxy
import qualified Data.Text as Text
import qualified Data.Text.Encoding as Text
import Data.Void
import System.Environment

import Cardano.Chain.Slotting
import Cardano.Ledger.Api
import Cardano.Ledger.Binary
import Cardano.Slotting.Slot

import Network.TypedProtocol
import Network.TypedProtocol.Codec

import Ouroboros.Consensus.Byron.Ledger.Config
import Ouroboros.Consensus.Cardano
import Ouroboros.Consensus.Cardano.Block
import Ouroboros.Consensus.Ledger.SupportsMempool
import Ouroboros.Consensus.Network.NodeToClient
import Ouroboros.Consensus.Node.NetworkProtocolVersion
import Ouroboros.Consensus.Shelley.Ledger
import Ouroboros.Consensus.Util

import Ouroboros.Network.Block
import Ouroboros.Network.Driver.Simple
import Ouroboros.Network.Magic
import Ouroboros.Network.Mux
import Ouroboros.Network.NodeToClient
import Ouroboros.Network.Protocol.LocalTxMonitor.Client
import qualified Ouroboros.Network.Protocol.LocalTxMonitor.Client as LTM
import Ouroboros.Network.Protocol.LocalTxSubmission.Client
import qualified Ouroboros.Network.Protocol.LocalTxSubmission.Client as LTS

import Ouroboros.Consensus.Protocol.Praos.Translate
    ()
import Ouroboros.Consensus.Shelley.Ledger.SupportsProtocol
    ()

type Block = CardanoBlock StandardCrypto

data NetworkName = Mainnet | PreProd | Preview deriving (Show, Read)

defaultNetwork :: NetworkName
defaultNetwork = Preview

defaultEpochSlots :: EpochSlots
defaultEpochSlots =
    case defaultNetwork of
        Mainnet -> EpochSlots 21600
        PreProd -> EpochSlots 21600
        Preview -> EpochSlots 4320

defaultNetworkMagic :: NetworkMagic
defaultNetworkMagic =
    NetworkMagic $ case defaultNetwork of
        Mainnet -> 764824073
        PreProd -> 1
        Preview -> 2

defaultNodeSocket :: FilePath
defaultNodeSocket = "/tmp/node.socket"

main :: IO ()
main = do
    transaction <- unsafeDecodeTransaction . unsafeDecodeBase16 . Prelude.head <$> getArgs
    let protocols = mkProtocols (mkLocalTxSubmissionClient transaction) (mkLocalTxMonitorClient transaction)
    withIOManager $ \iocp -> connectTo (localSnocket iocp) tracers (versions protocols) defaultNodeSocket
  where
    versions protocols =
        let
            v = NodeToClientV_15
            vData = NodeToClientVersionData defaultNetworkMagic False
         in
            simpleSingletonVersions v vData (protocols v)

    tracers =
        NetworkConnectTracers
            { nctMuxTracer = nullTracer
            , nctHandshakeTracer = nullTracer
            }

mkLocalTxSubmissionClient
    :: GenTx Block
    -> LocalTxSubmissionClient (GenTx Block) (HardForkApplyTxErr (CardanoEras StandardCrypto)) IO ()
mkLocalTxSubmissionClient transaction =
    LocalTxSubmissionClient $ pure $ SendMsgSubmitTx transaction $ \result -> do
        putStrLn $ "Submitted transaction " <> show (txId transaction) <> " -> " <> showResult result
        threadDelay maxBound
        pure $ LTS.SendMsgDone ()
  where
    showResult = \case
        SubmitSuccess -> "Success"
        SubmitFail reason -> "Failed: " <> show reason

mkLocalTxMonitorClient
    :: GenTx Block
    -> LocalTxMonitorClient (GenTxId Block) (GenTx Block) SlotNo IO ()
mkLocalTxMonitorClient transaction =
    LocalTxMonitorClient $ pure $ SendMsgAcquire $ \slot -> do
        putStrLn $ "Acquired mempool snapshot at slot: " <> show slot
        watchMempool []
  where
    watchMempool :: [GenTxId Block] -> IO (ClientStAcquired (GenTxId Block) (GenTx Block) SlotNo IO ())
    watchMempool !txs = pure $ SendMsgNextTx $ \case
        Just tx -> do
            watchMempool (txId tx : txs)
        Nothing -> do
            putStrLn $ "Mempool: " <> show txs
            pure $ if Prelude.null txs then
                SendMsgAwaitAcquire $ \slot -> do
                    putStrLn $ "Acquired mempool snapshot at slot: " <> show slot
                    watchMempool []
            else
                SendMsgHasTx (asTxIdBabbage transaction) $ \has -> do
                    putStrLn $ "Has " <> show (asTxIdBabbage transaction) <> "? " <> show has
                    pure $ SendMsgHasTx (asTxIdAlonzo transaction) $ \has' -> do
                        putStrLn $ "Has " <> show (asTxIdAlonzo transaction) <> "? " <> show has'
                        pure $ SendMsgRelease $ pure $ LTM.SendMsgDone ()

    asTxIdBabbage =
        txId

    asTxIdAlonzo =
        \case
            GenTxIdBabbage (ShelleyTxId i) -> GenTxIdAlonzo (ShelleyTxId i)
            orElse -> orElse
        .
        txId

unsafeDecodeBase16
    :: String
    -> ByteString
unsafeDecodeBase16 =
    either (error . show) id
    . decodeBase16Untyped
    . Text.encodeUtf8
    . Text.pack

unsafeDecodeTransaction
    :: ByteString
    -> GenTx Block
unsafeDecodeTransaction =
    either (error . show) GenTxBabbage
        . decodeFullDecoder'
            (eraProtVerLow @StandardBabbage)
            "Transaction"
            (fromPlainDecoder fromCBOR)
        . Cbor.toStrictByteString
        . wrapCBORinCBOR Cbor.encodePreEncoded

mkProtocols
    :: LocalTxSubmissionClient (GenTx Block) (HardForkApplyTxErr (CardanoEras StandardCrypto)) IO ()
    -> LocalTxMonitorClient (GenTxId Block) (GenTx Block) SlotNo IO ()
    -> NodeToClientVersion
    -> OuroborosApplicationWithMinimalCtx InitiatorMode addr LBS.ByteString IO () Void
mkProtocols localTxSubmissionClient localTxMonitorClient !v =
    NodeToClientProtocols
        { localTxSubmissionProtocol =
            protocol cTxSubmissionCodec (localTxSubmissionClientPeer localTxSubmissionClient)

        , localTxMonitorProtocol =
            protocol cTxMonitorCodec (localTxMonitorClientPeer localTxMonitorClient)

        , localChainSyncProtocol =
            protocol cChainSyncCodec chainSyncPeerNull

        , localStateQueryProtocol =
            protocol cStateQueryCodec localStateQueryPeerNull
        } `nodeToClientProtocols` v
  where
    protocol
        :: forall ps (st :: ps) pr failure bytes addr a.
            ( Show failure
            , forall (st' :: ps). Show (ClientHasAgency st')
            , forall (st' :: ps). Show (ServerHasAgency st')
            , ShowProxy ps
            )
        => (ClientCodecs Block IO -> Codec ps failure IO bytes)
        -> Peer ps pr st IO a
        -> RunMiniProtocolWithMinimalCtx InitiatorMode addr bytes IO a Void
    protocol codec peer =
        InitiatorProtocolOnly $ MiniProtocolCb $ \_ctx channel ->
            runPeer nullTracer (codec codecs) channel peer

    codecs :: ClientCodecs Block IO
    codecs = clientCodecs cfg (supportedVersions ! v) v
      where
        supportedVersions = supportedNodeToClientVersions (Proxy @Block)
        cfg = CardanoCodecConfig
            (let byron   = ByronCodecConfig defaultEpochSlots in byron)
            (let shelley = ShelleyCodecConfig in shelley)
            (let allegra = ShelleyCodecConfig in allegra)
            (let mary    = ShelleyCodecConfig in mary)
            (let alonzo  = ShelleyCodecConfig in alonzo)
            (let babbage = ShelleyCodecConfig in babbage)
            (let conway  = ShelleyCodecConfig in conway)

This is what the script above yields for a valid transaction (with some comments for clarity)

-- A mempool snapshot is acquired immediately. It is empty 
Acquired mempool snapshot at slot: SlotNo 44280567
Mempool: []

-- The transaction is successfully submitted
Submitted transaction HardForkGenTxId {getHardForkGenTxId = S (S (S (S (S (Z (WrapGenTxId {unwrapGenTxId = txid: TxId {unTxId = SafeHash "d7b4f21da71700e7fbbdea322c42f85b6373fecbbf16dab180b854df15503b66"}}))))))} -> Success

-- This causes a new mempool snapshot to become available. Which contains out transaction.
Acquired mempool snapshot at slot: SlotNo 44280567
Mempool: [HardForkGenTxId {getHardForkGenTxId = S (S (S (S (S (Z (WrapGenTxId {unwrapGenTxId = txid: TxId {unTxId = SafeHash "d7b4f21da71700e7fbbdea322c42f85b6373fecbbf16dab180b854df15503b66"}}))))))}]

-- Asking whether the mempool knows the tx id as a "BabbageGenTxId" returns `True`
Has HardForkGenTxId {getHardForkGenTxId = S (S (S (S (S (Z (WrapGenTxId {unwrapGenTxId = txid: TxId {unTxId = SafeHash "d7b4f21da71700e7fbbdea322c42f85b6373fecbbf16dab180b854df15503b66"}}))))))}? True

-- Asking whether the mempool knows *the same* tx id as a "AlonzoGenTxId" returns `False`
Has HardForkGenTxId {getHardForkGenTxId = S (S (S (S (Z (WrapGenTxId {unwrapGenTxId = txid: TxId {unTxId = SafeHash "d7b4f21da71700e7fbbdea322c42f85b6373fecbbf16dab180b854df15503b66"}})))))}? False

Expected behavior

When the transaction is in the mempool, MsgHasTx should return True, regardless of the era in which the TxId is being wrapped / submitted. We know it is in the mempool because it is returned in the next MsgNextTx call.

I imagine that one might say: "But the transaction id isn't guaranteed to be equal and have compatible representation across eras!", yes. Sure. But in effect, it's been the case for ALL eras so far and given that the underlying type is the same and is era-independent, then the wrapping era we choose for TxId shouldn't matter.

This is particularly annoying to deal with as it forces clients to keep track of the current era also in the mempool monitoring protocol (this is already sufficiently annoying with the local-state-query 😬 ...). This is problematic especially around hard forks.

Given that we look for the transaction id by looking at a membership in a set, a possible resolution path could be to derive the Ord instance (and while at it, the Eq too) on the inner type of the TxId type instance, instead of the type itself (i.e. deriving newtype). This way, it wouldn't matter where in the combinator is the underlying type, the equality/comparison will happen on the inner type which holds the era-independent transaction id.

newtype instance TxId (GenTx (HardForkBlock xs)) = HardForkGenTxId {
getHardForkGenTxId :: OneEraGenTxId xs
}
deriving (Eq, Generic, Ord, Show)

System info (please complete the following information):

  • OS Name: MacOS
  • OS Version: 14.4
  • Consensus version:
    • any.ouroboros-consensus ==0.15.0.0,
    • any.ouroboros-consensus-cardano ==0.13.0.0,
    • any.ouroboros-consensus-diffusion ==0.10.0.0,
    • any.ouroboros-consensus-protocol ==0.7.0.0,

Screenshots and attachments
ø

Additional context
ø

@amesgen
Copy link
Member

amesgen commented Mar 20, 2024

Thanks for the excellent description!

I imagine that one might say: "But the transaction id isn't guaranteed to be equal and have compatible representation across eras!", yes. Sure. But in effect, it's been the case for ALL eras so far and given that the underlying type is the same and is era-independent, then the wrapping era we choose for TxId shouldn't matter.

Note that there already is a difference for Byron 😩 vs Shelley-based eras:

data instance TxId (GenTx ByronBlock)
= ByronTxId !Utxo.TxId
| ByronDlgId !Delegation.CertificateId
| ByronUpdateProposalId !Update.UpId
| ByronUpdateVoteId !Update.VoteId
deriving (Eq, Ord)

newtype instance TxId (GenTx (ShelleyBlock proto era)) = ShelleyTxId (SL.TxId (EraCrypto era))


Given that we look for the transaction id by looking at a membership in a set, a possible resolution path could be to derive the Ord instance (and while at it, the Eq too) on the inner type of the TxId type instance, instead of the type itself (i.e. deriving newtype). This way, it wouldn't matter where in the combinator is the underlying type, the equality/comparison will happen on the inner type which holds the era-independent transaction id.

newtype instance TxId (GenTx (HardForkBlock xs)) = HardForkGenTxId {
getHardForkGenTxId :: OneEraGenTxId xs
}
deriving (Eq, Generic, Ord, Show)

Note that OneEraGenTxId is an NS, and the Eq instance of NS f x will always return False if the index of the summand is different (as it only requires All (Eq `Compose` f) xs, so it can't compare things with different indices). In order to do better, one would have to require more constraints on the f/xs.

Ideas:

  • Introduce more HFC machinery allowing to specify for each pair of eras a and b whether a GenTxId a is contained in a Set (GenTxId b) (or sth similar, EDIT like tweaking the Eq/Ord instance (potentially of a newtype) as mentioned below). Seems straightforward, but a bit sad that we need to add more stuff to the abstract HFC "just" for a node-to-client protocol.
    • Refinement: do this check by requiring Typeable (GenTxId blk) constraints for all eras?
  • (feels too ad-hoc) Serialize the GenTxId (without the HFC envelope) and compare that to the serialized other GenTxIds.

@jasagredo
Copy link
Contributor

I imagine that one might say: "But the transaction id isn't guaranteed to be equal and have compatible representation across eras!", yes. Sure. But in effect, it's been the case for ALL eras so far and given that the underlying type is the same and is era-independent, then the wrapping era we choose for TxId shouldn't matter.

Note that there already is a difference for Byron 😩 vs Shelley-based eras:

Actually a TxIn is a pair TxId and TxIx, and it was confirmed with the Ledger team that TxIns will remain the same for all Shelley-based eras (that's why in UTXO-HD we can use the CanonicalTxIn type). So I think that Byron is indeed the exception.

@KtorZ
Copy link
Contributor Author

KtorZ commented Mar 20, 2024

@amesgen (feels too ad-hoc) Serialize the GenTxId (without the HFC envelope) and compare that to the serialized other GenTxIds.

I am not sure why this one feels too ad-hoc 😶 ? It seems like the right thing to do here. The whole wrapper around TxId is already quite arguable IMO.

One other idea that could solve this particular problem (though wouldn't be a holistic solution overall since the same issue may occur with a different HFC-wrapped type), would be to alter the implementation of snapshotHasTx to ensure the set membership happens on the raw transaction id and not the HFC wrapped type. An "easy" way might be to wrap the GenTxId in yet-another-type that provides an Ord instance which ensure that the comparison is done properly.

Fundamentally, there are only 2 cases to distinguish with this TxId as you are already pointing out: Byron, and the others. Starting from Shelley, all TxId correspond to the same type which is only dependent on the crypto. And the Byron one is just a wrapper around a bytestring.

So one could think of a ComparableTxId that would wrap a GenTxId and pull out the inner byte representation. Which is something that can likely be written in a sufficiently generic way that it keeps working with new eras (so long as they keep using TxId).

@amesgen
Copy link
Member

amesgen commented Mar 21, 2024

I am not sure why this one feels too ad-hoc 😶 ?

Just that it is unprincipled and later potentially surprising/confusing to rely on serialization for logic that is not directly related to serialization.


In general, this case feels a bit similar to OneEraHash:

-- | The hash for an era
--
-- This type is special: we don't use an NS here, because the hash by itself
-- should not allow us to differentiate between eras. If it did, the /size/
-- of the hash would necessarily have to increase, and that leads to trouble.
-- So, the type parameter @xs@ here is merely a phantom one, and we just store
-- the underlying raw hash.
newtype OneEraHash (xs :: [k]) = OneEraHash { getOneEraHash :: ShortByteString }
deriving newtype (Eq, Ord, NoThunks, Serialise)

Ideally, we would be able to do the same thing for OneEraGenTxId, then comparisons wouldn't care about the era. But for now, Byron is preventing this, so OneEraGenTxId is instead an n-ary sum. But we could indeed tweak its Eq/Ord instance (or that of a newtype) to hide differences in all post-Byron eras, eg by using new information from CanHardFork or by using the serialization functions.

@KtorZ
Copy link
Contributor Author

KtorZ commented Mar 21, 2024

Byron is preventing this, so OneEraGenTxId is instead an n-ary sum

What about making it only a 2-ary sum? Is that an option with the current type framework? Then at least the problem is less worse; we still don't get compatibility between Byron and post-Shelley eras, but it's arguably better than incompatibility across all eras. Plus, the inconvenience of not being able to compare a Byron transaction id with the rest is very likely not a big issue.

@amesgen
Copy link
Member

amesgen commented Mar 21, 2024

What about making it only a 2-ary sum? Is that an option with the current type framework?

Yeah, that should also be possible; in fact, something similar (replacing an NS by a specific sum type) was done recently in #951 (that targets UTxO HD), there for performance reasons. My intuition is that it might be more invasive than tweaking the Eq/Ord instance, but I haven't followed through with either approach.

@KtorZ
Copy link
Contributor Author

KtorZ commented Mar 21, 2024

I leave it you now and assume you'll update / close this ticket when resolved.

For now, I'll implement a fallback in Ogmios to simply retry HasTx on False with the transaction id wrapped in different eras 😰 ...

@amesgen
Copy link
Member

amesgen commented Mar 25, 2024

The Consensus team discussed this last week:

  • The cleanest solution would be to redefine OneEraGenTxId to wrap a ShortByteString directly (like OneEraHash as mentioned above).

    I had a short look, and it seems feasible to change GenTxId ByronBlock such that it no longer has different constructors, and instead just record a single hash.

    However, this would a breaking change to the N2N/N2C binary format, so it requires a few more dances, so we probably won't do that immediately.

  • A minimal intermediate approach would be to introduce a way to extract the raw hash (say as a ShortByteString) from any GenTxId and change its Eq/Ord instance based on that. I implemented this in Add ConvertRawTxId, use it for comparing OneEraGenTxIds #1017.

@amesgen
Copy link
Member

amesgen commented Mar 27, 2024

With #1017 merged, your usecase should now be supported correctly. You still have to wrap the hash in a specific era, but the choice won't matter anymore. See #1028 for future work making this redundant choice unnecessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants