
Change remaining contestation period to deadline #483

Merged

Conversation

@ch1bo (Collaborator) commented Sep 8, 2022

🍁 TLDR: Addresses a couple of FIXMEs around knowing "when we can fanout". We took shortcuts by computing a remaining contestation period and by using the Delay effect to know when to send a ReadyToFanout message to our clients. Unfortunately, this output was not very reliable, as ADR20 cares to explain.

🍁 Avoid the workaround of setting remainingContestationPeriod to 0 in Direct.State and then filling it in the Direct.Handlers module, by just using the contestationDeadline instead. This ripples through the system as we change the OnCloseTx event etc.

🍁 Instead of continuing the workaround via Delay effects, this PR bites the bullet and introduces a Tick event to communicate time from the Chain to the Logic layer (see the sketch after this list). This makes Delay redundant, so we can remove that wart as well.

🍁 Reworked the TimeHandle functions as I started using UTCTime everywhere. This results in far fewer `import Plutus.V2.Ledger.Api` occurrences all over the place.

🍁 With these regular Tick events, we can send ReadyToFanout when we really know that the deadline has passed and a Fanout input will be able to construct the fanoutTx.

🍁 Update clients and fix test suites to use the contestationDeadline to wait more robustly on ReadyToFanout.

🍁 Also needed to work around some issues with IOSim not bailing on us anymore (which we even relied on in the ModelSpec)

🍁 Re-generated golden files and updated JSON specs round off the picture and make this a HUGE PR... I'm sorry, but it was fun and I think this is needed 😅
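
To make the Tick mechanism mentioned above concrete, here is a minimal, hypothetical sketch of the idea; the names and shapes are simplified and not the exact hydra-node types:

```haskell
import Data.Time (UTCTime)

-- Simplified, illustrative shapes: the chain layer now pushes observed
-- transactions and the wall-clock time of each block into the logic layer,
-- instead of the logic layer scheduling its own Delay.
data ChainEvent tx
  = Observation tx -- a Hydra protocol transaction observed on-chain
  | Tick UTCTime   -- time of the block we just rolled forward to

-- The chain layer invokes this callback on every block; the logic layer can
-- then compare the Tick time against the contestation deadline.
type ChainCallback tx m = ChainEvent tx -> m ()
```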

This should solve all our FIXMEs but has a bit of a rat's tail:
- API and clients need to change
- Need to Delay "until" the deadline
    - Either change Delay effect semantics (or introduce a new one),
    - or bite the bullet and observe "blockchain time"
This makes it trivial to compare the contestation deadline with the
latest observed time (from the chain layer).
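
For illustration, a minimal sketch of that comparison, assuming a stripped-down closed-head state; the field names follow the snippets quoted later in this conversation, everything else is made up:

```haskell
{-# LANGUAGE NamedFieldPuns #-}

import Data.Time (UTCTime)

-- Stripped-down stand-in for the real ClosedState.
data ClosedHead = ClosedHead
  { contestationDeadline :: UTCTime
  , readyToFanoutSent :: Bool
  }

-- On every Tick from the chain layer we can decide locally whether it is time
-- to emit ReadyToFanout to the clients.
shouldAnnounceReadyToFanout :: UTCTime -> ClosedHead -> Bool
shouldAnnounceReadyToFanout tickTime ClosedHead{contestationDeadline, readyToFanoutSent} =
  tickTime > contestationDeadline && not readyToFanoutSent
```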
Usually we would reach for a `with` pattern now, but the model-based tests
can't deal with that scoping. So let's start with a thread handle and
add some cleanup functions next.
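
Roughly what the thread-handle shape looks like, as a sketch using the async package; `simulateTicks` here is only a stand-in for the real ticking action quoted further down in this conversation:

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (Async, async, cancel)
import Control.Monad (forever)

-- Stand-in ticking action; the real one feeds a 'Tick' with the current time
-- to every simulated node.
simulateTicks :: IO ()
simulateTicks = forever $ threadDelay 1000000

-- Instead of a bracketed 'withTicking' (which the model-based tests' scoping
-- cannot accommodate), hand back the Async so the caller can cancel it in a
-- separate cleanup step.
startTicking :: IO (Async ())
startTicking = async simulateTicks

stopTicking :: Async () -> IO ()
stopTicking = cancel
```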
This required adding some information to the state about whether it was
sent already (readyToFanoutSent). Furthermore, to be able to re-use the
previous state value in that head logic step, factoring out a
ClosedState was necessary.
Seems like this is failing now as IOSim cannot detect a deadlock with
the new tick thread. Unless we find a way to do this, we need to drop
this test now.
This is necessary as we cannot detect deadlock (impossible progress)
anymore within our IOSim-based simulation of the chain. See the
currently failing sanity test in BehaviorSpec.
This is not possible anymore as the hydra node is now continuously fed with
Tick events and may eventually respond. So there is no deadlock to detect
and we cannot exit the simulation early. Unfortunately this makes tests
slower, but it is a more accurate simulation.
Also updated the e2e scenario to reflect the semantics accordingly. This
is something which should also go into the API -> how can we avoid
duplicating our docs?
This is necessary as we do not have the right constraints for re-querying
things from the node in the actual handling of chain sync events.
Besides, when should we re-query things like protocol parameters? Likely not
on each block?
This simplifies a couple of things:

  * No need to inspect the (chain) state when posting fanout as the
    deadline is already in the HeadLogic / FanoutTx

  * Using UTCTime does not lose precision when converting back and forth
    with slots (using the EpochInfo / History APIs)

  * We do not need to query protocol parameters to convert as we are
    only working with protocol versions not requiring the time
    conversion "backport" (hack).
These two tests do not make any assertions on whether `Tick` events are
yielded or not, so we simply ignore them in the `callback`.
The property tests highlight that `POSIXTime -> UTCTime -> POSIXTime` is
lossless, while `UTCTime -> POSIXTime -> UTCTime` loses precision.
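
A small, self-contained illustration of why that is expected (not the actual test code): Plutus' POSIXTime is a whole number of milliseconds, while UTCTime has sub-millisecond resolution, so only one direction of the round trip can be exact.

```haskell
import Data.Time (UTCTime)
import Data.Time.Clock.POSIX (posixSecondsToUTCTime, utcTimeToPOSIXSeconds)

-- Stand-in for Plutus' POSIXTime: whole milliseconds since the epoch.
newtype Millis = Millis Integer deriving (Eq, Show)

toUTC :: Millis -> UTCTime
toUTC (Millis ms) = posixSecondsToUTCTime (fromInteger ms / 1000)

fromUTC :: UTCTime -> Millis
fromUTC t = Millis (truncate (utcTimeToPOSIXSeconds t * 1000))

-- Millis -> UTCTime -> Millis recovers the original value exactly ...
prop_millisRoundtrip :: Integer -> Bool
prop_millisRoundtrip ms = fromUTC (toUTC (Millis ms)) == Millis ms
-- ... while UTCTime -> Millis -> UTCTime truncates anything below a millisecond.
```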
We do not care about the exact number here, and it is different now because
of the 'Tick' events.
This is required to compute the remaining time as before, but this time
in the client.
Although this is only for informational purposes, it makes the
instrumentation in the e2e integration scenarios much simpler, and no
computation of when to expect the `ReadyToFanout` is required.
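
A sketch of that client-side computation (a hypothetical helper, not the actual hydra-tui code), given the deadline now shipped with `HeadIsClosed`:

```haskell
import Data.Time (NominalDiffTime, UTCTime, diffUTCTime, getCurrentTime)

-- Time left until fanout becomes possible; negative once the contestation
-- deadline has passed.
remainingContestationTime :: UTCTime -> IO NominalDiffTime
remainingContestationTime contestationDeadline = do
  now <- getCurrentTime
  pure (contestationDeadline `diffUTCTime` now)
```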
This makes it possible to only render "[f]anout" when it is actually
possible, and makes it clearer when the deadline has passed but fanout is
not yet ready.
This is now possible as the only usage of this was 'ShouldPostFanout'.
github-actions bot commented Sep 8, 2022

Transactions Costs

Sizes and execution budgets for Hydra protocol transactions. Note that unlisted parameters are currently using arbitrary values and results are not fully deterministic and comparable to previous runs.

Metadata

- Generated at: 2022-09-09 15:45:22.892550454 UTC
- Max. memory units: 14000000
- Max. CPU units: 10000000000
- Max. tx size (kB): 16384

Cost of Init Transaction

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| ---: | ---: | ---: | ---: | ---: |
| 1 | 4833 | 11.69 | 4.65 | 0.49 |
| 2 | 5038 | 13.85 | 5.49 | 0.52 |
| 3 | 5242 | 15.58 | 6.15 | 0.55 |
| 5 | 5652 | 18.53 | 7.26 | 0.60 |
| 10 | 6677 | 29.26 | 11.41 | 0.76 |
| 45 | 13854 | 98.37 | 38.04 | 1.83 |

Cost of Commit Transaction

Currently only one UTxO per commit allowed (this is about to change soon)

| UTxO | Tx size | % max Mem | % max CPU | Min fee ₳ |
| ---: | ---: | ---: | ---: | ---: |
| 1 | 5771 | 19.70 | 7.94 | 0.62 |

Cost of CollectCom Transaction

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| ---: | ---: | ---: | ---: | ---: |
| 1 | 13058 | 20.77 | 8.32 | 0.95 |
| 2 | 13380 | 36.59 | 14.81 | 1.14 |
| 3 | 13702 | 55.11 | 22.49 | 1.36 |
| 4 | 14092 | 77.96 | 31.97 | 1.63 |

Cost of Close Transaction

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| ---: | ---: | ---: | ---: | ---: |
| 1 | 9327 | 8.19 | 3.36 | 0.65 |
| 2 | 9522 | 9.41 | 3.98 | 0.67 |
| 3 | 9657 | 9.75 | 4.27 | 0.68 |
| 5 | 10020 | 11.70 | 5.32 | 0.72 |
| 10 | 10842 | 15.60 | 7.58 | 0.81 |
| 30 | 14218 | 32.44 | 17.09 | 1.16 |
| 66 | 15801 | 39.10 | 14.80 | 1.27 |

Cost of Contest Transaction

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| ---: | ---: | ---: | ---: | ---: |
| 1 | 9320 | 8.16 | 3.34 | 0.65 |
| 2 | 9527 | 9.38 | 3.96 | 0.67 |
| 3 | 9683 | 10.16 | 4.42 | 0.69 |
| 5 | 9991 | 11.28 | 5.15 | 0.72 |
| 10 | 10853 | 15.81 | 7.65 | 0.81 |
| 30 | 14188 | 31.98 | 16.90 | 1.15 |
| 43 | 16331 | 41.97 | 22.71 | 1.37 |

Cost of Abort Transaction

Some variation because of random mixture of still initial and already committed outputs.

| Parties | Tx size | % max Mem | % max CPU | Min fee ₳ |
| ---: | ---: | ---: | ---: | ---: |
| 1 | 13495 | 22.50 | 9.35 | 0.99 |
| 2 | 13604 | 34.02 | 14.28 | 1.13 |
| 3 | 13927 | 52.63 | 22.47 | 1.35 |
| 4 | 14463 | 83.18 | 37.18 | 1.73 |
| 5 | 14506 | 99.20 | 43.43 | 1.90 |

Cost of FanOut Transaction

Involves spending head output and burning head tokens. Uses ada-only UTxO for better comparability.

| UTxO | Tx size | % max Mem | % max CPU | Min fee ₳ |
| ---: | ---: | ---: | ---: | ---: |
| 1 | 13484 | 10.85 | 4.70 | 0.87 |
| 2 | 13455 | 11.40 | 5.19 | 0.87 |
| 3 | 13491 | 12.92 | 6.07 | 0.89 |
| 5 | 13628 | 16.74 | 8.14 | 0.94 |
| 10 | 13739 | 23.57 | 12.23 | 1.03 |
| 50 | 15245 | 84.88 | 47.60 | 1.85 |
| 60 | 15535 | 99.74 | 56.26 | 2.05 |

@ch1bo (Collaborator, Author) commented Sep 8, 2022

TODOs on master

Total (153)

  • TODO (84)
  • FIXME (35)
  • HACK (1)
  • REVIEW (14)
  • XXX (19)

TODOs on branch

Total (146)

  • TODO (82)
  • FIXME (29)
  • HACK (1)
  • REVIEW (13)
  • XXX (21)

github-actions bot commented Sep 8, 2022

Unit Test Results

251 tests (+3) in 91 suites (+2) and 5 files (±0): 245 ✔️ passed (+3), 6 💤 skipped (±0), 0 failed (±0), in 14m 33s ⏱️ (+53s)

Results for commit 719b0cd. ± Comparison against base commit 93b6daf.

This pull request removes 7 and adds 10 tests. Note that renamed tests count towards both.

Removed:
Hydra.Behavior/Behavior of one ore more hydra nodes/Sanity tests of test suite ‑ does detect when no responses are sent
Hydra.Chain.Direct.State/ChainSyncHandler ‑ can replay chain on (benign) rollback
Hydra.Chain.Direct.State/ChainSyncHandler ‑ yields observed transactions rolling forward
Hydra.HeadLogic/Coordinated Head Protocol ‑ any node should post FanoutTx when observing on-chain CloseTx
Hydra.HeadLogic/Event/JSON encoding of (Event SimpleTx) ‑ allows to encode values with aeson and read them back
Hydra.HeadLogic/Event/JSON encoding of (Event SimpleTx) ‑ produces the same JSON as is found in golden/Event SimpleTx.json
Hydra.Ledger.Cardano/Evaluate helpers ‑ slotNoFromPOSIXTime . slotNoToPOSIXTime === id

Added:
Hydra.Chain.Direct.Handlers ‑ can replay chain on (benign) rollback
Hydra.Chain.Direct.Handlers ‑ roll forward results in Tick events
Hydra.Chain.Direct.Handlers ‑ yields observed transactions rolling forward
Hydra.Chain.Direct.TimeHandle ‑ can roundtrip currentPointInTime
Hydra.HeadLogic/Coordinated Head Protocol ‑ notify user on head closing and when passing the contestation deadline
Hydra.HeadLogic/Types/JSON encoding of (Event SimpleTx) ‑ allows to encode values with aeson and read them back
Hydra.HeadLogic/Types/JSON encoding of (Event SimpleTx) ‑ produces the same JSON as is found in golden/Event SimpleTx.json
Hydra.HeadLogic/Types/JSON encoding of (HeadState SimpleTx) ‑ allows to encode values with aeson and read them back
Hydra.HeadLogic/Types/JSON encoding of (HeadState SimpleTx) ‑ produces the same JSON as is found in golden/HeadState SimpleTx.json
Hydra.Ledger.Cardano/Evaluate helpers ‑ slotNoFromUTCTime . slotNoToUTCTime === id

♻️ This comment has been updated with latest results.

Performing the Model in ModelSpec still does, but that will change on
another branch as well, so holding off on this.
This trades a TODO for an XXX as the generators are getting more and
more messy.
+ Clients can now rely on `ReadyToFanout`, such that sending a `Fanout` input after seeing this output will never be "too early".
+ The `HeadIsClosed` server output now contains the deadline instead of the remaining time.
+ See `hydra-tui` for an example of how to use the `contestationDeadline` and `ReadyToFanout`.
+ See [ADR20](./docs/adr/2022-08-02_020-handling-time.md) for details and the rationale.
Collaborator:
👍

```haskell
simulateTicks nodes = forever $ do
  threadDelay blockTime
  now <- getCurrentTime
  readTVarIO nodes >>= \ns -> mapM_ (`handleChainEvent` Tick now) ns
```
Collaborator:
👍

```haskell
NewState
  -- XXX: Requires -Wno-incomplete-record-updates. Should refactor
  -- 'HeadState' to hold individual 'ClosedState' etc. types
  (cst{readyToFanoutSent = True})
```
Collaborator:
Or simply write: ClosedState{contestationDeadline, readyToFanoutSent = True} 😅 ?

Collaborator (Author):
Unfortunately it's some more fields:

```haskell
  | ClosedState
      { parameters :: HeadParameters
      , confirmedSnapshot :: ConfirmedSnapshot tx
      , previousRecoverableState :: HeadState tx
      , contestationDeadline :: UTCTime
      , -- | Tracks whether we have informed clients already about being
        -- 'ReadyToFanout'.
        readyToFanoutSent :: Bool
      }
```

and I chose to disable the warning instead of the field noise
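
For context, a toy example (not hydra-node code) of why GHC warns about the record update above: `readyToFanoutSent` only exists in one constructor of the multi-constructor `HeadState`, so the update is partial.

```haskell
{-# OPTIONS_GHC -Wincomplete-record-updates #-}

-- Toy two-constructor state: updating a field that is missing from 'Open'
-- makes the record update partial, which is exactly what the warning flags.
data ToyHeadState
  = Open
  | Closed {deadline :: Int, readyToFanoutSent :: Bool}

markSent :: ToyHeadState -> ToyHeadState
markSent st = st{readyToFanoutSent = True} -- warns here; would crash at runtime on 'Open'
```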

```haskell
{ currentPointInTime = do
    currentSlotNo <- left show $ utcTimeToSlot now
    pt <- slotToUTCTime currentSlotNo
    pure (currentSlotNo, pt)
```
Collaborator:
Looking at this now, I find this function's name confusing. The "current" suggests that this is an effectful function that will get the current time, but it merely spits back whatever time it was when the handle was created.

Perhaps something like: lastKnownPointInTime?

```haskell
slotToUTCTime slot =
  -- NOTE: We are not using the Ledger.slotToPOSIXTime as we do not need the
  -- workaround for past protocol versions. Hence, we also do not need the
  -- protocol parameters for this conversion.
```
Collaborator:
👍

-- time. We tried passing the current time from the caller but given the current machinery
-- around `observeSomeTx` this is actually not straightforward and quite ugly.
let event = OnCloseTx{snapshotNumber, remainingContestationPeriod = 0}
let ClosedThreadOutput{closedContestationDeadline} = threadOutput
Collaborator:
🎉

| ReceivedTxs {onChainTxs :: [OnChainTx Tx], receivedTxs :: [Ledger.TxId StandardCrypto]}
| RolledForward {point :: SomePoint}
Collaborator:
Mixed feelings about that addition, especially if we start using the chain-sync protocol more / as a replacement for the state-query. We don't want to be dumping 1000 log messages per second while fast syncing.

Collaborator (Author):
We are logging all ReceivedTxs on 0.7.0! We have been trimming that down to only log TxId on master, and I thought I'd add this as we are not logging chain points anywhere and do not always log ReceivedTxs. So I hoped that RolledForward would be a consistent point of reference in the logs and helpful for finding values for --start-chain-from :)

(CC @ffakenz we just spoke about where to find those.. here would also be a way)

hydra-node/src/Hydra/Chain/Direct/Handlers.hs (outdated, resolved)
@@ -196,8 +196,15 @@ withDirectChain tracer networkId iocp socketPath keyPair party cardanoKeys point
```haskell
        (submitTx queue)
    )
    ( handle onIOException $ do
        -- NOTE: We can't re-query the time handle while the
        -- 'chainSyncHandler' is running due to constraints. So this will
        -- always use these initial parameters (as queried) for time conversions.
```
Collaborator:
NOTE: We can't re-query the time handle while the 'chainSyncHandler' is running due to constraints.

What 😅 ? Can you clarify what you mean by that?

Collaborator (Author):
Querying things from the node involves quite some IO functions. However, the whole purpose of the ChainSyncHandler abstraction was to decouple the code for testability and not tie ourselves to IO, but allow running the handlers in IOSim. Hence we only have MonadDelay, MonadTime, etc. constraints on them. Also, I don't think that MonadIO IOSim is a thing.

Then, thinking about this.. I don't think we would want to re-query stuff from the cardano-node on each received block?

  • SystemStart will never change -> ez
  • EraHistory does change / extend in-between hard-forks? -> I have no idea.

Long story short, we query on start and this likely means we won't be able to cross hard-forks with changing slot lengths because of this.. but YAGNI?

Collaborator:
EraHistory does change / extend in-between hard-forks? -> I have no idea.

It does, every era adds a new portion to it. But more importantly, the last era in the history has an upper bound which is at k blocks after the tip. Which means that it's already obsolete by the moment you've fetched it. As time goes by, your visible future shortens (how poetic), until the upper bound of your era history is in the past and any slot conversion will fail.

There are two ways around it.

  1. We can unsafeExtendSafeZone to infinity, which is basically the same as saying "we assume that all future eras will have the same protocol parameters w.r.t. slot arithmetic". Possibly okay-ish if we also allow restarting a hydra-node without losing a head (such that, in case the parameters in a next hard-fork are bound to change, we can ship an update of the hydra-node which can accommodate this).

  2. We fetch a new EraHistory every time we need to calculate time. I imagine that defining a MonadEraHistory wouldn't be too complicated, with an IOSim instance that returns a history with no upper bound for testing, and one that runs in plain IO for production.

Collaborator (Author):
That would mean that the current interpreter will fail after being used for longer than the safe zone... we should validate that, but a MonadTimeHandle which could be used to mock queryTimeHandle out for IOSim is a good idea.
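
To make that idea concrete, a hedged sketch of what such a class could look like; the names and shape are hypothetical and not an existing hydra-node API:

```haskell
import Data.Time (UTCTime)
import Data.Word (Word64)

newtype SlotNo = SlotNo Word64

-- Simplified stand-in for the TimeHandle reworked in this PR.
data TimeHandle = TimeHandle
  { slotToUTCTime :: SlotNo -> Either String UTCTime
  , utcTimeToSlot :: UTCTime -> Either String SlotNo
  }

-- Hypothetical effect class: in production 'queryTimeHandle' would fetch a
-- fresh EraHistory from the cardano-node, while an IOSim instance could return
-- a fixed handle whose horizon never expires.
class Monad m => MonadTimeHandle m where
  queryTimeHandle :: m TimeHandle
```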

When this happens, we don't expect the chain sync handler to continue
to work. In fact, the whole process might be ill-fated and require a
restart. So this is really an exceptional case (and can't be handled
right now).
@v0d1ch mentioned this pull request Sep 12, 2022
@abailly-iohk (Contributor) commented:

I did not understand the issue raised by @KtorZ but we managed to clarify this with @ffakenz.
Can we agree the PR is good to merge in its current state but there should be a follow-up PR addressing the specific concern of EraHistory upper bound?

@KtorZ (Collaborator) commented Sep 13, 2022

@abailly-iohk: Can we agree the PR is good to merge in its current state but there should be a follow-up PR addressing the specific concern of EraHistory upper bound

I'd at least have one of my other comments addressed before this gets merged... Nevermind, Sebastian did it in 719b0cd.

So, good for me if and only if we record the work left to be done in an issue and it gets addressed promptly.

@abailly-iohk merged commit a97f8b3 into master on Sep 13, 2022
@abailly-iohk deleted the ch1bo/change-remaining-contestation-period-to-deadline branch on September 13, 2022 11:23
@ch1bo mentioned this pull request Oct 21, 2022