Randomness service: Improve performance and reliability #372

GabrielMartinezRodriguez · 2025-01-23T13:54:06Z

Linked Issues

Description

This PR aims to improve the performance and reliability of the randomness service, after noticing the following problems:
1 - The 'random()' method in the Random contract was failing in some blocks
2 - We were experiencing multiple drand network errors
3 - The first drand was delayed in submission by about 30 seconds
To fix the first problem, I discovered that it was caused because the drand delay wasn't sufficient to include the drand to be used in the random function in all blocks. For example, if the drand delay is 2, and you are in the block with timestamp 11, you are using the drand for timestamp 9. To include transactions in the block with timestamp 11, you must submit transactions with at least 1 second of margin, meaning you must submit the drand for the block with timestamp 11 at timestamp 11 or later.
This is problematic because we don't receive the block with timestamp 9 exactly at timestamp 9, and simultaneously, we must request drands from the network. The network delay is approximately 300ms, with peaks of 1s. The situation was even worse because we were performing gas estimation for every 'postDrand' transaction, adding about 600ms extra to include the transaction.
These multiple delays were causing us to miss multiple blocks.
To address these issues, this PR adds:

Increased drand delay from 2 to 4
Hardcoded gas limit to avoid estimating gas for every transaction
Increased timeout for the drand network from 500 ms to 1000 ms

With these changes, we fix problems 1 and 2.
To solve problem 3, I found a bug: if there were no reveal transactions, we weren't sending the post drand transaction. This was causing a delay of about 30 seconds after the randomness service starts to reveal commitments that were committed approximately 30 seconds earlier.

Toggle Checklist

Checklist

Basics

B1. I have applied the proper label & proper branch name (e.g. norswap/build-system-caching).
B2. This PR is not so big that it should be split & addresses only one concern.
B3. The PR targets the lowest branch it can (ideally master).

Reminder: PR review guidelines

Correctness

C1. Builds and passes tests.
C2. The code is properly parameterized & compatible with different environments (e.g. local,
testnet, mainnet, standalone wallet, ...).
C3. I have manually tested my changes & connected features.
C4. I have performed a thorough self-review of my code after submitting the PR,
and have updated the code & comments accordingly.

Architecture & Documentation

D1. I made it easy to reason locally about the code, by (1) using proper abstraction boundaries,
(2) commenting these boundaries correctly, (3) adding inline comments for context when needed.
D2. All public-facing APIs & meaningful (non-local) internal APIs are properly documented in code
comments.
D3. If appropriate, the general architecture of the code is documented in a code comment or
in a Markdown document.

cloudflare-workers-and-pages · 2025-01-23T13:54:07Z

Deploying happychain with Cloudflare Pages

Latest commit:	`ca26b17`
Status:	✅ Deploy successful!
Preview URL:	https://8a1f88a4.happychain.pages.dev
Branch Preview URL:	https://graphite-base-404.happychain.pages.dev

View logs

GabrielMartinezRodriguez · 2025-01-23T13:54:48Z

Deploy randomness service #416 : 2 dependent PRs (#434 , #441 )
Add config contract #404
Randomness service: Improve performance and reliability #372 👈 (View in Graphite)
Add transaction submission failure hook and error handling #366
Increase default RPC timeout to 2000ms #365
Add configurable HTTP polling interval for block monitoring #362
Force initialization before using the TXM #361
Script to monitor randomness service #325
Script to easily launch the randomness service locally #324
Submit drand numbers #318
Random oracle contracts #307
Only prune finalized randomness entries #305
Add NewBlock hook #304
Refactors the randomness service to be able to process pending commitments #303
Add SQLite persistence to randomness service #300
Txm no longer receives Viem objects #298 : 1 other dependent PR (#299 )
Add flush mechanism #289 : 1 other dependent PR (#297 )
Allow run multiple txm simultaneously within the same service #285
Added docs for txm #272
Mutex in tx monitor #264
Purge transactions #232
Txm hooks #229
Use kysely instead of mikro-orm #250
master

This stack of pull requests is managed by Graphite. Learn more about stacking.

linear · 2025-01-23T14:10:55Z

HAPPY-311 Investigate and resolve why the first Drand submission is delayed by about 30 seconds after initializing the randomness service. Also, address the Drand network error

linear · 2025-01-23T14:11:03Z

HAPPY-310 Debug and fix why the `random()` function is currently throwing an error

linear · 2025-01-23T14:13:04Z

HAPPY-268 Evaluate gas usage for the postDrand method

not-reed · 2025-01-23T14:44:10Z

packages/contracts/src/randomness/Random.sol

    * will be posted on a block with timestamp T (even if such a block exists) because of network delays.
    */
-    uint256 public constant DRAND_DELAY = 2;
+    uint256 public constant DRAND_DELAY = 4;


i always forget block.timestamp is in seconds, not ms 😄

this needs 4s, to be 2 block delay?

I don't full understand how this works, but would it be possible to call randomForTimestamp and getRevealedValue to 'frontrun' someone else calling random() by 1 block then? effectively seeing the future slightly

Yes, it was a bit confusing. We changed the name of all the time-related variables to indicate whether they are in seconds or milliseconds. Now, this variable is called DRAND_DELAY_SECONDS.

Regarding "frontrunning," yes, you could know the drand value that we are going to use for random(), but you will never know the revealed value that we mix with the drand value. We emit the reveal transaction as soon as we receive the previous block. For example, if we are targeting the reveal of block N, we send the reveal transaction as soon as we receive block N-1.

Additionally, we have modified the node in this PR: HappyChainDevs/op-geth#1 to force the node to always execute the first transaction of the block—actually, the second, since the first is the system transaction "setL1BlockValuesEcotone".

With this, you can never know the output of random() before it has already been executed.

@not-reed The delay's goal is that the "canonical" drand value for block T=X is the latest Drand at X-DRAND_DELAY. That's because sometimes there is a block at X and a Drand value at X, and in those cases we cannot post the Drand value to the chain ontime. There are also issues due to needing to hit submit 1s before the block, network latency to get the Drand value etc...

That's why we had to increase this value: it turned at time T, it would sometimes take 1s to get the Drand value, and with the need to submit enough in advance, we'd sometimes miss the block.

Regarding frontrunning, I don't know I would frame it those terms — but anyone knows the Drand T at time T (+ network delay). So people will always know the Drand value before it hits the chain (we have to know it to post it).

Drand is still useful for two things:

commitments to future randomness, if you commit enough in advance (we have this defined in the contracts), then nobody will know the drand value at that time

removing our ability to influence randomness: synchronous randomness if formed by taking the Drand value + a random value to which we committed before the Drand value was knowns — that way, there's absolutely no way we can influence the randomness

we will still know the value before anyone else, so there is trust is we don't share it

not-reed · 2025-01-23T14:46:58Z

packages/randomness-service/src/DrandService.ts


        const url = `${env.EVM_DRAND_URL}/rounds/${round}`
-        const response = await ResultAsync.fromPromise(fetchWithRetry(url, {}, 2, 500), unknownToError)
+        const response = await ResultAsync.fromPromise(fetchWithRetry(url, {}, 2, 1000), unknownToError)


1000 is the delay?

block 1 at ts 0 => tries and fails
block 1 at ts 1000 => tries and fails
block 2 at ts 2000 => tries and succeeds

is it a problem that the final try if everything fails is guaranteed to be next block now?

We have to submit the drand value at most 4 seconds after it is generated because, after that moment, it will be used by "random()". So, that is not a problem

apps/randomness/src/CustomGasEstimator.ts

apps/randomness/src/env.ts

… local start script

GabrielMartinezRodriguez mentioned this pull request Jan 23, 2025

Add transaction submission failure hook and error handling #366

Merged

10 tasks

GabrielMartinezRodriguez changed the title ~~feat(randomness): increase Drand delay and improve service reliability~~ Improve performance and reabilit Jan 23, 2025

GabrielMartinezRodriguez marked this pull request as ready for review January 23, 2025 14:08

GabrielMartinezRodriguez changed the title ~~Improve performance and reabilit~~ Improve performance and Reliability Jan 23, 2025

GabrielMartinezRodriguez changed the title ~~Improve performance and Reliability~~ Improve performance and reliability Jan 23, 2025

GabrielMartinezRodriguez changed the title ~~Improve performance and reliability~~ Randomness service: Improve performance and reliability Jan 23, 2025

GabrielMartinezRodriguez self-assigned this Jan 23, 2025

GabrielMartinezRodriguez added the reviewing-1 Ready for, or undergoing first-line review label Jan 23, 2025

GabrielMartinezRodriguez force-pushed the gabriel/txm-submission-failed-hook branch from a98fbb3 to 82a0cb6 Compare January 23, 2025 14:45

GabrielMartinezRodriguez force-pushed the gabriel/improve-performance-and-reability-randomness branch from 62ee648 to 53e3fb6 Compare January 23, 2025 14:46

not-reed reviewed Jan 23, 2025

View reviewed changes

GabrielMartinezRodriguez force-pushed the gabriel/txm-submission-failed-hook branch from 82a0cb6 to 98067ed Compare January 31, 2025 14:44

GabrielMartinezRodriguez force-pushed the gabriel/improve-performance-and-reability-randomness branch from 53e3fb6 to 6e8d5d5 Compare January 31, 2025 14:44

GabrielMartinezRodriguez mentioned this pull request Feb 3, 2025

Add config contract #404

Merged

10 tasks

norswap force-pushed the gabriel/txm-submission-failed-hook branch from f5a61b6 to 05dc512 Compare February 10, 2025 20:23

GabrielMartinezRodriguez force-pushed the gabriel/txm-submission-failed-hook branch from 05dc512 to b9ba93f Compare February 10, 2025 20:33

Base automatically changed from gabriel/txm-submission-failed-hook to master February 10, 2025 20:43

GabrielMartinezRodriguez force-pushed the gabriel/improve-performance-and-reability-randomness branch from 82297e0 to c2dee4f Compare February 11, 2025 14:28

GabrielMartinezRodriguez mentioned this pull request Feb 11, 2025

Added Interrupted status to transaction entity #434

Merged

11 tasks

GabrielMartinezRodriguez requested a review from norswap February 12, 2025 12:12

GabrielMartinezRodriguez added reviewing-2 Ready for, or undergoing final review and removed updating Updating after review labels Feb 12, 2025

GabrielMartinezRodriguez mentioned this pull request Feb 12, 2025

debug #435

Closed

11 tasks

aodhgan reviewed Feb 13, 2025

View reviewed changes

apps/randomness/src/CustomGasEstimator.ts Show resolved Hide resolved

aodhgan reviewed Feb 13, 2025

View reviewed changes

apps/randomness/src/env.ts Show resolved Hide resolved

aodhgan added question Something has to be cleared up after review and removed reviewing-2 Ready for, or undergoing final review labels Feb 13, 2025

GabrielMartinezRodriguez added 6 commits February 14, 2025 11:14

feat(tandomnness): improve performance and reability

7b0cace

chore(randomness): move drand logic from index to drand service

44e880c

chore(randomness): handle drand before interval

bbf1d25

chore(randomness): await for drand service

e0bc253

chore: changed gas limit

7ef10f9

chore(randomness): added drand margin to .env.example and remove form…

ca26b17

… local start script

GabrielMartinezRodriguez added reviewing-2 Ready for, or undergoing final review and removed question Something has to be cleared up after review labels Feb 14, 2025

GabrielMartinezRodriguez requested a review from aodhgan February 14, 2025 10:25

GabrielMartinezRodriguez force-pushed the gabriel/improve-performance-and-reability-randomness branch from 4ac15e2 to ca26b17 Compare February 14, 2025 10:28

GabrielMartinezRodriguez mentioned this pull request Feb 14, 2025

Implemented block inactivity timeout #441

Merged

11 tasks

norswap approved these changes Feb 15, 2025

View reviewed changes

norswap merged commit 993c12c into master Feb 15, 2025
3 checks passed

norswap deleted the gabriel/improve-performance-and-reability-randomness branch February 15, 2025 03:30

GabrielMartinezRodriguez restored the gabriel/improve-performance-and-reability-randomness branch February 18, 2025 17:26

GabrielMartinezRodriguez mentioned this pull request Feb 25, 2025

Fix: random() reverts once every 3 blocks #458

Merged

11 tasks

norswap deleted the gabriel/improve-performance-and-reability-randomness branch July 24, 2025 15:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Randomness service: Improve performance and reliability #372

Randomness service: Improve performance and reliability #372

Uh oh!

GabrielMartinezRodriguez commented Jan 23, 2025 •

edited

Loading

Uh oh!

cloudflare-workers-and-pages bot commented Jan 23, 2025 •

edited

Loading

Uh oh!

GabrielMartinezRodriguez commented Jan 23, 2025 •

edited

Loading

Uh oh!

linear bot commented Jan 23, 2025

Uh oh!

linear bot commented Jan 23, 2025

Uh oh!

linear bot commented Jan 23, 2025

Uh oh!

not-reed Jan 23, 2025

Uh oh!

GabrielMartinezRodriguez Feb 11, 2025

Uh oh!

norswap Feb 15, 2025

Uh oh!

not-reed Jan 23, 2025

Uh oh!

GabrielMartinezRodriguez Feb 11, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Randomness service: Improve performance and reliability #372

Randomness service: Improve performance and reliability #372

Uh oh!

Conversation

GabrielMartinezRodriguez commented Jan 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Linked Issues

Description

Checklist

Basics

Correctness

Architecture & Documentation

Uh oh!

cloudflare-workers-and-pages bot commented Jan 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying happychain with Cloudflare Pages

Uh oh!

GabrielMartinezRodriguez commented Jan 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

linear bot commented Jan 23, 2025

Uh oh!

linear bot commented Jan 23, 2025

Uh oh!

linear bot commented Jan 23, 2025

Uh oh!

not-reed Jan 23, 2025

Choose a reason for hiding this comment

Uh oh!

GabrielMartinezRodriguez Feb 11, 2025

Choose a reason for hiding this comment

Uh oh!

norswap Feb 15, 2025

Choose a reason for hiding this comment

Uh oh!

not-reed Jan 23, 2025

Choose a reason for hiding this comment

Uh oh!

GabrielMartinezRodriguez Feb 11, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

GabrielMartinezRodriguez commented Jan 23, 2025 •

edited

Loading

cloudflare-workers-and-pages bot commented Jan 23, 2025 •

edited

Loading

GabrielMartinezRodriguez commented Jan 23, 2025 •

edited

Loading