test(kt-devnet): add batcher failover kurtosis test #33

Merged
samlaf merged 14 commits into feat--op-batcher-altda-failover-to-ethda from test--add-batcher-failover-kurtosis-test
Mar 4, 2025

Conversation

@samlaf samlaf commented Feb 26, 2025

Closes DAINT-303

This PR adds a Go failover test that uses the kurtosis devnet as its backend.
The test exercises the new batcher failover behavior from #34.
We will merge this test into that PR, and then merge that PR into eigenda-develop (our master branch).

Note to reviewers

Please be very harsh on the golang test, as this same framework will be reused to add new tests for future features.

The "proper" way to implement this would probably have been to reuse OP's op-e2e framework and add a way to populate its system (the equivalent of our harness in this PR) from a kurtosis backend. However, that would have taken me a lot more time to figure out, and I feel like it's something the OP team might create at some point. I prefer to let them put in the work and figure out how to do it properly; we can potentially move to using that in the future.

@samlaf samlaf marked this pull request as draft February 26, 2025 18:12
@samlaf samlaf force-pushed the feat--op-batcher-altda-failover-to-ethda branch from c548b50 to 2a78e16 Compare February 26, 2025 18:20
@samlaf samlaf force-pushed the test--add-batcher-failover-kurtosis-test branch from 360b1f6 to c52d9f2 Compare February 26, 2025 18:24
@samlaf samlaf marked this pull request as ready for review February 26, 2025 19:58
}

// Fetches all the batch-inbox posted commitments from blockNum (inclusive) to current block.
func fetchBatcherTxsSinceBlock(gethL1Endpoint string, batchInbox string, blockNum uint64) ([]BatcherTx, error) {

Are there no existing constructs in the state derivation pipeline that can be leveraged for reading batcher txs instead?

Author

Added note in 6111c52

Basically, this feels easier for now. If we ever need to manage more complex state (like failing over to 4844 txs), it would probably be good to consider using L1Retriever instead.

// We assume that this enclave is already running.
const enclaveName = "eigenda-memstore-devnet"

// TestFailover tests the failover behavior of the batcher, in response to the proxy returning 503 errors.

should we add a comment describing the parallelism limitations?

// The test then toggles the failover back off and checks that the batcher starts submitting EigenDA batches again.
// The batches inbox transactions are queried via geth's GraphQL API.
// TODO: We will also need to test the failover behavior of the node, which currently doesn't finalize after failover (fixed in https://github.com/Layr-Labs/optimism/pull/23)
func TestFailover(t *testing.T) {

Will we eventually want to add tests ensuring failover to the following?

  • keccak commitment mode
  • 4844

Author

keccak yes probably, I'll add a comment. 4844 I don't know... it's hard to implement and not a priority for me.

// 1. Check that the original commitments are EigenDA
harness.requireBatcherTxsToBeFromLayer(t, sinceBlock, DALayerEigenDA)

// 2. Failover and check that the commitments are now EthDA

EthDA is just calldata here?
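
For context on how a test could tell the two layers apart from calldata alone, here is a minimal sketch. It assumes the OP Stack alt-DA encoding, in which plain frames start with a `0x00` derivation version byte and alt-DA commitments start with `0x01`, followed by a commitment type byte and a DA layer byte. The prefix values, the `DALayerEthCalldata` and `DALayerUnknown` constants, and the `classifyCalldata` helper are assumptions for illustration and are not taken from this PR; only `DALayerEigenDA` appears in the test code.

```go
package main

import "fmt"

// DALayer identifies where a batcher transaction's data was posted.
type DALayer int

const (
	DALayerUnknown     DALayer = iota // hypothetical: unrecognized prefix
	DALayerEthCalldata                // hypothetical name for "EthDA" (frames in calldata)
	DALayerEigenDA                    // name used by the test in this PR
)

// Assumed prefix bytes; verify them against the OP Stack version in use.
const (
	derivationVersion0  = 0x00 // plain frames posted directly as calldata
	altDATxDataVersion1 = 0x01 // an alt-DA commitment follows
	genericCommitment   = 0x01 // generic (non-keccak) commitment type
	eigenDALayerByte    = 0x00 // DA layer byte assumed for EigenDA
)

// classifyCalldata inspects the leading bytes of a batcher transaction's
// calldata and reports which DA layer it appears to target.
func classifyCalldata(data []byte) DALayer {
	if len(data) == 0 {
		return DALayerUnknown
	}
	switch data[0] {
	case derivationVersion0:
		return DALayerEthCalldata
	case altDATxDataVersion1:
		if len(data) >= 3 && data[1] == genericCommitment && data[2] == eigenDALayerByte {
			return DALayerEigenDA
		}
	}
	return DALayerUnknown
}

func main() {
	fmt.Println(classifyCalldata([]byte{0x00, 0xde, 0xad}) == DALayerEthCalldata)
	fmt.Println(classifyCalldata([]byte{0x01, 0x01, 0x00, 0x00}) == DALayerEigenDA)
}
```

A keccak commitment (commitment type `0x00` under the same assumption) would fall into `DALayerUnknown` here, which is one reason a separate keccak failover test would need its own case.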

})

// assume kurtosis is running and is at least at block 10 (just deploying the contracts takes more than 10 blocks)
require.GreaterOrEqual(t, harness.testStartL1BlockNum, uint64(10), "Test started too early in the chain")

magic number 10 is used in a lot of places - consider using a variable

Author

Good callout. Hardcoded it and updated the graphql query functions to use both a FROM and a TO block. This then makes the semantics of the test much easier to understand, so I was able to document them more easily: 0dba0f2
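
To make the FROM/TO semantics concrete, here is a rough sketch of how a bounded query against geth's GraphQL API could be built and its results filtered by batch inbox address. It assumes geth's `blocks(from:, to:)` query and the `transactions { hash to { address } inputData }` fields. The `l1Tx` type and the `buildBlocksQuery` and `filterByInbox` helpers are hypothetical names, not the functions in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// l1Tx is a hypothetical, trimmed-down view of one transaction from the query result.
type l1Tx struct {
	BlockNum uint64
	To       string // hex address; empty for contract creations
	Input    string // hex-encoded calldata
}

// buildBlocksQuery returns a GraphQL query for every transaction in the
// inclusive block range [from, to].
func buildBlocksQuery(from, to uint64) (string, error) {
	if from > to {
		return "", fmt.Errorf("invalid block range: from %d > to %d", from, to)
	}
	return fmt.Sprintf(
		"{ blocks(from: %d, to: %d) { number transactions { hash to { address } inputData } } }",
		from, to,
	), nil
}

// filterByInbox keeps only the transactions sent to the batch inbox address.
// Addresses are compared case-insensitively because hex casing can differ.
func filterByInbox(txs []l1Tx, batchInbox string) []l1Tx {
	var out []l1Tx
	for _, tx := range txs {
		if strings.EqualFold(tx.To, batchInbox) {
			out = append(out, tx)
		}
	}
	return out
}

func main() {
	query, err := buildBlocksQuery(10, 20)
	if err != nil {
		panic(err)
	}
	fmt.Println(query)

	// Made-up addresses, for illustration only.
	txs := []l1Tx{
		{BlockNum: 11, To: "0xFF00000000000000000000000000000000000901", Input: "0x00"},
		{BlockNum: 12, To: "0x0000000000000000000000000000000000000001", Input: "0x"},
	}
	fmt.Println(len(filterByInbox(txs, "0xff00000000000000000000000000000000000901")))
}
```

Bounding the range on both ends means each assertion looks at a fixed window of L1 blocks, rather than at whatever the chain head happens to be when the query runs.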

// Test Harness, which contains all the state needed to run the tests.
// harness also defines some higher-level "require" methods that are used in the tests.
type harness struct {
enclaveCtx *enclaves.EnclaveContext

not currently used. is it planned for future use, or just there as a template for future tests that might need it?

Author

Good call. I added it just in case, but agree it's cleaner to remove. Done: 2277e3a

if batcherTx.daLayer != expectedLayer {
wrongCommitmentsToDiscard++
}
// as soon as we see 3 ethDA commitments, or an EigenDA commitment, we stop

doc seems to describe a previous version of the code which wasn't generalized to expectedLayer

// Get the kurtosis tag
tag := field.Tag.Get("kurtosis")
if tag == "" {
continue // Skip fields without tags

What would be the reason for fields to exist without the tag? Is the idea that you'd add kurtosis fields to production structs?

Author

None, good catch. This was Claude-generated. I updated it to return an error instead: 80fcebf
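
As an illustration of the stricter behavior, the sketch below walks a struct's fields with `reflect`, reads each `kurtosis:"service-name,port-name"` tag, and returns an error when a tag is missing or malformed rather than skipping the field. The `exampleEndpoints` struct, its service and port names, and the `parseKurtosisTags` helper are made up for this example; the PR's function additionally resolves each pair to an endpoint through the Kurtosis SDK, which is left out here.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// serviceRef is the (service, port) pair parsed from one struct field's tag.
type serviceRef struct {
	Field   string
	Service string
	Port    string
}

// exampleEndpoints is a hypothetical struct; the service and port names are invented.
type exampleEndpoints struct {
	BatcherEndpoint string `kurtosis:"example-batcher,http"`
	GethL1Endpoint  string `kurtosis:"example-geth,rpc"`
}

// parseKurtosisTags reads the `kurtosis:"service-name,port-name"` tag of every
// field in a struct (or pointer to struct). A missing or malformed tag is an error.
func parseKurtosisTags(v any) ([]serviceRef, error) {
	t := reflect.TypeOf(v)
	if t == nil {
		return nil, fmt.Errorf("nil value")
	}
	if t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return nil, fmt.Errorf("expected a struct, got %s", t.Kind())
	}
	refs := make([]serviceRef, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		field := t.Field(i)
		tag := field.Tag.Get("kurtosis")
		if tag == "" {
			return nil, fmt.Errorf("field %s has no kurtosis tag", field.Name)
		}
		parts := strings.Split(tag, ",")
		if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
			return nil, fmt.Errorf("field %s has malformed kurtosis tag %q", field.Name, tag)
		}
		refs = append(refs, serviceRef{Field: field.Name, Service: parts[0], Port: parts[1]})
	}
	return refs, nil
}

func main() {
	refs, err := parseKurtosisTags(exampleEndpoints{})
	if err != nil {
		panic(err)
	}
	for _, r := range refs {
		fmt.Printf("%s -> service=%s port=%s\n", r.Field, r.Service, r.Port)
	}
}
```

Failing on an untagged field turns a forgotten tag into an immediate test setup error instead of an empty endpoint that surfaces later as a confusing connection failure.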

// NewServiceEndpoint string `kurtosis:"new-service-name,port-name"`
}

func getEndpointsFromKurtosis(enclaveCtx *enclaves.EnclaveContext) (*EnclaveServiceEndpoints, error) {

A doc here would be helpful. It looks like this method replaces the service name in the EnclaveServiceEndpoints struct with a valid localhost endpoint?

Author

Super good callout, thanks. I even had to figure out for myself why I was using the localhost endpoints instead of the private ip:port endpoints, which are more efficient. Documented in 3065622

// Update the proxy's memstore config to start returning 503 errors
// Note: we have to GetConfig, update it and then UpdateConfig because the client doesn't implement a "patch" method,
//
// even though the API does support it.

nit: weird formatting here, with newline and tab mid-sentence

samlaf added 6 commits March 3, 2025 11:12
Was not being used and I had put it there only "in case"
Define l1BlocksQueriedForBatcherTxs and better describe how tests use it
Also refactored graphql query to use both FROM and TO blocks, not only FROM. This makes the semantics of the tests easier to understand.
…ServiceEndpoints field doesn't have kurtosis tag
…ndpointsFromKurtosis

Also better document the function to explain what its doing
@samlaf samlaf requested a review from litt3 March 3, 2025 18:55
@samlaf samlaf requested a review from ethenotethan March 3, 2025 20:04
@ethenotethan ethenotethan left a comment

LGTM 🛳️

@samlaf samlaf merged commit 7126821 into feat--op-batcher-altda-failover-to-ethda Mar 4, 2025
7 checks passed
@samlaf samlaf deleted the test--add-batcher-failover-kurtosis-test branch March 4, 2025 15:51
samlaf added a commit that referenced this pull request Mar 5, 2025
* test(altda): add test for altda->ethda failover

* feat(batcher): altda->ethda failover when altda is down

* chore: fix typos

* fix(fakeDAServer): handlePut was still handling put when in failover mode

* Update op-batcher/batcher/driver.go

Co-authored-by: Gaston Ponti <pontigaston@gmail.com>

* chore: better logs in batcher

* test(kt-devnet): add batcher failover kurtosis test (#33)

* test(kt-devnet): add batcher failover test

* chore(kt-devnet): use new proxy image v1.6.5 which has memstore rest routes

* test(kt-devnet): fix TestFailover + add comments/logs

* ci(kt-devnet): run new kurtosis failover test in ci

* test(kt-devnet): remove enclaveCtx from test harness

Was not being used and I had put it there only "in case"

* test(kt-devnet): refactor failover test to not use hardcoded 10 constant

Define l1BlocksQueriedForBatcherTxs and better describe how tests use it
Also refactored graphql query to use both FROM and TO blocks, not only FROM. This makes the semantics of the tests easier to understand.

* test(kt-devnet): better comments for requireBatcherTxsToBeFromLayer function

* teest(kt-devnet): return err from getEndpointsFromKurtosis if EnclaveServiceEndpoints field doesn't have kurtosis tag

* docs(kt-failover-test): rename getEndpointsFromKurtosis -> getPublicEndpointsFromKurtosis

Also better document the function to explain what its doing

* style(kt-devnet): reformat a weirdly tabbed comment

* docs(kt-failover-test): describe why we use graphql api insead of l1retriever api

* docs(kt-failover-test): add note mentioning that kt tests need to be run sequentially

* docs(kt-failover-test): update test name to TestFailoverToEthDACalldata

This is done to reflect the fact that batcher currently doesn't support failing over to calldata.

* style: fix lint

* docs(e2eutils): document returned values for WaitForBlockWithTxFromSender

* docs(e2eutils): add doc comment for TransactionsBySender

* docs(op-e2e): remove wrong comment in failover_test

* docs(op-e2e): better test comment in failover_test

* style(op-e2e): merged 2 if statements into one

* style: fix comment typo

---------

Co-authored-by: Gaston Ponti <pontigaston@gmail.com>
samlaf added a commit that referenced this pull request Mar 7, 2025
samlaf added a commit that referenced this pull request Mar 7, 2025
samlaf added a commit that referenced this pull request Apr 10, 2025
iquidus pushed a commit that referenced this pull request Jul 29, 2025
iquidus pushed a commit that referenced this pull request Aug 28, 2025
iquidus pushed a commit that referenced this pull request Nov 13, 2025