Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: minimal e2e framework #1586

Merged
merged 8 commits into from
Apr 25, 2023
Merged

feat: minimal e2e framework #1586

merged 8 commits into from
Apr 25, 2023

Conversation

cmwaters
Copy link
Contributor

Ref: #1256

This PR puts together the base pieces for an e2e testing suite. It includes a CLI that can:

  • Setup the file directories for multiple nodes, including generating a genesis with several accounts
  • Start the testnet (with different versions and at different start heights)
  • Stop the network and cleanup the used resources.

Further work aims to support:

  • compatibility testing
  • upgrade testing
  • sync testing
  • fuzz testing
    • for non-determinism
    • invariant checking
    • as a generalized tool for other teams

@MSevey MSevey requested review from a team and rootulp and removed request for a team March 31, 2023 14:42
@MSevey MSevey requested a review from a team March 31, 2023 14:45
@codecov-commenter
Copy link

codecov-commenter commented Mar 31, 2023

Codecov Report

Merging #1586 (9a6e52f) into main (058442d) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #1586   +/-   ##
=======================================
  Coverage   50.96%   50.96%           
=======================================
  Files          92       92           
  Lines        5751     5751           
=======================================
  Hits         2931     2931           
  Misses       2520     2520           
  Partials      300      300           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@cmwaters cmwaters requested a review from rach-id as a code owner March 31, 2023 15:01
Copy link
Member

@evan-forbes evan-forbes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is dope! I appreciate the similarities to the tendermint e2e test, and see how we can add the desired tests.

Are we wanting to eventually enable celestia-node to use this as well? I could see them being able to reuse a lot of the code here.

mainly just a had a few questions, but other than that I think it LGTM, and we should merge and start adding tests/iterating as needed

testing/e2e/cmd/e2e/main.go Outdated Show resolved Hide resolved
testing/e2e/pkg/exec.go Outdated Show resolved Hide resolved
testing/e2e/pkg/setup.go Outdated Show resolved Hide resolved
Comment on lines +297 to +306
services:
{{- range .Nodes }}
{{ .Name }}:
labels:
e2e: true
container_name: {{ .Name }}
image: ghcr.io/celestiaorg/celestia-app:{{ index .Versions 0 }}
entrypoint: ["/bin/celestia-appd"]
command: ["start"]
init: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my local docker setup might be wonk, beacuse I had to add
user: root in order to get the test to work. If not, the test will appear to be waiting, but none of the nodes actually start because they do not have permission to write to the volume.

does anyone know what I should do to fix that locally? change the group permissions I have for docker? cc @sweexordious

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. The docker image runs using user 1001 however I made the permissions of the volume 0o777 so everyone should have read write execute access.

@cmwaters
Copy link
Contributor Author

cmwaters commented Apr 3, 2023

Are we wanting to eventually enable celestia-node to use this as well? I could see them being able to reuse a lot of the code here.

Yeah Potentially. Matt mentioned having a common repository for e2e testing of various node types in a system. If we are to do this I'd definitely recommend reusing parts of this code.

@MSevey MSevey requested a review from a team April 3, 2023 08:08
Copy link
Member

@evan-forbes evan-forbes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 :shipit:

Copy link
Member

@rach-id rach-id left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome stuff 🎉

When I run this locally, the validators start correctly, but when I want to stop them using ctrl + c, everything hangs until I kill the terminal session.
When this happens, I check the running docker images and I see none, so probably this is not related to docker runtime.

Is that the correct way to stop the framework?
Is anyone having the same issue or it's just me?

testing/e2e/pkg/rpc.go Show resolved Hide resolved
@cmwaters
Copy link
Contributor Author

cmwaters commented Apr 3, 2023

Is that the correct way to stop the framework?

It should catch the sig term and just cancel the context. Let me dig in a little. In any case running stop and then cleanup should stop the network but I guess you're referring to running the entire sequence

@MSevey
Copy link
Member

MSevey commented Apr 3, 2023

cc @celestiaorg/testing for visibility

@evan-forbes evan-forbes removed the request for review from staheri14 April 3, 2023 18:06
@evan-forbes evan-forbes added the testing items that are strictly related to adding or extending test coverage label Apr 3, 2023
@evan-forbes evan-forbes added this to the Mainnet milestone Apr 3, 2023
@Bidon15
Copy link
Member

Bidon15 commented Apr 4, 2023

Few questions from my side

why we opted for in-house framework instead of using interchaintest?

state-sync example
chain-upgrade example

why not just re-use their implementation of chain creation using docker-compose and networking?

https://github.com/strangelove-ventures/interchaintest/blob/main/chain/cosmos/cosmos_chain.go

It has the same capabilities as we have rn.

Same with network pruning/setup
https://github.com/strangelove-ventures/interchaintest/blob/main/internal/dockerutil/setup.go

imho, it looks like they have took care of docker networks cleanup which is clumsy

if we want more control over implementation - why not reuse compose-spec like libp2p does?

Seeing the matrix versioning schema that libp2p does - we can def reuse the hard work they pushed.

Pros of above

Both have CI/CD implementation already meaning that we can speed up dev time and contribute back if something we really need

@Bidon15
Copy link
Member

Bidon15 commented Apr 4, 2023

My main concern over this PR - we are signing up on maintaining long-term, while there are tools that can help us achieve the same without maintaining it (interchaintest is constantly being updated, same is for libp2p/test-plan)
cc: @MSevey

@cmwaters
Copy link
Contributor Author

cmwaters commented Apr 4, 2023

Hey,

why we opted for in-house framework instead of using interchaintest?

The simple answer is we have a modified fork of the SDK which is not compatible with interchaintest. There are changes that I've spoken in the past that we can (and I think we should probably make) that means we don't have this compatibility issue and can eliminate the need to maintain the SDK fork.

Secondly, interchaintest is very IBC orientated. This means some of the initial goals of upgrade testing (we will have a different upgrade mechanism to most cosmos chains), compatibility testing and sync testing don't seem to natively fit their framework.

Lastly, for an open source library, I think there documentation and tutorials around using their library are lacking (although the inline commenting is relatively helpful). As well as maintenance we have to also consider onboarding costs to understanding a new (and large) codebase (when something simpler might suffice).

My main concern over this PR - we are signing up on maintaining long-term

I'm sympathetic to this. Ideally we reach our testing goals with little duplication and overlap. When we spoke prior to this it seemed like testground was not suitable. Rene tells me that robusta uses a different more light-weight framework that may be ideal for the outlined purposes. I'm not sure about the other (libp2p) library you mention but I definitely think that what's done here is quite generic: It's simply managing a series of binaries within a network and using the RPC endpoints to make assertions.

What I'm more concerned about is that it seems like these testing goals are perhaps the same across the other teams (node have their "swamp" testing) which seems, at least to me, to heavily overlap with what core/app is doing. If we have a "testing" team then perhaps efforts from individual teams can be consolidated

Copy link
Collaborator

@rootulp rootulp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. I'm encountering an error while trying to run locally

$ ./build/e2e -f networks/simple.toml
Setting up network simple-53126
Spinning up testnet
Starting validator01 at height 0 on http://localhost:4202
Error: starting network: exec: "docker compose": executable file not found in $PATH

I have docker compose installed in my $PATH:

$ docker-compose --version
Docker Compose version v2.15.1
$ which docker-compose
/usr/local/bin/docker-compose
  1. Should we list that installing docker compose is a prerequisite (potentially in README.md)?
  2. Am I missing a different prerequisite step?

@@ -79,6 +93,7 @@ require (
github.com/dgraph-io/badger/v2 v2.2007.4 // indirect
github.com/dgraph-io/ristretto v0.1.0 // indirect
github.com/dgryski/go-farm v0.0.0-20200201041132-a6ae2369ad13 // indirect
github.com/docker/docker v20.10.21+incompatible
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[informational] I was concerned by the +incompatible suffix but it seems ok per https://stackoverflow.com/a/57356777

return nil, errors.New("network is not running")
}

// return heights in ascending order
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] this line claims to sort heights in ascending order but the function godoc comment claims to return heights in descending order.

// GetHeights loops through all running nodes and returns an array of heights
// in order of highest to lowest.
func GetHeights(ctx context.Context, testnet *Testnet) ([]int64, error) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

@Bidon15
Copy link
Member

Bidon15 commented Apr 5, 2023

There are 2 topics that got confused and we clarified them here.

The original idea behind this PR is for teams to do nightly tests using minimal e2e framework. I thought that this is only for core/app and integration ones executed on every PR.

Going forward, we separated into 2 topics:

  1. Nightly Tests.
  2. Integration Tests that are executed per PR.

Nightly tests

@celestiaorg/devops will implement infra features for robusta-nightlies after @cmwaters shares the notion doc with us.
Robusta-nightly is a good point as the telemetry is already there and a fuzzing client can be plugged in easily.

Integration tests

This PR should be used on Integration level per PR run. In addition, we need to migrate it to a standalone repo, so @celestiaorg/celestia-node and @rollkit can use to fulfil their testing strategies

In addition, we will reuse the compose-spec that libp2p/test-plan has to make it CI/CD ready out-of-the-box

Thanks for the input and clarifications 🚀

staheri14
staheri14 previously approved these changes Apr 7, 2023
Copy link
Contributor

@staheri14 staheri14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work on this PR. 👍

"path/filepath"
)

// execute executes a shell command.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// execute executes a shell command.
// exec executes a shell command.

Key crypto.PrivKey
}

func LoadTestnet(manifest Manifest, file string) (*Testnet, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit-optional] I think filePath is closer to what this parameter means.

Suggested change
func LoadTestnet(manifest Manifest, file string) (*Testnet, error) {
func LoadTestnet(manifest Manifest, filePath string) (*Testnet, error) {


// Address returns an RPC endpoint address for the node.
func (n Node) AddressRPC() string {
return fmt.Sprintf("%v:26657", n.IP.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit-optional]: Introducing a constant variable to represent the port number 26657 would enhance the code's future maintainability and readability.

}

func (n Node) IsValidator() bool {
return n.SelfDelegation != 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] Since a non-zero SelfDelegation is the only requirement for being recognized as a validator, why not use a boolean type for SelfDelegation instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we may want to set the self delegation to an actual number which will reflect the initial voting power of the validator

evan-forbes
evan-forbes previously approved these changes Apr 17, 2023
@evan-forbes
Copy link
Member

morge?

Bidon15
Bidon15 previously approved these changes Apr 20, 2023
@cmwaters
Copy link
Contributor Author

I have docker compose installed in my $PATH:

Sorry for missing this @rootulp. docker compose isn't the same binary as docker-compose, it's a command from the docker binary that was recently added. If you haven't got it then I think you need to upgrade your docker version

@cmwaters cmwaters dismissed stale reviews from Bidon15, evan-forbes, and staheri14 via 96fae19 April 23, 2023 19:30
@MSevey MSevey requested a review from a team April 23, 2023 19:30
@MSevey MSevey requested a review from a team April 24, 2023 14:13
@cmwaters cmwaters merged commit 23fb8f3 into main Apr 25, 2023
27 checks passed
@cmwaters cmwaters deleted the cal/e2e branch April 25, 2023 08:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
testing items that are strictly related to adding or extending test coverage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants