charon combine errors trying to combine size 3 clusters. #2535

Closed
OisinKyne opened this issue Aug 10, 2023 · 4 comments
Labels
bug Something isn't working protocol Protocol Team tickets


@OisinKyne
Contributor

🐞 Bug Report

Description

We've added 3-node clusters. I tried to combine the private keys of a 3-node cluster and got an error.

Has this worked before in a previous version?

This worked with 4-node clusters.

🔬 Minimal Reproduction

docker run --rm -v "$(pwd):/opt/charon" -it obolnetwork/charon:latest create cluster --nodes 3 --network goerli --num-validators 10 --fee-recipient-addresses 0x3C75594181e03E8ECD8468A0037F058a9dAfad79 --withdrawal-addresses 0x3C75594181e03E8ECD8468A0037F058a9dAfad79 --name "Test combine"

docker run -v "$(pwd):/opt/charon" -it obolnetwork/charon:v0.17-rc combine --cluster-dir ./ 

🔥 Error


17:28:32.701 INFO cmd        Recombining private key shares           {"input_dir": "/opt/charon", "output_dir": "/opt/charon/validator_keys"}
17:28:32.858 INFO cmd        Loading keystore                         {"path": "/opt/charon/node0/validator_keys"}
17:28:33.051 INFO cmd        Loading keystore                         {"path": "/opt/charon/node1/validator_keys"}
17:28:33.217 INFO cmd        Loading keystore                         {"path": "/opt/charon/node2/validator_keys"}
17:28:33.375 INFO cmd        Recombining private key shares           {"validator_index": 0}
17:28:33.376 INFO cmd        Recombining private key shares           {"validator_index": 1}
17:28:33.376 INFO cmd        Recombining private key shares           {"validator_index": 2}
17:28:33.377 INFO cmd        Recombining private key shares           {"validator_index": 3}
17:28:33.377 INFO cmd        Recombining private key shares           {"validator_index": 4}
17:28:33.377 INFO cmd        Recombining private key shares           {"validator_index": 5}
17:28:33.378 INFO cmd        Recombining private key shares           {"validator_index": 6}
17:28:33.378 INFO cmd        Recombining private key shares           {"validator_index": 7}
17:28:33.378 INFO cmd        Recombining private key shares           {"validator_index": 8}
17:28:33.379 INFO cmd        Recombining private key shares           {"validator_index": 9}
17:28:33.547 ERRO cmd        Fatal error: cannot store keystore: write keystore: open /opt/charon/validator_keys/keystore-4.json: no such file or directory {"filename": "/opt/charon/validator_keys/keystore-4.json"}
        eth2util/keystore/keystore.go:87 .func1
        app/forkjoin/forkjoin.go:195 .func2

🌍 Your Environment

Operating System:

Docker on OSX

What version of Charon are you running? (Which release)

I used `latest` to create the keys. Running `combine` with `latest` failed because it expected a cluster manifest file (another bug?; log below). I then ran `combine` with v0.17-rc and hit the error above.
docker run -v "$(pwd):/opt/charon" -it obolnetwork/charon:latest combine --cluster-dir ./node0
17:26:47.208 INFO cmd        Recombining private key shares           {"input_dir": "/opt/charon/node0", "output_dir": "/opt/charon/validator_keys"}
17:26:47.213 ERRO cmd        Fatal error: cannot open manifest file: manifest load error: load dag from disk: couldn't load cluster dag from either manifest or legacy lock file {"name": "validator_keys", "error": "read manifest file: open /opt/charon/node0/validator_keys/cluster-manifest.pb: no such file or directory", "file": "/opt/charon/node0/validator_keys/cluster-manifest.pb"}
        cluster/manifest/load.go:65 .LoadDAG
        cluster/manifest/load.go:26 .LoadCluster
        cmd/combine/combine.go:225 .loadManifest
        cmd/combine/combine.go:61 .Combine
        cmd/combine.go:51 .newCombineFunc
        cmd/combine.go:28 .func1
        cmd/cmd.go:80 .func1
        main.go:19 .main

Anything else relevant (validator index / public key)?
We should cover splitting and combining in our unit tests if we don't already, ideally mixing and matching versions.

@github-actions github-actions bot added the protocol Protocol Team tickets label Aug 10, 2023
@OisinKyne
Contributor Author

I re-ran this with 4 nodes and got a similar error, this time on keystore-2.json:

docker run -v "$(pwd):/opt/charon" -it obolnetwork/charon:v0.17-rc combine --cluster-dir ./
17:34:57.908 INFO cmd        Recombining private key shares           {"input_dir": "/opt/charon", "output_dir": "/opt/charon/validator_keys"}
17:34:58.035 INFO cmd        Loading keystore                         {"path": "/opt/charon/node0/validator_keys"}
17:34:58.148 INFO cmd        Loading keystore                         {"path": "/opt/charon/node1/validator_keys"}
17:34:58.268 INFO cmd        Loading keystore                         {"path": "/opt/charon/node2/validator_keys"}
17:34:58.360 INFO cmd        Loading keystore                         {"path": "/opt/charon/node3/validator_keys"}
17:34:58.470 INFO cmd        Recombining private key shares           {"validator_index": 0}
17:34:58.471 INFO cmd        Recombining private key shares           {"validator_index": 1}
17:34:58.471 INFO cmd        Recombining private key shares           {"validator_index": 2}
17:34:58.472 INFO cmd        Recombining private key shares           {"validator_index": 3}
17:34:58.472 INFO cmd        Recombining private key shares           {"validator_index": 4}
17:34:58.574 ERRO cmd        Fatal error: cannot store keystore: write keystore: open /opt/charon/validator_keys/keystore-2.json: no such file or directory {"filename": "/opt/charon/validator_keys/keystore-2.json"}
        eth2util/keystore/keystore.go:87 .func1
        app/forkjoin/forkjoin.go:195 .func2

@OisinKyne
Contributor Author

I also tried running both `create` and `combine` with v0.17-rc, in case `latest` was causing the problem, and still hit the error:

docker run -v "$(pwd):/opt/charon" -it obolnetwork/charon:v0.17-rc combine --cluster-dir ./
17:36:47.481 INFO cmd        Recombining private key shares           {"input_dir": "/opt/charon", "output_dir": "/opt/charon/validator_keys"}
17:36:47.588 INFO cmd        Loading keystore                         {"path": "/opt/charon/node0/validator_keys"}
17:36:47.709 INFO cmd        Loading keystore                         {"path": "/opt/charon/node1/validator_keys"}
17:36:47.815 INFO cmd        Loading keystore                         {"path": "/opt/charon/node2/validator_keys"}
17:36:47.902 INFO cmd        Loading keystore                         {"path": "/opt/charon/node3/validator_keys"}
17:36:47.991 INFO cmd        Recombining private key shares           {"validator_index": 0}
17:36:47.992 INFO cmd        Recombining private key shares           {"validator_index": 1}
17:36:47.993 INFO cmd        Recombining private key shares           {"validator_index": 2}
17:36:47.993 INFO cmd        Recombining private key shares           {"validator_index": 3}
17:36:47.994 INFO cmd        Recombining private key shares           {"validator_index": 4}
17:36:48.082 ERRO cmd        Fatal error: cannot store keystore: write keystore: open /opt/charon/validator_keys/keystore-0.json: no such file or directory {"filename": "/opt/charon/validator_keys/keystore-0.json"}
        eth2util/keystore/keystore.go:87 .func1
        app/forkjoin/forkjoin.go:195 .func2

@boulder225 boulder225 added the bug Something isn't working label Aug 11, 2023
@corverroos
Contributor

corverroos commented Aug 11, 2023

So the issue is that we do not ensure the `--output-dir=./validator_keys` folder exists before writing the combined keys to it. If the folder doesn't exist, writing the keystore files fails with `no such file or directory`.

The issue was introduced in https://github.com/ObolNetwork/charon/pull/1876/files#diff-4eebffe381b647be303ac621d21eee27fa20c186d5561d5a497af88fb0250f91, which removed the code that ensured the output directory exists.

This fixes it:

diff --git a/eth2util/keystore/keystore.go b/eth2util/keystore/keystore.go
index fdb5cb73..e4ca5b9e 100644
--- a/eth2util/keystore/keystore.go
+++ b/eth2util/keystore/keystore.go
@@ -62,6 +62,10 @@ func storeKeysInternal(secrets []tbls.PrivateKey, dir string, filenameFmt string
 		secret tbls.PrivateKey
 	}

+	if err := os.MkdirAll(dir, 0o700); err != nil {
+		return errors.Wrap(err, "create dir", z.Str("dir", dir))
+	}
+
 	fork, join, cancel := forkjoin.New(
 		context.Background(),
 		func(ctx context.Context, d data) (any, error) {

@corverroos
Contributor

Closing this as a duplicate of #2536.
