Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ssv not rebooting / docker Staus Exited (2) / module=libcontainerd "ignoring event" #1328

Open
ThomasBlock opened this issue Feb 23, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@ThomasBlock
Copy link

Describe the bug
I run several instances of ssv. standalone or inside ethdocker. All as docker containers. Two times in the last days i have seen that different nodes dit not reboot. ( ssv crashes after loosing connection to CL. no big deal if it would restart, but it does not ).

To Reproduce
Fresh Ubuntu 22
Install as outlined here https://docs.ssv.network/operator-user-guides/operator-node/installation

docker run --restart unless-stopped --name ssv_node -e \
CONFIG_PATH=/config.yaml -p 13001:13001 -p 12001:12001/udp -p 15000:15000 \
-v "$(pwd)/config.yaml":/config.yaml \
-v "$(pwd)":/data \
-v "$(pwd)/password":/password \
-v "$(pwd)/encrypted_private_key.json":/encrypted_private_key.json \
-it "bloxstaking/ssv-node:latest" make BUILD_PATH="/go/bin/ssvnode" start-node

log standalone ssv

  • v1.2.3-e5a6d711958f5043615bf8f6a95005a6083e714f
  • besu
  • lighthouse
docker ps -a
CONTAINER ID   IMAGE                         COMMAND                  CREATED      STATUS                    PORTS     NAMES
d36ac75e8957   bloxstaking/ssv-node:latest   "make BUILD_PATH=/go…"   4 days ago   Exited (2) 28 hours ago             ssv_node
docker logs ssv_node
2024-02-19T11:16:11.109446Z	ERROR	node is not healthy	{"node": "consensus client", "error": "failed to request syncing: failed to call GET endpoint: Get \"http://rocketpool2.nl.arpa:5052/eth/v1/node/syncing\": dial tcp 192.168.131.180:5052: connect: connection refused", "errorVerbose": "Get \"http://rocketpool2.nl.arpa:5052/eth/v1/node/syncing\": dial tcp 192.168.131.180:5052: connect: connection refused\nfailed to call GET endpoint\ngithub.com/attestantio/go-eth2-client/http.(*Service).get\n\t/go/pkg/mod/github.com/attestantio/go-eth2-client@v0.16.3/http/http.go:66\ngithub.com/attestantio/go-eth2-client/http.(*Service).NodeSyncing\n\t/go/pkg/mod/github.com/attestantio/go-eth2-client@v0.16.3/http/nodesyncing.go:30\ngithub.com/bloxapp/ssv/beacon/goclient.(*goClient).Healthy\n\t/go/src/github.com/bloxapp/ssv/beacon/goclient/goclient.go:203\ngithub.com/bloxapp/ssv/nodeprobe.(*Prober).probe.func1\n\t/go/src/github.com/bloxapp/ssv/nodeprobe/nodeprobe.go:96\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\nfailed to request syncing\ngithub.com/attestantio/go-eth2-client/http.(*Service).NodeSyncing\n\t/go/pkg/mod/github.com/attestantio/go-eth2-client@v0.16.3/http/nodesyncing.go:32\ngithub.com/bloxapp/ssv/beacon/goclient.(*goClient).Healthy\n\t/go/src/github.com/bloxapp/ssv/beacon/goclient/goclient.go:203\ngithub.com/bloxapp/ssv/nodeprobe.(*Prober).probe.func1\n\t/go/src/github.com/bloxapp/ssv/nodeprobe/nodeprobe.go:96\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"}
2024-02-19T11:16:11.109643Z	ERROR	not all nodes are healthy
2024-02-19T11:16:11.109662Z	FATAL	ethereum node(s) are either out of sync or down. Ensure the nodes are healthy to resume.
make: *** [Makefile:102: start-node] Error 1
make: *** wait: No child process.  Stop.
journalctl -u docker
Feb 19 12:16:11 ssv2 dockerd[15728]: time="2024-02-19T12:16:11.134730871+01:00" level=info msg="ignoring event" container=d36ac75e89576178dc74d91cf2397afb0aeff2aae83f59f3bb115ea0c702abbe module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Feb 19 12:16:11 ssv2 dockerd[15728]: time="2024-02-19T12:16:11.141990788+01:00" level=warning msg="ShouldRestart failed, container will not be restarted" container=d36ac75e89576178dc74d91cf2397afb0aeff2aae83f59f3bb115ea0c702abbe daemonShuttingDown=false error="restart canceled" execDuration=69h56m0.401903873s exitStatus="{2 2024-02-19 11:16:11.>
Feb 19 12:16:11 ssv2 dockerd[15728]: time="2024-02-19T12:16:11.142553134+01:00" level=warning msg="failed to close stdin: task d36ac75e89576178dc74d91cf2397afb0aeff2aae83f59f3bb115ea0c702abbe not found: not found"

log ethdocker

  • v1.2.3-e5a6d711958f5043615bf8f6a95005a6083e714f
  • prysm-cl-only.yml
  • besu.yml
docker ps -a
bb2eb0dda72a   bloxstaking/ssv-node:latest                "make BUILD_PATH=/go…"   5 days ago   Exited (2) 17 hours ago                                                                                                                                                                                                         eth-docker-ssv-node-1
ethd logs ssv-node
ssv-node-1  | 2024-02-23T02:09:30.684605Z	error	node is not healthy	{"node": "consensus client", "error": "failed to request syncing: failed to call GET endpoint: Get \"http://consensus:5052/eth/v1/node/syncing\": context deadline exceeded", "errorVerbose": "Get \"http://consensus:5052/eth/v1/node/syncing\": context deadline exceeded\nfailed to call GET endpoint\ngithub.com/attestantio/go-eth2-client/http.(*Service).get\n\t/go/pkg/mod/github.com/attestantio/go-eth2-client@v0.16.3/http/http.go:66\ngithub.com/attestantio/go-eth2-client/http.(*Service).NodeSyncing\n\t/go/pkg/mod/github.com/attestantio/go-eth2-client@v0.16.3/http/nodesyncing.go:30\ngithub.com/bloxapp/ssv/beacon/goclient.(*goClient).Healthy\n\t/go/src/github.com/bloxapp/ssv/beacon/goclient/goclient.go:203\ngithub.com/bloxapp/ssv/nodeprobe.(*Prober).probe.func1\n\t/go/src/github.com/bloxapp/ssv/nodeprobe/nodeprobe.go:96\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\nfailed to request syncing\ngithub.com/attestantio/go-eth2-client/http.(*Service).NodeSyncing\n\t/go/pkg/mod/github.com/attestantio/go-eth2-client@v0.16.3/http/nodesyncing.go:32\ngithub.com/bloxapp/ssv/beacon/goclient.(*goClient).Healthy\n\t/go/src/github.com/bloxapp/ssv/beacon/goclient/goclient.go:203\ngithub.com/bloxapp/ssv/nodeprobe.(*Prober).probe.func1\n\t/go/src/github.com/bloxapp/ssv/nodeprobe/nodeprobe.go:96\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"}
ssv-node-1  | 2024-02-23T02:09:30.687323Z	error	not all nodes are healthy
ssv-node-1  | 2024-02-23T02:09:30.687356Z	fatal	ethereum node(s) are either out of sync or down. Ensure the nodes are healthy to resume.
ssv-node-1  | make: *** [Makefile:102: start-node] Error 1
ssv-node-1  | make: go: No such file or directory
ssv-node-1  | make: go: No such file or directory
ssv-node-1  | make: go: No such file or directory
ssv-node-1  | Build /go/bin/ssvnode
ssv-node-1  | Build /config/config.yaml
ssv-node-1  | Build 
ssv-node-1  | Command --config=/config/config.yaml
ssv-node-1  | Running node on address: *.*.*.*)
ssv-node-1  | 2024-02-23T02:09:52.464863Z	info	starting SSV-Node:v1.2.3-e5a6d711958f5043615bf8f6a95005a6083e714f
ssv-node-1  | 2024-02-23T02:09:52.465278Z	info	setting ssv network	{"network": "mainnet", "domain": "00000000", "nodeType": "light", "builderProposals(MEV)": "enabled", "beaconNetwork": "mainnet", "genesisEpoch": 218450, "registryContract": "0xDD9BC35aE942eF0cFa76930954a156B3fF30a4E1"}
ssv-node-1  | 2024-02-23T02:09:52.508341Z	info	applying migrations	{"count": 4}
ssv-node-1  | 2024-02-23T02:09:52.508951Z	info	applied migrations successfully	{"count": 0}
ssv-node-1  | 2024-02-23T02:09:52.848530Z	info	successfully setup operator keys	{"pubkey": "***"}
ssv-node-1  | 2024-02-23T02:10:53.966163Z	info	consensus client: connecting	{"address": "http://consensus:5052", "network": "mainnet"}
ssv-node-1  | 2024-02-23T02:11:12.449409Z	fatal	failed to create beacon go-client	{"error": "failed to create http client: failed to confirm node connection: failed to fetch genesis: failed to request genesis: failed to call GET endpoint: Get \"http://consensus:5052/eth/v1/beacon/genesis\": context deadline exceeded", "errorVerbose": "Get \"http://consensus:5052/eth/v1/beacon/genesis\": context deadline exceeded\nfailed to call GET endpoint\ngithub.com/attestantio/go-eth2-client/http.(*Service).get\n\t/go/pkg/mod/github.com/attestantio/go-eth2-client@v0.16.3/http/http.go:66\ngithub.com/attestantio/go-eth2-client/http.(*Service).Genesis\n\t/go/pkg/mod/github.com/attestantio/go-eth2-client@v0.16.3/http/genesis.go:45\ngithub.com/attestantio/go-eth2-client/http.(*Service).fetchStaticValues\n\t/go/pkg/mod/github.com/attestantio/go-eth2-client@v0.16.3/http/service.go:143\ngithub.com/attestantio/go-eth2-client/http.New\n\t/go/pkg/mod/github.com/attestantio/go-eth2-client@v0.16.3/http/service.go:118\ngithub.com/bloxapp/ssv/beacon/goclient.New\n\t/go/src/github.com/bloxapp/ssv/beacon/goclient/goclient.go:154\ngithub.com/bloxapp/ssv/cli/operator.setupConsensusClient\n\t/go/src/github.com/bloxapp/ssv/cli/operator/node.go:545\ngithub.com/bloxapp/ssv/cli/operator.glob..func2\n\t/go/src/github.com/bloxapp/ssv/cli/operator/node.go:151\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:944\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992\ngithub.com/bloxapp/ssv/cli.Execute\n\t/go/src/github.com/bloxapp/ssv/cli/cli.go:27\nmain.main\n\t/go/src/github.com/bloxapp/ssv/cmd/ssvnode/main.go:20\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\nfailed to request genesis\ngithub.com/attestantio/go-eth2-client/http.(*Service).Genesis\n\t/go/pkg/mod/github.com/attestantio/go-eth2-client@v0.16.3/http/genesis.go:47\ngithub.com/attestantio/go-eth2-client/http.(*Service).fetchStaticValues\n\t/go/pkg/mod/github.com/attestantio/go-eth2-client@v0.16.3/http/service.go:143\ngithub.com/attestantio/go-eth2-client/http.New\n\t/go/pkg/mod/github.com/attestantio/go-eth2-client@v0.16.3/http/service.go:118\ngithub.com/bloxapp/ssv/beacon/goclient.New\n\t/go/src/github.com/bloxapp/ssv/beacon/goclient/goclient.go:154\ngithub.com/bloxapp/ssv/cli/operator.setupConsensusClient\n\t/go/src/github.com/bloxapp/ssv/cli/operator/node.go:545\ngithub.com/bloxapp/ssv/cli/operator.glob..func2\n\t/go/src/github.com/bloxapp/ssv/cli/operator/node.go:151\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:944\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992\ngithub.com/bloxapp/ssv/cli.Execute\n\t/go/src/github.com/bloxapp/ssv/cli/cli.go:27\nmain.main\n\t/go/src/github.com/bloxapp/ssv/cmd/ssvnode/main.go:20\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\nfailed to fetch genesis\ngithub.com/attestantio/go-eth2-client/http.(*Service).fetchStaticValues\n\t/go/pkg/mod/github.com/attestantio/go-eth2-client@v0.16.3/http/service.go:144\ngithub.com/attestantio/go-eth2-client/http.New\n\t/go/pkg/mod/github.com/attestantio/go-eth2-client@v0.16.3/http/service.go:118\ngithub.com/bloxapp/ssv/beacon/goclient.New\n\t/go/src/github.com/bloxapp/ssv/beacon/goclient/goclient.go:154\ngithub.com/bloxapp/ssv/cli/operator.setupConsensusClient\n\t/go/src/github.com/bloxapp/ssv/cli/operator/node.go:545\ngithub.com/bloxapp/ssv/cli/operator.glob..func2\n\t/go/src/github.com/bloxapp/ssv/cli/operator/node.go:151\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:944\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992\ngithub.com/bloxapp/ssv/cli.Execute\n\t/go/src/github.com/bloxapp/ssv/cli/cli.go:27\nmain.main\n\t/go/src/github.com/bloxapp/ssv/cmd/ssvnode/main.go:20\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\nfailed to confirm node connection\ngithub.com/attestantio/go-eth2-client/http.New\n\t/go/pkg/mod/github.com/attestantio/go-eth2-client@v0.16.3/http/service.go:119\ngithub.com/bloxapp/ssv/beacon/goclient.New\n\t/go/src/github.com/bloxapp/ssv/beacon/goclient/goclient.go:154\ngithub.com/bloxapp/ssv/cli/operator.setupConsensusClient\n\t/go/src/github.com/bloxapp/ssv/cli/operator/node.go:545\ngithub.com/bloxapp/ssv/cli/operator.glob..func2\n\t/go/src/github.com/bloxapp/ssv/cli/operator/node.go:151\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:944\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992\ngithub.com/bloxapp/ssv/cli.Execute\n\t/go/src/github.com/bloxapp/ssv/cli/cli.go:27\nmain.main\n\t/go/src/github.com/bloxapp/ssv/cmd/ssvnode/main.go:20\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598\nfailed to create http client", "address": "http://consensus:5052"}
ssv-node-1  | make: *** [Makefile:102: start-node] Error 1
journalctl -u docker
Feb 23 03:10:26 ethdocker2 dockerd[798]: time="2024-02-23T03:10:26.552099138+01:00" level=warning msg="Health check for container 35628ea1e4c062ae421aabc6ee308f425fda6e4bbab49eba8df7f08219b61cb3 error: timed out starting health check for container 35628ea1e4c062ae421aabc6ee308f425fda6e4bbab49eba8df7f08219b61cb3"
Feb 23 03:10:26 ethdocker2 dockerd[798]: time="2024-02-23T03:10:26.552377870+01:00" level=error msg="stream copy error: reading from a closed fifo"
Feb 23 03:10:26 ethdocker2 dockerd[798]: time="2024-02-23T03:10:26.552398011+01:00" level=error msg="stream copy error: reading from a closed fifo"
Feb 23 03:11:14 ethdocker2 dockerd[798]: time="2024-02-23T03:11:14.098400837+01:00" level=info msg="ignoring event" container=bb2eb0dda72a35b0ccaa4b2f412967b3e864c2d31f1ef2d3a0ea32e3f64acfbc module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Feb 23 03:12:34 ethdocker2 dockerd[798]: time="2024-02-23T03:11:28.995205827+01:00" level=warning msg="Health check for container 35628ea1e4c062ae421aabc6ee308f425fda6e4bbab49eba8df7f08219b61cb3 error: timed out starting health check for container 35628ea1e4c062ae421aabc6ee308f425fda6e4bbab49eba8df7f08219b61cb3"
Feb 23 03:12:34 ethdocker2 dockerd[798]: time="2024-02-23T03:11:28.995544380+01:00" level=error msg="stream copy error: reading from a closed fifo"
Feb 23 03:12:34 ethdocker2 dockerd[798]: time="2024-02-23T03:11:28.995557231+01:00" level=error msg="stream copy error: reading from a closed fifo"
Feb 23 03:12:34 ethdocker2 dockerd[798]: time="2024-02-23T03:12:18.680412876+01:00" level=warning msg="Health check for container 35628ea1e4c062ae421aabc6ee308f425fda6e4bbab49eba8df7f08219b61cb3 error: timed out starting health check for container 35628ea1e4c062ae421aabc6ee308f425fda6e4bbab49eba8df7f08219b61cb3"
Feb 23 03:12:34 ethdocker2 dockerd[798]: time="2024-02-23T03:12:18.680774258+01:00" level=error msg="stream copy error: reading from a closed fifo"
Feb 23 03:12:34 ethdocker2 dockerd[798]: time="2024-02-23T03:12:18.681322233+01:00" level=warning msg="Health check for container c520d5a80f2f8fe97bc13862feef84a64e45804058abab6ea7f03db302af9c6f error: timed out starting health check for container c520d5a80f2f8fe97bc13862feef84a64e45804058abab6ea7f03db302af9c6f"
Feb 23 03:12:34 ethdocker2 dockerd[798]: time="2024-02-23T03:12:18.681615175+01:00" level=error msg="stream copy error: reading from a closed fifo"
Feb 23 03:12:34 ethdocker2 dockerd[798]: time="2024-02-23T03:12:18.681685706+01:00" level=error msg="stream copy error: reading from a closed fifo"
Feb 23 03:12:34 ethdocker2 dockerd[798]: time="2024-02-23T03:12:18.681985778+01:00" level=error msg="stream copy error: reading from a closed fifo"
Feb 23 03:13:29 ethdocker2 dockerd[798]: time="2024-02-23T03:13:29.537949336+01:00" level=warning msg="Health check for container 35628ea1e4c062ae421aabc6ee308f425fda6e4bbab49eba8df7f08219b61cb3 error: timed out starting health check for container 35628ea1e4c062ae421aabc6ee308f425fda6e4bbab49eba8df7f08219b61cb3"
Feb 23 03:13:32 ethdocker2 dockerd[798]: time="2024-02-23T03:13:29.538173078+01:00" level=error msg="stream copy error: reading from a closed fifo"
@ThomasBlock ThomasBlock added the bug Something isn't working label Feb 23, 2024
@ThomasBlock
Copy link
Author

ThomasBlock commented May 14, 2024

"not rebooting" is nox fixed.

but "node is not healthy" is getting worse for me. one reboot every hour.. problem is on several nodes.. i updated ethdocker and all images to lastest version..

it might somehow be CPU usage related.. the problem got worse in the last two days when other tasks on my server push cpu usage above 80%

ssv-node-1  | {"level":"error","time":"2024-05-14T15:17:42.267391Z","msg":"node is not healthy","node":"consensus client","error":"failed to obtain node syncing status: failed to call GET endpoint: Get \"http://consensus:5052/eth/v1/node/syncing\": context deadline exceeded"}
ssv-node-1  | {"level":"error","time":"2024-05-14T15:17:42.267512Z","msg":"not all nodes are healthy"}
ssv-node-1  | {"level":"fatal","time":"2024-05-14T15:17:42.267523Z","msg":"ethereum node(s) are either out of sync or down. Ensure the nodes are healthy to resume."}

consensus is fine:

consensus-1  | May-14 17:17:32.296[]                 info: Synced - slot: 9072985 - head: (slot -2) 0x5b8b…72fc - exec-block: valid(19869070 0x3d89…) - finalized: 0x6fd0…8a03:283528 - peers: 85
consensus-1  | May-14 17:17:42.127[]                 info: Synced - slot: 9072986 - head: (slot -3) 0x5b8b…72fc - exec-block: valid(19869070 0x3d89…) - finalized: 0x6fd0…8a03:283528 - peers: 81
consensus-1  | May-14 17:17:53.122[]                 info: Synced - slot: 9072987 - head: (slot -2) 0xb35e…1bb4 - exec-block: valid(19869071 0xfc5b…) - finalized: 0x6fd0…8a03:283528 - peers: 84

execution is fine:


execution-1  | 2024-05-14 15:17:03.692+00:00 | vert.x-worker-thread-0 | INFO  | AbstractEngineNewPayload | Imported #19,869,070 / 129 tx / 16 ws / 0 blobs / base fee 7.96 gwei / 10,126,426 (33.8%) gas / (0x3d89c842fb31dbcb013bfb2d70e37c2b6fdd7d2e8691c81b93891d51185722e7) in 1.923s. Peers: 25
execution-1  | 2024-05-14 15:17:50.636+00:00 | vert.x-worker-thread-0 | INFO  | AbstractEngineNewPayload | Imported #19,869,071 / 335 tx / 16 ws / 1 blobs / base fee 7.64 gwei / 28,649,111 (95.5%) gas / (0xfc5b89fed24c73c4feed05c527d4c0e47dd5386da70e1d09b43895df9ee6704b) in 9.093s. Peers: 25
execution-1  | 2024-05-14 15:17:51.793+00:00 | vert.x-worker-thread-0 | INFO  | AbstractEngineForkchoiceUpdated | VALID for fork-choice-update: head: 0xfc5b89fed24c73c4feed05c527d4c0e47dd5386da70e1d09b43895df9ee6704b, finalized: 0x739329aeb75499d6697e6f9f564cd486cbd0b56c37b49fc71027c8e1f581e55b, safeBlockHash: 0xa7dbaecdffda6af18cc2206bcb0ae94fbf4a911d20cffdab28f763786cd7a781

edit: see also #1327

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant