Bad address - cannot read snapshot for v10:zoe - KERNEL PANIC #3877

Closed
dckc opened this issue Aug 25, 2021 · 37 comments · Fixed by #3906
Labels: bug (Something isn't working), testnet-problem (problem found during incentivized testnet), xsnap (the XS execution tool)

Comments

@dckc
Member

dckc commented Aug 25, 2021

reported in https://discord.com/channels/585576150827532298/819073555446759444/880169455412457602
and https://discord.com/channels/585576150827532298/819073555446759444/880176218673131570

Aug 25 21:18:16 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]: 2021-08-25T19:18:16.617Z launch-chain: Launching SwingSet kernel
Aug 25 21:18:16 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]: Prometheus scrape endpoint: http://0.0.0.0:9464/metrics
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614214]: cannot read snapshot /root/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/8bae75381c20d536812b972f61b52ae4f8ed4a83ad293070bf8a57e7f87d4e0c-load-qSM9Ip.xss: Bad address
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]: 2021-08-25T19:18:40.070Z SwingSet: kernel: ##### KERNEL PANIC: unable to re-create vat v10 #####
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]: portHandler threw (ExitCode#1)
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]: ExitCode#1: v10:zoe exited: I/O error
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]:   at new ErrorCode (packages/xsnap/api.js:49:5)
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]:   at ChildProcess.<anonymous> (packages/xsnap/src/xsnap.js:124:22)
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]:   at ChildProcess.emit (events.js:400:28)
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]: Cannot initialize Controller ExitCode: v10:zoe exited: I/O error
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal systemd[1]: ag-chain-cosmos.service: Main process exited, code=exited, status=1/FAILURE
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal systemd[1]: ag-chain-cosmos.service: Failed with result 'exit-code'.
Aug 25 21:18:43 Ubuntu-2004-focal-64-minimal systemd[1]: ag-chain-cosmos.service: Scheduled restart job, restart counter is at 2.
Aug 25 21:18:43 Ubuntu-2004-focal-64-minimal systemd[1]: Stopped Agoric Cosmos daemon.
Aug 25 21:18:43 Ubuntu-2004-focal-64-minimal systemd[1]: Started Agoric Cosmos daemon.
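
When comparing reports like the one above, it can help to pull the snapshot hash and error text out of each journal line. The following is a minimal sketch, not part of agoric-sdk: `parseSnapshotError` and the field names it returns are hypothetical, and the reading of the `-load-XXXXXX` part as a temp-file suffix is an assumption based only on the file names seen in these logs.

```javascript
// Hypothetical triage helper; not part of agoric-sdk or xsnap.
// Matches lines of the form:
//   cannot read snapshot <dir>/<64 hex digits>-load-<suffix>.xss: <reason>
const SNAPSHOT_ERROR =
  /cannot read snapshot (\S*\/([0-9a-f]{64})-load-(\w+)\.xss): (.+)$/;

function parseSnapshotError(line) {
  const m = SNAPSHOT_ERROR.exec(line);
  if (!m) return undefined; // not a snapshot-read failure line
  const [, path, hash, tmpSuffix, reason] = m;
  // `tmpSuffix` is assumed to be a per-load temp-file suffix.
  return { path, hash, tmpSuffix, reason };
}
```

Grouping the parsed results by `hash` would show whether different nodes fail on the same snapshot or on different ones.
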
@dckc
Member Author

dckc commented Aug 25, 2021

I'm working on getting the relevant contents of xs-snapshots submitted for forensic analysis.

@EmreNOP

EmreNOP commented Aug 26, 2021

Aug 26 06:06:48 agoric ag-chain-cosmos[16025]: cannot read snapshot /root/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/3c12d2556f426be51c154bd623dc8d4aeb5c56d9e80f254a632ff55f14bf7c26-load-qVodTc.xss: Bad address

I have the same issue.

@kj89

kj89 commented Aug 26, 2021

Same here:

Aug 26 08:13:58 agoric-validator ag-chain-cosmos[1097]: cannot read snapshot /root/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/0a5139af6d5c231df506b10f6c18e856caade55a2cbe4566d6a8822c22b00381-load-JDg594.xss: Bad address

@humantraffic

I uploaded slog file and the folder of snapshots as requested

https://www.dropbox.com/s/ycz0cbwe2f58yk0/humantraffic-agorictest17-chain.slog.gz?dl=0
https://www.dropbox.com/s/qwegikzcfo19l90/xs-snapshots.tar.gz?dl=0

@alkadeta

alkadeta commented Aug 26, 2021

I have the same issue after performing my latest restart task.

Aug 26 07:38:02 agoric ag-chain-cosmos[272306]: cannot read snapshot /home/***/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/0662072aa8268afa30d4805714dba5b02b52e542e15115b5bea678b4076f482d-load-g145I9.xss: Bad address

Slog File: https://drive.google.com/file/d/1XPOp4l8HzE5hKJxzipLEd-6URxzDZVOd/view
xs-snapshots folder: https://drive.google.com/file/d/1ex8HH8F4M51X2h-xeqRBaOeZxbQ3rBY4/view

@MarryRSR

MarryRSR commented Aug 26, 2021

The same :(

Aug 26 10:46:57 Ubuntu-2004-focal-64-minimal systemd[1]: Stopped Agoric Cosmos daemon.
Aug 26 10:46:57 Ubuntu-2004-focal-64-minimal systemd[1]: Started Agoric Cosmos daemon.
Aug 26 10:47:00 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[247428]: 2021-08-26T08:47:00.399Z launch-chain: Launching SwingSet kernel
Aug 26 10:47:38 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[247751]: cannot read snapshot /root/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/33faedb936ea43ccaa5fc4c84a1848f2bcdd5b953225a7274ea78027125a3259-load-XPsN2L.xss: Bad address
Aug 26 10:47:38 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[247428]: 2021-08-26T08:47:38.832Z SwingSet: kernel: ##### KERNEL PANIC: unable to re-create vat v10 #####
Aug 26 10:47:38 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[247428]: portHandler threw (ExitCode#1)
Aug 26 10:47:38 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[247428]: ExitCode#1: v10:zoe exited: I/O error
Aug 26 10:47:38 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[247428]:   at new ErrorCode (packages/xsnap/api.js:49:5)
Aug 26 10:47:38 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[247428]:   at ChildProcess.<anonymous> (packages/xsnap/src/xsnap.js:124:22)
Aug 26 10:47:38 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[247428]:   at ChildProcess.emit (events.js:400:28)
Aug 26 10:47:38 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[247428]: Cannot initialize Controller ExitCode: v10:zoe exited: I/O error
Aug 26 10:47:38 Ubuntu-2004-focal-64-minimal systemd[1]: ag-chain-cosmos.service: Main process exited, code=exited, status=1/FAILURE

@absorberch

absorberch commented Aug 26, 2021

Same here, I have got it on my RPC node

Aug 26 10:47:28 Agoric-RPCnode ag-chain-cosmos[1037588]: cannot read snapshot /root/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/2b3a7bd6674097027e4094b27e52a01382161337c8ee7a968dda4b48feb4ef9f-load-8WdaNm.xss: Bad address

https://www.dropbox.com/s/7uu886yow5e1ds5/nataagoricRPC-xs-snapshots.tar.gz?dl=0
https://www.dropbox.com/s/ng35gtkcs66s643/nataagoricRPC-agorictest17-chain.gz?dl=0

@Caneryy

Caneryy commented Aug 26, 2021

I have the same issue.

@smilby

smilby commented Aug 26, 2021

Hello, after a restart I have this error too.

Here are the full logs: https://docs.google.com/spreadsheets/d/1H2QPSKDmtv2b5wUn8EVz2CI4S8Lo2cDQq-XABAlzy8A/edit?usp=sharing

Aug 26 03:15:27 ubuntu-8gb-hel1-agoric ag-chain-cosmos[799734]: 2021-08-26T01:15:27.277Z block-manager: block 71157 begin
Aug 26 03:15:45 ubuntu-8gb-hel1-agoric systemd[1]: Stopping Agoric Cosmos daemon...
Aug 26 03:15:45 ubuntu-8gb-hel1-agoric systemd[1]: ag-chain-cosmos.service: Main process exited, code=exited, status=98/n/a
Aug 26 03:15:45 ubuntu-8gb-hel1-agoric systemd[1]: ag-chain-cosmos.service: Failed with result 'exit-code'.
Aug 26 03:15:45 ubuntu-8gb-hel1-agoric systemd[1]: Stopped Agoric Cosmos daemon.
Aug 26 03:15:45 ubuntu-8gb-hel1-agoric systemd[1]: Started Agoric Cosmos daemon.
Aug 26 03:15:48 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]: 3:15AM ERR WARNING: The minimum-gas-prices config in app.toml is set to the empty string. This defaults to 0 in the current version, but will error in the next version (SD>
Aug 26 03:15:50 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]: 2021-08-26T01:15:50.960Z launch-chain: Launching SwingSet kernel
Aug 26 03:15:51 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]: Prometheus scrape endpoint: http://0.0.0.0:9464/metrics
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: Logging sent error stack (RemoteError(error:liveSlots:v14#70257)#769)
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: RemoteError(error:liveSlots:v14#70257)#769: already have remote (a string)
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: Error: already have remote (a string)
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at construct ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at Error ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at makeError ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at fullRevive ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at unserialize ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at notifyOnePromise ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at notify ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at dispatchToUserspace ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at runWithoutMetering ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: RemoteError(error:liveSlots:v14#70257)#769 ERROR_NOTE: Rejection from: (Error#770) : 2147 . 0
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: RemoteError(error:liveSlots:v14#70257)#769 ERROR_NOTE: Rejection from: (Error#771) : 2146 . 1
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: RemoteError(error:liveSlots:v14#70257)#769 ERROR_NOTE: Sent as error:liveSlots:v8#70257
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: Error#770: Event: 2146.1
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: Error: Event: 2146.1
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at construct ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at Error ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at trackTurns ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at handle ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at pleaseProvision ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at pleaseProvision ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at win ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: Error#770 ERROR_NOTE: Caused by: (Error#771)
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: Error#771: Event: 2145.1
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]: Error: Event: 2145.1
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at construct ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at Error ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at trackTurns ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at handle ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at deliver ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at dispatchToUserspace ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at runWithoutMetering ()
Aug 26 03:16:06 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400644]:  at ()
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400659]: cannot read snapshot /root/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/52ab799b661522074f91c8ea3d6bf2282ff8e8c0e818db5ed720b8d88ca947f3-load-9bRFzP.xss: Bad a>
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]: 2021-08-26T01:16:28.203Z SwingSet: kernel: ##### KERNEL PANIC: unable to re-create vat v10 #####
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]: portHandler threw (ExitCode#1)
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]: ExitCode#1: v10:zoe exited: I/O error
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]:   at new ErrorCode (packages/xsnap/api.js:49:5)
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]:   at ChildProcess.<anonymous> (packages/xsnap/src/xsnap.js:124:22)
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]:   at ChildProcess.emit (events.js:400:28)
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400541]: Cannot initialize Controller ExitCode: v10:zoe exited: I/O error
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric systemd[1]: ag-chain-cosmos.service: Main process exited, code=exited, status=1/FAILURE
Aug 26 03:16:28 ubuntu-8gb-hel1-agoric systemd[1]: ag-chain-cosmos.service: Failed with result 'exit-code'.
Aug 26 03:16:31 ubuntu-8gb-hel1-agoric systemd[1]: ag-chain-cosmos.service: Scheduled restart job, restart counter is at 1.
Aug 26 03:16:31 ubuntu-8gb-hel1-agoric systemd[1]: Stopped Agoric Cosmos daemon.
Aug 26 03:16:31 ubuntu-8gb-hel1-agoric systemd[1]: Started Agoric Cosmos daemon.
Aug 26 03:16:32 ubuntu-8gb-hel1-agoric ag-chain-cosmos[1400670]: 3:16AM ERR WARNING: The minimum-gas-prices config in app.toml is set to the empty string. This defaults to 0 in the current version, but will error in the next version (SD>
lines 5854-5915

@alipostaci2001

alipostaci2001 commented Aug 26, 2021

image

I have this error!
Here is the link to my xs-snapshots files:
https://disk.yandex.com.tr/d/l70acR2IuO2ENw

@donperenjon87

I have the Bad address error. On the advice of an admin on Discord, I am publishing the files from my xs-snapshots folder here:
https://drive.google.com/drive/folders/1XDXGJ_8iMqi8kq6MX_lhK0D7fRtHkE3c?usp=sharing

image_2021-08-26_16-01-36

@jjangg96

1|ag-chain-cosmos  | 12:08PM INF starting ABCI with Tendermint
1|ag-chain-cosmos  | 12:08PM INF Starting multiAppConn service impl=multiAppConn module=proxy
1|ag-chain-cosmos  | 12:08PM INF Starting localClient service connection=query impl=localClient module=abci-client
1|ag-chain-cosmos  | 12:08PM INF Starting localClient service connection=snapshot impl=localClient module=abci-client
1|ag-chain-cosmos  | 12:08PM INF Starting localClient service connection=mempool impl=localClient module=abci-client
1|ag-chain-cosmos  | 12:08PM INF Starting localClient service connection=consensus impl=localClient module=abci-client
1|ag-chain-cosmos  | 12:08PM INF Starting EventBus service impl=EventBus module=events
1|ag-chain-cosmos  | 12:08PM INF Starting PubSub service impl=PubSub module=pubsub
1|ag-chain-cosmos  | 12:08PM INF Starting IndexerService service impl=IndexerService module=txindex
1|ag-chain-cosmos  | 12:08PM INF ABCI Handshake App Info hash="\x13\x04��S�N�Tt�\n\x1f+/�*\x10��}�u���\t�\x06���" height=74589 module=consensus protocol-version=0 software-version=0.26.15
1|ag-chain-cosmos  | 12:08PM INF ABCI Replay Blocks appHeight=74589 module=consensus stateHeight=74589 storeHeight=74590
1|ag-chain-cosmos  | 12:08PM INF Replay last block using real app module=consensus
1|ag-chain-cosmos  | 12:08PM INF minted coins from module account amount=387656ubld from=mint module=x/bank
1|ag-chain-cosmos  | 2021-08-26T12:08:29.898Z launch-chain: Launching SwingSet kernel
1|ag-chain-cosmos  | cannot read snapshot /home/ubuntu/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/1a5809881eb945b953d1b8a3325b2a4032302143205e368e0505877bcd9eca9b-load-QOByIT.xss: Bad address
1|ag-chain-cosmos  | 2021-08-26T12:09:10.893Z SwingSet: kernel: ##### KERNEL PANIC: unable to re-create vat v10 #####
1|ag-chain-cosmos  | portHandler threw (ExitCode#1)
1|ag-chain-cosmos  | ExitCode#1: v10:zoe exited: I/O error
1|ag-chain-cosmos  |   at new ErrorCode (packages/xsnap/api.js:49:5)
1|ag-chain-cosmos  |   at ChildProcess.<anonymous> (packages/xsnap/src/xsnap.js:124:22)
1|ag-chain-cosmos  |   at ChildProcess.emit (events.js:400:28)
1|ag-chain-cosmos  | Cannot initialize Controller ExitCode: v10:zoe exited: I/O error

https://drive.google.com/file/d/1PA9iun7nPk11EuaMsRIiAHXfYEnHBhpa/view?usp=sharing

@Syd-ai

Syd-ai commented Aug 26, 2021

Hello,

I also had the same issue commented here.

  1. My node was doing fine, but I did a restart and then the issue happened. It is stuck in the restart loop mentioned above, with the kernel panic error.
  2. Resetting and re-syncing from scratch took a long time but fixed the issue.
  3. Then I restarted again to do my last restart task, and the same issue happened again.

You can find here my xs-snapshots files : https://www.dropbox.com/s/qyz5clq5osc2wa2/xs-snapshots.zip?dl=0

@bakarapara

Got the same error. Node failed to restart.

full log - https://pastebin.com/7Y3J6tnN

@krisboit

Hello,
Same issue here: after a restart I got the kernel panic error message.

my xs-snapshots files are here: https://drive.google.com/file/d/1y76aeF4C_29wMkCmCkhqNkYgilA0YKPz/view?usp=sharing

@dckc
Member Author

dckc commented Aug 26, 2021

For each task where you are struggling due to an issue beyond your control (such as this one), go ahead and fill out the task in the knack portal before the deadline, and include the URL of this issue https://github.com/Agoric/testnet-notes/issues/33 to explain why you're having trouble.

If you later accomplish the task, just submit again.

@dckc
Member Author

dckc commented Aug 26, 2021

Thank you, @humantraffic , @krisboit, @alipostaci2001 ; I managed to download your xs-snapshots directories.

I probably don't need any more, but thanks, everybody!

@mrixl

mrixl commented Aug 26, 2021

I restarted my node and this error appeared:
ag-chain-cosmos[11381]: cannot read snapshot /home/mirxl/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/b382c9e68a0e18d73280298d1aeda03ec76346b43cbc8146e5f940a568cf8062-load-cWHBIP.xss: Bad address

@dckc
Member Author

dckc commented Aug 26, 2021

Thanks, @mrixl ... but for others coming here, I don't think we need more logs that look pretty much the same. Feel free to just 👍 the issue or something, and as I say in https://github.com/Agoric/testnet-notes/issues/33#issuecomment-906471131 , cite the URL of this issue in knack portal submissions.

@dckc
Member Author

dckc commented Aug 26, 2021

@warner do you want the whole ag-chain-cosmos state directory?

@dckc
Member Author

dckc commented Aug 26, 2021

Would a few of you please share your whole .ag-chain-cosmos directory?

It would probably save us time in reproducing the problem.

Sorry I didn't ask for it in the first place.

@humantraffic , @krisboit, @alipostaci2001 @sshamanov @Syd-ai

@asifhj

asifhj commented Aug 26, 2021

Could not complete restart task due to Bad address issue.
image

@edwardmorra-btc

edwardmorra-btc commented Aug 26, 2021

Hey! I have the same issue.
After the restart task, my node got stuck with the kernel panic error discussed here and on our Discord. My logs are attached below; I hope that helps.
Thank you for your time!

https://www.dropbox.com/s/er4lmj3jlfc8fo7/log.txt?dl=0

UPD:
https://www.dropbox.com/s/5cj1sq33g5nde0p/xs-snapshots.zip?dl=0

@aditya-manit

+1, Looks like a popular issue 😛 😛

@humantraffic

> Would a few of you please share your whole .ag-chain-cosmos directory?
>
> It would probably save us time in reproducing the problem.
>
> Sorry I didn't ask for it in the first place.
>
> @humantraffic , @krisboit, @alipostaci2001 @sshamanov @Syd-ai

yeah, np.
https://drive.google.com/file/d/1n_EnE9Juhxq30MLIKpwNd3MENw6uM6CE/view?usp=sharing

@kalpatech-team

@Syd-ai

Syd-ai commented Aug 27, 2021

> Would a few of you please share your whole .ag-chain-cosmos directory?
>
> It would probably save us time in reproducing the problem.
>
> Sorry I didn't ask for it in the first place.
>
> @humantraffic , @krisboit, @alipostaci2001 @sshamanov @Syd-ai

Here you go

https://drive.google.com/file/d/1QoiLuAvlh9x5prb01KJ6Lk3ARNvJ7lRF/view?usp=sharing

Happy investigation 🙏

@sshamanov

> Would a few of you please share your whole .ag-chain-cosmos directory?
>
> It would probably save us time in reproducing the problem.
>
> Sorry I didn't ask for it in the first place.
>
> @humantraffic , @krisboit, @alipostaci2001 @sshamanov @Syd-ai

https://disk.yandex.ru/d/7CChawS92qeyVw

@niocris

niocris commented Aug 28, 2021

Hello, exactly the same problem, my post Agoric/testnet-notes#38

@dckc
Member Author

dckc commented Aug 30, 2021

Thanks. It looks like I have a couple full node state backups now.

jupyter@slog45nb:~$ ls -lR dx-collect/33-panic/
dx-collect/33-panic/:
total 8
drwxr-xr-x 2 jupyter jupyter 4096 Aug 27 20:00 Syd-ai
drwxr-xr-x 2 jupyter jupyter 4096 Aug 27 18:29 humantraffic

dx-collect/33-panic/Syd-ai:
total 13578524
-rw-r--r-- 1 jupyter jupyter 13904400441 Aug 27 19:58 ag-chain-cosmos-SYD.zip

dx-collect/33-panic/humantraffic:
total 12596552
-rw-r--r-- 1 jupyter jupyter 12898861668 Aug 27 18:15 ag-chain-cosmos.tar.gz

p.s. I think object storage is a better fit for .tar.gz files...

jupyter@slog45nb:~$ gsutil -m rsync -r dx-collect/ gs://slogfile-upload-5/dx-collect/

WARNING: gsutil rsync uses hashes when modification time is not available at
both the source and destination. Your crcmod installation isn't using the
module's C extension, so checksumming will run very slowly. If this is your
first rsync since updating gsutil, this rsync can take significantly longer than
usual. For help installing the extension, please see "gsutil help crcmod".

Building synchronization state...
Starting synchronization...
Copying file://dx-collect/33-panic/Syd-ai/ag-chain-cosmos-SYD.zip [Content-Type=application/zip]...
==> NOTE: You are uploading one or more large file(s), which would run          
significantly faster if you enable parallel composite uploads. This
feature can be enabled by editing the
"parallel_composite_upload_threshold" value in your .boto
configuration file. However, note that if you do this large files will
be uploaded as `composite objects
<https://cloud.google.com/storage/docs/composite-objects>`_,which
means that any user who downloads such objects will need to have a
compiled crcmod installed (see "gsutil help crcmod"). This is because
without a compiled crcmod, computing checksums on composite objects is
so slow that gsutil disables downloads of composite objects.

Copying file://dx-collect/33-panic/humantraffic/ag-chain-cosmos.tar.gz [Content-Type=application/x-tar]...
| [2/2 files][ 25.0 GiB/ 25.0 GiB] 100% Done  81.8 MiB/s ETA 00:00:00           
Operation completed over 2 objects/25.0 GiB. 

@jennyhys

jennyhys commented Sep 3, 2021

same problem happened to our node as well

@jennyhys

jennyhys commented Sep 7, 2021

@dckc any idea what might have gone wrong? Should I share my xs-snapshots here as well?

@dckc
Member Author

dckc commented Sep 7, 2021

I can reproduce the symptoms by trying to load the snapshot into one of our tools:

connolly@jambox:~/projects/agoric/agoric-sdk/packages/xsnap$ ./moddable/build/bin/lin/release/xsnap -r ~/Downloads/8bae75381c20d536812b972f61b52ae4f8ed4a83ad293070bf8a57e7f87d4e0c-load-7IUrAb.xss 
cannot read snapshot /home/connolly/Downloads/8bae75381c20d536812b972f61b52ae4f8ed4a83ad293070bf8a57e7f87d4e0c-load-7IUrAb.xss: Bad address

I'm struggling to come up with a more detailed diagnosis. I have reached out to our collaborators at Moddable for help.

p.s. @warner it does not look like a case of deleting a snapshot too early. Both the compressed snapshot and the uncompressed snapshot are there in the contributed diagnostic materials.

It's a little interesting that we don't delete the uncompressed snapshot in this error case. I don't think that was by design, but it's somewhat fortunate in this case.
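
One integrity check that could be run on the contributed snapshot files is to compare each file's contents against the hash in its name. This is a minimal sketch, assuming (it is not confirmed in this thread) that the 64 hex digits at the start of the file name are the SHA-256 of the uncompressed snapshot. `snapshotHashMatches` is a hypothetical helper, not part of agoric-sdk or xsnap.

```javascript
const { createHash } = require('node:crypto');
const { readFileSync } = require('node:fs');
const { basename } = require('node:path');

// Assumption: the file name starts with the SHA-256 (hex) of the
// uncompressed snapshot contents, e.g. "<hash>-load-7IUrAb.xss".
function snapshotHashMatches(file) {
  const m = /^([0-9a-f]{64})/.exec(basename(file));
  if (!m) throw new Error(`no 64-hex-digit prefix in ${file}`);
  // Reads the whole file into memory; fine for a sketch, but a
  // streaming hash would be kinder to very large snapshots.
  const actual = createHash('sha256').update(readFileSync(file)).digest('hex');
  return { expected: m[1], actual, ok: actual === m[1] };
}
```

If the assumption holds, a match would suggest the bytes on disk are the ones that were originally written, which would point away from simple file corruption; it would not by itself explain the Bad address error.
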

@dckc
Member Author

dckc commented Sep 21, 2021

Using the swingset-tools branch (7f7fb5125) I was able to replay the first few deliveries:

jupyter@slog45nb:~/agoric-sdk$ git describe --tags --always
agorictest-17-101-g7f7fb5125
jupyter@slog45nb:~/agoric-sdk$ git branch
  master
* swingset-tools

jupyter@slog45nb:~/33-panic$ wc transcript-v10.sst 
   255844  32108212 433442037 transcript-v10.sst

jupyter@slog45nb:~/33-panic$ node ~/agoric-sdk/packages/SwingSet/misc-tools/replay-transcript.js transcript-v10.sst 
argv [ 'transcript-v10.sst' ]
using transcript transcript-v10.sst
creating xsnap helper bundles
xs bundles written
xsnap helper bundles created
manager created
delivery 3: ["message","o+0",{"method":"buildZoe","args":{"body":"[{\"@qclass\":\"slot\",\"iface\":\"Alleged: vatAdminService\",\"index\":0},{\"assetKind\":\"nat\",\"displayInfo\":{\"assetKind\":\"nat\",\"decimal
delivery 4: ["notify",[["p-60",false,{"body":"{\"@qclass\":\"slot\",\"iface\":\"Alleged: timerService\",\"index\":0}","slots":["o-51"]}]]]
...
delivery 23: ["dropExports",["o+20"]]
anachrophobia strikes vat v10
delivery completed with 3 expected syscalls remaining
expected: {"0":"dropImports","1":{"0":"o-63","length":1},"length":2}
expected: {"0":"retireImports","1":{"0":"o-63","length":1},"length":2}
expected: {"0":"retireExports","1":{"0":"o+20","length":1},"length":2}
RUN ERR (Error#1)
Error#1: historical inaccuracy in replay of v10
  at Object.finishReplayDelivery (file:///home/jupyter/agoric-sdk/packages/SwingSet/src/kernel/vatManager/transcript.js:91:23)
  at Object.replayOneDelivery (file:///home/jupyter/agoric-sdk/packages/SwingSet/src/kernel/vatManager/manager-helper.js:176:23)
  at processTicksAndRejections (node:internal/process/task_queues:96:5)
  at async replay (file:///home/jupyter/agoric-sdk/packages/SwingSet/misc-tools/replay-transcript.js:171:7)
  at async run (file:///home/jupyter/agoric-sdk/packages/SwingSet/misc-tools/replay-transcript.js:191:3)
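
The failure above has the shape "the transcript expected more syscalls than the replayed delivery made". A simplified model of that check follows; it is illustrative only and is not the SwingSet code in transcript.js. `checkReplayedDelivery` is a hypothetical name, and syscalls are written as plain arrays rather than the array-like objects printed in the log.

```javascript
// Simplified model of transcript replay; not the SwingSet implementation.
// expectedSyscalls: syscalls recorded in the transcript for one delivery.
// actualSyscalls: syscalls the vat made when the delivery was replayed.
function checkReplayedDelivery(expectedSyscalls, actualSyscalls) {
  const remaining = [...expectedSyscalls];
  for (const actual of actualSyscalls) {
    const expected = remaining.shift();
    // Compare by serialized form; good enough for plain JSON data.
    if (JSON.stringify(expected) !== JSON.stringify(actual)) {
      throw new Error(
        `syscall mismatch: expected ${JSON.stringify(expected)}, ` +
          `got ${JSON.stringify(actual)}`,
      );
    }
  }
  if (remaining.length > 0) {
    throw new Error(
      `delivery completed with ${remaining.length} expected syscalls remaining`,
    );
  }
}
```

In this model, delivery 23 corresponds to three recorded syscalls (`dropImports`, `retireImports`, `retireExports`) and an empty list of replayed ones.
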

earlier episode:

replay tool crashed: Cannot read property 'unmetered' of undefined

Ouch... now what? hm.

jupyter@slog45nb:~/33-panic$ node ~/agoric-sdk/packages/SwingSet/bin/replay-transcript.js transcript-v10.sst 
argv [ 'transcript-v10.sst' ]
replay-one-vat.js transcript.sst
using transcript transcript-v10.sst
RUN ERR (TypeError#1)
TypeError#1: Cannot read property 'unmetered' of undefined
  at build (file:///home/jupyter/agoric-sdk/packages/SwingSet/src/kernel/liveSlots.js:416:45)
  at makeLiveSlots (file:///home/jupyter/agoric-sdk/packages/SwingSet/src/kernel/liveSlots.js:1173:13)
  at Object.createFromBundle (file:///home/jupyter/agoric-sdk/packages/SwingSet/src/kernel/vatManager/manager-local.js:108:16)
  at replay (file:///home/jupyter/agoric-sdk/packages/SwingSet/bin/replay-transcript.js:132:31)
  at processTicksAndRejections (node:internal/process/task_queues:96:5)
  at async run (file:///home/jupyter/agoric-sdk/packages/SwingSet/bin/replay-transcript.js:170:3)

version info

jupyter@slog45nb:~/33-panic$ node --version
v16.6.1
jupyter@slog45nb:~/agoric-sdk$ git describe --tags --always
@agoric/access-token@0.4.13-27-g44cd72f8e

How the log file was extracted

jupyter@slog45nb:~/33-panic$ node ~/agoric-sdk/packages/SwingSet/bin/extract-transcript-from-slogfile.js humantraffic-agorictest17-chain.slog.gz v10 > ,out 2> ,err

jupyter@slog45nb:~/33-panic$ wc transcript-v10.sst 
   255844  32108212 433442037 transcript-v10.sst

jupyter@slog45nb:~/33-panic$ ls ~/agoric-sdk/packages/SwingSet/bin/
extract-transcript-from-kerneldb.js extract-transcript-from-slogfile.js rekernelize replay-transcript.js vat
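
For a quick look at one vat's entries without the extraction tool, a slogfile can be filtered directly. This is a rough sketch under two assumptions not stated in this thread: that the slogfile is gzipped JSON with one entry per line, and that per-vat entries carry a `vatID` field. `entriesForVat` is a hypothetical helper and does not reproduce what extract-transcript-from-slogfile.js does.

```javascript
const { gunzipSync } = require('node:zlib');

// Assumptions: `slogGz` is a Buffer holding a gzipped slogfile with one
// JSON object per line, and per-vat entries have a `vatID` property.
function entriesForVat(slogGz, vatID) {
  const text = gunzipSync(slogGz).toString('utf8');
  const entries = [];
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue; // skip blank lines
    const entry = JSON.parse(line);
    if (entry.vatID === vatID) entries.push(entry);
  }
  return entries;
}
```

This loads the whole file into memory, so it only suits small excerpts; a slogfile of the size seen in this issue would need a streaming reader instead.
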

dckc transferred this issue from Agoric/testnet-notes Sep 22, 2021
dckc added this to the Beta Phase 4: Governance milestone Sep 22, 2021
warner added a commit that referenced this issue Sep 29, 2021
* fix a major memory leak: 64 bytes per Map `delete()`, 32 per Set `delete()`
  * should: closes #3839
* unfortunately Map/Set deletion is now O(N) not O(1)
* possibly fix #3877 "cannot read (corrupted?) snapshot"

Note that this breaks snapshot compatibility, and probably metering
compatibility.

closes #3889
warner added a commit that referenced this issue Sep 30, 2021
We upgrade the XS submodule to the latest version:
Moddable-OpenSource/moddable@10cc52e

This fixes a major memory leak: 64 bytes per Map `delete()`, 32 per Set
`delete()`. We believe this should: closes #3839

Unfortunately Map/Set deletion is now O(N) not O(1).

This version of XS also fixes a bug that might be the cause of #3877 "cannot
read (corrupted?) snapshot", but we're not sure.

Note that this breaks snapshot compatibility (snapshots created before this
version cannot be loaded by code after this version). It might also
break metering compatibility, but the chances seem low enough that we decided
to leave the metering version alone.

closes #3889
dckc added the bug, testnet-problem, and xsnap labels Oct 6, 2021