upgrade-info.json not dumped correctly #10428

Closed
4 tasks
RiccardoM opened this issue Oct 25, 2021 · 10 comments · Fixed by #10532
Labels: C:Cosmovisor (Issues and PR related to Cosmovisor), T:Bug

Comments

@RiccardoM
Contributor

RiccardoM commented Oct 25, 2021

Summary of bug

During an on-chain upgrade, the upgrade-info.json file is not dumped correctly, resulting in a chain halt.

Version

v0.44.2

Steps to reproduce

Context

On our testnet, we have always used Cosmovisor v0.1.0 to manage on-chain upgrades. It always worked fine: the upgrade info was read properly from the chain and the binaries were downloaded and run correctly.

Recently, since v0.1.0 does not support automatic downloads for Cosmos v0.44.x binaries (see here), we asked every node operator to update to Cosmovisor v1.0.0. Unfortunately, this caused many errors at various stages. Below is everything we did when upgrading to Cosmovisor v1.0.0, along with all the errors that were raised.

Upgrading procedure

1. We installed the correct Cosmovisor version by running:

go install github.com/cosmos/cosmos-sdk/cosmovisor/cmd/cosmovisor@v1.0.0

2. We tried verifying the running Cosmovisor version:

cosmovisor version

This raised the first error:

ERR failed to read error="lstat /root/.desmos/cosmovisor/current/upgrade-info.json: no such file or directory" filename=/root/.desmos/cosmovisor/current/upgrade-info.json module=cosmovisor

This seems to be caused by Cosmovisor looking for that file and raising an error when it is not found. However, after running cosmovisor start, the chain restarts properly without problems even if that file is not present.

3. We tried an on-chain upgrade. This is where things went really badly, even halting our chain.
We submitted the following upgrade proposal on our testnet the same way we always did: Desmos v2.1.0 upgrade proposal. We specified an upgrade info link that points to a valid JSON file.

When the upgrade height came, however, the upgrade failed with the following error:

INF pre-upgrade command does not exist. continuing the upgrade. module=cosmovisor
INF No upgrade binary found, beginning to download it module=cosmovisor
ERR  error="cannot download binary. downloading reference link : invalid source string: " module=cosmovisor

I then took a look at the ~/.desmos/data/upgrade-info.json file and it looked like this:

{
  "name":"v2.1.0",
  "height":2589025
}

It appears that the upgrade info is not dumped properly, even though it was present on chain (see the proposal details here). This prevented Cosmovisor from downloading the binaries, effectively halting our chain.

We solved this problem by manually editing the ~/.desmos/data/upgrade-info.json contents to add the info field:
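
As a rough way to spot this condition before an upgrade height is reached, the dumped file can be decoded and checked for a non-empty info field. The sketch below is only an illustration: the `upgradeInfo` type and `parseUpgradeInfo` helper are hypothetical names, modelled on the three fields visible in the files quoted in this issue, and are not taken from the SDK or Cosmovisor.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// upgradeInfo mirrors the fields seen in the upgrade-info.json files quoted
// in this issue. Hypothetical type for illustration, not the SDK's own.
type upgradeInfo struct {
	Name   string `json:"name"`
	Height int64  `json:"height"`
	Info   string `json:"info,omitempty"`
}

// parseUpgradeInfo decodes the file contents and reports whether the
// "info" field is present and non-empty.
func parseUpgradeInfo(data []byte) (upgradeInfo, bool, error) {
	var ui upgradeInfo
	if err := json.Unmarshal(data, &ui); err != nil {
		return upgradeInfo{}, false, err
	}
	return ui, ui.Info != "", nil
}

func main() {
	// Same shape as the file found at ~/.desmos/data/upgrade-info.json.
	dumped := []byte(`{"name":"v2.1.0","height":2589025}`)

	ui, hasInfo, err := parseUpgradeInfo(dumped)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("name=%s height=%d hasInfo=%t\n", ui.Name, ui.Height, hasInfo)
}
```

With the file shown above, `hasInfo` comes back false, which matches the empty source string in the download error.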

{
  "name":"v2.1.0",
  "height":2589025,
  "info":"https://raw.githubusercontent.com/desmos-labs/morpheus/master/morpheus-apollo-2/upgrades/v2.1.0.json?checksum=sha256:8c58278bbaa8a01a91f1fbaa60e1fae08969dc72f2848be4b45e278d74dc3a86"
}

After restarting Cosmovisor, this triggered the upgrade successfully and started downloading the files. It even copied the upgrade-info.json contents correctly inside the ~/.desmos/cosmovisor/upgrades/v2.1.0/upgrade-info.json file.
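
For operators who need to apply the same workaround on several nodes, the manual edit could be scripted. The following is a minimal sketch under stated assumptions: `addInfo` and `readInfo` are hypothetical helpers, the info URL in `main` is a placeholder, and writing the result back to disk is left out. It only adds or overwrites the info key and leaves the other fields untouched.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// addInfo returns a copy of the upgrade-info.json contents with the "info"
// key set to the given value. Other keys are kept as raw JSON so their
// values are not reinterpreted.
func addInfo(data []byte, info string) ([]byte, error) {
	fields := map[string]json.RawMessage{}
	if err := json.Unmarshal(data, &fields); err != nil {
		return nil, err
	}
	raw, err := json.Marshal(info)
	if err != nil {
		return nil, err
	}
	fields["info"] = raw
	return json.MarshalIndent(fields, "", "  ")
}

// readInfo extracts the "info" value, returning "" when it is absent.
func readInfo(data []byte) (string, error) {
	var v struct {
		Info string `json:"info"`
	}
	if err := json.Unmarshal(data, &v); err != nil {
		return "", err
	}
	return v.Info, nil
}

func main() {
	dumped := []byte(`{"name":"v2.1.0","height":2589025}`)

	// Placeholder link; in practice this is the info value from the on-chain plan.
	patched, err := addInfo(dumped, "https://example.com/upgrades/v2.1.0.json?checksum=sha256:...")
	if err != nil {
		fmt.Println("patch error:", err)
		return
	}
	fmt.Println(string(patched))
}
```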

Additional info

  • Both software versions (pre-upgrade and post-upgrade) are based on Cosmos v0.44.x
  • The pre-upgrade software version was Desmos v2.0.0
  • The post-upgrade software version is Desmos v2.1.0-testnet

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@RiccardoM RiccardoM changed the title from "Upgrading to Cosmovisor v1.0.0 throws multiple errors" to "upgrade-info.json not dumped correctly" on Oct 29, 2021
@RiccardoM
Contributor Author

RiccardoM commented Oct 29, 2021

Update

Yesterday we had another on-chain upgrade, again using Cosmovisor 1.0.0. The same thing happened: the upgrade plan was not dumped correctly and the chain halted.

Upgrade proposal (one block before upgrade height)

$ desmos q gov proposal 28 --height 2643234 --output json | jq .
{
  "proposal_id": "28",
  "content": {
    "@type": "/cosmos.upgrade.v1beta1.SoftwareUpgradeProposal",
    "title": "Desmos v2.2.0 upgrade",
    "description": "**Desmos v2.2.0 upgrade**\\nThis proposal aims at upgrading the chain software to version v2.2.0. The main change inside this version consists in a bug fix inside the `x/profiles` module. You can read more about such changes [here](https://github.com/desmos-labs/desmos/releases/tag/v2.2.0-testnet). Please note that in order to make sure your node can upgrade safely, you will need to have [Cosmovisor 1.0.0](https://github.com/cosmos/cosmos-sdk/releases/tag/cosmovisor%2Fv1.0.0) installed. If you need help installing it please join [our Discord chat](https://discord.gg/SgqMy8VYE8). \\n\\n By voting YES to this proposal you will signal that you are ready for the upgrade. If this proposal passes, the upgrade will be scheduled to happen at height 2.643.235 which will be around Thursday 28th October at 14:00 UTC.",
    "plan": {
      "name": "v2.2.0",
      "time": "0001-01-01T00:00:00Z",
      "height": "2643235",
      "info": "https://raw.githubusercontent.com/desmos-labs/morpheus/master/morpheus-apollo-2/upgrades/v2.2.0.json?checksum=sha256:6b0b4e97d8c40c8b7ed5193653dd096f84d6a5a627c13a692288dc98f1a97dd4",
      "upgraded_client_state": null
    }
  },
  "status": "PROPOSAL_STATUS_PASSED",
  "final_tally_result": {
    "yes": "17061940745845",
    "abstain": "0",
    "no": "0",
    "no_with_veto": "0"
  },
  "submit_time": "2021-10-26T12:39:35.069121980Z",
  "deposit_end_time": "2021-10-28T12:39:35.069121980Z",
  "total_deposit": [
    {
      "denom": "udaric",
      "amount": "10000000"
    }
  ],
  "voting_start_time": "2021-10-26T12:39:35.069121980Z",
  "voting_end_time": "2021-10-28T12:39:35.069121980Z"
}

On-chain upgrade plan (one block before the upgrade height)

$ desmos q upgrade plan --height 2643234 --output json | jq .
{
  "name": "v2.2.0",
  "time": "0001-01-01T00:00:00Z",
  "height": "2643235",
  "info": "https://raw.githubusercontent.com/desmos-labs/morpheus/master/morpheus-apollo-2/upgrades/v2.2.0.json?checksum=sha256:6b0b4e97d8c40c8b7ed5193653dd096f84d6a5a627c13a692288dc98f1a97dd4",
  "upgraded_client_state": null
}

Dumped JSON (at the upgrade height)

$ cat ~/.desmos/data/upgrade-info.json | jq .
{
  "name":"v2.2.0",
  "height":2643235
}

Error logs

Started Desmos Full Node.
 5:13AM INF Configuration is valid:
 Configurable Values:
   DAEMON_HOME: /home/forbole/.desmos
   DAEMON_NAME: desmos
   DAEMON_ALLOW_DOWNLOAD_BINARIES: true
   DAEMON_RESTART_AFTER_UPGRADE: true
   DAEMON_POLL_INTERVAL: 300ms
   UNSAFE_SKIP_BACKUP: true
   DAEMON_PREUPGRADE_MAX_RETRIES: 0
 Derived Values:
         Root Dir: /home/forbole/.desmos/cosmovisor
      Upgrade Dir: /home/forbole/.desmos/cosmovisor/upgrades
      Genesis Bin: /home/forbole/.desmos/cosmovisor/genesis/bin/desmos
   Monitored File: /home/forbole/.desmos/data/upgrade-info.json
  module=cosmovisor
 5:13AM INF running app args=["start"] module=cosmovisor path=/home/forbole/.desmos/cosmovisor/upgrades/v2.1.0/bin/desmos
 5:13AM INF Daemon shutting down in an attempt to restart module=cosmovisor
 5:13AM INF pre-upgrade command does not exist. continuing the upgrade. module=cosmovisor
 5:13AM INF No upgrade binary found, beginning to download it module=cosmovisor
 5:13AM ERR  error="cannot download binary. downloading reference link : invalid source string: " module=cosmovisor
Main process exited, code=exited, status=1/FAILURE

@robert-zaremba robert-zaremba added C:Cosmovisor Issues and PR related to Cosmovisor T:Bug labels Oct 29, 2021
@robert-zaremba
Collaborator

Thanks for a clear and detailed report. It looks to be urgent.

@robert-zaremba
Collaborator

@yaruwangway - do you have time to investigate this issue?

@yaruwangway
Contributor

Hi @RiccardoM, in this valid JSON file, can you delete the spaces between all the key-value pairs to see if it works again? For example:

  • delete the space between "binaries": and {,
  • delete the space between "darwin/amd64": and "https:
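
For reference, here is a small sketch of how a file of this shape (a "binaries" object keyed by OS/architecture) can be decoded in Go. The `upgradeConfig` type, the `binaryURL` helper, and the example.com URL are assumptions made for illustration; this is not Cosmovisor's code. It also shows that Go's `encoding/json` treats the compact and the spaced form identically, since whitespace between JSON tokens is insignificant. Whether every tool in the upgrade path uses such a decoder is not something this sketch establishes.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// upgradeConfig models only the "binaries" object mentioned above.
// Hypothetical type for illustration.
type upgradeConfig struct {
	Binaries map[string]string `json:"binaries"`
}

// binaryURL returns the download link listed for the given "os/arch" key.
func binaryURL(data []byte, osArch string) (string, error) {
	var cfg upgradeConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return "", err
	}
	link, ok := cfg.Binaries[osArch]
	if !ok {
		return "", errors.New("no binary listed for " + osArch)
	}
	return link, nil
}

func main() {
	compact := []byte(`{"binaries":{"darwin/amd64":"https://example.com/app"}}`)
	spaced := []byte(`{ "binaries": { "darwin/amd64": "https://example.com/app" } }`)

	a, errA := binaryURL(compact, "darwin/amd64")
	b, errB := binaryURL(spaced, "darwin/amd64")
	fmt.Println(a, errA)
	fmt.Println(b, errB)
	fmt.Println("same result:", a == b)
}
```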

@yaruwangway
Contributor

Hi @RiccardoM, can you also let me know which SDK version you are upgrading from when moving to SDK v0.44.2?

@RiccardoM
Contributor Author

@yaruwangway We can't try editing the JSON file because the upgrade plan contains a checksum of it, so changing the contents would invalidate the checksum and cause the upgrade to fail. Plus, we managed to solve the issue and the upgrade has now passed. I would refrain from testing this on our running testnet since it might cause other disruptions. It might be better to set up some unit tests instead, either inside Cosmovisor or the x/upgrade module.

About your second question, we were not upgrading to any Cosmos SDK version during either of the two mentioned upgrades. The Cosmos SDK upgrade was performed in a previous upgrade that did not have any problems.
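
To illustrate why even a whitespace-only edit breaks the upgrade, here is a standard-library sketch of the check implied by the `?checksum=sha256:...` suffix on the info link. The helper names `checksumFromLink` and `matches` are hypothetical, and this is not a description of how Cosmovisor performs the verification internally; it only shows that SHA-256 is computed over the raw bytes, so any change to the file changes the digest.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// checksumFromLink extracts the hex digest from a "checksum=sha256:<hex>"
// query parameter.
func checksumFromLink(link string) (string, error) {
	u, err := url.Parse(link)
	if err != nil {
		return "", err
	}
	const prefix = "sha256:"
	value := u.Query().Get("checksum")
	if !strings.HasPrefix(value, prefix) {
		return "", errors.New("link has no sha256 checksum parameter")
	}
	return strings.TrimPrefix(value, prefix), nil
}

// matches reports whether the SHA-256 digest of content equals wantHex.
func matches(content []byte, wantHex string) bool {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:]) == strings.ToLower(wantHex)
}

func main() {
	original := []byte(`{"binaries": {"linux/amd64": "https://example.com/app"}}`)
	edited := []byte(`{"binaries":{"linux/amd64":"https://example.com/app"}}`)

	// Build a link carrying the digest of the original bytes.
	sum := sha256.Sum256(original)
	link := "https://example.com/upgrade.json?checksum=sha256:" + hex.EncodeToString(sum[:])

	want, err := checksumFromLink(link)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("original matches:", matches(original, want))
	fmt.Println("edited matches:  ", matches(edited, want))
}
```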

@robert-zaremba
Collaborator

@RiccardoM , did you identify the issue?

@RiccardoM
Contributor Author

@robert-zaremba No, I haven't. I've checked the x/upgrade module and did not find anything there. I have not had time to look at the Cosmovisor code, though.

@dwedul-figure
Collaborator

After doing some research on this, there are a couple of things I'd like to address.

Automatic upgrades not working:

The upgrade module in v0.44.2 of Cosmos-SDK does not output the plan info in the data/upgrade-info.json file. It only contains the name and height. With PR #8590, the "info" field was added and the cosmovisor code was updated to use it. That PR has been merged to master, but is not yet part of any released version of the Cosmos-SDK. However, it is part of the cosmovisor v1.0.0 release.

Basically, the most recent version of cosmovisor won't do automatic upgrades for chains based on Cosmos-SDK prior to v0.45 (which isn't out yet).
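
The mismatch can be pictured with two serialization shapes. This is a simplified sketch, not the SDK source: the `plan`, `dumpedWithoutInfo`, and `dumpedWithInfo` types and both `dump*` functions are hypothetical stand-ins for "a writer that only emits name and height" and "a writer that also emits info".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// plan holds the on-chain values relevant here. Hypothetical type.
type plan struct {
	Name   string
	Height int64
	Info   string
}

// dumpedWithoutInfo stands in for a writer that omits the info field.
type dumpedWithoutInfo struct {
	Name   string `json:"name"`
	Height int64  `json:"height"`
}

// dumpedWithInfo stands in for a writer that includes the info field.
type dumpedWithInfo struct {
	Name   string `json:"name"`
	Height int64  `json:"height"`
	Info   string `json:"info,omitempty"`
}

func dumpWithoutInfo(p plan) (string, error) {
	out, err := json.Marshal(dumpedWithoutInfo{Name: p.Name, Height: p.Height})
	return string(out), err
}

func dumpWithInfo(p plan) (string, error) {
	out, err := json.Marshal(dumpedWithInfo{Name: p.Name, Height: p.Height, Info: p.Info})
	return string(out), err
}

func main() {
	// Placeholder info link; the real one is in the proposal quoted earlier.
	p := plan{Name: "v2.2.0", Height: 2643235, Info: "https://example.com/v2.2.0.json"}

	without, _ := dumpWithoutInfo(p)
	with, _ := dumpWithInfo(p)
	fmt.Println(without)
	fmt.Println(with)
}
```

A reader that expects the second shape but is handed the first sees an empty info string, which is consistent with the "invalid source string" error in the logs above.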

Error message in cosmovisor version:

This message is mostly harmless:

ERR failed to read error="lstat /root/.desmos/cosmovisor/current/upgrade-info.json: no such file or directory" filename=/root/.desmos/cosmovisor/current/upgrade-info.json module=cosmovisor

What's happening is that the version of cosmovisor is being printed, then it's attempting to run your configured daemon using the version command (to output version info of your daemon as well). It's using the same run process that it uses for any other command provided to it. Part of that involves loading that file if available. It's made even more confusing because currently, there's a bug where the version string is blank after installing via go install cosmovisor.

After all that, though, I don't think there will ever automatically be an upgrade-info.json file in the genesis directory. So that case can be better handled to suppress that message; either change it to an info log or else don't log it if current points to genesis.
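
One way to express the "don't log it as an error" idea is to treat a missing file as a normal, non-error outcome. The sketch below uses only the Go standard library; `readOptional` is a hypothetical helper and the path in `main` is a made-up example, so this is not the actual Cosmovisor change.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// readOptional reads a file that is allowed to be absent.
// It returns found=false and a nil error when the file does not exist,
// and only returns an error for other failures (permissions, I/O, ...).
func readOptional(path string) (data []byte, found bool, err error) {
	data, err = os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return nil, false, nil
	}
	if err != nil {
		return nil, false, err
	}
	return data, true, nil
}

func main() {
	// Made-up location that should not exist on the machine running this.
	path := filepath.Join(os.TempDir(), "cosmovisor-demo-missing", "upgrade-info.json")

	_, found, err := readOptional(path)
	switch {
	case err != nil:
		fmt.Println("ERR failed to read:", err)
	case !found:
		fmt.Println("INF no upgrade-info.json present, continuing")
	default:
		fmt.Println("INF upgrade-info.json loaded")
	}
}
```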

mergify bot pushed a commit to desmos-labs/desmos that referenced this issue Nov 2, 2021
## Description
This PR updates Cosmos to `v0.44.3` and also cherry picks the changes that have been made inside cosmos/cosmos-sdk#8590 to fix the issue described in cosmos/cosmos-sdk#10428

@robert-zaremba
Collaborator

robert-zaremba commented Nov 18, 2021

Indeed, we were missing Info in the dumped file. This is in master but was not in 0.44.x. We have backported it in #10532 and it will be part of the next 0.44.x release.

4 participants