Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Upgrade from v0.47.1 -> v0.47.2 causing issues #16027

Closed
ToasterTheBrave opened this issue May 3, 2023 · 16 comments
Closed

[Bug]: Upgrade from v0.47.1 -> v0.47.2 causing issues #16027

ToasterTheBrave opened this issue May 3, 2023 · 16 comments
Assignees

Comments

@ToasterTheBrave
Copy link

Summary of Bug

When upgrading from cosmos-sdk@v0.47.1 to cosmos-sdk@v0.47.2 and starting a new testnet, the testnet fails to start and produce blocks.

The same steps worked previously in cosmos-sdk@v0.47.1. After the upgrade, no blocks are being created and no obvious error messages are being produced. The best I have is the output of the status.

# curl -S localhost:26657/status | json_pp
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   839  100   839    0     0  26764      0 --:--:-- --:--:-- --:--:-- 27966
{
   "id" : -1,
   "jsonrpc" : "2.0",
   "result" : {
      "node_info" : {
         "channels" : "40202122233038606100",
         "id" : "01038f88c61bbfdf38c79a6c50b207fcd1ee70f3",
         "listen_addr" : "192.168.56.21:26656",
         "moniker" : "paloma-priv-testnet-m1",
         "network" : "private-mousedove-432E2A",
         "other" : {
            "rpc_address" : "tcp://0.0.0.0:26657",
            "tx_index" : "on"
         },
         "protocol_version" : {
            "app" : "0",
            "block" : "11",
            "p2p" : "8"
         },
         "version" : "0.37.1"
      },
      "sync_info" : {
         "catching_up" : false,
         "earliest_app_hash" : "",
         "earliest_block_hash" : "",
         "earliest_block_height" : "0",
         "earliest_block_time" : "1970-01-01T00:00:00Z",
         "latest_app_hash" : "",
         "latest_block_hash" : "",
         "latest_block_height" : "0",
         "latest_block_time" : "1970-01-01T00:00:00Z"
      },
      "validator_info" : {
         "address" : "51BF73BFE54368893D50076B417B86DE71FE12E6",
         "pub_key" : {
            "type" : "tendermint/PubKeyEd25519",
            "value" : "pVW282uuYLl2NH82B8heC5GzN6B21cbCY0ATqDPm0Qo="
         },
         "voting_power" : "10000"
      }
   }
}

There are occasionally error messages, but they don't seem overly concerning as it's a host trying to add itself to the address book.

Failed to add new address
Cannot add ourselves with address 58efc00dab37e2d74a11bf846ad1b8934922368c@192.168.56.23:26656

Version

v0.47.2

Steps to Reproduce

This was found in palomachain. To reproduce, upgrade cosmos-sdk and cometbtf there and start up a new testnet. I suspect it may be reproducible in other cosmos apps, but I have not tried.

To further show issues, you can try submitting a proposal. We are submitting as such:

palomad tx gov submit-legacy-proposal evm propose-new-chain bnb-main 56 1000000 24532924 0x746f51c42fc732c46beda0be2f58d74512841eac93fa4ee5512f77938eec7bf5 \
      --title "Support for Binance Smartchain Mainnet on $CHAIN_ID" \
      --description "Add Binance Smart Chain mainnet as supported chain." \
      --deposit 10000000ugrain \
      --fees 400ugrain \
      --gas auto \
      --from="$ADDRESS" \
      --yes

And it returns

Error: rpc error: code = Unknown desc = paloma is not ready; please wait for first block: invalid height
@tac0turtle
Copy link
Member

thanks for opening the issue, could you link me to your code in order to test out the changes

@julienrbrt julienrbrt added the S:needs more info This bug can't be addressed until more information is provided by the reporter. label May 4, 2023
@tac0turtle
Copy link
Member

When upgrading from cosmos-sdk@v0.47.1 to cosmos-sdk@v0.47.2 and starting a new testnet, the testnet fails to start and produce blocks.

im running a testnet and upgraded things with no issue, i tried a few times if i missed something with no luck.

I started a testnet with 0.47.1 ran it a few blocks submitted txs then stopped it and upgraded to 0.47.2 and submitted more txs with success.

I tested with 0.47.0 then upgrade to 0.47.2 as well with blocks and transactions passing.

Could you walk me through what you did here?

@tac0turtle tac0turtle added T:Sprint and removed T:Bug labels May 5, 2023
@ToasterTheBrave
Copy link
Author

Hi @tac0turtle! Thanks for looking into this.

I am starting a fresh paloma testnet with 0.47.2 and seeing these errors. The bootstrapping of the nodes is in an ansible runbook, but I will write up a gist to reproduce this.

@taariq
Copy link

taariq commented May 9, 2023

Okay. So this seems to be a setup issue on the Paloma side of our private testnet. @ToasterTheBrave will close.

@ToasterTheBrave
Copy link
Author

ToasterTheBrave commented May 15, 2023

I've narrowed down exactly what's causing this. It appears to be a specific set of config around logging when starting up a new testnet. Specifically, I'm seeing that log_level="debug" and log_format="json" is breaking. I've got steps to reproduce using simapp.

Environment:

  • I'm running on Vagrant locally. I have three hosts, hostnames m1, m2, and m3
  • I built simapp twice, once for v0.47.1 and once for v0.47.2 to compare. I ran these same commands for each binary

Reproduction steps:

# all hosts
KEY=simapp-$(hostname)
MONIKER=simapp-priv-testnet-$(hostname)
CHAIN=private-simapp-810226

simd init $MONIKER --chain-id $CHAIN

# This is done so we don't have to set a passphrase in the example.  This has been confirmed
# not to change the example case
sed -i 's/keyring-backend = ".*"/keyring-backend = "test"/g' /root/.simapp/config/client.toml

# Here is the breaking config
sed -i 's/^log_format = ".*"/log_format = "json"/g' /root/.simapp/config/config.toml
sed -i 's/^log_level = ".*"/log_level = "debug"/g' /root/.simapp/config/config.toml

simd keys add $KEY

mkdir /root/.simapp/config/gentx

# m1
simd genesis add-genesis-account $KEY 5000000000000stake
simd genesis gentx $KEY 10000000000stake --chain-id $CHAIN
scp .simapp/config/genesis.json root@192.168.56.22:/root/.simapp/config/
scp .simapp/config/gentx/* root@192.168.56.22:/root/.simapp/config/gentx/

#m2
simd genesis add-genesis-account $KEY 5000000000000stake
simd genesis gentx $KEY 10000000000stake --chain-id $CHAIN
scp .simapp/config/genesis.json root@192.168.56.23:/root/.simapp/config/
scp .simapp/config/gentx/* root@192.168.56.23:/root/.simapp/config/gentx/

# m3
simd genesis add-genesis-account $KEY 5000000000000stake
simd genesis gentx $KEY 10000000000stake --chain-id $CHAIN
simd genesis collect-gentxs
scp .simapp/config/genesis.json root@192.168.56.21:/root/.simapp/config/
scp .simapp/config/genesis.json root@192.168.56.22:/root/.simapp/config/

# All hosts: get node id
simd tendermint show-node-id

# All hosts: Update the config.toml persistent_peers
# example values:
#   m1: de66d98c8126f3853142be5c6917a0fc59dd8802@192.168.56.22:26656,5e71fddeb6e8b1e980f97a28ccd80a2cc893f18c@192.168.56.23:26656
#   m2: 90f986726e6e9992c04c1b8715b6391928893453@192.168.56.21:26656,5e71fddeb6e8b1e980f97a28ccd80a2cc893f18c@192.168.56.23:26656
#   m3: 90f986726e6e9992c04c1b8715b6391928893453@192.168.56.21:26656,de66d98c8126f3853142be5c6917a0fc59dd8802@192.168.56.22:26656

# All hosts: start the app
simd start

Now, if we check the status of simapp, it's not producing blocks. Ex:

$ simd status | jq
{
  "NodeInfo": {
    "protocol_version": {
      "p2p": "8",
      "block": "11",
      "app": "0"
    },
    "id": "38d0ac703c2de98a4783a3b1d8a06c9678bf41de",
    "listen_addr": "tcp://0.0.0.0:26656",
    "network": "private-simapp-810226",
    "version": "0.37.1",
    "channels": "40202122233038606100",
    "moniker": "simapp-priv-testnet-m1",
    "other": {
      "tx_index": "on",
      "rpc_address": "tcp://127.0.0.1:26657"
    }
  },
  "SyncInfo": {
    "latest_block_hash": "",
    "latest_app_hash": "",
    "latest_block_height": "0",
    "latest_block_time": "1970-01-01T00:00:00Z",
    "earliest_block_hash": "",
    "earliest_app_hash": "",
    "earliest_block_height": "0",
    "earliest_block_time": "1970-01-01T00:00:00Z",
    "catching_up": false
  },
  "ValidatorInfo": {
    "Address": "3D04865B31075A120B22C5B1911930BA4B538469",
    "PubKey": {
      "type": "tendermint/PubKeyEd25519",
      "value": "03JnSP9fy2y/cJ7d8ScklYXUuaE8hZZzYOUkrrpOFf8="
    },
    "VotingPower": "10000"
  }
}

@ToasterTheBrave
Copy link
Author

@alexanderbez Should this be reopened give the above ^?

@tac0turtle
Copy link
Member

@julienrbrt are you able to double check this?

@tac0turtle tac0turtle reopened this May 15, 2023
@julienrbrt julienrbrt self-assigned this May 15, 2023
@julienrbrt julienrbrt removed the S:needs more info This bug can't be addressed until more information is provided by the reporter. label May 15, 2023
@julienrbrt julienrbrt mentioned this issue May 22, 2023
19 tasks
@tac0turtle
Copy link
Member

@julienrbrt and i were unable to reproduce this issue with a variety of configurations. closing this

@julienrbrt
Copy link
Member

What we can do is tagging a v0.47.3-rc.0 for you to try again.
I still do not see how it can be logger related, but we have actually changed the logger in v0.47.3.
My trials were indeed working.

@ToasterTheBrave
Copy link
Author

@julienrbrt I'll test out the new version. In your trials, were you using three nodes? I'm curious what could possibly be different between our example setups.

@ToasterTheBrave
Copy link
Author

@julienrbrt I tested in v0.47.3. This is still a problem. I need more info about how you ran your tests that didn't produce the error so I can give more context if needed.

@mmsqe
Copy link
Contributor

mmsqe commented Jun 9, 2023

@ToasterTheBrave I get same error when upgrade from v0.47.2 to v0.47.3, but seems related to logger change, test pass with cometbft fix.

@julienrbrt julienrbrt reopened this Jun 9, 2023
@ToasterTheBrave
Copy link
Author

@mmsqe Thank you for the comment! Nice to know I'm not going insane and others have seen the issue 🙂. It looks like you've already opened a ticket with cometbft but didn't get any traction?

@mmsqe
Copy link
Contributor

mmsqe commented Jun 10, 2023

@mmsqe Thank you for the comment! Nice to know I'm not going insane and others have seen the issue 🙂. It looks like you've already opened a ticket with cometbft but didn't get any traction?

I think they see the issue, but need more test.

@julienrbrt
Copy link
Member

julienrbrt commented Jun 14, 2023

In your trials, were you using three nodes?

I was using in-process nodes with a tweaked simd testnet with a debug logger, so possibly this was that.

Possibly, the new cometbft release resolves your issue when used with v0.47.3: https://github.com/cometbft/cometbft/blob/v0.37.2/CHANGELOG.md#security-fixes

@ToasterTheBrave
Copy link
Author

@julienrbrt Yes indeed, this is fixed for us now with cometbft v0.37.2 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants