Troubleshooting

Luca Favatella edited this page Jul 16, 2019 · 38 revisions

This page describes how to identify and address common issues running the Aeternity node.

Sections at the bottom of the page deal with common issues in building the node or running development tests.

Node failing mining attempts

This section applies to v0.4.1 or later. It may also apply to v1.3.0 or later, with some deviations.

(Note: the names of the miner executables were updated in v1.0.0-rc2.)

(Note: the names of the log files changed in v2.0.0-rc.1.)

(Note: the operational script bin/epoch was renamed to bin/aeternity in v1.3.0.)

The following assumes that the node is deployed in directory /tmp/node.

Diagnosis

If the node attempts to mine but fails to do so, check the error log entries in /tmp/node/log/epoch_mining.log.

You may read log entries in /tmp/node/log/epoch_mining.log like the following...

2018-01-03 10:18:23.812 [info] <0.903.0>@aec_conductor:create_block_candidate:728 Creating block candidate
2018-01-03 10:18:23.815 [info] <0.903.0>@aec_conductor:handle_block_candidate_reply:744 Created block candidate and nonce (max 13078180597498667020, current 13078180597498667021).
2018-01-03 10:18:23.815 [info] <0.903.0>@aec_conductor:start_mining:643 Starting mining
2018-01-03 10:18:25.871 [error] <0.903.0>@aec_conductor:handle_mining_reply:670 Failed to mine block, runtime error; retrying with different nonce (was 13078180597498667021). Error: {execution_failed,{signal,sigkill,false}}
2018-01-03 10:18:25.872 [info] <0.903.0>@aec_conductor:start_mining:643 Starting mining
2018-01-03 10:18:26.230 [error] <0.903.0>@aec_conductor:handle_mining_reply:670 Failed to mine block, runtime error; retrying with different nonce (was 13078180597498667022). Error: {execution_failed,{signal,sigabrt,true}}
2018-01-03 10:18:26.230 [info] <0.903.0>@aec_conductor:start_mining:643 Starting mining
2018-01-03 10:18:26.371 [error] <0.903.0>@aec_conductor:handle_mining_reply:670 Failed to mine block, runtime error; retrying with different nonce (was 13078180597498667023). Error: {execution_failed,{signal,sigabrt,true}}

... - notice "signal,sigabrt" - and you may read corresponding log entries in /tmp/node/log/epoch_pow_cuckoo.log like the following...

2018-01-03 10:18:23.816 [info] <0.913.0>@aec_pow_cuckoo:generate_int:156 Executing cmd: "env LD_LIBRARY_PATH=../lib:$LD_LIBRARY_PATH ./mean30s-generic -h uXkXZrU2tPmyYThehkTmZf6fqOuc6pvxCc87gv/BV8U=DWBQVvYHf7U= -t 5"
2018-01-03 10:18:25.859 [error] <0.913.0>@aec_pow_cuckoo:wait_for_result:362 OS process died: {signal,sigkill,false}
2018-01-03 10:18:25.880 [info] <0.1209.0>@aec_pow_cuckoo:generate_int:156 Executing cmd: "env LD_LIBRARY_PATH=../lib:$LD_LIBRARY_PATH ./mean30s-generic -h uXkXZrU2tPmyYThehkTmZf6fqOuc6pvxCc87gv/BV8U=DmBQVvYHf7U= -t 5"
2018-01-03 10:18:25.935 [error] <0.1209.0>@aec_pow_cuckoo:wait_for_result:347 ERROR: terminate called after throwing an instance of '
2018-01-03 10:18:25.938 [error] <0.1209.0>@aec_pow_cuckoo:wait_for_result:347 ERROR: std::bad_alloc
2018-01-03 10:18:25.939 [error] <0.1209.0>@aec_pow_cuckoo:wait_for_result:347 ERROR: '

2018-01-03 10:18:25.940 [error] <0.1209.0>@aec_pow_cuckoo:wait_for_result:347 ERROR:   what():
2018-01-03 10:18:25.941 [error] <0.1209.0>@aec_pow_cuckoo:wait_for_result:347 ERROR: std::bad_alloc
2018-01-03 10:18:25.942 [error] <0.1209.0>@aec_pow_cuckoo:wait_for_result:347 ERROR:

2018-01-03 10:18:25.942 [debug] <0.1209.0>@aec_pow_cuckoo:parse_generation_result:420 Looking for 42-cycle on cuckoo30("uXkXZrU2tPmyYThehkTmZf6fqOuc6pvxCc87gv/BV8U=DmBQVvYHf7U=",0) with 50% edges
2018-01-03 10:18:26.229 [error] <0.1209.0>@aec_pow_cuckoo:wait_for_result:362 OS process died: {signal,sigabrt,true}
2018-01-03 10:18:26.230 [info] <0.1211.0>@aec_pow_cuckoo:generate_int:156 Executing cmd: "env LD_LIBRARY_PATH=../lib:$LD_LIBRARY_PATH ./mean30s-generic -h uXkXZrU2tPmyYThehkTmZf6fqOuc6pvxCc87gv/BV8U=D2BQVvYHf7U= -t 5"
2018-01-03 10:18:26.233 [error] <0.1211.0>@aec_pow_cuckoo:wait_for_result:347 ERROR: terminate called after throwing an instance of '
2018-01-03 10:18:26.234 [error] <0.1211.0>@aec_pow_cuckoo:wait_for_result:347 ERROR: std::bad_alloc
2018-01-03 10:18:26.235 [error] <0.1211.0>@aec_pow_cuckoo:wait_for_result:347 ERROR: '

2018-01-03 10:18:26.235 [error] <0.1211.0>@aec_pow_cuckoo:wait_for_result:347 ERROR:   what():
2018-01-03 10:18:26.235 [error] <0.1211.0>@aec_pow_cuckoo:wait_for_result:347 ERROR: std::bad_alloc
2018-01-03 10:18:26.236 [error] <0.1211.0>@aec_pow_cuckoo:wait_for_result:347 ERROR:

2018-01-03 10:18:26.236 [debug] <0.1211.0>@aec_pow_cuckoo:parse_generation_result:420 Looking for 42-cycle on cuckoo30("uXkXZrU2tPmyYThehkTmZf6fqOuc6pvxCc87gv/BV8U=D2BQVvYHf7U=",0) with 50% edges
2018-01-03 10:18:26.371 [error] <0.1211.0>@aec_pow_cuckoo:wait_for_result:362 OS process died: {signal,sigabrt,true}

... - notice "bad_alloc". These are symptoms of memory allocation issues.
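As a rough aid for the diagnosis above, the following Python sketch counts the symptoms in log text: the signal reported when the miner OS process dies and any std::bad_alloc lines. The function name summarize_mining_errors and the regular expression are illustrative assumptions, not part of the node; the sample lines are copied from the log excerpts above.

```python
import re
from collections import Counter

# Matches the Erlang exit tuples seen in the logs, e.g. {signal,sigabrt,true}.
SIGNAL_RE = re.compile(r"\{signal,(sig[a-z0-9]+),(?:true|false)\}")


def summarize_mining_errors(log_text: str) -> Counter:
    """Count signals and bad_alloc occurrences on [error] lines (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        if "[error]" not in line:
            continue
        for signal_name in SIGNAL_RE.findall(line):
            counts[signal_name] += 1
        if "std::bad_alloc" in line:
            counts["bad_alloc"] += 1
    return counts


sample = "\n".join([
    "2018-01-03 10:18:25.859 [error] <0.913.0>@aec_pow_cuckoo:wait_for_result:362 OS process died: {signal,sigkill,false}",
    "2018-01-03 10:18:25.938 [error] <0.1209.0>@aec_pow_cuckoo:wait_for_result:347 ERROR: std::bad_alloc",
    "2018-01-03 10:18:26.229 [error] <0.1209.0>@aec_pow_cuckoo:wait_for_result:362 OS process died: {signal,sigabrt,true}",
])
summary = summarize_mining_errors(sample)
print(dict(summary))
```

To use it on a real node, pass in the contents of /tmp/node/log/epoch_pow_cuckoo.log (or epoch_mining.log). A non-zero bad_alloc or sigkill count is consistent with the memory allocation issues described here, though it does not prove them.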

Resolution

If memory is constrained, you can configure a less memory-intensive (though usually slower) algorithm than the default one. Amend the mining section in the user configuration file from:

mining:
    autostart: true

... to ...

mining:
    autostart: true
    cuckoo:
        miner:
            executable: lean29-generic
            extra_args: ""
            edge_bits: 29

... then stop and start the node: ( cd /tmp/node; bin/epoch stop; bin/epoch start )

Note: until v1.0.0-rc2 the arguments would be:

...
            executable: lean30
            extra_args: ""
            node_bits: 30
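If the user configuration is managed programmatically, the amendment for v1.0.0-rc2 or later can be sketched in Python on an already parsed configuration (a plain dict). The helper name with_lean_miner is an assumption made for illustration; loading and saving the file is left out because it depends on the YAML library in use.

```python
import copy

# Values taken from the resolution above (v1.0.0-rc2 or later).
LEAN_MINER = {"executable": "lean29-generic", "extra_args": "", "edge_bits": 29}


def with_lean_miner(config: dict) -> dict:
    """Return a copy of the parsed configuration with the lean miner set (hypothetical helper)."""
    updated = copy.deepcopy(config)          # leave the caller's dict untouched
    mining = updated.setdefault("mining", {})
    mining.setdefault("cuckoo", {})["miner"] = dict(LEAN_MINER)
    return updated


original = {"mining": {"autostart": True}}
amended = with_lean_miner(original)
print(amended)
```

Existing keys such as autostart are preserved; only the cuckoo miner entry is added or overwritten.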

Node won't start with privileged user

This section is applicable to all versions. It may apply to version v1.3.0 or greater accommodating for deviations.

The following assumes that the node is deployed in directory /tmp/node.

Diagnosis

If the node won't start, check the log entries in /tmp/node/log/epoch.log for errors similar to the following:

2018-03-05 10:15:16.973 [error] <0.950.0>@gen_server:init_it:357 CRASH REPORT Process exec with 0 neighbours exited with reason: bad return value: "Port program /tmp/node/lib/erlexec-1.7.1/priv/x86_64-unknown-linux-gnu/exec-port with SUID bit set is not allowed to run without setting effective user!" in gen_server:init_it/6 line 357
2018-03-05 10:15:16.974 [error] <0.949.0> Supervisor exec_app had child exec started with exec:start_link([]) at undefined exit with reason bad return value: "Port program /tmp/node/lib/erlexec-1.7.1/priv/x86_64-unknown-linux-gnu/exec-port with SUID bit set is not allowed to run without setting effective user!" in context start_error
2018-03-05 10:15:16.974 [error] <0.947.0> CRASH REPORT Process <0.947.0> with 0 neighbours exited with reason: {{shutdown,{failed_to_start_child,exec,{bad_return_value,"Port program /tmp/node/lib/erlexec-1.7.1/priv/x86_64-unknown-linux-gnu/exec-port with SUID bit set is not allowed to run without setting effective user!"}}},{exec_app,start,[normal,[]]}} in application_master:init/4 line 134
2018-03-05 10:15:16.975 [info] <0.855.0> Application erlexec exited with reason: {{shutdown,{failed_to_start_child,exec,{bad_return_value,"Port program /tmp/node/lib/erlexec-1.7.1/priv/x86_64-unknown-linux-gnu/exec-port with SUID bit set is not allowed to run without setting effective user!"}}},{exec_app,start,[normal,[]]}}
  • notice "SUID bit set is not allowed to run without setting effective user!"

These are symptoms of running the node as a privileged user. You can confirm this by running the id command:

root@localhost:~# id
uid=0(root) gid=0(root) groups=0(root)
  • Notice the uid=0
  • Notice the # at the command prompt

Resolution

For security reasons, you must run your node as a non-privileged user. Create a new user (or use an existing one) and make sure the node files are owned by it.

For example:

useradd -m epoch
chown -R epoch:epoch /tmp/node

Switch to the newly created user, either by logging in again as that user or, if you are still logged in as the privileged user, by using the su command:

su epoch

Verify that the user is non-privileged with the id command:

epoch@localhost:~$ id
uid=1001(epoch) gid=1001(epoch) groups=1001(epoch)
  • Notice the uid=1001 (should not be 0)
  • Notice the $ at the command prompt
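The same check can be scripted, for example in a wrapper that runs before the node is started. This is a minimal Python sketch for Unix-like systems; the function names are illustrative assumptions and nothing here is part of the node itself.

```python
import os


def is_privileged(uid: int) -> bool:
    """uid 0 is root, the privileged user the node must not run as."""
    return uid == 0


def describe_user(uid: int) -> str:
    """Return a short verdict for the given uid (hypothetical helper)."""
    if is_privileged(uid):
        return f"uid={uid}: privileged user, switch to a non-privileged user first"
    return f"uid={uid}: non-privileged user, OK"


# os.geteuid() is available on Unix only.
print(describe_user(os.geteuid()))
```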

Node won't start after release update

This section applies to v0.10 or later. It may also apply to v1.3.0 or later, with some deviations.

The following assumes that the node is deployed in directory /tmp/node.

Diagnosis

If the node won't start, check the log entries in /tmp/node/log/epoch.log for errors similar to the following:

18:32:06.799 [warning] Expected genesis block hash <<25,77,27,140,45,8,239,74,202,109,47,84,197,77,26,105,247,45,34,75,2,251,250,254,2,212,58,196,52,160,198,153>>, persisted genesis block hash <<173,144,161,143,82,71,247,64,21,169,150,78,58,247,121,152,208,163,240,137,125,185,215,180,252,4,43,172,40,52,188,92>>
18:32:06.799 [error] Persisted chain has a different genesis block than the one being expected. Aborting
{"Kernel pid terminated",application_controller,"{application_start_failure,aecore,{inconsistent_database,{aecore_app,start,[normal,[]]}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,aecore,{inconsistent_database,{aecore_app,start,[normal,[]]}}})

This is a symptom of running the node with an old persisted DB while there had been a non-backwards compatible change in the protocol including the genesis block.
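The two hashes in the warning are printed as Erlang binaries, which are hard to compare by eye. The following Python sketch extracts them from such a log line and renders them as hex. The helper names are assumptions for illustration; the sample line is the warning shown above.

```python
import re

# An Erlang binary literal such as <<25,77,27>>.
BINARY_RE = re.compile(r"<<([0-9,\s]*)>>")


def erlang_binary_to_hex(body: str) -> str:
    """Convert the comma-separated byte values of an Erlang binary to a hex string."""
    return bytes(int(part) for part in body.split(",") if part.strip()).hex()


def genesis_hashes(log_line: str):
    """Return (expected, persisted) as hex, or None if the line is not the genesis warning."""
    found = BINARY_RE.findall(log_line)
    if "genesis block hash" not in log_line or len(found) != 2:
        return None
    return erlang_binary_to_hex(found[0]), erlang_binary_to_hex(found[1])


sample = (
    "18:32:06.799 [warning] Expected genesis block hash "
    "<<25,77,27,140,45,8,239,74,202,109,47,84,197,77,26,105,247,45,34,75,2,251,250,254,2,212,58,196,52,160,198,153>>, "
    "persisted genesis block hash "
    "<<173,144,161,143,82,71,247,64,21,169,150,78,58,247,121,152,208,163,240,137,125,185,215,180,252,4,43,172,40,52,188,92>>"
)
expected, persisted = genesis_hashes(sample)
print("expected: ", expected)
print("persisted:", persisted)
print("mismatch: ", expected != persisted)
```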

Resolution

You must start the node with a clean DB directory. This can be achieved by:

  • Deleting the contents of your current DB directory

  • Setting a new directory in your user configuration file

Node does not sync with other nodes

This section is applicable to all versions.

Diagnosis

The node gets stuck at a different top height than the rest of the network (most commonly close to the top of the chain) and fails to synchronize.

In rare cases the node ends up in a steady state where it thinks it is fully synchronized, but it is not. Normally it gets out of this state by pinging another node, but depending on network topology and timing this might fail.

Resolution

Restarting the node will re-start the synchronization.

Chain WS API does not work as expected (get block by hash)

This section is applicable to v0.21.

Diagnosis

Get block by hash doesn't return the list of transactions for micro blocks.

Resolution

Instead of the WS API, it's possible to use the HTTP API. The /micro-blocks/hash/{hash}/transactions endpoint can be used to retrieve the list of transactions.
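A minimal Python sketch of calling that endpoint follows. Only the path /micro-blocks/hash/{hash}/transactions comes from this page; the base URL and the hash in the example are placeholders, and any host, port or path prefix your deployment uses must be included in base_url.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def micro_block_transactions_url(base_url: str, block_hash: str) -> str:
    """Build the URL of the transactions endpoint for a micro block hash."""
    return f"{base_url.rstrip('/')}/micro-blocks/hash/{quote(block_hash, safe='')}/transactions"


def fetch_micro_block_transactions(base_url: str, block_hash: str):
    """Fetch and decode the JSON response (requires a reachable node)."""
    with urlopen(micro_block_transactions_url(base_url, block_hash)) as response:
        return json.load(response)


# Placeholder base URL and hash, for illustration only.
url = micro_block_transactions_url("http://node.example/", "mh_placeholder")
print(url)
```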

Node failing compilation of Sophia contract using maps

This section is applicable to v0.25.0.

Diagnosis

Compilation of contract fails with:

{undefined_function,{builtin,{map_get,{map,word,word}}}}

Resolution

Work around it by adding some unused code that does a map lookup (m[0] or similar).

Error 35 in logs for CUDA miner

Diagnosis

The Cuda29 miner was compiled against a newer NVIDIA CUDA driver than the one installed on the system running the miner. This can happen with prebuilt cuckoo miners.

Resolution

Install newer CUDA drivers

OR

Compile the aecuckoo miner locally

Node rejecting State Channel sign messages

This section is applicable to v3.*

Diagnosis

The state channel FSM does not yet support generalized accounts. When either of the two participants of a state channel has upgraded their account to a generalized account, their attempts to sign on-chain transactions fail:

{  
   "jsonrpc":"2.0",
   "method":"channels.error",
   "params":{  
      "channel_id":null,
      "data":{  
         "message":"not_create_tx"
      }
   },
   "version":1
}

Note that this is an example message and a different message could contain a channel_id. Also the message could be:

  • not_create_tx when trying to sign a channel create transaction by an upgraded account
  • not_deposit_tx when trying to sign a deposit transaction by an upgraded account
  • not_withdraw_tx when trying to sign a withdrawal transaction by an upgraded account
  • not_offchain_tx when trying to sign an off-chain transaction by an upgraded account
  • not_close_mutual_tx when trying to sign a mutual closing transaction by an upgraded account
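A client can recognize these errors when they arrive. The Python sketch below maps the message codes listed above to a readable explanation; the function name and the wording of the explanation are assumptions for illustration, and the sample message is the one shown above.

```python
import json

# Message codes from the list above and the transaction type each refers to.
SIGN_ERRORS = {
    "not_create_tx": "channel create transaction",
    "not_deposit_tx": "deposit transaction",
    "not_withdraw_tx": "withdrawal transaction",
    "not_offchain_tx": "off-chain transaction",
    "not_close_mutual_tx": "mutual closing transaction",
}


def explain_channel_error(raw_message: str):
    """Return an explanation for a known sign error, or None for any other message."""
    message = json.loads(raw_message)
    if message.get("method") != "channels.error":
        return None
    code = message.get("params", {}).get("data", {}).get("message")
    kind = SIGN_ERRORS.get(code)
    if kind is None:
        return None
    return f"{code}: signing a {kind} was rejected; check whether the account is a generalized account"


sample = '{"jsonrpc":"2.0","method":"channels.error","params":{"channel_id":null,"data":{"message":"not_create_tx"}},"version":1}'
print(explain_channel_error(sample))
```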

Resolution

On-chain protocol support is already present, but support for generalized accounts (GAs) in the FSM will be added in v4.0.0. Until then, participants can still use the new authentication methods, but only outside the scope of the FSM.

Node rejecting State Channel WebSocket connections

This section is applicable to v3.*

Diagnosis

The state channel FSM does not yet support generalized accounts. When either of the two participants of a state channel has upgraded their account to a generalized account, the node rejects incoming WebSocket open requests. This protects users from relying on a feature that is not yet supported.

Resolution

Do not use the state channels' WebSocket API and FSM with generalized accounts. Support will be provided in v4.0.0.

Building

Cannot build the node with build dependencies in non-default path

This section applies to v3.* for (advanced) users who build the node and have installed the libsodium build dependency in a non-default path. Such users export the environment variables CFLAGS/LDFLAGS before attempting to build the node, e.g.:

export CFLAGS="-I $(brew --prefix libsodium)/include"
export LDFLAGS="-L$(brew --prefix libsodium)/lib" # No space after -L; see https://gcc.gnu.org/onlinedocs/gcc/Directory-Options.html#Directory-Options

Diagnosis

From a clean clone of the repo of the node, you ran make prod-build and you got an output like:

...
===> Compiling rocksdb
...
checking whether we are cross compiling... configure: error: in `/Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/deps/snappy':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
make[2]: *** [/Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/deps/snappy/.libs/libsnappy.a] Error 1
make[2]: *** Waiting for unfinished jobs....
...
make[1]: [deps] Error 2 (ignored)
...
cc /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/c_src/refobjects.o /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/c_src/erocksdb_iter.o /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/c_src/erocksdb.o /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/c_src/erocksdb_db.o /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/c_src/erlang_merge.o /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/c_src/batch.o /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/c_src/erocksdb_snapshot.o /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/c_src/util.o /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/c_src/transactions.o /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/c_src/cache.o /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/c_src/rate_limiter.o /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/c_src/bitset_merge_operator.o /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/c_src/backup.o /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/c_src/erocksdb_column_family.o /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/c_src/counter_merge_operator.o /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/c_src/env.o -L /Users/ae/dev/kerl/installations/OTP-20.3.8.20/lib/erl_interface-3.10.2.1/lib -lerl_interface -lei -lstdc++ /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/deps/rocksdb/librocksdb.a /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/deps/snappy/.libs/libsnappy.a /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/deps/lz4/lib/liblz4.a -L/Users/ae/homebrew/opt/libsodium/lib -shared  -o /Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/priv/rocksdb.so
clang: error: no such file or directory: '/Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/deps/rocksdb/librocksdb.a'
clang: error: no such file or directory: '/Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/deps/snappy/.libs/libsnappy.a'
make[1]: *** [/Users/ae/dev/ae/aeternity/_build/default/lib/rocksdb/priv/rocksdb.so] Error 1
===> Hook for compile failed!

make: *** [internal-compile-deps] Error 1

Resolution

Unset CFLAGS and LDFLAGS and run make prod-build so that _build/default/lib/rocksdb gets built. If the build is still incomplete (e.g. compilation of enacl fails because the 'sodium.h' file is not found), set CFLAGS and LDFLAGS again and re-run make prod-build.
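The two-pass build can also be scripted. This Python sketch assumes it is started from a shell where CFLAGS and LDFLAGS are already exported; the function names and the repo_dir parameter are illustrative assumptions. The first pass is allowed to fail, since it may stop at enacl once rocksdb has been built.

```python
import os
import subprocess


def env_without_flags(env) -> dict:
    """Copy an environment mapping, dropping CFLAGS and LDFLAGS."""
    return {key: value for key, value in env.items() if key not in ("CFLAGS", "LDFLAGS")}


def two_pass_build(repo_dir: str) -> None:
    """Run make prod-build without the flags, then again with them (hypothetical helper)."""
    # First pass: no CFLAGS/LDFLAGS, so that _build/default/lib/rocksdb gets built.
    subprocess.run(["make", "prod-build"], cwd=repo_dir,
                   env=env_without_flags(os.environ), check=False)
    # Second pass: the caller's original environment, including CFLAGS/LDFLAGS.
    subprocess.run(["make", "prod-build"], cwd=repo_dir,
                   env=dict(os.environ), check=True)


# Example call, commented out because it starts a real build:
# two_pass_build("/path/to/aeternity")
```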

Testing

This section describes how to identify and address common issues running development tests for the Aeternity node.

The test suite "aecore_sync_suite" fails

When running make test, the test fails in the sync suite because localhost and HOSTNAME resolve to different IP addresses. This problem has been seen on Ubuntu 18.04.

Diagnosis

The test fails with

 %%% aecore_sync_SUITE ==> all_nodes.two_nodes.start_second_node: FAILED
 %%% aecore_sync_SUITE ==> {test_case_failed,
     {retry_exhausted1,

The file /etc/hosts contains two lines like

 127.0.0.1 localhost
 127.0.1.1 myhostname
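Whether a machine is affected can be checked with a short script. The Python sketch below parses hosts-file text and reports whether localhost and a given host name map to the same IP; the function names are assumptions for illustration, and myhostname is the placeholder used on this page.

```python
def parse_hosts(text: str) -> dict:
    """Map each host name to the first IP address listed for it."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)
    return mapping


def same_ip_as_localhost(hosts_text: str, hostname: str) -> bool:
    """True if hostname and localhost are both listed with the same IP."""
    hosts = parse_hosts(hosts_text)
    return hostname in hosts and hosts.get("localhost") == hosts[hostname]


failing = "127.0.0.1 localhost\n127.0.1.1 myhostname\n"
print(same_ip_as_localhost(failing, "myhostname"))
```

On a real machine, pass the contents of /etc/hosts and the value of socket.gethostname() instead of the sample text.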

Resolution

Make sure you use the same IP for localhost and the HOSTNAME by editing /etc/hosts to:

 127.0.0.1 localhost myhostname

Note that this might not be what you want if your machine has a public IP address and hostname. In that case, edit the files ...epoch/config/dev*/sys.config so that the peers entries point to your machine's host name instead of localhost:

     {peers, [<<"aenode://pp$23YdvfRPQ1b1AMWmkKZUGk2cQLqygQp55FzDWZSEUicPjhxtp5@myhostname:3025">>,
              <<"aenode://pp$2M9oPohzsWgJrBBCFeYi3PVT4YF7F2botBtq6J1EGcVkiutx3R@myhostname:3035">>]},