Shutdown on blockcount RPC failure and other cases cleanly #693

Merged
merged 4 commits into from
Oct 8, 2020

Conversation

AdamISZ
Member

@AdamISZ AdamISZ commented Oct 1, 2020

See commit comments.
Testing: passes the test suite (after the second commit), and I get sensible error-message shutdowns without stack traces (but still with the usual verbose IRC messages), both on Qt (with a dialog box) and on the command line, in two cases: (a) the RPC call for getblockcount fails (I garbled the method name) and (b) the RPC connection is lost entirely (I shut down the bitcoin daemon).

Prior to this commit, in case an RPC failure occurred when
accessing the block height, the program would continue but the
wallet would be in an un-writeable state (for command line
programs, specifically yield generators; for Qt the shutdown
would occur).
This commit slightly cleans up the process of shutting down,
ensuring that duplicate shutdown calls do not result in
stack traces. It also ensures that, for command line
programs too, the application will immediately shut down if the
regular heartbeat call to query the block height fails, as this
risks inconsistencies in the wallet (though the previous
situation luckily did not result in this, as the call to
BaseWallet.close() resulted in the wallet being read-only).
A future PR should develop a more sophisticated approach to
RPC call failures that may allow the program to wait.
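
A minimal sketch of the "duplicate shutdown calls do not result in stack traces" idea, using only the standard library. `ToyReactor` and the local `ReactorNotRunning` are hypothetical stand-ins (Twisted raises an exception of that name when `reactor.stop()` is called on a reactor that is not running), and passing the reactor as a parameter to `stop_reactor` is an assumption made for illustration, not necessarily how the project's helper is written.

```python
class ReactorNotRunning(Exception):
    """Stand-in for the error an event loop raises when stopped twice."""


class ToyReactor:
    """Hypothetical minimal event loop; it only tracks whether it is running."""

    def __init__(self):
        self.running = True
        self.stop_count = 0

    def stop(self):
        if not self.running:
            raise ReactorNotRunning("cannot stop a reactor that is not running")
        self.running = False
        self.stop_count += 1


def stop_reactor(reactor):
    """Stop the loop once; later calls are silently ignored."""
    try:
        if reactor.running:
            reactor.stop()
    except ReactorNotRunning:
        # Another code path already shut the loop down; nothing left to do.
        pass


reactor = ToyReactor()
stop_reactor(reactor)
stop_reactor(reactor)  # duplicate call: no exception, no stack trace
```

The `running` check covers the common case; the `except` clause covers a loop that stops between the check and the call.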

stopservice
@AdamISZ
Member Author

AdamISZ commented Oct 1, 2020

As was discussed in #673, this is not a full solution; it just cleans up what we already had (my verbose commit comment above explains this a bit more) and actually ensures a shutdown for RPC failures which we cannot handle, instead of the previous bizarre "wallet is read-only" state. (Please don't forget that basic broken-pipe failures are actually dealt with, including 100 retries, so this must be some kind of exceptional situation.)

A full solution would be a much more extensive PR involving having clients keep track of the state of the walletservice, which is somehow signalling that it is in a "freeze" mode, waiting/hoping that the RPC comes back up or corrects itself (?)

A detail that was missing in #673 was - what is the exact failure of RPC encountered? I'm wondering if it wasn't an actual error code being returned by the getblockcount call. This code now at least prints out an error message with the actual RPC error, in this case.
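
To illustrate the difference between a lost connection and an error returned by the call itself, here is a hedged sketch of surfacing the error object from a JSON-RPC reply. `RpcCallError` and `parse_rpc_response` are hypothetical names, and both reply bodies (including the block height) are made-up samples; -32601 is the standard JSON-RPC "method not found" code, which is what a garbled method name would be expected to produce.

```python
import json


class RpcCallError(Exception):
    """Hypothetical exception carrying the error object from a JSON-RPC reply."""

    def __init__(self, code, message):
        super().__init__(f"RPC error {code}: {message}")
        self.code = code
        self.message = message


def parse_rpc_response(body):
    """Return the result of a JSON-RPC reply, or raise with the server's error."""
    reply = json.loads(body)
    error = reply.get("error")
    if error is not None:
        raise RpcCallError(error.get("code"), error.get("message"))
    return reply["result"]


# Made-up sample replies: one successful, one with an error object.
ok_body = '{"result": 651234, "error": null, "id": 1}'
bad_body = ('{"result": null, "error": '
            '{"code": -32601, "message": "Method not found"}, "id": 1}')

height = parse_rpc_response(ok_body)
try:
    parse_rpc_response(bad_body)
    reported = None
except RpcCallError as exc:
    # This string is what would be logged before shutting down.
    reported = str(exc)
```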

@kristapsk
Member

Tests pass, but there's now bad error handling when trying the wallet-tool.py default method without Bitcoin Core running.

Before this PR:

$ ./scripts/wallet-tool.py testnet123.jmdat
User data location: /home/bitcoin/.joinmarket/
2020-10-01 21:42:19,145 [DEBUG]  rpc: getblockchaininfo []
2020-10-01 21:42:19,146 [ERROR]  Connection refused.
2020-10-01 21:42:19,146 [ERROR]  Failure of RPC connection to Bitcoin Core. Application cannot continue, shutting down.
Traceback (most recent call last):
  File "./scripts/wallet-tool.py", line 6, in <module>
    jmprint(wallet_tool_main("wallets"), "success")
  File "/home/bitcoin/joinmarket/jmclient/jmclient/wallet_utils.py", line 1390, in wallet_tool_main
    load_program_config(config_path=options.datadir)
  File "/home/bitcoin/joinmarket//jmclient/jmclient/configure.py", line 526, in load_program_config
    global_singleton.config)
  File "/home/bitcoin/joinmarket/jmclient/jmclient/configure.py", line 592, in get_blockchain_interface_instance
    bc_interface = BitcoinCoreInterface(rpc, network)
  File "/home/bitcoin/joinmarket//jmclient/jmclient/blockchaininterface.py", line 173, in __init__
    raise JsonRpcConnectionError("RPC connection to Bitcoin Core "
jmclient.jsonrpc.JsonRpcConnectionError: RPC connection to Bitcoin Core was not established successfully.

After:

$ ./scripts/wallet-tool.py testnet123.jmdat
User data location: /home/bitcoin/.joinmarket/
2020-10-01 21:42:08,156 [DEBUG]  rpc: getblockchaininfo []
2020-10-01 21:42:08,157 [ERROR]  Connection refused.
2020-10-01 21:42:08,157 [ERROR]  Failure of RPC connection to Bitcoin Core. Application cannot continue, shutting down.
Traceback (most recent call last):
  File "/home/bitcoin/joinmarket/jmclient/jmclient/jsonrpc.py", line 85, in queryHTTP
    self.conn.request("POST", self.url, body, headers)
  File "/usr/lib64/python3.6/http/client.py", line 1287, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1333, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1282, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1042, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.6/http/client.py", line 980, in send
    self.connect()
  File "/usr/lib64/python3.6/http/client.py", line 952, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "/usr/lib64/python3.6/socket.py", line 724, in create_connection
    raise err
  File "/usr/lib64/python3.6/socket.py", line 713, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bitcoin/joinmarket/jmclient/jmclient/blockchaininterface.py", line 205, in rpc
    res = self.jsonRpc.call(method, args)
  File "/home/bitcoin/joinmarket/jmclient/jmclient/jsonrpc.py", line 152, in call
    response = self.queryHTTP(request)
  File "/home/bitcoin/joinmarket/jmclient/jmclient/jsonrpc.py", line 128, in queryHTTP
    raise JsonRpcConnectionError("JSON-RPC connection refused.")
jmclient.jsonrpc.JsonRpcConnectionError: JSON-RPC connection refused.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./scripts/wallet-tool.py", line 6, in <module>
    jmprint(wallet_tool_main("wallets"), "success")
  File "/home/bitcoin/joinmarket/jmclient/jmclient/wallet_utils.py", line 1390, in wallet_tool_main
    load_program_config(config_path=options.datadir)
  File "/home/bitcoin/joinmarket/jmclient/jmclient/configure.py", line 526, in load_program_config
    global_singleton.config)
  File "/home/bitcoin/joinmarket/jmclient/jmclient/configure.py", line 592, in get_blockchain_interface_instance
    bc_interface = BitcoinCoreInterface(rpc, network)
  File "/home/bitcoin/joinmarket/jmclient/jmclient/blockchaininterface.py", line 167, in __init__
    blockchainInfo = self.rpc("getblockchaininfo", [])
  File "/home/bitcoin/joinmarket/jmclient/jmclient/blockchaininterface.py", line 217, in rpc
    stop_reactor()
NameError: name 'stop_reactor' is not defined

Also manually fire order creation in coinjoin tests.
This clarification and test change is required due
to the fact that LoopingCalls are designed to fire
immediately by default, before the reactor is
initialized (and therefore before it is in a `running`
state), making it impossible to shut down the reactor as
a result of events happening in that first call;
so we delay the first call of the maker's orderbook
populating code, so that if a no-coins error
occurs, it will actually shut down the reactor and
hence the whole yield generator program, as intended.
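
The timing problem described in this commit message can be modelled with a toy event loop. Everything below is a hypothetical stand-in written for illustration, not the project's code. The `now` flag mirrors the `now=True` default of Twisted's `LoopingCall.start(interval, now=True)`, and the sketch assumes that a stop request made before the loop is running has no effect.

```python
class ToyReactor:
    """Hypothetical event loop: stop() only has an effect while running."""

    def __init__(self):
        self.running = False
        self.pending = []            # callables to run once the loop starts
        self.stopped_by_call = False

    def stop(self):
        if self.running:
            self.running = False
            self.stopped_by_call = True
        # If the loop is not running yet, the request is silently lost.

    def run(self):
        self.running = True
        while self.running and self.pending:
            self.pending.pop(0)()
        self.running = False


def start_looping_call(reactor, func, now=True):
    """Toy periodic call: fire the first call at once, or defer it to the loop."""
    if now:
        func()
    else:
        reactor.pending.append(func)


def simulate(now):
    """Return True if a shutdown requested by the first call took effect."""
    reactor = ToyReactor()

    def populate_orderbook():
        # Stand-in for a maker that finds no coins and asks for shutdown.
        reactor.stop()

    start_looping_call(reactor, populate_orderbook, now=now)
    reactor.run()
    return reactor.stopped_by_call


immediate = simulate(now=True)    # first call fires before run()
delayed = simulate(now=False)     # first call fires inside run()
```

In this model `immediate` is `False` and `delayed` is `True`: only the delayed first call can stop the loop. The toy loop exits on its own when it has no work left, whereas a real reactor in the `immediate` case would simply keep running.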
@AdamISZ AdamISZ force-pushed the shutdown-blockheight-failure branch from e2e9990 to 202f8ee Compare October 1, 2020 18:58
@AdamISZ
Member Author

AdamISZ commented Oct 1, 2020

Oh thanks for catching that, stupid bug. Fixed in 202f8ee (just imported stop_reactor).

@kristapsk
Member

Did another test where the old behaviour was better than the new one. Steps: 1) start Bitcoin Core, 2) start JoinMarketQt, 3) shut down Bitcoin Core, 4) try opening a wallet in JoinMarketQt.

Before this PR:
image

After this PR, it hangs in this state:
image

With this stack trace after you close Qt app:

2020-10-02 23:35:27,421 [ERROR]  Failure of RPC connection to Bitcoin Core. Application cannot continue, shutting down.
Unhandled error in Deferred:

Traceback (most recent call last):
  File "/home/neonz/git5/joinmarket-clientserver/jmvenv/lib/python3.6/site-packages/qt5reactor/core.py", line 286, in _iterate
    self.runUntilCurrent()
  File "/home/neonz/git5/joinmarket-clientserver/jmvenv/lib/python3.6/site-packages/twisted/internet/base.py", line 913, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/home/neonz/git5/joinmarket-clientserver/jmvenv/lib/python3.6/site-packages/twisted/internet/defer.py", line 460, in callback
    self._startRunCallbacks(result)
  File "/home/neonz/git5/joinmarket-clientserver/jmvenv/lib/python3.6/site-packages/twisted/internet/defer.py", line 568, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/home/neonz/git5/joinmarket-clientserver/jmvenv/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/neonz/git5/joinmarket-clientserver/jmvenv/lib/python3.6/site-packages/twisted/internet/task.py", line 866, in <lambda>
    d.addCallback(lambda ignored: callable(*args, **kw))
  File "/mnt/kaka/home/neonz/git5/joinmarket-clientserver/jmclient/jmclient/wallet_service.py", line 437, in sync_wallet
    self.sync_wallet_fast()
  File "/mnt/kaka/home/neonz/git5/joinmarket-clientserver/jmclient/jmclient/wallet_service.py", line 464, in sync_wallet_fast
    self.sync_addresses_fast()
  File "/mnt/kaka/home/neonz/git5/joinmarket-clientserver/jmclient/jmclient/wallet_service.py", line 502, in sync_addresses_fast
    self.get_address_usages()
  File "/mnt/kaka/home/neonz/git5/joinmarket-clientserver/jmclient/jmclient/wallet_service.py", line 482, in get_address_usages
    fagd = (tuple(item) for sublist in agd for item in sublist)
builtins.TypeError: 'NoneType' object is not iterable

@AdamISZ
Member Author

AdamISZ commented Oct 4, 2020

This is dumb, but I've just spent over an hour looking at this, and I can't even replicate your first result; the dialog box with "Walletservice failed to start ..." does not appear for me, and the Exception is not caught. I believe I am doing the exact same thing, on master, i.e.: start Core, start joinmarketqt, stop Core, load a wallet via load->wallet. For me it just outputs to the terminal:

2020-10-04 12:16:06,863 [WARNING]  Connection had broken pipe, attempting reconnect.
2020-10-04 12:16:06,863 [ERROR]  Failure to get blockheight from Bitcoin Core.
Traceback (most recent call last):
  File "joinmarket-qt.py", line 1859, in selectWallet
    self.loadWalletFromBlockchain(firstarg, pwd)
  File "joinmarket-qt.py", line 1890, in loadWalletFromBlockchain
    self.wallet_service = WalletService(wallet)
  File "/home/waxwing/testjminstall/joinmarket-clientserver/jmclient/jmclient/wallet_service.py", line 59, in __init__
    raise Exception("WalletService failed to start "
Exception: WalletService failed to start due to inability to query block height.

... and Qt does not close (i.e. closeEvent is not fired).

Again, to emphasize, this was on master, i.e. I was trying to replicate the first of the two results you just showed.

@AdamISZ
Member Author

AdamISZ commented Oct 5, 2020

OK, came back to it today, and it was immediately obvious what the problem was: the regtest code is in an else branch in selectWallet ... I need to fix that up to behave the same (and yes, I know we should not have separate code for the two cases, but Joinmarket started that way and needs a lot of effort to fix).

@AdamISZ
Member Author

AdamISZ commented Oct 5, 2020

As of 5af2d49 :

So, the above issue raised by @kristapsk makes sense after more careful thought: the problem is that the failure is triggered in the constructor of the WalletService; previously we had raised a general Exception in that case, which was caught (on mainnet!) in the selectWallet call, which then displayed an appropriate message (although it didn't auto-close the app, which I think it should).

In this PR, I removed that general Exception in favour of a reactor shutdown, but this allowed the Qt code to continue with the remaining code in loadWalletFromBlockchain, which actually started the wallet service, ignoring the error, and hitting an error during wallet sync (because the RPC calls were failing).

So here after this new commit, we instead signal, inside the WalletService constructor, that an rpc_error has occurred, so the Qt code knows that it happened and prints a message and quits the app, instead of proceeding to do wallet sync.
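
A hedged sketch of the pattern described above: the constructor records the RPC failure, and the caller checks that flag before going on to sync. `SketchWalletService`, `load_wallet` and the `get_block_height` callable are hypothetical and written only for illustration; only the idea of an `rpc_error` signal comes from the comment itself.

```python
class SketchWalletService:
    """Hypothetical stand-in for a wallet service; not the real class."""

    def __init__(self, get_block_height):
        self.rpc_error = None
        self.synced = False
        try:
            self.current_blockheight = get_block_height()
        except ConnectionError as exc:
            # Record the failure instead of raising or continuing silently.
            self.current_blockheight = None
            self.rpc_error = f"RPC failure during startup: {exc}"

    def sync_wallet(self):
        self.synced = True


def load_wallet(get_block_height):
    """Caller-side check: stop before syncing if startup flagged an error."""
    service = SketchWalletService(get_block_height)
    if service.rpc_error:
        # A GUI would show a dialog here and then quit the application.
        return "quit: " + service.rpc_error
    service.sync_wallet()
    return "synced"


def refused():
    """Simulates a daemon that has been shut down."""
    raise ConnectionError("connection refused")


normal_result = load_wallet(lambda: 651234)  # made-up block height
failed_result = load_wallet(refused)
```

The point of the flag is that the caller decides how to report and exit (dialog box in Qt, log line on the command line) while never reaching the sync step with a dead RPC connection.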

Testing: I have checked that the command line maker quits with a proper error message if there are no coins, and also does so if bitcoind shuts down while it is running (so a recheck of the above); that the case illustrated by @kristapsk above now prints an error dialog in the GUI and then quits the app; and that, when there is no bitcoind error, the wallet load and startup proceed normally in Qt.
There are of course several other tests we can do, please do them if you get the chance.

@kristapsk
Member

Just noticed another not so nice error message in this PR vs master. To reproduce: 1) start Bitcoin Core (on testnet), 2) run wallet-tool.py wallet.jmdat, 3) wait for the password prompt, 4) shut down Bitcoin Core, 5) enter the wallet password.

Master:

$ ./scripts/wallet-tool.py testnet111.jmdat
User data location: /home/bitcoin/.joinmarket/
2020-10-06 04:12:37,421 [DEBUG]  rpc: getblockchaininfo []
Enter passphrase to decrypt wallet: 
2020-10-06 04:12:54,573 [ERROR]  Failure of RPC connection to Bitcoin Core. Application cannot continue, shutting down.
2020-10-06 04:12:54,573 [ERROR]  Failure to get blockheight from Bitcoin Core.
Traceback (most recent call last):
  File "./scripts/wallet-tool.py", line 6, in <module>
    jmprint(wallet_tool_main("wallets"), "success")
  File "/home/bitcoin/joinmarket-clientserver/jmclient/jmclient/wallet_utils.py", line 1434, in wallet_tool_main
    wallet_service = WalletService(wallet)
  File "/home/bitcoin/joinmarket-clientserver/jmclient/jmclient/wallet_service.py", line 59, in __init__
    raise Exception("WalletService failed to start "
Exception: WalletService failed to start due to inability to query block height.

This PR:

$ ./scripts/wallet-tool.py testnet111.jmdat
User data location: /home/bitcoin/.joinmarket/
2020-10-06 04:11:45,356 [DEBUG]  rpc: getblockchaininfo []
Enter passphrase to decrypt wallet: 
2020-10-06 04:12:03,794 [ERROR]  Failure of RPC connection to Bitcoin Core. Application cannot continue, shutting down.
2020-10-06 04:12:03,795 [ERROR]  Critical error updating blockheight.
2020-10-06 04:12:03,795 [ERROR]  Failure of RPC connection to Bitcoin Core in wallet service startup. Application cannot continue, shutting down.
2020-10-06 04:12:03,795 [DEBUG]  rpc: listaddressgroupings []
2020-10-06 04:12:03,796 [ERROR]  Connection refused.
2020-10-06 04:12:03,796 [ERROR]  Failure of RPC connection to Bitcoin Core. Application cannot continue, shutting down.
Traceback (most recent call last):
  File "./scripts/wallet-tool.py", line 6, in <module>
    jmprint(wallet_tool_main("wallets"), "success")
  File "/home/bitcoin/joinmarket-clientserver/jmclient/jmclient/wallet_utils.py", line 1442, in wallet_tool_main
    if wallet_service.sync_wallet(fast = not options.recoversync):
  File "/home/bitcoin/joinmarket-clientserver/jmclient/jmclient/wallet_service.py", line 444, in sync_wallet
    self.sync_wallet_fast()
  File "/home/bitcoin/joinmarket-clientserver/jmclient/jmclient/wallet_service.py", line 471, in sync_wallet_fast
    self.sync_addresses_fast()
  File "/home/bitcoin/joinmarket-clientserver/jmclient/jmclient/wallet_service.py", line 509, in sync_addresses_fast
    self.get_address_usages()
  File "/home/bitcoin/joinmarket-clientserver/jmclient/jmclient/wallet_service.py", line 489, in get_address_usages
    fagd = (tuple(item) for sublist in agd for item in sublist)
TypeError: 'NoneType' object is not iterable

@AdamISZ
Member Author

AdamISZ commented Oct 6, 2020

OK; so this is a pretty obscure case (one could argue the previous one was too); the error you're seeing there happens for the same reason as for the previous one, I can fix it up the same way (and I will) but it's worth noting this is really pretty minor and a very unlikely edge case - "minor" because basically the program crashes either way, it just crashes more messily now.
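
For readers following the traceback, a small sketch of why the flattening line fails once the RPC has returned nothing, and one possible guard. `flatten_address_groupings` is a hypothetical helper, the sample groupings are made up, and the assumption that the failed RPC call yields `None` is inferred from the `TypeError` in the log above.

```python
def flatten_address_groupings(agd):
    """Flatten nested groupings; return None if the RPC gave no result."""
    if agd is None:
        # The failure has already been reported elsewhere; do not iterate.
        return None
    return [tuple(item) for sublist in agd for item in sublist]


# Made-up sample in the nested shape the flattening expression expects.
sample = [[["addr1", 0.5], ["addr2", 0.0]], [["addr3", 1.0]]]
flat = flatten_address_groupings(sample)
guarded = flatten_address_groupings(None)

# Without the guard, even building the generator fails, because the
# outermost iterable of a generator expression is evaluated immediately.
try:
    (tuple(item) for sublist in None for item in sublist)
    unguarded_error = None
except TypeError as exc:
    unguarded_error = str(exc)
```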

@kristapsk
Member

Did some more testing and haven't found any new issues.

Member

@kristapsk kristapsk left a comment

ACK 5604857

@AdamISZ
Member Author

AdamISZ commented Oct 8, 2020

Thanks for the thorough checks.

@AdamISZ AdamISZ merged commit c113b2c into master Oct 8, 2020
@AdamISZ AdamISZ deleted the shutdown-blockheight-failure branch October 9, 2020 12:23