-
Notifications
You must be signed in to change notification settings - Fork 35.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rpc: Add a -rpcwaittimeout
parameter to limit time spent waiting
#21056
Conversation
c76465d
to
7f19f5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK. Could append a test to the existing -rpcwait
coverage at the end of test/functional/interface_bitcoin_cli.py
to add coverage.
src/bitcoin-cli.cpp
Outdated
@@ -712,6 +714,9 @@ static UniValue ConnectAndCallRPC(BaseRequestHandler* rh, const std::string& str | |||
UniValue response(UniValue::VOBJ); | |||
// Execute and handle connection failures with -rpcwait. | |||
const bool fWait = gArgs.GetBoolArg("-rpcwait", false); | |||
const int timeout = gArgs.GetArg("-rpcwaittimeout", DEFAULT_WAIT_CLIENT_TIMEOUT); | |||
const int64_t deadline = GetTime() + timeout; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think GetTime
is deprecated in favor of GetSystemTimeInSeconds
(not mockable) or GetTime<T>
(mockable), see src/util.time.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, amended the use of GetTime
with the GetTime<T>
version (had to look it up, hopefully I got the right version).
@@ -70,6 +71,7 @@ static void SetupCliArgs(ArgsManager& argsman) | |||
argsman.AddArg("-rpcport=<port>", strprintf("Connect to JSON-RPC on <port> (default: %u, testnet: %u, signet: %u, regtest: %u)", defaultBaseParams->RPCPort(), testnetBaseParams->RPCPort(), signetBaseParams->RPCPort(), regtestBaseParams->RPCPort()), ArgsManager::ALLOW_ANY | ArgsManager::NETWORK_ONLY, OptionsCategory::OPTIONS); | |||
argsman.AddArg("-rpcuser=<user>", "Username for JSON-RPC connections", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS); | |||
argsman.AddArg("-rpcwait", "Wait for RPC server to start", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS); | |||
argsman.AddArg("-rpcwaittimeout=<n>", strprintf("Timeout in seconds to wait for the RPC server to start, or 0 for no timeout. (default: %d)", DEFAULT_WAIT_CLIENT_TIMEOUT), ArgsManager::ALLOW_INT, OptionsCategory::OPTIONS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Judging from the PR description you likely thought about this, but could the timeout still be an argument in -rpcwait
and only kick in if the value is greater than 1? That would avoid introducing another config option and as a bonus would be shorter to type. IDK what kind of timeout values API clients need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quite happy to switch it around, but having 1
as a special value is kinda weird as well. I thought I'd go the most cautious route first and see which way wins in the review xD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair. Is the real time ever that low anyway? Maybe counting -rpcwait=1 as one second would be fine...I don't have a fast computer and generally run debug builds, so I may see slower times than others, but rpcwait is ~3 seconds for me on signet and much longer on regtest (~10s), testnet (~95s) and mainnet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't expect many users to run into issues if we were to special case -rpcwait=1
, but given the "creativity" of users I wouldn't put it past them to use -rpcwait=42
or -rpcwait=31337
. Some users seem to see "any value" as a personal challenge 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still be in favor of combining rpcwait
and rpcwaittimeout
, but if we prefer two arguments for this, it may be helpful to mention the relationship to -rpcwait
in this description.
src/bitcoin-cli.cpp
Outdated
@@ -39,6 +39,7 @@ UrlDecodeFn* const URL_DECODE = urlDecode; | |||
|
|||
static const char DEFAULT_RPCCONNECT[] = "127.0.0.1"; | |||
static const int DEFAULT_HTTP_CLIENT_TIMEOUT=900; | |||
static const int DEFAULT_WAIT_CLIENT_TIMEOUT=0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, could be constexpr
and apply clang formatting for new code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to catch up about constexpr
, but good point 👍
Concept ACK |
2 similar comments
Concept ACK |
Concept ACK |
Thanks for the reviews and concept ACKs everybody 👍 I'm a bit puzzled by the appveyo failure, which doesn't surface any details related to these changes, is this something my changes broke or was it something with |
Concept ACK, code change looks good to me. Just noting that effective minimum timeout is 1s, maybe sleep with min(timout, 1s)? An alternative would be to specify attempt count. |
Excellent point/ The reason I went for the deadline approach was that in between sleeps we are actually attempting to connect over the network, which has its own timeout ( Effectively the time we can wait in the worst case is |
Appveyor is a bit flakey ATM, it's probably unrelated. |
cb8436f
to
9d2a38a
Compare
Rebased on top of |
9d2a38a
to
7dcead4
Compare
Yep seems, like CI is happier now ^^ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 7dcead4.
nit, could have a release note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Small note too: you should remove the @
from the OP or laanwj will get spammed by all the altcoins merge (and the merge script will refuse it anyways) :)
src/bitcoin-cli.cpp
Outdated
@@ -713,6 +715,9 @@ static UniValue ConnectAndCallRPC(BaseRequestHandler* rh, const std::string& str | |||
UniValue response(UniValue::VOBJ); | |||
// Execute and handle connection failures with -rpcwait. | |||
const bool fWait = gArgs.GetBoolArg("-rpcwait", false); | |||
const int timeout = gArgs.GetArg("-rpcwaittimeout", DEFAULT_WAIT_CLIENT_TIMEOUT); | |||
const int64_t deadline = GetTime<std::chrono::microseconds>().count() + timeout; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want this to be std::chrono::seconds
instead of std::chrono::microseconds
, or to change the doc above to mention the parameter is passed in microseconds.
Can you point me to the conventions for release notes? This is my first PR that might need them 🎉 |
I couldn't find anything in These have a) the broad category (in this case, I guess "RPC client functionality") b) a very brief description (not full documentation!) of the user-visible change. See for example: |
Awesome, thanks. I was expecting something like this to avoid hotspots, and it's way easier than c-lightning's Changelog-in-the-commit-message method 👍 |
Oh great! I somehow couldn't find this (I did grep for a few things). Might make sense to link to this from |
7dcead4
to
d373793
Compare
Added preliminary release notes in a new file, and fixed up the units in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments below. Haven't checked, but ISTM the logic could use std::chrono
throughout rather than converting values back to integers. A functional test verifying the timeout works correctly and a specific error type is raised might be a good idea.
|
||
- The new `-rpcwaittimeout` argument to `bitcoin-cli` sets the timeout | ||
in seconds to use with `-rpcwait`. If the timeout expires | ||
`bitcoin-cli` will report a failure. (#21056) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Among the current headers in doc/release-notes.md
, "New settings" might be the closest, but I'm not sure. There doesn't seem to be a CLI header. Suggestion follows, feel free to ignore.
-RPC client functionality
-------------------------
+New settings
+------------
-- The new `-rpcwaittimeout` argument to `bitcoin-cli` sets the timeout
- in seconds to use with `-rpcwait`. If the timeout expires
+- A new `-rpcwaittimeout` argument to `bitcoin-cli` sets the timeout
+ in seconds to use with `-rpcwait`. If the timeout expires,
`bitcoin-cli` will report a failure. (#21056)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if that header is better. As a compromise "New bitcoin-cli settings" maybe?
I do think it's important to keep mentions of server and other tools's command line arguments separated. Then again, all of this can be organized when merging the fragments—it's not an automatic process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, "New bitcoin-cli settings" seems better to me, too. Though as mentioned in #21056 (comment), if we add the argument to the existing -rpcwait=
(as a CLI setting intended for human manual use rather than by client software, ISTM it may be less subject to API constraints and would be less verbose than -rpcwait -rpcwaittimeout=
), it could be "Updated bitcoin-cli settings".
@@ -70,6 +71,7 @@ static void SetupCliArgs(ArgsManager& argsman) | |||
argsman.AddArg("-rpcport=<port>", strprintf("Connect to JSON-RPC on <port> (default: %u, testnet: %u, signet: %u, regtest: %u)", defaultBaseParams->RPCPort(), testnetBaseParams->RPCPort(), signetBaseParams->RPCPort(), regtestBaseParams->RPCPort()), ArgsManager::ALLOW_ANY | ArgsManager::NETWORK_ONLY, OptionsCategory::OPTIONS); | |||
argsman.AddArg("-rpcuser=<user>", "Username for JSON-RPC connections", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS); | |||
argsman.AddArg("-rpcwait", "Wait for RPC server to start", ArgsManager::ALLOW_ANY, OptionsCategory::OPTIONS); | |||
argsman.AddArg("-rpcwaittimeout=<n>", strprintf("Timeout in seconds to wait for the RPC server to start, or 0 for no timeout. (default: %d)", DEFAULT_WAIT_CLIENT_TIMEOUT), ArgsManager::ALLOW_INT, OptionsCategory::OPTIONS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still be in favor of combining rpcwait
and rpcwaittimeout
, but if we prefer two arguments for this, it may be helpful to mention the relationship to -rpcwait
in this description.
src/bitcoin-cli.cpp
Outdated
if (fWait) { | ||
UninterruptibleSleep(std::chrono::milliseconds{1000}); | ||
const int64_t now = GetTime<std::chrono::seconds>().count(); | ||
if (fWait && (timeout == 0 || now < deadline)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error I'm seeing is a bit confusing. It would be good to raise a specific error for the timeout value exceeded.
$ bitcoind -regtest -daemon ; bitcoin-cli -regtest -rpcwait -rpcwaittimeout=3 -getinfo
Bitcoin Core starting
error: server in warmup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK, but I think actual software shouldn't be using bitcoin-cli at all...
For what it's worth I agree with @luke-jr here. Never really understood why |
Not trying to make this a It was/is easier to call out to an existing program that includes the logic for authentication, error handling and timeouts than to implement it from scratch on top of out io library which doesn't have good http support (this is likely self-inflicted, but it's what we've got and it's working well for us so far). Authentication here being the big win, since if you have Are there downsides of calling out to That being said, I'm sure we're not the only ones that could use a wait timeout (thinking primarily of scripts and operations). |
I guess it's mostly down to personal preference, I prefer to use APIs directly, calling out to external programs reminds me of shell scripts which always are very environment-dependent. One concrete case is that bitcoin RPC might be forwarded from another container. Right now it's necessary to copy Also as Also accessing the API directly gives you full control over things such as timeouts. No need to wait for PRs to be merged here and then part of a release so that users have it 😄 JSONRPC handling is straightforward if you already have JSON for your own APIs.
For authentication—I'd somewhat agree, but we don't do anything complex such as HMAC or challenge/response. The only available authentication mechanism is HTTP basic authentication with a) send
There hasn't been TLS support in Sorry for derailing this further. Again, this is not meant as criticism on this change nor on |
Reworded the release notes as proposed by @jonatack, and rebased on top of Open questions are:
Anything I'm missing? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested ACK 4840dee.
As far as I can see the only unresolved thing in this PR is the discussion about whether to reuse
-rpcwait
and redefine the argument to be the timeout in seconds, or whether the current addition of a new flag is preferrable.
I think current approach is fine otherwise how would the user disable waiting - -rpcwait=0
is disable or wait forever?
@cdecker you can update OP now that there is a test.
self.log.info("Test -rpcwait option waits at most -rpcwaittimeout seconds for startup") | ||
self.stop_node(0) # stop the node so we time out | ||
assert_raises_process_error(1, "Could not connect to the server", self.nodes[0].cli('-rpcwait', '-rpcwaittimeout=5').echo) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, could check elapsed time is >= to 5s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added check using assert_greater_than_or_equal
and had to import time
but that should work ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be assert_greater_than_or_equal(time.time(), start_time + 5)
.
@@ -738,6 +740,9 @@ static UniValue ConnectAndCallRPC(BaseRequestHandler* rh, const std::string& str | |||
UniValue response(UniValue::VOBJ); | |||
// Execute and handle connection failures with -rpcwait. | |||
const bool fWait = gArgs.GetBoolArg("-rpcwait", false); | |||
const int timeout = gArgs.GetArg("-rpcwaittimeout", DEFAULT_WAIT_CLIENT_TIMEOUT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, is it fine to accept <0? A negative value is same as -rpcwait=false
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, the alternative would be to throw an error, but that might be a bit much.
src/bitcoin-cli.cpp
Outdated
UninterruptibleSleep(std::chrono::milliseconds{1000}); | ||
} catch (const CConnectionFailed& e) { | ||
const int64_t now = GetTime<std::chrono::seconds>().count(); | ||
if (fWait && (timeout == 0 || now < deadline)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe timeout <= 0
- zero or negative means no timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amended the ==
to become <=
👍
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsNo conflicts as of last run. |
Rebased on top of |
@cdecker I think you forgot to autosquash? |
Sorry, used to the c-lightning workflow where we keep fixups to show how things were addressed. Will squash as soon as I wake up 🙂 |
Adds a new numeric `-rpcwaittimeout` that can be used to limit the time we spend waiting on the RPC server to appear. This is used by downstream projects to provide a bit of slack when `bitcoind`s RPC interface is not available right away.
As proposed by @laanwj the error message is now prefixed with the "timeout on transient error:" prefix, to explain why the error is suddenly considered terminal.
Rebased on top of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested ACK 5685cdc but the test needs to be fixed.
self.log.info("Test -rpcwait option waits at most -rpcwaittimeout seconds for startup") | ||
self.stop_node(0) # stop the node so we time out | ||
assert_raises_process_error(1, "Could not connect to the server", self.nodes[0].cli('-rpcwait', '-rpcwaittimeout=5').echo) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be assert_greater_than_or_equal(time.time(), start_time + 5)
.
Suggested-by: Jon Atack <@jonatack>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK b9e76f1.
Yep, I switched the arguments around apparently 😉 |
Is this good in its current state or does something need to be addressed? Probably not the highest priority issue, but would love to remove it from my watchlist, e.g., by merging it 😉 |
Code review ACK b9e76f1 |
Adds a new numeric `-rpcwaittimeout` that can be used to limit the time we spend waiting on the RPC server to appear. This is used by downstream projects to provide a bit of slack when `bitcoind`s RPC interface is not available right away. Github-Pull: bitcoin#21056 Rebased-From: a7fcc8e
As proposed by @laanwj the error message is now prefixed with the "timeout on transient error:" prefix, to explain why the error is suddenly considered terminal. Github-Pull: bitcoin#21056 Rebased-From: f76cb10
Suggested-by: Jon Atack <@jonatack> Github-Pull: bitcoin#21056 Rebased-From: b9e76f1
Adds a new numeric
-rpcwaittimeout
that can be used to limit thetime we spend waiting on the RPC server to appear. This is used by
downstream projects to provide a bit of slack when
bitcoind
s RPCinterface is not available right away.
This makes the
-rpcwait
argument more useful, since we can now limithow long we'll ultimately wait, before potentially giving up and reporting
an error to the caller. It was discussed in the context of the BTCPayServer
wanting to have c-lightning wait for the RPC interface to become available
but still have the option of giving up eventually (4355).
I checked with laanwj whether this is already possible (comment), and
whether this would be a welcome change. Initially I intended to repurpose
the (optional) argument to
-rpcwait
, however I decided against it since itwould potentially break existing configurations, using things like
rpcwait=1
,or
rpcwait=true
(the former would have an unintended short timeout, whenold behavior was to wait indefinitely).
Due to its simplicity I didn't implement a test for it yet, but if that's desired ITest was added during reviews.can provide one.