New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Use our own implementation of urlDecode #29904
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. |
Concept ACK |
1 similar comment
Concept ACK |
Concept ACK. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK
Since urlDecode
no longer needs libevent, we can clean up the workaround introduced in #18504 too (edit, oh that's basically what @laanwj said a few comments up):
git diff on 2447bc7
diff --git a/src/bitcoin-cli.cpp b/src/bitcoin-cli.cpp
index 129deeec60..830171f63a 100644
--- a/src/bitcoin-cli.cpp
+++ b/src/bitcoin-cli.cpp
@@ -11,7 +11,6 @@
#include <clientversion.h>
#include <common/args.h>
#include <common/system.h>
-#include <common/url.h>
#include <compat/compat.h>
#include <compat/stdin.h>
#include <policy/feerate.h>
@@ -51,7 +50,6 @@
using CliClock = std::chrono::system_clock;
const std::function<std::string(const char*)> G_TRANSLATION_FUN = nullptr;
-UrlDecodeFn* const URL_DECODE = urlDecode;
static const char DEFAULT_RPCCONNECT[] = "127.0.0.1";
static const int DEFAULT_HTTP_CLIENT_TIMEOUT=900;
diff --git a/src/bitcoin-wallet.cpp b/src/bitcoin-wallet.cpp
index d5dfbbec27..fe90958a5f 100644
--- a/src/bitcoin-wallet.cpp
+++ b/src/bitcoin-wallet.cpp
@@ -11,7 +11,6 @@
#include <clientversion.h>
#include <common/args.h>
#include <common/system.h>
-#include <common/url.h>
#include <compat/compat.h>
#include <interfaces/init.h>
#include <key.h>
@@ -28,7 +27,6 @@
#include <tuple>
const std::function<std::string(const char*)> G_TRANSLATION_FUN = nullptr;
-UrlDecodeFn* const URL_DECODE = nullptr;
static void SetupWalletToolArgs(ArgsManager& argsman)
{
diff --git a/src/bitcoind.cpp b/src/bitcoind.cpp
index 4f0a816388..54796c5abb 100644
--- a/src/bitcoind.cpp
+++ b/src/bitcoind.cpp
@@ -12,7 +12,6 @@
#include <common/args.h>
#include <common/init.h>
#include <common/system.h>
-#include <common/url.h>
#include <compat/compat.h>
#include <init.h>
#include <interfaces/chain.h>
@@ -35,7 +34,6 @@
using node::NodeContext;
const std::function<std::string(const char*)> G_TRANSLATION_FUN = nullptr;
-UrlDecodeFn* const URL_DECODE = urlDecode;
#if HAVE_DECL_FORK
diff --git a/src/common/url.h b/src/common/url.h
index b16b8241af..0283f471bc 100644
--- a/src/common/url.h
+++ b/src/common/url.h
@@ -7,8 +7,6 @@
#include <string>
-using UrlDecodeFn = std::string(const std::string& url_encoded);
-UrlDecodeFn urlDecode;
-extern UrlDecodeFn* const URL_DECODE;
+std::string urlDecode(const std::string& urlEncoded);
#endif // BITCOIN_COMMON_URL_H
diff --git a/src/qt/main.cpp b/src/qt/main.cpp
index ded057dbfa..16befd99e8 100644
--- a/src/qt/main.cpp
+++ b/src/qt/main.cpp
@@ -4,7 +4,6 @@
#include <qt/bitcoin.h>
-#include <common/url.h>
#include <compat/compat.h>
#include <util/translation.h>
@@ -17,7 +16,6 @@
extern const std::function<std::string(const char*)> G_TRANSLATION_FUN = [](const char* psz) {
return QCoreApplication::translate("bitcoin-core", psz).toStdString();
};
-UrlDecodeFn* const URL_DECODE = urlDecode;
const std::function<std::string()> G_TEST_GET_FULL_NAME{};
diff --git a/src/test/util/setup_common.cpp b/src/test/util/setup_common.cpp
index 2c18184261..38350b33cc 100644
--- a/src/test/util/setup_common.cpp
+++ b/src/test/util/setup_common.cpp
@@ -14,7 +14,6 @@
#include <banman.h>
#include <chainparams.h>
#include <common/system.h>
-#include <common/url.h>
#include <consensus/consensus.h>
#include <consensus/params.h>
#include <consensus/validation.h>
@@ -81,7 +80,6 @@ using node::RegenerateCommitments;
using node::VerifyLoadedChainstate;
const std::function<std::string(const char*)> G_TRANSLATION_FUN = nullptr;
-UrlDecodeFn* const URL_DECODE = nullptr;
/** Random context to get unique temp data dirs. Separate from g_insecure_rand_ctx, which can be seeded from a const env var */
static FastRandomContext g_insecure_rand_ctx_temp_path;
diff --git a/src/wallet/rpc/util.cpp b/src/wallet/rpc/util.cpp
index 06ec7db1bc..4e93b0eac5 100644
--- a/src/wallet/rpc/util.cpp
+++ b/src/wallet/rpc/util.cpp
@@ -61,9 +61,9 @@ bool ParseIncludeWatchonly(const UniValue& include_watchonly, const CWallet& wal
bool GetWalletNameFromJSONRPCRequest(const JSONRPCRequest& request, std::string& wallet_name)
{
- if (URL_DECODE && request.URI.substr(0, WALLET_ENDPOINT_BASE.size()) == WALLET_ENDPOINT_BASE) {
+ if (request.URI.substr(0, WALLET_ENDPOINT_BASE.size()) == WALLET_ENDPOINT_BASE) {
// wallet endpoint was used
- wallet_name = URL_DECODE(request.URI.substr(WALLET_ENDPOINT_BASE.size()));
+ wallet_name = urlDecode(request.URI.substr(WALLET_ENDPOINT_BASE.size()));
return true;
}
return false;
nit: Would then update the fn signature to std::string UrlDecode(const std::string& url_encoded)
src/common/url.cpp
Outdated
// Next two characters are part of the percent encoding | ||
i += 2; | ||
} else { | ||
// In case of invalid percent encoding, add the '%' and continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the rationale for continuing here? Should we accept invalid URLs? At first sight, I think I'd prefer having this return a util::Result
and have the function fail in case of missing hex digits or invalid hex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also found this a bit strange but it's the behavior of libevent and they even have an explicit test case for this (which failed when I wrote my initial implementation). I didn't investigate this further yet but I think the safest option is to keep this a pure refactor for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better or worse, this silent accepting behavior for invalid hex values in URL decoding is really common. See for example Python:
>>> import urllib.parse
>>> urllib.parse.unquote('%12%%xx%0a')
'\x12%%xx\n'
It's the web philopsophy of "in case of doubt, just do something!" 😅 . Anyhow, in this refactor it doesn't make sense imo to add error handling here.
Addressed @laanwj 's comment and removed the hooking. FWIW, I wondered for a moment why this is in common and not util but that was a conscious decision, see #26294.
Absolutely, I hadn't even seen that yet. For reviewers that wonder as well, the point of this was to not have the wallet depend on Nice, I also missed that one! |
e93a6b6
to
d2a35ae
Compare
Addressed @stickies-v 's feedback as well, thanks a lot! |
d2a35ae
to
f037e67
Compare
f037e67
to
68bf6c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 68bf6c8
Link to the code as of v2.2.12 which we use in depends:
evhttp_uridecode
: https://github.com/libevent/libevent/blob/release-2.1.12-stable/http.c#L3225-L3246evhttp_decode_uri_internal
: https://github.com/libevent/libevent/blob/release-2.1.12-stable/http.c#L3169-L3205
I was also happy to see that test/fuzz/string.cpp
already covers this function.
@@ -51,7 +50,6 @@ | |||
using CliClock = std::chrono::system_clock; | |||
|
|||
const std::function<std::string(const char*)> G_TRANSLATION_FUN = nullptr; | |||
UrlDecodeFn* const URL_DECODE = urlDecode; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this doing anything, or did we forget to remove it earlier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I understand your question, do you mean this line in particular or the whole hooking thing? This line was added the with the rest of hooking code in #18504 and I don't think anything was changed about that until now, so I don't know where we might have forgotten to remove it.
8d96c74
to
f986315
Compare
Addressed feedback by @paplorinc and @Sjors , thanks a lot! I also reorganized the commits a bit because they were doing more than one thing and it was hard to address feedback cleanly. The tests are now split from the implementation, which makes it easier to cherry-pick the unit test commit and run it against master. And I split off the renaming into a scripted diff. |
5e1d353
to
d836cdb
Compare
Concept NACK. Embedding copies of shared code just makes things more fragile, not safer. It's far more likely there's a bugfix that we miss picking up on, than that there's malicious code inserted later. |
Sometimes i'd agree with that reasoning. But let's face it, (Also this isn't a direct copy of the C code! It's a C++ reimplementation that does |
In addition to what @laanwj said, it's also a much simpler re-implementation. The libevent function provided further options that we didn't care about and hardcoded, that's how we could simplify. And as was discussed in the last IRC meeting, there is also interest in removing libevent in general, at which point this would be a necessary change anyway. We are also removing that hooking code which is a big improvement in itself. If you subtract the test code, we are almost removing as much code as we are adding.
So, to be clear, your opinion is for this part I should remove the patch and introduce a change of behavior, right? |
Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
cb8f83d
to
20d44a1
Compare
No idea what this means ;) but I addressed the feedback, thanks a lot! |
The point of this was to be able to build bitcoin-tx and bitcoin-wallet without libevent, see bitcoin#18504. Now that we use our own implementation of urlDecode this is not needed anymore. Co-authored-by: stickies-v <stickies-v@protonmail.com>
-BEGIN VERIFY SCRIPT- sed -i 's/urlDecode/UrlDecode/g' $(git grep -l 'urlDecode' ./src) sed -i 's/urlEncoded/url_encoded/g' $(git grep -l 'urlEncoded' ./src) -END VERIFY SCRIPT-
The previous behavior was the result of casting the result returned from the libevent function evhttp_uridecode to std:string but this was probably not intended.
20d44a1
to
992c714
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code-review ACK 992c714
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some differential fuzzing, as this is an ideal fit for it.
ACK 992c714 👈
Show signature
Signature:
untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: ACK 992c714451676cee33d3dff49f36329423270c1c 👈
uoUAes8qa1RR+FiK2t8YFFjOnn0fR1ewcF3JZcxlbFYRLH099vwrEWYq5uCUNGpAtidxMgd/deCwvPSTZbvgDQ==
BOOST_CHECK_EQUAL(UrlDecode("%20invalid%ZX"), " invalid%ZX"); | ||
BOOST_CHECK_EQUAL(UrlDecode("%20%G1%ZX"), " %G1%ZX"); | ||
|
||
BOOST_CHECK_EQUAL(UrlDecode("%1 "), "%1 "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more tests:
%%ffg
%fg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I will add them when I have to retouch, if not I will make a small follow-up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should document at least that we thought of the (already added)%00
corner case, e.g. via a test case or a change.
I have verified that the behavior is otherwise the same before and after via:
std::string GenerateRandomString() {
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> length_dis(1, 10);
std::uniform_int_distribution<> char_dis(1, 255);
size_t length = length_dis(gen);
std::string res;
res.reserve(length);
for (size_t i = 0; i < length; ++i) {
char c = static_cast<char>(char_dis(gen));
res.push_back(c);
}
return res;
}
std::string urlDecode_old(const std::string &urlEncoded) {
std::string res;
if (!urlEncoded.empty()) {
char *decoded = evhttp_uridecode(urlEncoded.c_str(), false, nullptr);
if (decoded) {
res = std::string(decoded);
free(decoded);
}
}
return res;
}
BOOST_AUTO_TEST_CASE(randomized_url_decode_length_test) {
for (int i = 0; i < 100'000'000; ++i) {
std::string random_encoded = GenerateRandomString();
std::string result1 = urlDecode_old(random_encoded);
std::string result2 = urlDecode(random_encoded);
if (result1 != result2) {
std::cerr
<< "urlDecode discrepancy found, random_encoded = `" + random_encoded + "`, urlDecode_old = `" +
result1 + "`, urlDecode (new) = `" + result2 + "`" << std::endl;
}
BOOST_CHECK_LE(result1.size(), random_encoded.size());
}
}
BOOST_AUTO_TEST_CASE(decode_malformed_test) { | ||
BOOST_CHECK_EQUAL(UrlDecode("%%xhello th+ere \xff"), "%%xhello th+ere \xff"); | ||
|
||
BOOST_CHECK_EQUAL(UrlDecode("%"), "%"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's still one sneaky difference that I've found: %00
.
Previously this prematurely terminated the string, now it's kept:
BOOST_CHECK_EQUAL(urlDecode("a%00b"), std::string("a\u0000b", 3));
this failed previously, passes now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 992c714
nit: definitely not worth more back-and-forth, but I think (untested) maflcko's suggestion should work with MSVC when explicitly converting iterators to pointers:
git diff on 992c714
diff --git a/src/common/url.cpp b/src/common/url.cpp
index ecf88d07ea..c901ecb5dc 100644
--- a/src/common/url.cpp
+++ b/src/common/url.cpp
@@ -14,25 +14,24 @@ std::string UrlDecode(std::string_view url_encoded)
std::string res;
res.reserve(url_encoded.size());
- for (size_t i = 0; i < url_encoded.size(); ++i) {
- char c = url_encoded[i];
+ for (auto it = url_encoded.begin(); it < url_encoded.end(); ++it) {
// Special handling for percent which should be followed by two hex digits
// representing an octet values, see RFC 3986, Section 2.1 Percent-Encoding
- if (c == '%' && i + 2 < url_encoded.size()) {
+ if (*it == '%' && std::distance(it, url_encoded.end()) > 2) {
unsigned int decoded_value{0};
- auto [p, ec] = std::from_chars(url_encoded.data() + i + 1, url_encoded.data() + i + 3, decoded_value, 16);
+ auto [p, ec] = std::from_chars(&*(it + 1), &*(it + 3), decoded_value, 16);
// Only if there is no error and the pointer is set to the end of
// the string, we can be sure both characters were valid hex
- if (ec == std::errc{} && p == url_encoded.data() + i + 3) {
+ if (ec == std::errc{} && p == &*(it + 3)) {
res += static_cast<char>(decoded_value);
// Next two characters are part of the percent encoding
- i += 2;
+ it += 2;
continue;
}
// In case of invalid percent encoding, add the '%' and continue
}
- res += c;
+ res += *it;
}
return res;
Yes, it's certainly possible and I briefly explored it myself, but it didn't seem like much of an improvement over the current version anymore. And the current version had already received some review at that point, so I decided it would be better to just revert since it was a style-nit anyway. |
ACK 992c714 |
I went ahead and merged this, despite NACK from luke #29904 (comment) about costs of forking shared code. It seems like the NACK was adequately responded to, with responses acknowledging the tradeoff and giving reasons why it seemed justified in this case (even fixing a bug luke reported #29654) |
fa55972 test: Add two more urlDecode tests (MarcoFalke) Pull request description: Trivial follow-up after #29904 (comment) ACKs for top commit: laanwj: Code review ACK fa55972 fjahr: ACK fa55972 stickies-v: ACK fa55972 Sjors: utACK fa55972 Tree-SHA512: 99916feebb35b5670a365120f962fd6c28cb124635c99ac3ee3520dfc130bd1672f43b06b05b7b0b9e563d223bd009f8d6622817a2d2b4ee24596af40e2cdfaf
It has been a part of SOURCE_FILES since bitcoin#29904
97a4ad5 build, msvc: Drop duplicated `common\url.cpp` source file (Hennadii Stepanov) Pull request description: After #29904, the `common\url.cpp` source file is included into the `SOURCE_FILES` by the `msvc-autogen.py` script. Removes a compiler [warning](https://github.com/bitcoin/bitcoin/actions/runs/8853698173/job/24315127236#step:20:1776): ``` url.obj : warning LNK4006: "class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > __cdecl UrlDecode(class std::basic_string_view<char,struct std::char_traits<char> >)" (?UrlDecode@@ya?AV?$basic_string@DU?$char_traits@D@std@@v?$allocator@D@2@@std@@v?$basic_string_view@DU?$char_traits@D@std@@@2@@z) already defined in common_url.obj; second definition ignored [D:\a\bitcoin\bitcoin\build_msvc\libbitcoin_common\libbitcoin_common.vcxproj] ``` ACKs for top commit: fanquake: ACK 97a4ad5 Tree-SHA512: 294955d6e6940b48a429e2302fb456706a5c62515d479398036b40716ee6b722535876adeb9b988ddb8fc942fabc39fe358c50eff0baaae92bd24bbeb4362885
Ported to the CMake-based build system in hebasto#181. |
Fixes #29654 (as a side-effect)
Removing dependencies is a general goal of the project and the xz backdoor has been an additional wake up call recently. Libevent shows many of the same symptoms, few maintainers and slow releases. While libevent can not be removed completely over night we should start removing it’s usage where it's possible, ideally with the end goal to removing it completely.
This is a pretty easy win in that direction. The
evhttp_uridecode
function from libevent we were using inurlDecode
could be easily emulated in fewer LOC. This also ports the applicable test vectors over from libevent.