Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: introduce single-separator split helper (boost::split replacement) #22953

Conversation

theStack
Copy link
Contributor

This PR adds a simple string split helper SplitString that takes use of the spanparsing Split function that was first introduced in #13697 (commit fe8a7dc). This enables to replace most calls to boost::split, in the cases where only a single separator character is used. Note that while previous attempts to replace boost::split were controversial (e.g. #13751), this one has a trivial implementation: it merely uses an internal helper (that is unit tested and in regular use with output descriptiors) and converts its result from spans to strings. As a drawback though, not all boost::split instances can be tackled.

As a possible optimization, one could return a vector of std::string_views (available since C++17) instead of strings, to avoid copies. This would need more carefulness on the caller sites though, to avoid potential lifetime issues, and it's probably not worth it, considering that none of the places where strings are split are really performance-critical.

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems ok to use the already existing split function.

src/util/string.h Outdated Show resolved Hide resolved
@theStack theStack force-pushed the 202109-refactor-boost_split_replacement_single_sep branch from e332fcd to e3c7fe4 Compare September 11, 2021 18:19
@theStack
Copy link
Contributor Author

Force-pushed with changes as suggested by MarcoFalke (#22953 (comment)).

Pinging @practicalswift as someone who has always shown interest in potential boost replacements and string processing helpers, with lots of contributions in that area. Maybe you feel like taking a look :)

src/util/string.cpp Outdated Show resolved Hide resolved
@theStack theStack force-pushed the 202109-refactor-boost_split_replacement_single_sep branch from e3c7fe4 to 742fc50 Compare September 12, 2021 09:47
@kiminuo
Copy link
Contributor

kiminuo commented Sep 13, 2021

I would still add a unit test even though there are tests for the internal helper. Maybe it's a little bit over the top so feel free to ignore my suggestion.

(Something like this possibly: kiminuo@ac6dd5a)

@maflcko
Copy link
Member

maflcko commented Sep 13, 2021

Any reason the tests aren't fixed up as well?

src/test/fuzz/script_assets_test_minimizer.cpp:    boost::algorithm::split(words, str, boost::algorithm::is_any_of(","));

Also, if you add a unit test, might as extend the string fuzz target.

@kiminuo
Copy link
Contributor

kiminuo commented Sep 13, 2021

There non-tests occurrences too:

src\rpc\server.cpp:
  410          std::vector<std::string> vargNames;
  411:         boost::algorithm::split(vargNames, argNamePattern, boost::algorithm::is_any_of("|"));
  412          auto fr = argsIn.end();

src\test\transaction_tests.cpp:
  72      std::vector<std::string> words;
  73:     boost::algorithm::split(words, strFlags, boost::algorithm::is_any_of(","));
  74  

src\test\fuzz\script_assets_test_minimizer.cpp:
  133      std::vector<std::string> words;
  134:     boost::algorithm::split(words, str, boost::algorithm::is_any_of(","));
  135  

If I'm not missing something.

edit: Ah, I'm not aware what the difference is between boost::algorithm::split and boost::split. Seems like none?

@theStack theStack force-pushed the 202109-refactor-boost_split_replacement_single_sep branch from 742fc50 to 25915e4 Compare September 13, 2021 11:14
@theStack
Copy link
Contributor Author

theStack commented Sep 13, 2021

I would still add a unit test even though there are tests for the internal helper. Maybe it's a little bit over the top so feel free to ignore my suggestion.

I agree that it's a good idea to add a unit test.

(Something like this possibly: kiminuo@ac6dd5a)

Thanks! As quick feedback, can you change the commit subject to be more verbose (e.g. "test: add unit tests for SplitString helper") and put the test into src/test/util_tests.cpp, instead of creating a new file? I'm happy to either add your commit to this PR, or review your PR if you decide to open your own, based on this one.

Any reason the tests aren't fixed up as well?
...

There non-tests occurrences too:
...

Oh, I only greped for boost::split, not being aware that there is also boost::algorithm::split, which seems to be a synonym. Force-pushed with changes to tackle also the remaining occurences. Also found that there were some unnecessary boost includes in src/torcontrol.cpp that I forgot to remove in the previous push.

@kiminuo
Copy link
Contributor

kiminuo commented Sep 13, 2021

(Something like this possibly: kiminuo@ac6dd5a)

Thanks! As quick feedback, can you change the commit subject to be more verbose (e.g. "test: add unit tests for SplitString helper") and put the test into src/test/util_tests.cpp, instead of creating a new file? I'm happy to either add your commit to this PR, or review your PR if you decide to open your own, based on this one.

Modified here: kiminuo@4516727. Anyway, feel free to modify the commit in whatever way you like :)

@maflcko
Copy link
Member

maflcko commented Sep 13, 2021

The fuzz test I mentioned: (might not compile)

diff --git a/src/test/fuzz/string.cpp b/src/test/fuzz/string.cpp
index 0c1b45b86c..e5a09c0013 100644
--- a/src/test/fuzz/string.cpp
+++ b/src/test/fuzz/string.cpp
@@ -127,6 +127,7 @@ FUZZ_TARGET(string)
         int64_t amount_out;
         (void)ParseFixedPoint(random_string_1, fuzzed_data_provider.ConsumeIntegralInRange<int>(0, 1024), &amount_out);
     }
+    (void)SplitString(random_string_1, fuzzed_data_provider.ConsumeIntegral<char>());
     {
         (void)Untranslated(random_string_1);
         const bilingual_str bs1{random_string_1, random_string_2};

@theStack
Copy link
Contributor Author

The fuzz test I mentioned: (might not compile)

Oh, I didn't see that in your previous comment. It compiles, I added another commit 👌

src/util/string.cpp Outdated Show resolved Hide resolved
src/util/string.cpp Outdated Show resolved Hide resolved
@theStack theStack force-pushed the 202109-refactor-boost_split_replacement_single_sep branch from a30f721 to 8a9f940 Compare September 14, 2021 14:27
@fanquake
Copy link
Member

Concept ACK - this is low-overhead for dropping more Boost dependence. Surprised we can't also nuke anything from test/lint/lint-includes.sh

@laanwj
Copy link
Member

laanwj commented Sep 16, 2021

Concept ACK. I don't think this boost use is very bad (it's a header-only library) but if we can replace it with functionality we already have internally, who not.

Concept ACK - this is low-overhead for dropping more Boost dependence. Surprised we can't also nuke anything from test/lint/lint-includes.sh

Well, split is still used in some places (httprpc.cpp for example), this only replaces the single-separator case. It will take some more work to get rid of all boost/algorithm/string.hpp use, if that's worthwhile.

@theStack theStack force-pushed the 202109-refactor-boost_split_replacement_single_sep branch from 8a9f940 to ed24bf5 Compare September 20, 2021 14:38
@theStack
Copy link
Contributor Author

Force-pushed the alternative variant as suggested by @martinus, changing spanparsing::Split to a templated function and defining an alias SplitString as suggested by @MarcoFalke (see #22953 (comment) and #22953 (comment)).

@theStack theStack force-pushed the 202109-refactor-boost_split_replacement_single_sep branch from ed24bf5 to 8949509 Compare September 20, 2021 17:28
src/util/string.h Outdated Show resolved Hide resolved
Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of forcing devs to remember to construct a std::string manually each time before calling the function.


// Case-sensitivity of the separator.
{
std::vector<std::string> result = SplitString(std::string("AAA"), 'a');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems dangerous to offer a function that silently does the wrong thing when passed the "wrong" type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. This is fixed now by using an explicit interface instead of an alias, as suggested by martinus in #22953 (comment).

@theStack theStack force-pushed the 202109-refactor-boost_split_replacement_single_sep branch from 8949509 to 36ac60d Compare September 21, 2021 10:40
@maflcko maflcko self-requested a review September 21, 2021 10:55
@martinus
Copy link
Contributor

Windows build error seems unrelated, in wallet_address_types.py:

OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted

@hebasto
Copy link
Member

hebasto commented Sep 23, 2021 via email

@theStack theStack force-pushed the 202109-refactor-boost_split_replacement_single_sep branch from 36ac60d to 049e609 Compare September 27, 2021 10:36
@theStack
Copy link
Contributor Author

Rebased on master.

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Apr 2, 2022
…-util.cpp

3bb9627 refactor: remove unused boost header include in bitcoin-util.cpp (Sebastian Falbesoner)

Pull request description:

  This header was included since the introduction of bitcoin-util in
  commit 13762bc, but boost was
  actually never used (see `git log -S boost ./src/bitcoin-util.cpp`).

  Cherry-picked out of bitcoin#22953, which currently needs rebase. This commit could just be merged on its own.

ACKs for top commit:
  MarcoFalke:
    review ACK 3bb9627

Tree-SHA512: 201ee1aa4d49074056654203db73a473479c2b92c49df8dbf8e35979f85178013c66540a665f0f6dc0a2efef88eb091e2b088bebff85d840033dffd8ae719349
theStack and others added 3 commits April 11, 2022 22:19
This helper uses spanparsing::Split internally and enables to replace
all calls to boost::split where only a single separator is passed.

Co-authored-by: Martin Ankerl <Martin.Ankerl@gmail.com>
Co-authored-by: MarcoFalke <falke.marco@gmail.com>
@theStack theStack force-pushed the 202109-refactor-boost_split_replacement_single_sep branch from 1d066fc to a62e844 Compare April 11, 2022 21:27
@theStack
Copy link
Contributor Author

Rebased on master again (there was a conflict in src/rest.cpp after #24098 has been merged).

@fanquake
Copy link
Member

@martinus want to take another look here?

@martinus
Copy link
Contributor

Code review ACK a62e844. Ran all tests. I also like that with boost::split it was not obvious that the resulting container was cleared, and with SplitString API that's obvious.

After that PR there is only one usage of boost::split left in core_read.cpp, that would be a nice followup PR 🙂

@fanquake fanquake merged commit f436bfd into bitcoin:master Apr 26, 2022
@theStack theStack deleted the 202109-refactor-boost_split_replacement_single_sep branch April 26, 2022 09:09
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Apr 26, 2022
fanquake added a commit that referenced this pull request May 4, 2022
f849e63 fuzz: SplitString with multiple separators (Martin Leitner-Ankerl)
d1a9850 http: replace boost::split with SplitString (Martin Leitner-Ankerl)
0d7efcd core_read: Replace boost::split with SplitString (Martin Leitner-Ankerl)
b7ab9db Extend Split to work with multiple separators (Martin Leitner-Ankerl)

Pull request description:

  As a followup of #22953, this removes the remaining occurrences of `boost::split` and replaces them with our own `SplitString`. To be able to do so, this extends the function `spanparsing::Split` to work with multiple separators. Finally this removes 3 more files from `lint-includes.py`.

ACKs for top commit:
  theStack:
    Code-review ACK f849e63

Tree-SHA512: f37d4dbe11cab2046e646045b0f018a75f978d521443a2c5001512737a1370e22b09247d5db0e5c9e4153229a4e2d66731903c1bba3713711c4cae8cedcc775d
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request May 4, 2022
…litString

f849e63 fuzz: SplitString with multiple separators (Martin Leitner-Ankerl)
d1a9850 http: replace boost::split with SplitString (Martin Leitner-Ankerl)
0d7efcd core_read: Replace boost::split with SplitString (Martin Leitner-Ankerl)
b7ab9db Extend Split to work with multiple separators (Martin Leitner-Ankerl)

Pull request description:

  As a followup of bitcoin#22953, this removes the remaining occurrences of `boost::split` and replaces them with our own `SplitString`. To be able to do so, this extends the function `spanparsing::Split` to work with multiple separators. Finally this removes 3 more files from `lint-includes.py`.

ACKs for top commit:
  theStack:
    Code-review ACK f849e63

Tree-SHA512: f37d4dbe11cab2046e646045b0f018a75f978d521443a2c5001512737a1370e22b09247d5db0e5c9e4153229a4e2d66731903c1bba3713711c4cae8cedcc775d
kwvg added a commit to kwvg/dash that referenced this pull request Dec 21, 2022
kwvg added a commit to kwvg/dash that referenced this pull request Dec 26, 2022
kwvg added a commit to kwvg/dash that referenced this pull request Jan 3, 2023
kwvg added a commit to kwvg/dash that referenced this pull request Jan 13, 2023
kwvg added a commit to kwvg/dash that referenced this pull request Jan 15, 2023
kwvg added a commit to kwvg/dash that referenced this pull request Jan 19, 2023
@bitcoin bitcoin locked and limited conversation to collaborators Apr 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants