Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modernize util/strencodings and util/string: string_view and optional #24764

Closed
wants to merge 10 commits into from

Conversation

sipa
Copy link
Member

@sipa sipa commented Apr 4, 2022

Make use of std::string_view and std::optional in the util/{strencodings, string} files.

This avoids many temporary string/vector objects being created, while making the interface easier to read. Changes include:

  • Make all input arguments in functions in util/strencodings and util/string take std::string_view instead of std::string.
  • Add RemovePrefixView and TrimStringView which also return std::string_view objects (the corresponding RemovePrefix and TrimString keep returning an std::string, as that's needed in many call sites still).
  • Stop returning std::string from DecodeBase32 and DecodeBase64, but return vectors. Base32/64 are fundamentally algorithms for encoding bytes as strings; returning std::string from those (especially doing it conditionally based on the input arguments/types) is just bizarre.
  • Stop taking a bool* pf_invalid output argument pointer in DecodeBase32 and DecodeBase64; return an std::optional instead.
  • Make DecodeBase32 and DecodeBase64 more efficient by doing the conversion from characters to integer symbols on-the-fly rather than through a temporary vector.

@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 4, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #24742 ([POC] build: prune Boost headers in depends by fanquake)
  • #24185 (refactor: only use explicit reinterpret/const casts, not implicit by PastaPastaPasta)
  • #24058 (BIP-322 basic support by kallewoof)
  • #23595 (util: Add ParseHexstd::byte() helper by MarcoFalke)
  • #22953 (refactor: introduce single-separator split helper (boost::split replacement) by theStack)
  • #22087 (Validate port-options by amadeuszpawlik)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@laanwj
Copy link
Member

laanwj commented Apr 5, 2022

Concept ACK

Add RemovePrefixView and TrimStringView which also return std::string_view objects (the corresponding RemovePrefix and TrimString keep returning an std::string, as that's needed in many call sites still).

Returning a string_view can be somewhat dangerous (in combination with auto? also when the original string goes out of scope, especially if it's a temporary), so giving these functions new names is a good idea.

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning a string_view can be somewhat dangerous (in combination with auto? also when the original string goes out of scope, especially if it's a temporary), so giving these functions new names is a good idea.

A new name is good, but doesn't solve the problem. I think this can be fixed with an attribute: #20493

I think this can be merged, but I left a few nits and questions.

review ACK 6e1f147 💢

Show signature

Signature:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

review ACK 6e1f14780452d956503f1dbc3c91b17b44287f77 💢
-----BEGIN PGP SIGNATURE-----

iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUghZAv+JPOBXiEZ9L1jCF9HPLDO1vGLEuvq6hw6EkaMSKd0Zn3UaiYvP98jPBMZ
HzTidM3yPWwDfLw3QVTlPCdt0J8wBAwOg5ZgjGRZ7fu1hn0SZ93q960giU68DbeJ
ZZCF4fAYPOkTqf1a06DnV7anglhEIG/JXMg+XESTvUGmtdKnP48sdwek4NZOw1h7
PcA+Ck38qlb68W38HQMwsQYr/aO2Ftc2Utw2GMv3Kkr7nD6kIwtuH9WOTWc8Kw2H
vr5NeRfQS1smm9rP+VrixKmCaC1h37r6NMuXDqh/MRNj77ONwmLDLWES/4bP9ioc
w814fQled+vIue+Mv5yBUNeGZmlOryQsC7wJwNkiRNKxG5H+bSn2KUKEtZAGjHpv
rfNI9X9nbDCvDQdHemrYCessHaCfGFCMeUDC4PWDZ0uNaZSC6RrU0y8pESeMOzxe
BJJxSpq0sHCOAVt8LKGyumBlma/k2+n9d6UNo3++K/am9Uxei4kFgLJ0QebpXKqm
NqMH+EA3
=/7qh
-----END PGP SIGNATURE-----

src/util/strencodings.cpp Outdated Show resolved Hide resolved
src/httprpc.cpp Outdated Show resolved Hide resolved
src/test/base32_tests.cpp Outdated Show resolved Hide resolved
src/test/base64_tests.cpp Outdated Show resolved Hide resolved
src/i2p.cpp Outdated Show resolved Hide resolved
src/test/base32_tests.cpp Outdated Show resolved Hide resolved
src/test/base64_tests.cpp Outdated Show resolved Hide resolved
src/util/strencodings.cpp Outdated Show resolved Hide resolved
src/util/strencodings.cpp Show resolved Hide resolved
@laanwj
Copy link
Member

laanwj commented Apr 5, 2022

A new name is good, but doesn't solve the problem. I think this can be fixed with an attribute: #20493

Agree, I just meant that putting a string_view-returning function as a drop-in replacement would have been a very bad idea, a new function with a new name means that people using it can reason about the risks anew. Sure, if an attribute can help avoid such problems more widely that's of course even better.

@maflcko
Copy link
Member

maflcko commented Apr 5, 2022

review ACK e4092c5 🐤

Only changes:

  • Remove confusing (signed char) cast
  • Use shorter str.substr(0, 2) == "0x" over manual matching/implementation
  • A minor change in a unit test
  • Whitespace changes
Show signature

Signature:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

review ACK e4092c5230bdd1f6439c70fa22b68381e5cc6899 🐤

Only changes:
* Remove confusing (signed char) cast
* Use shorter str.substr(0, 2) == "0x" over manual matching/implementation
* A minor change in a unit test
* Whitespace changes
-----BEGIN PGP SIGNATURE-----

iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUjhQQwAsEPZBkw/LyKgYc/xITU1DL5Oausc0jYNBH7WrBEh055OyfUZ5NgxM87K
bcJTNQMCmLVfWnOcg0FUELaXCGbqua8qPgiKwQT5nxlHN0DXpakSSWrkGh2HKIGe
zwPKwNrMEH7TrCs3Tkn3ZTkiHI3gl+AFTL5zdw1b/sMwPapeZLDqoAQZXTqj+LRB
0d44jHX6oO5faHRI6p+fErVtg40Q+3s49CkYXbTl/u1HEWaKcWBxk0wZ9JHUUWuY
Z2K1OWDWH/KuR/gYlUUuVfJk5jgG6NqNgzErLUjCtYjQ6LGpI90e7ud/ahv2Xndb
vRzi/r/XAQrzlYLKM4ZuoeiTTSKxLJHD6353KVgPVKETioM6o1eyn3qWpg5JwnT4
VglcsozZzoVbSdghoQ397NhFjI+Lfs+d60EcUB8lzPgzP00gAo2smlXXM0QMBRV/
XOweIVcmYulLYK0LnBXbQy4GSqyE06JsIKpugwFJEcKb7mkdJSdJ8Xg4xxwwYyjR
CdV7knXn
=yyjx
-----END PGP SIGNATURE-----

@Empact
Copy link
Contributor

Empact commented Apr 6, 2022

Concept ACK - the ConvertBits functionality is unused in this PR - maybe it could be saved for a time when it is needed? (misread)

nits:

  • Mild preference for more strict separation between refactoring and behavior modification - would make this easier to ACK, at least in part
  • There is a check in IsHex (% 2 == 0) that could be applied to IsHexNumber
  • Could inline or return early in some cases where you're converting to return optional (previous implementation was constrained by the need to set the val) 889a5f1

@sipa
Copy link
Member Author

sipa commented Apr 6, 2022

Mild preference for more strict separation between refactoring and behavior modification - would make this easier to ACK, at least in part

Yeah, I'll separate them (now that we figured out the issue with ignoring the pfInvalid flag in the http auth logic).

There is a check in IsHex (% 2 == 0) that could be applied to IsHexNumber

That would be incorrect. Hex numbers don't need to be a multiple of 2 characters (0xF is a valid hex number, e.g.).

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK 5ea072e only effective change is adding a "return false" 🌷

Show signature

Signature:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

re-ACK 5ea072ed08b43cdfdf8554597dcf8c1a51ef4dae only effective change is adding a "return false" 🌷
-----BEGIN PGP SIGNATURE-----

iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUgUyQwAhIAVoAgFscYNynT/fORwnmC2JJczY8Zne+7IKHyQadrQGB3uGARyJceI
WDRjN7aLHHJwFmFZ/Lz4HGa5sZYwAUC/Vz9y2eX0nu2PInfZgdtNbx8JT0cx5wkX
03LWXiek/qiy6BeEctpE1nP37iSAqB4E18yEhCMqdLe+m7KCq/rTZN2XdF/R3DHK
j9mqyyy7M5dYQUE2dViBdqTydSrhGIhw2HqHXytP8/urfFB04wLXeJHNDZ6lRfZ1
zq28KVL2FOuFXdYTIAtdeCAFAXFgwiZ2lYik1+aftCBBO9BdBnmKqRD0/ERAC2u8
TfVPFMQECHbWxcxGwZGBjhNhcXGI2fVUF2g4SqJVssqPnG53GcSRvN6H0tcKDAH7
g1x6fqDibKBIUXcrTp8l5tlZYknN2wU0wq9M7U6uJIt5gGWOTlpYpbQsMIlzg6Ls
omRKFXjTXHvFmO8eilHMaMVk6qY/12MjoTyHZ1JwN7TWHRxH6Q9XNTFeq1VJE/qd
uqaqLwJe
=bejp
-----END PGP SIGNATURE-----

@@ -75,9 +85,12 @@ inline std::string MakeUnorderedList(const std::vector<std::string>& items)
/**
* Check if a string does not contain any embedded NUL (\0) characters
*/
[[nodiscard]] inline bool ValidAsCString(const std::string& str) noexcept
[[nodiscard]] inline bool ValidAsCString(std::string_view str) noexcept
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the name ValidAsCString is now a bit dangerous. I'd assume that it returns true when the argument is 0 terminated at the end (which was the case for the const std::string& str argument), but with a std::string_view argument this might not be the case.

So now even when ValidAsCString(str) returns true, it is not allowed to use e.g. strlen(str.data()).

I would rename the method to something like ContainsNoNUL to make it more clear what's checked, or keep the const std::string& argument for now

Copy link
Member

@laanwj laanwj Apr 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the documentation, it only checks if a string has embedded \0 characters. That std::string is zero-terminated internally is an implementation detail, the caller is still responsible for passing c_str() (and never .data()) when passing it to a function that takes a C string.

it is not allowed to use e.g. strlen(str.data()).

I'd really frown on code like that in any case.

Edit: The only time it'd be remotely acceptable to pass .data() as a C string would be if there is an embedded \0 in the array, which is exactly what this function avoids 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, maybe it's just me, but from the interface ValidAsCString(std::string_view str) I'd guess that it returns true when it actually finds a \0 somewhere, but it's exactly the opposite 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no objection to adding a scripted-diff to rename ValidAsCString to ContainsNoNUL. Either in this pull or as a follow-up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DrahtBot
Copy link
Contributor

🐙 This pull request conflicts with the target branch and needs rebase.

Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft".

@maflcko
Copy link
Member

maflcko commented Apr 27, 2022

I am happy to pick this up if you are no longer working on it

@sipa
Copy link
Member Author

sipa commented Apr 27, 2022

@MarcoFalke Go for it.

@sipa sipa closed this Apr 27, 2022
@bitcoin bitcoin locked and limited conversation to collaborators Apr 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants