refactor, docs: Embedded ASMap [2/3]: Refactor asmap internals and add documentation #33878
base: master
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks: For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33878.

Reviews: See the guideline for information on the review process.

Conflicts: Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
@sipa would love to get your eyes on the documentation commit before anyone else gets confused by the mistakes I might have made :)
#include <vector>

/*
 * ASMap (Autonomous System Map) Implementation
All the documentation changes in this commit look correct to me.
One thing that may be worth mentioning early on is that it's a bit-packed format, i.e., the entire compressed mapping is treated as a sequence of bits, packed together at 8 per byte, which are concatenated encodings of instructions.
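For illustration, a minimal sketch of that bit-packed access (assuming only standard C++; it mirrors the GetBitLE helper quoted later in this thread): the compressed mapping is a plain byte buffer, and the interpreter consumes it one bit at a time, least-significant bit first within each byte.

#include <cstddef>
#include <cstdint>
#include <span>

// Read bit number `bitpos` from a bit-packed buffer: 8 bits per byte,
// LSB first within each byte.
inline bool GetPackedBit(std::span<const std::byte> bytes, size_t bitpos)
{
    return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
}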
Thank you for the quick feedback! Added a small paragraph on the bit-packed format in the introduction comment at the top.
Force-pushed 93205dd to 9da0160
All the feedback from #28792 that applied to the commits here has now been addressed.
Force-pushed 7a2bf8e to eaf5e12
I'm sure you mean the first three commits? 😄
hodlinator left a comment
Reviewed eaf5e12
Thanks for peeling off this PR as I suggested!
Yepp, fixed :)
Force-pushed 7de7c21 to 3011c84
Rebased since #33026 was merged 🥳 and addressed all comments. Thanks for the reviews!
Calculate the asmap version only in one place: a dedicated function in util/asmap. The version was also referred to as the asmap checksum in several places; to avoid confusion, call it the asmap version everywhere.
This prevents holding the asmap data in memory twice. The version hash changes because spans are serialized without their size prefix, unlike vectors (illustrated in the sketch below).
Also makes a minor improvement to the Python implementation documentation.
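A minimal sketch of why the hash changes, assuming Bitcoin Core's HashWriter and its serialization rules (a std::vector is written with a CompactSize length prefix, a byte span as raw bytes only); this is not the PR's actual AsmapVersion implementation:

#include <hash.h>    // HashWriter (assumed available, as in Bitcoin Core)
#include <uint256.h>
#include <cstddef>
#include <span>
#include <vector>

// Hashing the same bytes through the two types yields different digests,
// because only the vector's serialization includes a length prefix.
uint256 VersionFromVector(const std::vector<std::byte>& data)
{
    return (HashWriter{} << data).GetSHA256(); // CompactSize(data.size()) + bytes
}

uint256 VersionFromSpan(std::span<const std::byte> data)
{
    return (HashWriter{} << data).GetSHA256(); // raw bytes only, no length prefix
}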
Force-pushed 3011c84 to b41f5a2
hodlinator left a comment
Reviewed b41f5a2
Sorry for not finding these earlier, getting closer to A-C-K.
src/util/asmap.h (Outdated)

/** Read asmap from provided binary file */
std::vector<std::byte> DecodeAsmap(fs::path path);
/** Calculate the asmap version, a checksum identifying the asmap being used. */
uint256 AsmapVersion(const std::vector<std::byte>& data);
meganit in d278d61 "refactor: Unify asmap version calculation and naming":
As uint256 AsmapVersion(const std::vector<std::byte>& data) is new in this PR, it could be introduced as uint256 AsmapVersion(std::span<const std::byte> data) from the beginning to avoid line churn.
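A sketch of why this avoids churn (editor's illustration, not code from the PR; the placeholder body is hypothetical): a std::span parameter still accepts a std::vector argument through the implicit conversion, so existing callers would compile unchanged if the function starts out with the span signature.

#include <cstddef>
#include <span>
#include <uint256.h> // uint256, as used in Bitcoin Core
#include <vector>

// Suggested signature from the comment above; body is a placeholder.
uint256 AsmapVersion(std::span<const std::byte> data)
{
    return uint256{}; // placeholder for illustration only
}

uint256 Caller(const std::vector<std::byte>& asmap_bytes)
{
    return AsmapVersion(asmap_bytes); // vector converts to span implicitly
}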
/** Check asmap from embedded data */
bool CheckAsmap(std::span<const std::byte> data);
in 6b9ce8c on CheckAsmap:
- Justification for existence: I dislike default parameter values, so appreciate a distinct function from SanityCheckASMap.
- Comment: Nothing makes it specialized for only embedded data?
- Location: Maybe it could be added directly after SanityCheckASMap in both .h + .cpp since they are so closely related? (Could place it before, but that would put it between Interpret and SanityCheckASMap, which are nice to have after each other as they are also closely related.)
- Naming: Could make it more descriptive - something like CheckStandardAsmap or CheckIPV6Asmap?
- Usage: Could be used in more places where we have SanityCheckAsmap(asmap, 128):

₿ git grep CheckAsmap
src/test/fuzz/asmap.cpp:    if (!SanityCheckAsmap(asmap, 128)) return;
src/test/fuzz/asmap_direct.cpp:    if (SanityCheckAsmap(asmap, ip_len)) {
src/test/fuzz/asmap_direct.cpp:    assert(!SanityCheckAsmap(prefix, ip_len));
src/test/fuzz/util/net.h:    if (!SanityCheckAsmap(std::span<std::byte>(asmap), 128)) {
src/util/asmap.cpp:    // - should have been caught by SanityCheckAsmap below
src/util/asmap.cpp:bool SanityCheckAsmap(const std::span<const std::byte> asmap, int bits)
src/util/asmap.cpp:bool CheckAsmap(const std::span<const std::byte> data)
src/util/asmap.cpp:    if (!SanityCheckAsmap(data, 128)) {
src/util/asmap.cpp:    if (!CheckAsmap(buffer)) {
src/util/asmap.h:bool SanityCheckAsmap(std::span<const std::byte> asmap, int bits);
src/util/asmap.h:bool CheckAsmap(std::span<const std::byte> data);
using util::ToString;

-static NetGroupManager EMPTY_NETGROUPMAN{{}};
+static NetGroupManager EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
nit in 6b9ce8c:
Could avoid repeating type name here and in other files.
diff --git a/src/bench/addrman.cpp b/src/bench/addrman.cpp
index 1e93b9c395..2f474c2411 100644
--- a/src/bench/addrman.cpp
+++ b/src/bench/addrman.cpp
@@ -24,7 +24,7 @@
static constexpr size_t NUM_SOURCES = 64;
static constexpr size_t NUM_ADDRESSES_PER_SOURCE = 256;
-static NetGroupManager EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
+static auto EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
static constexpr uint32_t ADDRMAN_CONSISTENCY_CHECK_RATIO{0};
static std::vector<CAddress> g_sources;
diff --git a/src/test/addrman_tests.cpp b/src/test/addrman_tests.cpp
index c9a0ebebf0..f1e54416cd 100644
--- a/src/test/addrman_tests.cpp
+++ b/src/test/addrman_tests.cpp
@@ -24,7 +24,7 @@ using namespace std::literals;
using node::NodeContext;
using util::ToString;
-static NetGroupManager EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
+static auto EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
static const bool DETERMINISTIC{true};
static int32_t GetCheckRatio(const NodeContext& node_ctx)
@@ -584,7 +584,7 @@ BOOST_AUTO_TEST_CASE(caddrinfo_get_new_bucket_legacy)
// 101.8.0.0/16 AS8
BOOST_AUTO_TEST_CASE(caddrinfo_get_tried_bucket)
{
- NetGroupManager ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
+ auto ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
CAddress addr1 = CAddress(ResolveService("250.1.1.1", 8333), NODE_NONE);
CAddress addr2 = CAddress(ResolveService("250.1.1.1", 9999), NODE_NONE);
@@ -637,7 +637,7 @@ BOOST_AUTO_TEST_CASE(caddrinfo_get_tried_bucket)
BOOST_AUTO_TEST_CASE(caddrinfo_get_new_bucket)
{
- NetGroupManager ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
+ auto ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
CAddress addr1 = CAddress(ResolveService("250.1.2.1", 8333), NODE_NONE);
CAddress addr2 = CAddress(ResolveService("250.1.2.1", 9999), NODE_NONE);
@@ -714,7 +714,7 @@ BOOST_AUTO_TEST_CASE(caddrinfo_get_new_bucket)
BOOST_AUTO_TEST_CASE(addrman_serialization)
{
- NetGroupManager netgroupman{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
+ auto netgroupman{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
const auto ratio = GetCheckRatio(m_node);
auto addrman_asmap1 = std::make_unique<AddrMan>(netgroupman, DETERMINISTIC, ratio);
diff --git a/src/test/fuzz/asmap.cpp b/src/test/fuzz/asmap.cpp
index fec0f3f50a..6ed5aa7cd9 100644
--- a/src/test/fuzz/asmap.cpp
+++ b/src/test/fuzz/asmap.cpp
@@ -42,6 +42,6 @@ FUZZ_TARGET(asmap)
memcpy(&ipv4, addr_data, addr_size);
net_addr.SetIP(CNetAddr{ipv4});
}
- NetGroupManager netgroupman{NetGroupManager::WithEmbeddedAsmap(asmap)};
+ auto netgroupman{NetGroupManager::WithEmbeddedAsmap(asmap)};
(void)netgroupman.GetMappedAS(net_addr);
}
diff --git a/src/test/fuzz/p2p_handshake.cpp b/src/test/fuzz/p2p_handshake.cpp
index 75e14865a4..a56e1e5f1f 100644
--- a/src/test/fuzz/p2p_handshake.cpp
+++ b/src/test/fuzz/p2p_handshake.cpp
@@ -48,7 +48,7 @@ FUZZ_TARGET(p2p_handshake, .init = ::initialize)
chainman.ResetIbd();
node::Warnings warnings{};
- NetGroupManager netgroupman{NetGroupManager::NoAsmap()};
+ auto netgroupman{NetGroupManager::NoAsmap()};
AddrMan addrman{netgroupman, /*deterministic=*/true, /*consistency_check_ratio=*/0};
auto peerman = PeerManager::make(connman, addrman,
/*banman=*/nullptr, chainman,
diff --git a/src/test/fuzz/process_message.cpp b/src/test/fuzz/process_message.cpp
index 5ac796ae2c..30c121d7a2 100644
--- a/src/test/fuzz/process_message.cpp
+++ b/src/test/fuzz/process_message.cpp
@@ -77,7 +77,7 @@ FUZZ_TARGET(process_message, .init = initialize_process_message)
chainman.DisableNextWrite();
node::Warnings warnings{};
- NetGroupManager netgroupman{NetGroupManager::NoAsmap()};
+ auto netgroupman{NetGroupManager::NoAsmap()};
AddrMan addrman{netgroupman, /*deterministic=*/true, /*consistency_check_ratio=*/0};
auto peerman = PeerManager::make(connman, addrman,
/*banman=*/nullptr, chainman,
diff --git a/src/test/netbase_tests.cpp b/src/test/netbase_tests.cpp
index 5086f21ff4..4d355142e2 100644
--- a/src/test/netbase_tests.cpp
+++ b/src/test/netbase_tests.cpp
@@ -325,7 +325,7 @@ BOOST_AUTO_TEST_CASE(subnet_test)
BOOST_AUTO_TEST_CASE(netbase_getgroup)
{
- NetGroupManager netgroupman{NetGroupManager::NoAsmap()}; // use /16
+ auto netgroupman{NetGroupManager::NoAsmap()}; // use /16
BOOST_CHECK(netgroupman.GetGroup(ResolveIP("127.0.0.1")) == std::vector<unsigned char>({0})); // Local -> !Routable()
BOOST_CHECK(netgroupman.GetGroup(ResolveIP("257.0.0.1")) == std::vector<unsigned char>({0})); // !Valid -> !Routable()
BOOST_CHECK(netgroupman.GetGroup(ResolveIP("10.0.0.1")) == std::vector<unsigned char>({0})); // RFC1918 -> !Routable()

 */
inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
{
    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
Here's a reduced version:
-    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
+    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> ((7 + bitpos) % 8)) & 1;
I'm surprised by the way the shift amount changes for different bit positions. Would have expected something closer to this for big endian...
-    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
+    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - (bitpos % 8))) & 1;
....but it fails the unit tests.
Godbolt experiment https://godbolt.org/z/6rjEzxhPM outputs:
PR bit shift expression (adfd8006...): (7 - ((bytes.size() * 8 - bitpos) % 8))
PR bit shift reduced: (7 + bitpos) % 8
Expected BE bit shift: 7 - (bitpos % 8)
bitpos: 0, PR: 7, reduced: 7, expected: 7
bitpos: 1, PR: 0, reduced: 0, expected: 6
bitpos: 2, PR: 1, reduced: 1, expected: 5
bitpos: 3, PR: 2, reduced: 2, expected: 4
bitpos: 4, PR: 3, reduced: 3, expected: 3
bitpos: 5, PR: 4, reduced: 4, expected: 2
bitpos: 6, PR: 5, reduced: 5, expected: 1
bitpos: 7, PR: 6, reduced: 6, expected: 0
bitpos: 8, PR: 7, reduced: 7, expected: 7
bitpos: 9, PR: 0, reduced: 0, expected: 6
bitpos: 10, PR: 1, reduced: 1, expected: 5
bitpos: 11, PR: 2, reduced: 2, expected: 4
bitpos: 12, PR: 3, reduced: 3, expected: 3
bitpos: 13, PR: 4, reduced: 4, expected: 2
bitpos: 14, PR: 5, reduced: 5, expected: 1
bitpos: 15, PR: 6, reduced: 6, expected: 0
However, when looking at the usage in Interpret(), we see that if we were to encounter JUMP as the first instruction, our bitpos would be sort of out of bounds by one.
Line 184 in b41f5a2:
    uint8_t bits = ip.size() * 8;
Line 201 in b41f5a2:
    if (GetBitBE(ip, bits)) { // Check next IP bit (big-endian)
If we instead make sure to pass in-bounds values at the call site - GetBitBE(ip, bits - 1) - it turns out we can simplify GetBitBE quite a bit.
--- a/src/util/asmap.cpp
+++ b/src/util/asmap.cpp
@@ -48,8 +48,8 @@ namespace {
constexpr uint32_t INVALID = 0xFFFFFFFF;
/**
- * Extract a single bit from byte array using little-endian bit ordering.
- * Used for ASMap data where bits are numbered right-to-left within each byte (LSB first).
+ * Extract a single bit (LSB first) from byte array using little-endian byte ordering.
+ * Used for ASMap data.
*/
inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
{
@@ -57,12 +57,12 @@ inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
}
/**
- * Extract a single bit from byte array using big-endian bit ordering.
+ * Extract a single bit (LSB first) from byte array using big-endian byte ordering.
* Used for IP addresses to match network byte order conventions.
*/
inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
{
- return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
+ return (std::to_integer<uint8_t>(bytes[bytes.size() - ((bitpos + 8) / 8)]) >> (bitpos % 8)) & 1;
}
/**
@@ -198,7 +198,7 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
if (jump == INVALID) break; // Jump offset straddles EOF
if (bits == 0) break; // No input bits left
if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
- if (GetBitBE(ip, bits)) { // Check next IP bit (big-endian)
+ if (GetBitBE(ip, bits - 1)) { // Check next IP bit (big-endian)
pos += jump; // Bit = 1: skip to right subtree
}
// Bit = 0: fall through to left subtree
@@ -213,7 +213,7 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
matchlen = std::bit_width(match) - 1; // An n-bit value matches n-1 input bits
if (bits < matchlen) break; // Not enough input bits
for (uint32_t bit = 0; bit < matchlen; bit++) {
- if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
+ if (GetBitBE(ip, bits - 1) != ((match >> (matchlen - 1 - bit)) & 1)) {
return default_asn; // Pattern mismatch - use default
}
bits--;

The similarity we get between the shifting in this new GetBitBE and GetBitLE shows that we have the same bit ordering, just different byte ordering (see the worked example below).
Lines 54 to 56 in b41f5a2:
inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
{
    return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
That's why I also updated the comments in the diff above.
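A small worked example (using the reduced GetBitBE from the diff above; standalone, not part of the PR) showing that both helpers index bits LSB-first within a byte and differ only in which end of the buffer they start from:

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <span>
#include <vector>

static bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos)
{
    return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
}

// Reduced big-endian-byte-order variant from the review comment above.
static bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos)
{
    return (std::to_integer<uint8_t>(bytes[bytes.size() - ((bitpos + 8) / 8)]) >> (bitpos % 8)) & 1;
}

int main()
{
    const std::vector<std::byte> data{std::byte{0x01}, std::byte{0x80}};
    assert(GetBitLE(data, 0));   // bit 0 of the first byte (0x01)
    assert(!GetBitBE(data, 0));  // bit 0 of the last byte (0x80)
    assert(GetBitBE(data, 7));   // bit 7 of the last byte (0x80)
    assert(GetBitLE(data, 15));  // bit 7 of the second byte (0x80)
}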
const size_t endpos{asmap.size() * 8};
uint8_t bits = ip.size() * 8;
uint32_t default_asn = 0;
uint32_t jump, match, matchlen;
nit: Here are some changes that bring the Interpret() implementation even closer to SanityCheckAsmap() while also decreasing variable scope, making the code more readable. Maybe something to include in the last commit which aligns the casing of SanityCheckAsmap?
--- a/src/util/asmap.cpp
+++ b/src/util/asmap.cpp
@@ -183,18 +183,16 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
const size_t endpos{asmap.size() * 8};
uint8_t bits = ip.size() * 8;
uint32_t default_asn = 0;
- uint32_t jump, match, matchlen;
- Instruction opcode;
while (pos < endpos) {
- opcode = DecodeType(pos, asmap);
+ Instruction opcode = DecodeType(pos, asmap);
if (opcode == Instruction::RETURN) {
// Found leaf node - return the ASN
- default_asn = DecodeASN(pos, asmap);
- if (default_asn == INVALID) break; // ASN straddles EOF
- return default_asn;
+ uint32_t asn = DecodeASN(pos, asmap);
+ if (asn == INVALID) break; // ASN straddles EOF
+ return asn;
} else if (opcode == Instruction::JUMP) {
// Binary branch: if IP bit is 1, jump forward; else continue
- jump = DecodeJump(pos, asmap);
+ uint32_t jump = DecodeJump(pos, asmap);
if (jump == INVALID) break; // Jump offset straddles EOF
if (bits == 0) break; // No input bits left
if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
@@ -208,11 +206,11 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
// The match value encodes both length and pattern:
// - highest set bit position determines length (bit_width - 1)
// - lower bits contain the pattern to compare
- match = DecodeMatch(pos, asmap);
+ uint32_t match = DecodeMatch(pos, asmap);
if (match == INVALID) break; // Match bits straddle EOF
- matchlen = std::bit_width(match) - 1; // An n-bit value matches n-1 input bits
+ int matchlen = std::bit_width(match) - 1; // An n-bit value matches n-1 input bits
if (bits < matchlen) break; // Not enough input bits
- for (uint32_t bit = 0; bit < matchlen; bit++) {
+ for (int bit = 0; bit < matchlen; bit++) {
if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
return default_asn; // Pattern mismatch - use default
}

inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
{
    return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
}

/**
 * Extract a single bit from byte array using big-endian bit ordering.
 * Used for IP addresses to match network byte order conventions.
 */
inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
nit: In other cases we have
- DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, ...)
- DecodeType(size_t& bitpos, const std::span<const std::byte>& data)
- ...

So it might be nice to have the same argument order here:
-inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
+inline bool GetBitLE(uint32_t bitpos, std::span<const std::byte> bytes) noexcept
 {
     return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
 }

 /**
  * Extract a single bit from byte array using big-endian bit ordering.
  * Used for IP addresses to match network byte order conventions.
  */
-inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
+inline bool GetBitBE(uint32_t bitpos, std::span<const std::byte> bytes) noexcept
Taking it one step further, using the observation that we only inspect each bit once, we could perform the incrementing and decrementing within these functions as well. This is also influenced by the other comment investigating GetBitBE.
constexpr uint32_t INVALID = 0xFFFFFFFF;
/**
- * Extract a single bit from byte array using little-endian bit ordering.
- * Used for ASMap data where bits are numbered right-to-left within each byte (LSB first).
+ * Extract a single bit (LSB first) from byte array using little-endian byte ordering.
+ * Used for ASMap data.
*/
-inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
+inline bool DecodeBitLE(size_t& bitpos, std::span<const std::byte> bytes) noexcept
{
- return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
+ const bool bit = (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
+ ++bitpos;
+ return bit;
}
/**
- * Extract a single bit from byte array using big-endian bit ordering.
+ * Extract a single bit (LSB first) from byte array using big-endian byte ordering.
* Used for IP addresses to match network byte order conventions.
*/
-inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
+inline bool DecodeBitBE(uint8_t& bitpos, std::span<const std::byte> bytes) noexcept
{
- return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
+ --bitpos;
+ return (std::to_integer<uint8_t>(bytes[bytes.size() - ((bitpos + 8) / 8)]) >> (bitpos % 8)) & 1;
}
/**
@@ -88,8 +91,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
// Read continuation bit to determine if we're in this class
if (bit_sizes_it + 1 != bit_sizes.end()) { // Unless we're in the last class
if (bitpos >= data.size() * 8) break;
- bit = GetBitLE(data, bitpos);
- bitpos++;
+ bit = DecodeBitLE(bitpos, data);
} else {
bit = 0; // Last class has no continuation bit
}
@@ -101,8 +103,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
// Decode the position within this class in big endian
for (int b = 0; b < *bit_sizes_it; b++) {
if (bitpos >= data.size() * 8) return INVALID; // Reached EOF in mantissa
- bit = GetBitLE(data, bitpos);
- bitpos++;
+ bit = DecodeBitLE(bitpos, data);
val += bit << (*bit_sizes_it - 1 - b); // Big-endian within the class
}
return val;
@@ -198,11 +199,10 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
if (jump == INVALID) break; // Jump offset straddles EOF
if (bits == 0) break; // No input bits left
if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
- if (GetBitBE(ip, bits)) { // Check next IP bit (big-endian)
+ if (DecodeBitBE(bits, ip)) { // Check next IP bit (big-endian)
pos += jump; // Bit = 1: skip to right subtree
}
// Bit = 0: fall through to left subtree
- bits--;
} else if (opcode == Instruction::MATCH) {
// Compare multiple IP bits against a pattern
// The match value encodes both length and pattern:
@@ -213,10 +213,9 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
matchlen = std::bit_width(match) - 1; // An n-bit value matches n-1 input bits
if (bits < matchlen) break; // Not enough input bits
for (uint32_t bit = 0; bit < matchlen; bit++) {
- if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
+ if (DecodeBitBE(bits, ip) != ((match >> (matchlen - 1 - bit)) & 1)) {
return default_asn; // Pattern mismatch - use default
}
- bits--;
}
// Pattern matched - continue execution
} else if (opcode == Instruction::DEFAULT) {
@@ -260,8 +259,7 @@ bool SanityCheckAsmap(const std::span<const std::byte> asmap, int bits)
// Nothing to execute anymore
if (endpos - pos > 7) return false; // Excessive padding
while (pos != endpos) {
- if (GetBitLE(asmap, pos)) return false; // Nonzero padding bit
- ++pos;
+ if (DecodeBitLE(pos, asmap)) return false; // Nonzero padding bit
}
return true; // Sanely reached EOF
} else {
(Maybe ConsumeBit(LE|BE) would be more appropriate than DecodeBit(LE|BE)).
Couldn't stop thinking about this. It felt weird to implement reverse byte order in GetBitBE but then also start bits from the end of the ip data and count backwards. Found a way that I think simplifies things further, making the order in which we consume bytes the same between the LE and BE implementations, while keeping a different bit order. Diff against b41f5a2:
--- a/src/util/asmap.cpp
+++ b/src/util/asmap.cpp
@@ -48,21 +48,25 @@ namespace {
constexpr uint32_t INVALID = 0xFFFFFFFF;
/**
- * Extract a single bit from byte array using little-endian bit ordering.
- * Used for ASMap data where bits are numbered right-to-left within each byte (LSB first).
+ * Extract a single bit from byte array using little-endian bit ordering (LSB first).
+ * Used for ASMap data.
*/
-inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
+inline bool ConsumeBitLE(size_t& bitpos, std::span<const std::byte> bytes) noexcept
{
- return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
+ const bool bit = (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
+ ++bitpos;
+ return bit;
}
/**
- * Extract a single bit from byte array using big-endian bit ordering.
+ * Extract a single bit from byte array using big-endian bit ordering (MSB first).
* Used for IP addresses to match network byte order conventions.
*/
-inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
+inline bool ConsumeBitBE(uint8_t& bitpos, std::span<const std::byte> bytes) noexcept
{
- return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
+ const bool bit = (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (7 - (bitpos % 8))) & 1;
+ ++bitpos;
+ return bit;
}
/**
@@ -88,8 +92,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
// Read continuation bit to determine if we're in this class
if (bit_sizes_it + 1 != bit_sizes.end()) { // Unless we're in the last class
if (bitpos >= data.size() * 8) break;
- bit = GetBitLE(data, bitpos);
- bitpos++;
+ bit = ConsumeBitLE(bitpos, data);
} else {
bit = 0; // Last class has no continuation bit
}
@@ -101,8 +104,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
// Decode the position within this class in big endian
for (int b = 0; b < *bit_sizes_it; b++) {
if (bitpos >= data.size() * 8) return INVALID; // Reached EOF in mantissa
- bit = GetBitLE(data, bitpos);
- bitpos++;
+ bit = ConsumeBitLE(bitpos, data);
val += bit << (*bit_sizes_it - 1 - b); // Big-endian within the class
}
return val;
@@ -181,7 +183,8 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
{
size_t pos{0};
const size_t endpos{asmap.size() * 8};
- uint8_t bits = ip.size() * 8;
+ uint8_t ip_bit{0};
+ const uint8_t ip_bits_end = ip.size() * 8;
uint32_t default_asn = 0;
uint32_t jump, match, matchlen;
Instruction opcode;
@@ -196,13 +199,12 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
// Binary branch: if IP bit is 1, jump forward; else continue
jump = DecodeJump(pos, asmap);
if (jump == INVALID) break; // Jump offset straddles EOF
- if (bits == 0) break; // No input bits left
+ if (ip_bit == ip_bits_end) break; // No input bits left
if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
- if (GetBitBE(ip, bits)) { // Check next IP bit (big-endian)
+ if (ConsumeBitBE(ip_bit, ip)) { // Check next IP bit (big-endian)
pos += jump; // Bit = 1: skip to right subtree
}
// Bit = 0: fall through to left subtree
- bits--;
} else if (opcode == Instruction::MATCH) {
// Compare multiple IP bits against a pattern
// The match value encodes both length and pattern:
@@ -211,12 +213,11 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
match = DecodeMatch(pos, asmap);
if (match == INVALID) break; // Match bits straddle EOF
matchlen = std::bit_width(match) - 1; // An n-bit value matches n-1 input bits
- if (bits < matchlen) break; // Not enough input bits
+ if ((ip_bits_end - ip_bit) < matchlen) break; // Not enough input bits
for (uint32_t bit = 0; bit < matchlen; bit++) {
- if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
+ if (ConsumeBitBE(ip_bit, ip) != ((match >> (matchlen - 1 - bit)) & 1)) {
return default_asn; // Pattern mismatch - use default
}
- bits--;
}
// Pattern matched - continue execution
} else if (opcode == Instruction::DEFAULT) {
@@ -260,8 +261,7 @@ bool SanityCheckAsmap(const std::span<const std::byte> asmap, int bits)
// Nothing to execute anymore
if (endpos - pos > 7) return false; // Excessive padding
while (pos != endpos) {
- if (GetBitLE(asmap, pos)) return false; // Nonzero padding bit
- ++pos;
+ if (ConsumeBitLE(pos, asmap)) return false; // Nonzero padding bit
}
return true; // Sanely reached EOF
} else {
This is a second slice carved out of #28792. It contains the following changes, which are crucial for the embedding of asmap data that is added in the following PR in the series (probably this will remain in #28792).
The changes are:
- std::byte instead of bits
- a span rather than a vector in the asmap internals, to prevent holding the asmap data in memory twice (usage sketched below)

The first three commits were already part of #28792, the others are new.
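For context, a usage sketch based on the factory functions visible in the review hunks above (NetGroupManager::NoAsmap() and NetGroupManager::WithEmbeddedAsmap()); the header name and the parameter type of the embedded data are assumptions rather than taken from the PR:

#include <netgroup.h> // NetGroupManager; header name assumed
#include <cstddef>
#include <span>

void ExampleUsage(std::span<const std::byte> embedded_asmap) // parameter type assumed
{
    // Without an asmap, netgroup bucketing falls back to /16 prefixes.
    auto plain{NetGroupManager::NoAsmap()};

    // With an asmap, the manager keeps only a view over the (embedded) data,
    // which is the "span rather than a vector" point above: the mapping is
    // not copied into a second in-memory buffer.
    auto mapped{NetGroupManager::WithEmbeddedAsmap(embedded_asmap)};

    (void)plain;
    (void)mapped;
}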
The documentation commit came out of feedback gathered at the latest CoreDev. The primary input for the documentation was the documentation that already existed in the Python implementation (contrib/asmap/asmap.py), but there are several other comments as well. Please note: I have also asked several LLMs to provide suggestions on how to explain pieces of the implementation and better demonstrate how the parts work together. I have copied bits and pieces that I liked, but everything has been edited further by me and obviously all mistakes here are my own.