refactor, docs: Embedded ASMap [2/3]: Refactor asmap internals and add documentation #33878
base: master
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks: For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33878.

Reviews: See the guideline for information on the review process.

Conflicts: Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
@sipa would love to get your eyes on the documentation commit before anyone else gets confused by the mistakes I might have made :)
#include <vector>

/*
 * ASMap (Autonomous System Map) Implementation
All the documentation changes in this commit look correct to me.
One thing that may be worth mentioning early on is that it's a bit-packed format, i.e., the entire compressed mapping is treated as a sequence of bits, packed together at 8 per byte, which are concatenated encodings of instructions.
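For illustration, a minimal sketch of that bit-packed access (assuming only standard C++; it mirrors the GetBitLE helper quoted later in this thread): the compressed mapping is a plain byte buffer, and the interpreter consumes it one bit at a time, least-significant bit first within each byte.

#include <cstddef>
#include <cstdint>
#include <span>

// Read bit number `bitpos` from a bit-packed buffer: 8 bits per byte,
// LSB first within each byte.
inline bool GetPackedBit(std::span<const std::byte> bytes, size_t bitpos)
{
    return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
}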
Thank you for the quick feedback! Added a small paragraph on the bit-packed format in the introduction comment at the top.
Force-pushed 93205dd to 9da0160
All the feedback from #28792 that applied to the commits here has now been addressed.
Force-pushed 7a2bf8e to eaf5e12
I'm sure you mean the first three commits? 😄
hodlinator left a comment
Reviewed eaf5e12
Thanks for peeling off this PR as I suggested!
Yepp, fixed :)
Force-pushed 7de7c21 to 3011c84
Rebased since #33026 was merged 🥳 and addressed all comments. Thanks for the reviews!
Calculate the asmap version only in one place: a dedicated function in util/asmap. The version was also referred to as the asmap checksum in several places; to avoid confusion, call it the asmap version everywhere.
This prevents holding the asmap data in memory twice. The version hash changes because spans are serialized without their size prefix, unlike vectors (illustrated in the sketch below).
Also makes a minor improvement to the Python implementation documentation.
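A minimal sketch of why the hash changes, assuming Bitcoin Core's HashWriter and its serialization rules (a std::vector is written with a CompactSize length prefix, a byte span as raw bytes only); this is not the PR's actual AsmapVersion implementation:

#include <hash.h>    // HashWriter (assumed available, as in Bitcoin Core)
#include <uint256.h>
#include <cstddef>
#include <span>
#include <vector>

// Hashing the same bytes through the two types yields different digests,
// because only the vector's serialization includes a length prefix.
uint256 VersionFromVector(const std::vector<std::byte>& data)
{
    return (HashWriter{} << data).GetSHA256(); // CompactSize(data.size()) + bytes
}

uint256 VersionFromSpan(std::span<const std::byte> data)
{
    return (HashWriter{} << data).GetSHA256(); // raw bytes only, no length prefix
}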
Force-pushed 3011c84 to b41f5a2
hodlinator left a comment
Reviewed b41f5a2
Sorry for not finding these earlier, getting closer to A-C-K.
src/util/asmap.h (Outdated)

/** Read asmap from provided binary file */
std::vector<std::byte> DecodeAsmap(fs::path path);
/** Calculate the asmap version, a checksum identifying the asmap being used. */
uint256 AsmapVersion(const std::vector<std::byte>& data);
meganit in d278d61 "refactor: Unify asmap version calculation and naming":
As uint256 AsmapVersion(const std::vector<std::byte>& data) is new in this PR, it could be introduced as uint256 AsmapVersion(std::span<const std::byte> data) from the beginning to avoid line churn.
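A sketch of why this avoids churn (editor's illustration, not code from the PR; the placeholder body is hypothetical): a std::span parameter still accepts a std::vector argument through the implicit conversion, so existing callers would compile unchanged if the function starts out with the span signature.

#include <cstddef>
#include <span>
#include <uint256.h> // uint256, as used in Bitcoin Core
#include <vector>

// Suggested signature from the comment above; body is a placeholder.
uint256 AsmapVersion(std::span<const std::byte> data)
{
    return uint256{}; // placeholder for illustration only
}

uint256 Caller(const std::vector<std::byte>& asmap_bytes)
{
    return AsmapVersion(asmap_bytes); // vector converts to span implicitly
}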
/** Check asmap from embedded data */
bool CheckAsmap(std::span<const std::byte> data);
in 6b9ce8c on CheckAsmap:
- Justification for existence: I dislike default parameter values, so appreciate a distinct function from SanityCheckASMap.
- Comment: Nothing makes it specialized for only embedded data?
- Location: Maybe it could be added directly after SanityCheckASMap in both .h + .cpp since they are so closely related? (Could place it before, but that would put it between Interpret and SanityCheckASMap, which are nice to have after each other as they are also closely related.)
- Naming: Could make it more descriptive - something like CheckStandardAsmap or CheckIPV6Asmap?
- Usage: Could be used in more places where we have SanityCheckAsmap(asmap, 128):

₿ git grep CheckAsmap
src/test/fuzz/asmap.cpp:    if (!SanityCheckAsmap(asmap, 128)) return;
src/test/fuzz/asmap_direct.cpp:    if (SanityCheckAsmap(asmap, ip_len)) {
src/test/fuzz/asmap_direct.cpp:    assert(!SanityCheckAsmap(prefix, ip_len));
src/test/fuzz/util/net.h:    if (!SanityCheckAsmap(std::span<std::byte>(asmap), 128)) {
src/util/asmap.cpp:    // - should have been caught by SanityCheckAsmap below
src/util/asmap.cpp:bool SanityCheckAsmap(const std::span<const std::byte> asmap, int bits)
src/util/asmap.cpp:bool CheckAsmap(const std::span<const std::byte> data)
src/util/asmap.cpp:    if (!SanityCheckAsmap(data, 128)) {
src/util/asmap.cpp:    if (!CheckAsmap(buffer)) {
src/util/asmap.h:bool SanityCheckAsmap(std::span<const std::byte> asmap, int bits);
src/util/asmap.h:bool CheckAsmap(std::span<const std::byte> data);
using util::ToString;

-static NetGroupManager EMPTY_NETGROUPMAN{{}};
+static NetGroupManager EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
nit in 6b9ce8c:
Could avoid repeating type name here and in other files.
diff --git a/src/bench/addrman.cpp b/src/bench/addrman.cpp
index 1e93b9c395..2f474c2411 100644
--- a/src/bench/addrman.cpp
+++ b/src/bench/addrman.cpp
@@ -24,7 +24,7 @@
static constexpr size_t NUM_SOURCES = 64;
static constexpr size_t NUM_ADDRESSES_PER_SOURCE = 256;
-static NetGroupManager EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
+static auto EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
static constexpr uint32_t ADDRMAN_CONSISTENCY_CHECK_RATIO{0};
static std::vector<CAddress> g_sources;
diff --git a/src/test/addrman_tests.cpp b/src/test/addrman_tests.cpp
index c9a0ebebf0..f1e54416cd 100644
--- a/src/test/addrman_tests.cpp
+++ b/src/test/addrman_tests.cpp
@@ -24,7 +24,7 @@ using namespace std::literals;
using node::NodeContext;
using util::ToString;
-static NetGroupManager EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
+static auto EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
static const bool DETERMINISTIC{true};
static int32_t GetCheckRatio(const NodeContext& node_ctx)
@@ -584,7 +584,7 @@ BOOST_AUTO_TEST_CASE(caddrinfo_get_new_bucket_legacy)
// 101.8.0.0/16 AS8
BOOST_AUTO_TEST_CASE(caddrinfo_get_tried_bucket)
{
- NetGroupManager ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
+ auto ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
CAddress addr1 = CAddress(ResolveService("250.1.1.1", 8333), NODE_NONE);
CAddress addr2 = CAddress(ResolveService("250.1.1.1", 9999), NODE_NONE);
@@ -637,7 +637,7 @@ BOOST_AUTO_TEST_CASE(caddrinfo_get_tried_bucket)
BOOST_AUTO_TEST_CASE(caddrinfo_get_new_bucket)
{
- NetGroupManager ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
+ auto ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
CAddress addr1 = CAddress(ResolveService("250.1.2.1", 8333), NODE_NONE);
CAddress addr2 = CAddress(ResolveService("250.1.2.1", 9999), NODE_NONE);
@@ -714,7 +714,7 @@ BOOST_AUTO_TEST_CASE(caddrinfo_get_new_bucket)
BOOST_AUTO_TEST_CASE(addrman_serialization)
{
- NetGroupManager netgroupman{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
+ auto netgroupman{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
const auto ratio = GetCheckRatio(m_node);
auto addrman_asmap1 = std::make_unique<AddrMan>(netgroupman, DETERMINISTIC, ratio);
diff --git a/src/test/fuzz/asmap.cpp b/src/test/fuzz/asmap.cpp
index fec0f3f50a..6ed5aa7cd9 100644
--- a/src/test/fuzz/asmap.cpp
+++ b/src/test/fuzz/asmap.cpp
@@ -42,6 +42,6 @@ FUZZ_TARGET(asmap)
memcpy(&ipv4, addr_data, addr_size);
net_addr.SetIP(CNetAddr{ipv4});
}
- NetGroupManager netgroupman{NetGroupManager::WithEmbeddedAsmap(asmap)};
+ auto netgroupman{NetGroupManager::WithEmbeddedAsmap(asmap)};
(void)netgroupman.GetMappedAS(net_addr);
}
diff --git a/src/test/fuzz/p2p_handshake.cpp b/src/test/fuzz/p2p_handshake.cpp
index 75e14865a4..a56e1e5f1f 100644
--- a/src/test/fuzz/p2p_handshake.cpp
+++ b/src/test/fuzz/p2p_handshake.cpp
@@ -48,7 +48,7 @@ FUZZ_TARGET(p2p_handshake, .init = ::initialize)
chainman.ResetIbd();
node::Warnings warnings{};
- NetGroupManager netgroupman{NetGroupManager::NoAsmap()};
+ auto netgroupman{NetGroupManager::NoAsmap()};
AddrMan addrman{netgroupman, /*deterministic=*/true, /*consistency_check_ratio=*/0};
auto peerman = PeerManager::make(connman, addrman,
/*banman=*/nullptr, chainman,
diff --git a/src/test/fuzz/process_message.cpp b/src/test/fuzz/process_message.cpp
index 5ac796ae2c..30c121d7a2 100644
--- a/src/test/fuzz/process_message.cpp
+++ b/src/test/fuzz/process_message.cpp
@@ -77,7 +77,7 @@ FUZZ_TARGET(process_message, .init = initialize_process_message)
chainman.DisableNextWrite();
node::Warnings warnings{};
- NetGroupManager netgroupman{NetGroupManager::NoAsmap()};
+ auto netgroupman{NetGroupManager::NoAsmap()};
AddrMan addrman{netgroupman, /*deterministic=*/true, /*consistency_check_ratio=*/0};
auto peerman = PeerManager::make(connman, addrman,
/*banman=*/nullptr, chainman,
diff --git a/src/test/netbase_tests.cpp b/src/test/netbase_tests.cpp
index 5086f21ff4..4d355142e2 100644
--- a/src/test/netbase_tests.cpp
+++ b/src/test/netbase_tests.cpp
@@ -325,7 +325,7 @@ BOOST_AUTO_TEST_CASE(subnet_test)
BOOST_AUTO_TEST_CASE(netbase_getgroup)
{
- NetGroupManager netgroupman{NetGroupManager::NoAsmap()}; // use /16
+ auto netgroupman{NetGroupManager::NoAsmap()}; // use /16
BOOST_CHECK(netgroupman.GetGroup(ResolveIP("127.0.0.1")) == std::vector<unsigned char>({0})); // Local -> !Routable()
BOOST_CHECK(netgroupman.GetGroup(ResolveIP("257.0.0.1")) == std::vector<unsigned char>({0})); // !Valid -> !Routable()
BOOST_CHECK(netgroupman.GetGroup(ResolveIP("10.0.0.1")) == std::vector<unsigned char>({0})); // RFC1918 -> !Routable()

 */
inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
{
    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
Here's a reduced version:
-    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
+    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> ((7 + bitpos) % 8)) & 1;
I'm surprised by the way the shift amount changes for different bit positions. Would have expected something closer to this for big endian...
-    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
+    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - (bitpos % 8))) & 1;
....but it fails the unit tests.
Godbolt experiment https://godbolt.org/z/6rjEzxhPM outputs:
PR bit shift expression (adfd8006...): (7 - ((bytes.size() * 8 - bitpos) % 8))
PR bit shift reduced: (7 + bitpos) % 8
Expected BE bit shift: 7 - (bitpos % 8)
bitpos: 0, PR: 7, reduced: 7, expected: 7
bitpos: 1, PR: 0, reduced: 0, expected: 6
bitpos: 2, PR: 1, reduced: 1, expected: 5
bitpos: 3, PR: 2, reduced: 2, expected: 4
bitpos: 4, PR: 3, reduced: 3, expected: 3
bitpos: 5, PR: 4, reduced: 4, expected: 2
bitpos: 6, PR: 5, reduced: 5, expected: 1
bitpos: 7, PR: 6, reduced: 6, expected: 0
bitpos: 8, PR: 7, reduced: 7, expected: 7
bitpos: 9, PR: 0, reduced: 0, expected: 6
bitpos: 10, PR: 1, reduced: 1, expected: 5
bitpos: 11, PR: 2, reduced: 2, expected: 4
bitpos: 12, PR: 3, reduced: 3, expected: 3
bitpos: 13, PR: 4, reduced: 4, expected: 2
bitpos: 14, PR: 5, reduced: 5, expected: 1
bitpos: 15, PR: 6, reduced: 6, expected: 0
However, when looking at the usage in Interpret(), we see that if we were to encounter JUMP as the first instruction, our bitpos would be sort of out of bounds by one.
Line 184 in b41f5a2:
    uint8_t bits = ip.size() * 8;
Line 201 in b41f5a2:
    if (GetBitBE(ip, bits)) { // Check next IP bit (big-endian)
If we instead make sure to pass in-bounds values at the call site - GetBitBE(ip, bits - 1) - it turns out we can simplify GetBitBE quite a bit.
--- a/src/util/asmap.cpp
+++ b/src/util/asmap.cpp
@@ -48,8 +48,8 @@ namespace {
constexpr uint32_t INVALID = 0xFFFFFFFF;
/**
- * Extract a single bit from byte array using little-endian bit ordering.
- * Used for ASMap data where bits are numbered right-to-left within each byte (LSB first).
+ * Extract a single bit (LSB first) from byte array using little-endian byte ordering.
+ * Used for ASMap data.
*/
inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
{
@@ -57,12 +57,12 @@ inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
}
/**
- * Extract a single bit from byte array using big-endian bit ordering.
+ * Extract a single bit (LSB first) from byte array using big-endian byte ordering.
* Used for IP addresses to match network byte order conventions.
*/
inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
{
- return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
+ return (std::to_integer<uint8_t>(bytes[bytes.size() - ((bitpos + 8) / 8)]) >> (bitpos % 8)) & 1;
}
/**
@@ -198,7 +198,7 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
if (jump == INVALID) break; // Jump offset straddles EOF
if (bits == 0) break; // No input bits left
if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
- if (GetBitBE(ip, bits)) { // Check next IP bit (big-endian)
+ if (GetBitBE(ip, bits - 1)) { // Check next IP bit (big-endian)
pos += jump; // Bit = 1: skip to right subtree
}
// Bit = 0: fall through to left subtree
@@ -213,7 +213,7 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
matchlen = std::bit_width(match) - 1; // An n-bit value matches n-1 input bits
if (bits < matchlen) break; // Not enough input bits
for (uint32_t bit = 0; bit < matchlen; bit++) {
- if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
+ if (GetBitBE(ip, bits - 1) != ((match >> (matchlen - 1 - bit)) & 1)) {
return default_asn; // Pattern mismatch - use default
}
bits--;

The similarity we get between the shifting in this new GetBitBE and GetBitLE shows that we have the same bit ordering, just different byte ordering (see the worked example below).
Lines 54 to 56 in b41f5a2:
inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
{
    return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
That's why I also updated the comments in the diff above.
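A small worked example (using the reduced GetBitBE from the diff above; standalone, not part of the PR) showing that both helpers index bits LSB-first within a byte and differ only in which end of the buffer they start from:

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <span>
#include <vector>

static bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos)
{
    return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
}

// Reduced big-endian-byte-order variant from the review comment above.
static bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos)
{
    return (std::to_integer<uint8_t>(bytes[bytes.size() - ((bitpos + 8) / 8)]) >> (bitpos % 8)) & 1;
}

int main()
{
    const std::vector<std::byte> data{std::byte{0x01}, std::byte{0x80}};
    assert(GetBitLE(data, 0));   // bit 0 of the first byte (0x01)
    assert(!GetBitBE(data, 0));  // bit 0 of the last byte (0x80)
    assert(GetBitBE(data, 7));   // bit 7 of the last byte (0x80)
    assert(GetBitLE(data, 15));  // bit 7 of the second byte (0x80)
}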
const size_t endpos{asmap.size() * 8};
uint8_t bits = ip.size() * 8;
uint32_t default_asn = 0;
uint32_t jump, match, matchlen;
nit: Here are some changes that bring the Interpret() implementation even closer to SanityCheckAsmap() while also decreasing variable scope, making the code more readable. Maybe something to include in the last commit which aligns the casing of SanityCheckAsmap?
--- a/src/util/asmap.cpp
+++ b/src/util/asmap.cpp
@@ -183,18 +183,16 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
const size_t endpos{asmap.size() * 8};
uint8_t bits = ip.size() * 8;
uint32_t default_asn = 0;
- uint32_t jump, match, matchlen;
- Instruction opcode;
while (pos < endpos) {
- opcode = DecodeType(pos, asmap);
+ Instruction opcode = DecodeType(pos, asmap);
if (opcode == Instruction::RETURN) {
// Found leaf node - return the ASN
- default_asn = DecodeASN(pos, asmap);
- if (default_asn == INVALID) break; // ASN straddles EOF
- return default_asn;
+ uint32_t asn = DecodeASN(pos, asmap);
+ if (asn == INVALID) break; // ASN straddles EOF
+ return asn;
} else if (opcode == Instruction::JUMP) {
// Binary branch: if IP bit is 1, jump forward; else continue
- jump = DecodeJump(pos, asmap);
+ uint32_t jump = DecodeJump(pos, asmap);
if (jump == INVALID) break; // Jump offset straddles EOF
if (bits == 0) break; // No input bits left
if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
@@ -208,11 +206,11 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
// The match value encodes both length and pattern:
// - highest set bit position determines length (bit_width - 1)
// - lower bits contain the pattern to compare
- match = DecodeMatch(pos, asmap);
+ uint32_t match = DecodeMatch(pos, asmap);
if (match == INVALID) break; // Match bits straddle EOF
- matchlen = std::bit_width(match) - 1; // An n-bit value matches n-1 input bits
+ int matchlen = std::bit_width(match) - 1; // An n-bit value matches n-1 input bits
if (bits < matchlen) break; // Not enough input bits
- for (uint32_t bit = 0; bit < matchlen; bit++) {
+ for (int bit = 0; bit < matchlen; bit++) {
if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
return default_asn; // Pattern mismatch - use default
}

inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
{
    return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
}

/**
 * Extract a single bit from byte array using big-endian bit ordering.
 * Used for IP addresses to match network byte order conventions.
 */
inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
nit: In other cases we have
- DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, ...)
- DecodeType(size_t& bitpos, const std::span<const std::byte>& data)
- ...

So it might be nice to have the same argument order here:
-inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
+inline bool GetBitLE(uint32_t bitpos, std::span<const std::byte> bytes) noexcept
 {
     return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
 }

 /**
  * Extract a single bit from byte array using big-endian bit ordering.
  * Used for IP addresses to match network byte order conventions.
  */
-inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
+inline bool GetBitBE(uint32_t bitpos, std::span<const std::byte> bytes) noexcept
Taking it one step further, using the observation that we only inspect each bit once, we could perform the incrementing and decrementing within these functions as well. This is also influenced by the other comment investigating GetBitBE.
constexpr uint32_t INVALID = 0xFFFFFFFF;
/**
- * Extract a single bit from byte array using little-endian bit ordering.
- * Used for ASMap data where bits are numbered right-to-left within each byte (LSB first).
+ * Extract a single bit (LSB first) from byte array using little-endian byte ordering.
+ * Used for ASMap data.
*/
-inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
+inline bool DecodeBitLE(size_t& bitpos, std::span<const std::byte> bytes) noexcept
{
- return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
+ const bool bit = (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
+ ++bitpos;
+ return bit;
}
/**
- * Extract a single bit from byte array using big-endian bit ordering.
+ * Extract a single bit (LSB first) from byte array using big-endian byte ordering.
* Used for IP addresses to match network byte order conventions.
*/
-inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
+inline bool DecodeBitBE(uint8_t& bitpos, std::span<const std::byte> bytes) noexcept
{
- return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
+ --bitpos;
+ return (std::to_integer<uint8_t>(bytes[bytes.size() - ((bitpos + 8) / 8)]) >> (bitpos % 8)) & 1;
}
/**
@@ -88,8 +91,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
// Read continuation bit to determine if we're in this class
if (bit_sizes_it + 1 != bit_sizes.end()) { // Unless we're in the last class
if (bitpos >= data.size() * 8) break;
- bit = GetBitLE(data, bitpos);
- bitpos++;
+ bit = DecodeBitLE(bitpos, data);
} else {
bit = 0; // Last class has no continuation bit
}
@@ -101,8 +103,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
// Decode the position within this class in big endian
for (int b = 0; b < *bit_sizes_it; b++) {
if (bitpos >= data.size() * 8) return INVALID; // Reached EOF in mantissa
- bit = GetBitLE(data, bitpos);
- bitpos++;
+ bit = DecodeBitLE(bitpos, data);
val += bit << (*bit_sizes_it - 1 - b); // Big-endian within the class
}
return val;
@@ -198,11 +199,10 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
if (jump == INVALID) break; // Jump offset straddles EOF
if (bits == 0) break; // No input bits left
if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
- if (GetBitBE(ip, bits)) { // Check next IP bit (big-endian)
+ if (DecodeBitBE(bits, ip)) { // Check next IP bit (big-endian)
pos += jump; // Bit = 1: skip to right subtree
}
// Bit = 0: fall through to left subtree
- bits--;
} else if (opcode == Instruction::MATCH) {
// Compare multiple IP bits against a pattern
// The match value encodes both length and pattern:
@@ -213,10 +213,9 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
matchlen = std::bit_width(match) - 1; // An n-bit value matches n-1 input bits
if (bits < matchlen) break; // Not enough input bits
for (uint32_t bit = 0; bit < matchlen; bit++) {
- if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
+ if (DecodeBitBE(bits, ip) != ((match >> (matchlen - 1 - bit)) & 1)) {
return default_asn; // Pattern mismatch - use default
}
- bits--;
}
// Pattern matched - continue execution
} else if (opcode == Instruction::DEFAULT) {
@@ -260,8 +259,7 @@ bool SanityCheckAsmap(const std::span<const std::byte> asmap, int bits)
// Nothing to execute anymore
if (endpos - pos > 7) return false; // Excessive padding
while (pos != endpos) {
- if (GetBitLE(asmap, pos)) return false; // Nonzero padding bit
- ++pos;
+ if (DecodeBitLE(pos, asmap)) return false; // Nonzero padding bit
}
return true; // Sanely reached EOF
} else {
(Maybe ConsumeBit(LE|BE) would be more appropriate than DecodeBit(LE|BE)).
Couldn't stop thinking about this. It felt weird to implement reverse byte order in GetBitBE but then also start bits from the end of the ip data and count backwards. Found a way that I think simplifies things further, making the order in which we consume bytes the same between the LE and BE implementations, while keeping a different bit order. Diff against b41f5a2:
--- a/src/util/asmap.cpp
+++ b/src/util/asmap.cpp
@@ -48,21 +48,25 @@ namespace {
constexpr uint32_t INVALID = 0xFFFFFFFF;
/**
- * Extract a single bit from byte array using little-endian bit ordering.
- * Used for ASMap data where bits are numbered right-to-left within each byte (LSB first).
+ * Extract a single bit from byte array using little-endian bit ordering (LSB first).
+ * Used for ASMap data.
*/
-inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
+inline bool ConsumeBitLE(size_t& bitpos, std::span<const std::byte> bytes) noexcept
{
- return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
+ const bool bit = (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
+ ++bitpos;
+ return bit;
}
/**
- * Extract a single bit from byte array using big-endian bit ordering.
+ * Extract a single bit from byte array using big-endian bit ordering (MSB first).
* Used for IP addresses to match network byte order conventions.
*/
-inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
+inline bool ConsumeBitBE(uint8_t& bitpos, std::span<const std::byte> bytes) noexcept
{
- return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
+ const bool bit = (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (7 - (bitpos % 8))) & 1;
+ ++bitpos;
+ return bit;
}
/**
@@ -88,8 +92,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
// Read continuation bit to determine if we're in this class
if (bit_sizes_it + 1 != bit_sizes.end()) { // Unless we're in the last class
if (bitpos >= data.size() * 8) break;
- bit = GetBitLE(data, bitpos);
- bitpos++;
+ bit = ConsumeBitLE(bitpos, data);
} else {
bit = 0; // Last class has no continuation bit
}
@@ -101,8 +104,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
// Decode the position within this class in big endian
for (int b = 0; b < *bit_sizes_it; b++) {
if (bitpos >= data.size() * 8) return INVALID; // Reached EOF in mantissa
- bit = GetBitLE(data, bitpos);
- bitpos++;
+ bit = ConsumeBitLE(bitpos, data);
val += bit << (*bit_sizes_it - 1 - b); // Big-endian within the class
}
return val;
@@ -181,7 +183,8 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
{
size_t pos{0};
const size_t endpos{asmap.size() * 8};
- uint8_t bits = ip.size() * 8;
+ uint8_t ip_bit{0};
+ const uint8_t ip_bits_end = ip.size() * 8;
uint32_t default_asn = 0;
uint32_t jump, match, matchlen;
Instruction opcode;
@@ -196,13 +199,12 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
// Binary branch: if IP bit is 1, jump forward; else continue
jump = DecodeJump(pos, asmap);
if (jump == INVALID) break; // Jump offset straddles EOF
- if (bits == 0) break; // No input bits left
+ if (ip_bit == ip_bits_end) break; // No input bits left
if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
- if (GetBitBE(ip, bits)) { // Check next IP bit (big-endian)
+ if (ConsumeBitBE(ip_bit, ip)) { // Check next IP bit (big-endian)
pos += jump; // Bit = 1: skip to right subtree
}
// Bit = 0: fall through to left subtree
- bits--;
} else if (opcode == Instruction::MATCH) {
// Compare multiple IP bits against a pattern
// The match value encodes both length and pattern:
@@ -211,12 +213,11 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
match = DecodeMatch(pos, asmap);
if (match == INVALID) break; // Match bits straddle EOF
matchlen = std::bit_width(match) - 1; // An n-bit value matches n-1 input bits
- if (bits < matchlen) break; // Not enough input bits
+ if ((ip_bits_end - ip_bit) < matchlen) break; // Not enough input bits
for (uint32_t bit = 0; bit < matchlen; bit++) {
- if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
+ if (ConsumeBitBE(ip_bit, ip) != ((match >> (matchlen - 1 - bit)) & 1)) {
return default_asn; // Pattern mismatch - use default
}
- bits--;
}
// Pattern matched - continue execution
} else if (opcode == Instruction::DEFAULT) {
@@ -260,8 +261,7 @@ bool SanityCheckAsmap(const std::span<const std::byte> asmap, int bits)
// Nothing to execute anymore
if (endpos - pos > 7) return false; // Excessive padding
while (pos != endpos) {
- if (GetBitLE(asmap, pos)) return false; // Nonzero padding bit
- ++pos;
+ if (ConsumeBitLE(pos, asmap)) return false; // Nonzero padding bit
}
return true; // Sanely reached EOF
} else {
This is a second slice carved out of #28792. It contains the following changes, which are crucial for the embedding of asmap data that is added in the following PR in the series (probably this will remain in #28792).
The changes are:
- std::byte instead of bits
- a span rather than a vector in the asmap internals, to prevent holding the asmap data in memory twice (usage sketched below)

The first three commits were already part of #28792, the others are new.
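For context, a usage sketch based on the factory functions visible in the review hunks above (NetGroupManager::NoAsmap() and NetGroupManager::WithEmbeddedAsmap()); the header name and the parameter type of the embedded data are assumptions rather than taken from the PR:

#include <netgroup.h> // NetGroupManager; header name assumed
#include <cstddef>
#include <span>

void ExampleUsage(std::span<const std::byte> embedded_asmap) // parameter type assumed
{
    // Without an asmap, netgroup bucketing falls back to /16 prefixes.
    auto plain{NetGroupManager::NoAsmap()};

    // With an asmap, the manager keeps only a view over the (embedded) data,
    // which is the "span rather than a vector" point above: the mapping is
    // not copied into a second in-memory buffer.
    auto mapped{NetGroupManager::WithEmbeddedAsmap(embedded_asmap)};

    (void)plain;
    (void)mapped;
}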
The documentation commit came out of feedback gathered at the latest CoreDev. The primary input for the documentation was the documentation that already existed in the Python implementation (contrib/asmap/asmap.py), but there are several other comments as well. Please note: I have also asked several LLMs to provide suggestions on how to explain pieces of the implementation and better demonstrate how the parts work together. I have copied bits and pieces that I liked, but everything has been edited further by me and obviously all mistakes here are my own.