
Conversation

@fjahr
Contributor

@fjahr fjahr commented Nov 15, 2025

This is a second slice carved out of #28792. It contains the following changes that are crucial for the embedding of asmap data, which is added in the following PR in the series (probably this will remain in #28792).

The changes are:

  • Modernizes and simplifies the asmap code by operating on std::byte instead of bits
  • Unifies asmap version calculation and naming (previously it was called version and checksum interchangeably)
  • Operates on a span rather than a vector in the asmap internals to prevent holding the asmap data in memory twice
  • Adds more extensive documentation to the asmap implementation
  • Unifies asmap casing in implementation function names

The first three commits were already part of #28792, the others are new.

The documentation commit came out of feedback gathered at the latest CoreDev. The primary input for the documentation was the documentation that already existed in the Python implementation (contrib/asmap/asmap.py), but it draws on several other comments as well. Please note: I have also asked several LLMs to provide suggestions on how to explain pieces of the implementation and better demonstrate how the parts work together. I have copied bits and pieces that I liked, but everything has been edited further by me and obviously all mistakes here are my own.

@DrahtBot
Contributor

DrahtBot commented Nov 15, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33878.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #33920 (Export embedded ASMap RPC by fjahr)
  • #33631 (init: Split file path handling out of -asmap option by fjahr)
  • #31260 (scripted-diff: Type-safe settings retrieval by ryanofsky)
  • #28792 (build: Embedded ASMap [3/3]: Build binary dump header file by fjahr)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@fjahr
Contributor Author

fjahr commented Nov 15, 2025

@sipa would love to get your eyes on the documentation commit before anyone else gets confused by the mistakes I might have made :)

#include <vector>

/*
* ASMap (Autonomous System Map) Implementation
Member


All the documentation changes in this commit look correct to me.

One thing that may be worth mentioning early on is that it's a bit-packed format, i.e., the entire compressed mapping is treated as a sequence of bits, packed together at 8 per byte, which are concatenated encodings of instructions.

Contributor Author


Thank you for the quick feedback! Added a small paragraph on the bit-packed format in the introduction comment at the top.

@fjahr
Contributor Author

fjahr commented Nov 16, 2025

All the feedback from #28792 that applied to the commits here has been addressed here now.

@fjahr fjahr force-pushed the 2025-11-asmap-slice-2 branch 2 times, most recently from 7a2bf8e to eaf5e12 on November 16, 2025 at 22:54
@laanwj
Member

laanwj commented Nov 17, 2025

The first three PRs were already part of #28792, the others are new.

i'm sure you mean the first three commits? 😄

Contributor

@hodlinator hodlinator left a comment


Reviewed eaf5e12

Thanks for peeling off this PR as I suggested!

@fjahr
Contributor Author

fjahr commented Nov 19, 2025

The first three PRs were already part of #28792, the others are new.

i'm sure you mean the first three commits? 😄

Yepp, fixed :)

@fjahr fjahr force-pushed the 2025-11-asmap-slice-2 branch 2 times, most recently from 7de7c21 to 3011c84 on November 19, 2025 at 15:20
@fjahr
Contributor Author

fjahr commented Nov 19, 2025

Rebased since #33026 was merged 🥳 and addressed all comments. Thanks for the reviews!

Calculate the asmap version only in one place: A dedicated function in util/asmap.

The version was also referred to as asmap checksum in several places. To avoid confusion call it asmap version everywhere.
This prevents holding the asmap data in memory twice.

The version hash changes due to spans being serialized without their size-prefix (unlike vectors).
Also makes minor improvements to the Python implementation documentation.
Contributor

@hodlinator hodlinator left a comment


Reviewed b41f5a2

Sorry for not finding these earlier, getting closer to A-C-K.

src/util/asmap.h Outdated
/** Read asmap from provided binary file */
std::vector<std::byte> DecodeAsmap(fs::path path);
/** Calculate the asmap version, a checksum identifying the asmap being used. */
uint256 AsmapVersion(const std::vector<std::byte>& data);
Contributor


meganit in d278d61 "refactor: Unify asmap version calculation and naming":
As uint256 AsmapVersion(const std::vector<std::byte>& data) is new in this PR, it could be introduced as uint256 AsmapVersion(std::span<const std::byte> data) from the beginning to avoid line churn.

Comment on lines +22 to +23
/** Check asmap from embedded data */
bool CheckAsmap(std::span<const std::byte> data);
Contributor


in 6b9ce8c on CheckAsmap:

  1. Justification for existence: I dislike default parameter values, so appreciate a distinct function from SanityCheckASMap.
  2. Comment: Nothing makes it specialized for only embedded data?
  3. Location: Maybe it could be added directly after SanityCheckASMap in both .h + .cpp since they are so closely related? (Could place it before but that would put it between Interpret and SanityCheckASMap which are nice to have after each other as they are also closely related).
  4. Naming: Could make it more descriptive - something like CheckStandardAsmap or CheckIPV6Asmap?
  5. Usage: Could be used in more places where we have SanityCheckAsmap(asmap, 128)
    ₿ git grep CheckAsmap
    src/test/fuzz/asmap.cpp:    if (!SanityCheckAsmap(asmap, 128)) return;
    src/test/fuzz/asmap_direct.cpp:    if (SanityCheckAsmap(asmap, ip_len)) {
    src/test/fuzz/asmap_direct.cpp:            assert(!SanityCheckAsmap(prefix, ip_len));
    src/test/fuzz/util/net.h:    if (!SanityCheckAsmap(std::span<std::byte>(asmap), 128)) {
    src/util/asmap.cpp:    // - should have been caught by SanityCheckAsmap below
    src/util/asmap.cpp:bool SanityCheckAsmap(const std::span<const std::byte> asmap, int bits)
    src/util/asmap.cpp:bool CheckAsmap(const std::span<const std::byte> data)
    src/util/asmap.cpp:    if (!SanityCheckAsmap(data, 128)) {
    src/util/asmap.cpp:    if (!CheckAsmap(buffer)) {
    src/util/asmap.h:bool SanityCheckAsmap(std::span<const std::byte> asmap, int bits);
    src/util/asmap.h:bool CheckAsmap(std::span<const std::byte> data);

using util::ToString;

static NetGroupManager EMPTY_NETGROUPMAN{{}};
static NetGroupManager EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
Contributor


nit in 6b9ce8c:

Could avoid repeating type name here and in other files.
diff --git a/src/bench/addrman.cpp b/src/bench/addrman.cpp
index 1e93b9c395..2f474c2411 100644
--- a/src/bench/addrman.cpp
+++ b/src/bench/addrman.cpp
@@ -24,7 +24,7 @@
 static constexpr size_t NUM_SOURCES = 64;
 static constexpr size_t NUM_ADDRESSES_PER_SOURCE = 256;
 
-static NetGroupManager EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
+static auto EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
 static constexpr uint32_t ADDRMAN_CONSISTENCY_CHECK_RATIO{0};
 
 static std::vector<CAddress> g_sources;
diff --git a/src/test/addrman_tests.cpp b/src/test/addrman_tests.cpp
index c9a0ebebf0..f1e54416cd 100644
--- a/src/test/addrman_tests.cpp
+++ b/src/test/addrman_tests.cpp
@@ -24,7 +24,7 @@ using namespace std::literals;
 using node::NodeContext;
 using util::ToString;
 
-static NetGroupManager EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
+static auto EMPTY_NETGROUPMAN{NetGroupManager::NoAsmap()};
 static const bool DETERMINISTIC{true};
 
 static int32_t GetCheckRatio(const NodeContext& node_ctx)
@@ -584,7 +584,7 @@ BOOST_AUTO_TEST_CASE(caddrinfo_get_new_bucket_legacy)
 // 101.8.0.0/16 AS8
 BOOST_AUTO_TEST_CASE(caddrinfo_get_tried_bucket)
 {
-    NetGroupManager ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
+    auto ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
 
     CAddress addr1 = CAddress(ResolveService("250.1.1.1", 8333), NODE_NONE);
     CAddress addr2 = CAddress(ResolveService("250.1.1.1", 9999), NODE_NONE);
@@ -637,7 +637,7 @@ BOOST_AUTO_TEST_CASE(caddrinfo_get_tried_bucket)
 
 BOOST_AUTO_TEST_CASE(caddrinfo_get_new_bucket)
 {
-    NetGroupManager ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
+    auto ngm_asmap{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
 
     CAddress addr1 = CAddress(ResolveService("250.1.2.1", 8333), NODE_NONE);
     CAddress addr2 = CAddress(ResolveService("250.1.2.1", 9999), NODE_NONE);
@@ -714,7 +714,7 @@ BOOST_AUTO_TEST_CASE(caddrinfo_get_new_bucket)
 
 BOOST_AUTO_TEST_CASE(addrman_serialization)
 {
-    NetGroupManager netgroupman{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
+    auto netgroupman{NetGroupManager::WithEmbeddedAsmap(test::data::asmap)};
 
     const auto ratio = GetCheckRatio(m_node);
     auto addrman_asmap1 = std::make_unique<AddrMan>(netgroupman, DETERMINISTIC, ratio);
diff --git a/src/test/fuzz/asmap.cpp b/src/test/fuzz/asmap.cpp
index fec0f3f50a..6ed5aa7cd9 100644
--- a/src/test/fuzz/asmap.cpp
+++ b/src/test/fuzz/asmap.cpp
@@ -42,6 +42,6 @@ FUZZ_TARGET(asmap)
         memcpy(&ipv4, addr_data, addr_size);
         net_addr.SetIP(CNetAddr{ipv4});
     }
-    NetGroupManager netgroupman{NetGroupManager::WithEmbeddedAsmap(asmap)};
+    auto netgroupman{NetGroupManager::WithEmbeddedAsmap(asmap)};
     (void)netgroupman.GetMappedAS(net_addr);
 }
diff --git a/src/test/fuzz/p2p_handshake.cpp b/src/test/fuzz/p2p_handshake.cpp
index 75e14865a4..a56e1e5f1f 100644
--- a/src/test/fuzz/p2p_handshake.cpp
+++ b/src/test/fuzz/p2p_handshake.cpp
@@ -48,7 +48,7 @@ FUZZ_TARGET(p2p_handshake, .init = ::initialize)
     chainman.ResetIbd();
 
     node::Warnings warnings{};
-    NetGroupManager netgroupman{NetGroupManager::NoAsmap()};
+    auto netgroupman{NetGroupManager::NoAsmap()};
     AddrMan addrman{netgroupman, /*deterministic=*/true, /*consistency_check_ratio=*/0};
     auto peerman = PeerManager::make(connman, addrman,
                                      /*banman=*/nullptr, chainman,
diff --git a/src/test/fuzz/process_message.cpp b/src/test/fuzz/process_message.cpp
index 5ac796ae2c..30c121d7a2 100644
--- a/src/test/fuzz/process_message.cpp
+++ b/src/test/fuzz/process_message.cpp
@@ -77,7 +77,7 @@ FUZZ_TARGET(process_message, .init = initialize_process_message)
     chainman.DisableNextWrite();
 
     node::Warnings warnings{};
-    NetGroupManager netgroupman{NetGroupManager::NoAsmap()};
+    auto netgroupman{NetGroupManager::NoAsmap()};
     AddrMan addrman{netgroupman, /*deterministic=*/true, /*consistency_check_ratio=*/0};
     auto peerman = PeerManager::make(connman, addrman,
                                      /*banman=*/nullptr, chainman,
diff --git a/src/test/netbase_tests.cpp b/src/test/netbase_tests.cpp
index 5086f21ff4..4d355142e2 100644
--- a/src/test/netbase_tests.cpp
+++ b/src/test/netbase_tests.cpp
@@ -325,7 +325,7 @@ BOOST_AUTO_TEST_CASE(subnet_test)
 
 BOOST_AUTO_TEST_CASE(netbase_getgroup)
 {
-    NetGroupManager netgroupman{NetGroupManager::NoAsmap()}; // use /16
+    auto netgroupman{NetGroupManager::NoAsmap()}; // use /16
     BOOST_CHECK(netgroupman.GetGroup(ResolveIP("127.0.0.1")) == std::vector<unsigned char>({0})); // Local -> !Routable()
     BOOST_CHECK(netgroupman.GetGroup(ResolveIP("257.0.0.1")) == std::vector<unsigned char>({0})); // !Valid -> !Routable()
     BOOST_CHECK(netgroupman.GetGroup(ResolveIP("10.0.0.1")) == std::vector<unsigned char>({0})); // RFC1918 -> !Routable()

*/
inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
{
return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
Contributor


Here's a reduced version:

Suggested change
return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> ((7 + bitpos) % 8)) & 1;

I'm surprised by the way the shift amount changes for different bit positions. Would have expected something closer to this for big endian...

Suggested change
return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - (bitpos % 8))) & 1;

....but it fails the unit tests.

Godbolt experiment https://godbolt.org/z/6rjEzxhPM outputs:

PR bit shift expression (adfd8006...): (7 - ((bytes.size() * 8 - bitpos) % 8))
PR bit shift reduced: (7 + bitpos) % 8
Expected BE bit shift: 7 - (bitpos % 8)

bitpos:  0, PR: 7, reduced: 7, expected: 7
bitpos:  1, PR: 0, reduced: 0, expected: 6
bitpos:  2, PR: 1, reduced: 1, expected: 5
bitpos:  3, PR: 2, reduced: 2, expected: 4
bitpos:  4, PR: 3, reduced: 3, expected: 3
bitpos:  5, PR: 4, reduced: 4, expected: 2
bitpos:  6, PR: 5, reduced: 5, expected: 1
bitpos:  7, PR: 6, reduced: 6, expected: 0
bitpos:  8, PR: 7, reduced: 7, expected: 7
bitpos:  9, PR: 0, reduced: 0, expected: 6
bitpos: 10, PR: 1, reduced: 1, expected: 5
bitpos: 11, PR: 2, reduced: 2, expected: 4
bitpos: 12, PR: 3, reduced: 3, expected: 3
bitpos: 13, PR: 4, reduced: 4, expected: 2
bitpos: 14, PR: 5, reduced: 5, expected: 1
bitpos: 15, PR: 6, reduced: 6, expected: 0

However, when looking at the usage in Interpret(), we see if we would encounter JUMP as the first instruction, our bitpos is sort of out of bounds by one.

uint8_t bits = ip.size() * 8;

if (GetBitBE(ip, bits)) { // Check next IP bit (big-endian)

If we instead at the callsite make sure to send in in-bounds values - GetBitBE(ip, bits - 1), it turns out we can simplify GetBitBE quite a bit.

--- a/src/util/asmap.cpp
+++ b/src/util/asmap.cpp
@@ -48,8 +48,8 @@ namespace {
 constexpr uint32_t INVALID = 0xFFFFFFFF;
 
 /**
- * Extract a single bit from byte array using little-endian bit ordering.
- * Used for ASMap data where bits are numbered right-to-left within each byte (LSB first).
+ * Extract a single bit (LSB first) from byte array using little-endian byte ordering.
+ * Used for ASMap data.
  */
 inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
 {
@@ -57,12 +57,12 @@ inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
 }
 
 /**
- * Extract a single bit from byte array using big-endian bit ordering.
+ * Extract a single bit (LSB first) from byte array using big-endian byte ordering.
  * Used for IP addresses to match network byte order conventions.
  */
 inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
 {
-    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
+    return (std::to_integer<uint8_t>(bytes[bytes.size() - ((bitpos + 8) / 8)]) >> (bitpos % 8)) & 1;
 }
 
 /**
@@ -198,7 +198,7 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
             if (jump == INVALID) break; // Jump offset straddles EOF
             if (bits == 0) break; // No input bits left
             if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
-            if (GetBitBE(ip, bits)) {  // Check next IP bit (big-endian)
+            if (GetBitBE(ip, bits - 1)) {  // Check next IP bit (big-endian)
                 pos += jump;  // Bit = 1: skip to right subtree
             }
             // Bit = 0: fall through to left subtree
@@ -213,7 +213,7 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
             matchlen = std::bit_width(match) - 1;  // An n-bit value matches n-1 input bits
             if (bits < matchlen) break; // Not enough input bits
             for (uint32_t bit = 0; bit < matchlen; bit++) {
-                if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
+                if (GetBitBE(ip, bits - 1) != ((match >> (matchlen - 1 - bit)) & 1)) {
                     return default_asn;  // Pattern mismatch - use default
                 }
                 bits--;

The similarity we get between the shifting in this new GetBitBE and GetBitLE shows that we have the same bit ordering, just different byte ordering.

inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
{
return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;

That's why I also updated the comments in the diff above.

const size_t endpos{asmap.size() * 8};
uint8_t bits = ip.size() * 8;
uint32_t default_asn = 0;
uint32_t jump, match, matchlen;
Contributor


nit: Here are some changes that bring the Interpret() implementation even closer to SanityCheckAsmap() while also decreasing variable scope, making the code more readable. Maybe something to include in the last commit which aligns the casing of SanityCheckAsmap?

--- a/src/util/asmap.cpp
+++ b/src/util/asmap.cpp
@@ -183,18 +183,16 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
     const size_t endpos{asmap.size() * 8};
     uint8_t bits = ip.size() * 8;
     uint32_t default_asn = 0;
-    uint32_t jump, match, matchlen;
-    Instruction opcode;
     while (pos < endpos) {
-        opcode = DecodeType(pos, asmap);
+        Instruction opcode = DecodeType(pos, asmap);
         if (opcode == Instruction::RETURN) {
             // Found leaf node - return the ASN
-            default_asn = DecodeASN(pos, asmap);
-            if (default_asn == INVALID) break; // ASN straddles EOF
-            return default_asn;
+            uint32_t asn = DecodeASN(pos, asmap);
+            if (asn == INVALID) break; // ASN straddles EOF
+            return asn;
         } else if (opcode == Instruction::JUMP) {
             // Binary branch: if IP bit is 1, jump forward; else continue
-            jump = DecodeJump(pos, asmap);
+            uint32_t jump = DecodeJump(pos, asmap);
             if (jump == INVALID) break; // Jump offset straddles EOF
             if (bits == 0) break; // No input bits left
             if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
@@ -208,11 +206,11 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
             // The match value encodes both length and pattern:
             // - highest set bit position determines length (bit_width - 1)
             // - lower bits contain the pattern to compare
-            match = DecodeMatch(pos, asmap);
+            uint32_t match = DecodeMatch(pos, asmap);
             if (match == INVALID) break; // Match bits straddle EOF
-            matchlen = std::bit_width(match) - 1;  // An n-bit value matches n-1 input bits
+            int matchlen = std::bit_width(match) - 1;  // An n-bit value matches n-1 input bits
             if (bits < matchlen) break; // Not enough input bits
-            for (uint32_t bit = 0; bit < matchlen; bit++) {
+            for (int bit = 0; bit < matchlen; bit++) {
                 if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
                     return default_asn;  // Pattern mismatch - use default
                 }

Comment on lines +54 to +63
inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
{
return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
}

/**
* Extract a single bit from byte array using big-endian bit ordering.
* Used for IP addresses to match network byte order conventions.
*/
inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
Contributor


nit: In other cases we have

  • DecodeBits(size_t& bitpos, const std::span<const std::byte>& data... and
  • DecodeType(size_t& bitpos, const std::span<const std::byte>& data)
  • ...

So it might be nice to have the same argument order here:

Suggested change
inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
{
return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
}
/**
* Extract a single bit from byte array using big-endian bit ordering.
* Used for IP addresses to match network byte order conventions.
*/
inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
inline bool GetBitLE(uint32_t bitpos, std::span<const std::byte> bytes) noexcept
{
return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
}
/**
* Extract a single bit from byte array using big-endian bit ordering.
* Used for IP addresses to match network byte order conventions.
*/
inline bool GetBitBE(uint32_t bitpos, std::span<const std::byte> bytes) noexcept

Taking it one step further, using the observation that we only inspect a bit once, we could perform incrementing and decrementing within these functions as well. This also is influenced by the other comment investigating GetBitBE.

 constexpr uint32_t INVALID = 0xFFFFFFFF;
 
 /**
- * Extract a single bit from byte array using little-endian bit ordering.
- * Used for ASMap data where bits are numbered right-to-left within each byte (LSB first).
+ * Extract a single bit (LSB first) from byte array using little-endian byte ordering.
+ * Used for ASMap data.
  */
-inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
+inline bool DecodeBitLE(size_t& bitpos, std::span<const std::byte> bytes) noexcept
 {
-    return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
+    const bool bit = (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
+    ++bitpos;
+    return bit;
 }
 
 /**
- * Extract a single bit from byte array using big-endian bit ordering.
+ * Extract a single bit (LSB first) from byte array using big-endian byte ordering.
  * Used for IP addresses to match network byte order conventions.
  */
-inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
+inline bool DecodeBitBE(uint8_t& bitpos, std::span<const std::byte> bytes) noexcept
 {
-    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
+    --bitpos;
+    return (std::to_integer<uint8_t>(bytes[bytes.size() - ((bitpos + 8) / 8)]) >> (bitpos % 8)) & 1;
 }
 
 /**
@@ -88,8 +91,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
         // Read continuation bit to determine if we're in this class
         if (bit_sizes_it + 1 != bit_sizes.end()) {  // Unless we're in the last class
             if (bitpos >= data.size() * 8) break;
-            bit = GetBitLE(data, bitpos);
-            bitpos++;
+            bit = DecodeBitLE(bitpos, data);
         } else {
             bit = 0;  // Last class has no continuation bit
         }
@@ -101,8 +103,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
             // Decode the position within this class in big endian
             for (int b = 0; b < *bit_sizes_it; b++) {
                 if (bitpos >= data.size() * 8) return INVALID; // Reached EOF in mantissa
-                bit = GetBitLE(data, bitpos);
-                bitpos++;
+                bit = DecodeBitLE(bitpos, data);
                 val += bit << (*bit_sizes_it - 1 - b);  // Big-endian within the class
             }
             return val;
@@ -198,11 +199,10 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
             if (jump == INVALID) break; // Jump offset straddles EOF
             if (bits == 0) break; // No input bits left
             if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
-            if (GetBitBE(ip, bits)) {  // Check next IP bit (big-endian)
+            if (DecodeBitBE(bits, ip)) {  // Check next IP bit (big-endian)
                 pos += jump;  // Bit = 1: skip to right subtree
             }
             // Bit = 0: fall through to left subtree
-            bits--;
         } else if (opcode == Instruction::MATCH) {
             // Compare multiple IP bits against a pattern
             // The match value encodes both length and pattern:
@@ -213,10 +213,9 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
             matchlen = std::bit_width(match) - 1;  // An n-bit value matches n-1 input bits
             if (bits < matchlen) break; // Not enough input bits
             for (uint32_t bit = 0; bit < matchlen; bit++) {
-                if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
+                if (DecodeBitBE(bits, ip) != ((match >> (matchlen - 1 - bit)) & 1)) {
                     return default_asn;  // Pattern mismatch - use default
                 }
-                bits--;
             }
             // Pattern matched - continue execution
         } else if (opcode == Instruction::DEFAULT) {
@@ -260,8 +259,7 @@ bool SanityCheckAsmap(const std::span<const std::byte> asmap, int bits)
                 // Nothing to execute anymore
                 if (endpos - pos > 7) return false; // Excessive padding
                 while (pos != endpos) {
-                    if (GetBitLE(asmap, pos)) return false; // Nonzero padding bit
-                    ++pos;
+                    if (DecodeBitLE(pos, asmap)) return false; // Nonzero padding bit
                 }
                 return true; // Sanely reached EOF
             } else {

Contributor


(Maybe ConsumeBit(LE|BE) would be more appropriate than DecodeBit(LE|BE)).

Contributor


Couldn't stop thinking about this. It felt weird to implement reverse byte order in GetBitBE but then also start bits from the end of ip data and count backwards. Found a way that I think simplifies things further, making the order we consume bytes between LE&BE implementations the same, but having different bit-order. Diff against b41f5a2:

--- a/src/util/asmap.cpp
+++ b/src/util/asmap.cpp
@@ -48,21 +48,25 @@ namespace {
 constexpr uint32_t INVALID = 0xFFFFFFFF;
 
 /**
- * Extract a single bit from byte array using little-endian bit ordering.
- * Used for ASMap data where bits are numbered right-to-left within each byte (LSB first).
+ * Extract a single bit from byte array using little-endian bit ordering (LSB first).
+ * Used for ASMap data.
  */
-inline bool GetBitLE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
+inline bool ConsumeBitLE(size_t& bitpos, std::span<const std::byte> bytes) noexcept
 {
-    return (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
+    const bool bit = (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (bitpos % 8)) & 1;
+    ++bitpos;
+    return bit;
 }
 
 /**
- * Extract a single bit from byte array using big-endian bit ordering.
+ * Extract a single bit from byte array using big-endian bit ordering (MSB first).
  * Used for IP addresses to match network byte order conventions.
  */
-inline bool GetBitBE(std::span<const std::byte> bytes, uint32_t bitpos) noexcept
+inline bool ConsumeBitBE(uint8_t& bitpos, std::span<const std::byte> bytes) noexcept
 {
-    return (std::to_integer<uint8_t>(bytes[(bytes.size() * 8 - bitpos) / 8]) >> (7 - ((bytes.size() * 8 - bitpos) % 8))) & 1;
+    const bool bit = (std::to_integer<uint8_t>(bytes[bitpos / 8]) >> (7 - (bitpos % 8))) & 1;
+    ++bitpos;
+    return bit;
 }
 
 /**
@@ -88,8 +92,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
         // Read continuation bit to determine if we're in this class
         if (bit_sizes_it + 1 != bit_sizes.end()) {  // Unless we're in the last class
             if (bitpos >= data.size() * 8) break;
-            bit = GetBitLE(data, bitpos);
-            bitpos++;
+            bit = ConsumeBitLE(bitpos, data);
         } else {
             bit = 0;  // Last class has no continuation bit
         }
@@ -101,8 +104,7 @@ uint32_t DecodeBits(size_t& bitpos, const std::span<const std::byte>& data, uint
             // Decode the position within this class in big endian
             for (int b = 0; b < *bit_sizes_it; b++) {
                 if (bitpos >= data.size() * 8) return INVALID; // Reached EOF in mantissa
-                bit = GetBitLE(data, bitpos);
-                bitpos++;
+                bit = ConsumeBitLE(bitpos, data);
                 val += bit << (*bit_sizes_it - 1 - b);  // Big-endian within the class
             }
             return val;
@@ -181,7 +183,8 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
 {
     size_t pos{0};
     const size_t endpos{asmap.size() * 8};
-    uint8_t bits = ip.size() * 8;
+    uint8_t ip_bit{0};
+    const uint8_t ip_bits_end = ip.size() * 8;
     uint32_t default_asn = 0;
     uint32_t jump, match, matchlen;
     Instruction opcode;
@@ -196,13 +199,12 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
             // Binary branch: if IP bit is 1, jump forward; else continue
             jump = DecodeJump(pos, asmap);
             if (jump == INVALID) break; // Jump offset straddles EOF
-            if (bits == 0) break; // No input bits left
+            if (ip_bit == ip_bits_end) break; // No input bits left
             if (int64_t{jump} >= static_cast<int64_t>(endpos - pos)) break; // Jumping past EOF
-            if (GetBitBE(ip, bits)) {  // Check next IP bit (big-endian)
+            if (ConsumeBitBE(ip_bit, ip)) {  // Check next IP bit (big-endian)
                 pos += jump;  // Bit = 1: skip to right subtree
             }
             // Bit = 0: fall through to left subtree
-            bits--;
         } else if (opcode == Instruction::MATCH) {
             // Compare multiple IP bits against a pattern
             // The match value encodes both length and pattern:
@@ -211,12 +213,11 @@ uint32_t Interpret(const std::span<const std::byte> asmap, const std::span<const
             match = DecodeMatch(pos, asmap);
             if (match == INVALID) break; // Match bits straddle EOF
             matchlen = std::bit_width(match) - 1;  // An n-bit value matches n-1 input bits
-            if (bits < matchlen) break; // Not enough input bits
+            if ((ip_bits_end - ip_bit) < matchlen) break; // Not enough input bits
             for (uint32_t bit = 0; bit < matchlen; bit++) {
-                if (GetBitBE(ip, bits) != ((match >> (matchlen - 1 - bit)) & 1)) {
+                if (ConsumeBitBE(ip_bit, ip) != ((match >> (matchlen - 1 - bit)) & 1)) {
                     return default_asn;  // Pattern mismatch - use default
                 }
-                bits--;
             }
             // Pattern matched - continue execution
         } else if (opcode == Instruction::DEFAULT) {
@@ -260,8 +261,7 @@ bool SanityCheckAsmap(const std::span<const std::byte> asmap, int bits)
                 // Nothing to execute anymore
                 if (endpos - pos > 7) return false; // Excessive padding
                 while (pos != endpos) {
-                    if (GetBitLE(asmap, pos)) return false; // Nonzero padding bit
-                    ++pos;
+                    if (ConsumeBitLE(pos, asmap)) return false; // Nonzero padding bit
                 }
                 return true; // Sanely reached EOF
             } else {

