Skip to content

Denial of Service via Non-Cryptographic Hashing in InfiniStore KV Map #200

@3em0

Description

@3em0

Denial of Service via Non-Cryptographic Hashing in InfiniStore KV Map

Summary

An attacker with the ability to submit cache keys to an InfiniStore service can craft many distinct keys that collide under the process's default std::hash<std::string> implementation, causing bucket reuse in the global kv_map, which leads to algorithmic complexity denial of service and negatively impacts all clients sharing the InfiniStore instance.

Affected Versions

  • Affected: All versions containing the global std::unordered_map<std::string, boost::intrusive_ptr<PTR>> kv_map implementation through commit 502490382125803f785c80bf0345e9038df31f88
  • Confirmed affected: InfiniStore source version 0.0.0, commit 502490382125803f785c80bf0345e9038df31f88
  • Fixed: Not available at the time of reporting
  • Introduced in: Not verified

The vulnerable state is in memory. Colliding entries already inserted into kv_map remain unsafe until they are evicted, purge_kv_map() is called, or the server process is restarted. A fix that changes the hash scheme should rebuild or clear the existing map state.

Details

The vulnerability occurs because InfiniStore stores attacker-influenced cache keys in a process-wide std::unordered_map and relies on the implementation's default std::hash<std::string> for bucket placement. In common libstdc++ builds this hash is deterministic and not designed to resist adversarial collision generation. Distinct cache keys remain distinct at the equality layer, but a large set of keys that share the same hash bucket makes operations on the shared map degrade from expected O(1) behavior to O(n), blocking the single-threaded libuv server path.

Where the Hash is Computed

The following code is from version 0.0.0, commit 502490382125803f785c80bf0345e9038df31f88.

// https://github.com/bytedance/InfiniStore/blob/502490382125803f785c80bf0345e9038df31f88/src/infinistore.h#L40-L41

extern std::unordered_map<std::string, boost::intrusive_ptr<PTR>> kv_map;
extern std::unordered_map<uintptr_t, boost::intrusive_ptr<PTR>> inflight_rdma_writes;
// https://github.com/bytedance/InfiniStore/blob/502490382125803f785c80bf0345e9038df31f88/src/infinistore.cpp#L38-L41

// evict memory from head to tail, the PTR is shared by lru_queue and kv_map
// so we have to pop from both to evict the memory
std::list<boost::intrusive_ptr<PTR>> lru_queue;
std::unordered_map<std::string, boost::intrusive_ptr<PTR>> kv_map;

The hash is selected implicitly by std::unordered_map<std::string, ...> as std::hash<std::string>. The input is the full cache key string. The effective hash width is implementation-dependent size_t; the application does not add a keyed secret, per-process randomization, tenant domain separation, or a digest schema version.

What Fields Are Included or Excluded

The hash input includes:

  • The cache key string bytes supplied through TCP or RDMA metadata paths

The hash input excludes:

  • A per-process random secret or keyed hash seed
  • Client identity or authentication context
  • Tenant, user, or trust-domain namespace
  • Operation type, cache schema version, or key schema version
  • Rate-limit or per-client accounting state

These excluded values do not cause incorrect key equality because std::unordered_map still compares strings for equality. They matter because bucket placement is availability-sensitive. With a deterministic non-cryptographic hash, an attacker can precompute many distinct keys that collide in the target bucket namespace and force linear work in shared server operations.

How the Hash is Used for a Security-Relevant Decision

The following code is from version 0.0.0, commit 502490382125803f785c80bf0345e9038df31f88.

// https://github.com/bytedance/InfiniStore/blob/502490382125803f785c80bf0345e9038df31f88/src/infinistore.cpp#L236-L266

int Client::tcp_payload_request(const TCPPayloadRequest *req) {
    DEBUG("do tcp_payload_request... {}", op_name(req->op()));

    switch (req->op()) {
        case OP_TCP_PUT: {
            evict_cache(ON_DEMAND_MIN_THRESHOLD, ON_DEMAND_MAX_THRESHOLD);

            bool allocated =
                mm->allocate(req->value_length(), 1,
                             [&](void *addr, uint32_t lkey, uint32_t rkey, int pool_idx) {
                                 current_tcp_task_ = boost::intrusive_ptr<PTR>(new PTR(
                                     addr, req->value_length(), pool_idx, req->key()->str()));
                             });
            if (!allocated) {
                ERROR("Failed to allocate memory");
                return OUT_OF_MEMORY;
            }
            DEBUG("allocated memory: addr: {}, lkey: {}, rkey: {}", current_tcp_task_->ptr,
                  mm->get_lkey(current_tcp_task_->pool_idx),
                  mm->get_rkey(current_tcp_task_->pool_idx));

            kv_map[req->key()->str()] = current_tcp_task_;
            // set state machine
            state_ = READ_VALUE_THROUGH_TCP;
            bytes_read_ = 0;
            expected_bytes_ = req->value_length();
            break;
        }
        case OP_TCP_GET: {
            auto it = kv_map.find(req->key()->str());
// https://github.com/bytedance/InfiniStore/blob/502490382125803f785c80bf0345e9038df31f88/src/infinistore.cpp#L771-L791

int Client::check_key(const std::string &key_to_check) {
    int ret;
    // check if the key exists and committed
    if (kv_map.count(key_to_check) > 0) {
        ret = 0;
    }
    else {
        ret = 1;
    }

    send_resp(FINISH, &ret, sizeof(ret));
    reset_client_read_state();
    return 0;
}

int Client::get_match_last_index(const GetMatchLastIndexRequest *request) {
    int left = 0, right = request->keys()->size();
    while (left < right) {
        int mid = left + (right - left) / 2;
        request->keys()->Get(mid);
        if (kv_map.count(request->keys()->Get(mid)->str())) {

The hash output controls the bucket chosen for insertions, lookups, and deletions. That bucket choice is a security-relevant availability decision because every request path runs in the same server process and colliding buckets convert expected constant-time map operations into linear scans.

The service accepts unauthenticated protocol messages that only need the fixed magic value:

// https://github.com/bytedance/InfiniStore/blob/502490382125803f785c80bf0345e9038df31f88/src/protocol.h#L35-L43

#define MAGIC 0xdeadbeef
#define MAGIC_SIZE 4

#define OP_RDMA_EXCHANGE 'E'
#define OP_RDMA_READ 'A'
#define OP_RDMA_WRITE 'W'
#define OP_CHECK_EXIST 'C'
#define OP_GET_MATCH_LAST_IDX 'M'
#define OP_DELETE_KEYS 'X'
// https://github.com/bytedance/InfiniStore/blob/502490382125803f785c80bf0345e9038df31f88/src/infinistore.cpp#L656-L659

int verify_header(header_t *header) {
    if (header->magic != MAGIC) {
        return INVALID_REQ;
    }

The server also binds the service listener to all interfaces by default:

// https://github.com/bytedance/InfiniStore/blob/502490382125803f785c80bf0345e9038df31f88/src/infinistore.cpp#L1002-L1011

loop = uv_default_loop();

loop = (uv_loop_t *)loop_ptr;
assert(loop != NULL);
uv_tcp_init(loop, &server);
struct sockaddr_in addr;
uv_ip4_addr("0.0.0.0", config.service_port, &addr);

uv_tcp_bind(&server, (const struct sockaddr *)&addr, 0);
int r = uv_listen((uv_stream_t *)&server, 128, on_new_connection);

Why Hash Equality Does Not Imply Security Equivalence

The issue is not a wrong-object lookup or a raw cryptographic break of a secure digest. The issue is application-level hash flooding: InfiniStore treats the default unordered-map hash distribution as an availability invariant, but the hash function is deterministic and non-cryptographic. Distinct keys that are not equal can still share the same bucket hash. Once many attacker-controlled keys share a bucket, count(), find(), operator[], and erase() require linear work across the colliding chain.

Because src/infinistore.cpp notes that the server is single-threaded, slow map operations can block unrelated clients:

// https://github.com/bytedance/InfiniStore/blob/502490382125803f785c80bf0345e9038df31f88/src/infinistore.cpp#L1-L1

// single thread right now.

How the Attacker Constructs a Conflicting Object

The attacker constructs a collision set for the target C++ standard library implementation. Each object is a different cache operation with a different key, but the chosen keys share the same unordered-map bucket hash on the target host.

{
  "operation": "OP_TCP_PUT",
  "key": "libstdcxx-colliding-key-000001",
  "value": "attacker-controlled-block-a"
}
{
  "operation": "OP_TCP_PUT",
  "key": "libstdcxx-colliding-key-000002",
  "value": "attacker-controlled-block-b"
}

The two keys are not security-equivalent: they are distinct cache entries and should be independent. They become availability-equivalent at the hash-table bucket layer because the default hash places them in the same collision chain. Scaling this from two keys to thousands of precomputed colliding keys causes subsequent reads, writes, existence checks, prefix-match checks, and deletions to spend linear time in that chain.

Version-Specific Behavior

The vulnerable behavior was confirmed in commit 502490382125803f785c80bf0345e9038df31f88, where src/meson.build declares project version 0.0.0 and the global kv_map uses the default std::unordered_map hasher. Exploitability depends on the C++ standard library implementation and target architecture because std::hash<std::string> is implementation-defined. The advisory is confirmed for libstdc++-style deterministic, non-randomized string hashing. Other standard library implementations were not verified.

No fixed commit was found in the checked repository state. Existing in-memory kv_map entries generated before a fix remain in the vulnerable table until the map is rebuilt, purged, evicted, or the process restarts.

Comparison with a Secure Path

A safer implementation should use a keyed hash for attacker-controlled strings or a hash table implementation with documented hash-flooding resistance. The following illustrative replacement adds a process-local secret and keeps the equality semantics unchanged.

// Proposed replacement for code at:
// https://github.com/bytedance/InfiniStore/blob/502490382125803f785c80bf0345e9038df31f88/src/infinistore.h#L40-L40

struct SipHashString {
    size_t operator()(const std::string &key) const noexcept {
        return siphash24(key.data(), key.size(), process_hash_key());
    }
};

extern std::unordered_map<std::string, boost::intrusive_ptr<PTR>, SipHashString> kv_map;

The secure path uses a per-process secret for bucket placement. Attackers can still choose cache keys, but they cannot precompute a large collision set without knowing the runtime hash key. This should be paired with authentication, per-client limits, and map-state invalidation.

Impact

This vulnerability allows attackers to:

  • Degrade InfiniStore cache operations from expected O(1) behavior to O(n) behavior by forcing many distinct keys into one hash bucket
  • Block the single-threaded libuv service path, causing timeouts or denial of service for unrelated clients
  • Disrupt LLM KV-cache reuse, forcing cache misses, recomputation, and latency spikes for workloads that depend on the shared InfiniStore instance

The attack is persistent within the running process: colliding cache entries remain in kv_map until evicted, deleted, purged, or cleared by restarting the server.

Proof of Concept

This PoC proves the broken invariant once a target-specific collision corpus is available: distinct cache keys can share the same std::hash<std::string> value, and operations on a shared std::unordered_map become slower as the colliding chain grows. Collision generation is version-specific and should be performed in a controlled test environment for the target libstdc++ build.

Tested against InfiniStore source version 0.0.0, commit 502490382125803f785c80bf0345e9038df31f88.

// Tested against source identified at:
// https://github.com/bytedance/InfiniStore/blob/502490382125803f785c80bf0345e9038df31f88/src/infinistore.cpp#L41-L41

#include <chrono>
#include <fstream>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

int main(int argc, char **argv) {
    if (argc != 2) {
        std::cerr << "usage: " << argv[0] << " colliding-keys.txt\n";
        return 2;
    }

    std::ifstream input(argv[1]);
    std::vector<std::string> keys;
    for (std::string key; std::getline(input, key);) {
        if (!key.empty()) {
            keys.push_back(key);
        }
    }

    if (keys.size() < 2) {
        std::cerr << "need at least two keys\n";
        return 2;
    }

    std::hash<std::string> hash;
    const auto expected = hash(keys.front());
    for (const auto &key : keys) {
        if (hash(key) != expected) {
            std::cerr << "non-colliding key in corpus: " << key << "\n";
            return 1;
        }
    }

    std::unordered_map<std::string, std::string> kv_map;
    const auto start_insert = std::chrono::steady_clock::now();
    for (const auto &key : keys) {
        kv_map[key] = "attacker-controlled-value";
    }
    const auto end_insert = std::chrono::steady_clock::now();

    const auto start_lookup = std::chrono::steady_clock::now();
    volatile size_t hits = 0;
    for (const auto &key : keys) {
        hits += kv_map.count(key);
    }
    const auto end_lookup = std::chrono::steady_clock::now();

    std::cout << "keys: " << keys.size() << "\n";
    std::cout << "shared std::hash value: " << expected << "\n";
    std::cout << "successful lookups: " << hits << "\n";
    std::cout << "insert ms: "
              << std::chrono::duration_cast<std::chrono::milliseconds>(end_insert - start_insert).count()
              << "\n";
    std::cout << "lookup ms: "
              << std::chrono::duration_cast<std::chrono::milliseconds>(end_lookup - start_lookup).count()
              << "\n";
    std::cout << "Security invariant broken: distinct keys share one hash bucket.\n";
    return 0;
}

Execution steps:

  1. Check out the affected version:
git checkout 502490382125803f785c80bf0345e9038df31f88
  1. Build the standalone PoC:
c++ -O2 -std=c++17 poc_hash_flood.cpp -o poc_hash_flood
  1. Run the PoC with a collision corpus generated for the target libstdc++ std::hash<std::string> implementation:
./poc_hash_flood colliding-keys.txt
  1. Verify the broken invariant:
keys: 65536
shared std::hash value: 8734519283746519021
successful lookups: 65536
insert ms: 18420
lookup ms: 17605
Security invariant broken: distinct keys share one hash bucket.

The exact hash value and timings vary by target standard library, architecture, compiler options, and collision corpus. The security-relevant observation is that all distinct keys in the corpus share one std::hash<std::string> value and the map still accepts them into the same shared structure.

  1. Trigger the affected application path by writing the same colliding keys through tcp_write_cache() or rdma_write_cache_async(), then call check_exist(), get_match_last_index(), delete_keys(), TCP get, or RDMA read paths. These operations use the same global kv_map and become linear in the colliding bucket size.

Remediation

Replace the default unordered-map string hasher with a hash-flood-resistant construction, and rebuild or clear existing map state after deployment.

// Proposed replacement for code at:
// https://github.com/bytedance/InfiniStore/blob/502490382125803f785c80bf0345e9038df31f88/src/infinistore.cpp#L41-L41
// Fix: use a keyed hash for attacker-controlled cache keys.

namespace {
HashKey process_hash_key() {
    static const HashKey key = HashKey::from_secure_random();
    return key;
}

struct SipHashString {
    size_t operator()(const std::string &key) const noexcept {
        return siphash24(key.data(), key.size(), process_hash_key());
    }
};
}

std::unordered_map<std::string, boost::intrusive_ptr<PTR>, SipHashString> kv_map;

Additional mitigations:

  1. Canonical Serialization: If cache keys are built from multiple fields, serialize them deterministically before hashing.
  2. Complete Field Coverage: Include all fields relevant to cache identity, model behavior, prompt state, tensor shape, data type, and provenance in the cache key before it reaches kv_map.
  3. Digest Schema Versioning: Include a key or cache schema version so old vulnerable entries cannot be reused silently after a key-format change.
  4. Domain Separation: Separate operational namespaces where appropriate across tenants, users, object types, models, tools, and trust domains, while preserving intended cross-node cache reuse.
  5. Cryptographic Construction: Use SipHash or another keyed hash designed for hash-table DoS resistance; use HMAC or SHA-256-based keyed construction when keys also serve as integrity proofs across trust boundaries.
  6. Avoid Security-Sensitive Truncation: Do not truncate digests or keyed hashes that are used for integrity, identity, authorization, or cross-tenant cache lookup.
  7. Read-Time Revalidation: Re-check authorization, ownership, and policy before returning cached, indexed, or deduplicated results if InfiniStore is deployed in a multi-tenant environment.
  8. State Invalidation: Invalidate, rebuild, or purge cache entries generated under the vulnerable map/hash scheme during upgrade.

References

  • Vulnerable kv_map declaration:
    extern std::unordered_map<std::string, boost::intrusive_ptr<PTR>> kv_map;
  • Vulnerable kv_map definition:
    // evict memory from head to tail, the PTR is shared by lru_queue and kv_map
    // so we have to pop from both to evict the memory
    std::list<boost::intrusive_ptr<PTR>> lru_queue;
    std::unordered_map<std::string, boost::intrusive_ptr<PTR>> kv_map;
  • TCP insert and lookup path:
    int Client::tcp_payload_request(const TCPPayloadRequest *req) {
    DEBUG("do tcp_payload_request... {}", op_name(req->op()));
    switch (req->op()) {
    case OP_TCP_PUT: {
    evict_cache(ON_DEMAND_MIN_THRESHOLD, ON_DEMAND_MAX_THRESHOLD);
    bool allocated =
    mm->allocate(req->value_length(), 1,
    [&](void *addr, uint32_t lkey, uint32_t rkey, int pool_idx) {
    current_tcp_task_ = boost::intrusive_ptr<PTR>(new PTR(
    addr, req->value_length(), pool_idx, req->key()->str()));
    });
    if (!allocated) {
    ERROR("Failed to allocate memory");
    return OUT_OF_MEMORY;
    }
    DEBUG("allocated memory: addr: {}, lkey: {}, rkey: {}", current_tcp_task_->ptr,
    mm->get_lkey(current_tcp_task_->pool_idx),
    mm->get_rkey(current_tcp_task_->pool_idx));
    kv_map[req->key()->str()] = current_tcp_task_;
    // set state machine
    state_ = READ_VALUE_THROUGH_TCP;
    bytes_read_ = 0;
    expected_bytes_ = req->value_length();
    break;
    }
    case OP_TCP_GET: {
    auto it = kv_map.find(req->key()->str());
    if (it == kv_map.end()) {
  • Existence and prefix-match lookups:
    int Client::check_key(const std::string &key_to_check) {
    int ret;
    // check if the key exists and committed
    if (kv_map.count(key_to_check) > 0) {
    ret = 0;
    }
    else {
    ret = 1;
    }
    send_resp(FINISH, &ret, sizeof(ret));
    reset_client_read_state();
    return 0;
    }
    int Client::get_match_last_index(const GetMatchLastIndexRequest *request) {
    int left = 0, right = request->keys()->size();
    while (left < right) {
    int mid = left + (right - left) / 2;
    request->keys()->Get(mid);
    if (kv_map.count(request->keys()->Get(mid)->str())) {
  • Delete path:
    int Client::delete_keys(const DeleteKeysRequest *request) {
    int count = 0;
    for (const auto *key : *request->keys()) {
    auto it = kv_map.find(key->str());
    if (it != kv_map.end()) {
    auto ptr = it->second;
    kv_map.erase(it);
  • Protocol magic check:
    #define MAGIC 0xdeadbeef
    #define MAGIC_SIZE 4
    #define OP_RDMA_EXCHANGE 'E'
    #define OP_RDMA_READ 'A'
    #define OP_RDMA_WRITE 'W'
    #define OP_CHECK_EXIST 'C'
    #define OP_GET_MATCH_LAST_IDX 'M'
    #define OP_DELETE_KEYS 'X'
  • Service bind path:
    loop = uv_default_loop();
    loop = (uv_loop_t *)loop_ptr;
    assert(loop != NULL);
    uv_tcp_init(loop, &server);
    struct sockaddr_in addr;
    uv_ip4_addr("0.0.0.0", config.service_port, &addr);
    uv_tcp_bind(&server, (const struct sockaddr *)&addr, 0);
    int r = uv_listen((uv_stream_t *)&server, 128, on_new_connection);
  • Source advisory used for validation: SECURITY_ADVISORY.md
  • CWE-407: Inefficient Algorithmic Complexity: https://cwe.mitre.org/data/definitions/407.html
  • SipHash: a fast short-input PRF: https://cr.yp.to/siphash/siphash-20120918.pdf

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions