[Kernel Feature] KV-Cache Manager - User-Space Cache Management for LLM Inference #221

@mikejmorgan-ai

Part of: Kernel-Level AI Enhancements (Tier 1 - User-Space)

Description

A user-space implementation of kernel-level KV-cache concepts: the manager treats transformer key-value (KV) caches as first-class system resources that can be created, shared across processes, persisted to disk, and evicted under memory pressure.

Effort: 3-4 weeks | Bounty: $175

The Solution

cortex cache create llama-cache --size 16G --tier cpu
cortex cache status llama-cache
cortex cache persist llama-cache  # save to disk
cortex cache restore llama-cache  # fast restart
cortex cache evict llama-cache --percent 25
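
Each pool is backed by a named POSIX shared-memory segment, which is what makes cross-process attachment possible. The following is a minimal sketch of that lifecycle using only Python's standard library; the pool name and size are illustrative, not the actual API of kv_cache_manager.py.

from multiprocessing import shared_memory

POOL_NAME = "llama-cache"          # illustrative; real names come from cortex cache create
POOL_SIZE = 64 * 1024 ** 2         # 64 MiB for the sketch; the CLI example above uses 16G

# Creator: allocate a named POSIX shm segment (backed by /dev/shm on Linux)
pool = shared_memory.SharedMemory(name=POOL_NAME, create=True, size=POOL_SIZE)

# Any other process can attach to the same segment by name
peer = shared_memory.SharedMemory(name=POOL_NAME)
peer.buf[:4] = b"KVC1"             # writes are immediately visible to the creator

peer.close()                       # detach without destroying the segment
pool.close()
pool.unlink()                      # destroy the segment, freeing the memory

The same create/attach/unlink split is what lets cortex cache create set up a pool that a later inference process attaches to by name.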

Features

  • POSIX shared memory pools (see the lifecycle sketch above)
  • Cross-process cache sharing
  • Multiple eviction policies (LRU, LFU, FIFO, priority)
  • Prefix hash matching so requests with a common prompt prefix reuse cached blocks (sketched after this list)
  • Disk persistence for fast restarts
  • Thread-safe allocator with bitmap free list (sketched after the memory layout below)
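
Prefix hash matching lets two requests that share a prompt prefix reuse the same cached KV blocks instead of recomputing them. Below is a minimal sketch of the idea; the function names, block size, and index structure are hypothetical, and the real logic lives in kv_cache_manager.py.

import hashlib

def prefix_hashes(token_ids, block_size=16):
    """Chained hash per full block of token IDs; equal prefixes yield equal hash chains."""
    hashes, h = [], hashlib.sha256()
    full = len(token_ids) - len(token_ids) % block_size
    for start in range(0, full, block_size):
        block = token_ids[start:start + block_size]
        h.update(b"".join(t.to_bytes(4, "little") for t in block))
        hashes.append(h.copy().hexdigest())  # hash of the whole prefix up to this block
    return hashes

# Hypothetical index: prefix hash -> block offset inside the pool's data region
cache_index = {}

def match_prefix(token_ids):
    """Return cached block offsets for the longest cached prefix of token_ids."""
    matched = []
    for ph in prefix_hashes(token_ids):
        if ph not in cache_index:
            break
        matched.append(cache_index[ph])
    return matched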

Memory Layout

┌──────────────────┐
│ Header (4KB)     │ Magic, version, usage stats
├──────────────────┤
│ Free List (4KB)  │ Bitmap of free blocks
├──────────────────┤
│ Data Region      │ KV-cache tensors
└──────────────────┘
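
The header and free list are small enough to sketch concretely. Assuming the header is packed with struct and the free list holds one bit per data block, an allocator over this layout could look like the following; the field order and magic value are illustrative, not the implementation's actual on-disk format.

import struct

HEADER_FMT = "<4sIQQ"            # magic, version, blocks_total, blocks_used (illustrative)
MAGIC, VERSION = b"KVC1", 1

def write_header(buf, blocks_total, blocks_used=0):
    """Pack the header fields into the start of the 4KB header region."""
    struct.pack_into(HEADER_FMT, buf, 0, MAGIC, VERSION, blocks_total, blocks_used)

def alloc_block(bitmap: bytearray) -> int:
    """First-fit scan; a real thread-safe allocator would hold a lock around this."""
    for byte_idx, byte in enumerate(bitmap):
        if byte != 0xFF:                                   # a free bit exists in this byte
            bit = (~byte & (byte + 1)).bit_length() - 1    # lowest zero bit
            bitmap[byte_idx] |= 1 << bit
            return byte_idx * 8 + bit                      # block index into the data region
    raise MemoryError("pool exhausted; caller should trigger eviction")

def free_block(bitmap: bytearray, index: int) -> None:
    """Clear a block's bit, returning it to the free list."""
    bitmap[index // 8] &= ~(1 << (index % 8))

A 4KB bitmap addresses 32,768 blocks, so the block size fixes the largest pool this layout can track (512KB blocks for the 16G example above).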

Acceptance Criteria

  • Cache pools can be created and destroyed
  • Multiple processes can attach to the same pool
  • Eviction works correctly under memory pressure
  • Persistence and restore work
  • Unit tests pass with >80% coverage (a sketch of such tests follows this list)
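
To make the criteria concrete, here is a hedged sketch of tests in the spirit of test_kv_cache.py, written against a hypothetical KVCachePool wrapper; the real class and method names may differ.

from kv_cache_manager import KVCachePool   # hypothetical import; the real API may differ

def test_create_and_destroy():
    pool = KVCachePool.create("test-pool", size=64 * 1024 ** 2)
    assert pool.free_bytes > 0
    pool.destroy()

def test_cross_process_attach():
    pool = KVCachePool.create("shared-pool", size=64 * 1024 ** 2)
    peer = KVCachePool.attach("shared-pool")   # second handle, as another process would open
    assert peer.size == pool.size
    peer.detach()
    pool.destroy()

def test_eviction_under_pressure():
    pool = KVCachePool.create("small-pool", size=1024 ** 2, policy="lru")
    for i in range(1024):                      # overfill so the pool must evict
        pool.put(f"prefix-{i}", b"x" * 4096)
    assert pool.get("prefix-0") is None        # oldest entry evicted first under LRU
    pool.destroy()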

Patent Connection

This implements user-space versions of concepts in our provisional patent for kernel-managed KV-cache memory regions.

Files

A complete implementation is available:

  • kv_cache_manager.py (~900 lines)
  • test_kv_cache.py (~400 lines)
  • README_KV_CACHE.md

Priority

High - core infrastructure for an AI-native OS
