Hyper Claude Code

The intelligent middleware platform for Claude Code.

Use any model. Track costs. Cache responses. Auto-failover between providers. Ships with a 31-agent AI engineering team as a built-in operating system layer.

At a Glance


6 Providers	NVIDIA NIM, OpenRouter, DeepSeek, LM Studio, llama.cpp, Ollama
4 Middleware Plugins	Cost tracking, response caching, provider failover, request logging
31-Agent Dev Team	CTO-led with NEXUS protocol, Shadow Mind, trust ledger, adversarial review
7 API Endpoints	Cost, cache, health monitoring alongside standard Anthropic Messages API
75K+ Lines of Code	Production-grade Python 3.14 with 20K+ lines of tests
13 Feature Flags	Every capability toggleable, zero overhead when off
3 Architecture Diagrams	Hero, Dispatch Lifecycle, Shadow Mind

What Is This?

Claude Code is the best agentic coding tool — but it only works with Anthropic's models at Anthropic's prices.

Hyper Claude Code sits between Claude Code and any LLM provider. It transparently proxies every request while adding intelligence at each layer: cost tracking, response caching, provider failover, and request logging — all as middleware plugins on an extensible pipeline.

On top of that, it ships with a 31-agent AI engineering team — not a framework, but an operating system layer built on Claude Code. The team has a CTO, domain experts, adversarial reviewers, a trust calibration ledger, a parallel cognitive system (Shadow Mind), and a hiring pipeline that grows the roster when coverage gaps appear.

Claude Code CLI / VS Code / JetBrains
            |
            |  Standard Anthropic Messages API
            v
   +---------------------------+
   |    Hyper Claude Code       |
   |                           |
   |  [Failover]  ->  [Cache]  |
   |      ->  [Logger]         |
   |      ->  [Cost Tracker]   |
   +---------------------------+
            |
            |  Provider-native format
            v
  NIM  OpenRouter  DeepSeek  LM Studio  llama.cpp  Ollama

Use any model. Free, paid, or local — all work out of the box.

Pay nothing. Route to free models on NVIDIA NIM or OpenRouter. Run local with Ollama or llama.cpp.

Stay private. Local models mean your code never leaves your machine.

Mix models per tier. Opus to a powerful cloud model, Sonnet to mid-tier, Haiku to a fast local model.

Quick Start

1. Prerequisites

Install Claude Code, uv, and Python 3.14:

brew install uv          # or: pip install uv
uv python install 3.14

Windows

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
uv python install 3.14

2. Clone and Configure

git clone https://github.com/asiflow/hyper-claude-code.git
cd hyper-claude-code
cp .env.example .env

Edit .env — pick a provider:

NVIDIA_NIM_API_KEY="nvapi-your-key"
MODEL="nvidia_nim/z-ai/glm4.7"
ANTHROPIC_AUTH_TOKEN="your-secret-token"

Generate a strong token: python3 -c "import secrets; print(secrets.token_urlsafe(32))"

3. Start

uv run uvicorn server:app --port 8082

Security: Add --host 0.0.0.0 only if you need network access, and always set ANTHROPIC_AUTH_TOKEN when doing so. Default binds to 127.0.0.1 (local only).

4. Connect Claude Code

ANTHROPIC_AUTH_TOKEN="your-secret-token" ANTHROPIC_BASE_URL="http://localhost:8082" claude

VS Code / JetBrains / Model Picker setup

VS Code — add to settings.json:

"claudeCode.environmentVariables": [
  { "name": "ANTHROPIC_BASE_URL", "value": "http://localhost:8082" },
  { "name": "ANTHROPIC_AUTH_TOKEN", "value": "your-secret-token" }
]

JetBrains ACP — edit Claude ACP config, add:

"env": {
  "ANTHROPIC_BASE_URL": "http://localhost:8082",
  "ANTHROPIC_AUTH_TOKEN": "your-secret-token"
}

Model Picker — interactive model selection at launch:

brew install fzf
alias claude-pick="/path/to/hyper-claude-code/claude-pick"
claude-pick

Intelligent Middleware

HCC is not just a proxy — it's an extensible middleware platform. Four plugins intercept every request, all behind feature flags with zero overhead when disabled.

Plugin	What It Does	Key Config
Cost Intelligence	Per-request cost tracking, session/daily/monthly aggregation, hard budget caps (HTTP 429)	`ENABLE_COST_TRACKING=true` `MAX_SESSION_COST_USD=5.00`
Response Caching	SHA-256 exact-match caching with configurable TTL, LRU eviction, hit-rate stats	`ENABLE_RESPONSE_CACHE=true` `CACHE_TTL_SECONDS=3600`
Provider Failover	Per-provider circuit breaker (CLOSED/OPEN/HALF_OPEN), configurable failover chains	`ENABLE_PROVIDER_FAILOVER=true` `FAILOVER_ERROR_THRESHOLD=0.5`
Request Logging	Full audit trail to local SQLite — model, tokens, latency, cost per request	`ENABLE_REQUEST_LOGGING=true`

Pipeline Architecture

Request -> [Failover] -> [Cache] -> [Logger] -> [Cost] -> Provider
               |             |
         reroute if      return cached
         provider down   (skip everything)

Each plugin is a class extending middleware.Middleware with before_request() and after_response() hooks. Build your own — add it to the pipeline in api/runtime.py.

Providers

Six backends. Two transport types. One unified API.

Provider	Prefix	Key	Transport
NVIDIA NIM	`nvidia_nim/...`	Required	OpenAI chat -> Anthropic SSE translation
OpenRouter	`open_router/...`	Required	Native Anthropic Messages
DeepSeek	`deepseek/...`	Required	Native Anthropic Messages
LM Studio	`lmstudio/...`	None	Native Anthropic Messages
llama.cpp	`llamacpp/...`	None	Native Anthropic Messages
Ollama	`ollama/...`	None	Native Anthropic Messages

Provider setup details

NVIDIA NIM — Get key: MODEL="nvidia_nim/z-ai/glm4.7"

OpenRouter — Get key: MODEL="open_router/stepfun/step-3.5-flash:free" (free models)

DeepSeek — Get key: MODEL="deepseek/deepseek-chat"

LM Studio — Start server, load model: MODEL="lmstudio/your-model"

llama.cpp — Start llama-server: MODEL="llamacpp/local-model"

Ollama — ollama pull llama3.1 && ollama serve: MODEL="ollama/llama3.1"

Mix Providers Per Tier

MODEL_OPUS="nvidia_nim/moonshotai/kimi-k2.5"          # Best model for complex tasks
MODEL_SONNET="open_router/deepseek/deepseek-r1-0528:free"  # Mid-tier, free
MODEL_HAIKU="ollama/llama3.1"                           # Fast local model
MODEL="nvidia_nim/z-ai/glm4.7"                          # Default fallback

Architecture

Request Lifecycle

Every /v1/messages request flows through 10 precisely engineered stages:

 1. INGRESS         FastAPI ASGI application           api/app.py
 2. AUTH             Constant-time token comparison      api/dependencies.py
 3. VALIDATION       Pydantic models (extra="allow")     api/models/anthropic.py
 4. MIDDLEWARE        Failover -> Cache -> Log -> Cost    middleware/*.py
 5. OPTIMIZATION     5 fast-path local handlers          api/optimization_handlers.py
 6. MODEL ROUTING    Claude name -> provider/model ref   api/model_router.py
 7. CONVERSION       Anthropic -> provider format        core/anthropic/conversion.py
 8. UPSTREAM         Rate-limited provider call           providers/*.py
 9. STREAMING        SSE with thinking/tool/text blocks  core/anthropic/sse.py
10. DELIVERY         StreamingResponse to client          api/services.py

Dual Transport

Transport	Used By	Complexity
OpenAI Chat Translation	NVIDIA NIM	Converts Anthropic to OpenAI format. `ThinkTagParser` extracts `<think>` blocks. `HeuristicToolParser` detects tool calls in plain text. Full SSE reconstruction.
Native Anthropic	OpenRouter, DeepSeek, LM Studio, llama.cpp, Ollama	Near-transparent passthrough with thinking history sanitization and SSE state tracking for error recovery.

Three-Tier Rate Limiting

Tier	Mechanism	Purpose
Proactive	`StrictSlidingWindowLimiter`	Prevent hitting limits
Reactive	429 detection + exponential backoff	Respond to actual limits
Concurrency	`asyncio.Semaphore`	Cap simultaneous requests

Security

Layer	Protection
Auth	`secrets.compare_digest` (constant-time, CWE-208). CORS restricted to localhost.
SSRF	Cloud metadata IP blocklist. DNS rebinding protection. Private network blocking.
Input	Pydantic validation. Scheme allowlisting.
Output	6 `LOG_RAW_*` flags all default OFF. Sensitive data redacted.
Process	`create_subprocess_exec` (no shell injection). Atexit cleanup.

Engineering Decisions

Forward-compatible models — extra="allow" passes unknown Anthropic fields through. New API features work automatically.
Streaming-first — Never buffers complete responses. X-Accel-Buffering: no prevents reverse proxy interference.
AST-enforced import boundaries — Contract tests verify core/ never imports api/. Violations fail CI.
Mid-stream error recovery — Closes open SSE blocks correctly before emitting errors, preventing Claude Code replay corruption.

31-Agent Development Team

HCC ships with something no other open-source project has: a 31-agent AI engineering team that activates when you open Claude Code in this repo.

Say "full team session" to activate all 31 agents.

This is not a framework. It's an operating system layer on top of Claude Code — 31 domain-expert agents coordinated through a syscall protocol, governed by runtime hooks, validated by 341 contract test assertions, calibrated by a Bayesian trust ledger, and equipped with a hiring pipeline for growing the roster.

The Layered Architecture

LAYER 4 — USER          Human operator, natural language goals
LAYER 3 — SPECIALISTS   24 domain agents (builders, guardians, intelligence, meta)
LAYER 2 — SYSTEM        CTO, orchestrator, planner, sentinel, validators
LAYER 1.75 — SHADOW     Observer, Pattern Computer, Speculator, Dreamer, Oracle
LAYER 1.5 — TRUST       Hooks, contract tests, trust ledger (hard invariants)
LAYER 1 — KERNEL        Main thread, NEXUS syscall processing, signal persistence
LAYER 0 — RUNTIME       Claude Code CLI, git worktrees, MCP servers

Agent Roster

Tier	Agents	Role
CTO	`cto`	Supreme authority — assesses, delegates, debates, self-evolves
Builders	`elite-engineer` `ai-platform-architect` `frontend-platform-engineer` `beam-architect` `elixir-engineer` `go-hybrid-engineer`	Write production code across any stack
Guardians	`go-expert` `python-expert` `typescript-expert` `deep-qa` `deep-reviewer` `infra-expert` `database-expert` `observability-expert` `test-engineer` `api-expert` `beam-sre`	Review, audit, catch what builders miss
Strategists	`deep-planner` `orchestrator`	Task decomposition, workflow coordination
Intelligence	`memory-coordinator` `cluster-awareness` `benchmark-agent` `erlang-solutions-consultant` `talent-scout` `intuition-oracle`	Cross-agent knowledge, competitive intel, pattern recognition
Meta	`meta-agent` `recruiter`	Evolve agent prompts, hire new specialists on demand
Governance	`session-sentinel`	Protocol compliance, team health audits
Verification	`evidence-validator` `challenger`	Verify claims against source, adversarial review

NEXUS Protocol

Subagents in Claude Code can't spawn other agents or access privileged tools. NEXUS bridges this gap with a syscall interface:

Syscall	What It Does
`[NEXUS:SPAWN]`	Spawn a new agent into the team
`[NEXUS:SCALE]`	Spawn N copies of an agent for parallel work
`[NEXUS:ASK]`	Proxy a question to the human operator
`[NEXUS:MCP]`	Install/configure an MCP server
`[NEXUS:INTUIT]`	Query the Shadow Mind for pattern-based guidance
`[NEXUS:RELOAD]`	Shutdown and respawn an agent with fresh context
`[NEXUS:CRON]`	Create a scheduled task
`[NEXUS:WORKTREE]`	Create an isolated git worktree

Agents emit syscalls via SendMessage. The main thread (kernel) processes them in real-time and responds with [NEXUS:OK] or [NEXUS:ERR].

Trust Infrastructure

Every finding is verified. Every recommendation is challenged.

Evidence Validator — HIGH-severity findings are independently verified against source code. Classified as CONFIRMED, PARTIALLY_CONFIRMED, REFUTED, or UNVERIFIABLE.
Challenger — Strategic recommendations undergo adversarial review along 5 dimensions: steelman alternatives, hidden assumptions, evidence quality, missed cases, downstream impact.
Trust Ledger — Bayesian-blended per-agent accuracy scorecard at .claude/agent-memory/trust-ledger/. The CTO uses trust weights to resolve conflicting findings between agents.

Shadow Mind

An optional parallel cognitive layer that runs alongside the conscious team.

Six components provide probabilistic pattern-based guidance:

Component	Role
Observer	Tails the signal bus, writes structured observations
Pattern Computer	Derives n-grams, co-occurrences, temporal patterns
Pattern Library	Read-only data substrate for computed patterns
Speculator	Generates counterfactual variants per observation
Dreamer	Proposes insight candidates during idle windows
Intuition Oracle	Queryable surface via `[NEXUS:INTUIT]` — synthesizes all sources

Delete .claude/agent-memory/shadow-mind/ to disable without affecting the team. Full spec.

Dynamic Specialist Hiring

The team detects its own coverage gaps and hires new specialists:

talent-scout continuously monitors for domain gaps (5-signal confidence scoring)
When confidence exceeds threshold + session-sentinel co-signs, recruiter executes an 8-phase pipeline:
- Domain research -> Prompt synthesis -> Contract validation -> Challenger review -> Atomic registration -> Post-hire verify -> Probation -> Promotion
meta-agent performs atomic registration (agent file + contract tests + hooks + trust ledger + memory scaffold) in a single commit

Contract Tests

341 assertions (11 contracts x 31 agents) validate every agent file:

python3 .claude/tests/agents/run_contract_tests.py

A pre-commit hook runs these automatically. No agent change ships without passing all contracts.

Team Documentation

Document	What It Covers
Team Overview	Full architecture — layered OS, all 31 agents, NEXUS, Shadow Mind
Cheatsheet	Quick reference — agent-domain map, NEXUS syscalls, dispatch patterns
Runbook	Operational procedures — session lifecycle, Pattern F, incident response
Scenarios	Real-world walkthroughs — audit campaigns, remediation, strategic planning
Agent Template	Create new specialist agents
Architecture Diagrams	Mermaid source + ASCII fallbacks + PNG export instructions
Shadow Mind Spec	Full Shadow Mind component documentation

API Endpoints

All endpoints respect ANTHROPIC_AUTH_TOKEN when configured.

Method	Path	Description
`POST`	`/v1/messages`	Anthropic Messages API (proxied to provider)
`POST`	`/v1/messages/count_tokens`	Token counting
`GET`	`/v1/models`	Available model list
`GET`	`/v1/cost`	Cost summary — session, daily, monthly + budget remaining
`GET`	`/v1/cache/stats`	Cache hit/miss counts, hit rate, entry count
`POST`	`/v1/cache/clear`	Clear all cached responses
`GET`	`/v1/health/providers`	Circuit breaker status per provider

Discord & Telegram Bots

Run Claude Code sessions remotely via chat. Stream progress. Branch conversations with replies.

Security: Bots run Claude Code with --dangerously-skip-permissions within ALLOWED_DIR. Restrict directory scope and always configure channel/user allowlists.

Discord

MESSAGING_PLATFORM="discord"
DISCORD_BOT_TOKEN="your-token"
ALLOWED_DISCORD_CHANNELS="123456789"
CLAUDE_WORKSPACE="./agent_workspace"
ALLOWED_DIR="/path/to/projects"

Create bot in Discord Developer Portal. Enable Message Content Intent.

Telegram

MESSAGING_PLATFORM="telegram"
TELEGRAM_BOT_TOKEN="123456789:ABC..."
ALLOWED_TELEGRAM_USER_ID="your-user-id"
CLAUDE_WORKSPACE="./agent_workspace"
ALLOWED_DIR="/path/to/projects"

Token from @BotFather. User ID from @userinfobot.

Commands: /stop (cancel), /clear (reset), /stats (status).

Voice Notes

Transcribe voice messages via local Whisper or NVIDIA NIM:

uv sync --extra voice_local   # Local Whisper
uv sync --extra voice          # NVIDIA NIM

Configuration

.env.example is the canonical reference.

Model Routing

MODEL="nvidia_nim/z-ai/glm4.7"
MODEL_OPUS=
MODEL_SONNET=
MODEL_HAIKU=
ENABLE_MODEL_THINKING=true

Middleware Feature Flags

ENABLE_MIDDLEWARE_PIPELINE=true       # Master switch
ENABLE_COST_TRACKING=false            # Cost intelligence
MAX_SESSION_COST_USD=                 # Budget cap (empty = no limit)
ENABLE_RESPONSE_CACHE=false           # Response caching
CACHE_TTL_SECONDS=3600                # Cache lifetime
CACHE_MAX_ENTRIES=1000                # Max cached responses
ENABLE_PROVIDER_FAILOVER=false        # Circuit breaker + failover
FAILOVER_ERROR_THRESHOLD=0.5          # Error rate to trip breaker
FAILOVER_COOLDOWN_SECONDS=30          # Recovery window
ENABLE_REQUEST_LOGGING=false          # SQLite audit trail
STORAGE_DB_PATH="~/.fcc/hcc.db"      # Database path

Rate Limits & Timeouts

PROVIDER_RATE_LIMIT=1
PROVIDER_RATE_WINDOW=3
PROVIDER_MAX_CONCURRENCY=5
HTTP_READ_TIMEOUT=120
HTTP_CONNECT_TIMEOUT=10

Security & Diagnostics

ANTHROPIC_AUTH_TOKEN=
LOG_RAW_API_PAYLOADS=false
LOG_RAW_SSE_EVENTS=false
LOG_API_ERROR_TRACEBACKS=false
LOG_RAW_MESSAGING_CONTENT=false
LOG_RAW_CLI_DIAGNOSTICS=false

All raw logging flags default OFF. Enabling them exposes prompts, tool arguments, and model output.

Development

hyper-claude-code/
  server.py              ASGI entry point
  api/                   FastAPI routes, services, model routing, optimizations
  core/                  Anthropic protocol — SSE, conversion, streaming, tools
  providers/             6 provider transports, registry, rate limiting
  middleware/             Cost, cache, failover, logging plugins + pipeline
  storage/               SQLite persistence layer
  messaging/             Discord/Telegram bots, voice, sessions
  cli/                   Entrypoints, Claude subprocess management
  config/                Settings, provider catalog, logging
  tests/                 Unit, contract, and smoke tests (20K+ lines)
  .claude/               31-agent team, hooks, docs, Shadow Mind, trust ledger

uv run ruff format       # Format
uv run ruff check        # Lint
uv run ty check          # Type check
uv run pytest            # Test

Extending

What	How
New provider	Extend `OpenAIChatTransport` or `AnthropicMessagesTransport`, register in `provider_catalog.py` + `registry.py`
New middleware	Extend `middleware.Middleware`, add to pipeline in `api/runtime.py`
New bot platform	Implement `MessagingPlatform` in `messaging/platforms/`
New agent	Use the Agent Template

Troubleshooting

Claude Code says "undefined input_tokens" or malformed response

Set ANTHROPIC_BASE_URL to http://localhost:8082 (not /v1)
Check server.log for upstream errors
Update to latest commit

llama.cpp or LM Studio returns HTTP 400

Verify POST /v1/messages is supported
Increase --ctx-size for Claude Code prompts
Check base URL includes /v1

Provider disconnects during streaming

Reduce PROVIDER_MAX_CONCURRENCY, increase HTTP_READ_TIMEOUT, or enable failover.

Tool calls work on one model but not another

Tool support is model-dependent. Some models emit malformed tool JSON. Try another model.

Contributing

Issues for bugs and feature requests
Keep changes small and covered by tests
Run the full check sequence before PRs
except X, Y is valid Python 3.14 syntax (PEP 758) — do not "fix" it

License

Built by ASIFLOW.ai

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.claude		.claude
.github		.github
api		api
cli		cli
config		config
core		core
docs		docs
examples		examples
messaging		messaging
middleware		middleware
providers		providers
smoke		smoke
storage		storage
tests		tests
.env.example		.env.example
.gitignore		.gitignore
.python-version		.python-version
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CLAUDE 2.md		CLAUDE 2.md
CLAUDE.md		CLAUDE.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
PLAN.md		PLAN.md
README.md		README.md
SECURITY.md		SECURITY.md
claude-pick		claude-pick
docker-compose.yml		docker-compose.yml
nvidia_nim_models.json		nvidia_nim_models.json
package-lock.json		package-lock.json
package.json		package.json
pyproject.toml		pyproject.toml
server.py		server.py
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

Hyper Claude Code

At a Glance

What Is This?

Quick Start

1. Prerequisites

2. Clone and Configure

3. Start

4. Connect Claude Code

Intelligent Middleware

Pipeline Architecture

Providers

Mix Providers Per Tier

Architecture

Request Lifecycle

Dual Transport

Three-Tier Rate Limiting

Security

Engineering Decisions

31-Agent Development Team

The Layered Architecture

Agent Roster

NEXUS Protocol

Trust Infrastructure

Shadow Mind

Dynamic Specialist Hiring

Contract Tests

Team Documentation

API Endpoints

Discord & Telegram Bots

Voice Notes

Configuration

Development

Extending

Troubleshooting

Contributing

License

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages