llm-cpp


A suite of 26 single-header C++17 libraries for integrating large language models into native applications. Each library is a self-contained .hpp file — drop in what you need, define one implementation macro, and ship. No Python, no SDKs, no package manager required.


Start Here

Just want to call an LLM? → llm-stream

Building a chatbot? → llm-chat + llm-retry

Building RAG? → llm-rag + llm-embed + llm-rank

Need production observability? → llm-log + llm-trace + llm-cost


The Suite

Core — foundational primitives every LLM app needs

| Library | Description | Deps |
| --- | --- | --- |
| llm-stream | Stream OpenAI & Anthropic responses via SSE | libcurl |
| llm-retry | Retry with exponential backoff + circuit breaker | None |
| llm-cost | Token counting + cost estimation for 6 models | None |
| llm-cache | LRU response cache — skip identical API calls | None |
| llm-format | JSON schema enforcement + structured output | None |
| llm-json | Recursive-descent JSON parser and builder | None |

Data — move, retrieve, and reshape information

| Library | Description | Deps |
| --- | --- | --- |
| llm-embed | Text embeddings + cosine similarity + vector store | libcurl |
| llm-rag | Retrieval-augmented generation pipeline | libcurl |
| llm-rank | BM25 + LLM passage reranking, hybrid mode | libcurl† |
| llm-compress | Context compression: truncate, sliding window, summarize | None* |
| llm-parse | Offline HTML/markdown parsing, chunking, TextStats | None |
| llm-batch | Batch processing with thread pool, rate limiting, checkpointing | libcurl |

Ops — observe, test, and operate at scale

| Library | Description | Deps |
| --- | --- | --- |
| llm-log | Structured JSONL logging for every LLM call | None |
| llm-trace | RAII span tracing with OTLP JSON export | None |
| llm-pool | Concurrent request pool with priority queue + rate limiting | None |
| llm-mock | Mock LLM provider for unit testing — zero network | None |
| llm-eval | N-run evaluation + consistency scoring + model comparison | libcurl |
| llm-ab | A/B testing with Welch t-test and Cohen d | libcurl |

App — build complete user-facing features

| Library | Description | Deps |
| --- | --- | --- |
| llm-chat | Multi-turn conversation manager with token-budget truncation | libcurl |
| llm-agent | Tool-calling agent loop (OpenAI function calling) | libcurl |
| llm-vision | Multimodal image+text for OpenAI and Anthropic | libcurl |
| llm-template | Mustache-style prompt templating | None |
| llm-router | Route prompts to the right model by complexity | None |
| llm-guard | PII detection + prompt injection scoring — fully offline | None |
| llm-audio | Whisper transcription, translation, and TTS | libcurl |
| llm-finetune | Fine-tuning job lifecycle: upload, create, poll, manage models | libcurl |

*llm-compress requires libcurl only for the optional Summarize strategy. †llm-rank requires libcurl only for LLM-based reranking; local BM25 mode has zero deps.


Quickstart

Libraries compose naturally. Here is a production-ready pattern using llm-log, llm-retry, and llm-stream together:

#include <cstdlib>      // std::getenv
#include <iostream>
#include <string>
#include <string_view>

#define LLM_LOG_IMPLEMENTATION
#include "llm_log.hpp"

#define LLM_RETRY_IMPLEMENTATION
#include "llm_retry.hpp"

#define LLM_STREAM_IMPLEMENTATION
#include "llm_stream.hpp"

int main() {
    llm::Logger logger("calls.jsonl");

    llm::Config cfg;
    const char* key = std::getenv("OPENAI_API_KEY");
    cfg.api_key = key ? key : "";  // getenv returns nullptr if unset
    cfg.model   = "gpt-4o-mini";

    const std::string prompt = "Explain backpressure in one paragraph.";
    auto log_id = logger.log_request(prompt, cfg.model);

    auto result = llm::with_retry<std::string>([&]() -> std::string {
        std::string output;
        llm::stream_openai(prompt, cfg,
            [&](std::string_view tok) { std::cout << tok << std::flush; output += tok; },
            [](const llm::StreamStats& s) {
                std::cout << "\n[" << s.token_count << " tokens, " << s.tokens_per_sec << " tok/s]\n";
            }
        );
        return output;
    });

    logger.log_response(log_id, result);
}

Another example — guard, route, and chat together:

#include <cstdlib>
#include <iostream>
#include <string>

#define LLM_GUARD_IMPLEMENTATION
#include "llm_guard.hpp"

#define LLM_ROUTER_IMPLEMENTATION
#include "llm_router.hpp"

#define LLM_CHAT_IMPLEMENTATION
#include "llm_chat.hpp"

int main() {
    std::string user_input = "Summarize this quarter's revenue for the board.";

    // 1. Check input for PII / injection
    auto guard = llm::scan(user_input);
    if (!guard.safe) user_input = guard.scrubbed;

    // 2. Route to the right model
    llm::RouterConfig rcfg;
    rcfg.strategy = llm::RoutingStrategy::Balanced;
    rcfg.models   = {{"gpt-4o-mini", 0.15, 0.5, 0.7, 40}, {"gpt-4o", 5.0, 1.0, 0.9, 100}};
    auto decision = llm::Router(rcfg).route(user_input);

    // 3. Send with conversation memory
    llm::ChatConfig ccfg;
    const char* key = std::getenv("OPENAI_API_KEY");
    ccfg.api_key = key ? key : "";
    ccfg.model   = decision.model_name;
    llm::Conversation conv(ccfg);
    std::cout << conv.chat(user_input) << "\n";
}

Installation

Each library is a single .hpp file. Copy what you need:

# Core
curl -O https://raw.githubusercontent.com/Mattbusel/llm-stream/main/include/llm_stream.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-retry/main/include/llm_retry.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-cost/main/include/llm_cost.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-cache/main/include/llm_cache.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-format/main/include/llm_format.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-json/main/include/llm_json.hpp

# Data
curl -O https://raw.githubusercontent.com/Mattbusel/llm-embed/main/include/llm_embed.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-rag/main/include/llm_rag.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-rank/main/include/llm_rank.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-compress/main/include/llm_compress.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-parse/main/include/llm_parse.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-batch/main/include/llm_batch.hpp

# Ops
curl -O https://raw.githubusercontent.com/Mattbusel/llm-log/main/include/llm_log.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-trace/main/include/llm_trace.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-pool/main/include/llm_pool.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-mock/main/include/llm_mock.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-eval/main/include/llm_eval.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-ab/main/include/llm_ab.hpp

# App
curl -O https://raw.githubusercontent.com/Mattbusel/llm-chat/main/include/llm_chat.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-agent/main/include/llm_agent.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-vision/main/include/llm_vision.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-template/main/include/llm_template.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-router/main/include/llm_router.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-guard/main/include/llm_guard.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-audio/main/include/llm_audio.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-finetune/main/include/llm_finetune.hpp

In exactly one .cpp file per library, define the implementation macro before including:

#define LLM_STREAM_IMPLEMENTATION
#define LLM_RETRY_IMPLEMENTATION
#define LLM_LOG_IMPLEMENTATION
#include "llm_stream.hpp"
#include "llm_retry.hpp"
#include "llm_log.hpp"

All other translation units just #include without the macro.


Requirements

| Requirement | Detail |
| --- | --- |
| C++ standard | C++17 or later |
| Compiler | GCC, Clang, MSVC — all supported |
| External deps | libcurl for network libraries (see tables above). All others: zero deps. |
| Build system | Any. Works with CMake, Make, Bazel, MSVC, plain g++. |
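As a concrete example of the plain-g++ row (file name is a placeholder; `-lcurl` is needed only when a network library from the tables above is in the build):

```shell
# One translation unit defines the implementation macros; the rest just include.
g++ -std=c++17 -O2 main.cpp -o app -lcurl
```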

License

All 26 libraries: MIT — Copyright (c) 2026 Mattbusel.
