The complete toolkit for developers building LLM applications.
Built to drive production ML systems at ApX Machine Learning (apxml.com), now available as open source.
Advanced LLM techniques made simple. Clean, easy-to-use interfaces for complex operations.
Only install what you need. Kerb is modular, with no unnecessary dependencies.
Works with any LLM project. Kerb is a toolkit, not a framework. Use it alongside your existing stack.
# Install everything
pip install kerb[all]
# Or install specific modules
pip install kerb[generation] kerb[embeddings] kerb[evaluation]
from kerb.generation import generate, ModelName, LLMProvider
from kerb.prompt import render_template
# Generate with any provider; switching providers is just a config change.
response = generate(
    "Explain quantum computing",
    model=ModelName.GPT_4O_MINI,
    provider=LLMProvider.OPENAI
)
print(f"Response: {response.content}")
print(f"Tokens: {response.usage.total_tokens}")
print(f"Cost: ${response.cost:.6f}")
Everything you need to build LLM applications.
| Module | Description |
|---|---|
| Agent | Agent orchestration and execution patterns for multi-step reasoning. |
| Cache | Response and embedding caching to reduce costs and latency. |
| Chunk | Text chunking utilities for optimal context windows and retrieval. |
| Config | Configuration management for models, providers, and application settings. |
| Context | Context window management and token budget tracking. |
| Document | Document loading and processing for PDFs, web pages, and more. |
| Embedding | Embedding generation and similarity search helpers. |
| Evaluation | Metrics and benchmarking tools for LLM outputs. |
| Fine-Tuning | Model fine-tuning utilities and large dataset preparation. |
| Generation | Unified LLM generation with multi-provider support (OpenAI, Anthropic, Gemini). |
| Memory | Conversation memory and entity tracking for stateful applications. |
| Multimodal | Image, audio, and video processing for multimodal models. |
| Parsing | Output parsing and validation (JSON, structured data, function calls). |
| Preprocessing | Text cleaning and preprocessing for LLM inputs. |
| Prompt | Prompt engineering utilities, templates, and chain-of-thought patterns. |
| Retrieval | RAG and vector search utilities for semantic retrieval. |
| Safety | Content moderation and safety filters. |
| Testing | Testing utilities for LLM outputs and evaluation. |
| Tokenizer | Token counting and text splitting for any model. |
kerb/
├── core/ # Shared types and interfaces
├── agent/ # Agent systems and reasoning
├── cache/ # Caching mechanisms
├── chunk/ # Text chunking utilities
├── config/ # Configuration management
├── context/ # Context window management
├── document/ # Document loading
├── embedding/ # Embedding generation
├── evaluation/ # Evaluation metrics
├── fine_tuning/ # Model fine-tuning
├── generation/ # LLM text generation
├── memory/ # Memory systems
├── multimodal/ # Multimodal processing
├── parsing/ # Output parsing
├── preprocessing/ # Text preprocessing
├── prompt/ # Prompt management
├── retrieval/ # RAG and retrieval
├── safety/ # Content safety
├── testing/ # Testing utilities
└── tokenizer/ # Token counting
from kerb.document import load_document
from kerb.chunk import chunk_text
from kerb.embedding import embed, embed_batch
from kerb.retrieval import semantic_search, Document
from kerb.generation import generate, ModelName, LLMProvider
# Load and process document
doc = load_document("paper.pdf")
chunks = chunk_text(doc.content, chunk_size=512, overlap=50)
# Create embeddings
chunk_embeddings = embed_batch(chunks)
# Search for relevant chunks
query = "main findings"
query_embedding = embed(query)
documents = [Document(content=c) for c in chunks]
results = semantic_search(
    query_embedding=query_embedding,
    documents=documents,
    document_embeddings=chunk_embeddings,
    top_k=5
)
# Generate answer with context
context = "\n".join([r.document.content for r in results])
answer = generate(
    f"Based on: {context}\n\nQuestion: What are the main findings?",
    model=ModelName.GPT_4O_MINI,
    provider=LLMProvider.OPENAI
)
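The returned `answer` is the same response object as in the quick start, so the generated text and cost are available directly:

```python
print(answer.content)
print(f"Cost: ${answer.cost:.6f}")
```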
from kerb.cache import create_memory_cache, generate_prompt_key
from kerb.generation import generate, ModelName
cache = create_memory_cache(max_size=1000, default_ttl=3600)
def cached_generate(prompt, model=ModelName.GPT_4O_MINI, temperature=0.7):
    cache_key = generate_prompt_key(
        prompt,
        model=model.value,
        temperature=temperature
    )
    if cached := cache.get(cache_key):
        return cached['response']
    response = generate(prompt, model=model, temperature=temperature)
    cache.set(cache_key, {'response': response, 'cost': response.cost})
    return response
# First call
response1 = cached_generate("Explain Python decorators briefly")
# Second call is a cache hit (no API call)
response2 = cached_generate("Explain Python decorators briefly")
from kerb.agent.patterns import ReActAgent
def llm_function(prompt: str) -> str:
    """Your LLM function (OpenAI, Anthropic, etc.)"""
    # Implementation here
    return "agent response"
# Create a ReAct agent
agent = ReActAgent(
    name="ResearchAgent",
    llm_func=llm_function,
    max_iterations=5
)
# Execute multi-step task
result = agent.run("Research the latest AI papers and summarize key trends")
print(f"Status: {result.status.value}")
print(f"Output: {result.output}")
print(f"Steps taken: {len(result.steps)}")
from kerb.evaluation import (
    calculate_bleu,
    calculate_rouge,
    calculate_f1_score,
    calculate_semantic_similarity
)
# Evaluate translation quality
reference = "Hello, how are you?"
candidate = "Hi, how are you?"
# Calculate metrics
bleu_score = calculate_bleu(candidate, reference)
rouge_scores = calculate_rouge(candidate, reference, rouge_type="rouge-l")
f1 = calculate_f1_score(candidate, reference)
print(f"BLEU: {bleu_score:.3f}")
print(f"ROUGE-L F1: {rouge_scores['fmeasure']:.3f}")
print(f"F1 Score: {f1:.3f}")
from kerb.fine_tuning import (
    write_jsonl,
    read_jsonl,
    TrainingExample,
    TrainingDataset,
    DatasetFormat,
    to_openai_format,
)
from kerb.fine_tuning.jsonl import (
    append_jsonl,
    merge_jsonl,
    validate_jsonl,
    count_jsonl_lines,
)
# Create training examples
examples = []
for i in range(10):
    examples.append(TrainingExample(
        messages=[
            {"role": "user", "content": f"How do I use Python feature {i}?"},
            {"role": "assistant", "content": f"Here's how to use feature {i}: example_code()"}
        ],
        metadata={"category": "coding", "index": i}
    ))
dataset = TrainingDataset(
    examples=examples,
    format=DatasetFormat.CHAT,
    metadata={"source": "coding_qa"}
)
# Convert to OpenAI format and write to JSONL
data = to_openai_format(dataset)
write_jsonl(data, "training_data.jsonl")
# Validate the JSONL file
result = validate_jsonl("training_data.jsonl")
print(f"Valid: {result.is_valid}, Examples: {result.total_examples}")
# Count lines efficiently
count = count_jsonl_lines("training_data.jsonl")
print(f"Total examples: {count}")
Apache 2.0 License - see LICENSE for details.