Lightweight, provider-agnostic utilities to call LLMs and parse their outputs with Pydantic. Build once, run on any LLM provider.
- 🔄 Multi-Provider Support: OpenAI, Anthropic (Claude), Google (Gemini), Mistral AI, Cohere
- 🔗 Composable Chains: Pipe operators for elegant prompt → LLM → parser workflows
- ✅ Type-Safe Parsing: Validate LLM outputs with Pydantic models
- 📄 Smart Chunking: Multiple strategies for splitting documents in RAG pipelines
- 📦 Minimal Core: Only install what you need with optional provider dependencies
- 🔧 Integration Ready: Seamless integration with prompting-forge for prompt versioning
pip install genai-forge

# Anthropic (Claude)
pip install genai-forge[anthropic]
# Google (Gemini)
pip install genai-forge[google]
# Mistral AI
pip install genai-forge[mistral]
# Cohere
pip install genai-forge[cohere]
# Token-based chunking support
pip install genai-forge[chunkers]
# All providers and features
pip install genai-forge[all]

- Python 3.10+
- API keys for your chosen provider(s)
Create a .env file in your project root:
# OpenAI
OPENAI_API_KEY=sk-...
# Anthropic (Claude)
ANTHROPIC_API_KEY=sk-ant-...
# Google (Gemini)
GOOGLE_API_KEY=AIza...
# Mistral AI
MISTRAL_API_KEY=...
# Cohere
COHERE_API_KEY=...

genai-forge automatically loads .env files via python-dotenv.
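If your .env lives somewhere other than the project root, you can load it yourself with python-dotenv before creating any clients (the path below is a placeholder):

```python
from dotenv import load_dotenv

# genai-forge does this for you for a project-root .env;
# call it manually only for a custom location
load_dotenv("/path/to/custom/.env")
```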
from genai_forge import get_llm
from prompting_forge.prompting import PromptTemplate
# Create a prompt template
template = PromptTemplate(
system="You are a concise expert assistant.",
template="Generate one actionable tip.\nAudience: {audience}\nTime: {time}",
)
# Create an LLM (choose your provider)
llm = get_llm("openai:gpt-4o-mini", temperature=0.2)
# Chain: query | template | llm
query = "Provide a short productivity tip."
chain = query | template | llm
result = chain({"audience": "Backend Python developer", "time": "30 minutes"})
print(result)

# GPT-4o models
llm = get_llm("openai:gpt-4o", temperature=0.3)
llm = get_llm("openai:gpt-4o-mini", temperature=0.3)
# GPT-4 models
llm = get_llm("openai:gpt-4-turbo", temperature=0.3)
llm = get_llm("openai:gpt-4", temperature=0.3)
# GPT-3.5 models
llm = get_llm("openai:gpt-3.5-turbo", temperature=0.3)Environment Variable: OPENAI_API_KEY
# Claude 3.5 models
llm = get_llm("anthropic:claude-3-5-sonnet-20241022", temperature=0.3)
llm = get_llm("anthropic:claude-3-5-haiku-20241022", temperature=0.3)
# Claude 3 models
llm = get_llm("anthropic:claude-3-opus-20240229", temperature=0.3)
llm = get_llm("anthropic:claude-3-sonnet-20240229", temperature=0.3)
llm = get_llm("anthropic:claude-3-haiku-20240307", temperature=0.3)Environment Variable: ANTHROPIC_API_KEY
# Gemini 2.0 models
llm = get_llm("google:gemini-2.0-flash-exp", temperature=0.3)
# Gemini 1.5 models
llm = get_llm("google:gemini-1.5-pro", temperature=0.3)
llm = get_llm("google:gemini-1.5-flash", temperature=0.3)
llm = get_llm("google:gemini-1.5-flash-8b", temperature=0.3)Environment Variable: GOOGLE_API_KEY
# Mistral models
llm = get_llm("mistral:mistral-large-latest", temperature=0.3)
llm = get_llm("mistral:mistral-medium-latest", temperature=0.3)
llm = get_llm("mistral:mistral-small-latest", temperature=0.3)
# Open models
llm = get_llm("mistral:open-mistral-7b", temperature=0.3)
llm = get_llm("mistral:open-mixtral-8x7b", temperature=0.3)
llm = get_llm("mistral:open-mixtral-8x22b", temperature=0.3)Environment Variable: MISTRAL_API_KEY
# Command R models
llm = get_llm("cohere:command-r-plus", temperature=0.3)
llm = get_llm("cohere:command-r", temperature=0.3)
# Command models
llm = get_llm("cohere:command", temperature=0.3)
llm = get_llm("cohere:command-light", temperature=0.3)Environment Variable: COHERE_API_KEY
Use PydanticOutputParser to have the LLM return JSON that is parsed and validated into a Pydantic model. Format instructions are automatically injected into your prompt.
from typing import List
from pydantic import BaseModel
from genai_forge import get_llm, PydanticOutputParser
from prompting_forge.prompting import PromptTemplate
class CityPlan(BaseModel):
city: str
attractions: List[str]
days: int
template = PromptTemplate(
system="You are a helpful travel planner.",
template="Create a city plan.\nCity: {city}\nDays: {days}",
)
# Use any provider you want
llm = get_llm("anthropic:claude-3-5-haiku-20241022", temperature=0.1)
parser = PydanticOutputParser(CityPlan)
# Chain: query | template | llm | parser
query = "Create a 3-day city plan for Tokyo."
chain = query | template | llm | parser
result = chain({"city": "Tokyo", "days": 3}) # -> CityPlan instance
print(f"City: {result.city}")
print(f"Days: {result.days}")
print(f"Attractions: {', '.join(result.attractions)}")PydanticOutputParser:
- Accepts tolerant output formats (e.g., extra text or ```json fences)
- Extracts JSON from the LLM response
- Validates against your Pydantic model
- Automatically injects format instructions when used in a chain
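For example, a chatty response wrapped in code fences (a made-up LLM output) still parses into the CityPlan model from the example above:

```python
raw = 'Here is your plan!\n```json\n{"city": "Tokyo", "attractions": ["Senso-ji", "Shibuya"], "days": 3}\n```'
plan = parser.parse(raw)  # extracts the JSON and validates it -> CityPlan instance
print(plan.attractions)
```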
API surface:
from genai_forge import PydanticOutputParser, BaseOutputParser, OutputParserException
parser = PydanticOutputParser(YourModel)
instructions = parser.get_format_instructions() # JSON schema for the LLM
validated_obj = parser.parse(llm_output_text) # Parsed & validated model

The | operator builds elegant pipelines:
# Simple chain
chain = template | llm
# With parser
chain = template | llm | parser
# With query
chain = query | template | llm | parser
# Execute
result = chain(context_variables)

What happens:
- query + template → renders the prompt with context variables
- llm → sends the prompt to the LLM provider
- parser → validates and parses the response (format instructions auto-injected)
genai-forge provides flexible text chunking for document processing and RAG (Retrieval-Augmented Generation) pipelines.
Simple chunking by character count with overlap:
from genai_forge import get_chunker
chunker = get_chunker(
"character",
chunk_size=500,
chunk_overlap=50
)
chunks = chunker.chunk(long_text)
for chunk in chunks:
print(f"Chunk {chunk.chunk_index}: {len(chunk.text)} chars")Chunk by token count (requires tiktoken):
chunker = get_chunker(
"token",
chunk_size=512,
chunk_overlap=50,
encoding_name="cl100k_base" # GPT-4 encoding
)
chunks = chunker.chunk(long_text)
for chunk in chunks:
print(f"Tokens: {chunk.metadata['token_count']}")Install tiktoken: pip install genai-forge[chunkers]
Respects sentence boundaries for better semantic coherence:
chunker = get_chunker(
"sentence",
chunk_size=1000,
chunk_overlap=200,
respect_sentences=True
)
chunks = chunker.chunk(long_text)
for chunk in chunks:
print(f"Sentences: {chunk.metadata['sentence_count']}")Groups sentences by semantic similarity using embeddings:
from genai_forge import get_chunker, get_embedding
# Get embedding model
embedding = get_embedding("openai:text-embedding-3-small")
# Create semantic chunker
chunker = get_chunker(
"semantic",
embedding=embedding,
chunk_size=1000,
similarity_threshold=0.7 # Higher values require sentences to be more similar to share a chunk
)
chunks = chunker.chunk(long_text)

Putting it all together in a complete RAG pipeline:

from genai_forge import get_chunker, get_embedding, get_llm
from prompting_forge.prompting import PromptTemplate
# 1. Chunk your documents
chunker = get_chunker("sentence", chunk_size=500, chunk_overlap=100)
chunks = chunker.chunk(document_text)
# 2. Create embeddings
embedding = get_embedding("openai:text-embedding-3-small")
chunk_embeddings = embedding([c.text for c in chunks])
# 3. Store in vector DB (simplified example)
doc_store = [
{"text": chunk.text, "embedding": emb}
for chunk, emb in zip(chunks, chunk_embeddings)
]
# 4. Query
query = "What is machine learning?"
query_embedding = embedding(query)
# 5. Find relevant chunks (simplified similarity search)
def cosine_similarity(a, b):
dot = sum(x * y for x, y in zip(a, b))
mag_a = sum(x * x for x in a) ** 0.5
mag_b = sum(y * y for y in b) ** 0.5
return dot / (mag_a * mag_b)
similarities = [
(doc, cosine_similarity(query_embedding, doc["embedding"]))
for doc in doc_store
]
similarities.sort(key=lambda x: x[1], reverse=True)
relevant_context = similarities[0][0]["text"]
# 6. Generate answer with LLM
template = PromptTemplate(
system="Answer based on the provided context.",
template="Context: {context}\n\nQuestion: {question}"
)
llm = get_llm("openai:gpt-4o-mini")
chain = template | llm
answer = chain({"context": relevant_context, "question": query})
print(answer)

Create your own chunking strategy:
from genai_forge import BaseChunker, Chunk, get_chunker
from genai_forge.chunkers.registry import register_chunker
@register_chunker("custom")
class CustomChunker(BaseChunker):
    def chunk(self, text: str, **kwargs):
        # Split on blank lines and emit one chunk per paragraph,
        # tracking each paragraph's offset within the original text
        paragraphs = text.split("\n\n")
        chunks = []
        offset = 0
        for i, para in enumerate(paragraphs):
            chunks.append(
                self._create_chunk(
                    text=para,
                    start_index=offset,
                    end_index=offset + len(para),
                    chunk_index=i,
                )
            )
            offset += len(para) + 2  # skip the "\n\n" separator
        return chunks
# Use it
chunker = get_chunker("custom")
chunks = chunker.chunk(text)

genai-forge also supports embedding models for generating vector representations of text.
from genai_forge import get_embedding
# Create an embedding model
embedding = get_embedding("openai:text-embedding-3-small")
# Embed a single text
vector = embedding("Hello, world!")
print(f"Embedding dimension: {len(vector)}")
# Embed multiple texts
texts = ["First document", "Second document", "Third document"]
vectors = embedding(texts)
print(f"Number of vectors: {len(vectors)}")# Latest V3 models
emb = get_embedding("openai:text-embedding-3-large") # 3072 dimensions
emb = get_embedding("openai:text-embedding-3-small") # 1536 dimensions
# Legacy V2 model
emb = get_embedding("openai:text-embedding-ada-002") # 1536 dimensionsEnvironment Variable: OPENAI_API_KEY
emb = get_embedding("google:text-embedding-004") # 768 dimensions
emb = get_embedding("google:embedding-001") # 768 dimensions (legacy)Environment Variable: GOOGLE_API_KEY
emb = get_embedding("mistral:mistral-embed") # 1024 dimensionsEnvironment Variable: MISTRAL_API_KEY
# Standard models
emb = get_embedding("cohere:embed-english-v3.0") # 1024 dimensions
emb = get_embedding("cohere:embed-multilingual-v3.0") # 1024 dimensions
# Lightweight models
emb = get_embedding("cohere:embed-english-light-v3.0") # 384 dimensions
emb = get_embedding("cohere:embed-multilingual-light-v3.0") # 384 dimensionsEnvironment Variable: COHERE_API_KEY
from genai_forge import get_embedding
import numpy as np
# Initialize embedding model
embedding = get_embedding("openai:text-embedding-3-small")
# Documents to search
documents = [
"Python is a programming language",
"Machine learning uses algorithms",
"Natural language processing analyzes text",
]
# Embed documents
doc_vectors = embedding(documents)
# Query
query = "What is NLP?"
query_vector = embedding(query)
# Compute cosine similarities
def cosine_similarity(a, b):
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
similarities = [cosine_similarity(query_vector, doc) for doc in doc_vectors]
# Find most similar document
best_idx = np.argmax(similarities)
print(f"Most relevant: {documents[best_idx]}")
print(f"Similarity: {similarities[best_idx]:.4f}")from genai_forge import get_embedding
from sklearn.cluster import KMeans
embedding = get_embedding("openai:text-embedding-3-small")
docs = [
"Python programming tutorial",
"Java development guide",
"Cooking recipes for beginners",
"Advanced Python techniques",
"Italian cuisine recipes",
]
# Get embeddings
vectors = embedding(docs)
# Cluster
kmeans = KMeans(n_clusters=2, random_state=0)
labels = kmeans.fit_predict(vectors)
for doc, label in zip(docs, labels):
print(f"Cluster {label}: {doc}")Compare outputs from different providers on the same prompt:
from genai_forge import get_llm
from prompting_forge.prompting import PromptTemplate
template = PromptTemplate(
system="You are a creative writer.",
template="Write a haiku about {topic}.",
)
providers = [
"openai:gpt-4o-mini",
"anthropic:claude-3-5-haiku-20241022",
"google:gemini-1.5-flash",
"mistral:mistral-small-latest",
"cohere:command-light",
]
context = {"topic": "artificial intelligence"}
for provider_model in providers:
try:
llm = get_llm(provider_model, temperature=0.7)
chain = template | llm
result = chain(context)
print(f"\n{provider_model}:")
print(result)
except Exception as e:
print(f"\n{provider_model}: ERROR - {e}")LLMCall provides advanced features like prompt versioning and call logging:
from genai_forge import get_llm
from genai_forge.llm import LLMCall
from prompting_forge.prompting import PromptTemplate
template = PromptTemplate(
system="You are a helpful assistant.",
template="Explain {concept} in simple terms.",
)
llm = get_llm("openai:gpt-4o-mini")
# Create an LLMCall with versioning
call = LLMCall(
query="Explain the concept clearly",
prompt_template=template,
client=llm,
name="explainer_assistant",
enable_versioning=True, # Saves call records to .llm_call/
)
# Execute
rendered_prompt, response = call.run({"concept": "quantum computing"})
print("Rendered:", rendered_prompt)
print("Response:", response)Call records are saved to .llm_call/{name}/{timestamp}.json with full request/response details.
genai-forge works seamlessly with prompting-forge for prompt versioning and synthesis:
from prompting_forge.prompting import PromptTemplate, FinalPromptTemplate
from genai_forge import get_llm
from genai_forge.llm import LLMCall
# Create versioned prompts
v1 = PromptTemplate(
system="You are a helpful assistant.",
template="Translate: {text}",
instance_name="translator"
)
v2 = PromptTemplate(
system="You are a professional translator.",
template="Translate the following text to {language}:\n{text}",
instance_name="translator"
)
# Synthesize final prompt from versions
llm = get_llm("openai:gpt-4o")
final = FinalPromptTemplate(
instance_name="translator",
variables=["text", "language"],
llm_client=llm
)
# Use final prompt in production
call = LLMCall(
query="Translate this text",
prompt_template=final,
client=llm,
name="production_translator"
)
rendered, result = call.run({"text": "Hello, world!", "language": "Spanish"})

See the prompting-forge documentation for more details.
If you don't specify a provider, OpenAI is used by default:
llm = get_llm("gpt-4o-mini") # Same as "openai:gpt-4o-mini"Always recommended for clarity:
llm = get_llm("openai:gpt-4o-mini")
llm = get_llm("anthropic:claude-3-5-sonnet-20241022")Pass the API key directly instead of using environment variables:
llm = get_llm(
"anthropic:claude-3-5-haiku-20241022",
api_key="sk-ant-your-key-here",
temperature=0.2
)

from genai_forge import get_llm, OutputParserException
try:
llm = get_llm("unknown:model")
except ValueError as e:
print(f"Unknown provider: {e}")
try:
    result = parser.parse(invalid_json)  # any PydanticOutputParser / malformed LLM output
except OutputParserException as e:
print(f"Parsing failed: {e}")Example files are included in the repository:
example.py - LLM and embedding examples:
- Multiple provider usage
- PromptTemplate with system prompts
- PydanticOutputParser for structured outputs
- Embedding models for semantic search
example_chunkers.py - Text chunking for RAG:
- Character, token, sentence, and semantic chunking
- Custom chunker implementation
- Complete RAG pipeline example
Ensure you have a .env with your API keys, then:
python example.py
python example_chunkers.py

Project structure:

genai_forge/
├── __init__.py # Public API
├── llm/ # LLM core
│ ├── base.py # BaseLLM, LLM protocol
│ ├── registry.py # Provider registry & factory
│ └── llm_call.py # LLMCall with versioning
├── embeddings/ # Embedding models
│ ├── base.py # BaseEmbedding, Embedding protocol
│ └── registry.py # Embedding registry & factory
├── chunkers/ # Text chunking for RAG
│ ├── base.py # BaseChunker, Chunker protocol, Chunk
│ ├── registry.py # Chunker registry & factory
│ ├── character_chunker.py # Character-based chunking
│ ├── token_chunker.py # Token-based chunking
│ ├── sentence_chunker.py # Sentence-aware chunking
│ └── semantic_chunker.py # Semantic chunking with embeddings
├── parsing/ # Output parsers
│ └── output_parser.py # PydanticOutputParser
├── providers/ # LLM providers
│ ├── openai.py # OpenAI
│ ├── anthropic.py # Anthropic (Claude)
│ ├── google.py # Google (Gemini)
│ ├── mistral.py # Mistral AI
│ └── cohere.py # Cohere
└── embedding_providers/ # Embedding providers
├── openai_emb.py # OpenAI embeddings
├── google_emb.py # Google embeddings
├── mistral_emb.py # Mistral embeddings
└── cohere_emb.py # Cohere embeddings
See ARCHITECTURE.md for detailed design documentation.
get_llm(model: str, **kwargs) -> LLM
- model: Provider and model name (e.g., "openai:gpt-4o-mini")
- temperature: Sampling temperature (default: 0.3)
- api_key: Optional API key override
- provider: Optional explicit provider name
- logger: Optional logger instance
- Returns: LLM instance (callable)
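A sketch combining the keyword arguments above (the key is a placeholder, and the assumption is that an explicit provider can be paired with a bare model name):

```python
import logging

from genai_forge import get_llm

llm = get_llm(
    "gpt-4o-mini",
    provider="openai",  # explicit provider instead of the "openai:" prefix
    api_key="sk-...",   # placeholder; overrides the OPENAI_API_KEY env var
    temperature=0.5,
    logger=logging.getLogger("genai_forge"),
)
```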
get_embedding(model: str, **kwargs) -> Embedding
- model: Provider and model name (e.g., "openai:text-embedding-3-small")
- api_key: Optional API key override
- provider: Optional explicit provider name
- logger: Optional logger instance
- Returns: Embedding instance (callable that takes text and returns vectors)
get_chunker(name: str, **kwargs) -> Chunker
- name: Chunker type ("character", "token", "sentence", "semantic")
- chunk_size: Target size for each chunk
- chunk_overlap: Overlap between consecutive chunks
- Other chunker-specific parameters
- Returns: Chunker instance (callable that takes text and returns list of Chunks)
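For reference, the Chunk fields used throughout this README in one place (a sketch; the attribute names are taken from the examples above and may vary by version):

```python
from genai_forge import get_chunker

long_text = "Lorem ipsum dolor sit amet. " * 100
chunker = get_chunker("character", chunk_size=500, chunk_overlap=50)

for chunk in chunker.chunk(long_text):
    # text, chunk_index, start_index, end_index, and metadata all appear in earlier examples
    print(chunk.chunk_index, chunk.start_index, chunk.end_index, len(chunk.text), chunk.metadata)
```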
PydanticOutputParser(model: Type[T], strict: bool = True)
- model: Pydantic model class
- strict: Whether to enforce strict validation
- Methods:
  - get_format_instructions() -> str
  - parse(text: str) -> T
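A minimal sketch of both methods (grounded in the signatures above; the assumption is that strict=False simply relaxes validation, as the parameter name suggests):

```python
from pydantic import BaseModel

from genai_forge import PydanticOutputParser

class Tip(BaseModel):
    text: str

parser = PydanticOutputParser(Tip, strict=False)        # strict=True is the default
print(parser.get_format_instructions())                 # JSON schema to embed in a prompt
tip = parser.parse('{"text": "Take regular breaks."}')  # -> Tip instance
```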
LLMCall(query, prompt_template, client, **kwargs)
- query: User query string
- prompt_template: PromptTemplate instance
- client: LLM instance
- output_parser: Optional parser
- name: Instance name for versioning
- enable_versioning: Save call records
- version_root: Root directory for versioning
- Methods:
  - run(context: dict) -> tuple[str, Any]
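A sketch wiring in the optional output_parser (hypothetical usage; it assumes the parsed object comes back as the second element of the returned tuple, in place of the raw response):

```python
from pydantic import BaseModel

from genai_forge import get_llm, PydanticOutputParser
from genai_forge.llm import LLMCall
from prompting_forge.prompting import PromptTemplate

class Summary(BaseModel):
    text: str

template = PromptTemplate(
    system="You are a concise assistant.",
    template="Summarize this topic: {topic}",
)
call = LLMCall(
    query="Summarize briefly",
    prompt_template=template,
    client=get_llm("openai:gpt-4o-mini"),
    output_parser=PydanticOutputParser(Summary),  # documented kwarg; behavior assumed
    name="summarizer",
)
rendered, summary = call.run({"topic": "vector databases"})
```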
Yes! Each get_llm() call creates an independent LLM instance:
openai_llm = get_llm("openai:gpt-4o-mini")
claude_llm = get_llm("anthropic:claude-3-5-sonnet-20241022")
gemini_llm = get_llm("google:gemini-1.5-pro")No. Only install the providers you need:
pip install genai-forge[anthropic,google] # Only Anthropic and Google

genai-forge abstracts provider differences. Update the library version, and your code should continue working.
See ARCHITECTURE.md § Extensibility for a guide on implementing custom providers.
Not yet. Async support is planned for a future release.
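Until then, one workaround is to run a synchronous chain in a worker thread from async code (a sketch, not a library feature):

```python
import asyncio

from genai_forge import get_llm
from prompting_forge.prompting import PromptTemplate

template = PromptTemplate(system="You are concise.", template="Summarize: {text}")
chain = template | get_llm("openai:gpt-4o-mini")

async def main():
    # Chains are plain synchronous callables, so off-load them with asyncio.to_thread
    result = await asyncio.to_thread(chain, {"text": "Large language models are ..."})
    print(result)

asyncio.run(main())
```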
Contributions are welcome! Areas for improvement:
- Additional providers (Hugging Face, AI21, etc.)
- Async support
- Streaming responses
- Enhanced error handling
- More examples
- ✨ Added multi-provider support: Anthropic, Google, Mistral, Cohere
- 📚 Comprehensive ARCHITECTURE.md documentation
- 🔧 Optional provider dependencies
- 📦 Improved package structure
- 🚀 Initial release with OpenAI support
- ✅ Pydantic output parsing
- 🔗 Chaining with pipe operator
- 📝 Prompt versioning with LLMCall
See LICENSE.
- Repository: github.com/ToolForge-AI/genai-forge
- Issues: github.com/ToolForge-AI/genai-forge/issues
- Related: prompting-forge - Prompt versioning and synthesis