Lightweight, provider-agnostic utilities to call LLMs and parse their outputs with Pydantic. Build once, run on any LLM provider.
- 🔄 Multi-Provider Support: OpenAI, Anthropic (Claude), Google (Gemini), Mistral AI, Cohere
- 🔗 Composable Chains: Pipe operators for elegant prompt → LLM → parser workflows
- ✅ Type-Safe Parsing: Validate LLM outputs with Pydantic models
- 📄 Smart Chunking: Multiple strategies for splitting documents in RAG pipelines
- 📦 Minimal Core: Only install what you need with optional provider dependencies
- 🔧 Integration Ready: Seamless integration with prompting-forge for prompt versioning
pip install genai-forge

# Anthropic (Claude)
pip install genai-forge[anthropic]
# Google (Gemini)
pip install genai-forge[google]
# Mistral AI
pip install genai-forge[mistral]
# Cohere
pip install genai-forge[cohere]
# Token-based chunking support
pip install genai-forge[chunkers]
# All providers and features
pip install genai-forge[all]

- Python 3.10+
- API keys for your chosen provider(s)
Create a .env file in your project root:
# OpenAI
OPENAI_API_KEY=sk-...
# Anthropic (Claude)
ANTHROPIC_API_KEY=sk-ant-...
# Google (Gemini)
GOOGLE_API_KEY=AIza...
# Mistral AI
MISTRAL_API_KEY=...
# Cohere
COHERE_API_KEY=...

genai-forge automatically loads .env files via python-dotenv.
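If your .env lives somewhere other than the project root, you can load it yourself with python-dotenv before creating any clients (the path below is a placeholder):

```python
from dotenv import load_dotenv

# genai-forge does this for you for a project-root .env;
# call it manually only for a custom location
load_dotenv("/path/to/custom/.env")
```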
from genai_forge import get_llm
from prompting_forge.prompting import PromptTemplate
# Create a prompt template
template = PromptTemplate(
system="You are a concise expert assistant.",
template="Generate one actionable tip.\nAudience: {audience}\nTime: {time}",
)
# Create an LLM (choose your provider)
llm = get_llm("openai:gpt-4o-mini", temperature=0.2)
# Chain: query | template | llm
query = "Provide a short productivity tip."
chain = query | template | llm
result = chain({"audience": "Backend Python developer", "time": "30 minutes"})
print(result)

# GPT-4o models
llm = get_llm("openai:gpt-4o", temperature=0.3)
llm = get_llm("openai:gpt-4o-mini", temperature=0.3)
# GPT-4 models
llm = get_llm("openai:gpt-4-turbo", temperature=0.3)
llm = get_llm("openai:gpt-4", temperature=0.3)
# GPT-3.5 models
llm = get_llm("openai:gpt-3.5-turbo", temperature=0.3)Environment Variable: OPENAI_API_KEY
# Claude 3.5 models
llm = get_llm("anthropic:claude-3-5-sonnet-20241022", temperature=0.3)
llm = get_llm("anthropic:claude-3-5-haiku-20241022", temperature=0.3)
# Claude 3 models
llm = get_llm("anthropic:claude-3-opus-20240229", temperature=0.3)
llm = get_llm("anthropic:claude-3-sonnet-20240229", temperature=0.3)
llm = get_llm("anthropic:claude-3-haiku-20240307", temperature=0.3)Environment Variable: ANTHROPIC_API_KEY
# Gemini 2.0 models
llm = get_llm("google:gemini-2.0-flash-exp", temperature=0.3)
# Gemini 1.5 models
llm = get_llm("google:gemini-1.5-pro", temperature=0.3)
llm = get_llm("google:gemini-1.5-flash", temperature=0.3)
llm = get_llm("google:gemini-1.5-flash-8b", temperature=0.3)Environment Variable: GOOGLE_API_KEY
# Mistral models
llm = get_llm("mistral:mistral-large-latest", temperature=0.3)
llm = get_llm("mistral:mistral-medium-latest", temperature=0.3)
llm = get_llm("mistral:mistral-small-latest", temperature=0.3)
# Open models
llm = get_llm("mistral:open-mistral-7b", temperature=0.3)
llm = get_llm("mistral:open-mixtral-8x7b", temperature=0.3)
llm = get_llm("mistral:open-mixtral-8x22b", temperature=0.3)Environment Variable: MISTRAL_API_KEY
# Command R models
llm = get_llm("cohere:command-r-plus", temperature=0.3)
llm = get_llm("cohere:command-r", temperature=0.3)
# Command models
llm = get_llm("cohere:command", temperature=0.3)
llm = get_llm("cohere:command-light", temperature=0.3)Environment Variable: COHERE_API_KEY
Use PydanticOutputParser to have the LLM return JSON that is parsed and validated into a Pydantic model. Format instructions are automatically injected into your prompt.
from typing import List
from pydantic import BaseModel
from genai_forge import get_llm, PydanticOutputParser
from prompting_forge.prompting import PromptTemplate
class CityPlan(BaseModel):
city: str
attractions: List[str]
days: int
template = PromptTemplate(
system="You are a helpful travel planner.",
template="Create a city plan.\nCity: {city}\nDays: {days}",
)
# Use any provider you want
llm = get_llm("anthropic:claude-3-5-haiku-20241022", temperature=0.1)
parser = PydanticOutputParser(CityPlan)
# Chain: query | template | llm | parser
query = "Create a 3-day city plan for Tokyo."
chain = query | template | llm | parser
result = chain({"city": "Tokyo", "days": 3}) # -> CityPlan instance
print(f"City: {result.city}")
print(f"Days: {result.days}")
print(f"Attractions: {', '.join(result.attractions)}")PydanticOutputParser:
- Accepts tolerant output formats (e.g., extra text or ```json fences)
- Extracts JSON from the LLM response
- Validates against your Pydantic model
- Automatically injects format instructions when used in a chain
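For example, a chatty response wrapped in code fences (a made-up LLM output) still parses into the CityPlan model from the example above:

```python
raw = 'Here is your plan!\n```json\n{"city": "Tokyo", "attractions": ["Senso-ji", "Shibuya"], "days": 3}\n```'
plan = parser.parse(raw)  # extracts the JSON and validates it -> CityPlan instance
print(plan.attractions)
```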
API surface:
from genai_forge import PydanticOutputParser, BaseOutputParser, OutputParserException
parser = PydanticOutputParser(YourModel)
instructions = parser.get_format_instructions() # JSON schema for the LLM
validated_obj = parser.parse(llm_output_text) # Parsed & validated model

The | operator builds elegant pipelines:
# Simple chain
chain = template | llm
# With parser
chain = template | llm | parser
# With query
chain = query | template | llm | parser
# Execute
result = chain(context_variables)

What happens:
- query + template → renders the prompt with context variables
- llm → sends the prompt to the LLM provider
- parser → validates and parses the response (format instructions auto-injected)
genai-forge provides flexible text chunking for document processing and RAG (Retrieval-Augmented Generation) pipelines.
Simple chunking by character count with overlap:
from genai_forge import get_chunker
chunker = get_chunker(
"character",
chunk_size=500,
chunk_overlap=50
)
chunks = chunker.chunk(long_text)
for chunk in chunks:
print(f"Chunk {chunk.chunk_index}: {len(chunk.text)} chars")Chunk by token count (requires tiktoken):
chunker = get_chunker(
"token",
chunk_size=512,
chunk_overlap=50,
encoding_name="cl100k_base" # GPT-4 encoding
)
chunks = chunker.chunk(long_text)
for chunk in chunks:
print(f"Tokens: {chunk.metadata['token_count']}")Install tiktoken: pip install genai-forge[chunkers]
Respects sentence boundaries for better semantic coherence:
chunker = get_chunker(
"sentence",
chunk_size=1000,
chunk_overlap=200,
respect_sentences=True
)
chunks = chunker.chunk(long_text)
for chunk in chunks:
print(f"Sentences: {chunk.metadata['sentence_count']}")Groups sentences by semantic similarity using embeddings:
from genai_forge import get_chunker, get_embedding
# Get embedding model
embedding = get_embedding("openai:text-embedding-3-small")
# Create semantic chunker
chunker = get_chunker(
"semantic",
embedding=embedding,
chunk_size=1000,
similarity_threshold=0.7 # Higher values require sentences to be more similar to share a chunk
)
chunks = chunker.chunk(long_text)

Putting it all together in a complete RAG pipeline:

from genai_forge import get_chunker, get_embedding, get_llm
from prompting_forge.prompting import PromptTemplate
# 1. Chunk your documents
chunker = get_chunker("sentence", chunk_size=500, chunk_overlap=100)
chunks = chunker.chunk(document_text)
# 2. Create embeddings
embedding = get_embedding("openai:text-embedding-3-small")
chunk_embeddings = embedding([c.text for c in chunks])
# 3. Store in vector DB (simplified example)
doc_store = [
{"text": chunk.text, "embedding": emb}
for chunk, emb in zip(chunks, chunk_embeddings)
]
# 4. Query
query = "What is machine learning?"
query_embedding = embedding(query)
# 5. Find relevant chunks (simplified similarity search)
def cosine_similarity(a, b):
dot = sum(x * y for x, y in zip(a, b))
mag_a = sum(x * x for x in a) ** 0.5
mag_b = sum(y * y for y in b) ** 0.5
return dot / (mag_a * mag_b)
similarities = [
(doc, cosine_similarity(query_embedding, doc["embedding"]))
for doc in doc_store
]
similarities.sort(key=lambda x: x[1], reverse=True)
relevant_context = similarities[0][0]["text"]
# 6. Generate answer with LLM
template = PromptTemplate(
system="Answer based on the provided context.",
template="Context: {context}\n\nQuestion: {question}"
)
llm = get_llm("openai:gpt-4o-mini")
chain = template | llm
answer = chain({"context": relevant_context, "question": query})
print(answer)

Create your own chunking strategy:
from genai_forge import BaseChunker, Chunk, get_chunker
from genai_forge.chunkers.registry import register_chunker
@register_chunker("custom")
class CustomChunker(BaseChunker):
    def chunk(self, text: str, **kwargs):
        # Split on blank lines and emit one chunk per paragraph,
        # tracking each paragraph's offset within the original text
        paragraphs = text.split("\n\n")
        chunks = []
        offset = 0
        for i, para in enumerate(paragraphs):
            chunks.append(
                self._create_chunk(
                    text=para,
                    start_index=offset,
                    end_index=offset + len(para),
                    chunk_index=i,
                )
            )
            offset += len(para) + 2  # skip the "\n\n" separator
        return chunks
# Use it
chunker = get_chunker("custom")
chunks = chunker.chunk(text)

genai-forge also supports embedding models for generating vector representations of text.
from genai_forge import get_embedding
# Create an embedding model
embedding = get_embedding("openai:text-embedding-3-small")
# Embed a single text
vector = embedding("Hello, world!")
print(f"Embedding dimension: {len(vector)}")
# Embed multiple texts
texts = ["First document", "Second document", "Third document"]
vectors = embedding(texts)
print(f"Number of vectors: {len(vectors)}")# Latest V3 models
emb = get_embedding("openai:text-embedding-3-large") # 3072 dimensions
emb = get_embedding("openai:text-embedding-3-small") # 1536 dimensions
# Legacy V2 model
emb = get_embedding("openai:text-embedding-ada-002") # 1536 dimensionsEnvironment Variable: OPENAI_API_KEY
emb = get_embedding("google:text-embedding-004") # 768 dimensions
emb = get_embedding("google:embedding-001") # 768 dimensions (legacy)Environment Variable: GOOGLE_API_KEY
emb = get_embedding("mistral:mistral-embed") # 1024 dimensionsEnvironment Variable: MISTRAL_API_KEY
# Standard models
emb = get_embedding("cohere:embed-english-v3.0") # 1024 dimensions
emb = get_embedding("cohere:embed-multilingual-v3.0") # 1024 dimensions
# Lightweight models
emb = get_embedding("cohere:embed-english-light-v3.0") # 384 dimensions
emb = get_embedding("cohere:embed-multilingual-light-v3.0") # 384 dimensionsEnvironment Variable: COHERE_API_KEY
from genai_forge import get_embedding
import numpy as np
# Initialize embedding model
embedding = get_embedding("openai:text-embedding-3-small")
# Documents to search
documents = [
"Python is a programming language",
"Machine learning uses algorithms",
"Natural language processing analyzes text",
]
# Embed documents
doc_vectors = embedding(documents)
# Query
query = "What is NLP?"
query_vector = embedding(query)
# Compute cosine similarities
def cosine_similarity(a, b):
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
similarities = [cosine_similarity(query_vector, doc) for doc in doc_vectors]
# Find most similar document
best_idx = np.argmax(similarities)
print(f"Most relevant: {documents[best_idx]}")
print(f"Similarity: {similarities[best_idx]:.4f}")from genai_forge import get_embedding
from sklearn.cluster import KMeans
embedding = get_embedding("openai:text-embedding-3-small")
docs = [
"Python programming tutorial",
"Java development guide",
"Cooking recipes for beginners",
"Advanced Python techniques",
"Italian cuisine recipes",
]
# Get embeddings
vectors = embedding(docs)
# Cluster
kmeans = KMeans(n_clusters=2, random_state=0)
labels = kmeans.fit_predict(vectors)
for doc, label in zip(docs, labels):
print(f"Cluster {label}: {doc}")Compare outputs from different providers on the same prompt:
from genai_forge import get_llm
from prompting_forge.prompting import PromptTemplate
template = PromptTemplate(
system="You are a creative writer.",
template="Write a haiku about {topic}.",
)
providers = [
"openai:gpt-4o-mini",
"anthropic:claude-3-5-haiku-20241022",
"google:gemini-1.5-flash",
"mistral:mistral-small-latest",
"cohere:command-light",
]
context = {"topic": "artificial intelligence"}
for provider_model in providers:
try:
llm = get_llm(provider_model, temperature=0.7)
chain = template | llm
result = chain(context)
print(f"\n{provider_model}:")
print(result)
except Exception as e:
print(f"\n{provider_model}: ERROR - {e}")LLMCall provides advanced features like prompt versioning and call logging:
from genai_forge import get_llm
from genai_forge.llm import LLMCall
from prompting_forge.prompting import PromptTemplate
template = PromptTemplate(
system="You are a helpful assistant.",
template="Explain {concept} in simple terms.",
)
llm = get_llm("openai:gpt-4o-mini")
# Create an LLMCall with versioning
call = LLMCall(
query="Explain the concept clearly",
prompt_template=template,
client=llm,
name="explainer_assistant",
enable_versioning=True, # Saves call records to .llm_call/
)
# Execute
rendered_prompt, response = call.run({"concept": "quantum computing"})
print("Rendered:", rendered_prompt)
print("Response:", response)Call records are saved to .llm_call/{name}/{timestamp}.json with full request/response details.
genai-forge works seamlessly with prompting-forge for prompt versioning and synthesis:
from prompting_forge.prompting import PromptTemplate, FinalPromptTemplate
from genai_forge import get_llm
from genai_forge.llm import LLMCall
# Create versioned prompts
v1 = PromptTemplate(
system="You are a helpful assistant.",
template="Translate: {text}",
instance_name="translator"
)
v2 = PromptTemplate(
system="You are a professional translator.",
template="Translate the following text to {language}:\n{text}",
instance_name="translator"
)
# Synthesize final prompt from versions
llm = get_llm("openai:gpt-4o")
final = FinalPromptTemplate(
instance_name="translator",
variables=["text", "language"],
llm_client=llm
)
# Use final prompt in production
call = LLMCall(
query="Translate this text",
prompt_template=final,
client=llm,
name="production_translator"
)
rendered, result = call.run({"text": "Hello, world!", "language": "Spanish"})

See the prompting-forge documentation for more details.
If you don't specify a provider, OpenAI is used by default:
llm = get_llm("gpt-4o-mini") # Same as "openai:gpt-4o-mini"Always recommended for clarity:
llm = get_llm("openai:gpt-4o-mini")
llm = get_llm("anthropic:claude-3-5-sonnet-20241022")Pass the API key directly instead of using environment variables:
llm = get_llm(
"anthropic:claude-3-5-haiku-20241022",
api_key="sk-ant-your-key-here",
temperature=0.2
)

from genai_forge import get_llm, OutputParserException
try:
llm = get_llm("unknown:model")
except ValueError as e:
print(f"Unknown provider: {e}")
try:
    result = parser.parse(invalid_json)  # any PydanticOutputParser / malformed LLM output
except OutputParserException as e:
print(f"Parsing failed: {e}")Example files are included in the repository:
example.py - LLM and embedding examples:
- Multiple provider usage
- PromptTemplate with system prompts
- PydanticOutputParser for structured outputs
- Embedding models for semantic search
example_chunkers.py - Text chunking for RAG:
- Character, token, sentence, and semantic chunking
- Custom chunker implementation
- Complete RAG pipeline example
Ensure you have a .env with your API keys, then:
python example.py
python example_chunkers.py

Project structure:

genai_forge/
├── __init__.py # Public API
├── llm/ # LLM core
│ ├── base.py # BaseLLM, LLM protocol
│ ├── registry.py # Provider registry & factory
│ └── llm_call.py # LLMCall with versioning
├── embeddings/ # Embedding models
│ ├── base.py # BaseEmbedding, Embedding protocol
│ └── registry.py # Embedding registry & factory
├── chunkers/ # Text chunking for RAG
│ ├── base.py # BaseChunker, Chunker protocol, Chunk
│ ├── registry.py # Chunker registry & factory
│ ├── character_chunker.py # Character-based chunking
│ ├── token_chunker.py # Token-based chunking
│ ├── sentence_chunker.py # Sentence-aware chunking
│ └── semantic_chunker.py # Semantic chunking with embeddings
├── parsing/ # Output parsers
│ └── output_parser.py # PydanticOutputParser
├── providers/ # LLM providers
│ ├── openai.py # OpenAI
│ ├── anthropic.py # Anthropic (Claude)
│ ├── google.py # Google (Gemini)
│ ├── mistral.py # Mistral AI
│ └── cohere.py # Cohere
└── embedding_providers/ # Embedding providers
├── openai_emb.py # OpenAI embeddings
├── google_emb.py # Google embeddings
├── mistral_emb.py # Mistral embeddings
└── cohere_emb.py # Cohere embeddings
See ARCHITECTURE.md for detailed design documentation.
get_llm(model: str, **kwargs) -> LLM
- model: Provider and model name (e.g., "openai:gpt-4o-mini")
- temperature: Sampling temperature (default: 0.3)
- api_key: Optional API key override
- provider: Optional explicit provider name
- logger: Optional logger instance
- Returns: LLM instance (callable)
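A sketch combining the keyword arguments above (the key is a placeholder, and the assumption is that an explicit provider can be paired with a bare model name):

```python
import logging

from genai_forge import get_llm

llm = get_llm(
    "gpt-4o-mini",
    provider="openai",  # explicit provider instead of the "openai:" prefix
    api_key="sk-...",   # placeholder; overrides the OPENAI_API_KEY env var
    temperature=0.5,
    logger=logging.getLogger("genai_forge"),
)
```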
get_embedding(model: str, **kwargs) -> Embedding
- model: Provider and model name (e.g., "openai:text-embedding-3-small")
- api_key: Optional API key override
- provider: Optional explicit provider name
- logger: Optional logger instance
- Returns: Embedding instance (callable that takes text and returns vectors)
get_chunker(name: str, **kwargs) -> Chunker
- name: Chunker type ("character", "token", "sentence", "semantic")
- chunk_size: Target size for each chunk
- chunk_overlap: Overlap between consecutive chunks
- Other chunker-specific parameters
- Returns: Chunker instance (callable that takes text and returns list of Chunks)
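For reference, the Chunk fields used throughout this README in one place (a sketch; the attribute names are taken from the examples above and may vary by version):

```python
from genai_forge import get_chunker

long_text = "Lorem ipsum dolor sit amet. " * 100
chunker = get_chunker("character", chunk_size=500, chunk_overlap=50)

for chunk in chunker.chunk(long_text):
    # text, chunk_index, start_index, end_index, and metadata all appear in earlier examples
    print(chunk.chunk_index, chunk.start_index, chunk.end_index, len(chunk.text), chunk.metadata)
```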
PydanticOutputParser(model: Type[T], strict: bool = True)
- model: Pydantic model class
- strict: Whether to enforce strict validation
- Methods:
  - get_format_instructions() -> str
  - parse(text: str) -> T
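A minimal sketch of both methods (grounded in the signatures above; the assumption is that strict=False simply relaxes validation, as the parameter name suggests):

```python
from pydantic import BaseModel

from genai_forge import PydanticOutputParser

class Tip(BaseModel):
    text: str

parser = PydanticOutputParser(Tip, strict=False)        # strict=True is the default
print(parser.get_format_instructions())                 # JSON schema to embed in a prompt
tip = parser.parse('{"text": "Take regular breaks."}')  # -> Tip instance
```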
LLMCall(query, prompt_template, client, **kwargs)
- query: User query string
- prompt_template: PromptTemplate instance
- client: LLM instance
- output_parser: Optional parser
- name: Instance name for versioning
- enable_versioning: Save call records
- version_root: Root directory for versioning
- Methods:
  - run(context: dict) -> tuple[str, Any]
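A sketch wiring in the optional output_parser (hypothetical usage; it assumes the parsed object comes back as the second element of the returned tuple, in place of the raw response):

```python
from pydantic import BaseModel

from genai_forge import get_llm, PydanticOutputParser
from genai_forge.llm import LLMCall
from prompting_forge.prompting import PromptTemplate

class Summary(BaseModel):
    text: str

template = PromptTemplate(
    system="You are a concise assistant.",
    template="Summarize this topic: {topic}",
)
call = LLMCall(
    query="Summarize briefly",
    prompt_template=template,
    client=get_llm("openai:gpt-4o-mini"),
    output_parser=PydanticOutputParser(Summary),  # documented kwarg; behavior assumed
    name="summarizer",
)
rendered, summary = call.run({"topic": "vector databases"})
```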
Yes! Each get_llm() call creates an independent LLM instance:
openai_llm = get_llm("openai:gpt-4o-mini")
claude_llm = get_llm("anthropic:claude-3-5-sonnet-20241022")
gemini_llm = get_llm("google:gemini-1.5-pro")No. Only install the providers you need:
pip install genai-forge[anthropic,google] # Only Anthropic and Google

genai-forge abstracts provider differences. Update the library version, and your code should continue working.
See ARCHITECTURE.md § Extensibility for a guide on implementing custom providers.
Not yet. Async support is planned for a future release.
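Until then, one workaround is to run a synchronous chain in a worker thread from async code (a sketch, not a library feature):

```python
import asyncio

from genai_forge import get_llm
from prompting_forge.prompting import PromptTemplate

template = PromptTemplate(system="You are concise.", template="Summarize: {text}")
chain = template | get_llm("openai:gpt-4o-mini")

async def main():
    # Chains are plain synchronous callables, so off-load them with asyncio.to_thread
    result = await asyncio.to_thread(chain, {"text": "Large language models are ..."})
    print(result)

asyncio.run(main())
```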
Contributions are welcome! Areas for improvement:
- Additional providers (Hugging Face, AI21, etc.)
- Async support
- Streaming responses
- Enhanced error handling
- More examples
- ✨ Added multi-provider support: Anthropic, Google, Mistral, Cohere
- 📚 Comprehensive ARCHITECTURE.md documentation
- 🔧 Optional provider dependencies
- 📦 Improved package structure
- 🚀 Initial release with OpenAI support
- ✅ Pydantic output parsing
- 🔗 Chaining with pipe operator
- 📝 Prompt versioning with LLMCall
See LICENSE.
- Repository: github.com/ToolForge-AI/genai-forge
- Issues: github.com/ToolForge-AI/genai-forge/issues
- Related: prompting-forge - Prompt versioning and synthesis