Modular AI agent orchestration framework. Build agent workflows as directed graphs, wire them with a fluent Python API, and run them with any LLM provider.
Built by MindMade in Slovenia.
```bash
# Everything (recommended)
pip install quartermaster-sdk

# With a specific LLM provider
pip install quartermaster-sdk[openai]
pip install quartermaster-sdk[anthropic]

# From source (for development or running examples)
git clone https://github.com/MindMadeLab/quartermaster-sdk-py.git
cd quartermaster-sdk-py
uv sync
```

The simplest possible graph -- running against a local Ollama in four lines (no `.start()`, no `.end()`, no `.build()`, no `FlowRunner` import):
```python
import quartermaster_sdk as qm

qm.configure(provider="ollama", base_url="http://localhost:11434", default_model="gemma4:26b")

result = qm.run(qm.Graph("chat").user().agent(), "Pozdravljen, koliko je ura?")  # Slovenian: "Hello, what time is it?"
print(result.text)
```

`qm.run()` accepts the builder directly and finalises it internally -- `.build()` is only needed when you want the validated `GraphSpec` for serialisation or inspection. For single-shot calls, skip the graph entirely:
```python
reply = qm.instruction(system="Respond in Slovenian.", user="Pozdravljen!")
# reply is a str.
```

For typed JSON extraction:
```python
from pydantic import BaseModel

class Classification(BaseModel):
    category: str
    priority: str

email_body = "..."  # any free-form text to classify
data = qm.instruction_form(Classification, system="Classify.", user=email_body)
# data is a Classification instance.
```

For richer flows you keep the explicit per-node configuration:
```python
agent = (
    qm.Graph("My Agent")
    .user("What can I help you with?")
    .instruction("Respond", model="gpt-4o", system_instruction="You are a helpful assistant.")
)
result = qm.run(agent, "...")
```

Attach a name to any node and read its output from `result.captures`:
```python
graph = (
    qm.Graph("enrich")
    .agent("Research", tools=[...], capture_as="notes")
    .instruction_form(CustomerData, system="Extract.", capture_as="data")  # CustomerData: a pydantic model like Classification above
)
result = qm.run(graph, "VT-Treyd Slovenija")
result["notes"].output_text  # agent's free-text research
result["data"].output_text   # form-parsed JSON
```

Streams expose filtered iterators for the common consumers:

```python
# Typewriter effect -- just the model tokens as they arrive
for token in qm.run.stream(graph, "Hello!").tokens():
    print(token, end="", flush=True)

# Dashboard view -- only the tool-call events
for call in qm.run.stream(graph, "Research Slovenia").tool_calls():
    print(f"[TOOL] {call.tool}({call.args})")

# Live progress cards -- filter by custom event name
for evt in qm.run.stream(graph, "Run the pipeline").custom(name="source_found"):
    ui.add_source(evt.payload["url"])  # ui: your frontend handle (illustrative)
```

The raw `for chunk in qm.run.stream(...)` loop still works unchanged when you want every chunk type in one place. Streams are single-pass -- pick one consumer (`.tokens()`, `.tool_calls()`, `.progress()`, `.custom()`, or raw iteration) per stream.
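When you do want the raw loop, iterate the stream itself -- a minimal sketch: `DoneChunk` is named in the trace notes below, but its import path and the other chunk attributes here are assumptions:

```python
# Sketch: raw chunk iteration -- every chunk type in one place.
from quartermaster_sdk import DoneChunk  # assumption: actual import path may differ

for chunk in qm.run.stream(graph, "Hello!"):
    if isinstance(chunk, DoneChunk):
        break  # stream fully drained; result.trace is now available (see below)
    print(type(chunk).__name__, chunk)  # dispatch on chunk type as needed
```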
After a synchronous run (or after draining a stream to its `DoneChunk`), `result.trace` exposes a structured view of every `FlowEvent` the engine emitted:
```python
result = qm.run(graph, "Hello!")
result.trace.text                         # concatenated model output
result.trace.tool_calls                   # list[dict] across every agent node
result.trace.progress                     # list[ProgressEvent]
result.trace.custom(name="source_found")  # filtered CustomEvent list
result.trace.by_node["Researcher"].text   # tokens for a single node
print(result.trace.as_jsonl())            # JSONL export for logs / fixtures
```

The LLM classifies the input and picks ONE branch -- no merge needed:
```python
from quartermaster_sdk import Graph

agent = (
    Graph("Router")
    .user("Describe your issue")
    .instruction("Classify", system_instruction="Classify as: Technical or General.")
    .decision("Category?", options=["Technical", "General"])
    .on("Technical")
        .instruction("Tech response", system_instruction="Give a technical answer.")
        .end()
    .on("General")
        .instruction("General response", system_instruction="Give a general answer.")
        .end()
    .end()
)
```

All branches run concurrently, then merge:
```python
agent = (
    Graph("Code Review")
    .user("Paste your code")
    .parallel()
    .branch()
        .instruction("Security audit", system_instruction="Check for vulnerabilities.")
    .end()
    .branch()
        .instruction("Performance check", system_instruction="Check for performance issues.")
    .end()
    .static_merge("Collect results")
    .instruction("Final report", system_instruction="Combine all findings.")
    .end()
)
```

Collect structured input with a form, capture it into variables, and render a template:

```python
agent = (
Graph("Registration")
.user("Welcome!")
.user_form("Details", parameters=[
{"name": "full_name", "type": "text", "label": "Name", "required": "true"},
{"name": "email", "type": "email", "label": "Email", "required": "true"},
])
.var("Capture name", variable="name", expression="full_name")
.text("Confirm", template="Thanks {{full_name}}, we'll email {{email}} with details.")
.end()
)from quartermaster_tools import tool
@tool()
def get_weather(city: str, units: str = "celsius") -> dict:
    """Get current weather for a city.

    Args:
        city: The city name to look up.
        units: Temperature units (celsius or fahrenheit).
    """
    return {"city": city, "temperature": 22, "units": units}

# Call it directly
result = get_weather(city="Amsterdam")

# Export JSON Schema for LLM function calling
schema = get_weather.info().to_input_schema()

# Or register in a ToolRegistry and export all at once
from quartermaster_tools import ToolRegistry

registry = ToolRegistry()
registry.register(get_weather)
schemas = registry.to_json_schema()
```

See `examples/` for runnable examples covering every pattern.
Graphs can also be driven directly through the engine:

```python
from quartermaster_engine import run_graph

# Run -- each node uses the provider/model it declares
run_graph(agent, user_input="What is quantum computing?")

# Interactive mode -- pauses at User nodes and prompts on stdin
run_graph(agent)  # no user_input = interactive
```

Nodes declare their own provider and model:
.instruction("Respond", model="claude-haiku-4-5-20251001", provider="anthropic", ...)
.instruction("Fast reply", model="llama-3.3-70b-versatile", provider="groq", ...)
.instruction("Local", model="gemma4:26b", provider="ollama", ...)Set up your API keys in a .env file at the project root:
```
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GROQ_API_KEY=gsk_...
XAI_API_KEY=xai-...
```
Output streams token-by-token in real time. Use `show_output=False` on nodes to hide internal steps (variables, conditions) from the output.
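For example, to keep the Registration graph's variable-capture step out of the streamed output -- a sketch: `show_output=False` is from the note above, but applying it to `.var()` specifically is an assumption:

```python
# Sketch: hide an internal variable-capture step from the streamed output.
.var("Capture name", variable="name", expression="full_name", show_output=False)
```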
| Package | Description |
|---|---|
| `quartermaster-sdk` | Meta-package -- installs all core packages |
| `quartermaster-graph` | Graph schema, fluent builder API, validation |
| `quartermaster-providers` | LLM provider abstraction (OpenAI, Anthropic, Google, Groq, Ollama, vLLM) |
| `quartermaster-tools` | Tool definition, registry, built-in tools |
| `quartermaster-nodes` | Node execution protocols and 40+ node implementations |
| `quartermaster-engine` | Flow execution, traversal, memory, streaming |
| `quartermaster-mcp-client` | MCP protocol client -- standalone, no framework dependency |
| `quartermaster-code-runner` | Docker sandboxed code execution -- standalone FastAPI service |
```
Your Application
        |
        v
quartermaster-engine            Flow execution, traversal, streaming
    |       |       |
    v       v       v
  graph   nodes   tools         Schema/builder, node executors, tool registry
            |
            v
        providers               OpenAI, Anthropic, Google, Groq, Ollama, vLLM, ...

quartermaster-mcp-client        Standalone MCP protocol client
quartermaster-code-runner       Standalone Docker code execution
```
- Graph -- A directed graph of nodes and edges (supports cycles via `connect()` for loops). Built with the fluent `Graph("name").user("Input")...end()` API.
- GraphSpec -- The serializable graph model (`GraphSpec` in quartermaster-graph). `qm.run(graph, ...)` finalises the builder for you; an explicit `Graph.build()` only matters when you want the validated spec to serialise or inspect. `AgentGraph` remains as a deprecated backward-compat alias.
- User Node -- Every graph typically begins with `.user()` to collect user input (a Start node is auto-inserted before it).
- Nodes -- Units of work: LLM calls, decisions, user input, memory, tools, templates.
- Edges -- Directed connections between nodes. Decision/IF/Switch edges carry labels.
- Thoughts -- Runtime containers that carry text and variables (metadata) between nodes.
- Memory -- Flow-scoped persistent storage accessible from any node via `write_memory`/`read_memory`.
- Providers -- Pluggable LLM backends. A model name auto-resolves to the right provider.
- Tools -- `@tool()` decorator for custom tools, built-in tools, JSON Schema export via `tool.info().to_input_schema()`.
- Loops -- `connect("Continue", "Start")` creates back-edges for iterative flows (see the sketch after this list).
- Streaming -- Token-by-token output from LLM nodes in real time.
- Multi-provider -- Different LLM providers for different nodes in the same graph.
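A minimal loop sketch -- `connect("Continue", "Start")` and the auto-inserted Start node come from the bullets above; that traversal re-enters the User node after jumping back to Start is an assumption:

```python
# Sketch: a chat loop via a back-edge. After the "Continue" node runs, the
# manual edge jumps back to the auto-inserted Start node, so the User node
# prompts again.
graph = (
    Graph("Chat loop")
    .user("You: ")
    .instruction("Continue", system_instruction="Reply, then wait for the next message.")
    .connect("Continue", "Start")  # manual edge by node name (see the table below)
)
```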
| Node Type | Behavior | Merge Needed? |
|---|---|---|
| `decision()` | LLM picks ONE branch | No |
| `if_node()` | Boolean expression picks ONE branch | No |
| `switch()` | Expression picks ONE branch | No |
| `parallel()` | ALL branches run concurrently | Yes -- use `static_merge()` |
| `connect()` | Manual edge by node name | Creates loops/cycles |
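The decision example above covers the LLM-routed case; for the expression-routed variants, a sketch -- `if_node()` and its one-branch behaviour are from the table, while the `expression=` kwarg and the `"True"`/`"False"` branch labels are assumptions:

```python
# Sketch: expression-based branching. if_node() is from the table above;
# the expression string and branch labels are assumptions for illustration.
graph = (
    Graph("Triage")
    .user("Paste the ticket")
    .if_node("VIP?", expression="priority == 'high'")
    .on("True")
        .instruction("Escalate", system_instruction="Escalate politely.")
        .end()
    .on("False")
        .instruction("Queue", system_instruction="Send a standard reply.")
        .end()
    .end()
)
```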
| Document | Description |
|---|---|
| Getting Started | Installation and first agent |
| Graph Building | Builder API, node types, patterns |
| Architecture | System overview and data flow |
| Providers | LLM providers including local (Ollama, vLLM) |
| Tools Catalog | All built-in tools with parameters |
| Engine | Execution engine internals |
| Security | Safe eval, sandboxing, API key management |
| Node Reference | Detailed node documentation by category |
The Ollama transport fork was collapsed. If you were on the v0.4 paths, apply these renames:
| Removed (v0.4) | Replacement (v0.5) |
|---|---|
| `OllamaNativeProvider` | `OllamaProvider` (now inherits from `OpenAICompatibleProvider`) |
| `OllamaProvider.chat(...)` sync shim | `await provider.generate_native_response(...)` |
| `ChatResult` | `NativeResponse` |
| `qm.configure(ollama_tool_protocol="auto"/"native"/"openai_compat")` | Removed -- tool-name hallucinations are now handled globally by the universal prefix strip |
| `from quartermaster_providers import ChatResult` | Removed from `__all__` |
| `model_supports_native_tools(...)` | Removed |
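A before/after sketch of the main rename -- the class and method names are from the table; the constructor kwargs and the surrounding `async` context are assumptions:

```python
# v0.4 (removed)
# provider = OllamaNativeProvider(...)
# result: ChatResult = provider.chat(messages)  # sync shim

# v0.5 -- inside an async function
from quartermaster_providers import OllamaProvider  # thin OpenAICompatibleProvider subclass

provider = OllamaProvider(base_url="http://localhost:11434")  # assumption: constructor kwargs
response = await provider.generate_native_response(messages)  # returns a NativeResponse
```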
Nothing else is behaviour-breaking. Parallel tool execution is always on, with no opt-out (just emit multiple `tool_calls` from the model). `program_runner(program=<str>)` keeps working -- the callable form is an addition.
Release notes live in GitHub Discussions — one thread per release with migration tables and known issues. Highlights per version:
v0.6.0 -- legacy cleanup + 7 integrator-requested features

- Stream cancellation now actually aborts the in-flight httpx call (vLLM slot freed on SSE disconnect). #68
- `.agent(extra_body={...})` / `.instruction_form(extra_body={...})` -- pass-through for Gemma-4's `chat_template_kwargs` and vLLM sampling knobs. #62
- `.agent(retry={"max_attempts": N, "on": predicate})` -- node-level retry primitive. #67
- `qm.parse_partial(text, schema)` -- progressive-degradation parser for structured output (see the sketch after this list). #64
- Sliding-window truncation of the oldest `<tool_result>` blocks when the accumulated prompt exceeds `max_input_tokens`. #66
- Client-side salvage of text-form `<|tool_call|>` blocks for mis-configured vLLM / Ollama servers. #63
- Cleanup: 35 `CamelCaseTool` aliases dropped; the `AgentGraph`/`AgentVersion`/`_build_registry`/`NodeRegistry` aliases gone; the `.end(stop=)` kwarg removed; lint rules QM002–QM004 pruned. See discussion #70 for the full migration table.
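A sketch of `qm.parse_partial` with the `Classification` model from earlier -- the `(text, schema)` call shape is from the bullet above; what it returns for fields it cannot recover is an assumption:

```python
# Sketch: salvage structure from a cut-off model reply.
truncated = '{"category": "billing", "prio'        # stream died mid-field
partial = qm.parse_partial(truncated, Classification)
print(partial.category)  # "billing" -- recoverable fields survive;
                         # behaviour for the rest is an assumption
```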
v0.5.0 -- Ollama transport collapse, parallel tools, callable program_runner

- Ollama provider collapsed into a thin subclass of the OpenAI-compatible client. One transport.
- Parallel tool execution: multiple `tool_calls` in one turn dispatch concurrently.
- `program_runner(program=<callable>)` accepts `@tool()` functions directly.
- Universal tool-name prefix strip (`default_api:`, `functions:`, `mcp:`, ...) via `rsplit` on `':'` or `'.'`.
- `duckduckgo_search` UA fix.
v0.4.0 -- timeouts, stream cancellation, per-node tool scoping

- Application timeouts via `qm.configure(timeout=, connect_timeout=, read_timeout=)`.
- Stream cancellation via `with qm.run.stream(...) as stream:` (see the sketch after this list).
- Per-node tool scoping (`agent(tools=[...])` strictly enforced).
- Inline `@tool` callables in `agent(tools=[my_func])`.
- Circuit breaker, session store, typed custom events, static graph linter.
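A cancellation sketch -- the `with` form, `.tokens()`, and the configure kwargs are from the notes above; that exiting the block aborts the in-flight httpx call is the v0.6.0 behaviour cited earlier:

```python
# Sketch: stop a stream early; leaving the with-block cancels the request.
qm.configure(timeout=60, connect_timeout=5)  # application timeouts (v0.4.0)

with qm.run.stream(graph, "Write a long essay") as stream:
    for i, token in enumerate(stream.tokens()):
        print(token, end="", flush=True)
        if i > 100:
            break  # exiting the with-block aborts the in-flight call (v0.6.0)
```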
v0.3.0 -- filtered streams, live progress, structured trace

- Filtered stream iterators: `stream.tokens()` / `.tool_calls()` / `.progress()` / `.custom(name=...)`.
- Live progress from tools via `qm.current_context().emit_progress(...)` (sketched below).
- Structured post-mortem `Result.trace` with per-node breakdowns.
- One-line OpenTelemetry instrumentation.
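A sketch of a tool emitting live progress -- `qm.current_context().emit_progress(...)` is from the bullet above; its exact parameters are an assumption:

```python
# Sketch: emit progress from inside a tool; consume it on the other side with
# qm.run.stream(...).progress(). emit_progress's signature is an assumption.
from quartermaster_tools import tool

@tool()
def crawl_site(url: str) -> dict:
    """Crawl a site, reporting progress as pages are fetched."""
    ctx = qm.current_context()
    for i in range(5):
        ctx.emit_progress(f"fetched page {i + 1}/5")  # appears in .progress()
    return {"url": url, "pages": 5}
```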
```bash
# Clone
git clone https://github.com/MindMadeLab/quartermaster-sdk-py.git
cd quartermaster-sdk-py

# Install everything (uv workspace -- one command)
uv sync

# Run an example
uv run examples/01_hello_agent.py

# Run tests for a single package
uv run pytest quartermaster-graph/tests/

# Run all tests
uv run pytest quartermaster-graph/tests/ quartermaster-tools/tests/ quartermaster-engine/tests/
```

See CONTRIBUTING.md for the full development guide.
Apache 2.0 -- Built by MindMade in Slovenia.