Modular AI agent orchestration framework. Build agent workflows as directed graphs, wire them with a fluent Python API, and run them with any LLM provider.
Built by MindMade in Slovenia.
```bash
# Everything (recommended)
pip install quartermaster-sdk

# With a specific LLM provider
pip install quartermaster-sdk[openai]
pip install quartermaster-sdk[anthropic]

# From source (for development or running examples)
git clone https://github.com/MindMadeLab/quartermaster-sdk-py.git
cd quartermaster-sdk-py
uv sync
```

The simplest possible graph -- running against a local Ollama in four lines (no `.start()`, no `.end()`, no `.build()`, no `FlowRunner` import):
```python
import quartermaster_sdk as qm

qm.configure(provider="ollama", base_url="http://localhost:11434", default_model="gemma4:26b")

result = qm.run(qm.Graph("chat").user().agent(), "Pozdravljen, koliko je ura?")  # Slovenian: "Hello, what time is it?"
print(result.text)
```

`qm.run()` accepts the builder directly and finalises it internally -- `.build()` is only needed when you want the validated `GraphSpec` for serialisation or inspection. For single-shot calls, skip the graph entirely:
```python
reply = qm.instruction(system="Respond in Slovenian.", user="Pozdravljen!")
# reply is a str.
```

For typed JSON extraction:
```python
from pydantic import BaseModel

class Classification(BaseModel):
    category: str
    priority: str

email_body = "..."  # any free-form text to classify
data = qm.instruction_form(Classification, system="Classify.", user=email_body)
# data is a Classification instance.
```

For richer flows you keep the explicit per-node configuration:
```python
agent = (
    qm.Graph("My Agent")
    .user("What can I help you with?")
    .instruction("Respond", model="gpt-4o", system_instruction="You are a helpful assistant.")
)
result = qm.run(agent, "...")
```

Attach a name to any node and read its output from `result.captures`:
```python
graph = (
    qm.Graph("enrich")
    .agent("Research", tools=[...], capture_as="notes")
    .instruction_form(CustomerData, system="Extract.", capture_as="data")  # CustomerData: a pydantic model like Classification above
)
result = qm.run(graph, "VT-Treyd Slovenija")
result["notes"].output_text  # agent's free-text research
result["data"].output_text   # form-parsed JSON
```

Streams expose filtered iterators for the common consumers:

```python
# Typewriter effect -- just the model tokens as they arrive
for token in qm.run.stream(graph, "Hello!").tokens():
    print(token, end="", flush=True)

# Dashboard view -- only the tool-call events
for call in qm.run.stream(graph, "Research Slovenia").tool_calls():
    print(f"[TOOL] {call.tool}({call.args})")

# Live progress cards -- filter by custom event name
for evt in qm.run.stream(graph, "Run the pipeline").custom(name="source_found"):
    ui.add_source(evt.payload["url"])  # ui: your frontend handle (illustrative)
```

The raw `for chunk in qm.run.stream(...)` loop still works unchanged when you want every chunk type in one place. Streams are single-pass -- pick one consumer (`.tokens()`, `.tool_calls()`, `.progress()`, `.custom()`, or raw iteration) per stream.
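When you do want the raw loop, iterate the stream itself -- a minimal sketch: `DoneChunk` is named in the trace notes below, but its import path and the other chunk attributes here are assumptions:

```python
# Sketch: raw chunk iteration -- every chunk type in one place.
from quartermaster_sdk import DoneChunk  # assumption: actual import path may differ

for chunk in qm.run.stream(graph, "Hello!"):
    if isinstance(chunk, DoneChunk):
        break  # stream fully drained; result.trace is now available (see below)
    print(type(chunk).__name__, chunk)  # dispatch on chunk type as needed
```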
After a synchronous run (or after draining a stream to its `DoneChunk`), `result.trace` exposes a structured view of every `FlowEvent` the engine emitted:
```python
result = qm.run(graph, "Hello!")
result.trace.text                         # concatenated model output
result.trace.tool_calls                   # list[dict] across every agent node
result.trace.progress                     # list[ProgressEvent]
result.trace.custom(name="source_found")  # filtered CustomEvent list
result.trace.by_node["Researcher"].text   # tokens for a single node
print(result.trace.as_jsonl())            # JSONL export for logs / fixtures
```

The LLM classifies the input and picks ONE branch -- no merge needed:
```python
from quartermaster_sdk import Graph

agent = (
    Graph("Router")
    .user("Describe your issue")
    .instruction("Classify", system_instruction="Classify as: Technical or General.")
    .decision("Category?", options=["Technical", "General"])
    .on("Technical")
        .instruction("Tech response", system_instruction="Give a technical answer.")
        .end()
    .on("General")
        .instruction("General response", system_instruction="Give a general answer.")
        .end()
    .end()
)
```

All branches run concurrently, then merge:
```python
agent = (
    Graph("Code Review")
    .user("Paste your code")
    .parallel()
    .branch()
        .instruction("Security audit", system_instruction="Check for vulnerabilities.")
    .end()
    .branch()
        .instruction("Performance check", system_instruction="Check for performance issues.")
    .end()
    .static_merge("Collect results")
    .instruction("Final report", system_instruction="Combine all findings.")
    .end()
)
```

Collect structured input with a form, capture it into variables, and render a template:

```python
agent = (
Graph("Registration")
.user("Welcome!")
.user_form("Details", parameters=[
{"name": "full_name", "type": "text", "label": "Name", "required": "true"},
{"name": "email", "type": "email", "label": "Email", "required": "true"},
])
.var("Capture name", variable="name", expression="full_name")
.text("Confirm", template="Thanks {{full_name}}, we'll email {{email}} with details.")
.end()
)from quartermaster_tools import tool
@tool()
def get_weather(city: str, units: str = "celsius") -> dict:
    """Get current weather for a city.

    Args:
        city: The city name to look up.
        units: Temperature units (celsius or fahrenheit).
    """
    return {"city": city, "temperature": 22, "units": units}

# Call it directly
result = get_weather(city="Amsterdam")

# Export JSON Schema for LLM function calling
schema = get_weather.info().to_input_schema()

# Or register in a ToolRegistry and export all at once
from quartermaster_tools import ToolRegistry

registry = ToolRegistry()
registry.register(get_weather)
schemas = registry.to_json_schema()
```

See `examples/` for runnable examples covering every pattern.
Graphs can also be driven directly through the engine:

```python
from quartermaster_engine import run_graph

# Run -- each node uses the provider/model it declares
run_graph(agent, user_input="What is quantum computing?")

# Interactive mode -- pauses at User nodes and prompts on stdin
run_graph(agent)  # no user_input = interactive
```

Nodes declare their own provider and model:
.instruction("Respond", model="claude-haiku-4-5-20251001", provider="anthropic", ...)
.instruction("Fast reply", model="llama-3.3-70b-versatile", provider="groq", ...)
.instruction("Local", model="gemma4:26b", provider="ollama", ...)Set up your API keys in a .env file at the project root:
```
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GROQ_API_KEY=gsk_...
XAI_API_KEY=xai-...
```
Output streams token-by-token in real time. Use `show_output=False` on nodes to hide internal steps (variables, conditions) from the output.
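For example, to keep the Registration graph's variable-capture step out of the streamed output -- a sketch: `show_output=False` is from the note above, but applying it to `.var()` specifically is an assumption:

```python
# Sketch: hide an internal variable-capture step from the streamed output.
.var("Capture name", variable="name", expression="full_name", show_output=False)
```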
| Package | Description |
|---|---|
| `quartermaster-sdk` | Meta-package -- installs all core packages |
| `quartermaster-graph` | Graph schema, fluent builder API, validation |
| `quartermaster-providers` | LLM provider abstraction (OpenAI, Anthropic, Google, Groq, Ollama, vLLM) |
| `quartermaster-tools` | Tool definition, registry, built-in tools |
| `quartermaster-nodes` | Node execution protocols and 40+ node implementations |
| `quartermaster-engine` | Flow execution, traversal, memory, streaming |
| `quartermaster-mcp-client` | MCP protocol client -- standalone, no framework dependency |
| `quartermaster-code-runner` | Docker sandboxed code execution -- standalone FastAPI service |
```
Your Application
        |
        v
quartermaster-engine            Flow execution, traversal, streaming
    |       |       |
    v       v       v
  graph   nodes   tools         Schema/builder, node executors, tool registry
            |
            v
        providers               OpenAI, Anthropic, Google, Groq, Ollama, vLLM, ...

quartermaster-mcp-client        Standalone MCP protocol client
quartermaster-code-runner       Standalone Docker code execution
```
- Graph -- A directed graph of nodes and edges (supports cycles via `connect()` for loops). Built with the fluent `Graph("name").user("Input")...end()` API.
- GraphSpec -- The serializable graph model (`GraphSpec` in quartermaster-graph). `qm.run(graph, ...)` finalises the builder for you; an explicit `Graph.build()` only matters when you want the validated spec to serialise or inspect. `AgentGraph` remains as a deprecated backward-compat alias.
- User Node -- Every graph typically begins with `.user()` to collect user input (a Start node is auto-inserted before it).
- Nodes -- Units of work: LLM calls, decisions, user input, memory, tools, templates.
- Edges -- Directed connections between nodes. Decision/IF/Switch edges carry labels.
- Thoughts -- Runtime containers that carry text and variables (metadata) between nodes.
- Memory -- Flow-scoped persistent storage accessible from any node via `write_memory`/`read_memory`.
- Providers -- Pluggable LLM backends. A model name auto-resolves to the right provider.
- Tools -- `@tool()` decorator for custom tools, built-in tools, JSON Schema export via `tool.info().to_input_schema()`.
- Loops -- `connect("Continue", "Start")` creates back-edges for iterative flows (see the sketch after this list).
- Streaming -- Token-by-token output from LLM nodes in real time.
- Multi-provider -- Different LLM providers for different nodes in the same graph.
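A minimal loop sketch -- `connect("Continue", "Start")` and the auto-inserted Start node come from the bullets above; that traversal re-enters the User node after jumping back to Start is an assumption:

```python
# Sketch: a chat loop via a back-edge. After the "Continue" node runs, the
# manual edge jumps back to the auto-inserted Start node, so the User node
# prompts again.
graph = (
    Graph("Chat loop")
    .user("You: ")
    .instruction("Continue", system_instruction="Reply, then wait for the next message.")
    .connect("Continue", "Start")  # manual edge by node name (see the table below)
)
```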
| Node Type | Behavior | Merge Needed? |
|---|---|---|
| `decision()` | LLM picks ONE branch | No |
| `if_node()` | Boolean expression picks ONE branch | No |
| `switch()` | Expression picks ONE branch | No |
| `parallel()` | ALL branches run concurrently | Yes -- use `static_merge()` |
| `connect()` | Manual edge by node name | Creates loops/cycles |
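The decision example above covers the LLM-routed case; for the expression-routed variants, a sketch -- `if_node()` and its one-branch behaviour are from the table, while the `expression=` kwarg and the `"True"`/`"False"` branch labels are assumptions:

```python
# Sketch: expression-based branching. if_node() is from the table above;
# the expression string and branch labels are assumptions for illustration.
graph = (
    Graph("Triage")
    .user("Paste the ticket")
    .if_node("VIP?", expression="priority == 'high'")
    .on("True")
        .instruction("Escalate", system_instruction="Escalate politely.")
        .end()
    .on("False")
        .instruction("Queue", system_instruction="Send a standard reply.")
        .end()
    .end()
)
```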
| Document | Description |
|---|---|
| Getting Started | Installation and first agent |
| Graph Building | Builder API, node types, patterns |
| Architecture | System overview and data flow |
| Providers | LLM providers including local (Ollama, vLLM) |
| Tools Catalog | All built-in tools with parameters |
| Engine | Execution engine internals |
| Security | Safe eval, sandboxing, API key management |
| Node Reference | Detailed node documentation by category |
The Ollama transport fork was collapsed. If you were on the v0.4 paths, apply these renames:
| Removed (v0.4) | Replacement (v0.5) |
|---|---|
| `OllamaNativeProvider` | `OllamaProvider` (now inherits from `OpenAICompatibleProvider`) |
| `OllamaProvider.chat(...)` sync shim | `await provider.generate_native_response(...)` |
| `ChatResult` | `NativeResponse` |
| `qm.configure(ollama_tool_protocol="auto"/"native"/"openai_compat")` | Removed -- tool-name hallucinations are now handled globally by the universal prefix strip |
| `from quartermaster_providers import ChatResult` | Removed from `__all__` |
| `model_supports_native_tools(...)` | Removed |
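A before/after sketch of the main rename -- the class and method names are from the table; the constructor kwargs and the surrounding `async` context are assumptions:

```python
# v0.4 (removed)
# provider = OllamaNativeProvider(...)
# result: ChatResult = provider.chat(messages)  # sync shim

# v0.5 -- inside an async function
from quartermaster_providers import OllamaProvider  # thin OpenAICompatibleProvider subclass

provider = OllamaProvider(base_url="http://localhost:11434")  # assumption: constructor kwargs
response = await provider.generate_native_response(messages)  # returns a NativeResponse
```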
Nothing else is behaviour-breaking. Parallel tool execution is always on, with no opt-out (just emit multiple `tool_calls` from the model). `program_runner(program=<str>)` keeps working -- the callable form is an addition.
Release notes live in GitHub Discussions — one thread per release with migration tables and known issues. Highlights per version:
v0.6.0 -- legacy cleanup + 7 integrator-requested features

- Stream cancellation now actually aborts the in-flight httpx call (vLLM slot freed on SSE disconnect). #68
- `.agent(extra_body={...})` / `.instruction_form(extra_body={...})` -- pass-through for Gemma-4's `chat_template_kwargs` and vLLM sampling knobs. #62
- `.agent(retry={"max_attempts": N, "on": predicate})` -- node-level retry primitive. #67
- `qm.parse_partial(text, schema)` -- progressive-degradation parser for structured output (see the sketch after this list). #64
- Sliding-window truncation of the oldest `<tool_result>` blocks when the accumulated prompt exceeds `max_input_tokens`. #66
- Client-side salvage of text-form `<|tool_call|>` blocks for mis-configured vLLM / Ollama servers. #63
- Cleanup: 35 `CamelCaseTool` aliases dropped; the `AgentGraph`/`AgentVersion`/`_build_registry`/`NodeRegistry` aliases gone; the `.end(stop=)` kwarg removed; lint rules QM002–QM004 pruned. See discussion #70 for the full migration table.
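A sketch of `qm.parse_partial` with the `Classification` model from earlier -- the `(text, schema)` call shape is from the bullet above; what it returns for fields it cannot recover is an assumption:

```python
# Sketch: salvage structure from a cut-off model reply.
truncated = '{"category": "billing", "prio'        # stream died mid-field
partial = qm.parse_partial(truncated, Classification)
print(partial.category)  # "billing" -- recoverable fields survive;
                         # behaviour for the rest is an assumption
```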
v0.5.0 -- Ollama transport collapse, parallel tools, callable program_runner

- Ollama provider collapsed into a thin subclass of the OpenAI-compatible client. One transport.
- Parallel tool execution: multiple `tool_calls` in one turn dispatch concurrently.
- `program_runner(program=<callable>)` accepts `@tool()` functions directly.
- Universal tool-name prefix strip (`default_api:`, `functions:`, `mcp:`, ...) via `rsplit` on `':'` or `'.'`.
- `duckduckgo_search` UA fix.
v0.4.0 -- timeouts, stream cancellation, per-node tool scoping

- Application timeouts via `qm.configure(timeout=, connect_timeout=, read_timeout=)`.
- Stream cancellation via `with qm.run.stream(...) as stream:` (see the sketch after this list).
- Per-node tool scoping (`agent(tools=[...])` strictly enforced).
- Inline `@tool` callables in `agent(tools=[my_func])`.
- Circuit breaker, session store, typed custom events, static graph linter.
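A cancellation sketch -- the `with` form, `.tokens()`, and the configure kwargs are from the notes above; that exiting the block aborts the in-flight httpx call is the v0.6.0 behaviour cited earlier:

```python
# Sketch: stop a stream early; leaving the with-block cancels the request.
qm.configure(timeout=60, connect_timeout=5)  # application timeouts (v0.4.0)

with qm.run.stream(graph, "Write a long essay") as stream:
    for i, token in enumerate(stream.tokens()):
        print(token, end="", flush=True)
        if i > 100:
            break  # exiting the with-block aborts the in-flight call (v0.6.0)
```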
v0.3.0 -- filtered streams, live progress, structured trace

- Filtered stream iterators: `stream.tokens()` / `.tool_calls()` / `.progress()` / `.custom(name=...)`.
- Live progress from tools via `qm.current_context().emit_progress(...)` (sketched below).
- Structured post-mortem `Result.trace` with per-node breakdowns.
- One-line OpenTelemetry instrumentation.
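A sketch of a tool emitting live progress -- `qm.current_context().emit_progress(...)` is from the bullet above; its exact parameters are an assumption:

```python
# Sketch: emit progress from inside a tool; consume it on the other side with
# qm.run.stream(...).progress(). emit_progress's signature is an assumption.
from quartermaster_tools import tool

@tool()
def crawl_site(url: str) -> dict:
    """Crawl a site, reporting progress as pages are fetched."""
    ctx = qm.current_context()
    for i in range(5):
        ctx.emit_progress(f"fetched page {i + 1}/5")  # appears in .progress()
    return {"url": url, "pages": 5}
```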
```bash
# Clone
git clone https://github.com/MindMadeLab/quartermaster-sdk-py.git
cd quartermaster-sdk-py

# Install everything (uv workspace -- one command)
uv sync

# Run an example
uv run examples/01_hello_agent.py

# Run tests for a single package
uv run pytest quartermaster-graph/tests/

# Run all tests
uv run pytest quartermaster-graph/tests/ quartermaster-tools/tests/ quartermaster-engine/tests/
```

See CONTRIBUTING.md for the full development guide.
Apache 2.0 -- Built by MindMade in Slovenia.