Stateful agent workflow runtime for Spring AI (Java). Orchestrate multi-step LLM calls, tool invocations, and shared typed state in a single graph — without writing orchestration logic.
Independent project — not affiliated with
spring-ai-community/spring-ai-agents, which is the official AgentClient abstraction over CLI agents (Claude Code, Codex, Gemini, etc.). This repository has a different scope: a graph-based runtime for stateful agent workflows on top of Spring AI.
Real-world AI systems quickly move past a single ChatClient call:
- multi-step pipelines (research → analyse → draft → review)
- multiple agents coordinating on one task
- transient errors, rate limits, partial failures
- state that must survive a crash or wait on human input
You end up writing the orchestration, the retries, the resume logic. This
project replaces that scaffolding with a structured runtime — an
AgentGraph for explicit flow, RetryPolicy and CircuitBreakerPolicy
for resilience, durable checkpoints for resume.
Use it when at least one of these is true:
- the agent needs more than one LLM call to finish the job
- the workflow has branches, loops, or human-in-the-loop pauses
- failure recovery (retry, breaker, resume after crash) is in scope
- multiple tools or agents must coordinate around shared, typed state
Don't use it for a single LLM call with a stateless prompt — Spring AI's
ChatClient already covers that cleanly.
// 1. Define specialized agents
ExecutorAgent researcher = ExecutorAgent.builder()
.chatClient(chatClient)
.systemPrompt("You are a research specialist. Find key facts.")
.tools(searchTool, wikipediaTool)
.build();
ExecutorAgent writer = ExecutorAgent.builder()
.chatClient(chatClient)
.systemPrompt("Write a clear report from the research findings.")
.build();
// 2. Let a coordinator route dynamically
CoordinatorAgent coordinator = CoordinatorAgent.builder()
.executors(Map.of("research", researcher, "writing", writer))
.routingStrategy(RoutingStrategy.llmDriven(chatClient))
.build();
// 3. Run
AgentResult result = coordinator.execute(
AgentContext.of("Compare Claude 4 and GPT-5 for enterprise use"));
System.out.println(result.text());No while loops. No manual routing. Just agents that collaborate.
Clone and run any of the 3 examples. They use simulated agents, no API key required.
git clone https://github.com/datallmhub/spring-agent-flow.git
cd spring-agent-flow
mvn install -DskipTests -q
mvn -pl spring-agent-flow-samples exec:java # runs MultiAgentCoordination by defaultOutput:
=== Multi-Agent Coordination ===
Request: "Research the latest advances in quantum computing"
[router] Routing to: research
[research] Searching for: quantum computing
Result: Found 3 papers on quantum error correction (2026).
Key finding: logical qubit fidelity reached 99.8%.
---
Request: "Write a blog post about the findings"
[router] Routing to: writing
[writing] Generating content...
Result: # Quantum Computing in 2026
Logical qubit fidelity has reached 99.8%, marking a milestone...
All 3 samples:
| Example | What it shows | Run with |
|---|---|---|
MultiAgentCoordination (default) |
Squad API + routing | mvn -pl spring-agent-flow-samples exec:java |
MinimalPipeline |
Two-step graph | -Dexec.mainClass="...samples.MinimalPipeline" |
AdvancedGraphDemo |
Conditional loops, state, listener | -Dexec.mainClass="...samples.AdvancedGraphDemo" |
While this framework handles complex logic routing, executing graphs against live APIs (Mistral, OpenAI, Anthropic) requires managing their physical limitations. spring-agent-flow incorporates safety features to counteract these common pitfalls:
1. Strict Message Alternation (Safe Defaults)
Some providers (like Mistral) will crash with HTTP 400 Bad Request if your prompt ends with an AssistantMessage or contains consecutive identical roles. During node chaining (A → B → C), ExecutorAgent uses built-in Safe Message Ordering to transparently interleave a padding UserMessage when required to prevent structure violations.
2. Rate Limiting & Transient Errors
Agentic graphs generate requests instantly. A 3-step pipeline fires 3 requests in under 2 seconds. Free-tier architectures (typically 1 request/sec) will fail with 429 Too Many Requests or 503 Service Unavailable.
Solution: Your components should rely on .errorPolicy(ErrorPolicy.RETRY_ONCE) mapped to Spring AI's automatic Retry mechanisms, or manually interleave non-blocking delays in intensive ReAct loops.
3. Non-deterministic Routing
A conditional edge verifying result.text().contains("APPROVED") can break if the agent suddenly decides to output "Document is approved" instead of exact text. Always implement fuzzy matching or use LLM structured JSON output mappings for robust conditional routing rules.
Spring AI is perfect for single-step interactions.
As soon as your system requires:
- multiple steps
- retries and recovery
- coordination between components
you start writing orchestration code. This project removes that layer.
| Spring AI | spring-agent-flow (built on top) |
|---|---|
Low-level primitives (ChatClient, tools, advisors) |
Structured agent runtime (Agent, AgentGraph, Squad) |
| Code-driven orchestration (flexible, unstructured) | Graph-based orchestration with explicit edges & cycles |
| No built-in durable state | Typed shared state (StateKey<T>) + JDBC/Redis checkpoints |
| Error handling left to application code | RetryPolicy (exp. backoff + jitter) and CircuitBreakerPolicy, per-node |
| No native interrupt/resume | First-class interrupted(...) + graph.resume(runId, ...) |
| Observability via app-level integration | Built-in Micrometer metrics through the starter |
Requirements: Java 17+, Spring Boot 3.x, Spring AI 1.0+ (only when using ExecutorAgent or the starter).
Distributed via JitPack.
<repositories>
<repository>
<id>jitpack.io</id>
<url>https://jitpack.io</url>
</repository>
</repositories>
<dependency>
<groupId>com.github.datallmhub.spring-agent-flow</groupId>
<artifactId>spring-agent-flow-starter</artifactId>
<version>v0.5.0</version>
</dependency>repositories {
maven { url 'https://jitpack.io' }
}
dependencies {
implementation 'com.github.datallmhub.spring-agent-flow:spring-agent-flow-starter:v0.5.0'
}The starter pulls everything you typically need. Pick individual modules when you want a smaller footprint or run outside Spring Boot:
| Module | Use case |
|---|---|
spring-agent-flow-starter |
Spring Boot auto-config, properties, Micrometer listener |
spring-agent-flow-core |
Minimal API (Agent, AgentContext, StateKey, AgentResult) |
spring-agent-flow-graph |
AgentGraph, RetryPolicy, CircuitBreakerPolicy SPI, checkpoint contract |
spring-agent-flow-squad |
CoordinatorAgent, ExecutorAgent, ReActAgent, ParallelAgent, RoutingStrategy |
spring-agent-flow-checkpoint |
JdbcCheckpointStore, RedisCheckpointStore, Jackson codec |
spring-agent-flow-resilience4j |
CircuitBreakerPolicy adapter backed by Resilience4j |
spring-agent-flow-cli-agents |
CliAgentNode — runs Claude Code / Codex / Gemini CLI agents as graph nodes |
spring-agent-flow-test |
MockAgent, TestGraph for unit-testing graphs |
The starter auto-configures everything. Minimal application.yml:
spring:
ai:
agents:
enabled: true
default-error-policy: RETRY_ONCE
observability:
metrics: trueThe default. A CoordinatorAgent routes to ExecutorAgents. You focus on the agents, not the plumbing.
CoordinatorAgent coordinator = CoordinatorAgent.builder()
.executors(Map.of(
"research", researchExecutor,
"analysis", analysisExecutor,
"writing", writingExecutor
))
.routingStrategy(RoutingStrategy.llmDriven(chatClient))
.build();
AgentResult result = coordinator.execute(AgentContext.of("..."));For explicit sequencing, conditional branching, or cycles (ReAct loops):
AgentGraph graph = AgentGraph.builder()
.addNode("research", researcher)
.addNode("analyze", analyzer)
.addNode("write", writer)
.addEdge("research", "analyze")
.addEdge(Edge.conditional("analyze",
ctx -> ctx.get(CONFIDENCE).doubleValue() < 0.7,
"research")) // loop back
.addEdge("analyze", "write") // fallback: forward
.errorPolicy(ErrorPolicy.RETRY_ONCE)
.build();
AgentResult result = graph.invoke(AgentContext.of("..."));Every agent supports Flux<AgentEvent> out of the box:
graph.invokeStream(AgentContext.of("hello"))
.subscribe(event -> {
switch (event) {
case AgentEvent.Token t -> System.out.print(t.chunk());
case AgentEvent.NodeTransition x -> System.out.println("\n--> " + x.to());
case AgentEvent.Completed c -> System.out.println("\n[done]");
default -> {}
}
});// Declare keys with types — compile-time safety
StateKey<Double> CONFIDENCE = StateKey.of("confidence", Double.class);
StateKey<String> SUMMARY = StateKey.of("summary", String.class);
// Use them anywhere
AgentContext ctx = context.with(CONFIDENCE, 0.85);
double score = ctx.get(CONFIDENCE); // no cast needed┌─────────────────────────────────────────────────┐
│ Your Application │
├─────────────────────────────────────────────────┤
│ spring-agent-flow-starter (auto-config, metrics) │
├──────────────┬──────────────────┬───────────────┤
│ checkpoint │ resilience4j │ │
│ JDBC, Redis │ CircuitBreaker │ │
├──────────────┴──────────────────┴───────────────┤
│ squad │ graph │
│ CoordinatorAgent │ AgentGraph (runtime) │
│ ExecutorAgent │ Node, Edge │
│ RoutingStrategy │ RetryPolicy, CB SPI │
│ ReActAgent │ CheckpointStore │
│ ParallelAgent │ ErrorPolicy │
├──────────────────────┴──────────────────────────┤
│ spring-agent-flow-core │
│ Agent, AgentContext, AgentResult, AgentEvent │
│ StateKey, StateBag │
├─────────────────────────────────────────────────┤
│ Spring AI │
│ ChatClient, ToolCallbacks │
└─────────────────────────────────────────────────┘
Three layers compose: error policy (what to do on failure), retry (how to recover from transient failure), circuit breaker (when to stop trying).
AgentGraph.builder()
.errorPolicy(ErrorPolicy.FAIL_FAST) // or RETRY_ONCE / SKIP_NODE
.retryPolicy(RetryPolicy.exponential(3, Duration.ofMillis(200)))
.addNode("llm", flakyAgent,
RetryPolicy.exponential(5, Duration.ofMillis(500)), // per-node override
new Resilience4jCircuitBreakerPolicy(registry)) // per-node breaker
.build();RetryPolicy— bounded jitter[cap*(1-f), cap]so retries never exceedmaxDelay; per-node override wins over the graph default.CircuitBreakerPolicy— SPI lives in the graph module (R4j-free).spring-agent-flow-resilience4jships an adapter backed by Resilience4j'sCircuitBreakerRegistry(per-node breakers) or a shared breaker for aggregate counting.ErrorPolicy.SKIP_NODE— propagate past a blown breaker when the rest of the graph can still make useful progress.
See resilient-typed-executor.md and circuit-breaker.md.
When a MeterRegistry is available, the starter registers metrics automatically:
| Metric | Tags | Description |
|---|---|---|
agents.execution.count |
agent, graph, status |
Per-node execution count |
agents.execution.duration |
agent, graph |
Per-node execution time |
agents.graph.transitions |
graph, from, to |
Node-to-node transitions |
agents.execution.errors |
agent, graph, cause |
Error count by type |
Custom instrumentation via AgentListener:
AgentGraph.builder()
.listener(new AgentListener() {
@Override
public void onNodeExit(String g, String node, AgentResult r, long ns) {
log.info("{} completed in {}ms", node, ns / 1_000_000);
}
})- ReAct loop — self-correcting agent with observation/action cycles
- Supervisor pattern — coordinator re-routes until done
- Parallel executors — fan-out/fan-in
- Subgraphs — plug a graph in as a node
- Human-in-the-loop — interrupt, wait for human input, resume
- Durable runs — JDBC or Redis checkpoint store, resume after crash
- Resilient typed executor — tool audit + typed output + retry
- Circuit breaker — trip upstream calls with Resilience4j
MockAgent mock = MockAgent.builder()
.thenReturn("First response")
.thenReturn("Second response")
.build();
TestGraph.Trace trace = TestGraph.trace(
AgentGraph.builder()
.addNode("a", mock)
.addNode("b", MockAgent.returning("done"))
.addEdge("a", "b"));
AgentResult result = trace.invoke(AgentContext.of("test"));
assertThat(trace.visitedInOrder("a", "b")).isTrue();
assertThat(result.text()).isEqualTo("done");- Java 17+
- Spring Boot 3.x
- Spring AI 1.0+
| Version | Focus |
|---|---|
| 0.4 (current) | Subgraphs, parallel fan-out, cancellation, typed output, RetryPolicy, CircuitBreakerPolicy, JDBC/Redis checkpoint store |
| 1.0 | API stabilization, documentation, community feedback |
| 1.1 | Crew roles (CrewAI-inspired), auto-config for checkpoint backends |
| 2.0 | OpenTelemetry tracing, MCP integration, Agent-as-Tool |
Contributions welcome! Please see CONTRIBUTING.md for guidelines.
This project follows the Apache 2.0 License.
- LangGraph — graph-based orchestration
- CrewAI — role-based agent teams
- AWS Strands — agent patterns for Java
- Spring AI — the foundation we build on