Open-source toolkit for running reproducible, offline-first simulations of AI agents against dynamic scenarios


FluxLoop Logo

FluxLoop OSS


Agentic Testing for AI Agents

"Hey, test my agent for order cancellation with angry customers"
→ FluxLoop handles the rest: setup, CLI install, synthesis, execution, and analysis.

🤖 Agent-First Workflow

Your coding agent (Claude Code) orchestrates the entire testing flow. Just describe what you want to test, and FluxLoop does the heavy lifting.

🎯 Simulate at Scale

Run thousands of realistic multi-turn scenarios in parallel. Find edge cases before production.

📊 Align to Your Standards

Capture your implicit decision criteria. Turn intuition into automated evaluation.


Getting Started

⭐ Claude Code Plugin (Recommended)

Install the plugin, then just talk.

/plugin install Fluxloop-AI/fluxloop-claude-plugin

That's it. Now say:

"test my agent for refund scenarios"

The Agent Test Skill handles everything:

  • ✅ Installs FluxLoop CLI (if needed)
  • ✅ Logs you in
  • ✅ Creates project/scenario
  • ✅ Synthesizes test inputs
  • ✅ Runs simulations
  • ✅ Analyzes results and suggests fixes

No commands to memorize. No manual setup. Just ask.

Example Conversation

User: "Test my chatbot for refund scenarios with frustrated customers"

Agent: Let me set up FluxLoop and run tests...

       ✓ FluxLoop CLI installed
       ✓ Logged in
       ✓ Project created
       ✓ 10 test inputs synthesized (40% hard cases)
       ✓ Running simulation...
       
       📊 Results: 8/10 passed (80%)
       ⚠️ Failed on edge case: customer requesting partial refund
       
       💡 Suggested fix: Add handling for partial refund requests
       
       Would you like me to analyze the failures in detail?

📖 Documentation: docs.fluxloop.ai/claude-code


📦 Packages

1. Claude Code Plugin ⭐

The primary way to use FluxLoop. Your coding agent orchestrates the entire testing workflow through natural conversation.

| Feature | Description |
| --- | --- |
| Agent Test Skill | Auto-activates on "test my agent", handles everything |
| Zero Config | Skill installs CLI, logs in, creates projects automatically |
| Context-Aware | Knows your setup state, guides you through missing steps |

📖 Location: packages/fluxloop-plugin/
📖 Docs: docs.fluxloop.ai/claude-code

2. CLI

For power users and CI/CD pipelines. Direct command-line control when you need it.

pip install fluxloop-cli
fluxloop test --scenario my-test

📖 Docs: docs.fluxloop.ai/cli
📦 PyPI: fluxloop-cli

3. SDK (Python 3.11+)

Core instrumentation library. Add @fluxloop.agent() decorator to trace agent execution.

import fluxloop

@fluxloop.agent()
def my_agent(input: str) -> str:
    # Your agent logic goes here; a trivial placeholder:
    response = f"Echo: {input}"
    return response

📖 Docs: docs.fluxloop.ai/sdk
📦 PyPI: fluxloop
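Conceptually, a tracing decorator like `@fluxloop.agent()` wraps your function to record inputs, outputs, and latency for later analysis. The stand-in below is only an illustration of that idea, not the real SDK's implementation:

```python
import functools
import time

def agent(name=None):
    """Illustrative stand-in for @fluxloop.agent(): records each call's
    input, output, and latency. The real SDK's internals may differ."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            wrapper.traces.append({
                "agent": name or fn.__name__,
                "input": args,
                "output": result,
                "latency_s": time.perf_counter() - start,
            })
            return result
        wrapper.traces = []
        return wrapper
    return decorator

@agent()
def my_agent(input: str) -> str:
    # Toy agent logic: echo a canned reply
    return f"You said: {input}"

print(my_agent("hello"))     # You said: hello
print(len(my_agent.traces))  # 1
```

Because the decorated function behaves exactly like the original, instrumentation stays out of your agent's control flow; the traces accumulate on the side for later upload or inspection.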


Key Features

🤖 Agentic Testing with Claude Code

Just talk naturally:

"Test my order-bot for cancellation scenarios"
"Generate edge cases for payment failures"
"Why did the last test fail?"

The skill understands context and adapts to your state.

🎯 Simple Instrumentation

Works with any Python agent framework:

@fluxloop.agent()
def my_agent(input: str) -> str:
    # LangChain, LlamaIndex, custom code: anything works
    response = f"Echo: {input}"
    return response

📊 Evaluation-First Testing

Define criteria, run reproducible experiments, get actionable insights.
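The idea can be sketched in plain Python: a criterion function encodes your standard, the agent runs over many inputs, and pass/fail results are aggregated into a rate. The `refund_agent` and `criterion` below are hypothetical illustrations, not FluxLoop APIs:

```python
def refund_agent(message: str) -> str:
    # Hypothetical agent under test
    if "partial" in message:
        return "Sorry, I can't help with that."
    return "I've started your refund and sent a confirmation email."

def criterion(response: str) -> bool:
    # Example criterion: the agent must commit to a concrete refund action
    return "refund" in response.lower()

inputs = [
    "I want my money back!",
    "Refund my last order now.",
    "Can I get a partial refund on one item?",
]

results = [criterion(refund_agent(msg)) for msg in inputs]
pass_rate = sum(results) / len(results)
print(f"{sum(results)}/{len(results)} passed ({pass_rate:.0%})")  # 2/3 passed (67%)
```

Because the inputs and the criterion are fixed, re-running the experiment after a code change yields a directly comparable pass rate.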

🧪 Offline-First Simulation

Run experiments locally with full control. No cloud dependency for testing.
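A local batch run amounts to fanning one agent out over many synthesized inputs in parallel. A minimal standard-library sketch (not the FluxLoop CLI's actual runner):

```python
from concurrent.futures import ThreadPoolExecutor

def my_agent(input: str) -> str:
    # Stand-in agent; in practice this would be your instrumented agent
    return f"handled: {input}"

# Hypothetical batch of synthesized scenario inputs
scenarios = [f"cancel order #{i}" for i in range(100)]

# Fan scenarios out across worker threads; map() preserves input order
with ThreadPoolExecutor(max_workers=8) as pool:
    outputs = list(pool.map(my_agent, scenarios))

print(len(outputs))  # 100
```

Threads suit I/O-bound agents (LLM API calls spend most of their time waiting on the network); everything runs on your machine with no cloud dependency.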


☁️ Seamless Web Integration

FluxLoop combines local execution with cloud intelligence for a powerful testing workflow.

1. Cloud-Powered Synthesis

When you say "generate edge cases", FluxLoop Web synthesizes realistic, diverse test data using advanced LLMs. This data is instantly synced to your local environment for testing.

2. Deep Evaluation & Analysis

Test results are automatically uploaded to alpha.app.fluxloop.ai for deep inspection:

  • 🕵️ Trace Analysis: Step-by-step debugging of agent conversations
  • 📈 Performance Metrics: Success rates, latency, token usage trends
  • ⚖️ Comparison: Side-by-side view of how recent changes affected behavior

3. The Perfect Loop

  1. You: "Test my agent" (Claude Code)
  2. Web: Generates test scenarios (Cloud)
  3. CLI: Runs tests locally (Local)
  4. Web: Analyzes results (Cloud)
  5. You: Review summary in IDE & detailed report on Web

What You Can Do

| Capability | How |
| --- | --- |
| 🤖 Conversational Testing | "test my agent with angry customers" |
| 🎯 Instrument Agents | `@fluxloop.agent()` decorator |
| 📝 Synthesize Inputs | Skill generates realistic test data |
| 🧪 Run Simulations | Batch experiments with parallel execution |
| 💬 Multi-Turn Conversations | Auto-extend into dialogues |
| 📊 Analyze Results | Get insights and fix suggestions |

Links

| Resource | URL |
| --- | --- |
| FluxLoop Web | alpha.app.fluxloop.ai |
| Documentation | docs.fluxloop.ai |
| Claude Code Plugin | docs.fluxloop.ai/claude-code |
| CLI Docs | docs.fluxloop.ai/cli |
| SDK Docs | docs.fluxloop.ai/sdk |

🤝 Why Contribute?

We're building the future of AI agent testing, where your coding agent tests your AI agents.

  • Improve agentic workflows: Make the Claude Code skill smarter
  • Build framework adapters: LangChain, LlamaIndex, CrewAI
  • Enhance synthesis: Better intent-to-input generation
  • Develop evaluation methods: Novel agent performance metrics

Check out our contribution guide and open issues.


🚨 Community & Support


📄 License

FluxLoop is licensed under the Apache License 2.0.
