Iterate. Tune. Ship better AI.
IteraTūn is an open-source prompt version control and A/B testing platform for AI applications.
It gives your prompts the same discipline that Git gives your code.
Every developer building AI applications has this problem — and almost nobody talks about it.
You write a prompt, it works okay, you tweak it, maybe it gets better, maybe worse.
You have no idea. You just keep going.
```
Week 1: "Summarize this in 3 bullet points."        ← worked great
Week 2: "Summarize this in 5 bullet points."        ← seemed fine?
Week 3: "Summarize this briefly, formal tone."      ← users complaining now
Week 4: ...you can't even remember what the old prompt was
```
There is no history. No comparison. No data. No way to go back.
Your codebase is version controlled. Your database is backed up. Your prompts?
Hardcoded in a .env file somewhere, changed on a whim, lost forever.
| Problem | IteraTūn Solution |
|---|---|
| No prompt history | Every change is saved as a new version |
| No way to compare prompts | Side-by-side diff between any two versions |
| Prompt changes are guesswork | A/B test variants with real traffic |
| No performance data | Automated scoring via LLM-as-judge |
| Prompts hardcoded in codebase | Fetch prompts via API at runtime |
| Can't rollback a bad prompt | One-click rollback to any previous version |
```python
from iteratun import IteraTun

client = IteraTun(api_key="your-api-key")

# Save a new version of your prompt
client.push(
    name="summarize-document",
    content="Summarize the following text in 3 bullet points.",
    tags=["summarization", "v1"],
)
```

```python
import openai

# Instead of hardcoding prompts in your codebase
prompt = client.get("summarize-document", version="latest")

# Use it normally with any LLM
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"{prompt.content}\n\n{document}"}],
)
```

Every time your app calls `.get()`, IteraTūn logs:
- Which version was served
- The response received
- Latency and token count
- Automated quality score
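This section doesn't show an SDK call for reading those logs back, but the REST API listed further below exposes a scores endpoint. A minimal sketch using `requests`; the bearer-token header and the JSON response shape are assumptions, not confirmed by the docs:

```python
# Sketch only: pulls aggregated evaluation scores from the documented
# GET /api/prompts/{name}/scores endpoint. The auth header and the
# response shape are assumptions, not confirmed by the docs.
import requests

resp = requests.get(
    "http://localhost:8000/api/prompts/summarize-document/scores",
    headers={"Authorization": "Bearer your-api-key"},
    timeout=10,
)
resp.raise_for_status()
for point in resp.json():  # assumed: one score record per version over time
    print(point)
```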
```python
# Not sure if v1 or v2 is better? Split the traffic.
client.ab_test(
    name="summarize-document",
    variants=["v1", "v2"],
    split=[50, 50],  # 50% of traffic each
)
```

IteraTūn handles the routing. After enough samples, check the dashboard: the data tells you which prompt wins.
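While a test is running, the routing is invisible to your code: you keep calling `.get()` and IteraTūn picks the variant. A hypothetical sketch of what that looks like; both the argument-free `.get()` routing behavior and the `.version` attribute are assumptions for illustration:

```python
# Hypothetical sketch: during an A/B test, each .get() is served one of
# the variants. The .version attribute is an assumption, not documented.
from collections import Counter

served = Counter()
for _ in range(100):
    prompt = client.get("summarize-document")  # A/B engine picks v1 or v2
    served[prompt.version] += 1

print(served)  # roughly Counter({"v1": 50, "v2": 50}) with a 50/50 split
```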
```python
# Bad prompt in production? Roll back in one line.
client.rollback("summarize-document", to_version="v1")
```

```
┌─────────────────────────────────────────────────────────┐
│ Your AI App │
│ (uses IteraTūn Python SDK) │
└──────────────────────────┬──────────────────────────────┘
│ push / get / rollback
▼
┌─────────────────────────────────────────────────────────┐
│ IteraTūn API (FastAPI) │
│ │
│ ┌─────────────┐ ┌────────────┐ ┌──────────────┐ │
│ │ Prompt │ │ A/B Test │ │ Version │ │
│ │ Registry │ │ Engine │ │ Control │ │
│ └──────┬──────┘ └─────┬──────┘ └──────┬───────┘ │
│ │ │ │ │
│ └────────────────▼──────────────────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ Kafka │ │
│ │ (async eval │ │
│ │ queue) │ │
│ └──────┬──────┘ │
│ │ │
│ ┌────────────▼────────────┐ │
│ │ Eval Engine │ │
│ │ (LLM-as-judge scores │ │
│ │ every response) │ │
│ └────────────┬────────────┘ │
└───────────────────────────┼─────────────────────────────┘
│
┌─────────────────┼──────────────────┐
│ │ │
┌──────▼──────┐ ┌───────▼──────┐ ┌───────▼──────┐
│ PostgreSQL │ │ Redis │ │ Dashboard │
│ (versions, │ │ (sessions, │ │ (WebSocket │
│ scores, │ │ caching, │ │ live view) │
│ A/B data) │ │ rate limit) │ │ │
└─────────────┘ └──────────────┘ └──────────────┘
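```

The Kafka hop is what keeps evaluation off the request path. As an illustration of the pattern (not IteraTūn's actual code; the topic name and message schema are invented), the API produces an eval job and the eval engine consumes it asynchronously:

```python
# Illustrative async-eval pattern using kafka-python. Not IteraTūn's code:
# the "eval-jobs" topic and the message fields are invented for the example.
import json
from kafka import KafkaProducer, KafkaConsumer

# API side: enqueue the job instead of scoring inline
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode(),
)
producer.send("eval-jobs", {
    "prompt": "summarize-document",
    "version": "v2",
    "response": "...model output...",
})
producer.flush()

def judge(job: dict) -> dict:
    # Placeholder for the LLM-as-judge call (sketched further below).
    return {"relevance": 5}

# Eval-engine side: consume and score off the request path
consumer = KafkaConsumer(
    "eval-jobs",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode()),
)
for record in consumer:
    print(record.value["version"], judge(record.value))
```

Because scoring happens in the consumer, a slow judge model never adds latency to the calling app.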
| Layer | Technology | Purpose |
|---|---|---|
| API | FastAPI | Core REST API |
| SDK | Python package | `pip install iteratun` |
| Queue | Apache Kafka | Async evaluation pipeline |
| Database | PostgreSQL | Prompts, versions, scores, A/B data |
| Cache | Redis | Sessions, rate limiting, fast lookups |
| Auth | OAuth2 + JWT | GitHub / Google login, API keys |
| Realtime | WebSockets | Live dashboard updates |
| Deploy | Docker + Kubernetes | Scalable microservice deployment |
- Every prompt change creates an immutable version
- Full diff view between any two versions
- Timestamped history with author and changelog
- One-click rollback to any previous version
- Split traffic between prompt variants by percentage
- Automatic traffic routing via SDK
- Statistical significance tracking
- Winner detection with configurable thresholds
- LLM-as-judge scores every response automatically (a minimal sketch follows this list)
- Scores on: relevance, instruction following, tone consistency, length
- Scores aggregated per prompt version over time
- Async via Kafka — zero latency impact on your app
- Live view of prompt performance via WebSockets
- A/B test progress and winner stats
- Version history timeline
- Per-tag and per-model breakdown
- Project-based isolation
- API key management per project
- OAuth2 login via GitHub / Google
- Rate limiting per user and per project
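Here is the minimal illustration of the LLM-as-judge idea referenced in the list above: ask a strong model to grade a response on the four listed dimensions and return JSON. This is a sketch, not IteraTūn's implementation; the judge prompt and score schema are invented:

```python
# Minimal LLM-as-judge sketch; not IteraTūn's implementation. The judge
# prompt and the JSON score schema are invented for illustration.
import json
import openai

JUDGE_PROMPT = """You are a strict evaluator. Score the response against the
instruction from 1 to 5 on each dimension. Reply with JSON only:
{{"relevance": n, "instruction_following": n, "tone_consistency": n, "length": n}}

Instruction: {instruction}
Response: {response}"""

def judge(instruction: str, response: str) -> dict:
    completion = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(instruction=instruction,
                                                  response=response)}],
        response_format={"type": "json_object"},
    )
    return json.loads(completion.choices[0].message.content)
```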
```
POST /api/prompts                      → create a new prompt
POST /api/prompts/{name}/versions      → push a new version
GET  /api/prompts/{name}               → get latest version
GET  /api/prompts/{name}/versions      → list all versions
GET  /api/prompts/{name}/diff          → diff between two versions
POST /api/prompts/{name}/rollback      → roll back to a version
POST /api/prompts/{name}/ab-test       → start an A/B test
GET  /api/prompts/{name}/ab-test       → get A/B test results
GET  /api/prompts/{name}/scores        → get evaluation scores over time
```
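You can also hit these endpoints without the SDK. A sketch of querying the diff endpoint with `requests`; the `from`/`to` query-parameter names and the auth header are assumptions:

```python
# Sketch of a direct REST call to the diff endpoint. Query-parameter names
# and the auth header are assumptions, not confirmed by the docs.
import requests

resp = requests.get(
    "http://localhost:8000/api/prompts/summarize-document/diff",
    params={"from": "v1", "to": "v2"},
    headers={"Authorization": "Bearer your-api-key"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```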
```bash
# Clone the repo
git clone https://github.com/AditHash/iteratun
cd iteratun

# Start all services
docker-compose up -d

# API is live at http://localhost:8000
# Dashboard at http://localhost:3000
```

```bash
pip install iteratun
```

```python
from iteratun import IteraTun

client = IteraTun(api_key="your-api-key")

# Push your first prompt
client.push(
    name="my-first-prompt",
    content="You are a helpful assistant. Answer clearly and concisely."
)

# Fetch and use it
prompt = client.get("my-first-prompt")
print(prompt.content)
```

- Prompt versioning and history
- A/B testing engine
- Automated LLM-as-judge evaluation
- Python SDK
- JavaScript / TypeScript SDK
- Prompt templates with variable injection
- Team collaboration and comments per version
- Webhook alerts on performance degradation
- Integration with LangChain and LlamaIndex
- Self-hosted one-click deploy (Railway / Render)
The name says it all.
Itera — you iterate over prompt versions, testing and refining.
Tūn — you tune prompts to peak performance using real data.
Every serious AI team does prompt iteration. Almost none of them do it with any discipline or data.
IteraTūn changes that.
IteraTūn is open source and contributions are welcome.
```bash
# Fork the repo, create a branch
git checkout -b feature/your-feature

# Make your changes, then open a PR
```

Please read CONTRIBUTING.md before submitting a pull request.
MIT License — free to use, modify, and distribute.
Built with ❤️ by AditHash