
IteraTūn

Iterate. Tune. Ship better AI.

IteraTūn is an open-source prompt version control and A/B testing platform for AI applications.
It gives your prompts the same discipline that Git gives your code.


The Problem

Every developer building AI applications has this problem — and almost nobody talks about it.

You write a prompt, it works okay, you tweak it, maybe it gets better, maybe worse.
You have no idea. You just keep going.

Week 1:  "Summarize this in 3 bullet points."          ← worked great
Week 2:  "Summarize this in 5 bullet points."          ← seemed fine?
Week 3:  "Summarize this briefly, formal tone."        ← users complaining now
Week 4:  ...you can't even remember what the old prompt was

There is no history. No comparison. No data. No way to go back.

Your codebase is version controlled. Your database is backed up. Your prompts?
Hardcoded in a .env file somewhere, changed on a whim, lost forever.


What IteraTūn Solves

Problem                            IteraTūn Solution
No prompt history                  Every change is saved as a new version
No way to compare prompts          Side-by-side diff between any two versions
Prompt changes are guesswork       A/B test variants with real traffic
No performance data                Automated scoring via LLM-as-judge
Prompts hardcoded in codebase      Fetch prompts via API at runtime
Can't roll back a bad prompt       One-click rollback to any previous version

How It Works

1. You push a prompt to IteraTūn

from iteratun import IteraTun

client = IteraTun(api_key="your-api-key")

# Save a new version of your prompt
client.push(
    name="summarize-document",
    content="Summarize the following text in 3 bullet points.",
    tags=["summarization", "v1"]
)

2. Your app fetches the prompt at runtime

import openai

# Instead of hardcoding prompts in your codebase
prompt = client.get("summarize-document", version="latest")

# Use it normally with any LLM (document is your input text)
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"{prompt.content}\n\n{document}"}]
)

3. IteraTūn tracks everything automatically

Every time your app calls .get(), IteraTūn logs:

  • Which version was served
  • The response received
  • Latency and token count
  • Automated quality score

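Every served request also feeds the scores endpoint listed in the API Reference below. The SDK call for reading these back isn't shown in this README, so here is a rough sketch against the REST API using requests; the base URL, Bearer auth, and response shape are assumptions for a local instance.

import requests

# Assumed: self-hosted API on localhost:8000 and Bearer-token auth.
resp = requests.get(
    "http://localhost:8000/api/prompts/summarize-document/scores",
    headers={"Authorization": "Bearer your-api-key"},
)
resp.raise_for_status()

# The record shape is an assumption: one aggregated score per version over time.
for record in resp.json():
    print(record)
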
4. You run an A/B test when you're unsure

# Not sure if v1 or v2 is better? Split the traffic.
client.ab_test(
    name="summarize-document",
    variants=["v1", "v2"],
    split=[50, 50]   # 50% traffic each
)

IteraTūn handles the routing. After enough samples, check the dashboard — the data tells you which prompt wins.
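
On the application side, nothing has to change while a test runs. A minimal sketch, assuming that a .get() call with no pinned version is routed to whichever variant the traffic split assigns:

# During an active A/B test, each request is served v1 or v2 according
# to the configured split (assumption: routing applies when no specific
# version is pinned in the .get() call).
prompt = client.get("summarize-document")
print(prompt.content)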

5. Roll back instantly if something breaks

# Bad prompt in production? Roll back in one line.
client.rollback("summarize-document", to_version="v1")

Architecture

┌─────────────────────────────────────────────────────────┐
│                     Your AI App                         │
│              (uses IteraTūn Python SDK)                 │
└──────────────────────────┬──────────────────────────────┘
                           │ push / get / rollback
                           ▼
┌─────────────────────────────────────────────────────────┐
│                  IteraTūn API (FastAPI)                  │
│                                                         │
│   ┌─────────────┐   ┌────────────┐   ┌──────────────┐  │
│   │   Prompt    │   │  A/B Test  │   │   Version    │  │
│   │  Registry   │   │   Engine   │   │   Control    │  │
│   └──────┬──────┘   └─────┬──────┘   └──────┬───────┘  │
│          │                │                  │          │
│          └────────────────▼──────────────────┘          │
│                           │                             │
│                    ┌──────▼──────┐                      │
│                    │    Kafka    │                      │
│                    │ (async eval │                      │
│                    │   queue)    │                      │
│                    └──────┬──────┘                      │
│                           │                             │
│              ┌────────────▼────────────┐                │
│              │      Eval Engine        │                │
│              │  (LLM-as-judge scores   │                │
│              │   every response)       │                │
│              └────────────┬────────────┘                │
└───────────────────────────┼─────────────────────────────┘
                            │
          ┌─────────────────┼──────────────────┐
          │                 │                  │
   ┌──────▼──────┐  ┌───────▼──────┐  ┌───────▼──────┐
   │ PostgreSQL  │  │    Redis     │  │   Dashboard  │
   │ (versions,  │  │  (sessions,  │  │  (WebSocket  │
   │  scores,    │  │   caching,   │  │   live view) │
   │  A/B data)  │  │  rate limit) │  │              │
   └─────────────┘  └──────────────┘  └──────────────┘

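The async evaluation path in the diagram can be pictured roughly as follows: the API enqueues one job per served response, and the Eval Engine consumes the queue and writes scores to PostgreSQL. The topic name, message shape, and client library (kafka-python) are illustrative assumptions, not IteraTūn internals.

import json
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# One evaluation job per served response; the Eval Engine consumes
# these and stores LLM-as-judge scores in PostgreSQL.
producer.send("eval-jobs", {
    "prompt_name": "summarize-document",
    "version": "v2",
    "response": "model output goes here",
    "latency_ms": 812,
})
producer.flush()
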
Tech Stack

Layer     Technology           Purpose
API       FastAPI              Core REST API
SDK       Python Package       pip install iteratun
Queue     Apache Kafka         Async evaluation pipeline
Database  PostgreSQL           Prompts, versions, scores, A/B data
Cache     Redis                Sessions, rate limiting, fast lookups
Auth      OAuth2 + JWT         GitHub / Google login, API keys
Realtime  WebSockets           Live dashboard updates
Deploy    Docker + Kubernetes  Scalable microservice deployment

Key Features

🗂️ Version Control

  • Every prompt change creates an immutable version
  • Full diff view between any two versions
  • Timestamped history with author and changelog
  • One-click rollback to any previous version

⚡ A/B Testing

  • Split traffic between prompt variants by percentage
  • Automatic traffic routing via SDK
  • Statistical significance tracking (see the sketch after this list)
  • Winner detection with configurable thresholds
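
"Statistical significance" here just means checking that an observed difference between variants is unlikely to be noise. A self-contained illustration using a two-proportion z-test; the counts are made-up example numbers, and this is not IteraTūn's internal implementation:

from math import erf, sqrt

def two_proportion_p_value(wins_a, total_a, wins_b, total_b):
    """Two-sided p-value for the difference between two success rates."""
    p_a, p_b = wins_a / total_a, wins_b / total_b
    pooled = (wins_a + wins_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Convert |z| to a two-sided p-value via the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# v2 looks better on 200 samples each, but is the gap real?
p = two_proportion_p_value(wins_a=130, total_a=200, wins_b=155, total_b=200)
print(f"p = {p:.4f}")  # well below 0.05 here, so the gap is probably real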

🤖 Automated Evaluation

  • LLM-as-judge scores every response automatically (illustrated after this list)
  • Scores on: relevance, instruction following, tone consistency, length
  • Scores aggregated per prompt version over time
  • Async via Kafka — zero latency impact on your app
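
To make "LLM-as-judge" concrete, here is a minimal sketch of the technique: a second model grades each response on the rubric above and returns structured scores. The rubric wording, judge model, and output format are illustrative assumptions, not the Eval Engine's actual code.

from openai import OpenAI

judge = OpenAI()  # any chat-completion model can act as the judge

def judge_response(prompt_text: str, response_text: str) -> str:
    """Ask a judge model to grade one response on the four criteria above."""
    rubric = (
        "Rate the RESPONSE to the PROMPT from 1 to 5 on relevance, "
        "instruction_following, tone_consistency, and length. "
        "Reply with a JSON object containing exactly those four keys."
    )
    result = judge.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"{rubric}\n\nPROMPT:\n{prompt_text}\n\nRESPONSE:\n{response_text}",
        }],
    )
    return result.choices[0].message.content  # e.g. '{"relevance": 5, ...}'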

📊 Dashboard

  • Live view of prompt performance via WebSockets
  • A/B test progress and winner stats
  • Version history timeline
  • Per-tag and per-model breakdown

🔑 Multi-tenant & Secure

  • Project-based isolation
  • API key management per project
  • OAuth2 login via GitHub / Google
  • Rate limiting per user and per project

API Reference (Quick Look)

POST   /api/prompts                     → create a new prompt
POST   /api/prompts/{name}/versions     → push a new version
GET    /api/prompts/{name}              → get latest version
GET    /api/prompts/{name}/versions     → list all versions
GET    /api/prompts/{name}/diff         → diff between two versions
POST   /api/prompts/{name}/rollback     → rollback to a version
POST   /api/prompts/{name}/ab-test      → start an A/B test
GET    /api/prompts/{name}/ab-test      → get A/B test results
GET    /api/prompts/{name}/scores       → get evaluation scores over time
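
Anything the SDK snippets above don't cover can be called directly. A sketch of the diff endpoint using requests; the query parameter names and Bearer auth are assumptions:

import requests

diff = requests.get(
    "http://localhost:8000/api/prompts/summarize-document/diff",
    params={"from": "v1", "to": "v2"},                 # assumed parameter names
    headers={"Authorization": "Bearer your-api-key"},  # assumed auth scheme
)
print(diff.json())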

Getting Started

Run Locally

# Clone the repo
git clone https://github.com/AditHash/iteratun
cd iteratun

# Start all services
docker-compose up -d

# API is live at   http://localhost:8000
# Dashboard at     http://localhost:3000

Install the SDK

pip install iteratun

Quick Start

from iteratun import IteraTun

client = IteraTun(api_key="your-api-key")

# Push your first prompt
client.push(
    name="my-first-prompt",
    content="You are a helpful assistant. Answer clearly and concisely."
)

# Fetch and use it
prompt = client.get("my-first-prompt")
print(prompt.content)

Roadmap

  • Prompt versioning and history
  • A/B testing engine
  • Automated LLM-as-judge evaluation
  • Python SDK
  • JavaScript / TypeScript SDK
  • Prompt templates with variable injection
  • Team collaboration and comments per version
  • Webhook alerts on performance degradation
  • Integration with LangChain and LlamaIndex
  • Self-hosted one-click deploy (Railway / Render)

Why IteraTūn?

The name says it all.

Itera — you iterate over prompt versions, testing and refining.
Tūn — you tune prompts to peak performance using real data.

Every serious AI team does prompt iteration. Almost none of them do it with any discipline or data.
IteraTūn changes that.


Contributing

IteraTūn is open source and contributions are welcome.

# Fork the repo, create a branch
git checkout -b feature/your-feature

# Make your changes, then open a PR

Please read CONTRIBUTING.md before submitting a pull request.


License

MIT License — free to use, modify, and distribute.


Built with ❤️ by AditHash
