Self-Improving Agent

An AI agent that automatically improves its own prompt. Give it any task — it runs, evaluates its own output, rewrites its prompt based on the feedback, and tries again. Each iteration gets better.

Built from scratch. No LangChain. No AutoGen. Just Python + GPT-4o-mini.

How it works

User gives a task
      ↓
Agent runs with current prompt → produces output
      ↓
Evaluator scores output (1-10) across 5 criteria
      ↓
Improver reads score + weaknesses → rewrites the prompt
      ↓
Agent runs again with the improved prompt
      ↓
Repeat until target score reached or max iterations hit
      ↓
Best output wins

The prompt literally rewrites itself every iteration based on what the evaluator found wrong.

Demo

Iteration 1 — Generic prompt → Score 7/10
Iteration 2 — Improver adds analogies, examples, comparisons → Score 8/10
Iteration 3 — Improver refines persona + adds summary instruction → Accuracy 9/10

Score improved from 7 → 8 → 8 across 3 iterations on "Explain transformer architecture to a beginner."

Project structure

self-improving-agent/
├── app.py              # Streamlit UI with iteration viewer
├── agent.py            # Runs the task with current prompt
├── evaluator.py        # Scores output across 5 criteria
├── improver.py         # Rewrites prompt based on feedback
├── prompt_memory.py    # Saves all prompt versions + scores
├── prompts.py          # Base prompts for all 3 agents
├── config.py           # Settings + API keys from .env
└── requirements.txt

Tech stack

Layer	Tool
LLM	GPT-4o-mini (OpenAI)
Frontend	Streamlit
Memory	JSON persistence
Pattern	Meta-agent / self-optimization loop
Config	python-dotenv

Setup

1. Clone the repo

git clone https://github.com/ES7/self-improving-agent
cd self-improving-agent

2. Install dependencies

pip install -r requirements.txt

3. Set up API key

cp .env.example .env

Fill in .env:

OPENAI_API_KEY=your_openai_key_here

4. Run

streamlit run app.py

Evaluation criteria

Every output is scored across 5 dimensions:

Criteria	What it measures
Completeness	Does it fully address the task?
Accuracy	Is the content correct?
Clarity	Is it well-structured and easy to understand?
Depth	Does it go beyond surface level?
Usefulness	Would this actually help someone?

The improver uses weak scores to rewrite the prompt specifically for those gaps.

Settings

Adjustable from the sidebar:

Max iterations — how many times the agent improves (1-5)
Target score — stop early when this score is reached (7-10)

What makes this different

Most AI systems run once and return an output. This one:

Runs multiple times with an evolving prompt
Has a separate evaluator agent with strict scoring criteria
Has a separate improver agent that does prompt engineering automatically
Remembers high-scoring prompts across sessions and reuses them

This is the foundation of how production AI systems do automated prompt optimization.

Author

Ebad Sayed — Final year, IIT (ISM) Dhanbad, Co-founder of Voke AI

Connect: LinkedIn · GitHub · X

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Self-Improving Agent

How it works

Demo

Project structure

Tech stack

Setup

Evaluation criteria

Settings

What makes this different

Author

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 1

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
Output		Output
.env.example		.env.example
LICENSE		LICENSE
README.md		README.md
agent.py		agent.py
app.py		app.py
config.py		config.py
evaluator.py		evaluator.py
improver.py		improver.py
prompt_history.json		prompt_history.json
prompt_memory.py		prompt_memory.py
prompts.py		prompts.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Self-Improving Agent

How it works

Demo

Project structure

Tech stack

Setup

Evaluation criteria

Settings

What makes this different

Author

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 1

Languages

Packages