This program implements an AI software engineering loop that:
- Reads a software specification prompt
- Generates a Python implementation that satisfies the spec
- Identifies required libraries from the implementation and installs them in a virtual environment
- Creates and runs tests for the implementation
- Analyzes test results
- Iteratively improves the implementation until all tests pass
The program uses Ollama with configurable models to generate code and maintains a memory file to track learnings across iterations. It includes a model evaluator that can test multiple Ollama models to compare their performance.
- Python 3.7+
- Ollama installed and running locally
- Langfuse running locally
- uv (Python package installer and environment manager)
- Clone this repository.
- Install the required dependencies:
  ```
  pip install openai langfuse python-dotenv
  ```
- Install uv:
  ```
  pip install uv
  ```
- Make sure Ollama is running with the `deepseek-coder` model:
  ```
  ollama pull deepseek-coder
  ollama run deepseek-coder
  ```
- Set up Langfuse environment variables in a `.env` file:
  ```
  LANGFUSE_SECRET_KEY=your_secret_key
  LANGFUSE_PUBLIC_KEY=your_public_key
  ```
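In the real program, python-dotenv's `load_dotenv()` reads this file and the Langfuse SDK picks up the keys from the environment. As a minimal illustration of what that loading step does, here is a stdlib-only sketch of a `.env` parser (the helper name `load_env` is ours, not the program's):

```python
import os

def load_env(path=".env"):
    """Minimal .env parser: put KEY=value lines into os.environ.
    Illustrative sketch; the actual program uses python-dotenv's load_dotenv()."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and malformed lines
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Don't overwrite variables already set in the environment
            os.environ.setdefault(key.strip(), value.strip())
```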
You can provide the software specification prompt in three ways:
- As a command-line argument:
  ```
  python ai_engineer.py "Create a function that calculates the factorial of a number"
  ```
- As a file containing the prompt:
  ```
  python ai_engineer.py prompt.txt
  ```
- Via standard input:
  ```
  python ai_engineer.py
  # Then type or paste your prompt and press Ctrl+D when finished
  ```
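The three input modes above can be resolved with a small dispatch function. This is a sketch, not the program's actual code; the helper name `read_prompt` is assumed:

```python
import os
import sys

def read_prompt(argv):
    """Resolve the spec prompt: a file path argument, a literal prompt
    argument, or standard input (sketch of the dispatch described above)."""
    if len(argv) > 1:
        arg = argv[1]
        if os.path.isfile(arg):       # a file containing the prompt
            with open(arg) as f:
                return f.read()
        return arg                    # the prompt itself, passed inline
    return sys.stdin.read()           # typed or piped, ends at Ctrl+D
```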
1. The program reads the software specification prompt.
2. It uses Ollama with the configured model to generate an initial implementation with tests.
3. It runs the tests and checks if they pass.
4. If tests fail, it updates a memory file with learnings from the current iteration.
5. It generates an improved implementation based on the original prompt, test results, and memory.
6. Steps 3-5 repeat until all tests pass or the maximum number of iterations is reached.
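The loop described above can be sketched as follows. The helper functions (`generate_implementation`, `run_tests`, `update_memory`) are hypothetical stand-ins for the program's actual Ollama calls, test runner, and memory-file updates:

```python
MAX_ITERATIONS = 10  # documented default; prevents infinite loops

def engineering_loop(prompt, generate_implementation, run_tests, update_memory):
    """Sketch of the iterate-until-green loop; helpers are stand-ins."""
    memory = {}
    implementation = generate_implementation(prompt, None, memory)
    for iteration in range(MAX_ITERATIONS):
        results = run_tests(implementation)                   # step 3
        if results["passed"]:
            return implementation                             # all tests pass
        memory = update_memory(memory, iteration, results)    # step 4
        implementation = generate_implementation(prompt, results, memory)  # step 5
    return implementation  # best effort after the iteration cap
```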
- The evaluator identifies all available Ollama models.
- For each model, it runs multiple evaluations (default: 5 runs per model).
- Each run allows multiple iterations (default: 3 iterations per run).
- Results are organized in a directory structure:
  ```
  model_evaluations/
  ├── model_name_1/
  │   ├── 001/
  │   │   ├── implementation.py
  │   │   ├── memory.json
  │   │   └── output.log
  │   ├── 002/
  │   └── ...
  ├── model_name_2/
  └── ...
  ```
- A global results file (`model_evaluation_results.json`) tracks the performance of each model.
- The evaluator can be interrupted and resumed at any point.
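A sketch of the evaluator's bookkeeping, showing the per-run directory layout and how skipping existing run directories enables resuming. `run_single_evaluation` is a hypothetical stand-in for one full evaluation run; the real program's internals may differ:

```python
import json
import os

RESULTS_FILE = "model_evaluation_results.json"

def evaluate_models(models, run_single_evaluation,
                    base_dir="model_evaluations", runs_per_model=5):
    """Sketch: evaluate each model over numbered run directories (001/, 002/, ...)."""
    results = {}
    if os.path.exists(RESULTS_FILE):          # resume from a prior session
        with open(RESULTS_FILE) as f:
            results = json.load(f)
    for model in models:
        for run in range(1, runs_per_model + 1):
            run_dir = os.path.join(base_dir, model, f"{run:03d}")
            if os.path.isdir(run_dir):        # already done: skip on resume
                continue
            os.makedirs(run_dir)
            results.setdefault(model, []).append(
                run_single_evaluation(model, run_dir))
            with open(RESULTS_FILE, "w") as f:  # persist after every run
                json.dump(results, f, indent=2)
    return results
```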
- `implementation.py`: The generated Python implementation
- `memory.json`: A JSON file containing learnings from each iteration
- The program is limited to a maximum of 10 iterations to prevent infinite loops.
- The implementation is limited to a single Python file.
- Test execution has a timeout of 30 seconds to prevent hanging.
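The 30-second timeout can be enforced by running the tests in a subprocess, roughly as below. This is a sketch under the assumption that tests are executed as a separate Python process; the function name `run_tests` is ours:

```python
import subprocess
import sys

TEST_TIMEOUT = 30  # seconds; a hung test run is killed rather than stalling the loop

def run_tests(test_file="implementation.py"):
    """Run the generated file in a subprocess with a hard timeout (sketch)."""
    try:
        proc = subprocess.run(
            [sys.executable, test_file],
            capture_output=True, text=True, timeout=TEST_TIMEOUT)
        return {"passed": proc.returncode == 0,
                "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        return {"passed": False, "stdout": "", "stderr": "test run timed out"}
```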
You can modify the following constants in the script:
- `MODEL`: The Ollama model to use (default: `"deepseek-coder"`)
- `MEMORY_FILE`: The file to store memory (default: `"memory.json"`)
- `IMPLEMENTATION_FILE`: The file to store the implementation (default: `"implementation.py"`)
- `MAX_ITERATIONS`: Maximum number of iterations (default: `10`)
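For reference, the constants block might look like this near the top of the script (names and defaults as documented here; the surrounding code is not shown):

```python
# Configurable constants (documented defaults)
MODEL = "deepseek-coder"              # Ollama model to use
MEMORY_FILE = "memory.json"           # where cross-iteration learnings are stored
IMPLEMENTATION_FILE = "implementation.py"  # where the generated code is written
MAX_ITERATIONS = 10                   # hard cap on improvement iterations
```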