@enitrat enitrat commented Jul 17, 2025

PR Summary: Architectural Overhaul - Migration to Python/DSPy Backend

This pull request is a major architectural overhaul of the Cairo Coder project. The core backend service has been migrated from a TypeScript/LangChain implementation to a new, more robust, and optimizable Python architecture built on the DSPy framework.

The new backend is designed for structured AI programming, enabling metric-driven optimization of the RAG pipeline, improved performance, and a more sophisticated testing and evaluation framework. The legacy TypeScript code has been preserved for reference but is now superseded by the Python service as the primary backend.


1. High-Level Summary of Changes

  • Backend Migration: The primary service logic, previously in TypeScript (packages/agents and packages/backend), has been rewritten in Python under the new python/ directory.
  • Framework Shift: Replaced LangChain with DSPy, shifting from prompt chaining to a structured, optimizable programming model for language models.
  • New API Server: A new FastAPI server replaces the old Express.js server, maintaining full OpenAI API compatibility for a seamless transition.
  • Enhanced Data Pipeline: A new Python-based documentation summarizer script has been introduced. This script pre-processes and summarizes documentation sources, which are then consumed by the existing TypeScript ingester, decoupling data preparation from the ingestion process.
  • Formalized Evaluation Framework: A comprehensive evaluation system has been built using the Starklings exercises to quantitatively measure the agent's code generation quality.
  • Modern Python Tooling: The Python ecosystem is managed with modern tools like uv for package management, ruff for linting/formatting, and pytest for testing.
  • Legacy Code Preservation: The original TypeScript backend code and Dockerfile have been renamed to .old files, and instructions for running it are preserved in a new README.old.md.

2. New Python Backend Architecture (python/ directory)

A complete, self-contained Python application has been built and now serves as the core of Cairo Coder.

2.1. Technology Stack

  • Language: Python 3.10+
  • Web Framework: FastAPI for the asynchronous, high-performance API server.
  • AI Framework: DSPy for building structured, composable, and optimizable RAG pipelines.
  • Database: PostgreSQL with pgvector for vector storage, accessed via asyncpg.
  • Tooling:
    • uv: High-performance package manager and virtual environment tool.
    • ruff: Linter and formatter.
    • pytest: For unit and integration testing.
    • marimo & mlflow: For interactive notebooks and experiment tracking during DSPy pipeline optimization.

2.2. Core RAG Pipeline (Built with DSPy)

The core logic is a multi-stage RAG pipeline, where each stage is a distinct, optimizable DSPy module. This modularity is a significant improvement over the previous implementation. The pipeline is located in python/src/cairo_coder/dspy/.

  1. QueryProcessorProgram (query_processor.py):

    • Purpose: The first stage of the pipeline. It takes the raw user query and chat history.
    • Functionality: It uses a DSPy ChainOfThought signature (CairoQueryAnalysis) to analyze the query, extract semantic search terms, and identify the most relevant documentation sources (e.g., cairo_book, openzeppelin_docs).
    • Output: A structured ProcessedQuery object containing search queries and a list of target DocumentSource enums.
  2. DocumentRetrieverProgram (document_retriever.py):

    • Purpose: The retrieval stage. It takes the ProcessedQuery from the previous step.
    • Functionality: It uses a custom SourceFilteredPgVectorRM (a subclass of DSPy's PgVectorRM) to perform a vector search against the PostgreSQL database. It filters results based on the sources identified by the Query Processor and a similarity threshold.
    • Output: A list of relevant Document objects.
  3. GenerationProgram (generation_program.py):

    • Purpose: The final generation stage. It takes the user query and the retrieved context (documents).
    • Functionality: It uses a CairoCodeGeneration signature within a ChainOfThought module to generate the final Cairo code and explanation. The context provided to the LLM is a carefully formatted string containing all the retrieved documents.
    • Streaming & MCP: It supports both streaming responses (for real-time output) and a special "MCP" (Model Context Protocol) mode that returns the raw, formatted documentation context instead of a generated answer.
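The three stages above compose into a single pipeline. The following is a minimal, stdlib-only sketch of that data flow; class and field names such as `ProcessedQuery` mirror those in the PR, but the bodies are illustrative stand-ins, not the actual DSPy modules (which wrap `dspy.ChainOfThought` signatures and a pgvector retriever):

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class ProcessedQuery:
    search_queries: list[str]
    sources: list[str]  # e.g. ["cairo_book", "openzeppelin_docs"]

@dataclass
class Document:
    source: str
    content: str

class QueryProcessorProgram:
    """Stage 1: analyze the raw query into structured search terms."""
    def forward(self, query: str, chat_history: list[str]) -> ProcessedQuery:
        # Real version: dspy.ChainOfThought(CairoQueryAnalysis)
        return ProcessedQuery(search_queries=[query], sources=["cairo_book"])

class DocumentRetrieverProgram:
    """Stage 2: vector search filtered by source and similarity threshold."""
    def __init__(self, corpus: list[Document]):
        self.corpus = corpus
    def forward(self, pq: ProcessedQuery) -> list[Document]:
        # Real version: SourceFilteredPgVectorRM against PostgreSQL/pgvector
        return [d for d in self.corpus if d.source in pq.sources]

class GenerationProgram:
    """Stage 3: generate Cairo code from query + retrieved context."""
    def forward(self, query: str, context: list[Document]) -> str:
        # Real version: dspy.ChainOfThought(CairoCodeGeneration)
        joined = "\n---\n".join(d.content for d in context)
        return f"// answer to: {query}\n// context:\n{joined}"

class RagPipeline:
    """Chains the three stages; each remains independently optimizable."""
    def __init__(self, corpus: list[Document]):
        self.qp = QueryProcessorProgram()
        self.retriever = DocumentRetrieverProgram(corpus)
        self.gen = GenerationProgram()
    def forward(self, query: str, chat_history: list[str] | None = None) -> str:
        pq = self.qp.forward(query, chat_history or [])
        docs = self.retriever.forward(pq)
        return self.gen.forward(query, docs)
```

Because each stage is a separate object with a `forward` method, an optimizer can tune one stage's prompts and demonstrations without touching the others.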

2.3. API Server (server/app.py)

  • Framework: Built with FastAPI.
  • Endpoints:
    • Maintains the OpenAI-compatible endpoint: POST /v1/chat/completions.
    • Introduces new agent-specific endpoints like POST /v1/agents/{agent_id}/chat/completions and GET /v1/agents.
  • Features:
    • Asynchronous: Fully async, from request handling to database queries, ensuring high throughput.
    • Streaming Support: Natively supports streaming responses using Server-Sent Events (SSE), matching OpenAI's behavior.
    • Dependency Injection: Uses FastAPI's dependency injection to manage the database connection pool and the AgentFactory.
    • Lifecycle Management: Uses a lifespan context manager to initialize the database connection pool on startup and gracefully close it on shutdown.

3. Data Pipeline and Preparation

The process of ingesting documentation into the vector database has been significantly re-architected.

  • New Python Summarizer: A new CLI tool has been added under python/scripts/summarizer/. This tool uses DSPy to:

    1. Clone documentation repositories (e.g., cairo-book, cairo-docs).
    2. Build the documentation locally (e.g., using mdbook).
    3. Extract, merge, and summarize all the content into a single, clean markdown file.
    4. The generated summaries are saved in python/scripts/summarizer/generated/.
  • Modified TypeScript Ingesters: The legacy TypeScript ingesters (CairoBookIngester.ts, CoreLibDocsIngester.ts) have been modified. Instead of cloning and processing repositories themselves, they now simply read the pre-summarized markdown files generated by the new Python script.

This change decouples data preparation from data ingestion, making the pipeline more robust and letting LLM-based summarization produce a higher-quality knowledge base.
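The decoupling boils down to a shared file contract: the summarizer writes one merged markdown file per source, and the ingester only reads it. A minimal sketch (the `generated/` directory mirrors the PR; the filename pattern and helper names are assumptions for illustration):

```python
from pathlib import Path

GENERATED = "generated"  # mirrors python/scripts/summarizer/generated/

def write_summary(root: Path, source: str, text: str) -> Path:
    """Summarizer side: emit one merged markdown file per doc source."""
    out = root / GENERATED / f"{source}-summary.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(text, encoding="utf-8")
    return out

def read_summary(root: Path, source: str) -> str:
    """Ingester side: consume the pre-summarized markdown directly,
    with no repository cloning or doc building required."""
    return (root / GENERATED / f"{source}-summary.md").read_text(encoding="utf-8")
```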


4. Testing, Evaluation, and Optimization Framework

A formal, metric-driven framework has been established to evaluate and improve the agent's performance.

4.1. Starklings Evaluation Script (starklings_evaluate.py)

  • A new evaluation script automates the process of testing the Cairo Coder agent against the exercises in the Starklings repository.
  • Workflow:
    1. Clones the Starklings repository.
    2. For each exercise, it sends the problem description to the Cairo Coder API.
    3. It takes the generated code solution from the API.
    4. It attempts to compile the solution using a local Scarb project (fixtures/runner_crate).
    5. It records whether the compilation was successful.
  • Reporting: The script generates detailed JSON and Markdown reports summarizing the success rate, categorized by exercise type. This provides a quantitative measure of the agent's performance.
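The reporting step reduces to aggregating per-exercise compile outcomes into per-category success rates. A sketch of that aggregation, assuming each result is a `(category, exercise_name, compiled)` tuple (the exact record shape in the script may differ):

```python
from collections import defaultdict

def summarize_results(results: list[tuple[str, str, bool]]) -> dict:
    """Aggregate (category, exercise, compiled) tuples into success rates,
    per category and overall, as the evaluation report does."""
    by_cat: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [passed, total]
    for category, _name, compiled in results:
        by_cat[category][1] += 1
        if compiled:
            by_cat[category][0] += 1
    report = {cat: {"passed": p, "total": t, "rate": p / t}
              for cat, (p, t) in by_cat.items()}
    overall_p = sum(v["passed"] for v in report.values())
    overall_t = sum(v["total"] for v in report.values())
    report["overall"] = {"passed": overall_p, "total": overall_t,
                         "rate": overall_p / overall_t if overall_t else 0.0}
    return report
```

Serializing this dict gives the JSON report; the Markdown report is a table rendered from the same data.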

4.2. Optimization Datasets and Notebooks

  • Dataset Generation: A script (generate_starklings_dataset.py) creates a structured dataset (optimizers/datasets/generation_dataset.json) from Starklings. Each entry contains a query (the exercise), context (retrieved via RAG), and an expected answer (the official solution).
  • Optimization Notebooks: Marimo notebooks (e.g., optimizers/generation_optimizer.py) use this dataset to "compile" the DSPy programs.
  • Metric-Driven Tuning: The optimization process uses a custom metric (generation_metric) which checks if the generated code compiles successfully. The DSPy MIPROv2 optimizer tunes the prompts and few-shot examples within the RAG pipeline modules to maximize this compilation success rate.
  • Results: The optimized programs are saved as JSON files (e.g., optimizers/results/optimized_rag.json) and are loaded by the production application.
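DSPy metrics follow an `(example, prediction, trace=None)` calling convention, returning a score the optimizer maximizes. A sketch of what `generation_metric` looks like in that shape; the real compile check shells out to Scarb in `fixtures/runner_crate`, so a trivial stand-in is used here:

```python
def compiles(code: str) -> bool:
    """Placeholder for the real check, which compiles the candidate
    solution with Scarb. Here: a trivially checkable stand-in."""
    return bool(code.strip()) and "TODO" not in code

def generation_metric(example, prediction, trace=None) -> float:
    """DSPy-style metric: 1.0 if the generated Cairo code compiles, else 0.0.
    The (example, prediction, trace) signature is the convention DSPy
    optimizers such as MIPROv2 expect."""
    return 1.0 if compiles(getattr(prediction, "answer", "")) else 0.0
```

An optimizer run would then look roughly like `dspy.MIPROv2(metric=generation_metric).compile(program, trainset=...)`, producing the saved JSON artifacts described above.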

5. Development, Tooling, and CI Changes

  • Package Management: The Python project uses uv for fast dependency installation and script execution, configured in pyproject.toml.
  • Linting/Formatting: ruff has been added as the primary linter and formatter, configured in trunk.yaml.
  • CI Updates (.github/workflows/trunk-check.yaml):
    • The CI workflow now installs uv.
    • A new Ty Check step has been added to run static type checking on the Python code, ensuring code quality.
  • .gitignore: Updated to ignore Python-specific artifacts like .snfoundry_cache, test compilation targets (fixtures/runner_crate/target), and Starklings evaluation results.
  • README: The main README.md has been completely rewritten to reflect the new Python architecture, installation process (Docker-first), and development workflow.

6. Legacy Code Management

  • Renaming: The original backend.dockerfile has been moved to backend.old.dockerfile.
  • Documentation: The original TypeScript installation and development instructions from README.md have been moved to a new README.old.md to preserve them for anyone needing to run the legacy service.

enitrat added 14 commits July 20, 2025 23:12
- Remove nest_asyncio dependency and unused async patterns
- Add sync methods to RAG pipeline for better performance
- Optimize document retrieval with similarity thresholds
- Update MCP optimizer with improved query processing
- Fix test mocks and remove redundant async operations
- Clean up whitespace and improve code organization
- Add retry mechanism with max 3 attempts for AdapterParseError
- Apply retry logic to both sync and async forward methods
- Add comprehensive test coverage for retry scenarios
- Refactor tests to use parametrized testing for sync/async methods
- Ensure other exceptions are not retried and fail immediately
@ijusttookadnatest merged commit 8f280a7 into main on Jul 29, 2025 (1 of 3 checks passed).