@enitrat enitrat commented Jul 17, 2025

PR Summary: Architectural Overhaul - Migration to Python/DSPy Backend

This pull request is a major architectural overhaul of the Cairo Coder project. The core backend service has been migrated from a TypeScript/LangChain implementation to a new, more robust, and optimizable Python architecture built on the DSPy framework.

The new backend is designed for structured AI programming, enabling metric-driven optimization of the RAG pipeline, improved performance, and a more sophisticated testing and evaluation framework. The legacy TypeScript code has been preserved for reference but is now superseded by the Python service as the primary backend.


1. High-Level Summary of Changes

  • Backend Migration: The primary service logic, previously in TypeScript (packages/agents and packages/backend), has been rewritten in Python under the new python/ directory.
  • Framework Shift: Replaced LangChain with DSPy, shifting from prompt chaining to a structured, optimizable programming model for language models.
  • New API Server: A new FastAPI server replaces the old Express.js server, maintaining full OpenAI API compatibility for a seamless transition.
  • Enhanced Data Pipeline: A new Python-based documentation summarizer script has been introduced. This script pre-processes and summarizes documentation sources, which are then consumed by the existing TypeScript ingester, decoupling data preparation from the ingestion process.
  • Formalized Evaluation Framework: A comprehensive evaluation system has been built using the Starklings exercises to quantitatively measure the agent's code generation quality.
  • Modern Python Tooling: The Python ecosystem is managed with modern tools like uv for package management, ruff for linting/formatting, and pytest for testing.
  • Legacy Code Preservation: The original TypeScript backend code and Dockerfile have been renamed to .old files, and instructions for running it are preserved in a new README.old.md.

2. New Python Backend Architecture (python/ directory)

A complete, self-contained Python application has been built and now serves as the core of Cairo Coder.

2.1. Technology Stack

  • Language: Python 3.10+
  • Web Framework: FastAPI for the asynchronous, high-performance API server.
  • AI Framework: DSPy for building structured, composable, and optimizable RAG pipelines.
  • Database: PostgreSQL with pgvector for vector storage, accessed via asyncpg.
  • Tooling:
    • uv: High-performance package manager and virtual environment tool.
    • ruff: Linter and formatter.
    • pytest: For unit and integration testing.
    • marimo & mlflow: For interactive notebooks and experiment tracking during DSPy pipeline optimization.

2.2. Core RAG Pipeline (Built with DSPy)

The core logic is a multi-stage RAG pipeline, where each stage is a distinct, optimizable DSPy module. This modularity is a significant improvement over the previous implementation. The pipeline is located in python/src/cairo_coder/dspy/.

  1. QueryProcessorProgram (query_processor.py):

    • Purpose: The first stage of the pipeline. It takes the raw user query and chat history.
    • Functionality: It uses a DSPy ChainOfThought signature (CairoQueryAnalysis) to analyze the query, extract semantic search terms, and identify the most relevant documentation sources (e.g., cairo_book, openzeppelin_docs).
    • Output: A structured ProcessedQuery object containing search queries and a list of target DocumentSource enums.
  2. DocumentRetrieverProgram (document_retriever.py):

    • Purpose: The retrieval stage. It takes the ProcessedQuery from the previous step.
    • Functionality: It uses a custom SourceFilteredPgVectorRM (a subclass of DSPy's PgVectorRM) to perform a vector search against the PostgreSQL database. It filters results based on the sources identified by the Query Processor and a similarity threshold.
    • Output: A list of relevant Document objects.
  3. GenerationProgram (generation_program.py):

    • Purpose: The final generation stage. It takes the user query and the retrieved context (documents).
    • Functionality: It uses a CairoCodeGeneration signature within a ChainOfThought module to generate the final Cairo code and explanation. The context provided to the LLM is a carefully formatted string containing all the retrieved documents.
    • Streaming & MCP: It supports both streaming responses (for real-time output) and a special "MCP" (Model Context Protocol) mode that returns the raw, formatted documentation context instead of a generated answer.
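The three stages above compose into a single pipeline. The following is a minimal, stdlib-only sketch of that data flow; class and field names such as `ProcessedQuery` mirror those in the PR, but the bodies are illustrative stand-ins, not the actual DSPy modules (which wrap `dspy.ChainOfThought` signatures and a pgvector retriever):

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class ProcessedQuery:
    search_queries: list[str]
    sources: list[str]  # e.g. ["cairo_book", "openzeppelin_docs"]

@dataclass
class Document:
    source: str
    content: str

class QueryProcessorProgram:
    """Stage 1: analyze the raw query into structured search terms."""
    def forward(self, query: str, chat_history: list[str]) -> ProcessedQuery:
        # Real version: dspy.ChainOfThought(CairoQueryAnalysis)
        return ProcessedQuery(search_queries=[query], sources=["cairo_book"])

class DocumentRetrieverProgram:
    """Stage 2: vector search filtered by source and similarity threshold."""
    def __init__(self, corpus: list[Document]):
        self.corpus = corpus
    def forward(self, pq: ProcessedQuery) -> list[Document]:
        # Real version: SourceFilteredPgVectorRM against PostgreSQL/pgvector
        return [d for d in self.corpus if d.source in pq.sources]

class GenerationProgram:
    """Stage 3: generate Cairo code from query + retrieved context."""
    def forward(self, query: str, context: list[Document]) -> str:
        # Real version: dspy.ChainOfThought(CairoCodeGeneration)
        joined = "\n---\n".join(d.content for d in context)
        return f"// answer to: {query}\n// context:\n{joined}"

class RagPipeline:
    """Chains the three stages; each remains independently optimizable."""
    def __init__(self, corpus: list[Document]):
        self.qp = QueryProcessorProgram()
        self.retriever = DocumentRetrieverProgram(corpus)
        self.gen = GenerationProgram()
    def forward(self, query: str, chat_history: list[str] | None = None) -> str:
        pq = self.qp.forward(query, chat_history or [])
        docs = self.retriever.forward(pq)
        return self.gen.forward(query, docs)
```

Because each stage is a separate object with a `forward` method, an optimizer can tune one stage's prompts and demonstrations without touching the others.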

2.3. API Server (server/app.py)

  • Framework: Built with FastAPI.
  • Endpoints:
    • Maintains the OpenAI-compatible endpoint: POST /v1/chat/completions.
    • Introduces new agent-specific endpoints like POST /v1/agents/{agent_id}/chat/completions and GET /v1/agents.
  • Features:
    • Asynchronous: Fully async, from request handling to database queries, ensuring high throughput.
    • Streaming Support: Natively supports streaming responses using Server-Sent Events (SSE), matching OpenAI's behavior.
    • Dependency Injection: Uses FastAPI's dependency injection to manage the database connection pool and the AgentFactory.
    • Lifecycle Management: Uses a lifespan context manager to initialize the database connection pool on startup and gracefully close it on shutdown.

3. Data Pipeline and Preparation

The process of ingesting documentation into the vector database has been significantly re-architected.

  • New Python Summarizer: A new CLI tool has been added under python/scripts/summarizer/. This tool uses DSPy to:

    1. Clone documentation repositories (e.g., cairo-book, cairo-docs).
    2. Build the documentation locally (e.g., using mdbook).
    3. Extract, merge, and summarize all the content into a single, clean markdown file.
    4. The generated summaries are saved in python/scripts/summarizer/generated/.
  • Modified TypeScript Ingesters: The legacy TypeScript ingesters (CairoBookIngester.ts, CoreLibDocsIngester.ts) have been modified. Instead of cloning and processing repositories themselves, they now simply read the pre-summarized markdown files generated by the new Python script.

This change decouples data preparation from data ingestion, making the pipeline more robust and letting LLM-based summarization produce a higher-quality knowledge base.
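The decoupling boils down to a shared file contract: the summarizer writes one merged markdown file per source, and the ingester only reads it. A minimal sketch (the `generated/` directory mirrors the PR; the filename pattern and helper names are assumptions for illustration):

```python
from pathlib import Path

GENERATED = "generated"  # mirrors python/scripts/summarizer/generated/

def write_summary(root: Path, source: str, text: str) -> Path:
    """Summarizer side: emit one merged markdown file per doc source."""
    out = root / GENERATED / f"{source}-summary.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(text, encoding="utf-8")
    return out

def read_summary(root: Path, source: str) -> str:
    """Ingester side: consume the pre-summarized markdown directly,
    with no repository cloning or doc building required."""
    return (root / GENERATED / f"{source}-summary.md").read_text(encoding="utf-8")
```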


4. Testing, Evaluation, and Optimization Framework

A formal, metric-driven framework has been established to evaluate and improve the agent's performance.

4.1. Starklings Evaluation Script (starklings_evaluate.py)

  • A new evaluation script automates the process of testing the Cairo Coder agent against the exercises in the Starklings repository.
  • Workflow:
    1. Clones the Starklings repository.
    2. For each exercise, it sends the problem description to the Cairo Coder API.
    3. It takes the generated code solution from the API.
    4. It attempts to compile the solution using a local Scarb project (fixtures/runner_crate).
    5. It records whether the compilation was successful.
  • Reporting: The script generates detailed JSON and Markdown reports summarizing the success rate, categorized by exercise type. This provides a quantitative measure of the agent's performance.
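The reporting step reduces to aggregating per-exercise compile outcomes into per-category success rates. A sketch of that aggregation, assuming each result is a `(category, exercise_name, compiled)` tuple (the exact record shape in the script may differ):

```python
from collections import defaultdict

def summarize_results(results: list[tuple[str, str, bool]]) -> dict:
    """Aggregate (category, exercise, compiled) tuples into success rates,
    per category and overall, as the evaluation report does."""
    by_cat: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [passed, total]
    for category, _name, compiled in results:
        by_cat[category][1] += 1
        if compiled:
            by_cat[category][0] += 1
    report = {cat: {"passed": p, "total": t, "rate": p / t}
              for cat, (p, t) in by_cat.items()}
    overall_p = sum(v["passed"] for v in report.values())
    overall_t = sum(v["total"] for v in report.values())
    report["overall"] = {"passed": overall_p, "total": overall_t,
                         "rate": overall_p / overall_t if overall_t else 0.0}
    return report
```

Serializing this dict gives the JSON report; the Markdown report is a table rendered from the same data.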

4.2. Optimization Datasets and Notebooks

  • Dataset Generation: A script (generate_starklings_dataset.py) creates a structured dataset (optimizers/datasets/generation_dataset.json) from Starklings. Each entry contains a query (the exercise), context (retrieved via RAG), and an expected answer (the official solution).
  • Optimization Notebooks: Marimo notebooks (e.g., optimizers/generation_optimizer.py) use this dataset to "compile" the DSPy programs.
  • Metric-Driven Tuning: The optimization process uses a custom metric (generation_metric) which checks if the generated code compiles successfully. The DSPy MIPROv2 optimizer tunes the prompts and few-shot examples within the RAG pipeline modules to maximize this compilation success rate.
  • Results: The optimized programs are saved as JSON files (e.g., optimizers/results/optimized_rag.json) and are loaded by the production application.
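DSPy metrics follow an `(example, prediction, trace=None)` calling convention, returning a score the optimizer maximizes. A sketch of what `generation_metric` looks like in that shape; the real compile check shells out to Scarb in `fixtures/runner_crate`, so a trivial stand-in is used here:

```python
def compiles(code: str) -> bool:
    """Placeholder for the real check, which compiles the candidate
    solution with Scarb. Here: a trivially checkable stand-in."""
    return bool(code.strip()) and "TODO" not in code

def generation_metric(example, prediction, trace=None) -> float:
    """DSPy-style metric: 1.0 if the generated Cairo code compiles, else 0.0.
    The (example, prediction, trace) signature is the convention DSPy
    optimizers such as MIPROv2 expect."""
    return 1.0 if compiles(getattr(prediction, "answer", "")) else 0.0
```

An optimizer run would then look roughly like `dspy.MIPROv2(metric=generation_metric).compile(program, trainset=...)`, producing the saved JSON artifacts described above.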

5. Development, Tooling, and CI Changes

  • Package Management: The Python project uses uv for fast dependency installation and script execution, configured in pyproject.toml.
  • Linting/Formatting: ruff has been added as the primary linter and formatter, configured in trunk.yaml.
  • CI Updates (.github/workflows/trunk-check.yaml):
    • The CI workflow now installs uv.
    • A new Ty Check step has been added to run static type checking on the Python code, ensuring code quality.
  • .gitignore: Updated to ignore Python-specific artifacts like .snfoundry_cache, test compilation targets (fixtures/runner_crate/target), and Starklings evaluation results.
  • README: The main README.md has been completely rewritten to reflect the new Python architecture, installation process (Docker-first), and development workflow.

6. Legacy Code Management

  • Renaming: The original backend.dockerfile has been moved to backend.old.dockerfile.
  • Documentation: The original TypeScript installation and development instructions from README.md have been moved to a new README.old.md to preserve them for anyone needing to run the legacy service.

enitrat added 14 commits July 20, 2025 23:12
- Remove nest_asyncio dependency and unused async patterns
- Add sync methods to RAG pipeline for better performance
- Optimize document retrieval with similarity thresholds
- Update MCP optimizer with improved query processing
- Fix test mocks and remove redundant async operations
- Clean up whitespace and improve code organization
- Add retry mechanism with max 3 attempts for AdapterParseError
- Apply retry logic to both sync and async forward methods
- Add comprehensive test coverage for retry scenarios
- Refactor tests to use parametrized testing for sync/async methods
- Ensure other exceptions are not retried and fail immediately
@ijusttookadnatest merged commit 8f280a7 into main on Jul 29, 2025 (1 of 3 checks passed).