feat: rewrite codebase in Python + DSPY #29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
- Remove `nest_asyncio` dependency and unused async patterns
- Add sync methods to RAG pipeline for better performance
- Optimize document retrieval with similarity thresholds
- Update MCP optimizer with improved query processing
- Fix test mocks and remove redundant async operations
- Clean up whitespace and improve code organization
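Of these changes, the similarity-threshold filtering is easy to picture. Below is a minimal, framework-free sketch of the idea; the function names and the default threshold are illustrative, not taken from the PR:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filter_by_threshold(query_vec, candidates, threshold=0.5):
    """Keep only documents whose similarity to the query meets the
    threshold, best match first. Illustrative stand-in for the
    threshold filtering performed by the retriever."""
    scored = [(cosine_similarity(query_vec, vec), doc) for doc, vec in candidates]
    return [doc for score, doc in sorted(scored, reverse=True) if score >= threshold]
```

Raising the threshold trades recall for precision: irrelevant chunks stop reaching the generation stage, at the cost of occasionally dropping a borderline-useful document.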
- Add retry mechanism with max 3 attempts for `AdapterParseError`
- Apply retry logic to both sync and async forward methods
- Add comprehensive test coverage for retry scenarios
- Refactor tests to use parametrized testing for sync/async methods
- Ensure other exceptions are not retried and fail immediately
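The retry behavior described above can be sketched in plain Python. `AdapterParseError` here is a local stand-in for the DSPy exception, and `with_retries` is a hypothetical helper rather than the PR's actual implementation:

```python
class AdapterParseError(Exception):
    """Local stand-in for the DSPy adapter parsing error."""

def with_retries(fn, max_attempts=3, retry_on=(AdapterParseError,)):
    """Wrap fn so the listed exception types are retried up to
    max_attempts times; any other exception propagates immediately,
    matching the behavior the commit message describes."""
    def wrapper(*args, **kwargs):
        last_exc = None
        for _ in range(max_attempts):
            try:
                return fn(*args, **kwargs)
            except retry_on as exc:
                last_exc = exc
        raise last_exc
    return wrapper
```

Keeping the retried exception list explicit is the key design point: a transient parse failure is worth another attempt, while a genuine bug should surface on the first try.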
PR Summary: Architectural Overhaul - Migration to Python/DSPy Backend
This pull request represents a monumental architectural overhaul of the Cairo Coder project. The core backend service has been completely migrated from a TypeScript/LangChain implementation to a new, more robust, and optimizable Python-based architecture leveraging the DSPy framework.
The new backend is designed for structured AI programming, enabling metric-driven optimization of the RAG pipeline, improved performance, and a more sophisticated testing and evaluation framework. The legacy TypeScript code has been preserved for reference but is now superseded by the Python service as the primary backend.
1. High-Level Summary of Changes
- The core backend, previously implemented in TypeScript (`packages/agents` and `packages/backend`), has been rewritten in Python under the new `python/` directory.
- The Python project uses `uv` for package management, `ruff` for linting/formatting, and `pytest` for testing.
- The legacy TypeScript service has been renamed to `.old` files, and instructions for running it are preserved in a new `README.old.md`.

2. New Python Backend Architecture (`python/` directory)

A complete, self-contained Python application has been built, representing the new heart of Cairo Coder.
2.1. Technology Stack
- `pgvector` for vector storage, accessed via `asyncpg`.
- `uv`: high-performance package manager and virtual environment tool.
- `ruff`: linter and formatter.
- `pytest`: for unit and integration testing.
- `marimo` & `mlflow`: for interactive notebooks and experiment tracking during DSPy pipeline optimization.

2.2. Core RAG Pipeline (Built with DSPy)
The core logic is a multi-stage RAG pipeline in which each stage is a distinct, optimizable DSPy module. This modularity is a significant improvement over the previous implementation. The pipeline is located in `python/src/cairo_coder/dspy/`.

- `QueryProcessorProgram` (`query_processor.py`): uses a `ChainOfThought` signature (`CairoQueryAnalysis`) to analyze the query, extract semantic search terms, and identify the most relevant documentation sources (e.g., `cairo_book`, `openzeppelin_docs`). It produces a `ProcessedQuery` object containing search queries and a list of target `DocumentSource` enums.
- `DocumentRetrieverProgram` (`document_retriever.py`): takes the `ProcessedQuery` from the previous step and uses `SourceFilteredPgVectorRM` (a subclass of DSPy's `PgVectorRM`) to perform a vector search against the PostgreSQL database. It filters results based on the sources identified by the Query Processor and a similarity threshold, returning a list of `Document` objects.
- `GenerationProgram` (`generation_program.py`): uses the `CairoCodeGeneration` signature within a `ChainOfThought` module to generate the final Cairo code and explanation. The context provided to the LLM is a carefully formatted string containing all the retrieved documents.
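The staging described above can be sketched structurally. The classes below are illustrative stand-ins, not the actual DSPy modules; they show how each stage remains a swappable, independently optimizable unit:

```python
from dataclasses import dataclass

@dataclass
class ProcessedQuery:
    """Simplified stand-in for the pipeline's intermediate object."""
    search_terms: list[str]
    sources: list[str]  # names of DocumentSource values, e.g. "cairo_book"

class RagPipeline:
    """Structural sketch of the three-stage pipeline. Each stage is a
    plain callable here, mirroring how each DSPy module can be swapped
    or tuned in isolation; none of this is the production code."""

    def __init__(self, process, retrieve, generate):
        self.process = process      # query -> ProcessedQuery
        self.retrieve = retrieve    # ProcessedQuery -> list of documents
        self.generate = generate    # (query, documents) -> answer text

    def forward(self, query: str) -> str:
        processed = self.process(query)
        documents = self.retrieve(processed)
        return self.generate(query, documents)

# Wiring with trivial stub stages:
pipeline = RagPipeline(
    process=lambda q: ProcessedQuery(search_terms=[q.lower()], sources=["cairo_book"]),
    retrieve=lambda pq: [f"doc about {t}" for t in pq.search_terms],
    generate=lambda q, docs: f"answer grounded in {len(docs)} document(s)",
)
```

Because each stage only depends on the previous stage's output type, an optimizer can recompile one module's prompts without touching the others.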
2.3. API Server (`server/app.py`)

- `POST /v1/chat/completions` for chat completions.
- Agent-specific routes: `POST /v1/agents/{agent_id}/chat/completions` and `GET /v1/agents`; agents are instantiated through an `AgentFactory`.
- A `lifespan` context manager initializes the database connection pool on startup and gracefully closes it on shutdown.
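The startup/shutdown behavior follows the standard async lifespan pattern. Here is a framework-free sketch using `contextlib.asynccontextmanager`; `FakePool` and `create_pool` are stand-ins for the real `asyncpg` pool, and the `app` dict stands in for the server's application state:

```python
import asyncio
from contextlib import asynccontextmanager

class FakePool:
    """Stand-in for an asyncpg connection pool."""
    def __init__(self):
        self.closed = False
    async def close(self):
        self.closed = True

async def create_pool() -> FakePool:
    return FakePool()

@asynccontextmanager
async def lifespan(app: dict):
    # Startup: open the database pool and stash it on the app state.
    app["pool"] = await create_pool()
    try:
        yield
    finally:
        # Shutdown: close the pool gracefully even on error.
        await app["pool"].close()

async def main() -> bool:
    app: dict = {}
    async with lifespan(app):
        assert not app["pool"].closed  # pool is open while serving
    return app["pool"].closed
```

The `try`/`finally` around the `yield` is what guarantees the pool is released even if the server crashes while handling requests.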
3. Data Pipeline and Preparation

The process of ingesting documentation into the vector database has been significantly re-architected.
New Python Summarizer: A new CLI tool has been added under `python/scripts/summarizer/`. This tool uses DSPy to:

- clone the documentation repositories (e.g., `cairo-book`, `cairo-docs`);
- process their source format (e.g., `mdbook`);
- emit condensed, summarized markdown files into `python/scripts/summarizer/generated/`.

Modified TypeScript Ingesters: The legacy TypeScript ingesters (`CairoBookIngester.ts`, `CoreLibDocsIngester.ts`) have been modified. Instead of cloning and processing repositories themselves, they now simply read the pre-summarized markdown files generated by the new Python script.

This change decouples data preparation from data ingestion, making the pipeline more robust and allowing the powerful summarization capabilities of the LLM to be used for creating a higher-quality knowledge base.
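Under this split, the ingestion side reduces to reading the generated files. A hypothetical sketch (the function name and keying scheme are illustrative, not the ingesters' actual code, which is TypeScript):

```python
from pathlib import Path

def load_summaries(generated_dir: str) -> dict[str, str]:
    """Read every pre-summarized markdown file produced by the
    summarizer CLI, keyed by file stem. This stands in for the
    clone-and-parse step the legacy ingesters no longer perform."""
    return {
        p.stem: p.read_text(encoding="utf-8")
        for p in sorted(Path(generated_dir).glob("*.md"))
    }
```

Because the ingester only touches files in one directory, the expensive LLM summarization can be rerun and reviewed independently of database ingestion.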
4. Testing, Evaluation, and Optimization Framework
A formal, metric-driven framework has been established to evaluate and improve the agent's performance.
4.1. Starklings Evaluation Script (`starklings_evaluate.py`)

Evaluates the agent's solutions to the Starklings exercises, using the test crate in `fixtures/runner_crate`.
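The core of such an evaluation harness is a success-rate loop. Below is a sketch with injected stand-ins for the agent and the compile check; none of these names come from the actual script:

```python
def evaluate(exercises, solve, compiles) -> float:
    """Run the agent over each exercise and report the fraction of
    generated solutions that pass the compile check. `solve` and
    `compiles` are injected callables so the harness itself can be
    tested without an LLM or a Cairo toolchain."""
    if not exercises:
        return 0.0
    passed = sum(1 for exercise in exercises if compiles(solve(exercise)))
    return passed / len(exercises)
```

Injecting the compiler check also lets the same loop run against cached results when iterating on the metric.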
4.2. Optimization Datasets and Notebooks

- A dataset script (`generate_starklings_dataset.py`) creates a structured dataset (`optimizers/datasets/generation_dataset.json`) from Starklings. Each entry contains a `query` (the exercise), `context` (retrieved via RAG), and an `expected` answer (the official solution).
- Optimizer scripts (`optimizers/generation_optimizer.py`) use this dataset to "compile" the DSPy programs.
- Optimization is driven by a metric (`generation_metric`) which checks if the generated code compiles successfully. The DSPy `MIPROv2` optimizer tunes the prompts and few-shot examples within the RAG pipeline modules to maximize this compilation success rate.
- The optimized programs are saved to `optimizers/results/optimized_rag.json` and loaded by the production application.
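A compilation-based metric of this kind can be sketched as follows. DSPy metrics conventionally take `(example, pred, trace=None)`; here the compile check is an injected stand-in for the real toolchain invocation, and `pred` is simplified to a dict:

```python
def make_generation_metric(compiles):
    """Build a DSPy-style metric: score 1.0 when the predicted code
    passes the injected compile check, else 0.0. The real metric
    checks actual Cairo compilation; this factory is illustrative."""
    def generation_metric(example, pred, trace=None) -> float:
        return 1.0 if compiles(pred["answer"]) else 0.0
    return generation_metric
```

A binary compile/no-compile score is a deliberately hard signal: the optimizer cannot game it with plausible-looking but broken code, which is exactly what you want when tuning a code-generation pipeline.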
5. Development, Tooling, and CI Changes

- `uv` is used for fast dependency installation and script execution, configured in `pyproject.toml`.
- `ruff` has been added as the primary linter and formatter, configured in `trunk.yaml`.
- The CI workflow (`.github/workflows/trunk-check.yaml`) now installs dependencies with `uv`, and a `Ty Check` step has been added to run static type checking on the Python code, ensuring code quality.
- Ignore rules now cover `.snfoundry_cache`, test compilation targets (`fixtures/runner_crate/target`), and Starklings evaluation results.
- The `README.md` has been completely rewritten to reflect the new Python architecture, installation process (Docker-first), and development workflow.

6. Legacy Code Management
- `backend.dockerfile` has been moved to `backend.old.dockerfile`.
- The legacy run instructions from `README.md` have been moved to a new `README.old.md` to preserve them for anyone needing to run the legacy service.