This project is a Discussion RAG system designed for coherent, context-aware, and stable long-term conversations. It features a unique 4-layer memory architecture to prevent common pitfalls like context drift and reasoning loops.
The system is designed not just to remember facts, but to maintain the foundational pillars of a structured discussion: premises, constraints, and established agreements.
The core of this project is a unique memory system that separates different kinds of information based on their role and lifespan in a conversation. This prevents the LLM from getting confused by transient thoughts or overriding foundational premises.
```
┌──────────────────────────────────────────────────────────────────┐
│ Layer 1: Ephemeral Session Context                               │
│   • Adapts to the user's current state (non-persistent)          │
├──────────────────────────────────────────────────────────────────┤
│ Layer 2: Explicit Long-Term Memory                               │
│   • Core premises, values, and constraints of the discussion     │
├──────────────────────────────────────────────────────────────────┤
│ Layer 3: Decision Digest                                         │
│   • Immutable record of confirmed agreements and choices         │
├──────────────────────────────────────────────────────────────────┤
│ Layer 4: Sliding Window Messages                                  │
│   • The "scratchpad" for recent turns, hypotheses, and reasoning │
└──────────────────────────────────────────────────────────────────┘
```
- Ephemeral Session Context: Captures the user's immediate state (e.g., fatigue level, discussion mode) to adapt the AI's response tone and style in real-time. This context is not saved.
- Explicit Long-Term Memory (LTM): Stores the foundational pillars of the discussion, such as core assumptions, constraints, and evaluation criteria. This memory is persistent and ensures the conversation remains consistent over long periods.
- Decision Digest: A persistent log of explicit agreements and decisions made during the conversation. This prevents re-litigating settled points.
- Sliding Window Messages: A standard conversational buffer that holds the most recent exchanges. This is where active reasoning and exploration happen.
This structured approach makes the assistant a more reliable partner for complex, long-running discussions. For a deeper dive into the philosophy, please read the Memory System Design Document.
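One way to picture the layers in code: the sketch below models each layer as a Pydantic model, in line with the project's Pydantic-based stack. The class and field names are illustrative assumptions, not the project's actual schema.

```python
# A sketch of the four layers as Pydantic models.
# Class and field names are illustrative assumptions, not the real schema.
from pydantic import BaseModel, Field


class SessionContext(BaseModel):
    """Layer 1: ephemeral, adapts tone and style, never persisted."""
    fatigue_level: str | None = None    # e.g. "fresh", "tired"
    discussion_mode: str | None = None  # e.g. "brainstorm", "decision"


class LongTermMemory(BaseModel):
    """Layer 2: persistent premises, constraints, and evaluation criteria."""
    items: list[str] = Field(default_factory=list)


class DecisionDigest(BaseModel):
    """Layer 3: append-only record of confirmed agreements."""
    decisions: list[str] = Field(default_factory=list)


class ConversationState(BaseModel):
    """All four layers gathered into one object the orchestrator can build prompts from."""
    session: SessionContext = Field(default_factory=SessionContext)
    ltm: LongTermMemory = Field(default_factory=LongTermMemory)
    digest: DecisionDigest = Field(default_factory=DecisionDigest)
    recent_messages: list[dict[str, str]] = Field(default_factory=list)  # Layer 4: sliding window
```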
- Advanced 4-Layer Memory: Provides a stable, long-term conversational foundation.
- LLM-Powered Memory Management: The LLM itself decides when and what to save to LTM or the Decision Digest, responding with a structured JSON object that contains both the conversational reply and any memory operations (see the sketch after this list).
- Separated Backend and Frontend: A robust FastAPI backend handles all logic, while a clean Streamlit frontend provides an interactive user experience.
- Real-time Memory Inspection: The UI allows the user to view the contents of the Long-Term Memory and Decision Digest at any time.
- Dependency Injection: The backend uses a modern dependency injection pattern for robustness and testability.
- Async API: The entire backend is built on an asynchronous framework (FastAPI) for high performance.
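As a rough illustration of the structured reply mentioned under LLM-Powered Memory Management, the response might be parsed into something like the following. The field names and `target` values are assumptions for illustration, not the project's actual response schema.

```python
# Hypothetical shape of the LLM's structured JSON reply: a conversational answer
# plus zero or more memory operations. Names are illustrative, not the real schema.
from pydantic import BaseModel, Field


class MemoryOperation(BaseModel):
    target: str   # assumed values: "ltm" or "decision_digest"
    content: str  # the premise, constraint, or agreement to persist


class AssistantTurn(BaseModel):
    reply: str  # the conversational answer shown to the user
    memory_operations: list[MemoryOperation] = Field(default_factory=list)


# Example payload the LLM might return:
raw = '''{
  "reply": "Agreed, we will treat the latency budget as fixed.",
  "memory_operations": [
    {"target": "decision_digest", "content": "Latency budget of 200 ms is a hard constraint."}
  ]
}'''
turn = AssistantTurn.model_validate_json(raw)  # Pydantic v2
```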
- Backend:
  - Framework: FastAPI
  - Language: Python 3.12+
  - Core Logic: LangChain, Pydantic
  - LLM Support: `langchain-openai`, `langchain-ollama`
- Frontend:
  - Framework: Streamlit
- Package Management & Venv: `uv`
- Code Quality:
  - Linter/Formatter: Ruff
  - Type Checking: MyPy
The application is split into two main components for a clean separation of concerns:
- Backend (FastAPI): A backend service that exposes a REST API for all core functionality. It encapsulates the 4-layer memory logic, LLM interactions, and data persistence. All business logic resides here.
- Frontend (Streamlit): A purely presentational layer that consumes the backend API. It is responsible for rendering the chat interface, capturing user input, and displaying the memory state.
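A minimal sketch of that frontend pattern, assuming the backend's default local address and a `/api/v1/chat` endpoint that accepts a `message` field and returns a `reply` field (the real request and response schema may differ):

```python
# Thin Streamlit client that only talks to the backend API.
# The "message"/"reply" field names are assumptions for illustration.
import requests
import streamlit as st

API_BASE = "http://localhost:8000/api/v1"

st.title("Discussion RAG")

if prompt := st.chat_input("Say something"):
    st.chat_message("user").write(prompt)
    response = requests.post(f"{API_BASE}/chat", json={"message": prompt}, timeout=60)
    response.raise_for_status()
    st.chat_message("assistant").write(response.json().get("reply", ""))
```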
This decoupled architecture makes the system highly scalable and maintainable. The backend was refactored to use a Dependency Injection pattern, where a single, cached instance of the chat orchestrator is supplied to the API endpoints. This improves testability and predictability.
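A minimal sketch of that pattern using FastAPI's `Depends` with a cached factory; the `ChatOrchestrator` name, request model, and module layout are assumptions for illustration, not the project's actual code.

```python
# Dependency injection sketch: a single cached orchestrator instance shared by all endpoints.
from functools import lru_cache

from fastapi import APIRouter, Depends
from pydantic import BaseModel


class ChatOrchestrator:
    """Placeholder for the component that runs the 4-layer memory logic and LLM calls."""

    async def respond(self, message: str) -> str:
        return f"(stub reply to: {message})"


@lru_cache
def get_orchestrator() -> ChatOrchestrator:
    # lru_cache makes this a singleton factory; tests can swap it out via
    # app.dependency_overrides to inject a fake orchestrator.
    return ChatOrchestrator()


router = APIRouter(prefix="/api/v1")


class ChatRequest(BaseModel):
    message: str


@router.post("/chat")
async def chat(
    payload: ChatRequest,
    orchestrator: ChatOrchestrator = Depends(get_orchestrator),
) -> dict[str, str]:
    return {"reply": await orchestrator.respond(payload.message)}
```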
For more details, see the Architecture & API Documentation.
All endpoints are prefixed with /api/v1.
| Method | Path | Description |
|---|---|---|
| POST | `/chat` | Send a message and get an AI response. |
| GET | `/ltm` | Retrieve all items in the Long-Term Memory. |
| GET | `/decisions` | Retrieve all items in the Decision Digest. |
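For example, once the backend is running, the memory-inspection endpoints can be queried directly. This is a minimal sketch assuming the default local address; the exact response shape is not guaranteed.

```python
# Query the Long-Term Memory and Decision Digest from a script.
# Assumes the backend is running locally on its default port.
import requests

API_BASE = "http://localhost:8000/api/v1"

ltm_items = requests.get(f"{API_BASE}/ltm", timeout=10).json()
decisions = requests.get(f"{API_BASE}/decisions", timeout=10).json()

print("Long-Term Memory:", ltm_items)
print("Decision Digest:", decisions)
```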
Follow these instructions to set up and run the project on your local machine.
- Python 3.12+
- uv: An extremely fast Python package installer and resolver.
- Clone the repository:

  ```bash
  git clone https://github.com/your-username/discussion-rag.git
  cd discussion-rag
  ```

- Set up environment variables: Create a `.env` file by copying the example template.

  ```bash
  cp .env.example .env
  ```

  Now, edit the `.env` file to configure your desired LLM provider, API keys, etc. The default configuration uses a mock LLM.

- Install dependencies: `uv` will create a virtual environment (`.venv`) and install all required packages from `pyproject.toml`.

  ```bash
  uv sync --extra dev
  ```
You need to run the backend and frontend servers in two separate terminal sessions.
- Start the Backend (FastAPI) server:

  ```bash
  uv run uvicorn backend.main:app --reload
  ```

  The API will be available at `http://localhost:8000`. You can view the auto-generated documentation at `http://localhost:8000/docs`.

- Start the Frontend (Streamlit) application:

  ```bash
  uv run streamlit run frontend/app.py
  ```

  The chat interface will be available at `http://localhost:8501`.
We adhere to a TDD workflow and use modern tooling to maintain high code quality.
To run the entire test suite, use pytest:

```bash
uv run pytest
```
- Format code with Ruff:

  ```bash
  uv run ruff format .
  ```

- Lint with Ruff (with auto-fix):

  ```bash
  uv run ruff check --fix .
  ```

- Type-check with MyPy:

  ```bash
  uv run mypy .
  ```
This project is licensed under the MIT License; see the LICENSE file for details.