Discussion RAG

Python Version · Code Style: Ruff · Checked with MyPy · License: MIT

This project is a Discussion RAG system designed for coherent, context-aware, and stable long-term conversations. It features a unique 4-layer memory architecture to prevent common pitfalls like context drift and reasoning loops.

The system is designed not just to remember facts, but to maintain the foundational pillars of a structured discussion: premises, constraints, and established agreements.

Core Concept: The 4-Layer Memory Architecture

The core of this project is a unique memory system that separates different kinds of information based on their role and lifespan in a conversation. This prevents the LLM from getting confused by transient thoughts or overriding foundational premises.

┌────────────────────────────────────┐
│ Layer 1: Ephemeral Session Context │
│   • Adapts to the user's current state (non-persistent)
├────────────────────────────────────┤
│ Layer 2: Explicit Long-Term Memory │
│   • Core premises, values, and constraints of the discussion
├────────────────────────────────────┤
│ Layer 3: Decision Digest           │
│   • Immutable record of confirmed agreements and choices
├────────────────────────────────────┤
│ Layer 4: Sliding Window Messages   │
│   • The "scratchpad" for recent turns, hypotheses, and reasoning
└────────────────────────────────────┘
  1. Ephemeral Session Context: Captures the user's immediate state (e.g., fatigue level, discussion mode) to adapt the AI's response tone and style in real time. This context is not saved.
  2. Explicit Long-Term Memory (LTM): Stores the foundational pillars of the discussion, such as core assumptions, constraints, and evaluation criteria. This memory is persistent and ensures the conversation remains consistent over long periods.
  3. Decision Digest: A persistent log of explicit agreements and decisions made during the conversation. This prevents re-litigating settled points.
  4. Sliding Window Messages: A standard conversational buffer that holds the most recent exchanges. This is where active reasoning and exploration happen.

This structured approach makes the assistant a more reliable partner for complex, long-running discussions. For a deeper dive into the philosophy, please read the Memory System Design Document.
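
As a rough illustration, the four layers might be modeled with Pydantic along the following lines. This is a minimal sketch; the class and field names are hypothetical and not the project's actual data model (see the backend code and the design document for the real one).

# Hypothetical sketch of the four memory layers as Pydantic models.
# Class and field names are illustrative only.
from pydantic import BaseModel, Field


class EphemeralSessionContext(BaseModel):
    """Layer 1: per-session state that is never persisted."""
    fatigue_level: str | None = None
    discussion_mode: str | None = None


class LongTermMemoryItem(BaseModel):
    """Layer 2: a foundational premise, constraint, or evaluation criterion."""
    content: str


class DecisionItem(BaseModel):
    """Layer 3: an explicit agreement, recorded once and not re-litigated."""
    decision: str


class ConversationMemory(BaseModel):
    """Aggregates all four layers for a single discussion."""
    session_context: EphemeralSessionContext = Field(default_factory=EphemeralSessionContext)
    long_term_memory: list[LongTermMemoryItem] = Field(default_factory=list)
    decision_digest: list[DecisionItem] = Field(default_factory=list)
    sliding_window: list[dict] = Field(default_factory=list)  # Layer 4: recent chat turns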

✨ Features

  • Advanced 4-Layer Memory: Provides a stable, long-term conversational foundation.
  • LLM-Powered Memory Management: The LLM itself decides when and what to save to LTM or the Decision Digest, responding with a structured JSON object that contains both the conversational reply and the memory operations to apply (see the sketch after this list).
  • Separated Backend and Frontend: A robust FastAPI backend handles all logic, while a clean Streamlit frontend provides an interactive user experience.
  • Real-time Memory Inspection: The UI allows the user to view the contents of the Long-Term Memory and Decision Digest at any time.
  • Dependency Injection: The backend uses a modern dependency injection pattern for robustness and testability.
  • Async API: The entire backend is built on an asynchronous framework (FastAPI) for high performance.
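
The structured response mentioned under "LLM-Powered Memory Management" might look roughly like this. The field names are assumptions made for illustration, not the project's actual schema.

# Hypothetical shape of the structured LLM output: a conversational reply
# plus the memory operations the backend should perform.
# Field names are illustrative, not the project's actual schema.
from pydantic import BaseModel, Field


class MemoryOperations(BaseModel):
    add_to_ltm: list[str] = Field(default_factory=list)        # new premises/constraints
    add_to_decisions: list[str] = Field(default_factory=list)  # newly confirmed agreements


class StructuredChatResponse(BaseModel):
    reply: str                                                  # answer shown to the user
    memory: MemoryOperations = Field(default_factory=MemoryOperations)


# Example of what the LLM might return as JSON:
# {
#   "reply": "Agreed - we'll target a read-only prototype first.",
#   "memory": {
#     "add_to_ltm": ["The prototype must run fully offline."],
#     "add_to_decisions": ["Phase 1 ships as a read-only prototype."]
#   }
# }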

🛠️ Technology Stack

  • Backend:
    • Framework: FastAPI
    • Language: Python 3.12+
    • Core Logic: LangChain, Pydantic
    • LLM Support: langchain-openai, langchain-ollama
  • Frontend:
    • Framework: Streamlit
  • Package Management & Venv: uv
  • Code Quality:
    • Linter/Formatter: Ruff
    • Type Checking: MyPy

🏗️ Architecture

The application is split into two main components for a clean separation of concerns:

  1. Backend (FastAPI): A powerful backend that serves a REST API for all core functionalities. It encapsulates the 4-layer memory logic, LLM interactions, and data persistence. All business logic resides here.
  2. Frontend (Streamlit): A purely presentational layer that consumes the backend API. It is responsible for rendering the chat interface, capturing user input, and displaying the memory state.

This decoupled architecture makes the system highly scalable and maintainable. The backend was refactored to use a Dependency Injection pattern, where a single, cached instance of the chat orchestrator is supplied to the API endpoints. This improves testability and predictability.
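
A minimal sketch of that pattern, assuming hypothetical module, class, and route names (the real code lives in backend/), could look like this:

# Hypothetical sketch of the dependency injection pattern described above.
from functools import lru_cache

from fastapi import APIRouter, Depends
from pydantic import BaseModel


class ChatOrchestrator:
    """Owns the 4-layer memory logic and the LLM calls."""

    async def chat(self, message: str) -> str:
        ...  # orchestrate memory layers + LLM here
        return "stub reply"


@lru_cache  # a single, shared instance per process
def get_orchestrator() -> ChatOrchestrator:
    return ChatOrchestrator()


class ChatRequest(BaseModel):
    message: str


router = APIRouter(prefix="/api/v1")


@router.post("/chat")
async def chat(req: ChatRequest, orchestrator: ChatOrchestrator = Depends(get_orchestrator)) -> dict:
    reply = await orchestrator.chat(req.message)
    return {"reply": reply}

Because the orchestrator is always resolved through a provider function, tests can swap it out with FastAPI's app.dependency_overrides mechanism without touching the endpoints themselves.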

For more details, see the Architecture & API Documentation.

API Endpoints

All endpoints are prefixed with /api/v1.

Method   Path         Description
POST     /chat        Send a message and get an AI response.
GET      /ltm         Retrieve all items in the Long-Term Memory.
GET      /decisions   Retrieve all items in the Decision Digest.
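
For example, once the backend is running, the endpoints could be exercised with a small HTTP client such as httpx. The request payload shape below is an assumption; this README does not document the exact schema.

# Hypothetical client example; the {"message": ...} payload and the response
# fields are assumptions, not a documented contract.
import httpx

BASE_URL = "http://localhost:8000/api/v1"

with httpx.Client(base_url=BASE_URL) as client:
    # Send a message to the assistant.
    reply = client.post("/chat", json={"message": "Let's agree on the project constraints first."})
    print(reply.json())

    # Inspect the persistent memory layers at any time.
    print(client.get("/ltm").json())
    print(client.get("/decisions").json())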

🚀 Getting Started

Follow these instructions to set up and run the project on your local machine.

1. Prerequisites

  • Python 3.12+
  • uv: An extremely fast Python package installer and resolver.

2. Installation

  1. Clone the repository:

    git clone https://github.com/your-username/discussion-rag.git
    cd discussion-rag
  2. Set up environment variables: Create a .env file by copying the example template.

    cp .env.example .env

    Now, edit the .env file to configure your desired LLM provider, API keys, etc. The default configuration uses a mock LLM.

  3. Install dependencies: uv will create a virtual environment (.venv) and install all required packages from pyproject.toml.

    uv sync --extra dev

3. Running the Application

You need to run the backend and frontend servers in two separate terminal sessions.

  1. Start the Backend (FastAPI) server:

    uv run uvicorn backend.main:app --reload

    The API will be available at http://localhost:8000. You can view the auto-generated documentation at http://localhost:8000/docs.

  2. Start the Frontend (Streamlit) application:

    uv run streamlit run frontend/app.py

    The chat interface will be available at http://localhost:8501.

✅ Development & Testing

We adhere to a TDD workflow and use modern tooling to maintain high code quality.

Running Tests

To run the entire test suite, use pytest:

uv run pytest

Code Quality Checks

  • Format code with Ruff:

    uv run ruff format .
  • Lint with Ruff (with auto-fix):

    uv run ruff check --fix .
  • Type-check with MyPy:

    uv run mypy .

📄 License

This project is licensed under the MIT License; see the LICENSE file (if present) for details.
