A multi-chat HTTP server powered by Llama 3 8B, with persistent memory and a web interface.
- Web-based chat interface with markdown rendering
- Multiple concurrent chat sessions
- Persistent SQLite database for chat history
- Memory system tracking facts, experiences, and topics
- Server-sent events for real-time streaming responses
- Docker-first deployment with embedded model
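As a rough illustration of the streaming feature, the sketch below parses a server-sent events body into its `data` payloads. It follows the standard `text/event-stream` framing only; the helper name `iter_sse_data` is hypothetical, and the server's actual endpoint and payload format are not documented here, so treat this as a client-side sketch rather than the project's own code.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a text/event-stream body."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            # Comment line, commonly used as a keep-alive.
            continue
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The format strips a single leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.


if __name__ == "__main__":
    sample = ["data: Hel\n", "\n", ": ping\n", "data: lo\n", "data: !\n", "\n"]
    for payload in iter_sse_data(sample):
        print(repr(payload))
```

Multiple `data:` lines within one event are joined with newlines, and an event that is not followed by a blank line is dropped, which matches how browsers treat an incomplete final event.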
Running with Docker requires:
- Docker and Docker Compose
- 8GB+ available RAM
- 10GB+ free disk space
- Clone the repository:
git clone https://github.com/ByteBaker/ChatLlama
cd ChatLlama
- Start the application:
docker-compose up --build
The model (~5GB) will be automatically downloaded during the first build. This may take a few minutes depending on your connection.
- Access the application:
http://localhost:8000
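Because the first build downloads the model, the server may not answer right away. The sketch below polls the address until it responds. The helper names `base_url` and `wait_until_ready` are hypothetical, and it assumes the server is reachable on the host at port 8000 or at the port given by the `PORT` environment variable.

```python
import os
import time
import urllib.error
import urllib.request


def base_url(host: str = "localhost") -> str:
    """Build the server URL, using PORT if set and 8000 otherwise."""
    port = os.environ.get("PORT", "8000")
    return f"http://{host}:{port}"


def wait_until_ready(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll url until the server responds or timeout seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError as exc:
            if exc.code < 500:
                # The server is up, even if this particular path errors.
                return True
        except (urllib.error.URLError, OSError):
            # Not accepting connections yet; retry after a pause.
            pass
        time.sleep(interval)
    return False
```

Calling `wait_until_ready(base_url())` returns `True` once the page loads and `False` if nothing answers within the timeout.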
Set a custom port with the PORT environment variable:
PORT=3000 docker-compose up --build
Running without Docker requires:
- Python 3.11+
- 8GB+ available RAM
- Install dependencies:
pip install -r requirements.txt
- Download the model:
python src/utils/download_model.py
- Run the server:
python src/main.py
ChatLlama/
├── src/
│   ├── main.py                # Server entry point
│   ├── config.py              # Model configuration
│   ├── chat_server.py         # Chat logic and memory system
│   ├── http_handler.py        # HTTP request handling
│   ├── index.html             # Web interface
│   ├── script.js              # Frontend JavaScript
│   ├── styles.css             # Interface styling
│   └── utils/
│       └── download_model.py  # Model download utility
├── data/                      # SQLite database (created at runtime)
├── docker-compose.yml         # Docker configuration
├── Dockerfile                 # Container build instructions
└── requirements.txt           # Python dependencies
The server automatically categorizes conversation content into:
- Facts: Concrete information and data points
- Experiences: Personal stories and events
- Topics: Subject areas and themes discussed
Memory statistics are included in all chat responses and can be queried independently.
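To make the three categories concrete, here is a minimal sketch of how categorized memory entries and per-chat statistics could be stored in SQLite. The class `MemoryStore`, the `memory` table, and its columns are all hypothetical; the real schema and categorization logic live in `chat_server.py` and may differ substantially.

```python
import sqlite3

# Category names mirror the list above; the identifiers are assumptions.
CATEGORIES = ("fact", "experience", "topic")


class MemoryStore:
    """Illustrative store for categorized memory entries (not the project's schema)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            """
            CREATE TABLE IF NOT EXISTS memory (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                chat_id TEXT NOT NULL,
                category TEXT NOT NULL,
                content TEXT NOT NULL
            )
            """
        )

    def add(self, chat_id: str, category: str, content: str) -> None:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO memory (chat_id, category, content) VALUES (?, ?, ?)",
                (chat_id, category, content),
            )

    def stats(self, chat_id: str) -> dict[str, int]:
        """Return the number of stored entries per category for one chat."""
        counts = {category: 0 for category in CATEGORIES}
        rows = self.conn.execute(
            "SELECT category, COUNT(*) FROM memory WHERE chat_id = ? GROUP BY category",
            (chat_id,),
        )
        for category, count in rows:
            counts[category] = count
        return counts


if __name__ == "__main__":
    store = MemoryStore()
    store.add("chat-1", "fact", "The user works as a nurse.")
    store.add("chat-1", "topic", "Night shifts")
    print(store.stats("chat-1"))
```

Keying entries by a chat identifier keeps statistics separate for each concurrent session, and passing a file path instead of `:memory:` would persist them across restarts.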
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.