This project is a chat assistant built for Ebla Computer Consultancy, designed to provide helpful responses using Retrieval-Augmented Generation (RAG). It integrates:
- FastAPI for the backend API
- Ollama for running the LLM locally
- MongoDB for chat session storage
- Chroma as the vector database
- Custom UI for the chatbot interface
The assistant remembers previous messages in each chat session and uses RAG to provide context-aware responses by retrieving relevant knowledge chunks.
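Conceptually, each turn embeds the user's question, pulls the closest knowledge chunks from Chroma, and sends those chunks together with the session history to llama3. Below is a minimal sketch of that flow, assuming the `ollama` and `chromadb` Python packages; the collection name, storage path, and prompt wording are illustrative, not the project's actual code:

```python
import chromadb
import ollama

# Hypothetical persistent store; the real path and collection name may differ.
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("kb")

def answer(question: str, history: list[dict]) -> str:
    # Embed the question with the same model used to index the knowledge base.
    q_emb = ollama.embeddings(model="mxbai-embed-large", prompt=question)["embedding"]
    # Retrieve the most relevant knowledge chunks from the vector database.
    hits = collection.query(query_embeddings=[q_emb], n_results=3)
    context = "\n".join(hits["documents"][0])
    # Prepend the retrieved context; keep prior session messages for memory.
    messages = (
        [{"role": "system", "content": f"Answer using this context:\n{context}"}]
        + history
        + [{"role": "user", "content": question}]
    )
    return ollama.chat(model="llama3", messages=messages)["message"]["content"]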
This project was created to gain a practical understanding of how Retrieval-Augmented Generation (RAG) works. It demonstrates how RAG can improve the accuracy and relevance of AI responses by combining local context with external vectorized knowledge.
- Clean chatbot UI for interaction
- Session-based chat history with memory (stored in MongoDB; see the sketch after this list)
- Vector search integration using Chroma
- Ollama (llama3) for natural language responses
- Built with a modular FastAPI backend
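For example, the session memory could be modeled as one MongoDB document per session, with each turn appended so later requests can reload the full conversation. A minimal sketch assuming `pymongo`; the database, collection, and field names are hypothetical, not the project's actual schema:

```python
from pymongo import MongoClient

# Hypothetical connection and names; the real app reads these from .env.
client = MongoClient("mongodb://localhost:27017")
sessions = client["chat_app"]["sessions"]

def append_message(session_id: str, role: str, content: str) -> None:
    # Push each turn onto the session document, creating it on first use.
    sessions.update_one(
        {"_id": session_id},
        {"$push": {"messages": {"role": role, "content": content}}},
        upsert=True,
    )

def load_history(session_id: str) -> list[dict]:
    # Return the prior turns for this session (empty for a new session).
    doc = sessions.find_one({"_id": session_id})
    return doc["messages"] if doc else []
```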
Make sure you have the following installed:
- Ollama
- Python 3.10+
- pip package manager
- virtualenv (recommended)
Clone the repository:

```bash
git clone https://github.com/3bdop/chat-app.git
cd chat-app
```
Ollama lets you run large language models locally. Follow the instructions for your OS:

- macOS / Linux / Windows: https://ollama.com/download
- After installation, verify it's working:

  ```bash
  ollama --version
  ```

- Pull the llama3 model:

  ```bash
  ollama pull llama3
  ```

- Pull the mxbai-embed-large model for creating embeddings:

  ```bash
  ollama pull mxbai-embed-large
  ```

- Verify installation:

  ```bash
  ollama list
  ```
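mxbai-embed-large is the model that turns documents and queries into the vectors stored in Chroma. A quick way to confirm it responds from Python (an illustrative check assuming the `ollama` Python package, not a required setup step):

```python
import ollama

# Embed a sample sentence; mxbai-embed-large produces 1024-dimensional vectors.
resp = ollama.embeddings(model="mxbai-embed-large", prompt="Hello from the chat assistant!")
print(len(resp["embedding"]))  # expected: 1024
```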
Set up the Python environment:

- Create environment (recommended):

  ```bash
  python -m venv .venv
  ```

- Activate environment:

  ```bash
  source .venv/bin/activate  # Windows: .\.venv\Scripts\activate.bat
  ```

- Install requirements:

  ```bash
  pip install -r requirements.txt
  ```
-
Create a
.env
file in the root and the following with your own values:MONGODB_URI=<your db string> MONGODB_DB_NAME=<your db name>
- To run the app, type the following:

  ```bash
  fastapi dev src/main.py
  ```
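By default, `fastapi dev` serves the app at http://127.0.0.1:8000 with auto-reload enabled, and the interactive API docs are available at `/docs`.

The values in `.env` are read at application startup. As a sketch of how such settings are typically loaded (this uses python-dotenv and pymongo for illustration; the project's actual startup code may differ):

```python
import os

from dotenv import load_dotenv
from pymongo import MongoClient

load_dotenv()  # read .env from the project root into the process environment

# Connect to MongoDB with the values supplied in .env.
client = MongoClient(os.environ["MONGODB_URI"])
db = client[os.environ["MONGODB_DB_NAME"]]
print(db.list_collection_names())  # quick connectivity check
```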