# 🚀 Repository Setup Guide

**Purpose:** Set up and run the server using the `backend/start.sh` script.

---

## Quick Start

1. Build the frontend
```bash
npm install
npm run build
```

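2. Install the Python backend dependencies. Note that `uv sync` installs from the project's `pyproject.toml` and lockfile; to install from `backend/requirements.txt` instead, use `uv pip install -r backend/requirements.txt`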
```bash
pip install uv
uv sync
```

3. Run the setup script
```bash
./backend/start.sh
```

---

**Note:** The `start.sh` script will:
- Set up a Python virtual environment
- Install dependencies
- Configure environment variables
- Start required services

**Troubleshooting:**
- Ensure you have Python 3.12+ installed
- If the script fails to execute, check its file permissions (for example, `chmod +x backend/start.sh`)
- Verify all required environment variables are set
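
The checks above can be automated before running the script. The following is a minimal sketch, not part of the repository: the function name `preflight` and the environment variable names are hypothetical placeholders, so substitute the variables your deployment actually needs.

```python
import os
import sys
from pathlib import Path

MIN_PYTHON = (3, 12)

# Hypothetical variable names, for illustration only.
# Replace them with the variables your deployment actually requires.
REQUIRED_ENV_VARS = ["EXAMPLE_API_KEY", "EXAMPLE_DATA_DIR"]


def preflight(script_path="backend/start.sh", required_vars=REQUIRED_ENV_VARS,
              env=None, python_version=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if python_version is None else python_version
    problems = []

    # 1. Interpreter version
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )

    # 2. Script exists and is executable
    script = Path(script_path)
    if not script.is_file():
        problems.append(f"{script} not found (run from the repository root)")
    elif not os.access(script, os.X_OK):
        problems.append(f"{script} is not executable; try: chmod +x {script}")

    # 3. Required environment variables are set and non-empty
    missing = [name for name in required_vars if not env.get(name)]
    if missing:
        problems.append("Missing environment variables: " + ", ".join(missing))

    return problems
```

Calling `preflight()` from the repository root returns one message per failed check, so an empty list means `./backend/start.sh` should at least be able to launch.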

**Docker Note:** Docker setup is not recommended at this time due to long build times and performance overhead.

## Methodology

### Base Project Selection
We chose [Open WebUI](https://github.com/open-webui/open-webui) as our foundation because:
- Modern, production-ready codebase with active development
- Clean architecture with clear separation of concerns
- Built-in support for multiple LLM backends
- Robust frontend with real-time chat interface
- Extensive documentation and community support
- Proven scalability in production environments
- Easy to extend and customize for our specific needs

### Embedding Model Selection
We chose `sentence-transformers/all-MiniLM-L6-v2` for text embeddings because:
- Lightweight (384 dimensions) with strong performance
- Fast inference suitable for real-time queries
- Proven track record in semantic search tasks
- Lower memory footprint than larger models
- Competitive with other models of similar size on MTEB benchmarks
- [Highly popular on Hugging Face](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) with over 1M downloads
- The de facto standard for semantic search across a wide range of open source projects
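
To illustrate how these embeddings support semantic search, here is a minimal sketch of ranking documents by cosine similarity. The helper names (`cosine_similarity`, `rank_by_similarity`, `embed`) are illustrative and not taken from this repository; `embed` assumes the `sentence-transformers` package is installed and imports it lazily so the ranking helpers work without it.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs, top_k=3):
    """Return (index, score) pairs for the top_k most similar document vectors."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def embed(texts, model_name="sentence-transformers/all-MiniLM-L6-v2"):
    """Embed a list of strings; each returned vector has 384 dimensions.

    Assumes `pip install sentence-transformers`. The model is loaded on every
    call here for simplicity; a real service would load it once and reuse it.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(texts).tolist()
```

With the package installed, `rank_by_similarity(embed([query])[0], embed(documents))` returns the indices of the closest documents. In the actual pipeline the vector store performs this search, so the pure-Python ranking is only meant to show the idea.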

### LLM Selection
We selected `meta/llama3.2:3b` for RAG Q&A because:
- 3B parameters provide a good balance of performance and resource usage
- Strong performance on instruction following
- Efficient inference on consumer hardware
- Lower latency than larger models
- Cost-effective for deployment
- [Top performer on OpenLLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?params=-1%2C6&official=true) when filtering for official models under 6B parameters and sorting by IFEval score

### Architecture Decisions

<div style="text-align: center;">
    <img src="./images/architecture.png" style="width: 1000px;">
</div>

- **RAG Pipeline**: We chose retrieval-augmented generation (RAG) over fine-tuning so the model can leverage existing knowledge at query time while the knowledge base stays easy to update without retraining
- **Vector Store**: ChromaDB provides fast similarity search and efficient storage of embeddings
- **API Design**: RESTful endpoints keep integration with the frontend simple
- **Caching**: Responses are cached to reduce repeated LLM calls
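
As a rough illustration of how the retrieval, generation, and caching pieces fit together, the sketch below wires a small LRU response cache around a retrieve-then-generate flow. It is not the repository's implementation: `ResponseCache`, `build_prompt`, and `answer` are hypothetical names, and `retrieve` and `generate` are caller-supplied stand-ins for the vector store lookup and the LLM call.

```python
import hashlib
from collections import OrderedDict


class ResponseCache:
    """Small LRU cache keyed by a hash of the normalized question."""

    def __init__(self, max_entries=256):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    @staticmethod
    def _key(question):
        # Lowercase and collapse whitespace so trivially different
        # phrasings of the same question share one cache entry.
        normalized = " ".join(question.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, question):
        key = self._key(question)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        return None

    def put(self, question, answer):
        key = self._key(question)
        self._entries[key] = answer
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used


def build_prompt(question, chunks):
    """Assemble a prompt that presents the retrieved chunks as numbered context."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(question, retrieve, generate, cache):
    """Return a cached reply if available; otherwise retrieve, generate, and cache."""
    cached = cache.get(question)
    if cached is not None:
        return cached
    chunks = retrieve(question)  # stand-in for the vector store query
    reply = generate(build_prompt(question, chunks))  # stand-in for the LLM call
    cache.put(question, reply)
    return reply
```

One design caveat: a cache keyed only on the question will keep serving an old answer after the underlying documents change, so a real deployment would likely clear or version the cache whenever the knowledge base is re-indexed.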


**Authors:**
- Ayesh Ahmad (365966)
- Farooq Afzal (365793)
- Muhammad Faras Siddiqui (365988)

**Last Updated:** May 4, 2025

**Version:** 1.0