# Module 2: Development Environment Setup

## Learning Objectives

By the end of this notebook, you will:
1. Understand the complete tech stack
2. Set up an isolated Python environment
3. Install all dependencies
4. Configure external services (PostgreSQL, Qdrant, Redis)
5. Verify everything works
6. Run your first API call

**Estimated Time:** 60-90 minutes

**Prerequisites:**
- Python 3.8+ installed
- Git installed
- Docker & Docker Compose (optional but recommended)
- 4GB+ free RAM
- 2GB free disk space

## Step 1: Verify Python Installation

First, let's check your Python version. You need Python 3.8 or higher.

In [None]:
import sys
print(f"Python version: {sys.version}")
print(f"Python executable: {sys.executable}")

# Check version (tuple comparison also accepts future major versions)
if sys.version_info >= (3, 8):
    print(f"\n✅ Python {sys.version_info.major}.{sys.version_info.minor} is compatible!")
else:
    print(f"\n❌ Python {sys.version_info.major}.{sys.version_info.minor} is too old. Please upgrade to 3.8+")

**What if Python is too old?**

**Option 1: Using pyenv (recommended)**
```bash
# Install pyenv
curl https://pyenv.run | bash

# Install Python 3.11
pyenv install 3.11.0

# Set as global
pyenv global 3.11.0
```

**Option 2: Using conda**
```bash
conda create -n rag-engine python=3.11
conda activate rag-engine
```

**Option 3: Direct download**
- Download from [python.org](https://www.python.org/downloads/)
- Make sure to check "Add Python to PATH" during installation

## Step 2: Clone the Repository

Now let's get the code. Choose one of these options:

In [None]:
import os

# Option 1: Clone via HTTPS (recommended for beginners)
# The clone is skipped if the directory already exists, because git
# refuses to clone into a non-empty directory.
if os.path.exists('rag-engine-mini'):
    print("✅ Repository already cloned!")
else:
    !git clone https://github.com/your-org/rag-engine-mini.git

# Option 2: Clone via SSH (if you have SSH keys set up)
# !git clone git@github.com:your-org/rag-engine-mini.git

## Step 3: Navigate to Project Directory

Change into the project directory and explore its structure.

In [None]:
import os

# Change to project directory (skipped on a re-run, when we are already inside it)
if os.path.isdir('rag-engine-mini'):
    os.chdir('rag-engine-mini')

# Verify we're in the right place
if os.path.exists('README.md') and os.path.exists('requirements.txt'):
    print("✅ Successfully in project directory!")
    print(f"\nCurrent directory: {os.getcwd()}")
else:
    print("❌ Not in the correct directory")
    print(f"Current: {os.getcwd()}")
    print(f"Contents: {os.listdir('.')}")

In [None]:
# Explore the directory structure
print("📁 Project Structure:\n")

for item in sorted(os.listdir('.')):
    if os.path.isdir(item):
        if item.startswith('.'):
            continue  # Skip hidden directories
        # Count files in directory
        try:
            count = len([f for f in os.listdir(item) if not f.startswith('.')])
            print(f"📂 {item:<20} ({count} items)")
        except OSError:
            # Raised when the directory cannot be read (e.g. permissions)
            print(f"📂 {item:<20} (access denied)")
    else:
        size = os.path.getsize(item)
        print(f"📄 {item:<20} ({size:,} bytes)")

## Step 4: Create Virtual Environment

Virtual environments isolate project dependencies. This is **essential** for Python development.

In [None]:
import os
import platform

# Check if virtual environment exists
venv_exists = os.path.exists('venv') or os.path.exists('.venv')

if venv_exists:
    print("✅ Virtual environment already exists!")
else:
    print("Creating virtual environment...")
    !python -m venv venv
    print("✅ Virtual environment created!")

# Determine activation command based on OS
if platform.system() == "Windows":
    activate_cmd = "venv\\Scripts\\activate"
else:
    activate_cmd = "source venv/bin/activate"

print("\nTo activate manually, run:")
print(f"  {activate_cmd}")
print("\nThen install dependencies:")
print("  pip install -r requirements.txt")

**Why virtual environments matter:**

Without virtualenv:
```
Project A needs: requests==2.25.0
Project B needs: requests==2.28.0
Result: CONFLICT! Both can't be installed globally
```

With virtualenv:
```
Project A venv: requests==2.25.0
Project B venv: requests==2.28.0
Result: Both work independently!
```
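
To confirm which case applies to the interpreter you are running, a minimal sketch follows. It relies on the standard `sys.prefix` and `sys.base_prefix` attributes; the helper name `in_virtualenv` is only illustrative. Note that this check detects `venv`/`virtualenv` environments and will usually report `False` inside a conda environment.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtualenv()}")
```

If this prints `False` in the notebook, packages installed from notebook cells go into the kernel's environment rather than into `venv`.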

## Step 5: Install Dependencies

Now let's install all the Python packages. This might take 2-5 minutes depending on your internet speed.

In [None]:
# Check requirements.txt exists
if os.path.exists('requirements.txt'):
    with open('requirements.txt') as f:
        # Strip first so indented comment lines are skipped as well
        stripped = [line.strip() for line in f]
    deps = [line for line in stripped if line and not line.startswith('#')]

    print(f"📦 Found {len(deps)} dependencies:")
    for i, dep in enumerate(deps[:10], 1):  # Show first 10
        print(f"  {i}. {dep}")
    if len(deps) > 10:
        print(f"  ... and {len(deps) - 10} more")
else:
    print("❌ requirements.txt not found")

In [None]:
# Install dependencies
print("🚀 Installing dependencies...")
print("This may take a few minutes...\n")

# %pip installs into the environment of the running kernel,
# which `!pip` does not guarantee
%pip install -q -r requirements.txt

print("\n✅ Install command finished. Check the output above for errors.")

In [None]:
# Verify key packages are installed
import importlib

packages = [
    'fastapi',
    'sqlalchemy',
    'pydantic',
    'httpx',
    'pytest',
    'qdrant_client',
    'redis',
    'celery',
    'openai',
]

print("🔍 Verifying installed packages:\n")

missing = []
for pkg in packages:
    try:
        module = importlib.import_module(pkg)
        version = getattr(module, '__version__', 'unknown')
        print(f"  ✅ {pkg:<20} v{version}")
    except ImportError:
        print(f"  ❌ {pkg:<20} NOT FOUND")
        missing.append(pkg)

if missing:
    print(f"\n⚠️  Missing packages: {', '.join(missing)}")
    print("Try installing them individually:")
    for pkg in missing:
        print(f"  pip install {pkg}")
else:
    print("\n✅ All key packages verified!")
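
Not every package exposes `__version__`, so the cell above can print `vunknown` for a package that is installed. A minimal alternative sketch reads the version from the installed distribution metadata with the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is illustrative. The lookup uses the distribution name as pip knows it, which can differ from the import name.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names (as used by pip) can differ from import names:
# the module `qdrant_client` is published as `qdrant-client`.
for dist in ["fastapi", "qdrant-client", "redis"]:
    print(f"{dist:<20} {installed_version(dist) or 'NOT FOUND'}")
```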

## Step 6: Set Up External Services

RAG Engine requires three external services:
1. **PostgreSQL** - Main database (documents, users, metadata)
2. **Qdrant** - Vector database (embeddings, similarity search)
3. **Redis** - Cache and message broker

### Option A: Using Docker Compose (Recommended)

This is the easiest way to get started. Docker Compose will set up all three services automatically.

In [None]:
# Check if Docker is installed
import subprocess

try:
    result = subprocess.run(['docker', '--version'],
                            capture_output=True,
                            text=True,
                            timeout=5)
    if result.returncode == 0:
        print(f"✅ Docker found: {result.stdout.strip()}")
    else:
        print(f"❌ Docker returned an error: {result.stderr.strip()}")
except (FileNotFoundError, subprocess.TimeoutExpired):
    # FileNotFoundError: no docker binary on PATH; TimeoutExpired: the command hung
    print("❌ Docker not found or not in PATH")
    print("\nPlease install Docker:")
    print("  Windows/Mac: https://docs.docker.com/get-docker/")
    print("  Linux: sudo apt-get install docker.io docker-compose")

In [None]:
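# Check if docker-compose.yml exists
if os.path.exists('docker-compose.yml'):
    print("✅ docker-compose.yml found!\n")

    with open('docker-compose.yml') as f:
        lines = f.read().splitlines()

    # Simple parsing: service names are the keys nested directly under
    # the top-level `services:` key (no YAML library required)
    import re
    services, in_services, indent = [], False, None
    for line in lines:
        if not line.strip() or line.lstrip().startswith('#'):
            continue  # Skip blank lines and comments
        if not line[0].isspace():
            in_services = line.rstrip() == 'services:'  # Entering or leaving the block
            continue
        match = re.match(r'^(\s+)([\w.-]+):', line)
        if in_services and match:
            if indent is None:
                indent = len(match.group(1))  # Indentation of the first service
            if len(match.group(1)) == indent:
                services.append(match.group(2))

    print("Services that will be started:")
    for svc in services:
        print(f"  🐳 {svc}")
else:
    print("⚠️  docker-compose.yml not found in current directory")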

In [None]:
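# Start services with Docker Compose
print("🚀 Starting services with Docker Compose...")
print("This will start PostgreSQL, Qdrant, and Redis\n")

# With the Docker Compose v2 plugin, the equivalent is: docker compose up -d
!docker-compose up -d

print("\n✅ Start command finished. Check the output above for errors.")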

In [None]:
# Verify services are running
print("🔍 Checking service status:\n")

!docker-compose ps

print("\nIf all services show 'Up', you're good to go!")

**Services will be available at:**
- PostgreSQL: `localhost:5432`
- Qdrant: `localhost:6333`
- Redis: `localhost:6379`

**Default credentials (from docker-compose.yml):**
- PostgreSQL: `user:password@localhost:5432/rag_engine`
- Qdrant: No authentication (dev mode)
- Redis: No password (dev mode)
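
As a quick sanity check that something is listening on those ports, here is a minimal sketch using only the standard library. It only tests that a TCP connection can be opened and does not verify credentials or that the right service is answering. The helper name `port_open` is illustrative.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False


# Ports taken from the list above
services = {"PostgreSQL": 5432, "Qdrant": 6333, "Redis": 6379}

for name, port in services.items():
    state = "reachable" if port_open("localhost", port) else "not reachable"
    print(f"{name:<12} localhost:{port}  {state}")
```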

### Option B: Manual Installation (Advanced)

If you prefer not to use Docker:

**PostgreSQL:**
```bash
# macOS
brew install postgresql
brew services start postgresql

# Ubuntu/Debian
sudo apt-get install postgresql postgresql-contrib
sudo service postgresql start

# Create database
createdb rag_engine
```

**Qdrant:**
```bash
docker run -p 6333:6333 qdrant/qdrant
```

**Redis:**
```bash
# macOS
brew install redis
brew services start redis

# Ubuntu/Debian
sudo apt-get install redis-server
sudo service redis-server start
```

## Step 7: Configure Environment Variables

The application needs configuration via environment variables. We'll create a `.env` file.

In [None]:
# Check if .env.example exists and use it as template
if os.path.exists('.env.example'):
    with open('.env.example') as f:
        env_template = f.read()
    print("✅ Found .env.example template")
    print("\nTemplate preview:")
    print("="*60)
    for line in env_template.split('\n')[:20]:
        print(line)
    if len(env_template.split('\n')) > 20:
        print("...")
else:
    print("⚠️  No .env.example found. Creating basic template...")

    env_template = """
# Database
DATABASE_URL=postgresql://user:password@localhost:5432/rag_engine

# Vector Store
QDRANT_HOST=localhost
QDRANT_PORT=6333

# Cache
REDIS_URL=redis://localhost:6379/0

# LLM
OPENAI_API_KEY=sk-...

# Security
JWT_SECRET=your-secret-key-change-this
API_KEY_SALT=another-secret
""".strip()

In [None]:
# Create .env file if it doesn't exist
if not os.path.exists('.env'):
    print("Creating .env file...\n")
    
    env_content = """# Database
DATABASE_URL=postgresql://user:password@localhost:5432/rag_engine

# Vector Store
QDRANT_HOST=localhost
QDRANT_PORT=6333

# Cache & Queue
REDIS_URL=redis://localhost:6379/0
CELERY_BROKER_URL=redis://localhost:6379/1

# LLM Configuration
OPENAI_API_KEY=your-openai-api-key-here
OPENAI_MODEL=gpt-3.5-turbo

# Embeddings
EMBEDDING_MODEL=text-embedding-3-small

# Security
JWT_SECRET=change-this-to-a-secure-random-string
API_KEY_SALT=another-secure-random-string

# App Configuration
ENVIRONMENT=development
LOG_LEVEL=INFO
""".strip()
    
    with open('.env', 'w') as f:
        f.write(env_content)
    
    print("✅ .env file created!")
    print("\n⚠️  IMPORTANT: Edit .env and add your actual API keys!")
else:
    print("✅ .env file already exists")
    
print("\nTo edit:")
print("  - Jupyter: Open .env file in the file browser")
print("  - Terminal: nano .env  or  vim .env")
print("  - VS Code: code .env")
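
The `JWT_SECRET` and `API_KEY_SALT` placeholders should be replaced with unpredictable values. One way to produce them, sketched with the standard library's `secrets` module; the helper name `generate_secret` and the 32-byte length are illustrative choices, not project requirements.

```python
import secrets


def generate_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe random string built from num_bytes of OS randomness."""
    return secrets.token_urlsafe(num_bytes)


# Print candidate values to paste into .env by hand
print(f"JWT_SECRET={generate_secret()}")
print(f"API_KEY_SALT={generate_secret()}")
```

Each run prints different values, so generate them once and keep them stable for a given environment.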

**⚠️ CRITICAL: Never commit `.env` to Git!**

The `.env` file contains secrets. It's already in `.gitignore`, but double-check:

```bash
# Check if .env is in .gitignore
grep "\.env" .gitignore
```

**Getting an OpenAI API Key:**
1. Go to https://platform.openai.com/
2. Sign up or log in
3. Go to the API Keys section
4. Create a new secret key
5. Copy the key and paste it into `.env` as `OPENAI_API_KEY`
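
After editing `.env`, it can help to confirm that no template placeholders are left. The following is a minimal sketch, not part of the project: the parser handles only plain `KEY=VALUE` lines (no quoting or `export` prefixes), and the hint strings are assumptions taken from the template values shown earlier.

```python
from typing import Dict, List

# Fragments that mark a value as an unedited placeholder (assumed from the template)
PLACEHOLDER_HINTS = ("your-", "change-this", "another-secure", "sk-...")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def find_placeholders(values: Dict[str, str]) -> List[str]:
    """Return the keys whose values are empty or still look like placeholders."""
    return [
        key for key, value in values.items()
        if not value or any(hint in value for hint in PLACEHOLDER_HINTS)
    ]


sample = """
DATABASE_URL=postgresql://user:password@localhost:5432/rag_engine
OPENAI_API_KEY=your-openai-api-key-here
JWT_SECRET=change-this-to-a-secure-random-string
"""
print(find_placeholders(parse_env(sample)))
```

To check the real file, pass its contents instead of `sample`, for example `parse_env(open('.env').read())`.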

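**Cost Warning:** OpenAI API calls cost money. GPT-3.5 has been priced at roughly $0.002 per 1K tokens, but rates differ by model and change over time, so check the current pricing page and monitor your usage!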
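
The arithmetic behind such an estimate is tokens divided by 1,000, times the per-1K price. A minimal sketch follows; the rate is the approximate figure quoted above and should be treated as an assumption, and `estimate_cost` is an illustrative helper.

```python
def estimate_cost(tokens: int, price_per_1k: float) -> float:
    """Return the estimated cost in dollars for a given number of tokens."""
    return tokens / 1000 * price_per_1k


# Approximate rate quoted above; an assumption, check current pricing
GPT35_PRICE_PER_1K = 0.002

for tokens in (1_000, 50_000, 1_000_000):
    cost = estimate_cost(tokens, GPT35_PRICE_PER_1K)
    print(f"{tokens:>9,} tokens ≈ ${cost:.3f}")
```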

## Step 8: Run Database Migrations

Migrations create the database schema (tables, indexes, etc.).

In [None]:
# Check if alembic is installed
try:
    import alembic
    print(f"✅ Alembic found (v{alembic.__version__})")
except ImportError:
    print("❌ Alembic not found. Installing...")
    !pip install alembic

# Check for alembic.ini
if os.path.exists('alembic.ini'):
    print("✅ alembic.ini configuration found")
else:
    print("⚠️  alembic.ini not found. Migrations may not be set up yet.")

In [None]:
# Check migrations (this cell only lists them; it does not apply them)
print("🔍 Checking database migrations...\n")

print("To run migrations manually:")
print("  1. Make sure your .env has DATABASE_URL set")
print("  2. Run: alembic upgrade head")
print("\nFor now, we'll check if migrations exist:")

# Look for the versions folder directly so a missing folder does not raise
versions_dir = os.path.join('alembic', 'versions')
if os.path.isdir(versions_dir):
    migration_files = [f for f in os.listdir(versions_dir) if f.endswith('.py')]
    print(f"\n✅ Found {len(migration_files)} migration files")
    print("\nMigration files:")
    for f in sorted(migration_files)[:5]:
        print(f"  - {f}")
    if len(migration_files) > 5:
        print(f"  ... and {len(migration_files) - 5} more")
else:
    print("❌ No migrations directory found")

**What are migrations?**

Think of migrations as version control for your database schema:

```
Migration 001: Create users table
Migration 002: Add documents table
Migration 003: Create chunks table
...
```

**Benefits:**
- Track schema changes over time
- Rollback if something breaks
- Consistent schema across environments
- Team collaboration on database design
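
To make the "version control" idea concrete, here is a toy sketch of a migration runner. It is not how Alembic is implemented and does not use the project's schema: it uses an in-memory SQLite database, made-up table definitions, and a hypothetical `schema_version` table to record which numbered steps have already been applied.

```python
import sqlite3

# Ordered (version, SQL) pairs mirroring the illustrative list above
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)"),
    (2, "CREATE TABLE documents (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"),
    (3, "CREATE TABLE chunks (id INTEGER PRIMARY KEY, document_id INTEGER, body TEXT)"),
]


def current_version(conn: sqlite3.Connection) -> int:
    """Return the highest applied migration number (0 for a fresh database)."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def upgrade(conn: sqlite3.Connection) -> int:
    """Apply every migration newer than the current version; return how many ran."""
    start = current_version(conn)
    applied = 0
    for version, sql in MIGRATIONS:
        if version > start:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
            applied += 1
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")
print("applied:", upgrade(conn))   # first run applies all pending migrations
print("applied:", upgrade(conn))   # second run finds nothing new to apply
print("version:", current_version(conn))
```

Because applied versions are recorded, running the upgrade again is harmless, which is what keeps the schema consistent across environments.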

## Step 9: Verify Everything Works

Let's run comprehensive verification tests.

In [None]:
# Test 1: Import the application
import sys

print("Test 1: Importing application modules\n")

try:
    # Avoid adding 'src' to the path twice when the cell is re-run
    if 'src' not in sys.path:
        sys.path.insert(0, 'src')
    from main import app
    print("✅ Main application imported successfully")
    print(f"  App name: {app.title}")
except Exception as e:
    print(f"❌ Failed to import: {e}")
    print("\nTroubleshooting:")
    print("  1. Are you in the project root directory?")
    print("  2. Is src/ directory present?")
    print("  3. Are all dependencies installed?")

In [None]:
# Test 2: Check configuration
print("\nTest 2: Configuration validation\n")

try:
    from core.config import get_settings
    settings = get_settings()

    print("✅ Configuration loaded")
    print(f"  Environment: {settings.environment}")
    # Show only the part after '@' so the database password is not printed
    print(f"  Database: {str(settings.database_url).split('@')[-1]}")
    print(f"  Qdrant Host: {settings.qdrant_host}")
    print(f"  Log Level: {settings.log_level}")
except Exception as e:
    print(f"❌ Configuration error: {e}")
    print("\nCheck your .env file is properly configured")

In [None]:
# Test 3: Database connectivity
print("\nTest 3: Database connectivity\n")

try:
    from sqlalchemy import create_engine, text
    from core.config import get_settings

    settings = get_settings()
    engine = create_engine(settings.database_url)

    with engine.connect() as conn:
        result = conn.execute(text("SELECT 1"))
        row = result.fetchone()
        if row and row[0] == 1:
            print("✅ Database connection successful")
            # Show only the part after '@' so the password is not printed
            print(f"  Host: {str(settings.database_url).split('@')[-1]}")
        else:
            print("❌ Database test query failed")

except Exception as e:
    print(f"❌ Database connection failed: {e}")
    print("\nTroubleshooting:")
    print("  1. Is PostgreSQL running? (docker-compose ps)")
    print("  2. Check DATABASE_URL in .env")
    print("  3. Create the database if it is missing: createdb rag_engine")
In [None]:
# Test 4: Vector store connectivity
print("\nTest 4: Vector store (Qdrant) connectivity\n")

try:
    from qdrant_client import QdrantClient
    from core.config import get_settings
    
    settings = get_settings()
    client = QdrantClient(host=settings.qdrant_host, port=settings.qdrant_port)
    
    # Get collections
    collections = client.get_collections()
    print("✅ Qdrant connection successful")
    print(f"  Host: {settings.qdrant_host}:{settings.qdrant_port}")
    print(f"  Collections: {len(collections.collections)}")

except Exception as e:
    print(f"❌ Qdrant connection failed: {e}")
    print("\nTroubleshooting:")
    print("  1. Is Qdrant running? (docker-compose ps)")
    print("  2. Check QDRANT_HOST and QDRANT_PORT in .env")
    print("  3. Try accessing: http://localhost:6333/dashboard")

In [None]:
# Test 5: Redis connectivity
print("\nTest 5: Redis connectivity\n")

try:
    import redis
    from core.config import get_settings
    
    settings = get_settings()
    r = redis.from_url(settings.redis_url)
    
    # Test with ping
    if r.ping():
        print("✅ Redis connection successful")
        print(f"  URL: {settings.redis_url}")
        info = r.info()
        print(f"  Version: {info.get('redis_version', 'unknown')}")
    else:
        print("❌ Redis ping failed")

except Exception as e:
    print(f"❌ Redis connection failed: {e}")
    print("\nTroubleshooting:")
    print("  1. Is Redis running? (docker-compose ps)")
    print("  2. Check REDIS_URL in .env")

## Step 10: Seed Sample Data (Optional)

Let's add some test data to play with.

In [None]:
# Check if seed script exists
if os.path.exists('scripts/seed_sample_data.py'):
    print("✅ Seed script found!")
    print("\nTo seed sample data, run:")
    print("  python scripts/seed_sample_data.py")
    print("\nOr with options:")
    print("  python scripts/seed_sample_data.py --users 10 --documents 50")
    print("\n⚠️  Only run this after migrations are applied!")
else:
    print("⚠️  Seed script not found")

## Step 11: Start the Application

Now let's start the API server!

In [None]:
print("🚀 Starting the application...")
print("\nYou have 4 options:\n")

print("Option 1: Using Makefile (recommended)")
print("  make run")
print("\nOption 2: Using uvicorn directly")
print("  uvicorn src.main:app --reload --host 0.0.0.0 --port 8000")
print("\nOption 3: Using Python")
print("  python -m src.main")
print("\nOption 4: Run in this notebook (below)")

# Option 4: Uncomment to run in notebook
# Note: the server keeps this cell busy until you interrupt the kernel,
# so the cells below cannot run at the same time. A separate terminal is easier.
# !uvicorn src.main:app --reload --host 0.0.0.0 --port 8000

## Step 12: Make Your First API Call

Once the server is running, let's test it!

In [None]:
import requests

BASE_URL = "http://localhost:8000"

# Test 1: Health check
print("Test 1: Health Endpoint\n")
try:
    response = requests.get(f"{BASE_URL}/health", timeout=5)
    if response.status_code == 200:
        data = response.json()
        print("✅ API is healthy!")
        print(f"  Status: {data.get('status')}")
        print(f"  Version: {data.get('version')}")
    else:
        print(f"❌ Unexpected status: {response.status_code}")
except requests.exceptions.ConnectionError:
    print(f"❌ Cannot connect to {BASE_URL}")
    print("  Is the server running?")
except Exception as e:
    print(f"❌ Error: {e}")
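
If the health check fails because the server is still starting, polling for a short while is more reliable than a single request. A minimal sketch using only the standard library; `wait_for_server` is an illustrative helper, and the `/health` path is the one used in the cell above.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll url until it answers with HTTP 200 or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Server not accepting connections yet; retry after a pause
        time.sleep(interval)
    return False


# Example call (left commented out so this cell does not block for 30 seconds
# when no server is running):
# wait_for_server("http://localhost:8000/health")
```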

In [None]:
# Test 2: API Documentation
print("\nTest 2: API Documentation\n")

try:
    response = requests.get(f"{BASE_URL}/docs", timeout=5)
    if response.status_code == 200:
        print("✅ API documentation available!")
        print(f"  URL: {BASE_URL}/docs")
        print("  Open it in your browser to see all endpoints")
    else:
        print(f"⚠️  Status: {response.status_code}")
except Exception as e:
    print(f"❌ Error: {e}")

In [None]:
# Test 3: List documents (if you have an API key)
print("\nTest 3: Protected Endpoints\n")

API_KEY = "your-api-key-here"  # Replace with actual key

if API_KEY and API_KEY != "your-api-key-here":
    headers = {"Authorization": f"Bearer {API_KEY}"}
    
    try:
        response = requests.get(
            f"{BASE_URL}/api/v1/documents",
            headers=headers,
            timeout=5
        )
        if response.status_code == 200:
            data = response.json()
            print("✅ Documents endpoint working!")
            print(f"  Documents: {len(data) if isinstance(data, list) else data.get('total', 'N/A')}")
        else:
            print(f"Status: {response.status_code}")
            print(f"Response: {response.text[:200]}")
    except Exception as e:
        print(f"❌ Error: {e}")
else:
    print("⚠️  Skipping (no API key configured)")
    print("  To get an API key:")
    print("  1. Register a user")
    print("  2. Login to get JWT token")
    print("  3. Or create an API key")

## 🎉 Setup Complete!

Congratulations! Your development environment is ready.

## Summary of What We Did:

1. ✅ Verified Python 3.8+
2. ✅ Cloned repository
3. ✅ Created virtual environment
4. ✅ Installed dependencies
5. ✅ Set up external services (PostgreSQL, Qdrant, Redis)
6. ✅ Configured environment variables
7. ✅ Verified connectivity to all services
8. ✅ Made first API calls

## Next Steps:

1. **Run migrations** (if not already done):
   ```bash
   alembic upgrade head
   ```

2. **Seed sample data**:
   ```bash
   python scripts/seed_sample_data.py
   ```

3. **Start the server**:
   ```bash
   make run
   ```

4. **Open the docs**:
   Visit http://localhost:8000/docs

5. **Continue to Module 3**: RAG Fundamentals

## Troubleshooting:

**Issue: Port already in use**
```bash
# Find process using port 8000 (macOS/Linux)
lsof -i :8000
# Stop it; add -9 only if it does not exit
kill <PID>
```

**Issue: Permission denied**
```bash
# Fix permissions
sudo chown -R $(whoami) .
```

**Issue: Module not found**
```bash
# Reinstall dependencies
pip install -r requirements.txt --force-reinstall
```

**Issue: Database connection refused**
```bash
# Check if containers are running
docker-compose ps

# Restart services
docker-compose down
docker-compose up -d
```

## Useful Commands:

```bash
# Start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Stop everything
docker-compose down

# Reset database
docker-compose down -v
docker-compose up -d
alembic upgrade head

# Run tests
pytest

# Run specific test
pytest tests/unit/test_core.py -v

# Format code
make format

# Check types
make typecheck
```

---

**You're now ready to build amazing RAG applications!** 🚀