# Getting Started - Document Search Setup

Welcome to PhariaAI's document search tutorial! This guide will help you set up your environment to enable searching through your own documents using PhariaAI's semantic search capabilities.

## What you'll learn

This tutorial will show you how to:
- Set up a Python environment for document search
- Install the required PhariaAI SDK and dependencies
- Run interactive examples directly in Jupyter notebooks

---

## Prerequisites and Setup

## Required Software Installation

Install `uv` (a fast Python package manager) to manage the project's dependencies.
We also recommend **Python 3.11 or 3.12** for best compatibility with the PhariaAI SDK.
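
If Python is already installed, a short check can confirm whether the interpreter matches the recommendation above. This is only an illustrative sketch: the helper name `is_recommended_python` is hypothetical, and the version set simply mirrors the versions named in this guide.

```python
import sys

# Versions recommended in this guide (assumption: adjust if your SDK release differs).
RECOMMENDED_VERSIONS = {(3, 11), (3, 12)}


def is_recommended_python(version_info=sys.version_info):
    """Return True if the major.minor version is one of the recommended ones."""
    return (version_info[0], version_info[1]) in RECOMMENDED_VERSIONS


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    if is_recommended_python():
        print(f"Python {current} is a recommended version.")
    else:
        print(f"Python {current} is outside the recommended range (3.11 or 3.12).")
```

Only the major and minor numbers are compared, so any patch release of 3.11 or 3.12 passes.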

### macOS Installation

1. **Install Homebrew** (if not already installed): https://docs.brew.sh/Installation

2. **Install uv using Homebrew:**
   ```bash
   brew install uv
   ```

3. **Install Python 3.11 or 3.12** (if needed):
   ```bash
   brew install python@3.11
   # or
   brew install python@3.12
   ```

### Windows Installation

1. **Install uv:** Follow the installation guide at https://docs.astral.sh/uv/getting-started/installation/

2. **Install Python 3.11 or 3.12:** Download from https://python.org/downloads/ (ensure you select version 3.11 or 3.12)

---

## Setting Up Your Environment

### 1. Create Virtual Environment with uv

Navigate to your project directory and create a virtual environment:

```bash
uv venv --python 3.11
```

### 2. Activate the Virtual Environment

**On macOS/Linux:**
```bash
source .venv/bin/activate
```

**On Windows:**
```bash
.venv\Scripts\activate
```
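
To confirm that activation worked, you can ask Python itself whether it is running inside a virtual environment. The sketch below relies on the standard `sys.prefix` and `sys.base_prefix` attributes; the helper name `in_virtual_environment` is made up for this example.

```python
import sys


def in_virtual_environment(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Return True when the interpreter runs inside a virtual environment.

    In a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base Python installation.
    """
    return prefix != base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtual_environment())
```

Save it to a file and run it with `python` from the same terminal in which you activated the environment.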

### 3. Install Required Dependencies

Initialize a uv project in the current directory:

```bash
uv init
```

Install the core packages needed for document search with a single uv command:

```bash
uv pip install python-dotenv pharia-data-sdk pandas tenacity pharia_skill ipywidgets
```

Install the Jupyter-related packages:

```bash
uv pip install jupyter notebook ipykernel
```
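
After installation, a quick sanity check can list any packages that did not end up in the active environment. The following is a minimal sketch using the standard library's `importlib.metadata`; the distribution names are copied from the install commands above, and the helper name `find_missing` is hypothetical.

```python
from importlib import metadata

# Distribution names copied from the install commands above.
REQUIRED_DISTRIBUTIONS = [
    "python-dotenv", "pharia-data-sdk", "pandas", "tenacity",
    "pharia_skill", "ipywidgets", "jupyter", "notebook", "ipykernel",
]


def find_missing(distributions):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in distributions:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_DISTRIBUTIONS)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Run it with the virtual environment activated; otherwise it inspects whichever Python happens to be first on your `PATH`.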

---

## Create and populate local .env file

The `.env` file stores your environment-specific configuration (API endpoints, credentials, etc.) outside of version control.

1. **Copy the environment template**
   ```bash
   cd "1. Enable searching documents"
   cp .env.sample .env
   ```


2. **Fill in the missing values**
   
   Open `.env` and populate any empty variables. These values couldn't be pre-filled because they're unique to your company's infrastructure (API URLs, authentication tokens, service endpoints). Contact your administrator to obtain the correct values for your organization's PhariaAI setup.
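
To spot which variables still need a value, you can scan the file for keys with nothing after the `=` sign. The sketch below is deliberately simplified: it handles plain `KEY=VALUE` lines, blank lines, and `#` comments only, not the full syntax that `python-dotenv` supports. The helper name `find_empty_keys` and the sample content are placeholders for illustration.

```python
def find_empty_keys(env_text):
    """Return keys whose value is empty in .env-style text.

    Simplified on purpose: handles KEY=VALUE lines, blank lines, and
    '#' comments only, not the full syntax python-dotenv supports.
    """
    empty = []
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Treat a pair of empty quotes ('' or "") as an empty value too
        if not value.strip().strip("'\""):
            empty.append(key.strip())
    return empty


# Placeholder content for illustration; your .env.sample may differ.
sample = """\
# PhariaAI connection
PHARIA_API_BASE_URL=
PHARIA_AI_TOKEN=""
PHARIA_DATA_NAMESPACE=Studio-Onboarding-Tests
"""

print(find_empty_keys(sample))
```

For the sample text this reports `PHARIA_API_BASE_URL` and `PHARIA_AI_TOKEN` as unfilled. To check your real file, pass `pathlib.Path(".env").read_text()` instead of `sample`.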


---

## Validate environment configuration

Before proceeding, let's ensure your `.env` file is properly configured. Run the following code to validate:
- All required environment variables are set
- URLs are properly formatted
- Authentication token provides valid access to PhariaAI services


```python
import os
from urllib.parse import urlparse

from dotenv import dotenv_values, load_dotenv

# Load environment variables from .env, overriding values already set in the shell
load_dotenv(override=True)


def mask_sensitive_value(var_name, value):
    """Mask sensitive values for display."""
    if "TOKEN" in var_name:
        return value[:8] + "..." if len(value) > 8 else "***"
    return value


def read_env_sample_defaults():
    """Read non-empty default values from the .env.sample file."""
    # dotenv_values() returns an empty result for a missing file instead of
    # raising, so check for the file explicitly.
    if not os.path.isfile(".env.sample"):
        print("   ❌  Could not find .env.sample file to check defaults")
        return {}

    try:
        defaults = dotenv_values(".env.sample")
    except Exception as e:
        print(f"   ❌  Error reading .env.sample: {e}")
        return {}

    # Remove empty values
    return {k: v for k, v in defaults.items() if v}


def check_required_variables():
    """Check that all required variables are set. Return True if an error was found."""
    required_vars = [
        "PHARIA_API_BASE_URL",
        "PHARIA_AI_TOKEN",
        "PHARIA_DATA_NAMESPACE",
        "PHARIA_DATA_COLLECTION",
        "INDEX",
        "HYBRID_INDEX",
        "FILTER_INDEX",
        "EMBEDDING_MODEL_NAME",
    ]

    # Variables that need user-specific values (should be different from defaults)
    vars_needing_customization = [
        "PHARIA_DATA_COLLECTION",
        "INDEX",
        "HYBRID_INDEX",
        "FILTER_INDEX",
    ]

    # Read defaults from .env.sample
    defaults = read_env_sample_defaults()

    has_errors = False
    print("1️⃣  Checking required environment variables:")

    for var in required_vars:
        value = os.getenv(var)
        if not value or not value.strip():
            print(f"   ❌ {var}: NOT SET")
            has_errors = True
            continue

        display_value = mask_sensitive_value(var, value)

        # Variables that need unique values must differ from the sample default
        if var in vars_needing_customization and value == defaults.get(var):
            print(f"   ❌  {var}: {display_value} - Using default value! Please add a unique suffix (e.g., {value}-yourname)")
            has_errors = True
        else:
            print(f"   ✅ {var}: {display_value}")

    return has_errors


def validate_api_url(api_base_url):
    """Validate the API base URL format. Return True if an error was found."""
    print("\n2️⃣  Validating URL format:")

    # A missing URL is an error, not a pass
    if not api_base_url:
        print("   ❌ PHARIA_API_BASE_URL: NOT SET")
        return True

    try:
        parsed = urlparse(api_base_url)
    except ValueError as e:
        print(f"   ❌ PHARIA_API_BASE_URL: Error parsing URL - {e}")
        return True

    if not parsed.scheme or not parsed.netloc:
        print("   ❌ PHARIA_API_BASE_URL: Invalid URL format")
        return True

    print("   ✅ PHARIA_API_BASE_URL: Valid format")
    return False


def test_api_connection(api_base_url, token):
    """Test the connection to the PhariaAI API. Return True if an error was found."""
    print("\n3️⃣  Testing PhariaAI API access:")

    # Without a URL and token the connection cannot be verified
    if not api_base_url or not token:
        print("   ❌  Skipping API test - Missing API URL or token")
        return True

    try:
        from pharia_data_sdk.connectors import DocumentIndexClient
    except ImportError:
        print("   ❌ pharia-data-sdk not installed - Please run: uv pip install pharia-data-sdk")
        return True

    # Construct the search API URL (matching the format used in the tutorials)
    search_api_url = f"{api_base_url.rstrip('/')}/v1/studio/search"

    try:
        search_client = DocumentIndexClient(token=token, base_url=search_api_url)
        # List namespaces as a basic connectivity and authentication test
        search_client.list_namespaces()
    except Exception as e:
        if "401" in str(e) or "403" in str(e):
            print("   ❌ Authentication failed - Invalid token")
        else:
            print(f"   ❌ API connection failed: {e}")
            print(f"   ❌ Attempted URL: {search_api_url}")
        return True

    print("   ✅ API connection successful")
    return False


def validate_environment():
    """Run all environment validation checks."""
    print("🔍 Validating environment configuration...\n")

    # Check required variables
    has_var_errors = check_required_variables()

    # Validate URL format
    api_base_url = os.getenv("PHARIA_API_BASE_URL", "")
    has_url_errors = validate_api_url(api_base_url)

    # Test API connection
    token = os.getenv("PHARIA_AI_TOKEN")
    has_connection_errors = test_api_connection(api_base_url, token)

    # Print final summary
    print("\n" + "=" * 50)

    if not (has_var_errors or has_url_errors or has_connection_errors):
        print("✅ All validation checks passed! Your environment is properly configured.")
        print("\nYou can now proceed with the tutorial.")
    else:
        print("❌ Validation failed! Please check the errors above.")


# Run validation
validate_environment()
```


A successful run prints output similar to this:

```text
🔍 Validating environment configuration...

1️⃣  Checking required environment variables:
   ✅ PHARIA_API_BASE_URL: https://api.customer.pharia.com
   ✅ PHARIA_AI_TOKEN: eyJhbGci...
   ✅ PHARIA_DATA_NAMESPACE: Studio-Onboarding-Tests
   ✅ PHARIA_DATA_COLLECTION: pharia-tutorial-rag-vsp-1
   ✅ INDEX: rag-tutorial-index-vps-1
   ✅ HYBRID_INDEX: rag-tutorial-hybrid-index-vsp-1
   ✅ FILTER_INDEX: rag-tutorial-filter-index-vsp-1
   ✅ EMBEDDING_MODEL_NAME: luminous-base

2️⃣  Validating URL format:
   ✅ PHARIA_API_BASE_URL: Valid format

3️⃣  Testing PhariaAI API access:
   ✅ API connection successful

✅ All validation checks passed! Your environment is properly configured.

You can now proceed with the tutorial.
```
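
The URL format check in the validation code depends on `urlparse` finding both a scheme and a host. The short sketch below shows how a few typical values fare under that rule; the helper name `has_scheme_and_host` is hypothetical and only mirrors the condition used in the validation code.

```python
from urllib.parse import urlparse


def has_scheme_and_host(url):
    """Return True when the URL has both a scheme (e.g. https) and a host."""
    parsed = urlparse(url)
    return bool(parsed.scheme and parsed.netloc)


for candidate in [
    "https://api.customer.pharia.com",  # scheme and host present
    "api.customer.pharia.com",          # no scheme, so no host is parsed either
    "https://",                         # scheme only
]:
    print(f"{candidate!r}: {has_scheme_and_host(candidate)}")
```

The most common mistake this catches is a base URL pasted without its `https://` prefix, in which case `urlparse` treats the whole value as a path.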


---

## Running the Tutorial

### Interactive Jupyter Experience

All interactions in this tutorial can be executed directly within the provided Jupyter notebooks.

1. **Start Jupyter** from the terminal:

   ```bash
   uv run jupyter notebook
   ```

2. **Run each code section** sequentially
3. **Follow along** with the examples and explanations

No additional setup is required - the notebooks are designed to be self-contained and interactive.

>**Note:** You can also open the notebooks in another environment (JupyterLab, VS Code, etc.) instead of running `uv run jupyter notebook`. If you do, make sure the notebook kernel uses the `.venv` environment created above.
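
When using another environment, you can check from a notebook cell which interpreter the kernel actually runs on. The sketch below is one hedged way to do that: the helper name `runs_from_venv` is hypothetical, and the check assumes the notebook was started from the project directory that contains `.venv`.

```python
import sys
from pathlib import Path


def runs_from_venv(executable, venv_dir):
    """Return True if the interpreter path lies inside the given venv directory."""
    # absolute() is used instead of resolve() so that a symlinked venv
    # interpreter is not followed back to the base Python installation.
    exe = Path(executable).absolute()
    venv = Path(venv_dir).absolute()
    return venv in exe.parents


if __name__ == "__main__":
    # Assumes the current working directory is the project directory containing .venv.
    print("Kernel interpreter:", sys.executable)
    print("Uses project .venv:", runs_from_venv(sys.executable, ".venv"))
```

If the second line prints `False`, select the `.venv` interpreter as the kernel in your editor before running the tutorial cells.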

---


## Validate environment configuration in the notebook

Inside the notebook, the same validation can be run by importing `validate_environment` from the `validation_utils` module instead of pasting the full code:

```python
from validation_utils import validate_environment

# Run validation
validate_environment()
```