# Gaza Journalist Video Classifier - Validation

**Multimodal classification with Audio + Vision + OCR**

## Instructions:
1. **Run Cell 1 ONCE** - Setup (takes ~5-10 minutes)
2. **Run Cell 2 MANY TIMES** - Process videos with different parameters

## Cell 1: One-Time Setup (Run Once Per Session)

In [None]:
%%bash
# ============================================================================
# COMPLETE SETUP - Run this ONCE at the start of your Colab session
# ============================================================================
set -e  # Abort at the first failing step so "Setup complete" only prints on success

echo "[1/6] Installing Python packages..."
pip install -q yt-dlp pandas openpyxl pytesseract pillow requests > /dev/null 2>&1

echo "[2/6] Installing system packages..."
apt-get update -qq > /dev/null 2>&1
apt-get install -qq tesseract-ocr tesseract-ocr-ara ffmpeg git build-essential > /dev/null 2>&1

echo "[3/6] Setting up Whisper.cpp..."
if [ ! -d "whisper.cpp" ]; then
    git clone https://github.com/ggerganov/whisper.cpp.git > /dev/null 2>&1
    cd whisper.cpp
    make -j4 > /dev/null 2>&1
    cd ..
fi

echo "[4/6] Downloading Whisper model..."
if [ ! -f "whisper.cpp/models/ggml-base.bin" ]; then
    cd whisper.cpp
    bash ./models/download-ggml-model.sh base > /dev/null 2>&1
    cd ..
fi

echo "[5/6] Installing Ollama..."
if ! command -v ollama &> /dev/null; then
    curl -fsSL https://ollama.com/install.sh | sh > /dev/null 2>&1
fi

echo "[6/6] Starting Ollama and pulling models..."
# Start Ollama in background
nohup ollama serve > /tmp/ollama.log 2>&1 &
sleep 5

# Pull models (this takes a while)
echo "  - Pulling Qwen 2.5 72B (large model, ~5 mins)..."
ollama pull qwen2.5:72b > /dev/null 2>&1
echo "  - Pulling LLaVA vision model..."
ollama pull llava-llama-3:8b > /dev/null 2>&1

echo ""
echo "✓ Setup complete! Ready to process videos."
echo "You can now run Cell 2 to upload your Excel file and start validation."
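
Because Cell 2 may be re-run long after setup, it can help to confirm that the Ollama server started in step 6 is still reachable before processing videos. The following is a minimal sketch, not part of the original notebook: the helper names `ollama_is_up` and `wait_until` are hypothetical, and it assumes Ollama is listening on its default address `http://localhost:11434`.

```python
import time
import urllib.error
import urllib.request


def ollama_is_up(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP GET on the (assumed default) Ollama URL succeeds."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, etc.
        return False


def wait_until(probe, attempts=30, delay=1.0, sleep=time.sleep):
    """Call probe() up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example use at the top of Cell 2 (hypothetical placement):
# if not wait_until(ollama_is_up):
#     raise RuntimeError("Ollama is not reachable; re-run Cell 1.")
```

Polling with a bounded number of attempts is more forgiving than a fixed `sleep 5`, since the server may need more or less time to come up depending on the Colab instance.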

## Cell 2: Upload Excel & Run Validation (Run Multiple Times)

In [None]:
# ============================================================================
# UPLOAD & RUN - Run this cell whenever you want to process videos
# ============================================================================

from google.colab import files
import os

# Upload Excel file
print("Please upload your Excel file (Gaza Archive Form)...\n")
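uploaded = files.upload()
if not uploaded:
    raise RuntimeError("No file uploaded; re-run this cell and choose an Excel file.")
excel_file = next(iter(uploaded))  # Name of the first uploaded file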
print(f"\n✓ Uploaded: {excel_file}\n")

# Configuration
SAMPLE_SIZE = 30  # Change this: 10 (quick), 30 (demo), 50+ (full validation)
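LANGUAGE = "ar"   # Arabic audio
# NOTE: LANGUAGE is not passed to run_validation() below; it only takes effect
# if colab_validation.py reads it as a global after being exec'd into this namespace.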
OUTPUT_DIR = "validation_output"

print("Configuration:")
print(f"  Sample size: {SAMPLE_SIZE} videos")
print(f"  Language: {LANGUAGE}")
print(f"  Output directory: {OUTPUT_DIR}")
print("\nStarting validation...\n")
print("=" * 80)

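# Load the validation script into this namespace (it defines run_validation)
if not os.path.exists('colab_validation.py'):
    raise FileNotFoundError("colab_validation.py not found; upload it to the Colab working directory first.")
with open('colab_validation.py', encoding='utf-8') as f:
    exec(f.read())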

# Run validation
results = run_validation(
    excel_file=excel_file,
    sample_size=SAMPLE_SIZE,
    output_dir=OUTPUT_DIR
)

print("\n" + "=" * 80)
print("✓ Validation complete!")
print(f"\nResults saved in: {OUTPUT_DIR}/")
print(f"Download results: {OUTPUT_DIR}/validation_results.json")
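
After a run, a quick look at the results file can confirm that something was actually written. The sketch below is illustrative only: the helper name `summarize_results` is hypothetical, and since the structure of `validation_results.json` is not documented here, it only reports the top-level shape of whatever JSON it finds.

```python
import json
from pathlib import Path


def summarize_results(path):
    """Load a JSON results file and describe its top-level shape.

    The schema of the file is an unknown here, so nothing beyond the
    top-level type is assumed.
    """
    path = Path(path)
    if not path.exists():
        return f"{path} not found"
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, list):
        return f"{path.name}: list with {len(data)} entries"
    if isinstance(data, dict):
        keys = ", ".join(sorted(map(str, data)))
        return f"{path.name}: object with keys: {keys}"
    return f"{path.name}: {type(data).__name__}"


# Example (path taken from the OUTPUT_DIR used above):
# print(summarize_results("validation_output/validation_results.json"))
```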