# Gaza Video Classifier - Colab Tutorial

**What you'll learn:**
- How to set up a Python environment in Colab
- How to install and run Ollama + LLaVA
- How to upload files and run your scripts
- How to download results

**Runtime:** Click `Runtime` → `Change runtime type` → select `T4 GPU` (or A100 if you have Google One)

---

## Understanding Cells

- **▶️ Play button**: Runs the cell
- **Green checkmark**: Cell finished successfully  
- **Red X**: Cell had an error
- **Spinning circle**: Cell is still running

**Run cells in order from top to bottom the first time!**

---
## PART 1: System Setup (Run Once Per Session)
---

### Cell 1: Install System Dependencies

This installs:
- `ffmpeg`: Extract audio from videos
- `tesseract-ocr`: Read text from images  
- `tesseract-ocr-ara` and `tesseract-ocr-eng`: Arabic and English language data

**Time:** ~30 seconds

**What to expect:** Very little output (the install runs quietly), then "✅ System tools installed" followed by the ffmpeg and tesseract versions
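
Because the install cell sends `apt-get install` output to `/dev/null`, a failed install can go unnoticed. A minimal sketch of a follow-up check, assuming only the standard library; `missing_tools` is a hypothetical helper name and is not part of the project scripts:

```python
import shutil

def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

# Check the two command-line tools this notebook relies on
missing = missing_tools(["ffmpeg", "tesseract"])
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("ffmpeg and tesseract are both on PATH")
```

If anything is reported missing, re-running Cell 1 without the output redirect should show the underlying apt error.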

In [None]:
%%bash
# Update package list
apt-get update -qq

# Install video and OCR tools
apt-get install -y -qq ffmpeg tesseract-ocr tesseract-ocr-ara tesseract-ocr-eng > /dev/null 2>&1

echo "✅ System tools installed"
echo "   - ffmpeg: $(ffmpeg -version | head -n1)"
echo "   - tesseract: $(tesseract --version | head -n1)"

### Cell 2: Install Python Packages

This installs:
- `pytesseract`: Python wrapper for OCR
- `pillow`: Image processing
- `requests`: HTTP requests for Ollama API

**Time:** ~10 seconds

In [None]:
!pip install -q pytesseract pillow requests

print("✅ Python packages installed")
import pytesseract
import PIL
import requests
print(f"   - pytesseract: {pytesseract.__version__ if hasattr(pytesseract, '__version__') else 'installed'}")
print(f"   - pillow: {PIL.__version__}")
print(f"   - requests: {requests.__version__}")

### Cell 3: Install Whisper.cpp

This:
- Clones whisper.cpp from GitHub
- Compiles it (builds the C++ code)
- Downloads the multilingual model

**Time:** ~2 minutes (compiling takes time)

**Note:** The build output is hidden, so the cell may look idle while it compiles. That's normal!

In [None]:
%%bash
# Clone whisper.cpp
if [ ! -d "whisper.cpp" ]; then
    git clone https://github.com/ggerganov/whisper.cpp.git
    echo "📥 Cloned whisper.cpp"
else
    echo "✅ whisper.cpp already exists"
fi

# Build it
cd whisper.cpp
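# Stop here if the build fails instead of reporting success unconditionally
make -j > /dev/null 2>&1 || { echo "❌ Build failed; run make without the redirect to see the errors"; exit 1; }
echo "🔨 Built whisper.cpp"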

# Download model if not already downloaded
if [ ! -f "models/ggml-base.bin" ]; then
    bash ./models/download-ggml-model.sh base
    echo "📥 Downloaded multilingual model"
else
    echo "✅ Model already exists"
fi

echo "✅ Whisper.cpp ready"

### Cell 4: Install Ollama

Ollama is the server that runs LLaVA and the text classification model locally (Llama 2 in this notebook, standing in for DeepSeek).

**Time:** ~30 seconds

In [None]:
!curl -fsSL https://ollama.com/install.sh | sh

print("✅ Ollama installed")

### Cell 5: Start Ollama Server

This starts Ollama in the background so we can call it via API.

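**Important:** The cell itself finishes after a few seconds, but the Ollama server it starts must keep running in the background. If you restart the runtime, run this cell again.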
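
Cell 5 waits a fixed five seconds before testing the server, which can be too short on a slow runtime. One alternative is to poll until the server answers. The following is a minimal sketch, not part of the original notebook; `wait_until_ready` is a hypothetical helper that accepts any zero-argument check function:

```python
import time

def wait_until_ready(probe, timeout_s=30.0, interval_s=1.0):
    """Call `probe()` repeatedly until it returns True or `timeout_s` elapses.

    Exceptions raised by `probe` are treated as "not ready yet".
    Returns True if the probe succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe():
                return True
        except Exception:
            pass  # e.g. connection refused while the server is still starting
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)

# Example probe for Ollama (assumes the default port 11434 used in this notebook):
#   import requests
#   ready = wait_until_ready(
#       lambda: requests.get("http://localhost:11434/api/tags", timeout=2).status_code == 200
#   )
```

Passing the check in as a function keeps the loop independent of `requests`, so it can be reused for other services.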

In [None]:
import subprocess
import time

# Start Ollama server in background
ollama_process = subprocess.Popen(
    ['ollama', 'serve'],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL
)

# Wait for server to start
print("🚀 Starting Ollama server...")
time.sleep(5)

# Test if it's running
import requests
try:
    response = requests.get('http://localhost:11434/api/tags', timeout=5)
    if response.status_code == 200:
        print("✅ Ollama server running")
    else:
        print("⚠️ Ollama might not be ready yet, wait a moment")
except requests.exceptions.RequestException:
    # Connection refused or timed out: the server is not up yet
    print("⚠️ Ollama not responding, try running this cell again")

### Cell 6: Download AI Models

This downloads:
- `llava:7b` - Vision model for analyzing frames
- `llama2:13b` - Text model for classification (we'll use this instead of DeepSeek since DeepSeek might not be available)

**Time:** ~5-10 minutes, depending on bandwidth (roughly 12 GB total: about 4.7 GB for `llava:7b` and about 7.4 GB for `llama2:13b`)

**Note:** This is the longest step, but you only do it once per session!
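
To avoid pulling a model that is already present, you can inspect the JSON returned by the `/api/tags` endpoint that Cell 5 queries. A minimal sketch, assuming the response contains a `models` list whose entries carry a `name` field; `missing_models` is a hypothetical helper:

```python
def missing_models(tags_response, required):
    """Return the names in `required` that are absent from an /api/tags response dict."""
    installed = {model.get("name") for model in tags_response.get("models", [])}
    return [name for name in required if name not in installed]

# Hand-written sample shaped like an /api/tags response (illustrative, not real output)
sample = {"models": [{"name": "llava:7b"}]}
print(missing_models(sample, ["llava:7b", "llama2:13b"]))  # ['llama2:13b']
```

With the server running, `requests.get('http://localhost:11434/api/tags').json()` would supply the real response in place of `sample`.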

In [None]:
# Download LLaVA for vision
print("📥 Downloading LLaVA (vision model)...")
!ollama pull llava:7b

print("\n📥 Downloading Llama2 (classification model)...")
!ollama pull llama2:13b

print("\n✅ All models downloaded!")
print("\nAvailable models:")
!ollama list

---
## PART 2: Upload Your Code
---

### Cell 7: Upload Python Scripts

Upload these 3 files from your Mac:
1. `analyze_frame_content.py`
2. `extract_text_from_video.py`
3. `classify_video_multimodal.py`

**How to upload:**
1. Run this cell
2. Click "Choose Files" button that appears
3. Select all 3 Python files
4. Wait for upload to complete
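
After the upload finishes, it can be worth confirming that every expected script arrived. A minimal sketch; `missing_scripts` is a hypothetical helper, and the file list simply mirrors the three names above:

```python
REQUIRED_SCRIPTS = [
    "analyze_frame_content.py",
    "extract_text_from_video.py",
    "classify_video_multimodal.py",
]

def missing_scripts(uploaded_names, required=REQUIRED_SCRIPTS):
    """Return the required script names that are not among `uploaded_names`."""
    present = set(uploaded_names)
    return [name for name in required if name not in present]

# Example with only one file uploaded
print(missing_scripts(["analyze_frame_content.py"]))
```

In the notebook, `missing_scripts(uploaded.keys())` would check the dictionary returned by `files.upload()`.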

In [None]:
from google.colab import files

print("📤 Upload your 3 Python scripts:")
print("   - analyze_frame_content.py")
print("   - extract_text_from_video.py")
print("   - classify_video_multimodal.py")
print("\nClick 'Choose Files' below and select all 3 files:\n")

uploaded = files.upload()

print(f"\n✅ Uploaded {len(uploaded)} files:")
for filename in uploaded.keys():
    print(f"   - {filename} ({len(uploaded[filename])} bytes)")

### Cell 8: Update Model Name in Scripts

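**Why:** Your scripts use `deepseek-v3.1:671b-cloud`, but this Colab setup uses `llama2:13b`.

This cell rewrites the import line in `classify_video_multimodal.py` so the model name is overridden locally. The patched line still imports from `classify_video`, so `classify_video.py` must also be in the working directory; upload it in Cell 7 if it is missing.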

In [None]:
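# Patch classify_video_multimodal.py so it uses a model that is available in Colab
OLD_IMPORT = 'from classify_video import extract_audio, transcribe_audio, LOCAL_LLM_MODEL'
NEW_IMPORT = 'from classify_video import extract_audio, transcribe_audio\nLOCAL_LLM_MODEL = "llama2:13b"'

with open('classify_video_multimodal.py', 'r', encoding='utf-8') as f:
    content = f.read()

if OLD_IMPORT in content:
    # Replace the imported DeepSeek model name with a local Llama2 override
    content = content.replace(OLD_IMPORT, NEW_IMPORT)
    with open('classify_video_multimodal.py', 'w', encoding='utf-8') as f:
        f.write(content)
    print("✅ Updated model name to llama2:13b")
    print("   (Scripts now use Colab-compatible model)")
elif NEW_IMPORT in content:
    print("✅ Model name already set to llama2:13b")
else:
    # Do not report success when nothing was changed
    print("⚠️ Import line not found; classify_video_multimodal.py was left unchanged")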

---
## PART 3: Process Videos
---

### Cell 9: Upload Video to Classify

Upload a video file (mp4, avi, mov, etc.)

In [None]:
from google.colab import files

print("📤 Upload a video file to classify:")
uploaded_video = files.upload()

video_filename = list(uploaded_video.keys())[0]
print(f"\n✅ Video uploaded: {video_filename}")
print(f"   Size: {len(uploaded_video[video_filename]) / 1024 / 1024:.1f} MB")

### Cell 10: Run Classification with Vision

This runs the full multimodal pipeline:
1. Extract audio
2. Transcribe with Whisper
3. Extract 15 frames
4. OCR text from frames
5. Vision analysis with LLaVA
6. Classify with Llama2

**Time:** ~30-60 seconds per video on an A100, ~60-90 seconds on a T4 (much faster than your Mac!)
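
Step 5 sends frames to LLaVA through Ollama. The uploaded `analyze_frame_content.py` is not shown in this notebook, so the following is only a sketch of how a request body for Ollama's `/api/generate` endpoint can be built with one image attached. The function name `build_llava_request` and the prompt text are hypothetical:

```python
import base64

def build_llava_request(image_bytes, prompt, model="llava:7b"):
    """Build the JSON body for POST /api/generate with a single image attached."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for one complete JSON reply instead of a token stream
        "stream": False,
    }

# Placeholder bytes stand in for a real frame read from disk
body = build_llava_request(b"fake-image-bytes", "Describe what is visible in this frame.")
print(sorted(body.keys()))

# Sending it (requires the server from Cell 5 and a real image):
#   import requests
#   reply = requests.post("http://localhost:11434/api/generate", json=body, timeout=120)
#   print(reply.json()["response"])
```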

In [None]:
# Get the uploaded video filename
video_file = video_filename

print(f"🎬 Processing: {video_file}")
print("="*80)

# Run classification
!python3 classify_video_multimodal.py "{video_file}" --language ar --frames 15 --strategy sections

print("\n✅ Classification complete!")

### Cell 11: View Results

Display the classification results
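
The display cell reads several fields from the result JSON and raises a `KeyError` if one is absent. A minimal sketch of a pre-check, assuming the field names used by that cell; `missing_keys` is a hypothetical helper:

```python
REQUIRED_KEYS = (
    "video_name", "category", "tags", "confidence", "reasoning",
    "transcript_length", "ocr_length", "vision_length", "num_frames_analyzed",
)

def missing_keys(result, required=REQUIRED_KEYS):
    """Return the required field names that are absent from a result dict."""
    return [key for key in required if key not in result]

# Example with an incomplete, made-up result
partial = {"video_name": "clip.mp4", "category": "example"}
print(missing_keys(partial))
```

An empty list means the result can be printed safely.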

In [None]:
import json
import os

# Find the result file
result_files = [f for f in os.listdir('.') if f.endswith('_multimodal.json')]

if result_files:
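    # Use the most recently written result in case earlier runs left files behind
    result_file = max(result_files, key=os.path.getmtime)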
    
    with open(result_file, 'r', encoding='utf-8') as f:
        result = json.load(f)
    
    print("="*80)
    print("CLASSIFICATION RESULTS")
    print("="*80)
    print(f"\n📁 Video: {result['video_name']}")
    print(f"📂 Category: {result['category']}")
    print(f"🏷️  Tags: {', '.join(result['tags'])}")
    print(f"📊 Confidence: {result['confidence']}")
    print(f"\n💭 Reasoning:\n{result['reasoning']}")
    
    if result.get('visual_evidence'):
        print(f"\n👁️  Visual Evidence:")
        for evidence in result['visual_evidence']:
            print(f"   - {evidence}")
    
    print(f"\n📈 Processing Stats:")
    print(f"   Audio: {result['transcript_length']} chars")
    print(f"   OCR: {result['ocr_length']} chars")
    print(f"   Vision: {result['vision_length']} chars ({result['num_frames_analyzed']} frames)")
    print("="*80)
else:
    print("❌ No result file found")

### Cell 12: Download Results

Download the JSON file with full classification data

In [None]:
from google.colab import files
import os

# Find and download result file
result_files = [f for f in os.listdir('.') if f.endswith('_multimodal.json')]

if result_files:
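    # Download the most recently written result in case earlier runs left files behind
    result_file = max(result_files, key=os.path.getmtime)
    print(f"📥 Downloading: {result_file}")
    files.download(result_file)
    print("✅ Downloaded! Check your Mac's Downloads folder")
else:
    print("❌ No result file found")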

---
## BONUS: Batch Processing
---

### Cell 13: Process Multiple Videos

Upload and process 5-10 videos at once
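
Once a batch has run, a quick tally by category gives an overview before you open the full JSON. A minimal sketch, assuming each result dict has a `category` field as in Cell 11; `summarize_categories` is a hypothetical helper and the category names below are placeholders:

```python
from collections import Counter

def summarize_categories(results):
    """Count how many result dicts fall into each `category` value."""
    return Counter(result.get("category", "unknown") for result in results)

# Made-up results for illustration only
sample_results = [
    {"category": "category_a"},
    {"category": "category_b"},
    {"category": "category_a"},
]
for category, count in summarize_categories(sample_results).most_common():
    print(f"{category}: {count}")
```

In the notebook, the `results` list built by the batch cell can be passed in directly.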

In [None]:
from google.colab import files
import os
import json

print("📤 Upload multiple videos (5-10 recommended):")
uploaded_batch = files.upload()

# Match extensions case-insensitively so files like CLIP.MP4 are included
video_files = [f for f in uploaded_batch.keys() if f.lower().endswith(('.mp4', '.avi', '.mov', '.mkv'))]
print(f"\n✅ Uploaded {len(video_files)} videos\n")

results = []

for i, video_file in enumerate(video_files, 1):
    print(f"\n{'='*80}")
    print(f"Processing {i}/{len(video_files)}: {video_file}")
    print(f"{'='*80}\n")
    
    # Run classification
    !python3 classify_video_multimodal.py "{video_file}" --language ar --frames 15 --strategy sections
    
    # Load result
    result_file = video_file.rsplit('.', 1)[0] + '_multimodal.json'
    if os.path.exists(result_file):
        with open(result_file, 'r', encoding='utf-8') as f:
            results.append(json.load(f))
        print(f"✅ {video_file} classified")
    else:
        print(f"⚠️ {video_file} - no result file")

# Save combined results
with open('batch_results.json', 'w', encoding='utf-8') as f:
    json.dump(results, f, indent=2, ensure_ascii=False)

print(f"\n{'='*80}")
print(f"✅ Batch processing complete!")
print(f"   Processed: {len(results)}/{len(video_files)} videos")
print(f"{'='*80}\n")

# Download batch results
print("📥 Downloading batch results...")
files.download('batch_results.json')
print("✅ Done! Check your Downloads folder")

---
## Tips & Troubleshooting
---

### Session Management
- **Session timeout**: ~90 minutes idle (Google One: longer)
- **Keep alive**: Interact with the page occasionally
- **Download results**: Always download before closing!

### If Something Breaks
- **Ollama not responding**: Re-run Cell 5 (start server)
- **Model not found**: Re-run Cell 6 (download models)
- **Import error**: Re-run Cells 7 and 8 (upload scripts, then patch the model name)

### Performance
- **With Google One**: A100 GPU = ~30-60s per video
- **Free tier**: T4 GPU = ~60-90s per video
- **Batch processing**: ~50 videos per hour (about 72 s per video, within the T4 range above)
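
The per-video times translate into hourly throughput with simple arithmetic. A minimal sketch, assuming videos are processed one after another and ignoring upload time; `videos_per_hour` is a hypothetical helper:

```python
def videos_per_hour(seconds_per_video):
    """Convert a per-video processing time in seconds into videos per hour."""
    if seconds_per_video <= 0:
        raise ValueError("seconds_per_video must be positive")
    return 3600.0 / seconds_per_video

# Range endpoints taken from the performance figures in this section
for label, seconds in [("A100 fast", 30), ("A100 slow / T4 fast", 60), ("T4 slow", 90)]:
    print(f"{label}: {videos_per_hour(seconds):.0f} videos/hour")
```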

### Next Session
When you open this notebook again:
1. Run Cells 1-6 (setup) - takes ~10 min
2. Run Cells 7-8 (upload scripts, then patch the model name) - takes ~10 sec
3. Then jump to Cell 9 (process videos)