# Gaza Video Classifier - Google Colab Edition

Run multimodal video classification (Audio + OCR + Vision) using Google Colab's GPU.

**Benefits:**
- No local resource usage (won't crash your Mac)
- Fast GPU processing with LLaVA vision model
- Google One subscribers get priority GPU access

**Setup Time:** ~5 minutes on the first run of a session; later videos in the same session skip the setup steps

## Step 1: Install Dependencies

In [None]:
# Install required packages
!apt-get update -qq
!apt-get install -y ffmpeg tesseract-ocr tesseract-ocr-ara tesseract-ocr-eng
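# Note: the PyPI package named `whisper` is not OpenAI's speech model; that one is `openai-whisper`
!pip install -q pytesseract pillow requests openai-whisper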

## Step 2: Install Ollama and LLaVA

In [None]:
# Install Ollama
!curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama server in background
import subprocess
import time

ollama_process = subprocess.Popen(['ollama', 'serve'], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
time.sleep(5)  # Wait for server to start

print("✅ Ollama server started")
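
The fixed five-second sleep above can be too short on a slow session. A minimal sketch of polling the server instead, assuming Ollama listens on its default address `http://127.0.0.1:11434`; the helper name `wait_for_http` is hypothetical and not part of Ollama:

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Return True once `url` answers an HTTP request, False after `timeout_s` seconds."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status, so it is up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: wait briefly and try again.
            time.sleep(interval_s)
    return False
```

In the notebook, `wait_for_http("http://127.0.0.1:11434")` could replace the `time.sleep(5)` call, with the success message printed only when it returns `True`.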

In [None]:
# Download models (this will take a few minutes first time)
# LLaVA for vision analysis
!ollama pull llava:7b

# Text model for the classification step.
# The scripts expect DeepSeek; if it is not available in your Ollama setup,
# pull llama2 or mistral instead (any Ollama text model should work).
!ollama pull llama2:13b  # or mistral:7b for faster processing

print("✅ Models downloaded")
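
To confirm that both pulls succeeded before moving on, the running server can be asked which models it holds. This is a sketch that assumes the default Ollama address and its `/api/tags` endpoint, which lists local models under a `models` key; the helper names are hypothetical:

```python
import json
import urllib.request

REQUIRED_MODELS = ["llava:7b", "llama2:13b"]  # the two models pulled above


def missing_models(tags_payload: dict, required: list[str]) -> list[str]:
    """Return the required model names that are absent from an /api/tags payload."""
    installed = {m.get("name") for m in tags_payload.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(base_url: str = "http://127.0.0.1:11434") -> dict:
    """Fetch the list of locally available models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return json.load(resp)
```

Calling `missing_models(fetch_tags(), REQUIRED_MODELS)` should return an empty list when both models are present. If you pulled `mistral:7b` instead, adjust `REQUIRED_MODELS` to match.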

## Step 3: Install Whisper.cpp

In [None]:
# Clone and build whisper.cpp
!git clone https://github.com/ggerganov/whisper.cpp.git
!cd whisper.cpp && make

# Download multilingual model
!cd whisper.cpp && bash ./models/download-ggml-model.sh base

print("✅ Whisper.cpp ready")
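
whisper.cpp has traditionally expected 16 kHz, mono, 16-bit WAV input, so the audio track usually has to be extracted from the video first. The uploaded scripts may already do this; the sketch below only illustrates the conversion with the `ffmpeg` binary installed in Step 1, and the function names are hypothetical:

```python
import subprocess


def ffmpeg_wav_command(video_path: str, wav_path: str) -> list[str]:
    """Build an ffmpeg command that writes 16 kHz mono 16-bit PCM audio."""
    return [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-i", video_path,
        "-vn",                 # drop the video stream
        "-ar", "16000",        # 16 kHz sample rate
        "-ac", "1",            # mono
        "-c:a", "pcm_s16le",   # 16-bit little-endian PCM
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> None:
    """Run the conversion and raise CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_wav_command(video_path, wav_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems with filenames that contain spaces.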

## Step 4: Upload Your Python Scripts

Upload these files from your local machine:
- `analyze_frame_content.py`
- `extract_text_from_video.py`
- `classify_video_multimodal.py`

Run the cell below to upload them through Colab's file picker:

In [None]:
# Upload the three scripts with Colab's file upload feature
# (select all of them at once in the file picker)
from google.colab import files

print("Upload your Python scripts (analyze_frame_content.py, extract_text_from_video.py, classify_video_multimodal.py):")
uploaded = files.upload()

print(f"✅ Uploaded {len(uploaded)} files")
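
The upload count alone does not show whether the right files arrived. A small check, assuming the three script names listed in this step and that uploads land in the current working directory; `missing_scripts` is a hypothetical helper:

```python
from pathlib import Path

REQUIRED_SCRIPTS = [
    "analyze_frame_content.py",
    "extract_text_from_video.py",
    "classify_video_multimodal.py",
]


def missing_scripts(directory: str = ".", required: list[str] = REQUIRED_SCRIPTS) -> list[str]:
    """Return the required script names that are not files in `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]
```

An empty list from `missing_scripts()` means all three scripts are in place. Note that Colab may rename a re-uploaded duplicate (for example by appending a number), in which case the original name still counts as present.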

## Step 5: Upload Video to Classify

In [None]:
from google.colab import files

print("Upload your video file:")
uploaded_videos = files.upload()

video_filename = list(uploaded_videos.keys())[0]
print(f"✅ Video uploaded: {video_filename}")

## Step 6: Run Multimodal Classification

In [None]:
# Note: Before running, set LOCAL_LLM_MODEL in classify_video_multimodal.py to the
# model pulled in Step 2 (llama2:13b or mistral:7b), since DeepSeek might not be available on Colab

# Run classification with vision analysis
# (the quotes keep filenames that contain spaces intact)
!python3 classify_video_multimodal.py "{video_filename}" --language ar --frames 15 --strategy sections

print("\n✅ Classification complete!")
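
The cell above prints the success message even when the script exits with an error. A hedged alternative is to launch the script with `subprocess` and check its exit code. The flags are the ones used above; the function names are hypothetical, and the sketch assumes the script returns a non-zero exit code on failure:

```python
import subprocess
import sys


def build_classify_command(video_path: str, language: str = "ar",
                           frames: int = 15, strategy: str = "sections") -> list[str]:
    """Build the argument list for classify_video_multimodal.py."""
    return [
        sys.executable, "classify_video_multimodal.py", video_path,
        "--language", language,
        "--frames", str(frames),
        "--strategy", strategy,
    ]


def run_classification(video_path: str, **options) -> bool:
    """Run the classifier and report whether it exited with code 0."""
    completed = subprocess.run(build_classify_command(video_path, **options))
    return completed.returncode == 0
```

With this in place, the completion message can be printed only when `run_classification(video_filename)` returns `True`.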

## Step 7: Download Results

In [None]:
from google.colab import files
import os

# Find the output JSON files and pick the most recently written one
json_files = [f for f in os.listdir('.') if f.endswith('_multimodal.json')]

if json_files:
    result_file = max(json_files, key=os.path.getmtime)
    print(f"Downloading: {result_file}")
    files.download(result_file)
    print("✅ Results downloaded!")
else:
    print("❌ No result file found")

## Optional: Batch Process Multiple Videos

In [None]:
# Upload multiple videos and process them all
from google.colab import files
import os
import json

print("Upload all videos to process:")
uploaded_batch = files.upload()

results = []

for video_file in uploaded_batch.keys():
    stem, ext = os.path.splitext(video_file)
    if ext.lower() in ('.mp4', '.avi', '.mov'):
        print(f"\n{'='*80}")
        print(f"Processing: {video_file}")
        print(f"{'='*80}\n")

        !python3 classify_video_multimodal.py "{video_file}" --language ar --frames 15

        # Load result. Assumes the script names its output <video stem>_multimodal.json
        # for every extension, as it does for .mp4
        result_file = stem + '_multimodal.json'
        if os.path.exists(result_file):
            with open(result_file, 'r', encoding='utf-8') as f:
                results.append(json.load(f))
        else:
            print(f"❌ No result file found for {video_file}")

# Save batch results
with open('batch_results.json', 'w', encoding='utf-8') as f:
    json.dump(results, f, indent=2, ensure_ascii=False)

print(f"\n✅ Processed {len(results)} videos")
print("Downloading batch results...")
files.download('batch_results.json')

## Performance Notes

**With Google One Colab Benefits:**
- GPU: A100 or V100 when available (the GPU assigned can vary by session); much faster than a Mac CPU
- Processing time: ~30-60 seconds per video (vs 90-150s on Mac)
- No crashes or freezing
- Can process 50+ videos in one session
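
Since the GPU assigned to a session is not fixed, it can help to check what the runtime actually has before starting a long batch. A minimal sketch using `nvidia-smi`, which is present on Colab GPU runtimes; `gpu_name` is a hypothetical helper:

```python
import shutil
import subprocess
from typing import Optional


def gpu_name() -> Optional[str]:
    """Return the name of the first NVIDIA GPU, or None if none is visible."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver tools on this runtime
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None
```

If `gpu_name()` returns `None`, the session is probably on a CPU runtime and the runtime type needs to be changed to GPU in Colab's settings.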

**Tips:**
- Keep the session active (it will disconnect after ~90 min idle)
- Download results periodically to avoid losing them
- For large batches, save intermediate results
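
One way to save intermediate results is to append each video's result to a JSON Lines file as soon as it is produced, so a disconnect loses at most the video in progress. This is a sketch, not part of the original scripts; the helper names and the `checkpoint.jsonl` filename are hypothetical:

```python
import json
from pathlib import Path


def append_result(path: str, result: dict) -> None:
    """Append one result as a single JSON line (UTF-8, Arabic text kept readable)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(result, ensure_ascii=False) + "\n")


def load_results(path: str) -> list:
    """Read back every saved result; return an empty list if the file is missing."""
    p = Path(path)
    if not p.exists():
        return []
    with p.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

In the batch loop, calling `append_result("checkpoint.jsonl", result)` right after each result is loaded keeps the file current, and `load_results("checkpoint.jsonl")` recovers everything processed so far.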