# Reddit Video Processor for YouTube - Google Colab

This notebook runs the Reddit video processing pipeline in Google Colab. It handles the setup, dependency installation, and file operations needed to process Reddit videos and upload them to YouTube.

## 1. GPU Setup & Environment Configuration

First, we'll check for GPU availability and configure the environment to take advantage of Colab's T4 GPU.

In [None]:
# Check for GPU availability and optimize for Colab T4 GPU
import torch
import os

# Check if GPU is available
gpu_available = torch.cuda.is_available()
gpu_info = !nvidia-smi -L

if gpu_available:
    gpu_model = torch.cuda.get_device_name(0)
    print(f"🚀 GPU detected: {gpu_model}")
    print(f"GPU Details: {gpu_info[0]}")
    print("CUDA Version:", torch.version.cuda)
    
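    # Set environment variables for GPU acceleration.
    # CUDA is already initialized in this kernel by the calls above, so these
    # mainly affect child processes such as script.py, which inherit them.
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # Use first GPU
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"  # Limit allocator block splitting to reduce fragmentation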

    # Show memory available
    !nvidia-smi --query-gpu=memory.total,memory.free --format=csv
else:
    print("⚠️ No GPU detected. Running in CPU mode.")
    print("For better performance, enable GPU runtime in Runtime → Change runtime type.")
    
# Import common libraries we'll need throughout the notebook (os is already imported above)
from google.colab import files
import shutil
import pathlib
import glob
import json
import yaml
from getpass import getpass

## 2. Setup Environment & Dependencies

This cell handles the complete setup process: cloning the repository, installing dependencies, configuring ImageMagick and FFmpeg, and creating the project directories.

In [None]:
# Clone the GitHub repository
!git clone https://github.com/beenycool/yotuubef
%cd yotuubef

# Install necessary system packages
!apt-get update
!apt-get install -y ffmpeg imagemagick

# Install Python packages with GPU support
!pip install -q yt-dlp moviepy praw numpy pillow google-api-python-client google-auth-oauthlib google-auth-httplib2 psutil google-generativeai elevenlabs joblib
!pip install -q opencv-python-headless torch torchvision --extra-index-url https://download.pytorch.org/whl/cu118
!pip install -q scikit-learn pyyaml

# Relax the ImageMagick security policy (PS, PDF, and XPS coders) so text operations are not blocked
!sed -i 's/rights="none" pattern="PS"/rights="read|write" pattern="PS"/' /etc/ImageMagick-6/policy.xml
!sed -i 's/rights="none" pattern="PDF"/rights="read|write" pattern="PDF"/' /etc/ImageMagick-6/policy.xml
!sed -i 's/rights="none" pattern="XPS"/rights="read|write" pattern="XPS"/' /etc/ImageMagick-6/policy.xml

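# Configure FFMPEG for GPU acceleration if available.
# Note: to our knowledge, ffmpeg does not read FFMPEG_CONFIG_FILE on its own, and
# MOVIEPY_CUDA_ENABLED is not a documented MoviePy setting; both only matter if the
# project's scripts read them.
if gpu_available:
    print("\n🚀 Configuring FFMPEG for GPU acceleration")
    # Write the preferred hardware-acceleration options to a config file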
    ffmpeg_config = '''
hwaccel=cuda
hwaccel_output_format=cuda
'''
    with open('ffmpeg.config', 'w') as f:
        f.write(ffmpeg_config)
    os.environ['FFMPEG_CONFIG_FILE'] = 'ffmpeg.config'
    
    # Set environment variables to optimize for GPU video processing
    os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
    os.environ['MOVIEPY_CUDA_ENABLED'] = '1'

# Create project directories
os.makedirs("music", exist_ok=True)
os.makedirs("temp_processing", exist_ok=True)

# Verify required files exist in cloned repo
required_files = ['script.py', 'video_processor.py', 'config.yaml']
missing_files = [file for file in required_files if not os.path.exists(file)]

if missing_files:
    raise FileNotFoundError(f"Missing required files in repository: {', '.join(missing_files)}")
    
print("✅ Environment setup complete. All required files found.")

## 3. Project Assets & Files

Check for existing project assets (music, watermark, client secrets) and prompt for upload only if needed.

In [None]:
# --- MUSIC FILES ---
# Check if music files already exist in the repository
existing_music_files = glob.glob("music/*.mp3") + glob.glob("music/*.wav")

if existing_music_files:
    print(f"🎵 Found {len(existing_music_files)} music files in the repository:")
    for music_file in existing_music_files:
        print(f"  - {os.path.basename(music_file)}")
    print("\nUsing existing music files from the repository.")
else:
    print("No music files found in repository. Please upload your music files:")
    uploaded_music = files.upload()
    
    # Move music files to the music directory
    for filename in uploaded_music.keys():
        shutil.move(filename, os.path.join("music", filename))
        print(f"Moved {filename} to music folder")

# --- WATERMARK ---
# Check if watermark already exists in the repository
if os.path.exists("watermark.png"):
    print("\n🖼️ Found watermark.png in the repository. Using existing watermark.")
else:
    print("\nwatermark.png not found in repository.")
    print("Please upload your watermark.png file (or cancel the upload dialog to skip):")
    uploaded_watermark = files.upload()
    
    for filename in uploaded_watermark.keys():
        if filename.lower().endswith(".png"):
            # Rename to watermark.png if needed
            if filename != "watermark.png":
                shutil.move(filename, "watermark.png")
                print(f"Uploaded and renamed {filename} to watermark.png")
            else:
                print("Uploaded watermark.png")

# --- GOOGLE CLIENT SECRETS ---
# Look for existing client secrets files in the repository
client_secrets_files = glob.glob("client_secret*.json")

if client_secrets_files:
    client_secrets_file = client_secrets_files[0]
    print(f"\n🔑 Found client secrets file in the repository: {client_secrets_file}")
    print("Using existing client secrets file for YouTube API authentication.")
else:
    print("\nNo client secrets file found in the repository.")
    print("For YouTube uploads, you need to upload your client_secrets.json file.")
    print("\nPlease upload your client_secrets.json file:")
    
    uploaded_secrets = files.upload()
    client_secrets_file = ""
    
    for filename in uploaded_secrets.keys():
        if "client_secret" in filename and filename.endswith(".json"):
            client_secrets_file = filename
            print(f"Saved {filename} for this session")
            break
    
    if not client_secrets_file:
        print("⚠️ WARNING: No valid client_secrets.json uploaded. YouTube uploads disabled.")

## 4. Configure API Keys & Environment Variables

Set up all necessary API keys for Reddit, Gemini, and ElevenLabs.
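
The cell below can optionally read these keys from an uploaded `credentials.json`. Judging by the keys that cell looks up (`reddit`, `gemini`, `elevenlabs`), a file of the following shape should work. This is a sketch with placeholder values; the helper name `build_credentials_template` is hypothetical and not part of the repository.

```python
import json


def build_credentials_template() -> dict:
    """Return a placeholder credentials structure matching the keys the notebook reads."""
    return {
        "reddit": {
            "client_id": "YOUR_REDDIT_CLIENT_ID",
            "client_secret": "YOUR_REDDIT_CLIENT_SECRET",
            "user_agent": "python:VideoBot:v1.5 (by /u/YOUR_USERNAME)",
        },
        "gemini": {"api_key": "YOUR_GEMINI_API_KEY"},
        # Optional: if this section is omitted, the notebook prompts for the key instead
        "elevenlabs": {"api_key": "YOUR_ELEVENLABS_API_KEY"},
    }


# Print the template as JSON; save it as credentials.json and fill in real values
print(json.dumps(build_credentials_template(), indent=2))
```

Keeping this file out of version control is advisable, since it holds secrets in plain text.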

In [None]:
# Function to load credentials from a JSON file
def load_credentials_from_file():
    print("Upload your credentials.json file:")
    try:
        uploaded = files.upload()
        if not uploaded:
            print("No file uploaded, will use manual input.")
            return None
            
        filename = list(uploaded.keys())[0]
        if not filename.endswith('.json'):
            print("Warning: Uploaded file is not a JSON file.")
            return None
            
        with open(filename, 'r') as f:
            creds = json.load(f)
            
        print("Successfully loaded credentials from file.")
        return creds
    except Exception as e:
        print(f"Error loading credentials file: {e}")
        return None

# Ask if user wants to load from file
use_file = input("Do you want to load credentials from a JSON file? (y/n): ").strip().lower() == 'y'
creds = None

if use_file:
    creds = load_credentials_from_file()

# Use values from the uploaded credentials file when present;
# otherwise fall back to environment variables or prompt for input
# Reddit API Credentials
if creds and 'reddit' in creds:
    reddit_client_id = creds['reddit'].get('client_id', '')
    reddit_client_secret = creds['reddit'].get('client_secret', '')
    reddit_user_agent = creds['reddit'].get('user_agent', "python:VideoBot:v1.5 (by /u/YOUR_USERNAME)")
else:
    reddit_client_id = os.environ.get('REDDIT_CLIENT_ID') or input("Enter your Reddit Client ID: ")
    reddit_client_secret = os.environ.get('REDDIT_CLIENT_SECRET') or getpass("Enter your Reddit Client Secret: ")
    reddit_user_agent = os.environ.get('REDDIT_USER_AGENT') or \
                       input("Enter your Reddit User Agent (or press Enter for default): ") or \
                       "python:VideoBot:v1.5 (by /u/YOUR_USERNAME)"

# Gemini API Key
if creds and 'gemini' in creds:
    gemini_api_key = creds['gemini'].get('api_key', '')
else:
    gemini_api_key = os.environ.get('GEMINI_API_KEY') or getpass("Enter your Google Gemini API Key: ")

# ElevenLabs API Key (optional)
if creds and 'elevenlabs' in creds:
    elevenlabs_api_key = creds['elevenlabs'].get('api_key', '')
else:
    elevenlabs_api_key = os.environ.get('ELEVENLABS_API_KEY') or getpass("Enter your ElevenLabs API Key (or press Enter to skip): ")

# Set environment variables
os.environ["REDDIT_CLIENT_ID"] = reddit_client_id
os.environ["REDDIT_CLIENT_SECRET"] = reddit_client_secret
os.environ["REDDIT_USER_AGENT"] = reddit_user_agent
os.environ["GEMINI_API_KEY"] = gemini_api_key
os.environ["GOOGLE_CLIENT_SECRETS_FILE"] = client_secrets_file or ""

if elevenlabs_api_key:
    os.environ["ELEVENLABS_API_KEY"] = elevenlabs_api_key

# Verify we have the essential variables 
missing_vars = []
if not reddit_client_id: missing_vars.append("Reddit Client ID")
if not reddit_client_secret: missing_vars.append("Reddit Client Secret")
if not gemini_api_key: missing_vars.append("Gemini API Key")
if not client_secrets_file: missing_vars.append("Google Client Secrets file")

if missing_vars:
    print(f"\n⚠️ WARNING: Missing essential credentials: {', '.join(missing_vars)}")
    print("Some features may not work correctly.")
else:
    print("\n✅ Environment variables set successfully!")

# For security, print masked versions of credentials
def mask_string(s):
    if not s: return "<not set>"
    return s[:3] + "*" * (len(s) - 6) + s[-3:] if len(s) > 6 else "****"

print("\nCredentials summary:")
print(f"Reddit Client ID: {mask_string(reddit_client_id)}")
print(f"Reddit Client Secret: {mask_string(reddit_client_secret)}")
print(f"Gemini API Key: {mask_string(gemini_api_key)}")
print(f"ElevenLabs API Key: {mask_string(elevenlabs_api_key)}")
print(f"Google Client Secrets: {'<not set>' if not client_secrets_file else 'Loaded from ' + client_secrets_file}")

## 5. Run Script & Download Results

Run the script with subreddits from config.yaml and then download the processed videos if desired.
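
The run cell reads only the top-level `subreddits` key of `config.yaml` and expects it to be a list of names. As a hedged sketch, a check like the following could catch a malformed value before the command is built. `validate_subreddits` is a hypothetical helper, not something the repository provides, and it takes the already-parsed config (for example, the result of `yaml.safe_load`). The subreddit names are illustrative.

```python
def validate_subreddits(config: dict) -> list:
    """Return the subreddit list from a parsed config, or raise ValueError."""
    subreddits = (config or {}).get("subreddits")
    # A bare string such as "videos" would otherwise be treated as a
    # sequence of single characters when the command is assembled
    if isinstance(subreddits, str) or not isinstance(subreddits, (list, tuple)):
        raise ValueError("'subreddits' must be a list of subreddit names")
    # Drop empty entries and surrounding whitespace
    names = [str(name).strip() for name in subreddits if str(name).strip()]
    if not names:
        raise ValueError("No subreddits found in config")
    return names


# Example input, equivalent to a config.yaml containing:
#   subreddits:
#     - videos
#     - funny
print(validate_subreddits({"subreddits": ["videos", "funny"]}))
```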

In [None]:
try:
    # Load config.yaml
    with open('config.yaml') as f:
        config = yaml.safe_load(f)
    
    # Get subreddits from config
    subreddits = config.get('subreddits', [])
    if not subreddits:
        raise ValueError("No subreddits found in config.yaml")
    
    # Set default options
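    max_videos = input("Enter maximum number of videos to process (default: 3): ").strip() or "3"
    if not max_videos.isdigit() or int(max_videos) < 1:
        raise ValueError(f"Maximum number of videos must be a positive integer, got {max_videos!r}")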
    upload_choice = input("Do you want to upload videos to YouTube? (y/n, default: n): ").lower() == 'y'
    skip_upload_flag = "" if upload_choice else "--skip_upload"
    
    # Add GPU acceleration flag if available
    gpu_flag = "--use_gpu" if gpu_available else ""
    
    # Choose batch size based on whether a GPU is available
    batch_size = "16" if gpu_available else "8"  # T4 GPUs typically handle batch size of 16 well
    
    # Build command with all subreddits and GPU options (shell-quote each subreddit name)
    import shlex
    subreddits_str = " ".join(shlex.quote(str(s)) for s in subreddits)
    command = f"python script.py {subreddits_str} --max_videos {max_videos} {skip_upload_flag} {gpu_flag} --batch_size {batch_size}"
    
    # Display GPU info if available
    if gpu_available:
        print(f"\n🚀 Running with GPU acceleration: {gpu_model}")
        print(f"Batch size: {batch_size} (optimized for T4 GPU)")
    
    print(f"\n🔍 Processing videos from subreddits: {', '.join(subreddits)}")
    print(f"Executing: {command}")
    
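    # Monitor GPU usage in the background while processing.
    # Output goes to gpu_usage.log so it does not flood the cell output.
    gpu_monitor = None
    if gpu_available:
        import subprocess
        gpu_log = open("gpu_usage.log", "w")
        gpu_monitor = subprocess.Popen(
            ["nvidia-smi", "-l", "1", "-i", "0"],
            stdout=gpu_log, stderr=subprocess.STDOUT,
        )
    
    # Run the command, then stop the monitor even if the run fails
    try:
        !{command}
    finally:
        if gpu_monitor is not None:
            gpu_monitor.terminate()
            gpu_log.close()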
    
    # Find and download output videos
    output_videos = glob.glob("temp_processing/*_final.mp4")
    
    if output_videos:
        print(f"\n✅ Found {len(output_videos)} processed videos:")
        for video in output_videos:
            print(f"  - {os.path.basename(video)}")
        
        download_choice = input("\nDo you want to download these videos? (y/n): ").lower()
        if download_choice == "y":
            for video in output_videos:
                try:
                    print(f"Downloading {os.path.basename(video)}...")
                    files.download(video)
                except Exception as e:
                    print(f"Error downloading {video}: {e}")
        
        # Offer cleanup option
        cleanup = input("Do you want to clean up temporary files? (y/n): ").lower()
        if cleanup == "y":
            try:
                files_removed = 0
                if os.path.exists("temp_processing"):
                    for file in glob.glob("temp_processing/*"):
                        # Keep final videos; skip subdirectories, which os.remove cannot delete
                        if os.path.isfile(file) and not file.endswith("_final.mp4"):
                            os.remove(file)
                            files_removed += 1
                print(f"🧹 Removed {files_removed} temporary files.")
            except Exception as e:
                print(f"Error during cleanup: {e}")
    else:
        print("\n⚠️ No processed videos found in the temp_processing directory.")
        
except Exception as e:
    print(f"❌ Error: {e}")
    print("If this is a configuration error, ensure config.yaml exists and contains a valid 'subreddits' list.")

## Troubleshooting Guide

If you encounter issues running the script in Colab, here are some common solutions:

1. **Memory Issues**: If you get out-of-memory errors, try:
   - Reduce the batch size to 8 or 4 (a T4 GPU has 16 GB of VRAM)
   - Restart the runtime and rerun with fewer videos
   - Upgrade to Colab Pro for more reliable GPU access

2. **GPU Acceleration**: To enable GPU acceleration:
   - Go to Runtime → Change runtime type → Select GPU as Hardware accelerator
   - If the notebook doesn't detect the T4 GPU, restart the runtime and run the first cell again
   - Note that Colab sometimes assigns different GPU types; check GPU model in first cell output

3. **T4 GPU Optimization Tips**:
   - For faster processing, keep batch size at 16 for T4 GPUs
   - If you get CUDA out-of-memory errors, try adding the `--half_precision` flag to use FP16
   - Monitor GPU memory usage with nvidia-smi to identify bottlenecks

4. **Authentication Issues**: For YouTube API authentication:
   - Make sure your client_secrets.json is properly uploaded
   - Follow the authentication link that appears in the output

5. **CUDA/GPU Issues**:
   - If torch fails to use CUDA, ensure the cu118 build of PyTorch is installed correctly (the first cell prints `torch.version.cuda`)
   - "CUDA unavailable" likely means your Colab runtime isn't set to use GPU
   - For CUDA errors during processing, restart runtime with a fresh allocation
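
The batch-size advice in items 1 and 3 can be automated. The sketch below retries a command with progressively smaller batch sizes until one run exits successfully. It assumes `script.py` accepts `--batch_size` as used in section 5 and signals failure with a non-zero exit code. `run_with_batch_fallback` and its `runner` parameter are hypothetical names introduced for illustration. A non-zero exit is not necessarily an out-of-memory error, and each retry restarts processing from scratch, so treat this as a rough aid.

```python
import subprocess
from typing import Callable, Optional, Sequence


def run_with_batch_fallback(
    base_cmd: Sequence[str],
    batch_sizes: Sequence[int] = (16, 8, 4),
    runner: Callable[[list], int] = subprocess.call,
) -> Optional[int]:
    """Try each batch size in order; return the first whose run exits with code 0."""
    for size in batch_sizes:
        cmd = [*base_cmd, "--batch_size", str(size)]
        print(f"Trying batch size {size}: {' '.join(cmd)}")
        if runner(cmd) == 0:
            return size
    # Every attempt failed
    return None


# Example call (not executed here; the subreddit name is illustrative):
# run_with_batch_fallback(
#     ["python", "script.py", "videos", "--max_videos", "3", "--skip_upload"]
# )
```

Passing the command as a list avoids shell quoting issues, and the injectable `runner` makes the retry logic easy to exercise without a GPU.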