# 🎬 YTautoma - YouTube Shorts Automation

Generate 60-second YouTube Shorts using local AI models.

| Component | Model |
|-----------|-------|
| Story | Gemma 3 (Ollama) |
| Images | Z-Image-Turbo |
| Video | Wan 2.2 |
| Voice | VibeVoice / Edge-TTS |

**Works on**: RunPod, Colab (A100), Lambda Labs

## 1️⃣ System Setup

In [None]:
# Install system dependencies
!apt-get update -qq && apt-get install -y -qq ffmpeg
!ffmpeg -version | head -1
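
The shell check above can also be done from Python, which makes it easy to stop early if the install failed. This is a minimal sketch using only the standard library; the helper name `ffmpeg_version` is made up for illustration and is not part of YTautoma.

```python
import shutil
import subprocess

def ffmpeg_version(binary="ffmpeg"):
    """Return the first line of `<binary> -version`, or None if it is unavailable."""
    path = shutil.which(binary)  # None when the binary is not on PATH
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None

print(ffmpeg_version() or "ffmpeg not found; re-run the install cell above")
```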

In [None]:
# Upgrade PyTorch (run this, then RESTART RUNTIME before next cell)
!pip install -q --upgrade typing_extensions torch torchvision
print('✅ Done! Now restart the runtime (Runtime > Restart runtime) and run the next cell')

In [None]:
# Verify PyTorch (run after restart)
import torch
print(f'PyTorch {torch.__version__} | CUDA: {torch.cuda.is_available()}')

## 2️⃣ Clone & Install

In [None]:
import os

# Auto-detect workspace
WORKSPACE = '/content' if os.path.exists('/content') else '/workspace' if os.path.exists('/workspace') else os.path.expanduser('~')
os.chdir(WORKSPACE)

# Clone repo
!git clone https://github.com/DragonLord1998/YTautoma.git 2>/dev/null || (cd YTautoma && git pull)
os.chdir('YTautoma')
PROJECT_DIR = os.getcwd()
print(f'Project: {PROJECT_DIR}')

In [None]:
# Install Python dependencies
!pip install -q -r requirements.txt
!pip install -q git+https://github.com/huggingface/diffusers
!pip install -q edge-tts easydict

In [None]:
# Install Ollama + Gemma 3
!curl -fsSL https://ollama.com/install.sh | sh

import subprocess, time
subprocess.Popen(['ollama', 'serve'], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
time.sleep(5)

!ollama pull gemma3:4b
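
The fixed `time.sleep(5)` in the cell above may be too short on a slow machine. One alternative is to poll the server until it answers. The sketch below assumes Ollama listens on `http://localhost:11434` (the same URL written to `.env` later in this notebook); `wait_for_server` is a hypothetical helper, not an Ollama or YTautoma API.

```python
import http.client
import time
import urllib.error
import urllib.request

def wait_for_server(url="http://localhost:11434", timeout=30.0, interval=0.5):
    """Poll `url` until it answers an HTTP request or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status code.
            return True
        except (urllib.error.URLError, OSError, http.client.HTTPException):
            # Not reachable yet: wait a little and try again.
            time.sleep(interval)
    return False
```

If you adopt it, call `wait_for_server()` right after `subprocess.Popen([...])` and only run `ollama pull` when it returns `True`.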

In [None]:
# Setup Wan 2.2 (video generation)
!mkdir -p models
!git clone https://github.com/Wan-Video/Wan2.2.git models/Wan2.2 2>/dev/null || echo 'Already cloned'
!pip install -q -r models/Wan2.2/requirements.txt

# Download model (10GB) - uncomment to enable video generation
# !huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir models/Wan2.2-TI2V-5B
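
Since the model download is about 10GB according to the comment above, it can help to check free disk space before uncommenting that line. A minimal standard-library sketch; the function name `has_free_space` and the 10 GiB threshold are illustrative assumptions, and you may want extra headroom for temporary files.

```python
import shutil

def has_free_space(path=".", required_gb=10.0):
    """Return True if the filesystem holding `path` has at least `required_gb` GiB free."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= required_gb

print("Enough space for the Wan 2.2 download:", has_free_space(".", 10.0))
```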

In [None]:
# Setup VibeVoice (high-quality TTS) - optional
!git clone https://github.com/microsoft/VibeVoice.git models/VibeVoice 2>/dev/null || echo 'Already cloned'
!cd models/VibeVoice && pip install -q -e .

In [None]:
# Create .env configuration (run from the YTautoma project directory)
import os
PROJECT_DIR = os.getcwd()

env_content = f"""OLLAMA_MODEL=gemma3:4b
OLLAMA_BASE_URL=http://localhost:11434

ZIMAGE_MODEL=Tongyi-MAI/Z-Image-Turbo
ZIMAGE_DEVICE=cuda

WAN_REPO_PATH={PROJECT_DIR}/models/Wan2.2
WAN_MODEL_PATH={PROJECT_DIR}/models/Wan2.2-TI2V-5B
WAN_T5_CPU=true
WAN_OFFLOAD_MODEL=true

TTS_ENGINE=edge
VIBEVOICE_REPO_PATH={PROJECT_DIR}/models/VibeVoice

LOW_VRAM_MODE=true
TORCH_DTYPE=float16
"""

with open('.env', 'w') as f:
    f.write(env_content)

print('✅ Configuration saved!')
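
To sanity-check the file just written, the `KEY=VALUE` lines can be read back into a dictionary. This is only a rough sketch: how YTautoma itself loads `.env` is not shown in this notebook, and `parse_env` is a hypothetical helper that handles only the simple format used above (no quoting or variable expansion).

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first '=' only
        settings[key.strip()] = value.strip()
    return settings

# A short sample in the same format as the .env written above.
sample = """OLLAMA_MODEL=gemma3:4b
OLLAMA_BASE_URL=http://localhost:11434

TTS_ENGINE=edge
"""
settings = parse_env(sample)
print(settings)
```

In the notebook you could pass `open('.env').read()` instead of `sample`.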

## 3️⃣ Generate YouTube Short

In [None]:
# Quick test: Story only
!python main.py --story-only -c mystery

In [None]:
# Generate story + images
!python main.py --images-only -c horror

In [None]:
# Full pipeline (requires Wan 2.2 model download)
# !python main.py -c sci-fi
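
The three invocations above differ only in an optional stage flag and the category passed to `-c`. If you want to script several runs, the command line can be assembled in Python. The sketch below uses only the flags shown in this notebook (`--story-only`, `--images-only`, `-c`); `build_command` is a hypothetical helper, and any other options of `main.py` are not assumed.

```python
import sys

def build_command(category, stage=None):
    """Build the argument list for main.py; stage is None (full pipeline), 'story', or 'images'."""
    cmd = [sys.executable, "main.py"]
    if stage is not None:
        if stage not in ("story", "images"):
            raise ValueError(f"unknown stage: {stage!r}")
        cmd.append(f"--{stage}-only")
    cmd += ["-c", category]
    return cmd

print(build_command("mystery", "story")[1:])
```

A run could then be started with `subprocess.run(build_command("horror", "images"), check=True)` from the project directory.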

## 4️⃣ View & Download

In [None]:
# View generated images
from IPython.display import Image, display
import glob

images = sorted(glob.glob('output/**/base_image.png', recursive=True))[-6:]
for img in images:
    print(img.split('/')[-2])
    display(Image(filename=img, width=250))

In [None]:
# Download outputs (zip + browser download on Colab; otherwise list local files)

try:
    from google.colab import files
    !cd output && zip -r ../output.zip .
    files.download('output.zip')
except ImportError:
    print('Find outputs at: ./output/')
    !find output -name '*.mp4' -o -name '*.png' | head -20
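
Outside Colab, for example on RunPod or Lambda Labs, you may still want a single archive to copy off the machine. A minimal sketch using the standard library instead of the `zip` command; `archive_outputs` is a hypothetical helper and the `output` directory name follows the cells above.

```python
import shutil
from pathlib import Path

def archive_outputs(output_dir="output", archive_stem="output"):
    """Zip `output_dir` into `<archive_stem>.zip`; return the archive path, or None if the directory is missing."""
    if not Path(output_dir).is_dir():
        return None
    # make_archive appends the .zip extension and returns the archive's path.
    return shutil.make_archive(archive_stem, "zip", root_dir=output_dir)
```

Calling `archive_outputs()` from the project directory would write `output.zip` next to the `output/` folder, which you can then fetch with `scp` or the provider's file browser.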