# 🎬 YTautoma - YouTube Shorts Automation

Generate 60-second YouTube Shorts using local AI models:
- **Story**: Gemma 3 (via Ollama)
- **Images**: Z-Image-Turbo
- **Video**: Wan 2.2
- **Voice**: Edge-TTS (simple, uses Microsoft's online voices) or VibeVoice (advanced, runs locally)

**Works on**: Colab (A100), RunPod, Lambda Labs

## 1️⃣ Clone & Install

In [None]:
import os

# Auto-detect workspace
if os.path.exists('/content'):
    WORKSPACE = '/content'
elif os.path.exists('/workspace'):
    WORKSPACE = '/workspace'
else:
    WORKSPACE = os.path.expanduser('~')

os.chdir(WORKSPACE)
print(f'Workspace: {WORKSPACE}')

# Clone repo
!git clone https://github.com/DragonLord1998/YTautoma.git 2>/dev/null || (cd YTautoma && git pull)
os.chdir('YTautoma')
PROJECT_DIR = os.getcwd()
print(f'Project: {PROJECT_DIR}')

In [None]:
# Install Python dependencies
!pip install -q -r requirements.txt
!pip install -q git+https://github.com/huggingface/diffusers
!pip install -q edge-tts  # Lightweight TTS; calls Microsoft's online service, so it needs internet access
!pip install -q flash-attn --no-build-isolation 2>/dev/null || echo 'flash-attn optional'
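
The full pipeline needs a large GPU, so it helps to confirm which one the session was given before going further. A minimal sketch, assuming PyTorch is installed (this notebook never installs it explicitly, though diffusers depends on it); `gpu_summary` is a hypothetical helper, not part of YTautoma:

```python
from typing import Optional, Tuple


def gpu_summary() -> Optional[Tuple[str, float]]:
    """Return (GPU name, total VRAM in GiB) for CUDA device 0, or None if unavailable."""
    try:
        import torch  # assumption: pulled in by requirements.txt / diffusers
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(0)
    return props.name, props.total_memory / 1024**3


info = gpu_summary()
if info:
    print(f'GPU: {info[0]} ({info[1]:.1f} GiB VRAM)')
else:
    print('No CUDA GPU detected; the image and video steps expect one.')
```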

In [None]:
# Install Ollama + Pull Model
!curl -fsSL https://ollama.com/install.sh | sh

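import subprocess, time, urllib.request
subprocess.Popen(['ollama', 'serve'], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

# Poll the API (up to ~30 s) instead of a fixed sleep, so `ollama pull` doesn't race server start-up
for _ in range(30):
    try:
        urllib.request.urlopen('http://localhost:11434', timeout=1)
        break
    except OSError:
        time.sleep(1)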

!ollama pull gemma3:4b
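
To confirm the pull succeeded, you can ask the server which models it has. This sketch queries Ollama's `/api/tags` endpoint at the same `OLLAMA_BASE_URL` written to `.env` below; the helper names are hypothetical and not part of YTautoma:

```python
import json
import urllib.request

OLLAMA_BASE_URL = 'http://localhost:11434'  # same value as OLLAMA_BASE_URL in .env
MODEL = 'gemma3:4b'


def model_names(tags_payload: dict) -> list:
    """Extract model names (e.g. 'gemma3:4b') from an /api/tags JSON response."""
    return [m.get('name', '') for m in tags_payload.get('models', [])]


def has_model(tags_payload: dict, model: str) -> bool:
    """True if `model` is among the locally pulled models."""
    return model in model_names(tags_payload)


def fetch_tags(base_url: str = OLLAMA_BASE_URL) -> dict:
    """GET /api/tags from a running Ollama server and decode the JSON body."""
    with urllib.request.urlopen(f'{base_url}/api/tags', timeout=5) as resp:
        return json.load(resp)


try:
    tags = fetch_tags()
    print('Model ready' if has_model(tags, MODEL) else f'{MODEL} missing; re-run ollama pull')
except OSError as exc:  # URLError and timeouts are OSError subclasses
    print(f'Ollama server not reachable at {OLLAMA_BASE_URL}: {exc}')
```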

In [None]:
# Setup Wan 2.2 (for video generation)
!mkdir -p models
!git clone https://github.com/Wan-Video/Wan2.2.git models/Wan2.2 2>/dev/null || echo 'Already exists'
!pip install -q -r models/Wan2.2/requirements.txt

# Download TI2V-5B model (smaller, works on most GPUs)
!pip install -q "huggingface_hub[cli]"
!huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir models/Wan2.2-TI2V-5B

In [None]:
# Create .env configuration (run from the YTautoma directory set up in step 1)
import os
PROJECT_DIR = os.getcwd()

env_content = f"""OLLAMA_MODEL=gemma3:4b
OLLAMA_BASE_URL=http://localhost:11434

ZIMAGE_MODEL=Tongyi-MAI/Z-Image-Turbo
ZIMAGE_DEVICE=cuda

WAN_REPO_PATH={PROJECT_DIR}/models/Wan2.2
WAN_MODEL_PATH={PROJECT_DIR}/models/Wan2.2-TI2V-5B
WAN_T5_CPU=true
WAN_OFFLOAD_MODEL=true

# Using edge-tts instead of VibeVoice (simpler setup, but needs internet access)
TTS_ENGINE=edge

LOW_VRAM_MODE=true
TORCH_DTYPE=float16
"""

with open('.env', 'w') as f:
    f.write(env_content)

print('✅ Configuration saved!')
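
Before running the pipeline, a quick read-back of `.env` catches a wrong working directory, since the Wan paths are built from `os.getcwd()`. This hand-rolled parser is a minimal sketch that only understands the plain `KEY=VALUE` lines written above (no quoting or `export`); how `main.py` itself loads `.env` is not shown here, and both helpers are hypothetical:

```python
import os


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, value = line.split('=', 1)  # split once so values may contain '='
        config[key.strip()] = value.strip()
    return config


def missing_paths(config: dict, keys=('WAN_REPO_PATH', 'WAN_MODEL_PATH')) -> list:
    """Return the path-valued keys whose directories do not exist."""
    return [k for k in keys if k in config and not os.path.isdir(config[k])]


if os.path.exists('.env'):
    with open('.env') as f:
        cfg = parse_env(f.read())
    print('Missing model dirs:', missing_paths(cfg) or 'none')
```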

## 2️⃣ Generate YouTube Short

In [None]:
# Quick test: Story only (no GPU needed)
!python main.py --story-only -c mystery

In [None]:
# Generate story + images (skip video for speed)
!python main.py --images-only -c horror

In [None]:
# Full pipeline with video (requires A100 or better)
!python main.py -c sci-fi
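
To generate several stories in one go, you can loop over categories from Python. This sketch only uses the flags shown above (`--story-only`, `--images-only`, `-c`); `build_command` and `run_batch` are hypothetical helpers, and categories other than `mystery`, `horror` and `sci-fi` are not confirmed to exist:

```python
import subprocess
import sys

# Modes map to the main.py flags used in the cells above; 'full' adds no flag.
MODE_FLAGS = {'story': ['--story-only'], 'images': ['--images-only'], 'full': []}


def build_command(category: str, mode: str = 'story') -> list:
    """Build the argv for one main.py run."""
    if mode not in MODE_FLAGS:
        raise ValueError(f'unknown mode: {mode!r}')
    return [sys.executable, 'main.py', *MODE_FLAGS[mode], '-c', category]


def run_batch(categories, mode: str = 'story') -> None:
    """Run main.py once per category, stopping at the first failure."""
    for category in categories:
        cmd = build_command(category, mode)
        print('Running:', ' '.join(cmd))
        subprocess.run(cmd, check=True)
```

For example, `run_batch(['mystery', 'horror', 'sci-fi'])` runs the story-only step once per category. `check=True` makes a failed run raise `CalledProcessError`, so the loop stops instead of silently moving on.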

## 3️⃣ View & Download

In [None]:
# List outputs
!find output -name '*.mp4' -o -name '*.png' -o -name '*.json' | head -30

In [None]:
# View generated images
from IPython.display import Image, display
import glob

images = sorted(glob.glob('output/**/base_image.png', recursive=True))
for img in images[:6]:
    print(img)
    display(Image(filename=img, width=300))
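
The newest video can also be previewed inline instead of downloaded. A sketch assuming the `.mp4` files land somewhere under `output/`, as the `find` above suggests; `latest_file` is a hypothetical helper, and `embed=True` base64-inlines the video into the notebook, which is fine for a 60-second Short but does grow the notebook file:

```python
import glob
import os
from typing import Optional


def latest_file(pattern: str) -> Optional[str]:
    """Return the most recently modified file matching a recursive glob, or None."""
    matches = glob.glob(pattern, recursive=True)
    return max(matches, key=os.path.getmtime) if matches else None


latest_video = latest_file('output/**/*.mp4')
if latest_video:
    from IPython.display import Video, display
    print(latest_video)
    display(Video(filename=latest_video, embed=True, width=300))
else:
    print('No video yet. Run full pipeline first!')
```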

In [None]:
# Download video (Colab only)
import os, glob

try:
    from google.colab import files
    videos = glob.glob('output/**/*.mp4', recursive=True)
    if videos:
        latest = max(videos, key=os.path.getmtime)
        print(f'Downloading: {latest}')
        files.download(latest)
    else:
        print('No video yet. Run full pipeline first!')
except ImportError:
    print('Not in Colab. Videos at:')
    !find output -name '*.mp4'
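
Outside Colab there is no `files.download`, and even inside it one archive is easier to grab than many files. A minimal sketch that zips the whole `output/` folder with `shutil.make_archive`; `archive_outputs` and the `ytautoma_output` archive name are hypothetical:

```python
import os
import shutil
from typing import Optional


def archive_outputs(src_dir: str = 'output', base_name: str = 'ytautoma_output') -> Optional[str]:
    """Zip everything under src_dir; return the archive path, or None if src_dir is missing."""
    if not os.path.isdir(src_dir):
        return None
    # The archive is written next to (not inside) src_dir, so reruns don't zip old zips.
    return shutil.make_archive(base_name, 'zip', root_dir=src_dir)


zip_path = archive_outputs()
if zip_path:
    print(f'Archive: {zip_path}')
    try:
        from google.colab import files
        files.download(zip_path)
    except ImportError:
        pass  # outside Colab, fetch the zip through your provider's file browser
```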