<a href="https://colab.research.google.com/github/Amey-Thakur/ZERO-SHOT-VIDEO-GENERATION/blob/main/Source%20Code/ZERO-SHOT-VIDEO-GENERATION.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1 align="center">🎬 Zero-Shot Video Generation</h1>
<h3 align="center"><i>Text-to-Video Synthesis via Temporal Latent Warping & Cross-Frame Attention</i></h3>

<div align="center">

| **Author** | **Profiles** |
|:---:|:---|
| **Amey Thakur** | [![GitHub](https://img.shields.io/badge/GitHub-Amey--Thakur-181717?logo=github)](https://github.com/Amey-Thakur) [![ORCID](https://img.shields.io/badge/ORCID-0000--0001--5644--1575-A6CE39?logo=orcid)](https://orcid.org/0000-0001-5644-1575) [![Google Scholar](https://img.shields.io/badge/Google_Scholar-Amey_Thakur-4285F4?logo=google-scholar&logoColor=white)](https://scholar.google.ca/citations?user=0inooPgAAAAJ&hl=en) [![Kaggle](https://img.shields.io/badge/Kaggle-Amey_Thakur-20BEFF?logo=kaggle)](https://www.kaggle.com/ameythakur20) |

---

**Research Foundation:** Based on [Text2Video-Zero](https://arxiv.org/abs/2303.13439) by the Picsart AI Research (PAIR) team.

🚀 **Live Demo:** [Hugging Face Space](https://huggingface.co/spaces/AmeyThakur/ZERO-SHOT-VIDEO-GENERATION) | 🎬 **Video Demo:** [YouTube](https://youtu.be/za9hId6UPoY) | 💻 **Repository:** [GitHub](https://github.com/Amey-Thakur/ZERO-SHOT-VIDEO-GENERATION)

</div>

## 📖 Introduction

> **Zero-shot video generation produces temporally consistent videos from text prompts without any video-specific training.**

This implementation demonstrates the **Text2Video-Zero** framework, which transforms pre-trained Text-to-Image diffusion models into video generators by leveraging temporal latent warping and global cross-frame attention.
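
To make the cross-frame attention idea concrete, here is a minimal pure-Python sketch. In the Text2Video-Zero paper, each frame's queries attend to the keys and values of the first frame, which ties appearance across the clip. The function names and the `frames` layout below are hypothetical and are not taken from this repository's `model` module.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    output = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        output.append([sum(w * v[i] for w, v in zip(weights, values))
                       for i in range(len(values[0]))])
    return output


def cross_frame_attention(frames):
    """Every frame's queries attend to the keys/values of the FIRST frame.

    `frames` is a list of dicts with "q", "k", "v" token lists
    (a hypothetical layout chosen for illustration).
    """
    anchor = frames[0]
    return [attend(f["q"], anchor["k"], anchor["v"]) for f in frames]


frames = [
    {"q": [[1.0, 0.0]], "k": [[1.0, 0.0]], "v": [[5.0, 7.0]]},
    {"q": [[0.0, 1.0]], "k": [[9.0, 9.0]], "v": [[-1.0, -1.0]]},
]
result = cross_frame_attention(frames)
print(result[1])  # the second frame's own keys and values are never used
```

Because the second frame reads only the first frame's keys and values, its output is built from the anchor frame's content, which is the mechanism that keeps identity and background stable between frames.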

## ☁️ Cloud Environment Setup
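Run this cell to configure the environment. On Google Colab or Kaggle it clones the repository, installs the Python dependencies, and downloads the annotator checkpoints from the Hugging Face Hub. On a local machine it skips those steps and uses the current working directory.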

In [None]:
import os
import sys
import shutil
import subprocess

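try:
    shell = get_ipython()
    IS_COLAB = 'google.colab' in str(shell)
    # Assumption: Kaggle sets this variable to the run mode (e.g. "Interactive"),
    # so test for its presence rather than for the substring "kaggle".
    IS_KAGGLE = bool(os.environ.get("KAGGLE_KERNEL_RUN_TYPE"))
except NameError:
    # get_ipython() is undefined outside IPython/Jupyter.
    IS_COLAB = IS_KAGGLE = False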

PROJECT_NAME = "ZERO-SHOT-VIDEO-GENERATION"
print(f"🌍 Environment: {'Google Colab' if IS_COLAB else ('Kaggle' if IS_KAGGLE else 'Local/Custom')}")

def run_setup():
    if IS_COLAB or IS_KAGGLE:
        WORKDIR = "/content" if IS_COLAB else "/kaggle/working"
        os.chdir(WORKDIR)
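        if not os.path.exists(PROJECT_NAME):
            subprocess.run(["git", "clone", f"https://github.com/Amey-Thakur/{PROJECT_NAME}"], check=True)
        os.chdir(os.path.join(WORKDIR, PROJECT_NAME, "Source Code"))
        print("🛠️ Installing Dependencies...")
        # Install into the kernel's own interpreter; check=True raises if pip fails.
        subprocess.run([sys.executable, "-m", "pip", "install", "-q", "diffusers", "transformers", "accelerate", "einops", "kornia", "imageio", "imageio-ffmpeg", "moviepy", "tomesd", "decord", "safetensors", "huggingface_hub", "ipywidgets"], check=True)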
        
        from huggingface_hub import hf_hub_download
        annotators = {
            "body_pose_model.pth": "lllyasviel/Annotators",
            "hand_pose_model.pth": "lllyasviel/Annotators",
            "dpt_hybrid-midas-501f0c75.pt": "lllyasviel/Annotators",
            "upernet_global_small.pth": "lllyasviel/Annotators"
        }
        os.makedirs("annotator/ckpts", exist_ok=True)
        for f, repo in annotators.items():
            target = os.path.join("annotator/ckpts", f)
            if not os.path.exists(target):
                path = hf_hub_download(repo_id=repo, filename=f)
                shutil.copy(path, target)
    print("✅ Environment Ready.")

run_setup()
# Make the repository's modules (such as model.py) importable from the working directory.
if os.getcwd() not in sys.path:
    sys.path.append(os.getcwd())
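
The platform check in the setup cell can also be written as a pure function, which makes it testable without a live kernel. This is a sketch under assumptions: `detect_platform` and `WORKDIRS` are hypothetical names, and the Kaggle check assumes that Kaggle kernels export a non-empty `KAGGLE_KERNEL_RUN_TYPE` variable.

```python
import os


def detect_platform(shell_repr="", environ=None):
    """Return "colab", "kaggle", or "local".

    shell_repr: str(get_ipython()) when running under IPython, else "".
    environ:    mapping of environment variables (defaults to os.environ).
    """
    environ = os.environ if environ is None else environ
    if "google.colab" in shell_repr:
        return "colab"
    # Assumption: Kaggle kernels set KAGGLE_KERNEL_RUN_TYPE to a non-empty value.
    if environ.get("KAGGLE_KERNEL_RUN_TYPE"):
        return "kaggle"
    return "local"


# Working directories used by the setup cell for each hosted platform.
WORKDIRS = {"colab": "/content", "kaggle": "/kaggle/working"}

platform = detect_platform("", {"KAGGLE_KERNEL_RUN_TYPE": "Interactive"})
print(platform, WORKDIRS.get(platform, os.getcwd()))
```

Passing the shell representation and the environment in as arguments keeps the function free of notebook globals, so the same logic can run in a plain script.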

## 1️⃣ Model Initialization
Initialize the diffusion pipeline and report the available hardware. The cell selects `float16` precision on a CUDA GPU and `float32` on CPU.

In [None]:
import torch
import gc
import warnings
from model import Model, ModelType

warnings.filterwarnings("ignore", category=UserWarning, module="diffusers")
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
print(f"🎯 Device: {device} | Precision: {dtype}")
if device == "cuda":
    vram = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"📟 GPU: {torch.cuda.get_device_name(0)} ({vram:.2f} GB VRAM)")

print("⏳ Loading Pipeline...")
model = Model(device=device, dtype=dtype)

def cleanup_vram():
    """Run garbage collection and release PyTorch's cached GPU memory."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

print("✅ Ready.")
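
As a rough illustration of why the cell above prefers `float16` on a GPU, the sketch below estimates the size of the latent batch for one clip at each precision. It assumes a Stable Diffusion v1-style latent space (4 channels, 8x spatial downsampling), which this notebook does not state. `latent_bytes` is a hypothetical helper, not part of the repository.

```python
def latent_bytes(video_length, resolution, bytes_per_element):
    """Approximate size in bytes of the latent batch for one clip.

    Assumes 4 latent channels and 8x spatial downsampling
    (Stable Diffusion v1-style VAE); adjust for other models.
    """
    side = resolution // 8
    return video_length * 4 * side * side * bytes_per_element


fp16 = latent_bytes(video_length=8, resolution=512, bytes_per_element=2)
fp32 = latent_bytes(video_length=8, resolution=512, bytes_per_element=4)
print(f"float16: {fp16 / 1024:.0f} KiB | float32: {fp32 / 1024:.0f} KiB")
```

The latents themselves are small. Roughly the same factor of two applies to the much larger model weights and intermediate activations, which is where half precision matters most.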

## 2️⃣ Video Generation Studio
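Use the interface below to generate videos. The controls set the prompt, frame count, resolution, motion strength, frame rate, and seed. The Steps slider is displayed, but its value is not passed to `process_text2video` in this cell, so changing it has no effect.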

In [None]:
import ipywidgets as widgets
from IPython.display import display, HTML, clear_output
import base64

EXAMPLES = [
    "an astronaut waving the arm on the moon",
    "a sloth surfing on a wakeboard",
    "a cute cat walking on grass",
    "a horse is galloping on a street",
    "a gorilla dancing on times square"
]

style = HTML("""<style>
    .studio-box { padding: 25px; background-color: #1e1e2e; border-radius: 12px; border: 1px solid #313244; color: #cdd6f4; }
    .gen-btn { background: #89b4fa !important; color: #11111b !important; font-weight: bold !important; height: 45px !important; border-radius: 8px !important; }
    .widget-label { font-weight: 500; color: #a6adc8; }
    .studio-title { font-size: 1.4rem; font-weight: 700; margin-bottom: 15px; color: #f5e0dc; }
</style>""")

presets = widgets.Dropdown(options=[("Select a visual preset...", "")] + [(p, p) for p in EXAMPLES], description='Presets', layout={'width': '100%'})
prompt = widgets.Textarea(value='an astronaut waving the arm on the moon', description='Prompt', layout={'width': '100%', 'height': '80px'})
v_len = widgets.IntSlider(value=8, min=4, max=24, description='Frames', layout={'width': '48%'})
v_res = widgets.Dropdown(options=[256, 512, 768], value=512, description='Resolution', layout={'width': '48%'})
v_mot = widgets.FloatSlider(value=12.0, min=0.0, max=30.0, description='Motion', layout={'width': '48%'})
v_stp = widgets.IntSlider(value=50, min=10, max=100, description='Steps', layout={'width': '48%'})
v_fps = widgets.IntSlider(value=4, min=1, max=12, description='FPS', layout={'width': '48%'})
v_seed = widgets.IntText(value=42, description='Seed', layout={'width': '48%'})
btn = widgets.Button(description='Generate Video', layout={'width': '100%'}); btn.add_class('gen-btn')
out = widgets.Output()

def on_gen(b):
    # Lock the button while a clip is being generated.
    btn.disabled = True
    btn.description = "Generating..."
    with out:
        clear_output()
        print("⏳ Generating...")
        try:
            path = model.process_text2video(prompt=prompt.value, video_length=v_len.value, resolution=v_res.value,
                                           motion_field_strength_x=v_mot.value, motion_field_strength_y=v_mot.value,
                                           seed=v_seed.value, fps=v_fps.value, path="output.mp4")
            # Embed the MP4 as a base64 data URI so it plays inside the notebook.
            with open(path, "rb") as f:
                data = f.read()
            b64 = base64.b64encode(data).decode()
            display(HTML(f'<div style="padding:15px;background:#181825;border-radius:10px;text-align:center;"><video width="100%" controls autoplay loop><source src="data:video/mp4;base64,{b64}" type="video/mp4"></video></div>'))
        except Exception as e:
            print(f"❌ Error: {e}")
        finally:
            # Restore the button and free cached GPU memory even if generation fails.
            btn.disabled = False
            btn.description = "Generate Video"
            cleanup_vram()

presets.observe(lambda c: setattr(prompt, 'value', c['new']) if c['new'] else None, 'value')
btn.on_click(on_gen)

ui = widgets.VBox([widgets.HTML("<div class='studio-title'>🎬 Video Generation Studio</div>"), presets, prompt, 
                  widgets.HBox([v_len, v_res]), widgets.HBox([v_mot, v_stp]), widgets.HBox([v_fps, v_seed]), btn])
ui.add_class('studio-box')
display(style, ui, out)
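
The Frames, FPS, and Motion controls interact in a simple way that a short calculation can show. The sketch below assumes each generated frame is written once at the chosen frame rate, and that the translation applied to frame k grows linearly as `motion_strength * k`, following the linear motion dynamics described in the Text2Video-Zero paper. `clip_summary` is a hypothetical helper, and the units of the offsets depend on how the repository's `model` module builds its motion field.

```python
def clip_summary(video_length, fps, motion_strength):
    """Return (duration_seconds, per_frame_offsets) for a clip.

    Assumes frame 0 is unmoved and frame k is translated by
    motion_strength * k along each axis.
    """
    duration = video_length / fps
    offsets = [motion_strength * k for k in range(video_length)]
    return duration, offsets


duration, offsets = clip_summary(video_length=8, fps=4, motion_strength=12.0)
print(f"{duration:.1f} s clip, final-frame offset {offsets[-1]:.0f}")
```

With the default slider values the clip lasts two seconds. Raising Frames without lowering Motion increases the total displacement of the last frame, which can push content out of view.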

## 📚 References
1. **Text2Video-Zero**: [arXiv:2303.13439](https://arxiv.org/abs/2303.13439)
2. **Dreamlike Photoreal 2.0**: [Hugging Face](https://huggingface.co/dreamlike-art/dreamlike-photoreal-2.0)

---
*Amey Thakur | University of Windsor*