# Ruyi Image-to-Video Generator

## Select A100 GPU

This notebook sets up and runs the Ruyi image-to-video generation model directly from the GitHub repository, bypassing compatibility issues with Hugging Face `pipeline`.
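
Since the heading above asks for an A100 runtime, it can help to confirm which GPU is attached before installing anything. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH (as it usually is on Colab GPU runtimes). The helper name `attached_gpu_name` is hypothetical and not part of the Ruyi repository.

```python
import shutil
import subprocess

def attached_gpu_name():
    """Return the first GPU name reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA tooling found, so probably a CPU runtime
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return None
    names = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return names[0] if names else None

name = attached_gpu_name()
print("GPU:", name if name else "none detected")
```

If the printed name is not the GPU you expected, change the runtime type before running the cells below.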

In [1]:
# Clone the repository and install requirements
!git clone https://github.com/c1t1zen1/Ruyi-Models
%cd Ruyi-Models
!pip install -r requirements.txt

Cloning into 'Ruyi-Models'...
remote: Enumerating objects: 264, done.
remote: Counting objects: 100% (264/264), done.
remote: Compressing objects: 100% (208/208), done.
remote: Total 264 (delta 122), reused 160 (delta 49), pack-reused 0 (from 0)
Receiving objects: 100% (264/264), 6.23 MiB | 8.90 MiB/s, done.
Resolving deltas: 100% (122/122), done.
/content/Ruyi-Models
Collecting tomesd (from -r requirements.txt (line 5))
  Downloading tomesd-0.1.3-py3-none-any.whl.metadata (9.1 kB)
Collecting torchdiffeq (from -r requirements.txt (line 7))
  Downloading torchdiffeq-0.2.5-py3-none-any.whl.metadata (440 bytes)
Collecting torchsde (from -r requirements.txt (line 8))
  Downloading torchsde-0.2.6-py3-none-any.whl.metadata (5.3 kB)
Collecting decord (from -r requirements.txt (line 9))
  Downloading decord-0.6.0-py3-none-manylinux2010_x86_64.whl.metadata (422 bytes)
Collecting datasets (from -r requirements.txt (line 10))
  Downloading datasets-3.3.1-py3-none-any.whl.metadata (19 

## Step 1: Mount Google Drive
This step mounts your Google Drive to access the input images and save the output videos.

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
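
Once Drive is mounted, a quick path check can catch a mistyped image path before the model download starts. This is a minimal sketch using only the standard library. The helper name `check_io_paths` is hypothetical, and the paths in the usage note are placeholders rather than paths from this notebook.

```python
from pathlib import Path

def check_io_paths(image_path, output_base):
    """Report whether the input image exists and create the output folder.

    output_base is the output name without the clip suffix or extension,
    so its parent directory is the folder the clips will be written to.
    Returns (image_found, output_dir).
    """
    image_found = Path(image_path).is_file()
    output_dir = Path(output_base).parent
    output_dir.mkdir(parents=True, exist_ok=True)
    return image_found, output_dir
```

In Colab this might be called as `check_io_paths("/content/drive/MyDrive/input.png", "/content/drive/MyDrive/GeneratedVideos/clip")`, and the first returned value should be `True` before moving on.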


## Step 2: Adjust and Run the Script

Modify and run the script below to generate videos from images. Make sure `start_image_path` and `base_output_name` point to your input image and desired output location. After the model has downloaded and the VAE has finished loading, adjust the camera motion settings.
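
One detail of the script below is easy to miss. The requested `video_length` is rounded down to a multiple of the VAE's `mini_batch_encoder` value, and clips are saved at 24 fps. The sketch below reproduces that arithmetic. The mini-batch size of 8 is only an illustrative assumption, because the real value is read from `pipeline.vae.mini_batch_encoder` at run time, and the function name `adjusted_length` is hypothetical.

```python
def adjusted_length(video_length, mini_batch):
    """Round video_length down to a multiple of mini_batch (1 is kept as-is)."""
    if video_length == 1:
        return 1
    return video_length // mini_batch * mini_batch

FPS = 24  # frame rate the script passes when saving clips

for requested in (120, 125, 1):
    frames = adjusted_length(requested, mini_batch=8)  # 8 is an assumed value
    print(requested, "->", frames, "frames,", round(frames / FPS, 2), "s")
```

A request smaller than the mini-batch size (other than 1) rounds down to 0 frames, so choose `video_length` with that in mind.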

In [5]:
# @title RUYI Model - clip continuation and combine together

# Full Script: Advanced Image-to-Video Generation with Looping, Stitching, and Robust Last Frame Extraction

import os
import torch
from PIL import Image
import numpy as np
from diffusers import (EulerDiscreteScheduler, EulerAncestralDiscreteScheduler,
                       DPMSolverMultistepScheduler, PNDMScheduler, DDIMScheduler)
from omegaconf import OmegaConf
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection, CLIPTextModel, CLIPTokenizer
from safetensors.torch import load_file as load_safetensors
from huggingface_hub import snapshot_download
import ipywidgets as widgets
from IPython.display import display
from moviepy.editor import VideoFileClip, concatenate_videoclips  # For final stitching

from ruyi.data.bucket_sampler import ASPECT_RATIO_512, get_closest_ratio
from ruyi.models.autoencoder_magvit import AutoencoderKLMagvit
from ruyi.models.transformer3d import HunyuanTransformer3DModel
from ruyi.pipeline.pipeline_ruyi_inpaint import RuyiInpaintPipeline
from ruyi.utils.lora_utils import merge_lora, unmerge_lora
from ruyi.utils.utils import get_image_to_video_latent, save_videos_grid

# ------------------------------
# User Settings and Configuration (Defaults)
# ------------------------------
# Input/Output paths
start_image_path = "/content/drive/MyDrive/8k_uhd_photorealistic_cyberpunk_city_wit (3).png" # @param {"type":"string","placeholder":"image source"}
base_output_name = "/content/drive/My Drive/GeneratedVideos/City_Future" # @param {"type":"string","placeholder":"output name"}

# Video settings
video_length = 120 # @param {"type":"number"}
base_resolution     = 512       # Approximate resolution (e.g., 512x512)
video_size          = None      # Override resolution if desired: [height, width]

# Control settings (metadata only; these three values are not read by the code below)
aspect_ratio        = "16:9"
motion              = "auto"
camera_direction    = "auto"

# ------------------------------
# Advanced Sampling Settings
# ------------------------------
steps_default       = 25
cfg_default         = 7.0
scheduler_default   = "DPM++"   # Options: ["DPM++", "Euler", "Euler A", "PNDM", "DDIM"]
seed_default        = "4242"    # (as string)
gpu_offload_steps   = 0         # One of [0, 10, 7, 5, 1]; values later in the list use less GPU memory but run slower

# ------------------------------
# Model Settings
# ------------------------------
config_path         = "config/default.yaml"
model_name          = "Ruyi-Mini-7B"
model_type          = "Inpaint"
model_path          = f"models/{model_name}"
auto_download       = True
auto_update         = True

# ------------------------------
# Precision / FP8 Quantization Settings
# ------------------------------
precision_default   = 'bf16'    # Options: 'bf16' or 'fp32'
fp8_mode_default    = 'none'    # Options: 'none', 'lite', 'strong', 'extreme'
fp8_dtype_default   = 'auto'    # Options: 'auto', 'fp8_e5m2', 'fp8_e4m3fn'

# ------------------------------
# LoRA Settings
# ------------------------------
lora_path           = None
lora_weight         = 1.0

# ------------------------------
# TeaCache Settings (Advanced)
# ------------------------------
tea_cache_enabled_default          = True
tea_cache_threshold_default        = 0.10
tea_cache_skip_start_steps_default = 3
tea_cache_skip_end_steps_default   = 1
tea_cache_offload_cpu_default      = False

# ------------------------------
# Enhance-A-Video Settings (Advanced)
# ------------------------------
enhance_a_video_enabled_default           = True
enhance_a_video_weight_default            = 1.0
enhance_a_video_skip_start_steps_default  = 0
enhance_a_video_skip_end_steps_default    = 0

# ------------------------------
# Looping Settings
# ------------------------------
loops_default = 2  # Number of loops (clips) to generate

# ------------------------------
# Other Settings
# ------------------------------
weight_dtype = torch.bfloat16  # Default (may be changed by user)
device = torch.device("cuda")

# ------------------------------
# Define Helper Functions
# ------------------------------

def try_setup_pipeline(model_path, weight_dtype, config, fp8_quant_mode, fp8_data_type):
    try:
        # Load VAE
        vae = AutoencoderKLMagvit.from_pretrained(model_path, subfolder="vae").to(weight_dtype)
        print("Vae loaded ...")

        # Load Transformer
        transformer_additional_kwargs = OmegaConf.to_container(config['transformer_additional_kwargs'])
        transformer = HunyuanTransformer3DModel.from_pretrained_2d(
            model_path, subfolder="transformer", transformer_additional_kwargs=transformer_additional_kwargs
        ).to(weight_dtype)
        print("Transformer loaded ...")

        # Optionally convert transformer layers to FP8
        if fp8_quant_mode != 'none':
            count_f8 = 0
            # Only 'fp8_e5m2' selects float8_e5m2; 'auto' and 'fp8_e4m3fn' both use float8_e4m3fn.
            fp8_type = torch.float8_e5m2 if fp8_data_type == 'fp8_e5m2' else torch.float8_e4m3fn
            if fp8_quant_mode != 'extreme':
                shape_size = 2816 ** 2
                if fp8_quant_mode == 'strong':
                    shape_size -= 1
                for module in transformer.modules():
                    if module.__class__.__name__ in ["Linear"]:
                        x, y = module.weight.shape
                        if x * y > shape_size:
                            module.to(fp8_type)
                            count_f8 += 1
            else:
                for module in transformer.modules():
                    if len(list(module.modules())) == 1 and list(module.named_parameters()):
                        if module.__class__.__name__ not in ["Embedding", 'LayerNorm', 'Conv2d', 'NonDynamicallyQuantizableLinear']:
                            module.to(fp8_type)
                            count_f8 += 1
            print(f'FP8: {count_f8} layers converted to {fp8_type}')

        # Load CLIP image encoder & processor
        clip_image_encoder = CLIPVisionModelWithProjection.from_pretrained(model_path, subfolder="image_encoder").to(weight_dtype)
        clip_image_processor = CLIPImageProcessor.from_pretrained(model_path, subfolder="image_encoder")

        # Load the default scheduler (replaced later by the user's selection) and create the pipeline
        chosen_scheduler = DDIMScheduler  # default scheduler
        scheduler = chosen_scheduler.from_pretrained(model_path, subfolder="scheduler")
        pipeline = RuyiInpaintPipeline.from_pretrained(
            model_path, vae=vae, transformer=transformer, scheduler=scheduler, torch_dtype=weight_dtype,
            clip_image_encoder=clip_image_encoder, clip_image_processor=clip_image_processor
        )

        # Load embeddings from the safetensors file.
        embeddings = load_safetensors(os.path.join(model_path, "embeddings.safetensors"))
        pipeline.embeddings = embeddings
        print("Pipeline loaded ...")

        return pipeline
    except Exception as e:
        print("[Ruyi] Setup pipeline failed:", e)
        return None

# ------------------------------
# UI Widgets for Advanced Settings
# ------------------------------
# Precision / FP8 settings
precision_dropdown = widgets.Dropdown(
    options=[('BF16', 'bf16'), ('FP32', 'fp32')],
    value=precision_default,
    description='Precision:'
)
fp8_dropdown = widgets.Dropdown(
    options=[('None', 'none'), ('Lite', 'lite'), ('Strong', 'strong'), ('Extreme', 'extreme')],
    value=fp8_mode_default,
    description='FP8 Mode:'
)
fp8_dtype_dropdown = widgets.Dropdown(
    options=[('Auto', 'auto'), ('fp8_e5m2', 'fp8_e5m2'), ('fp8_e4m3fn', 'fp8_e4m3fn')],
    value=fp8_dtype_default,
    description='FP8 Data Type:'
)

# Sampling Settings
steps_slider = widgets.IntSlider(value=steps_default, min=10, max=100, step=5, description='Steps:')
cfg_slider = widgets.FloatSlider(value=cfg_default, min=1.0, max=20.0, step=0.5, description='Guidance Scale:')
scheduler_dropdown = widgets.Dropdown(options=['DPM++', 'Euler', 'Euler A', 'PNDM', 'DDIM'], value=scheduler_default, description='Scheduler:')
seed_text = widgets.Text(value=seed_default, description='Seed:')

# TeaCache Settings
tea_cache_toggle = widgets.Checkbox(value=tea_cache_enabled_default, description="Enable TeaCache")
tea_cache_threshold_slider = widgets.FloatSlider(value=tea_cache_threshold_default, min=0.05, max=0.20, step=0.01, description="TeaCache Threshold")
tea_cache_skip_start_slider = widgets.IntSlider(value=tea_cache_skip_start_steps_default, min=1, max=10, step=1, description="Skip Start Steps")
tea_cache_skip_end_slider = widgets.IntSlider(value=tea_cache_skip_end_steps_default, min=1, max=10, step=1, description="Skip End Steps")
tea_cache_offload_cpu_toggle = widgets.Checkbox(value=tea_cache_offload_cpu_default, description="Offload TeaCache to CPU")

# Enhance-A-Video Settings
enhance_video_toggle = widgets.Checkbox(value=enhance_a_video_enabled_default, description="Enable Enhance-A-Video")
enhance_video_weight_slider = widgets.FloatSlider(value=enhance_a_video_weight_default, min=0.5, max=5.0, step=0.5, description="Enhance Weight")
enhance_video_skip_start_slider = widgets.IntSlider(value=enhance_a_video_skip_start_steps_default, min=0, max=10, step=1, description="Skip Start Steps")
enhance_video_skip_end_slider = widgets.IntSlider(value=enhance_a_video_skip_end_steps_default, min=0, max=10, step=1, description="Skip End Steps")

# Control Embedding Selection (populated after pipeline initialization)
positive_dropdown = widgets.Dropdown(options=[], description='Positive Emb:')
negative_dropdown = widgets.Dropdown(options=[], description='Negative Emb:')

# Loop Count Dropdown
loops_dropdown = widgets.Dropdown(options=[str(i) for i in range(1, 11)], value=str(loops_default), description='Clip Count:')

# Display UI controls
display(precision_dropdown, fp8_dropdown, fp8_dtype_dropdown)
display(steps_slider, cfg_slider, scheduler_dropdown, seed_text)
display(tea_cache_toggle, tea_cache_threshold_slider, tea_cache_skip_start_slider, tea_cache_skip_end_slider, tea_cache_offload_cpu_toggle)
display(enhance_video_toggle, enhance_video_weight_slider, enhance_video_skip_start_slider, enhance_video_skip_end_slider)
display(positive_dropdown, negative_dropdown, loops_dropdown)

# ------------------------------
# Initialize Pipeline and Load Model
# ------------------------------
print("Checking for model updates ...")
repo_id = f"IamCreateAI/{model_name}"
if auto_download and auto_update:
    snapshot_download(repo_id=repo_id, local_dir=model_path)

def initialize_pipeline():
    global pipeline
    pipeline = try_setup_pipeline(model_path, weight_dtype, OmegaConf.load(config_path), fp8_dropdown.value, fp8_dtype_dropdown.value)
    if pipeline is None and auto_download:
        snapshot_download(repo_id=repo_id, local_dir=model_path)
        pipeline = try_setup_pipeline(model_path, weight_dtype, OmegaConf.load(config_path), fp8_dropdown.value, fp8_dtype_dropdown.value)
    if pipeline is None:
        raise FileNotFoundError(f"[Load Model Failed] Please download the model from '{repo_id}' and place it into '{model_path}'.")
    pipeline.enable_model_cpu_offload()
    pipeline.transformer.hidden_cache_size = gpu_offload_steps
    # Map the dropdown label to its diffusers scheduler class.
    scheduler_classes = {
        "DPM++": DPMSolverMultistepScheduler,
        "Euler": EulerDiscreteScheduler,
        "Euler A": EulerAncestralDiscreteScheduler,
        "PNDM": PNDMScheduler,
        "DDIM": DDIMScheduler,
    }
    selected_scheduler = scheduler_dropdown.value
    noise_scheduler = scheduler_classes[selected_scheduler].from_pretrained(model_path, subfolder='scheduler')
    pipeline.scheduler = noise_scheduler
    global generator
    generator = torch.Generator(device).manual_seed(int(seed_text.value))
    pos_options = sorted({ key.split('.')[1] for key in pipeline.embeddings.keys() if key.startswith('p.') })
    neg_options = sorted({ key.split('.')[1] for key in pipeline.embeddings.keys() if key.startswith('n.') })
    positive_dropdown.options = pos_options
    negative_dropdown.options = neg_options
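    # Select the first variant only when one exists; assigning a value that is not
    # in the dropdown's options would raise an error.
    if pos_options:
        positive_dropdown.value = pos_options[0]
    if neg_options:
        negative_dropdown.value = neg_options[0]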
    print("Pipeline initialized.")

initialize_pipeline()

# ------------------------------
# Generation Button
# ------------------------------
generate_button = widgets.Button(description="Generate Videos", button_style="success")
display(generate_button)

# ------------------------------
# Generation Callback Function with Looping and Final Stitching
# ------------------------------
def on_generate_button_clicked(b):
    global pipeline
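    # Note: precision, FP8, and scheduler choices take effect only when the pipeline
    # is built in initialize_pipeline(); changing them here does not reload the model.
    selected_precision = precision_dropdown.value
    global weight_dtype
    if selected_precision == 'fp32':
        weight_dtype = torch.float32
    else:
        weight_dtype = torch.bfloat16

    selected_fp8_mode = fp8_dropdown.value
    selected_fp8_dtype = fp8_dtype_dropdown.value
    selected_steps = steps_slider.value
    selected_cfg = cfg_slider.value
    selected_scheduler = scheduler_dropdown.value
    selected_seed = int(seed_text.value)
    # Re-seed on every click so the Seed field is honored and runs are repeatable.
    generator = torch.Generator(device).manual_seed(selected_seed)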

    tea_cache_enabled_user = tea_cache_toggle.value
    tea_cache_threshold_user = tea_cache_threshold_slider.value
    tea_cache_skip_start = tea_cache_skip_start_slider.value
    tea_cache_skip_end = tea_cache_skip_end_slider.value
    tea_cache_offload_cpu_user = tea_cache_offload_cpu_toggle.value

    enhance_video_enabled_user = enhance_video_toggle.value
    enhance_video_weight_user = enhance_video_weight_slider.value
    enhance_video_skip_start = enhance_video_skip_start_slider.value
    enhance_video_skip_end = enhance_video_skip_end_slider.value

    pos_variant = positive_dropdown.value
    neg_variant = negative_dropdown.value
    print(f"Using positive variant: {pos_variant} and negative variant: {neg_variant}")

    embeddings = pipeline.embeddings
    pos_key = f"p.{pos_variant}"
    positive_embeds = embeddings.get(f"{pos_key}.emb1", embeddings["p.default.emb1"])
    positive_attention_mask = embeddings.get(f"{pos_key}.mask1", embeddings["p.default.mask1"])
    positive_embeds_2 = embeddings.get(f"{pos_key}.emb2", embeddings["p.default.emb2"])
    positive_attention_mask_2 = embeddings.get(f"{pos_key}.mask2", embeddings["p.default.mask2"])

    neg_key = f"n.{neg_variant}"
    negative_embeds = embeddings.get(f"{neg_key}.emb1", embeddings["n.default.emb1"])
    negative_attention_mask = embeddings.get(f"{neg_key}.mask1", embeddings["n.default.mask1"])
    negative_embeds_2 = embeddings.get(f"{neg_key}.emb2", embeddings["n.default.emb2"])
    negative_attention_mask_2 = embeddings.get(f"{neg_key}.mask2", embeddings["n.default.mask2"])

    pipeline.transformer.tea_cache.initialize(
        tea_cache_enabled_user, tea_cache_threshold_user, tea_cache_skip_start, tea_cache_skip_end, selected_steps, tea_cache_offload_cpu_user
    )
    pipeline.transformer.enhance_a_video.initialize(
        enhance_video_enabled_user, enhance_video_weight_user, enhance_video_skip_start, enhance_video_skip_end, selected_steps
    )

    loop_count = int(loops_dropdown.value)
    print(f"Clip count: {loop_count}")

    clip_filenames = []
    current_start_image = Image.open(start_image_path).convert("RGB")

    for i in range(loop_count):
        postfix = f"_{i+1:03d}"
        current_output_video = f"{base_output_name}{postfix}.mp4"
        current_output_png = f"{base_output_name}{postfix}.png"
        print(f"Rendering clip {i+1}: {current_output_video}")

        with torch.no_grad(), torch.autocast(device_type=device.type, dtype=pipeline.transformer.dtype):
            adjusted_video_length = int(video_length // pipeline.vae.mini_batch_encoder * pipeline.vae.mini_batch_encoder) if video_length != 1 else 1
            input_video, input_video_mask, clip_image = get_image_to_video_latent(
                [current_start_image],
                None,
                video_length=adjusted_video_length,
                sample_size=(base_resolution, base_resolution) if video_size is None else video_size
            )

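            # LoRA merging is effectively disabled: the zipped lists are empty,
            # so lora_path and lora_weight from the settings above are not used.
            for _lora_path, _lora_weight in zip([], []):
                pipeline = merge_lora(pipeline, _lora_path, _lora_weight)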

            sample = pipeline(
                prompt_embeds = positive_embeds,
                prompt_attention_mask = positive_attention_mask,
                prompt_embeds_2 = positive_embeds_2,
                prompt_attention_mask_2 = positive_attention_mask_2,

                negative_prompt_embeds = negative_embeds,
                negative_prompt_attention_mask = negative_attention_mask,
                negative_prompt_embeds_2 = negative_embeds_2,
                negative_prompt_attention_mask_2 = negative_attention_mask_2,

                video_length = adjusted_video_length,
                height       = base_resolution if video_size is None else video_size[0],
                width        = base_resolution if video_size is None else video_size[1],
                generator    = generator,
                guidance_scale = selected_cfg,
                num_inference_steps = selected_steps,

                video      = input_video,
                mask_video = input_video_mask,
                clip_image = clip_image,
            ).videos

            for _lora_path, _lora_weight in zip([], []):
                pipeline = unmerge_lora(pipeline, _lora_path, _lora_weight)

        output_folder = os.path.dirname(current_output_video)
        if output_folder != '':
            os.makedirs(output_folder, exist_ok=True)
        save_videos_grid(sample, current_output_video, fps=24)
        print(f"Clip saved to {current_output_video}")

        # --- Extract Last Frame Using MoviePy ---
        try:
            clip = VideoFileClip(current_output_video)
            fps = clip.fps if clip.fps else 24
            last_frame_time = clip.duration - (1.0 / fps)
            if last_frame_time < 0:
                last_frame_time = 0
            print(f"Extracting last frame at time {last_frame_time:.3f} sec from {current_output_video}")
            last_frame = clip.get_frame(last_frame_time)
            print("Last frame shape:", last_frame.shape, "dtype:", last_frame.dtype)
        except Exception as e:
            print("Error extracting last frame with MoviePy:", e)
            continue  # Skip this loop iteration if extraction fails.

        # Convert to uint8 if needed.
        if last_frame.dtype != np.uint8:
            last_frame = (np.clip(last_frame, 0, 1) * 255).astype(np.uint8)
        # Determine mode for PIL.
        if last_frame.ndim == 2:
            mode = 'L'
        elif last_frame.ndim == 3:
            if last_frame.shape[2] == 3:
                mode = 'RGB'
            elif last_frame.shape[2] == 1:
                last_frame = np.squeeze(last_frame, axis=2)
                mode = 'L'
            else:
                mode = None
        else:
            mode = None
        if mode is None:
            raise ValueError(f"Unexpected shape for extracted frame: {last_frame.shape}")

        last_frame_img = Image.fromarray(last_frame, mode=mode)
        last_frame_img.save(current_output_png)
        print(f"Last frame saved to {current_output_png}")

        clip_filenames.append(current_output_video)

        # Set the next start image.
        current_start_image = last_frame_img.copy()

    # ------------------------------
    # Final Stitching: Combine all clips into one master clip.
    # Remove the first frame from clip 2 and onward to avoid repetition.
    # ------------------------------
    final_clips = []
    for idx, clip_file in enumerate(clip_filenames):
        print(f"Processing clip for stitching: {clip_file}")
        clip = VideoFileClip(clip_file)
        if idx > 0:
            clip = clip.subclip(1/24, clip.duration)
        final_clips.append(clip)

    if final_clips:
        final_video = concatenate_videoclips(final_clips, method="compose")
        final_output = f"{base_output_name}x.mp4"
        final_video.write_videofile(final_output, fps=24)
        print(f"Final stitched video saved to {final_output}")
    else:
        print("No clips to stitch.")

# Attach the callback.
generate_button.on_click(on_generate_button_clicked)


Dropdown(description='Precision:', options=(('BF16', 'bf16'), ('FP32', 'fp32')), value='bf16')

Dropdown(description='FP8 Mode:', options=(('None', 'none'), ('Lite', 'lite'), ('Strong', 'strong'), ('Extreme…

Dropdown(description='FP8 Data Type:', options=(('Auto', 'auto'), ('fp8_e5m2', 'fp8_e5m2'), ('fp8_e4m3fn', 'fp…

IntSlider(value=25, description='Steps:', min=10, step=5)

FloatSlider(value=7.0, description='Guidance Scale:', max=20.0, min=1.0, step=0.5)

Dropdown(description='Scheduler:', options=('DPM++', 'Euler', 'Euler A', 'PNDM', 'DDIM'), value='DPM++')

Text(value='4242', description='Seed:')

Checkbox(value=True, description='Enable TeaCache')

FloatSlider(value=0.1, description='TeaCache Threshold', max=0.2, min=0.05, step=0.01)

IntSlider(value=3, description='Skip Start Steps', max=10, min=1)

IntSlider(value=1, description='Skip End Steps', max=10, min=1)

Checkbox(value=False, description='Offload TeaCache to CPU')

Checkbox(value=True, description='Enable Enhance-A-Video')

FloatSlider(value=1.0, description='Enhance Weight', max=5.0, min=0.5, step=0.5)

IntSlider(value=0, description='Skip Start Steps', max=10)

IntSlider(value=0, description='Skip End Steps', max=10)

Dropdown(description='Positive Emb:', options=(), value=None)

Dropdown(description='Negative Emb:', options=(), value=None)

Dropdown(description='Clip Count:', index=1, options=('1', '2', '3', '4', '5', '6', '7', '8', '9', '10'), valu…

Checking for model updates ...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Fetching 12 files:   0%|          | 0/12 [00:00<?, ?it/s]

image_encoder%2Fpreprocessor_config.json:   0%|          | 0.00/316 [00:00<?, ?B/s]

embeddings.safetensors:   0%|          | 0.00/75.0M [00:00<?, ?B/s]

model_index.json:   0%|          | 0.00/213 [00:00<?, ?B/s]

image_encoder%2Fconfig.json:   0%|          | 0.00/4.76k [00:00<?, ?B/s]

.gitattributes:   0%|          | 0.00/1.52k [00:00<?, ?B/s]

scheduler%2Fscheduler_config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/1.71G [00:00<?, ?B/s]

README.md:   0%|          | 0.00/7.72k [00:00<?, ?B/s]

vae%2Fconfig.json:   0%|          | 0.00/804 [00:00<?, ?B/s]

transformer%2Fconfig.json:   0%|          | 0.00/795 [00:00<?, ?B/s]

diffusion_pytorch_model.safetensors:   0%|          | 0.00/14.5G [00:00<?, ?B/s]

diffusion_pytorch_model.safetensors:   0%|          | 0.00/1.06G [00:00<?, ?B/s]

Vae loaded ...
Transformer loaded ...


Loading pipeline components...: 0it [00:00, ?it/s]

Pipeline loaded ...
Pipeline initialized.
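
The `Positive Emb:` and `Negative Emb:` dropdowns, which appear with empty options in the widget output above, are filled at this point from the key names in the embeddings file. The sketch below mirrors that parsing on a stand-in key list. The helper name `variant_names` is hypothetical, and the example keys are illustrative rather than a listing of the real file.

```python
def variant_names(keys, prefix):
    """Collect sorted, de-duplicated variant names from keys like 'p.<name>.emb1'."""
    return sorted({key.split('.')[1] for key in keys if key.startswith(prefix)})

# Stand-in key list for illustration; the real keys come from embeddings.safetensors.
example_keys = [
    "p.default.emb1", "p.default.mask1",
    "p.16x9movie2static.emb1", "p.16x9movie2static.mask1",
    "n.default.emb1", "n.default.mask1",
]

positive = variant_names(example_keys, "p.")
negative = variant_names(example_keys, "n.")
print("Positive options:", positive)
print("Negative options:", negative)
```

Because the names are sorted as plain strings, a variant that starts with a digit is listed before `default` and becomes the initial selection.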


Button(button_style='success', description='Generate Videos', style=ButtonStyle())

Using positive variant: 16x9movie2static and negative variant: default
Clip count: 3
Rendering clip 1: /content/drive/My Drive/GeneratedVideos/City_Future_001.mp4


  deprecate("output_type=='np'", "0.33.0", deprecation_message, standard_warn=False)



  0%|          | 0/75 [00:00<?, ?it/s]
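
For planning longer runs, the files the script writes and the approximate stitched length follow from its naming and trimming logic: each clip gets a three-digit suffix, the stitched file gets an `x` suffix, and the first frame of every clip after the first is dropped. This is a minimal sketch with hypothetical helper names. The value of 120 frames per clip is an assumption that depends on the rounding applied to `video_length`, and exact frame counts after re-encoding may differ slightly because trimming is time-based.

```python
def planned_outputs(base_output_name, clip_count):
    """Return (per-clip video paths, stitched video path) as the script names them."""
    clips = [f"{base_output_name}_{i + 1:03d}.mp4" for i in range(clip_count)]
    return clips, f"{base_output_name}x.mp4"

def stitched_frames(clip_count, frames_per_clip):
    """Approximate total frames when clips 2 onward each lose their first frame."""
    if clip_count == 0:
        return 0
    return clip_count * frames_per_clip - (clip_count - 1)

clips, final = planned_outputs("/content/drive/My Drive/GeneratedVideos/City_Future", 3)
total = stitched_frames(3, 120)  # 120 frames per clip is an assumed value
print(clips)
print(final)
print(total, "frames, about", round(total / 24, 2), "s at 24 fps")
```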