# SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

The model code is available on [GitHub](https://github.com/Stability-AI/generative-models), and the checkpoint is hosted on [Hugging Face](https://huggingface.co/stabilityai/sv3d/tree/main).

The script *simple_video_sample.py* from that repository has been adapted for use in this Jupyter notebook.

## Installation and setup

In [None]:
# !pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Pin torchvision 0.16.0 (pulls in torch 2.1.0) built against CUDA 11.8
!pip install --upgrade torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu118

!pip install transformers einops matplotlib rembg fire omegaconf safetensors imageio
!pip install opencv-python-headless  # headless build avoids GUI-related errors in Colab
!pip install onnxruntime-gpu
!pip install git+https://github.com/openai/CLIP.git
!pip install pytorch_lightning
!pip install kornia
!pip install open-clip-torch

# Reinstall imwatermark from GitHub
!pip uninstall -y imwatermark imhist
!pip install git+https://github.com/guofei9987/imwatermark.git


Looking in indexes: https://download.pytorch.org/whl/cu118
Collecting torchvision==0.16.0
  Downloading https://download.pytorch.org/whl/cu118/torchvision-0.16.0%2Bcu118-cp311-cp311-linux_x86_64.whl (6.2 MB)
Collecting torch==2.1.0 (from torchvision==0.16.0)
  Downloading https://download.pytorch.org/whl/cu118/torch-2.1.0%2Bcu118-cp311-cp311-linux_x86_64.whl (2325.9 MB)
Collecting triton==2.1.0 (from torch==2.1.0->torchvision==0.16.0)
  Downloading https://download.pytorch.org/whl/triton-2.1.0-0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89.2 MB)
Installing collected packages: triton, torch, torchvision
  Attempting 

Collecting onnxruntime-gpu
  Downloading onnxruntime_gpu-1.21.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.8 kB)
Collecting coloredlogs (from onnxruntime-gpu)
  Downloading coloredlogs-15.0.1-py2.py3-none-any.whl.metadata (12 kB)
Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime-gpu)
  Downloading humanfriendly-10.0-py2.py3-none-any.whl.metadata (9.2 kB)
Downloading onnxruntime_gpu-1.21.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (280.8 MB)
Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
Downloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)
Ins

In [None]:
# Clone the StabilityAI repo
!git clone https://github.com/Stability-AI/generative-models
%cd generative-models

Cloning into 'generative-models'...
remote: Enumerating objects: 1119, done.
remote: Counting objects: 100% (554/554), done.
remote: Compressing objects: 100% (169/169), done.
remote: Total 1119 (delta 428), reused 412 (delta 384), pack-reused 565 (from 1)
Receiving objects: 100% (1119/1119), 86.66 MiB | 15.54 MiB/s, done.
Resolving deltas: 100% (589/589), done.
/content/generative-models
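Later cells import from and `sed`-patch files inside this clone, so a quick existence check right after `%cd` can catch a failed or partial clone early. A minimal sketch follows; the helper `missing_files` is hypothetical, and the two paths are simply the ones this notebook touches later.

```python
from pathlib import Path


def missing_files(repo_root, relative_paths):
    """Return the relative paths that do not exist as regular files under repo_root."""
    root = Path(repo_root)
    return [p for p in relative_paths if not (root / p).is_file()]


if __name__ == "__main__":
    # Files that later cells in this notebook import or patch.
    needed = [
        "scripts/sampling/simple_video_sample.py",
        "sgm/inference/helpers.py",
    ]
    missing = missing_files(".", needed)
    print("All expected files present." if not missing else f"Missing: {missing}")
```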


## Load the checkpoint

In [None]:
import os

# Create the checkpoint directory inside the cloned repo
os.makedirs("checkpoints", exist_ok=True)


In [None]:
from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import os

checkpoint_path = "/content/drive/MyDrive/0-Master/TAVA_NVS/sv3d_u.safetensors"

if os.path.exists(checkpoint_path):
    print("Found checkpoint!")
else:
    print("Checkpoint not found. Double-check the path.")


Found checkpoint!
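`os.path.exists` only shows that *something* is at the path; an interrupted upload or an unfinished Drive sync can leave a truncated file. A `.safetensors` file starts with an 8-byte little-endian header length followed by a JSON header, so reading just that header is a cheap sanity check that loads no weights. The helper `read_safetensors_header` below is a hypothetical, standard-library-only sketch based on that layout; it does not verify the tensor data itself.

```python
import json
import os
import struct


def read_safetensors_header(path):
    """Return the tensor entries from a .safetensors header without loading weights.

    Assumed layout (safetensors format): an 8-byte little-endian unsigned
    integer N, then N bytes of UTF-8 JSON mapping tensor names to
    {"dtype", "shape", "data_offsets"}, plus an optional "__metadata__" key.
    """
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError("file too short to be a safetensors file")
        (header_len,) = struct.unpack("<Q", prefix)
        # Guard against garbage lengths before trying to read the header.
        if header_len > file_size - 8:
            raise ValueError("header length exceeds file size (truncated copy?)")
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # drop the optional metadata entry
    return header
```

In this notebook it could be called as `len(read_safetensors_header(checkpoint_path))` to see how many tensors the checkpoint declares.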


In [None]:
import os
import shutil

target_path = "/content/generative-models/checkpoints/sv3d_u.safetensors"

# Create the checkpoints folder if it doesn't exist
os.makedirs(os.path.dirname(target_path), exist_ok=True)

# Copy the checkpoint from Drive into the repo
shutil.copy(checkpoint_path, target_path)

print("Copied to:", target_path)


Copied to: /content/generative-models/checkpoints/sv3d_u.safetensors
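Re-running the copy cell copies the large checkpoint from Drive again every time. One option is to skip the copy when the destination already exists with the same size, and to copy to a temporary name first so an interrupted copy never leaves a truncated file at `target_path`. This is a sketch under those assumptions; `copy_if_changed` is a hypothetical helper, and size equality is a cheap heuristic, not a content check.

```python
import os
import shutil


def copy_if_changed(src, dst):
    """Copy src to dst unless dst already exists with the same size.

    Returns True if a copy was made.
    """
    if os.path.exists(dst) and os.path.getsize(dst) == os.path.getsize(src):
        return False  # assume an identical copy is already in place
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    tmp = dst + ".partial"
    shutil.copyfile(src, tmp)
    os.replace(tmp, dst)  # rename into place only after the copy completes
    return True
```

Usage in this notebook would be `copy_if_changed(checkpoint_path, target_path)`.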


## Load input

## Run the model

In [None]:
# Remove the invisible-watermark step from the sampling code:
# drop the imwatermark import and encoder setup, comment out the watermark call,
# and turn the watermark embedder's __init__ / __call__ into no-ops
!sed -i '/from imwatermark import WatermarkEncoder/d' sgm/inference/helpers.py
!sed -i 's/samples = embed_watermark(samples)/# samples = embed_watermark(samples)/' scripts/sampling/simple_video_sample.py
!sed -i '/self.encoder = WatermarkEncoder()/d' sgm/inference/helpers.py
!sed -i '/self.encoder.set_watermark("bits", self.watermark)/d' sgm/inference/helpers.py
!sed -i 's/    def __init__(self, watermark):/    def __init__(self, watermark):\n        pass/' sgm/inference/helpers.py
!sed -i 's/    def __call__(self, images):/    def __call__(self, images):\n        return images/' sgm/inference/helpers.py
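These `sed` edits are not idempotent: running the cell a second time inserts another `pass` / `return images` line and turns the commented call into `# # samples = ...`. For the simplest edit, commenting out a single line, a pure-Python alternative can skip lines that are already commented. The helpers `comment_out` and `patch_file` are hypothetical; note that commenting out a line can leave an empty block, so this sketch does not replace the structural `__init__` / `__call__` edits.

```python
def comment_out(text, needle):
    """Prefix '# ' to every not-yet-commented line that contains needle."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.lstrip()
        if needle in line and not body.startswith("#"):
            indent = line[: len(line) - len(body)]  # keep original indentation
            line = indent + "# " + body
        out.append(line)
    return "".join(out)


def patch_file(path, needle):
    """Apply comment_out to a file in place; return True if the file changed."""
    with open(path, encoding="utf-8") as f:
        original = f.read()
    patched = comment_out(original, needle)
    if patched != original:
        with open(path, "w", encoding="utf-8") as f:
            f.write(patched)
    return patched != original
```

For example, `patch_file("scripts/sampling/simple_video_sample.py", "samples = embed_watermark(samples)")` returns `False` on a second run instead of nesting comments.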


In [None]:
import sys
sys.path.append('.')  # make the repo root (current directory) importable

# Import the sampling entry point so it can be called directly from the notebook
from scripts.sampling.simple_video_sample import sample




A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.2 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.11/dist-packages/colab_kernel_launcher.py", line 37, in <module>
    ColabKernelApp.launch_instance()
  File "/usr/local/lib/python3.11/dist-packages/traitlets/config/application.py", line 992, in launch_instance
    app.start()
  File "/usr/local/lib/python3.11/dist-packages/ipykernel/kernelapp.py", line 712, in start
    self.io_loop.start()
  File "/usr/local/lib/python3.11/dist-package
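The warning above recommends either downgrading to `numpy<2` or upgrading the affected module. Before deciding, the installed NumPy version can be read from package metadata without importing NumPy (importing the incompatible extension is what triggers the warning). A minimal sketch; `numpy_major` and `needs_numpy_downgrade` are hypothetical helpers. After a downgrade in Colab, the runtime has to be restarted, because the already-imported NumPy stays loaded in the running process.

```python
from importlib.metadata import PackageNotFoundError, version


def numpy_major(dist_version):
    """Major version from a version string such as '2.0.2'."""
    return int(dist_version.split(".")[0])


def needs_numpy_downgrade(dist_version):
    # The warning concerns extensions built against NumPy 1.x being loaded
    # under NumPy 2.x, so any major version >= 2 is flagged.
    return numpy_major(dist_version) >= 2


if __name__ == "__main__":
    try:
        installed = version("numpy")  # reads metadata only; numpy is not imported
    except PackageNotFoundError:
        installed = None
    if installed is None:
        print("NumPy is not installed")
    elif needs_numpy_downgrade(installed):
        print(f"NumPy {installed}: consider pip install 'numpy<2', then restart the runtime")
    else:
        print(f"NumPy {installed}: no downgrade needed")
```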

In [None]:
import torch
torch.cuda.empty_cache()


In [None]:
input_path = "/content/drive/MyDrive/0-Master/TAVA_NVS/0.jpg"

sample(
    input_path=input_path,
    version="sv3d_u",
    device="cpu",            # runs without a GPU, but much more slowly than "cuda"
    decoding_t=1,            # decode 1 frame at a time (lowest decoder memory)
    motion_bucket_id=64,     # motion conditioning value; does not reduce compute
    fps_id=6,                # frame-rate conditioning value
)
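In the call above, `decoding_t` trades decoder memory against the number of decoder passes. Assuming, to our understanding of the upstream script, that `decoding_t` sets how many frames the first-stage (VAE) decoder handles per pass, and that `sv3d_u` produces 21 frames per sample, the number of passes is a ceiling division. The helper `decode_passes` is hypothetical and only does that arithmetic.

```python
import math


def decode_passes(num_frames, decoding_t):
    """Number of decoder passes when frames are decoded decoding_t at a time."""
    if decoding_t < 1:
        raise ValueError("decoding_t must be >= 1")
    return math.ceil(num_frames / decoding_t)


# Assumed: sv3d_u generates 21 frames per sample.
for t in (1, 7, 14, 21):
    print(f"decoding_t={t:2d} -> {decode_passes(21, t)} decoder passes")
```

With `decoding_t=1` the decoder runs 21 small passes: peak memory is lowest, at the cost of more sequential work.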



VideoTransformerBlock is using checkpointing


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


open_clip_model.safetensors:   0%|          | 0.00/3.94G [00:00<?, ?B/s]

Initialized embedder #0: FrozenOpenCLIPImagePredictionEmbedder with 683800065 params. Trainable: False
Initialized embedder #1: VideoPredictionEmbedderWithEncoder with 83653863 params. Trainable: False
Initialized embedder #2: ConcatTimestepEmbedderND with 0 params. Trainable: False
Restored from checkpoints/sv3d_u.safetensors with 0 missing and 0 unexpected keys


100%|███████████████████████████████████████| 890M/890M [00:58<00:00, 16.0MiB/s]
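The initialization log reports parameter counts for the three conditioning embedders. Summing them and multiplying by bytes per parameter gives a rough lower bound on the memory these modules alone need. The denoising network and the VAE decoder are not included, and the log does not show which dtype the script uses, so both float32 and float16 are computed. The counts are copied from the log; the variable names are illustrative.

```python
# Parameter counts copied from the "Initialized embedder" log lines.
embedder_params = {
    "FrozenOpenCLIPImagePredictionEmbedder": 683_800_065,
    "VideoPredictionEmbedderWithEncoder": 83_653_863,
    "ConcatTimestepEmbedderND": 0,
}

BYTES_PER_PARAM = {"float32": 4, "float16": 2}

total = sum(embedder_params.values())
print(f"total conditioner params: {total:,}")
for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: {total * nbytes / 2**30:.2f} GiB of weights")
```

The total is 767,453,928 parameters, about 2.86 GiB in float32 or 1.43 GiB in float16, before the main model is counted.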
