> This notebook and its dependencies require a **CUDA-enabled GPU** with appropriate drivers.

# Setup
***Note:*** *This may take up to **15 minutes** on first run.*

## Define Parameters
Define a URL from which to download the input video, along with a filename under which to save it.

Alternatively, a video can be manually uploaded by updating the `FILENAME` parameter accordingly and commenting out the `!wget ...` line.

Some example videos can be found in the inputs folder on the [GitHub repository](https://github.com/TangyPenguin37/mvp_project).

In [None]:
URL = "https://videos.pexels.com/video-files/29460515/12682115_2160_3840_30fps.mp4"
FILENAME = "shipwreck.mp4"                    # Choose a filename under which to save the video, or open an existing file
PROMPT = "make the ship into a pirate ship"   # Choose the prompt with which to edit the video
ITERATIVE = False                             # Choose whether to use the iterative pipeline or not
ITERATIONS = 4                                # Choose the number of iterations, if using the iterative pipeline
BATCH_SIZE = 16                               # Choose a batch size for InstructPix2Pix - default: 16
FRAMES = 200                                  # Choose (approximately) how many frames to extract from the input video
RESOLUTION = (1280, 720)                      # Choose the (width, height) to which extracted frames are resized - lower values reduce quality but also reduce editing and training time
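Several of the steps below run for a long time, so a quick check of the parameters can save a failed run. The following is a minimal sketch; `check_parameters` is a hypothetical helper, not part of the repository. The multiple-of-8 check reflects the fact that Stable Diffusion's VAE downsamples images by a factor of 8. We assume that the diffusers pipeline rounds other sizes down.

```python
def check_parameters(resolution, frames, iterative, iterations, batch_size):
    """Return a list of warnings about the notebook parameters (empty if none found)."""
    warnings = []
    width, height = resolution  # PIL's Image.resize expects (width, height)
    # Stable Diffusion works on latents downsampled by 8, so multiples of 8 avoid rounding
    if width % 8 or height % 8:
        warnings.append(f"RESOLUTION {resolution} is not a multiple of 8 in both dimensions")
    if batch_size < 1:
        warnings.append("BATCH_SIZE must be at least 1")
    # Each iteration edits allfiles[i::ITERATIONS], so more iterations than frames leaves some empty
    if iterative and not 1 <= iterations <= frames:
        warnings.append("ITERATIONS must be between 1 and FRAMES")
    return warnings

# Values matching the defaults above (hypothetical usage)
print(check_parameters((1280, 720), 200, False, 4, 16))  # []
```

In the notebook, the same call can be made with `RESOLUTION, FRAMES, ITERATIVE, ITERATIONS, BATCH_SIZE` after running the parameter cell.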

In [None]:
# Download the given URL - comment this line out if you wish to use a manually uploaded video.
!wget $URL -O $FILENAME

## Download and install dependencies

Clone the GitHub repository and install the necessary dependencies.

-----

**Note:** The COLMAP executable in this repository is built specifically for the **T4 GPU in Google Colab**, so that it does not need to be rebuilt on every run. On a different GPU, even within Google Colab, this notebook may fail. In that case, build COLMAP from source for the target GPU architecture and add it to PATH in place of the COLMAP executable used here. *Instructions on building COLMAP can be found [here](https://colmap.github.io/install.html).*

---
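Because the bundled COLMAP binary targets the T4, it may help to check the GPU before cloning and installing everything. The following is a minimal sketch, assuming PyTorch is available (it is preinstalled on Google Colab). `is_t4` and `gpu_warning` are hypothetical helpers, and matching the token "T4" in the device name is a heuristic.

```python
def is_t4(device_name: str) -> bool:
    """Return True if a CUDA device name such as "Tesla T4" refers to a T4 GPU."""
    return "T4" in device_name.upper().split()

def gpu_warning(device_name: str) -> str:
    """Return a warning for non-T4 GPUs, or an empty string for a T4."""
    if is_t4(device_name):
        return ""
    return (f"Detected '{device_name}'. The bundled COLMAP binary targets the T4 GPU, "
            "so COLMAP may need to be rebuilt from source for this GPU.")

try:
    import torch
except ImportError:
    print("PyTorch is not installed; skipping the GPU check.")
else:
    if torch.cuda.is_available():
        device_name = torch.cuda.get_device_name(0)  # name of the first CUDA device
        print(gpu_warning(device_name) or f"Found {device_name}.")
    else:
        print("No CUDA-enabled GPU detected.")
```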

In [None]:
# Remove the Google Colab sample data directory, if it exists
!rm -rf ./sample_data

# Clone the GitHub repository
!echo -e "\e[34mCloning repository...\e[0m" && GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/TangyPenguin37/mvp_project -q --recurse-submodules && echo -e "\e[34mCloned repository!\e[0m"

# Download and install the necessary dependencies for the Gaussian Splatting script using pip
!echo -e "\e[34mInstalling dependencies for Gaussian Splatting\e[0m" && pip install -q -r ./mvp_project/dependencies/requirements.txt && echo -e "\e[34mInstalled dependencies for Gaussian Splatting!\e[0m"

# Download and install the necessary dependencies for COLMAP using apt-get
!echo -e "\e[34mInstalling dependencies for COLMAP\e[0m" && xargs -a ./mvp_project/dependencies/apt-get.txt apt-get -qq install && echo -e "\e[34mInstalled dependencies for COLMAP!\e[0m"

# Import necessary libraries
import os
import sys
import shutil
import gc
import logging
import torch
from PIL import Image
from pathlib import Path
from diffusers import StableDiffusionInstructPix2PixPipeline

# Set executable permissions for COLMAP and add it to PATH
!chmod +x /content/mvp_project/colmap/src/colmap/exe/colmap
os.environ["PATH"] += ":/content/mvp_project/colmap/src/colmap/exe"

# Set up the InstructPix2Pix model
!echo -e "\e[34mSetting up InstructPix2Pix model...\e[0m"
IP2P = StableDiffusionInstructPix2PixPipeline.from_pretrained("timbrooks/instruct-pix2pix", torch_dtype=torch.float16).to("cuda")
!echo -e "\e[34mSetup completed!\e[0m"

# Algorithm

## Split the video into frames

In [None]:
# Define the folder in which files are placed. The default is the filename without the file extension.
FOLDER = Path(FILENAME).stem

# Create the input folder to store the video frames
!rm -rf ./input/$FOLDER
!mkdir -p ./input/$FOLDER/input

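# Measure the video's duration in seconds with ffprobe
duration_output = !ffprobe -i $FILENAME -show_entries format=duration -v quiet -of csv='p=0'
duration = float(duration_output[0])

# Extract frames using FFmpeg at a fractional frame rate, so that roughly FRAMES frames are produced
# (integer division here would undershoot FRAMES, and give 0 fps for videos longer than FRAMES seconds)
fps = FRAMES / duration
!ffpb -i ./$FILENAME -qscale:v 1 -qmin 1 -vf fps=$fps ./input/$FOLDER/input/%03d.jpg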
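The frame rate passed to FFmpeg's `fps` filter is the target frame count divided by the video's duration. The following minimal sketch of that arithmetic uses two hypothetical videos, 15 and 250 seconds long. It shows why the division should be fractional: integer division undershoots the target, and drops to 0 fps once the video is longer than the target number of frames in seconds. The helper names are hypothetical.

```python
def target_fps(frames: int, duration: float) -> float:
    """Frame rate that yields roughly `frames` frames over `duration` seconds."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return frames / duration

def expected_frames(fps: float, duration: float) -> int:
    """Approximate frame count produced by the fps filter (ignoring edge rounding)."""
    return round(fps * duration)

for example_duration in (15.0, 250.0):  # hypothetical video lengths in seconds
    fractional = expected_frames(target_fps(200, example_duration), example_duration)
    integer = expected_frames(200 // round(example_duration), example_duration)
    print(f"{example_duration:.0f} s: fractional fps -> {fractional} frames, integer fps -> {integer} frames")

# 15 s: fractional fps -> 200 frames, integer fps -> 195 frames
# 250 s: fractional fps -> 200 frames, integer fps -> 0 frames
```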

## Run COLMAP
If this section fails, it may be because the version of COLMAP in the GitHub repository is built specifically for the **T4 GPU in Google Colab**. For a different GPU, COLMAP must be rebuilt from source using the instructions [here](https://colmap.github.io/install.html).

In [None]:
# Define arguments for COLMAP - these can be modified if desired
SOURCE = f"/content/input/{FOLDER}"
CAMERA = "OPENCV"
COLMAP = "colmap"
USE_GPU = True

use_gpu_arg = str(int(USE_GPU))

# Create necessary directories for COLMAP
!mkdir -p $SOURCE/distorted/sparse

# Run COLMAP feature extractor to find features in each image
!$COLMAP feature_extractor \
  --database_path $SOURCE/distorted/database.db \
  --image_path $SOURCE/input \
  --ImageReader.single_camera 1 \
  --ImageReader.camera_model $CAMERA \
  --SiftExtraction.use_gpu $use_gpu_arg

# Run COLMAP exhaustive matcher to find matches between images
!$COLMAP exhaustive_matcher \
  --database_path $SOURCE/distorted/database.db \
  --SiftMatching.use_gpu $use_gpu_arg

# Run COLMAP mapper to create a sparse reconstruction
!$COLMAP mapper \
  --database_path $SOURCE/distorted/database.db \
  --image_path $SOURCE/input \
  --output_path $SOURCE/distorted/sparse \
  --Mapper.ba_global_function_tolerance=0.000001

# Run COLMAP image undistorter to undistort images
!$COLMAP image_undistorter \
  --image_path $SOURCE/input \
  --input_path $SOURCE/distorted/sparse/0 \
  --output_path $SOURCE \
  --output_type COLMAP

# Create directory for sparse output
!mkdir -p $SOURCE/sparse/0

# Move files into this directory
for file in os.listdir(f"{SOURCE}/sparse"):
  if (file != '0'):
    shutil.move(
      os.path.join(SOURCE, "sparse", file),
      os.path.join(SOURCE, "sparse", "0", file),
    )
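IPython `!` commands do not stop the cell when a command fails, so a failed COLMAP stage can go unnoticed until a later step. The following is a minimal sketch of driving the same four stages from plain Python with `subprocess`, so that execution stops at the first failure. It uses only the flags from the cell above; `colmap_commands` and `run_colmap` are hypothetical helpers.

```python
import os
import shlex
import subprocess

def colmap_commands(source, camera="OPENCV", colmap="colmap", use_gpu=True):
    """Build argument lists for the four COLMAP stages, with the same flags as the cell above."""
    gpu = str(int(use_gpu))
    database = f"{source}/distorted/database.db"
    images = f"{source}/input"
    return [
        # Detect features, treating every frame as coming from a single camera
        [colmap, "feature_extractor", "--database_path", database, "--image_path", images,
         "--ImageReader.single_camera", "1", "--ImageReader.camera_model", camera,
         "--SiftExtraction.use_gpu", gpu],
        # Match features between every pair of images
        [colmap, "exhaustive_matcher", "--database_path", database,
         "--SiftMatching.use_gpu", gpu],
        # Build the sparse reconstruction
        [colmap, "mapper", "--database_path", database, "--image_path", images,
         "--output_path", f"{source}/distorted/sparse",
         "--Mapper.ba_global_function_tolerance=0.000001"],
        # Undistort the images using the first reconstructed model
        [colmap, "image_undistorter", "--image_path", images,
         "--input_path", f"{source}/distorted/sparse/0",
         "--output_path", source, "--output_type", "COLMAP"],
    ]

def run_colmap(source, **kwargs):
    """Run the stages in order; check=True raises CalledProcessError at the first failing stage."""
    os.makedirs(f"{source}/distorted/sparse", exist_ok=True)
    for cmd in colmap_commands(source, **kwargs):
        print("Running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

For example, `run_colmap(SOURCE, camera=CAMERA, colmap=COLMAP, use_gpu=USE_GPU)` could stand in for the `mkdir` and the four `!$COLMAP` calls. The step that moves files into `sparse/0` is still needed afterwards.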

The command below converts COLMAP's binary (.bin) model files to text (.txt), in case a viewer program requires the text format.

In [None]:
# !colmap model_converter \
#   --input_path path-to-binary-reconstruction \
#   --output_path path-to-txt-reconstruction \
#   --output_type TXT
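The following minimal sketch builds the same `model_converter` call with concrete paths for the default `shipwreck` example. The model path follows from the COLMAP cell above, but `sparse_txt` is a hypothetical output folder name, and `model_converter_command` is a hypothetical helper.

```python
import shlex

def model_converter_command(input_path, output_path, colmap="colmap"):
    """Build a COLMAP model_converter call that writes the model in text (TXT) format."""
    return [colmap, "model_converter",
            "--input_path", input_path,
            "--output_path", output_path,
            "--output_type", "TXT"]

# Paths for the default shipwreck example; sparse_txt is a hypothetical output folder
cmd = model_converter_command("/content/input/shipwreck/sparse/0",
                              "/content/input/shipwreck/sparse_txt")
print(shlex.join(cmd))
```

To run it, create the output folder first (for example with `os.makedirs(..., exist_ok=True)`) and pass the list to `subprocess.run(cmd, check=True)`.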

## Diffusion and Gaussian Splatting

In [None]:
# Function to process and edit images in batches
def process_images(input_path, output_path, files, batch_size, prompt, resolution):
  # Create the output folder for the edited images
  !mkdir -p $output_path

  # Process images in batches
  for i in range(0, len(files), batch_size):
    # Clear GPU cache to ensure sufficient memory
    gc.collect()
    torch.cuda.empty_cache()

    # Create a batch of resized images
    batch = [Image.open(os.path.join(input_path, file)).resize(resolution) for file in files[i:min(i + batch_size, len(files))]]

    # Edit the batch of images using InstructPix2Pix
    edited_imgs = (IP2P(prompt=[prompt] * len(batch), image=batch).images)

    # Save and close each of the edited images, keeping the original filenames
    for j, edited_img in enumerate(edited_imgs):
      edited_img.save(os.path.join(output_path, files[i + j]))
      edited_img.close()

    # Close the original images
    for img in batch:
      img.close()

  # Clear GPU cache
  gc.collect()
  torch.cuda.empty_cache()

# Function to train a Gaussian Splatting model and produce renders from it
def train_and_render(source_path, images_dir, model_path):
  # Train the Gaussian Splatting model on the COLMAP scene in source_path, using the images in its images_dir subfolder
  !python ./mvp_project/submodules/gaussian-splatting/train.py -s $source_path -i $images_dir --test_iterations -1 -m $model_path

  # Render the output, using the original frames' camera positions
  !python ./mvp_project/submodules/gaussian-splatting/render.py -m $model_path

# Non-iterative pipeline
if not ITERATIVE:

  # Define input and output directories
  INPUT_DIR = "images"
  OUTPUT_DIR = "images_edited"

  input_path = f"/content/input/{FOLDER}/{INPUT_DIR}"
  output_path = f"/content/input/{FOLDER}/{OUTPUT_DIR}"

  # Get sorted list of files in the input directory
  files = sorted(os.listdir(input_path))

  # Process images and save the edited versions
  process_images(input_path, output_path, files, BATCH_SIZE, PROMPT, RESOLUTION)

  # Train the model and render the output
  train_and_render(f"/content/input/{FOLDER}", OUTPUT_DIR, f"/content/output/{FOLDER}/model")

# Iterative pipeline
else:

  # Define input directory and initial output folder
  INPUT_DIR = "images"
  input_path = f"/content/input/{FOLDER}/{INPUT_DIR}"
  output_folder = "images_1"

  # Get sorted list of all files in the input directory
  allfiles = sorted(os.listdir(input_path))

  # Resize all images to the specified resolution
  for file in allfiles:
    img = Image.open(os.path.join(input_path, file))
    img = img.resize(RESOLUTION)
    img.save(os.path.join(input_path, file))
    img.close()

  # Iterate through as many times as specified
  for i in range(ITERATIONS):

    # Define the output path for the current iteration
    output_path = f"/content/input/{FOLDER}/{output_folder}"
    shutil.copytree(input_path, output_path)

    # Select files for the current iteration
    files = allfiles[i::ITERATIONS]

    # Process images and save the edited versions
    process_images(input_path, output_path, files, BATCH_SIZE, PROMPT, RESOLUTION)

    # Train the model and render the output
    train_and_render(f"/content/input/{FOLDER}", output_folder, f"/content/output/{FOLDER}/model_{i}")

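    # Replace the frames in the output folder with the rendered output (30,000 is the default number of training iterations)
    # The renders are PNG data written under the original .jpg filenames; PIL detects the format from the file contents
    render_dir = f"/content/output/{FOLDER}/model_{i}/train/ours_30000/renders"
    for idx, file in enumerate(sorted(os.listdir(render_dir))):
      os.replace(os.path.join(render_dir, file), f"/content/input/{FOLDER}/{output_folder}/{allfiles[idx]}")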

    # Update input path and output folder for the next iteration
    input_path = f"/content/input/{FOLDER}/{output_folder}"
    output_folder = f"images_{i+2}"
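The bookkeeping in the pipelines above can be checked without a GPU. The following minimal sketch reproduces three steps on plain filename lists:

- how `process_images` slices frames into batches,
- which frames `allfiles[i::ITERATIONS]` edits in each iteration,
- how the renders are mapped back to frame names by sorted order.

The helper names are hypothetical. The zero-padded `00000.png` render names reflect our understanding of the Gaussian Splatting render script's output naming.

```python
def batches(files, batch_size):
    """Split files into consecutive batches, as process_images does."""
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]

def iteration_subsets(allfiles, iterations):
    """Frames edited in iteration i: every `iterations`-th frame, starting at index i."""
    return [allfiles[i::iterations] for i in range(iterations)]

def render_mapping(render_names, allfiles):
    """Pair each render (in sorted order) with the frame name it overwrites."""
    if len(render_names) != len(allfiles):
        # Index-based pairing is only valid with exactly one render per frame
        raise ValueError(f"{len(render_names)} renders for {len(allfiles)} frames")
    return dict(zip(sorted(render_names), allfiles))

example_files = [f"{n:03d}.jpg" for n in range(1, 11)]  # hypothetical 10 frames
print([len(batch) for batch in batches(example_files, 4)])   # [4, 4, 2]
print(iteration_subsets(example_files, 4)[0])                # ['001.jpg', '005.jpg', '009.jpg']
print(render_mapping([f"{n:05d}.png" for n in range(10)], example_files)["00003.png"])  # 004.jpg
```

Because the subsets are disjoint and together cover every frame, each frame is edited by InstructPix2Pix exactly once across all iterations. The frames that are not edited in a given iteration carry the previous iteration's renders forward.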

The final Gaussian splatting model can be viewed on Windows with the SIBR Viewer tool included in our [GitHub repository](https://github.com/TangyPenguin37/mvp_project). Alternatively, several online viewers are available, such as [this one](https://antimatter15.com/splat/).

The Windows viewer can also load the camera poses computed by COLMAP, so that the camera path of the original input video can be recreated.
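To load the model into a viewer, the trained point cloud first has to be downloaded from Colab. The following is a minimal sketch that locates the newest `point_cloud.ply` under a model folder. It assumes the Gaussian Splatting training script's usual `point_cloud/iteration_<N>/point_cloud.ply` layout; `latest_point_cloud` is a hypothetical helper.

```python
from pathlib import Path

def latest_point_cloud(model_path):
    """Return the point_cloud.ply from the highest-numbered iteration folder, or None."""
    root = Path(model_path)
    if not root.is_dir():
        return None
    candidates = []
    for ply in root.glob("point_cloud/iteration_*/point_cloud.ply"):
        suffix = ply.parent.name.removeprefix("iteration_")
        if suffix.isdigit():
            candidates.append((int(suffix), ply))
    # Pick the file saved at the largest iteration number
    return max(candidates)[1] if candidates else None
```

For the non-iterative pipeline, the model folder is `f"/content/output/{FOLDER}/model"`. For the iterative pipeline, it is `f"/content/output/{FOLDER}/model_{ITERATIONS - 1}"`. In Colab, `from google.colab import files` followed by `files.download(str(path))` can then fetch the file.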