# Nerfstudio To D-NeRF dataset

Set Up:

1. Installl Nerfstudio and activate environment
2. Run this notebook with the nerfstudio conda environment

Instructions:

1. Update configuration relative to desired parameters
2. Run the method

#### Notes on D-NeRF dataset:

In the nerfstudio DNeRF parser, `camera_angle_x` parameter (found in the transforms files) has a $1/tanh(0.5*x)$ relationship with the focal length. While I did mess around with this in [Desmos](https://www.desmos.com/calculator/xw0lodoghb) and initially selected a high $x=4.$ value. It didn't play well with kplanes (tested with scene contraction and varying near and far plane positions). In the end $x=0.6$ worked best despite how counter intuitive this is... The parameter is important for good rendering but is not directly recoverable from the nerfstudio dataset (it might be I just haven't looked into it).

Otherwise the `rotation` parameter found for each frame doesn't seem to have any impact on performance (at least in the tests I ran with K-Planes, **this maybe different for other models!!!**).

---
# Configuration
---

Information:
1. `nerfstudio_fp` is the path **to the folder** containing `transforms.json`
2. `output_fp` is the path to the folder you wish to write `transforms_train.json`, `transforms_test.json` and `transforms_val.json`
3. `downscale_images_fp` is the folder name inside of you nerfstudio folder containing the downscaled images if you would rather save these than the original images. As dnerf format doesn't consider downscaled images, this will allow you to use them instead.
4. `method` declares the way we recover the time values:
    - `'exhaustive'`    : match image to frame (**super slow**)
    - `'linear'`    : assign time given image name-index (**fast**) (e.g. image w/name `frame_{i}.png` is going to be at i/n time where n is the number of images and `i` is the index; this is the image-naming format used by nerfstudio-colamp process)

In [7]:
nerfstudio_fp ='data/luca/'
video_fp = 'path/to/video.mp4'
output_fp = 'data/luca_DN'

downscale_images_fp = 'images_2'

method = 'linear'

---
# View the code
---

Information:
1. Import dependencies. *Make sure nerfstudio has been downloaded*
2. View the functions
3. Change the functions (optional)
4. Run the functions

In [8]:
# Import
import os
import json
from pathlib import Path
import random
import shutil

import cv2
import numpy as np
from skimage.metrics import structural_similarity as ssim
from tqdm.notebook import tqdm

from utils_ import *

### Exhaustive SSIM Time Search

**Args:**
- d_fp, v_fp, img_fp: Path, previously discussed
- transfors_fp: Path, path to transforms file

**Notes:**
1. Exhaustive search matches each image (e.g. `frame_0000.png`, `frame_0001.png`, ... ) to each frame in a video. 
2. Each image is a frame with png compression so direct image to frame comparison isn't possible
3. Instead we compare w/ SSIM.
4. This means:

    a. Overlapping frames (such as monocular stationary camera with negligible dynamic motion) will have the same SSIM score and so we will get several frames which match
        
    b. We select the earliest occuring frame match as the time of the png image
    
    c. We accept that this may not always be the case so we add a threshold, whereby we search for the earliest match where SSIM > 0.95 , when this is not the case max(SSIM) > 0.9 is selected.
    
    d. Theoretically, this shouldn't be an issue for NeRF as SSIM threshold is high so should be negligible during NeRF evaluation
    

In [9]:
def exhaustive(d_fp, o_fp, v_fp, img_fp, transforms_fp, shuffle:bool=True):
    pathchecks([d_fp, v_fp, img_fp])

    assert not os.path.exists(o_fp), 'Folder already exists, delete folder to run'

    os.makedirs(o_fp)
    with open(transforms_fp) as fp:
        contents = fp.read()
    transforms = json.loads(contents)
    img_frames = transforms['frames'] # Directly access frame data

    # Initialise opencv video object
    video = cv2.VideoCapture(str(v_fp))
    total_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
    print(f'Total number of frames to process {total_frames} \n Total number of images to process {len(img_frames)}')


    video_frame_counter = 0
    matches = []

    # Sort img_frames by filepath name
    filenum = []
    for img in img_frames:
        num = img['file_path'].split('/')[-1].split('.')[-2].split('_')[-1]
        filenum.append(int(num))
    img_frames = [img for _, img in sorted(zip(filenum, img_frames), reverse=True)]
    
    print('Running')
    iterator = tqdm(enumerate(img_frames))
    # loop through each image in our colmap dataset
    for idx, img in iterator:
        fp = d_fp / img['file_path']
        image = cv2.imread(str(fp), cv2.IMREAD_GRAYSCALE) # load image greyscale

        SSIM = {
            'max': 0.,
            'idxs':[]
        }
        for idx_video in range(video_frame_counter, total_frames):
            # Fetch frame from video
            video.set(cv2.CAP_PROP_FRAME_COUNT, idx_video)
            ret, frame = video.read()
            print(ret)
            if ret: frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) # load frame greyscale
            else: break

            # Get SSIM
            ssim_res = ssim(image, frame)
            print(idx_video, ssim_res)
            # Process SSIM
            if ssim_res > 0.93: # if we meet ideal match
                SSIM['max'] = 1.
                SSIM['idxs'] = [idx_video] 
                # video_frame_counter = idx_video + 1
                break
            elif ssim_res > SSIM['max']: # if we have a new max
                SSIM['max'] = ssim_res
                SSIM['idxs'] = [idx_video]
            elif ssim_res == SSIM['max']: # if we have the same max
                SSIM['idxs'].append(idx_video)
        

        if SSIM['max'] < 0.9:
            print(f'Image {idx} has no match: consider lowering SSIM threshold')
        else:
            idx_video_ = min(SSIM['idxs'])
            matches.append({
                "frame" : float(idx_video_),
                "image" : idx
                })

    # shuffle data
    if shuffle == True:
        random.shuffle(matches)

    # train split
    train_split = 0.9
    train_idx = int(train_split * len(matches))
    train_data = matches[:train_idx]
    # test split (on remaining data)
    test_split = 0.9
    test_index = train_idx + int(test_split * (len(matches) - len(train_data)))
    test_data =  matches[train_idx : test_index]
    # val split
    val_data = matches[test_index:]


    # Construct transform files
    local_properties = {
        "roation": 0.0,
    }
    data = [train_data, test_data, val_data]
    for idx, d in enumerate(data):
        file_ = {
            "camera_angle_x": 0.6, # Seems to work best on the toy datasets I used
            "frames":[]
        }
        # file_path
        if idx == 0: file_path = Path(o_fp) / 'transforms_train.json'
        elif idx == 1: file_path = Path(o_fp) / 'transforms_test.json'
        elif idx == 2: file_path = Path(o_fp) / 'transforms_val.json'
        
        for match_idx in d:
            img_data = img_frames[match_idx['image']]
            time = float(match_idx['frame'] / total_frames)
            fname = img_data['file_path'].split('/')[-1]
            
            file_["frames"].append({
            "file_path":f'./train/{fname}',
            "rotation": local_properties['rotation'],
            "time":time,
            "transform_matrix":img_data['transform_matrix']
        })



### Blind Linear Time Search

**Args:**
- Same as before

**Notes:**
- Order collected images by frame number and linearly assign time value between 0 and 1.
- This will make the motion linear with colmap extraction so it may not be ideal.

In [10]:
def linear(d_fp, o_fp, v_fp, img_fp, transforms_fp, shuffle:bool=True, rotation:float=0.0, camera_angle_x:float=0.0):
    assert not os.path.exists(o_fp), 'Folder already exists, delete folder to run'

    os.makedirs(o_fp)
    os.makedirs(o_fp+'/train')
    os.makedirs(o_fp+'/test')
    os.makedirs(o_fp+'/val')
    with open(transforms_fp) as fp:
        contents = fp.read()
    transforms = json.loads(contents)
    img_frames = transforms['frames'] # Directly access frame data
    print(f'Total number of images to process {len(img_frames)}')

    local_properties = {
        "rotation":rotation,
    }

    frames = []
    for idx, img in enumerate(img_frames):
        fname = img['file_path'].split('/')[-1].split('_')[-1].split('.')[0]

        if int(fname) > 300:
            time = 1.
        else:
            time = int(fname)/300 #(len(img_frames))

        frames.append({
            "file_path":f'train/frame_{fname}',
            "rotation": local_properties['rotation'],
            "time":time,
            "transform_matrix":img['transform_matrix']
        })
    
    # shuffle data
    if shuffle == True:
        random.shuffle(frames)

    # train split
    train_split = 0.9
    train_idx = int(train_split * len(frames))
    train_data = frames[:train_idx]
    # test split (on remaining data)
    test_split = 0.9
    test_index = train_idx + int(test_split * (len(frames) - len(train_data)))
    test_data =  frames[train_idx : test_index]
    # val split
    val_data = frames[test_index:]

    print(f' Training dataset: {len(train_data)} | Testing dataset: {len(test_data)}')
    data = [train_data, test_data, val_data]
    for idx, d in enumerate(data):
        file_ = {
            "frames":[]
        }

        for file_key in transforms.keys():
            if "camera_angle" not in file_key and file_key != 'frames':
                file_[file_key] = transforms[file_key]
        
        # file_path
        if idx == 0: file_path = Path(o_fp) / 'transforms_train.json'
        elif idx == 1: file_path = Path(o_fp) / 'transforms_test.json'
        elif idx == 2: file_path = Path(o_fp) / 'transforms_val.json'
        
        for frame in d:
            if idx == 0:
                name = 'train/'+frame['file_path'].split('/')[-1]
            elif idx == 1:
                name = 'test/'+frame['file_path'].split('/')[-1]
            elif idx == 2:
                name = 'val/'+frame['file_path'].split('/')[-1]

            frame['file_path'] = name
            file_['frames'].append(frame)

            if idx == 0:
                destination = Path(o_fp) / 'train' / (frame['file_path'].split('/')[-1] + '.png')
                source = img_fp / (frame['file_path'].split('/')[-1] + '.png')
                shutil.copyfile(source, destination)
            elif idx == 1:
                destination = Path(o_fp) / 'test' / (frame['file_path'].split('/')[-1] + '.png')
                source = img_fp / (frame['file_path'].split('/')[-1] + '.png')
                shutil.copyfile(source, destination)
            elif idx == 2:
                destination = Path(o_fp) / 'val' / (frame['file_path'].split('/')[-1] + '.png')
                source = img_fp / (frame['file_path'].split('/')[-1] + '.png')
                shutil.copyfile(source, destination)
        
        print(file_)
        with open(file_path, 'w') as fp:
            json.dump(file_, fp)   

    

### Handler for nerfstudio2dnerf

**Args:**
- `d_fp`, Path, path to `transforms.json` **folder**
- `o_fp`, Path, path to output folder
- `v_fp`, Path, path to video
- `img_fp`, Path, path to image folder

In [11]:
def handler(d_fp, o_fp, v_fp, img_fp, meth, rotation:float=0.0, camera_angle_x:float=0.0):
    d_fp = Path(d_fp)
    v_fp = Path(v_fp)
    img_fp = d_fp / img_fp

    transforms_fp = d_fp/'transforms.json'

    # Sanity Checks
    folderchecks([d_fp, img_fp])
    
    # meth = 'linear'
    # Handle exhaustive method
    if meth == 'exhaustive':
        exhaustive(d_fp, o_fp, v_fp, img_fp,transforms_fp)    
    elif meth == 'linear':
        linear(d_fp, o_fp, v_fp, img_fp,transforms_fp, rotation=rotation, camera_angle_x=camera_angle_x)    

In [12]:
handler(nerfstudio_fp, output_fp, video_fp, downscale_images_fp, method)

Total number of images to process 780
 Training dataset: 702 | Testing dataset: 70
{'frames': [{'file_path': 'train/frame_00093', 'rotation': 0.0, 'time': 0.31, 'transform_matrix': [[0.15474567907460235, -0.5344951125492546, 0.8308843177414644, 3.7802375693977495], [0.76186734118903, -0.4708678788434989, -0.44479387934523285, 0.8438042641848079], [0.6289768908611009, 0.7018535569008562, 0.33434960061037755, 0.3389751747729203], [0.0, 0.0, 0.0, 1.0]]}, {'file_path': 'train/frame_00748', 'rotation': 0.0, 'time': 1.0, 'transform_matrix': [[-0.8103413415933104, 0.3280375281728239, -0.48552887680859735, -1.941468836603119], [-0.4664444116236558, -0.8626380410858014, 0.19566609552171582, 3.8557881146192563], [-0.35464985685858647, 0.38502855761867233, 0.8520425393419443, 2.865456468323723], [0.0, 0.0, 0.0, 1.0]]}, {'file_path': 'train/frame_00695', 'rotation': 0.0, 'time': 1.0, 'transform_matrix': [[-0.40437136035865723, 0.5033147139999371, -0.763647891107446, -3.433213571868689], [-0.751761