# Instant-ngp

This notebook is a step-by-step guide to training NeRF models and rendering videos from them with NVIDIA's [instant-ngp](https://github.com/NVlabs/instant-ngp) software using:
 * **Colab** for the heavy lifting.
 * A low-resource **local computer** for the steps that require having a graphical user interface (GUI).
 * **Record3D** for robust camera pose estimation (alternative to COLMAP).

It has been tested with a GTX 1050 Ti in the local machine and a Tesla T4 assigned in the remote one.

Based on this [notebook](https://colab.research.google.com/drive/10TgQ4gyVejlHiinrmm5XOvQQmgVziK3i?usp=sharing) by [@myagues](https://github.com/NVlabs/instant-ngp/issues/6#issuecomment-1016397579). The main differences are the addition of steps 3 and 4 to ensure compatibility between the local machine and the models trained on the remote machine, step 10 to render a video from the scene, Record3D support for better pose estimation, and a more guided approach.

## 1. Connect to a GPU runtime

Connect your Colab session to a GPU runtime and check that you have been assigned a GPU. It should have at least 8 GB of available memory.

In [None]:
!nvidia-smi

Mon Jun 16 21:14:59 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   35C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
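
The 8 GB requirement can also be checked in code instead of reading the table by eye. The following is a minimal sketch, not part of instant-ngp: `parse_gpu_memory`, `query_gpus` and `MIN_GPU_MEMORY_MIB` are hypothetical names, and it assumes the text comes from `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits`.

```python
import subprocess

MIN_GPU_MEMORY_MIB = 8 * 1024  # the 8 GB minimum mentioned in this step


def parse_gpu_memory(csv_text):
    """Parse lines of 'name, memory_in_MiB' into (name, MiB) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, memory = line.rsplit(",", 1)
        gpus.append((name.strip(), int(memory.strip())))
    return gpus


def query_gpus():
    """Ask nvidia-smi for the name and total memory of every visible GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)


# Example using the values from the table above (Tesla T4, 15360 MiB).
sample_gpus = parse_gpu_memory("Tesla T4, 15360\n")
for name, mib in sample_gpus:
    status = "OK" if mib >= MIN_GPU_MEMORY_MIB else "too small"
    print(f"{name}: {mib} MiB ({status})")
```

On the Colab runtime, `query_gpus()` would replace the hard-coded sample string.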
                                                

## 2. Install dependencies and clone the instant-ngp repo

In [None]:
!apt update && apt install build-essential git python3-dev python3-pip libopenexr-dev libxi-dev libglfw3-dev libglew-dev libomp-dev libxinerama-dev libxcursor-dev ffmpeg jq
!pip install --upgrade cmake

Get:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Get:2 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Get:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [1,778 kB]
Hit:4 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Get:5 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Hit:6 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:7 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:8 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [127 kB]
Get:9 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [1,250 kB]
Get:10 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [3,296 kB]
Get:11 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [2,986 kB]
Hit:12 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
G

In [None]:
!git clone --recursive https://github.com/nvlabs/instant-ngp
%cd instant-ngp

Cloning into 'instant-ngp'...
remote: Enumerating objects: 4372, done.
remote: Counting objects: 100% (35/35), done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 4372 (delta 17), reused 16 (delta 16), pack-reused 4337 (from 3)
Receiving objects: 100% (4372/4372), 187.06 MiB | 33.00 MiB/s, done.
Resolving deltas: 100% (2788/2788), done.
Submodule 'dependencies/OpenXR-SDK' (https://github.com/KhronosGroup/OpenXR-SDK.git) registered for path 'dependencies/OpenXR-SDK'
Submodule 'dependencies/args' (https://github.com/Taywee/args) registered for path 'dependencies/args'
Submodule 'dependencies/dlss' (https://github.com/NVIDIA/DLSS) registered for path 'dependencies/dlss'
Submodule 'dependencies/glfw' (https://github.com/Tom94/glfw) registered for path 'dependencies/glfw'
Submodule 'dependencies/imgui' (https://github.com/ocornut/imgui.git) registered for path 'dependencies/imgui'
Submodule 'dependencies/pybind11' (https://github.com/Tom94/pybind11) registered f

## 3. Set compute capability
Find the compute capability of the GPU in your **local** machine in the following link:
https://developer.nvidia.com/cuda-gpus

You need this to be able to open your trained models in `testbed` inside your local machine later on, so you can explore them or trace a camera path in order to generate a video from your scene.

In [None]:
compute_capability = "61" #@param [50, 52, 60, 61, 70, 72, 75, 80, 86, 87]
%env TCNN_CUDA_ARCHITECTURES=$compute_capability


env: TCNN_CUDA_ARCHITECTURES=61
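
NVIDIA's table lists compute capability with a dot (for example 6.1), while the value set above is written without one (61). As a small sketch of that conversion, the hypothetical helper `to_tcnn_arch` below turns one form into the other; it is an illustration only and not part of instant-ngp.

```python
def to_tcnn_arch(compute_capability):
    """Convert a dotted compute capability such as '6.1' to the form '61'."""
    text = str(compute_capability).strip()
    if "." not in text:
        # Already in the undotted form used by TCNN_CUDA_ARCHITECTURES.
        if not text.isdigit():
            raise ValueError(f"Unrecognised compute capability: {text!r}")
        return text
    major, minor = text.split(".", 1)
    if not (major.isdigit() and minor.isdigit()):
        raise ValueError(f"Unrecognised compute capability: {text!r}")
    return f"{int(major)}{int(minor)}"


# 6.1 is the value chosen above for the local GPU; 7.5 is the Tesla T4.
for value in ("6.1", "7.5", "61"):
    print(value, "->", to_tcnn_arch(value))
```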


## 4. Set the right network configuration
For compatibility between the model trained here and the local machine, a network with FP32 or FP16 is chosen.

https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#hardware-precision-matrix

In [None]:
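network_type = "FullyFusedMLP" if int(compute_capability) >= 70 else "CutlassMLP"
print(f"Using {network_type}")
%env NN_CONFIG_PATH = ./configs/nerf/base.json
config_path = "./configs/nerf/base.json"
# Write the selected network type into the config instead of hard-coding "CutlassMLP";
# sponge (from the moreutils package) lets jq overwrite the file it is reading
!jq '.network.otype = "{network_type}" | .rgb_network.otype = "{network_type}"' {config_path} | sponge {config_path}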

Using CutlassMLP
env: NN_CONFIG_PATH=./configs/nerf/base.json
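
For readers less familiar with `jq`, the same edit can be sketched in plain Python. This is an illustrative equivalent under stated assumptions: `set_network_otype` is a hypothetical helper, and `example_config` is a stand-in that shows only the two keys touched by the `jq` expression, not the real contents of `configs/nerf/base.json`.

```python
import copy
import json


def set_network_otype(config, otype):
    """Return a copy of config with both MLPs switched to the given otype."""
    updated = copy.deepcopy(config)
    for key in ("network", "rgb_network"):
        updated.setdefault(key, {})["otype"] = otype
    return updated


# Placeholder config: only the keys edited by the jq command are present.
example_config = {
    "network": {"otype": "FullyFusedMLP"},
    "rgb_network": {"otype": "FullyFusedMLP"},
}
patched_config = set_network_otype(example_config, "CutlassMLP")
print(json.dumps(patched_config, indent=2))
```

To apply it to the real file, load the config with `json.load`, pass it through the helper, and write the result back with `json.dump`.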


## 5. Build the project and install python requirements

In [None]:
!cmake . -B build -DNGP_BUILD_WITH_GUI=OFF -DCMAKE_POLICY_VERSION_MINIMUM=3.5


-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- The CUDA compiler identification is NVIDIA 12.5.82 with host compiler GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- No release type specified. Setting to 'Release'.
-- Obtained CUDA architectures from environment variable TCNN_CUDA_ARCHITECTURES=61
-- Targeting CUDA architectures: 61
  Fully fuse

In [None]:
# Clean and rebuild if needed
!rm -rf build
!cmake . -B build -DNGP_BUILD_WITH_GUI=OFF -DCMAKE_POLICY_VERSION_MINIMUM=3.5


In [None]:
!cmake --build build --config RelWithDebInfo -j `nproc`

[  2%] Building CUDA object CMakeFiles/optix_program.dir/src/optix/pathescape.ptx
[  5%] Building CXX object dependencies/tiny-cuda-nn/dependencies/fmt/CMakeFiles/fmt.dir/src/format.cc.o
[  8%] Building CUDA object CMakeFiles/optix_program.dir/src/optix/raystab.ptx
[ 10%] Building CUDA object CMakeFiles/optix_program.dir/src/optix/raytrace.ptx
[ 10%] Built target optix_program
[ 13%] Building CXX object dependencies/tiny-cuda-nn/dependencies/fmt/CMakeFiles/fmt.dir/src/os.cc.o
[ 16%] Linking CXX static library libfmt.a
[ 16%] Built target fmt
[ 18%] Building CUDA object dependencies/tiny-cuda-nn/CMakeFiles/tiny-cuda-nn.dir/src/common_host.cu.o
[ 21%] Building CUDA object dependencies/tiny-cuda-nn/CMakeFiles/tiny-cuda-nn.dir/src/cpp_api.cu.o
[ 24%] Building CUDA object dependencies/tiny-cuda-nn/CMakeFiles/tiny-cuda-nn.dir/src/cutlass_mlp.cu.o
[ 27%] Building CUDA object dependencies/tiny-cuda-nn/CMa

In [None]:
# The pyngp Python module does not need a pip install: ./build is not a Python package
# (it has no setup.py or pyproject.toml), and the scripts in ./scripts look for the
# compiled module inside the build directory. Check that it was built:
!ls build/pyngp*

In [None]:
!pip3 install -r requirements.txt

Collecting commentjson (from -r requirements.txt (line 1))
  Downloading commentjson-0.9.0.tar.gz (8.7 kB)
  Preparing metadata (setup.py) ... done
Collecting pybind11 (from -r requirements.txt (line 5))
  Downloading pybind11-2.13.6-py3-none-any.whl.metadata (9.5 kB)
Collecting pyquaternion (from -r requirements.txt (line 6))
  Downloading pyquaternion-0.9.9-py3-none-any.whl.metadata (1.4 kB)
Collecting lark-parser<0.8.0,>=0.7.1 (from commentjson->-r requirements.txt (line 1))
  Downloading lark-parser-0.7.8.tar.gz (276 kB)
  Preparing metadata (setup.py) ... done
Downloading pybind11-2.13.6-py3-none-any.whl (243 kB)
Downloading pyquaternion-0.9.9-py3-none-any.whl (14 kB)
Building wheels for collected packages: 

## 6. [LOCAL MACHINE] Prepare Record3D data
Record3D provides more robust camera pose estimation than COLMAP, especially for scenes with repetitive patterns or lacking texture.

**If you have raw .r3d files:**
1. **Record your scene** using the Record3D iOS app (requires iPhone 12 Pro or newer)
2. **Export the data** using "Shareable/Internal format (.r3d)"
3. **Transfer the .r3d file** to your computer
4. **Extract the data**:
   - Change the file extension from `.r3d` to `.zip`
   - Unzip the file to get a directory with your scene data
5. **Run the preprocessing script**:
   ```bash
   python scripts/record3d2nerf.py --scene path/to/your/data
   ```
   If you captured in landscape orientation, add `--rotate`:
   ```bash
   python scripts/record3d2nerf.py --scene path/to/your/data --rotate
   ```

**If you already have preprocessed Record3D data:**
Your folder should contain:
- `rgbd/` folder with RGB images, depth maps, and confidence maps:
  - `*.jpg` - RGB color images
  - `*.depth` - Depth maps (distance measurements for each pixel)
  - `*.conf` - Confidence maps (reliability of each depth measurement)
- `transforms.json` file with camera poses (generated from Record3D ARKit data)

**Key advantages of Record3D over COLMAP:**
- **Real-time pose estimation** using ARKit (more robust than COLMAP's feature matching)
- **Depth supervision** for better training convergence and accuracy
- **Confidence-weighted training** to handle unreliable depth measurements
- **Better for challenging scenes**: textureless surfaces, repetitive patterns, reflective materials
- **No GUI processing required** - everything runs in command line
- **Faster preprocessing** compared to COLMAP's reconstruction pipeline

**Depth and Confidence Benefits for NeRF Training:**
- Faster convergence due to geometric constraints from depth
- Better novel view synthesis, especially in poorly textured regions
- More accurate geometry reconstruction
- Improved handling of transparent or reflective surfaces using confidence weighting
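
Before uploading, it can help to confirm that the preprocessed folder matches the layout described above. The sketch below is one way to do that: `find_missing_files` and `REQUIRED_EXTENSIONS` are hypothetical names, and the check assumes every frame in `rgbd/` should have all three files.

```python
import json
import os
import tempfile

REQUIRED_EXTENSIONS = (".jpg", ".depth", ".conf")


def find_missing_files(scene_dir):
    """List files a preprocessed Record3D scene is expected to have but lacks."""
    missing = []
    if not os.path.isfile(os.path.join(scene_dir, "transforms.json")):
        missing.append("transforms.json")
    rgbd_dir = os.path.join(scene_dir, "rgbd")
    if not os.path.isdir(rgbd_dir):
        return missing + ["rgbd/"]
    # Every RGB frame should come with a depth map and a confidence map.
    stems = sorted({os.path.splitext(f)[0] for f in os.listdir(rgbd_dir)})
    for stem in stems:
        for ext in REQUIRED_EXTENSIONS:
            if not os.path.isfile(os.path.join(rgbd_dir, stem + ext)):
                missing.append(f"rgbd/{stem}{ext}")
    return missing


# Demonstration on a throwaway scene where frame 1 has no confidence map.
with tempfile.TemporaryDirectory() as scene:
    os.makedirs(os.path.join(scene, "rgbd"))
    with open(os.path.join(scene, "transforms.json"), "w") as f:
        json.dump({"frames": []}, f)
    for name in ("0.jpg", "0.depth", "0.conf", "1.jpg", "1.depth"):
        open(os.path.join(scene, "rgbd", name), "w").close()
    demo_missing = find_missing_files(scene)

print(demo_missing)
```

An empty list means the folder has the expected structure; it does not validate the contents of the files.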

## 7. Upload your scene

In [None]:
import os

Mount your Google Drive:

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Then upload your scene data (the `rgbd` folder from Record3D, or the `images` folder from COLMAP) and the `transforms.json` file to your Drive. The structure should be similar to the following:

**For Record3D data:**
```
/content/drive/MyDrive/nerf_scenes/
└── test_1
    ├── rgbd/
    │   ├── 0.jpg
    │   ├── 0.depth
    │   ├── 0.conf
    │   ├── 1.jpg
    │   └── ...
    └── transforms.json
```

**For COLMAP data (legacy support):**
```
/content/drive/MyDrive/nerf_scenes/
└── fox
    ├── images/
    │   ├── 00001.jpg
    │   └── 00002.jpg
    └── transforms.json
```



In [None]:
print("\nContents of nerf_scenes:")
os.listdir('/content/drive/MyDrive/nerf_scenes')


Contents of nerf_scenes:


['corridor_record3d',
 'corridor_record3d_small',
 'corridor_record3d_small_small']

Enter the path to your scene

In [None]:
import os
scene_path = "/content/drive/MyDrive/nerf_scenes/corridor_record3d" #@param {type:"string"}
if not os.path.isdir(scene_path):
  raise NotADirectoryError(scene_path)

We have a problem: the process runs out of RAM quickly. Only about 1070 full-size images are loaded before it is terminated abruptly.
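
As a rough sizing aid, the smallest subsampling stride that brings the frame count under that limit can be computed directly. This is a back-of-the-envelope sketch with hypothetical helper names; 3075 is the frame count of the corridor capture used in this notebook, and 1070 is the approximate limit observed above. It gives only a lower bound, since a larger stride leaves more headroom for training itself.

```python
import math


def min_stride(total_frames, max_frames):
    """Smallest n such that keeping every n-th frame yields at most max_frames."""
    if total_frames <= 0 or max_frames <= 0:
        raise ValueError("frame counts must be positive")
    return math.ceil(total_frames / max_frames)


def frames_kept(total_frames, stride):
    """Number of frames selected by the slice frames[::stride]."""
    return math.ceil(total_frames / stride)


total = 3075  # frames in the original capture
limit = 1070  # approximate number of images loaded before the crash
stride = min_stride(total, limit)
print(f"stride {stride} keeps {frames_kept(total, stride)} frames")
```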

Let's reduce the size of the dataset:

In [None]:
# Create smaller dataset with fewer images
import shutil
import json
small_scene_path = scene_path + "_small"

In [None]:
# No need to rerun this cell if the smaller dataset already exists

# Load and examine the original transforms.json
with open(f"{scene_path}/transforms.json", 'r') as f:
    transforms = json.load(f)

print(f"Original dataset has {len(transforms['frames'])} frames")

# Select every 15th frame from transforms.json
n = 15
selected_frames = transforms['frames'][::n]
print(f"Selected {len(selected_frames)} frames (every {n}th)")

# Create new dataset directory
os.makedirs(f"{small_scene_path}/rgbd", exist_ok=True)

# Copy only the files referenced in selected frames
copied_count = 0
for frame in selected_frames:
    # Extract filename from transforms.json file_path
    file_path = frame['file_path']
    # Remove leading "./" if present
    if file_path.startswith('./'):
        file_path = file_path[2:]

    # Get base filename without extension
    if '/' in file_path:
        base_name = file_path.split('/')[-1].split('.')[0]
    else:
        base_name = file_path.split('.')[0]

    # Copy RGB, depth, and confidence files
    for ext in ['.jpg', '.depth', '.conf']:
        src = f"{scene_path}/rgbd/{base_name}{ext}"
        dst = f"{small_scene_path}/rgbd/{base_name}{ext}"
        if os.path.exists(src):
            shutil.copy(src, dst)
            if ext == '.jpg':  # Count only RGB files
                copied_count += 1

print(f"Copied {copied_count} image sets")

# Create new transforms.json with only selected frames
new_transforms = transforms.copy()
new_transforms['frames'] = selected_frames

with open(f"{small_scene_path}/transforms.json", 'w') as f:
    json.dump(new_transforms, f, indent=2)

# Update scene path
scene_path = small_scene_path
print(f"New dataset ready at: {scene_path}")

Original dataset has 3075 frames
Selected 205 frames (every 15th)
Copied 205 image sets
New dataset ready at: /content/drive/MyDrive/nerf_scenes/corridor_record3d_small


In [None]:
# Update scene_path to use the smaller dataset
scene_path = small_scene_path

Other ways to reduce memory usage:

- Reduce the image resolution (this directly affects quality).
- Use smaller batch sizes and network configurations.

Even so, training failed to complete in Colab, so it has to be moved to a more powerful deep-learning workstation.
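
If the images are downscaled, the camera intrinsics in `transforms.json` need to be scaled by the same factor. The sketch below shows that bookkeeping only. It assumes the intrinsics are stored at the top level under the keys `fl_x`, `fl_y`, `cx`, `cy`, `w` and `h`; `scale_intrinsics` is a hypothetical helper and the numbers in `example_transforms` are placeholders. Resizing the image, depth and confidence files themselves is not covered here.

```python
import copy

PIXEL_KEYS = ("fl_x", "fl_y", "cx", "cy", "w", "h")


def scale_intrinsics(transforms, factor):
    """Return a copy of transforms with pixel-based intrinsics scaled by factor."""
    scaled = copy.deepcopy(transforms)
    for key in PIXEL_KEYS:
        if key in scaled:
            scaled[key] = scaled[key] * factor
    # Image dimensions must stay whole numbers.
    for key in ("w", "h"):
        if key in scaled:
            scaled[key] = int(round(scaled[key]))
    return scaled


# Placeholder values, not taken from a real capture.
example_transforms = {"fl_x": 1400.0, "fl_y": 1400.0, "cx": 720.0, "cy": 960.0,
                      "w": 1440, "h": 1920, "frames": []}
half_res = scale_intrinsics(example_transforms, 0.5)
print(half_res["w"], half_res["h"], half_res["fl_x"])
```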



In [None]:
# Network config passed to --network; assumed to be the file edited in step 4
config_path = "./configs/nerf/base.json"
train_steps = 500
snapshot_path = os.path.join(scene_path, f"{train_steps}.ingp")
!python ./scripts/run.py {scene_path} --n_steps {train_steps} --save_snapshot {snapshot_path} --network {config_path}

16:35:59 SUCCESS  Initialized CUDA 12.5. Active GPU is #0: Tesla T4 [75]
16:35:59 INFO     Loading NeRF dataset from
16:35:59 INFO       /content/drive/MyDrive/nerf_scenes/corridor_record3d_small/transforms.json
16:36:00 PROGRESS []   3% ( 10/308) 0s/3s

## 8. Training a model on our scene




`train_steps` is the number of training iterations the network performs.

Rough guide to the resulting quality:

- 200-500 steps: quick test (poor quality, but very fast)
- 1000-3000 steps: decent
- 5000-10000 steps: good
- 20000+ steps: high
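
The guide can be turned into a small lookup for labelling snapshots. This is a sketch only: `expected_quality` is a hypothetical helper, and because the ranges leave gaps (for example between 500 and 1000 steps), it assumes each tier starts at its lower bound.

```python
def expected_quality(train_steps):
    """Map a step count to the rough quality tiers of this step."""
    if train_steps < 1000:
        return "quick test"
    if train_steps < 5000:
        return "decent"
    if train_steps < 20000:
        return "good"
    return "high"


for steps in (500, 2000, 5000, 20000):
    print(steps, expected_quality(steps))
```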

In [None]:
os.environ['TCNN_CUDA_ARCHITECTURES'] = '75'

In [None]:
train_steps = 500  #@param {type:"integer"}
snapshot_path = os.path.join(scene_path, f"{train_steps}.ingp")
!python ./scripts/run.py {scene_path} --n_steps {train_steps} --save_snapshot {snapshot_path}

21:33:15 SUCCESS  Initialized CUDA 12.5. Active GPU is #0: Tesla T4 [75]
21:33:15 INFO     Loading NeRF dataset from
21:33:15 INFO       /content/drive/MyDrive/nerf_scenes/corridor_record3d_small/transforms.json
21:33:15 PROGRESS []   5% ( 10/205) 0s/4s

In [None]:
train_steps = 200  # Start smaller
snapshot_path = os.path.join(scene_path, f"{train_steps}.ingp")

print(f"Starting training for {train_steps} steps...")
print(f"Will save to: {snapshot_path}")

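# Caution: these two lines try to free memory by killing the Drive mount and every
# python3 process. /content/drive becomes unavailable afterwards, and the notebook
# kernel itself may be restarted.
!pkill -f "drive"
!pkill -f "python3"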


!python ./scripts/run.py {scene_path} \
  --n_steps {train_steps} \
  --save_snapshot {snapshot_path} \
  --train

Starting training for 200 steps...
Will save to: /content/drive/MyDrive/nerf_scenes/corridor_record3d/200.ingp


In [None]:
# Create a much smaller dataset
import json
import os
import shutil

# Keep every 5th image to reduce memory usage
step = 5
small_scene_path = scene_path + "_tiny"
os.makedirs(f"{small_scene_path}/rgbd", exist_ok=True)

rgbd_files = sorted(f for f in os.listdir(f"{scene_path}/rgbd") if f.endswith('.jpg'))
selected_files = rgbd_files[::step]
selected_names = {os.path.splitext(f)[0] for f in selected_files}

# Copy the RGB image, depth map, and confidence map of each selected frame
for base_name in selected_names:
    for ext in ['.jpg', '.depth', '.conf']:
        src = f"{scene_path}/rgbd/{base_name}{ext}"
        dst = f"{small_scene_path}/rgbd/{base_name}{ext}"
        if os.path.exists(src):
            shutil.copy(src, dst)

# Write a transforms.json that lists only the copied frames; copying the original
# file unchanged would leave entries that point at images missing from the new folder
with open(f"{scene_path}/transforms.json") as f:
    transforms = json.load(f)
transforms['frames'] = [
    frame for frame in transforms['frames']
    if os.path.splitext(os.path.basename(frame['file_path']))[0] in selected_names
]
with open(f"{small_scene_path}/transforms.json", 'w') as f:
    json.dump(transforms, f, indent=2)

print(f"Created tiny dataset with {len(selected_files)} images (every {step}th)")

# Train on tiny dataset
train_steps = 500
snapshot_path = os.path.join(small_scene_path, f"{train_steps}.ingp")
!python ./scripts/run.py {small_scene_path} --n_steps {train_steps} --save_snapshot {snapshot_path}

Training is being killed by the Linux OOM (Out of Memory) killer due to cgroup memory limits. Colab has internal container memory limits that are being exceeded.  

The drive process (Google Drive mounting) is being killed, which suggests the training process is using too much memory and hitting Colab's internal limits.
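
To see how close the runtime is to that limit, the available memory can be read before starting a training run. The sketch below parses text in the format of `/proc/meminfo`. `parse_meminfo` and `available_gib` are hypothetical helpers, the sample numbers are illustrative, and the container's cgroup limit may be lower than what `/proc/meminfo` reports.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def available_gib(meminfo):
    """Convert the MemAvailable entry from kB to GiB."""
    return meminfo["MemAvailable"] / (1024 ** 2)


# Illustrative sample; on the runtime use open("/proc/meminfo").read() instead.
sample_meminfo = "MemTotal:       13290460 kB\nMemAvailable:   11534336 kB\n"
info = parse_meminfo(sample_meminfo)
print(f"{available_gib(info):.1f} GiB available")
```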

## 9. Generate a camera path (on local machine)
Download the trained snapshot (the `.ingp` file), open it on your local machine with `testbed`, and generate a `base_cam.json` file following these [instructions](https://github.com/NVlabs/instant-ngp#testbed-controls). Remember to launch with the `--no-train` argument so that it doesn't start training on your PC. Setting up the cameras can make the GUI quite laggy; you can try adjusting the `--height` and `--width` parameters or cropping your scene with the `Crop aabb` options to improve performance.

Example command:
```
./build/instant-ngp /data/nerf/fox/2000.ingp --no-train
```

After you're done, **upload `base_cam.json` to the root folder of your scene.**

## 10. Rendering video

Make sure `base_cam.json` exists:

In [None]:
video_camera_path = os.path.join(scene_path, "base_cam.json")
if not os.path.isfile(video_camera_path):
  raise FileNotFoundError(video_camera_path)

Render the video

In [None]:
video_n_seconds = 5 #@param {type:"integer"}
video_fps = 25 #@param {type:"integer"}
width = 720 #@param {type:"integer"}
height = 720 #@param {type:"integer"}
output_video_path = os.path.join(scene_path, "output_video.mp4")

!python scripts/run.py {snapshot_path} --video_camera_path {video_camera_path} --video_n_seconds {video_n_seconds} --video_fps {video_fps} --width {width} --height {height} --video_output {output_video_path}
print(f"Generated video saved to:\n{output_video_path}")
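
The render time grows with the number of frames, which is the video length multiplied by the frame rate. The sketch below assembles the same `scripts/run.py` arguments as a list and reports the frame count. `build_render_command` and `total_frames` are hypothetical helpers and the paths are placeholders; only the flags already used in the cell above are included.

```python
def total_frames(n_seconds, fps):
    """Number of frames in a video of the given length and frame rate."""
    return n_seconds * fps


def build_render_command(snapshot_path, camera_path, output_path,
                         n_seconds, fps, width, height):
    """Assemble the scripts/run.py invocation used for video rendering."""
    return [
        "python", "scripts/run.py", snapshot_path,
        "--video_camera_path", camera_path,
        "--video_n_seconds", str(n_seconds),
        "--video_fps", str(fps),
        "--width", str(width),
        "--height", str(height),
        "--video_output", output_path,
    ]


# Placeholder paths; the numbers are the form defaults from the cell above.
render_cmd = build_render_command(
    "scene/500.ingp", "scene/base_cam.json", "scene/output_video.mp4",
    n_seconds=5, fps=25, width=720, height=720,
)
print(" ".join(render_cmd))
print(f"{total_frames(5, 25)} frames to render")
```

The list form can be passed to `subprocess.run(render_cmd, check=True)` when running outside a notebook.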