# 3D Scene Coordinate Prediction and Pose Estimation Demo

This demo uses a pre-trained scene coordinate regression network to predict 3D scene coordinates for a single image and then estimates the camera pose from them with the DSAC\* algorithm. We load a reference point cloud, preprocess the image, run inference, and visualize the results in 2D and 3D.


In [1]:
import torch
from torch.cuda.amp import autocast
from vis_3d import init_figure, plot_points, plot_camera
from ace_network import Regressor
from dataset import CamLocDataset
import numpy as np
import dsacstar
from PIL import Image
from matplotlib import pyplot as plt


### Load Point Cloud Data

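First, we load the previously generated point cloud. The first three columns hold the XYZ positions, which we convert from the OpenGL to the OpenCV coordinate convention by negating the Y and Z axes; the remaining columns hold the per-point colors.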


In [2]:
# Load point cloud data
point_cloud_3d = np.loadtxt("results_folder/point_cloud_out.txt")
pc_xyz = point_cloud_3d[..., :3]

# Convert from OpenGL coordinate system to OpenCV coordinate system
pc_xyz[:, 1] = -pc_xyz[:, 1]
pc_xyz[:, 2] = -pc_xyz[:, 2]

# Per-point colors from the remaining columns, scaled by 0.7
color_N3 = point_cloud_3d[..., 3:] * 0.7
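
The two sign flips above are equivalent to multiplying every point by the diagonal matrix diag(1, -1, -1). A minimal sketch of that view, assuming an `(N, 3)` array of points; `opengl_to_opencv` is a hypothetical helper that is not part of this demo's codebase, and unlike the cell above it returns a new array instead of modifying the input in place.

```python
import numpy as np

# Hypothetical helper, not part of the demo's codebase.
# OpenGL cameras look down -Z with +Y up; OpenCV cameras look down +Z with +Y down,
# so converting between them negates the Y and Z axes.
FLIP_YZ = np.diag([1.0, -1.0, -1.0])

def opengl_to_opencv(points_n3):
    """Return a copy of (N, 3) points with the Y and Z axes negated."""
    points_n3 = np.asarray(points_n3, dtype=np.float64)
    return points_n3 @ FLIP_YZ

points = np.array([[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]])
converted = opengl_to_opencv(points)
```

Because the flip is its own inverse, applying the helper twice returns the original points, which makes it easy to sanity-check.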


### Load and Preprocess Image

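We load a single test image through `CamLocDataset`, which resizes it to a short side of 480 pixels and returns the matching intrinsics for an external focal length of 525. We also open the original image and keep a copy downsampled by a factor of 8, which is used later to color the predicted scene coordinates.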


In [3]:
# Image path
queue_image_path = "datasets/7scenes_redkitchen/test/rgb/seq-06-frame-000936.color.png"

# Create test dataset
test_dataset = CamLocDataset(queue_image_path, image_short_size=480)
test_dataset.set_external_focal_length(525)

# Retrieve data
image_1HW, _, _, _, intrinsics_33, _, _, filename, indice = test_dataset[0]

# Open original image
image_rgb = Image.open(queue_image_path)

# Get original size
original_size = image_rgb.size
width, height = original_size

# Downsample by a factor of 8 so the image can color the predicted scene coordinates
new_size = (width // 8, height // 8)
resized_image_rgb = np.asarray(image_rgb.resize(new_size))
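
For reference, a 3x3 pinhole intrinsics matrix like `intrinsics_33` can be assembled from a focal length and a principal point. The sketch below assumes square pixels, zero skew, and a principal point at the image center; `CamLocDataset` may build its matrix differently. Both function names are hypothetical, and the 640x480 image size is only an example value.

```python
import numpy as np

def build_intrinsics(focal_length, width, height):
    """Pinhole intrinsics, assuming square pixels and a centered principal point."""
    K = np.eye(3)
    K[0, 0] = focal_length
    K[1, 1] = focal_length
    K[0, 2] = width / 2.0   # principal point x
    K[1, 2] = height / 2.0  # principal point y
    return K

def scale_focal_length(focal_length, original_short_side, target_short_side):
    """Rescale a focal length (in pixels) when the image is resized."""
    return focal_length * target_short_side / original_short_side

# Example values: focal length 525 px, image kept at a short side of 480 px.
K = build_intrinsics(scale_focal_length(525.0, 480, 480), width=640, height=480)
```

If the image were resized to a short side of 240 pixels instead, the same focal length would scale to 262.5 pixels.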


### Load Pre-trained Network Models

We load the pre-trained weights for the encoder and head networks, and set up the model for evaluation.


In [4]:
# Network weight paths
encoder_path = "xfeat.pt"
head_network_path = "results_folder/ace_network.pt"

# Load network weights
encoder_state_dict = torch.load(encoder_path, map_location="cpu")
head_state_dict = torch.load(head_network_path, map_location="cpu")

# Create regressor
network = Regressor.create_from_split_state_dict(encoder_state_dict, head_state_dict)

# Move to GPU and set to evaluation mode
network = network.to('cuda')
network.eval()

# Move the image tensor to the GPU (no gradient context is needed for a device transfer)
image_1HW = image_1HW.to('cuda', non_blocking=True)


### Predict Scene Coordinates

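We run the network on the input image under automatic mixed precision to predict a tensor of 3D scene coordinates, then read the focal length and principal point from the intrinsics matrix for the pose solver.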


In [5]:
with torch.no_grad():
    # Perform inference with automatic mixed precision
    with autocast(enabled=True):
        scene_coordinates_3HW = network(image_1HW.unsqueeze(0))[0]
    
    # Move to CPU and convert to float
    scene_coordinates_3HW = scene_coordinates_3HW.float().cpu()
    
    # Extract intrinsic parameters
    focal_length = intrinsics_33[0, 0].item()
    ppX = intrinsics_33[0, 2].item()
    ppY = intrinsics_33[1, 2].item()


### Estimate Pose Using DSAC\*

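We use the DSAC\* algorithm to compute the camera pose from the predicted scene coordinates. The solver writes the estimated pose into `out_pose` in place, and the return value is stored as the inlier count.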


In [None]:
# Initialize output pose matrix
out_pose = torch.zeros((4, 4))

# Estimate pose using DSAC*
inlier_count = dsacstar.forward_rgb(
    scene_coordinates_3HW.unsqueeze(0),
    out_pose,
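    64,                    # Number of RANSAC hypotheses
    10,                    # Inlier threshold (reprojection error in pixels)
    focal_length,
    ppX,
    ppY,
    100,                   # Inlier alpha (softness of the inlier score)
    100,                   # Maximum reprojection error in pixels (errors are clamped to this)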
    network.OUTPUT_SUBSAMPLE,
    1305                   # Random seed
)
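
To make the inlier threshold concrete, the following simplified sketch counts hard inliers for a given pose by reprojecting the scene coordinates into the image. It is not the `dsacstar` implementation. It assumes the pose is a 4x4 camera-to-world matrix, the scene coordinates are a `(3, H, W)` NumPy array, and each output cell corresponds to the pixel at the center of its `subsample` x `subsample` block. `count_inliers` and the synthetic data are hypothetical.

```python
import numpy as np

def count_inliers(scene_coords_3hw, cam_to_world_44, K, subsample=8, threshold_px=10.0):
    """Count scene coordinates whose reprojection error is below threshold_px."""
    _, h, w = scene_coords_3hw.shape
    # Assumed pixel position of each output cell: the center of its block.
    xs = (np.arange(w) + 0.5) * subsample
    ys = (np.arange(h) + 0.5) * subsample
    grid_x, grid_y = np.meshgrid(xs, ys)                          # both (h, w)
    pixels = np.stack([grid_x.ravel(), grid_y.ravel()], axis=1)   # (N, 2)

    # Transform world points into the camera frame.
    world = scene_coords_3hw.reshape(3, -1)                       # (3, N)
    world_to_cam = np.linalg.inv(cam_to_world_44)
    cam = world_to_cam[:3, :3] @ world + world_to_cam[:3, 3:4]    # (3, N)

    # Project points in front of the camera and measure the pixel error.
    in_front = cam[2] > 1e-6
    z = np.where(in_front, cam[2], 1.0)
    projected = (K @ cam)[:2] / z                                 # (2, N)
    errors = np.linalg.norm(projected.T - pixels, axis=1)
    return int(np.count_nonzero(in_front & (errors < threshold_px)))

# Synthetic check: back-project a 2x4 grid at depth 2 under an identity pose.
K = np.array([[100.0, 0.0, 16.0], [0.0, 100.0, 8.0], [0.0, 0.0, 1.0]])
h, w, depth = 2, 4, 2.0
gx, gy = np.meshgrid((np.arange(w) + 0.5) * 8, (np.arange(h) + 0.5) * 8)
coords = np.stack([(gx - 16.0) / 100.0 * depth,
                   (gy - 8.0) / 100.0 * depth,
                   np.full((h, w), depth)])
perfect = count_inliers(coords, np.eye(4), K)
shifted_pose = np.eye(4)
shifted_pose[0, 3] = 5.0  # move the camera 5 units along X
shifted = count_inliers(coords, shifted_pose, K)
```

With the correct pose every point reprojects onto its own cell, while the shifted pose moves all projections far beyond the 10 pixel threshold.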


### Visualize Prediction Results

We use Matplotlib to display the original image next to the predicted scene coordinates, which are rendered as an RGB image after rescaling by their 25th and 75th percentiles.


In [None]:
# Create subplots
fig, axes = plt.subplots(1, 2, figsize=(12, 6))

# Convert scene coordinates to NumPy array
image_np = scene_coordinates_3HW.permute(1, 2, 0).numpy()

# Map the interquartile range to [0, 1] for display; imshow clips values outside that range
lower = np.percentile(image_np, 25)
upper = np.percentile(image_np, 75)
sc = (image_np - lower) / (upper - lower)

# Display original image
axes[0].imshow(image_rgb)
axes[0].set_title("Original Image")
axes[0].axis('off')

# Display normalized scene coordinates
axes[1].imshow(sc)
axes[1].set_title("Scene Coordinates")
axes[1].axis('off')

plt.tight_layout()
plt.show()
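
The cell above uses one pair of percentiles for all three axes and leaves clipping to `imshow`. A possible variant, shown here only as a sketch, normalizes each axis separately and clips explicitly so the result is always a valid RGB image. `normalize_for_display` is a hypothetical helper and the random input is made-up demo data.

```python
import numpy as np

def normalize_for_display(coords_hw3, low_pct=25.0, high_pct=75.0):
    """Rescale each channel so [low_pct, high_pct] maps to [0, 1], then clip."""
    coords_hw3 = np.asarray(coords_hw3, dtype=np.float64)
    lower = np.percentile(coords_hw3, low_pct, axis=(0, 1), keepdims=True)
    upper = np.percentile(coords_hw3, high_pct, axis=(0, 1), keepdims=True)
    span = np.maximum(upper - lower, 1e-12)  # guard against constant channels
    return np.clip((coords_hw3 - lower) / span, 0.0, 1.0)

# Made-up data with very different ranges per axis.
rng = np.random.default_rng(0)
demo = rng.normal(size=(6, 8, 3)) * np.array([1.0, 10.0, 100.0])
out = normalize_for_display(demo)
```

Per-axis scaling keeps one axis with a large spread from washing out the other two, at the cost of no longer showing the axes on a common scale.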


### 3D Visualization

Using the `vis_3d` helpers, we plot the predicted scene coordinates, the estimated camera pose, and the reference point cloud together in one 3D figure.


In [None]:
# Initialize 3D figure
fig = init_figure()

# Plot predicted scene coordinates, colored by the downsampled image
plot_points(
    fig,
    scene_coordinates_3HW.view(3, -1).permute(1, 0).numpy(),
    resized_image_rgb.reshape((-1, 3)),
)

# Plot camera pose
plot_camera(
    fig, 
    R=out_pose[:3, :3].numpy(), 
    t=out_pose[:3, 3].numpy(), 
    K=intrinsics_33.numpy(), 
    color='rgb(255, 255, 255)', 
    size=3, 
    fill=True, 
    text='queue_image'
)

# Plot original point cloud
plot_points(fig, pc_xyz, color_N3)

# Display the figure
fig.show()
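
The first `plot_points` call relies on the scene coordinates and the downsampled image being flattened in the same row-major order, so that point `n` receives the color of the pixel it was predicted for. A small NumPy sketch of that correspondence, assuming the downsampled image has exactly the same height and width as the network output; `flatten_coords_and_colors` is a hypothetical helper.

```python
import numpy as np

def flatten_coords_and_colors(coords_3hw, image_hw3):
    """Flatten (3, H, W) coordinates and an (H, W, 3) image into aligned (N, 3) arrays."""
    if coords_3hw.shape[1:] != image_hw3.shape[:2]:
        raise ValueError("coordinates and image must share the same height and width")
    points_n3 = coords_3hw.reshape(3, -1).T  # row n holds the XYZ of cell n
    colors_n3 = image_hw3.reshape(-1, 3)     # row n holds the RGB of pixel n
    return points_n3, colors_n3

# Toy data: the values encode their own position so the alignment is easy to check.
h, w = 2, 3
coords = np.arange(3 * h * w, dtype=np.float64).reshape(3, h, w)
image = np.arange(h * w * 3).reshape(h, w, 3)
points, colors = flatten_coords_and_colors(coords, image)
```

In both arrays, the entry for row `r` and column `c` ends up at index `r * w + c`, and a size mismatch between the two inputs raises an error instead of silently pairing points with the wrong colors.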
