# Digital Twin from NYU2_train Images

This notebook processes images from the `nyu2_train` folder to create test samples of a digital twin. The pipeline uses:

- **MiDaS** for depth estimation
- **MediaPipe** for pose (skeleton) detection
- **Open3D** to build a 3D point cloud from the RGB‑D image
- **Plotly** to visualize the 3D point cloud and overlaid skeleton

Once you have verified the results on static images, you can extend the pipeline to process live video.

In [3]:
!apt-get update
!apt-get install -y libosmesa6-dev libgl1-mesa-glx libglfw3

!pip install opencv-python-headless mediapipe open3d torch torchvision plotly

Reading package lists... Done
E: Could not open lock file /var/lib/apt/lists/lock - open (13: Permission denied)
E: Unable to lock directory /var/lib/apt/lists/
W: Problem unlinking the file /var/cache/apt/pkgcache.bin - RemoveCaches (13: Permission denied)
W: Problem unlinking the file /var/cache/apt/srcpkgcache.bin - RemoveCaches (13: Permission denied)
E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?
Defaulting to user installation because normal site-packages is not writeable
Collecting opencv-python-headless
  Downloading opencv_python_headless-4.11.0.86-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (50.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50.0/50.0 MB 4.6 MB/s eta 0:00:00
Collecting mediapipe
  Downloading mediapipe-0.10.21-cp310-cp310-manylinux_2_28_x86_64.whl (35.

In [None]:
import os
import cv2
import torch
import numpy as np
import mediapipe as mp
import open3d as o3d
import plotly.graph_objects as go
import matplotlib.pyplot as plt
from torchvision import transforms
from IPython.display import display, clear_output

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)
torch.backends.cudnn.benchmark = True

## Load MiDaS Model

We use the small MiDaS model for faster inference.

In [None]:
model_type = "MiDaS_small"
midas = torch.hub.load("intel-isl/MiDaS", model_type)
midas.to(device)
midas.eval()

midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.small_transform if model_type == "MiDaS_small" else midas_transforms.default_transform
print("MiDaS model loaded.")

## Setup MediaPipe Pose

We use MediaPipe in static image mode.

In [None]:
mp_pose = mp.solutions.pose
pose_estimator = mp_pose.Pose(static_image_mode=True,
                              model_complexity=1,
                              min_detection_confidence=0.5,
                              min_tracking_confidence=0.5)
pose_connections = mp_pose.POSE_CONNECTIONS
print("MediaPipe Pose loaded.")

## Utility Functions

We define a function to backproject 2D pixel coordinates into 3D space using camera intrinsics.

In [None]:
def backproject(u, v, depth_value, fx, fy, ppx, ppy):
    z = depth_value
    x = (u - ppx) * z / fx
    y = (v - ppy) * z / fy
    return np.array([x, y, z])
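
As a quick sanity check of the pinhole model behind `backproject`, the sketch below backprojects two pixels and projects one of them back to the image. The image size, the 90 degree horizontal field of view, and the helpers `focal_from_fov` and `project` are assumptions introduced for illustration only; `backproject` is repeated so the block runs on its own.

```python
import numpy as np

def focal_from_fov(width_px, hfov_deg):
    """Focal length in pixels for a pinhole camera with the given horizontal FOV."""
    return (width_px / 2.0) / np.tan(np.radians(hfov_deg) / 2.0)

def backproject(u, v, depth_value, fx, fy, ppx, ppy):
    """Pixel (u, v) at depth z -> camera-frame point (x, y, z)."""
    z = depth_value
    x = (u - ppx) * z / fx
    y = (v - ppy) * z / fy
    return np.array([x, y, z])

def project(point, fx, fy, ppx, ppy):
    """Inverse of backproject: camera-frame point -> pixel (u, v)."""
    x, y, z = point
    return fx * x / z + ppx, fy * y / z + ppy

# Hypothetical camera: 640x480 image, 90 degree horizontal field of view.
width, height = 640, 480
fx = fy = focal_from_fov(width, 90.0)
ppx, ppy = width / 2.0, height / 2.0

# The principal point lies on the optical axis, so x and y are zero there.
center = backproject(ppx, ppy, 2.0, fx, fy, ppx, ppy)

# The top-left pixel at the same depth lands left of and above the axis.
corner = backproject(0, 0, 2.0, fx, fy, ppx, ppy)

# Projecting the 3D point again should recover the original pixel.
u_back, v_back = project(corner, fx, fy, ppx, ppy)
print(center, corner, (u_back, v_back))
```

Under this model a larger focal length (narrower field of view) pulls backprojected points closer to the optical axis for the same depth, so the choice of `fx` and `fy` directly controls how wide the reconstructed scene appears.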

## Load Test Images

Make sure to upload your test images (e.g., from the NYU2_train dataset) into a folder named `nyu2_train` in Colab. The following cell lists all image files in that folder.

In [None]:
image_folder = "nyu2_train"  # Folder containing your test images
if not os.path.exists(image_folder):
    raise ValueError(f"Folder '{image_folder}' does not exist. Please upload your images there.")

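# Sort the paths so the "first" image is the same on every run (os.listdir order is arbitrary)
image_files = sorted(os.path.join(image_folder, f) for f in os.listdir(image_folder) if f.lower().endswith(('.png', '.jpg', '.jpeg')))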
print(f"Found {len(image_files)} images in '{image_folder}'")
image_files[:5]  # Display first 5 file paths

## Process a Single Image

For testing, we process the first image in the list. The steps are:
1. Load and optionally downscale the image.
2. Run MiDaS to estimate the depth map.
3. Create an RGB-D image and generate a point cloud using Open3D.
4. Run MediaPipe to extract pose landmarks and backproject them to 3D.
5. Visualize the digital twin using Plotly.

In [None]:
# Select the first image for testing
if len(image_files) == 0:
    raise ValueError("No images found in the specified folder.")
test_image_path = image_files[0]
print("Processing image:", test_image_path)

# Load image
frame = cv2.imread(test_image_path)
if frame is None:
    raise ValueError("Could not load the image.")

# Optional: Downscale image for faster processing
downscale_factor = 1.0  # Adjust (e.g., 0.5 for half resolution)
orig_height, orig_width, _ = frame.shape
proc_width = int(orig_width * downscale_factor)
proc_height = int(orig_height * downscale_factor)
frame_proc = cv2.resize(frame, (proc_width, proc_height), interpolation=cv2.INTER_AREA)
frame_rgb = cv2.cvtColor(frame_proc, cv2.COLOR_BGR2RGB)

print("Image loaded and preprocessed.")

# --- Depth Estimation using MiDaS ---
with torch.no_grad():
    input_batch = transform(frame_rgb).to(device)
    prediction = midas(input_batch)
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=(proc_height, proc_width),
        mode="bilinear",
        align_corners=False
    ).squeeze()
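if device.type == "cuda":
    torch.cuda.synchronize()  # skip on CPU-only machines, where this call fails
depth_map = prediction.cpu().detach().numpy()
# MiDaS predicts relative inverse depth (larger = closer); min-max scaling yields a relative map, not metric depth
depth_map_norm = cv2.normalize(depth_map, None, 0, 1, norm_type=cv2.NORM_MINMAX)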
print("Depth estimation complete.")

# --- Create Open3D Point Cloud ---
o3d_color = o3d.geometry.Image(frame_rgb)
o3d_depth = o3d.geometry.Image((depth_map_norm * 1000).astype(np.uint16))
rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
    o3d_color, o3d_depth,
    depth_scale=1000.0,
    convert_rgb_to_intensity=False
)

# Approximate camera intrinsics (using processed resolution)
fx = fy = proc_width  # Simplistic assumption
ppx = proc_width / 2
ppy = proc_height / 2
intrinsic = o3d.camera.PinholeCameraIntrinsic(proc_width, proc_height, fx, fy, ppx, ppy)

pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsic)
pcd.transform([[1, 0, 0, 0],
               [0, -1, 0, 0],
               [0, 0, -1, 0],
               [0, 0, 0, 1]])
print("3D point cloud created.")

# --- Pose Estimation using MediaPipe ---
results = pose_estimator.process(frame_rgb)
keypoints_3d = []
if results.pose_landmarks:
    for landmark in results.pose_landmarks.landmark:
        u = int(landmark.x * proc_width)
        v = int(landmark.y * proc_height)
        u_clamped = np.clip(u, 0, proc_width - 1)
        v_clamped = np.clip(v, 0, proc_height - 1)
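        depth_val = depth_map_norm[v_clamped, u_clamped]
        # Use the point cloud's depth units (normalized depth read as meters)
        # so the skeleton shares the cloud's scale.
        x, y, z = backproject(u, v, depth_val, fx, fy, ppx, ppy)
        # Apply the same y/z flip that pcd.transform applied to the point cloud.
        keypoints_3d.append(np.array([x, -y, -z]))
    print("3D skeleton created.")
else:
    # No person detected: fall back to 33 placeholder landmarks at the origin.
    keypoints_3d = [np.array([0, 0, 0]) for _ in range(33)]
    print("No pose detected; using placeholder keypoints.")
keypoints_3d = np.array(keypoints_3d)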
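
MiDaS predicts relative inverse depth (larger values are closer), defined only up to an unknown scale and shift, so the min-max normalized map in the cell above orders pixels from far to near rather than giving distances. The sketch below shows one possible way to turn such a map into a bounded pseudo-depth. The helper name `inverse_depth_to_pseudo_depth` and the `near` and `far` scene limits are assumptions for illustration; they are not recovered from the model and would need tuning per scene.

```python
import numpy as np

def inverse_depth_to_pseudo_depth(inv_depth, near=0.5, far=5.0):
    """Map relative inverse depth (larger = closer) to a depth in [near, far].

    near and far are assumed scene bounds in meters, not measured values.
    """
    inv = np.asarray(inv_depth, dtype=np.float64)
    lo, hi = inv.min(), inv.max()
    if hi - lo < 1e-12:
        # Flat prediction: no ordering information, place everything at `far`.
        return np.full(inv.shape, float(far))
    t = (inv - lo) / (hi - lo)  # 0 = farthest pixel, 1 = closest pixel
    # Interpolate linearly in inverse-depth space between 1/far and 1/near.
    inv_bounded = (1.0 / far) + t * (1.0 / near - 1.0 / far)
    return 1.0 / inv_bounded

# Stand-in for a raw MiDaS prediction (arbitrary units).
demo_prediction = np.array([[0.0, 5.0],
                            [10.0, 2.5]])
pseudo_depth = inverse_depth_to_pseudo_depth(demo_prediction)
print(pseudo_depth)
```

The result is still not metric depth; it only guarantees that closer pixels receive smaller values and that all values stay within the assumed range.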

## Visualize Digital Twin using Plotly

We create an interactive 3D scatter plot of the point cloud and overlay the skeleton as line traces.

In [None]:
# Convert point cloud to NumPy arrays
pts = np.asarray(pcd.points)
if len(pcd.colors) > 0:
    colors = np.asarray(pcd.colors)
else:
    colors = np.ones((pts.shape[0], 3))

pcd_trace = go.Scatter3d(
    x=pts[:, 0],
    y=pts[:, 1],
    z=pts[:, 2],
    mode='markers',
    marker=dict(
        size=1,
        color=['rgb({},{},{})'.format(int(c[0]*255), int(c[1]*255), int(c[2]*255)) for c in colors],
        opacity=0.8
    ),
    name='Point Cloud'
)

line_traces = []
for connection in pose_connections:
    start_idx, end_idx = connection
    if start_idx < len(keypoints_3d) and end_idx < len(keypoints_3d):
        p0 = keypoints_3d[start_idx]
        p1 = keypoints_3d[end_idx]
        line_trace = go.Scatter3d(
            x=[p0[0], p1[0]],
            y=[p0[1], p1[1]],
            z=[p0[2], p1[2]],
            mode='lines',
            line=dict(color='green', width=5),
            showlegend=False
        )
        line_traces.append(line_trace)

fig = go.Figure(data=[pcd_trace] + line_traces)
fig.update_layout(scene=dict(aspectmode='data'),
                  title="Digital Twin - 3D Model from Image")
fig.show()
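
A full-resolution cloud can hold hundreds of thousands of points (a 640×480 image yields up to 307,200), which can make the Plotly figure slow to render. A minimal sketch of random subsampling before plotting follows; the helper name `subsample_points` and the 50,000-point budget are assumptions, and the random arrays stand in for `np.asarray(pcd.points)` and `np.asarray(pcd.colors)`.

```python
import numpy as np

def subsample_points(points, colors, max_points=50_000, seed=0):
    """Return at most max_points rows, keeping points and colors aligned."""
    points = np.asarray(points)
    colors = np.asarray(colors)
    if len(points) <= max_points:
        return points, colors
    rng = np.random.default_rng(seed)
    # Draw row indices without replacement so no point is duplicated.
    idx = rng.choice(len(points), size=max_points, replace=False)
    return points[idx], colors[idx]

# Stand-in data with the same shape as a 640x480 point cloud.
demo_pts = np.random.default_rng(1).random((307_200, 3))
demo_cols = np.random.default_rng(2).random((307_200, 3))

small_pts, small_cols = subsample_points(demo_pts, demo_cols)
print(small_pts.shape, small_cols.shape)
```

The returned arrays can be passed to the `go.Scatter3d` trace in place of `pts` and `colors`. A fixed seed keeps the selection reproducible between runs.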