# Overview

**Monocular Depth Estimation** is the task of estimating the depth value (distance relative to the camera) of each pixel given a single (monocular) RGB image. This challenging task is a key prerequisite for scene understanding in applications such as 3D scene reconstruction, autonomous driving, and AR. State-of-the-art methods usually fall into one of two categories:

* Designing a complex network that is powerful enough to directly regress the depth map
* Splitting the input into bins or windows to reduce computational complexity

The most popular benchmarks are the KITTI and NYUv2 datasets. Models are typically evaluated using RMSE or absolute relative error.
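As a rough illustration of the two metrics named above, here is a minimal NumPy sketch. The function names `rmse` and `abs_rel` are hypothetical helpers, and treating a ground-truth value of 0 as "no valid depth" is an assumption that depends on the dataset.

```python
import numpy as np

def rmse(pred, gt):
    """Root mean squared error between predicted and ground-truth depth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    return float(np.sqrt(np.mean((pred - gt) ** 2)))

def abs_rel(pred, gt):
    """Mean of |pred - gt| / gt over pixels with valid ground truth.

    Assumption: gt == 0 marks a pixel with no depth reading, so it is skipped
    to avoid dividing by zero.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

# Tiny synthetic example (not real depth data)
gt = np.array([[1.0, 2.0], [4.0, 0.0]])
pred = np.array([[1.0, 3.0], [2.0, 1.0]])
print("RMSE:", rmse(pred, gt))
print("AbsRel:", abs_rel(pred, gt))
```

Both metrics are lower-is-better. The absolute relative error weights each pixel's error by its true depth, so the same absolute error counts for more on nearby surfaces than on distant ones.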

In this notebook, we will illustrate how to load and visualize depth map data, run monocular depth estimation models, and evaluate depth predictions. We will do so using data from the [SUN RGB-D](https://rgbd.cs.princeton.edu/) dataset.

We will use the Hugging Face transformers and diffusers libraries for inference, FiftyOne for data management and visualization, and scikit-image for evaluation metrics.

In [None]:
!pip install transformers==4.36.2
!pip install diffusers==0.23.1
!pip install fiftyone==0.23.4
!pip install scikit-image==0.22.0

# Loading and Visualizing SUN RGB-D Depth Data

SUN RGB-D is one of the most popular datasets for monocular depth estimation and semantic segmentation tasks. It contains images from the NYU Depth v2, Berkeley B3DO, and SUN3D datasets. Here we will only use the NYU Depth v2 portion. For reference, see the [`sayakpaul/nyu_depth_v2`](https://huggingface.co/datasets/sayakpaul/nyu_depth_v2) dataset on Hugging Face.


## Downloading the Raw Data

In [None]:
!curl -o sunrgbd.zip https://rgbd.cs.princeton.edu/data/SUNRGBD.zip

From the archive, we will only need the RGB images and the depth images (stored in the `depth_bfx` sub-directories).

In [None]:
!unzip sunrgbd.zip
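Once the archive is unpacked, it can help to confirm that each scene directory actually contains both an `image` and a `depth_bfx` sub-directory before building the dataset. The sketch below is one way to do that; `find_scene_files` is a hypothetical helper, and the root path assumes the archive was extracted into the current working directory.

```python
from pathlib import Path

def find_scene_files(scene_dir):
    """Return (image_path, depth_path) for one scene, or None if either is missing."""
    scene = Path(scene_dir)
    images = sorted((scene / "image").glob("*"))
    depths = sorted((scene / "depth_bfx").glob("*"))
    if not images or not depths:
        return None
    return images[0], depths[0]

# Assumed extraction location; adjust if you unzipped elsewhere
root = Path("SUNRGBD/kv1/NYUdata")
if root.is_dir():
    scenes = sorted(p for p in root.iterdir() if p.is_dir())
    complete = [s for s in scenes if find_scene_files(s) is not None]
    print(f"{len(complete)} of {len(scenes)} scenes have both an image and a depth map")
else:
    print(f"{root} not found; has the archive been unzipped?")
```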

## Create the Dataset

Since we are just interested in getting the point across, we will restrict ourselves to the first 20 samples, which are all from the NYU Depth v2 portion of the dataset:

In [None]:
from glob import glob
import fiftyone as fo
import numpy as np
from PIL import Image  # needed for Image.open below

## create, name and persist the dataset
dataset=fo.Dataset(name='SUNRGBD-20', persistent=True)

## pick out first 20 scenes
scene_dirs=sorted(glob('SUNRGBD/kv1/NYUdata/*'))[:20]  # sorted so "first 20" is deterministic

samples=[]

for scene_dir in scene_dirs:
    # get image file path from scene directory
    image_path=glob(f'{scene_dir}/image/*')[0]
    depth_path=glob(f'{scene_dir}/depth_bfx/*')[0]
    
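    # load as float so that scaling by 255 cannot overflow an integer dtype
    depth_map=np.array(Image.open(depth_path)).astype('float64')
    # rescale so that the farthest pixel maps to 255, then store as 8-bit
    depth_map=(depth_map*255/np.max(depth_map)).astype('uint8')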
    
    ## create sample
    sample=fo.Sample(
        filepath=image_path,
        gt_depth=fo.Heatmap(map=depth_map),
    )
    
    samples.append(sample)

## Add samples to dataset
dataset.add_samples(samples)

We are storing the depth maps as heatmaps. Everything is represented in terms of normalized, relative distances, where 255 represents the maximum distance in the scene and 0 represents the minimum distance in the scene. This is a common way to represent depth maps, although it is far from the only way to do so.
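To make the normalization concrete, here is a small sketch on synthetic values. The helper `to_uint8_heatmap` is hypothetical. With `min_max=False` it scales by the maximum only, as the dataset cell does, so a raw value of 0 stays at 0. The `min_max=True` variant is an assumption added for comparison, not something the notebook uses; it pins the closest pixel to exactly 0.

```python
import numpy as np

def to_uint8_heatmap(depth, min_max=False):
    """Rescale a raw depth array to the 0-255 range and return it as uint8.

    min_max=False: divide by the maximum only (farthest pixel -> 255).
    min_max=True:  also subtract the minimum (closest pixel -> 0).
    """
    depth = np.asarray(depth, dtype=np.float64)  # float avoids integer overflow
    lo = depth.min() if min_max else 0.0
    span = depth.max() - lo
    if span == 0:
        # constant (or all-zero) map: nothing to rescale
        return np.zeros(depth.shape, dtype=np.uint8)
    return ((depth - lo) * 255 / span).astype(np.uint8)

# Synthetic 16-bit "depth" values, not real sensor data
raw = np.array([[1000, 2000], [3000, 4000]], dtype=np.uint16)
max_scaled = to_uint8_heatmap(raw)
min_max_scaled = to_uint8_heatmap(raw, min_max=True)
print(max_scaled)
print(min_max_scaled)
```

Either way the result is relative to the scene: the same value in two different images does not correspond to the same metric distance.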

## Visualizing Ground Truth Data

In [None]:
# Open localhost:5151 in a browser to view the app
session=fo.launch_app(dataset, auto=False)