# ScanNet

This notebook lets you instantiate the **[ScanNet](http://www.scan-net.org/)** dataset from scratch and visualize **3D+2D room samples**.

Note that you will need **at least 1.2T** available for the ScanNet raw dataset and **at least 64G** for the processed files at **5cm voxel resolution** and **320x240 image resolution**.

The ScanNet dataset is composed of **rooms**, each captured as a video acquisition of an indoor scene. These video streams were used to produce a point cloud and images.

Each room is small enough to be loaded at once into **64G** of RAM. The `ScannetDatasetMM` class from `torch_points3d.datasets.segmentation.multimodal.scannet` handles loading a room together with part of the images of the associated video stream.

In [None]:
# Select your GPU
I_GPU = 0

In [None]:
# Uncomment to use autoreload
# %load_ext autoreload
# %autoreload 2

import os
import os.path as osp
import sys
import torch
import numpy as np
from time import time
from omegaconf import OmegaConf
start = time()
import warnings
warnings.filterwarnings('ignore')

torch.cuda.set_device(I_GPU)
DIR = os.path.dirname(os.getcwd())
ROOT = os.path.join(DIR, "..")
sys.path.insert(0, ROOT)
sys.path.insert(0, DIR)

from torch_points3d.utils.config import hydra_read
from torch_geometric.data import Data
from torch_points3d.core.multimodal.data import MMData
from torch_points3d.visualization.multimodal_data import visualize_mm_data
from torch_points3d.core.multimodal.image import SameSettingImageData, ImageData
from torch_points3d.datasets.segmentation.multimodal.scannet import ScannetDatasetMM

If `visualize_mm_data` does not throw any error but the visualization does not appear, you may need to change your plotly renderer below.

In [None]:
import plotly.io as pio

pio.renderers.default = 'jupyterlab'        # for local notebook
# pio.renderers.default = 'iframe_connected'  # for remote notebook. Other working (but seemingly slower) options are: 'sphinx_gallery' and 'iframe'

## Dataset creation

The following will instantiate the dataset. If the data is not found at `DATA_ROOT`, the folder structure will be created there and the raw dataset will be downloaded there. 

**Memory-friendly tip**: if you have already downloaded the dataset once and simply want to instantiate a new dataset with different preprocessing (*e.g.* a different 3D or 2D resolution, mapping parameterization, etc.), I recommend you manually replicate the folder hierarchy of your existing dataset and create a symlink to its `raw/` directory, to avoid downloading and storing (very) large files twice.
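
The symlink tip above could be sketched as follows. This is only an illustration under assumptions: the helper `link_raw_dir` and the `scannet-sparse` folder name are hypothetical, so check the actual folder hierarchy under your existing dataset root before reusing it. The demonstration runs on throwaway temporary directories.

```python
import os
import tempfile


def link_raw_dir(existing_root, new_root, dataset_dir="scannet-sparse"):
    """Create `new_root/dataset_dir/raw` as a symlink to the existing raw folder.

    `dataset_dir` is a hypothetical folder name; use whatever folder your
    existing dataset root actually contains.
    """
    src = os.path.join(existing_root, dataset_dir, "raw")
    dst_parent = os.path.join(new_root, dataset_dir)
    dst = os.path.join(dst_parent, "raw")
    if not os.path.isdir(src):
        raise FileNotFoundError(f"No raw directory at {src}")
    # Replicate the folder hierarchy, then link instead of copying
    os.makedirs(dst_parent, exist_ok=True)
    if not os.path.lexists(dst):
        os.symlink(os.path.abspath(src), dst, target_is_directory=True)
    return dst


# Demonstration on throwaway directories, so no real data is touched
with tempfile.TemporaryDirectory() as tmp:
    old_root = os.path.join(tmp, "old_root")
    new_root = os.path.join(tmp, "new_root")
    os.makedirs(os.path.join(old_root, "scannet-sparse", "raw"))
    link = link_raw_dir(old_root, new_root)
    link_is_symlink = os.path.islink(link)
    link_target_exists = os.path.isdir(link)
```

Pointing `DATA_ROOT` at the new root would then let preprocessing write its own processed files while the raw files stay stored only once.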

You will find the config file ruling the dataset creation at `conf/data/segmentation/multimodal/scannet-sparse.yaml`. You may edit this file or create new configs inheriting from this one using Hydra and create the associated dataset by modifying `dataset_config` accordingly in the following cell.

In [None]:
# Set your dataset root directory, where the data was/will be downloaded
DATA_ROOT = '/path/to/your/dataset/root/directory'

dataset_config = 'segmentation/multimodal/scannet-sparse'   
models_config = 'segmentation/multimodal/sparseconv3d'    # this does not really matter here, but is expected by hydra for config parsing
model_name = 'Res16UNet34-L4-early-ade20k-interpolate'    # this does not really matter here, but is expected by hydra for config parsing


overrides = [
    'task=segmentation',
    f'data={dataset_config}',
    f'models={models_config}',
    f'model_name={model_name}',
    f'data.dataroot={DATA_ROOT}',
]

cfg = hydra_read(overrides)
# print(OmegaConf.to_yaml(cfg))

The dataset will now be created based on the parsed configuration. I recommend having **at least 1.2T** available for the ScanNet raw dataset and **at least 64G** for the processed files at **5cm voxel resolution** and **320x240 image resolution**.
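
Before launching a long download, it may be worth checking the free space on the target disk. The sketch below uses only the standard library; the helper `has_free_space` is hypothetical, and the byte counts assume decimal units for the 1.2T and 64G figures quoted above.

```python
import shutil


def has_free_space(path, required_bytes):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes


GB = 10 ** 9                   # decimal units assumed
RAW_BYTES = 12 * 10 ** 11      # 1.2T for the raw ScanNet download
PROCESSED_BYTES = 64 * GB      # processed files at 5cm / 320x240

# Replace "." with DATA_ROOT (or its closest existing parent directory)
enough = has_free_space(".", RAW_BYTES + PROCESSED_BYTES)
print(f"Enough space for raw + processed data: {enough}")
```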

As long as you do not change core dataset parameters, preprocessing should only be performed once for your dataset. It may take some time, **mostly depending on the 3D and 2D resolutions** you choose to work with (the finer the resolution, the slower the preprocessing).

In [None]:
# Dataset instantiation
start = time()
dataset = ScannetDatasetMM(cfg.data)
# print(dataset)
print(f"Time = {time() - start:0.1f} sec.")

To visualize the multimodal samples produced by the dataset, we need to remove some of the dataset transforms that affect points, images and mappings.

At training and evaluation time, these transforms are used for data augmentation, dynamic size batching (see our [paper](https://arxiv.org/submit/4264152)), etc.

In [None]:
# Drop some 3D and 2D transforms to allow visualizations
dataset.train_dataset.transform.transforms = dataset.train_dataset.transform.transforms[4:-1]
dataset.train_dataset.transform_image.transforms = dataset.train_dataset.transform_image.transforms[:3]

dataset.val_dataset.transform.transforms = dataset.val_dataset.transform.transforms[:2]
dataset.val_dataset.transform_image.transforms = dataset.val_dataset.transform_image.transforms[:3]

dataset.test_dataset[0].transform.transforms = dataset.test_dataset[0].transform.transforms[:2]
dataset.test_dataset[0].transform_image.transforms = dataset.test_dataset[0].transform_image.transforms[:3]
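
The slicing above works because a composed transform stores its steps in a plain list, so keeping a sub-list keeps only those steps. Here is a toy sketch of the idea: the `Compose` class and the three transforms are hypothetical stand-ins, not the ones used by the dataset. Which indices to keep depends on the transforms listed in your config, so the slices in the cell above may need adjusting if you edit it.

```python
class Compose:
    """Hypothetical stand-in for a transform pipeline with a `transforms` list."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, data):
        # Apply each transform in order, feeding the output to the next one
        for transform in self.transforms:
            data = transform(data)
        return data


# Toy transforms acting on a list of numbers (placeholders, not the real ones)
def center(xs):
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def scale(xs):
    return [2 * x for x in xs]


def drop_last(xs):
    return xs[:-1]


pipeline = Compose([center, scale, drop_last])
full = pipeline([1.0, 2.0, 3.0])

# Keep only the first two transforms, as in `transforms[:2]`
pipeline.transforms = pipeline.transforms[:2]
kept = pipeline([1.0, 2.0, 3.0])
```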

ScanNet provides annotations for many classes, but we are only interested in a subset of those, so we need to remap the labels.

In [None]:
from torch_points3d.datasets.segmentation.scannet import VALID_CLASS_IDS, SCANNET_COLOR_MAP, CLASS_LABELS, NUM_CLASSES, IGNORE_LABEL, CLASS_COLORS, CLASS_NAMES

def remap_scannet_labels(semantic_label, valid_class_idx=VALID_CLASS_IDS, donotcare_class_ids=()):
    """Remaps labels to [0 ; num_labels -1]."""
    new_labels = semantic_label.clone()
    mapping_dict = {idx: i for i, idx in enumerate(valid_class_idx)}
    for idx in range(NUM_CLASSES):
        if idx not in mapping_dict:
            mapping_dict[idx] = IGNORE_LABEL
    for idx in donotcare_class_ids:
        mapping_dict[idx] = IGNORE_LABEL
    for source, target in mapping_dict.items():
        mask = semantic_label == source
        new_labels[mask] = target

    broken_labels = new_labels >= len(valid_class_idx)
    new_labels[broken_labels] = IGNORE_LABEL

    return new_labels
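
To make the remapping concrete, here is a simplified pure-Python sketch of the same idea on a toy example. The values are assumptions chosen for illustration: `TOY_VALID_IDS` is a hypothetical subset of raw class ids and `IGNORE_LABEL = -1` is an assumed value, not necessarily what the library exports.

```python
IGNORE_LABEL = -1            # assumed value, for illustration only
TOY_VALID_IDS = (1, 2, 14)   # hypothetical subset of raw class ids


def remap_labels(labels, valid_ids=TOY_VALID_IDS, ignore_label=IGNORE_LABEL):
    """Map each raw id to its rank in `valid_ids`; anything else is ignored."""
    lookup = {raw_id: new_id for new_id, raw_id in enumerate(valid_ids)}
    return [lookup.get(label, ignore_label) for label in labels]


raw = [1, 2, 13, 14, 0, 40]
remapped = remap_labels(raw)

# Dropping ignored points mirrors the `!= IGNORE_LABEL` filtering
kept = [label for label in remapped if label != IGNORE_LABEL]
```

Valid raw ids end up in the contiguous range `[0, len(valid_ids) - 1]`, and every other id collapses to the ignore label.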

## Visualize a single multimodal sample

We can now pick samples from the train, val and test datasets.

In [None]:
i_room = 0
mm_data = dataset.train_dataset[i_room]    # pick a room in the Train set
# mm_data = dataset.val_dataset[i_room]      # pick a room in the Val set
# mm_data = dataset.test_dataset[0][i_room]  # pick a room in the Test set

# Remap the labels to the classes of interest and remove the points
# labeled IGNORE_LABEL
mm_data.data.y = remap_scannet_labels(mm_data.data.y)
mm_data = mm_data[mm_data.data.y != IGNORE_LABEL]

visualize_mm_data(mm_data, class_names=CLASS_NAMES, class_colors=CLASS_COLORS, front='y', figsize=1000, pointsize=3, voxel=0.05, show_2d=True)