# Project 4: OpenPrompt: Open-Vocabulary 3D Scene Understanding and Instance Segmentation with Adaptive Prompt Learning

### Dataset
- **Replica Dataset**  
  - Download link: [Replica dataset](https://github.com/aminebdj/OpenYOLO3D/blob/main/scripts/get_replica_dataset.sh)

### Evaluation Script
- **Replica Evaluation Script**  
  - Link: [Replica evaluation script](https://github.com/aminebdj/OpenYOLO3D/tree/main/evaluate/replica)

### Reference Papers for Prompt Learning
1. **Align Your Prompts:** Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
2. **MaPLe:** Multi-modal Prompt Learning

### Modifications and Goals
1. **Objective:**  
   Create an open-vocabulary 3D instance segmentation pipeline.
   - Use **OpenScene** for feature extraction.
   - Use **Mask3D** for class-agnostic proposal generation.

2. **Testing:**  
   Evaluate open-vocabulary instance segmentation results on the **Replica dataset**.  
   - Metric: **Mean Average Precision (mAP)**  
   - Evaluation script: Provided above.

3. **Baseline and Improvements:**  
   - Start with baseline model results.
   - Implement **prompt learning** to improve performance.

### Prompt Learning Details
- Replace the fixed text features from the **CLIP text encoder** with a **trainable prompt** initialized with a text prompt.
- **Training Objective:** Maximize the cosine similarity (equivalently, minimize the cosine distance) between visual features of the same object when it is augmented in different ways (e.g., translations, rotations, or color changes).  
- Ensure consistency in visual features of the same object across augmentations by optimizing the learnable prompt during training.
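
The consistency objective can be illustrated with a small NumPy sketch. It computes a cosine-consistency loss between features of the same instances under two augmentations, and a cosine-similarity classifier against text (or learned prompt) embeddings. The function names, the toy feature sizes, and the random inputs are assumptions made for illustration. This is not the project's training code, and a real implementation would use PyTorch so that gradients can reach the learnable prompt.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def cosine_consistency_loss(feats_a, feats_b):
    """Mean (1 - cosine similarity) between matching rows of two views.

    The loss is 0 when both augmented views point in the same direction,
    so minimizing it pushes the cosine similarity up.
    """
    a, b = l2_normalize(feats_a), l2_normalize(feats_b)
    return float(np.mean(1.0 - np.sum(a * b, axis=1)))

def classify_instances(instance_feats, text_feats):
    """Assign each instance the class whose text embedding is most similar."""
    sims = l2_normalize(instance_feats) @ l2_normalize(text_feats).T
    return sims.argmax(axis=1)

rng = np.random.default_rng(0)
view_a = rng.normal(size=(4, 8))                         # features under augmentation A
view_b = view_a + 0.01 * rng.normal(size=view_a.shape)   # slightly perturbed view B
print(cosine_consistency_loss(view_a, view_a))  # identical views: loss close to 0
print(cosine_consistency_loss(view_a, view_b))  # perturbed views: small positive loss
```

Writing the loss as `1 - cos` keeps it non-negative and makes "more consistent" correspond to "lower loss", which fits a standard gradient-descent training loop.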

### Summary
- Develop and test an open-vocabulary 3D instance segmentation pipeline with the specified modifications.
- Leverage prompt learning techniques to enhance the baseline model's performance on the **Replica dataset**.
- Evaluate using **mAP** as the primary metric.

# Installation

Step 1: Follow the Mask3D installation instructions

Step 2: Install the OpenEXR development headers

In [None]:
sudo apt-get install libopenexr-dev

Step 3: Install CLIP

In [None]:
pip install git+https://github.com/openai/CLIP.git

Step 4: Install tensorboardX

In [None]:
pip install tensorboardx

Step 5: Install SharedArray

In [None]:
pip install sharedarray
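
After the installation steps, a quick check can confirm that the Python packages are visible to the notebook kernel. The sketch below uses only the standard library. The import names in `REQUIRED` (`torch`, `clip`, `tensorboardX`, `SharedArray`) are the names these packages are usually imported under and are stated here as an assumption. `libopenexr-dev` is a system library and is not covered by this check.

```python
import importlib.util

def check_modules(names):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# Import names assumed for the packages installed above.
REQUIRED = ["torch", "clip", "tensorboardX", "SharedArray"]

for name, ok in check_modules(REQUIRED).items():
    print(f"{name:15s} {'ok' if ok else 'MISSING'}")
```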

# Running the merged pipeline

## TODO:

1. Download the ScanNet validation checkpoint listed in the Mask3D repository (https://github.com/JonasSchult/Mask3D): https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/scannet/scannet_val.ckpt
2. Place it at `models/Mask3D/checkpoints/scannet/scannet_val.ckpt`
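
The two steps above could also be scripted. The following is a minimal sketch using only the standard library. The helper name `ensure_checkpoint` and its `fetch` parameter are hypothetical and not part of the repository. The URL and destination path are the ones given in the list.

```python
import os
import urllib.request

CKPT_URL = "https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/scannet/scannet_val.ckpt"
CKPT_PATH = "models/Mask3D/checkpoints/scannet/scannet_val.ckpt"

def ensure_checkpoint(url=CKPT_URL, dest=CKPT_PATH, fetch=urllib.request.urlretrieve):
    """Download `url` to `dest` unless the file already exists; return `dest`."""
    if os.path.isfile(dest):
        return dest
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    tmp = dest + ".part"      # download under a temporary name first
    fetch(url, tmp)
    os.replace(tmp, dest)     # rename only after the download has finished
    return dest
```

Calling `ensure_checkpoint()` from the repository root would fetch the file once and return immediately on later runs. Downloading to a `.part` file avoids leaving a truncated checkpoint at the final path if the transfer is interrupted.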

### Output folder structure
```
.
└── experiments/ 
    └── merged_pipeline/      
        ├── run_current_timestamp/
        │   ├── mask3d            # inference results for Mask3D
        │   │   ├── samplename_confidences.txt
        │   │   ├── samplename_labels.txt
        │   │   ├── samplename_masks.pt       # File containing a list of tensors (the masks for each instance)
        │   │   └── ...     
        │   ├── openscene         # inference results for openscene
        │   │    ├── samplename_distill.ply    # Output point cloud
        │   │    ├── samplename_input.ply      # Input point cloud
        │   │    ├── samplename_labels_distill.jpg
        │   │    ├── samplename_features.npy   # The per point features (shape: N x 768)
        │   │    └── ...
        │   ├── samplename_instance_features.npy # Per instance features after running the notebook
        │   └── ...
        └── run_.../
            └── ...
```

In [None]:
### Output folder structure
```
.
└── experiments/ 
    └── merged_pipeline/      
        ├── run_current_timestamp/
        │   ├── mask3d            # Inference results for Mask3D
        │   │   ├── train/        # Train files for Mask3D
        │   │   │   ├── samplename_confidences.txt
        │   │   │   ├── samplename_labels.txt
        │   │   │   ├── samplename_masks.pt               # File containing a list of tensors (the masks for each instance) 
        │   │   │   └── ...
        │   │   ├── val/          # Validation files for Mask3D
        │   │   │   └── ...
        │   │   └── test/         # Test files for Mask3D
        │   │       └── ...
        │   ├── openscene         # Inference results for OpenScene
        │   │   ├── train/        # Train files for OpenScene
        │   │   │   ├── samplename_distill.ply            # Output point cloud
        │   │   │   ├── samplename_input.ply              # Input point cloud
        │   │   │   ├── samplename_labels_distill.jpg
        │   │   │   ├── samplename_features.npy           # The per point features (shape: N x 768)
        │   │   │   └── ...
        │   │   ├── val/          # Validation files for OpenScene
        │   │   │   └── ...
        │   │   └── test/         # Test files for OpenScene
        │   │       └── ...
        │   └── instance_features
        │        ├── train/        # Instance features for training samples
        │        │     ├──  samplename_instance_features.npy # Per instance features after running the notebook
        │        │     └──  ...
        │        ├── val/          # Instance features for validation samples
        │        │     ├── samplename_instance_features.npy # Per instance features after running the notebook
        │        │     └── ...
        │        └── test/         # Instance features for test samples
        │              ├── samplename_instance_features.npy # Per instance features after running the notebook
        │              └── ...
        └── run_.../
            └── ...
```
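
To make the layout above easier to work with, the per-sample file locations can be derived from the run folder, the split, and the sample name. The helper names `expected_outputs` and `missing_outputs` are hypothetical. The sketch assumes the file naming shown in the tree and uses a placeholder run folder name.

```python
from pathlib import Path

def expected_outputs(run_dir, split, sample):
    """Map each artifact of one sample to its location in the run folder."""
    run = Path(run_dir)
    return {
        "masks": run / "mask3d" / split / f"{sample}_masks.pt",
        "confidences": run / "mask3d" / split / f"{sample}_confidences.txt",
        "point_features": run / "openscene" / split / f"{sample}_features.npy",
        "input_cloud": run / "openscene" / split / f"{sample}_input.ply",
        "instance_features": run / "instance_features" / split
                             / f"{sample}_instance_features.npy",
    }

def missing_outputs(run_dir, split, sample):
    """Return the names of artifacts that do not exist on disk yet."""
    paths = expected_outputs(run_dir, split, sample)
    return [name for name, path in paths.items() if not path.exists()]

paths = expected_outputs("experiments/merged_pipeline/run_example", "val", "room1")
print(paths["masks"])
```

`missing_outputs` can be used before the merge step to see which stage (Mask3D, OpenScene, or the merge itself) still has to be run for a sample.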

In [24]:
import experiment
output_path = experiment.setup_experiment()

Created new experiment folder: experiments/merged_pipline/run_2025-01-25-10-52-32


In [25]:
# Use this to get the current output folder
experiment.get_current_path()

'/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-25-10-52-32'
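
The `experiment` module itself is not shown in this notebook. As a rough idea of how such a helper could work, the sketch below creates a timestamped run folder in the same `run_YYYY-MM-DD-HH-MM-SS` format seen in the output above and looks up the most recent one. The function names `make_run_dir` and `latest_run_dir` are hypothetical, and whether the real `get_current_path()` picks the latest folder or stores the path elsewhere is an assumption.

```python
import os
from datetime import datetime

def make_run_dir(root="experiments/merged_pipeline", now=None):
    """Create `<root>/run_<timestamp>` and return its absolute path."""
    now = now or datetime.now()
    run_dir = os.path.join(root, now.strftime("run_%Y-%m-%d-%H-%M-%S"))
    os.makedirs(run_dir, exist_ok=True)
    return os.path.abspath(run_dir)

def latest_run_dir(root="experiments/merged_pipeline"):
    """Return the most recent run folder; this timestamp format sorts by name."""
    runs = sorted(d for d in os.listdir(root) if d.startswith("run_"))
    if not runs:
        raise FileNotFoundError(f"no run folders under {root}")
    return os.path.abspath(os.path.join(root, runs[-1]))
```

Because the timestamp is zero-padded and ordered from year to second, sorting the folder names alphabetically is the same as sorting them chronologically.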

## 1. OpenScene:

In [26]:
%%bash -s "$output_path"
# Run openscene
set -x

exp_dir="$1/openscene"
config="./config/openscene/replica/replica_lseg.yaml"
feature_type=distill

mkdir -p "${exp_dir}"
result_dir="${exp_dir}"

export PYTHONPATH="models/openscene"
python -u models/openscene/run/evaluate_merged.py \
  --config=${config} \
  feature_type ${feature_type} \
  save_folder ${result_dir} \
  2>&1 | tee -a ${exp_dir}/eval-$(date +"%Y%m%d_%H%M").log

+ exp_dir=experiments/merged_pipline/run_2025-01-25-10-52-32/openscene
+ config=./config/openscene/replica/replica_lseg.yaml
+ feature_type=distill
+ mkdir -p experiments/merged_pipline/run_2025-01-25-10-52-32/openscene
+ result_dir=experiments/merged_pipline/run_2025-01-25-10-52-32/openscene
+ export PYTHONPATH=models/openscene
+ PYTHONPATH=models/openscene
+ python -u models/openscene/run/evaluate_merged.py --config=./config/openscene/replica/replica_lseg.yaml feature_type distill save_folder experiments/merged_pipline/run_2025-01-25-10-52-32/openscene
++ date +%Y%m%d_%H%M
+ tee -a experiments/merged_pipline/run_2025-01-25-10-52-32/openscene/eval-20250125_1052.log


/cluster/54/blessman/ml3d/dataset
torch.__version__:1.12.1+cu113
torch.version.cuda:11.3
torch.backends.cudnn.version:8302
torch.backends.cudnn.enabled:True
[2025-01-25 10:52:51,976 evaluate_merged.py line 164] arch_3d: MinkUNet18A
data_root: dataset/data/replica_split
data_root_2d_fused_feature: data/replica_multiview_openseg
dist_backend: nccl
dist_url: tcp://127.0.0.1:6787
distributed: False
eval_iou: False
exp_dir: ./experiments/openscene/replica_split
feature_2d_extractor: lseg
feature_type: distill
input_color: False
labelset: matterport
manual_seed: 3407
mark_no_feature_to_unknown: True
model_path: https://cvg-data.inf.ethz.ch/openscene/models/matterport_lseg.pth.tar
multiprocessing_distributed: False
ngpus_per_node: 1
prompt_eng: True
rank: 0
save_feature_as_numpy: True
save_folder: experiments/merged_pipline/run_2025-01-25-10-52-32/openscene
split: all
sync_bn: False
test_batch_size: 1
test_gpu: [0]
test_repeats: 1
test_workers: 0
use_apex: False
use_shm: False
vis_gt: False
v

## Visualize results

In [29]:
import open3d as o3d
import numpy as np
import k3d
import os
import glob

def visualize_ply_with_k3d(file_path, point_size=0.05):
    """
    Load a PLY file and visualize it using k3d in a Jupyter Notebook.

    Args:
        file_path (str): Path to the PLY file.
        point_size (float): Size of the points in the visualization.
    """
    # Load the PLY file using Open3D
    ply_data = o3d.io.read_point_cloud(file_path)
    
    # Check if the file is loaded correctly
    if ply_data.is_empty():
        print("Failed to load PLY file.")
        return

    print("PLY file loaded successfully!")
    print(f"Number of points: {len(ply_data.points)}")

    # Extract points and colors
    coords = np.asarray(ply_data.points)  # 3D coordinates
    colors = np.asarray(ply_data.colors)  # RGB values (normalized to [0, 1])

    # Scale colors to 0-255 and pack each RGB triple into one 0xRRGGBB integer
    colors = (colors * 255).astype(np.uint32)
    colors_hex = (colors[:, 0] << 16) + (colors[:, 1] << 8) + colors[:, 2]

    # Visualize with k3d
    plot = k3d.plot()
    point_cloud = k3d.points(positions=coords, point_size=point_size, colors=colors_hex)
    plot += point_cloud
    return plot

# Directory containing the OpenScene PLY outputs
file_path = os.path.join(output_path, "openscene")
split = "val"
files = glob.glob(os.path.join(file_path, split, "*.ply"))

# Visualize every PLY file of the chosen split (both *_input.ply and *_distill.ply)
for file in files:
    plot = visualize_ply_with_k3d(file)
    if plot is not None:
        plot.display()


PLY file loaded successfully!
Number of points: 1187140




Output()

PLY file loaded successfully!
Number of points: 1187140


Output()

PLY file loaded successfully!
Number of points: 645512


Output()

PLY file loaded successfully!
Number of points: 645512


Output()
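
The color handling in the visualization cell relies on packing each RGB triple into a single integer of the form `0xRRGGBB`. The sketch below isolates that step on a three-point toy array. The helper name `pack_rgb` is hypothetical, and the clipping to `[0, 1]` is an added safeguard that the cell above does not perform.

```python
import numpy as np

def pack_rgb(colors01):
    """Convert an (N, 3) float array in [0, 1] to N packed 0xRRGGBB integers."""
    rgb = (np.clip(colors01, 0.0, 1.0) * 255).astype(np.uint32)
    return (rgb[:, 0] << 16) | (rgb[:, 1] << 8) | rgb[:, 2]

demo = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print([hex(c) for c in pack_rgb(demo)])  # ['0xff0000', '0xff00', '0xff']
```

Red ends up in the highest byte, green in the middle byte, and blue in the lowest byte.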

## 2. Mask3D:

In [31]:
%%bash -s "$output_path"
# Run Mask3D
set -x

exp_dir="$1/mask3d"

mkdir -p "${exp_dir}"
result_dir="${exp_dir}"

python -u models/Mask3D/predict.py \
general.checkpoint='models/Mask3D/checkpoints/scannet/scannet_val.ckpt' \
general.data_dir="dataset/data/replica_split" \
general.save_dir=${result_dir} \
general.split="all"
#general.num_targets=21 \
#data.num_labels=21
#model.config.backbone._target_=models.Res16UNet18B \
#general.checkpoint="/cluster/54/blessman/ml3d/models/Mask3D/checkpoints/stpls3d/stpls3d_benchmark_03.ckpt" \

+ exp_dir=experiments/merged_pipline/run_2025-01-25-10-52-32/mask3d
+ mkdir -p experiments/merged_pipline/run_2025-01-25-10-52-32/mask3d
+ result_dir=experiments/merged_pipline/run_2025-01-25-10-52-32/mask3d
+ python -u models/Mask3D/predict.py general.checkpoint=models/Mask3D/checkpoints/scannet/scannet_val.ckpt general.data_dir=dataset/data/replica_split general.save_dir=experiments/merged_pipline/run_2025-01-25-10-52-32/mask3d general.split=all


/cluster/54/blessman/ml3d/dataset
Running on device:  cuda
{'_target_': 'models.Res16UNet34C', 'config': {'dialations': [1, 1, 1, 1], 'conv1_kernel_size': 5, 'bn_momentum': 0.02}, 'in_channels': '${data.in_channels}', 'out_channels': '${data.num_labels}', 'out_fpn': True}




Loading checkpoint!
Save dir:  experiments/merged_pipline/run_2025-01-25-10-52-32/mask3d
Data root:  dataset/data/replica_split
['dataset/data/replica_split/test/office4.pth', 'dataset/data/replica_split/test/room2.pth', 'dataset/data/replica_split/train/office0.pth', 'dataset/data/replica_split/train/office1.pth', 'dataset/data/replica_split/train/office2.pth', 'dataset/data/replica_split/train/room0.pth', 'dataset/data/replica_split/val/office3.pth', 'dataset/data/replica_split/val/room1.pth']
Dataset:  8


RPly: Aborted by user


Processing batch 0 from file office4 ....
Shape of mask:  torch.Size([456153, 100])
Shape of logits:  torch.Size([100, 19])
Shape of labels output:  99
Shape of confidences output:  99
Shape of masks_binary output:  99 torch.Size([993008])


RPly: Aborted by user


Processing batch 1 from file room2 ....
Shape of mask:  torch.Size([318867, 100])
Shape of logits:  torch.Size([100, 19])
Shape of labels output:  100
Shape of confidences output:  100
Shape of masks_binary output:  100 torch.Size([722496])


RPly: Aborted by user


Processing batch 2 from file office0 ....
Shape of mask:  torch.Size([265922, 100])
Shape of logits:  torch.Size([100, 19])
Shape of labels output:  100
Shape of confidences output:  100
Shape of masks_binary output:  100 torch.Size([589517])


RPly: Aborted by user


Processing batch 3 from file office1 ....
Shape of mask:  torch.Size([180492, 100])
Shape of logits:  torch.Size([100, 19])
Shape of labels output:  99
Shape of confidences output:  99
Shape of masks_binary output:  99 torch.Size([423007])


RPly: Aborted by user


Processing batch 4 from file office2 ....
Shape of mask:  torch.Size([378125, 100])
Shape of logits:  torch.Size([100, 19])
Shape of labels output:  98
Shape of confidences output:  98
Shape of masks_binary output:  98 torch.Size([858623])
Processing batch 5 from file room0 ....
Shape of mask:  torch.Size([435468, 100])
Shape of logits:  torch.Size([100, 19])
Shape of labels output:  100
Shape of confidences output:  100
Shape of masks_binary output:  100 torch.Size([954492])


RPly: Aborted by user


Processing batch 6 from file office3 ....
Shape of mask:  torch.Size([515474, 100])
Shape of logits:  torch.Size([100, 19])
Shape of labels output:  100
Shape of confidences output:  100
Shape of masks_binary output:  100 torch.Size([1187140])


RPly: Aborted by user


Processing batch 7 from file room1 ....
Shape of mask:  torch.Size([277142, 100])
Shape of logits:  torch.Size([100, 19])
Shape of labels output:  100
Shape of confidences output:  100
Shape of masks_binary output:  100 torch.Size([645512])


## 3. Merge results and visualize

In [32]:
import os
import torch
import open3d as o3d
import numpy as np
import k3d
from glob import glob
import random

In [33]:
import experiment
# Do this if you don't want to run the models again. Returns the path to the current output folder
output_path = experiment.get_current_path()
print(output_path)

/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-25-10-52-32


### Load masks, pointclouds and features

In [35]:
mask3d_path = os.path.join(output_path, "mask3d")
split = 'val'
mask_paths = sorted(glob(os.path.join(mask3d_path, split, '*.pt')))

openscene_path = os.path.join(output_path, "openscene")
#features_paths = sorted(glob(os.path.join(openscene_path, '*.npy')))
point_cloud_paths = sorted(glob(os.path.join(openscene_path, split, '*input.ply')))

print(mask_paths)
#print(features_paths)
print(point_cloud_paths)

['/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-25-10-52-32/mask3d/val/office3_masks.pt', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-25-10-52-32/mask3d/val/room1_masks.pt']
['/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-25-10-52-32/openscene/val/office3_input.ply', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-25-10-52-32/openscene/val/room1_input.ply']


In [36]:
# Just for visualization purposes (the output colors don't correspond to the actual instance classes)
MATTERPORT_COLOR_MAP_21 = {
    1: (174., 199., 232.), # wall
    2: (152., 223., 138.), # floor
    3: (31., 119., 180.), # cabinet
    4: (255., 187., 120.), # bed
    5: (188., 189., 34.), # chair
    6: (140., 86., 75.), # sofa
    7: (255., 152., 150.), # table
    8: (214., 39., 40.), # door
    9: (197., 176., 213.), # window
    10: (148., 103., 189.), # bookshelf
    11: (196., 156., 148.), # picture
    12: (23., 190., 207.), # counter
    13: (247., 182., 210.), # desk
    14: (219., 219., 141.), # curtain
    15: (255., 127., 14.), # refrigerator
    16: (158., 218., 229.), # shower curtain
    17: (44., 160., 44.), # toilet
    18: (112., 128., 144.), # sink
    19: (227., 119., 194.), # bathtub
    20: (82., 84., 163.), # other
    # 41: (186., 197., 62.), # ceiling
    21: (58., 98., 26.), # ceiling
    0: (0., 0., 0.), # unlabel/unknown
}

In [37]:
assert len(mask_paths) == len(point_cloud_paths)

In [38]:
for mask_path, point_cloud_path in zip(mask_paths, point_cloud_paths):
    binary_masks = torch.load(mask_path)
    ply_data = o3d.io.read_point_cloud(point_cloud_path)

    # Extract points and colors
    coords = np.asarray(ply_data.points)  # 3D coordinates
    colors = np.asarray(ply_data.colors)  # RGB values (normalized to [0, 1])

    print(colors.shape)

    # Start from a uniform gray base color, already scaled to the 0-255 range
    colors = np.full(colors.shape, 127, dtype=np.uint32)

    # Paint every instance mask with a random color from the palette
    for mask in binary_masks:
        random_index = random.randint(1, len(MATTERPORT_COLOR_MAP_21) - 1)
        colors[mask] = MATTERPORT_COLOR_MAP_21[random_index]

    # Pack each RGB triple into one 0xRRGGBB integer
    colors_hex = (colors[:, 0] << 16) + (colors[:, 1] << 8) + colors[:, 2]

    # Visualize with k3d
    plot = k3d.plot()
    point_cloud = k3d.points(positions=coords, point_size=0.05, colors=colors_hex)
    plot += point_cloud
    plot.display()

(1187140, 3)




Output()

(645512, 3)


Output()

### Merge:

In [39]:
import numpy as np
import torch
import experiment

output_path = experiment.get_current_path()
output_path

'/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-25-10-52-32'
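
The merge step averages the per-point OpenScene features over each Mask3D instance mask, giving one feature vector per instance. A minimal sketch of that pooling on toy arrays is shown here. The helper name `mean_pool_instances` is hypothetical, and returning a zero vector for an empty mask (instead of NaN) is an assumption added for robustness rather than something the notebook does.

```python
import numpy as np

def mean_pool_instances(features, masks):
    """Average per-point features (N, D) over each boolean mask (N,) -> (M, D).

    Empty masks yield a zero vector instead of NaN.
    """
    pooled = np.zeros((len(masks), features.shape[1]), dtype=features.dtype)
    for k, mask in enumerate(masks):
        mask = np.asarray(mask, dtype=bool)
        if mask.any():
            pooled[k] = features[mask].mean(axis=0)
    return pooled

features = np.arange(12, dtype=np.float32).reshape(4, 3)   # 4 points, 3-dim features
masks = [
    np.array([True, True, False, False]),    # instance covering points 0 and 1
    np.array([False, False, False, True]),   # instance covering point 3
    np.zeros(4, dtype=bool),                 # empty proposal
]
print(mean_pool_instances(features, masks))
```

Converting each mask with `np.asarray(mask, dtype=bool)` also guards against masks stored as 0/1 integers, which would otherwise be interpreted as row indices.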

In [41]:
for split in ['train', 'val', 'test']:
    
    print("\nRunning split: ", split, " -----------------")
    
    mask3d_path = os.path.join(output_path, "mask3d")
    mask_paths = sorted(glob(os.path.join(mask3d_path, split, '*.pt')))

    openscene_path = os.path.join(output_path, "openscene")
    features_paths = sorted(glob(os.path.join(openscene_path, split, '*.npy')))

    print("Instance masks: ", mask_paths)
    print("Per point features: ", features_paths)
    
    assert len(mask_paths) == len(features_paths)
    
    for i in range(len(mask_paths)):
    
        sample_name = os.path.basename(mask_paths[i]).split('_')[0]

        # Make sure that the instance masks and the point feature are from the same input sample
        assert sample_name == os.path.basename(features_paths[i]).split('_')[0]

        # Load masks and features
        masks = torch.load(mask_paths[i])
        features = np.load(features_paths[i])
        print(f"Masks shape: ({len(masks)}, {masks[0].shape[0]})")
        print(f"Features shape: {features.shape}")

        mean_instance_features = []
        # Compute the average feature vector of each instance
        for mask in masks:
            masked_features = features[mask, :]
            mean_instance_features.append(masked_features.mean(axis=0))
        mean_instance_features = np.array(mean_instance_features)
        print(mean_instance_features.shape)
        
        folder_path = os.path.join(output_path, "instance_features", split)
            
        os.makedirs(folder_path, exist_ok=True)
        
        file_path = os.path.join(folder_path, f"{sample_name}_instance_features.npy")
            
        np.save(file_path, mean_instance_features)

        print(f"Saved instance features for {sample_name}")


Running split:  train  -----------------
Instance masks:  ['/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-25-10-52-32/mask3d/train/office0_masks.pt', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-25-10-52-32/mask3d/train/office1_masks.pt', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-25-10-52-32/mask3d/train/office2_masks.pt', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-25-10-52-32/mask3d/train/room0_masks.pt']
Per point features:  ['/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-25-10-52-32/openscene/train/office0_features.npy', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-25-10-52-32/openscene/train/office1_features.npy', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-25-10-52-32/openscene/train/office2_features.npy', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-25-10-52-32/openscene/train/room0_features.npy']
Masks shape: (