# Project 4: OpenPrompt: Open-Vocabulary 3D Scene Understanding and Instance Segmentation with Adaptive Prompt Learning

### Dataset
- **Replica Dataset**  
  - Download link: [Replica dataset](https://github.com/aminebdj/OpenYOLO3D/blob/main/scripts/get_replica_dataset.sh)

### Evaluation Script
- **Replica Evaluation Script**  
  - Link: [Replica evaluation script](https://github.com/aminebdj/OpenYOLO3D/tree/main/evaluate/replica)

### Reference Papers for Prompt Learning
1. **Align Your Prompts:** Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
2. **MaPLe:** Multi-modal Prompt Learning

### Modifications and Goals
1. **Objective:**  
   Create an open-vocabulary 3D instance segmentation pipeline.
   - Use **OpenScene** for feature extraction.
   - Use **Mask3D** for class-agnostic proposal generation.

2. **Testing:**  
   Evaluate open-vocabulary instance segmentation results on the **Replica dataset**.  
   - Metric: **Mean Average Precision (mAP)**  
   - Evaluation script: Provided above.

3. **Baseline and Improvements:**  
   - Start with baseline model results.
   - Implement **prompt learning** to improve performance.

### Prompt Learning Details
- Replace the fixed text features from the **CLIP text encoder** with a **trainable prompt** initialized with a text prompt.
- **Training Objective:** Reduce the cosine similarity between visual features of the same object when augmented in different ways (e.g., translations, rotations, or color changes).  
- Ensure consistency in visual features of the same object across augmentations by optimizing the learnable prompt during training.

### Summary
- Develop and test an open-vocabulary 3D instance segmentation pipeline with the specified modifications.
- Leverage prompt learning techniques to enhance the baseline model's performance on the **Replica dataset**.
- Evaluate using **mAP** as the primary metric.

# Installation

Step 1: follow mask3d installation instructions

Step 2: 

In [None]:
sudo apt-get install libopenexr-dev

Step 3: Install clip

In [None]:
pip install git+https://github.com/openai/CLIP.git

Step 4: install tensorboard

In [None]:
pip install tensorboardx

Step 5: install sharedarray

In [None]:
pip install sharedarray

# Running the merged pipeline

## TODO:

1. Download the scannet val checkpoint from https://github.com/JonasSchult/Mask3D -> https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/scannet/scannet_val.ckpt
2. Put it in models/Mask3D/checkpoints/scannet/scannet_val.ckpt

### Output folder structure
```
.
└── experiments/ 
    └── merged_pipeline/      
        ├── run_current_timestamp/
        │   ├── mask3d            # Inference results for Mask3D
        │   │   ├── train/        # Train files for Mask3D
        │   │   │   ├── samplename_confidences.txt
        │   │   │   ├── samplename_labels.txt
        │   │   │   ├── samplename_masks.pt               # File containing a list of tensors (the masks for each instance) 
        │   │   │   └── ...
        │   │   ├── val/          # Validation files for Mask3D
        │   │   │   └── ...
        │   │   └── test/         # Test files for Mask3D
        │   │       └── ...
        │   ├── openscene         # Inference results for OpenScene
        │   │   ├── train/        # Train files for OpenScene
        │   │   │   ├── samplename_distill.ply            # Output point cloud
        │   │   │   ├── samplename_input.ply              # Input point cloud
        │   │   │   ├── samplename_labels_distill.jpg
        │   │   │   ├── samplename_features.npy           # The per point features (shape: N x 768)
        │   │   │   └── ...
        │   │   ├── val/          # Validation files for OpenScene
        │   │   │   └── ...
        │   │   └── test/         # Test files for OpenScene
        │   │       └── ...
        │   └── instance_features
        │        ├── train/        # Instance features for training samples
        │        │     ├──  samplename_instance_features.npy # Per instance features after running the notebook
        │        │     └──  ...
        │        ├── val/          # Instance features for validation samples
        │        │     ├── samplename_instance_features.npy # Per instance features after running the notebook
        │        │     └── ...
        │        └── test/         # Instance features for test samples
        │              ├── samplename_instance_features.npy # Per instance features after running the notebook
        │              └── ...
        └── run_.../
            └── ...
```

In [91]:
import experiment
output_path = experiment.setup_experiment()

Created new experiment folder: experiments/merged_pipline/run_2025-01-28-13-50-10


In [92]:
# Use this to get the current output folder
experiment.get_current_path()

'/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-28-13-50-10'

## 1. Openscene:

In [93]:
%%bash -s "$output_path"
# Run openscene
set -x

exp_dir="$1/openscene"
config="./config/openscene/replica/replica_lseg.yaml"
feature_type=distill

mkdir -p "${exp_dir}"
result_dir="${exp_dir}"

export PYTHONPATH="models/openscene"
python -u models/openscene/run/evaluate_merged.py \
  --config=${config} \
  feature_type ${feature_type} \
  save_folder ${result_dir} \
  2>&1 | tee -a ${exp_dir}/eval-$(date +"%Y%m%d_%H%M").log

+ exp_dir=experiments/merged_pipline/run_2025-01-28-13-50-10/openscene
+ config=./config/openscene/replica/replica_lseg.yaml
+ feature_type=distill
+ mkdir -p experiments/merged_pipline/run_2025-01-28-13-50-10/openscene
+ result_dir=experiments/merged_pipline/run_2025-01-28-13-50-10/openscene
+ python -u models/openscene/run/evaluate_merged.py --config=./config/openscene/replica/replica_lseg.yaml feature_type distill save_folder experiments/merged_pipline/run_2025-01-28-13-50-10/openscene
++ date +%Y%m%d_%H%M
+ tee -a experiments/merged_pipline/run_2025-01-28-13-50-10/openscene/eval-20250128_1350.log


Traceback (most recent call last):
  File "/cluster/54/blessman/ml3d/models/openscene/run/evaluate_merged.py", line 16, in <module>
    from util import metric
ModuleNotFoundError: No module named 'util'


## Visualize results

In [12]:
import os
import glob
from point_cloud import visualize_ply_with_k3d

# Path to your PLY file
file_path = os.path.join(output_path, "openscene")
split = "val"
files = glob.glob(os.path.join(file_path, split, "*.ply"))

# Call the visualization function
for file in files:
    plot = visualize_ply_with_k3d(file)
    if plot:
        plot.display()


PLY file loaded successfully!
Number of points: 1187140




Output()

PLY file loaded successfully!
Number of points: 1187140


Output()

PLY file loaded successfully!
Number of points: 645512


Output()

PLY file loaded successfully!
Number of points: 645512


Output()

## 2. Mask3D:

In [32]:
%%bash -s "$output_path"
# Run mask3d
set -x

exp_dir="$1/mask3d"

mkdir -p "${exp_dir}"
result_dir="${exp_dir}"

python -u models/Mask3D/predict.py \
general.checkpoint='models/Mask3D/checkpoints/scannet/scannet_val.ckpt' \
general.data_dir="dataset/data/replica_split" \
general.save_dir=${result_dir} \
general.split="all" \
general.required_confidence=0.9
#general.num_targets=21 \
#data.num_labels=21
#model.config.backbone._target_=models.Res16UNet18B \
#general.checkpoint="/cluster/54/blessman/ml3d/models/Mask3D/checkpoints/stpls3d/stpls3d_benchmark_03.ckpt" \

+ exp_dir=/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/mask3d
+ mkdir -p /cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/mask3d
+ result_dir=/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/mask3d
+ python -u models/Mask3D/predict.py general.checkpoint=models/Mask3D/checkpoints/scannet/scannet_val.ckpt general.data_dir=dataset/data/replica_split general.save_dir=/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/mask3d general.split=all general.required_confidence=0.9


/cluster/54/blessman/ml3d/dataset
Running on device:  cuda
{'_target_': 'models.Res16UNet34C', 'config': {'dialations': [1, 1, 1, 1], 'conv1_kernel_size': 5, 'bn_momentum': 0.02}, 'in_channels': '${data.in_channels}', 'out_channels': '${data.num_labels}', 'out_fpn': True}




Loading checkpoint!
Save dir:  /cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/mask3d
Data root:  dataset/data/replica_split
Dataset:  8


RPly: Aborted by user


Processing batch 0 from file office4 ....
Shape of mask:  torch.Size([456153, 100])
Shape of logits:  torch.Size([100, 19])
Shape of labels output:  74
Shape of confidences output:  74
Shape of masks_binary output:  74 torch.Size([993008])


RPly: Aborted by user


Processing batch 1 from file room2 ....
Shape of mask:  torch.Size([318867, 100])
Shape of logits:  torch.Size([100, 19])
Shape of labels output:  80
Shape of confidences output:  80
Shape of masks_binary output:  80 torch.Size([722496])


RPly: Aborted by user


Processing batch 2 from file office0 ....
Shape of mask:  torch.Size([265922, 100])
Shape of logits:  torch.Size([100, 19])
Shape of labels output:  76
Shape of confidences output:  76
Shape of masks_binary output:  76 torch.Size([589517])


RPly: Aborted by user


Processing batch 3 from file office1 ....
Shape of mask:  torch.Size([180492, 100])
Shape of logits:  torch.Size([100, 19])
Shape of labels output:  61
Shape of confidences output:  61
Shape of masks_binary output:  61 torch.Size([423007])


RPly: Aborted by user


Processing batch 4 from file office2 ....
Shape of mask:  torch.Size([378125, 100])
Shape of logits:  torch.Size([100, 19])
Shape of labels output:  69
Shape of confidences output:  69
Shape of masks_binary output:  69 torch.Size([858623])
Processing batch 5 from file room0 ....
Shape of mask:  torch.Size([435468, 100])
Shape of logits:  torch.Size([100, 19])
Shape of labels output:  73
Shape of confidences output:  73
Shape of masks_binary output:  73 torch.Size([954492])


RPly: Aborted by user


Processing batch 6 from file office3 ....
Shape of mask:  torch.Size([515474, 100])
Shape of logits:  torch.Size([100, 19])
Shape of labels output:  63
Shape of confidences output:  63
Shape of masks_binary output:  63 torch.Size([1187140])


RPly: Aborted by user


Processing batch 7 from file room1 ....
Shape of mask:  torch.Size([277142, 100])
Shape of logits:  torch.Size([100, 19])
Shape of labels output:  70
Shape of confidences output:  70
Shape of masks_binary output:  70 torch.Size([645512])


## 3. Merge results and visualize

In [117]:
import torch
import open3d as o3d
import numpy as np
import k3d
from glob import glob
import random
import point_cloud

In [100]:
import experiment
# Do this if you don't want to run the models again. Returns the path to the current output folder
output_path = experiment.get_current_path()
print(output_path)

/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08


### Load masks, pointclouds and features

In [130]:
mask3d_path = os.path.join(output_path, "mask3d")
#split = 'val'
mask_paths = utils.get_all_files_in_dir_and_subdir(mask3d_path, "pt")#sorted(glob(os.path.join(mask3d_path, split, '*.pt')))

openscene_path = os.path.join(output_path, "openscene")
#features_paths = sorted(glob(os.path.join(openscene_path, '*.npy')))
point_cloud_paths = utils.get_all_files_in_dir_and_subdir(openscene_path, "distill.ply")

print(mask_paths)
#print(features_paths)
print(point_cloud_paths)

['/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/mask3d/test/office4_masks.pt', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/mask3d/test/room2_masks.pt', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/mask3d/train/office0_masks.pt', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/mask3d/train/office1_masks.pt', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/mask3d/train/office2_masks.pt', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/mask3d/train/room0_masks.pt', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/mask3d/val/office3_masks.pt', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/mask3d/val/room1_masks.pt']
['/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/openscene/test/office4_distill.ply', '/cluster/54/bl

In [102]:
# Just for visualization purposes (the output colors don't correspond to the actual instance classes)
MATTERPORT_COLOR_MAP_21 = {
    1: (174., 199., 232.), # wall
    2: (152., 223., 138.), # floor
    3: (31., 119., 180.), # cabinet
    4: (255., 187., 120.), # bed
    5: (188., 189., 34.), # chair
    6: (140., 86., 75.), # sofa
    7: (255., 152., 150.), # table
    8: (214., 39., 40.), # door
    9: (197., 176., 213.), # window
    10: (148., 103., 189.), # bookshelf
    11: (196., 156., 148.), # picture
    12: (23., 190., 207.), # counter
    13: (247., 182., 210.), # desk
    14: (219., 219., 141.), # curtain
    15: (255., 127., 14.), # refrigerator
    16: (158., 218., 229.), # shower curtain
    17: (44., 160., 44.), # toilet
    18: (112., 128., 144.), # sink
    19: (227., 119., 194.), # bathtub
    20: (82., 84., 163.), # other
    # 41: (186., 197., 62.), # ceiling
    21: (58., 98., 26.), # ceiling
    0: (0., 0., 0.), # unlabel/unknown
}

In [103]:
assert len(mask_paths) == len(point_cloud_paths) #== len()

In [122]:
for file in point_cloud_paths:
    print(os.path.basename(file))
    point_cloud.visualize_ply_with_k3d(file, point_size=2).display()

office4_distill.ply
PLY file loaded successfully!
Number of points: 993008




Output()

room2_distill.ply
PLY file loaded successfully!
Number of points: 722496


Output()

office0_distill.ply
PLY file loaded successfully!
Number of points: 589517


Output()

office1_distill.ply
PLY file loaded successfully!
Number of points: 423007


Output()

office2_distill.ply
PLY file loaded successfully!
Number of points: 858623


Output()

room0_distill.ply
PLY file loaded successfully!
Number of points: 954492


Output()

office3_distill.ply
PLY file loaded successfully!
Number of points: 1187140


Output()

room1_distill.ply
PLY file loaded successfully!
Number of points: 645512


Output()

In [None]:
path = ""

In [131]:
for i in range(len(mask_paths)):
    binary_masks = torch.load(mask_paths[i])
    print(len(binary_masks))
    ply_data = o3d.io.read_point_cloud(point_cloud_paths[i])
    
    # Extract points and colors
    coords = np.asarray(ply_data.points)  # 3D coordinates
    colors = np.asarray(ply_data.colors)  # RGB values (normalized to [0, 1])

    print(colors.shape)

    # Set base color
    colors[:] = 0.5

    # Normalize colors to 0-255 and convert to hexadecimal
    colors = (colors * 255).astype(np.uint64)

    for i, mask in enumerate(binary_masks):
        random_index = random.randint(1, len(MATTERPORT_COLOR_MAP_21)-1)
        colors[mask] = MATTERPORT_COLOR_MAP_21[random_index]

    colors_hex = (colors[:, 0] << 16) + (colors[:, 1] << 8) + colors[:, 2]

    # Visualize with k3d
    plot = k3d.plot()
    point_cloud = k3d.points(positions=coords, point_size=2, colors=colors_hex)
    plot += point_cloud
    plot.display()

74
(993008, 3)


Output()

80
(722496, 3)


Output()

76
(589517, 3)


Output()

61
(423007, 3)


Output()

69
(858623, 3)


Output()

73
(954492, 3)


Output()

63
(1187140, 3)


Output()

70
(645512, 3)


Output()

### Merge:

In [38]:
import numpy as np
import torch
import experiment
from glob import glob

output_path = experiment.get_current_path()
output_path

'/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08'

In [39]:
for split in ['train', 'val', 'test']:
    
    print("\nRunning split: ", split, " -----------------")
    
    mask3d_path = os.path.join(output_path, "mask3d")
    mask_paths = sorted(glob(os.path.join(mask3d_path, split, '*.pt')))

    openscene_path = os.path.join(output_path, "openscene")
    features_paths = sorted(glob(os.path.join(openscene_path, split, '*.npy')))

    print("Instance masks: ", mask_paths)
    print("Per point features: ", features_paths)
    
    assert len(mask_paths) == len(features_paths)
    
    for i in range(len(mask_paths)):
    
        sample_name = os.path.basename(mask_paths[i]).split('_')[0]

        # Make sure that the instance masks and the point feature are from the same input sample
        assert sample_name == os.path.basename(features_paths[i]).split('_')[0]

        # Load masks and features
        masks = torch.load(mask_paths[i])
        features = np.load(features_paths[i])
        print(f"Masks shape: ({len(masks)}, {masks[0].shape[0]})")
        print(f"Features shape: {features.shape}")

        mean_instance_features = []
        # Compute average instance features
        for mask in masks:
            masked_features = features[mask,:]
            mean_instance_features.append(features[mask,:].mean(axis=0))
        mean_instance_features = np.array(mean_instance_features)
        print(mean_instance_features.shape)
        
        folder_path = os.path.join(output_path, "instance_features", split)
            
        os.makedirs(folder_path, exist_ok=True)
        
        file_path = os.path.join(folder_path, f"{sample_name}_instance_features.npy")
            
        np.save(file_path, mean_instance_features)

        print(f"Saved instance features for {sample_name}")


Running split:  train  -----------------
Instance masks:  ['/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/mask3d/train/office0_masks.pt', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/mask3d/train/office1_masks.pt', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/mask3d/train/office2_masks.pt', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/mask3d/train/room0_masks.pt']
Per point features:  ['/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/openscene/train/office0_features.npy', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/openscene/train/office1_features.npy', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/openscene/train/office2_features.npy', '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/openscene/train/room0_features.npy']
Masks shape: (

## 4. Classification

In [123]:
import experiment
current_path = experiment.get_current_path()

print(current_path)

/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08


### Get clip features for replica classes:

In [124]:
from clip_utils import extract_text_feature, REPLICA_LABELS#, MATTERPORT_LABELS_21

In [125]:
labelset = list(REPLICA_LABELS)
text_features, new_label_set = extract_text_feature(labelset)

Use prompt engineering: a XX in a scene
Loading CLIP ViT-B/32 model...
Finish loading


In [126]:
labelset.append('unlabeled')

In [55]:
labelset, new_label_set

(['basket',
  'bed',
  'bench',
  'bin',
  'blanket',
  'blinds',
  'book',
  'bottle',
  'box',
  'bowl',
  'camera',
  'cabinet',
  'candle',
  'chair',
  'clock',
  'cloth',
  'comforter',
  'cushion',
  'desk',
  'desk-organizer',
  'door',
  'indoor-plant',
  'lamp',
  'monitor',
  'nightstand',
  'panel',
  'picture',
  'pillar',
  'pillow',
  'pipe',
  'plant-stand',
  'plate',
  'pot',
  'sculpture',
  'shelf',
  'sofa',
  'stool',
  'switch',
  'table',
  'tablet',
  'tissue-paper',
  'tv-screen',
  'tv-stand',
  'vase',
  'vent',
  'wall-plug',
  'window',
  'rug',
  'unlabeled'],
 ['a basket in a scene',
  'a bed in a scene',
  'a bench in a scene',
  'a bin in a scene',
  'a blanket in a scene',
  'a blinds in a scene',
  'a book in a scene',
  'a bottle in a scene',
  'a box in a scene',
  'a bowl in a scene',
  'a camera in a scene',
  'a cabinet in a scene',
  'a candle in a scene',
  'a chair in a scene',
  'a clock in a scene',
  'a cloth in a scene',
  'a comforter in

In [56]:
import torch
import os
path = os.path.join(current_path, "clip_features.pt")
torch.save(text_features, path)

with open(os.path.join(current_path, "labels.txt"), 'w') as file:
        for string in labelset:
            file.write(string + '\n')
            
with open(os.path.join(current_path, "text_prompts.txt"), 'w') as file:
        for string in new_label_set:
            file.write(string + '\n')

### Get per instance features:

In [127]:
text_features = torch.load(os.path.join(current_path, "clip_features.pt"))
text_features.shape  # torch.Size([21, 512])

torch.Size([48, 512])

In [128]:
instance_path = os.path.join(current_path, "instance_features")
npy_files = [
    os.path.join(root, file)
    for root, _, files in os.walk(instance_path)
    for file in files
    if file.endswith(".npy")
]
npy_files

['/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/instance_features/test/room2_instance_features.npy',
 '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/instance_features/test/office4_instance_features.npy',
 '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/instance_features/val/room1_instance_features.npy',
 '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/instance_features/val/office3_instance_features.npy',
 '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/instance_features/train/office1_instance_features.npy',
 '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/instance_features/train/office0_instance_features.npy',
 '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-29-20-44-08/instance_features/train/room0_instance_features.npy',
 '/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-

### Classify instances

In [129]:
import numpy as np
from clip_utils import classify_features
import torch

for file in npy_files:
    instance_features = np.load(file)
    instance_features = torch.Tensor(instance_features)
    
    sample_name = os.path.basename(file).split('_')[0]
    
    print(f"Processing {sample_name}")
    
    print(text_features.shape)
    print(instance_features.shape)
    
    predicted_classes, confidence_scores = classify_features(text_features, instance_features)

    save_path = os.path.dirname(file)
    torch.save(predicted_classes, os.path.join(save_path, f"{sample_name}_predicted_classes.pl"))
    torch.save(confidence_scores, os.path.join(save_path, f"{sample_name}_confidence_scores.pl"))
    

Processing room2
torch.Size([48, 512])
torch.Size([80, 512])
Processing office4
torch.Size([48, 512])
torch.Size([74, 512])
Processing room1
torch.Size([48, 512])
torch.Size([70, 512])
Processing office3
torch.Size([48, 512])
torch.Size([63, 512])
Processing office1
torch.Size([48, 512])
torch.Size([61, 512])
Processing office0
torch.Size([48, 512])
torch.Size([76, 512])
Processing room0
torch.Size([48, 512])
torch.Size([73, 512])
Processing office2
torch.Size([48, 512])
torch.Size([69, 512])


## 5. Augmenations

In [76]:
import os
from glob import glob
import torch
# Use rotation, translation, change of color
path = "dataset/OpenYOLO3D/output/replica/replica_masks"  # Classes
path1 = "dataset/OpenYOLO3D/output/replica/replica_ground_truth_masks" # Instances
files_masks = sorted(glob(os.path.join(path, '*.pt')))

for file in files_masks:
    sample_name = os.path.basename(file)  
    print(sample_name)
    
    masks, confidences = torch.load(os.path.join(path, sample_name))
    print(masks.shape)
    print(confidences.shape)
    masks, confidences = torch.load(os.path.join(path1, sample_name))
    print(masks.shape)
    print(confidences.shape)

office0.pt
torch.Size([589517, 22])
torch.Size([22])
torch.Size([589517, 68])
torch.Size([68])
office1.pt
torch.Size([423007, 23])
torch.Size([23])
torch.Size([423007, 52])
torch.Size([52])
office2.pt
torch.Size([858623, 27])
torch.Size([27])
torch.Size([858623, 94])
torch.Size([94])
office3.pt
torch.Size([1187140, 27])
torch.Size([27])
torch.Size([1187140, 113])
torch.Size([113])
office4.pt
torch.Size([993008, 28])
torch.Size([28])
torch.Size([993008, 71])
torch.Size([71])
room0.pt
torch.Size([954492, 36])
torch.Size([36])
torch.Size([954492, 94])
torch.Size([94])
room1.pt
torch.Size([645512, 25])
torch.Size([25])
torch.Size([645512, 57])
torch.Size([57])
room2.pt
torch.Size([722496, 21])
torch.Size([21])
torch.Size([722496, 61])
torch.Size([61])


In [77]:
gt_path = "dataset/OpenYOLO3D/output/replica/replica_ground_truth_masks"
point_cloud_base_path = "dataset/data/replica_split"
point_cloud_files = [
    os.path.join(root, file)
    for root, _, files in os.walk(point_cloud_base_path)
    for file in files
    if file.endswith(".pth")
]
print(point_cloud_files)

['dataset/data/replica_split/test/room2.pth', 'dataset/data/replica_split/test/office4.pth', 'dataset/data/replica_split/val/room1.pth', 'dataset/data/replica_split/val/office3.pth', 'dataset/data/replica_split/train/office2.pth', 'dataset/data/replica_split/train/room0.pth', 'dataset/data/replica_split/train/office0.pth', 'dataset/data/replica_split/train/office1.pth']


In [78]:
import random
import numpy as np
import torch
seed = 1234
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
random.seed(seed)

In [56]:
import augmentations
%load_ext autoreload
%autoreload 2

In [80]:
from point_cloud import visualize_point_cloud_with_k3d

for point_cloud in point_cloud_files:
    coords, colors, labels = torch.load(point_cloud)
    
    visualize_point_cloud_with_k3d(coords, colors).display()
    
    sample_name = os.path.basename(point_cloud).split('.')[0]
    
    gt_mask,_ = torch.load(os.path.join(gt_path, f"{sample_name}.pt"))
    for mask in gt_mask.T:
        mask = mask != 0
        _, colors[mask] = augmentations.add_rgb_noise_to_object(coords[mask], colors[mask], sigma=0.1)
    
    normalized_rgb = (colors + 1) / 2
    normalized_rgb = (normalized_rgb * 255).astype(np.uint64)
    
    visualize_point_cloud_with_k3d(coords, normalized_rgb, is_rgb=True, is_norm=True).display()
    break
    



Output()

Output()

In [81]:
from point_cloud import visualize_point_cloud_with_k3d

for point_cloud in point_cloud_files:
    coords, colors, labels = torch.load(point_cloud)
    
    visualize_point_cloud_with_k3d(coords, colors).display()
    
    sample_name = os.path.basename(point_cloud).split('.')[0]
    
    gt_mask,labels = torch.load(os.path.join(gt_path, f"{sample_name}.pt"))
    mask = gt_mask[:,5] != 0
    
    #normalized_rgb = (colors + 1) / 2
    #normalized_rgb = (normalized_rgb * 255).astype(np.uint64)
    #normalized_rgb[mask] = (255., 187., 120.)
    
    for mask in gt_mask.T:
        mask = mask != 0
        coords_tmp = coords.copy()
        colors_tmp = colors.copy()
        coords_tmp[mask], colors_tmp[mask] = augmentations.random_augmentation(coords_tmp[mask], colors_tmp[mask])
    
    visualize_point_cloud_with_k3d(coords, colors).display()
    break

Output()

Output()

##  Instance/Group Feature Extraction for Prompt Learning

### 1. Save augmented scenes

In [76]:
import augmentations

num_augmentations = 10
use_color = False


for point_cloud in point_cloud_files:
    print(point_cloud)
    
    # Load point cloud
    coords, colors, labels = torch.load(point_cloud)
    
    # Get name of current sample (e.g "room0")
    sample_name = os.path.basename(point_cloud).split('.')[0]
    
    # Get ground truth masks to find points in instance
    gt_mask,_ = torch.load(os.path.join(gt_path, f"{sample_name}.pt"))
    mask = gt_mask[:,5] != 0
    
    # Create dir to store augmentations
    output_dir = os.path.join(os.path.dirname(point_cloud), f"{sample_name}_augmentations")
    os.makedirs(output_dir, exist_ok=True)
    
    # Create n augmentations and save to disk
    for i in range(num_augmentations_per_instance):
        coords_tmp = coords.copy()
        colors_tmp = colors.copy()
        for mask in gt_mask.T:
            mask = mask != 0
            coords_tmp[mask], colors_tmp[mask] = augmentations.random_augmentation(coords_tmp[mask], colors_tmp[mask], use_color)
        
        # Save to file
        file_path = os.path.join(output_dir, f"{sample_name}_{i}.pth")

        output = (coords_tmp, colors_tmp, labels)
        torch.save(output, file_path)


dataset/data/replica_split/test/room2.pth
dataset/data/replica_split/test/office4.pth
dataset/data/replica_split/val/room1.pth
dataset/data/replica_split/val/office3.pth
dataset/data/replica_split/train/office2.pth
dataset/data/replica_split/train/room0.pth
dataset/data/replica_split/train/office0.pth
dataset/data/replica_split/train/office1.pth


### 2. Feed augmented scenes through openscne

In [63]:
import experiment
output_path = experiment.get_current_path()
output_path

'/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-27-00-42-47'

In [79]:
%%bash -s "$output_path"
# Run openscene
set -x

exp_dir="$1/openscene/prompt_learning"
config="./config/openscene/replica/replica_lseg_aug.yaml"
feature_type=distill

mkdir -p "${exp_dir}"
result_dir="${exp_dir}"

export PYTHONPATH="models/openscene"
python -u models/openscene/run/evaluate_merged.py \
  --config=${config} \
  feature_type ${feature_type} \
  save_folder ${result_dir} \
  2>&1 | tee -a ${exp_dir}/eval-$(date +"%Y%m%d_%H%M").log

+ exp_dir=/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-27-00-42-47/openscene/prompt_learning
+ config=./config/openscene/replica/replica_lseg_aug.yaml
+ feature_type=distill
+ mkdir -p /cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-27-00-42-47/openscene/prompt_learning
+ result_dir=/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-27-00-42-47/openscene/prompt_learning
+ export PYTHONPATH=models/openscene
+ PYTHONPATH=models/openscene
+ python -u models/openscene/run/evaluate_merged.py --config=./config/openscene/replica/replica_lseg_aug.yaml feature_type distill save_folder /cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-27-00-42-47/openscene/prompt_learning
++ date +%Y%m%d_%H%M
+ tee -a /cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-27-00-42-47/openscene/prompt_learning/eval-20250127_1311.log


torch.__version__:1.12.1+cu113
torch.version.cuda:11.3
torch.backends.cudnn.version:8302
torch.backends.cudnn.enabled:True
[2025-01-27 13:12:02,881 evaluate_merged.py line 164] arch_3d: MinkUNet18A
data_root: dataset/data/replica_split
data_root_2d_fused_feature: data/replica_multiview_openseg
dist_backend: nccl
dist_url: tcp://127.0.0.1:6787
distributed: False
eval_iou: False
exp_dir: ./experiments/openscene/replica_split
feature_2d_extractor: lseg
feature_type: distill
input_color: False
labelset: matterport
manual_seed: 3407
mark_no_feature_to_unknown: True
model_path: https://cvg-data.inf.ethz.ch/openscene/models/matterport_lseg.pth.tar
multiprocessing_distributed: False
ngpus_per_node: 1
prompt_eng: True
rank: 0
save_feature_as_numpy: True
save_folder: /cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-27-00-42-47/openscene/prompt_learning
split: all
sync_bn: False
test_batch_size: 1
test_gpu: [0]
test_repeats: 1
test_workers: 0
use_apex: False
use_augmentations: Tru

RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user


RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user


RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
RPly: Aborted by user
100%|██████████| 88/88 [39:44<00:00, 27.10s/it]


### 3. Merge extracted features

In [106]:
import experiment
import utils
output_path = experiment.get_current_path()
utils.merge_extracted_features_augmented(output_path, num_aug=num_augmentations)

Instance masks:  8
Per point features:  88
Processing office4:
Masks shape: torch.Size([71, 993008])
/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-27-00-42-47/openscene/prompt_learning/test/office4_0_features.npy
Features shape: (993008, 512)
(71, 512)
/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-27-00-42-47/openscene/prompt_learning/test/office4_1_features.npy
Features shape: (993008, 512)
(71, 512)
/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-27-00-42-47/openscene/prompt_learning/test/office4_2_features.npy
Features shape: (993008, 512)
(71, 512)
/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-27-00-42-47/openscene/prompt_learning/test/office4_3_features.npy
Features shape: (993008, 512)
(71, 512)
/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-27-00-42-47/openscene/prompt_learning/test/office4_4_features.npy
Features shape: (993008, 512)
(71, 512)
/cluster/54/blessman/ml3d/experiments/merged

Features shape: (858623, 512)
(94, 512)
/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-27-00-42-47/openscene/prompt_learning/train/office2_1_features.npy
Features shape: (858623, 512)
(94, 512)
/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-27-00-42-47/openscene/prompt_learning/train/office2_2_features.npy
Features shape: (858623, 512)
(94, 512)
/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-27-00-42-47/openscene/prompt_learning/train/office2_3_features.npy
Features shape: (858623, 512)
(94, 512)
/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-27-00-42-47/openscene/prompt_learning/train/office2_4_features.npy
Features shape: (858623, 512)
(94, 512)
/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-27-00-42-47/openscene/prompt_learning/train/office2_5_features.npy
Features shape: (858623, 512)
(94, 512)
/cluster/54/blessman/ml3d/experiments/merged_pipline/run_2025-01-27-00-42-47/openscene/prompt_learni

### Visualize augmented scenes:

In [5]:
import os
import torch
base_path = "dataset/data/replica_split"
point_cloud_files = [
    os.path.join(root, file)
    for root, _, files in os.walk(base_path)
    for file in files
    if file.endswith(".pth")
]
len(point_cloud_files)

88

In [7]:
from point_cloud import visualize_point_cloud_with_k3d

for point_cloud_file in point_cloud_files[:10]:
    coords, colors, labels = torch.load(point_cloud_file)
    
    visualize_point_cloud_with_k3d(coords, colors).display()

[255 255 255 ... 255 255 255]


Output()

[255 255 255 ... 255 255 255]


Output()

[255 255 255 ... 255 255 255]


Output()

[255 255 255 ... 255 255 255]


Output()

[255 255 255 ... 255 255 255]


Output()

[255 255 255 ... 255 255 255]


Output()

[255 255 255 ... 255 255 255]


Output()

[255 255 255 ... 255 255 255]


Output()

[255 255 255 ... 255 255 255]


Output()

[255 255 255 ... 255 255 255]


Output()

## Test ground truth textfiles

In [75]:
from glob import glob
import os
import utils
import point_cloud
import clip_utils
import numpy as np
import torch
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [61]:
gt_path = "dataset/data/replica_split/ground_truth"

In [72]:
files = utils.get_all_files_in_dir(gt_path, "npy")
files

['dataset/data/replica_split/ground_truth/office0.npy',
 'dataset/data/replica_split/ground_truth/office1.npy',
 'dataset/data/replica_split/ground_truth/office2.npy',
 'dataset/data/replica_split/ground_truth/office3.npy',
 'dataset/data/replica_split/ground_truth/office4.npy',
 'dataset/data/replica_split/ground_truth/room0.npy',
 'dataset/data/replica_split/ground_truth/room1.npy',
 'dataset/data/replica_split/ground_truth/room2.npy']

In [73]:
per_point_ids = np.load(files[0])

In [96]:
use_id = per_point_ids[36464]
print("GT id: ", use_id)
print("Label: ", clip_utils.get_label(use_id))

GT id:  38
Label:  table


In [97]:
tv_mask = per_point_ids == use_id

In [98]:
coords, colors, _ = torch.load("dataset/data/replica_split/train/office0.pth")
point_cloud.visualize_point_cloud_with_k3d(coords, colors).display()

colors = (colors + 1) / 2
colors = (colors * 255).astype(np.uint64)

colors[tv_mask] = (214., 39., 40.)

point_cloud.visualize_point_cloud_with_k3d(coords, colors, is_rgb=True, is_norm=True).display()

Output()

Output()

# IGNORE THIS FOR NOW:

In [50]:
import sys
sys.path.append("models/openscene")
sys.path.append("dataset")

from models.mink_unet import mink_unet as model3D
from torch import nn
from torch.utils import model_zoo
from voxelizer import Voxelizer
import experiment
import numpy as np
import os
import torch
import augmentations

from MinkowskiEngine import SparseTensor

In [16]:
gt_path = "dataset/OpenYOLO3D/output/replica/replica_ground_truth_masks"
point_cloud_base_path = "dataset/data/replica_split"
point_cloud_files = [
    os.path.join(root, file)
    for root, _, files in os.walk(point_cloud_base_path)
    for file in files
    if file.endswith(".pth")
]
print(point_cloud_files)

['dataset/data/replica_split/test/room2.pth', 'dataset/data/replica_split/test/office4.pth', 'dataset/data/replica_split/val/room1.pth', 'dataset/data/replica_split/val/office3.pth', 'dataset/data/replica_split/train/office2.pth', 'dataset/data/replica_split/train/room0.pth', 'dataset/data/replica_split/train/office0.pth', 'dataset/data/replica_split/train/office1.pth']


In [5]:
current_path = experiment.setup_experiment()

Created new experiment folder: experiments/merged_pipline/run_2025-01-27-10-48-29


In [6]:
def constructor3d(**kwargs):
    model = model3D(**kwargs)
    return model


class DisNet(nn.Module):
    '''3D Sparse UNet for Distillation.'''
    def __init__(self, cfg=None):
        super(DisNet, self).__init__()
        last_dim = 512

        # MinkowskiNet for 3D point clouds
        net3d = constructor3d(in_channels=3, out_channels=last_dim, D=3, arch="MinkUNet18A")
        self.net3d = net3d

    def forward(self, sparse_3d):
        '''Forward method.'''
        return self.net3d(sparse_3d)

In [7]:
model = DisNet().cuda()

In [8]:
model_path = "https://cvg-data.inf.ethz.ch/openscene/models/matterport_lseg.pth.tar"
checkpoint = model_zoo.load_url(model_path, progress=True)
model.load_state_dict(checkpoint['state_dict'], strict=True)

<All keys matched successfully>

In [42]:
class DataHandler():
    
    # Augmentation arguments
    SCALE_AUGMENTATION_BOUND = (0.9, 1.1)
    ROTATION_AUGMENTATION_BOUND = ((-np.pi / 64, np.pi / 64), (-np.pi / 64, np.pi / 64), (-np.pi,
                                                                                          np.pi))
    TRANSLATION_AUGMENTATION_RATIO_BOUND = ((-0.2, 0.2), (-0.2, 0.2), (0, 0))
    ELASTIC_DISTORT_PARAMS = ((0.2, 0.4), (0.8, 1.6))

    ROTATION_AXIS = 'z'
    LOCFEAT_IDX = 2
    
    def __init__(self, use_color, voxel_size, use_augmentation):
        self.voxelizer = Voxelizer(
                voxel_size=voxel_size,
                clip_bound=None,
                use_augmentation=use_augmentation,
                scale_augmentation_bound=self.SCALE_AUGMENTATION_BOUND,
                rotation_augmentation_bound=self.ROTATION_AUGMENTATION_BOUND,
                translation_augmentation_ratio_bound=self.TRANSLATION_AUGMENTATION_RATIO_BOUND)
        self.use_color = use_color
    
    def get_point_cloud_samples(self, locs, feats_in, labels):
        # no color in the input point cloud, e.g nuscenes
        if np.isscalar(feats_in) and feats_in == 0:
            feats_in = np.zeros_like(locs_in)
        feats_in = (feats_in + 1.) * 127.5

        locs, feats, _, inds_reconstruct = self.voxelizer.voxelize(
            locs, feats_in, labels)
        coords = torch.from_numpy(locs).int()
        coords = torch.cat(
            (torch.ones(coords.shape[0], 1, dtype=torch.int), coords), dim=1)
        if self.use_color:
            feats = torch.from_numpy(feats).float() / 127.5 - 1.
        else:
            feats = torch.ones(coords.shape[0], 3)
        return coords, feats, torch.from_numpy(inds_reconstruct).long()

In [43]:
use_color = False
use_vox_augmentation = False
data_handler = DataHandler(use_color=use_color, voxel_size=0.02, use_augmentation=use_vox_augmentation)
num_augmentatibns_per_instance = 10

In [51]:
for point_cloud_file in point_cloud_files:
    coords, colors, labels = torch.load(point_cloud_file)
    
    sample_name = os.path.basename(point_cloud_file).split('.')[0]
    
    gt_mask,_ = torch.load(os.path.join(gt_path, f"{sample_name}.pt"))
    mask = gt_mask[:,5] != 0
    
    
    for mask in gt_mask.T:
        mask = mask != 0
        instance_coords = coords.copy()
        instance_colors = colors.copy()
        
        batch = data_handler.get_point_cloud_samples(instance_coords, instance_colors, labels)
        transformed_instance_coords, transformed_instance_colors, transformed_instance_colors_recon = batch
        
        transformed_instances_coords = [transformed_instance_coords]
        transformed_instances_colors = [transformed_instance_colors]
        transformed_instances_recon = [transformed_instance_recon]
        
        for _ in range(num_augmentations_per_instance-1):
            aug_instance_coords = coords.copy()
            aug_instance_colors = colors.copy()
            aug_instance_coords[mask], aug_instance_colors[mask] = augmentations.random_augmentation(
                aug_instance_coords[mask], aug_instance_colors[mask]
            )
            
            batch = data_handler.get_point_cloud_samples(aug_instance_coords, aug_instance_colors, labels)
            transformed_instance_coords, transformed_instance_colors, transformed_instance_colors_recon = batch                                                                                                                                  
                                                                                                                                          
            transformed_instances_coords.append(transformed_instance_coords)
            transformed_instances_colors.append(transformed_instance_colors)
            transformed_instances_recon.append(transformed_instance_colors_recon)

        # Stack augmented versions into a single tensor
        #coords = torch.stack(transformed_instances_coords)
        #feat = torch.stack(transformed_instances_colors)
        #inds_reverse = torch.stack(transformed_instances_recon)
        for i in range(len(transformed_instances_coords)):
            coords = transformed_instances_coords[i]
            feat = transformed_instances_colors[i]
            inds_reverse = transformed_instances_recon[i]
            sinput = SparseTensor(feat.cuda(non_blocking=True), coords.cuda(non_blocking=True))

            # Use openscene to get features
            with torch.no_grad():
                model.eval()
                predictions = model(sinput)
                predictions = predictions[inds_reverse, :]
                predictions = predictions.cpu().numpy()

            print(predictions.shape)
            break
        break   
    
    #visualize_point_cloud_with_k3d(coords, colors).display()
    break

../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [211524,0,0], thread: [32,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [211524,0,0], thread: [33,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [211524,0,0], thread: [34,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [211524,0,0], thread: [35,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [211524,0,0], thread: [36,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [211524,0,0], thread: [

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.