# Project 4: OpenPrompt: Open-Vocabulary 3D Scene Understanding and Instance Segmentation with Adaptive Prompt Learning

### Dataset
- **Replica Dataset**  
  - Download link: [Replica dataset](https://github.com/aminebdj/OpenYOLO3D/blob/main/scripts/get_replica_dataset.sh)

### Evaluation Script
- **Replica Evaluation Script**  
  - Link: [Replica evaluation script](https://github.com/aminebdj/OpenYOLO3D/tree/main/evaluate/replica)

### Reference Papers for Prompt Learning
1. **Align Your Prompts:** Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
2. **MaPLe:** Multi-modal Prompt Learning

### Modifications and Goals
1. **Objective:**  
   Create an open-vocabulary 3D instance segmentation pipeline.
   - Use **OpenScene** for feature extraction.
   - Use **Mask3D** for class-agnostic proposal generation.

2. **Testing:**  
   Evaluate open-vocabulary instance segmentation results on the **Replica dataset**.  
   - Metric: **Mean Average Precision (mAP)**  
   - Evaluation script: Provided above.

3. **Baseline and Improvements:**  
   - Establish baseline results with the unmodified pipeline (fixed CLIP text prompts).
   - Implement **prompt learning** to improve performance.
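
The mAP metric above can be illustrated with a simplified NumPy sketch. It assumes boolean per-point instance masks for a single class in a single scene. Predictions are ranked by confidence, and each one is greedily matched to the unmatched ground-truth instance with the highest IoU. The precision-recall curve is then integrated with all-point interpolation, and averaging over IoU thresholds 0.50:0.05:0.95 gives an mAP-style number. The function names are hypothetical. The linked Replica script is the reference for reported numbers, and its exact protocol (for example, how it aggregates across scenes and classes or treats unlabeled points) may differ from this sketch.

```python
import numpy as np


def mask_iou(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Pairwise IoU between boolean masks pred (P, N) and gt (G, N) -> (P, G)."""
    pred = pred.astype(np.float64)
    gt = gt.astype(np.float64)
    inter = pred @ gt.T
    union = pred.sum(axis=1)[:, None] + gt.sum(axis=1)[None, :] - inter
    return np.where(union > 0, inter / np.maximum(union, 1e-12), 0.0)


def average_precision(pred_masks, scores, gt_masks, iou_thresh=0.5):
    """Hypothetical helper: AP for one class in one scene at one IoU threshold."""
    gt_masks = np.asarray(gt_masks, dtype=bool)
    scores = np.asarray(scores, dtype=np.float64)
    num_gt = len(gt_masks)
    if num_gt == 0:
        return float("nan")  # AP is undefined without ground truth
    if len(scores) == 0:
        return 0.0  # no predictions means zero recall
    order = np.argsort(-scores, kind="stable")  # highest confidence first
    ious = mask_iou(np.asarray(pred_masks, dtype=bool)[order], gt_masks)

    matched = np.zeros(num_gt, dtype=bool)
    tp = np.zeros(len(order))
    for i in range(len(order)):
        # Best still-unmatched ground-truth instance for this prediction
        j = int(np.argmax(np.where(matched, -1.0, ious[i])))
        if not matched[j] and ious[i, j] >= iou_thresh:
            matched[j] = True
            tp[i] = 1.0

    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / num_gt
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-12)

    # All-point interpolation: make precision monotonically non-increasing
    mrec = np.concatenate([[0.0], recall, [1.0]])
    mpre = np.concatenate([[0.0], precision, [0.0]])
    for k in range(len(mpre) - 2, -1, -1):
        mpre[k] = max(mpre[k], mpre[k + 1])
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


def mean_ap(pred_masks, scores, gt_masks,
            thresholds=np.arange(0.5, 0.95 + 1e-9, 0.05)):
    """Average the single-threshold AP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    return float(np.mean([average_precision(pred_masks, scores, gt_masks, t)
                          for t in thresholds]))
```

In a full evaluation, the per-class AP values would be averaged over classes to obtain mAP.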

### Prompt Learning Details
- Replace the fixed text features from the **CLIP text encoder** with a **trainable prompt**, initialized from a hand-crafted text prompt.
- **Training Objective:** Maximize the cosine similarity (equivalently, minimize the cosine distance) between the visual features of the same object under different augmentations (e.g., translations, rotations, or color changes).
- Optimize the learnable prompt during training so that the visual features of the same object stay consistent across augmentations.
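
The augmentation-consistency objective can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the project's training code. `augment` applies a random rotation about the z (up) axis, a random translation, and color scaling plus Gaussian jitter to one point cloud. `consistency_loss` returns the mean cosine distance `1 - cos` between features of the same instances in two augmented views, so minimizing it maximizes their cosine similarity. All function names, augmentation ranges, and the assumption that row `i` of both feature arrays belongs to the same object are hypothetical. In practice the features would come from the pipeline, and gradients would flow to the learnable prompt through a framework such as PyTorch, which this sketch does not model.

```python
import numpy as np


def augment(points, colors, rng):
    """Hypothetical augmentation: z-rotation, translation, and color jitter.

    points: (N, 3) xyz coordinates; colors: (N, 3) RGB values in [0, 1].
    """
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    shift = rng.uniform(-0.5, 0.5, size=3)          # translation in scene units
    new_points = points @ rot_z.T + shift            # rigid transform keeps geometry
    jitter = rng.normal(0.0, 0.05, size=colors.shape)
    new_colors = np.clip(colors * rng.uniform(0.8, 1.2) + jitter, 0.0, 1.0)
    return new_points, new_colors


def cosine_similarity(a, b, eps=1e-8):
    """Row-wise cosine similarity between arrays of shape (M, D)."""
    a = a / np.maximum(np.linalg.norm(a, axis=-1, keepdims=True), eps)
    b = b / np.maximum(np.linalg.norm(b, axis=-1, keepdims=True), eps)
    return np.sum(a * b, axis=-1)


def consistency_loss(feats_view1, feats_view2):
    """Mean cosine distance between features of the same instances in two views."""
    return float(np.mean(1.0 - cosine_similarity(feats_view1, feats_view2)))
```

Rotating only about the vertical axis is a common choice for gravity-aligned indoor scans such as Replica, since it changes the viewpoint without producing upside-down scenes.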

### Summary
- Develop and test an open-vocabulary 3D instance segmentation pipeline with the specified modifications.
- Leverage prompt learning techniques to enhance the baseline model's performance on the **Replica dataset**.
- Evaluate using **mAP** as the primary metric.

# Installation

Step 1: Follow the Mask3D installation instructions.

Step 2: Install the OpenEXR development library

In [None]:
sudo apt-get install libopenexr-dev

Step 3: Install CLIP

In [None]:
pip install git+https://github.com/openai/CLIP.git

Step 4: Install tensorboardX

In [None]:
pip install tensorboardx

Step 5: Install SharedArray

In [None]:
pip install sharedarray
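
After these steps, a quick sanity check can confirm that the Python packages are importable before running the pipeline. This is a minimal sketch using only the standard library. The list of import names (`torch`, `clip`, `tensorboardX`, `SharedArray`, plus `open3d` and `k3d` for the visualization cell) is an assumption based on the packages used in this notebook. System libraries such as `libopenexr-dev` cannot be checked this way.

```python
import importlib.util

# Assumed top-level import names for the packages used in this notebook.
REQUIRED_MODULES = ["torch", "clip", "tensorboardX", "SharedArray", "open3d", "k3d"]


def missing_modules(names):
    """Return the module names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules found.")
```

`find_spec` only locates a module without importing it, so the check is fast and does not trigger CUDA initialization.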

# Running the merged pipeline

## 1. OpenScene:

In [1]:
%%bash
# Run OpenScene evaluation with distilled 3D features on the Replica split
set -x

exp_dir="./experiments/openscene/replica_split"
config="./models/replica/replica.yaml"
feature_type=distill

mkdir -p "${exp_dir}"
result_dir="${exp_dir}/result_eval"

export PYTHONPATH="models/openscene"
python -u models/openscene/run/evaluate_merged.py \
  --config=${config} \
  feature_type ${feature_type} \
  save_folder ${result_dir} \
  2>&1 | tee -a ${exp_dir}/eval-$(date +"%Y%m%d_%H%M").log

+ exp_dir=./experiments/openscene/replica_split
+ config=./config/replica/replica.yaml
+ feature_type=distill
+ mkdir -p ./experiments/openscene/replica_split
+ result_dir=./experiments/openscene/replica_split/result_eval
+ export PYTHONPATH=models/openscene
+ PYTHONPATH=models/openscene
+ python -u models/openscene/run/evaluate_merged.py --config=./config/replica/replica.yaml feature_type distill save_folder ./experiments/openscene/replica_split/result_eval
++ date +%Y%m%d_%H%M
+ tee -a ./experiments/openscene/replica_split/eval-20250118_1506.log


torch.__version__:1.12.1+cu113
torch.version.cuda:11.3
torch.backends.cudnn.version:8302
torch.backends.cudnn.enabled:True
[2025-01-18 15:07:37,990 evaluate_merged.py line 159] arch_3d: MinkUNet18A
data_root: datasets/data/replica_split
data_root_2d_fused_feature: data/replica_multiview_openseg
dist_backend: nccl
dist_url: tcp://127.0.0.1:6787
distributed: False
eval_iou: False
feature_2d_extractor: openseg
feature_type: distill
input_color: False
labelset: matterport
manual_seed: 3407
mark_no_feature_to_unknown: True
model_path: https://cvg-data.inf.ethz.ch/openscene/models/matterport_openseg.pth.tar
multiprocessing_distributed: False
ngpus_per_node: 1
prompt_eng: True
rank: 0
save_feature_as_numpy: False
save_folder: ./experiments/openscene/replica_split/result_eval
split: val
sync_bn: False
test_batch_size: 1
test_gpu: [0]
test_repeats: 1
test_workers: 0
use_apex: False
use_shm: False
vis_input: True
vis_pred: True
voxel_size: 0.02
world_size: 1
Use prompt engineering: a XX in a sce

## Visualize results

In [15]:
import glob
import os

import k3d
import numpy as np
import open3d as o3d


def visualize_ply_with_k3d(file_path, point_size=0.05):
    """
    Load a PLY file and visualize it using k3d in a Jupyter Notebook.

    Args:
        file_path (str): Path to the PLY file.
        point_size (float): Size of the points in the visualization.

    Returns:
        The k3d plot, or None if the file could not be loaded.
    """
    # Load the PLY file using Open3D
    ply_data = o3d.io.read_point_cloud(file_path)

    # Check if the file is loaded correctly
    if ply_data.is_empty():
        print(f"Failed to load PLY file: {file_path}")
        return None

    print("PLY file loaded successfully!")
    print(f"Number of points: {len(ply_data.points)}")

    # 3D coordinates (k3d expects float32 positions)
    coords = np.asarray(ply_data.points, dtype=np.float32)

    plot = k3d.plot()
    if ply_data.has_colors():
        # Open3D stores RGB as floats in [0, 1]; pack them into 0xRRGGBB integers
        colors = (np.asarray(ply_data.colors) * 255).round().astype(np.uint32)
        colors_hex = (colors[:, 0] << 16) | (colors[:, 1] << 8) | colors[:, 2]
        plot += k3d.points(positions=coords, point_size=point_size, colors=colors_hex)
    else:
        plot += k3d.points(positions=coords, point_size=point_size)
    return plot


# Directory containing the PLY files written by the OpenScene evaluation
result_dir = "experiments/openscene/replica_split/result_eval"
files = glob.glob(os.path.join(result_dir, "*.ply"))

# Visualize every PLY file in the directory
for file in files:
    plot = visualize_ply_with_k3d(file)
    if plot is not None:
        plot.display()


PLY file loaded successfully!
Number of points: 1187140



PLY file loaded successfully!
Number of points: 645512



PLY file loaded successfully!
Number of points: 645512



PLY file loaded successfully!
Number of points: 1187140



## 2. Mask3D:

In [None]:
%%bash
export OMP_NUM_THREADS=3

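# Inference hyperparameters for the STPLS3D checkpoint
CURR_DBSCAN=14.0   # DBSCAN eps used to split predicted masks
CURR_TOPK=750      # number of top-scoring instances kept per scene
CURR_QUERY=160     # number of instance queries
CURR_SIZE=54       # crop length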

python main_instance_segmentation.py \
general.experiment_name="validation_query_${CURR_QUERY}_topk_${CURR_TOPK}_dbscan_${CURR_DBSCAN}_size_${CURR_SIZE}" \
general.project_name="stpls3d_eval" \
data/datasets=stpls3d \
general.num_targets=15 \
data.num_labels=15 \
data.voxel_size=0.333 \
data.num_workers=10 \
data.cache_data=true \
data.cropping_v1=false \
general.reps_per_epoch=100 \
model.num_queries=${CURR_QUERY} \
general.on_crops=true \
model.config.backbone._target_=models.Res16UNet18B \
general.train_mode=false \
general.checkpoint="checkpoints/stpls3d/stpls3d_val.ckpt" \
data.crop_length=${CURR_SIZE} \
general.eval_inner_core=50.0 \
general.topk_per_image=${CURR_TOPK} \
general.use_dbscan=true \
general.dbscan_eps=${CURR_DBSCAN}

In [None]:
%%bash
export OMP_NUM_THREADS=3
# Disable Weights & Biases logging entirely
export WANDB_MODE=disabled

CURR_DBSCAN=14.0
CURR_TOPK=750
CURR_QUERY=160
CURR_SIZE=54

python main_instance_segmentation.py \
data/datasets=stpls3d \
general.train_mode=false \
general.checkpoint="checkpoints/stpls3d/stpls3d_val.ckpt"
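
The OpenScene and Mask3D cells above run independently, and the notebook does not show how their outputs are combined into open-vocabulary instances. Below is a minimal, hypothetical sketch of one common fusion strategy. It assumes three inputs defined over the same set of points: per-point OpenScene features of shape (N, D), Mask3D masks as a boolean (M, N) array, and CLIP text embeddings of the class prompts with shape (C, D). The sketch averages the normalized point features inside each mask, scores the result against every text embedding with cosine similarity, and applies a softmax. The function name and the temperature of 100 (mirroring CLIP's logit scale) are assumptions, not this project's code.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit L2 norm along `axis`."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def classify_masks(point_feats, masks, text_feats, temperature=100.0):
    """Hypothetical fusion step: label each class-agnostic mask.

    point_feats: (N, D) per-point features (e.g. OpenScene distilled features)
    masks:       (M, N) boolean instance masks (e.g. Mask3D proposals)
    text_feats:  (C, D) text embeddings, one per class prompt
    Returns labels (M,) and confidence scores (M,).
    """
    point_feats = l2_normalize(np.asarray(point_feats, dtype=np.float64))
    masks = np.asarray(masks, dtype=bool)
    counts = masks.sum(axis=1, keepdims=True)                     # points per mask
    mask_feats = (masks.astype(np.float64) @ point_feats) / np.maximum(counts, 1)
    logits = temperature * l2_normalize(mask_feats) @ l2_normalize(text_feats).T
    logits -= logits.max(axis=1, keepdims=True)                   # numerically stable softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    labels = probs.argmax(axis=1)
    scores = probs[np.arange(len(labels)), labels]
    return labels, scores
```

An empty mask yields a zero feature and therefore a uniform distribution over classes. The resulting labels, scores, and masks are the inputs an mAP evaluation expects.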