Stream3D: Streaming Zero-Shot 3D Instance Segmentation with Multi-View Noise Mask Filtering and Manifold Refining

Jie Xu, Na Zhao
Singapore University of Technology and Design (SUTD)

CVPR 2026 Findings track

Acknowledgments

We propose Stream3D, a novel streaming zero-shot/open-vocabulary 3D instance segmentation framework. This work builds on excellent prior work, especially MaskClustering, OpenMask3D, and OVIR-3D. We share the same experiment environment as MaskClustering, so we adopt its README.md file as follows.

Fast Demo

Step 1: Install dependencies

First, install PyTorch following the official instructions, e.g., for CUDA 11.8:

conda install pytorch==2.0.0 torchvision==0.15.0 pytorch-cuda=11.8 -c pytorch -c nvidia

Then, install PyTorch3D. You can try 'pip install pytorch3d', but it did not work for me, so I install it from source:

cd third_party
git clone git@github.com:facebookresearch/pytorch3d.git
cd pytorch3d && pip install -e .

Finally, install other dependencies:

cd ../..
pip install -r requirements.txt
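
A quick way to confirm that the packages installed above are importable is sketched below. The helper name missing_modules is hypothetical and not part of the repository; it only checks that Python can locate each top-level module and does not verify CUDA support.

```python
import importlib.util


def missing_modules(names):
    """Return the names in `names` that the import system cannot locate."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # Top-level module names of the packages installed in Step 1.
    required = ["torch", "torchvision", "pytorch3d"]
    missing = missing_modules(required)
    print("missing:", ", ".join(missing) if missing else "none")
```

If the script reports a missing module, repeat the corresponding installation step before running the demo.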

Step 2: Download demo data from the MaskClustering demo data. Then unzip the data to ./data and your directory should look like this: data/demo/scene0608_00, etc.

Step 3: Run the clustering demo and visualize the class-agnostic result using Pyviz3d:

bash demo.sh

Quantitative Results

In this section, we provide a comprehensive guide to installing the full version of Stream3D, preparing the data, and running experiments on the ScanNet, ScanNet++, and MatterPort3D datasets.

Further installation

To run the full pipeline of Stream3D, you need to install the 2D instance segmentation tool CropFormer and OpenCLIP.

CropFormer

The official installation of CropFormer is composed of two steps: installing detectron2 and then CropFormer. For your convenience, I have combined the two steps into the following commands. If you have any problems, please refer to the original CropFormer installation guide.

cd third_party
git clone git@github.com:facebookresearch/detectron2.git
cd detectron2
pip install -e .
cd ../
git clone git@github.com:qqlu/Entity.git
cp -r Entity/Entityv2/CropFormer detectron2/projects
cd detectron2/projects/CropFormer/entity_api/PythonAPI
make
cd ../..
cd mask2former/modeling/pixel_decoder/ops
sh make.sh
pip install -U openmim
mim install mmcv

We add an additional script to CropFormer so that it processes all sequences one after another.

cd ../../../../../../../../
cp mask_predict.py third_party/detectron2/projects/CropFormer/demo_cropformer

Finally, download the CropFormer checkpoint and modify the 'cropformer_path' variable in script.py.

CLIP

Install the OpenCLIP library with:

pip install open_clip_torch

The checkpoint is downloaded automatically the first time you run the script. If you prefer to download it manually, you can download it from here and pass its path when loading the CLIP model with the 'create_model_and_transforms' function.
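
The manual-checkpoint option can be sketched as follows. The helper names resolve_pretrained and load_clip are hypothetical, and the model name and pretrained tag in the usage comment are assumptions, since this README does not state which CLIP variant Stream3D loads. The sketch relies on the 'pretrained' argument of 'create_model_and_transforms' accepting either a pretrained tag or a local file path.

```python
import os


def resolve_pretrained(local_path, default_tag):
    """Prefer a manually downloaded checkpoint file; otherwise use the tag,
    which lets open_clip download the weights automatically."""
    if local_path and os.path.isfile(local_path):
        return local_path
    return default_tag


def load_clip(model_name, pretrained):
    """Load an OpenCLIP model, its inference transform, and its tokenizer."""
    # Imported lazily so this file can be imported without open_clip installed.
    import open_clip

    model, _, preprocess = open_clip.create_model_and_transforms(
        model_name, pretrained=pretrained
    )
    tokenizer = open_clip.get_tokenizer(model_name)
    return model.eval(), preprocess, tokenizer


# Example usage (assumed model name, tag, and path; adjust to your setup):
# pretrained = resolve_pretrained("/path/to/checkpoint.bin", "laion2b_s32b_b79k")
# model, preprocess, tokenizer = load_clip("ViT-H-14", pretrained)
```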

Data Preparation

ScanNet

Please follow the official ScanNet guide to sign the agreement and send it to scannet@googlegroups.com. After receiving the response, you can download the data. You only need to download the ['.aggregation.json', '.sens', '.txt', '_vh_clean_2.0.010000.segs.json', '_vh_clean_2.ply', '_vh_clean_2.labels.ply'] files. Please also enable the 'label_map' option to download the 'scannetv2-labels.combined.tsv' file.
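
To check that a downloaded scene contains every file type listed above, a small helper along these lines may be useful. The function names are hypothetical, and the sketch assumes that all files of a scene sit directly in one folder and are named by prefixing the scene id to each suffix.

```python
from pathlib import Path

# File suffixes listed in the download instructions above.
SCANNET_SUFFIXES = [
    ".aggregation.json",
    ".sens",
    ".txt",
    "_vh_clean_2.0.010000.segs.json",
    "_vh_clean_2.ply",
    "_vh_clean_2.labels.ply",
]


def expected_files(scene_id):
    """Build the expected file names for one scene, e.g. scene0011_00.sens."""
    return [scene_id + suffix for suffix in SCANNET_SUFFIXES]


def missing_files(scene_dir, scene_id):
    """Return the expected file names that are absent from `scene_dir`."""
    scene_dir = Path(scene_dir)
    return [name for name in expected_files(scene_id) if not (scene_dir / name).is_file()]
```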

After downloading the data, you can run the following script to prepare the data. Please change the 'raw_data_dir', 'target_data_dir', 'split_file_path', 'label_map_file' and 'gt_dir' variables before you run.

cd preprocess/scannet
python process_val.py
python prepare_gt.py

After running the script, you will get the following directory structure:

data/scannet
  ├── processed
      ├── scene0011_00
          ├── pose                            <- folder with camera poses
          │      ├── 0.txt 
          │      ├── 10.txt 
          │      └── ...  
          ├── color                           <- folder with RGB images
          │      ├── 0.jpg  (or .png/.jpeg)
          │      ├── 10.jpg (or .png/.jpeg)
          │      └── ...  
          ├── depth                           <- folder with depth images
          │      ├── 0.png  (or .jpg/.jpeg)
          │      ├── 10.png (or .jpg/.jpeg)
          │      └── ...  
          ├── intrinsic                 
          │      ├── intrinsic_depth.txt       <- camera intrinsics
          │      └── ...
          └── scene0011_00_vh_clean_2.ply      <- point cloud of the scene
  └── gt                                       <- folder with ground truth 3D instance masks
      ├── scene0011_00.txt
      └── ...
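
The layout above can be verified with a short script before running experiments. This is a minimal sketch: the function names are hypothetical, and it only checks that the listed folders and files exist, not that their contents are valid.

```python
from pathlib import Path


def check_processed_scene(processed_dir, scene_id):
    """Return the entries of the expected layout that are missing for a scene."""
    scene = Path(processed_dir) / scene_id
    required = [
        scene / "pose",
        scene / "color",
        scene / "depth",
        scene / "intrinsic" / "intrinsic_depth.txt",
        scene / f"{scene_id}_vh_clean_2.ply",
    ]
    return [p.relative_to(scene).as_posix() for p in required if not p.exists()]


def frame_ids(scene_dir, folder):
    """Sorted integer frame ids taken from file stems such as 0.txt or 10.jpg."""
    files = (Path(scene_dir) / folder).iterdir()
    return sorted(int(p.stem) for p in files if p.stem.isdigit())
```

Comparing frame_ids for the pose, color, and depth folders is a cheap way to spot frames that were only partially extracted.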

ScanNet++

Please follow the official ScanNet++ guide to sign the agreement and download the data. To help reproduce the results, we provide the configs we use to download and preprocess ScanNet++ in preprocess/scannetpp. Clone the ScanNet++ toolkit, modify the paths in these configs, and copy them to the corresponding folders in the toolkit before running the scripts.

To extract the RGB and depth images, run the following script:

  python -m iphone.prepare_iphone_data iphone/configs/prepare_iphone_data.yml
  python -m common.render common/configs/render.yml

Since the original mesh has a very high resolution, we downsample it and generate the ground truth accordingly:

  python -m semantic.prep.prepare_training_data semantic/configs/prepare_training_data.yml
  python -m semantic.prep.prepare_semantic_gt semantic/configs/prepare_semantic_gt.yml

After running the script, you will get the following directory structure:

data/scannetpp
  ├── data
      ├── 0d2ee665be
          ├── iphone                            
          |       ├── rgb
          │         ├── frame_000000.jpg 
          │         ├── frame_000001.jpg 
          │         └── ... 
          |       ├── render_depth 
          │         ├── frame_000000.png 
          │         ├── frame_000001.png 
          │         └── ... 
          |       └── ... 
          └── scans                        
      └── ...
  ├── gt 
  ├── metadata
  ├── pcld_0.25     <- downsampled point cloud of the scene
  └── splits
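
As a sanity check on the extraction and rendering steps, the sketch below lists the frames of one scene that have both an RGB image and a rendered depth image. The function name paired_frames is hypothetical; the folder and file naming follows the tree above.

```python
from pathlib import Path


def paired_frames(scene_dir):
    """Return sorted frame names present in both iphone/rgb and iphone/render_depth."""
    iphone = Path(scene_dir) / "iphone"
    rgb = {p.stem for p in (iphone / "rgb").glob("frame_*.jpg")}
    depth = {p.stem for p in (iphone / "render_depth").glob("frame_*.png")}
    return sorted(rgb & depth)
```

A result that is much shorter than the number of RGB frames suggests that the depth rendering step did not finish for that scene.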

MatterPort3D

Please follow the official MatterPort3D guide to sign the agreement and download the data. We use a subset of its testing scenes to ensure Mask3D remains within memory constraints. The list of scenes we use can be found in splits/matterport3d.txt. Download only the following: ['undistorted_color_images', 'undistorted_depth_images', 'undistorted_camera_parameters', 'house_segmentations']. Upon download, unzip the files. Your directory structure should resemble (or you can modify the paths in 'preprocess/matterport3d/process.py' and 'dataset/matterport.py'):

data/matterport3d/scans
  ├── 2t7WUuJeko7
      ├── 2t7WUuJeko7
          ├── house_segmentations
          |         ├── 2t7WUuJeko7.ply
          |         └── ...
          ├── undistorted_camera_parameters
          |         └── 2t7WUuJeko7.conf
          ├── undistorted_color_images
          |         ├── xxx_i0_0.jpg
          |         └── ...
          └── undistorted_depth_images
                    ├── xxx_d0_0.png
                    └── ...
  ├── ARNzJeq3xxb
  ├── ...
  └── YVUC4YcDtcY
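
Before preparing the ground truth, the following sketch reports which scenes from the split file are missing any of the four required folders. The function names are hypothetical, and the split file is assumed to contain one scene id per line, which this README does not confirm.

```python
from pathlib import Path

# Folders listed in the download instructions above.
REQUIRED_FOLDERS = [
    "undistorted_color_images",
    "undistorted_depth_images",
    "undistorted_camera_parameters",
    "house_segmentations",
]


def read_split(split_file):
    """Read scene ids, assuming one id per line; blank lines are skipped."""
    lines = Path(split_file).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def incomplete_scenes(scans_dir, scene_ids):
    """Map each incomplete scene id to the list of folders it is missing."""
    report = {}
    for scene_id in scene_ids:
        # The layout above nests the scene id twice: scans/<id>/<id>/...
        root = Path(scans_dir) / scene_id / scene_id
        missing = [f for f in REQUIRED_FOLDERS if not (root / f).is_dir()]
        if missing:
            report[scene_id] = missing
    return report
```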

Then run the following script to prepare the ground truth:

cd preprocess/matterport3d
python process.py

Running Experiments

Find the corresponding config in the 'configs' folder and run the following command. Remember to change the 'cropformer_path' variable in the config and the 'CUDA_LIST' variable in run.py.

  python run.py --config config_name

For example, to run the ScanNet experiment, you can run the following command:

  python run.py --config scannet

run.py obtains the 2D instance masks, runs mask clustering, extracts open-vocabulary features, and evaluates the results. The evaluation results are saved in the 'data/evaluation' folder.
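
To run several configs in a row, the command above can be wrapped in a small driver. This is a sketch under assumptions: the helper names are hypothetical, the script must be started from the repository root, and only the 'scannet' config name is confirmed by this README.

```python
import subprocess
import sys


def build_command(config_name):
    """Build the run.py invocation for one config, using the current interpreter."""
    return [sys.executable, "run.py", "--config", config_name]


def run_configs(config_names, dry_run=True):
    """Print each command (dry run) or execute them one after another."""
    commands = [build_command(name) for name in config_names]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop as soon as one experiment fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_configs(["scannet"], dry_run=True)
```

The dry run is the default so that the commands can be inspected before any long-running experiment starts.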

Visualization

To visualize the 3D class-agnostic result of one specific scene, run the following command:

  python -m visualize.vis_scene --config scannet --seq_name scene0608_00
