
FlashFPS

arXiv DAC 2026 License: MIT

Official PyTorch implementation for the DAC'26 paper:

FlashFPS: Efficient Farthest Point Sampling for Large-Scale Point Clouds via Pruning and Caching

by Yuzhe Fu, Hancheng Ye, Cong Guo, Junyao Zhang, Qinsi Wang, Yueqian Lin, Changchun Zhou, Hai "Helen" Li, Yiran Chen.

Demo of FlashFPS on PointNeXt-L @ S3DIS segmentation

FlashFPS-demo.mp4

Abstract

FlashFPS is a hardware-agnostic, plug-and-play framework for efficient Farthest Point Sampling (FPS) in point cloud networks. It achieves an average end-to-end speedup of 5.16× over the standard CUDA baseline on GPU, with negligible accuracy loss.
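For readers unfamiliar with FPS itself, the sketch below is a minimal NumPy reference implementation of the standard algorithm that FlashFPS accelerates; it is illustrative only and does not reflect FlashFPS's pruning and caching optimizations (the function name and demo data are ours, not from this repo).

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, n_samples: int) -> np.ndarray:
    """Naive O(N*M) FPS: repeatedly pick the point farthest from the
    already-selected set. Returns indices of the sampled points."""
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)
    min_dist = np.full(n, np.inf)  # distance to nearest selected point
    selected[0] = 0                # start from an arbitrary seed point
    for i in range(1, n_samples):
        # squared distances to the most recently selected point
        d = np.sum((points - points[selected[i - 1]]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, d)
        selected[i] = int(np.argmax(min_dist))
    return selected

# tiny demo: 4 corners of a unit square plus its center;
# FPS spreads samples out, so the center is never chosen for 3 samples
pts = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.5, 0.5]])
idx = farthest_point_sampling(pts, 3)
```

The quadratic cost of this loop on million-point clouds is exactly what motivates pruning- and caching-based accelerations such as FlashFPS.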

This repository reproduces the network accuracy and speedup results reported in the paper. It currently supports FPS-CUDA, FlashFPS, and the state-of-the-art QuickFPS on the following workloads:

| Network Models | Main Library | Datasets | Supported Methods |
| --- | --- | --- | --- |
| PointNeXt-L, PointVector-L | openpoints | S3DIS, ScanNet | FPS-CUDA, FlashFPS, QuickFPS |

Detailed setup and experiment instructions are provided in the corresponding sub-folders of this repository.

Hardware note. We recommend TITAN-class, RTX 6000, RTX 3090, or A100 GPUs (all tested successfully). Hopper-architecture GPUs (e.g., H100) are not recommended. All reported numbers in this repo were obtained on TITAN GPUs for consistency.

Minor accuracy variations may occur across GPU architectures due to GPU-dependent numerical behavior; they do not affect the overall conclusions.

Todo

  • Support FlashFPS and FPS-CUDA for PointNeXt-L and PointVector-L.
  • Add QuickFPS for PointNeXt-L and PointVector-L.
  • Support FlashFPS on Point Transformer.
  • Support FlashFPS performance breakdown.

Citation

@article{fu2026flashfps,
  title={FlashFPS: Efficient Farthest Point Sampling for Large-Scale Point Clouds via Pruning and Caching},
  author={Fu, Yuzhe and Ye, Hancheng and Guo, Cong and Zhang, Junyao and Wang, Qinsi and Lin, Yueqian and Zhou, Changchun and Li, Hai Helen and Chen, Yiran},
  journal={arXiv preprint arXiv:2604.17720},
  year={2026},
  doi={10.48550/arXiv.2604.17720},
}

Related Project — FractalCloud HPCA 2026

FlashFPS optimizes Farthest Point Sampling, delivering an average 5.16× end-to-end speedup on GPUs with no hardware changes required. If you are interested in full-stack hardware–software co-design of point neural networks (PNNs), please check out our other work:

FractalCloud: A Fractal-Inspired Architecture for Efficient Large-Scale Point Cloud Processing, which achieves an average 21.7× speedup on PNN inference through a co-designed accelerator.
Repository: FractalCloud

Tip: FlashFPS and FractalCloud share the same environment. If you've already set up one, the other runs out of the box ^_^

Acknowledgment

This repository builds upon FractalCloud, PointNeXt and OpenPoints. The QuickFPS implementation is adapted from QuickFPS and FastPoint. We thank the authors for their open-source contributions.
