Official PyTorch implementation for the DAC'26 paper:
FlashFPS: Efficient Farthest Point Sampling for Large-Scale Point Clouds via Pruning and Caching
by Yuzhe Fu, Hancheng Ye, Cong Guo, Junyao Zhang, Qinsi Wang, Yueqian Lin, Changchun Zhou, Hai "Helen" Li, Yiran Chen.
FlashFPS-demo.mp4
FlashFPS is a hardware-agnostic, plug-and-play framework for efficient Farthest Point Sampling (FPS) in point cloud networks. It achieves an average end-to-end speedup of 5.16× over the standard CUDA baseline on GPUs, with negligible accuracy loss.
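For readers unfamiliar with FPS: it greedily selects points so that each new point is as far as possible from the already-selected set, giving an O(N·S) algorithm for N input points and S samples. The sketch below is a plain-Python reference of this baseline algorithm (not the repo's CUDA kernel, and not FlashFPS's pruned/cached variant) to illustrate the computation being accelerated:

```python
# Reference farthest point sampling (FPS): the O(N*S) baseline algorithm
# that FlashFPS accelerates. Pure-Python illustration only; the actual
# implementations in this repo are CUDA kernels.
import math

def farthest_point_sampling(points, num_samples):
    """Greedily pick num_samples indices, each maximizing its distance
    to the already-selected set. points: list of (x, y, z) tuples."""
    n = len(points)
    selected = [0]                  # conventionally start from point 0
    min_dist = [math.inf] * n       # squared dist to nearest selected point
    for _ in range(num_samples - 1):
        last = points[selected[-1]]
        # update each point's distance to the selected set
        for i, p in enumerate(points):
            d = sum((a - b) ** 2 for a, b in zip(p, last))
            if d < min_dist[i]:
                min_dist[i] = d
        # pick the point farthest from the selected set
        selected.append(max(range(n), key=lambda i: min_dist[i]))
    return selected
```

The per-iteration distance update over all N points is the dominant cost; FlashFPS reduces it via pruning and caching as described in the paper.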
This repository reproduces the network accuracy and speedup results reported in the paper. It currently supports FPS-CUDA, FlashFPS, and the state-of-the-art QuickFPS on the following workloads:
| Network Models | Main Library | Datasets | Supported Methods |
|---|---|---|---|
| PointNeXt-L, PointVector-L | openpoints | S3DIS, ScanNet | FPS-CUDA, FlashFPS, QuickFPS |
Detailed setup and experiment instructions are in the sub-folders below:
- `FlashFPS-Openpoints/` — PointNeXt / PointVector on the openpoints backbone. Ready to use.
- `FlashFPS-PointTransformer/` — Point Transformer backbone. To be released.
Hardware note. We recommend TITAN-class, RTX 6000, RTX 3090, or A100 GPUs (all tested successfully). Hopper-architecture GPUs (e.g., H100) are not recommended. All reported numbers in this repo were obtained on TITAN GPUs for consistency.
Minor accuracy variations may occur across GPU architectures due to GPU-dependent numerical behavior; they do not affect the overall conclusions.
- Support FlashFPS and FPS-CUDA for PointNeXt-L and PointVector-L.
- Add QuickFPS for PointNeXt-L and PointVector-L.
- Support FlashFPS on Point Transformer.
- Support FlashFPS performance breakdown.
```bibtex
@article{fu2026flashfps,
  title={FlashFPS: Efficient Farthest Point Sampling for Large-Scale Point Clouds via Pruning and Caching},
  author={Fu, Yuzhe and Ye, Hancheng and Guo, Cong and Zhang, Junyao and Wang, Qinsi and Lin, Yueqian and Zhou, Changchun and Li, Hai Helen and Chen, Yiran},
  journal={arXiv preprint arXiv:2604.17720},
  year={2026},
  doi={10.48550/arXiv.2604.17720},
}
```

FlashFPS optimizes Farthest Point Sampling, delivering an average 5.16× end-to-end speedup on GPUs with no hardware changes required. If you are interested in full-stack hardware–software co-design of point neural networks (PNNs), please also check out our related work:
FractalCloud: A Fractal-Inspired Architecture for Efficient Large-Scale Point Cloud Processing, which achieves an average 21.7× speedup on PNN inference through a co-designed accelerator.
Repository: FractalCloud
Tip: FlashFPS and FractalCloud share the same environment. If you've already set up one, the other runs out of the box ^_^
This repository builds upon FractalCloud, PointNeXt and OpenPoints. The QuickFPS implementation is adapted from QuickFPS and FastPoint. We thank the authors for their open-source contributions.
