Pointelligence is a repository for 3D point cloud research, currently featuring:
- the official implementation of PointCNN++ (CVPR 2026), a significant evolution of PointCNN (NeurIPS 2018).
Some operators are implemented in C++/CUDA as PyTorch extensions, which can be built and installed with the following commands:
cd extensions
pip install --no-build-isolation -e .

Integration examples are provided in examples/FCGF and in examples/Pointcept.
Pointelligence is the repo for the official implementation of:
- PointCNN++: Performant Convolution on Native Points
Lihan Li, Haofeng Zhong, Rui Bu, Mingchao Sun, Wenzheng Chen, Baoquan Chen, Yangyan Li

@misc{li2025pointcnnperformantconvolutionnative,
  title={PointCNN++: Performant Convolution on Native Points},
  author={Lihan Li and Haofeng Zhong and Rui Bu and Mingchao Sun and Wenzheng Chen and Baoquan Chen and Yangyan Li},
  year={2025},
  eprint={2511.23227},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.23227},
}
To ensure they are tracked effectively, please submit feature requests and issue reports here rather than via email.
While we provide a suite of ready-to-use backbones, our framework is explicitly designed to facilitate the construction of custom network architectures from scratch. We review several key concepts below; combined with the reference implementations in the models directory, these resources are intended to help users quickly master the library's workflow.
The total number of points often varies significantly from one sample to another within a single batch. As illustrated below, the straightforward approach deals with this irregularity (e.g., samples having 2, 7, and 4 points) by forcing data into fixed-size dense tensors. While this satisfies the rigid structural requirements of standard frameworks, it overlooks the data's inherent sparsity. Smaller samples must be padded out with non-existent "ghost" data, squandering significant memory and compute cycles on empty space.
Scenario: A batch of 3 irregular samples with 2, 7, and 4 points respectively.
Legend:
[P] = Valid Point/Feature Data
[.] = "Ghost Data" (Padding/Wasted Memory)
+-----------------------------------------------------------+
| THE STRAIGHTFORWARD APPROACH: Fixed-Size Dense Tensor |
| Status: WASTEFUL. Forces data to match the largest dim. |
+-----------------------------------------------------------+
To create a uniform grid, every sample must be padded to match
the largest necessary dimension (at least 7).
Batch Memory Layout (Fixed grid):
Row 0 (Sample 1): [P][P][.][.][.][.][.]
Row 1 (Sample 2): [P][P][P][P][P][P][P]
Row 2 (Sample 3): [P][P][P][P][.][.][.]
^Valid^ ^Wasted^
VISUAL RESULT: Significant portions of memory are useless padding.
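The waste is easy to quantify. A minimal NumPy sketch (illustrative only, not the library's API) of the padded layout above, which allocates 3 × 7 = 21 slots to hold only 13 valid points:

```python
import numpy as np

# Per-sample point counts from the scenario above.
sizes = [2, 7, 4]
max_len = max(sizes)                      # 7: every row is padded to this

# Dense layout: one fixed-size row per sample, zero-padded.
dense = np.zeros((len(sizes), max_len))
for row, n in enumerate(sizes):
    dense[row, :n] = 1.0                  # mark the valid slots

valid = sum(sizes)                        # 13 real points
allocated = dense.size                    # 21 allocated slots
print(f"wasted: {allocated - valid} of {allocated} slots "
      f"({100 * (allocated - valid) / allocated:.0f}%)")
# prints: wasted: 8 of 21 slots (38%)
```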
Ragged tensors represent a dedicated solution to the inefficiency shown above. As visualized below, this format is explicitly designed to handle irregularity. Instead of maintaining separate, padded rows in a grid, the Ragged tensor stores the entire batch as a single, contiguous sequence containing only valid points. This packed data is managed by a lightweight auxiliary metadata structure that tracks the individual sample sizes. By eliminating padding entirely, this approach ensures processing applies only to actual geometric and feature data.
+-----------------------------------------------------------+
| THE DEDICATED SOLUTION: Ragged Tensor |
| Status: EFFICIENT. Stores only what exists. |
+-----------------------------------------------------------+
[A. Contiguous Data Storage]
Padding is eliminated. The entire batch is flattened into one
tightly packed sequence of valid data only:
Memory Layout:
[P][P] [P][P][P][P][P][P][P] [P][P][P][P]
\___/ \___________________/ \_________/
S1(2) S2(7) S3(4)
VISUAL RESULT: Zero wasted space.
[B. Auxiliary Metadata]
A small, separate structure tracks the individual sample sizes:
Sample Sizes: [ 2, 7, 4 ]
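A minimal sketch of this layout in plain NumPy (variable names are illustrative; the library's actual ragged type will differ):

```python
import numpy as np

# Three samples with 2, 7, and 4 points; each point is a 3D coordinate.
samples = [np.random.rand(n, 3) for n in (2, 7, 4)]

# Ragged layout: one packed data array plus per-sample sizes.
data = np.concatenate(samples, axis=0)    # shape (13, 3); no padding
sizes = np.array([len(s) for s in samples])

# Offsets (an exclusive prefix sum) recover any sample's slice on demand.
offsets = np.concatenate([[0], np.cumsum(sizes)])
sample_2 = data[offsets[1]:offsets[2]]    # the 7 points of sample 2

assert data.shape == (13, 3)
assert sample_2.shape == (7, 3)
```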
We employ a voxel-based strategy using the grid_sample_filter operator.
Because the
The
Unlike in image processing where target pixel locations are fixed, the spatial locations of upsampled points in a point cloud are not inherently known. Therefore, a recommended practice is to explicitly reuse the retained set of original, pre-downsampled points as the upsampling target. This approach is both efficient and unambiguous. Note that the retention and retrieval of these original points must be managed by the enclosing pipeline or calling procedure.
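A sketch of this caching pattern (names and shapes are hypothetical; the downsampler is a toy stand-in, and the real pipeline manages the cache):

```python
import numpy as np

def downsample(points, keep):
    """Toy stand-in for the real grid-based downsampler:
    returns the retained subset of the input points."""
    return points[keep]

# Encoder: remember each stage's input points before downsampling.
points0 = np.random.rand(100, 3)
cache = [points0]                          # retained for the decoder
points1 = downsample(points0, np.arange(0, 100, 4))
cache.append(points1)

# Decoder: the upsampling *target* is not invented; it is exactly the
# cached pre-downsampled point set from the matching encoder stage.
upsample_target = cache[-2]                # points0, shape (100, 3)
assert upsample_target.shape == (100, 3)
```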
We opt for a fixed radius search over a fixed-number (K-Nearest Neighbors, or KNN) search, as its spatially-local receptive field is better suited for spatial learning, whereas KNN is often a choice imposed by architectural limitations. Given sets of "source" and "query" points (which may be identical) and a specified radius, the search returns, for each query point, all source points that fall within that radius.
As illustrated in the vertical flow diagram below, the connectivity of a batched point cloud is most efficiently represented as a unified list of (query, neighbor) index pairs.
[-SAMPLE 1-] [-------------SAMPLE 2--------------] [-----------SAMPLE 3-----------]
Query (i): 0 1 2 3 4 5 6 7 8 9 10 11 12
| | | | | | | | | | | | |
|-. |-. | |-.-. |-.-. |-.-. |-. |-. | |--.--.--. |--. |--. |--.
| | | | | | | | | | | | | | | | | | | | | | | | | | | | |
v v v v | v v v v v v v v v v v v v | v v v v v v v v v v
Neighbor (j): 0 1 0 1 2 3 4 5 3 4 5 3 4 5 6 7 6 7 8 9 10 11 12 9 10 9 11 9 12
The connections from the diagram above, flattened into two parallel arrays:
[-SAMPLE 1-] [-------------SAMPLE 2--------------] [-----------SAMPLE 3-----------]
Query (i): 0 0 1 1 2 3 3 3 4 4 4 5 5 5 6 6 7 7 8 9 9 9 9 10 10 11 11 12 12
Neighbor (j): 0 1 0 1 2 3 4 5 3 4 5 3 4 5 6 7 6 7 8 9 10 11 12 9 10 9 11 9 12
+---------------------------------------------------------------------------------+
| THE UNIFIED LIST: (i, j) pairs |
| Status: EFFICIENT. A compact stream of all neighborhood edges in the batch. |
+---------------------------------------------------------------------------------+
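The flattened pair representation can be reproduced with a brute-force radius search. This NumPy sketch handles a single sample (the library performs this batched and on GPU; the function name is illustrative):

```python
import numpy as np

def radius_pairs(source, query, radius):
    """Return parallel (i, j) arrays: for each query point i, one entry
    per source point j lying within `radius` of it."""
    # Pairwise distances between every query and source point.
    d = np.linalg.norm(query[:, None, :] - source[None, :, :], axis=-1)
    i, j = np.nonzero(d <= radius)        # one entry per edge, query-major
    return i, j

pts = np.array([[0.0, 0, 0], [0.1, 0, 0], [5.0, 0, 0]])
i, j = radius_pairs(pts, pts, radius=0.5)
# Points 0 and 1 see each other (and themselves); point 2 is isolated.
print(list(zip(i.tolist(), j.tolist())))
# prints: [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)]
```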
The full process of convolution on native points involves four steps: output location generation, neighborhood search, convolution triplet construction, and Matrix-Vector Multiplication and Reduction (MVMR). The actual convolution computation occurs in the final MVMR stage, while the preceding three steps serve to structure the input data for this calculation.
This step defines the spatial centers for the convolution operations. Depending on the desired architectural effect, the output locations are generated in one of three ways:
- Standard Convolution ($stride = 1$): The input points serve directly as the output locations.
- Strided Convolution ($stride > 1$): Output locations are generated by downsampling the input points via $grid_sample_filter$ with a target grid size of $grid_size_{input} \times stride$ (see Downsampling). Note that the $grid_size$ of the output point cloud would be $grid_size_{input} \times stride$.
- Transposed Convolution (Upconvolution): The output locations are explicitly set to the pre-calculated "upsampled" points, as described in the Upsampling section.
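The strided case can be sketched with a toy voxel-grid filter standing in for grid_sample_filter (behavior assumed: one representative point kept per occupied cell, which may differ from the real operator):

```python
import numpy as np

def grid_filter(points, grid_size):
    """Keep one point per occupied voxel of edge length `grid_size`
    (a toy stand-in for the real grid_sample_filter)."""
    cells = np.floor(points / grid_size).astype(np.int64)
    # The first point encountered in each occupied cell wins.
    _, keep = np.unique(cells, axis=0, return_index=True)
    return points[np.sort(keep)]

points = np.random.rand(1000, 3)
grid_size, stride = 0.05, 2
# Strided convolution: output locations live on a coarser grid.
out_locs = grid_filter(points, grid_size * stride)
assert len(out_locs) <= len(points)
```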
This step executes the neighbor finding process detailed in Neighborhood Computation and Representation by invoking the fixed-radius search described there.
The search radius is determined by a configurable $radius_scaler$, which relates the search ball to the ${receptive_field}^3$ cube in one of three ways:
- Inscribed Sphere: $radius_scaler = \frac{1}{2} \times receptive_field$. The search ball is the largest sphere that fits inside the ${receptive_field}^3$ cube.
- Equal Volume (Default): $radius_scaler = \sqrt[3]{\frac{3}{4 \pi}} \times receptive_field$. The search ball has the same volume as the ${receptive_field}^3$ cube.
- Circumscribed Sphere: $radius_scaler = \frac{\sqrt{3}}{2} \times receptive_field$. The search ball is the smallest sphere that encloses the ${receptive_field}^3$ cube.
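The three scalers can be checked numerically with plain Python, taking $receptive_field$ as one cube edge:

```python
import math

receptive_field = 1.0                         # cube edge length

# Inscribed sphere: its diameter equals the cube edge.
inscribed = 0.5 * receptive_field

# Equal volume: (4/3) * pi * r^3 == edge^3  =>  r = cbrt(3 / (4 pi)).
equal_volume = (3.0 / (4.0 * math.pi)) ** (1.0 / 3.0) * receptive_field

# Circumscribed sphere: its radius is half the cube's space diagonal.
circumscribed = math.sqrt(3.0) / 2.0 * receptive_field

print(f"{inscribed:.3f} < {equal_volume:.3f} < {circumscribed:.3f}")
# prints: 0.500 < 0.620 < 0.866 -- the default sits between the extremes
```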
To perform convolution on the irregular structure depicted by the neighborhood (i, j) pairs, each pair is extended with a kernel index k, yielding (i, j, k) triplets:
[-SAMPLE 1-] [-------------SAMPLE 2--------------] [-----------SAMPLE 3-----------]
Query (i): 0 1 2 3 4 5 6 7 8 9 10 11 12
| | | | | | | | | | | | |
|-. |-. | |-.-. |-.-. |-.-. |-. |-. | |--.--.--. |--. |--. |--.
| | | | | | | | | | | | | | | | | | | | | | | | | | | | |
v v v v | v v v v v v v v v v v v v | v v v v v v v v v v
Neighbor (j): 0 1 0 1 2 3 4 5 3 4 5 3 4 5 6 7 6 7 8 9 10 11 12 9 10 9 11 9 12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | |
v v v v v v v v v v v v v v v v v v v v v v v v v v v v v
Kernel (k): 4 5 5 4 4 4 6 8 2 4 5 1 3 4 4 7 1 4 4 4 2 6 8 6 4 2 4 0 4
The connections from the diagram above, flattened into three parallel arrays:
[-SAMPLE 1-] [-------------SAMPLE 2--------------] [-----------SAMPLE 3-----------]
Query (i): 0 0 1 1 2 3 3 3 4 4 4 5 5 5 6 6 7 7 8 9 9 9 9 10 10 11 11 12 12
Neighbor (j): 0 1 0 1 2 3 4 5 3 4 5 3 4 5 6 7 6 7 8 9 10 11 12 9 10 9 11 9 12
Kernel (k): 4 5 5 4 4 4 6 8 2 4 5 1 3 4 4 7 1 4 4 4 2 6 8 6 4 2 4 0 4
+---------------------------------------------------------------------------------+
| THE UNIFIED LIST: (i, j, k) triplets |
| Status: EFFICIENT. A compact stream of all computation edges in the batch. |
+---------------------------------------------------------------------------------+
More specifically, to determine the kernel index k for each pair, the relative position of the neighbor with respect to its query point is mapped to one of the kernel's cells.
It is crucial to distinguish between the neighbor index j, which selects an input feature vector, and the kernel index k, which selects the weight matrix applied to it.
As detailed in the research paper of PointCNN++, the actual heavy lifting of the convolution arithmetic occurs in this final stage. To encapsulate this complexity, we provide a high-level interface that performs the entire MVMR stage in a single call.
It is worth emphasizing that in this framework, feature tensors are first-class citizens. They are the primary carriers of the learned signal and the subject of all gradient backpropagation. The spatial coordinates, having served their purpose in generating the neighbor lists and triplets, are treated simply as "metadata" that guides the data-weight flow, rather than being part of the arithmetic computation itself.
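Semantically, the MVMR stage reduces to a gather, a per-edge matrix-vector product, and a scatter-add over the (i, j, k) triplets. A dense NumPy sketch with illustrative sizes (this shows the arithmetic only, not the library's fused CUDA kernel):

```python
import numpy as np

rng = np.random.default_rng(0)
C_in, C_out = 4, 8                         # channel widths (illustrative)
K = 9                                      # kernel cells, matching the diagram's 0-8 range
feats = rng.standard_normal((13, C_in))    # input features, one row per point
W = rng.standard_normal((K, C_out, C_in))  # one weight matrix per kernel cell

# Triplets from the construction step: query i, neighbor j, kernel cell k.
i = np.array([0, 0, 1, 1, 2])
j = np.array([0, 1, 0, 1, 2])
k = np.array([4, 5, 5, 4, 4])

out = np.zeros((13, C_out))
for qi, nj, kk in zip(i, j, k):
    # Matrix-Vector Multiplication (W[kk] @ feats[nj]) and
    # Reduction (scatter-add into the query row).
    out[qi] += W[kk] @ feats[nj]
```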