The PyTorch implementation of point serialization in PointTransformerV3, especially hilbert_encode, adds noticeable overhead. Replacing it with a custom CUDA kernel yields a substantial speedup: inference time of Utonia (pretrained PTv3) on 130,000 points dropped from ~139.5 ms to ~108.6 ms, a ~22% reduction (NVIDIA A40, averaged over 10 runs after warmup).
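PTv3 orders points along a space-filling curve before attention, and a pure-PyTorch encoder has to build the curve key bit by bit with elementwise tensor ops, each a separate small kernel launch. As a rough illustration (not the repository's code), here is the same coordinates-to-key pattern using the simpler Morton (z-order) curve; hilbert_encode does strictly more work per bit:

```python
import torch

def morton_encode(grid_coord: torch.Tensor, depth: int = 16) -> torch.Tensor:
    """Interleave the bits of integer (x, y, z) grid coordinates into one
    z-order key. Illustrative sketch only: PTv3's hilbert_encode follows the
    same coordinates-to-1D-key pattern, but along the Hilbert curve."""
    x = grid_coord[:, 0].long()
    y = grid_coord[:, 1].long()
    z = grid_coord[:, 2].long()
    key = torch.zeros_like(x)
    for i in range(depth):  # one pass per bit: many tiny kernel launches
        key |= ((x >> i) & 1) << (3 * i + 2)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i)
    return key
```

A fused CUDA kernel computes the whole key per point in one launch, which is where the saving comes from.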
PyTorch Implementation
CUDA Implementation
Set your target CUDA architecture in libs/serialize_cuda/setup.py.
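The architecture is typically set through the nvcc compile flags in the extension build script. A minimal sketch of what such a setup.py can look like, assuming a standard torch.utils.cpp_extension build (file and module names here are assumptions, not the repo's exact contents; use the compute capability of your GPU, e.g. sm_86 for an A40):

```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="serialize_cuda",
    ext_modules=[
        CUDAExtension(
            name="serialize_cuda",
            # source file names are placeholders
            sources=["serialize_cuda.cpp", "serialize_cuda_kernel.cu"],
            extra_compile_args={
                "cxx": ["-O3"],
                # set the arch for your GPU (sm_86 = Ampere, e.g. A40)
                "nvcc": ["-O3", "-gencode=arch=compute_86,code=sm_86"],
            },
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```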
cd libs/serialization
python setup.py install

Replace serialization/default.py with the provided version, which calls the CUDA kernels when available.
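The "when available" dispatch can be done with an import guard, so the code still runs on machines without the compiled extension. A minimal sketch, assuming the extension exposes a hilbert_encode function (all names here are illustrative, not the repo's exact API):

```python
import torch

# Prefer the compiled CUDA extension; fall back to the pure-PyTorch
# encoder when it is not installed or the input is on CPU.
try:
    import serialize_cuda  # built from libs/serialize_cuda
    _HAS_CUDA_EXT = True
except ImportError:
    _HAS_CUDA_EXT = False

def hilbert_encode_pytorch(grid_coord, depth=16):
    """Stand-in for the repository's original PyTorch encoder."""
    return torch.zeros(grid_coord.shape[0], dtype=torch.int64)

def hilbert_encode(grid_coord, depth=16):
    if _HAS_CUDA_EXT and grid_coord.is_cuda:
        return serialize_cuda.hilbert_encode(grid_coord, depth)
    return hilbert_encode_pytorch(grid_coord, depth)
```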
import time

import torch

warmup = 2
runs = 10
times = []
for i in range(runs + warmup):
    point = dataset[i]
    point = transform(point)
    with torch.no_grad():
        # move every tensor in the sample to the GPU
        for key in point.keys():
            if isinstance(point[key], torch.Tensor):
                point[key] = point[key].cuda(non_blocking=True)
        # synchronize so the timer brackets only the forward pass
        torch.cuda.synchronize()
        start = time.perf_counter()
        point = model(point)
        torch.cuda.synchronize()
        end = time.perf_counter()
    if i >= warmup:
        times.append(end - start)
        print(f"Iter time: {end - start:.6f} seconds")
avg = sum(times) / len(times)
print(f"Average time: {avg:.6f} seconds")
