CUDA/CUPY kernel for building Common Line matrix #1114

@garrettwrong

Description

Given the successful speedups from using CUDA for parts of the Sync3N algorithm, we should add a similar GPU implementation for building the common-line (CL) matrix.

For unit-test-sized problems the current implementation is tolerable, but for larger experiments (say, 3000 images) building the CL matrix can take 5-6 hours with the current Python implementation. The legacy MATLAB code provided both CPU and GPU implementations, though I am not sure how relevant either is to the implementation that exists in Python today (TBD).
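To make the cost concrete, here is a rough sketch of a brute-force pairwise search, not the existing implementation. It assumes the input is a complex array `pf` of polar Fourier rays with shape `(n_img, n_theta, n_rad)`, and it ignores shift search and conjugate symmetry. The function name and the `xp` array-module parameter are hypothetical; passing `cupy` as `xp` should run the same array operations on a GPU, but that is an untested assumption here.

```python
import numpy as np


def build_clmatrix_bruteforce(pf, xp=np):
    """Brute-force common-line search over all image pairs (illustrative only).

    pf : complex array of shape (n_img, n_theta, n_rad), one Fourier ray per angle.
    xp : array module (numpy by default; cupy is an untested assumption).
    Returns an (n_img, n_img) integer array where entry [i, j] is the ray index
    in image i that best correlates with some ray of image j (diagonal is -1).
    """
    pf = xp.asarray(pf)
    n_img, n_theta, _ = pf.shape

    # Normalize each ray so the inner product behaves like a correlation.
    norms = xp.linalg.norm(pf, axis=-1, keepdims=True)
    pf = pf / xp.where(norms == 0, 1, norms)

    clmatrix = -xp.ones((n_img, n_img), dtype=int)
    for i in range(n_img - 1):
        # corr[j, a, b] = Re <pf[i, a], pf[i + 1 + j, b]>, batched over all j > i.
        corr = xp.real(xp.einsum("ar,jbr->jab", pf[i], xp.conj(pf[i + 1:])))
        # Best (a, b) ray pair for each partner image.
        flat = corr.reshape(corr.shape[0], -1).argmax(axis=1)
        a, b = xp.unravel_index(flat, (n_theta, n_theta))
        clmatrix[i, i + 1:] = a
        clmatrix[i + 1:, i] = b
    return clmatrix
```

The work grows with the number of image pairs times `n_theta**2` ray comparisons, and each pair is independent of the others, which is why this step looks like a good fit for a batched GPU kernel.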

Another nice feature of the MATLAB code is that it provided a way to store and recall the CL matrix via the workspace. We can consider optionally writing the CL matrix to disk and providing a method to load it back from disk. I expect that might speed up some development tasks in the future.
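One possible shape for the optional disk cache, as a minimal sketch: the helper name `load_or_build_clmatrix`, its parameters, and the `.npy` format are all assumptions for illustration, not an existing API.

```python
from pathlib import Path

import numpy as np


def load_or_build_clmatrix(cache_path, build_fn, overwrite=False):
    """Return a cached CL matrix if present, otherwise build and save it.

    cache_path : where to store the matrix (forced to a .npy suffix).
    build_fn   : zero-argument callable that computes the CL matrix.
    overwrite  : rebuild and replace the file even if it already exists.
    """
    # np.save appends ".npy" when it is missing, so normalize the suffix up front.
    cache_path = Path(cache_path).with_suffix(".npy")

    if cache_path.exists() and not overwrite:
        return np.load(cache_path)

    # A GPU array would need an explicit transfer to host memory before this point.
    clmatrix = np.asarray(build_fn())
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    np.save(cache_path, clmatrix)
    return clmatrix
```

Taking the builder as a callable keeps the cache independent of how the matrix is computed, so the same helper would work for a CPU or a GPU code path.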

Labels: GPU, Optimization (Performance or Resource Optimization), enhancement (New feature or request)