
A new fused tiled matrix multiplication algorithm #166

@xmyqsh

I've come up with a new fused tiled matrix multiplication algorithm!

accesses the in indices M*K times
accesses the out indices M*N times
uses atomicAdd K*N*K/shared_memory_size times

The original one in MinkowskiEngine:

accesses the in indices M*K*K/shared_memory_size times
accesses the out indices M*N times
uses atomicAdd M*N times

The counts above are for a single kernel. With Kernel Volume kernels, both sets of counts should be multiplied by Kernel Volume.
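
To compare the two for a concrete layer, the formulas above can be evaluated directly. The snippet below is only a rough sketch: the function name `access_counts` and the example sizes are made up for illustration, and it assumes `shared_memory_size` is expressed in the same units (elements) as M, N and K. It evaluates the expressions exactly as written above and does not model real GPU behavior.

```python
from fractions import Fraction


def access_counts(M, N, K, shared_memory_size, kernel_volume=1):
    """Evaluate the per-variant counts quoted in this post.

    Hypothetical helper: it only plugs numbers into the formulas above,
    then scales both variants by the kernel volume.
    """
    S = shared_memory_size
    proposed = {
        "in_index_accesses": M * K,
        "out_index_accesses": M * N,
        "atomic_adds": Fraction(K * N * K, S),
    }
    original = {
        "in_index_accesses": Fraction(M * K * K, S),
        "out_index_accesses": M * N,
        "atomic_adds": M * N,
    }

    def scale(counts):
        # Both variants repeat the same work once per kernel offset.
        return {name: value * kernel_volume for name, value in counts.items()}

    return scale(proposed), scale(original)


if __name__ == "__main__":
    # Example sizes are arbitrary assumptions, not measurements.
    proposed, original = access_counts(
        M=100_000, N=64, K=64, shared_memory_size=4096, kernel_volume=27
    )
    for name in proposed:
        print(
            f"{name:20s} proposed={float(proposed[name]):.3e} "
            f"original={float(original[name]):.3e}"
        )
```

With these formulas the proposed variant trades more in-index accesses for fewer atomicAdd calls whenever M is larger than K*K/shared_memory_size, so which one wins would depend on how expensive atomicAdd is relative to an index read.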

Which one do you think is better?
Can you elaborate on the latency of atomicAdd?
