
ARM FP16 GEMM: generic C implementation without ISA optimizations #5681

@yuanjia111


Hi there,

I noticed that the current FP16 GEMM implementation on ARM is based on a generic C implementation, without leveraging ISA-specific optimizations (e.g., NEON or SVE intrinsics).
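To make "generic C implementation" concrete, here is a minimal sketch of what such a kernel can look like. This is a hypothetical illustration, not the project's actual code. It makes several assumptions: FP16 values are stored as raw IEEE 754 binary16 bit patterns in `uint16_t`, each element is converted to `float` for the multiply, the sum is accumulated in FP32, and the result is rounded back to FP16 (round to nearest even). The names `hgemm_ref`, `half_to_float` and `float_to_half` are made up for this example. An ISA-optimized version would instead vectorize and register-block the inner loops, for example with half-precision FMLA (`vfmaq_f16`) on cores that implement the Armv8.2-A FP16 arithmetic extension.

```c
#include <stdint.h>
#include <string.h>

/* Decode an IEEE 754 binary16 bit pattern into a float (exact conversion). */
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t ex   = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (ex == 0) {
        if (mant == 0) {
            bits = sign;                              /* signed zero */
        } else {
            ex = 127 - 15 + 1;                        /* subnormal: normalize */
            while ((mant & 0x400u) == 0) { mant <<= 1; ex--; }
            bits = sign | (ex << 23) | ((mant & 0x3FFu) << 13);
        }
    } else if (ex == 0x1F) {
        bits = sign | 0x7F800000u | (mant << 13);     /* infinity or NaN */
    } else {
        bits = sign | ((ex + 127 - 15) << 23) | (mant << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Encode a float as binary16, rounding to nearest, ties to even. */
static uint16_t float_to_half(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint32_t sign = (x >> 16) & 0x8000u;
    uint32_t ex   = (x >> 23) & 0xFFu;
    uint32_t mant = x & 0x7FFFFFu;
    if (ex == 0xFF)                                   /* infinity or NaN */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x200u : 0u));
    int e = (int)ex - 127 + 15;                       /* rebias exponent */
    if (e >= 0x1F) return (uint16_t)(sign | 0x7C00u); /* overflow to inf */
    if (e <= 0) {                                     /* subnormal or zero */
        if (e < -10) return (uint16_t)sign;           /* underflow to zero */
        mant |= 0x800000u;                            /* implicit leading 1 */
        int shift = 14 - e;
        uint32_t hm = mant >> shift;
        uint32_t rem = mant & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1);
        if (rem > halfway || (rem == halfway && (hm & 1u))) hm++;
        return (uint16_t)(sign | hm);
    }
    uint32_t hm = mant >> 13;
    uint32_t rem = mant & 0x1FFFu;
    uint16_t h = (uint16_t)(sign | ((uint32_t)e << 10) | hm);
    /* A carry out of the mantissa correctly bumps the exponent (up to inf). */
    if (rem > 0x1000u || (rem == 0x1000u && (hm & 1u))) h++;
    return h;
}

/* Hypothetical reference HGEMM, row-major: C = alpha * A * B + beta * C,
   with A of size M x K, B of size K x N and C of size M x N. */
static void hgemm_ref(int M, int N, int K, float alpha,
                      const uint16_t *A, const uint16_t *B,
                      float beta, uint16_t *C) {
    for (int i = 0; i < M; i++) {
        for (int j = 0; j < N; j++) {
            float acc = 0.0f;                         /* FP32 accumulator */
            for (int p = 0; p < K; p++)
                acc += half_to_float(A[i * K + p]) * half_to_float(B[p * N + j]);
            float c = (beta == 0.0f) ? 0.0f : beta * half_to_float(C[i * N + j]);
            C[i * N + j] = float_to_half(alpha * acc + c);
        }
    }
}
```

Every scalar element goes through a conversion and a dependent FP32 add in the innermost loop, with no vectorization or register blocking. That per-element overhead is the main cost that NEON or SVE kernels are designed to remove.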

Are there any plans to optimize this kernel in the future? Given that some ARM cores lack dedicated FP16 dot-product instructions or MMLA variants, I suspect this might be the primary reason for the current generic implementation. Is this assumption correct?

Thanks for your time and for maintaining this project!
