ARM FP16 GEMM: generic C implementation without ISA optimizations

Hi there,

I noticed that the current FP16 GEMM kernel on ARM is a generic C implementation that does not use any ISA-specific optimizations (e.g., NEON or SVE intrinsics).

Are there any plans to optimize this kernel in the future? Some ARM cores lack dedicated FP16 dot-product or matrix multiply-accumulate (MMLA) instructions, and I suspect this is the primary reason the implementation is still generic. Is this assumption correct?

Thanks for your time and for maintaining this project!

ARM FP16 GEMM: generic C implementation without ISA optimizations #5681
