[SME] Utilize predication in fp32 matmul and conv2d schedules #17054

lhutton1 · 2024-05-31T12:33:43Z

Prior to this commit, the matmul and conv2d schedules required padding of the inputs to some multiple of vscale and a final "unpadding" stage.
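For context, the pad/unpad approach described above can be sketched in plain Python. This is an illustrative model only, not the TVM schedule: `round_up`, `pad_matrix` and `unpad_matrix` are hypothetical helpers, and the tile size of `4 * vscale` fp32 elements with `vscale = 1` is an assumption chosen for the example.

```python
def round_up(value, multiple):
    """Round value up to the nearest multiple."""
    return -(-value // multiple) * multiple


def pad_matrix(matrix, tile):
    """Zero-pad a 2-D list so both dimensions are multiples of tile."""
    rows, cols = len(matrix), len(matrix[0])
    padded_rows, padded_cols = round_up(rows, tile), round_up(cols, tile)
    padded = [[0.0] * padded_cols for _ in range(padded_rows)]
    for r in range(rows):
        padded[r][:cols] = matrix[r]
    return padded


def unpad_matrix(matrix, rows, cols):
    """Drop the padding again (the final "unpadding" stage)."""
    return [row[:cols] for row in matrix[:rows]]


vscale = 1          # assumed value; on hardware this is only known at runtime
tile = 4 * vscale   # assumed fp32 tile edge for this sketch
a = [[float(r * 3 + c) for c in range(3)] for r in range(5)]
padded = pad_matrix(a, tile)             # 5x3 becomes 8x4
restored = unpad_matrix(padded, 5, 3)    # back to 5x3
```

Both the padded copy and the final slice are extra passes over the data, which is the overhead predication is meant to avoid.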

Instead, we can leverage predicated operations to avoid the requirement for padding. Both the transpose interleave and outer product fp32 intrinsics are updated to use predication. The `get_active_lane_mask` intrinsic is used to generate a variably sized mask of active lanes, depending on the global position the tensor intrinsic is operating on.
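A minimal sketch of the idea in plain Python, assuming the usual active-lane-mask semantics (lane `i` is active when `base + i < limit`). `active_lane_mask` and `predicated_matmul` are hypothetical names for illustration; they model the behaviour and are not the TVM intrinsics or the generated SME code.

```python
def active_lane_mask(base, limit, lanes):
    """Lane i is active only while base + i is still inside the tensor."""
    return [base + lane < limit for lane in range(lanes)]


def predicated_matmul(a, b, tile):
    """Tiled matmul built from masked outer products, with no padding."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for row0 in range(0, m, tile):
        # Mask depends on the global row position of this tile.
        row_mask = active_lane_mask(row0, m, tile)
        for col0 in range(0, n, tile):
            col_mask = active_lane_mask(col0, n, tile)
            for kk in range(k):
                # Outer product of one A column slice and one B row slice;
                # inactive lanes are never read or written.
                for i in range(tile):
                    for j in range(tile):
                        if row_mask[i] and col_mask[j]:
                            c[row0 + i][col0 + j] += a[row0 + i][kk] * b[kk][col0 + j]
    return c


a = [[float(r + c) for c in range(3)] for r in range(5)]   # 5x3
b = [[float(r * c) for c in range(6)] for r in range(3)]   # 3x6
c = predicated_matmul(a, b, tile=4)                        # 5x6, no padding needed
```

Interior tiles get an all-true mask, so only the edge tiles pay for partial lanes.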

For now, this relies on using `offset_of` and `stride` information from the tensor we're predicating an access on. We will likely want to build on this in the future with a more intuitive API for determining the current tile location.
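As a rough illustration of how a tile location can be recovered from offset and stride information, the sketch below decomposes a flat element offset into a row and column for a 2-D tensor. `tile_origin` is a hypothetical helper, not part of the TVM API, and the row-major layout and shapes are assumptions made for the example.

```python
def tile_origin(elem_offset, row_stride, col_stride=1):
    """Recover (row, col) of a tile's first element from its flat offset.

    Assumes a 2-D tensor whose strides are given in elements.
    """
    row = elem_offset // row_stride
    col = (elem_offset % row_stride) // col_stride
    return row, col


# Assumed example: a 5x6 row-major fp32 matrix, so the row stride is 6.
rows, cols, tile = 5, 6, 4
offset = 4 * 6 + 4                      # tile starting at row 4, column 4
row0, col0 = tile_origin(offset, row_stride=cols)

# Number of lanes that remain in bounds for this tile in each dimension.
active_rows = min(tile, rows - row0)
active_cols = min(tile, cols - col0)
```

This decomposition only covers the 2-D case; it says nothing about how additional dimensions would need to be handled.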

Support for batched conv2d was removed, since it causes numerical issues that are suspected to be due to how the current tile is determined (see the paragraph above).

~~Note: this should not be merged until after #17048~~

cc @ekalda @Anndrey24

ekalda

Thanks @lhutton1, very nice work! Looks good to me, but I'll let @Anndrey24 take a look as well...

Anndrey24 · 2024-06-12T10:28:50Z

LGTM, too! Seems there's just a small merge conflict that has come up.

Change-Id: I79620200c9a94e2ca9d7297c4ed2abf87549cc41

Change-Id: Iaddeb046bdecb0352a067174f6e6e4be335e94fd

ekalda · 2024-06-14T09:47:14Z

Thanks @lhutton1 and @Anndrey24!

github-actions bot requested a review from ekalda May 31, 2024 12:34

lhutton1 mentioned this pull request May 31, 2024

[Tracking Issue] Scalable Matrix Extension (SME) upstreaming #16734

Open

11 tasks

lhutton1 force-pushed the predicate-sme-fp32-schedules branch 2 times, most recently from 398ea16 to 41a1f04 Compare June 10, 2024 09:47

lhutton1 marked this pull request as ready for review June 10, 2024 09:59

ekalda approved these changes Jun 11, 2024

lhutton1 added 2 commits June 13, 2024 12:27

Fix tests and rebase

e755e43

Change-Id: Iaddeb046bdecb0352a067174f6e6e4be335e94fd

lhutton1 force-pushed the predicate-sme-fp32-schedules branch from 41a1f04 to e755e43 Compare June 13, 2024 12:55

ekalda merged commit d3011ab into apache:main Jun 14, 2024
18 checks passed
