@andcarminati
Collaborator

This catches some missing opportunities in GEMM.

FYI: @martien-de-jong.

Collaborator

Could you be a bit more precise? I think this is based on regression figures for AIE2 at a very particular time, state of the combines, state of the benchmark suites, etc. If there's a specific feature of AIE2P or AIE2 that is driving this, please mention it. If we can derive it from specific instruction characteristics of the conv or the load instructions, consider testing for that specific instruction feature.
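
A minimal sketch of the difference between gating a combine on the target name and gating it on an instruction feature, written in C++ because the combiner is LLVM code. `SubtargetInfo`, `IsAIE2P`, `HasFusedLoadConv` and both functions are hypothetical names chosen for illustration; they are not the AIE backend's actual API.

```cpp
#include <cassert>

// Hypothetical subtarget description; field names are illustrative only and
// do not come from the real AIE backend.
struct SubtargetInfo {
  bool IsAIE2P = false;          // which target we are compiling for
  bool HasFusedLoadConv = false; // ISA offers a load that also converts
};

// Gating on the target: the reason for the restriction stays implicit.
bool shouldCombineByTarget(const SubtargetInfo &ST) { return ST.IsAIE2P; }

// Gating on an instruction feature: the condition documents itself and also
// covers any other target that provides the same instruction.
bool shouldCombineByFeature(const SubtargetInfo &ST) {
  return ST.HasFusedLoadConv;
}
```

The feature-based form only helps if the restriction really follows from an instruction property; a restriction motivated by benchmark regressions, as discussed in this thread, does not map onto such a flag.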

Collaborator Author

We have this for other patterns in aie2, but not for this one. I would prefer not to change the behavior for a target that we are no longer testing, regardless of the reason. At that time, a number of spill regressions related to accumulator registers were found, and I no longer have all those results.

Collaborator

nit: I would invert the condition of this nested if, returning false early and dropping the else.
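
A sketch of the suggested restructuring on a hypothetical matcher; `Candidate` and its three flags are illustrative placeholders, not the real combine's conditions. Both versions return the same result for every input; the early-return form just keeps the success path at the outermost level.

```cpp
#include <cassert>

// Hypothetical stand-in for the conditions a combine checks; the flags are
// illustrative, not the real matcher's state.
struct Candidate {
  bool SameBlock;
  bool SingleUse;
  bool TypesMatch;
};

// Nested form: the success path is buried in the inner branch.
bool matchNested(const Candidate &C) {
  if (C.SameBlock) {
    if (C.SingleUse) {
      return C.TypesMatch;
    } else {
      return false;
    }
  }
  return false;
}

// Suggested form: reject early, no else, happy path unindented.
bool matchEarlyReturn(const Candidate &C) {
  if (!C.SameBlock)
    return false;
  if (!C.SingleUse)
    return false;
  return C.TypesMatch;
}
```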

Collaborator

@martien-de-jong Mar 20, 2025

A very general remark: parent checks are the cheapest ones in town and should absolutely be done first. And I mean first-first, perhaps even before calling the select function.
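
A sketch of that ordering, assuming a simplified instruction model where `Inst::Block` stands in for the parent basic block and `expensiveLegalityCheck` stands in for whatever costly scan the real combine performs (both hypothetical). The counter makes it observable that a block mismatch is rejected before the costly check ever runs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified instruction: only records its parent block id.
// This is not LLVM's MachineInstr API.
struct Inst {
  int Block;
};

// Counts how often the expensive check runs, to make the ordering visible.
static int ExpensiveCalls = 0;

// Placeholder for a costly legality check, e.g. scanning every instruction
// between the load and the conversion. The size limit is illustrative only.
bool expensiveLegalityCheck(const std::vector<Inst> &Between) {
  ++ExpensiveCalls;
  return Between.size() < 16;
}

bool canCombine(const Inst &Load, const Inst &Conv,
                const std::vector<Inst> &Between) {
  // Cheap O(1) parent check first: never combine across blocks.
  if (Load.Block != Conv.Block)
    return false;
  return expensiveLegalityCheck(Between);
}
```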

Collaborator Author

Yeah, we could optimize this selection framework as a whole. I created a ticket to review all cases.

Collaborator

add 'by moving the VCONV up to the load.'

Collaborator

Actually, I think advancing the conversion is in general better and easier. We usually want to schedule loads early due to long latency. Can we try it first?

Collaborator Author

This is not correct: the addressing register is defined after.
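
One reading of this objection is the usual legality rule for moving code upward: a fused instruction can only be placed at a point where every register it reads is already defined. A minimal sketch, assuming a hypothetical SSA-style model of a single basic block (`Inst`, `availableAt` and `canPlaceAt` are illustrative names, not LLVM's `MachineInstr` API):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SSA-style model of one basic block: each instruction lists the
// virtual registers it defines and reads.
struct Inst {
  std::vector<int> Defs;
  std::vector<int> Uses;
};

// True if Reg is available at position Pos: it is either defined strictly
// before Pos, or not defined in this block at all (a live-in).
bool availableAt(const std::vector<Inst> &Block, int Reg, std::size_t Pos) {
  for (std::size_t I = 0; I < Block.size(); ++I)
    for (int D : Block[I].Defs)
      if (D == Reg)
        return I < Pos;
  return true;
}

// A fused instruction may only be placed at Pos if every register it reads
// is already available there.
bool canPlaceAt(const std::vector<Inst> &Block, const std::vector<int> &Reads,
                std::size_t Pos) {
  for (int R : Reads)
    if (!availableAt(Block, R, Pos))
      return false;
  return true;
}
```

Under this model, a fused instruction that reads a register defined between the two original instructions cannot be hoisted to the position of the first one.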

@andcarminati
Collaborator Author

QoR numbers:

|----------------------------------------|----------|----------|--------------|
| Core_Compute_Cycle_Count               | Baseline | VLD_CONV | Total diff   |
|----------------------------------------|----------|----------|--------------|
| AddAttributeBroadcasting_aie2_bf16     |      517 |      549 | REGR(+6.19%) |
|----------------------------------------|----------|----------|--------------|
| AddBf16_aie2_0                         |      868 |      868 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AddBf16_aie2_1                         |      996 |      996 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AddBf16_aie2_2                         |      996 |      996 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2D_bfloat16_0                   |     1118 |     1118 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2D_bfloat16_1                   |      822 |      822 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2D_bfloat16_2                   |      532 |      532 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2D_bfloat16_3                   |      532 |      532 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2dVariant_bf16_0                |     3361 |     3361 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2dVariant_bf16_1                |     1988 |     1988 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2dVariant_bf16_2                |     4357 |     4357 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2dVariant_bf16_3                |      791 |      791 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2dVariant_bf16_4                |     1297 |     1297 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2dVariant_bf16_5                |     2686 |     2686 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2dVariant_bf16_6                |     3030 |     3030 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2dVariant_bf16_7                |     2409 |     2409 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2dVariant_bf16_8                |     3721 |     3721 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2dVariant_bf16_9                |     2937 |     2937 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2dVariant_bf16_10               |     3183 |     3183 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2dVariant_bf16_11               |     3415 |     3415 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2dVariant_bf16_13               |     2995 |     2995 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2dVariant_bf16_14               |     3349 |     3349 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2dVariant_bf16_15               |     4789 |     4789 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| AvgPool2dVariant_bf16_16               |     1106 |     1106 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Clip_aie2_bf16                         |      197 |      197 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| CompareOpsAttributeBroadcasting_bf16   |     2056 |     2056 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| CompareOpsAttributeBroadcasting_bf16_1 |     2168 |     2168 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_DW_bf16_0                       |     1368 |     1368 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_DW_bf16_1                       |     3412 |     3412 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_DW_bf16_2                       |     2576 |     2576 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_DW_bf16_3                       |     1368 |     1368 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_DW_bf16_4                       |      856 |      856 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_0                         |    11973 |    11973 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_1                         |    25743 |    25743 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_2                         |    11973 |    11973 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_3                         |    13901 |    13901 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_4                         |    12587 |    12587 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_5                         |    47025 |    47025 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_6                         |     6449 |     6449 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_7                         |     2711 |     2711 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_8                         |     1893 |     1893 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_9                         |    39725 |    39725 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_10                        |    39765 |    39765 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_11                        |    39765 |    39765 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_12                        |     9102 |     9102 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_13                        |    23805 |    23805 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_14                        |    17357 |    17357 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_OC8_0                     |     9827 |     9827 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_OC8_1                     |    21335 |    21335 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_OC8_2                     |     6381 |     6381 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_OC8_3                     |    13397 |    13397 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_OC8_4                     |     6999 |     6999 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_OC8_5                     |    39291 |    39291 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_OC8_6                     |    17555 |    17555 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_OC8_7                     |    19243 |    19243 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_OC8_8                     |    32299 |    32299 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_OC8_9                     |     7027 |     7027 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_OC8_10                    |    36485 |    36485 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_PSUM_FLOAT_0              |     7866 |     7866 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_PSUM_FLOAT_1              |     5976 |     5976 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Conv2D_bfp16_PSUM_FLOAT_2              |     6102 |     6102 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| GeluTemplated_aie2_bf16                |     3747 |     3747 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Hardswish_aie2_1                       |     1283 |     1283 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| MaxPool2D_bf16_0                       |     1695 |     1695 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| MaxPool2D_bf16_1                       |     1191 |     1191 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| MaxPool2D_bf16_2                       |      694 |      694 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| MaxPool2D_bf16_3                       |      694 |      694 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| MaxPool2D_bf16_4                       |      694 |      694 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| MulAttributeBroadcasting_aie2_bf16_0   |      888 |      888 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Neg_aie2_1                             |      132 |      132 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Pad2D_bf16_0                           |     2387 |     2387 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceMaxAxis_1_aie2_bf16              |     7565 |     7565 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceMaxAxis_2_aie2_bf16              |     7581 |     7581 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceMaxAxis_3_aie2_bf16              |     2646 |     2646 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceMaxAxis_4_aie2_bf16              |     7606 |     7606 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceMaxAxis_5_aie2_bf16              |     2652 |     2652 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceMaxAxis_6_aie2_bf16              |     2637 |     2637 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceMaxAxis_7_aie2_bf16              |     1828 |     1828 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceMeanAxis_1_aie2_bf16             |    24940 |    24940 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceMeanAxis_2_aie2_bf16             |    16288 |    16288 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceMeanAxis_3_aie2_bf16             |     8513 |     8513 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceMeanAxis_4_aie2_bf16             |    24948 |    24948 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceMeanAxis_5_aie2_bf16             |     9580 |     9580 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceMeanAxis_6_aie2_bf16             |     8494 |     8494 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceMeanAxis_7_aie2_bf16             |     7254 |     7254 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceSumAxis_1_aie2_bf16              |    17894 |    17894 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceSumAxis_2_aie2_bf16              |    17910 |    17910 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceSumAxis_3_aie2_bf16              |     6755 |     6755 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceSumAxis_4_aie2_bf16              |    17930 |    17930 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceSumAxis_5_aie2_bf16              |     6755 |     6755 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceSumAxis_6_aie2_bf16              |     6736 |     6736 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| ReduceSumAxis_7_aie2_bf16              |     5111 |     5111 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| SiLU_aie2_bf16                         |     1821 |     1821 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| SigmoidTemplated_bf16_0                |     1053 |     1053 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| SigmoidTemplated_bf16_1                |      565 |      565 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| SigmoidTemplated_bf16_1_AIE2p          |      565 |      565 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Sigmoidmode1Templated_bf16_0           |     7336 |     7336 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Sqrt_bf16_0                            |    13775 |    13775 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Sqrt_bf16_1                            |     1791 |     1791 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| Sub_aie2_bf16_0                        |      858 |      858 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| TanhTemplated_aie2_bfloat16_0          |     3865 |     3865 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| TanhTemplated_aie2_bfloat16_1          |     1977 |     1977 | SAME(+0.00%) |
|----------------------------------------|----------|----------|--------------|
| TanhTemplatedmode1_bfloat16            |     7869 |     7854 | IMPR(-0.19%) |
|----------------------------------------|----------|----------|--------------|
| GEMM_Bfp16_opt_0                       |     1049 |     1033 | IMPR(-1.53%) |
|----------------------------------------|----------|----------|--------------|
| GEMM_Bfp16_opt_1                       |     3856 |     3792 | IMPR(-1.66%) |
|----------------------------------------|----------|----------|--------------|
| GEMM_Bfp16_opt_3                       |     3856 |     3792 | IMPR(-1.66%) |
|----------------------------------------|----------|----------|--------------|
| GEMM_Bfp16_opt_4                       |     3856 |     3792 | IMPR(-1.66%) |
|----------------------------------------|----------|----------|--------------|
| Average diff                           | +0.00%   | -0.00%   | -0.00%       |
|----------------------------------------|----------|----------|--------------|
| Diff stdev                             |     0.00 |     0.68 |         0.68 |
|----------------------------------------|----------|----------|--------------|
| Quantile #1                            | +0.00%   | +0.00%   | +0.00%       |
|----------------------------------------|----------|----------|--------------|
| Quantile #2                            | +0.00%   | +0.00%   | +0.00%       |
|----------------------------------------|----------|----------|--------------|
| Quantile #3                            | +0.00%   | +0.00%   | +0.00%       |
|----------------------------------------|----------|----------|--------------|
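
The "Total diff" column can be reproduced from the two cycle counts as a relative change. A sketch, assuming the tag is REGR for more cycles, IMPR for fewer and SAME for equal counts; the actual reporting script's rules (for example any noise threshold) are not shown in this thread, and `diffLabel` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Assumed rule: relative change in cycles, printed with two decimals and a
// sign, tagged REGR / IMPR / SAME by comparing the raw counts.
std::string diffLabel(long Baseline, long New) {
  assert(Baseline > 0 && "baseline cycle count must be positive");
  double Pct = 100.0 * static_cast<double>(New - Baseline) / Baseline;
  const char *Tag =
      New > Baseline ? "REGR" : (New < Baseline ? "IMPR" : "SAME");
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "%s(%+.2f%%)", Tag, Pct);
  return Buf;
}
```

For instance, `diffLabel(517, 549)` yields `REGR(+6.19%)`, matching the AddAttributeBroadcasting_aie2_bf16 row.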

For AddAttributeBroadcasting_aie2_bf16, we combine more and keep the same II and stage count, but end up with a few more wait cycles.
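
A first-order model helps to see why the same II and stage count can still cost cycles. Assuming a software-pipelined loop with $N$ iterations, $S$ stages and initiation interval $\mathit{II}$, plus $W$ cycles spent waiting (stalls, or code outside the pipelined loop), the cycle count is roughly:

```math
T \approx (N + S - 1)\cdot \mathit{II} + W
```

With $N$, $S$ and $\mathit{II}$ unchanged, $\Delta T = \Delta W$, so under this model the 32 extra cycles (549 versus 517) come entirely from waiting or from code outside the loop.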

Collaborator

@martien-de-jong left a comment

LGTM, although the 'only for aie2p' condition remains mysterious.

@martien-de-jong self-assigned this Mar 31, 2025
@andcarminati merged commit c5b8fae into aie-public Apr 1, 2025
6 checks passed
@andcarminati deleted the andreu.vld.vconv.up branch April 1, 2025 07:34