Incorrect results when problem size M is not divisible by 16. #3963

Closed
@rdspring1

Description

1752 / 16 = 109.5, so M = 1752 is not a multiple of 16 and the results are incorrect. 1760 / 16 = 110 and 1744 / 16 = 109, so both are multiples of 16 and run correctly.

This is probably an issue with the (16, 16) stmatrix store, given its multiple-of-16 requirement.
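
To make the arithmetic concrete, here is a minimal sketch of how the three M values split into CTA tiles and 16-row stores. It assumes that one (16, 16) stmatrix store covers 16 rows of M and that the CTA tile M extent is 128, as in the "CTA tile [128, 256, 64]" line of the parameters printed in the error message. The helper `tail_rows` is hypothetical and is not part of nvFuser.

```python
STMATRIX_ROWS = 16  # assumption: rows of M covered by one (16, 16) stmatrix store
CTA_TILE_M = 128    # from "CTA tile [128, 256, 64]" in the printed parameters


def tail_rows(m, cta_tile_m=CTA_TILE_M, stmatrix_rows=STMATRIX_ROWS):
    """Return (rows in the last CTA tile, rows left after whole 16-row stores)."""
    # A remainder of 0 means the last CTA tile is completely full.
    last_tile = m % cta_tile_m or cta_tile_m
    return last_tile, last_tile % stmatrix_rows


for m in (1744, 1752, 1760):
    last_tile, leftover = tail_rows(m)
    print(f"M={m}: multiple of 16: {m % STMATRIX_ROWS == 0}, "
          f"last CTA tile rows: {last_tile}, leftover rows: {leftover}")
```

Under these assumptions only M = 1752 leaves a partial store: its last CTA tile holds 88 rows, which is five full 16-row stores plus 8 leftover rows. That matches the hypothesis but does not confirm it.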

To Reproduce:
NVFUSER_ENABLE=fuse_matmul NVFUSER_DISABLE=matmul_expr_eval python profile_matmul.py 1752 4720 584 NN --verbose --validate
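
To check neighbouring sizes, the same command can be generated for several M values. The sketch below only builds the command strings and does not run them. The helper `repro_command` and its parameter names are hypothetical; the issue does not say which of the second and third positional arguments is N or K, so they are passed through unchanged.

```python
import shlex

# Environment variables copied from the reproduction command above.
ENV = {"NVFUSER_ENABLE": "fuse_matmul", "NVFUSER_DISABLE": "matmul_expr_eval"}


def repro_command(m, second_dim=4720, third_dim=584, layout="NN"):
    """Build the reproduction command line for a given M (hypothetical helper)."""
    env = " ".join(f"{key}={value}" for key, value in ENV.items())
    args = ["python", "profile_matmul.py", str(m), str(second_dim),
            str(third_dim), layout, "--verbose", "--validate"]
    return f"{env} {shlex.join(args)}"


# 1744 and 1760 are multiples of 16; 1752 is not.
for m in range(1744, 1761, 8):
    print(repro_command(m))
```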

Error Message:

===== Matmul Parameters ========

MMA macro: Hopper_64_256_16
CircularBufferOptions:
  circular_buffer_smem_write: true
  circular_buffer_smem_read: false
  smem_circular_buffer_stage: 4
  smem_circular_buffer_prefetch_gap: 1
SupportedVectorization:
  a: 8
  b: 8
  epilogue: 8
MatMulTileOptions: warp tile [64, 256, 64], CTA tile [128, 256, 64]
Async global mem load: true
Indexing mode: int32_t
Tile rasterization order: column-major
Grid swizzle factor: 1
Tiling strategy: OneTilePerCTA
Buffering loop level: CTATiles
Circular buffering strategy: WarpSpecialized
__cluster_dims__(1, 1, 1)
Use shared memory epilogue: 1
Promote re-use of prologue shared memory: 1
Split-K factor: 1
====================================

Traceback (most recent call last):
  File "/opt/pytorch/nvfuser/doc/dev/python_scheduling/profile_matmul.py", line 209, in <module>
    main()
  File "/opt/pytorch/nvfuser/doc/dev/python_scheduling/profile_matmul.py", line 198, in main
    baseline_result, nvf_result = test_matmul_nvf(
                                  ^^^^^^^^^^^^^^^^
  File "/opt/pytorch/nvfuser/doc/dev/python_scheduling/profile_matmul.py", line 135, in test_matmul_nvf
    assert torch.allclose(
           ^^^^^^^^^^^^^^^
AssertionError
