universal_gemm/gen_instances.py: make parsing more robust by coconutruben · Pull Request #1899 · ROCm/composable_kernel

coconutruben · 2025-02-19T04:29:19Z

Proposed changes

Why

The current logic is very brittle, and can break down when we parse examples that have new order, or define some keys that were previously only used as default keys

What

parse based on split and kwargs rather than moving a pointer across a list. This makes the code (arguably) a bit more readable, and more robust to changing parameter lenghts
use default values explicitly rather than relying on the underlying template. This means every template is fully instantiated, and easier to debug

Test Plan

import logging

import torch
from torch._inductor import config as inductor_config

log = logging.getLogger(__name__)

def main():
  log.info(f"M: {M}, N: {N}, K: {K}, dtype: {dtype}")
  A = torch.randn(M, K, dtype=dtype).cuda()
  B = torch.randn(K, N, dtype=dtype).cuda()

  # sample the different backends independently
  with inductor_config.patch(
      {
          "max_autotune_gemm_backends": f"ATEN,CK}",
          "search_autotune_cache": False,
          "fx_graph_cache": False,
          "rocm.n_max_profiling_configs": None,
      }
  ):
      # compile the model
      compiled_model = torch.compile(SimpleModel(), mode="max-autotune")
      log.info("compiled model")
      # run the compiled model
      _ = compiled_model(A, B)
      log.info("ran compiled model")

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

I have added tests relevant to the introduced functionality, and the unit tests are passing locally
I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
I have added inline documentation which enables the maintainers with understanding the motivation
I have removed the stale documentation which is no longer relevant after this pull request
(If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
I have run clang-format on all changed files
Any dependent changes have been merged

The current logic is very brittle, and can break down when we parse examples that have new order, or define some keys that were previously only used as default keys \# What 1. parse based on split and kwargs rather than moving a pointer across a list. This makes the code (arguably) a bit more readable, and more robust to changing parameter lenghts 2. use default values explicitly rather than relying on the underlying template. This means every template is fully instantiated, and easier to debug \# Test Plan ``` import logging import torch from torch._inductor import config as inductor_config log = logging.getLogger(__name__) def main(): log.info(f"M: {M}, N: {N}, K: {K}, dtype: {dtype}") A = torch.randn(M, K, dtype=dtype).cuda() B = torch.randn(K, N, dtype=dtype).cuda() # sample the different backends independently with inductor_config.patch( { "max_autotune_gemm_backends": f"ATEN,CK}", "search_autotune_cache": False, "fx_graph_cache": False, "rocm.n_max_profiling_configs": None, } ): # compile the model compiled_model = torch.compile(SimpleModel(), mode="max-autotune") log.info("compiled model") # run the compiled model _ = compiled_model(A, B) log.info("ran compiled model") ```

coconutruben · 2025-02-19T04:29:47Z

@tenpercent if you agree with this general direction, would you be able to make the corresponding changes in batched and conv?

coconutruben · 2025-02-20T02:53:59Z

it seems that #1796 ran into this same issue and fixes the value error, and added tests. Sorry for not catching that earlier and thanks for adding the tests, we'll have to port those as well!
I do think the rewriting here is beneficial and cleaner to parsing, but open to feedback

tenpercent · 2025-02-20T17:34:15Z

Hi @coconutruben , thanks for looking into this! The parsing logic does look more readable this way, except for the regex part

it seems that #1796 ran into this same issue and fixes the value error

So do I understand correctly that there's no parsing error when you move the pin past #1796 merge? Meaning there's no urgency to merge

In general, the parsing code does look hairy, and a proper refactor would (1) extract the template arguments as python ints, strings and tuples -- this is template instance agnostic, can be shared between universal batched gemms, universal gemms, and grouped convolutions; and (2) try to instantiate a python dataclass from this list; this is custom for each of the dataclasses. Checking the kwargs could be also done at this step

I was also thinking of possibly using python libclang instead of manual code parsing but that would probably be overkill at this point

If not urgent, it could be a good introductory task

tenpercent · 2025-02-20T17:46:54Z

python/ck4inductor/universal_gemm/gen_instances.py

+import re
 import subprocess
-from dataclasses import replace
+


also for the whitespace changes, are they coming from a linter or were they introduced manually?

cgmillette · 2025-10-08T22:43:00Z

@tenpercent @coconutruben
Please update the status for this PR. It has been open for 8 months - please decide whether we can close it or move forward with changes / review. Thank you.

tenpercent · 2025-10-08T22:59:18Z

I believe it's not a high priority to merge it

coconutruben requested review from afagaj, andriy-ca, aosewski, bartekxk, carlushuang, geyyer, illsilin, poyenc and qianfengz as code owners February 19, 2025 04:29

tenpercent reviewed Feb 20, 2025

View reviewed changes

cgmillette added the Stale This PR is stale and needs attention. If not attended to, the PR may be closed due to inactivity. label Oct 8, 2025

tenpercent closed this Oct 8, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

universal_gemm/gen_instances.py: make parsing more robust#1899

universal_gemm/gen_instances.py: make parsing more robust#1899
coconutruben wants to merge 1 commit intoROCm:developfrom
coconutruben:coconutruben/parse

coconutruben commented Feb 19, 2025

Uh oh!

coconutruben commented Feb 19, 2025

Uh oh!

coconutruben commented Feb 20, 2025

Uh oh!

tenpercent commented Feb 20, 2025 •

edited

Loading

Uh oh!

tenpercent Feb 20, 2025

Uh oh!

cgmillette commented Oct 8, 2025

Uh oh!

tenpercent commented Oct 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

coconutruben commented Feb 19, 2025

Proposed changes

Why

What

Test Plan

Checklist

Uh oh!

coconutruben commented Feb 19, 2025

Uh oh!

coconutruben commented Feb 20, 2025

Uh oh!

tenpercent commented Feb 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tenpercent Feb 20, 2025

Choose a reason for hiding this comment

Uh oh!

cgmillette commented Oct 8, 2025

Uh oh!

tenpercent commented Oct 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tenpercent commented Feb 20, 2025 •

edited

Loading