universal_gemm/gen_instances.py: make parsing more robust#1899
universal_gemm/gen_instances.py: make parsing more robust#1899coconutruben wants to merge 1 commit intoROCm:developfrom
Conversation
The current logic is very brittle, and can break down when we parse
examples that have new order, or define some keys that were previously
only used as default keys
\# What
1. parse based on split and kwargs rather than moving a pointer across
a list. This makes the code (arguably) a bit more readable, and more
robust to changing parameter lenghts
2. use default values explicitly rather than relying on the underlying
template. This means every template is fully instantiated, and easier
to debug
\# Test Plan
```
import logging
import torch
from torch._inductor import config as inductor_config
log = logging.getLogger(__name__)
def main():
log.info(f"M: {M}, N: {N}, K: {K}, dtype: {dtype}")
A = torch.randn(M, K, dtype=dtype).cuda()
B = torch.randn(K, N, dtype=dtype).cuda()
# sample the different backends independently
with inductor_config.patch(
{
"max_autotune_gemm_backends": f"ATEN,CK}",
"search_autotune_cache": False,
"fx_graph_cache": False,
"rocm.n_max_profiling_configs": None,
}
):
# compile the model
compiled_model = torch.compile(SimpleModel(), mode="max-autotune")
log.info("compiled model")
# run the compiled model
_ = compiled_model(A, B)
log.info("ran compiled model")
```
|
@tenpercent if you agree with this general direction, would you be able to make the corresponding changes in batched and conv? |
|
it seems that #1796 ran into this same issue and fixes the value error, and added tests. Sorry for not catching that earlier and thanks for adding the tests, we'll have to port those as well! |
|
Hi @coconutruben , thanks for looking into this! The parsing logic does look more readable this way, except for the regex part
So do I understand correctly that there's no parsing error when you move the pin past #1796 merge? Meaning there's no urgency to merge In general, the parsing code does look hairy, and a proper refactor would (1) extract the template arguments as python ints, strings and tuples -- this is template instance agnostic, can be shared between universal batched gemms, universal gemms, and grouped convolutions; and (2) try to instantiate a python dataclass from this list; this is custom for each of the dataclasses. Checking the kwargs could be also done at this step I was also thinking of possibly using python libclang instead of manual code parsing but that would probably be overkill at this point If not urgent, it could be a good introductory task |
| import re | ||
| import subprocess | ||
| from dataclasses import replace | ||
|
|
There was a problem hiding this comment.
also for the whitespace changes, are they coming from a linter or were they introduced manually?
|
@tenpercent @coconutruben |
|
I believe it's not a high priority to merge it |
Proposed changes
Why
The current logic is very brittle, and can break down when we parse examples that have new order, or define some keys that were previously only used as default keys
What
Test Plan
Checklist
Please put an
xinto the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.clang-formaton all changed files