feat: YAML-driven torch op codegen with canonical naming and exposed semantic params#595
The torch op codegen script imports `yaml` to parse `scripts/torch_ops.yaml` and PyTorch's `native_functions.yaml`. Since CMake invokes the script at configure time, PyYAML must be available in the build environment.
Frees the `infini::ops::Sigmoid` name for the auto-generated PyTorch operator class emitted by the upcoming `scripts/generate_torch_ops.py`.
Adds two pieces used by the upcoming pybind bindings for auto-generated
torch ops:
- `detail::ListContains` and an early-out in
`Operator::active_implementation_indices` so querying impls for a
device the op does not support returns an empty vector instead of
crashing in `DispatchFunc`.
- `TryDeviceTypeFromString` returning `std::optional<Device::Type>`,
so generated bindings can resolve a device name without aborting on
unrecognized inputs.
For each entry in `scripts/torch_ops.yaml`, the script finds the
matching `.out` variant in PyTorch's `native_functions.yaml` (fetched
from GitHub on first invocation, cached under `generated/.cache/`),
parses its schema, and emits an InfiniOps base class plus a PyTorch
backend specialization at slot 8 that wraps `at::<op>_out`.
Key strategies:
- Overload-aware lookup: prefers `<name>.out`, then any
  `<name>.<overload>_out`, picking the variant with the most tensor
  inputs (so `pow.Tensor_Tensor_out` wins over `pow.Tensor_Scalar_out`);
  see the sketch after this list.
- Hidden-parameter pattern: optional types (`Scalar?`, `int[]?`,
`ScalarType?`, `Generator?`, …), `bool` defaults, numeric
`int`/`float` defaults, `int[N]=[]` defaults, and ATen enum
symbols (`Mean`, `Sum`) are filtered from the user-facing API
and substituted at the ATen call site. Unlocks reductions, scans,
comparisons, losses, and multi-scalar activations from a single
mechanism.
- Slot 8: reserved for PyTorch backends; native and vendor
implementations use 0–7. Also avoids a partial-specialization-after-
instantiation conflict with `Operator<Op>` at index 0.
- Hand-written-base coexistence: if `src/base/<op>.h` exists, the
generator skips emitting `generated/base/<op>.h` so the
hand-written one wins. Ops whose pre-existing hand-written base
has a different parameter shape (`add`, `linear`, `matmul`,
`mul`) are kept out of the YAML; including them would cause the
generated torch override to mismatch the hand-written base.
- Per-op metadata (`generated/torch_ops_metadata.json`): records the
full parameter list per op for the test harness, so adding a new op
to the allowlist requires no code changes.
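A minimal sketch of the overload-aware lookup from the first bullet above, assuming each parsed `native_functions.yaml` entry is a dict with a `func` schema string; the helper names are illustrative, not the script's actual API:

```python
def find_out_variant(aten_name: str, native_functions: list[dict]) -> dict | None:
    """Pick the best `.out` variant of `aten_name` from native_functions.yaml."""

    def parse(func: str) -> tuple[str, str, str]:
        # "pow.Tensor_Tensor_out(Tensor self, ...) -> Tensor(a!)"
        #   -> ("pow", "Tensor_Tensor_out", "Tensor self, ...")
        signature = func.split("->")[0]
        full_name, _, args = signature.partition("(")
        name, _, overload = full_name.strip().partition(".")
        return name, overload, args

    candidates = []
    for entry in native_functions:
        name, overload, args = parse(entry["func"])
        if name != aten_name:
            continue
        if overload == "out":
            return entry  # exact `<name>.out` wins outright
        if not overload.endswith("_out"):
            continue
        # Count plain `Tensor ` inputs; out-arguments are annotated
        # `Tensor(a!)` so they do not match. This is how
        # pow.Tensor_Tensor_out beats pow.Tensor_Scalar_out.
        candidates.append((args.count("Tensor "), entry))
    return max(candidates, key=lambda c: c[0])[1] if candidates else None
```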
When `WITH_TORCH=ON`, run `scripts/generate_torch_ops.py` at configure time and add the generated tree to the torch source glob and include path. Vendor compilers (`mxcc`/`mcc`) get the same include via the system-`g++` torch recompile loop. When Python bindings are enabled, also install `generated/torch_ops_metadata.json` so the torch-op test can discover the generated catalog at runtime.
Three changes that let `generate_wrappers.py` see the codegen output:
- `_find_base_header` resolves an op's base in `src/base/` first,
then `generated/base/` — mirroring the C++ include-path order so a
hand-written base wins. `_OperatorExtractor`,
`_find_optional_tensor_params`, and `_find_vector_tensor_params`
use it; clang's parser also picks up `-I generated` so the include
in a generated torch source resolves through the parser too.
- `_get_all_ops` now scans both base directories and both impl roots
(`src/` and `generated/`), so generated PyTorch backends are
bound alongside hand-written ones. `_to_include_path` strips
either `src/` or `generated/` when emitting legacy-C `#include`
directives.
- Active-impl device lookup goes through the new
`TryDeviceTypeFromString<Self>(device)` helper, returning an empty
vector for an unknown name instead of aborting.
Also wipes the bindings/src/include output trees at start, so files for
ops removed from the active set do not linger and get globbed by the
next build, and hoists `_get_system_include_flags` to module level
behind an `lru_cache` (the `subprocess` probes were the slow path).
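A sketch of that caching, assuming the probe shells out to the system compiler (`g++ -E -x c++ - -v` prints the include search list on stderr); the real flag assembly may differ:

```python
import functools
import subprocess


@functools.lru_cache(maxsize=None)
def _get_system_include_flags(compiler: str = "g++") -> tuple[str, ...]:
    """Probe the compiler once per process; repeated calls hit the cache."""
    proc = subprocess.run(
        [compiler, "-E", "-x", "c++", "-", "-v"],
        input="", capture_output=True, text=True, check=True,
    )
    flags: list[str] = []
    in_search_list = False
    for line in proc.stderr.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_search_list = True
        elif line.startswith("End of search list."):
            in_search_list = False
        elif in_search_list:
            flags.append("-isystem" + line.strip())
    return tuple(flags)  # tuples are hashable and immutable, safe to cache
```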
Tensor parameters bind to `py::object`, which accepts any Python value and only rejects inside `TensorFromPybind11Handle` at runtime. When a class has both scalar and Tensor overloads of `__call__` or its constructor (e.g. `pow.Tensor_Tensor_out` vs `pow.Tensor_Scalar_out`), pybind's overload resolver tries them in registration order, so the `Tensor` signature swallows scalar calls if it sits first and the call aborts inside the conversion. `_overload_order_key` sorts by (object-like-arg count ascending, total arg count descending), so the most-specific signature is registered first and pybind walks toward more permissive ones only on a real type-mismatch. While here, rename the `__call__` lambda's first parameter from `self` to `op` so it does not collide with ATen ops that take a parameter literally named `self`.
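A runnable sketch of the ordering rule, assuming each overload is described by a list of parameter dicts with an `is_tensor` flag (the script's actual parameter type may differ):

```python
def _overload_order_key(params: list[dict]) -> tuple[int, int]:
    # Tensor params bind to py::object, which accepts any Python value,
    # so fewer object-like args means a more specific signature.
    object_like = sum(1 for p in params if p["is_tensor"])
    return (object_like, -len(params))  # ascending, then total args descending


overloads = [
    [{"name": "self", "is_tensor": True}, {"name": "other", "is_tensor": True}],
    [{"name": "self", "is_tensor": True}, {"name": "exponent", "is_tensor": False}],
]
overloads.sort(key=_overload_order_key)
# The Tensor/Scalar variant now registers before Tensor/Tensor, so the
# permissive py::object signature cannot swallow scalar calls.
```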
A single parametrized `test_op` reads `generated/torch_ops_metadata.json` (installed alongside the bindings, with a fallback to the source-tree copy), synthesises inputs by parameter type, calls the InfiniOps wrapper at slot 8, and compares each output tensor against `torch.<op>` or its `torch.special` / `torch.nn.functional` counterpart. Adding an op to `scripts/torch_ops.yaml` extends coverage with no test changes. Skip-lists narrow the harness around known harness limitations: vendor kernels that lack a given (op, dtype, device) combination, random ops whose RNG state diverges from a fresh torch reference, low-precision reductions where the functional and `_out` paths diverge, ops that fire CUDA device-side asserts on random inputs, and ops whose inputs or outputs use dtypes outside the InfiniOps `DataType` enum. `tests/conftest.py` now compares non-floating outputs with `torch.equal` (since `torch.allclose` rejects `bool`) and passes `equal_nan=True` for floats so symmetric NaNs (common for special functions fed out-of-domain inputs) do not fail the test.
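A condensed sketch of the harness shape, assuming a metadata layout of `{"ops": [{"name": ..., "params": [...]}]}`; the real JSON schema, fixtures, reference resolution (`torch.special` / `torch.nn.functional` fallbacks), and comparison logic are richer:

```python
import json
import pathlib

import pytest
import torch

_METADATA = json.loads(
    pathlib.Path("generated/torch_ops_metadata.json").read_text()
)


def _make_arg(param: dict, shape: tuple[int, ...], dtype: torch.dtype):
    # Synthesize an input from the recorded parameter type alone.
    if param["type"] == "Tensor":
        return torch.randn(shape).to(dtype)
    if param["type"] == "int":
        return 1
    raise NotImplementedError(param["type"])


@pytest.mark.parametrize("op_meta", _METADATA["ops"], ids=lambda m: m["name"])
@pytest.mark.parametrize("shape", [(4,), (2, 3), (2, 3, 4)])
def test_op(op_meta, shape, dtype):  # dtype/device supplied by conftest fixtures
    args = [_make_arg(p, shape, dtype) for p in op_meta["params"]]
    reference = getattr(torch, op_meta["name"])(*args)
    ...  # call the InfiniOps wrapper at slot 8 and compare each output
```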
Reviewers consistently flagged class names like `xlogy_outtensor`, `triangular_solve_x`, `*_grad_input`, `*_forward_output`, `*_n_scalar`, `*_dim_values`, `*_values_stable` etc. as bad public-API naming — the suffix is just an ATen schema artifact and carries no semantic info. Use only the canonical `aten_name` for the InfiniOps class; multiple ATen overloads of the same base op (e.g. `scatter.src`, `scatter.value`, `scatter.reduce`) become overloaded `operator()` methods on a single `Scatter` class, with tensor metadata members shared across overloads. Overloads that collapse to identical visible C++ signatures after hidden defaults are still deduped by `_dedupe_visible_overloads`. The test harness's parametrize-id falls back to `overload_name` so pytest does not collide ids between overloads.
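A minimal sketch of that dedupe step, assuming overloads carry per-parameter `cpp_type` / `hidden` fields (illustrative names):

```python
def _dedupe_visible_overloads(overloads: list[dict]) -> list[dict]:
    """Drop overloads that collapse to the same visible C++ signature
    once hidden (defaulted/optional) parameters are filtered out."""
    seen: set[tuple[str, ...]] = set()
    kept = []
    for ov in overloads:
        visible = tuple(p["cpp_type"] for p in ov["params"] if not p["hidden"])
        if visible in seen:
            continue  # identical visible signature: keep only the first
        seen.add(visible)
        kept.append(ov)
    return kept
```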
Reviewers flagged on multiple PRs that scalar parameters such as `n` on `special_chebyshev_polynomial_v` were declared in the constructor but never stored on the class — leaving the backend with no way to read them outside of `operator()`. Add a `<type> <name>_;` member for every visible non-tensor parameter, initialized from the matching constructor argument. Same-named scalars across overloads must agree on type; if a later overload disagrees, that overload's value is left default-constructed rather than emitting a conflicting member. Tensor metadata members (`<name>_shape_`, `_strides_`, `_type_`) keep their existing union-across-overloads behaviour.
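A sketch of the member-collection rule, under the same assumed parameter-dict shape as above:

```python
def _scalar_members(overloads: list[list[dict]]) -> dict[str, str]:
    """Collect one `<type> <name>_;` member per visible non-tensor param,
    unioned across every overload of the class."""
    members: dict[str, str] = {}  # parameter name -> C++ type
    for params in overloads:
        for p in params:
            if p["is_tensor"] or p["hidden"]:
                continue
            if p["name"] in members and members[p["name"]] != p["cpp_type"]:
                # A later overload disagrees on the type: keep the first
                # declaration; that overload leaves the member
                # default-constructed instead of emitting a conflict.
                continue
            members.setdefault(p["name"], p["cpp_type"])
    return members

# Each entry is emitted as e.g. `int64_t n_;`, initialized from the
# matching constructor argument.
```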
Reviewers consistently flagged on multiple PRs that semantically
critical default-valued parameters were being hidden by the codegen:
- `bool upper`, `bool transpose`, `bool unitriangular` on
`triangular_solve` (PR #580)
- `int diagonal` on `triu` (PR #509)
- `int n` on the `special_chebyshev_polynomial_*` family
- `str ord` on `linalg_matrix_norm` (PR #280)
- `int[N]` dims with `[]` defaults on reductions
These were hidden because they have a default in ATen's schema, but
defaults do not equal "optional to expose". Stop hiding non-optional
default-valued params; they are now visible in the generated
`operator()` signatures and forwarded to ATen.
Optional ATen types (`Tensor?`, `Scalar?`, `int?`, …) remain hidden
for now — exposing them properly requires threading `std::optional`
through to ATen, which is a larger refactor and tracked separately.
libclang silently reports the type of `std::vector<int64_t>` parameters as `int` on systems where the STL headers are not fully indexable (observed under the NVIDIA build's libclang). The fallback type then leaks into the generated binding as `const int padding` instead of `const std::vector<int64_t> padding`, and the binding's call to the base operator fails to compile with a long instantiation trace at `Operator::operator()` for any op with `int[N]` schema parameters (im2col, col2im, reflection_pad*, replication_pad*, fft_*, upsample_*, nuclear_norm, …). Adopt the same regex-scan workaround already used for `std::optional<Tensor>` and `std::vector<Tensor>` parameters: scan the base header text for `std::vector<int64_t> <name>` declarations and emit the binding parameter with that exact type, bypassing libclang's inferred spelling.
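A minimal sketch of the regex scan; the actual pattern in `generate_wrappers.py` may differ in detail:

```python
import re

_VEC_I64 = re.compile(r"std::vector<int64_t>\s+(\w+)")


def _find_vector_int64_params(base_header_text: str) -> set[str]:
    """Parameter names declared as std::vector<int64_t> in the base header.

    libclang can misreport these as `int` when STL headers are not fully
    indexable, so the textual declaration is treated as ground truth.
    """
    return set(_VEC_I64.findall(base_header_text))

# Usage: if a parameter's name is in this set, emit the binding parameter
# as `const std::vector<int64_t> <name>` regardless of libclang's spelling.
```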
The wrapper generator picked up `generated/base/<op>.h` headers unconditionally whenever the directory existed. When a CI container inherits a `generated/` tree via rsync but configures with `WITH_TORCH=OFF` (so the codegen never re-runs and the matching torch sources never compile), the generated bindings reference base headers that are not on the include path of any compiled target — `ops.cc` then fails with "fatal error: base/<op>.h: No such file or directory". Skip the `generated/base/` scan unless `--with-torch` is in effect, mirroring the existing gate on `generated/torch/`.
ATen names the first tensor parameter `self` to mirror the method-style invocation `tensor.abs()`. InfiniOps' hand-written bases (`Add`, `Gemm`, …) use `input` for the primary tensor input, matching `CONTRIBUTING.md` §C++'s preference for PyTorch user-facing naming conventions over PyTorch internal C++ names. Rename `self` → `input` at parse time so generated headers stay consistent with hand-written ones.
The generated torch source instantiated all 10 `Operator<Op, kDev, 8>` device specializations unconditionally. Each instantiation pulls in a deep ATen template tree that costs roughly 0.5-1 GB of RSS during compilation; when the build compiles 451 ops in parallel (scikit-build's default ninja `-j$(nproc)`), peak memory exceeds what some CI containers can spare, and `cc1plus` is killed by the OOM killer. Guard each explicit instantiation with `#ifdef WITH_<DEV>`. Each `WITH_<DEV>` macro is set by `target_compile_definitions` (or, for `WITH_METAX` / `WITH_MOORE` / `WITH_CPU`, added to the vendor recompile loop's command line, since those sources are compiled outside the cmake target with the system C++ compiler). A typical NVIDIA-only build now instantiates only `kCpu` + `kNvidia`, cutting template instantiation work to 2 / 10.
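A sketch of how the codegen might emit those guards; the device list and the exact instantiation spelling are assumptions here, only the `#ifdef WITH_<DEV>` pattern comes from the change:

```python
DEVICES = ["CPU", "NVIDIA", "ILUVATAR", "METAX", "CAMBRICON", "MOORE"]  # illustrative subset


def emit_instantiations(op_class: str) -> str:
    lines = []
    for dev in DEVICES:
        lines += [
            f"#ifdef WITH_{dev}",
            # Each explicit instantiation compiles only when its backend
            # macro is defined, so an NVIDIA-only build skips the rest.
            f"template class Operator<{op_class}, Device::Type::k{dev.capitalize()}, 8>;",
            f"#endif  // WITH_{dev}",
        ]
    return "\n".join(lines) + "\n"


print(emit_instantiations("ops::Abs"))
```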
The hand-written bases that get added via review (`src/base/<op>.h`) do not carry an `AUTO-GENERATED` header. Generated and reviewed files end up with the same content otherwise — the marker becomes the only visible difference and produces churn during the `generated/` ↔ `src/base/` migration. Drop the marker so a hand-written base is byte-for-byte the same as the generated one.
Some generated signatures (e.g. `Xlogy::operator()(const Tensor input, const Tensor other, Tensor out)` at 89 columns) overflow the 80-column limit enforced by `.clang-format` and CI's `clang-format-action@v4` running `clang-format` v21. The codegen previously emitted them as single lines, so every base PR ran into the same line-length violation once the workflow re-ran. Pipe each emitted header / source through the local `clang-format` (passing `--assume-filename=<path>` so the include-order rule treats each `.cc`'s own header as the primary include). Adds ~30s to a full regeneration but eliminates the recurring CI failure across 433+ PR branches.
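A sketch of the formatting pipe; `clang-format` accepts source on stdin and uses `--assume-filename` for style decisions that depend on the path:

```python
import subprocess


def _clang_format(text: str, path: str) -> str:
    """Pipe a generated header/source through clang-format.

    --assume-filename makes the include-order rule treat the .cc's own
    header as the primary include, as it would for an on-disk file.
    """
    proc = subprocess.run(
        ["clang-format", f"--assume-filename={path}"],
        input=text, capture_output=True, text=True, check=True,
    )
    return proc.stdout
```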
The previous fix landed on a slightly older `ruff` version that preferred a multi-line `base_path.write_text(\n ...\n)` form; CI runs the latest `ruff format --check` which collapses the line. Reformatted to match upstream.
Each generated `<op>.cc` instantiates `at::<op>_out(...)`, which
expands roughly 0.5-1 GB of ATen template metaprogramming. With 451
ops compiled in parallel at Ninja's default `-j$(nproc)`, peak
memory can exceed 30 GB and the OOM killer drops `cc1plus` on
build hosts that allocate less RAM (observed on metax, moore, and
cambricon CI containers).
Add a Ninja job pool `torch_compile=4` and apply it to:
- the vendor-system-g++ `add_custom_command` recompile loop
(metax / moore), via `JOB_POOL`;
- a new `infiniops_torch_objs` OBJECT library for the regular
cmake build path (cambricon / nvidia / iluvatar), via
`JOB_POOL_COMPILE`.
The rest of the build keeps full parallelism.
The codegen pipes generated headers/sources through `clang-format` to satisfy CI's style check. CI containers (metax, moore, cambricon) do not ship a system `clang-format` binary, so cmake-time codegen fails with `FileNotFoundError: clang-format`. Pin it as a build dep so `pip install` provisions `clang-format` into the build env before scikit-build invokes cmake.
CI containers running with `--no-build-isolation` (metax, moore, cambricon) skip `[build-system].requires` and never install `clang-format` from PyPI; system packages do not provide it either, so cmake-time codegen fails with `FileNotFoundError`. Probe `PATH` for `clang-format` at codegen entry; if missing, `pip install clang-format` into the running interpreter and reuse the installed binary. Adds at most a couple of seconds to a first-time configure on hosts without the binary.
Some CI containers (metax, cambricon) run offline and cannot reach PyPI; `pip install clang-format` fails with name-resolution errors and the codegen aborts before any output is written. Generated files live under `generated/` (gitignored), so they do not need to satisfy the repo-level `clang-format` check — they only need to compile. Fall through to writing unformatted output when no `clang-format` binary is reachable. When a binary is available (local dev, online CI), formatting still happens and the output that gets pushed to `src/base/<op>.h` for hand-written-base PRs stays clang-format-clean.
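A sketch of the combined resolution logic across these three fixes — PATH probe, pip fallback, unformatted fall-through; the function name is illustrative:

```python
import shutil
import subprocess
import sys


def _resolve_clang_format() -> str | None:
    """Best-effort clang-format lookup: PATH, then pip, then give up."""
    binary = shutil.which("clang-format")
    if binary:
        return binary
    try:
        # --no-build-isolation containers skip [build-system].requires,
        # so install into the running interpreter on first use.
        subprocess.run(
            [sys.executable, "-m", "pip", "install", "clang-format"],
            check=True,
        )
    except subprocess.CalledProcessError:
        return None  # offline host: caller writes unformatted output
    return shutil.which("clang-format")
```

When this returns `None`, the codegen writes the generated files as-is; they live under the gitignored `generated/` tree, so only compilability matters there.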
`target_link_libraries(infiniops_torch_objs PUBLIC infiniops)` and
`target_sources(infiniops PRIVATE $<TARGET_OBJECTS:...>)` form a
cycle that cmake rejects on cambricon
("Cyclic dependencies are allowed only among static libraries").
Inherit `infiniops`'s include directories, compile definitions, and
compile options via `$<TARGET_PROPERTY>` generator expressions
instead of linking, so the object library compiles with the same
settings without a back-edge to `infiniops`.
`torch_mlu` is pinned to an older ATen release whose `<op>_out` overloads do not match the codegen's `pytorch v2.4.0` schema. For example, `at::all_out` in `torch_mlu` only accepts `int64_t dim` or `at::Dimname dim`, while the codegen emits `c10::optional<at::IntArrayRef> dim` (the v2.4.0 `all.dims_out` shape). The build dies with no-known-conversion errors on the first such op. Skip auto-detecting PyTorch on Cambricon for now; the WITH_TORCH backend can be opted in explicitly with `-DWITH_TORCH=ON` once the `torch_mlu` fork catches up with the upstream schema.
Two classes of false failures observed in the cross-platform run:
- Multiple ATen overloads sharing one `aten_name` (e.g. `std.dim`
and `std.correction`) all map to a single InfiniOps class but
have different ATen-side semantics for hidden defaults. The
harness builds the same reference call (`torch.<op>(...)`) for
every overload, so the secondary overload's nullopt-default
behaviour disagrees with the reference. Keep only the first
overload of each `aten_name`.
- `binary_cross_entropy` / `binary_cross_entropy_backward` carry
`weight: Tensor?` (hidden) between visible inputs and
`reduction: int` (now visible). The harness passes inputs
positionally, so `reduction` lands on the reference's `weight`
parameter and `F.binary_cross_entropy` crashes inside
`weight.size()`. Skip these ops; the wrapper itself is fine.
`torch.uint16` / `uint32` / `uint64` only exist in PyTorch ≥ 2.3. Vendor forks pinned to older releases (cambricon's `torch_mlu`) fail collection at module import with `AttributeError: module 'torch' has no attribute 'uint16'`. Look up each dtype attribute via `getattr` and drop the missing ones from the supported set.
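A sketch of the guarded lookup; the candidate list here is illustrative:

```python
import torch

_CANDIDATE_DTYPES = [
    "bool", "uint8", "int8", "int16", "int32", "int64",
    "uint16", "uint32", "uint64",  # only exist in torch >= 2.3
    "float16", "bfloat16", "float32", "float64",
]

# getattr with a None default drops dtypes the installed torch (or a
# vendor fork pinned to an older release) does not define, instead of
# raising AttributeError at collection time.
SUPPORTED_DTYPES = [
    dtype
    for name in _CANDIDATE_DTYPES
    if (dtype := getattr(torch, name, None)) is not None
]
```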
`mode` blocks indefinitely inside `at::mode_out` when `self` is a MUSA tensor, which hangs the entire CI run for ~30 min before pytest gives up. Add a vendor-hang skip list and put `mode` in it; remove when the `torch_musa` kernel is fixed.
Summary
- `scripts/generate_torch_ops.py` (~920 lines) — a YAML-driven codegen that consumes PyTorch's `aten/native_functions.yaml` and emits an InfiniOps base class plus a slot-8 PyTorch backend per op listed in `scripts/torch_ops.yaml` (~459 ops, generating 507 overloads across 437 canonical classes).
- Build integration (`src/CMakeLists.txt`) under `WITH_TORCH=ON`: invoke at configure time, glob `generated/torch/*.cc`, add `generated/` to public include paths, install the per-op metadata JSON alongside the bindings.
- Wrapper-generator updates (`scripts/generate_wrappers.py`) to scan `generated/base/` and `generated/torch/`, fix pybind11 overload ordering (specific → permissive), preserve `std::vector<int64_t>` parameters that libclang misreports as `int`, and route `active_implementation_indices` through a graceful unknown-device path.
- New helpers (`detail::ListContains` in `src/operator.h`, `TryDeviceTypeFromString` in `src/pybind11_utils.h`) so generated bindings handle devices an op does not implement without aborting.
- A parametrized `tests/test_torch_ops.py` that reads `generated/torch_ops_metadata.json` and exercises every generated op across three shapes and three dtypes; widens `tests/conftest.py` to handle non-floating outputs and `equal_nan`.
- Moves the `Sigmoid` helper in `src/native/cuda/ops/swiglu/kernel.cuh` into `detail::` so it does not collide with the auto-generated `infini::ops::Sigmoid` operator class.
- Adds `pyyaml` to `[build-system].requires` so CMake can run the codegen during `pip install`.

Codegen design choices driven by review feedback collected across all 513 base PRs against `feat/torch-codegen`:

- Canonical naming: ATen schema suffixes (`_grad_input`, `_outtensor`, `_n_scalar`, `_values`, `_x`, `_l`, `_q`, `_u`, `_output`) no longer leak into InfiniOps class names. Multiple ATen overloads of the same base op share a single class, with overloaded `operator()` methods.
- Exposed semantic parameters: `bool upper`, `bool transpose`, `bool unitriangular` (`triangular_solve`), `int diagonal` (`triu`), `str ord` (`linalg_matrix_norm`), `int n` on the chebyshev/hermite polynomial families, etc. are no longer hidden because they have an ATen default — they are now visible in the generated `operator()` and forwarded to ATen.

Motivation
Replaces 500+ hand-written `src/base/<op>.h` headers with a single declarative pipeline driven from PyTorch's schema. Each commit is single-purpose and individually passes `ruff check`, `ruff format --check`, and `clang-format` (version 21).

The previous iteration (`feat/torch-codegen-legacy`, preserved on the remote) generated suffixed names that reviewers consistently flagged as bad public API (per inline comments on PRs #280, #283-#290, #509, #563-#589). It also hid semantically critical parameters and did not store scalars as members, requiring hand-written corrections in every base PR. This refactor moves those corrections into the codegen itself, so future regenerations produce the reviewer-preferred shape directly. The 77 PRs that were for non-canonical overload names have been closed; the remaining 333 keep + 103 promote PRs have their content regenerated to match the new codegen output.

Closes #
Type of Change
- `feat` — new feature / new operator / new platform
- `fix` — bug fix
- `perf` — performance improvement (no behavioral change)
- `refactor` — code restructuring without behavior change
- `test` — adding or fixing tests only
- `docs` — documentation only
- `build` / `ci` — build system or CI configuration
- `chore` — tooling, formatting, or other non-code changes
- Breaking change (`!` in the Conventional Commits prefix or a `BREAKING CHANGE:` footer)

Platforms Affected
- CPU (`WITH_CPU`)
- NVIDIA (`WITH_NVIDIA`)
- Iluvatar (`WITH_ILUVATAR`)
- Metax (`WITH_METAX`)
- Cambricon (`WITH_CAMBRICON`)
- Moore (`WITH_MOORE`)
- Ascend (`WITH_ASCEND`)
- PyTorch backend (`WITH_TORCH`)

Test Results on Supported Platforms
Per-platform `pytest` results: pending cross-platform CI at PR-open time.
Full `pytest` output (optional): not attached.
Benchmark / Performance Impact
N/A. This PR adds a codegen pipeline, not a runtime hot-path change. Generated PyTorch backends call `at::<op>_out(...)` directly, so per-op performance matches a hand-written ATen-backed op.

Notes for Reviewers
- Targets the `feat/torch-codegen` integration branch. The previous content is preserved at `feat/torch-codegen-legacy` for reference.
- All open base PRs against `feat/torch-codegen` have been processed: 77 redundant overload PRs closed, 333 keep + 103 promote PRs scheduled to be force-pushed with regenerated content matching the new canonical naming and parameter shape.
- The PyTorch backend slot index is `> 0` to avoid a partial-specialization-after-instantiation conflict with `Operator<Op>` at index 0.
- A hand-written `src/base/<op>.h` continues to shadow `generated/base/<op>.h` (existence-based; no signature compatibility check). The four pre-existing hand-written bases that do not match the ATen-derived signature (`add`, `linear`, `matmul`, `mul`) are excluded from `scripts/torch_ops.yaml` and left to their existing hand-written infrastructure.
- Optional ATen types (`Tensor?`, `Scalar?`, `int?`, `float?`) remain hidden for now — exposing them properly requires threading `std::optional` through to ATen, which is a separable refactor.

Checklist
Title, Branch, and Commits
- Branch name follows `<type>/xxx-yyyy-zzzz` — `feat/torch-codegen`.
- Based on `master` — branch is rebased cleanly on top of current `master`.
- No `fixup!` / `squash!` / `wip` commits remain.

Scope and Design
- No unrelated changes; out-of-scope tooling (`merge_base_branches.py` from the legacy branch) was dropped.
- No `TODO` without an owner.
- The new `infini::ops::<Pascal>` classes are documented via the codegen's docstring.

General Code Hygiene (applies to all languages)
C++ Specific (if C++ files changed)
- `clang-format` (version 21) is clean for every modified `.h`, `.cc`, `.cuh` file. Verified via `git rebase master --exec 'clang-format --dry-run --Werror $(git diff HEAD~1 --name-only -- "*.h" "*.cc" "*.cuh")'`.
- `clang-tidy` concerns reviewed — to be verified during cross-platform CI.
- Outputs are written through ATen's out-variants (the `_out` form).
- No `assert`. Generated code uses ATen which itself uses `TORCH_CHECK`; that is consistent with the existing torch backend pattern.
- Each op declares a base class in `src/base/<op>.h` (auto-generated under `generated/base/`) inheriting `Operator<Op>`; the PyTorch backend specializes at slot 8.
- No raw `new` / `delete`.

Python Specific (if Python files changed)
- `ruff check` is clean for the entire repo.
- `ruff format --check` is clean for the entire repo. Verified per-commit via `git rebase master --exec 'ruff format --check . && ruff check .'`.
- `pytest.skip` messages are lowercase without terminal period (framework convention).
- Blank line before `return` when not directly following a control-flow statement.
- Type annotations on the public classes (`Param`, `Op`) and on every public function.

Testing
- `pytest` was run locally on every supported platform — pending cross-platform CI completion. NVIDIA in progress at PR-open time.
- New behaviour is covered by `tests/test_torch_ops.py`.
- Uses `pytest.mark.parametrize` correctly: dependent parameters share one decorator (`("dtype", "rtol", "atol")`); independent parameters use separate decorators.
- The built-in `dtype` / `device` parameterization is relied on; `op_meta` and `shape` are added with explicit `parametrize`.

Build, CI, and Tooling
- Builds cleanly via `pip install .[dev]` — pending cross-platform CI.
- `compile_commands.json` regenerates (no change to `pyproject.toml`'s `CMAKE_EXPORT_COMPILE_COMMANDS=ON`).
- CI workflows (`clang-format.yml`, `ruff.yml`) are expected to be green — verified locally per-commit.
- `pyyaml` added to `pyproject.toml`'s `[build-system].requires`.

Documentation
- `README.md`, `CONTRIBUTING.md`, and developer workflow are unchanged for end users; the codegen docstring documents internal behaviour for maintainers.
- The slot-8 reservation is explained in the `_PYTORCH_SLOT` comment.

Security and Safety
- No new third-party code is vendored (the codegen fetches `aten/native_functions.yaml` from PyTorch's GitHub but does not vendor it).