feat(kernelgen): import NKIPyKernelGen as a subfolder#55

Closed
shaojiex-aws wants to merge 1 commit into aws-neuron:feat/kernelgen from shaojiex-aws:feat/kernelgen

@shaojiex-aws

Import the open_source branch of NKIPyKernelGen into kernelgen/ as a self-contained subpackage. NKIPyKernelGen is a compiler that traces NumPy functions and lowers them to NISA (Neuron Instruction Set Architecture) for AWS Neuron hardware. Users write kernels in Python with @trace and knob.knob() annotations; the compiler handles tiling, memory placement, layout legalization, and NISA lowering.
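
To make "tracing" concrete, here is a minimal sketch of the general technique in plain Python. It is not NKIPyKernelGen's implementation or API: `ToyGraph`, `ToyTraced`, and `toy_trace` are hypothetical names invented for this illustration, the recorded ops are plain tuples rather than MLIR, and no NumPy is involved. The idea is only that the function runs once on symbolic stand-ins, and each operator call records an SSA-like op instead of computing a value.

```python
import inspect
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyGraph:
    """Hypothetical op recorder: each entry is (result, op, operands)."""
    ops: List[Tuple[str, str, Tuple[str, ...]]] = field(default_factory=list)
    counter: int = 0

    def emit(self, op: str, *operands: str) -> str:
        # Allocate a fresh SSA-style name and record the op.
        name = f"%{self.counter}"
        self.counter += 1
        self.ops.append((name, op, tuple(operands)))
        return name


class ToyTraced:
    """Hypothetical traced value: carries a name, never real data."""

    def __init__(self, graph: ToyGraph, name: str):
        self.graph = graph
        self.name = name

    def _binary(self, op: str, other):
        if not isinstance(other, ToyTraced):
            return NotImplemented
        return ToyTraced(self.graph, self.graph.emit(op, self.name, other.name))

    def __add__(self, other):
        return self._binary("add", other)

    def __mul__(self, other):
        return self._binary("mul", other)


def toy_trace(fn):
    """Hypothetical decorator: calling the result returns the recorded graph."""
    def build() -> ToyGraph:
        graph = ToyGraph()
        params = inspect.signature(fn).parameters
        args = [ToyTraced(graph, f"%arg{i}") for i, _ in enumerate(params)]
        result = fn(*args)
        graph.emit("return", result.name)
        return graph
    return build


@toy_trace
def fused_mul_add(a, b, c):
    return a * b + c


graph = fused_mul_add()
for name, op, operands in graph.ops:
    print(name, "=", op, ", ".join(operands))
```

Running the sketch records three ops (`mul`, `add`, `return`) in program order. A real frontend would then hand such a graph to later passes; the knob annotations described above would attach extra attributes to traced values, which this toy does not model.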

What's included

  • kernelgen/nkipy_kernelgen/ — Python tracing frontend:
    • trace.py (@trace decorator)
    • knob.py (tensor annotations: mem_space, tile_size, reduction_tile, partition_dim)
    • traced_array.py (TracedArray wrapping MLIR SSA values)
    • op_vtable.py (NumPy op → MLIR lowering table)
    • transforms/nkipy_opt.py (pipeline orchestration, shells out to nkipy-opt)
  • kernelgen/mlir/ — MLIR dialect + C++ passes:
    • nkipy.annotate op (target, mem_space, partition_dim, tile_size,
      reduction_tile)
    • 20+ transformation passes under mlir/lib/Transforms/ implementing
      the 24-pass compilation pipeline (InferLayout, KnobDrivenTiling,
      AnnotateMemorySpace, LegalizeLayout, InsertSpillReload,
      LinalgToNisa, etc.)
  • kernelgen/tests/ — test suite:
    • passes/ — per-pass FileCheck tests
    • e2e/ — end-to-end tests (trace → NISA → BIR sim / HW)
    • unit/ — Python-level unit tests
    • harness.py — unified test harness with LLVM/BIR_SIM/HW/FileCheck
      modes
  • kernelgen/examples/ — example kernels
  • kernelgen/compiler_explorer/ — Compiler Explorer wrapper for inspecting
    IR at any pipeline stage
  • kernelgen/setup.py, pyproject.toml, pytest.ini, requirements.txt
    — build + test configuration (pip install -e kernelgen/ builds the
    C++ passes via CMake)
  • kernelgen/CLAUDE.md, README.md — pipeline docs and usage notes

Architecture notes

NKIPyKernelGen depends on the NISA dialect defined in private-nki-staging (the nki wheel). NKIPyKernelGen's nkipy-opt binary performs the tensor-level and bufferization phases; lowering to BIR then runs through the upstream nki-opt-pipeline. This import does not bring in the NISA dialect sources — only NKIPyKernelGen's own passes and frontend.

Ignore rules

Added a !mlir/lib/ override in kernelgen/.gitignore so the parent nkipy repo's lib/ rule (intended for Python venv lib/ dirs) does not silently exclude the MLIR C++ pass sources under kernelgen/mlir/lib/.
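
The interaction between the two rules can be illustrated with a small Python model. This is a deliberately simplified sketch, not git's matcher: it handles only literal directory patterns, decides only whether the named directory itself is ignored, and assumes the documented precedence in which a nested `.gitignore` overrides a higher-level one and the last matching pattern wins. The helper names `matches_dir` and `is_ignored_dir` are made up for this example; on a real checkout, `git check-ignore -v <path>` is the authoritative check.

```python
from pathlib import PurePosixPath

# (directory holding the .gitignore, pattern), lowest precedence first.
RULES = [
    ("", "lib/"),                 # parent repo rule described in this PR
    ("kernelgen", "!mlir/lib/"),  # override added in kernelgen/.gitignore
]


def matches_dir(base: str, pattern: str, directory: str) -> bool:
    """Simplified match for literal directory patterns such as 'lib/'."""
    body = pattern.rstrip("/")
    path = PurePosixPath(directory)
    if base:
        try:
            rel = path.relative_to(base)
        except ValueError:
            return False  # directory is outside this .gitignore's scope
    else:
        rel = path
    if "/" in body:
        # A separator in the middle anchors the pattern to the .gitignore dir.
        return rel == PurePosixPath(body)
    # No separator: the pattern may match at any depth below the .gitignore.
    return rel.name == body


def is_ignored_dir(directory: str, rules=RULES) -> bool:
    """Apply rules in order; the last matching rule decides."""
    ignored = False
    for base, raw in rules:
        negated = raw.startswith("!")
        pattern = raw[1:] if negated else raw
        if matches_dir(base, pattern, directory):
            ignored = not negated
    return ignored


print(is_ignored_dir("kernelgen/mlir/lib"))             # with the override
print(is_ignored_dir("kernelgen/mlir/lib", RULES[:1]))  # parent rule only
print(is_ignored_dir(".venv/lib"))                      # venv dir stays ignored
```

In this model the override re-includes only `kernelgen/mlir/lib`, while other `lib/` directories remain excluded by the parent rule.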

Source

Imported from NKIPyKernelGen open_source branch @ commit 973c1be ("fix: correct mem_space enum values in builder.annotate()"). Internal git history is not preserved — this is a single squash import for the open-source release.

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Contributor

@vgene vgene left a comment

The PR is quite big, with some duplicated components (like tests) and some components that should be in a separate PR (like the Compiler Explorer).

Let's also review the necessity of all code under /mlir and /tests. It's best to purge any code that may be unnecessary at this merge time. We can always add them back in later PRs.

Comment thread kernelgen/pyproject.toml
@@ -0,0 +1,3 @@
[build-system]
Contributor

Let's make sure this is properly integrated with uv.

The top-level pyproject.toml should support kernelgen in a monorepo structure.

Comment thread kernelgen/tests/README.md
| Marker | Meaning |
|----------|---------|
| `llvm_sim` | LLVM JIT simulation test (CPU) |
| `bir_sim` | BIR simulation test (CPU) |
Contributor

I'm not convinced we need to include the BIR sim tests. The device is the golden reference, as in the nkipy main backend.

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
"""
Ported from nkipy/tests/kernels/attention_dynamo.py
Contributor

We should deduplicate all nkipy tests. The top-level /tests folder can support the kernelgen path.

Comment thread kernelgen/README.md
@@ -0,0 +1,99 @@

# NKIPy KernelGen
Contributor

Let's discuss the name "KernelGen".

I don't think it's the best name. Would "NKI backend" be a better one?

@@ -0,0 +1,23 @@
---
Contributor

Let's use uv to manage the build and avoid these scripts/skill.

@@ -0,0 +1,17 @@
# Compiler Explorer local configuration for NKIPy MLIR compiler
Contributor

The Compiler Explorer feels orthogonal to the main component. Let's delay merging this part.
1/ This should be a top-level folder; we can also support Compiler Explorer for the HLO backend.
2/ The examples etc. should not be part of the Compiler Explorer.

namespace nkipy {

/// Maximum partition dimension size for NeuronCore hardware.
static constexpr int64_t MAX_PARTITION_DIM = 128;
Contributor

These numbers can change
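
One possible shape for a fix is a per-target lookup instead of a single global constant. The sketch below is in Python for brevity, although the reviewed constant is C++; it is an assumption about the general approach, not the code from this PR or its follow-up. `ToyHardwareConstants`, `constants_for`, `check_partition_dim`, and the target name `"example-target"` are all hypothetical, and 128 is the only value taken from the snippet above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyHardwareConstants:
    """Hypothetical per-target limits; only one field is modeled here."""
    max_partition_dim: int


# Hypothetical registry. "example-target" is a placeholder name, and 128 is
# the value quoted in the reviewed snippet; no other targets are assumed.
_CONSTANTS = {
    "example-target": ToyHardwareConstants(max_partition_dim=128),
}


def constants_for(target: str) -> ToyHardwareConstants:
    """Look up limits by target, failing loudly for unknown targets."""
    try:
        return _CONSTANTS[target]
    except KeyError:
        raise ValueError(
            f"no hardware constants registered for target {target!r}"
        ) from None


def check_partition_dim(size: int, target: str) -> bool:
    """Return True if `size` fits within the target's partition limit."""
    return 0 < size <= constants_for(target).max_partition_dim


print(check_partition_dim(128, "example-target"))
print(check_partition_dim(129, "example-target"))
```

Failing on an unknown target, rather than falling back to a default, keeps a stale limit from being applied silently when new hardware is added.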

@shaojiex-aws
Author

Superseded by #58, which incorporates the review feedback (renamed to nkigen, dropped compiler_explorer / internal scratch / BIR-sim tests, target-aware HardwareConstants, uv workspace integration, deduplicated tests). Closing in favor of the cleaner re-import.


3 participants