Merged
4 changes: 2 additions & 2 deletions .github/workflows/tilegym-ci.yml
@@ -347,7 +347,7 @@ jobs:
   test-benchmark:
     name: test-benchmark
     needs: [config, build]
-    timeout-minutes: 40
+    timeout-minutes: 70
     if: |
       always() &&
       needs.config.outputs.run_benchmark == 'true' &&
@@ -409,7 +409,7 @@ jobs:
           password: ${{ secrets.GITHUB_TOKEN }}

       - name: Pull and run benchmarks
-        timeout-minutes: 35
+        timeout-minutes: 60
         run: |
           OWNER_LOWER=$(echo '${{ github.repository_owner }}' | tr '[:upper:]' '[:lower:]')
           IMAGE="ghcr.io/${OWNER_LOWER}/${{ needs.config.outputs.image_name }}:${{ needs.config.outputs.image_tag }}"
14 changes: 2 additions & 12 deletions README.md
@@ -51,7 +51,7 @@ We have verified that `torch==2.9.1` works. You can also get `triton` packages w

 #### 2. Install TileGym

-TileGym uses [`cuda-tile`](https://github.com/nvidia/cutile-python) for GPU kernel programming, which depends on the `tileiras` compiler at runtime.
+TileGym uses [`cuda-tile`](https://github.com/nvidia/cutile-python) (≥ 1.3.0) for GPU kernel programming, which depends on the `tileiras` compiler at runtime.

 ##### Install from PyPI (recommended)

@@ -77,17 +77,7 @@ pip install .[tileiras] # or: pip install . (if you have system tileiras)

 For editable (development) mode, use `pip install -e .` or `pip install -e .[tileiras]`.

-##### Install `cuda-tile-experimental`
-
-> ⚠️ **Required**: TileGym kernels use features from [`cuda-tile-experimental`](https://github.com/NVIDIA/cutile-python/tree/main/experimental) (e.g., the autotuner). This package is *not* available on PyPI and must be installed separately from source:
->
-> ```bash
-> pip install "cuda-tile-experimental @ git+https://github.com/NVIDIA/cutile-python.git#subdirectory=experimental"
-> ```
->
-> `cuda-tile-experimental` is maintained by the CUDA Tile team as a source-only experimental package. See more details in [experimental-features-optional](https://github.com/NVIDIA/cutile-python?tab=readme-ov-file#experimental-features-optional).
-
-All runtime dependencies (except `cuda-tile-experimental`) are declared in [`requirements.txt`](requirements.txt) and are installed automatically by both `pip install tilegym` and `pip install .`.
+All runtime dependencies are declared in [`requirements.txt`](requirements.txt) and are installed automatically by both `pip install tilegym` and `pip install .`.

 We also provide Dockerfile, you can refer to [modeling/transformers/README.md](modeling/transformers/README.md).

14 changes: 2 additions & 12 deletions README_chs.md
@@ -51,7 +51,7 @@ pip install --pre torch --index-url https://download.pytorch.org/whl/cu130

 #### 2. 安装 TileGym

-TileGym 使用 [`cuda-tile`](https://github.com/nvidia/cutile-python) 进行 GPU 内核编程,运行时依赖 `tileiras` 编译器。
+TileGym 使用 [`cuda-tile`](https://github.com/nvidia/cutile-python)(≥ 1.3.0)进行 GPU 内核编程,运行时依赖 `tileiras` 编译器。

 ##### 从 PyPI 安装(推荐)

@@ -77,17 +77,7 @@ pip install .[tileiras] # 或者: pip install . (如果您已有系统级 til

 如需可编辑(开发)模式,请使用 `pip install -e .` 或 `pip install -e .[tileiras]`。

-##### 安装 `cuda-tile-experimental`
-
-> ⚠️ **必需**:TileGym 内核使用了 [`cuda-tile-experimental`](https://github.com/NVIDIA/cutile-python/tree/main/experimental) 中的功能(如自动调优器)。此包*不*在 PyPI 上提供,必须从源码单独安装:
->
-> ```bash
-> pip install "cuda-tile-experimental @ git+https://github.com/NVIDIA/cutile-python.git#subdirectory=experimental"
-> ```
->
-> `cuda-tile-experimental` 由 CUDA Tile 团队维护,仅提供源码安装。更多详情请参阅 [experimental-features-optional](https://github.com/NVIDIA/cutile-python?tab=readme-ov-file#experimental-features-optional)。
-
-所有运行时依赖(`cuda-tile-experimental` 除外)均声明在 [`requirements.txt`](requirements.txt) 中,通过 `pip install tilegym` 和 `pip install .` 都会自动安装。
+所有运行时依赖均声明在 [`requirements.txt`](requirements.txt) 中,通过 `pip install tilegym` 和 `pip install .` 都会自动安装。

 我们还提供了 Dockerfile,您可以参考 [modeling/transformers/README.md](modeling/transformers/README.md)。

14 changes: 2 additions & 12 deletions README_cht.md
@@ -51,7 +51,7 @@ pip install --pre torch --index-url https://download.pytorch.org/whl/cu130

 #### 2. 安裝 TileGym

-TileGym 使用 [`cuda-tile`](https://github.com/nvidia/cutile-python) 進行 GPU 核心程式設計,執行時期依賴 `tileiras` 編譯器。
+TileGym 使用 [`cuda-tile`](https://github.com/nvidia/cutile-python)(≥ 1.3.0)進行 GPU 核心程式設計,執行時期依賴 `tileiras` 編譯器。

 ##### 從 PyPI 安裝(建議)

@@ -77,17 +77,7 @@ pip install .[tileiras] # 或者: pip install . (如果您已有系統級 til

 如需可編輯(開發)模式,請使用 `pip install -e .` 或 `pip install -e .[tileiras]`。

-##### 安裝 `cuda-tile-experimental`
-
-> ⚠️ **必需**:TileGym 核心使用了 [`cuda-tile-experimental`](https://github.com/NVIDIA/cutile-python/tree/main/experimental) 中的功能(如自動調優器)。此套件*不*在 PyPI 上提供,必須從原始碼單獨安裝:
->
-> ```bash
-> pip install "cuda-tile-experimental @ git+https://github.com/NVIDIA/cutile-python.git#subdirectory=experimental"
-> ```
->
-> `cuda-tile-experimental` 由 CUDA Tile 團隊維護,僅提供原始碼安裝。更多詳情請參閱 [experimental-features-optional](https://github.com/NVIDIA/cutile-python?tab=readme-ov-file#experimental-features-optional)。
-
-所有執行時期依賴(`cuda-tile-experimental` 除外)均宣告於 [`requirements.txt`](requirements.txt) 中,透過 `pip install tilegym` 和 `pip install .` 都會自動安裝。
+所有執行時期依賴均宣告於 [`requirements.txt`](requirements.txt) 中,透過 `pip install tilegym` 和 `pip install .` 都會自動安裝。

 我們還提供了 Dockerfile,您可以參考 [modeling/transformers/README.md](modeling/transformers/README.md)。

14 changes: 2 additions & 12 deletions README_fr.md
@@ -51,7 +51,7 @@ Nous avons vérifié que `torch==2.9.1` fonctionne. Vous pouvez également obten

 #### 2. Installer TileGym

-TileGym utilise [`cuda-tile`](https://github.com/nvidia/cutile-python) pour la programmation de noyaux GPU, qui dépend du compilateur `tileiras` à l'exécution.
+TileGym utilise [`cuda-tile`](https://github.com/nvidia/cutile-python) (≥ 1.3.0) pour la programmation de noyaux GPU, qui dépend du compilateur `tileiras` à l'exécution.

 ##### Installer depuis PyPI (recommandé)

@@ -77,17 +77,7 @@ pip install .[tileiras] # ou : pip install . (si vous avez tileiras sur votre

 Pour le mode éditable (développement), utilisez `pip install -e .` ou `pip install -e .[tileiras]`.

-##### Installer `cuda-tile-experimental`
-
-> ⚠️ **Requis** : Les noyaux TileGym utilisent des fonctionnalités de [`cuda-tile-experimental`](https://github.com/NVIDIA/cutile-python/tree/main/experimental) (par ex. l'auto-tuner). Ce paquet n'est *pas* disponible sur PyPI et doit être installé séparément depuis les sources :
->
-> ```bash
-> pip install "cuda-tile-experimental @ git+https://github.com/NVIDIA/cutile-python.git#subdirectory=experimental"
-> ```
->
-> `cuda-tile-experimental` est maintenu par l'équipe CUDA Tile comme un paquet expérimental disponible uniquement depuis les sources. Voir plus de détails dans [experimental-features-optional](https://github.com/NVIDIA/cutile-python?tab=readme-ov-file#experimental-features-optional).
-
-Toutes les dépendances d'exécution (sauf `cuda-tile-experimental`) sont déclarées dans [`requirements.txt`](requirements.txt) et sont installées automatiquement par `pip install tilegym` et `pip install .`.
+Toutes les dépendances d'exécution sont déclarées dans [`requirements.txt`](requirements.txt) et sont installées automatiquement par `pip install tilegym` et `pip install .`.

 Nous fournissons également un Dockerfile, vous pouvez consulter [modeling/transformers/README.md](modeling/transformers/README.md).

14 changes: 2 additions & 12 deletions README_ja.md
@@ -51,7 +51,7 @@ pip install --pre torch --index-url https://download.pytorch.org/whl/cu130

 #### 2. TileGym のインストール

-TileGym は GPU カーネルプログラミングに [`cuda-tile`](https://github.com/nvidia/cutile-python) を使用しており、実行時に `tileiras` コンパイラに依存しています。
+TileGym は GPU カーネルプログラミングに [`cuda-tile`](https://github.com/nvidia/cutile-python)(≥ 1.3.0)を使用しており、実行時に `tileiras` コンパイラに依存しています。

 ##### PyPI からインストール(推奨)

@@ -77,17 +77,7 @@ pip install .[tileiras] # または: pip install . (システムに tileiras

 編集可能(開発)モードの場合は、`pip install -e .` または `pip install -e .[tileiras]` を使用してください。

-##### `cuda-tile-experimental` のインストール
-
-> ⚠️ **必須**:TileGym カーネルは [`cuda-tile-experimental`](https://github.com/NVIDIA/cutile-python/tree/main/experimental) の機能(例:オートチューナー)を使用しています。このパッケージは PyPI では提供されて*おらず*、ソースから個別にインストールする必要があります:
->
-> ```bash
-> pip install "cuda-tile-experimental @ git+https://github.com/NVIDIA/cutile-python.git#subdirectory=experimental"
-> ```
->
-> `cuda-tile-experimental` は CUDA Tile チームによってソースのみの実験的パッケージとして管理されています。詳細は [experimental-features-optional](https://github.com/NVIDIA/cutile-python?tab=readme-ov-file#experimental-features-optional) をご覧ください。
-
-すべてのランタイム依存関係(`cuda-tile-experimental` を除く)は [`requirements.txt`](requirements.txt) に宣言されており、`pip install tilegym` と `pip install .` の両方で自動的にインストールされます。
+すべてのランタイム依存関係は [`requirements.txt`](requirements.txt) に宣言されており、`pip install tilegym` と `pip install .` の両方で自動的にインストールされます。

 Dockerfile も提供しています。[modeling/transformers/README.md](modeling/transformers/README.md) を参照してください。

4 changes: 1 addition & 3 deletions requirements.txt
@@ -10,9 +10,7 @@ huggingface_hub
 matplotlib
 pandas
 numpy
-cuda-tile # Or use: pip install cuda-tile[tileiras] for bundled tileiras compiler
-# cuda-tile-experimental is NOT on PyPI and must be installed separately from source:
-# pip install "cuda-tile-experimental @ git+https://github.com/NVIDIA/cutile-python.git#subdirectory=experimental"
+cuda-tile>=1.3.0 # Or use: pip install cuda-tile[tileiras] for bundled tileiras compiler
 filelock>=3.20.3 # CVE fix: GHSA-w853-jp5j-5j7f, GHSA-qmgc-5h2g-mvrw
 pillow>=12.1.1 # CVE fix: GHSA-cfh3-3jmp-rvhc
 # nvidia-ml-py # optional
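The new `cuda-tile>=1.3.0` pin is enforced by pip at install time, but an environment that already has an older `cuda-tile` can still reach runtime. The following is a minimal sketch of a runtime check, not part of this PR: `parse_release`, `meets_minimum`, and `installed_cuda_tile_ok` are hypothetical helper names, and the comparison deliberately ignores pre-release suffixes (so it is looser than full PEP 440 ordering).

```python
import re
from importlib import metadata

MIN_CUDA_TILE = "1.3.0"  # minimum taken from the requirements.txt change


def parse_release(version: str) -> tuple:
    """Return the leading numeric release segment, e.g. '1.10.0rc1' -> (1, 10, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = MIN_CUDA_TILE) -> bool:
    """Compare release tuples numerically so that 1.10.0 sorts after 1.3.0."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.3' and '1.3.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_cuda_tile_ok() -> bool:
    """Hypothetical helper: True if the installed cuda-tile distribution is new enough."""
    try:
        return meets_minimum(metadata.version("cuda-tile"))
    except metadata.PackageNotFoundError:
        return False
```

Comparing integer tuples rather than raw strings matters here because a plain string comparison would rank `"1.10.0"` below `"1.3.0"`.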
14 changes: 6 additions & 8 deletions src/tilegym/__init__.py
@@ -18,17 +18,15 @@ def _check_torch_dependencies():


 def _check_ct_experimental_dependency():
-    """Verify that cuda-tile-experimental is installed with helpful error message."""
+    """Verify that cuda-tile with tune support is installed with helpful error message."""
     try:
-        import cuda.tile_experimental # noqa: F401
+        import cuda.tile.tune # noqa: F401
     except (ImportError, ModuleNotFoundError):
         raise ImportError(
-            "\n\n[TileGym] cuda-tile-experimental is required but not installed.\n"
-            "It is not available on PyPI and must be installed from source:\n\n"
-            ' pip install "cuda-tile-experimental @ '
-            'git+https://github.com/NVIDIA/cutile-python.git#subdirectory=experimental"\n\n'
-            "See: https://github.com/NVIDIA/cutile-python?tab=readme-ov-file"
-            "#experimental-features-optional\n"
+            "\n\n[TileGym] cuda.tile.tune is required but not available.\n"
+            "Please install or upgrade cuda-tile:\n\n"
+            " pip install cuda-tile\n\n"
+            "See: https://github.com/NVIDIA/cutile-python"
         ) from None


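The dependency check in this hunk follows a common pattern: attempt the import, then re-raise with an actionable message and `from None` to suppress the original traceback. A generic sketch of that pattern is shown here; `require_module` and its `install_hint` parameter are hypothetical names for illustration and are not part of TileGym.

```python
import importlib


def require_module(module_name: str, install_hint: str) -> None:
    """Import module_name, or raise ImportError carrying an actionable hint."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so catching
        # ImportError alone covers both cases.
        raise ImportError(
            f"{module_name} is required but not available.\n"
            f"Try: {install_hint}"
        ) from None
```

Under these assumptions, the check in the diff would correspond to something like `require_module("cuda.tile.tune", 'pip install -U "cuda-tile>=1.3.0"')`. Passing `-U` with the version bound is a suggestion rather than what the PR prints: a bare `pip install cuda-tile` does not upgrade a copy that is already installed but too old.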
36 changes: 36 additions & 0 deletions src/tilegym/kernel_utils.py
@@ -0,0 +1,36 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# SPDX-License-Identifier: MIT
+
+"""Kernel utility functions for TileGym."""
+
+from typing import Any
+from typing import Dict
+from typing import Optional
+
+from tilegym.logger import get_logger
+
+
+def get_kernel_configs(default_configs: Dict[str, Any], provided_configs: Optional[Dict[str, Any]] = None):
+    """
+    Merge default kernel configs with provided configs.
+
+    Args:
+        default_configs: Default kernel configuration dictionary.
+        provided_configs: Optional user-provided configuration dictionary.
+
+    Returns:
+        Merged configuration dictionary with provided configs overriding defaults.
+    """
+    logger = get_logger(__name__)
+
+    if provided_configs is None:
+        return default_configs
+    # log any differences between default_configs and provided_configs
+    for key, value in default_configs.items():
+        if key not in provided_configs:
+            logger.warning(f"Provided kernel config {key} is not in default: {value}")
+            continue
+        if provided_configs[key] != value:
+            logger.info(f"Provided kernel config {key} differs from default: {value} -> {provided_configs[key]}")
+    return {**default_configs, **provided_configs}
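To make the merge semantics of `get_kernel_configs` concrete, here is a standalone sketch that mirrors only its return logic. It leaves out logging so that it runs without `tilegym` installed. `merge_configs` and the config keys (`TILE_M`, `TILE_N`, `num_stages`) are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, Optional


def merge_configs(
    defaults: Dict[str, Any], provided: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Stand-in for the merge step: provided values win, defaults fill the gaps."""
    if provided is None:
        # Mirrors the diff: the defaults dict itself is returned, not a copy.
        return defaults
    return {**defaults, **provided}


defaults = {"TILE_M": 128, "TILE_N": 64}  # hypothetical keys
merged = merge_configs(defaults, {"TILE_N": 32, "num_stages": 3})
# merged keeps TILE_M from the defaults, takes TILE_N from the caller,
# and passes through num_stages even though it has no default.
```

Two behaviors follow from the code in the diff. When `provided_configs` is `None`, the caller receives the same dict object as `default_configs`, so mutating the result also mutates the defaults. The logging loop iterates over the default keys only, so the warning fires for a default key that the caller did not supply, and keys that appear only in `provided_configs` are merged without any log message.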