
feat: migrate cuda.tile_experimental.autotune_launch → cuda.tile.tune.exhaustive_search & other updates#114

Merged
hannahli-nv merged 6 commits into main from tilegym_update
Apr 25, 2026


@hannahli-nv
Collaborator

@hannahli-nv hannahli-nv commented Apr 23, 2026

Description

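Code updates, including the migration from cuda.tile_experimental.autotune_launch to cuda.tile.tune.exhaustive_search.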

This PR contains 5 new commit(s).

Commits included:

e899fc7 Fix attention calling error
968f54c Fix CUPTI flag
333ff32 Use cutile new autotuner for remaining kernels
12002f9 feat(flashinfer): Add flashinfer kernel, support flashinfer
a808f48 feat: migrate cuda.tile_experimental.autotune_launch → cuda.tile.tune.exhaustive_search

CI Configuration

config:
  build: true
  # valid options are "ops" and "benchmark"
  test: []

Checklist

  • Code formatted and imports sorted via repo specifications (./format.sh)
  • Documentation updated (if needed)
  • CI configuration reviewed

@copy-pr-bot

copy-pr-bot Bot commented Apr 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@hannahli-nv
Collaborator Author

/ok to test e899fc7

@hannahli-nv hannahli-nv requested a review from xjmxyt April 23, 2026 09:59
@hannahli-nv
Collaborator Author

/ok to test 4fdd159

@hannahli-nv
Collaborator Author

/ok to test 86d96a3

@hannahli-nv
Collaborator Author

/ok to test e971116

@hannahli-nv
Collaborator Author

/ok to test 4765bdc

@hannahli-nv
Collaborator Author

/ok to test 2e9e9ad

xjmxyt and others added 6 commits April 25, 2026 09:34
… budgets

The migration to cuda.tile.tune.exhaustive_search exhaustively searches
the entire config space and has no built-in per-config compile timeout,
so slow-to-compile configs on sm120 can stall CI. Scope the compile
timeout to autotune only, and raise CI step/job budgets to absorb the
longer adaptive-repeat measurement loop in the new tune API.

- Wrap every cuda.tile.tune.exhaustive_search call site (13 across 10
  op files) with `with ct.compiler_timeout(5):` so individual slow
  configs are killed and routed to result.failures while non-autotune
  ct.launch compiles remain unaffected.
- Bump the test-benchmark job timeout 40 -> 70 min and the
  "Pull and run benchmarks" step timeout 35 -> 60 min.
- Bump the per-benchmark subprocess timeout in run_all_json.py from
  10 min -> 20 min.
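
For illustration, a per-benchmark subprocess timeout like the one in the last bullet can be sketched with the standard library. The contents of run_all_json.py are not shown in this PR, so `run_benchmark`, `BENCHMARK_TIMEOUT_S`, and the returned dictionary shape are assumptions made for this example; only the 20-minute figure comes from the commit message.

```python
import subprocess
import sys

BENCHMARK_TIMEOUT_S = 20 * 60  # 20 min per benchmark, per the commit message


def run_benchmark(cmd, timeout_s=BENCHMARK_TIMEOUT_S):
    """Run one benchmark command; report a timeout instead of hanging the run."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return {"cmd": cmd, "status": "timeout", "stdout": ""}
    status = "ok" if proc.returncode == 0 else "failed"
    return {"cmd": cmd, "status": status, "stdout": proc.stdout}


if __name__ == "__main__":
    # Trivial command used here in place of a real benchmark script.
    print(run_benchmark([sys.executable, "-c", "print('done')"]))
```

Returning a status record rather than raising lets the driver continue with the remaining benchmarks after one exceeds its budget.
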
@hannahli-nv
Collaborator Author

/ok to test 161ef03

@hannahli-nv hannahli-nv merged commit 6311a1e into main Apr 25, 2026
13 checks passed
@hannahli-nv hannahli-nv deleted the tilegym_update branch April 25, 2026 01:41


4 participants