@jamOne- (Collaborator) commented Nov 14, 2025

Description

Changes:

  • Changed the slice-level label from google.com/gke-tpu-slice-{topology}-id to cloud.google.com/gke-tpu-slice-{topology}-id.
  • Removed the 2x2 sub-slicing topology and set sub_slicing_available=False for v6e-2x2. Justification: sub-host slices are not available, and 2x2 is a sub-host slice because each host has 2x4 chips.
  • Kueue TAS requires nodes to carry all of the Topology labels in order to schedule workloads, so the Topology CR is now generated dynamically during cluster create, based on the device type.
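
For reviewers, the third change can be pictured roughly as follows. This is only a minimal sketch under stated assumptions, not the code in this PR: the helper names (chip_count, sub_slice_labels, build_topology_cr), the candidate topology list, the rule that the device's own topology counts as a level, the trailing kubernetes.io/hostname level, and the kueue.x-k8s.io/v1alpha1 API version are all illustrative guesses. Only the label format and the "2x4 chips per host" fact come from the description above.

```python
import json

# Label format taken from the PR description.
SLICE_LABEL_TEMPLATE = "cloud.google.com/gke-tpu-slice-{topology}-id"

# Each host has 2x4 chips (per the description), so anything smaller is sub-host.
CHIPS_PER_HOST = 8

# Hypothetical candidate sub-slice topologies, ordered largest to smallest.
CANDIDATE_TOPOLOGIES = ["8x8", "4x8", "4x4", "2x4", "2x2"]


def chip_count(topology: str) -> int:
    """Return the number of chips in a topology string such as '4x8'."""
    count = 1
    for dim in topology.split("x"):
        count *= int(dim)
    return count


def sub_slice_labels(device_topology: str) -> list[str]:
    """Labels for sub-slice levels that fit in the device and span a full host.

    Assumption: the device's own topology is included as the top level.
    """
    limit = chip_count(device_topology)
    return [
        SLICE_LABEL_TEMPLATE.format(topology=t)
        for t in CANDIDATE_TOPOLOGIES
        if CHIPS_PER_HOST <= chip_count(t) <= limit
    ]


def build_topology_cr(name: str, device_topology: str) -> dict:
    """Build a Kueue Topology object listing only labels the nodes can carry."""
    levels = [{"nodeLabel": label} for label in sub_slice_labels(device_topology)]
    # Assumption: the host is the lowest topology level.
    levels.append({"nodeLabel": "kubernetes.io/hostname"})
    return {
        "apiVersion": "kueue.x-k8s.io/v1alpha1",  # assumed API version
        "kind": "Topology",
        "metadata": {"name": name},
        "spec": {"levels": levels},
    }


if __name__ == "__main__":
    # Example: a 4x4 device gets 4x4 and 2x4 slice levels plus the host level.
    print(json.dumps(build_topology_cr("sub-slice-topology", "4x4"), indent=2))
```

Because the levels are derived from the device topology, a smaller device never ends up with a Topology that references labels its nodes lack, which is the failure mode described in the third bullet.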

Issue

Testing

  • Tested manually.
  • Covered by unit tests.

@jamOne- jamOne- added the release-features features label Nov 14, 2025
@github-actions
🤖 Hi @jamOne-, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

@jamOne- jamOne- marked this pull request as ready for review November 14, 2025 16:37
@jamOne- jamOne- requested a review from FIoannides November 18, 2025 21:04
@jamOne- jamOne- enabled auto-merge (squash) November 19, 2025 08:36
@jamOne- jamOne- merged commit fac1e28 into main Nov 19, 2025
11 checks passed
@jamOne- jamOne- deleted the dominikrabij/sub-slicing-topology-fix branch November 19, 2025 08:43