Merged
24 commits
807c3bc feat: Add credential test with DNS retry logic (DannyLiCom, Aug 12, 2025)
6a2e9f9 Fixed a minor bug (DannyLiCom, Aug 12, 2025)
ceb7c3c Reduce nesting (DannyLiCom, Aug 13, 2025)
31ac562 Fixed Pylint (DannyLiCom, Aug 28, 2025)
215a3b0 Merge branch 'develop' into lidanny/feature/check_dns_access (DannyLiCom, Aug 28, 2025)
7937988 Merge branch 'develop' into lidanny/feature/check_dns_access (DannyLiCom, Sep 1, 2025)
9e1ca1b Fixed the Final newline missing error (DannyLiCom, Sep 1, 2025)
46288ce Merge branch 'lidanny/feature/check_dns_access' of https://github.com… (DannyLiCom, Sep 1, 2025)
634550f Fixed linter (DannyLiCom, Sep 1, 2025)
95c4a03 Merge branch 'develop' into lidanny/feature/check_dns_access (DannyLiCom, Sep 4, 2025)
ff49826 Fix Pytype (DannyLiCom, Sep 4, 2025)
2e7ce9c Merge branch 'lidanny/feature/check_dns_access' of https://github.com… (DannyLiCom, Sep 4, 2025)
199580e Merge branch 'develop' into lidanny/feature/check_dns_access (scaliby, Sep 8, 2025)
45c2c90 Merge branch 'develop' into lidanny/feature/check_dns_access (DannyLiCom, Sep 12, 2025)
782e337 Merge branch 'develop' into lidanny/feature/check_dns_access (scaliby, Sep 19, 2025)
e2de13a Fixed mypy (DannyLiCom, Oct 1, 2025)
be43f0a Merge branch 'develop' into lidanny/feature/check_dns_access (DannyLiCom, Oct 1, 2025)
302db1d Fixed pylint (DannyLiCom, Oct 1, 2025)
0a00530 pyink (DannyLiCom, Oct 1, 2025)
38158b4 Change global_args to is_dry_run() (DannyLiCom, Oct 1, 2025)
3e95b0b Deleted run_command_and_capture_output() (DannyLiCom, Oct 2, 2025)
adf19bc Run golden_buddy.sh (DannyLiCom, Oct 2, 2025)
c59d437 Fixed ruamel (DannyLiCom, Oct 2, 2025)
fe71f6e Remove xpk_print(kubectl_output) (DannyLiCom, Oct 2, 2025)
13 changes: 9 additions & 4 deletions goldens/Basic_cluster_create.txt
@@ -14,9 +14,14 @@ gcloud beta container clusters create golden-cluster --project=golden-project --
gcloud container clusters describe golden-cluster --project=golden-project --region=us-central1 --format="value(privateClusterConfig.enablePrivateNodes)"
[XPK] Private Nodes is not enabled on the cluster.
[XPK] Cluster is public and no need to authorize networks.
-[XPK] Try 1: get-credentials to cluster golden-cluster
-[XPK] Task: `get-credentials to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
-gcloud container clusters get-credentials golden-cluster --region=us-central1 --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Try 1: get-credentials-dns-endpoint to cluster golden-cluster
+[XPK] Task: `get-credentials-dns-endpoint to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
+gcloud container clusters get-credentials golden-cluster --region=us-central1 --dns-endpoint --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Testing credentials with kubectl...
+[XPK] Task: `kubectl get pods` is implemented by the following command not running since it is a dry run.
+kubectl get pods
+[XPK] Credentials test succeeded.
+[XPK] Finished get-credentials and kubectl setup.
[XPK] Task: 'Checking CoreDNS deployment existence' in progress for namespace: kube-system
[XPK] Task: `Check CoreDNS deployment in kube-system` is implemented by the following command not running since it is a dry run.
kubectl get deployment coredns -n kube-system
@@ -70,7 +75,7 @@ kubectl kueue version
kubectl apply --server-side --force-conflicts -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.12.2/manifests.yaml
[XPK] Wait for Kueue to be fully available
[XPK] Task: `Wait for Kueue to be available` is implemented by the following command not running since it is a dry run.
-kubectl wait deploy/kueue-controller-manager -nkueue-system --for=condition=available --timeout=10m
+kubectl wait deploy/kueue-controller-manager -n kueue-system --for=condition=available --timeout=10m
[XPK] Install Kueue Custom Resources
[XPK] Try 1: Applying Kueue Custom Resources
[XPK] Task: `Applying Kueue Custom Resources` is implemented by the following command not running since it is a dry run.
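The goldens above only capture the dry-run path: the DNS-endpoint get-credentials call runs first, then `kubectl get pods` checks the credentials and the check is reported as succeeding. The following is a minimal Python sketch of that flow, not xpk's actual implementation. It assumes, based only on the commit title "Add credential test with DNS retry logic", that a failed check falls back to the plain get-credentials command; `get_credentials_command`, `setup_credentials`, the injected `run`/`log` callables and the failure message are all hypothetical.

```python
from typing import Callable

# Hypothetical runner: executes a shell command string and returns its exit code.
Runner = Callable[[str], int]
Logger = Callable[[str], None]


def get_credentials_command(
    cluster: str, region: str, project: str, dns_endpoint: bool
) -> str:
    """Builds the get-credentials command line in the form shown in the goldens."""
    dns_flag = " --dns-endpoint" if dns_endpoint else ""
    return (
        f"gcloud container clusters get-credentials {cluster}"
        f" --region={region}{dns_flag} --project={project}"
        " && kubectl config view"
        " && kubectl config set-context --current --namespace=default"
    )


def setup_credentials(
    cluster: str, region: str, project: str, run: Runner, log: Logger = print
) -> bool:
    """Fetches credentials, then verifies them with `kubectl get pods`.

    The DNS endpoint is tried first, as in the goldens; falling back to the
    plain endpoint when a step fails is an assumption of this sketch.
    Per-command retries ("Try 2", ...) are not modeled.
    """
    for dns_endpoint in (True, False):
        label = "get-credentials-dns-endpoint" if dns_endpoint else "get-credentials"
        log(f"[XPK] Try 1: {label} to cluster {cluster}")
        if run(get_credentials_command(cluster, region, project, dns_endpoint)) != 0:
            continue  # could not fetch credentials this way; try the next form
        log("[XPK] Testing credentials with kubectl...")
        if run("kubectl get pods") == 0:
            log("[XPK] Credentials test succeeded.")
            log("[XPK] Finished get-credentials and kubectl setup.")
            return True
        log("[XPK] Credentials test failed.")  # hypothetical message, not in the goldens
    return False
```

Injecting `run` and `log` keeps the sketch testable without a cluster; a real runner could return `subprocess.run(cmd, shell=True).returncode`, since the get-credentials command chains several steps with `&&`.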
11 changes: 8 additions & 3 deletions goldens/Batch.txt
@@ -1,9 +1,14 @@
$ python3 xpk.py batch --project=golden-project --zone=us-central1-a --cluster=golden-cluster --dry-run batch-read.sh
[XPK] Starting xpk
[XPK] Working on golden-project and us-central1-a
-[XPK] Try 1: get-credentials to cluster golden-cluster
-[XPK] Task: `get-credentials to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
-gcloud container clusters get-credentials golden-cluster --region=us-central1 --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Try 1: get-credentials-dns-endpoint to cluster golden-cluster
+[XPK] Task: `get-credentials-dns-endpoint to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
+gcloud container clusters get-credentials golden-cluster --region=us-central1 --dns-endpoint --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Testing credentials with kubectl...
+[XPK] Task: `kubectl get pods` is implemented by the following command not running since it is a dry run.
+kubectl get pods
+[XPK] Credentials test succeeded.
+[XPK] Finished get-credentials and kubectl setup.
[XPK] Task: `GKE Cluster Get ConfigMap` is implemented by the following command not running since it is a dry run.
kubectl get configmap golden-cluster-resources-configmap -o=custom-columns="ConfigData:data" --no-headers=true
[XPK] Task: `GKE Cluster Get ConfigMap` is implemented by the following command not running since it is a dry run.
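In every golden, a line of the form "[XPK] Task: `<task>` is implemented by the following command not running since it is a dry run." is immediately followed by the command itself. A minimal sketch of a dry-run-aware runner that would emit that pair of lines is below; `run_task` and its parameters are hypothetical, and having dry runs report exit code 0 is an assumption that matches the goldens printing "Credentials test succeeded." without running `kubectl get pods`.

```python
import subprocess
from typing import Callable


def run_task(
    command: str, task: str, dry_run: bool, log: Callable[[str], None] = print
) -> int:
    """Runs a shell command, or only logs it when dry_run is set (hypothetical helper)."""
    if dry_run:
        log(
            f"[XPK] Task: `{task}` is implemented by the following command"
            " not running since it is a dry run."
        )
        log(command)
        return 0  # assumed: dry runs report success so later steps still execute
    # shell=True because golden commands chain steps with `&&`.
    return subprocess.run(command, shell=True, check=False).returncode
```

The commit list mentions replacing `global_args` with `is_dry_run()`; this sketch takes the dry-run flag as a parameter instead so it stays self-contained.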
13 changes: 9 additions & 4 deletions goldens/Cluster_create_private.txt
@@ -16,9 +16,14 @@ gcloud container clusters describe golden-cluster-private --project=golden-proje
[XPK] Task: `Fetching the list of authorized network from cluster describe.` is implemented by the following command not running since it is a dry run.
gcloud container clusters describe golden-cluster-private --project=golden-project --region=us-central1 --format="value(masterAuthorizedNetworksConfig.cidrBlocks[].cidrBlock)"
[XPK] Current machine's IP adrress is already authorized.
-[XPK] Try 1: get-credentials to cluster golden-cluster-private
-[XPK] Task: `get-credentials to cluster golden-cluster-private` is implemented by the following command not running since it is a dry run.
-gcloud container clusters get-credentials golden-cluster-private --region=us-central1 --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Try 1: get-credentials-dns-endpoint to cluster golden-cluster-private
+[XPK] Task: `get-credentials-dns-endpoint to cluster golden-cluster-private` is implemented by the following command not running since it is a dry run.
+gcloud container clusters get-credentials golden-cluster-private --region=us-central1 --dns-endpoint --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Testing credentials with kubectl...
+[XPK] Task: `kubectl get pods` is implemented by the following command not running since it is a dry run.
+kubectl get pods
+[XPK] Credentials test succeeded.
+[XPK] Finished get-credentials and kubectl setup.
[XPK] Task: 'Checking CoreDNS deployment existence' in progress for namespace: kube-system
[XPK] Task: `Check CoreDNS deployment in kube-system` is implemented by the following command not running since it is a dry run.
kubectl get deployment coredns -n kube-system
@@ -77,7 +82,7 @@ kubectl kueue version
kubectl apply --server-side --force-conflicts -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.12.2/manifests.yaml
[XPK] Wait for Kueue to be fully available
[XPK] Task: `Wait for Kueue to be available` is implemented by the following command not running since it is a dry run.
-kubectl wait deploy/kueue-controller-manager -nkueue-system --for=condition=available --timeout=10m
+kubectl wait deploy/kueue-controller-manager -n kueue-system --for=condition=available --timeout=10m
[XPK] Install Kueue Custom Resources
[XPK] Try 1: Applying Kueue Custom Resources
[XPK] Task: `Applying Kueue Custom Resources` is implemented by the following command not running since it is a dry run.
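The second hunk in the cluster-create goldens only changes how the namespace flag of `kubectl wait` is written, from `-nkueue-system` to `-n kueue-system`. One way to keep that spacing consistent is to build the command as an argument list and join it with Python's standard `shlex.join`; the sketch below assumes this approach, and the helper name `kueue_wait_command` is hypothetical.

```python
import shlex


def kueue_wait_command(timeout: str = "10m") -> str:
    """Builds the Kueue readiness wait with one token per argument (hypothetical helper)."""
    argv = [
        "kubectl", "wait", "deploy/kueue-controller-manager",
        "-n", "kueue-system",  # flag and value kept as separate tokens
        "--for=condition=available",
        f"--timeout={timeout}",
    ]
    # shlex.join quotes only tokens containing shell-special characters; none here.
    return shlex.join(argv)
```

Passing the list form straight to `subprocess.run` (without `shell=True`) would avoid shell quoting altogether; the joined string is only needed for logging, as in the goldens.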
13 changes: 9 additions & 4 deletions goldens/Cluster_create_with_gb200-4.txt
@@ -14,9 +14,14 @@ gcloud beta container clusters create golden-cluster --project=golden-project --
gcloud container clusters describe golden-cluster --project=golden-project --region=us-central1 --format="value(privateClusterConfig.enablePrivateNodes)"
[XPK] Private Nodes is not enabled on the cluster.
[XPK] Cluster is public and no need to authorize networks.
-[XPK] Try 1: get-credentials to cluster golden-cluster
-[XPK] Task: `get-credentials to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
-gcloud container clusters get-credentials golden-cluster --region=us-central1 --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Try 1: get-credentials-dns-endpoint to cluster golden-cluster
+[XPK] Task: `get-credentials-dns-endpoint to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
+gcloud container clusters get-credentials golden-cluster --region=us-central1 --dns-endpoint --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Testing credentials with kubectl...
+[XPK] Task: `kubectl get pods` is implemented by the following command not running since it is a dry run.
+kubectl get pods
+[XPK] Credentials test succeeded.
+[XPK] Finished get-credentials and kubectl setup.
[XPK] Task: 'Checking CoreDNS deployment existence' in progress for namespace: kube-system
[XPK] Task: `Check CoreDNS deployment in kube-system` is implemented by the following command not running since it is a dry run.
kubectl get deployment coredns -n kube-system
@@ -76,7 +81,7 @@ kubectl kueue version
kubectl apply --server-side --force-conflicts -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.12.2/manifests.yaml
[XPK] Wait for Kueue to be fully available
[XPK] Task: `Wait for Kueue to be available` is implemented by the following command not running since it is a dry run.
-kubectl wait deploy/kueue-controller-manager -nkueue-system --for=condition=available --timeout=10m
+kubectl wait deploy/kueue-controller-manager -n kueue-system --for=condition=available --timeout=10m
[XPK] Install Kueue Custom Resources
[XPK] Try 1: Applying Kueue Custom Resources
[XPK] Task: `Applying Kueue Custom Resources` is implemented by the following command not running since it is a dry run.
11 changes: 8 additions & 3 deletions goldens/Job_cancel.txt
@@ -2,9 +2,14 @@ $ python3 xpk.py job cancel golden-job --project=golden-project --zone=us-centra
[XPK] Starting xpk
[XPK] Starting job cancel for job: ['golden-job']
[XPK] Working on golden-project and us-central1-a
-[XPK] Try 1: get-credentials to cluster golden-cluster
-[XPK] Task: `get-credentials to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
-gcloud container clusters get-credentials golden-cluster --region=us-central1 --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Try 1: get-credentials-dns-endpoint to cluster golden-cluster
+[XPK] Task: `get-credentials-dns-endpoint to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
+gcloud container clusters get-credentials golden-cluster --region=us-central1 --dns-endpoint --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Testing credentials with kubectl...
+[XPK] Task: `kubectl get pods` is implemented by the following command not running since it is a dry run.
+kubectl get pods
+[XPK] Credentials test succeeded.
+[XPK] Finished get-credentials and kubectl setup.
[XPK] Task: `delete job` is implemented by the following command not running since it is a dry run.
kubectl-kjob delete slurm golden-job
[XPK] Exiting XPK cleanly
11 changes: 8 additions & 3 deletions goldens/Job_list.txt
@@ -1,9 +1,14 @@
$ python3 xpk.py job ls --project=golden-project --zone=us-central1-a --cluster=golden-cluster --dry-run
[XPK] Starting xpk
[XPK] Working on golden-project and us-central1-a
-[XPK] Try 1: get-credentials to cluster golden-cluster
-[XPK] Task: `get-credentials to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
-gcloud container clusters get-credentials golden-cluster --region=us-central1 --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Try 1: get-credentials-dns-endpoint to cluster golden-cluster
+[XPK] Task: `get-credentials-dns-endpoint to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
+gcloud container clusters get-credentials golden-cluster --region=us-central1 --dns-endpoint --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Testing credentials with kubectl...
+[XPK] Task: `kubectl get pods` is implemented by the following command not running since it is a dry run.
+kubectl get pods
+[XPK] Credentials test succeeded.
+[XPK] Finished get-credentials and kubectl setup.
[XPK] Listing jobs for project golden-project and zone us-central1-a:
[XPK] Task: `list jobs` is implemented by the following command not running since it is a dry run.
kubectl-kjob list slurm --profile xpk-def-app-profile
13 changes: 9 additions & 4 deletions goldens/NAP_cluster-create.txt
@@ -14,9 +14,14 @@ gcloud beta container clusters create golden-cluster --project=golden-project --
gcloud container clusters describe golden-cluster --project=golden-project --region=us-central1 --format="value(privateClusterConfig.enablePrivateNodes)"
[XPK] Private Nodes is not enabled on the cluster.
[XPK] Cluster is public and no need to authorize networks.
-[XPK] Try 1: get-credentials to cluster golden-cluster
-[XPK] Task: `get-credentials to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
-gcloud container clusters get-credentials golden-cluster --region=us-central1 --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Try 1: get-credentials-dns-endpoint to cluster golden-cluster
+[XPK] Task: `get-credentials-dns-endpoint to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
+gcloud container clusters get-credentials golden-cluster --region=us-central1 --dns-endpoint --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Testing credentials with kubectl...
+[XPK] Task: `kubectl get pods` is implemented by the following command not running since it is a dry run.
+kubectl get pods
+[XPK] Credentials test succeeded.
+[XPK] Finished get-credentials and kubectl setup.
[XPK] Task: 'Checking CoreDNS deployment existence' in progress for namespace: kube-system
[XPK] Task: `Check CoreDNS deployment in kube-system` is implemented by the following command not running since it is a dry run.
kubectl get deployment coredns -n kube-system
@@ -81,7 +86,7 @@ kubectl kueue version
kubectl apply --server-side --force-conflicts -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.12.2/manifests.yaml
[XPK] Wait for Kueue to be fully available
[XPK] Task: `Wait for Kueue to be available` is implemented by the following command not running since it is a dry run.
-kubectl wait deploy/kueue-controller-manager -nkueue-system --for=condition=available --timeout=10m
+kubectl wait deploy/kueue-controller-manager -n kueue-system --for=condition=available --timeout=10m
[XPK] Install Kueue Custom Resources
[XPK] Try 1: Applying Kueue Custom Resources
[XPK] Task: `Applying Kueue Custom Resources` is implemented by the following command not running since it is a dry run.
13 changes: 9 additions & 4 deletions goldens/NAP_cluster-create_with_pathways.txt
@@ -14,9 +14,14 @@ gcloud beta container clusters create golden-cluster --project=golden-project --
gcloud container clusters describe golden-cluster --project=golden-project --region=us-central1 --format="value(privateClusterConfig.enablePrivateNodes)"
[XPK] Private Nodes is not enabled on the cluster.
[XPK] Cluster is public and no need to authorize networks.
-[XPK] Try 1: get-credentials to cluster golden-cluster
-[XPK] Task: `get-credentials to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
-gcloud container clusters get-credentials golden-cluster --region=us-central1 --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Try 1: get-credentials-dns-endpoint to cluster golden-cluster
+[XPK] Task: `get-credentials-dns-endpoint to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
+gcloud container clusters get-credentials golden-cluster --region=us-central1 --dns-endpoint --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Testing credentials with kubectl...
+[XPK] Task: `kubectl get pods` is implemented by the following command not running since it is a dry run.
+kubectl get pods
+[XPK] Credentials test succeeded.
+[XPK] Finished get-credentials and kubectl setup.
[XPK] Task: 'Checking CoreDNS deployment existence' in progress for namespace: kube-system
[XPK] Task: `Check CoreDNS deployment in kube-system` is implemented by the following command not running since it is a dry run.
kubectl get deployment coredns -n kube-system
@@ -82,7 +87,7 @@ kubectl kueue version
kubectl apply --server-side --force-conflicts -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.12.2/manifests.yaml
[XPK] Wait for Kueue to be fully available
[XPK] Task: `Wait for Kueue to be available` is implemented by the following command not running since it is a dry run.
-kubectl wait deploy/kueue-controller-manager -nkueue-system --for=condition=available --timeout=10m
+kubectl wait deploy/kueue-controller-manager -n kueue-system --for=condition=available --timeout=10m
[XPK] Install Kueue Custom Resources
[XPK] Try 1: Applying Kueue Custom Resources
[XPK] Task: `Applying Kueue Custom Resources` is implemented by the following command not running since it is a dry run.
11 changes: 8 additions & 3 deletions goldens/Workload_delete.txt
@@ -2,9 +2,14 @@ $ python3 xpk.py workload delete --project=golden-project --zone=us-central1-a -
[XPK] Starting xpk
[XPK] Starting Workload delete
[XPK] Working on golden-project and us-central1-a
-[XPK] Try 1: get-credentials to cluster golden-cluster
-[XPK] Task: `get-credentials to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
-gcloud container clusters get-credentials golden-cluster --region=us-central1 --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Try 1: get-credentials-dns-endpoint to cluster golden-cluster
+[XPK] Task: `get-credentials-dns-endpoint to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
+gcloud container clusters get-credentials golden-cluster --region=us-central1 --dns-endpoint --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Testing credentials with kubectl...
+[XPK] Task: `kubectl get pods` is implemented by the following command not running since it is a dry run.
+kubectl get pods
+[XPK] Credentials test succeeded.
+[XPK] Finished get-credentials and kubectl setup.
[XPK] Task: `Check if PathwaysJob is installed on golden-cluster` is implemented by the following command not running since it is a dry run.
kubectl get pods -n pathways-job-system --no-headers -o custom-columns=NAME:.metadata.name
[XPK] check_if_pathways_job_is_installed 0 0
11 changes: 8 additions & 3 deletions goldens/Workload_list.txt
@@ -2,9 +2,14 @@ $ python3 xpk.py workload list --project=golden-project --zone=us-central1-a --c
[XPK] Starting xpk
[XPK] Starting workload list
[XPK] Working on golden-project and us-central1-a
-[XPK] Try 1: get-credentials to cluster golden-cluster
-[XPK] Task: `get-credentials to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
-gcloud container clusters get-credentials golden-cluster --region=us-central1 --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Try 1: get-credentials-dns-endpoint to cluster golden-cluster
+[XPK] Task: `get-credentials-dns-endpoint to cluster golden-cluster` is implemented by the following command not running since it is a dry run.
+gcloud container clusters get-credentials golden-cluster --region=us-central1 --dns-endpoint --project=golden-project && kubectl config view && kubectl config set-context --current --namespace=default
+[XPK] Testing credentials with kubectl...
+[XPK] Task: `kubectl get pods` is implemented by the following command not running since it is a dry run.
+kubectl get pods
+[XPK] Credentials test succeeded.
+[XPK] Finished get-credentials and kubectl setup.
[XPK] Task: `List Jobs with filter-by-status=EVERYTHING with filter-by-job=None` is implemented by the following command not running since it is a dry run.
kubectl get workloads --ignore-not-found -o=custom-columns="Jobset Name:.metadata.ownerReferences[0].name,Created Time:.metadata.creationTimestamp,Priority:.spec.priorityClassName,TPU VMs Needed:.spec.podSets[0].count,TPU VMs Running/Ran:.status.admission.podSetAssignments[-1].count,TPU VMs Done:.status.reclaimablePods[0].count,Status:.status.conditions[-1].type,Status Message:.status.conditions[-1].message,Status Time:.status.conditions[-1].lastTransitionTime"
[XPK] Workload List Output: