@huntergregory commented Sep 25, 2023

Reason for Change:
KWOK Pods were not being scheduled when running KWOK in a background process. Now, we run KWOK as a Pod.

Also, change the logic to wait for Pods to be running before creating NetworkPolicies (NetPols).
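
The ordering described above (wait for Pods first, then create NetPols) can be sketched roughly as follows. This is a minimal illustration, not the script from this PR: the function names `retry_until`, `all_pods_running`, and `apply_netpols_after_pods`, the 15-second poll interval, and the manifest argument are all hypothetical. The 20-minute limit and the `get pod -owide` dump on timeout mirror what the PR's wait check does.

```shell
#!/usr/bin/env bash
# Sketch only: every function name here is hypothetical, not taken from the PR.

# Retry a command until it succeeds or the timeout elapses.
# usage: retry_until <timeout_seconds> <interval_seconds> <command...>
retry_until() {
    local timeout=$1 interval=$2
    shift 2
    local deadline=$(( $(date +%s) + timeout ))
    until "$@"; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1
        fi
        sleep "$interval"
    done
}

# Succeeds when the namespace has no Pod outside the Running phase.
# Note: an empty namespace also counts as success in this sketch.
all_pods_running() {
    local ns=$1 not_running
    # Treat a failed kubectl call as "not ready" instead of as "no pods left".
    not_running=$(kubectl get pods -n "$ns" \
        --field-selector=status.phase!=Running -o name 2>/dev/null) || return 1
    [ -z "$not_running" ]
}

# Wait up to 20 minutes for Pods, then apply the NetworkPolicy manifest.
apply_netpols_after_pods() {
    local ns=$1 manifest=$2
    if ! retry_until $(( 20 * 60 )) 15 all_pods_running "$ns"; then
        echo "timed out waiting for all pods to run" >&2
        kubectl get pod -n "$ns" -owide
        return 1
    fi
    kubectl apply -n "$ns" -f "$manifest"
}
```

Keeping the retry loop separate from the readiness check means the same helper could wrap other waits (for example, reapplying KWOK nodes), though whether the actual scripts are structured this way is not stated in the PR.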

Issue Fixed:

Requirements:

Notes:

@huntergregory added the npm (Related to NPM.) and ci (Infra or tooling.) labels Sep 25, 2023
@huntergregory
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@huntergregory
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@huntergregory marked this pull request as ready for review September 28, 2023 18:03
@huntergregory requested a review from a team as a code owner September 28, 2023 18:03
@huntergregory changed the title from "test(scale): [NPM] fix flakes in kwok" to "test(scale): [NPM] fix flakes in kwok and capture kernel state on failure" Sep 28, 2023
@huntergregory
Contributor Author

The Windows scale test typically fails now because HNS latencies seem to have increased.

@huntergregory
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

rayaisaiah previously approved these changes Oct 27, 2023
set -e +x
if [[ $endDate -gt $(( startDate + (20*60) )) ]]; then
echo "timed out waiting for all kwok pods to run"
k get pod -n scale-test -owide
Contributor Author

k top pod/node
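
One way to read this suggestion: when the wait times out, record resource usage alongside the Pod listing. A rough sketch follows, assuming `k` is shorthand for `kubectl` and that a metrics source is available for `kubectl top`. The function name `capture_cluster_state` and the output file names are hypothetical and do not come from the PR.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and file layout are illustrative assumptions.

# Dump Pod/Node listings and resource usage into a directory.
# usage: capture_cluster_state <namespace> <output_dir>
capture_cluster_state() {
    local ns=$1 out_dir=$2
    mkdir -p "$out_dir"
    # "|| true" keeps one failing command from aborting the rest under "set -e".
    kubectl get pod -n "$ns" -owide > "$out_dir/pods.txt" 2>&1 || true
    kubectl get node -owide > "$out_dir/nodes.txt" 2>&1 || true
    kubectl top pod -n "$ns" > "$out_dir/top-pods.txt" 2>&1 || true
    kubectl top node > "$out_dir/top-nodes.txt" 2>&1 || true
}
```

Calling something like `capture_cluster_state scale-test ./cluster-state` from the timeout branch would leave the output on disk, where a CI job could publish it as an artifact.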

@vakalapa merged commit 7e90960 into master Nov 29, 2023
@vakalapa deleted the hg/scale-infra branch November 29, 2023 23:37
matmerr pushed a commit that referenced this pull request Jan 17, 2024
…lure (#2249)

* test(kwok): try standard tier for cluster

* Revert "test(kwok): try standard tier for cluster"

This reverts commit f76e50a.

* test: run kwok as pod

* fix: add execute permission to sh files

* fix: allow scheduling on linux for kwok pod

* fix: wait timeouts and add retry logic

* fix: make sure to reapply kwok nodes if wait fails

* test: print out cluster state if wait fails

* test: prevent kwok from scheduling on windows node

* test: first wait for kwok pods (20 minutes)

* style: rearrange wait check

* fix: scale up kwok controller for reliability

* fix: typo in scaling kwok pods

* fix: check kwok pods running in test-connectivity instead of test-scale

* fix: wait for pods before adding NetPol

* fix: 7 second timeout for windows agnhost connect

* feat: get cluster state on failure

* debug: fake a failure to verify log capture

* fix: bugs in getting cluster state

* fix: remove newline instead of "n"

* Revert "debug: fake a failure to verify log capture"

This reverts commit 24ec927.

* feat(win-debug): get prom metrics

* fix: leave timeout=5s for win

* style: remove new, unused --connect-timeout parameter

* style: comment

* feat: top node/pod
4 participants