test(scale): [NPM] fix flakes in kwok and capture kernel state on failure #2249
Merged
Revert "test(kwok): try standard tier for cluster": this reverts commit f76e50a.
Branch force-pushed from 3c8e43b to b2b5913.
Author: /azp run
Azure Pipelines successfully started running 5 pipeline(s).
Author: /azp run
Azure Pipelines successfully started running 4 pipeline(s).
Branch force-pushed from 5e9929b to 899b320.
Author: The Windows scale test now typically fails because HNS latencies appear to have increased.
Author: /azp run
Azure Pipelines successfully started running 4 pipeline(s).
rayaisaiah previously approved these changes on Oct 27, 2023.
huntergregory commented on Nov 14, 2023, on these lines:
    set -e +x
    if [[ $endDate -gt $(( startDate + (20*60) )) ]]; then
        echo "timed out waiting for all kwok pods to run"
        k get pod -n scale-test -owide
k top pod/node
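The review thread quotes only a fragment of the 20-minute wait check, and the note `k top pod/node` corresponds to the final commit "feat: top node/pod". As a hedged sketch of how such a check could look end to end (not the PR's actual script), the following assumes `k` is an alias for `kubectl`, treats "all pods running" as "no pod outside the `Running` phase", and uses the hypothetical function names `wait_for_pods_running` and `dump_cluster_state`. `kubectl top` only works when metrics-server is installed, so its failure is reported rather than treated as fatal.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not taken from the PR.
# Assumes kubectl is on PATH (the PR's `k` appears to be an alias for it).

# dump_cluster_state NAMESPACE
# Print pod placement and resource usage. `kubectl top` needs metrics-server,
# so a failure there is reported instead of aborting the dump.
dump_cluster_state() {
    local ns="$1"
    kubectl get pod -n "$ns" -owide
    kubectl top node || echo "kubectl top node failed (metrics-server missing?)"
    kubectl top pod -n "$ns" || echo "kubectl top pod failed (metrics-server missing?)"
}

# wait_for_pods_running NAMESPACE TIMEOUT_SECONDS [POLL_SECONDS]
# Returns 0 once no pod in NAMESPACE is outside the Running phase.
# Returns 1, after dumping cluster state, if TIMEOUT_SECONDS pass first.
# A failed `kubectl get` counts as "not ready yet", never as success.
wait_for_pods_running() {
    local ns="$1" timeout="$2" poll="${3:-15}"
    local startDate endDate notRunning
    startDate=$(date +%s)
    while true; do
        if notRunning=$(kubectl get pod -n "$ns" --no-headers \
                --field-selector=status.phase!=Running 2>/dev/null) &&
            [[ -z "$notRunning" ]]; then
            return 0
        fi
        endDate=$(date +%s)
        if [[ $endDate -gt $(( startDate + timeout )) ]]; then
            echo "timed out waiting for all pods in $ns to run"
            dump_cluster_state "$ns"
            return 1
        fi
        sleep "$poll"
    done
}

# Example: allow 20 minutes, matching the quoted check.
#   wait_for_pods_running scale-test $(( 20 * 60 )) || exit 1
```

With this field selector, pods in the `Succeeded` phase also count as not running, and an empty namespace passes immediately; a stricter check would need a label selector for the KWOK pods.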
vakalapa approved these changes on Nov 29, 2023.
matmerr pushed a commit that referenced this pull request on Jan 17, 2024:
…lure (#2249)
* test(kwok): try standard tier for cluster
* Revert "test(kwok): try standard tier for cluster"
  This reverts commit f76e50a.
* test: run kwok as pod
* fix: add execute permission to sh files
* fix: allow scheduling on linux for kwok pod
* fix: wait timeouts and add retry logic
* fix: make sure to reapply kwok nodes if wait fails
* test: print out cluster state if wait fails
* test: prevent kwok from scheduling on windows node
* test: first wait for kwok pods (20 minutes)
* style: rearrange wait check
* fix: scale up kwok controller for reliability
* fix: typo in scaling kwok pods
* fix: check kwok pods running in test-connectivity instead of test-scale
* fix: wait for pods before adding NetPol
* fix: 7 second timeout for windows agnhost connect
* feat: get cluster state on failure
* debug: fake a failure to verify log capture
* fix: bugs in getting cluster state
* fix: remove newline instead of "n"
* Revert "debug: fake a failure to verify log capture"
  This reverts commit 24ec927.
* feat(win-debug): get prom metrics
* fix: leave timeout=5s for win
* style: remove new, unused --connect-timeout parameter
* style: comment
* feat: top node/pod
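Several of the commits ("wait timeouts and add retry logic", "make sure to reapply kwok nodes if wait fails", "print out cluster state if wait fails") describe a wait, retry, recover pattern. A minimal sketch of that pattern follows; the helper `retry_with_recovery`, the `kwok-nodes.yaml` manifest, and the `type=kwok` node label in the usage comment are all hypothetical, not taken from the PR.

```shell
#!/usr/bin/env bash
# Sketch only: names, paths, and labels here are hypothetical.

# retry_with_recovery ATTEMPTS RECOVER_FN WAIT_CMD [ARGS...]
# Runs WAIT_CMD up to ATTEMPTS times and returns 0 on the first success.
# After each failed attempt except the last, runs RECOVER_FN (a command or
# shell function taking no arguments) before trying again. Returns 1 if
# every attempt fails.
retry_with_recovery() {
    local attempts="$1" recoverFn="$2"
    shift 2
    local i
    for (( i = 1; i <= attempts; i++ )); do
        if "$@"; then
            return 0
        fi
        echo "attempt $i/$attempts failed: $*" >&2
        if (( i < attempts )); then
            # A failing recovery step is logged, not fatal, so the next
            # attempt still runs even under `set -e`.
            "$recoverFn" || echo "recovery step $recoverFn failed" >&2
        fi
    done
    return 1
}

# Example (manifest path and label selector are assumptions):
#   reapply_kwok_nodes() { kubectl apply -f kwok-nodes.yaml; }
#   retry_with_recovery 3 reapply_kwok_nodes \
#       kubectl wait --for=condition=Ready node -l type=kwok --timeout=5m
```

Passing the recovery step as a function name keeps the helper free of `eval`; a caller that also wants a cluster-state dump can run one after `retry_with_recovery` returns 1.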
Reason for Change:
KWOK pods were not being scheduled when KWOK ran as a background process, so KWOK now runs as a Pod.
The test logic also now waits for pods to be running before creating NetPols.
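The new ordering (wait for pods, then create network policies) could be sketched with `kubectl wait` as below. This is an illustration under assumptions, not the PR's script: `apply_netpols_when_ready` is a hypothetical helper and the NetworkPolicy manifest path is a placeholder. It waits on the pod `Ready` condition, which is stricter than the `Running` phase mentioned above, and `kubectl wait --all` fails if the namespace has no pods yet.

```shell
#!/usr/bin/env bash
# Sketch only: the helper name and manifest path are hypothetical.

# apply_netpols_when_ready NAMESPACE TIMEOUT NETPOL_PATH
# Blocks until every pod in NAMESPACE reports Ready (or TIMEOUT, e.g. "20m",
# expires) and only then applies the NetworkPolicy manifests at NETPOL_PATH.
apply_netpols_when_ready() {
    local ns="$1" timeout="$2" netpolPath="$3"
    if ! kubectl wait --for=condition=Ready pod --all -n "$ns" --timeout="$timeout"; then
        echo "pods in $ns not Ready within $timeout; skipping NetworkPolicies" >&2
        # Show where pods landed to help diagnose the failure.
        kubectl get pod -n "$ns" -owide >&2
        return 1
    fi
    kubectl apply -n "$ns" -f "$netpolPath"
}

# Example:
#   apply_netpols_when_ready scale-test 20m ./netpols/ || exit 1
```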
Issue Fixed:
Requirements:
Notes: