Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zone affinity for uperf clients #811

Merged
merged 3 commits into from
Jun 27, 2023
Merged

Conversation

mukrishn
Copy link
Collaborator

Description

Pod affinity to keep server and client within same zone to avoid external variability.

Fixes

Copy link
Member

@jtaleric jtaleric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is required during scheduling, if the AZ only has a single node, will the client ever launch?

@rsevilla87
Copy link
Member

rsevilla87 commented Apr 27, 2023

Since this is required during scheduling, if the AZ only has a single node, will the client ever launch?

I think not since the policy is requiredDuringSchedulingIgnoredDuringExecution, I think we should add this policy as preferredDuringSchedulingIgnoredDuringExecution to prevent single node on AZ test cases to fail

@mukrishn
Copy link
Collaborator Author

This is keeping the server and client in the same zone, so why can't we enforce it by keeping requiredDuring instead of preferredDuring ? at all cases we will have a single zone isn't it.

@rsevilla87
Copy link
Member

This is keeping the server and client in the same zone, so why can't we enforce it by keeping requiredDuring instead of preferredDuring ? at all cases we will have a single zone isn't it.

In bare metal environments by default there's only one zone, for example

@mukrishn
Copy link
Collaborator Author

In bare metal environments by default there's only one zone, for example

Yes so server and client will use the same zone in that case.

@jtaleric
Copy link
Member

Since this is required during scheduling, if the AZ only has a single node, will the client ever launch?

I think not since the policy is requiredDuringSchedulingIgnoredDuringExecution, I think we should add this policy as preferredDuringSchedulingIgnoredDuringExecution to prevent single node on AZ test cases to fail

Yeah, I think we need to move to preferred vs required.

@mukrishn mukrishn requested a review from smalleni May 2, 2023 15:43
@smalleni
Copy link
Collaborator

smalleni commented May 2, 2023

Since this is required during scheduling, if the AZ only has a single node, will the client ever launch?

I think not since the policy is requiredDuringSchedulingIgnoredDuringExecution, I think we should add this policy as preferredDuringSchedulingIgnoredDuringExecution to prevent single node on AZ test cases to fail

Yeah, I think we need to move to preferred vs required.

This PR is about requiring both server and client to be in the same AZ, not different. And since we will always have atleast 1 AZ, the required requiredDuringSchedulingIgnoredDuringExecution will not cause failures on single zone clusters.

Copy link
Member

@rsevilla87 rsevilla87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smalleni oh, that's right!; somehow I thought it was about deploying the clients/server pairs on different zones, makes sense now. LGTM

@mukrishn
Copy link
Collaborator Author

mukrishn commented May 3, 2023

Note: With this affinity we need at least 2 nodes in a single zone. so airflow jobs worker counts have to be raised to 6.

@mukrishn
Copy link
Collaborator Author

mukrishn commented May 3, 2023

@smalleni just confirmed that baremetal nodes are not having this topology.kubernetes.io/zone label, so it might break BM runs. better to be preferred ?

@jtaleric
Copy link
Member

jtaleric commented May 8, 2023

@smalleni just confirmed that baremetal nodes are not having this topology.kubernetes.io/zone label, so it might break BM runs. better to be preferred ?

Or we apply the label prior to execution...

I ran into this running these scripts against non-ocp deployments too (EKS and GKE in particular).

@stale
Copy link

stale bot commented Jun 18, 2023

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the not_ready label Jun 18, 2023
@mukrishn
Copy link
Collaborator Author

Or we apply the label prior to execution...

Not sure if adding labels would be an appropriate solution without knowing the placement of machines. Never played with zoning on baremetal node before, happy to investigate further if that is a way.

For time being lets make it preferred ?

@rsevilla87
Copy link
Member

Or we apply the label prior to execution...

Not sure if adding labels would be an appropriate solution without knowing the placement of machines. Never played with zoning on baremetal node before, happy to investigate further if that is a way.

For time being lets make it preferred ?

makes sense to me

@stale stale bot removed the not_ready label Jun 20, 2023
@rsevilla87 rsevilla87 added the ok to test Kick off our CI framework label Jun 21, 2023
@codecov-commenter
Copy link

Codecov Report

Merging #811 (10c03cd) into master (0954437) will not change coverage.
The diff coverage is n/a.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@           Coverage Diff           @@
##           master     #811   +/-   ##
=======================================
  Coverage   53.17%   53.17%           
=======================================
  Files           8        8           
  Lines         331      331           
=======================================
  Hits          176      176           
  Misses        155      155           
Flag Coverage Δ
gha 53.17% <ø> (ø)
python-3.9 53.17% <ø> (ø)
unit 53.17% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Copy link
Member

@rsevilla87 rsevilla87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring FIO failures, lgtm!

@rsevilla87 rsevilla87 merged commit 081359d into cloud-bulldozer:master Jun 27, 2023
2 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ok to test Kick off our CI framework
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants