Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Azure VMs pool] Introducing agentpool client #6831

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

wenxuan0923
Copy link

@wenxuan0923 wenxuan0923 commented May 14, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

We will introduce a new vmType: vms to support the autoscaling for single instance vm pool. This new agentpool type will rely on AKS rp for scaling up and down instead of CRP call, which requires the agentpool client, cluster name and cluster resource group info.

In this PR, I'm adding the required info to the config and adding agentpool client into azClient. So far, the construction error is ignored so that no existing behavior will be impacted by this change.

See the PR targeting the 1.29 release: #6685

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label May 14, 2024
@k8s-ci-robot k8s-ci-robot added area/cluster-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/provider/azure Issues or PRs related to azure provider needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 14, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @wenxuan0923. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 14, 2024
Copy link

@tallaxes tallaxes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with minor suggestions

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tallaxes, wenxuan0923
Once this PR has been reviewed and has the lgtm label, please assign x13n for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment on lines 97 to 99
// ARMBaseURL is the URL to use for operations for the VMs pool.
// It can override the default public ARM endpoint for VMs pool scale operations.
ARMBaseURL string `json:"armBaseURL" yaml:"armBaseURL"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used only by AgentPool client, which may not be obvious to those reading the code (especially since cloud-provider client configuration supports overriding RM endpoint for non-public clouds). Should we reflect this in the field name? Maybe something with "AKS" and/or "AP" in the name?

Related: At some point, please document the new configuration options in cloudprovider/azure/README.md

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable prefix with "ARM" which indicates Azure Resource Manager, so I don't think we need to prefix with AKS/AP here if that makes sense

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All existing clients talk to ARM, this field affects only AKS AgentPool client. Ideally the intended use of the variable should be clear from its name. Comments are bonus, but don't replace unambiguous naming.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the name to ARMBaseURLForAPClient

@tallaxes
Copy link

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 30, 2024
@jackfrancis
Copy link
Contributor

@tallaxes does AKS support VM (not VMSS) node pools natively now? I can add an additional test scenario to gather functional coverage for this new scenario.

We don't have to block progress of this PR on that btw.

@@ -87,8 +87,16 @@ type Config struct {
Location string `json:"location" yaml:"location"`
TenantID string `json:"tenantId" yaml:"tenantId"`
SubscriptionID string `json:"subscriptionId" yaml:"subscriptionId"`
ResourceGroup string `json:"resourceGroup" yaml:"resourceGroup"`
VMType string `json:"vmType" yaml:"vmType"`
ClusterName string `json:"clusterName" yaml:"clusterName"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes to the Azure provider config essentially make this an AKS-native provider. Historically this was not the case as it worked in both AKS and non-AKS (e.g., aks-engine, or terraform) modes. I'm not aware of anyone using these non-AKS modes any longer, but we probably want to err on the side of caution and make these new AKS-specific Config properties optional.

I'd suggest making ClusterName andClusterResourceGroup optional.

Thoughts?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point. They are already expected to be absent from azure.json (and thus should not need json/yaml tags?), would be good to ensure (and test) that nothing else in the code treats them as required.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this to my attention. Yes, I will make sure these new config values are optional - only required when we are operating on VMs pools.

@tallaxes
Copy link

tallaxes commented May 30, 2024

@jackfrancis For the state of the VM nodepool support in AKS proper, @wenxuan0923 would be the best source. From autoscaler perspective, this PR is one in a sequence, more coming, and there won't be much to test until everything is in place.

@wenxuan0923
Copy link
Author

wenxuan0923 commented May 30, 2024

@tallaxes does AKS support VM (not VMSS) node pools natively now? I can add an additional test scenario to gather functional coverage for this new scenario.

We don't have to block progress of this PR on that btw.

Thanks for reviewing @jackfrancis, as Alex mentioned, the autoscaling feature for single instance pool is still under development. The changes including autoscaler upstream, AKS-rp and overlaymgr. I will let you know once the E2E flow is ready and then we can add more test scenarios for it.

@feiskyer
Copy link
Member

feiskyer commented Jun 4, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 4, 2024
@wenxuan0923
Copy link
Author

/retest

1 similar comment
@wenxuan0923
Copy link
Author

/retest

@k8s-ci-robot
Copy link
Contributor

@wenxuan0923: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-autoscaler-e2e-azure 731045d link false /test pull-cluster-autoscaler-e2e-azure

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cluster-autoscaler area/provider/azure Issues or PRs related to azure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants