Container Load Balancer - Load Balancer #8872

Open
wants to merge 4 commits into base: master

Conversation

georgeedward2000

What type of PR is this?

/kind feature

What this PR does / why we need it:

The key changes:

  • NRP Service Operations - NRP service create/delete and hooking of the LB backend pool to the NRP service
  • LocationAndNrpServiceBatchUpdater - programming ServiceDto in NRP
  • Location Update API for inbound scenarios - programming of LocationDataDto in NRP (offline pod IP population in the backend pool)
  • Node add/remove to VMSS

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Added the ability to set the backend pool type to PodIP.
This allows the load balancer attached to the service to route traffic directly to the pods hosting the service,
instead of routing to AKS nodes, which then forward the traffic to those pods.

@k8s-ci-robot added the kind/feature and release-note labels on Apr 14, 2025

linux-foundation-easycla bot commented Apr 14, 2025

CLA Not Signed

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: georgeedward2000
Once this PR has been reviewed and has the lgtm label, please assign feiskyer for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot requested review from mainred and nilo19 on April 14, 2025 07:31
@k8s-ci-robot added the cncf-cla: no and needs-ok-to-test labels on Apr 14, 2025
@k8s-ci-robot
Contributor

Hi @georgeedward2000. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@github-actions added the tide/merge-method-squash label on Apr 14, 2025
@k8s-ci-robot added the size/L label on Apr 14, 2025
// Node additions and removals to VMSS (Virtual Machine Scale Sets) do not require reconcileLB operations.
// The changes will be propagated via Location Update API upon endpoint-slice events and backendpool will be updated by NRP.
if az.IsLBBackendPoolTypePodIP() && az.UseStandardV2LoadBalancer() {
isOperationSucceeded = true
Contributor

Can we move this to the beginning? I don't think we need to count a successful operation here as it is skipped.

Contributor

What about externalTrafficPolicy:cluster services?

With CLB, traffic is directed to the pod hosting the service, and iptables is not used to choose the pod. So I think CLB applies only to externalTrafficPolicy: Local services.

Contributor

Can we check and return an error in this podIP+Local case?

}
}
}

lb, err = az.reconcileBackendPoolHosts(ctx, lb, lbToReconcile, service, nodes, clusterName, vmSetName, lbBackendPoolIDs)
Contributor

Do we still need to reconcile the backend pool in this case, or does the batch updater do everything?

reconcileBackendPoolHosts calls EnsureHostsInPool, which is a no-op for the pod IP type backend pool. NRP adds the backend addresses to the LB backend pool, and the batch processor provides the pods associated with the service to NRP via the UpdateLocations API.
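
To make the no-op concrete, here is a minimal sketch, assuming a hypothetical receiver type and a simplified signature (the real EnsureHostsInPool takes more parameters):

```go
package sketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
)

// backendPoolTypePodIP is a hypothetical stand-in for the pod-IP backend
// pool implementation discussed above.
type backendPoolTypePodIP struct{}

// EnsureHostsInPool is shown with a simplified signature. For a pod-IP
// backend pool there is no per-node work to do here: NRP populates the
// backend addresses itself from the locations reported by the batch updater.
func (bp *backendPoolTypePodIP) EnsureHostsInPool(ctx context.Context, service *v1.Service, nodes []*v1.Node, backendPoolID, vmSetName, clusterName string) error {
	return nil
}
```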

klog.V(2).Info("locationAndNRPServiceBatchUpdater.run: started")
for {
select {
case <-updater.channelUpdateTrigger:
Contributor

Where is the batching logic of this updater?

Author

DiffTracker keeps updating the K8s state as multiple cluster updates arrive (the batching). At some point the batch updater is triggered and uses the DiffTracker K8s state as the source of truth for updating NRP.

@@ -130,6 +130,8 @@ type Cloud struct {
eventRecorder record.EventRecorder
routeUpdater batchProcessor
backendPoolUpdater batchProcessor
// TBD (enechitoaia): Do we want to use the same interface (batchProcessor) even though we are not using an interval-based updater?

I thought we will update NRP at regular intervals, thus batching the updates. Any specific reason why we can't use the same interface?

},
)
az.localServiceNameToNRPServiceMap.Delete(serviceUID)
select {

I think the batchProcessor runs at periodic intervals and updates NRP based on the K8s state that's set here.

@kartickmsft commented Apr 16, 2025

We need to invoke the NRP UpdateService API here to remove the reference to the LB backend pool. reconcileLoadBalancer on line 504 will delete the LB/LB backend pool resource in NRP, and the batchProcessor will delete the NRP service after removing the service's reference from the Locations.

Author

So we should call the NRP API directly without queuing it to the batch updater?

Yes, for removing the reference of the backend pool from the NRP service. Otherwise NRP will throw a validation error while deleting the LB/LB backend pool resource.
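
For illustration, here is a self-contained sketch of this ordering; the NRP client and DTO below are hypothetical stand-ins, not the PR's actual types:

```go
package main

import (
	"context"
	"fmt"
)

// Hypothetical stand-ins for the NRP client and request DTO; only the call
// ordering matters for this sketch.
type removeBackendPoolRefRequest struct{ ServiceUID string }

type nrpClient struct{}

func (nrpClient) UpdateNRPServices(ctx context.Context, req removeBackendPoolRefRequest) error {
	fmt.Println("1) drop the LB backend pool reference from NRP service", req.ServiceUID)
	return nil
}

func deleteLoadBalancer(ctx context.Context, serviceUID string) error {
	fmt.Println("2) delete the LB / LB backend pool for", serviceUID)
	return nil
}

// ensureLoadBalancerDeletedSketch shows the ordering discussed above: the
// backend pool reference is removed from the NRP service first, otherwise
// NRP rejects the LB backend pool deletion with a validation error. The NRP
// service itself is cleaned up later by the batch updater, once no location
// references it.
func ensureLoadBalancerDeletedSketch(ctx context.Context, client nrpClient, serviceUID string) error {
	if err := client.UpdateNRPServices(ctx, removeBackendPoolRefRequest{ServiceUID: serviceUID}); err != nil {
		return err
	}
	return deleteLoadBalancer(ctx, serviceUID)
}

func main() {
	_ = ensureLoadBalancerDeletedSketch(context.Background(), nrpClient{}, "svc-1234")
}
```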

@@ -475,6 +482,25 @@ func (az *Cloud) EnsureLoadBalancerDeleted(ctx context.Context, clusterName stri
return err
}

serviceReconcileLock need not be acquired on line 420, as we have a CLB per service. A similar comment applies at the other places where we acquire serviceReconcileLock.

Contributor

This lock is only acquired when the service is being deleted.

I see it's acquired in functions like EnsureLoadBalancer, which is used to create/update the LB. Am I missing something?

@@ -1976,6 +2002,28 @@ func (az *Cloud) reconcileLoadBalancer(ctx context.Context, clusterName string,
if az.UseMultipleStandardLoadBalancers() {
lbToReconcile = existingLBs
}

if az.IsLBBackendPoolTypePodIP() && az.UseStandardV2LoadBalancer() && strings.EqualFold(string(service.Spec.ExternalTrafficPolicy), string(v1.ServiceExternalTrafficPolicyTypeLocal)) {

Do we throw an error and fail if ExternalTrafficPolicy is Cluster with a Standard V2 LB and pod IP backend pool type?

},
)
az.localServiceNameToNRPServiceMap.Store(serviceUID, struct{}{})
select {

NRP service creation and the service -> LB backend pool reference update could happen in the periodic batch processor.

}
}
}

lb, err = az.reconcileBackendPoolHosts(ctx, lb, lbToReconcile, service, nodes, clusterName, vmSetName, lbBackendPoolIDs)
if err != nil {

I think we need to modify getExpectedLBRules to pass an empty probe property, as we don't separately probe for CLB. This change was present in one of the earlier PRs; maybe it got missed in this PR.

Author

I think you are referring to the changes in this PR: #8218
These will eventually be merged together.

az.difftracker.UpdateK8sEndpoints(updateK8sEndpointsInputType)
}

// TO BE DISCUSSED (enechitoaia): do we want to trigger the batch updater here too?

IMO, no.

if createServicesResponseDTO.Error == nil {
az.difftracker.UpdateNRPLoadBalancers(serviceLoadBalancerList.Additions)
} else {
// discuss action upon failure

I think we could log an event, as is done for the backend pool updater, and return. Continuing to the next set of operations may lead to NRP validation failures, causing NRP API failures.
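
A minimal sketch of that error handling, assuming a hypothetical sync function and event reason; client-go's FakeRecorder is used only to make the sketch runnable:

```go
package main

import (
	"errors"
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
)

// syncServicesToNRP sketches the suggested handling: if the NRP create call
// failed, record a warning event on the service and return, instead of
// continuing with follow-up operations that would only hit NRP validation
// failures.
func syncServicesToNRP(recorder record.EventRecorder, svc *v1.Service, createErr error) error {
	if createErr != nil {
		recorder.Eventf(svc, v1.EventTypeWarning, "SyncNRPServicesFailed",
			"failed to create NRP services: %v", createErr)
		return createErr
	}
	// ...continue with location updates only on success...
	return nil
}

func main() {
	recorder := record.NewFakeRecorder(10) // test recorder, just for the example
	_ = syncServicesToNRP(recorder, &v1.Service{}, errors.New("nrp unavailable"))
	fmt.Println(<-recorder.Events) // Warning SyncNRPServicesFailed failed to create NRP services: nrp unavailable
}
```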

// Update all locations and addresses
locationData := updater.az.difftracker.GetSyncLocationsAddresses()
locationDataRequestDTO := MapLocationDataToDTO(locationData)
locationDataResponseDTO := NRPAPIClient.UpdateNRPLocations(ctx, locationDataRequestDTO)

We should invoke the UpdateNRPLocations API only if locationDataRequestDTO has a non-empty set of locations to update.
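
A small sketch of such a guard, assuming a hypothetical Locations field on the DTO:

```go
package main

import (
	"context"
	"fmt"
)

// locationDataRequestDTO is a hypothetical shape; the real DTO lives in the PR.
type locationDataRequestDTO struct{ Locations []string }

func updateNRPLocations(ctx context.Context, req locationDataRequestDTO) {
	fmt.Println("UpdateNRPLocations called with", len(req.Locations), "locations")
}

// syncLocations only calls the NRP API when there is something to send,
// skipping the round trip for an empty update.
func syncLocations(ctx context.Context, req locationDataRequestDTO) {
	if len(req.Locations) == 0 {
		return // nothing to update
	}
	updateNRPLocations(ctx, req)
}

func main() {
	ctx := context.Background()
	syncLocations(ctx, locationDataRequestDTO{})                                  // skipped
	syncLocations(ctx, locationDataRequestDTO{Locations: []string{"location-a"}}) // sent
}
```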

Resource: serviceUID,
},
)
az.localServiceNameToNRPServiceMap.Store(serviceUID, struct{}{})

We are doing a LoadOrStore in the batchProcessor. I think that is the right place (more granular) to enable the informer code path to start updating the K8s state in the DiffTracker.

for _, serviceName := range serviceLoadBalancerList.Deletions {
updater.az.localServiceNameToNRPServiceMap.Delete(serviceName)

updateK8sEndpointsInputType := UpdateK8sEndpointsInputType{

When a service is being deleted, should we get the NRP state of the locations associated with the service from DiffTracker and update NRP to disassociate the service from those locations? I'm not sure whether NRP will throw validation failures if a new location is passed, based on the latest K8s state, for which NRP doesn't have the corresponding location -> service association. I think we need to check NRP's behavior.

NRP ignores addresses that are not present in its KV store, so the existing code changes are fine.

}

for _, serviceName := range serviceLoadBalancerList.Deletions {
updater.az.localServiceNameToNRPServiceMap.Delete(serviceName)

We are deleting from the map during deletion of the LB. I think that's the right place, because it prevents further endpoint-slice updates (K8s location state) in the DiffTracker for locations whose service association is about to be deleted.

Resource: serviceUID,
},
)
az.localServiceNameToNRPServiceMap.Delete(serviceUID)

I think it's better to move this to line 487, to stop further updates of locations/addresses in the K8s DiffTracker state a little earlier.

@@ -1976,6 +2002,28 @@ func (az *Cloud) reconcileLoadBalancer(ctx context.Context, clusterName string,
if az.UseMultipleStandardLoadBalancers() {
lbToReconcile = existingLBs
}

if az.IsLBBackendPoolTypePodIP() && az.UseStandardV2LoadBalancer() && strings.EqualFold(string(service.Spec.ExternalTrafficPolicy), string(v1.ServiceExternalTrafficPolicyTypeLocal)) {
Contributor

IsLBBackendPoolTypePodIP and UseStandardV2LoadBalancer always appear together. Can we merge them into a single func?
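
A sketch of the merged helper; the combined name matches the one that appears later in the diff (IsLBBackendPoolTypePodIPAndUseStandardV2LoadBalancer), while the underlying checks here are stubbed assumptions:

```go
package main

import "fmt"

// Cloud is a minimal stand-in for the provider's Cloud type; the two
// predicates are stubbed for illustration.
type Cloud struct {
	backendPoolType string
	loadBalancerSKU string
}

func (az *Cloud) IsLBBackendPoolTypePodIP() bool  { return az.backendPoolType == "podIP" }
func (az *Cloud) UseStandardV2LoadBalancer() bool { return az.loadBalancerSKU == "standard_v2" }

// IsLBBackendPoolTypePodIPAndUseStandardV2LoadBalancer merges the two checks
// that always appear together, as suggested above.
func (az *Cloud) IsLBBackendPoolTypePodIPAndUseStandardV2LoadBalancer() bool {
	return az.IsLBBackendPoolTypePodIP() && az.UseStandardV2LoadBalancer()
}

func main() {
	az := &Cloud{backendPoolType: "podIP", loadBalancerSKU: "standard_v2"}
	fmt.Println(az.IsLBBackendPoolTypePodIPAndUseStandardV2LoadBalancer()) // true
}
```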

}

func (updater *locationAndNRPServiceBatchUpdater) process(ctx context.Context) {
updater.lock.Lock()

process will be called from only one goroutine, so locking is not needed.

@nilo19
Contributor

nilo19 commented Apr 29, 2025

@georgeedward2000 could you please add a design doc here? This feature has many big changes, and clear documentation would help code review. Thanks.

- Update batch updater
- Handle error cases
- Refactor
klog.V(2).Info("az.locationAndNRPServiceBatchUpdater.channelUpdateTrigger is full. Batch update is already triggered.")
}
}

CLB (pod IP based backend pool with the SLB v2 SKU) allows multiple LBs (one LB per service), helping overcome the limitations that multi-SLB is used for. So I think it's better to ensure that an error is returned in reconcileMultipleStandardLoadBalancerConfigurations when multi-SLB is used together with CLB.

@@ -1976,6 +2006,33 @@ func (az *Cloud) reconcileLoadBalancer(ctx context.Context, clusterName string,
if az.UseMultipleStandardLoadBalancers() {
lbToReconcile = existingLBs
}

if az.IsLBBackendPoolTypePodIPAndUseStandardV2LoadBalancer() {
if !strings.EqualFold(string(service.Spec.ExternalTrafficPolicy), string(v1.ServiceExternalTrafficPolicyTypeLocal)) {

I think the beginning of reconcileService is a better place to validate this config.
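
A sketch of that early validation, reusing the predicate shown in the diff; the surrounding types and function shape are assumptions:

```go
package main

import (
	"fmt"
	"strings"

	v1 "k8s.io/api/core/v1"
)

// cloud is a minimal stand-in; the real predicate comes from the PR.
type cloud struct{ podIPAndStandardV2 bool }

func (az *cloud) IsLBBackendPoolTypePodIPAndUseStandardV2LoadBalancer() bool {
	return az.podIPAndStandardV2
}

// validateContainerLoadBalancerService sketches the early check suggested
// above: a container load balancer (pod IP backend pool + Standard V2 SKU)
// only supports externalTrafficPolicy: Local, so anything else is rejected
// up front, before any reconcile work is done.
func (az *cloud) validateContainerLoadBalancerService(service *v1.Service) error {
	if !az.IsLBBackendPoolTypePodIPAndUseStandardV2LoadBalancer() {
		return nil
	}
	if !strings.EqualFold(string(service.Spec.ExternalTrafficPolicy), string(v1.ServiceExternalTrafficPolicyTypeLocal)) {
		return fmt.Errorf("service %s/%s: pod IP backend pool with Standard V2 load balancer requires externalTrafficPolicy: Local, got %q",
			service.Namespace, service.Name, service.Spec.ExternalTrafficPolicy)
	}
	return nil
}

func main() {
	az := &cloud{podIPAndStandardV2: true}
	svc := &v1.Service{Spec: v1.ServiceSpec{ExternalTrafficPolicy: v1.ServiceExternalTrafficPolicyTypeCluster}}
	fmt.Println(az.validateContainerLoadBalancerService(svc))
}
```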

oldAddresses: nil,
newAddresses: az.getPodIPToNodeIPMapFromEndpointSlice(es, false),
}
az.difftracker.UpdateK8sEndpoints(updateK8sEndpointsInputType)

Since we are not doing interval-based batching updates, I think we should trigger the batch updater from here.

az.difftracker.UpdateK8sEndpoints(updateK8sEndpointsInputType)
}

// TO BE DISCUSSED (enechitoaia): do we want to trigger the batch updater here too?

Since we are not doing interval-based batching updates, I think we should trigger the batch updater from here.

az.difftracker.UpdateK8sEndpoints(updateK8sEndpointsInputType)
}

// TO BE DISCUSSED (enechitoaia): do we want to trigger the batch updater here too?

Same comment as for Add/Update.

removeServicesResponseDTO := NRPAPIClient.UpdateNRPServices(ctx, removeServicesRequestDTO)

if removeServicesResponseDTO.Error == nil {
az.difftracker.UpdateNRPLoadBalancers(serviceLoadBalancerList.Additions)

Should this be Removals?
"https://github.com/kubernetes-sigs/cloud-provider-azure/pull/8464/files#:~:text=func%20(dt%20*DiffTracker)%20UpdateNRPLoadBalancers(syncServicesReturnType%20SyncServicesReturnType)%20%7B" takes SyncServicesReturnType, which has Additions and Removals.
Has the DiffTracker API changed, or am I missing something?

}

func (updater *locationAndNRPServiceBatchUpdater) addOperation(operation batchOperation) batchOperation {
// This is a no-op function. The actual processing is done in the run method.

Maybe we can mention that the add/remove operation is handled via the DiffTracker APIs.

@georgeedward2000
Author

Design Doc to lift review blockers

Background

In the following, User stands for human users or any cluster provisioning tool (e.g., AKS).

This design document describes the user-facing design and workflow of the Standard V2 LoadBalancer - Container Based Backendpool.

Design

  1. Current users do not need to take any action, and the ongoing changes will not affect them.

  2. New Users must create a Container Based Cluster.

  3. Then, they must create a Standard V2 SKU Load Balancer.

  4. After that, no further action is needed, as all networking resources will be automatically provisioned when new pods, LoadBalancer services, and egresses are created.

Workflow

  1. Introduces the DiffTracker API to keep track of the synchronization between the Kubernetes (K8s) cluster and its NRP resource structure.

  2. Introduces LocationAndNRPServiceBatchUpdater as a parallel thread worker used as the main engine for DiffTracker synchronization. It runs continuously, waiting for updates in the cluster on a boolean channel and triggering NRP API requests.

  3. All updates within the K8s cluster (regarding pods, services, egresses, etc.) are stored within DiffTracker.

  4. After each update, a boolean value of true is sent through the LocationAndNRPServiceBatchUpdater channel.

  5. The LocationAndNRPServiceBatchUpdater run method waits to consume booleans from the channel. When a value is consumed, it uses all the updates stored in DiffTracker to update NRP using the NRP API. Eventually, all successful NRP API calls are stored back into DiffTracker to assert the equivalence of the two structures (K8s cluster and Azure).

  6. DiffTracker also functions as a batch operations aggregator. When a cluster undergoes multiple rapid updates, DiffTracker's state is continuously updated. Meanwhile, the LocationAndNRPServiceBatchUpdater, running on a single thread, consumes updates from the channel. As a result, multiple updates can accumulate and be sent to NRP in a single batch; a minimal sketch of this loop follows below.
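
The following is a minimal, self-contained sketch of that trigger-and-drain loop; the tracker, updater, and NRP call here are hypothetical stand-ins for the PR's actual types:

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// diffTracker is a hypothetical stand-in: it accumulates desired K8s state
// and reports what still needs to be pushed to NRP.
type diffTracker struct {
	mu      sync.Mutex
	pending []string
}

func (d *diffTracker) RecordUpdate(item string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.pending = append(d.pending, item)
}

func (d *diffTracker) Drain() []string {
	d.mu.Lock()
	defer d.mu.Unlock()
	out := d.pending
	d.pending = nil
	return out
}

// locationAndNRPServiceBatchUpdater mirrors the design above: informer
// handlers record changes in the tracker and send true on a capacity-1
// channel; a single run loop drains the tracker and calls the NRP API.
type locationAndNRPServiceBatchUpdater struct {
	tracker *diffTracker
	trigger chan bool
}

func (u *locationAndNRPServiceBatchUpdater) notify(item string) {
	u.tracker.RecordUpdate(item)
	select {
	case u.trigger <- true:
	default: // an update is already pending; it will pick this change up too
	}
}

func (u *locationAndNRPServiceBatchUpdater) run(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-u.trigger:
			batch := u.tracker.Drain()
			// Stand-in for the UpdateNRPServices / UpdateNRPLocations calls.
			fmt.Println("syncing batch to NRP:", batch)
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	u := &locationAndNRPServiceBatchUpdater{tracker: &diffTracker{}, trigger: make(chan bool, 1)}
	go u.run(ctx)

	// Rapid cluster updates coalesce into one (or a few) NRP sync batches.
	for i := 0; i < 5; i++ {
		u.notify(fmt.Sprintf("endpoint-update-%d", i))
	}
	time.Sleep(100 * time.Millisecond) // let the run loop process
}
```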

az.difftracker.UpdateK8sEndpoints(updateK8sEndpointsInputType)
}

select {

I think this should be within the 384-391 if block. I mean we should trigger update only when the corresponding NRP service has been created or is in the process of being created.

az.difftracker.UpdateK8sEndpoints(updateK8sEndpointsInputType)
}

select {

Same comment as for UpdateFunc

az.difftracker.UpdateK8sEndpoints(updateK8sEndpointsInputType)
}

select {

Same comment as for UpdateFunc

@k8s-ci-robot added the needs-rebase label on Jun 6, 2025
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
