Container Load Balancer - DiffTracker Integration #8464
base: master
Conversation
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected; please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: georgeedward2000. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Hi @georgeedward2000. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
pkg/provider/azure.go
Outdated
az.diffTracker = diffTrackerState
// Print syncOps to suppress unused variable error
klog.V(2).Infof("Sync operations: %v", syncOperations)
// TODO (enechitoaia): call NRP APIs (including ServiceGateway) to update the state of the NRP
I think it's better to list a TODO for the k8s API calls as well here. The actual implementation could be done in a different function to keep it modular.
@@ -183,6 +183,10 @@ func (az *Config) IsLBBackendPoolTypeNodeIP() bool {
	return strings.EqualFold(az.LoadBalancerBackendPoolConfigurationType, consts.LoadBalancerBackendPoolConfigurationTypeNodeIP)
}

func (az *Config) IsLBBackendPoolTypePodIP() bool {
	return strings.EqualFold(az.LoadBalancerBackendPoolConfigurationType, consts.LoadBalancerBackendPoolConfigurationTypeNodeIP) && az.UseStandardV2LoadBalancer()
BackendPoolConfigurationType should be PodIP.
	return updateK8Resource(input, dt.K8s.Egresses, ResourceTypeEgress)
}

func updateK8Resource(input UpdateK8sResource, set *utilsets.IgnoreCaseSet, resourceType string) error {
I think utilsets.IgnoreCaseSet is not thread-safe. Better to make the update functions thread-safe, as multiple service-reconcile threads could update in parallel. Also, a read may be happening in the batch-processor goroutine while a write via an update function is taking place.
Yeah, I'd just lock/defer-unlock at the entry points; it might get complicated otherwise.
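A minimal sketch of that entry-point locking, assuming a sync.RWMutex on the tracker; the type and method names below are illustrative, not the PR's actual API:

```go
package difftracker

import "sync"

type DiffTracker struct {
	mu       sync.RWMutex
	services map[string]struct{} // stand-in for *utilsets.IgnoreCaseSet
}

// UpdateK8sService takes the write lock so concurrent service-reconcile
// threads cannot race each other.
func (dt *DiffTracker) UpdateK8sService(id string) {
	dt.mu.Lock()
	defer dt.mu.Unlock()
	dt.services[id] = struct{}{}
}

// HasService takes the read lock so the batch-processor goroutine can read
// safely while reconcile threads write.
func (dt *DiffTracker) HasService(id string) bool {
	dt.mu.RLock()
	defer dt.mu.RUnlock()
	_, ok := dt.services[id]
	return ok
}
```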
		continue
	}

	if location == "" {
This could be the first check at line 41. We need to continue.
		continue
	}

	if location == "" {
Can be the first check. Continue.
	NATGatewayUpdates SyncNRPServicesReturnType
	LocationData      LocationData
}
I think we need DTOs for Service (referring to LB/NATGW) also.
let's discuss
}

type NRPState struct {
	LoadBalancers *utilsets.IgnoreCaseSet
I'm wondering if NRPState should represent Service state in NRP. Service state in NRP will have references to LB backend pools/NAT gateways. To fit the NRP API for Service/Location, we may need to store more information, like the LB name, so that we can construct the URI.
let's discuss
pkg/provider/difftracker/util.go
Outdated
	return true
}

// Map LocationData to LocationDataDTO to be used as payload in ServiceGateway API calls
Something similar is needed for Service API
let's discuss
Some initial thoughts
@@ -183,6 +183,10 @@ func (az *Config) IsLBBackendPoolTypeNodeIP() bool {
	return strings.EqualFold(az.LoadBalancerBackendPoolConfigurationType, consts.LoadBalancerBackendPoolConfigurationTypeNodeIP)
}

func (az *Config) IsLBBackendPoolTypePodIP() bool {
	return strings.EqualFold(az.LoadBalancerBackendPoolConfigurationType, consts.LoadBalancerBackendPoolConfigurationTypeNodeIP) && az.UseStandardV2LoadBalancer()
What happens if the pool type is PodIP but the LB SKU is just Standard (not V2)?
let's discuss
I think it's better to throw a validation error in aks-rp before we pass the settings on to cloud-provider. But there may be self-managed clusters where aks-rp is not involved. So, maybe, also throw a validation error during cloud-provider init, so that init fails and the customer corrects the config.
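One possible shape for that init-time check, as a sketch; the validate function and the PodIP constant name are assumptions, not code from this PR:

```go
// Hypothetical init-time validation: fail fast when the backend pool type is
// PodIP but the load balancer SKU is not StandardV2.
func (az *Config) validateBackendPoolConfig() error {
	isPodIP := strings.EqualFold(
		az.LoadBalancerBackendPoolConfigurationType,
		consts.LoadBalancerBackendPoolConfigurationTypePodIP, // assumed constant name
	)
	if isPodIP && !az.UseStandardV2LoadBalancer() {
		return fmt.Errorf(
			"loadBalancerBackendPoolConfigurationType %q requires the StandardV2 load balancer SKU",
			az.LoadBalancerBackendPoolConfigurationType,
		)
	}
	return nil
}
```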
pkg/provider/azure.go
Outdated
// Initialize the diff tracker state and get the necessary operations to sync the cluster with NRP
diffTrackerState, syncOperations := difftracker.InitializeDiffTrackerState(K8sState, NRPState)
az.diffTracker = diffTrackerState
naming: diffTracker vs diffTrackerState might be confusing (the latter sounds like a part of the diff tracker itself). Should we align on the name? Probably just diffTracker?
pkg/provider/azure.go
Outdated
diffTrackerState, syncOperations := difftracker.InitializeDiffTrackerState(K8sState, NRPState)
az.diffTracker = diffTrackerState
// Print syncOps to suppress unused variable error
klog.V(2).Infof("Sync operations: %v", syncOperations)
I think we can just do:
diffTrackerState, _ := difftracker.InitializeDiffTrackerState(K8sState, NRPState)
and avoid the unnecessary print.
	K8s: k8sState,
	NRP: nrpState,
}
syncOperations := diffTrackerState.GetSyncDiffTrackerState()
Any specific reason to do that? While calling get right after initialization will probably be a common pattern, maybe we leave it to the user?
	K8s: k8sState,
	NRP: nrpState,
}
syncOperations := diffTrackerState.GetSyncDiffTrackerState()
Should it be called getSyncOperations?
	ResourceTypeEgress = "Egress"
)

func (dt *DiffTrackerState) UpdateK8service(input UpdateK8sResource) error {
Would we expect to have more details here about the service config, or is this not necessary?
2. Added locking at entry points
3. Added DTOs and Mappers for LocationData and Service within ServiceGateway API
4. Clean code + updated initialization/sync cloud object
5. Added/Updated Tests
pkg/provider/difftracker/handlers.go
Outdated
package difftracker

func (dt *DiffTracker) handleService(input UpdateK8sResource) SyncServicesReturnType {
	dt.mu.Lock()
I think it's better to move the locks within UpdateK8service. The same comment applies to the other handlers.
pkg/provider/difftracker/handlers.go
Outdated
	defer dt.mu.Unlock()

	dt.UpdateK8service(input)
	SyncDiffTrackerReturnType := dt.GetSyncLoadBalancerServices()
I think we may not need the diff here; it's just updating the K8s state. So I think we don't need the Get functions, and as a result the locks can be made more granular.
	return syncServices
}

func (dt *DiffTracker) GetSyncLoadBalancerServices() SyncServicesReturnType {
I think it's better to have a lock for the Get Diff functions as well, since the UpdateK8s-state and GetDiff functions will be called independently from the reconcile and batch-processor threads respectively.
	utilsets "sigs.k8s.io/cloud-provider-azure/pkg/util/sets"
)

func (dt *DiffTracker) UpdateNRPLoadBalancers() {
As discussed offline, we need to pass in the diff state that needs to be applied, as the current implementation will sync NRP to the latest K8s state, not the K8s state that we actually updated in NRP. The same comment applies to all Update functions.
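A sketch of what passing the diff in could look like; the SyncServicesReturnType fields (Additions/Removals) are assumed from the surrounding discussion:

```go
// Apply exactly the previously computed sync operations instead of
// re-deriving them from the latest K8s state.
func (dt *DiffTracker) UpdateNRPLoadBalancers(diff SyncServicesReturnType) {
	for _, service := range diff.Additions.UnsortedList() {
		dt.NRPResources.LoadBalancers.Insert(service)
	}
	for _, service := range diff.Removals.UnsortedList() {
		dt.NRPResources.LoadBalancers.Delete(service)
	}
}
```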
	if dt.NRPResources.LoadBalancers.Has(service) {
		continue
	}
	dt.NRPResources.LoadBalancers.Insert(service)
What happens if we insert a duplicate entry? I guess it will just replace the existing entry, which has no effect if the entry is already present. If my understanding is correct, then we can remove the Has check at line 15; that will avoid CPU cycles for new entries.
A similar comment applies to the other Update functions as well.
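If Insert really is a no-op for existing members (an assumption about IgnoreCaseSet worth verifying), the loop reduces to:

```go
for _, service := range services {
	// Insert is idempotent, so no Has pre-check is needed.
	dt.NRPResources.LoadBalancers.Insert(service)
}
```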
	continue
}
dt.NRPResources.NATGateways.Insert(egress)
fmt.Printf("Added egress %s to NRP NATGateways\n", egress)
Why are we not using klog here?
Comment applies to line 48 also.
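A minimal example of the suggested change, using the same klog.V(2).Infof call that appears elsewhere in this PR:

```go
// Level-gated klog call instead of fmt.Printf; klog also appends the newline.
klog.V(2).Infof("Added egress %s to NRP NATGateways", egress)
```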
serviceRef := createServiceRef(pod)
addressData := Address{ServiceRef: serviceRef}

if !exists || !serviceRef.Equals(nrpLocation.Addresses[address].Services) {
Shouldn't we check if nrpLocation.Addresses[address] is present before comparing services within the address?
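A sketch of the comma-ok lookup being suggested; variable names follow the quoted snippet, and the surrounding types are assumptions:

```go
nrpAddress, ok := nrpLocation.Addresses[address]
if !ok || !serviceRef.Equals(nrpAddress.Services) {
	// Only write when the address is new or its services changed.
	nrpLocation.Addresses[address] = addressData
}
```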
pkg/provider/azure.go
Outdated
}

// Initialize the diff tracker state and get the necessary operations to sync the cluster with NRP
az.diffTracker = difftracker.InitializeDiffTracker(K8s, NRP)
nit: naming, should be k8s, nrp?
// loadBalancerServicesDTO := difftracker.MapLoadBalancerUpdatesToServiceDataDTO(syncOperations.LoadBalancerUpdates)
// natGatewayServicesDTO := difftracker.MapNATGatewayUpdatesToServiceDataDTO(syncOperations.NATGatewayUpdates)
// servicesDTO := difftracker.ServiceDataDTO{
//     Action: difftracker.PartialUpdate,
Should this be a full update since we know full k8s state at this point?
// }

// TODO (enechitoaia): Implement the logic for ServiceGatewayClient
// Call ServiceGate APIs (including ServiceGateway) to update the state of the NRP
Before that we'd need to create LBs, NATs, LB rules, etc.
case REMOVE:
	set.Delete(input.ID)
default:
	return fmt.Errorf("error Update%s, Operation=%s and ID=%s", resourceType, input.Operation, input.ID)
I'd change that, as I think the intention was to indicate the name of the func that was called. But in case of a rename, or a call from a different func, this will be misleading (e.g. right now this will print "error UpdateService", while the func name is Update_K8s_Service). I'd print out all params as they are, and let callers decide how to present them.
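A sketch of the neutral wording, with no hard-coded caller name:

```go
// Report the raw parameters and let callers add context.
return fmt.Errorf("unsupported operation %q for resource type %q (id %q)",
	input.Operation, resourceType, input.ID)
```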
		},
	},
},
NRPResources: NRP{
Might be more readable if you have a helper func to generate these (e.g. pass a string map like in expectedNRP). It will shorten this to:
NRPResources: NRP{
K8sResources: gen("node1": { "10.0.0.1": { "service1"}}),
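One possible shape for that helper, as a sketch; the Node/Address fixture types and the utilsets.NewString constructor usage are assumptions about the test code:

```go
// gen builds the node -> address -> services fixture from plain literals.
func gen(nodes map[string]map[string][]string) map[string]Node {
	out := make(map[string]Node, len(nodes))
	for nodeName, addrs := range nodes {
		node := Node{Addresses: map[string]Address{}}
		for addr, services := range addrs {
			node.Addresses[addr] = Address{Services: utilsets.NewString(services...)}
		}
		out[nodeName] = node
	}
	return out
}
```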
some thoughts
pkg/provider/difftracker/util.go
Outdated
Service: service,
LoadBalancerBackendPools: []LoadBalancerBackendPoolDTO{
	{
		Id: fmt.Sprintf("%s-backendpool", service),
I think we need to specify the full uri of the LB backendpool resource in NRP.
pkg/provider/difftracker/util.go
Outdated
serviceDTO := ServiceDTO{
	Service: service,
	PublicNatGateway: NatGatewayDTO{
		Id: fmt.Sprintf("%s-natgateway", service),
I think we need to specify the full URI of NATGateway resource.
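A sketch of building full ARM resource IDs instead of bare names; the lbName/poolName/natGatewayName inputs are placeholders pending the getLocalServiceBackendPoolName() discussion later in this thread:

```go
backendPoolID := fmt.Sprintf(
	"/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/loadBalancers/%s/backendAddressPools/%s",
	subscriptionID, resourceGroup, lbName, poolName)
natGatewayID := fmt.Sprintf(
	"/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/natGateways/%s",
	subscriptionID, resourceGroup, natGatewayName)
```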
1. Drop event handlers and keep divided "update k8s", "get sync ops", "update nrp" flows
2. "Update nrp" now uses sync ops
3. Granular locking to address divided flows
4. Update DTO mappings to include URI formatting
5. Updated tests
existingAddress := nrpLocation.Addresses[addressKey]
if !serviceRefs.Equals(existingAddress.Services) {
	existingAddress.Services = serviceRefs
	nrpLocation.Addresses[addressKey] = existingAddress
I think we don't need to add the existingAddress back as it's already retrieved in line 78, which means it already exists in nrpLocation.Addresses.
)

// SyncServices handles the synchronization of services between K8s and NRP
func syncServices(k8sServices, Services *utilsets.IgnoreCaseSet) SyncServicesReturnType {
Should we rename it to GetServicesToSync to imply that it returns the services to sync?
pkg/provider/difftracker/types.go
Outdated
	Pods map[string]Pod
}

type K8s struct {
Should we rename it to K8s_State and NRP_State for better readability?
pkg/provider/difftracker/types.go
Outdated
}

// LocationDataDTO represents the DTO for LocationData
type LocationDataDTO struct {
LocationsDataDTO is better?
pkg/provider/difftracker/types.go
Outdated
	isDelete bool
}

type ServiceDataDTO struct {
ServicesDataDTO is better?
pkg/provider/difftracker/util.go
Outdated
func (dt *DiffTracker) DeepEqual() bool {
	// Compare Services with LoadBalancers
	if dt.K8sResources.Services.Len() != dt.NRPResources.LoadBalancers.Len() {
		klog.Errorf("Services and LoadBalancers length mismatch")
A mismatch between K8s and NRP is not really an error; it could just be a transient state where we haven't synced NRP with K8s yet. So I suggest not logging mismatches as errors in this function.
			klog.Errorf("Identity %s not found in Services for pod %s in node %s\n", identity, podKey, nodeKey)
			return false
		}
	}
Shouldn't we check if services in nrp are different from k8?
It checks on line 72 (Inbound) and line 90 (Outbound) for services in NRP_State and not in K8s_State
	subscriptionID,
	resourceGroup,
	service,
	fmt.Sprintf("%s-backendpool", service),
Please check the name of the backendpool that's used for CLB in createOrUpdateBackendPool. You could refer to getLocalServiceBackendPoolName()
Type property has been introduced for Service. We could pass Type as "Inbound" as we are mapping LB to Service.
This is what I found in a test for createOrUpdateBackendPool, which follows the same structure as the one I used:
ptr.To(fmt.Sprintf("/subscriptions/subscriptionID/resourceGroups/rg/providers/Microsoft.Network/loadBalancers/%s/backendAddressPools/%s", lbName, bpName)),
I thought line 350 should be fmt.Sprintf("%s", service), i.e. not "%s-backendpool".
for _, service := range natGatewayUpdates.Additions.UnsortedList() {
	serviceDTO := ServiceDTO{
		Service: service,
		PublicNatGateway: NatGatewayDTO{
Type property has been introduced for Service. We could pass Type as "Outbound" as we are mapping NatGw to Service.
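A sketch of passing the new Type property in the NAT gateway mapping; fields beyond those in the quoted diff are assumptions:

```go
serviceDTO := ServiceDTO{
	Service: service,
	Type:    "Outbound", // "Inbound" when mapping an LB to a Service
	PublicNatGateway: NatGatewayDTO{
		Id: natGatewayID, // full ARM URI, per the earlier comments
	},
}
```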
What type of PR is this?
/kind features
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: