
add workerCount config and multiple worker supports for istio_gateway and istio_virtualservice #5473


Open
wants to merge 4 commits into base: master

Conversation

sichenzhao

What does it do?

#5458

Motivation

#5458

More

  • Yes, this PR title follows Conventional Commits
  • Yes, I added unit tests
  • Yes, I updated end user documentation accordingly


linux-foundation-easycla bot commented May 27, 2025

CLA Signed


The committers listed above are authorized under a signed CLA.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mloiseleur for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from szuecs May 27, 2025 04:07
@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 27, 2025
@k8s-ci-robot
Contributor

Welcome @sichenzhao!

It looks like this is your first PR to kubernetes-sigs/external-dns 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/external-dns has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 27, 2025
@k8s-ci-robot
Contributor

Hi @sichenzhao. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 27, 2025
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels May 27, 2025
@ivankatliarchuk
Contributor

I didn’t spend too much time on this, so my analysis could be incorrect, but I wanted to provide an overview of why the current implementation is slow and why the proposed solution may not necessarily provide performance benefits.


Thank you for taking the time to improve the code. It's never easy—especially when the original version was written to simply work, without performance in mind.


General feedback

While I agree that performance improvements are needed, the current approach feels somewhat naive; it's more of a brute-force solution: add more CPU to gain an improvement.

Instead, I’d encourage starting with:

  • A performance audit
  • A clear refactoring strategy

The guiding principles should be:

  • Fast
  • Simple
  • Proven — with benchmarking tests showing actual performance gains

My perspective

You may not fully agree with my view, but I’m approaching this from a different angle. I see several concerns with the current solution:

  • ❌ No clear evidence that adding workers improves performance
  • ❌ Code complexity has roughly doubled, increasing maintenance burden
  • ❌ The cyclomatic and runtime complexity spans multiple layers and remains unresolved
  • ⚠️ There is great potential for optimization, especially if caching or short-circuiting can reduce inner loops.
  • ⚠️ Jumping straight into multithreading does not necessarily provide a performance benefit, while adding other costs.

Regarding the Endpoints method

The time complexity of the Endpoints method is tied to nested loops and the operations inside them. Here’s the breakdown:

  1. Listing Gateways:
    • sc.istioClient.NetworkingV1alpha3().Gateways(sc.namespace).List(...) retrieves all gateways.
    • Assume this operation is O(n), where n is the number of gateways.
  2. Filtering by Annotations:
    • sc.filterByAnnotations(gateways) iterates over all gateways.
    • Time complexity: O(n).
  3. Processing Gateways:
    • The outer loop iterates over the filtered gateways (O(n)).
    • For each gateway:
      • sc.hostNamesFromGateway(gateway) processes gateway.Spec.Servers and their Hosts. Assume there are m servers and h hosts per server; time complexity: O(m * h).
      • sc.endpointsFromGateway(...) processes hostnames (O(k), where k is the number of hostnames). Inside this, sc.targetsFromGateway(...) may involve additional loops over services or ingress objects.
  4. Sorting Endpoints:
    • The final loop sorts the Targets for each endpoint.
    • Assume there are e endpoints, and sorting each takes O(t log t), where t is the number of targets per endpoint.
    • Total complexity: O(e * t log t).

Overall Time Complexity:
Combining these, the total time complexity is approximately:

O(n) + O(n) + O(n * (m * h + k)) + O(e * t log t)

Where:

  • n = number of gateways.
  • m = number of servers per gateway.
  • h = number of hosts per server.
  • k = number of hostnames per gateway.
  • e = number of endpoints.
  • t = number of targets per endpoint.

So I've created a visual for myself to understand it better

graph TD
    A[Endpoints Method] --> B[Loop over gateways]
    B --> C[Filter by annotations]
    C --> D[Loop over filtered gateways]
    D --> E[Extract hostnames from gateway]
    E --> F[Loop over servers in gateway]
    F --> G[Loop over hosts in server]
    D --> H[Generate endpoints from gateway]
    H --> I[Loop over hostnames]
    I --> J[Extract targets from gateway]
    J --> K[Loop over ingress load balancer]
    J --> L[Loop over services]
    A --> M[Loop over endpoints to sort targets]


Main chain of nested logic:

  • Loop over gateways (top-level Gateway objects): O(g)
  • Filter by annotations (might loop over annotations per gateway): O(a) per gateway
  • Loop over filtered gateways: up to O(g) again
  • Loop over servers in gateway: O(s) per gateway
  • Loop over hosts in server: O(h) per server
  • Loop over hostnames (hostname annotation and match rule hosts): O(h')
  • Loop over ingress load balancer entries: O(lb) per target
  • Loop over services (possibly EndpointSlice or similar): O(svc)

So the potential bottlenecks are:

  • FilterByAnnotations(): full scan of annotations, maybe unnecessary if rarely used, or could be improved with a different data structure
  • ForEach Gateway → Server → Host: deep nesting, repeated string processing
  • GetTargetsFromGateway(): repeated LB & service resolution
  • Sort Targets: happens for every endpoint, even though sorting is not always required

I'm not sure that wrapping all the code in goroutines is the right approach at the moment.

We all see it differently, but these are potential improvements (see the sketch after this list):

  1. Reduce nesting in for loops; instead of nesting, unwrap them.
  2. Avoid redundant filtering. Move annotation-based filtering before iterating servers and hosts. If a gateway is irrelevant, skip it early.
  3. Extract & cache common hostnames. Use a helper to extract hostnames once per Gateway, rather than repeating it in multiple subcalls.
  4. Improve functions like TargetsFromTargetAnnotation and similar methods that have for loops.
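
For illustration, a minimal Go sketch of points 2 and 3 (skip early, extract hostnames once per gateway). The types and the matchesAnnotationFilter helper are hypothetical stand-ins, not the actual external-dns source:

```go
package main

import "fmt"

// Minimal stand-in types; the real code works with Istio Gateway objects.
type server struct{ Hosts []string }
type gateway struct {
	Namespace, Name string
	Annotations     map[string]string
	Servers         []server
}

// matchesAnnotationFilter is a hypothetical early-skip predicate.
func matchesAnnotationFilter(annotations map[string]string) bool {
	_, ok := annotations["external-dns.alpha.kubernetes.io/hostname"]
	return ok
}

// cacheHostnames filters gateways before touching servers/hosts, and
// extracts hostnames exactly once per gateway so later subcalls can
// reuse the cache instead of re-walking Servers and Hosts.
func cacheHostnames(gateways []gateway) map[string][]string {
	byGateway := make(map[string][]string, len(gateways))
	for _, gw := range gateways {
		if !matchesAnnotationFilter(gw.Annotations) {
			continue // irrelevant gateway: skipped early
		}
		var hosts []string
		for _, s := range gw.Servers {
			hosts = append(hosts, s.Hosts...)
		}
		byGateway[gw.Namespace+"/"+gw.Name] = hosts
	}
	return byGateway
}

func main() {
	gws := []gateway{{
		Namespace:   "default",
		Name:        "gw",
		Annotations: map[string]string{"external-dns.alpha.kubernetes.io/hostname": "a.example.com"},
		Servers:     []server{{Hosts: []string{"a.example.com"}}},
	}}
	fmt.Println(cacheHostnames(gws)) // map[default/gw:[a.example.com]]
}
```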

There are quite a few I/O operations that may take a while to respond; every call to the Kubernetes API adds time to the overall performance. Slicing them up and wrapping them in goroutines could help, but it may also kill/throttle the Kubernetes API server on the other side. So there are pros/cons to optimising them as well (see the rate-limiting sketch after this list):

  • sc.istioClient.NetworkingV1alpha3().Gateways(sc.namespace).List(ctx, metav1.ListOptions{})
  • sc.kubeClient.NetworkingV1().Ingresses(namespace).Get(ctx, name, metav1.GetOptions{})
  • svcInformer.Lister().Services(namespace).List(labels.Everything())
  • .... probably more ....
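
One way to fan work out without flooding the API server is client-go's built-in client-side rate limiting. A minimal sketch, assuming a kubeconfig path; the QPS/Burst values are illustrative:

```go
package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Path is illustrative; in-cluster code would use rest.InClusterConfig().
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	// Client-side rate limiting: cap the sustained request rate and burst
	// size so concurrent workers cannot overwhelm the API server.
	cfg.QPS = 20
	cfg.Burst = 40

	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Printf("client ready (QPS=%v, Burst=%v): %T\n", cfg.QPS, cfg.Burst, client)
}
```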

@szuecs
Contributor

szuecs commented May 30, 2025

just FYI #5458 (comment)

@incfly

incfly commented Jun 4, 2025

@szuecs Thanks, replied to it here: #5458 (comment)

Hi @ivankatliarchuk I don't think the issue we are facing is about algorithm-level complexity. If there are N VirtualService/Gateway resources, we have to process all of them, and it wouldn't be that slow if it were pure computation.
The actual slowness comes from processing the virtual services sequentially, where each one does both computation and I/O (an API server request like here: https://github.com/kubernetes-sigs/external-dns/blob/master/source/istio_virtualservice.go#L246). That's why adding workers definitely helps. We could share numbers from a prototype if necessary. A bounded worker pool along these lines is sketched below.
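
(A minimal sketch of the bounded-worker idea, not the PR's actual diff; processVirtualService and the limit of 4 stand in for the real per-resource work and the proposed workerCount config:)

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/sync/errgroup"
)

// processVirtualService stands in for the real per-resource work, which
// mixes computation with API-server I/O.
func processVirtualService(ctx context.Context, name string) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-time.After(50 * time.Millisecond): // simulate the I/O-bound part
	}
	fmt.Println("processed", name)
	return nil
}

func main() {
	virtualServices := []string{"vs-1", "vs-2", "vs-3", "vs-4"}

	g, ctx := errgroup.WithContext(context.Background())
	g.SetLimit(4) // workerCount: bound concurrency instead of one goroutine per resource

	for _, vs := range virtualServices {
		vs := vs // capture loop variable (pre-Go 1.22 semantics)
		g.Go(func() error {
			return processVirtualService(ctx, vs)
		})
	}
	if err := g.Wait(); err != nil {
		fmt.Println("error:", err)
	}
}
```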

Avoid Redundant Filtering. Move annotation-based filtering before iterating servers and hosts. If a gateway is irrelevant, skip it early.

Unfortunately this won't help us, because we need to process all Istio resources in the cluster.

The alternative would be to deploy multiple external-dns instances and let each of them work on a subset of Istio resources. We decided that's a sub-optimal approach because it would require dev teams to coordinate external-dns deployments with Istio CR generation, potentially even across different teams. Therefore we think it's better to make an OSS PR change so that others in the community who manage a large number of Istio resources with external-dns can also benefit.

Hope this helps!

@mloiseleur
Collaborator

API server request like ...

🤔 With an API server request for each VirtualService, parallelizing R requests across G gateways may flood the API server with G*R parallel requests, no?

Wdyt about reducing the number of required API server requests by treating them in batch instead of one by one? Something like the sketch below.
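
(A minimal sketch of the batching idea: one List per namespace cached into a map, instead of a Get per VirtualService. The fake clientset is used only so the example runs without a cluster:)

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes/fake"
)

func main() {
	// Fake clientset so the sketch runs without a live cluster.
	client := fake.NewSimpleClientset()
	ctx := context.Background()

	// Batch: a single List per namespace instead of one Get per VirtualService.
	ingresses, err := client.NetworkingV1().Ingresses("default").List(ctx, metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	// Index by name for O(1) in-memory lookups while processing resources.
	byName := make(map[string]struct{}, len(ingresses.Items))
	for _, ing := range ingresses.Items {
		byName[ing.Name] = struct{}{}
	}
	fmt.Println("cached ingresses:", len(byName))
}
```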

@ivankatliarchuk
Contributor

ivankatliarchuk commented Jun 16, 2025

By any chance, could you first make some simple improvements?

Step 1.

Step 2.
Worth trying to implement Indexers. Example commit with suggestion #5493 (comment) and implementation #5493 (comment). A minimal sketch follows below.

And make sure to understand that API calls are not only calls made to the AWS API or any other provider; internal Kubernetes calls are API calls too.
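
(A minimal sketch of the indexer idea using client-go's cache.Indexers; the index name and the annotation-keyed index function are illustrative:)

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes/fake"
	"k8s.io/client-go/tools/cache"
)

func main() {
	client := fake.NewSimpleClientset()
	factory := informers.NewSharedInformerFactory(client, 0)
	svcInformer := factory.Core().V1().Services().Informer()

	// Custom index: O(1) lookup of services by annotation value instead of
	// listing and scanning everything on each reconciliation.
	err := svcInformer.AddIndexers(cache.Indexers{
		"byHostnameAnnotation": func(obj interface{}) ([]string, error) {
			svc := obj.(*corev1.Service)
			if h, ok := svc.Annotations["external-dns.alpha.kubernetes.io/hostname"]; ok {
				return []string{h}, nil
			}
			return nil, nil
		},
	})
	if err != nil {
		panic(err)
	}

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	svcs, _ := svcInformer.GetIndexer().ByIndex("byHostnameAnnotation", "a.example.com")
	fmt.Println("matches:", len(svcs))
}
```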

Step 3

Step 4

  • We will consider this PR with workers. At the moment it is just a brute-force solution, where the bottleneck has not been identified.

@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 18, 2025