Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Antrea1.4: Windows node service route logic looks to be broken #4467

Closed
shettyg opened this issue Dec 13, 2022 · 4 comments · Fixed by #4470
Closed

Antrea1.4: Windows node service route logic looks to be broken #4467

shettyg opened this issue Dec 13, 2022 · 4 comments · Fixed by #4470
Labels
kind/bug Categorizes issue or PR as related to a bug. reported-by/end-user Issues reported by end users.

Comments

@shettyg
Copy link
Contributor

shettyg commented Dec 13, 2022

Describe the bug

There looks to be a few issues with SVC routes in windows in Antrea 1.4. Atleast by reading the code, the same issues look to exist in the master branch too.

  1. The Reconcile() function in pkg/agent/route/route_windows.go is called by passing argument "svcIPs map[string]bool". The contents of this data structure look like this:
    map[10.96.0.1:true 10.96.0.10:true 10.96.1.70:true 10.96.10.217:true

But, the "routes" from c.listRoutes() has serviceIPs with their subnet specified. Example: 10.96.0.1/32. So you will always end up with calling util.RemoveNetRoute() for each of the serviceIPs in that function

  1. The Reconcile() function and addServiceRoute() in pkg/agent/route/route_windows.go can be called in parallel threads. So while addServiceRoute() adds routes, Reconcile() deletes routes.

  2. addServiceRoute() does: obj, found := c.hostRoutes.Load(svcIP.String()), but at the end of function loads it with: c.hostRoutes.Store(route.DestinationSubnet.String(), route). So the key is different types.

To Reproduce
Restart antrea-agent and you will see that some services are missing in the windows routes.

Tangential question: Why do we need to add each service IP route? Why not just put the serviceCIDR?

CC: @wenyingd @hongliangl

@shettyg shettyg added the kind/bug Categorizes issue or PR as related to a bug. label Dec 13, 2022
@hongliangl
Copy link
Contributor

I have a PR #3889 that unifies some functions of Windows and Linux. This PR may fix the issue. In this PR,

  • when starting Antrea proxy, aggregated route is calculated from all existing svc IPs.
  • when adding a new svc, if the new aggregated route doesn't contain the svc IP, extend the aggregated route to contain the svc IP; otherwise, do nothing.
  • when deleting a svc, do nothing.

@hongliangl
Copy link
Contributor

hongliangl commented Dec 13, 2022

This PR #3889 plans to be merged in Antrea 1.11.

hongliangl added a commit to hongliangl/antrea that referenced this issue Dec 13, 2022
Fix antrea-io#4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this issue Dec 13, 2022
Fix antrea-io#4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
@hongliangl
Copy link
Contributor

It is not easy to backport for PR #3389. I made another PR #4470 to fix the issue and it is easy to backport. @shettyg

@tnqn
Copy link
Member

tnqn commented Dec 13, 2022

Tangential question: Why do we need to add each service IP route? Why not just put the serviceCIDR?

@shettyg it's because K8s doesn't expose the serviceCIDR information anywhere officially. We wanted to avoid asking user to configure the value when we already watch the service API with AntreaProxy enabled, and to avoid a potential misconfiguration. Instead, we dynamically calculate the serviceCIDR based on clusterIPs we already know, and it should be a subset of the real serviceCIDR used by K8s. But for historical reason, windows implementation didn't do it and we are going to unify the implementation via the PR @hongliangl mentioned.

hongliangl added a commit to hongliangl/antrea that referenced this issue Dec 13, 2022
Fix antrea-io#4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this issue Dec 13, 2022
Fix antrea-io#4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this issue Dec 13, 2022
Fix antrea-io#4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this issue Dec 13, 2022
Fix antrea-io#4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this issue Dec 14, 2022
Fix antrea-io#4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this issue Dec 14, 2022
Fix antrea-io#4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this issue Mar 9, 2023
Fix antrea-io#4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this issue Mar 10, 2023
Fix antrea-io#4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this issue Mar 10, 2023
Fix antrea-io#4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this issue Mar 10, 2023
Fix antrea-io#4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
hongliangl added a commit to hongliangl/antrea that referenced this issue Mar 10, 2023
Fix antrea-io#4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
tnqn pushed a commit that referenced this issue Mar 10, 2023
Fix #4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
luolanzone pushed a commit to luolanzone/antrea that referenced this issue Mar 24, 2023
Fix antrea-io#4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
luolanzone pushed a commit to luolanzone/antrea that referenced this issue Mar 24, 2023
Fix antrea-io#4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
luolanzone pushed a commit to luolanzone/antrea that referenced this issue Mar 24, 2023
Fix antrea-io#4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
luolanzone pushed a commit to luolanzone/antrea that referenced this issue Mar 27, 2023
Fix antrea-io#4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
luolanzone pushed a commit to luolanzone/antrea that referenced this issue Mar 27, 2023
Fix antrea-io#4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
luolanzone pushed a commit to luolanzone/antrea that referenced this issue Mar 28, 2023
Fix antrea-io#4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
tnqn pushed a commit that referenced this issue Mar 28, 2023
Fix #4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
tnqn pushed a commit that referenced this issue Mar 31, 2023
Fix #4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
tnqn pushed a commit that referenced this issue Mar 31, 2023
Fix #4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
tnqn pushed a commit that referenced this issue Mar 31, 2023
Fix #4467

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
jainpulkit22 pushed a commit to urharshitha/antrea that referenced this issue Apr 28, 2023
@tnqn tnqn added the reported-by/end-user Issues reported by end users. label Sep 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. reported-by/end-user Issues reported by end users.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants