Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Windows][AntreaProxy] Fix route reconcile() #2939

Merged
merged 1 commit into from Oct 30, 2021

Conversation

lzhecheng
Copy link
Contributor

@lzhecheng lzhecheng commented Oct 27, 2021

listRoutes() only considers routes with antrea-gw0 as interface
so no need to consider kube-proxy Service routes. Also for AntreaProxy
Service routes, all of them should be returned and be compared with
desiredClusterIPSvcIPs.

NodePort and LoadBalancer Services also have ClusterIPs so the check
if Service is ClusterIP Service is unnecessary.

Signed-off-by: Zhecheng Li lzhecheng@vmware.com

@lzhecheng lzhecheng added area/proxy Issues or PRs related to proxy functions in Antrea kind/bug Categorizes issue or PR as related to a bug. labels Oct 27, 2021
@codecov-commenter
Copy link

codecov-commenter commented Oct 27, 2021

Codecov Report

Merging #2939 (b5819eb) into main (12f8395) will decrease coverage by 2.35%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2939      +/-   ##
==========================================
- Coverage   61.61%   59.25%   -2.36%     
==========================================
  Files         283      283              
  Lines       23810    23811       +1     
==========================================
- Hits        14671    14110     -561     
- Misses       7558     8198     +640     
+ Partials     1581     1503      -78     
Flag Coverage Δ
kind-e2e-tests 46.35% <0.00%> (-2.89%) ⬇️
unit-tests 40.59% <0.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...gent/controller/noderoute/node_route_controller.go 54.91% <0.00%> (+0.14%) ⬆️
...g/agent/apiserver/handlers/featuregates/handler.go 0.00% <0.00%> (-82.36%) ⬇️
...kg/apiserver/registry/system/supportbundle/rest.go 19.54% <0.00%> (-55.18%) ⬇️
pkg/support/dump_others.go 0.00% <0.00%> (-51.73%) ⬇️
pkg/support/dump.go 8.13% <0.00%> (-49.60%) ⬇️
...g/agent/apiserver/handlers/addressgroup/handler.go 0.00% <0.00%> (-40.00%) ⬇️
...agent/apiserver/handlers/appliedtogroup/handler.go 0.00% <0.00%> (-40.00%) ⬇️
pkg/apiserver/handlers/loglevel/handler.go 0.00% <0.00%> (-38.47%) ⬇️
pkg/agent/util/net.go 16.32% <0.00%> (-22.45%) ⬇️
pkg/agent/route/route_linux.go 28.48% <0.00%> (-21.45%) ⬇️
... and 31 more

wenyingd
wenyingd previously approved these changes Oct 27, 2021
Copy link
Contributor

@wenyingd wenyingd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lzhecheng
Copy link
Contributor Author

/test-windows-all /test-all

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand what the PR fixes from the description. I can see it originally matched kube-proxy route and now it matches antrea-proxy route, but it doesn't say why we should skip the latter now. We don't expect listRoutes return service routes?

@lzhecheng
Copy link
Contributor Author

@tnqn The routes returned by listRoutes() are to be deleted so we should skip those AntreaProxy Service routes.

Before AntreaProxy change, the if statement in listRoutes() for kube-proxy routes is actually unnecessary. The reason is that the first if statement in listRoutes() checks if gateway is antrea-gw0 and kube-proxy routes are actually not.

@tnqn
Copy link
Member

tnqn commented Oct 28, 2021

@tnqn The routes returned by listRoutes() are to be deleted so we should skip those AntreaProxy Service routes.

Before AntreaProxy change, the if statement in listRoutes() for kube-proxy routes is actually unnecessary. The reason is that the first if statement in listRoutes() checks if gateway is antrea-gw0 and kube-proxy routes are actually not.

Then what's the point of the below code that checks svc IP and how will it delete stale Service routes?

func (c *Client) Reconcile(podCIDRs []string, svcIPs map[string]bool) error {
desiredPodCIDRs := sets.NewString(podCIDRs...)
routes, err := c.listRoutes()
if err != nil {
return err
}
for dst, rt := range routes {
if desiredPodCIDRs.Has(dst) {
c.hostRoutes.Store(dst, rt)
continue
}
if _, ok := svcIPs[dst]; ok {
c.hostRoutes.Store(dst, rt)
continue
}
err := util.RemoveNetRoute(rt)
if err != nil {
return err
}
}
return nil
}

@lzhecheng
Copy link
Contributor Author

@tnqn stale Service routes, flows should be removed here:

if needRemoval {

The svcIPs in your mentioned code block refer to this route:

// The route for virtual IP -> antrea-gw0 should be always kept.

So, route_windows.go should not remove routes related to Service.

@tnqn
Copy link
Member

tnqn commented Oct 29, 2021

  1. Route deletion triggered by AntreaProxy is normal workflow that agent receives the service deletion event. The point of Reconcile is to delete stale routes for the services that are deleted when the agent is not running.
  2. The code is self-contradictory: Reconcile takes all clusterIPs as the 2nd argument and uses it to exclude valid service routes but listRoutes doesn't even return any service routes. Why [Windows] Use antrea-proxy for Cluster IP Service traffic from Windows host #2235 even added the second argument?

listRoutes() only considers routes with antrea-gw0 as interface
so no need to consider kube-proxy Service routes. Also for AntreaProxy
Service routes, all of them should be returned and be compared with
desiredClusterIPSvcIPs.

NodePort and LoadBalancer Services also have ClusterIPs so the check
if Service is ClusterIP Service is unnecessary.

Signed-off-by: Zhecheng Li <lzhecheng@vmware.com>
@lzhecheng lzhecheng changed the title [Windows][AntreaProxy] Fix if statement about AntreaProxy Service routes [Windows][AntreaProxy] Fix route reconcile() Oct 29, 2021
@lzhecheng
Copy link
Contributor Author

@tnqn you're right. I let listRoutes() returns AntreaProxy Service routes as well.

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member

tnqn commented Oct 29, 2021

/test-all

@tnqn
Copy link
Member

tnqn commented Oct 29, 2021

/test-windows-all

@tnqn
Copy link
Member

tnqn commented Oct 29, 2021

@lzhecheng "jenkins-windows-proxyall-e2e" failed, could you check?

@lzhecheng
Copy link
Contributor Author

@tnqn it's not "failed" but not triggered. Currently, test-windows-all doesn't has proxyall-e2e.
/test-windows-proxyall-e2e /test-windows-conformance /test-windows-networkpolicy

@lzhecheng
Copy link
Contributor Author

@tnqn as your can see, current design is test-windows-all for proxyall disabled and test-windows-proxyall-all for proxyall enabled. Do you think it is better to include proxy-e2e in test-windows-all and remove test-windows-proxyall-all since no proxyall-conformance and proxyall-netpol?

https://github.com/antrea-io/antrea/blob/main/CONTRIBUTING.md#getting-your-pr-verified-by-ci

@tnqn
Copy link
Member

tnqn commented Oct 30, 2021

@lzhecheng Including proxyAll test in test-windows-all sounds good to me.

@tnqn tnqn merged commit 11d1370 into antrea-io:main Oct 30, 2021
@lzhecheng lzhecheng deleted the antreaproxy-win-fix branch November 1, 2021 01:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/proxy Issues or PRs related to proxy functions in Antrea kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants