Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix AddressGroup memberKey not updated on pod IP update #1808

Merged
merged 2 commits into from
Feb 3, 2021

Conversation

Dyanngg
Copy link
Contributor

@Dyanngg Dyanngg commented Feb 1, 2021

This PR fixes #1807.

PR #1467 unified GroupMemberPod and GroupMember into a single struct. Before the unification, PodReference is not set for GroupMemberPods used in AddressGroups, but set for GroupMemberPods used in AppliedToGroups. In #1467, In order to identify whether a GroupMember contains Pod or EE when converting it to GroupMemberPod (for backward compatibility), PodRef was added to GroupMembers all the time instead.

This leads to two separate issues. First issue is #1587, where agents running v1beta1 do not expect PodReference in addressGroup members. This was fixed by @tnqn thru not including podRef in addressGroup conversion function #1586.

Second issue is #1807, where addressGroup updates do not propagate to agents once the Pod IPs are changed after Node restart. This is because if the PodRef is always set for GroupMembers, the memberKey will always be namespaced name, and the key does not change on Pod IP updates.

To fix the second issue, this PR modifies the way normalizeGroupMember function is implemented. The new hashed key of a GroupMember will include all the fields in a GroupMember if that field is set. Namely, for GroupMembers used in AppliedToGroups, the members will have Pod/ExternalEntityReference set. These members will be identified by namespaced-name. For GroupMembers used in AddressGroups, the members will have BOTH Pod/ExternalEntityReference and IPs set. These members will be identified by entity reference + IPs.

For this change to be backward compatible, we need to ensure that across Antrea controller upgrade, all AddressGroups have an updated name. Otherwise, on the old side, AddressGroup update events will not work as expected, since the GroupMember key for the same GroupMembers will be different. This will lead to the same issue pointed out in #1587, where the same addressGroup member IPs gets deleted after being added.

Hence, even if two addressGroups have the same members (different keys) before and after controller upgrade, the Antrea agent needs to treat it as AddressGroup add and delete events, rather than an update. To that end, the controller uses a new uuidNamespace to ensure that for the AddressGroups/AppliedToGroups with same members, distinct group names will be generated across the update.

In addition, this PR also fixes a potential issue in Antrea agent in terms of calculating AddressGroup updates in reconciler (this was the issue which led to #1587, but wasn't address then, because in upgrade scenarios agent code cannot be patched). When calculating the added and deleted addresses in an AddressGroup, only IPs should be taken into account. Performing a diff on GroupMembers and then take the diff IPs is error-prone when the agent and controller have different hash functions for GroupMembers.

@codecov-io
Copy link

codecov-io commented Feb 2, 2021

Codecov Report

❗ No coverage uploaded for pull request base (main@d84c470). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1808   +/-   ##
=======================================
  Coverage        ?   53.36%           
=======================================
  Files           ?      193           
  Lines           ?    16435           
  Branches        ?        0           
=======================================
  Hits            ?     8770           
  Misses          ?     6572           
  Partials        ?     1093           
Flag Coverage Δ
e2e-tests 45.33% <0.00%> (?)
kind-e2e-tests 49.57% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

@Dyanngg Dyanngg changed the title Fix GroupMember entity reference mix-up in AppliedTo and AddressGroups [WIP] Fix GroupMember entity reference mix-up in AppliedTo and AddressGroups Feb 2, 2021
@Dyanngg Dyanngg force-pushed the fix-group-member-podref branch 2 times, most recently from ee57d60 to 450cb03 Compare February 2, 2021 05:27
@Dyanngg Dyanngg changed the title [WIP] Fix GroupMember entity reference mix-up in AppliedTo and AddressGroups Fix GroupMember entity reference mix-up in AppliedTo and AddressGroups Feb 2, 2021
@Dyanngg Dyanngg changed the title Fix GroupMember entity reference mix-up in AppliedTo and AddressGroups Fix AddressGroup memberKey not updated on pod IP update Feb 2, 2021
tnqn
tnqn previously approved these changes Feb 2, 2021
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the quick fix.

pkg/apis/controlplane/sets.go Outdated Show resolved Hide resolved
@Dyanngg
Copy link
Contributor Author

Dyanngg commented Feb 2, 2021

/test-all

@Dyanngg Dyanngg requested a review from tnqn February 2, 2021 07:54
tnqn
tnqn previously approved these changes Feb 2, 2021
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I forgot to mention we could have an e2e case for this scenario. I think it can reproduced by killing sandbox container of a Pod.

abhiraut
abhiraut previously approved these changes Feb 2, 2021
pkg/apis/controlplane/sets.go Show resolved Hide resolved
@antoninbas
Copy link
Contributor

LGTM, I forgot to mention we could have an e2e case for this scenario. I think it can reproduced by killing sandbox container of a Pod.

sounds like a unit test which ensures that the controller sends an update when the IP address for the Pod changes would be sufficient. Do we have such a test in the PR already?

@Dyanngg Dyanngg dismissed stale reviews from abhiraut and tnqn via cf04f4d February 2, 2021 21:57
@Dyanngg
Copy link
Contributor Author

Dyanngg commented Feb 2, 2021

sounds like a unit test which ensures that the controller sends an update when the IP address for the Pod changes would be sufficient. Do we have such a test in the PR already?

Thanks for the suggestion. Just added such test in addressgroup_test.go and confirmed that the case would fail without this patch and pass with the PR.

@Dyanngg Dyanngg requested a review from tnqn February 2, 2021 22:08
@Dyanngg Dyanngg requested a review from abhiraut February 2, 2021 22:08
@Dyanngg
Copy link
Contributor Author

Dyanngg commented Feb 2, 2021

/test-all

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the quick fix and for adding the test

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn tnqn merged commit af069fa into antrea-io:main Feb 3, 2021
@Dyanngg Dyanngg deleted the fix-group-member-podref branch February 3, 2021 03:40
tnqn pushed a commit to tnqn/antrea that referenced this pull request Feb 3, 2021
* Fix identical GroupMember key across pod IP update

* Add UT and improve agent AddressGroup update logic
antoninbas pushed a commit to antoninbas/antrea that referenced this pull request Feb 10, 2021
* Fix identical GroupMember key across pod IP update

* Add UT and improve agent AddressGroup update logic
antoninbas pushed a commit to antoninbas/antrea that referenced this pull request Feb 11, 2021
* Fix identical GroupMember key across pod IP update

* Add UT and improve agent AddressGroup update logic
antoninbas pushed a commit that referenced this pull request Feb 11, 2021
* Fix identical GroupMember key across pod IP update

* Add UT and improve agent AddressGroup update logic
antoninbas pushed a commit that referenced this pull request Feb 11, 2021
* Fix identical GroupMember key across pod IP update

* Add UT and improve agent AddressGroup update logic
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AddressGroup is not updated when Pod IP changes due to Node restart
6 participants