Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: reconcile initial state from CRD regardless of existing podInfo #2022

Merged
merged 3 commits into from
Jun 22, 2023

Conversation

rbtr
Copy link
Contributor

@rbtr rbtr commented Jun 16, 2023

Reason for Change:

Fix for bug where CNS may incorrectly initialize after a Node reboot

Issue Fixed:

A bug in CNS initialization can lead to an IP being leaked and Assigned to Pods even though it has been deallocated.

  • CNS starts to scale down and marks the IP as ToBeDeleted for DNC to deallocate
  • the Node reboots / DNC deallocates the IP (in parallel)
  • CNS restarts. During CNS initialization, since there are no Pods running, an initialization step is skipped, and the ToBeDeleted IP is incorrectly marked as Available (it is now leaked)
  • the leaked IP appears Available. As Pods are started on the Node it is incorrectly Assigned to a Pod
  • normal IP pool scaling operations start to fail due to the presence of the leaked IP

Requirements:

Notes:

@rbtr rbtr force-pushed the fix/pending-ip-init branch 3 times, most recently from 6ea3d8e to 6687fb0 Compare June 19, 2023 16:58
@rbtr rbtr marked this pull request as ready for review June 19, 2023 23:02
@rbtr rbtr requested a review from a team as a code owner June 19, 2023 23:02
@rbtr rbtr requested a review from thatmattlong June 19, 2023 23:02
@neaggarwMS neaggarwMS changed the title fix: reconcile initial state from CRD regardless of existing podInfo [CNS] fix: reconcile initial state from CRD regardless of existing podInfo Jun 19, 2023
@rbtr rbtr self-assigned this Jun 20, 2023
@rbtr rbtr added cns Related to CNS. fix Fixes something. labels Jun 20, 2023
@rbtr rbtr changed the title [CNS] fix: reconcile initial state from CRD regardless of existing podInfo fix: reconcile initial state from CRD regardless of existing podInfo Jun 20, 2023
@rbtr rbtr enabled auto-merge (squash) June 20, 2023 17:29
@rbtr rbtr requested review from matmerr and a team as code owners June 20, 2023 18:46
rbtr added 3 commits June 22, 2023 10:14
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
@rbtr rbtr disabled auto-merge June 22, 2023 19:11
@rbtr rbtr merged commit 21b5bbe into Azure:master Jun 22, 2023
@rbtr rbtr deleted the fix/pending-ip-init branch June 22, 2023 19:11
}
return errors.Wrap(err, "failed to initialize CNS state")
}, retry.Context(ctx), retry.Delay(initCNSInitalDelay), retry.MaxDelay(time.Minute))
logger.Printf("Reconciling initial CNS state")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add podInfoProvider stat in the log? or add at least length

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is logged once we get to reconcileNCState

jaer-tsun pushed a commit to jaer-tsun/azure-container-networking that referenced this pull request Jul 17, 2023
…zure#2022)

* fix: reconcile initial state from CRD regardless of existing podInfo

Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>

* fix: add test for no state and pending release in NNC

Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>

* fix: add test for restoring state and pending release in NNC

Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cns Related to CNS. fix Fixes something.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants