feat: CNS/CNI async pod delete #2183
I think I reviewed it early on, but can you ensure the comments are addressed?
Adding requested documentation that fsnotify is widely used by k8s: it's also used by containerd to watch for the CNI conflists.
/azp run
Azure Pipelines successfully started running 3 pipeline(s).
Core structure and functionality lgtm. Logs need to be improved (formatting is handled by the logger, not in the strings), we've added too many ways to configure this, and I have a bunch of suggestions on Go style and best practices.
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
01cb08a to 71ff61b
* azure-ipam changes for async delete
* cilium cnsconfig change for tests
* address comments: update config value and add log line
* matching ipam changes to #2183
* include containerID in log msg
* update addFile args
* return on failure to add file
* update go.mod
* initial changes for cni/cns delete deadlock
* add logs and set watcher path
* working fswatcher, removing extra debug lines
* watcher changes for azure-ipam
* remove additional logger from fsnotify and address comments
* /deleteIDs directory as part of cnsconfig
* add feature flag for async delete
* adds some unit test + remove changes for azure-ipam(split pr, dependency conflicts)
* update ut
* update uts
* swift configmap update
* fix configmap for test
* addressing comments
* fix lint
* adding cause to connection error struct
* connectionerr lint
* addressing comments, change watchfs to watcher method
* add ctx to releaseIP func
* log containerID in failure to add watcher, exit select if context is cancelled
* fix logs in network.go after rebase
* catch release ip error in invoker_cns.go
* retry on failure to release ip
* lint fix
* rework asyncdelete watcher
  Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
* include podinterfaceID in file for releaseIP
* close file before delete
---------
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
Co-authored-by: Evan Baker <rbtr@users.noreply.github.com>
feat: CNS/CNI async pod delete (#2183)
* initial changes for cni/cns delete deadlock
* add logs and set watcher path
* working fswatcher, removing extra debug lines
* watcher changes for azure-ipam
* remove additional logger from fsnotify and address comments
* /deleteIDs directory as part of cnsconfig
* add feature flag for async delete
* adds some unit test + remove changes for azure-ipam(split pr, dependency conflicts)
* update ut
* update uts
* swift configmap update
* fix configmap for test
* addressing comments
* fix lint
* adding cause to connection error struct
* connectionerr lint
* addressing comments, change watchfs to watcher method
* add ctx to releaseIP func
* log containerID in failure to add watcher, exit select if context is cancelled
* fix logs in network.go after rebase
* catch release ip error in invoker_cns.go
* retry on failure to release ip
* lint fix
* rework asyncdelete watcher
* include podinterfaceID in file for releaseIP
* close file before delete
---------
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
Co-authored-by: Camryn Lee <31013536+camrynl@users.noreply.github.com>
commit 1b22180
Author: Camryn Lee <31013536+camrynl@users.noreply.github.com>
Date: Thu Sep 21 07:45:29 2023 -0700
CNI async delete after ReleaseIPAddress (#2232)

commit 2791885
Author: Camryn Lee <31013536+camrynl@users.noreply.github.com>
Date: Wed Sep 20 10:28:03 2023 -0700
feat: CNS/CNI async delete changes for azure-ipam (#2201)

commit 4fa3bf4
Author: Camryn Lee <31013536+camrynl@users.noreply.github.com>
Date: Mon Sep 18 21:04:38 2023 -0700
feat: CNS/CNI async pod delete (#2183)

commit d17079b
Author: Evan Baker <rbtr@users.noreply.github.com>
Date: Fri Aug 18 16:49:32 2023 -0500
docs: proposal for async pod delete handling (#2138)
Reason for Change:
This change avoids a deadlock when a node is fully saturated and CNS is down.
When CNI cannot release a pod's IPs because CNS is unavailable, it stores the intended delete as a file (named by containerID) in a directory on the filesystem. The filesystem watcher starts when CNS starts and removes any items already in the directory. It then continues to check for new files to delete as CNS runs.
Issue Fixed:
Requirements:
Notes:
Tested with byocni swift, byocni cilium, and overlay + windows nodepool.
Split into 2 PRs: the azure-ipam changes will come after this change is merged, due to dependency conflicts.
Async delete documentation here.