feat: Antrea plugin support in AKS Engine #2407
💖 Thanks for opening your first pull request! 💖 We use semantic commit messages to streamline the release process. Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix. Examples of commit messages with semantic prefixes: - |
/assign @jackfrancis |
a03afb8
to
8aa48b8
Compare
parts/k8s/containeraddons/kubernetesmasteraddons-antrea-daemonset.yaml
@@ -141,6 +141,7 @@ aks-engine generate --set agentPoolProfiles[0].count=5,agentPoolProfiles[1].name

* To enable the optional network policy enforcement using calico, you have to set the parameter during this step according to this [guide](../topics/features.md#optional-enable-network-policy-enforcement-using-calico)
* To enable the optional network policy enforcement using cilium, you have to set the parameter during this step according to this [guide](../topics/features.md#optional-enable-network-policy-enforcement-using-cilium)
* To enable the optional network policy enforcement using antrea, you have to set the parameter during this step according to this [guide](../topics/features.md#optional-enable-network-policy-enforcement-using-antrea)
This sounds more like a policy-only mode. However, the aim of this patch is to support Antrea as a network plugin, not just for policy, no?
This section is for policy mode only. For networking there is a separate doc.
Correct. What I was looking for was a way to ensure that we do not convey the message that Antrea can be used in policy-only mode alongside another networking CNI, and that there should be some validation for this. Jack answered that in his comment regarding validation, so we should be good.
Actually, 3 combinations were added in the validation:
- Just NetworkPlugin, without any NetworkPolicy. If any policy other than antrea is specified, it's a validation error.
- Just NetworkPolicy, without any NetworkPlugin. If any network plugin other than antrea is specified, it's a validation error.
- Both NetworkPlugin and NetworkPolicy set to antrea.
We would remove option 1. Hope that makes sense.
I removed option 1 as a valid option following the recent rationalization of cilium, which follows a similar pattern.
Hello again, I've been enumerating through the existing addons and in my refactor of this one it appears we're in a similar functional space as antrea: Specifically, it appears that like the existing cilium implementation, antrea delivers both IPAM and a NetworkPolicy functionality, and those will be represented by Is that correct? If so, I would advocate that we deliver the antrea specification as follows:
In other words, we want to always deliver a cluster configuration that looks like:
And we permit the user only specifying the networkPolicy part, automatically filling in the antrea networkPlugin configuration for him/her. But we don't permit just an antrea networkPlugin cluster configuration, because that may mislead the user into thinking that we support just an antrea CNI implementation for IPAM without NetworkPolicy, which is in fact not true. Does that all sound right? |
@jackfrancis your suggestion makes sense to me. We do not support a model that uses Antrea for IPAM and connectivity only while another CNI handles NetworkPolicy.
@jackfrancis Would you update the patch with the validation error? Your suggestion sounds good to me too.
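The rule agreed on above can be sketched as follows. This is a minimal illustration, not the aks-engine validation code: `resolveAntrea` and the string constant `antrea` are hypothetical names, and the sketch assumes that an empty string means "not specified". A policy-only configuration has the plugin filled in automatically, a plugin-only configuration is rejected, and any other plugin combined with the antrea policy is rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// antrea is the value assumed here for both networkPlugin and networkPolicy.
const antrea = "antrea"

// resolveAntrea is a hypothetical helper that applies the rule from the
// discussion: networkPolicy=antrea implies networkPlugin=antrea, and
// networkPlugin=antrea on its own is not a supported configuration.
func resolveAntrea(networkPlugin, networkPolicy string) (plugin, policy string, err error) {
	switch {
	case networkPolicy == antrea && (networkPlugin == "" || networkPlugin == antrea):
		// Policy only, or both set: always deliver plugin and policy together.
		return antrea, antrea, nil
	case networkPolicy == antrea:
		// Antrea policy on top of a different CNI is not supported.
		return "", "", fmt.Errorf("networkPolicy %q requires networkPlugin %q, got %q", antrea, antrea, networkPlugin)
	case networkPlugin == antrea:
		// Antrea as CNI without its NetworkPolicy is rejected to avoid misleading users.
		return "", "", errors.New("networkPlugin \"antrea\" requires networkPolicy \"antrea\"")
	default:
		// Configurations that do not involve antrea pass through unchanged.
		return networkPlugin, networkPolicy, nil
	}
}

func main() {
	cases := [][2]string{{"", "antrea"}, {"antrea", "antrea"}, {"antrea", ""}, {"azure", "antrea"}}
	for _, c := range cases {
		plugin, policy, err := resolveAntrea(c[0], c[1])
		fmt.Printf("in: plugin=%q policy=%q -> out: plugin=%q policy=%q err=%v\n", c[0], c[1], plugin, policy, err)
	}
}
```

Returning the normalized pair rather than mutating the input keeps the defaulting and the validation visible in one place; the real code may well split these into separate defaults and validation steps.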
8aa48b8
to
7634ece
Compare
Hi @reachjainrahul and folks, I've rebased this PR and done a bit of cleanup so it resembles other addons (specifically cilium). I have a passing E2E test, so I think this is ready to merge. However, I am seeing this on my cluster:
Note the 8 restarts for both the controller-manager and scheduler pods. Both seem to be exhibiting a similar failure symptom:
Have you seen this in any of your tests? As you can imagine, pod scheduling/reconciliation is significantly de-optimized on a cluster like this, as the key componentry responsible for that is regularly offline. |
/azp run pr-e2e |
Azure Pipelines successfully started running 1 pipeline(s). |
@@ -312,6 +312,11 @@ ensureKubelet() {
sleep 3
done
{{end}}
{{if HasAntreaNetworkPolicy}}
while [ ! -f /etc/cni/net.d/10-antrea.conf ]; do
Note: if the daemonset implementation ever changes the name of this CNI config file, we'll have to update this file-wait implementation.
/lgtm
Codecov Report
@@ Coverage Diff @@
## master #2407 +/- ##
==========================================
+ Coverage 72.63% 72.69% +0.06%
==========================================
Files 130 130
Lines 24005 24096 +91
==========================================
+ Hits 17435 17517 +82
- Misses 5544 5553 +9
Partials 1026 1026 |
readOnly: true
- command:
- start_ovs
image: antrea/antrea-ubuntu:latest
We should perhaps pin this to a version and not latest? E.g. antrea/antrea-ubuntu:v0.2.0
Agreed, an immutable reference would be preferable. Doesn't have to block merge, though, at this phase.
Also, the references to the images are here:
https://github.com/Azure/aks-engine/pull/2407/files#diff-bb9b6cb6f08a63800be959ba49eaf714R42
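A check for the pinning concern raised here could look roughly like the sketch below. `isPinned` is a hypothetical helper, not something in aks-engine, and it uses a deliberately simplified reading of image references: a reference counts as pinned if it carries a digest or an explicit tag other than `latest`.

```go
package main

import (
	"fmt"
	"strings"
)

// isPinned reports whether an image reference names a fixed version.
// Simplified rule: a digest ("@sha256:...") or any explicit tag other than
// "latest" counts as pinned; a missing tag is treated like "latest".
func isPinned(ref string) bool {
	if strings.Contains(ref, "@") {
		return true // digest references are immutable
	}
	// Look for the tag only in the last path segment, so that a registry
	// port such as "host:5000/image" is not mistaken for a tag.
	name := ref[strings.LastIndex(ref, "/")+1:]
	i := strings.LastIndex(name, ":")
	if i < 0 {
		return false
	}
	tag := name[i+1:]
	return tag != "" && tag != "latest"
}

func main() {
	for _, ref := range []string{
		"antrea/antrea-ubuntu:latest",
		"antrea/antrea-ubuntu:v0.2.0",
	} {
		fmt.Printf("%s pinned=%v\n", ref, isPinned(ref))
	}
}
```

A version tag can still be re-pushed by the image publisher, so only the digest form is strictly immutable; the tag check is the weaker, more convenient guard.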
"env": {
},
"options": {
"allowedOrchestratorVersions": ["1.13", "1.14", "1.15", "1.16"]
@reachjainrahul or @abhiraut, do we know for sure that antrea doesn't work w/ k8s 1.17?
In theory, yes, it should work. However, let us update this only after we run some CI tests against 1.17? @jianjuns, any opinion?
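For readers unfamiliar with the `allowedOrchestratorVersions` option, the sketch below shows one plausible way such a list gates a test. How the aks-engine E2E harness actually interprets the field is an assumption here, and `minorOf` and `isAllowed` are hypothetical helpers: a full version is reduced to its major.minor prefix and looked up in the list.

```go
package main

import (
	"fmt"
	"strings"
)

// minorOf reduces a version such as "1.16.4" to its "major.minor" prefix.
// Inputs without at least two dot-separated parts are returned unchanged.
func minorOf(version string) string {
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return version
	}
	return parts[0] + "." + parts[1]
}

// isAllowed reports whether the major.minor of version appears in allowed.
func isAllowed(version string, allowed []string) bool {
	minor := minorOf(version)
	for _, a := range allowed {
		if a == minor {
			return true
		}
	}
	return false
}

func main() {
	// The list from the test configuration in this PR.
	allowed := []string{"1.13", "1.14", "1.15", "1.16"}
	for _, v := range []string{"1.16.4", "1.17.0"} {
		fmt.Printf("%s allowed=%v\n", v, isAllowed(v, allowed))
	}
}
```

Under this reading, adding "1.17" to the list is all it would take to include that release once CI confirms Antrea works on it.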
/azp run pr-e2e |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run pr-e2e |
Azure Pipelines successfully started running 1 pipeline(s). |
pkg/engine/params_k8s.go
@@ -122,6 +121,7 @@ func assignKubernetesParameters(properties *api.Properties, parametersMap params
// Kubernetes node binaries as packaged by upstream kubernetes
// example at https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.11.md#node-binaries-1
addValue(parametersMap, "windowsKubeBinariesURL", kubernetesConfig.WindowsNodeBinariesURL)
addValue(parametersMap, "kubeServiceCidr", kubernetesConfig.ServiceCIDR)
Why this revert? Antrea requires ServiceCIDR. Adding it here will set it for Windows VMs only, right?
pkg/engine/template_generator.go
"HasAntreaNetworkPlugin": func() bool {
return cs.Properties.OrchestratorProfile.KubernetesConfig.NetworkPlugin == NetworkPluginAntrea
"HasAntreaNetworkPolicy": func() bool {
return cs.Properties.OrchestratorProfile.KubernetesConfig.NetworkPlugin == NetworkPolicyAntrea
Shouldn't this be a NetworkPolicy check?
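The change the reviewer is asking about would look roughly like this. The sketch uses a simplified stand-in for the aks-engine types: the `KubernetesConfig` struct below has only the two relevant fields, and the constant value "antrea" is assumed. The point is that a function named after the policy should compare the `NetworkPolicy` field.

```go
package main

import "fmt"

// NetworkPolicyAntrea mirrors the constant name used in the diff; its value
// is assumed here.
const NetworkPolicyAntrea = "antrea"

// KubernetesConfig is a simplified stand-in with only the fields needed
// for this illustration.
type KubernetesConfig struct {
	NetworkPlugin string
	NetworkPolicy string
}

// hasAntreaNetworkPolicy compares the NetworkPolicy field, which is what
// the function name promises.
func hasAntreaNetworkPolicy(kc KubernetesConfig) bool {
	return kc.NetworkPolicy == NetworkPolicyAntrea
}

func main() {
	// A user-supplied config that sets only the policy: a check against
	// NetworkPlugin would report false here, the policy check reports true.
	kc := KubernetesConfig{NetworkPolicy: "antrea"}
	fmt.Println(hasAntreaNetworkPolicy(kc))
}
```

If defaults always set both fields to antrea, the two comparisons give the same answer in practice, but the policy-based check does not depend on that ordering.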
@@ -1166,7 +1159,7 @@
"kubeConfigCertificate": "[parameters('kubeConfigCertificate')]",
"kubeConfigPrivateKey": "[parameters('kubeConfigPrivateKey')]",
"kubeDnsServiceIp": "10.0.0.10",
"kubeServiceCidr": "[parameters('kubeServiceCidr')]",
"kubeServiceCidr": "10.0.0.0/16",
Is this revert intentional?
pkg/api/k8s_versions.go
@@ -39,6 +39,10 @@ const (
ciliumCleanStateImageReference string = "docker.io/cilium/cilium-init:2018-10-16"
ciliumOperatorImageReference string = "docker.io/cilium/operator:v1.4"
ciliumEtcdOperatorImageReference string = "docker.io/cilium/cilium-etcd-operator:v2.0.5"
antreaControllerImageReference string = "antrea/antrea-ubuntu:latest"
Antrea 0.2.0 was released today. Could we please update this to antrea/antrea-ubuntu:v0.2.0?
I didn't see this in my test, which I ran 2 weeks ago. @jianjuns, any known issue with the latest antrea? My antrea yaml referred to master.
/azp run pr-e2e |
Commenter does not have sufficient privileges for PR 2407 in repo Azure/aks-engine |
I ran the E2E test with Antrea 0.2.0 and didn't see the issue.
/azp run pr-e2e |
Azure Pipelines successfully started running 1 pipeline(s). |
/lgtm
Congrats on merging your first pull request! 🎉🎉🎉 |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jackfrancis, reachjainrahul The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* Rashmi/ciprod12042019 (#3) improvement: Update Containermonitoring addon for december release (#1) * Fixing yaml indentation issues for omsagent (#4) Fixing yaml indentation issues for omsagent * Fixing more indentation issues in omsagent yaml (#5) Fixing more indentation issues in omsagent yaml * updating templates_generated file * Fix: Updating generated file to fix unit tests (#6) fix: Updating generated file to fix unit tests * Adding test coverage for 1.16 and 1.17 (#7) Adding test coverage for 1.16 and 1.17 * Merging changes for omsagent latest release - ciprod01072020 after syncing from remote master(#8) * chore: use go template comments for generate proxy certs script (#2336) * fix: fix ARM dependency issues with vm user-specified extensions on node pools (#2398) * fix: fix ARM dependency issues if many extensions are specified for a node profile * fix scale up case for windows vhd case. (#2483) * refactor: make cilium addon user-configurable (#2480) * refactor: make cilium addon user-configurable * chore: clarify that cilium doesn't work w/ 1.16 and above, add validation * test: addons UT * test: go template UT * ci: use Standard_D8_v3 for cilium test, only run NetworkPolicy tests * fix: error message language * chore: remove debug fmt.Println * ci: revert back to Standard_D2_v3 * chore: upgrade cni-plugins to v0.7.6 (#2484) * fix: hard-coding hyper-v generation when using VHD URls as a quick unblock (#2487) * feat: Configuring docker log rotation for Windows nodes (#2478) * feat: Antrea plugin support in AKS Engine (#2407) * Antrea plugin support in AKS Engine * chore: clean up * chore: use ContainerImage * chore: generated code * refactor: Updating antrea yaml to 0.2.0 Co-authored-by: Jack Francis <jackfrancis@gmail.com> * chore: lint (#2493) * test: revert change to default kubernetes.json api model example (#2494) * chore: update cloud-provider-azure components to v0.4.0 (#2473) * chore: update cloud-provider-azure components to v0.4.0 See 
https://github.com/kubernetes-sigs/cloud-provider-azure/releases/tag/v0.4.0 * refactor: strip MCR constant to base hostname of URL * fix: fetch Azure cloud-manager images from /oss/kubernetes/ * refactor: make audit-policy and azure-cloud-provider addons user-configurable (#2496) * chore: pre-pull k8s v1.15.7-azs (#2490) * fix: Fix some path handling in collect-windows-logs script (#2488) * docs: remove mentions of old orchestrators (#2501) * chore: Targeting dec patches for windows VHD (#2505) * refactor: move StorageClass into azure-cloud-provider addon (#2497) * add "Standard_DS3_v2" to "AcceleratedNetworking" supported list (#2509) * ci: collect logs during E2E runs (#2520) * refactor: user-configurable flannel and scheduled maintenance addons (#2517) * chore: update Azure NPM to v1.0.31 (#2521) * feat: add support for Kubernetes 1.18.0-alpha.1 (#2503) * feat: add support for Kubernetes 1.18.0-alpha.1 See https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.18.md#changelog-since-v1170 * test: add 1.18 to Jenkinsfile * ci: disable kms for 1.18 * chore: move flannel 1.18 spec to containeraddons * chore: generated code * fix: use new cloudprovider implementation for 1.18 Co-authored-by: Jack Francis <jackfrancis@gmail.com> * test: don't test non-working >= 1.16 flannel + docker (#2524) * fix: apply new master node labels for k8s v1.18+ compatibility (#2467) * fix: apply new master node labels for k8s v1.18+ compatibility * test: check master labels in the future for back-compat * feat: cleaning up old kubelet/kubeproxy logs for Windows nodes (#2504) * feat: cleaning up old kubelet/kubeproxy logs for Windows nodes * Fixing path to look for logs * generated files * refactor: standardize to "addons", deprecate "containeraddons" (#2525) * fix: configure addons before setting kubelet config (#2513) * chore: update addon-resizer (#2527) See https://github.com/kubernetes/autoscaler/releases/tag/addon-resizer-1.8.7 * fix: aci-connector region is ignored 
(#2535) * test: use LOCATION env var for api model in E2E tests (#2542) * fix: promote system addons to system-cluster-critical (#2533) * test: use northeurope for byok testing (#2536) * Changes for omsagent-version-ciprod01072020 * Committing generated file Co-authored-by: Jack Francis <jackfrancis@gmail.com> Co-authored-by: Mark Rossetti <marosset@microsoft.com> Co-authored-by: Rohit <rjaini@microsoft.com> Co-authored-by: Rahul Jain <58573065+reachjainrahul@users.noreply.github.com> Co-authored-by: Matt Boersma <Matt.Boersma@microsoft.com> Co-authored-by: Javier Darsie <44655727+jadarsie@users.noreply.github.com> Co-authored-by: Patrick Lang <PatrickLang@users.noreply.github.com> Co-authored-by: Wenjun Wu <wenjun.wu@live.com> Co-authored-by: Jaeryn <13284103+jaer-tsun@users.noreply.github.com> Co-authored-by: Anish Ramasekar <anish.ramasekar@gmail.com> * deleting github merge auto-generated files * Adding back 1.17 omsagent yaml changes * Updating generated file to address build failures Co-authored-by: Jack Francis <jackfrancis@gmail.com> Co-authored-by: Mark Rossetti <marosset@microsoft.com> Co-authored-by: Rohit <rjaini@microsoft.com> Co-authored-by: Rahul Jain <58573065+reachjainrahul@users.noreply.github.com> Co-authored-by: Matt Boersma <Matt.Boersma@microsoft.com> Co-authored-by: Javier Darsie <44655727+jadarsie@users.noreply.github.com> Co-authored-by: Patrick Lang <PatrickLang@users.noreply.github.com> Co-authored-by: Wenjun Wu <wenjun.wu@live.com> Co-authored-by: Jaeryn <13284103+jaer-tsun@users.noreply.github.com> Co-authored-by: Anish Ramasekar <anish.ramasekar@gmail.com>
Reason for Change:
Add Antrea networking and policy plugin in AKS engine. https://github.com/vmware-tanzu/antrea
Issue Fixed:
Requirements:
Notes: