Skip to content
This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

chore: Update calico to 3.5 and allow calico to work with azure CNI #454

Merged
merged 1 commit into from Apr 1, 2019

Conversation

song-jiang
Copy link
Contributor

@song-jiang song-jiang commented Feb 7, 2019

Reason for Change:

  • Upgrade calico to v3.5

  • Allow calico to work with azure CNI by setting
    NetworkPolicy: "calico" , NetworkPlugin: "azure"
    In this case, calico per-host daemon works with azure CNI plugin and azure IPAM to support network policy.

  • Continue support kubenet network plugin by setting
    NetworkPolicy: "calico" , NetworkPlugin: "kubenet"

Issue Fixed:

Requirements:

Notes:

@codecov
Copy link

codecov bot commented Feb 7, 2019

Codecov Report

Merging #454 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master    #454   +/-   ##
======================================
  Coverage    73.3%   73.3%           
======================================
  Files         130     130           
  Lines       17708   17708           
======================================
  Hits        12981   12981           
  Misses       4026    4026           
  Partials      701     701

@CecileRobertMichon
Copy link
Contributor

@jackfrancis should we hold this until we cut the release?

/hold

@jackfrancis
Copy link
Member

Yes. We want to run this through the appropriate back-channel cluster configuration tests appropriate for this surface area.

@jackfrancis
Copy link
Member

I mean to say that this doesn't need to go in ASAP with the next aks-engine release, so we're on the same page. :)

@jackfrancis
Copy link
Member

@song-jiang Can you add more description about the differences between NetworkPlugin: "calico+azureipam" and NetworkPlugin: "azure" in a calico context?

mv $CNI_BIN_DIR/10-azure.conflist $CNI_CONFIG_DIR/
chmod 600 $CNI_CONFIG_DIR/10-azure.conflist
if [[ "${NETWORK_POLICY}" == "calico" ]] || [[ "${NETWORK_POLICY}" == "" ]]; then
sed -i 's#"mode":"bridge"#"mode":"transparent"#g' $CNI_CONFIG_DIR/10-azure.conflist
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to change the Azure CNI configuration here? This bridge mode setting will now apply to non-calico scenarios given the || [[ "${NETWORK_POLICY}" == "" ]] condition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose for this change is to make life easier for users who want to apply calico for network policy on clusters previously installed without specifying a network policy provider.
My understanding is it should not affect any user experience if they choose a non-calico network policy provider.

I am not sure if user really care about using a bridge mode network or a transparent mode network. Would love to hear your input on this. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sharmasushant @tamilmani1989 can you comment on this change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default we should have bridge mode and not transparent mode. To my understanding, the config should be set to transparent mode only if user speicifes calico policy. @sharmasushant can comment on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamilmani1989 I'm running this through some final smoke tests. Please comment if we still have open concerns about this change before merging.

@jackfrancis
Copy link
Member

@sharmasushant @tamilmani1989 Could you kindly read through this PR description and comment on @song-jiang's description of the difference between NetworkPolicy: "calico" , NetworkPlugin: "azure" and NetworkPolicy: "calico" , NetworkPlugin: "calico+azureipam".

Based on the reading through the changeset, the "calico+azureipam" flavor is the one that represents the current implementation of calico + Azure CNI. NetworkPolicy: "calico" , NetworkPlugin: "azure" appears to be a novel configuration/implementation.

Any thoughts?

@sharmasushant
Copy link
Contributor

sharmasushant commented Feb 16, 2019

@jackfrancis
We should not change the existing defaults.
The newly added "transparent" mode in azure cni is only supported with calico policy, and should not be default.

The supported default mode for azure cni is "bridge", and default policy should be "azure" as well.

@song-jiang
Copy link
Contributor Author

@jackfrancis @sharmasushant @tamilmani1989 Thanks for your input and clarifications. Apologies for my misunderstanding. I will make changes to my PR accordingly.

@song-jiang
Copy link
Contributor Author

@jackfrancis Could you take another look?

Copy link
Member

@tamilmani1989 tamilmani1989 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@song-jiang as per email conversation with tigera, we agreed upon azure cni +azure ipam + calico network policy. NetworkPlugin: "calico+azureipam" - this is something new (calico cni +azure ipam) that we didn't talk about over the email thread.

@@ -169,7 +169,7 @@ installClearContainersRuntime() {
}

installNetworkPlugin() {
if [[ "${NETWORK_PLUGIN}" = "azure" ]]; then
if [[ "${NETWORK_PLUGIN}" = "azure" ]] || [[ "${NETWORK_PLUGIN}" = "calico+azureipam" ]]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is calico+azureipam? I may be missing something here. Can you please explain us which scenario this is used?

@feiskyer
Copy link
Member

@sharmasushant @tamilmani1989 Could you kindly read through this PR description and comment on @song-jiang's description of the difference between NetworkPolicy: "calico" , NetworkPlugin: "azure" and NetworkPolicy: "calico" , NetworkPlugin: "calico+azureipam".

Same questions here.

@song-jiang
Copy link
Contributor Author

@tamilmani1989 "NetworkPlugin: calico+azureipam, NetworkPolicy: calico" is the replacement of current implementation of "NetworkPlugin: azure, NetworkPolicy: calico" which uses calico CNI plugin and azure IPAM.

We leave that as an option for anyone who might be using it and continue wish to do so. I will check with PM on email conversations made earlier.

@jackfrancis
Copy link
Member

@song-jiang I have a preference for simply moving the implementation forward. If we have good reason to change the implementation, we should stand by that opinion, rather than offering two implementations that basically do the same thing, which I would argue is confusing for users.

@song-jiang
Copy link
Contributor Author

@jackfrancis Agree it is less confusing to have just one option. Let me check with PM and come back to you.

@ahrkrak
Copy link
Member

ahrkrak commented Mar 4, 2019

@song-jiang I think we have agreed on this with PM - can you confirm and give @jackfrancis an idea of when you're likely to get this update done?

@acs-bot acs-bot added size/S and removed size/XXL labels Mar 4, 2019
@song-jiang song-jiang changed the title chore: Update calico to 3.5 and allow calico to work with azure CNI [WIP] chore: Update calico to 3.5 and allow calico to work with azure CNI Mar 4, 2019
@song-jiang
Copy link
Contributor Author

@ahrkrak Yes. It has been confirmed with PM we will drop option calico+azureipam. Another changes I need to put in is to disable azure-cni-networkmonitor daemonset if azure CNI set to transparent mode. I expect to get PR updated by Wednesday. cc @jackfrancis

@acs-bot acs-bot added size/XXL and removed size/S labels Mar 5, 2019
@song-jiang song-jiang changed the title [WIP] chore: Update calico to 3.5 and allow calico to work with azure CNI chore: Update calico to 3.5 and allow calico to work with azure CNI Mar 5, 2019
@CecileRobertMichon
Copy link
Contributor

@song-jiang please rebase on latest master to get up to date unit test coverage stats (the last rebase seems to be from 8 days ago).

@@ -704,6 +704,10 @@ func Test_Properties_ValidateNetworkPluginPlusPolicy(t *testing.T) {
networkPlugin: "azure",
networkPolicy: "flannel",
},
{
networkPlugin: "azure",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this combination is already in networkPluginPlusPolicyAllowed in validate.go while this is adding it to the unit test expecting it to not be valid. Unless I'm missing something, it seems contradictory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. My mistake. Thanks!

@song-jiang
Copy link
Contributor Author

@tamilmani1989 It is unlikely pods on same host have mixed veth prefixes. During upgrade process, either calico CNI (old node) or azure CNI (new node) will be used. However, what I found is we could have a scenario where new node gets old version of calico-node ( prefix cali) and azure CNI (prefix azv). This would break network policy.

So current status on upgrading from existing (AKS or bare-metal) cluster with my PR

  • network connection functions as normal.
  • network policy is broken.

I am working on possible solutions to fix this. cc @jackfrancis @sauryadas

@dtzar
Copy link
Contributor

dtzar commented Mar 19, 2019

Don't want to delay this PR! Would be nice to see Calico at 3.5.3 or 3.6.0 to have the latest fixes/functionality, ensure all works, and avoid another PR to update the daemonset yamls.

@feiskyer
Copy link
Member

network policy is broken.

Is this fixed now?

@palma21 palma21 moved this from In progress to Under Review in backlog Mar 21, 2019
@song-jiang
Copy link
Contributor Author

@feiskyer Not yet. It is still in progress.

@ahrkrak
Copy link
Member

ahrkrak commented Mar 28, 2019

We have agreed an upgrade strategy that will work, but needs to be documented. @jackfrancis is picking up ownership of the docs changes.
@jackfrancis will you add those docs changes into this PR or a separate one?

@jackfrancis
Copy link
Member

@ahrkrak filed this as a follow-up:

#908

@jackfrancis
Copy link
Member

/azp run pr-e2e

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jackfrancis
Copy link
Member

smoke tests checked out

lgtm

@tamilmani1989 for sign off on changes as well

@tamilmani1989
Copy link
Member

tamilmani1989 commented Mar 28, 2019

@song-jiang @ahrkrak
If they follow this doc #908 , they won't end up in this issue during upgrade?

network policy is broken.

can you please confirm?

@ahrkrak
Copy link
Member

ahrkrak commented Mar 29, 2019

Correct. Currently #908 just includes a reference to the yaml that you have to kubectl delete. That removes any older calico-nodes, which will then get updated by the addon manager to the newer version. @jackfrancis will update the actual docs to include the howto instructions for doing that.

@jackfrancis
Copy link
Member

/lgtm

@acs-bot acs-bot added the lgtm label Apr 1, 2019
@jackfrancis jackfrancis merged commit 4b9a5bd into Azure:master Apr 1, 2019
backlog automation moved this from Under Review to Done Apr 1, 2019
@welcome
Copy link

welcome bot commented Apr 1, 2019

Congrats on merging your first pull request! 🎉🎉🎉

@acs-bot
Copy link

acs-bot commented Apr 1, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, song-jiang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
backlog
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

10 participants