Controller support for ExternalEntity in Antrea NetworkPolicy #1084

Merged
merged 6 commits into from
Sep 10, 2020

Dyanngg
Contributor

@Dyanngg Dyanngg commented Aug 13, 2020

This PR adds the following:

  • externalEntitySelector in ingress/egress of AntreaNetworkPolicy
  • Agent ExternalEntity support @suwang48404 (Add ExternalEntity support on agent side. #782)
  • Controller enqueues internalNP/appliedToGroups/addressGroups on ExternalEntity updates
  • Type GroupMember which will be used to unify Pod and ExternalEntity in appliedToGroups/addressGroups
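
A minimal sketch of the unification idea in the last bullet, assuming a simplified shape for the type. The field names and the Key helper below are illustrative assumptions, not the definitions merged in this PR; the point is only that one struct can carry either a Pod or an ExternalEntity reference.

```go
package main

import "fmt"

// PodReference and ExternalEntityReference identify a workload by name and
// Namespace. Both are simplified, hypothetical stand-ins.
type PodReference struct{ Name, Namespace string }
type ExternalEntityReference struct{ Name, Namespace string }

// GroupMember holds exactly one of Pod or ExternalEntity, plus its IPs.
// The IPs field is an assumption made for this sketch.
type GroupMember struct {
	Pod            *PodReference
	ExternalEntity *ExternalEntityReference
	IPs            []string
}

// Key returns a string that is unique per member kind, Namespace and name,
// so Pods and ExternalEntities can live in the same set without colliding.
func (m GroupMember) Key() string {
	switch {
	case m.Pod != nil:
		return "Pod/" + m.Pod.Namespace + "/" + m.Pod.Name
	case m.ExternalEntity != nil:
		return "ExternalEntity/" + m.ExternalEntity.Namespace + "/" + m.ExternalEntity.Name
	}
	return ""
}

func main() {
	members := []GroupMember{
		{Pod: &PodReference{Name: "web", Namespace: "default"}, IPs: []string{"10.0.0.5"}},
		{ExternalEntity: &ExternalEntityReference{Name: "vm-1", Namespace: "default"}, IPs: []string{"192.168.1.10"}},
	}
	for _, m := range members {
		fmt.Println(m.Key(), m.IPs)
	}
}
```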

@antrea-bot
Collaborator

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-whole-conformance: to trigger all conformance tests on linux.
  • /skip-whole-conformance: to skip all conformance tests on linux.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-windows-conformance: to trigger windows conformance tests.
  • /skip-windows-conformance: to skip windows conformance tests.
  • /test-windows-networkpolicy: to trigger windows networkpolicy tests.
  • /skip-windows-networkpolicy: to skip windows networkpolicy tests.
  • /test-hw-offload: to trigger ovs hardware offload test.
  • /skip-hw-offload: to skip ovs hardware offload test.
  • /test-all: to trigger all tests (except whole conformance).
  • /skip-all: to skip all tests (except whole conformance).

@Dyanngg Dyanngg force-pushed the anp-ctrl-support branch 6 times, most recently from ace946b to c52a330 Compare August 14, 2020 17:50
@Dyanngg Dyanngg requested a review from abhiraut August 14, 2020 17:56
@Dyanngg Dyanngg force-pushed the anp-ctrl-support branch 4 times, most recently from 5d816d0 to 0f1e8fb Compare August 26, 2020 23:13
@Dyanngg Dyanngg changed the title [WIP] Controller support for AntreaNetworkPolicy Controller support for AntreaNetworkPolicy and ExternalEntities Aug 26, 2020
@Dyanngg Dyanngg requested review from abhiraut, suwang48404, tnqn, jianjuns and antoninbas and removed request for abhiraut August 26, 2020 23:23
@Dyanngg Dyanngg changed the title Controller support for AntreaNetworkPolicy and ExternalEntities Controller support for AntreaNetworkPolicy and ExternalEntity Aug 26, 2020
@Dyanngg
Contributor Author

Dyanngg commented Aug 27, 2020

/test-all /test-hw-offload

}
internalNetworkPolicy := &antreatypes.NetworkPolicy{
Name: np.Name,
Namespace: np.Namespace,
Member

Just thought of this when I was trying to support metrics for all 3 kinds of NetworkPolicies: How did we differentiate K8s NetworkPolicy and Antrea NetworkPolicy that have same names? I know their UIDs are different but in most places we are assuming namespace+name can identify an object at any point of time like K8s does.

Contributor Author

@Dyanngg Dyanngg Aug 27, 2020

Antrea NetworkPolicy will have a float Priority value while for K8s NetworkPolicy that field will be nil

Member

The main problem is when they are converted to internal NetworkPolicies they will override each other and generate update event instead of a new ADD event.
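
To illustrate the concern raised here, a minimal sketch with hypothetical types (InternalPolicy, upsert, and both key functions are not Antrea code): a store keyed by namespace/name treats a same-named K8s NetworkPolicy and Antrea NetworkPolicy as one object, while a UID key keeps them apart.

```go
package main

import "fmt"

// InternalPolicy is a hypothetical, trimmed-down internal NetworkPolicy.
type InternalPolicy struct {
	UID       string
	Name      string
	Namespace string
	Source    string // "K8sNetworkPolicy" or "AntreaNetworkPolicy"
}

// namespacedNameKey mimics a namespace/name key function.
func namespacedNameKey(p InternalPolicy) string { return p.Namespace + "/" + p.Name }

// uidKey uses the object's UID as the store key.
func uidKey(p InternalPolicy) string { return p.UID }

// upsert stores p under keyFunc(p) and reports whether an existing entry was
// overwritten, which a watcher would see as an update rather than an add.
func upsert(store map[string]InternalPolicy, keyFunc func(InternalPolicy) string, p InternalPolicy) bool {
	k := keyFunc(p)
	_, existed := store[k]
	store[k] = p
	return existed
}

func main() {
	knp := InternalPolicy{UID: "uid-a", Name: "allow-web", Namespace: "default", Source: "K8sNetworkPolicy"}
	anp := InternalPolicy{UID: "uid-b", Name: "allow-web", Namespace: "default", Source: "AntreaNetworkPolicy"}

	byName := map[string]InternalPolicy{}
	upsert(byName, namespacedNameKey, knp)
	fmt.Println("namespace/name key overwrote existing entry:", upsert(byName, namespacedNameKey, anp))

	byUID := map[string]InternalPolicy{}
	upsert(byUID, uidKey, knp)
	fmt.Println("UID key overwrote existing entry:", upsert(byUID, uidKey, anp))
}
```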

Contributor

Given that K8s NP is a subset of ANP, if we cannot resolve this naming conflict easily in the implementation, perhaps we could have a feature flag that lets the controller listen to KNP or ANP, but not both?

Contributor

@abhiraut abhiraut Aug 27, 2020

the internal network policies keys are UIDs no?

Member

@jianjuns Yes.
For cloud VM and BM case, I assume a Node is associated to a Namespace and do not require configuration of other Namespaces, at least for internal NetworkPolicies, right? If this is the case, we may keep the original Namespace.
I don't know other ways to isolate a Node when RBAC is used, as the authorization review is done by K8s API, we only know the result that whether a client is permitted to access a resource and don't know its association with Nodes or Namespaces.

Contributor

I think we may have discussed this in one of the meetings: on read/watch, any node can retrieve all NPs in the same namespace, because NP CRDs are shared across multiple nodes and are computed at run-time.

On the other hand, each node will be given a unique role that it may use to access CRDs specific to itself, such as VirtualMachine, VirtualMachineStatus, etc.

Contributor

@tnqn I think you meant internal network policy and group resources will not be Namespaced? Then how do we control agent access to the resources, unless we introduce some filtering in the controller itself?

Contributor

agree.. i don't think we should eliminate the Namespace for internal network policy.. setting the name field of networkpolicy with UID should suffice.. and still be able to use the same keyFunc .. no? .. of course need some additional reference to store the name for other purposes.. also this reminds me our AddressGroups and AppliedToGroups are not namespaced .. same issue for agents in BM/VM?

Contributor

Ok. I think let us discuss the namespace/RBAC problems separately. It is mainly that the cloud use case increases our complexity; otherwise, ExternalEntity/Node and internal policy/group should just be cluster scope.

@Dyanngg Dyanngg force-pushed the anp-ctrl-support branch 2 times, most recently from e109aa4 to 29c8898 Compare August 27, 2020 21:55
@Dyanngg Dyanngg changed the title Controller support for AntreaNetworkPolicy and ExternalEntity Controller support for ExternalEntity in AntreaNetworkPolicy Aug 28, 2020
@Dyanngg
Contributor Author

Dyanngg commented Sep 2, 2020

/test-all /test-hw-offload

@Dyanngg
Contributor Author

Dyanngg commented Sep 2, 2020

/test-all /test-hw-offload

@Dyanngg
Contributor Author

Dyanngg commented Sep 3, 2020

/test-hw-offload /test-networkpolicy /test-windows-networkpolicy

abhiraut
abhiraut previously approved these changes Sep 4, 2020
Contributor

@abhiraut abhiraut left a comment

comments have been addressed

@abhiraut
Contributor

abhiraut commented Sep 4, 2020

@tnqn @jianjuns can you guys also take another look? thanks!

pkg/apis/controlplane/sets.go (outdated, resolved)
@@ -749,7 +748,7 @@ func groupPodsByServices(services []v1beta1.Service, pods v1beta1.GroupMemberPod
resolvedServices := make([]v1beta1.Service, len(services))
for podKey, pod := range pods {
for i := range services {
-resolvedServices[i] = *resolveService(&services[i], pod)
+resolvedServices[i] = *resolveService(&services[i], *pod.ToGroupMember())
Member

could we keep the original function for performance consideration? creating a new GroupMember struct when resolving every service with every pod and making a copy (because it uses struct as the argument instead of its pointer) might introduce some overhead.

Contributor Author

@Dyanngg Dyanngg Sep 8, 2020

Switched to using a pointer in resolveService. As for the comment about the extra struct, do you think we need separate functions for groupMember and groupMemberPod, or is the current code good enough?

Member

@tnqn tnqn Sep 9, 2020

I used the benchmark test BenchmarkGroupPodsByServicesWithNamedPort to measure the difference.

# Convert Pod to GroupMember first, then call resolveService
BenchmarkGroupPodsByServicesWithNamedPort-4           26          43131306 ns/op        13643059 B/op     401963 allocs/op
# A separate resolveService for Pod
BenchmarkGroupPodsByServicesWithNamedPort-4           34          34344858 ns/op         8844193 B/op     301968 allocs/op

I think it's worth avoiding the conversion, as this function is called quite frequently in the named port case.
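
The two variants being compared can be sketched roughly as follows. This is an illustration under assumed, simplified types (NamedPort, GroupMemberPod, GroupMember, Endpoint and every function here are hypothetical stand-ins, not the Antrea definitions): the first path allocates a GroupMember per call before resolving, the second resolves directly against the Pod.

```go
package main

import "fmt"

type NamedPort struct {
	Name     string
	Port     int32
	Protocol string
}

type GroupMemberPod struct {
	IP    string
	Ports []NamedPort
}

type Endpoint struct {
	IP    string
	Ports []NamedPort
}

type GroupMember struct {
	Endpoints []Endpoint
}

// toGroupMember allocates a new GroupMember and Endpoint slice on every call,
// which is the per-(service, pod) overhead discussed in the thread.
func toGroupMember(p *GroupMemberPod) *GroupMember {
	return &GroupMember{Endpoints: []Endpoint{{IP: p.IP, Ports: p.Ports}}}
}

// lookup finds a named port with the given protocol.
func lookup(ports []NamedPort, name, protocol string) (int32, bool) {
	for _, np := range ports {
		if np.Name == name && np.Protocol == protocol {
			return np.Port, true
		}
	}
	return 0, false
}

// resolveForMember resolves against a GroupMember (conversion path).
func resolveForMember(m *GroupMember, name, protocol string) (int32, bool) {
	for _, ep := range m.Endpoints {
		if port, ok := lookup(ep.Ports, name, protocol); ok {
			return port, true
		}
	}
	return 0, false
}

// resolveForPod resolves directly against the Pod, with no intermediate struct.
func resolveForPod(p *GroupMemberPod, name, protocol string) (int32, bool) {
	return lookup(p.Ports, name, protocol)
}

func main() {
	pod := &GroupMemberPod{IP: "10.0.0.5", Ports: []NamedPort{{Name: "http", Port: 8080, Protocol: "TCP"}}}
	viaMember, _ := resolveForMember(toGroupMember(pod), "http", "TCP")
	direct, _ := resolveForPod(pod, "http", "TCP")
	fmt.Println(viaMember, direct)
}
```

Both paths return the same port; they differ only in the allocation made by the conversion step.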

Contributor Author

Thanks for the benchmarking result. Updated.

pkg/agent/controller/networkpolicy/reconciler.go (outdated, resolved)
pkg/apis/controlplane/v1beta1/types.go (resolved)
pkg/controller/networkpolicy/networkpolicy_controller.go (outdated, resolved)
pkg/controller/networkpolicy/networkpolicy_controller.go (outdated, resolved)
pkg/controller/networkpolicy/networkpolicy_controller.go (outdated, resolved)

klog.V(2).Infof("Processing ExternalEntity %s/%s DELETE event, labels: %v", ee.Namespace, ee.Name, ee.Labels)
// Find all AppliedToGroup keys which match the Pod's labels.
appliedToGroupKeys := n.filterAddressGroupsForPodOrExternalEntity(ee)
Member

ditto

Member

missed updating this?

Contributor Author

Updated. Thanks for the catch.

pkg/controller/networkpolicy/networkpolicy_controller.go (outdated, resolved)
}
}
} else if groupSelector.NamespaceSelector != nil {
// All the Pods and EEs from Namespaces matching the nsSelector must be selected.
Member

Is this correct? If I create a K8s NetworkPolicy that allows to all Pods in all Namespaces, both PodSelector and ExternalEntitySelector will be nil, but EEs shouldn't be selected?

Contributor Author

@Dyanngg Dyanngg Sep 8, 2020

Thanks for spotting this. Discussed with @abhiraut and agreed that the selector semantics should be compatible with K8s. So any clause that has a namespaceSelector and nothing else will select all Pods in the matching Namespaces. If the intent is indeed to select all ExternalEntities in those Namespaces, the user needs to explicitly specify something like

- namespaceSelector:
    matchLabels:
      project: myproject
  externalEntitySelector: {}

The code is updated accordingly. @suwang48404 FYI.
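
The agreed semantics can be sketched as a small decision function. This is an illustration only: LabelSelector, Peer and selectedKinds are hypothetical simplifications, not the controller's code, and the sketch does not validate whether podSelector and externalEntitySelector may be combined in one clause.

```go
package main

import "fmt"

// LabelSelector is a simplified stand-in for a Kubernetes label selector.
// A nil pointer means "not specified"; a non-nil empty selector matches everything.
type LabelSelector struct {
	MatchLabels map[string]string
}

// Peer is a hypothetical rule peer carrying the three selectors from the thread.
type Peer struct {
	PodSelector            *LabelSelector
	NamespaceSelector      *LabelSelector
	ExternalEntitySelector *LabelSelector
}

// selectedKinds reports which kinds of workloads a peer selects.
// A namespaceSelector on its own selects Pods only, matching K8s semantics;
// ExternalEntities are selected only when externalEntitySelector is set.
func selectedKinds(p Peer) (pods, externalEntities bool) {
	externalEntities = p.ExternalEntitySelector != nil
	pods = p.PodSelector != nil ||
		(p.NamespaceSelector != nil && p.ExternalEntitySelector == nil)
	return pods, externalEntities
}

func main() {
	ns := &LabelSelector{MatchLabels: map[string]string{"project": "myproject"}}

	pods, ees := selectedKinds(Peer{NamespaceSelector: ns})
	fmt.Println("namespaceSelector only: pods =", pods, "externalEntities =", ees)

	pods, ees = selectedKinds(Peer{NamespaceSelector: ns, ExternalEntitySelector: &LabelSelector{}})
	fmt.Println("namespaceSelector + externalEntitySelector {}: pods =", pods, "externalEntities =", ees)
}
```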

@Dyanngg
Contributor Author

Dyanngg commented Sep 9, 2020

/test-all

@Dyanngg
Contributor Author

Dyanngg commented Sep 9, 2020

/test-hw-offload

Member

@tnqn tnqn left a comment

have a question about the inconsistency between API schema and implementation, otherwise LGTM.

}
for _, ep := range member.Endpoints {
for _, port := range ep.Ports {
if port.Name == service.Port.StrVal && port.Protocol == *service.Protocol {
Member

This implementation reminds me of the discussion #774 (comment).
Apparently we treat a member as the unit of resolving named ports, then does it still make sense to have separate namedPorts for each endpoint? Asking because the struct GroupMember is also inconvenient to reuse for dual stack case as there will be two IPs, it doesn't sound proper to only set namedports to one of them while setting both will be redundant. Since in implementation we assume the namedPorts are shared for all endpoints, should we just make the struct like the following?

type GroupMember struct {
	Pod *PodReference
	ExternalEntity *ExternalEntityReference
	IPs []IPAddress
	Ports []NamedPort
}

I don't mean to address it in this PR but want to make the struct and implementation consistent before it's released.
@jianjuns @suwang48404 @abhiraut @Dyanngg @wenyingd
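
A minimal sketch of how named-port resolution could look against the proposed struct, assuming simplified field types (IPAddress as a string, and a NamedPort with Name, Port and Protocol; resolveNamedPort is a hypothetical helper). The lookup happens once per member and the result applies to every IP, which is what makes the dual-stack case simple.

```go
package main

import "fmt"

type PodReference struct{ Name, Namespace string }
type ExternalEntityReference struct{ Name, Namespace string }

// IPAddress is simplified to a string for this sketch.
type IPAddress string

type NamedPort struct {
	Name     string
	Port     int32
	Protocol string
}

// GroupMember follows the shape proposed above: ports are shared by all IPs.
type GroupMember struct {
	Pod            *PodReference
	ExternalEntity *ExternalEntityReference
	IPs            []IPAddress
	Ports          []NamedPort
}

// resolveNamedPort returns the port number registered under name/protocol
// for the member as a whole, independent of which IP is used.
func resolveNamedPort(m *GroupMember, name, protocol string) (int32, bool) {
	for _, p := range m.Ports {
		if p.Name == name && p.Protocol == protocol {
			return p.Port, true
		}
	}
	return 0, false
}

func main() {
	// A dual-stack Pod: two IPs, one shared list of named ports.
	member := &GroupMember{
		Pod:   &PodReference{Name: "web", Namespace: "default"},
		IPs:   []IPAddress{"10.0.0.5", "fd00::5"},
		Ports: []NamedPort{{Name: "http", Port: 8080, Protocol: "TCP"}},
	}
	if port, ok := resolveNamedPort(member, "http", "TCP"); ok {
		for _, ip := range member.IPs {
			fmt.Printf("%s -> %d\n", ip, port)
		}
	}
}
```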

Contributor

Agreed your proposal is simpler.

Do you think one Pod/VM can have multiple interfaces that have different namedPort?
Maybe it does not make sense.

Question - does this change (GroupMember) break API compatibility? Will it impact K8s NetworkPolicy of an existing cluster?
If we need to make further changes, let us do it in one release.

Contributor Author

I do not know if for VMs it makes sense to have multiple interfaces that have different namedPort. However, with the current externalEntity CRD, each endpoint does have its own list of port specifications. Using a single list of namedPort means we need to somehow flatten the lists, or simply use the first endpoint's ports as the port spec for the group member? @suwang48404 for more insights here.
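
One way the "flatten the lists" option could look, as a sketch under assumed types (Endpoint, NamedPort and flattenPorts are hypothetical, not the ExternalEntity CRD types): merge every endpoint's ports into a single list and drop exact duplicates.

```go
package main

import "fmt"

type NamedPort struct {
	Name     string
	Port     int32
	Protocol string
}

type Endpoint struct {
	IP    string
	Ports []NamedPort
}

// flattenPorts merges the per-endpoint port lists into one list, keeping the
// first occurrence of each (name, port, protocol) triple in endpoint order.
func flattenPorts(endpoints []Endpoint) []NamedPort {
	seen := make(map[NamedPort]struct{})
	var out []NamedPort
	for _, ep := range endpoints {
		for _, p := range ep.Ports {
			if _, ok := seen[p]; ok {
				continue
			}
			seen[p] = struct{}{}
			out = append(out, p)
		}
	}
	return out
}

func main() {
	endpoints := []Endpoint{
		{IP: "10.0.0.1", Ports: []NamedPort{{Name: "http", Port: 80, Protocol: "TCP"}}},
		{IP: "fd00::1", Ports: []NamedPort{{Name: "http", Port: 80, Protocol: "TCP"}, {Name: "dns", Port: 53, Protocol: "UDP"}}},
	}
	fmt.Println(flattenPorts(endpoints))
}
```

Note that this sketch keeps both entries when two endpoints map the same name to different port numbers, so a real design would still have to decide how to treat that conflict.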

Contributor

@jianjuns
my 2 cents,

  1. I feel the current declaration is more flexible in dealing with niche requirements.
  2. my understanding is that group member is already part of internal ANP between controller and agent, so agent should understand group members in internal AP OK. In what aspect do u envision this would break k8s NP on existing cluster? Mismatch controller/agent version? Upgraded controller/agent cannot execute K8s NetworkPolicy? ...

Contributor

looks like i missed that conversation during my PTO at that time. I initially wanted to keep it flexible, but i really did not find a good use case to keep it that way. @suwang48404 do you find a use case for named port per interface? if not, then may be lets switch the use of named port in both GroupMember and the ExternalEntity type and in this release itself.

Contributor Author

@abhiraut @jianjuns if we were to go down that path, we need to change the EE CRD definition as well?

Contributor

@susanwu88

  1. But if one VM always has the same namedports, we will have to always duplicate them for all interfaces.
  2. Yes, in upgrade, controller and agent can be at different versions. We need to make sure the API is compatible between controller and agent, or support multiple versions of API.

Contributor

Regarding to upgrade, considering,

  1. This PR does not change groupmembership in AddressGrp/AppliedTogroup as it is already merged, so it won't have impact on CRD wire format compatibility.
  2. My understanding is that group member is not processed by agent at all, so even if group member is sent to agent, it won't have impact on agent.

Contributor

@jianjuns regarding to namedPort, I feel like this is personal preference, and one can argue both ways. I hope it is not show stopper as the impact could be far reaching because it does change the on wire CRD format.

Contributor

@jianjuns jianjuns Sep 10, 2020

  1. This PR does not change groupmembership in AddressGrp/AppliedTogroup as it is already merged, so it won't have impact on CRD wire format compatibility.

My question is not for this PR. I know the change is made earlier. But we must handle it if it affects upgrade from a previous version or to a future version.

I think no one wants to stop this PR for multiple design questions discussed over review. But I hope to get the decided changes in Controller API done in this release, to avoid upgrade issues later. @Dyanngg @abhiraut: please collect all suggestions discussed.

@jianjuns
Contributor

@Dyanngg : I assume this PR is ready to merge? But please collect all suggestions and we can have a followup PR to address them.

@abhiraut
Contributor

@Dyanngg : I assume this PR is ready to merge? But please collect all suggestions and we can have a followup PR to address them.

created an issue to track #1227

@Dyanngg
Contributor Author

Dyanngg commented Sep 10, 2020

@Dyanngg : I assume this PR is ready to merge? But please collect all suggestions and we can have a followup PR to address them.

Yes. Will keep the discussion going in #1227

@Dyanngg Dyanngg merged commit 2456a20 into antrea-io:master Sep 10, 2020
GraysonWu pushed a commit to GraysonWu/antrea that referenced this pull request Sep 22, 2020
…-io#1084)

* Agent supports ExternalEntity.

This commit moves ToAddresses/FromAddresses in CompletedRule and AddressSetByGroup in ruleCache
to use GroupMemberSet instead of GroupMemberPodSet. Thus both Pods and ExternalEntities are
expressed as GroupMember in these fields.

Pods in the appliedTo field continue to be expressed by the existing GroupMemberPod, and migration to GroupMember shall
be done in a subsequent PR.

* Add support for ANP and externalEntities in controller

* Unify functions for Pod and ExternalEntity in networkpolicy controller

* Improve UT coverage

* Address comments

* Resolve more comments

Co-authored-by: Su Wang <suw@vmware.com>
GraysonWu pushed a commit to GraysonWu/antrea that referenced this pull request Sep 23, 2020
…-io#1084)
@Dyanngg Dyanngg deleted the anp-ctrl-support branch September 30, 2020 23:45
8 participants