Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add FQDN policy feature in Antrea-native policies #2613

Merged
merged 1 commit into from Aug 20, 2021

Conversation

GraysonWu
Copy link
Contributor

@GraysonWu GraysonWu commented Aug 18, 2021

Fixes issue #2611 .
This PR adds an fqdn field to the egress rules of Antrea-native policy. It can be used to restrict egress access to the Fully Qualified Domain Names, specified either by exact name or wildcard matchPatterns.

Supported formats for the fqdn field are:

  • Exact FQDNs, i.e. facebook.com, maps.google.com or db-svc.default.svc.cluster.local.
  • Wildcard expressions, i.e. *wayfair.com or *.edu.

The standard Allow, Drop and Reject actions apply to FQDN egress rules. In a single NetworkPolicyPeer, fqdn should not be specified together with pod/namespaceSelectors, ClusterGroups or IPBlocks.

Note that this feature is not a L7 policy and only works at L3/L4. DNS packets are intercepted and inspected only to help the fqdn controller cache resolved addresses. DNS response packets will only be dropped if the client's requested FQDN is matched by a FQDN policy rule, but the datapath reconciliation for that rule fails.

Added by Grayson Wu:

  1. Skip fqdnController init and related logic in reconcile.go when Antrea Policy is disabled.
  2. Use fake DNS server addr+port as dnsServerOverride when running UT fqdn_test and networkpolicy_test.

@GraysonWu GraysonWu force-pushed the fqdn-support branch 3 times, most recently from eb56a6a to 2387559 Compare August 19, 2021 00:25
@GraysonWu GraysonWu force-pushed the fqdn-support branch 3 times, most recently from a588fff to c7414cb Compare August 19, 2021 02:54
@codecov-commenter
Copy link

codecov-commenter commented Aug 19, 2021

Codecov Report

Merging #2613 (3632540) into main (1550786) will increase coverage by 5.14%.
The diff coverage is 76.93%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2613      +/-   ##
==========================================
+ Coverage   60.41%   65.56%   +5.14%     
==========================================
  Files         283      285       +2     
  Lines       22509    26453    +3944     
==========================================
+ Hits        13599    17343    +3744     
- Misses       7480     7531      +51     
- Partials     1430     1579     +149     
Flag Coverage Δ
e2e-tests 56.31% <70.25%> (?)
kind-e2e-tests 47.87% <68.00%> (+0.57%) ⬆️
unit-tests 41.69% <33.21%> (-0.32%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/nodeportlocal/k8s/npl_controller.go 64.09% <0.00%> (+4.78%) ⬆️
pkg/ovs/openflow/ofctrl_packetout.go 85.18% <38.46%> (-0.09%) ⬇️
pkg/controller/networkpolicy/validate.go 73.36% <50.00%> (+53.07%) ⬆️
pkg/ovs/openflow/ofctrl_packetin.go 59.70% <50.00%> (+10.49%) ⬆️
pkg/agent/openflow/client.go 70.06% <66.66%> (+11.74%) ⬆️
pkg/agent/openflow/network_policy.go 75.90% <71.92%> (-0.40%) ⬇️
pkg/apis/controlplane/v1beta1/conversion.go 72.51% <75.00%> (-11.83%) ⬇️
pkg/agent/controller/networkpolicy/fqdn.go 77.95% <77.95%> (ø)
pkg/agent/controller/networkpolicy/reconciler.go 77.40% <79.54%> (+0.64%) ⬆️
...ntroller/networkpolicy/networkpolicy_controller.go 71.70% <85.71%> (+2.05%) ⬆️
... and 286 more

@GraysonWu GraysonWu force-pushed the fqdn-support branch 2 times, most recently from 0cd120c to b6a574c Compare August 19, 2021 04:58
@GraysonWu
Copy link
Contributor Author

/test-all

@GraysonWu
Copy link
Contributor Author

/test-all

cmd/antrea-agent/config.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
for port := range addedPods {
addedOFAddrs = append(addedOFAddrs, openflow.NewOFPortAddress(port))
}
if err := f.ofClient.AddAddressToDNSConjunction(dnsInterceptRuleID, addedOFAddrs); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that if the call fails, the pod never gets another chance to be added to the conjunction.
We may have the same problem in the normal policy too. Please add a TODO for this defect. We should fix them later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look at that and where this function is called from. addFQDNRule is called from computeOFRulesForAdd, which is a bit surprising to me because based on the name of the function and from previous usage, I would expect that this function compute things and doesn't actually install flows...

computeOFRulesForAdd already has this TODO:

// TODO: Handle the case that the following processing fails or partially succeeds.

I don't have good knowledge of this code but here is what I did:

  • log error returned by addFQDNRule instead of ignoring it
  • add a new TODO / comment at the addFQDNRule call site

pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/controller/networkpolicy/validate.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Show resolved Hide resolved
@antoninbas
Copy link
Contributor

@tnqn I tried to address all your comments

Look like there are some test failures I need to look at and I did to review the code myself

pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/reconciler.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn_test.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/fqdn_test.go Outdated Show resolved Hide resolved
test/e2e/k8s_util.go Outdated Show resolved Hide resolved
test/e2e/k8s_util.go Outdated Show resolved Hide resolved
tnqn
tnqn previously approved these changes Aug 20, 2021
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@antoninbas
Copy link
Contributor

/test-all

@antoninbas
Copy link
Contributor

/test-windows-conformance
/test-windows-networkpolicy

@lzhecheng
Copy link
Contributor

/test-windows-networkpolicy

@GraysonWu
Copy link
Contributor Author

/test-windows-conformance
/test-windows-networkpolicy

@antoninbas
Copy link
Contributor

antoninbas commented Aug 20, 2021

There seems to be some issue with the Windows testbeds, which I believe are unrelated to this PR as we observe these issues consistently. I'll merge this PR but we should address the testbed issues before the 1.3 release so we can validate it on Windows. cc @lzhecheng

@antoninbas
Copy link
Contributor

/test-windows-conformance

@antoninbas
Copy link
Contributor

/test-all

@antoninbas
Copy link
Contributor

I root caused and addressed the Windows issue: the FQDN controller was looking for environment variables which were not defined on Windows. I added these environment variables when running the Agent as a DaemonSet using wins. I also added a fallback to the FQDN controller code (using the default local DNS resolver and the net package) for when the Agent is run as process. This can be revisited in the future if needed.

antoninbas
antoninbas previously approved these changes Aug 20, 2021
@GraysonWu
Copy link
Contributor Author

LGTM. Thanks for fixing this!

@antoninbas
Copy link
Contributor

All tests passing. Will squash manually on my laptop before I merge, in order to fix author information.

This PR adds an `fqdn` field to the egress rules of Antrea-native
policy. It can be used to restrict egress access to the Fully Qualified
Domain Names, specified either by exact name or wildcard matchPatterns.

Supported formats for the `fqdn` field are:

* Exact FQDNs, i.e. `facebook.com`, `maps.google.com` or
  `db-svc.default.svc.cluster.local`.
* Wildcard expressions, i.e. `*wayfair.com` or `*.edu`.

The standard `Allow`, `Drop` and `Reject` actions apply to FQDN egress
rules. In a single NetworkPolicyPeer, `fqdn` should not be specified
together with pod/namespaceSelectors, ClusterGroups or IPBlocks.

Note that this feature is not a L7 policy and only works at L3/L4. DNS
packets are intercepted and inspected only to help the fqdn controller
cache resolved addresses. DNS response packets will only be dropped if
the client's requested FQDN is matched by a FQDN policy rule, but the
datapath reconciliation for that rule fails.

Co-authored-by: Antonin Bas <abas@vmware.com>
Co-authored-by: Grayson Wu <wgrayson@vmware.com>

Signed-off-by: Antonin Bas <abas@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants