
policy: Adjust existing policy for visibility annotations #16258

Merged
merged 4 commits into cilium:master on Sep 2, 2021

Conversation


@jrajahalme jrajahalme commented May 21, 2021

Adjust and expand eBPF policy map keys and values to redirect for
visibility on the port of the visibility annotation while still
denying traffic on this port for identities for which the traffic is
denied.

Datapath lookup order is, from highest to lowest precedence:

  1. L3/L4
  2. L4-only (wildcard L3)
  3. L3-only (wildcard L4)
  4. Allow-all

This means that the L4-only allow visibility key can only be added if
there is an allow-all key, and all L3-only deny keys are expanded to
L3/L4 keys. If no L4-only key is added, the L3-only allow keys also
need to be expanded to L3/L4 keys for visibility redirection. In
addition, the existing L3/L4 and L4-only allow keys need to be
redirected to the proxy port, if not already redirected.

The above can be accomplished by:

  1. Change existing L4-only ALLOW key on matching port that does not already
     redirect to redirect.
     • e.g., 0:80=allow,0 -> 0:80=allow,<proxyport>
  2. If allow-all key exists, add L4-only visibility redirect key if the L4-only
     key does not already exist.
     • e.g., 0:0=allow,0 -> add 0:80=allow,<proxyport> if 0:80 key does not exist
       • this allows all traffic on port 80, but see step 5 below.
  3. Change all L3/L4 ALLOW keys on matching port that do not already redirect to
     redirect.
     • e.g., <ID1>:80=allow,0 -> <ID1>:80=allow,<proxyport>
  4. For each L3-only ALLOW key add the corresponding L3/L4 ALLOW redirect if no
     L3/L4 key already exists and no L4-only key already exists and one is not added.
     • e.g., <ID2>:0=allow,0 -> add <ID2>:80=allow,<proxyport> if <ID2>:80
       and 0:80 do not exist and 0:80 was not added
  5. If a new L4-only key was added: For each L3-only DENY key add the
     corresponding L3/L4 DENY key if no L3/L4 key already exists.
     • e.g., <ID3>:0=deny,0 -> add <ID3>:80=deny,0 if <ID3>:80 does not exist.

With the above we only change/expand existing allow keys to redirect, and
expand existing drop keys to also drop on the port of interest, if a new
L4-only key allowing the port is added.

Adjust the unit test to mimic the real behavior more closely by
regenerating endpoint policy after updating the visibility policy. This
is needed because the test uses the same port for HTTP and Kafka, and
without regeneration, visibility redirects no longer overwrite existing
redirects.

Visibility redirects need to be re-applied after applying incremental
policy map changes. This is because incremental changes may add keys
for new identities that need visibility redirects, or additional deny
keys as defined above.

Testing: New unit tests in policy package cover all the cases
described above.

Fixes: #9088

Pod L7 visibility annotations are now supported even when policy enforcement is enabled.

@jrajahalme jrajahalme added kind/enhancement This would improve or streamline existing functionality. area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers. labels May 21, 2021
@jrajahalme jrajahalme requested a review from aanm May 21, 2021 02:25
@jrajahalme jrajahalme requested a review from a team as a code owner May 21, 2021 02:25
@jrajahalme jrajahalme self-assigned this May 21, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels May 21, 2021
@jrajahalme (Member Author)

@aanm This has critical interaction with deny policies, your review would be highly valued :-)

@jrajahalme (Member Author)

May need to add back the sidecar exception, as we likely already get visibility for all HTTP that goes through a Cilium version of a sidecar.

@jrajahalme (Member Author)

Only tested with a version of the cilium CLI that adds an HTTP visibility annotation to one of the echo pods. When an L3-only ingress policy is applied, Hubble also shows the HTTP-level events:

  📄 Flow logs for pod cilium-test/echo-other-node-587888bc89-8whsh:
  ✅ May 21 02:10:20.329: cilium-test/client2-657df6649d-4bv5w:36750 -> cilium-test/echo-other-node-587888bc89-8whsh:8080 from-endpoint FORWARDED (TCP Flags: SYN)
  ❓ May 21 02:10:20.329: cilium-test/client2-657df6649d-4bv5w:36750 -> cilium-test/echo-other-node-587888bc89-8whsh:8080 L3-L4 FORWARDED (TCP Flags: SYN)
  ❓ May 21 02:10:20.329: cilium-test/client2-657df6649d-4bv5w:36750 -> cilium-test/echo-other-node-587888bc89-8whsh:8080 to-proxy FORWARDED (TCP Flags: SYN)
  ✅ May 21 02:10:20.329: cilium-test/echo-other-node-587888bc89-8whsh:8080 -> cilium-test/client2-657df6649d-4bv5w:36750 from-proxy FORWARDED (TCP Flags: SYN, ACK)
  ❓ May 21 02:10:20.329: cilium-test/echo-other-node-587888bc89-8whsh:8080 -> cilium-test/client2-657df6649d-4bv5w:36750 to-endpoint FORWARDED (TCP Flags: SYN, ACK)
  ❓ May 21 02:10:20.329: cilium-test/client2-657df6649d-4bv5w:36750 -> cilium-test/echo-other-node-587888bc89-8whsh:8080 from-endpoint FORWARDED (TCP Flags: ACK)
  ❓ May 21 02:10:20.329: cilium-test/client2-657df6649d-4bv5w:36750 -> cilium-test/echo-other-node-587888bc89-8whsh:8080 to-proxy FORWARDED (TCP Flags: ACK)
  ❓ May 21 02:10:20.330: cilium-test/client2-657df6649d-4bv5w:36750 -> cilium-test/echo-other-node-587888bc89-8whsh:8080 from-endpoint FORWARDED (TCP Flags: ACK, PSH)
  ❓ May 21 02:10:20.330: cilium-test/client2-657df6649d-4bv5w:36750 -> cilium-test/echo-other-node-587888bc89-8whsh:8080 to-proxy FORWARDED (TCP Flags: ACK, PSH)
  ❓ May 21 02:10:20.330: cilium-test/echo-other-node-587888bc89-8whsh:8080 -> cilium-test/client2-657df6649d-4bv5w:36750 from-proxy FORWARDED (TCP Flags: ACK)
  ❓ May 21 02:10:20.330: cilium-test/echo-other-node-587888bc89-8whsh:8080 -> cilium-test/client2-657df6649d-4bv5w:36750 to-endpoint FORWARDED (TCP Flags: ACK)
  ❓ May 21 02:10:20.330: cilium-test/client2-657df6649d-4bv5w:36750 -> cilium-test/echo-other-node-587888bc89-8whsh:8080 http-request FORWARDED (HTTP/1.1 GET http://10.244.0.114:8080/)
  ❓ May 21 02:10:20.334: cilium-test/echo-other-node-587888bc89-8whsh:8080 -> cilium-test/client2-657df6649d-4bv5w:36750 from-proxy FORWARDED (TCP Flags: ACK, PSH)
  ❓ May 21 02:10:20.334: cilium-test/echo-other-node-587888bc89-8whsh:8080 -> cilium-test/client2-657df6649d-4bv5w:36750 to-endpoint FORWARDED (TCP Flags: ACK, PSH)
  ❓ May 21 02:10:20.334: cilium-test/client2-657df6649d-4bv5w:36750 -> cilium-test/echo-other-node-587888bc89-8whsh:8080 from-endpoint FORWARDED (TCP Flags: ACK)
  ❓ May 21 02:10:20.334: cilium-test/echo-other-node-587888bc89-8whsh:8080 -> cilium-test/client2-657df6649d-4bv5w:36750 http-response FORWARDED (HTTP/1.1 200 3ms (GET http://10.244.0.114:8080/))
  ❓ May 21 02:10:20.334: cilium-test/client2-657df6649d-4bv5w:36750 -> cilium-test/echo-other-node-587888bc89-8whsh:8080 to-proxy FORWARDED (TCP Flags: ACK)
  ✅ May 21 02:10:20.334: cilium-test/client2-657df6649d-4bv5w:36750 -> cilium-test/echo-other-node-587888bc89-8whsh:8080 from-endpoint FORWARDED (TCP Flags: ACK, FIN)
  ❓ May 21 02:10:20.334: cilium-test/client2-657df6649d-4bv5w:36750 -> cilium-test/echo-other-node-587888bc89-8whsh:8080 to-proxy FORWARDED (TCP Flags: ACK, FIN)
  ❓ May 21 02:10:20.335: cilium-test/echo-other-node-587888bc89-8whsh:8080 -> cilium-test/client2-657df6649d-4bv5w:36750 from-proxy FORWARDED (TCP Flags: ACK, FIN)
  ❓ May 21 02:10:20.335: cilium-test/echo-other-node-587888bc89-8whsh:8080 -> cilium-test/client2-657df6649d-4bv5w:36750 to-endpoint FORWARDED (TCP Flags: ACK, FIN)
  ❓ May 21 02:10:20.335: cilium-test/client2-657df6649d-4bv5w:36750 -> cilium-test/echo-other-node-587888bc89-8whsh:8080 from-endpoint FORWARDED (TCP Flags: ACK)
  ❓ May 21 02:10:20.335: cilium-test/client2-657df6649d-4bv5w:36750 -> cilium-test/echo-other-node-587888bc89-8whsh:8080 to-proxy FORWARDED (TCP Flags: ACK)

@jrajahalme jrajahalme added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label May 21, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 21, 2021
@jrajahalme (Member Author)

test-me-please

@jrajahalme (Member Author)

Runtime hit by known flake #16221

@jrajahalme (Member Author)

Fixed CI test verifying visibility annotation interaction with policy Add/Delete.

@jrajahalme (Member Author)

test-me-please

@jrajahalme (Member Author)

test-me-please

@jrajahalme (Member Author)

test-1.20-4.19

@jrajahalme (Member Author)

test-1.19-5.4 got a 403 error from Vagrant:

06:55:46  + ./print-node-ip.sh
06:55:46  + CILIUM_REGISTRY=147.75.39.45 timeout 15m ./vagrant-ci-start.sh
06:55:46  destroying vms in case this is a retry
06:55:47  ==> k8s1-1.19: VM not created. Moving on...
06:55:48  ==> k8s2-1.19: VM not created. Moving on...
06:55:48  boxes available on the node
06:55:49  cilium/ubuntu      (virtualbox, 199)
06:55:49  cilium/ubuntu-next (virtualbox, 94)
06:55:49  starting vms
06:55:50  Bringing machine 'k8s1-1.19' up with 'virtualbox' provider...
06:55:50  ==> k8s1-1.19: Box 'cilium/ubuntu-5-4' could not be found. Attempting to find and install...
06:55:50      k8s1-1.19: Box Provider: virtualbox
06:55:50      k8s1-1.19: Box Version: 3
06:55:50  ==> k8s1-1.19: Loading metadata for box 'cilium/ubuntu-5-4'
06:55:50      k8s1-1.19: URL: https://vagrantcloud.com/cilium/ubuntu-5-4
06:55:51  The box 'cilium/ubuntu-5-4' could not be found or
06:55:51  could not be accessed in the remote catalog. If this is a private
06:55:51  box on HashiCorp's Vagrant Cloud, please verify you're logged in via
06:55:51  `vagrant login`. Also, please double-check the name. The expanded
06:55:51  URL and error message are shown below:
06:55:51  
06:55:51  URL: https://vagrantcloud.com/cilium/ubuntu-5-4
06:55:51  Error: The requested URL returned error: 403 Forbidden

@jrajahalme (Member Author)

test-1.19-5.4

@jrajahalme (Member Author)

jrajahalme commented Aug 9, 2021

test-me-please

Job 'Cilium-PR-Runtime-4.9' failed and has not been observed before, so may be related to your PR:


Test Name

RuntimePrivilegedUnitTests Run Tests

Failure Output

FAIL: Failed to run privileged unit tests

If it is a flake, comment /mlh new-flake Cilium-PR-Runtime-4.9 so I can create a new GitHub issue to track it.

Job 'Cilium-PR-K8s-1.16-net-next' failed and has not been observed before, so may be related to your PR:


Test Name

K8sServicesTest Checks service across nodes Tests NodePort BPF Tests with direct routing Tests LoadBalancer Connectivity to endpoint via LB

Failure Output

FAIL: Can not connect to service "http://192.168.1.146" from outside cluster (1/10)

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.16-net-next so I can create a new GitHub issue to track it.

@jrajahalme (Member Author)

jrajahalme commented Aug 9, 2021

/mlh new-flake Cilium-PR-Runtime-4.9

👍 created #17104

@jrajahalme (Member Author)

jrajahalme commented Aug 9, 2021

The runtime test failure is a new, unrelated flake (#17104) introduced by #16772. Fix in #17105.

@joestringer (Member) left a comment


New changes LGTM, just one test I'm a bit confused by (thread below).

pkg/policy/mapstate_test.go
@jrajahalme (Member Author)

jrajahalme commented Aug 10, 2021

test-me-please

Job 'Cilium-PR-K8s-1.16-net-next' failed and has not been observed before, so may be related to your PR:


Test Name

K8sServicesTest Checks service across nodes Tests NodePort BPF Tests L2-less with Wireguard provisioned via kube-wireguarder Tests NodePort BPF

Failure Output

FAIL: Can not connect to service "http://192.168.36.11:30740" from outside cluster (1/10)

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.16-net-next so I can create a new GitHub issue to track it.

@jrajahalme (Member Author)

CI failures seem like unrelated flakes to me.

@jrajahalme (Member Author)

net-next hit by #16971

@joestringer (Member) left a comment


Latest changes LGTM.

@jrajahalme (Member Author)

Rebased to pick up CI fixes.

@jrajahalme (Member Author)

test-me-please

@christarazi christarazi added the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Aug 18, 2021
Allow other types in addition to CachedSelectors to own MapState
entries. This is needed to properly track entries added due to deny
rules so that incremental map updates can be performed correctly.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
MapState internal state could be modified by callers via internal
maps. Nil them on entries stored outside of the MapState itself.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
MapState entries that should be deleted together with some other
entry are "dependent entries". These are created for some deny
entries. Keep track of them, and delete them when the main entry is
deleted.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Adjust and expand eBPF policy map to redirect for visibility on the
port of the visibility annotation while still denying traffic on this
port for identities for which the traffic is denied.

Datapath lookup order is, from highest to lowest precedence:
1. L3/L4
2. L4-only (wildcard L3)
3. L3-only (wildcard L4)
4. Allow-all

This means that the L4-only allow visibility key can only be added if
there is an allow-all key, and all L3-only deny keys are expanded to
L3/L4 keys. If no L4-only key is added then also the L3-only allow
keys need to be expanded to L3/L4 keys for visibility redirection. In
addition the existing L3/L4 and L4-only allow keys need to be
redirected to the proxy port, if not already redirected.

The above can be accomplished by:

1. Change existing L4-only ALLOW key on matching port that does not already
   redirect to redirect.
   - e.g., 0:80=allow,0 -> 0:80=allow,<proxyport>
2. If allow-all key exists, add L4-only visibility redirect key if the L4-only
   key does not already exist.
   - e.g., 0:0=allow,0 -> add 0:80=allow,<proxyport> if 0:80 key does not exist
     - this allows all traffic on port 80, but see step 5 below
3. Change all L3/L4 ALLOW keys on matching port that do not already redirect to
   redirect.
   - e.g, <ID1>:80=allow,0 -> <ID1>:80=allow,<proxyport>
4. For each L3-only ALLOW key add the corresponding L3/L4 ALLOW redirect if no
   L3/L4 key already exists and no L4-only key already exists and one is not added.
   - e.g., <ID2>:0=allow,0 -> add <ID2>:80=allow,<proxyport> if <ID2>:80
     and 0:80 do not exist and 0:80 was not added
5. If a new L4-only key was added: For each L3-only DENY key add the
   corresponding L3/L4 DENY key if no L3/L4 key already exists
   - e.g., <ID3>:0=deny,0 -> add <ID3>:80=deny,0 if <ID3>:80 does not exist

With the above we only change/expand existing allow keys to redirect, and
expand existing drop keys to also drop on the port of interest, if a new
L4-only key allowing the port is added.

Adjust the unit test to mimic the real behavior a bit better by
regenerating endpoint policy after updating visibility policy. This is
needed due to the test using same port for HTTP and Kafka, and without
regenerating visibility redirects no longer overwrite existing
redirects.

Visibility redirects need to be re-applied after applying incremental
policy map changes. This is due to incremental changes possibly adding
keys for new identities that need visibility redirections or
additional deny keys as defined above.

Testing: New unit tests in policy package cover all the cases
described above.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
@jrajahalme (Member Author)

Rebased.

@jrajahalme (Member Author)

test-me-please

@tklauser tklauser removed the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Aug 19, 2021
@jrajahalme (Member Author)

Unrelated Travis-CI integration test flake #11560 on amd64, arm64 OK.

@jrajahalme jrajahalme added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Sep 1, 2021
@aditighag aditighag merged commit 62b469e into cilium:master Sep 2, 2021
Successfully merging this pull request may close these issues.

Enable proxy visibility without influencing policy