Skip to content
This repository has been archived by the owner on Oct 26, 2023. It is now read-only.

Adds support for running in Istio environments #21

Merged
merged 4 commits into from
Mar 30, 2021
Merged

Conversation

sonnysideup
Copy link
Contributor

@sonnysideup sonnysideup commented Mar 28, 2021

Changes to application

  • Adds --istio-enabled flag to command and codifies Istio proxy readiness check + quitquitquit wrapper around CRD actions

Changes to chart

  • Adds pod security policy
  • Adds network policy
  • Adds new values to control resource creation/behavior depending on deployment environment

this may be the wrong place to put this logic. ideally, istio fixes
their shit and we no longer have to (a) wait for the sidecar to be ready
and (b) send an envoy quit signal after a job pod is done.

alas, this is not the case right now and wrapping this logic around the
actual command will require us to migrate off of a "distroless" base
docker image.

poop or vomit? pick one.
@codecov
Copy link

codecov bot commented Mar 28, 2021

Codecov Report

Merging #21 (249efc7) into main (bd698df) will decrease coverage by 0.86%.
The diff coverage is 27.58%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #21      +/-   ##
==========================================
- Coverage   80.35%   79.49%   -0.87%     
==========================================
  Files          23       24       +1     
  Lines        1507     1531      +24     
==========================================
+ Hits         1211     1217       +6     
- Misses        207      225      +18     
  Partials       89       89              
Impacted Files Coverage Δ
pkg/crd/crd.go 45.31% <26.66%> (-8.40%) ⬇️
pkg/crd/istio.go 28.57% <28.57%> (ø)
controllers/raycluster_controller.go 43.43% <0.00%> (-0.91%) ⬇️
controllers/sparkcluster_controller.go 45.77% <0.00%> (+1.77%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bd698df...249efc7. Read the comment docs.

- adds support for '--istio-enabled' crd cmd flag
- adds psp use permissions to hooks
- adds custom psp for operator
- adds new values to control rendering
exposed ports are health, metrics, and webhook. not sure we need to
block access to these but maybe i'm wrong
@sonnysideup sonnysideup marked this pull request as ready for review March 29, 2021 23:16
@sonnysideup sonnysideup requested a review from a team March 29, 2021 23:16
@sonnysideup
Copy link
Contributor Author

@ddl-audi @steved I just need to add some tests and/or tweak this codecov config (I know, I know, coverage is 🤮).

One of the interesting things to note about this change is the way we handle PSP permissions specific to a cluster (i.e. RayCluster, SparkCluster, etc...). If you deploy the operator into an environment with the PSP admission controller enabled and Istio available, then these changes will allow the operator/crd install to work.

If you then try to create a RayCluster, as an example, that might fail. It all depends the PSP/ServiceAccount/RBAC chain establish for that cluster.

  1. Someone (externally) creates a PSP for their use and then provides that podSecurityPolicy name in the CR. Assuming the PSP is adequate (covers crazy Istio elevated permissions shiz), then this works.
  2. Someone sets up the full PSP/ServiceAccount/RBAC chain externally and provides a serviceAccountName in the CR. Again, if it's correctly set up, then this works.
  3. Someone tries to create a CR without any of the above 2 measures in the environment I described, then 🔥🔥🔥🔥.

I think we have covered internally, correct?

Initially, I decided NOT to create PSPs for clusters since I didn't want to manage external resources via a finalizer. But, since we're now using StatefulSets and clean up their PVCs on CR delete, perhaps this changes the equation? I don't think it obviates my initial concern but I'm curious to get your input.

ddl-audi
ddl-audi previously approved these changes Mar 30, 2021
Copy link
Contributor

@ddl-audi ddl-audi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me based on what we discussed

not very important yet annoying
@sonnysideup sonnysideup merged commit ecc90eb into main Mar 30, 2021
@sonnysideup sonnysideup deleted the istio-support branch March 30, 2021 15:15
Copy link
Contributor

@steved steved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially, I decided NOT to create PSPs for clusters since I didn't want to manage external resources via a finalizer. But, since we're now using StatefulSets and clean up their PVCs on CR delete, perhaps this changes the equation? I don't think it obviates my initial concern but I'm curious to get your input.

I think instinctively it feels like we're fighting K8s and I'm not sure I'm a huge fan of controlling these global resources in the controller since it runs into ownership issues and such.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
3 participants