Skip to content
This repository has been archived by the owner on Jan 12, 2023. It is now read-only.

emptyDir without sizelimit approach #36

Closed
jleadford opened this issue Nov 21, 2019 · 5 comments · Fixed by #40
Closed

emptyDir without sizelimit approach #36

jleadford opened this issue Nov 21, 2019 · 5 comments · Fixed by #40

Comments

@jleadford
Copy link
Contributor

Hey!

By default, an emptyDir lacks a sizeLimit parameter, and is disk-based;
a Pod with access to said emptyDir can consume the Node's entire disk (i.e. the limit is unbounded) until the offending Pod is deleted or evicted, which can constitute a denial-of-service condition at the affected Node (i.e. DiskPressure). If the emptyDir is memory-backed, writes are counted against the container's memory limit, so this is probably less of a concern.

I'm happy to take up a k-rail policy for this issue. I envisage two approaches:

  • If a Pod's emptyDir lacks a sizeLimit, report a violation (e.g. block)
  • If a Pod's emptyDir lacks a sizeLimit, add a sane default

One might also want to check if a supplied sizeLimit is within a sane bound.

Let me know what sounds good here and I'll move forward.

@jleadford
Copy link
Contributor Author

Nodes will behave differently in the face of DiskPressure, but, hey, we just don't really want a single Pod to be able to get us into a spot where we need to reason about this. :)

@dustin-decker
Copy link
Contributor

dustin-decker commented Nov 22, 2019

This sounds good. How about a mutation policy that has these two configuration options?

  • a maximum sizeLimit
  • a default sizeLimit if one is not specified

You can see the pod_mutate_safe_to_evict policy for an example of a mutating policy.

Exceeding the maximum limit would return a ResourceViolation and nil PatchOperation but no size limit specified would return a PatchOperation and nil ResourceViolation.

I have some regrets with how I specified the first few policy configurations, but going forward I'd suggest we consistently use map like so:

policy_name:
    policy_option: value
MutateEmptyDirSizeLimit struct {
    MaximumSizeLimit int `yaml:"maximum_size_limit"`
    DefaultSizeLimit int `yaml:"default_size_limit"`
} `yaml:"mutate_empty_dir_size_limit"`

@dustin-decker
Copy link
Contributor

I'd be surprised if we haven't encountered this before without knowing. In GKE, a DiskPressure condition makes a Node unhealthy, which kicks of Node Repair to replace the node. If a Pod were to hit this condition consistently it could cause a bit of Pod and Node churn.

@dustin-decker
Copy link
Contributor

Hey @jleadford , FYI, @alpe implemented this in #40.

@jleadford
Copy link
Contributor Author

Dang, the holiday caught me slackin'! :)

Thanks for hacking on that @alpe ! I'll close this.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging a pull request may close this issue.

2 participants