Standardize default storage class (CSI) #213

Closed
2 of 8 tasks
garloff opened this issue Nov 7, 2022 · 8 comments
Labels:

  • Container (Issues or pull requests relevant for Team 2: Container Infra and Tooling)
  • SCS is standardized
  • standards (Issues / ADR / pull requests relevant for standardization & certification)

Comments

@garloff
Contributor

garloff commented Nov 7, 2022

As an SCS container user, I want to be sure that a default persistent storage class is available, so that I can use storage that survives the lifecycle of a pod.
(Open question: Do we need to standardize the name?)

Tasks:

  • What are the wanted properties:
    • Locality: Bound to a host, bound to an AZ? => non-local (but restricted to one AZ is acceptable)
    • RWO/RWX => RWO OK

Separate discussions:

Definition of Ready:

  • User Story is small enough to be finished within one sprint
  • User Story is clear and understood by the whole team
  • Acceptance criteria are defined
  • Acceptance criteria are clear and understood by the whole team

Definition of Done:

  • ADR/Standard is written #198
  • Conformance test is available
  • Reference implementation passes
@garloff added the label "Container" (Issues or pull requests relevant for Team 2: Container Infra and Tooling) on Nov 7, 2022
@garloff
Contributor Author

garloff commented Nov 7, 2022

Some results from the discussion today:

  • If we standardize additional storage classes anyway (which must have standard names), we can just as well define a standard name for our default storage class.
  • We should restrict ourselves to defining k8s behavior (RWO, local or non-local) for the default storage class. Additional requirements (redundancy, encryption, performance, but possibly also RWX, local storage, and multi-AZ storage) should go into additional classes (Standardize additional storage classes #214).
  • For the k8s properties we agreed:
    • Storage is non-local: The storage can be attached to another node -- does not go down with the node
    • Storage is RWO (Read-Write-Once)
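
A conformance check for these properties could start from something like the following minimal sketch. It assumes the cluster's StorageClass objects have already been fetched and parsed into plain dictionaries (for example from `kubectl get storageclass -o json`). The function names `find_default_storage_class` and `rwo_claim` are hypothetical and not part of any SCS tooling. The only Kubernetes specifics used are the upstream annotation `storageclass.kubernetes.io/is-default-class` and the `ReadWriteOnce` access mode. Non-locality cannot be read from these manifests, so it is not checked here.

```python
DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def find_default_storage_class(storage_classes):
    """Return the name of the single default StorageClass.

    `storage_classes` is a list of StorageClass manifests as plain dicts.
    Raises ValueError if there is no default class or more than one.
    """
    defaults = []
    for sc in storage_classes:
        metadata = sc.get("metadata") or {}
        annotations = metadata.get("annotations") or {}
        if annotations.get(DEFAULT_ANNOTATION) == "true":
            defaults.append(metadata.get("name"))
    if len(defaults) != 1:
        raise ValueError(f"expected exactly one default StorageClass, found {defaults}")
    return defaults[0]


def rwo_claim(name, size="1Gi"):
    """Build a PVC manifest (hypothetical helper) requesting RWO storage.

    storageClassName is omitted on purpose, so the claim is expected
    to be served by the cluster's default class.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }
```

A fuller test would additionally create the claim, write data from a pod, reschedule the pod to another node, and verify that the data is still there.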

@garloff
Contributor Author

garloff commented Dec 5, 2022

We also expect standard storage to always:

  • Be encrypted (more precisely: data at rest is encrypted on the medium, so a stolen medium cannot leak data)
  • Be non-local, i.e., survive a node going away

@garloff
Contributor Author

garloff commented Dec 5, 2022

TODO: Write ADR

@garloff added the labels "SCS is standardized" and "standards" (Issues / ADR / pull requests relevant for standardization & certification) on Dec 5, 2022
@garloff
Contributor Author

garloff commented Dec 19, 2022

@joshmue to start with a first draft

@JohannesEbke

From an application writer's perspective, I would classify this default storage as defined above as "low/medium-performance, redundant, but single-AZ storage". If I write an app using this class, I would expect to get some version of "cheap, reasonable storage", similar to an AWS EBS volume.

In AWS, one of the key cost/speed decisions is to pick either SSD- or HDD-backed storage. Is it planned to make a decision on this here?

@joshmue

joshmue commented Dec 19, 2022

> In AWS, one of the key cost/speed decisions is to pick either SSD- or HDD-backed storage. Is it planned to make a decision on this here?

I do not know. When settling on e.g. "low/medium-performance redundant, but single-AZ storage", HDD storage could be ok - implicitly.
I think that it is important to link this decision tightly with #214. So, for example, assuming the result of #214 is a list of storage classes like:

  • "IOPS-200" (IOPS>=200; Bandwidth>=10MB/s)
  • "IOPS-500" (IOPS>=500; Bandwidth>=30MB/s)
  • "IOPS-1000" (IOPS>=1000; Bandwidth>=100MB/s)
  • and so forth

...it will be imperative to make one of these options the default. For example, "IOPS-500". In this scenario, it would be decided to standardize based on effective IOPS performance (maybe also linking bandwidth requirements etc.), disregarding underlying technology.
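
To make the scenario concrete, here is a minimal sketch of such a class list with one entry marked as default. The class names and thresholds are the illustrative values from the list above, not decided values. `StorageClassSpec`, `DEFAULT_CLASS`, and `smallest_sufficient_class` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageClassSpec:
    name: str
    min_iops: int             # guaranteed minimum IOPS
    min_bandwidth_mb_s: int   # guaranteed minimum bandwidth in MB/s


# Illustrative values taken from the example list in this comment.
CLASSES = [
    StorageClassSpec("IOPS-200", 200, 10),
    StorageClassSpec("IOPS-500", 500, 30),
    StorageClassSpec("IOPS-1000", 1000, 100),
]

# Assumption for this sketch: "IOPS-500" is the class chosen as default.
DEFAULT_CLASS = "IOPS-500"


def smallest_sufficient_class(iops, bandwidth_mb_s, classes=CLASSES):
    """Return the name of the lowest class whose guarantees cover both
    the requested IOPS and bandwidth, or None if no class does."""
    for spec in sorted(classes, key=lambda s: s.min_iops):
        if spec.min_iops >= iops and spec.min_bandwidth_mb_s >= bandwidth_mb_s:
            return spec.name
    return None
```

The classes are defined purely by guaranteed numbers, so the underlying technology (SSD or HDD) does not appear anywhere in the model, which matches the idea of standardizing on effective performance.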

Considering these two stories are so tightly linked (IMHO), we could either...

  1. write down all non-performance related requirements into a decision record
  2. decide on list/schema of additional storage classes in Standardize additional storage classes #214 in a second decision record
  3. set one of these storage classes as default in a third decision record

...or...

  1. decide on list/schema of additional storage classes in Standardize additional storage classes #214 in a decision record
  2. set one of these storage classes as default in a second decision record

@garloff What do you think?

EDIT: For the sake of simplicity, I disregarded any mechanisms that would bind e.g. disk size to disk performance. If this should be done, maybe a further discussion is required in #214 and we should go with the first option.

@garloff
Contributor Author

garloff commented Dec 20, 2022

Great discussion, @joshmue, @JohannesEbke!
It seems to me that we should follow your first suggestion, @joshmue.
And we need to accept that we need to make progress on the additional storage classes in order to define the default one.

We should probably understand bandwidth scaling with storage size a bit better. I am aware that Ceph has roughly linear scaling behavior for a certain range. We use Ceph in our reference implementation and it's quite popular outside of it as well. Other distributed storage solutions tend to have similar behavior.
We could keep things simple: tie our performance numbers to a defined reference size (say 50 GB) and require that performance is at least perf = ref * size / 50 GB for size <= 50 GB, and at least ref for size > 50 GB.
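
The proposed rule can be written down as a short function. This is only a sketch of the formula above: the function name `min_required_performance` is hypothetical, and the 50 GB reference size is the example value from this comment, not a decided number.

```python
def min_required_performance(ref, size_gb, ref_size_gb=50.0):
    """Minimum performance a volume of `size_gb` must deliver.

    `ref` is the class's reference number (e.g. IOPS or MB/s) defined at
    `ref_size_gb`. Below the reference size the requirement scales
    linearly with size; at or above it, the requirement stays at `ref`.
    """
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    return ref * min(size_gb, ref_size_gb) / ref_size_gb
```

For example, with a reference value of 500 IOPS, a 25 GB volume would have to deliver at least 250 IOPS, while 50 GB and 200 GB volumes would both have to deliver at least 500 IOPS.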

And yes, I believe that having an IOPS-xxx class combining min IOPS with a certain min bandwidth is sophisticated enough...

@garloff
Contributor Author

garloff commented Feb 22, 2023

Should be closed when SovereignCloudStack/standards#198 is merged


4 participants