This repository has been archived by the owner on May 6, 2020. It is now read-only.

[RFC] create: support oci spec memory limit #390

Closed
wants to merge 1 commit into from

Conversation

wcwxyz (Contributor) commented Aug 9, 2017

We should run the VM (container) according to the OCI config.json memory limit.

Fixes #381

Signed-off-by: WANG Chao <chao.wang@ucloud.cn>

config.go Outdated
@@ -376,3 +376,13 @@ func loadConfiguration(configPath string, ignoreLogging bool) (resolvedConfigPat

return resolved, logfilePath, config, nil
}

func ociResourceUpdateRuntimeConfig(ocispec oci.CompatOCISpec, config *oci.RuntimeConfig) error {
mem := int(float64(*ocispec.Linux.Resources.Memory.Limit) / 1024 / 1024)
grahamwhaley (Contributor) commented:

A couple of thoughts:

  • Maybe we should round this up. Docker allows fine granularity (down to bytes), and it is probably better to allocate more than requested rather than too little.
  • We may also want to take into account the overhead the VM kernel and other bits add to the container footprint, and add a fixed amount of extra space to cater for that. It will only be significant for containers given a small memory space, but we can just add it to all containers. As a quick experiment I ran docker run --runtime=cor -it alpine sh and did a free -m, and it looks like CC2.x is currently consuming under 24m. Almost little enough to ignore, but I say let's add 25m here for VM kernel headroom. (See the sketch after this list.)
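A minimal sketch of the two suggestions above, in Go like the patch itself; the helper name and the 25 MiB constant are illustrative, not taken from the actual change:

    // Illustrative sketch only: round the requested byte limit up to the
    // next MiB (so we never allocate less than asked for), then add fixed
    // headroom for the VM kernel.
    const vmHeadroomMiB = 25 // assumed allowance, based on the ~24m observation above

    func vmMemoryWithHeadroomMiB(limitBytes int64) int64 {
        const mib = 1 << 20
        roundedUp := (limitBytes + mib - 1) / mib
        return roundedUp + vmHeadroomMiB
    }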

wcwxyz (Contributor, Author) commented Aug 9, 2017

I don't know. I think users of Clear Containers should be aware of the memory (and CPU) overhead, because it's using a VM.

So say I want 2G of memory available for userspace: I should assign 2G plus overhead when launching qemu.

Anyway, that's my two cents. I don't have a strong opinion on this one.

jodh-intel (Contributor) commented Aug 9, 2017

memory.limit is optional:

... and I've just checked with docker 17.05.0-ce and it only sets memory.swappiness. Hence, this line blows up when the limit is absent.

I agree with @grahamwhaley that we've got to reach a balance here: if we just set the mem size blindly, we may starve the VM of sufficient resources to actually start.
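In other words, with no limit set, the patch's unconditional *ocispec.Linux.Resources.Memory.Limit dereference fails at runtime. A minimal sketch of the kind of guard needed (the helper name is illustrative):

    // Sketch of the guard: every link in this chain is optional in the
    // OCI spec, so each must be checked before Limit is dereferenced.
    func ociMemoryLimitSet(ocispec oci.CompatOCISpec) bool {
        return ocispec.Linux != nil &&
            ocispec.Linux.Resources != nil &&
            ocispec.Linux.Resources.Memory != nil &&
            ocispec.Linux.Resources.Memory.Limit != nil
    }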

wcwxyz (Contributor, Author) commented Aug 10, 2017

I'll add the extra 25M suggested by @grahamwhaley.

With that extra 25M in place, rounding up the user-requested memory size is unnecessary: we can't be precise about how much memory will be available for user space anyway, so rounding up to 1M granularity seems useless.

wcwxyz (Contributor, Author) commented:

@grahamwhaley @jodh-intel

Right now, I don't think we should arbitrarily add 25M by default. I ran the following tests.

I used docker run -m MEMORY centos:7 /bin/free -m to test:

128m

              total        used        free      shared  buff/cache   available
Mem:            117           9          86           0          21          91

512m

              total        used        free      shared  buff/cache   available
Mem:            494          12         461           0          20         461

1024m

              total        used        free      shared  buff/cache   available
Mem:            998          13         964           0          20         959

2048m

              total        used        free      shared  buff/cache   available
Mem:           2004          12        1971           0          20        1960

4096m

              total        used        free      shared  buff/cache   available
Mem:           4017          18        3980           0          18        3949

I find the kernel memory footprint is not consistent. And looking at total memory, some of it is eaten by firmware(?).

It's not easy to make the available memory match precisely what the user requested. And if we add extra memory for kernel overhead, say 25M on top of a requested 128M, we end up with a reported total of 140M, e.g.:

# docker run -m 128m centos:7 /bin/free -m
              total        used        free      shared  buff/cache   available
Mem:            140           9         109           0          20         114

It looks pretty strange and confusing to me, since I only requested 128M of memory in the first place.

From my point of view, I can accept having less memory available to my application than I requested, because I should know Clear Containers uses a VM and the VM kernel consumes memory.

I think adding extra memory on top of the requested amount needs to be rethought (carefully).

For the time being (for the sake of this patch), we can focus on adding this feature as a first step, and then think about how to handle the VM kernel memory footprint (documentation or extra memory ...).


config.go Outdated
func ociResourceUpdateRuntimeConfig(ocispec oci.CompatOCISpec, config *oci.RuntimeConfig) error {
mem := int(float64(*ocispec.Linux.Resources.Memory.Limit) / 1024 / 1024)
if mem <= 0 {
return fmt.Errorf("Invalid oci memory limit %d", mem)
A contributor commented:

s/oci/OCI/g

wcwxyz (Contributor, Author) commented:

Will do. I'll add checks for ocispec.Linux.Resources.Memory.Limit.
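A sketch of what the revised function might look like with both review points folded in (the nil checks and the OCI casing); the VMConfig field being set is assumed from the discussion further down, not confirmed by this patch:

    func ociResourceUpdateRuntimeConfig(ocispec oci.CompatOCISpec, config *oci.RuntimeConfig) error {
        // memory.limit is optional: bail out early and keep the
        // configured default if any link in the chain is unset.
        if ocispec.Linux == nil || ocispec.Linux.Resources == nil ||
            ocispec.Linux.Resources.Memory == nil ||
            ocispec.Linux.Resources.Memory.Limit == nil {
            return nil
        }
        mem := int(*ocispec.Linux.Resources.Memory.Limit / 1024 / 1024)
        if mem <= 0 {
            return fmt.Errorf("Invalid OCI memory limit %d", mem)
        }
        config.VMConfig.Memory = uint(mem) // hypothetical field path
        return nil
    }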

wcwxyz (Contributor, Author) commented Aug 10, 2017

ping @grahamwhaley @jodh-intel

What do you think about #390 (comment)?

Can we leave the extra memory for the VM kernel out of this patch? We can come back to it when we have a better solution.

grahamwhaley (Contributor) commented:
Hi @wcwxyz. Yeah, I have been thinking about this :-)
So, thanks for the numbers, that really does help.
Indeed, as you show, it is not a nice linear overhead. I believe there is a static overhead (fixed kernel objects etc.), and then a more dynamic overhead that scales with the amount of memory, particularly the space and tables the kernel has to allocate at boot to map and manage it.
I think we are fine to go ahead without the extra VM overhead for now. It will only really affect the very small memory cases.

Just some extra notes:

  • I see you tested with centos. I deliberately chose alpine as possibly the smallest workload I could find, so that workload overheads featured less in the results. If/when we pick this up again we can try multiple workloads to correlate.
  • Just for reference, work done in rkt hit a similar situation and took a similarly simple solution; but, yes, more analysis and a more qualified solution would be nice: rkt/rkt@f44c604

wcwxyz (Contributor, Author) commented Aug 10, 2017

@grahamwhaley Thanks for your input. I'll do more investigation later if I get a chance.

coveralls commented Aug 10, 2017

Coverage decreased (-0.7%) to 54.684% when pulling 7af965f on wcwxyz:per-container-memory into 2eeb4f4 on clearcontainers:master.


coveralls commented Aug 10, 2017


Coverage decreased (-0.7%) to 54.684% when pulling ffe8c19 on wcwxyz:per-container-memory into 2eeb4f4 on clearcontainers:master.

We should run the VM (container) according to the OCI config.json memory limit.

Fixes #381

Signed-off-by: WANG Chao <chao.wang@ucloud.cn>

coveralls commented Aug 11, 2017


Coverage decreased (-0.8%) to 54.607% when pulling cd5cdf3 on wcwxyz:per-container-memory into 2eeb4f4 on clearcontainers:master.

sboeuf (Contributor) commented Aug 11, 2017

This PR is a no-go for me because this has to happen in the oci package defined in the virtcontainers repo: https://github.com/containers/virtcontainers/blob/master/pkg/oci/utils.go
The whole point of that package is to translate OCI into virtcontainers structures directly, which is why we decided to put this code in the virtcontainers repo.
-1

Rejected with PullApprove

wcwxyz (Contributor, Author) commented Aug 11, 2017

@sboeuf I understand.

The problem is that we need to translate the OCI spec spec.Linux.Resources.Memory.Limit into the RuntimeConfig runtime.VMConfig.Resources.Memory. If that happens inside virtcontainers, the translation will be opaque to the runtime.

How about adding an exported function in the oci package to update RuntimeConfig, and having the runtime call it? That way the runtime is aware of the translation and won't be surprised.

sboeuf (Contributor) commented Aug 11, 2017

The runtime does not need to know how every option is translated internally. From the runtimeConfig, the oci.PodConfig() or oci.ContainerConfig() function will translate this into virtcontainers structures.
The reason we want this inside the oci package is that if someone wanted to write another runtime, also based on the OCI spec and relying on the virtcontainers library, this code would be shared, which is what we expect. You have to see this runtime as a tiny wrapper on top of virtcontainers, nothing more.
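For illustration, a rough sketch of the shape such a translation could take inside the shared oci package; the function and field names here are assumptions, not the actual virtcontainers API:

    // Hypothetical sketch: resolve the VM memory once, inside the shared
    // oci package, so any OCI-based runtime built on virtcontainers
    // inherits the same behaviour. An OCI limit, when present, overrides
    // the runtime's configured default.
    func vmMemoryMiB(ocispec CompatOCISpec, defaultMiB uint) uint {
        if ocispec.Linux == nil || ocispec.Linux.Resources == nil ||
            ocispec.Linux.Resources.Memory == nil ||
            ocispec.Linux.Resources.Memory.Limit == nil {
            return defaultMiB
        }
        return uint(*ocispec.Linux.Resources.Memory.Limit >> 20)
    }

With this shape, the default from runtimeConfig applies only when config.json carries no limit, which also resolves the precedence question raised below.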

wcwxyz (Contributor, Author) commented Aug 11, 2017

Hmm, OK. I'll do that in the oci pkg.

I have a question though: when both the OCI spec and RuntimeConfig set memory, which should we honor?

Considering the runtime is the only user of virtcontainers, we won't have this problem in practice.

sboeuf (Contributor) commented Aug 11, 2017

There is no way to set a max memory with runtimeConfig, only a default value, so I don't think there is any conflict.

sboeuf (Contributor) commented Aug 16, 2017

I'm closing this one since containers/virtcontainers#341 is the replacement.

sboeuf closed this on Aug 16, 2017