Add support for setting sysctls #19265

Merged
merged 1 commit into from Apr 13, 2016

rhatdan
Contributor

@rhatdan rhatdan commented Jan 12, 2016

This patch will allow users to specify namespace specific "kernel parameters"
for running inside of a container.

Dan Walsh dwalsh@redhat.com

Signed-off-by: Dan Walsh dwalsh@redhat.com

@rhatdan
Contributor Author

rhatdan commented Jan 12, 2016

This pull request replaces #16632

@rhatdan
Contributor Author

rhatdan commented Jan 12, 2016

Opened pull request for engine-api docker/engine-api#38

}
arr := strings.Split(val, "=")
if len(arr) < 2 {
return "", fmt.Errorf("sysctl '%s' is not valid ", val)
Member

should we say "not valid" here, or something that gives the user a clue that it's not supported / whitelisted?

Contributor Author

How about
sysctl %s is not a namespaced kernel parameter

Member

could work; is that true for all kernel versions though? Perhaps we should mention that we don't support it instead? Open to better suggestions though

Contributor Author

Well then we get a bug report telling us it is a namespaced kernel parameter. If we say unsupported, user might ignore.

Member

IMO mentioning that it's "not whitelisted" makes it more clear to the user that they can come make the case for why it should be added 😇

Contributor Author

Well if you guys come to consensus I will change the message. :^)

Member

I'm good with "not whitelisted"

and good news; this has been moved to code review!

// ValidateSysctl validates a sysctl and returns it.
func ValidateSysctl(val string) (string, error) {
validSysctlMap := map[string]bool{
"kernel.msgmax": true,
Member

Are all these actually namespaced from 3.10+?

Contributor Author

man namespaces
...
   IPC namespaces (CLONE_NEWIPC)
...

       *  The System V IPC interfaces in /proc/sys/kernel, namely: msgmax,
          msgmnb, msgmni, sem, shmall, shmmax, shmmni, and shm_rmid_forced.
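The whitelist approach discussed in this thread can be sketched roughly as follows. This is a minimal sketch, not the function merged in this PR: `checkSysctl` and `allowedSysctls` are hypothetical names, the key list is limited to the IPC sysctls named in the man page excerpt above, and the error text uses the "not whitelisted" wording the reviewers settled on.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedSysctls holds the IPC-namespaced keys listed in namespaces(7).
// Assumption: a real whitelist may contain more entries than these.
var allowedSysctls = map[string]bool{
	"kernel.msgmax":          true,
	"kernel.msgmnb":          true,
	"kernel.msgmni":          true,
	"kernel.sem":             true,
	"kernel.shmall":          true,
	"kernel.shmmax":          true,
	"kernel.shmmni":          true,
	"kernel.shm_rmid_forced": true,
}

// checkSysctl is a hypothetical helper. It accepts a "key=value" string,
// rejects malformed input, and rejects keys that are not on the whitelist.
func checkSysctl(val string) (string, error) {
	parts := strings.SplitN(val, "=", 2)
	if len(parts) != 2 || parts[0] == "" {
		return "", fmt.Errorf("sysctl '%s' must be in key=value form", val)
	}
	if !allowedSysctls[parts[0]] {
		return "", fmt.Errorf("sysctl '%s' is not whitelisted", val)
	}
	return val, nil
}

func main() {
	for _, s := range []string{"kernel.shmmax=65536", "kernel.kptr_restrict=1", "kernel.sem"} {
		if _, err := checkSysctl(s); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", s)
	}
}
```

Separating the "malformed" case from the "not whitelisted" case gives the user a more specific hint about what to fix, which is the concern raised in the review comments.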


would you consider adding kernel.perf_event_paranoid and kernel.kptr_restrict to the whitelist?

as a use case: i'm working on a proof of concept around flame graphs, visualizing perf event data in a containerized node.js app following along the work here https://gist.github.com/trevnorris/9616784.

have had a devil of a time sorting out getting these kernel parameters to stick in a running container. came upon your PR, and am stoked to see a --sysctl flag in the works.

Contributor Author

@mysterlune Do you know if these sysctls are namespaced? If you set them inside of the container, do the settings show up outside of the container? If not, then we would add them, if yes then we will not add them.

You could test this by running a --privileged container.


hey @rhatdan,

so i ran the container with the --privileged flag as such (as you can see, i'm on a mac; unsure if this matters to this experiment):

docker run -d -p 8080:8080 --name foobar -v /Users/rlune/gitproj/observable_node_poc:/src -v /Users/rlune/gitproj/observable_node_poc:/tmp --privileged observable-node-poc

... and got:

d137a82d7731ab095c484d0e70dd96081c275cf43c50100128ad5ea44e40cb01

... then, exec'd a bash session:

docker exec -it d137a82d7731ab095c484d0e70dd96081c275cf43c50100128ad5ea44e40cb01 bash

... checked the host machine for the kernel flags next...:

16:24 rlune@oakm111430f01:~/gitproj/observable_node_poc $ sysctl -a | grep kernel.perf_event_paranoid
16:25 rlune@oakm111430f01:~/gitproj/observable_node_poc $ sysctl -a | grep kernel.kptr_restrict

... received no output for each lookup. then, set the flags on the instance running in the container:

[root@d137a82d7731 app]# sysctl -w kernel.kptr_restrict=1
kernel.kptr_restrict = 1
[root@d137a82d7731 app]# sysctl -w kernel.perf_event_paranoid=1
kernel.perf_event_paranoid = 1

... read the keys on the image:

[root@d137a82d7731 app]# sysctl -a | grep kernel.kptr_restrict      
kernel.kptr_restrict = 1
sysctl: reading key "net.ipv6.conf.all.stable_secret"
sysctl: reading key "net.ipv6.conf.default.stable_secret"
sysctl: reading key "net.ipv6.conf.eth0.stable_secret"
sysctl: reading key "net.ipv6.conf.lo.stable_secret"

[and got the same sort of output for the `kernel.perf_event_paranoid` also]

... then checked the host machine again:

16:24 rlune@oakm111430f01:~/gitproj/observable_node_poc $ sysctl -a | grep kernel.perf_event_paranoid
16:25 rlune@oakm111430f01:~/gitproj/observable_node_poc $ sysctl -a | grep kernel.kptr_restrict

... and did not see any lookup output for those keys.

is this test sufficient to determine that the kernel.perf_event_paranoid and kernel.kptr_restrict keys are namespaced in a way that will allow their addition to this PR?

Contributor Author

Not sure if it is enough, but it certainly looks good. I guess you could run another container and make sure the values there are different.

@jeremyeder PTAL


dang it... it doesn't look like those flags are namespaced if i run two containers and set the flags within one of them... they show up as such in the other...

so, how does a flag get "namespaced" so that this collision doesn't happen?

Contributor Author

You write a kernel patch. :^(
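The two-container test from this thread (write a value in one container, then read it from a second one) can be expressed as a small comparison. This is only a sketch: `observation` and `looksNamespaced` are hypothetical names, and the values would have to be collected by hand, for example by reading the key in each container before and after the write. Note that a host check made on a Mac reads the macOS kernel's sysctls rather than those of the Linux VM that runs the containers, so the container-to-container comparison is the more telling one.

```go
package main

import "fmt"

// observation holds the value of one sysctl key as read in two containers,
// before and after a new value was written inside container A only.
type observation struct {
	aBefore, aAfter string // values read in container A (where the write happened)
	bBefore, bAfter string // values read in container B (never written to)
}

// looksNamespaced reports whether the write in A stayed confined to A.
// It returns false when the write had no visible effect in A, because
// such an observation says nothing either way.
func looksNamespaced(o observation) bool {
	if o.aBefore == o.aAfter {
		return false // the write did not take effect; inconclusive
	}
	return o.bBefore == o.bAfter
}

func main() {
	// Illustrative values only, not measurements from this thread.
	leaked := observation{aBefore: "0", aAfter: "1", bBefore: "0", bAfter: "1"}
	confined := observation{aBefore: "32", aAfter: "64", bBefore: "32", bAfter: "32"}
	fmt.Println(looksNamespaced(leaked))   // false
	fmt.Println(looksNamespaced(confined)) // true
}
```

A `true` result is evidence rather than proof; the kernel documentation remains the authority on which keys are namespaced.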

@rhatdan
Contributor Author

rhatdan commented Feb 9, 2016

Now that we are in docker-1.11, could we get this pull request moving? @calavera @cpuguy83

@thaJeztah thaJeztah added the status/needs-attention label Feb 15, 2016
@thaJeztah
Member

We're okay with this, moving to code review, unless @crosbymichael has major concerns

@thaJeztah thaJeztah added status/2-code-review and removed status/1-design-review, status/needs-attention labels Feb 18, 2016
@rhatdan rhatdan force-pushed the netsysctl branch 3 times, most recently from 7269711 to ab2c413 Compare February 19, 2016 15:25
@icecrime icecrime added the status/failing-ci label Feb 29, 2016
@thaJeztah
Member

ping @calavera @cpuguy83 PTAL

@cpuguy83
Member

Just need to update the engine-api vendor, otherwise code LGTM.

@thaJeztah
Member

Looks like this needs another rebase @rhatdan 😢

@rhatdan
Contributor Author

rhatdan commented Mar 2, 2016

Done.

@rhatdan rhatdan force-pushed the netsysctl branch 2 times, most recently from d6a278d to 1a5db4b Compare March 2, 2016 22:12
@thaJeztah thaJeztah mentioned this pull request Mar 8, 2016
@thaJeztah
Member

ping @vdemeester ptal

@vdemeester
Member

LGTM 🐰
gccgo is SUCCESS and windowsTP5 fails with a known flaky test; merging 😉

@vdemeester vdemeester merged commit 988508a into moby:master Apr 13, 2016
@thaJeztah
Member

🎉 thanks @rhatdan!

@brunoborges

Has anybody tested --sysctl to tweak net.ipv4.tcp_syn_retries to reduce global timeout for TCP connections inside the container? Does it work?

runcom pushed a commit to runcom/docker that referenced this pull request Jun 8, 2016
Upstream reference: moby#19265

This patch will allow users to specify namespace specific "kernel parameters"
for running inside of a container.

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
@vingrad

vingrad commented Jun 9, 2016

How can I use the sysctl option with a compose file?

@thaJeztah
Member

@vingrad this feature is not released yet (it will be in docker 1.12), also, that's a better question for the docker compose issue tracker; https://github.com/docker/compose/issues

liusdu pushed a commit to liusdu/moby that referenced this pull request Oct 30, 2017
This patch will allow users to specify namespace specific "kernel parameters"
for running inside of a container.

cherry-pick from: moby#19265

conflicts:
        contrib/completion/bash/docker
        docs/reference/commandline/create.md
        docs/reference/commandline/run.md
        man/docker-create.1.md
        man/docker-run.1.md
        runconfig/opts/parse.go

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
Signed-off-by: Lei Jitang <leijitang@huawei.com>
liusdu pushed a commit to liusdu/moby that referenced this pull request Oct 30, 2017
Add support for setting sysctls

This patch will allow users to specify namespace specific "kernel parameters"
for running inside of a container.

cherry-pick from: moby#19265
conflicts:
        contrib/completion/bash/docker
        docs/reference/commandline/create.md
        docs/reference/commandline/run.md
        man/docker-create.1.md
        man/docker-run.1.md
        runconfig/opts/parse.go

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
Signed-off-by: Lei Jitang <leijitang@huawei.com>

This is a feature request from euleros.


See merge request docker/docker!385
@merryisfriend

Can we now use the sysctl option in a Dockerfile? I have some parameters to set in the container. @rhatdan

@cpuguy83
Member

cpuguy83 commented Dec 6, 2017

No, you can't persist sysctls in an image.

@vinujan59

vinujan59 commented Oct 10, 2019

ubuntu@ip:~$ docker --version
Docker version 19.03.2, build 6a30dfc

ubuntu@ip:~$ uname -r
4.15.0-1051-aws

ubuntu@ip:~$ sysctl net.core.rmem_default
net.core.rmem_default = 212992

ubuntu@ip:~$ docker run  --privileged  -it ubuntu:16.04 uname -r
4.15.0-1051-aws

ubuntu@ip:~$ docker run  --privileged  -it ubuntu:16.04  sysctl net.core.rmem_default
sysctl: cannot stat /proc/sys/net/core/rmem_default: No such file or directory

ubuntu@ip:~$ docker run  --privileged --sysctl net.core.rmem_default=524288  -it ubuntu:16.04  /bin/bash
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"write sysctl key net.core.rmem_default: open /proc/sys/net/core/rmem_default: no such file or directory\"": unknown.

ubuntu@ip:~$ docker run  --privileged  --network="host" -it ubuntu:16.04  sysctl net.core.rmem_default
net.core.rmem_default = 212992
  • I have the latest Docker version
  • the host has the parameter net.core.rmem_default
  • the running container uses the same kernel
  • the container doesn't have this parameter, and it cannot be set there either
  • but with host network mode the value can be read inside the container, and it is the host's value
    -- implies that net.core.rmem_default is namespaced

Does Docker not support the net.core.rmem_default parameter?
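The error in the transcript above comes from the runtime failing to open the file under /proc/sys inside the container. A process can check for that condition before attempting a write. The following is a minimal sketch under stated assumptions: `sysctlPath` and `sysctlVisible` are hypothetical helpers, and the simple dot-to-slash mapping does not handle keys whose components themselves contain dots (for example some interface names).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// sysctlPath maps a dotted key such as "net.core.rmem_default" to its file
// under procSys (normally "/proc/sys"). Limitation: components that contain
// a literal dot are not handled.
func sysctlPath(procSys, key string) string {
	return filepath.Join(procSys, strings.ReplaceAll(key, ".", "/"))
}

// sysctlVisible reports whether the key's file exists in the /proc/sys view
// of the calling process, i.e. in its current set of namespaces.
func sysctlVisible(procSys, key string) bool {
	_, err := os.Stat(sysctlPath(procSys, key))
	return err == nil
}

func main() {
	key := "net.core.rmem_default"
	if sysctlVisible("/proc/sys", key) {
		fmt.Println(key, "is visible in this namespace")
	} else {
		fmt.Println(key, "is not visible here; writing it would fail with 'no such file or directory'")
	}
}
```

Run on the host or with host networking, the check should match the successful `sysctl` reads shown above; run in a container with its own network namespace on the same kernel, it should match the failing ones.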

@cpuguy83
Member

cpuguy83 commented Oct 10, 2019 via email

@vinujan59

Hi Brian,

Thank you for your reply.

I could not get the part "non-root network namespace".

Can you please elaborate?

Thanks
