Shared library loading from /tmp is broken when deleting the loaded library file to prevent outside access. #1911

Closed
srid opened this issue Sep 17, 2013 · 9 comments

@srid
Contributor

srid commented Sep 17, 2013

[ bug description was provided by colleague @andreas-kupries ]

Various methods for creating a single-file executable for scripting
applications allow shared libraries to be wrapped into the executable. When
such a shared library is used, the application code copies it out of the
internal virtual filesystem to /tmp so that libdl can see it and load it.
For proper hygiene, these temporary files are deleted from /tmp immediately
after libdl has loaded them. The process (the OS and libdl) can still access
the file through its fd and/or mmap handle. (On systems that do not allow
this, such as older HPUX, the temp files are instead marked for deletion on
process exit.)
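A minimal sketch of why the delete-after-load scheme works on an ordinary filesystem: an unlinked file stays readable through an existing mapping, because POSIX only frees the file once its last link and last open reference (including an mmap) are gone. The function name and the /tmp name template are hypothetical, and a plain mmap() of a small buffer stands in for what libdl does with the real library; this is not the wrapping tool's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copies `len` bytes into a fresh file under /tmp,
 * maps it read-only, unlinks the path, and then checks that the mapping
 * still holds the original bytes. Returns 1 if the data survived the
 * unlink, 0 if it did not, and -1 on any setup error. `len` must be > 0. */
int mapping_survives_unlink(const void *data, size_t len)
{
    char path[] = "/tmp/shlib-copy-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    /* Stand-in for "copy the library out of the virtual filesystem". */
    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* Stand-in for the loader: keep a mapping of the file's contents. */
    void *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* Hygiene step: remove the name so other processes cannot open it. */
    unlink(path);
    close(fd); /* the mapping keeps its own reference to the file */

    int ok = memcmp(map, data, len) == 0;
    munmap(map, len);
    return ok;
}
```

On a filesystem that behaves as POSIX describes, mapping_survives_unlink("hello", 5) returns 1; the bug in this issue concerns what the loader does after such an unlink, not whether the bytes survive.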

Regardless, inside a docker container only the first shared library loaded
this way is handled correctly; a second one is not. In the attached minimal
example, a symbol looked up in the second library is incorrectly resolved to
a pointer into the first library.

In the original bug, the symptom was a failure to find a function symbol that
is definitely present in the second library, according to the output of 'nm'.

The problem goes away either when the library is not loaded via /tmp, or when
it is not deleted immediately after loading and its deletion is instead
deferred to process exit.
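A sketch of the second workaround, deferring deletion to process exit, using atexit(). The helper names defer_unlink() and run_deferred_unlinks() and the fixed limit of 64 paths are hypothetical choices for illustration. Note that atexit() handlers only run on normal termination (exit() or returning from main), not when the process is killed by a signal, so temp copies can be left behind in that case.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_DEFERRED 64 /* arbitrary limit for this sketch */

static char *deferred_paths[MAX_DEFERRED];
static int deferred_count = 0;

/* Unlinks every path recorded by defer_unlink() and frees the copies.
 * Registered with atexit(), but safe to call earlier by hand. */
void run_deferred_unlinks(void)
{
    for (int i = 0; i < deferred_count; i++) {
        unlink(deferred_paths[i]);
        free(deferred_paths[i]);
        deferred_paths[i] = NULL;
    }
    deferred_count = 0;
}

/* Records `path` for removal at normal process exit instead of unlinking
 * it right after dlopen(). Returns 0 on success, -1 on failure. */
int defer_unlink(const char *path)
{
    static int registered = 0;
    if (!registered) {
        if (atexit(run_deferred_unlinks) != 0)
            return -1;
        registered = 1;
    }
    if (deferred_count >= MAX_DEFERRED)
        return -1;
    char *copy = strdup(path);
    if (copy == NULL)
        return -1;
    deferred_paths[deferred_count++] = copy;
    return 0;
}
```

In the loading code, a call like defer_unlink(tmp_path) would take the place of unlink(tmp_path) right after the library has been loaded.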

Outside of a docker container the issue does not occur: the system can load
any number of shared libraries via /tmp and delete them after loading without
problems.

https://github.com/ActiveState/docker-issue-1911
Sources (.c and shell scripts) for a demo of the shlib issue.

Four C files: two variants of a main application, and two minimal "packages".
Each package exports the function "fun".
The two main variants differ only in a single (un)commented line.

Outside of a docker container both variants work, printing A and B for the
two packages they load. Inside a container the bad variant does not report an
error, but prints A twice, i.e. the symbol 'fun' that should have come from
the "shb" library was taken from "sha".
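One hypothetical way to detect the symptom programmatically instead of reading the printed A/B: look the symbol up in both dlopen() handles and compare the addresses. The function same_symbol_address() is made up for this sketch and is not part of the demo repository.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical diagnostic: looks up `name` in two dlopen() handles.
 *    1  both lookups return the same address (as in the "bad" variant,
 *       where "fun" from shb resolves into sha),
 *    0  the addresses differ (expected for two distinct libraries),
 *   -1  either lookup fails (the symptom in the original bug report). */
int same_symbol_address(void *h1, void *h2, const char *name)
{
    void *p1 = dlsym(h1, name);
    void *p2 = dlsym(h2, name);
    if (p1 == NULL || p2 == NULL)
        return -1;
    return p1 == p2;
}
```

With handles from dlopen() on the two temp copies (for example hypothetical paths such as /tmp/sha.so and /tmp/shb.so), a result of 1 for "fun" corresponds to the "A A" output above. On glibc versions older than 2.34, link with -ldl.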

Session output:

andreask@akucloud:~/z$ ./build.sh
building...

Outside:

andreask@akucloud:~/z$ ./run.sh
running ok...
A
B
running bad...
A
B

Inside docker:

andreask@akucloud:~/z$ sd run -v `pwd`:/z -w /z -i -t ubuntu /bin/bash
WARNING: IPv4 forwarding is disabled.
root@91248898f8fc:/z# ./run.sh
running ok...
A
B
running bad...
A
A

Oh, and changing the if(1) in main-bad to if(0), which prevents the demo from
going through /tmp, also fixes the issue.

So, final: this looks to be at the intersection of libdl, lxc containers, and
the filesystem backing /tmp inside a docker container. Removing the loaded
shlib corrupts some data structures to the point where a second shlib, also
loaded from /tmp, is mis-processed (a wrong function pointer is delivered or,
in the original example, a symbol is not found).

@crosbymichael
Contributor

@srid Do you have an idea of what the fix should be?

@creack
Contributor

creack commented Jan 9, 2014

I think it has something to do with the way lxc mounts /tmp. I'll investigate.

@ghost ghost assigned creack Jan 9, 2014
@aidanhs
Contributor

aidanhs commented Mar 3, 2014

Possibly related to #4301

@aidanhs
Contributor

aidanhs commented Mar 3, 2014

Yeah, it looks like exactly the same issue. Running under devicemapper fixes it (and I guess so would mounting /tmp as tmpfs or a volume).

I will close that ticket in favour of this one. The only possibly useful comment I made on that ticket:
"""
Comparing straces, the only difference appears to be some memory mapping that doesn't happen on the broken run (on AUFS).

Looking at http://blog.dotcloud.com/kernel-secrets-from-the-paas-garage-part-34-a it seems mmap used to be a problem, but that was a while ago.
"""

andreas-kupries pushed a commit to tcltk/tcl that referenced this issue May 31, 2014
regarding the handling of wrapped dynamic libraries.

The basic flow of operation is to copy such libraries into a temp
file, hand them to the OS loader for processing, and then delete
them immediately, to prevent them from being accessible to other
executables. On platforms where that is not possible, the library is
left in place and things are arranged to delete it on regular process
exit.

An example of the latter is older revisions of HPUX, which report that
the file is busy when trying to delete it. Younger revisions of HPUX
have changed to allow the deletion, but are also buggy: the OS loader
mangles its data structures so that a second library loaded in this
manner fails.

More recently it was found that Linux, which is usually fine with
deleting the file and gets everything right, shows the same trouble as
modern HPUX when the "docker" containerization system is involved, or
more specifically the AUFS used there. Deleting the loaded library
file mangles data structures and breaks loading of the following
libraries. For a demonstration that does not involve Tcl at all, see
the ticket

     moby/moby#1911

in the docker tracker.

This of course breaks the use of wrapped executables within docker
containers.

This commit introduces the function TclSkipUnlink(), which centralizes
the handling of such exceptions to unlinking the library after load,
and provides code handling the known cases. IOW, HPUX is generally
forced to not unlink, and ditto when we detect that the copied library
file resides within an AUFS.

The latter must however be explicitly activated by setting the define
-DTCL_TEMPLOAD_NO_UNLINK during build. We still need proper configure
tests to set it on the relevant platforms (i.e. Linux).

The AUFS detection and handling can be overridden by the environment
variable TCL_TEMPLOAD_NO_UNLINK, which can force the behaviour either
way (skip or not), in case the user knows best or wishes to test
whether the problem with AUFS has been fixed.
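A sketch of the decision order described in the commit message for TclSkipUnlink(). This is a reading of the message, not Tcl's actual implementation: the function names are hypothetical, the build-time -DTCL_TEMPLOAD_NO_UNLINK switch is not modelled, the precedence of HPUX over the environment variable is an assumption, and the AUFS check assumes Linux statfs() and the AUFS superblock magic 0x61756673 ("aufs" in ASCII).

```c
#include <stdlib.h>
#if defined(__linux__)
#include <sys/vfs.h>
#endif

/* AUFS superblock magic: the ASCII bytes 'a', 'u', 'f', 's'. */
#define SKETCH_AUFS_MAGIC 0x61756673L

/* Hypothetical: decides whether a temp copy of a shared library should
 * stay on disk instead of being unlinked right after loading.
 *   1. HPUX: always skip the unlink.
 *   2. A non-empty override value: skip if it parses to a nonzero int.
 *   3. Otherwise: skip only when the copy lives on AUFS. */
int should_skip_unlink(int is_hpux, const char *override, long fs_magic)
{
    if (is_hpux)
        return 1;
    if (override != NULL && override[0] != '\0')
        return atoi(override) != 0;
    return fs_magic == SKETCH_AUFS_MAGIC;
}

/* Hypothetical: gathers the inputs for `path` from the running system. */
int skip_unlink_for(const char *path)
{
    int is_hpux = 0;
#if defined(__hpux)
    is_hpux = 1;
#endif
    long magic = 0; /* 0 means "filesystem type unknown" */
#if defined(__linux__)
    struct statfs fs;
    if (statfs(path, &fs) == 0)
        magic = (long)fs.f_type;
#else
    (void)path;
#endif
    return should_skip_unlink(is_hpux, getenv("TCL_TEMPLOAD_NO_UNLINK"), magic);
}
```

Keeping the pure decision in should_skip_unlink() separate from the system queries makes the override behaviour easy to test without an AUFS mount.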
@srid srid unassigned creack Jul 24, 2014
@aidanhs
Contributor

aidanhs commented Sep 30, 2014

@srid can you reattach the files (or put them in a repo somewhere since that's all I'll do if you reattach)?

(Edit: I want to check whether this is still an issue)

@srid
Contributor Author

srid commented Oct 1, 2014

@aidanhs
Contributor

aidanhs commented Oct 2, 2014

It looks like it works in Ubuntu 14.04, but not in 12.04:

Both machines:

~/docker/shutit_modules $ docker version
Client version: 1.2.0
Client API version: 1.14
Go version (client): go1.3.1
Git commit (client): fa7b24f
OS/Arch (client): linux/amd64
Server version: 1.2.0
Server API version: 1.14
Go version (server): go1.3.1
Git commit (server): fa7b24f

This one works:

~/docker/shutit_modules $ docker info
Containers: 4
Images: 75
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Dirs: 83
Execution Driver: native-0.2
Kernel Version: 3.13.0-35-generic
Operating System: Ubuntu 14.04.1 LTS
WARNING: No swap limit support
~/docker/shutit_modules $ man aufs | head | grep version
       aufs - advanced multi layered unification filesystem. version 3.13-20140303

This one is broken:

~ $ sudo docker info
Containers: 5
Images: 14
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Dirs: 24
Execution Driver: native-0.2
Kernel Version: 3.8.0-44-generic
Operating System: Ubuntu precise (12.04.5 LTS)
WARNING: No swap limit support
~ $ man aufs | head | grep version
       aufs - advanced multi layered unification filesystem. version 3.x-rcN-20111205

However, Ubuntu 12.04 does not reach end of life until 2017, so upgrading is not really a good solution...

@jessfraz
Contributor

@srid @aidanhs are you able to reproduce this with the latest version of docker, or is it resolved? If you can reproduce it, can you please give the exact steps you used?

@aidanhs
Contributor

aidanhs commented Mar 2, 2015

Can't reproduce any more, even with an old version of docker. I assume something was fixed in an Ubuntu package.

@jessfraz jessfraz closed this as completed Mar 2, 2015
andreas-kupries pushed a commit to tcltk/tcl that referenced this issue May 31, 2015
sebres pushed a commit to sebres/tcl that referenced this issue Oct 2, 2015
andreas-kupries pushed a commit to tcltk/tcl that referenced this issue Sep 8, 2016
cpuguy83 pushed a commit to cpuguy83/docker that referenced this issue May 25, 2021
…aster

Lock goroutine to OS thread while changing NS
5 participants