Include python-gdb.py in images shipped with debugging symbols #701

Merged on Feb 24, 2022 (3 commits)

@malor malor commented Feb 19, 2022

Description

Starting with gdb 7, it's possible to extend gdb with Python. CPython is shipped with a script that extracts high-level information specific to the interpreter, such as application-level stack traces, the values of local variables, etc. (see https://devguide.python.org/gdb/ for details). Include this script in the images that are shipped with debugging symbols.

The script is copied to /usr/share/gdb/auto-load/usr/local/bin/python${VERSION}-gdb.py and loaded automatically when gdb is used to debug a process running the python binary.
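
As a rough illustration of the naming convention: gdb's auto-load mechanism looks for a script named after the absolute path of the object file, with -gdb.py appended, under its auto-load directory. The helper name and its default argument below are illustrative assumptions and are not part of the image build.

```python
import posixpath


def autoload_script_path(objfile, autoload_dir="/usr/share/gdb/auto-load"):
    """Return where gdb would look for the auto-load script of `objfile`.

    Assumes `objfile` is already an absolute, symlink-resolved path.
    """
    if not posixpath.isabs(objfile):
        raise ValueError("objfile must be an absolute path")
    # Drop the leading "/" so that join() nests the path under autoload_dir.
    return posixpath.join(autoload_dir, objfile.lstrip("/")) + "-gdb.py"


print(autoload_script_path("/usr/local/bin/python3.10"))
```

For /usr/local/bin/python3.10 this yields the path given above with ${VERSION} set to 3.10.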

The image size increase is negligible (~72 KiB). gdb itself is not included; users who need it will have to install it separately.

Closes #89.

Prior art

The previous attempt to implement this (#251) incorrectly assumed that gdb has to be compiled from scratch so that it can be linked against the version of libpython shipped in the image. In fact, the Python version that gdb uses to run its extension scripts is independent of the Python interpreter being debugged; the two can coexist in the same image.

(Dockerfile is provided in the "Usage" section below)

$ docker run --rm -t -i --cap-add=SYS_PTRACE python:debugging-demo /bin/bash

# libpython used for running Python applications in this Docker image
root@7962ab0092f4:/# ldd $(which python) | grep libpython
        libpython3.10.so.1.0 => /usr/local/lib/libpython3.10.so.1.0 (0x00007f181fd57000)

# libpython used for running python-gdb.py in gdb; installed from a Debian package repository as a dependency of the gdb package
root@7962ab0092f4:/# ldd $(which gdb) | grep libpython
        libpython3.9.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0 (0x00007f7de38b9000)
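
The same comparison can be scripted. The sketch below pulls the libpython version out of ldd output with a regular expression; the function name is hypothetical, and the two sample lines are copied from the transcript above.

```python
import re


def libpython_version(ldd_output):
    """Extract the libpython version (e.g. "3.10") from `ldd` output, or None."""
    match = re.search(r"libpython(\d+\.\d+)\.so", ldd_output)
    return match.group(1) if match else None


python_ldd = "libpython3.10.so.1.0 => /usr/local/lib/libpython3.10.so.1.0 (0x00007f181fd57000)"
gdb_ldd = "libpython3.9.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython3.9.so.1.0 (0x00007f7de38b9000)"

# The interpreter under debugging and gdb's embedded interpreter differ.
print(libpython_version(python_ldd), libpython_version(gdb_ldd))
```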

Usage

Debugging a Python application that runs in a Docker container involves a few complications compared to debugging a regular process:

  • gdb needs access to (at least) the pid namespace of the container being debugged and to its filesystem contents, i.e. the binaries and the corresponding debugging symbols, which normally means access to its mount namespace
  • the gdb process needs to have the SYS_PTRACE capability
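
Whether a process actually holds SYS_PTRACE can be checked from inside the container. The following is a minimal sketch, assuming a Linux /proc filesystem; the function names are made up for illustration, and 19 is the number assigned to CAP_SYS_PTRACE in the Linux capability headers.

```python
CAP_SYS_PTRACE = 19  # capability number from <linux/capability.h>


def parse_cap_eff(status_text):
    """Return the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_capability(mask, cap):
    """Check whether bit number `cap` is set in the capability mask."""
    return bool(mask & (1 << cap))


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as f:
            mask = parse_cap_eff(f.read())
    except (OSError, ValueError):
        print("cannot read capabilities on this system")
    else:
        print("SYS_PTRACE:", has_capability(mask, CAP_SYS_PTRACE))
```

Run in a container started with --cap-add SYS_PTRACE, this should report that the capability is present; without the flag it normally is not.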

Debugging can be done in different ways:

Install gdb in production images

Suitable for low-level troubleshooting during development, but not in production.

Pros:

  • very straightforward

Cons:

  • significantly increases image size
  • requires running production containers with the SYS_PTRACE capability

Demo:

$ cat Dockerfile
# syntax=docker/dockerfile:1.3-labs

FROM python:3.10-bullseye

RUN apt-get update && apt-get install -y gdb

COPY <<EOF /opt/eggs.py
def f():
    g()

def g():
    abs(42)

f()
EOF

CMD ["gdb", "python", "-q", "-ex", "set breakpoint pending on", "-ex", "break builtin_abs", "-ex", "run /opt/eggs.py", "-ex", "py-bt", "-ex", "continue", "-ex", "quit"]

$ DOCKER_BUILDKIT=1 docker build . -t python:debugging-demo

$ docker run --rm -ti --cap-add SYS_PTRACE python:debugging-demo
Reading symbols from python...
Function "builtin_abs" not defined.
Breakpoint 1 (builtin_abs) pending.
Starting program: /usr/local/bin/python /opt/eggs.py
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, builtin_abs (module=<module at remote 0x7fd4ec5d8590>, x=42) at Python/bltinmodule.c:307
307     Python/bltinmodule.c: No such file or directory.
Traceback (most recent call first):
  <built-in method abs of module object at remote 0x7fd4ec5d8590>
  File "/opt/eggs.py", line 5, in g
    abs(42)
  File "/opt/eggs.py", line 2, in f
    g()
  File "/opt/eggs.py", line 7, in <module>
    f()
Continuing.
[Inferior 1 (process 12) exited normally]

Build a separate image with gdb

Use a multistage build to produce two versions of the Docker image:

  • production, which only includes the required dependencies
  • debug, which is based on the production image, but also has gdb (and its dependencies) installed

Suitable for debugging in production.

Pros:

  • no extra dependencies in production images
  • no need to add the SYS_PTRACE capability to production containers

Cons:

  • more involved; requires starting two separate containers that share the pid namespace

Demo:

$ cat Dockerfile
# syntax=docker/dockerfile:1.3-labs

# prod image
FROM python:3.10-bullseye AS production

COPY <<EOF /opt/eggs.py
import time

def f():
    g()

def g():
    time.sleep(600)
    abs(42)

f()
EOF

CMD ["python", "/opt/eggs.py"]

# debug image: same contents + gdb installed
FROM production AS debug
RUN apt-get update && apt-get install -y gdb
CMD ["gdb"]

$ DOCKER_BUILDKIT=1 docker build . --target=production -t python:debugging-demo-production
$ DOCKER_BUILDKIT=1 docker build . --target=debug -t python:debugging-demo-debug

# start the production container
$ docker run --rm -d python:debugging-demo-production
d9af3d19848ec673b36fffdb62cdd699bb5314d81f1a934f049e22793b85bd4f

# start the debug container and attach it to the PID namespace of the production container
$ docker run --rm -ti --cap-add SYS_PTRACE --pid=container:d9af3d19848ec673b36fffdb62cdd699bb5314d81f1a934f049e22793b85bd4f python:debugging-demo-debug /bin/bash

root@f4314023f375:/# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 16:44 ?        00:00:00 python /opt/eggs.py

root@f4314023f375:/# gdb -q python -p 1 -ex "set sysroot" -ex "py-bt"
Reading symbols from python...
Attaching to program: /usr/local/bin/python, process 1
Error while mapping shared library sections:
Could not open `target:/usr/local/lib/libpython3.10.so.1.0' as an executable file: Operation not permitted
Error while mapping shared library sections:
Could not open `target:/lib/x86_64-linux-gnu/libc.so.6' as an executable file: Operation not permitted
Error while mapping shared library sections:
Could not open `target:/lib/x86_64-linux-gnu/libpthread.so.0' as an executable file: Operation not permitted
Error while mapping shared library sections:
Could not open `target:/lib/x86_64-linux-gnu/libdl.so.2' as an executable file: Operation not permitted
Error while mapping shared library sections:
Could not open `target:/lib/x86_64-linux-gnu/libutil.so.1' as an executable file: Operation not permitted
Error while mapping shared library sections:
Could not open `target:/lib/x86_64-linux-gnu/libm.so.6' as an executable file: Operation not permitted
Error while mapping shared library sections:
Could not open `target:/lib64/ld-linux-x86-64.so.2' as an executable file: Operation not permitted

warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
0x00007f0b93bf8866 in ?? ()
Reading symbols from /usr/local/lib/libpython3.10.so.1.0...
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...
Reading symbols from /usr/lib/debug/.build-id/54/eef5ce96cf37cb175b0d93186836ca1caf470c.debug...
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...
Reading symbols from /usr/lib/debug/.build-id/50/18237bbf012b4094027fd0b96fc22a24496ea4.debug...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...
Reading symbols from /usr/lib/debug/.build-id/11/8b90161526d181807818c459baee841993795b.debug...
Reading symbols from /lib/x86_64-linux-gnu/libutil.so.1...
Reading symbols from /usr/lib/debug/.build-id/56/75f6cc697d1e1fb135c65cbb0f917550fe85ac.debug...
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...
Reading symbols from /usr/lib/debug/.build-id/e9/d2c06479b13dd3cfa78d714d11dccf6fcbee51.debug...
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Reading symbols from /usr/lib/debug/.build-id/32/438eb3b034da54caf58c7a65446639f7cfe274.debug...
Traceback (most recent call first):
  <built-in method sleep of module object at remote 0x7f0b9373f510>
  File "/opt/eggs.py", line 7, in g
    time.sleep(600)
  File "/opt/eggs.py", line 4, in f
    g()
  File "/opt/eggs.py", line 10, in <module>
    f()

The critical bit here is set sysroot, which tells gdb to load symbol files from its local mount namespace. This works because the two images are almost identical (they share the same "production" base), so the very same file paths exist in the "debug" image as well. Without it, gdb tries to read the files through the mount namespace of the "production" container, which the gdb process running in the "debug" container is not permitted to access (see the "Operation not permitted" errors in the example above, which are printed before sysroot is changed).
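
For scripted use, the attach command from the demo can be assembled programmatically. This is only a sketch: the helper name is hypothetical, and the arguments are the ones used in the transcript above.

```python
import shlex


def gdb_attach_argv(pid, program="python"):
    """Build the gdb command line that attaches to `pid` and prints a Python traceback."""
    return [
        "gdb", "-q", program,
        "-p", str(pid),
        # Empty sysroot: resolve binaries and libraries in gdb's own mount namespace.
        "-ex", "set sysroot",
        "-ex", "py-bt",
    ]


# Show the command in a form that can be pasted into a shell.
print(shlex.join(gdb_attach_argv(1)))
```

The resulting list can be passed to subprocess.run() as is, which avoids shell quoting issues around the "set sysroot" argument.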

Run gdb in the root namespace

gdb supports entering the mount namespace of the process being debugged, so it can be run directly in the root pid namespace. It enters the target's mount namespace automatically if it differs from the one gdb was started in.

Pros:

  • no need to have gdb installed in any of the containers
  • very similar to debugging normal Python processes run in the root namespace

Cons:

  • python-gdb.py for the target CPython version is not loaded automatically, and might not even exist on the host
  • while gdb itself supports this mode of operation, python-gdb.py does not: it cannot load Python source code from a different mount namespace
  • gdb prints a warning that this mode is unreliable:

warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable. Connect to gdbserver inside the container.
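
Whether two processes share a namespace can be checked by comparing the symlinks under /proc/<pid>/ns. The sketch below assumes a Linux host; the function names and the injectable readlink parameter are illustrative choices, not something gdb provides.

```python
import os


def namespace_id(pid, ns, readlink=os.readlink):
    """Return the namespace identifier of `pid`, e.g. "pid:[4026531836]"."""
    return readlink(f"/proc/{pid}/ns/{ns}")


def shares_namespace(pid_a, pid_b, ns, readlink=os.readlink):
    """Check whether two processes are in the same namespace of type `ns`."""
    return namespace_id(pid_a, ns, readlink) == namespace_id(pid_b, ns, readlink)
```

On the host, calling shares_namespace(os.getpid(), <container pid>, "pid") would be expected to return False for a containerized process; reading another process's namespace links usually requires root.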

Demo:

$ docker run --rm -d python:debugging-demo-production
a5ae819169b8420ddf6b365fbb16ce3ce1b0e211df646732420df7ed5210eab0

$ ps -ef | grep eggs
root        4276    4255  2 14:26 ?        00:00:00 python /opt/eggs.py

$ sudo gdb -q -p 4276
Attaching to process 4276
Reading symbols from target:/usr/local/bin/python3.10...
Reading symbols from target:/usr/local/lib/libpython3.10.so.1.0...
Reading symbols from target:/lib/x86_64-linux-gnu/libc.so.6...
(No debugging symbols found in target:/lib/x86_64-linux-gnu/libc.so.6)
Reading symbols from target:/lib/x86_64-linux-gnu/libpthread.so.0...
(No debugging symbols found in target:/lib/x86_64-linux-gnu/libpthread.so.0)
Reading symbols from target:/lib/x86_64-linux-gnu/libdl.so.2...
(No debugging symbols found in target:/lib/x86_64-linux-gnu/libdl.so.2)
Reading symbols from target:/lib/x86_64-linux-gnu/libutil.so.1...
(No debugging symbols found in target:/lib/x86_64-linux-gnu/libutil.so.1)
Reading symbols from target:/lib/x86_64-linux-gnu/libm.so.6...
(No debugging symbols found in target:/lib/x86_64-linux-gnu/libm.so.6)
Reading symbols from target:/lib64/ld-linux-x86-64.so.2...
(No debugging symbols found in target:/lib64/ld-linux-x86-64.so.2)

warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable.  Connect to gdbserver inside the container.

warning: Expected absolute pathname for libpthread in the inferior, but got target:/lib/x86_64-linux-gnu/libpthread.so.0.

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
0x00007f382c3c6866 in select () from target:/lib/x86_64-linux-gnu/libc.so.6

# need to load the python-gdb.py script manually first; this only works in the same mount namespace!
(gdb) source /home/malor/src/cpython/Tools/gdb/libpython.py

# the stack trace will not have source code lines, as python-gdb.py can't load them from a different mount namespace
(gdb) py-bt
Traceback (most recent call first):
  <built-in method sleep of module object at remote 0x7f382bf13510>
  File "/opt/eggs.py", line 7, in g
  File "/opt/eggs.py", line 4, in f
  File "/opt/eggs.py", line 10, in <module>
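
The missing source lines can still be looked up by hand from the host, because /proc/<pid>/root exposes the root filesystem of the target process. This is a possible workaround sketched under that assumption, not something python-gdb.py does; the function name and the proc parameter are hypothetical.

```python
import os


def source_line_from_target(pid, filename, lineno, proc="/proc"):
    """Read line `lineno` (1-based) of `filename` as seen by process `pid`."""
    # /proc/<pid>/root points at the root filesystem of the target process.
    path = os.path.join(proc, str(pid), "root", filename.lstrip("/"))
    with open(path, encoding="utf-8", errors="replace") as f:
        for number, line in enumerate(f, start=1):
            if number == lineno:
                return line.rstrip("\n")
    return None  # the file has fewer than `lineno` lines
```

For the traceback above, source_line_from_target(4276, "/opt/eggs.py", 7) would be the call to try; it typically has to run as root.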

@tianon tianon left a comment

Nice, thanks for the detailed overview!

I have just a minor suggestion, but otherwise I think this is great.

malor and others added 3 commits February 23, 2022 22:58
Starting with gdb 7, it's possible to extend gdb with Python. CPython
is shipped with a script that extracts high-level information specific
to the interpreter, such as application-level stack traces, the values
of local variables, etc. (see https://devguide.python.org/gdb/ for
details). Include this script in the images that are shipped with
debugging symbols.

The script is copied to /usr/share/gdb/auto-load/usr/local/bin/python${VERSION}-gdb.py
and loaded automatically when gdb is used to debug a process running the
python binary.

The image size increase is negligible (~72KiB). Users will have to
install gdb separately, if they need it.

Closes #89
Co-authored-by: Tianon Gravi <admwiggin@gmail.com>
@malor malor requested a review from tianon February 23, 2022 23:18
@tianon tianon requested a review from yosifkit February 23, 2022 23:18
@yosifkit yosifkit merged commit a4b3681 into docker-library:master Feb 24, 2022
docker-library-bot added a commit to docker-library-bot/official-images that referenced this pull request Feb 24, 2022
Changes:

- docker-library/python@a4b3681: Include python-gdb.py in images shipped with debugging symbols (docker-library/python#701)