SEGFAULT when certain modules are used in a project with Alpine 3.6/Python 3.6 #211

Closed
beaugunderson opened this issue Jul 12, 2017 · 29 comments

@beaugunderson

beaugunderson commented Jul 12, 2017

We've recently upgraded several of our containers to Python 3.6.

Some of the containers always segfault immediately after starting Django's runserver command.

In a couple of the containers, upgrading our dependencies fixed the issue, but in some other containers that had not shown the issue, upgrading the dependencies caused it to start happening.

In one example upgrading requests from 2.12.1 to 2.18.1 stopped the SEGFAULTs but adding the latest version of boto3 caused them to start again.

The issue only happens when run under docker.

I have a simple example repo here and would like to offer a bounty of $100 to whoever can either fix the issue or explain it in a way that allows me to fix it. ✨

Edit: we've tested with 3-alpine and 3-alpine3.6; same result.

Also SEGFAULTs on 3.6.2rc2-alpine3.6.

More clues: requests is also implicated in boto3 which includes botocore--which includes a vendored requests 2.7.0.

@xyzz

xyzz commented Jul 12, 2017

Can you check this ultrajson/ultrajson#254 (comment)

@yosifkit
Member

A couple of pointers: don't install python3 or python3-dev from apk if you are using the python Docker image. The Python in the Docker image is installed from source, so anything that links to or uses the apk-installed Python will probably have issues. But this is not the source of the segfault.

Don't remove dependencies in a later RUN line as this won't save any image size. Don't apk update on its own RUN line (use && to chain it to the command using it). Don't apk upgrade in a container; it is best to wait for the base image to update.

Following are the two Dockerfiles that I ran your compose file with; both still segfault. The first uses only the Python provided in the Docker python image (Alpine based). The second uses python3 from Alpine Linux (apk). So my opinion is that this is not a bug in the Docker python image; I suspect it is one of the "package expects glibc but has musl libc" cases noted in the readme on Docker Hub. It is therefore most likely to need a fix in Python upstream or in musl libc. @ncopa, do you think you could help debug where the fix is needed?

FROM python:3.6-alpine3.6

# logging to the console breaks without this
ENV PYTHONUNBUFFERED 1
ENV PYTHONFAULTHANDLER 1

RUN apk add --no-cache \
        bash \
        build-base \
        gettext \
        linux-headers \
        musl-dev \
        postgresql-client \
        postgresql-dev

RUN mkdir -p /app/

WORKDIR /app/

# so we can cache the installed python modules apart from the app files
COPY *.txt /app/

RUN pip3 install --upgrade pip setuptools
RUN pip3 install -r requirements.txt -r dev-requirements.txt

COPY . /app/

FROM alpine:3.6

# logging to the console breaks without this
ENV PYTHONUNBUFFERED 1
ENV PYTHONFAULTHANDLER 1

RUN apk add --no-cache \
        bash \
        build-base \
        gettext \
        linux-headers \
        musl-dev \
        postgresql-client \
        postgresql-dev \
        python3 \
        python3-dev \
# alpine doesn't provide a `python` symlink to `python3`, so create your own
    && ln -s /usr/bin/python3 /usr/local/bin/python

RUN mkdir -p /app/

WORKDIR /app/

# so we can cache the installed python modules apart from the app files
COPY *.txt /app/

RUN pip3 install --upgrade pip setuptools
RUN pip3 install -r requirements.txt -r dev-requirements.txt

COPY . /app/

This Dockerfile, on the other hand, works fine.

# Debian based image
FROM python:3.6

RUN mkdir -p /app/

WORKDIR /app/

# so we can cache the installed python modules apart from the app files
COPY *.txt /app/

RUN pip3 install --upgrade pip setuptools
RUN pip3 install -r requirements.txt -r dev-requirements.txt

COPY . /app/

@beaugunderson
Author

@yosifkit good catch on not installing python3; I had just removed that locally as well :)

re: the other Docker optimizations, thanks! I have more reading to do I guess :)

@beaugunderson
Author

@xyzz I implemented your changes here; I still get a segfault: https://github.com/beaugunderson/crash-test/compare/bg-stack-fix

is there a way to test that LD_PRELOAD is working correctly? I don't see any output for the printf I added...
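One way to check whether a preloaded library was actually mapped into a process is to look for it in /proc/<pid>/maps on Linux. The following is a minimal sketch under stated assumptions: the helper name `preload_status` and the sample maps text are hypothetical, and the path /app/stack-fix.so is only inferred from the build command quoted in this thread.

```python
import os
import re


def preload_status(ld_preload, maps_text):
    """Report, per LD_PRELOAD entry, whether a file of that name is mapped."""
    mapped_names = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) == 6:
            mapped_names.add(os.path.basename(fields[5].strip()))
    # LD_PRELOAD entries may be separated by colons or whitespace.
    entries = [e for e in re.split(r"[:\s]+", ld_preload) if e]
    return {e: os.path.basename(e) in mapped_names for e in entries}


# Hypothetical sample resembling /proc/self/maps with the shim loaded.
SAMPLE_MAPS = """\
55d0c0a00000-55d0c0a01000 r-xp 00000000 08:01 131 /usr/local/bin/python3.6
7f1c2a000000-7f1c2a001000 r-xp 00000000 08:01 412 /app/stack-fix.so
7ffd3b1e0000-7ffd3b201000 rw-p 00000000 00:00 0 [stack]
"""

status = preload_status("/app/stack-fix.so", SAMPLE_MAPS)
```

Inside a running container the same helper could be fed `os.environ.get("LD_PRELOAD", "")` and the contents of /proc/self/maps. It matches on file name only, so it is a quick sanity check rather than proof that the shim's functions are being called.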

@xyzz

xyzz commented Jul 12, 2017

RUN cd /app/ && gcc -shared -fPIC stack-fix.c -o stack-fix.o

.so, not .o

@beaugunderson
Author

oof, simple typo! thanks :)

@beaugunderson
Author

@xyzz that works! where shall I send your $100? :)

@xyzz

xyzz commented Jul 12, 2017

awesome! Please donate it to Doctors Without Borders or a similar charity.

@beaugunderson
Author

@xyzz done! i assume I'll get some confirmation once it goes through (I did it through work) and am happy to send that along when I get it :)

also: do you happen to know where the best place to escalate this issue might be? it seems like an 'alpine's use of musl' issue, correct?

screenshot

@xyzz

xyzz commented Jul 12, 2017

I've already poked an Alpine developer about it; I think they will patch Python to request a bigger stack size in pthread_create.

@beaugunderson
Author

beaugunderson commented Jul 13, 2017 via email

@beaugunderson
Author

I'll close this issue since I don't think there's anything to be done here :)

I'll also add this link to a previous discussion of increasing the stack size in musl just to provide another concrete example.

@beaugunderson
Author

finally verifying that I made that donation ✨

screenshot

jaimebuelta pushed a commit to jaimebuelta/django-docker-template that referenced this issue Aug 4, 2017
Need to include a hack to solve a problem with running runserver
in python3.6. Check for more info:

docker-library/python#211
@jozo

jozo commented Oct 22, 2017

I have solved SEGFAULT issue by adding this code to my python script:

import threading; threading.stack_size(2*1024*1024)

It looks to me that this code does the same job as xyzz's solution but in python. I found this solution on ultrajson/ultrajson#254

@beaugunderson
Author

@fadawar this fix improves the SEGFAULT behavior for my containers but doesn't fix it entirely... I go from an immediate SEGFAULT on launch to a few SEGFAULTs a day; not sure why there's a difference! but I have to go back to the old fix

@frol

frol commented Nov 2, 2017

Interesting, I have been using Python 3 in an Alpine Docker image (frolvlad/alpine-python3) for over 2 years and bumped into this issue only today. I didn't make any significant changes lately, but I had the Docker image rebuilt and it might have fetched a newer version of python3 or musl from the Alpine repository.

Here is the reproduction:

$ docker run -it --rm --publish 5000:5000 frolvlad/flask-restplus-server-example:alpine-stack-issue

When I query curl http://127.0.0.1:5000/api/v1/swagger.json, Python just segfaults:

python3.6[4701]: segfault at 7f74b665aff8 ip 00007f74b61b611b sp 00007f74b665b000 error 6 in libpython3.6m.so.1.0[7f74b6127000+3e8000]

Setting a bigger stack size suggested by @fadawar solves the issue (or at least masks the problem for now).
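The kernel log line quoted above can be decoded to see why a stack overflow is a plausible reading. This is a minimal sketch: the helper name `decode_segfault` is hypothetical, and the bit meanings are the standard x86 page-fault error code bits (bit 0 page present, bit 1 write access, bit 2 user mode).

```python
import re

LOG_LINE = (
    "python3.6[4701]: segfault at 7f74b665aff8 ip 00007f74b61b611b "
    "sp 00007f74b665b000 error 6 in libpython3.6m.so.1.0[7f74b6127000+3e8000]"
)

PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
)


def decode_segfault(line):
    """Extract the fault address, stack pointer and error bits from a log line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a kernel segfault line")
    addr = int(match["addr"], 16)
    sp = int(match["sp"], 16)
    error = int(match["error"], 16)
    return {
        "bytes_below_sp": sp - addr,       # distance of the fault below sp
        "page_present": bool(error & 1),   # 0 means the page was not mapped
        "write_access": bool(error & 2),
        "user_mode": bool(error & 4),
    }


info = decode_segfault(LOG_LINE)
```

For this line the faulting address is 8 bytes below the stack pointer, the stack pointer itself sits on a page boundary, and the access was a user-mode write to an unmapped page. That pattern is consistent with a push running off the end of the thread's stack, though the log line alone does not prove it.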

I have tested the behaviour on a few different hosts because I thought that it might have been related to the kernel (all resulted in the same segfault):

  • Host: Arch Linux with Linux kernel 4.13.9 (latest)
    Docker version: 17.10

  • Host: Arch Linux with Linux kernel 4.9.59 (LTS)
    Docker version: 17.10

  • Host: Ubuntu 16.04 with Linux kernel 4.4.0-96
    Docker version: 17.05

  • Host: Ubuntu 14.04 with Linux kernel 4.4.0-59
    Docker version: 17.05

Interestingly, if I rebase on frolvlad/alpine-python2 (Python 2.7 package from Alpine repositories) everything works just fine.

The project which I reproduce this issue on is public, so you can play with it: https://github.com/frol/flask-restplus-server-example (I have added the hotfix threading.stack_size(2*1024*1024) into the master, so refer to the alpine-stack-issue tag)

frol added a commit to frol/flask-restplus-server-example that referenced this issue Nov 2, 2017
@mimischi

Wow, would never have thought to end up here.

I've been hunting a bug in my Django code for almost a week and finally just nailed it down to python:alpine3.6. For me it was extremely weird, because accessing two specific URLs (which just show forms using crispy-forms) led to a crash of the Django container. No information, just a crash.

Thanks @fadawar for the snippet. That makes my code work again.

So what is the status with this bug now -- will we need to wait for a fix in musl?

@sobolevn

sobolevn commented Nov 12, 2017

The same here. I have a segfault when running from rest_framework import routers.
Link to the Dockerfile: https://github.com/wemake-services/wemake-django-template/blob/master/%7B%7Bcookiecutter.project_name%7D%7D/docker/django/Dockerfile

Right now I am using --noreload option for runserver, since it only affects my development containers.

@ncopa
Contributor

ncopa commented Nov 14, 2017

Does it make any difference if you add ENV JSON_MAX_STACK_BUFFER_SIZE=1024 before running pip?

algitbot pushed a commit to alpinelinux/aports that referenced this issue Nov 14, 2017
we segfault before we hit the sys.getrecursionlimit(), which is bad. Some
testing showed that to be able to reach the default
sys.getrecursionlimit(), which is 1000, we need at least a 453k stack on
x86_64. We set it to 1MB so we have a bit extra.

It is also worth noting that upstream sets the stack size to 4MB on FreeBSD
and 5MB on OSX.

ref #8134
docker-library/python#211
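
The kind of testing described in the commit message can be approximated by running a recursion test in a child interpreter with a chosen thread stack size, so that a crash does not take down the caller. This is a sketch, not the method the Alpine maintainers used; the names `CHILD` and `probe` are hypothetical.

```python
import subprocess
import sys

# Program run in the child: set the thread stack size, then recurse in a
# thread until the interpreter raises RecursionError (or the process crashes).
CHILD = r"""
import sys, threading
threading.stack_size(int(sys.argv[1]))

def recurse(n):
    return recurse(n + 1)

def target():
    try:
        recurse(0)
    except RecursionError:
        pass  # reached the interpreter's limit without overflowing the stack

t = threading.Thread(target=target)
t.start()
t.join()
"""


def probe(stack_bytes):
    """Run the recursion test in a child interpreter; return its exit code."""
    completed = subprocess.run([sys.executable, "-c", CHILD, str(stack_bytes)])
    return completed.returncode
```

An exit code of 0 means the recursion limit was reached cleanly. On POSIX systems a negative exit code means the child was killed by a signal, for example SIGSEGV. Calling `probe` with decreasing sizes would give a rough lower bound for a given Python build, and the result will vary with interpreter version and platform.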
@IlianIliev

@ncopa I tried it, but it still crashes for me.

@sobolevn

@ncopa no luck either.

@IlianIliev

@sobolevn It works fine with Alpine 3.7, so if you can, just switch to it.

@davidwindell

@IlianIliev does that mean the LD_PRELOAD hack isn't needed at all with Alpine 3.7?

@IlianIliev

@davidwindell exactly. I just updated to Alpine 3.7, installed python 3 and it was working like a charm.

@JayH5
Contributor

JayH5 commented Dec 20, 2017

With the latest python images for Alpine this problem should hopefully be fixed across all Python and Alpine Linux versions.

@davidwindell

@JayH5 is this also fixed on python not just python3?

@JayH5
Contributor

JayH5 commented Jan 8, 2018

@davidwindell yup, Python 2.7 as well. See #211.

amercader added a commit to okfn/docker-ckan that referenced this issue Jan 23, 2018
Fixes #2

I traced the segmentation faults to the use of the development
server (Paster; the --reload option made no difference). Of course I don't
know the exact issue with Paster, but the underlying issue seems to be Alpine
3.6 + some Python packages + a too small stack size.

docker-library/python#211

There are several issues across different projects that confirm that
upgrading to Alpine 3.7 solves the issue, so that's what we did.

The upgrade to Alpine 3.7 caused the following issues:

* curl was not part of the image by default, added it to the installed
  packages

* psycopg2 build failed because of:

        psycopg/psycopg2#594
  The fix is to upgrade the psycopg2 version. This is hardcoded in
  CKAN's requirements.txt so we work around it changing the version when
  building the image (this should be upgraded in CKAN core)

Also for good measure we don't use binaries when building the
requirements because of the issues described here:

ckan/ckan#3893
neomantra pushed a commit to openresty/docker-openresty that referenced this issue Mar 15, 2018
@fr0der1c

I'm using python:3.7-alpine3.8 and I still got the segmentation fault problem. Is this a regression?

@ncopa
Contributor

ncopa commented Nov 28, 2018

> I'm using python:3.7-alpine3.8 and I still got the segmentation fault problem. Is this a regression?

@fr0der1c Please open a new issue. I don't think it is same problem.

$ cat test-stacksize.py 
import threading
import sys

def fun(i):
  try:
    fun(i+1)
  except:
    sys.exit(0)

t = threading.Thread(target=fun, args=[1])
t.start()
$ docker run --rm  python:3.7-alpine3.8 < test-stacksize.py 
$ 

No segfault here...
