BlockingIOError on App Engine after some time #50

drazzib · 2020-06-22T20:26:34Z

After adding google-cloud-profiler==1.1.1 to my Python 3.7 App Engine project, I'm getting lots of

BlockingIOError: [Errno 11] Resource temporarily unavailable
Exception ignored when trying to write to the signal wakeup fd:

in my service stderr logs

I'm having the same issue either with App Engine Standard python3.7 or with App Engine Flex over a custom runtime (Docker image)

Is this a known issue, is there something I can do to mitigate this issue ?

The text was updated successfully, but these errors were encountered:

kalyanac · 2020-06-22T21:30:57Z

@drazzib Thank you for reporting this. This is not a known issue. We will try to reproduce this issue.

kalyanac · 2020-06-22T22:01:00Z

Can you share your app.yaml, requirements.txt, Dockerfile? I tried with Quickstart apps on Profiling Python applications | Cloud Profiler | Google Cloud and did not see that messages.

drazzib · 2020-06-23T08:43:08Z

app.yaml:

runtime: custom
env: flex
manual_scaling:
  instances: 1
resources:
  cpu: 4
  memory_gb: 6
  disk_size_gb: 20
env_variables:
  FLASK_APP: main.py
  FLASK_ENV: production

requirements.txt:

wrapt==1.11.2
sklearn==0.0
Flask==1.1.1
gunicorn==20.0.4
future==0.18.2
h5py==2.10.0
google-cloud-storage==1.26.0
google-cloud-core==1.3.0
google-cloud-profiler==1.1.1
google-cloud-logging==1.15.0
pandas==1.0.2
datalab==1.1.5
Keras==2.3.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
tensorflow-cpu==2.1.0
cachetools==4.1.0

Dockerfile:

FROM gcr.io/google-appengine/python

# Create a virtualenv for dependencies. This isolates these packages from
# system-level packages.
# Use -p python3 or -p python3.7 to select python version. Default is version 2.
RUN virtualenv -p python3 /env

# Setting these environment variables are the same as running
# source /env/bin/activate.
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH

# Copy the application's requirements.txt and run pip to install all
# dependencies into the virtualenv.
ADD requirements.txt /app/requirements.txt
RUN pip install --upgrade pip
RUN pip install -r /app/requirements.txt

# Add the application source code.
ADD . /app

# Run a WSGI server to serve the application. gunicorn must be declared as
# a dependency in requirements.txt.
CMD gunicorn -c gunicorn.conf.py -b :$PORT main:app

drazzib · 2020-06-24T12:10:19Z

Hi @kalyanac & @jqll !

I've just seen that a new 1.1.2 release has been pushed to PyPI repository which contains 17d58aa commit.

Is this relevant for issue discussed above ?

kalyanac · 2020-06-24T16:34:06Z

Hi @drazzib, No the new release is not related to this issue. We are still having trouble reproducing this error.

kalyanac · 2020-06-24T16:39:13Z

Are you seeing any profiles collected despite these messages?

drazzib · 2020-06-24T19:41:25Z

Yes, I can confirm that - even with this error printed very often on stderr
there is still some collected profile visible on Google StackDriver Profiler UI

kalyanac · 2020-06-25T02:23:57Z

Thank you for the response. Few more questions:

What profile types do you see in the Profiler UI?
For each profile type, select a 10 minutes in Timespan, and check the profile count shown in the root node. The root node has the profile count displayed as "... averaged over n profiles". Please report what the n is.
If the service is not running currently, please set the end time to a time when the service was running by clicking "Now" and setting the end time.

kalyanac · 2020-07-01T02:55:02Z

Gentle ping.

drazzib · 2020-07-05T14:35:17Z

Hi @kalyanac , sorry for the delay.

In Profiler UI, I see both "CPU Time" et "Wall time" for my application
For each profile type, under a 10 minutes in Timespan, I can see
- for "CPU Time" 20.91 ms (100%), averaged over 11 profiles
- for "Wall Time" 10 s (100%), averaged over 9 profiles

kalyanac · 2020-07-06T14:27:50Z

Hi @drazzib, it seems like the agent is uploading profiles at a rate we expect it to. So we don't anticipate any loss of profile data at the moment.

We'll try to reproduce this issue and root cause the source of the message.

radyz · 2020-07-06T19:52:46Z

I just started having this same issue today with my CloudRun service

Libs:
google-cloud-profiler = 1.1.2
gunicorn = 20.0.4
Django = 3

 Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/googlecloudprofiler/pythonprofiler.py", line 180, in _handler
    if not self._started or self._in_handler:
BlockingIOError: [Errno 11] Resource temporarily unavailable"

kalyanac · 2020-07-06T20:21:17Z

@radyz thanks for reporting it.

jqll · 2020-07-06T22:49:10Z

Thanks for reporting the issue.

I think the problem is that the app enabled signal wakeup fd somewhere. When the profiler profiles, it triggers a SIGALRM/SIGPROF signal every 10ms. Since too many signals arrive too quickly, it may cause the fd’s buffer to be full, see https://docs.python.org/3/library/signal.html#signal.set_wakeup_fd:

There are two common ways to use this function. In both approaches, you use the fd to wake up when a signal arrives, but then they differ in how they determine which signal or signals have arrived.

In the first approach, we read the data out of the fd’s buffer, and the byte values give you the signal numbers. This is simple, but in rare cases it can run into a problem: generally the fd will have a limited amount of buffer space, and if too many signals arrive too quickly, then the buffer may become full, and some signals may be lost. If you use this approach, then you should set warn_on_full_buffer=True, which will at least cause a warning to be printed to stderr when signals are lost.

In the second approach, we use the wakeup fd only for wakeups, and ignore the actual byte values. In this case, all we care about is whether the fd’s buffer is empty or non-empty; a full buffer doesn’t indicate a problem at all. If you use this approach, then you should set warn_on_full_buffer=False, so that your users are not confused by spurious warning messages.

This will not affect the profiler. But if the app definitely needs to access the signal numbers as described in the first approach above, you probably should not use the profiler.

mgrdcm · 2020-07-09T02:22:23Z

I'm also having this same thing happen and have for years. GAE Standard, both Python 3.7 and Python 3.8, Django 3.0.8, default Gunicorn setup. Also tried a while ago to use uwsgi and same thing happens.

I also notice that I don't get very useful information from within my python app inside of Profiler, which I wonder if is related.

kalyanac · 2020-07-09T08:24:23Z

@mgrdcm Can you open a new issue about not getting useful information from Profiler? We would like to understand what is happening and help address it.

mgrdcm · 2020-07-09T17:14:51Z

Something on this that I've only noticed in the last 24 hours on GAE Standard is this:

Traceback (most recent call last): File "/layers/google.python.pip/pip/googlecloudprofiler/pythonprofiler.py", line 165, in _handler def _handler(self, unused_signum, frame): BlockingIOError: [Errno 11] Resource temporarily unavailable

Previously I was only getting the same block as mentioned in the more generic errors in the original post:

BlockingIOError: [Errno 11] Resource temporarily unavailable
Exception ignored when trying to write to the signal wakeup fd:

jqll · 2020-09-11T19:32:24Z

Hi @mgrdcm, sorry for the late response. When you see the new error below, does the profile you get contain "unknown" frames? If that's the case, could you share some setup so that we can reproduce the problem? Thanks!

BlockingIOError: [Errno 11] Resource temporarily unavailable
Exception ignored when trying to write to the signal wakeup fd:

mgrdcm · 2020-12-13T18:46:29Z

@jqll I apologies, I missed this message until now. I stopped using cloud profiler due to this and a few other reasons. Has anything changed on this issue since September? If it hasn't been diagnosed or resolved I'm happy to add it back in and see if I'm still seeing the problem!

kalyanac · 2021-01-07T23:31:34Z

We have documented this in our known issues section of the documentation.

Closing because there is no further action expected on this issue.

mgrdcm · 2021-01-08T01:04:01Z

@kalyanac I'm running a pretty basic Django app on GAE Standard, and I just searched through all of my code and all the dependencies and nothing is calling set_wakeup_fd.

kalyanac · 2021-01-08T01:08:23Z

It is probably set by one of the Django or any other imports Django uses.

kalyanac · 2021-01-08T01:10:08Z

Here is an example from gunicorn

mgrdcm · 2021-01-08T01:14:13Z

Heh, I was just arriving at that conclusion - it's gunicorn itself that's doing it, which is the default server that Google App Engine Standard uses! None of the rest of the code I'm using in my project (directly or indirectly) does it.

So it looks like right now there's no way to use google-cloud-profiler reliably on App Engine Standard. Unless we switch to a different non-default server?

mgrdcm · 2021-01-09T01:29:26Z

@kalyanac Because of ☝️ meaning that you can't use one Google product (this package) on another (Google App Engine Standard with Python 3) without this happening, it kinda feels like a bug that should still be open. Maybe GAE offers a patched version of gunicorn? Or this package has a workaround for this case?

kalyanac · 2021-01-20T21:41:07Z

@mgrdcm This is not specific to gunicorn, any package or user code could set the same.
I appreciate your point regarding App Engine. We will post an update if we figure out a way to address this.

zackarydev · 2021-03-09T22:56:13Z

Has anyone discovered a temporary fix for this other than disabling the profiler?

wyk9787 · 2021-03-09T23:46:28Z

Has anyone discovered a temporary fix for this other than disabling the profiler?

You can find the current solution here: https://cloud.google.com/profiler/docs/troubleshooting#python-blocking. I will also update the README to mention this.

mgrdcm · 2021-03-10T02:41:21Z

Not a fix, but I switched to using the New Relic agent on GAE Standard Python 3 and it works wonderfully. Their new free tier is pretty generous, and so far I'm able to stay within it for my use case.

kalyanac assigned jqll Jun 22, 2020

nolanmar511 unassigned jqll Jan 6, 2021

kalyanac closed this as completed Jan 7, 2021

kalyanac reopened this Jan 7, 2021

kalyanac closed this as completed Jan 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BlockingIOError on App Engine after some time #50

BlockingIOError on App Engine after some time #50

drazzib commented Jun 22, 2020

kalyanac commented Jun 22, 2020

kalyanac commented Jun 22, 2020

drazzib commented Jun 23, 2020

drazzib commented Jun 24, 2020

kalyanac commented Jun 24, 2020

kalyanac commented Jun 24, 2020

drazzib commented Jun 24, 2020

kalyanac commented Jun 25, 2020

kalyanac commented Jul 1, 2020

drazzib commented Jul 5, 2020

kalyanac commented Jul 6, 2020

radyz commented Jul 6, 2020 •

edited

Loading

kalyanac commented Jul 6, 2020

jqll commented Jul 6, 2020

mgrdcm commented Jul 9, 2020

kalyanac commented Jul 9, 2020

mgrdcm commented Jul 9, 2020 •

edited

Loading

jqll commented Sep 11, 2020

mgrdcm commented Dec 13, 2020

kalyanac commented Jan 7, 2021

mgrdcm commented Jan 8, 2021

kalyanac commented Jan 8, 2021

kalyanac commented Jan 8, 2021

mgrdcm commented Jan 8, 2021 •

edited

Loading

mgrdcm commented Jan 9, 2021

kalyanac commented Jan 20, 2021

zackarydev commented Mar 9, 2021

wyk9787 commented Mar 9, 2021

mgrdcm commented Mar 10, 2021

BlockingIOError on App Engine after some time #50

BlockingIOError on App Engine after some time #50

Comments

drazzib commented Jun 22, 2020

kalyanac commented Jun 22, 2020

kalyanac commented Jun 22, 2020

drazzib commented Jun 23, 2020

drazzib commented Jun 24, 2020

kalyanac commented Jun 24, 2020

kalyanac commented Jun 24, 2020

drazzib commented Jun 24, 2020

kalyanac commented Jun 25, 2020

kalyanac commented Jul 1, 2020

drazzib commented Jul 5, 2020

kalyanac commented Jul 6, 2020

radyz commented Jul 6, 2020 • edited Loading

kalyanac commented Jul 6, 2020

jqll commented Jul 6, 2020

mgrdcm commented Jul 9, 2020

kalyanac commented Jul 9, 2020

mgrdcm commented Jul 9, 2020 • edited Loading

jqll commented Sep 11, 2020

mgrdcm commented Dec 13, 2020

kalyanac commented Jan 7, 2021

mgrdcm commented Jan 8, 2021

kalyanac commented Jan 8, 2021

kalyanac commented Jan 8, 2021

mgrdcm commented Jan 8, 2021 • edited Loading

mgrdcm commented Jan 9, 2021

kalyanac commented Jan 20, 2021

zackarydev commented Mar 9, 2021

wyk9787 commented Mar 9, 2021

mgrdcm commented Mar 10, 2021

radyz commented Jul 6, 2020 •

edited

Loading

mgrdcm commented Jul 9, 2020 •

edited

Loading

mgrdcm commented Jan 8, 2021 •

edited

Loading