
Memory/Resource leak #357

Closed
fschulze opened this issue Dec 9, 2021 · 14 comments · Fixed by #358
Comments

@fschulze

fschulze commented Dec 9, 2021

With devpi-server I can now confirm a memory leak somehow related to waitress. I observed it on Debian and FreeBSD, but not on macOS. The last test I did was replacing waitress with cheroot, and with that the leak is completely gone. I'm currently working on creating a minimal example for reproduction, but wanted to give a heads-up in case someone else noticed something and started investigating. It is certainly possible it is related to how we use waitress in devpi-server, but I don't think we do anything special. I don't want to switch to cheroot, because it lacks some features and seems to be less performant.

@fschulze
Author

fschulze commented Dec 9, 2021

While trying to reproduce this, I realized it is not a leak, but it seems like waitress isn't cleaning up right after serving a request. If I reduce the number of threads, I can clearly see an upper limit on memory usage. If I follow requests with large response bodies with requests that have small bodies, the memory usage goes down. So depending on the usage pattern, the memory usage can get very high if there were a bunch of large responses. Is there a way to fix this from my side, or does this need to be fixed in waitress?

@slav0nic

Pay attention to devpi_server.readonly.DictViewReadonly: as far as I can see, it leaks on every new request like /root/pypi/+simple/<pkgname>/ (e.g. tothemoon), on both the waitress and cheroot servers. It looks like cheroot.wsgi is more memory efficient in this case (and it seems to work faster, but I didn't dive deeper), but that doesn't really get rid of the problem at your app level.

@fschulze
Author

@slav0nic using devpi-server directly for this leads down many dead ends, of which DictViewReadonly is unfortunately one. There is an internal cache, but it has a size limit and eventually flattens out. I'm very sorry to have wasted your time there. The key insight is that, according to sys._debugmallocstats(), all memory used by Python is accounted for, including the cache above etc., but the virtual and resident memory of the process is much higher than what sys._debugmallocstats() reports.

I tried to create a minimal example, but so far I can only partially replicate the issue. I'm still missing some key interaction between what devpi-server does and what waitress does.
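For reference, a rough sketch of that kind of comparison, assuming Linux and /proc/self/status; the helper name is made up and the exact checks used in devpi-server aren't shown in this thread:

import sys

def dump_memory_views():
    # CPython-internal view: pymalloc arenas/pools/blocks (printed to stderr).
    sys._debugmallocstats()
    # Kernel view: virtual size and resident set size of the whole process.
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(("VmSize:", "VmRSS:")):
                print(line.strip())

if __name__ == "__main__":
    dump_memory_views()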

@digitalresistor
Member

Waitress buffers the output; the data is stored in overflowable buffers that may write to disk once they reach a certain high-water mark. Those buffers are then cycled out as they are drained.

The OverflowableBuffer is here: https://github.com/Pylons/waitress/blob/master/src/waitress/buffers.py#L190

A task (WSGI thread) will call write_soon on a channel:

def write_soon(self, data):

write_soon checks that we are not above outbuf_high_watermark; if we are, we block the WSGI task from adding more data to the output buffers.

To avoid growing the output buffer without bound (this was an issue because multiple tasks may re-use the same output buffer when sending data back to the same connection), we check whether the output buffer itself has reached a high-water mark, and if so we create a new one:

if self.current_outbuf_count >= self.adj.outbuf_high_watermark:

Then we just append the data to the existing buffer:

self.outbufs[-1].append(data)

Once the data is appended to the buffer at the back of the list, it goes through _flush_some, which does the actual sending to the TCP/IP socket once that socket is marked writable.

def _flush_some(self):

That happens here.

We will send data in chunks:

while outbuflen > 0:
in a tight loop, until send fails because the socket would block (and thus returns 0 for the number of bytes sent). When a buffer is empty and it is not the last one, it is popped from the list:

if len(self.outbufs) > 1:

If it is the last one, it sticks around, but the task thread will eventually recycle it if too much data has been written to it. This avoids the issue we had whereby buffers were growing unbounded.

We call .close() on the buffer when we drop it.

Append is fairly simple: https://github.com/Pylons/waitress/blob/master/src/waitress/buffers.py#L244

There's one issue I do see. I am not sure how easily I can test whether fixing it resolves this, so let me open a PR real quick; if you could give that a whirl, I would appreciate it.
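To make the flow above easier to follow, here is a condensed sketch of it; this is not waitress's actual channel code, just an illustration with made-up names and a made-up watermark value:

class ChannelSketch:
    def __init__(self, outbuf_high_watermark=16 * 1024 * 1024):
        self.outbuf_high_watermark = outbuf_high_watermark
        self.outbufs = [bytearray()]  # stand-in for OverflowableBuffer
        self.current_outbuf_count = 0

    def write_soon(self, data):
        # Rotate to a fresh buffer so a single outbuf does not grow unbounded
        # when several tasks re-use the same connection.
        if self.current_outbuf_count >= self.outbuf_high_watermark:
            self.outbufs.append(bytearray())
            self.current_outbuf_count = 0
        self.outbufs[-1] += data
        self.current_outbuf_count += len(data)
        return len(data)

    def _flush_some(self, send):
        # `send` stands in for socket.send(); assume it returns 0 when the
        # socket would block.
        while self.outbufs:
            buf = self.outbufs[0]
            while buf:
                sent = send(bytes(buf))
                if sent == 0:
                    return False  # socket not writable right now
                del buf[:sent]
            if len(self.outbufs) > 1:
                self.outbufs.pop(0)  # drop drained buffers...
            else:
                return True  # ...but keep the last one around for re-use
        return True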

@digitalresistor
Member

@fschulze would you mind giving this change a test run: #358

It's the only issue I can see: when we overflow from BytesIO to a file, we may end up keeping the BytesIO around without calling .close() on it. Eventually I would expect Python to clean that memory up, because the object goes out of scope...
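For context, a minimal sketch of the overflow step being discussed; this is a simplification with made-up names and thresholds, not the actual waitress.buffers code:

import tempfile
from io import BytesIO

class OverflowSketch:
    def __init__(self, overflow=1024 * 1024):
        self.overflow = overflow
        self.buf = BytesIO()
        self.overflowed = False

    def append(self, data):
        self.buf.write(data)
        if not self.overflowed and self.buf.tell() > self.overflow:
            self._to_tempfile()

    def _to_tempfile(self):
        # Spill the in-memory buffer to a temporary file once it gets large.
        spooled = tempfile.TemporaryFile()
        spooled.write(self.buf.getvalue())
        old = self.buf
        self.buf = spooled
        self.overflowed = True
        # The explicit close() is the point of #358: without it, the large
        # BytesIO lingers until the garbage collector reclaims it.
        old.close()

    def close(self):
        self.buf.close()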

@digitalresistor
Member

> Pay attention to devpi_server.readonly.DictViewReadonly: as far as I can see, it leaks on every new request like /root/pypi/+simple/<pkgname>/ (e.g. tothemoon), on both the waitress and cheroot servers. It looks like cheroot.wsgi is more memory efficient in this case (and it seems to work faster, but I didn't dive deeper), but that doesn't really get rid of the problem at your app level.

cheroot does not buffer outgoing responses, and each connection is directly tied to a WSGI thread. This means that if a client is slow to read, you end up holding up that thread for as long as it takes them to read the response. (my cheroot experience is a couple of years out of date now, so it may not be accurate anymore)

Waitress attempts to avoid that by buffering up to a certain amount of the response, so that the WSGI thread can start working on another request sooner.

@fschulze
Author

@bertjwregeer that seems to be it. Even with the default 50 threads, the memory doesn't grow above 7GB VIRT and 5GB RES in my initial testing. I'll let it run for a few days and report back.

@fschulze
Author

@bertjwregeer memory usage has been stable for the past 20 days

@digitalresistor
Member

@fschulze I have released a new beta version: https://pypi.org/project/waitress/2.1.0b0/

I would absolutely love your help in testing/validating the various changes before I ship them!

@fschulze
Author

fschulze commented Feb 7, 2022

I have updated m.devpi.net and another instance and will let you know if anything comes up or is stable after a few days.

@fschulze
Author

@bertjwregeer so far nothing seems broken for me and memory usage is stable (though slightly higher, but that is likely unrelated).

@digitalresistor
Member

@fschulze thank you so much for testing! I'll likely release it sometime this weekend!

@fschulze
Author

It seems like the leak isn't gone completely. It is much much better than before, but very slowly it still leaks resident memory that is not accounted for by arena allocations of Python. I will investigate further.

This shouldn't hold up a release though.

@digitalresistor
Member

@fschulze I look forward to your investigation; in the meantime I have released 2.1.0: https://pypi.org/project/waitress/2.1.0/

kraj pushed a commit to YoeDistro/meta-openembedded that referenced this issue Mar 7, 2022
kraj pushed a commit to YoeDistro/meta-openembedded that referenced this issue Mar 8, 2022
kraj pushed a commit to YoeDistro/meta-openembedded that referenced this issue Mar 8, 2022
kraj pushed a commit to YoeDistro/meta-openembedded that referenced this issue Mar 9, 2022
halstead pushed a commit to openembedded/meta-openembedded that referenced this issue Mar 9, 2022
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jan 28, 2024
2.1.2
-----

Bugfix
~~~~~~

- When expose_tracebacks is enabled, waitress would fail to properly encode
  unicode, thereby causing another error during error handling. See
  Pylons/waitress#378

- Header length checking had a calculation that was done incorrectly when the
  data was received across multiple socket reads. This calculation has been
  corrected, and Waitress will no longer send back a 413 Request Entity Too
  Large. See Pylons/waitress#376

Security Bugfix
~~~~~~~~~~~~~~~

- In 2.1.0 a new feature was introduced that allowed the WSGI thread to start
  sending data to the socket. However, this introduced a race condition whereby
  a socket may be closed in the sending thread while the main thread is about
  to call select(), thereby causing the entire application to be taken down.
  Waitress will no longer close the socket in the WSGI thread, instead waking
  up the main thread to clean up. See Pylons/waitress#377

2.1.1
-----

Security Bugfix
~~~~~~~~~~~~~~~

- Waitress now validates that chunked encoding extensions are valid and don't
  contain characters that are not allowed. They are still skipped/not
  processed, but if they contain invalid data we no longer continue; instead
  we return a 400 Bad Request. This stops potential HTTP desync/HTTP request
  smuggling. Thanks to Zhang Zeyu for reporting this issue. See
  GHSA-4f7p-27jc-3c36

- Waitress now validates that the chunk length is only valid hex digits when
  parsing chunked encoding, and values such as ``0x01`` and ``+01`` are no
  longer supported. This stops potential HTTP desync/HTTP request smuggling.
  Thanks to Zhang Zeyu for reporting this issue. See
  GHSA-4f7p-27jc-3c36

- Waitress now validates that the Content-Length sent by a remote contains only
  digits in accordance with RFC7230 and will return a 400 Bad Request when the
  Content-Length header contains invalid data, such as ``+10``, which would
  previously get parsed as ``10`` and accepted. This stops potential HTTP
  desync/HTTP request smuggling. Thanks to Zhang Zeyu for reporting this issue.
  See GHSA-4f7p-27jc-3c36

2.1.0
-----

Python Version Support
~~~~~~~~~~~~~~~~~~~~~~

- Python 3.6 is no longer supported by Waitress

- Python 3.10 is fully supported by Waitress

Bugfix
~~~~~~

- ``wsgi.file_wrapper`` now sets the ``seekable``, ``seek``, and ``tell``
  attributes from the underlying file if the underlying file is seekable. This
  allows WSGI middleware to implement things like range requests, for example.

  See Pylons/waitress#359 and
  Pylons/waitress#363

- In Python 3 ``OSError`` is no longer subscriptable; this caused failures on
  Windows when attempting to loop to find a socket that would work for use in
  the trigger.

  See Pylons/waitress#361

- Fixed an issue whereby ``BytesIO`` objects were not properly closed, and
  thereby would not get cleaned up until garbage collection got around to it.

  This led to potential for random memory spikes/memory issues, see
  Pylons/waitress#358 and
  Pylons/waitress#357 .

  With thanks to Florian Schulze for testing/validating this fix!

Features
~~~~~~~~

- When the WSGI app starts sending data to the output buffer, we now attempt to
  send data directly to the socket. This avoids needing to wake up the main
  thread to start sending data, allowing faster transmission of the first byte.
  See Pylons/waitress#364

  With thanks to Michael Merickel for being a great rubber ducky!

- Add REQUEST_URI to the WSGI environment.

  REQUEST_URI is similar to ``request_uri`` in nginx. It is a string that
  contains the request path before separating the query string and
  decoding ``%``-escaped characters.
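As an illustration of the REQUEST_URI addition above, here is a hedged sketch of what a WSGI app would see; the request line and the exact environ values are assumptions based on the changelog wording, not taken from the waitress source:

# Hypothetical request:  GET /root/pypi/+simple/foo%20bar/?refresh=1 HTTP/1.1
#
#   environ["REQUEST_URI"]  -> "/root/pypi/+simple/foo%20bar/?refresh=1"  (raw, undecoded, with query string)
#   environ["PATH_INFO"]    -> "/root/pypi/+simple/foo bar/"              (decoded, no query string)
#   environ["QUERY_STRING"] -> "refresh=1"
def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ.get("REQUEST_URI", "").encode("latin-1")]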
daregit pushed a commit to daregit/yocto-combined that referenced this issue May 22, 2024
daregit pushed a commit to daregit/yocto-combined that referenced this issue May 22, 2024