Memory leak when doing get() and scope closes #4833

Open
gregdan3 opened this issue Jun 17, 2020 · 14 comments
Labels
bug reproducer: present This PR or issue contains code, which reproduce the problem described or clearly understandable STR

Comments

@gregdan3

🐞 Describe the bug

When downloading content from a website, memory usage increases roughly proportionally to the size of the content downloaded, with some overhead for the library and other objects in memory.

When the scope where that content is downloaded is closed and inaccessible, I expect the garbage collector to de-allocate the objects which were downloaded, but memory usage appears to remain near the same level as peak memory usage right after downloads finish.

💡 To Reproduce

  1. Set up a default ClientSession
  2. Open a connection with any site
  3. Download content from that site
  4. Leave the scope in which that content was downloaded
  5. Examine memory usage during and after download, observing that memory usage has not significantly decreased
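For step 5, memory usage can also be examined from inside the script itself with the stdlib `tracemalloc` module (a minimal sketch, separate from the memory_profiler reports below; the byte buffers are just a stand-in for downloaded content):

```python
import tracemalloc

tracemalloc.start()
# Stand-in for downloaded content: roughly 10 MiB of buffers.
data = [bytes(1024) for _ in range(10_000)]
during, _peak = tracemalloc.get_traced_memory()

del data  # leave the "scope" that held the downloads
after, _peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# With plain objects, dropping the last reference frees the memory.
assert after < during
```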

💡 Expected behavior
Once the downloaded objects are no longer in scope, they should be de-allocated by the garbage collector and memory usage should fall back to a similar level as when the script initialized.

📋 Logs/tracebacks

I have no specific logs/tracebacks (no exception occurred), but I have profiled the relevant functions in two scripts with memory_profiler to demonstrate memory usage.

The first script is created with aiohttp, demonstrating the bug:

aiohttp_profile.txt

Filename: ./aiohttp_example.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    22     26.8 MiB     26.8 MiB           1   @profile
    23                                         async def quicktest():
    24     26.8 MiB      0.0 MiB           1       url = "https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"
    25     26.8 MiB      0.0 MiB           1       session = aiohttp.ClientSession()
    26     26.8 MiB      0.0 MiB           8       jobs = [grab_data(url, session) for _ in range(5)]
    27    260.7 MiB    233.9 MiB           2       jobs_result = await asyncio.gather(*jobs)
    28    260.7 MiB      0.0 MiB           1       if not all(jobs_result):  # sanity check
    29                                                 _log.info("Something didn't give back data")
    30    260.7 MiB      0.0 MiB           1       await session.close()


Filename: ./aiohttp_example.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    33     26.7 MiB     26.7 MiB           1   @profile
    34                                         def main():
    35     26.7 MiB      0.0 MiB           1       logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
    36     26.8 MiB      0.1 MiB           1       loop = asyncio.get_event_loop()
    37     26.8 MiB      0.0 MiB           1       try:
    38    201.4 MiB    174.6 MiB           1           loop.run_until_complete(quicktest())
    39                                                 # memory usage should be back to normal
    40    201.4 MiB      0.0 MiB           1           _log.info("Completed example")
    41                                             except KeyboardInterrupt:
    42                                                 _log.info("shutting down...")
    43                                                 loop.stop()

The bug is visible here: on line 38, I expect memory usage to reach its peak, and on the next line it should drop back to initialization levels. However, at line 40, memory usage is still at the same level, even though the scope in which the memory was allocated is gone and inaccessible.

The second script uses requests in a nearly identical way, and demonstrates the behavior I expect from aiohttp:

requests_profile.txt

Filename: ./requests_example.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    19     23.9 MiB     23.9 MiB           1   @profile
    20                                         def quicktest():
    21     23.9 MiB      0.0 MiB           1       url = "https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"
    22     23.9 MiB      0.0 MiB           1       session = requests.Session()
    23    126.2 MiB    102.2 MiB           8       jobs_result = [grab_data(url, session) for _ in range(5)]
    24    126.2 MiB      0.0 MiB           1       if not all(jobs_result):  # sanity check
    25                                                 _log.info("Something didn't give back data")
    26    126.2 MiB      0.0 MiB           1       session.close()


Filename: ./requests_example.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    29     23.9 MiB     23.9 MiB           1   @profile
    30                                         def main():
    31     23.9 MiB      0.0 MiB           1       logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
    32     23.9 MiB      0.0 MiB           1       try:
    33     27.4 MiB      3.5 MiB           1           quicktest()
    34                                                 # memory usage should be back to normal
    35     27.4 MiB      0.0 MiB           1           _log.info("Completed example")
    36                                             except KeyboardInterrupt:
    37                                                 _log.info("shutting down...")

Here the picture from memory_profiler is considerably different: memory usage drops as quicktest() on line 33 returns, so main() never observes the same peak level of memory usage.

📋 Your version of Python

$ python --version
Python 3.8.3

📋 Your version of the aiohttp/yarl/multidict distributions

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.6.2
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: fafhrd91@gmail.com
License: Apache 2
Location: /home/gregdan3/.local/share/virtualenvs/memory_leak_aiohttp-hbLDHJl6/lib/python3.8/site-packages
Requires: attrs, chardet, multidict, async-timeout, yarl
Required-by:
$ python -m pip show multidict
Name: multidict
Version: 4.7.6
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /home/gregdan3/.local/share/virtualenvs/memory_leak_aiohttp-hbLDHJl6/lib/python3.8/site-packages
Requires:
Required-by: yarl, aiohttp
$ python -m pip show yarl
Name: yarl
Version: 1.4.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /home/gregdan3/.local/share/virtualenvs/memory_leak_aiohttp-hbLDHJl6/lib/python3.8/site-packages
Requires: multidict, idna
Required-by: aiohttp

📋 Additional context

The relevant scripts and my Pipfile* are here:
scripts.zip

#!/usr/bin/env python
import asyncio
import logging
from concurrent.futures import ThreadPoolExecutor

import aiohttp
from memory_profiler import profile

_log = logging.getLogger(__name__)

LOG_FORMAT = (
    "[%(asctime)s] [%(filename)22s:%(lineno)-4s] [%(levelname)8s]    %(message)s"
)


async def grab_data(url, session):
    async with session.get(url) as response:
        data = await response.read()
    return data


@profile
async def quicktest():
    url = "https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"
    session = aiohttp.ClientSession()
    jobs = [grab_data(url, session) for _ in range(5)]
    jobs_result = await asyncio.gather(*jobs)
    if not all(jobs_result):  # sanity check
        _log.info("Something didn't give back data")
    await session.close()


@profile
def main():
    logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(quicktest())
        # memory usage should be back to normal
        _log.info("Completed example")
    except KeyboardInterrupt:
        _log.info("shutting down...")
        loop.stop()


if __name__ == "__main__":
    main()
And the requests script:

#!/usr/bin/env python
import logging
import requests
from memory_profiler import profile

_log = logging.getLogger(__name__)

LOG_FORMAT = (
    "[%(asctime)s] [%(filename)22s:%(lineno)-4s] [%(levelname)8s]    %(message)s"
)


def grab_data(url, session):
    with session.get(url) as response:
        data = response.content
    return data


@profile
def quicktest():
    url = "https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"
    session = requests.Session()
    jobs_result = [grab_data(url, session) for _ in range(5)]
    if not all(jobs_result):  # sanity check
        _log.info("Something didn't give back data")
    session.close()


@profile
def main():
    logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
    try:
        quicktest()
        # memory usage should be back to normal
        _log.info("Completed example")
    except KeyboardInterrupt:
        _log.info("shutting down...")


if __name__ == "__main__":
    main()

And my Pipfile:

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]

[packages]
memory_profiler = {git = "ssh://git@github.com/pythonprofilers/memory_profiler.git"}
aiohttp = "*"
requests = "*"

I have not embedded my Pipfile.lock due to its length, but it is in the scripts.zip above and demonstrates the same versions for my dependencies as I showed before.

All of my dependencies are listed above.

I created and tested the scripts above on my personal machine running Archlinux, but originally observed the bug that led me to create these scripts on a machine running CentOS.

$ uname --all
Linux HOSTNAME 5.6.15-arch1-1 #1 SMP PREEMPT Wed, 27 May 2020 23:42:26 +0000 x86_64 GNU/Linux
Linux HOSTNAME 3.10.0-862.9.1.el7.x86_64 #1 SMP Mon Jul 16 16:29:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
@gregdan3 gregdan3 added the bug label Jun 17, 2020
@ElijahSink

I'm having this exact same issue.

@jmfrees

jmfrees commented Jul 31, 2020

Any update on this? Would love to see this issue addressed!

@wei-hai

wei-hai commented Aug 6, 2020

experiencing the same issue

@Mahesh0253

On Python 3.7.8 it works fine, but on Python 3.8+ versions I get the memory leak problem.

I hope someone can solve this problem

@derlih
Contributor

derlih commented Nov 11, 2020

Just did a check on my machine on 2a9d5e9 (master):

Filename: ./tests/4833_test.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    22     32.4 MiB     32.4 MiB           1   @profile
    23                                         async def quicktest():
    24     32.4 MiB      0.0 MiB           1       url = "http://localhost:8000/A17_FlightPlan.pdf"
    25     32.4 MiB      0.0 MiB           1       session = aiohttp.ClientSession()
    26     32.4 MiB      0.0 MiB           8       jobs = [grab_data(url, session) for _ in range(5)]
    27    131.9 MiB     99.5 MiB           2       jobs_result = await asyncio.gather(*jobs)
    28    131.9 MiB      0.0 MiB           1       if not all(jobs_result):  # sanity check
    29                                                 _log.info("Something didn't give back data")
    30    131.9 MiB      0.0 MiB           1       await session.close()


[2020-11-11 18:24:58,934] [          4833_test.py:40  ] [    INFO]    Completed example
Filename: ./tests/4833_test.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    33     32.2 MiB     32.2 MiB           1   @profile
    34                                         def main():
    35     32.4 MiB      0.2 MiB           1       logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
    36     32.4 MiB      0.0 MiB           1       loop = asyncio.get_event_loop()
    37     32.4 MiB      0.0 MiB           1       try:
    38     33.2 MiB      0.8 MiB           1           loop.run_until_complete(quicktest())
    39                                                 # memory usage should be back to normal
    40     33.2 MiB      0.0 MiB           1           _log.info("Completed example")
    41                                             except KeyboardInterrupt:
    42                                                 _log.info("shutting down...")
    43                                                 loop.stop()
$ python --version
Python 3.8.5
$ uname -a
Linux 5.4.0-52-generic #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.1 LTS
Release:	20.04
Codename:	focal

@derlih
Contributor

derlih commented Nov 11, 2020

Same PC on aiohttp 3.7.2

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    22     26.7 MiB     26.7 MiB           1   @profile
    23                                         async def quicktest():
    24     26.7 MiB      0.0 MiB           1       url = "http://localhost:8000/A17_FlightPlan.pdf"
    25     26.7 MiB      0.0 MiB           1       session = aiohttp.ClientSession()
    26     26.7 MiB      0.0 MiB           8       jobs = [grab_data(url, session) for _ in range(5)]
    27    126.2 MiB     99.5 MiB           2       jobs_result = await asyncio.gather(*jobs)
    28    126.2 MiB      0.0 MiB           1       if not all(jobs_result):  # sanity check
    29                                                 _log.info("Something didn't give back data")
    30    126.2 MiB      0.0 MiB           2       await session.close()


[2020-11-11 18:32:41,084] [          4833_test.py:40  ] [    INFO]    Completed example
Filename: /home/dmitry/Sources/aiohttp/tests/4833_test.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    33     26.7 MiB     26.7 MiB           1   @profile
    34                                         def main():
    35     26.7 MiB      0.0 MiB           1       logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
    36     26.7 MiB      0.0 MiB           1       loop = asyncio.get_event_loop()
    37     26.7 MiB      0.0 MiB           1       try:
    38     27.5 MiB      0.8 MiB           1           loop.run_until_complete(quicktest())
    39                                                 # memory usage should be back to normal
    40     27.5 MiB      0.0 MiB           1           _log.info("Completed example")
    41                                             except KeyboardInterrupt:
    42                                                 _log.info("shutting down...")
    43                                                 loop.stop()
$ python --version
Python 3.8.5
$ python -m pip show aiohttp
Name: aiohttp
Version: 3.7.2
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: fafhrd91@gmail.com
License: Apache 2
Location: /tmp/mem_test/venv/lib/python3.8/site-packages
Requires: multidict, yarl, attrs, typing-extensions, chardet, async-timeout
Required-by: 
$ python -m pip show multidict
Name: multidict
Version: 5.0.0
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /tmp/mem_test/venv/lib/python3.8/site-packages
Requires: 
Required-by: yarl, aiohttp
$ python -m pip show yarl
Name: yarl
Version: 1.6.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /tmp/mem_test/venv/lib/python3.8/site-packages
Requires: idna, multidict
Required-by: aiohttp

@asvetlov
Member

aiohttp doesn't use weak references to break the cycles between its moving parts.
In practice, this means the memory is released not immediately but at the next run of the garbage collector.
You can force collection by calling gc.collect() just after loop.run_until_complete(quicktest()).

Does it help?
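The effect described here can be demonstrated without aiohttp at all, with a plain reference cycle (a minimal sketch; `Node` is just an illustrative class, not an aiohttp type):

```python
import gc
import weakref

class Node:
    """Two of these referring to each other form a reference cycle."""
    def __init__(self):
        self.other = None

gc.disable()                 # keep the collector out of the way for the demo
a, b = Node(), Node()
a.other, b.other = b, a      # cycle: a -> b -> a
alive = weakref.ref(a)       # lets us observe whether `a` has been freed

del a, b                     # no names left, but the cycle keeps itself alive
assert alive() is not None   # reference counting alone cannot free a cycle

gc.collect()                 # the cyclic collector breaks the cycle
assert alive() is None       # memory is reclaimed at the next gc pass
gc.enable()
```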

@derlih
Contributor

derlih commented Nov 12, 2020

@asvetlov in my environment it is not needed; there is no memory leak in the main function.
Maybe it was a problem in an older Python 3.8, or maybe Python is patched somehow on CentOS.

I will check this leak later in a CentOS docker container.

@asvetlov
Member

OK. BTW, for checking on different Python versions I use https://github.com/pyenv/pyenv

pyenv install 3.9.0
pyenv virtualenv 3.9.0 aiohttp-3.9.0
pyenv local aiohttp-3.9.0

Works like a charm.

P.S.
You need to initialize PATH and PYENV_ROOT in your bashrc/zshrc first, as described in the pyenv installation section.

@derlih
Contributor

derlih commented Nov 13, 2020

For CentOS I've got the leak :-(

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    22     28.3 MiB     28.3 MiB           1   @profile
    23                                         async def quicktest():
    24     28.3 MiB      0.0 MiB           1       url = "http://localhost:8000/A17_FlightPlan.pdf"
    25     28.3 MiB      0.0 MiB           1       session = aiohttp.ClientSession()
    26     28.3 MiB      0.0 MiB           8       jobs = [grab_data(url, session) for _ in range(5)]
    27    128.0 MiB     99.7 MiB           2       jobs_result = await asyncio.gather(*jobs)
    28    128.0 MiB      0.0 MiB           1       if not all(jobs_result):  # sanity check
    29                                                 _log.info("Something didn't give back data")
    30    128.0 MiB      0.0 MiB           2       await session.close()


[2020-11-13 12:52:31,871] [          4833_test.py:40  ] [    INFO]    Completed example
Filename: ./4833_test.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    33     28.1 MiB     28.1 MiB           1   @profile
    34                                         def main():
    35     28.3 MiB      0.1 MiB           1       logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
    36     28.3 MiB      0.0 MiB           1       loop = asyncio.get_event_loop()
    37     28.3 MiB      0.0 MiB           1       try:
    38     68.7 MiB     40.4 MiB           1           loop.run_until_complete(quicktest())
    39                                                 # memory usage should be back to normal
    40     68.7 MiB      0.0 MiB           1           _log.info("Completed example")
    41                                             except KeyboardInterrupt:
    42                                                 _log.info("shutting down...")
    43                                                 loop.stop()

Dockerfile:

FROM centos
RUN yum install -y python3
RUN yum groupinstall -y 'Development Tools'
RUN yum install -y python3-devel
RUN pip3 install memory_profiler
RUN yum install -y wget
RUN wget https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf
RUN pip3 install aiohttp==3.7.2
ADD 4833_test.py /
$ python3 --version
Python 3.6.8

@asvetlov
Member

Oooh. Thanks.
At least I now have comprehensive instructions for how to reproduce.

@asvetlov asvetlov added the reproducer: present This PR or issue contains code, which reproduce the problem described or clearly understandable STR label Nov 13, 2020
@JustinTArthur
Contributor

Having a memory leak in one of my aiohttp apps, I've been following this closely. If memory_profiler is to be trusted, this still happens with aiohttp 3.7.x and CPython 3.9.4.

I've tried the following with no change to the apparent leak reported by memory_profiler:

  • Manually calling gc.collect() before or after stopping the event loop.
    • Also adding an asyncio.sleep() before or after the collect.
  • Encapsulating the behavior in asyncio.run().
  • Avoiding context managers and manually calling __aenter__ and __aexit__ on the request context.
  • Using non-SSL/TLS connections
  • Using AsyncResolver instead of the loop/OS/threadpool-based one.
  • Avoiding DNS entirely and just hitting 127.0.0.1.
  • Managing the Tasks directly instead of spawning with gather.
  • Using sequenced awaits instead of using concurrent tasks.
  • Using the AIOHTTP_NO_EXTENSIONS environment variable.

There are dependencies among the HttpResponseParser, Connection, ClientResponse, ResponseHandler, and Connector objects, and some of them are circular. However, as of Python 3.4, the garbage collector should be able to collect them all, as long as nothing holds a reference to one of them after all of their finalizers (like __del__(…)) have run.
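The "as of Python 3.4" point is PEP 442: cycles whose objects define __del__ are no longer uncollectable. A minimal sketch of that behavior (the `Rec` class is illustrative, not an aiohttp type):

```python
import gc

finalized = []

class Rec:
    """An object with a finalizer, participating in a reference cycle."""
    def __del__(self):
        finalized.append(True)

gc.disable()               # keep the demo deterministic
a, b = Rec(), Rec()
a.other, b.other = b, a    # cycle between two objects with __del__
del a, b
assert finalized == []     # reference counting cannot free the cycle

gc.collect()               # since Python 3.4 (PEP 442) this cycle is collectable
assert len(finalized) == 2 # both finalizers ran
assert not gc.garbage      # the objects did not end up in gc.garbage
gc.enable()
```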

@gboumis-parrot

I experience the same problem with aiohttp 3.8.1 and Python 3.7.1

@gregdan3
Author

gregdan3 commented Oct 27, 2022

Here are some bells and whistles to make spotting the issue easier:

https://github.com/gregdan3/aiohttp-memleak
