Memory management in the default implementation of Python, CPython, uses reference
counting. This ensures that as soon as all references to an object have expired, the
referenced object is also cleared. CPython also has a built-in cycle detector to ensure that
self-referencing objects are eventually garbage collected.
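
Both mechanisms can be observed directly. The following is a minimal sketch rather than part of the programs discussed here: the Node class is a hypothetical name, and the behavior shown is specific to CPython. The automatic cycle detector is switched off for the duration so that the effect of reference counting alone is visible.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only for this illustration."""
    def __init__(self):
        self.other = None


gc.disable()  # Turn off automatic cycle detection for the demonstration

# Case 1: no cycle. Reference counting frees the object immediately.
a = Node()
a_alive = weakref.ref(a)  # A weak reference does not keep the object alive
del a
freed_by_refcount = a_alive() is None

# Case 2: a reference cycle. The counts never reach zero on their own.
b, c = Node(), Node()
b.other, c.other = c, b
b_alive = weakref.ref(b)
del b, c
still_alive_before_collect = b_alive() is not None

gc.collect()  # Run the cycle detector explicitly
freed_by_collector = b_alive() is None
gc.enable()

print(freed_by_refcount, still_alive_before_collect, freed_by_collector)
```

Weak references are used here only as a way to check whether an object still exists without keeping it alive.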

In theory, this means that most Python programmers don’t have to worry about allocating
or deallocating memory in their programs. It’s taken care of automatically by the language
and the CPython runtime. However, in practice, programs can still run out of
memory because references are held longer than needed. Figuring out where your Python
programs are using or leaking memory can be a challenge.

The first way to debug memory usage is to ask the gc built-in module to list every object
currently known by the garbage collector. Although it’s quite a blunt tool, this approach
does let you quickly get a sense of where your program’s memory is being used.

Here, I run a program that wastes memory by keeping references. It prints out how many
objects were created during execution and a small sample of allocated objects.



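# using_gc.py
import gc

# Count the objects tracked by the collector before the workload runs
found_objects = gc.get_objects()
print('%d objects before' % len(found_objects))

import waste_memory
x = waste_memory.run()  # Keep the result so its objects stay alive

found_objects = gc.get_objects()
print('%d objects after' % len(found_objects))
for obj in found_objects[:3]:
    print(repr(obj)[:100])  # Truncate long reprs to 100 characters
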
>>>
4756 objects before
14873 objects after
<waste_memory.MyObject object at 0x1063f6940>
<waste_memory.MyObject object at 0x1063f6978>
<waste_memory.MyObject object at 0x1063f69b0>

The problem with gc.get_objects is that it doesn’t tell you anything about how the
objects were allocated. In complicated programs, a specific class of object could be
allocated many different ways. The overall number of objects isn’t nearly as important as
identifying the code responsible for allocating the objects that are leaking memory.
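
Tallying the collector's objects by type gives a slightly finer view and makes the limitation concrete. This is a rough sketch under stated assumptions: MyObject is a hypothetical stand-in for the class in waste_memory, and count_by_type is a helper name invented for this example.

```python
import gc
from collections import Counter


class MyObject:
    """Hypothetical stand-in for the class defined in waste_memory."""
    def __init__(self):
        self.payload = [0] * 10


def count_by_type():
    # Tally every object the collector currently tracks by its type name
    return Counter(type(obj).__name__ for obj in gc.get_objects())


before = count_by_type()
held = [MyObject() for _ in range(1000)]  # References kept, so nothing is freed
after = count_by_type()

growth = after - before  # Counter subtraction keeps only positive deltas
for name, delta in growth.most_common(3):
    print('%-12s +%d' % (name, delta))
```

The tally shows which types grew, but still nothing about which line of code created them.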

Python 3.4 introduces a new tracemalloc built-in module for solving this problem.
tracemalloc makes it possible to connect an object back to where it was allocated.
Here, I print out the top three memory usage offenders in a program using
tracemalloc:


# top_n.py
import tracemalloc

tracemalloc.start(10)  # Save up to 10 stack frames
time1 = tracemalloc.take_snapshot()  # Baseline before the workload
import waste_memory
x = waste_memory.run()
time2 = tracemalloc.take_snapshot()  # State after the workload
stats = time2.compare_to(time1, 'lineno')  # Group differences by source line
for stat in stats[:3]:
    print(stat)
>>>
waste_memory.py:6: size=2235 KiB (+2235 KiB), count=29981 (+29981), average=76 B
waste_memory.py:7: size=869 KiB (+869 KiB), count=10000 (+10000), average=89 B
waste_memory.py:12: size=547 KiB (+547 KiB), count=10000 (+10000), average=56 B



It’s immediately clear which objects are dominating my program’s memory usage and
where in the source code they were allocated.

The tracemalloc module can also print out the full stack trace of each allocation (up
to the number of frames passed to the start method). Here, I print out the stack trace of
the biggest source of memory usage in the program:
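
The listing for this step is not reproduced here, so the following is only a sketch of how such a trace might be obtained. It assumes hypothetical stand-ins (MyObject, get_data, run) in place of the waste_memory module, whose source is not shown. The tracemalloc calls themselves (start, take_snapshot, compare_to with the 'traceback' key, and Traceback.format) are part of the standard library.

```python
import tracemalloc


class MyObject:
    """Hypothetical stand-in for the objects created by waste_memory."""
    def __init__(self):
        self.data = bytearray(100)


def get_data():
    return [MyObject() for _ in range(100)]


def run():
    return [get_data() for _ in range(100)]


tracemalloc.start(10)  # Save up to 10 stack frames
time1 = tracemalloc.take_snapshot()
x = run()  # Keep the result so the memory stays allocated
time2 = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group allocations by their full traceback instead of by single line
stats = time2.compare_to(time1, 'traceback')
top = stats[0]  # Entries are sorted with the largest difference first
print('\n'.join(top.traceback.format()))
```

Each printed entry is a file and line number followed by the source line, one per saved frame, so the chain of calls leading to the allocation can be read off directly.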

A stack trace like this is most valuable for figuring out which particular usage of a
common function is responsible for memory consumption in a program.

Unfortunately, Python 2 doesn’t provide the tracemalloc built-in module. There are
open source packages for tracking memory usage in Python 2 (such as heapy), though
they do not fully replicate the functionality of tracemalloc.



* It can be difficult to understand how Python programs use and leak memory. 

* The gc module can help you understand which objects exist, but it has no information about how they were allocated.

* The tracemalloc built-in module provides powerful tools for understanding the source of memory usage.

* tracemalloc is only available in Python 3.4 and above.