# CPython - Deallocation

---

Python’s memory management combines reference counting for immediacy with a generational garbage collector for the hard cases. Every object carries an ob_refcnt field in its header that tracks how many active references point to it; creating or sharing a reference increments the count, and dropping a reference decrements it. When an object’s count reaches zero, CPython deallocates it immediately and recursively decrements the counts of any objects it owns, which makes most cleanup prompt and predictable. To handle reference cycles that refcounting alone cannot reclaim, CPython layers a three‑generation garbage collector on top: new objects start in the youngest generation and are periodically scanned for unreachable cycles, with survivors promoted to older generations that are collected less frequently to keep overhead low.

## TL;DR

In CPython there are two main deallocation paths:

```text
NORMAL CASE (~99.9%): reference counting
----------------------------------------

user code    ->   drop ref / del lst   ->   Py_DECREF(obj)   ->   refcnt == 0 ?
                                                       |
                                                       +-- YES --> _Py_Dealloc(obj) --> list_dealloc()
                                                       |              (runs immediately, microseconds)
                                                       |              (memory returned to arena/pool)
                                                       |
                                                       +-- NO  --> object stays alive


CYCLE CASE (~0.1%): generational garbage collector
---------------------------------------------------

gc.collect() or threshold trigger  ->  scan tracked container objects  ->  detect unreachable cycles
              ->  tp_clear(obj_in_cycle) breaks the cycle  ->  refcnt == 0  ->  tp_dealloc (e.g. list_dealloc)
                        (runs during GC pause, milliseconds)
                        (frees from the same arenas/pools)
```


## Reference Counting

In CPython, reference counting always runs first and handles almost all deallocation on its own. Every object carries an ob_refcnt counter; when your code creates or shares a reference (assigning to a variable, passing an argument, storing in a list), the counter is incremented, and when a reference goes away (variable reassigned, frame returns, container element removed), it is decremented. When a Py_DECREF call drives ob_refcnt to zero, CPython immediately invokes the object’s tp_dealloc (for example list_dealloc), freeing the object and recursively decreasing the refcounts of any objects it owns, so most memory is reclaimed instantly without involving the generational garbage collector at all.
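
As a small illustration of that immediacy, the sketch below records when a finalizer runs relative to two `del` statements. The `Tracked` class and the `drop_last_reference` helper are hypothetical names made up for this example, and the ordering shown relies on CPython's reference counting; other Python implementations may finalize later.

```python
class Tracked:
    """Hypothetical class that records when its finalizer runs."""

    def __init__(self, name, events):
        self.name = name
        self.events = events

    def __del__(self):
        # Called by CPython when the last reference to this object goes away.
        self.events.append(f"dealloc {self.name}")


def drop_last_reference():
    events = []
    obj = Tracked("a", events)
    alias = obj  # second reference to the same object
    del obj  # one reference (alias) remains: the object stays alive
    events.append("after first del")
    del alias  # last reference gone: __del__ runs right here
    events.append("after second del")
    return events


print(drop_last_reference())
```

The finalizer entry lands between the two markers, which shows that deallocation happens at the moment the last reference is dropped rather than at some later collection.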

## Circular References

Circular references break this system, though. Picture two objects pointing at each other (A holds B, B holds A): once every outside reference is gone, each count is still 1, so neither ever reaches zero even though nothing else can reach them. That's where the generational GC steps in as the safety net, using three age-based generations. New objects land in Gen0, which is scanned often; survivors are promoted to Gen1 and then Gen2, which is scanned rarely and is where most long-lived objects settle. A collection is triggered when a generation's counter exceeds its threshold.
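
A minimal sketch of this situation, assuming CPython: two objects reference each other, the outside names are deleted, and a weak reference is used as a probe to see whether the object still exists. `Node` and `cycle_survives_del` are illustrative names, and automatic collection is switched off for the duration so that only the explicit `gc.collect()` can reclaim the pair.

```python
import gc
import weakref


class Node:
    """Hypothetical container object used to build a cycle."""


def cycle_survives_del():
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collections out of the demo
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a  # A holds B, B holds A
        probe = weakref.ref(a)  # lets us observe a without keeping it alive

        del a, b  # no outside references remain
        alive_after_del = probe() is not None  # refcounting alone cannot free it

        gc.collect()  # the cyclic collector finds and frees the pair
        alive_after_gc = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_after_del, alive_after_gc


print(cycle_survives_del())
```

The first value is `True` (the cycle outlives `del`) and the second is `False` (the collector reclaimed it).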

## Mark and Sweep

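The GC is often described as a mark-and-sweep dance, but CPython has no list of roots (stack frames, globals, registers) to start marking from. Instead, it works from the tracked container objects (lists, dicts, instances, and so on) in the generation being collected. For each one it takes a copy of the refcount, then walks the containers with tp_traverse and subtracts one for every reference that comes from inside that same set. Any object whose adjusted count is still above zero must be referenced from outside, so it is reachable, and so is everything that can be reached from it. What remains is garbage, even if it forms cycles. In the sweep phase the collector clears weak references, runs `__del__` finalizers, and then calls tp_clear to break the cycles so the objects are deallocated, while survivors are promoted to the next generation. This hybrid keeps Python responsive: refcounting frees the vast majority of objects instantly, and the GC quietly handles the rare loops.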
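
As a rough illustration of how unreachable cycles can be found without a root set, the sketch below models the reference-subtraction approach described in CPython's garbage collector design notes. It is a toy in plain Python, not the real implementation: objects are just names, `refcounts` and `edges` are made-up inputs, and `find_unreachable` is a hypothetical helper.

```python
def find_unreachable(refcounts, edges):
    """Toy model of cycle detection by subtracting internal references.

    refcounts: {name: total reference count, from anywhere}
    edges:     {name: [names this object refers to]}
    """
    # Step 1: start from a copy of each object's real refcount.
    gc_refs = dict(refcounts)

    # Step 2: subtract every reference that comes from inside the set.
    for targets in edges.values():
        for target in targets:
            gc_refs[target] -= 1

    # Step 3: anything still above zero is referenced from outside the set;
    # it is reachable, and so is everything it can reach.
    reachable = set()
    pending = [name for name, count in gc_refs.items() if count > 0]
    while pending:
        name = pending.pop()
        if name not in reachable:
            reachable.add(name)
            pending.extend(edges.get(name, []))

    # Step 4: the rest is unreachable garbage.
    return set(refcounts) - reachable


# a <-> b is an isolated cycle; c <-> d is a cycle that a variable still
# points into (c has one extra reference); e is a plain live object.
refcounts = {"a": 1, "b": 1, "c": 2, "d": 1, "e": 1}
edges = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"], "e": []}
print(sorted(find_unreachable(refcounts, edges)))
```

Only `a` and `b` are reported: `c` keeps a positive count after subtraction, which rescues `d` as well.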


The mark‑and‑sweep GC is triggered automatically based on allocation/deallocation counters and thresholds, or manually via gc.collect().

In the case of automatic triggers:
* CPython keeps a counter for each generation; when a counter exceeds that generation’s threshold, a collection of that generation (and all younger ones) is run.
* The thresholds are given by gc.get_threshold(), typically something like (700, 10, 10), although the defaults depend on the Python version. That means:
    * Gen 0: collect when allocations minus deallocations of tracked objects passes ~700.
    * Gen 1: collect every 10 Gen‑0 collections.
    * Gen 2: collect every 10 Gen‑1 collections.
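
To see these numbers on a running interpreter, a sketch along the following lines may help. It only reads state through `gc.get_threshold()`, `gc.get_count()`, and `gc.get_stats()`; the `gc_snapshot` helper and the dictionary layout are assumptions made for this example, and the values printed will differ between machines and Python versions.

```python
import gc


def gc_snapshot():
    """Collect the per-generation thresholds, counters, and run counts."""
    thresholds = gc.get_threshold()  # e.g. (700, 10, 10) on some versions
    counts = gc.get_count()  # current counter for each generation
    stats = gc.get_stats()  # one dict per generation
    return [
        {
            "generation": index,
            "threshold": threshold,
            "count": count,
            "collections": stat["collections"],
        }
        for index, (threshold, count, stat) in enumerate(
            zip(thresholds, counts, stats)
        )
    ]


for row in gc_snapshot():
    print(row)
```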


Alternatively, you can force a collection at any time with gc.collect(), or target a generation with gc.collect(0), gc.collect(1), or gc.collect(2). You can also disable or re‑enable automatic GC with gc.disable() / gc.enable(), and tune thresholds using `gc.set_threshold(...)` if you want fewer or more frequent collections. A call to gc.collect() blocks until the collection finishes, so a full collection over a large heap can cause a noticeable pause.
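
One way to use these controls without leaving the interpreter in a changed state is to wrap them in context managers. The sketch below is only a suggestion: `gc_paused` and `gc_thresholds` are hypothetical helpers, not part of the `gc` module, and the threshold value 5000 is an arbitrary example.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Disable automatic collection inside the block, then restore it."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


@contextmanager
def gc_thresholds(*thresholds):
    """Apply temporary thresholds inside the block, then restore the old ones."""
    previous = gc.get_threshold()
    gc.set_threshold(*thresholds)
    try:
        yield
    finally:
        gc.set_threshold(*previous)


with gc_paused():
    data = [[i] for i in range(10_000)]  # no automatic collections here

with gc_thresholds(5000):
    data = [[i] for i in range(10_000)]  # Gen 0 is collected less often here
```

Restoring the previous state in `finally` keeps a failing block from leaving the collector disabled or mistuned for the rest of the program.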


In [20]:
import gc
import logging
import sys
import timeit
import weakref

In [14]:
log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO, format="%(message)s")

## Example: Reference Counting

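A simple example to show how reference counting works. The exact numbers printed can vary between Python versions, because sys.getrefcount() also counts the temporary references made by the call itself.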

In [None]:
log.info("=== Reference Counting: Function Parameter Demo ===")
log.info(f"{'Where':<28} refcnt  Notes")


def process_list(data):
    """Function param adds a temporary reference (+1)."""
    log.info(
        f"{'  inside process_list':<28} {sys.getrefcount(data):2d}   (+1 for param)"
    )
    log.info(f"    Processing {data}")


# Fresh list: name 'lst' + getrefcount's own temporary ref
lst = [1, 2, 3]
log.info(f"{'before process_list':<28} {sys.getrefcount(lst):2d}   (name + temp)")

# Pass to function -> refcount bumps during call, then drops back
process_list(lst)
log.info(f"{'after process_list':<28} {sys.getrefcount(lst):2d}   (param ref gone)")


# Multiple params -> multiple temporary refs
def multi_params(a, b):
    log.info(f"{'  inside multi_params':<28} {sys.getrefcount(a):2d}   (+2 for params)")


multi_params(lst, lst)
log.info(
    f"{'after multi_params':<28} {sys.getrefcount(lst):2d}   (both param refs gone)"
)

=== Reference Counting: Function Parameter Demo ===
Where                        refcnt  Notes
before process_list           2   (name + temp)
  inside process_list         3   (+1 for param)
    Processing [1, 2, 3]
after process_list            2   (param ref gone)
  inside multi_params         4   (+2 for params)
after multi_params            2   (both param refs gone)


The following example shows how fast CPython’s pure reference counting can allocate and then destroy large numbers of tiny objects when the cyclic garbage collector is disabled. Each timeit run creates n trivial objects inside a list and then deletes the list, so all those objects lose their last reference and are immediately deallocated by refcounting. Dividing the time per run by the number of objects gives an estimate of the per‑object cost, and multiplying back up gives the total for all 10 million objects. Because each run also includes creating the objects, that figure is an upper bound on the deallocation cost alone, but it still gives a feel for the raw speed of CPython’s allocator/deallocator path for small objects.

In [19]:
def make_del(n):
    class T:
        pass

    objs = [T() for _ in range(n)]
    del objs


def benchmark_refcount(n=100_000, repeats=100):
    gc.disable()  # pure refcounting
    try:
        time_per_run = timeit.timeit(lambda: make_del(n), number=repeats) / repeats
    finally:
        gc.enable()  # restore the collector even if the benchmark raises

    total_deallocs = n * repeats
    time_per_dealloc = time_per_run * 1e9 / n  # seconds per run -> ns per object

    log.info("+---------------- Refcount dealloc benchmark ----------------+")
    log.info("| objects per run : %12d                           |", n)
    log.info("| repeats         : %12d                           |", repeats)
    log.info("| total deallocs  : %12d                           |", total_deallocs)
    log.info(
        "| avg time / run  : %8.1f ms                            |", time_per_run * 1e3
    )
    log.info(
        "| time / dealloc  : %8.1f ns                            |", time_per_dealloc
    )
    log.info(
        "| %d deallocs    : ~%6.0f ms                            |",
        total_deallocs,
        time_per_dealloc * total_deallocs / 1e6,  # ns -> ms
    )
    log.info("+-----------------------------------------------------------+")


if __name__ == "__main__":
    benchmark_refcount()

+---------------- Refcount dealloc benchmark ----------------+
| objects per run :       100000                           |
| repeats         :          100                           |
| total deallocs  :     10000000                           |
| avg time / run  :      6.7 ms                            |
| time / dealloc  :     67.0 ns                            |
| 10000000 deallocs    : ~   670 ms                            |
+-----------------------------------------------------------+


## Example: Cyclic References

This example demonstrates exactly when Python’s cyclic garbage collector is needed, and how much work it does, by building real reference cycles and timing gc.collect() separately from object creation. In contrast to the earlier refcount‑only benchmark, the code now creates pairs of objects that reference each other so their refcounts never drop to zero, then measures how long the GC takes to detect and reclaim these unreachable cycles, making the role and cost of the cyclic collector very explicit.



In [None]:
def make_cycles(n):
    # Each loop makes a 2‑object cycle: a.other -> b, b.other -> a
    class Node:
        __slots__ = ("other",)

    objs = []
    for _ in range(n):
        a = Node()
        b = Node()
        a.other = b
        b.other = a
        objs.append(a)

    # Drop the only external references; cycles keep objects alive.
    del objs


def bench_cycles(n=50_000, repeats=20):
    gc.enable()
    gc.collect()

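    # Create cycles without forcing a collection. Automatic GC is still
    # enabled, so some cycles may already be reclaimed during this step.
    t_make = timeit.timeit(lambda: make_cycles(n), number=repeats) / repeats

    # Force cyclic GC. Only the first call has garbage left to reclaim; the
    # average spreads that cost over all repeats.
    t_gc = timeit.timeit(gc.collect, number=repeats) / repeats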

    total_objects = n * repeats * 2  # two nodes per cycle

    log.info("Cyclic references benchmark:")
    log.info("  cycles per run : %d", n)
    log.info("  repeats        : %d", repeats)
    log.info("  objects total  : %d", total_objects)
    log.info("  make cycles    : %.1f ms/run", t_make * 1e3)
    log.info("  gc.collect()   : %.1f ms/run", t_gc * 1e3)


if __name__ == "__main__":
    bench_cycles()

Cyclic references benchmark:
  cycles per run : 50000
  repeats        : 20
  objects total  : 2000000
  make cycles    : 9.5 ms/run
  gc.collect()   : 46.9 ms/run
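
For a tighter measurement, one option is to keep automatic collection off while the cycles are built and to time one `gc.collect()` straight after each batch, so every timed collection has the same amount of garbage to reclaim. The sketch below is an assumption about how that could look, not the benchmark that produced the numbers above; `Pair`, `make_pairs`, and `time_cycle_collection` are hypothetical names.

```python
import gc
import timeit


class Pair:
    """Hypothetical node type; each instance points at its partner."""


def make_pairs(n):
    for _ in range(n):
        a, b = Pair(), Pair()
        a.other, b.other = b, a  # 2-object cycle
    # On return the local names go away, leaving n unreachable cycles.


def time_cycle_collection(n=10_000, repeats=5):
    clock = timeit.default_timer
    was_enabled = gc.isenabled()
    gc.collect()  # start from a clean slate
    gc.disable()  # keep automatic collections out of the creation timing
    t_make = t_gc = 0.0
    found = 0
    try:
        for _ in range(repeats):
            start = clock()
            make_pairs(n)
            t_make += clock() - start

            start = clock()
            found += gc.collect()  # returns the number of unreachable objects
            t_gc += clock() - start
    finally:
        if was_enabled:
            gc.enable()
    return t_make / repeats, t_gc / repeats, found


make_s, collect_s, found = time_cycle_collection()
print(f"make: {make_s * 1e3:.1f} ms/run, collect: {collect_s * 1e3:.1f} ms/run")
print(f"unreachable objects found: {found}")
```

The count returned by `gc.collect()` doubles as a sanity check that the cycles were reclaimed by the timed collections and not earlier.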


## Interlude: Weak References

A weak reference is a reference to an object that does not increase its reference count, so it does not keep the object alive. Once only weak references remain, the object can be reclaimed, and calling the weak reference returns None from then on. This is useful for certain kinds of caching: a weak‑reference cache lets you reuse objects that are still alive elsewhere, but drops them automatically when the rest of the program stops using them, so the cache itself does not keep memory alive. weakref.WeakValueDictionary is well suited to this: it maps keys to objects, but the values are held only weakly, so entries disappear when the objects are reclaimed.
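
Before the cache example, here is a minimal sketch of a single weak reference, assuming CPython's immediate refcount-based deallocation. `Resource` and `weak_ref_lifecycle` are illustrative names only.

```python
import sys
import weakref


class Resource:
    """Hypothetical object to point a weak reference at."""


def weak_ref_lifecycle():
    obj = Resource()
    count_before = sys.getrefcount(obj)
    ref = weakref.ref(obj)  # does not add a strong reference
    count_after = sys.getrefcount(obj)

    same_object = ref() is obj  # calling the weakref yields the live object
    del obj  # last strong reference gone
    dead = ref() is None  # the weakref now returns None

    return count_before == count_after, same_object, dead


print(weak_ref_lifecycle())
```

All three values come out `True`: the refcount is unchanged by the weak reference, the reference resolves while the object is alive, and it returns `None` afterwards.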

In [None]:
class Image:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Image({self.name!r})"


# Cache: names -> Image objects, but values are weak refs.
cache = weakref.WeakValueDictionary()


def load_image(name: str) -> Image:
    """Return an Image, reusing an existing one if it's still alive."""
    img = cache.get(name)
    if img is not None:
        log.info(f"cache HIT  for {name!r} -> {img!r}")
        return img

    log.info(f"cache MISS for {name!r} -> loading new Image")
    img = Image(name)
    cache[name] = img
    return img


def demo_cache():
    log.info("=== Weak-value cache demo ===")

    img1 = load_image("logo.png")  # miss, new Image
    img2 = load_image("logo.png")  # hit, same object

    log.info("img1 is img2? %s", img1 is img2)
    log.info("cache keys now: %r", list(cache.keys()))

    # Drop strong reference; only the cache still points to the Image (weakly).
    log.info("Deleting img1 and img2 (dropping strong refs)")
    del img1, img2
    gc.collect()  # encourage collection for the demo

    log.info("After GC, cache keys: %r", list(cache.keys()))
    log.info("Entry for 'logo.png' disappears once the Image is collected.")


if __name__ == "__main__":
    demo_cache()

=== Weak-value cache demo ===
cache MISS for 'logo.png' -> loading new Image
cache HIT  for 'logo.png' -> Image('logo.png')
img1 is img2? True
cache keys now: ['logo.png']
Deleting img1 and img2 (dropping strong refs)
After GC, cache keys: []
Entry for 'logo.png' disappears once the Image is collected.
