## Python - Memory Allocations

---

The Python memory manager is an internal component responsible for efficiently allocating, managing, and deallocating memory for Python objects on the private heap. Its main functions and architecture include:

* Separation of Stack and Heap: While Python uses the stack to store references and execution context, the memory manager handles dynamic object storage on the heap transparently.
* Private Heap Management: Controls a dedicated memory area called the **private heap** where all Python objects and data structures reside.
* Layered Allocators:
    * Raw Memory Allocator: A thin wrapper around the C library's `malloc()`; it serves requests larger than 512 bytes and supplies the memory that the higher layers build on.
    * Object Allocator (pymalloc): Optimized for small objects (≤ 512 bytes). It carves arenas (large chunks obtained from the OS) into pools (4 KiB each, or 16 KiB on 64-bit builds of CPython 3.10+), and pools into fixed-size blocks, which reduces fragmentation and speeds up allocation.
    * Object-Specific Allocators: Per-type free lists for common types such as floats, lists, and dictionaries that recycle recently freed objects instead of handing them back to pymalloc.
* Interaction with Garbage Collector: Reference counting frees most objects as soon as their last reference disappears; the cyclic garbage collector (`gc` module) reclaims groups of objects that keep each other alive through reference cycles.
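
The division of labour between reference counting and the cyclic collector can be observed directly. The following is a minimal sketch that assumes CPython; `Node` and `demo_refcount_vs_gc` are hypothetical names introduced only for illustration.

```python
import gc
import weakref


class Node:
    """Hypothetical helper class, used only for this demonstration."""

    def __init__(self):
        self.partner = None


def demo_refcount_vs_gc():
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cyclic collector from running by itself mid-demo
    try:
        # Case 1: no cycle. Dropping the last reference frees the object at once.
        solo = Node()
        solo_ref = weakref.ref(solo)
        del solo
        freed_by_refcount = solo_ref() is None

        # Case 2: two objects referencing each other keep both refcounts above zero.
        a, b = Node(), Node()
        a.partner, b.partner = b, a
        cycle_ref = weakref.ref(a)
        del a, b
        alive_before_gc = cycle_ref() is not None

        # The cyclic collector finds the unreachable pair and frees it.
        gc.collect()
        freed_by_gc = cycle_ref() is None
    finally:
        if was_enabled:
            gc.enable()
    return freed_by_refcount, alive_before_gc, freed_by_gc


freed_by_refcount, alive_before_gc, freed_by_gc = demo_refcount_vs_gc()
print(f"freed by refcount alone: {freed_by_refcount}")
print(f"cycle still alive before gc.collect(): {alive_before_gc}")
print(f"cycle freed by gc.collect(): {freed_by_gc}")
```

The weak references let the sketch check whether an object still exists without keeping it alive.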

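## pymalloc

pymalloc requests memory from the operating system in large chunks called arenas:

* Size: 256 KiB (32-bit) or 1 MiB (64-bit, CPython 3.10+) per arena, obtained via `mmap()` or `malloc()`
* Subdivided: Into 64 pools (4 KiB each in a 256 KiB arena, 16 KiB each in a 1 MiB arena)
* Purpose: Minimize system calls by pre-allocating large chunks
* Lifecycle: A new arena is created when all existing pools are exhausted; an arena is returned to the OS only once every pool in it is empty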
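
The arithmetic behind these figures can be sketched in a few lines. This is an illustration, not CPython's actual code: `pools_per_arena` and `block_size` are hypothetical helpers, and the 16-byte alignment is an assumption that holds for 64-bit builds of CPython 3.8 and later.

```python
ALIGNMENT = 16                 # assumed block alignment (64-bit CPython 3.8+)
SMALL_REQUEST_THRESHOLD = 512  # largest request pymalloc serves, in bytes


def pools_per_arena(arena_bytes, pool_bytes):
    """Number of equally sized pools that fit in one arena."""
    return arena_bytes // pool_bytes


def block_size(nbytes):
    """Block size a small request is rounded up to, or None if the
    request is outside pymalloc's range and goes to the raw allocator."""
    if nbytes <= 0 or nbytes > SMALL_REQUEST_THRESHOLD:
        return None
    # Round up to the next multiple of ALIGNMENT.
    return -(-nbytes // ALIGNMENT) * ALIGNMENT


print(pools_per_arena(256 * 1024, 4 * 1024))     # 256 KiB arena, 4 KiB pools
print(pools_per_arena(1024 * 1024, 16 * 1024))   # 1 MiB arena, 16 KiB pools
for request in (1, 100, 512, 513):
    print(request, "->", block_size(request))
```

Rounding every request up to a size class is what lets a pool hold only blocks of one size, which keeps fragmentation low.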

In [1]:
import logging
import os
import sys
import tracemalloc

import psutil
from tabulate import tabulate

In [2]:
logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger()

In [3]:
tracemalloc.start(10)  # Track 10 frames of traceback

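# Small objects (<= 512 bytes) are served by pymalloc.
# Caveat: CPython folds `b"x" * 100` into a single constant at compile time, so every
# list slot references the same bytes object; only the list itself is newly allocated.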
small_objects = [b"x" * 100 for _ in range(5000)]
log.info("Small objects allocated (pymalloc, 500KB data + overhead)")

# Large objects (> 512 bytes) bypass pymalloc and go to the system malloc.
# (`b"x" * 2000` is also folded into a single shared constant at compile time.)
large_objects = [b"x" * 2000 for _ in range(10)]
log.info("Large objects allocated (rawalloc/system malloc, 20KB data + overhead)")

# Huge object: far above glibc's default 128 KiB mmap threshold, so malloc typically serves it via mmap
huge = bytearray(10**8)  # 10**8 bytes (about 95.4 MiB)
log.info("Huge object allocated (mmap, 100MB)")

# Take snapshot before cleanup
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")

# Get current/peak tracked memory (from tracemalloc)
current, peak = tracemalloc.get_traced_memory()

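# Get process RSS (resident set size): physical memory held by the whole process,
# including the interpreter, loaded libraries, and memory tracemalloc does not trace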
process = psutil.Process(os.getpid())
rss_mb = process.memory_info().rss / 1024**2


log.info("\n=== MEMORY SUMMARY ===")
log.info(
    f"Total tracked: {sum(stat.size for stat in top_stats) / 1024**2:.1f} MB ({len(snapshot.traces)} traces)"
)
log.info(
    f"Live tracked: {current / 1024**2:.1f} MB, Peak tracked: {peak / 1024**2:.1f} MB"
)
log.info(f"Process RSS: {rss_mb:.1f} MB (includes untracked rawalloc/mmap)")


log.info("\n=== TOP 5 ALLOCATORS BY SIZE ===")
for i, stat in enumerate(top_stats[:5], 1):
    frame = stat.traceback[0]
    log.info(
        f"#{i}: {stat.count} blocks, {stat.size / 1024:.1f} KB at {frame.filename}:{frame.lineno}"
    )


log.info(f"\nPID: {os.getpid()}")
tracemalloc.stop()

Small objects allocated (pymalloc, 500KB data + overhead)
Large objects allocated (rawalloc/system malloc, 20KB data + overhead)
Huge object allocated (mmap, 100MB)

=== MEMORY SUMMARY ===
Total tracked: 95.4 MB (116 traces)
Live tracked: 95.4 MB, Peak tracked: 95.4 MB
Process RSS: 165.6 MB (includes untracked rawalloc/mmap)

=== TOP 5 ALLOCATORS BY SIZE ===
#1: 2 blocks, 97656.3 KB at /tmp/ipykernel_2501/3962429947.py:12
#2: 1 blocks, 40.8 KB at /tmp/ipykernel_2501/3962429947.py:4
#3: 14 blocks, 3.0 KB at /usr/lib/python3.12/codeop.py:126
#4: 8 blocks, 0.8 KB at /home/runner/work/theoria/theoria/.venv/lib/python3.12/site-packages/ipykernel/iostream.py:277
#5: 8 blocks, 0.7 KB at /home/runner/work/theoria/theoria/.venv/lib/python3.12/site-packages/ipykernel/iostream.py:288



PID: 2501


## Class Sizes

| Type              | Size (bytes) | Explanation |
|-------------------|--------------|-------------|
| **NoneType**      | 16           | Minimal singleton (just object header) |
| **bool**          | 28           | `int` subclass; `True` and `False` are the only two instances |
| **int (small)**   | 28           | Header + one 30-bit digit; values -5..256 are pre-allocated and shared |
| **int (large)**   | 40           | Header + four 30-bit digits for `2**100` |
| **str (empty)**   | 41           | 40 B compact ASCII header + 1 B null terminator |
| **str ('a')**     | 42           | +1 B for one ASCII character |
| **str ('abc')**   | 44           | +3 B for three ASCII characters |
| **list (empty)**  | 56           | Header only; the pointer array is not allocated until the first item arrives |
| **dict (empty)**  | **64**       | Header only; empty dicts share one static keys table |
| **set (empty)**   | 216          | Header + embedded 8-slot hash table |
| **tuple (empty)** | 40           | Header only; the empty tuple is a shared singleton |
| **list [1,2,3]**  | 88           | 56 B header + 4 pointer slots (3 used) |
| **dict {'a':1}**  | 184          | 64 B header + smallest compact keys table |
| **set {1,2,3}**   | 216          | Same as empty: three items fit in the embedded table |

## Key Insights

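- **Dict Revolution**: 64 B when empty, because an empty dict shares a static keys table; older CPython releases allocated a table up front, making an empty dict several times larger
- **Sets Wasteful**: 216 B minimum, even when empty, because a small hash table is embedded in every set object
- **Tuples Win**: 40 B empty and no spare capacity, making them the most memory-efficient built-in sequence
- **Lists Smart**: Over-allocate on growth for amortized O(1) appends
- **Strings Optimal**: ASCII strings store one byte per character inline after the header (no separate buffer pointer)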
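
List over-allocation is easy to observe: the reported size jumps only occasionally while items are appended. A minimal sketch follows; `capacity_growth` is a hypothetical helper, and the exact sizes depend on the CPython version and platform.

```python
import sys


def capacity_growth(n_appends):
    """Record (length, size in bytes) each time the list's allocation changes."""
    items = []
    last_size = sys.getsizeof(items)
    steps = [(0, last_size)]
    for length in range(1, n_appends + 1):
        items.append(None)
        size = sys.getsizeof(items)
        if size != last_size:  # the pointer array was reallocated
            steps.append((length, size))
            last_size = size
    return steps


steps = capacity_growth(64)
for length, size in steps:
    print(f"len={length:>2}  size={size} bytes")
```

Far fewer reallocations than appends occur, which is what makes `append` amortized O(1).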


In [4]:
empty_dict = {}
one_item_dict = {"a": 1}
small_set = {1, 2, 3}

data = [
    ["NoneType", sys.getsizeof(None)],
    ["bool", sys.getsizeof(True)],
    ["int (small)", sys.getsizeof(2)],
    ["int (large)", sys.getsizeof(2**100)],
    ["str (empty)", sys.getsizeof("")],
    ["str ('a')", sys.getsizeof("a")],
    ["str ('abc')", sys.getsizeof("abc")],
    ["list (empty)", sys.getsizeof([])],
    ["dict (empty)", sys.getsizeof(empty_dict)],
    ["set (empty)", sys.getsizeof(set())],
    ["tuple (empty)", sys.getsizeof(tuple())],
    ["list [1,2,3]", sys.getsizeof([1, 2, 3])],
    ["dict {'a':1}", sys.getsizeof(one_item_dict)],
    ["set {1,2,3}", sys.getsizeof(small_set)],
]

log.info(tabulate(data, headers=["Type", "Size (bytes)"]))

Type             Size (bytes)
-------------  --------------
NoneType                   16
bool                       28
int (small)                28
int (large)                40
str (empty)                41
str ('a')                  42
str ('abc')                44
list (empty)               56
dict (empty)               64
set (empty)               216
tuple (empty)              40
list [1,2,3]               88
dict {'a':1}              184
set {1,2,3}               216
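
Keep in mind that `sys.getsizeof` is shallow: for a container it counts the container's own structure, not the objects it references. The sketch below adds up referenced objects for the built-in containers only; `deep_sizeof` is a hypothetical helper, not a standard library function, and it ignores instance attributes and other object types.

```python
import sys


def deep_sizeof(obj, _seen=None):
    """Approximate total size of obj plus everything it contains.

    Handles dict, list, tuple, set, and frozenset; each object is counted once.
    """
    seen = set() if _seen is None else _seen
    if id(obj) in seen:  # already counted (shared reference or cycle)
        return 0
    seen.add(id(obj))

    total = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            total += deep_sizeof(key, seen) + deep_sizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            total += deep_sizeof(item, seen)
    return total


payload = [bytes(100), bytes(100)]  # two distinct 100-byte objects
print(f"shallow: {sys.getsizeof(payload)} bytes")
print(f"deep:    {deep_sizeof(payload)} bytes")
```

Tracking visited ids keeps shared objects from being counted twice and stops the recursion on self-referencing containers.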


## `__slots__`

`__slots__` tells Python to store instance attributes in fixed slots inside the object instead of a dynamic `__dict__` dictionary. This eliminates the per-instance dictionary overhead (roughly 100-200 bytes per object) and can speed up attribute access.

In exchange, **you sacrifice Python's dynamic attribute creation for memory/speed gains.** It's a deliberate static vs dynamic trade-off.
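
The cost of that trade-off shows up as soon as code tries to attach an undeclared attribute. A minimal sketch, using a hypothetical `SlottedPoint` class:

```python
class SlottedPoint:
    """Hypothetical example class with a fixed attribute set."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


point = SlottedPoint(1, 2)

# Declared attributes behave as usual.
point.x = 10

# Undeclared attributes are rejected because there is no per-instance __dict__.
try:
    point.z = 3
    rejected = False
except AttributeError as exc:
    rejected = True
    print(f"AttributeError: {exc}")

has_dict = hasattr(point, "__dict__")
print(f"has __dict__: {has_dict}")
```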

In [5]:
class PointNormal:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    __slots__ = ("x", "y")  # Fixed attributes only

    def __init__(self, x, y):
        self.x = x
        self.y = y


p_normal = PointNormal(1, 2)
p_slots = PointSlots(1, 2)

log.info(
    f"PointNormal size: {sys.getsizeof(p_normal) + sys.getsizeof(p_normal.__dict__)} bytes"
)
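# Caveat: PointSlots.__slots__ is a tuple stored once on the class and shared by
# every instance, so adding it here overstates the per-instance cost.
log.info(
    f"PointSlots size: {sys.getsizeof(p_slots) + sys.getsizeof(PointSlots.__slots__)} bytes"
)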

PointNormal size: 344 bytes
PointSlots size: 104 bytes
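
`sys.getsizeof` arithmetic on a single object is only an approximation. An alternative is to measure how much memory many instances actually consume, sketched below with `tracemalloc`. The classes mirror the ones above, `bytes_per_instance` is a hypothetical helper, and the figures include the list slot and integer objects for each instance, which cost the same for both classes.

```python
import tracemalloc


class PointNormal:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def bytes_per_instance(cls, count=10_000):
    """Average traced bytes per instance when building `count` objects."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    points = [cls(i, i) for i in range(count)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return (after - before) / len(points)


normal_cost = bytes_per_instance(PointNormal)
slots_cost = bytes_per_instance(PointSlots)
print(f"PointNormal: about {normal_cost:.0f} bytes per instance")
print(f"PointSlots:  about {slots_cost:.0f} bytes per instance")
```

The absolute numbers vary between CPython versions, so the comparison between the two classes is more meaningful than either figure alone.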
