## Python - Allocations

---

CPython manages the lifecycle of all objects on a private heap and uses the pymalloc allocator to make small-object allocation fast. Every value (scalars, collections, functions, classes, and custom instances) undergoes the same allocation, reference tracking, and deallocation procedures, while arenas reduce system call overhead and fragmentation for the millions of tiny allocations typical in data-intensive applications.

At the top level, Python keeps names and stack frames (local variables, call stack) separate from objects on the heap (lists, dicts, your `Tick` instances).

When you do `x = Tick(...)`, the name `x` is just a reference; the actual `Tick` object lives on the heap and can be pointed to by many names or containers. Every heap object has a reference count; when this count drops to zero (no references anywhere), CPython immediately destroys the object and returns its memory to the allocator.

Python also runs a cyclic garbage collector to clean up groups of objects that reference each other in cycles and would otherwise escape simple reference counting.
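The difference between the two mechanisms can be observed with weak references. This is a minimal sketch that assumes CPython; the `Node` class and both function names are hypothetical and exist only for this illustration.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self):
        self.other = None


def refcount_frees_immediately() -> bool:
    """Return True if an acyclic object dies as soon as its last name is deleted."""
    n = Node()
    probe = weakref.ref(n)  # a weak reference does not keep the object alive
    del n                   # refcount drops to zero, object is destroyed
    return probe() is None


def cycle_needs_gc() -> tuple:
    """Return (alive_before_collect, alive_after_collect) for a two-object cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep an automatic collection from interfering with the demo
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a      # a and b now reference each other
        probe = weakref.ref(a)
        del a, b                     # names are gone, but the cycle keeps both alive
        alive_before = probe() is not None
        gc.collect()                 # the cyclic collector finds and frees the pair
        alive_after = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after
```

On CPython, the first function reports that the object is gone right after `del`, while the cycle survives until `gc.collect()` runs.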

## pymalloc 

Calling the OS allocator (`malloc`/`free` or `mmap`) for every small object would be too slow and would fragment memory badly, so CPython uses pymalloc for small objects (requests of up to 512 bytes); larger requests go to the system allocator.
pymalloc grabs memory from the OS in larger chunks and then subdivides and reuses it, so most small-object allocations never talk to the OS at all.

Conceptually, pymalloc uses three levels:
* Arenas: big chunks from the OS (256 KiB or 1 MiB, depending on version and build)
* Pools: fixed-size regions inside an arena (4 KiB historically, larger on recent 64-bit builds), each pool dedicated to one size class
* Blocks: fixed-size slots inside a pool, each block holding exactly one small object
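To make the size-class idea concrete, the sketch below rounds a request up to its block size and estimates how many blocks fit in one pool. The constants (16-byte alignment, 512-byte small-object threshold, 4 KiB pool) are assumptions modelled on common 64-bit CPython builds and differ between versions; the function names are hypothetical, and the space taken by the pool header is ignored.

```python
# Assumed constants, not read from the running interpreter.
ALIGNMENT = 16                 # block sizes are multiples of this
SMALL_REQUEST_THRESHOLD = 512  # larger requests bypass pymalloc
POOL_SIZE = 4 * 1024           # bytes per pool in this toy model


def size_class(nbytes: int):
    """Round a request up to its block size, or None if pymalloc would not serve it."""
    if nbytes <= 0 or nbytes > SMALL_REQUEST_THRESHOLD:
        return None
    return (nbytes + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT


def blocks_per_pool(block_size: int) -> int:
    """Upper bound on blocks of one size class in a pool (header ignored)."""
    return POOL_SIZE // block_size


if __name__ == "__main__":
    for request in (1, 48, 49, 512, 513):
        print(request, "->", size_class(request))
```

Under these assumptions a 49-byte request shares the 64-byte size class with a 64-byte request, which is why objects of slightly different sizes can end up in the same pool.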

Each pool keeps bookkeeping information: which size class it serves, how many blocks are currently used, and a pointer to a free list of blocks that have been freed and are ready to be reused. Arenas track which of their pools are still unused; if no pool of the requested size class has a free block and no arena has an unused pool left, pymalloc gets another arena from the OS and carves more pools out of it.
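CPython can print this bookkeeping for the running interpreter. The sketch below wraps `sys._debugmallocstats()`, a CPython-specific function that writes allocator statistics to stderr; the wrapper name is hypothetical, and the exact report layout varies by version and build.

```python
import sys


def dump_pymalloc_stats() -> bool:
    """Print allocator statistics to stderr; return False if unavailable."""
    # The leading underscore marks this as a CPython implementation detail,
    # so other interpreters may not provide it.
    if hasattr(sys, "_debugmallocstats"):
        sys._debugmallocstats()
        return True
    return False


if __name__ == "__main__":
    dump_pymalloc_stats()
```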

## Free lists

When an object like a `Tick` is freed (its reference count reaches zero), its block does not go back to the OS immediately; instead, it is pushed onto the free list of the pool it came from. The next time Python needs a block of the same size class, pymalloc first looks at a pool of that size class and can simply pop a block off its free list, which is extremely fast and cache-friendly.
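The push/pop behaviour described above can be modelled in a few lines. This is a toy simulation, not CPython's implementation: `ToyPool` and its attributes are hypothetical, blocks are represented by byte offsets instead of real memory, and a full pool raises an error where the real allocator would move on to another pool.

```python
class ToyPool:
    """Toy model of one pool serving a single size class."""

    def __init__(self, block_size: int, pool_size: int = 4096):
        self.block_size = block_size
        self.capacity = pool_size // block_size  # header overhead ignored
        self.next_unused = 0   # index of the next never-used block
        self.free_list = []    # offsets of freed blocks, most recent last
        self.used = 0          # number of blocks currently handed out

    def alloc(self) -> int:
        """Return the offset of a block, preferring recently freed ones."""
        if self.free_list:
            offset = self.free_list.pop()        # reuse the last freed block
        elif self.next_unused < self.capacity:
            offset = self.next_unused * self.block_size
            self.next_unused += 1
        else:
            raise MemoryError("pool is full")    # real pymalloc would use another pool
        self.used += 1
        return offset

    def free(self, offset: int) -> None:
        """Push the block onto the free list; nothing goes back to the OS."""
        self.free_list.append(offset)
        self.used -= 1


if __name__ == "__main__":
    pool = ToyPool(block_size=48)
    first = pool.alloc()
    second = pool.alloc()
    pool.free(first)
    print(pool.alloc() == first)  # the freed block is handed out again
```

Reusing the most recently freed block first is what makes the scheme cache-friendly: that memory was touched a moment ago and is likely still in cache.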


## Demo

```python
import gc
import logging
import sys

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
)

log = logging.getLogger(__name__)
```

```python
class Tick:
    def __init__(self, price: float, size: int, ts: float):
        self.price = price
        self.size = size
        self.ts = ts
```

First we show that Python variables are just references to heap objects, and that CPython tracks how many references each object has via a reference count. We create a `Tick` object, show that multiple variables point to the same heap object, and log the reference count before and after deleting one reference.

```python
def names_and_refcounts():
    log.info("=== Step 1: Names → objects → refcounts ===")

    t = Tick(100.5, 10, 0.0)
    log.info("Created Tick object %r", t)
    # In CPython, id() is the object's memory address
    log.info("Heap-ish address (id): %s", hex(id(t)))
    # getsizeof reports the instance itself, not the attribute values it refers to
    log.info("Approx heap size of Tick: %d bytes", sys.getsizeof(t))

    # Multiple names pointing to the same object
    a = t
    b = t
    same_id = (id(a) == id(b) == id(t))
    log.info("id(a) == id(b) == id(t)? %s", same_id)

    # getrefcount counts its own argument as one extra temporary reference
    rc_before = sys.getrefcount(t)
    log.info("Reference count for t (sys.getrefcount): %d", rc_before)

    # Drop one reference
    del a
    rc_after = sys.getrefcount(t)
    log.info("After del a, refcount(t): %d", rc_after)

if __name__ == "__main__":
    names_and_refcounts()
```


```
INFO    === Step 1: Names → objects → refcounts ===
INFO    Created Tick object <__main__.Tick object at 0x769e24529c10>
INFO    Heap-ish address (id): 0x769e24529c10
INFO    Approx heap size of Tick: 48 bytes
INFO    id(a) == id(b) == id(t)? True
INFO    Reference count for t (sys.getrefcount): 4
INFO    After del a, refcount(t): 3
```


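We now allocate many `Tick` objects to stress the allocator, then drop them and force a garbage collection, logging how many unreachable objects were collected. Because the `Tick` instances form no reference cycles, reference counting frees them as soon as the list is deleted; the number returned by `gc.collect()` covers only cyclic garbage found elsewhere in the process.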

```python
def many_objects_and_gc():
    log.info("=== Step 2: Many small objects & GC ===")

    def make_ticks(n: int):
        return [Tick(100.0 + i, i, float(i)) for i in range(n)]

    ticks = make_ticks(10_000)
    log.info("Created %d Tick objects", len(ticks))
    log.info("Example Tick size: %d bytes", sys.getsizeof(ticks[0]))

    # Drop the list: each Tick's refcount reaches zero and it is freed right away
    del ticks
    # gc.collect() returns only the number of unreachable objects found by the
    # cyclic collector, so the Tick objects do not contribute to this count
    collected = gc.collect()
    log.info("Forced GC, unreachable objects collected: %d", collected)

if __name__ == "__main__":
    many_objects_and_gc()
```
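To check that the memory is released by reference counting alone, without any call to `gc.collect()`, the standard `tracemalloc` module can measure traced bytes before and after dropping the list. This is a rough sketch: the function name is hypothetical, the `Tick` class is repeated so the block is self-contained, and the exact byte counts depend on the Python version.

```python
import tracemalloc


class Tick:
    def __init__(self, price: float, size: int, ts: float):
        self.price = price
        self.size = size
        self.ts = ts


def traced_bytes_for_ticks(n: int = 10_000):
    """Return (bytes held while the list is alive, bytes left after del)."""
    tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        ticks = [Tick(100.0 + i, i, float(i)) for i in range(n)]
        held, _ = tracemalloc.get_traced_memory()
        del ticks  # no gc.collect(): refcounting frees the objects
        released, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return held - baseline, released - baseline


if __name__ == "__main__":
    held, left = traced_bytes_for_ticks()
    print(f"held while alive: {held} bytes, left after del: {left} bytes")
```

The second number should be a small fraction of the first, which shows that the blocks were handed back to the allocator the moment the list went away.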
    
