\[<< [Basic Overview](./01_python_basic_overview.ipynb) | [Index](./00_index.ipynb) | [Function parameters and arguments](./03_function_parameters_and_arguments.ipynb) >>\]

# Memory Management in Python

By the end of this section you will understand:
- How Python objects are stored in memory.
- What reference counts are.
- How garbage collection works in Python.
- The difference between `is` and `==`.
- How Python optimizes memory.
- Best practices for memory management.


| C                                                                        | Python                                                                        |
| ------------------------------------------------------------------------ | ----------------------------------------------------------------------------- |
| ![](./static/01/C_Memory_Management.png)                                 | ![](./static/01/Python_Memory_Management.png)                                 |
| [Link to repl](https://replit.com/@lyndabaka/Memory-Management-in-C?v=1) | [Link to repl](https://replit.com/@lyndabaka/Memory-Management-in-Python?v=1) |

## Everything is in memory as an object

In Python, everything you work with, whether it's a number, a string, or even a function, is treated as an object. These objects are stored in memory for the program to access.

In CPython, these objects live in a private **heap**: a region of memory managed by Python's memory manager, which allocates space for new objects and frees it when they are no longer needed.

In [None]:
a_int = 10
a_str = "hello"
a_list = [1, 2, 3, 4]
a_dict = {"A": 1, "B": 2}


def a_func():
    pass


class A:
    pass


print(hex(id(a_int)))
print(hex(id(a_str)))
print(hex(id(a_list)))
print(hex(id(a_dict)))
print(hex(id(a_func)))
print(hex(id(A)))

In [None]:
print(isinstance(a_int, object))
print(isinstance(a_str, object))
print(isinstance(a_list, object))
print(isinstance(a_dict, object))
print(isinstance(a_func, object))
print(isinstance(A, object))

Because everything is an object, we can `assign` any of these values to a variable, `pass` them to a function, or `return` them from a function.
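
A small sketch of what this looks like for functions. The names `shout`, `apply`, and `make_greeter` are hypothetical and used only for illustration.

```python
def shout(text):
    return text.upper() + "!"


def apply(func, value):
    # a function passed in as an argument, then called
    return func(value)


def make_greeter(greeting):
    # a function created and returned from another function
    def greeter(name):
        return f"{greeting}, {name}"

    return greeter


alias = shout               # a function assigned to another name
print(alias("hi"))          # HI!
print(apply(shout, "hey"))  # HEY!
hello = make_greeter("Hello")
print(hello("Ada"))         # Hello, Ada
```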

## Finding memory address using `id()` function

If you want to know the memory address of an object, you can use the `id()` function. It returns an integer that is unique and constant for that object during its lifetime. In CPython, this integer is the object's memory address.

`id()` returns a plain integer, which prints in base 10. Passing it to `hex()` converts it to hexadecimal notation, the usual way memory addresses are written, which makes it easier to read and compare.

ref: [PyObject](https://docs.python.org/3/c-api/structures.html)
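
A quick sketch of the base-10 and hexadecimal forms described above. The variable `value` is just an example; the actual numbers differ on every run.

```python
value = [1, 2, 3]

address = id(value)
print(address)          # a plain base-10 integer
print(hex(address))     # the same number in hexadecimal notation
print(f"{address:#x}")  # an f-string format spec gives the same string as hex()

# Parsing the hex string back yields the original integer
print(int(hex(address), 16) == address)  # True
```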

In [None]:
a = 100

a_address = id(a)

print(a_address)
print(hex(a_address))

![](https://www.honeybadger.io/images/blog/posts/memory-management-in-python/var_as_ref_ex_a.png?1691803805)

<div style="text-align: center"><a href="https://www.honeybadger.io/blog/memory-management-in-python/">Source</a></div>

In CPython, every object consists of at least a `reference count`, a `type`, and a `value`.
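
From Python code we can peek at each of these parts. This is a minimal sketch; the exact reference count printed varies between Python versions and environments such as Jupyter.

```python
import sys

value = [10, 20]

print(type(value))             # type part: <class 'list'>
print(value)                   # value part: [10, 20]
print(sys.getrefcount(value))  # reference-count part (usually includes getrefcount's own temporary reference)
```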


In [None]:
b = a

b_address = id(b)
print(b_address)
print(hex(b_address))

![](https://www.honeybadger.io/images/blog/posts/memory-management-in-python/var_as_ref_ex_a_and_b.png?1691803805)

[Source](https://www.honeybadger.io/blog/memory-management-in-python/)

In [None]:
c = "hello"

c_address = id(c)
print(c_address)
print(hex(c_address))

In [None]:
# CPython only: ctypes.cast turns a raw address back into a reference to the Python object stored there
import ctypes

# *a_address
print(ctypes.cast(a_address, ctypes.py_object).value)

# *c_address
print(ctypes.cast(c_address, ctypes.py_object).value)

## Finding reference count using `sys.getrefcount(...)`

In Python, the memory manager keeps track of how many references point to each object in memory. This count is known as the **reference count** for that object.

In [None]:
my_var = 12324

To access the reference count of an object, you can use the `sys.getrefcount()` function. However, this function comes with a small artifact.

When you call `sys.getrefcount()`, the count it reports is typically 1 higher than you might expect. This happens because passing the object as an argument creates a temporary reference to it for the duration of the call.

In [None]:
import sys

In [None]:
hex(id(my_var))

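If you assign one variable to another (`my_var2 = my_var`), Python does not copy the object: both names point to the same object at the same memory address. Mutating that object through one name (for example, appending to a shared list) is therefore visible through the other, while rebinding one name to a new object leaves the other untouched. This is why Python's behavior is usually described as **pass by object reference** (or call by sharing) rather than classic pass by reference.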
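
A short sketch of mutation versus rebinding through shared references, both between two names and across a function call. The names `first`, `second`, and `add_item` are hypothetical.

```python
first = [1, 2, 3]
second = first          # no copy: both names refer to the same list
second.append(4)        # mutation is visible through both names
print(first)            # [1, 2, 3, 4]

second = [9, 9]         # rebinding points `second` at a new list
print(first)            # [1, 2, 3, 4] (unchanged)


def add_item(items):
    items.append(5)     # mutates the caller's list through the shared reference
    items = [0]         # rebinds only the local name; the caller is unaffected


add_item(first)
print(first)            # [1, 2, 3, 4, 5]
```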

In [None]:
# The counter starts at 1 when an object is created.
before = sys.getrefcount(my_var)

# It increments when a reference is created
my_var2 = my_var
after_reference_by_other = sys.getrefcount(my_var)

print(f"Starting reference: {before}")
print(f"After another variable reference: {after_reference_by_other}")

> Note that because passing an object to a function creates a new (temporary) reference to it, `sys.getrefcount(...)` returns one more than the actual number of references. The count can be even higher in environments such as Jupyter Notebook, which may hold extra references to objects.

In [None]:
print(id(my_var))
print(id(my_var2))
id(my_var) == id(my_var2)

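Another way to get the reference count without this artifact is to read it straight from memory with `ctypes.c_long.from_address(address).value`. This relies on a CPython implementation detail: in the default build, the reference count (`ob_refcnt`) is the first field of every object, and the object starts at the address returned by `id()`.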

In [None]:
# We can actually use ctypes to get the actual reference counter
address = id(my_var)
ctypes.c_long.from_address(address).value

In [None]:
# reference counter decrements when a reference is deleted.
del my_var2
ctypes.c_long.from_address(address).value

In [None]:
ctypes.cast(address, ctypes.py_object).value

In [None]:
# The counter reaching zero indicates no more references to the object.
del my_var

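# The reference count of 12324 has dropped to 0, so CPython deallocated the object immediately
# (reference counting does this, not the cyclic garbage collector). The memory may already be
# reused, so this read can return an arbitrary value. In Jupyter, the output history may still
# hold a reference to an object displayed in an earlier cell, keeping it alive.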
ctypes.c_long.from_address(address).value

In [None]:
# Doing this may crash the Python interpreter, as the address may no longer contain a valid Python object
# ctypes.cast(address, ctypes.py_object).value

- Objects with a reference count of zero are considered garbage.
- CPython reclaims the memory of such objects as soon as their count reaches zero.
- Reference counting is efficient and reclaims resources immediately, which keeps memory usage predictable.
- However, it cannot handle cyclic references (objects referencing each other), because their counts never reach zero.
- Python therefore also runs a cyclic garbage collector (the `gc` module) to find and free such cycles.
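
The points above can be observed with a weak reference, which lets us check whether an object is still alive without keeping it alive. This is a sketch; `Node` is a hypothetical helper class, and automatic collection is paused so that the result does not depend on when the collector happens to run.

```python
import gc
import weakref


class Node:
    """Hypothetical helper class; its instances support weak references."""


was_enabled = gc.isenabled()
gc.disable()  # pause automatic collections so the result does not depend on timing
try:
    # 1) No cycle: freed as soon as the last reference disappears
    node = Node()
    probe = weakref.ref(node)  # a weak reference does not increase the refcount
    del node
    freed_without_cycle = probe() is None

    # 2) Cycle: each object keeps the other's reference count above zero
    left, right = Node(), Node()
    left.other, right.other = right, left
    probe = weakref.ref(left)
    del left, right
    alive_in_cycle = probe() is not None

    gc.collect()  # the cyclic collector detects and frees the unreachable pair
    freed_after_collect = probe() is None
finally:
    if was_enabled:
        gc.enable()

print(freed_without_cycle, alive_in_cycle, freed_after_collect)  # True True True
```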

**Best PyCon talk on memory management**

[![](https://img.youtube.com/vi/F6u5rhUQ6dU/0.jpg)](https://youtu.be/F6u5rhUQ6dU)

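## Get size of object using `sys.getsizeof(...)`

`sys.getsizeof()` returns the size of an object in bytes. It counts only the object itself, not the objects it refers to.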

In [None]:
sys.getsizeof(1)

In [None]:
sys.getsizeof("Python")

In [None]:
sys.getsizeof([1, 2, 3, 4])

In [None]:
sys.getsizeof((1, 2, 3, 4))

In [None]:
sys.getsizeof({1, 2, 3, 4})

Now let's see how the size grows as we add one element at a time.

Let's start with `string`.

In [None]:
# On 64-bit CPython this is usually 49 bytes, but Jupyter Notebook may show 51 bytes (see the link below)
# https://stackoverflow.com/questions/53899931/why-does-an-empty-string-in-python-sometimes-take-up-49-bytes-and-sometimes-51

sys.getsizeof("")

In [None]:
sys.getsizeof("P")

In [None]:
sys.getsizeof("Py")

In [None]:
sys.getsizeof("Pyt")

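That is, on 64-bit CPython an empty string has an overhead of 49 bytes, and each additional ASCII character adds 1 byte. Strings containing non-ASCII characters use a larger header and 1, 2, or 4 bytes per character, depending on the widest character in the string.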
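
A sketch comparing the per-character cost for three kinds of characters. The specific characters (Greek capital omega and an emoji) are arbitrary examples, written as escapes to avoid source-encoding issues.

```python
import sys

# Per-character cost depends on the widest character in the string (PEP 393)
ascii_step = sys.getsizeof("aa") - sys.getsizeof("a")                      # ASCII
ucs2_step = sys.getsizeof("\u03a9\u03a9") - sys.getsizeof("\u03a9")        # Greek capital omega
ucs4_step = sys.getsizeof("\U0001F600" * 2) - sys.getsizeof("\U0001F600")  # emoji

print(ascii_step, ucs2_step, ucs4_step)  # 1 2 4 on CPython
```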

Let's check `list`

In [None]:
sys.getsizeof([])

In [None]:
sys.getsizeof([1])

In [None]:
sys.getsizeof([1, 2])

Seems straightforward, right? On 64-bit CPython an empty list has an overhead of 56 bytes, and each item adds 8 bytes (one pointer). On 32-bit builds such as WebAssembly, the figures are 28 and 4 bytes.

In [None]:
sys.getsizeof([1, 2])

In [None]:
sys.getsizeof([1, 2, 3, 4])

In [None]:
sys.getsizeof([1, 2, [1, 2, 3, 4]])

Why is the list containing another list not bigger? Because a list does not store its items, only references (pointers) to them, and `sys.getsizeof()` does not follow those references. In this case the list stores `[ref to 1, ref to 2, ref to [1, 2, 3, 4]]`.
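
To make this concrete, here is a sketch that replaces one item with a large string. The list's own size stays the same, because it still holds three references; the names `small` and `big` are illustrative.

```python
import sys

small = [1, 2, None]
big = [1, 2, None]
big[2] = "x" * 100_000  # swap one reference for a reference to a ~100 KB string

# Both lists hold exactly three references, so their own sizes are identical
print(sys.getsizeof(small) == sys.getsizeof(big))  # True
print(sys.getsizeof(big[2]))                       # the string itself is over 100,000 bytes
```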

There are snippets and tools that can be used to get the actual (deep) size of an object.
- [Compute Memory Footprint of an Object and Its Contents (Python recipe)](https://code.activestate.com/recipes/577504/)
- [memray](https://bloomberg.github.io/memray/index.html) - only works on Linux and macOS.
- [Pympler](https://pythonhosted.org/Pympler/#)
- [memory-profiler](https://github.com/pythonprofilers/memory_profiler)
- [Scalene](https://github.com/plasma-umass/scalene) - only works properly on Linux and macOS.
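
The core idea behind such tools is to follow references recursively. Below is a simplified sketch of that idea (not the recipe's code); `deep_getsizeof` is a hypothetical helper that only follows built-in containers and ignores attributes of arbitrary objects.

```python
import sys
from collections.abc import Mapping


def deep_getsizeof(obj, seen=None):
    """Approximate size of obj plus everything reachable through built-in containers."""
    if seen is None:
        seen = set()
    if id(obj) in seen:  # count shared objects (and cycles) only once
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, Mapping):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size


data = [1, 2, [1, 2, 3, 4]]
print(sys.getsizeof(data))   # shallow: only the outer list
print(deep_getsizeof(data))  # outer list + inner list + the int objects
```

Tracking ids in `seen` prevents double counting when the same object appears twice (here, the small ints `1` and `2`) and stops infinite recursion on self-referencing containers.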

In [None]:
%pip install pympler

In [None]:
from pympler import asizeof

asizeof.asizeof([1, 2, [1, 2, 3, 4]])

In [None]:
%pip install scalene

In [None]:
%%writefile example/concate_with_plus.py
def add_string_with_plus(iters):
    s = ""
    for i in range(iters):
        s += "abc"
    assert len(s) == 3*iters
    
add_string_with_plus(500000)

In [None]:
%%writefile example/concate_with_join.py
def add_string_with_join(iters):
    l = []
    for i in range(iters):
        l.append("abc")
    s = "".join(l)
    assert len(s) == 3*iters
    
add_string_with_join(500000)

In [None]:
!scalene example/concate_with_plus.py

In [None]:
!scalene example/concate_with_join.py

The output is easier to read when you run Scalene from a terminal:
![](./static/scalene_example.png)

In [None]:
%pip install memory-profiler 

In [2]:
%%writefile example/memory_profile.py
@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

my_func()

Overwriting example/memory_profile.py


In [4]:
!python -m memory_profiler example/memory_profile.py

Filename: example/memory_profile.py

Line #    Mem usage    Increment  Occurrences   Line Contents
     1   37.766 MiB   37.766 MiB           1   @profile
     2                                         def my_func():
     3   45.410 MiB    7.645 MiB           1       a = [1] * (10 ** 6)
     4  198.000 MiB  152.590 MiB           1       b = [2] * (2 * 10 ** 7)
     5   45.414 MiB -152.586 MiB           1       del b
     6   45.414 MiB    0.000 MiB           1       return a




Later, we can also check how much memory a small Python program uses compared to other languages.

## Garbage collection Gotcha!

**NOTE**: This may seem a bit advanced. Feel free to skip it at the intermediate level and revisit it once the other topics are covered.

- This section covers some gotchas where objects are not automatically freed, a situation often called a `memory leak`.
- Some of these objects can be freed manually by running `gc.collect()`.
- Others cannot, because of the way they are referenced.
- We will also see the best-known method (BKM) of keeping weak references, which do not keep an object alive and are cleared as soon as the object is destroyed.

In [None]:
class C:
    def __init__(self, x):
        self.x = x

    def __del__(self):
        print(f"Deleting object: {self}")

In [None]:
c = C(10)

In [None]:
c = None

#### Circular reference

In [None]:
a = [1, 2]
a.append(a)

print(a)

In [None]:
a[2][2][2][2][1]

In [None]:
c = C(10)
c.cr = c

In [None]:
c.cr.cr.cr.cr.x

In [None]:
c = None

In [None]:
import gc

gc.collect()
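
Instead of relying on `gc.collect()`, a cycle can also be broken by hand, after which reference counting alone frees the object. A sketch, assuming a hypothetical `Node` class; the cyclic collector is paused to show that it is not involved.

```python
import gc
import weakref


class Node:
    """Hypothetical helper class used only for this example."""


was_enabled = gc.isenabled()
gc.disable()  # only reference counting runs while the collector is off
try:
    n = Node()
    n.self_ref = n           # self-reference creates a cycle
    probe = weakref.ref(n)   # lets us check whether the object is still alive
    n = None
    alive_with_cycle = probe() is not None  # the cycle keeps it alive

    n = probe()              # grab a strong reference again
    n.self_ref = None        # break the cycle by hand
    n = None                 # the count now drops to zero
    alive_after_break = probe() is not None
finally:
    if was_enabled:
        gc.enable()

print(alive_with_cycle, alive_after_break)  # True False
```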

#### `weakref`: Keeping a reference to an object without keeping it alive (no manual garbage collection needed)

In [None]:
cache = set()

In [None]:
c1 = C(10)
c2 = C(11)

In [None]:
print(c1)
print(c2)

In [None]:
cache.add(c1)
cache.add(c2)

In [None]:
c1 = None
c2 = None

In [None]:
cache

In [None]:
gc.collect()

In [None]:
c1 = C(10)
c2 = C(11)

In [None]:
import weakref

cache = set()
cache.add(weakref.ref(c1))
cache.add(weakref.ref(c2))

In [None]:
c1 = None
c2 = None

In [None]:
cache

In [None]:
cache.pop()() is None

#### A better way: `weakref.WeakKeyDictionary`

In [None]:
c1 = C(10)
c2 = C(11)

In [None]:
cache = weakref.WeakKeyDictionary()
cache[c1] = hash(c1)
cache[c2] = hash(c2)

In [None]:
print(list(cache.items()))

In [None]:
c1 = None
c2 = None

In [None]:
print(list(cache.items()))
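
When the cache only needs membership (like the `set` used earlier) rather than key-value pairs, `weakref.WeakSet` offers the same automatic cleanup. A sketch with a hypothetical `Item` class:

```python
import weakref


class Item:
    """Hypothetical class used only for this example."""

    def __init__(self, x):
        self.x = x


cache = weakref.WeakSet()
i1, i2 = Item(10), Item(11)
cache.add(i1)
cache.add(i2)
print(len(cache))  # 2

i1 = None          # drop the last strong reference to the first item
print(len(cache))  # 1: the entry disappeared without calling gc.collect()
```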

#### `@lru_cache` on methods can also cause memory leaks!

- ref: [issue19859 in python bugs](https://bugs.python.org/issue19859)
- ref: [How do I cache method calls](https://docs.python.org/3/faq/programming.html#how-do-i-cache-method-calls)

In [None]:
import time


class Calculator:
    def __init__(self, num):
        self.num = num

    def calculate(self, power):
        print("doing heavy work...")
        time.sleep(0.5)  # simulate the compute delay
        return self.num**power

    def __del__(self):
        print(f"Deleting object: {self}")

In [None]:
c = Calculator(20)
print(c.calculate(3))
print(c.calculate(3))
print(c.calculate(3))
print(c.calculate(3))

In [None]:
c = None

In [None]:
import time
from functools import lru_cache


class Calculator:
    def __init__(self, num):
        self.num = num

    @lru_cache(maxsize=20)
    def calculate(self, power):
        print("doing heavy work...")
        time.sleep(0.5)  # simulate the compute delay
        return self.num**power

    def __del__(self):
        print(f"Deleting object: {self}")

In [None]:
c = Calculator(20)
print(c.calculate(3))
print(c.calculate(3))
print(c.calculate(3))
print(c.calculate(3))

In [None]:
c = None

In [None]:
Calculator.calculate.cache_info()

In [None]:
import gc

gc.collect()

In [None]:
Calculator.calculate.cache_clear()
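
One way to avoid the leak is to keep the cache on the instance itself, so it is freed together with the object. This is a sketch, not the only fix (the FAQ linked above also discusses `functools.cached_property` for methods without arguments); the `_cache` attribute is a hypothetical name, and unlike `lru_cache(maxsize=20)` this dictionary has no size limit.

```python
import time
import weakref


class Calculator:
    def __init__(self, num):
        self.num = num
        self._cache = {}  # per-instance cache: it lives and dies with the instance

    def calculate(self, power):
        if power not in self._cache:
            print("doing heavy work...")
            time.sleep(0.5)  # simulate the compute delay
            self._cache[power] = self.num**power
        return self._cache[power]

    def __del__(self):
        print(f"Deleting object: {self}")


c = Calculator(20)
print(c.calculate(3))  # computed once
print(c.calculate(3))  # served from c._cache

probe = weakref.ref(c)
c = None                # no class-level cache holds c, so it is freed right away
print(probe() is None)  # True
```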

## Python is dynamically typed

1. **Runtime Variable Type Determination:** Python is dynamically typed, meaning variable types are determined during runtime, not compile time.
2. **Implicit Type Declaration:** Variables don't require explicit type declarations; a variable is just a name that refers to an object, and the object carries its type.
3. **Variable Type Flexibility:** Variables can hold different data types at different points in the program.
4. **Runtime Reassignment:** Variables can be reassigned to various types during program execution.
5. **Code Conciseness and Flexibility:** Allows concise and adaptable code by using the same variable for different data types.

In [None]:
my_var = 10
print(hex(id(my_var)))
print(type(my_var))

In [None]:
my_var = "Test string"
print(hex(id(my_var)))
print(type(my_var))

## Mutability

Mutation in Python refers to the process of changing the value of an object while keeping the same address in memory. In other words, when you mutate an object, you modify its internal state without creating a completely new object.

### Mutation vs Rebinding

This distinction is not specific to Python. ref: [Wikipedia post](https://en.wikipedia.org/wiki/Name_binding#Rebinding_and_mutation)

In [None]:
# Mutable objects provide a way to mutate them (change the internal state)
a_list = [10, 20, 30]

print(hex(id(a_list)))

a_list.append(40)

print(hex(id(a_list)))

In [None]:
# Note that `+` creates a new object in memory
a_list2 = [10, 20, 30]

print(hex(id(a_list2)))

a_list2 = a_list2 + [40]

# Note how Python created a new object here: it evaluates the right-hand side
# into a new list, then rebinds the name a_list2 to it
print(hex(id(a_list2)))
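
For lists, the augmented assignment `+=` behaves differently from `+`: it mutates the existing list in place (via `list.__iadd__`). A sketch; `a_list3` and `original` are illustrative names.

```python
a_list3 = [10, 20, 30]
original = a_list3

a_list3 += [40]             # list.__iadd__ extends the existing list in place
print(a_list3 is original)  # True: same object, mutated

a_list3 = a_list3 + [50]    # + builds a new list, then the name is rebound to it
print(a_list3 is original)  # False: a different object
print(original)             # [10, 20, 30, 40]
```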

In [None]:
a_tuple = ([10, 20, 30], 50, 60)

a_tuple[0].append(40)
print(a_tuple)

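> Note that most built-in containers (`list`, `dict`, `set`, `bytearray`) and instances of user-defined classes are mutable by default. Common immutable built-in types include `int`, `float`, `complex`, `bool` (`True`, `False`), `str`, `tuple`, `frozenset`, `bytes`, and `NoneType` (`None`). Immutability is shallow: as the example above shows, a tuple can still contain mutable objects.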
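
A sketch of what immutability means for `str`: the object cannot be changed in place, and operations that look like changes return a new object instead.

```python
text = "python"

try:
    text[0] = "P"  # str objects do not support item assignment
    mutated = True
except TypeError as exc:
    mutated = False
    print("TypeError:", exc)

upper = text.upper()  # string methods return a new object instead of changing text
print(text, upper)    # python PYTHON
print(text is upper)  # False
```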

## Variable Equality

- `a == b` checks whether the value of `a` is equal to the value of `b` (it calls `a.__eq__(b)`).
- `a is b` checks whether `a` and `b` refer to the same object, i.e. whether `id(a) == id(b)`.
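
Since `==` is defined by the object's `__eq__` method, a class can make `==` return anything, while `is` always compares identity. A sketch with a hypothetical `AlwaysEqual` class:

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


x = AlwaysEqual()
y = AlwaysEqual()

print(x == y)  # True: == calls x.__eq__(y)
print(x is y)  # False: two distinct objects
print(x is x)  # True
```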

In [None]:
a_int = 10
b_int = 10

print(a_int == b_int)
print(a_int is b_int)  # id(a_int) == id(b_int)

In [None]:
a_int = 999
b_int = 999

print(a_int == b_int)
print(a_int is b_int)  # id(a_int) == id(b_int)

In [None]:
a_str = "python"
b_str = "python"

print(a_str == b_str)
print(a_str is b_str)

In [None]:
a_str = "python course"
b_str = "python course"

print(a_str == b_str)
print(a_str is b_str)

We will see why this happens when we discuss Python's memory optimizations.

In [None]:
a_list = [1, 2, 3, 4]
b_list = [1, 2, 3, 4]

print(a_list == b_list)
print(a_list is b_list)

In [None]:
a_int = 10
a_float = 10.0

print(a_int == a_float)
print(a_int is a_float)

In [None]:
a_obj = None
b_obj = None

print(a_obj == None)
print(a_obj is None)
print(a_obj == b_obj)
print(a_obj is b_obj)

## Memory optimization

#### Interning

[From Wikipedia](https://en.wikipedia.org/wiki/Interning_(computer_science))
> In computer science, interning is re-using objects of equal value on-demand instead of creating new objects. This creational pattern is frequently used for numbers and strings in different programming languages.

##### Number Interning:
- CPython interns (caches) small integers in the range [-5, 256]. Any variable referencing an integer within this range points to the same object. For example, with `x = 5` and `y = 5`, `x is y` evaluates to `True`.
- Numbers outside this range, or created at runtime, are not interned. For example, `x = 1000` and `y = 1000` entered as separate statements in the REPL give `x is y` as `False`. Within a single cell or script, however, the compiler may reuse one constant for equal literals, so `is` can still return `True`.
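
To observe the cache without the compiler sharing constants, the integers can be computed at runtime. A CPython-specific sketch; `make_int` is a hypothetical helper.

```python
def make_int(base, offset):
    # the sum is computed at runtime, so the compiler cannot share a constant
    return base + offset


small_a = make_int(200, 56)  # 256: inside the cached range [-5, 256]
small_b = make_int(200, 56)
large_a = make_int(200, 57)  # 257: just outside the cached range
large_b = make_int(200, 57)

print(small_a is small_b)  # True on CPython: both come from the small-int cache
print(large_a is large_b)  # False on CPython: two separate int objects
print(large_a == large_b)  # True: the values are still equal
```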

##### String Interning:
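- String literals that look like valid identifiers (only letters, digits, and underscores) are interned by default in CPython.
- Strings created at runtime (i.e., not string literals) are typically not interned. This includes strings obtained through concatenation or string formatting.
- You can manually intern a string using `sys.intern()`.
- Equal string literals compiled together (for example, in the same cell) may share a single constant object even when they are not interned.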
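
A sketch of runtime-created strings and `sys.intern()`. The strings are built with `str.join` so that the compiler cannot share a single constant between them.

```python
import sys

# Build equal strings at runtime so the compiler cannot share a single constant
a_runtime = "".join(["python", "!"])
b_runtime = "".join(["python", "!"])
print(a_runtime == b_runtime)  # True
print(a_runtime is b_runtime)  # False on CPython: two separate objects

a_interned = sys.intern(a_runtime)
b_interned = sys.intern(b_runtime)
print(a_interned is b_interned)  # True: both names now share one interned object
```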

In [None]:
a_string = "python"
b_string = "python"

print(a_string == b_string)
print(a_string is b_string)

In [None]:
a_string = "python!"
b_string = "python!"

print(a_string == b_string)
print(a_string is b_string)

In [None]:
a_string = "_this_string_will_be_intern_since_it_is_valid_identifier"
b_string = "_this_string_will_be_intern_since_it_is_valid_identifier"

print(a_string == b_string)
print(a_string is b_string)

In [None]:
a_string = "Generally this is not interned, but we can intern it using sys.intern"
b_string = "Generally this is not interned, but we can intern it using sys.intern"

print(a_string == b_string)
print(a_string is b_string)

a_string = sys.intern(
    "Generally this is not interned, but we can intern it using sys.intern"
)
b_string = sys.intern(
    "Generally this is not interned, but we can intern it using sys.intern"
)

print(a_string == b_string)
print(a_string is b_string)

#### Peephole

[From Wikipedia](https://en.wikipedia.org/wiki/Peephole_optimization)

> Peephole optimization is an optimization technique performed on a small set of compiler-generated instructions; the small set is known as the peephole or window.

In [None]:
c = compile("24 * 60", "<string>", "eval")
print(c.co_consts)

In [None]:
c = compile("(1, 2) * 5", "<string>", "eval")
print(c.co_consts)

In [None]:
c = compile('"xyz" * 4', "<string>", "eval")
print(c.co_consts)

In [None]:
def a_func():
    a_int = 24 * 60
    # constant folding only applies to short results (the size limit depends on the CPython version)
    a_tuple = (1, 2) * 5
    a_string = "xyz" * 4


a_func.__code__.co_consts

## Other practices

Use `del` to delete a reference when an object is no longer needed; the object is freed once no references to it remain.

Use `__slots__` to save memory.

In [None]:
import sys


class C:
    def __init__(self, x, y):
        self.x = x
        self.y = y


c = C(1, "Hello World")

In [None]:
"__dict__" in dir(c)

In [None]:
c.__dict__

In [None]:
help(sys.getsizeof)

In [None]:
sys.getsizeof(c) + sys.getsizeof(c.__dict__)

In [None]:
class C:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


c = C(1, "Hello World")

In [None]:
"__dict__" in dir(c)

In [None]:
sys.getsizeof(c)

**Question before we proceed to the next section**: What is the output of the following code?

In [None]:
a_list = [[]] * 3
a_list[0].append(1)
a_list[1].append(2)
a_list[2].append(3)

print(a_list)

Have a look at the [official docs](https://docs.python.org/3/library/stdtypes.html#common-sequence-operations) to learn how to handle this kind of issue.
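
The linked docs recommend creating the inner lists with a list comprehension, so that each position gets its own list object instead of three references to the same one:

```python
a_list = [[] for _ in range(3)]  # three independent inner lists
a_list[0].append(1)
a_list[1].append(2)
a_list[2].append(3)

print(a_list)  # [[1], [2], [3]]
```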

\[<< [Basic Overview](./01_python_basic_overview.ipynb) | [Index](./00_index.ipynb) | [Function parameters and arguments](./03_function_parameters_and_arguments.ipynb) >>\]