# Memory Management in Python

By the end of this section you will understand:
- How Python objects are stored in memory.
- What reference counts are.
- How garbage collection works in Python.
- The difference between `is` and `==`.
- The memory optimizations Python performs.
- Best practices for memory management.


| C                                                                        | Python                                                                        |
| ------------------------------------------------------------------------ | ----------------------------------------------------------------------------- |
| ![](./static/01/C_Memory_Management.png)                                 | ![](./static/01/Python_Memory_Management.png)                                 |
| [Link to repl](https://replit.com/@lyndabaka/Memory-Management-in-C?v=1) | [Link to repl](https://replit.com/@lyndabaka/Memory-Management-in-Python?v=1) |

### Everything is in memory as an object

In Python, everything you work with, whether it's a number, a string, or even a function, is treated as an object. These objects are stored in memory for the program to access.

These objects live in a region of memory called the **heap**. Python manages its own private heap and keeps track of every object stored there.

In [None]:
a_int = 10
a_str = "hello"
a_list = [1, 2, 3, 4]
a_dict = {"A":1, "B": 2}

def a_func():
    pass

class A:
    pass

print(hex(id(a_int)))
print(hex(id(a_str)))
print(hex(id(a_list)))
print(hex(id(a_dict)))
print(hex(id(a_func)))
print(hex(id(A)))

In [None]:
print(isinstance(a_int, object))
print(isinstance(a_str, object))
print(isinstance(a_list, object))
print(isinstance(a_dict, object))
print(isinstance(a_func, object))
print(isinstance(A, object))

Because they are all objects, we can assign any of them to a variable, pass them to a function, or return them from a function.

### id() function

To find out where an object lives in memory, use the `id()` function. It returns an integer that is unique to the object for as long as the object is alive; in CPython that integer is the object's memory address.

`id()` returns the address as a base-10 integer. Passing it to `hex()` converts it to hexadecimal notation, which is the usual way to write memory addresses.

In [None]:
a = 5

a_address = id(a)

print(a_address)
print(hex(a_address))

In [None]:
b = "hello"

b_address = id(b)
print(b_address)
print(hex(b_address))

In [None]:
# ctypes.cast can turn a raw address back into a reference to the Python object stored there (CPython only)
import ctypes

# *a_address
print(ctypes.cast(a_address, ctypes.py_object).value)

# *b_address
print(ctypes.cast(b_address, ctypes.py_object).value)

### sys.getrefcount(...)

In Python, the memory manager keeps track of how many references point to each object in memory. This count is known as the **reference count** for that object.

In [None]:
my_var = 12324

To read the reference count of an object, use the `sys.getrefcount()` function. There is one small artifact to be aware of.

Passing the object to `sys.getrefcount()` creates a temporary reference to it for the duration of the call, so the reported count is usually one higher than the number of references you created yourself.

In [None]:
import sys


# The counter starts at 1 when an object is created.
before = sys.getrefcount(my_var)

# It increments when a reference is created
my_var2 = my_var
after_reference_by_other = sys.getrefcount(my_var)

In [None]:
hex(id(my_var))

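If you assign one variable to another, both names refer to the same object at the same memory address; no copy is made. If that object is mutable, a change made through one name is visible through the other. Rebinding one of the names to a new object does not affect the other name. Function arguments work the same way, which is why Python's behavior is often described as **pass by object reference**.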
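
The difference between mutating a shared object and rebinding a name can be illustrated with a short sketch. The variable names `first` and `second` are chosen only for this example.

```python
# Two names bound to the same list object.
first = [1, 2, 3]
second = first

# Mutating through one name is visible through the other,
# because there is only one list.
second.append(4)
print(first)            # [1, 2, 3, 4]
print(first is second)  # True

# Rebinding a name points it at a different object and
# leaves the original object (and the other name) untouched.
second = [0]
print(first)            # [1, 2, 3, 4]
print(first is second)  # False
```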

In [None]:
# The counter starts at 1 when an object is created.
before = sys.getrefcount(my_var)

# It increments when a reference is created
my_var2 = my_var
after_reference_by_other = sys.getrefcount(my_var)

print(f"Starting reference: {before}")
print(f"After another variable reference: {after_reference_by_other}")

> Note: because the argument passed to `sys.getrefcount(...)` is itself a reference, the function reports one more than the number of references you created. The count can be higher still in other environments, such as a Jupyter notebook, which keep their own references to objects.
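
Because the absolute number depends on the environment, it is often more reliable to compare counts before and after an operation. The following is a minimal sketch under that assumption; `Payload` is a hypothetical placeholder class, not part of any library.

```python
import sys

class Payload:
    """Hypothetical placeholder class used only for this demo."""

obj = Payload()
baseline = sys.getrefcount(obj)

# Each extra reference, whether a name or a container slot, adds one.
holders = [obj, obj]
with_holders = sys.getrefcount(obj)

# Dropping the container releases both references again.
del holders
after_del = sys.getrefcount(obj)

print(with_holders - baseline)  # 2
print(after_del - baseline)     # 0
```

Comparing differences sidesteps the extra temporary reference, since it is present in every measurement.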

In [None]:
print(id(my_var))
print(id(my_var2))
id(my_var) == id(my_var2)

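Another way to get the reference count, without the artifact, is to read it straight from memory with the expression `ctypes.c_long.from_address(var_address).value`. This relies on standard CPython builds storing the reference count at the start of every object, and it requires the object's address.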

In [None]:
# Read the reference count directly from the object's address with ctypes
address = id(my_var)
ctypes.c_long.from_address(address).value

In [None]:
# reference counter decrements when a reference is deleted.
del my_var2
ctypes.c_long.from_address(address).value

In [None]:
ctypes.cast(address, ctypes.py_object).value

In [None]:
# The counter reaching zero indicates no more references to the object.
del my_var

# This may now print an arbitrary value.
# If no other references remain, the count for 12324 has dropped to 0 and CPython has freed the object, so the address may no longer hold it.
ctypes.c_long.from_address(address).value

In [None]:
# Doing this may crash the Python interpreter, as the address no longer contains a Python object
# ctypes.cast(address, ctypes.py_object).value

- An object whose reference count drops to zero is considered garbage, and its memory is reclaimed.
- Reference counting is efficient and reclaims resources as soon as they are no longer used.
- However, it cannot handle cyclic references (objects referencing each other), because their counts never reach zero.
- Python uses an additional garbage collection mechanism, exposed through the `gc` module, to find and free cyclic references.
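
The cyclic case can be sketched as follows. This is an illustration on CPython, not a description of the collector's internals; `Node` and `make_cycle` are hypothetical names introduced for the example, and a weak reference is used to observe the object without keeping it alive.

```python
import gc
import weakref

class Node:
    """Hypothetical class used only to build a reference cycle."""
    def __init__(self):
        self.partner = None

def make_cycle():
    a, b = Node(), Node()
    a.partner, b.partner = b, a   # a -> b -> a
    return weakref.ref(a)         # observe a without keeping it alive

was_enabled = gc.isenabled()
gc.disable()                      # keep the automatic collector out of the demo
try:
    probe = make_cycle()
    alive_before = probe() is not None   # the cycle keeps both counts above zero
    gc.collect()                         # run the cycle detector by hand
    alive_after = probe() is not None
finally:
    if was_enabled:
        gc.enable()

print(alive_before)  # True
print(alive_after)   # False
```

Once `make_cycle` returns, nothing outside the cycle refers to the two nodes, yet reference counting alone never frees them; only the explicit `gc.collect()` does.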

PyCon talk on memory management:

[![](https://img.youtube.com/vi/F6u5rhUQ6dU/0.jpg)](https://youtu.be/F6u5rhUQ6dU)

### Python is dynamically typed

In [None]:
my_var = 10
print(hex(id(my_var)))
print(type(my_var))

In [None]:
my_var = "Test string"
print(hex(id(my_var)))
print(type(my_var))

### Mutability

Mutation in Python refers to the process of changing the value of an object while keeping the same address in memory. In other words, when you mutate an object, you modify its internal state without creating a completely new object.

In [None]:
# Mutable objects provide a way to mutate them (change the internal state)
a_list = [10, 20, 30]

print(hex(id(a_list)))

a_list.append(40)

print(hex(id(a_list)))

In [None]:
# Note that this creates a new object in memory
a_list2 = [10, 20, 30]

print(hex(id(a_list2)))

a_list2 = a_list2 + [40]

# Note how Python created a new object here:
# it evaluates the right-hand side first and then rebinds the name a_list2 to the result
print(hex(id(a_list2)))

In [None]:
a_tuple = ([10, 20, 30], 50, 60)

a_tuple[0].append(40)
print(a_tuple)

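> Note that in Python almost every user-defined data structure is mutable. The built-in immutable types include `int`, `float`, `complex`, `bool`, `str`, `tuple`, `frozenset`, and `bytes`; the singleton `None` is immutable as well. As the tuple example above shows, an immutable container can still hold mutable objects.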
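
One practical consequence is that the same operator can behave differently on mutable and immutable types. The sketch below compares `+=` on a list and on a tuple; the variable names are chosen only for this example.

```python
numbers = [10, 20, 30]
same_list = numbers
numbers += [40]               # lists extend themselves in place
print(numbers is same_list)   # True
print(same_list)              # [10, 20, 30, 40]

point = (10, 20, 30)
old_point = point
point += (40,)                # tuples are immutable: a new tuple is built
print(point is old_point)     # False
print(old_point)              # (10, 20, 30)
```

Keeping a second name for the original object makes the comparison reliable, since both objects are alive when `is` is evaluated.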

### Variable Equality

- `a == b` checks whether the value of `a` is equal to the value of `b`.
- `a is b` checks whether `a` and `b` are the same object, that is, whether `id(a) == id(b)`.
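
Since `==` is implemented by the object's own `__eq__` method, a class can make it return anything, while `is` cannot be overridden. The sketch below uses a hypothetical class, `AlwaysEqual`, to show why identity checks such as `x is None` are usually preferred for singletons.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with anything."""
    def __eq__(self, other):
        return True

thing = AlwaysEqual()
print(thing == None)   # True, because __eq__ says so
print(thing is None)   # False, they are different objects
```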

In [None]:
a_int = 10
b_int = 10

print(a_int == b_int)
print(a_int is b_int) # id(a) == id(b)

In [None]:
a_int = 999
b_int = 999

print(a_int == b_int)
print(a_int is b_int) # id(a) == id(b)

In [None]:
a_str = "python"
b_str = "python"

print(a_str == b_str)
print(a_str is b_str)

In [None]:
a_str = "python course"
b_str = "python course"

print(a_str == b_str)
print(a_str is b_str)

We will see why `is` can give different results for equal values when we discuss Python's memory optimizations below.

In [None]:
a_list = [1, 2, 3, 4]
b_list = [1, 2, 3, 4]

print(a_list == b_list)
print(a_list is b_list)

In [None]:
a_int = 10
a_float = 10.0

print(a_int == a_float)
print(a_int is a_float)

In [None]:
a_obj = None
b_obj = None

print(a_obj == None)
print(a_obj is None)
print(a_obj == b_obj)
print(a_obj is b_obj)

### Memory optimization

#### Interning

[From Wikipedia](https://en.wikipedia.org/wiki/Interning_(computer_science))
> In computer science, interning is re-using objects of equal value on-demand instead of creating new objects. This creational pattern is frequently used for numbers and strings in different programming languages.

##### Number Interning:
- CPython caches small integers in the range [-5, 256]. Any variable referencing an integer within this range points to the same object. For example, after `x = 5` and `y = 5`, `x is y` evaluates to `True`.
- Integers outside this range, or those created dynamically, are not cached, so two equal values can be distinct objects. For example, `x = 1000` and `y = 1000` entered as separate statements in the interactive interpreter typically make `x is y` evaluate to `False`. Within a single compiled block the compiler may reuse one constant, so never rely on `is` to compare numbers.
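
The cache boundary can be observed by creating integers at runtime, so that the compiler has no chance to merge equal constants. This is a sketch of CPython behavior, which is an implementation detail and not a language guarantee; `parse` is a hypothetical helper.

```python
def parse(text):
    """Build an int at runtime so the compiler cannot merge constants."""
    return int(text)

small_a, small_b = parse("256"), parse("256")
large_a, large_b = parse("257"), parse("257")

print(small_a is small_b)  # True on CPython: 256 comes from the cache
print(large_a is large_b)  # False on CPython: two separate objects
print(large_a == large_b)  # True
```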

##### String Interning:
- String literals that are valid identifiers are interned by default in CPython.
- Strings created at runtime (that is, not string literals) are typically not interned. This includes strings produced by concatenation or string formatting.
- You can intern a string manually using `sys.intern`.
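
The runtime case from the list above can be sketched as follows. The string is assembled with `str.join` so that it is built while the program runs; the variable names are chosen only for this example.

```python
import sys

literal = "python"
built = "".join(["py", "thon"])   # assembled at runtime

print(built == literal)   # True: same characters
print(built is literal)   # False: a separate string object
print(sys.intern(built) is sys.intern(literal))  # True: one shared object
```

`sys.intern` returns the canonical copy of a string, so two equal strings passed through it always yield the same object.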

In [None]:
a_string = "python"
b_string = "python"

print(a_string == b_string)
print(a_string is b_string)

In [None]:
a_string = "python!"
b_string = "python!"

print(a_string == b_string)
print(a_string is b_string)

In [None]:
a_string = "_this_string_will_be_intern_since_it_is_valid_identifier"
b_string = "_this_string_will_be_intern_since_it_is_valid_identifier"

print(a_string == b_string)
print(a_string is b_string)

In [None]:
a_string = "Generally this is not interned, but we can intern it using sys.intern"
b_string = "Generally this is not interned, but we can intern it using sys.intern"

print(a_string == b_string)
print(a_string is b_string)

a_string = sys.intern("Generally this is not interned, but we can intern it using sys.intern")
b_string = sys.intern("Generally this is not interned, but we can intern it using sys.intern")

print(a_string == b_string)
print(a_string is b_string)

#### Peephole

[From Wikipedia](https://en.wikipedia.org/wiki/Peephole_optimization)

> Peephole optimization is an optimization technique performed on a small set of compiler-generated instructions; the small set is known as the peephole or window.

In [None]:
c = compile("24 * 60", '<string>', 'eval')
print(c.co_consts)

In [None]:
c = compile("(1, 2) * 5", '<string>', 'eval')
print(c.co_consts)

In [None]:
c = compile("\"xyz\" * 4", '<string>', 'eval')
print(c.co_consts)

In [None]:
def a_func():
    a_int = 24 * 60
    # short length sequence <= 20
    a_tuple = (1, 2) * 5  
    a_string = "xyz" * 4

a_func.__code__.co_consts
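
Folding applies only when every operand is a constant known at compile time. The sketch below contrasts a constant expression with one that involves a variable; `minutes` is a hypothetical name that is never defined, because the code is compiled and not executed.

```python
folded = compile("24 * 60", "<string>", "eval")
not_folded = compile("minutes * 60", "<string>", "eval")

print(1440 in folded.co_consts)      # True: the product was computed at compile time
print(1440 in not_folded.co_consts)  # False: the multiplication happens at run time
print(not_folded.co_names)           # ('minutes',)
```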

### Best practice

Use `del` to delete a reference when an object is no longer needed.
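
Note that `del` removes a name, not the object itself; the memory is released only once the last reference is gone. A minimal sketch on CPython, with `Buffer` as a hypothetical stand-in for a large object:

```python
import weakref

class Buffer:
    """Hypothetical stand-in for a large object."""
    def __init__(self, size):
        self.data = bytearray(size)

buf = Buffer(1_000_000)
probe = weakref.ref(buf)    # a weak reference does not keep buf alive

print(probe() is not None)  # True: the object still exists
del buf                     # removes the only strong reference
print(probe() is None)      # True on CPython: the object was freed at once
```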

Use `__slots__` to save memory.

In [None]:
import sys


class C:
    def __init__(self, x, y):
        self.x = x
        self.y = y

c = C(1, "Hello World")

In [None]:
"__dict__" in dir(c)

In [None]:
c.__dict__

In [None]:
help(sys.getsizeof)

In [None]:
sys.getsizeof(c) + sys.getsizeof(c.__dict__)

In [None]:
class C:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x = x
        self.y = y

c = C(1, "Hello World")

In [None]:
"__dict__" in dir(c)

In [None]:
sys.getsizeof(c)
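
The saving comes with a trade-off: without a per-instance `__dict__`, only the attributes named in `__slots__` can be set. A short sketch, using a hypothetical `Point` class:

```python
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
p.x = 10                       # a declared slot can be reassigned

try:
    p.z = 3                    # z is not declared in __slots__
except AttributeError as exc:
    print(type(exc).__name__)  # AttributeError

print(hasattr(p, "__dict__"))  # False
```

This makes `__slots__` a good fit for classes with a fixed set of attributes and many instances, and a poor fit for classes that need attributes added dynamically.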