### Variables are just Memory References

Objects are stored on the heap at different memory addresses.

e.g. `my_var = 10`

my_var --> Reference (0x100) --> Memory Address (0x100) --> 10

In [1]:
# 'my_var' just references the memory address which holds the actual value.
my_var = 10

In [2]:
# You can find the memory address referenced by a variable with the
# id() function, which (in CPython) returns the address as a base-10 integer.
id(my_var)

139947211917904

In [3]:
# You can convert this to hexadecimal by using the `hex()` function.
hex(id(my_var))

'0x7f47ffda8a50'

__Reference Counting__

The Python Memory Manager keeps track of the number of references to a particular memory address.

In [4]:
# We can have 2 references to the same memory address.
my_var = 10
other_var = my_var

my_var ----->

            0x100 --> 10
               
other_var -->

When all references to an address are deleted or go out of scope,
the Python Memory Manager allows that address to be re-allocated, essentially deleting the object from memory.
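The release of an object once its last reference disappears can be observed with a weak reference. This is a minimal sketch, assuming CPython (where the release happens immediately); the `Resource` class and the variable names are hypothetical and only serve the illustration.

```python
import weakref


class Resource:
    """Hypothetical placeholder class (ints and lists cannot be weakly referenced)."""


obj = Resource()
alias = obj                    # second reference to the same object
probe = weakref.ref(obj)       # weak reference: does not keep the object alive

del obj                        # one reference (alias) remains
alive_with_alias = probe() is not None

del alias                      # last reference removed
alive_after_delete = probe() is not None

print(alive_with_alias, alive_after_delete)
```

Calling `probe()` returns the object while it exists and `None` once it has been released, so the second check reports that the object is gone.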

__Finding the Reference Count__

In [6]:
import sys

a = [1, 2, 3]

# getrefcount() reports 1 more than you might expect: passing `a` as an
# argument creates a temporary extra reference for the duration of the call.
sys.getrefcount(a)

2

In [7]:
import ctypes

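# Reading the count directly from the object's address avoids the extra reference.
# CPython-specific: assumes a standard build, where the count sits at the start of the object.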
def true_ref_count(address: int):
    return ctypes.c_long.from_address(address).value

In [8]:
b = [2, 4, 6]

true_ref_count(id(b))

1

In [9]:
c = [3, 5, 7]
d = c

true_ref_count(id(c))

2
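The way the count rises and falls can also be followed with `sys.getrefcount()` alone by looking at differences, since the temporary extra reference is the same in every call. A small sketch, assuming CPython; the names `data`, `alias` and `holder` are illustrative.

```python
import sys

data = [1, 2, 3]
baseline = sys.getrefcount(data)       # includes the temporary call reference

alias = data                           # a second name for the same list
after_alias = sys.getrefcount(data)

holder = {"key": data}                 # containers hold references too
after_holder = sys.getrefcount(data)

del alias                              # remove the extra name
holder.clear()                         # remove the reference held by the dict
after_cleanup = sys.getrefcount(data)

print(after_alias - baseline, after_holder - baseline, after_cleanup - baseline)
```

Each new reference raises the count by one, and removing the references brings it back to the baseline.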

### Garbage Collection

__Circular References__

my_var --> Object A --> Object B --> Object A

When `my_var` is deleted, Object A is still referenced by Object B (and Object B by Object A), so neither reference count drops to zero and reference counting alone never re-allocates their space. This results in a __memory leak__.

The garbage collector can identify these circular references and clean them up.

__Garbage Collection__

- Can be controlled programmatically using the `gc` module.
- Runs periodically on its own.
- Can be called manually.
- Is turned _on_ by default, but can be turned _off_ for performance reasons.
    * __Beware!__ Make sure the code does not create circular references; otherwise, turning it off will result in memory leaks.
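The controls listed above map onto a handful of `gc` functions. A brief sketch of how they might be used together; the variable names are illustrative and the thresholds printed depend on the Python version.

```python
import gc

was_enabled = gc.isenabled()       # remember the current setting

gc.disable()                       # turn automatic collection off
state_while_disabled = gc.isenabled()

unreachable_found = gc.collect()   # a manual run still works while disabled
thresholds = gc.get_threshold()    # thresholds that trigger automatic runs

if was_enabled:
    gc.enable()                    # restore the original setting

print(state_while_disabled, unreachable_found, thresholds)
```

`gc.collect()` returns the number of unreachable objects it found, which makes it handy for spot checks while debugging.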


In [8]:
import ctypes
import gc

def true_ref_count(address: int):
    return ctypes.c_long.from_address(address).value

def object_exists(address: int):
    for obj in gc.get_objects():
        if id(obj) == address:
            return True
    
    return False

In [9]:
class A:
    def __init__(self):
        self.b = B(self)
        
        print(f"A - self: {hex(id(self))}\n b: {hex(id(self.b))}")


In [10]:
class B:
    def __init__(self, a):
        self.a = a
        
        print(f"B - self: {hex(id(self))}\n a: {hex(id(self.a))}")

In [11]:
gc.disable()

In [12]:
my_var = A()

B - self: 0x7f3a0c3c4100
 a: 0x7f3a0c3c4340
A - self: 0x7f3a0c3c4340
 b: 0x7f3a0c3c4100


In [13]:
a_id = id(my_var)
b_id = id(my_var.b)

In [14]:
true_ref_count(a_id)

2

In [15]:
true_ref_count(b_id)

1

In [16]:
object_exists(a_id)

True

In [17]:
object_exists(b_id)

True

In [18]:
my_var = None

In [19]:
true_ref_count(a_id)

1

In [28]:
true_ref_count(b_id)

1
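Both counts are still 1 because the two objects keep each other alive. A manual collection is what finally frees them. The sketch below repeats the experiment in a self-contained form, under some assumptions: it uses hypothetical `Parent` and `Child` classes in place of `A` and `B`, it probes with weak references rather than raw addresses (reading a freed address through `ctypes` is unsafe), and it re-enables the collector at the end.

```python
import gc
import weakref


class Parent:
    def __init__(self):
        self.child = Child(self)       # Parent -> Child


class Child:
    def __init__(self, parent):
        self.parent = parent           # Child -> Parent (closes the cycle)


gc.disable()                           # no automatic collection during the test
try:
    node = Parent()
    parent_probe = weakref.ref(node)
    child_probe = weakref.ref(node.child)

    node = None                        # drop the only outside reference
    alive_before_collect = parent_probe() is not None

    gc.collect()                       # the collector detects and frees the cycle
    alive_after_collect = (
        parent_probe() is not None or child_probe() is not None
    )
finally:
    gc.enable()

print(alive_before_collect, alive_after_collect)
```

Before the collection the cycle survives even though nothing outside it refers to either object; afterwards both weak references are dead.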

__Variable Re-Assignment__

In [11]:
# Here, my_var references an object at a certain memory address
my_var = 10

print(hex(id(my_var)))

0x7f47ffda8a50


In [12]:
# Here, the value at the original memory address does NOT change.
# my_var simply references a *different* address with the new value
my_var = 15

print(hex(id(my_var)))

0x7f47ffda8af0


Even when incrementing a value, the value at the original address
is not modified.

A new object holding the incremented value is created at a different address,
and `my_var` is changed to reference it.

In fact, the value of an `int` object can never be changed (immutability).

In [13]:
my_var = my_var + 5

print(hex(id(my_var)))

0x7f47ffda8b90
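For contrast, a mutable object such as a list can change its contents while staying at the same address. The sketch below compares the two cases; the variable names are illustrative, and a value of 1000 is used so that CPython's small-integer cache plays no role.

```python
count = 1000
address_before = id(count)
count += 1                                   # builds a new int, then rebinds the name
int_was_rebound = id(count) != address_before

items = [1, 2]
list_address_before = id(items)
items.append(3)                              # modifies the existing list in place
list_kept_address = id(items) == list_address_before

print(int_was_rebound, list_kept_address)
```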


Somewhat surprisingly, here we can see that when 2 variables are assigned the same value, even in separate statements, they can share the same reference!

In [14]:
x = 10
y = 10

In [15]:
print(hex(id(x)))
print(hex(id(y)))

0x7f47ffda8a50
0x7f47ffda8a50


__Variable Equality__

We can think of variable equality in two fundamental ways:

Using the _identity operator_ `is`
- Compares the memory addresses of two objects

Using the _equality_ operator `==`
- Compares the object state or data

In [2]:
a = 10
b = a

print(a is b)
print(a == b)

True
True


In [3]:
a = 'hello'
b = 'hello'

print(a is b)
print(a == b)

True
True


In [4]:
a = [1, 2, 3]
b = [1, 2, 3]

print(a is b)
print(a == b)

False
True


In [5]:
a = 10
b = 10.0

print(a is b)
print(a == b)

False
True
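The difference between identity and equality matters most with mutable objects: a change made through one name is visible through every name that shares the same identity, but not through an equal copy. A short sketch with illustrative names:

```python
original = [1, 2, 3]
alias = original            # same object: identical and equal
clone = list(original)      # new object with the same contents: equal only

alias.append(4)             # mutates the one object shared by both names

print(alias is original, clone is original)
print(original, clone)
```

After the `append`, `original` shows the new element because `alias` refers to the same list, while `clone` keeps its own contents and is no longer equal to `original`.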


__The None object__

The `None` object can be assigned to variables to indicate that their values are not yet set, e.g. as an "empty" value or null pointer.

The `None` object is a real object that is managed by the Python Memory Manager, and furthermore it will always use a shared reference when assigned to variables.

We can test if a variable is null by comparing its memory address to the address of the `None` object, using the `is` operator.

e.g. `a is None`

In [6]:
a = None
b = None
c = None

a is b is c

True
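One reason to prefer `is None` over `== None` is that `==` can be overridden by a class, whereas `is` always compares identity. The sketch below uses a hypothetical `AlwaysEqual` class and a hypothetical `describe()` function to show the difference.

```python
class AlwaysEqual:
    """Hypothetical class whose == comparison claims equality with anything."""

    def __eq__(self, other):
        return True


def describe(value=None):
    # Identity check: only the real None object takes this branch
    if value is None:
        return "not set"
    return "set"


odd = AlwaysEqual()
equals_none = (odd == None)   # decided by the overridden __eq__
is_none = (odd is None)       # decided by object identity only

print(describe(), describe(odd), equals_none, is_none)
```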

### Interning

At startup, Python (CPython) pre-loads, or caches, a global list of integers in the range **[-5, 256]** (inclusive).

Any time an integer in that range is referenced, Python will use the cached version of that object.

These are Singleton objects, and this is done as an optimization strategy since small integers are used often.

When we write `a = 10`, Python just has to point to the existing cached reference for 10.

But when we write `a = 257`, or any integer outside of that range, Python does not use the cache and will generally create a new object.

In [1]:
a = 10
b = 10

a is b # a and b are the same memory address

True

In [2]:
a = 257
b = 257

a is b

False

In [3]:
a = 10
b = int(10)
c = int('10')
d = int('1010', 2)

a is b is c is d

True
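Because the cache is a CPython implementation detail, integer values are best compared with `==`, leaving `is` for identity checks. A small sketch, assuming CPython; the integers are built with `int()` at runtime so that no compile-time sharing of literals is involved.

```python
small_a = int("10")
small_b = int("10")
large_a = int("257")
large_b = int("257")

small_same_object = small_a is small_b    # cached small integers are shared in CPython
large_equal = large_a == large_b          # value comparison, always reliable
large_same_object = large_a is large_b    # implementation detail; do not rely on it

print(small_same_object, large_equal, large_same_object)
```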

__String Interning__

Python, both internally and in your code, deals with lots of dictionary type lookups, on string keys, which means a lot of string equality testing.

If we want to see if 2 strings are equal...

e.g.

```
a = 'some_long_string'
b = 'some_long_string'
```

Using `a == b`, we need to compare the 2 strings character by character.

But if we know that they have been __interned__, or cached, then `a` and `b` are pointing to the same string in memory and we can compare using the identity operator: `is`.

In this case, we are comparing 2 integers (memory addresses), which is therefore much faster.

Some strings are also automatically interned, but not all.

As Python code is compiled, identifiers are interned:
- variable names
- function names
- class names
- etc...

Identifiers must start with an \_ or a letter and can only contain \_, letters, and digits

_Some_ string literals may also be automatically interned:
- string literals that look like identifiers (e.g. 'hello_world')
- although if it starts with a digit, even though that is not a valid identifier, it may still get interned. __But don't count on it!__

[ ! ] Not all strings are interned by Python

You can also force strings to be interned:

```
import sys

a = sys.intern('the quick brown fox')
b = sys.intern('the quick brown fox')
```

But in general, you should avoid forcing interning unless you have a specific need to, like:
- dealing with a large number of strings that could have high repetition, e.g. tokenizing a large corpus of text (NLP)
- lots of string comparisons
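Forced interning is mostly useful for strings built at runtime, which are not interned automatically. A minimal sketch of that case; the word list and variable names are made up for illustration.

```python
import sys

words = ["the", "quick", "brown", "fox"]

# Two strings assembled at runtime: equal contents, built separately
a = " ".join(words)
b = " ".join(words)
equal_values = a == b

# sys.intern() returns the single shared copy for a given string value
a = sys.intern(a)
b = sys.intern(b)
same_object = a is b

print(equal_values, same_object)
```

After interning, both names refer to the same string object, so an identity check is enough to compare them.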


In [4]:
a = 'hello'
b = 'hello'

a is b

True

In [5]:
a = 'hello world'
b = 'hello world'

a is b

False

In [6]:
a = '_this_is_a_long_string_that_could_be_an_identifier'
b = '_this_is_a_long_string_that_could_be_an_identifier'

a is b

True

### Peephole Optimizations

This is another variety of optimizations that can occur at compile time.

__Constant Expressions__

Numeric Calculations:

e.g. `24 * 60`, Python will pre-calculate 24\*60 -> 1440 and store it

Short Sequences (__length < 20__; the exact limit depends on the CPython version)

e.g.

`(1, 2) * 5` -> (1, 2, 1, 2, 1, 2, 1, 2, 1, 2)

`'abc' * 3` -> abcabcabc

`'hello' + ' world'` -> hello world

[ ! ] Expressions like the ones above can essentially be thought of as constants, so they are pre-calculated at compile time and the result is used whenever the same expression re-appears


In [17]:
def my_func():
    a = 24 * 60
    b = (1, 2) * 5
    c = 'abc' * 3
    d = ['a', 'b'] * 3 # will not be cached since lists are mutable

In [18]:
my_func.__code__.co_consts

(None, 1440, (1, 2, 1, 2, 1, 2, 1, 2, 1, 2), 'abcabcabc', 'a', 'b', 3)
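Folding only applies when every operand is a constant. The sketch below, assuming CPython, compares a constant expression with one that depends on a runtime value; both function names are hypothetical.

```python
def minutes_per_day():
    return 24 * 60          # constant expression: can be folded at compile time


def minutes_for(hours):
    return hours * 60       # depends on a runtime value: cannot be folded


folded = 1440 in minutes_per_day.__code__.co_consts
not_folded = 1440 not in minutes_for.__code__.co_consts

print(folded, not_folded)
```

The pre-computed value appears among the constants of the first function only; the second function has to perform the multiplication on every call.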

__Membership Tests: Mutables are replaced by Immutables__

When membership tests such as:

`if e in [1, 2, 3]`

are encountered, the list literal is replaced by its immutable counterpart (a tuple), `(1, 2, 3)`

- lists -> tuples
- sets -> frozensets

[ ! ] Set membership is much faster than list or tuple membership (since sets are basically hashmaps like dictionaries)

So instead of writing:

`if e in [1, 2, 3]` OR `if e in (1, 2, 3)`

write `if e in {1, 2, 3}`

In [21]:
def is_member(e):
    if e in [1, 2, 3]: # this will become a tuple
        pass

In [22]:
is_member.__code__.co_consts

(None, (1, 2, 3))

In [25]:
def is_member(e):
    if e in {1, 2, 3}: # this will become a frozenset
        pass

In [26]:
is_member.__code__.co_consts

(None, frozenset({1, 2, 3}))
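The speed difference between set and list membership can be measured with `timeit`. This is a rough sketch rather than a benchmark: the collection size, the repeat count and the variable names are arbitrary choices, and the timings will vary from machine to machine.

```python
import timeit

items_list = list(range(10_000))
items_set = set(items_list)
missing = -1   # worst case for the list: every element gets checked

# Time 200 membership tests against each container
list_seconds = timeit.timeit(lambda: missing in items_list, number=200)
set_seconds = timeit.timeit(lambda: missing in items_set, number=200)

print(f"list: {list_seconds:.6f}s  set: {set_seconds:.6f}s")
```

The list test scans its elements one by one, while the set test is a hash lookup, so the gap typically widens as the collection grows. For a handful of elements the difference is small.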