
# **PART 1: FUNCTIONAL PROGRAMMING**

### 01 - Variables and Memory References

We can find the memory address that a variable references by using the `id()` function.

In CPython, the `id()` function returns the memory address of its argument as a base-10 integer.

We can use the `hex()` function to convert that base-10 integer to a base-16 (hexadecimal) string.

In [None]:
my_var = 10
print('my_var = {0}'.format(my_var))
print('memory address of my_var (decimal): {0}'.format(id(my_var)))
print('memory address of my_var (hex): {0}'.format(hex(id(my_var))))

my_var = 10
memory address of my_var (decimal): 2245052295760
memory address of my_var (hex): 0x20ab78b6a50


In [None]:
greeting = 'Hello'
print('greeting = {0}'.format(greeting))
print('memory address of greeting (decimal): {0}'.format(id(greeting)))
print('memory address of greeting (hex): {0}'.format(hex(id(greeting))))

greeting = Hello
memory address of greeting (decimal): 2245138675440
memory address of greeting (hex): 0x20abcb176f0


---
**BE CAREFUL!**

*Note how the memory address of `my_var` is different from that of `greeting`.*

*Strictly speaking, `my_var` is not "equal" to 10.*

*Instead, `my_var` is a reference to an (integer) object (containing the value 10) located at the memory address `id(my_var)`.*

*The same holds for the variable `greeting`.*

### 02 - Reference Counting

The following function returns the reference count of the object located at a given memory address:

In [None]:
import ctypes

def ref_count(address):
    return ctypes.c_long.from_address(address).value

Let's make a variable, and check its reference count:

In [None]:
my_var = [1, 2, 3, 4]
ref_count(id(my_var))

1

The `sys` module also provides a function we can use to obtain the reference count:

In [None]:
import sys
sys.getrefcount(my_var)

2

But why is this returning 2, instead of the expected 1 we obtained with the previous function?

Answer: The `sys.getrefcount()` function takes `my_var` as an argument. This means it receives (and stores) its own reference to the object that `my_var` points to, so the count is off by 1. We will use `from_address()` instead.

We make another reference to the same object that `my_var` references:

In [None]:
other_var = my_var

Let's look at the memory address of those two variables and the reference counts:

In [None]:
print(hex(id(my_var)), hex(id(other_var)))
print(ref_count(id(my_var)))

0x20abdb3ff80 0x20abdb3ff80
2


Force one reference to go away:

In [None]:
other_var = None

And we look at the reference count again:

In [None]:
print(ref_count(id(my_var)))

1


We see that the reference count has gone back to 1.

You'll probably never need to do anything like this in Python. Memory management is completely transparent - this is just to illustrate some of what is going on behind the scenes, as it helps to understand upcoming concepts.
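
Because `sys.getrefcount()` always includes its own temporary reference, it can be easier to look at how the count changes than at its absolute value. The following is a minimal sketch, assuming CPython. The names `baseline`, `with_alias`, and `after_del` are illustrative only, and `del` is used here in place of assigning `None` to drop the extra reference.

```python
import sys

data = [1, 2, 3, 4]
baseline = sys.getrefcount(data)    # includes the call's own temporary reference

alias = data                        # a second name bound to the same list object
with_alias = sys.getrefcount(data)

del alias                           # remove that second name again
after_del = sys.getrefcount(data)

print('change after adding alias:', with_alias - baseline)
print('change after deleting alias:', after_del - baseline)
```

Working with differences sidesteps the question of how many temporary references a particular environment adds to the absolute count.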

---
**BE CAREFUL!**

*In the Google Colab environment, the `getrefcount()` function gives us the value 3, but in a local Jupyter notebook it will be 2.*

### 03 - Garbage Collection

In [None]:
import ctypes
import gc

We use the same function that we used in the lesson on reference counting to calculate the number of references to a specified object (using its memory address to avoid creating an extra reference).

In [None]:
def ref_count(address):
    return ctypes.c_long.from_address(address).value

We create a function that will search the objects in the GC for a specified id and tell us if the object was found or not:

In [None]:
def object_by_id(object_id):
    for obj in gc.get_objects():
        if id(obj) == object_id:
            return "Object exists"
    return "Not found"

Next we define two classes that we will use to create a circular reference.

Class A's constructor will create an instance of class B and pass itself to class B's constructor that will then store that reference in some instance variable.

In [None]:
class A:
    def __init__(self):
        self.b = B(self)
        print('A: self: {0}, b:{1}'.format(hex(id(self)), hex(id(self.b))))

In [None]:
class B:
    def __init__(self, a):
        self.a = a
        print('B: self: {0}, a: {1}'.format(hex(id(self)), hex(id(self.a))))

We turn off the GC so we can see how reference counts are affected when the GC does not run and when it does (by running it manually).

In [None]:
gc.disable()

Now we create an instance of A, which will, in turn, create an instance of B which will store a reference to the calling A instance.

In [None]:
my_var = A()

B: self: 0x20abcb27df0, a: 0x20abcb27730
A: self: 0x20abcb27730, b:0x20abcb27df0


As we can see A and B's constructors ran, and we also see from the memory addresses that we have a circular reference.

In fact `my_var` is also a reference to the same A instance:

In [None]:
print(hex(id(my_var)))

0x20abcb27730


Another way to see this:

In [None]:
print('a: \t{0}'.format(hex(id(my_var))))
print('a.b: \t{0}'.format(hex(id(my_var.b))))
print('b.a: \t{0}'.format(hex(id(my_var.b.a))))

a: 	0x20abcb27730
a.b: 	0x20abcb27df0
b.a: 	0x20abcb27730


In [None]:
a_id = id(my_var)
b_id = id(my_var.b)

We now remove the `my_var` reference, and then look at how many references remain for `a` and `b`:

In [None]:
my_var = None

In [None]:
print('refcount(a) = {0}'.format(ref_count(a_id)))
print('refcount(b) = {0}'.format(ref_count(b_id)))
print('a: {0}'.format(object_by_id(a_id)))
print('b: {0}'.format(object_by_id(b_id)))

refcount(a) = 1
refcount(b) = 1
a: Object exists
b: Object exists


As we can see, the reference counts are now both equal to 1 (a pure circular reference), and reference counting alone did not destroy the A and B instances - they're still around. If no garbage collection is performed this would result in a memory leak.

Let's run the GC manually and re-check whether the objects still exist:

In [None]:
gc.collect()
print('refcount(a) = {0}'.format(ref_count(a_id)))
print('refcount(b) = {0}'.format(ref_count(b_id)))
print('a: {0}'.format(object_by_id(a_id)))
print('b: {0}'.format(object_by_id(b_id)))

refcount(a) = 0
refcount(b) = 0
a: Not found
b: Not found
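
Reading a reference count from the address of an object that has already been freed inspects memory that may since have been reused, so the zeros above should be read with some caution. A weak reference is a safer way to observe whether an object is still alive. The sketch below assumes CPython; `Node`, `make_cycle`, `alive_before`, and `alive_after` are hypothetical names introduced for illustration.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.partner = None

def make_cycle():
    # Build two objects that reference each other, then return only a
    # weak reference so that no outside strong reference survives.
    first = Node()
    second = Node()
    first.partner = second
    second.partner = first
    return weakref.ref(first)

was_enabled = gc.isenabled()
gc.disable()                        # keep the cyclic collector from running on its own
try:
    probe = make_cycle()
    alive_before = probe() is not None   # the cycle keeps both objects alive
    gc.collect()                         # run the cyclic collector manually
    alive_after = probe() is not None    # the weak reference is now dead
finally:
    if was_enabled:
        gc.enable()

print('alive before collect:', alive_before)
print('alive after collect:', alive_after)
```

The weak reference does not add to the reference count, so it does not keep the cycle alive by itself.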


### 04 - Dynamic vs Static Typing

Python is dynamically typed.

This means that the type of a variable is simply the type of the object the variable name points to (references). The variable itself has no associated type.

In [None]:
a = "hello"

In [None]:
type(a)

str

In [None]:
a = 10

In [None]:
type(a)

int

In [None]:
a = lambda x: x ** 2

In [None]:
a(2)

4

In [None]:
type(a)

function

As you can see from the above examples, the type of the variable `a` changed over time - in fact it was simply the type of the object `a` was referencing at that time. No type was ever attached to the variable name itself.

### 05 - Variable Re-Assignment

Notice how the memory address of `a` is different every time.

In [None]:
a = 10
hex(id(a))

'0x20ab78b6a50'

In [None]:
a = 15
hex(id(a))

'0x20ab78b6af0'

In [None]:
a = 5
hex(id(a))

'0x20ab78b69b0'

In [None]:
a = a + 1
hex(id(a))

'0x20ab78b69d0'

However, look at this:

In [None]:
a = 10
b = 10
print(hex(id(a)))
print(hex(id(b)))

The memory addresses of both `a` and `b` are the same!!

### 06 - Object Mutability

Certain Python built-in object types (aka data types) are mutable.

That is, the internal contents (state) of the object in memory can be modified.

In [None]:
my_list = [1, 2, 3]
print(my_list)
print(hex(id(my_list)))

[1, 2, 3]
0x20abcb36900


In [None]:
my_list.append(4)
print(my_list)
print(hex(id(my_list)))

[1, 2, 3, 4]
0x20abcb36900


As you can see, the memory address of `my_list` has not changed.

But the contents of `my_list` have changed from `[1, 2, 3]` to `[1, 2, 3, 4]`.

---
**BE CAREFUL!**

*On the other hand, consider this:*

In [None]:
my_list_1 = [1, 2, 3]
print(my_list_1)
print(hex(id(my_list_1)))

[1, 2, 3]
0x20abdb33140


In [None]:
my_list_1 = my_list_1 + [4]
print(my_list_1)
print(hex(id(my_list_1)))

[1, 2, 3, 4]
0x20abdb444c0


*Notice here that the memory address of `my_list_1` did change.*

*This is because concatenating the two list objects `my_list_1` and `[4]` did not modify the contents of `my_list_1` - instead it created a new list object and re-assigned `my_list_1` to reference this new object.*

*The same applies to dictionary objects, which are also mutable types.*

In [None]:
my_dict = dict(key1 = 'value 1')
print(my_dict)
print(hex(id(my_dict)))

{'key1': 'value 1'}
0x20abca8c080


In [None]:
my_dict['key1'] = 'modified value 1'
print(my_dict)
print(hex(id(my_dict)))

{'key1': 'modified value 1'}
0x20abca8c080


In [None]:
my_dict['key2'] = 'value 2'
print(my_dict)
print(hex(id(my_dict)))

{'key1': 'modified value 1', 'key2': 'value 2'}
0x20abca8c080


Once again we see that while we are modifying the contents of the dictionary, the memory address of `my_dict` has not changed.
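
The difference between modifying an object in place and creating a new one can be checked directly by comparing `id()` before and after each operation. The following is a small sketch; the variable names (`numbers`, `keep`, `settings`, and the `same_after_*` flags) are illustrative only. The extra name `keep` holds on to the original list so that its address cannot be reused by a new object.

```python
numbers = [1, 2, 3]
keep = numbers                      # second reference to the original list
original_id = id(numbers)

numbers.append(4)                   # in-place modification
same_after_append = id(numbers) == original_id

numbers += [5]                      # augmented assignment extends the list in place
same_after_iadd = id(numbers) == original_id

numbers = numbers + [6]             # concatenation builds a new list object
same_after_concat = id(numbers) == original_id

settings = {'key1': 'value 1'}
settings_id = id(settings)
settings['key2'] = 'value 2'        # in-place modification of the dictionary
same_dict = id(settings) == settings_id

print(same_after_append, same_after_iadd, same_after_concat, same_dict)
print(keep)
print(numbers)
```

Note that `keep` still refers to the original list, which received the in-place changes but not the element added by concatenation.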

Now consider the immutable sequence type: tuple

The tuple is immutable, so elements cannot be added, removed or replaced.

In [None]:
t = (1, 2, 3)

This tuple will never change at all. It has three elements, the integers 1, 2, and 3. This will remain the case as long as `t`'s reference is not changed.

---
**BE CAREFUL!**

*But, consider the following tuple:*

In [None]:
a = [1, 2]
b = [3, 4]
t = (a, b)

*Now, `t` is still immutable, i.e. it contains a reference to the object `a` and a reference to the object `b`. That will never change as long as `t`'s reference is not re-assigned.*

*However, the elements `a` and `b` are, themselves, mutable.*

In [None]:
a.append(3)
b.append(5)
print(t)

([1, 2, 3], [3, 4, 5])


*Observe that the contents of `a` and `b` did change!*

*So immutability can be a little more subtle than just thinking something can never change.*

*The tuple `t` did not change - it contains two elements, which are the references `a` and `b`. And that will not change. But, because the referenced elements are mutable themselves, it appears as though the tuple has changed.*

*It hasn't though - the distinction is subtle but important to understand!*
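
One way to make this distinction concrete is to check what the tuple actually refuses to do. In the sketch below (the names `rebinding_blocked`, `ids_before`, and `ids_after` are illustrative only), replacing a slot of the tuple raises a `TypeError`, while mutating the list stored in that slot succeeds and leaves the stored references unchanged.

```python
a = [1, 2]
b = [3, 4]
t = (a, b)
ids_before = tuple(id(element) for element in t)

rebinding_blocked = False
try:
    t[0] = [9, 9]                   # replacing a slot of the tuple is not allowed
except TypeError:
    rebinding_blocked = True

a.append(3)                         # mutating the referenced list is allowed
ids_after = tuple(id(element) for element in t)

print(rebinding_blocked)
print(ids_before == ids_after)      # the tuple still holds the same two references
print(t)
```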

### 07 - Function Arguments and Mutability

Consider a function that receives a string argument, and changes the argument in some way:

In [None]:
def process(s):
    print('initial s # = {0}'.format(hex(id(s))))
    s = s + ' world'
    print('s after change # = {0}'.format(hex(id(s))))

In [None]:
my_var = 'hello'
print('my_var # = {0}'.format(hex(id(my_var))))

my_var # = 0x20abc789230


Note that when `s` is received, it is referencing the same object as `my_var`.

After we "modify" `s`, `s` is pointing to a new memory address:


In [None]:
process(my_var)

initial s # = 0x20abc789230
s after change # = 0x20abca7aef0


Calling `process` a second time behaves the same way, and `s` again ends up pointing to a new memory address:

In [None]:
process(my_var)

initial s # = 0x20abc789230
s after change # = 0x20abca8ba70


And our own variable `my_var` is still pointing to the original memory address:

In [None]:
print('my_var # = {0}'.format(hex(id(my_var))))

my_var # = 0x20abc789230


Let's see how this works with mutable objects:

In [None]:
def modify_list(items):
    print('initial items # = {0}'.format(hex(id(items))))
    if len(items) > 0:
        items[0] = items[0] ** 2
    items.pop()
    items.append(5)
    print('final items # = {0}'.format(hex(id(items))))

In [None]:
my_list = [2, 3, 4]
print('my_list # = {0}'.format(hex(id(my_list))))

my_list # = 0x20abdb34bc0


In [None]:
modify_list(my_list)

initial items # = 0x20abdb34bc0
final items # = 0x20abdb34bc0


In [None]:
print(my_list)
print('my_list # = {0}'.format(hex(id(my_list))))

[4, 3, 5]
my_list # = 0x20abdb34bc0


As you can see, throughout all the code, the memory address referenced by `my_list` and `items` is always the same (shared) reference - we are simply modifying the contents (internal state) of the object at that memory address.

Now, even with immutable container objects we have to be careful, e.g. a tuple containing a list (the tuple is immutable, but the list element inside the tuple is mutable):

In [None]:
def modify_tuple(t):
    print('initial t # = {0}'.format(hex(id(t))))
    t[0].append(100)
    print('final t # = {0}'.format(hex(id(t))))

In [None]:
my_tuple = ([1, 2], 'a')

In [None]:
hex(id(my_tuple))

'0x20abcb36200'

In [None]:
modify_tuple(my_tuple)

initial t # = 0x20abcb36200
final t # = 0x20abcb36200


In [None]:
my_tuple

([1, 2, 100], 'a')

As you can see, the first element of the tuple was mutated.
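
When a function should not alter the caller's object, one common approach (a sketch, not the only option) is to work on a copy and return it. The function name `squared_first` is hypothetical and not part of the examples above.

```python
def squared_first(items):
    # Work on a shallow copy so the caller's list is left untouched.
    result = list(items)
    if len(result) > 0:
        result[0] = result[0] ** 2
    return result

my_list = [2, 3, 4]
new_list = squared_first(my_list)

print(my_list)                      # the original list
print(new_list)                     # the modified copy
print(new_list is my_list)
```

A shallow copy only protects the outer list. If the elements themselves are mutable, as with the list inside the tuple above, they are still shared with the caller.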

### 08 - Shared References and Mutability

The following sets up a shared reference between the variables `my_var_1` and `my_var_2`

In [None]:
my_var_1 = 'hello'
my_var_2 = my_var_1
print(my_var_1)
print(my_var_2)

hello
hello


In [None]:
print(hex(id(my_var_1)))
print(hex(id(my_var_2)))

0x20abc789230
0x20abc789230


In [None]:
my_var_2 = my_var_2 + ' world!'

In [None]:
print(hex(id(my_var_1)))
print(hex(id(my_var_2)))

0x20abc789230
0x20abcb25db0


---
**BE CAREFUL!**

*Be careful if the variable type is mutable!*

*Here we create a list (`my_list_1`) and create a variable (`my_list_2`) referencing the same list object:*

In [None]:
my_list_1 = [1, 2, 3]
my_list_2 = my_list_1
print(my_list_1)
print(my_list_2)

[1, 2, 3]
[1, 2, 3]


*As we can see they have the same memory address (shared reference):*

In [None]:
print(hex(id(my_list_1)))
print(hex(id(my_list_2)))

0x20abca7ae80
0x20abca7ae80


Now we modify the list referenced by `my_list_2`:

In [None]:
my_list_2.append(4)

`my_list_2` has been modified:

In [None]:
print(my_list_2)

[1, 2, 3, 4]


And since `my_list_1` references the same list object, it has also changed:

In [None]:
print(my_list_1)

[1, 2, 3, 4]


As you can see, both variables still share the same reference:

In [None]:
print(hex(id(my_list_1)))
print(hex(id(my_list_2)))

0x20abca7ae80
0x20abca7ae80
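
If an independent list is wanted instead of a shared reference, the object has to be copied explicitly. The sketch below contrasts plain assignment with `list.copy()` (a shallow copy) and `copy.deepcopy()` from the standard library. The variable names are illustrative only.

```python
import copy

original = [[1, 2], [3, 4]]
alias = original                    # shared reference: same list object
shallow = original.copy()           # new outer list, inner lists still shared
deep = copy.deepcopy(original)      # new outer list and new inner lists

original.append([5, 6])             # changes the outer list
original[0].append(99)              # changes an inner list

print(alias)
print(shallow)
print(deep)
```

The alias sees both changes, the shallow copy sees only the change to the shared inner list, and the deep copy sees neither.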


**Behind the scenes with Python's memory manager**

Recall from a few lectures back:

In [None]:
a = 10
b = 10

In [None]:
print(hex(id(a)))
print(hex(id(b)))

0x20ab78b6a50
0x20ab78b6a50


Same memory address!!

This is safe for Python to do because integer objects are immutable.

So, even though `a` and `b` initially shared the same memory address, we can never modify `a`'s value by "modifying" `b`'s value.

The only way to change `b`'s value is to change its reference, which will never affect `a`.

In [None]:
b = 15

In [None]:
print(hex(id(a)))
print(hex(id(b)))

0x20ab78b6a50
0x20ab78b6af0


However, for mutable objects, Python's memory manager does not do this, since that would not be safe.

In [None]:
my_list_1 = [1, 2, 3]
my_list_2 = [1, 2, 3]

As you can see, although the two variables were assigned identical "contents", the memory addresses are not the same:

In [None]:
print(hex(id(my_list_1)))
print(hex(id(my_list_2)))

0x20abdb41440
0x20abca8b600


### 09 - Variable Equality

From the previous lecture we know that `a` and `b` will have a shared reference:

In [None]:
a = 10
b = 10

print(hex(id(a)))
print(hex(id(b)))

0x20ab78b6a50
0x20ab78b6a50


When we use the `is` operator, we are comparing the memory address references:

In [None]:
print("a is b ", a is b)

a is b  True


But if we use the `==` operator, we are comparing the contents:

In [None]:
print("a == b", a == b)

a == b True


The following however, do not have a shared reference:

In [None]:
a = [1, 2, 3]
b = [1, 2, 3]

print(hex(id(a)))
print(hex(id(b)))

0x20abdb42a40
0x20abca6fa80


Although they are not the same objects, they do contain the same "values"

In [None]:
print("a is b", a is b)
print("a == b", a == b)

a is b False
a == b True


Python will attempt to compare values as best as possible, for example:

In [None]:
a = 10
b = 10.0

These are not the same reference, since one object is an `int` and the other is a `float`

In [None]:
print(type(a))
print(type(b))

<class 'int'>
<class 'float'>


In [None]:
print(hex(id(a)))
print(hex(id(b)))

0x20ab78b6a50
0x20abdb528f0


In [None]:
print("a is b", a is b)
print("a == b", a == b)

a is b False
a == b True


So, even though `a` is an integer 10, and `b` is a float 10.0, the values will still compare as equal.

In fact, this will also have the same behavior:

In [None]:
c = 10 + 0j
print(type(c))

<class 'complex'>


In [None]:
print("a is c", a is c)
print("a == c", a == c)

a is c False
a == c True
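
For user-defined classes, `==` is controlled by the `__eq__` method, while `is` always compares identity. The following minimal sketch uses a hypothetical `Point` class, which is not part of the examples above, to show two distinct objects that compare as equal.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Compare by contents; defer to the other operand for unknown types.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

p1 = Point(1, 2)
p2 = Point(1, 2)
p3 = p1

print('p1 is p2', p1 is p2)
print('p1 == p2', p1 == p2)
print('p1 is p3', p1 is p3)
```

Without the `__eq__` method, `p1 == p2` would fall back to an identity comparison.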


**The None Object**

`None` is a built-in "variable" of type `NoneType`.

Basically the keyword `None` is a reference to an object instance of `NoneType`.

NoneType objects are immutable! Python's memory manager will therefore use shared references to the None object.

In [None]:
print(None)

None


In [None]:
hex(id(None))

'0x7fff99ea9cd8'

In [None]:
type(None)

NoneType

In [None]:
a = None
print(type(a))
print(hex(id(a)))

<class 'NoneType'>
0x7fff99ea9cd8


In [None]:
a is None

True

In [None]:
a == None

True

In [None]:
b = None
hex(id(b))

'0x7fff99ea9cd8'

In [None]:
a is b

True

In [None]:
a == b

True

In [None]:
l = []

In [None]:
type(l)

list

In [None]:
l is None

False

In [None]:
l == None

False
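
Because there is only one `None` object, testing with `is None` checks identity directly and cannot be affected by a class that customizes `==`. The sketch below uses a deliberately contrived, hypothetical class `AlwaysEqual` to show how the two tests can disagree.

```python
class AlwaysEqual:
    def __eq__(self, other):
        # Contrived for illustration: claims equality with everything.
        return True

obj = AlwaysEqual()

equals_none = (obj == None)         # decided by AlwaysEqual.__eq__
is_none = (obj is None)             # decided by object identity

print('obj == None:', equals_none)
print('obj is None:', is_none)
```

This is one reason `is None` is generally preferred over `== None` when checking for `None`.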

### 10 - Everything is an Object

In [None]:
a = 10

`a` is an object of type `int`, i.e. `a` is an instance of the `int` class.

In [None]:
print(type(a))

<class 'int'>


If `int` is a class, we should be able to create an instance of it using standard class instantiation:

In [None]:
b = int(10)

In [None]:
print(b)
print(type(b))

10
<class 'int'>


We can even request the class documentation:

In [None]:
help(int)

Help on class int in module builtins:

class int(object)
 |  int([x]) -> integer
 |  int(x, base=10) -> integer
 |  
 |  Convert a number or string to an integer, or return 0 if no arguments
 |  are given.  If x is a number, return x.__int__().  For floating point
 |  numbers, this truncates towards zero.
 |  
 |  If x is not a number or if base is given, then x must be a string,
 |  bytes, or bytearray instance representing an integer literal in the
 |  given base.  The literal can be preceded by '+' or '-' and be surrounded
 |  by whitespace.  The base defaults to 10.  Valid bases are 0 and 2-36.
 |  Base 0 means to interpret the base from the string as an integer literal.
 |  >>> int('0b100', base=0)
 |  4
 |  
 |  Built-in subclasses:
 |      bool
 |  
 |  Methods defined here:
 |  
 |  __abs__(self, /)
 |      abs(self)
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __and__(self, value, /)
 |      Return self&value.
 |  
 |  __bool__(self, /)
 |      True if 

As we see from the docs, we can even create an `int` using an overloaded constructor:

In [None]:
b = int('10', base=2)

In [None]:
print(b)
print(type(b))

2
<class 'int'>


**Functions are Objects too**

In [None]:
def square(a):
    return a ** 2

In [None]:
type(square)

function

In fact, we can even assign them to a variable:

In [None]:
f = square

In [None]:
type(f)

function

In [None]:
f is square

True

In [None]:
f(2)

4

In [None]:
type(f(2))

int

A function can return a function:

In [None]:
def cube(a):
    return a ** 3

In [None]:
def select_function(fn_id):
    if fn_id == 1:
        return square
    else:
        return cube

In [None]:
f = select_function(1)
print(hex(id(f)))
print(hex(id(square)))
print(hex(id(cube)))
print(type(f))
print('f is square: ', f is square)
print('f is cube: ', f is cube)
print(f)
print(f(2))

0x20abca65310
0x20abca65310
0x20abdb509d0
<class 'function'>
f is square:  True
f is cube:  False
<function square at 0x0000020ABCA65310>
4


We could even call it this way:

In [None]:
select_function(1)(5)

25

A function can be passed as an argument to another function.

(This example is pretty useless, but it illustrates the point effectively)

In [None]:
def exec_function(fn, n):
    return fn(n)

In [None]:
result = exec_function(cube, 2)
print(result)

8
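
Since functions are objects, they can also be stored in containers such as dictionaries. The sketch below is one possible alternative to the `if`/`else` in `select_function`. The names `operations` and `apply_operation` are hypothetical and introduced only for this illustration.

```python
def square(a):
    return a ** 2

def cube(a):
    return a ** 3

# Map a name to the function object itself (note: no parentheses).
operations = {'square': square, 'cube': cube}

def apply_operation(name, n):
    fn = operations[name]           # look up the function object
    return fn(n)                    # then call it

print(apply_operation('square', 5))
print(apply_operation('cube', 2))
print(operations['square'] is square)
```

Adding a new operation then only requires a new dictionary entry rather than another branch.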


### 11 - Python Optimizations - Interning

Earlier, we saw shared references being created automatically by Python:

In [None]:
a = 100
b = 100
print(id(a))
print(id(b))

2245052487120
2245052487120


Note how `a` and `b` reference the same object.

---
**BE CAREFUL!**

*But consider the following example:*

In [None]:
a = 500
b = 500
print(id(a))
print(id(b))

2245155841488
2245155841552


*As you can see, the variables `a` and `b` do not point to the same object!*

*This is because Python pre-caches integer objects in the range [-5, 256]*

*So for example:*

In [None]:
a = 256
b = 256
print(id(a))
print(id(b))

2245052492176
2245052492176


*and*

In [None]:
a = -5
b = -5
print(id(a))
print(id(b))

2245052295280
2245052295280


do have the same reference.

*This is called **interning**: Python **interns** the integers in the range [-5, 256].*

*The integers in the range [-5, 256] are essentially **singleton** objects.*

In [None]:
a = 10
b = int(10)
c = int('10')
d = int('1010', 2)

In [None]:
print(a, b, c, d)

10 10 10 10


In [None]:
a is b

True

In [None]:
a is c

True

In [None]:
a is d

True

As you can see, all these variables were created in different ways, but since the integer object with value 10 behaves like a singleton, they all ended up pointing to the same object in memory.
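
Identity results for integers also depend on how the code is compiled: identical literals that appear in the same script or function may be merged into a single constant, so results can differ from those seen in separate notebook statements. The sketch below, which assumes CPython, builds the integers at run time with `int()` to avoid that effect. The value `1000000` is an arbitrary choice well outside the cached range, and the variable names are illustrative only.

```python
small_a = int('10')
small_b = int('10')

big_a = int('1000000')
big_b = int('1000000')

print('small_a is small_b:', small_a is small_b)   # cached small integer
print('big_a is big_b:', big_a is big_b)           # two separate objects
print('big_a == big_b:', big_a == big_b)           # values still compare equal
```

Since the caching is an implementation detail, `==` remains the reliable way to compare integer values.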

### 12 - Python Optimizations: String Interning

Python will automatically intern certain strings.

In particular all the identifiers (variable names, function names, class names, etc) are interned (singleton objects created).

Python will also intern string literals that look like identifiers.

For example:

In [None]:
a = 'hello'
b = 'hello'
print(id(a))
print(id(b))

2245134946864
2245134946864


---
**BE CAREFUL!**

*But not the following:*

In [None]:
a = 'hello, world!'
b = 'hello, world!'
print(id(a))
print(id(b))

2245155879600
2245155642096


However, because the following literals resemble identifiers, even though they are quite long, Python will still automatically intern them:

In [None]:
a = 'hello_world'
b = 'hello_world'
print(id(a))
print(id(b))

2245155889328
2245155889328


And even longer:

In [None]:
a = '_this_is_a_long_string_that_could_be_used_as_an_identifier'
b = '_this_is_a_long_string_that_could_be_used_as_an_identifier'
print(id(a))
print(id(b))

2245155887680
2245155887680


Even if the string starts with a digit:

In [None]:
a = '1_hello_world'
b = '1_hello_world'
print(id(a))
print(id(b))

2245155889328
2245155889328


That was interned (pointer is the same), but look at this one:

In [None]:
a = '1 hello world'
b = '1 hello world'
print(id(a))
print(id(b))

2245155890352
2245155889968


Interning strings (making them singleton objects) means that testing for string equality can be done faster by comparing the memory address:

In [None]:
a = 'this_is_a_long_string'
b = 'this_is_a_long_string'
print('a==b:', a == b)
print('a is b:', a is b)

a==b: True
a is b: True


---
**BE CAREFUL!**

*Note: Remember, using `is` ONLY works if the strings were interned!*

*Here's where this technique fails:*

In [None]:
a = 'hello world'
b = 'hello world'
print('a==b:', a==b)
print('a is b:', a is b)

a==b: True
a is b: False


You can force strings to be interned (but only use it if you have a valid performance optimization need):

In [None]:
import sys

In [None]:
a = sys.intern('hello world')
b = sys.intern('hello world')
c = 'hello world'
print(id(a))
print(id(b))
print(id(c))

2245155890544
2245155890544
2245155890928


Notice how `a` and `b` are pointing to the same object, but `c` is NOT.

So, since both `a` and `b` were interned we can use `is` to test for equality of the two strings:

In [None]:
print('a==b:', a==b)
print('a is b', a is b)

a==b: True
a is b True
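
Strings created at run time are generally not interned automatically, which is where `sys.intern()` becomes relevant. The sketch below, assuming CPython, builds the same text twice with `str.join()` and then interns both results. The variable names are illustrative only.

```python
import sys

parts = ['hello', ' ', 'world']
a = ''.join(parts)                  # built at run time
b = ''.join(parts)                  # a second, separate string object

interned_a = sys.intern(a)
interned_b = sys.intern(b)

print('a == b:', a == b)
print('a is b:', a is b)
print('interned_a is interned_b:', interned_a is interned_b)
```

Note that `sys.intern()` returns the interned object, so the result has to be kept and used. Calling it without storing the return value does not change what `a` or `b` refer to.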


So, does interning really make a big speed difference?

Yes, but only if you are performing a lot of comparisons.

Let's run some quick and dirty benchmarks:

In [None]:
def compare_using_equals(n):
    a = 'a long string that is not interned' * 200
    b = 'a long string that is not interned' * 200
    for i in range(n):
        if a == b:
            pass

In [None]:
def compare_using_interning(n):
    a = sys.intern('a long string that is not interned' * 200)
    b = sys.intern('a long string that is not interned' * 200)
    for i in range(n):
        if a is b:
            pass

In [None]:
import time

start = time.perf_counter()
compare_using_equals(10000000)
end = time.perf_counter()

print('equality: ', end-start)

equality:  4.1064913000009255


In [None]:
start = time.perf_counter()
compare_using_interning(10000000)
end = time.perf_counter()

print('identity: ', end-start)

identity:  0.44426750000093307


As you can see, the performance difference, especially for long strings, and for many comparisons, can be quite radical!
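
The same comparison can be run with the standard library's `timeit` module, which handles the timing loop. This is only a rough sketch: the repetition count is an arbitrary choice, the names are illustrative, and the measured numbers will vary by machine and Python version.

```python
import sys
import timeit

text = 'a long string that is not interned' * 200

plain_a = text + '!'                # two equal strings built at run time,
plain_b = text + '!'                # stored as separate objects

interned_a = sys.intern(plain_a)
interned_b = sys.intern(plain_b)    # returns the same object as interned_a

repeats = 100_000
equality_time = timeit.timeit(lambda: plain_a == plain_b, number=repeats)
identity_time = timeit.timeit(lambda: interned_a is interned_b, number=repeats)

print('equality: ', equality_time)
print('identity: ', identity_time)
```

The equality test has to compare the characters of two separate objects, whereas the identity test only compares two references.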

### 13 - Python Optimizations - PeepHole

Peephole optimizations refer to a certain class of optimization strategies that Python employs during compilation.

**Constant Expressions**

Let's see how Python reduces constant expressions for optimization purposes:

In [None]:
def my_func():
    a = 24 * 60
    b = (1, 2) * 5
    c = 'abc' * 3
    d = 'ab' * 11
    e = 'the quick brown fox' * 10
    f = [1, 2] * 5

In [None]:
my_func.__code__.co_consts

(None,
 1440,
 (1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
 'abcabcabc',
 'ababababababababababab',
 'the quick brown foxthe quick brown foxthe quick brown foxthe quick brown foxthe quick brown foxthe quick brown foxthe quick brown foxthe quick brown foxthe quick brown foxthe quick brown fox',
 1,
 2,
 5)

As you can see in the example above `24 * 60` was pre-calculated and cached as a constant (`1440`).

Similarly, `(1, 2) * 5` was cached as `(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)` and `'abc' * 3` was cached as `'abcabcabc'`.

On the other hand, note how `'the quick brown fox' * 10` was not pre-calculated (too long).

Similarly, `[1, 2] * 5` was not pre-calculated either, since a list is mutable and hence not a constant.
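
A practical consequence is that a readable constant expression usually costs nothing extra at run time. The sketch below, assuming CPython, checks this for a hypothetical function `seconds_per_day` by looking for the folded value in the function's constants.

```python
def seconds_per_day():
    # Written as an expression for readability; folded at compile time.
    return 24 * 60 * 60

consts = seconds_per_day.__code__.co_consts

print(consts)
print(86400 in consts)
print(seconds_per_day())
```

The exact contents of `co_consts` are an implementation detail and can differ between Python versions, so this kind of inspection is best kept for exploration.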

**Membership Tests**

In membership testing, optimizations are applied as can be seen below:

In [None]:
def my_func():
    if e in [1, 2, 3]:
        pass

In [None]:
my_func.__code__.co_consts

(None, (1, 2, 3))

As you can see, the mutable list `[1, 2, 3]` was converted to an immutable tuple.

It is OK to do this here, since we are testing membership of the list at that point in time. It is therefore safe to convert it to a tuple, which is more efficient than testing membership of a list.

In the same way, set membership will be converted to frozen set membership:

In [None]:
def my_func():
    if e in {1, 2, 3}:
        pass

In [None]:
my_func.__code__.co_consts

(None, frozenset({1, 2, 3}))

In general, when you are writing your code, if you can use `set` membership testing, prefer that over a list or tuple - it is quite a bit more efficient.

Let's do a small quick (and dirty) benchmark of this:

In [None]:
import string
import time

char_list = list(string.ascii_letters)
char_tuple = tuple(string.ascii_letters)
char_set = set(string.ascii_letters)

print(char_list)
print()
print(char_tuple)
print()
print(char_set)

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']

('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z')

{'d', 'A', 'x', 't', 'y', 'X', 'H', 'I', 'b', 'm', 'p', 'i', 'z', 'j', 'l', 'S', 'v', 'C', 'Q', 'n', 'L', 'B', 'r', 'J', 'T', 'Y', 's', 'W', 'q', 'N', 'o', 'G', 'F', 'O', 'h', 'f', 'w', 'E', 'u', 'k', 'D', 'Z', 'P', 'V', 'M', 'g', 'U', 'a', 'c', 'e', 'R', 'K'}


In [None]:
def membership_test(n, container):
    for i in range(n):
        if 'p' in container:
            pass

In [None]:
start = time.perf_counter()
membership_test(10000000, char_list)
end = time.perf_counter()
print('list membership: ', end-start)

list membership:  3.376766100000168


In [None]:
start = time.perf_counter()
membership_test(10000000, char_tuple)
end = time.perf_counter()
print('tuple membership: ', end-start)

tuple membership:  3.5227051000001666


In [None]:
start = time.perf_counter()
membership_test(10000000, char_set)
end = time.perf_counter()
print('set membership: ', end-start)

set membership:  0.4033576000001631


As you can see, set membership tests run quite a bit faster - which is not surprising, since sets are basically dictionary-like objects and use a hash map to look up an item when determining membership.
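
In practice this often means building the set once and reusing it for every lookup. The following small sketch shows that pattern. The names `VOWELS` and `count_vowels` are hypothetical, and a `frozenset` is used on the assumption that the collection never needs to change.

```python
# Built once at import time and reused for every membership test.
VOWELS = frozenset('aeiouAEIOU')

def count_vowels(text):
    # Each `in` test is a hash lookup rather than a scan through a sequence.
    return sum(1 for ch in text if ch in VOWELS)

print(count_vowels('Hello World'))
print(count_vowels('rhythm'))
```

For a collection this small the gain is modest. The benefit grows with the number of elements and the number of lookups, as the benchmark above suggests.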