# Python Tutorial

Inspired by Stanford CS224n.

## Collections

Python has several built-in types that are useful for storing and manipulating data: list, tuple, dict. Here is the official Python documentation on these types (and many others): https://docs.python.org/3/library/stdtypes.html.

### Lists

Lists are mutable arrays. Let's see how they work.

In [1]:
names = ["Zach", "Jay"]

In [2]:
# Index into list by index
print(names[0])

Zach


In [3]:
# Append to list (appends to end of list)
names.append("Richard")
print(names)

['Zach', 'Jay', 'Richard']


In [4]:
# Get length of list
print(len(names))

3


In [5]:
# Concatenate two lists
# += is shorthand for list1 = list1 + list2, except that for lists it extends list1 in place
# (augmented assignment also exists for -, *, / and works on other types of variables)
names += ["Abi", "Kevin"]
print(names)

['Zach', 'Jay', 'Richard', 'Abi', 'Kevin']
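
For lists, `names += other` and `names = names + other` produce the same contents, but the first mutates the existing list while the second builds a new one. The difference shows up when two names refer to the same list. A minimal sketch (the variable names are illustrative):

```python
# `alias` and `original` refer to the same list object
original = ["Zach", "Jay"]
alias = original

# += mutates the list in place, so the alias sees the change
original += ["Richard"]
print(alias)      # ['Zach', 'Jay', 'Richard']

# + builds a new list and rebinds the name; the alias is unaffected
original = original + ["Abi"]
print(original)   # ['Zach', 'Jay', 'Richard', 'Abi']
print(alias)      # ['Zach', 'Jay', 'Richard']
```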


In [6]:
# Two ways to create an empty list
more_names = []
more_names = list()

In [7]:
# Create a list that contains different data types; this is allowed in Python
stuff = [1, ["hi", "bye"], -0.12, None]
print(stuff)

[1, ['hi', 'bye'], -0.12, None]


List slicing is a useful way to access a slice of elements in a list.

In [8]:
numbers = [0, 1, 2, 3, 4, 5, 6]

# Slices from start index (inclusive) to end index (exclusive)
print(numbers[0:3])

[0, 1, 2]


In [9]:
# When start index is not specified, it is start of list
# When end index is not specified, it is end of list
print(numbers[:3])
print(numbers[5:])

[0, 1, 2]
[5, 6]


In [10]:
# A bare : selects every element along a dimension (on a list, this makes a shallow copy); it is especially useful with numpy arrays
print(numbers[:])

[0, 1, 2, 3, 4, 5, 6]


In [11]:
# Negative indices count from the end of the list
print(numbers[-1])
print(numbers[-3:])
print(numbers[3:-2])

6
[4, 5, 6]
[3, 4]
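
Slices also accept an optional third value, the step, written `start:end:step`. The short sketch below reuses the `numbers` list from above and shows a positive and a negative step:

```python
numbers = [0, 1, 2, 3, 4, 5, 6]

# Every second element, starting at index 0
evens = numbers[::2]
print(evens)              # [0, 2, 4, 6]

# Every second element from index 1 (inclusive) to index 6 (exclusive)
odds = numbers[1:6:2]
print(odds)               # [1, 3, 5]

# A negative step walks backwards; [::-1] gives a reversed copy
reversed_numbers = numbers[::-1]
print(reversed_numbers)   # [6, 5, 4, 3, 2, 1, 0]
```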


### Tuples

Tuples are immutable arrays. Let's see how they work.

In [12]:
# Use parentheses for tuples, square brackets for lists
names = ("Zach", "Jay")

In [13]:
# The syntax for accessing an element and getting the length is the same as for lists
print(names[0])
print(len(names))

Zach
2


In [14]:
# But unlike lists, tuples do not support item re-assignment
names[0] = "Richard"

TypeError: 'tuple' object does not support item assignment

In [15]:
# Create an empty tuple
empty = tuple()
print(empty)

# Create a tuple with a single item, the comma is important
single = (10,)
print(single)

()
(10,)
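
Two common uses of tuples follow from their fixed size and immutability: unpacking into separate names, and serving as dictionary keys. A small sketch (the names `point` and `grid` are made up for illustration):

```python
# Tuple unpacking assigns each element to its own name
point = (3, 4)
x, y = point
print(x, y)          # 3 4

# Swapping values relies on tuple packing and unpacking
x, y = y, x
print(x, y)          # 4 3

# Tuples are hashable when their elements are, so they can be dict keys; lists cannot
grid = {(0, 0): "origin", (3, 4): "target"}
print(grid[point])   # target
```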


### Dictionaries

Dictionaries are hash maps. Let's see how they work.

In [16]:
# Two ways to create an empty dictionary
phonebook = {}
phonebook = dict()

In [17]:
# Create dictionary with one item
phonebook = {"Zach": "12-37"}
# Add another item
phonebook["Jay"] = "34-23"

In [18]:
# Check if a key is in the dictionary
print("Zach" in phonebook)
print("Kevin" in phonebook)

True
False


In [19]:
# Get corresponding value for a key
print(phonebook["Jay"])

34-23


In [20]:
# Delete an item
del phonebook["Zach"]
print(phonebook)

{'Jay': '34-23'}
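
Indexing with a key that is not present raises `KeyError`. When a missing key is an expected case, `dict.get` is a convenient alternative. A minimal sketch, continuing with a phonebook that contains only Jay:

```python
phonebook = {"Jay": "34-23"}

# Indexing a missing key raises KeyError ...
try:
    phonebook["Zach"]
except KeyError as e:
    print("missing key:", e)   # missing key: 'Zach'

# ... while .get returns None, or a default you supply
zach_number = phonebook.get("Zach")
kevin_number = phonebook.get("Kevin", "unknown")
print(zach_number)             # None
print(kevin_number)            # unknown

# .get does not insert the key
print("Kevin" in phonebook)    # False
```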


## Loops

In [21]:
# Basic for loop
for i in range(5):
    print(i)

0
1
2
3
4


In [22]:
# To iterate over a list
names = ["Zach", "Jay", "Richard"]
for name in names:
    print(name)

Zach
Jay
Richard


In [23]:
# To iterate over indices and values in a list
# Way 1
for i in range(len(names)):
    print(i, names[i])

print("---")

# Way 2
for i, name in enumerate(names):
    print(i, name)

0 Zach
1 Jay
2 Richard
---
0 Zach
1 Jay
2 Richard


In [24]:
# To iterate over a dictionary
phonebook = {"Zach": "12-37", "Jay": "34-23"}

# Iterate over keys
for name in phonebook:
    print(name)

print("---")

# Iterate over values
for number in phonebook.values():
    print(number)

print("---")

# Iterate over keys and values
for name, number in phonebook.items():
    print(name, number)

Zach
Jay
---
12-37
34-23
---
Zach 12-37
Jay 34-23


### List comprehensions

In [25]:
l = [i for i in range(10)]
print(l)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [26]:
l = [i * j for i in range(10) 
            for j in range(1, 3)]
print(l)

[0, 0, 1, 2, 2, 4, 3, 6, 4, 8, 5, 10, 6, 12, 7, 14, 8, 16, 9, 18]
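
The order of the `for` clauses matters: the leftmost one is the outer loop. The sketch below spells out the equivalent nested loops and also shows a filtering `if` clause and a dict comprehension (the variable names are illustrative):

```python
# The nested comprehension is equivalent to nested for loops,
# with the leftmost `for` as the outer loop
products = []
for i in range(10):
    for j in range(1, 3):
        products.append(i * j)
print(products == [i * j for i in range(10) for j in range(1, 3)])  # True

# A trailing `if` filters elements
even_squares = [i * i for i in range(10) if i % 2 == 0]
print(even_squares)   # [0, 4, 16, 36, 64]

# The same syntax with braces builds a dict
name_lengths = {name: len(name) for name in ["Zach", "Jay", "Richard"]}
print(name_lengths)   # {'Zach': 4, 'Jay': 3, 'Richard': 7}
```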


## Functions

In [27]:
def foo(arg1, arg2='default value'):
    return arg1, arg2

In [28]:
foo(5)

(5, 'default value')

In [193]:
# Type hints are optional and are just that: hints.
# Python does not enforce them at runtime.
def bar(arg1: int, arg2: str='default value') -> tuple[int, str]:
    return arg1, arg2

In [194]:
bar(42, 'meaning')

(42, 'meaning')

In [195]:
bar('meaning', 42)

('meaning', 42)

#### `*args` & `**kwargs`

In [196]:
def bar(*args, **kwargs):
    print(f'args = {args} {type(args)}')
    print(f'kwargs = {kwargs} {type(kwargs)}')

In [30]:
bar(4, 2, 'AI',
    animals=['pandas', 'cats', ''],
    people={'Knut', 'Dijkstra', 'Curry'})

args = (4, 2, 'AI') <class 'tuple'>
kwargs = {'animals': ['pandas', 'cats', ''], 'people': {'Dijkstra', 'Knut', 'Curry'}} <class 'dict'>
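
The same `*` and `**` symbols work in the other direction at the call site, where they unpack a sequence or a dict into arguments. A short sketch, using a hypothetical `describe` function:

```python
def describe(name, *hobbies, **details):
    # Extra positional arguments land in `hobbies`, extra keyword arguments in `details`
    return f"{name}: hobbies={hobbies}, details={details}"

# * unpacks a sequence into positional arguments,
# ** unpacks a dict into keyword arguments
positional = ["Zach", "chess", "running"]
keyword = {"city": "Palo Alto"}

description = describe(*positional, **keyword)
print(description)
# Zach: hobbies=('chess', 'running'), details={'city': 'Palo Alto'}
```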


### Higher-Order Functions: Map, Filter, and Zip

In [31]:
def fact(n):
    return 1 if n < 2 else n * fact(n-1)

In [32]:
map(fact, [1, 2, 3, 4, 5])

<map at 0x23ee3558c10>

In [33]:
l = list(map(fact, [1, 2, 3, 4, 5]))
print(l)

[1, 2, 6, 24, 120]


In [50]:
lambda x : x + 1

<function __main__.<lambda>(x)>

In [49]:
(lambda x : x + 1)(1)

2

In [48]:
l1 = [1, 2, 3, 4, 5]
l2 = [10, 20, 30, 40, 50]

f = lambda x, y: x+y

m = map(f, l1, l2)
print(list(m))

[11, 22, 33, 44, 55]


In [55]:
l = [1, 2, 3, 4, 5, 6, 7, 8, 9]
result = filter(lambda x: x % 2 == 0, l)
print(list(result))

[2, 4, 6, 8]


In [54]:
l1 = 1, 2, 3
l2 = 'a', 'b', 'c'
list(zip(l1, l2))

[(1, 'a'), (2, 'b'), (3, 'c')]
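
A few properties of `zip` are worth knowing: it stops at the shortest input, it can be reversed with `zip(*...)`, and it pairs naturally with `dict`. A minimal sketch with illustrative names:

```python
numbers = [1, 2, 3]
letters = ['a', 'b', 'c', 'd']

# zip stops at the shortest input, so 'd' is dropped
pairs = list(zip(numbers, letters))
print(pairs)              # [(1, 'a'), (2, 'b'), (3, 'c')]

# zip(*pairs) transposes the pairs back into two tuples
unzipped_numbers, unzipped_letters = zip(*pairs)
print(unzipped_numbers)   # (1, 2, 3)
print(unzipped_letters)   # ('a', 'b', 'c')

# dict(zip(...)) builds a mapping from two parallel sequences
lookup = dict(zip(letters, numbers))
print(lookup)             # {'a': 1, 'b': 2, 'c': 3}
```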

### Closures

In [56]:
def outer():
    x = 'python'
    def inner():
        print(x)
    return inner

In [57]:
fn = outer()

In [58]:
fn()

python


In [59]:
def outer():
    x = [1, 2, 3]
    print('outer:', hex(id(x)))
    def inner():
        print('inner:', hex(id(x)))
        print(x)
    return inner

In [60]:
fn = outer()

outer: 0x23ee364f940


In [61]:
fn()

inner: 0x23ee364f940
[1, 2, 3]


In [62]:
def adder(n):
    def inner(x):
        return x + n
    return inner

In [63]:
def create_adders():
    adders = []
    for n in range(1, 5):
        adders.append(lambda x: x + n)
    return adders

In [64]:
adders = create_adders()

In [65]:
adders

[<function __main__.create_adders.<locals>.<lambda>(x)>,
 <function __main__.create_adders.<locals>.<lambda>(x)>,
 <function __main__.create_adders.<locals>.<lambda>(x)>,
 <function __main__.create_adders.<locals>.<lambda>(x)>]

In [79]:
adders[0](10), adders[1](10)

(14, 14)

In [81]:
def create_adders():
    adders = []
    for n in range(1, 5):
        adders.append(lambda x, step=n: x + step)
    return adders

In [82]:
adders_2 = create_adders()

In [83]:
adders_2[0](10), adders_2[1](10)

(11, 12)
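
The first version returns `(14, 14)` because each lambda looks up `n` when it is called, not when it is defined. By then the loop has finished and `n` is 4. The default argument `step=n` fixes this because defaults are evaluated at definition time. Another option is `functools.partial`, sketched below (`make_adders` is a name chosen for this example):

```python
from functools import partial
from operator import add

def make_adders():
    # partial(add, n) stores the current value of n immediately
    return [partial(add, n) for n in range(1, 5)]

adders_3 = make_adders()
print(adders_3[0](10), adders_3[1](10))   # 11 12

# Late binding in isolation: the lambda sees the latest value of n
n = 1
late = lambda x: x + n
n = 100
print(late(10))                           # 110
```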

### Decorators

In [87]:
def counter(fn):
    count = 0
    
    def inner(*args, **kwargs):
        nonlocal count
        count += 1
        print(f'Function {fn.__name__} was called {count} times with args {args} and kwargs {kwargs}')
        return fn(*args, **kwargs)
    return inner

In [88]:
def add(a, b):
    return a + b

In [89]:
add = counter(add)

In [90]:
add(1, 2)

Function add was called 1 times with args (1, 2) and kwargs {}


3

In [96]:
add(1, b=4)

Function add was called 3 times with args (1,) and kwargs {'b': 4}


5

In [92]:
@counter
def mult(a, b):
    return a * b

In [93]:
mult(1, 2)

Function mult was called 1 times with args (1, 2) and kwargs {}


2

In [95]:
mult(4, b=2)

Function mult was called 3 times with args (4,) and kwargs {'b': 2}


8

In [141]:
def fib(n):
    print(f"Calculating {n}")
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

In [156]:
fib(3)

Calculating 3
Calculating 2
Calculating 1
Calculating 0
Calculating 1


2

In [157]:
fib(4)

Calculating 4
Calculating 3
Calculating 2
Calculating 1
Calculating 0
Calculating 1
Calculating 2
Calculating 1
Calculating 0


3

In [158]:
from functools import wraps

def memoize(fn):
    cache = dict()
    
    @wraps(fn)
    def inner(n):
        if n not in cache:
            cache[n] = fn(n)
        return cache[n]
    
    return inner

In [170]:
@memoize
def fib_memo(n):
    print(f"Calculating {n}")
    if n < 2:
        return n
    return fib_memo(n-1) + fib_memo(n-2)

In [171]:
fib_memo(3)

Calculating 3
Calculating 2
Calculating 1
Calculating 0


2

In [172]:
fib_memo(4)

Calculating 4


3

In [114]:
from functools import lru_cache

In [186]:
@lru_cache(10)
def fib_memo_2(n):
    print(f"Calculating {n}")
    if n < 2:
        return n
    return fib_memo_2(n-1) + fib_memo_2(n-2)

In [187]:
fib_memo_2(1), fib_memo_2(2), fib_memo_2(3)

Calculating 1
Calculating 2
Calculating 0
Calculating 3


(1, 1, 2)

In [188]:
fib_memo_2(10)

Calculating 10
Calculating 9
Calculating 8
Calculating 7
Calculating 6
Calculating 5
Calculating 4


55

In [189]:
fib_memo_2(10)

55
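
Functions wrapped with `lru_cache` expose `cache_info()` and `cache_clear()`, which make the caching visible without print statements. A minimal sketch (`fib_cached` is a fresh function defined for this example, with an unbounded cache instead of the size of 10 used above):

```python
from functools import lru_cache

@lru_cache(maxsize=None)   # unbounded cache
def fib_cached(n):
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)

print(fib_cached(10))      # 55

info = fib_cached.cache_info()
print(info.misses)         # 11: one per distinct argument 0..10
print(info.hits)           # 8: repeated subproblems served from the cache

fib_cached.cache_clear()
print(fib_cached.cache_info().currsize)   # 0
```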

### Generators

In [230]:
def even_numbers(start=2):
    if start % 2 == 0:
        number = start
    else:
        number = start + 1
    while True:
        yield number
        number += 2

In [231]:
evens = even_numbers(2)

In [232]:
next(evens), next(evens) 

(2, 4)

In [234]:
for i, el in enumerate(evens):
    print(el)
    if i == 5:
        break

6
8
10
12
14
16
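
Instead of breaking out of the loop by hand, `itertools.islice` can take a bounded number of items from an infinite generator. The sketch below restates `even_numbers` so it runs on its own, and also shows a generator expression:

```python
from itertools import islice

def even_numbers(start=2):
    # Same generator as above, restated so this sketch is self-contained
    number = start if start % 2 == 0 else start + 1
    while True:
        yield number
        number += 2

# islice takes a bounded slice of an infinite generator
first_five = list(islice(even_numbers(7), 5))
print(first_five)          # [8, 10, 12, 14, 16]

# A generator expression is lazy: nothing is computed until it is consumed
squares = (n * n for n in range(5))
first_square = next(squares)
remaining_squares = list(squares)
print(first_square)        # 0
print(remaining_squares)   # [1, 4, 9, 16]
```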


## OOP

In [197]:
class Person:
    def __init__(self, name):
        self.name = name

In [198]:
p = Person("Milen")

In [205]:
print(p)

<__main__.Person object at 0x0000023EFFCFB5E0>


In [199]:
p.name

'Milen'

In [209]:
class Person:
    def __init__(self, name):
        self.name = name
    def __str__(self) -> str:
        return self.name

In [210]:
p = Person("Milen")

In [211]:
print(p)

Milen


In [218]:
class Person:
    def __init__(self, name):
        self.name = name
    def __str__(self) -> str:
        return self.name
    
class Student(Person):
    def __init__(self, name, student_number):
        super().__init__(name)
        self.student_number = student_number
    def __str__(self) -> str:
        return f'{super().__str__()} with student number {self.student_number}'

In [219]:
s = Student('Ivan', 42)

In [220]:
print(s)

Ivan with student number 42


In [221]:
s.name

'Ivan'

### Exceptions

In [241]:
try:
    1/0  # evaluating this expression raises ZeroDivisionError
except Exception as e:
    print(e)
finally:
    print('Goodbye, world!')

division by zero
Goodbye, world!


In [244]:
try:
    1/0  # evaluating this expression raises ZeroDivisionError
except ZeroDivisionError as e:
    print(e)
    print("Isn't that just infinity")
except Exception as e:
    print(e)
finally:
    print('Goodbye, world!')

division by zero
Isn't that just infinity
Goodbye, world!
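
The `raise` statement expects an exception instance or class, and a `try` statement can also carry an `else` branch that runs only when nothing was raised. A minimal sketch, in which `NegativeInputError` and `safe_sqrt` are hypothetical names introduced for illustration:

```python
class NegativeInputError(ValueError):
    """Hypothetical custom exception used only in this sketch."""

def safe_sqrt(value):
    if value < 0:
        # raise takes an exception instance (or class)
        raise NegativeInputError(f"cannot take sqrt of {value}")
    return value ** 0.5

results = []
for value in (9, -1):
    try:
        root = safe_sqrt(value)
    except NegativeInputError as e:
        results.append(f"error: {e}")
    else:
        # Runs only when the try block raised nothing
        results.append(f"ok: {root}")
    finally:
        # Runs in both cases
        results.append("done")

print(results)
# ['ok: 3.0', 'done', 'error: cannot take sqrt of -1', 'done']
```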


### Duck typing, or why sometimes it is EAFP (easier to ask for forgiveness than permission)

In [247]:
class Duck:

    def quack(self):
        print('Quack, quack')

    def fly(self):
        print('Flap, Flap!')


class Person:

    def quack(self):
        print("I'm Quacking Like a Duck!")

    def fly(self):
        print("I'm Flapping my Arms!")


In [248]:
duck_duck = Duck()
duck_person = Person()

In [260]:
def quack_and_fly(thing):
    # Not Duck-Typed ("Non-Pythonic")
    # if isinstance(thing, Duck):
    #     thing.quack()
    #     thing.fly()
    # else:
    #     print('This has to be a Duck!')

    # # LBYL ("Non-Pythonic")
    # if hasattr(thing, 'quack'):
    #     if callable(thing.quack):
    #         thing.quack()

    if hasattr(thing, 'fly'):
        try:
            thing.quack()
            thing.fly()
            # thing.bark()
        except AttributeError as e:
            print(e)


In [261]:
quack_and_fly(duck_person)

I'm Quacking Like a Duck!
I'm Flapping my Arms!
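
The two styles can be compared side by side. In the sketch below, `area_lbyl` checks before acting (LBYL, "look before you leap") and `area_eafp` simply tries the call and handles failure. `Square` and both function names are made up for this example:

```python
def area_lbyl(shape):
    # LBYL: check before acting
    if hasattr(shape, "area") and callable(shape.area):
        return shape.area()
    return None

def area_eafp(shape):
    # EAFP: just try, and handle the failure
    try:
        return shape.area()
    except AttributeError:
        return None

class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

print(area_lbyl(Square(3)), area_eafp(Square(3)))           # 9 9
print(area_lbyl("not a shape"), area_eafp("not a shape"))   # None None
```

One caveat with the EAFP version: it also swallows an `AttributeError` raised inside `area()` itself, so keeping the `try` body as small as possible helps.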


### Special (magic, Dunder) Methods

In [311]:

class CallMe:
    def __init__(self, name):
        self.name = name
    def __call__(self, *args, **kwargs):
        print(f"My name is {self.name}. You called me with {args} args and {kwargs} kwargs")

In [312]:
call = CallMe("Object")
call(42, 'is not enough', statements="I don't believe in AI.")

My name is Object. You called me with (42, 'is not enough') args and {'statements': "I don't believe in AI."} kwargs


In [299]:
class Complex(object):
    def __init__(self, real, imag=0.0):
        self.real = real
        self.imag = imag
    
    def modulus(self):
        return float(self.real**2 + self.imag**2) ** (1/2)

    def __add__(self, other):
        if isinstance(other, Complex):
            return Complex(self.real + other.real,
                       self.imag + other.imag)
        return self + Complex(other)
             
    def __radd__(self, other):   # defines other + c when other is not a Complex
        return self + other

    def __sub__(self, other):
        return Complex(self.real - other.real,
                       self.imag - other.imag)

    def __mul__(self, other):
        return Complex(self.real*other.real - self.imag*other.imag,
                       self.imag*other.real + self.real*other.imag)

    def __rmul__(self, other):   # defines other * c when other is not a Complex
        return self * other

    def __truediv__(self, other):   # Python 3 uses __truediv__ for /, not __div__
        r = other.real**2 + other.imag**2   # squared modulus of the divisor
        return Complex((self.real*other.real + self.imag*other.imag)/r,
                       (self.imag*other.real - self.real*other.imag)/r)

    def __abs__(self):
        return self.modulus()

    def __neg__(self):   # defines -c (c is Complex)
        return Complex(-self.real, -self.imag)

    def __eq__(self, other):
        return self.real == other.real and self.imag == other.imag

    def __ne__(self, other):
        return not self.__eq__(other)

    def __str__(self):
        return f'{self.real}, {self.imag}'

    def __repr__(self):
        class_name = type(self).__name__
        return f'{class_name} {self}'

    def __pow__(self, power):
        raise NotImplementedError\
              ('self**power is not yet impl. for Complex')

In [300]:
c1 = Complex(1, 1)
c2 = Complex(2, 2)

In [301]:
print(c1 + c2)

3, 3


In [303]:
c1 + 5

Complex 6, 1.0

In [302]:
c1 + c2

Complex 3, 3
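
When an operator method does not know how to handle the other operand, the conventional response is to return `NotImplemented`, which lets Python try the reflected method (such as `__radd__`) on the other object. A minimal sketch of that protocol, using a hypothetical `Meters` class rather than the `Complex` class above:

```python
class Meters:
    """Hypothetical wrapper type used only for this sketch."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        # Lets Python try other.__radd__, or raise TypeError if that fails too
        return NotImplemented

    def __radd__(self, other):
        # Called for `other + self` when other.__add__ cannot handle Meters
        return self + other

    def __eq__(self, other):
        return isinstance(other, Meters) and self.value == other.value

    def __repr__(self):
        return f"Meters({self.value})"

print(Meters(1) + Meters(2))          # Meters(3)
print(Meters(1) + 5)                  # Meters(6)
print(5 + Meters(1))                  # Meters(6)

# sum() starts from 0, so the first step is 0 + Meters(1), which uses __radd__
print(sum([Meters(1), Meters(2)]))    # Meters(3)
```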

## NumPy
NumPy is a Python library that adds support for large, multi-dimensional arrays and matrices, along with a large collection of optimized, high-level mathematical functions that operate on these arrays.

You may need to install numpy first before importing it in the next cell.

There are many ways to manage your packages, but the workflow we suggest for this class is to use Anaconda.
 - Download Anaconda. Create a conda environment when you work on a new project.
 - Activate your conda environment and install libraries using conda or pip if they are not available in conda.
 - If you are running scripts on command line, run inside your conda environment.
 - If you are using a Jupyter notebook, add your conda environment to your Jupyter notebook: https://towardsdatascience.com/get-your-conda-environment-to-show-in-jupyter-notebooks-the-easy-way-17010b76e874. Create your Jupyter notebook and verify you're in your conda environment kernel (top right of notebook should display the name). If you're not, go to the Kernel tab on the top left and click Change kernel to change to your conda environment kernel.

In [35]:
# Import numpy
import numpy as np

In [36]:
# Create numpy arrays from lists
x = np.array([1,2,3])
a = np.array([[1,2,3]])


y = np.array([[3,4,5]])
z = np.array([[6,7],[8,9]])

# Let's take a look at their shapes.
# When working with numpy arrays, .shape will be a very useful debugging tool
print(x.shape)
print(y.shape)
print()
print(z)
print(z.shape)

(3,)
(1, 3)

[[6 7]
 [8 9]]
(2, 2)


Vectors can be represented as 1-D arrays of shape (N,) or 2-D arrays of shape (N, 1) or (1, N). But it's important to note that the shapes (N,), (N, 1), and (1,N) are not the same and may result in different behavior (we'll see some examples below involving matrix multiplication and broadcasting).

Matrices are generally represented as 2-D arrays of shape (M, N).

The best way to ensure your code gives you the behavior you expect is to keep track of your array shapes and try out small test cases or refer back to documentation when you are unsure.
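
As a small test case of the kind suggested above, the following sketch builds the same three numbers with shapes (3,), (3, 1), and (1, 3) and shows how transposition and `@` behave differently for each. The variable names are illustrative:

```python
import numpy as np

v = np.array([1, 2, 3])    # shape (3,)
col = v.reshape(3, 1)      # shape (3, 1)
row = v.reshape(1, 3)      # shape (1, 3)

# Transposing a 1-D array does nothing; transposing a 2-D array swaps its axes
print(v.T.shape)           # (3,)
print(col.T.shape)         # (1, 3)

# The same operator gives different results depending on shape
print((v @ v).shape)       # () : a scalar dot product
print((row @ col).shape)   # (1, 1)
print((col @ row).shape)   # (3, 3) : an outer product
```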

In [37]:
a = np.arange(10)
b = a.reshape((5, 2))
print(a)
print()
print(b)

[0 1 2 3 4 5 6 7 8 9]

[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]


### Array Operations

There are many NumPy operations that can be used to reduce a numpy array along an axis.

Let's look at the np.max operation (documentation: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.max.html).

In [38]:
x = np.array([[1,2],[3,4], [5, 6]])
print(x)
print()
print(x.shape)

[[1 2]
 [3 4]
 [5 6]]

(3, 2)


In [39]:
print(np.max(x, axis = 1))

[2 4 6]


In [40]:
print(np.max(x, axis = 1).shape)

(3,)


In [41]:
print(np.max(x, axis = 1, keepdims = True))

[[2]
 [4]
 [6]]


In [42]:
print(np.max(x, axis = 1, keepdims = True).shape)

(3, 1)
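
One reason `keepdims` matters is that the (3, 1) result lines up with the original (3, 2) array in arithmetic, while the (3,) result does not (the Broadcasting section below explains why). A minimal sketch that normalizes each row to sum to 1, using `sum` instead of `max`:

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])

# With keepdims=True the row sums have shape (3, 1), which is compatible with (3, 2)
row_sums = x.sum(axis=1, keepdims=True)
normalized = x / row_sums
print(normalized.sum(axis=1))   # [1. 1. 1.]

# Without keepdims the sums have shape (3,), which does not line up with (3, 2)
try:
    x / x.sum(axis=1)
    failed = False
except ValueError as e:
    failed = True
    print("ValueError:", e)
```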


Next, let's look at some matrix operations. Let's take an element-wise product (Hadamard product).

In [43]:
A = np.array([[1, 2], [3, 4]])
B = np.array([[3, 3], [3, 3]])
print(A)
print(B)
print("---")
print(A * B)

[[1 2]
 [3 4]]
[[3 3]
 [3 3]]
---
[[ 3  6]
 [ 9 12]]


We can do matrix multiplication with np.matmul or @.

In [44]:
# One way to do matrix multiplication
print(np.matmul(A, B))

# Another way to do matrix multiplication
print(A @ B)

[[ 9  9]
 [21 21]]
[[ 9  9]
 [21 21]]


We can take the dot product or a matrix vector product with np.dot.

In [45]:
u = np.array([1, 2, 3])
v = np.array([1, 10, 100])

print(np.dot(u, v))

# Can also call numpy operations on the numpy array, useful for chaining together multiple operations
print(u.dot(v))

321
321


In [46]:
W = np.array([[1, 2], [3, 4], [5, 6]])
print(v.shape)
print(W.shape)

# This works.
print(np.dot(v, W))
print(np.dot(v, W).shape)

(3,)
(3, 2)
[531 642]
(2,)


In [47]:
# This does not. Why?
print(np.dot(W, v))

ValueError: shapes (3,2) and (3,) not aligned: 2 (dim 1) != 3 (dim 0)

In [38]:
# We can fix the above issue by transposing W.
print(np.dot(W.T, v))
print(np.dot(W.T, v).shape)

[531 642]
(2,)


###  Indexing

Slicing / indexing numpy arrays is an extension of the Python concept of slicing (lists) to N dimensions.

In [39]:
x = np.random.random((3, 4))

# Selects all of x
print(x[:])

[[0.72911546 0.60028083 0.34112318 0.04628585]
 [0.39061284 0.55989781 0.59768014 0.27299391]
 [0.07681211 0.02859183 0.2518763  0.02461059]]


In [40]:
# Selects the 0th and 2nd rows
print(x[np.array([0, 2]), :])

print("---")

# Selects row 1 as a 1-D vector, then its elements at indices 1 and 2
print(x[1, 1:3])

[[0.72911546 0.60028083 0.34112318 0.04628585]
 [0.07681211 0.02859183 0.2518763  0.02461059]]
---
[0.55989781 0.59768014]


In [41]:
# Boolean indexing
print(x[x > 0.5])

[0.72911546 0.60028083 0.55989781 0.59768014]


In [42]:
# 3-D array of shape (3, 4, 1)
print(x[:, :, np.newaxis])

[[[0.72911546]
  [0.60028083]
  [0.34112318]
  [0.04628585]]

 [[0.39061284]
  [0.55989781]
  [0.59768014]
  [0.27299391]]

 [[0.07681211]
  [0.02859183]
  [0.2518763 ]
  [0.02461059]]]


### Broadcasting

The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations.

**General Broadcasting Rules**

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when:
- they are equal, or
- one of them is 1 (in which case, elements on the axis are repeated along the dimension)

More details: https://numpy.org/doc/stable/user/basics.broadcasting.html
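
The rules above can be written out as a few lines of plain Python. The sketch below is an illustration of the rule, not NumPy's actual implementation; `broadcast_shape` is a hypothetical helper, and it treats a missing leading dimension as 1, which is how NumPy handles arrays with fewer dimensions:

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk both shapes from the trailing dimension; a missing dimension counts as 1
    for dim_a, dim_b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not compatible")
    return tuple(reversed(result))

print(broadcast_shape((3, 4), (3, 1)))   # (3, 4)
print(broadcast_shape((3, 1), (3,)))     # (3, 3)
print(broadcast_shape((3, 1), (1, 3)))   # (3, 3)

try:
    broadcast_shape((3, 4), (3,))
except ValueError as e:
    print(e)   # shapes (3, 4) and (3,) are not compatible
```

Recent NumPy versions provide `np.broadcast_shapes` for the same purpose, which can be used to check a shape computation before running the real operation.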

In [43]:
x = np.random.random((3, 4))

y = np.random.random((3, 1))
z = np.random.random((1, 4))

# In this example, y and z are broadcast to match the shape of x.
# y is broadcast along dim 1.
s = x + y
# z is broadcast along dim 0.
p = x * z

In [44]:
print(x.shape)
print()
print(y.shape)
print(s.shape)

(3, 4)

(3, 1)
(3, 4)


In [45]:
print(x.shape)
print()
print(s.shape)
print(p.shape)

(3, 4)

(3, 4)
(3, 4)


In [46]:
a = np.zeros((3, 3))
b = np.array([[1, 2, 3]])
print(a)
print()
print(a+b)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

[[1. 2. 3.]
 [1. 2. 3.]
 [1. 2. 3.]]


Let's look at a more complex example.

In [47]:
a = np.random.random((3, 4))
b = np.random.random((3, 1))
c = np.random.random((3, ))

What is the expected broadcasting behavior for these operations? What do the following operations give us? What are the resulting shapes?

In [48]:
result1 = b + b.T

print(b.shape)
print(b.T.shape)
print(result1.shape)
print(result1)

(3, 1)
(1, 3)
(3, 3)
[[0.38107631 0.40825014 1.13208597]
 [0.40825014 0.43542398 1.1592598 ]
 [1.13208597 1.1592598  1.88309563]]


In [49]:
result2 = a + c

print(a.shape)
print(c.shape)
print(result2.shape)
print(result2)

ValueError: operands could not be broadcast together with shapes (3,4) (3,) 
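
The error arises because the shape (3,) is compared against the trailing dimension of (3, 4), and 3 does not match 4. If the intent is to add one value of `c` to each row of `a`, giving `c` a trailing axis of length 1 makes the shapes compatible. A minimal sketch with freshly generated arrays of the same shapes:

```python
import numpy as np

a = np.random.random((3, 4))
c = np.random.random((3,))

# Adding a trailing axis turns c into a (3, 1) column, which broadcasts across columns
result = a + c[:, np.newaxis]
print(result.shape)                    # (3, 4)

# reshape expresses the same thing
same = a + c.reshape(3, 1)
print(np.array_equal(result, same))    # True
```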

In [50]:
result3 = b + c

print(b.shape)
print(c.shape)
print(result3.shape)
print(result3)

(3, 1)
(3,)
(3, 3)
[[0.46443248 1.00498129 0.96563004]
 [0.49160631 1.03215513 0.99280387]
 [1.21544213 1.75599095 1.7166397 ]]


### Efficient NumPy Code

When working with numpy arrays, avoid explicit Python for-loops over indices or axes whenever a vectorized operation exists. Such loops can slow your code dramatically (often ~10-100x; about 20x in the example below).

We can time code using the %%timeit magic. Let's compare using explicit for-loop vs. using numpy operations.

In [51]:
%%timeit
x = np.random.rand(1000, 1000)
for i in range(100, 1000):
    for j in range(x.shape[1]):
        x[i, j] += 5

158 ms ± 2.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [52]:
%%timeit
x = np.random.rand(1000, 1000)
x[np.arange(100,1000), :] += 5

7.32 ms ± 141 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
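
It is worth confirming that a vectorized rewrite computes the same result as the loop it replaces. The sketch below does this on a smaller array and uses a plain slice (`100:`) rather than an index array; the seed and array size are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only so the sketch is reproducible
x = rng.random((200, 50))

# Loop version: add 5 to every element in rows 100 and up
looped = x.copy()
for i in range(100, 200):
    for j in range(looped.shape[1]):
        looped[i, j] += 5

# Vectorized version: a basic slice selects the same rows
sliced = x.copy()
sliced[100:, :] += 5

print(np.array_equal(looped, sliced))   # True
```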
