#GirlsWhoML: Python Essentials

Author: Catherine Tong


##Introduction

This notebook covers essential Python concepts that we will use frequently during the GirlsWhoML course.

Feel free to go through the entire notebook or jump to any part. 

Structure: 

1.   Python Basics
2.   Arrays
3.   Array Math
4.   Colab Essentials

This is not a complete introduction to Python -- for that we recommend referring to the Python Numpy Tutorial from [CS231n](https://cs231n.github.io/python-numpy-tutorial/), from which this notebook has been adapted. 

We use **Python3** by default.

## PART I. Python Basics

These are some basic data types and operations that we will use frequently throughout the course:

####Numbers

Integers and floats work as you would expect from other languages:



In [None]:
x = 3
print(x, type(x))

3 <class 'int'>


In [None]:
print(x + 1)   # Addition
print(x - 1)   # Subtraction
print(x * 2)   # Multiplication
print(x ** 2)  # Exponentiation

4
2
6
9


In [None]:
y = 2.5
print(type(y))
print(y, y + 1, y * 2, y ** 2)

<class 'float'>
2.5 3.5 5.0 6.25


###Containers

####Lists

A list is a useful object for storing data. 

In [None]:
xs = [3, 1, 2]   # Create a list
print(xs, xs[2])
print(xs[-1])     # Negative indices count from the end of the list; prints "2"

[3, 1, 2] 2
2


In [None]:
xs[2] = 'foo'    # Lists can contain elements of different types
print(xs)

[3, 1, 'foo']


In [None]:
xs.append('bar') # Add a new element to the end of the list
print(xs)  

[3, 1, 'foo', 'bar']


You can find all the gory details about lists in the [documentation](https://docs.python.org/3.7/tutorial/datastructures.html#more-on-lists).

####Slicing

Slicing a list allows us to retrieve specific items from it. **Remember, we always count from 0, not 1.**

In [None]:
nums = list(range(5))    # range is a built-in that produces a sequence of integers; list() turns it into a list
print(nums)         # Prints "[0, 1, 2, 3, 4]"
print(nums[2:4])    # Get a slice from index 2 to 4 (exclusive); prints "[2, 3]"
print(nums[2:])     # Get a slice from index 2 to the end; prints "[2, 3, 4]"
print(nums[:2])     # Get a slice from the start to index 2 (exclusive); prints "[0, 1]"
print(nums[:])      # Get a slice of the whole list; prints "[0, 1, 2, 3, 4]"
print(nums[:-1])    # Slice indices can be negative; prints "[0, 1, 2, 3]"
nums[2:4] = [8, 9] # Assign a new sublist to a slice
print(nums)         # Prints "[0, 1, 8, 9, 4]"

[0, 1, 2, 3, 4]
[2, 3]
[2, 3, 4]
[0, 1]
[0, 1, 2, 3, 4]
[0, 1, 2, 3]
[0, 1, 8, 9, 4]


####For Loops

You can loop over the elements of a list like this:

In [None]:
nums = list(range(5))
for x in nums:
    print(x)

0
1
2
3
4
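
The gradient descent detour later in this part uses the built-in `enumerate` to loop over a list while also keeping track of each element's index. A minimal sketch, using a made-up list `colours` that is not part of the course code:

```python
# `colours` is a hypothetical example list, used only for illustration
colours = ['red', 'green', 'blue']

indexed = []
for idx, colour in enumerate(colours):
    # idx counts from 0; colour is the element at that position
    print('#{}: {}'.format(idx, colour))
    indexed.append((idx, colour))
```

Each pass through the loop receives a pair: the position first, then the element.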


For loops are very useful when we want to iteratively update values, for example:

In [None]:
a = 0
for step in range(10):
  a = a + 2
  print('step = {}, a = {}'.format(step, a))

print('Final a = {}'.format(a))

step = 0, a = 2
step = 1, a = 4
step = 2, a = 6
step = 3, a = 8
step = 4, a = 10
step = 5, a = 12
step = 6, a = 14
step = 7, a = 16
step = 8, a = 18
step = 9, a = 20
Final a = 20


You will see that this structure is very similar to how we implement gradient descent in Lecture 2 Logistic Regression! 

Run the following two cells and inspect the similarities: 

In [None]:
#@title Quick detour: gradient descent
import numpy as np
cluster_centres = np.array([[-1, -3], [2, 2]])
colour_list = ['red', 'blue']

def generateData(num_samples_pc, dimensions=2, cluster_locs=cluster_centres):
    data = []
    labels = []

    for c, locs in enumerate(cluster_locs):
        pos = np.random.randn(num_samples_pc, dimensions) + locs
        data.append(pos)
        labels.append(np.ones(num_samples_pc) * c)

    data_np = np.concatenate(data, axis=0)
    labels_np = np.concatenate(labels, axis=0)
    return data_np, labels_np

train_data, train_labs = generateData(100)

def sigmoid(x):
  sigmoid_value = 1 / (1 + np.exp(-x))
  return sigmoid_value

def predictive_prob(x, a, b, c):
  f_x = x[:, 0] * a + x[:, 1] * b + c # might be easiest to start with for loop
  pred_prob = sigmoid(f_x)
  return pred_prob

def bce_loss(pred_probs, y):
  ## TODO ##
  pred_probs = np.clip(pred_probs, 1e-6, 1 - 1e-6)
  bce = np.sum(-y * np.log(pred_probs) - (1 - y) * np.log(1 - pred_probs))
  ## TODO end ##
  return bce

def compute_gradients(x, a, b, c, y):
  grad_a = np.sum(((predictive_prob(x, a, b, c)) - y) * x[:, 0])
  grad_b = np.sum(((predictive_prob(x, a, b, c)) - y) * x[:, 1])
  grad_c = np.sum(((predictive_prob(x, a, b, c)) - y))
  return grad_a, grad_b, grad_c

In [None]:
# Initial values of the weights (here we're updating 3 values: a, b, c)
a = 2
b = -1
c = -5

# Define parameters for the iterative procedure
num_steps = 100
lr = 5e-3
print_every = 10

# Perform iteration!
for i in range(num_steps):

  # define an update operation
  grad_a, grad_b, grad_c = compute_gradients(train_data, a, b, c, train_labs)

  # the update step
  a = a - lr * grad_a
  b = b - lr * grad_b
  c = c - lr * grad_c

  if i % print_every == 0:
    print("Iteration %i\tLoss %.3f" % (i, bce_loss(predictive_prob(train_data, a, b, c), train_labs)))

# Final weights
print(a, b, c)

Iteration 0	Loss 95.857
Iteration 10	Loss 14.079
Iteration 20	Loss 8.570
Iteration 30	Loss 6.097
Iteration 40	Loss 4.632
Iteration 50	Loss 3.676
Iteration 60	Loss 3.021
Iteration 70	Loss 2.554
Iteration 80	Loss 2.209
Iteration 90	Loss 1.948
3.600601031795083 3.162224099866144 -2.5749136153775756


**Aside:** Do you know what this line means? 

```if i % print_every == 0: ```

Let's look at this: 

In [None]:
print(0 % 5, 0 % 5 == 0) # gives you 0
print(1 % 5, 1 % 5 == 0)
print(2 % 5, 2 % 5 == 0)
print(3 % 5, 3 % 5 == 0)
print(4 % 5, 4 % 5 == 0)
print(5 % 5, 5 % 5 == 0) # gives you 0
print(6 % 5, 6 % 5 == 0)
print(7 % 5, 7 % 5 == 0)
print(8 % 5, 8 % 5 == 0)
print(9 % 5, 9 % 5 == 0)
print(10 % 5, 10 % 5 == 0) # gives you 0

0 True
1 False
2 False
3 False
4 False
0 True
1 False
2 False
3 False
4 False
0 True


`a % b` gives you the remainder when you divide $a$ by $b$.

So writing ```if i % b == 0``` is just a nice trick that we use to print statements every time the iteration number $i$ hits a multiple of $b$. 
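
The remainder is tied to integer division: for a non-negative integer `a` and a positive integer `b`, `a` equals `(a // b) * b + a % b`. The sketch below checks this for a few arbitrary values and then lists the iteration numbers at which the `if i % print_every == 0` condition holds:

```python
b = 5
for a in range(11):
    quotient = a // b      # how many whole times b fits into a
    remainder = a % b      # what is left over
    assert a == quotient * b + remainder

# Iteration numbers that trigger a print when print_every = 10
print_every = 10
printed_steps = [i for i in range(100) if i % print_every == 0]
print(printed_steps)
```

These are the same iteration numbers that appear in the gradient descent output above.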

###Functions

A function is a block of code which performs a specific task.

Python functions are defined using `def`. For example:

In [None]:
def sign(x):
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'


A function will only run when it is called:

In [None]:
for x in [-1, 0, 1]:
    print(sign(x))

negative
zero
positive


Sometimes we define functions to take optional keyword arguments, like this:

In [None]:
def hello(name, loud=False):
    if loud:
        print('HELLO, {}'.format(name.upper()))
    else:
        print('Hello, {}!'.format(name))

hello('Bob')
hello('Fred', loud=True)

Hello, Bob!
HELLO, FRED
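
Functions can also return several values at once, separated by commas; the caller unpacks them in the same order. This is the pattern that `compute_gradients` uses in the gradient descent detour above. A small sketch with a hypothetical helper called `min_and_max`:

```python
def min_and_max(values):
    # Return two results together; Python packs them into a tuple
    return min(values), max(values)

lowest, highest = min_and_max([3, 1, 2])  # unpack in the same order
print(lowest, highest)  # prints "1 3"
```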


### Classes (*new!*)💥

As we tackle more sophisticated data / ML pipelines, we'll need to use something called **classes**. 

Classes may look scary, but they're really useful for bundling data and methods together. 

Let's look at an example of a class below:


In [None]:
class Person:

  def __init__(self, first_name, last_name):
    # Create instance variables
    self.first_name = first_name
    self.last_name = last_name
  
  def get_full_name(self):
    # Instance method
    full_name = self.first_name + ' ' + self.last_name
    return full_name

How do we use it then? We need to *instantiate* it first.

**Remember:** the `__init__` function is called when you instantiate:

In [None]:
# Construct a new object instance of the Person class
obj = Person('Ada', 'Lovelace')

To check that `__init__` has been executed, try this:

In [None]:
print(obj.first_name) # prints 'Ada' because it was defined in __init__

Ada


You can now make use of the instantiated object to do all sorts of things, like calling its method:

In [None]:
print(obj.get_full_name())

Ada Lovelace


#### Inheritance

You also need to know about something called **inheritance**! 💸💸

Say you have a "parent" class. A "child" class can inherit the properties of the "parent" class like this:

```
class parent:
  statements

class child(parent):
  statements
```

But what does it do? Let's look at a more concrete example.

Pay attention to the `super()` line.

In [None]:
class Intro(Person):

  def __init__(self, first_name, last_name, pronoun):
    super().__init__(first_name, last_name) # the super() line
    self.pronoun = pronoun

  def describe(self):
    line = self.pronoun + ' is ' + self.get_full_name() + '.'
    return line


The `super()` line is really important! 

It calls the `__init__` function of the parent class (`Person`) and passes the parameters `first_name` and `last_name` to it. 

This saves you time, as you don't have to write these lines again (`self.first_name = first_name`, `self.last_name = last_name`).

To verify this: 

In [None]:
child = Intro('Ada', 'Lovelace', 'She')
print(child.get_full_name()) # inherited method from Person class
print(child.describe())

Ada Lovelace
She is Ada Lovelace.


When writing ML code, class inheritance can be very handy! 

It allows us to easily recycle either classes we've written ourselves or those that come from pre-existing packages.
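
One way to see this recycling in action is to override an inherited method while still reusing the parent's version of it. The sketch below uses two made-up classes, `Animal` and `Dog`, which are not part of the course code:

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        return self.name + ' makes a sound.'


class Dog(Animal):
    # No __init__ here, so Dog reuses Animal.__init__ unchanged

    def speak(self):
        # Override the parent's method, but still reuse its result
        return super().speak() + ' Woof!'


pet = Dog('Rex')
print(pet.speak())              # prints "Rex makes a sound. Woof!"
print(isinstance(pet, Animal))  # prints "True": a Dog is also an Animal
```

Because `Dog` does not define `__init__`, the parent's version runs when we write `Dog('Rex')`.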

#### Classes for Neural Networks

For the keen beans -- here's a sneak peek of what you're going to encounter in Neural Networks:

```
import torch
from torch import nn

class OneLayerNet(nn.Module):
    def __init__(self, input_dim, output_dim): 
        super().__init__()
        self.my_layer_1 = nn.Linear(input_dim, output_dim)
        self.activation = nn.Sigmoid()

    def forward(self, x):
        hid1 = self.my_layer_1(x)
        output = self.activation(hid1)
        return output

# to instantiate
my_network = OneLayerNet(3, 4) 

# to do a forward pass on some random data
x = torch.rand(16, 3)
output = my_network.forward(x)

```

The same principle holds. We are defining a neural network here by inheriting from a parent (aka base) class. This is `nn.Module`, a pre-defined class from the PyTorch library (imported as `torch`). 

We then define the properties of our network in the `__init__` function. And we also add a `forward` function in order to define our forward pass.

You'll learn more about the meaning of these operations during your Neural Networks workshop 🤓

That's it! 


##PART II. Arrays

We use multi-dimensional arrays a lot for ML, whether during data preparation or model training.

In this part we'll take a closer look at Numpy arrays.


#### About Numpy 


Here are a few things you should know about Numpy if you aren't familiar with Python packages. 

*   Numpy is the core library for scientific computing in Python. 
*   It provides a high-performance multidimensional array object, along with many tools for working with these arrays. 
*   If you are already familiar with MATLAB, you might find this [tutorial](https://numpy.org/doc/stable/user/numpy-for-matlab-users.html) useful to get started with Numpy.

To use Numpy, we first need to import the `numpy` package:



In [None]:
import numpy as np

###Arrays

Arrays are a key part of how we manipulate the inputs and outputs of a Machine Learning (ML) model. 

In our first lecture, we saw that we usually structure our data $X$ like this:


$$\mathbf X = \begin{bmatrix} \mathbf{x}_1^\top \\ \mathbf{x}_2^\top \\ \vdots \\ \mathbf{x}_{N}^\top \end{bmatrix}, \;\;\text{where}\;\; \mathbf{x}_i=\begin{bmatrix}x_1 \\ x_2 \\ \vdots \\ x_d\end{bmatrix}$$

In Python terms, this means we have $X$ as a numpy array with shape $(N, d)$, where $N$ is the number of datapoints, and $d$ is the dimension of each datapoint.


Let's look at the following example:

In [None]:
# Create the following array with shape (6, 3)
# [[ 1  2  3 ]
#  [ 4  5  6 ]
#  [ 7  8  9 ]
#  [10 11 12 ]
#  [13 14 15 ]
#  [16 17 18 ]]

data = np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12], [13,14,15], [16,17,18]])
print(data)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]
 [13 14 15]
 [16 17 18]]


In [None]:
print('Data shape (N x d): ', data.shape)
print('No. of datapoints (N): ', data.shape[0])
print('Dimension of each datapoint (d): ', data.shape[1])

Data shape (N x d):  (6, 3)
No. of datapoints (N):  6
Dimension of each datapoint (d):  3


### Preparing Data for a Supervised ML task

Slicing now comes in handy when we want to isolate specific features / datapoints.



Let's say we want to split our data array by taking 80% of the data as our training set and the rest as our test set.

In [None]:
train_size = int(0.8 * len(data))
print('Train set should have {} datapoints.'.format(train_size))

Train set should have 4 datapoints.


In [None]:
# slicing along the datapoint dimension (dim = 0)

train_set = data[:train_size] # this is equivalent to data[:train_size, :]
test_set = data[train_size:] # this is equivalent to data[train_size:, :]

print('Shape of train set: ', train_set.shape)
print('Shape of test set: ', test_set.shape)

Shape of train set:  (4, 3)
Shape of test set:  (2, 3)


In a supervised ML task, we define some parts of our data as input features, which we then use to predict the label; e.g. using the petal length and width (features) to predict the flower species (label).

Now let's suppose that the first 2 dimensions are our input features and the 3rd is our label which we want to predict.

In [None]:
# slicing along the feature dimension (dim = 1)
train_x = train_set[:, :2]
train_y = train_set[:, 2:]
print('Shape of train features X: ', train_x.shape)
print('Shape of train labels y: ', train_y.shape)

Shape of train features X:  (4, 2)
Shape of train labels y:  (4, 1)


Try writing your own code for slicing the `test_set` below. 
You should find that the results have shapes (2, 2) and (2, 1).

In [None]:
# slicing along the feature dimension (dim = 1)

test_x = test_set[:, :2]
test_y = test_set[:, 2:]
print('Shape of test features X: ', test_x.shape)
print('Shape of test labels y: ', test_y.shape)

Shape of test features X:  (2, 2)
Shape of test labels y:  (2, 1)


And that's it! 

Once we have our training and testing $X$ and $y$, we can use them to train and evaluate any machine learning model.

NB: 
*   ML models are usually developed with train / validation / test sets. To do so, split the data into 3 sets instead of 2 at the start. 
*   We have used the simplest scheme of dividing up the train / test set here. You can adopt an appropriate splitting scheme which reflects your dataset and task. For example, you can split your data by time and train only with the oldest data and test on the newest ones. 
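
A three-way split can be done with the same slicing idea. The sketch below shuffles the rows first so that each set is a random sample; the 60/20/20 proportions, the seed, the toy array and the `toy_` variable names are all arbitrary choices for illustration:

```python
import numpy as np

toy_data = np.arange(30).reshape(10, 3)   # 10 datapoints, 3 dimensions

rng = np.random.default_rng(seed=0)       # fixed seed for repeatability
shuffled = toy_data[rng.permutation(len(toy_data))]  # shuffle the rows

train_end = int(0.6 * len(shuffled))      # first 60% for training
val_end = int(0.8 * len(shuffled))        # next 20% for validation

toy_train = shuffled[:train_end]
toy_val = shuffled[train_end:val_end]
toy_test = shuffled[val_end:]             # final 20% for testing

print(toy_train.shape, toy_val.shape, toy_test.shape)
```

Shuffling is not appropriate for every dataset. For the time-based split mentioned above, you would keep the rows in time order instead.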



### Reshaping Arrays

Sometimes we need to reshape data in arrays. The simplest example of this type of operation is transposing a matrix; to transpose a matrix, simply use the `T` attribute of an array object:

In [None]:
print(train_x)
print("transpose:\n", train_x.T)

[[ 1  2]
 [ 4  5]
 [ 7  8]
 [10 11]]
transpose:
 [[ 1  4  7 10]
 [ 2  5  8 11]]
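
Another common reshaping tool is `np.reshape`, which rearranges the same elements into a new shape as long as the total number of elements stays the same. A short sketch on a made-up array:

```python
import numpy as np

flat = np.arange(6)              # shape (6,): [0 1 2 3 4 5]
grid = np.reshape(flat, (2, 3))  # same 6 elements, arranged as 2 rows x 3 columns
print(grid)
print(grid.T.shape)              # transposing swaps the two axes: (3, 2)

# -1 asks numpy to infer that dimension from the total size
column = flat.reshape(-1, 1)
print(column.shape)              # (6, 1)
```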


### Combining data

Sometimes our features may come from different sources, so we will need to combine them to form our desired array of shape $(N,d)$.

We'll introduce some handy functions: 
* `np.concatenate`
* `np.expand_dims`
* `np.stack`.

In [None]:
# Let's first create two feature arrays of shape (N,).

N = 4
feature_1 = np.arange(N)
feature_2 = np.arange(N, N*2)
print(feature_1)
print(feature_2)
print(feature_1.shape, feature_2.shape)

[0 1 2 3]
[4 5 6 7]
(4,) (4,)


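We want to combine these to get an array $X$ with shape $(N, d)$, i.e. (4, 2), which should look like the array printed below. There are two ways to go about this: concatenating and stacking.

In [None]:
# The target array, written out by hand for reference
x = np.array([[0, 4], [1, 5], [2, 6], [3, 7]])
print(x)
print('shape: ', x.shape)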

[[0 4]
 [1 5]
 [2 6]
 [3 7]]
shape:  (4, 2)


In [None]:
# Concatenating the 1-D arrays directly joins them end to end: shape (8,), not (4, 2)
x = np.concatenate([feature_1, feature_2], axis=0)
print(x.shape)
print(x)

(8,)
[0 1 2 3 4 5 6 7]


In [None]:
# expand_dims inserts a new axis of length 1 at position 1: (4,) -> (4, 1)
feature_1_exp = np.expand_dims(feature_1, 1)
print(feature_1_exp.shape)

(4, 1)


In [None]:
# 1. Combine by concatenating

# first expand dimensions of the features
feature_1_exp = np.expand_dims(feature_1, 1)
feature_2_exp = np.expand_dims(feature_2, 1)
print(feature_1_exp.shape, feature_2_exp.shape)

# then concatenate
x = np.concatenate([feature_1_exp, feature_2_exp], axis=1)
print(x.shape)

(4, 1) (4, 1)
(4, 2)


In [None]:
# 2. Combine by stacking
x = np.stack([feature_1, feature_2], axis=1)
print(x.shape)

(4, 2)
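
Both routes should give exactly the same array. A quick way to confirm this is `np.array_equal`; the sketch below rebuilds the two feature arrays so that it runs on its own, and the names `x_concat` and `x_stack` are chosen just for this comparison:

```python
import numpy as np

N = 4
feature_1 = np.arange(N)
feature_2 = np.arange(N, N * 2)

# Route 1: add a trailing axis to each feature, then concatenate along it
x_concat = np.concatenate(
    [np.expand_dims(feature_1, 1), np.expand_dims(feature_2, 1)], axis=1)

# Route 2: stack the 1-D features along a new axis 1
x_stack = np.stack([feature_1, feature_2], axis=1)

print(np.array_equal(x_concat, x_stack))  # prints "True"
```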


In [None]:
print(x)

**Note:** This exercise resembles Task 5a in your Linear Regression notebook!

##PART III. Array Math

We use mathematical functions that operate on arrays to help us implement ML algorithms.

### Basic mathematical functions

Basic mathematical functions operate **elementwise** on arrays.

In [None]:
a = np.array([[1,2],[3,4]], dtype=np.float64)
b = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both produce the same array
print(a + b)
print(np.add(a, b))

[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]


In [None]:
# Elementwise difference; both produce the same array
print(a - b)
print(np.subtract(a, b))

[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]


In [None]:
# Elementwise product; both produce the same array
print(a * b)
print(np.multiply(a, b))

[[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]


In [None]:
# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(a / b)
print(np.divide(a, b))

[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]


In [None]:
# Elementwise square root; produces the array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(np.sqrt(a))

[[1.         1.41421356]
 [1.73205081 2.        ]]


Numpy provides many useful functions for performing computations on arrays; one of the most useful is `sum`:

In [None]:
a = np.array([[1,2],[3,4], [5,6]])
print(a)
print(np.sum(a))  # Compute sum of all elements; prints "21"

[[1 2]
 [3 4]
 [5 6]]
21


### Array Axis

You can also sum along a particular **axis** in your array. 

Axis is an important concept when manipulating arrays. When you sum along the $i^{th}$ axis, you're collapsing data along that axis.

For a 2-D matrix, you can visualize it like this:


<div>
<img src="https://i.stack.imgur.com/h1alT.jpg" width="500"/>
</div>


In [None]:
print(np.sum(a, axis=0))  # Compute sum of each column; prints "[9 12]"
print(np.sum(a, axis=1))  # Compute sum of each row; prints "[3 7 11]"

[ 9 12]
[ 3  7 11]
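
A handy way to remember which axis is which: the axis you sum along disappears from the shape. A sketch using the same 3 x 2 array (`col_sums` and `row_sums` are names chosen for this example):

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])    # shape (3, 2)

col_sums = np.sum(a, axis=0)   # axis 0 (length 3) is collapsed
row_sums = np.sum(a, axis=1)   # axis 1 (length 2) is collapsed

print(a.shape, '->', col_sums.shape)   # (3, 2) -> (2,)
print(a.shape, '->', row_sums.shape)   # (3, 2) -> (3,)

# keepdims=True keeps the collapsed axis with length 1 instead of dropping it
kept = np.sum(a, axis=1, keepdims=True)
print(kept.shape)                      # (3, 1)
```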


**Broadcasting** lets Numpy perform arithmetic on arrays of different shapes. For example, we can add a vector `v` to every row of a matrix without actually creating multiple copies of `v`:

In [None]:
# We will add the vector v to each row of the matrix a,
# storing the result in the matrix b
a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
b = a + v  # Add v to each row of a using broadcasting
print(b)

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


The line `b = a + v` works even though `a` has shape `(4, 3)` and `v` has shape `(3,)` due to broadcasting; this line works as if `v` actually had shape `(4, 3)`, where each row was a copy of `v`, and the sum was performed elementwise.
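
To see what broadcasting saves us from, the sketch below builds the stacked copies of `v` by hand with `np.tile` and checks that the result matches. The names `vv`, `explicit` and `broadcast` are chosen just for this comparison:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
v = np.array([1, 0, 1])

vv = np.tile(v, (4, 1))      # stack 4 copies of v on top of each other
print(vv.shape)              # (4, 3)

explicit = a + vv            # elementwise sum with the explicit copies
broadcast = a + v            # numpy broadcasts v across the rows for us

print(np.array_equal(explicit, broadcast))   # prints "True"
```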

Here's another example of broadcasting:

In [None]:
# Multiply a matrix by a constant:
a = np.array([[1,2,3], [4,5,6]])
# a has shape (2, 3). Numpy treats scalars as arrays of shape ();
# these can be broadcast together to shape (2, 3), producing the
# following array:
print(a * 2)

[[ 2  4  6]
 [ 8 10 12]]


You can find the full list of mathematical functions provided by numpy in the [documentation](http://docs.scipy.org/doc/numpy/reference/routines.math.html).

### Matrix Multiplication

Note that `*` is elementwise multiplication, not matrix multiplication.

We instead use the `np.matmul` function to compute inner products of vectors, to multiply a vector by a matrix, and to multiply matrices.

In [None]:
a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])

v = np.array([9,10])
w = np.array([11, 12])

# Inner product of vectors; produces 219
print(np.matmul(v, w))

219


In [None]:
# Matrix / vector product; produces the rank 1 array [29 67]
print(np.matmul(a, v))

[29 67]


In [None]:
# Matrix / matrix product; produces the rank 2 array
# [[19 22]
#  [43 50]]
print(np.matmul(a, b))

[[19 22]
 [43 50]]
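
Python also offers the `@` operator as shorthand for matrix multiplication on Numpy arrays. A brief sketch comparing it with `np.matmul` and with elementwise `*`:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(a @ b)    # matrix product, same as np.matmul(a, b)
print(a * b)    # elementwise product, a different result

same_as_matmul = np.array_equal(a @ b, np.matmul(a, b))
print(same_as_matmul)   # prints "True"
```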


In [None]:
# Compute inverse of a matrix; 
a_inverse = np.linalg.inv(a)
print(a_inverse)

# check that the product of a matrix and its inverse ~ identity
print(np.matmul(a_inverse, a))

[[-2.   1. ]
 [ 1.5 -0.5]]
[[1.00000000e+00 0.00000000e+00]
 [1.11022302e-16 1.00000000e+00]]
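
The tiny `1.11e-16` entry above is floating-point rounding error, not a real deviation from the identity matrix. Comparing with `np.allclose` rather than `==` tolerates this kind of error; a minimal sketch:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
product = np.matmul(np.linalg.inv(a), a)
identity = np.eye(2)                       # 2 x 2 identity matrix

# allclose checks equality up to a small numerical tolerance
print(np.allclose(product, identity))      # prints "True"
```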


With these new tools, we can bring many seemingly complicated equations to life! 

Recall the least squares estimate in Linear Regression: $$ \mathbf{a} = (\mathbf X^\top \mathbf X)^{-1} \mathbf X^\top \mathbf y $$

In [None]:
# get some sample data as X and y
N, d = 4, 2
X = np.random.rand(N, d)
y = np.ones((N, 1))

# define a according to our equation
a = np.matmul(np.matmul(np.linalg.inv(np.matmul(X.T, X)), X.T), y)
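
One way to sanity-check this formula is to generate labels from known weights and confirm that they are recovered. In the sketch below, `true_a`, the seed and the noise-free data are made up for illustration, and `np.linalg.lstsq` serves as an independent reference solver:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
N, d = 20, 2
X = rng.random((N, d))
true_a = np.array([[2.0], [-3.0]])     # hypothetical weights, shape (d, 1)
y = np.matmul(X, true_a)               # noise-free labels, shape (N, 1)

# Least squares estimate from the equation above
a_hat = np.matmul(np.matmul(np.linalg.inv(np.matmul(X.T, X)), X.T), y)

# Reference solution from numpy's built-in least squares solver
a_ref, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(a_hat, true_a), np.allclose(a_hat, a_ref))
```

With real, noisy data the estimate will not match the underlying weights exactly, but it should still agree with the reference solver.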

This overview touched on many of the important things that you need to know about numpy and arrays for the exercises you will encounter in GirlsWhoML. We will provide hints whenever we ask you to use functions or modules which may be unfamiliar. 

However, this is not meant to be a complete picture of numpy. If you want to learn more, check out the [numpy reference](http://docs.scipy.org/doc/numpy/reference/).

## PART IV. Colab Essentials

You probably understand how Colab works by now. If not, check out our [getting started doc](https://drive.google.com/file/d/1a3atjWvJvB4jIvCueL1G6pwL3yCdotKC/view?usp=sharing).

### Typical Workflow

Let's walk through the common structure of our colab notebooks. 

1. Importing libraries

  You'll see import statements like this at the beginning of the notebook:
```
import numpy as np
import matplotlib.pyplot as plt
```
These are predefined libraries which help us handle operations behind the scenes. `Numpy` is for handling arrays, `matplotlib` is for plotting, etc. For example, we can now call `np.sum(array)` instead of slowly doing `array[0] + array[1] + ...`.

2. Helper functions

  These are functions for data loading and plotting. Being able to write these isn't the focus of this course, so we usually just write them for you.

3. The Core Tasks! 

  These are tasks that we've put together for you to implement what we've just covered in class. Relevant details are also found in the Text Cells directly preceding the tasks. If you're getting rusty with array math, review the previous section! 

Voilà! You've reached the end of this Python Essentials notebook. Happy coding! 

**Note:** Send us a message if you think there are other things we should cover here! 