<a href="https://colab.research.google.com/github/csbell-vu/py4-dsms-success/blob/main/5_vectorization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Vectorization
> Rounding out Python knowledge with vectorization

As you know from your DataCamp coursework, numpy is optimized for fast, vectorized numerical computation, including linear algebra and other mathematical functions. Let's check this out.
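To see the speed difference for yourself, here is a minimal sketch that computes the same sum of squares twice: once with a Python `for` loop and once with a single vectorized numpy expression. The array size (200,000 random values) is an arbitrary choice, and timings vary by machine, so treat the printed numbers as illustrative only.

```python
import time

import numpy as np

values = np.random.rand(200_000)

# Pure-Python loop: the interpreter handles one multiply-add per element
start = time.perf_counter()
loop_total = 0.0
for v in values:
    loop_total += v * v
loop_seconds = time.perf_counter() - start

# Vectorized: the multiply and the sum both run inside numpy's compiled code
start = time.perf_counter()
vec_total = np.sum(values * values)
vec_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f}s, vectorized: {vec_seconds:.4f}s")
print(np.isclose(loop_total, vec_total))  # both approaches agree
```

The two totals can differ in the last few digits because the additions happen in a different order, which is why the check uses `np.isclose` rather than `==`.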

In [None]:
import numpy as np
import pandas as pd

# Introduction to numpy
We've already seen some numpy operations in DataCamp. Let's remind ourselves of the functionality using some of our previous examples.

In [None]:
#@markdown We'll just hide the work that we're going to do to generate the dataset.
#@markdown It's a simple, inelegant way of generating just a bit more data.
#@markdown Make sure to execute this cell to have access to the dog_data dataset.

#original data
weight_kgs = [25.0, 20.22, 17.83, 10.22, 8.05]
height_cm = [68.0, 57.99, 45.21, 36.2, 10.22]
neck_circ_cm = [45.2, 50.35, 55.2, 40.88, 5.06]
back_length_cm = [63.2, 50.25, 43.8, 50.1, 12.5]
chest_circ_cm = [78.2, 86.92, 53.9, 71.2, 25.5]
breed = ['Afghan Hound', 'Airedale Terrier', 'Staffordshire Terrier', 'Australian Shepherd', 'Toy Poodle']

dog_data_sm = {'weight_kgs': weight_kgs,
            'height_cm': height_cm,
            'neck_circ_cm': neck_circ_cm,
            'back_length_cm': back_length_cm,
            'chest_circ_cm' : chest_circ_cm,
            'breed': breed}

#extend to 10 elements
dog_data = {key : value + (np.array(value) + 2*np.random.rand(len(value))).tolist()
            for key, value in dog_data_sm.items() if key!='breed'}

#fix breed to be twice as long
dog_data['breed'] = breed + breed

#for visualization purposes
pd.DataFrame(dog_data)

## Creation of numpy arrays

In [None]:
# Let's practice dictionary comprehensions to remove the breed key
dog_data_num = {key:value for key, value in dog_data.items() if key != 'breed'}

In [None]:
#Let's try an unnecessarily difficult approach to make this dataset
column_names, data = list(zip(*dog_data_num.items()))
column_names
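The "unnecessarily difficult" line above relies on `zip(*...)` to unzip a sequence of pairs. A small sketch of what happens, using the first two values of two columns from the dataset as a stand-in for `dog_data_num`:

```python
# Stand-in for dog_data_num: first two values of two columns
toy = {"weight_kgs": [25.0, 20.22], "height_cm": [68.0, 57.99]}

# items() yields (key, column) pairs:
#   ("weight_kgs", [25.0, 20.22]), ("height_cm", [68.0, 57.99])
# The * unpacks those pairs as separate arguments to zip, which then regroups
# them position by position: all the keys together, then all the columns together.
names, columns = zip(*toy.items())

print(names)    # ('weight_kgs', 'height_cm')
print(columns)  # ([25.0, 20.22], [68.0, 57.99])
```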

In [None]:
#Make target vector: the dog weights we want to predict
weight_kgs_np = np.array(dog_data_num['weight_kgs'])
weight_kgs_np

In [None]:
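#Create data: each element of `data` is one column, so transpose to get one row per dog
np_data = np.array(data).T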
np_data

In [None]:
#Remove target column from data (weight_kgs is column 0)
np_data = np_data[:, 1:]
np_data

## Learn about numpy array

In [None]:
# number of rows and columns
print(np_data.shape)

# number of dimensions of the data
print(np_data.ndim)

# data type of elements
print(np_data.dtype)

## Indexing
We can use dimension indices to access data for each dimension

In [None]:
# row 0 and all columns
np_data[0, :]

In [None]:
# row 5, column 2
np_data[5, 2]

In [None]:
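# boolean indexing: get rows where dog weights are greater than 20 kg
# (weight is no longer a column of np_data, so build the mask from the target vector)
np_data[weight_kgs_np > 20]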
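Boolean indexing works because comparing an array with a scalar is itself vectorized: it returns one `True`/`False` per element, and that boolean array can then select rows. The mask can come from a different array than the one being indexed, as long as its length matches the number of rows. A small sketch using the first four dogs' weights, heights, and neck circumferences:

```python
import numpy as np

# First four dogs: weights in kg, and a 2-column feature matrix (height, neck circumference)
weights = np.array([25.0, 20.22, 17.83, 10.22])
features = np.array([[68.0, 45.2],
                     [57.99, 50.35],
                     [45.21, 55.2],
                     [36.2, 40.88]])

# The comparison produces one boolean per dog
mask = weights > 20
print(mask)  # [ True  True False False]

# Indexing rows with the mask keeps only the rows where mask is True
heavy = features[mask]
print(heavy.shape)  # (2, 2)
```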

In [None]:
#create data for modeling: prepend a column of ones so that w_0 acts as the intercept (bias)
np_data_mdl = np.hstack((np.ones((np_data.shape[0], 1)), np_data))
np_data_mdl
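Why a column of ones? In a dot product, the weight paired with a constant 1 is added unchanged to every row's result, so it plays the role of the intercept $w_0$. A minimal sketch on a 3-row, 2-feature matrix (heights and neck circumferences of the first three dogs; the weights are arbitrary):

```python
import numpy as np

# 3 dogs x 2 features (height, neck circumference)
X = np.array([[68.0, 45.2],
              [57.99, 50.35],
              [45.21, 55.2]])

# A column of ones shaped (n_rows, 1) so that np.hstack can place it beside X
ones_col = np.ones((X.shape[0], 1))
X_mdl = np.hstack((ones_col, X))

print(X_mdl.shape)  # (3, 3)
print(X_mdl[:, 0])  # [1. 1. 1.]

# With the leading 1, the dot product includes w[0] as a plain intercept
w = np.array([1.0, 0.3, -0.4])
row = X_mdl[0]
print(np.isclose(row @ w, w[0] + w[1] * X[0, 0] + w[2] * X[0, 1]))  # True
```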

### Example 1: Converting to weight_lbs
Instead of using a for loop to create a new list, let's use broadcasting to convert `weight_kgs_np` to `weight_lbs_np`, where 1 kg ≈ 2.2 lbs.

In [None]:
# using broadcasting/vectorization: the scalar 2.2 is applied to every element
weight_lbs_np = weight_kgs_np * 2.2
weight_lbs_np
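For comparison, a short sketch of the loop version next to the broadcast version, using the five original dog weights. Both produce the same values; the broadcast version simply stretches the scalar `2.2` across every element without an explicit loop.

```python
import numpy as np

weights_kg = np.array([25.0, 20.22, 17.83, 10.22, 8.05])

# Loop version: build a new list one element at a time
weights_lb_loop = []
for kg in weights_kg:
    weights_lb_loop.append(kg * 2.2)

# Broadcast version: one expression covers every element
weights_lb_vec = weights_kg * 2.2

print(np.allclose(weights_lb_loop, weights_lb_vec))  # True
```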

### Example 2: A basic linear regression
Let's make a REALLY terrible predictor. We'll try to predict the weight based on the height, circumferences, and back lengths. What is this operation?

$$ \widehat{\text{dog weight}} = w_0 + (w_1 \cdot \text{height}) + (w_2 \cdot \text{neck circumference}) + (w_3 \cdot \text{back length}) + (w_4 \cdot \text{chest circumference}) $$
<center><br><img src = 'https://algebra1course.files.wordpress.com/2013/02/dot-product-visual.jpg' /></center>
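Written as a dot product, the model above is a single operation. If each dog's features are preceded by a constant 1 (which is what the column of ones in `np_data_mdl` provides), the intercept $w_0$ becomes just another weight:

$$ \mathbf{x} = \begin{bmatrix} 1 \\ \text{height} \\ \text{neck circumference} \\ \text{back length} \\ \text{chest circumference} \end{bmatrix}, \qquad \mathbf{w} = \begin{bmatrix} w_0 \\ w_1 \\ w_2 \\ w_3 \\ w_4 \end{bmatrix} $$

$$ \widehat{\text{dog weight}} = \mathbf{w}^\top \mathbf{x} = \sum_{j=0}^{4} w_j x_j = w_0 \cdot 1 + w_1 \cdot \text{height} + \dots + w_4 \cdot \text{chest circumference} $$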

In [None]:
# let's just start by making a list of arbitrary weights
w = [1, 0.3, -0.4, 2.2, 0.5]

# Let's turn this into a numpy array
np_w = np.array(w)

# Explore the shape and dimension
print(np_w, 'shape:', np_w.shape, 'dim:', np_w.ndim)

We can think of this problem as a particular linear algebra operation. Let's look at the [numpy linear algebra documentation](https://numpy.org/doc/stable/reference/routines.linalg.html).
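For two 1-D arrays, numpy offers several equivalent spellings of the dot product. A sketch using the weights above and the first dog's original measurements (with a leading 1 for the bias):

```python
import numpy as np

w = np.array([1, 0.3, -0.4, 2.2, 0.5])
# Leading 1 for the bias, then height, neck circumference, back length, chest circumference
x = np.array([1.0, 68.0, 45.2, 63.2, 78.2])

# For two 1-D arrays, all four lines compute the same inner (dot) product
a = np.dot(w, x)
b = w @ x
c = np.matmul(w, x)
d = np.sum(w * x)  # the definition spelled out: multiply element-wise, then add

print(a, b, c, d)
```

By hand: 1 + 20.4 - 18.08 + 139.04 + 39.1 = 181.46 (the printed values may show floating-point rounding in the last digits).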

In [None]:
# let's do the calculation for a single "row" or "dog": the dot product of row 0 with the weights
dogweight_pred = np.dot(np_data_mdl[0], np_w)

# view the result
dogweight_pred

How can we do this for all of the data at once? Numpy automatically performs [broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html) whenever the shapes of the arrays in an operation are compatible.

In [None]:
#use element-wise multiply with broadcasting, then sum each row, to obtain the dot product applied to each row
dogweight_preds = (np_data_mdl * np_w).sum(axis=1)

#view the result
dogweight_preds
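The multiply-then-sum pattern is exactly a matrix-vector product. A sketch on the first three dogs (bias column plus four features) showing the broadcast shapes and the equivalence with `@`:

```python
import numpy as np

w = np.array([1, 0.3, -0.4, 2.2, 0.5])            # shape (5,)
X = np.array([[1.0, 68.0, 45.2, 63.2, 78.2],       # shape (3, 5): bias column + 4 features
              [1.0, 57.99, 50.35, 50.25, 86.92],
              [1.0, 45.21, 55.2, 43.8, 53.9]])

# Broadcasting aligns trailing dimensions: (3, 5) * (5,) -> (3, 5),
# so w is multiplied against every row
products = X * w
print(products.shape)  # (3, 5)

# Summing across columns (axis=1) finishes one dot product per row
preds_broadcast = products.sum(axis=1)

# The same result as a single matrix-vector product
preds_matmul = X @ w

print(np.allclose(preds_broadcast, preds_matmul))  # True
```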

Yay, us! With a few lines (but more knowledge of mathematics), we've created a terrible prediction model!

In [None]:
#@title Try it Yourself: Normalizing data
#@markdown Let's try a vectorized approach to normalizing data. You'll normalize the `weight_kgs_np` column, and use only the first
#@markdown 5 elements (to match the previous example with iteration to check your work).
#@markdown The calculation for normalization that we will use is:
#@markdown $$ \frac{value - mean(column)}{std(column)} $$

# normalize: subtract the mean, then divide by the standard deviation
weight_kgs_norm = (weight_kgs_np[:5] - np.mean(weight_kgs_np[:5]))/ np.std(weight_kgs_np[:5])
weight_kgs_norm

In [None]:
# your answer here
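The same broadcasting idea extends from one column to a whole matrix. As an optional extension of the exercise, here is a sketch that normalizes every column at once, using the first four weights and heights: `axis=0` computes one mean and one standard deviation per column, and those are broadcast back across every row.

```python
import numpy as np

# First four dogs: weight (kg) and height (cm), two columns on different scales
X = np.array([[25.0, 68.0],
              [20.22, 57.99],
              [17.83, 45.21],
              [10.22, 36.2]])

# axis=0 reduces down the rows: one statistic per column, shape (2,)
col_mean = X.mean(axis=0)
col_std = X.std(axis=0)

# (4, 2) - (2,) and (4, 2) / (2,) broadcast the per-column statistics to every row
X_norm = (X - col_mean) / col_std

print(np.allclose(X_norm.mean(axis=0), 0))  # True
print(np.allclose(X_norm.std(axis=0), 1))   # True
```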

### Example 3: A basic neural network
Let's return to our REALLY terrible predictor, which predicts weight from height, circumferences, and back length. What is this operation? This time, we'll simplify by dropping the "bias" term $w_0$.

$$ \widehat{\text{dog weight}} = (w_1 \cdot \text{height}) + (w_2 \cdot \text{neck circumference}) + (w_3 \cdot \text{back length}) + (w_4 \cdot \text{chest circumference}) $$

Consider the behavior of a dot product.
<center><br><img src = 'https://algebra1course.files.wordpress.com/2013/02/dot-product-visual.jpg' /></center>

Now, consider that a layer of nodes in a neural network is just a bunch of these "models" stacked up: each node computes its own dot product with the same input.
<center><br><img width='40%' src = 'https://media.geeksforgeeks.org/wp-content/uploads/20200702205951/nn.PNG' /></center>

How can we extend what we've done here to this model?
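One way to see the extension: computing each node's dot product separately gives the same result as stacking the nodes' weight vectors as rows of a matrix and doing one matrix-vector product. A sketch assuming three hidden nodes with arbitrary, hypothetical weights, applied to the first dog's four features (no bias):

```python
import numpy as np

x = np.array([68.0, 45.2, 63.2, 78.2])  # one dog: height, neck, back length, chest

# Three hypothetical "models" (nodes), each with one weight per feature
w_node1 = np.array([0.1, 0.2, 0.3, 0.4])
w_node2 = np.array([-0.5, 0.0, 0.5, 1.0])
w_node3 = np.array([1.0, 1.0, -1.0, 0.0])

# One dot product per node...
one_at_a_time = np.array([w_node1 @ x, w_node2 @ x, w_node3 @ x])

# ...equals stacking the weight vectors as rows and multiplying once
W = np.vstack((w_node1, w_node2, w_node3))  # shape (3, 4)
all_at_once = W @ x                          # shape (3,)

print(np.allclose(one_at_a_time, all_at_once))  # True
```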

In [None]:
#recall our original data
np_data

In [None]:
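# create nodes in your layer: one row of weights per node, one column per feature
# (3 nodes with random weights is an arbitrary choice for this example)
layer_node_w = np.random.rand(3, np_data.shape[1])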
layer_node_w

In [None]:
# get one single input (row of data)
x = np_data[0]
x

In [None]:
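# get outputs of all nodes: one dot product per node, computed as a single matrix-vector product
layer_output = layer_node_w @ x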
layer_output

Now, we'll do essentially the same thing for our output node.

In [None]:
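# weights for the output node: one weight per node in the previous layer
output_w = np.random.rand(layer_output.shape[0])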
output_w

In [None]:
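# the prediction is the dot product of the output weights with the layer outputs
nn_pred = output_w @ layer_output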
nn_pred

Yay!! Vectorization has helped us quickly compute the forward pass of a neural network!
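The forward pass above handles one dog at a time. A sketch of the same computation for every row at once, assuming hypothetical sizes (10 dogs, 4 features, 3 hidden nodes) and random weights; it checks the batch result against a row-by-row loop:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 10 dogs, 4 features, 3 hidden nodes, 1 output node
X = rng.random((10, 4))        # one row per dog
W_hidden = rng.random((3, 4))  # one row of weights per hidden node
w_out = rng.random(3)          # one weight per hidden-node output

# Row by row, as in the single-dog example
loop_preds = np.array([w_out @ (W_hidden @ x) for x in X])

# All dogs at once: (10, 4) @ (4, 3) -> (10, 3), then (10, 3) @ (3,) -> (10,)
hidden = X @ W_hidden.T
batch_preds = hidden @ w_out

print(batch_preds.shape)                     # (10,)
print(np.allclose(loop_preds, batch_preds))  # True
```

Like the example above, this sketch omits activation functions and biases, so the two layers reduce to a single linear map; practical networks apply a nonlinearity (for example, ReLU) between layers.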