# PyTorch Workshop 2 - Introduction to Deep Learning
The main elements in PyTorch are:
* PyTorch Tensors 
* Mathematical operations
* Autograd module

Things we did not cover last time:
* Optim module
* nn module

Today we will learn how to build your own Deep Neural Network!

In [None]:
# Import libraries
import numpy as np

## Recap
### PyTorch Tensors
Tensors are multidimensional arrays, similar to NumPy's `ndarray`.

In [None]:
# Import Library
import torch

# Define a tensor
torch.FloatTensor([2])

#Try torch.FloatTensor(2), what do you get?

#Create a 2x5 matrix with elements from 1 to 10 
print(torch.FloatTensor(np.linspace(1,10,10).reshape(2,5)))

#Create a random 3x3 matrix
#(this will be useful later when we would like to initialize parameters)
print(torch.randn(3,3))

## Mathematical Operations
PyTorch provides hundreds of mathematical operations that you can apply to tensors.

In [None]:
#Set things up 
x_1 = torch.FloatTensor([10])
x_2 = torch.FloatTensor([20])

#Try this 
print(x_2.add(x_1))

# now try x_2.add_(x_1) instead, what's the difference?
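
A short sketch of the naming convention behind the two calls: in PyTorch, a method whose name ends in an underscore modifies the tensor in place, while the plain version returns a new tensor. The names `a`, `b`, `c`, and `b_after_add` below are illustrative only.

```python
import torch

a = torch.FloatTensor([10])
b = torch.FloatTensor([20])

# add() returns a new tensor and leaves b untouched
c = b.add(a)
b_after_add = b.clone()   # snapshot of b, still 20 at this point

# add_() writes the result back into b itself
b.add_(a)

print(c, b_after_add, b)
```

After the in-place call, `b` holds the sum, so calling `add_` repeatedly keeps accumulating into the same tensor.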


## Autograd module 
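PyTorch uses a technique called automatic differentiation. That is, a recorder tracks the operations we perform and then replays them backward to compute the gradients.

In older PyTorch versions (before 0.4), this required wrapping tensors in `Variable`. Since 0.4, `Variable` is deprecated: tensors created with `requires_grad=True` track gradients directly, and `Variable(...)` simply returns a tensor, so the code in this notebook still runs.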

In [None]:
from torch.autograd import Variable

x = Variable(torch.FloatTensor([10]), requires_grad = True)
y = x**2

#Backprop
y.backward()

#Evaluate grad
x.grad
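
For reference, a minimal sketch of the same computation on PyTorch 0.4 and later, where a tensor created with `requires_grad=True` can be used directly without the `Variable` wrapper. Since $y = x^2$, the gradient at $x = 10$ should be $2x = 20$.

```python
import torch

# A tensor that tracks the operations applied to it
x = torch.tensor([10.0], requires_grad=True)
y = x ** 2

# Replay the recorded operations backward to get dy/dx
y.backward()

print(x.grad)  # tensor([20.])
```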

## Exercise 
What is the gradient of f with respect to x at 10 in the following expression? 
$$y = \log(x)$$
$$z = 2y^2$$
$$f = z + 2$$

In [None]:
# Answer here 

## Optim Module
``torch.optim`` is a module that implements various optimization algorithms used for training neural networks. Most of the commonly used methods are already supported, so we don’t have to build them from scratch (unless we want to!).

In [None]:
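# An optimizer needs the parameters it should update and a learning rate
params = [torch.zeros(1, requires_grad=True)] # placeholder parameter for illustration
optimizer = torch.optim.SGD(params, lr=0.01) #will become useful later 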
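
As a rough illustration of how an optimizer is used, the sketch below minimizes the toy function $(w - 3)^2$ with SGD. The parameter `w`, the target value 3, the learning rate, and the number of steps are assumptions chosen for the example, not part of the workshop material.

```python
import torch

# A single trainable parameter, starting at 0
w = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

for step in range(100):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = ((w - 3) ** 2).sum()  # toy objective with its minimum at w = 3
    loss.backward()              # compute d(loss)/dw
    optimizer.step()             # move w against the gradient

print(w.item())  # close to 3
```

The same four calls (`zero_grad`, forward pass, `backward`, `step`) reappear in every training loop later in this notebook.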

## nn module
PyTorch autograd makes it easy to define computational graphs and take gradients, but raw autograd can be a bit too low-level for defining complex neural networks. This is where the nn module can help.

The nn package defines a set of modules, each of which can be thought of as a neural network layer that produces output from input and may have some trainable weights.

In [None]:
#examples
#Fully connected layer
torch.nn.Linear
#Rectified linear unit
torch.nn.ReLU
#1D convolutional layer
torch.nn.Conv1d
#Dropout
torch.nn.Dropout
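
To make the idea of a module concrete, here is a small sketch that instantiates two of the layers listed above and passes a batch through them. The sizes (a batch of 4 samples, 3 input features, 2 output features) are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 2)    # fully connected layer: 3 inputs -> 2 outputs
relu = nn.ReLU()           # element-wise max(0, x), no trainable weights

batch = torch.randn(4, 3)  # 4 samples with 3 features each
out = relu(layer(batch))

print(out.shape)           # torch.Size([4, 2])
print(layer.weight.shape)  # torch.Size([2, 3])
print(layer.bias.shape)    # torch.Size([2])
```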

# Python Class
Python is an object-oriented programming language. Programmers use classes to keep related data and functions together. A class is defined with the `class` keyword.

### Creating a class

In [None]:
# We create a class using the class keyword
class Dog:
    pass

Rocky = Dog()
print(Rocky)

## Defining attributes and methods
A class by itself is of little use unless it has some functionality. Functionality is added through attributes, which hold data, and through functions that operate on that data. Those functions are called methods.

In [None]:
class Dog:
    sci_name = "Canis lupus familiaris"

#Instantiate the class Dog and assign it to variable rocky
rocky = Dog()
print(rocky.sci_name)    

Methods are functions defined inside a class.

In [None]:
#Methods
class Dog:
    sci_name = "Canis lupus familiaris"
    
    def change_name(self, new_name):
        self.sci_name = new_name 
        
rocky = Dog()
rocky.change_name("I dunno")
print(rocky.sci_name)
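
One detail of the example above is worth a sketch: assigning to `self.sci_name` inside a method creates an attribute on that one instance and leaves the class attribute unchanged for other instances. The second dog, `bella`, is a hypothetical name added for the illustration.

```python
class Dog:
    sci_name = "Canis lupus familiaris"  # class attribute, shared by all instances

    def change_name(self, new_name):
        # Creates or overwrites an attribute on this instance only
        self.sci_name = new_name

rocky = Dog()
bella = Dog()
rocky.change_name("I dunno")

print(rocky.sci_name)  # I dunno
print(bella.sci_name)  # Canis lupus familiaris
print(Dog.sci_name)    # Canis lupus familiaris
```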

## Instance attributes and the init method
We can also provide attribute values when an object is created. This is done by setting the attributes inside the `__init__` method, which Python calls automatically when the class is instantiated.

In [None]:
class Dog:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    
    def change_name(self, new_name):
        self.name = new_name # now the name is updated

In [None]:
#Instantiate the object
my_dog = Dog("rocky", 10)
print(my_dog.name)

### Exercise
Create a class called `Human` that takes your name and college as inputs, and define a method that changes your name to "halo".

# Deep Learning: 1 layer example

## Linear Regression

In [None]:
# Load the libraries
import torch 
import torch.nn as nn
from torch.autograd import Variable
import numpy as np 
import pandas as pd
import matplotlib.pylab as plt
%matplotlib inline

In [None]:
# Start off with the model itself
class LinearRegressionModel(nn.Module):
    
    def __init__(self, input_dim, output_dim):
        
        super(LinearRegressionModel, self).__init__()
        self.linear = nn.Linear(input_dim, output_dim) # number of input features and number of outputs
        
    def forward(self, x):
        out = self.linear(x)
        return out
input_dim = 1
output_dim = 1

In [None]:
#Create instances of model
model = LinearRegressionModel(1,1)

#Select Loss Criterion
criterion = nn.MSELoss()
l_rate = 0.01 #learning rate 
optimiser = torch.optim.SGD(model.parameters(), lr = l_rate)

#Set the number of iteration for optimization
epochs = 2000

In [None]:
#Create fake data
x_vals = np.random.rand(50)
x_train = np.asarray(x_vals,dtype=np.float32).reshape(-1,1)
m = 1
alpha = np.random.rand(1)
beta = np.random.rand(1)
y_correct = np.asarray([2*i+m for i in x_vals], dtype=np.float32).reshape(-1,1)

In [None]:
for epoch in range(epochs):
    epoch+=1
    
    inputs = Variable(torch.from_numpy(x_train))
    labels = Variable(torch.from_numpy(y_correct))
    
    #clear grads 
    optimiser.zero_grad()
    
    #forward pass
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimiser.step() #update the parameters
    print('epoch {},loss {}'.format(epoch, loss.item())) # loss.data[0] fails on 0-dim tensors in PyTorch 0.5+

In [None]:
# Plotting the predictions
predicted = model.forward(Variable(torch.from_numpy(x_train))).data.numpy()

plt.plot(x_train, y_correct, 'go', label = 'from data', alpha = .5)
plt.plot(x_train, predicted, label = 'prediction', alpha = 0.5)
plt.legend()
plt.show()
model.state_dict()
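
As a sanity check on the loop above, the sketch below fits the same kind of model to noise-free data from $y = 2x + 1$ and reads the learned slope and intercept back from the layer. It uses plain tensors rather than `Variable`, a learning rate of 0.1, evenly spaced inputs, and a bare `nn.Linear` in place of `LinearRegressionModel`; these are choices made for the illustration, not the settings used above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # make the random weight initialization repeatable

# Noise-free data from y = 2x + 1
x = torch.linspace(0, 1, 50).reshape(-1, 1)
y = 2 * x + 1

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(2000):
    optimiser.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimiser.step()

slope = model.weight.item()
intercept = model.bias.item()
print(slope, intercept)  # should be close to 2 and 1
```

If the recovered values are far from the ones used to generate the data, the learning rate or the number of epochs is usually the first thing to revisit.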

In [None]:
# We will now store the predicted values and the true values into the following:
one_layer_prediction = model.forward(Variable(torch.from_numpy(x_train), requires_grad = True))
one_layer_true_y = Variable(torch.FloatTensor(y_correct))

# Loss functions
Two of the most commonly used loss functions in machine learning are the **Mean Squared Error (MSE) Loss**, typically used for regression, and the **Cross Entropy Loss**, typically used for classification.

## Mean Squared Error Loss
$$ MSE_{loss}(y,f) = \frac{1}{n}\sum_{i=1}^n (y_i - f(x_i))^2 $$

## Cross Entropy Loss
$$CE_{loss}(f(x),class) = -\log\left(\frac{\exp(f(x)[class])}{\sum_{j}\exp(f(x)[j])}\right)$$
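
The two losses can be checked by hand on tiny inputs. The sketch below compares a manual computation with `nn.MSELoss` (which, with its default settings, averages the squared errors over all elements) and with `nn.CrossEntropyLoss` (which takes raw scores and a class index). All numbers are made up for the illustration.

```python
import torch
import torch.nn as nn

# --- Mean squared error ---
y_true = torch.tensor([1.0, 2.0, 3.0])
y_pred = torch.tensor([1.5, 2.0, 2.0])
mse_manual = ((y_true - y_pred) ** 2).mean()
mse_builtin = nn.MSELoss()(y_pred, y_true)  # default reduction='mean'

# --- Cross entropy for one sample with three classes ---
scores = torch.tensor([[2.0, 1.0, 0.1]])    # raw outputs f(x), one row per sample
target = torch.tensor([0])                  # index of the true class
ce_manual = -torch.log(torch.exp(scores[0, 0]) / torch.exp(scores[0]).sum())
ce_builtin = nn.CrossEntropyLoss()(scores, target)

print(mse_manual.item(), mse_builtin.item())
print(ce_manual.item(), ce_builtin.item())
```

Note that `nn.CrossEntropyLoss` applies the softmax internally, so the network's last layer should output raw scores rather than probabilities.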

In [None]:
#Instantiate the loss object
loss = nn.MSELoss()

In [None]:
#Recall we have an example of 1 layer neural net, we can obtain the performance of our prediction with the following code
loss(one_layer_prediction, one_layer_true_y).data

You will be using the Cross Entropy Loss later in today's challenge.

# Optimizer 
Different optimizers can give different results, although they should hopefully agree in the long run. In this section, we will examine the learning curves of the same network trained with different optimizers.

In [None]:
#Let's generate another dataset 
x = torch.unsqueeze(torch.linspace(-1, 1, 1000), dim=1)
y = x.pow(2) + 0.1*torch.normal(torch.zeros(*x.size()))

# plot dataset
plt.scatter(x.numpy(), y.numpy())
plt.show()

In [None]:
# This time define your own network, name the class (network) "Net".
class Net(torch.nn.Module):
   # your code here

In [None]:
# Instantiate different Nets
net_SGD         = Net(n_feature = 1, n_hidden = 5, n_output=1)
net_Momentum    = Net(n_feature = 1, n_hidden = 5, n_output=1)
net_RMSprop     = Net(n_feature = 1, n_hidden = 5, n_output=1)
net_Adam        = Net(n_feature = 1, n_hidden = 5, n_output=1)
nets = [net_SGD, net_Momentum, net_RMSprop, net_Adam]


In [None]:
#Set up the hyperparameters
LR = 0.05

# different optimizers
opt_SGD         = torch.optim.SGD(net_SGD.parameters(), lr=LR)
opt_Momentum    = torch.optim.SGD(net_Momentum.parameters(), lr=LR, momentum=0.8)
opt_RMSprop     = torch.optim.RMSprop(net_RMSprop.parameters(), lr=LR, alpha=0.9)
opt_Adam        = torch.optim.Adam(net_Adam.parameters(), lr=LR, betas=(0.9, 0.99))
optimizers = [opt_SGD, opt_Momentum, opt_RMSprop, opt_Adam]
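
What the `momentum` argument changes can be sketched on the toy function $w^2$: instead of stepping along the current gradient, the optimizer steps along a running velocity that accumulates past gradients. The hand-written loop below follows the update rule given in the `torch.optim.SGD` documentation (default dampening of 0, no Nesterov momentum); the function, starting point, step size, and step count are assumptions for the example.

```python
import torch

step_size, momentum = 0.1, 0.8

# Hand-written update: velocity accumulates past gradients
w_manual, velocity = 1.0, 0.0
for _ in range(3):
    grad = 2 * w_manual                    # gradient of w**2
    velocity = momentum * velocity + grad
    w_manual = w_manual - step_size * velocity

# The same three steps with torch.optim.SGD
w = torch.tensor([1.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=step_size, momentum=momentum)
for _ in range(3):
    opt.zero_grad()
    (w ** 2).sum().backward()
    opt.step()

print(w_manual, w.item())  # the two values should agree
```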


In [None]:
# As above, we will use the MSELoss
loss_func = torch.nn.MSELoss()
losses_his = [[], [], [], []]   # record loss

In [None]:
#set the inputs as variables
x = Variable(x)
y = Variable(y)

# Run the networks and store the errors into the losses_his
for t in range(400):
    for net, opt, l_his in zip(nets, optimizers, losses_his):
        #Your code here

In [None]:
# We can now examine the learning curves
labels = ['SGD', 'Momentum', 'RMSprop', 'Adam']
for i, l_his in enumerate(losses_his):
    plt.plot(l_his, label=labels[i])
plt.legend(loc='best')
plt.xlabel('Steps')
plt.ylabel('Loss')
plt.ylim([0,0.3])
plt.show()


# Activation functions
In this section we will plot the activation functions and examine their shapes.

In [None]:
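import torch.nn.functional as F

x = Variable(torch.linspace(-2,2,200))
y_relu = F.relu(x)
y_sig = torch.sigmoid(x)   # F.sigmoid is deprecated in recent PyTorch versions
y_tanh = torch.tanh(x)     # F.tanh is deprecated in recent PyTorch versions
y_elu = F.elu(x)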
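
Before plotting, it can help to evaluate each activation at a few points to see what to expect. This is a small sketch; the tensor `points` is a hypothetical name introduced for the example.

```python
import torch
import torch.nn.functional as F

points = torch.tensor([-1.0, 0.0, 1.0])

print(F.relu(points))         # max(0, x): negatives are clipped to 0
print(torch.sigmoid(points))  # 1 / (1 + exp(-x)): values in (0, 1), 0.5 at x = 0
print(torch.tanh(points))     # values in (-1, 1), 0 at x = 0
print(F.elu(points))          # x for x > 0, exp(x) - 1 for x <= 0
```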

In [None]:
#Plot the activation functions


# Finally! Some Deep Learning
Let's start with an example.

<img src="network_viz.png">


In [None]:
#Import the packages first 
import torch 
import numpy as np
from torch.autograd import Variable
import torch.nn.functional as F
import matplotlib.pyplot as plt
plt.style.use('ggplot')

In [None]:
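#Generate some data: 100 samples with 3 features each
x = torch.FloatTensor(np.linspace(1,100,300).reshape(100,3))
#Target as a column of shape (100, 1) so that it matches the network output
y = (2*x[:,1] + x[:,2]**2 + x[:,0]).unsqueeze(1)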

In [None]:
# Wrap the tensors in Variable (required before PyTorch 0.4; on later versions this simply returns the tensors)
x,y = Variable(x), Variable(y)

In [None]:
class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer
        self.predict = torch.nn.Linear(n_hidden, n_output)   # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))      # activation function for hidden layer
        x = self.predict(x)             # linear output
        return x

In [None]:
net = Net(n_feature = 3, n_hidden = 3, n_output = 1)
print(net)
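
A quick way to check that a network has the intended size is to count its trainable parameters and push a dummy batch through it. The sketch below repeats the `Net` definition so that it runs on its own; the batch of 5 random samples is an assumption for the example.

```python
import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super().__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)
        self.predict = torch.nn.Linear(n_hidden, n_output)

    def forward(self, x):
        return self.predict(F.relu(self.hidden(x)))

net = Net(n_feature=3, n_hidden=3, n_output=1)

# hidden: 3*3 weights + 3 biases = 12; predict: 3*1 weights + 1 bias = 4
n_params = sum(p.numel() for p in net.parameters())
out = net(torch.randn(5, 3))

print(n_params)   # 16
print(out.shape)  # torch.Size([5, 1])
```

The output has shape `(5, 1)`, so targets passed to the loss function should have a matching second dimension.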

In [None]:
optimizer = torch.optim.SGD(net.parameters(), lr=0.5) #Smaller learning rate, longer to converge
loss_func = torch.nn.MSELoss()  

In [None]:
#Do prediction
net = Net(n_feature = 3, n_hidden = 3, n_output=1)
print(net) # show net architecture

optimizer = torch.optim.SGD(net.parameters(),lr=0.5)
loss_func = torch.nn.MSELoss()
RMSE = []
# run 300 optimization steps
for t in range(300):
    prediction = net(x) #feedforward
    RMSE.append(np.sqrt(np.mean((prediction.data.numpy() - y.data.numpy())**2)))

    loss = loss_func(prediction, y) #evaluation
    
    optimizer.zero_grad() #clear gradients for next training 
    loss.backward() #backpropagation to compute gradients
    optimizer.step() #apply the gradients to the parameters


In [None]:
x_axis = np.linspace(0,299,300)
y_axis = RMSE
plt.plot(x_axis,y_axis)
plt.title('Learning Curve')
plt.ylabel("RMSE")
plt.xlabel("Iteration")

Now try using 10 hidden units instead and compare the performance.

In [None]:
#  Your code here 

# Build a neural network quickly

In [None]:
import torch
import torch.nn.functional as F

In [None]:
# replace following class code with an easy sequential network
class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer
        self.predict = torch.nn.Linear(n_hidden, n_output)   # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))      # activation function for hidden layer
        x = self.predict(x)             # linear output
        return x

In [None]:
# Old way
net1 = Net(1, 10, 1)

# easy and fast way to build your network
net2 = torch.nn.Sequential(
    torch.nn.Linear(1, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 1)
)

In [None]:
print(net1)     # net1 architecture
print(net2)     # net2 architecture
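
A brief sketch of how a `Sequential` network behaves once built: it is called like any other module, applies its layers in the order they were listed, and exposes them by position. The batch of 4 random samples is an assumption for the example.

```python
import torch

net2 = torch.nn.Sequential(
    torch.nn.Linear(1, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 1)
)

batch = torch.randn(4, 1)  # 4 samples, 1 feature each
out = net2(batch)          # layers are applied in the order they were listed

print(out.shape)           # torch.Size([4, 1])
print(len(net2))           # 3
print(net2[0])             # layers can be looked up by position
```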

# Challenge time! Class predictions for the retail dataset!
In this challenge, you will be predicting the class membership of a customer from the retail dataset. Remember to split your dataset into training and testing sets.
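
One possible way to do the split is sketched below with NumPy. The arrays `features` and `labels` are random placeholders standing in for whatever columns you extract from the retail data, and the 80/20 ratio is an assumption; adapt both to the actual dataset.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the split is repeatable

# Placeholder data standing in for the real features and class labels
features = rng.normal(size=(100, 4))
labels = rng.integers(0, 3, size=100)

# Shuffle the row indices, then cut them 80/20
indices = rng.permutation(len(features))
n_train = int(0.8 * len(features))
train_idx, test_idx = indices[:n_train], indices[n_train:]

features_train, labels_train = features[train_idx], labels[train_idx]
features_test, labels_test = features[test_idx], labels[test_idx]

print(features_train.shape, features_test.shape)  # (80, 4) (20, 4)
```

Fit the network on the training portion only and report the loss or accuracy on the held-out test portion.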

In [None]:
#Import the necessary packages
import pandas as pd

In [None]:
#Read the data using pd.read_csv() and set "CustomerID" as the index col
retail = pd.read_csv("retail_data.csv", index_col= "CustomerID")

In [None]:
# Your work here :)