![title](https://image.ibb.co/erDntK/logo2018.png)

---

# Task 2 - Artificial Neural Network


In this assignment you will practice putting together a simple image classification pipeline, based on the Two-layer Neural Network classifier. The goals of this assignment are as follows:
* understand the basic Image Classification pipeline and the data-driven approach (train/predict stages)
* understand the train/val/test splits and the use of validation data for hyperparameter tuning.
* develop proficiency in writing efficient vectorized code with numpy

* implement and apply various activation functions
* implement and apply hyperparameter finetuning strategy

* understand the differences and tradeoffs between these classifiers
* the use of feature extraction to boost neural net performance




In [0]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

np.set_printoptions(precision=7)

Write down your Name and Student ID

In [0]:
## --- start your code here ----

NIM = 123456
Nama = ""

## --- end your code here ----

# [Part 1] Simple One-Layer Neural Network

## 1 - Load Dataset

For this exercise, we will use a binary classification dataset to recognize `cats` and `not cats`. 
The images are $64\times64$ pixels with 3 color channels.


---
### a. Load Cat Dataset
First, load the dataset.


In [0]:
import h5py    
    
def load_dataset():
    !wget 'https://raw.githubusercontent.com/cnn-adf/Task2019/master/resources/catvnoncat.h5'
    dataset = h5py.File('catvnoncat.h5', "r")

    train_set_x_orig = np.array(dataset["X_train"][:]) # your train set features
    train_set_y_orig = np.array(dataset["y_train"][:]) # your train set labels
    val_set_x_orig = np.array(dataset["X_val"][:]) # your val set features
    val_set_y_orig = np.array(dataset["y_val"][:]) # your val set labels
    classes = np.array(dataset["classes"][:]) # the list of classes
    
    return train_set_x_orig, train_set_y_orig, val_set_x_orig, val_set_y_orig, classes

In [0]:
X_train_ori, y_train, X_val_ori, y_val, classes = load_dataset()

In [0]:
print(X_train_ori.shape)
print(y_train.shape)
print(X_val_ori.shape)
print(y_val.shape)

**EXPECTED OUTPUT**: 
<pre>
(209, 64, 64, 3)
(209, 1)
(50, 64, 64, 3)
(50, 1)

View some data

In [0]:
fig, ax = plt.subplots(2,10,figsize=(18,5))
fig.subplots_adjust(hspace=0.1, wspace=0.1)
for j in range(0,2):
    for i in range(0, 10):
        ax[j,i].imshow(X_train_ori[i+j*10])
        ax[j,i].set_title(classes[y_train[i+j*10,0]].decode("utf-8") )
        ax[j,i].axis('off')
plt.show()

---
### b. Reshape and Normalize Data
<br>

<font color='red'>**EXERCISE**: </font>
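* Reshape each image in `X_train_ori` and `X_val_ori` into a 1-dimensional vector, so that each set becomes a 2-dimensional matrix with one flattened image per row
* Store the results as `X_train` and `X_val`
* `X_train_ori` and `X_val_ori` themselves should keep their shapes `(209, 64, 64, 3)` and `(50, 64, 64, 3)`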

<br>

*Hint: use `np.reshape()`*
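To see what flattening does, here is a toy batch shaped like the cat images but much smaller (made-up data, hypothetical variable names). Passing `-1` to `reshape` lets NumPy infer the flattened length, and reshaping returns a new array object instead of changing the original array's shape.

```python
import numpy as np

# Toy batch: 2 "images" of 4x4 pixels with 3 color channels (made-up data)
imgs = np.arange(2 * 4 * 4 * 3).reshape(2, 4, 4, 3)

# Flatten every image into one row; -1 lets NumPy infer 4*4*3 = 48
flat = imgs.reshape(imgs.shape[0], -1)

print(imgs.shape)  # (2, 4, 4, 3): the original array keeps its shape
print(flat.shape)  # (2, 48)
```

With the cat data, the same call on a `(209, 64, 64, 3)` array gives `(209, 12288)`, since $64 \times 64 \times 3 = 12288$.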

In [0]:
X_train = ??
X_val = ??

In [0]:
print('before')
print(X_train_ori.shape)
print(X_val_ori.shape)
print('\nafter')
print(X_train.shape)
print(X_val.shape)

**EXPECTED OUTPUT**: 
<pre>
before
(209, 64, 64, 3)
(50, 64, 64, 3)

after
(209, 12288)
(50, 12288)


<br>

<font color='red'>**EXERCISE**: </font>
* Since a sigmoid output trained as a regression problem is enough for this dataset, scale the pixel values into the range 0 to 1 by dividing them by `255`
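As a quick check of the arithmetic, the first six pixel values shown in the expected output map to the expected scaled values after dividing by 255:

```python
import numpy as np

# First six pixel values of the first training image, taken from the expected output
pixels = np.array([17, 31, 56, 22, 33, 59], dtype=np.uint8)

# True division by 255 returns a new floating-point array with values in [0, 1]
scaled = pixels / 255
print(scaled.dtype)
print(np.round(scaled, 7))
```

If the pixel array has an integer dtype (the printed values are integers), the in-place form `X_train /= 255` raises an error, because the floating-point result cannot be stored back into an integer array; `X_train = X_train / 255` creates a new float array instead.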

In [0]:
print('before',X_train[0,:6])

**EXPECTED OUTPUT**: 
<pre>
before [17 31 56 22 33 59]

In [0]:
X_train = ??
X_val = ??

In [0]:
print('after',X_train[0,:6])

**EXPECTED OUTPUT**: 
<pre>
after [0.0666667 0.1215686 0.2196078 0.0862745 0.1294118 0.2313725]

---
## 2 - Basic Neurons

A standard neuron computes essentially the same affine (linear) function as in the previous task.

So we have already provided its forward and backward implementations here.

---
### a. Forward and Backward Affine Function

The affine forward function is:

$$
\begin{align}
f(x, W, b) = x \cdot W + b
\end{align}
$$


In [0]:
def affine_forward(x, W, b):   
    v = np.dot(x, W) + b    
    return v

The affine backward function is:

$$
\begin{align*}
\partial W & = x^T \cdot \partial out \\
\partial b & = \sum_i \partial out_i \\
\partial x & = \partial out \cdot W^T
\end{align*}
$$
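A finite-difference check is a common way to confirm backward formulas such as these. The sketch below uses the scalar surrogate $L = \sum (x \cdot W + b) \odot \partial out$, whose gradient with respect to the affine output is exactly $\partial out$, and compares the three formulas against numerical estimates. The helper `numerical_grad` is made up for this sketch.

```python
import numpy as np

def numerical_grad(f, a, h=1e-6):
    """Central-difference gradient of the scalar function f() w.r.t. array a.

    a is perturbed in place one entry at a time and restored afterwards.
    """
    grad = np.zeros_like(a)
    for idx in np.ndindex(a.shape):
        old = a[idx]
        a[idx] = old + h
        f_plus = f()
        a[idx] = old - h
        f_minus = f()
        a[idx] = old
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5))
W = rng.standard_normal((5, 3))
b = rng.standard_normal((1, 3))
dout = rng.standard_normal((4, 3))  # upstream gradient

def surrogate_loss():
    # Its gradient w.r.t. the affine output x.W + b is exactly dout
    return np.sum((x @ W + b) * dout)

# Analytic gradients from the formulas above
dW = x.T @ dout
db = np.sum(dout, axis=0, keepdims=True)
dx = dout @ W.T

for name, analytic, param in [('dW', dW, W), ('db', db, b), ('dx', dx, x)]:
    err = np.max(np.abs(analytic - numerical_grad(surrogate_loss, param)))
    print(name, 'max abs error:', err)
```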

In [0]:
def affine_backward(dout, x, W, b):
    dW = np.dot(x.T,dout)
    db = np.sum(dout, axis=0, keepdims=True)
    dx = dout.dot(W.T)
    
    return dW, db, dx

---
### b. Forward and Backward Sigmoid Function

The activation functions, however, you need to implement yourself.
<br>

<font color='red'>**EXERCISE**: </font>
* Implement Sigmoid forward function:

$$
\begin{align}
f(x) = \sigma(x) = \frac{1}{1+e^{-x}}
\end{align}
$$
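For large negative inputs, `np.exp(-x)` overflows to `inf` and NumPy emits a `RuntimeWarning`, even though the final result still rounds to 0. A common workaround only ever exponentiates non-positive numbers. The sketch below is an optional variant (the name `sigmoid_stable` is made up); the plain formula is enough for this assignment.

```python
import numpy as np

def sigmoid_stable(x):
    """Numerically stable logistic sigmoid (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # exp(-|x|) is always in (0, 1], so it can never overflow
    e = np.exp(-np.abs(x))
    # x >= 0: 1 / (1 + e^-x);  x < 0: e^x / (1 + e^x)
    return np.where(x >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

print(sigmoid_stable(np.array([-2, -1, 0, 1, 2, 3])))
print(sigmoid_stable(np.array([-1000.0, 1000.0])))  # no overflow warning
```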


In [0]:
def sigmoid_forward(x):  
    out = ??
    return out

In [0]:
x = np.array([-2, -1, 0, 1, 2, 3])
ds = sigmoid_forward(x)

print(ds)

**EXPECTED OUTPUT**:

<pre>[0.1192029 0.2689414 0.5       0.7310586 0.8807971 0.9525741]


---
<br>

<font color='red'>**EXERCISE**: </font>
* Implement Sigmoid backward function
$$
\begin{align*}
\sigma'(x) = \sigma(x) \, (1 - \sigma(x)) \\
\partial out = \partial out \odot \sigma'(x)
\end{align*}
$$
<br>
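The backward function takes the forward output `ds` rather than `x` because the sigmoid's derivative can be written using $\sigma(x)$ alone. A finite-difference check of $\sigma'(x) = \sigma(x)(1-\sigma(x))$ (the local `sigmoid` here is a plain stand-in, not your exercise solution):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-4, 4, 9)
h = 1e-5

# Central finite difference of the sigmoid at each point
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

# Closed form, written only in terms of the forward output s
s = sigmoid(x)
analytic = s * (1 - s)

print(np.max(np.abs(numeric - analytic)))  # should be very small
```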

In [0]:
def sigmoid_backward(dout, ds):
    """
    Argument:
        ds: sigmoid forward result
        dout: gradient error
    """
    di = ??
    dout = ??
    return dout

In [0]:
np.random.seed(10)
dout = np.random.random((6,)) 
dout = sigmoid_backward(dout, ds)
print(dout)

**EXPECTED OUTPUT**:

<pre>[0.0809837 0.0040801 0.1584121 0.1472238 0.05234   0.0101556]



---
### c. Forward and Backward Tanh Function
<br>

<font color='red'>**EXERCISE**: </font>
* Implement the `tanh` forward function

*Hint: use `np.tanh(x)`*

$$
\begin{align}
f(x) = \tanh(x)
\end{align}
$$
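Tanh is a shifted and rescaled sigmoid, $\tanh(x) = 2\sigma(2x) - 1$, which is why its outputs lie in $(-1, 1)$ and are centered on zero, while the sigmoid's lie in $(0, 1)$. A quick numerical check of that identity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])

# tanh expressed through the sigmoid
via_sigmoid = 2 * sigmoid(2 * x) - 1

print(np.allclose(via_sigmoid, np.tanh(x)))  # True
```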


In [0]:
def tanh_forward(x):  
    out = ??
    return out

In [0]:
x = np.array([-2, -1, 0, 1, 2, 3])
dt = tanh_forward(x)

print(dt)

**EXPECTED OUTPUT**:

<pre>[-0.9640276 -0.7615942  0.         0.7615942  0.9640276  0.9950548]</pre>



---
<br>

<font color='red'>**EXERCISE**: </font>
* Implement Tanh backward function
$$
\begin{align*}
f'(x) = 1 - f(x)^2 \\
\\
\partial out = \partial out \odot f'(x)
\end{align*}
$$
<br>

In [0]:
def tanh_backward(dout, dt):
    """
    Argument:
        dt: tanh forward result
        dout: gradient error
    """
    di = ??
    dout = ??
    return dout

In [0]:
np.random.seed(10)
dout = np.random.random((6,)) 
dout = tanh_backward(dout, dt)
print(dout)

**EXPECTED OUTPUT**:

<pre>[0.0544944 0.0087153 0.6336482 0.3144784 0.0352199 0.0022179]


---
## 3 - Train a One-Layer Sigmoid



### a. Training Function

The network architecture should be: <br>
<pre><b>Input - FC layer - Sigmoid</b></pre>


<br>

<font color='red'>**EXERCISE :** </font>
Implement Training Function

* `call affine forward function`
* `call sigmoid forward function`
* `apply L2 loss regression (MSE)`
* `call affine backward function`
* `implement weight update`

<br>

**Note** that we do not call `sigmoid backward` here: in this single-layer network the sigmoid is the output activation, and the error `act1 - y` is passed directly to `affine backward` as the gradient of the affine output.
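To see what this error term corresponds to, write the forward pass for $N$ training examples as $z = X \cdot W + b$ and $a = \sigma(z)$. The training function uses $\delta = (a - y)/N$ as the gradient with respect to $z$ and then applies the affine backward formulas, $\partial W = X^T \cdot \delta$ and $\partial b = \sum_i \delta_i$. Comparing $\delta$ with the exact gradients of two common losses:

$$
\begin{align*}
\text{MSE:} \quad & \frac{\partial}{\partial z} \left[ \frac{1}{N} \sum_i (a_i - y_i)^2 \right] = \frac{2}{N} \, (a - y) \odot \sigma'(z) \\
\text{cross-entropy:} \quad & \frac{\partial}{\partial z} \left[ -\frac{1}{N} \sum_i \big( y_i \log a_i + (1 - y_i) \log (1 - a_i) \big) \right] = \frac{1}{N} \, (a - y)
\end{align*}
$$

So $\delta$ is exactly the gradient of the binary cross-entropy loss. For the MSE loss that is printed during training, it omits the positive per-example factor $2\sigma'(z_i)$, which rescales each example's contribution to the update without flipping its sign.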

In [0]:
def train_one_layer(X, y, W=None, b=None, learning_rate=0.005, num_iters=100, verbose=True):
    
    
    num_train, dim = X.shape
    
    num_classes = 1
    
    if W is None:
        W = 0.02*np.random.rand(dim, num_classes)
    if b is None:
        b = np.zeros((1, num_classes))

    # Run (full-batch) gradient descent to optimize W and b
    loss_history = []
                     
    for it in range(num_iters):
        

        # calculate 1st layer score by calling affine forward function using X, W, and b
        layer1 = ??
        
        # calculate 1st activation score by calling sigmoid forward function using layer1 score
        act1 = ??
        
        # calculate error by subtracting y from act1
        error = ??
        
        # calculate L2 Loss (MSE) by averaging the squared error
        loss = ??  
        
        # divide error by num_train
        error = ??
    
        # calculate layer 1 weights gradient by calling affine backward function using error, X, W, and b
        dW, db, _ = ??

        
        # perform parameter update by subtracting each W and b with a fraction of dW and db
        # according to the learning rate
        W -= ??
        b -= ??
        
        # append the loss history (every iteration, so the plot's x-axis is in iterations)
        loss_history.append(loss)

        if verbose and it % 100 == 0:
            print ('iteration', it,'/',num_iters, ': loss =', loss)

    return loss_history, W, b

### b. Train the One-Layer Sigmoid Classifier

Try the training function with the initial parameters

In [0]:
loss, W, b = train_one_layer(X_train, y_train, num_iters=1000, learning_rate=0.005)

Visualize the loss

In [0]:
plt.rcParams['figure.figsize'] = [12, 5]
plt.plot(loss)
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.title('Training Loss history')
plt.show()


### c. Predict Function

The network architecture should be: <br>
<pre><b>Input - FC layer - Sigmoid</b></pre>


<br>

<font color='red'>**EXERCISE :** </font>
Implement Predict Function

* `call forward function`

In [0]:
def predict_one_layer(X, W, b):    
    y_pred = np.zeros(X.shape[0])
    
    # calculate 1st layer score by calling affine forward function using X, W, and b
    layer1 = ??

    # calculate 1st activation score by calling sigmoid forward function using layer1 score
    act1 = ??
    
    # since it's a binary class, round the score to get the class
    y_pred = np.round(act1)
    
    return y_pred

Calculate the Training Accuracy

In [0]:
import sklearn
from sklearn.metrics import accuracy_score

y_pred = predict_one_layer(X_train, W, b)
accuracy = sklearn.metrics.accuracy_score(y_train, y_pred)

print('Training Accuracy =',accuracy*100,'%')

print('Training label  =',y_train[:15].ravel())
print('Predicted label =',y_pred[:15].astype('int').ravel())

**EXPECTED OUTPUT**:

<pre>You should get about <b>95%</b> accuracy on the training set using the initial run</pre>

Calculate the Validation Accuracy

In [0]:
y_pred = predict_one_layer(X_val, W, b)
accuracy = sklearn.metrics.accuracy_score(y_val, y_pred)
print('Validation Accuracy =', accuracy*100,'%')

print('Validation label  =',y_val[:15].ravel())
print('Predicted label =',y_pred[:15].astype('int').ravel())

**EXPECTED OUTPUT**:

<pre>You should also get about <b>74%</b> accuracy on the validation set</pre>

<br>

You can continue training from the current weights by passing the pre-trained `W` and `b` to the training function



In [0]:
# loss, W, b = train_one_layer(X_train, y_train, W=W, b=b, num_iters=1000, learning_rate=0.005)

---
---
# [Part 2] Two-Layer Neural Network on CIFAR-10

Now, let's build a two-layer neural network and train it on the CIFAR-10 dataset

---
## 1 - Load CIFAR-10 Dataset

The CIFAR-10 dataset is an image classification dataset with 10 classes. It contains $32\times32$ color images: 50,000 training images and 10,000 test images.

---
### a. Import Data ***CIFAR-10***

In [0]:
import tensorflow as tf

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

print('X_train.shape =',X_train.shape)
print('y_train.shape =',y_train.shape)
print('X_test.shape  =',X_test.shape)
print('y_test.shape  =',y_test.shape)

X_test_ori = X_test

---
### b. Visualizing Data
Show the first 20 images from `X_train`

In [0]:
fig, ax = plt.subplots(2,10,figsize=(15,4.5))
fig.subplots_adjust(hspace=0.1, wspace=0.1)
for j in range(0,2):
    for i in range(0, 10):
        ax[j,i].imshow(X_train[i+j*10])
        ax[j,i].set_title(classes[y_train[i+j*10,0]])
        ax[j,i].axis('off')
plt.show()

---
### c. Split Training Data

<br>

<font color='red'>**EXERCISE :** </font>
* `Take the last 1000 samples of the training set as the validation set, and keep the remaining samples as the new training set`
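Negative-index slicing is one way to split off the last rows of an array. The toy arrays and the `_toy` names below are made up for illustration. Note the order: the validation slice is taken before the training variables are overwritten.

```python
import numpy as np

# Toy data: 10 samples with 2 features each, and their labels (made-up values)
X = np.arange(20).reshape(10, 2)
y = np.arange(10).reshape(10, 1)
num_val = 3

# The last num_val rows become validation data ...
X_val_toy, y_val_toy = X[-num_val:], y[-num_val:]
# ... and everything before them stays as training data
X_train_toy, y_train_toy = X[:-num_val], y[:-num_val]

print(X_train_toy.shape, X_val_toy.shape)  # (7, 2) (3, 2)
```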

In [0]:
X_val = ??
y_val = ??

X_train = ??
y_train = ??

print('X_val.shape   =',X_val.shape)
print('y_val.shape   =',y_val.shape)
print('X_train.shape =',X_train.shape)
print('y_train.shape =',y_train.shape)

**EXPECTED OUTPUT:**
<pre>X_val.shape   = (1000, 32, 32, 3)
y_val.shape   = (1000, 1)
X_train.shape = (49000, 32, 32, 3)
y_train.shape = (49000, 1)

---
### d. Normalizing Data
Normalize `X_train`, `X_val` and `X_test` by *zero-centering* them, i.e. subtracting the mean image of the training set from each split:
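The mean image is computed from the training split only, and the same mean is then subtracted from the validation and test splits, so no statistics from held-out data leak into preprocessing. This is also why `np.mean(X_val)` and `np.mean(X_test)` are generally not exactly zero. A toy version with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up stand-ins for flattened image batches with pixel values 0..255
train = rng.integers(0, 256, size=(100, 12)).astype('float32')
val = rng.integers(0, 256, size=(20, 12)).astype('float32')

# The mean image is computed from the training set only ...
mean_image = train.mean(axis=0)
# ... and the same vector is subtracted from every split
train_c = train - mean_image
val_c = val - mean_image

print(np.abs(train_c.mean(axis=0)).max())  # ~0, up to float32 rounding
print(val_c.mean())                         # generally not exactly 0
```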

In [0]:
X_train = X_train.astype('float32')
X_val = X_val.astype('float32')
X_test = X_test.astype('float32')

mean_image = np.mean(X_train, axis = 0)
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image

print('np.mean(X_train) =',np.mean(X_train))
print('np.mean(X_val)   =',np.mean(X_val))
print('np.mean(X_test)  =',np.mean(X_test))


---
### e. Reshape Data
Reshape each image in `X_train`, `X_val` and `X_test` into a 1-dimensional vector of $32 \times 32 \times 3 = 3072$ values


In [0]:
X_train = X_train.reshape((X_train.shape[0],X_train.shape[1]*X_train.shape[2]*X_train.shape[3]))
X_val = X_val.reshape((X_val.shape[0],X_val.shape[1]*X_val.shape[2]*X_val.shape[3]))
X_test = X_test.reshape((X_test.shape[0],X_test.shape[1]*X_test.shape[2]*X_test.shape[3]))

print('X_train.shape =',X_train.shape)
print('X_val.shape   =',X_val.shape)
print('X_test.shape  =',X_test.shape)

Reshape `y_train`, `y_val` and `y_test` into 1-dimensional vectors


In [0]:
y_train = y_train.ravel()
y_val = y_val.ravel()
y_test = y_test.ravel()

print('y_train.shape =',y_train.shape)
print('y_val.shape   =',y_val.shape)
print('y_test.shape  =',y_test.shape)

---
## 2 - Advanced Activation Functions

For this part, you need to implement several activation functions that have become popular in recent years

---
### a. Rectified Linear Unit (ReLU)

Implement the forward and backward functions of the widely used Rectified Linear Unit (ReLU) activation function

<br>

<font color='red'>**EXERCISE :** </font>
* Implement ReLU forward function:


$$
\begin{align}
f(x) = 
\begin{cases}
0, & \text{for } x<0\\
x, & \text{for } x\geq0
\end{cases}
\end{align}
$$

*Hint: use `np.maximum()`*

In [0]:
def relu_forward(x):  
    out = ??
    return out

In [0]:
x = np.array([-2, -1, 0, 1, 2, 3])
dr = relu_forward(x) 

print(dr)

**EXPECTED OUTPUT**:

<pre>[0 0 0 1 2 3]</pre>


---

<br>

<font color='red'>**EXERCISE :** </font>
* Implement ReLU backward function
$$
\begin{align*}
f'(x) = 
\begin{cases}
0, & \text{for } x\leq0\\
1, & \text{for } x>0
\end{cases}\\
\\
\partial out = \partial out \odot f'(x)
\end{align*}
$$

*Hint: use `np.where(condition, if true, if false)`*

In [0]:
def relu_backward(dout, x):
    """
    Argument:
        x: relu input
        dout: gradient error
    """
    di   = ??
    dout = ??
    return dout

In [0]:
np.random.seed(10)
dout = np.random.random((6,)) 
dout = relu_backward(dout, x)
print(dout)

**EXPECTED OUTPUT**:

<pre>[0.        0.        0.        0.7488039 0.498507  0.2247966]



---
### b. Parametric ReLU (Leaky ReLU)

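Implement the forward and backward functions of the Parametric Rectified Linear Unit (PReLU) activation function. With a small fixed $\alpha$, such as the $\alpha = 0.01$ used here, it is known as the Leaky ReLU.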


<br>

<font color='red'>**EXERCISE :** </font>
* Implement Parametric ReLU forward function:


$$
\begin{align}
f(x, \alpha) = 
\begin{cases}
\alpha x, & \text{for } x<0\\
x, & \text{for } x\geq0
\end{cases}
\end{align}
$$

*Hint: use `np.where(condition, if true, if false)`*

In [0]:
def prelu_forward(x, alpha):  
    out = ??
    return out

In [0]:
x = np.array([-2, -1, 0, 1, 2, 3])
alpha = 0.01
dp = prelu_forward(x, alpha) 

print(dp)

**EXPECTED OUTPUT**:

<pre>[-0.02 -0.01  0.    1.    2.    3.  ]


---

<br>

<font color='red'>**EXERCISE :** </font>
* Implement PReLU backward function
$$
\begin{align*}
f'(x, \alpha) = 
\begin{cases}
\alpha, & \text{for } x\leq0\\
1, & \text{for } x>0
\end{cases}\\
\\
\partial out = \partial out \odot f'(x)
\end{align*}
$$

*Hint: use `np.where(condition, if true, if false)`*

In [0]:
def prelu_backward(dout, x, alpha):
    """
    Argument:
        x: prelu input
        dout: gradient error
    """
    di   = ??
    dout = ??
    return dout

In [0]:
np.random.seed(10)
dout = np.random.random((6,)) 
dout = prelu_backward(dout, x, alpha)


np.set_printoptions(precision=5)
print(dout)

**EXPECTED OUTPUT**:

<pre>[7.71321e-03 2.07519e-04 6.33648e-03 7.48804e-01 4.98507e-01 2.24797e-01]



---
### c. ELU Function

Implement the forward and backward functions of the Exponential Linear Unit (ELU) activation function

<br>

<font color='red'>**EXERCISE :** </font>
* Implement ELU forward function:


$$
\begin{align}
f(x, \alpha) = 
\begin{cases}
\alpha (e^x-1), & \text{for } x<0\\
x, & \text{for } x\geq0
\end{cases}
\end{align}
$$

*Hint: use `np.where(condition, if true, if false)`*

In [0]:
def elu_forward(x, alpha):  
    out = ??
    return out

In [0]:
x = np.array([-2, -1, 0, 1, 2, 3])
alpha = 1.0
de = elu_forward(x, alpha) 

np.set_printoptions(precision=7)
print(de)

**EXPECTED OUTPUT**:

<pre>[-0.8646647 -0.6321206  0.         1.         2.         3.       ]


---

<br>

<font color='red'>**EXERCISE :** </font>
* Implement `ELU` backward function
$$
\begin{align*}
f'(x, \alpha) = 
\begin{cases}
f(x,\alpha)+\alpha, & \text{for } x<0\\
1, & \text{for } x\geq0
\end{cases}\\
\\
\partial out = \partial out \odot f'(x)
\end{align*}
$$

*Hint: use `np.where(condition, if true, if false)`*
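The backward formula reuses the forward output: for $x<0$, $\frac{d}{dx}\,\alpha(e^x-1) = \alpha e^x = f(x,\alpha)+\alpha$. A finite-difference check of that identity, restricted to negative inputs so the kink at 0 plays no role (the local `elu` is a plain stand-in, not your exercise solution):

```python
import numpy as np

alpha = 1.0
x = np.array([-3.0, -2.0, -1.0, -0.5, -0.1])  # negative inputs only
h = 1e-6

def elu(v):
    return np.where(v < 0, alpha * (np.exp(v) - 1), v)

# Central finite difference of the ELU at each point
numeric = (elu(x + h) - elu(x - h)) / (2 * h)

# For x < 0 the derivative is alpha * e^x, which equals f(x) + alpha
print(np.allclose(numeric, alpha * np.exp(x)))  # True
print(np.allclose(numeric, elu(x) + alpha))     # True
```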

In [0]:
def elu_backward(dout, x, alpha):
    """
    Argument:
        x: elu input
        dout: gradient error
    """
    di   = ??
    dout = ??
    return dout

In [0]:
np.random.seed(10)
dout = np.random.random((6,)) 
dout = elu_backward(dout, x, alpha)
print(dout)

**EXPECTED OUTPUT**:

<pre>[0.1043869 0.0076342 0.6336482 0.7488039 0.498507  0.2247966]

</pre>


---
## 3 - Softmax Function 


### a. Softmax Score

The implementation is the same as in the previous task, so it is already provided here.
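Subtracting the same constant $c$ from every score in a row does not change the softmax output, since $\frac{e^{x_j - c}}{\sum_k e^{x_k - c}} = \frac{e^{x_j}}{\sum_k e^{x_k}}$. Choosing $c$ as the row's maximum makes the largest exponent $e^0 = 1$, so `np.exp` cannot overflow. The sketch below compares a naive version with a row-wise shifted one on scores large enough to overflow; both function names are made up for this example.

```python
import numpy as np

def softmax_naive(x):
    e = np.exp(x)
    return e / np.sum(e, axis=1, keepdims=True)

def softmax_shifted(x):
    # Subtract each row's own maximum; keepdims keeps the result broadcastable
    z = x - np.max(x, axis=1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)

scores = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1001.0, 1002.0]])  # exp() of the second row overflows

with np.errstate(over='ignore', invalid='ignore'):
    print(softmax_naive(scores))   # second row becomes nan (inf / inf)
print(softmax_shifted(scores))     # both rows are equal and each sums to 1
```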


In [0]:
def softmax(x):  
    
    # shift each row of x by subtracting its own maximum value. Use np.max(..., axis = 1, keepdims = True)
    # (this creates a new array instead of modifying the caller's x in place)
    x = x - np.max(x, axis = 1, keepdims = True)
    
    # Apply exp() element-wise to x. Use np.exp(...).    
    x_exp = np.exp(x)
    
    # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = 1, keepdims = True).
    x_sum = np.sum(x_exp, axis = 1, keepdims = True)  
    
    # Compute softmax(x) score by dividing x_exp by x_sum. It should automatically use numpy broadcasting.
    score = x_exp / x_sum
    
    return score

### b. Softmax Loss

The softmax loss is also the same as in the previous task, so its numpy implementation is already provided here.
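For softmax followed by cross-entropy, the gradient with respect to the pre-softmax scores is $(p - \text{onehot}(y))/N$, which is what the `dscores` lines compute. Note that `softmax_loss` receives softmax probabilities, so the gradient it returns is with respect to the scores fed into `softmax` (`layer2` in the training functions), not with respect to its own `score` argument. The finite-difference check below uses its own helper names (`ce_loss` is made up for this sketch).

```python
import numpy as np

def softmax(x):
    z = x - np.max(x, axis=1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)

def ce_loss(logits, y):
    # Mean negative log-probability of the correct class
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(y)), y]))

rng = np.random.default_rng(1)
logits = rng.standard_normal((5, 4))
y = np.array([0, 3, 1, 1, 2])

# Analytic gradient: (p - onehot(y)) / N
grad = softmax(logits)
grad[np.arange(len(y)), y] -= 1
grad /= len(y)

# Central finite differences, one logit at a time
h = 1e-6
num = np.zeros_like(logits)
for idx in np.ndindex(logits.shape):
    plus, minus = logits.copy(), logits.copy()
    plus[idx] += h
    minus[idx] -= h
    num[idx] = (ce_loss(plus, y) - ce_loss(minus, y)) / (2 * h)

print(np.max(np.abs(grad - num)))  # should be very small
```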


In [0]:
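def softmax_loss(score, y):
    """
    score: softmax probabilities, shape (N, C)
    y: integer class labels, shape (N,)
    Returns the average cross-entropy loss and its gradient with respect
    to the scores that were passed into softmax().
    """
    num_examples = score.shape[0]
    
    # make a list of row indices [0, 1, 2, ..., n-1]
    number_list = np.arange(num_examples)
    
    # calculate the negative log probability of the correct class, score[number_list, y], by applying -np.log(...)
    correct_logprobs = -np.log(score[number_list, y])
    
    # average the correct log probability, use np.sum then divide it by num_examples
    loss = np.sum(correct_logprobs) / num_examples
    
    # compute the gradient on the scores; copy so the caller's softmax output is not modified in place
    dscores = score.copy()
    dscores[number_list, y] -= 1
    dscores /= num_examples
    
    return loss, dscores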

---
---
# [Part 3] Train Sigmoid vs ReLU

Let's train the network on the CIFAR-10 dataset and compare the results of using the sigmoid and ReLU activation functions

---
## 1 - Train a Two-Layer Sigmoid with Softmax

---

### a. Training Function

The network architecture should be: **Input - FC layer - Sigmoid - FC Layer - Softmax**


<br>

<font color='red'>**EXERCISE :** </font>Implement Training Function

* `call affine forward function`
* `call sigmoid forward function`
* `call affine forward function`
* `call softmax function`
* `call softmax_loss function`
* `call affine backward function`
* `call sigmoid backward function`
* `call affine backward function`
* `implement weight update`
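A useful sanity check while filling in these steps is that every gradient has the same shape as the array it is the gradient of. The sketch below uses random stand-in data with the sizes from this notebook (3072 inputs, 10 classes, `hidden_size=50`, `batch_size=200`) and only traces shapes through the forward pass and the second affine layer's backward pass; it is not the training function.

```python
import numpy as np

N, D, H, C = 200, 3072, 50, 10  # batch size, input dim, hidden size, classes
rng = np.random.default_rng(0)

X_batch = rng.standard_normal((N, D))           # stand-in for a CIFAR-10 mini-batch
W0, b0 = 1e-4 * rng.standard_normal((D, H)), np.zeros((1, H))
W1, b1 = 1e-4 * rng.standard_normal((H, C)), np.zeros((1, C))

layer1 = X_batch @ W0 + b0                      # (N, H): b0 is broadcast over the batch
act1 = 1.0 / (1.0 + np.exp(-layer1))            # (N, H): elementwise activation
layer2 = act1 @ W1 + b1                         # (N, C): one score per class

dout = rng.standard_normal((N, C))              # stand-in for the softmax_loss gradient
dW1 = act1.T @ dout                             # (H, C): same shape as W1
db1 = np.sum(dout, axis=0, keepdims=True)       # (1, C): same shape as b1
dact1 = dout @ W1.T                             # (N, H): same shape as act1

for name, arr in [('layer1', layer1), ('layer2', layer2), ('dW1', dW1), ('dact1', dact1)]:
    print(name, arr.shape)
```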

In [0]:
def train_two_layer_sigmoid(X, y, hidden_size, W=None, b=None, learning_rate=1e-4, lr_decay=0.95, reg=0.5, num_iters=100, 
                    batch_size=200, verbose=True):
    num_train, dim = X.shape
    iterations_per_epoch = max(num_train / batch_size, 1)
    
    num_classes = np.max(y) + 1 # assume y takes values 0...K-1 where K is number of classes
    
    if W is None:
        W0 = 1e-4 * np.random.randn(dim, hidden_size)
        W1 = 1e-4 * np.random.randn(hidden_size, num_classes)
        W = [W0, W1]
    if b is None:
        b0 = np.zeros((1,hidden_size))
        b1 = np.zeros((1,num_classes))
        b = [b0, b1]

    # Run stochastic gradient descent to optimize W
    loss_history = []
                     
    for it in range(num_iters):
        X_batch = None
        y_batch = None

        # Randomly select indices from training examples
        train_rows = np.arange(num_train)
        idxs = np.random.choice(train_rows, batch_size, replace=False)
  
        X_batch = X[idxs]
        y_batch = y[idxs]


        # calculate 1st layer score by calling affine forward function using X_batch, W[0], and b[0]
        layer1 = ??
        
        # calculate 1st activation score by calling sigmoid forward function using layer1 score
        act1 = ??
        
        # calculate 2nd layer score by calling affine forward function using act1, W[1], and b[1]
        layer2 = ??
                
        # calculate softmax score by calling softmax function using layer2 score
        softmax_score = ??
        
        # evaluate loss and gradient by calling softmax_loss function using softmax_score and y_batch
        loss, dout = ??
        
        # add regularization to the loss
        loss+= 0.5 * reg * np.sum(W[0] * W[0]) + 0.5 * reg * np.sum(W[1] * W[1])
    
        # append the loss history
        loss_history.append(loss)

        # calculate layer 2 weights gradient by calling affine backward function using dout, act1, W[1], and b[1]
        dW1, db1, dact1 = ??
        
        # calculate sigmoid gradient by calling sigmoid backward function using dact1 and act1 score
        dlayer1 = ??
    
        # calculate layer 1 weights gradient by calling affine backward function using dlayer1, X_batch, W[0], and b[0]
        dW0, db0, _ = ??
        
        # perform regularization gradient
        dW1 += reg*W[1]
        dW0 += reg*W[0]
        
        # perform parameter update by subtracting W and b with a fraction of dW and db
        # according to the learning rate
        W[0] -= ??
        b[0] -= ??
        W[1] -= ??
        b[1] -= ??
        
        if verbose and it % 100 == 0:
            print ('iteration', it,'/',num_iters, ': loss =', loss)
            
        if it % iterations_per_epoch == 0:
            # Decay learning rate
            learning_rate *= lr_decay
        
    return loss_history, W, b

### b. Train the Softmax Classifier

Try the training function with the initial parameters

In [0]:
loss, W_sigm, b_sigm = train_two_layer_sigmoid(X_train, y_train, hidden_size=50, num_iters=1000)

Visualize the loss

In [0]:
plt.rcParams['figure.figsize'] = [12, 5]
plt.plot(loss)
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.title('Training Loss history')
plt.show()


### c. Predict Function


<br>

<font color='red'>**EXERCISE :** </font> Implement Predict Function
* call forward function

<br>

The network architecture should be: 
<pre><b>Input - FC layer - Sigmoid - FC Layer - argmax</b></pre>



In [0]:
def predict_two_layer_sigmoid(X, W, b):    
    y_pred = np.zeros(X.shape[0])

    
    # calculate 1st layer score by calling affine forward function using X, W[0], and b[0]
    layer1 = ??

    # calculate 1st activation score by calling sigmoid forward function using layer1 score
    act1 = ??

    # calculate 2nd layer score by calling affine forward function using act1, W[1], and b[1]
    layer2 = ??
    
    # take the maximum prediction from layer 2 and use that column to get the class   
    # use np.argmax
    y_pred = ??
    
    return y_pred

In [0]:
import sklearn
from sklearn.metrics import accuracy_score

y_pred = predict_two_layer_sigmoid(X_train, W_sigm, b_sigm)
accuracy = sklearn.metrics.accuracy_score(y_train, y_pred)

print('Training Accuracy =',accuracy*100,'%')

print('Training label  =',y_train[:15])
print('Predicted label =',y_pred[:15])

**EXPECTED OUTPUT**:

<pre>You should get about <b>20%</b> accuracy on the training set using the initial run</pre>

In [0]:
y_pred = predict_two_layer_sigmoid(X_val, W_sigm, b_sigm)
accuracy = sklearn.metrics.accuracy_score(y_val, y_pred)
print('Validation Accuracy =', accuracy*100,'%')

print('Validation label =',y_val[:15])
print('Predicted label  =',y_pred[:15])

**EXPECTED OUTPUT**:

<pre>You should also get about <b>20%</b> accuracy on the validation set</pre>

<br>

You can continue training from the current weights by passing the pre-trained `W` and `b` to the training function


---
## 2 - Train a Two-Layer ReLU with Softmax


### a. Predict Function

This time, we implement the predict function first, because we are going to use `predict` function inside the `training` function to track the `validation` accuracy 

The network architecture should be: 
<pre><b>Input - FC layer - ReLU - FC Layer - argmax</b></pre>

<br>

<font color='red'>**EXERCISE :**</font> Implement Predict Function

* call forward function

In [0]:
def predict_two_layer_relu(X, W, b):    
    y_pred = np.zeros(X.shape[0])

    
    # calculate 1st layer score by calling affine forward function using X, W[0], and b[0]
    layer1 = ??

    # calculate 1st activation score by calling relu forward function using layer1 score
    act1 = ??

    # calculate 2nd layer score by calling affine forward function using act1, W[1], and b[1]
    layer2 = ??
    
    # take the maximum prediction from layer 2 and use that column to get the class     
    y_pred = ??
    
    return y_pred

### b. Training Function

The network architecture should be: 
<pre><b>Input - FC layer - ReLU - FC Layer - Softmax</b></pre>


<br>

<font color='red'>**EXERCISE :**</font> Implement Training Function

* `call affine forward function`
* `call relu forward function`
* `call affine forward function`
* `call softmax function`
* `call softmax_loss function`
* `call affine backward function`
* `call relu backward function`
* `call affine backward function`
* `implement weight update`
* `calculate the training and validation accuracy`

In [0]:
def train_two_layer_relu(X, y, X_val, y_val, hidden_size, W=None, b=None, learning_rate=1e-4, lr_decay=0.9, reg=0.5, num_iters=100, 
                    batch_size=200, verbose=True):
    num_train, dim = X.shape
    iterations_per_epoch = max(num_train / batch_size, 1)
    
    num_classes = np.max(y) + 1 # assume y takes values 0...K-1 where K is number of classes
    
    if W is None:
        W0 = 1e-4 * np.random.randn(dim, hidden_size)
        W1 = 1e-4 * np.random.randn(hidden_size, num_classes)
        W = [W0, W1]
    if b is None:
        b0 = np.zeros((1,hidden_size))
        b1 = np.zeros((1,num_classes))
        b = [b0, b1]

    # Run stochastic gradient descent to optimize W
    loss_history = []
    train_acc_history = []
    val_acc_history = []
                     
    for it in range(num_iters):
        X_batch = None
        y_batch = None

        # Randomly select indices from training examples
        train_rows = np.arange(num_train)
        idxs = np.random.choice(train_rows, batch_size, replace=False)
  
        X_batch = X[idxs]
        y_batch = y[idxs]


        # calculate 1st layer score by calling affine forward function using X_batch, W[0], and b[0]
        layer1 = ??
        
        # calculate 1st activation score by calling relu forward function using layer1 score
        act1 = ??
        
        # calculate 2nd layer score by calling affine forward function using act1, W[1], and b[1]
        layer2 = ??
                
        # calculate softmax score by calling softmax function using layer2 score
        softmax_score = ??
        
        # evaluate loss and gradient by calling softmax_loss function using softmax_score and y_batch
        loss, dout = ??
        
        # add regularization to the loss
        loss+= 0.5 * reg * np.sum(W[0] * W[0]) + 0.5 * reg * np.sum(W[1] * W[1])
    
        # append the loss history
        loss_history.append(loss)

        # calculate layer 2 weights gradient by calling affine backward function using dout, act1, W[1], and b[1]
        dW1, db1, dact1 = ??
        
        # calculate the ReLU gradient by calling relu backward function using dact1 and layer1 (the ReLU input)
        dlayer1 = ??
    
        # calculate layer 1 weights gradient by calling affine backward function using dlayer1, X_batch, W[0], and b[0]
        dW0, db0, _ = ?? 
        
        # perform regularization gradient
        dW1 += reg*W[1]
        dW0 += reg*W[0]
        
        # perform parameter update by subtracting W and b with a fraction of dW and db
        # according to the learning rate
        W[0] -= ??
        b[0] -= ??
        W[1] -= ??
        b[1] -= ??
        
        if verbose and it % 100 == 0:
            print ('iteration', it,'/',num_iters, ': loss =', loss)
            
        if it % iterations_per_epoch == 0:
            # Check accuracy
            # calculate the training accuracy by calling predict_two_layer_relu function on X_batch
            # and compare it to y_batch. Then calculate the mean correct (accuracy in range 0-1)
            train_acc = (predict_two_layer_relu(X_batch, W, b) == y_batch).mean()
            
            # calculate the validation accuracy by calling predict_two_layer_relu function on X_val
            # and compare it to y_val. Then calculate the mean correct (accuracy in range 0-1)
            val_acc = (predict_two_layer_relu(X_val, W, b) == y_val).mean()
            
            train_acc_history.append(train_acc)
            val_acc_history.append(val_acc)
            
            # Decay learning rate
            learning_rate *= lr_decay
        
    return loss_history, W, b, train_acc_history, val_acc_history

### c. Train the Softmax Classifier

Try the training function with the initial parameters

In [0]:
loss, W_relu, b_relu, train_acc, val_acc = train_two_layer_relu(X_train, y_train, X_val, y_val, hidden_size=50, num_iters=1000)

Visualize the loss

In [0]:
plt.plot(loss)
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.title('Training Loss history')
plt.show()


Visualize the training and validation accuracy

In [0]:
plt.plot(train_acc, label='train')
plt.plot(val_acc, label='val')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
plt.legend()
plt.show()

In [0]:
import sklearn
from sklearn.metrics import accuracy_score

y_pred = predict_two_layer_relu(X_train, W_relu, b_relu)
accuracy = sklearn.metrics.accuracy_score(y_train, y_pred)

print('Training Accuracy =',accuracy*100,'%')

print('Training label  =',y_train[:15])
print('Predicted label =',y_pred[:15])

**EXPECTED OUTPUT**:

<pre>You should be able to get about <b>28%</b> accuracy on the training set using the initial run</pre>

In [0]:
y_pred = predict_two_layer_relu(X_val, W_relu, b_relu)
accuracy = sklearn.metrics.accuracy_score(y_val, y_pred)
print('Validation Accuracy =', accuracy*100,'%')

print('Validation label =',y_val[:15])
print('Predicted label  =',y_pred[:15])

**EXPECTED OUTPUT**:

<pre>You should also be able to get about <b>29%</b> accuracy on the validation set</pre>

<br>

You can continue training from the current weights by passing the pre-trained `W` and `b` to the training function

---


## 3 - First Layer Visualization


In [0]:
## run this to turn off the scrolling effect in jupyter notebook

from IPython.core.magics.display import Javascript
Javascript("""
  IPython.OutputArea.prototype._should_scroll = function(lines) {
      return false;
  }"""
)

## set return true to turn on the scrolling effect

In [0]:
!wget 'https://raw.githubusercontent.com/CNN-ADF/Task2019/master/resources/vis_utils.py'

In [0]:
from vis_utils import visualize_grid

plt.rcParams['figure.figsize'] = [10, 10]

plt.imshow(visualize_grid(W_relu[0].reshape(32, 32, 3, -1).transpose(3, 0, 1, 2), padding=3).astype('uint8'))

plt.gca().axis('off')
plt.show()

<pre>If you re-train the network, you should see a different weight visualization.</pre>

---
---
# [Optional] Hyperparameter Tuning

Now, let's build a two-layer neural network and train it on the CIFAR-10 dataset.

**What's wrong?**
* Looking at the visualizations above, we see that the loss is decreasing more or less linearly, which seems to suggest that the learning rate may be too low. 
* Moreover, there is no gap between the training and validation accuracy, suggesting that the model we used has low capacity, and that we should increase its size. 
* On the other hand, with a very large model we would expect to see more overfitting, which would manifest itself as a very large gap between the training and validation accuracy.
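The three symptoms above can be checked mechanically. The sketch below is a rough heuristic, not part of the assignment code: `diagnose_training` is a hypothetical helper, and its thresholds (`r2_tol`, `gap_tol`, `overfit_tol`) are arbitrary assumptions you would adjust. It fits a straight line to the loss history (a near-perfect linear fit suggests the loss has not started to level off) and compares the final training and validation accuracy.

```python
import numpy as np

def diagnose_training(loss_history, train_acc, val_acc,
                      r2_tol=0.95, gap_tol=0.05, overfit_tol=0.2):
    """Hypothetical helper: rough hints from loss/accuracy histories."""
    hints = []
    loss = np.asarray(loss_history, dtype=float)
    t = np.arange(loss.size)
    # Least-squares line through the loss curve and its R^2.
    slope, intercept = np.polyfit(t, loss, 1)
    ss_res = np.sum((loss - (slope * t + intercept)) ** 2)
    ss_tot = np.sum((loss - loss.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    if slope < 0 and r2 > r2_tol:
        hints.append('loss falls almost linearly: learning rate may be too low')
    # Compare the last recorded training and validation accuracy.
    gap = float(train_acc[-1]) - float(val_acc[-1])
    if gap < gap_tol:
        hints.append('little or no train/val gap: model capacity may be too low')
    elif gap > overfit_tol:
        hints.append('large train/val gap: model may be overfitting')
    return hints

steps = np.arange(1000)
print(diagnose_training(2.3 - 0.0005 * steps, [0.28], [0.29]))
print(diagnose_training(np.exp(-steps / 100.0), [0.90], [0.50]))
```

In the notebook you would pass the `loss`, `train_acc` and `val_acc` histories returned by `train_two_layer_relu`.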

<br>

**Tuning**. 
* Tuning the hyperparameters and developing intuition for how they affect the final performance is a large part of using Neural Networks, so we want you to get a lot of practice.
* Below, you should experiment with different values of the various hyperparameters, including hidden layer size, learning rate, number of training epochs, and regularization strength. 
* You might also consider tuning the learning rate decay, but you should be able to get good performance using the default value.
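The `lr_decay` argument multiplies the learning rate by a constant factor on a fixed schedule. The sketch below assumes the decay is applied once per epoch (a common convention; check your own `train_two_layer_relu` to confirm). `lr_schedule` is a hypothetical helper, and 49,000 training images with `batch_size=200` (245 iterations per epoch) are assumed values.

```python
import math

def lr_schedule(lr0, lr_decay, num_iters, iters_per_epoch):
    """Hypothetical helper: learning rate used at each iteration with per-epoch decay."""
    lrs = []
    lr = lr0
    for it in range(num_iters):
        lrs.append(lr)
        # At the end of every epoch, shrink the learning rate geometrically.
        if (it + 1) % iters_per_epoch == 0:
            lr *= lr_decay
    return lrs

# Assumed setup: 49,000 training images, batch_size=200 -> 245 iterations per epoch.
iters_per_epoch = max(49000 // 200, 1)
lrs = lr_schedule(1e-3, 0.95, num_iters=1000, iters_per_epoch=iters_per_epoch)
# Iteration 999 falls in epoch 4, so its rate equals lr0 * 0.95**4.
print(iters_per_epoch, lrs[0], lrs[-1], math.isclose(lrs[-1], 1e-3 * 0.95 ** 4))
```

Under these assumptions, 1,000 iterations cover only about 4 epochs, so the final rate is about 81% of the initial one.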

**Approximate results**. 
* You should aim to achieve a classification accuracy greater than 48% on the validation set. Our best network gets over 52% on the validation set.

<br>

**EXERCISE :**
* `Use the validation set to tune hyperparameters (regularization strength and learning rate)`
* `Find the best learning rate and regularization strength using staged random search` (**Coarse-to-Fine Search**)
* `Gradually narrow the random sampling range to find the best learning rate and regularization strength`
* `Use only a few epochs or iterations per trial`
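A staged (coarse-to-fine) random search samples both hyperparameters on a log scale, keeps the best pair, and then shrinks the sampling range around it for the next stage. The sketch below illustrates the idea on a hypothetical stand-in objective, `toy_val_accuracy` (a smooth bump peaking near `lr = 1e-3`, `reg = 0.25`), instead of actually training the network. The ranges, `shrink` factor, and stage and trial counts are assumptions.

```python
import numpy as np

def toy_val_accuracy(lr, reg):
    """Hypothetical stand-in for 'train briefly, return validation accuracy'."""
    # Smooth bump in log-space, peaking at lr = 1e-3 and reg = 10**-0.6 (about 0.25).
    return np.exp(-(np.log10(lr) + 3) ** 2 - (np.log10(reg) + 0.6) ** 2)

def coarse_to_fine(evaluate, lr_range=(-5.0, -1.0), reg_range=(-3.0, 2.0),
                   stages=4, trials=30, shrink=0.5, seed=0):
    """Random search over log10(lr), log10(reg); each stage narrows the ranges."""
    rng = np.random.default_rng(seed)
    best_acc, best_log_lr, best_log_reg = -np.inf, None, None
    for stage in range(stages):
        for _ in range(trials):
            log_lr = rng.uniform(*lr_range)
            log_reg = rng.uniform(*reg_range)
            acc = evaluate(10 ** log_lr, 10 ** log_reg)
            if acc > best_acc:
                best_acc, best_log_lr, best_log_reg = acc, log_lr, log_reg
        # Re-center each range on the best point so far and shrink its width.
        lr_half = (lr_range[1] - lr_range[0]) * shrink / 2
        reg_half = (reg_range[1] - reg_range[0]) * shrink / 2
        lr_range = (best_log_lr - lr_half, best_log_lr + lr_half)
        reg_range = (best_log_reg - reg_half, best_log_reg + reg_half)
        print('stage', stage, 'best acc %.3f' % best_acc,
              'lr %.2e reg %.2e' % (10 ** best_log_lr, 10 ** best_log_reg))
    return best_acc, 10 ** best_log_lr, 10 ** best_log_reg

acc, lr, reg = coarse_to_fine(toy_val_accuracy)
```

In the notebook, `evaluate` could wrap the loop body of the next cell: train for `max_iter` iterations with the given `lr` and `reg`, then return `(predict_two_layer_relu(X_val, W, b) == y_val).mean()`.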

In [0]:
import warnings
import datetime
warnings.filterwarnings('ignore')

results = {}
best_val = -1
best_reg = 0
best_lr = 0

best_W = None
best_b = None
max_iter = 1000
max_trial = 30

for trial in range(max_trial):
    
    reg = 10**np.random.uniform(-2, 1)     # <---------- you can try and change this
    lr = 10**np.random.uniform(-4, -2)     # <---------- you can try and change this (low must be < high)
    
    loss, W, b, _, _ = train_two_layer_relu(X_train, y_train, X_val, y_val, 50,
                                      num_iters=max_iter, batch_size=200, 
                                      learning_rate=lr, lr_decay=0.95, 
                                      reg=reg, verbose=False) 
    val_acc = (predict_two_layer_relu(X_val, W, b) == y_val).mean() 
    results[(lr, reg)] = val_acc    # keep every trial for later inspection
    if val_acc > best_val: 
        best_W = W 
        best_b = b 
        best_val = val_acc 
        best_lr  = lr 
        best_reg = reg 
    print(str(datetime.datetime.now()), 'val_acc:', val_acc, '\tlr:', lr, 
          '\treg:', reg, '\t', str(trial + 1)+'/'+str(max_trial))
    
print ("best_reg: ", best_reg)
print ("best_lr: ", best_lr)



* You can try a different strategy to find the hyperparameters
* You can also try to fine-tune other hyperparameters, such as the number of hidden neurons and `lr_decay`
* You can also try other architectures such as changing the activation function

---
## 1 - Train the Network Fully

When you are done experimenting, train the network for more epochs using the best learning rate and the best regularization strength.

---

In [0]:
print('lr =',best_lr)
print('reg =',best_reg)
loss, best_W, best_b, train_acc, val_acc = train_two_layer_relu(X_train, y_train, X_val, y_val,
                                                      W = best_W, b = best_b,
                                                      hidden_size=50,
                                                      num_iters=5000,
                                                      learning_rate = best_lr,
                                                      reg = best_reg)

Visualize the loss

In [0]:
plt.plot(loss)
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.title('Training Loss history')
plt.show()


Visualize the training and validation accuracy

In [0]:
plt.plot(train_acc, label='train')
plt.plot(val_acc, label='val')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
plt.legend()
plt.show()

In [0]:
import sklearn
from sklearn.metrics import accuracy_score

y_pred = predict_two_layer_relu(X_train, best_W, best_b)
accuracy = sklearn.metrics.accuracy_score(y_train, y_pred)

print('Training Accuracy =',accuracy*100,'%')

print('Training label  =',y_train[:15])
print('Predicted label =',y_pred[:15])

**EXPECTED OUTPUT**:

<pre>If you're careful, you should be able to get <b>~60%</b> accuracy on the training set after the full training run</pre>

In [0]:
y_pred = predict_two_layer_relu(X_val, best_W, best_b)
accuracy = sklearn.metrics.accuracy_score(y_val, y_pred)
print('Validation Accuracy =', accuracy*100,'%')

print('Validation label =',y_val[:15])
print('Predicted label  =',y_pred[:15])

**EXPECTED OUTPUT**:

<pre>You should also be able to get <b>~50%</b> accuracy on the validation set</pre>

<br>

You can train the weights further by passing the pre-trained `W` and `b` as arguments to the training function.

---


In [0]:
plt.imshow(visualize_grid(best_W[0].reshape(32, 32, 3, -1).transpose(3, 0, 1, 2), padding=3).astype('uint8'))

plt.gca().axis('off')
plt.show()

## 2 - Test the Trained Weights

Evaluate your final trained network on the test set; you should be able to get **above 48%**.

In [0]:
y_pred = predict_two_layer_relu(X_test, best_W, best_b)

accuracy = sklearn.metrics.accuracy_score(y_test, y_pred)

print('Testing Accuracy =', accuracy*100,'%')
print('Test label      =',y_test[:15])
print('Predicted label =',y_pred[:15])

An important way to gain intuition about how an algorithm works is to visualize the mistakes that it makes.

In this visualization, we show examples of images that are misclassified by our current system.

The first column shows images that our system labeled as "plane" but whose true label is something other than "plane".
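Each column of that grid is built from the false positives of one class: images predicted as class `c` whose true label is not `c`. A minimal sketch of that selection, plus a per-class count of the same errors, on made-up labels. `false_positive_indices` and `confusion_counts` are hypothetical helpers written with plain NumPy.

```python
import numpy as np

def false_positive_indices(y_true, y_pred, cls):
    """Indices of samples predicted as `cls` whose true label is different."""
    return np.where((y_true != cls) & (y_pred == cls))[0]

def confusion_counts(y_true, y_pred, num_classes):
    """cm[t, p] = number of samples with true label t predicted as p."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

y_true = np.array([0, 1, 2, 0, 1])
y_pred = np.array([0, 0, 2, 1, 0])
cm = confusion_counts(y_true, y_pred, num_classes=3)
# Off-diagonal entries of column c are exactly the false positives of class c.
fp_per_class = cm.sum(axis=0) - np.diag(cm)
print(false_positive_indices(y_true, y_pred, 0), fp_per_class)
```

Sampling `examples_per_class` images without replacement requires at least that many false positives for the class, so these counts are worth checking first.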

In [0]:
examples_per_class = 8
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for cls, cls_name in enumerate(classes):
    idxs = np.where((y_test != cls) & (y_pred == cls))[0]
    # sample at most as many images as there are mistakes for this class
    idxs = np.random.choice(idxs, min(examples_per_class, len(idxs)), replace=False)
    for i, idx in enumerate(idxs):
        plt.subplot(examples_per_class, len(classes), i * len(classes) + cls + 1)
        plt.imshow(X_test_ori[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls_name)
plt.show()

---
---
# [Optional] Two-Layer NeuralNet on Feature Space

We have seen that we can achieve reasonable performance on an image classification task by training a two-layer neural network on the raw pixels of the input image. In this exercise we will show that we can improve our classification performance by training the network not on raw pixels but on features that are computed from the raw pixels.

All of your work for this exercise will be done in this notebook.

## 1 - Feature Extraction Functions
* For each image we will compute a Histogram of Oriented Gradients (HOG) as well as a color histogram using the hue channel in HSV color space. We form our final feature vector for each image by concatenating the HOG and color histogram feature vectors.

* Roughly speaking, HOG should capture the texture of the image while ignoring color information, and the color histogram represents the color of the input image while ignoring texture. 
* As a result, we expect that using both together ought to work better than using either alone. Verifying this assumption would be a worthwhile experiment.

* The `hog_feature` and `color_histogram_hsv` functions both operate on a single image and return a feature vector for that image.
* The `extract_features` function takes a set of images and a list of feature functions and evaluates each feature function on each image, storing the results in a matrix where each row is the concatenation of all feature vectors for a single image.
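As a quick size check, the length of the final feature vector follows from the settings used below: 8x8-pixel cells and 9 orientation bins in `hog_feature`, `num_color_bins = 20` for the hue histogram, and one bias column appended during preprocessing. For 32x32 CIFAR-10 images this arithmetic, sketched with a hypothetical `feature_dims` helper, should match `X_train_feats.shape[1]`.

```python
def feature_dims(height=32, width=32, cell=8, orientations=9, color_bins=20, add_bias=True):
    """Hypothetical helper: length of the HOG + hue-histogram (+ bias) feature vector."""
    n_cells_x = height // cell                       # 32 // 8 = 4 cells along one axis
    n_cells_y = width // cell                        # 4 cells along the other axis
    hog_dim = n_cells_x * n_cells_y * orientations   # one 9-bin histogram per cell
    total = hog_dim + color_bins + (1 if add_bias else 0)
    return hog_dim, total

print(feature_dims())   # (144, 165)
```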

In [0]:
from __future__ import print_function

import matplotlib
from scipy.ndimage import uniform_filter

def extract_features(imgs, feature_fns, verbose=False):
    num_images = imgs.shape[0]
    if num_images == 0:
        return np.array([])

    # Use the first image to determine feature dimensions
    feature_dims = []
    first_image_features = []
    for feature_fn in feature_fns:
        feats = feature_fn(imgs[0].squeeze())
        assert len(feats.shape) == 1, 'Feature functions must be one-dimensional'
        feature_dims.append(feats.size)
        first_image_features.append(feats)

    # Now that we know the dimensions of the features, we can allocate a single
    # big array to store all features, one row per image.
    total_feature_dim = sum(feature_dims)
    imgs_features = np.zeros((num_images, total_feature_dim))
    imgs_features[0] = np.hstack(first_image_features).T

    # Extract features for the rest of the images.
    for i in range(1, num_images):
        idx = 0
        for feature_fn, feature_dim in zip(feature_fns, feature_dims):
            next_idx = idx + feature_dim
            imgs_features[i, idx:next_idx] = feature_fn(imgs[i].squeeze())
            idx = next_idx
        if verbose and i % 1000 == 0:
            print('Done extracting features for %d / %d images' % (i, num_images))

    return imgs_features

In [0]:

def rgb2gray(rgb):
    # ITU-R BT.601 luma weights (they sum to 1)
    return np.dot(rgb[..., :3], [0.299, 0.587, 0.114])


def hog_feature(im):

    # convert rgb to grayscale if needed
    if im.ndim == 3:
        image = rgb2gray(im)
    else:
        image = np.atleast_2d(im)

    sx, sy = image.shape  # image size
    orientations = 9  # number of gradient bins
    cx, cy = (8, 8)  # pixels per cell

    gx = np.zeros(image.shape)
    gy = np.zeros(image.shape)
    gx[:, :-1] = np.diff(image, n=1, axis=1)  # compute gradient on x-direction
    gy[:-1, :] = np.diff(image, n=1, axis=0)  # compute gradient on y-direction
    grad_mag = np.sqrt(gx ** 2 + gy ** 2)  # gradient magnitude
    grad_ori = np.arctan2(gy, (gx + 1e-15)) * (180 / np.pi) + 90  # gradient orientation

    n_cellsx = int(np.floor(sx / cx))  # number of cells in x
    n_cellsy = int(np.floor(sy / cy))  # number of cells in y
    # compute orientations integral images
    orientation_histogram = np.zeros((n_cellsx, n_cellsy, orientations))
    for i in range(orientations):
        # create new integral image for this orientation
        # isolate orientations in this range
        temp_ori = np.where(grad_ori < 180 / orientations * (i + 1),
                            grad_ori, 0)
        temp_ori = np.where(grad_ori >= 180 / orientations * i,
                            temp_ori, 0)
        # select magnitudes for those orientations
        cond2 = temp_ori > 0
        temp_mag = np.where(cond2, grad_mag, 0)
        orientation_histogram[:, :, i] = uniform_filter(temp_mag, size=(cx, cy))[int(cx / 2)::cx, int(cy / 2)::cy].T

    return orientation_histogram.ravel()


def color_histogram_hsv(im, nbin=10, xmin=0, xmax=255, normalized=True):
    ndim = im.ndim
    bins = np.linspace(xmin, xmax, nbin + 1)
    hsv = matplotlib.colors.rgb_to_hsv(im / xmax) * xmax
    imhist, bin_edges = np.histogram(hsv[:, :, 0], bins=bins, density=normalized)
    imhist = imhist * np.diff(bin_edges)

    # return histogram
    return imhist
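To see why the hue histogram ignores texture, it helps to run the same computation as `color_histogram_hsv` on synthetic images. The sketch below re-implements it standalone as a hypothetical `hue_histogram` (using `matplotlib.colors.rgb_to_hsv`). A pure red image puts all of its mass in the first hue bin, while an image that is half red and half blue splits it between two bins, however the two colors are arranged.

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv

def hue_histogram(im, nbin=10, xmin=0, xmax=255):
    """Standalone copy of the hue-histogram logic in color_histogram_hsv."""
    bins = np.linspace(xmin, xmax, nbin + 1)
    hsv = rgb_to_hsv(im / xmax) * xmax          # hue rescaled to [0, xmax]
    hist, edges = np.histogram(hsv[:, :, 0], bins=bins, density=True)
    return hist * np.diff(edges)                # fraction of pixels per bin

red = np.zeros((4, 4, 3))
red[..., 0] = 255

halves = np.zeros((4, 4, 3))
halves[:2, :, 0] = 255          # top half red
halves[2:, :, 2] = 255          # bottom half blue

stripes = np.zeros((4, 4, 3))
stripes[::2, :, 0] = 255        # alternating red / blue rows
stripes[1::2, :, 2] = 255

print(np.round(hue_histogram(red), 3))
print(np.round(hue_histogram(halves), 3))
```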

## 2 - Reload the CIFAR-10 dataset

In [0]:
import tensorflow as tf

try:
    del X_train, y_train
    del X_test, y_test
    print('Cleared previously loaded data.')
except NameError:
    pass

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()

mask = list(range(49000, 50000))
X_val = X_train[mask]
y_val = y_train[mask]
mask = list(range(49000))
X_train = X_train[mask]
y_train = y_train[mask]

classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
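The two `mask` lists above take the last 1,000 training images as the validation set and keep the first 49,000 for training. The same split can be written with slicing. A minimal sketch with a hypothetical `holdout_split` helper and toy arrays; one design difference is that basic slices return views, while indexing with a list returns copies.

```python
import numpy as np

def holdout_split(X, y, num_val):
    """Hypothetical helper: last `num_val` samples become validation, the rest training."""
    num_train = X.shape[0] - num_val
    return (X[:num_train], y[:num_train]), (X[num_train:], y[num_train:])

X = np.arange(50 * 2).reshape(50, 2)
y = np.arange(50)
(X_tr, y_tr), (X_v, y_v) = holdout_split(X, y, num_val=10)
print(X_tr.shape, X_v.shape)
```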

## 3 - Extract Features

In [0]:
num_color_bins = 20 # Number of bins in the color histogram
feature_fns = [hog_feature, lambda img: color_histogram_hsv(img, nbin=num_color_bins)]
X_train_feats = extract_features(X_train, feature_fns, verbose=True)
X_val_feats = extract_features(X_val, feature_fns)
X_test_feats = extract_features(X_test, feature_fns)

# Preprocessing: Subtract the mean feature
mean_feat = np.mean(X_train_feats, axis=0, keepdims=True)
X_train_feats -= mean_feat
X_val_feats -= mean_feat
X_test_feats -= mean_feat

# Preprocessing: Divide by standard deviation. This ensures that each feature
# has roughly the same scale.
std_feat = np.std(X_train_feats, axis=0, keepdims=True)
X_train_feats /= std_feat
X_val_feats /= std_feat
X_test_feats /= std_feat

# Preprocessing: Add a bias dimension
X_train_feats = np.hstack([X_train_feats, np.ones((X_train_feats.shape[0], 1))])
X_val_feats = np.hstack([X_val_feats, np.ones((X_val_feats.shape[0], 1))])
X_test_feats = np.hstack([X_test_feats, np.ones((X_test_feats.shape[0], 1))])
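The preprocessing cell computes the mean and standard deviation on the training features only and reuses them for the validation and test features, so no statistics come from held-out data. The sketch below packages those steps in a hypothetical `standardize_and_add_bias` helper. One deliberate difference, labeled as an assumption: it adds a small `eps` to the standard deviation so that a constant feature column does not produce NaNs from a division by zero.

```python
import numpy as np

def standardize_and_add_bias(train, *others, eps=1e-8):
    """Hypothetical helper: z-score with training statistics, then append a bias column."""
    mean = train.mean(axis=0, keepdims=True)
    std = train.std(axis=0, keepdims=True) + eps   # eps avoids 0/0 on constant columns (assumption)
    out = []
    for x in (train,) + others:
        z = (x - mean) / std
        out.append(np.hstack([z, np.ones((z.shape[0], 1))]))
    return out

rng = np.random.default_rng(0)
train = rng.normal(5.0, 3.0, size=(100, 4))
train[:, 0] = 7.0                       # a constant feature column
test = rng.normal(5.0, 3.0, size=(10, 4))
train_f, test_f = standardize_and_add_bias(train, test)
print(train_f.shape, test_f.shape)
```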

In [0]:
y_train = y_train.ravel()
y_val = y_val.ravel()
y_test = y_test.ravel()

print('X_train_feats.shape =', X_train_feats.shape)
print('y_train.shape =',y_train.shape)
print('y_val.shape   =',y_val.shape)
print('y_test.shape  =',y_test.shape)

## 4 - Train a Two-Layer Neural Network

Again, fine-tune the network and find the best hyperparameters (learning rate, regularization strength, number of color bins, number of hidden neurons, etc.).

Then train the network once more on the feature-space CIFAR-10 dataset.

In [0]:
loss, W, b, _, _ = train_two_layer_relu(X_train_feats, y_train, X_val_feats, y_val,
                                  hidden_size=100,
                                  num_iters=1000,
                                  learning_rate = 0.9,
                                  reg = 0.0)

This approach should outperform all previous approaches: you should easily be able to achieve over 55% classification accuracy on the test set; our best model achieves **about 60% classification accuracy**.

In [0]:
y_pred = predict_two_layer_relu(X_val_feats, W, b)

accuracy = sklearn.metrics.accuracy_score(y_val, y_pred)

print('Validation Accuracy =', accuracy*100,'%')
print('Validation label =',y_val[:15])
print('Predicted label =',y_pred[:15])

In [0]:
y_pred = predict_two_layer_relu(X_test_feats, W, b)

accuracy = sklearn.metrics.accuracy_score(y_test, y_pred)

print('Testing Accuracy =', accuracy*100,'%')
print('Test label      =',y_test[:15])
print('Predicted label =',y_pred[:15])

An important way to gain intuition about how an algorithm works is to visualize the mistakes that it makes.

In this visualization, we show examples of images that are misclassified by our current system.

The first column shows images that our system labeled as "plane" but whose true label is something other than "plane".

In [0]:
examples_per_class = 8
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for cls, cls_name in enumerate(classes):
    idxs = np.where((y_test != cls) & (y_pred == cls))[0]
    # sample at most as many images as there are mistakes for this class
    idxs = np.random.choice(idxs, min(examples_per_class, len(idxs)), replace=False)
    for i, idx in enumerate(idxs):
        plt.subplot(examples_per_class, len(classes), i * len(classes) + cls + 1)
        plt.imshow(X_test[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls_name)
plt.show()


---

# Congratulations, You've Completed Exercise 2

<p>Copyright &copy;  <a href=https://www.linkedin.com/in/andityaarifianto/>2019 - ADF</a> </p>

![footer](https://image.ibb.co/hAHDYK/footer2018.png)