![title](https://i.ibb.co/f2W87Fg/logo2020.png)

---


<table  class="tfo-notebook-buttons" align="left"><tr><td>
    
<a href="https://colab.research.google.com/github/adf-telkomuniv/CV2020_Exercises/blob/master/CV2020 - 03 - Neural Network.ipynb" source="blank" ><img src="https://colab.research.google.com/assets/colab-badge.svg"></a>
</td><td>
<a href="https://github.com/adf-telkomuniv/CV2020_Exercises/blob/master/notebooks/CV2020 - 03 - Neural Network.ipynb" source="blank" ><img src="https://i.ibb.co/6NxqGSF/pinpng-com-github-logo-png-small.png"></a>
    
</td></tr></table>


# Task 3 - Artificial Neural Network


In this assignment you will practice putting together a simple image classification pipeline, based on the Two-layer Neural Network classifier. The goals of this assignment are as follows:

    * understand the basic image classification pipeline using a multi-layer neural network
    * understand the train/val/test splits and the use of validation data for hyperparameter tuning
    * develop proficiency in writing efficient vectorized code with numpy

    * implement and apply various classic activation functions
    * implement and apply mini-batch gradient descent
    * implement and apply regularization
    * implement and apply a hyperparameter fine-tuning strategy

    * understand the differences and tradeoffs between these classifiers
    * understand the use of feature extraction to boost neural network performance

---
---


# [Part 0] Import Libraries

In [None]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

np.set_printoptions(precision=7)

Write down your Name and Student ID

In [None]:
## --- start your code here ----

NIM = ??
Nama = ??

## --- end your code here ----

# [Part 1] Binary Single Layer Perceptron

## 1 - Load Dataset

For this exercise, we will use binary-class data to recognize `cats` and `not cats`. 
The images are $64\times64$ pixels with 3 color channels.


---
### a. Load Cat Dataset
First, load the dataset.


In [None]:
import h5py    
    
def load_dataset():
    !wget 'https://raw.githubusercontent.com/CNN-ADF/Task2020/master/resources/catvnoncat.h5'
    dataset = h5py.File('catvnoncat.h5', "r")

    train_set_x_orig  = np.array(dataset["X_train"][:]) # your train set features
    train_set_y_orig  = np.array(dataset["y_train"][:]) # your train set labels
    val_set_x_orig    = np.array(dataset["X_val"][:]) # your val set features
    val_set_y_orig    = np.array(dataset["y_val"][:]) # your val set labels
    classes           = np.array(dataset["classes"][:]) # the list of classes
    
    return train_set_x_orig, train_set_y_orig, val_set_x_orig, val_set_y_orig, classes

In [None]:
X_train_ori, y_train, X_val_ori, y_val, classes = load_dataset()

In [None]:
print(X_train_ori.shape)
print(y_train.shape)
print(X_val_ori.shape)
print(y_val.shape)

**EXPECTED OUTPUT**: 
<pre>
(209, 64, 64, 3)
(209, 1)
(50, 64, 64, 3)
(50, 1)
</pre>

View some data

In [None]:
fig, ax = plt.subplots(2,10,figsize=(18,5))
fig.subplots_adjust(hspace=0.1, wspace=0.1)
for j in range(0,2):
    for i in range(0, 10):
        ax[j,i].imshow(X_train_ori[i+j*10])
        ax[j,i].set_title(classes[y_train[i+j*10,0]].decode("utf-8") )
        ax[j,i].axis('off')
plt.show()

---
### b. Reshape and Normalize

#### <font color='red'>**EXERCISE**: </font>
    * Reshape `X_train_ori` and `X_val_ori` so that each image becomes a 1-dimensional row vector,
      i.e. into 2-dimensional matrices of shape (num_images, 64*64*3)
    * Store them as `X_train` and `X_val`
    * the shapes of `X_train_ori` and `X_val_ori` should still be 
      (209, 64, 64, 3) and (50, 64, 64, 3)

<br>

*Hint: use `np.reshape()`*
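For illustration, here is the same kind of reshape on a small random array standing in for the image batch (the toy shape `(4, 2, 2, 3)` is an arbitrary assumption). Passing `-1` as the second dimension lets numpy infer the flattened length, and `reshape` returns a new array object instead of changing the shape of the original.

```python
import numpy as np

# toy stand-in for an image batch: 4 images of 2x2 pixels with 3 color channels
images = np.random.randint(0, 256, size=(4, 2, 2, 3))

# flatten every image into one row; -1 lets numpy infer 2*2*3 = 12
flat = images.reshape(images.shape[0], -1)

print(images.shape)  # (4, 2, 2, 3): the original array keeps its shape
print(flat.shape)    # (4, 12)

# each row holds the pixels of one image in row-major (C) order
print(np.array_equal(flat[1], images[1].ravel()))  # True
```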

In [None]:
X_train = ??
X_val   = ??

In [None]:
print('before')
print(X_train_ori.shape)
print(X_val_ori.shape)

print('\nafter')
print(X_train.shape)
print(X_val.shape)

**EXPECTED OUTPUT**: 
<pre>
before
(209, 64, 64, 3)
(50, 64, 64, 3)

after
(209, 12288)
(50, 12288)
</pre>


<br>

#### <font color='red'>**EXERCISE**: </font>
    * Since a sigmoid output with a regression loss is enough for this dataset, 
      scale the dataset into the `range 0-1` by dividing it by `255`
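A quick sketch of why the division should create a new array rather than run in place: the pixel values printed below are integers (assumed here to be `uint8`, which is an assumption about the HDF5 file), and numpy refuses to store the floating-point result of `/` back into an integer array. Only the six pixel values come from the expected output; the rest is illustrative.

```python
import numpy as np

# the first six pixel values from the expected output, stored as uint8 (assumed dtype)
pixels = np.array([[17, 31, 56, 22, 33, 59]], dtype=np.uint8)

# true division promotes to float64 and returns a new array
scaled = pixels / 255
print(scaled.dtype)                              # float64
print(scaled.min() >= 0 and scaled.max() <= 1)   # True

# in-place division fails: the float result cannot be cast back into uint8
inplace_failed = False
try:
    pixels /= 255
except TypeError:   # numpy raises a subclass of TypeError for this casting error
    inplace_failed = True
print(inplace_failed)                            # True
```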

In [None]:
print('before',X_train[0,:6])

**EXPECTED OUTPUT**: 
<pre>
before [17 31 56 22 33 59]
</pre>

In [None]:
# divide with `255`
X_train = ??
X_val   = ??

In [None]:
print('after',X_train[0,:6])

**EXPECTED OUTPUT**: 
<pre>
after [0.0666667 0.1215686 0.2196078 0.0862745 0.1294118 0.2313725]
</pre>

---
## 2 - Basic Neurons

<center>
<img src="https://i.ibb.co/y4Zz3Gy/neuron.png" width="50%" >
</center>

A standard neuron is essentially the affine (linear) function from the previous task, followed by an activation function.


---
### a. Forward and Backward Affine Function



#### <font color='red'>**EXERCISE**: </font>
    * Implement Affine forward function:

$$
\begin{align}
f(x, W, b) = x.W + b
\end{align}
$$
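A minimal sketch of the shapes involved, using arbitrary toy sizes: `x` holds one example per row, `W` one column per output unit, and the `(1, M)` bias row is broadcast over all rows. The loop version is only there to show what the vectorized expression computes.

```python
import numpy as np

N, D, M = 4, 5, 3                 # batch size, input dim, output units (arbitrary toy sizes)
x = np.random.randn(N, D)         # one input example per row
W = np.random.randn(D, M)         # one column of weights per output unit
b = np.zeros((1, M))              # row-vector bias, broadcast over the N rows

v = x.dot(W) + b                  # (N, D) . (D, M) -> (N, M), then + (1, M)
print(v.shape)                    # (4, 3)

# the same result, one example at a time
v_loop = np.array([x[n] @ W + b[0] for n in range(N)])
print(np.allclose(v, v_loop))     # True
```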


In [None]:
def affine_forward(x, W, b):  
  
    v = ??            # x dot W + b   
    
    return v



<br>

#### <font color='red'>**EXERCISE**: </font>
    * Implement affine backward function:


$$
\begin{align*}
\partial W & = x^T.\partial out \\
\partial b & = \sum \partial out \\
\partial x & = \partial out.W^T \\
\end{align*}
$$
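One way to convince yourself that these three formulas are right is a numerical gradient check. The sketch below (the helper `numeric_grad` is hypothetical, written just for this check) builds the scalar $L = \sum (xW + b) \odot \partial out$, whose gradient with respect to the affine output is exactly $\partial out$, and compares the formulas against central finite differences.

```python
import numpy as np

def numeric_grad(f, p, h=1e-5):
    """Central-difference gradient of the scalar function f() w.r.t. the array p."""
    grad = np.zeros_like(p)
    for idx in np.ndindex(p.shape):
        old = p[idx]
        p[idx] = old + h
        f_plus = f()
        p[idx] = old - h
        f_minus = f()
        p[idx] = old                      # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5))
W = rng.standard_normal((5, 3))
b = rng.standard_normal((1, 3))
dout = rng.standard_normal((4, 3))        # stand-in for the upstream gradient

def scalar_loss():
    # L = sum((xW + b) * dout), so dL/d(out) is exactly dout
    return np.sum((x.dot(W) + b) * dout)

dW = x.T.dot(dout)                        # dW = x^T . dout
db = np.sum(dout, axis=0, keepdims=True)  # db = sum of dout over the batch
dx = dout.dot(W.T)                        # dx = dout . W^T

print(np.allclose(dW, numeric_grad(scalar_loss, W)))  # True
print(np.allclose(db, numeric_grad(scalar_loss, b)))  # True
print(np.allclose(dx, numeric_grad(scalar_loss, x)))  # True
```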

In [None]:
def affine_backward(dout, x, W, b):
  
    dW = ??           # x.T dot dout
    db = ??           # sum dout, axis=0, keepdims=True
    dx = ??           # dout dot W.T
    
    return dW, db, dx

---
### b. Forward and Backward Sigmoid Function


Also implement the activation function

#### <font color='red'>**EXERCISE**: </font>
    * Implement Sigmoid forward function:

$$
\begin{align}
f(x) = \sigma(x) = \frac{1}{1+e^{-x}}
\end{align}
$$

<center>
<img src="https://i.ibb.co/QjF8sDx/sigmoid.png" width="20%" >
</center>
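Coding the formula directly with `np.exp(-x)` works for this exercise, but for large negative inputs $e^{-x}$ overflows and numpy emits a `RuntimeWarning` (the result still rounds to 0). The sketch below shows one common numerically stable variant, offered as an optional alternative rather than what the exercise requires; `stable_sigmoid` is a hypothetical name.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that never evaluates exp() of a large positive number."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # for x >= 0, exp(-x) <= 1, so the textbook formula is safe
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # for x < 0, rewrite 1 / (1 + e^{-x}) as e^{x} / (1 + e^{x}); exp(x) < 1 here
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out

x = np.array([-2, -1, 0, 1, 2, 3])
print(np.allclose(stable_sigmoid(x), 1.0 / (1.0 + np.exp(-x))))  # True

# turn overflow into an error to show that no overflow happens
with np.errstate(over='raise'):
    print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # [0.  0.5 1. ]
```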


In [None]:
def sigmoid_forward(x):  
  
    out = ??

    return out

Check your implementation

In [None]:
x  = np.array([-2, -1, 0, 1, 2, 3])
ds = sigmoid_forward(x)

print(ds)

**EXPECTED OUTPUT**:

<pre> [0.1192029 0.2689414 0.5       0.7310586 0.8807971 0.9525741]</pre>


---
<br>

#### <font color='red'>**EXERCISE**: </font>
    * Implement Sigmoid backward function
$$
\begin{align*}
\sigma'(x) = \sigma(x) \ (1 - \sigma(x))\\\\
\partial out = \partial out \odot \sigma'(x)
\end{align*}
$$
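The identity $\sigma'(x) = \sigma(x)(1-\sigma(x))$ can be checked numerically with a central finite difference. The sketch also multiplies by the same seeded random upstream gradient used in the check cell (`np.random.seed(10)`), so its last result should agree with the expected output of `sigmoid_backward`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
s = sigmoid(x)

analytic = s * (1 - s)                                 # sigma'(x) computed from the forward output
h = 1e-5
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # central finite difference
print(np.allclose(analytic, numeric))                  # True

# chain rule: multiply the upstream gradient element-wise by the local gradient
np.random.seed(10)
dout = np.random.random((6,))
print(dout * analytic)
```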
<br>

In [None]:
def sigmoid_backward(dout, ds):
    """
    Argument:
        ds: sigmoid forward result
        dout: gradient error
    """
    
    # calculate the local gradient of sigmoid
    ds_prime = ??
    
    # calculate the gradient propagation
    dout = ??
    return dout

Check your implementation

In [None]:
np.random.seed(10)
dout = np.random.random((6,)) 
dout = sigmoid_backward(dout, ds)
print(dout)

**EXPECTED OUTPUT**:

<pre> [0.0809837 0.0040801 0.1584121 0.1472238 0.05234   0.0101556]</pre>



---
### c. Forward and Backward Tanh Function

We haven't discussed activation functions in class yet, but they are easy to implement, so you should be able to handle them.

#### <font color='red'>**EXERCISE**: </font>
    * Implement `Tanh` forward function

*Hint: use `np.tanh(x)`*

$$
\begin{align}
f(x) = \tanh(x)
\end{align}
$$

<center>
<img src="https://i.ibb.co/JQCczk5/tanh.png" width="20%" >
</center>
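Tanh is a rescaled and shifted sigmoid, $\tanh(x) = 2\sigma(2x) - 1$, which is why its output lies in $(-1, 1)$ and is centered at zero. The sketch checks this identity numerically on the same inputs as the check cell; the local `sigmoid` helper is redefined only to keep the block self-contained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
via_sigmoid = 2 * sigmoid(2 * x) - 1                 # tanh written in terms of the sigmoid

print(np.allclose(np.tanh(x), via_sigmoid))          # True
print(np.tanh(x).min() > -1, np.tanh(x).max() < 1)   # True True
```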


In [None]:
def tanh_forward(x):  
  
    out = ??
    
    return out

Check your implementation

In [None]:
x  = np.array([-2, -1, 0, 1, 2, 3])
dt = tanh_forward(x)

print(dt)

**EXPECTED OUTPUT**:

<pre> [-0.9640276 -0.7615942  0.         0.7615942  0.9640276  0.9950548]</pre>



---
<br>

#### <font color='red'>**EXERCISE**: </font>
    * Implement Tanh backward function
$$
\begin{align*}
f'(x) = 1-f(x)^2\\
\\
\partial out = \partial out \odot f'(x)
\end{align*}
$$
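As with the sigmoid, the derivative $1-\tanh(x)^2$ can be verified with finite differences, and multiplying it by the seeded upstream gradient should reproduce the expected output of `tanh_backward`. A minimal sketch:

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
t = np.tanh(x)

analytic = 1 - t ** 2                                    # f'(x) = 1 - f(x)^2
h = 1e-5
numeric = (np.tanh(x + h) - np.tanh(x - h)) / (2 * h)    # central finite difference
print(np.allclose(analytic, numeric))                    # True

np.random.seed(10)
dout = np.random.random((6,))
print(dout * analytic)                                   # element-wise chain rule
```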
<br>

In [None]:
def tanh_backward(dout, dt):
    """
    Argument:
        dt: tanh forward result
        dout: gradient error
    """
    
    # calculate the local gradient of tanh
    dt_prime = ??           # 1- dt^2
    
    # calculate the gradient propagation
    dout = ??
    
    return dout

Check your implementation

In [None]:
np.random.seed(10)
dout = np.random.random((6,)) 
dout = tanh_backward(dout, dt)
print(dout)

**EXPECTED OUTPUT**:

<pre> [0.0544944 0.0087153 0.6336482 0.3144784 0.0352199 0.0022179]</pre>


---
### d. Forward and Backward ReLU Function

Next, implement the ReLU activation function. As with tanh, it has not been covered in class yet, but it is straightforward to implement.


#### <font color='red'>**EXERCISE:** </font>
* Implement ReLU forward function:


$$
\begin{align}
f(x) = 
\begin{cases}
0, & \text{for } x<0\\
x, & \text{for } x\geq0
\end{cases}
\end{align}
$$

<center>
<img src="https://i.ibb.co/rmXqJyh/relu.png" width="20%" >
</center>

*Hint: use `np.maximum()`*

In [None]:
def relu_forward(x): 
  
    out = ??
    
    return out
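
Note that ReLU needs `np.maximum`, the element-wise maximum of two arrays (a scalar is broadcast), and not `np.max`, which reduces an array to its single largest value. A small sketch on a 2-D array, with an equivalent mask-based formulation for comparison:

```python
import numpy as np

x = np.array([[-2, -1, 0],
              [1, 2, 3]])

out = np.maximum(0, x)       # element-wise max against the broadcast scalar 0
print(out)
# [[0 0 0]
#  [1 2 3]]

# equivalent formulation: keep x where it is positive, zero elsewhere
print(np.array_equal(out, x * (x > 0)))  # True
print(np.max(x))             # 3: np.max reduces the whole array to one value instead
```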

Check your implementation

In [None]:
x  = np.array([-2, -1, 0, 1, 2, 3])
dr = relu_forward(x) 

print(dr)

**EXPECTED OUTPUT**:

<pre> [0 0 0 1 2 3]</pre>


---

<br>

#### <font color='red'>**EXERCISE:** </font>
* Implement ReLU backward function
$$
\begin{align*}
f'(x) = 
\begin{cases}
0, & \text{for } x\leq0\\
1, & \text{for } x>0
\end{cases}\\
\\
\partial out = \partial out \odot f'(x)
\end{align*}
$$

*Hint: use `np.where(condition, value if true, value if false)`*

In [None]:
def relu_backward(dout, x):
    """
    Argument:
        x: relu input
        dout: gradient error
    """
    
    # calculate the local gradient of relu
    dr_prime = ??
    
    # calculate the gradient propagation
    dout = ??
    
    return dout
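
The derivative of ReLU is undefined at exactly $x=0$; the expected output of the check cell below corresponds to using 0 there, i.e. a mask of `x > 0`. A sketch of that convention with `np.where`, using the same seeded upstream gradient:

```python
import numpy as np

x = np.array([-2, -1, 0, 1, 2, 3])        # ReLU input, as in the check cell
np.random.seed(10)
dout = np.random.random((6,))             # same seeded upstream gradient

local_grad = np.where(x > 0, 1.0, 0.0)    # derivative taken as 0 at x == 0
dx = dout * local_grad                    # element-wise chain rule
print(dx)
```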

Check your implementation

In [None]:
np.random.seed(10)
dout = np.random.random((6,)) 
dout = relu_backward(dout, x)
print(dout)

**EXPECTED OUTPUT**:

<pre> [0.        0.        0.        0.7488039 0.498507  0.2247966]</pre>



---
## 3 - One-Layer Sigmoid

We'll train a single-layer perceptron with a sigmoid output for binary classification using full-batch **gradient descent**.


<center>
<img src="https://i.ibb.co/4mQBjd8/Single-Layer-Perceptron.png" width="25%" >
</center>



### a. Training Function

The network architecture should be: <br>
<pre><font color="blue"><b>Input - FC layer - Sigmoid</b></font></pre>



#### <font color='red'>**EXERCISE:** </font>
Implement Training Function

    * call affine forward function
    * call sigmoid forward function
    * compute the L2 regression loss (MSE)
    * call affine backward function
    * implement weight update

<br>

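**Note** that we do not call `sigmoid backward` here: in this single-layer perceptron the sigmoid is the output activation, and the error `act1 - y` is passed directly to `affine backward`. This simplified update is in fact the gradient of the binary cross-entropy loss with respect to the pre-sigmoid score; the MSE is computed only to monitor training.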

In [None]:
def train_one_layer(X, y, W=None, b=None, lr=0.005, epochs=100, verbose=True):
    
    
    num_train, dim = X.shape
    
    num_classes = 1
    
    # initialize weights if not provided
    if W is None:
        W = 0.02*np.random.rand(dim, num_classes)
    if b is None:
        b = np.zeros((1, num_classes))

    # Run full-batch gradient descent to optimize W
    loss_history = []
                     
    for ep in range(epochs):
        

        # calculate 1st layer score by calling affine forward function using X, W, and b
        layer1 = ??
        
        # calculate 1st activation score by calling sigmoid forward function using layer1 score
        act1 = ??
        
        # calculate error by subtracting act1 with y
        error = ??
        
        # calculate L2 Loss (MSE) by averaging the squared error
        loss = ??  
        
        # divide error by num_train
        error = ??
    
        # calculate layer 1 weights gradient by calling affine backward function using error, X, W, and b
        dW, db, _ = ??

        
        # perform parameter update by subtracting each W and b with a fraction of dW and db
        # according to the learning rate
        W = ??                       # W-lr*dW
        b = ??                       # b-lr*db

        
        # append the loss history
        loss_history.append(loss)

        if verbose and ep % 100 == 0:
            print ('epoch',ep,'/',epochs, ': loss =', loss)

    print('Training done')
    return W, b, loss_history
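
The training loop above amounts to full-batch gradient descent on a single sigmoid unit. Below is a minimal self-contained sketch of the same steps on a synthetic 2-D dataset; the data, learning rate, and epoch count are assumptions chosen for illustration, not the notebook's values. As in the note, the error `act - y` goes straight into the affine backward step.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy, linearly separable data (an assumption for illustration): label 1 when x0 + x1 > 0
X = rng.standard_normal((200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)   # shape (N, 1), like y_train

W = 0.02 * rng.random((2, 1))
b = np.zeros((1, 1))
lr, epochs = 0.5, 200          # illustrative values, not the notebook's
loss_history = []

for ep in range(epochs):
    act = 1.0 / (1.0 + np.exp(-(X.dot(W) + b)))   # affine forward + sigmoid forward on the full batch
    error = act - y
    loss_history.append(np.mean(error ** 2))      # MSE, recorded for monitoring
    error /= X.shape[0]
    dW = X.T.dot(error)                           # affine backward (no sigmoid backward)
    db = np.sum(error, axis=0, keepdims=True)
    W -= lr * dW                                  # one update per pass over all the data
    b -= lr * db

act = 1.0 / (1.0 + np.exp(-(X.dot(W) + b)))
accuracy = np.mean(np.round(act) == y)
print(loss_history[0], loss_history[-1], accuracy)
```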

### b. Train the Binary Classifier

Run the training function with the initial parameters

In [None]:
W, b, loss = train_one_layer(X_train, y_train, epochs=1000, lr=0.005)

**EXPECTED OUTPUT**:

<pre>The loss should start around 0.6 and end around 0.05</pre>

Visualize the loss

In [None]:
plt.rcParams['figure.figsize'] = [12, 5]
plt.plot(loss)
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.title('Training Loss history')
plt.show()


### c. Predict Function

The network architecture should be: <br>
<pre><font color="blue"><b>Input - FC layer - Sigmoid</b></font></pre>



#### <font color='red'>**EXERCISE:** </font>
Implement Predict Function

    * call the affine forward and sigmoid forward functions

In [None]:
def predict_one_layer(X, W, b):    
    y_pred = np.zeros(X.shape[0])
    
    # calculate 1st layer score by calling affine forward function using X, W, and b
    layer1 = ??

    # calculate 1st activation score by calling sigmoid forward function using layer1 score
    act1 = ??
    
    # since it's a binary class, round the score to get the class
    y_pred = np.round(act1)
    
    return y_pred
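
`np.round` rounds halves to the nearest even value, so a sigmoid output of exactly 0.5 is mapped to class 0. This rarely matters in practice, but if you prefer an explicit threshold, a comparison such as `act1 >= 0.5` makes the decision rule visible. A small sketch with made-up scores:

```python
import numpy as np

scores = np.array([[0.2], [0.5], [0.5000001], [0.97]])   # made-up sigmoid outputs

print(np.round(scores).ravel())               # [0. 0. 1. 1.]  (0.5 rounds to the even value 0)
print((scores >= 0.5).astype(int).ravel())    # [0 1 1 1]      (explicit threshold at 0.5)
```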

### d. Training Accuracy
Calculate the Training Accuracy

In [None]:
import sklearn
from sklearn.metrics import accuracy_score

y_pred = predict_one_layer(X_train, W, b)
accuracy = sklearn.metrics.accuracy_score(y_train, y_pred)

print('Training Accuracy =',accuracy*100,'%')

print('Training label    =',y_train[:15].ravel())
print('Predicted label   =',y_pred[:15].astype('int').ravel())

**EXPECTED OUTPUT**:

<pre>You should get about <b>~95%</b> accuracy on the training set using the initial run</pre>

Calculate the Validation Accuracy

In [None]:
y_pred = predict_one_layer(X_val, W, b)
accuracy = sklearn.metrics.accuracy_score(y_val, y_pred)
print('Validation Accuracy =', accuracy*100,'%')

print('Validation label    =',y_val[:15].ravel())
print('Predicted label     =',y_pred[:15].astype('int').ravel())

**EXPECTED OUTPUT**:

<pre>You should also get about <b>~74%</b> accuracy on the validation set</pre>

<br>

You can continue training from the current weights by passing the pre-trained `W` and `b` to the training function



In [None]:
# W, b, loss = train_one_layer(X_train, y_train, W=W, b=b, epochs=1000, lr=0.005)

---
---
# [Part 2] Multi-Layered Perceptron

Now, let's build some multi-layer neural networks and train them on the CIFAR-10 dataset

---
## 1 - Load CIFAR-10 Dataset

CIFAR-10 is an image classification dataset consisting of 10 classes. 

The images are $32\times32$ color images, with 50,000 training images and 10,000 test images

---
### a. Import Data CIFAR-10

In [None]:
import tensorflow as tf

(X_train_ori, y_train_ori), (X_test_ori, y_test_ori) = tf.keras.datasets.cifar10.load_data()
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

print('X_train.shape =',X_train_ori.shape)
print('y_train.shape =',y_train_ori.shape)
print('X_test.shape  =',X_test_ori.shape)
print('y_test.shape  =',y_test_ori.shape)


---
### b. Visualizing Data
Show the first 20 images from `X_train`

In [None]:
fig, ax = plt.subplots(2,10,figsize=(15,4.5))
fig.subplots_adjust(hspace=0.1, wspace=0.1)
for j in range(0,2):
    for i in range(0, 10):
        ax[j,i].imshow(X_train_ori[i+j*10])
        ax[j,i].set_title(classes[y_train_ori[i+j*10,0]])
        ax[j,i].axis('off')
plt.show()

---
### c. Split Training Data

Use the last 10,000 images of the training set as the validation set

In [None]:
X_val   = X_train_ori[-10000:,:]
y_val   = y_train_ori[-10000:]

X_train = X_train_ori[:-10000, :]
y_train = y_train_ori[:-10000]

X_test  = X_test_ori
y_test  = y_test_ori

print('X_val.shape   =',X_val.shape)
print('y_val.shape   =',y_val.shape)
print('X_train.shape =',X_train.shape)
print('y_train.shape =',y_train.shape)

---
### d. Normalizing Data
Normalize `X_train`, `X_val` and `X_test` by *zero-centering* them, i.e. by subtracting the mean image of the training set from all three splits:

In [None]:
X_train = X_train.astype('float32')
X_val   = X_val.astype('float32')
X_test  = X_test.astype('float32')

mean_image = np.mean(X_train, axis = 0)
X_train   -= mean_image
X_val     -= mean_image
X_test    -= mean_image

print('np.mean(X_train) =',np.mean(X_train))
print('np.mean(X_val)   =',np.mean(X_val))
print('np.mean(X_test)  =',np.mean(X_test))
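
The mean is taken along `axis=0`, so `mean_image` has the shape of a single image: one mean per pixel position and color channel. It is computed on the training split only and reused for the other splits, which is why the printed means of `X_val` and `X_test` are close to, but not exactly, zero. A sketch on random toy data (the sizes are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(100, 4, 4, 3)).astype('float32')   # 100 toy 4x4 RGB images
val = rng.integers(0, 256, size=(20, 4, 4, 3)).astype('float32')

mean_image = np.mean(train, axis=0)   # one mean per pixel position and channel
print(mean_image.shape)               # (4, 4, 3)

train -= mean_image
val -= mean_image                     # reuse the training mean for the other splits

# every training pixel position is now centered at (almost exactly) zero
print(np.allclose(train.mean(axis=0), 0, atol=1e-3))   # True
```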


---
### e. Reshape Data
Flatten each image in `X_train`, `X_val` and `X_test` into a 1-dimensional vector, giving 2-dimensional matrices of shape (num_images, 3072)


In [None]:
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1] * X_train.shape[2] * X_train.shape[3]) )
X_val   = X_val.reshape(  (X_val.shape[0],   X_val.shape[1]   * X_val.shape[2]   * X_val.shape[3]) )
X_test  = X_test.reshape( (X_test.shape[0],  X_test.shape[1]  * X_test.shape[2]  * X_test.shape[3]) )

print('X_train.shape =',X_train.shape)
print('X_val.shape   =',X_val.shape)
print('X_test.shape  =',X_test.shape)

Flatten `y_train`, `y_val` and `y_test` into 1-dimensional vectors


In [None]:
y_train = y_train.ravel()
y_val   = y_val.ravel()
y_test  = y_test.ravel()

print('y_train.shape =',y_train.shape)
print('y_val.shape   =',y_val.shape)
print('y_test.shape  =',y_test.shape)

---
## 2 - Softmax Function 


### a. Softmax Score

The implementation is the same as in the previous task, so it is provided here.


In [None]:
def softmax(x):  
    
    # shift each row of x by subtracting that row's maximum value, for numerical stability.
    # Use np.max(..., axis=1, keepdims=True) and build a new array so the caller's x is not modified
    x = x - np.max(x, axis=1, keepdims=True)
    
    # Apply exp() element-wise to x. Use np.exp(...).
    x_exp = np.exp(x)
    
    # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis=1, keepdims=True).
    x_sum = np.sum(x_exp, axis = 1, keepdims = True)
    
    # Compute softmax(x) score by dividing x_exp by x_sum (numpy broadcasting applies automatically).
    score = x_exp / x_sum
    
    return score
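
Subtracting the row maximum does not change the result, because softmax is invariant to adding the same constant to every score in a row, but it keeps `np.exp` from overflowing. The shift has to be per row: with a single global maximum, a row whose scores all sit far below that maximum can underflow to zeros and produce `nan`. A small sketch with made-up logits (`softmax_rows` is a hypothetical stand-alone copy of the per-row version):

```python
import numpy as np

logits = np.array([[1000.0, 1001.0, 1002.0],
                   [-5.0, 0.0, 5.0]])          # made-up scores for two examples

def softmax_rows(z):
    z = z - np.max(z, axis=1, keepdims=True)   # per-row shift
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)

probs = softmax_rows(logits)
print(np.allclose(probs.sum(axis=1), 1.0))     # True: every row is a distribution
print(np.allclose(probs[0], softmax_rows(np.array([[0.0, 1.0, 2.0]]))[0]))  # True: shift invariance

with np.errstate(over='ignore', under='ignore', invalid='ignore'):
    # no shift: exp(1000) overflows to inf and inf / inf gives nan
    naive = np.exp(logits) / np.sum(np.exp(logits), axis=1, keepdims=True)
    # one global shift: the second row drops to about -1000, underflows, and 0 / 0 gives nan
    shifted = logits - np.max(logits)
    global_shift = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)

print(np.isnan(naive[0]).all(), np.isnan(global_shift[1]).all())  # True True
```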

### b. Softmax Loss

The softmax loss function is also the same as in the previous task, so it is provided here as well.


In [None]:
def softmax_loss(score, y):
   
    num_examples = score.shape[0]
    
    # make an index list containing [0 1 2 ... n-1]
    number_list = range(num_examples)
    
    # calculate the negative log probability of the correct class, score[number_list, y], by applying -np.log(...)
    correct_logprobs = -np.log(score[number_list, y])
    
    # average the negative log probabilities: use np.sum, then divide by num_examples
    loss = np.sum(correct_logprobs) / num_examples
    
    # compute the gradient of the loss with respect to the pre-softmax scores (logits):
    # (softmax probabilities - one-hot labels) / num_examples.
    # Copy first so that the caller's softmax scores are not modified in place.
    dscores = score.copy()
    dscores[number_list, y] -= 1
    dscores /= num_examples

    
    return loss, dscores
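
The gradient returned here, (probabilities − one-hot labels) / N, is the gradient of the averaged cross-entropy with respect to the *logits* (the input of `softmax`), which is why the training functions can pass `dout` straight into `affine_backward` for the last layer without a separate softmax backward step. A finite-difference sketch on random logits (sizes, seed, and labels are arbitrary; `mean_cross_entropy` is a hypothetical helper):

```python
import numpy as np

def softmax_rows(z):
    z = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)

def mean_cross_entropy(z, y):
    p = softmax_rows(z)
    return -np.mean(np.log(p[np.arange(len(y)), y]))

rng = np.random.default_rng(1)
z = rng.standard_normal((5, 4))      # logits: 5 examples, 4 classes
y = np.array([0, 3, 1, 1, 2])        # arbitrary labels

# analytic gradient: (probabilities - one-hot labels) / N
analytic = softmax_rows(z)
analytic[np.arange(len(y)), y] -= 1
analytic /= len(y)

# numeric gradient with central differences
h = 1e-5
numeric = np.zeros_like(z)
for idx in np.ndindex(z.shape):
    z_plus, z_minus = z.copy(), z.copy()
    z_plus[idx] += h
    z_minus[idx] -= h
    numeric[idx] = (mean_cross_entropy(z_plus, y) - mean_cross_entropy(z_minus, y)) / (2 * h)

print(np.allclose(analytic, numeric, atol=1e-7))   # True
```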

---
## 3 - Two-Layered Sigmoid with Softmax


<center>
<img src="https://i.ibb.co/0KVzGBJ/2-Layer-Perceptron.png" width="35%" >
</center>


---

### a. Training Function

The network architecture should be: 
<pre><font color="blue"><b>Input - FC layer - Sigmoid - FC Layer - Softmax</b></font></pre>



#### <font color='red'>**EXERCISE:** </font>

Implement Training Function

    * call affine forward function
    * call sigmoid forward function
    * call affine forward function
    
    * call softmax function
    * call softmax_loss function
    
    * call affine backward function
    * call sigmoid backward function
    * call affine backward function
    
    * implement weight update

In [None]:
def train_two_layer_sigmoid(X, y, hidden_size, W=None, b=None, 
                            lr=1e-4, lr_decay=0.9, reg=0.25, 
                            epochs=100, batch_size=200, verbose=True):
    
    num_train, dim = X.shape
    
    # check if data train is divisible by batch size
    assert num_train % batch_size == 0, "data train " + str(num_train) + " is not divisible by batch size " + str(batch_size)
    
    # total iteration per epoch
    num_iter = num_train // batch_size
    
    #start iteration counts
    it = 0
    
    num_classes = np.max(y) + 1  # assume y takes values 0...K-1 where K is number of classes
    
    # initialize weights if not provided
    if W is None:
        W0 = 1e-4 * np.random.randn(dim, hidden_size)
        W1 = 1e-4 * np.random.randn(hidden_size, num_classes)
        W = [W0, W1]
    if b is None:
        b0 = np.zeros((1,hidden_size))
        b1 = np.zeros((1,num_classes))
        b = [b0, b1]

    # Run stochastic gradient descent to optimize W
    loss_history = []
                     
    for ep in range(epochs):
        # Shuffle data train index
        train_rows = np.arange(num_train)
        np.random.shuffle(train_rows)
        
        # split index into mini batches
        id_batch = np.split(train_rows, num_iter)
        
        for batch in id_batch:
          
            # get mini batch data and label
            X_batch = X[batch]
            y_batch = y[batch]


            # calculate 1st layer score by calling affine forward function using X_batch, W[0], and b[0]
            layer1 = ??


            # calculate 1st activation score by calling sigmoid forward function using layer1 score
            act1 = ??


            # calculate 2nd layer score by calling affine forward function using act1, W[1], and b[1]
            layer2 = ??


            # calculate softmax score by calling softmax function using layer2 score
            softmax_score = ??


            # evaluate loss and gradient by calling softmax_loss function using softmax_score and y_batch
            loss, dout = ??


            # add regularization to the loss
            loss+= reg * (np.sum(W[0] * W[0]) + np.sum(W[1] * W[1]))

            # append the loss history
            loss_history.append(loss)

            # calculate layer 2 weights gradient by calling affine backward function using dout, act1, W[1], and b[1]
            dW1, db1, dact1 = ??


            # calculate sigmoid gradient by calling sigmoid backward function using dact1 and act1 score
            dlayer1 = ??


            # calculate layer 1 weights gradient by calling affine backward function using dlayer1, X_batch, W[0], and b[0]
            dW0, db0, _ = ??


            # perform regularization gradient
            dW1 += 2 * reg * W[1]
            dW0 += 2 * reg * W[0]

            # perform parameter update by subtracting W and b with a fraction of dW and db
            # according to the learning rate            
            W[0] = ??            # W0 - lr * dW0
            b[0] = ??            # b0 - lr * db0
            W[1] = ??            # W1 - lr * dW1
            b[1] = ??            # b1 - lr * db1
        
            # iteration count
            it +=1

            if verbose and it % 100 == 0:
                print ('iteration',it,'(epoch', ep,'/',epochs, '): loss =', loss)
                
                
        # At the end of one epoch
        # Decay learning rate
        lr *= lr_decay
            
    print('Training Done')
    return W, b, loss_history
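
The shuffle-and-split step in the training loop can be tried on its own with toy sizes. `np.split` requires the array to divide into equal parts (otherwise it raises an error), which is why the function asserts that `num_train` is divisible by `batch_size`; `np.array_split` would allow uneven batches. The last lines show the learning-rate schedule produced by multiplying `lr` by `lr_decay` once per epoch.

```python
import numpy as np

num_train, batch_size = 10, 5            # toy sizes; num_train must be divisible by batch_size
num_iter = num_train // batch_size

np.random.seed(0)
train_rows = np.arange(num_train)
np.random.shuffle(train_rows)            # shuffles the index array in place
id_batch = np.split(train_rows, num_iter)

print([len(batch) for batch in id_batch])    # [5, 5]
# every training index appears exactly once per epoch
print(np.array_equal(np.sort(np.concatenate(id_batch)), np.arange(num_train)))  # True

# learning-rate decay: multiply by lr_decay once at the end of each epoch
lr, lr_decay = 1e-4, 0.9
for ep in range(3):
    lr *= lr_decay
print(lr)   # approximately 1e-4 * 0.9**3 = 7.29e-05
```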

---
### b. Train the Softmax Classifier

Run the training function with the initial parameters

In [None]:
W_sigm, b_sigm, loss = train_two_layer_sigmoid(X_train, y_train, hidden_size=50, epochs=15)

**EXPECTED OUTPUT**:

<pre>The loss should start around 2.3 and end around 2.27</pre>

Visualize the loss

In [None]:
plt.rcParams['figure.figsize'] = [12, 5]
plt.plot(loss)
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.title('Training Loss history')
plt.show()

---
### c. Predict Function
Implement Predict function

The network architecture should be: 
<pre><font color="blue"><b>Input - FC layer - Sigmoid - FC Layer - argmax</b></font></pre>



#### <font color='red'>**EXERCISE:** </font> 

Implement Predict Function

    * call affine forward function
    * call sigmoid forward function
    * call affine forward function
    * call argmax to get max score id


In [None]:
def predict_two_layer_sigmoid(X, W, b):    
    y_pred = np.zeros(X.shape[0])

    
    # calculate 1st layer score by calling affine forward function using X, W[0], and b[0]
    layer1 = ??


    # calculate 1st activation score by calling sigmoid forward function using layer1 score
    act1 = ??


    # calculate 2nd layer score by calling affine forward function using act1, W[1], and b[1]
    layer2 = ??

    
    # take the maximum prediction from layer 2 and use that column to get the class   
    # use np.argmax with axis=-1 
    y_pred = ??

    
    return y_pred

### d. Training Accuracy
Calculate the Training Accuracy

In [None]:
import sklearn
from sklearn.metrics import accuracy_score

y_pred = predict_two_layer_sigmoid(X_train, W_sigm, b_sigm)
accuracy = sklearn.metrics.accuracy_score(y_train, y_pred)

print('Training Accuracy =',accuracy*100,'%')

print('Training label  =',y_train[:15])
print('Predicted label =',y_pred[:15])

**EXPECTED OUTPUT**:

<pre>You should get about <b>~17%</b> accuracy on the training set using the initial run</pre>

In [None]:
y_pred = predict_two_layer_sigmoid(X_val, W_sigm, b_sigm)
accuracy = sklearn.metrics.accuracy_score(y_val, y_pred)
print('Validation Accuracy =', accuracy*100,'%')

print('Validation label =',y_val[:15])
print('Predicted label  =',y_pred[:15])

**EXPECTED OUTPUT**:

<pre>You should also get about <b>~17%</b> accuracy on the validation set</pre>

<br>

You can continue training from the current weights by passing the pre-trained `W` and `b` to the training function


In [None]:
# W_sigm, b_sigm, loss = train_two_layer_sigmoid(X_train, y_train, W=W_sigm, b=b_sigm, hidden_size=50, epochs=3000)

---
## 4 - Two-Layer ReLU with Softmax

Now we implement a two-layer neural network again, but this time using the ReLU activation function

By the end of this part you should see that ReLU converges much faster than sigmoid
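One commonly cited reason for this difference is the size of the local gradients: the sigmoid derivative $\sigma(x)(1-\sigma(x))$ never exceeds 0.25 and approaches 0 when the unit saturates, whereas the ReLU derivative is exactly 1 for every positive input. The sketch only compares these local gradients on a grid of integer inputs; it is not a full account of training speed.

```python
import numpy as np

x = np.arange(-6, 7, dtype=float)        # integer points from -6 to 6, including 0
s = 1.0 / (1.0 + np.exp(-x))

sigmoid_grad = s * (1 - s)               # local gradient of the sigmoid
relu_grad = np.where(x > 0, 1.0, 0.0)    # local gradient of ReLU (0 taken at x == 0)

print(sigmoid_grad.max())                # 0.25, reached at x = 0
print(sigmoid_grad[0])                   # about 0.0025 at x = -6: a saturated unit
print(relu_grad[x > 0])                  # all ones for positive inputs
```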


### a. Predict Function

This time, we implement the predict function first, because we will call `predict` inside the `training` function to track the `validation` accuracy

The network architecture should be: 
<pre><font color="blue"><b>Input - FC layer - ReLU - FC Layer - argmax</b></font></pre>


#### <font color='red'>**EXERCISE:**</font> 
Implement Predict Function

    * call affine forward function
    * call relu forward function
    * call affine forward function
    * call argmax to get max score id

In [None]:
def predict_two_layer_relu(X, W, b):    
    y_pred = np.zeros(X.shape[0])

    
    # calculate 1st layer score by calling affine forward function using X, W[0], and b[0]
    layer1 = ??

    # calculate 1st activation score by calling relu forward function using layer1 score
    act1 = ??

    # calculate 2nd layer score by calling affine forward function using act1, W[1], and b[1]
    layer2 = ??
    
    # take the maximum prediction from layer 2 and use that column to get the class    
    # use np.argmax with axis=-1 
    y_pred = ?? 
    
    return y_pred

### b. Training Function

The network architecture should be: 
<pre><font color="blue"><b>Input - FC layer - ReLU - FC Layer - Softmax</b></font></pre>




#### <font color='red'>**EXERCISE:**</font> 

Implement Training Function

    * implement mini batch gradient descent
    
    * call affine forward function
    * call relu forward function
    * call affine forward function
    
    * call softmax function
    * call softmax_loss function
    
    * call affine backward function
    * call relu backward function
    * call affine backward function
    
    * implement weight update
    * add weights regularization
    * calculate the training and validation accuracy
    * decay learning rate
    

In [None]:
def train_two_layer_relu(X, y, X_val, y_val, hidden_size, 
                         W=None, b=None, lr=1e-4, lr_decay=0.9, 
                         reg=0.25, epochs=100, batch_size=200, verbose=True):
  
    num_train, dim = X.shape
    
    # check if data train is divisible by batch size
    assert num_train % batch_size == 0, "data train " + str(num_train) + " is not divisible by batch size " + str(batch_size)
    
    # total iteration per epoch
    num_iter = num_train // batch_size
    
    #start iteration counts
    it = 0
    
    num_classes = np.max(y) + 1  # assume y takes values 0...K-1 where K is number of classes
    
    # initialize weights if not provided
    if W is None:
        W0 = 1e-4 * np.random.randn(dim, hidden_size)
        W1 = 1e-4 * np.random.randn(hidden_size, num_classes)
        W = [W0, W1]
    if b is None:
        b0 = np.zeros((1,hidden_size))
        b1 = np.zeros((1,num_classes))
        b = [b0, b1]

    # Run stochastic gradient descent to optimize W
    loss_history = []
    train_acc_history = []
    val_acc_history = []
                     
    for ep in range(epochs):
        # Shuffle data train index
        # see sigmoid train function
        train_rows = ??
        np.??
        
        # split index into mini batches
        # see sigmoid train function
        id_batch = ??
        
        for batch in id_batch:
      
            # get mini batch data and label
            X_batch = X[batch]
            y_batch = y[batch]

            # calculate 1st layer score by calling affine forward function using X_batch, W[0], and b[0]
            layer1 = ??

            # calculate 1st activation score by calling relu forward function using layer1 score
            act1 = ??

            # calculate 2nd layer score by calling affine forward function using act1, W[1], and b[1]
            layer2 = ??

            # calculate softmax score by calling softmax function using layer2 score
            softmax_score = ??

            # evaluate loss and gradient by calling softmax_loss function using softmax_score and y_batch
            loss, dout = ??

            # add regularization to the loss:
            #    for each weights, calculate the sum square, multiply by regularization strength
            #    then add it to the loss      
            # see sigmoid train function
            loss = ?? 

            # append the loss history
            loss_history.append(loss)

            # calculate layer 2 weights gradient by calling affine backward function using dout, act1, W[1], and b[1]
            dW1, db1, dact1 = ??

            # calculate ReLU gradient by calling relu backward function using dact1 and the ReLU input layer1 (act1 gives the same mask)
            dlayer1 = ??

            # calculate layer 1 weights gradient by calling affine backward function using dlayer1, X_batch, W[0], and b[0]
            dW0, db0, _ = ??


            # perform regularization gradient
            # for each dWi, add with twice the weight multiplied by regularization strength
            # see sigmoid train function
            dW1 = ?? 
            dW0 = ?? 


            # perform parameter update by subtracting W and b with a fraction of dW and db
            # according to the learning rate
            W[0] = ?? 
            b[0] = ?? 
            W[1] = ?? 
            b[1] = ?? 
  
            # iteration count
            it +=1

      
            if verbose and it % 100 == 0:
                print ('iteration',it,'(epoch', ep,'/',epochs, '): loss =', loss)

            
        # At the end of one epoch
        # 1. Check accuracy
        #    calculate the training accuracy by calling predict_two_layer_relu function on the last mini-batch X_batch
        #    and compare it to y_batch. Then calculate the mean correct (accuracy in range 0-1)
        train_acc = (predict_two_layer_relu(X_batch, W, b) == y_batch).mean()
        train_acc_history.append(train_acc)

        # 2. Calculate the validation accuracy by calling predict_two_layer_relu function on X_val
        #    and compare it to y_val. Then calculate the mean correct (accuracy in range 0-1)
        val_acc = (predict_two_layer_relu(X_val, W, b) == y_val).mean()
        val_acc_history.append(val_acc)

        # 3. Decay learning rate
        #    multiply learning rate with decay
        #    see sigmoid train function
        lr = ??
  
    # compile all history
    history = [loss_history, train_acc_history, val_acc_history]
    
    if verbose:
        print('Training Done')
      
    return W, b, history
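
The regularization step adds $2\lambda W$ to each weight gradient because that is the derivative of the penalty $\lambda \sum W^2$ added to the loss (here $\lambda$ is `reg`). A finite-difference sketch on random toy weights; `reg_loss` is a hypothetical helper that mirrors the penalty used in both training functions:

```python
import numpy as np

rng = np.random.default_rng(2)
W0 = rng.standard_normal((4, 3))         # toy weight matrices
W1 = rng.standard_normal((3, 2))
reg = 0.25                               # same default regularization strength as the training functions

def reg_loss(W0, W1):
    # L2 penalty that the training functions add to the data loss
    return reg * (np.sum(W0 * W0) + np.sum(W1 * W1))

analytic_dW0 = 2 * reg * W0              # the term added to dW0

h = 1e-5
numeric_dW0 = np.zeros_like(W0)
for idx in np.ndindex(W0.shape):
    W_plus, W_minus = W0.copy(), W0.copy()
    W_plus[idx] += h
    W_minus[idx] -= h
    numeric_dW0[idx] = (reg_loss(W_plus, W1) - reg_loss(W_minus, W1)) / (2 * h)

print(np.allclose(analytic_dW0, numeric_dW0))   # True
```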

### c. Train the Softmax Classifier

Run the training function with the initial parameters

In [None]:
W_relu, b_relu, history = train_two_layer_relu(X_train, y_train, 
                                               X_val, y_val, 
                                               hidden_size=50, 
                                               epochs=8)

**EXPECTED OUTPUT**:

<pre>The loss should start around 2.3 and end around 1.8 after only 8 epochs</pre>
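
Why about 2.3? Assuming the network ends in a softmax cross-entropy loss over the 10 CIFAR-10 classes (consistent with the expected value above), small random initial weights give near-zero scores, so every class gets a probability of roughly 1/10 and the loss is about -ln(1/10) = ln(10). The regularization term adds a little on top. A minimal check:

```python
import numpy as np

num_classes = 10  # CIFAR-10
# Near-zero scores make the softmax almost uniform over the classes
scores = np.zeros(num_classes)
probs = np.exp(scores) / np.sum(np.exp(scores))
initial_loss = -np.log(probs[0])  # cross-entropy, whichever class is the true one
print(initial_loss)  # about 2.3026
```

If your first reported loss is far from this value, the initialization or the loss computation is a good first place to look.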

Visualize the loss

In [None]:
loss, train_acc, val_acc = history

plt.plot(loss)
plt.xlabel('Iteration')
plt.ylabel('training loss')
plt.title('Training Loss history')
plt.show()


Visualize the training and validation accuracy

In [None]:
plt.plot(train_acc, label='train')
plt.plot(val_acc, label='val')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
plt.legend()
plt.show()

### d. Training Accuracy
Calculate the Training Accuracy

In [None]:
import sklearn
from sklearn.metrics import accuracy_score

y_pred = predict_two_layer_relu(X_train, W_relu, b_relu)
accuracy = sklearn.metrics.accuracy_score(y_train, y_pred)

print('Training Accuracy =',accuracy*100,'%')

print('Training label  =',y_train[:15])
print('Predicted label =',y_pred[:15])

**EXPECTED OUTPUT**:

<pre>You should be able to get about <b>~32%</b> accuracy on the training set with the initial run</pre>

In [None]:
y_pred = predict_two_layer_relu(X_val, W_relu, b_relu)
accuracy = sklearn.metrics.accuracy_score(y_val, y_pred)
print('Validation Accuracy =', accuracy*100,'%')

print('Validation label =',y_val[:15])
print('Predicted label  =',y_pred[:15])

**EXPECTED OUTPUT**:

<pre>You should also be able to get about <b>~32%</b> accuracy on the validation set</pre>

<br>

You can train the weights further by passing the pre-trained W and b as arguments when calling the training function.

---


In [None]:
# W_relu, b_relu, history = train_two_layer_relu(X_train, y_train, X_val, y_val, W=W_relu, b=b_relu, hidden_size=50, epochs=1000)

## 5 - First Layer Visualization


In [None]:
## run this to turn off the scrolling effect in jupyter notebook

from IPython.display import Javascript
Javascript("""
  IPython.OutputArea.prototype._should_scroll = function(lines) {
      return false;
  }"""
)

## set the return value to true to turn the scrolling effect back on

In [None]:
!wget 'https://raw.githubusercontent.com/CNN-ADF/Task2020/master/resources/vis_utils.py'

In [None]:
from vis_utils import visualize_grid

plt.rcParams['figure.figsize'] = [10, 10]

plt.imshow(visualize_grid(W_relu[0].reshape(32, 32, 3, -1).transpose(3, 0, 1, 2), padding=3).astype('uint8'))

plt.gca().axis('off')
plt.show()
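
The cell above treats each column of `W_relu[0]` as one flattened 32×32×3 image, one per hidden neuron. Assuming the images were flattened row-major from shape `(N, 32, 32, 3)` (the preprocessing is not part of this section), `reshape(32, 32, 3, -1).transpose(3, 0, 1, 2)` recovers exactly those tiles. A toy round-trip check on random data, with hypothetical names:

```python
import numpy as np

hidden = 4
rng = np.random.default_rng(0)
# Hypothetical "weight templates": one 32x32x3 image per hidden neuron
templates = rng.standard_normal((hidden, 32, 32, 3))
# Flatten each template into a column, giving the same layout as W[0]: (3072, hidden)
W0_demo = templates.reshape(hidden, -1).T

# Same reshape/transpose as the visualization cell
tiles = W0_demo.reshape(32, 32, 3, -1).transpose(3, 0, 1, 2)
print(tiles.shape)                       # (4, 32, 32, 3)
print(np.array_equal(tiles, templates))  # True
```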

<pre>If you re-train the network, the weight visualization will look different.</pre>

---
# [Part 3] Hyperparameter Tuning

Now, let's improve the two-layer Neural Network trained on the CIFAR-10 dataset by tuning its hyperparameters.

**What's wrong?**
* Looking at the visualizations above, we see that the loss is decreasing **more or less linearly**, which suggests that the learning rate may be too low.
* Moreover, there is **no gap between the training and validation** accuracy, suggesting that the model we used has low capacity, and that we should increase its size. 
* On the other hand, with a very large model we would expect to see more overfitting, which would manifest itself as a very large gap between the training and validation accuracy.

<br>

**Tuning**. 
* Tuning the hyperparameters and developing intuition for how they affect the final performance is a large part of using Neural Networks, so we want you to get a lot of practice.
* Below, you should experiment with different values of the various hyperparameters, including **hidden layer size**, **learning rate**, and **regularization strength**. 
* You might also consider tuning the **learning rate decay**, but you should be able to get good performance using the default value.
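
The training function decays the learning rate once per epoch by multiplying it by `lr_decay`, so after `e` epochs the rate is `lr0 * lr_decay**e`. A small sketch of that schedule, with a hypothetical starting rate `lr0` (the tuning loop in this notebook uses `lr_decay=0.9`):

```python
lr0 = 1e-3       # hypothetical initial learning rate
lr_decay = 0.9   # decay factor, as in this notebook's tuning loop
num_epochs = 5

lr_now = lr0
schedule = []
for ep in range(num_epochs):
    schedule.append(lr_now)   # learning rate used throughout epoch `ep`
    lr_now *= lr_decay        # decayed once, at the end of the epoch

print(schedule)
```

With `lr_decay=0.9` the rate is roughly halved after about 7 epochs (0.9**7 ≈ 0.48), so for short tuning runs the decay matters much less than the initial learning rate.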

<br>

**Approximate results**. 
* You should aim to achieve a classification accuracy of greater than **48% on the validation set**.
* Our best network gets over 52% on the validation set.

---
## 1 - Fine Tune the Network

#### <font color="red">**EXERCISE:**</font>
    * Use the validation set to tune the hyperparameters (regularization strength and learning rate)
    * Find the best learning rate and regularization strength using a staged (coarse-to-fine) random search
    * Gradually narrow the random ranges around the best values found so far
    * Use only a few epochs or iterations per trial
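
The staged search described above can be sketched as follows. To keep it fast and self-contained, a hypothetical function `toy_val_acc` stands in for "train for a few epochs and return the validation accuracy"; in the notebook you would call `train_two_layer_relu` and `predict_two_layer_relu` instead. Exponents are sampled uniformly so that `lr` and `reg` are sampled on a log scale, and after each stage both ranges are halved and re-centered on the best exponents found so far. The halving rule is an illustrative choice, not a prescribed one.

```python
import numpy as np

def toy_val_acc(lr, reg):
    # Hypothetical stand-in for a short training run; peaks at lr = 1e-3, reg = 1e-1
    return np.exp(-(np.log10(lr) + 3) ** 2 - (np.log10(reg) + 1) ** 2)

def coarse_to_fine(lr_exp, reg_exp, stages=3, trials=20, seed=0):
    rng = np.random.default_rng(seed)
    best = (-1.0, None, None)                      # (accuracy, lr, reg)
    for stage in range(stages):
        for _ in range(trials):
            # Sampling the exponent uniformly = sampling lr/reg on a log scale
            lr = 10 ** rng.uniform(*lr_exp)
            reg = 10 ** rng.uniform(*reg_exp)
            acc = toy_val_acc(lr, reg)
            if acc > best[0]:
                best = (acc, lr, reg)
        # Halve each range's width and re-center it on the best exponents so far
        half_lr = (lr_exp[1] - lr_exp[0]) / 4
        half_reg = (reg_exp[1] - reg_exp[0]) / 4
        lr_exp = (np.log10(best[1]) - half_lr, np.log10(best[1]) + half_lr)
        reg_exp = (np.log10(best[2]) - half_reg, np.log10(best[2]) + half_reg)
        print('stage %d: acc=%.4f  lr=%.2e  reg=%.2e' % (stage, *best))
    return best, lr_exp, reg_exp

# Coarse stage: wide exponent ranges for the learning rate and regularization strength
best, lr_exp, reg_exp = coarse_to_fine((-6.0, 0.0), (-4.0, 2.0))
```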

In [None]:
import warnings
import datetime
warnings.filterwarnings('ignore')

results   = {}
best_val  = -1
best_reg  = 0
best_lr   = 0

best_W    = None
best_b    = None
max_epoch = 3
max_trial = 30

for trial in range(max_trial):
    
    reg  = 10**np.random.uniform(-2, 1)     # <---------- you can try and change this <----------
    lr   = 10**np.random.uniform(-4, -2)    # <---------- you can try and change this <----------
    hidden_size = 200                       # <---------- you can try and change this <----------    
    
    W, b, H = train_two_layer_relu(X_train, y_train, X_val, y_val, hidden_size,
                                      epochs=max_epoch, batch_size=200, 
                                      lr=lr, lr_decay=0.9, 
                                      reg=reg, verbose=False) 
    val_acc = (predict_two_layer_relu(X_val, W, b) == y_val).mean()
    results[(lr, reg)] = val_acc            # keep every trial for later inspection
    if val_acc > best_val: 
        best_W = W 
        best_b = b 
        best_val = val_acc 
        best_lr  = lr 
        best_reg = reg 
    print("%s,   val_acc: %1.4f,   lr: %1.15f,   reg: %1.15f,   trial: %d/%d" %
          (str(datetime.datetime.now()), val_acc, lr, reg, trial + 1, max_trial))
    
print ("best regularizer  : ", best_reg)
print ("best learning rate: ", best_lr)



#### <font color="red">**EXERCISE:**</font>
    * Try different hyperparameter ranges
    * Try different strategies to find the hyperparameters
    * Try to fine-tune other hyperparameters, such as the number of hidden neurons and lr_decay
    * Try other architectures, for example by changing the activation function

---
## 2 - Train the Network Fully

When you are done experimenting, train the network for more epochs using the best **`learning rate`** and best **`regularization strength`**.


In [None]:
print ("regularizer  : ", best_reg)
print ("learning rate: ", best_lr)

best_W, best_b, history = train_two_layer_relu(X_train, y_train, X_val, y_val,
                                               W = best_W, b = best_b,
                                               hidden_size=best_W[0].shape[1],  # keep the hidden size of the tuned network
                                               epochs=15,
                                               lr = best_lr, reg = best_reg)

Visualize the loss

In [None]:
loss, train_acc, val_acc = history
plt.plot(loss)
plt.xlabel('Iteration')
plt.ylabel('training loss')
plt.title('Training Loss history')
plt.show()


Visualize the training and validation accuracy

In [None]:
plt.plot(train_acc, label='train')
plt.plot(val_acc, label='val')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
plt.legend()
plt.show()

---
## 3 - Accuracy and Visualization
Calculate the Training Accuracy

In [None]:
import sklearn
from sklearn.metrics import accuracy_score

y_pred = predict_two_layer_relu(X_train, best_W, best_b)
accuracy = sklearn.metrics.accuracy_score(y_train, y_pred)

print('Training Accuracy =',accuracy*100,'%')

print('Training label  =',y_train[:15])
print('Predicted label =',y_pred[:15])

**EXPECTED OUTPUT**:

<pre>If you're careful, you should be able to get about <b>~60%</b> accuracy on the training set</pre>

In [None]:
y_pred = predict_two_layer_relu(X_val, best_W, best_b)
accuracy = sklearn.metrics.accuracy_score(y_val, y_pred)
print('Validation Accuracy =', accuracy*100,'%')

print('Validation label =',y_val[:15])
print('Predicted label  =',y_pred[:15])

**EXPECTED OUTPUT**:

<pre>You should also be able to get about <b>~50%</b> accuracy on the validation set</pre>

<br>

You can train the weights further by passing the pre-trained W and b as arguments when calling the training function.

---


In [None]:
plt.imshow(visualize_grid(best_W[0].reshape(32, 32, 3, -1).transpose(3, 0, 1, 2), padding=3).astype('uint8'))

plt.gca().axis('off')
plt.show()

---
## 4 - Test the Trained Weights

Evaluate your final trained network on the test set; you should be able to get **above 48%.**

In [None]:
y_pred = predict_two_layer_relu(X_test, best_W, best_b)

accuracy = sklearn.metrics.accuracy_score(y_test, y_pred)

print('Testing Accuracy =', accuracy*100,'%')
print('Test label       =',y_test[:15])
print('Predicted label  =',y_pred[:15])

**EXPECTED OUTPUT**:

<pre>If you're careful, you should be able to get about <b>~48%</b> accuracy on the test set</pre>

---
## 5 - Misclassified Images
An important way to gain intuition about how an algorithm works is to visualize the mistakes that it makes.

In this visualization, we show examples of images that are misclassified by our current system.

The first column shows images that our system labeled as "plane" but whose true label is something other than "plane".
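
The cell below selects, for each class, the test images that were predicted as that class but belong to another one: `(y_test != cls) & (y_pred == cls)`. A toy example of the same mask on hand-made labels (the `_demo` names are illustrative only), plus a sample size capped at the number of available indices, since `replace=False` cannot draw more items than exist:

```python
import numpy as np

y_true_demo = np.array([0, 1, 0, 2, 0, 1])
y_pred_demo = np.array([0, 0, 1, 0, 0, 0])
cls_demo = 0  # e.g. "plane"

# Predicted as cls_demo, but the true label is some other class
idxs_demo = np.where((y_true_demo != cls_demo) & (y_pred_demo == cls_demo))[0]
print(idxs_demo)  # [1 3 5]

# Never ask for more samples than there are candidates
k = min(2, len(idxs_demo))
picked = np.random.default_rng(0).choice(idxs_demo, k, replace=False)
print(picked)
```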

In [None]:
plt.rcParams['figure.figsize'] = [10, 10]

print('\n\n    misclassified images\n')
examples_per_class = 8
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for cls, cls_name in enumerate(classes):
    idxs = np.where((y_test != cls) & (y_pred == cls))[0]
    idxs = np.random.choice(idxs, min(examples_per_class, len(idxs)), replace=False)
    for i, idx in enumerate(idxs):
        plt.subplot(examples_per_class, len(classes), i * len(classes) + cls + 1)
        plt.imshow(X_test_ori[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls_name)
plt.show()


---
# [Part 4] Two-Layer NN on Feature Space

Similar to the previous exercise, in this exercise we will show that we can improve our classification performance by training the two-layer network not on raw pixels but on features that are computed from the raw pixels.

All of your work for this exercise will be done in this notebook.

In [None]:
from __future__ import print_function

from scipy.ndimage import uniform_filter

## 1 - Feature Extraction Functions


In [None]:
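def extract_features(imgs, feature_fns, verbose=False):
    """
    Apply every function in feature_fns to every image and concatenate the results.

    imgs        : array of shape (N, H, W, C)
    feature_fns : list of functions, each mapping one image to a 1-D feature vector
    returns     : array of shape (N, F), where F is the sum of all feature lengths
    """
    num_images = imgs.shape[0]
    if num_images == 0:
        return np.array([])

    # Run each feature function on the first image to find its output length
    feature_dims = []
    first_image_features = []
    for feature_fn in feature_fns:
        feats = feature_fn(imgs[0].squeeze())
        assert len(feats.shape) == 1, 'Feature functions must be one-dimensional'
        feature_dims.append(feats.size)
        first_image_features.append(feats)

    # Allocate the full feature matrix and store the first image's features
    total_feature_dim = sum(feature_dims)
    imgs_features = np.zeros((num_images, total_feature_dim))
    imgs_features[0] = np.hstack(first_image_features).T

    # Fill in the remaining images; each feature function writes to its own slice of columns
    for i in range(1, num_images):
        idx = 0
        for feature_fn, feature_dim in zip(feature_fns, feature_dims):
            next_idx = idx + feature_dim
            imgs_features[i, idx:next_idx] = feature_fn(imgs[i].squeeze())
            idx = next_idx
        if verbose and i % 1000 == 0:
            print('Done extracting features for %d / %d images' % (i, num_images))

    return imgs_features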
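
Conceptually, `extract_features` builds one row per image by placing the outputs of all feature functions side by side. The compact sketch below reproduces that layout with two hypothetical toy feature functions (per-channel mean and a 4-bin intensity histogram) on random images; `extract_features` does the same thing but preallocates the output matrix.

```python
import numpy as np

def channel_mean(img):
    # Toy feature: mean of each colour channel -> 3 values
    return img.reshape(-1, img.shape[-1]).mean(axis=0)

def intensity_hist(img, nbin=4):
    # Toy feature: fraction of pixel values in each of `nbin` intensity bins -> nbin values
    hist, _ = np.histogram(img, bins=nbin, range=(0, 255))
    return hist / img.size

rng = np.random.default_rng(0)
imgs_demo = rng.integers(0, 256, size=(5, 32, 32, 3)).astype(float)
fns_demo = [channel_mean, intensity_hist]

# One row per image: [channel_mean | intensity_hist]
feats_demo = np.array([np.hstack([fn(img) for fn in fns_demo]) for img in imgs_demo])
print(feats_demo.shape)  # (5, 7)
```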

In [None]:
def rgb2gray(rgb):
    # weighted sum of the R, G, B channels (ITU-R BT.601 luma weights)
    return np.dot(rgb[..., :3], [0.299, 0.587, 0.114])
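
The standard ITU-R BT.601 luma weights are 0.299, 0.587 and 0.114. They sum to 1, so a white pixel keeps its full intensity, and green contributes the most to perceived brightness. A quick check using a standalone copy of the conversion (named `luma_demo` so it does not shadow `rgb2gray`):

```python
import numpy as np

def luma_demo(rgb):
    # Weighted sum over the last (channel) axis with BT.601 luma weights
    return np.dot(rgb[..., :3], [0.299, 0.587, 0.114])

pixels = np.array([[255, 255, 255],   # white
                   [255, 0, 0],       # pure red
                   [0, 255, 0]],      # pure green
                  dtype=float)
print(luma_demo(pixels))  # approximately [255., 76.245, 149.685]
```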

In [None]:
def hog_feature(im):

    if im.ndim == 3:
        image = rgb2gray(im)
    else:
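        image = np.atleast_2d(im)

    sx, sy = image.shape   # image size
    orientations = 9       # number of gradient bins
    cx, cy = (8, 8)        # pixels per cell

    # image gradients along x and y (the last column/row stays zero)
    gx = np.zeros(image.shape)
    gy = np.zeros(image.shape)
    gx[:, :-1] = np.diff(image, n=1, axis=1)
    gy[:-1, :] = np.diff(image, n=1, axis=0)
    grad_mag = np.sqrt(gx ** 2 + gy ** 2)                          # gradient magnitude
    grad_ori = np.arctan2(gy, (gx + 1e-15)) * (180 / np.pi) + 90  # gradient orientation in degrees, shifted by +90

    n_cellsx = int(np.floor(sx / cx))  # number of cells along x
    n_cellsy = int(np.floor(sy / cy))  # number of cells along y

    orientation_histogram = np.zeros((n_cellsx, n_cellsy, orientations))
    # for each orientation bin, average the magnitudes of the gradients in that bin over each cell
    for i in range(orientations):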
        temp_ori = np.where(grad_ori < 180 / orientations * (i + 1),
                            grad_ori, 0)
        temp_ori = np.where(grad_ori >= 180 / orientations * i,
                            temp_ori, 0)
        cond2 = temp_ori > 0
        temp_mag = np.where(cond2, grad_mag, 0)
        orientation_histogram[:, :, i] = uniform_filter(temp_mag, size=(cx, cy))[int(cx / 2)::cx, int(cy / 2)::cy].T

    return orientation_histogram.ravel()



In [None]:
def color_histogram_hsv(im, nbin=10, xmin=0, xmax=255, normalized=True):
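    # bin edges over [xmin, xmax]
    bins = np.linspace(xmin, xmax, nbin + 1)
    # rgb_to_hsv expects values in [0, 1]; rescale the HSV result back to [0, xmax]
    hsv = matplotlib.colors.rgb_to_hsv(im / xmax) * xmax
    # histogram of the hue channel only
    imhist, bin_edges = np.histogram(hsv[:, :, 0], bins=bins, density=normalized)
    # with density=True, this turns the density into the fraction of pixels per bin
    imhist = imhist * np.diff(bin_edges)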

    return imhist
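
With `density=True`, `np.histogram` returns a density (counts divided by the total count and by the bin width), so multiplying by `np.diff(bin_edges)` gives the fraction of pixels in each bin, and the histogram sums to 1. A check on a random toy hue channel (`hue_demo` is illustrative data, not CIFAR-10):

```python
import numpy as np

rng = np.random.default_rng(0)
hue_demo = rng.uniform(0, 255, size=(32, 32))   # toy hue channel in [0, 255]
bins = np.linspace(0, 255, 10 + 1)

density, edges = np.histogram(hue_demo, bins=bins, density=True)
fractions = density * np.diff(edges)            # per-bin fraction of pixels
counts, _ = np.histogram(hue_demo, bins=bins)   # raw counts, for comparison
print(fractions.sum())  # 1.0 up to floating-point error
```

This makes the colour feature independent of the image size, which keeps it on a comparable scale across images.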

## 2 - Reload the CIFAR-10 dataset

In [None]:
import tensorflow as tf  # needed for tf.keras.datasets below

try:
    del X_train, y_train
    del X_test, y_test
    print('Cleared previously loaded data.')
except NameError:
    pass

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()

mask = list(range(40000, 50000))
X_val = X_train[mask]
y_val = y_train[mask]
mask = list(range(40000))
X_train = X_train[mask]
y_train = y_train[mask]

classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

## 3 - Extract Features

In [None]:
num_color_bins = 20 # Number of bins in the color histogram
feature_fns    = [hog_feature, lambda img: color_histogram_hsv(img, nbin=num_color_bins)]

X_train_feats  = extract_features(X_train, feature_fns, verbose=True)
X_val_feats    = extract_features(X_val, feature_fns)
X_test_feats   = extract_features(X_test, feature_fns)

# Preprocessing: Subtract the mean feature
mean_feat = np.mean(X_train_feats, axis=0, keepdims=True)
X_train_feats -= mean_feat
X_val_feats   -= mean_feat
X_test_feats  -= mean_feat

# Preprocessing: Divide by standard deviation. This ensures that each feature
# has roughly the same scale.
std_feat = np.std(X_train_feats, axis=0, keepdims=True)
X_train_feats /= std_feat
X_val_feats   /= std_feat
X_test_feats  /= std_feat

# Preprocessing: Add a bias dimension
X_train_feats = np.hstack([X_train_feats, np.ones((X_train_feats.shape[0], 1))])
X_val_feats   = np.hstack([X_val_feats, np.ones((X_val_feats.shape[0], 1))])
X_test_feats  = np.hstack([X_test_feats, np.ones((X_test_feats.shape[0], 1))])

In [None]:
y_train = y_train.ravel()
y_val   = y_val.ravel()
y_test  = y_test.ravel()

print('X_train_feats.shape =', X_train_feats.shape)

print('y_train.shape =',y_train.shape)
print('y_val.shape   =',y_val.shape)
print('y_test.shape  =',y_test.shape)
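
A hand calculation of the feature width gives a sanity check for the printed `X_train_feats.shape`. With 32×32 images, `hog_feature` uses 8×8 cells and 9 orientation bins, the colour histogram has `num_color_bins = 20` bins, and one bias column is appended:

```python
import math

img_h, img_w = 32, 32     # CIFAR-10 image size
cell = 8                  # hog_feature uses 8x8 cells
orientations = 9          # hog_feature uses 9 orientation bins
num_color_bins = 20       # from the feature-extraction cell

hog_dim = math.floor(img_h / cell) * math.floor(img_w / cell) * orientations  # 4 * 4 * 9
total_dim = hog_dim + num_color_bins + 1   # +1 for the appended bias column
print(hog_dim, total_dim)  # 144 165
```

So, if the cells above ran unchanged, `X_train_feats.shape` should be `(40000, 165)`. Changing `num_color_bins` changes the width accordingly.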

## 4 - Train a Two-Layer Neural Network

Again, fine-tune the network and find the best hyperparameters (learning rate, regularization strength, number of color bins, number of hidden neurons, etc.).

Then train the network once more on the feature-space CIFAR-10 dataset.

In [None]:
W, b, H = train_two_layer_relu(X_train_feats, y_train, X_val_feats, y_val,
                               hidden_size=100, epochs=10,
                               lr = 0.9, reg = 0.0)

This approach should outperform all previous approaches: you should easily be able to achieve **over 55%** classification accuracy on the test set, and our best model achieves **about 60%** classification accuracy.

In [None]:
y_pred = predict_two_layer_relu(X_val_feats, W, b)

accuracy = sklearn.metrics.accuracy_score(y_val, y_pred)

print('Validation Accuracy =', accuracy*100,'%')
print('Validation label    =',y_val[:15])
print('Predicted label     =',y_pred[:15])

In [None]:
y_pred = predict_two_layer_relu(X_test_feats, W, b)

accuracy = sklearn.metrics.accuracy_score(y_test, y_pred)

print('Testing Accuracy =', accuracy*100,'%')
print('Test label       =',y_test[:15])
print('Predicted label  =',y_pred[:15])

---
## 5 - Misclassified Images
An important way to gain intuition about how an algorithm works is to visualize the mistakes that it makes.

In this visualization, we show examples of images that are misclassified by our current system.

The first column shows images that our system labeled as "plane" but whose true label is something other than "plane".

In [None]:
plt.rcParams['figure.figsize'] = [10, 10]

print('\n\n    misclassified images\n')
examples_per_class = 8
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for cls, cls_name in enumerate(classes):
    idxs = np.where((y_test != cls) & (y_pred == cls))[0]
    idxs = np.random.choice(idxs, min(examples_per_class, len(idxs)), replace=False)
    for i, idx in enumerate(idxs):
        plt.subplot(examples_per_class, len(classes), i * len(classes) + cls + 1)
        plt.imshow(X_test_ori[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls_name)
plt.show()


---

# Congratulations, You've Completed Exercise 3

<p>Copyright &copy;  <a href=https://www.linkedin.com/in/andityaarifianto/>2020 - ADF</a> </p>

![footer](https://i.ibb.co/yX0jfMS/footer2020.png)