#NN: Design, Training, Regularization (Day 9)

##Recap


<img src="https://www.researchgate.net/publication/299474560/figure/fig6/AS:349583008911366@1460358492284/An-example-of-a-deep-neural-network-with-two-hidden-layers-The-first-layer-is-the-input.png" alt="nn" width="400"/>

In a neural network (NN), we have three types of layers: input, hidden, and output. An NN can be used for both classification and regression. However, we need to tune the NN's components so that it works well with its input type and objective.
![table](https://drive.google.com/uc?export=view&id=1J0_K8BJ669aoPD_RRUGskUNoxfyf_DTA)

##Regularization
Regularization is applied during training and is intended to reduce the generalization (test) error rather than the training error. The following techniques make an NN more robust:

- Norm Penalties
- Early Stopping
- Data Augmentation
- Dropout

##Norm Penalties
![nn_penalty](https://miro.medium.com/proxy/0*SY_r-Ltc9mB6pNBK)

Key idea: we add a penalty on the size (norm) of the weights to the loss, which discourages large weights, so the model overfits less and typically achieves a higher test score.
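To make the penalty concrete, here is a minimal NumPy sketch of the L2 and L1 penalty terms. The weight values, `data_loss`, and `lam` are hypothetical numbers chosen for illustration; the function names are ours, not part of any library.

```python
import numpy as np

def l2_penalty(weights, lam):
    """lam times the sum of squared weights over all layers."""
    return lam * np.sum([np.sum(w ** 2) for w in weights])

def l1_penalty(weights, lam):
    """lam times the sum of absolute weights over all layers."""
    return lam * np.sum([np.sum(np.abs(w)) for w in weights])

# hypothetical weights of a tiny two-layer network
weights = [
    np.array([[1.0, -2.0], [0.5, 0.0]]),
    np.array([[3.0], [-1.0]]),
]
data_loss = 0.40  # hypothetical loss on the training data alone
lam = 0.01        # penalty strength (a hyperparameter)

# the optimizer minimizes data loss + penalty, so large weights cost extra
total_l2 = data_loss + l2_penalty(weights, lam)
total_l1 = data_loss + l1_penalty(weights, lam)
print(total_l2, total_l1)
```

In scikit-learn's `MLPClassifier`, which the mini-lab uses, the `alpha` parameter sets the strength of an L2 penalty of this kind.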

---


##Early Stopping

<img src="https://miro.medium.com/max/973/1*nhmPdWSGh3ziatQKOmVq0Q.png" alt="early" width="400"/>

Key idea: stop training once performance on the validation data stops improving, and keep the model from the point where it was best.
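A rough sketch of the stopping rule, assuming we already have one validation loss per epoch. The `patience` parameter (how many non-improving epochs we tolerate) and the loss values are hypothetical and only for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # validation loss improved: remember this epoch and reset the counter
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch  # stop here, keep the best model
    return len(val_losses) - 1, best_epoch  # never triggered: ran all epochs

# hypothetical validation losses: improving at first, then getting worse
val_losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=2)
print("stop at epoch", stop_epoch, "and keep the model from epoch", best_epoch)
```

scikit-learn's `MLPClassifier` offers a built-in version through `early_stopping=True`, with `validation_fraction` and `n_iter_no_change` controlling the held-out split and the patience.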

<img src="https://drive.google.com/uc?export=view&id=12faTvQffHodPJrtrYSKxivXxR2CXILIK" alt="stop" width="400"/>


---

##Data augmentation
Note: there will be a mini lab to show how NN can be susceptible to noisy data.

<img src="https://drive.google.com/uc?export=view&id=1mzf40FKIt2LKWLpCmqlrvdsJT-fukQqF" alt="panda" width="600"/>


<img src="https://drive.google.com/uc?export=view&id=1RM6Eua080h5KVuKtqKy0Op9SlR6STJHm" alt="result" width="400"/>


Key idea: training on noisy and transformed copies of the data makes the neural network more robust. When the transformed inputs are crafted specifically to fool the model, the process is called adversarial training, which belongs to the adversarial attack/defense research field.

Source: Cihang Xie, et al. “Adversarial Examples Improve Image Recognition”, 2019; [arXiv:1911.09665](http://arxiv.org/abs/1911.09665).
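As a small illustration of plain data augmentation (not the adversarial method from the paper above), the sketch below makes randomly shifted, noise-perturbed copies of one image. The `augment` helper, its parameters, and the toy image are assumptions made for this example.

```python
import numpy as np

def augment(image, rng, max_shift=2, noise_std=0.05):
    """Return a randomly shifted, noisy copy of a 2-D image with values in [0, 1]."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # np.roll wraps pixels around the border, which is fine for small shifts
    shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
    noisy = shifted + rng.normal(0.0, noise_std, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)  # keep pixel values in the valid range

rng = np.random.default_rng(0)
image = np.zeros((28, 28))
image[10:18, 12:16] = 1.0  # hypothetical stand-in for a digit

# four augmented copies; every copy keeps the label of the original image
augmented = np.stack([augment(image, rng) for _ in range(4)])
print(augmented.shape)
```

The augmented copies would be added to the training set with the same label as the original image.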

---


##Dropout

![dropout](https://drive.google.com/uc?export=view&id=10U_pG_BQkrqaHBJ7rpWLHzeoXKU9wCSl)

Key idea: randomly drop units (together with their connections) during training to prevent overfitting.

Source: https://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf
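Here is a minimal NumPy sketch of the idea. Note that it uses the "inverted dropout" variant, which rescales the kept activations during training so nothing changes at test time; the paper above instead scales the weights at test time. The function name and parameters are ours.

```python
import numpy as np

def dropout(activations, drop_prob, rng, training=True):
    """Zero each activation with probability drop_prob (inverted dropout)."""
    if not training or drop_prob == 0.0:
        return activations  # at test time the layer is left untouched
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob  # True = unit is kept
    # dividing by keep_prob keeps the expected activation unchanged
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
hidden = np.ones((4, 1000))  # hypothetical hidden-layer activations

dropped = dropout(hidden, drop_prob=0.5, rng=rng)
print("fraction of units dropped:", np.mean(dropped == 0))
print("mean activation after dropout:", dropped.mean())
```

A new random mask is drawn on every training step, so the network cannot rely on any single unit.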

#Mini-lab

![adversarial attack](https://miro.medium.com/max/1400/1*n18mfvFgeZTLVxx07iBNkA.png)

##Objective
The objective of this lab is to show how susceptible a neural network can be to transformed data. We also want to show that a neural network can behave abnormally on transformed data that we can still interpret without any difficulty.

Let's use the same model and digits data from yesterday

Note: the model takes about 2 minutes to train

In [None]:
from keras.datasets import mnist
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPClassifier # import multi-layer perceptron classifier

(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train / 255.0
X_test = X_test / 255.0

X_train = X_train.reshape(60000,784)
X_test = X_test.reshape(10000,784)

hidden_layer_architecture = (764, 100)

mlp = MLPClassifier(hidden_layer_sizes=hidden_layer_architecture, max_iter=10, alpha=1e-4, 
                     solver='sgd', verbose=10, tol=1e-4, random_state=1,
                     learning_rate_init=.1)
mlp.fit(X_train, y_train)

print('\ntraining score: {}'.format(mlp.score(X_train, y_train)))
print('test score: {}'.format(mlp.score(X_test, y_test)))

Let's see what the noisy data looks like to us. Try running the cell below and changing the index!

In [None]:
import numpy as np
import random
import cv2

random.seed(1)
_, (X_test_2, y_test_2) = mnist.load_data()

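def sp_noise(image, prob=.01):
    '''
    Add salt-and-pepper noise to a 2-D uint8 image.
    prob: probability that a pixel is set to 0; with the same probability
          a pixel is instead replaced by a random value in [0, 254]
    '''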
    output = np.zeros(image.shape,np.uint8)
    thres = 1 - prob 
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            rdn = random.random()
            if rdn < prob:
                output[i][j] = 0
            elif rdn > thres:
                output[i][j] = random.randrange(0, 255)
            else:
                output[i][j] = image[i][j]
    return output

noise_index = 150 # <--- CHANGE ME. change this index to see different test data

# transform image by adding noise and then scale to [0,1]
img = X_test_2[noise_index].reshape((28,28))
noise_img = sp_noise(img).reshape(784)
noise_img = noise_img / 255.0

output_noise = mlp.predict([noise_img]) # make a prediction
print("Predict below picture as: {}".format(output_noise[0]))
plt.imshow(noise_img.reshape((28,28)), cmap=plt.get_cmap('gray'))
# show the figure
plt.show()

print("while the true label is: {}".format(y_test_2[noise_index]))
plt.imshow(X_test_2[noise_index].reshape((28,28)), cmap=plt.get_cmap('gray'))
plt.show()

The cell below adds noise to every test image and then makes a prediction on the noisy image. We want to see which images the model predicts incorrectly.

In [None]:
# check indices of wrong predictions on noisy images

random.seed(1)
noisy_correct = []  # 1 if the noisy image is predicted correctly, else 0 (avoids shadowing the built-in sum)
noisy_wrong_pred_indices = []
for i in range(10000):
  img = X_test_2[i].reshape((28,28))
  noise_img = sp_noise(img).reshape(784)
  noise_img = noise_img / 255.0
  output_noise = mlp.predict([noise_img])
  if (output_noise[0] != y_test_2[i]):
    noisy_correct.append(0)
    noisy_wrong_pred_indices.append(i)
  else:
    noisy_correct.append(1)

print("Accuracy for noisy images: {}".format(np.mean(noisy_correct)))

# indices of wrong predictions on clean images
random.seed(1)
clean_correct = []  # 1 if the clean image is predicted correctly, else 0
clean_wrong_pred_indices = []

for i in range(10000):
  if (mlp.predict([X_test_2[i].reshape(784) / 255.0])[0] != y_test_2[i]):
    clean_correct.append(0)
    clean_wrong_pred_indices.append(i)
  else:
    clean_correct.append(1)

print("Accuracy for clean images: {}".format(np.mean(clean_correct)))

Hmm... the accuracy on noisy images is slightly lower. Let's see which images the adversarial attack works on.

###Note: the portion below does not work perfectly. You can look at it as a reference. Please refer to the last cell for a more consistent adversarial attack

In [None]:
# show pictures that the model predicts incorrectly
# note: the results may not match the previous cell exactly. The seed is reset here, but noise is drawn
# only for the images in `difference`, so each image gets different noise than it did in the full loop
# for more consistent data, please refer to the very last cell
difference = [item for item in noisy_wrong_pred_indices if item not in clean_wrong_pred_indices]

random.seed(1)
for noise_index in difference[:30]:
  img = X_test_2[noise_index].reshape((28,28))
  noise_img = sp_noise(img).reshape(784)
  noise_img = noise_img / 255.0

  output_noise = mlp.predict([noise_img])

  if output_noise[0] != y_test_2[noise_index]:
    print("index: " + str(noise_index))
    print("Predict below picture as: {} \nwhile the true label is: {}".format(output_noise[0], y_test_2[noise_index]))
    plt.imshow(noise_img.reshape((28,28)), cmap=plt.get_cmap('gray'))
    # show the figure
    plt.show()

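Note: the noise is regenerated from a freshly seeded random stream in each cell, so a given image does not receive the same noise it had in the loop over the whole test set. The wrongly predicted indices may therefore be different for each of you.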
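One likely reason the indices shift between cells is that resetting the seed restarts the random stream, so the noise an image gets depends on how many random numbers were drawn before it. The toy sketch below shows this with a hypothetical `noise_stream` helper standing in for `sp_noise`.

```python
import random

def noise_stream(n):
    """Stand-in for sp_noise: draw n random numbers for one 'image'."""
    return [random.random() for _ in range(n)]

random.seed(1)
first_pass = [noise_stream(3) for _ in range(3)]  # noise for images 0, 1, 2 in order

random.seed(1)
second_pass_image_2 = noise_stream(3)  # noise for image 2 only, right after reseeding

# image 2 now receives the numbers that image 0 received in the first pass
same_as_image_0 = second_pass_image_2 == first_pass[0]
same_as_image_2 = second_pass_image_2 == first_pass[2]
print(same_as_image_0, same_as_image_2)
```

Storing the noisy images from the first loop and reusing them would avoid this mismatch.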

Try these three indices: **684, 4861, 6572**

In [None]:
random.seed(1)

# check for true value

# try these three indices: 684, 4861, 6572
noise_index = 684 # <--- change this index to see different test data

img = X_test_2[noise_index].reshape((28,28))
noise_img = sp_noise(img).reshape(784)
noise_img = noise_img / 255

# show noisy image and its wrong prediction
output_noise = mlp.predict([noise_img])
print("Predict noisy picture below as: {} \nwhile the true label is: {}".format(output_noise[0], y_test_2[noise_index]))
plt.imshow(noise_img.reshape((28,28)), cmap=plt.get_cmap('gray'))
plt.show()

# double check with clean data
img = X_test_2[noise_index].reshape(784) / 255
print("\nPredict clean data as: {}".format(mlp.predict([img])[0]))
print("Clean data:")
plt.imshow(img.reshape((28,28)), cmap=plt.get_cmap('gray'))
plt.show()

In ML, this research area is called adversarial machine learning. We want to make the model more robust, and different techniques are used for this, such as training the model with noisy data. The whole topic is too deep to cover here, and our goal for this lab is simply to let you have fun. Maybe you can try to train the model with noisy data and see how it performs!
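If you want to try that, here is one possible starting point: a vectorized variant of the lab's `sp_noise` that handles a whole batch at once, plus a combined clean-and-noisy training set. The `sp_noise_batch` name and the random stand-in data are assumptions for this sketch; in the lab you would pass the unscaled MNIST training images and labels instead.

```python
import numpy as np

def sp_noise_batch(images, prob=0.01, rng=None):
    """Salt-and-pepper noise for a whole uint8 array (same rule as sp_noise)."""
    rng = np.random.default_rng() if rng is None else rng
    rdn = rng.random(images.shape)
    noisy = images.copy()
    noisy[rdn < prob] = 0        # set to 0 with probability prob
    salt = rdn > 1 - prob        # random value in [0, 254] with probability prob
    noisy[salt] = rng.integers(0, 255, size=int(salt.sum()), dtype=np.uint8)
    return noisy

rng = np.random.default_rng(1)
# random stand-in for the unscaled training images and their labels
clean = rng.integers(0, 256, size=(100, 784), dtype=np.uint8)
labels = rng.integers(0, 10, size=100)

noisy = sp_noise_batch(clean, prob=0.01, rng=rng)

# train on clean and noisy copies together; noisy copies keep their labels
X_aug = np.concatenate([clean, noisy]) / 255.0
y_aug = np.concatenate([labels, labels])
print(X_aug.shape, y_aug.shape)
```

Calling `mlp.fit(X_aug, y_aug)` on arrays built this way from the real training data, then re-running the noisy-accuracy cell, would show whether the model became more robust.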