## 1.7 Augmentation

This exercise covers data augmentation, a technique that can improve performance when training data is scarce. We start with the simplest augmentation method: adding noise to the inputs.

In [None]:
import math
import random 
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np 

seed = 2020
random.seed(seed)
np.random.seed(seed=seed)
tf.random.set_random_seed(seed)

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

x_train = x_train.reshape([-1, 28 * 28]) 
x_test = x_test.reshape([-1, 28 * 28])

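# Draw a small 1,100-example subset to simulate a low-data setting.
# Note: randint samples with replacement, so a few indices may repeat.
m = np.random.randint(0, high=60000, size=1100, dtype=np.int64)
x_train = x_train[m]
y_train = y_train[m]

# Shuffle the subset.
i = np.arange(1100)
np.random.shuffle(i)
x_train = x_train[i]
y_train = y_train[i]

# Hold out the first 100 examples for validation; train on the remaining 1,000.
x_valid = x_train[:100]
y_valid = y_train[:100]

x_train = x_train[100:]
y_train = y_train[100:]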

x = tf.placeholder(tf.float32, [None, 28 * 28])
y = tf.placeholder(tf.int32, [None])
training = tf.placeholder(tf.bool)

n_units = [28 * 28, 512, 512, 10]

weights, biases = [], []
for i, (n_in, n_out) in enumerate(zip(n_units[:-1], n_units[1:])):
    stddev = math.sqrt(2 / n_in) # Kaiming He Initialization
    weight = tf.Variable(tf.random.truncated_normal([n_in, n_out], mean=0, stddev=stddev))
    bias = tf.Variable(tf.zeros([n_out]))
    weights.append(weight)
    biases.append(bias)
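
The subsampling step in the cell above uses `np.random.randint`, which draws with replacement, so the same example can end up in both the training and the validation split. As a hedged alternative (not what the notebook does, and it would change the reported numbers), the sketch below draws distinct indices and splits them. The names `rng`, `subset_idx`, `valid_idx`, and `train_idx` are illustrative assumptions.

```python
import numpy as np

# Assumption: a NumPy Generator is used here instead of the notebook's global seed.
rng = np.random.default_rng(2020)
n_total, n_subset, n_valid = 60000, 1100, 100

# replace=False guarantees 1,100 distinct indices, returned in random order,
# so no separate shuffle is needed.
subset_idx = rng.choice(n_total, size=n_subset, replace=False)

# First 100 indices for validation, the remaining 1,000 for training.
valid_idx, train_idx = subset_idx[:n_valid], subset_idx[n_valid:]

print(len(train_idx), len(valid_idx))
```

With these indices, `x_train[train_idx]` and `x_train[valid_idx]` would give disjoint splits of the original 60,000 training images.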

Add Gaussian noise to the input layer.

In [None]:
# Add zero-mean Gaussian noise (stddev 50, on the raw 0-255 pixel scale) only while training.
noise = tf.cond(training,
                lambda: tf.random.normal(tf.shape(x), mean=0.0, stddev=50),
                lambda: tf.zeros_like(x))
layer = x + noise
for i, (weight, bias) in enumerate(zip(weights, biases)):
    layer = tf.matmul(layer, weight) + bias
    if i < len(weights) - 1:
        layer = tf.nn.tanh(layer)
y_hat = layer
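
To see what the noise injection above does outside of a TensorFlow graph, here is a minimal NumPy sketch of the same idea: noise is added only when a `training` flag is set, and inputs pass through unchanged otherwise. The helper name `add_gaussian_noise` and the random stand-in images are assumptions for illustration; like the cell above, it does not clip the result back to the 0-255 range.

```python
import numpy as np

def add_gaussian_noise(batch, stddev=50.0, training=True, rng=None):
    """Return batch plus zero-mean Gaussian noise when training, else the batch as float32."""
    batch = np.asarray(batch, dtype=np.float32)
    if not training:
        return batch
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=0.0, scale=stddev, size=batch.shape).astype(np.float32)
    return batch + noise

rng = np.random.default_rng(2020)
# Stand-in for four flattened 28x28 images with raw pixel values in [0, 255].
images = rng.integers(0, 256, size=(4, 28 * 28)).astype(np.float32)

noisy = add_gaussian_noise(images, stddev=50.0, training=True, rng=rng)
clean = add_gaussian_noise(images, stddev=50.0, training=False)

print(noisy.shape, float((noisy - images).std()))
```

Because fresh noise is drawn on every call, each training step sees a slightly different version of the same 1,000 images, which is what makes this an augmentation.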

The rest proceeds exactly as in the previous exercise.

In [None]:
y_hot = tf.one_hot(y, 10)
costs = tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=y_hot, logits=y_hat)
cross_entropy_loss = tf.reduce_mean(costs)
loss = cross_entropy_loss

accuracy = tf.count_nonzero(
        tf.cast(tf.equal(tf.argmax(y_hot, 1), tf.argmax(y_hat, 1)),
                tf.int64)) / tf.cast(tf.shape(y_hot)[0], tf.int64)

extra_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_ops):
    optimizer = tf.train.AdamOptimizer(1e-3)
    train_op = optimizer.minimize(loss)
    
gpu_options = tf.GPUOptions()
gpu_options.allow_growth = True
session = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
session.run(tf.global_variables_initializer())

max_valid_epoch_idx = 0
max_valid_accuracy = 0.0
final_test_accuracy = 0.0
for epoch_idx in range(1, 1000 + 1):
    session.run(
            train_op,
            feed_dict={
                x: x_train,
                y: y_train,
                training: True
            })
    
    if epoch_idx % 10 == 0:
        train_loss_value, train_accuracy_value = session.run(
            [loss, accuracy],
            feed_dict={
                x: x_train,
                y: y_train,
                training: False
            })
        
        valid_loss_value, valid_accuracy_value = session.run(
            [loss, accuracy],
            feed_dict={
                x: x_valid,
                y: y_valid,
                training: False
            })
            
        test_loss_value, test_accuracy_value = session.run(
            [loss, accuracy],
            feed_dict={
                x: x_test,
                y: y_test,
                training: False
            })

        print(epoch_idx, '%.4f' % train_loss_value, '%.4f' % valid_loss_value, '%.4f' % test_loss_value, '%.4f' % train_accuracy_value, '%.4f' % valid_accuracy_value, '%.4f' % test_accuracy_value)
        
        if max_valid_accuracy < valid_accuracy_value:
            max_valid_accuracy = valid_accuracy_value 
            max_valid_epoch_idx = epoch_idx
            final_test_accuracy = test_accuracy_value
            
    # Early Stop
    if max_valid_epoch_idx + 100 < epoch_idx:
        break
        
print(final_test_accuracy)

Test accuracy improves from 87.10% to 90.39%.
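
The early-stopping rule in the training loop above can be isolated as a small pure-Python sketch: validation accuracy is checked every 10 epochs, and training stops once more than 100 epochs have passed since the best check. The function `run_early_stopping` and the made-up `toy_curve` are hypothetical helpers for illustration, not part of the notebook.

```python
def run_early_stopping(valid_accuracy_at, max_epochs=1000, eval_every=10, patience=100):
    """Replay the loop's stopping rule; return (best_epoch, best_accuracy, last_epoch)."""
    best_epoch, best_accuracy, last_epoch = 0, 0.0, 0
    for epoch in range(1, max_epochs + 1):
        last_epoch = epoch
        if epoch % eval_every == 0:
            accuracy = valid_accuracy_at(epoch)
            if best_accuracy < accuracy:
                best_accuracy, best_epoch = accuracy, epoch
        # Checked every epoch, exactly like the loop above.
        if best_epoch + patience < epoch:
            break
    return best_epoch, best_accuracy, last_epoch

def toy_curve(epoch):
    # Hypothetical validation curve: rises until epoch 50, then sits below the peak.
    return min(epoch, 50) / 100.0 - (0.05 if epoch > 50 else 0.0)

best_epoch, best_accuracy, stopped_at = run_early_stopping(toy_curve)
print(best_epoch, best_accuracy, stopped_at)
```

In the notebook, the test accuracy recorded at `best_epoch` is the value reported as `final_test_accuracy`, so the test set is never used to choose the stopping point.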

### Exercises

Q1. Check how performance changes with the Gaussian noise level.

Q2. Apply image rotation, shear, shift, and zoom augmentation. Does combining several augmentations at once improve performance further?

Q3. Use the matplotlib library to inspect the images directly and confirm that the augmentation works as intended.
    (original, rotation, shear, shift, zoom)


### Q2 Hint ###

Complete the ***augment_data*** function, then place it in the appropriate cell and use it.

The following functions can be used for augmentation:

**rotation** -- tf.contrib.keras.preprocessing.image.random_rotation()   
**shear**    -- tf.contrib.keras.preprocessing.image.random_shear()   
**shift**    -- tf.contrib.keras.preprocessing.image.random_shift()  
**zoom**     -- tf.contrib.keras.preprocessing.image.random_zoom()  

```python
def augment_data(dataset, dataset_labels, random_rotation=True, random_shear=True,
                 random_shift=True, random_zoom=True):
    '''
    [Argument]
    dataset -- input dataset, shape:(N, Height, Width, Channel)
    dataset_labels -- input dataset labels, shape: (N,)
    
    [Return]
    augmented dataset -- type: numpy array, shape:(N, 28*28)
    augmented labels -- type: numpy array, shape:(N,)
    '''  
    augmented_dataset = []
    augmented_labels = []

    for i in range(dataset.shape[0]):
        pass  # TODO: apply the enabled transforms to dataset[i] and append the results

    return augmented_dataset, augmented_labels
```
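
As a hedged illustration of the data flow `augment_data` needs, the sketch below reshapes flattened images to `(N, 28, 28, 1)`, applies a per-image transform, and flattens back to `(N, 28*28)`. The transform `random_shift_numpy` is a hypothetical NumPy stand-in with zero fill, not the `tf.contrib` function, and the random input is a placeholder for MNIST. To my understanding, the `tf.contrib.keras.preprocessing.image` functions take one 3-D image at a time and default to channels-first axes, so channels-last images would need `row_axis=0, col_axis=1, channel_axis=2`; this is worth checking against your installed version.

```python
import numpy as np

def random_shift_numpy(image, max_shift=2, rng=None):
    """Shift one (H, W, C) image by up to max_shift pixels per axis, filling with zeros."""
    rng = np.random.default_rng() if rng is None else rng
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    height, width = image.shape[:2]
    shifted = np.zeros_like(image)
    # Source and destination windows that stay inside the image bounds.
    src_rows = slice(max(0, -dy), min(height, height - dy))
    dst_rows = slice(max(0, dy), min(height, height + dy))
    src_cols = slice(max(0, -dx), min(width, width - dx))
    dst_cols = slice(max(0, dx), min(width, width + dx))
    shifted[dst_rows, dst_cols] = image[src_rows, src_cols]
    return shifted

rng = np.random.default_rng(2020)
# Placeholder for three flattened MNIST images.
flat = rng.integers(0, 256, size=(3, 28 * 28)).astype(np.uint8)

images = flat.reshape(-1, 28, 28, 1)          # (N, Height, Width, Channel)
augmented = np.stack([random_shift_numpy(img, rng=rng) for img in images])
augmented_flat = augmented.reshape(-1, 28 * 28)  # back to the network's input shape

print(augmented_flat.shape)
```

The labels are unchanged by a shift, so each augmented image keeps the label of the image it came from.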


### Q3 Hint ###

```python
fig = plt.figure(figsize=(15,10))
plt.subplot(1,5,1)
plt.title('Original image')
plt.imshow() #TODO
plt.subplot(1,5,2)
plt.title('Rotated image')
plt.imshow() #TODO
plt.subplot(1,5,3)
plt.title('Sheared image')
plt.imshow() #TODO
plt.subplot(1,5,4)
plt.title('Shifted image')
plt.imshow() #TODO
plt.subplot(1,5,5)
plt.title('Zoomed image')
plt.imshow() #TODO
        
```