## LeNet5

Paper: Gradient-Based Learning Applied to Document Recognition

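Augmentation: a traditional technique rather than a deep-learning-specific one

- Invariance: convolution is translation-equivariant (shifting the input shifts the feature map), so a CNN may recognize an object well at some positions and poorly at others; augmentation is used to make the model more invariant to such changes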
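A minimal NumPy sketch of one classic augmentation, random translation: each image is shifted by a few pixels so the model sees the same digit at different positions. The function names `shift_image` and `random_shift` and the `max_shift` parameter are hypothetical, chosen only for this illustration.

```python
import numpy as np

def shift_image(img, dy, dx):
    """Shift a 2-D image down by dy and right by dx pixels (negative = up/left).

    Pixels shifted in from outside the image are filled with zeros.
    Assumes |dy| < height and |dx| < width.
    """
    h, w = img.shape
    out = np.zeros_like(img)
    # Overlapping region: where it comes from (src) and where it lands (dst)
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out

def random_shift(images, max_shift=2, rng=None):
    """Translate each image in an (N, H, W) batch by a random offset in [-max_shift, max_shift]."""
    rng = np.random.default_rng() if rng is None else rng
    n, h, w = images.shape
    if max_shift >= min(h, w):
        raise ValueError("max_shift must be smaller than the image size")
    out = np.empty_like(images)
    for i in range(n):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        out[i] = shift_image(images[i], int(dy), int(dx))
    return out

rng = np.random.default_rng(0)
batch = rng.random((4, 28, 28)).astype(np.float32)  # stand-in for 4 MNIST-sized images
augmented = random_shift(batch, max_shift=2, rng=rng)
print(augmented.shape)  # (4, 28, 28)
```

Because a fresh offset is drawn every time, the same image appears at slightly different positions across epochs, which is what pushes the model toward position invariance.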
    
### Features
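1. Activation function: hyperbolic tangent (tanh)
    - Sigmoid outputs are always positive and lie in (0, 1), which causes several problems (for example, the outputs are not zero-centered), so tanh, with range (-1, 1), is used instead
2. Optimizer: Stochastic Gradient Descent (SGD)
    - SGD: shuffle the data and run gradient descent on one sample at a time; the noisy updates help it avoid getting stuck in local minima
    - Each update is fast, but noisy data or gradient estimates can hurt performance
    - In high-dimensional spaces there are very few true local minima (most critical points are saddle points), so local minima are rarely a concern nowadays
3. Average pooling
4. Learning rate decay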
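The one-sample-at-a-time SGD update described above can be sketched in NumPy on a toy linear-regression problem: each epoch shuffles the data, then updates the weights using a single example per step. The data, learning rate, and the function name `sgd_linear_regression` are made up for illustration.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=50, seed=0):
    """Fit y ~ X @ w + b with plain per-sample SGD on the squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        # Shuffle once per epoch, then update on one sample at a time
        for i in rng.permutation(n):
            err = X[i] @ w + b - y[i]  # prediction error for this single sample
            # Gradient of 0.5 * err**2 with respect to w and b
            w -= lr * err * X[i]
            b -= lr * err
    return w, b

# Toy data: y = 3*x0 - 2*x1 + 1 plus a little noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 1.0 + 0.01 * rng.normal(size=200)
w, b = sgd_linear_regression(X, y)
print(w, b)  # should be close to [3, -2] and 1
```

Note that Keras' `model.fit` with an SGD optimizer updates on mini-batches rather than single samples (32 samples by default, which is why an MNIST epoch of 60,000 images shows 1875 steps in the logs below).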

- https://arxiv.org/abs/1406.2572
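A toy illustration of the saddle-point idea, under loose assumptions: the hypothetical function f(x, y) = x² − y² has zero gradient at the origin, but its Hessian has one positive and one negative eigenvalue, so the origin is a saddle point rather than a minimum. The last loop uses random symmetric matrices as a crude stand-in for Hessians (real loss surfaces are not random matrices) to show how quickly the share with all-positive eigenvalues, i.e. a minimum, shrinks as the dimension grows.

```python
import numpy as np

def grad(p):
    """Gradient of the example function f(x, y) = x**2 - y**2."""
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

# Hessian of f is constant: second derivative 2 along x and -2 along y
hessian = np.array([[2.0, 0.0], [0.0, -2.0]])

def classify_critical_point(h):
    """Classify a critical point from the eigenvalues of its (symmetric) Hessian."""
    eigs = np.linalg.eigvalsh(h)
    if np.all(eigs > 0):
        return "local minimum"
    if np.all(eigs < 0):
        return "local maximum"
    if np.any(eigs > 0) and np.any(eigs < 0):
        return "saddle point"
    return "inconclusive (zero eigenvalue)"

def fraction_positive_definite(d, trials=2000, seed=0):
    """Share of random symmetric d x d matrices whose eigenvalues are all positive."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        a = rng.normal(size=(d, d))
        h = (a + a.T) / 2  # random symmetric stand-in for a Hessian
        if np.all(np.linalg.eigvalsh(h) > 0):
            hits += 1
    return hits / trials

print(grad(np.array([0.0, 0.0])))        # zero gradient: the origin is a critical point
print(classify_critical_point(hessian))  # saddle point
for d in (1, 2, 5, 10):
    print(d, fraction_positive_definite(d))
```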


## LeNet5 implementation

In [2]:
import tensorflow as tf

In [20]:
model = tf.keras.models.Sequential()

In [21]:
model.add(tf.keras.layers.Conv2D(filters = 6,kernel_size=5, strides=1,
                                activation = 'tanh', padding ='same', input_shape=(32,32,1)))

In [22]:
model.add(tf.keras.layers.AveragePooling2D(pool_size=2,strides=2,padding = 'valid'))

In [23]:
model.add(tf.keras.layers.Conv2D(filters = 16,kernel_size=5, strides=1,
                                activation = 'tanh', padding ='valid'))
model.add(tf.keras.layers.AveragePooling2D(pool_size=2,strides=2,padding = 'valid'))

In [24]:
model.add(tf.keras.layers.Conv2D(filters = 120,kernel_size=5, strides=1,
                                activation = 'tanh', padding ='valid'))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(84, activation='tanh'))
model.add(tf.keras.layers.Dense(10, activation='softmax'))

In [10]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 32, 32, 6)         156       
_________________________________________________________________
average_pooling2d (AveragePo (None, 16, 16, 6)         0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 12, 12, 16)        2416      
_________________________________________________________________
average_pooling2d_1 (Average (None, 6, 6, 16)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 2, 2, 120)         48120     
_________________________________________________________________
flatten (Flatten)            (None, 480)               0         
_________________________________________________________________
dense (Dense)                (None, 84)                4
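The summary shows a 2×2×120 output after the third convolution, while the paper's C5 layer is 1×1×120. The per-layer sizes can be checked with the standard output-size rules: `valid` padding gives ⌊(n − k)/s⌋ + 1 and `same` padding gives ⌈n/s⌉. The helper names `conv_out` and `lenet_sizes` are hypothetical.

```python
import math

def conv_out(n, k, s, padding):
    """Spatial output size of a conv/pooling layer under 'valid' or 'same' padding."""
    if padding == "valid":
        return (n - k) // s + 1
    if padding == "same":
        return math.ceil(n / s)
    raise ValueError(padding)

def lenet_sizes(n, first_padding):
    """Trace the spatial size through C1 -> S2 -> C3 -> S4 -> C5."""
    sizes = []
    n = conv_out(n, 5, 1, first_padding)  # C1: 5x5 conv
    sizes.append(n)
    n = conv_out(n, 2, 2, "valid")        # S2: 2x2 average pooling
    sizes.append(n)
    n = conv_out(n, 5, 1, "valid")        # C3: 5x5 conv
    sizes.append(n)
    n = conv_out(n, 2, 2, "valid")        # S4: 2x2 average pooling
    sizes.append(n)
    n = conv_out(n, 5, 1, "valid")        # C5: 5x5 conv
    sizes.append(n)
    return sizes

print(lenet_sizes(32, "same"))   # [32, 16, 12, 6, 2]  the model above
print(lenet_sizes(32, "valid"))  # [28, 14, 10, 5, 1]  the paper's layout
print(lenet_sizes(28, "same"))   # [28, 14, 10, 5, 1]  the MNIST model later in these notes
```

So on a 32×32 input, `padding='valid'` in the first layer reproduces the paper's 1×1×120 C5 output; the MNIST version reaches the same shape by zero-padding its 28×28 input instead.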

In [12]:
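def learning_rate(epoch):
    # Step decay: the learning rate only decreases as training goes on
    # (LearningRateScheduler passes a 0-based epoch index)
    if epoch <= 2:
        lr = 5e-4
    elif epoch <= 5:
        lr = 2e-4
    elif epoch <= 9:
        lr = 5e-5
    else:
        lr = 1e-5
    return lr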

In [14]:
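# This constructs without error, but Keras optimizers call a plain callable learning_rate
# with no arguments, so an epoch-based function like this fails once training starts
tf.keras.optimizers.SGD(learning_rate = learning_rate)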

<tensorflow.python.keras.optimizer_v2.gradient_descent.SGD at 0x1d71c30d940>

In [15]:
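# Give SGD a fixed starting rate; the per-epoch schedule is applied by the
# LearningRateScheduler callback below
model.compile(loss='categorical_crossentropy', optimizer=tf.keras.optimizers.SGD(learning_rate=5e-4))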

# Naming convention: capitalized names like SGD are classes, lowercase names are functions

Learning rate decay: lower the learning rate as training proceeds, epoch by epoch.
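Two common per-epoch decay rules, sketched in plain Python: step decay (cut the rate by a fixed factor every few epochs) and exponential decay. The initial rate, factors, and intervals are arbitrary example values. A function mapping an epoch index to a learning rate is the kind of schedule the `LearningRateScheduler` callback takes; here only `epoch` is positional, and the other parameters are keyword-only so they cannot be filled by accident.

```python
import math

def step_decay(epoch, *, lr0=0.01, drop=0.5, every=3):
    """Multiply the initial rate by `drop` once every `every` epochs."""
    return lr0 * drop ** (epoch // every)

def exponential_decay(epoch, *, lr0=0.01, k=0.1):
    """Shrink the rate smoothly: lr0 * exp(-k * epoch)."""
    return lr0 * math.exp(-k * epoch)

for epoch in range(7):
    print(epoch, step_decay(epoch), round(exponential_decay(epoch), 5))
```

Step decay keeps the rate constant for a while and then drops it sharply; exponential decay lowers it a little every epoch.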

**Use callbacks!**
- Change settings (such as the learning rate) at every epoch

- Run other tasks too, e.g. send a Telegram message whenever an epoch finishes

https://www.tensorflow.org/tutorials/keras/save_and_load
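Conceptually, the training loop calls a callback's hook methods at fixed points. The toy loop below is a hypothetical pure-Python imitation, not Keras' implementation (`Callback`, `EpochLogger`, and `fake_fit` are illustrative names), showing the order in which the hooks fire.

```python
class Callback:
    """Base class: every hook does nothing unless a subclass overrides it."""
    def on_train_begin(self, logs=None):
        pass

    def on_epoch_begin(self, epoch, logs=None):
        pass

    def on_epoch_end(self, epoch, logs=None):
        pass

    def on_train_end(self, logs=None):
        pass

class EpochLogger(Callback):
    """Records every hook call; a real callback could send a message here instead."""
    def __init__(self):
        self.events = []

    def on_train_begin(self, logs=None):
        self.events.append("train_begin")

    def on_epoch_begin(self, epoch, logs=None):
        self.events.append(f"epoch_begin {epoch}")

    def on_epoch_end(self, epoch, logs=None):
        self.events.append(f"epoch_end {epoch} loss={logs['loss']:.2f}")

    def on_train_end(self, logs=None):
        self.events.append("train_end")

def fake_fit(epochs, callbacks):
    """Toy training loop that only pretends to train (the loss halves each epoch)."""
    loss = 1.0
    for cb in callbacks:
        cb.on_train_begin()
    for epoch in range(epochs):
        for cb in callbacks:
            cb.on_epoch_begin(epoch)
        loss /= 2  # stand-in for one epoch of real training
        for cb in callbacks:
            cb.on_epoch_end(epoch, logs={"loss": loss})
    for cb in callbacks:
        cb.on_train_end()

logger = EpochLogger()
fake_fit(2, [logger])
print(logger.events)
```

Sending a Telegram message would simply replace the `append` in `on_epoch_end` with a call to a messaging API.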

In [16]:
from tensorflow.keras.callbacks import LearningRateScheduler
lr = LearningRateScheduler(learning_rate)

In [None]:
model.fit(x,y,epochs=20,callbacks=[lr])

In [28]:
class CA(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):  # runs when an epoch starts
        print('epoch begin')
    def on_epoch_end(self, epoch, logs=None):  # runs when an epoch ends
        print('epoch end')

In [17]:
dir(tf.keras.callbacks.Callback)
# Hooks exist for training, testing, and prediction, including per-batch hooks

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_implements_predict_batch_hooks',
 '_implements_test_batch_hooks',
 '_implements_train_batch_hooks',
 '_keras_api_names',
 '_keras_api_names_v1',
 'on_batch_begin',
 'on_batch_end',
 'on_epoch_begin',
 'on_epoch_end',
 'on_predict_batch_begin',
 'on_predict_batch_end',
 'on_predict_begin',
 'on_predict_end',
 'on_test_batch_begin',
 'on_test_batch_end',
 'on_test_begin',
 'on_test_end',
 'on_train_batch_begin',
 'on_train_batch_end',
 'on_train_begin',
 'on_train_end',
 'set_model',
 'set_params']

In [25]:
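(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Add a channel axis and scale pixel values from [0, 255] to [0, 1]
x_train = x_train.reshape(-1,28,28,1).astype('float32') / 255.0
x_test = x_test.reshape(-1,28,28,1).astype('float32') / 255.0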

In [33]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(filters = 6,kernel_size=5, strides=1,
                                activation = 'tanh', padding ='same', input_shape=(28,28,1)))
model.add(tf.keras.layers.AveragePooling2D(pool_size=2,strides=2,padding = 'valid'))
model.add(tf.keras.layers.Conv2D(filters = 16,kernel_size=5, strides=1,
                                activation = 'tanh', padding ='valid'))
model.add(tf.keras.layers.AveragePooling2D(pool_size=2,strides=2,padding = 'valid'))
model.add(tf.keras.layers.Conv2D(filters = 120,kernel_size=5, strides=1,
                                activation = 'tanh', padding ='valid'))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(84, activation='tanh'))
model.add(tf.keras.layers.Dense(10, activation='softmax'))

model.compile(loss='sparse_categorical_crossentropy', optimizer=tf.keras.optimizers.SGD())


In [29]:
model.fit(x_train,y_train,epochs=1,callbacks=[CA()])

epoch begin


<tensorflow.python.keras.callbacks.History at 0x1d71c4dc630>

In [30]:
model.fit(x_train,y_train,epochs=1,callbacks=[CA(), tf.keras.callbacks.TensorBoard()])

epoch begin
   1/1875 [..............................] - ETA: 0s - loss: 0.4005

W1006 20:47:54.638223 20056 deprecation.py:323] From C:\Users\Gyu\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\summary_ops_v2.py:1277: stop (from tensorflow.python.eager.profiler) is deprecated and will be removed after 2020-07-01.
Instructions for updating:
use `tf.profiler.experimental.stop` instead.


   2/1875 [..............................] - ETA: 4:56 - loss: 0.2936

W1006 20:47:54.899499 20056 callbacks.py:323] Callbacks method `on_train_batch_begin` is slow compared to the batch time (batch time: 0.0100s vs `on_train_batch_begin` time: 0.0409s). Check your callbacks.
W1006 20:47:54.900497 20056 callbacks.py:328] Callbacks method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0100s vs `on_train_batch_end` time: 0.2653s). Check your callbacks.




<tensorflow.python.keras.callbacks.History at 0x1d71c51c550>

In [31]:
%load_ext tensorboard

In [32]:
%tensorboard --logdir logs

ERROR: Timed out waiting for TensorBoard to start. It may still be running as pid 28636.

In [35]:
model.fit(x_train,y_train,epochs=10,callbacks=[CA(),lr])

epoch begin
Epoch 1/10
epoch begin
Epoch 2/10
epoch begin
Epoch 3/10
epoch begin
Epoch 4/10
epoch begin
Epoch 5/10
epoch begin
Epoch 6/10
epoch begin
Epoch 7/10
epoch begin
Epoch 8/10
epoch begin
Epoch 9/10
epoch begin
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x1d71d83ecc0>


    

## AlexNet

Paper: ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky, Sutskever, and Hinton)

- Uses computing resources more efficiently than earlier approaches

### Features
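1. GPU training: because of hardware limits (GPU memory), the network was split across two GPUs
2. Convolution layers
    - Stride 4 in the first layer, to reduce training time
    - Large kernels (11×11 in the first layer)
    - Zero-padding
    - Max pooling with overlapping windows (3×3 window, stride 2)
    - Local Response Normalization (LRN): normalizes each activation by the activity of neighboring channels at the same position; it is often compared to batch normalization, but batch normalization is a different, later technique
    - **ReLU** (rectified linear unit): trains several times faster than saturating activations such as tanh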
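The LRN used in AlexNet divides each activation by the summed squares of the activations in nearby channels at the same spatial position: b_i = a_i / (k + α · Σ_j a_j²)^β, where j runs over the n channels centered on channel i (the paper uses k = 2, n = 5, α = 10⁻⁴, β = 0.75). A minimal NumPy sketch for a channels-last feature map; the function name `local_response_norm` is hypothetical.

```python
import numpy as np

def local_response_norm(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Cross-channel LRN on an (H, W, C) array, following the AlexNet formula."""
    channels = a.shape[-1]
    half = n // 2
    sq = a.astype(float) ** 2
    out = np.empty(a.shape, dtype=float)
    for i in range(channels):
        # Neighboring channels i-half .. i+half, clipped to the valid range
        lo, hi = max(0, i - half), min(channels - 1, i + half)
        denom = (k + alpha * sq[..., lo:hi + 1].sum(axis=-1)) ** beta
        out[..., i] = a[..., i] / denom
    return out

x = np.random.default_rng(0).normal(size=(4, 4, 8))
y = local_response_norm(x)
print(y.shape)  # (4, 4, 8)
```

In TensorFlow the same normalization should correspond to `tf.nn.local_response_normalization` with `depth_radius=2, bias=2.0, alpha=1e-4, beta=0.75`, whose formula also sums squares over a window of channels without dividing α by n.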
    
---

### Reducing overfitting

These techniques form the basis of most modern methods.

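- Data augmentation

- Dropout can also be viewed as an ensemble technique.
    - Units are dropped at random only during training, so each step trains a different sub-network, and these sub-networks are combined into one final model (all units are used at test time).
    - Reduces overfitting
- Kernel (weight) initialization
- Increase the batch size
- Optimizer: SGD (with momentum 0.9)
- Learning rate decay: lower the learning rate when the validation error plateaus (AlexNet divided it by 10)
- Ensemble of several models
- Weight decay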
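Dropout can be sketched in a few lines of NumPy. This is *inverted* dropout, the variant Keras' `Dropout` layer uses: surviving units are scaled by 1/(1 − rate) during training, so nothing changes at inference. The AlexNet paper instead kept outputs unscaled during training and multiplied them by 0.5 at test time. The function name `dropout` is hypothetical.

```python
import numpy as np

def dropout(x, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero out each unit with probability `rate` during training."""
    if not training or rate == 0.0:
        return x  # inference: use every unit, no rescaling needed
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= rate  # each unit survives with probability 1 - rate
    return x * keep / (1.0 - rate)      # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
h = np.ones(10_000)
print(dropout(h, rng=rng).mean())         # roughly 1.0
print((dropout(h, rng=rng) == 0).mean())  # roughly 0.5
```

Each call draws a new mask, so every training step effectively trains a different thinned sub-network, which is the ensemble view described above.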

## AlexNet structure (simplified)

In [61]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(filters = 96,kernel_size = 11,strides = 4, padding='valid'
                                 ,kernel_regularizer=tf.keras.regularizers.l2(0.005), activation = 'relu'
                                ,input_shape=(227,227,3)))

In [62]:
model.add(tf.keras.layers.MaxPool2D(pool_size=3,strides=2,padding='valid'))

In [63]:
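# LRN would go here (the paper applies it after conv layers 1 and 2, before max pooling).
# Keras has no LRN layer; one option is a Lambda around tf.nn.local_response_normalization:
# model.add(tf.keras.layers.Lambda(lambda x: tf.nn.local_response_normalization(
#     x, depth_radius=2, bias=2.0, alpha=1e-4, beta=0.75)))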
model.add(tf.keras.layers.Conv2D(filters = 256,kernel_size = 3,strides = 1, padding='valid'
                                 ,kernel_regularizer=tf.keras.regularizers.l2(0.005), activation = 'relu'))
model.add(tf.keras.layers.MaxPool2D(pool_size=3,strides=2,padding='same'))

In [64]:
model.add(tf.keras.layers.Conv2D(filters = 384,kernel_size = 3,strides = 1, padding='same'
                                 ,kernel_regularizer=tf.keras.regularizers.l2(0.005), activation = 'relu'))
model.add(tf.keras.layers.MaxPool2D(pool_size=3,strides=1,padding='same'))

In [65]:
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(4096, activation='relu'))
model.add(tf.keras.layers.Dropout(0.5))

In [66]:
model.add(tf.keras.layers.Dense(1000, activation='softmax'))

In [67]:
model.summary()

Model: "sequential_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_22 (Conv2D)           (None, 55, 55, 96)        34944     
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 27, 27, 96)        0         
_________________________________________________________________
conv2d_23 (Conv2D)           (None, 25, 25, 256)       221440    
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 13, 13, 256)       0         
_________________________________________________________________
conv2d_24 (Conv2D)           (None, 13, 13, 384)       885120    
_________________________________________________________________
max_pooling2d_11 (MaxPooling (None, 13, 13, 384)       0         
_________________________________________________________________
flatten_6 (Flatten)          (None, 64896)            
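The shapes and parameter counts in the summary can be checked by hand: a convolution has (k·k·C_in + 1)·C_out parameters (weights plus one bias per filter), and the flattened size is 13·13·384. A quick plain-Python check; the helper names are hypothetical.

```python
def out_size(n, k, s, padding):
    """Spatial output size for Keras-style 'valid' or 'same' padding."""
    if padding == "valid":
        return (n - k) // s + 1
    return -(-n // s)  # 'same': ceil(n / s)

def conv_params(k, c_in, c_out):
    """k*k*c_in weights per filter plus one bias per filter."""
    return (k * k * c_in + 1) * c_out

def dense_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return (n_in + 1) * n_out

# Spatial sizes through the layers above: 227 -> 55 -> 27 -> 25 -> 13 -> 13 -> 13
n = 227
for k, s, pad in [(11, 4, "valid"), (3, 2, "valid"), (3, 1, "valid"),
                  (3, 2, "same"), (3, 1, "same"), (3, 1, "same")]:
    n = out_size(n, k, s, pad)
    print(n)

print(conv_params(11, 3, 96))     # 34944
print(conv_params(3, 96, 256))    # 221440
print(conv_params(3, 256, 384))   # 885120
print(13 * 13 * 384)              # 64896, the Flatten output
print(dense_params(64896, 4096))  # 265818112, for the following Dense(4096) layer
```

The first fully connected layer alone holds roughly 266 million parameters, far more than all three convolution layers combined, which is one reason dropout is applied there.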

In [53]:
model.compile(loss='categorical_crossentropy', optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9))

In [None]:
pf = tf.keras.callbacks.ReduceLROnPlateau()

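# ReduceLROnPlateau: lower the learning rate when training plateaus.
# If the monitored value (val_loss by default) fails to improve by at least min_delta
# for `patience` epochs, lr is multiplied by `factor` (default 0.1, i.e. cut to 10%)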
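The plateau logic can be sketched as a small class. This is a simplified, hypothetical re-implementation (`PlateauScheduler` is not a Keras class): track the best monitored loss, and if it has not improved by at least `min_delta` for `patience` epochs, multiply the learning rate by `factor`. Keras' own `ReduceLROnPlateau` also supports options such as `cooldown` and `min_lr`, which are omitted here.

```python
class PlateauScheduler:
    """Simplified plateau-based learning-rate reduction (hypothetical sketch)."""

    def __init__(self, lr, factor=0.1, patience=10, min_delta=1e-4):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0  # epochs since the last significant improvement

    def step(self, loss):
        """Call once per epoch with the monitored loss; returns the current lr."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor  # e.g. 0.01 -> 0.001 with factor=0.1
                self.wait = 0
        return self.lr

sched = PlateauScheduler(lr=0.01, patience=2)
for loss in [1.0, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7]:
    print(loss, sched.step(loss))
```

With `patience=2`, two epochs without improvement trigger a tenfold cut, which mirrors AlexNet's rule of dividing the learning rate by 10 when validation error stops improving.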