[AlexNet paper](https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf)

![title](https://miro.medium.com/max/1400/1*qyc21qM0oxWEuRaj-XJKcw.png)   

We build an analogous single-device version, as if the two halves of the network that the paper splits across two GPUs were stacked into one.

![title](https://neurohive.io/wp-content/uploads/2018/10/AlexNet-1.png)

In [2]:
# Silence Python warnings and TensorFlow log output
import warnings
warnings.filterwarnings("ignore")
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

In [3]:
import tensorflow as tf
import numpy as np
from tensorflow.keras.backend import clear_session
import matplotlib.pyplot as plt
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (
    InputLayer, Dense, Conv2D, MaxPool2D,
    BatchNormalization, Flatten, Dropout,
)

In [10]:
clear_session()
alexnet_1 = Sequential([
    InputLayer(input_shape=(227,227,3)),
    Conv2D(96,kernel_size=(11,11),strides=(4,4),activation='relu'),
    MaxPool2D(pool_size=(3,3),strides=(2,2)),
    Conv2D(256,kernel_size=(5,5),strides=(1,1),
           padding='same',activation='relu'),
    MaxPool2D(pool_size=(3,3),strides=(2,2)),
    Conv2D(384,kernel_size=(3,3),strides=(1,1),
           padding='same',activation='relu'),
    Conv2D(384,kernel_size=(3,3),strides=(1,1),
           padding='same',activation='relu'),
    Conv2D(256,kernel_size=(3,3),strides=(1,1),
           padding='same',activation='relu'),
    MaxPool2D(pool_size=(3,3),strides=(2,2)),
    Flatten(),
    Dense(4096,activation='relu'),
    Dense(4096,activation='relu'),
    Dense(1000,activation='softmax')
])

In [11]:
alexnet_1.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 55, 55, 96)        34944     
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 27, 27, 96)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 27, 27, 256)       614656    
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 256)       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 13, 13, 384)       885120    
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 13, 13, 384)       1327488   
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 13, 13, 256)       8

In [None]:
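# Let's reason about each output shape and parameter count.
# Output size along one spatial dimension: floor((n_in + 2p - k)/s) + 1
# (n_in = input size, p = padding, k = kernel size, s = stride)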
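
The formula above can be checked by hand against the summary of `alexnet_1`. What follows is a minimal sketch in plain Python, not part of the original notebook: the helper names `conv_out` and `conv_params` are hypothetical. It assumes that `padding='same'` with stride 1 and an odd kernel corresponds to `p = (k - 1) // 2`, and that Keras' default `padding='valid'` corresponds to `p = 0`.

```python
def conv_out(n_in, k, s=1, p=0):
    """Spatial output size: floor((n_in + 2p - k) / s) + 1."""
    return (n_in + 2 * p - k) // s + 1


def conv_params(k, c_in, c_out):
    """Each filter has k*k*c_in weights plus one bias; there are c_out filters."""
    return (k * k * c_in + 1) * c_out


# Walk through the first layers of alexnet_1 (input 227x227x3).
n = conv_out(227, k=11, s=4)          # conv2d, 'valid' padding
p_conv1 = conv_params(11, 3, 96)
n_pool1 = conv_out(n, k=3, s=2)       # max_pooling2d (pooling has no parameters)

n_conv2 = conv_out(n_pool1, k=5, s=1, p=2)   # conv2d_1, 'same' -> p = (5 - 1) // 2
p_conv2 = conv_params(5, 96, 256)
n_pool2 = conv_out(n_conv2, k=3, s=2)        # max_pooling2d_1

n_conv3 = conv_out(n_pool2, k=3, s=1, p=1)   # conv2d_2, 'same' -> p = 1
p_conv3 = conv_params(3, 256, 384)
p_conv4 = conv_params(3, 384, 384)           # conv2d_3

print(n, p_conv1)         # 55 34944
print(n_pool1)            # 27
print(n_conv2, p_conv2)   # 27 614656
print(n_pool2)            # 13
print(n_conv3, p_conv3)   # 13 885120
print(p_conv4)            # 1327488
```

The printed values match the rows of the summary above; the remaining layers follow the same pattern.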

In [34]:
clear_session()
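# Initialization as described in the paper: zero-mean Gaussian weights with
# std 0.01; biases set to 1 in conv layers 2, 4, 5 and in the hidden dense
# layers, and to 0 elsewhere.
# Note: BatchNormalization is used here in place of the paper's
# local response normalization.
w_init = tf.keras.initializers.RandomNormal(stddev=0.01)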
alexnet_2 = Sequential([
    InputLayer(input_shape=(227,227,3)),
    Conv2D(96,kernel_size=(11,11),strides=(4,4),
           kernel_initializer=w_init,
           bias_initializer='zeros',activation='relu'),
    BatchNormalization(),
    MaxPool2D(pool_size=(3,3),strides=(2,2)),
    Conv2D(256,kernel_size=(5,5),strides=(1,1),padding='same',
           kernel_initializer=w_init,
           bias_initializer='ones',activation='relu'),
    BatchNormalization(),
    MaxPool2D(pool_size=(3,3),strides=(2,2)),
    Conv2D(384,kernel_size=(3,3),strides=(1,1),padding='same',
           kernel_initializer=w_init,
           bias_initializer='zeros',activation='relu'),
    Conv2D(384,kernel_size=(3,3),strides=(1,1),padding='same',
           kernel_initializer=w_init,
           bias_initializer='ones',activation='relu'),
    Conv2D(256,kernel_size=(3,3),strides=(1,1),padding='same',
           kernel_initializer=w_init,
           bias_initializer='ones',activation='relu'),
    MaxPool2D(pool_size=(3,3),strides=(2,2)),
    Flatten(),
    Dense(4096,kernel_initializer=w_init,
          bias_initializer='ones',activation='relu'),
    Dropout(.5),
    Dense(4096,kernel_initializer=w_init,
          bias_initializer='ones',activation='relu'),
    Dropout(.5),
    Dense(1000,kernel_initializer=w_init,
          bias_initializer='zeros',activation='softmax')
])

In [35]:
alexnet_2.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 55, 55, 96)        34944     
_________________________________________________________________
batch_normalization (BatchNo (None, 55, 55, 96)        384       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 27, 27, 96)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 27, 27, 256)       614656    
_________________________________________________________________
batch_normalization_1 (Batch (None, 27, 27, 256)       1024      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 256)       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 13, 13, 384)       8

Quoted from the paper:

<font color='red'>
Our network maximizes the multinomial logistic regression objective, which is equivalent to maximizing the average across training cases of the log-probability of the correct label under the prediction distribution. <br>
We trained our models using stochastic gradient descent with a batch size of 128 examples, momentum of 0.9, and weight decay of 0.0005. <br>We used an equal learning rate for all layers, which we adjusted manually throughout training.</font>
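
To make the quoted passage concrete, here is a small NumPy sketch of the two ingredients it names: the objective (average log-probability of the correct label) and one SGD step with momentum 0.9 and weight decay 0.0005. It is an illustration, not the notebook's training code. The function names `mean_log_prob` and `sgd_step` are hypothetical, the toy numbers are made up, and the learning rate `lr=0.01` is an illustrative value that does not come from the passage above. The update is written in the form the paper gives for its rule, where the weight decay term is scaled by the learning rate.

```python
import numpy as np


def mean_log_prob(probs, labels):
    """Average over examples of log p(correct label); training maximizes this."""
    rows = np.arange(len(labels))
    return float(np.mean(np.log(probs[rows, labels])))


def sgd_step(w, v, grad, lr, momentum=0.9, weight_decay=0.0005):
    """One SGD update with momentum and weight decay:
    v <- momentum * v - weight_decay * lr * w - lr * grad
    w <- w + v
    """
    v_new = momentum * v - weight_decay * lr * w - lr * grad
    return w + v_new, v_new


# Toy softmax outputs for 2 examples and 3 classes (made-up numbers).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
objective = mean_log_prob(probs, labels)   # (log 0.7 + log 0.8) / 2

# Toy parameter vector, zero initial velocity, made-up gradient.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
grad = np.array([0.5, 0.5])
w_next, v_next = sgd_step(w, v, grad, lr=0.01)
```

Maximizing `mean_log_prob` is the same as minimizing the categorical cross-entropy loss, which is how the objective would usually be expressed when compiling a Keras model.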

In [36]:
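# Sanity check: push one batch of 128 random "images" through the network
# and confirm that the output holds one 1000-way prediction per image.
dummy_input = np.random.rand(128,227,227,3)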
dummy_output = alexnet_2.predict(dummy_input)
dummy_output.shape

(128, 1000)