### Introduction
AlexNet is a convolutional neural network designed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. It won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 by a large margin (a top-5 error of 15.3% against 26.2% for the runner-up) and set a new benchmark for image classification. The architecture combined several techniques that were new or rarely used at that scale, which is why the model is regarded as one of the biggest breakthroughs in computer vision.

### Architecture of AlexNet
AlexNet consists of eight layers, including five convolutional layers and three fully connected layers. It introduces several innovative techniques that helped improve performance and training efficiency.

### ImageNet Dataset
* ImageNet is an image database of over 15 million labeled images belonging to roughly 22,000 categories. ILSVRC uses a subset of ImageNet with roughly 1,000 images in each of 1,000 categories. Overall, there are roughly 1.2 million training images, 50,000 validation images, and 150,000 test images.
* The output of the last fully connected layer is fed to a 1000-way softmax (one output per class in the dataset), which produces a distribution over the 1000 class labels.
* **Overlapping max pooling layers follow the 1st, 2nd, and 5th convolutional layers.**
* **ReLU activation is applied after every layer** except the last fully connected layer, which **outputs a softmax distribution over the 1000 class labels.**
![AlexNet.webp](attachment:AlexNet.webp)

### Steps of AlexNet

**1. Input Layer:**
* **Input Size:** The input to AlexNet is a 227x227x3 RGB image. (The original paper states 224x224x3, but the layer dimensions that follow are consistent with a 227x227x3 input, so that size is used here.)
* **Description:** The input layer receives a color image resized to 227x227 pixels.

**2. First Convolutional Layer (Conv1):**

* **Filter Size:** 11x11
* **Stride:** 4
* **Number of Filters:** 96
* **Activation Function:** ReLU
* **Description:** Applies 96 convolutional filters, each of size 11x11, with a stride of 4 and no padding (a 227x227 input then yields a 55x55 output), followed by ReLU activation.
* **Output:** 55x55x96 feature maps
* **Normalization:** Local Response Normalization (LRN)
* **Pooling:** Max pooling with 3x3 filter size and stride of 2, reducing the spatial dimensions to 27x27x96.
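
The Conv1 and pooling sizes above follow from the usual output-size formula `floor((n + 2p - k) / s) + 1`. The sketch below is only an illustration of that arithmetic; the function name `conv_output_size` is an assumption, not something from the paper. It also shows that a 224x224 input would need 2 pixels of padding to reach the same 55x55 output.

```python
def conv_output_size(n: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1

# Conv1 on a 227x227 input with no padding
conv1 = conv_output_size(227, kernel=11, stride=4)

# A 224x224 input reaches the same size only with 2 pixels of padding
conv1_224 = conv_output_size(224, kernel=11, stride=4, padding=2)

# Overlapping 3x3 max pooling with stride 2 applied to the Conv1 output
pool1 = conv_output_size(conv1, kernel=3, stride=2)

print(conv1, conv1_224, pool1)  # 55 55 27
```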

**3. Second Convolutional Layer (Conv2):**

* **Filter Size:** 5x5
* **Stride:** 1
* **Padding:** 2
* **Number of Filters:** 256
* **Activation Function:** ReLU
* **Description:** Applies 256 convolutional filters of size 5x5, with stride 1 and padding 2, followed by ReLU activation.
* **Output:** 27x27x256 feature maps
* **Normalization:** Local Response Normalization (LRN)
* **Pooling:** Max pooling with 3x3 filter size and stride of 2, reducing the spatial dimensions to 13x13x256.
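
Local Response Normalization divides each activation by a term that grows with the squared activations of neighbouring channels at the same spatial position. The following is a minimal sketch for a single position, assuming the constants reported in the AlexNet paper (`n=5`, `k=2`, `alpha=1e-4`, `beta=0.75`); the function name and the toy input are illustrative only.

```python
def local_response_norm(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Normalize a list of channel activations taken at one spatial position."""
    num_channels = len(a)
    out = []
    for i in range(num_channels):
        # Neighbourhood of up to n adjacent channels centred on channel i
        lo = max(0, i - n // 2)
        hi = min(num_channels - 1, i + n // 2)
        sq_sum = sum(a[j] ** 2 for j in range(lo, hi + 1))
        out.append(a[i] / (k + alpha * sq_sum) ** beta)
    return out

activations = [1.0, 50.0, 2.0, 0.5, 3.0, 0.0]  # toy values, one per channel
normalized = local_response_norm(activations)
print(normalized)
```

Channels that sit next to a very large activation (here the value 50.0) are damped more strongly than channels with quiet neighbours.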

**4. Third Convolutional Layer (Conv3):**

* **Filter Size:** 3x3
* **Stride:** 1
* **Padding:** 1
* **Number of Filters:** 384
* **Activation Function:** ReLU
* **Description:** Applies 384 convolutional filters of size 3x3, with stride 1 and padding 1, followed by ReLU activation.
* **Output:** 13x13x384 feature maps

**5. Fourth Convolutional Layer (Conv4):**

* **Filter Size:** 3x3
* **Stride:** 1
* **Padding:** 1
* **Number of Filters:** 384
* **Activation Function:** ReLU
* **Description:** Applies 384 convolutional filters of size 3x3, with stride 1 and padding 1, followed by ReLU activation.
* **Output:** 13x13x384 feature maps

**6. Fifth Convolutional Layer (Conv5):**

* **Filter Size:** 3x3
* **Stride:** 1
* **Padding:** 1
* **Number of Filters:** 256
* **Activation Function:** ReLU
* **Description:** Applies 256 convolutional filters of size 3x3, with stride 1 and padding 1, followed by ReLU activation.
* **Output:** 13x13x256 feature maps
* **Pooling:** Max pooling with 3x3 filter size and stride of 2, reducing the spatial dimensions to 6x6x256.
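
The feature-map sizes listed for the five convolutional layers can be checked by chaining the output-size formula through the whole stack. This is a sketch under the assumptions used in this document (227x227 input, no padding in Conv1); the helper name `out_size` and the layer table are illustrative.

```python
def out_size(n, kernel, stride, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding, output channels); None means pooling,
# which keeps the channel count unchanged
layers = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]

size, channels = 227, 3
shapes = {}
for name, kernel, stride, padding, out_channels in layers:
    size = out_size(size, kernel, stride, padding)
    channels = out_channels if out_channels is not None else channels
    shapes[name] = (size, size, channels)
    print(f"{name}: {size}x{size}x{channels}")

flattened = size * size * channels  # length of the vector passed to FC6
print("flattened:", flattened)      # flattened: 9216
```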

**7. First Fully Connected Layer (FC6):**

* **Number of Neurons:** 4096
* **Activation Function:** ReLU
* **Description:** Flattens the 6x6x256 output of the last pooling layer into a 9216-element vector and applies a fully connected layer with 4096 neurons and ReLU activation.
* **Dropout:** Dropout with a rate of 0.5 to prevent overfitting.

**8. Second Fully Connected Layer (FC7):**

* **Number of Neurons:** 4096
* **Activation Function:** ReLU
* **Description:** Another fully connected layer with 4096 neurons and ReLU activation.
* **Dropout:** Dropout with a rate of 0.5 to prevent overfitting.

**9. Output Layer (FC8):**

* **Number of Neurons:** 1000 (for 1000 ImageNet classes)
* **Activation Function:** Softmax
* **Description:** Final fully connected layer with 1000 neurons and softmax activation to produce class probabilities.
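
The softmax in the output layer turns the raw scores (logits) of FC8 into probabilities that sum to 1. A minimal sketch follows, using three toy logits instead of 1000; subtracting the maximum before exponentiating is a common numerical-stability trick and is an addition here, not something stated above.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                   # toy scores for 3 classes
probs = softmax(logits)
predicted_class = probs.index(max(probs))  # index of the most likely class
print(probs, predicted_class)
```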

### Key Highlights of AlexNet
* Used ReLU instead of tanh
* Addressed overfitting with dropout and data augmentation
* Overlapping pooling
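
Pooling is "overlapping" when the window is larger than the stride, as with AlexNet's 3x3 windows at stride 2. The one-dimensional sketch below is only an illustration (the helper `max_pool_1d` and the toy row are assumptions); it contrasts a window of 3 with stride 2 against a non-overlapping window of 2 with stride 2.

```python
def max_pool_1d(values, window, stride):
    """Max pooling along one dimension, without padding."""
    return [max(values[i:i + window])
            for i in range(0, len(values) - window + 1, stride)]

row = [1, 2, 9, 3, 4, 8, 5]

# Window 3, stride 2: neighbouring windows share one element
overlapping = max_pool_1d(row, window=3, stride=2)

# Window 2, stride 2: windows are disjoint
non_overlapping = max_pool_1d(row, window=2, stride=2)

print(overlapping)      # [9, 9, 8]
print(non_overlapping)  # [2, 9, 8]
```

In the overlapping case the value 9 falls inside two adjacent windows, so it contributes to two outputs.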

### Activation Function Used in AlexNet
* ReLU (Rectified Linear Unit):
    * Used after each convolutional and fully connected layer (except the output layer).
    * Formula: ReLU(x) = max(0, x)
    * Advantages: Introduces non-linearity, mitigates the vanishing gradient problem, and accelerates convergence.
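
To make the vanishing-gradient point concrete, the sketch below compares the derivative of ReLU with that of tanh for growing inputs. The function names are illustrative; the derivative of ReLU at exactly 0 is taken as 0 here, which is a convention rather than a requirement.

```python
import math

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Gradient is 1 for positive inputs and 0 otherwise
    return 1.0 if x > 0 else 0.0

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2, which shrinks toward 0 as |x| grows
    return 1.0 - math.tanh(x) ** 2

for x in (0.5, 2.0, 5.0):
    print(f"x={x}: relu'={relu_grad(x)}, tanh'={tanh_grad(x):.6f}")
```

For positive inputs the ReLU gradient stays at 1, while the tanh gradient saturates, which is one reason gradients shrink less across many ReLU layers.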

### Methods to Avoid Overfitting in AlexNet

**1. Dropout:**
* Applied after the first two fully connected layers (FC6 and FC7) with a rate of 0.5.
* Prevents neurons from co-adapting too much by randomly setting a fraction of activations to zero during training.
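
A minimal dropout sketch is given below. It uses the "inverted" variant common in current frameworks, which rescales the kept activations during training; the original paper instead multiplied outputs by 0.5 at test time, which has the same effect in expectation. The function name, the `rng` argument, and the toy input are assumptions for illustration.

```python
import random

def dropout(activations, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero a fraction of activations and rescale the rest."""
    if not training:
        return list(activations)  # no change at inference time
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)            # fixed seed so the run is repeatable
hidden = [1.0] * 10               # toy activations from a hidden layer
train_out = dropout(hidden, rate=0.5, training=True, rng=rng)
eval_out = dropout(hidden, training=False)
print(train_out)
print(eval_out)
```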

**2. Data Augmentation:**
* Randomly cropping and horizontally flipping the training images, and altering the intensities of their RGB channels.
* Increases the diversity of the training data and helps the model generalize better.
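
The cropping and flipping steps can be sketched on a toy image stored as nested lists. This is only an illustration under simplifying assumptions: the 8x8 image, the 6x6 crop size, and all function names are hypothetical, and the color-intensity step is left out.

```python
import random

def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w patch at a random position."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - crop_h)
    left = rng.randint(0, w - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

def horizontal_flip(image):
    """Mirror every row left to right."""
    return [row[::-1] for row in image]

def augment(image, crop_h, crop_w, rng):
    patch = random_crop(image, crop_h, crop_w, rng)
    if rng.random() < 0.5:        # flip half of the time
        patch = horizontal_flip(patch)
    return patch

rng = random.Random(0)
image = [[(r, c) for c in range(8)] for r in range(8)]  # toy 8x8 "image"
patch = augment(image, 6, 6, rng)
print(len(patch), len(patch[0]))  # 6 6
```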

**3. Weight Decay (L2 Regularization):** 
* Adds a penalty to the loss function proportional to the sum of the squares of the weights.
* Helps to regularize the model by discouraging large weights.
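
The effect of the L2 penalty on a gradient step can be sketched as follows. The weight decay of 0.0005 and learning rate of 0.01 are the values reported in the AlexNet paper; the plain SGD update (the paper also used momentum), the function names, and the toy weights are simplifying assumptions.

```python
def l2_penalty(weights, weight_decay):
    """Penalty term added to the loss: (weight_decay / 2) * sum of squared weights."""
    return 0.5 * weight_decay * sum(w * w for w in weights)

def sgd_step(weights, grads, lr, weight_decay):
    """One plain SGD step; the penalty adds weight_decay * w to each gradient."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]

weights = [1.0, -2.0, 0.5]
grads = [0.0, 0.0, 0.0]   # zero data gradient isolates the decay effect
weight_decay = 0.0005

penalty = l2_penalty(weights, weight_decay)
updated = sgd_step(weights, grads, lr=0.01, weight_decay=weight_decay)
print(penalty)
print(updated)
```

With a zero data gradient, every weight moves slightly toward zero, which is the shrinking behaviour the penalty is meant to produce.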

### Advantages of AlexNet

**1. High Performance:**
* Achieved breakthrough performance on the ImageNet challenge, showcasing the power of deep learning.

**2. Popularized ReLU Activation:**
* Popularized the use of ReLU in deep CNNs. The paper reports that a small ReLU network reached 25% training error on CIFAR-10 about six times faster than an equivalent tanh network.

**3. Effective Regularization:**
* Uses dropout and data augmentation effectively to prevent overfitting.

**4. Deep Architecture:**
* Demonstrated that deeper networks can learn more complex features and improve classification performance.


### Disadvantages of AlexNet

**1. High Computational Cost:**
* Requires significant computational resources for training, including GPUs.

**2. Overfitting:**
* Large Number of Parameters:
    * With around 60 million parameters, AlexNet was prone to overfitting, especially when trained on relatively small datasets.
    * The use of data augmentation and dropout helped mitigate this, but the risk of overfitting was still significant.
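
The parameter figure can be checked by counting weights and biases layer by layer. The sketch below assumes the single-stream layer sizes listed in this document, with every filter connected to all input channels. The original two-GPU implementation split some convolutional layers into groups, so its count is slightly lower and closer to the 60 million quoted above.

```python
# (name, kernel size, input channels, output channels)
conv_layers = [
    ("conv1", 11, 3, 96),
    ("conv2", 5, 96, 256),
    ("conv3", 3, 256, 384),
    ("conv4", 3, 384, 384),
    ("conv5", 3, 384, 256),
]
# (name, input features, output features)
fc_layers = [
    ("fc6", 6 * 6 * 256, 4096),
    ("fc7", 4096, 4096),
    ("fc8", 4096, 1000),
]

params = {}
for name, k, c_in, c_out in conv_layers:
    params[name] = k * k * c_in * c_out + c_out   # weights + biases
for name, n_in, n_out in fc_layers:
    params[name] = n_in * n_out + n_out           # weights + biases

total = sum(params.values())
fc_share = sum(params[name] for name, _, _ in fc_layers) / total

print(f"total parameters: {total:,}")             # total parameters: 62,378,344
print(f"share in fully connected layers: {fc_share:.1%}")
```

Most of the parameters sit in the three fully connected layers, which is why dropout is applied there.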

**3. Complexity:**
* The architecture is relatively complex, requiring careful tuning of hyperparameters and regularization techniques.

**4. Outdated Compared to Modern Architectures:**
* Later architectures such as VGG, ResNet, and EfficientNet have surpassed AlexNet in accuracy, and the more recent of them also in parameter efficiency.