**Plan**

**1. Introduction to neural networks**
    
**2. Creating a simple neural network**
    
**3. Activation functions**
    
**4. Loss functions and optimization**



# **Introduction to neural networks**

Neural networks, inspired by the human brain, are a fundamental concept in machine learning, particularly in deep learning. They are used to model complex patterns and make predictions based on data. PyTorch, an open-source machine learning library developed by Facebook's AI Research lab, provides an intuitive and flexible way to build and train neural networks. This introduction covers the basics of neural networks and how to implement them using PyTorch.



**Key components:**

A neural network consists of layers of interconnected nodes (neurons). Each neuron receives inputs, processes them, and passes the result to the next layer. The most basic type of neural network is the feedforward neural network, where connections do not form cycles.

- **Neurons:** Basic units of a neural network.
- **Layers:** Combinations of neurons; typically include input, hidden, and output layers.
- **Weights and Biases:** Parameters that the network learns during training.
- **Activation Function:** Applies a non-linear transformation to the input, enabling the network to learn complex patterns.
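As a rough illustration of the components listed above, the sketch below computes a single neuron by hand: a weighted sum of the inputs, plus a bias, passed through an activation function. The input, weight, and bias values are made up for this example.

```python
import torch

# Hypothetical inputs, weights, and bias for a single neuron
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.5, -1.0, 0.25])
b = torch.tensor(0.1)

# Weighted sum plus bias (the neuron's pre-activation)
z = torch.dot(w, x) + b

# Non-linear activation applied to the pre-activation
a = torch.relu(z)

print(z.item(), a.item())
```

Here the pre-activation is negative, so ReLU outputs zero. During training, the network would adjust `w` and `b` to change this response.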

**Common activation functions:**
- **Sigmoid:** $\sigma(x) = \frac{1}{1 + e^{-x}}$
- **ReLU (Rectified Linear Unit):** $\text{ReLU}(x) = \max(0, x)$
- **Tanh:** $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
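The formulas above can be checked against PyTorch's built-in functions. This is a minimal sketch with arbitrary sample inputs; it writes each formula out term by term and compares the result with the library version.

```python
import torch

x = torch.tensor([-2.0, 0.0, 2.0])  # arbitrary sample inputs

# Formulas written out term by term
sigmoid_manual = 1 / (1 + torch.exp(-x))
relu_manual = torch.clamp(x, min=0)  # max(0, x)
tanh_manual = (torch.exp(x) - torch.exp(-x)) / (torch.exp(x) + torch.exp(-x))

# Compare with PyTorch's built-in versions
print(torch.allclose(sigmoid_manual, torch.sigmoid(x)))
print(torch.allclose(relu_manual, torch.relu(x)))
print(torch.allclose(tanh_manual, torch.tanh(x)))
```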


# **Creating a simple neural network**

Creating a simple neural network with PyTorch involves several steps: setting up the environment, defining the network architecture, preparing the data, and training and evaluating the model. Below, I'll walk you through the entire process with a simple example.

**Step 1: Setup**

First, make sure you have PyTorch installed. If not, you can install it using:

In [None]:
! pip install torch torchvision

**Step 2: Import Libraries**

In [5]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import datasets, transforms

**Step 3: Define the Neural Network**

Here, we define a simple feedforward neural network with one hidden layer:

In [None]:
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)  # 784 input features (28x28 pixels), 128 hidden units
        self.fc2 = nn.Linear(128, 10)       # 128 hidden units, 10 output classes (digits 0-9)

    def forward(self, x):
        x = x.view(-1, 28 * 28)  # Flatten the input tensor
        x = F.relu(self.fc1(x))  # Apply ReLU activation
        x = self.fc2(x)          # Output layer
        return F.log_softmax(x, dim=1)  # Apply log-softmax
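Before loading any data, it can help to confirm that the forward pass produces the expected shapes. The sketch below repeats the same architecture so that it runs on its own, and feeds it a random tensor as a stand-in for a batch of MNIST images (the batch size of 4 is arbitrary).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleNN(nn.Module):  # same architecture as above, repeated so this runs standalone
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=1)

model = SimpleNN()
dummy = torch.randn(4, 1, 28, 28)   # random stand-in for 4 MNIST images
out = model(dummy)

print(out.shape)                    # one row of 10 log-probabilities per image
print(out.exp().sum(dim=1))         # each row of probabilities sums to ~1

# Total number of learnable weights and biases
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```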

**Step 4: Prepare the Data**

Load and preprocess the MNIST dataset using torchvision:

In [None]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_dataset = datasets.MNIST('.', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)

test_dataset = datasets.MNIST('.', train=False, download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=1000, shuffle=False)
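To make the preprocessing concrete: `ToTensor()` yields pixel values in [0, 1], and `Normalize` then subtracts the given mean and divides by the given standard deviation. The sketch below reproduces that arithmetic with plain tensor operations on a random stand-in image, so it does not need the dataset to be downloaded.

```python
import torch

mean, std = 0.1307, 0.3081          # values passed to Normalize above

# Random stand-in for one image after ToTensor(): shape [1, 28, 28], values in [0, 1]
img = torch.rand(1, 28, 28)

# What Normalize computes for a single-channel image
normalized = (img - mean) / std

# A black pixel (0.0) and a white pixel (1.0) after normalization:
# roughly -0.42 and 2.82
print((0.0 - mean) / std, (1.0 - mean) / std)
```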


In [7]:
train_dataset

Dataset MNIST
    Number of datapoints: 60000
    Root location: .
    Split: Train
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.1307,), std=(0.3081,))
           )

In [9]:
for images, target in train_loader:
  break

In [11]:
images.shape

torch.Size([64, 1, 28, 28])

In [12]:
target.shape

torch.Size([64])

In [13]:
target

tensor([6, 9, 0, 0, 1, 1, 7, 9, 1, 9, 0, 9, 1, 9, 1, 0, 0, 7, 7, 9, 1, 6, 8, 9,
        0, 0, 7, 0, 6, 0, 6, 3, 3, 5, 8, 2, 3, 7, 7, 4, 2, 1, 4, 1, 4, 1, 8, 0,
        2, 5, 8, 6, 4, 7, 2, 7, 1, 1, 3, 9, 2, 0, 5, 1])

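**Step 5: Initialize the Model, Loss Function, and Optimizer**

The loss function needs no separate object here: the training loop calls `F.nll_loss` directly on the model's log-softmax output.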

In [None]:
model = SimpleNN()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

**Step 6: Train the Model**

Define the training loop:

In [None]:
def train(model, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 100 == 0:
            print(f'Train Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)} '
                  f'({100. * batch_idx / len(train_loader):.0f}%)]\tLoss: {loss.item():.6f}')
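The four calls inside the loop body (`zero_grad`, forward pass, `backward`, `step`) can be seen in isolation on a toy problem. The sketch below is not the MNIST setup: it assumes a tiny linear classifier and one fixed batch of random data, and repeats the same step to show the loss decreasing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny linear classifier and one fixed random batch
model = nn.Linear(20, 3)
optimizer = optim.SGD(model.parameters(), lr=0.1)
data = torch.randn(32, 20)
target = torch.randint(0, 3, (32,))

losses = []
for step in range(50):
    optimizer.zero_grad()                       # clear gradients from the previous step
    output = F.log_softmax(model(data), dim=1)  # forward pass
    loss = F.nll_loss(output, target)           # compare predictions with labels
    loss.backward()                             # compute gradients
    optimizer.step()                            # update weights and biases
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

In the real loop, `data` and `target` come from `train_loader`, so each step sees a different batch and the printed loss fluctuates rather than falling steadily.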

**Step 7: Test the Model**

Define the testing loop:

In [None]:
def test(model, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # Sum batch losses
            pred = output.argmax(dim=1, keepdim=True)  # Get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    print(f'\nTest set: Average loss: {test_loss:.4f}, Accuracy: {correct}/{len(test_loader.dataset)} '
          f'({100. * correct / len(test_loader.dataset):.0f}%)\n')
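The accuracy bookkeeping in `test` is easier to follow on a tiny hand-made batch. In this sketch the log-probabilities and labels are invented for illustration: three samples, four classes.

```python
import torch

# Hypothetical log-probabilities for 3 samples over 4 classes
output = torch.log(torch.tensor([
    [0.1, 0.7, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.2, 0.2, 0.5, 0.1],
]))
target = torch.tensor([1, 0, 3])

pred = output.argmax(dim=1, keepdim=True)              # shape [3, 1]: predicted class per sample
correct = pred.eq(target.view_as(pred)).sum().item()   # reshape target to [3, 1] before comparing

print(pred.squeeze(1).tolist(), correct)
```

The `view_as` call matters: comparing a `[3, 1]` tensor with a `[3]` tensor directly would broadcast to a `[3, 3]` result and inflate the count.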

**Step 8: Run Training and Testing**

Train and test the model for a specified number of epochs:

In [None]:
for epoch in range(1, 11):
    train(model, train_loader, optimizer, epoch)
    test(model, test_loader)

# **Activation functions**

In PyTorch, activation functions are available in the torch.nn and torch.nn.functional modules. These functions introduce non-linearity into the model, enabling it to learn complex patterns. Here are some commonly used activation functions and how to use them in PyTorch:

**1. Sigmoid**

The sigmoid function maps the input to a value between 0 and 1. It is often used in the output layer of binary classifiers, where the result can be read as a probability.

In [None]:
import torch.nn.functional as F

x = torch.tensor([-1.0, 0.0, 1.0])
y = torch.sigmoid(x)
# or
y = F.sigmoid(x)

**2. Tanh (Hyperbolic Tangent)**

The tanh function maps the input to a value between -1 and 1. Unlike sigmoid, its output is zero-centered, which often makes it a better choice for hidden layers.

In [None]:
y = torch.tanh(x)
# or
y = F.tanh(x)

**3. ReLU (Rectified Linear Unit)**

The ReLU function is the most commonly used activation function in deep learning models. It outputs zero if the input is negative; otherwise, it outputs the input.

In [None]:
y = torch.relu(x)
# or
y = F.relu(x)

**4. Leaky ReLU**

The Leaky ReLU function allows a small, non-zero gradient when the input is negative, which helps mitigate the dying ReLU problem.

In [None]:
leaky_relu = nn.LeakyReLU(negative_slope=0.01)
y = leaky_relu(x)
# or
y = F.leaky_relu(x, negative_slope=0.01)
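The difference between the two functions shows up in their gradients. The following sketch, with arbitrary sample inputs, compares the gradient that flows back through ReLU and through Leaky ReLU for negative and positive values.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 1.0], requires_grad=True)

# Gradient of ReLU with respect to its input
F.relu(x).sum().backward()
relu_grad = x.grad.clone()

x.grad.zero_()  # reset before the second backward pass

# Gradient of Leaky ReLU for the same inputs
F.leaky_relu(x, negative_slope=0.01).sum().backward()
leaky_grad = x.grad.clone()

print(relu_grad)   # zero for the negative inputs
print(leaky_grad)  # equal to negative_slope for the negative inputs
```

A neuron whose inputs stay negative receives no gradient through ReLU and stops learning, whereas Leaky ReLU still passes a small signal.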

**5. ELU (Exponential Linear Unit)**

The ELU function outputs $\alpha(e^x - 1)$ for negative inputs and the input itself otherwise. Because negative inputs still produce non-zero outputs and gradients, it helps with the vanishing gradient problem.

In [None]:
elu = nn.ELU(alpha=1.0)
y = elu(x)
# or
y = F.elu(x, alpha=1.0)

**6. Softmax**

The Softmax function is often used in the output layer of a classification network to represent probabilities. It outputs a vector that sums to 1.

In [None]:
y = torch.softmax(x, dim=0)  # x is 1-D here; use dim=1 for [batch, classes] inputs
# or
y = F.softmax(x, dim=0)

**7. Log Softmax**

The Log Softmax function applies the logarithm to the Softmax output, which is useful for numerical stability in conjunction with the negative log-likelihood loss.

In [None]:
y = torch.log_softmax(x, dim=0)  # x is 1-D here; use dim=1 for [batch, classes] inputs
# or
y = F.log_softmax(x, dim=0)
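In a classifier, these two functions usually act on a batch of logits with shape `[batch, classes]`, which is where `dim=1` applies. The sketch below uses an invented batch, with one deliberately extreme row, to show that each row sums to 1 and to illustrate the numerical stability point.

```python
import torch
import torch.nn.functional as F

# Hypothetical batch of logits: 2 samples, 3 classes
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [1000.0, 0.0, -1000.0]])

probs = F.softmax(logits, dim=1)          # dim=1 normalizes across classes
print(probs.sum(dim=1))                   # each row sums to 1

naive = torch.log(probs)                  # log of softmax, computed in two steps
stable = F.log_softmax(logits, dim=1)     # computed in one fused step

print(naive[1])    # contains -inf where the probability underflowed to 0
print(stable[1])   # stays finite
```

An infinite log-probability would turn the loss and its gradients into `inf` or `nan`, which is why `log_softmax` is preferred over taking the log separately.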

**Example: Using Activation Functions in a Neural Network**

Here's an example of a simple neural network using different activation functions:

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)  # Input layer to hidden layer
        self.fc2 = nn.Linear(128, 64)       # Hidden layer to another hidden layer
        self.fc3 = nn.Linear(64, 10)        # Hidden layer to output layer

    def forward(self, x):
        x = x.view(-1, 28 * 28)  # Flatten the input tensor

        # Using ReLU activation function
        x = F.relu(self.fc1(x))

        # Using Leaky ReLU activation function
        x = F.leaky_relu(self.fc2(x), negative_slope=0.01)

        # Using Log Softmax activation function for the output layer
        x = F.log_softmax(self.fc3(x), dim=1)
        return x

# Create a model instance
model = SimpleNN()

# Print the model architecture
print(model)


- `self.fc1`, `self.fc2`, `self.fc3`: Define fully connected layers.
- `F.relu(self.fc1(x))`: Apply the ReLU activation function after the first fully connected layer.
- `F.leaky_relu(self.fc2(x), negative_slope=0.01)`: Apply the Leaky ReLU activation function after the second fully connected layer.
- `F.log_softmax(self.fc3(x), dim=1)`: Apply the Log Softmax activation function to the output layer, which is typical for classification tasks.

These activation functions can be mixed and matched within the network to best suit the problem you're tackling. Experimenting with different activation functions and their placements within your network can significantly affect the performance of your model.

# **Loss functions**

Loss functions, also known as cost functions or objective functions, measure the difference between the predicted output of a neural network and the actual target values. In PyTorch, loss functions are available in the torch.nn module. They are crucial for training neural networks, as they guide the optimization process by providing a measure of how well the network is performing.

**1. Mean Squared Error Loss (MSELoss)**

Used for regression tasks, it calculates the average of the squared differences between predicted and actual values.

In [14]:
loss_fn = nn.MSELoss()
predicted = torch.tensor([0.5, 0.8], requires_grad=True)
target = torch.tensor([0.3, 1.0])
loss = loss_fn(predicted, target)
print(loss)

tensor(0.0400, grad_fn=<MseLossBackward0>)


In [16]:
print(loss.item())

0.03999999538064003


**2. Binary Cross-Entropy Loss (BCELoss)**

Used for binary classification tasks, it measures the binary cross-entropy loss between the predicted and actual values.

In [19]:
loss_fn = nn.BCELoss()
predicted = torch.tensor([0.5, 0.8], requires_grad=True)
target = torch.tensor([0.0, 1.0])
loss = loss_fn(predicted, target)
print(loss)

tensor(0.4581, grad_fn=<BinaryCrossEntropyBackward0>)


**3. Binary Cross-Entropy with Logits Loss (BCEWithLogitsLoss)**

Combines a Sigmoid layer and the BCELoss in one single class. This is more numerically stable than using a plain Sigmoid followed by a BCELoss. The input is interpreted as raw logits, so **there is no need to apply a sigmoid activation explicitly in the output layer**.

In [20]:
loss_fn = nn.BCEWithLogitsLoss()
predicted = torch.tensor([0.5, 0.8], requires_grad=True)
target = torch.tensor([0.0, 1.0])
loss = loss_fn(predicted, target)
print(loss)

tensor(0.6726, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
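Note that the same numbers `[0.5, 0.8]` give a different loss than in the BCELoss example, because here they are treated as logits rather than probabilities. A short sketch of the equivalence, using those same values:

```python
import torch
import torch.nn as nn

logits = torch.tensor([0.5, 0.8])   # raw scores, not probabilities
target = torch.tensor([0.0, 1.0])

# One step: sigmoid is applied inside the loss
fused = nn.BCEWithLogitsLoss()(logits, target)

# Two steps: explicit sigmoid, then BCELoss on the resulting probabilities
two_step = nn.BCELoss()(torch.sigmoid(logits), target)

print(fused.item(), two_step.item())
```

Both paths agree for moderate values like these; the fused version is the safer choice when logits become very large or very small.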


**4. Cross-Entropy Loss (CrossEntropyLoss)**

Used for multi-class classification tasks. It combines LogSoftmax and NLLLoss in one single class. The input is interpreted as raw logits, so **there is no need to apply LogSoftmax explicitly in the output layer**.

In [21]:
loss_fn = nn.CrossEntropyLoss()
predicted = torch.tensor([[2.0, 1.0, 0.1]], requires_grad=True)
target = torch.tensor([0])
loss = loss_fn(predicted, target)
print(loss)

tensor(0.4170, grad_fn=<NllLossBackward0>)


**5. Negative Log-Likelihood Loss (NLLLoss)**

Used for classification tasks when the model already outputs log-probabilities, for example from a LogSoftmax layer.

In [22]:
loss_fn = nn.NLLLoss()
predicted = torch.tensor([[2.0, 1.0, 0.1]], requires_grad=True).log_softmax(dim=1)
target = torch.tensor([0])
loss = loss_fn(predicted, target)
print(loss)

tensor(0.4170, grad_fn=<NllLossBackward0>)
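The two cells above print the same value because CrossEntropyLoss applies log-softmax internally. As a sketch using the same logits, the loss can also be computed by hand as minus the log of the softmax probability assigned to the target class:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])
target = torch.tensor([0])

ce = nn.CrossEntropyLoss()(logits, target)
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)

# By hand: minus the log of the softmax probability of the target class
manual = -torch.log(torch.softmax(logits, dim=1)[0, target[0]])

print(ce.item(), nll.item(), manual.item())
```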


**6. Smooth L1 Loss**

Combines the benefits of L1 and L2 loss: it is quadratic for small errors (like L2) and linear for large ones (like L1), which makes it less sensitive to outliers than MSELoss.

In [None]:
loss_fn = nn.SmoothL1Loss()
predicted = torch.tensor([0.5, 0.8], requires_grad=True)
target = torch.tensor([0.3, 1.0])
loss = loss_fn(predicted, target)
print(loss)

**7. Huber Loss**

Another loss function that is less sensitive to outliers in data than the squared error loss. With the default `delta=1.0`, it gives the same value as SmoothL1Loss with its default `beta=1.0`.

In [None]:
loss_fn = nn.HuberLoss()
predicted = torch.tensor([0.5, 0.8], requires_grad=True)
target = torch.tensor([0.3, 1.0])
loss = loss_fn(predicted, target)
print(loss)
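The effect of an outlier on these regression losses can be seen on a small invented example. The sketch below compares MSELoss, SmoothL1Loss, and HuberLoss (all with default settings) on predictions with and without one large error.

```python
import torch
import torch.nn as nn

target = torch.tensor([0.0, 0.0, 0.0, 0.0])
clean = torch.tensor([0.1, -0.2, 0.1, 0.2])     # small errors only
outlier = torch.tensor([0.1, -0.2, 0.1, 10.0])  # one large error

results = {}
for name, loss_fn in [("MSE", nn.MSELoss()),
                      ("SmoothL1", nn.SmoothL1Loss()),
                      ("Huber", nn.HuberLoss())]:
    # Store (loss without outlier, loss with outlier) for each loss function
    results[name] = (loss_fn(clean, target).item(), loss_fn(outlier, target).item())
    print(f"{name}: clean={results[name][0]:.4f}, outlier={results[name][1]:.4f}")
```

The single large error dominates the MSE value, while the two robust losses grow only linearly with it.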

**Example: Using Loss Functions in a Neural Network**

Here’s an example of a neural network for a classification task using CrossEntropyLoss:

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# Define a simple neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Prepare the data
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_dataset = datasets.MNIST('.', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)

# Initialize the model, loss function, and optimizer
model = SimpleNN()
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Train the model
def train(model, train_loader, loss_fn, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 100 == 0:
            print(f'Train Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)} '
                  f'({100. * batch_idx / len(train_loader):.0f}%)]\tLoss: {loss.item():.6f}')

for epoch in range(1, 11):
    train(model, train_loader, loss_fn, optimizer, epoch)

**Explanation**

- **Model Definition:** A simple feedforward neural network with two hidden layers and ReLU activations.
- **Data Preparation:** MNIST dataset with normalization.
- **Loss Function:** CrossEntropyLoss is used for the classification task.
- **Training Loop:** Trains the model using the training dataset, computes the loss, performs backpropagation, and updates the model parameters using the optimizer.

By using the appropriate loss function for your task, you can ensure that your neural network is trained effectively to minimize the error and improve its performance on the given problem.


# **Optimization functions**

Optimization is a crucial part of training neural networks, as it involves adjusting the model parameters to minimize the loss function. PyTorch provides various optimization algorithms through the `torch.optim` module. Here's a guide to using optimizers in PyTorch.

1. **Stochastic Gradient Descent (SGD)**
   The simplest optimization algorithm that updates the parameters using the gradient of the loss function.
   ```python
   optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
   ```

2. **SGD with Momentum**
   Improves upon SGD by adding a momentum term that accumulates past gradients, which accelerates updates along consistent directions and dampens oscillations.
   ```python
   optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
   ```

3. **Adam (Adaptive Moment Estimation)**
   Combines the advantages of two other extensions of SGD, AdaGrad and RMSProp. It works well on large datasets and is computationally efficient.
   ```python
   optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
   ```

4. **RMSprop**
   An adaptive learning rate method that divides the learning rate by an exponentially decaying average of squared gradients.
   ```python
   optimizer = torch.optim.RMSprop(model.parameters(), lr=0.001)
   ```

5. **Adagrad (Adaptive Gradient Algorithm)**
   Adaptively scales the learning rate for each parameter based on the historical gradient information.
   ```python
   optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)
   ```

6. **Adadelta**
   An extension of Adagrad that seeks to reduce its aggressive, monotonically decreasing learning rate.
   ```python
   optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0)
   ```

7. **AdamW**
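   Similar to Adam, but with decoupled weight decay: the decay is applied directly to the weights instead of being added to the gradient, which makes it a more predictable regularizer against overfitting.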
   ```python
   optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)
   ```
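To see what an optimizer step does, the sketch below first checks the plain SGD rule `w <- w - lr * grad` by hand, then uses a small hypothetical helper, `minimize`, to run different optimizers on the toy loss `(w - 3)^2`. The learning rates are chosen for this toy problem only and are not tuned recommendations.

```python
import torch

# Part 1: one plain SGD step, checked against the update rule
lr = 0.1
w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=lr)

loss = (w ** 2).sum()                           # toy loss with gradient 2 * w
loss.backward()
expected = (w - lr * w.grad).detach().clone()   # w <- w - lr * grad
optimizer.step()
print(w.detach(), expected)

# Part 2: the same training loop with different optimizers
def minimize(optimizer_cls, steps=200, **kwargs):
    """Hypothetical helper: minimize (p - 3)^2 from p = 0 and return the final p."""
    p = torch.tensor([0.0], requires_grad=True)
    opt = optimizer_cls([p], **kwargs)
    for _ in range(steps):
        opt.zero_grad()
        ((p - 3.0) ** 2).sum().backward()
        opt.step()
    return p.item()

results = {
    "SGD": minimize(torch.optim.SGD, lr=0.1),
    "SGD+momentum": minimize(torch.optim.SGD, lr=0.1, momentum=0.9),
    "Adam": minimize(torch.optim.Adam, lr=0.1),
}
for name, value in results.items():
    print(f"{name}: p = {value:.4f}")
```

Only the optimizer construction changes between runs; the loop itself stays the same, which is what makes it easy to swap optimizers when experimenting.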
