Train a model on unstructured data for at least 100 epochs and show that the loss decreases as the number of epochs increases.


In [7]:
'''
Import the necessary libraries from PyTorch, including the core library (torch),
modules for neural network operations (nn), optimization algorithms (optim), datasets (datasets),
and data loading utilities (DataLoader).
'''
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define a simple neural network model
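'''
 A simple neural network model is defined using the PyTorch nn.Module class.
 The model consists of two fully connected layers (fc1 and fc2) with a ReLU activation in between (relu).
 The input size is 28x28 (the size of Fashion MNIST images), and the output size is 10 (the number of classes in Fashion MNIST).
 A softmax activation converts the raw scores into probabilities.
 Note: nn.CrossEntropyLoss applies log-softmax internally and expects raw scores (logits), so passing
 softmax outputs to it applies softmax twice. With 10 classes this keeps the loss above roughly 1.46;
 returning self.fc2(x) directly from forward() would avoid that.
'''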
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = self.relu(self.fc1(x))
        x = self.softmax(self.fc2(x))
        return x

# Load Fashion MNIST dataset
'''
A data transformation (transform) converts the raw image data to PyTorch tensors and normalizes the pixel values to the range [-1, 1].
The Fashion MNIST dataset is loaded by specifying the root directory, selecting the training set (train=True),
downloading the data if it is not present, and applying the transformation.
A DataLoader then serves the data in shuffled batches of 64 during training.
'''
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

# Instantiate the model, loss function, and optimizer
'''
An instance of the SimpleNN model is created, and the loss function (CrossEntropyLoss) and
optimizer (SGD with a learning rate of 0.01) are defined.
The optimizer will update the model parameters during training to minimize the loss.
'''
model = SimpleNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training loop
'''
The training loop iterates over the specified number of epochs (min_epochs).
Within each epoch, it iterates over batches of data from the train_loader.
For each batch, it performs the following steps:

1. Zeroes the gradients (optimizer.zero_grad()).
2. Computes the model's output for the input data.
3. Calculates the loss between the model's output and the target labels.
4. Computes the gradients with respect to the model parameters (loss.backward()).
5. Updates the model parameters using the optimizer (optimizer.step()).
6. Accumulates the total loss for the epoch.

At the end of each epoch, the average loss is computed and printed.
'''
min_epochs = 100
for epoch in range(min_epochs):
    total_loss = 0
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

    # Print average loss for the epoch
    average_loss = total_loss / len(train_loader)
    print(f'Epoch {epoch + 1}/{min_epochs}, Loss: {average_loss}')

Epoch 1/100, Loss: 2.071138850788572
Epoch 2/100, Loss: 1.823899308628619
Epoch 3/100, Loss: 1.756481065170597
Epoch 4/100, Loss: 1.7253228184510905
Epoch 5/100, Loss: 1.709675774264183
Epoch 6/100, Loss: 1.6994956201835991
Epoch 7/100, Loss: 1.6920565080795207
Epoch 8/100, Loss: 1.6864230001150673
Epoch 9/100, Loss: 1.6819609958988262
Epoch 10/100, Loss: 1.678271442842382
Epoch 11/100, Loss: 1.6751381846379116
Epoch 12/100, Loss: 1.6724315638989529
Epoch 13/100, Loss: 1.6699648742228428
Epoch 14/100, Loss: 1.66802790703804
Epoch 15/100, Loss: 1.6660615797998555
Epoch 16/100, Loss: 1.6643550190081728
Epoch 17/100, Loss: 1.6628946905959643
Epoch 18/100, Loss: 1.6615044810116164
Epoch 19/100, Loss: 1.660192292255125
Epoch 20/100, Loss: 1.6591070629894606
Epoch 21/100, Loss: 1.657945504956154
Epoch 22/100, Loss: 1.656999347814873
Epoch 23/100, Loss: 1.655990533991409
Epoch 24/100, Loss: 1.6550357510794456
Epoch 25/100, Loss: 1.6541442074247006
Epoch 26/100, Loss: 1.6533348111709807
Epoch 

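As the output shows, the average training loss decreases with every epoch, from about 2.07 at epoch 1 to about 1.65 at epoch 26, with the improvement per epoch shrinking as training proceeds.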
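
The loss above flattens out near 1.65 rather than heading toward 0. One likely reason is that the model applies nn.Softmax before nn.CrossEntropyLoss, which applies log-softmax again. The following is a minimal sketch in plain Python (not the notebook's code; the helper name `cross_entropy_from_scores` is made up for illustration) that reproduces the arithmetic of the loss and computes the lowest value it can reach when probabilities, rather than logits, are passed in.

```python
import math

def cross_entropy_from_scores(scores, target):
    """Log-softmax followed by negative log-likelihood for one sample."""
    log_sum_exp = math.log(sum(math.exp(s) for s in scores))
    return log_sum_exp - scores[target]

num_classes = 10

# Best possible softmax output: probability 1 on the true class, 0 elsewhere.
perfect_probs = [1.0] + [0.0] * (num_classes - 1)
floor = cross_entropy_from_scores(perfect_probs, target=0)
print(f"Lowest reachable loss when probabilities are passed in: {floor:.4f}")

# Raw logits are unbounded, so a confident correct prediction drives the loss toward 0.
confident_logits = [20.0] + [0.0] * (num_classes - 1)
logit_loss = cross_entropy_from_scores(confident_logits, target=0)
print(f"Loss with confident logits: {logit_loss:.6f}")
```

Under these assumptions the floor is about 1.46, so a loss that settles around 1.65 is already close to the minimum this setup allows.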

Q1. Linear regression vs. logistic regression. Model?


Linear regression and logistic regression are both popular statistical methods used in the field of machine learning, but they serve different purposes and are suited for different types of problems.

**Linear Regression:**

**Purpose:**

* Linear regression is used for predicting a continuous dependent variable based on one or more independent variables.
* It establishes a linear relationship between the input variables (independent variables) and the output variable (dependent variable).

**Output:**

* The output of linear regression is a continuous value. For example, predicting house prices, temperature, sales revenue, etc.

**Equation:**

* The equation of a simple linear regression model is typically written as y = mx + b, where
  * y is the dependent variable,
  * x is the independent variable,
  * m is the slope, and
  * b is the y-intercept.
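
As a small illustration of the equation above, the slope and intercept can be estimated from data with the ordinary least-squares formulas. This is a sketch in plain Python; the function name `fit_line` and the sample points are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b with a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

# Hypothetical points that lie exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

m, b = fit_line(xs, ys)
print(f"slope m = {m}, intercept b = {b}")
print(f"prediction for x = 5: {m * 5 + b}")
```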

**Logistic Regression:**

**Purpose:**

* Logistic regression is used for binary classification problems, where the outcome variable is categorical and has only two possible classes (0 or 1).
* With a softmax (multinomial) extension, it can also estimate class probabilities in multi-class classification problems.

**Output:**

* The output of logistic regression is a probability that the given input belongs to a particular class. The logistic function (sigmoid function) is used to map the linear combination of input features to a value between 0 and 1.

**Equation:**

* The logistic regression model passes the linear combination through the logistic function. The basic equation is p = 1 / (1 + e^(-(mx + b))), where p is the probability of the positive class.
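
The logistic equation can be tried out directly. The sketch below uses plain Python; the coefficients `m` and `b` are hypothetical values chosen so that the decision boundary falls at x = 2, and the 0.5 threshold is the usual default rather than something fixed by the model.

```python
import math

def predict_probability(x, m, b):
    """Logistic function applied to the linear combination m*x + b."""
    return 1.0 / (1.0 + math.exp(-(m * x + b)))

m, b = 1.5, -3.0  # hypothetical coefficients

for x in (0.0, 2.0, 4.0):
    p = predict_probability(x, m, b)
    label = 1 if p >= 0.5 else 0  # threshold the probability to get a class
    print(f"x = {x}: p = {p:.3f}, predicted class = {label}")
```

The same linear combination that linear regression would output as a prediction is here squashed into the interval (0, 1) and then thresholded.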

**Summary:**
* Linear regression is used for regression problems, predicting continuous values, and has a linear relationship between variables.
* Logistic regression is used for classification problems, predicting binary outcomes, and involves the logistic function to model probabilities.

**Q2. Importance of batch size while training.**



The choice of batch size is a crucial hyperparameter in training machine learning models, especially in the context of deep learning. The batch size determines how many samples from the dataset are used in each iteration of training. Here are some key considerations for the importance of batch size:
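
To make the relationship concrete, the number of weight updates per epoch is the dataset size divided by the batch size, rounded up. The sketch below assumes the 60,000-image Fashion MNIST training set used elsewhere in this notebook; the candidate batch sizes are arbitrary.

```python
import math

dataset_size = 60_000  # Fashion MNIST training images

updates_per_epoch = {}
for batch_size in (16, 64, 256, 1024):
    # The last batch may be smaller, so round up.
    updates_per_epoch[batch_size] = math.ceil(dataset_size / batch_size)

for batch_size, updates in updates_per_epoch.items():
    print(f"batch_size = {batch_size:>4}: {updates} updates per epoch")
```

With batch_size=64 this gives 938 updates per epoch, which is the value of len(train_loader) in the training loops in this notebook.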

**Memory Usage:**

* **Smaller Batch Sizes:** Use less memory during training, since fewer inputs and activations are held at once. This matters when the model or its inputs are large relative to the available memory.
* **Larger Batch Sizes:** Use more memory but may lead to faster training times, especially on hardware optimized for larger batch sizes.

**Computational Efficiency:**

* **Smaller Batch Sizes:** Require more frequent updates to the model's weights, potentially leading to a more "noisy" training process.
* **Larger Batch Sizes:** Benefit from vectorized operations and parallel processing, potentially accelerating training on hardware like GPUs.

**Generalization:**

* **Smaller Batch Sizes:** The model may generalize better, because it updates its weights more often and each update is computed from a noisier estimate of the gradient. This noise can act as a form of regularization.
* **Larger Batch Sizes:** May converge in fewer updates, but can generalize worse, since each update averages over more samples and carries less of that regularizing noise.

**Stochasticity and Noise:**

* Smaller Batch Sizes: Introduce **more randomness** into the weight updates, which can help escape local minima and explore the loss landscape more thoroughly.

* Larger Batch Sizes: Provide a **more stable** and deterministic training process, but might converge to suboptimal solutions if the loss landscape is complex.

**Parallelism:**

* Smaller Batch Sizes: Limit parallelism, because each update has less work to spread across the hardware and the updates themselves run one after another. Highly parallel hardware such as GPUs can be left underused.
* Larger Batch Sizes: Allow for greater parallelism, which can significantly speed up training on hardware with parallel processing capabilities.

**Convergence and Training Dynamics:**

* Smaller Batch Sizes: Produce noisier loss curves and might need **more wall-clock time to converge**, but they make more updates per epoch, which often helps in the early stages of training.
* Larger Batch Sizes: Give smoother loss curves and may **converge more quickly** in wall-clock time, but they make fewer updates per epoch and often need a retuned learning rate.

**Conclusion:**

* The optimal batch size depends on various factors, including the dataset size, model architecture, available hardware, and the specific characteristics of the problem you're solving. It's common to experiment with different batch sizes to find the one that balances computational efficiency with model performance.

Q3. What is cross entropy? Accuracy vs Loss function.



**Cross Entropy:**

Cross entropy is a loss function commonly used in machine learning for classification problems. It measures the performance of a classification model whose output is a probability value between 0 and 1. Cross entropy increases as the predicted probability diverges from the actual label.
The goal during training is to minimize the cross entropy, which is equivalent to maximizing the likelihood of the true labels given the predicted probabilities.
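
For a single sample with a one-hot label, cross entropy reduces to the negative log of the probability the model assigns to the true class. The sketch below shows this in plain Python; the three probability vectors are made-up predictions for a three-class problem whose true class is index 1.

```python
import math

def cross_entropy(probs, true_class):
    """Cross entropy for one sample with a one-hot label."""
    return -math.log(probs[true_class])

true_class = 1
confident_right = [0.05, 0.90, 0.05]
unsure = [0.30, 0.40, 0.30]
confident_wrong = [0.90, 0.05, 0.05]

for name, probs in [("confident and right", confident_right),
                    ("unsure", unsure),
                    ("confident and wrong", confident_wrong)]:
    print(f"{name}: loss = {cross_entropy(probs, true_class):.3f}")
```

The loss is small when the true class gets high probability and grows sharply when the model is confidently wrong.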

**Accuracy vs. Loss Function:**

**Accuracy:** Accuracy is a metric that measures the overall correctness of your model. It is the ratio of correctly predicted instances to the total instances. While accuracy is a commonly used metric, it might not be suitable for all scenarios. For example, in imbalanced datasets, high accuracy can be achieved by simply predicting the majority class.

**Loss Function:** The loss function, such as cross entropy, is used during the training phase to guide the model to make better predictions. It quantifies the difference between the predicted values and the actual values. The model adjusts its parameters to minimize this difference.

**In summary:**

Accuracy is a performance metric used for evaluation, especially in the context of classification. It provides a simple and intuitive measure of how well your model is doing overall.

Loss functions, like cross entropy, are used during training to optimize the model. They provide a gradient that helps adjust the model parameters to improve predictions. The goal is to minimize the loss during training, which ideally leads to better accuracy on unseen data.
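
The difference between the two can be seen on a toy example. In the sketch below (plain Python, hypothetical numbers), each list holds the probability a binary classifier assigns to the true class of four samples. Both models get the same samples right, so their accuracy is identical, yet their losses differ because one model is more confident in its correct answers.

```python
import math

# Probability assigned to the TRUE class of four samples (hypothetical values).
model_a = [0.90, 0.80, 0.90, 0.40]
model_b = [0.55, 0.60, 0.55, 0.40]

def accuracy(true_class_probs, threshold=0.5):
    """Binary case: a sample is correct when its true class gets more than half the probability."""
    correct = sum(1 for p in true_class_probs if p > threshold)
    return correct / len(true_class_probs)

def mean_cross_entropy(true_class_probs):
    """Average negative log-probability of the true class."""
    return sum(-math.log(p) for p in true_class_probs) / len(true_class_probs)

for name, probs in [("model A", model_a), ("model B", model_b)]:
    print(f"{name}: accuracy = {accuracy(probs):.2f}, "
          f"loss = {mean_cross_entropy(probs):.3f}")
```

This is why loss is the quantity optimized during training: it changes smoothly with the predicted probabilities, while accuracy only changes when a prediction crosses the decision threshold.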

**Q4. Importance of hidden layer. Deep Neural Network.
Practical: Train the same model of Part 1 with one hidden layer. Document the performance improvement on using this layer.**

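Adding a hidden layer with a non-linear activation can improve the model's capacity to learn complex patterns in the data. Note that the Part 1 model already contains one hidden layer (fc1 with 128 units), so the network below has the same architecture, and the "Performance Improvement" it prints is the decrease in average loss from one epoch to the next rather than a comparison against a model without a hidden layer.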
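
One simple way to quantify the added capacity is to count trainable parameters. The sketch below compares a hypothetical model with no hidden layer (784 inputs mapped straight to 10 outputs) against the 784-128-10 layout used in this notebook. The helper `linear_params` is invented for the example and mirrors how nn.Linear stores a weight matrix plus a bias vector.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

# No hidden layer: 784 -> 10 (hypothetical baseline, not trained in this notebook).
without_hidden = linear_params(28 * 28, 10)

# One hidden layer: 784 -> 128 -> 10 (the layout used here).
with_hidden = linear_params(28 * 28, 128) + linear_params(128, 10)

print(f"Parameters without hidden layer: {without_hidden}")
print(f"Parameters with one hidden layer: {with_hidden}")
```

Parameter count is only a rough proxy. The ReLU between the two layers is what lets the network represent non-linear decision boundaries.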

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define a neural network model with one hidden layer
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)  # Input layer to hidden layer
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)  # Hidden layer to output layer
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = self.relu(self.fc1(x))
        x = self.softmax(self.fc2(x))
        return x

# Load Fashion MNIST dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

# Instantiate the model, loss function, and optimizer
model = NeuralNetwork()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training loop
min_epochs = 100
previous_loss = float('inf')  # Start at infinity, so the first epoch reports an improvement of inf
for epoch in range(min_epochs):
    total_loss = 0
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

    # Print average loss for the epoch
    average_loss = total_loss / len(train_loader)
    print(f'Epoch {epoch + 1}/{min_epochs}, Loss: {average_loss}')

    # Calculate and print the decrease in average loss relative to the previous epoch
    improvement = previous_loss - average_loss
    print(f'Performance Improvement: {improvement}')
    previous_loss = average_loss


Epoch 1/100, Loss: 2.0802651319676624
Performance Improvement: inf
Epoch 2/100, Loss: 1.8011818303228186
Performance Improvement: 0.27908330164484374
Epoch 3/100, Loss: 1.7432845383564801
Performance Improvement: 0.05789729196633853
Epoch 4/100, Loss: 1.720992951886232
Performance Improvement: 0.022291586470248204
Epoch 5/100, Loss: 1.7072666141270065
Performance Improvement: 0.013726337759225382
Epoch 6/100, Loss: 1.6979074212533833
Performance Improvement: 0.00935919287362319
Epoch 7/100, Loss: 1.690930914268819
Performance Improvement: 0.006976506984564423
Epoch 8/100, Loss: 1.685619274309195
Performance Improvement: 0.005311639959624026
Epoch 9/100, Loss: 1.681250907846097
Performance Improvement: 0.004368366463097795
Epoch 10/100, Loss: 1.6775874342999733
Performance Improvement: 0.003663473546123841
Epoch 11/100, Loss: 1.6745298452723
Performance Improvement: 0.0030575890276731688
Epoch 12/100, Loss: 1.671976323066744
Performance Improvement: 0.002553522205556158
Epoch 13/100, Lo

Q5. Training a model?


This is an example of training a neural network with PyTorch on the Fashion MNIST dataset, using a basic network with one hidden layer. The code defines the network, loads the dataset, and trains the model with stochastic gradient descent (SGD) as the optimizer and cross-entropy as the loss function. The training loop iterates through the dataset for a specified number of epochs and prints the average loss over every 100 batches.

In [9]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define a simple neural network model
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)  # Input layer to hidden layer
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)  # Hidden layer to output layer
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = self.relu(self.fc1(x))
        x = self.softmax(self.fc2(x))
        return x

# Load Fashion MNIST dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

# Instantiate the model, loss function, and optimizer
model = SimpleNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training loop
epochs = 10  # You can adjust the number of epochs
for epoch in range(epochs):
    running_loss = 0.0
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

        if batch_idx % 100 == 99:  # Print every 100 batches
            print(f'Epoch {epoch + 1}, Batch {batch_idx + 1}, Loss: {running_loss / 100:.4f}')
            running_loss = 0.0

print('Training finished!')


Epoch 1, Batch 100, Loss: 2.2911
Epoch 1, Batch 200, Loss: 2.2562
Epoch 1, Batch 300, Loss: 2.1953
Epoch 1, Batch 400, Loss: 2.1356
Epoch 1, Batch 500, Loss: 2.0811
Epoch 1, Batch 600, Loss: 2.0290
Epoch 1, Batch 700, Loss: 1.9906
Epoch 1, Batch 800, Loss: 1.9386
Epoch 1, Batch 900, Loss: 1.8957
Epoch 2, Batch 100, Loss: 1.8613
Epoch 2, Batch 200, Loss: 1.8355
Epoch 2, Batch 300, Loss: 1.8217
Epoch 2, Batch 400, Loss: 1.8016
Epoch 2, Batch 500, Loss: 1.7867
Epoch 2, Batch 600, Loss: 1.7757
Epoch 2, Batch 700, Loss: 1.7796
Epoch 2, Batch 800, Loss: 1.7690
Epoch 2, Batch 900, Loss: 1.7604
Epoch 3, Batch 100, Loss: 1.7564
Epoch 3, Batch 200, Loss: 1.7504
Epoch 3, Batch 300, Loss: 1.7506
Epoch 3, Batch 400, Loss: 1.7430
Epoch 3, Batch 500, Loss: 1.7417
Epoch 3, Batch 600, Loss: 1.7350
Epoch 3, Batch 700, Loss: 1.7345
Epoch 3, Batch 800, Loss: 1.7304
Epoch 3, Batch 900, Loss: 1.7275
Epoch 4, Batch 100, Loss: 1.7257
Epoch 4, Batch 200, Loss: 1.7237
Epoch 4, Batch 300, Loss: 1.7264
Epoch 4, B

Q6. Necessity of GPU in training. Device parameter? Practical: Compare the training times on a CPU vs. GPU.



A GPU (Graphics Processing Unit) is not strictly required to train deep learning models, but it is highly beneficial in many cases. The benefits of a GPU compared with a CPU include:

**Speedup in Training Time:**

* GPU Benefit: GPUs are highly parallel processors, making them well-suited for the large-scale matrix operations involved in training deep neural networks. Training times can be significantly reduced when using a GPU compared to a CPU.
* CPU Impact: Training large models on CPUs can be computationally expensive and time-consuming.

**Model Size and Complexity:**

* GPU Benefit: Larger and more complex models, which have become common in deep learning, benefit more from GPU acceleration. This is because GPUs can handle the increased computational demands more efficiently than CPUs.
* CPU Impact: Smaller models may not see as much benefit from GPU acceleration, and training on CPUs might be sufficient for such cases.

**Data Size:**

* GPU Benefit: Large datasets mean many batches, and each batch's forward and backward pass runs faster on a GPU thanks to its parallel processing capabilities. Data loading and preprocessing usually still run on the CPU.
* CPU Impact: With far fewer parallel compute units, CPUs process each batch more slowly, so training time grows quickly with dataset size.

**Memory Requirements:**

* GPU Benefit: GPU memory offers much higher bandwidth than system RAM, which keeps the compute units fed during training. It is usually smaller than system RAM, however, so the model and each batch must fit within it.
* CPU Impact: CPUs can usually address more memory, but the lower memory bandwidth slows the large matrix operations used in training.

**Deep Learning Frameworks:**

* GPU Benefit: Many deep learning frameworks, such as TensorFlow and PyTorch, are optimized for GPU usage. In PyTorch, once the model and data have been moved to the GPU, operations on them run there without further code changes, resulting in faster computations.
* CPU Impact: While these frameworks can run on CPUs, they may not leverage the full potential of the hardware.

**Device Parameter in PyTorch:**
* In PyTorch, the device parameter is used to specify whether the computation should be performed on a CPU or a GPU. Common values for the device parameter are:
* "cpu": Indicates that the computation should be performed on the CPU.
* "cuda": Indicates that the computation should be performed on the GPU.
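
A minimal sketch of how the device is typically used: pick "cuda" when it is available, fall back to "cpu" otherwise, and place both the model and the data on that device. The tiny nn.Linear layer and the random batch are placeholders for illustration only.

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The model's parameters and the input tensors must live on the same device.
model = torch.nn.Linear(4, 2).to(device)
batch = torch.randn(3, 4, device=device)

output = model(batch)
print(f"Selected device: {device}")
print(f"Output lives on: {output.device}")
```

If the model and a tensor are on different devices, PyTorch raises a runtime error instead of copying silently, which is why both are moved explicitly.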

In [6]:
import torch
import time
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader
epochs=50

# Set device to CPU
device_cpu = torch.device("cpu")

# Load Fashion MNIST dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

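# Instantiate the model (ResNet18 is used as an example of a larger network)
# Note: ResNet18 expects 3-channel input, so its first convolution would need
# adapting before a forward pass on 1-channel Fashion MNIST images
model = models.resnet18()
num_ftrs = model.fc.in_features
model.fc = torch.nn.Linear(num_ftrs, 10)  # Replace the final layer to output the 10 Fashion MNIST classes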

# Training on CPU
start_time_cpu = time.time()
for epoch in range(epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device_cpu), target.to(device_cpu)
        # Placeholder: no forward pass, backward pass, or optimizer step runs here,
        # so this loop only times data loading and the move to the device
end_time_cpu = time.time()
elapsed_time_cpu = end_time_cpu - start_time_cpu

# Set device to GPU
device_gpu = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load Fashion MNIST dataset again for the GPU case
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

# Move the model to GPU
model.to(device_gpu)

# Training on GPU
start_time_gpu = time.time()
for epoch in range(epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device_gpu), target.to(device_gpu)
        # Placeholder: no forward pass, backward pass, or optimizer step runs here,
        # so this loop only times data loading and the transfer to the device
end_time_gpu = time.time()
elapsed_time_gpu = end_time_gpu - start_time_gpu

print(f"Training time on CPU: {elapsed_time_cpu} seconds")
print(f"Training time on GPU: {elapsed_time_gpu} seconds")


Training time on CPU: 615.3157975673676 seconds
Training time on GPU: 611.4228255748749 seconds


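The GPU run (611.4 s) was only about 0.6% faster than the CPU run (615.3 s). Because the loops above only load each batch and move it to the device, without a forward pass, backward pass, or optimizer step, both timings are dominated by data loading and do not show the speedup a GPU gives on the actual training computation.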
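
To time the training computation itself, the loop needs a real forward pass, backward pass, and optimizer step. The sketch below is one way to do that under stated assumptions: it uses a small 784-128-10 network instead of ResNet18, a synthetic batch instead of the Fashion MNIST loader (so data loading is excluded), and a hypothetical helper named `time_training`. CUDA work is asynchronous, so the GPU is synchronized before the clock is read.

```python
import time
import torch
import torch.nn as nn

def time_training(device, steps=20, batch_size=64):
    """Time `steps` full SGD updates of a small network on synthetic data."""
    torch.manual_seed(0)
    model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 128),
        nn.ReLU(),
        nn.Linear(128, 10),  # raw logits, as nn.CrossEntropyLoss expects
    ).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Synthetic stand-in for one Fashion MNIST batch (assumption for this sketch).
    data = torch.randn(batch_size, 1, 28, 28, device=device)
    target = torch.randint(0, 10, (batch_size,), device=device)

    if device.type == "cuda":
        torch.cuda.synchronize()  # finish pending GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        optimizer.step()
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
    return time.perf_counter() - start

timings = {"cpu": time_training(torch.device("cpu"))}
if torch.cuda.is_available():
    timings["cuda"] = time_training(torch.device("cuda"))

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

For a network this small the GPU may show little or no advantage, since kernel launch overhead can outweigh the computation. The gap generally widens with larger models and batch sizes.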