<a href="https://colab.research.google.com/github/ShaliniAnandaPhD/Bits-To-Models-From-Orbit/blob/main/Hour_6_Building_Prediction_Models_with_PyTorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Overview of Supervised Learning (2 mins)

Supervised learning is a branch of machine learning in which algorithms are trained on labeled datasets. The goal is to learn a mapping from inputs to desired outputs. [Learn more about supervised learning.](https://www.ibm.com/cloud/learn/supervised-learning)

The main elements of supervised learning algorithms include:

- **Inputs (Features):** The variables or data points used to make predictions.
- **Output (Target):** The value or result you want the model to predict.
- **Model:** A mathematical function representing the relationship between inputs and outputs.
- **Training Data:** A dataset containing input-output pairs used to train the model.
- **Loss Function:** Measures the model’s prediction errors.
- **Optimization Algorithm:** Iteratively improves the model by minimizing the loss function.
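
As a rough illustration, the elements listed above map onto a few lines of PyTorch. The synthetic data, the single linear layer, and the learning rate are assumptions chosen only for this sketch.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Inputs (features) and output (target): synthetic data following y = 3x + 1 plus noise
X = torch.rand(100, 1)
y = 3 * X + 1 + 0.05 * torch.randn(100, 1)

model = nn.Linear(1, 1)                                   # model
loss_fn = nn.MSELoss()                                    # loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # optimization algorithm

initial_loss = loss_fn(model(X), y).item()
for _ in range(200):                                      # training on (X, y) pairs
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
final_loss = loss_fn(model(X), y).item()

print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

The loss should drop as the optimizer adjusts the layer's weight and bias toward the slope and intercept used to generate the data.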

The two most common types of supervised learning are:

- **Regression:** Predicts continuous numerical outcomes.
- **Classification:** Categorizes data points into different classes.
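
The difference between the two types shows up mainly in the output shape and the loss function. The sketch below uses made-up data and arbitrary layer sizes to contrast them.

```python
import torch
from torch import nn

torch.manual_seed(0)
features = torch.randn(4, 3)  # 4 samples, 3 features each (made-up data)

# Regression: one continuous output per sample, scored with mean squared error
regressor = nn.Linear(3, 1)
reg_targets = torch.tensor([[0.5], [1.2], [-0.3], [2.0]])
reg_loss = nn.MSELoss()(regressor(features), reg_targets)

# Classification: one score (logit) per class, scored with cross-entropy
classifier = nn.Linear(3, 2)
class_targets = torch.tensor([0, 1, 1, 0])  # integer class labels
clf_loss = nn.CrossEntropyLoss()(classifier(features), class_targets)
predicted_classes = classifier(features).argmax(dim=1)

print(reg_loss.item(), clf_loss.item(), predicted_classes)
```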

The typical supervised learning workflow includes:

1. Collect and preprocess the training data.
2. Split data into training and test sets.
3. Train the model on the training set.
4. Evaluate model performance on the test set.
5. Refine the model hyperparameters and repeat the process.

### Introduction to PyTorch (5 mins)

[PyTorch](https://pytorch.org/) is an open-source machine learning library for Python based on Torch, originally developed by Facebook's AI Research lab (now Meta AI). Some key features include:

- **Dynamic Computational Graphs:** Enables more flexible modeling compared to static graphs.
- **GPU Acceleration:** Leverages GPUs for faster training and inference.  
- **Extensive Libraries and Tools:** Provides modules for computer vision, NLP, reinforcement learning, etc.
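
To make the first feature concrete, here is a minimal sketch of a dynamic graph: ordinary Python control flow decides how many operations are recorded. The function name `repeated_double` and the threshold are hypothetical, chosen for illustration.

```python
import torch

def repeated_double(x: torch.Tensor, limit: float) -> torch.Tensor:
    # The number of loop iterations depends on the data, so the graph
    # recorded by autograd can differ from one call to the next.
    while x.norm() < limit:
        x = x * 2
    return x

x = torch.tensor([1.0, 1.0], requires_grad=True)
out = repeated_double(x, limit=10.0)  # doubles three times, so out = 8 * x
out.sum().backward()
print(x.grad)  # tensor([8., 8.])
```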

The core PyTorch components are:

- **Tensors:** Primary data structures similar to NumPy arrays.
- **Autograd:** Automatic differentiation for computing gradients.
- **NN Modules:** Neural network layers and loss functions (`torch.nn`), used together with the optimizers in `torch.optim`.

### PyTorch Tensors, Autograd, NN Module

**Tensors** are a key data structure in PyTorch. They support various data types and can be easily moved between CPUs and GPUs. [Learn more about PyTorch tensors.](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html)
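
A short sketch of common tensor operations follows. The values are arbitrary, and the device selection falls back to the CPU when no GPU is present.

```python
import numpy as np
import torch

# Create a tensor from a Python list and from a NumPy array
a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.from_numpy(np.ones((2, 2), dtype=np.float32))

c = a + b                   # element-wise addition
d = a @ b                   # matrix multiplication
as_int = a.to(torch.int64)  # change the data type

# Move to the GPU when one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
a_on_device = a.to(device)

print(c, d, as_int.dtype, a_on_device.device)
```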

**Autograd** enables automatic differentiation of PyTorch operations. It records performed operations to compute gradients needed for gradient-based optimization. [Learn more about PyTorch's Autograd.](https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html)
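
As a minimal example, autograd can differentiate a simple polynomial. The function and the input value are chosen only for illustration.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x   # autograd records these operations
y.backward()         # computes dy/dx = 2x + 2
print(x.grad)        # tensor(8.)
```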

The **NN Module** (`torch.nn`) contains utilities for building neural networks, including predefined layers and loss functions; optimization algorithms live in the companion `torch.optim` package. [Learn more about PyTorch's NN Module.](https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)
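
One common way to use these utilities is to subclass `nn.Module`. The class name `TinyRegressor` and the layer sizes below are hypothetical; this is a sketch, not the course's reference model.

```python
import torch
from torch import nn

class TinyRegressor(nn.Module):  # hypothetical name used for illustration
    def __init__(self, in_features: int, hidden: int):
        super().__init__()
        self.hidden = nn.Linear(in_features, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Hidden layer with ReLU activation, then a single regression output
        return self.out(torch.relu(self.hidden(x)))

net = TinyRegressor(in_features=30, hidden=64)
batch = torch.randn(8, 30)  # 8 samples with 30 features each
predictions = net(batch)
n_params = sum(p.numel() for p in net.parameters())
print(predictions.shape, n_params)
```

Layers assigned as attributes are registered automatically, which is why `net.parameters()` can hand every weight and bias to an optimizer.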

### Quiz on Supervised Learning and PyTorch Fundamentals (3 mins)

1. Which type of machine learning trains a model on labeled input-output pairs?
  
   A) Unsupervised learning
   
   B) Reinforcement learning
    
   C) Semi-supervised learning
    
   D) **Supervised learning**
   
2. Which PyTorch component enables automatic differentiation?

   A) Tensors
    
   B) **Autograd**
    
   C) NN Module
    
   D) TorchVision
   
3. What type of problem would regression analysis be used for in supervised learning?

   A) Image classification
    
   B) **Predicting a continuous value**
    
   C) Clustering data
    
   D) Game playing
   
4. Which is NOT a feature of PyTorch?

   A) Dynamic computational graphs
    
   B) Extensive libraries and tools
    
   C) **Static computational graphs**
    
   D) GPU acceleration



### Section 1: Supervised Learning Concepts (15 mins)

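```python
# Split data into train/val/test sets
# X and y are assumed to hold the features and targets loaded earlier
from sklearn.model_selection import train_test_split

# Hold out 20% of the data as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Hold out 20% of the remaining data (16% overall) as the validation set
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)
```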
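
`train_test_split` returns the same kind of container it receives, commonly NumPy arrays, while PyTorch layers expect `float32` tensors. The sketch below shows one way to convert them; the random arrays and their shapes are stand-ins for the real split, and the `_np` and `_t` names are hypothetical.

```python
import numpy as np
import torch

# Stand-ins for the arrays returned by the split (shapes are assumptions)
X_train_np = np.random.rand(64, 30)
y_train_np = np.random.rand(64)

# nn.Linear uses float32 by default, and MSELoss expects targets
# with the same shape as the predictions, here (N, 1)
X_train_t = torch.as_tensor(X_train_np, dtype=torch.float32)
y_train_t = torch.as_tensor(y_train_np, dtype=torch.float32).unsqueeze(1)

print(X_train_t.shape, y_train_t.shape, X_train_t.dtype)
```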

```python
import torch
from torch import nn

# Define loss function and optimizer (model is defined in Section 2)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
```

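```python
# L2 regularization: penalize the sum of squared parameter values
l2_lambda = 1e-4  # regularization strength (illustrative value; tune for your data)

reg_loss = 0.0
for param in model.parameters():
    reg_loss = reg_loss + torch.sum(param ** 2)

# `lambda` is a reserved word in Python, so the coefficient is named l2_lambda
y_pred = model(X_train)
loss = loss_fn(y_pred, y_train) + l2_lambda * reg_loss
```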

1. **What is the main difference between supervised and unsupervised learning?**
   *Answer: In supervised learning, models are trained using labeled data, meaning each input has a corresponding known output. In unsupervised learning, the data is not labeled, and the model identifies patterns and structures on its own.*

2. **Explain the role of a loss function in the training of a supervised learning model.**
   *Answer: The loss function quantifies how well the model's predictions match the actual data. It is a measure of error, and the goal of training is to minimize this error by adjusting the model's parameters through an optimization algorithm.*

3. **What is overfitting in supervised learning, and how can it be prevented?**
   *Answer: Overfitting occurs when a model fits the training data too closely, capturing noise or random fluctuations, and therefore performs poorly on unseen data. It can be prevented using techniques like regularization, cross-validation, or by using a simpler model.*

4. **Describe the purpose of dividing data into training and testing sets in supervised learning.**
   *Answer: The training set is used to train the model, while the testing set is used to evaluate its performance. This division makes it possible to assess how well the model generalizes to unseen data, which helps in detecting overfitting.*

5. **In a classification problem, what is the difference between a binary classification and a multiclass classification?**
   *Answer: In binary classification, there are only two classes or categories, so the model needs to distinguish between two possible outcomes. In multiclass classification, there are more than two classes, so the model needs to categorize inputs into one of several different classes.*

6. **What are the typical steps involved in preprocessing data for a supervised learning task?**
   *Answer: Preprocessing might include handling missing data, normalization or scaling of features, encoding categorical variables, feature extraction or selection, and splitting the data into training and testing sets.*

7. **Explain what gradient descent is and how it's used in supervised learning.**
   *Answer: Gradient descent is an optimization algorithm used to minimize the loss function by iteratively moving in the direction of steepest descent, that is, along the negative gradient. It adjusts the model's parameters in small steps, reducing the error over time.*

8. **Why is it important to shuffle the data before splitting into training and testing sets in supervised learning?**
   *Answer: Shuffling ensures that the training and testing sets are randomly sampled from the entire dataset. This prevents potential biases if the data is ordered or grouped in a particular way, ensuring a more accurate evaluation of the model's performance.*
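
The gradient descent update described in answer 7 can be sketched by hand on a one-parameter problem. The objective `(w - 4)^2` and the learning rate are assumptions picked so that the minimum (at `w = 4`) is easy to check.

```python
import torch

# Minimize f(w) = (w - 4)^2 with plain gradient descent
w = torch.tensor(0.0, requires_grad=True)
learning_rate = 0.1  # step size (illustrative value)

for step in range(100):
    loss = (w - 4) ** 2
    loss.backward()                   # d(loss)/dw = 2 * (w - 4)
    with torch.no_grad():
        w -= learning_rate * w.grad   # step against the gradient
    w.grad.zero_()                    # clear the gradient for the next step

print(w.item())
```

An optimizer such as `torch.optim.SGD` performs the same update for every parameter of a model, which is why training loops call `optimizer.step()` instead of updating weights manually.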



### Section 2: Building and Training a Model in PyTorch (25 mins)

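```python
# Normalize inputs using statistics computed from the training set only.
# Compute them before overwriting X_train so X_val gets the same scaling.
train_mean = X_train.mean()
train_std = X_train.std()
X_train = (X_train - train_mean) / train_std
X_val = (X_val - train_mean) / train_std
```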

```python
# Fully-connected network: 30 input features -> 64 hidden units -> 1 output
model = nn.Sequential(
    nn.Linear(30, 64),
    nn.ReLU(),
    nn.Linear(64, 1)
)
```

```python
# Mean squared error loss and the Adam optimizer
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```

```python
for epoch in range(100):
    # Forward pass
    y_pred = model(X_train)

    # Calculate loss (y_train should have the same shape as y_pred)
    loss = loss_fn(y_pred, y_train)

    # Backward pass
    optimizer.zero_grad()  # clear gradients left over from the previous step
    loss.backward()

    # Update weights
    optimizer.step()
```

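```python
model.eval()  # switch layers such as dropout to evaluation mode
with torch.no_grad():  # gradients are not needed for evaluation
    y_val_pred = model(X_val)
    val_loss = loss_fn(y_val_pred, y_val)

print('Validation loss:', val_loss.item())
```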

**Neural Network Training in PyTorch**

Assignment Overview:
In this hands-on assignment, students will demonstrate their ability to configure and train neural network models using PyTorch. Given prompts consisting of 2-3 sentences, students will write Python code snippets of 3-5 lines to accomplish key tasks like initializing networks, defining losses, calculating gradients, and updating weights.

This assignment will evaluate proficiency in:
- Configuring neural network layers with PyTorch modules
- Manipulating PyTorch tensors for operations like loss calculation
- Utilizing autograd for gradient calculation
- Training loop concepts like forward/backward passes and weight updates

Assignment Details:
The assignment will consist of 5 prompts requiring students to write PyTorch code snippets. Prompts will cover model initialization, loss functions, optimizers, training loops, and techniques like dropout or batch normalization. Model architectures will include fully-connected networks, CNNs, and RNNs.

Students will implement code in a provided Jupyter notebook and submit through the course learning management system. The assignment is open-book and students may reference PyTorch documentation. However, students should not copy full code examples or solutions. Snippets should demonstrate an understanding of PyTorch fundamentals.

Grading Rubric:
Each prompt will be graded on the following criteria:
- Valid PyTorch syntax and code structure (30%)  
- Properly implements required functionality (50%)
- Concise and efficient code (20%)

In addition, each student must submit a short report (1-2 paragraphs) reflecting on their PyTorch coding experience.

Assignment Objectives:
Through this hands-on PyTorch coding assignment, students will:
- Gain experience with PyTorch tensor operations, neural network layers, loss functions, and optimization
- Practice writing concise, robust code to implement model training concepts
- Reinforce knowledge of deep learning and PyTorch fundamentals
- Learn to rapidly prototype networks and training procedures





### Section 3: Tuning and Improving the Model (5 mins)

In this section, students will learn techniques for improving model performance beyond basic training.

**Hyperparameter Tuning**

Hyperparameters such as learning rate, batch size, and layer sizes strongly affect model accuracy. Students will learn to tune these parameters systematically rather than by trial and error.

Reference: [Hyperparameter Tuning in PyTorch](https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html)

**Assignment:** Tune the hyperparameters of a small model to improve validation accuracy by at least 3 percentage points. Submit the tuned hyperparameters and accuracy results.

**Answer:**

- Learning rate: 0.01
- Batch size: 64
- Layer sizes: [300, 100, 50]

Validation accuracy: 82% (improved from 79%)
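
One simple way to tune systematically is a small grid search. The sketch below is an assumption-laden illustration, not the assignment's reference solution: it uses synthetic regression data, a hypothetical helper `train_and_validate`, and compares validation loss rather than accuracy.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic regression data standing in for a real dataset
X = torch.randn(200, 5)
true_weights = torch.tensor([[1.0], [-2.0], [0.5], [0.0], [3.0]])
y = X @ true_weights + 0.1 * torch.randn(200, 1)
X_train, y_train, X_val, y_val = X[:160], y[:160], X[160:], y[160:]

def train_and_validate(lr: float, hidden: int, epochs: int = 100) -> float:
    """Train a small network and return its validation loss."""
    torch.manual_seed(0)  # reproducible initial weights for each configuration
    model = nn.Sequential(nn.Linear(5, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss_fn(model(X_train), y_train).backward()
        optimizer.step()
    with torch.no_grad():
        return loss_fn(model(X_val), y_val).item()

# Try every combination of learning rate and hidden-layer size
results = {
    (lr, hidden): train_and_validate(lr, hidden)
    for lr in (0.001, 0.01, 0.1)
    for hidden in (8, 32)
}
best_config = min(results, key=results.get)
print("best (lr, hidden):", best_config, "val loss:", results[best_config])
```

Selecting on the validation set, not the test set, keeps the final test score an honest estimate of generalization.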

**Advanced Optimization Algorithms**

Basic SGD applies a single fixed learning rate to every parameter, which can make it slow or sensitive to tuning on complex models. Students will explore algorithms like RMSprop, Adam, and L-BFGS and how to use them in PyTorch.

Reference: [Overview of PyTorch Optimization Algorithms](https://pytorch.org/docs/stable/optim.html)

**Assignment:** Implement a custom training loop using the Adam algorithm. Compare accuracy to default SGD.



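```python
# Assumes model, loss_fn, X_train, and y_train are defined as in Section 2
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(100):
    optimizer.zero_grad()            # clear gradients from the previous step

    y_pred = model(X_train)          # forward pass
    loss = loss_fn(y_pred, y_train)  # compute the loss

    loss.backward()                  # backward pass
    optimizer.step()                 # update the weights
```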


Adam validation accuracy: 85% (SGD was 82%)

**Regularization Techniques**

Overfitting can be mitigated using techniques like dropout and weight decay. Students will apply regularization to improve an overfitting model.

Reference: [Regularization in PyTorch](https://pytorch.org/docs/stable/regularization.html)

**Assignment:** Add dropout with p=0.5 after each hidden layer. Compare overfitting.

```python
model = nn.Sequential(
    nn.Linear(30, 100),
    nn.Dropout(0.5),  # zero each activation with probability 0.5 during training
    nn.ReLU(),

    nn.Linear(100, 50),
    nn.Dropout(0.5),
    nn.ReLU(),

    nn.Linear(50, 1)
)
```


Overfitting reduced: training accuracy 98% -> 95%, validation accuracy 82% -> 84%
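
Weight decay, the other technique named in this part, can be enabled through the optimizer's `weight_decay` argument. The sketch below also shows why `model.train()` and `model.eval()` matter once dropout is present. The layer sizes and the decay value are illustrative assumptions.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(30, 100), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(100, 1),
)

# weight_decay adds an L2 penalty on the parameters inside the optimizer update
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

x = torch.randn(4, 30)

model.train()          # dropout active: repeated calls generally differ
train_out = model(x)

model.eval()           # dropout disabled: output is deterministic
with torch.no_grad():
    eval_out_1 = model(x)
    eval_out_2 = model(x)

print(torch.equal(eval_out_1, eval_out_2))
```

Forgetting to call `model.eval()` before validation leaves dropout switched on, which adds noise to the reported validation metric.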

**Grading Rubric**

- Implementation of techniques: 60%
- Accuracy improvement: 30%
- Code quality: 10%

