# Deep Learning with PyTorch

## 1. Introduction

**Instructor**: Jasmin, Senior Data Science Content Developer at DataCamp  
This course introduces deep learning using the PyTorch framework.

---

## 2. Deep Learning Is Everywhere

Deep learning powers many modern innovations:

- Language translation
- Self-driving cars
- Medical diagnostics
- Chatbots

---

## 3. What Is Deep Learning?

Deep learning is a subset of machine learning. Its core structure is a network composed of:

- Inputs
- Hidden layers (one or many)
- Outputs

These networks are inspired by the human brain, using interconnected units called **neurons**. Hence, the term **neural networks**.

Deep learning models require:

- Large datasets (hundreds of thousands of examples or more)
- High computational resources
- Careful tuning and validation

---

## 4. PyTorch: A Deep Learning Framework

PyTorch is a widely used deep learning framework:

- Originally developed by Meta AI (Facebook AI Research)
- Now governed by the PyTorch Foundation, which is part of the Linux Foundation
- Designed to be intuitive and user-friendly
- Shares syntax and behavior with NumPy

---

## 5. PyTorch Tensors

The fundamental data structure in PyTorch is a **tensor**, which is similar to an array or matrix. Tensors support many mathematical operations and form the building blocks of neural networks.

Tensors can be created from:

- Python lists
- NumPy arrays

They are converted using the `torch.tensor()` function into a format compatible with deep learning.
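As a minimal sketch of both routes (the values are made up for illustration), a tensor can be built from a nested list with `torch.tensor()` or from a NumPy array with `torch.from_numpy()`:

```python
import numpy as np
import torch

# From a nested Python list (illustrative values)
from_list = torch.tensor([[1, 2, 3], [4, 5, 6]])

# From a NumPy array; torch.from_numpy shares memory with the array,
# whereas torch.tensor(array) would make a copy instead
array = np.array([[1.0, 2.0], [3.0, 4.0]])
from_array = torch.from_numpy(array)

print(from_list.shape, from_list.dtype)
print(from_array.shape, from_array.dtype)
```

The data type is inferred from the input: integer lists become 64-bit integer tensors, and a float64 NumPy array stays float64.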

---

## 6. Tensor Attributes

Important tensor attributes include:

- `tensor.shape`: Displays the shape of the tensor
- `tensor.dtype`: Displays the data type (e.g., 64-bit integer)

Checking these attributes ensures tensors align correctly with the model and task, and helps with debugging.

---

## 7. Tensor Operations

Tensors support various operations:

- **Addition and subtraction**: Work element by element, so the tensors need the same shape (or shapes that PyTorch can broadcast)
- **Element-wise multiplication**: Multiplies corresponding elements of tensors with the same (or broadcastable) shape
- **Matrix multiplication**: Combines rows of the first matrix with columns of the second matrix and sums the results
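The three operations can be compared side by side on two small tensors. This is only a sketch; the names `a` and `b` and their values are chosen for illustration:

```python
import torch

a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[5, 6], [7, 8]])

added = a + b        # element-wise addition
elementwise = a * b  # element-wise multiplication
matmul = a @ b       # matrix multiplication (same as torch.matmul(a, b))

print(added)
print(elementwise)
print(matmul)
```

Note that `*` and `@` give different results: the first multiplies matching positions, the second combines rows with columns.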

---

## 8. Behind the Scenes in Deep Learning

Deep learning models perform countless operations such as:

- Addition
- Multiplication
- Matrix transformations

These operations are used to process data and learn patterns.

---

## 9. Additional Concepts to Explore

To deepen your understanding of PyTorch and deep learning, explore:

- Autograd: Automatic differentiation for training
- `nn.Module`: Building neural network layers
- Optimizers: Algorithms like SGD, Adam
- Loss functions: Measures of prediction error that training tries to minimize
- DataLoaders: Efficient data batching and shuffling


In [1]:
import torch

# Define the temperatures tensor
temperatures = torch.tensor([[30, 32, 33], [29, 31, 34]])

# Define the adjustment tensor
adjustment = torch.tensor([[2, 2, 2], [2, 2, 2]])

# Display the shape and type of the adjustment tensor
print("Adjustment shape:", adjustment.shape)
print("Adjustment type:", adjustment.dtype)

# Display the shape and type of the temperatures tensor
print("Temperatures shape:", temperatures.shape)
print("Temperatures type:", temperatures.dtype)

# Apply the adjustment to the temperatures
adjusted_temperatures = temperatures + adjustment
print("Adjusted temperatures:", adjusted_temperatures)



Adjustment shape: torch.Size([2, 3])
Adjustment type: torch.int64
Temperatures shape: torch.Size([2, 3])
Temperatures type: torch.int64
Adjusted temperatures: tensor([[32, 34, 35],
        [31, 33, 36]])


## What the temperatures tensor represents:

- A 2D grid of temperature values (e.g., from sensors or locations).
- Each row could represent a day, and each column a time or region.
- It’s just numerical data structured like a matrix.

## What the adjustment tensor represents:

- A 2D grid of values to be added to the temperatures.
- Could represent calibration offsets, corrections, or environmental adjustments.
- Same shape as temperatures so they can be added element-wise.

## What the operation does:

- Adds each value in adjustment to the corresponding value in temperatures.
- This is element-wise addition, like basic matrix addition.
- Result is a new tensor with adjusted temperature values.

## Difference between tensor and matrix:

- Matrix: A 2D array of numbers (rows × columns).
- Tensor: A general term for multi-dimensional arrays (can be 0D, 1D, 2D, 3D, etc.).
  - A matrix is just a 2D tensor.
  - Tensors support more complex operations and higher dimensions (e.g., images, videos, batches of data).
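To make the dimension counts concrete, here is a small sketch that builds tensors of different ranks and prints `ndim`. The shapes are illustrative assumptions; the 4D shape imitates a batch of four 3-channel 28×28 images:

```python
import torch

scalar = torch.tensor(3.5)              # 0D: a single number
vector = torch.tensor([1.0, 2.0, 3.0])  # 1D: a list of numbers
matrix = torch.zeros(2, 3)              # 2D: rows x columns
images = torch.zeros(4, 3, 28, 28)      # 4D: batch x channels x height x width

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("images", images)]:
    print(name, t.ndim, tuple(t.shape))
```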

## Why use tensors in deep learning:

- Tensors are optimized for GPU computation.
- They support automatic differentiation (for training models).
- They can represent complex data structures beyond simple matrices.



# Neural Networks and Layers

## 1. Building Our First Neural Network
Let's build our first neural network using PyTorch Tensors.

---

## 2. Neural Network Structure

A neural network consists of:

- **Input layer**: Contains the dataset features  
- **Hidden layers**: Lie between input and output  
- **Output layer**: Contains the predictions

---

## 3. Our First Neural Network

We begin with a network that has **no hidden layers**, where the output layer is a **linear layer**.  
In this setup:

- Every input neuron connects to every output neuron  
- This is called a **fully connected network**  
- It behaves like a **linear model**, useful for understanding fundamentals before adding complexity

---

## 4. Designing a Neural Network

We use the `torch.nn` module (imported as `nn`) to build networks.  
This module makes code more concise and flexible.

### Key Design Principles:

- **Input layer size** = number of features in the dataset  
- **Output layer size** = number of classes to predict  

Example:  
An `input_tensor` of shape `1 × 3` represents one row with three features or neurons.

---

## 5. Linear Layer and Prediction

We pass the `input_tensor` to a **linear layer**, which applies a linear function to generate predictions.

- `in_features`: Number of input features (e.g., 3)  
- `out_features`: Desired size of output (e.g., 2)

Correctly specifying `in_features` ensures the layer can process the input.

---

## 6. Output Interpretation

After passing the input through the linear layer:

- The output has two features or neurons  
- This matches the `out_features` specified

---

## 7. Weights and Biases

When the input is passed to the linear layer:

- A **linear operation** is performed using weights and biases  
- Each neuron has:
  - **Weights**: Indicate the importance of each input feature  
  - **Bias**: A baseline value independent of the weights

Initially, weights and biases are **randomly assigned** and later **tuned during training**.
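The linear operation can be reproduced by hand to see where the weights and bias enter. This is a sketch: the input values are arbitrary and the seed only makes the random initial parameters repeatable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # only so the random initial values are repeatable

input_tensor = torch.tensor([[0.3471, 0.4547, -0.2356]])
linear_layer = nn.Linear(in_features=3, out_features=2)

print(linear_layer.weight.shape)  # one row of 3 weights per output neuron
print(linear_layer.bias.shape)    # one bias per output neuron

# Reproduce the layer's computation by hand: y = x @ W^T + b
manual_output = input_tensor @ linear_layer.weight.T + linear_layer.bias
layer_output = linear_layer(input_tensor)
print(torch.allclose(manual_output, layer_output))
```

The weight matrix has shape `(out_features, in_features)`, which is why it is transposed in the manual computation.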

---

## 8. Fully Connected Network in Action

Imagine a weather dataset with three features:

- Temperature  
- Humidity  
- Wind

We want to predict:

- Rain  
- Cloudy conditions

### Feature Importance:

- **Humidity** has a higher weight due to its strong predictive power  
- A **bias** is added to reflect the tropical region’s baseline likelihood of rain

With this setup, the model makes a prediction.



_If a neural network has no hidden layers and only a linear output layer, it behaves exactly like a linear model._

In [6]:
import torch
import torch.nn as nn

input_tensor = torch.tensor([[0.3471, 0.4547, -0.2356]])
linear_layer = nn.Linear(in_features=3, out_features=2)

# Weights and biases start out random, so the printed values differ between runs
output = linear_layer(input_tensor)
print(output)

tensor([[-0.5433,  0.0326]], grad_fn=<AddmmBackward0>)


# Hidden Layers and Parameters in Neural Networks

## 1. Expanding Beyond a Single Layer
So far, we've used one input layer and one linear layer. Now, we'll add more layers to help the network learn complex patterns.

---

## 2. Stacking Layers with `nn.Sequential()`

We can stack multiple linear layers using `nn.Sequential()`, a PyTorch container that applies layers in sequence:

- Input is passed through each layer one after another
- The intermediate layers are considered **hidden layers**
- This structure allows the network to learn hierarchical representations

---

## 3. Defining Input and Output Dimensions

- The **first layer's input** corresponds to the number of features in the dataset (`n_features`)
- The **final layer's output** corresponds to the number of prediction classes (`n_classes`)

---

## 4. Adding Hidden Layers

We can add:

- As many hidden layers as needed  
- As long as each layer’s input dimension matches the previous layer’s output dimension

### Example:
- First layer: input = 10, output = 18  
- Second layer: input = 18, output = 20  
- Third layer: input = 20, output = 5
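The 10 → 18 → 20 → 5 example might look like this in code. The random input row is a stand-in for one sample with 10 features:

```python
import torch
import torch.nn as nn

# Each layer's input size equals the previous layer's output size
model = nn.Sequential(
    nn.Linear(10, 18),
    nn.Linear(18, 20),
    nn.Linear(20, 5),
)

sample = torch.randn(1, 10)  # one row with 10 features (random placeholder)
output = model(sample)
print(output.shape)
```

If any pair of adjacent sizes did not match, the forward pass would fail with a shape error at that layer.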

---

## 5. Layers Are Made of Neurons

- A layer is **fully connected** when each neuron links to all neurons in the previous layer
- Each neuron performs a **linear operation** using all inputs from the previous layer

### Learnable Parameters:
- Each neuron has \( N + 1 \) parameters:
  - \( N \): weights (one per input)
  - \( +1 \): bias term

---

## 6. Parameters and Model Capacity

- More hidden layers → more parameters → higher **model capacity**
- High-capacity models can learn complex patterns but:
  - May take longer to train
  - Risk overfitting if not regularized

### Example Breakdown:
- **First layer**: 4 neurons, each with 8 weights + 1 bias → \( 4 \times 9 = 36 \) parameters  
- **Second layer**: 2 neurons, each with 4 weights + 1 bias → \( 2 \times 5 = 10 \) parameters  
- **Total**: 46 learnable parameters
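The breakdown can be checked per parameter tensor. This sketch assumes the two layers described are `nn.Linear(8, 4)` followed by `nn.Linear(4, 2)`:

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 2))

# List each parameter tensor with its shape and element count
for name, p in model.named_parameters():
    print(name, tuple(p.shape), p.numel())

total = sum(p.numel() for p in model.parameters())
print("Total:", total)
```

The first layer contributes 32 weights and 4 biases, the second 8 weights and 2 biases.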

---

## 7. Calculating Parameters in PyTorch

Use `.numel()` to count elements in each parameter tensor:

```python
total_params = sum(p.numel() for p in model.parameters())
```


 ![image.png](attachment:f820b4e6-ccf6-478b-8fab-d630374f85b7.png)


In [8]:
import torch
import torch.nn as nn

input_tensor = torch.Tensor([[2, 3, 6, 7, 9, 3, 2, 1]])

# Create a container for stacking linear layers
model = nn.Sequential(nn.Linear(8, 4),
                nn.Linear(4, 1)
                )

output = model(input_tensor)
print(output)

tensor([[0.6709]], grad_fn=<AddmmBackward0>)


In [9]:
model = nn.Sequential(nn.Linear(92, 4),
                      nn.Linear(4, 69),
                      nn.Linear(69, 1))  # input size must match the previous layer's 69 outputs

# Count total parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"Total learnable parameters: {total_params}")

Total learnable parameters: 787


In [11]:
# Alternative: count the parameters with an explicit loop

import torch.nn as nn
model = nn.Sequential(nn.Linear(92, 4),
                      nn.Linear(4, 69),
                      nn.Linear(69, 1))  # input size must match the previous layer's 69 outputs

total = 0

# Calculate the number of parameters in the model
for p in model.parameters():
    total += p.numel()

print(f"The number of parameters in the model is {total}")

The number of parameters in the model is 787


# Activation Functions in Neural Networks

## 1. Why Activation Functions Matter

Neural networks made only of linear layers can learn only straight-line relationships.  
To model complex patterns and interactions between inputs and outputs, we introduce **non-linearity** using activation functions.

---

## 2. What Is a Pre-Activation Output?

The output of the final linear layer before applying any activation is called the **pre-activation output**.  
This raw value is not yet interpretable. We pass it through an activation function to transform it into a meaningful output (e.g., probability or class score).

---

## 3. Sigmoid Activation Function (Binary Classification)

- **Use case**: Binary classification (e.g., mammal vs not mammal)
- **Input features**: number of limbs, lays eggs (1/0), has hair (1/0)
- **Model output**: raw score (e.g., 6) → not interpretable
- **Sigmoid transforms** this score into a value between 0 and 1:



![image.png](attachment:af1141c7-2b83-4ba0-9a2a-09f0996c3a22.png)


![image.png](attachment:a6a20905-f68e-4749-b251-8af0954fed5f.png)


- **Interpretation**:
  - Output > 0.5 → class 1 (mammal)
  - Output < 0.5 → class 0 (not mammal)

This transformation allows us to treat the output as a **probability**.
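The sigmoid function is \( \sigma(x) = 1 / (1 + e^{-x}) \). A quick sketch applies it to the raw score of 6 from the example and thresholds the result at 0.5:

```python
import torch

score = torch.tensor([[6.0]])       # raw (pre-activation) score from the example
probability = torch.sigmoid(score)  # squashes the score into (0, 1)

# Threshold at 0.5 to get a class label: 1 = mammal, 0 = not mammal
predicted_class = int(probability.item() > 0.5)
print(probability, predicted_class)
```

A large positive score maps close to 1, a large negative score close to 0, and a score of 0 maps to exactly 0.5.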

---

## 4. Sigmoid in Neural Network Architecture

- Typically added as the **final layer** in `nn.Sequential()`
- Automatically transforms the output of the last linear layer
- A network with only linear layers + sigmoid behaves like **logistic regression**
- Adding hidden layers and nonlinear activations unlocks the full power of deep learning

---

![image.png](attachment:d0914759-cb05-4cf4-b8ac-726648742af9.png)



## 5. Softmax Activation Function (Multi-Class Classification)

- **Use case**: Multi-class classification (e.g., bird, mammal, reptile)
- **Class labels**:
  - bird -> 0
  - mammal -> 1
  - reptile -> 2

- **Softmax transforms** a vector of raw scores into a **probability distribution**:


![image.png](attachment:37b4aecb-0367-4aed-8ac5-46e1fc516386.png)

- Each output:
  - Is between 0 and 1
  - All outputs **sum to 1**
  - Highest value → predicted class

### Example:
- Pre-activation output: `[1.2, 3.5, 0.8]`
- Softmax output: approximately `[0.09, 0.86, 0.06]`
- Prediction: class 1 (mammal) with a probability of about 0.86

---

## 6. Softmax in Neural Network Architecture

- Added as the **final layer** in `nn.Sequential()` for multi-class tasks
- `dim=-1` ensures softmax is applied across the last dimension of the tensor
- Like sigmoid, it transforms raw scores into interpretable probabilities
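The effect of `dim` is easiest to see by comparing two choices on the same tensor. In this sketch the scores are invented, with rows treated as samples and columns as classes:

```python
import torch
import torch.nn as nn

scores = torch.tensor([[1.0, 2.0, 3.0],
                       [1.0, 1.0, 1.0]])  # 2 samples x 3 classes

per_sample = nn.Softmax(dim=-1)(scores)  # normalizes across classes within each row
per_column = nn.Softmax(dim=0)(scores)   # normalizes across samples within each column

print(per_sample.sum(dim=-1))  # each row sums to 1
print(per_column.sum(dim=0))   # each column sums to 1
```

For classification, `dim=-1` is usually the intended choice, since each sample should get its own probability distribution over classes.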

---
![image.png](attachment:2bbb46bf-dfac-4cfd-a309-245c328710cb.png)

![image.png](attachment:8883c5dd-6169-4a74-ab27-9ffa0d25564e.png)



## Reinforcement Insights

- **Activation functions** are essential for learning complex, nonlinear patterns.
- **Sigmoid** is ideal for binary classification: its output is a single probability.
- **Softmax** is ideal for multi-class classification: its output is a probability vector.
- Without activation functions, neural networks are just stacks of linear equations.
- Activation functions are what make deep learning **flexible, expressive, and powerful**.

---

## Summary Table

| Activation Function | Use Case               | Output Range | Behavior                                |
|---------------------|------------------------|--------------|-----------------------------------------|
| Sigmoid             | Binary classification  | 0 to 1       | Converts raw score to probability       |
| Softmax             | Multi-class classification | 0 to 1 (sums to 1) | Converts scores to probability distribution |



In [15]:
input_tensor = torch.tensor([[2.4]])

# Create a sigmoid function and apply it on input_tensor
sigmoid = nn.Sigmoid()
probability = sigmoid(input_tensor)
print(probability)

tensor([[0.9168]])


In [23]:
input_tensor = torch.tensor([[1.0, -6.0, 2.5, -0.3, 1.2, 0.8]])

# Create a softmax function and apply it on input_tensor
softmax = nn.Softmax(dim=-1)  # dim=-1: normalize across the last dimension (the class scores)
probabilities = softmax(input_tensor)
print(probabilities)

tensor([[1.2828e-01, 1.1698e-04, 5.7492e-01, 3.4961e-02, 1.5669e-01, 1.0503e-01]])


# Forward Pass in Neural Networks

## 1. What Is a Forward Pass?

A **forward pass** is the process of sending input data through a neural network to generate predictions.  
It flows **forward** from the input layer, through the hidden layers, to the output layer.

At each layer:
- The data is transformed using weights, biases, and activation functions.
- These transformations create new representations of the input.
- The final output is the model’s prediction.

This process is used during both:
- **Training**: to compute predictions and compare them with actual labels.
- **Inference**: to make predictions on new, unseen data.

---

## 2. Types of Outputs from Forward Pass

Depending on the task, the final output can be:
- **Binary classification**: e.g., mammal vs not mammal
- **Multi-class classification**: e.g., mammal, bird, reptile
- **Regression**: predicting continuous values like weight or price

---

## 3. Binary Classification Example

- **Input**: Data for 5 animals, each with 6 features (e.g., limbs, lays eggs, has hair)

![image.png](attachment:b4f7b47a-7a0b-4bba-889f-599df8b30ea5.png)
![image.png](attachment:d63c0ecf-189b-4a86-9a08-da201a4187e9.png)

- **Model**: Two linear layers + sigmoid activation
  - First layer: 6 inputs -> 4 outputs
  - Second layer: 4 inputs -> 1 output (probability)

### Output:
- A single probability between 0 and 1 for each animal
- Use a threshold (commonly 0.5) to convert probability into class label:
  - Probability > 0.5 -> class 1 (mammal)
  - Probability < 0.5 -> class 0 (not mammal)

**Note**: These predictions are based on current weights and biases.  
To improve accuracy, we use **backpropagation** to update these parameters.
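A minimal sketch of this binary forward pass, using random numbers as a stand-in for the animal features (the real data is not reproduced here):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)        # repeatable placeholder data and weights
animals = torch.rand(5, 6)  # 5 animals x 6 features (random stand-in values)

model = nn.Sequential(
    nn.Linear(6, 4),
    nn.Linear(4, 1),
    nn.Sigmoid(),
)

probabilities = model(animals)        # shape (5, 1), values in (0, 1)
labels = (probabilities > 0.5).int()  # threshold at 0.5 -> class 0 or 1
print(probabilities.shape, labels.flatten())
```

Because the weights are untrained, the labels here are arbitrary; only the shapes and value ranges are meaningful.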

---

## 4. Multi-Class Classification Example

- **Goal**: Predict one of three classes: mammal, bird, or reptile
  ![image.png](attachment:2127e2d1-9340-4842-a158-edad7f7cad1e.png)

  ![image.png](attachment:51b2528e-a2f8-4ffd-a906-7f0f74b1e5a7.png)
- **Model**:
  - Last linear layer outputs 3 values (one per class)
  - Use **softmax** activation to convert raw scores into probabilities
  - `dim = -1` ensures softmax is applied across the last dimension

### Output:
- Shape: 5 × 3 (5 animals × 3 class probabilities)
- Each row sums to 1
- Predicted label = class with highest probability

### Example:
- Row 1: `[0.07, 0.84, 0.09]` -> predicted class = mammal
- Row 2: `[0.10, 0.85, 0.05]` -> predicted class = mammal
- Row 3: `[0.12, 0.08, 0.80]` -> predicted class = reptile
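The multi-class case can be sketched the same way. The input is again random placeholder data, and the hidden size of 4 is an assumption carried over from the binary example:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
animals = torch.rand(5, 6)  # 5 animals x 6 features (random stand-in values)

model = nn.Sequential(
    nn.Linear(6, 4),
    nn.Linear(4, 3),     # one raw score per class
    nn.Softmax(dim=-1),  # turn each row of scores into probabilities
)

probabilities = model(animals)                   # shape (5, 3)
predicted_labels = probabilities.argmax(dim=-1)  # index of the largest probability per row
print(probabilities.sum(dim=-1), predicted_labels)
```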

---

## 5. Regression Example

- **Goal**: Predict continuous values (e.g., animal weight)
- **Model**:
  - Same input data (5 animals × 6 features)
  - Last linear layer outputs 1 value per row
  - **No activation function** at the end

### Output:
- Shape: 5 × 1
- Each value is a continuous prediction (e.g., weight in kg)
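For regression, the sketch simply drops the final activation so the output is unbounded. The input and hidden size are placeholders, as before:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
animals = torch.rand(5, 6)  # 5 animals x 6 features (random stand-in values)

model = nn.Sequential(
    nn.Linear(6, 4),
    nn.Linear(4, 1),  # no activation: the raw value is the prediction
)

predictions = model(animals)
print(predictions.shape)
```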

---
![image.png](attachment:2726239a-1f82-4c70-bae9-dc7316ea5e54.png)

## Reinforcement Insights

- A forward pass is the **core prediction mechanism** in neural networks.
- It uses the model’s current parameters (weights and biases) to generate outputs.
- Different tasks (binary, multi-class, regression) require different output shapes and activation functions.
- The forward pass is only half the story: learning happens when we **compare predictions to true labels** and **update parameters** using backpropagation.

---

## Summary Table

| Task Type              | Output Shape | Final Activation | Output Meaning                     |
|------------------------|--------------|------------------|------------------------------------|
| Binary Classification  | (batch_size, 1) | Sigmoid          | Probability of class 1             |
| Multi-Class Classification | (batch_size, num_classes) | Softmax          | Probability distribution across classes |
| Regression             | (batch_size, 1) | None             | Continuous value prediction        |


In [24]:
import torch
import torch.nn as nn

input_tensor = torch.Tensor([[3, 4, 6, 2, 3, 6, 8, 9]])

model = nn.Sequential(
  nn.Linear(8,1),
  nn.Sigmoid()
)

output = model(input_tensor)
print(output)

tensor([[0.2081]], grad_fn=<SigmoidBackward0>)


In [25]:
import torch
import torch.nn as nn

input_tensor = torch.Tensor([[3, 4, 6, 7, 10, 12, 2, 3, 6, 8, 9]])

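# Implement a neural network with exactly four linear layers
# Note: softmax over a single output is always 1.0, which is why the result below is 1;
# for multi-class predictions, the last layer needs one output per class
model = nn.Sequential(
  nn.Linear(11, 64),
  nn.Linear(64, 32),
  nn.Linear(32, 16),
  nn.Linear(16, 1),
  nn.Softmax(dim=-1)
)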

output = model(input_tensor)
print(output)

tensor([[1.]], grad_fn=<SoftmaxBackward0>)


# Loss Functions in Neural Networks

## 1. Why We Use a Loss Function

After making predictions with a forward pass, we need to check how close those predictions are to the actual answers.  
That’s what a loss function does. It compares the predicted value (y_hat) with the true label (y) and gives us a number.  
This number is called the loss. Lower loss means better predictions. Our goal is to reduce this loss during training.

![image.png](attachment:76b33ddb-bcd1-4d39-b5ea-ca6ade9dbd3c.png)

---

## 2. Example: Multi-Class Classification

Let’s say we’re predicting whether an animal is a mammal (0), bird (1), or reptile (2).  
If the true label is 0 and the model also predicts 0, the loss is low.  
If the model predicts 1 or 2, the loss is high.  
The loss function helps us measure how wrong the prediction is.

---

## 3. What the Loss Function Takes In

The loss function takes two things:
- y: the true label (like 0, 1, or 2)
- y_hat: the model’s raw prediction before softmax

If there are 3 classes, y_hat is a tensor with 3 values.  
Softmax will turn these into probabilities, but the loss function works with the raw scores.

---

## 4. One-Hot Encoding

To compare y with y_hat properly, we use one-hot encoding.  
This turns the label into a vector of zeros and ones.

Examples:
- y = 0 -> [1, 0, 0]
- y = 1 -> [0, 1, 0]
- y = 2 -> [0, 0, 1]

This helps match the shape of the prediction and makes comparison easier.


![image.png](attachment:56902421-4f2e-458f-85f3-8d842cf8de17.png)

---

## 5. Automating One-Hot Encoding

Instead of writing one-hot vectors manually, we use the `one_hot()` function from `torch.nn.functional`, which is conventionally imported as `F`.  
It automatically converts the label into the right format.

- y = 0 -> one at first position
- y = 1 -> one at second position
- y = 2 -> one at third position

---

![image.png](attachment:329f44e2-8a72-471e-820c-bd1ce50965ac.png)


![image.png](attachment:1b8f0ed1-f42e-4e69-b908-6347dd89ff57.png)

## 6. Cross-Entropy Loss

Once we have the encoded label and the prediction scores, we pass them to a loss function.  
The most common one for classification is cross-entropy loss.  
It compares the predicted scores with the true label and gives a single float value.  
This value tells us how far off the prediction was.
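To show what the single value means, this sketch computes the loss three ways on invented scores: from a class index, from a one-hot target, and by hand as the negative log of the softmax probability of the true class. Passing a one-hot (probability) target assumes a PyTorch version that supports it (1.10 or later).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

scores = torch.tensor([[2.0, 0.5, -1.0]])  # hypothetical raw scores for 3 classes
label = torch.tensor([0])                  # true class index

criterion = nn.CrossEntropyLoss()

# Target given as a class index
loss_from_index = criterion(scores, label)

# Target given as a one-hot vector of floats
one_hot = F.one_hot(label, num_classes=3).float()
loss_from_one_hot = criterion(scores, one_hot)

# By hand: negative log of the softmax probability of the true class
manual_loss = -torch.log_softmax(scores, dim=-1)[0, label.item()]

print(loss_from_index, loss_from_one_hot, manual_loss)
```

All three agree, and the loss grows as the score of the true class falls behind the others.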

---

## 7. Summary

- The loss function compares predictions with actual labels
- It gives a number that shows how wrong the prediction is
- Lower loss means better performance
- We use one-hot encoding to match label format
- Cross-entropy is the go-to loss function for classification
- The goal is to minimize this loss during training



In [31]:
import numpy as np
import torch  
import torch.nn.functional as F  
y = 1
num_classes = 3

# Create the one-hot encoded vector using NumPy
one_hot_numpy = np.array([0, 1, 0])

# Create the one-hot encoded vector using PyTorch
one_hot_pytorch = F.one_hot(torch.tensor(y), num_classes=num_classes)

print("One-hot vector using NumPy:", one_hot_numpy)
print("One-hot vector using PyTorch:", one_hot_pytorch)

One-hot vector using NumPy: [0 1 0]
One-hot vector using PyTorch: tensor([0, 1, 0])


In [35]:
import torch
import torch.nn.functional as F
from torch.nn import CrossEntropyLoss

y = [2]
scores = torch.tensor([[0.1, 6.0, -2.0, 3.2]])

# Create a one-hot encoded vector of the label y
one_hot_label = F.one_hot(torch.tensor(y), num_classes=4)

# Create the cross entropy loss function
criterion = CrossEntropyLoss()

# Calculate the cross entropy loss
loss = criterion(scores.double(), one_hot_label.double())
print(loss)

tensor(8.0619, dtype=torch.float64)
