# Introduction to Deep Learning - Part 1  
### Understanding Neural Networks from Scratch  

---

Welcome to this beginner-friendly guide to deep learning! 🚀  
In this notebook, we’ll explore **neural networks**, how they work, and how we can use them for function approximation.  

---

🔹 **What you'll learn:**  
✔️ Basics of Neural Networks  
✔️ How Neurons Work  
✔️ Activation Functions  
✔️ Training with Backpropagation  

Let's dive in! 🧠💡  


## What is a Function? 🤔  

Before we dive into neural networks, let's first understand **functions**.  

A **function** is a system that takes an input and produces an output.  
Mathematically, we can write it as:  

\[
y = f(x)
\]

where:  
- \( x \) is the input,  
- \( f \) is the function,  
- \( y \) is the output.  

For example, if we have:

\[
y = 2x + 3
\]

and we input \( x = 4 \), we get:

\[
y = 2(4) + 3 = 11
\]
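The same example can be written as a tiny Python function. This is only an illustrative sketch; the name `f` simply mirrors the formula above.

```python
def f(x):
    """The example function y = 2x + 3."""
    return 2 * x + 3

y = f(4)
print(y)  # 11
```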




## How Do Functions Relate to Neural Networks? 🤔  

Let's start with a question:  

**Suppose we have some inputs and their corresponding outputs. Can we figure out the function that produced them?**  

At first, it might seem impossible. If we don’t know the function, how can we find it? But there is good news!



## The Answer
The answer is **yes!** We can approximate the function that maps inputs to outputs using a technique called **function approximation**.  

This is exactly what a **neural network** does! Instead of manually writing a function, we use a system that learns the relationship between inputs and outputs automatically.  

### Example: Understanding Function Approximation  

Imagine we have the following input-output pairs:  

| Input (x) | Output (y) |
|-----------|-----------|
| 1         | 3         |
| 2         | 5         |
| 3         | 7         |

Can you see a pattern? The function behind this data is:  

\[
y = 2x + 1
\]


But what if we had **thousands** of inputs and outputs, and the function was much more complex? Finding the function manually would be very difficult.  

This is where **neural networks** come in! They help us learn the function that best fits the data **without us needing to define it explicitly**.  
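To make the idea concrete, here is a minimal sketch that recovers the line behind the table above using ordinary least squares, one of the simplest forms of function approximation. It is not a neural network, and the helper name `fit_line` is made up for this illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Input-output pairs from the table
xs = [1, 2, 3]
ys = [3, 5, 7]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

The recovered slope and intercept match the pattern in the table. A neural network follows the same spirit, but it can also fit relationships that are not straight lines.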



## What is a Neuron? 🧠  

Now that we understand the idea of functions and function approximation, let’s break down the **basic building block of a neural network**—the **neuron**.  

A **neuron** is just a function! It takes some inputs, applies some calculations, and produces an output.  

### How Does a Neuron Work?  

Each neuron takes one or more inputs and combines them using two kinds of learnable parameters:  

1. **Weights (\( w \))**: Each input is multiplied by a weight, which determines how important that input is.  
2. **Bias (\( b \))**: A bias is added to shift the function up or down, helping the neuron learn more flexible patterns.  

Mathematically, a neuron works like this:  

\[
\text{output} = (w_1 \cdot x_1) + (w_2 \cdot x_2) + ... + (w_n \cdot x_n) + b
\]

Then, we apply a **special function** (called an **activation function**) to make the output more useful. We’ll talk about activation functions soon!  
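The weighted-sum formula translates almost directly into code. The sketch below uses plain Python lists; the function name `neuron` and the example numbers are chosen only for illustration.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias (no activation function yet)."""
    total = 0.0
    for x, w in zip(inputs, weights):
        total += w * x  # each input is scaled by its weight
    return total + bias  # the bias shifts the result up or down

output = neuron(inputs=[2.0, 3.0, 4.0], weights=[0.1, 0.2, 0.3], bias=0.5)
print(round(output, 4))  # 2.5
```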

---

## What is a Neural Network? 🤖  

A **neural network** is just a **combination of many neurons working together**!  

Instead of a single neuron, we **stack multiple neurons into layers** to create a powerful system that can learn complex functions.  

### Structure of a Neural Network  

A neural network is made up of **three types of layers**:  

1. **Input Layer** 🎯 – Takes in the raw data (e.g., pixels from an image, words from text).  
2. **Hidden Layers** 🔄 – The "thinking" part of the network, where neurons process information and learn patterns.  
3. **Output Layer** 🎯 – Produces the final result (e.g., predicting a number, classifying an image).  

💡 **Think of a neural network like a team of neurons working together!** Each neuron contributes a small part, and together they can approximate almost any function.  
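As a rough sketch of this structure, the code below wires two hidden neurons and one output neuron together by hand. All weights are made-up numbers rather than learned values, the helper names `neuron` and `layer` are hypothetical, and activation functions are left out to keep the focus on the layers.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def layer(inputs, weight_rows, biases):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Input layer: two raw features
x = [1.0, 2.0]

# Hidden layer: two neurons, each with two weights and one bias
hidden = layer(x, weight_rows=[[1.0, 0.0], [0.0, 1.0]], biases=[1.0, -1.0])

# Output layer: one neuron that reads both hidden values
output = layer(hidden, weight_rows=[[2.0, 3.0]], biases=[0.0])

print(hidden)  # [2.0, 1.0]
print(output)  # [7.0]
```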

---

### Why Do We Need Weights and Biases?  

- **Weights** control the importance of each input, allowing the network to adjust and learn.  
- **Bias** helps shift the function, allowing the network to learn patterns that wouldn’t be possible with just weights alone.  

Together, weights and biases **help the network learn from data and improve its accuracy**!  

In the next section, we’ll discuss **why we need activation functions** and how they help neurons make better decisions. 🚀  


## The Problem: Neurons Can Only Approximate Linear Functions 😟  

Now that we know what a neuron is, let's talk about a big limitation.  

Without an activation function, a **single neuron** is just a **linear function** of its inputs, because it only computes this:  

\[
\text{output} = (w_1 \cdot x_1) + (w_2 \cdot x_2) + ... + (w_n \cdot x_n) + b
\]

This is just like drawing a straight line on a graph! 📈  

But in deep learning, we want to represent **more complex patterns**, like recognizing objects in images or understanding speech. These problems need **non-linear** functions, not just straight lines.  


## The Solution: Activation Functions 🚀  

To solve this, we use something called an **activation function**.  

An activation function **adds non-linearity** to our neurons, so they can learn complex relationships instead of just straight lines. With enough neurons, a network with non-linear activations can approximate a very wide range of functions (in theory, any continuous function on a bounded input range) as closely as we like!  

### Most Common Activation Functions  

1. **ReLU (Rectified Linear Unit)** 🔥  
   - The most commonly used activation function.  
   - Formula:  
     \[
     f(x) =
     \begin{cases} 
     x, & x > 0 \\
     0, & x \leq 0
     \end{cases}
     \]
   - Simply returns the input if it's positive, otherwise outputs zero.  
   - Cheap to compute, and its gradient does not shrink for positive inputs, which often helps deep networks train faster.  

2. **Sigmoid** 📉  
   - Formula:  
     \[
     f(x) = \frac{1}{1 + e^{-x}}
     \]
   - Squashes values between **0 and 1**, making it useful for probabilities.  
   - Used in older networks but can cause problems like vanishing gradients.  

3. **Tanh (Hyperbolic Tangent)** 📈  
   - Formula:  
     \[
     f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
     \]
   - Similar to sigmoid but outputs values between **-1 and 1**, so its outputs are centered around zero.  
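The three formulas above can be written out with Python's standard `math` module. This is a minimal sketch for single numbers; in practice you would use the versions built into a library such as PyTorch.

```python
import math

def relu(x):
    """Return x if it is positive, otherwise 0."""
    return x if x > 0 else 0.0

def sigmoid(x):
    """Squash x into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squash x into the range (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

# Compare the three functions on a negative, zero, and positive input
for value in [-2.0, 0.0, 2.0]:
    print(value, relu(value), round(sigmoid(value), 4), round(tanh(value), 4))
```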

---

## Why Are Activation Functions So Important?  

Without activation functions, **no matter how deep our network is, it will still behave like a simple linear function!** That means it wouldn't be able to learn complex relationships.  

But by adding activation functions, we allow our network to model **non-linear patterns**, making deep learning so powerful.  
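A small numeric check makes this concrete. In the sketch below (one-input neurons with made-up weights), stacking two neurons without an activation function gives exactly the same result as a single neuron with a combined weight and bias.

```python
# Two "layers" with one neuron each, and no activation function
w1, b1 = 3.0, 1.0
w2, b2 = 2.0, -4.0

def two_layers(x):
    hidden = w1 * x + b1
    return w2 * hidden + b2

# The same mapping written as ONE linear neuron:
# w2 * (w1 * x + b1) + b2 = (w2 * w1) * x + (w2 * b1 + b2)
w_combined = w2 * w1
b_combined = w2 * b1 + b2

def one_layer(x):
    return w_combined * x + b_combined

for x in [-1.0, 0.0, 2.5]:
    print(x, two_layers(x), one_layer(x))  # the last two columns always match
```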

### Key Takeaway:  
🔑 **Activation functions give neurons the ability to learn beyond straight lines, allowing deep networks to approximate very complex functions!**  

Next, we’ll discuss **how neural networks learn** using backpropagation! ⚡  

## How Neural Networks Learn 🤖  

Now that we understand neurons and activation functions, let’s talk about how a neural network **learns**.  

Neural networks learn in **three main steps**:  

1. **Feedforward** 🚀 (Making a prediction)  
2. **Loss Calculation** ⚖️ (Measuring the error)  
3. **Backpropagation & Optimization** 🔄 (Fixing the errors)  

---

## 1️⃣ Feedforward – Making Predictions  

Imagine we have an input (like an image of a cat 🐱), and we want our neural network to recognize it.  

### What happens?  
1. The input is passed through the network **layer by layer**.  
2. Each neuron processes the input using **weights, biases, and activation functions**.  
3. The network finally **produces an output** (e.g., "90% chance it's a cat").  

This step is called **feedforward** because data moves **forward** through the network to make a prediction.  

💡 **But what if the prediction is wrong?** This is where the next step comes in!  

---

## 2️⃣ Loss Function – Measuring Error  

A neural network **doesn’t start out perfect**—it makes mistakes!  
To improve, we need to measure **how wrong** its predictions are.  

A **loss function** is a mathematical formula that tells us **how far off** the network's prediction is from the correct answer.  

### Example Loss Functions:  
- **Mean Squared Error (MSE)** – For regression problems.  
- **Cross-Entropy Loss** – For classification problems (e.g., cat vs. dog).  

The smaller the loss, the better the network’s predictions!  
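Here is a minimal sketch of both loss functions in plain Python. The cross-entropy version is the two-class (binary) form, and the function names are chosen for this illustration only.

```python
import math

def mean_squared_error(predictions, targets):
    """Average of the squared differences (regression)."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def binary_cross_entropy(probabilities, labels):
    """Average negative log-likelihood for 0/1 labels.

    Each probability must lie strictly between 0 and 1.
    """
    total = 0.0
    for p, y in zip(probabilities, labels):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))  # 2.0
print(binary_cross_entropy([0.9], [1]))  # small loss: confident and correct
print(binary_cross_entropy([0.1], [1]))  # large loss: confident but wrong
```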

---

## 3️⃣ Backpropagation & Optimization – Fixing Mistakes  

Now that we know the error, we need to **adjust the weights and biases** to improve the predictions.  

### How does backpropagation work?  
1. **The error is sent backward through the network** (backpropagation).  
2. Each neuron **calculates its contribution to the total error**.  
3. The weights and biases are **updated** to reduce the error.  

Step 3 is done using a technique called **Gradient Descent**: each weight and bias is nudged in the direction that lowers the loss!  
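To see these steps with actual numbers, here is a hand-worked sketch for a single neuron with one input and a squared-error loss. The starting values and the learning rate are arbitrary choices for illustration.

```python
# One neuron with one input: prediction = w * x + b
x, target = 2.0, 10.0
w, b = 1.0, 0.0
learning_rate = 0.05  # assumed step size

# Forward pass
prediction = w * x + b        # 2.0
error = prediction - target   # -8.0
loss_before = error ** 2      # 64.0

# Backward pass (chain rule): how does the loss change with w and b?
grad_w = 2 * error * x        # -32.0
grad_b = 2 * error            # -16.0

# Gradient descent update: step against the gradient
w = w - learning_rate * grad_w
b = b - learning_rate * grad_b

# The loss with the updated parameters is smaller than before
loss_after = (w * x + b - target) ** 2
print(loss_before, loss_after)
```

Repeating this forward, backward, update cycle many times is what training a network amounts to.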

---

## 🔧 Optimization – Updating Weights & Biases  

Backpropagation alone is not enough—we need an **optimizer** to efficiently adjust the weights and biases.  

An **optimizer** is an algorithm that improves the network by minimizing the loss.  

### Popular Optimizers:  
1. **Gradient Descent** – Adjusts weights step by step using gradients computed over the whole dataset.  
2. **Stochastic Gradient Descent (SGD)** – Updates after each single example or small batch (mini-batch) of data.  
3. **Adam (Adaptive Moment Estimation)** – One of the most widely used optimizers in deep learning; it adapts the step size for each parameter.  
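The simplest of these, plain gradient descent, fits in a few lines. The sketch below learns the weight and bias for the data from the earlier table (where y = 2x + 1); the learning rate and number of steps are assumed values picked for this small example.

```python
# Data from the earlier table: y = 2x + 1
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

w, b = 0.0, 0.0        # start from an arbitrary guess
learning_rate = 0.05   # assumed value; too large a step can diverge

for step in range(2000):
    # Gradients of the mean squared error with respect to w and b
    grad_w = 0.0
    grad_b = 0.0
    for x, y in zip(xs, ys):
        error = (w * x + b) - y
        grad_w += 2 * error * x / len(xs)
        grad_b += 2 * error / len(xs)
    # Gradient descent update
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))  # 2.0 1.0
```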

💡 **Optimizers


# 🔥 Practical Deep Learning with PyTorch – Part 1  

## What is a Tensor? 🤔  

In deep learning, **Tensors** are the fundamental building blocks. They are just **multi-dimensional arrays**, like NumPy arrays, but with support for GPUs and automatic gradient computation!  

A **Tensor** can be:  
- A **single number** (scalar)  
- A **list of numbers** (vector)  
- A **matrix** (2D array)  
- A **higher-dimensional array** (3D, 4D, etc.)  

💡 **Think of tensors as the data format that neural networks understand!**  

---

## ✍️ Let's Create Tensors in PyTorch  

We’ll use the `torch` library to create some tensors!  


In [None]:
import torch  # Import PyTorch

# Create a 1D Tensor (Vector)
tensor_1d = torch.tensor([1, 2, 3, 4])
print(tensor_1d)

# Create a 2D Tensor (Matrix)
tensor_2d = torch.tensor([[1, 2], [3, 4]])
print(tensor_2d)

# Create a Random Tensor
tensor_rand = torch.rand(3, 3)  # 3x3 random matrix
print(tensor_rand)
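Every tensor also carries a few attributes that are handy when debugging. The short sketch below prints the number of dimensions, the shape, and the data type for a scalar, a vector, a matrix, and a 3D tensor; the variable names are just examples.

```python
import torch

scalar = torch.tensor(7.0)               # 0 dimensions
vector = torch.tensor([1, 2, 3, 4])      # 1 dimension
matrix = torch.tensor([[1, 2], [3, 4]])  # 2 dimensions
cube = torch.zeros(2, 3, 4)              # 3 dimensions

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("cube", cube)]:
    # ndim = number of dimensions, shape = size along each dimension
    print(name, t.ndim, tuple(t.shape), t.dtype)
```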


# 🎯 Play with Tensors in PyTorch  

## Your Task:  

Let's practice creating and manipulating **tensors** in PyTorch! Try to complete the following tasks before checking the solutions.  

### 1️⃣ Create a tensor of shape (2,3) filled with random numbers.  
### 2️⃣ Create a tensor of zeros with shape (4,4).  
### 3️⃣ Convert a NumPy array into a PyTorch tensor.  

🔹 **Hint:** Use functions like `torch.rand()`, `torch.zeros()`, `torch.tensor()`, and `torch.from_numpy()`.  




In [None]:
import torch
import numpy as np  # Needed for Task 3

# 1️⃣ Create a (2,3) tensor with random numbers
tensor_random = torch.rand(2, 3)
print("Random Tensor (2,3):\n", tensor_random)

# 2️⃣ Create a (4,4) tensor filled with zeros
tensor_zeros = torch.zeros(4, 4)
print("\nZero Tensor (4,4):\n", tensor_zeros)

# 3️⃣ Convert a NumPy array into a PyTorch tensor
numpy_array = np.array([[5, 6, 7], [8, 9, 10]])
tensor_from_numpy = torch.from_numpy(numpy_array)
print("\nTensor from NumPy:\n", tensor_from_numpy)
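One detail worth knowing for Task 3: `torch.from_numpy()` shares memory with the original NumPy array, while `torch.tensor()` makes a copy. The sketch below shows the difference; the variable names are only illustrative.

```python
import numpy as np
import torch

numpy_array = np.array([1, 2, 3])

shared = torch.from_numpy(numpy_array)  # shares memory with numpy_array
copied = torch.tensor(numpy_array)      # makes an independent copy

numpy_array[0] = 100                    # modify the NumPy array in place

print(shared)  # first element is now 100
print(copied)  # first element is still 1
```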


# 🧠 Building a Simple Neural Network in PyTorch  

## 🔹 Step 1: Creating a Neural Network (Without Activation Function)  

Let's start with a **simple neural network** that takes an input, applies weights and bias, and gives an output. **No activation function yet!**  


In [None]:
import torch

# Define input tensor (1 sample, 3 features)
x = torch.tensor([[2.0, 3.0, 4.0]])  # Shape: (1,3)

# Define weight tensor (3 input features → 1 output)
w = torch.tensor([[0.1, 0.2, 0.3]])  # Shape: (1,3)

# Define bias tensor
b = torch.tensor([0.5])  # Shape: (1,)

# Compute the output: y = x*w + b
y = torch.sum(x * w) + b
print("Output without activation:", y)
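The same computation is often written as a matrix multiplication, which is also how `nn.Linear` stores its weights (one row per output neuron). A minimal sketch comparing the two forms, reusing the numbers from the cell above:

```python
import torch

x = torch.tensor([[2.0, 3.0, 4.0]])  # shape (1, 3): 1 sample, 3 features
w = torch.tensor([[0.1, 0.2, 0.3]])  # shape (1, 3): 1 output neuron, 3 weights
b = torch.tensor([0.5])              # shape (1,)

y_sum = torch.sum(x * w) + b         # element-wise multiply, then add up
y_matmul = x @ w.T + b               # (1, 3) @ (3, 1) -> shape (1, 1)

print(y_sum, y_matmul)
print(torch.allclose(y_sum, y_matmul))  # True
```

The matrix form scales naturally to many samples and many output neurons, which is why it is the one used inside real layers.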

### 🎯 Task 1: Modify Weights & Bias  
🔹 Try changing the **values of weights and bias**. What happens to the output? 

In [None]:
w = torch.tensor([[0.5, -0.3, 0.2]])  # Different weights
b = torch.tensor([1.0])  # Different bias, shape: (1,)

# Compute the output: y = sum(x_i * w_i) + b
y = torch.sum(x * w) + b
print("Output without activation:", y)

#### ✅ Solution:  
When you change the weights, you are modifying the influence of each input.  
When you change the bias, you shift the output up or down.  
You'll see a **different output**! 

## 🔹 Step 2: Adding an Activation Function  

Now, let's **add an activation function** (ReLU) to introduce non-linearity.  

In [None]:
import torch.nn.functional as F  # For activation functions

# Apply ReLU activation
y_activated = F.relu(y)
print("Output with ReLU activation:", y_activated)

### 🎯 Task 2: Try Different Activation Functions  
🔹 Replace `F.relu(y)` with `torch.sigmoid(y)` or `torch.tanh(y)`.  

In [None]:
y_sigmoid = torch.sigmoid(y)
y_tanh = torch.tanh(y)

print("Sigmoid output:", y_sigmoid)
print("Tanh output:", y_tanh)

Each activation function **transforms the output differently**!  

## 🔹 Step 3: Feedforward  

A **feedforward neural network** processes inputs layer by layer to produce an output. In PyTorch, we can define it as a class:  

In [None]:
import torch.nn as nn

class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.linear = nn.Linear(3, 1)  # 3 input features → 1 output

    def forward(self, x):
        return self.linear(x)  # No activation yet

# Create model and sample input
model = SimpleNN()
x = torch.tensor([[2.0, 3.0, 4.0]])
output = model(x)
print("Model output:", output)

### 🎯 Task 3: Add an Activation Function  
🔹 Modify the network to **include a ReLU activation function** inside `forward()`. 

In [None]:
import torch.nn as nn

class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.linear = nn.Linear(3, 1)  # 3 input features → 1 output

    def forward(self, x):
        return torch.relu(self.linear(x))

# Create model and sample input
model = SimpleNN()
x = torch.tensor([[2.0, 3.0, 4.0]])
output = model(x)
print("Model output:", output)

## 🔹 Step 4: Loss Function  

A **loss function** measures how far the model's predictions are from the actual values. One common choice for regression is **Mean Squared Error (MSE)**. 

In [None]:
# Define target (true output)
target = torch.tensor([[10.0]])

# Define Mean Squared Error Loss
loss_fn = nn.MSELoss()

# Compute loss
loss = loss_fn(output, target)
print("Loss:", loss.item())

### 🎯 Task 4: Try L1 Loss  
🔹 Replace `nn.MSELoss()` with `nn.L1Loss()`. What’s the difference?  

In [None]:
loss_fn = nn.L1Loss()
loss = loss_fn(output, target)
print("L1 Loss:", loss.item())

- **MSE Loss** penalizes large errors more.  
- **L1 Loss** is more resistant to outliers.  
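A quick comparison shows the difference. In this sketch (with made-up predictions), both losses agree when every error is small, but a single large error inflates MSE far more than L1.

```python
import torch
import torch.nn as nn

target = torch.tensor([[10.0], [10.0]])
small_miss = torch.tensor([[9.0], [11.0]])    # both predictions off by 1
one_outlier = torch.tensor([[10.0], [20.0]])  # one prediction off by 10

mse = nn.MSELoss()
l1 = nn.L1Loss()

# Small errors: MSE = 1.0, L1 = 1.0
print("small miss :", mse(small_miss, target).item(), l1(small_miss, target).item())

# One large error: MSE = 50.0, L1 = 5.0
print("one outlier:", mse(one_outlier, target).item(), l1(one_outlier, target).item())
```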


## 🔹 Step 5: Backpropagation & Optimizer  

The optimizer **adjusts the weights and biases** using gradient descent to minimize the loss. 

In [None]:
# Define optimizer (Stochastic Gradient Descent)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Backpropagation: Compute gradients
loss.backward()

# Update weights
optimizer.step()

# Clear gradients for next iteration
optimizer.zero_grad()
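To check that one update really helps, the self-contained sketch below runs a single training step on a fresh linear neuron and compares the loss before and after. It uses a plain `nn.Linear` layer rather than the `SimpleNN` class, and the seed and learning rate are arbitrary choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the run is repeatable

model = nn.Linear(3, 1)  # a single linear neuron
x = torch.tensor([[2.0, 3.0, 4.0]])
target = torch.tensor([[10.0]])
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# 1. Feedforward and loss
loss_before = loss_fn(model(x), target)

# 2. Backpropagation fills the .grad attribute of every parameter
optimizer.zero_grad()
loss_before.backward()
print("Weight gradients:", model.weight.grad)

# 3. The optimizer uses those gradients to update the parameters
optimizer.step()

# Recompute the loss with the updated parameters (no gradients needed here)
with torch.no_grad():
    loss_after = loss_fn(model(x), target)

print(loss_before.item(), loss_after.item())  # the second value is smaller
```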

### 🎯 Task 5: Try the Adam Optimizer  
🔹 Change the optimizer to **Adam (`torch.optim.Adam`)** and observe the effect on training.  

In [None]:
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

## 🎯 Final Task: Train a Neural Network on a Custom Dataset  

### 🔹 Task:
Build a dataset where the target function is:  
\[ y = \frac{e^{(x+1)}}{\ln(2x)} \]  
Train a neural network on this dataset.

### 🔹 Hint:
- Use PyTorch's `Dataset` class to create the dataset.
- Define a neural network with multiple layers.
- Use `MSELoss()` as the loss function.
- Write a custom training loop to train the model (plain PyTorch models have no built-in `.fit()` method).




In [None]:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# Define dataset class
class CustomDataset(Dataset):
    def __init__(self, size=100):
        self.x = torch.linspace(1, 10, size).view(-1, 1)
        self.y = (torch.exp(self.x + 1) / torch.log(2 * self.x)).view(-1, 1)
    
    def __len__(self):
        return len(self.x)
    
    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

# Create dataset and dataloader
dataset = CustomDataset()
dataloader = DataLoader(dataset, batch_size=10, shuffle=True)

# Define a simple neural network
class DeepNN(nn.Module):
    def __init__(self):
        super(DeepNN, self).__init__()
        self.layers = nn.Sequential(
            nn.Linear(1, 10),
            nn.ReLU(),
            nn.Linear(10, 10),
            nn.ReLU(),
            nn.Linear(10, 1)
        )
    
    def forward(self, x):
        return self.layers(x)

# Instantiate model, loss, and optimizer
model = DeepNN()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Training loop
for epoch in range(100):
    for x_batch, y_batch in dataloader:
        optimizer.zero_grad()
        y_pred = model(x_batch)
        loss = loss_fn(y_pred, y_batch)
        loss.backward()
        optimizer.step()
    
    # Report progress every 10 epochs (this is the loss of the last batch only)
    if epoch % 10 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item()}")

print("Training Complete!")