# 🎯 Deep Learning Notes – Topics 1 to 3

---

## ✅ 1. Introduction to Deep Learning

---

### 📌 What is Deep Learning?

Deep Learning (DL) is a part of Machine Learning (ML) that uses **neural networks with many layers** to learn from large volumes of data.

🧠 Inspired by the structure of the human brain, DL networks can:
- Automatically extract features
- Learn complex patterns
- Handle unstructured data (images, text, audio)

---

### 📊 ML vs DL

| Feature                | Machine Learning            | Deep Learning                    |
|------------------------|-----------------------------|----------------------------------|
| Feature Engineering    | Manual (you design features)| Automatic (learns from raw data) |
| Performance on big data| May plateau                 | Improves with more data          |
| Data Requirement       | Can work on small data      | Needs large data sets            |
| Examples               | Decision Trees, SVM         | CNNs, RNNs, Transformers         |

---

### 🧭 When to Use DL?

✅ Use Deep Learning when:
- You have **lots of labeled data**
- The problem involves **images, audio, or text**
- You want **automatic feature extraction**
- You need **state-of-the-art accuracy**

📌 Not ideal for small tabular datasets: traditional ML (e.g., decision trees) often performs as well or better there.

---

## ✅ 2. Neural Networks Fundamentals

---

### 🧱 Structure of Neural Networks

A neural network consists of:

1. **Input Layer** – takes the input features (e.g., pixel values, encoded words).
2. **Hidden Layers** – one or more layers of neurons that transform the inputs into intermediate representations.
3. **Output Layer** – produces the final prediction (e.g., a class label).

---

### 📐 Each Neuron Does:
```
Z = W·X + b  
A = Activation(Z)
```

Where:
- W = weight vector (one weight per input)
- X = input vector
- b = bias (a single number)
- Z = weighted sum plus bias (pre-activation)
- A = activated output
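A minimal sketch of one neuron in plain Python, assuming ReLU as the activation (any function of one number would do). The names `neuron` and `relu` are illustrative, not from a library; the first call reproduces the worked forward-propagation example in this section.

```python
def relu(z):
    return max(0.0, z)

def neuron(w, x, b, activation=relu):
    """One neuron: weighted sum of the inputs plus bias, then an activation."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    z = sum(wi * xi for wi, xi in zip(w, x)) + b   # Z = W·X + b
    return activation(z)                            # A = Activation(Z)

print(neuron([2.0], [1.0], 1.0))              # 3.0
print(neuron([0.5, -1.0], [2.0, 3.0], 0.5))   # z = 1.0 - 3.0 + 0.5 = -1.5, so ReLU gives 0.0
```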

---

### 🔁 Forward Propagation

This is the **process of calculating outputs** from the input, layer by layer.

📌 In each neuron:
- Multiply each input by its weight and sum the results
- Add bias
- Pass through activation function
- Send output to next layer
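The steps above can be sketched as a layer-by-layer loop. This is an illustrative pure-Python version with a hypothetical 2-input network (2 ReLU hidden neurons, 1 linear output neuron); the weights are made up for the example. Each row of a layer's weight matrix holds the weights of one neuron.

```python
def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def dense_forward(weights, biases, inputs, activation):
    """One fully connected layer: each row of `weights` belongs to one neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(layers, inputs):
    """Feed the output of each layer into the next one."""
    a = inputs
    for weights, biases, activation in layers:
        a = dense_forward(weights, biases, a, activation)
    return a

# Hypothetical network: 2 inputs -> 2 hidden neurons (ReLU) -> 1 output (linear)
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0], relu),
    ([[2.0, 1.0]], [0.0], identity),
]
print(forward(layers, [3.0, 1.0]))   # hidden = [2.0, 3.0], output = [7.0]
```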

#### 🧮 Example:
Given:
- Input: X = 1.0  
- Weight: W = 2.0  
- Bias: b = 1.0

Then:
```
Z = W·X + b = 2.0 * 1.0 + 1.0 = 3.0  
A = ReLU(Z) = max(0, 3.0) = 3.0
```

---

### 📉 Loss Function & Cost Function

- **Loss** = error for one training example  
- **Cost** = average error over all training examples

---

#### 🔹 Common Loss Functions:

**Mean Squared Error (MSE)** – for regression (averaged over the n examples):
```
L = (1/n) * Σ (y_pred - y_true)^2
```

**Cross-Entropy Loss** – for classification (summed over the classes, with `y_true` one-hot encoded):
```
L = - Σ y_true * log(y_pred)
```
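A small sketch that keeps the loss/cost distinction explicit: a per-example loss function and a cost that averages it over the dataset. The sample predictions and targets are hypothetical; the `eps` clamp in the cross-entropy is an added assumption to avoid `log(0)`.

```python
import math

def squared_error(y_pred, y_true):
    """Loss for ONE regression example."""
    return (y_pred - y_true) ** 2

def mse_cost(preds, targets):
    """Cost: the average of the per-example losses (the MSE formula)."""
    return sum(squared_error(p, t) for p, t in zip(preds, targets)) / len(preds)

def cross_entropy(y_pred, y_true, eps=1e-12):
    """Loss for ONE classification example.

    y_true: one-hot list; y_pred: predicted class probabilities.
    eps keeps math.log away from log(0).
    """
    return -sum(t * math.log(max(p, eps)) for p, t in zip(y_pred, y_true))

print(mse_cost([2.5, 0.0], [3.0, -1.0]))          # (0.25 + 1.0) / 2 = 0.625
print(cross_entropy([0.7, 0.2, 0.1], [1, 0, 0]))  # -log(0.7), about 0.357
```

Only the probability assigned to the true class contributes to the cross-entropy, so a confident correct prediction gives a loss near 0.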

---

### 🔁 Backpropagation & Gradient Descent

This is how the network **learns** by adjusting weights and biases.

Steps:
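1. Run a forward pass and calculate the prediction error (loss)
2. Backpropagate: apply the chain rule layer by layer to get the gradient (slope) of the loss with respect to every weight and bias
3. Update the weights and biases using gradient descent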

#### 🎯 Gradient Descent Formula:
```
W = W - α * ∂L/∂W
```
Where:
- α = learning rate (how big a step to take)
- ∂L/∂W = gradient (slope of loss w.r.t. weights)
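A from-scratch sketch of the full loop for the simplest possible case: a single neuron with no activation (A = W·X + b) trained with the MSE cost on hypothetical data generated from y = 2x + 1. The gradients are derived by hand with the chain rule; frameworks such as Keras compute them automatically. The learning rate and step count are illustrative choices.

```python
def train(xs, ys, alpha=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # 1. Forward pass: prediction errors for every example
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # 2. Gradients of L = (1/n) * sum(error^2) via the chain rule
        dw = 2 / n * sum(e * x for e, x in zip(errors, xs))
        db = 2 / n * sum(errors)
        # 3. Gradient descent: W = W - alpha * dL/dW (same for b)
        w -= alpha * dw
        b -= alpha * db
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [2 * x + 1 for x in xs]   # hypothetical data from y = 2x + 1
w, b = train(xs, ys)
print(round(w, 3), round(b, 3))   # 2.0 1.0
```

If α is too large the updates overshoot and diverge; for this particular dataset, values above roughly 0.23 do.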

---

## ✅ 3. Activation Functions

---

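Activation functions introduce **non-linearity**, allowing the network to learn complex patterns. Without them, stacking layers gains nothing: a composition of linear layers is itself a single linear function, so the whole network would reduce to a linear model.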
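A quick numeric check of the "just a linear model" claim, using hypothetical 1-input layers so the algebra stays visible (with matrices the same identity holds: W2(W1X + b1) + b2 = (W2W1)X + (W2b1 + b2)).

```python
def linear(w, b):
    """A 1-input, 1-output layer with no activation: x -> w*x + b."""
    def layer(x):
        return w * x + b
    return layer

def relu(z):
    return max(0.0, z)

layer1 = linear(3.0, 1.0)
layer2 = linear(-2.0, 0.5)

# layer2(layer1(x)) = -2*(3x + 1) + 0.5 = -6x - 1.5, i.e. ONE linear layer
collapsed = linear(-2.0 * 3.0, -2.0 * 1.0 + 0.5)

for x in [-1.0, 0.0, 2.5]:
    assert layer2(layer1(x)) == collapsed(x)

# Putting ReLU between the layers breaks the equivalence
print(layer2(relu(layer1(-1.0))), collapsed(-1.0))   # 0.5 4.5
```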

---

### 1️⃣ Sigmoid

#### 📌 Formula:
```
A = 1 / (1 + e^(-Z))
```

- Output range: (0, 1)
- Use: Output layer in binary classification

#### Example:
Z = 2 → A ≈ 0.88  
Z = -2 → A ≈ 0.12

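🧨 Downside: Vanishing gradients. For large |Z| the curve flattens, so its gradient approaches 0 and learning in earlier layers can stall.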
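A sketch of sigmoid and its derivative, dA/dZ = A(1 − A), to make the vanishing-gradient point concrete. The two-branch form is an added implementation choice that keeps `math.exp` from overflowing for large negative Z; both branches compute the same function.

```python
import math

def sigmoid(z):
    # Two algebraically equal forms, chosen so math.exp never overflows
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sigmoid_grad(z):
    a = sigmoid(z)
    return a * (1.0 - a)   # dA/dZ = A * (1 - A)

for z in [0.0, 2.0, 5.0, 10.0]:
    print(f"Z={z:4.1f}  A={sigmoid(z):.4f}  dA/dZ={sigmoid_grad(z):.6f}")
```

The derivative is at most 0.25 (at Z = 0) and is already below 0.0001 at Z = 10. Backpropagation multiplies such factors layer after layer, which is why gradients can shrink toward zero in deep sigmoid networks.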

---

### 2️⃣ Tanh (Hyperbolic Tangent)

#### 📌 Formula:
```
A = (e^Z - e^(-Z)) / (e^Z + e^(-Z))
```

- Output range: (-1, 1)
- Use: Hidden layers; its zero-centered output usually makes optimization easier than with sigmoid

#### Example:
Z = 1 → A ≈ 0.76  
Z = -1 → A ≈ -0.76
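A short sketch checking the formula above against Python's built-in `math.tanh`, plus the derivative 1 − tanh(Z)², which peaks at 1.0 (four times sigmoid's maximum of 0.25).

```python
import math

def tanh_from_formula(z):
    # A = (e^Z - e^(-Z)) / (e^Z + e^(-Z)), exactly as written above
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

def tanh_grad(z):
    # dA/dZ = 1 - tanh(Z)^2, largest (1.0) at Z = 0
    return 1.0 - math.tanh(z) ** 2

for z in [-1.0, 0.0, 1.0]:
    print(z, round(tanh_from_formula(z), 2), round(math.tanh(z), 2), round(tanh_grad(z), 2))
```

The direct formula overflows once |Z| exceeds about 709 (where `math.exp` fails), so `math.tanh` is the safer choice in practice.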

---

### 3️⃣ ReLU (Rectified Linear Unit)

#### 📌 Formula:
```
A = max(0, Z)
```

- Output range: [0, ∞)
- Very popular in hidden layers

#### Example:
Z = -3 → A = 0  
Z = 4 → A = 4

⚠️ "Dying ReLU": a neuron whose input Z stays negative always outputs 0 and receives zero gradient, so it can stop learning entirely
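A sketch of ReLU and its gradient for a hypothetical neuron whose pre-activations are negative on every input, which is exactly the "dead" case. Returning 0 at Z = 0 is a common convention (the derivative is undefined there), assumed here.

```python
def relu(z):
    return max(0.0, z)

def relu_grad(z):
    # 1 for positive inputs, 0 otherwise (0 at Z = 0 by convention)
    return 1.0 if z > 0 else 0.0

# Hypothetical pre-activations of one neuron across a batch: all negative
pre_activations = [-3.0, -0.5, -1.2]
print([relu(z) for z in pre_activations])       # [0.0, 0.0, 0.0]
print([relu_grad(z) for z in pre_activations])  # [0.0, 0.0, 0.0]: no weight updates
```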

---

### 4️⃣ Leaky ReLU

#### 📌 Formula:
```
A = Z if Z > 0 else 0.01 * Z
```

- Keeps a small slope (0.01) for negative Z, so the gradient never becomes exactly zero
- Range: (-∞, ∞)

#### Example:
Z = -2 → A = -0.02
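A matching sketch of Leaky ReLU with the 0.01 slope from the formula above; the slope is exposed as a hypothetical `alpha` parameter so it can be changed.

```python
def leaky_relu(z, alpha=0.01):
    return z if z > 0 else alpha * z

def leaky_relu_grad(z, alpha=0.01):
    return 1.0 if z > 0 else alpha   # small but non-zero for negative Z

print(leaky_relu(-2.0), leaky_relu(4.0))   # -0.02 4.0
print(leaky_relu_grad(-2.0))               # 0.01
```

Because the gradient for negative Z is 0.01 rather than 0, a neuron stuck in the negative region still receives small updates.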

---

### 5️⃣ Softmax

#### 📌 Formula:
```
A[i] = exp(Z[i]) / Σ exp(Z[j])
```

- Converts a vector of raw scores into probabilities that are non-negative and sum to 1
- Use: Output layer for multi-class classification

#### Example:
Z = [1, 2, 3]  
Softmax(Z) ≈ [0.09, 0.24, 0.67]
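A sketch of softmax that reproduces the example above. Subtracting max(Z) before exponentiating is an added numerical-stability step: it leaves the result unchanged but keeps `math.exp` from overflowing on large scores (plain `math.exp(1003.0)` raises `OverflowError`).

```python
import math

def softmax(z):
    m = max(z)                            # shifting every score by m does not change the output
    exps = [math.exp(v - m) for v in z]   # but keeps math.exp in a safe range
    total = sum(exps)
    return [e / total for e in exps]

print([round(p, 2) for p in softmax([1.0, 2.0, 3.0])])           # [0.09, 0.24, 0.67]
print([round(p, 2) for p in softmax([1001.0, 1002.0, 1003.0])])  # [0.09, 0.24, 0.67]
```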

---

### 🧠 Summary: When to Use Which?

| Activation     | Where to Use                         |
|----------------|--------------------------------------|
| **Sigmoid**    | Output layer in binary classification|
| **Tanh**       | Hidden layers (zero-centered output) |
| **ReLU**       | Most common in hidden layers         |
| **Leaky ReLU** | Hidden layers (to avoid dying ReLU)  |
| **Softmax**    | Output layer in multi-class tasks    |

---

🎉 Congratulations! You’ve now learned:

- ✅ What Deep Learning is  
- ✅ How neural networks work  
- ✅ How data flows through layers  
- ✅ What activation functions do & when to use them

🧪 Next: Training models with Keras + real projects like digit recognition!


# ✅ Topic 4: Training Deep Neural Networks

---

## 🔄 Epochs, Batch Size & Iterations

When training a deep neural network, the data is usually not fed to the model all at once; it is split into smaller batches:

### 🔹 Epoch
- One full pass of the training dataset through the model.
- If you have 1,000 samples and train for 10 epochs → model sees each sample 10 times.

### 🔹 Batch Size
- Number of training examples used in one forward/backward pass.
- Smaller batches give more frequent weight updates per epoch, but noisier gradient estimates.
- Common values: 16, 32, 64, 128

### 🔹 Iteration
- One weight update (one forward and backward pass on a single batch).
- Iterations per epoch = Total samples / Batch size (rounded up if it does not divide evenly)

---

### 📌 Example:
Dataset = 1,000 images  
Batch size = 100  
Epochs = 3

Then:
- 1 Epoch = 10 Iterations (1000 / 100)
- 3 Epochs = 30 Iterations in total
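A sketch of the same arithmetic, assuming the smaller final batch is kept rather than dropped (some data pipelines offer an option to drop it). The helper names are hypothetical.

```python
import math

def iterations_per_epoch(num_samples, batch_size):
    # The last batch may be smaller, so round up
    return math.ceil(num_samples / batch_size)

def batches(data, batch_size):
    """Yield consecutive slices of `data`, each at most `batch_size` long."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

print(iterations_per_epoch(1000, 100))        # 10
print(3 * iterations_per_epoch(1000, 100))    # 30 iterations over 3 epochs
print(iterations_per_epoch(1050, 100))        # 11
print([len(b) for b in batches(list(range(1050)), 100)][-1])  # 50 (last, smaller batch)
```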

---

## ⚠️ Overfitting vs Underfitting

| Type          | Description                                  | Symptoms                      | Fix                          |
|---------------|----------------------------------------------|-------------------------------|------------------------------|
| Underfitting  | Model is too simple, can't capture patterns  | High training & test error    | Add layers, train longer     |
| Overfitting   | Model memorizes training data                | Low train error, high test error | Regularization, dropout, early stopping |

---

### 📈 What It Looks Like in the Loss Curves:

- **Underfitting**: Both training & validation loss are high  
- **Good Fit**: Both losses are low and close to each other  
- **Overfitting**: Training loss ↓ but validation loss ↑
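Early stopping turns the overfitting pattern above into a rule: stop once validation loss has not improved for a few epochs and keep the best epoch. A minimal sketch with hypothetical per-epoch validation losses; `patience` is the number of non-improving epochs tolerated. Keras provides this as the `tf.keras.callbacks.EarlyStopping` callback (e.g., `monitor='val_loss'`, `patience=...`).

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 0-based epoch with the best validation loss, stopping once
    the loss has not improved for `patience` consecutive epochs."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break   # validation loss keeps rising: likely overfitting
    return best_epoch

# Hypothetical curve: validation loss falls, then turns upward
val_losses = [0.95, 0.70, 0.58, 0.61, 0.66, 0.72]
print(early_stopping_epoch(val_losses))   # 2
```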

---

## 🛡️ Regularization Techniques

---

### 🔹 1. Dropout
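- Randomly sets a fraction of neuron outputs to zero at each training step (it is switched off at inference), so neurons cannot rely on each other too much (co-adaptation).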
- Forces the model to learn robust features.

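```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))  # during training, randomly zeroes 50% of the previous layer's outputs
```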
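What the `Dropout` layer does can be sketched in plain Python as "inverted dropout": each value is zeroed with probability `rate` and the survivors are scaled by 1 / (1 − rate), so the expected sum stays the same and nothing needs rescaling at inference. The Keras documentation describes this same scaling; the function below is an illustrative re-implementation, not Keras code.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout on a list of activations."""
    if not training or rate == 0.0:
        return list(activations)          # no dropout at inference time
    keep = 1.0 - rate
    # Zero each value with probability `rate`; scale the survivors up by 1/keep
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)                    # seeded for reproducibility
out = dropout([1.0] * 10, rate=0.5, rng=rng)
print(out)                                                  # each entry is 0.0 or 2.0
print(dropout([1.0] * 10, rate=0.5, training=False))        # unchanged
```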

---

### 🔹 2. L1 & L2 Regularization

- Adds a penalty term to the loss function to discourage large weights.

#### 🔸 L1 (Lasso):
- Penalty λ · Σ|w|; promotes sparsity (drives many weights to exactly zero)

#### 🔸 L2 (Ridge):
- Penalty λ · Σw²; shrinks all weights smoothly toward zero (also called weight decay)

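```python
from tensorflow.keras import regularizers
from tensorflow.keras.layers import Dense

# L2: adds 0.01 * sum(W**2) over this layer's kernel weights to the loss
l2_layer = Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.01))
# L1: adds 0.01 * sum(|W|) instead
l1_layer = Dense(64, activation='relu', kernel_regularizer=regularizers.l1(0.01))
```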
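The two penalties can be computed by hand to see how they differ. A sketch with a hypothetical weight vector, λ = 0.01, and a made-up data loss of 0.40; during training, this combined value is what the optimizer minimizes.

```python
def l1_penalty(weights, lam):
    return lam * sum(abs(w) for w in weights)   # λ · Σ|w|

def l2_penalty(weights, lam):
    return lam * sum(w * w for w in weights)    # λ · Σw²

weights = [0.5, -2.0, 0.0, 3.0]   # hypothetical layer weights
data_loss = 0.40                  # hypothetical loss from the data term

print(round(data_loss + l1_penalty(weights, 0.01), 4))   # 0.40 + 0.01 * 5.5   ≈ 0.455
print(round(data_loss + l2_penalty(weights, 0.01), 4))   # 0.40 + 0.01 * 13.25 ≈ 0.5325
```

The gradient of the L1 term has the same magnitude λ for every non-zero weight, so small weights get pushed all the way to zero; the L2 gradient, 2λw, shrinks as the weight shrinks, so weights get small but rarely exactly zero.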

---

## 🧪 Training Workflow Overview

1. Define model architecture  
2. Compile model (optimizer + loss + metrics)  
3. Train using `.fit()` with training data  
4. Validate on unseen data  
5. Tune epochs, batch size, and layers  
6. Prevent overfitting using dropout/regularization

---

## 🧠 Optimizer Quick Intro (Preview of Topic 5)

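| Optimizer | Description                                                  | Notes                          |
|-----------|--------------------------------------------------------------|--------------------------------|
| SGD       | Vanilla stochastic gradient descent                          | Simple; often slower to converge |
| Adam      | Adaptive per-weight learning rates + momentum                | Most commonly used default     |
| RMSProp   | Scales the learning rate by a running average of squared gradients | Often used for RNNs      |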
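A toy comparison of the SGD and Adam update rules on the hypothetical 1-D function f(w) = (w − 3)², whose minimum is at w = 3. The Adam hyperparameters (β1 = 0.9, β2 = 0.999, ε = 1e-8) are the defaults proposed in the original Adam paper; this is a textbook sketch, not how Keras implements the optimizers internally.

```python
import math

def grad(w):
    return 2.0 * (w - 3.0)   # derivative of f(w) = (w - 3)^2

def sgd(w, lr=0.1, steps=200):
    for _ in range(steps):
        w -= lr * grad(w)    # plain gradient step
    return w

def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # momentum: running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)   # adaptive step size
    return w

print(round(sgd(0.0), 4), round(adam(0.0), 2))
```

Adam's first step moves w by almost exactly `lr`, whatever the gradient's size, because m̂ / √v̂ ≈ ±1 at the start; SGD's step is proportional to the gradient itself.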

---

🎯 In the next lesson:
We’ll **build and train** your first neural network using **Keras** on the **MNIST digit dataset**.

You'll learn:
- How to load data
- Build architecture
- Train, evaluate, and visualize results!

Ready to code some neurons? 🧠💻
