# 🧠 Neural Networks – Key Idea

## 📌 Limitations of Traditional Algorithms

- Traditional algorithms like **linear regression** or **logistic regression** tend to **plateau in performance**: past a certain point, adding **more data** helps very little.
- As a result, they don't **scale well** or take full advantage of **big data**.
- This limits their effectiveness on complex tasks like vision or natural language.

---

## 🚀 Neural Networks vs. Traditional Algorithms

- When training neural networks of increasing size on growing amounts of data:
  - 🟢 **Small network** → slight improvement over traditional algorithms.
  - 🟡 **Medium-sized network** → better performance.
  - 🔴 **Large network** → performance **keeps improving** with more data.
- This ability to scale is **key** to their success.

---

## 📈 Why Deep Learning Took Off

- ✅ Massive availability of **big data**.
- ✅ Ability to train **deep neural networks** (more layers, more neurons).
- ✅ Improved computational resources: **GPUs**, parallel processing.
- Result: major breakthroughs in:
  - 🎙️ Speech recognition
  - 📸 Computer vision
  - 💬 Natural Language Processing (NLP)
  - ⚕️ Biomedical applications, and more

---

## 💡 Conclusion

> Deep neural networks can **leverage more data and compute** to keep improving performance, whereas traditional algorithms tend to plateau. This is why **deep learning** has become the dominant approach in many AI tasks.

# 🧠 Deeper Neural Networks – Architecture & Intuition

## 🔄 From Input to Output

- A neural network takes an **input feature vector** $X$, processes it through **one or more hidden layers**, and finally produces an **output** (e.g., a single prediction).
- Each **hidden layer** contains multiple **neurons** (aka *units*), which apply a weighted sum and activation function to the inputs.
- The output of each layer becomes the **input to the next layer**.
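
As a rough sketch of the flow described above, the snippet below implements one neuron and one layer in plain Python. The sigmoid activation, the helper names (`sigmoid`, `neuron`, `layer`), and every numeric value are assumptions chosen for illustration; they are not taken from a specific library or from these notes.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(weights, bias, inputs):
    """One unit: weighted sum of the inputs plus a bias, then the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def layer(weight_rows, biases, inputs):
    """A layer is several neurons that all read the same input vector."""
    return [neuron(w, b, inputs) for w, b in zip(weight_rows, biases)]

# Hypothetical numbers, picked so the arithmetic is easy to follow.
x = [1.0, 2.0]                                             # input feature vector
hidden = layer([[0.5, -0.5], [0.0, 0.0]], [0.5, 0.0], x)   # hidden layer, 2 units
y_hat = neuron([1.0, -1.0], 0.0, hidden)                   # output reads the hidden layer
```

Note that `hidden`, the output of the first layer, is passed straight in as the input of the output neuron, which is the chaining described in the last bullet.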

---

## 🧩 Learning Features Automatically

- Even if you *think* a network should compute certain features like “affordability,” “awareness,” or “quality,” **you don’t need to hand-design them**.
- Neural networks **learn which features to extract** by themselves during training — that’s one of their biggest advantages!
- This automatic **feature extraction** is what makes neural networks powerful for complex tasks.

---

## 🏗️ Examples of Deeper Networks

### 🧱 Example 1 – 2 Hidden Layers
- **Input layer** → 3 neurons in **1st hidden layer** → 2 neurons in **2nd hidden layer** → **Output layer**
- Activations flow forward through the network:  
  $$
  X \rightarrow a^{[1]} \rightarrow a^{[2]} \rightarrow \hat{y}
  $$

### 🧱 Example 2 – 3 Hidden Layers
- **Input layer** → **Hidden layer 1** → **Hidden layer 2** → **Hidden layer 3** → **Output layer**

This is called a **deep neural network**, as it has **multiple hidden layers**.

---

## 🔧 Choosing Architecture: Layers & Units

- When designing your own neural network, you must decide:
  - How many **hidden layers** to include
  - How many **neurons per layer**
- These choices define the **architecture** of your neural network.
- Choosing the right architecture can significantly affect the performance of your model.
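
To make the effect of these choices concrete, here is a minimal sketch that counts the trainable parameters (weights and biases) implied by an architecture. It assumes a fully connected network; the function name `count_parameters` and the example sizes, including the 4 input features, are hypothetical.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes[0] is the number of input features; every later entry is
    the number of neurons in that layer (the last one is the output layer).
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per (input, neuron) pair
        total += n_out         # one bias per neuron
    return total

# Hypothetical architectures over the same 4 input features.
small = count_parameters([4, 3, 2, 1])    # hidden layers of 3 and 2 neurons
large = count_parameters([4, 64, 64, 1])  # two hidden layers of 64 neurons
print(small, large)  # 26 4545
```

The wider network has far more parameters to fit, which is one reason the number of layers and units is usually treated as a tuning decision rather than fixed in advance.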

---

## 📚 Terminology

> **Multilayer Perceptron (MLP)** = a neural network with **one or more hidden layers**.

If you see this term in books or papers, it refers to the kind of layered structure we've just described.

---

## 💡 Summary

> Neural networks consist of layers of neurons that transform an input into an output. Deep networks (with multiple layers) can automatically learn useful features from data, and the choice of architecture (number of layers & neurons) plays a key role in their performance.

# 🧠 Forward Propagation – Notation & Layer Computation

## 🧩 Layer Numbering Convention

- Input layer → **Layer 0**
- Hidden layers → **Layers 1, 2, ..., L-1**
- Output layer → **Layer L**
- So a network with 3 hidden layers + 1 output layer has **4 layers** in total ($L = 4$).
- The **input layer is not counted** when we say “4-layer neural network”.

---

## 🔢 Neuron-Level Notation

For any layer $l$ and neuron (unit) $j$ in that layer:

- $a_j^{[l]}$: Activation of neuron $j$ in layer $l$
- $\mathbf{w}_j^{[l]}$: Weight vector of neuron $j$ in layer $l$
- $b_j^{[l]}$: Bias of neuron $j$ in layer $l$
- $\mathbf{a}^{[l-1]}$: Activation vector from the **previous layer**, used as input

---

## 🧠 Activation Computation (per neuron)

The activation of neuron $j$ in layer $l$ is computed as:

$$
a_j^{[l]} = g\left( \mathbf{w}_j^{[l]} \cdot \mathbf{a}^{[l-1]} + b_j^{[l]} \right)
$$

Where:
- $g$ is the **activation function**, e.g. sigmoid
- $\cdot$ is the **dot product**
- $\mathbf{a}^{[l-1]}$ is a vector → the output of layer $l-1$
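
The formula above can be sketched as a loop over layers $l$ and units $j$. This is an illustrative pure-Python version, not a reference implementation: the function name `forward`, the `None` padding at index 0 (used so that list positions match layer numbers), and all weights are assumptions. Units are 0-based in the code, unlike the 1-based math notation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, weights, biases, g=sigmoid):
    """Return the activations of every layer, indexed by layer number.

    weights[l][j] plays the role of w_j^[l] and biases[l][j] of b_j^[l].
    Index 0 of both lists is unused padding (layer 0 has no parameters).
    """
    a = [x]  # a[0] is the input vector, i.e. layer 0
    for l in range(1, len(weights)):
        prev = a[l - 1]  # a^[l-1], the output of the previous layer
        a_l = []
        for w_j, b_j in zip(weights[l], biases[l]):
            z = sum(w * p for w, p in zip(w_j, prev)) + b_j  # dot product + bias
            a_l.append(g(z))
        a.append(a_l)
    return a

# Hypothetical 2-layer network: 2 inputs -> 2 hidden units -> 1 output.
weights = [None, [[1.0, -1.0], [0.0, 0.0]], [[2.0, -2.0]]]
biases = [None, [0.0, 0.0], [0.0]]
activations = forward([3.0, 3.0], weights, biases)
```

Here `activations[0]` is the input, `activations[1]` is $\mathbf{a}^{[1]}$, and `activations[-1]` is the output of layer $L$, so `len(activations) - 1` equals the number of layers under the counting convention used in these notes.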

---

## ✅ Example: Layer 3 with 3 neurons

Suppose:
- Layer 3 has **3 neurons** → $j = 1, 2, 3$
- Input is vector $\mathbf{a}^{[2]}$ (from Layer 2)

Each neuron in Layer 3 computes:

```text
a_1^{[3]} = g(w_1^{[3]} ⋅ a^{[2]} + b_1^{[3]})
a_2^{[3]} = g(w_2^{[3]} ⋅ a^{[2]} + b_2^{[3]})
a_3^{[3]} = g(w_3^{[3]} ⋅ a^{[2]} + b_3^{[3]})