- Created a repository on GitHub and copied its HTTPS URL.
- Opened a terminal and ran `git clone http..`.
- Edited a file.

- `git status`
- `git add .`
- `git commit -m 'msg'`
- `git push -u origin main`

# 🧮 Forward Propagation: Feedforward Neural Network

## 🧠 Network Architecture:
- **Input:** 2 features: $x = [1, 2]$
- **Hidden Layer 1:** 2 neurons (ReLU)
- **Hidden Layer 2:** 2 neurons (ReLU)
- **Output Layer:** 1 neuron (Sigmoid)

---

## 🔧 Parameters:

### Hidden Layer 1
$$
W^{[1]} = \begin{bmatrix} 0.1 & 0.2 \\ 0.3 & 0.4 \end{bmatrix}, \quad
b^{[1]} = \begin{bmatrix} 0.1 \\ 0.2 \end{bmatrix}
$$

### Hidden Layer 2
$$
W^{[2]} = \begin{bmatrix} 0.5 & 0.6 \\ 0.7 & 0.8 \end{bmatrix}, \quad
b^{[2]} = \begin{bmatrix} 0.3 \\ 0.4 \end{bmatrix}
$$

### Output Layer
$$
W^{[3]} = \begin{bmatrix} 0.9 & 1.0 \end{bmatrix}, \quad
b^{[3]} = \begin{bmatrix} 0.5 \end{bmatrix}
$$

---

## ➕ Step-by-step Calculations

### 📥 Input
$$
x = \begin{bmatrix} 1 \\ 2 \end{bmatrix}
$$

---

### 🔹 Layer 1: Linear + ReLU
$$
z^{[1]} = W^{[1]}x + b^{[1]} = \begin{bmatrix} 0.1 & 0.2 \\ 0.3 & 0.4 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 0.1 \\ 0.2 \end{bmatrix}
= \begin{bmatrix} 0.6 \\ 1.3 \end{bmatrix}
$$

**ReLU Activation:**
$$
a^{[1]} = \max(0, z^{[1]}) = \begin{bmatrix} 0.6 \\ 1.3 \end{bmatrix}
$$
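
The Layer 1 step can be checked with a few lines of plain Python. This is a minimal sketch: the helper names `dense` and `relu` are illustrative choices, not part of any library, and the numbers are copied from the parameters above.

```python
# Parameters and input copied from the worked example.
W1 = [[0.1, 0.2], [0.3, 0.4]]
b1 = [0.1, 0.2]
x = [1.0, 2.0]

def dense(W, v, b):
    """Return W @ v + b, with W stored as a list of rows."""
    return [sum(w * vi for w, vi in zip(row, v)) + bi for row, bi in zip(W, b)]

def relu(v):
    """Apply max(0, z) to every element."""
    return [max(0.0, z) for z in v]

z1 = dense(W1, x, b1)
a1 = relu(z1)
print([round(z, 4) for z in z1])  # [0.6, 1.3]
print([round(a, 4) for a in a1])  # [0.6, 1.3]
```

Both pre-activations are positive, so ReLU passes them through unchanged.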

---

### 🔹 Layer 2: Linear + ReLU
$$
z^{[2]} = W^{[2]}a^{[1]} + b^{[2]} = \begin{bmatrix} 0.5 & 0.6 \\ 0.7 & 0.8 \end{bmatrix} \begin{bmatrix} 0.6 \\ 1.3 \end{bmatrix} + \begin{bmatrix} 0.3 \\ 0.4 \end{bmatrix}
= \begin{bmatrix} 1.38 \\ 1.86 \end{bmatrix}
$$
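
Expanding the matrix product row by row makes the arithmetic easier to check:

$$
z^{[2]}_1 = (0.5)(0.6) + (0.6)(1.3) + 0.3 = 0.30 + 0.78 + 0.30 = 1.38
$$

$$
z^{[2]}_2 = (0.7)(0.6) + (0.8)(1.3) + 0.4 = 0.42 + 1.04 + 0.40 = 1.86
$$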

**ReLU Activation:**
$$
a^{[2]} = \max(0, z^{[2]}) = \begin{bmatrix} 1.38 \\ 1.86 \end{bmatrix}
$$

---

### 🔹 Output Layer: Linear + Sigmoid
$$
z^{[3]} = W^{[3]}a^{[2]} + b^{[3]} = \begin{bmatrix} 0.9 & 1.0 \end{bmatrix} \begin{bmatrix} 1.38 \\ 1.86 \end{bmatrix} + 0.5 = 3.602
$$
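
Written out term by term, the dot product is:

$$
z^{[3]} = (0.9)(1.38) + (1.0)(1.86) + 0.5 = 1.242 + 1.860 + 0.500 = 3.602
$$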

**Sigmoid Activation:**
$$
\hat{y} = \sigma(z^{[3]}) = \frac{1}{1 + e^{-3.602}} \approx 0.9735
$$

---

## ✅ Final Output:
$$
\hat{y} \approx 0.9735
$$
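
The whole forward pass can be reproduced end to end in plain Python. This is a sketch under the assumption that each layer is stored as a `(weights, bias, activation)` tuple; that layout and the helper names are illustrative, while the numbers come from the Parameters section.

```python
import math

def dense(W, v, b):
    """Return W @ v + b, with W stored as a list of rows."""
    return [sum(w * vi for w, vi in zip(row, v)) + bi for row, bi in zip(W, b)]

def relu(v):
    return [max(0.0, z) for z in v]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

# One (weights, bias, activation) tuple per layer.
layers = [
    ([[0.1, 0.2], [0.3, 0.4]], [0.1, 0.2], relu),
    ([[0.5, 0.6], [0.7, 0.8]], [0.3, 0.4], relu),
    ([[0.9, 1.0]], [0.5], sigmoid),
]

a = [1.0, 2.0]  # input x
for W, b, activation in layers:
    a = activation(dense(W, a, b))

y_hat = a[0]
print(round(y_hat, 4))  # 0.9735
```

To four decimal places the output is 0.9735, since $e^{-3.602} \approx 0.02727$ and $1 / 1.02727 \approx 0.97345$.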


# 🔁 Common Activation Functions in Neural Networks

---

## 1. Sigmoid (Logistic)

**Equation:**
$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$

**Range:**  
$ (0, 1) $

**Use When:**
- Binary classification (used in the output layer)
- Models requiring probabilities

**Pros:** Smooth gradient, probabilistic output  
**Cons:** Gradient vanishes for inputs of large magnitude (positive or negative), where the curve saturates
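
The saturation effect is easy to see numerically. The sketch below is one possible implementation, not a library API: it branches on the sign of `x` to avoid overflow in `math.exp`, and uses the standard identity $\sigma'(x) = \sigma(x)(1 - \sigma(x))$ for the gradient.

```python
import math

def sigmoid(x):
    """Logistic function, written to avoid overflow for very negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sigmoid_grad(x):
    """Derivative sigma(x) * (1 - sigma(x)); its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (0.0, 5.0, 10.0):
    # The gradient shrinks rapidly as |x| grows.
    print(x, sigmoid(x), sigmoid_grad(x))
```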

---

## 2. Tanh (Hyperbolic Tangent)

**Equation:**
$$
\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
$$

**Range:**  
$ (-1, 1) $

**Use When:**
- Hidden layers when zero-centered output is preferred

**Pros:** Stronger gradients than sigmoid  
**Cons:** Still suffers from vanishing gradients for inputs of large magnitude
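
A short sketch comparing tanh with sigmoid, assuming only the standard library. It checks the identity $\tanh(x) = 2\sigma(2x) - 1$ and prints the derivative $1 - \tanh^2(x)$, which equals 1 at $x = 0$ (versus 0.25 for sigmoid).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

for x in (-2.0, 0.0, 2.0):
    rescaled_sigmoid = 2.0 * sigmoid(2.0 * x) - 1.0  # should match tanh(x)
    print(x, math.tanh(x), rescaled_sigmoid, tanh_grad(x))
```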

---

## 3. ReLU (Rectified Linear Unit)

**Equation:**
$$
f(x) = \max(0, x)
$$

**Range:**  
$ [0, \infty) $

**Use When:**
- Default for hidden layers
- Most popular due to simplicity and performance

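**Pros:** Cheap to compute; gradient does not vanish for positive inputs  
**Cons:** Neurons can "die": if the pre-activation stays negative, both the output and the gradient are zero, so the weights stop updating ("dying ReLU")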
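
The dying-ReLU case can be illustrated with a single neuron. The weight `w`, bias `b`, and inputs below are hypothetical values chosen so that the pre-activation is always negative.

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # 1 for x > 0, 0 for x < 0; using 0 at x == 0 is a common convention.
    return 1.0 if x > 0 else 0.0

w, b = 0.5, -10.0              # hypothetical neuron with a large negative bias
inputs = [0.0, 1.0, 2.0, 3.0]  # hypothetical inputs

pre_activations = [w * x + b for x in inputs]
outputs = [relu(z) for z in pre_activations]
grads = [relu_grad(z) for z in pre_activations]
print(outputs)  # [0.0, 0.0, 0.0, 0.0]
print(grads)    # [0.0, 0.0, 0.0, 0.0]
```

With a zero gradient on every example, gradient descent has no signal to move `w` or `b` back into the active region.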

---

## 4. Leaky ReLU

**Equation:**
$$
f(x) = \begin{cases}
x & \text{if } x > 0 \\
\alpha x & \text{if } x \leq 0
\end{cases}
$$

(typically $ \alpha = 0.01 $)

**Range:**  
$ (-\infty, \infty) $

**Use When:**
- Fix for dying ReLU problem

**Pros:** Allows small gradient when $x < 0$  
**Cons:** Small slope may still cause slow learning
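
A minimal sketch of Leaky ReLU and its gradient, using the typical $\alpha = 0.01$ from above as the default argument:

```python
def leaky_relu(x, alpha=0.01):
    """x for positive inputs, alpha * x otherwise."""
    return x if x > 0 else alpha * x

def leaky_relu_grad(x, alpha=0.01):
    """Slope is 1 on the positive side and alpha on the negative side."""
    return 1.0 if x > 0 else alpha

print(leaky_relu(3.0))        # 3.0
print(leaky_relu(-2.0))       # -0.02
print(leaky_relu_grad(-2.0))  # 0.01
```

Unlike plain ReLU, the gradient for negative inputs is small but never exactly zero.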

---

## 5. Softmax

**Equation (for multi-class classification):**
$$
\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}
$$

**Range:**  
$ (0, 1) $, and all outputs sum to 1

**Use When:**
- Multi-class classification (used in the output layer)

**Pros:** Gives probability distribution over classes  
**Cons:** Rarely used as a hidden-layer activation
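
A sketch of softmax in plain Python. Subtracting the maximum logit before exponentiating is a common stability trick that leaves the result unchanged; the logits are hypothetical.

```python
import math

def softmax(z):
    """Softmax over a list of logits, shifted by max(z) to avoid overflow."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three classes
probs = softmax(logits)
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The outputs are all in $(0, 1)$ and sum to 1 up to floating-point rounding.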

---

## Summary Table

| Activation     | Range             | Use Case                     |
|----------------|------------------|------------------------------|
| Sigmoid        | (0, 1)           | Binary classification output |
| Tanh           | (-1, 1)          | Hidden layers (zero-centered)|
| ReLU           | [0, ∞)           | Most hidden layers           |
| Leaky ReLU     | (-∞, ∞)          | Hidden layers (ReLU fix)     |
| Softmax        | (0, 1), sum=1    | Output of multi-class model  |


# 🧱 Common Keras Layers and Their Usage

Keras provides a wide range of layers to build neural network architectures. Below are some of the most commonly used layers:

---

## 🔹 Dense Layer (Fully Connected Layer)

- Connects every input to every neuron in the layer; each neuron computes a weighted sum of all inputs plus a bias.
- Commonly used in feedforward neural networks and as the output layer.
- Supports activation functions like ReLU, Sigmoid, Softmax, etc.

**Use Case:** Classification, regression, MLPs

---

## 🔹 Activation Layer

- Applies an activation function to the inputs.
- Can be used standalone or with layers like Dense and Conv2D.

**Use Case:** Adds non-linearity (e.g., ReLU, Sigmoid, Tanh, Softmax)

---

## 🔹 Input Layer

- Defines the shape and data type of the input.
- Required when using the Functional API.

**Use Case:** Entry point for models built using Functional API

---

## 🔹 Dropout Layer

- Randomly sets a fraction of input units to zero during training and scales the remaining units by $1 / (1 - \text{rate})$; it does nothing at inference time.
- Helps prevent overfitting by promoting generalization.

**Use Case:** Between layers in deep networks to regularize
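
The mechanism can be sketched in plain Python. This is an illustration of inverted dropout, not the Keras implementation: the function name, the `rng` parameter, and the seed are assumptions made for the example.

```python
import random

def dropout(values, rate, training=True, rng=random):
    """Inverted dropout on a list of floats.

    During training each unit is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate), so the expected sum is unchanged.
    At inference time the input is returned as is.
    """
    if not training or rate <= 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]

rng = random.Random(0)  # seeded only so the sketch is repeatable
activations = [1.0] * 8
print(dropout(activations, rate=0.5, rng=rng))        # a mix of 0.0 and 2.0
print(dropout(activations, rate=0.5, training=False)) # unchanged
```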

---

## 🔹 Flatten Layer

- Converts each multi-dimensional input sample into a 1D vector (the batch dimension is kept).
- Typically used to move from convolutional to dense layers.

**Use Case:** In CNNs before feeding into fully connected layers
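
What flattening does to one sample can be sketched with nested lists. The recursive `flatten` helper and the small `feature_maps` value are illustrative only.

```python
def flatten(sample):
    """Recursively flatten a nested list (one sample) into a 1D list."""
    if not isinstance(sample, list):
        return [sample]
    out = []
    for item in sample:
        out.extend(flatten(item))
    return out

# One hypothetical sample of shape (2, 2, 2).
feature_maps = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
print(flatten(feature_maps))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

A sample of shape (2, 2, 2) becomes a vector of length 8, which a Dense layer can then consume.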

---

## 🔹 Conv2D Layer

- Performs 2D convolution on input images.
- Extracts spatial features like edges, textures, and patterns.

**Use Case:** Image classification, object detection
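
To make the edge-extraction idea concrete, here is a naive single-channel sketch with no padding and stride 1. As is usual in deep learning code, the kernel is applied without flipping (cross-correlation). The function name, image, and kernel are hypothetical; a real Conv2D layer also handles channels, multiple filters, and a bias.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both 2D lists) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Dark left half, bright right half: a vertical edge in the middle.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [[-1, 1]]  # horizontal difference filter
print(conv2d_valid(image, kernel))  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

The output is nonzero only where the pixel value changes from left to right, which is the column where the edge sits.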

---

## 🔹 MaxPooling2D Layer

- Reduces the spatial size of feature maps by keeping only the maximum value in each pooling window.
- Helps in downsampling and reducing computation.

**Use Case:** After Conv2D to reduce spatial dimensions
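
A plain-Python sketch of max pooling on one feature map, assuming a 2×2 window with stride 2. The function name and the sample values are illustrative.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Keep the maximum of each size x size window of a 2D list."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [
        [
            max(fmap[i * stride + di][j * stride + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

fmap = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]
print(max_pool2d(fmap))  # [[6, 4], [8, 9]]
```

The 4×4 map shrinks to 2×2, so later layers process a quarter as many values.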

---

## 🔹 BatchNormalization Layer

- Normalizes each feature using the mean and variance computed across the current batch, then applies a learned scale and shift.
- Often speeds up and stabilizes training.

**Use Case:** Between layers to stabilize and accelerate training
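
The training-time computation for a single feature can be sketched as follows. The names `gamma`, `beta`, and `eps` and their default values are assumptions for illustration; a real layer learns `gamma` and `beta` and switches to running averages of the mean and variance at inference time.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one feature across a batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((v - mean) ** 2 for v in batch) / n
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]

batch = [1.0, 2.0, 3.0, 4.0]  # hypothetical values of one feature
normed = batch_norm(batch)
print([round(v, 3) for v in normed])
```

After normalization the values have a mean of about 0 and a variance of about 1, whatever scale the inputs had.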

---

## 🔹 LSTM Layer

- A type of recurrent neural network (RNN) layer whose gated cell state helps it capture long-term dependencies.
- Suitable for sequential and time-series data.

**Use Case:** Text processing, speech recognition, time series prediction
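
How the gates carry information across time steps can be sketched with a scalar LSTM cell. This follows the standard gate equations with one input and one hidden unit; the `params` layout and every weight value are hypothetical, and a real layer uses weight matrices instead of scalars.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, params):
    """One time step. `params` maps gate name -> (w_x, w_h, bias)."""
    def pre(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h + b

    i = sigmoid(pre("i"))    # input gate: how much new information to write
    f = sigmoid(pre("f"))    # forget gate: how much old cell state to keep
    o = sigmoid(pre("o"))    # output gate: how much cell state to expose
    g = math.tanh(pre("g"))  # candidate value to write
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new

params = {  # hypothetical weights
    "i": (0.5, 0.1, 0.0),
    "f": (0.5, 0.1, 1.0),
    "o": (0.5, 0.1, 0.0),
    "g": (0.5, 0.1, 0.0),
}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.5]:  # hypothetical input sequence
    h, c = lstm_step(x, h, c, params)
    print(round(h, 4), round(c, 4))
```

When the forget gate is close to 1 and the input gate close to 0, the cell state passes through a step almost unchanged, which is what lets the layer remember information over many steps.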

---

## ✅ Summary Table

| Layer               | Description                                | Common Use Case                     |
|--------------------|--------------------------------------------|-------------------------------------|
| Dense              | Fully connected layer                      | Classification, regression          |
| Activation         | Applies non-linearity                     | All types of models                 |
| Input              | Specifies input shape                      | Functional API                      |
| Dropout            | Randomly disables neurons                 | Preventing overfitting              |
| Flatten            | Flattens input                            | CNN to Dense transition             |
| Conv2D             | 2D convolution for images                 | Image processing, CNNs              |
| MaxPooling2D       | Downsamples feature maps                  | Reducing spatial dimensions         |
| BatchNormalization | Normalizes layer input                    | Speeds up and stabilizes training   |
| LSTM               | Memory for sequences                      | Time series, text, sequences        |
