# ⚙️ Machine Learning Hyperparameters


## 1️⃣ Learning Rate (η)

### 🔹 What:

It’s the step size that controls how much we update the model’s weights during training.  


### 🔹 Why:

It decides how fast or slow your model “learns”.

- If it’s too high → the model takes huge steps and overshoots the minimum, so the loss oscillates or diverges instead of converging.  
- If it’s too low → the model learns very slowly and needs many more steps to reach the minimum.
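
To make the two failure modes concrete, here is a minimal sketch that runs plain gradient descent on the toy function f(w) = w², whose minimum is at w = 0. The function name `gradient_descent`, the starting point, and the three learning rates are illustrative assumptions, not values to copy into a real model.

```python
def gradient_descent(lr, steps=20, w0=5.0):
    """Minimize f(w) = w**2 (gradient is 2*w), starting from w0."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # update rule: w <- w - lr * gradient
    return w

# lr = 1.1: each step multiplies w by (1 - 2.2) = -1.2, so |w| keeps growing
too_high = gradient_descent(lr=1.1)
# lr = 0.001: each step multiplies w by 0.998, so w barely moves
too_low = gradient_descent(lr=0.001)
# lr = 0.1: each step multiplies w by 0.8, so w shrinks steadily toward 0
reasonable = gradient_descent(lr=0.1)

print(too_high, too_low, reasonable)
```

On this toy problem the high rate moves away from the minimum, the low rate is still close to its starting point after 20 steps, and the middle rate ends up near zero.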


### 🔹 How:

You usually start with a small value like `0.01` or `0.001`, and sometimes use **Learning Rate Schedulers** (to decrease it over time).
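
As a rough illustration of what a scheduler does, the sketch below implements step decay in plain Python: the learning rate is halved every fixed number of epochs. The name `step_decay` and its parameters (`drop`, `epochs_per_drop`) are hypothetical choices for this example, and step decay is only one of several possible schedules.

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Return the learning rate for a given (0-based) epoch.

    The rate is multiplied by `drop` once every `epochs_per_drop` epochs.
    """
    return initial_lr * drop ** (epoch // epochs_per_drop)

# Learning rate at a few points in training, starting from 0.01
for epoch in (0, 9, 10, 25):
    print(epoch, step_decay(0.01, epoch))
```

In Keras, built-in schedules such as `tf.keras.optimizers.schedules.ExponentialDecay` play a similar role.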

### 🔹 Example:
```python
from tensorflow.keras.optimizers import Adam

optimizer = Adam(learning_rate=0.001)  # step size of 0.001
```

---

##  2️⃣ Batch Size

### 🔹 What:
The number of training samples processed before the model updates its weights once.


### 🔹 Why:
To balance **speed**, **memory use**, and **gradient stability**.

- **Large batch size →** smoother gradient estimate (stable learning), but uses more memory and gives fewer weight updates per epoch.  
- **Small batch size →** more frequent updates and less memory, but noisier gradients (less stable).


### 🔹 How:
You divide the dataset into **mini-batches** (e.g., 32, 64, 128).  
Each mini-batch runs through the network → loss computed → weights updated.
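
The splitting step can be sketched in plain Python. The helper `iterate_minibatches` and the 200-sample toy dataset are assumptions made for illustration; frameworks such as Keras do this internally when you pass `batch_size`.

```python
def iterate_minibatches(samples, batch_size):
    """Yield consecutive slices of `samples`, each with at most `batch_size` items."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

data = list(range(200))  # stand-in for 200 training samples
batches = list(iterate_minibatches(data, batch_size=64))

# One weight update happens per mini-batch, so this is the number of updates per epoch
updates_per_epoch = len(batches)
print(updates_per_epoch, [len(b) for b in batches])
```

When the dataset size is not a multiple of the batch size, the last mini-batch is simply smaller than the others.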


### 🔹 Example:
```python
# Assumes `model` is a compiled Keras model and X_train / y_train are training arrays
model.fit(X_train, y_train, batch_size=64, epochs=10)
```

---

##  3️⃣ Number of Epochs

### 🔹 What:
One **epoch** = one full pass through the entire training dataset.


### 🔹 Why:
You rarely learn everything in one go.

- **Too few epochs →** underfitting (model hasn’t learned enough).  
- **Too many epochs →** overfitting (model memorizes training data).


### 🔹 How:
You set it as a number, for example `epochs=50`, and use **Early Stopping** to halt training once validation performance stops improving.
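
The idea behind early stopping can be sketched without any framework: track the best validation loss seen so far and stop once it has not improved for a set number of epochs (the "patience"). The function `early_stopping_epoch` and the list of validation losses below are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop.

    Stops once the validation loss has failed to improve on its best
    value for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)  # never triggered: train for all epochs

# Hypothetical validation losses: best value at epoch 4, then no improvement
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.60]
stopped_at = early_stopping_epoch(val_losses, patience=3)
print(stopped_at)
```

Keras ships a comparable callback, `tf.keras.callbacks.EarlyStopping`, which takes `monitor` and `patience` arguments.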


### 🔹 Example:
```python
# Assumes `model` is a compiled Keras model and X_train / y_train are training arrays
model.fit(X_train, y_train, epochs=50)
```

### 🔹 Analogy:

Think of epochs as study revisions:

- Read your textbook (the dataset) once → you forget most of it (underfit).
- Reread it 100 times → you memorize the examples but miss the concepts (overfit).
- Read it 5–10 times, focusing on the key ideas → a good balance.

---

##  4️⃣ Optimizer

### 🔹 What:
It’s the **algorithm that adjusts the model’s weights** to minimize the loss function using gradients.


### 🔹 Why:
Because **plain gradient descent** is simple but can be slow and sensitive to the learning rate.  
Optimizers make learning **faster** and **more stable**.


### 🔹 How:
They modify how gradients update weights. Common optimizers include:

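- **SGD:** Stochastic gradient descent; updates the weights using the gradient of each mini-batch.  
- **Momentum:** Adds “inertia” by accumulating past gradients, which smooths the path.  
- **RMSProp:** Adjusts the step size per parameter using a running average of squared gradients.  
- **Adam:** Combines momentum with RMSProp-style per-parameter step sizes (a common default).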
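
To show how these optimizers differ in the way they turn a gradient into a weight update, here is a simplified single-parameter sketch of SGD, momentum, and Adam. The function names are hypothetical, the default constants are commonly used values rather than anything prescribed by this document, and real libraries differ in details (for example, in how the momentum term is scaled).

```python
import math

def sgd_step(w, grad, lr):
    """Plain SGD: move against the gradient."""
    return w - lr * grad

def momentum_step(w, grad, velocity, lr, beta=0.9):
    """SGD with momentum: accumulate past gradients in `velocity`."""
    velocity = beta * velocity + grad
    return w - lr * velocity, velocity

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `t` is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# One update from w = 5.0 with gradient 10.0 under each rule
print(sgd_step(5.0, 10.0, lr=0.1))
print(momentum_step(5.0, 10.0, velocity=0.0, lr=0.1))
print(adam_step(5.0, 10.0, m=0.0, v=0.0, t=1))
```

Note that the first Adam step has a size close to `lr` regardless of how large the gradient is, which is one reason it tends to be less sensitive to gradient scale than plain SGD.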


### 🔹 Example:
```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss='mse')  # assumes `model` is an existing Keras model
```

---

##  5️⃣ Loss Function

### 🔹 What:
A **mathematical function** that measures how far your model’s predictions are from the actual values.

### 🔹 Why:
It’s what the model tries to **minimize** during training.  
Without it, the model has **no idea** if it’s improving or not.

### 🔹 How:
The type of loss function depends on the **problem type**:

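- **Regression →** Mean Squared Error (MSE): the average of the squared differences between predictions and actual values.
- **Classification →** Cross Entropy / Log Loss: penalizes predicted probabilities that are far from the true class, most heavily when the model is confidently wrong.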
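
Both losses are short enough to write out by hand. The sketch below is a plain-Python version for illustration only; the function names and the `eps` clipping value are assumptions, and the cross-entropy shown is the binary case (labels 0 or 1, predictions as probabilities).

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Average log loss for binary labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Regression: one prediction is off by 2, the other two are exact
print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))

# Classification: a confident correct prediction costs less than a confident wrong one
print(binary_cross_entropy([1], [0.9]), binary_cross_entropy([1], [0.1]))
```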

### 🔹 Example:
```python
# Binary classification; assumes `model` is an existing Keras model
model.compile(optimizer='adam', loss='binary_crossentropy')
```