
---

### **1. Regression Formulas**

#### **Linear Regression**
- Hypothesis: $ \hat{y} = w^T x + b $
- Cost Function (Mean Squared Error - MSE):  
  $ MSE = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 $
- Gradient Descent Update Rule:  
  $ w = w - \alpha \frac{\partial MSE}{\partial w} $  
  $ b = b - \alpha \frac{\partial MSE}{\partial b} $
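
The update rule above can be sketched in plain Python for a single feature. This is a minimal illustration rather than a reference implementation: the function name `fit_linear_gd`, the learning rate, and the epoch count are assumptions chosen for a toy dataset.

```python
def fit_linear_gd(xs, ys, alpha=0.05, epochs=2000):
    """Fit y_hat = w*x + b by batch gradient descent on the MSE (one feature)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every sample under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # dMSE/dw = (2/n) * sum(error * x), dMSE/db = (2/n) * sum(error)
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= alpha * grad_w
        b -= alpha * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # toy data generated from y = 2x + 1
w, b = fit_linear_gd(xs, ys)
print(round(w, 4), round(b, 4))
```

On this noise-free data the fitted parameters should approach the generating values $ w = 2 $ and $ b = 1 $.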

#### **Lasso Regression (L1 Regularization)**
- Cost Function:  
  $ J(w) = MSE + \lambda \sum_{j=1}^m |w_j| $
- Regularization Term: $ \lambda \sum_{j=1}^m |w_j| $

#### **Ridge Regression (L2 Regularization)**
- Cost Function:  
  $ J(w) = MSE + \lambda \sum_{j=1}^m w_j^2 $
- Regularization Term: $ \lambda \sum_{j=1}^m w_j^2 $
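
The Lasso and Ridge costs differ only in the penalty term, which a short sketch makes concrete. The helper names (`mse`, `lasso_cost`, `ridge_cost`) and the sample values are illustrative assumptions, and the bias term is left unpenalized as in the formulas above.

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def lasso_cost(y_true, y_pred, weights, lam):
    """MSE plus the L1 penalty: lam * sum(|w_j|)."""
    return mse(y_true, y_pred) + lam * sum(abs(w) for w in weights)


def ridge_cost(y_true, y_pred, weights, lam):
    """MSE plus the L2 penalty: lam * sum(w_j^2)."""
    return mse(y_true, y_pred) + lam * sum(w ** 2 for w in weights)


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
weights = [3.0, -4.0]

lasso = lasso_cost(y_true, y_pred, weights, lam=0.1)  # 4/3 + 0.1 * 7
ridge = ridge_cost(y_true, y_pred, weights, lam=0.1)  # 4/3 + 0.1 * 25
print(lasso, ridge)
```

For the same $ \lambda $, the L2 penalty grows faster for large weights, while the L1 penalty keeps a constant slope and tends to push small weights to exactly zero.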

#### **Polynomial Regression**
- Hypothesis (degree $ d $): $ \hat{y} = w_0 + w_1 x + w_2 x^2 + \dots + w_d x^d $
- Cost Function: Same as Linear Regression (MSE), since the model is still linear in the weights.

#### **Support Vector Regression (SVR)**
- Hypothesis: $ \hat{y} = w^T x + b $
- Cost Function (ε-Insensitive Loss):  
  $ L_\epsilon(y, \hat{y}) = \begin{cases} 
  0 & \text{if } |y - \hat{y}| \leq \epsilon \\
  |y - \hat{y}| - \epsilon & \text{otherwise}
  \end{cases} $
- Regularization Term: $ \frac{1}{2} ||w||^2 $
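
The ε-insensitive loss is a one-liner in code. The function name `epsilon_insensitive` and the sample values are illustrative assumptions.

```python
def epsilon_insensitive(y, y_hat, eps):
    """Zero inside the epsilon tube, linear in the excess error outside it."""
    return max(0.0, abs(y - y_hat) - eps)


inside = epsilon_insensitive(1.0, 1.05, eps=0.1)   # error 0.05 is within the tube
outside = epsilon_insensitive(1.0, 1.5, eps=0.1)   # error 0.5 exceeds the tube by 0.4
print(inside, outside)
```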

---

### **2. Evaluation Metrics for Regression**

#### **Mean Absolute Error (MAE)**
$ MAE = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}_i| $

#### **Mean Squared Error (MSE)**
$ MSE = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 $

#### **Root Mean Squared Error (RMSE)**
$ RMSE = \sqrt{MSE} = \sqrt{\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2} $

#### **R-Squared (Coefficient of Determination)**
$ R^2 = 1 - \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{\sum_{i=1}^n (y_i - \bar{y})^2} $
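
The four regression metrics can be computed together from the same residuals. The following is a small sketch with an assumed helper name (`regression_metrics`) and made-up sample data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R^2 for paired targets and predictions."""
    n = len(y_true)
    residuals = [y - p for y, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r ** 2 for r in residuals) / n
    mean_y = sum(y_true) / n
    ss_res = sum(r ** 2 for r in residuals)          # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)  # total sum of squares
    return {
        "mae": mae,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "r2": 1.0 - ss_res / ss_tot,
    }


metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(metrics)
```

Note that $ R^2 $ is undefined when all targets are equal (the denominator is zero); this sketch does not guard against that case.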

---

### **3. Clustering**

#### **K-Means Clustering**
- Objective: Minimize the within-cluster sum of squares (WCSS):  
  $ WCSS = \sum_{j=1}^k \sum_{x_i \in C_j} ||x_i - \mu_j||^2 $
- Update Cluster Centers:  
  $ \mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i $
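
The assignment and update steps can be sketched as follows. This is a simplified version of Lloyd's algorithm under stated assumptions: initial centers are supplied by the caller, the iteration count is fixed, and an empty cluster keeps its previous center. The names `kmeans`, `wcss`, and `sq_dist` are illustrative.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, iterations=10):
    """Alternate nearest-center assignment and mean update."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        # Update step: each center moves to the mean of its assigned points.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


def wcss(centers, clusters):
    """Sum of squared distances from each point to its own cluster center."""
    return sum(sq_dist(p, c) for c, cluster in zip(centers, clusters) for p in cluster)


points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
total = wcss(centers, clusters)
print(centers, total)
```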

#### **DBSCAN**
- Density-Based Clustering:
  - Core Point: A point with at least `min_samples` points within radius `ε` of it.
  - Border Point: A non-core point that lies within `ε` of a core point.
  - Noise Point: A point that is neither a core nor a border point.
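
The three point types can be illustrated with a brute-force labeling pass. This sketch only classifies points and does not build clusters. It assumes that a point counts as its own neighbor when comparing against `min_samples`, which is one common convention; the function name `classify_points` is hypothetical.

```python
import math


def classify_points(points, eps, min_samples):
    """Label each point as 'core', 'border', or 'noise'."""
    def neighbors(i):
        # Indices of all points within eps of point i (point i included).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    core = {i for i in range(len(points)) if len(neighbors(i)) >= min_samples}
    labels = []
    for i in range(len(points)):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbors(i)):
            labels.append("border")
        else:
            labels.append("noise")
    return labels


points = [(0, 0), (0, 1), (1, 0), (2, 0), (10, 10)]
labels = classify_points(points, eps=1.0, min_samples=3)
print(labels)
```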

---

### **4. Ensemble Techniques**

#### **Boosting (AdaBoost)**
- Weighted Error:  
  $ \epsilon_t = \frac{\sum_{i=1}^n w_i \cdot I(y_i \neq \hat{y}_i)}{\sum_{i=1}^n w_i} $
- Model Weight:  
  $ \alpha_t = \frac{1}{2} \ln \left( \frac{1 - \epsilon_t}{\epsilon_t} \right) $
- Update Weights (labels $ y_i, \hat{y}_i \in \{-1, +1\} $, then normalize so the weights sum to 1):  
  $ w_i = w_i \cdot \exp(-\alpha_t \cdot y_i \cdot \hat{y}_i) $
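
One round of AdaBoost reweighting can be sketched as below. This version assumes labels in $ \{-1, +1\} $, uses the $ \exp(-\alpha_t y_i \hat{y}_i) $ form of the weight update (which matches $ \alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t} $), and renormalizes the weights afterwards. It also assumes $ 0 < \epsilon_t < 1 $; the function name `adaboost_round` is illustrative.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """Return (weighted error, model weight, new normalized sample weights)."""
    total = sum(weights)
    # Weighted fraction of misclassified samples.
    eps = sum(w for w, y, p in zip(weights, y_true, y_pred) if y != p) / total
    alpha = 0.5 * math.log((1.0 - eps) / eps)
    # Misclassified samples (y * p = -1) are scaled up, correct ones scaled down.
    new_weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, y_true, y_pred)]
    norm = sum(new_weights)
    return eps, alpha, [w / norm for w in new_weights]


weights = [0.25, 0.25, 0.25, 0.25]
y_true = [1, 1, -1, -1]
y_pred = [1, -1, -1, -1]  # the second sample is misclassified
eps, alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print(eps, alpha, new_weights)
```

After normalization, the misclassified samples carry half of the total weight, so the next weak learner is pushed toward the examples the current one got wrong.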

#### **Gradient Boosting**
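- Residual Calculation (the negative gradient of the squared-error loss):  
  $ r_i = y_i - \hat{y}_i $
- Model Update (weak learner $ h $ fit to the residuals, learning rate $ \alpha $):  
  $ \hat{y}_i = \hat{y}_i + \alpha \cdot h(x_i) $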
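
The residual and update steps can be sketched with a very small weak learner. This example assumes squared-error loss, 1-D inputs with at least two distinct values, and a regression stump (one threshold, two constant outputs) as $ h $. The names `fit_stump` and `gradient_boost` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump on 1-D inputs."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds; both sides stay non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def gradient_boost(xs, ys, alpha=0.5, rounds=50):
    """Start from the mean prediction and repeatedly fit stumps to residuals."""
    mean_y = sum(ys) / len(ys)
    preds = [mean_y] * len(ys)
    learners = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]          # r_i = y_i - y_hat_i
        h = fit_stump(xs, residuals)
        learners.append(h)
        preds = [p + alpha * h(x) for p, x in zip(preds, xs)]   # y_hat_i += alpha * h(x_i)
    model = lambda x: mean_y + alpha * sum(h(x) for h in learners)
    return model, preds


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 3.0, 4.0]
mean_y = sum(ys) / len(ys)
mse_before = sum((y - mean_y) ** 2 for y in ys) / len(ys)
model, preds = gradient_boost(xs, ys)
mse_after = sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)
print(mse_before, mse_after)
```

Each round shrinks the training error because the stump is a least-squares fit to the current residuals and only a fraction $ \alpha $ of it is added.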

---

### **5. Gradient Descent**

#### **General Update Rule**
$ w = w - \alpha \frac{\partial J(w)}{\partial w} $

#### **Stochastic Gradient Descent (SGD)**
- Update Rule for Each Sample:  
  $ w = w - \alpha \frac{\partial J(w, x_i, y_i)}{\partial w} $

#### **Mini-Batch Gradient Descent**
- Update Rule for a Batch of $ m $ Samples:  
  $ w = w - \alpha \frac{1}{m} \sum_{i=1}^m \frac{\partial J(w, x_i, y_i)}{\partial w} $
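
The three variants differ only in how many samples contribute to each update, which a single function with a batch-size parameter can show. This is a sketch for a one-feature linear model with squared-error loss; the function name `minibatch_gd`, the hyperparameters, and the fixed seed are assumptions. Setting `batch_size=1` gives SGD, and `batch_size=len(xs)` gives full-batch gradient descent.

```python
import random


def minibatch_gd(xs, ys, batch_size, alpha=0.05, epochs=500, seed=0):
    """Fit y_hat = w*x + b, averaging per-sample gradients over each batch."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the squared error, averaged over the batch.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= alpha * grad_w
            b -= alpha * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # toy data generated from y = 2x + 1
w, b = minibatch_gd(xs, ys, batch_size=2)
print(round(w, 3), round(b, 3))
```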

---

### **6. Classification Metrics**

#### **Accuracy**
$ \text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} $

#### **Precision**
$ \text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} $

#### **Recall (Sensitivity)**
$ \text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} $

#### **F1-Score**
$ \text{F1-Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} $

#### **Confusion Matrix**
|                | Predicted Positive | Predicted Negative |
|----------------|---------------------|---------------------|
| **Actual Positive** | True Positive (TP)   | False Negative (FN)  |
| **Actual Negative** | False Positive (FP)  | True Negative (TN)   |
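
All four metrics follow directly from the confusion-matrix counts. In this sketch the helper name `classification_metrics` is an assumption, and returning `0.0` when a denominator is zero is a convention chosen here, not part of the definitions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and the derived binary metrics."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    tn = sum(1 for y, p in pairs if y != positive and p != positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
result = classification_metrics(y_true, y_pred)
print(result)
```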

---

### **7. Clustering Metrics**

#### **Silhouette Score**
$ s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))} $
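- $ a(i) $: Average distance from point $ i $ to the other points in its own cluster.
- $ b(i) $: Average distance from point $ i $ to the points in the nearest other cluster (the minimum of these averages over all other clusters).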
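
The silhouette value of a single point can be computed directly from pairwise distances. This is a brute-force sketch using Euclidean distance; the function name `silhouette` is illustrative, and returning `0.0` for a point that is alone in its cluster is an assumed convention.

```python
import math


def silhouette(i, points, labels):
    """Silhouette value s(i) for point i, given points and their cluster labels."""
    own = [math.dist(points[i], q)
           for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
    if not own:
        return 0.0  # singleton cluster: convention assumed here
    a = sum(own) / len(own)
    # Mean distance to each other cluster; b is the smallest of these.
    other_means = []
    for label in set(labels) - {labels[i]}:
        dists = [math.dist(points[i], q)
                 for j, q in enumerate(points) if labels[j] == label]
        other_means.append(sum(dists) / len(dists))
    b = min(other_means)
    return (b - a) / max(a, b)


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
s0 = silhouette(0, points, labels)  # a = 1.0, b = 10.5
print(s0)
```

Values close to 1 indicate a point that sits well inside its cluster; values near 0 or below suggest it lies between clusters or is misassigned.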

#### **Davies-Bouldin Index**
$ DB = \frac{1}{k} \sum_{i=1}^k \max_{j \neq i} \left( \frac{\sigma_i + \sigma_j}{d(c_i, c_j)} \right) $
- $ \sigma_i $: Average distance from the points in cluster $ i $ to its center $ c_i $.
- $ d(c_i, c_j) $: Distance between cluster centers.
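
The index can be computed from the clusters alone, as in this sketch. It assumes Euclidean distance, centroids as cluster centers, at least two clusters, and distinct centers; the function name `davies_bouldin` is illustrative.

```python
import math


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters, each a list of point tuples."""
    centers = [tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
               for cluster in clusters]
    # sigma_i: mean distance from the points of cluster i to its center.
    sigmas = [sum(math.dist(p, center) for p in cluster) / len(cluster)
              for cluster, center in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max((sigmas[i] + sigmas[j]) / math.dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k


clusters = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(10.0, 0.0), (10.0, 2.0)],
]
db = davies_bouldin(clusters)  # sigmas are 1.0 each, centers are 10.0 apart
print(db)
```

Lower values indicate more compact, better-separated clusters.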

---
