# **Mathematical Calculations Related to Finding Accuracy of Machine Learning Algorithms**

In machine learning, evaluating the performance of a model is crucial to understanding how well it generalizes to unseen data. Several mathematical metrics are used to assess the accuracy and effectiveness of models, especially for classification tasks. Below are the key metrics and calculations related to the accuracy of machine learning algorithms.

---

## **1. Accuracy**

**Definition**:
Accuracy is one of the simplest and most commonly used metrics for evaluating classification models. It measures the proportion of correct predictions made by the model.

### Formula:
The accuracy is calculated as the ratio of correct predictions to the total number of predictions:

$$ \text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}} $$

Alternatively, in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN):

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} $$

Where:
- **TP (True Positive)**: Correctly predicted positive cases
- **TN (True Negative)**: Correctly predicted negative cases
- **FP (False Positive)**: Negative cases incorrectly predicted as positive (Type I error)
- **FN (False Negative)**: Positive cases incorrectly predicted as negative (Type II error)
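
As a minimal sketch of the formula above, the following Python function computes accuracy from the four counts. The function name and the example counts are illustrative assumptions, not taken from any particular library.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


# Hypothetical counts: 40 TP, 50 TN, 5 FP, 5 FN
print(accuracy(40, 50, 5, 5))  # 0.9
```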

---

## **2. Precision**

**Definition**:
Precision (also called Positive Predictive Value) measures the proportion of positive predictions that are actually correct. It is particularly important in situations where false positives are costly (e.g., fraud detection).

### Formula:
$$ \text{Precision} = \frac{TP}{TP + FP} $$

---

## **3. Recall (Sensitivity or True Positive Rate)**

**Definition**:
Recall (also called Sensitivity or True Positive Rate) measures the proportion of actual positive cases that were correctly identified by the model. It is useful when false negatives are critical (e.g., in medical diagnosis).

### Formula:
$$ \text{Recall} = \frac{TP}{TP + FN} $$

---

## **4. F1-Score**

**Definition**:
The F1-score is the harmonic mean of precision and recall, combining both into a single score. Because the harmonic mean is dominated by the smaller of the two values, a high F1-score requires both precision and recall to be high, which makes it useful when you need a balance between them.

### Formula:
$$ \text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $$

Alternatively:

$$ \text{F1-Score} = 2 \times \frac{TP}{2TP + FP + FN} $$
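
The precision, recall, and F1 formulas can be sketched together from the same counts. This is an illustrative helper with hypothetical example counts; returning 0.0 when a denominator is zero is an assumption made here, not something the formulas prescribe.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion-matrix counts."""
    # Assumption: report 0.0 when a ratio is undefined (zero denominator).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 40 TP, 10 FP, 20 FN
p, r, f1 = precision_recall_f1(40, 10, 20)
print(round(p, 4), round(r, 4), round(f1, 4))  # 0.8 0.6667 0.7273
```

Note that the F1 value computed from the counts equals the harmonic mean of the two other values, as the two formulas above imply.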

---

## **5. Specificity (True Negative Rate)**

**Definition**:
Specificity measures the proportion of actual negative cases that are correctly identified. It is useful in scenarios where false positives need to be minimized.

### Formula:
$$ \text{Specificity} = \frac{TN}{TN + FP} $$

---

## **6. ROC Curve and AUC (Area Under the Curve)**

**Definition**:
The **Receiver Operating Characteristic (ROC)** curve is a graphical representation of the true positive rate (recall) against the false positive rate. The **Area Under the Curve (AUC)** provides an aggregate measure of the model’s ability to discriminate between classes.

### Formula:
The **false positive rate (FPR)** and **true positive rate (TPR)** are calculated as follows:

$$ \text{FPR} = \frac{FP}{FP + TN} $$

$$ \text{TPR} = \frac{TP}{TP + FN} $$

The **AUC** is the area under the ROC curve and represents the model's ability to rank positive instances higher than negative ones. An AUC of 0.5 means the model is no better than random guessing, while an AUC of 1.0 indicates perfect performance.
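
The ranking interpretation of AUC can be sketched directly: it is the fraction of (positive, negative) pairs in which the positive sample gets the higher score, with ties counted as one half. The code below is a small pure-Python illustration with hypothetical function names and data. It compares every pair, so it is only suitable for small inputs.

```python
def roc_point(y_true, scores, threshold):
    """Return (FPR, TPR) when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    return fp / (fp + tn), tp / (tp + fn)


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_point(y_true, scores, 0.5))  # (0.0, 0.5)
print(roc_auc(y_true, scores))         # 0.75
```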

---

## **7. Matthews Correlation Coefficient (MCC)**

**Definition**:
The Matthews Correlation Coefficient (MCC) uses all four confusion matrix categories (TP, TN, FP, FN), so it is high only when the model performs well on both the positive and the negative class. This makes it especially useful when dealing with imbalanced datasets.

### Formula:
$$ \text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} $$

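The MCC returns a value between -1 and 1:
- **1**: Perfect prediction
- **0**: No better than random prediction
- **-1**: Total disagreement between predictions and actual labels

The MCC is undefined when any of the four sums in the denominator is zero; a common convention is to report 0 in that case.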
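
A minimal Python sketch of the MCC formula follows. The function name and example counts are illustrative; returning 0.0 when the denominator is zero is an assumed convention, since the formula itself is undefined there.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention for the undefined case
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 40 TP, 50 TN, 5 FP, 5 FN
print(round(mcc(40, 50, 5, 5), 4))  # 0.798
```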

---

## **8. Logarithmic Loss (Log Loss)**

**Definition**:
Log Loss measures the performance of a classification model whose output is a probability between 0 and 1. It penalizes every prediction according to how far the predicted probability is from the true label, and the penalty grows sharply for predictions that are confident but wrong.

### Formula:
For a binary classification, Log Loss is given as:

$$ \text{Log Loss} = - \frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right] $$

Where:
- \( y_i \) is the true label (0 or 1)
- \( p_i \) is the predicted probability of the positive class for sample \( i \)
- \( N \) is the total number of samples

A lower log loss indicates better model performance.
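
The binary log loss formula can be sketched as below. The function name and the `eps` clipping parameter are assumptions added for illustration: clipping keeps `log(0)` from occurring when a predicted probability is exactly 0 or 1.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log loss; p_pred holds predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# Hypothetical labels and probabilities
print(round(log_loss([1, 0], [0.9, 0.1]), 4))  # 0.1054

# A confident wrong prediction costs more than a hesitant wrong one
print(log_loss([1], [0.01]) > log_loss([1], [0.4]))  # True
```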

---

## **9. Confusion Matrix**

**Definition**:
The confusion matrix is a table used to evaluate the performance of a classification model. It summarizes the true positives, true negatives, false positives, and false negatives.

### Formula:
The matrix structure is:

|              | Predicted Positive | Predicted Negative |
|--------------|--------------------|--------------------|
| **Actual Positive** | TP                 | FN                 |
| **Actual Negative** | FP                 | TN                 |

From this matrix, various metrics like accuracy, precision, recall, F1-score, and others can be derived.
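
As a sketch of how the four cells are obtained in practice, the function below counts them from paired lists of actual and predicted labels. The function name, the `positive` parameter, and the example labels are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for binary labels."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


# Hypothetical labels
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (2, 2, 1, 1)
```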

---

## **10. Hamming Loss**

**Definition**:
Hamming Loss is used for multi-label classification problems, and it measures the fraction of incorrect labels across all samples.

### Formula:
$$ \text{Hamming Loss} = \frac{1}{N \cdot L} \sum_{i=1}^{N} \sum_{j=1}^{L} \mathbb{I}(y_{ij} \neq \hat{y}_{ij}) $$

Where:
- \( N \) is the number of samples
- \( L \) is the number of labels
- \( \mathbb{I}(y_{ij} \neq \hat{y}_{ij}) \) is an indicator function that is 1 if the true label is not equal to the predicted label for the \( j \)-th label of the \( i \)-th sample, and 0 otherwise.
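
A small sketch of the Hamming Loss formula, assuming each sample is given as a list of 0/1 label indicators of equal length. The function name and data are hypothetical.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of label slots that differ between y_true and y_pred."""
    n_samples = len(y_true)
    n_labels = len(y_true[0])
    mismatches = sum(
        t != p
        for row_true, row_pred in zip(y_true, y_pred)
        for t, p in zip(row_true, row_pred)
    )
    return mismatches / (n_samples * n_labels)


# Two samples, three labels each; two of the six label slots are wrong
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 1], [0, 1, 1]]
print(round(hamming_loss(y_true, y_pred), 4))  # 0.3333
```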

---

## **11. Mean Absolute Error (MAE)**

**Definition**:
For regression tasks, Mean Absolute Error (MAE) measures the average of the absolute errors between predicted and actual values.

### Formula:
$$ \text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i| $$

Where:
- \( y_i \) is the actual value
- \( \hat{y}_i \) is the predicted value
- \( N \) is the number of samples
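
A minimal sketch of the MAE formula in Python, with a hypothetical function name and example values:

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted values
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mean_absolute_error(y_true, y_pred))  # 0.5
```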

---

## **Conclusion**

These mathematical metrics are essential for evaluating machine learning models. While **accuracy** is often the most straightforward measure, it can be misleading in imbalanced datasets. Other metrics like **precision**, **recall**, **F1-score**, **AUC**, and **MCC** provide a more nuanced understanding of model performance, especially when dealing with imbalanced data or specific error costs. It's important to choose the right evaluation metric based on the problem at hand to get a true sense of how well the model performs.


# **Additional Evaluation Metrics for Machine Learning Algorithms**

In addition to the commonly used classification metrics like accuracy, precision, recall, and F1-score, there are several other evaluation metrics used for regression tasks and general performance assessment. Below are some of these important metrics, including Mean Squared Error (MSE), R-squared (R²), and Adjusted R-squared, among others.

---

## **1. Mean Squared Error (MSE)**

**Definition**:
Mean Squared Error (MSE) is a commonly used metric for evaluating regression models. It calculates the average squared difference between the predicted and actual values. MSE gives a high penalty to large errors due to squaring the differences.

### Formula:
$$ \text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 $$

Where:
- \( y_i \) is the actual value
- \( \hat{y}_i \) is the predicted value
- \( N \) is the number of samples

A lower MSE indicates better model performance, with perfect predictions yielding an MSE of 0.

---

## **2. Root Mean Squared Error (RMSE)**

**Definition**:
Root Mean Squared Error (RMSE) is the square root of the Mean Squared Error. It provides the error in the same units as the original data, making it easier to interpret.

### Formula:
$$ \text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2} $$

Where:
- \( y_i \) is the actual value
- \( \hat{y}_i \) is the predicted value
- \( N \) is the number of samples
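
The MSE and RMSE formulas can be sketched together, since RMSE is just the square root of MSE. Function names and example values are illustrative assumptions.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mean_squared_error(y_true, y_pred))


# Hypothetical actual and predicted values
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mean_squared_error(y_true, y_pred))                 # 0.375
print(round(root_mean_squared_error(y_true, y_pred), 4))  # 0.6124
```

In this example the single error of size 1 contributes 1.0 of the 1.5 total squared error, which illustrates how squaring emphasizes larger errors.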

---

## **3. R-squared (R²)**

**Definition**:
R-squared, also known as the coefficient of determination, measures the proportion of the variance in the dependent variable that is predictable from the independent variables. It is a measure of how well the regression model fits the data.

### Formula:
$$ R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} $$

Where:
- \( y_i \) is the actual value
- \( \hat{y}_i \) is the predicted value
- \( \bar{y} \) is the mean of the actual values
- \( N \) is the number of samples

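- **Interpretation**:
  - \( R^2 = 1 \): Perfect fit
  - \( R^2 = 0 \): The model does no better than always predicting the mean \( \bar{y} \) (it explains none of the variance in the target variable)
  - \( R^2 < 0 \): The model does worse than always predicting the mean, which can happen, for example, on held-out data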
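
A minimal sketch of the R-squared formula follows, with a hypothetical function name and data. The second call shows that the formula can yield a negative value when predictions are worse than the mean of the actual values.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical actual and predicted values
print(round(r_squared([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]), 4))  # 0.9486

# Predictions that reverse the trend are worse than predicting the mean
print(r_squared([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # -3.0
```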

---

## **4. Adjusted R-squared (Adjusted R²)**

**Definition**:
Adjusted R-squared adjusts the R-squared value for the number of predictors in the model. It provides a more accurate measure when comparing models with different numbers of predictors, as it penalizes adding irrelevant features.

### Formula:
$$ \text{Adjusted } R^2 = 1 - \left(1 - R^2\right) \times \frac{N - 1}{N - p - 1} $$

Where:
- \( N \) is the number of samples
- \( p \) is the number of predictors (independent variables)
- \( R^2 \) is the R-squared value

- **Interpretation**:
  - A higher Adjusted \( R^2 \) indicates a better fit, but unlike R-squared, it takes into account the number of features used in the model.
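
The adjustment can be sketched as a one-line function of an already computed R-squared value. The function name and the example numbers are hypothetical.

```python
def adjusted_r_squared(r2: float, n_samples: int, n_predictors: int) -> float:
    """Adjust R-squared for the number of predictors in the model."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need n_samples > n_predictors + 1")
    return 1 - (1 - r2) * (n_samples - 1) / dof


# Hypothetical model: R^2 = 0.9 with 50 samples
print(round(adjusted_r_squared(0.9, 50, 5), 4))   # 0.8886
print(round(adjusted_r_squared(0.9, 50, 20), 4))  # 0.831
```

With the same R-squared, the model that uses 20 predictors receives a lower adjusted score than the one that uses 5.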

---

## **5. Mean Absolute Percentage Error (MAPE)**

**Definition**:
Mean Absolute Percentage Error (MAPE) is a commonly used metric for regression tasks that measures the average absolute percentage difference between the predicted and actual values.

### Formula:
$$ \text{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100 $$

Where:
- \( y_i \) is the actual value
- \( \hat{y}_i \) is the predicted value
- \( N \) is the number of samples

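- **Interpretation**:
  - A lower MAPE indicates better model performance. It is particularly useful when the errors need to be interpreted in percentage terms.
  - MAPE is undefined when any actual value \( y_i \) is 0, and it becomes very large when actual values are close to 0.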
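
A minimal sketch of the MAPE formula. The function name and data are hypothetical; raising an error when an actual value is zero is a choice made here, because the formula divides by \( y_i \).

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    if any(y == 0 for y in y_true):
        raise ValueError("MAPE is undefined when an actual value is 0")
    total = sum(abs((y - y_hat) / y) for y, y_hat in zip(y_true, y_pred))
    return total / len(y_true) * 100


# Hypothetical actual and predicted values
print(round(mape([100.0, 200.0, 400.0], [110.0, 180.0, 400.0]), 2))  # 6.67
```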

---

## **6. Explained Variance Score**

**Definition**:
Explained Variance Score measures the proportion of the variance in the target variable that is explained by the model. It is similar to R-squared, but it uses the variance of the residuals instead of their mean square, so a constant offset (bias) in the predictions does not lower the score. The two scores are equal when the residuals have zero mean.

### Formula:
$$ \text{Explained Variance} = 1 - \frac{\text{Var}(y - \hat{y})}{\text{Var}(y)} $$

Where \( y - \hat{y} \) are the residuals, the differences between the actual and predicted values.

- **Interpretation**:
  - A score close to 1 indicates a model that explains most of the variance in the data, while a score close to 0 indicates the model explains very little.
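
The formula can be sketched as follows, using the population variance for both terms. Names and data are hypothetical. The example uses predictions that are all off by the same constant: the residuals then have zero variance, so the score is 1.0 even though every prediction is wrong, which is a property worth keeping in mind when reading this metric.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(actual values)."""
    residuals = [y - y_hat for y, y_hat in zip(y_true, y_pred)]
    return 1 - variance(residuals) / variance(y_true)


# Every prediction is exactly 1 too high
print(explained_variance([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # 1.0
```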

---

## **7. Huber Loss**

**Definition**:
Huber Loss is a loss function used in regression that combines the benefits of both Mean Squared Error (MSE) and Mean Absolute Error (MAE). It is less sensitive to outliers than MSE.

### Formula:
For each absolute error \( e_i = |y_i - \hat{y}_i| \), the Huber loss with threshold \( \delta \) is defined as:

$$
\text{Huber Loss} =
\begin{cases}
\frac{1}{2} e_i^2 & \text{for } e_i \leq \delta \\
\delta \left(e_i - \frac{1}{2} \delta\right) & \text{for } e_i > \delta
\end{cases}
$$

Where \( \delta \) is a user-defined threshold value that sets the error size at which the loss switches from quadratic to linear.

- **Interpretation**:
  - Huber Loss behaves like MSE for small errors and like MAE for large errors, making it robust to outliers.
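
A minimal sketch of the piecewise definition above. The function and parameter names are hypothetical, and averaging the per-sample losses over the dataset is an assumption made here, since the definition is stated per error.

```python
def huber_loss(y_true, y_pred, threshold=1.0):
    """Mean Huber loss: quadratic below the threshold, linear above it."""
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        err = abs(y - y_hat)
        if err <= threshold:
            total += 0.5 * err ** 2                        # quadratic branch
        else:
            total += threshold * (err - 0.5 * threshold)   # linear branch
    return total / len(y_true)


# One small error (0.5) and one large error (3.0), threshold 1.0
print(huber_loss([0.0, 0.0], [0.5, 3.0], threshold=1.0))  # 1.3125
```

For comparison, half the squared error of the large sample would be 4.5, while its Huber contribution is only 2.5.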

---

## **8. AIC (Akaike Information Criterion)**

**Definition**:
AIC is a model evaluation criterion that helps in model selection. It penalizes models for having too many parameters, which can lead to overfitting.

### Formula:
$$ \text{AIC} = 2k - 2 \ln(L) $$

Where:
- \( k \) is the number of parameters in the model
- \( L \) is the maximized value of the model's likelihood function

- **Interpretation**:
  - Lower AIC values indicate a better model. AIC helps to balance the goodness of fit with model complexity.

---

## **9. BIC (Bayesian Information Criterion)**

**Definition**:
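BIC is similar to AIC, but its penalty per parameter, \( \ln(N) \), grows with the sample size and exceeds AIC's penalty of 2 once \( N \geq 8 \). It therefore tends to favor simpler models than AIC. As with AIC, it should only be used to compare models fitted to the same dataset.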

### Formula:
$$ \text{BIC} = \ln(N)k - 2 \ln(L) $$

Where:
- \( N \) is the number of samples
- \( k \) is the number of parameters in the model
- \( L \) is the maximized value of the model's likelihood function

- **Interpretation**:
  - Like AIC, lower BIC values indicate better models. Because the penalty grows with \( \ln(N) \), BIC penalizes extra parameters more heavily as the number of samples increases.
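
Both criteria can be sketched from a model's log-likelihood \( \ln(L) \). The function names and the example numbers below are hypothetical; in practice the log-likelihood comes from the fitted model.

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood: float, k: int, n_samples: int) -> float:
    """Bayesian Information Criterion: ln(N) k - 2 ln(L)."""
    return math.log(n_samples) * k - 2 * log_likelihood


# Hypothetical fit: ln(L) = -120 with 3 parameters on 100 samples
print(aic(-120.0, 3))                 # 246.0
print(round(bic(-120.0, 3, 100), 2))  # 253.82
```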

---

## **10. Theil’s U-statistic**

**Definition**:
Theil’s U-statistic (in its U1 form, also called Theil’s inequality coefficient) measures how well the model's predictions match the true values. It divides the RMSE by the sum of the root mean squares of the actual and predicted values, which makes the result unit-free.

### Formula:
$$ U = \frac{\sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}}{\sqrt{\frac{1}{N} \sum_{i=1}^{N} y_i^2} + \sqrt{\frac{1}{N} \sum_{i=1}^{N} \hat{y}_i^2}} $$

Where:
- \( y_i \) is the actual value
- \( \hat{y}_i \) is the predicted value
- \( N \) is the number of samples

- **Interpretation**:
  - U lies between 0 and 1. The closer U is to 0, the better the model's predictions match the actual values.

---

## **Conclusion**

In regression and other tasks, these additional metrics, such as **MSE**, **R-squared**, **Adjusted R-squared**, **MAPE**, and **AIC**, provide deeper insights into the model's performance. By choosing the appropriate evaluation metric, you can assess the predictive quality of your model more accurately and ensure it is well-suited to your specific problem.
