**Types of evaluation metrics (FOR REGRESSION):** 

**Aim** : Judge the accuracy of the model's predictions versus the actual value. 

*Error = Actual Value - Predicted value*

- Summation: add up the errors (losses) over all samples.
- Averaging: divide that sum by the number of samples, so that datasets of different sizes can be compared.

**Standard Methods** : 

- **Mean absolute error (MAE)** -> Mean of absolute errors. 
    - MAE measures the average magnitude of errors in a set of predictions, without considering their direction.
    - MAE weights every error in proportion to its size, with no extra penalty for large errors. This means that it is **less sensitive to outliers** compared to MSE.
    - **Interpretation** : In a business context like stock price prediction, an MAE of 5 dollars implies that, on average, the prediction is off by 5 dollars. It's straightforward and easy to understand.

- **Mean squared error (MSE)** -> Mean of squared errors. 
    - MSE measures the average of the squares of the errors. 
    - It **emphasizes larger errors more than smaller ones**, since the errors are squared before they are averaged.
    - MSE is **highly sensitive to outliers**. A few large errors can significantly increase the value of MSE, making it useful for models where large errors are particularly undesirable.
    - Squaring penalises the samples with larger errors more heavily. An analogy: a student should focus most on the subject with the lowest score, because that subject has the largest error, i.e. the largest deviation from the ideal of 100. Squaring also makes every error positive.
    - **Interpretation** : A single error of 5 dollars contributes 25 to the MSE (5^2 = 25), while an error of 10 dollars contributes 100. This amplification illustrates that MSE penalizes larger errors more heavily. Because MSE is expressed in squared units (dollars squared), it looks inflated next to MAE and cannot be compared with it directly.

- **Root mean square error (RMSE)** -> The root of Mean of squared errors. 
    - RMSE is the square root of MSE.
    - It brings the error metric back to the same scale as the original data, which is not the case with MSE since it squares the errors.
    - Used **when MSE values are too large to interpret**, because MSE is expressed in squared units.
    - RMSE is useful for its interpretability. Since it's on the same scale as the target variable, stakeholders can understand it more intuitively than MSE.
    - **Interpretation** : If RMSE is 5, the model's predictions typically deviate from the actual stock prices by about 5 dollars, with larger errors weighted more heavily than in MAE. It's more interpretable than an MSE of 25 (which is 5 squared), as it directly relates to the scale of the stock prices.

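- **Modified mean squared error (mMSE)** -> MSE multiplied by one half (dividing the sum of squared errors by 2n instead of n). This is a common convention for training losses, because the 1/2 cancels the 2 produced by differentiating the square.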
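
To make the definitions of MAE, MSE and RMSE concrete, here is a minimal sketch in plain Python. The function name `regression_errors` and the sample stock prices are made up for illustration; they are not taken from any library or dataset.

```python
import math

def regression_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) for two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Error = Actual Value - Predicted value, as defined at the top of these notes.
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n   # mean of absolute errors
    mse = sum(e * e for e in errors) / n    # mean of squared errors
    rmse = math.sqrt(mse)                   # back on the scale of the target
    return mae, mse, rmse

# Hypothetical stock prices in dollars (illustrative values only).
actual = [100.0, 102.0, 98.0, 105.0]
predicted = [97.0, 106.0, 98.0, 110.0]

mae, mse, rmse = regression_errors(actual, predicted)
print(f"MAE={mae:.2f} MSE={mse:.2f} RMSE={rmse:.2f}")
# prints: MAE=3.00 MSE=12.50 RMSE=3.54
```

Note that MAE and RMSE are both in dollars, while MSE is in dollars squared, which is why it is the hardest of the three to read directly.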




**When to use RMSE and when to use MAE:**
- Use MAE when:

    - You care about typical performance
    - Outliers are not critical
    - You want a stable and explainable metric

- Use RMSE when:

    - Large errors are unacceptable
    - Outliers represent true risk
    - You want the model to "play safe"

- In real-world ML pipelines:

    - Train using MSE (or RMSE): the gradient of a squared error grows with the size of the error, so large mistakes are corrected more strongly
    - Report MAE for business stakeholders
    - Monitor both to understand error behavior
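
The difference in outlier sensitivity behind these recommendations can be seen with a small, hedged experiment. The two error lists below are invented for illustration; they differ only in the last value, which plays the role of an outlier.

```python
import math

def mae(errors):
    """Mean of absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Square root of the mean of squared errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Illustrative per-sample errors (actual - predicted), not real data.
typical = [2.0, -2.0, 2.0, -2.0, 2.0]
with_outlier = [2.0, -2.0, 2.0, -2.0, 22.0]  # same, but one large miss

print(mae(typical), rmse(typical))            # prints: 2.0 2.0
print(mae(with_outlier), rmse(with_outlier))  # prints: 6.0 10.0
```

A single large miss triples MAE (2 to 6) but multiplies RMSE by five (2 to 10). Monitoring both therefore hints at the error distribution: when RMSE is much larger than MAE, a few large errors are likely dominating.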

**Types of evaluation metrics (FOR CLASSIFICATION):**
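- True Positive (TP) : actual positive, predicted positive
- True Negative (TN) : actual negative, predicted negative
- False Positive (FP) : actual negative, predicted positive
- False Negative (FN) : actual positive, predicted negative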

Each row of the dataset falls into exactly one of these four categories, and the evaluation metrics are defined from the resulting counts.
- Accuracy : (TP+TN)/all, i.e. all correct predictions / all cases. Overall correctness.
- Precision : TP/(TP+FP), i.e. how often a positive prediction is correct.
- Recall : TP/(TP+FN), i.e. how many of the actual positives were caught (for a spam filter, how many actual spam emails were flagged).
- F1 score : the harmonic mean of precision and recall, balancing the two.

  F1 = 2 * precision * recall / (precision + recall)
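
As a rough sketch, the four formulas above translate directly into code. The function name `classification_metrics`, the example counts, and the choice to return 0.0 when a denominator is zero are all assumptions made for this illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) from the four counts.

    Assumption: a metric whose denominator is zero is reported as 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    return accuracy, precision, recall, f1

# Hypothetical spam filter evaluated on 100 emails (made-up counts):
# 40 spam caught, 45 normal emails passed, 5 normal emails flagged, 10 spam missed.
accuracy, precision, recall, f1 = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
# prints: accuracy=0.85 precision=0.89 recall=0.80 f1=0.84
```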

**Performance metrics**: These tell you if your model is working properly or not. 
- Confusion matrix 
- Accuracy
- Precision
- Recall
- F-beta score
 

**Confusion matrix** for a binary classification problem (such as logistic regression) is a 2x2 matrix holding the TP, FP, FN and TN counts.
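
A minimal sketch of how such a 2x2 matrix could be tallied from labels is shown here. The function name `confusion_counts` and the labels are hypothetical. The layout (rows are actual classes, columns are predicted classes, negative class first) is one common convention and is an assumption of this sketch, so check the layout of whatever library you use.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return [[TN, FP], [FN, TP]] for binary labels.

    Rows are the actual class, columns the predicted class,
    with the negative class listed first (assumed layout).
    """
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1   # predicted positive, actually positive
            else:
                fp += 1   # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1   # predicted negative, actually positive
            else:
                tn += 1   # predicted negative, actually negative
    return [[tn, fp], [fn, tp]]

# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

matrix = confusion_counts(y_true, y_pred)
print(matrix)  # prints: [[3, 1], [1, 3]]
```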

- **Accuracy** = (TP+TN)/(All)
    - For imbalanced datasets, prefer precision and recall. If you rely on accuracy alone, the model may report 90% accuracy simply because 90% of the samples belong to one class, while it performs poorly on the minority class.

- **Precision** : (TP)/(TP+FP). Out of all the predicted positives, how many are actually positive?

    - We try to reduce false positives here, e.g. we don't want a non-spam email to be classified as spam.


- **Recall** : (TP)/(TP+FN). Out of all the actual positives, how many were correctly predicted?
    - We try to reduce false negatives here, e.g. we don't want a person who actually has diabetes to be classified as not having diabetes.

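- **F-Beta score** : If FP & FN are both equally important, we take beta = 1 and get the F1 score:
    - F1 score = 2 x precision x recall / (precision + recall)
    - In general, F-beta = (1 + beta^2) x precision x recall / (beta^2 x precision + recall). A beta above 1 weights recall more heavily; a beta below 1 weights precision more heavily.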
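
Two points from this list can be checked numerically: the accuracy trap on imbalanced data, and how beta shifts the F-beta score. The sketch below uses invented numbers, and the helper name `f_beta` is hypothetical. It implements the standard F-beta definition, and returning 0.0 for a zero denominator is an assumption of this sketch.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0

# Accuracy trap: 100 samples, only 10 positive, and a hypothetical
# model that always predicts "negative".
tp, tn, fp, fn = 0, 90, 0, 10
accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
print(accuracy, recall)  # prints: 0.9 0.0

# Effect of beta for an illustrative precision of 0.5 and recall of 0.8.
p, r = 0.5, 0.8
f1 = f_beta(p, r)                # beta = 1: precision and recall weighted equally
f2 = f_beta(p, r, beta=2.0)      # beta > 1: recall counts more
f_half = f_beta(p, r, beta=0.5)  # beta < 1: precision counts more
print(f"F0.5={f_half:.3f} F1={f1:.3f} F2={f2:.3f}")
# prints: F0.5=0.541 F1=0.615 F2=0.714
```

The always-negative model scores 90% accuracy while catching none of the positives, which is why recall (and precision) are the more informative numbers here.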
