### Regression Metrics

1. R Squared
2. Adjusted R Squared
3. Mean Absolute Error
4. Mean Squared Error
5. Root Mean Squared Error

### 1. R Squared 
R Squared measures how well a regression model fits the data. It tells us how much of the variation in the dependent variable (y) is explained by the independent variable(s) (X1, X2, ...). On the training data of a least-squares linear model with an intercept it takes values between 0 and 1; on other data, or for other models, it can be negative. A higher R Squared means a better fit.

Formula : R_Squared = 1 - SSR / SST where <br>
SSR (Sum of Squares of Residuals) is the sum of the squares of the difference between the actual observed value (y) and the predicted value (y^). <br>
SST (Total Sum of Squares) is the sum of the squares of the difference between the actual observed value (y) and the average of the observed y value (yavg). <br>
We want to achieve a low SSR => low SSR/SST => high 1 - SSR/SST = R_Squared (R Squared close to 1)
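
As a minimal sketch of the formula above, R Squared can be computed by hand in plain Python. The function name `r_squared` and the sample values are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """R Squared = 1 - SSR / SST."""
    y_avg = sum(y_true) / len(y_true)
    # SSR: squared differences between observed and predicted values
    ssr = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # SST: squared differences between observed values and their average
    sst = sum((y - y_avg) ** 2 for y in y_true)
    return 1 - ssr / sst


# Hypothetical observations and predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]

r2 = r_squared(y_true, y_pred)  # SSR = 0.75, SST = 20
print(round(r2, 4))
```

Here SSR is small compared to SST, so R Squared ends up close to 1.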

### 2. Adjusted R Squared
Adjusted R Squared is a modified version of R Squared that is adjusted for the number of predictors in the model. Like R Squared, it measures how well the regression model fits the data. <br>
Formula : Adjusted_R_Squared = 1 - (1 - R_Squared) * (M - 1) / (M - N - 1) where M is the number of training examples and N is the number of features. <br>

**Difference between R Squared and Adjusted R Squared?**<br>
When we add 'bad' (not useful) independent features to our model, R Squared on the training data still increases slightly (it never decreases when a feature is added), while Adjusted R Squared decreases unless the new feature improves the fit by more than would be expected by chance. This is because the factor (M - 1) / (M - N - 1) grows with N and penalizes every extra feature. So Adjusted R Squared is generally preferred when comparing models with different numbers of features (we don't want useless independent features in our model). <br>
On the training data of a least-squares model with an intercept, R Squared is never negative, while Adjusted R Squared can be negative if we have a very bad regression model.
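
The effect of the penalty can be sketched with a few lines of Python. The helper `adjusted_r_squared` and all R Squared values below are hypothetical numbers chosen for illustration, not results from a real model.

```python
def adjusted_r_squared(r2, m, n):
    """m = number of training examples, n = number of features."""
    return 1 - (1 - r2) * (m - 1) / (m - n - 1)


# Assumed scenario: 3 weak features nudge R Squared from 0.800 to 0.805
adj_before = adjusted_r_squared(0.800, m=30, n=2)
adj_after = adjusted_r_squared(0.805, m=30, n=5)
print(adj_before > adj_after)  # R Squared went up, Adjusted R Squared went down

# A very weak model with many features relative to the data size
adj_bad = adjusted_r_squared(0.05, m=10, n=5)
print(adj_bad < 0)  # Adjusted R Squared can be negative
```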

### 3. MAE 
MAE (Mean Absolute Error) is the average of the absolute differences between the actual and the predicted values. It is less sensitive to outliers than MSE because it takes the absolute value of each residual instead of squaring it, so huge errors are not punished disproportionately. A model is likely to predict outliers incorrectly, since they are unusual data points, but with MAE those errors do not inflate the overall error that much: a single bad prediction does not dominate the score. MSE, on the other hand, is differentiable everywhere, which makes it easier to optimize than MAE, which is not differentiable at zero. Therefore many models use MSE or RMSE as the default loss function, despite these being harder to interpret than MAE. <br>
**When to use ?** If the data has outliers and we want to ignore them / we don't want the errors caused by outliers to affect the overall model performance, we can choose MAE as the performance metric to evaluate our model.<br>
![image-4.png](attachment:image-4.png)


### 4. MSE
MSE (Mean Squared Error) is the average of the squared differences between the actual and the predicted values. It is one of the most commonly used metrics. Because the residuals are squared, it penalizes large errors heavily, so a single bad prediction can dominate the score. This makes it less useful when the dataset contains a lot of noise or outliers (unexpectedly high or low values) that we do not care about, and most useful when the dataset does not contain such values. <br>
**When to use ?**<br> If the data has outliers and we don't want to ignore them / we want the model to be penalized for large errors, we can go for MSE. <br>
![image-2.png](attachment:image-2.png)

### 5. RMSE 
RMSE (Root Mean Squared Error) is the square root of MSE. The errors are squared before they are averaged, so RMSE assigns a higher weight to larger errors. This makes RMSE more useful when large errors are especially undesirable. Unlike MSE, it is expressed in the same units as the target, which makes it easier to interpret. It avoids taking the absolute value of the error, which is convenient in many mathematical calculations. As with the other error metrics, the lower the value, the better the model performs. <br>
**When to use ?** <br> If the data has outliers and we don't want to ignore them / we want the model to be penalized for large errors, we can go for RMSE. <br>
 ![image.png](attachment:image.png)
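
To make the different sensitivity to outliers concrete, here is a small sketch that computes MAE, MSE and RMSE by hand on two sets of predictions, one of which contains a single large miss. The data is invented for illustration.

```python
import math


def mae(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))


y_true = [10.0, 12.0, 11.0, 13.0, 12.0]
y_pred_clean = [11.0, 11.0, 12.0, 12.0, 13.0]    # every error has size 1
y_pred_outlier = [11.0, 11.0, 12.0, 12.0, 32.0]  # last error has size 20

for name, y_pred in [("clean", y_pred_clean), ("outlier", y_pred_outlier)]:
    print(name, mae(y_true, y_pred), mse(y_true, y_pred), rmse(y_true, y_pred))
```

With the single large miss, MAE grows from 1 to 4.8 while MSE grows from 1 to 80.8, which shows how strongly squaring amplifies one bad prediction.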

### Classification Metrics

1. Accuracy
2. Recall / Precision / F1 Score
3. Confusion Matrix (True Positive, True Negative, False Positive, False Negative)
4. AUC Score
5. Binary CrossEntropy or negative Log-Likelihood (Log Loss)
6. Categorical CrossEntropy
7. Sparse Categorical CrossEntropy

1. <b>Accuracy</b> shows the fraction of examples that are correctly classified by our model. Formula : (True_Positive + True_Negative) / (True_Positive + True_Negative + False_Positive + False_Negative) <br>

2. <b>Precision</b> shows how many of the examples that the model predicted as positive are actually positive. This is the Precision for the positive class; the negative class is handled the same way. <br>
Formula : Precision of positive class = TP / (TP + FP), Precision of negative class = TN / (TN + FN) <br>
<b>Recall</b> shows how many of the actually positive examples were correctly classified as positive by our model. This is the Recall for the positive class; the negative class is handled the same way. <br>
Formula : Recall of positive class = TP / (TP + FN), Recall of negative class = TN / (TN + FP). <br>
We aim for a model that achieves high Precision and high Recall at the same time. In practice there is usually a trade-off between the two: for example, lowering the decision threshold tends to increase recall and decrease precision. So we could have a model with perfect recall (= 1) but very low precision, for instance one that predicts every example as positive, which is not a good model. Since it is difficult to select the best model by looking at these 2 scores separately, the f1-score was introduced.


2. <b>F1 Score</b> combines precision and recall into a single number and is defined as their harmonic mean. It shows how well the model performs on a particular class (an f1-score is computed for each class in the target). It takes values between 0 and 1. A high f1-score means both recall and precision are high; because the harmonic mean is dominated by the smaller value, a low f1-score means at least one of them is low. <br>
Formula : F1-Score = (2*Recall*Precision) / (Recall + Precision)
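
A short sketch of these formulas, computed for both classes from confusion counts. The counts are hypothetical and the helper name `precision_recall_f1` is made up for this example. For the negative class the roles swap: TN are the correct hits, FN are the examples wrongly predicted as negative, and FP are the negatives the model missed.

```python
def precision_recall_f1(hits, wrongly_predicted, missed):
    """Precision, recall and f1 for one class, guarding against division by zero."""
    predicted = hits + wrongly_predicted
    actual = hits + missed
    precision = hits / predicted if predicted else 0.0
    recall = hits / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical confusion counts
tp, tn, fp, fn = 40, 45, 5, 10

positive = precision_recall_f1(tp, fp, fn)  # TP/(TP+FP), TP/(TP+FN)
negative = precision_recall_f1(tn, fn, fp)  # TN/(TN+FN), TN/(TN+FP)

print("positive class:", positive)
print("negative class:", negative)
```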

3. <b>Confusion Matrix</b> is an NxN matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. With the help of the confusion matrix we can calculate different metrics like accuracy, f1_score, precision, recall. For binary classification it contains : <br><br>
True Positive => nr of examples that are actually positive (1) and correctly predicted as positive (1) by our model <br>
True Negative => nr of examples that are actually negative (0) and correctly predicted as negative (0) by our model <br>
False Positive => nr of examples that are actually negative (0) and incorrectly predicted as positive (1) by our model <br>
False Negative => nr of examples that are actually positive (1) and incorrectly predicted as negative (0) by our model <br>
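
As an illustration, a confusion matrix can be built from labels with a few lines of plain Python. This sketch assumes the convention that rows are actual classes and columns are predicted classes; the labels are invented.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Assumed layout: matrix[actual][predicted] counts the examples."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred, n_classes=2)
tn, fp = cm[0]  # actual negatives: predicted 0, predicted 1
fn, tp = cm[1]  # actual positives: predicted 0, predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(cm, accuracy)
```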

### - What metrics do we use in case of imbalanced data and balanced data?<br>
In case of balanced data we can use accuracy. In case of imbalanced data accuracy is misleading. Suppose 90% of the examples belong to the majority class: a model that always predicts the majority class reaches 90% accuracy, which looks high, although it never identifies a single minority example. Such a model is not a good model because it only predicts one class (the majority class) well. Instead we use other metrics like recall, precision, f1-score, roc_auc_score. Which of these performance metrics we should choose depends on the problem we are trying to solve, for example whether it is more important to reduce false negatives or false positives.
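
A small sketch of why accuracy misleads on imbalanced data: a hypothetical "model" that always predicts the majority class, evaluated on invented labels with a 90/10 class split.

```python
# Invented dataset: 90 majority examples (0) and 10 minority examples (1)
y_true = [0] * 90 + [1] * 10
# A "model" that ignores its input and always predicts the majority class
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall_minority = tp / (tp + fn)

print(accuracy, recall_minority)
```

The accuracy looks high, but the recall of the minority class shows that not a single minority example was found.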

### - Show examples when we should reduce false negative (increase true positive) or reduce false positive (increase true negative)
* Example when we must reduce false negatives (increase true positives) (predicted value 0, actual value 1) : suppose we want to predict if a person has a disease (cancer). If we predict that he does not have cancer but in reality he has it, he would not get any medical treatment, which would be a big problem: he may suffer serious health consequences or even die. A false positive means we predict that he has the disease but he actually does not. In this case it is not such a big problem.<br>
* Example when we must reduce false positives (increase true negatives) (predicted value 1, actual value 0) : suppose we predict whether an email is spam or not. If we predict that a particular email is spam but in reality it is not, that would be a problem for the user since he may miss the email. In the opposite case, if we predict that an email is not spam but it actually is, the user simply sees it in the inbox, which is less harmful.

**Note:**<br>
In sklearn we apply this when we choose the performance metric for cross validation and model evaluation, for example through the `scoring` parameter. For imbalanced data we should choose f1_score, recall, precision etc. instead of accuracy.


**Note :** <br>
Never compare the values of two different metrics with each other, since they are calculated in different ways and have different meanings.

### - What is the difference between loss and accuracy (or any other performance metric like f1 score, recall, precision) and how do we interpret them when we have a value for both loss and accuracy ?

Loss can be seen as a distance between the true values of the problem and the values predicted by the model (using probabilities in case of classification). It is the average of the losses on each training example (for some training examples the loss/error might be higher, for others lower). It is the function that is minimized when finding the parameters of a NN using Gradient Descent. The greater the loss, the larger the errors made on the data, while accuracy only describes what percentage of the data is classified correctly (using threshold=0.5). <br>
So it may happen that our model has misclassified only a few examples but made huge errors on them, which causes high loss together with high accuracy. If we want to penalize our model for huge errors we should pay more attention to the loss value, otherwise to the accuracy or any other metric like f1_score, recall, precision etc.

* A low accuracy and high loss means you made huge errors on many training examples.
* A low accuracy but low loss means you made small errors on many training examples.
* A high accuracy but high loss means you made huge errors on few training examples.
* A high accuracy with low loss means you made small errors on few training examples (best case).
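
The gap between loss and accuracy can be sketched with binary cross-entropy computed by hand. Both sets of predicted probabilities below are invented: each "model" misclassifies exactly one example, but model A is confidently wrong while model B is only mildly wrong.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, p_pred, threshold=0.5):
    y_pred = [1 if p >= threshold else 0 for p in p_pred]
    return sum(t == q for t, q in zip(y_true, y_pred)) / len(y_true)


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
p_a = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.9, 0.999]  # last one: confidently wrong
p_b = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.9, 0.6]    # last one: mildly wrong

acc_a, acc_b = accuracy(y_true, p_a), accuracy(y_true, p_b)
loss_a, loss_b = binary_cross_entropy(y_true, p_a), binary_cross_entropy(y_true, p_b)
print(acc_a, acc_b)      # identical accuracy
print(loss_a > loss_b)   # but model A has the clearly higher loss
```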


### - In the linear regression, can we use RSS (MSE) to compare a model where we have log(target) and a model where we just have target ?
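No. RSS is the sum of squared distances (label - prediction), and MSE is RSS divided by the number of examples. If we use log(label), the values are on a much smaller scale, so RSS will typically be much lower regardless of how good the model is. So RSS depends on the scale of the label y. We could use R Squared instead, since it divides RSS by SST (the total variation of the target y) and therefore does not depend on the scale. Strictly speaking the two R Squared values still describe different quantities (log(y) and y), so for a like-for-like comparison we can transform the predictions of the log model back to the original scale before computing the metric.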
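
A rough sketch of the scale effect, using invented targets and predictions that are each about 10% off. The same predictions are scored once on the raw scale and once on the log scale.

```python
import math


def rss(y_true, y_pred):
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    y_avg = sum(y_true) / len(y_true)
    sst = sum((y - y_avg) ** 2 for y in y_true)
    return 1 - rss(y_true, y_pred) / sst


y = [100.0, 200.0, 400.0, 800.0]
pred = [110.0, 180.0, 440.0, 720.0]  # every prediction is 10% off

log_y = [math.log(v) for v in y]
log_pred = [math.log(v) for v in pred]

rss_raw, rss_log = rss(y, pred), rss(log_y, log_pred)
r2_raw, r2_log = r_squared(y, pred), r_squared(log_y, log_pred)

print(rss_raw, rss_log)  # differ by several orders of magnitude
print(r2_raw, r2_log)    # both lie on the same 0..1 scale
```

The RSS values say nothing about which model is better here, since the predictions are identical and only the scale changed.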

### - Can R Squared be negative ?
R Squared = 1 - (SSR/SST). If SSR > SST then R Squared is negative. This is the case if the model is worse than a baseline that always predicts the mean of y. It typically shows up when the model is evaluated on data it was not trained on, or when it is fitted without an intercept; having only a few rows makes it more likely.
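
A tiny sketch of a negative R Squared, using invented predictions that follow the opposite trend of the data and a baseline that always predicts the mean.

```python
def r_squared(y_true, y_pred):
    y_avg = sum(y_true) / len(y_true)
    ssr = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    sst = sum((y - y_avg) ** 2 for y in y_true)
    return 1 - ssr / sst


y_true = [1.0, 2.0, 3.0, 4.0]
y_bad = [4.0, 3.0, 2.0, 1.0]                     # predictions with the reversed trend
y_mean = [sum(y_true) / len(y_true)] * len(y_true)  # always predict the mean

r2_bad = r_squared(y_true, y_bad)    # SSR = 20, SST = 5
r2_mean = r_squared(y_true, y_mean)  # SSR = SST
print(r2_bad, r2_mean)
```

The mean baseline scores exactly 0, and any model with a larger SSR than that baseline falls below 0.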