# Model Evaluation Metrics : AUC-ROC
AUC stands for Area Under the Curve and ROC for Receiver Operating Characteristic.
ROC is an evaluation tool for binary classification tasks. It is represented as a curve, and the AUC summarizes that curve as a single value.
The curve plots the true positive rate against the false positive rate as the decision threshold varies.
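
To make the construction concrete, here is a minimal sketch in plain Python that builds the curve point by point and integrates it with the trapezoidal rule. The function names `roc_points` and `auc` and the toy labels and scores are made up for illustration; in practice a library routine would normally be used instead.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # strictest threshold: nothing is predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (illustrative only): 1 = positive, 0 = negative.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
print(points)
print(auc(points))
```

Each distinct score is used as a threshold, so the curve starts at (0, 0) and ends at (1, 1).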

You can easily visualize that a perfect model gives a square of side 1 and an AUC of 1: some threshold separates the two classes completely, with no false positives and no false negatives.
![Perfect](https://developers.google.com/static/machine-learning/crash-course/images/auc_1-0.png)
The AUC-ROC represents the probability that the model, if given a randomly chosen positive and negative example, will rank the positive higher than the negative.
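
This probabilistic reading can be checked directly by comparing every positive example with every negative example. The sketch below is a brute-force illustration under that definition; the name `pairwise_auc` and the toy data are assumptions, and counting ties as half a win is a common convention rather than something stated in these notes.

```python
def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive is ranked higher."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumption: a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only).
labels = [1, 0, 1, 0]
scores = [0.9, 0.3, 0.4, 0.6]
print(pairwise_auc(labels, scores))  # 3 of the 4 pairs are ordered correctly -> 0.75
```

The double loop is quadratic in the number of examples, so it is only meant to illustrate the definition on small data.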

A random model gives a straight line from (0,0) to (1,1), for an AUC of 0.5.
![Random](https://developers.google.com/static/machine-learning/crash-course/images/auc_0-5.png)

A good model has a curve that looks something like this:
![GoodModel](https://developers.google.com/static/machine-learning/crash-course/images/auc_0-93.png)
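Read it from the origin: at the strictest threshold the model predicts no positives, so there are no true positives and no false positives, meaning all data points are classified as negative. As the threshold loosens, a good model gains true positives much faster than false positives.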

This reading assumes that the dataset is roughly balanced.
The curve lets you see the error rates of your model at a glance.

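For imbalanced datasets, you can compute a precision-recall curve instead, which gives a similar overview. With tp, fp and fn the counts of true positives, false positives and false negatives:
- precision = tp / (tp + fp)
- recall = tp / (tp + fn)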
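
As an illustration, the two formulas can be evaluated at a given decision threshold; repeating this for several thresholds gives the points of a precision-recall curve. This is a minimal sketch: the function name `precision_recall`, the toy data, and the choice of returning 0.0 when a denominator is zero are assumptions.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    # Assumption: report 0.0 when the ratio is undefined (empty denominator).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy imbalanced data (illustrative only): 3 positives, 5 negatives.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
for threshold in (0.5, 0.35):
    print(threshold, precision_recall(labels, scores, threshold))
```

Lowering the threshold from 0.5 to 0.35 recovers the last positive, so recall rises to 1.0 on this toy data.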

# Handling Imbalanced Datasets : 

## Oversampling
Oversampling provides a method to rebalance classes before model training commences. By replicating minority class data points, oversampling balances the playing field and prevents algorithms from disregarding significant yet sparse classes.
You can use random oversampling, which simply duplicates data points, or SMOTE (Synthetic Minority Oversampling Technique) and ADASYN (Adaptive Synthetic Sampling Approach for Imbalanced Learning), which generate new synthetic data points in the minority class.
You can also use data augmentation techniques, such as image rotations for image classification or synonym replacement for text.
![Oversampling](https://miro.medium.com/v2/resize:fit:720/format:webp/0*HWTiFVseEi0CNFg_.png)
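
Random oversampling, the simplest of these options, can be sketched in a few lines of plain Python. The function name `random_oversample`, the `seed` parameter and the toy data are illustrative assumptions; SMOTE and ADASYN are not shown here because they need a proper library implementation.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate random rows of the smaller classes until all classes have equal size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw with replacement to fill the gap up to the majority class size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (illustrative only): five rows of class 0, one row of class 1.
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [5.0]]
labels = [0, 0, 0, 0, 0, 1]
new_rows, new_labels = random_oversample(rows, labels)
print(new_labels.count(0), new_labels.count(1))
```

Resampling is usually applied to the training split only, so that duplicated points do not leak into the evaluation data.
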
## Undersampling
It is roughly the opposite of oversampling.
It balances uneven datasets by keeping all of the data in the minority class and decreasing the size of the majority class. Its main disadvantage is the loss of potentially important information.
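
A minimal sketch of random undersampling in plain Python, assuming rows are dropped uniformly at random from the larger classes. The function name `random_undersample`, the `seed` parameter and the toy data are made up for illustration.

```python
import random


def random_undersample(rows, labels, seed=0):
    """Keep every minority row and a random subset of each larger class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sampling without replacement; the minority class keeps all its rows.
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (illustrative only): five rows of class 0, two rows of class 1.
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [5.0], [6.0]]
labels = [0, 0, 0, 0, 0, 1, 1]
new_rows, new_labels = random_undersample(rows, labels)
print(new_labels.count(0), new_labels.count(1))
```

Three of the five majority rows are discarded here, which is exactly the information loss mentioned above.
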
## F1-Score
In the imbalanced case, you have to pay attention to your F1 score.
Why: it is the harmonic mean of precision and recall: F1 = 2 / (1/precision + 1/recall) = 2 * (precision * recall) / (precision + recall)

If precision and recall are both high, meaning close to 1, you can read it two ways:
- your fp and fn rates are low, meaning that your model is not, in the imbalanced case, just returning the majority class
- your F1 score is also close to 1

If the model just returns the majority class, precision or recall will drop, causing F1 to drop.
What is the threshold for high/low? The ratio of the majority class (with no resampling).
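
The effect is easy to reproduce on a toy example. The sketch below assumes the minority class is the positive class (label 1) and that F1 is reported as 0.0 when there are no true positives; the function name `f1_score` and the data are illustrative.

```python
def f1_score(labels, predictions):
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # assumption: no true positives is scored as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (illustrative only): 95 negatives, 5 positives.
labels = [0] * 95 + [1] * 5

# A "model" that always returns the majority class.
always_majority = [0] * 100
accuracy = sum(1 for y, p in zip(labels, always_majority) if y == p) / len(labels)
print(accuracy, f1_score(labels, always_majority))

# A model that finds 4 of the 5 positives at the cost of 2 false positives.
better = [1, 1] + [0] * 93 + [1, 1, 1, 1, 0]
print(f1_score(labels, better))
```

The always-majority model reaches an accuracy equal to the majority class ratio while its F1 collapses, which is why accuracy alone is misleading here.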

# Feature Engineering
Feature engineering is a technique that leverages data to create new features that are not initially in the training set.
These new features are a combination or function of existing features. Its goal is to make the data richer, clearer and easier for the chosen model to use, in order to improve accuracy and robustness. It can also lower the complexity of the model, since it already exposes some relationships between input features that could have been hard to learn. For instance, in time series, we can introduce the variance over a fixed time window as a new feature. Removing or capping outliers is another example.
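
The time series example can be sketched as follows. This assumes a trailing window and the population variance; the function name `rolling_variance`, the window size and the toy series are illustrative choices, not something fixed by these notes.

```python
from statistics import pvariance


def rolling_variance(values, window):
    """Population variance over the last `window` values; None until the window is full."""
    features = []
    for i in range(len(values)):
        if i + 1 < window:
            features.append(None)  # not enough history yet
        else:
            features.append(pvariance(values[i + 1 - window : i + 1]))
    return features


# Toy series (illustrative only): flat signal with one spike.
series = [1.0, 1.0, 1.0, 4.0, 1.0]
print(rolling_variance(series, window=3))
```

The new feature stays at 0.0 while the signal is flat and jumps as soon as the spike enters the window, a relationship the model would otherwise have to learn from the raw values.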

Feature engineering can often have a more significant impact on model performance than the choice of the model itself.

## Feature selection

Once you've enhanced your training data, you have to select the features that make the most sense, to avoid feeding useless information to your model and causing overfitting. This involves selectively including or excluding features while keeping them unchanged. Doing so eliminates irrelevant noise from your data and reduces the size and scope of the input dataset.

Feature selection can also be automated with:
- filter methods: they score features independently of any model, which makes them computationally efficient and effective at eliminating duplicate, correlated, and unnecessary features.
- wrapper methods: they train the model iteratively on different subsets of features, measure the model's performance, and add or remove features accordingly. These methods are computationally expensive.
- embedded methods: they combine the advantages of the two above by integrating feature selection directly into the learning algorithm itself. These methods are computationally efficient and consider feature combinations, making them effective for complex problems.
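
As one possible illustration of a filter method, the sketch below drops any feature that is almost perfectly correlated with a feature already kept. The function names `pearson` and `drop_correlated`, the 0.95 cutoff and the toy columns are assumptions chosen for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Note: a constant column has zero spread and would divide by zero here.
    return cov / (spread_x * spread_y)


def drop_correlated(features, threshold=0.95):
    """Keep a feature only if it is not highly correlated with one already kept."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy columns (illustrative only): the same height in two units, plus age.
features = {
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "height_in": [59.1, 63.0, 66.9, 70.9],
    "age": [35.0, 22.0, 41.0, 29.0],
}
print(drop_correlated(features))
```

No model is trained at any point, which is what keeps filter methods cheap compared with wrapper methods.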

# Hyperparameter Tuning

Feature selection can itself be treated as a hyperparameter, for instance: do I choose L1 or L2 regularization, or do I go for tree-based models?
## Hyperparameter
Your learning model has parameters that are modified during the learning phase and parameters that are not.
The latter are known as hyperparameters: all the variables that you set before training your model, such as:
- which evaluation metric
- early stopping
- which model
- batch_size
- tree depth and number of estimators for tree-based models
- the probability threshold used to decide the output class

Hyperparameter tuning consists of finding the combination of hyperparameters that maximizes the accuracy in the end.
This can be done manually or automatically with methods such as GridSearch or RandomSearch, which train your model iteratively with different combinations of the hyperparameter values you provide, in order to find the set that maximizes the accuracy of the model.
The search can be computationally expensive, but it can be worth it. The data scientist must make sure they understand and feel confident with the final combination before using it: the justification must not be "just because".
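
The idea behind a grid search can be sketched without any library: enumerate every combination, score each one on validation data, and keep the best. Everything named here is hypothetical and chosen for illustration: the tiny one-dimensional nearest-neighbour classifier, its two hyperparameters `k` and `threshold`, and the toy data. A real project would typically rely on a library implementation with cross-validation.

```python
from itertools import product


def knn_predict(train, x, k, threshold):
    """Predict 1 if at least `threshold` of the k nearest training labels are 1."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    positive_share = sum(label for _, label in nearest) / k
    return 1 if positive_share >= threshold else 0


def accuracy(train, valid, k, threshold):
    """Share of validation points classified correctly."""
    hits = sum(1 for x, y in valid if knn_predict(train, x, k, threshold) == y)
    return hits / len(valid)


def grid_search(param_grid, score_fn):
    """Score every combination in the grid and return the best one."""
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((score_fn(**params), params))
    best_score, best_params = max(results, key=lambda item: item[0])
    return best_params, best_score, results


# Toy (value, label) pairs, illustrative only.
train = [(0.1, 0), (0.2, 0), (0.35, 0), (0.4, 1), (0.6, 1), (0.8, 1), (0.9, 1)]
valid = [(0.15, 0), (0.3, 0), (0.5, 1), (0.7, 1)]
param_grid = {"k": [1, 3, 5], "threshold": [0.3, 0.5, 0.7]}

best_params, best_score, results = grid_search(
    param_grid, lambda k, threshold: accuracy(train, valid, k, threshold)
)
print(best_params, best_score)
```

The number of evaluations is the product of the grid sizes (9 here), which is why the search gets expensive quickly. Keeping the full `results` list also makes it easier to understand why a combination won, rather than accepting it blindly.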

# The Titanic survival

Before the EDA, we can foresee that the dataset will be imbalanced, so we'll need to handle that. Since this is a binary classification task, the choice of metrics is going to be important: precision, recall, F1 and the AUC-ROC to judge the performance of the model.