# Machine Learning Model Applications

It's crucial to develop the ability to choose a modeling metric based on the type of response variable you have and the business problem you're trying to solve. Likewise, it's essential to communicate your model's performance in terms a stakeholder can understand. For example, if you can say your model predicts home values with an error range of \$5,000 to \$10,000 instead of an error rate of 5-10%, your stakeholders will quickly understand the impact of what you've created.

A **hyperparameter** is a parameter that is set before the machine learning process starts; it is a value that controls the learning process. Hyperparameter tuning is the process of choosing the set of hyperparameter values that governs a learning algorithm. Tuning can make or break your model's predictive ability. For example, you may be able to improve your model's accuracy by 5% just by optimizing the hyperparameters.
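
To make the distinction concrete, here is a minimal sketch contrasting a hyperparameter (set before fitting) with parameters (learned during fitting). The synthetic dataset, the choice of `LogisticRegression`, and the value `C=0.1` are assumptions made purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# C (inverse regularization strength) is a hyperparameter: we choose it before fitting.
model = LogisticRegression(C=0.1, max_iter=1000)
model.fit(X, y)

# coef_ and intercept_ are parameters: they are learned from the data during fitting.
print("Hyperparameter C:", model.get_params()["C"])
print("Learned coefficient array shape:", model.coef_.shape)
```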

#### Supervised Machine Learning Model Evaluation Metrics
* You should consider your response variable when choosing a machine learning model to build. 
* There are a variety of models out there, so it's important to choose one that's right for your data and the business problem you're trying to solve. 

* What's an evaluation metric?
    * A way to quantify performance of a machine learning model
    * Evaluation metric $\neq$ Loss function
        * The two can be the same thing, but don't have to be and aren't always.
    * A loss function is used while you are training (optimizing) your model, whereas evaluation metrics are applied to an already trained machine learning model
    #### Supervised Learning Metrics:
    * **Classification:** Classification accuracy, Precision, Recall, F1 Score, ROC/AUC, Precision/Recall AUC, Matthews Correlation Coefficient, Log loss, etc.
    * **Regression:** $R^{2}$, MAE, MSE, RMSE, RMSLE, etc.
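
As a rough illustration of the regression metrics listed above, the sketch below computes them with NumPy on three made-up home prices. The numbers are hypothetical; the point is that MAE and RMSE come out in the same units as the response (dollars here), which is what makes them easy to explain to stakeholders.

```python
import numpy as np

# Hypothetical home prices in dollars (actual vs. predicted).
y_true = np.array([200_000.0, 350_000.0, 500_000.0])
y_pred = np.array([210_000.0, 340_000.0, 520_000.0])

errors = y_pred - y_true

mae = np.mean(np.abs(errors))   # Mean Absolute Error, in dollars
mse = np.mean(errors ** 2)      # Mean Squared Error, in dollars squared
rmse = np.sqrt(mse)             # Root Mean Squared Error, back in dollars
# Root Mean Squared Log Error: penalizes relative rather than absolute error.
rmsle = np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))
# R^2: share of the variance in y_true explained by the predictions.
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(f"MAE:   {mae:,.2f}")
print(f"MSE:   {mse:,.2f}")
print(f"RMSE:  {rmse:,.2f}")
print(f"RMSLE: {rmsle:.4f}")
print(f"R^2:   {r2:.4f}")
```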
    
### Classification Metrics:
#### Binary Classification:

* **Accuracy** = (Number of correct predictions) / (Total number of predictions)
    * Ranges from 0%-100% or 0 to 1
    * Very intuitive
    * **Easily calculated in `sklearn` with a classifier's `.score` method**
    
* **`DummyClassifier`:** an `sklearn` estimator that doesn't learn anything from the features; it follows a simple strategy, such as predicting classes uniformly at random or always predicting the most frequent class it has seen in the training data
* If we don't know what our data looks like (for example, how imbalanced the classes are), we cannot determine whether an accuracy score is good or not
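
A small sketch of why a baseline matters: on the hypothetical, heavily imbalanced labels below, a `DummyClassifier` that always predicts the most frequent class already reaches high accuracy, so a real model's accuracy only means something relative to that number. The data are made up for illustration.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical imbalanced labels: 95 negatives and 5 positives.
y = np.array([0] * 95 + [1] * 5)
X = np.zeros((len(y), 1))  # DummyClassifier ignores the feature values

# Always predict the most frequent class seen during fit.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X, y)

baseline_accuracy = baseline.score(X, y)
print("Baseline accuracy:", baseline_accuracy)
```

Any model evaluated on data like this needs to beat the baseline accuracy before its score can be called good.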

* **Confusion Matrix:** 
    * Technically not a metric, more of a diagnostic tool
    * Helps to gain insight into the type of errors a model is making
    * Helps to understand some other metrics
    * `from sklearn.metrics import confusion_matrix`
    * `confusion_matrix(y_test, model.predict(X_test))`
    * Convention is to pass the actual values first, then the predictions $\Rightarrow$ so that the rows hold the actual values and the columns hold the predicted values
        * This convention is used in `sklearn` and `tensorflow`.
        * Be careful: other tools may use a different convention
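
The row/column convention is easier to see on a tiny example. The labels below are made up; with `sklearn`, a binary confusion matrix unravels in the order TN, FP, FN, TP.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual labels and predictions for ten examples.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 1, 0, 1, 1, 0]

# Actual values first, predictions second: rows = actual, columns = predicted.
cm = confusion_matrix(y_true, y_pred)
print(cm)

# For a binary problem, ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```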

* **Precision:** (True positives) / (True positives + False positives)
    * For when minimizing false positives is really important (like labeling an important email 'spam' and sending it to the spam folder).
    
* **Recall:** (True positives) / (True positives + False negatives)
    * For when minimizing false negatives is really important (like when testing for a really contagious/deadly disease like Ebola).
    
* **F1 Score:** Another way of summarizing the confusion matrix in one number, taking into account both precision and recall
    * F1 Score is a harmonic mean of Precision and Recall
    * `(2 * Precision * Recall) / (Precision + Recall)` =
    * `(2 * TP) / (2 * TP + FP + FN)`
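
As a quick check, the sketch below computes precision, recall, and F1 from raw counts and confirms that the harmonic-mean form of F1 agrees with the count-based form `2*TP / (2*TP + FP + FN)`. The counts and the helper name `precision_recall_f1` are hypothetical, chosen only for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts.
tp, fp, fn = 40, 10, 20

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
f1_from_counts = 2 * tp / (2 * tp + fp + fn)

print("Precision:", precision)
print("Recall:", recall)
print("F1 (harmonic mean):", f1)
print("F1 (from counts):", f1_from_counts)
```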

```
from sklearn.metrics import precision_score, recall_score, f1_score

# Predict once and reuse the predictions for every metric
y_pred = model.predict(X_test)
print('Precision: ', precision_score(y_test, y_pred))
print('Recall: ', recall_score(y_test, y_pred))
print('F1 Score: ', f1_score(y_test, y_pred))
```

```
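from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV

# Candidate hyperparameter values; every combination is cross-validated
param_grid = {
    'n_estimators': [10, 200],
    # 'auto' is omitted: it was deprecated and later removed for tree ensembles in scikit-learn
    'max_features': ['sqrt', 'log2', 0.5],
}
# scoring='recall' selects the combination with the best cross-validated recall
gs = GridSearchCV(estimator=model, param_grid=param_grid, scoring='recall', n_jobs=-1)
gs.fit(X_train, y_train)
recall_score(y_test, gs.best_estimator_.predict(X_test))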
```
* In this case, `GridSearchCV` gives us the estimator with the best cross-validated recall among the hyperparameter combinations in `param_grid`
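
For a version that runs end to end, here is a minimal sketch on synthetic data. The notes do not say which `model` is being tuned; `RandomForestClassifier` is assumed here because it accepts both `n_estimators` and `max_features`. The dataset, the smaller grid, and `cv=3` are also assumptions made to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, mildly imbalanced binary data (illustration only).
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# A deliberately small grid: 2 x 2 = 4 hyperparameter combinations.
param_grid = {"n_estimators": [10, 50], "max_features": ["sqrt", 0.5]}

gs = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring="recall",  # rank combinations by cross-validated recall
    cv=3,
)
gs.fit(X_train, y_train)

test_recall = recall_score(y_test, gs.predict(X_test))
print("Best hyperparameters:", gs.best_params_)
print("Best cross-validated recall:", gs.best_score_)
print("Recall on the test set:", test_recall)
```

Note that `gs.best_score_` is the mean recall across the cross-validation folds, so it will generally differ from the recall measured on the held-out test set.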


* **Matthews Correlation Coefficient:**

    * Takes into account all four confusion matrix categories
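    * One plus is that MCC red-flags certain degenerate situations: if the model only ever predicts one class, the denominator is zero and the score is undefined (implementations may raise an error or warning, or return 0), whereas metrics such as accuracy can still report an ordinary-looking score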
    * If we flip what values are considered positives vs negatives $\Rightarrow$ MCC score stays the same.
        * In contrast to F1-score, which is very sensitive to what you call a positive and what you call a negative
    * If you want to summarize a binary confusion matrix in one number, MCC is (arguably) the best way to do so.
    * Downside to MCC: does not work well in multi-class problems

\begin{equation}
MCC = \frac{(TP * TN) - (FP * FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
\end{equation}
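
The sketch below implements the formula directly from the four counts and illustrates two of the points above: swapping which class is called positive leaves MCC unchanged while F1 changes, and a degenerate confusion matrix makes the denominator zero. The counts are hypothetical, and raising a `ValueError` in the degenerate case is a choice made for this sketch rather than the behavior of any particular library.

```python
from math import sqrt

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Sketch-specific choice: signal the undefined case loudly.
        raise ValueError("MCC is undefined: a row or column sum is zero")
    return (tp * tn - fp * fn) / denominator

def f1(tp, fp, fn):
    """F1 score from confusion-matrix counts."""
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical counts with many positives and few negatives.
tp, tn, fp, fn = 90, 2, 5, 3

# Flipping the positive class swaps TP with TN and FP with FN.
original_mcc = mcc(tp, tn, fp, fn)
flipped_mcc = mcc(tp=tn, tn=tp, fp=fn, fn=fp)
original_f1 = f1(tp, fp, fn)
flipped_f1 = f1(tp=tn, fp=fn, fn=fp)

print("MCC original vs flipped:", original_mcc, flipped_mcc)
print("F1 original vs flipped:", original_f1, flipped_f1)
```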
    

----------------------------------------------------------------------------------------------------------------------
* The metrics above are calculated from hard predictions (correct vs incorrect); the metrics below are calculated from predicted probabilities

* **Receiver Operating Characteristic (ROC) curve**:
    * x-axis = False positive rate
    * y-axis = True positive rate
    * Used when the model generates a probability of an example belonging to one class or the other; each point on the curve corresponds to one threshold applied to that probability
    

\begin{equation}
\text{True Positive Rate} = \frac{TP}{TP + FN}
\end{equation}

\begin{equation}
\text{False Positive Rate} = \frac{FP}{FP + TN}
\end{equation}
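
To connect the two rates to the curve, the sketch below computes the true positive rate and false positive rate by hand at a single threshold and then uses `roc_curve` from `sklearn.metrics` to sweep all thresholds. The four labels and scores are made up for illustration, and `rates_at_threshold` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical labels and predicted probabilities of the positive class.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

def rates_at_threshold(y_true, y_score, threshold):
    """Return (TPR, FPR) when scores >= threshold are labeled positive."""
    y_pred = (y_score >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return tp / (tp + fn), fp / (fp + tn)

tpr_half, fpr_half = rates_at_threshold(y_true, y_score, threshold=0.5)
print("At threshold 0.5 -> TPR:", tpr_half, "FPR:", fpr_half)

# roc_curve returns one (FPR, TPR) point per candidate threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
```

Plotting `fpr` on the x-axis against `tpr` on the y-axis gives the ROC curve described above.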

* **Area Under the Curve (AUC):**