# Machine Learning with `scikit-learn`

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set style
sns.set_style("whitegrid") 
sns.set_palette('viridis')
plt.rcParams['axes.spines.top'] = False
plt.rcParams['axes.spines.right'] = False
plt.rcParams['font.family'] = 'monospace'

### Machine Learning
## Cross validation and data resampling
from sklearn.model_selection import train_test_split

## Modeling
from sklearn.linear_model import LogisticRegression

## Model Evaluation
from sklearn.metrics import (
    confusion_matrix,
    precision_score, 
    recall_score,
    f1_score,
    classification_report)

## Basic Machine Learning Workflow
Below is a commonly used workflow:

<br>
<center>
<img 
  src="../assets/model_process.png" 
  alt="Modeling Process" 
  style="width:auto;height:300px;"
>
</center>
<br>
<br>

### Data Resampling
Splitting our data into training and test sets is important for evaluating model performance and estimating how well the model generalizes to new data.
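As a minimal sketch of such a split, the example below uses `train_test_split` (imported above) on a small synthetic dataset. The toy arrays, the 25% test size, the `random_state` value, and the use of `stratify` are illustrative assumptions, not choices made by this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows, 3 numeric features, balanced binary target.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(100, 3))
y = np.array([0, 1] * 50)

# Hold out 25% of the rows for testing.
# stratify=y keeps the class proportions similar in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print("train shape:", X_train.shape)
print("test shape:", X_test.shape)
```

The model is fit only on the training rows; the test rows are kept aside until evaluation so they can stand in for new data.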

- **Under-fitting**
    - Model is too simple to capture complex trends in the data
    - Giveaway: poor performance on both the training and test datasets

<br>
<br>
<center>
<img 
  src="../assets/under_fitting.png" 
  alt="Underfitting" 
  style="width:auto;height:375px;"
>
</center>
<br>
<br>

<br>
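One way to see under-fitting in practice is to fit a linear model to data whose classes are not linearly separable. The sketch below assumes a synthetic dataset from `make_circles` (two concentric rings), which is not part of this notebook; the sample size and noise level are arbitrary.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumed synthetic data: two concentric rings, so no straight line separates the classes.
X, y = make_circles(n_samples=500, noise=0.1, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# A plain logistic regression can only draw a linear decision boundary.
model = LogisticRegression()
model.fit(X_train, y_train)

# score() returns accuracy for classifiers.
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

Both accuracies should be low and close to each other, which is the under-fitting pattern: the model is too simple for the data, so it does poorly everywhere.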

- **Over-fitting**
    - Model finds trends that don't exist (it fits noise in the training data)
    - Giveaway: great performance on the training data and *poor performance on the test dataset*

<br>
<br>

<center>
<img 
  src="../assets/under_fitting.png" 
  alt="Overfitting" 
  style="width:auto;height:375px;"
>
</center>
<br>
<br>
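Over-fitting can be reproduced with a very flexible model on noisy data. The sketch below is an assumption-laden illustration: it swaps in a `DecisionTreeClassifier` with no depth limit (this notebook only imports `LogisticRegression`) and uses `make_classification` with `flip_y=0.2` to inject label noise.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed synthetic data with deliberately noisy labels (flip_y).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5,
    flip_y=0.2, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# With max_depth=None the tree keeps splitting until every leaf is pure,
# so it can memorize the training set, noise included.
tree = DecisionTreeClassifier(max_depth=None, random_state=0)
tree.fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

The gap between the two numbers is the signal to look for: near-perfect training accuracy paired with clearly lower test accuracy.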

### Optimal Complexity

Generally, as we move from simple models to more complex ones:
- Training error continues to decrease (potentially reaching zero!)
- Test error decreases initially, but increases once the model starts over-fitting
- The goal is to find the model complexity that minimizes test error, ensuring good performance on new data

<br>
<br>

<center>
<img 
  src="../assets/optimal_complexity.png" 
  alt="Optimal Complexity" 
  style="width:auto;height:375px;"
>
</center>
<br>
<br>
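The trade-off described above can be sketched by sweeping a single complexity setting and recording both errors. Here tree depth stands in for "model complexity"; the `DecisionTreeClassifier`, the synthetic data, and the depth range of 1 to 15 are all assumptions chosen for illustration. In practice the complexity setting is usually chosen with cross-validation rather than on the final test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed synthetic data with some label noise.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5,
    flip_y=0.2, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

depths = list(range(1, 16))  # larger depth = more complex model
train_errors, test_errors = [], []

for depth in depths:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    # Error rate = 1 - accuracy
    train_errors.append(1 - tree.score(X_train, y_train))
    test_errors.append(1 - tree.score(X_test, y_test))

# Depth with the lowest test error in this sweep.
best_depth = depths[int(np.argmin(test_errors))]

for d, tr, te in zip(depths, train_errors, test_errors):
    print(f"depth={d:2d}  train error={tr:.3f}  test error={te:.3f}")
print("depth with lowest test error:", best_depth)
```

Plotting `train_errors` and `test_errors` against `depths` gives a curve comparable to the figure above.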

## Model Objects in `scikit-learn`

Each model in `sklearn` is defined as its own class. You create an instance, train it with `fit()`, and generate predictions with `predict()`.
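A minimal sketch of that pattern, using the `LogisticRegression` class and the metric functions imported at the top of this notebook. The synthetic data from `make_classification` and the `max_iter=1000` setting are assumptions made only so the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split

# Assumed synthetic binary classification data.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# 1. Create the model object, 2. fit on training data, 3. predict on test data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate predictions against the held-out labels.
cm = confusion_matrix(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(cm)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(report)
```

Other estimators follow the same `fit()` / `predict()` interface, so swapping the model class usually leaves the rest of the code unchanged.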

A huge benefit of using `scikit-learn` for machine learning is its amazing [documentation](https://scikit-learn.org/stable/index.html).