# Simplified 4-Step Process for Modeling using Scikit-Learn
1. Instantiate Model
> `model = Classifier()`
2. Fit model to training data
> `model.fit(X_train, y_train)`
3. Predict on test data with the fitted model
> `pred_test = model.predict(X_test)`
4. Score the model using a metric to evaluate how well it performs
> `fbeta_score(y_test, pred_test, beta=0.5)`
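
The four steps above can be sketched end to end as follows. This is a minimal illustration, not a recommended setup: the synthetic dataset from `make_classification`, the choice of `LogisticRegression` as the classifier, and the split sizes are all assumptions made so the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data (assumed, for illustration only)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# 1. Instantiate the model (LogisticRegression stands in for `Classifier`)
model = LogisticRegression()

# 2. Fit the model to the training data
model.fit(X_train, y_train)

# 3. Predict on the test data with the fitted model
pred_test = model.predict(X_test)

# 4. Score the predictions; beta=0.5 weights precision more than recall
score = fbeta_score(y_test, pred_test, beta=0.5)
print(f"F0.5 score: {score:.3f}")
```

Any scikit-learn estimator that exposes `fit` and `predict` can be swapped in at step 1 without changing the other three steps.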

# Working With Missing Data
1. Remove
> We can remove (or “drop”) the rows or columns holding the missing values
2. Impute
> Replace missing values with an estimate: the mean, median, or mode (most frequent value), a prediction from a univariate linear regression, etc.
3. Work Around
> We can build models that tolerate missing values and use only the information that is present
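
As a rough sketch of the imputation option, the snippet below fills missing values with the mean, median, and mode using pandas. The `age` and `city` columns and their values are hypothetical, chosen only to show one numeric and one categorical case.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, 40.0],
    "city": ["NY", "LA", None, "NY"],
})

# Numeric column: fill with the mean or the median of the observed values
age_mean = df["age"].fillna(df["age"].mean())
age_median = df["age"].fillna(df["age"].median())

# Categorical column: fill with the mode (most frequent observed value)
city_mode = df["city"].fillna(df["city"].mode()[0])

print(age_median.tolist())
print(city_mode.tolist())
```

The median is usually the safer choice when a numeric column is skewed, since the mean is pulled toward outliers.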

Resource: [How to Handle Missing Data](https://towardsdatascience.com/how-to-handle-missing-data-8646b18db0d4)

## Option 1: Removing
1. Ask "Why are the values missing?"
> Removing data can lead to biased models

Ex: In survey data, the questions that go unanswered may themselves indicate different types of respondents

It may be valuable to record how many values are missing, or which questions they belong to, for each observation:

| Q1 | Q2  | Q3  | Missing |
|----|-----|-----|---------|
| 1  | NaN | 1   | 1       |
| 4  | 4   | NaN | 1       |
| 1  | 2   | 1   | 0       |
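
The `Missing` column in the table can be derived with pandas as sketched below. The per-question indicator columns (named with a `_missing` suffix) are an added illustration of tracking which questions were skipped; that naming is an assumption, not part of the table above.

```python
import numpy as np
import pandas as pd

# Same values as the table above
df = pd.DataFrame({
    "Q1": [1, 4, 1],
    "Q2": [np.nan, 4, 2],
    "Q3": [1, np.nan, 1],
})
questions = ["Q1", "Q2", "Q3"]

# Which questions are missing: one 0/1 indicator column per question
indicators = df[questions].isna().astype(int).add_suffix("_missing")

# How many questions are missing for each observation
df["Missing"] = df[questions].isna().sum(axis=1)

print(df)
print(indicators)
```

Both the count and the indicators can be kept as features even if the original missing values are later dropped or imputed.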

### When is it ok to remove missing values?
1. Data entry errors
2. Mechanical errors
3. Didn't need the data
4. The missing data is in the column to be predicted
5. There is no variability in the observations
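
Two of the cases above translate directly into pandas, as in the sketch below. It assumes that "no variability" means a column whose value is the same for every row; the column names `feature`, `constant`, and `target` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one constant column and one missing target value
df = pd.DataFrame({
    "feature": [1.0, 2.0, 3.0, 4.0],
    "constant": [7, 7, 7, 7],
    "target": [0.0, 1.0, np.nan, 1.0],
})

# Case 4: rows with a missing target cannot be used for supervised training
labeled = df.dropna(subset=["target"])

# Case 5: columns with a single distinct value carry no information
constant_cols = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
reduced = labeled.drop(columns=constant_cols)

print(reduced)
```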

### Other Considerations
1. Drop observations (rows) that contain missing values
2. Drop columns (features) that contain missing values
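
The two choices can be compared with `DataFrame.dropna`, as in this sketch. The data is hypothetical, and the 50% cutoff used to drop mostly empty columns is an arbitrary assumption for illustration, not a rule from these notes.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "b" is mostly missing
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0],
    "b": [np.nan, np.nan, np.nan, 1.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})

# Drop observations: remove every row that has at least one missing value
rows_dropped = df.dropna(axis=0)

# Drop columns: remove every column that has at least one missing value
cols_dropped = df.dropna(axis=1)

# Middle ground: drop only columns where more than half the values are missing
mostly_empty = df.columns[df.isna().mean() > 0.5]
trimmed = df.drop(columns=mostly_empty)

print(rows_dropped.shape, cols_dropped.shape, trimmed.shape)
```

Here dropping rows keeps a single observation and dropping columns keeps a single feature, which shows how quickly either choice can discard usable data.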