# Introduction: Machine Learning Project Part 3

In the first two parts of this project, we implemented the first six steps of the eight-step machine learning pipeline listed below:

1. Data cleaning and structuring
2. Exploratory Data Analysis
3. Feature Engineering/Selection
4. Evaluate/compare several machine learning models on a performance metric
5. Perform hyperparameter tuning on the best model
6. Evaluate the best model on the testing set
7. Interpret the model results
8. Draw conclusions and write a well-documented report

In this notebook, we will concentrate on the last two steps, which are where most of the value in the project comes from. We have our final model and its results, but what can we take away from them? To answer this question, we can employ a variety of techniques to try to understand how our model makes its predictions.

### Imports

We will use a similar stack of data science and machine learning imports as in the previous parts. These are all fairly standard tools of the trade, so being familiar with them will be very useful in your data science career!

In [1]:
# Pandas and numpy for data manipulation
import pandas as pd
import numpy as np

# No warnings about setting value on copy of slice
pd.options.mode.chained_assignment = None

# Matplotlib and seaborn for visualization
import matplotlib.pyplot as plt
%matplotlib inline

# Set default font size
plt.rcParams['font.size'] = 24

from IPython.core.pylabtools import figsize

import seaborn as sns

sns.set(font_scale = 2)

pd.set_option('display.max_columns', 60)

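# Imputing missing values and scaling features
# (Imputer was removed in scikit-learn 0.22; SimpleImputer is aliased to the old name)
from sklearn.impute import SimpleImputer as Imputer
from sklearn.preprocessing import MinMaxScaler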

# Machine Learning Models
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

import itertools
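
As a rough illustration of what the imputing and scaling imports are for, here is a minimal sketch on made-up toy arrays. It assumes a median fill strategy and a 0 to 1 scaling range, neither of which is stated in this section, and it uses `SimpleImputer` directly because recent scikit-learn releases no longer ship `Imputer`. The `toy_*` names are hypothetical and are not part of the project data.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrices (made-up values) with missing entries marked as NaN
toy_train = np.array([[1.0, 10.0],
                      [np.nan, 20.0],
                      [3.0, np.nan],
                      [5.0, 40.0]])
toy_test = np.array([[np.nan, 30.0],
                     [4.0, np.nan]])

# Learn the per-column medians on the training data only, then fill both sets
# (median strategy is an assumption made for this sketch)
imputer = SimpleImputer(strategy='median')
toy_train_filled = imputer.fit_transform(toy_train)
toy_test_filled = imputer.transform(toy_test)

# Learn the per-column min and max on the training data, then rescale both sets
scaler = MinMaxScaler(feature_range=(0, 1))
toy_train_scaled = scaler.fit_transform(toy_train_filled)
toy_test_scaled = scaler.transform(toy_test_filled)

print(toy_train_scaled)
print(toy_test_scaled)
```

Fitting on the training set and only transforming the testing set keeps information from the test data out of the preprocessing step. One consequence is that scaled test values are not guaranteed to fall inside the 0 to 1 range.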

In [None]:
# Read the features and labels into DataFrames from the GitHub URLs
X = pd.read_csv('https://raw.githubusercontent.com/WillKoehrsen/machine-learning-project/master/data/training_features.csv', header = 0)
X_test = pd.read_csv('https://raw.githubusercontent.com/WillKoehrsen/machine-learning-project/master/data/testing_features.csv', header = 0)
y = pd.read_csv('https://raw.githubusercontent.com/WillKoehrsen/machine-learning-project/master/data/training_labels.csv', header = 0)
y_test = pd.read_csv('https://raw.githubusercontent.com/WillKoehrsen/machine-learning-project/master/data/testing_labels.csv', header = 0)

# Display sizes of data
print('Training Feature Size: ', X.shape)
print('Testing Feature Size:  ', X_test.shape)
print('Training Labels Size:  ', y.shape)
print('Testing Labels Size:   ', y_test.shape)
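
Beyond printing the shapes, it can be worth checking that the four pieces actually line up before using them. The sketch below is one possible way to do that, not part of the original project: `check_split_shapes` is a hypothetical helper, and the toy frames with columns `a`, `b`, and `score` are made-up stand-ins so that the example runs without downloading anything.

```python
import pandas as pd


def check_split_shapes(X, X_test, y, y_test):
    """Return a list of problems found; an empty list means the splits line up."""
    problems = []
    # Each set of features needs exactly one label per row
    if X.shape[0] != y.shape[0]:
        problems.append('training features and labels differ in row count')
    if X_test.shape[0] != y_test.shape[0]:
        problems.append('testing features and labels differ in row count')
    # Training and testing features must share the same columns in the same order
    if list(X.columns) != list(X_test.columns):
        problems.append('training and testing features have different columns')
    return problems


# Toy stand-ins (hypothetical values) for the four DataFrames read above
X_demo = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
X_test_demo = pd.DataFrame({'a': [7, 8], 'b': [9, 10]})
y_demo = pd.DataFrame({'score': [50, 60, 70]})
y_test_demo = pd.DataFrame({'score': [80, 90]})

print(check_split_shapes(X_demo, X_test_demo, y_demo, y_test_demo))
```

With the real data, the same call would be `check_split_shapes(X, X_test, y, y_test)`, assuming the four reads above succeeded.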