# Test a trained model

Once you have trained a model, you can test it with the test data you set aside.
We will start by rerunning the code from the previous notebook to create a trained model.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

In [2]:
# Load our data from the csv file
delays_df = pd.read_csv('./Lots_of_flight_data.csv') 

# Remove rows with null values since those will crash our linear regression model training
delays_df.dropna(inplace=True)

# Move our features into the X DataFrame
X = delays_df.loc[:,['DISTANCE', 'CRS_ELAPSED_TIME']]

# Move our labels into the y DataFrame
y = delays_df.loc[:,['ARR_DELAY']] 

# Split our data into test and training DataFrames
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create a scikit-learn LinearRegression object
regressor = LinearRegression()
# Use the fit method to train the model using your training data
regressor.fit(X_train, y_train)

LinearRegression()

## Prediction

Use the **scikit-learn `LinearRegression.predict`** method to have our trained model predict values for our test data.  
We stored our test data in `X_test`.  
We will store the predicted results in `y_pred`.

In [3]:
y_pred = regressor.predict(X_test)

In [4]:
y_pred

array([[3.47739078],
       [5.89055919],
       [4.33288464],
       ...,
       [5.84678979],
       [6.05195889],
       [5.66255414]])
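The raw array is hard to judge on its own, so it can help to line predictions up against the actual labels. The sketch below is only an illustration: it uses randomly generated stand-in data (`X_demo`, `y_demo`, `demo_regressor` and `demo_pred` are hypothetical names, not part of this notebook) so that it runs without the CSV file. In the notebook itself you would use `y_test` and `y_pred` in the same way.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in data with the same column names as the notebook
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    'DISTANCE': rng.uniform(100, 2500, 50),
    'CRS_ELAPSED_TIME': rng.uniform(40, 400, 50),
})
y_demo = pd.DataFrame({'ARR_DELAY': rng.normal(5, 30, 50)})

demo_regressor = LinearRegression().fit(X_demo, y_demo)

# Because the labels are a one-column DataFrame, predict returns a 2-D array
demo_pred = demo_regressor.predict(X_demo)
print(demo_pred.shape)

# Flatten the predictions and place them next to the actual values
comparison = pd.DataFrame({
    'actual': y_demo['ARR_DELAY'].to_numpy(),
    'predicted': demo_pred.ravel(),
})
comparison['error'] = comparison['actual'] - comparison['predicted']
print(comparison.head())
```

The `ravel()` call is needed because the predictions come back with one column per label column, which is why the output above is printed as a list of single-element rows.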

## Model Testing/Evaluation

In this step, we will evaluate the model using the standard metrics available in `sklearn.metrics`. A model's quality is judged by how closely its predictions match the actual values. We will assess how well the model performs against the test data using the following standard metrics:
- Mean Absolute Error
- Mean Squared Error
- R^2 score (the coefficient of determination)

These metrics are discussed further in the README for this section.
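To make the three metrics listed above concrete, here is a minimal sketch that computes each one by hand with NumPy and checks the result against `sklearn.metrics`. The arrays `y_true` and `y_hat` are made-up toy values, not taken from the flight data.

```python
import numpy as np
from sklearn import metrics

# Toy values for illustration only (not from the flight data)
y_true = np.array([10.0, -5.0, 0.0, 20.0])
y_hat = np.array([8.0, -1.0, 3.0, 15.0])

errors = y_true - y_hat

# Mean Absolute Error: average size of the errors, ignoring their sign
mae_manual = np.mean(np.abs(errors))

# Mean Squared Error: average of the squared errors, so large misses count more
mse_manual = np.mean(errors ** 2)

# R^2: 1 minus (squared error of the model / squared error of predicting the mean)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print("MAE:", mae_manual, metrics.mean_absolute_error(y_true, y_hat))
print("MSE:", mse_manual, metrics.mean_squared_error(y_true, y_hat))
print("R2: ", r2_manual, metrics.r2_score(y_true, y_hat))
```

Lower MAE and MSE are better, while an R^2 score closer to 1 is better. An R^2 score near 0 means the model does about as well as always predicting the mean label.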

In [5]:
from sklearn import metrics

mae = metrics.mean_absolute_error(y_test, y_pred)
mse = metrics.mean_squared_error(y_test, y_pred)
r2 = metrics.r2_score(y_test, y_pred)

print("The Model Performance for the Testing Set")
print("-----------------------------------------")
print("MAE is {:.2f}".format(mae))
print("MSE is {:.2f}".format(mse))
print("R2 score is {:f}".format(r2))

The Model Performance for the Testing Set
-----------------------------------------
MAE is 23.09
MSE is 2250.44
R2 score is 0.000096
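One way to read these numbers, sketched below under a couple of assumptions: taking the square root of the MSE gives a root mean squared error in the same units as `ARR_DELAY`, and comparing R^2 with a baseline that always predicts the mean label shows what a score of 0 looks like. The `toy_actual` values are hypothetical and only serve to demonstrate the baseline.

```python
import math
import numpy as np
from sklearn import metrics

# MSE value printed by the evaluation cell above
reported_mse = 2250.44

# RMSE is expressed in the same units as ARR_DELAY
rmse = math.sqrt(reported_mse)
print("RMSE is {:.2f}".format(rmse))

# Toy labels (hypothetical values, not from the flight data)
toy_actual = np.array([10.0, -5.0, 0.0, 20.0])

# A "model" that ignores its inputs and always predicts the mean label
toy_baseline = np.full_like(toy_actual, toy_actual.mean())
baseline_r2 = metrics.r2_score(toy_actual, toy_baseline)
print("Baseline R2 is {:.2f}".format(baseline_r2))
```

An R2 score of 0.000096 is almost indistinguishable from that baseline, which suggests that `DISTANCE` and `CRS_ELAPSED_TIME` on their own explain almost none of the variation in `ARR_DELAY`.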
