# Train a linear regression model
Once your data is prepared, you can train a model.

There are multiple libraries and methods you can use to train models. In this notebook we will use the **LinearRegression** model from the **scikit-learn** library.

We need our DataFrame with the data loaded, all rows containing null values removed, and the features and labels split into separate training and test sets. So, we'll start by rerunning the commands from the previous notebooks.

In [6]:
import pandas as pd
from sklearn.model_selection import train_test_split

In [16]:
# Load our data from the csv file
delays_df = pd.read_csv('Lots_of_flight_data.csv') 

# Remove rows with null values, since LinearRegression raises an error when the training data contains NaN
delays_df.dropna(inplace=True)

# Move our features into the X DataFrame
X = delays_df.loc[:,['DISTANCE', 'CRS_ELAPSED_TIME']]

# Move our labels into the y DataFrame
y = delays_df.loc[:,['ARR_DELAY']] 

print(X)
print(y)
print(delays_df)

# Split our data into test and training DataFrames
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,    # Hold out 30% of the rows for testing
    random_state=42   # Fix the seed so the split is reproducible
)

        DISTANCE  CRS_ELAPSED_TIME
0           1670               225
1           1670               225
2            580               105
3            580               105
4            580               100
...          ...               ...
299995       386                87
299996       386                92
299997       221                71
299998       221                81
299999       240                66

[295832 rows x 2 columns]
        ARR_DELAY
0           -17.0
1           -25.0
2           -13.0
3           -12.0
4            -7.0
...           ...
299995       -7.0
299996      -13.0
299997       -2.0
299998      -13.0
299999        6.0

[295832 rows x 1 columns]
           FL_DATE OP_UNIQUE_CARRIER TAIL_NUM  OP_CARRIER_FL_NUM ORIGIN DEST  \
0       2018-10-01                WN   N221WN                802    ABQ  BWI   
1       2018-10-01                WN   N8329B               3744    ABQ  BWI   
2       2018-10-01                WN   N920WN               1019    AB

Use the *fit* method of the **scikit-learn LinearRegression** class to train a linear regression model on the training data stored in X_train and y_train.

In [15]:
from sklearn.linear_model import LinearRegression

regressor = LinearRegression()     # Create a scikit-learn LinearRegression object
regressor.fit(X_train, y_train)    # Use the fit method to train the model using your training data

The *regressor* object now contains your trained linear regression model.
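
To see what a fitted model actually holds, the sketch below trains a separate `LinearRegression` on a small synthetic dataset and prints the learned parameters. The names `X_demo`, `y_demo`, and `demo_regressor` and all the values are made up for illustration and are not taken from the flight data. The labels are generated from a known linear rule, so the fitted coefficients should come back close to that rule.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the flight data (hypothetical values, not from the CSV)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    'DISTANCE': rng.integers(100, 3000, size=200),
    'CRS_ELAPSED_TIME': rng.integers(40, 400, size=200),
})

# Build the labels from a known linear rule so the result is easy to check
y_demo = pd.DataFrame({
    'ARR_DELAY': 0.01 * X_demo['DISTANCE'] - 0.05 * X_demo['CRS_ELAPSED_TIME'] + 3.0
})

demo_regressor = LinearRegression()
demo_regressor.fit(X_demo, y_demo)

# After fit, the learned parameters are stored as attributes on the object
print(demo_regressor.coef_)       # One weight per feature, shape (1, 2)
print(demo_regressor.intercept_)  # The constant term, shape (1,)
```

The same two attributes, `coef_` and `intercept_`, should be available on the *regressor* object trained above, where they describe the line fitted to the real flight data.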