# Linear Regression: Predicting Taxi-Out Delay
## Given the dataset, predict the runway time of the flight.

#### Task Details
1. Go through the dataset, perform preprocessing, and then perform a 90:10 split for training and testing purposes.
2. First, label encode the columns that require it.
3. Your target or **y variable is TAXI-OUT time** (the `TAXI_OUT` column). Use all 8 algorithms listed below on the dataset, with RMSE (Root Mean Square Error) as the loss score.
4. Now, one-hot encode all the data points and perform the 3rd step again.

Keep in mind that you will be using the same split dataframe for all training and testing, and should not split again.
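
To make the difference between the two encodings in steps 2 and 4 concrete, here is a minimal sketch on a toy frame. The values are made up for illustration; only the column names mirror the dataset, and the choice of `LabelEncoder` and `pd.get_dummies` is an assumption about how the encodings might be done, not a requirement of the task.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame with hypothetical values; column names mirror the dataset.
df = pd.DataFrame({
    "CARRIER_CODE": ["B6", "DL", "AA", "B6"],
    "DISTANCE": [636, 760, 2475, 1069],
})

# Label encoding: each category becomes one integer code in a single column.
label_encoded = df.copy()
label_encoded["CARRIER_CODE"] = LabelEncoder().fit_transform(df["CARRIER_CODE"])

# One-hot encoding: each category becomes its own 0/1 indicator column.
one_hot_encoded = pd.get_dummies(df, columns=["CARRIER_CODE"], dtype=int)

print(label_encoded)
print(one_hot_encoded)
```

Label encoding keeps the column count fixed but imposes an arbitrary ordering on the categories, while one-hot encoding avoids that ordering at the cost of one extra column per category. scikit-learn documents `LabelEncoder` as intended for target values; `OrdinalEncoder` is the feature-side equivalent.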

##### Some models to consider:-

Linear Models:-

1. Linear Regression
2. Ridge Regression (commonly called L2 regularization)
3. Lasso Regression (commonly called L1 regularization)

Non-linear Models:-

1. KNN model
2. SVR
3. Naive Bayes
4. Random Forest
5. LightGBM(Tree Based Model)
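
As a rough sketch of how several models could be scored with RMSE on one shared split, the loop below fits the three linear models on synthetic data. The data, the `alpha` values, and the `random_state` are all assumptions chosen for illustration; they are not taken from the flight dataset or from the task.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the flight dataset): y is linear in X plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.5, 0.0]) + rng.normal(scale=0.5, size=500)

# Split once and reuse the same split for every model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

models = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
}

rmse_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # RMSE is the square root of the mean squared error.
    rmse_scores[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

for name, score in rmse_scores.items():
    print(f"{name}: RMSE = {score:.3f}")
```

The same dictionary-and-loop pattern extends to the non-linear models by adding more entries to `models`.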

##### Expected Submission
Open submission until the end of July.
Submit a notebook. End goal: we will see whether label encoding or one-hot encoding works better for the model, and which of the 8 algorithms performs best. This is just a comparative study, with plots, to understand the results on bigger datasets.

##### Evaluation
A complete report along with comparative graphs.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

### Configuring the Data Frame
> **Source:** MUHAMMET ALI BÜYÜKNACAR'S [work](https://www.kaggle.com/buyuknacar/jfk-flight-notebook). 

In [2]:
pd.set_option('display.width', 1200)
pd.set_option('display.max_columns', 25)

### Loading the data from csv file

In [3]:
taxi = pd.read_csv('M1_final.csv')
taxi.head()

Unnamed: 0,MONTH,DAY_OF_MONTH,DAY_OF_WEEK,OP_UNIQUE_CARRIER,TAIL_NUM,DEST,DEP_DELAY,CRS_ELAPSED_TIME,DISTANCE,CRS_DEP_M,DEP_TIME_M,CRS_ARR_M,Temperature,Dew Point,Humidity,Wind,Wind Speed,Wind Gust,Pressure,Condition,sch_dep,sch_arr,TAXI_OUT
0,11,1,5,B6,N828JB,CHS,-1,124,636,324,323,448,48,34,58,W,25,38,29.86,Fair / Windy,9,17,14
1,11,1,5,B6,N992JB,LAX,-7,371,2475,340,333,531,48,34,58,W,25,38,29.86,Fair / Windy,9,17,15
2,11,1,5,B6,N959JB,FLL,40,181,1069,301,341,482,48,34,58,W,25,38,29.86,Fair / Windy,9,17,22
3,11,1,5,B6,N999JQ,MCO,-2,168,944,345,343,513,48,34,58,W,25,38,29.86,Fair / Windy,9,17,12
4,11,1,5,DL,N880DN,ATL,-4,139,760,360,356,499,46,32,58,W,24,35,29.91,Fair / Windy,9,17,13


### Editing the column names for clarity
> **Source:** MUHAMMET ALI BÜYÜKNACAR'S [work](https://www.kaggle.com/buyuknacar/jfk-flight-notebook). 

In [4]:
col_names = {"OP_UNIQUE_CARRIER":"CARRIER_CODE",
                "TAIL_NUM":"FLIGHT_NO",
                "CRS_ELAPSED_TIME":"SCHEDULED_DURATION",
                "CRS_DEP_M":"SCHEDULED_DEPARTURE",
                "DEP_TIME_M":"ACTUAL_DEP_TIME",
                "CRS_ARR_M":"SCHEDULED_ARRIVAL",
                "sch_dep":"FLT_SCH_ARRIVAL",
                "sch_arr":"FLT_SCH_DEPARTURE"}

taxi = taxi.rename(col_names, axis=1)
taxi.head()

Unnamed: 0,MONTH,DAY_OF_MONTH,DAY_OF_WEEK,CARRIER_CODE,FLIGHT_NO,DEST,DEP_DELAY,SCHEDULED_DURATION,DISTANCE,SCHEDULED_DEPARTURE,ACTUAL_DEP_TIME,SCHEDULED_ARRIVAL,Temperature,Dew Point,Humidity,Wind,Wind Speed,Wind Gust,Pressure,Condition,FLT_SCH_ARRIVAL,FLT_SCH_DEPARTURE,TAXI_OUT
0,11,1,5,B6,N828JB,CHS,-1,124,636,324,323,448,48,34,58,W,25,38,29.86,Fair / Windy,9,17,14
1,11,1,5,B6,N992JB,LAX,-7,371,2475,340,333,531,48,34,58,W,25,38,29.86,Fair / Windy,9,17,15
2,11,1,5,B6,N959JB,FLL,40,181,1069,301,341,482,48,34,58,W,25,38,29.86,Fair / Windy,9,17,22
3,11,1,5,B6,N999JQ,MCO,-2,168,944,345,343,513,48,34,58,W,25,38,29.86,Fair / Windy,9,17,12
4,11,1,5,DL,N880DN,ATL,-4,139,760,360,356,499,46,32,58,W,24,35,29.91,Fair / Windy,9,17,13


### Preparing the Data for Analysis

#### Aggregating using: min, max, means, sum, distinct and count.

In [20]:
pd.set_option('display.max_columns', 120) # configure dataframe to show all columns

dest = taxi.drop(columns=['MONTH', 'DAY_OF_MONTH', 'DAY_OF_WEEK']) # create new dataframe from taxi and drop the date columns

dest.groupby(['DEST']).describe() # creates dataframe grouped by destinations, showing details on relevant columns' numerical data.  

Unnamed: 0_level_0,DEP_DELAY,DEP_DELAY,DEP_DELAY,DEP_DELAY,DEP_DELAY,DEP_DELAY,DEP_DELAY,DEP_DELAY,SCHEDULED_DURATION,SCHEDULED_DURATION,SCHEDULED_DURATION,SCHEDULED_DURATION,SCHEDULED_DURATION,SCHEDULED_DURATION,SCHEDULED_DURATION,SCHEDULED_DURATION,DISTANCE,DISTANCE,DISTANCE,DISTANCE,DISTANCE,DISTANCE,DISTANCE,DISTANCE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,Temperature,Temperature,Temperature,Temperature,Temperature,Temperature,Temperature,Temperature,Humidity,Humidity,Humidity,Humidity,Humidity,Humidity,Humidity,Humidity,Wind Speed,Wind Speed,Wind Speed,Wind Speed,Wind Speed,Wind Speed,Wind Speed,Wind Speed,Wind Gust,Wind Gust,Wind Gust,Wind Gust,Wind Gust,Wind Gust,Wind Gust,Wind Gust,Pressure,Pressure,Pressure,Pressure,Pressure,Pressure,Pressure,Pressure,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,TAXI_OUT,TAXI_OUT,TAXI_OUT,TAXI_OUT,TAXI_OUT,TAXI_OUT,TAXI_OUT,TAXI_OUT
Unnamed: 0_level_1,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
DEST,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
ABQ,58.0,10.327586,39.259098,-13.0,-6.75,-3.0,7.00,215.0,58.0,310.258621,3.522162,307.0,307.0,307.0,314.0,314.0,58.0,1826.0,0.0,1826.0,1826.0,1826.0,1826.0,1826.0,58.0,1195.844828,4.697150,1189.0,1193.0,1200.0,1200.0,1200.0,58.0,1206.172414,40.006200,1178.0,1190.00,1194.0,1203.25,1415.0,58.0,1386.103448,1.682622,1383.0,1387.0,1387.0,1387.0,1387.0,58.0,41.758621,7.472312,26.0,36.00,42.0,48.0,56.0,58.0,62.327586,23.869814,10.0,49.25,61.0,84.25,97.0,58.0,11.948276,5.325946,0.0,7.25,12.0,16.00,25.0,58.0,5.551724,11.050399,0.0,0.0,0.0,0.0,33.0,58.0,30.083966,0.305424,29.38,29.8925,30.095,30.2975,30.73,58.0,37.189655,6.974784,19.0,33.0,40.0,41.75,47.0,58.0,31.465517,5.737582,13.0,28.0,31.0,33.75,43.0,58.0,22.620690,8.243717,11.0,15.0,21.0,29.0,41.0
ATL,795.0,6.725786,26.290094,-13.0,-4.00,-1.0,6.50,341.0,795.0,158.148428,6.547875,136.0,155.0,158.0,163.0,172.0,795.0,760.0,0.0,760.0,760.0,760.0,760.0,760.0,795.0,780.316981,280.985778,315.0,510.0,833.0,1015.0,1275.0,795.0,787.042767,285.398459,314.0,509.00,835.0,1012.00,1356.0,795.0,938.465409,284.157564,463.0,665.0,983.0,1169.0,1433.0,795.0,41.530818,8.252013,17.0,35.50,42.0,47.0,67.0,795.0,57.201258,23.670753,10.0,45.00,59.0,74.00,97.0,795.0,12.396226,6.267162,0.0,8.00,12.0,16.00,36.0,795.0,5.322013,11.705322,0.0,0.0,0.0,0.0,47.0,795.0,30.086679,0.294683,29.21,29.8800,30.110,30.3100,30.74,795.0,30.070440,10.505849,6.0,26.0,30.0,36.00,55.0,795.0,28.817610,8.360167,13.0,20.0,31.0,36.00,45.0,795.0,19.272956,6.881488,7.0,14.0,18.0,23.0,41.0
AUS,485.0,4.682474,27.899950,-14.0,-6.00,-3.0,1.00,219.0,485.0,261.525773,6.883765,237.0,257.0,263.0,265.0,277.0,485.0,1521.0,0.0,1521.0,1521.0,1521.0,1521.0,1521.0,485.0,915.839175,327.606022,432.0,555.0,1090.0,1259.0,1391.0,485.0,905.676289,340.460645,4.0,551.00,826.0,1259.00,1432.0,485.0,710.602062,473.716208,29.0,68.0,752.0,1020.0,1439.0,485.0,41.059794,7.876535,21.0,35.00,41.0,47.0,67.0,485.0,58.736082,23.388394,10.0,47.00,60.0,75.00,97.0,485.0,12.484536,6.251799,0.0,8.00,12.0,16.00,35.0,485.0,5.171134,11.672068,0.0,0.0,0.0,0.0,45.0,485.0,30.102474,0.296995,29.23,29.8800,30.120,30.3300,30.74,485.0,34.263918,8.710237,3.0,29.0,33.0,39.00,55.0,485.0,28.070103,8.684985,3.0,20.0,30.0,35.00,45.0,485.0,19.226804,6.192346,9.0,15.0,18.0,23.0,41.0
BNA,366.0,4.106557,30.840324,-14.0,-8.00,-4.5,0.00,199.0,366.0,166.013661,5.357604,154.0,163.0,165.0,170.0,175.0,366.0,765.0,0.0,765.0,765.0,765.0,765.0,765.0,366.0,919.994536,247.202429,485.0,815.0,951.0,1135.0,1230.0,366.0,924.101093,248.513042,483.0,810.00,945.5,1131.00,1302.0,366.0,1026.008197,245.996085,580.0,919.0,1066.0,1241.0,1337.0,366.0,42.232240,7.778308,23.0,36.00,42.0,48.0,67.0,366.0,57.677596,22.930984,10.0,45.25,58.5,73.00,97.0,366.0,12.543716,6.352392,0.0,8.00,12.0,16.00,35.0,366.0,6.109290,12.318115,0.0,0.0,0.0,0.0,45.0,366.0,30.093907,0.291997,29.23,29.8900,30.110,30.3100,30.74,366.0,36.155738,7.914634,20.0,29.0,37.0,41.00,55.0,366.0,32.191257,8.488286,12.0,28.0,34.5,38.00,45.0,366.0,19.702186,7.110967,7.0,15.0,19.0,24.0,41.0
BOS,1243.0,10.536605,43.788570,-16.0,-6.00,-3.0,3.50,515.0,1243.0,78.744167,7.367545,66.0,73.0,79.0,83.0,104.0,1243.0,187.0,0.0,187.0,187.0,187.0,187.0,187.0,1243.0,870.674980,315.771158,360.0,597.0,862.0,1165.0,1395.0,1243.0,861.517297,326.669282,2.0,569.00,849.0,1137.50,1436.0,1243.0,821.985519,371.751796,3.0,555.0,833.0,1108.0,1430.0,1243.0,41.448109,8.122324,18.0,36.00,42.0,47.0,67.0,1243.0,57.622687,23.067360,10.0,46.00,58.0,73.50,97.0,1243.0,12.336283,6.314327,0.0,8.00,12.0,16.00,35.0,1243.0,5.577635,11.979090,0.0,0.0,0.0,0.0,49.0,1243.0,30.102904,0.292522,29.22,29.8900,30.120,30.3300,30.75,1243.0,28.242156,8.278958,1.0,21.0,29.0,34.00,52.0,1243.0,27.872084,7.799142,0.0,21.0,29.0,34.00,46.0,1243.0,20.218825,6.371894,9.0,16.0,19.0,24.0,41.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
SMF,82.0,11.292683,30.616903,-10.0,-5.00,-1.0,12.25,151.0,82.0,385.951220,7.525926,379.0,379.0,379.0,394.0,394.0,82.0,2521.0,0.0,2521.0,2521.0,2521.0,2521.0,2521.0,82.0,1026.963415,20.042206,1000.0,1019.0,1020.0,1052.0,1052.0,82.0,1038.256098,36.266779,992.0,1015.00,1037.0,1054.00,1171.0,82.0,1232.914634,15.285918,1214.0,1218.0,1234.0,1251.0,1251.0,82.0,41.768293,7.204763,25.0,37.00,41.5,46.0,61.0,82.0,55.878049,22.971475,10.0,44.25,57.0,70.00,96.0,82.0,12.182927,5.604513,0.0,7.25,12.0,16.75,25.0,82.0,4.195122,10.282958,0.0,0.0,0.0,0.0,35.0,82.0,30.083171,0.292751,29.22,29.9000,30.110,30.3000,30.74,82.0,30.219512,4.000075,20.0,28.0,30.0,32.00,42.0,82.0,30.646341,2.786085,24.0,28.0,31.0,32.00,38.0,82.0,18.878049,5.774337,11.0,15.0,18.0,22.0,38.0
SRQ,105.0,4.942857,39.103617,-13.0,-6.00,-3.0,2.00,372.0,105.0,189.952381,7.836528,183.0,184.0,190.0,190.0,209.0,105.0,1041.0,0.0,1041.0,1041.0,1041.0,1041.0,1041.0,105.0,535.780952,58.645936,449.0,490.0,560.0,568.0,614.0,105.0,540.723810,71.747219,440.0,486.00,561.0,582.00,932.0,105.0,725.733333,56.251370,633.0,692.0,744.0,752.0,804.0,105.0,40.714286,8.015267,21.0,37.00,40.0,46.0,60.0,105.0,58.914286,24.412762,10.0,48.00,61.0,77.00,97.0,105.0,12.228571,6.498901,0.0,7.00,10.0,16.00,31.0,105.0,5.447619,11.931681,0.0,0.0,0.0,0.0,41.0,105.0,30.096571,0.307286,29.23,29.8300,30.130,30.3300,30.73,105.0,35.885714,9.020611,17.0,31.0,36.0,41.00,55.0,105.0,22.866667,7.037081,11.0,19.0,20.0,23.00,37.0,105.0,19.895238,6.139287,10.0,15.0,19.0,24.0,38.0
STT,62.0,0.467742,11.465869,-10.0,-5.75,-3.0,0.00,43.0,62.0,234.451613,3.700587,228.0,232.0,232.5,238.0,243.0,62.0,1623.0,0.0,1623.0,1623.0,1623.0,1623.0,1623.0,62.0,516.838710,39.070006,492.0,492.0,515.0,515.0,680.0,62.0,517.306452,42.409243,485.0,501.25,510.0,514.00,723.0,62.0,809.354839,40.533369,746.0,790.0,807.0,807.0,975.0,62.0,39.403226,8.237180,22.0,34.25,39.5,45.0,61.0,62.0,55.629032,25.873104,10.0,48.00,60.0,71.50,97.0,62.0,12.693548,6.674121,0.0,8.25,12.0,16.75,31.0,62.0,7.241935,13.233942,0.0,0.0,0.0,0.0,41.0,62.0,30.126613,0.323571,29.26,29.8700,30.190,30.3475,30.73,62.0,47.483871,5.833037,20.0,45.0,47.5,51.00,55.0,62.0,17.919355,1.884348,12.0,17.0,18.0,19.00,23.0,62.0,26.661290,7.539934,14.0,21.0,27.0,33.5,40.0
SYR,389.0,5.511568,33.195267,-15.0,-7.00,-5.0,-1.00,262.0,389.0,82.293059,7.185788,69.0,76.0,80.0,90.0,98.0,389.0,209.0,0.0,209.0,209.0,209.0,209.0,209.0,389.0,853.347044,308.978823,395.0,605.0,989.0,1139.0,1327.0,389.0,858.858612,312.447328,408.0,600.00,997.0,1134.00,1366.0,389.0,935.640103,311.894079,464.0,697.0,1082.0,1215.0,1410.0,389.0,41.282776,7.971744,18.0,36.00,42.0,47.0,65.0,389.0,58.683805,23.231916,10.0,48.00,59.0,75.00,97.0,389.0,12.244216,6.089072,0.0,8.00,12.0,16.00,31.0,389.0,4.773779,11.230262,0.0,0.0,0.0,0.0,49.0,389.0,30.091465,0.290771,29.22,29.8800,30.110,30.3100,30.74,389.0,29.935733,8.991031,12.0,21.0,30.0,38.00,54.0,389.0,28.357326,6.647093,14.0,22.0,30.0,34.00,45.0,389.0,20.935733,6.678052,9.0,16.0,20.0,25.0,41.0


In [6]:
taxi.groupby(['DEST']).count()

Unnamed: 0_level_0,MONTH,DAY_OF_MONTH,DAY_OF_WEEK,CARRIER_CODE,FLIGHT_NO,DEP_DELAY,SCHEDULED_DURATION,DISTANCE,SCHEDULED_DEPARTURE,ACTUAL_DEP_TIME,SCHEDULED_ARRIVAL,Temperature,Dew Point,Humidity,Wind,Wind Speed,Wind Gust,Pressure,Condition,FLT_SCH_ARRIVAL,FLT_SCH_DEPARTURE,TAXI_OUT
DEST,,,,,,,,,,,,,,,,,,,,,,
ABQ,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58
ATL,795,795,795,795,795,795,795,795,795,795,795,795,795,795,795,795,795,795,795,795,795,795
AUS,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485
BNA,366,366,366,366,366,366,366,366,366,366,366,366,366,366,366,366,366,366,366,366,366,366
BOS,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
SMF,82,82,82,82,82,82,82,82,82,82,82,82,82,82,82,82,82,82,82,82,82,82
SRQ,105,105,105,105,105,105,105,105,105,105,105,105,105,105,105,105,105,105,105,105,105,105
STT,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62
SYR,389,389,389,389,389,389,389,389,389,389,389,389,389,389,389,389,389,389,389,389,389,389
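
The heading of this section lists min, max, mean, sum, distinct, and count, while the cells above use `describe()` and `count()`. As a minimal sketch, the named aggregates could be requested directly with `agg`. The rows below are hypothetical values for illustration; only the column names mirror the renamed frame, and `nunique` is used here as the "distinct" aggregate.

```python
import pandas as pd

# Toy rows with hypothetical values; column names mirror the renamed frame.
flights = pd.DataFrame({
    "DEST": ["ATL", "ATL", "BOS", "BOS", "BOS"],
    "TAXI_OUT": [13, 19, 22, 18, 22],
})

# One row per destination, one column per aggregate of TAXI_OUT.
summary = flights.groupby("DEST")["TAXI_OUT"].agg(
    ["min", "max", "mean", "sum", "nunique", "count"]
)
print(summary)
```

Selecting a single column before calling `agg` keeps the result flat, which avoids the wide multi-level header that `describe()` produces across every numeric column.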


### Splitting the dataset (90:10)
> See Real Python's [guide](https://realpython.com/train-test-split-python-data/) for splitting data with scikit-learn and Python.

> See [this](https://stackoverflow.com/questions/24147278/how-do-i-create-test-and-train-samples-from-one-dataframe-with-pandas) Stack Overflow thread for examples using pandas data frames.

In [None]:
# Split once and reuse this split for every model.
# test_size=0.1 leaves 90% of the rows for training;
# random_state fixes the shuffle so the split is reproducible across runs.
train, test = train_test_split(taxi, test_size=0.1, random_state=42)
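
Since the task asks for one split to be reused by every model, it may help to check that the split is repeatable and to separate the target from the features afterwards. The sketch below does this on a toy frame with made-up values; the `random_state` value and the variable names `X_train`, `y_train`, `X_test`, and `y_test` are assumptions introduced for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame with hypothetical values standing in for the real `taxi` frame.
toy = pd.DataFrame({
    "DISTANCE": range(100, 120),
    "DEP_DELAY": range(-10, 10),
    "TAXI_OUT": range(10, 30),
})

# Fixing random_state (an arbitrary value here) makes the split repeatable.
train_a, test_a = train_test_split(toy, test_size=0.1, random_state=42)
train_b, test_b = train_test_split(toy, test_size=0.1, random_state=42)

# Separate the target from the features after splitting.
X_train, y_train = train_a.drop(columns=["TAXI_OUT"]), train_a["TAXI_OUT"]
X_test, y_test = test_a.drop(columns=["TAXI_OUT"]), test_a["TAXI_OUT"]

print(len(train_a), len(test_a))
print(test_a.index.equals(test_b.index))
```

Because both calls use the same seed, the two test sets contain the same rows, so any encoding applied later can be evaluated on identical data.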