# Linear Regression: Predicting Taxi-Out Delay
## Given the dataset, predict the taxi-out time (the time a departing flight spends taxiing to the runway).

#### Task Details
1. Explore the dataset, preprocess it, and then perform a 90:10 train/test split.
2. First, label-encode the categorical columns that need it.
3. Your target (**y variable**) is the **TAXI_OUT** time. Train all 8 algorithms listed below on the dataset, using RMSE (Root Mean Square Error) as the loss score.
4. Next, one-hot encode the categorical columns instead and repeat Step 3.
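Steps 2 and 4 differ only in how categorical columns are turned into numbers. A minimal sketch on a made-up toy frame that uses two of this notebook's column names; pandas' `astype("category").cat.codes` stands in for label encoding and `pd.get_dummies` for one-hot encoding (which columns to treat as categorical is an assumption left to the reader):

```python
import pandas as pd

# Toy frame using column names from this notebook (values are made up for illustration)
df = pd.DataFrame({
    "CARRIER_CODE": ["B6", "DL", "B6", "AA"],
    "DEST": ["CHS", "LAX", "FLL", "LAX"],
    "DISTANCE": [636, 2475, 1069, 2475],
    "TAXI_OUT": [14, 15, 22, 12],
})
categorical = ["CARRIER_CODE", "DEST"]

# Label encoding: each category becomes one integer code in the same column
# (codes follow the sorted category order, e.g. AA=0, B6=1, DL=2)
label_encoded = df.copy()
for col in categorical:
    label_encoded[col] = label_encoded[col].astype("category").cat.codes

# One-hot encoding: each category becomes its own 0/1 indicator column
one_hot = pd.get_dummies(df, columns=categorical, dtype=int)

print(label_encoded)
print(one_hot.columns.tolist())
```

Label encoding keeps the frame narrow but imposes an arbitrary order on the categories, which linear models will treat as meaningful; one-hot encoding avoids that at the cost of one column per category. Encoding the full frame before splitting keeps the codes consistent between train and test rows.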

Keep in mind that the same train/test split must be used for all training and testing; do not split the data again.
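One way to guarantee that both encodings see exactly the same rows is to split the row index once and reuse it. A minimal sketch on a small synthetic frame (an assumption standing in for `taxi`), with a fixed `random_state` so the split is reproducible:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the taxi DataFrame
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "DEP_DELAY": rng.integers(-15, 60, size=200),
    "DISTANCE": rng.integers(100, 2500, size=200),
    "TAXI_OUT": rng.integers(5, 40, size=200),
})

# Split the row labels once (90:10); every encoding variant is indexed with these
train_idx, test_idx = train_test_split(
    data.index.to_numpy(), test_size=0.10, random_state=42
)

train_df = data.loc[train_idx]
test_df = data.loc[test_idx]
print(len(train_df), len(test_df))
```

The label-encoded and one-hot-encoded feature frames can then both be indexed with `.loc[train_idx]` and `.loc[test_idx]`, so the comparison between encodings is not confounded by a different split.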

##### Some models to consider:

Linear Models:

1. Linear Regression
2. Ridge Regression (L2 regularization)
3. Lasso Regression (L1 regularization)
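For reference, the objectives minimized by scikit-learn's `Ridge` and `Lasso`, with $n$ samples, weights $w$ and regularization strength $\alpha$ (the `alpha` parameter), are:

$$
\begin{aligned}
\text{Ridge (L2):}\quad & \min_{w}\; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \\
\text{Lasso (L1):}\quad & \min_{w}\; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
\end{aligned}
$$

Because the L1 penalty is not differentiable at zero, Lasso can set some coefficients exactly to zero, while Ridge shrinks coefficients toward zero without eliminating them.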

Non-linear Models:

1. KNN (k-nearest neighbors) regression
2. SVR (support vector regression)
3. Naive Bayes
4. Random Forest
5. LightGBM (tree-based model)
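A minimal sketch of the comparison loop behind Step 3, on synthetic `make_regression` data standing in for the encoded taxi features (so the scores it prints say nothing about the real dataset). Naive Bayes and LightGBM are left out of this sketch: scikit-learn's naive Bayes estimators are classifiers, and LightGBM is a separate package whose `lightgbm.LGBMRegressor` follows the same `fit`/`predict` interface and could be added to the dictionary if installed. Wrapping KNN and SVR in a `StandardScaler` pipeline is also a choice made here, since both are sensitive to feature scale.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data standing in for the encoded taxi features
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10, random_state=42)

# Distance/kernel-based models are scaled; the others are used as-is
models = {
    "Linear Regression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    "SVR": make_pipeline(StandardScaler(), SVR(C=10.0)),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # RMSE is the square root of the mean squared error on the held-out 10%
    rmse[name] = np.sqrt(mean_squared_error(y_test, pred))

# Print from best (lowest RMSE) to worst
for name, score in sorted(rmse.items(), key=lambda kv: kv[1]):
    print(f"{name:>17}: {score:.2f}")
```

Running the same loop once on the label-encoded features and once on the one-hot-encoded features, with the same split, gives the two sets of scores that Step 4 asks to compare.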

##### Expected Submission
Submit a notebook that determines whether label encoding or one-hot encoding works better, and which of the 8 algorithms performs best. This is a comparative study, with plots to help interpret the results on larger datasets.
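For the comparative plots, a grouped bar chart of test RMSE per model and encoding is one option. A minimal sketch; `plot_rmse_comparison` is a hypothetical helper that expects two dicts (one per encoding) with the same model names as keys and test RMSE as values:

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_rmse_comparison(label_rmse, onehot_rmse):
    """Grouped bar chart of test RMSE per model for label vs one-hot encoding.

    Both arguments map model name -> RMSE and must share the same keys.
    """
    names = list(label_rmse)
    x = np.arange(len(names))
    width = 0.4  # two bars per model, side by side

    fig, ax = plt.subplots(figsize=(10, 4))
    ax.bar(x - width / 2, [label_rmse[n] for n in names], width, label="Label encoding")
    ax.bar(x + width / 2, [onehot_rmse[n] for n in names], width, label="One-hot encoding")
    ax.set_xticks(x)
    ax.set_xticklabels(names, rotation=30, ha="right")
    ax.set_ylabel("Test RMSE")
    ax.set_title("TAXI_OUT prediction: label vs one-hot encoding")
    ax.legend()
    fig.tight_layout()
    return fig, ax
```

Lower bars are better; plotting both encodings on shared axes makes it easy to see whether the choice of encoding matters more for some model families than for others.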

##### Evaluation
A complete report, along with comparative graphs.

In [42]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from numpy import arange  # float ranges, e.g. candidate alpha values
from sklearn.linear_model import LinearRegression, Lasso, Ridge, RidgeCV
from sklearn.model_selection import RepeatedKFold, train_test_split
from sklearn.metrics import mean_squared_error  # RMSE = sqrt(MSE)
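The imports above bring in `RidgeCV`, `RepeatedKFold` and `numpy.arange`, a combination commonly used to pick Ridge's `alpha` by repeated k-fold cross-validation. A minimal sketch on synthetic `make_regression` data (an assumption; in the notebook `X` and `y` would come from the numeric taxi columns and `TAXI_OUT`), scoring with RMSE to match the task:

```python
from numpy import arange
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import RepeatedKFold

# Synthetic stand-in for the numeric taxi features and TAXI_OUT target
X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=1)

# 5-fold cross-validation repeated 3 times with different shuffles
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=1)

# Candidate alphas 0.01, 0.02, ..., 0.99; RidgeCV keeps the best-scoring one
# (scikit-learn scores are "higher is better", hence the negated RMSE)
model = RidgeCV(alphas=arange(0.01, 1.0, 0.01), cv=cv,
                scoring="neg_root_mean_squared_error")
model.fit(X, y)
print(f"chosen alpha: {model.alpha_:.2f}")
```

After fitting, `model` is refit on all of `X` with the chosen `alpha_`, so it can be evaluated on a held-out test split like any other regressor.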


### Configuring the Data Frame
> **Source:** MUHAMMET ALI BÜYÜKNACAR'S [work](https://www.kaggle.com/buyuknacar/jfk-flight-notebook). 

In [43]:
pd.set_option('display.width', 1200)
pd.set_option('display.max_columns', 25)

### Loading the data from the CSV file

In [44]:
taxi = pd.read_csv('M1_final.csv')
taxi.head()

,MONTH,DAY_OF_MONTH,DAY_OF_WEEK,OP_UNIQUE_CARRIER,TAIL_NUM,DEST,DEP_DELAY,CRS_ELAPSED_TIME,DISTANCE,CRS_DEP_M,DEP_TIME_M,CRS_ARR_M,Temperature,Dew Point,Humidity,Wind,Wind Speed,Wind Gust,Pressure,Condition,sch_dep,sch_arr,TAXI_OUT
0,11,1,5,B6,N828JB,CHS,-1,124,636,324,323,448,48,34,58,W,25,38,29.86,Fair / Windy,9,17,14
1,11,1,5,B6,N992JB,LAX,-7,371,2475,340,333,531,48,34,58,W,25,38,29.86,Fair / Windy,9,17,15
2,11,1,5,B6,N959JB,FLL,40,181,1069,301,341,482,48,34,58,W,25,38,29.86,Fair / Windy,9,17,22
3,11,1,5,B6,N999JQ,MCO,-2,168,944,345,343,513,48,34,58,W,25,38,29.86,Fair / Windy,9,17,12
4,11,1,5,DL,N880DN,ATL,-4,139,760,360,356,499,46,32,58,W,24,35,29.91,Fair / Windy,9,17,13


### Editing the column names for clarity
> **Source:** MUHAMMET ALI BÜYÜKNACAR'S [work](https://www.kaggle.com/buyuknacar/jfk-flight-notebook). 

In [45]:
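# NOTE (assumption): TAIL_NUM is the aircraft's tail (registration) number rather than a
# flight number, and sch_dep / sch_arr appear to count flights scheduled to depart / arrive,
# so the two FLT_SCH_* names below look swapped. They are kept because later cells use them.
col_names = {"OP_UNIQUE_CARRIER": "CARRIER_CODE",
             "TAIL_NUM": "FLIGHT_NO",
             "CRS_ELAPSED_TIME": "SCHEDULED_DURATION",
             "CRS_DEP_M": "SCHEDULED_DEPARTURE",
             "DEP_TIME_M": "ACTUAL_DEP_TIME",
             "CRS_ARR_M": "SCHEDULED_ARRIVAL",
             "sch_dep": "FLT_SCH_ARRIVAL",
             "sch_arr": "FLT_SCH_DEPARTURE"}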

taxi = taxi.rename(col_names, axis=1)
taxi.head()

,MONTH,DAY_OF_MONTH,DAY_OF_WEEK,CARRIER_CODE,FLIGHT_NO,DEST,DEP_DELAY,SCHEDULED_DURATION,DISTANCE,SCHEDULED_DEPARTURE,ACTUAL_DEP_TIME,SCHEDULED_ARRIVAL,Temperature,Dew Point,Humidity,Wind,Wind Speed,Wind Gust,Pressure,Condition,FLT_SCH_ARRIVAL,FLT_SCH_DEPARTURE,TAXI_OUT
0,11,1,5,B6,N828JB,CHS,-1,124,636,324,323,448,48,34,58,W,25,38,29.86,Fair / Windy,9,17,14
1,11,1,5,B6,N992JB,LAX,-7,371,2475,340,333,531,48,34,58,W,25,38,29.86,Fair / Windy,9,17,15
2,11,1,5,B6,N959JB,FLL,40,181,1069,301,341,482,48,34,58,W,25,38,29.86,Fair / Windy,9,17,22
3,11,1,5,B6,N999JQ,MCO,-2,168,944,345,343,513,48,34,58,W,25,38,29.86,Fair / Windy,9,17,12
4,11,1,5,DL,N880DN,ATL,-4,139,760,360,356,499,46,32,58,W,24,35,29.91,Fair / Windy,9,17,13
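By default, `DataFrame.rename` silently skips mapping keys that are not existing columns, so a typo in `col_names` would go unnoticed. Passing `errors="raise"` turns that into a `KeyError`. A minimal sketch on a two-column toy frame:

```python
import pandas as pd

df = pd.DataFrame({"OP_UNIQUE_CARRIER": ["B6"], "TAIL_NUM": ["N828JB"]})

# errors="raise" makes rename fail loudly if a key is not an existing column
renamed = df.rename(
    columns={"OP_UNIQUE_CARRIER": "CARRIER_CODE", "TAIL_NUM": "FLIGHT_NO"},
    errors="raise",
)
print(renamed.columns.tolist())  # ['CARRIER_CODE', 'FLIGHT_NO']

try:
    # "TAIL_NUMBER" is a deliberate typo; with the default errors="ignore" it would be skipped
    df.rename(columns={"TAIL_NUMBER": "FLIGHT_NO"}, errors="raise")
    typo_caught = False
except KeyError as exc:
    typo_caught = True
    print("rename failed:", exc)
```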


### Preparing the Data for Analysis

### Checking for and dropping missing values

In [46]:
taxi.isnull().sum() # will show the total number of missing values for each feature

MONTH                  0
DAY_OF_MONTH           0
DAY_OF_WEEK            0
CARRIER_CODE           0
FLIGHT_NO              0
DEST                   0
DEP_DELAY              0
SCHEDULED_DURATION     0
DISTANCE               0
SCHEDULED_DEPARTURE    0
ACTUAL_DEP_TIME        0
SCHEDULED_ARRIVAL      0
Temperature            0
Dew Point              0
Humidity               0
Wind                   2
Wind Speed             0
Wind Gust              0
Pressure               0
Condition              0
FLT_SCH_ARRIVAL        0
FLT_SCH_DEPARTURE      0
TAXI_OUT               0
dtype: int64

In [47]:
taxi = taxi.dropna() # drop rows with missing values, reassign the dataframe with no null values as taxi
taxi.isnull().sum() # check to see the missing values have been dropped

MONTH                  0
DAY_OF_MONTH           0
DAY_OF_WEEK            0
CARRIER_CODE           0
FLIGHT_NO              0
DEST                   0
DEP_DELAY              0
SCHEDULED_DURATION     0
DISTANCE               0
SCHEDULED_DEPARTURE    0
ACTUAL_DEP_TIME        0
SCHEDULED_ARRIVAL      0
Temperature            0
Dew Point              0
Humidity               0
Wind                   0
Wind Speed             0
Wind Gust              0
Pressure               0
Condition              0
FLT_SCH_ARRIVAL        0
FLT_SCH_DEPARTURE      0
TAXI_OUT               0
dtype: int64
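The counts above show that only `Wind` has missing values (2 rows), so `dropna()` and `dropna(subset=["Wind"])` remove the same rows here. Naming the column and logging how many rows were dropped makes that explicit and would flag any change if the data were updated. A minimal sketch on a toy frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Wind": ["W", None, "NW", np.nan],
    "Wind Speed": [25, 10, 12, 8],
    "TAXI_OUT": [14, 15, 22, 12],
})

before = len(df)
# Drop only rows whose Wind value is missing (None and NaN both count as missing)
df = df.dropna(subset=["Wind"])
print(f"dropped {before - len(df)} of {before} rows")
```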

#### Aggregating with min, max, mean, sum, distinct count, and count

In [48]:
pd.set_option('display.max_columns', 120) # configure dataframe to show all columns

dest = taxi.drop(columns=['MONTH', 'DAY_OF_MONTH', 'DAY_OF_WEEK']) # create new dataframe from taxi and drop the date columns

dest.groupby(['DEST']).describe() # creates dataframe grouped by destinations, showing details on relevant columns' numerical data.  

,DEP_DELAY,DEP_DELAY,DEP_DELAY,DEP_DELAY,DEP_DELAY,DEP_DELAY,DEP_DELAY,DEP_DELAY,SCHEDULED_DURATION,SCHEDULED_DURATION,SCHEDULED_DURATION,SCHEDULED_DURATION,SCHEDULED_DURATION,SCHEDULED_DURATION,SCHEDULED_DURATION,SCHEDULED_DURATION,DISTANCE,DISTANCE,DISTANCE,DISTANCE,DISTANCE,DISTANCE,DISTANCE,DISTANCE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,Temperature,Temperature,Temperature,Temperature,Temperature,Temperature,Temperature,Temperature,Humidity,Humidity,Humidity,Humidity,Humidity,Humidity,Humidity,Humidity,Wind Speed,Wind Speed,Wind Speed,Wind Speed,Wind Speed,Wind Speed,Wind Speed,Wind Speed,Wind Gust,Wind Gust,Wind Gust,Wind Gust,Wind Gust,Wind Gust,Wind Gust,Wind Gust,Pressure,Pressure,Pressure,Pressure,Pressure,Pressure,Pressure,Pressure,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,TAXI_OUT,TAXI_OUT,TAXI_OUT,TAXI_OUT,TAXI_OUT,TAXI_OUT,TAXI_OUT,TAXI_OUT
,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
DEST
ABQ,58.0,10.327586,39.259098,-13.0,-6.75,-3.0,7.00,215.0,58.0,310.258621,3.522162,307.0,307.0,307.0,314.0,314.0,58.0,1826.0,0.0,1826.0,1826.0,1826.0,1826.0,1826.0,58.0,1195.844828,4.697150,1189.0,1193.0,1200.0,1200.0,1200.0,58.0,1206.172414,40.006200,1178.0,1190.00,1194.0,1203.25,1415.0,58.0,1386.103448,1.682622,1383.0,1387.0,1387.0,1387.0,1387.0,58.0,41.758621,7.472312,26.0,36.00,42.0,48.0,56.0,58.0,62.327586,23.869814,10.0,49.25,61.0,84.25,97.0,58.0,11.948276,5.325946,0.0,7.25,12.0,16.00,25.0,58.0,5.551724,11.050399,0.0,0.0,0.0,0.0,33.0,58.0,30.083966,0.305424,29.38,29.8925,30.095,30.2975,30.73,58.0,37.189655,6.974784,19.0,33.0,40.0,41.75,47.0,58.0,31.465517,5.737582,13.0,28.0,31.0,33.75,43.0,58.0,22.620690,8.243717,11.0,15.0,21.0,29.0,41.0
ATL,795.0,6.725786,26.290094,-13.0,-4.00,-1.0,6.50,341.0,795.0,158.148428,6.547875,136.0,155.0,158.0,163.0,172.0,795.0,760.0,0.0,760.0,760.0,760.0,760.0,760.0,795.0,780.316981,280.985778,315.0,510.0,833.0,1015.0,1275.0,795.0,787.042767,285.398459,314.0,509.00,835.0,1012.00,1356.0,795.0,938.465409,284.157564,463.0,665.0,983.0,1169.0,1433.0,795.0,41.530818,8.252013,17.0,35.50,42.0,47.0,67.0,795.0,57.201258,23.670753,10.0,45.00,59.0,74.00,97.0,795.0,12.396226,6.267162,0.0,8.00,12.0,16.00,36.0,795.0,5.322013,11.705322,0.0,0.0,0.0,0.0,47.0,795.0,30.086679,0.294683,29.21,29.8800,30.110,30.3100,30.74,795.0,30.070440,10.505849,6.0,26.0,30.0,36.00,55.0,795.0,28.817610,8.360167,13.0,20.0,31.0,36.00,45.0,795.0,19.272956,6.881488,7.0,14.0,18.0,23.0,41.0
AUS,485.0,4.682474,27.899950,-14.0,-6.00,-3.0,1.00,219.0,485.0,261.525773,6.883765,237.0,257.0,263.0,265.0,277.0,485.0,1521.0,0.0,1521.0,1521.0,1521.0,1521.0,1521.0,485.0,915.839175,327.606022,432.0,555.0,1090.0,1259.0,1391.0,485.0,905.676289,340.460645,4.0,551.00,826.0,1259.00,1432.0,485.0,710.602062,473.716208,29.0,68.0,752.0,1020.0,1439.0,485.0,41.059794,7.876535,21.0,35.00,41.0,47.0,67.0,485.0,58.736082,23.388394,10.0,47.00,60.0,75.00,97.0,485.0,12.484536,6.251799,0.0,8.00,12.0,16.00,35.0,485.0,5.171134,11.672068,0.0,0.0,0.0,0.0,45.0,485.0,30.102474,0.296995,29.23,29.8800,30.120,30.3300,30.74,485.0,34.263918,8.710237,3.0,29.0,33.0,39.00,55.0,485.0,28.070103,8.684985,3.0,20.0,30.0,35.00,45.0,485.0,19.226804,6.192346,9.0,15.0,18.0,23.0,41.0
BNA,366.0,4.106557,30.840324,-14.0,-8.00,-4.5,0.00,199.0,366.0,166.013661,5.357604,154.0,163.0,165.0,170.0,175.0,366.0,765.0,0.0,765.0,765.0,765.0,765.0,765.0,366.0,919.994536,247.202429,485.0,815.0,951.0,1135.0,1230.0,366.0,924.101093,248.513042,483.0,810.00,945.5,1131.00,1302.0,366.0,1026.008197,245.996085,580.0,919.0,1066.0,1241.0,1337.0,366.0,42.232240,7.778308,23.0,36.00,42.0,48.0,67.0,366.0,57.677596,22.930984,10.0,45.25,58.5,73.00,97.0,366.0,12.543716,6.352392,0.0,8.00,12.0,16.00,35.0,366.0,6.109290,12.318115,0.0,0.0,0.0,0.0,45.0,366.0,30.093907,0.291997,29.23,29.8900,30.110,30.3100,30.74,366.0,36.155738,7.914634,20.0,29.0,37.0,41.00,55.0,366.0,32.191257,8.488286,12.0,28.0,34.5,38.00,45.0,366.0,19.702186,7.110967,7.0,15.0,19.0,24.0,41.0
BOS,1243.0,10.536605,43.788570,-16.0,-6.00,-3.0,3.50,515.0,1243.0,78.744167,7.367545,66.0,73.0,79.0,83.0,104.0,1243.0,187.0,0.0,187.0,187.0,187.0,187.0,187.0,1243.0,870.674980,315.771158,360.0,597.0,862.0,1165.0,1395.0,1243.0,861.517297,326.669282,2.0,569.00,849.0,1137.50,1436.0,1243.0,821.985519,371.751796,3.0,555.0,833.0,1108.0,1430.0,1243.0,41.448109,8.122324,18.0,36.00,42.0,47.0,67.0,1243.0,57.622687,23.067360,10.0,46.00,58.0,73.50,97.0,1243.0,12.336283,6.314327,0.0,8.00,12.0,16.00,35.0,1243.0,5.577635,11.979090,0.0,0.0,0.0,0.0,49.0,1243.0,30.102904,0.292522,29.22,29.8900,30.120,30.3300,30.75,1243.0,28.242156,8.278958,1.0,21.0,29.0,34.00,52.0,1243.0,27.872084,7.799142,0.0,21.0,29.0,34.00,46.0,1243.0,20.218825,6.371894,9.0,16.0,19.0,24.0,41.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
SMF,82.0,11.292683,30.616903,-10.0,-5.00,-1.0,12.25,151.0,82.0,385.951220,7.525926,379.0,379.0,379.0,394.0,394.0,82.0,2521.0,0.0,2521.0,2521.0,2521.0,2521.0,2521.0,82.0,1026.963415,20.042206,1000.0,1019.0,1020.0,1052.0,1052.0,82.0,1038.256098,36.266779,992.0,1015.00,1037.0,1054.00,1171.0,82.0,1232.914634,15.285918,1214.0,1218.0,1234.0,1251.0,1251.0,82.0,41.768293,7.204763,25.0,37.00,41.5,46.0,61.0,82.0,55.878049,22.971475,10.0,44.25,57.0,70.00,96.0,82.0,12.182927,5.604513,0.0,7.25,12.0,16.75,25.0,82.0,4.195122,10.282958,0.0,0.0,0.0,0.0,35.0,82.0,30.083171,0.292751,29.22,29.9000,30.110,30.3000,30.74,82.0,30.219512,4.000075,20.0,28.0,30.0,32.00,42.0,82.0,30.646341,2.786085,24.0,28.0,31.0,32.00,38.0,82.0,18.878049,5.774337,11.0,15.0,18.0,22.0,38.0
SRQ,105.0,4.942857,39.103617,-13.0,-6.00,-3.0,2.00,372.0,105.0,189.952381,7.836528,183.0,184.0,190.0,190.0,209.0,105.0,1041.0,0.0,1041.0,1041.0,1041.0,1041.0,1041.0,105.0,535.780952,58.645936,449.0,490.0,560.0,568.0,614.0,105.0,540.723810,71.747219,440.0,486.00,561.0,582.00,932.0,105.0,725.733333,56.251370,633.0,692.0,744.0,752.0,804.0,105.0,40.714286,8.015267,21.0,37.00,40.0,46.0,60.0,105.0,58.914286,24.412762,10.0,48.00,61.0,77.00,97.0,105.0,12.228571,6.498901,0.0,7.00,10.0,16.00,31.0,105.0,5.447619,11.931681,0.0,0.0,0.0,0.0,41.0,105.0,30.096571,0.307286,29.23,29.8300,30.130,30.3300,30.73,105.0,35.885714,9.020611,17.0,31.0,36.0,41.00,55.0,105.0,22.866667,7.037081,11.0,19.0,20.0,23.00,37.0,105.0,19.895238,6.139287,10.0,15.0,19.0,24.0,38.0
STT,62.0,0.467742,11.465869,-10.0,-5.75,-3.0,0.00,43.0,62.0,234.451613,3.700587,228.0,232.0,232.5,238.0,243.0,62.0,1623.0,0.0,1623.0,1623.0,1623.0,1623.0,1623.0,62.0,516.838710,39.070006,492.0,492.0,515.0,515.0,680.0,62.0,517.306452,42.409243,485.0,501.25,510.0,514.00,723.0,62.0,809.354839,40.533369,746.0,790.0,807.0,807.0,975.0,62.0,39.403226,8.237180,22.0,34.25,39.5,45.0,61.0,62.0,55.629032,25.873104,10.0,48.00,60.0,71.50,97.0,62.0,12.693548,6.674121,0.0,8.25,12.0,16.75,31.0,62.0,7.241935,13.233942,0.0,0.0,0.0,0.0,41.0,62.0,30.126613,0.323571,29.26,29.8700,30.190,30.3475,30.73,62.0,47.483871,5.833037,20.0,45.0,47.5,51.00,55.0,62.0,17.919355,1.884348,12.0,17.0,18.0,19.00,23.0,62.0,26.661290,7.539934,14.0,21.0,27.0,33.5,40.0
SYR,389.0,5.511568,33.195267,-15.0,-7.00,-5.0,-1.00,262.0,389.0,82.293059,7.185788,69.0,76.0,80.0,90.0,98.0,389.0,209.0,0.0,209.0,209.0,209.0,209.0,209.0,389.0,853.347044,308.978823,395.0,605.0,989.0,1139.0,1327.0,389.0,858.858612,312.447328,408.0,600.00,997.0,1134.00,1366.0,389.0,935.640103,311.894079,464.0,697.0,1082.0,1215.0,1410.0,389.0,41.282776,7.971744,18.0,36.00,42.0,47.0,65.0,389.0,58.683805,23.231916,10.0,48.00,59.0,75.00,97.0,389.0,12.244216,6.089072,0.0,8.00,12.0,16.00,31.0,389.0,4.773779,11.230262,0.0,0.0,0.0,0.0,49.0,389.0,30.091465,0.290771,29.22,29.8800,30.110,30.3100,30.74,389.0,29.935733,8.991031,12.0,21.0,30.0,38.00,54.0,389.0,28.357326,6.647093,14.0,22.0,30.0,34.00,45.0,389.0,20.935733,6.678052,9.0,16.0,20.0,25.0,41.0


In [49]:
taxi.groupby(['DEST']).count() # show the counts of each feature for each destination

DEST,MONTH,DAY_OF_MONTH,DAY_OF_WEEK,CARRIER_CODE,FLIGHT_NO,DEP_DELAY,SCHEDULED_DURATION,DISTANCE,SCHEDULED_DEPARTURE,ACTUAL_DEP_TIME,SCHEDULED_ARRIVAL,Temperature,Dew Point,Humidity,Wind,Wind Speed,Wind Gust,Pressure,Condition,FLT_SCH_ARRIVAL,FLT_SCH_DEPARTURE,TAXI_OUT
ABQ,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58,58
ATL,795,795,795,795,795,795,795,795,795,795,795,795,795,795,795,795,795,795,795,795,795,795
AUS,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485
BNA,366,366,366,366,366,366,366,366,366,366,366,366,366,366,366,366,366,366,366,366,366,366
BOS,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243,1243
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
SMF,82,82,82,82,82,82,82,82,82,82,82,82,82,82,82,82,82,82,82,82,82,82
SRQ,105,105,105,105,105,105,105,105,105,105,105,105,105,105,105,105,105,105,105,105,105,105
STT,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62,62
SYR,389,389,389,389,389,389,389,389,389,389,389,389,389,389,389,389,389,389,389,389,389,389
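The heading for this part also lists sum and distinct counts, which `describe()` does not compute. pandas' named aggregation produces exactly the requested statistics in one call. A minimal sketch on a made-up toy frame; the output column names (`flights`, `taxi_min`, ...) are illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({
    "DEST": ["BOS", "BOS", "ATL", "ATL", "ATL"],
    "CARRIER_CODE": ["B6", "AA", "DL", "DL", "B6"],
    "TAXI_OUT": [20, 24, 14, 18, 16],
})

# Each keyword is an output column: (input column, aggregation function)
summary = df.groupby("DEST").agg(
    flights=("TAXI_OUT", "count"),
    taxi_min=("TAXI_OUT", "min"),
    taxi_max=("TAXI_OUT", "max"),
    taxi_mean=("TAXI_OUT", "mean"),
    taxi_sum=("TAXI_OUT", "sum"),
    carriers=("CARRIER_CODE", "nunique"),  # distinct carriers per destination
)
print(summary)
```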


In [50]:
pd.set_option('display.max_columns', 120) # configure dataframe to show all columns

condition = taxi.drop(columns=['MONTH', 'DAY_OF_MONTH', 'DAY_OF_WEEK']) # create new dataframe from taxi and drop the date columns

condition.groupby(['Condition']).describe() # creates dataframe grouped by Conditions, showing details on relevant columns' numerical data.  

,DEP_DELAY,DEP_DELAY,DEP_DELAY,DEP_DELAY,DEP_DELAY,DEP_DELAY,DEP_DELAY,DEP_DELAY,SCHEDULED_DURATION,SCHEDULED_DURATION,SCHEDULED_DURATION,SCHEDULED_DURATION,SCHEDULED_DURATION,SCHEDULED_DURATION,SCHEDULED_DURATION,SCHEDULED_DURATION,DISTANCE,DISTANCE,DISTANCE,DISTANCE,DISTANCE,DISTANCE,DISTANCE,DISTANCE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,SCHEDULED_DEPARTURE,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,ACTUAL_DEP_TIME,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,SCHEDULED_ARRIVAL,Temperature,Temperature,Temperature,Temperature,Temperature,Temperature,Temperature,Temperature,Humidity,Humidity,Humidity,Humidity,Humidity,Humidity,Humidity,Humidity,Wind Speed,Wind Speed,Wind Speed,Wind Speed,Wind Speed,Wind Speed,Wind Speed,Wind Speed,Wind Gust,Wind Gust,Wind Gust,Wind Gust,Wind Gust,Wind Gust,Wind Gust,Wind Gust,Pressure,Pressure,Pressure,Pressure,Pressure,Pressure,Pressure,Pressure,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_ARRIVAL,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,FLT_SCH_DEPARTURE,TAXI_OUT,TAXI_OUT,TAXI_OUT,TAXI_OUT,TAXI_OUT,TAXI_OUT,TAXI_OUT,TAXI_OUT
,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
Condition
Cloudy,4992.0,5.316106,38.055821,-22.0,-6.0,-3.0,0.0,1048.0,4992.0,225.11238,117.825273,57.0,125.0,189.5,365.0,697.0,4992.0,1264.958934,875.580354,94.0,483.0,1041.0,2248.0,4983.0,4992.0,825.221554,304.417077,301.0,531.75,857.0,1095.0,1439.0,4992.0,818.999199,311.664436,2.0,523.0,848.0,1092.0,1440.0,4992.0,909.3123,345.897879,1.0,662.0,917.0,1194.0,1439.0,4992.0,43.303486,7.217202,21.0,39.0,43.0,48.0,61.0,4992.0,63.158654,27.643813,10.0,52.0,70.0,85.25,97.0,4992.0,9.875401,4.3909,0.0,7.0,9.0,13.0,20.0,4992.0,0.990184,4.848259,0.0,0.0,0.0,0.0,30.0,4992.0,30.036973,0.3143069,29.22,29.81,30.02,30.2825,30.74,4992.0,30.476362,9.926571,0.0,25.0,30.0,37.0,54.0,4992.0,28.195913,8.576998,0.0,20.0,29.0,35.0,45.0,4992.0,20.258814,6.36242,7.0,16.0,19.0,24.0,41.0
Cloudy / Windy,341.0,18.695015,54.967806,-15.0,-6.0,-3.0,10.0,372.0,341.0,215.105572,110.922525,57.0,125.0,183.0,334.0,424.0,341.0,1191.331378,821.225672,94.0,483.0,1005.0,1990.0,2586.0,341.0,1006.175953,262.500451,301.0,840.0,1089.0,1214.0,1439.0,341.0,995.31085,303.706103,43.0,824.0,1084.0,1231.0,1433.0,341.0,1003.803519,412.28578,1.0,876.0,1076.0,1339.0,1439.0,341.0,42.560117,5.188794,34.0,40.0,41.0,44.0,57.0,341.0,70.11437,15.997844,10.0,65.0,67.0,77.0,97.0,341.0,23.143695,2.774866,21.0,22.0,22.0,23.0,31.0,341.0,23.398827,13.667454,0.0,28.0,30.0,31.0,44.0,341.0,29.871408,0.3133309,29.29,29.61,29.89,30.12,30.27,341.0,27.445748,7.211153,0.0,24.0,29.0,30.0,41.0,341.0,30.01173,7.867273,3.0,24.0,31.0,35.0,44.0,341.0,20.970674,7.001619,7.0,16.0,20.0,25.0,41.0
Drizzle and Fog,5.0,-3.0,5.700877,-7.0,-6.0,-5.0,-4.0,7.0,5.0,283.4,149.106673,93.0,151.0,379.0,387.0,407.0,5.0,1663.4,1110.715895,213.0,718.0,2378.0,2422.0,2586.0,5.0,570.8,7.463243,560.0,569.0,570.0,575.0,580.0,5.0,567.8,3.701351,565.0,565.0,567.0,568.0,574.0,5.0,746.2,48.100936,673.0,726.0,767.0,768.0,797.0,5.0,46.0,0.0,46.0,46.0,46.0,46.0,46.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,6.0,0.0,6.0,6.0,6.0,6.0,6.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,29.66,3.972055e-15,29.66,29.66,29.66,29.66,29.66,5.0,40.0,0.0,40.0,40.0,40.0,40.0,40.0,5.0,13.0,0.0,13.0,13.0,13.0,13.0,13.0,5.0,32.6,5.128353,27.0,29.0,32.0,35.0,40.0
Fair,5038.0,6.032553,37.086923,-18.0,-6.0,-3.0,1.0,830.0,5038.0,227.434895,119.711126,57.0,125.0,190.0,367.0,697.0,5038.0,1283.190155,889.593552,94.0,483.0,1069.0,2248.0,4983.0,5038.0,834.307265,319.986273,301.0,510.0,860.0,1130.0,1439.0,5038.0,832.908297,325.661777,3.0,508.0,859.5,1135.75,1440.0,5038.0,894.080389,371.6885,1.0,632.0,894.0,1225.0,1439.0,5038.0,38.084756,8.861102,19.0,32.0,36.0,44.0,61.0,5038.0,54.12366,15.445074,10.0,44.0,52.0,66.0,97.0,5038.0,11.679436,4.427325,0.0,8.0,12.0,15.0,20.0,5038.0,3.669909,9.03238,0.0,0.0,0.0,0.0,37.0,5038.0,30.262622,0.2247688,29.55,30.11,30.28,30.42,30.75,5038.0,30.778881,10.10801,0.0,25.0,30.5,38.0,55.0,5038.0,28.127233,8.104287,0.0,21.0,30.0,35.0,45.0,5038.0,20.240969,6.507207,6.0,15.0,19.0,24.0,41.0
Fair / Windy,918.0,9.896514,55.643271,-15.0,-5.0,-2.0,4.0,1276.0,918.0,224.440087,117.682002,59.0,127.25,188.0,357.75,675.0,918.0,1276.844227,878.432511,94.0,509.0,1069.0,2248.0,4983.0,918.0,704.971678,312.077716,301.0,479.0,560.0,1040.0,1439.0,918.0,703.8878,317.589702,17.0,475.0,555.5,1075.25,1435.0,918.0,813.333333,324.579349,3.0,617.0,701.5,1043.75,1439.0,918.0,33.820261,11.286946,17.0,25.0,32.0,40.0,64.0,918.0,48.150327,10.353788,24.0,43.0,48.0,54.0,68.0,918.0,23.28976,2.461242,21.0,22.0,23.0,24.0,36.0,918.0,29.519608,9.830468,0.0,30.0,32.0,33.0,46.0,918.0,30.232887,0.249465,29.5,30.09,30.26,30.36,30.63,918.0,32.397603,11.963497,0.0,21.0,33.0,40.0,53.0,918.0,25.255991,8.099359,3.0,19.0,23.0,34.0,45.0,918.0,21.265795,7.242576,8.0,16.0,20.0,25.0,41.0
Fog,147.0,2.639456,19.265252,-13.0,-5.0,-3.0,2.5,132.0,147.0,208.44898,114.798703,71.0,112.0,182.0,337.0,412.0,147.0,1161.319728,848.563742,184.0,425.0,1005.0,1990.0,2586.0,147.0,618.44898,150.457621,389.0,450.0,677.0,710.0,1309.0,147.0,611.292517,151.109106,1.0,446.5,680.0,708.0,843.0,147.0,768.938776,156.354761,91.0,631.0,823.0,879.0,1035.0,147.0,48.659864,7.382408,33.0,46.0,52.0,55.0,55.0,147.0,10.0,0.0,10.0,10.0,10.0,10.0,10.0,147.0,10.632653,5.070843,3.0,6.0,8.0,16.0,20.0,147.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,147.0,29.76034,0.3563983,29.22,29.33,29.87,30.01,30.28,147.0,27.972789,7.792643,13.0,21.0,26.0,32.0,44.0,147.0,25.965986,8.066008,10.0,21.0,21.0,34.0,42.0,147.0,24.965986,7.62693,10.0,18.0,25.0,30.0,41.0
Fog / Windy,28.0,25.892857,74.707003,-11.0,-2.25,0.5,28.5,390.0,28.0,207.964286,109.032609,83.0,124.75,170.0,302.0,404.0,28.0,1135.714286,830.919475,213.0,405.25,894.0,1717.0,2586.0,28.0,825.357143,182.387909,360.0,867.5,895.5,899.0,952.0,28.0,851.25,176.075452,349.0,890.25,897.5,932.0,953.0,28.0,986.178571,180.062963,517.0,994.75,1032.0,1072.25,1164.0,28.0,52.535714,1.261455,52.0,52.0,52.0,52.0,56.0,28.0,10.0,0.0,10.0,10.0,10.0,10.0,10.0,28.0,23.5,4.434712,21.0,22.0,22.0,22.0,36.0,28.0,4.928571,14.488638,0.0,0.0,0.0,0.0,46.0,28.0,29.276071,0.2060446,29.2,29.2,29.21,29.21,29.86,28.0,24.464286,5.460173,7.0,24.0,24.0,28.0,28.0,28.0,35.892857,5.573345,21.0,35.0,35.0,40.0,40.0,28.0,16.928571,3.799889,10.0,14.75,16.5,18.5,25.0
Heavy Rain,54.0,35.611111,71.089743,-9.0,-2.0,9.0,27.25,325.0,54.0,228.518519,106.586618,81.0,145.0,206.0,314.25,417.0,54.0,1280.740741,790.174454,184.0,608.0,1051.5,1892.0,2586.0,54.0,909.962963,291.175689,420.0,779.25,1016.5,1139.0,1412.0,54.0,918.907407,311.962157,155.0,565.25,1017.0,1167.0,1216.0,54.0,1060.703704,321.161902,258.0,740.75,1203.5,1310.0,1439.0,54.0,49.518519,3.446304,37.0,48.0,51.0,52.0,55.0,54.0,10.0,0.0,10.0,10.0,10.0,10.0,10.0,54.0,12.148148,4.923595,3.0,12.0,14.0,16.0,17.0,54.0,1.907407,6.808065,0.0,0.0,0.0,0.0,26.0,54.0,29.852037,0.1737204,29.29,29.79,29.82,29.88,30.12,54.0,34.981481,7.053678,2.0,31.0,37.0,37.0,42.0,54.0,32.648148,5.600396,2.0,29.5,32.5,36.0,42.0,54.0,27.037037,8.032864,12.0,21.0,26.0,34.5,41.0
Heavy Rain / Windy,6.0,1.166667,24.701552,-19.0,-9.75,-6.5,-1.75,50.0,6.0,222.666667,132.382275,81.0,143.0,171.0,334.75,390.0,6.0,1257.0,978.368847,184.0,684.0,886.0,2092.25,2475.0,6.0,863.666667,24.848877,819.0,862.25,869.0,869.75,895.0,6.0,864.833333,8.908797,850.0,861.5,867.0,868.75,876.0,6.0,1026.333333,49.854455,956.0,989.5,1040.0,1065.0,1076.0,6.0,47.0,0.0,47.0,47.0,47.0,47.0,47.0,6.0,10.0,0.0,10.0,10.0,10.0,10.0,10.0,6.0,25.0,0.0,25.0,25.0,25.0,25.0,25.0,6.0,32.0,0.0,32.0,32.0,32.0,32.0,32.0,6.0,29.75,0.0,29.75,29.75,29.75,29.75,29.75,6.0,22.0,0.0,22.0,22.0,22.0,22.0,22.0,6.0,40.0,0.0,40.0,40.0,40.0,40.0,40.0,6.0,15.833333,3.430258,12.0,14.25,15.0,16.5,22.0
Light Drizzle,194.0,3.958763,30.807918,-16.0,-6.0,-3.0,0.0,228.0,194.0,234.262887,134.94024,57.0,121.0,191.0,366.5,697.0,194.0,1338.381443,1009.767925,150.0,427.0,1055.0,2248.0,4983.0,194.0,683.649485,263.334459,301.0,486.25,629.0,762.25,1439.0,194.0,687.608247,269.968931,296.0,500.25,624.0,757.5,1430.0,194.0,750.283505,262.159934,2.0,642.0,785.5,892.0,1439.0,194.0,44.886598,3.068733,38.0,44.0,46.0,47.0,49.0,194.0,22.639175,29.649484,10.0,10.0,10.0,10.0,97.0,194.0,7.716495,3.363479,0.0,6.0,7.0,10.0,14.0,194.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,194.0,29.874433,0.2677036,29.54,29.67,29.835,30.13,30.36,194.0,29.180412,10.507507,1.0,19.0,32.0,38.0,50.0,194.0,23.221649,7.179948,4.0,17.0,23.0,29.0,35.0,194.0,21.659794,6.907811,10.0,16.0,21.0,26.0,41.0


### Sorting the dataset by date (month, day of month, day of week) and by carrier code

In [51]:
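# Sort by month, then day of week, then day of month.
# sort_values returns a new DataFrame for display; taxi itself is not reordered.
taxi.sort_values(by=['MONTH', 'DAY_OF_WEEK', 'DAY_OF_MONTH'])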

,MONTH,DAY_OF_MONTH,DAY_OF_WEEK,CARRIER_CODE,FLIGHT_NO,DEST,DEP_DELAY,SCHEDULED_DURATION,DISTANCE,SCHEDULED_DEPARTURE,ACTUAL_DEP_TIME,SCHEDULED_ARRIVAL,Temperature,Dew Point,Humidity,Wind,Wind Speed,Wind Gust,Pressure,Condition,FLT_SCH_ARRIVAL,FLT_SCH_DEPARTURE,TAXI_OUT
20511,1,6,1,B6,N952JB,FLL,-8,185,1069,301,293,486,33,29,85,W,10,0,29.93,Cloudy,1,4,31
20512,1,6,1,B6,N973JT,SJU,-7,223,1598,301,294,584,33,29,85,W,10,0,29.93,Cloudy,1,4,29
20513,1,6,1,B6,N982JB,LAX,-3,386,2475,330,327,536,33,29,85,W,10,0,29.93,Cloudy,10,24,25
20514,1,6,1,DL,N110DU,SLC,-1,330,1990,330,329,540,33,29,85,W,10,0,29.93,Cloudy,10,24,34
20515,1,6,1,AA,N176UW,PHX,-3,365,2153,345,342,590,33,29,85,W,10,0,29.93,Cloudy,10,24,31
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18414,12,29,7,B6,N317JB,BOS,18,70,187,1360,1378,1430,44,44,10,E,12,0,30.00,Light Rain,18,35,24
18415,12,29,7,AA,N117AN,BOS,-5,76,187,1390,1385,26,44,44,10,E,12,0,30.00,Light Rain,22,19,22
18416,12,29,7,B6,N957JB,SJU,0,225,1598,1412,1412,257,44,44,10,E,12,0,30.00,Light Rain,22,19,18
18417,12,29,7,DL,N994AT,TPA,244,190,1005,1169,1413,1359,44,44,10,E,12,0,30.00,Light Rain,22,19,10
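Because `sort_values` returns a new DataFrame, the cell above only displays a sorted view and `taxi` keeps its original order. A minimal sketch of keeping a sorted copy, assuming a calendar order (month, then day of month) is what is wanted; `demo` is a hypothetical four-row frame that only mimics this notebook's column names:

```python
import pandas as pd

# Hypothetical toy frame that mimics the notebook's date columns
demo = pd.DataFrame({
    "MONTH": [12, 1, 1, 11],
    "DAY_OF_MONTH": [29, 9, 6, 30],
    "DAY_OF_WEEK": [7, 4, 1, 6],
    "TAXI_OUT": [10, 24, 31, 28],
})

# sort_values returns a new DataFrame, so assign the result to keep the ordering
demo_sorted = demo.sort_values(by=["MONTH", "DAY_OF_MONTH"]).reset_index(drop=True)
print(demo_sorted[["MONTH", "DAY_OF_MONTH"]].values.tolist())
# [[1, 6], [1, 9], [11, 30], [12, 29]]
```

Within one year the day of week follows from the date, so it is not needed as a sort key; if the data span more than one calendar year, a year column would be needed as the first key.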


In [52]:
# Sort by carrier code (displays a sorted copy; `taxi` is unchanged)
taxi.sort_values(by='CARRIER_CODE')

Unnamed: 0,MONTH,DAY_OF_MONTH,DAY_OF_WEEK,CARRIER_CODE,FLIGHT_NO,DEST,DEP_DELAY,SCHEDULED_DURATION,DISTANCE,SCHEDULED_DEPARTURE,ACTUAL_DEP_TIME,SCHEDULED_ARRIVAL,Temperature,Dew Point,Humidity,Wind,Wind Speed,Wind Gust,Pressure,Condition,FLT_SCH_ARRIVAL,FLT_SCH_DEPARTURE,TAXI_OUT
21543,1,9,4,9E,N907XJ,RDU,-5,128,427,535,530,663,26,9,49,NW,12,22,30.73,Fair,48,20,24
18923,12,31,2,9E,N820AY,IAD,37,100,228,1165,1202,1265,43,39,86,SSW,10,0,29.63,Cloudy,23,35,24
24783,1,19,7,9E,N8783E,BWI,-5,86,184,1160,1155,1246,36,18,48,WNW,16,0,29.89,Fair,38,28,14
9459,11,30,6,9E,N600LR,PIT,-8,106,340,454,446,560,32,21,64,N,10,0,30.09,Mostly Cloudy,36,31,28
24791,1,19,7,9E,N8974C,ORF,-3,107,290,1170,1167,1277,34,16,48,WNW,17,28,29.92,Fair,38,28,38
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
26031,1,23,4,YX,N133HQ,CMH,-10,121,483,933,923,1054,42,27,55,SSW,5,0,30.33,Cloudy,30,39,22
26026,1,23,4,YX,N404YX,IND,-11,143,665,930,919,1073,42,27,55,SSW,5,0,30.33,Cloudy,30,39,18
20165,1,4,6,YX,N214JQ,BOS,-2,89,187,1260,1258,1349,47,42,83,NW,18,26,29.53,Light Rain,35,43,39
20166,1,4,6,YX,N875RW,PIT,14,117,340,1250,1264,1367,47,42,83,NW,18,26,29.53,Light Rain,30,27,35
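Sorting on `CARRIER_CODE` alone uses pandas' default `quicksort`, which is not stable, so rows sharing a carrier are not guaranteed to come out in any particular order. Adding the date columns as secondary keys gives a deterministic per-carrier timeline. A sketch on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical toy rows using the notebook's column names
demo = pd.DataFrame({
    "CARRIER_CODE": ["YX", "9E", "YX", "9E"],
    "MONTH": [1, 12, 1, 1],
    "DAY_OF_MONTH": [23, 31, 4, 9],
    "TAXI_OUT": [22, 24, 39, 24],
})

# Primary key: carrier; secondary keys: month, then day of month
by_carrier = demo.sort_values(by=["CARRIER_CODE", "MONTH", "DAY_OF_MONTH"])
print(by_carrier[["CARRIER_CODE", "MONTH", "DAY_OF_MONTH"]])
```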


### Dropping columns (dates, categorical data) for k-Fold Cross-Validation

In [53]:
taxiCV = taxi.drop(columns=['MONTH', 'DAY_OF_MONTH', 'DAY_OF_WEEK', 'CARRIER_CODE', 'FLIGHT_NO', 'DEST', 'Wind', 'Condition'])
taxiCV

Unnamed: 0,DEP_DELAY,SCHEDULED_DURATION,DISTANCE,SCHEDULED_DEPARTURE,ACTUAL_DEP_TIME,SCHEDULED_ARRIVAL,Temperature,Dew Point,Humidity,Wind Speed,Wind Gust,Pressure,FLT_SCH_ARRIVAL,FLT_SCH_DEPARTURE,TAXI_OUT
0,-1,124,636,324,323,448,48,34,58,25,38,29.86,9,17,14
1,-7,371,2475,340,333,531,48,34,58,25,38,29.86,9,17,15
2,40,181,1069,301,341,482,48,34,58,25,38,29.86,9,17,22
3,-2,168,944,345,343,513,48,34,58,25,38,29.86,9,17,12
4,-4,139,760,360,356,499,46,32,58,24,35,29.91,9,17,13
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28815,2,57,150,1370,1372,1427,39,38,96,6,0,30.18,20,32,19
28816,2,75,187,1390,1392,25,39,38,96,6,0,30.18,19,23,22
28817,283,392,2422,1125,1408,1337,39,38,96,6,0,30.18,19,23,21
28818,5,224,1598,1417,1422,261,39,38,96,6,0,30.18,19,23,13
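Instead of listing every categorical column by hand, `select_dtypes` can keep only the numeric columns. The date columns are stored as integers, so they still have to be dropped by name. A sketch, assuming the string-typed columns are exactly the ones to exclude; `demo` is a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy frame mixing string (categorical) and numeric columns
demo = pd.DataFrame({
    "CARRIER_CODE": ["B6", "DL"],
    "MONTH": [1, 1],
    "DAY_OF_MONTH": [6, 6],
    "DAY_OF_WEEK": [1, 1],
    "DEST": ["FLL", "SLC"],
    "Wind": ["W", "W"],
    "Condition": ["Cloudy", "Cloudy"],
    "DEP_DELAY": [-8, -1],
    "Pressure": [29.93, 29.93],
    "TAXI_OUT": [31, 34],
})

# Keep numeric columns only, then drop the integer-coded date columns explicitly
numeric = demo.select_dtypes(include="number").drop(
    columns=["MONTH", "DAY_OF_MONTH", "DAY_OF_WEEK"]
)
print(list(numeric.columns))  # ['DEP_DELAY', 'Pressure', 'TAXI_OUT']
```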


### k-Fold Cross-Validation (CV)

In [54]:
X = taxiCV.iloc[:-1, 0:14] # features: the first 14 columns (every column except TAXI_OUT), all rows except the last
y = taxiCV.iloc[:-1, 14] # target: TAXI_OUT (taxi-out time), same rows as X

# Define the evaluation method: 15-fold cross-validation, repeated 3 times with different random splits
cv = RepeatedKFold(n_splits=15, n_repeats=3, random_state=1)

# Define the model with 100 candidate alphas (λ): 0.00, 0.01, ..., 0.99
# Candidates are ranked by mean absolute error (MAE), not by RMSE
taxi_model = RidgeCV(alphas=arange(0, 1, 0.01), cv=cv, scoring='neg_mean_absolute_error')

# Fitting the model
taxi_model.fit(X, y) 

# Summarise the chosen configuration
print('alpha: %f' % taxi_model.alpha_) # λ = 0.99, the largest value in the grid

alpha: 0.990000
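The selected λ = 0.99 is the upper end of `arange(0, 1, 0.01)`, which suggests the best value may lie above 1; the search also ranks candidates by MAE while the assignment asks for RMSE. A sketch, assuming a wider log-spaced grid and the `neg_root_mean_squared_error` scorer are acceptable substitutes; `X_demo`/`y_demo` are hypothetical synthetic data standing in for `X` and `y`:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import RepeatedKFold

# The grid used above: 100 candidates 0.00, 0.01, ..., 0.99
grid = np.arange(0, 1, 0.01)
print(len(grid), float(grid[0]), round(float(grid[-1]), 2))  # 100 0.0 0.99

# 15 folds repeated 3 times -> 45 train/validation splits per candidate alpha
cv = RepeatedKFold(n_splits=15, n_repeats=3, random_state=1)
print(cv.get_n_splits())  # 45

# Hypothetical synthetic data standing in for X, y
X_demo, y_demo = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# Wider, log-spaced grid (0.001 ... 1000), scored with RMSE
wide = np.logspace(-3, 3, 13)
model = RidgeCV(alphas=wide, cv=cv, scoring="neg_root_mean_squared_error").fit(X_demo, y_demo)

# A chosen alpha on either end of the grid means the grid is probably too narrow
on_edge = bool(np.isclose(model.alpha_, [wide[0], wide[-1]]).any())
print(model.alpha_, on_edge)
```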


### Splitting the dataset (90:10)
> See Real Python's [guide](https://realpython.com/train-test-split-python-data/) for splitting data with scikit-learn and python.
 
> See [this](https://stackoverflow.com/questions/24147278/how-do-i-create-test-and-train-samples-from-one-dataframe-with-pandas) Stack Overflow thread for examples using pandas data frames.

In [55]:
# Using X and y from the k-fold CV cell above, split the dataset 90:10 into train and test sets.
# test_size=0.1 leaves 90% for training; a fixed random_state makes the split reproducible,
# so every model below is trained and tested on the same rows.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=1)

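With a float `test_size`, scikit-learn rounds the test share up: the test set gets ⌈0.1 · n⌉ rows and the training set gets the rest. With n = 28817 rows that yields the 2882 / 25935 sizes reported by the shape check in this notebook. A sketch reproducing the arithmetic on a dummy array of the same length (the dummy data are hypothetical):

```python
import math
import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 28817  # 25935 train + 2882 test rows, as reported in this notebook
X_demo = np.arange(n_rows).reshape(-1, 1)
y_demo = np.arange(n_rows)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.1, random_state=1)

# Test share is rounded up; training gets the remainder
print(math.ceil(0.1 * n_rows), len(X_te), len(X_tr))  # 2882 2882 25935

# The same random_state reproduces exactly the same split
_, X_te_again, _, _ = train_test_split(X_demo, y_demo, test_size=0.1, random_state=1)
print(np.array_equal(X_te, X_te_again))  # True
```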
In [56]:
X_train # Show X_train

Unnamed: 0,DEP_DELAY,SCHEDULED_DURATION,DISTANCE,SCHEDULED_DEPARTURE,ACTUAL_DEP_TIME,SCHEDULED_ARRIVAL,Temperature,Dew Point,Humidity,Wind Speed,Wind Gust,Pressure,FLT_SCH_ARRIVAL,FLT_SCH_DEPARTURE
12992,-1,105,427,1030,1029,1135,31,17,57,3,0,30.71,32,32
27682,-3,187,740,970,967,1097,41,25,53,18,0,29.83,28,29
16171,-3,365,2153,345,342,590,35,31,85,8,0,30.14,11,21
15508,30,263,1521,1289,1319,52,27,15,61,10,0,30.63,32,35
4622,-14,192,1089,485,471,677,40,27,60,8,0,30.26,36,30
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10955,-10,100,266,1320,1310,1420,39,25,57,15,30,30.02,29,32
17291,-2,292,1626,1015,1013,1187,45,40,82,14,0,30.36,28,32
5192,-5,412,2586,1199,1194,1431,34,5,30,10,0,30.46,40,26
12172,-2,384,2475,1320,1318,84,50,50,10,14,0,29.77,26,28


In [57]:
y_train # Show y_train

12992    26
27682    18
16171    21
15508    12
4622     26
         ..
10955    35
17291     9
5192     25
12172    14
235      23
Name: TAXI_OUT, Length: 25935, dtype: int64

In [58]:
X_test # Show X_test

Unnamed: 0,DEP_DELAY,SCHEDULED_DURATION,DISTANCE,SCHEDULED_DEPARTURE,ACTUAL_DEP_TIME,SCHEDULED_ARRIVAL,Temperature,Dew Point,Humidity,Wind Speed,Wind Gust,Pressure,FLT_SCH_ARRIVAL,FLT_SCH_DEPARTURE
1637,-6,87,213,515,509,602,50,33,52,12,0,30.39,52,18
22484,-3,265,1391,539,536,744,62,49,62,25,0,29.98,49,18
14604,103,85,187,1034,1137,1119,34,34,10,12,0,29.71,38,36
5945,-11,379,2475,425,414,624,43,43,10,10,0,29.78,16,14
26858,0,125,427,516,516,641,40,35,83,13,0,29.83,47,17
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4184,8,128,483,1220,1228,1348,30,7,38,9,0,30.43,29,44
938,-11,180,1005,447,436,627,44,36,73,3,0,30.37,35,28
25913,-7,395,2422,546,539,761,38,28,68,9,0,30.43,46,18
5549,-8,83,266,1320,1312,1403,39,31,73,17,0,30.07,32,34


In [59]:
y_test # Show y_test

1637     20
22484    33
14604    31
5945     19
26858    16
         ..
4184     14
938      10
25913    15
5549     21
8351     27
Name: TAXI_OUT, Length: 2882, dtype: int64

In [60]:
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape) # Show sizes of train and test data

(25935, 14) (2882, 14) (25935,) (2882,)


### Linear Regression
> **Source:** See DHWANIT BALWANI'S [work](https://realpython.com/train-test-split-python-data/) on Kaggle. 

In [74]:

# Create a linear regression object called lreg
lreg = LinearRegression()  

# Fit the linear regression on X_train and y_train
lreg.fit(X_train, y_train)

# Predict taxi-out time for the test set
y_pred = lreg.predict(X_test)

# Root mean squared error: **0.5 takes the square root of the mean squared error
rmse = mean_squared_error(y_test, y_pred)**0.5
print(rmse) # rmse = 6.575824000116706

6.575824000116706
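RMSE is √(mean((y − ŷ)²)), in the same units as the target (minutes of taxi-out time). A constant prediction equal to the training mean gives a reference RMSE that any useful model should beat. A sketch with hypothetical toy values; in the notebook, `y_test`, `y_pred` and `y_train.mean()` would take their place:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical toy targets and predictions (minutes)
y_true = np.array([20, 33, 31, 19, 16], dtype=float)
y_hat = np.array([22, 30, 25, 20, 18], dtype=float)

# RMSE by hand and via scikit-learn's MSE followed by a square root
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))
rmse_sklearn = mean_squared_error(y_true, y_hat) ** 0.5
print(rmse_manual, rmse_sklearn)

# Baseline: always predict the mean of the (hypothetical) training targets
y_train_demo = np.array([26, 18, 21, 12, 26], dtype=float)
baseline_pred = np.full_like(y_true, y_train_demo.mean())
baseline_rmse = mean_squared_error(y_true, baseline_pred) ** 0.5
print(baseline_rmse)
```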


### Ridge Regression

In [77]:
# Define the model (alpha = 0.05; the RidgeCV search above selected 0.99)
ridge_mod = Ridge(alpha=0.05)

# Fit the model
ridge_mod.fit(X_train, y_train)

# Make predictions on the test set
ridge_pred = ridge_mod.predict(X_test)

# Root mean squared error of the ridge predictions
rmser = mean_squared_error(y_test, ridge_pred)**0.5
print(rmser)

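The `RidgeCV` search earlier in the notebook was fit on all of `X` and `y`, test rows included, and it selected λ = 0.99 while this cell uses `alpha=0.05`. A minimal sketch, assuming alpha should be tuned on the training rows only and then reused for the test score; `X_demo`/`y_demo` are hypothetical synthetic data and the log-spaced grid is an assumption:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RepeatedKFold, train_test_split

# Hypothetical synthetic data standing in for X, y
X_demo, y_demo = make_regression(n_samples=500, n_features=8, noise=15.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.1, random_state=1)

# Tune alpha with repeated k-fold CV on the training rows only
cv = RepeatedKFold(n_splits=15, n_repeats=3, random_state=1)
search = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=cv,
                 scoring="neg_root_mean_squared_error").fit(X_tr, y_tr)

# Refit a plain Ridge with the selected alpha and score its own predictions
ridge = Ridge(alpha=search.alpha_).fit(X_tr, y_tr)
ridge_rmse = mean_squared_error(y_te, ridge.predict(X_te)) ** 0.5
print(search.alpha_, ridge_rmse)
```

`RidgeCV` already refits on the whole training set with the chosen alpha, so `search.predict(X_te)` gives the same predictions; the explicit `Ridge` refit only mirrors the structure of the cell above.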

### Lasso Regression

In [78]:
# Define the model
lasso_mod = Lasso(alpha=0.05)

# Fit the model
lasso_mod.fit(X_train, y_train)

# Make predictions on the test set
lasso_pred = lasso_mod.predict(X_test)

# Root mean squared error of the lasso predictions
rmsel = mean_squared_error(y_test, lasso_pred)**0.5
print(rmsel)

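For the comparative study, a loop over a dictionary of models guarantees that every model is trained on the same split and scored on its own predictions, and the resulting dictionary can feed a bar chart directly. A sketch on hypothetical synthetic data; wrapping Lasso in a `StandardScaler` pipeline is an assumption not made in the cells above, motivated by the L1 penalty being sensitive to feature scale:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data; in the notebook, reuse X_train/X_test/y_train/y_test
X_demo, y_demo = make_regression(n_samples=500, n_features=8, noise=15.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.1, random_state=1)

models = {
    "Linear": LinearRegression(),
    "Ridge": Ridge(alpha=0.05),
    # Standardize first so the L1 penalty treats features on a common scale
    "Lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.05)),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)  # each model is scored on its own predictions
    scores[name] = mean_squared_error(y_te, pred) ** 0.5

# Print from best (lowest RMSE) to worst
for name, rmse in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:>6}: {rmse:.3f}")
```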