# Machine Learning

In this file, instructions how to approach the challenge can be found.

We are going to work on different types of Machine Learning problems:

- **Regression Problem**: The goal is to predict delay of flights.
- **(Stretch) Multiclass Classification**: If the plane was delayed, we will predict what type of delay it is (will be).
- **(Stretch) Binary Classification**: The goal is to predict if the flight will be cancelled.

In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, KFold, GridSearchCV, cross_val_score
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
scaler = StandardScaler()
import xgboost as xgb
import seaborn as sns
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

In [2]:
# olsInitial = pd.read_csv('Datacsv/OLSinitial.csv')

In [3]:
# Flights = pd.read_csv('Datacsv/Flights-300K.csv')

In [4]:
trainData2018 = pd.read_csv('Datacsv/trainData2018.csv')
trainData2019 = pd.read_csv('Datacsv/trainData2019.csv')

In [5]:
mainTrainData = pd.concat([trainData2018, trainData2019], ignore_index=True)

In [6]:
newTrain = pd.read_csv('Datacsv/New_trainData.csv')
avg_delay = pd.read_csv('Datacsv/Avg_Arr_Dep_Delay.csv')
carrier_delay = pd.read_csv('Datacsv/avg_op_unique_carrier_delay.csv')

## Main Task: Regression Problem

The target variable is **ARR_DELAY**. We need to be careful which columns to use and which don't. For example, DEP_DELAY is going to be the perfect predictor, but we can't use it because in real-life scenario, we want to predict the delay before the flight takes of --> We can use average delay from earlier days but not the one from the actual flight we predict.  

For example, variables **CARRIER_DELAY, WEATHER_DELAY, NAS_DELAY, SECURITY_DELAY, LATE_AIRCRAFT_DELAY** shouldn't be used directly as predictors as well. However, we can create various transformations from earlier values.

We will be evaluating your models by predicting the ARR_DELAY for all flights **1 week in advance**.

In [7]:
#Here we are adding Average Arrival Delay relative to the month

#Start by changing the date from object to datetime
avg_delay['fl_date'] = pd.to_datetime(avg_delay['fl_date'],
                              format='%Y-%m-%d')

#Groupby to compare monthly averages in delays
# NOTE: Negative values (early arrivals) ARE INCLUDED
month_arr = avg_delay.groupby(avg_delay['fl_date'].dt.strftime('%m'))['avg_arr_delay'].mean()
month_arr = month_arr.to_frame()
month_dep = avg_delay.groupby(avg_delay['fl_date'].dt.strftime('%m'))['avg_dep_delay'].mean()
month_dep = month_dep.to_frame()

#Resetting the index 
month_arr = month_arr.reset_index()
month_dep = month_dep.reset_index()

#Creating 2 copies of fl_date, extracting the month in order to replace the month with its respective Average Arrival and/or Departure Delay 
newTrain['Month_Avg_Arr_Delay'] = pd.to_datetime(newTrain.fl_date , format="%Y-%m-%d").dt.month
newTrain['Month_Avg_Dep_Delay'] = pd.to_datetime(newTrain.fl_date , format="%Y-%m-%d").dt.month

month_arr_dict =  dict(zip(month_arr.fl_date, month_arr.avg_arr_delay))
month_dep_dict =  dict(zip(month_dep.fl_date, month_dep.avg_dep_delay))


newTrain.replace({'Month_Avg_Arr_Delay': {1: 3.9281951759577782,
 2: 6.670705822847316,
 3: 2.854581405409215,
 4: 4.177950054675787,
 5: 6.416833084409337,
 6: 10.393455353404956,
 7: 8.910038151256863,
 8: 8.847961842961464,
 9: 1.5852627540712663,
 10: 2.7923909776573588,
 11: 2.757202900691894,
 12: 4.815971225866452}}, inplace=True)

newTrain.replace({'Month_Avg_Dep_Delay': {1: 9.82808600285777,
 2: 11.689433403570048,
 3: 8.45752678421839,
 4: 9.375029826488923,
 5: 11.283686509030298,
 6: 14.629757423341372,
 7: 13.770582924983167,
 8: 13.279282347021876,
 9: 6.900262796528355,
 10: 7.502918821697483,
 11: 8.049444482526964,
 12: 10.62795705344142}}, inplace=True)

In [8]:
#Groupby to account for taxi in/out based on carriers which appeared to have the largest cross variance
Avg_Taxi_Out_Carrier = newTrain['taxi_out'].groupby(newTrain['op_unique_carrier']).mean().reset_index()
Avg_Taxi_In_Carrier = newTrain['taxi_in'].groupby(newTrain['op_unique_carrier']).mean().reset_index()

#Create dictionary filled with op_unique_carrier as keys and the mean taxi in and out times as values
taxi_out_dict = dict(zip(Avg_Taxi_Out_Carrier.op_unique_carrier, Avg_Taxi_Out_Carrier.taxi_out))
taxi_in_dict = dict(zip(Avg_Taxi_In_Carrier.op_unique_carrier, Avg_Taxi_In_Carrier.taxi_in))

#Creating two copies of op_unique_carrier to replace the values with the carrier's respective average taxi in and out time 
newTrain['Avg_Taxi_In_Carrier'] = newTrain['op_unique_carrier']
newTrain['Avg_Taxi_Out_Carrier'] = newTrain['op_unique_carrier']

#Replacing the Carrier codes in copied features with their respective average taxi in and out times.
newTrain.replace({'Avg_Taxi_In_Carrier': {'9E': 7.360715045754416,
 '9K': 4.714285714285714,
 'AA': 9.445789265313048,
 'AS': 8.082283095510885,
 'AX': 7.877306903622693,
 'B6': 7.36336976806185,
 'C5': 8.20173646578141,
 'CP': 9.47292817679558,
 'DL': 7.542487551418056,
 'EM': 4.005050505050505,
 'EV': 8.146282587705182,
 'F9': 10.15011596036264,
 'G4': 6.785416666666666,
 'G7': 7.6468788249694,
 'HA': 7.200770960488275,
 'KS': 3.617021276595745,
 'MQ': 8.747318339100346,
 'NK': 9.849809617825413,
 'OH': 8.452057416267943,
 'OO': 7.693122041031036,
 'PT': 8.16294088425236,
 'QX': 5.72971114167813,
 'UA': 7.847001223990208,
 'VX': 8.774086378737541,
 'WN': 5.293501334008452,
 'YV': 7.493231100994369,
 'YX': 8.656821963394343,
 'ZW': 8.605810234541577}}, inplace=True)

newTrain.replace({'Avg_Taxi_Out_Carrier': {'9E': 21.49329644605235,
 '9K': 8.785714285714286,
 'AA': 18.694389457609862,
 'AS': 18.991042599729195,
 'AX': 20.173615857826384,
 'B6': 17.75419888029859,
 'C5': 24.258426966292134,
 'CP': 18.9292817679558,
 'DL': 17.24063650140723,
 'EM': 8.146464646464647,
 'EV': 20.229320888316703,
 'F9': 16.60278304870335,
 'G4': 13.095052083333334,
 'G7': 19.86689106487148,
 'HA': 11.959524574365563,
 'KS': 5.872340425531915,
 'MQ': 18.889359861591696,
 'NK': 15.177690029615006,
 'OH': 17.736363636363638,
 'OO': 19.763907154129406,
 'PT': 20.783904619970194,
 'QX': 13.661393856029344,
 'UA': 19.814797619550077,
 'VX': 21.036544850498338,
 'WN': 12.319694638649244,
 'YV': 17.57553612076195,
 'YX': 21.11281198003328,
 'ZW': 19.840618336886994}}, inplace=True)

In [9]:
#Create 4 copies of origin_city_name feature to replace the current values with their respective longtitude and latitude values
newTrain['originLat'] = newTrain['origin_city_name']
newTrain['originLong'] = newTrain['origin_city_name']
newTrain['destLat'] = newTrain['dest_city_name']
newTrain['destLong'] = newTrain['dest_city_name']

#Replacing the City names with their longitude and latitude values
#Geopy (from geopy.geocoders import Nominatim) was used in the aggregation of these values, but some had to manually encoded due to API call limits
newTrain.replace({'originLat': {'Aberdeen, SD': 45.4649805, 'Abilene, TX': 32.44645, 'Adak Island, AK': 51.7961654, 'Aguadilla, PR': 18.4274359, 'Akron, OH': 41.083064, 'Albany, GA': 42.7439143, 'Albany, NY': 42.6511674, 'Albuquerque, NM': 35.0841034,
                                'Alexandria, LA': 31.199004, 'Allentown/Bethlehem/Easton, PA': 40.651163100000005, 'Alpena, MI': 45.0176181, 'Amarillo, TX': 35.2072185, 'Anchorage, AK': 61.2163129, 'Appleton, WI': 44.2611337, 'Arcata/Eureka, CA': 40.8033073,
                                'Asheville, NC': 35.6009498, 'Ashland, WV': 37.4084488, 'Aspen, CO': 39.1911128, 'Atlanta, GA': 33.7489924, 'Atlantic City, NJ': 39.3642852, 'Augusta, GA': 48.3689438, 'Austin, TX': 30.2711286, 'Bakersfield, CA': 35.3738712,
                                'Baltimore, MD': 39.2908816, 'Bangor, ME': 44.8011821, 'Barrow, AK': 71.387113, 'Baton Rouge, LA': 30.4459596, 'Beaumont/Port Arthur, TX': 29.954324, 'Belleville, IL': 48.8176714, 'Bellingham, WA': 48.7544012,
                                'Bemidji, MN': 47.4785418, 'Bend/Redmond, OR': 44.2165084, 'Bethel, AK': 60.7922222, 'Billings, MT': 45.7874957, 'Binghamton, NY': 42.096968, 'Birmingham, AL': 52.4459629, 'Bismarck/Mandan, ND': 46.8101709,
                                'Bloomington/Normal, IL': 40.508752, 'Boise, ID': 43.6166163, 'Boston, MA': 42.3602534, 'Bozeman, MT': 45.6794293, 'Brainerd, MN': 46.3580221, 'Branson, MO': 36.6411357, 'Brownsville, TX': 25.9140256, 'Brunswick, GA': 52.3175903,
                                'Buffalo, NY': 42.8867166, 'Bullhead City, AZ': 35.1477774, 'Burbank, CA': 34.1816482, 'Burlington, VT': 44.4761601, 'Butte, MT': 39.6519275, 'Cape Girardeau, MO': 37.3034933, 'Casper, WY': 42.849709, 'Cedar City, UT': 37.6774238,
                                'Cedar Rapids/Iowa City, IA': 41.9758872, 'Champaign/Urbana, IL': 40.1157948, 'Charleston/Dunbar, WV': 38.3616659, 'Charleston, SC': 32.7876012, 'Charlotte Amalie, VI': 18.341137, 'Charlotte, NC': 35.2272086,
                                'Charlottesville, VA': 38.0360726, 'Chattanooga, TN': 35.0457219, 'Cheyenne, WY': 41.139981, 'Chicago, IL': 41.8755616, 'Christiansted, VI': 17.7439481, 'Cincinnati, OH': 39.1014537, 'Clarksburg/Fairmont, WV': 39.2798118,
                                'Cleveland, OH': 41.5051613, 'Cody, WY': 44.5263107, 'Colorado Springs, CO': 38.8339578, 'Columbia, MO': 38.951883, 'Columbia, SC': 34.0007493, 'Columbus, GA': 40.0838862, 'Columbus, MS': 33.4956744, 'Columbus, OH': 39.9622601,
                                'Concord, NC': 35.4094178, 'Cordova, AK': 60.5439444, 'Corpus Christi, TX': 27.7477253, 'Dallas/Fort Worth, TX': 32.7476308, 'Dallas, TX': 32.7762719, 'Daytona Beach, FL': 29.2108147, 'Dayton, OH': 39.7589478,
                                'Deadhorse, AK': 70.2006973, 'Del Rio, TX': 29.3655405, 'Denver, CO': 5.3428475, 'Des Moines, IA': 41.5910323, 'Detroit, MI': 42.3315509, 'Devils Lake, ND': 48.112779, 'Dickinson, ND': 46.8791756, 'Dillingham, AK': 59.0397222,
                                'Dothan, AL': 31.2237434, 'Dubuque, IA': 42.5006217, 'Duluth, MN': 46.7729322, 'Durango, CO': 24.833333, 'Eagle, CO': 39.6161124, 'Eau Claire, WI': 44.811349, 'Elko, NV': 41.1958128, 'Elmira/Corning, NY': 42.1608441,
                                'El Paso, TX': 31.7754152, 'Erie, PA': 42.1294712, 'Escanaba, MI': 45.7455707, 'Eugene, OR': 44.0505054, 'Evansville, IN': 37.9386712, 'Everett, WA': 47.9673056, 'Fairbanks, AK': 64.837845, 'Fargo, ND': 46.877229,
                                'Fayetteville, AR': 36.0625843, 'Fayetteville, NC': 35.0525759, 'Flagstaff, AZ': 35.1816047, 'Flint, MI': 43.0161693, 'Florence, SC': 34.1984435, 'Fort Lauderdale, FL': 26.1223084, 'Fort Myers, FL': 26.640628,
                                'Fort Smith, AR': 35.3872218, 'Fort Wayne, IN': 41.0799898, 'Fresno, CA': 36.7394421, 'Gainesville, FL': 29.6519684, 'Garden City, KS': 37.9716898, 'Gillette, WY': 44.290635, 'Grand Forks, ND': 47.9078244,
                                'Grand Island, NE': 40.924271, 'Grand Junction, CO': 39.063956, 'Grand Rapids, MI': 42.9632405, 'Great Falls, MT': 47.5048851, 'Green Bay, WI': 44.5126379, 'Greensboro/High Point, NC': 36.0726355, 'Greenville, NC': 35.613224,
                                'Greer, SC': 34.9381361, 'Guam, TT': 13.486490199999999, 'Gulfport/Biloxi, MS': 30.4900534, 'Gunnison, CO': 38.6476702, 'Gustavus, AK': 58.4128377, 'Hagerstown, MD': 39.6419219, 'Hancock/Houghton, MI': 47.126871,
                                'Harrisburg, PA': 40.2663107, 'Hartford, CT': 41.7655582, 'Hayden, CO': 47.7725145, 'Hays, KS': 38.8791783, 'Helena, MT': 46.5927425, 'Hibbing, MN': 47.427155, 'Hilo, HI': 19.7073734, 'Hilton Head, SC': 32.3836213,
                                'Hobbs, NM': 32.707667, 'Honolulu, HI': 21.304547, 'Hoolehua, HI': 21.1590908, 'Houston, TX': 29.7589382, 'Huntsville, AL': 34.729847, 'Hyannis, MA': 41.651513, 'Idaho Falls, ID': 43.4935245, 'Indianapolis, IN': 39.9164009,
                                'International Falls, MN': 48.601033, 'Islip, NY': 40.7304311, 'Ithaca/Cortland, NY': 42.4415242, 'Jackson/Vicksburg, MS': 32.3520532, 'Jacksonville/Camp Lejeune, NC': 34.7338577, 'Jacksonville, FL': 30.3321838,
                                'Jackson, WY': 32.2990384, 'Jamestown, ND': 46.910544, 'Joplin, MO': 37.08418, 'Juneau, AK': 58.3019496, 'Kahului, HI': 20.8747708, 'Kalamazoo, MI': 42.291707, 'Kalispell, MT': 48.2022563, 'Kansas City, MO': 39.100105,
                                'Kapalua, HI': 20.99490395, 'Kearney, NE': 40.4906216, 'Ketchikan, AK': 55.3430696, 'Key West, FL': 24.5625566, 'Killeen, TX': 31.1171441, 'King Salmon, AK': 58.7551615, 'Knoxville, TN': 35.9603948, 'Kodiak, AK': 57.79,
                                'Kona, HI': 19.743906, 'Kotzebue, AK': 66.8982057, 'La Crosse, WI': 43.8014053, 'Lafayette, LA': 30.2240897, 'Lake Charles, LA': 30.2265949, 'Lanai, HI': 20.830544099999997, 'Lansing, MI': 42.7337712, 'Laramie, WY': 41.311367,
                                'Laredo, TX': 27.5199841, 'Las Vegas, NV': 36.1672559, 'Latrobe, PA': 40.317287, 'Lawton/Fort Sill, OK': 34.6172103, 'Lewisburg, WV': 37.8017879, 'Lewiston, ID': 46.4195913, 'Lexington, KY': 38.0464066, 'Liberal, KS': 37.0430812,
                                'Lihue, HI': 21.9769622, 'Lincoln, NE': 40.8088861, 'Little Rock, AR': 34.7464809, 'Long Beach, CA': 33.7690164, 'Longview, TX': 32.5007031, 'Los Angeles, CA': 34.0536909, 'Louisville, KY': 38.2542376, 'Lubbock, TX': 33.5635206,
                                'Lynchburg, VA': 37.4137536, 'Madison, WI': 43.074761, 'Mammoth Lakes, CA': 37.6432525, 'Manchester, NH': 42.9956397, 'Manhattan/Ft. Riley, KS': 40.8576918, 'Marquette, MI': 46.4481521, "Martha's Vineyard, MA": 41.3918832,
                                'Medford, OR': 42.3264181, 'Melbourne, FL': 28.106471, 'Memphis, TN': 35.1490215, 'Meridian, MS': 32.3643098, 'Miami, FL': 25.7741728, 'Midland/Odessa, TX': 31.8329723, 'Milwaukee, WI': 43.0349931, 'Minneapolis, MN': 44.9772995,
                                'Minot, ND': 48.23251, 'Missoula, MT': 46.8701049, 'Moab, UT': 38.5738096, 'Mobile, AL': 30.6943566, 'Moline, IL': 41.5067003, 'Monroe, LA': 38.2722313, 'Monterey, CA': 36.2231079, 'Montgomery, AL': 32.379952849999995,
                                'Montrose/Delta, CO': 38.8777609, 'Mosinee, WI': 44.7927298, 'Muskegon, MI': 43.2341813, 'Myrtle Beach, SC': 33.6956461, 'Nantucket, MA': 41.316911450000006, 'Nashville, TN': 36.1622296, 'Newark, NJ': 40.735657,
                                'New Haven, CT': 41.298434349999994, 'New Orleans, LA': 29.9499323, 'New York, NY': 40.7127281, 'Niagara Falls, NY': 43.08436, 'Nome, AK': 64.4989922, 'Norfolk, VA': 52.56365215, 'North Bend/Coos Bay, OR': 43.4065089,
                                'North Platte, NE': 41.1238873, 'Oakland, CA': 37.8044557, 'Ogdensburg, NY': 44.694285, 'Ogden, UT': 41.2230048, 'Oklahoma City, OK': 35.4729886, 'Omaha, NE': 41.2587459, 'Ontario, CA': 50.000678, 'Orlando, FL': 28.5421109,
                                'Owensboro, KY': 37.7742152, 'Paducah, KY': 37.0833893, 'Pago Pago, TT': -14.2754786, 'Palm Springs, CA': 33.772179449999996, 'Panama City, FL': 30.1600827, 'Pasco/Kennewick/Richland, WA': 46.1736015, 'Pellston, MI': 45.552789,
                                'Pensacola, FL': 30.421309, 'Peoria, IL': 40.6938609, 'Petersburg, AK': 56.8127965, 'Philadelphia, PA': 39.9527237, 'Phoenix, AZ': 33.4484367, 'Pierre, SD': 44.3683644, 'Pittsburgh, PA': 40.4416941, 'Plattsburgh, NY': 44.69282,
                                'Pocatello, ID': 42.8688613, 'Ponce, PR': 18.0039949, 'Portland, ME': 43.6610277, 'Portland, OR': 45.5202471, 'Portsmouth, NH': 43.0702223, 'Prescott, AZ': 34.5399962, 'Presque Isle/Houlton, ME': 46.661867799999996,
                                'Providence, RI': 41.8239891, 'Provo, UT': 40.2338438, 'Pueblo, CO': 10.961033, 'Pullman, WA': 46.7304268, 'Punta Gorda, FL': 26.9297836, 'Quincy, IL': 39.9356016, 'Raleigh/Durham, NC': 35.9217839, 'Rapid City, SD': 44.0869329,
                                'Redding, CA': 40.5863563, 'Reno, NV': 39.5261206, 'Rhinelander, WI': 45.636623, 'Richmond, VA': 49.1977086, 'Roanoke, VA': 37.270973, 'Rochester, MN': 44.0234387, 'Rochester, NY': 43.157285, 'Rockford, IL': 42.2713945,
                                'Rock Springs, WY': 41.5869225, 'Roswell, NM': 33.3943282, 'Rota, TT': 66.947975, 'Sacramento, CA': 38.5810606, 'Saipan, TT': 7.0698398, 'Salina, KS': 38.8402805, 'Salisbury, MD': 38.3662114, 'Salt Lake City, UT': 40.7596198,
                                'San Angelo, TX': 31.4648357, 'San Antonio, TX': 29.4246002, 'San Diego, CA': 32.7174202, 'Sanford, FL': 28.8117297, 'San Francisco, CA': 37.7790262, 'San Jose, CA': 37.3361905, 'San Juan, PR': -25.4206759,
                                'San Luis Obispo, CA': 35.3540209, 'Santa Ana, CA': 33.7494951, 'Santa Barbara, CA': 34.4221319, 'Santa Fe, NM': 35.6869996, 'Santa Maria, CA': 34.9531295, 'Santa Rosa, CA': 38.4404925, 'Sarasota/Bradenton, FL': 27.499764300000002,
                                'Sault Ste. Marie, MI': 46.490586, 'Savannah, GA': 9.7568312, 'Scottsbluff, NE': 41.862302, 'Scranton/Wilkes-Barre, PA': 41.33709205, 'Seattle, WA': 47.6038321, 'Shreveport, LA': 32.5221828, 'Sioux City, IA': 42.4966815,
                                'Sioux Falls, SD': 43.549973, 'Sitka, AK': 57.0524973, 'South Bend, IN': 38.622348, 'Spokane, WA': 47.6571934, 'Springfield, IL': 39.7990175, 'Springfield, MO': 37.2166779, 'State College, PA': 40.7944504,
                                'Staunton, VA': 38.1357949, 'St. Cloud, MN': 45.5616075, 'St. George, UT': 37.104153, 'Stillwater, OK': 36.1156306, 'St. Louis, MO': 38.6529545, 'Stockton, CA': 37.9577016, 'St. Petersburg, FL': 27.7703796,
                                'Syracuse, NY': 43.0481221, 'Tallahassee, FL': 30.4380832, 'Tampa, FL': 27.9477595, 'Texarkana, AR': 33.4254684, 'Toledo, OH': 41.6529143, 'Traverse City, MI': 44.7606441, 'Trenton, NJ': 40.2170575,
                                'Tucson, AZ': 32.2228765, 'Tulsa, OK': 36.1556805, 'Twin Falls, ID': 42.5704456, 'Tyler, TX': 32.3512601, 'Unalaska, AK': 53.8722824, 'Valdosta, GA': 30.8327022, 'Valparaiso, FL': 30.5085309, 'Vernal, UT': 40.4556825,
                                'Waco, TX': 31.549333, 'Walla Walla, WA': 46.0667277, 'Washington, DC': 38.8949924, 'Waterloo, IA': 42.4979693, 'Watertown, NY': 43.9747838, 'Watertown, SD': 44.899211, 'Wenatchee, WA': 47.4234599,
                                'West Palm Beach/Palm Beach, FL': 26.715364, 'West Yellowstone, MT': 44.664290199999996, 'White Plains, NY': 41.0339862, 'Wichita Falls, TX': 33.9137085, 'Wichita, KS': 37.6922361, 'Williamsport, PA': 41.2493292,
                                'Williston, ND': 48.1465457, 'Wilmington, NC': 34.2257282, 'Worcester, MA': 42.2761217, 'Wrangell, AK': 56.4706022, 'Yakima, WA': 46.601557, 'Yakutat, AK': 59.572734499999996, 'Youngstown/Warren, OH': 41.22497,
                                'Yuma, AZ': 32.665135, 'Bristol/Johnson City/Kingsport, TN': 36.475201, 'Mission/McAllen/Edinburg, TX': 26.203407, 'New Bern/Morehead/Beaufort, NC': 35.108494, 'Hattiesburg/Laurel, MS': 31.467,
                                'Iron Mountain/Kingsfd, MI': 45.8146, 'Newburgh/Poughkeepsie, NY': 41.66598, 'College Station/Bryan, TX': 30.601389, 'Saginaw/Bay City/Midland, MI': 43.4195, 'Newport News/Williamsburg, VA': 37.131900,
                                'Harlingen/San Benito, TX': 26.1326, 'Sun Valley/Hailey/Ketchum, ID': 43.504398}}, inplace=True)

newTrain.replace({'originLong': {'Aberdeen, SD': -98.487813, 'Abilene, TX': -99.7475905, 'Adak Island, AK': -176.5734916431957, 'Aguadilla, PR': -67.1541343, 'Akron, OH': -81.518485, 'Albany, GA': -73.8016558, 'Albany, NY': -73.754968,
                                 'Albuquerque, NM': -106.6509851, 'Alexandria, LA': 29.894378, 'Allentown/Bethlehem/Easton, PA': -75.44225386838299, 'Alpena, MI': -83.6670019, 'Amarillo, TX': -101.8338246, 'Anchorage, AK': -149.894852,
                                 'Appleton, WI': -88.4067604, 'Arcata/Eureka, CA': -124.1535049, 'Asheville, NC': -82.5540161, 'Ashland, WV': -81.3526017, 'Aspen, CO': -106.8235606, 'Atlanta, GA': -84.3902644, 'Atlantic City, NJ': -74.4229351,
                                 'Augusta, GA': 10.8933327, 'Austin, TX': -97.7436995, 'Bakersfield, CA': -119.0194639, 'Baltimore, MD': -76.610759, 'Bangor, ME': -68.7778138, 'Barrow, AK': -156.4809618, 'Baton Rouge, LA': -91.18738,
                                 'Beaumont/Port Arthur, TX': -93.985972, 'Belleville, IL': 6.0982683, 'Bellingham, WA': -122.4788361, 'Bemidji, MN': -94.8907869, 'Bend/Redmond, OR': -121.2150324, 'Bethel, AK': -161.7558333, 'Billings, MT': -108.49607,
                                 'Binghamton, NY': -75.914341, 'Birmingham, AL': -1.8237251, 'Bismarck/Mandan, ND': -100.8363564, 'Bloomington/Normal, IL': -88.9844947, 'Boise, ID': -116.200886, 'Boston, MA': -71.0582912, 'Bozeman, MT': -111.044047,
                                 'Brainerd, MN': -94.2008288, 'Branson, MO': -93.2175285, 'Brownsville, TX': -97.4890856, 'Brunswick, GA': 10.560215, 'Buffalo, NY': -78.8783922, 'Bullhead City, AZ': -114.5682983, 'Burbank, CA': -118.3258554,
                                 'Burlington, VT': -73.212906, 'Butte, MT': -121.5858444, 'Cape Girardeau, MO': -89.5230357, 'Casper, WY': -106.3254928, 'Cedar City, UT': -113.0618277, 'Cedar Rapids/Iowa City, IA': -91.6704053,
                                 'Champaign/Urbana, IL': -88.241194, 'Charleston/Dunbar, WV': -81.7207214, 'Charleston, SC': -79.9402728, 'Charlotte Amalie, VI': -64.932789, 'Charlotte, NC': -80.8430827, 'Charlottesville, VA': -78.49973472559668,
                                 'Chattanooga, TN': -85.3094883, 'Cheyenne, WY': -104.820246, 'Chicago, IL': -87.6244212, 'Christiansted, VI': -64.7079823, 'Cincinnati, OH': -84.5124602, 'Clarksburg/Fairmont, WV': -80.3300893, 'Cleveland, OH': -81.6934446,
                                 'Cody, WY': -109.0563923, 'Colorado Springs, CO': -104.8253485, 'Columbia, MO': -92.3337366, 'Columbia, SC': -81.0343313, 'Columbus, GA': -83.0765043, 'Columbus, MS': -88.4272627, 'Columbus, OH': -83.0007065,
                                 'Concord, NC': -80.5800049, 'Cordova, AK': -145.7589103, 'Corpus Christi, TX': -97.4014129, 'Dallas/Fort Worth, TX': -97.3135971, 'Dallas, TX': -96.7968559, 'Daytona Beach, FL': -81.0228331, 'Dayton, OH': -84.1916069,
                                 'Deadhorse, AK': -148.4598151, 'Del Rio, TX': -100.8946984, 'Denver, CO': -72.3959849, 'Des Moines, IA': -93.6046655, 'Detroit, MI': -83.0466403, 'Devils Lake, ND': -98.86512, 'Dickinson, ND': -102.7896242,
                                 'Dillingham, AK': -158.4575, 'Dothan, AL': -85.3933906, 'Dubuque, IA': -90.6647967, 'Duluth, MN': -92.1251218, 'Durango, CO': -104.833333, 'Eagle, CO': -106.7172844, 'Eau Claire, WI': -91.4984941, 'Elko, NV': -115.3272864,
                                 'Elmira/Corning, NY': -76.89199038453467, 'El Paso, TX': -106.464634, 'Erie, PA': -80.0852695, 'Escanaba, MI': -87.0647434, 'Eugene, OR': -123.0950506, 'Evansville, IN': -87.518899, 'Everett, WA': -122.2013998,
                                 'Fairbanks, AK': -147.716675, 'Fargo, ND': -96.789821, 'Fayetteville, AR': -94.1574328, 'Fayetteville, NC': -78.878292, 'Flagstaff, AZ': -111.6165953319917, 'Flint, MI': -83.6900211, 'Florence, SC': -79.7671658,
                                 'Fort Lauderdale, FL': -80.1433786, 'Fort Myers, FL': -81.8723084, 'Fort Smith, AR': -94.4248983, 'Fort Wayne, IN': -85.1386015, 'Fresno, CA': -119.7848307, 'Gainesville, FL': -82.3249846, 'Garden City, KS': -100.8726618,
                                 'Gillette, WY': -105.501876, 'Grand Forks, ND': -97.0592028, 'Grand Island, NE': -98.338685, 'Grand Junction, CO': -108.5507317, 'Grand Rapids, MI': -85.6678639, 'Great Falls, MT': -111.29189, 'Green Bay, WI': -88.0125794,
                                 'Greensboro/High Point, NC': -79.7919754, 'Greenville, NC': -77.3724593, 'Greer, SC': -82.2272119, 'Guam, TT': 144.80206025352555, 'Gulfport/Biloxi, MS': -89.0290044, 'Gunnison, CO': -107.0603126, 'Gustavus, AK': -135.7375654,
                                 'Hagerstown, MD': -77.7202641, 'Hancock/Houghton, MI': -88.580956, 'Harrisburg, PA': -76.8861122, 'Hartford, CT': -72.69061276146614, 'Hayden, CO': -116.82675375791398, 'Hays, KS': -99.3267702, 'Helena, MT': -112.036277,
                                 'Hibbing, MN': -92.937689, 'Hilo, HI': -155.0815803, 'Hilton Head, SC': -99.748119, 'Hobbs, NM': -103.1311314, 'Honolulu, HI': -157.8556764, 'Hoolehua, HI': -157.09484723911947, 'Houston, TX': -95.3676974,
                                 'Huntsville, AL': -86.5859011, 'Hyannis, MA': -70.2825918, 'Idaho Falls, ID': -112.0400919, 'Indianapolis, IN': -86.0519568269157, 'International Falls, MN': -93.4105904, 'Islip, NY': -73.2108618,
                                 'Ithaca/Cortland, NY': -76.4580207, 'Jackson/Vicksburg, MS': -90.8730418, 'Jacksonville/Camp Lejeune, NC': -77.4457643, 'Jacksonville, FL': -81.655651, 'Jackson, WY': -90.1847691, 'Jamestown, ND': -98.708436,
                                 'Joplin, MO': -94.51323, 'Juneau, AK': -134.419734, 'Kahului, HI': -156.4529879461996, 'Kalamazoo, MI': -85.5872286, 'Kalispell, MT': -114.316711, 'Kansas City, MO': -94.5781416, 'Kapalua, HI': -156.6562339558182,
                                 'Kearney, NE': -98.9472344, 'Ketchikan, AK': -131.6466819, 'Key West, FL': -81.7724368, 'Killeen, TX': -97.727796, 'King Salmon, AK': -156.5192469940953, 'Knoxville, TN': -83.9210261, 'Kodiak, AK': -152.4072222,
                                 'Kona, HI': -156.0422959812206, 'Kotzebue, AK': -162.5977621, 'La Crosse, WI': -91.2395429, 'Lafayette, LA': -92.0198427, 'Lake Charles, LA': -93.2173759, 'Lanai, HI': -156.9029492509114, 'Lansing, MI': -84.5553805,
                                 'Laramie, WY': -105.591101, 'Laredo, TX': -99.4953764, 'Las Vegas, NV': -115.1485163, 'Latrobe, PA': -79.3840301, 'Lawton/Fort Sill, OK': -98.4037888, 'Lewisburg, WV': -80.4456303, 'Lewiston, ID': -117.0216144,
                                 'Lexington, KY': -84.4970393, 'Liberal, KS': -100.920999, 'Lihue, HI': -159.3687721, 'Lincoln, NE': -96.7077751, 'Little Rock, AR': -92.2895948, 'Long Beach, CA': -118.191604, 'Longview, TX': -94.74049,
                                 'Los Angeles, CA': -118.242766, 'Louisville, KY': -85.759407, 'Lubbock, TX': -101.879336, 'Lynchburg, VA': -79.1422464, 'Madison, WI': -89.3837613, 'Mammoth Lakes, CA': -118.9668509, 'Manchester, NH': -71.4547891,
                                 'Manhattan/Ft. Riley, KS': -73.9222899, 'Marquette, MI': -87.6305899, "Martha's Vineyard, MA": -70.62085427857699, 'Medford, OR': -122.8718605, 'Melbourne, FL': -80.6371513, 'Memphis, TN': -90.0516285,
                                 'Meridian, MS': -88.703656, 'Miami, FL': -80.19362, 'Midland/Odessa, TX': -102.3606957, 'Milwaukee, WI': -87.922497, 'Minneapolis, MN': -93.2654692, 'Minot, ND': -101.296273, 'Missoula, MT': -113.995267,
                                 'Moab, UT': -109.5462146, 'Mobile, AL': -88.0430541, 'Moline, IL': -90.5151342, 'Monroe, LA': -90.1792484, 'Monterey, CA': -121.3877428, 'Montgomery, AL': -86.3107669425032, 'Montrose/Delta, CO': -108.226467,
                                 'Mosinee, WI': -89.7035959, 'Muskegon, MI': -86.2483921, 'Myrtle Beach, SC': -78.8900409, 'Nantucket, MA': -70.14287301528347, 'Nashville, TN': -86.7743531, 'Newark, NJ': -74.1723667, 'New Haven, CT': -72.93102342707913,
                                 'New Orleans, LA': -90.0701156, 'New York, NY': -74.0060152, 'Niagara Falls, NY': -79.0614686, 'Nome, AK': -165.39879944316317, 'Norfolk, VA': 1.2623608080231654, 'North Bend/Coos Bay, OR': -124.2242824,
                                 'North Platte, NE': -100.7654232, 'Oakland, CA': -122.2713563, 'Ogdensburg, NY': -75.486374, 'Ogden, UT': -111.9738429, 'Oklahoma City, OK': -97.5170536, 'Omaha, NE': -95.9383758, 'Ontario, CA': -86.000977,
                                 'Orlando, FL': -81.3790304, 'Owensboro, KY': -87.1133304, 'Paducah, KY': -88.6000478, 'Pago Pago, TT': -170.7048298, 'Palm Springs, CA': -116.49529769785079, 'Panama City, FL': -85.6545729, 
                                 'Pasco/Kennewick/Richland, WA': -119.0664001, 'Pellston, MI': -84.783936, 'Pensacola, FL': -87.2169149, 'Peoria, IL': -89.5891008, 'Petersburg, AK': -132.95547, 'Philadelphia, PA': -75.1635262, 'Phoenix, AZ': -112.0741417,
                                 'Pierre, SD': -100.3511367, 'Pittsburgh, PA': -79.9900861, 'Plattsburgh, NY': -73.45562, 'Pocatello, ID': -112.4401098, 'Ponce, PR': -66.6169509, 'Portland, ME': -70.2548596, 'Portland, OR': -122.6741949,
                                 'Portsmouth, NH': -70.7548621, 'Prescott, AZ': -112.4687616, 'Presque Isle/Houlton, ME': -68.01074889363161, 'Providence, RI': -71.4128343, 'Provo, UT': -111.6585337, 'Pueblo, CO': -74.84053554739253, 
                                 'Pullman, WA': -117.173895, 'Punta Gorda, FL': -82.0453664, 'Quincy, IL': -91.4098727, 'Raleigh/Durham, NC': -78.76087880585929, 'Rapid City, SD': -103.2274481, 'Redding, CA': -122.3916754, 'Reno, NV': -119.8126581, 
                                 'Rhinelander, WI': -89.412075, 'Richmond, VA': -123.1912406, 'Roanoke, VA': -79.9414313, 'Rochester, MN': -92.4630182, 'Rochester, NY': -77.615214, 'Rockford, IL': -89.093966, 'Rock Springs, WY': -109.2047867,
                                 'Roswell, NM': -104.5229518, 'Rota, TT': 13.553736, 'Sacramento, CA': -121.4938951, 'Saipan, TT': 125.5116649, 'Salina, KS': -97.6114237, 'Salisbury, MD': -75.6008881, 'Salt Lake City, UT': -111.8867975,
                                 'San Angelo, TX': -100.4398442, 'San Antonio, TX': -98.4951405, 'San Diego, CA': -117.1627728, 'Sanford, FL': -81.2680345, 'San Francisco, CA': -122.4199061, 'San Jose, CA': -121.890583, 'San Juan, PR': -49.2687428522959,
                                 'San Luis Obispo, CA': -120.3757163, 'Santa Ana, CA': -117.8732213, 'Santa Barbara, CA': -119.7026673, 'Santa Fe, NM': -105.9377997, 'Santa Maria, CA': -120.4358577, 'Santa Rosa, CA': -122.7141049, 
                                 'Sarasota/Bradenton, FL': -82.56510160912002, 'Sault Ste. Marie, MI': -84.359269, 'Savannah, GA': -2.4962, 'Scottsbluff, NE': -103.6627088, 'Scranton/Wilkes-Barre, PA': -75.72257122928625, 'Seattle, WA': -122.3300624,
                                 'Shreveport, LA': -93.7651944, 'Sioux City, IA': -96.4058782, 'Sioux Falls, SD': -96.7003324, 'Sitka, AK': -135.337612, 'South Bend, IN': -105.518825, 'Spokane, WA': -117.4235106, 'Springfield, IL': -89.6439575,
                                 'Springfield, MO': -93.2920373, 'State College, PA': -77.8616386, 'Staunton, VA': -79.08927008810585, 'St. Cloud, MN': -94.1642004, 'St. George, UT': -113.5841313, 'Stillwater, OK': -97.0585717, 
                                 'St. Louis, MO': -90.24111656024635, 'Stockton, CA': -121.2907796, 'St. Petersburg, FL': -82.6695085, 'Syracuse, NY': -76.1474244, 'Tallahassee, FL': -84.2809332, 'Tampa, FL': -82.458444,
                                 'Texarkana, AR': -94.0430977, 'Toledo, OH': -83.5378173, 'Traverse City, MI': -85.6165301, 'Trenton, NJ': -74.7429463, 'Tucson, AZ': -110.9748477, 'Tulsa, OK': -95.9929113, 'Twin Falls, ID': -114.4602554, 
                                 'Tyler, TX': -95.3010624, 'Unalaska, AK': -166.5272262, 'Valdosta, GA': -83.2784851, 'Valparaiso, FL': -86.5027282, 'Vernal, UT': -109.5284741, 'Waco, TX': -97.1466695, 'Walla Walla, WA': -118.3393456,
                                 'Washington, DC': -77.0365581, 'Waterloo, IA': -92.3329637, 'Watertown, NY': -75.9107565, 'Watertown, SD': -97.115289, 'Wenatchee, WA': -120.3103494, 'West Palm Beach/Palm Beach, FL': -80.0532942, 
                                 'West Yellowstone, MT': -111.10513722509046, 'White Plains, NY': -73.7629097, 'Wichita Falls, TX': -98.4933873, 'Wichita, KS': -97.3375448, 'Williamsport, PA': -77.0027671, 'Williston, ND': -103.621814,
                                 'Wilmington, NC': -77.9447107, 'Worcester, MA': -71.8058232, 'Wrangell, AK': -132.3829431, 'Yakima, WA': -120.5108421, 'Yakutat, AK': -139.57831243878087, 'Youngstown/Warren, OH': -80.789606,
                                 'Yuma, AZ': -114.47603157249804, 'Bristol/Johnson City/Kingsport, TN': -82.407401, 'Mission/McAllen/Edinburg, TX': -98.230011, 'New Bern/Morehead/Beaufort, NC': -77.044113, 'Hattiesburg/Laurel, MS': -89.3331,
                                 'Iron Mountain/Kingsfd, MI': -88.1186, 'Newburgh/Poughkeepsie, NY': -73.884201, 'College Station/Bryan, TX': -96.314445, 'Saginaw/Bay City/Midland, MI': -83.9508, 'Newport News/Williamsburg, VA': -76.492996,
                                 'Harlingen/San Benito, TX': -97.6311, 'Sun Valley/Hailey/Ketchum, ID': -114.2959976}}, inplace=True)

newTrain.replace({'destLat': {'Aberdeen, SD': 45.4649805, 'Abilene, TX': 32.44645, 'Adak Island, AK': 51.7961654, 'Aguadilla, PR': 18.4274359, 'Akron, OH': 41.083064, 'Albany, GA': 42.7439143, 'Albany, NY': 42.6511674, 'Albuquerque, NM': 35.0841034,
                              'Alexandria, LA': 31.199004, 'Allentown/Bethlehem/Easton, PA': 40.651163100000005, 'Alpena, MI': 45.0176181, 'Amarillo, TX': 35.2072185, 'Anchorage, AK': 61.2163129, 'Appleton, WI': 44.2611337, 'Arcata/Eureka, CA': 40.8033073,
                              'Asheville, NC': 35.6009498, 'Ashland, WV': 37.4084488, 'Aspen, CO': 39.1911128, 'Atlanta, GA': 33.7489924, 'Atlantic City, NJ': 39.3642852, 'Augusta, GA': 48.3689438, 'Austin, TX': 30.2711286, 'Bakersfield, CA': 35.3738712,
                              'Baltimore, MD': 39.2908816, 'Bangor, ME': 44.8011821, 'Barrow, AK': 71.387113, 'Baton Rouge, LA': 30.4459596, 'Beaumont/Port Arthur, TX': 29.954324, 'Belleville, IL': 48.8176714, 'Bellingham, WA': 48.7544012,
                              'Bemidji, MN': 47.4785418, 'Bend/Redmond, OR': 44.2165084, 'Bethel, AK': 60.7922222, 'Billings, MT': 45.7874957, 'Binghamton, NY': 42.096968, 'Birmingham, AL': 52.4459629, 'Bismarck/Mandan, ND': 46.8101709,
                              'Bloomington/Normal, IL': 40.508752, 'Boise, ID': 43.6166163, 'Boston, MA': 42.3602534, 'Bozeman, MT': 45.6794293, 'Brainerd, MN': 46.3580221, 'Branson, MO': 36.6411357, 'Brownsville, TX': 25.9140256,
                              'Brunswick, GA': 52.3175903, 'Buffalo, NY': 42.8867166, 'Bullhead City, AZ': 35.1477774, 'Burbank, CA': 34.1816482, 'Burlington, VT': 44.4761601, 'Butte, MT': 39.6519275, 'Cape Girardeau, MO': 37.3034933,
                              'Casper, WY': 42.849709, 'Cedar City, UT': 37.6774238, 'Cedar Rapids/Iowa City, IA': 41.9758872, 'Champaign/Urbana, IL': 40.1157948, 'Charleston/Dunbar, WV': 38.3616659, 'Charleston, SC': 32.7876012,
                              'Charlotte Amalie, VI': 18.341137, 'Charlotte, NC': 35.2272086, 'Charlottesville, VA': 38.0360726, 'Chattanooga, TN': 35.0457219, 'Cheyenne, WY': 41.139981, 'Chicago, IL': 41.8755616, 'Christiansted, VI': 17.7439481,
                              'Cincinnati, OH': 39.1014537, 'Clarksburg/Fairmont, WV': 39.2798118, 'Cleveland, OH': 41.5051613, 'Cody, WY': 44.5263107, 'Colorado Springs, CO': 38.8339578, 'Columbia, MO': 38.951883, 'Columbia, SC': 34.0007493,
                              'Columbus, GA': 40.0838862, 'Columbus, MS': 33.4956744, 'Columbus, OH': 39.9622601, 'Concord, NC': 35.4094178, 'Cordova, AK': 60.5439444, 'Corpus Christi, TX': 27.7477253, 'Dallas/Fort Worth, TX': 32.7476308, 
                              'Dallas, TX': 32.7762719, 'Daytona Beach, FL': 29.2108147, 'Dayton, OH': 39.7589478, 'Deadhorse, AK': 70.2006973, 'Del Rio, TX': 29.3655405, 'Denver, CO': 5.3428475, 'Des Moines, IA': 41.5910323, 'Detroit, MI': 42.3315509,
                              'Devils Lake, ND': 48.112779, 'Dickinson, ND': 46.8791756, 'Dillingham, AK': 59.0397222, 'Dothan, AL': 31.2237434, 'Dubuque, IA': 42.5006217, 'Duluth, MN': 46.7729322, 'Durango, CO': 24.833333, 'Eagle, CO': 39.6161124,
                              'Eau Claire, WI': 44.811349, 'Elko, NV': 41.1958128, 'Elmira/Corning, NY': 42.1608441, 'El Paso, TX': 31.7754152, 'Erie, PA': 42.1294712, 'Escanaba, MI': 45.7455707, 'Eugene, OR': 44.0505054, 'Evansville, IN': 37.9386712,
                              'Everett, WA': 47.9673056, 'Fairbanks, AK': 64.837845, 'Fargo, ND': 46.877229, 'Fayetteville, AR': 36.0625843, 'Fayetteville, NC': 35.0525759, 'Flagstaff, AZ': 35.1816047, 'Flint, MI': 43.0161693, 'Florence, SC': 34.1984435, 
                              'Fort Lauderdale, FL': 26.1223084, 'Fort Myers, FL': 26.640628, 'Fort Smith, AR': 35.3872218, 'Fort Wayne, IN': 41.0799898, 'Fresno, CA': 36.7394421, 'Gainesville, FL': 29.6519684, 'Garden City, KS': 37.9716898, 
                              'Gillette, WY': 44.290635, 'Grand Forks, ND': 47.9078244, 'Grand Island, NE': 40.924271, 'Grand Junction, CO': 39.063956, 'Grand Rapids, MI': 42.9632405, 'Great Falls, MT': 47.5048851, 'Green Bay, WI': 44.5126379,
                              'Greensboro/High Point, NC': 36.0726355, 'Greenville, NC': 35.613224, 'Greer, SC': 34.9381361, 'Guam, TT': 13.486490199999999, 'Gulfport/Biloxi, MS': 30.4900534, 'Gunnison, CO': 38.6476702, 'Gustavus, AK': 58.4128377,
                              'Hagerstown, MD': 39.6419219, 'Hancock/Houghton, MI': 47.126871, 'Harrisburg, PA': 40.2663107, 'Hartford, CT': 41.7655582, 'Hayden, CO': 47.7725145, 'Hays, KS': 38.8791783, 'Helena, MT': 46.5927425, 'Hibbing, MN': 47.427155,
                              'Hilo, HI': 19.7073734, 'Hilton Head, SC': 32.3836213, 'Hobbs, NM': 32.707667, 'Honolulu, HI': 21.304547, 'Hoolehua, HI': 21.1590908, 'Houston, TX': 29.7589382, 'Huntsville, AL': 34.729847, 'Hyannis, MA': 41.651513, 
                              'Idaho Falls, ID': 43.4935245, 'Indianapolis, IN': 39.9164009, 'International Falls, MN': 48.601033, 'Islip, NY': 40.7304311, 'Ithaca/Cortland, NY': 42.4415242, 'Jackson/Vicksburg, MS': 32.3520532,
                              'Jacksonville/Camp Lejeune, NC': 34.7338577, 'Jacksonville, FL': 30.3321838, 'Jackson, WY': 32.2990384, 'Jamestown, ND': 46.910544, 'Joplin, MO': 37.08418, 'Juneau, AK': 58.3019496, 'Kahului, HI': 20.8747708,
                              'Kalamazoo, MI': 42.291707, 'Kalispell, MT': 48.2022563, 'Kansas City, MO': 39.100105, 'Kapalua, HI': 20.99490395, 'Kearney, NE': 40.4906216, 'Ketchikan, AK': 55.3430696, 'Key West, FL': 24.5625566,
                              'Killeen, TX': 31.1171441, 'King Salmon, AK': 58.7551615, 'Knoxville, TN': 35.9603948, 'Kodiak, AK': 57.79, 'Kona, HI': 19.743906, 'Kotzebue, AK': 66.8982057, 'La Crosse, WI': 43.8014053, 'Lafayette, LA': 30.2240897, 
                              'Lake Charles, LA': 30.2265949, 'Lanai, HI': 20.830544099999997, 'Lansing, MI': 42.7337712, 'Laramie, WY': 41.311367, 'Laredo, TX': 27.5199841, 'Las Vegas, NV': 36.1672559, 'Latrobe, PA': 40.317287, 
                              'Lawton/Fort Sill, OK': 34.6172103, 'Lewisburg, WV': 37.8017879, 'Lewiston, ID': 46.4195913, 'Lexington, KY': 38.0464066, 'Liberal, KS': 37.0430812, 'Lihue, HI': 21.9769622, 'Lincoln, NE': 40.8088861,
                              'Little Rock, AR': 34.7464809, 'Long Beach, CA': 33.7690164, 'Longview, TX': 32.5007031, 'Los Angeles, CA': 34.0536909, 'Louisville, KY': 38.2542376, 'Lubbock, TX': 33.5635206, 'Lynchburg, VA': 37.4137536, 
                              'Madison, WI': 43.074761, 'Mammoth Lakes, CA': 37.6432525, 'Manchester, NH': 42.9956397, 'Manhattan/Ft. Riley, KS': 40.8576918, 'Marquette, MI': 46.4481521, "Martha's Vineyard, MA": 41.3918832, 'Medford, OR': 42.3264181,
                              'Melbourne, FL': 28.106471, 'Memphis, TN': 35.1490215, 'Meridian, MS': 32.3643098, 'Miami, FL': 25.7741728, 'Midland/Odessa, TX': 31.8329723, 'Milwaukee, WI': 43.0349931, 'Minneapolis, MN': 44.9772995, 'Minot, ND': 48.23251,
                              'Missoula, MT': 46.8701049, 'Moab, UT': 38.5738096, 'Mobile, AL': 30.6943566, 'Moline, IL': 41.5067003, 'Monroe, LA': 38.2722313, 'Monterey, CA': 36.2231079, 'Montgomery, AL': 32.379952849999995, 'Montrose/Delta, CO': 38.8777609,
                              'Mosinee, WI': 44.7927298, 'Muskegon, MI': 43.2341813, 'Myrtle Beach, SC': 33.6956461, 'Nantucket, MA': 41.316911450000006, 'Nashville, TN': 36.1622296, 'Newark, NJ': 40.735657, 'New Haven, CT': 41.298434349999994,
                              'New Orleans, LA': 29.9499323, 'New York, NY': 40.7127281, 'Niagara Falls, NY': 43.08436, 'Nome, AK': 64.4989922, 'Norfolk, VA': 52.56365215, 'North Bend/Coos Bay, OR': 43.4065089, 'North Platte, NE': 41.1238873,
                              'Oakland, CA': 37.8044557, 'Ogdensburg, NY': 44.694285, 'Ogden, UT': 41.2230048, 'Oklahoma City, OK': 35.4729886, 'Omaha, NE': 41.2587459, 'Ontario, CA': 50.000678, 'Orlando, FL': 28.5421109, 'Owensboro, KY': 37.7742152,
                              'Paducah, KY': 37.0833893, 'Pago Pago, TT': -14.2754786, 'Palm Springs, CA': 33.772179449999996, 'Panama City, FL': 30.1600827, 'Pasco/Kennewick/Richland, WA': 46.1736015, 'Pellston, MI': 45.552789, 'Pensacola, FL': 30.421309,
                              'Peoria, IL': 40.6938609, 'Petersburg, AK': 56.8127965, 'Philadelphia, PA': 39.9527237, 'Phoenix, AZ': 33.4484367, 'Pierre, SD': 44.3683644, 'Pittsburgh, PA': 40.4416941, 'Plattsburgh, NY': 44.69282, 'Pocatello, ID': 42.8688613,
                              'Ponce, PR': 18.0039949, 'Portland, ME': 43.6610277, 'Portland, OR': 45.5202471, 'Portsmouth, NH': 43.0702223, 'Prescott, AZ': 34.5399962, 'Presque Isle/Houlton, ME': 46.661867799999996, 'Providence, RI': 41.8239891,
                              'Provo, UT': 40.2338438, 'Pueblo, CO': 10.961033, 'Pullman, WA': 46.7304268, 'Punta Gorda, FL': 26.9297836, 'Quincy, IL': 39.9356016, 'Raleigh/Durham, NC': 35.9217839, 'Rapid City, SD': 44.0869329, 'Redding, CA': 40.5863563,
                              'Reno, NV': 39.5261206, 'Rhinelander, WI': 45.636623, 'Richmond, VA': 49.1977086, 'Roanoke, VA': 37.270973, 'Rochester, MN': 44.0234387, 'Rochester, NY': 43.157285, 'Rockford, IL': 42.2713945, 'Rock Springs, WY': 41.5869225,
                              'Roswell, NM': 33.3943282, 'Rota, TT': 66.947975, 'Sacramento, CA': 38.5810606, 'Saipan, TT': 7.0698398, 'Salina, KS': 38.8402805, 'Salisbury, MD': 38.3662114, 'Salt Lake City, UT': 40.7596198, 'San Angelo, TX': 31.4648357,
                              'San Antonio, TX': 29.4246002, 'San Diego, CA': 32.7174202, 'Sanford, FL': 28.8117297, 'San Francisco, CA': 37.7790262, 'San Jose, CA': 37.3361905, 'San Juan, PR': -25.4206759, 'San Luis Obispo, CA': 35.3540209,
                              'Santa Ana, CA': 33.7494951, 'Santa Barbara, CA': 34.4221319, 'Santa Fe, NM': 35.6869996, 'Santa Maria, CA': 34.9531295, 'Santa Rosa, CA': 38.4404925, 'Sarasota/Bradenton, FL': 27.499764300000002, 'Sault Ste. Marie, MI': 46.490586,
                              'Savannah, GA': 9.7568312, 'Scottsbluff, NE': 41.862302, 'Scranton/Wilkes-Barre, PA': 41.33709205, 'Seattle, WA': 47.6038321, 'Shreveport, LA': 32.5221828, 'Sioux City, IA': 42.4966815, 'Sioux Falls, SD': 43.549973,
                              'Sitka, AK': 57.0524973, 'South Bend, IN': 38.622348, 'Spokane, WA': 47.6571934, 'Springfield, IL': 39.7990175, 'Springfield, MO': 37.2166779, 'State College, PA': 40.7944504, 'Staunton, VA': 38.1357949, 'St. Cloud, MN': 45.5616075,
                              'St. George, UT': 37.104153, 'Stillwater, OK': 36.1156306, 'St. Louis, MO': 38.6529545, 'Stockton, CA': 37.9577016, 'St. Petersburg, FL': 27.7703796, 'Syracuse, NY': 43.0481221, 'Tallahassee, FL': 30.4380832, 'Tampa, FL': 27.9477595,
                              'Texarkana, AR': 33.4254684, 'Toledo, OH': 41.6529143, 'Traverse City, MI': 44.7606441, 'Trenton, NJ': 40.2170575, 'Tucson, AZ': 32.2228765, 'Tulsa, OK': 36.1556805, 'Twin Falls, ID': 42.5704456, 'Tyler, TX': 32.3512601,
                              'Unalaska, AK': 53.8722824, 'Valdosta, GA': 30.8327022, 'Valparaiso, FL': 30.5085309, 'Vernal, UT': 40.4556825, 'Waco, TX': 31.549333, 'Walla Walla, WA': 46.0667277, 'Washington, DC': 38.8949924, 'Waterloo, IA': 42.4979693,
                              'Watertown, NY': 43.9747838, 'Watertown, SD': 44.899211, 'Wenatchee, WA': 47.4234599, 'West Palm Beach/Palm Beach, FL': 26.715364, 'West Yellowstone, MT': 44.664290199999996, 'White Plains, NY': 41.0339862, 
                              'Wichita Falls, TX': 33.9137085, 'Wichita, KS': 37.6922361, 'Williamsport, PA': 41.2493292, 'Williston, ND': 48.1465457, 'Wilmington, NC': 34.2257282, 'Worcester, MA': 42.2761217, 'Wrangell, AK': 56.4706022,
                              'Yakima, WA': 46.601557, 'Yakutat, AK': 59.572734499999996, 'Youngstown/Warren, OH': 41.22497, 'Yuma, AZ': 32.665135, 'Bristol/Johnson City/Kingsport, TN': 36.475201, 'Mission/McAllen/Edinburg, TX': 26.203407, 
                              'New Bern/Morehead/Beaufort, NC': 35.108494, 'Hattiesburg/Laurel, MS': 31.467, 'Iron Mountain/Kingsfd, MI': 45.8146,'Newburgh/Poughkeepsie, NY': 41.66598, 'College Station/Bryan, TX': 30.601389, 
                              'Saginaw/Bay City/Midland, MI': 43.4195, 'Newport News/Williamsburg, VA': 37.131900, 'Harlingen/San Benito, TX': 26.1326, 'Sun Valley/Hailey/Ketchum, ID': 43.504398}}, inplace=True)

newTrain.replace({'destLong': {'Aberdeen, SD': -98.487813, 'Abilene, TX': -99.7475905, 'Adak Island, AK': -176.5734916431957, 'Aguadilla, PR': -67.1541343, 'Akron, OH': -81.518485, 'Albany, GA': -73.8016558, 'Albany, NY': -73.754968,
                               'Albuquerque, NM': -106.6509851, 'Alexandria, LA': 29.894378, 'Allentown/Bethlehem/Easton, PA': -75.44225386838299, 'Alpena, MI': -83.6670019, 'Amarillo, TX': -101.8338246, 'Anchorage, AK': -149.894852,
                               'Appleton, WI': -88.4067604, 'Arcata/Eureka, CA': -124.1535049, 'Asheville, NC': -82.5540161, 'Ashland, WV': -81.3526017, 'Aspen, CO': -106.8235606, 'Atlanta, GA': -84.3902644, 'Atlantic City, NJ': -74.4229351,
                               'Augusta, GA': 10.8933327, 'Austin, TX': -97.7436995, 'Bakersfield, CA': -119.0194639, 'Baltimore, MD': -76.610759, 'Bangor, ME': -68.7778138, 'Barrow, AK': -156.4809618, 'Baton Rouge, LA': -91.18738,
                               'Beaumont/Port Arthur, TX': -93.985972, 'Belleville, IL': 6.0982683, 'Bellingham, WA': -122.4788361, 'Bemidji, MN': -94.8907869, 'Bend/Redmond, OR': -121.2150324, 'Bethel, AK': -161.7558333,
                               'Billings, MT': -108.49607, 'Binghamton, NY': -75.914341, 'Birmingham, AL': -1.8237251, 'Bismarck/Mandan, ND': -100.8363564, 'Bloomington/Normal, IL': -88.9844947, 'Boise, ID': -116.200886, 
                               'Boston, MA': -71.0582912, 'Bozeman, MT': -111.044047, 'Brainerd, MN': -94.2008288, 'Branson, MO': -93.2175285, 'Brownsville, TX': -97.4890856, 'Brunswick, GA': 10.560215, 'Buffalo, NY': -78.8783922, 
                               'Bullhead City, AZ': -114.5682983, 'Burbank, CA': -118.3258554, 'Burlington, VT': -73.212906, 'Butte, MT': -121.5858444, 'Cape Girardeau, MO': -89.5230357, 'Casper, WY': -106.3254928, 'Cedar City, UT': -113.0618277,
                               'Cedar Rapids/Iowa City, IA': -91.6704053, 'Champaign/Urbana, IL': -88.241194, 'Charleston/Dunbar, WV': -81.7207214, 'Charleston, SC': -79.9402728, 'Charlotte Amalie, VI': -64.932789, 'Charlotte, NC': -80.8430827,
                               'Charlottesville, VA': -78.49973472559668, 'Chattanooga, TN': -85.3094883, 'Cheyenne, WY': -104.820246, 'Chicago, IL': -87.6244212, 'Christiansted, VI': -64.7079823, 'Cincinnati, OH': -84.5124602, 
                               'Clarksburg/Fairmont, WV': -80.3300893, 'Cleveland, OH': -81.6934446, 'Cody, WY': -109.0563923, 'Colorado Springs, CO': -104.8253485, 'Columbia, MO': -92.3337366, 'Columbia, SC': -81.0343313, 'Columbus, GA': -83.0765043,
                               'Columbus, MS': -88.4272627, 'Columbus, OH': -83.0007065, 'Concord, NC': -80.5800049, 'Cordova, AK': -145.7589103, 'Corpus Christi, TX': -97.4014129, 'Dallas/Fort Worth, TX': -97.3135971, 'Dallas, TX': -96.7968559,
                               'Daytona Beach, FL': -81.0228331, 'Dayton, OH': -84.1916069, 'Deadhorse, AK': -148.4598151, 'Del Rio, TX': -100.8946984, 'Denver, CO': -72.3959849, 'Des Moines, IA': -93.6046655, 'Detroit, MI': -83.0466403, 
                               'Devils Lake, ND': -98.86512, 'Dickinson, ND': -102.7896242, 'Dillingham, AK': -158.4575, 'Dothan, AL': -85.3933906, 'Dubuque, IA': -90.6647967, 'Duluth, MN': -92.1251218, 'Durango, CO': -104.833333,
                               'Eagle, CO': -106.7172844, 'Eau Claire, WI': -91.4984941, 'Elko, NV': -115.3272864, 'Elmira/Corning, NY': -76.89199038453467, 'El Paso, TX': -106.464634, 'Erie, PA': -80.0852695, 'Escanaba, MI': -87.0647434, 
                               'Eugene, OR': -123.0950506, 'Evansville, IN': -87.518899, 'Everett, WA': -122.2013998, 'Fairbanks, AK': -147.716675, 'Fargo, ND': -96.789821, 'Fayetteville, AR': -94.1574328, 'Fayetteville, NC': -78.878292,
                               'Flagstaff, AZ': -111.6165953319917, 'Flint, MI': -83.6900211, 'Florence, SC': -79.7671658, 'Fort Lauderdale, FL': -80.1433786, 'Fort Myers, FL': -81.8723084, 'Fort Smith, AR': -94.4248983, 'Fort Wayne, IN': -85.1386015,
                               'Fresno, CA': -119.7848307, 'Gainesville, FL': -82.3249846, 'Garden City, KS': -100.8726618, 'Gillette, WY': -105.501876, 'Grand Forks, ND': -97.0592028, 'Grand Island, NE': -98.338685, 'Grand Junction, CO': -108.5507317,
                               'Grand Rapids, MI': -85.6678639, 'Great Falls, MT': -111.29189, 'Green Bay, WI': -88.0125794, 'Greensboro/High Point, NC': -79.7919754, 'Greenville, NC': -77.3724593, 'Greer, SC': -82.2272119, 'Guam, TT': 144.80206025352555,
                               'Gulfport/Biloxi, MS': -89.0290044, 'Gunnison, CO': -107.0603126, 'Gustavus, AK': -135.7375654, 'Hagerstown, MD': -77.7202641, 'Hancock/Houghton, MI': -88.580956, 'Harrisburg, PA': -76.8861122, 'Hartford, CT': -72.69061276146614,
                               'Hayden, CO': -116.82675375791398, 'Hays, KS': -99.3267702, 'Helena, MT': -112.036277, 'Hibbing, MN': -92.937689, 'Hilo, HI': -155.0815803, 'Hilton Head, SC': -99.748119, 'Hobbs, NM': -103.1311314, 'Honolulu, HI': -157.8556764,
                               'Hoolehua, HI': -157.09484723911947, 'Houston, TX': -95.3676974, 'Huntsville, AL': -86.5859011, 'Hyannis, MA': -70.2825918, 'Idaho Falls, ID': -112.0400919, 'Indianapolis, IN': -86.0519568269157,
                               'International Falls, MN': -93.4105904, 'Islip, NY': -73.2108618, 'Ithaca/Cortland, NY': -76.4580207, 'Jackson/Vicksburg, MS': -90.8730418, 'Jacksonville/Camp Lejeune, NC': -77.4457643, 'Jacksonville, FL': -81.655651,
                               'Jackson, WY': -90.1847691, 'Jamestown, ND': -98.708436, 'Joplin, MO': -94.51323, 'Juneau, AK': -134.419734, 'Kahului, HI': -156.4529879461996, 'Kalamazoo, MI': -85.5872286, 'Kalispell, MT': -114.316711,
                               'Kansas City, MO': -94.5781416, 'Kapalua, HI': -156.6562339558182, 'Kearney, NE': -98.9472344, 'Ketchikan, AK': -131.6466819, 'Key West, FL': -81.7724368, 'Killeen, TX': -97.727796, 'King Salmon, AK': -156.5192469940953, 
                               'Knoxville, TN': -83.9210261, 'Kodiak, AK': -152.4072222, 'Kona, HI': -156.0422959812206, 'Kotzebue, AK': -162.5977621, 'La Crosse, WI': -91.2395429, 'Lafayette, LA': -92.0198427, 'Lake Charles, LA': -93.2173759,
                               'Lanai, HI': -156.9029492509114, 'Lansing, MI': -84.5553805, 'Laramie, WY': -105.591101, 'Laredo, TX': -99.4953764, 'Las Vegas, NV': -115.1485163, 'Latrobe, PA': -79.3840301, 'Lawton/Fort Sill, OK': -98.4037888,
                               'Lewisburg, WV': -80.4456303, 'Lewiston, ID': -117.0216144, 'Lexington, KY': -84.4970393, 'Liberal, KS': -100.920999, 'Lihue, HI': -159.3687721, 'Lincoln, NE': -96.7077751, 'Little Rock, AR': -92.2895948,
                               'Long Beach, CA': -118.191604, 'Longview, TX': -94.74049, 'Los Angeles, CA': -118.242766, 'Louisville, KY': -85.759407, 'Lubbock, TX': -101.879336, 'Lynchburg, VA': -79.1422464, 'Madison, WI': -89.3837613, 
                               'Mammoth Lakes, CA': -118.9668509, 'Manchester, NH': -71.4547891, 'Manhattan/Ft. Riley, KS': -73.9222899, 'Marquette, MI': -87.6305899, "Martha's Vineyard, MA": -70.62085427857699, 'Medford, OR': -122.8718605,
                               'Melbourne, FL': -80.6371513, 'Memphis, TN': -90.0516285, 'Meridian, MS': -88.703656, 'Miami, FL': -80.19362, 'Midland/Odessa, TX': -102.3606957, 'Milwaukee, WI': -87.922497, 'Minneapolis, MN': -93.2654692, 
                               'Minot, ND': -101.296273, 'Missoula, MT': -113.995267, 'Moab, UT': -109.5462146, 'Mobile, AL': -88.0430541, 'Moline, IL': -90.5151342, 'Monroe, LA': -90.1792484, 'Monterey, CA': -121.3877428, 'Montgomery, AL': -86.3107669425032,
                               'Montrose/Delta, CO': -108.226467, 'Mosinee, WI': -89.7035959, 'Muskegon, MI': -86.2483921, 'Myrtle Beach, SC': -78.8900409, 'Nantucket, MA': -70.14287301528347, 'Nashville, TN': -86.7743531, 'Newark, NJ': -74.1723667,
                               'New Haven, CT': -72.93102342707913, 'New Orleans, LA': -90.0701156, 'New York, NY': -74.0060152, 'Niagara Falls, NY': -79.0614686, 'Nome, AK': -165.39879944316317, 'Norfolk, VA': 1.2623608080231654,
                               'North Bend/Coos Bay, OR': -124.2242824, 'North Platte, NE': -100.7654232, 'Oakland, CA': -122.2713563, 'Ogdensburg, NY': -75.486374, 'Ogden, UT': -111.9738429, 'Oklahoma City, OK': -97.5170536, 
                               'Omaha, NE': -95.9383758, 'Ontario, CA': -86.000977, 'Orlando, FL': -81.3790304, 'Owensboro, KY': -87.1133304, 'Paducah, KY': -88.6000478, 'Pago Pago, TT': -170.7048298, 'Palm Springs, CA': -116.49529769785079,
                               'Panama City, FL': -85.6545729, 'Pasco/Kennewick/Richland, WA': -119.0664001, 'Pellston, MI': -84.783936, 'Pensacola, FL': -87.2169149, 'Peoria, IL': -89.5891008, 'Petersburg, AK': -132.95547,
                               'Philadelphia, PA': -75.1635262, 'Phoenix, AZ': -112.0741417, 'Pierre, SD': -100.3511367, 'Pittsburgh, PA': -79.9900861, 'Plattsburgh, NY': -73.45562, 'Pocatello, ID': -112.4401098, 'Ponce, PR': -66.6169509, 
                               'Portland, ME': -70.2548596, 'Portland, OR': -122.6741949, 'Portsmouth, NH': -70.7548621, 'Prescott, AZ': -112.4687616, 'Presque Isle/Houlton, ME': -68.01074889363161, 'Providence, RI': -71.4128343, 'Provo, UT': -111.6585337, 
                               'Pueblo, CO': -74.84053554739253, 'Pullman, WA': -117.173895, 'Punta Gorda, FL': -82.0453664, 'Quincy, IL': -91.4098727, 'Raleigh/Durham, NC': -78.76087880585929, 'Rapid City, SD': -103.2274481, 'Redding, CA': -122.3916754,
                               'Reno, NV': -119.8126581, 'Rhinelander, WI': -89.412075, 'Richmond, VA': -123.1912406, 'Roanoke, VA': -79.9414313, 'Rochester, MN': -92.4630182, 'Rochester, NY': -77.615214, 'Rockford, IL': -89.093966,
                               'Rock Springs, WY': -109.2047867, 'Roswell, NM': -104.5229518, 'Rota, TT': 13.553736, 'Sacramento, CA': -121.4938951, 'Saipan, TT': 125.5116649, 'Salina, KS': -97.6114237, 'Salisbury, MD': -75.6008881, 
                               'Salt Lake City, UT': -111.8867975, 'San Angelo, TX': -100.4398442, 'San Antonio, TX': -98.4951405, 'San Diego, CA': -117.1627728, 'Sanford, FL': -81.2680345, 'San Francisco, CA': -122.4199061, 'San Jose, CA': -121.890583,
                               'San Juan, PR': -49.2687428522959, 'San Luis Obispo, CA': -120.3757163, 'Santa Ana, CA': -117.8732213, 'Santa Barbara, CA': -119.7026673, 'Santa Fe, NM': -105.9377997, 'Santa Maria, CA': -120.4358577, 
                               'Santa Rosa, CA': -122.7141049, 'Sarasota/Bradenton, FL': -82.56510160912002, 'Sault Ste. Marie, MI': -84.359269, 'Savannah, GA': -2.4962, 'Scottsbluff, NE': -103.6627088, 'Scranton/Wilkes-Barre, PA': -75.72257122928625,
                               'Seattle, WA': -122.3300624, 'Shreveport, LA': -93.7651944, 'Sioux City, IA': -96.4058782, 'Sioux Falls, SD': -96.7003324, 'Sitka, AK': -135.337612, 'South Bend, IN': -105.518825, 'Spokane, WA': -117.4235106, 
                               'Springfield, IL': -89.6439575, 'Springfield, MO': -93.2920373, 'State College, PA': -77.8616386, 'Staunton, VA': -79.08927008810585, 'St. Cloud, MN': -94.1642004, 'St. George, UT': -113.5841313, 'Stillwater, OK': -97.0585717,
                               'St. Louis, MO': -90.24111656024635, 'Stockton, CA': -121.2907796, 'St. Petersburg, FL': -82.6695085, 'Syracuse, NY': -76.1474244, 'Tallahassee, FL': -84.2809332, 'Tampa, FL': -82.458444, 'Texarkana, AR': -94.0430977, 
                               'Toledo, OH': -83.5378173, 'Traverse City, MI': -85.6165301, 'Trenton, NJ': -74.7429463, 'Tucson, AZ': -110.9748477, 'Tulsa, OK': -95.9929113, 'Twin Falls, ID': -114.4602554, 'Tyler, TX': -95.3010624, 'Unalaska, AK': -166.5272262,
                               'Valdosta, GA': -83.2784851, 'Valparaiso, FL': -86.5027282, 'Vernal, UT': -109.5284741, 'Waco, TX': -97.1466695, 'Walla Walla, WA': -118.3393456, 'Washington, DC': -77.0365581, 'Waterloo, IA': -92.3329637,
                               'Watertown, NY': -75.9107565, 'Watertown, SD': -97.115289, 'Wenatchee, WA': -120.3103494, 'West Palm Beach/Palm Beach, FL': -80.0532942, 'West Yellowstone, MT': -111.10513722509046, 'White Plains, NY': -73.7629097, 
                               'Wichita Falls, TX': -98.4933873, 'Wichita, KS': -97.3375448, 'Williamsport, PA': -77.0027671, 'Williston, ND': -103.621814, 'Wilmington, NC': -77.9447107, 'Worcester, MA': -71.8058232, 'Wrangell, AK': -132.3829431,
                               'Yakima, WA': -120.5108421, 'Yakutat, AK': -139.57831243878087, 'Youngstown/Warren, OH': -80.789606, 'Yuma, AZ': -114.47603157249804, 'Bristol/Johnson City/Kingsport, TN': -82.407401, 'Mission/McAllen/Edinburg, TX': -98.230011, 
                               'New Bern/Morehead/Beaufort, NC': -77.044113, 'Hattiesburg/Laurel, MS': -89.3331, 'Iron Mountain/Kingsfd, MI': -88.1186,'Newburgh/Poughkeepsie, NY': -73.884201, 'College Station/Bryan, TX': -96.314445, 
                               'Saginaw/Bay City/Midland, MI': -83.9508, 'Newport News/Williamsburg, VA': -76.492996, 'Harlingen/San Benito, TX': -97.6311, 'Sun Valley/Hailey/Ketchum, ID': -114.2959976}}, inplace=True)

In [10]:
#Converting the planned departure time from 24 hours to a more catagorical variable, which captures an
newTrain['crs_dep_time'] = (newTrain['crs_dep_time']/100).astype(int)
newTrain['crs_arr_time'] = (newTrain['crs_arr_time']/100).astype(int)

#Convert fl_date to Datetime, then just month number to account for higher delays within certain months 
monthDummies = pd.get_dummies(pd.to_datetime(newTrain.fl_date , format="%Y-%m-%d").dt.strftime('%B'))
dayDummies = pd.get_dummies(pd.to_datetime(newTrain.fl_date , format="%Y-%m-%d").dt.strftime('%A'))


#Creating dummy variables for carriers to account for delays related to certain carriers then concat these dummies onto newTrain
mktCarrierDummies = pd.get_dummies(newTrain['mkt_unique_carrier'])

# opCarrierDummies = pd.get_dummies(newTrain['op_unique_carrier'])
newTrain = pd.concat([newTrain, mktCarrierDummies, monthDummies, dayDummies], axis=1)
#tes without these dummies then swap and check results
#op dummies was giving better results than mkt dummies

newTrain['distanceSQ'] = newTrain['distance']**2

newTrain['originLong*Lat'] = newTrain['originLong']*newTrain['originLat']

newTrain['originLongSQ'] = newTrain['originLong']**2

newTrain['originLatSQ'] = newTrain['originLat']**2

newTrain['Month_Avg_Arr_DelaySQ'] = newTrain['Month_Avg_Arr_Delay']**2


#Assign X & y
y = newTrain['arr_delay'].values.reshape(-1,1)
X = newTrain.drop(columns = ['fl_date', 'origin', 'dep_time', 'mkt_unique_carrier','op_unique_carrier','arr_delay','origin_city_name','dest','dest_city_name','crs_arr_time', 'arr_time', 'taxi_out', 'taxi_in'])

# #Scale both X and y due to differing units of measurements between features
# XScaled = scaler.fit_transform(X)
# yScaled = scaler.fit_transform(y)

# # #Split data into train and test data
# X_train, X_test, y_train, y_test = train_test_split(XScaled, yScaled, train_size = 0.75, random_state=5)

In [11]:
X

Unnamed: 0,crs_dep_time,distance,Month_Avg_Arr_Delay,Month_Avg_Dep_Delay,Avg_Taxi_In_Carrier,Avg_Taxi_Out_Carrier,originLat,originLong,destLat,destLong,...,Saturday,Sunday,Thursday,Tuesday,Wednesday,distanceSQ,originLong*Lat,originLongSQ,originLatSQ,Month_Avg_Arr_DelaySQ
0,17,2688,6.670706,11.689433,8.082283,18.991043,47.603832,-122.330062,19.743906,-156.042296,...,1,0,0,0,0,7225344,-5823.379751,14964.644167,2266.124831,44.498316
1,16,1069,10.393455,14.629757,5.293501,12.319695,29.424600,-98.495141,36.167256,-115.148516,...,1,0,0,0,0,1142761,-2898.180131,9701.292702,865.807097,108.023914
2,22,912,2.854581,8.457527,9.445789,18.694389,35.227209,-80.843083,29.758938,-95.367697,...,0,1,0,0,0,831744,-2847.876138,6535.604020,1240.956226,8.148635
3,7,581,10.393455,14.629757,7.542488,17.240637,26.122308,-80.143379,33.748992,-84.390264,...,1,0,0,0,0,337561,-2093.530052,6422.961133,682.374996,108.023914
4,8,1091,2.854581,8.457527,8.146283,20.229321,41.505161,-81.693445,29.758938,-95.367697,...,0,0,0,0,1,1190281,-3390.699595,6673.818891,1722.678415,8.148635
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
299995,6,2026,10.393455,14.629757,5.293501,12.319695,35.921784,-78.760879,36.167256,-115.148516,...,0,0,0,1,0,4104676,-2829.231268,6203.276030,1290.374559,108.023914
299996,10,153,6.416833,11.283687,7.542488,17.240637,34.938136,-82.227212,33.748992,-84.390264,...,0,0,0,0,1,23409,-2872.865520,6761.314377,1220.673354,41.175747
299997,19,1042,6.670706,11.689433,7.847001,19.814798,29.758938,-95.367697,35.921784,-78.760879,...,0,0,0,0,0,1085764,-2838.041413,9094.997707,885.594403,44.498316
299998,16,109,8.910038,13.770583,7.693122,19.763907,34.053691,-118.242766,32.717420,-117.162773,...,0,0,0,0,1,11881,-4026.602605,13981.351711,1159.653864,79.388780


In [12]:
X.columns

Index(['crs_dep_time', 'distance', 'Month_Avg_Arr_Delay',
       'Month_Avg_Dep_Delay', 'Avg_Taxi_In_Carrier', 'Avg_Taxi_Out_Carrier',
       'originLat', 'originLong', 'destLat', 'destLong', 'AA', 'AS', 'B6',
       'DL', 'F9', 'G4', 'HA', 'NK', 'UA', 'VX', 'WN', 'April', 'August',
       'December', 'February', 'January', 'July', 'June', 'March', 'May',
       'November', 'October', 'September', 'Friday', 'Monday', 'Saturday',
       'Sunday', 'Thursday', 'Tuesday', 'Wednesday', 'distanceSQ',
       'originLong*Lat', 'originLongSQ', 'originLatSQ',
       'Month_Avg_Arr_DelaySQ'],
      dtype='object')

In [11]:
xg_reg = xgb.XGBRegressor(objective ='reg:squarederror', learning_rate = 0.04,
                max_depth = 5, alpha = 20, n_estimators = 800)

xg_reg.fit(X_train, y_train)
y_pred = xg_reg.predict(X_test)
print('MSE: ', mean_squared_error(y_test, y_pred))
print('R^2 Score: ', r2_score(y_test, y_pred))
print('R^2 Adj-Score: ', 1-(1-r2_score(y_test, y_pred))*((len(X_test)-1)/(len(X_test)-len(X_test[0])-1)))

MSE:  0.9488091377782872
R^2 Score:  0.03330616672907394
R^2 Adj-Score:  0.032725794467457714


In [12]:
xg_reg = xgb.XGBRegressor(objective ='reg:squarederror', learning_rate = 0.03,
                max_depth = 5, alpha = 40, n_estimators = 800)

cv_r2 = []

cv_mse = []

kf = KFold(n_splits=5)

# X_train, X_test, y_train, y_test = train_test_split(XScaled, yScaled, train_size = 0.75, random_state=5)
for train_idx, test_idx in kf.split(XScaled, yScaled):
        X_train, X_test, y_train, y_test = XScaled[train_idx], XScaled[test_idx], yScaled[train_idx], yScaled[test_idx]
        
        xg_reg.fit(X_train, y_train.ravel())
        
        y_pred = xg_reg.predict(X_test)
        mse = mean_squared_error(y_test, y_pred)
        r2 = r2_score(y_test, y_pred)
        
        cv_mse.append(mse)
        cv_r2.append(r2)

cv_mse = np.array(cv_mse)        
cv_r2 = np.array(cv_r2)
print(f'Cross-validated R^2\nMean:\t{cv_r2.mean()}\nStd.:\t{cv_r2.std()}')

Cross-validated R^2
Mean:	0.03341754387312523
Std.:	0.0013685044054007752


In [9]:
X

Index([         'crs_dep_time',              'distance',
         'Month_Avg_Arr_Delay',   'Month_Avg_Dep_Delay',
         'Avg_Taxi_In_Carrier',  'Avg_Taxi_Out_Carrier',
                   'originLat',            'originLong',
                     'destLat',              'destLong',
                          'AA',                    'AS',
                          'B6',                    'DL',
                          'F9',                    'G4',
                          'HA',                    'NK',
                          'UA',                    'VX',
                          'WN',                       1,
                             2,                       3,
                             4,                       5,
                             6,                       7,
                             8,                       9,
                            10,                      11,
                            12,                       0,
                             1,

In [None]:
#Replacing string origin values with weighted float values relative to the flight traffic
newTrain.replace({'origin': {'JFK':32.5, 'MCO':35.0, 'DCA':36.0, 'BOS':38.0, 'PHL':39.0, 'EWR':40.5, 'MSP':41.0, 'LAS':42.0, 'LGA':43.0, 'DTW':43.5, 'IAH':45.0, 'PHX': 45.0, 'SFO':45.0, 'SEA':48.0, 'LAX':63.0, 'CLT':63.4,
                                  'DEN':69.8, 'DFW':73.9, 'ATL':99.0, 'ORD':100.0}}, inplace=True)

#Convert any strings to NaN to then convert the NaNs to zero
newTrain['origin'] = pd.to_numeric(newTrain['origin'], errors='coerce')
newTrain['origin'] = newTrain['origin'].fillna(0)

#Convert fl_date to Datetime, then just month number to account for higher delays within certain months 
newTrain['month'] = pd.to_datetime(newTrain.fl_date , format="%Y-%m-%d").dt.month
newTrain['day'] = pd.to_datetime(newTrain.fl_date , format="%Y-%m-%d").dt.dayofweek
#test without day, include after testing with strictly the month
#including day improved our tests.

#Creating dummy variables for carriers to account for delays related to certain carriers
#then concat these dummies onto newTrain
# mktCarrierDummies = pd.get_dummies(newTrain['mkt_unique_carrier'])
# newTrain = pd.concat([newTrain, mktCarrierDummies], axis=1)

opCarrierDummies = pd.get_dummies(newTrain['op_unique_carrier'])
newTrain = pd.concat([newTrain, opCarrierDummies], axis=1)
#tes without these dummies then swap and check results
#op dummies was giving better results than mkt dummies

newTrain['distanceSQ'] = newTrain['distance']**2

#Assign X & y
y = newTrain['arr_delay'].values.reshape(-1,1)
X = newTrain.drop(columns = ['fl_date', 'dep_time', 'mkt_unique_carrier','op_unique_carrier','arr_delay','origin_city_name','dest','dest_city_name','crs_arr_time', 'arr_time', 'taxi_out', 'taxi_in'])

#Scale both X and y due to differing units of measurements between features
XScaled = scaler.fit_transform(X)
yScaled = scaler.fit_transform(y)

#Split data into train and test data
X_train, X_test, y_train, y_test = train_test_split(XScaled, yScaled, train_size = 0.75, random_state=5)

In [None]:
#XGBoost GridSearch

n_estimators = [400]
max_depth = [5]
learning_rate = [0.5, 0.1, 0.2, 0.15]
objective = ['reg:squarederror']
alpha = [20]
n_jobs = [-2]

parameters = dict(n_estimators=n_estimators,
                 max_depth=max_depth,
                 learning_rate=learning_rate,
                 objective=objective,
                 alpha=alpha)

xgbGS = GridSearchCV(estimator=xgb.XGBRegressor(random_state=5),
                    param_grid=parameters,
                    verbose=1,
                    n_jobs = -2)

xgbGS.fit(X_train, y_train)
xgbGS_pred = xgbGS.predict(X_test)

In [None]:
print(f'Best params:\n {xgbGS.best_params_}')
print(' ')
print(f'Best score: {xgbGS.best_score_}')
xgbGSMSE = mean_squared_error(y_test, xgbGS_pred)
xgbGSr2 = r2_score(y_test,xgbGS_pred)
print(f'Random Forest Regression Test MSE Value:\t{xgbGSMSE}')
print(f'Random Forest Regression Test r2 Value:\t{xgbGSr2}')

In [None]:
#XG Boost Regression

xg_reg = xgb.XGBRegressor(objective ='reg:squarederror', colsample_bytree = 0.4, learning_rate = 0.1,
                max_depth = 8, alpha = 10, n_estimators = 100)

xg_reg.fit(X_train, y_train)
y_pred = xg_reg.predict(X_test)
print('MSE: ', mean_squared_error(y_test, y_pred))
print('R^2 Score: ', r2_score(y_test, y_pred))
print('R^2 Adj-Score: ', 1-(1-r2_score(y_test, y_pred))*((len(X_test)-1)/(len(X_test)-len(X_test[0])-1)))

In [None]:
#Replacing string origin values with weighted float values relative to the flight traffic
newTrain.replace({'origin': {'JFK':32.5, 'MCO':35.0, 'DCA':36.0, 'BOS':38.0, 'PHL':39.0, 'EWR':40.5, 'MSP':41.0, 'LAS':42.0, 'LGA':43.0, 'DTW':43.5, 'IAH':45.0, 'PHX': 45.0, 'SFO':45.0, 'SEA':48.0, 'LAX':63.0, 'CLT':63.4,
                                  'DEN':69.8, 'DFW':73.9, 'ATL':99.0, 'ORD':100.0}}, inplace=True)

#Convert any strings to NaN to then convert the NaNs to zero
newTrain['origin'] = pd.to_numeric(newTrain['origin'], errors='coerce')
newTrain['origin'] = newTrain['origin'].fillna(0)

#Convert fl_date to Datetime, then just month number to account for higher delays within certain months 
newTrain['month'] = pd.to_datetime(newTrain.fl_date , format="%Y-%m-%d").dt.month
newTrain['day'] = pd.to_datetime(newTrain.fl_date , format="%Y-%m-%d").dt.dayofweek
#test without day, include after testing with strictly the month
#including day improved our tests.

#Creating dummy variables for carriers to account for delays related to certain carriers
#then concat these dummies onto newTrain
# mktCarrierDummies = pd.get_dummies(newTrain['mkt_unique_carrier'])
# newTrain = pd.concat([newTrain, mktCarrierDummies], axis=1)

opCarrierDummies = pd.get_dummies(newTrain['op_unique_carrier'])
newTrain = pd.concat([newTrain, opCarrierDummies], axis=1)
#tes without these dummies then swap and check results
#op dummies was giving better results than mkt dummies

newTrain['distanceSQ'] = newTrain['distance']**2

#Assign X & y
y = newTrain['arr_delay'].values.reshape(-1,1)
X = newTrain.drop(columns = ['fl_date', 'crs_dep_time', 'dep_time', 'mkt_unique_carrier','op_unique_carrier','arr_delay','origin_city_name','dest','dest_city_name','crs_arr_time', 'arr_time', 'taxi_out', 'taxi_in'])

#Scale both X and y due to differing units of measurements between features
X = scaler.fit_transform(X)
y = scaler.fit_transform(y)

#Split data into train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.75, random_state=5)

In [None]:
#Random Forest Regression

rfreg = RandomForestRegressor(n_estimators=100, random_state=5)
rfreg.fit(X_train, y_train.ravel())

rfreg_pred_y = rfreg.predict(X_test)

print('MSE: ', mean_squared_error(y_test, rfreg_pred_y))
print('R^2 Score: ', r2_score(y_test, rfreg_pred_y))

In [None]:
X

In [15]:
#XGBoost 
mainTrainData = mainTrainData.dropna()

mainTrainData['fl_date'] = pd.to_datetime(mainTrainData.fl_date, format="%Y-%m-%d").dt.month

monthDummies = pd.get_dummies(mainTrainData['fl_date'])

mktCarrierDummies = pd.get_dummies(mainTrainData['mkt_unique_carrier'])

mainTrainData = pd.concat([mainTrainData, mktCarrierDummies, monthDummies], axis=1)

y = mainTrainData['arr_delay'].values.reshape(-1,1)
X = mainTrainData.drop(columns = ['fl_date','mkt_unique_carrier','op_unique_carrier','origin','dest','arr_delay'])

XScaled = scaler.fit_transform(X)
yScaled = scaler.fit_transform(y)

xg_reg = xgb.XGBRegressor(objective ='reg:squarederror', colsample_bytree = 0.3, learning_rate = 0.01,
                max_depth = 8, alpha = 10, n_estimators = 10)

cv_r2 = []

cv_mse = []

kf = KFold(n_splits=50)

# X_train, X_test, y_train, y_test = train_test_split(XScaled, yScaled, train_size = 0.75, random_state=5)
for train_idx, test_idx in kf.split(XScaled, yScaled):
        X_train, X_test, y_train, y_test = XScaled[train_idx], XScaled[test_idx], yScaled[train_idx], yScaled[test_idx]
        
        xg_reg.fit(X_train, y_train)
        
        y_pred = xg_reg.predict(X_test)
        mse = mean_squared_error(y_test, y_pred)
        r2 = r2_score(y_test, y_pred)
        
        cv_mse.append(mse)
        cv_r2.append(r2)

cv_r2 = np.array(cv_r2)
print(f'Cross-validated R^2\nMean:\t{cv_r2.mean()}\nStd.:\t{cv_r2.std()}')

Cross-validated R^2
Mean:	-0.21906766461303812
Std.:	0.07907361481999438


In [None]:
#RandomForest

mainTrainData = mainTrainData.dropna()

mainTrainData['fl_date'] = pd.to_datetime(mainTrainData.fl_date, format="%Y-%m-%d").dt.month

monthDummies = pd.get_dummies(mainTrainData['fl_date'])

mktCarrierDummies = pd.get_dummies(mainTrainData['mkt_unique_carrier'])

mainTrainData = pd.concat([mainTrainData, mktCarrierDummies, monthDummies], axis=1)

y = mainTrainData['arr_delay'].values.reshape(-1,1)
X = mainTrainData.drop(columns = ['fl_date','mkt_unique_carrier','op_unique_carrier','origin','dest','arr_delay'])

XScaled = scaler.fit_transform(X)
yScaled = scaler.fit_transform(y)

rfreg=RandomForestRegressor(random_state=5)
cv_r2 = []

cv_mse = []

kf = KFold(n_splits=10)

# X_train, X_test, y_train, y_test = train_test_split(XScaled, yScaled, train_size = 0.75, random_state=5)
for train_idx, test_idx in kf.split(XScaled, yScaled):
        X_train, X_test, y_train, y_test = XScaled[train_idx], XScaled[test_idx], yScaled[train_idx], yScaled[test_idx]
        
        rfreg.fit(X_train, y_train.ravel())
        
        y_pred = rfreg.predict(X_test)
        mse = mean_squared_error(y_test, y_pred)
        r2 = r2_score(y_test, y_pred)
        
        cv_mse.append(mse)
        cv_r2.append(r2)

cv_mse = np.array(cv_mse)        
cv_r2 = np.array(cv_r2)
print(f'Cross-validated R^2\nMean:\t{cv_r2.mean()}\nStd.:\t{cv_r2.std()}')

In [None]:
#Polynomial Linear Regression to overcome some underfitting and increase the compplexity of the model

mainTrainData = mainTrainData.dropna()
mainTrainData['fl_date'] = pd.to_datetime(mainTrainData.fl_date, format="%Y-%m-%d").dt.month
monthDummies = pd.get_dummies(mainTrainData['fl_date'])
mktCarrierDummies = pd.get_dummies(mainTrainData['mkt_unique_carrier'])
mainTrainData = pd.concat([mainTrainData, mktCarrierDummies, monthDummies], axis=1)
y = mainTrainData['arr_delay'].values.reshape(-1,1)
X = mainTrainData.drop(columns = ['fl_date','mkt_unique_carrier','op_unique_carrier','origin','dest','arr_delay'])
X['const'] = 1
XScaled = scaler.fit_transform(X)
yScaled = scaler.fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(XScaled, yScaled, train_size = 0.75, random_state=5)
linreg = LinearRegression()
poly = PolynomialFeatures(degree=3)
X_poly_train = poly.fit_transform(X_train)
X_poly_test = poly.transform(X_test)
linreg.fit(X_poly_train, y_train)
y_poly_pred = linreg.predict(X_poly_test)
print('MSE: ', mean_squared_error(y_test, y_poly_pred))
print('R^2 Score: ', linreg.score(X_poly_test, y_test))

In [None]:
#Using the new and improved data! Thanks Karan :D

#OLS Model - updated data, dummies of mkt_carrier
mainTrainData = pd.concat([trainData2018, trainData2019], ignore_index=True)
mainTrainData = mainTrainData.dropna()
mktCarrierDummies = pd.get_dummies(mainTrainData['mkt_unique_carrier'])
originDummies = pd.get_dummies(mainTrainData['origin'])
mainTrainData = pd.concat([mainTrainData, mktCarrierDummies, originDummies], axis=1)
y = mainTrainData['arr_delay'].values.reshape(-1,1)
X = mainTrainData.drop(columns = ['fl_date', 'mkt_unique_carrier','op_unique_carrier','origin','dest','arr_delay'])
X['distanceSQ'] = X['distance']**2
X['const'] = 1
XScaled = scaler.fit_transform(X)
yScaled = scaler.fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(XScaled, yScaled, train_size = 0.75, random_state=5)
linreg = LinearRegression()
linreg.fit(X_train, y_train)
y_pred = linreg.predict(X_test)
print('MSE: ', mean_squared_error(y_test, y_pred))
print('R^2 Score: ', linreg.score(X_test, y_test))
print('R^2 Adj-Score: ', 1-(1-r2_score(y_test, y_pred))*((len(X_test)-1)/(len(X_test)-len(X_test[0])-1)))



In [None]:
#OLS Model - Distance Squared added as a feature
filteredFlights = Flights[['mkt_carrier_fl_num', 'op_carrier_fl_num', 'origin_airport_id', 'dest_airport_id', 'crs_dep_time', 'crs_arr_time', 'distance', 'arr_delay']]
filteredFlights = filteredFlights.dropna()
y = filteredFlights.iloc[:, -1:]
X = filteredFlights.iloc[:,1:-1]
X['distanceSQ'] = X['distance']**2
X['const'] = 1
XScaled = scaler.fit_transform(X)
yScaled = scaler.fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(XScaled, yScaled, train_size = 0.75, random_state=5)
linreg = LinearRegression()
linreg.fit(X_train, y_train)
y_pred = linreg.predict(X_test)
print('MSE: ', mean_squared_error(y_test, y_pred))
print('R^2 Score: ', linreg.score(X_test, y_test))
print('R^2 Adj-Score: ', 1-(1-r2_score(y_test, y_pred))*((len(X_test)-1)/(len(X_test)-len(X_test[0])-1)))

In [17]:
filteredFlights = Flights[['mkt_carrier_fl_num', 'op_carrier_fl_num', 'origin_airport_id', 'dest_airport_id', 'crs_dep_time', 'crs_arr_time', 'distance', 'arr_delay']]
filteredFlights

Unnamed: 0,mkt_carrier_fl_num,op_carrier_fl_num,origin_airport_id,dest_airport_id,crs_dep_time,crs_arr_time,distance,arr_delay
0,1846,1846,14122,11057,701,849,366,874.0
1,1847,1847,11298,14747,910,1148,1660,5.0
2,1847,1847,14747,11298,1250,1850,1660,9.0
3,1848,1848,13204,14100,1632,1901,861,-9.0
4,1848,1848,14100,13204,1245,1521,861,26.0
...,...,...,...,...,...,...,...,...
299995,670,670,14771,14679,1325,1500,447,-6.0
299996,1719,1719,14771,14679,635,810,447,-14.0
299997,1721,1721,14771,14679,1110,1250,447,-20.0
299998,1723,1723,14771,14679,1540,1715,447,5.0


In [22]:
#XGBoost Model - Distance Squared added as a feature

filteredFlights = Flights[['mkt_carrier_fl_num', 'op_carrier_fl_num', 'origin_airport_id', 'dest_airport_id', 'crs_dep_time', 'crs_arr_time', 'distance', 'arr_delay']]
filteredFlights = filteredFlights.dropna()
y = filteredFlights.iloc[:, -1:]
X = filteredFlights.iloc[:,1:-1]
X['distanceSQ'] = X['distance']**2
XScaled = scaler.fit_transform(X)
yScaled = scaler.fit_transform(y)

data_dmatrix = xgb.DMatrix(data=XScaled,label=yScaled)

X_train, X_test, y_train, y_test = train_test_split(XScaled, yScaled, train_size = 0.75, random_state=5)
xg_reg = xgb.XGBRegressor(objective ='reg:squarederror', learning_rate = 0.01,
                max_depth = 8, alpha = 10, n_estimators = 200)

xg_reg.fit(X_train, y_train)
y_pred = xg_reg.predict(X_test)
print('MSE: ', mean_squared_error(y_test, y_pred))
print('R^2 Score: ', linreg.score(X_test, y_test))
print('R^2 Adj-Score: ', 1-(1-r2_score(y_test, y_pred))*((len(X_test)-1)/(len(X_test)-len(X_test[0])-1)))

MSE:  0.9406709150312306
R^2 Score:  0.008863950501306617
R^2 Adj-Score:  0.043397914976841534


In [18]:
#Baseline OLS Model - No Feature Engineering

filteredFlights = Flights[['mkt_carrier_fl_num', 'op_carrier_fl_num', 'origin_airport_id', 'dest_airport_id', 'crs_dep_time', 'crs_arr_time', 'distance', 'arr_delay']]
filteredFlights = filteredFlights.dropna()
y = filteredFlights.iloc[:, -1:]
X = filteredFlights.iloc[:,1:-1]
X['const'] = 1
XScaled = scaler.fit_transform(X)
yScaled = scaler.fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(XScaled, yScaled, train_size = 0.75, random_state=5)
linreg = LinearRegression()
linreg.fit(X_train, y_train)
y_pred = linreg.predict(X_test)
print('MSE: ', mean_squared_error(y_test, y_pred))
print('R^2 Score: ', linreg.score(X_test, y_test))
print('R^2 Adj-Score: ', 1-(1-r2_score(y_test, y_pred))*((len(X_test)-1)/(len(X_test)-len(X_test[0])-1)))

MSE:  0.9747221620357202
R^2 Score:  0.008863950501306617
R^2 Adj-Score:  0.008769977234074355


### Feature Engineering

Feature engineering will play a crucial role in this problems. We have only very little attributes so we need to create some features that will have some predictive power.

- weather: we can use some weather API to look for the weather in time of the scheduled departure and scheduled arrival.
- statistics (avg, mean, median, std, min, max...): we can take a look at previous delays and compute descriptive statistics
- airports encoding: we need to think about what to do with the airports and other categorical variables
- time of the day: the delay probably depends on the airport traffic which varies during the day.
- airport traffic
- unsupervised learning as feature engineering?
- **what are the additional options?**: Think about what we could do more to improve the model.

### Feature Selection / Dimensionality Reduction

We need to apply different selection techniques to find out which one will be the best for our problems.

- Original Features vs. PCA conponents?

### Modeling

Use different ML techniques to predict each problem.

- linear / logistic / multinomial logistic regression
- Naive Bayes
- Random Forest
- SVM
- XGBoost
- The ensemble of your own choice

### Evaluation

You have data from 2018 and 2019 to develop models. Use different evaluation metrics for each problem and compare the performance of different models.

You are required to predict delays on **out of sample** data from **first 7 days (1st-7th) of January 2020** and to share the file with LighthouseLabs. Sample submission can be found in the file **_sample_submission.csv_**

======================================================================
## Stretch Tasks

### Multiclass Classification

The target variables are **CARRIER_DELAY, WEATHER_DELAY, NAS_DELAY, SECURITY_DELAY, LATE_AIRCRAFT_DELAY**. We need to do additional transformations because these variables are not binary but continuos. For each flight that was delayed, we need to have one of these variables as 1 and others 0.

It can happen that we have two types of delays with more than 0 minutes. In this case, take the bigger one as 1 and others as 0.

### Binary Classification

The target variable is **CANCELLED**. The main problem here is going to be huge class imbalance. We have only very little cancelled flights with comparison to all flights. It is important to do the right sampling before training and to choose correct evaluation metrics.