## Student ID: 190428550

### Machine Learning and Neural Networks - Template (1)
### Deep Learning on a Public Dataset

##### *This notebook contains the third iteration of our project, in which we combine the models created previously into a model ensemble. An ensemble often produces more accurate results than any single model, although this is not guaranteed. This is a continuation of the 5th iteration. Any new findings are reported in new sections.*


In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import tensorflow as tf
import numpy as np
import pandas as pd
import tensorflow.keras as keras
from tensorflow.keras import utils

In [3]:
airplane_data = pd.read_csv("2004.csv")
airplane_data = airplane_data.sample(n=100000)

airplane_data['CRSDepTime'] = airplane_data['CRSDepTime'].astype(str)
length_of_crsdeptime = airplane_data['CRSDepTime'].str.len()
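# Derive the scheduled departure hour from the hhmm string:
# 4 characters -> first two digits, 3 characters -> first digit, fewer -> hour 0.
airplane_data['HourOfDay'] = np.select(
    [length_of_crsdeptime == 4, length_of_crsdeptime == 3, length_of_crsdeptime < 3],
    [airplane_data['CRSDepTime'].str[0:2], airplane_data['CRSDepTime'].str[0:1], 0],
    np.nan,
)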
airplane_data['HourOfDay'] = airplane_data['HourOfDay'].astype(int)

airplane_data['Delayed'] = np.where(airplane_data['ArrDelay'] > 15, 1, 0)
airplane_data = airplane_data[['Month', 'DayOfWeek', 'UniqueCarrier', 'Origin', 'Dest', 'Distance', 'HourOfDay', 'Delayed']]

airplane_data.head()

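| | Month | DayOfWeek | UniqueCarrier | Origin | Dest | Distance | HourOfDay | Delayed |
|---|---|---|---|---|---|---|---|---|
| 3451752 | 6 | 7 | CO | EWR | ORD | 719 | 8 | 0 |
| 5577044 | 10 | 7 | XE | SAV | EWR | 708 | 17 | 0 |
| 731620 | 2 | 6 | NW | DTW | BWI | 408 | 13 | 0 |
| 6053324 | 11 | 5 | WN | LAS | ONT | 197 | 17 | 1 |
| 746335 | 2 | 1 | NW | DTW | ABE | 424 | 13 | 0 |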
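
The `HourOfDay` column above comes from the scheduled departure time, which the `np.select` call treats as an `hhmm` value with leading zeros dropped (so `830` means 08:30 and `45` means 00:45). As a minimal sketch of that rule, assuming the times are plain non-negative integers, the helper below (a hypothetical name, `hour_from_hhmm`, not part of this notebook) gives the same hour for each of the three length cases:

```python
def hour_from_hhmm(hhmm: int) -> int:
    """Return the hour for a time stored as an hhmm integer (e.g. 1730 -> 17)."""
    if not 0 <= hhmm <= 2400:
        raise ValueError(f"not a valid hhmm time: {hhmm}")
    # Dropping the last two digits (the minutes) leaves the hour.
    return hhmm // 100


# Four-digit, three-digit, and shorter values, mirroring the three cases.
examples = [1730, 830, 45, 5]
hours = [hour_from_hhmm(t) for t in examples]
print(hours)  # [17, 8, 0, 0]
```

Integer division covers all three cases in one expression, which may be easier to check than slicing strings of different lengths.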


In [4]:
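def df_to_dataset(dataframe, shuffle=True, batch_size=32):
    """Convert a DataFrame into a batched tf.data.Dataset of (features, label) pairs."""
    df = dataframe.copy()
    labels = df.pop('Delayed')
    # Build the feature dict from df (label already removed), one column vector per feature.
    df = {key: value.to_numpy()[:, tf.newaxis] for key, value in df.items()}
    ds = tf.data.Dataset.from_tensor_slices((dict(df), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    # Batch, prefetch, and return whether or not shuffling was requested.
    ds = ds.batch(batch_size)
    ds = ds.prefetch(batch_size)
    return ds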
    
test_ds = df_to_dataset(airplane_data, batch_size=256)

In [5]:
m1 = tf.keras.models.load_model('flight_delay_classifier_first')
m2 = tf.keras.models.load_model('flight_delay_classifier_second')
m3 = tf.keras.models.load_model('flight_delay_classifier_third')

In [6]:
loss1, accuracy1, auc1 = m1.evaluate(test_ds)
loss2, accuracy2, auc2 = m2.evaluate(test_ds)
loss3, accuracy3, auc3 = m3.evaluate(test_ds)

print(" ")
print("The loss of model 1 is: ", loss1)
print("The loss of model 2 is: ", loss2)
print("The loss of model 3 is: ", loss3)
print(" ")
print("The accuracy of model 1 is: ", accuracy1)
print("The accuracy of model 2 is: ", accuracy2)
print("The accuracy of model 3 is: ", accuracy3)
print("")
print("The AUC of model 1 is: ", auc1)
print("The AUC of model 2 is: ", auc2)
print("The AUC of model 3 is: ", auc3)

 
The loss of model 1 is:  0.4869658648967743
The loss of model 2 is:  0.48777273297309875
The loss of model 3 is:  0.4813674986362457
 
The accuracy of model 1 is:  0.8079500198364258
The accuracy of model 2 is:  0.8079800009727478
The accuracy of model 3 is:  0.8079699873924255

The AUC of model 1 is:  0.5869275331497192
The AUC of model 2 is:  0.5602584481239319
The AUC of model 3 is:  0.589623212814331
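
All three models report almost the same accuracy (about 0.808) while their AUC values stay below 0.6. One check worth making, offered here as a suggestion rather than a finding of this notebook, is how that accuracy compares with a classifier that always predicts the majority class. The sketch below uses made-up labels and scores (not the flight data) and two hypothetical helpers, `majority_baseline_accuracy` and `roc_auc`, written in plain Python for illustration:

```python
from itertools import product


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # A correctly ordered pair counts 1, a tie counts 0.5, otherwise 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Toy data: 2 delayed flights out of 10, with arbitrary model scores.
toy_labels = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.10, 0.30, 0.20, 0.25, 0.35, 0.15, 0.40, 0.22, 0.05, 0.18]

baseline = majority_baseline_accuracy(toy_labels)
auc = roc_auc(toy_labels, toy_scores)
print("majority-class accuracy:", baseline)
print("AUC:", auc)
```

If the share of non-delayed flights in `airplane_data` turns out to be close to the reported accuracy, the models may be adding little beyond predicting "not delayed" for most flights. The notebook would need to compute `airplane_data['Delayed'].mean()` to confirm or rule this out.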


In [7]:
sample = {
    'Month': 1,
    'DayOfWeek': 1,
    'UniqueCarrier': 'NW',
    'Origin': 'HNL',
    'Dest': 'SEA',
    'Distance': 2677,
    'HourOfDay': 14,    
}

# Convert the sample once; the same input dict is reused for every model.
input_dict = {name: tf.convert_to_tensor([value]) for name, value in sample.items()}

for number, model in enumerate((m1, m2, m3), start=1):
    predictions = model.predict(input_dict)
    # The model output is treated as a logit, so a sigmoid turns it into a probability.
    prob = tf.nn.sigmoid(predictions[0])
    print(
        "Using Model %d to predict, this particular flight had a %.1f percent probability "
        "of getting delayed." % (number, 100 * prob)
    )

Using Model 1 to predict, this particular flight had a 57.9 percent probability of getting delayed.
Using Model 2 to predict, this particular flight had a 56.0 percent probability of getting delayed.
Using Model 3 to predict, this particular flight had a 56.4 percent probability of getting delayed.
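
Each prediction above passes the raw model output through `tf.nn.sigmoid`, which means the notebook treats that output as a logit rather than a probability. A minimal sketch of the same mapping in plain Python, using illustrative logits rather than the models' actual outputs:

```python
import math


def sigmoid(logit: float) -> float:
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


# A logit of 0 corresponds to 50 percent; positive logits lean towards "delayed".
for logit in (-2.0, 0.0, 0.3, 2.0):
    print(f"logit {logit:+.1f} -> {100 * sigmoid(logit):.1f} percent")
```

Because the three probabilities reported above are all a little over 50 percent, the underlying logits are small positive numbers.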


In [8]:
# Ensembling pools the predictions of several different models, which often gives better predictions than any single model.

input_dict = {name: tf.convert_to_tensor([value]) for name, value in sample.items()}

preds_a = m1.predict(input_dict)
preds_b = m2.predict(input_dict)
preds_c = m3.predict(input_dict)
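# Equal-weight combination of the raw model outputs (logits).
# Note that 3 * 0.33 = 0.99, so this is very slightly smaller than the true mean.
final_preds = 0.33 * preds_a + 0.33 * preds_b + 0.33 * preds_c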

prob = tf.nn.sigmoid(final_preds)

print(
    "This particular flight had a %.1f percent probability "
    "of getting delayed." % (100 * prob)
)


This particular flight had a 56.7 percent probability of getting delayed.
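
The cell above combines the three models with a weighted sum of their raw outputs and applies the sigmoid once at the end. Below is a minimal sketch of the same idea in plain Python, assuming the outputs are logits. The helper `ensemble_probability` is a hypothetical name; it normalises the weights so they always sum to 1 (the `0.33` weights above sum to 0.99). The logits are illustrative values, not the models' real outputs:

```python
import math


def sigmoid(logit: float) -> float:
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def ensemble_probability(logits, weights=None):
    """Weighted average of logits followed by a single sigmoid."""
    if weights is None:
        weights = [1.0] * len(logits)  # default: plain mean
    if len(weights) != len(logits):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Dividing by the total makes the weights sum to 1.
    combined = sum(w * z for w, z in zip(weights, logits)) / total
    return sigmoid(combined)


illustrative_logits = [0.32, 0.24, 0.26]

equal = ensemble_probability(illustrative_logits)
weighted = ensemble_probability(illustrative_logits, weights=[2.0, 1.0, 1.0])

print(f"equal weights:       {100 * equal:.1f} percent")
print(f"first model doubled: {100 * weighted:.1f} percent")
```

Unequal weights are one way to favour a model that scored better on held-out data. Any such weights should be chosen on a validation set rather than on the data used for the final evaluation.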
