

# **Predicting Airline Customer Satisfaction**
## Phase 2: Predictive modelling


<center> Names & IDs of group members </center> 

Names  | IDs
------------- | -------------
Matthew Bentham  | S3923076
John Murrowood  | S3923075
Isxaq Warsame  |  S3658179



__________

### Table of contents:
- [Introduction](#intro)  
  - [Phase 1 Summary](#sum)  
  - [Report Overview](#Rover)
  - [Overview of Methodology](#Mover)   
- [Predictive Modeling](#PM)
  - [Feature Selection](#fs)
  - [Model fitting & tuning](#MF)
  - [Neural Network model fitting and tuning](#NMF)
  - [Model comparisons](#MC)
- [Summary & Conclusions](#sumconc)
  - [Project Summary](#PS)
  - [Summary of findings](#SF)
  - [Conclusions](#Conc)





### INTRODUCTION <a name="intro"></a>

#### **Phase 1 Summary:** <a name="sum"></a>



#### **Report Overview:** <a name="Rover"></a>



#### **Overview of Methodology:**<a name="Mover"></a>





### Initial Housekeeping <a name="init"></a>


### Predictive Modeling <a name="PM"></a>
#### **Feature Selection:** <a name="fs"></a>


#### **Model Fitting & Tuning:** <a name="MF"></a>

#### **NN Model Fitting & Tuning:** <a name="NMF"></a>

#### **Model Comparison:** <a name="MC"></a>

### Summary & Conclusions <a name="sumconc"></a>
#### **Project Summary:** <a name="PS"></a>
#### **Summary of findings:** <a name="SF"></a>
#### **Conclusions:** <a name="Conc"></a>

In [1]:
# Reading in required packages, and setting up warnings filter
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import sklearn
import seaborn as sns
import matplotlib.pyplot as plt
from tabulate import tabulate



airplane_df = pd.read_csv('satisfaction_cleaned_5000.csv')
airplane_df.head(10)

Unnamed: 0,Gender,Customer Type,Age,Type of Travel,Class,Flight Distance,Seat comfort,Departure/Arrival time convenient,Food and drink,Gate location,...,Ease of Online booking,On-board service,Leg room service,Baggage handling,Checkin service,Cleanliness,Online boarding,Departure Delay in Minutes,Arrival Delay in Minutes,Satisfaction
0,Female,Loyal Customer,45,Business travel,Business,3184,4,4,5,4,...,4,4,3,4,5,4,5,0.0,0.0,satisfied
1,Male,Loyal Customer,46,Business travel,Eco,2195,3,4,4,4,...,3,4,2,4,4,4,3,0.0,21.0,neutral or dissatisfied
2,Male,Loyal Customer,43,Business travel,Business,2616,5,5,5,5,...,4,4,5,4,5,4,5,0.0,0.0,satisfied
3,Female,disloyal Customer,40,Business travel,Business,1904,2,2,2,1,...,5,5,4,5,3,4,5,15.0,7.0,neutral or dissatisfied
4,Female,Loyal Customer,53,Business travel,Business,318,2,1,1,1,...,2,2,2,2,1,2,2,2.0,0.0,neutral or dissatisfied
5,Female,Loyal Customer,11,Personal Travel,Eco,3539,2,2,2,2,...,4,4,4,4,3,4,4,0.0,0.0,satisfied
6,Female,Loyal Customer,36,Business travel,Business,1923,1,1,1,1,...,4,4,4,4,2,4,2,0.0,0.0,satisfied
7,Male,Loyal Customer,13,Personal Travel,Eco,2726,5,5,5,2,...,3,5,5,3,3,2,3,3.0,0.0,satisfied
8,Male,disloyal Customer,27,Business travel,Business,2398,4,4,4,3,...,5,3,3,5,4,5,5,54.0,80.0,satisfied
9,Female,disloyal Customer,26,Business travel,Eco,2217,3,3,3,4,...,4,2,3,4,4,3,4,114.0,99.0,neutral or dissatisfied


# **One-Hot-Encoding & Integer-Encoding**
- The target feature for this dataset takes one of two values, satisfied or neutral/dissatisfied, so it must be integer-encoded. Nominal descriptive features, by contrast, would normally never be integer-encoded.
- Scikit-learn would normally be used for this, but because the target is a binary variable (satisfied or neutral/dissatisfied), pandas is sufficient.
- Visual inspection confirmed that the satisfied level was correctly encoded as 1 rather than 0.
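
The mapping check described above can also be done programmatically. The following is a minimal sketch on a small hypothetical frame (`toy_df` and its values are assumptions for illustration, not the project data). It relies on the fact that `pd.get_dummies` with `drop_first=True` drops the first level in sorted order, so the remaining indicator column corresponds to `satisfied`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real labels come from the Satisfaction column
toy_df = pd.DataFrame({
    "Satisfaction": ["satisfied", "neutral or dissatisfied", "satisfied"]
})

# drop_first=True drops the first level in sorted order
# ("neutral or dissatisfied"), leaving one 0/1 column named "satisfied"
encoded = pd.get_dummies(toy_df["Satisfaction"], drop_first=True, dtype=np.int64)

# Cross-tabulate original labels against encoded values to confirm the mapping
check = pd.crosstab(toy_df["Satisfaction"], encoded["satisfied"])
print(check)
```

Every `satisfied` row should fall in the column labelled 1, and every `neutral or dissatisfied` row in the column labelled 0.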

In [2]:
# Creating a list of the categorical (object-dtype) columns to be used with get_dummies()
categorical_cols = airplane_df.columns[airplane_df.dtypes==object].tolist()
categorical_cols  # Displaying the columns that will be encoded


['Gender', 'Customer Type', 'Type of Travel', 'Class', 'Satisfaction']

In [3]:
airplane_df.head()

Unnamed: 0,Gender,Customer Type,Age,Type of Travel,Class,Flight Distance,Seat comfort,Departure/Arrival time convenient,Food and drink,Gate location,...,Ease of Online booking,On-board service,Leg room service,Baggage handling,Checkin service,Cleanliness,Online boarding,Departure Delay in Minutes,Arrival Delay in Minutes,Satisfaction
0,Female,Loyal Customer,45,Business travel,Business,3184,4,4,5,4,...,4,4,3,4,5,4,5,0.0,0.0,satisfied
1,Male,Loyal Customer,46,Business travel,Eco,2195,3,4,4,4,...,3,4,2,4,4,4,3,0.0,21.0,neutral or dissatisfied
2,Male,Loyal Customer,43,Business travel,Business,2616,5,5,5,5,...,4,4,5,4,5,4,5,0.0,0.0,satisfied
3,Female,disloyal Customer,40,Business travel,Business,1904,2,2,2,1,...,5,5,4,5,3,4,5,15.0,7.0,neutral or dissatisfied
4,Female,Loyal Customer,53,Business travel,Business,318,2,1,1,1,...,2,2,2,2,1,2,2,2.0,0.0,neutral or dissatisfied


In [4]:
for col in categorical_cols:
    # Binary columns (exactly two levels) become a single 0/1 indicator column
    if airplane_df[col].nunique() == 2:
        airplane_df[col] = pd.get_dummies(airplane_df[col], drop_first=True, dtype=np.int64)

# Any remaining categorical columns (more than two levels) are one-hot encoded here
airplane_df = pd.get_dummies(airplane_df, dtype=np.int64)
airplane_df.head()  # Checking dataframe post-encoding

Unnamed: 0,Gender,Customer Type,Age,Type of Travel,Flight Distance,Seat comfort,Departure/Arrival time convenient,Food and drink,Gate location,Inflight wifi service,...,Baggage handling,Checkin service,Cleanliness,Online boarding,Departure Delay in Minutes,Arrival Delay in Minutes,Satisfaction,Class_Business,Class_Eco,Class_Eco Plus
0,0,0,45,0,3184,4,4,5,4,2,...,4,5,4,5,0.0,0.0,1,1,0,0
1,1,0,46,0,2195,3,4,4,4,3,...,4,4,4,3,0.0,21.0,0,0,1,0
2,1,0,43,0,2616,5,5,5,5,2,...,4,5,4,5,0.0,0.0,1,1,0,0
3,0,1,40,0,1904,2,2,2,1,5,...,5,3,4,5,15.0,7.0,0,1,0,0
4,0,0,53,0,318,2,1,1,1,5,...,2,1,2,2,2.0,0.0,0,1,0,0
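
The `Class_Business`, `Class_Eco` and `Class_Eco Plus` columns in the output above come from one-hot encoding the three-level `Class` feature. As a small illustration, assuming a hypothetical two-column frame (`toy_df` is not part of the project data), `pd.get_dummies` leaves numeric columns untouched and expands each object column into one indicator column per level:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one numeric and one three-level categorical column
toy_df = pd.DataFrame({
    "Age": [45, 46, 27],
    "Class": ["Business", "Eco", "Eco Plus"],
})

# Object columns are expanded into <column>_<level> indicator columns;
# without drop_first, all three levels are kept
toy_encoded = pd.get_dummies(toy_df, dtype=np.int64)
print(toy_encoded.columns.tolist())

# Each row has exactly one of the Class_* indicators set to 1
class_cols = toy_encoded.filter(like="Class_")
print(class_cols.sum(axis=1).tolist())
```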


- Checking to see if the data types are all numeric after encoding

In [5]:
airplane_df.dtypes

Gender                                 int64
Customer Type                          int64
Age                                    int64
Type of Travel                         int64
Flight Distance                        int64
Seat comfort                           int64
Departure/Arrival time convenient      int64
Food and drink                         int64
Gate location                          int64
Inflight wifi service                  int64
Inflight entertainment                 int64
Online support                         int64
Ease of Online booking                 int64
On-board service                       int64
Leg room service                       int64
Baggage handling                       int64
Checkin service                        int64
Cleanliness                            int64
Online boarding                        int64
Departure Delay in Minutes           float64
Arrival Delay in Minutes             float64
Satisfaction                           int64
Class_Busi

## Scaling of Features
Once one-hot encoding is complete, all features are scaled to the range [0, 1] using min-max scaling.

In [6]:
from sklearn import preprocessing

airplane_df_scaled = airplane_df.copy()  # Copying the dataframe
scaler = preprocessing.MinMaxScaler()    # Setting up the scaling function
airplane_arr = scaler.fit_transform(airplane_df_scaled)  # Fitting and transforming the dataframe

# Converting back to a dataframe, as scikit-learn returns a NumPy array
airplane_df_scaled = pd.DataFrame(airplane_arr, columns=airplane_df.columns)
airplane_df_scaled.head()

Unnamed: 0,Gender,Customer Type,Age,Type of Travel,Flight Distance,Seat comfort,Departure/Arrival time convenient,Food and drink,Gate location,Inflight wifi service,...,Baggage handling,Checkin service,Cleanliness,Online boarding,Departure Delay in Minutes,Arrival Delay in Minutes,Satisfaction,Class_Business,Class_Eco,Class_Eco Plus
0,0.0,0.0,0.487179,0.0,0.454255,0.8,0.8,1.0,0.75,0.4,...,0.75,1.0,0.75,1.0,0.0,0.0,1.0,1.0,0.0,0.0
1,1.0,0.0,0.5,0.0,0.31086,0.6,0.8,0.8,0.75,0.6,...,0.75,0.75,0.75,0.6,0.0,0.162791,0.0,0.0,1.0,0.0
2,1.0,0.0,0.461538,0.0,0.371901,1.0,1.0,1.0,1.0,0.4,...,0.75,1.0,0.75,1.0,0.0,0.0,1.0,1.0,0.0,0.0
3,0.0,1.0,0.423077,0.0,0.268668,0.4,0.4,0.4,0.0,1.0,...,1.0,0.5,0.75,1.0,0.117188,0.054264,0.0,1.0,0.0,0.0
4,0.0,0.0,0.589744,0.0,0.038712,0.4,0.2,0.2,0.0,1.0,...,0.25,0.0,0.25,0.4,0.015625,0.0,0.0,1.0,0.0,0.0


### Summary & conclusion: <a name="sum"></a>

### References: <a name="ref"></a>

In [69]:
"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2656082/" # Discretization    

'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2656082/'