

# **Predicting Airline Customer Satisfaction**
## Phase 1: Data Preparation & Visualisation


<center> Names & IDs of group members </center> 

Names  | IDs
------------- | -------------
Matthew Bentham  | S3923076
John Murrowood  | S3923075
Isxaq Warsame  |  S3658179



__________

### Table of contents:
- [Introduction](#intro)
   - [Data source](#ds)
   - [Dataset details](#dd)
   - [Dataset features](#df)
   - [Target Feature](#tf)
- [Goals & Objectives](#gao) 
- [Data Cleaning & Preprocessing](#dprep)
- [Data Exploration & Visualisation](#dvis)
- [Literature Review](#lr)
- [Summary & Conclusions](#sum)
- [References](#ref)


### INTRODUCTION <a name="intro"></a>

#### **Data source:** <a name="ds"></a>

The US airline passenger satisfaction survey dataset was sourced from Kaggle, where it was uploaded by John D (2018). It contains survey results recording whether each customer was satisfied with their flight, together with passenger and flight information. It also records which aspects of the flight service each customer was or was not satisfied with.

URL: [US Airline Passenger Satisfaction](https://www.kaggle.com/datasets/johndddddd/customer-satisfaction)

#### **Dataset details:** <a name="dd"></a>

This dataset records whether customers were satisfied with their domestic flight within the USA. It includes personal details of each traveller, such as age, gender and type of travel (personal or business), as well as information about the flight, including flight distance and departure and arrival delays. There is also a rating of how satisfied customers were with certain aspects of the flight, such as inflight wifi, cleanliness and leg room. These features will be used in a classification problem to predict the target feature: whether or not a customer will be satisfied.

The dataset has 24 features, split into descriptive features and survey response features and including the target feature, and 129,880 observations before any preprocessing is performed.

##### **Dataset Retrieval**
- The data was downloaded from Kaggle as an xlsx file. Link: [US Airline Passenger Satisfaction](https://www.kaggle.com/datasets/johndddddd/customer-satisfaction) 
- As the data file is in the same GitHub directory as this report, 'satisfaction.xlsx' can be read directly.
- The first 10 rows are displayed.

In [54]:
# Reading in required packages, and setting up warnings filter
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import sklearn
from tabulate import tabulate



df_name  = 'satisfaction.xlsx'
df = pd.read_excel(df_name)
df.head(10)

Unnamed: 0,id,satisfaction_v2,Gender,Customer Type,Age,Type of Travel,Class,Flight Distance,Seat comfort,Departure/Arrival time convenient,...,Online support,Ease of Online booking,On-board service,Leg room service,Baggage handling,Checkin service,Cleanliness,Online boarding,Departure Delay in Minutes,Arrival Delay in Minutes
0,11112,satisfied,Female,Loyal Customer,65,Personal Travel,Eco,265,0,0,...,2,3,3,0,3,5,3,2,0,0.0
1,110278,satisfied,Male,Loyal Customer,47,Personal Travel,Business,2464,0,0,...,2,3,4,4,4,2,3,2,310,305.0
2,103199,satisfied,Female,Loyal Customer,15,Personal Travel,Eco,2138,0,0,...,2,2,3,3,4,4,4,2,0,0.0
3,47462,satisfied,Female,Loyal Customer,60,Personal Travel,Eco,623,0,0,...,3,1,1,0,1,4,1,3,0,0.0
4,120011,satisfied,Female,Loyal Customer,70,Personal Travel,Eco,354,0,0,...,4,2,2,0,2,4,2,5,0,0.0
5,100744,satisfied,Male,Loyal Customer,30,Personal Travel,Eco,1894,0,0,...,2,2,5,4,5,5,4,2,0,0.0
6,32838,satisfied,Female,Loyal Customer,66,Personal Travel,Eco,227,0,0,...,5,5,5,0,5,5,5,3,17,15.0
7,32864,satisfied,Male,Loyal Customer,10,Personal Travel,Eco,1812,0,0,...,2,2,3,3,4,5,4,2,0,0.0
8,53786,satisfied,Female,Loyal Customer,56,Personal Travel,Business,73,0,0,...,5,4,4,0,1,5,4,4,0,0.0
9,7243,satisfied,Male,Loyal Customer,22,Personal Travel,Eco,1556,0,0,...,2,2,2,4,5,3,4,2,30,26.0




#### **Dataset features:** <a name="df"></a>



The descriptive features: 
- **Age**:  The actual age of the passenger
- **ID**:   Unique passenger ID number
- **Gender**:   Gender of the passenger (Female, Male)
- **Type of Travel**:   Purpose of the passenger's flight (Personal Travel, Business Travel)
- **Class**: Travel class of the passenger (Business, Eco, Eco Plus)
- **Customer Type**: The customer type (Loyal customer, disloyal customer)
- **Flight distance**: The flight distance of the journey
- **Flight cancelled**: Whether the flight was cancelled or not (Yes, No)
- **Departure Delay in Minutes (m)**: Minutes delayed at departure
- **Arrival Delay in Minutes (m)**: Minutes delayed at arrival

**Survey response features:** Satisfaction level, integer, on a scale of [0, 5]
- Ordinal in nature; the meaning of each number is listed first, followed by the response features themselves.
- **0** being **Not Applicable**,
- **1** being **Very Dissatisfied**,
- **2** being **Dissatisfied**,
- **3** being **Neutral**,
- **4** being **Satisfied**,
- **5** being **Very Satisfied**
- **The Survey Response Features are listed below:**
  -  Inflight wifi service (0:Not Applicable;1-5)
  -  Inflight service
  -  Online boarding
  -  Ease of Online booking
  -  Inflight entertainment
  -  Food and drink
  -  Seat comfort
  -  On-board service
  -  Leg room service
  -  Departure/Arrival time
  -  Baggage handling
  -  Gate location
  -  Cleanliness
  -  Check-in service
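
The rating scale above can be attached to a survey column as an ordered categorical. The following is only a minimal sketch: the `ratings` values are made up for illustration, and treating 0 (Not Applicable) as the lowest level of the ordering is an assumption rather than something the dataset documentation states.

```python
import pandas as pd

# Hypothetical mini-sample of one survey column; values are made up for illustration.
ratings = pd.Series([0, 3, 5, 1, 4], name="Seat comfort")

# Scale labels as described in the list above.
scale_labels = {
    0: "Not Applicable",
    1: "Very Dissatisfied",
    2: "Dissatisfied",
    3: "Neutral",
    4: "Satisfied",
    5: "Very Satisfied",
}

# Map to an ordered categorical so the ordering of the levels is preserved.
# Assumption: "Not Applicable" is placed at the bottom of the ordering.
labelled = pd.Categorical(
    ratings.map(scale_labels),
    categories=list(scale_labels.values()),
    ordered=True,
)
print(labelled)
```

Keeping the columns as integers, as this report does, is equally valid; the labelled form is mainly useful for plots and tables.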


In [55]:
# Creating table of features
table = [['Name','Data Type','Units','Description'],
         ['Age', 'Numeric', 'Integer','The actual age of the passengers'],
         ['ID', 'Nominal', 'Integer', 'Passenger identifier, unique ID number'],
         ['Gender', 'Binary', 'Male or Female', 'Gender of the passengers (Female, Male)'],
         ['Type of Travel', 'Nominal Categorical', 'Personal or Business', 'Purpose of the flight of the passengers (Personal Travel, Business Travel)'],
         ['Class', 'Nominal Categorical', 'Business, Eco or Eco Plus', 'Travel class in the plane of the passengers (Business, Eco, Eco Plus)' ],
         ['Customer Type', 'Nominal Categorical', 'Loyal Customer or Disloyal Customer', 'The type of customer and how loyal they are to the airline'],
         ['Flight Distance','Numeric', 'Integer, Km', 'The flight distance of the journey in Kilometers (km)' ],
         ['Flight Cancelled', 'Nominal Categorical', 'Boolean, Yes or No', ' Whether the Flight was cancelled or not (yes or no)'  ],
         ['Departure Delay', 'Numeric', 'Integer, Minutes', 'Minutes delayed at departure, e.g, Delay in minutes before aircraft takes-off ' ],
         ['Arrival Delay', 'Numeric', 'Integer, Minutes', 'Minutes Delayed at arrival, e.g, Delay in minutes before aircraft lands']
         ]

print(tabulate(table, headers='firstrow', tablefmt='fancy_grid'))

╒══════════════════╤═════════════════════╤═════════════════════════════════════╤═══════════════════════════════════════════════════════════════════════════════╕
│ Name             │ Data Type           │ Units                               │ Description                                                                   │
╞══════════════════╪═════════════════════╪═════════════════════════════════════╪═══════════════════════════════════════════════════════════════════════════════╡
│ Age              │ Numeric             │ Integer                             │ The actual age of the passengers                                              │
├──────────────────┼─────────────────────┼─────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ ID               │ Nominal             │ Integer                             │ Passenger identifier, unique ID number                                        │
├──────────────────┼──────────────



#### **Target Feature:** <a name="tf"></a>

- 'satisfied': The airline customer was satisfied with their overall experience.
- 'neutral or dissatisfied': The airline customer was NOT satisfied with their overall experience.


The target feature for this project is the satisfaction level of the airline customers. The supervised learning task therefore has a binary target that indicates whether or not an airline customer is satisfied (satisfied, neutral/dissatisfied), predicted from the descriptive features.
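
For a binary target it is usually worth checking how the two classes are balanced before modelling. The sketch below uses a small made-up `target` series, not the real data, purely to show the pandas call.

```python
import pandas as pd

# Made-up target values for illustration only; not the real data.
target = pd.Series(
    ["satisfied", "neutral or dissatisfied", "satisfied", "satisfied"],
    name="Satisfaction",
)

# Proportion of each class; a strongly skewed split would affect model evaluation.
class_share = target.value_counts(normalize=True)
print(class_share)
```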




### Goals & Objectives: <a name="gao"></a>

The airline industry is extremely competitive, with many running costs and often very fine profit margins. Airline surveys are therefore important: they can be used to build models that predict which features of an airline's service matter most. Once the significant features are determined, they can be used to ascertain what is important to passengers. This allows an airline to prioritise spending on specific parts of its business, to offer a service tailored to the largest number of people, and to identify which parts of its service offering can be cut back with minimal impact on customer satisfaction.

*The main objectives of this project are as follows:*
1. Identify the features that act as the best predictors of customer satisfaction in airlines. 
2. Subsequently, identify which features can be cut back to increase profit margins.
3. Predict whether a customer is satisfied with their airline experience based on basic customer features and survey response answers.




### Data Cleaning & Preprocessing: <a name="dprep"></a>


##### Data Cleaning Steps: <a name = "prepsteps"></a>
- Check datatypes and data quality issues 
- Check for outliers and missing values 
- Remove redundant features
- Rename columns
- Randomly sample data 

In [56]:
# CHECK COLUMN NAMES
df.columns


Index(['id', 'satisfaction_v2', 'Gender', 'Customer Type', 'Age',
       'Type of Travel', 'Class', 'Flight Distance', 'Seat comfort',
       'Departure/Arrival time convenient', 'Food and drink', 'Gate location',
       'Inflight wifi service', 'Inflight entertainment', 'Online support',
       'Ease of Online booking', 'On-board service', 'Leg room service',
       'Baggage handling', 'Checkin service', 'Cleanliness', 'Online boarding',
       'Departure Delay in Minutes', 'Arrival Delay in Minutes'],
      dtype='object')

The column containing the target feature (satisfaction) is named irregularly (`satisfaction_v2`). Additionally, the id column is a redundant feature: it is used purely to uniquely identify passengers and therefore has no use in a machine learning dataset.

In [57]:
# Rename the target column
df = df.rename(columns={'satisfaction_v2': 'Satisfaction'})
# Reorder columns so the target feature is at the end
new_cols = ['id', 'Gender', 'Customer Type', 'Age',
            'Type of Travel', 'Class', 'Flight Distance', 'Seat comfort',
            'Departure/Arrival time convenient', 'Food and drink', 'Gate location',
            'Inflight wifi service', 'Inflight entertainment', 'Online support',
            'Ease of Online booking', 'On-board service', 'Leg room service',
            'Baggage handling', 'Checkin service', 'Cleanliness', 'Online boarding',
            'Departure Delay in Minutes', 'Arrival Delay in Minutes', 'Satisfaction']
df = df.reindex(columns=new_cols)

# Remove the redundant id column
airplane_df = df.drop(columns=["id"])

Check data types:

In [58]:
print(airplane_df.dtypes)

Gender                                object
Customer Type                         object
Age                                    int64
Type of Travel                        object
Class                                 object
Flight Distance                        int64
Seat comfort                           int64
Departure/Arrival time convenient      int64
Food and drink                         int64
Gate location                          int64
Inflight wifi service                  int64
Inflight entertainment                 int64
Online support                         int64
Ease of Online booking                 int64
On-board service                       int64
Leg room service                       int64
Baggage handling                       int64
Checkin service                        int64
Cleanliness                            int64
Online boarding                        int64
Departure Delay in Minutes             int64
Arrival Delay in Minutes             float64
Satisfacti

As the departure delay in minutes is a continuous variable, the corresponding datatype should be float. Otherwise all remaining features have the correct corresponding datatype.

In [59]:
# Changing data type of 'Departure Delay in Minutes' from int to float
airplane_df['Departure Delay in Minutes'] = airplane_df['Departure Delay in Minutes'].astype(float)
print(airplane_df.dtypes)

Gender                                object
Customer Type                         object
Age                                    int64
Type of Travel                        object
Class                                 object
Flight Distance                        int64
Seat comfort                           int64
Departure/Arrival time convenient      int64
Food and drink                         int64
Gate location                          int64
Inflight wifi service                  int64
Inflight entertainment                 int64
Online support                         int64
Ease of Online booking                 int64
On-board service                       int64
Leg room service                       int64
Baggage handling                       int64
Checkin service                        int64
Cleanliness                            int64
Online boarding                        int64
Departure Delay in Minutes           float64
Arrival Delay in Minutes             float64
Satisfacti

To avoid future discrepancies, the unique values of each categorical feature are displayed to check for whitespace, case errors, etc.

In [60]:

Objectdata = airplane_df.columns[airplane_df.dtypes==object].tolist()
print('Variable','|'.center(15), 'Unique values')
print('----------------------------------------')
for x in Objectdata:
    print(x,'|'.rjust(16-len(x)),airplane_df[x].unique())
    
    

Variable        |        Unique values
----------------------------------------
Gender          | ['Female' 'Male']
Customer Type   | ['Loyal Customer' 'disloyal Customer']
Type of Travel  | ['Personal Travel' 'Business travel']
Class           | ['Eco' 'Business' 'Eco Plus']
Satisfaction    | ['satisfied' 'neutral or dissatisfied']


As seen above no discrepancies were found in the categorical data features.
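
Had whitespace or case discrepancies been found, they could have been normalised with pandas string methods. The sketch below uses a made-up `demo` frame with deliberately messy values; title-casing every level is an illustrative choice, not something applied to the real data in this report.

```python
import pandas as pd

# Made-up values with deliberate whitespace and case problems.
demo = pd.DataFrame(
    {"Type of Travel": [" Business travel", "business Travel ", "Personal Travel"]}
)

# Strip surrounding whitespace and standardise case before comparing levels.
demo["Type of Travel"] = demo["Type of Travel"].str.strip().str.title()
print(demo["Type of Travel"].unique())
```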

Missing values are handled next. As the 393 rows containing missing values account for only 0.3% of the data, they can be removed without any substantial impact on the overall dataset.

In [61]:
# Drop rows with missing values, then count missing values per column to verify removal
print("\nNumber of missing values for each column/ feature:")
airplane_df = airplane_df.dropna()  # Dropping missing values
print(airplane_df.isnull().sum())   # Verifying they have been removed



Number of missing values for each column/ feature:
Gender                               0
Customer Type                        0
Age                                  0
Type of Travel                       0
Class                                0
Flight Distance                      0
Seat comfort                         0
Departure/Arrival time convenient    0
Food and drink                       0
Gate location                        0
Inflight wifi service                0
Inflight entertainment               0
Online support                       0
Ease of Online booking               0
On-board service                     0
Leg room service                     0
Baggage handling                     0
Checkin service                      0
Cleanliness                          0
Online boarding                      0
Departure Delay in Minutes           0
Arrival Delay in Minutes             0
Satisfaction                         0
dtype: int64
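
The cell above reports the counts only after the rows have been dropped, so every column shows zero. To document what was actually removed (here 393 of 129,880 rows, roughly 0.3%), the counts can be taken before calling `dropna()`. A minimal sketch on a made-up `demo` frame, which is an assumption and not the real data:

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing arrival delay; not the real data.
demo = pd.DataFrame({
    "Departure Delay in Minutes": [0, 12, 30, 5],
    "Arrival Delay in Minutes": [0.0, np.nan, 26.0, 3.0],
})

# Count missing values BEFORE dropping, so the report shows what was removed.
missing_per_column = demo.isnull().sum()
rows_with_missing = int(demo.isnull().any(axis=1).sum())
share_missing = rows_with_missing / len(demo)

# Drop the affected rows only after the counts have been recorded.
cleaned = demo.dropna()
print(missing_per_column)
print(f"{rows_with_missing} rows ({share_missing:.1%}) contain missing values")
```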


To check for outliers in the numeric data, the summary statistics of each numeric feature are displayed. No outliers are detected.

In [62]:

airplane_df.describe(include=['int64','float64']).T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Age,129487.0,39.428761,15.117597,7.0,27.0,40.0,51.0,85.0
Flight Distance,129487.0,1981.008974,1026.884131,50.0,1359.0,1924.0,2543.0,6951.0
Seat comfort,129487.0,2.838586,1.392873,0.0,2.0,3.0,4.0,5.0
Departure/Arrival time convenient,129487.0,2.990277,1.527183,0.0,2.0,3.0,4.0,5.0
Food and drink,129487.0,2.852024,1.443587,0.0,2.0,3.0,4.0,5.0
Gate location,129487.0,2.990377,1.305917,0.0,2.0,3.0,4.0,5.0
Inflight wifi service,129487.0,3.24916,1.318765,0.0,2.0,3.0,4.0,5.0
Inflight entertainment,129487.0,3.383745,1.345959,0.0,2.0,4.0,4.0,5.0
Online support,129487.0,3.519967,1.306326,0.0,3.0,4.0,5.0,5.0
Ease of Online booking,129487.0,3.472171,1.305573,0.0,2.0,4.0,5.0,5.0
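
Summary statistics give a first impression, but a rule-based check can make the outlier decision more explicit. The sketch below applies the common 1.5 × IQR rule to a made-up `delays` series; the rule and the values are illustrative assumptions and were not part of the analysis above.

```python
import pandas as pd

# Made-up delay values; 310 is planted as an extreme value.
delays = pd.Series(
    [0, 0, 2, 4, 11, 17, 30, 36, 310], name="Departure Delay in Minutes"
)

# Interquartile range and the conventional 1.5 * IQR fences.
q1, q3 = delays.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the fences are flagged for inspection, not removed automatically.
flagged = delays[(delays < lower) | (delays > upper)]
print(flagged)
```

Whether a flagged value is a genuine error or simply a long delay is a judgement call that depends on the feature.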


As the data contains more than 5000 rows, 5000 rows are randomly sampled to simplify the dataset and make it less computationally intensive.

In [63]:
airplane_df = airplane_df.sample(n=5000, random_state=111)
print(airplane_df.shape)
airplane_df.sample(10, random_state=111)

(5000, 23)


Unnamed: 0,Gender,Customer Type,Age,Type of Travel,Class,Flight Distance,Seat comfort,Departure/Arrival time convenient,Food and drink,Gate location,...,Ease of Online booking,On-board service,Leg room service,Baggage handling,Checkin service,Cleanliness,Online boarding,Departure Delay in Minutes,Arrival Delay in Minutes,Satisfaction
96669,Male,Loyal Customer,73,Business travel,Eco,3084,4,5,5,5,...,4,3,3,3,4,3,4,2.0,1.0,neutral or dissatisfied
41547,Female,disloyal Customer,25,Business travel,Eco,2757,1,1,1,1,...,3,2,3,2,4,2,3,4.0,0.0,neutral or dissatisfied
116255,Female,Loyal Customer,57,Business travel,Business,2353,2,2,2,2,...,5,5,5,5,3,5,5,11.0,3.0,satisfied
34567,Male,Loyal Customer,30,Personal Travel,Eco,1162,4,4,4,1,...,4,4,3,5,5,5,4,85.0,75.0,neutral or dissatisfied
56933,Female,disloyal Customer,22,Business travel,Eco,1228,4,0,4,2,...,3,3,2,5,4,4,3,0.0,0.0,satisfied
23098,Male,Loyal Customer,14,Personal Travel,Eco Plus,2453,3,3,3,3,...,3,2,5,2,3,4,3,0.0,11.0,neutral or dissatisfied
9626,Male,Loyal Customer,70,Personal Travel,Eco,4035,2,2,2,3,...,4,2,3,3,2,3,4,0.0,0.0,neutral or dissatisfied
66352,Male,Loyal Customer,52,Business travel,Business,2171,1,2,2,2,...,4,3,3,3,1,1,1,137.0,139.0,neutral or dissatisfied
22105,Female,Loyal Customer,41,Personal Travel,Eco,3689,2,2,2,2,...,5,5,5,5,4,5,5,36.0,36.0,satisfied
95816,Male,Loyal Customer,38,Business travel,Business,424,3,3,3,3,...,4,4,4,4,2,4,2,0.0,0.0,satisfied
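
Simple random sampling usually keeps the class proportions close to those of the full data, but it does not guarantee them. If exact proportions matter, sampling within each class is one alternative. This is a sketch on a made-up `demo` frame, assuming pandas 1.1 or later for `groupby(...).sample`.

```python
import pandas as pd

# Made-up frame: 60 satisfied rows and 40 neutral/dissatisfied rows.
demo = pd.DataFrame({
    "Satisfaction": ["satisfied"] * 60 + ["neutral or dissatisfied"] * 40,
    "Age": range(100),
})

# Sample 10% within each class so the class proportions are preserved.
stratified = demo.groupby("Satisfaction", group_keys=False).sample(
    frac=0.1, random_state=111
)
print(stratified["Satisfaction"].value_counts())
```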


## **Discretization**

**Note:** Whether this step is necessary is still under review, and input from the group is welcome. For now, the code below is commented out.



In the next step, some of the data will be discretized. This can have several benefits in machine learning applications, some of which are listed below:
- Can improve the performance of classification algorithms such as Support Vector Machines and Random Forest
- Can improve the performance of the Naive Bayes algorithm
- Makes continuous variables easier to understand
- Continuous features will have less of a chance of correlating with the target variable due to limited degrees of freedom
- Reduces the impact of small fluctuations, hence reducing noise in the data

For the reasons above, the Age, Flight Distance, Departure Delay and Arrival Delay features will be discretized.
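
One practical caveat for the delay features: quantile-based binning with `pd.qcut` raises a `ValueError` when several quantile edges coincide, which can happen if a large share of delays are zero. The sketch below reproduces this on a made-up `delays` series and shows fixed-edge binning with `pd.cut` as one possible alternative. The bin edges and labels are illustrative assumptions, not tuned values.

```python
import pandas as pd

# Made-up delays dominated by zeros; not the real data.
delays = pd.Series(
    [0, 0, 0, 0, 0, 0, 5, 20, 45, 200], name="Departure Delay in Minutes"
)

# Quantile binning fails here because several quantile edges coincide at 0.
try:
    pd.qcut(delays, q=5, labels=["very short", "short", "medium", "long", "very long"])
    qcut_failed = False
except ValueError:
    qcut_failed = True

# Fixed-edge alternative; edges and labels below are illustrative assumptions.
bins = [-1, 0, 15, 60, float("inf")]
labels = ["none", "short", "medium", "long"]
delay_band = pd.cut(delays, bins=bins, labels=labels)
print(delay_band.value_counts())
```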

In [64]:
'''airplane_df['Age'] = pd.qcut(airplane_df['Age'], 
                              q = 3, 
                              labels=['young', 'middle_aged', 'old'])
print(airplane_df['Age'].value_counts())                              

airplane_df['Flight Distance'] = pd.qcut(airplane_df['Flight Distance'], 
                              q = 5, 
                              labels=['very short', 'short', 'medium', 'long', 'very long'])
print(airplane_df['Flight Distance'].value_counts())

airplane_df['Departure Delay in Minutes'] = pd.qcut(airplane_df['Departure Delay in Minutes'], 
                              q = 5, 
                              labels=['very short', 'short', 'medium', 'long', 'very long'])
print(airplane_df['Departure Delay in Minutes'].value_counts())

airplane_df['Arrival Delay in Minutes'] = pd.qcut(airplane_df['Arrival Delay in Minutes'], 
                              q = 5, 
                              labels=['very short', 'short', 'medium', 'long', 'very long'])
print(airplane_df['Arrival Delay in Minutes'].value_counts())   '''


# **One-Hot-Encoding & Integer-Encoding**

- As the target feature for this dataset is either satisfied or neutral/dissatisfied, it must be integer-encoded. Nominal descriptive features, by contrast, would normally never be integer-encoded.
- Sklearn would normally be used for this, but since the target is a binary variable (satisfied or neutral/dissatisfied), pandas is sufficient.
  - Through visual inspection, it was confirmed that satisfied was correctly encoded as 1 and not 0.
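
As an alternative to confirming the encoding by visual inspection, the target can be encoded with an explicit mapping so that the positive class is fixed by design. This is a sketch on a made-up `target` series; the name `target_map` is hypothetical and not part of the notebook.

```python
import pandas as pd

# Made-up target values for illustration.
target = pd.Series(
    ["satisfied", "neutral or dissatisfied", "satisfied"], name="Satisfaction"
)

# Explicit mapping: the positive class is chosen by design, not by label order.
target_map = {"neutral or dissatisfied": 0, "satisfied": 1}
encoded = target.map(target_map)

# Any label missing from the mapping would become NaN, so check for that.
assert encoded.notna().all()
print(encoded.tolist())
```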

In [65]:
# Creating a categorical columns list to be used with get_dummies()
categorical_cols = airplane_df.columns[airplane_df.dtypes==object].tolist()
categorical_cols

['Gender', 'Customer Type', 'Type of Travel', 'Class', 'Satisfaction']

In [66]:
# Checking dataframe pre-encoding
airplane_df.head()

Unnamed: 0,Gender,Customer Type,Age,Type of Travel,Class,Flight Distance,Seat comfort,Departure/Arrival time convenient,Food and drink,Gate location,...,Ease of Online booking,On-board service,Leg room service,Baggage handling,Checkin service,Cleanliness,Online boarding,Departure Delay in Minutes,Arrival Delay in Minutes,Satisfaction
77340,Female,Loyal Customer,27,Business travel,Business,2824,2,4,3,4,...,2,4,5,3,2,3,2,20.0,13.0,neutral or dissatisfied
95735,Female,Loyal Customer,55,Business travel,Business,408,3,3,4,3,...,4,4,4,4,5,4,5,0.0,0.0,satisfied
41122,Male,disloyal Customer,27,Business travel,Eco,1764,1,0,1,2,...,3,1,2,1,5,4,3,0.0,0.0,neutral or dissatisfied
66932,Male,Loyal Customer,51,Business travel,Eco,1320,1,5,5,5,...,1,1,1,4,1,3,1,0.0,0.0,neutral or dissatisfied
102225,Female,Loyal Customer,56,Business travel,Business,3966,4,4,4,4,...,4,4,4,4,4,4,5,0.0,0.0,satisfied


In [67]:
for i in categorical_cols:
    if (airplane_df[i].nunique() == 2): # if it has only two values, i.e. it is binary
        airplane_df[i] = pd.get_dummies(airplane_df[i], drop_first=True, dtype=np.int64)

# Columns with more than two levels are one-hot encoded here
airplane_df = pd.get_dummies(airplane_df, dtype=np.int64)
airplane_df.head()  # Checking dataframe post-encoding

Unnamed: 0,Gender,Customer Type,Age,Type of Travel,Flight Distance,Seat comfort,Departure/Arrival time convenient,Food and drink,Gate location,Inflight wifi service,...,Baggage handling,Checkin service,Cleanliness,Online boarding,Departure Delay in Minutes,Arrival Delay in Minutes,Satisfaction,Class_Business,Class_Eco,Class_Eco Plus
77340,0,0,27,0,2824,2,4,3,4,2,...,3,2,3,2,20.0,13.0,0,1,0,0
95735,0,0,55,0,408,3,3,4,3,3,...,4,5,4,5,0.0,0.0,1,1,0,0
41122,1,1,27,0,1764,1,0,1,2,3,...,1,5,4,3,0.0,0.0,0,0,1,0
66932,1,0,51,0,1320,1,5,5,5,1,...,4,1,3,1,0.0,0.0,0,0,1,0
102225,0,0,56,0,3966,4,4,4,4,2,...,4,4,4,5,0.0,0.0,1,1,0,0


- Checking to see if the data types are all numeric after encoding

In [68]:
airplane_df.dtypes

Gender                                 int64
Customer Type                          int64
Age                                    int64
Type of Travel                         int64
Flight Distance                        int64
Seat comfort                           int64
Departure/Arrival time convenient      int64
Food and drink                         int64
Gate location                          int64
Inflight wifi service                  int64
Inflight entertainment                 int64
Online support                         int64
Ease of Online booking                 int64
On-board service                       int64
Leg room service                       int64
Baggage handling                       int64
Checkin service                        int64
Cleanliness                            int64
Online boarding                        int64
Departure Delay in Minutes           float64
Arrival Delay in Minutes             float64
Satisfaction                           int64
Class_Busi
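
The same check can be made programmatically rather than by reading the dtype listing. A minimal sketch on a made-up `demo` frame, in which one column is deliberately left as text:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Made-up encoded frame with one column deliberately left as text.
demo = pd.DataFrame({"Age": [27, 55], "Gender": [0, 1], "Class": ["Eco", "Business"]})

# Collect any columns that are still non-numeric after encoding.
non_numeric = [col for col in demo.columns if not is_numeric_dtype(demo[col])]
print(non_numeric)
```

An empty list would indicate that every column is numeric.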

### Data Exploration & Visualisation: <a name="dvis"></a>

### Literature Review: <a name="lr"></a>

### Summary & Conclusions: <a name="sum"></a>

### References: <a name="ref"></a>

In [69]:
"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2656082/" # Discretization    
