# <center>Classification Overview</center>

### <center> Classification is a supervised learning task in which the feature to be predicted takes **categorical values**. Each category is called a **class**, and the model predicts the class into which each observation falls; hence the name, classification.</center>

![image.png](attachment:image.png)

# <center> Airline Passenger Satisfaction </center>
 ![image.png](attachment:image.png)

**Context**

This dataset contains an airline passenger satisfaction survey. Can you predict passenger satisfaction?

**Content**

* **Gender**: Gender of the passengers (Female, Male)

* **Customer Type**: The customer type (Loyal Customer, disloyal Customer)

* **Age**: The actual age of the passengers

* **Type of Travel**: Purpose of the passenger's flight (Personal Travel, Business travel)

* **Class**: The passenger's travel class on the plane (Business, Eco, Eco Plus)

* **Flight Distance**: The flight distance of this journey

* **Inflight wifi service**: Satisfaction level of the inflight wifi service (0: Not Applicable; 1-5)

* **Departure/Arrival time convenient**: Satisfaction level with the convenience of the departure/arrival time

* **Ease of Online booking**: Satisfaction level of online booking

* **Gate location**: Satisfaction level of Gate location

* **Food and drink**: Satisfaction level of Food and drink

* **Online boarding**: Satisfaction level of online boarding

* **Seat comfort**: Satisfaction level of Seat comfort

* **Inflight entertainment**: Satisfaction level of inflight entertainment

* **On-board service**: Satisfaction level of On-board service

* **Leg room service**: Satisfaction level of Leg room service

* **Baggage handling**: Satisfaction level of baggage handling

* **Check-in service**: Satisfaction level of Check-in service

* **Inflight service**: Satisfaction level of inflight service

* **Cleanliness**: Satisfaction level of Cleanliness

* **Departure Delay in Minutes**: Minutes of delay at departure

* **Arrival Delay in Minutes**: Minutes of delay at arrival

* **satisfaction**: Airline satisfaction level (satisfied, neutral or dissatisfied)
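According to the description of **Inflight wifi service**, a rating of 0 means *Not Applicable* rather than a low score. Assuming the other rating columns use the same 0-5 scale (an assumption; the list above only states it for wifi), counting the zeros per column shows how often a service went unrated. A minimal sketch on a made-up frame; the helper `count_not_applicable` is hypothetical, and on the real data you would pass `data` and its rating column names:

```python
import pandas as pd


def count_not_applicable(df: pd.DataFrame, rating_cols: list) -> pd.Series:
    """Count the 0 ("Not Applicable") codes in each rating column (hypothetical helper)."""
    return (df[rating_cols] == 0).sum()


# Toy data for illustration only, not taken from the real dataset
toy = pd.DataFrame({
    "Inflight wifi service": [0, 3, 5, 0, 2],
    "Gate location": [1, 4, 0, 3, 3],
})

print(count_not_applicable(toy, ["Inflight wifi service", "Gate location"]))
```

Treating 0 as an ordinary low score would pull averages down, so it is worth knowing how common it is before modelling.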

### Importing Libraries

In [22]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
import lux  # Lux: shows recommended visualizations for DataFrames in Jupyter (Toggle Pandas/Lux button)

### Importing and Exploring the Data

In [23]:
data = pd.read_csv('data.csv')

In [24]:
data

In [25]:
data.info()

<class 'lux.core.frame.LuxDataFrame'>
RangeIndex: 103904 entries, 0 to 103903
Data columns (total 23 columns):
 #   Column                             Non-Null Count   Dtype  
---  ------                             --------------   -----  
 0   Gender                             103904 non-null  object 
 1   Customer Type                      103904 non-null  object 
 2   Age                                103904 non-null  int64  
 3   Type of Travel                     103904 non-null  object 
 4   Class                              103904 non-null  object 
 5   Flight Distance                    103904 non-null  int64  
 6   Inflight wifi service              103904 non-null  int64  
 7   Departure/Arrival time convenient  103904 non-null  int64  
 8   Ease of Online booking             103904 non-null  int64  
 9   Gate location                      103904 non-null  int64  
 10  Food and drink                     103904 non-null  int64  
 11  Online boarding                    1039

In [31]:
data.select_dtypes(include='object')

In [27]:
data.select_dtypes(exclude='object')

In [29]:
data.Gender.nunique()

2

In [37]:
data.select_dtypes(include='object').columns

Index(['Gender', 'Customer Type', 'Type of Travel', 'Class', 'satisfaction'], dtype='object')

In [38]:
data.Gender.value_counts()

In [35]:
# print the counts of each category in every categorical column
cat_columns = data.select_dtypes(include='object').columns
for col in cat_columns:
    print(data[col].value_counts())
    print("__________________")

Female    52727
Male      51177
Name: Gender, dtype: int64
__________________
Loyal Customer       84923
disloyal Customer    18981
Name: Customer Type, dtype: int64
__________________
Business travel    71655
Personal Travel    32249
Name: Type of Travel, dtype: int64
__________________
Business    49665
Eco         46745
Eco Plus     7494
Name: Class, dtype: int64
__________________
neutral or dissatisfied    58879
satisfied                  45025
Name: satisfaction, dtype: int64
__________________
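The `satisfaction` counts above show that the two classes are not balanced. Any classifier evaluated later with accuracy should beat the trivial rule "always predict the majority class", whose accuracy equals the majority share. Using the counts printed above (before the rows with missing values are dropped):

```python
# Class counts copied from the value_counts() output above
counts = {"neutral or dissatisfied": 58879, "satisfied": 45025}

total = sum(counts.values())
majority_share = max(counts.values()) / total

print(f"total rows: {total}")                                      # 103904
print(f"majority-class baseline accuracy: {majority_share:.4f}")   # 0.5667
```

So a model needs well above roughly 57% accuracy on this data before it is doing better than guessing the majority class.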


In [41]:
data.isnull().sum()

## Preprocessing the data

In [42]:
# drop rows with missing values
data.dropna(inplace=True)

In [43]:
data.isnull().sum().sum()

0
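Comparing the row counts reported by the two `data.info()` calls (103904 entries before, 103594 after), `dropna` removed 310 rows. A small sketch of the same before/after check on a toy frame; the values are made up and only the column names come from the dataset:

```python
import numpy as np
import pandas as pd

# Toy frame with one missing delay value (illustrative data only)
df = pd.DataFrame({
    "Departure Delay in Minutes": [0, 12, 5, 30],
    "Arrival Delay in Minutes": [0.0, 10.0, np.nan, 25.0],
})

rows_before = len(df)
print(df.isnull().sum())   # missing values per column

df = df.dropna()           # same effect as dropna(inplace=True): drop rows with any NaN
rows_after = len(df)

print(f"dropped {rows_before - rows_after} row(s)")
assert df.isnull().sum().sum() == 0   # same check as the cell above
```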

In [46]:
data['Customer Type'].unique()

array(['Loyal Customer', 'disloyal Customer'], dtype=object)

In [47]:
# encode the categorical columns as integers
# (.map converts every listed label; an unlisted label would become NaN, which is easy to spot)
data['Gender'] = data['Gender'].map({"Male": 1, "Female": 0})
data['Customer Type'] = data['Customer Type'].map({"Loyal Customer": 1, "disloyal Customer": 0})
data['Type of Travel'] = data['Type of Travel'].map({"Personal Travel": 1, "Business travel": 0})
data['Class'] = data['Class'].map({"Eco": 0, "Eco Plus": 1, "Business": 2})  # ordinal: Eco < Eco Plus < Business
data['satisfaction'] = data['satisfaction'].map({"neutral or dissatisfied": 0, "satisfied": 1})
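One way to confirm that an encoding dictionary covers every label is to encode with `Series.map()`, which turns any unlisted label into NaN (`Series.replace()` would leave it unchanged), and then count the NaNs. A sketch on a few hypothetical rows; the labels are spelled as in the dataset, but the integer codes here are only illustrative:

```python
import pandas as pd

# Toy rows for illustration; labels are spelled exactly as in the dataset
toy = pd.DataFrame({
    "Class": ["Eco", "Business", "Eco Plus", "Business"],
    "satisfaction": ["satisfied", "neutral or dissatisfied", "satisfied", "satisfied"],
})

mappings = {
    "Class": {"Eco": 0, "Eco Plus": 1, "Business": 2},
    "satisfaction": {"neutral or dissatisfied": 0, "satisfied": 1},
}

encoded = toy.copy()
for col, mapping in mappings.items():
    encoded[col] = encoded[col].map(mapping)

# Any label missing from its mapping would show up here as a NaN
unmapped = encoded.isnull().sum()
print(unmapped)
assert unmapped.sum() == 0, "some labels were not covered by the mapping"
print(encoded.dtypes)
```

A misspelled key (for example `"Business Travel"` instead of `"Business travel"`) would surface immediately as a non-zero NaN count.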

In [48]:
data.info()

<class 'lux.core.frame.LuxDataFrame'>
Int64Index: 103594 entries, 0 to 103903
Data columns (total 23 columns):
 #   Column                             Non-Null Count   Dtype  
---  ------                             --------------   -----  
 0   Gender                             103594 non-null  int64  
 1   Customer Type                      103594 non-null  int64  
 2   Age                                103594 non-null  int64  
 3   Type of Travel                     103594 non-null  int64  
 4   Class                              103594 non-null  int64  
 5   Flight Distance                    103594 non-null  int64  
 6   Inflight wifi service              103594 non-null  int64  
 7   Departure/Arrival time convenient  103594 non-null  int64  
 8   Ease of Online booking             103594 non-null  int64  
 9   Gate location                      103594 non-null  int64  
 10  Food and drink                     103594 non-null  int64  
 11  Online boarding                    1035

In [45]:
data['Gender']

In [15]:
#data # after all transformations and preprocessing

In [16]:
# separate features from the target
#
#
#
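A common way to do this step, assuming the target column keeps its name `satisfaction`: drop it from the frame to get the feature matrix `X` and keep it alone as the target `y`. The toy frame below stands in for `data` (its values are made up); in the notebook, `data` would take the place of `toy`:

```python
import pandas as pd

# Stand-in for the preprocessed `data` frame (values are made up)
toy = pd.DataFrame({
    "Gender": [1, 0, 0, 1],
    "Age": [25, 41, 33, 52],
    "Flight Distance": [460, 1200, 235, 3100],
    "satisfaction": [0, 1, 0, 1],
})

X = toy.drop(columns="satisfaction")  # features: every column except the target
y = toy["satisfaction"]               # target: the label to predict

print(X.shape, y.shape)  # (4, 3) (4,)
```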

In [17]:
# splitting training and testing data
#
#
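A sketch of the split with scikit-learn's `train_test_split` (already imported above). The synthetic data from `make_classification` stands in for the real `X` and `y`, and `test_size=0.2`, `random_state=42` and `stratify=y` are choices made for this example, not taken from the original notebook:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for X and y (the real ones come from the previous step)
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

# Hold out 20% of the rows for testing; stratify keeps the class ratio equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (800, 8) (200, 8)
```

Stratifying is a reasonable default here because the `satisfaction` classes are not balanced; fixing `random_state` makes the split reproducible.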

# <center> Classification Algorithms </center>

* Naive Bayes
* Logistic regression
* K-nearest neighbors
* (Kernel) SVM
* Decision tree
* Ensemble learning (Random Forest, AdaBoost, XGBoost, LightGBM, CatBoost)
* ...
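Several of the algorithms listed share scikit-learn's `fit`/`score` interface, so they can be swapped in a loop. A sketch on synthetic data; the dataset, split and default hyperparameters are assumptions chosen only to show the interface (XGBoost, LightGBM and CatBoost are separate packages and are left out):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (stand-in for the airline features)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Naive Bayes": GaussianNB(),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "K-nearest neighbors": KNeighborsClassifier(),
    "Kernel SVM": SVC(kernel="rbf"),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the test split
    print(f"{name:20s} test accuracy: {scores[name]:.3f}")
```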

### Logistic regression

![image.png](attachment:image.png)
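Logistic regression computes a linear score $z = w^\top x + b$ and passes it through the sigmoid function $\sigma(z) = 1/(1 + e^{-z})$, which maps any real number into (0, 1); the result is read as $P(y = 1 \mid x)$, and predicting class 1 when it is at least 0.5 is the usual default threshold. A minimal NumPy sketch with made-up weights:

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical weights and bias for a 2-feature model
w = np.array([0.8, -1.5])
b = 0.2

X = np.array([
    [1.0, 0.0],  # z = 0.8 + 0.2 = 1.0
    [0.0, 1.0],  # z = -1.5 + 0.2 = -1.3
])

z = X @ w + b                    # linear scores
p = sigmoid(z)                   # estimated P(y = 1 | x)
y_pred = (p >= 0.5).astype(int)  # threshold at 0.5

print(p, y_pred)
```

A score of exactly 0 gives probability 0.5, so the 0.5 threshold is the same as checking the sign of $z$.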

### Implementation with sklearn

In [55]:
from sklearn.linear_model import LogisticRegression # import LogisticRegression 

#LR = LogisticRegression(solver='liblinear')  # solver : Algorithm to use in the optimization problem
# fit the model
#
#

LogisticRegression(solver='liblinear')
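After fitting, the model stores the learned weights in `coef_` and the bias in `intercept_`, and for a binary problem `predict_proba` is the sigmoid of `decision_function` (the linear score). A sketch on synthetic data, using the same `solver='liblinear'` as the commented-out line above; the data is a stand-in, not the airline training split:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the training split
X_train, y_train = make_classification(n_samples=300, n_features=5, random_state=1)

LR = LogisticRegression(solver='liblinear')  # liblinear: a good choice for small datasets
LR.fit(X_train, y_train)

print("weights:", LR.coef_)   # shape (1, n_features) for a binary problem
print("bias:", LR.intercept_)

# For a binary model, P(y=1|x) equals the sigmoid of the linear score
scores = LR.decision_function(X_train)
manual_proba = 1.0 / (1.0 + np.exp(-scores))
assert np.allclose(manual_proba, LR.predict_proba(X_train)[:, 1])
```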

In [18]:
# make predictions on training data
#
#

# make predictions on testing data
#
#
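A sketch of both prediction steps, producing the `y_train_pred` and `y_test_pred` names used in the commented evaluation code further down. The synthetic data and split parameters are assumptions made so the example runs on its own:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and target
X, y = make_classification(n_samples=500, n_features=6, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

LR = LogisticRegression(solver='liblinear').fit(X_train, y_train)

# Predictions on the training data (how well the model fits rows it has seen)
y_train_pred = LR.predict(X_train)
# Predictions on the testing data (how well it generalizes to unseen rows)
y_test_pred = LR.predict(X_test)

print(y_train_pred[:10], y_test_pred[:10])
```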



## Use the accuracy score to evaluate the classifier

![image.png](attachment:image.png)
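Accuracy is the fraction of predictions that match the true labels: accuracy = (number of correct predictions) / (total number of predictions). A tiny worked example with made-up labels, checked against `sklearn.metrics.accuracy_score`:

```python
from sklearn.metrics import accuracy_score

# Made-up labels: 1 = satisfied, 0 = neutral or dissatisfied
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]

correct = sum(t == p for t, p in zip(y_true, y_pred))  # 3 matches
manual = correct / len(y_true)                          # 3 / 5 = 0.6

print(manual, accuracy_score(y_true, y_pred))  # 0.6 0.6
```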

In [20]:
# evaluate the model and draw conclusions
#from sklearn.metrics import accuracy_score
#print("train accuracy : {}".format(accuracy_score(y_train,y_train_pred)))
#print("test accuracy : {}".format(accuracy_score(y_test,y_test_pred)))
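A self-contained sketch of the evaluation cell above on synthetic data (the data and split parameters are assumptions). It prints train and test accuracy in the same format and adds a majority-class baseline for comparison: a large gap between train and test accuracy hints at overfitting, and test accuracy close to the baseline means the model has learned little.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the airline features and target
X, y = make_classification(n_samples=1000, n_features=8, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

LR = LogisticRegression(solver='liblinear').fit(X_train, y_train)
y_train_pred = LR.predict(X_train)
y_test_pred = LR.predict(X_test)

train_acc = accuracy_score(y_train, y_train_pred)
test_acc = accuracy_score(y_test, y_test_pred)

# Accuracy of always predicting the most frequent class seen in training
majority_class = np.bincount(y_train).argmax()
baseline_acc = accuracy_score(y_test, np.full_like(y_test, majority_class))

print("train accuracy : {}".format(train_acc))
print("test accuracy : {}".format(test_acc))
print("baseline accuracy : {}".format(baseline_acc))
```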