<a href="https://www.kaggle.com/code/ishkag26/project-4-airline-passenger-satisfaction?scriptVersionId=144485009" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

**PROBLEM STATEMENT**

**This project measures the level of passenger satisfaction to assess the quality of service provided by airline companies, identify the key factors that drive customer satisfaction, and suggest ways the airline industry can improve its service quality in the future.
In doing so, it highlights which factors have a positive and which have a negative influence on airline service quality.**

**IMPORTING LIBRARIES**

In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from matplotlib import cm
import seaborn as sns
import os

import warnings
warnings.filterwarnings('ignore')

In [None]:
# checking the current working directory
os.getcwd()

**LOADING THE DATASET**

In [None]:
df = pd.read_csv("/kaggle/input/airline-passenger-satisfaction/train.csv",encoding = "unicode_escape")
df_1 = pd.read_csv("/kaggle/input/airline-passenger-satisfaction/test.csv", encoding = "unicode_escape")
# to avoid encoding error, use "unicode_escape"

**TRAIN DATASET**

In [None]:
df.head()

**The first two columns are not useful for the analysis, so we drop them.**

In [None]:
# drop() returns a new DataFrame, so assign the result back
df = df.drop(columns=['Unnamed: 0', 'id'])
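`DataFrame.drop` returns a new DataFrame by default; unless the result is assigned back (or `inplace=True` is passed), the original frame keeps every column. A minimal sketch on a tiny hypothetical frame whose column names mimic the CSV export:

```python
import pandas as pd

# Hypothetical toy frame: a leftover index column, an id column and one feature
raw = pd.DataFrame({"Unnamed: 0": [0, 1], "id": [101, 102], "Age": [13, 25]})

# drop() leaves `raw` untouched and returns a new DataFrame
dropped = raw.drop(columns=["Unnamed: 0", "id"])

print(list(raw.columns))      # still has all three columns
print(list(dropped.columns))  # only "Age" remains
```

Alternatively, `pd.read_csv(..., index_col=0)` turns the unnamed first column into the index at load time, so it never appears as a data column.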

**TEST DATASET**

In [None]:
df_1.head()

In [None]:
# drop() returns a new DataFrame, so assign the result back
df_1 = df_1.drop(columns=['Unnamed: 0', 'id'])

**EXPLORATORY DATA ANALYSIS**

In [None]:
df.shape

In [None]:
df_1.shape

In [None]:
df.info()

In [None]:
df_1.info()

In [None]:
# isnull() returns a boolean DataFrame: True marks a missing value, False a present one
pd.isnull(df)

In [None]:
pd.isnull(df_1)

In [None]:
# Replace spaces in the column names with underscores

df.columns = [c.replace(" ","_") for c in df.columns]
df_1.columns = [c.replace(" ","_") for c in df_1.columns]
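The list comprehension above can also be written with the vectorized `.str` accessor on the column index. A minimal sketch on an empty hypothetical frame that reuses a few of the dataset's column names:

```python
import pandas as pd

# Hypothetical empty frame carrying some of the dataset's original column names
frame = pd.DataFrame(columns=["Customer Type", "Flight Distance", "On-board service"])

# .str.replace on the column Index rewrites every label at once (plain text, no regex)
frame.columns = frame.columns.str.replace(" ", "_", regex=False)

print(list(frame.columns))
# ['Customer_Type', 'Flight_Distance', 'On-board_service']
```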

In [None]:
# count the number of missing values in each column
pd.isnull(df).sum()

**In the training set, the Arrival_Delay_in_Minutes column has 310 null values.**

In [None]:
# remove rows having null values

df.dropna(inplace = True)    
pd.isnull(df).sum()
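Summing a boolean frame counts the `True` values per column, which is why `pd.isnull(df).sum()` gives the number of missing values per column, and `dropna()` removes every row with at least one missing value. A small sketch on hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing arrival delay
toy = pd.DataFrame({
    "Departure_Delay_in_Minutes": [0, 5, 30],
    "Arrival_Delay_in_Minutes": [2.0, np.nan, 25.0],
})

# isnull() gives a boolean frame; summing it counts True values per column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# dropna() removes every row that has at least one missing value
cleaned = toy.dropna()
print(len(toy), "->", len(cleaned))  # 3 -> 2
```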

In [None]:
pd.isnull(df_1).sum()

**In the test set, the Arrival_Delay_in_Minutes column has 83 null values.**

In [None]:
# remove rows having null values

df_1.dropna(inplace = True)    
pd.isnull(df_1).sum()

In [None]:
df["Arrival_Delay_in_Minutes"] = df["Arrival_Delay_in_Minutes"].astype("int")
df["Arrival_Delay_in_Minutes"].dtypes    # to check datatype

In [None]:
df_1["Arrival_Delay_in_Minutes"] = df_1["Arrival_Delay_in_Minutes"].astype("int")
df_1["Arrival_Delay_in_Minutes"].dtypes    # to check datatype
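The null rows have to be removed before the `astype("int")` cast: a column that still contains `NaN` cannot be converted to a NumPy integer dtype. The sketch below, on a hypothetical toy series, shows the failure, the fix used in this notebook, and pandas' nullable `Int64` dtype as an alternative that keeps the gap as `<NA>`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy delays with one missing value
delays = pd.Series([12.0, np.nan, 0.0], name="Arrival_Delay_in_Minutes")

# Casting a column that still contains NaN to a NumPy int dtype raises ValueError
try:
    delays.astype("int")
    cast_failed = False
except ValueError:
    cast_failed = True
print("plain int cast failed:", cast_failed)

# After dropping the missing row, the cast succeeds
as_int = delays.dropna().astype("int")
print(as_int.tolist())  # [12, 0]

# The nullable integer dtype keeps the row and stores the gap as <NA>
as_nullable = delays.astype("Int64")
print(as_nullable.isna().sum())  # 1
```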

In [None]:
# summary statistics of the numeric columns, shaded by value
df.describe().style.background_gradient()   

In [None]:
df_1.describe().style.background_gradient()

**IDENTIFYING OUTLIERS**

In [None]:
# IQR rule: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are flagged as outliers
q1 = np.percentile(df["Flight_Distance"], 25)
q3 = np.percentile(df["Flight_Distance"], 75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
print(lower_bound, upper_bound)
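The IQR fences can be wrapped in a small reusable helper. This is a sketch: `iqr_bounds` is a hypothetical name, and the distances are toy values. Note that the next cell filters `Flight_Distance` with a fixed 5000 cutoff rather than with the computed `upper_bound`.

```python
import numpy as np

def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences of the IQR rule: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical toy distances; 4900 sits far above the rest
distances = np.array([100, 200, 300, 400, 500, 4900])
lower, upper = iqr_bounds(distances)

# Boolean mask selecting the values outside the fences
outliers = distances[(distances < lower) | (distances > upper)]
print(lower, upper, outliers)
```

With NumPy's default linear interpolation, Q1 = 225 and Q3 = 475 here, so the fences are -150 and 850 and only 4900 is flagged.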

In [None]:
df.boxplot("Flight_Distance")

In [None]:
# keep only rows with Flight_Distance below 5000 (a fixed cutoff, not the IQR upper bound printed above)
df = df[df["Flight_Distance"]<5000]

In [None]:
df

**Target Distribution**

In [None]:
df.columns

In [None]:
fig,(ax1,ax2) = plt.subplots(nrows = 1,ncols = 2, figsize=(20,6))

# plot 1

ax1.bar(df["satisfaction"].value_counts().index,df["satisfaction"].value_counts(),color = ["lightgrey","lightblue"])
ax1.set_title("Satisfaction Count",fontsize = 20)
for bars in ax1.containers:    # for showing count
    ax1.bar_label(bars)

# plot 2

label = list(df['satisfaction'].value_counts().index)
value = list(df["satisfaction"].value_counts().values)
ax2.pie(value,labels=label,autopct="%1.1f%%",explode = (0,0.1),startangle = 90,shadow = True)
ax2.set_title("Satisfaction Count")

In [None]:
# Count of the classes in target column
df["satisfaction"].value_counts()

**Most passengers fall in the "neutral or dissatisfied" class rather than "satisfied".**
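The percentages in the pie chart can also be read off as numbers with `value_counts(normalize=True)`, which returns proportions instead of counts. A minimal sketch on hypothetical toy labels that use the dataset's two classes:

```python
import pandas as pd

# Hypothetical toy labels using the dataset's two satisfaction classes
satisfaction = pd.Series(["neutral or dissatisfied", "satisfied",
                          "neutral or dissatisfied", "neutral or dissatisfied"])

# normalize=True returns the share of each class instead of its count
shares = satisfaction.value_counts(normalize=True)
print((shares * 100).round(1))  # percentages, as in the pie chart
```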

**Passenger Profile**

In [None]:
fig = plt.figure(figsize=(20,6))
grid = gridspec.GridSpec(nrows = 1,ncols = 2,figure = fig)

ax3 = fig.add_subplot(grid[0,1:])
ax3.set_title("Gender",weight = "bold",fontsize = 20)
label = list(df['Gender'].value_counts().index)
value = list(df["Gender"].value_counts().values)
ax3.pie(value,labels=label,autopct="%1.1f%%",explode = (0,0.1),startangle = 90)
ax3.axis("equal")
#sns.countplot(x = "satisfaction",data = df,ax = ax1,hue = "satisfaction")
plt.show()

**There are more female passengers than male passengers.**

In [None]:
fig,(ax1,ax2) = plt.subplots(nrows = 1,ncols = 2, figsize=(20,6))

# plot 1

ax1.bar(df["Customer_Type"].value_counts().index,df["Customer_Type"].value_counts(),color = ["lightgrey","lightblue"])
ax1.set_title("Customer_Type",fontsize = 20)
for bars in ax1.containers:    # for showing count
    ax1.bar_label(bars)

# plot 2

label = list(df['Customer_Type'].value_counts().index)
value = list(df["Customer_Type"].value_counts().values)
ax2.pie(value,labels=label,autopct="%1.1f%%",explode = (0,0.1),startangle = 90,shadow = True)
ax2.set_title("Customer_Type")

**The majority of passengers are loyal customers.**

In [None]:
df.Age.plot.hist(bins = 25,color = "thistle")
plt.xlabel("Age")
plt.ylabel("Number of passengers")

**The age distribution peaks around the 40-45 age group.**
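The peak can be checked numerically by binning ages into fixed-width bands with `pd.cut` and taking the most frequent band. A sketch on hypothetical toy ages; the 5-year band width is an assumption chosen to match the 40-45 range mentioned above:

```python
import pandas as pd

# Hypothetical toy ages
ages = pd.Series([22, 41, 43, 44, 39, 57, 42, 68])

# 5-year bands from 0 to 95; right=False gives left-closed intervals such as [40, 45)
bands = pd.cut(ages, bins=list(range(0, 95, 5)), right=False)
band_counts = bands.value_counts()

# idxmax() returns the Interval label of the most frequent band
most_common = band_counts.idxmax()
print(most_common)  # [40, 45)
```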

In [None]:
# one axes for one chart (the former 2x2 grid left three panels empty)
fig, ax1 = plt.subplots(figsize=(10, 6))
ax1.set_title("Gender Distribution", fontsize=10)

sns.countplot(x=df["Gender"], hue=df["satisfaction"], ax=ax1, color="yellowgreen")

for bars in ax1.containers:    # for showing count
    ax1.bar_label(bars)

plt.show()

In [None]:
# create the axes on this figure instead of reusing the previous cell's grid
fig, ax2 = plt.subplots(figsize=(10, 6))
ax2.set_title("Class Distribution", fontsize=10)

sns.countplot(x=df["Class"], hue=df["satisfaction"], ax=ax2, color="aquamarine")

plt.show()

**Passengers travelling in Business class are more often satisfied than those in the other classes.**
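The countplot compares raw counts; the satisfaction rate within each class makes the comparison independent of class size. A sketch using `pd.crosstab` with `normalize="index"` on hypothetical toy rows that reuse the dataset's class labels:

```python
import pandas as pd

# Hypothetical toy rows using the dataset's class and satisfaction labels
toy = pd.DataFrame({
    "Class": ["Business", "Business", "Business", "Eco", "Eco", "Eco Plus"],
    "satisfaction": ["satisfied", "satisfied", "neutral or dissatisfied",
                     "neutral or dissatisfied", "satisfied", "neutral or dissatisfied"],
})

# normalize="index" turns each row into proportions that sum to 1
rates = pd.crosstab(toy["Class"], toy["satisfaction"], normalize="index")
print(rates.round(2))
```

On the real data, the same call with `df["Class"]` and `df["satisfaction"]` would give one satisfaction rate per class.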

In [None]:
df

In [None]:
df["Inflight_wifi_service"].value_counts()

In [None]:
df["On-board_service"].value_counts()

In [None]:
fig,(ax1,ax2) = plt.subplots(nrows = 1,ncols = 2, figsize=(20,5))

# plot 1

ax1.bar(df["Inflight_wifi_service"].value_counts().index,df["Inflight_wifi_service"].value_counts(),color = ["lightgrey","lightblue"])
ax1.set_title("Inflight_wifi_service",fontsize = 20)


# plot 2

ax2.bar(df["On-board_service"].value_counts().index,df["On-board_service"].value_counts(),color = ["lightgrey","lightblue"])
ax2.set_title("On-board_service",fontsize = 20)
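`value_counts()` orders ratings by frequency, which makes the printed rating tables harder to scan; `sort_index()` reorders them by rating value. A sketch on hypothetical toy ratings (the 0-5 scale is an assumption here), adding axis labels and bar counts in the same way as the earlier plots:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical toy ratings on an assumed 0-5 scale
wifi = pd.Series([3, 2, 3, 5, 0, 4, 3, 2])

# value_counts() orders by frequency; sort_index() reorders by rating
counts = wifi.value_counts().sort_index()
print(counts.index.tolist())  # [0, 2, 3, 4, 5]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(counts.index, counts.values, color="lightblue")
ax.bar_label(bars)  # show the count on top of each bar
ax.set_xlabel("Rating")
ax.set_ylabel("Number of passengers")
ax.set_title("Inflight_wifi_service")
```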




**CONCLUSION**

* **There were more female passengers than male passengers.**
* **Most passengers were neutral or dissatisfied with the services provided.**
* **The majority of passengers were loyal customers.**
* **The age distribution peaks around the 40-45 age group.**
* **Passengers travelling in Business class were more often satisfied than those in the other classes.**
 