# Enhancing Airline Passenger Satisfaction: Data-driven Insights for Superior Travel Experiences.

## Getting you up to speed

### If you've ever wondered what makes airline passengers truly happy, you're in for a treat. 
### This project is all about digging into a bunch of data about how travelers feel about their flights. My plan? 
### To use some cool tech skills to figure out what matters most to passengers. From looking at how long flights are to checking whether delays annoy folks, I'll be using my data analysis superpowers to unravel the secrets of happy flyers.
### So buckle up, as we dive into this data adventure to help airlines make your next trip even more awesome!

#### First, let's import the required libraries 


In [2]:
import numpy as np 
import pandas as pd
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)


In [3]:
# Load the dataset once; forward slashes avoid the invalid "\A" escape in the path
airline_data = pd.read_csv("Datasets/Airline+Passenger+Satisfaction/airline_passenger_satisfaction.csv")
airline_data

Unnamed: 0,ID,Gender,Age,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay,Arrival Delay,Departure and Arrival Time Convenience,...,On-board Service,Seat Comfort,Leg Room Service,Cleanliness,Food and Drink,In-flight Service,In-flight Wifi Service,In-flight Entertainment,Baggage Handling,Satisfaction
0,1,Male,48,First-time,Business,Business,821,2,5.0,3,...,3,5,2,5,5,5,3,5,5,Neutral or Dissatisfied
1,2,Female,35,Returning,Business,Business,821,26,39.0,2,...,5,4,5,5,3,5,2,5,5,Satisfied
2,3,Male,41,Returning,Business,Business,853,0,0.0,4,...,3,5,3,5,5,3,4,3,3,Satisfied
3,4,Male,50,Returning,Business,Business,1905,0,0.0,2,...,5,5,5,4,4,5,2,5,5,Satisfied
4,5,Female,49,Returning,Business,Business,3470,0,1.0,3,...,3,4,4,5,4,3,3,3,3,Satisfied
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129875,129876,Male,28,Returning,Personal,Economy Plus,447,2,3.0,4,...,5,1,4,4,4,5,4,4,4,Neutral or Dissatisfied
129876,129877,Male,41,Returning,Personal,Economy Plus,308,0,0.0,5,...,5,2,5,2,2,4,3,2,5,Neutral or Dissatisfied
129877,129878,Male,42,Returning,Personal,Economy Plus,337,6,14.0,5,...,3,3,4,3,3,4,2,3,5,Neutral or Dissatisfied
129878,129879,Male,50,Returning,Personal,Economy Plus,337,31,22.0,4,...,4,4,5,3,3,4,5,3,5,Satisfied


#### Dealing with missing values 

In [4]:
airline_data.dropna(inplace=True)
airline_data

Unnamed: 0,ID,Gender,Age,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay,Arrival Delay,Departure and Arrival Time Convenience,...,On-board Service,Seat Comfort,Leg Room Service,Cleanliness,Food and Drink,In-flight Service,In-flight Wifi Service,In-flight Entertainment,Baggage Handling,Satisfaction
0,1,Male,48,First-time,Business,Business,821,2,5.0,3,...,3,5,2,5,5,5,3,5,5,Neutral or Dissatisfied
1,2,Female,35,Returning,Business,Business,821,26,39.0,2,...,5,4,5,5,3,5,2,5,5,Satisfied
2,3,Male,41,Returning,Business,Business,853,0,0.0,4,...,3,5,3,5,5,3,4,3,3,Satisfied
3,4,Male,50,Returning,Business,Business,1905,0,0.0,2,...,5,5,5,4,4,5,2,5,5,Satisfied
4,5,Female,49,Returning,Business,Business,3470,0,1.0,3,...,3,4,4,5,4,3,3,3,3,Satisfied
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129875,129876,Male,28,Returning,Personal,Economy Plus,447,2,3.0,4,...,5,1,4,4,4,5,4,4,4,Neutral or Dissatisfied
129876,129877,Male,41,Returning,Personal,Economy Plus,308,0,0.0,5,...,5,2,5,2,2,4,3,2,5,Neutral or Dissatisfied
129877,129878,Male,42,Returning,Personal,Economy Plus,337,6,14.0,5,...,3,3,4,3,3,4,2,3,5,Neutral or Dissatisfied
129878,129879,Male,50,Returning,Personal,Economy Plus,337,31,22.0,4,...,4,4,5,3,3,4,5,3,5,Satisfied
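
Before dropping rows, it can help to see how many values are missing and in which columns. The sketch below uses a tiny made-up DataFrame (the values are illustrative, not taken from the dataset) to show the per-column count of missing values and the number of rows that `dropna` removes.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the airline data with two missing arrival delays
toy = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Departure Delay": [2, 26, 0, 0],
    "Arrival Delay": [5.0, np.nan, 0.0, np.nan],
})

# Count missing values per column before removing anything
missing_per_column = toy.isnull().sum()

# Drop incomplete rows into a new frame so the original stays available
cleaned = toy.dropna()
rows_removed = len(toy) - len(cleaned)

print(missing_per_column)
print("Rows removed:", rows_removed)
```

Writing the result to a new variable instead of using `inplace=True` is a design choice here: it keeps the unfiltered frame around for comparison.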


In [5]:
airline_data.isnull().sum()

ID                                        0
Gender                                    0
Age                                       0
Customer Type                             0
Type of Travel                            0
Class                                     0
Flight Distance                           0
Departure Delay                           0
Arrival Delay                             0
Departure and Arrival Time Convenience    0
Ease of Online Booking                    0
Check-in Service                          0
Online Boarding                           0
Gate Location                             0
On-board Service                          0
Seat Comfort                              0
Leg Room Service                          0
Cleanliness                               0
Food and Drink                            0
In-flight Service                         0
In-flight Wifi Service                    0
In-flight Entertainment                   0
Baggage Handling                

In [6]:
airline_data.tail()

Unnamed: 0,ID,Gender,Age,Customer Type,Type of Travel,Class,Flight Distance,Departure Delay,Arrival Delay,Departure and Arrival Time Convenience,...,On-board Service,Seat Comfort,Leg Room Service,Cleanliness,Food and Drink,In-flight Service,In-flight Wifi Service,In-flight Entertainment,Baggage Handling,Satisfaction
129875,129876,Male,28,Returning,Personal,Economy Plus,447,2,3.0,4,...,5,1,4,4,4,5,4,4,4,Neutral or Dissatisfied
129876,129877,Male,41,Returning,Personal,Economy Plus,308,0,0.0,5,...,5,2,5,2,2,4,3,2,5,Neutral or Dissatisfied
129877,129878,Male,42,Returning,Personal,Economy Plus,337,6,14.0,5,...,3,3,4,3,3,4,2,3,5,Neutral or Dissatisfied
129878,129879,Male,50,Returning,Personal,Economy Plus,337,31,22.0,4,...,4,4,5,3,3,4,5,3,5,Satisfied
129879,129880,Female,20,Returning,Personal,Economy Plus,337,0,0.0,1,...,4,2,4,2,2,2,3,2,1,Neutral or Dissatisfied


In [7]:
airline_data.dtypes

ID                                          int64
Gender                                     object
Age                                         int64
Customer Type                              object
Type of Travel                             object
Class                                      object
Flight Distance                             int64
Departure Delay                             int64
Arrival Delay                             float64
Departure and Arrival Time Convenience      int64
Ease of Online Booking                      int64
Check-in Service                            int64
Online Boarding                             int64
Gate Location                               int64
On-board Service                            int64
Seat Comfort                                int64
Leg Room Service                            int64
Cleanliness                                 int64
Food and Drink                              int64
In-flight Service                           int64


#### Now that the data is clean, we can use it to answer some analytical questions

### How is passenger satisfaction distributed across different gender groups?

#### We can answer this by comparing the satisfaction levels of female and male passengers to see whether there are any notable differences.

In [8]:
# Group data by Gender and Satisfaction, and count occurrences

passenger_satisfaction = airline_data.groupby(['Gender', 'Satisfaction']).size().unstack()

passenger_satisfaction

Satisfaction,Neutral or Dissatisfied,Satisfied
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1
Female,37524,28179
Male,35701,28083
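
Raw counts are hard to compare because the two groups differ in size. A minimal sketch, rebuilding the table from the counts printed above and dividing each row by its total to get shares per gender:

```python
import pandas as pd

# Counts copied from the grouped output above
passenger_counts = pd.DataFrame(
    {"Neutral or Dissatisfied": [37524, 35701], "Satisfied": [28179, 28083]},
    index=pd.Index(["Female", "Male"], name="Gender"),
)

# Divide each row by its total to get the share of each outcome per gender
passenger_shares = passenger_counts.div(passenger_counts.sum(axis=1), axis=0)

print(passenger_shares.round(3))
```

On these counts, roughly 43% of female and 44% of male passengers are satisfied, so the gap between the two groups is small.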


### Does age affect overall passenger satisfaction?

#### We can group passengers by age ranges and compare their overall satisfaction levels to see if there's a correlation.

In [9]:
# Define age ranges; with right=False each bin includes its lower edge
# and excludes its upper edge, e.g. [18, 30) covers ages 18 to 29
age_bins = [0, 18, 30, 50, 100]
age_labels = ['0-17', '18-29', '30-49', '50+']

# Add a new column 'age_range' to the DataFrame based on age bins
airline_data['age_range'] = pd.cut(airline_data['Age'], bins=age_bins, labels=age_labels, right=False)

# Boolean flags for each satisfaction outcome
airline_data['Satisfied'] = airline_data['Satisfaction'] == 'Satisfied'
airline_data['Unsatisfied'] = airline_data['Satisfaction'] == 'Neutral or Dissatisfied'

# Group by age_range and count 'Satisfied' and 'Unsatisfied' passengers in each group
age_satisfaction = airline_data.groupby('age_range')[['Satisfied', 'Unsatisfied']].sum()
age_satisfaction

Unnamed: 0_level_0,Satisfied,Unsatisfied
age_range,Unnamed: 1_level_1,Unnamed: 2_level_1
0-17,1641,8170
18-29,9914,18514
30-49,27727,27451
50+,16980,19090
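
Which bin a boundary age lands in depends on the `right` argument of `pd.cut`. The sketch below runs a few made-up ages through the same bin edges with both settings; the labels are neutral placeholders chosen for illustration.

```python
import pandas as pd

ages = pd.Series([17, 18, 29, 30, 50])
bins = [0, 18, 30, 50, 100]
labels = ["bin 1", "bin 2", "bin 3", "bin 4"]  # neutral placeholder labels

# right=False: each bin is [lower, upper), so 18 falls in the second bin
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

# right=True (the default): each bin is (lower, upper], so 18 stays in the first bin
right_closed = pd.cut(ages, bins=bins, labels=labels)

comparison = pd.DataFrame(
    {"age": ages, "right=False": left_closed, "right=True": right_closed}
)
print(comparison)
```

Checking a handful of edge values like this is a quick way to confirm that bin labels describe the intervals actually used.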


### Are returning customers generally more satisfied than first-time customers?

#### We can compare the satisfaction levels of returning and first-time customers to understand if loyalty affects satisfaction.

In [10]:
# Classifying first-time customers and returning ones.
returning_customers = airline_data[airline_data['Customer Type'] == 'Returning']
first_time_customers = airline_data[airline_data['Customer Type'] == 'First-time']

# Calculate average satisfaction levels for returning and first-time customers
avg_satisfaction_returning = returning_customers['Satisfied'].mean()
avg_satisfaction_first_time = first_time_customers['Satisfied'].mean()

print("Average satisfaction level for returning customers:", avg_satisfaction_returning.round(2))
print("Average satisfaction level for first-time customers:", avg_satisfaction_first_time.round(2))

Average satisfaction level for returning customers: 0.48
Average satisfaction level for first-time customers: 0.24


In [11]:
# Perform a t-test to check if the difference is statistically significant
from scipy.stats import ttest_ind

t_statistic, p_value = ttest_ind(returning_customers['Satisfied'], first_time_customers['Satisfied'])

if p_value < 0.05:
    print("The difference in satisfaction levels is statistically significant.")
else:
    print("The difference in satisfaction levels is not statistically significant.")

The difference in satisfaction levels is statistically significant.
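
Because `Satisfied` is a yes/no flag, the comparison is really between two proportions. As a cross-check on the t-test, a two-proportion z-test can be written with the standard library alone. This is a minimal sketch: the function name `two_proportion_z_test` is hypothetical, and the counts are made up to mirror the 0.48 and 0.24 rates rather than taken from the dataset's group sizes.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of equal rates
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative counts only: 480 of 1000 vs 240 of 1000 satisfied
z_stat, p_val = two_proportion_z_test(480, 1000, 240, 1000)
print(f"z = {z_stat:.2f}, p = {p_val:.3g}")
```

With samples this large, even small gaps come out as significant, so the size of the difference deserves as much attention as the p-value.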


### Is there a difference in satisfaction levels between business and personal travelers?

#### We will analyze whether passengers on business trips have different satisfaction levels compared to those traveling for personal reasons.

In [12]:
# Group data by 'Type of Travel' and calculate the share of satisfied and unsatisfied passengers in each group

satisfaction_by_type = airline_data.groupby('Type of Travel')[['Satisfied','Unsatisfied']].mean().round(2)

satisfaction_by_type


Unnamed: 0_level_0,Satisfied,Unsatisfied
Type of Travel,Unnamed: 1_level_1,Unnamed: 2_level_1
Business,0.58,0.42
Personal,0.1,0.9


### Which travel class has the highest satisfaction rating?

#### We will investigate which travel class (Business, Economy, or Economy Plus) receives the highest satisfaction ratings.


In [13]:
# Calculate the average satisfaction rating for each travel class

average_satisfaction_by_class = airline_data.groupby('Class')['Satisfied'].mean()

# Find the travel class with the highest average satisfaction

highest_satisfaction_class = average_satisfaction_by_class.idxmax()
highest_satisfaction_value = average_satisfaction_by_class.round(2).max()

# Create a new DataFrame to display the result

satisfaction_by_class = pd.DataFrame({
    'Travel Class': [highest_satisfaction_class],
    'Highest Satisfaction Rating': [highest_satisfaction_value]
})

satisfaction_by_class

Unnamed: 0,Travel Class,Highest Satisfaction Rating
0,Business,0.69
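
Reporting only the top class hides how far behind the others are. A small sketch of ranking every class by its share of satisfied passengers, using made-up rows (the values are hypothetical; `Satisfied` stands in for the boolean flag used in the notebook):

```python
import pandas as pd

# Hypothetical rows; 'Satisfied' is the boolean flag used in the notebook
toy = pd.DataFrame({
    "Class": ["Business", "Business", "Economy", "Economy",
              "Economy Plus", "Economy Plus"],
    "Satisfied": [True, True, False, True, False, False],
})

# Share of satisfied passengers per class, highest first
satisfaction_ranking = (
    toy.groupby("Class")["Satisfied"].mean().sort_values(ascending=False)
)
print(satisfaction_ranking)
```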


### Are longer flight distances associated with lower satisfaction levels?

#### We could determine if passengers on longer flights tend to have lower satisfaction scores compared to those on shorter flights.

In [14]:
# 'Flight Distance' is already numeric (int64), so no type conversion is needed

# Calculate the share of satisfied passengers for different distance ranges.
# Note: pd.cut leaves flights longer than 2000 unlabeled (NaN), so they are excluded here

distance_bins = [0, 500, 1000, 1500, 2000]
distance_labels = ['<500', '500-1000', '1000-1500', '1500-2000']

# Store the ranges in a new column so the numeric 'Flight Distance' values are preserved
airline_data['Flight Distance Range'] = pd.cut(airline_data['Flight Distance'], bins=distance_bins, labels=distance_labels)
average_satisfaction_by_distance = airline_data.groupby('Flight Distance Range')[['Satisfied']].mean()

average_satisfaction_by_distance

Unnamed: 0_level_0,Satisfied
Flight Distance Range,Unnamed: 1_level_1
<500,0.335657
500-1000,0.324809
1000-1500,0.363494
1500-2000,0.581687
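
`pd.cut` assigns NaN to any value beyond the outermost bin edges, so a top edge of 2000 leaves longer flights out of the grouped means. If an open-ended top range is wanted, `np.inf` can serve as the last edge. A minimal sketch on four distances that appear in the table output earlier in the notebook:

```python
import numpy as np
import pandas as pd

distances = pd.Series([308, 821, 1905, 3470])  # sample values seen in the table output

# Closed top edge: 3470 lies outside (1500, 2000] and becomes NaN
capped = pd.cut(distances, bins=[0, 500, 1000, 1500, 2000],
                labels=["<500", "500-1000", "1000-1500", "1500-2000"])

# Open top edge: np.inf lets the last bin absorb every longer flight
open_ended = pd.cut(distances, bins=[0, 500, 1000, 1500, np.inf],
                    labels=["<500", "500-1000", "1000-1500", "1500+"])

print(pd.DataFrame({"distance": distances, "capped": capped, "open_ended": open_ended}))
```

Rerunning the grouped mean with the open-ended bins would change the last row of the result, since it would then include the longest flights.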


### How do different aspects of the travel experience (e.g., cleanliness, food, entertainment) contribute to overall satisfaction?

#### We will break down the satisfaction scores for various aspects of the flight to see which factors have the most impact on overall satisfaction.

In [15]:
# Creating a new dataframe that only displays data about the travel experience 

# List of columns selected
selected_columns = [
    'Cleanliness',
    'Food and Drink',
    'In-flight Entertainment',
    'Seat Comfort',
    'Leg Room Service',
    'In-flight Service',
    'In-flight Wifi Service',
    'Online Boarding',
    'Ease of Online Booking',
    'Departure and Arrival Time Convenience',
    'On-board Service',
    'Check-in Service',
    'Baggage Handling',
    'Gate Location'
    
]

travel_experience = airline_data[selected_columns]

# Group by travel class and calculate the mean rating for each aspect

travel_experience_by_class = travel_experience.groupby(airline_data['Class']).mean()

# Average the aspect ratings within each class (axis=1 means calculate across columns)

travel_experience_by_class['Overall Satisfaction'] = travel_experience_by_class.mean(axis=1)

# Sort by the overall satisfaction in descending order

travel_experience_by_class = travel_experience_by_class.sort_values(
    by='Overall Satisfaction',
    ascending=False
)

# Rearrange columns to have 'Overall Satisfaction' as the first column

column_order = ['Overall Satisfaction'] + selected_columns
travel_experience_by_class = travel_experience_by_class[column_order]

travel_experience_by_class

Unnamed: 0_level_0,Overall Satisfaction,Cleanliness,Food and Drink,In-flight Entertainment,Seat Comfort,Leg Room Service,In-flight Service,In-flight Wifi Service,Online Boarding,Ease of Online Booking,Departure and Arrival Time Convenience,On-board Service,Check-in Service,Baggage Handling,Gate Location
Class,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
Business,3.432669,3.481933,3.329795,3.639313,3.763704,3.646169,3.846007,2.775657,3.719035,2.915373,2.907582,3.682529,3.520745,3.844539,2.984981
Economy,3.066348,3.104617,3.086429,3.096426,3.142041,3.083848,3.467144,2.673882,2.814478,2.602801,3.19256,3.120171,3.124507,3.450264,2.969699
Economy Plus,3.060029,3.118017,3.110554,3.120469,3.168763,3.05661,3.382303,2.755864,2.886247,2.662793,3.209382,3.034755,3.014606,3.351812,2.96823
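
Class-level averages show how ratings differ by cabin, but not which aspects move together with the `Satisfied` flag. One hedged way to look at that is the correlation of each rating with the flag. The sketch below uses made-up ratings for six passengers (not taken from the dataset); a correlation found this way indicates association, not cause.

```python
import pandas as pd

# Hypothetical ratings for six passengers (not taken from the dataset)
toy = pd.DataFrame({
    "Seat Comfort": [5, 4, 5, 2, 1, 2],
    "Gate Location": [3, 2, 3, 3, 2, 3],
    "Satisfied": [True, True, True, False, False, False],
})

rating_columns = ["Seat Comfort", "Gate Location"]

# Pearson correlation of each rating column with the 0/1 satisfaction flag
correlations = (
    toy[rating_columns]
    .corrwith(toy["Satisfied"].astype(int))
    .sort_values(ascending=False)
)
print(correlations.round(2))
```

In this toy data, seat comfort tracks satisfaction closely while gate location does not, which is the kind of contrast the ranking is meant to surface.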


### Is there a relationship between departure delay and passenger satisfaction?

#### Let's explore whether longer departure delays are associated with lower satisfaction ratings.

In [16]:
# Calculate the share of satisfied passengers for different departure delay ranges.
# Bins are right-inclusive: (-inf, 10], (10, 20], (20, inf)

airline_data['Departure Delay Range in minutes'] = pd.cut(
    airline_data['Departure Delay'],
    bins=[-np.inf, 10, 20, np.inf],
    labels=['<10', '10-20', '>20']
)

average_satisfaction_by_delay = airline_data.groupby('Departure Delay Range in minutes')[['Satisfied']].mean()

average_satisfaction_by_delay.round(2)

Unnamed: 0_level_0,Satisfied
Departure Delay Range in minutes,Unnamed: 1_level_1
<10,0.46
10-20,0.41
>20,0.36


### Does the satisfaction with online services (booking, check-in, boarding) impact overall satisfaction?

#### We can analyze how satisfaction with online services affects the overall satisfaction level of passengers.

In [17]:
# List of columns related to online services
online_services = [
    'Online Boarding',
    'Ease of Online Booking',
    'Check-in Service'
]

# Calculate the mean rating of each online service by gender
overall_satisfaction_mean = airline_data.groupby('Gender')[online_services].mean().reset_index()

# Average the three service ratings for each row (axis=1 means calculate across columns);
# only the numeric rating columns are selected so 'Gender' is left out of the mean

overall_satisfaction_mean['Overall Satisfaction'] = overall_satisfaction_mean[online_services].mean(axis=1)

overall_satisfaction_mean.round(2)

Unnamed: 0,Gender,Online Boarding,Ease of Online Booking,Check-in Service,Overall Satisfaction
0,Female,3.31,2.75,3.3,3.12
1,Male,3.19,2.77,3.32,3.09
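
To relate online services to the satisfaction outcome directly, one option is to group by a service rating and compute the share of satisfied passengers at each rating level. A minimal sketch with made-up rows; it assumes a boolean `Satisfied` column like the one created earlier in the notebook.

```python
import pandas as pd

# Hypothetical passengers (illustrative values only)
toy = pd.DataFrame({
    "Online Boarding": [1, 1, 3, 3, 5, 5, 5, 5],
    "Satisfied": [False, False, False, True, True, True, True, False],
})

# Share of satisfied passengers at each Online Boarding rating
satisfied_share_by_rating = toy.groupby("Online Boarding")["Satisfied"].mean()
print(satisfied_share_by_rating)
```

If the share rises steadily with the rating, that suggests the service is associated with overall satisfaction; the same grouping can be repeated for the other online service columns.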
