This statistical analysis explores the flight reviews dataset to understand passenger experiences and sentiment. It first summarizes key statistics: the ratings distribution, sentiment polarity, and the percentage of recommended flights. It then identifies the top-performing airlines, areas for improvement, and the characteristics of review lengths. The goal is to give airlines actionable insights for improving their services, raising customer satisfaction, and informing decisions that affect the passenger experience.

In [4]:
import pandas as pd 
df = pd.read_csv('Final_csv.csv')
df.head()

| | rating | country | date | review | Type Of Traveller | Seat Type | Date Flown | Seat Comfort | Cabin Staff Service | Food & Beverages | ... | Airlines | author | CODE2 | CODE3 | Latitude | Longitude | Verified | review_length | roberta_polarity | roberta_sentiment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | Jordan | 04-08-2024 | Ryanair lost my luggage on a direct flight. I... | Business | Economy Class | April 2024 | 2.0 | 1.0 | 2.0 | ... | ryanair | Alan Robinson | JO | JOR | 31.166705 | 36.941628 | 1 | 188 | -0.716298 | Negative |
| 1 | 1.0 | Switzerland | 04-08-2024 | Booked Basel to Dublin 11.10 6.4.24. Baggage... | Family Leisure | Economy Class | April 2024 | 1.0 | 1.0 | 2.0 | ... | ryanair | T Maysan | CH | CHE | 46.798562 | 8.231974 | 1 | 441 | -0.551128 | Negative |
| 2 | 6.0 | Germany | 04-05-2024 | You get what you pay. Had overweight luggag... | Couple Leisure | Economy Class | April 2024 | 3.0 | 3.0 | 3.0 | ... | ryanair | 55 reviews\n\n\n\nR Darnel | DE | DEU | 51.163818 | 10.447831 | 1 | 94 | -0.353992 | Negative |
| 3 | 3.0 | Italy | 04-01-2024 | Very cheeky check-in system: this did not ha... | Couple Leisure | Economy Class | March 2024 | 1.0 | 2.0 | 2.0 | ... | ryanair | Y Chen | IT | ITA | 42.638426 | 12.674297 | 0 | 108 | -0.721132 | Negative |
| 4 | 1.0 | Spain | 03-28-2024 | Terrible customer service. Handling in Marra... | Family Leisure | Economy Class | March 2024 | 2.0 | 4.0 | 2.0 | ... | ryanair | Diego Perez | ES | ESP | 39.326068 | -4.837979 | 0 | 594 | -0.652707 | Negative |


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13144 entries, 0 to 13143
Data columns (total 24 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   rating                  13144 non-null  float64
 1   country                 13144 non-null  object 
 2   date                    13144 non-null  object 
 3   review                  13144 non-null  object 
 4   Type Of Traveller       13144 non-null  object 
 5   Seat Type               13144 non-null  object 
 6   Date Flown              13144 non-null  object 
 7   Seat Comfort            13144 non-null  float64
 8   Cabin Staff Service     13144 non-null  float64
 9   Food & Beverages        13144 non-null  float64
 10  Inflight Entertainment  13144 non-null  float64
 11  Ground Service          13144 non-null  float64
 12  Value For Money         13144 non-null  float64
 13  Recommended             13144 non-null  int64  
 14  Airlines                13144 non-null

# Stats summary of all airlines

In [7]:
# Define the list of numerical columns to aggregate
numerical_columns = ['rating', 'Seat Comfort', 'Cabin Staff Service', 'Food & Beverages',
                     'Inflight Entertainment', 'Ground Service', 'Value For Money',
                     'review_length', 'roberta_polarity']

# Group by 'Airlines' and aggregate numerical columns
aggregated_data = df.groupby('Airlines')[numerical_columns].mean().reset_index()

aggregated_data

| | Airlines | rating | Seat Comfort | Cabin Staff Service | Food & Beverages | Inflight Entertainment | Ground Service | Value For Money | review_length | roberta_polarity |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | delta-air-lines | 3.201685 | 2.427593 | 2.751448 | 2.327014 | 2.869932 | 2.125856 | 2.055819 | 151.070037 | -0.383448 |
| 1 | emirates | 5.114633 | 3.260433 | 3.038563 | 3.005811 | 3.740095 | 2.745906 | 2.966191 | 160.843106 | -0.118528 |
| 2 | etihad-airways | 4.003401 | 2.672336 | 2.794218 | 2.578798 | 2.965986 | 1.955782 | 2.550454 | 162.159864 | -0.281915 |
| 3 | qatar-airways | 7.295047 | 3.946259 | 4.268177 | 3.856164 | 4.004742 | 3.731296 | 3.860379 | 130.160695 | 0.283887 |
| 4 | ryanair | 4.00792 | 2.262408 | 2.719113 | 2.004752 | 2.675818 | 1.964625 | 2.652587 | 131.82471 | -0.233792 |
| 5 | spirit-airlines | 1.657881 | 1.498155 | 1.939905 | 1.626252 | 2.214022 | 1.356879 | 1.395888 | 140.074328 | -0.5924 |
| 6 | united-airlines | 2.5208 | 2.036862 | 2.399157 | 1.954713 | 2.400211 | 1.823591 | 1.730384 | 151.580305 | -0.469733 |


Based on the findings from the table:

**1. Rating Range:**
- Average ratings range from 1.66 for Spirit Airlines to 7.30 for Qatar Airways, a wide spread in passenger satisfaction across the seven airlines.

**2. Top Performing Airlines:**
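- Qatar Airways has the highest average rating (7.30), followed by Emirates (5.11). Ryanair (4.01) and Etihad Airways (4.00) are effectively tied for third. Only Qatar Airways and Emirates average above 5, so "top performing" is relative to this group rather than a sign of high satisfaction overall.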

**3. Areas of Strength:**
- Qatar Airways has the highest average in every category, including seat comfort, cabin staff service, food & beverages, inflight entertainment, ground service, and value for money, which points to a consistently strong customer experience.
- Emirates ranks second in every category, with its strongest scores in inflight entertainment (3.74) and seat comfort (3.26).

**4. Areas for Improvement:**
- Spirit Airlines has the lowest average in every category. Its weakest scores are ground service (1.36), value for money (1.40), and seat comfort (1.50).
- United Airlines has the second-lowest average in every category, indicating room for improvement across the passenger experience.

**5. Overall Sentiment:**
- The mean RoBERTa polarity score is negative for six of the seven airlines; only Qatar Airways has a positive mean (0.28). This suggests that negative reviews outweigh positive ones for most airlines.

In conclusion, Qatar Airways stands out as the top-performing airline with consistently high ratings across all aspects of the passenger experience. On the other hand, Spirit Airlines and United Airlines appear to have room for improvement in various areas to enhance customer satisfaction and sentiment.
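
The ranking and the per-category lows discussed above can be checked in code rather than read off the table. The following is a minimal sketch, assuming a small hand-typed subset of the aggregated table (rounded values, three airlines, three columns) instead of the full `aggregated_data`; the names `summary`, `ranked`, `lowest_per_category`, and `negative_count` are illustrative and do not appear in the original notebook.

```python
import pandas as pd

# Rounded values copied from the aggregated table (three airlines, three columns only).
summary = pd.DataFrame({
    "Airlines": ["emirates", "qatar-airways", "spirit-airlines"],
    "rating": [5.11, 7.30, 1.66],
    "Seat Comfort": [3.26, 3.95, 1.50],
    "roberta_polarity": [-0.12, 0.28, -0.59],
})

# Rank airlines from highest to lowest mean rating.
ranked = summary.sort_values("rating", ascending=False)["Airlines"].tolist()

# For each numeric column, find the airline with the lowest mean.
lowest_per_category = summary.set_index("Airlines").idxmin()

# Count the airlines whose mean polarity is below zero.
negative_count = int((summary["roberta_polarity"] < 0).sum())

print(ranked)
print(lowest_per_category)
print(negative_count)
```

Applied to the full aggregated table, the same three steps would presumably reproduce the ranking and the per-category lows listed in the findings.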

# Stats of Ratings

In [8]:
df['rating'].describe()

count    13144.000000
mean         3.970861
std          3.465944
min          1.000000
25%          1.000000
50%          2.000000
75%          8.000000
max         10.000000
Name: rating, dtype: float64

**Overview of Conclusions:**

1. **Skewed Distribution**: Ratings are concentrated at the low end of the scale. The median (2) sits well below the mean (3.97), and the 25th percentile equals the minimum score of 1. This suggests that a large share of reviewers had negative experiences.

2. **Variability in Ratings**: The high standard deviation (3.47 on a 1-10 scale) indicates considerable variability in reviewers' opinions. This variability could stem from differences in individual experiences, expectations, and preferences.

3. **Positive Skewness**: A mean above the median is consistent with positive skewness: most ratings sit at the lower end of the scale, with a longer tail towards higher ratings. The 75th percentile of 8 shows that this tail is substantial, so the distribution may be closer to two clusters (very low and high ratings) than to a single peak.

4. **Potential Issues**: Because the 25th percentile is 1, at least a quarter of all ratings are the lowest possible score, which may signal issues or areas for improvement in the flight experiences provided by the airlines.
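
The direction of the skew can be computed instead of inferred from the summary statistics. The following is a minimal sketch, assuming a small made-up sample of ratings (not the dataset); on the real data the same calls would be applied to `df['rating']`. The variable names are illustrative.

```python
import pandas as pd

# Made-up sample for illustration only: mostly low ratings with a few high ones.
ratings = pd.Series([1, 1, 1, 1, 2, 2, 3, 8, 9, 10], dtype=float)

mean_rating = ratings.mean()
median_rating = ratings.median()

# Sample skewness; a positive value means a longer tail towards high ratings.
skewness = ratings.skew()

# Share of ratings equal to the minimum score.
share_at_min = (ratings == ratings.min()).mean()

print(mean_rating, median_rating, skewness, share_at_min)
```

A mean above the median together with a positive `skew()` value is the pattern described in the conclusions above; `value_counts()` on the same column would show whether the high ratings form a second cluster.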

# Percentage of recommended flights

In [9]:
percentage_recommended = (df['Recommended'].sum() / len(df)) * 100
print("Percentage of Recommended Flights:", percentage_recommended)

Percentage of Recommended Flights: 33.285149117468045


**Overview of Conclusions:**

1. **Recommendation Rate**: A recommendation rate of about 33.3% means that roughly one reviewer in three recommends the flight, which suggests a low level of customer satisfaction.

2. **Room for Improvement**: With the recommendation rate well below 50%, most reviewers do not recommend their flight, so airlines have clear scope to improve their services and address issues in order to raise satisfaction and the likelihood of recommendations.
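
The overall rate hides differences between airlines, so a per-airline breakdown is a natural follow-up. The following is a minimal sketch, assuming made-up rows and assuming that `Recommended` is coded as 1 for yes and 0 for no (which the sum-based calculation above implies); `sample`, `overall_rate`, and `rate_by_airline` are illustrative names.

```python
import pandas as pd

# Made-up rows for illustration; 'Recommended' is assumed to be 1 (yes) or 0 (no).
sample = pd.DataFrame({
    "Airlines": ["qatar-airways", "qatar-airways", "ryanair", "ryanair", "ryanair", "ryanair"],
    "Recommended": [1, 1, 0, 1, 0, 0],
})

# The mean of a 0/1 column is the share of ones, so multiplying by 100 gives a percentage.
overall_rate = sample["Recommended"].mean() * 100

# The same calculation, grouped by airline.
rate_by_airline = sample.groupby("Airlines")["Recommended"].mean() * 100

print(round(overall_rate, 2))
print(rate_by_airline.round(2))
```

Taking the mean of the 0/1 column is equivalent to dividing the sum by the number of rows, as the cell above does.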

# Stats of Review length

In [10]:
df['review_length'].describe()

count    13144.000000
mean       146.660149
std        105.741992
min         15.000000
25%         75.000000
50%        117.000000
75%        185.000000
max        957.000000
Name: review_length, dtype: float64

**Overview of Conclusions**:

1. **Varied Review Lengths**: Review lengths, measured in words, range from 15 to 957. Reviewers therefore provide feedback at very different levels of detail.

2. **Average Review Length**: The average review is approximately 146.7 words long, so reviewers tend to give fairly concise feedback. The mean is above the median of 117 words, which indicates that a small number of very long reviews pull the average up.

3. **Distribution of Review Lengths**: The quartiles (25th, 50th, and 75th percentiles) show that most reviews are relatively short: half are 117 words or fewer, and 75% are 185 words or fewer.
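
The notebook does not show how `review_length` was derived. The following is a minimal sketch of one way to obtain a word count and its quartiles, assuming whitespace splitting as the definition of a word and using made-up review texts; `reviews`, `word_counts`, and `quartiles` are illustrative names.

```python
import pandas as pd

# Made-up reviews for illustration only.
reviews = pd.Series([
    "Flight was late",
    "Great crew and very comfortable seats on board",
    "Lost my luggage and nobody at the desk could help me find it",
])

# Word count per review, splitting on whitespace (an assumed definition of review_length).
word_counts = reviews.str.split().str.len()

# Quartiles of the word counts.
quartiles = word_counts.quantile([0.25, 0.5, 0.75])

print(word_counts.tolist())
print(quartiles)
```

A different tokenization (for example, one that treats punctuation separately) would give slightly different counts, so the figures above depend on whichever definition the dataset actually used.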