airline

⚡ 🚀 Airline negative comments analysis

The objective of this project is to identify the most important issues faced by airline companies based on customers’ negative reviews.

📝 Table of Contents

About
Data Cleaning
Classification
Result of Classification
Topic Modeling
Conclusion

🧐 About

The investigation is carried out through data analysis with Python scripts. First, data cleaning narrows the dataset down to three major airline companies. Then, the reviews are classified with Naïve Bayes and Decision Tree models, and a confusion matrix is used to check the accuracy of the predictions. Lastly, the results are analyzed and compared using Chunking, Word Cloud, and Topic Modeling.

Prerequisites

SQL, Tableau, Python or Jupyter Notebook

🔖 Data Cleaning

The dataset is downloaded from Kaggle.com. It has a total of 14 columns and 27,284 rows, with no null values. From this dataset, the top three airlines are chosen for analysis: Air Canada Rouge, British Airways, and United Airlines.
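
The narrowing step could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the notebook's actual code. It assumes that "top three" means the three airlines with the most reviews; the sample rows and the helper names top_airlines and keep_airlines are made up for the example.

```python
from collections import Counter

# Made-up review counts standing in for the Kaggle dataset (illustration only).
sample_counts = {
    "United Airlines": 4,
    "British Airways": 3,
    "Air Canada Rouge": 2,
    "Example Air": 1,  # hypothetical airline that should be filtered out
}
rows = [
    {"airline_name": name, "content": "placeholder review"}
    for name, n in sample_counts.items()
    for _ in range(n)
]


def top_airlines(rows, n=3):
    """Return the names of the n airlines with the most reviews."""
    counts = Counter(row["airline_name"] for row in rows)
    return [name for name, _ in counts.most_common(n)]


def keep_airlines(rows, names):
    """Keep only the rows whose airline is in `names`."""
    wanted = set(names)
    return [row for row in rows if row["airline_name"] in wanted]


top3 = top_airlines(rows)
subset = keep_airlines(rows, top3)
print(top3)         # ['United Airlines', 'British Airways', 'Air Canada Rouge']
print(len(subset))  # 9
```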

Before the analysis, I used Python NLTK, SQL, and Tableau to check the overall reviews of the airline industry. The word “good” appeared most often in customer reviews, with almost 16,000 occurrences.
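
A word-frequency check of this kind might look like the sketch below. It is an assumption-laden stand-in: a simple regular expression replaces the NLTK tokenizer, the two sample reviews are invented, and word_counts with its optional stop_words argument is a hypothetical helper rather than code from the notebooks.

```python
import re
from collections import Counter


def word_counts(reviews, stop_words=frozenset()):
    """Count lowercase word occurrences across all reviews."""
    counts = Counter()
    for text in reviews:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts


# Invented sample reviews (illustration only).
reviews = [
    "Good flight and good food.",
    "The crew was good but the seat was bad.",
]
counts = word_counts(reviews)
print(counts.most_common(1))  # [('good', 3)]
```

On real review text, common function words such as "the" would likely dominate, so a stop-word set would normally be passed in before reading off the top words.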

SQL:

Tableau:

🌱 Classification

Deeper mining is done to explore why customers give negative reviews. Three attributes are kept for classification and topic modeling: airline_name, content, and recommended. In the recommended attribute, 0 represents a negative review and 1 represents a positive review.

Here are the total counts of the negative and positive reviews of each airline. United Airlines and Air Canada Rouge have significantly more negative reviews than positive ones. It is critical to investigate what causes this result and how to improve it.

A confusion table is created to test the accuracy of the classification results. To begin, two attributes are extracted from the data frame: content and recommended. Then, the reviews are converted to a list of lists.

Here are the object sets for each airline, which are lists of tuples. The review contents are broken down into individual words, and these words are labeled as neg or pos. Only adjectives are selected for analysis, to eliminate background noise words.

Tagged words:

Extracted adjectives:
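
The adjective filter and labeling described above could be sketched as follows. The sketch assumes the words are already part-of-speech tagged with Penn Treebank tags (the tag set NLTK's default tagger emits), where JJ, JJR, and JJS mark adjectives. The tagged sample is written by hand, and the function names and the {adjective: True} feature layout are illustrative assumptions.

```python
def extract_adjectives(tagged_words):
    """Keep words whose Penn Treebank tag marks an adjective (JJ, JJR, JJS)."""
    return [word.lower() for word, tag in tagged_words if tag.startswith("JJ")]


def to_feature_set(tagged_words, recommended):
    """Pair an {adjective: True} feature dict with a 'pos' or 'neg' label."""
    label = "pos" if recommended == 1 else "neg"
    features = {adj: True for adj in extract_adjectives(tagged_words)}
    return features, label


# Hand-tagged review (illustration only).
tagged = [
    ("The", "DT"), ("seat", "NN"), ("was", "VBD"), ("uncomfortable", "JJ"),
    ("and", "CC"), ("the", "DT"), ("crew", "NN"), ("was", "VBD"), ("rude", "JJ"),
]
print(to_feature_set(tagged, recommended=0))
# ({'uncomfortable': True, 'rude': True}, 'neg')
```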

Classification starts after each feature set is defined. First, training and testing sets are generated with an 80/20 split. Then, Naïve Bayes and Decision Tree classifiers are trained, and their results and accuracy are compared.
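
To make the split and the Naïve Bayes step concrete, here is a small from-scratch sketch in plain Python. It is not the notebook's implementation (which most likely relies on a library classifier), it omits the Decision Tree, and it does not reproduce the accuracy figures reported below. The labeled adjective lists are invented, and add-one smoothing is a choice made for this example.

```python
import math
import random
from collections import Counter, defaultdict


def split_80_20(labeled_reviews, seed=0):
    """Shuffle a copy of the data and split it 80/20 into train and test."""
    data = list(labeled_reviews)
    random.Random(seed).shuffle(data)
    cut = int(len(data) * 0.8)
    return data[:cut], data[cut:]


def train_naive_bayes(train_set):
    """Collect label counts and per-label word counts from (words, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    for words, label in train_set:
        label_counts[label] += 1
        word_counts[label].update(set(words))
    vocab = {w for counter in word_counts.values() for w in counter}
    return label_counts, word_counts, vocab


def classify(model, words):
    """Return the label with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in set(words):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, test_set):
    """Fraction of test reviews whose predicted label matches the true label."""
    correct = sum(classify(model, words) == label for words, label in test_set)
    return correct / len(test_set)


# Invented adjective lists with labels (illustration only).
data = [
    (["rude", "terrible"], "neg"), (["worst", "uncomfortable"], "neg"),
    (["awful", "rude"], "neg"), (["terrible", "worst"], "neg"),
    (["disappointed"], "neg"), (["good", "friendly"], "pos"),
    (["great", "comfortable"], "pos"), (["good", "great"], "pos"),
    (["friendly", "comfortable"], "pos"), (["excellent"], "pos"),
]
train, test = split_80_20(data)
model = train_naive_bayes(train)
print(len(train), len(test))  # 8 2
print(accuracy(model, test))
```

With only ten toy reviews the printed accuracy is not meaningful; the point is the shape of the workflow: split, train on the larger part, and score on the held-out part.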

🌳 Result of classification and confusion table:

In classification, Naïve Bayes and Decision Tree are used for testing accuracy, and confusion tables are built to visualize how each algorithm's predictions compare with the actual labels. In addition, Chunking and Word Cloud are used to extract informative words from customers’ negative reviews.
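
As a rough illustration of how a confusion table relates to accuracy, the sketch below counts (actual, predicted) label pairs and derives accuracy from them. The label lists are invented and do not correspond to any airline's results; confusion_table and table_accuracy are hypothetical helper names.

```python
from collections import Counter


def confusion_table(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def table_accuracy(table):
    """Share of reviews whose predicted label matches the actual label."""
    correct = sum(n for (act, pred), n in table.items() if act == pred)
    return correct / sum(table.values())


# Invented labels (illustration only).
actual    = ["neg", "neg", "neg", "pos", "pos", "neg", "pos", "neg", "pos", "neg"]
predicted = ["neg", "neg", "pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]

table = confusion_table(actual, predicted)
print(table[("neg", "pos")])  # 1 negative review predicted as positive
print(table[("pos", "neg")])  # 1 positive review predicted as negative
print(table_accuracy(table))  # 0.8
```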

Air Canada Rouge:

For Air Canada Rouge, the accuracy of the two models is high, at around 90%. According to the confusion table, five negative reviews were predicted as positive, and eight positive reviews were predicted as negative. The most informative word for this airline is “uncomfortable”. From the Chunking and Word Cloud results, many customers complained about uncomfortable seats and limited leg room.

British Airways:

British Airways has an accuracy of 80% from the two models. Nineteen negative reviews were predicted as positive, and twenty-six positive reviews were predicted as negative. The most informative words for this airline are “awful”, “terrible”, “worst”, “uncomfortable”, and “disappointed”. From the Chunking and Word Cloud results, customers mainly complained about the seat, food, and schedule delays.

United Airlines:

United Airlines has an accuracy of 85% from the two models. Thirty-one negative reviews were predicted as positive, and forty positive reviews were predicted as negative. The overall accuracy is good, and the most informative words for this airline are “worst”, “terrible”, and “rude”. From the Chunking and Word Cloud results, customers mainly complained about the seat, food, and customer service.

🌽 Topic Modeling

Topic Modeling is also performed for comparison with the Chunking results. First, the review contents are broken into individual words, which are used to initialize a dictionary. Then, a corpus is generated, which is a library of words. Lastly, an LDA model is used to get the weight of each word in the negative reviews.
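
The dictionary and corpus steps can be sketched in plain Python as shown here. The (word id, count) layout mirrors the bag-of-words format that libraries such as gensim expect as LDA input, but the README does not name the library used, so treat that as an assumption. The tokenized reviews are invented, and fitting the LDA model itself is left out of this sketch.

```python
from collections import Counter


def build_dictionary(tokenized_reviews):
    """Assign an integer id to every distinct word, in order of first appearance."""
    word_to_id = {}
    for tokens in tokenized_reviews:
        for token in tokens:
            word_to_id.setdefault(token, len(word_to_id))
    return word_to_id


def to_bag_of_words(tokens, word_to_id):
    """Represent one review as sorted (word_id, count) pairs."""
    counts = Counter(word_to_id[t] for t in tokens if t in word_to_id)
    return sorted(counts.items())


# Invented tokenized negative reviews (illustration only).
reviews = [
    ["seat", "leg", "room", "seat"],
    ["food", "seat", "delay"],
]
dictionary = build_dictionary(reviews)
corpus = [to_bag_of_words(tokens, dictionary) for tokens in reviews]
print(dictionary)  # {'seat': 0, 'leg': 1, 'room': 2, 'food': 3, 'delay': 4}
print(corpus)      # [[(0, 2), (1, 1), (2, 1)], [(0, 1), (3, 1), (4, 1)]]
```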

The circles shown below represent the topics found in the corpus, and the distances between the circles represent their similarity. For Air Canada Rouge, the main keywords are “seat”, “leg”, and “back”; for British Airways, the main keywords are “seat”, “food”, “time”, and “hours”; for United Airlines, the main keywords are “seat”, “service”, and “delay”. These words are very similar to the earlier Chunking and Word Cloud results.

Air Canada Rouge:

British Airways:

United Airlines:

🎉 Conclusion

In conclusion, this data analysis project has identified the major customer complaints for the top three airline companies. They fall into four categories: seat comfort, food quality, customer service, and on-time performance. An airline that provides comfortable seats, high-quality food, exceptional customer service, and on-time flights would earn higher customer satisfaction and better reviews, and would be well placed to succeed in this competitive industry.
