### Airline review analysis
The aviation industry depends heavily on the ratings and reviews that customers leave. Good ratings attract more customers, and poor ratings drive them away. For this reason, most airlines analyse the reviews left by their clients to assess their position in the market.

Airlines analyse the numerical ratings as well as the written reviews. Management and the board of directors rely on the data science team for insights drawn from this customer data, and they follow up on both the positive and the negative issues that customers raise. The analysis is not only about complaints: positive feedback can also inform decisions such as promoting employees or giving pilots pay rises based on customer ratings.

### Business Objectives 
Given the airline data, we seek to:
1. Develop a classification model that classifies the reviews as negative, neutral or positive.
2. Find the highest-rated airlines for each year among those that also offer good comfort.
3. Find the most luxurious airline in terms of seat comfort, food served, Wi-Fi connectivity and inflight entertainment.
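The first objective needs target classes before any model can be trained. A minimal sketch, assuming the classes are derived from the dataset's `Overall_Rating` column (1 to 10) with hypothetical cut-offs of 1 to 4 for negative, 5 to 6 for neutral and 7 to 10 for positive. The cut-offs and the `rating_to_sentiment` helper are illustrative assumptions, not the notebook's method; values that cannot be read as integers map to `None` so those rows can be dropped.

```python
from typing import Optional

# Hypothetical cut-offs on the 1-10 Overall_Rating scale (an assumption, tune as needed)
NEGATIVE_MAX = 4
NEUTRAL_MAX = 6


def rating_to_sentiment(rating) -> Optional[str]:
    """Map an Overall_Rating value (int or numeric string) to a sentiment class.

    Returns None for values that cannot be parsed as an integer
    (e.g. missing or non-numeric entries), so they can be filtered out.
    """
    try:
        value = int(rating)
    except (TypeError, ValueError):
        return None
    if value <= NEGATIVE_MAX:
        return "negative"
    if value <= NEUTRAL_MAX:
        return "neutral"
    return "positive"


# Ratings of the five rows previewed under Data Understanding
print([rating_to_sentiment(r) for r in [9, 1, 1, 1, 1]])
# ['positive', 'negative', 'negative', 'negative', 'negative']
```

With pandas, the labels could be attached with `origin_df["Overall_Rating"].map(rating_to_sentiment)` and rows labelled `None` dropped before vectorising the review text.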

### Import libraries

In [2]:
```python
# Data handling
import pandas as pd
import numpy as np

# Visualisation
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Modelling and evaluation
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Text preprocessing (NLTK data such as the stopword list must be
# downloaded once, e.g. with nltk.download('stopwords'))
import re
import spacy
import nltk
import inflect
import gensim
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

sns.set_style('white')
```

### Data Understanding

In [3]:
```python
origin_df = pd.read_csv("Airline_review.csv")
origin_df.head()
```

|  | Unnamed: 0 | Airline Name | Overall_Rating | Review_Title | Review Date | Verified | Review | Aircraft | Type Of Traveller | Seat Type | Route | Date Flown | Seat Comfort | Cabin Staff Service | Food & Beverages | Ground Service | Inflight Entertainment | Wifi & Connectivity | Value For Money | Recommended |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | AB Aviation | 9 | "pretty decent airline" | 11th November 2019 | True | Moroni to Moheli. Turned out to be a pretty ... | NaN | Solo Leisure | Economy Class | Moroni to Moheli | November 2019 | 4.0 | 5.0 | 4.0 | 4.0 | NaN | NaN | 3.0 | yes |
| 1 | 1 | AB Aviation | 1 | "Not a good airline" | 25th June 2019 | True | Moroni to Anjouan. It is a very small airline... | E120 | Solo Leisure | Economy Class | Moroni to Anjouan | June 2019 | 2.0 | 2.0 | 1.0 | 1.0 | NaN | NaN | 2.0 | no |
| 2 | 2 | AB Aviation | 1 | "flight was fortunately short" | 25th June 2019 | True | Anjouan to Dzaoudzi. A very small airline an... | Embraer E120 | Solo Leisure | Economy Class | Anjouan to Dzaoudzi | June 2019 | 2.0 | 1.0 | 1.0 | 1.0 | NaN | NaN | 2.0 | no |
| 3 | 3 | Adria Airways | 1 | "I will never fly again with Adria" | 28th September 2019 | False | Please do a favor yourself and do not fly wi... | NaN | Solo Leisure | Economy Class | Frankfurt to Pristina | September 2019 | 1.0 | 1.0 | NaN | 1.0 | NaN | NaN | 1.0 | no |
| 4 | 4 | Adria Airways | 1 | "it ruined our last days of holidays" | 24th September 2019 | True | Do not book a flight with this airline! My fr... | NaN | Couple Leisure | Economy Class | Sofia to Amsterdam via Ljubljana | September 2019 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | no |
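The `Review Date` values in the preview use ordinal day numbers such as `11th November 2019`, which `pd.to_datetime` cannot parse with a plain `%d %B %Y` format. For the second objective (highest-rated airlines per year), a minimal sketch strips the ordinal suffix, extracts the year, and keeps the airline with the best mean `Overall_Rating` in each year among airlines whose mean `Seat Comfort` clears a threshold. Using `Review Date` rather than `Date Flown` (which has missing values), the `min_comfort=4.0` threshold, `min_reviews`, and the helper names are all assumptions for illustration; the toy rows use made-up airlines and values.

```python
import pandas as pd


def parse_review_date(dates: pd.Series) -> pd.Series:
    """Convert strings like '11th November 2019' to datetimes (NaT if unparseable)."""
    # Drop the ordinal suffix: '11th' becomes '11', '2nd' becomes '2'
    cleaned = dates.str.replace(r"(\d{1,2})(st|nd|rd|th)", r"\1", regex=True)
    return pd.to_datetime(cleaned, format="%d %B %Y", errors="coerce")


def top_airline_per_year(df: pd.DataFrame, min_reviews: int = 1,
                         min_comfort: float = 4.0) -> pd.DataFrame:
    """Highest mean Overall_Rating per review year, among airlines whose mean
    Seat Comfort is at least `min_comfort` (a hypothetical threshold)."""
    data = df.assign(
        year=parse_review_date(df["Review Date"]).dt.year,
        rating=pd.to_numeric(df["Overall_Rating"], errors="coerce"),
    )
    stats = (
        data.groupby(["year", "Airline Name"])
        .agg(mean_rating=("rating", "mean"),
             mean_comfort=("Seat Comfort", "mean"),
             n_reviews=("rating", "count"))
        .reset_index()
    )
    eligible = stats[(stats["n_reviews"] >= min_reviews)
                     & (stats["mean_comfort"] >= min_comfort)]
    # Sort so the best-rated airline comes first within each year, then keep it
    best = (eligible.sort_values(["year", "mean_rating"], ascending=[True, False])
            .drop_duplicates(subset="year"))
    return best.reset_index(drop=True)


# Toy rows with made-up airlines and values, using the dataset's column names
toy = pd.DataFrame({
    "Airline Name": ["Airline A", "Airline B", "Airline A", "Airline B"],
    "Overall_Rating": ["9", "1", "7", "10"],
    "Review Date": ["11th November 2019", "24th September 2019",
                    "2nd March 2020", "3rd May 2020"],
    "Seat Comfort": [4.0, 1.0, 5.0, 2.0],
})
print(top_airline_per_year(toy))
```

In the toy data, Airline B's 2020 rating of 10 is ignored because its mean seat comfort is only 2.0, so Airline A is returned for both years. On the full data, a larger `min_reviews` avoids ranking an airline on a single review.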


In [5]:
```python
origin_df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23171 entries, 0 to 23170
Data columns (total 20 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Unnamed: 0              23171 non-null  int64  
 1   Airline Name            23171 non-null  object 
 2   Overall_Rating          23171 non-null  object 
 3   Review_Title            23171 non-null  object 
 4   Review Date             23171 non-null  object 
 5   Verified                23171 non-null  bool   
 6   Review                  23171 non-null  object 
 7   Aircraft                7129 non-null   object 
 8   Type Of Traveller       19433 non-null  object 
 9   Seat Type               22075 non-null  object 
 10  Route                   19343 non-null  object 
 11  Date Flown              19417 non-null  object 
 12  Seat Comfort            19016 non-null  float64
 13  Cabin Staff Service     18911 non-null  float64
 14  Food & Beverages        14500 non-null  float64
 15  Ground Service          18378 non-null  float64
 16  Inflight Entertainment  10829 non-null  float64
 17  Wifi & Connectivity     5920 non-null   float64
 18  Value For Money         22105 non-null  float64
 19  Recommended             23171 non-null  object 
dtypes: bool(1), float64(7), int64(1), object(11)
```
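`info()` reports `Overall_Rating` as `object` rather than a numeric dtype, even though the ratings in the `head()` preview are integers, which suggests that some entries are not numeric. Before the column is used as a label source or averaged, a sketch like the following can list the offending values and convert the rest to a nullable integer column. The helper names are hypothetical, and the sample value `"n"` is a made-up stand-in for whatever non-numeric entries the real column holds.

```python
import pandas as pd


def non_numeric_values(ratings: pd.Series) -> list:
    """Distinct non-missing values that pd.to_numeric cannot parse."""
    numeric = pd.to_numeric(ratings, errors="coerce")
    return sorted(ratings[numeric.isna() & ratings.notna()].astype(str).unique())


def coerce_rating(ratings: pd.Series) -> pd.Series:
    """Convert ratings to pandas' nullable Int64 dtype; unparseable values become <NA>."""
    return pd.to_numeric(ratings, errors="coerce").astype("Int64")


# Hypothetical sample: "n" is a placeholder for a non-numeric entry
sample = pd.Series(["9", "1", "n", "10"])
print(non_numeric_values(sample))  # ['n']
print(coerce_rating(sample))
```

Applied to the real data, `non_numeric_values(origin_df["Overall_Rating"])` shows what needs attention, and `coerce_rating` keeps missing ratings as `<NA>` instead of forcing the whole column to float.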

In [4]:
```python
origin_df.isna().sum()
```

```text
Unnamed: 0                    0
Airline Name                  0
Overall_Rating                0
Review_Title                  0
Review Date                   0
Verified                      0
Review                        0
Aircraft                  16042
Type Of Traveller          3738
Seat Type                  1096
Route                      3828
Date Flown                 3754
Seat Comfort               4155
Cabin Staff Service        4260
Food & Beverages           8671
Ground Service             4793
Inflight Entertainment    12342
Wifi & Connectivity       17251
Value For Money            1066
Recommended                   0
dtype: int64
```
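The amenity columns behind the third objective are sparse: `Wifi & Connectivity` is missing in 17251 of the 23171 rows (about 74%) and `Inflight Entertainment` in 12342 (about 53%). Dropping incomplete rows would discard most of the data, so one option is to score each review by the mean of whichever amenity ratings it does contain, then rank airlines by the mean of that score. The `luxury_ranking` helper, the equal weighting of the four columns and the `min_reviews` cut-off are hypothetical choices, not the notebook's method; the toy rows are made up.

```python
import pandas as pd

# The four amenity columns named in the third business objective
LUXURY_COLS = ["Seat Comfort", "Food & Beverages",
               "Wifi & Connectivity", "Inflight Entertainment"]


def luxury_ranking(df: pd.DataFrame, min_reviews: int = 1) -> pd.DataFrame:
    """Rank airlines by the mean of a per-review luxury score.

    Each review's score is the mean of the LUXURY_COLS ratings it has
    (missing values are skipped); reviews with none of them are ignored.
    """
    scored = df.assign(luxury_score=df[LUXURY_COLS].mean(axis=1, skipna=True))
    scored = scored.dropna(subset=["luxury_score"])
    ranking = (
        scored.groupby("Airline Name")["luxury_score"]
        .agg(["mean", "count"])
        .rename(columns={"mean": "mean_luxury", "count": "n_reviews"})
    )
    ranking = ranking[ranking["n_reviews"] >= min_reviews]
    return ranking.sort_values("mean_luxury", ascending=False)


# Toy rows with made-up airlines and ratings
toy = pd.DataFrame({
    "Airline Name": ["Airline A", "Airline A", "Airline B"],
    "Seat Comfort": [4.0, 2.0, 1.0],
    "Food & Beverages": [4.0, None, 1.0],
    "Wifi & Connectivity": [None, None, 1.0],
    "Inflight Entertainment": [None, 4.0, 1.0],
})
print(luxury_ranking(toy))
```

Averaging only the ratings a review provides keeps most rows, but it can favour airlines that are rarely rated on Wi-Fi or entertainment; a larger `min_reviews`, or ranking each amenity separately, are possible alternatives.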

In [7]:
```python
for col in origin_df.columns:
    print(f"{col} has {origin_df[col].nunique()} unique values")
```

```text
Unnamed: 0 has 23171 unique values
Airline Name has 497 unique values
Overall_Rating has 10 unique values
Review_Title has 17219 unique values
Review Date has 4557 unique values
Verified has 2 unique values
Review has 23046 unique values
Aircraft has 1048 unique values
Type Of Traveller has 4 unique values
Seat Type has 4 unique values
Route has 13607 unique values
Date Flown has 109 unique values
Seat Comfort has 6 unique values
Cabin Staff Service has 6 unique values
Food & Beverages has 6 unique values
Ground Service has 5 unique values
Inflight Entertainment has 6 unique values
Wifi & Connectivity has 6 unique values
Value For Money has 6 unique values
Recommended has 2 unique values
```
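The counts show 23046 distinct `Review` texts across 23171 rows, and `Review` has no missing values, so 125 rows repeat a review text that already appears elsewhere in the data. If the same text ends up in both the training and the test split, the classifier's test score will be inflated, so it is worth inspecting and removing these rows before modelling. A minimal sketch, with hypothetical helper names and made-up toy rows:

```python
import pandas as pd


def duplicate_reviews(df: pd.DataFrame, text_col: str = "Review") -> pd.DataFrame:
    """Return every row whose review text occurs more than once, grouped together."""
    mask = df.duplicated(subset=text_col, keep=False)
    return df[mask].sort_values(text_col)


def drop_duplicate_reviews(df: pd.DataFrame, text_col: str = "Review") -> pd.DataFrame:
    """Keep only the first occurrence of each review text."""
    return df.drop_duplicates(subset=text_col, keep="first").reset_index(drop=True)


# Toy rows with made-up airlines and reviews
toy = pd.DataFrame({
    "Airline Name": ["Airline A", "Airline B", "Airline A"],
    "Review": ["Great flight", "Late again", "Great flight"],
})
print(duplicate_reviews(toy))
print(drop_duplicate_reviews(toy))
```

On the loaded data, `drop_duplicate_reviews(origin_df)` should leave 23046 rows, one per distinct review text. Exact-match deduplication will not catch reviews that differ only in whitespace or punctuation.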
