# Emirates Review Project

This project was created by **Ahmad Ulfi Jihad Dzulqornain** as a data analyst portfolio piece. <br>

Our adventure begins with a visit to the beloved website AirlineQuality.com, where we'll immerse ourselves in a treasure trove of reviews from Emirates passengers. But before we set off, let's make sure we have all the necessary tools in our arsenal:


_______

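First things first: the project relies on the packages below. Selenium, BeautifulSoup4, Pandas, and NumPy need to be installed; Pickle and datetime ship with Python's standard library.
- Selenium 🕸️
- BeautifulSoup4 🍲
- Pandas 🐼
- NumPy 🔢
- Pickle 🥒
- datetime 📅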

Join me on this thrilling adventure through the Emirates Review Project, where we'll soar through the skies of data analysis, discovering the highs, the lows, and everything in between. Let's unravel the stories, sentiments, and trends that shape the Emirates airline experience.

```Note: This project is a showcase of my data analysis skills and is not officially affiliated with Emirates or AirlineQuality.com.✍️```

In [1]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
from datetime import datetime

import pandas as pd
import numpy as np
import pickle


The scraper extracts the following details from each review:<br><br>
- **Name** 👤: The reviewer's name.<br>
- **Country** 🌍: The reviewer's country.<br>
- **Date** 📅: The date the review was posted.<br>
- **Overall Rating** ⭐️: A rating out of 10, representing the overall experience.<br>
- **Headline** 📰: A concise summary of the review.<br>
- **Review** 💬: The detailed review text.<br>
- **Verified** ✅: Whether the review is marked as a verified trip.<br>
- **Aircraft** ✈️: The specific aircraft used.<br>
- **Type of Traveller** 🎒: The category of the traveller (e.g., business, leisure).<br>
- **Seat Type** 🪑: The type of seat chosen.<br>
- **Route** 🗺️: The flight route taken.<br>
- **Seat Comfort Rating** 😌: A star rating out of 5 for seat comfort.<br>
- **Cabin Staff Service Rating** 🙌: A star rating out of 5 for the cabin staff's service.<br>
- **FnB Rating** 🍽️: A star rating out of 5 for food and beverage services.<br>
- **Inflight Entertainment Rating** 🎥: A star rating out of 5 for inflight entertainment.<br>
- **Ground Service Rating** 🛫: A star rating out of 5 for ground services.<br>
- **Wifi & Connectivity Rating** 📶: A star rating out of 5 for wifi and connectivity.<br>
- **Value for Money Rating** 💰: A star rating out of 5 for the value provided.<br>
- **Recommended** 👍👎: Whether the reviewer would recommend the airline (Yes/No).<br><br>

The star ratings are stored as the number of filled stars, so a value of 0 means the category was not rated in that review.

Now, with all these insightful details, readers can dive into a richer understanding of the reviews! 😊🚀

The code below uses the Selenium and BeautifulSoup libraries to scrape airline reviews from the website [https://www.airlinequality.com/airline-reviews/emirates/](https://www.airlinequality.com/airline-reviews/emirates/).

**Code Flow:**
- Initializes a Chrome WebDriver session.
- Navigates to the specified URL.
- Parses the rendered page source with BeautifulSoup.

This code enables efficient scraping of airline reviews for further analysis.

🌟 By default the website displays only 10 reviews per page, which is limiting. To load the complete set in a single request, the `pagesize` query parameter in the URL is set to 2269, large enough to cover every review available at the time of scraping. ✨<br><br>

In [14]:
# Specify the URL to scrape
url = 'https://www.airlinequality.com/airline-reviews/emirates/page/1/?sortby=post_date%3ADesc&pagesize=2269'
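The query string above can also be assembled programmatically, which makes the two parameters easier to change. The sketch below uses only the standard library; the helper name `build_reviews_url` and the constant `BASE_URL` are assumptions introduced for illustration, while `sortby` and `pagesize` are the parameters visible in the URL itself.

```python
from urllib.parse import urlencode

# Base path of the first review page, taken from the URL used in this notebook
BASE_URL = 'https://www.airlinequality.com/airline-reviews/emirates/page/1/'


def build_reviews_url(page_size, sort_by='post_date:Desc'):
    """Return the reviews URL for a given page size and sort order (hypothetical helper)."""
    # urlencode percent-encodes the ':' in the sort key as %3A
    query = urlencode({'sortby': sort_by, 'pagesize': page_size})
    return f'{BASE_URL}?{query}'


url = build_reviews_url(2269)
print(url)
```

Requesting one very large page keeps the scraper to a single page load, at the cost of a heavier response.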

In [15]:
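# Set the path for the chromedriver executable
path_chromedriver = "./chromedriver_win32/chromedriver.exe"

# Initialize Chrome WebDriver. Selenium 4 expects the driver path wrapped in a
# Service object; the alias avoids clashing with the `Service` list created later.
from selenium.webdriver.chrome.service import Service as ChromeService
driver = webdriver.Chrome(service=ChromeService(path_chromedriver))

# Open the URL in the driver
driver.get(url)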



In [16]:
# Create BeautifulSoup object for parsing the page source
result = BeautifulSoup(driver.page_source, 'html.parser')

# Quit the driver to end the browser session and the chromedriver process
driver.quit()

In [17]:
# Save the result as a .pkl file
with open('result.pkl', 'wb') as file:
    pickle.dump(result, file)

In [2]:
# Open the .pkl file in binary mode
with open('result.pkl', "rb") as file:
    result = pickle.load(file)
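Pickling the parsed tree works here, but pickling a `BeautifulSoup` object built from a very deeply nested document can exceed Python's recursion limit. One alternative, sketched below on the assumption that the raw markup is available as a string (as `driver.page_source` is), is to cache the HTML text and re-parse it when needed. The file name `result.html` and both helper names are hypothetical.

```python
import tempfile
from pathlib import Path


def save_html(html, path):
    """Write the raw page source to disk as UTF-8 text."""
    Path(path).write_text(html, encoding='utf-8')


def load_html(path):
    """Read the cached page source back as a string."""
    return Path(path).read_text(encoding='utf-8')


# Stand-in for driver.page_source, used only to demonstrate the round trip
sample_html = '<html><body><span itemprop="name">Example Reviewer</span></body></html>'

with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / 'result.html'
    save_html(sample_html, cache_path)
    restored = load_html(cache_path)

# `restored` can then be passed to BeautifulSoup(restored, 'html.parser')
print(restored == sample_html)
```

A plain HTML file is also readable in a browser or text editor, which helps when a selector stops matching.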

In [37]:
# Retrieve User Name
name_element = result.find_all('span',itemprop='name')
name = [element.text for element in name_element]
len(name)

2269

In [36]:
# Retrieve Country Name
country_element = result.find_all('span',itemprop="author")
country = [element.next_sibling.strip(' ()"') for element in country_element]
len(country)

2269

In [34]:
# Retrieve Date Published
date_element = result.find_all('time',itemprop='datePublished')
date = [element['datetime'] for element in date_element]
# Change type
dates = [datetime.strptime(string_dates,'%Y-%m-%d')
         for string_dates in date]
len(dates)

2269

In [38]:
# Rating
rating_element = result.find_all('span',itemprop="ratingValue")
rating = [int(element.text) for element in rating_element]
len(rating)

2269

In [39]:
# Retrieve Headline (this class yields a few extra trailing entries, trimmed later with headline[:-4])
headline_element = result.find_all(class_='text_header')
headline = [element.text.strip('"') for element in headline_element]
len(headline)

2273

In [42]:
# Retrieve Review
rev_element = result.find_all('div',class_='text_content')
reviews = [element.text for element in rev_element]
len(reviews)

2269

In [44]:
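# Flag each review as verified or not, based on its leading marker
verified_status = ['Verified' if review.startswith('✅ Trip Verified') else 'Not Verified' for review in reviews]
# Drop the marker: keep everything after the first '|' (maxsplit=1 preserves any later '|' in the text)
review_only = [review.split('|', 1)[1].strip()
               if review.startswith(('✅ Trip Verified', 'Not Verified')) and '|' in review
               else review for review in reviews]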
len(verified_status)

2269
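The marker handling can also be written as a small function, which is easier to test in isolation. This is a minimal sketch: the function name `split_review` and the sample strings are made up for illustration, and it splits on the first `|` only, so a `|` inside the review body is preserved.

```python
VERIFIED_MARKER = '✅ Trip Verified'
UNVERIFIED_MARKER = 'Not Verified'


def split_review(raw):
    """Return (status, body) for one raw review string (hypothetical helper)."""
    status = 'Verified' if raw.startswith(VERIFIED_MARKER) else 'Not Verified'
    has_marker = raw.startswith((VERIFIED_MARKER, UNVERIFIED_MARKER))
    if has_marker and '|' in raw:
        # maxsplit=1 keeps any later '|' characters inside the review body
        body = raw.split('|', 1)[1].strip()
    else:
        # No marker found: leave the text untouched
        body = raw
    return status, body


# Invented sample strings, not taken from the scraped data
samples = [
    '✅ Trip Verified | Great crew | would fly again.',
    'Not Verified | Long delay at the gate.',
    'Older review without any marker.',
]
for status, body in map(split_review, samples):
    print(status, '->', body)
```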

In [27]:
reviewsrate = result.find_all('table', class_='review-ratings')

In [68]:
overall_rate = reviewsrate[0]
reviewsrate = reviewsrate[1:]

In [69]:
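# Re-query every ratings table. Index 0 is kept here (it was split off as overall_rate above),
# so each list built below carries one extra leading entry that is dropped later with [1:].
reviewsrate = result.find_all('table', class_='review-ratings')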

Aircraft = []
Traveller = []
Seat = []
Route = []
Comfort = []
Staff = []
FnB = []
Entertainment = []
Service = []
Wifi = []
Value = []
Recommended = []

for review in reviewsrate:
    # Text fields: select_one returns None when the row is missing, so .text raises AttributeError
    try:
        Aircraft.append(review.select_one('.review-rating-header.aircraft + td').text.strip())
    except AttributeError:
        Aircraft.append(np.nan)

    try:
        Traveller.append(review.select_one('.review-rating-header.type_of_traveller + td').text.strip())
    except AttributeError:
        Traveller.append(np.nan)

    try:
        Seat.append(review.select_one('.review-rating-header.cabin_flown + td').text.strip())
    except AttributeError:
        Seat.append(np.nan)

    try:
        Route.append(review.select_one('.review-rating-header.route + td').text.strip())
    except AttributeError:
        Route.append(np.nan)

    # Star ratings: count the filled stars (0 when the category is absent from the review)
    Comfort.append(len(review.select('.review-rating-header.seat_comfort + td .star.fill')))
    Staff.append(len(review.select('.review-rating-header.cabin_staff_service + td .star.fill')))
    FnB.append(len(review.select('.review-rating-header.food_and_beverages + td .star.fill')))
    Entertainment.append(len(review.select('.review-rating-header.inflight_entertainment + td .star.fill')))
    Service.append(len(review.select('.review-rating-header.ground_service + td .star.fill')))
    Wifi.append(len(review.select('.review-rating-header.wifi_and_connectivity + td .star.fill')))
    Value.append(len(review.select('.review-rating-header.value_for_money + td .star.fill')))

    try:
        Recommended.append(review.select_one('.review-rating-header.recommended + td').text.strip())
    except AttributeError:
        Recommended.append(np.nan)
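The repeated selector pattern in the loop can be factored into two small helpers. The sketch below runs against a hand-written HTML fragment that only imitates the class names used by the selectors above; the real page markup may differ, and the names `cell_text`, `star_count`, and `SAMPLE_TABLE` are hypothetical. It returns `None` for a missing text field where the notebook stores `np.nan`.

```python
from bs4 import BeautifulSoup

# Hand-written fragment that mirrors the class names targeted by the selectors
SAMPLE_TABLE = """
<table class="review-ratings">
  <tr>
    <td class="review-rating-header aircraft">Aircraft</td>
    <td class="review-value">A340</td>
  </tr>
  <tr>
    <td class="review-rating-header seat_comfort">Seat Comfort</td>
    <td class="review-rating-stars">
      <span class="star fill">1</span><span class="star fill">2</span>
      <span class="star fill">3</span><span class="star fill">4</span>
      <span class="star">5</span>
    </td>
  </tr>
</table>
"""


def cell_text(table, field):
    """Text of the cell next to the given header, or None if the row is absent."""
    cell = table.select_one(f'.review-rating-header.{field} + td')
    return cell.get_text(strip=True) if cell is not None else None


def star_count(table, field):
    """Number of filled stars next to the given header (0 if the row is absent)."""
    return len(table.select(f'.review-rating-header.{field} + td .star.fill'))


table = BeautifulSoup(SAMPLE_TABLE, 'html.parser').select_one('table.review-ratings')
print(cell_text(table, 'aircraft'))
print(cell_text(table, 'route'))
print(star_count(table, 'seat_comfort'))
print(star_count(table, 'wifi_and_connectivity'))
```

With helpers like these, each field becomes a single call, and a missing row is handled in one place instead of in a separate `try`/`except` per field.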

In [71]:
len(Aircraft)

2270

In [33]:
data = {
    'Name' : name,
    'Country' : country,
    'dates' : dates,
    'Rating' : rating[1:],
    'Headline' : headline[:-4],
    'Review' : review_only,
    'Verified' : verified_status,
    'Aircraft' : Aircraft[1:],
    'Traveller' : Traveller[1:],
    'Seat' : Seat[1:],
    'Route' : Route[1:],
    'Comfort' : Comfort[1:],
    'Staff' : Staff[1:],
    'FnB' : FnB[1:],
    'Entertainment' : Entertainment[1:],
    'Service' : Service[1:],
    'Wifi' : Wifi[1:],
    'Value' : Value[1:],
    'Recommended' : Recommended[1:]}

In [32]:
# Print any column whose length differs from the expected 2268 rows
for i in data:
    if len(data[i]) != 2268:
        print(i)
        print(len(data[i]))
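This check matters because `pd.DataFrame` raises a `ValueError` when the lists in the dictionary have different lengths. A version that does not hard-code the expected row count might look like the sketch below; the function name `mismatched_lengths` and the toy dictionary are assumptions for illustration only.

```python
from collections import Counter


def mismatched_lengths(columns, expected=None):
    """Return {name: length} for every column whose length differs from `expected`.

    When `expected` is None, the most common length among the columns is used.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if not lengths:
        return {}
    if expected is None:
        expected = Counter(lengths.values()).most_common(1)[0][0]
    return {name: n for name, n in lengths.items() if n != expected}


# Toy data: 'Headline' has one extra entry
toy = {
    'Name': ['a', 'b', 'c'],
    'Rating': [10, 7, 1],
    'Headline': ['x', 'y', 'z', 'extra'],
}
print(mismatched_lengths(toy))
```

An empty result means every column lines up and the dictionary is safe to pass to `pd.DataFrame`.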

In [34]:
dataset = pd.DataFrame(data)
dataset

| | Name | Country | dates | Rating | Headline | Review | Verified | Aircraft | Traveller | Seat | Route | Comfort | Staff | FnB | Entertainment | Service | Wifi | Value | Recommended |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Wendy Felese | United States | 2023-06-06 | 10 | pleasant and professional | Every encounter with airline representatives w... | Verified | A340 | Couple Leisure | Economy Class | New York JFK to Dubai | 5 | 5 | 5 | 5 | 5 | 0 | 5 | yes |
| 1 | Wayne Good | New Zealand | 2023-06-06 | 10 | excelling with their food | Birmingham to Dubai, Boeing 777. Hideous aircr... | Verified | Boeing 777-300 / A380 | Solo Leisure | Economy Class | Birmingham to Auckland via Dubai | 4 | 5 | 5 | 5 | 4 | 5 | 4 | yes |
| 2 | M Warter | United States | 2023-05-31 | 1 | worst experience of my lifetime | I had the worst experience of my lifetime flyi... | Verified | Boeing 777 | Couple Leisure | Economy Class | Trivandrum to Seattle via Dubai | 1 | 1 | 1 | 1 | 3 | 1 | 1 | no |
| 3 | L Parelli | Portugal | 2023-05-29 | 7 | acceptably comfortable | Lisbon to Dubai in one of Emirates B777. The p... | Verified | Boeing 777 | Business | Economy Class | Lisbon to Dubai | 3 | 5 | 4 | 5 | 4 | 4 | 4 | yes |
| 4 | Alain Delaunay | Germany | 2023-05-27 | 10 | complete different experience | Had the chance to travel in first class but th... | Verified | Boeing 777 / A380 | Business | First Class | Hamburg to Bangkok via Dubai | 5 | 5 | 5 | 5 | 5 | 5 | 5 | yes |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2263 | D James | South Africa | 2013-12-29 | 10 | Emirates customer review | Cape Town to Bangkok-departed on time excellen... | Not Verified | | | Economy Class | | 3 | 3 | 4 | 4 | 0 | 0 | 4 | yes |
| 2264 | M Caplan | United Kingdom | 2013-12-29 | 2 | Emirates customer review | Flew 6 flights recently in business with Emira... | Not Verified | | | Business Class | | 5 | 5 | 5 | 5 | 0 | 0 | 5 | yes |
| 2265 | Ranaweera Ravindra | Japan | 2013-12-29 | 3 | Emirates customer review | The fleets are new comfy. Flew Narita-Dubai-Mu... | Not Verified | | | Economy Class | | 4 | 1 | 2 | 5 | 0 | 0 | 3 | no |
| 2266 | R Breakwell | Australia | 2013-12-25 | 5 | Emirates customer review | NBO-SIN-NBO. First leg was alright but nothing... | Not Verified | | | Economy Class | | 3 | 1 | 2 | 5 | 0 | 0 | 3 | no |


In [86]:
# Save the DataFrame to an Excel file
dataset.to_excel('Emirates Review Scrapped Data.xlsx', index=False)  # Set index=False to exclude the index column

It's done! Time for the analysis.