## Collecting Data

To execute this phase of the project, we will focus on the following key data points:

- Date of Travel: We will extract the dates when customers traveled, allowing us to analyze trends and patterns over time.
- Customer Reviews: Our aim is to scrape in-depth reviews written by customers, providing us with valuable feedback and suggestions.
- Customer Origin: We will collect information about the geographical locations of our customers, helping us understand the diversity of our clientele.
- Flight Ratings: We will capture the ratings assigned by customers for their flights on a scale of 1 to 10, enabling us to assess overall flight satisfaction levels.


In [49]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [50]:
# Create empty lists to hold the data scraped from the website
date = []
reviews = []
country = []
stars = []

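# Scrape pages 1 to 49 (the upper bound of range is exclusive), 50 reviews per page
for i in range(1, 50):
    page = requests.get(f"https://www.airlinequality.com/airline-reviews/british-airways/page/{i}/?sortby=post_date%3ADesc&pagesize=50")
    soup = BeautifulSoup(page.content, "html.parser")

    # Review text
    for item in soup.find_all("div", class_="text_content"):
        reviews.append(item.text)

    # Rating out of 10; a rating block without a <span> is recorded as "None"
    for item in soup.find_all("div", class_="rating-10"):
        try:
            stars.append(item.span.text)
        except AttributeError:
            print(f"Error on page {i}")
            stars.append("None")

    # Date shown in each <time> element
    for item in soup.find_all("time"):
        date.append(item.text)

    # Country: the text after the first <span> in each <h3>, minus the parentheses
    for item in soup.find_all("h3"):
        country.append(item.span.next_sibling.text.strip(" ()"))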


In [51]:
# Check the lengths of date, reviews, country and stars; they must match before creating a DataFrame
len(date),len(reviews),len(country),len(stars)

(2450, 2450, 2450, 2499)
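
The length comparison above can also be automated so that a mismatch is reported by column name. The following is a minimal sketch, not part of the original notebook; `check_equal_lengths` is a hypothetical helper and the toy lists stand in for the scraped data.

```python
def check_equal_lengths(**columns):
    """Return {column name: length}; raise ValueError if the lengths differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return lengths


# Toy data (assumed for illustration only)
toy_lengths = check_equal_lengths(date=["d1", "d2"], reviews=["r1", "r2"])
print(toy_lengths)
```

Called on the notebook's four lists, a helper like this would raise and show that `stars` is the column that is too long.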

In [53]:
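# stars has 49 more entries than the other columns, so truncate it to the same length;
# otherwise the DataFrame cannot be created.
# Caution: if the extra entries are spread across pages rather than at the end,
# this truncation shifts ratings relative to their reviews.
stars = stars[:len(reviews)]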
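
The surplus of 49 ratings equals the number of pages requested by `range(1, 50)`, which suggests (but does not prove) one extra `rating-10` element per page, for example a page-level summary rating. If that is the case, trimming the end of the combined list would pair some reviews with the wrong rating. The sketch below shows one way to trim per page instead. It assumes the extra element comes first on each page, which should be verified against the actual HTML; `align_page_ratings` is a hypothetical helper and the toy values are made up.

```python
def align_page_ratings(page_ratings, n_reviews):
    """Keep the last n_reviews ratings from one page, dropping leading extras.

    Assumption: any surplus rating elements appear before the per-review ones.
    """
    extra = len(page_ratings) - n_reviews
    if extra < 0:
        raise ValueError("Fewer ratings than reviews on this page")
    return page_ratings[extra:]


# Toy page: one leading summary rating followed by three review ratings
toy_page = ["\n\t\t5", "1", "1", "4"]
aligned = align_page_ratings(toy_page, n_reviews=3)
print(aligned)
```

To use this in the scraping loop, the ratings and reviews of each page would be collected into per-page lists first, aligned, and only then appended to `stars`.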

In [60]:
# It is time to set up a dictionary to create the DataFrame
data = {
    "Date": date,
    "Review": reviews,
    "Country": country,
    "Star Rating": stars
   
}

# Create the DataFrame
data = pd.DataFrame(data)

In [62]:
#First 5 rows of data
data.head()

|   | Date | Review | Country | Star Rating |
|---|------|--------|---------|-------------|
| 0 | 24th July 2023 | Not Verified \| I booked Premium Economy from I... | United Kingdom | `\n\t\t\t\t\t\t\t\t\t\t\t\t\t5` |
| 1 | 21st July 2023 | ✅ Trip Verified \| A simple story with an unfor... | Germany | 1 |
| 2 | 21st July 2023 | ✅ Trip Verified \| Flight was delayed due to t... | United Kingdom | 1 |
| 3 | 20th July 2023 | Not Verified \| Fast and friendly check in (to... | United Kingdom | 4 |
| 4 | 20th July 2023 | ✅ Trip Verified \| I don't understand why Brit... | United Kingdom | 8 |
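
The preview shows two things that would need cleaning before analysis: dates carry ordinal suffixes ("24th July 2023") and at least one rating has leading whitespace. The following is a rough sketch of how such values could be normalized using only the standard library. The helper names `parse_review_date` and `parse_rating` are hypothetical, and the `%B` format code assumes an English locale.

```python
import re
from datetime import date, datetime


def parse_review_date(text):
    """Convert a string such as '24th July 2023' into a datetime.date."""
    # Drop the ordinal suffix that directly follows the day number
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)", r"\1", text.strip())
    return datetime.strptime(cleaned, "%d %B %Y").date()


def parse_rating(text):
    """Return the rating as an int, or None if the text is not a number."""
    text = text.strip()
    return int(text) if text.isdigit() else None


first_date = parse_review_date("24th July 2023")
first_rating = parse_rating("\n\t\t\t5")
missing_rating = parse_rating("None")
```

With pandas, functions like these could be applied column-wise, for example `data["Date"].map(parse_review_date)`.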


## Exporting the Dataset to CSV

In [63]:
data.to_csv('British_Airways_reviews.csv', index=False)