

# Web scraping and analysis

First of all i want to mention that for this task the basic parser was provided by British Airways DS team. However in my opinoin there was more to scrap from the website for the deeper analysis. 

## Scraping text data from Skytrax

If you visit [https://www.airlinequality.com] you can see that there is a lot of data there. For this task, we are only interested in reviews related to British Airways and the Airline itself.

If you navigate to this link: [https://www.airlinequality.com/airline-reviews/british-airways] you will see this data. Now, we can use `Python` and `BeautifulSoup` to collect all the links to the reviews and then to collect the text data on each of the individual review links.

In [19]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime

In [2]:
base_url = "https://www.airlinequality.com/airline-reviews/british-airways"
pages = 10
page_size = 100

reviews = []

# for i in range(1, pages + 1):
for i in range(1, pages + 1):

    print(f"Scraping page {i}")

    # Create URL to collect links from paginated data
    url = f"{base_url}/page/{i}/?sortby=post_date%3ADesc&pagesize={page_size}"

    # Collect HTML data from this page
    response = requests.get(url)

    # Parse content
    content = response.content
    parsed_content = BeautifulSoup(content, 'html.parser')
    for para in parsed_content.find_all("div", {"class": "text_content"}):
        reviews.append(para.get_text())
    
    print(f"   ---> {len(reviews)} total reviews")

Scraping page 1
   ---> 100 total reviews
Scraping page 2
   ---> 200 total reviews
Scraping page 3
   ---> 300 total reviews
Scraping page 4
   ---> 400 total reviews
Scraping page 5
   ---> 500 total reviews
Scraping page 6
   ---> 600 total reviews
Scraping page 7
   ---> 700 total reviews
Scraping page 8
   ---> 800 total reviews
Scraping page 9
   ---> 900 total reviews
Scraping page 10
   ---> 1000 total reviews


In [6]:
print(reviews[0])

Not Verified |  Airport check in was functionary with little warmth but some efficiency. Flight was delayed with no communication given. Boarding was chaotic and no management of the process by ground staff. Seats on board are tight and you really feel like they have crammed in every seat possible. There is next to no recline. However I was thankful that there was no recline. If the person in front had even the slightest recline they would be in my face owing to how tight the seats were spaced on this A321 aircraft. No amenities on this flight except for a toilet that was clean but small. No IFE, no food and beverage unless you pay extra but the staff were friendly. Luggage arrived at the carousel within a very short amount of time.


In [7]:
df = pd.DataFrame()
df["reviews"] = reviews
display(df.head())

Unnamed: 0,reviews
0,Not Verified | Airport check in was functiona...
1,✅ Trip Verified | Flight fine. In-line with c...
2,✅ Trip Verified | Came from Glasgow to London...
3,✅ Trip Verified | My flight on on 12 May 2023...
4,Not Verified | Cairo is a 5 hour flight and B...


In [5]:
#df.to_csv("data/BA_reviews.csv")

## Exterimenting what we can extract

That was actually the data that we've been able to get with the default parser that was provided. But if you visit [https://www.airlinequality.com/airline-reviews/british-airways] you will find out that there is much more to discover and all this information can be used for good. 

So I decided to go further and try to extract as much as i could while maintainig good quality of data and low amount of mising values.

In [12]:
response = requests.get(base_url)
html_content = response.content

In [13]:
soup = BeautifulSoup(html_content, "html.parser")

In [14]:
parent_elements = soup.find_all('div', class_='review')

In [15]:
print(parent_elements)

[]


In [21]:
# Extract the review text
review_text_element = review_block.find("div", itemprop="reviewBody")
review_text = review_text_element.get_text(strip=True)

# Extract the verification status
verification_status = review_text_element.find("a", href="https://www.airlinequality.com/verified-reviews/").get_text(strip=True)

# Remove the verification status from the review text
review_text = review_text.replace(verification_status, "").strip()

print("Verification Status:", verification_status)
print("Modified Review Text:", review_text)


Verification Status: Not Verified
Modified Review Text: |  Airport check in was functionary with little warmth but some efficiency. Flight was delayed with no communication given. Boarding was chaotic and no management of the process by ground staff. Seats on board are tight and you really feel like they have crammed in every seat possible. There is next to no recline. However I was thankful that there was no recline. If the person in front had even the slightest recline they would be in my face owing to how tight the seats were spaced on this A321 aircraft. No amenities on this flight except for a toilet that was clean but small. No IFE, no food and beverage unless you pay extra but the staff were friendly. Luggage arrived at the carousel within a very short amount of time.


In [26]:
date_published = review_block.find("time", itemprop="datePublished").get("datetime")

# Convert the date string to a datetime object
review_date = datetime.strptime(date_published, "%Y-%m-%d")

print("Review Date:", review_date)

Review Date: 2023-07-03 00:00:00


In [32]:
rating_element = review_block.find("td", class_="review-rating-header", text="Seat Comfort")
if rating_element:
    rating = rating_element.find_next_sibling("td", class_="review-rating-stars").find("span", class_="star fill")
    if rating:
        rating = rating.text
    else:
        rating = None
else:
    rating = None

print("Rating:", rating)

Rating: 1


  rating_element = review_block.find("td", class_="review-rating-header", text="Seat Comfort")


In [37]:
df = pd.DataFrame(columns=["Review Text", "Verification Status", "Rating", "Author", "Review Date", "Aircraft Name", "Recommendation", "Type Of Traveller", "Seat Type"])

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(html_content, "html.parser")

# Loop through all the review blocks on the page
for review_block in soup.find_all("article", itemprop="review"):
    # Extract the review text and verification status
    review_text_element = review_block.find("div", itemprop="reviewBody")
    review_text = review_text_element.get_text(strip=True)
    verification_status = review_text_element.find("a", href="https://www.airlinequality.com/verified-reviews/").get_text(strip=True)
    review_text = review_text.replace(verification_status, "").strip()

    # Extract the rating value
    rating = review_block.find("span", itemprop="ratingValue")
    if rating:
        rating = rating.get_text(strip=True)
    else:
        rating = None

    # Extract the author name
    author = review_block.find("span", itemprop="name").get_text(strip=True)

    # Extract the date published
    date_published = review_block.find("time", itemprop="datePublished").get("datetime")
    review_date = datetime.strptime(date_published, "%Y-%m-%d")

    # Extract the aircraft name
    aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
    aircraft_name = None
    if aircraft_element:
        aircraft_name_element = aircraft_element.find_next_sibling("td", class_="review-value")
        if aircraft_name_element:
            aircraft_name = aircraft_name_element.get_text(strip=True)

    # Extract the recommendation
    recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
    recommendation = None
    if recommendation_element:
        recommendation = recommendation_element.find_next_sibling("td", class_="review-value").get_text(strip=True)
        recommendation = recommendation.lower() == "yes"

    # Extract the type of traveller
    traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
    traveller_type = None
    if traveller_element:
        traveller_type = traveller_element.find_next_sibling("td", class_="review-value").get_text(strip=True)

    # Extract the seat type
    seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
    seat_type = None
    if seat_element:
        seat_type = seat_element.find_next_sibling("td", class_="review-value").get_text(strip=True)

    # Store the extracted information in the DataFrame
    df = df.append({
        "Review Text": review_text,
        "Verification Status": verification_status,
        "Rating": rating,
        "Author": author,
        "Review Date": review_date,
        "Aircraft Name": aircraft_name,
        "Recommendation": recommendation,
        "Type Of Traveller": traveller_type,
        "Seat Type": seat_type
    }, ignore_index=True)


print(df)

                                         Review Text Verification Status  \
0  |  Airport check in was functionary with littl...        Not Verified   
1  ✅|  Flight fine. In-line with competitors. Ple...       Trip Verified   
2  ✅|  Came from Glasgow to London and took conne...       Trip Verified   
3  ✅|  My flight on on 12 May 2023 got delayed an...       Trip Verified   
4  |  Cairo is a 5 hour flight and BA considers i...        Not Verified   
5  ✅|  After travelling London to Madrid with Bri...       Trip Verified   
6  ✅|  My luggage was mis-tagged in Dallas on my ...       Trip Verified   
7  ✅|  The airline lost my luggage and was absolu...       Trip Verified   
8  ✅|  We booked on the BA website, round trip fl...       Trip Verified   
9  ✅|  First time flying with BA business class, ...       Trip Verified   

  Rating           Author Review Date Aircraft Name Recommendation  \
0      3  Carlos Whilhelm  2023-07-03          A321          False   
1     10         S Wart

  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-h

In [43]:
base_url = "https://www.airlinequality.com/airline-reviews/british-airways"
pages = 36
page_size = 100

df = pd.DataFrame(columns=["Review Text", "Verification Status", "Rating",
                           "Author", "Review Date", "Aircraft Name", "Recommendation",
                           "Type Of Traveller", "Seat Type"])

# for i in range(1, pages + 1):
for i in range(1, pages + 1):

    print(f"Scraping page {i}")

    # Create URL to collect links from paginated data
    url = f"{base_url}/page/{i}/?sortby=post_date%3ADesc&pagesize={page_size}"

    # Collect HTML data from this page
    response = requests.get(url)

    # Parse content
    html_content = response.content
    soup = BeautifulSoup(html_content, "html.parser")
    
    for review_block in soup.find_all("article", itemprop="review"):
    # Extract the review text and verification status
        review_text_element = review_block.find("div", itemprop="reviewBody")
        review_text = review_text_element.get_text(strip=True)
        
        verification_status_element = review_text_element.find("a", href="https://www.airlinequality.com/verified-reviews/")
        if verification_status_element:
            verification_status = verification_status_element.get_text(strip=True)
            review_text = review_text.replace(verification_status, "").strip()
        else:
            verification_status = None


    # Extract the rating value
        rating = review_block.find("span", itemprop="ratingValue")
        if rating:
            rating = rating.get_text(strip=True)
        else:
            rating = None

        # Extract the author name
        author = review_block.find("span", itemprop="name").get_text(strip=True)

        # Extract the date published
        date_published = review_block.find("time", itemprop="datePublished").get("datetime")
        review_date = datetime.strptime(date_published, "%Y-%m-%d")

        # Extract the aircraft name
        aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
        aircraft_name = None
        if aircraft_element:
            aircraft_name_element = aircraft_element.find_next_sibling("td", class_="review-value")
            if aircraft_name_element:
                aircraft_name = aircraft_name_element.get_text(strip=True)

        # Extract the recommendation
        recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
        recommendation = None
        if recommendation_element:
            recommendation = recommendation_element.find_next_sibling("td", class_="review-value").get_text(strip=True)
            recommendation = recommendation.lower() == "yes"

        # Extract the type of traveller
        traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
        traveller_type = None
        if traveller_element:
            traveller_type = traveller_element.find_next_sibling("td", class_="review-value").get_text(strip=True)

        # Extract the seat type
        seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
        seat_type = None
        if seat_element:
            seat_type = seat_element.find_next_sibling("td", class_="review-value").get_text(strip=True)

        # Store the extracted information in the DataFrame
        df = df.append({
            "Review Text": review_text,
            "Verification Status": verification_status,
            "Rating": rating,
            "Author": author,
            "Review Date": review_date,
            "Aircraft Name": aircraft_name,
            "Recommendation": recommendation,
            "Type Of Traveller": traveller_type,
            "Seat Type": seat_type
        }, ignore_index=True)
    
    print(f"   ---> {len(reviews)} total reviews")


Scraping page 1


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-h

   ---> 1000 total reviews
Scraping page 2


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 3


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 4


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 5


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 6


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 7


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 8


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 9


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 10


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 11


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 12


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 13


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 14


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 15


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 16


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 17


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 18


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 19


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 20


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 21


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 22


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 23


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 24


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 25


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 26


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 27


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 28


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 29


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 30


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 31


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 32


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 33


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 34


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 35


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

   ---> 1000 total reviews
Scraping page 36
   ---> 1000 total reviews


  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", class_="review-rating-header", text="Recommended")
  traveller_element = review_block.find("td", class_="review-rating-header", text="Type Of Traveller")
  seat_element = review_block.find("td", class_="review-rating-header", text="Seat Type")
  df = df.append({
  df = df.append({
  aircraft_element = review_block.find("td", class_="review-rating-header", text="Aircraft")
  recommendation_element = review_block.find("td", clas

In [44]:
display(df)

Unnamed: 0,Review Text,Verification Status,Rating,Author,Review Date,Aircraft Name,Recommendation,Type Of Traveller,Seat Type
0,| Airport check in was functionary with littl...,Not Verified,3,Carlos Whilhelm,2023-07-03,A321,False,Couple Leisure,Economy Class
1,✅| Flight fine. In-line with competitors. Ple...,Trip Verified,10,S Warten,2023-07-02,A320,True,Solo Leisure,Economy Class
2,✅| Came from Glasgow to London and took conne...,Trip Verified,1,Kapil Tyagi,2023-06-30,,False,Family Leisure,Economy Class
3,✅| My flight on on 12 May 2023 got delayed an...,Trip Verified,1,Saeed Alzubaidi,2023-06-29,,False,Solo Leisure,Economy Class
4,| Cairo is a 5 hour flight and BA considers i...,Not Verified,2,Ralph Tuckwell,2023-06-29,A321Neo,False,Couple Leisure,Economy Class
...,...,...,...,...,...,...,...,...,...
3586,LHR-JFK-LAX-LHR. Check in was ok apart from be...,,4,D Smith,2012-08-29,,False,,Economy Class
3587,LHR to HAM. Purser addresses all club passenge...,,9,Nick Berry,2012-08-28,,True,,Business Class
3588,My son who had worked for British Airways urge...,,5,Avril Barclay,2011-10-12,,True,,Economy Class
3589,London City-New York JFK via Shannon on A318 b...,,4,C Volz,2011-10-11,,False,,Premium Economy


## Final Parser

In [51]:

base_url = "https://www.airlinequality.com/airline-reviews/british-airways"
pages = 36
page_size = 100

df = pd.DataFrame(columns=["Review Text", "Verification Status", "Rating",
                           "Author", "Review Date", "Aircraft Name", "Recommendation",
                           "Type Of Traveller", "Seat Type"])

for i in range(1, pages + 1):
    print(f"Scraping page {i}")

    # Create URL to collect links from paginated data
    url = f"{base_url}/page/{i}/?sortby=post_date%3ADesc&pagesize={page_size}"

    # Collect HTML data from this page
    response = requests.get(url)

    # Parse content
    html_content = response.content
    soup = BeautifulSoup(html_content, "html.parser")

    for review_block in soup.find_all("article", itemprop="review"):
        # Extract the review text
        review_text_element = review_block.find("div", itemprop="reviewBody")
        review_text = review_text_element.get_text(strip=True)

        # Extract the verification status if present
        verification_status_element = review_text_element.find("a", href="https://www.airlinequality.com/verified-reviews/")
        if verification_status_element:
            verification_status = verification_status_element.get_text(strip=True)
            review_text = review_text.replace(verification_status, "").strip()
        else:
            verification_status = None

        # Extract the rating value
        rating = review_block.find("span", itemprop="ratingValue")
        if rating:
            rating = rating.get_text(strip=True)
        else:
            rating = None

        # Extract the author name
        author = review_block.find("span", itemprop="name").get_text(strip=True)

        # Extract the date published
        date_published = review_block.find("time", itemprop="datePublished").get("datetime")
        review_date = datetime.strptime(date_published, "%Y-%m-%d")

        # Extract the aircraft name
        aircraft_element = review_block.find("td", class_="review-rating-header", string="Aircraft")
        aircraft_name = None
        if aircraft_element:
            aircraft_name_element = aircraft_element.find_next_sibling("td", class_="review-value")
            if aircraft_name_element:
                aircraft_name = aircraft_name_element.get_text(strip=True)

        # Extract the recommendation
        recommendation_element = review_block.find("td", class_="review-rating-header", string="Recommended")
        recommendation = None
        if recommendation_element:
            recommendation = recommendation_element.find_next_sibling("td", class_="review-value").get_text(strip=True)
            recommendation = recommendation.lower() == "yes"

        # Extract the type of traveller
        traveller_element = review_block.find("td", class_="review-rating-header", string="Type Of Traveller")
        traveller_type = None
        if traveller_element:
            traveller_type = traveller_element.find_next_sibling("td", class_="review-value").get_text(strip=True)

        # Extract the seat type
        seat_element = review_block.find("td", class_="review-rating-header", string="Seat Type")
        seat_type = None
        if seat_element:
            seat_type = seat_element.find_next_sibling("td", class_="review-value").get_text(strip=True)

        # Add the extracted information to the DataFrame
        df.loc[df.shape[0]] = {
            "Review Text": review_text,
            "Verification Status": verification_status,
            "Rating": rating,
            "Author": author,
            "Review Date": review_date,
            "Aircraft Name": aircraft_name,
            "Recommendation": recommendation,
            "Type Of Traveller": traveller_type,
            "Seat Type": seat_type
        }

    print(f"   ---> {df.shape[0]} total reviews")

Scraping page 1
   ---> 100 total reviews
Scraping page 2
   ---> 200 total reviews
Scraping page 3
   ---> 300 total reviews
Scraping page 4
   ---> 400 total reviews
Scraping page 5
   ---> 500 total reviews
Scraping page 6
   ---> 600 total reviews
Scraping page 7
   ---> 700 total reviews
Scraping page 8
   ---> 800 total reviews
Scraping page 9
   ---> 900 total reviews
Scraping page 10
   ---> 1000 total reviews
Scraping page 11
   ---> 1100 total reviews
Scraping page 12
   ---> 1200 total reviews
Scraping page 13
   ---> 1300 total reviews
Scraping page 14
   ---> 1400 total reviews
Scraping page 15
   ---> 1500 total reviews
Scraping page 16
   ---> 1600 total reviews
Scraping page 17
   ---> 1700 total reviews
Scraping page 18
   ---> 1800 total reviews
Scraping page 19
   ---> 1900 total reviews
Scraping page 20
   ---> 2000 total reviews
Scraping page 21
   ---> 2100 total reviews
Scraping page 22
   ---> 2200 total reviews
Scraping page 23
   ---> 2300 total reviews
Scrapi

In [47]:
display(df)

Unnamed: 0,Review Text,Verification Status,Rating,Author,Review Date,Aircraft Name,Recommendation,Type Of Traveller,Seat Type
0,| Airport check in was functionary with littl...,Not Verified,3,Carlos Whilhelm,2023-07-03,A321,False,Couple Leisure,Economy Class
1,✅| Flight fine. In-line with competitors. Ple...,Trip Verified,10,S Warten,2023-07-02,A320,True,Solo Leisure,Economy Class
2,✅| Came from Glasgow to London and took conne...,Trip Verified,1,Kapil Tyagi,2023-06-30,,False,Family Leisure,Economy Class
3,✅| My flight on on 12 May 2023 got delayed an...,Trip Verified,1,Saeed Alzubaidi,2023-06-29,,False,Solo Leisure,Economy Class
4,| Cairo is a 5 hour flight and BA considers i...,Not Verified,2,Ralph Tuckwell,2023-06-29,A321Neo,False,Couple Leisure,Economy Class
...,...,...,...,...,...,...,...,...,...
3586,LHR-HKG on Boeing 747 - 23/08/12. Much has bee...,,4,W Benson,2012-08-29,,False,,Economy Class
3587,LHR to HAM. Purser addresses all club passenge...,,9,Nick Berry,2012-08-28,,True,,Business Class
3588,My son who had worked for British Airways urge...,,5,Avril Barclay,2011-10-12,,True,,Economy Class
3589,London City-New York JFK via Shannon on A318 b...,,4,C Volz,2011-10-11,,False,,Premium Economy


In [52]:
df.columns = df.columns.str.lower()
display(df)

Unnamed: 0,review text,verification status,rating,author,review date,aircraft name,recommendation,type of traveller,seat type
0,| Airport check in was functionary with littl...,Not Verified,3,Carlos Whilhelm,2023-07-03,A321,False,Couple Leisure,Economy Class
1,✅| Flight fine. In-line with competitors. Ple...,Trip Verified,10,S Warten,2023-07-02,A320,True,Solo Leisure,Economy Class
2,✅| Came from Glasgow to London and took conne...,Trip Verified,1,Kapil Tyagi,2023-06-30,,False,Family Leisure,Economy Class
3,✅| My flight on on 12 May 2023 got delayed an...,Trip Verified,1,Saeed Alzubaidi,2023-06-29,,False,Solo Leisure,Economy Class
4,| Cairo is a 5 hour flight and BA considers i...,Not Verified,2,Ralph Tuckwell,2023-06-29,A321Neo,False,Couple Leisure,Economy Class
...,...,...,...,...,...,...,...,...,...
3586,Flew LHR - VIE return operated by bmi but BA a...,,10,J Tinning,2012-08-29,,True,,Economy Class
3587,LHR to HAM. Purser addresses all club passenge...,,9,Nick Berry,2012-08-28,,True,,Business Class
3588,My son who had worked for British Airways urge...,,5,Avril Barclay,2011-10-12,,True,,Economy Class
3589,London City-New York JFK via Shannon on A318 b...,,4,C Volz,2011-10-11,,False,,Premium Economy


## Saving the data

In [54]:
df.to_csv('/Users/BrightFuture/Desktop/Projects/BrtitishAirways/data.csv', index=False)