# Web Scraping and Analysis of British Airways Reviews
(Web scraping on Skytrax)

---

### Background information

British Airways (BA) is the flag carrier airline of the United Kingdom (UK). Every day, thousands of BA flights arrive in and depart from the UK, carrying customers across the world for holidays, work or any other reason. The end-to-end process of scheduling, planning, boarding, fuelling, transporting, landing, and continuously running flights on time, efficiently and with top-class customer service is a huge task with many highly important responsibilities.

Customers who book a flight with BA will experience many interaction points with the BA brand. Understanding a customer's feelings, needs, and feedback is crucial for any business, including BA.

This first task focuses on scraping customer feedback and review data from a third-party source, then analysing that data to present any insights we uncover.

### To-do list

*  The first step is to scrape review data from the web. For this, we use a website called <a href="https://www.airlinequality.com/">Skytrax</a> and focus on reviews specifically about the airline itself.

*  The reviews are listed at <https://www.airlinequality.com/airline-reviews/british-airways>. We can use `BeautifulSoup` to work through the paginated review listing and collect the text of each review.




In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


`google.colab.drive` mounts Google Drive inside the Colab runtime, so that files can be stored and reused for further analysis in Google Colab.


`BeautifulSoup` and `requests` for web scraping and collecting data

`pandas` for data processing and CSV I/O

`TextBlob` for natural language processing; it makes basic sentiment analysis quite simple


In [3]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from textblob import TextBlob
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

*  Create a script for web scraping, starting from a base URL
*  Use a `for` loop to go through 10 pages of reviews, requesting 100 reviews per page
*  Use the `requests` library to retrieve the HTML of each page
*  Store the review text in a list called `reviews`

In [5]:
base_url = "https://www.airlinequality.com/airline-reviews/british-airways"
pages = 10
page_size = 100

reviews = []

for i in range(1, pages + 1):

    print(f"Scraping page {i}")

    # Create URL to collect links from paginated data
    url = f"{base_url}/page/{i}/?sortby=post_date%3ADesc&pagesize={page_size}"

    # Collect HTML data from this page
    response = requests.get(url)

    # Parse content
    content = response.content
    parsed_content = BeautifulSoup(content, 'html.parser')
    for para in parsed_content.find_all("div", {"class": "text_content"}):
        reviews.append(para.get_text())
    
    print(f"   ---> {len(reviews)} total reviews")

Scraping page 1
   ---> 100 total reviews
Scraping page 2
   ---> 200 total reviews
Scraping page 3
   ---> 300 total reviews
Scraping page 4
   ---> 400 total reviews
Scraping page 5
   ---> 500 total reviews
Scraping page 6
   ---> 600 total reviews
Scraping page 7
   ---> 700 total reviews
Scraping page 8
   ---> 800 total reviews
Scraping page 9
   ---> 900 total reviews
Scraping page 10
   ---> 1000 total reviews
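
The extraction step used in the loop can be tried in isolation, without any network access. The sketch below is a minimal example under stated assumptions: `sample_html` is a hypothetical fragment that only mimics the `text_content` class the scraper relies on, and `extract_reviews` is an illustrative helper name, not part of the original notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML that mimics the structure the scraper depends on:
# each review body sits in a <div class="text_content">.
sample_html = """
<article><div class="text_content">Not Verified | Good flight.</div></article>
<article><div class="text_content">Trip Verified | Late departure.</div></article>
<div class="other">navigation</div>
"""

def extract_reviews(html):
    """Return the stripped text of every div with class 'text_content'."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        div.get_text(strip=True)
        for div in soup.find_all("div", class_="text_content")
    ]

sample_reviews = extract_reviews(sample_html)
print(sample_reviews)
```

Testing the parser on a small fixed fragment like this makes it easier to notice when the site's markup changes and the selector stops matching.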


Create a DataFrame and store the reviews in `df`

In [6]:
df = pd.DataFrame()
df["reviews"] = reviews
df.head()

Unnamed: 0,reviews
0,Not Verified | I find BA incredibly tacky and...
1,✅ Trip Verified | Flew ATL to LHR 8th Jan 202...
2,Not Verified | Great thing about British Airw...
3,Not Verified | The staff are friendly. The pla...
4,✅ Trip Verified | Probably the worst business ...
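
Each scraped review starts with a verification label followed by `|`. One possible cleaning step, sketched here in plain Python, separates that label from the review body. The helper name `split_review` and the example strings are hypothetical, and the sketch assumes the first `|` always marks the end of the label.

```python
def split_review(raw):
    """Split 'status | text' into (status, text).

    Returns (None, text) when no '|' separator is present.
    """
    status, sep, text = raw.partition("|")
    if not sep:
        return None, raw.strip()
    # Drop the check-mark emoji and surrounding whitespace from the label
    return status.replace("✅", "").strip(), text.strip()

# Hypothetical examples in the same shape as the scraped rows
examples = [
    "✅ Trip Verified | Smooth boarding and a friendly crew.",
    "Not Verified | Seat was cramped.",
    "No prefix here.",
]
parsed = [split_review(r) for r in examples]
for status, text in parsed:
    print(status, "->", text)
```

On the real data, the same function could be applied with `df["reviews"].apply(split_review)` to build separate status and text columns.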


Save the data to a CSV file, "BA_reviews.csv", for future reference

In [7]:
df.to_csv("BA_reviews.csv")
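
By default, `to_csv` also writes the DataFrame index, which shows up as an `Unnamed: 0` column when the file is read back. The sketch below demonstrates this with a small made-up DataFrame and an in-memory buffer instead of a real file; `demo` and its contents are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for the scraped DataFrame
demo = pd.DataFrame({"reviews": ["first review", "second review"]})

# Default behaviour: the index is written as an unnamed first column
buffer = io.StringIO()
demo.to_csv(buffer)
buffer.seek(0)
with_index = pd.read_csv(buffer)

# index=False leaves the index out of the file
buffer = io.StringIO()
demo.to_csv(buffer, index=False)
buffer.seek(0)
without_index = pd.read_csv(buffer)

print(list(with_index.columns))
print(list(without_index.columns))
```

Whether to pass `index=False` or to read the file back with `index_col=0` is a matter of preference; either keeps the extra column out of later analysis.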