# Task 1

---

## Web scraping and analysis

This Jupyter notebook includes some code to get you started with web scraping. We will use a package called `BeautifulSoup` to collect the data from the web. Once you've collected your data and saved it to a local `.csv` file, you can start your analysis.

### Scraping data from Skytrax

If you visit [https://www.airlinequality.com], you can see that there is a lot of data there. For this task, we are only interested in reviews related to British Airways and the airline itself.

If you navigate to this link: [https://www.airlinequality.com/airline-reviews/british-airways] you will see this data. Now, we can use `Python` and `BeautifulSoup` to collect the review text, star rating, date and reviewer country from each page of results.

In [9]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [10]:
# review text
reviews = []

# star rating (taken from the 'rating-10' element)
star = []

# review date
date = []

# reviewer country
country = []

In [14]:
# Loop over 35 listing pages, requesting 100 reviews per page
for i in range(1, 36):
    page = requests.get(f'https://www.airlinequality.com/airline-reviews/british-airways/page{i}/?sortby=post_date%3ADesc&pagesize=100')

    soup = BeautifulSoup(page.content, 'html.parser')

    # Review text
    for item in soup.find_all('div', class_='text_content'):
        reviews.append(item.text)

    # Star rating; a rating block without a <span> is recorded as "None"
    for item in soup.find_all('div', class_='rating-10'):
        try:
            star.append(item.span.text)
        except AttributeError:
            print(f'error page {i}')
            star.append("None")

    # Date
    for item in soup.find_all("time"):
        date.append(item.text)

    # Country
    for item in soup.find_all('h3'):
        country.append(item.span.next_sibling.text.strip(" ()"))

error page 30
error page 31
error page 31
error page 33
error page 34
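
The loop above fills four lists independently, so nothing guarantees that entry `n` of each list belongs to the same review. One way to keep the fields together is to parse each review container as a unit. The following is a minimal sketch, not the notebook's method: it assumes each review sits in its own `article` element, which is an assumption about the page markup, and `SAMPLE_HTML` and `parse_reviews` are hypothetical names used only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = (
    '<article><div class="rating-10"><span>5</span>/10</div>'
    '<h3><span>A Reviewer</span> (United Kingdom) <time>18th February 2023</time></h3>'
    '<div class="text_content">Trip Verified | Fine flight.</div></article>'
    '<article>'
    '<h3><span>B Reviewer</span> (Canada) <time>15th February 2023</time></h3>'
    '<div class="text_content">Trip Verified | Delayed.</div></article>'
)


def parse_reviews(html):
    """Return one dict per review so the fields of a review stay together."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for article in soup.find_all("article"):
        body = article.find("div", class_="text_content")
        if body is None:
            continue  # not a review container

        rating = article.find("div", class_="rating-10")
        when = article.find("time")
        header = article.find("h3")

        # The country is the text node that follows the first <span> in the header.
        country = None
        if header is not None and header.span is not None:
            sibling = header.span.next_sibling
            if sibling is not None:
                country = str(sibling).strip(" ()")

        has_star = rating is not None and rating.span is not None
        rows.append({
            "reviews": body.get_text(strip=True),
            "star": rating.span.get_text(strip=True) if has_star else None,
            "date": when.get_text(strip=True) if when is not None else None,
            "country": country,
        })
    return rows


rows = parse_reviews(SAMPLE_HTML)
```

A list of dicts like `rows` can be passed straight to `pd.DataFrame(rows)`, and a missing rating becomes an empty cell instead of shifting the column.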


In [16]:
len(reviews)

3476

In [25]:
# Truncate star to the number of reviews so that all columns have the same length
star = star[:len(reviews)]

In [26]:
len(star)

3476

In [27]:
len(date)

3476

In [28]:
len(country)

3476

### Export to CSV

In [30]:
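# Build the DataFrame, then write it to disk (the data/ directory must already exist).
# to_csv returns None when given a path, so its result should not be assigned to df.
df = pd.DataFrame({'reviews': reviews, 'star': star, 'date': date, 'country': country})
df.to_csv('data/BA_reviews.csv', index=False)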

In [32]:
df = pd.read_csv('data/BA_reviews.csv')
df.head()

Unnamed: 0,reviews,star,date,country
0,"✅ Trip Verified | Very competent check in staff, saw had a problem with my left arm and insisted I could not take exit seat. Moved me to row 30 where the middle seat was empty. On the other hand on board - huge line for toilets - 45 min into a 2.30 min flight a crew member realised one of the toilets is closed - as crew had put their luggage there. They announced that they could not serve hot drinks on this flight and to bear with them as service will be slow. On asking why: ""They did not give us enough cups for hot drinks. And the card machine is not working so we have to fill out each credit card slip"". A bottle of water and a nutrigrain bar.",\n\t\t\t\t\t\t\t\t\t\t\t\t\t5,23rd February 2023,United Arab Emirates
1,"✅ Trip Verified | Check in was so slow, no self check in and bag drop. Boarding was ok, flight totally full. Booked row 9 which was ok. Some space in overhead bins. Seat and legroom ok. Cabin crew ok, smiled, gave out the bottle of water and pretzels. Flight itself was ok, landed 25 mins early and then waited 30 minutes for a stand and ground crew. Bags came off reasonably quickly. BA still seems to believe it is something special, a premium carrier. The reality is that it is not. The only reason we flew BA was we got virtually free tickets because both flights were cancelled last year. BA is our carrier of last choice.",4,18th February 2023,United Kingdom
2,"✅ Trip Verified | My review relates to the appalling experiences I had with British Airways on 14th February 2023. I was due to travel to Madrid with British Airways and before setting off I heard on the radio that there were flight delays. I looked at the Heathrow Airport website and saw that my flight had been cancelled. As a result of the information on the website, I called British Airways and have 26 minutes on hold I found myself speaking to somebody in a call centre in South Africa. The person was hard to understand due to a heavy accent and he was incompetent and insisted that the flight was not cancelled. I made my way to Heathrow and sure enough the flight was indeed cancelled. The woman in question had the audacity to declare that the flight was cancelled and that “there are no facilities here at the airport to rebook you”. She refused to let me and about 12 other people join the queue for the clearly marked assistance desk. The same woman and one of her colleagues gave out a card with a telephone number and told one rather elderly gentleman to “go online to rebook your seat”. I am visually impaired and found the attitude of the BA employee to be appalling. I called the number on the card the employee gave me and was again in a long queue to a call centre. The person who picked up the phone kept me waiting 22 minutes and declared that the flight was not cancelled and that the airport “must be wrong”. He then in a flippant tone said he was just informed that the flight was indeed cancelled and that I would have to go and speak to IBERIA customer services in Heathrow to rebook! I took the elderly gentleman with me to IBERIA where the counter staff member rolled her eyes in disgust and said that it was nothing to do with IBERIA. I returned to BA and this time I joined the check-in line and told a check-in agent what had happened. The man in question booked me onto the next available flight within a matter of a few minutes. 
He gave me a £30 voucher as a form of compensation but said he could not add my frequent flyer number with a partner airline. The voucher proved useless I am afraid as various shops in the predepature area refused to accept it and although I was told I could use it after clearing security none of the shops would accept it. Once aboard my new flight the service was minimal. Other than a tiny bottle of water and a packet of pretzels nothing else was available. This was a full flight as so many people had been transferred from the cancelled flight. BA Cabin crew announced that any additional items other than the small bottle of water and pretzels had to be ordered via their skyshop.",5,18th February 2023,United Kingdom
3,"✅ Trip Verified | This was my first time flying with BA & I was pleasantly surprised. Islamabad via Doha was very comfortable and the crew were great, friendly and helpful. The second flight was operated by Qatar Airways this was from Doha to Islamabad. My return journey was a direct BA flight, 46kg luggage allowance was really generous and stopped the hassle with connecting flights. Crew on this flight were really nice too. What let BA down was the food, it really was not nice, the IFE could do with some more content but that's not really a huge issue.",1,16th February 2023,United Kingdom
4,"✅ Trip Verified | Lots of cancellations and delays and no one apologized. Edinburgh to London on Feb 14th, 2023, our original flight was cancelled, and the rebooked one was cancelled as well, the third rebooked one was supposed to departure at 12:50 and had several delays until 2:24. Then we had to wait 57 minutes for our luggage to arrive.",8,15th February 2023,Canada


Congratulations! Now you have your dataset for this task! The loop above collected 3,476 reviews by iterating through 35 paginated pages on the website. If you want to collect more data, try increasing the number of pages.
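
Instead of hard-coding the page count, the loop can keep going until a page comes back empty. The sketch below shows only that control flow, under the unverified assumption that a page past the end of the listing yields no reviews. `collect_pages`, `fetch_page` and `fake_fetch` are hypothetical names; `fake_fetch` stands in for a function that would download and parse one page.

```python
from typing import Callable, List


def collect_pages(fetch_page: Callable[[int], List[str]], max_pages: int = 500) -> List[str]:
    """Call fetch_page(1), fetch_page(2), ... until a page yields no items."""
    collected: List[str] = []
    for page_number in range(1, max_pages + 1):
        items = fetch_page(page_number)
        if not items:
            break  # an empty page is treated as the end of the listing
        collected.extend(items)
    return collected


def fake_fetch(page_number: int) -> List[str]:
    """Stand-in for a real downloader: two pages of made-up data."""
    pages = {1: ["review a", "review b"], 2: ["review c"]}
    return pages.get(page_number, [])


all_reviews = collect_pages(fake_fetch)
```

The `max_pages` cap is a safety limit so that a site which never returns an empty page cannot cause an endless loop.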

The next thing that you should do is clean this data to remove any unnecessary text from each of the rows. For example, "✅ Trip Verified" can be removed from each row where it appears, as it's not relevant to what we want to investigate. The `star` column can also carry leading whitespace (see the first row above), which should be stripped.
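
As a starting point for that cleaning step, here is a minimal sketch using only the standard library. `clean_review` and `clean_star` are hypothetical helper names, and the pattern covers only the "✅ Trip Verified |" prefix seen in the rows above; other markers on the site would need their own handling.

```python
import re

# Matches an optional check mark, the words "Trip Verified" and the "|" separator
# at the start of a review.
_VERIFIED_PREFIX = re.compile(r"^\s*✅?\s*Trip Verified\s*\|\s*")


def clean_review(text):
    """Remove the leading 'Trip Verified |' marker, if present, and trim whitespace."""
    return _VERIFIED_PREFIX.sub("", text).strip()


def clean_star(value):
    """Strip whitespace from a scraped rating; return an int, or None if it is not a number."""
    value = str(value).strip()
    return int(value) if re.fullmatch(r"[0-9]+", value) else None


example_review = clean_review("✅ Trip Verified | Very competent check in staff.")
example_star = clean_star("\n\t\t\t5")
```

With the DataFrame loaded, these could be applied column-wise, for example `df['reviews'] = df['reviews'].apply(clean_review)` and `df['star'] = df['star'].apply(clean_star)`.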