# Task


1. **Web Scraping**: Use the Skytrax website (airlinequality.com) to gather review data about the airline. You can use the provided Jupyter Notebook in the Resources section to execute Python code that will assist in data collection.

2. **Data Analysis**: Once you have your dataset, prepare it by cleaning the messy and text-heavy data. After cleaning, conduct your own analysis to uncover insights. You might consider sentiment analysis, topic modeling, or generating word clouds to gain insights into the content of the reviews. It's recommended to complete this task using Python, but you can use any tool of your choice. Utilize the documentation websites provided in the Resources section to analyze the data.

3. **Presenting Insights**: Create a single PowerPoint slide that summarizes your findings. Include visualizations and metrics in this slide, along with clear and concise explanations to quickly convey the key points from your analysis. Use the provided PowerPoint template to create this slide.



Name: Dwi Putra Satria Utama

# Import libraries

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time


In [2]:
# Single-page test of the review selector (kept for reference, disabled)
#url = "https://www.airlinequality.com/airline-reviews/british-airways"
#response = requests.get(url)
#html = response.text
#soup = BeautifulSoup(html, "html.parser")
#for i in soup.find_all('div', {'class': 'text_content'}):
    #print(i.text)
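The cell above relies on `soup.find_all('div', {'class': 'text_content'})` to pick out review bodies. The sketch below runs the same selector offline against a small HTML snippet. The snippet is hypothetical and much simpler than the real Skytrax markup, which may also change over time.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified HTML standing in for a review page
html = """
<article>
  <div class="text_content">✅ Trip Verified |  Smooth boarding.</div>
  <div class="text_content">Not Verified | Lost baggage.</div>
  <div class="other">Not a review</div>
</article>
"""

soup = BeautifulSoup(html, "html.parser")
# The attribute dict matches every <div> whose class includes "text_content"
reviews = [div.get_text(strip=True)
           for div in soup.find_all("div", {"class": "text_content"})]
print(reviews)
```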

In [3]:
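# URL template; {} is filled in with the page number
base_url = "https://www.airlinequality.com/airline-reviews/british-airways/page/{}"

# List to store the scraped reviews
scraped_data = []

# Loop through 100 pages
for page in range(1, 101):
    # Format into a separate variable so the template keeps its {} placeholder
    url = base_url.format(page)
    response = requests.get(url, timeout=30)

    if response.status_code == 200:
        html = response.text
        soup = BeautifulSoup(html, "html.parser")

        # Each review body is inside a <div class="text_content">
        for i in soup.find_all('div', {'class': 'text_content'}):
            scraped_data.append(i.text)

    # Pause 2 seconds between requests, whether or not the page loaded
    time.sleep(2)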
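A pattern to avoid when paginating is reassigning the template to its own formatted result (`url = url.format(page)`). After the first pass the `{}` placeholder is gone, and `str.format` silently ignores the extra argument, so every later request goes to page 1. The minimal sketch below contrasts that pattern with building each URL from an untouched template. The helper name `page_urls` is hypothetical.

```python
BASE_URL = "https://www.airlinequality.com/airline-reviews/british-airways/page/{}"

def page_urls(template, first, last):
    """Build one URL per page from an unchanged template (hypothetical helper)."""
    return [template.format(page) for page in range(first, last + 1)]

urls = page_urls(BASE_URL, 1, 3)

# Reassigning the template consumes the placeholder on the first iteration
url = BASE_URL
reused = []
for page in range(1, 4):
    url = url.format(page)  # no "{}" left after page 1, so the URL never changes
    reused.append(url)

print(urls)
print(reused)
```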

In [4]:
# Create a DataFrame from the scraped reviews
df = pd.DataFrame({"Reviews": scraped_data})

# Display the DataFrame
print(df)

                                               Reviews
0    ✅ Trip Verified |  My family flew from Washing...
1    ✅ Trip Verified |  Easy check in a T5. Galleri...
2    Not Verified |  Flight delayed by an hour, it ...
3    Not Verified | The staff are very rude and not...
4    ✅ Trip Verified |  Good domestic flight operat...
..                                                 ...
995  Not Verified | Failed at all basic travel fund...
996  ✅ Trip Verified |  They lost my baggage in a v...
997  ✅ Trip Verified |  Late boarding led to a one ...
998  ✅ Trip Verified | As usual the flight is delay...
999  ✅ Trip Verified |  I had the most fantastic BA...

[1000 rows x 1 columns]
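Each page should contribute different reviews, so counting duplicate rows is a cheap sanity check on the scrape. A pagination mistake, for example, would show up as repeated reviews. The sketch uses a hypothetical three-item list in place of `scraped_data`.

```python
import pandas as pd

# Hypothetical stand-in for scraped_data, with one repeated review
scraped = ["Review A", "Review B", "Review A"]
sample_df = pd.DataFrame({"Reviews": scraped})

# Count rows whose text already appeared earlier in the column
n_dupes = sample_df["Reviews"].duplicated().sum()
print(f"{n_dupes} duplicate review(s) out of {len(sample_df)}")

# Keep the first occurrence of each review and renumber the index
deduped = sample_df.drop_duplicates(subset="Reviews").reset_index(drop=True)
```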


In [5]:
# Remove the verification prefix: keep only the text after the first "|"
df["Reviews"] = df["Reviews"].str.split("|", n=1).str[-1].str.strip()

# Display the DataFrame after cleaning
df

|     | Reviews |
|-----|---------|
| 0   | My family flew from Washington to London on a ... |
| 1   | Easy check in a T5. Galleries south and North ... |
| 2   | Flight delayed by an hour, it happens, no bigg... |
| 3   | The staff are very rude and not trained proper... |
| 4   | Good domestic flight operated by BA Cityflyer.... |
| ... | ... |
| 995 | Failed at all basic travel fundamentals: 1) Ou... |
| 996 | They lost my baggage in a very simple situatio... |
| 997 | Late boarding led to a one hour flight leaving... |
| 998 | As usual the flight is delayed. BA try to blam... |
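The split on `"|"` discards the "Trip Verified" / "Not Verified" prefix. If that status is useful for later analysis, one option is to keep it as a separate boolean column before stripping it. This is a minimal sketch on hypothetical sample rows; the `Verified` column name is an assumption, not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample rows in the scraped format
sample = pd.DataFrame({"Reviews": [
    "✅ Trip Verified |  Easy check in a T5.",
    "Not Verified | The staff are very rude.",
    "No separator in this review",
]})

# Split once on the first "|" into [prefix, text] (or [text] if there is no "|")
parts = sample["Reviews"].str.split("|", n=1)
has_prefix = parts.str.len() == 2

# Verified only when a prefix exists and it says "Trip Verified"
sample["Verified"] = has_prefix & parts.str[0].str.contains("Trip Verified", regex=False)

# Same cleaning as above: keep the text after the first "|", trimmed
sample["Reviews"] = parts.str[-1].str.strip()
print(sample)
```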


In [6]:
# Save the DataFrame to a CSV file
df.to_csv("british_airways_reviews_19august2023.csv", index=False)
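Passing `index=False` matters when the file is read back. Without it, the row index is written as an unnamed first column, which `pd.read_csv` then loads as `Unnamed: 0`. The sketch below shows both cases using in-memory buffers instead of a file. The sample data is hypothetical.

```python
import io
import pandas as pd

sample_df = pd.DataFrame({"Reviews": ["Great crew.", "Delayed again."]})

# index=False: only the Reviews column is written
buf = io.StringIO()
sample_df.to_csv(buf, index=False)
buf.seek(0)
reloaded = pd.read_csv(buf)
print(reloaded.columns.tolist())

# Default index=True: the index comes back as an "Unnamed: 0" column
buf_with_index = io.StringIO()
sample_df.to_csv(buf_with_index)
buf_with_index.seek(0)
with_index = pd.read_csv(buf_with_index)
print(with_index.columns.tolist())
```

Alternatively, `pd.read_csv(..., index_col=0)` restores a file that was written with its index.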