### Task Background
British Airways (BA) is the flag carrier airline of the United Kingdom (UK), operating thousands of flights daily across the globe. As a data scientist at BA, your role is pivotal in leveraging analytical insights to drive business decisions, optimize operations, and enhance customer experience.

#### Objective
The objective of this task is to scrape and collect customer feedback and review data from a third-party source to analyze and uncover insights.

#### Approach
To achieve this objective, the following steps were undertaken:

1. Web Scraping: Used the Python libraries `requests` and `BeautifulSoup` to scrape data from the AirlineQuality website.

2. Data Collection: Extracted relevant data including review details, ratings, and statuses.

3. Data Analysis: Processed the collected data to derive insights regarding customer sentiment, satisfaction levels, and overall feedback.

#### Data Components
- Review Details: Includes information such as review titles, dates, and user statuses.

- Review Ratings: Consists of ratings provided by users across various aspects of their experience.

- Review Statuses: Indicates the status of each review, providing context to the feedback provided.

In [None]:
# Web Scraping for British Airways Customer Reviews

## Code Overview

import requests                 # Importing requests library for making HTTP requests
from bs4 import BeautifulSoup   # Importing BeautifulSoup for parsing HTML content
import pandas as pd             # Importing pandas for data manipulation
from datetime import datetime   # Importing datetime for date manipulation
from dateutil import parser     # Importing parser from dateutil for parsing dates

# Extract total review count from initial page
url1 = "https://www.airlinequality.com/airline-reviews/british-airways"                           # URL of the initial page
response1 = requests.get(url1)                                                                    # Sending a GET request to the URL
soup1 = BeautifulSoup(response1.content, "html.parser")                                           # Parsing the HTML content of the response
total_review_count = int(soup1.find_all('div',{'class':'pagination-total'})[0].text.split()[-2])  # Extracting the total number of reviews

# Construct URL with total review count for pagination
url = f"https://www.airlinequality.com/airline-reviews/british-airways/?sortby=post_date%3ADesc&pagesize={total_review_count}"  # Constructing the URL with total review count for pagination
response = requests.get(url)                            # Sending a GET request to the URL
soup = BeautifulSoup(response.content, "html.parser")   # Parsing the HTML content of the response

# Extract review details
details_list = soup.find_all('h3', {'class': 'text_sub_header userStatusWrapper'})  # Finding all review details

# Extract review boxes
review_box = soup.find_all('div',{'class':'tc_mobile'})                             # Finding all review boxes

# Extract review statuses
status = soup.find_all('table',{'class':'review-ratings'})                          # Finding all review statuses

# Removing unnecessary first element
del status[0]

# Extract review ratings
ratings = soup.find_all('div', {'class': 'rating-10'})                              # Finding all review ratings
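
The `split()[-2]` lookup used for `total_review_count` depends on the exact wording of the pagination label. A minimal sketch of a more tolerant parser follows; the helper name `parse_total_reviews` and the sample label text are assumptions for illustration, and the real page wording may differ.

```python
import re


def parse_total_reviews(pagination_text: str) -> int:
    """Return the integer that precedes the word 'Reviews' in a pagination label."""
    match = re.search(r"(\d[\d,]*)\s+Reviews", pagination_text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"no review count found in {pagination_text!r}")
    # Drop thousands separators before converting
    return int(match.group(1).replace(",", ""))


# Hypothetical label text, not copied from the live site
print(parse_total_reviews("1 to 10 of 3599 Reviews"))  # 3599
```

Raising an error when the pattern is missing makes a layout change on the site fail loudly instead of producing a wrong page size.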

#### Function to Extract Ratings from Review Data
The function `get_rating()` is designed to extract ratings from a list of review elements. This function is a part of the process of scraping and analyzing customer review data for British Airways from a third-party website.

#### Code Explanation

- The function takes a list of review elements as input.
- It iterates through each review element in the list.
- For each review element, it extracts the rating value using BeautifulSoup.
- The extracted rating value is appended to a list of ratings.
- Finally, the function returns the list of ratings.

#### Conclusion

The `get_rating()` function efficiently extracts ratings from review data, enabling further analysis of customer feedback and sentiment. This function contributes to the overall process of deriving insights from customer reviews to enhance the customer experience within British Airways.


In [None]:
def get_rating(rating_elements):
    ratings = []   # Initialize list for storing ratings

    for div in rating_elements:  # Iterate through each review element
        # Extract the rating value from the review element
        rating_value_element = div.find('span', itemprop='ratingValue')
        rating_value = rating_value_element.text.strip() if rating_value_element else None
        ratings.append(rating_value)  # Add rating value to ratings list

    return ratings  # Return the list of ratings

rating_out_10 = get_rating(ratings)  # Get ratings
del rating_out_10[0]  # Remove first element (header)

#### Verification of Review Content

The function `verification()` is designed to verify the content of each review extracted from the review boxes. It examines whether the review includes a verification status, indicated by the presence of a checkmark symbol ('✅').

#### Code Explanation

- The function takes a list of review boxes as input.
- It iterates through each review box in the list.
- For each review box, it finds the verification status element and extracts the text content.
- The verification status is parsed from the text using a split operation.
- If the parsed result is at most 3 characters long, it is treated as a valid verification status and added to the verification list.
- Otherwise, 'no' is appended to indicate the absence of a verification status.
- Finally, the function returns the list of verification statuses.
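
Note that the length check maps any marker longer than three characters to 'no', which would include a marker such as "Trip Verified". A minimal alternative sketch is shown here, assuming each review text starts with a marker like "✅ Trip Verified" or "Not Verified" followed by `|`. The helper name `parse_verification` and the sample strings are hypothetical.

```python
def parse_verification(text: str) -> str:
    """Classify the leading marker of a review as 'yes', 'no', or 'unknown'."""
    head, sep, _ = text.partition("|")
    if not sep:
        # No '|' separator, so there is no marker to inspect
        return "unknown"
    marker = head.replace("✅", "").strip().lower()
    if "not verified" in marker:
        return "no"
    if "verified" in marker:
        return "yes"
    return "unknown"


# Hypothetical review texts (assumed marker format)
print(parse_verification("✅ Trip Verified | Great flight."))
print(parse_verification("Not Verified | Horrible airline."))
```

Keeping a separate 'unknown' value avoids counting reviews without any marker as unverified.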

#### Conclusion

The `verification()` function provides insight into the verification status of review content, which can be useful for assessing the credibility and reliability of customer feedback.


In [None]:
def verification(review_box):
    # Initialize list to store verification statuses
    verify = []

    # Iterate through each review box
    for div in review_box:
        # Find the text content element within the review box
        text_content = div.find('div', {'class': 'text_content'})
        if text_content is None:
            # No text content, so no verification status can be read
            verify.append('no')
            continue
        # Take the part before '|' and drop everything up to the checkmark
        result = text_content.text.split('|')[0].split('✅')[-1].strip()

        # Keep only short results (at most 3 characters); anything longer is stored as 'no'
        if len(result) <= 3:
            verify.append(result)
        else:
            verify.append('no')

    # Return the list of verification statuses
    return verify

# Call the verification function and store the result in 'verify'
verify = verification(review_box)

#### Function to Extract Reviews from Review Boxes

The function `get_reviews()` aims to extract the textual content of each review from the review boxes. It processes the HTML structure of each review box to retrieve the review text.

#### Code Explanation

- The function takes a list of review boxes as input.
- It iterates through each review box in the list.
- For each review box, it finds the text content element.
- If the text content is found, it is stripped of leading and trailing whitespace and split by the '|' character.
- If the split result contains at least two elements, the second element is considered as the review text. Otherwise, the first element is used.
- The extracted review text is appended to the list of reviews.
- If no text content is found, 'Null' is appended to indicate the absence of a review.
- Finally, the function returns the list of reviews.
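
One detail of the split step: `split('|')` followed by taking the second element drops anything after a second `|` inside the review body. A minimal sketch that splits only once is shown here; `split_review` and the sample text are hypothetical and not part of the notebook.

```python
from typing import Optional, Tuple


def split_review(text: str) -> Tuple[Optional[str], str]:
    """Split 'marker | body' on the first '|' only; marker is None when no '|' exists."""
    head, sep, tail = text.strip().partition("|")
    if sep:
        return head.strip(), tail.strip()
    return None, head.strip()


# Hypothetical review text containing a second '|'
marker, body = split_review("Not Verified | Seat 12A | legroom was fine.")
print(body)  # Seat 12A | legroom was fine.
```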

#### Conclusion

The `get_reviews()` function efficiently extracts review text from review boxes, providing valuable insights into customer feedback for further analysis.


In [None]:
def get_reviews(review_boxes):
    # Initialize list to store extracted reviews
    reviews = []
    
    # Iterate through each review box
    for box in review_boxes:
        # Find the text content element within the review box
        text_content = box.find('div', class_='text_content')
        
        # If text content is found
        if text_content:
            # Split the text content by '|' character
            review = text_content.text.strip().split('|')
            # Extract the review text
            if len(review) >= 2:
                review_text = review[1].strip()
            else:
                review_text = review[0].strip()
            # Append the review text to the list of reviews
            reviews.append(review_text)
        else:
            # If no text content is found, append 'Null' to indicate absence of review
            reviews.append('Null')
    
    # Return the list of extracted reviews
    return reviews

reviews = get_reviews(review_box)

#### Function to Extract Details from Review Data

The function `get_details()` is designed to extract specific details such as names, places, and dates from a list of review elements. This function utilizes BeautifulSoup and regular expressions for data extraction.

#### Code Explanation

- The function takes a list of review elements as input.
- It iterates through each review element in the list.
- For each review element, it initializes a BeautifulSoup instance to parse the HTML content.
- It then finds the name element using BeautifulSoup and extracts the text content.
- The place is extracted using a regular expression pattern to match text enclosed within parentheses.
- Similarly, the date is extracted using BeautifulSoup from the 'datetime' attribute of the time element.
- The extracted details (names, places, and dates) are appended to their respective lists.
- Finally, the function returns the lists of names, places, and dates.
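
The place pattern can be tried in isolation. This is a small sketch using only the standard library; the header strings are made-up examples of the assumed "Name (Place) date" layout.

```python
import re

# Non-greedy match: captures the text inside the first pair of parentheses
PLACE_PATTERN = re.compile(r"\((.*?)\)")


def extract_place(header_text):
    """Return the text inside the first parentheses, or None if there are none."""
    match = PLACE_PATTERN.search(header_text)
    return match.group(1) if match else None


print(extract_place("K Robson (United Kingdom) 16th July 2023"))  # United Kingdom
print(extract_place("Anonymous 1st May 2020"))                     # None
```

Because the match is non-greedy, a header with several parenthesised parts yields only the first one.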

#### Conclusion

The `get_details()` function facilitates the extraction of important details from review data, providing valuable insights for further analysis and interpretation.


In [None]:
import re  # Importing the re module for regular expressions

def get_details(detail_elements):
    # Initialize lists to store names, places, and dates
    names = []
    places = []
    dates = []

    # Iterate through each review detail
    for detail in detail_elements:
        # Create BeautifulSoup instance to parse HTML
        soup = BeautifulSoup(str(detail), 'html.parser')
        
        # Extract the name
        name_element = soup.find('span', itemprop='name')
        name = name_element.text.strip() if name_element else None
        
        # Extract the place using regular expression
        place_regex = r'\((.*?)\)'
        place_match = re.search(place_regex, soup.text)
        place = place_match.group(1) if place_match else None
        
        # Extract the date
        date_element = soup.find('time', itemprop='datePublished')
        date = date_element['datetime'] if date_element else None
        
        # Append extracted details to respective lists
        names.append(name)
        places.append(place)
        dates.append(date)

    # Return lists of names, places, and dates
    return names, places, dates

details = get_details(details_list)

#### Final Formatting of the Scraped Data into a CSV File

- The `scrap()` function is used to extract review details from review tables. It identifies different types of review attributes (e.g., ratings and text) and processes them accordingly.
- The main loop iterates through each review table and calls the `scrap()` function to extract review details.
- After scraping, the extracted data is structured into a DataFrame using pandas, and additional columns such as name, place, post date, verification status, review text, and overall rating are added.
- The DataFrame is then exported to a CSV file named 'BA_data.csv'.
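
When rows are built as plain lists, every row must line up positionally with the column header. One way to sidestep that, sketched here with toy values (not scraped data), is to build each row as a dictionary keyed by column name:

```python
import pandas as pd

# Toy rows for illustration only; values are not taken from the scraped dataset
rows = [
    {"Seat Type": "Economy Class", "Seat Comfort": 3, "Food Opted": "yes", "Food & Beverages": 1},
    {"Seat Type": "Premium Economy", "Seat Comfort": 5, "Food Opted": "no", "Food & Beverages": 0},
]

# Column names come from the dictionary keys, so no separate header list is needed
df_rows = pd.DataFrame(rows)
print(df_rows.columns.tolist())
```

With this layout, adding an extra field such as 'Food Opted' cannot shift the other values into the wrong column.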

#### Conclusion

The provided code effectively scrapes and processes review data for British Airways, allowing for further analysis and insights into customer feedback and satisfaction.


In [None]:
# Importing necessary libraries
import re
import pandas as pd

# Initialize list to store review data
data = []

# Attribute order, inferred from the column order of the DataFrame shown later in this notebook
x = ['Type Of Traveller', 'Seat Type', 'Route', 'Date Flown',
     'Seat Comfort', 'Cabin Staff Service', 'Food & Beverages', 'Inflight Entertainment',
     'Wifi & Connectivity', 'Ground Service', 'Value For Money', 'Recommended']

# Define function to scrape review details
def scrap(Table):
    doc = []

    # Attributes whose value is plain text (every other entry in `x` is a star rating)
    review_value = ['Type Of Traveller', 'Seat Type', 'Route', 'Date Flown', 'Recommended']

    # Iterate through each review attribute
    for Str in x:
        td = Table.find('td', class_='review-rating-header', string=Str)
        if Str in review_value:
            # Extract the text value, or 'Null' if the row is missing
            rating_td = td.find_next_sibling('td', class_='review-value') if td else None
            doc.append(rating_td.text if rating_td else 'Null')
        else:
            # Record whether a 'Food & Beverages' row exists (the 'Food Opted' column)
            if Str == 'Food & Beverages':
                doc.append('yes' if td else 'no')
            # Count filled stars; 0 marks a missing rating
            rating_td = td.find_next_sibling('td', class_='review-rating-stars') if td else None
            doc.append(len(rating_td.find_all('span', class_='star fill')) if rating_td else 0)
    # Append scraped data to main list
    data.append(doc)

# Iterate through review status elements and scrape data
for i in status:
    scrap(i)

# 'Food Opted' is emitted just before 'Food & Beverages', so the header needs the extra column
columns = []
for name in x:
    if name == 'Food & Beverages':
        columns.append('Food Opted')
    columns.append(name)

# Create DataFrame from scraped data
df = pd.DataFrame(data, columns=columns)

# Add additional columns to DataFrame
df.insert(loc=0, column='Name', value=details[0])
df.insert(loc=1, column='Place', value=details[1])
df.insert(loc=2, column='Post Date', value=details[2])
df.insert(loc=3, column='Verified', value=verify)
df.insert(loc=4, column='Review', value=reviews)
df['Rating'] = rating_out_10

# Export DataFrame to CSV
df.to_csv('BA_data.csv', index=False)

### Data Cleaning for British Airways Reviews

#### Making Sense of British Airways Review Data

In this section, we're cleaning up the data we've collected from British Airways reviews. Here's what's happening:

- First, we're reading in the CSV file that contains all the review data we've scraped.
- Next, we're replacing any 'Null' values with proper NaNs, so our data is consistent and understandable.
- Then, we're cleaning up the date columns, 'Date Flown' and 'Post Date', making sure they're in a proper datetime format for analysis.
- We're also converting the 'Rating' column to floats, so we can perform numerical operations on it later.
- Now, for some of the categorical columns like 'Type Of Traveller', 'Seat Type', and 'Route', we're making sure any 'Null' values are properly converted to NaNs.
- Similarly, for the rating columns like 'Seat Comfort', 'Cabin Staff Service', and others, we're replacing any 0 values with NaNs. This ensures that missing ratings don't skew our analysis.
- Finally, for the 'Recommended' column, we're converting categorical values ('yes' and 'no') to boolean values (True and False), so we can easily understand if a reviewer recommends British Airways or not.

By cleaning up the data, we're ensuring that it's ready for analysis. Once this is done, we'll be able to delve deeper into the insights hidden within these reviews and make informed decisions to enhance the British Airways experience.
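
The steps above can be tried on a toy frame first. This is a minimal sketch with made-up rows; the "Month Year" format for 'Date Flown' is an assumption about how the scraped values look.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only
raw = pd.DataFrame({
    "Route": ["Amman to London", "Null"],
    "Date Flown": ["July 2023", "Null"],
    "Seat Comfort": [3, 0],
    "Recommended": ["no", "yes"],
})

# 'Null' placeholders become NaN everywhere
cleaned = raw.replace("Null", np.nan)
# Assumed "Month Year" format; values that do not parse become NaT
cleaned["Date Flown"] = pd.to_datetime(cleaned["Date Flown"], format="%B %Y", errors="coerce")
# A rating of 0 means "not rated", so treat it as missing
cleaned["Seat Comfort"] = cleaned["Seat Comfort"].replace(0, np.nan)
# Map the yes/no strings to booleans
cleaned["Recommended"] = cleaned["Recommended"].map({"yes": True, "no": False})
print(cleaned)
```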


In [None]:
import pandas as pd 
import numpy as np

df = pd.read_csv('BA_data.csv')

# Replace 'Null' placeholders with NaN across the whole DataFrame
# (this also covers 'Type Of Traveller', 'Seat Type', and 'Route')
df = df.replace('Null', np.nan)

# Convert the date columns to datetime; unparseable 'Date Flown' values become NaT
df['Post Date'] = pd.to_datetime(df['Post Date'])
df['Date Flown'] = pd.to_datetime(df['Date Flown'], errors='coerce')

# Convert the overall rating to float for numerical operations
df['Rating'] = df['Rating'].astype(float)

# Clean the rating columns by replacing 0 values with NaN
rating_columns = ['Seat Comfort', 'Cabin Staff Service', 'Food & Beverages', 'Inflight Entertainment',
                  'Wifi & Connectivity', 'Ground Service', 'Value For Money', 'Rating']
df[rating_columns] = df[rating_columns].replace(0, np.nan)

# Clean the 'Recommended' column by converting 'yes' and 'no' values to boolean values
df['Recommended'] = df['Recommended'].map({'yes': True, 'no': False})

# Save the cleaned DataFrame, overwriting the raw CSV
df.to_csv('BA_data.csv', index=False)

### Viewing British Airways Review Data

#### Exploring Our British Airways Review Dataset

In this section, we're loading our British Airways review dataset from a CSV file and taking a quick look at the first few rows to understand its structure and contents.

#### Our Initial Data Exploration

We've imported the pandas library and loaded our dataset into a DataFrame named `df` using the `read_csv()` function.

#### Checking the First Few Rows

Now, let's take a peek at the first few rows of our dataset to get an overview of the information it contains. This will help us understand the columns and the type of data we're working with.

```python
df.head()
```

In [2]:
# Importing necessary libraries
import pandas as pd

# Reading the CSV file into a DataFrame
df = pd.read_csv('BA_data.csv')

# Displaying the first few rows of the DataFrame
df.head()


Unnamed: 0,Name,Place,Post Date,Verified,Review,Type Of Traveller,Seat Type,Route,Date Flown,Seat Comfort,Cabin Staff Service,Food Opted,Food & Beverages,Inflight Entertainment,Wifi & Connectivity,Ground Service,Value For Money,Recommended,Rating
0,K Robson,United Kingdom,2023-07-16,no,Horrible airline. Does not care about their cu...,Solo Leisure,Economy Class,Amman to London,2023-07-01,3.0,1.0,yes,1.0,1.0,3.0,4.0,3.0,False,2.0
1,Pradeep Madhavan,United Kingdom,2023-07-09,no,My family and I have flown mostly on British A...,Couple Leisure,Premium Economy,Chennai to London,2023-07-01,3.0,2.0,yes,1.0,1.0,,4.0,1.0,False,4.0
2,Jeffrey Rice,United States,2023-07-09,no,This has been by far the worst service I have ...,Couple Leisure,Economy Class,Istanbul to London,2023-07-01,2.0,2.0,no,,,,1.0,1.0,False,2.0
3,Bridget Fagan,United Kingdom,2023-07-08,no,In Nov 2022 I booked and paid for a return jou...,Solo Leisure,Economy Class,London to Edinburgh,2022-11-01,2.0,5.0,yes,3.0,3.0,,1.0,2.0,False,2.0
4,Bervin Hedman,United Kingdom,2023-07-06,no,BA is not treating its premium economy passeng...,Family Leisure,Premium Economy,Kingston to London,2023-06-01,5.0,4.0,yes,4.0,3.0,,3.0,3.0,False,4.0


### Understanding British Airways Review Data

#### Overview of Our Data

In this section, we're inspecting the structure and summary information of our British Airways review data using the `info()` function. 

#### What We're Doing

We're examining essential details about our DataFrame, such as the number of entries, the number of non-null values for each column, and the data types of each column. 

#### Why It's Important

Understanding the structure and data types of our DataFrame helps us identify any missing or inconsistent data, plan data cleaning and preprocessing steps, and ensure that our analysis is based on accurate and reliable information.

#### Conclusion

By gaining insights into the overall structure and summary information of our British Airways review data, we're better prepared to proceed with our analysis and extract meaningful insights from the dataset.


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3599 entries, 0 to 3598
Data columns (total 19 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Name                    3599 non-null   object 
 1   Place                   3597 non-null   object 
 2   Post Date               3599 non-null   object 
 3   Verified                3599 non-null   object 
 4   Review                  3599 non-null   object 
 5   Type Of Traveller       2829 non-null   object 
 6   Seat Type               3597 non-null   object 
 7   Route                   2824 non-null   object 
 8   Date Flown              2821 non-null   object 
 9   Seat Comfort            3492 non-null   float64
 10  Cabin Staff Service     3482 non-null   float64
 11  Food Opted              3599 non-null   object 
 12  Food & Beverages        3238 non-null   float64
 13  Inflight Entertainment  2504 non-null   float64
 14  Wifi & Connectivity     574 non-null    

### Summarizing British Airways Review Data

#### Exploring Statistical Summary

In this section, we're generating a statistical summary of our British Airways review data using the `describe()` function.

#### What We're Doing

We're computing descriptive statistics for numerical columns in our DataFrame, including count, mean, standard deviation, minimum, maximum, and quartile values.

#### Why It's Important

Generating a statistical summary allows us to gain insights into the distribution and variability of numerical variables in our dataset. It helps us understand the central tendency, spread, and shape of the data, which are crucial for making informed decisions and drawing meaningful conclusions.

#### Conclusion

By examining the statistical summary of our British Airways review data, we can identify patterns, outliers, and potential areas of interest for further analysis, enabling us to derive valuable insights and recommendations.


In [4]:
df.describe()

Unnamed: 0,Seat Comfort,Cabin Staff Service,Food & Beverages,Inflight Entertainment,Wifi & Connectivity,Ground Service,Value For Money,Rating
count,3492.0,3482.0,3238.0,2504.0,574.0,2758.0,3598.0,3594.0
mean,2.888603,3.263929,2.724212,2.653754,1.930314,2.812183,2.709283,4.779911
std,1.36302,1.488098,1.439193,1.398704,1.358325,1.451765,1.469352,3.168472
min,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
25%,2.0,2.0,1.0,1.0,1.0,1.0,1.0,2.0
50%,3.0,3.0,3.0,3.0,1.0,3.0,3.0,4.0
75%,4.0,5.0,4.0,4.0,3.0,4.0,4.0,8.0
max,5.0,5.0,5.0,5.0,5.0,5.0,5.0,10.0


### Exploring Correlation Between Review Attributes

#### Understanding Relationship Between Review Attributes

In this section, we're analyzing the correlation between different review attributes such as 'Seat Comfort', 'Cabin Staff Service', 'Food & Beverages', 'Inflight Entertainment', 'Wifi & Connectivity', 'Ground Service', 'Value For Money', 'Recommended', and 'Rating'.

#### What We're Doing

We're computing the Spearman correlation coefficients between these review attributes using the `corr()` function with the 'spearman' method.
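
Spearman correlation is the Pearson correlation of the ranked values, which is why it suits ordinal star ratings. A small sketch with toy numbers (not from the dataset) illustrates the equivalence:

```python
import numpy as np
import pandas as pd

# Toy data: y grows monotonically with x, but not linearly
toy = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [1, 4, 9, 16, 25]})

spearman = toy.corr(method="spearman")
pearson_on_ranks = toy.rank().corr(method="pearson")
pearson = toy.corr(method="pearson")

# Spearman equals Pearson computed on ranks
print(np.allclose(spearman, pearson_on_ranks))
print(spearman.loc["x", "y"], pearson.loc["x", "y"])
```

For this monotonic relationship the Spearman coefficient is 1, while the Pearson coefficient on the raw values is slightly below 1.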

#### Why It's Important

Understanding the correlation between review attributes helps us identify potential relationships and dependencies between different aspects of the flight experience. It allows us to pinpoint areas that may have a significant impact on overall ratings and customer recommendations.

#### Conclusion

By analyzing the correlation matrix, we can uncover insights into how different aspects of the flight experience are related to each other, enabling us to prioritize areas for improvement and enhance the overall customer satisfaction with British Airways.


In [12]:
rating_columns = ['Seat Comfort', 'Cabin Staff Service', 'Food & Beverages', 'Inflight Entertainment',
                  'Wifi & Connectivity', 'Ground Service', 'Value For Money', 'Recommended', 'Rating']

# Restrict the correlation to the rating-related columns listed above
corr = df[rating_columns].corr(method='spearman', numeric_only=True)
corr

Unnamed: 0,Seat Comfort,Cabin Staff Service,Food & Beverages,Inflight Entertainment,Wifi & Connectivity,Ground Service,Value For Money,Recommended,Rating
Seat Comfort,1.0,0.599993,0.616279,0.529257,0.559472,0.536218,0.694639,0.653902,0.735054
Cabin Staff Service,0.599993,1.0,0.715341,0.461425,0.490021,0.527636,0.661757,0.662224,0.723667
Food & Beverages,0.616279,0.715341,1.0,0.533965,0.541256,0.50503,0.704444,0.695698,0.744917
Inflight Entertainment,0.529257,0.461425,0.533965,1.0,0.658749,0.448801,0.509313,0.498571,0.540615
Wifi & Connectivity,0.559472,0.490021,0.541256,0.658749,1.0,0.447826,0.532138,0.535152,0.545394
Ground Service,0.536218,0.527636,0.50503,0.448801,0.447826,1.0,0.670034,0.626434,0.729626
Value For Money,0.694639,0.661757,0.704444,0.509313,0.532138,0.670034,1.0,0.790908,0.86343
Recommended,0.653902,0.662224,0.695698,0.498571,0.535152,0.626434,0.790908,1.0,0.829839
Rating,0.735054,0.723667,0.744917,0.540615,0.545394,0.729626,0.86343,0.829839,1.0


### Visualizing Correlation Between Review Attributes

#### Understanding Correlation Patterns

In this section, we're visualizing the correlation between different review attributes such as 'Seat Comfort', 'Cabin Staff Service', 'Food & Beverages', 'Inflight Entertainment', 'Wifi & Connectivity', 'Ground Service', 'Value For Money', 'Recommended', and 'Rating'.

#### What We're Doing

We're creating a heatmap using Plotly to visually represent the correlation matrix computed earlier. Each cell in the heatmap represents the correlation coefficient between two review attributes. We're also adding text annotations to display the correlation values on each cell.

#### Why It's Important

Visualizing the correlation heatmap allows us to easily identify patterns and relationships between different review attributes. We can quickly spot areas of high positive or negative correlation, which helps us understand how changes in one attribute may affect others.

#### Conclusion

By visualizing the correlation heatmap, we gain valuable insights into the interplay between different aspects of the flight experience. This visualization guides our decision-making process and helps us prioritize areas for improvement to enhance overall customer satisfaction with British Airways.


In [6]:
import pandas as pd
import plotly.graph_objects as go

# Get the column names for the heatmap
columns = corr.columns.tolist()

# Create the heatmap using Plotly
fig = go.Figure(data=go.Heatmap(
    z=corr.values,
    x=columns,
    y=columns,
    colorscale='Viridis',  # You can choose any other colorscale
    colorbar=dict(title='Correlation'),
))

# Add text annotations for correlation values on each box
for i in range(len(columns)):
    for j in range(len(columns)):
        fig.add_annotation(x=columns[j], y=columns[i],
                           text=str(round(corr.iloc[i, j], 2)),
                           showarrow=False, font=dict(size=10, color='white'))

# Customize the layout
fig.update_layout(
    title='Correlation Heatmap',
    xaxis_title='Columns',
    yaxis_title='Columns',
    width=700,
    height=700,
)

# Show the heatmap
fig.show()

### Analyzing Seat Types Distribution

#### Exploring Seat Types

In this section, we're visualizing the distribution of different seat types within our British Airways review dataset.

#### What We're Doing

We're creating a histogram using Plotly Express to visualize the count of each seat type. Each bar in the histogram represents the frequency of a specific seat type. We're customizing the colors of the bars to distinguish between different seat types.

#### Why It's Important

Analyzing the distribution of seat types shows which cabin classes reviewers flew in and how heavily each class is represented in the dataset. This matters when interpreting the ratings, since a class with few reviews supports weaker conclusions.

#### Conclusion

By visualizing the distribution of seat types, we gain insights into the preferences and choices of customers when it comes to seating arrangements. This information can be valuable for British Airways in enhancing the overall flight experience and catering to the diverse needs of passengers.


In [7]:
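# Note: `plt` here is Plotly Express (more commonly aliased as `px`), not matplotlib.pyplot
import plotly.express as plt

# One bar per seat type, each with its own color
plt.histogram(
    df,
    x='Seat Type',
    height=400,
    width=600,
    color='Seat Type',
    color_discrete_sequence=['purple', 'yellow', 'blue', '#48E8ED'],
    title='Seat Type Count',
)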

#### Analyzing Traveler Type Distribution

We're using Plotly Express to create a histogram showcasing the distribution of traveler types within our British Airways review dataset. Each bar represents the count of a specific traveler type, aiding in understanding passenger demographics and tailoring service offerings accordingly.


In [8]:
plt.histogram(df,x='Type Of Traveller',height=400,width=600,color='Type Of Traveller',color_discrete_sequence=['purple','yellow','blue','#48E8ED'],title= 'Type of Traveller Count')

#### Analyzing Ratings by Traveler Type

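We're using Plotly Express to create a histogram of ratings by traveler type within our British Airways review dataset. Because `y='Rating'` is passed, each bar shows the sum of ratings for that traveler type (the default aggregation when `y` is given), colored by rating value.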

In [9]:
plt.histogram(df,x='Type Of Traveller',y = 'Rating',height=400,width=600,color='Rating', title= 'Distribution of Ratings Respective To Type Of Traveller')
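
A per-group mean is often easier to compare than a summed bar, because it does not grow with the number of reviews in a group. A minimal sketch with toy data (the values are illustrative, not from the dataset):

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "Type Of Traveller": ["Solo Leisure", "Solo Leisure", "Couple Leisure", "Business"],
    "Rating": [2.0, 4.0, 4.0, 8.0],
})

# Count, mean, and sum of ratings per traveler type
summary = toy.groupby("Type Of Traveller")["Rating"].agg(["count", "mean", "sum"])
print(summary)
```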

#### Analyzing Seat Comfort by Traveler Type

We're using Plotly Express to create a histogram of seat comfort ratings by traveler type within our British Airways review dataset. Because `y='Seat Comfort'` is passed, each bar shows the sum of seat comfort ratings for that traveler type, colored by rating value.

In [10]:
plt.histogram(df,x='Type Of Traveller',y = 'Seat Comfort',height=400,width=600,color='Seat Comfort',title= 'Seat Comfort Respective To Type Of Traveller')

### Conclusion

We conducted an analysis of the British Airways review data, focusing on understanding passenger sentiments and preferences. 

#### Data Preparation and Cleaning

We started by preparing and cleaning the dataset: converting 'Null' placeholders and zero ratings to NaN, parsing the 'Post Date' and 'Date Flown' columns as datetimes, casting 'Rating' to float, and mapping 'Recommended' to boolean values.

#### Data Analysis

We performed exploratory data analysis to uncover insights from the cleaned dataset. This included analyzing distributions of traveler types, ratings, and seat comfort. We used histograms to visualize the distribution of ratings and seat comfort ratings across different traveler types.

### Insights 💡

Our analysis of the correlation between different review attributes revealed the following insights:
- Value for money has the strongest correlation with the overall rating (0.86), indicating that passengers who perceive better value for money tend to give higher ratings.
- Food & beverages (0.74), seat comfort (0.74), ground service (0.73), and cabin staff service (0.72) all show strong positive correlations with the overall rating, at very similar levels.
- Inflight entertainment (0.54) and wifi & connectivity (0.55) correlate positively with the overall rating, but noticeably more weakly than the other attributes.
- The recommendation status also correlates strongly with ratings (0.83), suggesting that passengers who recommend British Airways tend to give higher ratings.
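
The ordering can be checked directly from the 'Rating' column of the correlation matrix shown earlier. A small sketch using those printed values:

```python
# Spearman correlations with 'Rating', copied from the correlation matrix output above
rating_corr = {
    "Seat Comfort": 0.735054,
    "Cabin Staff Service": 0.723667,
    "Food & Beverages": 0.744917,
    "Inflight Entertainment": 0.540615,
    "Wifi & Connectivity": 0.545394,
    "Ground Service": 0.729626,
    "Value For Money": 0.86343,
    "Recommended": 0.829839,
}

# Sort attributes from strongest to weakest correlation with the overall rating
ranked = sorted(rating_corr.items(), key=lambda item: item[1], reverse=True)
for name, value in ranked:
    print(f"{name:<24}{value:.3f}")
```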

#### Next Steps

Based on these findings, further analysis could include sentiment analysis to understand the tone of the reviews and topic modeling to identify common themes or topics discussed by passengers. These insights can inform strategic decisions to enhance the customer experience and improve overall satisfaction with British Airways services.
