## **Final Project**

### **A. Introduction**

---

**Batch**         : BSD-005  
**Group**         : 002  

**Team Members**  :
- Livia Amanda Annafiah
- Alfarabi
- Badriah Nursakinah

**Dataset**       : [Airline Reviews](https://www.kaggle.com/datasets/juhibhojani/airline-reviews/data)  

**Hugging Face**  :


---

**Problem Statement**  

Choosing the right airline can greatly affect a traveler's overall experience, including comfort, service quality, and in-flight amenities. With many online reviews available, airline passengers often **rely on these reviews** to make informed decisions about which airline to choose. However, the sheer volume of reviews makes it difficult and **time-consuming** to read them all and gauge the general opinion about an airline.

**FlightBuddy** addresses this problem by using Natural Language Processing (NLP) to analyze airline reviews automatically. By processing a large number of reviews, FlightBuddy classifies the opinion expressed in each review as positive or negative.

---

**Objective**  

The main goal of **FlightBuddy** is to improve the decision-making process for travelers by providing personalized airline recommendations based on the analysis of review sentiments. Specifically, FlightBuddy aims to:

- Analyze the sentiment of airline reviews to classify them as positive or negative.
- For users who have seen favorable reviews of an airline, recommend five airlines with similar characteristics.
- For users who have had negative experiences, suggest top-rated alternative airlines so that they have better options for future travel.
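The recommender in Section D covers the similarity-based strategy. A minimal sketch of how both objectives could be routed from a sentiment label is shown below. It assumes the sentiment is already available as the string `'positive'` or `'negative'`, and it assumes "top-rated" means the highest mean across the seven average-rating columns. The names `recommend`, `sample`, and `RATING_COLUMNS` are hypothetical, and `sample` holds a few rows copied from the ratings table in Section C, so its scores will differ from those computed on the full table.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler

# Rating columns as named in avg_ratings_per_airline.csv
RATING_COLUMNS = ['avg_seat_comfort', 'avg_cabin_staff_service', 'avg_food_beverages',
                  'avg_ground_service', 'avg_inflight_entertainment',
                  'avg_wifi_connectivity', 'avg_value_for_money']

# Hypothetical sample: a few rows copied from the Section C output
sample = pd.DataFrame(
    [
        ['Air Costa', 5.000000, 4.000000, 3.000000, 4.000000, 2.000000, 1.00, 5.00000],
        ['UP by El Al', 2.000000, 2.250000, 2.000000, 2.500000, 2.000000, 1.00, 2.50000],
        ['Iraqi Airways', 3.000000, 2.750000, 2.750000, 2.500000, 2.750000, 1.00, 3.00000],
        ['Asiana Airlines', 3.643286, 4.029729, 3.819823, 3.650000, 2.960000, 1.57, 3.83000],
        ['Air Tahiti Nui', 2.461307, 2.996890, 2.605181, 1.630000, 2.250000, 1.11, 2.40000],
        ['Copa Airlines', 2.160389, 2.383309, 1.997148, 1.570000, 1.850000, 1.08, 1.65000],
        ['Aegean Airlines', 2.793286, 3.201013, 2.766253, 2.510000, 2.170000, 1.55, 2.49000],
    ],
    columns=['airline_name'] + RATING_COLUMNS,
)


def recommend(df, airline, sentiment, n=5):
    """Hypothetical helper: pick a recommendation strategy from a review's sentiment."""
    if sentiment == 'positive':
        # Positive review: airlines whose standardized rating profile is most similar
        z = StandardScaler().fit_transform(df[RATING_COLUMNS])
        sim = pd.DataFrame(cosine_similarity(z),
                           index=df['airline_name'], columns=df['airline_name'])
        return sim[airline].drop(airline).nlargest(n)
    if sentiment == 'negative':
        # Negative review: other airlines with the highest mean rating
        # (assumed definition of "top-rated")
        overall = df.set_index('airline_name')[RATING_COLUMNS].mean(axis=1)
        return overall.drop(airline).nlargest(n)
    raise ValueError("sentiment must be 'positive' or 'negative'")


print(recommend(sample, 'Copa Airlines', 'negative', n=3))
```

With the sample rows, a negative review of Copa Airlines leads to Air Costa, Asiana Airlines, and Iraqi Airways, the three highest mean ratings among the remaining airlines.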

### **B. Libraries**

The following libraries are used for this analysis:

```python
# Import libraries for data loading and manipulation
import pandas as pd
import numpy as np

# Import libraries for the recommendation system
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler

# Import library to ignore warnings
import warnings
warnings.filterwarnings('ignore')
```

### **C. Data Loading**

```python
# Read CSV file
df = pd.read_csv('avg_ratings_per_airline.csv')

# Display the dataframe
df
```

| Unnamed: 0 | airline_name | avg_seat_comfort | avg_cabin_staff_service | avg_food_beverages | avg_ground_service | avg_inflight_entertainment | avg_wifi_connectivity | avg_value_for_money |
|---|---|---|---|---|---|---|---|---|
| 0 | Air Costa | 5.000000 | 4.000000 | 3.000000 | 4.000000 | 2.000000 | 1.00 | 5.00000 |
| 1 | UP by El Al | 2.000000 | 2.250000 | 2.000000 | 2.500000 | 2.000000 | 1.00 | 2.50000 |
| 2 | Iraqi Airways | 3.000000 | 2.750000 | 2.750000 | 2.500000 | 2.750000 | 1.00 | 3.00000 |
| 3 | Asiana Airlines | 3.643286 | 4.029729 | 3.819823 | 3.650000 | 2.960000 | 1.57 | 3.83000 |
| 4 | Air Tahiti Nui | 2.461307 | 2.996890 | 2.605181 | 1.630000 | 2.250000 | 1.11 | 2.40000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 492 | Copa Airlines | 2.160389 | 2.383309 | 1.997148 | 1.570000 | 1.850000 | 1.08 | 1.65000 |
| 493 | Eastern Airways | 2.496214 | 3.021872 | 2.515598 | 1.555556 | 1.793651 | 1.00 | 2.31532 |
| 494 | Aegean Airlines | 2.793286 | 3.201013 | 2.766253 | 2.510000 | 2.170000 | 1.55 | 2.49000 |
| 495 | WOW air | 2.019470 | 2.448445 | 1.995721 | 1.870000 | 1.570000 | 1.05 | 2.00000 |
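The `Unnamed: 0` column most likely means the CSV was written with `DataFrame.to_csv` using its default `index=True`, so the old row index was saved as an unnamed first column. It does not affect the recommender, which only reads the rating columns. The sketch below reproduces the effect on a two-row frame (values copied from the table above; the names `original`, `with_extra`, and `restored` are illustrative) and shows that `read_csv(..., index_col=0)` reads that column back as the index instead.

```python
import io

import pandas as pd

# Two-row frame shaped like avg_ratings_per_airline.csv (values from the output above)
original = pd.DataFrame({
    'airline_name': ['Air Costa', 'UP by El Al'],
    'avg_seat_comfort': [5.0, 2.0],
})

# to_csv writes the RangeIndex as an unnamed first column by default
buffer = io.StringIO()
original.to_csv(buffer)

# Reading it back without index_col turns that column into 'Unnamed: 0'
buffer.seek(0)
with_extra = pd.read_csv(buffer)
print(list(with_extra.columns))  # ['Unnamed: 0', 'airline_name', 'avg_seat_comfort']

# index_col=0 restores the first column as the index instead of a data column
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
print(list(restored.columns))  # ['airline_name', 'avg_seat_comfort']
```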


### **D. Recommender System**

```python
# Standardize the ratings (z-score: zero mean and unit variance per column)
scaler = StandardScaler()

rating_columns = ['avg_seat_comfort', 'avg_cabin_staff_service', 'avg_food_beverages', 'avg_ground_service',
                  'avg_inflight_entertainment', 'avg_wifi_connectivity', 'avg_value_for_money']

normalized_data = scaler.fit_transform(df[rating_columns])

# Compute pairwise cosine similarity between the airlines' standardized rating vectors
similarity_matrix = cosine_similarity(normalized_data)

# Convert the similarity matrix to a DataFrame labeled by airline name on both axes
similarity_df = pd.DataFrame(similarity_matrix, index=df['airline_name'], columns=df['airline_name'])

# Function to get recommendations based on cosine similarity
def get_similar_airlines(airline, n_recommendations=5):
    """Return the n_recommendations airlines whose ratings are most similar to `airline`."""
    # Get the similarity scores for the specified airline with all others
    similar_scores = similarity_df[airline].sort_values(ascending=False)

    # Remove the airline itself from the recommendation
    similar_scores = similar_scores.drop(airline)

    # Get the top N similar airlines
    top_airlines = similar_scores.head(n_recommendations)
    return top_airlines
```
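To make the two steps above concrete, the sketch below recomputes them by hand with NumPy and checks the result against scikit-learn. Each column is standardized as $z = (x - \mu) / \sigma$ using the population standard deviation, which matches `StandardScaler`, and the similarity of two airlines is the dot product of their standardized rating vectors divided by the product of their L2 norms. The matrix `X` holds three rows copied from the Section C output (Air Costa, Asiana Airlines, Copa Airlines); because only three rows are standardized, the scores are illustrative and will not match those computed on the full table.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler

# Seven rating columns for Air Costa, Asiana Airlines, and Copa Airlines (from Section C)
X = np.array([
    [5.000000, 4.000000, 3.000000, 4.000000, 2.000000, 1.00, 5.00000],
    [3.643286, 4.029729, 3.819823, 3.650000, 2.960000, 1.57, 3.83000],
    [2.160389, 2.383309, 1.997148, 1.570000, 1.850000, 1.08, 1.65000],
])

# Standardize each column: subtract the column mean, divide by the population std (ddof=0)
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Cosine similarity: scale each row to unit length, then take all pairwise dot products
unit = Z / np.linalg.norm(Z, axis=1, keepdims=True)
manual = unit @ unit.T

# Same pipeline through scikit-learn
library = cosine_similarity(StandardScaler().fit_transform(X))
print(np.allclose(manual, library))  # True
```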

### **E. Test the Recommender System**

```python
# Test the function by getting similar airlines to "TUS Airways"
similar_airlines_to_tus = get_similar_airlines('TUS Airways')

# Show result
similar_airlines_to_tus
```

```
airline_name
Golden Myanmar Airlines    0.991354
Air Rarotonga              0.983796
Bangkok Airways            0.983468
Air KBZ                    0.981220
Bassaka Air                0.980871
Name: TUS Airways, dtype: float64
```
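When reading these scores, note that cosine similarity compares the direction of two standardized rating vectors, not their length. An airline that is slightly above average in every category and one that is far above average in every category score 1.0 against each other, while a profile that is below average everywhere scores -1.0. A high score therefore indicates a similar rating *profile* rather than a similar overall *level* of quality. The three vectors below are hypothetical z-scores chosen for illustration, not rows from the dataset.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical standardized rating profiles (seven z-scores each), not taken from the dataset
slightly_above_average = np.full(7, 0.5)
far_above_average = np.full(7, 2.0)
below_average = np.full(7, -0.5)

profiles = np.vstack([slightly_above_average, far_above_average, below_average])
scores = cosine_similarity(profiles)

print(round(scores[0, 1], 6))  # 1.0: same direction, different magnitude
print(round(scores[0, 2], 6))  # -1.0: opposite direction
```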

### **F. Conclusion**