<a href="https://www.kaggle.com/code/shet9s/las-vegas-hotels-reviews-analysis-tripadvisor?scriptVersionId=143715781" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Analyzing Las Vegas Hotels Reviews on TripAdvisor

In this notebook, I delve into a comprehensive analysis of hotel reviews in Las Vegas sourced from TripAdvisor.

The analysis covers the following key aspects and more:
1. **Amenities Impact:** Understanding how amenities such as free WiFi, a pool, a casino, or a spa affect average ratings.
2. **Best and Worst Hotels:** Identifying and showcasing the top 10 best and worst hotels based on user ratings.
3. **Traveler Categories:** Analyzing the best hotels based on different traveler categories (e.g., families, couples, business travelers).
4. **Seasonal Variation:** Determining the best hotels for each 3-month period throughout the year.

By exploring these facets, we aim to provide valuable insights for travelers seeking the best accommodation options in Las Vegas and assist the hospitality industry in enhancing their services to meet customer expectations.

Let's start by loading and exploring the dataset, then work through each of these questions in turn.


# 1) Importing Necessary Packages 

In [None]:
## List the files available in the Kaggle input directory
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
        
### Pandas for Data Manipulation and Processing
import pandas as pd

### NumPy for Numerical Operations
import numpy as np

### Matplotlib for Data Visualization
import matplotlib.pyplot as plt

### Seaborn for Statistical Data Visualization
import seaborn as sns

### missingno for visualizing missing data in a simple way
import missingno as msno

In [None]:
### Now to Extract the data from CSV file

h_reviews = pd.read_csv("../input/hotels-reviews/lasvegas_tripadvisor.csv")
h_reviews.head(5)

# 2) Checking for Missing Values & Duplicates 

In [None]:
msno.matrix(h_reviews)
h_reviews.duplicated().any()



# No Missing Values or Duplicates Found
No further cleaning is needed.
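
The visual check above can also be backed by explicit counts. Here is a minimal sketch on a tiny made-up frame (`demo` and its values are illustrative assumptions, not rows from the dataset); in the notebook the same two calls would be applied to `h_reviews`.

```python
import numpy as np
import pandas as pd

# Illustrative data only: one missing score and one fully duplicated row.
demo = pd.DataFrame({
    "hotel_name": ["A", "B", "B", "C"],
    "score": [5.0, 4.0, 4.0, np.nan],
})

# Number of missing values in each column
missing_per_column = demo.isna().sum()

# Number of rows that are exact copies of an earlier row
duplicate_rows = int(demo.duplicated().sum())

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
```

If both results are zero for the real data, the "no cleaning needed" conclusion holds without having to rely on reading the matrix plot by eye.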

# 3) Data Manipulation to Work More Efficiently

In [None]:
### I always like to start by renaming the columns so they are faster to type

# Note: "Free internet" was listed twice in the original mapping; only the last entry ("free_wifi") took effect, so it is kept once here.
h_reviews.rename(
    columns={
        "User country": "user_country",
        "Traveler type": "traveler_type",
        "Period of stay": "stay_period",
        "Helpful votes": "helpful_votes",
        "Hotel stars": "hotel_stars",
        "Nr. rooms": "num_rooms",
        "Score": "score",
        "Pool": "pool",
        "Gym": "gym",
        "Tennis court": "tennis_court",
        "Spa": "spa",
        "Casino": "casino",
        "Hotel name": "hotel_name",
        "Free internet": "free_wifi",
    },
    inplace=True,
)

# 4) Finally ready to start extracting insights!

## 4) A) Analyzing Hotel Ratings by Visitor Categories

****The question I will try to answer here is "Which visitor category tends to give higher ratings in hotel reviews?"****

In [None]:
### we must use average here, considering that the number of reviews may vary across categories. This approach ensures a fair comparison.
average_scores = h_reviews.groupby('traveler_type')['score'].mean().reset_index()
average_scores = average_scores.sort_values(by='score', ascending=False)

# Define a custom color palette
custom_palette = sns.color_palette("Set3")

# Plot the average scores using Seaborn catplot
# (catplot creates its own figure, so a separate plt.figure() call would only add an empty figure)
g = sns.catplot(data=average_scores, x='traveler_type', y='score', kind='bar', height=5, aspect=2, palette=custom_palette)
g.set_axis_labels('Visitor Category', 'Average Rating')
g.fig.suptitle('Average Hotel Ratings by Visitor Category', y=1.02)
plt.show()

## Conclusion: The average hotel ratings by visitor category, in descending order, are as follows:
- Friends
- Couples
- Families
- Solo travelers
- Business travelers

**It's important to highlight that while there are slight variations in the ratings across these categories, the differences are relatively minimal, indicating a consistent satisfaction level among different visitor types.**
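
To judge how "minimal" the differences are, it can help to look at the spread and the number of reviews behind each average. The sketch below uses made-up rows (`demo` is an illustrative assumption); with the real data the same aggregation would run on `h_reviews`.

```python
import pandas as pd

# Illustrative reviews only, not taken from the dataset.
demo = pd.DataFrame({
    "traveler_type": ["Couples", "Couples", "Couples", "Business", "Business", "Solo"],
    "score": [5, 4, 3, 4, 2, 5],
})

# Mean, number of reviews, and standard deviation per visitor category
summary = (
    demo.groupby("traveler_type")["score"]
    .agg(["mean", "count", "std"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

When the gap between two means is small compared with the `std` column, or a category has only a few reviews, the ranking between them should probably not be read too literally.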

                                                     -------------------------------------------------------

## 4) B) Analyzing Trustworthiness of the Reviews based on Helpfulness Ratings

****The question I will try to answer here is "If enough people mark a review as helpful, are future readers more likely to trust its content?"****

In [None]:
# Define a custom color palette
custom_palette = sns.color_palette("Set3")

# Set the font scale and style
sns.set(font_scale=1.2)
sns.set_style("whitegrid")

# Create the box plot with increased height, hidden outliers, and a custom color palette
# (showfliers=False is the documented way to hide outlier points)
g = sns.catplot(data=h_reviews, kind="box", x="traveler_type", y="helpful_votes", aspect=2, height=8, showfliers=False, palette=custom_palette)
g.set_axis_labels('Traveler Type', 'Helpful Votes')
g.fig.suptitle('Distribution of Helpful Votes by Traveler Type (Outliers Hidden)', y=1.02)

plt.show()

## Conclusion: Business travelers receive the most helpful votes, which suggests their hotel reviews are the most trusted
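
The box plot summarizes each group by its quartiles, and the same numbers can be printed directly. A minimal sketch on made-up values (`demo` is an illustrative assumption, including one deliberately extreme vote count):

```python
import pandas as pd

# Illustrative data only; 400 plays the role of an outlier.
demo = pd.DataFrame({
    "traveler_type": ["Business"] * 4 + ["Friends"] * 4,
    "helpful_votes": [10, 20, 30, 400, 5, 10, 15, 20],
})

# Quartiles per traveler type: the values a box plot draws as the box and its center line
quartiles = (
    demo.groupby("traveler_type")["helpful_votes"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
quartiles.columns = ["q1", "median", "q3"]
print(quartiles)
```

Comparing medians rather than means keeps a single very large vote count from deciding which group looks "most helpful".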


                                                     -------------------------------------------------------

## 4) C) Analyzing Seasonal Patterns of Average Ratings in Hotel Reviews

****The question I will try to investigate here is "Do ratings follow a specific seasonal pattern?"****

In [None]:
# Calculate the average rating for each stay period
average_rating_per_period = h_reviews.groupby('stay_period')['score'].mean().sort_values(ascending=False)

# Plot the relationship between stay period and average rating
plt.figure(figsize=(12, 8))
ax = sns.barplot(x=average_rating_per_period.values, y=average_rating_per_period.index, palette='viridis')

ax.set_xlabel('Average Rating')
ax.set_ylabel('Stay Period')
ax.set_title('Average Hotel Rating for Every 3-Month Period (Descending Order)')
ax.set_xlim(3, 5)  # A common limit for better comparison

# Display data labels (used chatGPT for this)
for i in range(len(average_rating_per_period)):
    ax.text(average_rating_per_period.values[i] + 0.02, i, f'{average_rating_per_period.values[i]:.2f}', ha='center', va='center')

plt.show()

## Conclusion: The analysis indicates that there are minimal differences in average hotel ratings between the 3-month periods, suggesting a consistent satisfaction level among guests throughout the year.


                                                     -------------------------------------------------------

## 4) D) Analyzing Seasonal Patterns of Helpful Votes in Hotel Reviews

****The question I will try to investigate here is "Do helpful votes follow a specific seasonal pattern?"****

In [None]:
print(h_reviews["stay_period"].value_counts())
# no variation in number of reviews

# Define a custom color palette
custom_palette = sns.color_palette("RdPu", 6)

# Define the custom order for the stay periods
custom_order = ['Dec-Feb', 'Mar-May', 'Jun-Aug', 'Sep-Nov']


sns.catplot(kind="bar",data=h_reviews,x="stay_period",y="helpful_votes",ci=None,palette=custom_palette,order=custom_order)
plt.show()


## Conclusion: Reviews of December-February stays receive the most helpful votes on average, and those of September-November stays the fewest, with the average declining as the year goes on. However, the overall difference is small.
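
Stay periods sort alphabetically by default, which puts `Jun-Aug` before `Mar-May`. One way to keep them in calendar order everywhere (tables as well as plots) is an ordered categorical. This is a sketch on made-up rows; `demo` and its values are illustrative assumptions, while the period labels are the ones used in the plot above.

```python
import pandas as pd

# Illustrative data only.
demo = pd.DataFrame({
    "stay_period": ["Sep-Nov", "Dec-Feb", "Jun-Aug", "Mar-May", "Dec-Feb"],
    "helpful_votes": [10, 30, 20, 25, 50],
})

# Declare the calendar order once; groupby results then follow it
period_order = ["Dec-Feb", "Mar-May", "Jun-Aug", "Sep-Nov"]
demo["stay_period"] = pd.Categorical(
    demo["stay_period"], categories=period_order, ordered=True
)

mean_votes = demo.groupby("stay_period", observed=False)["helpful_votes"].mean()
print(mean_votes)
```

With the column typed this way, a grouped table comes out in the same order as the bars, so a trend "as the year goes on" can be read straight from the output.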


                                                     -------------------------------------------------------

# 5) Amenities Impact

## 5) A) Analyzing the Impact of Pool Availability on Hotel Ratings

****The question I will try to answer here is "To what extent does the presence of a pool in a hotel influence its overall rating?"****

In [None]:
### we must use average here, considering that the number of reviews may vary across categories. This approach ensures a fair comparison.
custom_order = ["YES","NO"]
sns.catplot(x="pool", y="score", data=h_reviews, kind="bar",estimator=np.mean,ci=None,order=custom_order,palette='viridis').set(title='Average Hotel Rating based on Pool Presence')
plt.show()

## Conclusion: Hotels with a pool exhibit higher ratings, averaging about 4, compared to hotels without a pool, which average about 3. The difference is a whole point, which suggests that guests strongly value having a pool.
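
All reviews of one hotel share the same amenity values, so it is worth checking how many distinct hotels stand behind each bar before reading too much into the gap. A minimal sketch on made-up rows (`demo` is an illustrative assumption):

```python
import pandas as pd

# Illustrative data only: two hotels with a pool, one without.
demo = pd.DataFrame({
    "hotel_name": ["A", "A", "B", "B", "C", "C"],
    "pool": ["YES", "YES", "YES", "YES", "NO", "NO"],
    "score": [5, 4, 4, 3, 3, 2],
})

# Average score together with the number of reviews and hotels in each group
by_pool = demo.groupby("pool").agg(
    mean_score=("score", "mean"),
    n_reviews=("score", "size"),
    n_hotels=("hotel_name", "nunique"),
)
print(by_pool)
```

If one group turns out to contain very few hotels, the difference may say more about those particular hotels than about the amenity itself.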

                                                     -------------------------------------------------------

## 5) B) Impact of Free WiFi on Hotel Ratings
**Does the availability of free WiFi in a hotel have a notable influence on its average rating? How does the provision of free WiFi potentially affect the perception of the hotel's quality?**

In [None]:
# Calculate the average rating for hotels with and without free WiFi
avg_rating_with_wifi = h_reviews[h_reviews['free_wifi'] == 'YES']['score'].mean()
avg_rating_without_wifi = h_reviews[h_reviews['free_wifi'] == 'NO']['score'].mean()

# Plotting a bar plot to show the average rating based on free WiFi presence
plt.figure(figsize=(6, 6))
sns.barplot(x=['With Free WiFi', 'Without Free WiFi'], y=[avg_rating_with_wifi, avg_rating_without_wifi], palette='viridis')

plt.xlabel('WiFi Presence')
plt.ylabel('Average Rating')
plt.title('Average Rating Based on Free WiFi Presence')

plt.show()


### Conclusion: The analysis reveals a substantial positive impact of offering free WiFi on a hotel's average rating. Hotels providing free WiFi experience an average rating approximately 0.8 points higher than those without this amenity. This considerable difference underscores the significance of offering free WiFi as a contributing factor in enhancing guest satisfaction and positively influencing their overall rating of the hotel.
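
A difference in means can also be checked against chance. The sketch below is one possible approach, a two-sided permutation test written with NumPy only; the function name `permutation_p_value` and the sample scores are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_permutations):
        # Shuffle the group labels and recompute the difference
        shuffled = rng.permutation(pooled)
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction so the estimate is never exactly zero
    return observed, (count + 1) / (n_permutations + 1)

# Illustrative scores only, not taken from the dataset
with_wifi = [5, 4, 5, 4, 5, 3]
without_wifi = [3, 4, 2, 3]

observed_diff, p_value = permutation_p_value(with_wifi, without_wifi)
print(f"difference in means: {observed_diff:.2f}, p-value: {p_value:.3f}")
```

In the notebook the two lists would be replaced by the `score` values of the reviews with and without free WiFi. A small p-value would support the conclusion that the gap is unlikely to be noise, though it would still not show that WiFi causes the higher ratings.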

## 5) C) Analyzing the Impact of Tennis Court Availability on Hotel Ratings

****The question I will try to answer this time is "To what extent does the presence of a tennis court in a hotel influence its overall rating?"****

In [None]:
# Plotting average rating against the presence of a tennis court
custom_order=["YES","NO"]
plt.figure(figsize=(8, 6))
ax = sns.barplot(x='tennis_court', y='score', data=h_reviews, estimator=np.mean, ci=None,palette='viridis',order=custom_order)

ax.set_xlabel('Tennis Court Presence')
ax.set_ylabel('Average Rating')
ax.set_title('Average Hotel Rating based on Tennis Court Presence')

plt.show()

## Conclusion: The analysis reveals that hotels with a tennis court enjoy a higher average rating of approximately 4.5 stars, compared to hotels without a tennis court, which receive an average rating of 4 stars. This half-point difference suggests that the availability of a tennis court positively influences guest ratings, albeit moderately.


                                                     -------------------------------------------------------


## 5) D) Analyzing the Impact of other Amenities on Hotel Ratings

****The question I will try to answer here is "To what extent do amenities such as a gym, spa, or casino influence the overall ratings of hotels?"****

In [None]:
df_melted = pd.melt(h_reviews, id_vars=['score'], value_vars=['gym', 'spa', 'casino'], var_name='amenity')


# Plotting average rating against the presence of gym, spa, and casino
plt.figure(figsize=(10, 6))
# Fix the hue order so that the relabeled legend below is guaranteed to match the bars
ax = sns.barplot(x='amenity', y='score', hue='value', hue_order=['YES', 'NO'], data=df_melted, estimator=np.mean, ci=None, palette="viridis")

ax.set_xlabel('Amenity')
ax.set_ylabel('Average Rating')
ax.set_title('Average Hotel Rating based on Amenities Presence')

# Customize legend: reuse the plotted handles and relabel YES/NO as With/Without
handles, _ = ax.get_legend_handles_labels()
leg = ax.legend(handles, ['With', 'Without'], title='Presence', loc='lower right')
leg.get_title().set_fontsize(12)

plt.show()

## Conclusion: Across the gym, spa, and casino, the findings are consistent: there is only a slight difference in average hotel ratings based on the presence or absence of these amenities. Guests' ratings remain relatively stable, suggesting that the availability of these amenities does not significantly influence their perception of the hotel.
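
The "slight difference" can be made concrete with a small table of means per amenity. A sketch on made-up rows, reusing the melt step from the cell above (`demo` and its values are illustrative assumptions):

```python
import pandas as pd

# Illustrative data only.
demo = pd.DataFrame({
    "score": [5, 4, 3, 4],
    "gym": ["YES", "YES", "NO", "YES"],
    "spa": ["YES", "NO", "NO", "YES"],
    "casino": ["NO", "YES", "YES", "YES"],
})

# Long format: one row per (review, amenity) pair
melted = demo.melt(
    id_vars="score",
    value_vars=["gym", "spa", "casino"],
    var_name="amenity",
    value_name="present",
)

# Mean score with and without each amenity, plus the gap between them
means = melted.pivot_table(index="amenity", columns="present", values="score", aggfunc="mean")
means["difference"] = means["YES"] - means["NO"]
print(means)
```

Reading the `difference` column next to the chart shows at a glance which amenities, if any, move the average by more than a fraction of a point.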


                                                     -------------------------------------------------------

# 6) Best and Worst Hotels

## 6) A) Best 10 Hotels Based on Average Ratings

In [None]:
# Calculate the average rating for each hotel
average_rating_per_hotel = h_reviews.groupby('hotel_name')['score'].mean().sort_values(ascending=False)

# the top 10 highest rated hotels
top_10_hotels = average_rating_per_hotel.head(10)

# Plotting average rating for the top 10 hotels
plt.figure(figsize=(12, 8))
ax = sns.barplot(x=top_10_hotels.values, y=top_10_hotels.index, palette='viridis')

ax.set_xlabel('Average Rating')
ax.set_ylabel('Hotel Name')
ax.set_title('Top 10 Highest Rated Hotels (Average Ratings)')

plt.show()


## Conclusion: Wynn Las Vegas is the best hotel out there, with the highest average rating in this dataset!

                                                     -------------------------------------------------------

## 6) B) Worst 10 Hotels Based on Average Ratings

In [None]:
# Calculate the average rating for each hotel
average_rating_per_hotel = h_reviews.groupby('hotel_name')['score'].mean().sort_values(ascending=True)

# Select the worst 10 rated hotels
worst_10_hotels = average_rating_per_hotel.head(10)

# Plotting average rating for the worst 10 hotels
plt.figure(figsize=(12, 8))
ax = sns.barplot(x=worst_10_hotels.values, y=worst_10_hotels.index, palette='magma')

ax.set_xlabel('Average Rating')
ax.set_ylabel('Hotel Name')
ax.set_title('Worst 10 Hotels Based on Average Ratings')

plt.show()

## Conclusion: The lowest-rated hotel in this dataset is Circus Circus Hotel & Casino Las Vegas
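
Both rankings can come from a single aggregation, and keeping the review count next to each average shows how much data supports a hotel's position. A minimal sketch on made-up rows (`demo` is an illustrative assumption; the notebook uses 10 instead of 2):

```python
import pandas as pd

# Illustrative data only.
demo = pd.DataFrame({
    "hotel_name": ["A", "A", "B", "B", "C", "C", "D"],
    "score": [5, 5, 4, 3, 2, 1, 4],
})

# One pass: average rating and number of reviews per hotel
per_hotel = demo.groupby("hotel_name")["score"].agg(["mean", "count"])

best = per_hotel.nlargest(2, "mean")    # highest averages first
worst = per_hotel.nsmallest(2, "mean")  # lowest averages first

print(best)
print(worst)
```

A hotel with very few reviews, like `D` here, can rank high or low on the strength of a single score, which the `count` column makes visible.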

                                                     -------------------------------------------------------

# 7) Traveler Categories

## 7) A) Highest Rated Hotels for Each Traveler Category

****The question I will try to answer this time is "Which hotel is rated the highest on average for each traveler category, providing insights into preferred hotels for different types of travelers?"****


In [None]:
# Create a dictionary to store the highest-rated hotel for each category
highest_rated_hotels = {}

# List of traveler categories
traveler_categories = h_reviews['traveler_type'].unique()

# Iterate over each traveler category
for category in traveler_categories:
    # Subset the data for the specific category
    category_data = h_reviews[h_reviews['traveler_type'] == category]
    
    # Find the highest-rated hotel for this category
    highest_rated_hotel = category_data.groupby('hotel_name')['score'].mean().idxmax()
    highest_rated_hotels[category] = highest_rated_hotel

# Display the highest-rated hotel for each category
for category, hotel in highest_rated_hotels.items():
    print(f"The highest-rated hotel for {category} travelers is: {hotel}")

## Conclusion:
- The highest-rated hotel for Friends travelers is: Hilton Grand Vacations on the Boulevard
- The highest-rated hotel for Business travelers is: Caesars Palace
- The highest-rated hotel for Families travelers is: The Cosmopolitan Las Vegas
- The highest-rated hotel for Solo travelers is: Hilton Grand Vacations on the Boulevard
- The highest-rated hotel for Couples travelers is: Encore at wynn Las Vegas
                                                     -------------------------------------------------------

# 8) Seasonal Variation

## 8) A) Unveiling the Crème de la Crème - Best Hotels for Every Period

**Which hotels shine brightest during each period of the year, revealing the top-rated Hotels for different seasons?**


In [None]:

# Calculate the average rating for each hotel within each period
avg_rating_per_period = h_reviews.groupby(['stay_period', 'hotel_name'])['score'].mean().reset_index()

# Find the highest rated hotel for each period
best_hotel_per_period = avg_rating_per_period.loc[avg_rating_per_period.groupby('stay_period')['score'].idxmax()]

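# Sort the data by stay period for better visualization
# (assign the sorted result instead of sorting a selection in place, which can trigger a SettingWithCopyWarning)
best_hotel_per_period = best_hotel_per_period.sort_values(by='stay_period')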

# Plotting the highest rated hotel for each period
plt.figure(figsize=(12, 8))
ax = sns.barplot(x='stay_period', y='score', hue='hotel_name', data=best_hotel_per_period, palette='viridis')

ax.set_xlabel('Stay Period')
ax.set_ylabel('Average Rating')
ax.set_title('Highest Rated Hotel for Each Period of the Year')

plt.legend(title='Hotel Name', bbox_to_anchor=(1, 1))
plt.xticks(rotation=45)
plt.show()


## Conclusion:

Different hotels come out on top in different periods of the year:

- **December to February:** **Trump International Hotel Las Vegas** is the best during the winter months.

- **March to May:** **Marriott's Grand Chateau** leads in spring.

- **June to August:** **The Palazzo Resort Hotel Casino** shines through the summer.

- **September to November:** As autumn arrives, **The Cosmopolitan Las Vegas** takes the spotlight.
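
With only a handful of reviews per hotel in each period, several hotels may share the top average, and `idxmax` reports just the first of them. The sketch below, on made-up rows (`demo` is an illustrative assumption), keeps every hotel that ties for first place.

```python
import pandas as pd

# Illustrative data only: hotels A and B tie in Dec-Feb.
demo = pd.DataFrame({
    "stay_period": ["Dec-Feb"] * 4 + ["Mar-May"] * 3,
    "hotel_name": ["A", "B", "C", "C", "A", "B", "B"],
    "score": [5, 5, 4, 2, 3, 4, 5],
})

# Average score per (period, hotel)
avg = demo.groupby(["stay_period", "hotel_name"], as_index=False)["score"].mean()

# Best average within each period, aligned back to every row
period_max = avg.groupby("stay_period")["score"].transform("max")

# Keep all hotels whose average equals the period's best
top_per_period = avg[avg["score"] == period_max]
print(top_per_period)
```

If a period lists more than one hotel here, the single name shown for that season is a matter of ordering rather than a clear winner.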

                                                     -------------------------------------------------------

# 9) Random Thoughts

## 9) A) Checking the Relationship between Number of Rooms and Average Rating
**Is there a significant correlation between the number of rooms in a hotel and its average rating? How does the quantity of rooms potentially influence the quality of the hotel?**

In [None]:


# Calculate the correlation
correlation = h_reviews['num_rooms'].corr(h_reviews['score'])

# Plotting a scatter plot to show the relationship between number of rooms and average rating
plt.figure(figsize=(10, 6))
sns.scatterplot(x='num_rooms', y='score', data=h_reviews, color='blue', alpha=0.7)

plt.xlabel('Number of Rooms')
plt.ylabel('Average Rating')
plt.title('Relationship between Number of Rooms and Average Rating\nCorrelation: {:.2f}'.format(correlation))

plt.show()



### Conclusion: The analysis reveals a very weak negative correlation with a coefficient of **-0.08**. Therefore, the correlation is close to zero, signifying a lack of a significant relationship. 
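
Because the number of rooms is a property of the hotel rather than of the review, the correlation can also be computed on one row per hotel. A minimal sketch on made-up rows (`demo` is an illustrative assumption) comparing the two levels:

```python
import pandas as pd

# Illustrative data only: two reviews for each of three hotels.
demo = pd.DataFrame({
    "hotel_name": ["A", "A", "B", "B", "C", "C"],
    "num_rooms": [100, 100, 200, 200, 300, 300],
    "score": [5, 3, 4, 2, 3, 1],
})

# Collapse to one row per hotel: its room count and its average score
per_hotel = demo.groupby("hotel_name").agg(
    num_rooms=("num_rooms", "first"),
    mean_score=("score", "mean"),
)

review_level_r = demo["num_rooms"].corr(demo["score"])
hotel_level_r = per_hotel["num_rooms"].corr(per_hotel["mean_score"])

print(f"review-level r = {review_level_r:.2f}, hotel-level r = {hotel_level_r:.2f}")
```

The review-level coefficient is diluted by the variation between guests of the same hotel, so the two numbers can differ noticeably; which one is appropriate depends on whether the question is about individual reviews or about hotels.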

                                                     -------------------------------------------------------

## 9) B) I have a theory I want to check: "Reviewers who have written more reviews tend to give lower ratings, because they have tried a lot of hotels and are harder to satisfy."

In [None]:
# Calculate the average rating for each number of visits
## Note: "Nr. reviews" is the number of reviews the reviewer has previously written; it is used here as a proxy for the number of visits

filtered_df = h_reviews[h_reviews['Nr. reviews'] < 250]
average_rating_per_visit = filtered_df.groupby('Nr. reviews')['score'].mean()

# Plot the relationship between visits and average rating
plt.figure(figsize=(10, 6))
plt.scatter(average_rating_per_visit.index, average_rating_per_visit.values, color='blue')
plt.xlabel('Number of Visits to Different Places')
plt.ylabel('Average Rating')
plt.title('Average Rating vs. Number of Visits')
plt.grid(True)
plt.show()

## Conclusion: Contrary to the hypothesis, the average ratings remain relatively consistent across different numbers of visits, suggesting that the breadth of experience with various hotels does not significantly influence rating behavior. Further analysis, such as formal hypothesis tests or exploration of other factors, may be needed to better understand the determinants of hotel ratings.
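
Averaging per exact review count leaves many groups with a single reviewer, which makes the scatter noisy. One alternative is to bucket reviewers by experience. The sketch below uses made-up rows, and the bin edges and labels are arbitrary assumptions chosen for illustration.

```python
import pandas as pd

# Illustrative data only; "Nr. reviews" keeps its original column name.
demo = pd.DataFrame({
    "Nr. reviews": [1, 5, 12, 30, 60, 120, 200],
    "score": [5, 4, 4, 3, 5, 4, 3],
})

# Hypothetical experience buckets: intervals are (0, 10], (10, 50], (50, 250]
bins = [0, 10, 50, 250]
labels = ["1-10", "11-50", "51-250"]
demo["experience"] = pd.cut(demo["Nr. reviews"], bins=bins, labels=labels)

# Average rating and number of reviewers per bucket
by_experience = demo.groupby("experience", observed=False)["score"].agg(["mean", "count"])
print(by_experience)
```

If the theory held, the `mean` column would fall from the first bucket to the last; roughly flat values would agree with the conclusion above.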


                                                     -------------------------------------------------------

## The End

Our analysis of Las Vegas hotel reviews from TripAdvisor has provided significant insights into the factors influencing user ratings and the overall hospitality experience. Key findings include:

- **Amenities Matter:** Amenities such as free WiFi and a pool greatly impact average ratings, emphasizing the importance of providing modern conveniences to guests.

- **Best and Worst Hotels:** By identifying the top 10 best and worst hotels, travelers can make more informed choices, aligning with their preferences and expectations.

- **Tailoring to Traveler Categories:** Recognizing the preferences of various traveler categories (e.g., families, couples, business travelers) allows hotels to tailor their services and excel in customer satisfaction.

- **Seasonal Considerations:** Average ratings differ only slightly between periods, but the top-rated hotel changes from season to season, so the best choice depends on when you travel.

**The analysis goes beyond mere numbers, aiming to guide both travelers and the hotel industry towards a better understanding of guest experiences.**

# Thanks!

----------------------------------------------------------------------------------------------