In [None]:
Plotting Basics with Pandas: Data Analysis and Plotting of Hotel Ratings and Trip Types

In [None]:
# Data Analysis and Plotting of Hotel Ratings and Trip Types

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data/Data_TripAdvisor_v1 - Data.csv', low_memory=False)

df.head()

df.info()

##### 1) Analyze Trip Type Distribution

trip_type_counts = df['Trip Type'].value_counts()

##### 2) Create a pie chart to identify the most popular Trip Type

# Create a figure and axis
fig, ax = plt.subplots()

trip_type_counts.plot(kind="pie",
                      labels=trip_type_counts.index,
                      autopct='%1.1f%%',
                      title="Most Popular Trip Type",
                      ax=ax);

##### 3) Analyzing Hotel City Distribution
The `HOTEL_CITY` column records the city in which each hotel is located. Call the `value_counts()` method on this column and store the result in the `city_counts` variable.

city_counts = df['HOTEL_CITY'].value_counts()
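As a minimal sketch of what `value_counts()` returns, the example below runs it on a tiny, made-up DataFrame. The city names and the `toy` and `most_common_city` names are assumptions for illustration only; they are not taken from the TripAdvisor file.

```python
import pandas as pd

# Toy data invented for illustration; not from the TripAdvisor dataset.
toy = pd.DataFrame(
    {'HOTEL_CITY': ['Austin', 'Boston', 'Austin', 'Denver', 'Austin', 'Boston']}
)

# value_counts() returns a Series indexed by the unique values,
# sorted by count in descending order.
toy_city_counts = toy['HOTEL_CITY'].value_counts()

# idxmax() gives the index label (here, the city) with the largest count.
most_common_city = toy_city_counts.idxmax()

print(toy_city_counts)
print(most_common_city)
```

Because the result is already sorted, the first bar in a bar chart of this Series corresponds to the most frequent value.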

##### 4) Create a bar chart to identify the city with the highest number of hotels

# Create a figure and axis
fig, ax = plt.subplots(figsize=(12, 6))

city_counts.plot(kind='bar',
                 xlabel='Hotel City',
                 ylabel='Count',
                 title='Most Popular Hotel Cities',
                 ax=ax);

##### 5) Analyzing Rating Distribution

rating_counts = df['Rating'].value_counts()

##### 6) Create a bar plot to visualize the distribution of customer ratings

# Create a figure and axis
fig, ax = plt.subplots(figsize=(8, 6))

rating_counts.plot(kind='bar', 
                    xlabel='Rating', 
                    ylabel='Count', 
                    title='Distribution of Customer Ratings', 
                    ax=ax);

##### 7) Which rating was given by the most people?

Look at the rating distribution bar plot from the previous step.

4


##### 8) Count Hotel Timezones


hotel_timezone_counts = df['HOTEL_TIMEZONE'].value_counts()

##### 9) Calculate the percentage of each hotel timezone 

hotel_timezone_percents = (hotel_timezone_counts / len(df)) * 100
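One detail worth knowing about this calculation: `value_counts()` drops missing values by default, while `len(df)` counts every row. If the column has missing entries, the percentages computed this way sum to less than 100. The sketch below illustrates the difference on made-up data (the `toy` frame and variable names are assumptions) and shows `value_counts(normalize=True)` as an alternative that divides by the number of non-missing entries instead.

```python
import pandas as pd

# Toy data invented for illustration; one timezone is missing on purpose.
toy = pd.DataFrame({'HOTEL_TIMEZONE': ['Eastern', 'Eastern', 'Pacific', None]})

# Missing values are excluded from the counts by default.
counts = toy['HOTEL_TIMEZONE'].value_counts()

# Share of ALL rows (missing rows still count in the denominator).
pct_of_all_rows = counts / len(toy) * 100

# Share of rows with a known timezone only.
pct_of_known = toy['HOTEL_TIMEZONE'].value_counts(normalize=True) * 100

print(pct_of_all_rows.sum())  # less than 100 when values are missing
print(pct_of_known.sum())
```

Which denominator is appropriate depends on the question being asked; when the column has no missing values, both approaches give the same numbers.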

##### 10) Create a pie chart to visualize the percentage distribution of hotel timezones

# Create a figure and axis
fig, ax = plt.subplots(figsize=(8, 6))

hotel_timezone_percents.plot(kind='pie',
                             labels=hotel_timezone_percents.index,
                             autopct='%1.1f%%',
                             title='Percentage Distribution of Hotel Timezones',
                             ax=ax);

##### 11) Which timezone are most of the hotels in?
- Central
- Mountain
- Eastern
- Pacific

For the next activity, we need to calculate the mean rating for each `HOTEL_TIMEZONE`. To do this, we will use the `pivot_table()` function to group the data by `HOTEL_TIMEZONE` and calculate the mean `Rating` for each group. Then, we will convert the result to a DataFrame with columns `HOTEL_TIMEZONE` and mean `Rating`, and store it in the `timezone_ratings` variable.

> Note: The `pivot_table()` method is out of scope for this project, which is why we have provided the code for you.

timezone_ratings = df.pivot_table(index='HOTEL_TIMEZONE', values='Rating', aggfunc='mean').reset_index()
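To make the grouping step less opaque, here is a small sketch on made-up data showing that this `pivot_table()` call produces the same per-group means as a `groupby()` followed by `mean()`. The `toy`, `via_pivot`, and `via_groupby` names and the ratings are assumptions for illustration.

```python
import pandas as pd

# Toy data invented for illustration; not from the TripAdvisor dataset.
toy = pd.DataFrame({
    'HOTEL_TIMEZONE': ['Eastern', 'Eastern', 'Pacific', 'Pacific'],
    'Rating': [5, 3, 4, 2],
})

# One row per timezone, holding the mean Rating of that group.
via_pivot = toy.pivot_table(
    index='HOTEL_TIMEZONE', values='Rating', aggfunc='mean'
).reset_index()

# The same aggregation expressed with groupby.
via_groupby = toy.groupby('HOTEL_TIMEZONE', as_index=False)['Rating'].mean()

print(via_pivot)
print(via_groupby)
```

`reset_index()` turns the group labels back into an ordinary column, which is what allows `x='HOTEL_TIMEZONE'` to be passed to `plot()` later.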

##### 12) Create a bar chart to visualize average ratings by Hotel Timezone using subplots

# Create a figure and axis
fig, ax = plt.subplots(figsize=(8, 6))

# Create a bar plot using pandas
timezone_ratings.plot(kind='bar',
                      x='HOTEL_TIMEZONE',
                      y='Rating',
                      ax=ax,
                      xlabel='Hotel Timezone',
                      ylabel='Average Rating',
                      title='Average Rating by Hotel Timezone');

##### 13) What are the two time zones with the highest average ratings?

- Central
- Eastern
- Mountain
- Pacific

For the next activity, we need to calculate the mean `Rating` for each `USER_STATE` using the `pivot_table()` function. We will store the result in the `user_state_ratings` DataFrame. Then, we will sort the `user_state_ratings` DataFrame by the `Rating` column in descending order.

> Note: The `pivot_table()` method is out of scope for this project, which is why we have provided the code for you.

user_state_ratings = df.pivot_table(index='USER_STATE', values='Rating', aggfunc='mean').reset_index()

user_state_ratings = user_state_ratings.sort_values(by='Rating', ascending=False)
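Once the frame is sorted in descending order, the highest-rated groups sit in the first rows. The sketch below shows this on made-up data; the state codes `S1` to `S3`, the ratings, and the variable names are placeholders, not real results.

```python
import pandas as pd

# Toy data invented for illustration; state codes and ratings are placeholders.
toy_ratings = pd.DataFrame({
    'USER_STATE': ['S1', 'S2', 'S3'],
    'Rating': [3.5, 4.8, 4.1],
})

# Sort from highest to lowest average rating.
ranked = toy_ratings.sort_values(by='Rating', ascending=False)

# The first two rows are the two highest-rated states.
top_two = ranked.head(2)['USER_STATE'].tolist()

# nlargest() reaches the same rows without a separate sort step.
top_two_alt = toy_ratings.nlargest(2, 'Rating')['USER_STATE'].tolist()

print(top_two)
```

Sorting before plotting also means the bars appear in rank order, so the leftmost bars are the answer to "which groups rate highest".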

##### 14) Create a bar chart to visualize average ratings by User State using subplots

# Create a figure and axis
fig, ax = plt.subplots(figsize=(12, 6))

user_state_ratings.plot(kind='bar',
                        x='USER_STATE',
                        y='Rating',
                        ax=ax,
                        xlabel='User State',
                        ylabel='Average Rating',
                        title='Average Ratings by User State',
                        rot=90);  # This sets the rotation directly

##### 15) Which two user states have given the highest average ratings?

- ND
- DC
- GA
- WV
- ME

##### 16) Filter top-rated hotels

top_rated_hotels = df[df['Rating'] >= 4]

##### 17) Select data for a specific hotel timezone


timezone = 'Eastern'
timezone_data = top_rated_hotels[top_rated_hotels['HOTEL_TIMEZONE'] == timezone]
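The two filtering steps above can also be written as a single boolean mask. The following is a sketch on made-up data (the `toy` frame and variable names are assumptions), showing that both forms select the same rows.

```python
import pandas as pd

# Toy data invented for illustration; not from the TripAdvisor dataset.
toy = pd.DataFrame({
    'Rating': [5, 4, 3, 4],
    'HOTEL_TIMEZONE': ['Eastern', 'Pacific', 'Eastern', 'Eastern'],
})

# Two-step filtering, as in the activities above.
step1 = toy[toy['Rating'] >= 4]
two_step = step1[step1['HOTEL_TIMEZONE'] == 'Eastern']

# Equivalent single step: combine conditions with & and wrap each in parentheses.
one_step = toy[(toy['Rating'] >= 4) & (toy['HOTEL_TIMEZONE'] == 'Eastern')]

print(one_step)
```

The parentheses matter because `&` binds more tightly than comparison operators in Python. Keeping the intermediate `top_rated_hotels` frame, as the notebook does, is convenient when it is reused for several timezones.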

##### 18) Get unique trip types and their counts


trip_types = timezone_data['Trip Type'].value_counts()

##### 19) Calculate the percentage of each trip type


total_trips = trip_types.sum()
trip_type_percents = pd.Series(trip_types.values / total_trips * 100, index=trip_types.index)

##### 20) Visualizing Trip Type Distribution with a Pie Chart

# Create a figure and axis
fig, ax = plt.subplots(figsize=(8, 6))

# Create a pie chart using pandas
trip_type_percents.plot(kind='pie',
                        labels=trip_types.index,
                        autopct='%1.1f%%',
                        ax=ax,
                        title=f'Distribution of Trip Types for Top-Rated Hotels ({timezone} Timezone)');

For the next activity, we need to find the top 5 `USER_STATE` by the number of ratings and store them in `top_user_states`. Then, we will filter the data to include only the top 5 user states and store it in `top_user_state_data`. 

Next, we will use the `pivot_table()` function to find the mean `Rating` for each combination of `USER_STATE` and `HOTEL_TIMEZONE`. We will save the result in `user_state_timezone_ratings`.

> Note: The `pivot_table()` method is out of scope for this project, which is why we have provided the code for you.

top_user_states = df['USER_STATE'].value_counts().nlargest(5).index

top_user_state_data = df[df['USER_STATE'].isin(top_user_states)]

user_state_timezone_ratings = top_user_state_data.pivot_table(
    index=['USER_STATE', 'HOTEL_TIMEZONE'],
    values='Rating',
    aggfunc='mean'
).reset_index()
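The three steps above (find the most frequent states, keep only their rows, average by two keys) can be traced on a small made-up frame. In this sketch the state codes, ratings, and the choice of the top 2 instead of the top 5 are assumptions made to keep the example short.

```python
import pandas as pd

# Toy data invented for illustration; state codes are placeholders.
toy = pd.DataFrame({
    'USER_STATE': ['S1', 'S1', 'S1', 'S2', 'S2', 'S3'],
    'HOTEL_TIMEZONE': ['Eastern', 'Eastern', 'Pacific', 'Eastern', 'Pacific', 'Pacific'],
    'Rating': [5, 3, 4, 2, 4, 1],
})

# Step 1: the states with the most ratings (S1 has 3 rows, S2 has 2).
top_two_states = toy['USER_STATE'].value_counts().nlargest(2).index

# Step 2: keep only rows from those states.
subset = toy[toy['USER_STATE'].isin(top_two_states)]

# Step 3: mean Rating for each (state, timezone) pair, in long format.
long_means = subset.pivot_table(
    index=['USER_STATE', 'HOTEL_TIMEZONE'], values='Rating', aggfunc='mean'
).reset_index()

print(long_means)
```

The result has one row per state and timezone combination that actually occurs in the filtered data.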

##### 21) Create a bar chart to visualize regional rating trends for the top user states across hotel locations using pivoted data.

> Note: The `pivot()` method is out of scope for this project, which is why we have provided the code for you.

# Pivot the data

pivoted_timezone = user_state_timezone_ratings.pivot(index='USER_STATE', columns='HOTEL_TIMEZONE', values='Rating')

fig, ax = plt.subplots(figsize=(10, 6))

pivoted_timezone.plot(kind='bar',
                      legend=True,
                      ax=ax,
                      xlabel='User State',
                      ylabel='Average Rating',
                      title='Average Ratings of Top User States across Hotel Timezones',
                      rot=45);  # This sets the rotation directly
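To see why the data is pivoted before plotting, the sketch below reshapes a small made-up long-format frame with `pivot()`. The values and the `long_means` and `wide` names are assumptions for illustration. In a pandas bar plot of a wide DataFrame, the index labels go on the x-axis and each column becomes one bar series in the legend.

```python
import pandas as pd

# Toy long-format data invented for illustration; state codes are placeholders.
long_means = pd.DataFrame({
    'USER_STATE': ['S1', 'S1', 'S2', 'S2'],
    'HOTEL_TIMEZONE': ['Eastern', 'Pacific', 'Eastern', 'Pacific'],
    'Rating': [4.0, 4.5, 2.0, 3.0],
})

# Long -> wide: one row per user state, one column per hotel timezone.
wide = long_means.pivot(index='USER_STATE', columns='HOTEL_TIMEZONE', values='Rating')

print(wide)
print(wide.index.tolist())    # x-axis groups in a bar plot
print(wide.columns.tolist())  # legend entries in a bar plot
```

Note that `pivot()` raises an error if an index and column pair appears more than once, which is why the data is aggregated with `pivot_table()` first.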

For the next activity, we need to find the top 5 `USER_STATE` by the count of ratings. Then, we will filter the data to include only the top 5 user states. Next, we will calculate the mean rating for each combination of user state and hotel state using the `pivot_table()` function. Finally, we will sort the data by `Rating` in descending order.

> Note: The `pivot_table()` method is out of scope for this project, which is why we have provided the code for you.

top_states = df['USER_STATE'].value_counts().nlargest(5).index
top_data = df[df['USER_STATE'].isin(top_states)]
ratings = top_data.pivot_table(index=['USER_STATE', 'HOTEL_STATE'], values='Rating', aggfunc='mean').reset_index()
ratings = ratings.sort_values(by='Rating', ascending=False)

##### 22) Visualizing Average Ratings with a Bar Chart

> Note: The `pivot()` method is out of scope for this project, which is why we have provided the code for you.

# Pivot the data
pivoted_state = ratings.pivot(index='USER_STATE', columns='HOTEL_STATE', values='Rating')

# Create a figure and axis
fig, ax = plt.subplots(figsize=(12, 8))

# Create a bar chart using pandas
pivoted_state.plot(kind="bar",
                   legend=True,
                   ax=ax,
                   xlabel='User State',
                   ylabel='Average Rating',
                   title='Average Ratings of Top User States by Hotel State',
                   rot=45);