# <font size='8px'><font color='#FFFF00'>**Project Name**    - <font color='#FFFFFF'>Airbnb Bookings Analysis
















#####<font size='5px'> **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**

Since its inception in 2008, Airbnb has transformed the way people travel, offering unique and personalized lodging experiences across the globe. Today, Airbnb is recognized as a pioneering platform that connects hosts and guests, expanding travel possibilities. With millions of listings worldwide, the importance of data analysis within Airbnb cannot be overstated.                  
This project aims to explore and analyze a dataset comprising approximately 49,000 listings across 16 columns. The dataset is a rich mix of categorical and numeric values, providing valuable insights into guest behavior, host performance, and overall market dynamics.

***Objectives***:

**Data Exploration**: Conduct an initial exploration of the dataset to understand its structure, key statistics, and the distribution of values within each column.

**Customer Behavior Analysis**: Analyze user behavior patterns by examining booking trends, popular listing types, pricing strategies, and seasonal variations in demand.

**Host Performance Evaluation**: Assess host performance metrics such as occupancy rates, review scores, and response times to identify best practices and areas for improvement.

**Market Insights**: Identify key factors that influence booking decisions, such as location, amenities, and pricing strategies, to guide marketing initiatives.

**Innovative Services Development**: Explore opportunities for implementing innovative services based on customer preferences and market gaps identified through data analysis.

***Methodology***:

**Data Cleaning and Preprocessing**: Address any missing values, outliers, and inconsistencies to ensure data integrity.

**Exploratory Data Analysis (EDA)**: Utilize visualizations and statistical methods to uncover trends, correlations, and patterns within the data.

**Predictive Analytics**: Apply machine learning techniques to forecast booking trends and provide actionable insights for Airbnb's strategic decision-making.

***Expected Outcomes***:

The analysis will yield actionable insights that can enhance the user experience for guests, improve host performance, and inform Airbnb's marketing and service development strategies. The findings will contribute to a deeper understanding of the dynamics within the platform, ultimately supporting Airbnb's mission of providing unique and personalized travel experiences.

---



# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


As Airbnb continues to shape the travel landscape by connecting hosts and guests globally, the platform faces the challenge of optimizing its offerings based on a vast dataset comprising approximately 49,000 listings. Despite the wealth of data available, a detailed understanding of the factors that influence guest bookings and host performance is lacking. This project aims to address the following critical questions:

1. What key attributes, such as price, location, amenities, and property type, significantly impact guests’ booking decisions?

2. How do host performance metrics—including response rates, review scores, and cancellation histories—affect the likelihood of securing bookings?

3. What seasonal trends and patterns can be identified in guest behavior that may inform pricing strategies and marketing initiatives?

By conducting a thorough analysis of the dataset, this project seeks to uncover actionable insights that can guide Airbnb in enhancing the user experience for guests while simultaneously empowering hosts to improve their listings. Ultimately, the findings aim to contribute to the strategic decision-making process at Airbnb, fostering a deeper understanding of market dynamics and driving business growth.

#### **Define Your Business Objective?**

The primary business objective of this project is to leverage data analysis to derive actionable insights that will enhance Airbnb's understanding of guest behavior and host performance. Specifically, the project aims to:

1. **Optimize User Experience**: Identify the key factors influencing guest bookings to tailor marketing strategies and improve the overall user experience on the platform.

2. **Enhance Host Support**: Provide data-driven recommendations for hosts by analyzing performance metrics, helping them improve their listings and increase booking rates.

3. **Inform Strategic Decision-Making**: Uncover seasonal trends and patterns in booking behaviors to guide Airbnb’s pricing strategies, promotional campaigns, and service offerings, ultimately driving business growth.

By achieving these objectives, the project will support Airbnb's mission of providing unique, personalized travel experiences while fostering stronger connections between guests and hosts.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. For each chart, make sure the following format is answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd

# Import Data Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Importing Date Time library for Date column
from datetime import datetime as dt

# Import Warnings
# Suppress library warnings to keep the notebook output readable.
# Note: this also hides deprecation notices, so re-enable warnings when debugging.
import warnings
warnings.filterwarnings('ignore')

### Dataset Loading

In [None]:
# Load dataset (importing my CSV file for analysis)

# Creating a file path to easily access my CSV file for analysis
filepath = 'Airbnb NYC 2019.csv'

# Creating DataFrame of my csv file
Airbnb_df = pd.read_csv(filepath)
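
The guidelines above mention exception handling as a plus. As a minimal sketch only, the loading step could be wrapped in a small helper that fails with a clear message. The names `load_listings` and `required_columns` are hypothetical and not part of the original notebook, and the column check is an assumption about what later cells need.

```python
from pathlib import Path

import pandas as pd


def load_listings(path, required_columns=("id", "price", "room_type")):
    """Read a listings CSV and fail early with a clear message.

    `load_listings` and `required_columns` are illustrative names only.
    """
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"Dataset not found: {csv_path.resolve()}")

    frame = pd.read_csv(csv_path)

    # Check that the columns used later in the analysis are present.
    missing = [col for col in required_columns if col not in frame.columns]
    if missing:
        raise ValueError(f"Dataset is missing expected columns: {missing}")
    return frame


# Usage with the notebook's file name. The fallback to an empty frame only
# keeps this sketch runnable when the CSV is absent; a real notebook would
# more likely stop here.
try:
    listings = load_listings("Airbnb NYC 2019.csv")
except (FileNotFoundError, ValueError) as err:
    print(f"Could not load dataset: {err}")
    listings = pd.DataFrame()
```

Checking the path and the columns up front means a wrong file name or a changed schema is reported at load time rather than as a `KeyError` many cells later.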

### Dataset First View

In [None]:
# Dataset First Look
Airbnb_df

In [None]:
# Take a look at the first 10 rows of the dataframe
print('The first 10 rows of the dataframe are: \n')
Airbnb_df.head(10)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print(f'The number of rows in the dataframe is: {Airbnb_df.shape[0]}')
print(f'The number of columns in the dataframe is: {Airbnb_df.shape[1]}')

### Dataset Information

In [None]:
# Dataset Info
# To get information about the DataFrame
Airbnb_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
# .duplicated() flags rows that fully repeat an earlier row; .sum() counts them
print(f'The number of duplicate rows in the dataframe is: {Airbnb_df.duplicated().sum()}')

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
# To get information about missing values in the DataFrame
Airbnb_df.isnull().sum()

In [None]:
# Visualizing the missing values

# To visualize the distribution of null values using a heatmap
sns.heatmap(Airbnb_df.isnull(), cbar=False)

In [None]:
# Calculate and display the percentage of missing values for each column
missing_percentage = (Airbnb_df.isnull().sum() / len(Airbnb_df)) * 100
missing_percentage = missing_percentage[missing_percentage > 0].sort_values(ascending=False)
print(missing_percentage)

In [None]:
# Plot the percentage of missing values per column
plt.figure(figsize=(10, 6))
sns.barplot(x=missing_percentage.index, y=missing_percentage.values, palette="viridis")
plt.xticks(rotation=90)
plt.title("Percentage of Missing Values by Column")
plt.xlabel("Columns")
plt.ylabel("Percentage of Missing Values")
plt.show()

### What did you know about your dataset?





The dataset comprises 48,895 listings from Airbnb, containing 16 columns that provide detailed information about each listing. The key features include:

1. **Unique Identifiers**: The **id** and **host_id** columns serve as unique identifiers for each listing and host, respectively.

2. **Listing Attributes**: The **name**, **room_type**, **price**, and **minimum_nights** columns describe the characteristics of the listings.

3. **Host Information**: The **host_name** column identifies the property owner or authorized person.

4. **Location Data**: The **neighbourhood_group** and **neighbourhood** columns provide geographic context, while **latitude** and **longitude** specify exact locations.

5. **Reviews and Ratings**: The **number_of_reviews**, **last_review**, and **reviews_per_month** columns offer insights into the listing's popularity and customer feedback.

6. **Availability**: The **availability_365** column indicates how many days a listing is available for booking in a year.

#### **Key Findings from the Dataset**


1. **Missing Values**:
   - The dataset contains missing values in several columns:
     - The `name` column has **16** missing entries.
     - The `host_name` column has **21** missing entries.
     - The `last_review` and `reviews_per_month` columns have **10,052** missing entries each.
   - These missing values can impact the analysis and need to be addressed, either through imputation or removal.

2. **No Duplicate Values**:
   - There are **0 duplicate entries** in the dataset, ensuring that each listing is unique and making it reliable for analysis.

3. **Data Types**:
   - The dataset includes various data types: integers, floats, and objects (strings), which will require different handling during analysis.

4. **Note for Data Wrangling**:
   - The `last_review` column is stored as an object (string) and should be converted to datetime to enable date manipulations.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
# To get the list of columns in the DataFrame
print('The columns in the dataframe are: \n')
Airbnb_df.columns

In [None]:
# Dataset Describe
# To get the summary statistics of the DataFrame
Airbnb_df.describe(include='all')

### Variables Description

* **id**:  A unique identifier assigned to each listing.
          (example: 123 or 124)

* **name**:  The name of the listing as provided by the host.
          (example: "Charming Studio in Downtown" etc)

* **host_id**:  A unique identifier assigned to each host, used to distinguish hosts from one another.
          (example: 2787 or 2845)

* **host_name**:  The name of the host (property owner or authorized person).
          (example: "John" or "Mary" etc.)

* **neighbourhood_group**: 	The general geographic area where the listing is located.
          (example: 	"Brooklyn" or "Manhattan" etc.)
  
* **neighbourhood**: The specific neighborhood of the listing.
          (example: "Harlem" or "Midtown" or "East Harlem" etc.)

* **latitude**: 	The latitude coordinate of the listing's location.
          (example: 40.64749 or 40.75362)

* **longitude**: The longitude coordinate of the listing's location.
          (example: -73.97237	or -73.98377 )

* **room_type**: The type of room being offered in the listing.
          (example: "Entire home/apt", "Private room", "Shared room").

* **price**: The price per night for renting the listing.
          (example: 149, 225, 150 )

* **minimum_nights**: The minimum number of nights a guest must book the listing.
          (example: 1,2,3,4,5,6 )

* **number_of_reviews**: The total number of reviews received by the listing.
        (example: 9, 45, 0, 270)

* **last_review**: The date of the last review submitted for the listing.
          (example: 2018-08-21)

* **reviews_per_month** : Average number of reviews that a listing receives per month
          (example: NaN, 0.21, 0.23)

* **calculated_host_listings_count** : Total number of listings that a host has on the Airbnb platform
          (example: 1,2,5,78)

* **availability_365** : The number of days in the coming year for which the listing is marked as available for booking, based on the listing's calendar.
          (example: 365, 355)

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
# To get the unique values in each column of the DataFrame
Airbnb_df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# To make a copy of the original DataFrame
df = Airbnb_df.copy()
# Making a copy of the dataset is important to ensure that our original dataset remains intact and safe from any modifications.

In [None]:
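# Dealing with null values in name, host_name, reviews_per_month
# Assign the result back instead of calling fillna(inplace=True) on a column selection:
# that pattern raises a FutureWarning in pandas 2.x and does not update df under copy-on-write.

# Fill missing values in the 'name' column with 'Unknown'
df['name'] = df['name'].fillna('Unknown')

# Fill missing values in the 'host_name' column with 'Unknown'
df['host_name'] = df['host_name'].fillna('Unknown')

# Fill missing values in the 'reviews_per_month' column with 0 (no reviews yet)
df['reviews_per_month'] = df['reviews_per_month'].fillna(0)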

In [None]:
# To get information about missing values in the DataFrame
df.isnull().sum()

In [None]:
# Check if number_of_reviews == 0
len(df[df['number_of_reviews'] == 0])

In [None]:
# Convert the last_review column from string to datetime format
df['last_review'] = pd.to_datetime(df['last_review'], errors='coerce')

# Fill missing dates with the median date.
# Caveat: listings with no reviews all receive this same date, which affects
# any later analysis based on the review month or season.
median_date = df['last_review'].median()
df['last_review'] = df['last_review'].fillna(median_date)


In [None]:
# Check if price == 0
len(df[df['price'] == 0])

In [None]:
# Drop rows where 'price' is 0
df.drop(df[df['price'] == 0].index, inplace=True)

In [None]:
# Creating a box plot to visualize outliers for price.
plt.figure(figsize=(10, 6))
sns.boxplot(x=df['price'])
plt.title('Box Plot of Price')
plt.show()

In [None]:
# Dealing with outliers
# Remove price outliers using the interquartile range (IQR) rule
Q1 = df['price'].quantile(0.25)
Q3 = df['price'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR  # lower boundary for outliers
upper_bound = Q3 + 1.5 * IQR  # upper boundary for outliers

# Keep only the rows whose price lies within the boundaries
df = df[(df['price'] >= lower_bound) & (df['price'] <= upper_bound)]

In [None]:
# Creating a box plot to visualize outliers for minimum_nights.
plt.figure(figsize=(10, 6))
sns.boxplot(x=df['minimum_nights'])
plt.title('Box Plot of minimum_nights')
plt.show()

In [None]:
# Dealing with outliers in minimum_nights column.
Q1_nights = df['minimum_nights'].quantile(0.25)
Q3_nights = df['minimum_nights'].quantile(0.75)
IQR_nights = Q3_nights - Q1_nights
upper_limit_nights = Q3_nights + 1.5 * IQR_nights

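# .copy() gives an independent frame, avoiding SettingWithCopyWarning when columns are added later
df = df[df['minimum_nights'] <= upper_limit_nights].copy()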
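
The two IQR filters above repeat the same steps. As a sketch only, they could be factored into one helper. `iqr_bounds` and `drop_iqr_outliers` are hypothetical names, and the toy prices are made up for illustration. Unlike the `minimum_nights` cell, this helper applies both the lower and the upper fence; for a column whose values all sit above the lower fence, the result is the same.

```python
import pandas as pd


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) IQR fences for a numeric Series."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_iqr_outliers(frame, column, k=1.5):
    """Keep rows whose `column` lies within the IQR fences (inclusive)."""
    lower, upper = iqr_bounds(frame[column], k)
    return frame[frame[column].between(lower, upper)].copy()


# Made-up prices; 1000 is the obvious outlier.
toy = pd.DataFrame({"price": [50, 60, 70, 80, 90, 100, 1000]})
cleaned = drop_iqr_outliers(toy, "price")
print(cleaned["price"].tolist())
```

With these toy values the quartiles are 65 and 95, so the fences are 20 and 140 and only the 1000 row is dropped.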

In [None]:
# Checking for seasonal bookings

# Step 1: Extract Month from the last_review
df['month'] = df['last_review'].dt.month

# Step 2: Create a function to determine the season
def get_season(month):
    if month in [12, 1, 2]:
        return 'Winter'
    elif month in [3, 4, 5]:
        return 'Spring'
    elif month in [6, 7, 8]:
        return 'Summer'
    elif month in [9, 10, 11]:
        return 'Fall'

# Step 3: Apply the function to create a season column
df['season'] = df['month'].apply(get_season)

# Step 4: Group by month and season to aggregate the data
monthly_bookings = df.groupby('month').agg({
    'number_of_reviews': 'sum',  # Assuming this indicates bookings
    'price': 'mean'  # Average price for the month
}).reset_index()

seasonal_bookings = df.groupby('season').agg({
    'number_of_reviews': 'sum',  # Assuming this indicates bookings
    'price': 'mean'  # Average price for the season
}).reset_index()

# Step 5: Display the results
print("Monthly Bookings:")
print(monthly_bookings)

print("\nSeasonal Bookings:")
print(seasonal_bookings)
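
One caveat of the cell above: because missing `last_review` dates were filled with the median date, every never-reviewed listing lands in the same month and season. Those listings add nothing to the review totals, but they do influence the mean price of that group. A minimal sketch of an alternative, using made-up rows and a hypothetical `SEASON_BY_MONTH` lookup, keeps only reviewed listings for the seasonal aggregation:

```python
import pandas as pd

# Made-up rows; the None entry stands for a listing that was never reviewed.
toy = pd.DataFrame({
    "last_review": ["2019-01-15", "2019-06-30", None, "2018-10-02"],
    "number_of_reviews": [12, 40, 0, 7],
    "price": [90, 120, 200, 75],
})
toy["last_review"] = pd.to_datetime(toy["last_review"], errors="coerce")

# Month number -> season, equivalent to the get_season function above.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}

# Keep only listings that have a review date, then derive the season.
reviewed = toy.dropna(subset=["last_review"]).copy()
reviewed["season"] = reviewed["last_review"].dt.month.map(SEASON_BY_MONTH)

seasonal = reviewed.groupby("season").agg(
    total_reviews=("number_of_reviews", "sum"),
    mean_price=("price", "mean"),
).reset_index()
print(seasonal)
```

Whether to exclude or impute unreviewed listings is a design choice; the point of the sketch is only that the choice changes the per-season averages.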

In [None]:
# Dropping host_name and last_review columns because they are not needed for the rest of the analysis
df.drop(['host_name', 'last_review'], axis=1, inplace=True)

In [None]:
# Group by 'neighbourhood_group' and aggregate with mean of 'price' and sum of 'number_of_reviews'
grouped_data = df.groupby('neighbourhood_group').agg({
    'price': 'mean',
    'number_of_reviews': 'sum'
}).reset_index()

# Rename columns for clarity
grouped_data.rename(columns={'price': 'mean_price', 'number_of_reviews': 'total_reviews'}, inplace=True)

# Display grouped_data sorted by mean price in descending order
grouped_data.sort_values(by='mean_price', ascending=False)



In [None]:
# Group by 'neighbourhood' and aggregate with mean of 'price' and sum of 'number_of_reviews'
# (uses the cleaned df, consistent with the neighbourhood_group aggregation above)
grouped_neighbourhood_data = df.groupby('neighbourhood').agg({
    'price': 'mean',
    'number_of_reviews': 'sum'
}).reset_index()

# Rename columns for clarity
grouped_neighbourhood_data.rename(columns={'price': 'mean_price', 'number_of_reviews': 'total_reviews'}, inplace=True)

# Display grouped_neighbourhood_data sorted by mean price in descending order
grouped_neighbourhood_data.sort_values(by='mean_price', ascending=False)


In [None]:
# Estimated number of bookings for each listing.
# Assumption: every day not marked as available is booked, in stays of exactly minimum_nights.
df['number_of_bookings'] = (365 - df['availability_365']) / df['minimum_nights']

# Estimated revenue based on price, minimum_nights, and number_of_bookings
df['revenue'] = df['price'] * df['minimum_nights'] * df['number_of_bookings']
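
Both derived columns treat every day that is not marked available as a booked day. Under that assumption `minimum_nights` cancels out of the revenue formula, leaving price multiplied by the number of unavailable days. A small check on made-up rows (not taken from the dataset) illustrates this:

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "price": [100, 80, 150],
    "minimum_nights": [2, 5, 1],
    "availability_365": [165, 365, 0],
})

# Same formulas as the notebook cell above.
toy["number_of_bookings"] = (365 - toy["availability_365"]) / toy["minimum_nights"]
toy["revenue"] = toy["price"] * toy["minimum_nights"] * toy["number_of_bookings"]

# minimum_nights cancels out: revenue equals price times unavailable days.
toy["revenue_direct"] = toy["price"] * (365 - toy["availability_365"])

print(toy[["number_of_bookings", "revenue", "revenue_direct"]])
```

Because unavailable days can also reflect dates the host blocked, these figures are better read as an upper-bound estimate than as actual bookings or income.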


In [None]:
# Grouping by 'neighbourhood_group' and 'room_type' with aggregations on price, number_of_reviews, and revenue
grouped_neighbourhood_room = df.groupby(['neighbourhood_group', 'room_type']).agg({
    'price': 'mean',
    'number_of_reviews': 'sum',
    'revenue': 'mean'
}).reset_index()

# Renaming columns for clarity
grouped_neighbourhood_room.rename(columns={
    'price': 'mean_price',
    'number_of_reviews': 'total_reviews',
    'revenue': 'avg_revenue'
}, inplace=True)

# Sorting by average revenue or mean price if desired
grouped_neighbourhood_room.sort_values(by='avg_revenue', ascending=False, inplace=True)
grouped_neighbourhood_room


In [None]:
# Group by 'neighbourhood_group' and get unique 'neighbourhood' names
unique_neighbourhoods = df.groupby('neighbourhood_group')['neighbourhood'].unique().reset_index()

# convert the list of unique neighbourhoods to a string for better readability
unique_neighbourhoods['neighbourhood'] = unique_neighbourhoods['neighbourhood'].apply(lambda x: ', '.join(x))
unique_neighbourhoods


In [None]:
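# Price divided by the minimum stay length.
# Note: 'price' is already a nightly rate, so despite the column name this is
# a price-to-minimum-stay ratio rather than a true per-night price.
df['price_per_night'] = df['price'] / df['minimum_nights']

# Estimate reviews per year from the monthly review rate
df['reviews_per_year'] = df['reviews_per_month'] * 12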


In [None]:
# Calculate the mean for each of the three columns to define thresholds
avg_bookings = df['number_of_bookings'].mean()
avg_price = df['price'].mean()
avg_reviews = df['number_of_reviews'].mean()

# Define a function to categorize engagement based on conditions
def engagement_category(row):
    # Criteria for good engagement:
    if row['number_of_bookings'] >= avg_bookings and row['price'] <= avg_price and row['number_of_reviews'] >= avg_reviews:
        return 'Good Engagement'
    else:
        return 'Bad Engagement'

# Apply the function to each row to create a new 'engagement_category' column
df['engagement_category'] = df.apply(engagement_category, axis=1)

# Display the updated DataFrame with the new column
df[['number_of_bookings', 'price', 'number_of_reviews', 'engagement_category']]
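
The row-wise `apply` above is easy to read but slow on larger frames. As a sketch under the same three criteria, the classification can be expressed with a boolean mask and `np.where`. The rows below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "number_of_bookings": [120.0, 10.0, 200.0, 90.0],
    "price": [80, 60, 300, 100],
    "number_of_reviews": [50, 5, 70, 40],
})

# Same thresholds as above: the column means.
avg_bookings = toy["number_of_bookings"].mean()
avg_price = toy["price"].mean()
avg_reviews = toy["number_of_reviews"].mean()

# A listing is "good" only if all three conditions hold.
good = (
    (toy["number_of_bookings"] >= avg_bookings)
    & (toy["price"] <= avg_price)
    & (toy["number_of_reviews"] >= avg_reviews)
)
toy["engagement_category"] = np.where(good, "Good Engagement", "Bad Engagement")
print(toy)
```

Only the first toy row meets all three conditions; the third row has many bookings and reviews but is priced above the mean, so it is still labelled 'Bad Engagement'.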


In [None]:
# Calculating average location, average price, and total reviews for each neighbourhood group and season
avg_location = df.groupby(['neighbourhood_group', 'season']).agg({
    'latitude': 'mean',
    'longitude': 'mean',
    'price': 'mean',
    'number_of_reviews': 'sum'
}).reset_index()

# Renaming columns for clarity
avg_location.rename(columns={'price': 'avg_price', 'number_of_reviews': 'total_reviews'}, inplace=True)


### What all manipulations have you done and insights you found?

<font size='5px'> **Data Manipulation and Insights** --

In this project, I undertook extensive data wrangling on the Airbnb dataset to prepare it for insightful analysis. The following steps highlight the manipulations performed and the key insights derived:

1. **Handling Missing Values** :

* I addressed null values in critical columns such as **name**, **host_name**, and **reviews_per_month**. Missing values in the **name** and **host_name** columns were filled with 'Unknown' to ensure completeness, while **reviews_per_month** was set to 0, reflecting listings with no reviews.

* The **last_review** column was converted to a datetime format, and missing dates were filled with the median date, representing a common review period.

2. **Outlier Detection and Removal** :

* I utilized box plots to visualize and detect outliers in the **price** and **minimum_nights** columns. Outliers were identified using the Interquartile Range (IQR) method and subsequently removed to enhance the quality of the dataset.

3. **Feature Engineering** :

* New features were created to enrich the dataset for analysis:

 *  A **number_of_bookings** column was derived from **availability_365** and **minimum_nights**.

 * A **revenue** column was calculated based on the **price**, **minimum_nights**, and **number_of_bookings**.

 4. **Engagement Category** :

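 * I defined an **engagement_category** column based on **number_of_bookings**, **price**, and **number_of_reviews** to classify listings into 'Good Engagement' or 'Bad Engagement.' A listing counts as 'Good Engagement' when its bookings and reviews are at or above the dataset mean and its price is at or below the mean. This segmentation aids in understanding which listings are performing better in terms of guest engagement.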

 5. **Seasonal Analysis** :

 * I extracted the month from the **last_review** column and created a **season** column based on the month. This classification allowed for an analysis of booking trends across different seasons.

  * By aggregating the **number of reviews** (as a proxy for bookings) and average prices by month and season, I gained insights into seasonal booking patterns. Because missing **last_review** dates were filled with the median date, listings without reviews all fall into a single month; they add no reviews to the totals but do affect that month's average price.

  *  *Key Findings:*
     
    * Monthly Trends: Total reviews, used here as a proxy for bookings, varied by month, which points to peak booking periods.

    * Seasonal Trends: The analysis highlighted which seasons attract more bookings and the average pricing during these periods, helping hosts adjust their pricing strategies accordingly.

 6. **Group Aggregations** :

 * The data was grouped by **neighbourhood_group** and **room_type** to identify **mean prices**, **total reviews**, and average estimated **revenue**. This helped uncover which **neighbourhood groups** and **room types** offer the best value and have higher engagement levels.


**Conclusion**

This data wrangling process transformed the raw Airbnb dataset into a structured format suitable for in-depth analysis. By addressing missing values, removing outliers, creating new features, and conducting seasonal analyses, I developed valuable insights into guest behavior, pricing strategies, and seasonal trends. These findings can be instrumental for Airbnb hosts and stakeholders in making informed decisions to enhance user experience and maximize bookings.



## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

### **Univariate Analysis**

#### Chart - 1 -- Histogram


In [None]:
# Chart - 1 visualization code

# Histogram for price distribution

plt.figure(figsize=(10, 6)) # Set the figure size

sns.histplot(df['price'], bins=30, kde=True) # Plot a histogram of price with seaborn (kde=True overlays a density curve)

plt.title('Distribution of Airbnb Prices') # Giving a title to the plot

plt.xlabel('Price') # Set x_label for price

plt.ylabel('Frequency') # Set Y_label
plt.show()


##### 1. Why did you pick the specific chart?

I chose a histogram to visualize the distribution of Airbnb prices because it effectively captures the frequency of listings at different price points. Given the dataset's focus on various aspects of Airbnb listings, a histogram provides a clear representation of how prices are spread across the dataset. This visualization allows us to quickly identify the most common price ranges and observe any potential outliers or trends that could inform pricing strategies for hosts.

##### 2. What is/are the insight(s) found from the chart?

From the histogram of Airbnb prices, several insights can be derived:

 - **Price Range**: Most Airbnb listings tend to be priced between $50 and $150 per night. This shows that many hosts are pricing their properties similarly to stay competitive.


 - **Distribution Shape**: The histogram is expected to be right-skewed, with many affordable listings and progressively fewer expensive ones. This suggests a smaller market for higher-end properties that appeal to a different type of guest.


 - **Outliers**: Prices above the IQR upper bound (Q3 + 1.5 × IQR) were removed during data wrangling, so the far right of this histogram shows the most expensive remaining listings rather than the extreme luxury segment. If these higher-priced listings are numerous but attract few guests, this might mean a crowded upper end of the market.
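
The right-skew reading can be checked numerically rather than by eye. A minimal sketch, using made-up prices rather than the dataset: a positive sample skewness and a mean above the median both indicate a longer upper tail.

```python
import pandas as pd

# Made-up nightly prices with a long upper tail, for illustration only.
prices = pd.Series([40, 55, 60, 65, 70, 80, 95, 120, 180, 320], name="price")

skewness = prices.skew()         # sample skewness; > 0 means a right tail
mean_price = prices.mean()
median_price = prices.median()

print(f"skewness={skewness:.2f}, mean={mean_price:.1f}, median={median_price:.1f}")
```

In the notebook the same three calls could be applied to `df['price']` to back up the description of the histogram.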

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***Positive Impacts*** :

 - **Smart Pricing** : Knowing the common price ranges helps Airbnb and hosts set competitive prices, which can lead to more bookings. For example, if most guests book properties priced under $150, new hosts might want to price their listings similarly to attract more guests.


 - **Targeted Marketing** : Understanding price distribution helps in targeting marketing efforts. If many guests are looking for budget-friendly options, marketing can focus on those affordable listings to draw in more customers.


 ***Potential Negative Impacts*** :

 - **High-Priced Listings** : If many listings are priced much higher than average but don’t get booked, it indicates a problem. These high-priced listings may sit empty, leading to loss of income for hosts and creating an imbalance in the market.


 - **Saturation in Luxury Segment** : If the histogram shows too many high-priced listings without enough demand, it could lead to intense competition. Hosts might have to lower their prices to compete, which could reduce their revenue and negatively affect Airbnb's overall earnings.

#### Chart - 2 -- Count plot


In [None]:
# Chart - 2 visualization code

# Count plot for the distribution of room types

plt.figure(figsize=(8, 5)) # Set the figure size

# Plot a count plot of the room type distribution
sns.countplot(data=df, x='room_type', palette='viridis')

plt.title('Distribution of Room Types') # Set the title for count plot

plt.xlabel('Room Type') # Set x_label

plt.ylabel('Count') # Set y_label
plt.show()

##### 1. Why did you pick the specific chart?

I chose a count plot for visualizing the distribution of room types because it effectively displays the frequency of each category (e.g., entire home, private room, shared room) within the dataset. This chart makes it easy to compare the popularity of different room types at a glance and allows us to quickly identify which type is most commonly listed on Airbnb. This insight is essential for understanding market offerings and guest preferences.

##### 2. What is/are the insight(s) found from the chart?

The chart likely shows that private rooms have the highest number of listings, with entire home/apartment listings second. A count plot measures supply (how many listings hosts offer), so it only indirectly suggests that guests value privacy and comfort. Shared rooms, by contrast, appear to have far fewer listings, which may indicate that guests are less inclined to book them.
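
The ranking read off the count plot can be confirmed with exact counts and shares. A minimal sketch on a made-up sample; in the notebook the same calls would be applied to `df['room_type']`.

```python
import pandas as pd

# Made-up sample of room types for illustration only.
room_types = pd.Series(
    ["Private room", "Entire home/apt", "Private room", "Shared room",
     "Entire home/apt", "Private room"],
    name="room_type",
)

counts = room_types.value_counts()
shares = room_types.value_counts(normalize=True).mul(100).round(1)

# Combine counts and percentage shares in one table.
summary = pd.DataFrame({"listings": counts, "share_pct": shares})
print(summary)
```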

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***Positive Business Impact*** :

 - **Better Marketing** : By knowing which room types are popular (like entire homes), Airbnb can focus its marketing on these listings. This could attract more guests who want spacious places to stay, leading to more bookings and happier customers.

 - **Motivating Hosts** : Insights from the data can encourage hosts to offer the types of rooms that guests prefer. For instance, if private rooms are in demand, new hosts might be inspired to provide these options, increasing their chances of getting booked.


***Potential Negative Growth Insights*** :

 - **Too Many Similar Listings** : If there are too many listings for one type of room (like entire homes), it might create too much competition among hosts. This could lead to lower prices, hurting the income of hosts and causing frustration if they feel undervalued.

 - **Ignoring Other Options** : If everyone focuses on listing entire homes, there might be fewer affordable options like shared or private rooms. This could turn away budget-conscious travelers who are looking for cheaper stays, which may hurt Airbnb’s image as a platform that caters to all types of guests.

#### Chart - 3 -- Histogram




In [None]:
# Chart - 3 visualization code

# Visualization of the minimum nights requirement

plt.figure(figsize=(8, 5)) # Set the figure size

# Plot histogram for Minimum Nights Requirement
sns.histplot(df['minimum_nights'], bins=30, kde=True, color = 'orange')

plt.title('Minimum Nights Requirement') # Set title for histogram

plt.xlabel('Minimum Nights') # Set x_label

plt.ylabel('Frequency') # Set y_label
plt.show()

##### 1. Why did you pick the specific chart?

I chose a histogram to visualize the minimum nights requirement because it effectively shows the distribution of this variable across all listings. A histogram allows us to see how many listings have different minimum night requirements, helping us understand trends and patterns in booking practices.

##### 2. What is/are the insight(s) found from the chart?

From the histogram of minimum nights requirement, we can derive several insights:


 1. **Common Minimum Nights** : The histogram might reveal that a majority of listings have a minimum night requirement of one night or two nights. This indicates that many hosts prefer shorter stays, which could attract more guests looking for flexibility.

 2. **Outliers in Minimum Nights** : Values above the IQR upper limit were removed during data wrangling, so the right edge of this histogram is capped at that limit. Listings with a very high minimum night requirement (e.g., 30 nights), where they exceed the limit, are not shown; in the raw data they would point to a segment catering to long-term stays, which may appeal to different types of travelers (like those on extended vacations or relocations).

 3. **Booking Behavior Insights** : The distribution may help identify how rigid or flexible hosts are with their booking policies. For example, a lot of listings with a minimum of three to five nights could indicate a trend towards medium-length stays.
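
To put numbers on the short, medium, and longer stay groups discussed above, the column can be bucketed with `pd.cut`. This is a sketch only: the values are made up, and the bucket edges are an assumption chosen for illustration, not taken from the notebook.

```python
import pandas as pd

# Made-up minimum-night values for illustration only.
minimum_nights = pd.Series([1, 1, 2, 2, 3, 5, 7, 10, 1, 4], name="minimum_nights")

# Bucket edges are an assumption for this sketch: (0, 2], (2, 5], (5, 11].
bins = [0, 2, 5, 11]
labels = ["1-2 nights", "3-5 nights", "6-11 nights"]

buckets = pd.cut(minimum_nights, bins=bins, labels=labels)

# Share of listings per bucket, in bucket order.
bucket_share = buckets.value_counts(normalize=True).sort_index()
print(bucket_share)
```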

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***Positive Business Impact*** :


 - **Enhanced Booking Flexibility** : Understanding that most listings have a low minimum night requirement can help Airbnb and hosts promote these properties to attract short-term travelers, thereby increasing overall booking rates and customer satisfaction.

 - **Targeted Marketing for Long Stays** : If there are many listings with higher minimum nights, Airbnb can create marketing strategies to target longer-term guests, potentially increasing the value of those listings and benefiting hosts who cater to this market.

***Potential Negative Growth Insights*** :

 - **High Minimum Night Requirements** : Listings with excessive minimum nights may deter guests looking for short stays. If many hosts set high requirements without corresponding demand, it could lead to lower occupancy rates for those properties, negatively affecting host income and Airbnb’s overall business.

 - **Market Imbalance** : If the histogram shows a clear bias towards low minimum nights, it might create an imbalance where hosts offering longer stays feel pressured to lower their minimums to compete, possibly leading to dissatisfaction among those who prefer to cater to longer bookings.
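
The insights above are phrased conditionally, so a quick tabulation can confirm them before acting on the histogram. The following is a minimal sketch, not part of the original analysis: the bucket edges and the toy `sample` rows are assumptions chosen only for illustration, and `minimum_nights_share` is a hypothetical helper name.

```python
import pandas as pd


def minimum_nights_share(df: pd.DataFrame) -> pd.Series:
    """Return the share of listings that falls into each minimum-nights bucket."""
    # Bucket edges are illustrative assumptions, not taken from the notebook
    bins = [0, 2, 6, 29, float("inf")]
    labels = ["1-2 nights", "3-6 nights", "7-29 nights", "30+ nights"]
    buckets = pd.cut(df["minimum_nights"], bins=bins, labels=labels)
    # normalize=True turns counts into proportions; sort_index keeps bucket order
    return buckets.value_counts(normalize=True).sort_index()


# Made-up rows for illustration only; replace `sample` with the real `df`
sample = pd.DataFrame({"minimum_nights": [1, 1, 2, 3, 5, 30, 1, 2]})
shares = minimum_nights_share(sample)
print(shares)
```

Running the same function on the full `df` would show whether one- and two-night minimums really dominate and how large the 30+ night segment is.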

#### Chart - 4 -- Count Plot

In [None]:
# Chart - 4 visualization code

# Visualization of neighbourhood group

plt.figure(figsize=(10, 6)) # Set fig. size

# Creating count plot for visualization of neighbourhood group
sns.countplot(data=df, x='neighbourhood_group', palette='coolwarm')

plt.title('Listings Count by Neighbourhood Group') # Set the title for count plot

plt.xlabel('Neighbourhood Group') # Set x_label

plt.xticks(rotation=45) # Rotate x-axis labels for better readability
plt.ylabel('Count') # Set y_label
plt.show()


##### 1. Why did you pick the specific chart?

I chose a count plot to visualize the listings count by neighborhood group because it clearly displays the number of Airbnb listings in each neighborhood. This type of chart effectively highlights the distribution of listings across different areas, making it easy to compare their popularity and market saturation at a glance.



##### 2. What is/are the insight(s) found from the chart?

From the count plot of neighborhood groups, we can derive several insights:

 1. **Popular Neighborhoods** : The chart shows which neighborhood groups have the most listings (e.g., Brooklyn and Manhattan). A high listing count measures supply directly and likely reflects strong demand for accommodations there as well. The Bronx and Staten Island clearly have far fewer listings.

 2. **Growth Opportunities** : Areas with fewer listings may be opportunities for new hosts. If these neighborhoods are not well represented, there might be guests looking for places to stay there, making it a good chance for new listings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***Positive Business Impact*** :


 1. **Better Marketing Strategies** : Knowing which neighborhoods are popular helps Airbnb and hosts target their marketing. For example, promoting listings in high-demand areas can attract more guests, which improves booking rates and host satisfaction.

 2. **Spotting Under-Served Areas** : The chart can also help identify neighborhoods with fewer listings. Airbnb can encourage more hosts to offer accommodations in these areas, diversifying options for guests.

***Potential Negative Insights*** :

 1. **Oversaturation Issues** : If a few neighborhoods have too many listings, hosts may end up competing on price, which can lower their profits. This might lead to frustration among hosts if they feel their properties aren’t being valued fairly.

 2. **Neglecting Less Popular Areas** : Focusing too much on popular neighborhoods might lead to fewer listings in less popular areas. This could reduce choices for budget travelers and hurt Airbnb’s image as a platform that offers something for everyone.
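
To put numbers behind the count plot, the bar heights can be tabulated together with each group's share of all listings. This is a small sketch on made-up rows; the `sample` frame and the `summary` name are assumptions for illustration, and only the `neighbourhood_group` column comes from the dataset.

```python
import pandas as pd

# Made-up rows for illustration only; replace `sample` with the real `df`
sample = pd.DataFrame({
    "neighbourhood_group": [
        "Manhattan", "Brooklyn", "Manhattan", "Queens",
        "Brooklyn", "Manhattan", "Bronx", "Staten Island",
    ]
})

# Absolute listing counts per neighbourhood group (same numbers the count plot draws)
counts = sample["neighbourhood_group"].value_counts()

# Add each group's percentage share of all listings
summary = pd.DataFrame({
    "listings": counts,
    "share_pct": (counts / counts.sum() * 100).round(1),
})
print(summary)
```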

#### Chart - 5 -- Histogram


In [None]:
# Chart - 5 visualization code

# Visualization of availability(Days in year)

plt.figure(figsize=(10, 5)) # Fix fig. size

# Creating histogram for availability
sns.histplot(df['availability_365'], bins=50, kde=True, color = 'skyblue')

plt.title('Listings Availability (Days per Year)') # Set title for histogram

plt.xlabel('Availability in Days') # Set x_label

plt.ylabel('Frequency') # Set y_label

plt.show()


##### 1. Why did you pick the specific chart?

I chose a histogram for this chart because it effectively shows the distribution of the number of days listings are available throughout the year. By using a histogram, we can visualize how many listings are available for various ranges of days, making it easier to identify trends in availability among Airbnb properties.

##### 2. What is/are the insight(s) found from the chart?

From the histogram of availability_365, we can derive several insights:

 1. **General Availability Trends** : The histogram likely shows how many listings are available for the entire year versus those that are only available for part of the year. For instance, if many listings are clustered around 365 days, it suggests that many hosts are committed to renting their properties year-round.

 2. **Identifying Low Availability** : If there are a significant number of listings available for a few days (e.g., less than 30 days), it could indicate that these properties are used primarily for personal use or have other limitations.

 3. **Potential for Seasonal Listings** : If the histogram reveals peaks around certain ranges (like 0-30 days), it might suggest that some listings are only available during peak tourist seasons, indicating a potential market for short-term rentals during busy periods.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***Positive Business Impact*** :


 1. **Better Marketing Strategies** : By understanding when listings are available, Airbnb can better promote properties that are open all year. This could attract more long-term travelers who need a place to stay for an extended period.

 2. **Opportunities for Growth** : If many listings are only available for a few days, it indicates that there’s a chance for hosts to rent their properties more often. Airbnb can encourage these hosts to make their properties available for longer, which could lead to more bookings.

***Potential Negative Insights*** :


 1. **Imbalance in Options** : If many listings are available for only a few days a year, travelers looking for longer accommodations might not find enough choices. This could lead them to choose other platforms instead of Airbnb.

 2. **Wasted Potential** : If hosts are not renting out their properties year-round, they could be missing out on potential income. Airbnb might need to offer support or incentives to help these hosts keep their listings available more consistently.
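
The availability insights above depend on how many listings sit at the extremes of the distribution, which is easy to check directly. The sketch below uses made-up rows, and the cut-offs (0 days, 1 to 30 days, 365 days) are assumptions that mirror the ranges mentioned in the text; `availability_summary` is a hypothetical helper name.

```python
import pandas as pd


def availability_summary(df: pd.DataFrame) -> pd.Series:
    """Share of listings at the low and high ends of availability_365."""
    avail = df["availability_365"]
    return pd.Series({
        "zero_days": (avail == 0).mean(),           # no available days at all
        "low_1_to_30": avail.between(1, 30).mean(),  # available 1-30 days (inclusive)
        "year_round_365": (avail == 365).mean(),     # available every day
    })


# Made-up rows for illustration only; replace `sample` with the real `df`
sample = pd.DataFrame({"availability_365": [0, 0, 15, 365, 200, 365, 30, 90]})
availability = availability_summary(sample)
print(availability)
```

A large `zero_days` share deserves a closer look before drawing conclusions, since the histogram alone cannot tell whether those listings are inactive, blocked by the host, or simply unavailable for other reasons.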

#### **Bivariate Analysis**

#### Chart - 6 --Bar Chart

In [None]:
# Chart - 6 visualization code

# Visualization of average price by room_type

plt.figure(figsize=(8, 6)) # Fix fig. size

# Creating a bar plot for Visualization of average price by room_type
sns.barplot(data=df, x='room_type', y='price', estimator=np.mean, ci=None, palette='viridis')

plt.title('Average Price by Room Type') # Set title for barplot

plt.xlabel('Room Type') # Set X_label

plt.ylabel('Average Price') # Set Y_label

plt.show()


##### 1. Why did you pick the specific chart?

I chose a bar plot to visualize the average price by room type because bar plots are effective for comparing categorical data with numerical values. Here, we are interested in understanding how prices differ between various room types (Entire home/apt, Private room, Shared room, etc.). A bar plot makes it easy to see which room types are priced higher on average and allows for quick comparisons among categories, which aligns well with the goal of understanding price trends across different types of listings.


##### 2. What is/are the insight(s) found from the chart?

From the bar plot showing the average price by room type, several insights can be drawn:

 1. **Higher Pricing for Entire Homes** : Listings that are for entire homes or apartments tend to have a higher average price compared to private rooms or shared rooms. This likely reflects the added privacy, space, and amenities offered in entire properties.

 2. **Economical Options in Private and Shared Rooms** : Private rooms and shared rooms generally have a lower average price, which may appeal to budget-conscious travelers. This could indicate that these room types attract a different demographic, such as solo travelers or those prioritizing affordability.

 3. **Market Segmentation** : The difference in average prices suggests a segmentation of listings catering to various guest preferences and budgets. Entire homes are likely targeting families or groups, while private or shared rooms are more appealing for individuals or budget travelers.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***Positive Business Impact*** :


 1. **Targeted Marketing** : Knowing that entire homes usually have higher prices can help Airbnb promote these listings to families and groups looking for more space and privacy. At the same time, they can highlight private and shared rooms as budget-friendly options to attract cost-conscious travelers.

 2. **Better Pricing Strategies for Hosts** : Hosts can use this information to set prices that match guest expectations. For example, hosts with private rooms might focus on affordability, while those with entire homes could emphasize comfort and privacy. This approach can lead to more bookings and happier guests.

 3. **Customized Recommendations for Hosts** : Airbnb can suggest which room types to offer based on demand in different locations. For instance, if private rooms are popular in a certain area, Airbnb can encourage new hosts to list this type to meet guest demand.


***Potential Negative Growth Insights*** :

 1. **Too Many High-Priced Listings** : If a lot of hosts choose to list entire homes to get higher prices, it might create an oversupply. This could increase competition and drive down prices, which might reduce profitability for hosts.

 2. **Fewer Affordable Options** : If more hosts focus on entire homes, there might be fewer private or shared rooms available. This could make it harder for budget travelers to find affordable options, potentially pushing them to competitor platforms with more economical choices.
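
Because the bar plot shows only the mean, a few very expensive listings can pull a room type's bar upward. As a hedged cross-check, the sketch below compares mean and median price per room type on made-up rows; the `sample` data and the `price_by_room` name are assumptions for illustration.

```python
import pandas as pd

# Made-up rows for illustration only; replace `sample` with the real `df`
sample = pd.DataFrame({
    "room_type": [
        "Entire home/apt", "Entire home/apt", "Private room",
        "Private room", "Shared room", "Entire home/apt",
    ],
    "price": [200, 150, 80, 60, 40, 1000],
})

# Listing count, mean and median price per room type, highest median first
price_by_room = (
    sample.groupby("room_type")["price"]
    .agg(["count", "mean", "median"])
    .sort_values("median", ascending=False)
)
print(price_by_room)
```

In this toy data a single 1000-priced listing lifts the entire-home mean well above its median, which is the kind of gap worth checking on the real data before using averages for pricing advice.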

#### Chart - 7 -- Box plot

In [None]:
# Chart - 7 visualization code

# Visualization of neighbourhood_group vs price

plt.figure(figsize=(10, 6)) # Fix fig. size

# Creating boxplot for Visualization of neighbourhood_group vs price
sns.boxplot(data=df, x='neighbourhood_group', y='price', palette='pastel')

plt.title('Price Distribution by Neighbourhood Group') # Set the title for box plot

plt.xlabel('Neighbourhood Group') # Set X_label

plt.ylabel('Price') # Set Y_label

plt.show()


##### 1. Why did you pick the specific chart?

I chose a box plot because it is ideal for comparing price distributions across different neighbourhood_group values. This type of chart shows not only the range of prices but also highlights the median, quartiles, and potential outliers for each neighborhood group. This is useful for understanding price variations within each group and for spotting any outlier listings with unusually high or low prices.

##### 2. What is/are the insight(s) found from the chart?

The following insights can be drawn from the box plot:

 1. **Price Differences by Neighborhood** : The box plot shows that some neighborhood groups have higher median prices, which may indicate these are popular or upscale areas. Lower-priced neighborhoods could be more appealing for budget travelers.

 2. **Price Range and Outliers** : The chart also shows the spread of prices within each neighborhood group. Some areas may have high-end properties that are outliers, which represent luxury listings.

 3. **Price Consistency** : In certain neighborhoods, prices are more consistent, while others show a broader range. A wider range could mean a mix of affordable and luxury listings, while consistent neighborhoods may target a specific guest type.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***Positive Business Impact*** :


 1. **Better Pricing and Marketing** : Knowing which areas have higher prices allows Airbnb to market them as premium neighborhoods, appealing to guests who want a luxury experience. Hosts can adjust their prices according to neighborhood trends, making their listings more competitive and attractive.

 2. **Helpful for New Hosts** : This data can help new hosts set prices that fit with the neighborhood average, making their listing more appealing to guests by aligning with what’s expected in that area.

 3. **Targeted Campaigns** : Airbnb can design campaigns focused on budget-friendly neighborhoods for cost-conscious travelers and premium neighborhoods for those seeking a high-end stay. This approach could attract a wider range of guests.

***Potential Negative Growth Insights*** :


 1. **Risk of Overpricing** : If some neighborhoods are much pricier, this could lead to overpricing, which may discourage budget travelers from booking in those areas, resulting in more vacant days for hosts.

 2. **Lack of Affordable Options** : If certain areas only have expensive listings, it may limit options for budget travelers. This could push budget-conscious guests to choose other platforms with more affordable listings, reducing Airbnb's potential bookings.
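
The quantities a box plot draws (quartiles, median, and points beyond the whiskers) can also be computed as a table, which makes the outlier discussion concrete. The sketch below assumes the common 1.5 × IQR whisker rule and uses made-up rows; `iqr_outlier_summary` is a hypothetical helper name.

```python
import pandas as pd


def iqr_outlier_summary(df: pd.DataFrame,
                        group_col: str = "neighbourhood_group",
                        value_col: str = "price") -> pd.DataFrame:
    """Quartiles, upper whisker fence and count of high outliers per group."""
    grouped = df.groupby(group_col)[value_col]
    summary = pd.DataFrame({
        "q1": grouped.quantile(0.25),
        "median": grouped.median(),
        "q3": grouped.quantile(0.75),
    })
    # Upper fence under the 1.5 * IQR rule (an assumption about the whisker setting)
    summary["upper_fence"] = summary["q3"] + 1.5 * (summary["q3"] - summary["q1"])
    # Look up each row's group fence, then count the rows above it per group
    fence_per_row = df[group_col].map(summary["upper_fence"])
    summary["n_above_fence"] = (df[value_col] > fence_per_row).groupby(df[group_col]).sum()
    return summary


# Made-up rows for illustration only; replace `sample` with the real `df`
sample = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 5 + ["Bronx"] * 5,
    "price": [100, 150, 200, 250, 2000, 50, 60, 70, 80, 90],
})
outliers = iqr_outlier_summary(sample)
print(outliers)
```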

#### Chart - 8 -- Bar Chart

In [None]:
# Chart - 8 visualization code

# Visualization of number of reviews for neighbourhood_group

# Create a bar plot of the total number of reviews in each neighbourhood group
plt.figure(figsize=(10, 6)) # Set the fig size
sns.barplot(data=df, x='neighbourhood_group', y='number_of_reviews', estimator=sum, palette='plasma')

plt.title('Total Number of Reviews by Neighbourhood Group') # Set title for barplot

plt.xlabel('Neighbourhood Group') # Set X_label

plt.ylabel('Total Reviews') # Set Y_label

plt.show()


##### 1. Why did you pick the specific chart?

This bar plot was chosen to visually compare the total number of reviews across different neighborhood groups. It provides a clear view of which neighborhoods are generating the most reviews, which can serve as a rough indicator of guest activity or popularity. The use of a bar plot with summed values makes it easy to see and compare the cumulative activity level in each neighborhood group.

##### 2. What is/are the insight(s) found from the chart?

The bar chart offers the following insights:

 1. **Guest Activity by Neighborhood** : Neighborhood groups with higher total reviews likely have higher guest activity or popularity. This can signal areas with more frequent bookings, although review counts alone do not show whether guests were satisfied.

 2. **Potential Guest Interest Patterns** : Neighborhoods with fewer reviews might indicate less guest interest, fewer available listings, or areas with less competitive prices or amenities, potentially signaling areas for development.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


***Positive Business Impact*** :


 1. **Promoting Popular Neighborhoods** : Airbnb can advertise listings in neighborhoods with high guest activity. Since these areas are already popular, marketing them can attract even more guests who are looking for well-reviewed, desirable locations.

 2. **Finding Growth Opportunities** : Neighborhoods with fewer reviews might be underused or less known. Airbnb could encourage hosts in these areas to improve their listings or offer competitive prices. This could attract guests to explore these less crowded areas, balancing guest interest across neighborhoods.

 3. **Supporting Successful Hosts** : In neighborhoods with many reviews, Airbnb can support hosts by encouraging them to keep up the good service and consider adjusting prices if demand is high. This can lead to better occupancy rates and profitability for both Airbnb and the hosts.

***Potential Negative Growth Insights*** :

 1. **Too Much Competition in Popular Areas** : High review counts in certain neighborhoods may mean overcrowding and competition among hosts. This can lead to price cuts as hosts try to attract more bookings, which may reduce overall profitability.

 2. **Limited Choices in Less Popular Areas** : If Airbnb mainly promotes popular neighborhoods, less-known neighborhoods may be ignored. This could discourage guests who prefer quieter, less touristy areas, potentially pushing them toward other platforms with diverse location options.
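
Total reviews grow with the number of listings, so a group can lead this chart simply because it has more properties. One way to separate the two effects, sketched here on made-up rows, is to report reviews per listing next to the total; the `sample` data and the `reviews_by_group` name are assumptions for illustration.

```python
import pandas as pd

# Made-up rows for illustration only; replace `sample` with the real `df`
sample = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 4 + ["Bronx"] * 2,
    "number_of_reviews": [10, 20, 30, 0, 40, 20],
})

# Total reviews, number of listings and average reviews per listing for each group
reviews_by_group = (
    sample.groupby("neighbourhood_group")["number_of_reviews"]
    .agg(total_reviews="sum", listings="count", reviews_per_listing="mean")
)
print(reviews_by_group)
```

In the toy data both groups have the same total, yet the smaller group has twice the reviews per listing, which is the kind of difference the summed bar chart hides.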

#### Chart - 9 -- Box Plot

In [None]:
# Chart - 9 visualization code

# Visualization Room Type vs. Availability

plt.figure(figsize=(10, 6)) # Set fig. size

# Creating boxplot of availability by room type
sns.boxplot(data=df, x='room_type', y='availability_365', palette='spring')

plt.title('Room Type vs Availability') # Set title for boxplot

plt.xlabel('Room Type') # Set X_label

plt.ylabel('Availability (Days)') # Set Y_label

plt.show()




##### 1. Why did you pick the specific chart?

The box plot is chosen to show the distribution of availability (in days) across different room types. This chart helps visualize the median availability, the range, and any outliers in each room type category, making it easy to compare the availability trends across room types (e.g., entire homes, private rooms, shared rooms).

##### 2. What is/are the insight(s) found from the chart?

The following insights can be drawn:

 1. **Higher Availability for Certain Room Types** : Entire homes or apartments might show greater availability, indicating that these properties are typically listed for more days throughout the year.

 2. **Variability in Availability** : Private or shared rooms might show greater variation in availability, with some being available year-round and others for shorter periods.

 3. **Outliers** : The box plot may reveal outliers where some listings have extremely low availability, possibly due to occasional rentals or seasonal use.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***Positive Business Impact*** :


 1. **Better Recommendations for Guests** : Airbnb can help guests find the right room type based on how often each type is available. For example, if someone wants to stay year-round, they can be directed to entire homes. For shorter stays, private rooms might be a better choice.

 2. **Guiding Hosts on Availability** : Hosts can compare their room type's availability with others. If private rooms are usually listed more often, new hosts with similar rooms might decide to make their listings available more days to attract more guests.


***Potential Negative Growth Insights*** :


 1. **Missed Booking Opportunities** : If entire homes are not available enough, Airbnb could lose out on potential bookings. This suggests that some listings aren't being used to their full potential for guests looking for longer stays.

 2. **Fewer Budget Options** : If private or shared rooms are not available often, it can limit affordable choices for budget travelers. Over time, this might push guests to other platforms that offer more budget-friendly listings.

#### Chart - 10 -- Bar Chart

In [None]:
# Chart - 10 visualization code

# Visualization of seasonal booking trends

plt.figure(figsize=(10, 6)) # Fix Fig. size

# Creating bar chart for visualization of seasonal booking trends
sns.barplot(data=monthly_bookings, x='month', y='number_of_reviews', palette='YlGnBu')

plt.title('Monthly Booking Trends Based on Number of Reviews') # Set title for bar chart

plt.xlabel('Month') # Set X-label

plt.ylabel('Total Reviews') # Set Y_label

plt.show()




##### 1. Why did you pick the specific chart?

I chose a bar chart to visualize monthly booking trends because it clearly displays the total number of reviews across different months. Bar charts are effective for comparing quantities in distinct categories, allowing for easy identification of peak booking months and trends over the year.

##### 2. What is/are the insight(s) found from the chart?

***From the bar chart, we can derive several insights*** :

 1. **Peak Booking Periods** : The chart likely indicates which months have the highest number of reviews, suggesting when demand is strongest. For instance, if certain months show a spike in reviews, it indicates those are popular times for travelers.

 2. **Low Booking Months** : Conversely, months with very few reviews may highlight off-peak seasons. This information can help hosts and Airbnb strategize on pricing and promotions during these slower months.

 3. **Trend Patterns** : Analyzing the overall trend across the months can reveal seasonal patterns in bookings, helping to forecast future demand based on historical data.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***Positive Business Impact*** :


 1. **Informed Pricing and Promotions** : Understanding when peak booking months occur allows Airbnb and hosts to implement dynamic pricing strategies. For example, they can increase prices during high-demand months and offer discounts during off-peak times to boost bookings.

 2. **Targeted Marketing Campaigns** : Insights about seasonal trends can help Airbnb design marketing campaigns that target travelers at specific times of the year, promoting listings when guests are most likely to book.


***Potential Negative Growth Insights*** :


 1. **Overcrowding During Peak Times** : If many hosts raise prices during peak months due to high demand, it might deter budget-conscious travelers, leading to vacancies and reduced guest satisfaction.

 2. **Neglecting Off-Peak Months** : If there’s a significant focus on peak months, less attention may be paid to improving listings or attracting guests during off-peak periods, potentially causing missed opportunities to grow bookings throughout the year.
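
The chart reads from a `monthly_bookings` table that is built elsewhere in the notebook. For readers following along, here is one possible way such a table could be derived. It is a sketch under stated assumptions, not necessarily the construction used here: it assumes a `last_review` date column and attributes all of a listing's reviews to the month of its most recent review, which is only a rough proxy for booking month.

```python
import pandas as pd

# Made-up rows for illustration only; replace `sample` with the real `df`
sample = pd.DataFrame({
    "last_review": ["2019-06-23", "2019-06-01", "2018-12-15", None, "2019-01-03"],
    "number_of_reviews": [10, 5, 2, 0, 7],
})

# Parse dates; listings that were never reviewed become NaT and are dropped
sample["last_review"] = pd.to_datetime(sample["last_review"])
reviewed = sample.dropna(subset=["last_review"])

# Sum reviews by calendar month of the last review (assumed proxy for bookings)
monthly_bookings = (
    reviewed.assign(month=reviewed["last_review"].dt.month)
    .groupby("month", as_index=False)["number_of_reviews"]
    .sum()
)
print(monthly_bookings)
```

Because every review is assigned to the month of the last review, recent months tend to be over-represented, so the resulting seasonal pattern should be read with caution.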

#### Multivariate Visualizations

#### Chart - 11 -- Bar Chart

In [None]:
# Chart - 11 visualization code

# Visualization of Average Price by Neighbourhood Group and Room Type

plt.figure(figsize=(12, 8)) # Fix fig. size

# Creating bar plot for Average Price by Neighbourhood Group and Room Type
sns.barplot(data=grouped_neighbourhood_room, x='neighbourhood_group', y='mean_price', hue='room_type', palette='Set2')

plt.title('Average Price by Neighbourhood Group and Room Type') # Set title for bar plot

plt.xlabel('Neighbourhood Group') # Set X_label

plt.ylabel('Average Price') # Set Y_label

plt.legend(title='Room Type') # Set legend title

plt.xticks(rotation=45) # Rotate x-axis labels for better readability

plt.show()


##### 1. Why did you pick the specific chart?

I chose a bar plot for this visualization because it effectively compares average prices across different neighborhood groups while also allowing us to differentiate between room types. Bar plots are particularly useful for showing categorical comparisons, making it easy to see trends and differences in pricing.

##### 2. What is/are the insight(s) found from the chart?

***The bar plot provides several key insights*** :


 1. **Price Variation by Room Type** : The chart likely illustrates how average prices differ not just by neighborhood group but also by room type. For instance, entire homes may have significantly higher average prices than private or shared rooms, indicating their appeal to families or groups.

 2. **Neighborhood Pricing Trends** : Certain neighborhood groups may consistently show higher or lower average prices across all room types. This can help identify upscale areas versus budget-friendly options.

 3. **Market Positioning** : If specific room types dominate certain neighborhoods, it indicates a concentration of supply, which can inform both potential hosts and Airbnb about where to focus their marketing efforts or encourage new listings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***Positive Business Impact*** :

 1. **Targeted Marketing Strategies** : Knowing which neighborhoods command higher prices for specific room types allows Airbnb to create targeted marketing campaigns. For example, they can promote entire homes in upscale neighborhoods to attract higher-income travelers, while also highlighting budget-friendly private rooms in more affordable areas.

 2. **Guidance for Hosts** : These insights can help current and prospective hosts set competitive prices based on their neighborhood and room type, optimizing their listings for better visibility and bookings.


***Potential Negative Growth Insights*** :

 1. **Risk of Overpricing** : If hosts in popular neighborhoods set prices too high based on these averages, they may deter potential guests, leading to lower occupancy rates. Balancing pricing with demand is crucial to avoid vacancies.

 2. **Limited Options for Diverse Travelers** : If the focus is predominantly on high-priced listings, budget-conscious travelers might find fewer options available, potentially steering them toward competitors with more affordable offerings.
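
The plot uses a pre-aggregated `grouped_neighbourhood_room` table with a `mean_price` column, created elsewhere in the notebook. A minimal sketch of how such a table can be produced with a two-key group-by is shown below; the toy rows are made up, and the exact construction used in this project may differ.

```python
import pandas as pd

# Made-up rows for illustration only; replace `sample` with the real `df`
sample = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Brooklyn"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Entire home/apt", "Private room", "Private room"],
    "price": [200, 300, 100, 150, 60, 80],
})

# One row per (neighbourhood group, room type) pair with its average price
grouped_neighbourhood_room = (
    sample.groupby(["neighbourhood_group", "room_type"], as_index=False)
    .agg(mean_price=("price", "mean"))
)
print(grouped_neighbourhood_room)
```

Keeping the result in long format (one row per pair) is what lets `sns.barplot` split the bars by `hue='room_type'`.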

#### Chart - 12 -- Bar chart and line chart

In [None]:
# Chart - 12 visualization code

# Visualization of  Seasonal Trends in Price and Bookings
fig, ax1 = plt.subplots(figsize=(10, 6))

# Creating a bar chart
sns.barplot(data=seasonal_bookings, x='season', y='number_of_reviews', color='skyblue', ax=ax1)

ax1.set_ylabel('Total Reviews', color='blue') # Set Y_label for the bar chart axis

ax1.set_xlabel('Season') # Set X_label

# Creating a second y-axis for average price
ax2 = ax1.twinx()

# Creating line plot
sns.lineplot(data=seasonal_bookings, x='season', y='price', color='red', marker='o', ax=ax2)

ax2.set_ylabel('Average Price', color='red') # Set Y_label with color = red

plt.title('Seasonal Trends in Bookings and Price') # Set title for line plot

plt.show()


##### 1. Why did you pick the specific chart?

I chose a dual-axis bar and line plot for this visualization because it allows for a clear comparison between two different but related metrics: the total number of reviews (as a proxy for bookings) and the average price across different seasons. The bar plot effectively shows the number of bookings, while the line plot illustrates how average prices change with the seasons, providing a comprehensive view of seasonal trends.

##### 2. What is/are the insight(s) found from the chart?

***The dual-axis plot provides several valuable insights*** :

 1. **Seasonal Booking Patterns** : The bar chart likely reveals peaks in bookings during specific seasons, such as summer or holidays, indicating when travelers are most active.

 2. **Price Fluctuations** : The line plot likely shows how average prices vary throughout the seasons. For instance, if prices rise during peak booking seasons, it suggests increased demand which could impact revenue for hosts and Airbnb.

 3. **Relationship Between Price and Bookings** : By comparing the two plots, we can observe whether higher prices correspond to increased bookings or if there is a threshold beyond which bookings decline, providing insights into price sensitivity among guests.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***Positive Business Impact*** :


 1. **Optimized Pricing Strategies** : Understanding seasonal trends in price and bookings enables Airbnb to develop dynamic pricing strategies. For instance, they can encourage hosts to adjust their prices based on predicted booking trends, maximizing revenue during peak seasons while remaining competitive during off-seasons.

 2. **Targeted Marketing Campaigns** : Insights from the chart can help inform marketing campaigns. For example, if summer shows high bookings, Airbnb can launch targeted promotions for that season, enticing more guests to book listings.

***Potential Negative Growth Insights*** :


 1. **Overpricing Risks** : If hosts increase prices significantly during high-demand seasons, it could lead to reduced bookings if guests perceive the prices as too high. This could potentially deter travelers from booking on Airbnb in favor of more affordable alternatives.

 2. **Market Saturation in Peak Seasons** : If there’s a substantial increase in listings during peak seasons, it may lead to overcrowding. This could create price competition, potentially driving down prices and affecting host profitability.
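
The dual-axis chart relies on a `seasonal_bookings` table with `season`, `number_of_reviews` and `price` columns, which is prepared elsewhere in the notebook. The sketch below shows one way to build such a table. The month-to-season mapping (Northern Hemisphere meteorological seasons) and the use of `last_review` as the date are assumptions, and the rows are made up.

```python
import pandas as pd

# Assumed mapping from calendar month to season (Northern Hemisphere)
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}

# Made-up rows for illustration only; replace `sample` with the real `df`
sample = pd.DataFrame({
    "last_review": pd.to_datetime(["2019-01-10", "2019-07-04", "2019-07-20", "2019-10-01"]),
    "number_of_reviews": [4, 10, 6, 3],
    "price": [100, 200, 100, 90],
})

# Derive the season from the month of the last review
sample["season"] = sample["last_review"].dt.month.map(SEASON_BY_MONTH)

# Total reviews (bars) and average price (line) per season
seasonal_bookings = (
    sample.groupby("season", as_index=False)
    .agg(number_of_reviews=("number_of_reviews", "sum"), price=("price", "mean"))
)
print(seasonal_bookings)
```

Since listing prices in a snapshot dataset are not recorded per season, the price line here reflects the listings last reviewed in each season rather than true seasonal price changes.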

#### Chart - 13 -- Bar chart

In [None]:
# Chart - 13 visualization code

# Visualization of Average Price by Neighbourhood Group and Season

plt.figure(figsize=(12, 8)) # Fix fig. size

# Creating barplot for Average Price by Neighbourhood Group and Season
sns.barplot(data=avg_location, x='neighbourhood_group', y='avg_price', hue='season', palette='Set1')

plt.title('Average Price by Neighbourhood Group and Season') # Set title for bar plot

plt.xlabel('Neighbourhood Group') # Set X_label

plt.ylabel('Average Price') # Set Y_label

plt.legend(title='Season') # Activate legend

plt.xticks(rotation=45) # Rotate x-axis labels for better readability

plt.show()


##### 1. Why did you pick the specific chart?

I selected a bar plot to visualize average prices by neighborhood group and season because it effectively allows for the comparison of multiple categories across two dimensions. The bars represent average prices in different neighborhood groups, while the color coding by season provides insight into how prices fluctuate depending on the time of year. This format makes it easy to identify trends and differences at a glance.

##### 2. What is/are the insight(s) found from the chart?

***The bar plot can reveal several key insights*** :


 1. **Seasonal Price Variability** : Different seasons may show distinct average prices across neighborhood groups. For example, if summer prices are consistently higher across several neighborhoods, it indicates strong demand during that period.

 2. **Neighborhood Price Disparities** : The plot likely highlights which neighborhood groups are priced higher overall. If certain neighborhoods maintain higher average prices across all seasons, they may be perceived as more desirable or upscale.

 3. **Price Trends by Neighborhood** : By analyzing how prices for each neighborhood change across seasons, we can identify potential seasonal trends that may affect guest behavior and host pricing strategies.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***Positive Business Impact*** :


 1. **Informed Pricing Strategies** : The insights gained from this chart can help Airbnb and its hosts set competitive prices based on neighborhood demand patterns. If hosts in high-demand neighborhoods see that prices increase in certain seasons, they can strategically raise their rates to maximize revenue.

 2. **Targeted Promotions** : Understanding seasonal pricing trends can inform Airbnb's marketing strategies. For example, if certain neighborhoods show increased prices during the summer, Airbnb can promote these areas to attract guests looking for vacation stays, driving more traffic to these listings.

***Potential Negative Growth Insights*** :


 1. **Risk of Overpricing** : If hosts raise prices significantly in high-demand seasons without considering market saturation, it might deter potential guests, leading to decreased bookings. This can be particularly problematic in neighborhoods with many listings.

 2. **Imbalance in Supply and Demand** : If a neighborhood consistently has higher prices and limited availability, it may create frustration for guests seeking affordable options, pushing them to look for alternatives outside of Airbnb.

#### Chart - 14 - Count Plot

In [None]:
# Chart - 14 visualization code

# Visualization of engagement category by room type

plt.figure(figsize=(10, 6)) # Fix fig. size

# Count plot of listings per room type, split by engagement category
sns.countplot(data=df, x='room_type', hue='engagement_category', palette='Paired')

plt.title('Engagement Category by Room Type') # Set title for countplot

plt.xlabel('Room Type') # Set X_label

plt.ylabel('Count') # Set Y_label

plt.legend(title='Engagement Category') # Activate legend

plt.xticks(rotation=45) # Rotate x-axis labels for better readability

plt.show()

##### 1. Why did you pick the specific chart?

I chose a count plot for this visualization because it effectively displays the frequency of different engagement categories across various room types. Count plots are particularly useful for categorical data, allowing for easy comparison of counts within each category. This chart can quickly illustrate the distribution of engagement across room types and highlight any significant trends or disparities.

##### 2. What is/are the insight(s) found from the chart?

***The count plot can reveal several insights*** :


 1. **Room Type Popularity** : It shows how many listings of each room type fall into each engagement category. For example, if entire homes show higher counts in a specific engagement category than private or shared rooms, it indicates a guest preference for that type of accommodation.

 2. **Engagement Category Trends** : The plot may highlight how different room types are perceived in terms of guest engagement. If certain engagement categories are predominant in specific room types, it can reflect guest preferences or behaviors.
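
Because room types differ greatly in how many listings they have, raw counts can hide the share of each engagement category within a room type. The sketch below shows one way to tabulate both. It uses toy data, and the engagement labels `High` and `Low` are assumptions, since the notebook's actual category names may differ.

```python
import pandas as pd

# Toy data for illustration only; the engagement labels are hypothetical.
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 4 + ["Private room"] * 4 + ["Shared room"] * 2,
    "engagement_category": ["High", "High", "High", "Low",
                            "High", "Low", "Low", "Low",
                            "Low", "Low"],
})

# Raw counts: the same numbers the count plot draws as bars
counts = pd.crosstab(toy["room_type"], toy["engagement_category"])

# Row-normalised shares: the fraction of each room type in each category
shares = pd.crosstab(toy["room_type"], toy["engagement_category"], normalize="index")

print(counts)
print(shares.round(2))
```

Comparing `shares` across rows makes it possible to say whether a room type is over-represented in a category regardless of how many listings it has.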

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

***Positive Business Impact*** :


 1. **Targeted Marketing Strategies** : Understanding which room types are more frequently engaged can help Airbnb design targeted marketing campaigns. For instance, if entire homes are more popular in a certain engagement category, Airbnb can promote these listings to attract more guests.

 2. **Guiding Host Decisions** : Insights into guest engagement can help hosts understand what type of room might be more profitable in their neighborhood. If hosts see that certain room types are consistently engaged more, they can adjust their listings accordingly to maximize occupancy rates.

***Potential Negative Growth Insights*** :


 1. **Market Saturation Risks** : If many hosts opt to offer the same popular room type (e.g., entire homes) in a particular engagement category, it could lead to oversupply. This might create intense competition, potentially driving prices down and lowering profitability for hosts.

 2. **Missed Opportunities in Underrepresented Categories** : If certain room types rarely appear in the higher engagement categories, it may indicate a missed opportunity. For example, if private rooms are underrepresented in a high-engagement category, hosts might consider enhancing their offerings or marketing strategies to attract more guests.

#### Chart - 15 - Correlation Heatmap


In [None]:
# Correlation Heatmap visualization code
# Creating heatmap for correlation

# Select only numerical features for correlation
numerical_df = df.select_dtypes(include=['number'])

plt.figure(figsize=(10, 6)) # Fix fig. size

# plot heatmap for correlation
sns.heatmap(numerical_df.corr(), annot=True, fmt=".2f", cmap='coolwarm')

plt.title('Correlation Heatmap') # Set title for heatmap
plt.show()

##### 1. Why did you pick the specific chart?

I chose a heatmap for visualizing correlation because it represents the strength and direction of relationships between numerical features in a compact format. The color gradient allows quick identification of positive and negative correlations, making it easy to spot strong relationships and potential multicollinearity issues at a glance.

##### 2. What is/are the insight(s) found from the chart?

***The heatmap can provide several insights*** :


 1. **Strong Positive Correlations** : If certain features, such as **number_of_reviews** and **availability_365**, show a strong positive correlation, it indicates that properties with more reviews tend to be available for booking more days throughout the year. This could imply that popular listings are also those that hosts keep available more frequently.

 2. **Negative Correlations** : The heatmap might reveal negative correlations, such as between **price** and **number_of_reviews**. A strong negative correlation may suggest that higher-priced listings receive fewer reviews, indicating a potential challenge for premium listings in attracting guests.

 3. **Identifying Multicollinearity** : Features that exhibit high correlation with each other, such as **price** and **avg_price**, could signal multicollinearity, which may affect the performance of certain predictive models. Recognizing these relationships can guide feature selection in model building.
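
To move from reading colors to a concrete list of strongly related features, the correlation matrix can be flattened into pairs and filtered. This is a minimal sketch on toy data. The values are hypothetical and the 0.8 cutoff is an arbitrary assumption, not a threshold taken from the notebook.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; the values are hypothetical.
toy = pd.DataFrame({
    "price": [50, 80, 120, 200, 300],
    "avg_price": [55, 82, 118, 205, 290],
    "number_of_reviews": [40, 5, 22, 9, 15],
})

corr = toy.corr()

# Keep only the upper triangle so each pair appears once and the diagonal is dropped
upper_mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper_mask).stack().dropna()

# Flag pairs whose absolute correlation exceeds an assumed cutoff
threshold = 0.8
strong = pairs[pairs.abs() > threshold].sort_values(key=lambda s: s.abs(), ascending=False)

print(strong)
```

Pairs listed in `strong` are candidates for dropping or combining before any modelling step.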

#### Chart - 16 - Pair Plot

In [None]:
# Pair Plot visualization code

# Select relevant numerical columns for pair plot
pairplot_data = df[['price', 'minimum_nights', 'number_of_reviews', 'availability_365']]

# Create the pair plot; sns.pairplot builds its own figure, so size is set via `height`
# (a separate plt.figure() call would only produce an extra empty figure)
sns.pairplot(pairplot_data, diag_kind='kde', corner=True, height=2.5, plot_kws={'alpha':0.5})

# Display the plot
plt.suptitle('Pair Plot of Key Numerical Variables', y=1.02)  # Set a title with slight adjustment for spacing
plt.show()


##### 1. Why did you pick the specific chart?

I selected a pair plot to visualize the relationships between key numerical variables in the dataset. A pair plot displays pairwise relationships in a dataset, allowing for easy observation of correlations and distributions across multiple dimensions. This visualization helps to quickly identify trends, clusters, and outliers between different features.

##### 2. What is/are the insight(s) found from the chart?

***The pair plot can reveal several insights*** :


 1. **Relationships Between Variables** : By examining scatterplots of pairs such as **price** vs. **number_of_reviews**, we may observe that listings with more reviews generally have lower prices. This suggests that higher-priced listings might attract fewer guests or may be less popular.

 2. **Distribution Patterns** : The diagonal plots (KDE plots) show the distribution of each variable. For example, the **price** distribution may be right-skewed, indicating that most listings are on the lower end of the **price** spectrum, with a few high-priced outliers.

 3. **Minimum Nights vs. Availability** : There could be interesting relationships between **minimum_nights** and **availability_365**. For instance, if properties that require a higher minimum stay are also available more days of the year, this could indicate a trend toward longer-term rentals.

 4. **Outliers and Clusters** : The pair plot can highlight clusters or outliers within the dataset. For instance, if certain properties have exceptionally high prices but very few reviews, they may represent luxury listings that aren’t as well-known.
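
The skew and outlier observations above can be quantified with a couple of summary statistics. The sketch below uses toy prices (hypothetical values) and the common 1.5 × IQR rule, which is an assumption for illustration and may not be the outlier rule used elsewhere in this notebook.

```python
import pandas as pd

# Toy data for illustration only; the prices are hypothetical.
toy = pd.DataFrame({"price": [60, 75, 80, 90, 100, 110, 120, 150, 1000]})

# Positive skewness means a long right tail, as the KDE on the diagonal would show
skewness = toy["price"].skew()

# 1.5 * IQR rule for flagging unusually high prices
q1, q3 = toy["price"].quantile([0.25, 0.75])
iqr = q3 - q1
upper = q3 + 1.5 * iqr
outliers = toy[toy["price"] > upper]

print(f"skewness: {skewness:.2f}, upper fence: {upper}")
print(outliers)
```

The same check can be repeated for `minimum_nights` or `number_of_reviews` to see which variables have the heaviest tails.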

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

***User Experience Optimization*** :


 - **Tailored Recommendations** : Use insights from the visualizations to develop an algorithm that suggests listings based on user preferences, such as budget, room type, and neighborhood. For instance, if users frequently search for entire homes in a specific neighborhood, prioritize those listings in search results.

 - **Enhanced Booking Process** : Streamline the booking experience by implementing filters that allow users to sort listings by price, availability, and number of reviews. This can help users quickly find listings that meet their criteria, improving overall satisfaction.

 - **Seasonal Promotions** : Leverage the seasonal trends identified in this analysis to time promotions around demand. For example, promote popular areas ahead of high-demand periods and offer discounts for properties that might otherwise go unbooked during off-peak seasons.


***Host Support and Guidance*** :



 - **Dynamic Pricing Tools** : Provide hosts with data-driven pricing recommendations based on the analysis of average prices by room type and neighborhood. This tool could suggest optimal pricing strategies to maximize bookings while remaining competitive.

 - **Education and Resources** : Offer workshops or resources for new hosts to help them understand market trends. Sharing insights on successful pricing strategies, optimal availability, and the importance of gathering reviews can enhance host performance.

 - **Feedback Mechanism** : Establish a feedback loop where hosts can receive insights based on guest reviews and booking data. This will enable them to make informed decisions about property management and guest engagement.


***Strategic Planning for Market Growth*** :



 - **Market Analysis Reports** : Regularly compile reports based on the visualizations, focusing on trends in guest behavior, neighborhood performance, and pricing strategies. These reports can guide Airbnb's marketing campaigns and highlight potential growth areas.

 - **Targeted Marketing Campaigns** : Utilize data on neighborhood popularity and guest preferences to create marketing campaigns that promote specific areas. Highlight the unique features of each neighborhood to attract different segments of travelers, such as families, business travelers, or budget-conscious guests.

 - **Identifying Underserved Areas** : Analyze neighborhoods with lower review counts and explore opportunities for host development. By encouraging new listings in these areas and providing incentives, Airbnb can diversify its offerings and attract a broader range of guests.
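
As a starting point for the underserved-areas idea, review activity can be summarised per neighbourhood. This is only a sketch on toy data: the `neighbourhood` column name, the values, and the "below the median of neighbourhood medians" cutoff are all assumptions chosen for illustration.

```python
import pandas as pd

# Toy data for illustration only; neighbourhood names and counts are hypothetical.
toy = pd.DataFrame({
    "neighbourhood": ["A", "A", "A", "B", "B", "C"],
    "number_of_reviews": [50, 30, 40, 2, 4, 10],
})

# Listings, total reviews, and median reviews per neighbourhood
summary = toy.groupby("neighbourhood")["number_of_reviews"].agg(
    listings="count", total_reviews="sum", median_reviews="median"
)

# Assumed cutoff: neighbourhoods whose median falls below the median of all neighbourhood medians
cutoff = summary["median_reviews"].median()
underserved = summary[summary["median_reviews"] < cutoff]

print(summary)
print(underserved)
```

Low review counts can reflect low demand as well as low supply, so a list like `underserved` is best treated as a set of candidates for closer inspection rather than a final answer.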

# **Conclusion**

In conclusion, the insights derived from this project not only enhance our understanding of the Airbnb marketplace but also provide actionable recommendations for both Airbnb and its hosts. By leveraging data analytics, Airbnb can refine its user experience, empower hosts with valuable insights, and strategically position itself for growth in a competitive landscape. Moving forward, continuous monitoring and analysis will be crucial in adapting to changing market dynamics and ensuring sustained success in the ever-evolving travel industry.
