<a href="https://colab.research.google.com/github/ManjeetKalla11/EDA-Manjeet_Kalla/blob/main/Airbnb_booking_analysis_EDA_Manjeet_Kalla.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -  AirBnb Booking Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Name**    - Manjeet_Kalla

# **Project Summary -**

Airbnb, established in 2008, has transformed the travel industry by connecting travelers with hosts offering unique accommodations across the globe. The platform provides a wide range of lodging options, from individual rooms to entire homes, giving travelers an alternative to traditional hotels and allowing for more personalized experiences. Today, Airbnb is a globally recognized brand known for its ability to deliver distinctive and authentic travel opportunities.

A key aspect of Airbnb's success lies in its extensive use of data analysis. By examining the data generated from millions of listings, Airbnb gains insights into customer preferences, optimizes pricing, enhances security measures, and refines its business strategies. This data-driven approach enables the company to continuously improve its platform, tailor marketing campaigns, and introduce new services, ultimately enhancing the overall user experience.

# **GitHub Link -**

https://github.com/ManjeetKalla11/EDA-Manjeet_Kalla

# **Problem Statement**


**The task of this project is to derive insights from the given dataset so that stakeholders can use them for business improvements:**

1. Price Influencing Factors: What are the primary factors that influence the pricing of Airbnb listings in New York City, and how do these factors vary across different neighborhoods and room types?
2. Booking Patterns Analysis: What are the patterns and trends in booking behaviors, such as minimum nights required, room type preferences, and availability throughout the year? How do these patterns vary by neighborhood?
3. Review Analysis: What insights can be drawn from the distribution and frequency of reviews across different listings? How do the review counts and average monthly reviews correlate with listing price and availability?
4. Room Type Preferences: What room types (Entire home/apt, Private room, Shared room) are most popular among guests, and how does the room type influence the pricing and number of reviews?

#### **Define Your Business Objective?**

The business objective of this project is to utilize data insights from the Airbnb NYC 2019 dataset to support stakeholders, including hosts, Airbnb management, and investors, in making informed decisions. The analysis aims to identify strategies that can optimize revenue, enhance customer satisfaction, and improve operational efficiency on the platform.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that, for each chart, the following questions are answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

### Dataset First View

In [None]:
# Dataset First Look
df = pd.read_csv("/content/drive/MyDrive/AlmaBetter/Airbnb NYC 2019 (1).csv")

In [None]:
df

In [None]:
df.head()

In [None]:
df.tail()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()


In [None]:
# Visualizing the missing values
plt.figure(figsize=(10, 6))
sns.heatmap(df.isnull(), cbar=False, cmap='viridis', yticklabels=False)
plt.title('Missing Values Heatmap')
plt.show()

## Dealing with missing values

In [None]:
# Drop rows where 'name' or 'host_name' are missing
df_cleaned = df.dropna(subset=['name', 'host_name']).copy()  # Copy to avoid SettingWithCopyWarning

# Fill missing 'last_review' with 'No Review' using .loc to avoid the warning
df_cleaned.loc[:, 'last_review'] = df_cleaned['last_review'].fillna('No Review')

# Fill missing 'reviews_per_month' with 0
df_cleaned.loc[:, 'reviews_per_month'] = df_cleaned['reviews_per_month'].fillna(0)



In [None]:
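# Same cleaning as the cell above, written with plain assignment.
# Calling fillna(..., inplace=True) on a selected column is chained assignment,
# which newer pandas versions warn about and may not apply to df_cleaned.
df_cleaned = df.dropna(subset=['name', 'host_name']).copy()
df_cleaned['last_review'] = df_cleaned['last_review'].fillna('No Review')
df_cleaned['reviews_per_month'] = df_cleaned['reviews_per_month'].fillna(0)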



In [None]:
df_cleaned.isnull().sum()

### What did you know about your dataset?

Answer Here

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
airbnb_df=df_cleaned
airbnb_df.columns

In [None]:
# Dataset Describe
airbnb_df.describe()


### Variables Description
A brief description of each numeric variable in the dataset, followed by its summary statistics:

id: A unique identifier for each Airbnb listing.

host_id: A unique identifier for the host of the listing.

latitude: The geographic latitude of the listing's location.

longitude: The geographic longitude of the listing's location.

price: The price per night for the listing.

minimum_nights: The minimum number of nights a guest must book for a stay.

number_of_reviews: The total number of reviews the listing has received.

reviews_per_month: The average number of reviews the listing receives per month.

calculated_host_listings_count: The total number of listings managed by the host.

availability_365: The number of days the listing is available for booking in a year (out of 365 days).



1. id:

Count: 48,858 (total entries)
Mean: 19,023,350
Standard Deviation: 10,982,890
Minimum: 2,539
Maximum: 36,487,240

2. host_id:

Count: 48,858
Mean: 67,631,690
Standard Deviation: 78,623,900
Minimum: 2,438
Maximum: 274,321,300

3. latitude:

Count: 48,858
Mean: 40.73
Standard Deviation: 0.0545
Minimum: 40.50
Maximum: 40.91

4. longitude:

Count: 48,858
Mean: -73.95
Standard Deviation: 0.0462
Minimum: -74.24
Maximum: -73.71

5. price:

Count: 48,858
Mean: 152.74
Standard Deviation: 240.23
Minimum: 0 (may indicate listings with no price)
Maximum: 10,000

6. minimum_nights:

Count: 48,858
Mean: 7.01
Standard Deviation: 20.02
Minimum: 1
Maximum: 1,250 (outlier)

7. number_of_reviews:

Count: 48,858
Mean: 23.27
Standard Deviation: 44.55
Minimum: 0
Maximum: 629

8. reviews_per_month:

Count: 48,858
Mean: 1.09
Standard Deviation: 1.60
Minimum: 0
Maximum: 58.50

9. calculated_host_listings_count:

Count: 48,858
Mean: 7.15
Standard Deviation: 32.96
Minimum: 1
Maximum: 327

10. availability_365:

Count: 48,858
Mean: 112.80
Standard Deviation: 131.61
Minimum: 0
Maximum: 365

### Check Unique Values for each variable.

In [None]:
# Loop through each column and print the number of unique values
for column in df_cleaned.columns:
    unique_values = df_cleaned[column].unique()
    print(f"Column: {column}, Unique Values: {len(unique_values)}")




## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
airbnb_df['last_review'] = pd.to_datetime(airbnb_df['last_review'], errors='coerce')

In [None]:
#Review Age: Calculate how many days since the last review.
airbnb_df['review_age'] = (pd.to_datetime('today') - airbnb_df['last_review']).dt.days
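
Because `review_age` is measured from today's date, its values change every time the notebook is rerun. A minimal sketch of the same calculation against a fixed reference date is shown here; the `snapshot` date and the toy values are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd

# Toy stand-in for the 'last_review' column (values are made up).
last_review = pd.Series(["2019-07-01", "2019-01-01", "No Review", None])

# errors='coerce' turns unparseable entries such as 'No Review' into NaT.
parsed = pd.to_datetime(last_review, errors="coerce")

# Hypothetical fixed snapshot date, used instead of 'today' so reruns agree.
snapshot = pd.Timestamp("2019-07-08")
review_age_days = (snapshot - parsed).dt.days

print(review_age_days.tolist())
```

Listings without a parseable review date keep a missing `review_age` here, which leaves the choice of how to fill them explicit.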

In [None]:
#Price Binning: Categorize listings into price ranges.
bins = [0, 50, 100, 200, float('inf')]
labels = ['0-50', '51-100', '101-200', '201+']
airbnb_df['price_range'] = pd.cut(airbnb_df['price'], bins=bins, labels=labels)

In [None]:
#Room Type Encoding: Convert room_type to a categorical type if you plan to use it in modeling.
airbnb_df['room_type'] = airbnb_df['room_type'].astype('category')

In [None]:
#aggregate data to derive insights, such as the average price and number of reviews by neighbourhood_group.
average_price_by_neighbourhood = airbnb_df.groupby('neighbourhood_group')['price'].mean()
average_price_by_neighbourhood
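
The comment above mentions both average price and number of reviews, while the cell computes only the mean price. A hedged sketch of how several aggregates could be computed in one pass with named aggregation follows; the toy rows are made up and only the column names come from the dataset.

```python
import pandas as pd

# Toy listings (made-up values), using column names from the dataset.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Queens"],
    "price": [200, 300, 100, 140, 80],
    "number_of_reviews": [10, 30, 5, 15, 40],
})

# Named aggregation: output_column=(input_column, function).
summary = toy.groupby("neighbourhood_group").agg(
    mean_price=("price", "mean"),
    median_price=("price", "median"),
    mean_reviews=("number_of_reviews", "mean"),
    listings=("price", "size"),
)
print(summary)
```

Reporting the median next to the mean is one way to see how much a few expensive listings pull the average upward.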

In [None]:
#Convert object types to category types for categorical columns to save memory and improve performance.
airbnb_df['neighbourhood_group'] = airbnb_df['neighbourhood_group'].astype('category')
airbnb_df['neighbourhood'] = airbnb_df['neighbourhood'].astype('category')
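
The memory saving from the `category` dtype can be checked directly. The sketch below uses a synthetic column of repeated borough names rather than the real data, so the exact byte counts are illustrative only.

```python
import pandas as pd

# Toy column: five borough names repeated many times (made-up data).
boroughs = ["Manhattan", "Brooklyn", "Queens", "Bronx", "Staten Island"]
as_text = pd.Series(boroughs * 10_000)
as_category = as_text.astype("category")

# deep=True counts the string payloads, not just the references to them.
text_bytes = as_text.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"text: {text_bytes:,} bytes, category: {category_bytes:,} bytes")
```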

In [None]:
# Handling Missing values in last_review and review_age column
# (plain assignment instead of inplace fillna on a selected column)
airbnb_df['last_review'] = airbnb_df['last_review'].fillna(pd.to_datetime('today'))
airbnb_df['review_age'] = airbnb_df['review_age'].fillna(0)

In [None]:
#Dropping the missing values in last_review and review_age
airbnb_df.dropna(subset=['last_review', 'review_age'], inplace=True)

In [None]:
#Check for patterns in the rows with missing price_range values:
missing_price_range = airbnb_df[airbnb_df['price_range'].isnull()]
print(missing_price_range[['price', 'minimum_nights', 'number_of_reviews', 'room_type']])


In [None]:
# Add 'Unknown' to the categories of 'price_range' column
airbnb_df['price_range'] = airbnb_df['price_range'].cat.add_categories('Unknown')
# Now fill the missing values with 'Unknown' (plain assignment instead of inplace)
airbnb_df['price_range'] = airbnb_df['price_range'].fillna('Unknown')

In [None]:
airbnb_df = airbnb_df[airbnb_df['price'] > 0]


In [None]:
print(airbnb_df['price_range'].isnull().sum())  # Should be 0 if filled, or check the total number of rows

In [None]:
airbnb_df.info()

### What all manipulations have you done and insights you found?

Duplicate Check: I counted duplicate rows with df.duplicated().sum() to verify that each entry is unique. The wrangling code itself does not drop any rows for duplication.

Missing Values Handling:

* I dropped rows with a missing name or host_name and filled missing reviews_per_month values with 0.
* I checked for missing values, particularly in the price_range column.
* I analyzed the rows with missing price_range values and discovered they had a price of 0.
* I decided to drop these rows (keeping only listings with price > 0), which improved the overall quality of the dataset.

Data Type Conversion: I converted the last_review column to a datetime format for better analysis related to time.

Categorical Conversion: I converted relevant columns (e.g., neighbourhood_group, neighbourhood, room_type, and price_range) to categorical data types, optimizing memory usage and performance for analysis.
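
The reason a price of 0 ends up with a missing `price_range` is that `pd.cut` uses right-closed intervals by default, so the first bin is (0, 50] and does not contain 0. A small sketch with made-up prices, reusing the bins and labels from the wrangling code:

```python
import pandas as pd

prices = pd.Series([0, 50, 51, 150, 500])  # made-up prices
bins = [0, 50, 100, 200, float("inf")]
labels = ["0-50", "51-100", "101-200", "201+"]

# Default: intervals are (0, 50], (50, 100], ... so a price of 0 is not covered.
default_cut = pd.cut(prices, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left: [0, 50].
inclusive_cut = pd.cut(prices, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({"price": prices, "default": default_cut, "inclusive": inclusive_cut}))
```

Whether a zero price should be binned or treated as invalid is a judgment call; this notebook treats it as invalid and removes those rows.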



In [None]:
airbnb_df

In [None]:
#Saving the csv File
airbnb_df.to_csv('cleaned_airbnb_data.csv', index=False)

In [None]:
airbnb_df.columns

In [None]:
airbnb_df.info()

#### Chart - 1

1. Price Distribution
Visualization: Histogram

Purpose: Understand the distribution of prices and identify outliers.

In [None]:
# Chart - 1
#Price Distribution Visualization
plt.figure(figsize=(10, 6))
sns.histplot(airbnb_df['price'], bins=50, kde=True)
plt.title('Distribution of Prices')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.show()


##### 1. Why did you pick the specific chart?

I chose a histogram with a KDE plot to visualize the price distribution because:

Distribution Insight: Histograms are ideal for visualizing the distribution of continuous numerical data like price. It helps in understanding how the prices of Airbnb listings are spread and whether they are concentrated in certain ranges.

Frequency Analysis: It allows us to see how frequently prices fall within specific intervals, which can highlight common price ranges and identify pricing outliers (listings with extremely high or low prices).

Skewness Detection: By including a KDE (Kernel Density Estimate) plot, we get a smooth curve that highlights the overall distribution shape (skewness, multi-modality). This was useful here to confirm that the data is heavily right-skewed.

Outliers Identification: The histogram is helpful to quickly spot outliers (e.g., listings with unusually high prices) and to understand if they distort the overall dataset.

This chart was a natural first step to understanding the price data before diving deeper into other variables, relationships, or segmentations.
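
Skewness can also be quantified rather than read off the curve. The sketch below uses a synthetic lognormal sample as a stand-in for prices (an assumption; it is not the Airbnb data) and compares the sample skewness before and after a log transform.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed "prices" (lognormal draw; not the Airbnb data).
rng = np.random.default_rng(0)
toy_price = pd.Series(rng.lognormal(mean=5.0, sigma=0.8, size=1000))

raw_skew = toy_price.skew()            # sample skewness; > 0 means a long right tail
log_skew = np.log1p(toy_price).skew()  # skewness after a log(1 + x) transform

print(f"skew raw: {raw_skew:.2f}, after log1p: {log_skew:.2f}")
```

If the real price column behaves similarly, plotting `np.log1p(price)` may give a more readable histogram than the raw values.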

##### 2. What is/are the insight(s) found from the chart?

Insights from the Price Distribution Chart:

Right-Skewed Distribution:

Most Airbnb listings are priced at the lower end, with a sharp decline in frequency as prices increase. This suggests that the majority of properties are affordable, with prices likely below $1,000. The long right tail indicates a few outliers with much higher prices, going up to $10,000.

Concentration of Listings:

A significant concentration of listings is priced between $0 and $500, suggesting that budget-friendly and mid-range listings dominate the market.

Outliers:

A small number of listings with prices over $2,000 likely represent luxury properties or outliers that could skew aggregate statistics like mean and standard deviation. These high-priced listings might be candidates for separate analysis or capping in future analyses to avoid distortion.

Potential for Further Segmentation:

There is a clear distinction between the bulk of low-priced listings and the sparse high-priced listings. This could suggest distinct markets within Airbnb (e.g., budget vs. luxury) that may require separate strategies for analysis or pricing models.
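
One way the capping mentioned above could be done is to clip prices at a high percentile. The sketch below uses made-up prices, and the 98th percentile is an arbitrary illustrative threshold, not a value taken from this analysis.

```python
import pandas as pd

# Made-up prices: 100 ordinary listings (50, 52, ..., 248) plus two extreme ones.
toy_price = pd.Series(list(range(50, 250, 2)) + [5000, 10000])

# Hypothetical cap at the 98th percentile; the threshold is a judgment call.
cap = toy_price.quantile(0.98)
capped = toy_price.clip(upper=cap)

print(f"cap: {cap:.2f}")
print(f"mean:   {toy_price.mean():.2f} -> {capped.mean():.2f}")
print(f"median: {toy_price.median():.2f} -> {capped.median():.2f}")
```

The mean drops sharply after capping while the median does not move, which is why the median is often the safer summary for heavily skewed prices.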

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

#### Business Impact of Gained Insights:

Positive Business Impact:

Understanding Market Segmentation: The chart reveals that the majority of Airbnb listings are clustered in the lower to mid-range price categories. This insight helps property owners and managers target the largest group of consumers (budget-conscious travelers). Businesses can offer promotions or pricing strategies aimed at attracting this core market, which could boost occupancy rates.

Outliers for Premium Strategies:

The presence of outliers with high prices indicates there is also a luxury market, though small. This can guide Airbnb hosts in identifying high-end properties and applying different marketing or service strategies for premium customers (e.g., personalized services, luxury amenities). By focusing on differentiated pricing strategies, property owners can cater to both budget and premium segments, optimizing revenue streams.

Data-Driven Pricing Strategies:

 By recognizing the skewness in price distribution, Airbnb hosts can avoid pricing their listings at the extreme high end unless they offer unique value or luxury. For mid-market hosts, competitive pricing aligned with the most frequent price points can help them stand out in a crowded market.

Potential Negative Growth:

High-Priced Outliers Could Mislead: Without addressing the outliers in high-price listings, business insights could be skewed. For example, if outliers aren't capped or separated during analysis, average price figures may appear inflated, misleading hosts into pricing their listings too high for the market demand. This could lead to lower occupancy rates, as customers are more sensitive to price increases at the lower end of the spectrum.

Low-End Saturation:

 The heavy concentration of listings in the low-price range may lead to over-saturation. Property owners in this category could face intense competition, resulting in price wars or reduced margins. Without offering differentiation (e.g., better amenities, location advantages), listings in the low-end could struggle to maintain profitability. Additionally, if too many listings cluster at low prices, the overall perceived quality of Airbnb as a service may be affected, which could deter premium customers.

Key Takeaway:

The insights suggest positive potential for tailoring pricing strategies to different market segments, maximizing revenue. However, the over-saturation at the lower price range could lead to stagnation or price pressure, and ignoring luxury segments could lead to missed opportunities.
Properly handling outliers and understanding the pricing structure across segments will help Airbnb optimize both occupancy and profitability while avoiding price distortion effects.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
# Room Type vs Price (Boxplot)
plt.figure(figsize=(12, 6))
sns.boxplot(x='room_type', y='price', data=airbnb_df)
plt.title('Room Type vs Price (Boxplot)')
plt.xlabel('Room Type')
plt.ylabel('Price')
plt.show()


##### 1. Why did you pick the specific chart?

I chose a boxplot for this visualization because it effectively showcases the distribution and range of prices for different room types, while highlighting outliers and median values. Here’s why the boxplot is particularly suitable:

Comparison of Room Types:

 The boxplot allows for easy comparison of the price ranges for the three different room types (Entire home/apt, Private room, Shared room), showing how they differ in terms of both central tendency (median) and spread (range of prices).

Outlier Detection:

Since there are some properties listed at very high prices, the boxplot is an excellent tool to show outliers. This helps identify which room types have more variability or unusual pricing.

Summarizes Data:

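The boxplot efficiently summarizes large amounts of data into a few statistics. The lower quartile, median, and upper quartile form the box, and the whiskers (by seaborn's default) extend to the most extreme points within 1.5 times the interquartile range, with anything beyond them drawn as an individual outlier. It also provides insights about skewness or concentration of data points.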

Insight into Market Segments:

For pricing strategies and market segmentation, understanding how different room types are priced relative to one another is critical, and the boxplot visually demonstrates this in a compact and informative way.
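
The numbers behind a boxplot can be computed directly, which helps when comparing room types precisely. This is a sketch on made-up prices; `box_stats` is a hypothetical helper that applies the common 1.5 * IQR whisker rule.

```python
import pandas as pd

# Made-up prices for two room types.
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 6 + ["Private room"] * 5,
    "price": [100, 150, 180, 200, 250, 1000, 40, 60, 70, 80, 90],
})

def box_stats(prices: pd.Series) -> pd.Series:
    """Quartiles, whisker ends and outlier count under the 1.5 * IQR rule."""
    q1, median, q3 = prices.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    # Points inside the fences; the whiskers end at the most extreme of these.
    inside = prices[(prices >= q1 - 1.5 * iqr) & (prices <= q3 + 1.5 * iqr)]
    return pd.Series({
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": len(prices) - len(inside),
    })

# One row of statistics per room type.
stats = pd.DataFrame(
    {name: box_stats(group) for name, group in toy.groupby("room_type")["price"]}
).T
print(stats)
```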

##### 2. What is/are the insight(s) found from the chart?

From the Room Type vs Price Boxplot, the following insights can be drawn:

Entire home/apartment has the highest price range:

Listings categorized as "Entire home/apt" have a much wider price range compared to the other two room types. The median price is significantly higher, and the presence of several high-priced outliers indicates that some of these properties are priced far above the majority.

Private room has moderate prices:

Private rooms have a narrower range of prices, with fewer extreme outliers. The median price is lower than for entire homes/apartments, making private rooms a more affordable option for travelers. There are a few outliers, but overall the prices tend to cluster within a smaller range.

Shared rooms are the least expensive:

Shared rooms have the lowest price range and few outliers, indicating that this category is the most affordable option for guests. The median price is much lower compared to the other two room types, and the spread of prices is quite limited.

Outliers:

The boxplot reveals that "Entire home/apt" has a significant number of high-price outliers, while private rooms and shared rooms have fewer. These outliers could represent luxury or premium properties that are priced well above the market norm.

Consistency in shared room pricing:

Shared rooms exhibit the least variation in pricing, showing a more consistent pricing structure compared to private rooms and entire homes.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

### Positive Business Impact:
#### Targeted Pricing Strategy:

The insights from the Room Type vs Price boxplot can help hosts optimize their pricing strategy. By understanding that entire homes/apartments command a higher price, hosts can ensure they are charging competitively while still maintaining a premium. Similarly, hosts of private or shared rooms can focus on affordability and adjust their prices accordingly to attract more budget-conscious guests.

#### Market Segmentation:

The clear distinction between room types and their price points allows hosts to better segment their target audience. For example, entire homes/apartments could be marketed to families or groups willing to pay more for privacy, while shared and private rooms could appeal to solo travelers or those on a budget.

#### Product Differentiation:

Hosts with high-end entire home/apartment listings can focus on creating differentiated, luxury experiences to justify the higher price points. They may consider adding amenities or personalized services to stand out from the crowd.

#### Outlier Management:

The presence of extreme price outliers suggests that some listings may be overpriced. Identifying and adjusting these listings could increase bookings and improve the host's reputation by aligning with the general market.

#### Room Type Demand Insight:

Understanding that shared rooms are priced the lowest, and entire homes the highest, can help businesses (especially Airbnb hosts) determine what types of properties to invest in based on their target audience and profit margin goals. This segmentation will allow better resource allocation when managing multiple properties.

### Negative Growth Potential:

#### Luxury Market Saturation:

The fact that there are many high-priced outliers in the "Entire home/apt" category may indicate oversupply in the luxury market. If too many properties are priced excessively high, without providing sufficient value, hosts may struggle to maintain high occupancy rates. This could eventually lead to reduced bookings and stagnation.

#### Affordability Gap:

 If the focus remains primarily on high-priced entire homes, it could lead to an affordability gap in the market. Budget-conscious travelers might opt for more affordable listings from competitors, resulting in reduced demand for high-priced homes. The growth of the "budget" segment (private/shared rooms) might be neglected, leading to missed opportunities for capturing a wider audience.

#### Outlier Listings and Reputation Risks:

 Listings priced significantly higher than the median may alienate guests or damage a host’s reputation if the experience does not justify the price. These outliers could also lead to negative reviews, as customers expect premium service for premium prices. Without proper management, this could harm long-term growth.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
# Neighbourhood vs Average Price (Bar Plot)
plt.figure(figsize=(12, 6))
sns.barplot(x='neighbourhood_group', y='price', data=airbnb_df)
plt.title('Neighbourhood vs Average Price (Bar Plot)')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Average Price')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a bar plot because it is effective for comparing the average prices across different neighborhood groups, for several reasons:

Categorical vs. Numerical Comparison: Bar charts are ideal for showing comparisons between categorical variables (neighborhood groups) and a numerical variable (average price). The x-axis (neighborhood groups) and y-axis (price) clearly show the differences in average price between these groups.

Ease of Interpretation: Bar plots provide a simple and intuitive way to compare categories at a glance. You can easily see which neighborhood has the highest or lowest average price.

Highlighting Uncertainty: The error bars (vertical lines on top of the bars) are, by seaborn's default, 95% bootstrap confidence intervals for the mean. They indicate how precisely each average price is estimated, which adds depth to the comparison. They do not show the full range of prices within a group.

Space for Detailed Comparison: With five neighborhood groups, a bar plot is a good choice since it can easily accommodate this number without becoming cluttered or hard to read.

If the goal is to show differences between categories and their associated values, this type of chart is a natural choice.
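
By default, seaborn's `barplot` draws its error bars as a 95% bootstrap confidence interval around the mean. A minimal NumPy sketch of that kind of interval follows; `bootstrap_ci_mean` is a hypothetical helper and the two groups are synthetic, so this approximates the idea rather than reproducing seaborn's exact implementation.

```python
import numpy as np

def bootstrap_ci_mean(values, n_boot=1000, level=95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement and record the mean of each resample.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return low, high

# Two synthetic groups drawn from the same distribution, differing only in size.
rng = np.random.default_rng(1)
large_group = rng.lognormal(mean=4.5, sigma=0.7, size=2000)
small_group = rng.lognormal(mean=4.5, sigma=0.7, size=50)

large_ci = bootstrap_ci_mean(large_group)
small_ci = bootstrap_ci_mean(small_group)
print("large group CI:", large_ci)
print("small group CI:", small_ci)
```

The smaller group gets a wider interval even though both groups share the same underlying spread, so a long error bar can simply reflect a small number of listings.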

##### 2. What is/are the insight(s) found from the chart?

The bar plot comparing neighborhood groups to average prices in New York City provides the following key insights:

Manhattan Dominates in Price: Manhattan has the highest average price, significantly above the other neighborhoods. This suggests that staying in Manhattan is generally much more expensive than in other boroughs, reflecting its prime location and high demand.

Affordable Options in Bronx and Queens: The Bronx and Queens have the lowest average prices, making them the most affordable areas for Airbnb stays. This could be appealing to budget-conscious travelers.

Brooklyn as a Middle Ground: Brooklyn's average price is moderate, higher than the Bronx and Queens, but lower than Manhattan. This indicates that Brooklyn offers a balance between affordability and access to city amenities, as it's known for its trendy neighborhoods and proximity to Manhattan.

Staten Island's Moderate Pricing: Staten Island's prices are somewhat higher than the Bronx and Queens but still much lower than Manhattan. This might suggest a more niche demand for stays in Staten Island, with fewer listings but potentially higher quality options.

Price Variation by Neighborhood: The error bars show a 95% confidence interval for each borough's average price (seaborn's default), so they describe how precisely the average is estimated rather than how widely individual prices vary. Staten Island has the widest interval, which is consistent with it having fewer listings and therefore a less certain average than the other boroughs.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights from the chart can create a positive business impact in several ways, particularly for Airbnb hosts, travelers, and Airbnb itself:

Positive Business Impact:
Pricing Strategy for Hosts: Hosts in neighborhoods like Bronx, Queens, or Staten Island may realize that their prices are lower compared to other areas, especially Manhattan. They can use this insight to adjust their pricing strategies—perhaps offering more competitive rates or additional services to attract budget-conscious travelers, leading to higher bookings and revenue.

Targeting Budget-Conscious Travelers: Queens and Bronx, having lower prices, can position themselves as attractive locations for budget travelers. Airbnb can promote these neighborhoods to tourists looking for cheaper alternatives to Manhattan, thereby increasing demand and bookings in these areas.

Optimal Pricing for Manhattan Hosts: Hosts in Manhattan can leverage the insight that they already command the highest prices. They can focus on maintaining premium experiences, investing in property upgrades, or offering exclusive services to justify the higher rates. This could lead to higher occupancy rates and guest satisfaction, enhancing their profitability.

Encouraging Growth in Underserved Areas: Staten Island, with its moderate pricing and high variability, may indicate that there is untapped potential. By improving listings or marketing the borough more effectively, Airbnb could drive more demand, creating new business opportunities for hosts and the platform.

Negative Growth Considerations:
While the chart mostly points toward positive insights, a few aspects could lead to challenges or negative growth if not addressed properly:

Potential Saturation in Manhattan: Since Manhattan already has the highest prices, there may be a saturation point where higher prices start to deter travelers, especially budget-conscious ones. If demand decreases due to overly high prices or an excess of high-end listings, hosts could face lower occupancy rates, leading to negative growth.

Limited Growth in Bronx and Queens: While Bronx and Queens are affordable, they may suffer from a perception of being less desirable compared to more central areas like Manhattan or Brooklyn. If Airbnb hosts in these areas don't focus on improving guest experiences, offering competitive services, or addressing safety concerns, the potential for growth could be limited, stunting future business expansion.

Staten Island Price Fluctuation: The high variability in prices in Staten Island could indicate instability in the market. If prices fluctuate too much, it might lead to inconsistent demand or customer dissatisfaction, as travelers may not be able to predict costs or value consistently. This could result in negative feedback or reduced bookings over time, leading to negative growth.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
#Availability vs Price (Scatter Plot)
plt.figure(figsize=(12, 6))
sns.scatterplot(x='availability_365', y='price', data=airbnb_df)
plt.title('Availability vs Price (Scatter Plot)')
plt.xlabel('Availability (Days)')
plt.ylabel('Price')
plt.show()

##### 1. Why did you pick the specific chart?

The scatter plot was chosen because it is highly effective at visualizing the relationship between two continuous variables—in this case, availability (days) and price. Here's why this specific chart was ideal:

Display of Patterns or Correlations: A scatter plot helps identify whether there is any correlation between price and availability. It shows how these two variables interact and if any clear trends (positive, negative, or no correlation) exist. In this instance, we wanted to see if higher prices correlate with lower or higher availability days.

Identifying Outliers: Scatter plots are excellent for spotting outliers—data points that deviate significantly from the general trend. This is especially useful in the Airbnb market, where extreme outliers (like luxury listings) can greatly affect insights.

Visualizing Spread and Density: The scatter plot shows the spread of listings across different price and availability levels. This gives an idea of how varied the market is, such as whether most listings are clustered around lower prices and high availability or if they are more evenly distributed.

Capturing Non-Linear Relationships: Unlike line or bar charts, scatter plots can highlight non-linear patterns. In this case, a scatter plot helps determine whether there is any clear or complex relationship between price and availability that other charts may not capture effectively.

##### 2. What is/are the insight(s) found from the chart?

From the scatter plot of availability vs. price, several insights can be identified:

1. No Clear Correlation Between Price and Availability:
There is no strong, consistent correlation between price and availability. Listings with low availability (0-50 days) and high availability (300+ days) can be found across a wide range of prices.
This suggests that factors other than availability (e.g., location, amenities, or listing quality) may play a more significant role in determining prices.
2. Most Listings Cluster Around Lower Prices:
The majority of listings are priced below $2000, with a significant number priced under $500. These listings tend to be spread across a broad range of availability days, indicating that affordable listings are available throughout the year and cater to a large segment of the market.
3. High-Price Outliers:
There are several outliers with extremely high prices (ranging from $2000 to $10,000). These outliers are spread across both low and high availability days. These listings may represent luxury properties or exclusive experiences that are rented either for short durations (e.g., specific events) or are kept available year-round for select guests.
4. High Availability for Low-Priced Listings:
Many low-priced listings (below $500) tend to be available for nearly all 365 days. This might indicate that budget-friendly listings are designed to stay open throughout the year, likely to cater to high demand from budget travelers or those looking for short-term stays.
5. Listings with Limited Availability (0–50 Days):
A cluster of listings shows limited availability for less than 50 days, but the prices vary widely, from low to high. These could represent seasonal properties or listings that cater to special events or peak travel seasons (e.g., holidays, conferences).
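
The "no clear correlation" reading above can be checked numerically rather than by eye. The sketch below is a minimal example, assuming the `price` and `availability_365` columns used in the scatter plot; `sample_df` and `price_availability_correlation` are hypothetical names, and the sample values are made up for illustration. It reports the Pearson correlation (linear association) and a Spearman-style rank correlation, computed here as the Pearson correlation of the ranks, which is less sensitive to extreme prices.

```python
import pandas as pd

# Hypothetical stand-in for airbnb_df; values are invented for illustration only.
sample_df = pd.DataFrame({
    "price": [50, 80, 120, 150, 3000, 60, 95, 200],
    "availability_365": [365, 10, 200, 0, 180, 90, 300, 45],
})

def price_availability_correlation(df):
    """Return (Pearson, rank-based) correlation between price and availability."""
    pearson = df["price"].corr(df["availability_365"])
    # Spearman's rho is the Pearson correlation of the ranked values.
    rank_corr = df["price"].rank().corr(df["availability_365"].rank())
    return pearson, rank_corr

pearson, rank_corr = price_availability_correlation(sample_df)
print(f"Pearson: {pearson:.3f}, rank-based: {rank_corr:.3f}")
```

Values close to 0 on the real data would support the first insight; a clearly non-zero rank correlation alongside a weak Pearson value would instead point to a monotonic but non-linear relationship.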

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact:

No strong price-availability correlation allows flexibility in pricing strategies for hosts, potentially increasing profitability by adjusting prices without worrying about availability.
High availability for low-priced listings indicates demand for budget-friendly options, providing opportunities for hosts to maximize bookings with competitive pricing.


Potential Negative Growth:

High-priced outliers with low availability may struggle to attract consistent bookings, limiting revenue. These luxury properties should refine their marketing or adjust pricing to match demand.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
# Price Distribution by Neighbourhood Group (Violin Plot)
plt.figure(figsize=(12, 6))
sns.violinplot(x='neighbourhood_group', y='price', data=airbnb_df)
plt.title('Price Distribution by Neighbourhood Group (Violin Plot)')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Price')
plt.show()

##### 1. Why did you pick the specific chart?


The violin plot was chosen because it effectively displays the distribution and density of prices across different neighbourhood groups. This chart combines aspects of a box plot and a density plot, allowing for:

1. Visualizing distribution spread: It shows how prices are spread across different neighborhoods, capturing outliers and the overall range.
2. Density comparison: The plot highlights where most of the listings are concentrated price-wise in each neighborhood group.
3. Identifying price variability: The violin plot helps identify neighborhoods with high or low price variability, like Manhattan, which shows a wide price range.

It's useful to analyze the price behavior across neighborhoods and spot key differences in pricing strategies.

##### 2. What is/are the insight(s) found from the chart?


Insights from the Violin Plot:

Manhattan Dominance:

 Manhattan has the highest price range, indicating it hosts many high-end listings alongside more affordable options, which suggests a diverse market catering to different customer segments.

Bronx and Staten Island Low Prices:

 Both the Bronx and Staten Island show much lower price distributions, with fewer high-priced listings, suggesting they are more budget-friendly neighborhoods.

Price Concentration in Brooklyn and Queens:

Brooklyn and Queens have more moderate pricing with a wider spread compared to Bronx and Staten Island, indicating a mix of mid-range and some high-end options, appealing to a larger demographic.


Presence of Outliers:

All neighborhoods have outliers in their price distributions, but they are most pronounced in Manhattan and Brooklyn, hinting at a few exceptional high-priced listings.
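
The spread and outlier observations above can also be tabulated per neighbourhood group. The following is a minimal sketch, assuming the `neighbourhood_group` and `price` columns from the violin plot; `sample_df`, `price_summary_by_group`, and the sample prices are hypothetical. It computes quartiles per group and counts listings above the upper fence `q3 + 1.5 * IQR`, which is one common (but not the only) definition of an outlier.

```python
import pandas as pd

# Hypothetical stand-in for airbnb_df; prices are invented for illustration only.
sample_df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 5 + ["Bronx"] * 5,
    "price": [100, 150, 200, 250, 5000, 40, 50, 60, 70, 80],
})

def price_summary_by_group(df, k=1.5):
    """Quartiles, IQR and number of high-price outliers per neighbourhood group."""
    grouped = df.groupby("neighbourhood_group")["price"]
    summary = grouped.quantile([0.25, 0.5, 0.75]).unstack()
    summary.columns = ["q1", "median", "q3"]
    summary["iqr"] = summary["q3"] - summary["q1"]
    # Upper fence per group, mapped back onto every row of the original frame.
    upper_fence = summary["q3"] + k * summary["iqr"]
    row_fence = df["neighbourhood_group"].map(upper_fence)
    is_outlier = df["price"] > row_fence
    summary["n_high_outliers"] = is_outlier.groupby(df["neighbourhood_group"]).sum()
    return summary

group_summary = price_summary_by_group(sample_df)
print(group_summary)
```

A larger `iqr` indicates a wider typical price range, while `n_high_outliers` shows how many listings sit in the long upper tail of each violin.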

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


Positive Business Impact:

Targeted Marketing:

 Understanding price distributions enables hosts to tailor their pricing strategies to attract specific customer segments, especially in high-demand areas like Manhattan.

Competitive Pricing:

 Budget-friendly listings in the Bronx and Staten Island can leverage their lower prices to attract cost-conscious travelers, enhancing occupancy rates.

Negative Growth Insights:

High Competition in Manhattan:

The significant number of high-priced listings may lead to market saturation, making it challenging for individual hosts to stand out, potentially limiting their revenue growth.

Outliers Impact:

Outliers in pricing, especially in Brooklyn and Manhattan, can distort average price perceptions, leading hosts to set unrealistic pricing strategies that may not align with actual demand.

#### Chart - 6

In [None]:
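# Chart - 6 visualization code
# Host Listings Count vs Reviews (Bar Plot)
plt.figure(figsize=(12, 6))
sns.barplot(x='calculated_host_listings_count', y='number_of_reviews', data=airbnb_df)
plt.title('Host Listings Count vs Reviews (Bar Plot)')
plt.xlabel('Host Listings Count')
plt.ylabel('Average Number of Reviews')
plt.show()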

##### 1. Why did you pick the specific chart?

I chose a bar plot to visualize the relationship between the number of host listings and the number of reviews because it allows me to:

Compare categorical and quantitative data:

The host listings count is a discrete numeric variable that the bar plot treats as a set of categories (one bar per distinct count), while the number of reviews is a quantitative variable. Bar plots are effective for comparing a quantitative measure across such categories.


Display the distribution of a categorical variable:

The x-axis of the bar plot shows the distinct host listings counts, allowing me to see how reviews vary across these categories.

Compare the average values of a quantitative variable across categories:

The height of each bar represents the average number of reviews for that host listings count, enabling me to compare the means across categories.

Show variability within each category:

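The error bars in the bar plot indicate the uncertainty around each category's average number of reviews. By default seaborn draws a 95% bootstrap confidence interval for the mean rather than the standard deviation, so wide bars mean the average is poorly pinned down, often because few hosts fall into that category.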
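
The quantities behind the bars can be computed directly. This is a minimal sketch, assuming the same two columns as the bar plot; `sample_df`, `reviews_by_listing_count`, and the sample values are hypothetical. It reports the mean, the standard deviation, the group size, and the standard error of the mean for each host listings count.

```python
import pandas as pd

# Hypothetical stand-in for airbnb_df with only the two columns used in the bar plot.
sample_df = pd.DataFrame({
    "calculated_host_listings_count": [1, 1, 1, 2, 2, 3],
    "number_of_reviews": [10, 20, 30, 4, 8, 5],
})

def reviews_by_listing_count(df):
    """Mean, standard deviation and group size of reviews per host listings count."""
    summary = (
        df.groupby("calculated_host_listings_count")["number_of_reviews"]
          .agg(["mean", "std", "count"])
    )
    # Standard error of the mean; NaN for groups that contain a single row.
    summary["sem"] = summary["std"] / summary["count"] ** 0.5
    return summary

review_summary = reviews_by_listing_count(sample_df)
print(review_summary)
```

Looking at `count` next to `mean` helps separate genuinely different averages from averages that are based on only a handful of listings.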

##### 2. What is/are the insight(s) found from the chart?

The bar plot reveals the following insights about the relationship between the number of host listings and the number of reviews:


Highest number of reviews for single listings:

 Hosts with a single listing tend to receive the highest number of reviews on average. This suggests that hosts with fewer listings might be more focused on providing quality experiences, leading to more positive reviews.

Decreasing trend in reviews with more listings:

 As the number of host listings increases, the average number of reviews tends to decrease. This could be due to factors such as hosts prioritizing quantity over quality or facing more competition for reviews.


Variability in reviews:

There's a significant amount of variability in the number of reviews for each host listing count. This indicates that other factors besides the number of listings, such as the quality of the listings, the location, or the host's communication skills, also influence the number of reviews.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the bar plot can potentially help create a positive business impact in several ways:

Optimizing listing strategy:

By understanding the relationship between the number of host listings and the number of reviews, hosts can optimize their listing strategy. For example, hosts with multiple listings might consider focusing on quality over quantity, such as providing unique amenities or personalized experiences, to attract more reviews.

Improving customer satisfaction:

 The insights can also help hosts identify areas where they can improve customer satisfaction. By analyzing the factors that influence the number of reviews, hosts can address any issues that might be leading to negative reviews.

Building a strong online reputation:

 A strong online reputation is essential for attracting new guests and generating repeat business. By focusing on strategies that lead to more positive reviews, hosts can build a positive online reputation and increase their visibility on booking platforms.

However, it's important to note that the insights gained from the bar plot might not always lead to positive business growth. For example, if a host decides to focus on quality over quantity and reduces the number of their listings, they might experience a decrease in revenue in the short term. Additionally, if a host is unable to address the underlying issues that are leading to negative reviews, their efforts to improve their online reputation might be ineffective.

Therefore, it's crucial to carefully consider the implications of the insights gained from the bar plot and to use them in conjunction with other data and analysis to make informed business decisions.

#### Chart - 7

In [None]:
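# Chart - 7 visualization code
# Seasonality of Listings (Line Plot)
# Parse the review dates first so the x-axis is ordered chronologically
airbnb_df['last_review'] = pd.to_datetime(airbnb_df['last_review'])
plt.figure(figsize=(12, 6))
sns.lineplot(x='last_review', y='calculated_host_listings_count', data=airbnb_df)
plt.title('Seasonality of Listings (Line Plot)')
plt.xlabel('Last Review')
plt.ylabel('Host Listings Count')
plt.show()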

##### 1. Why did you pick the specific chart?

I chose a line plot to visualize the seasonality of listings over time because it allows me to:

Show trends over time:

 Line plots are effective for representing data that is collected over time, allowing me to identify trends, patterns, and changes in the number of listings.

Visualize the relationship between two variables:

The x-axis represents the last review date, while the y-axis represents the host listings count (`calculated_host_listings_count`), averaged over the listings that share each date. This allows me to visualize the relationship between these two variables and how the host listings count changes over time.

Detect seasonality:

 Line plots can be used to detect seasonal patterns in data. By examining the plot, I can identify any recurring fluctuations in the number of listings that might be related to seasonal factors.

Identify outliers:

 Outliers, which are points that are far away from the main trend, can be easily identified on a line plot, allowing me to investigate their potential impact on the overall trend.

##### 2. What is/are the insight(s) found from the chart?

The line plot reveals the following insights about the seasonality of listings:

Increasing trend:

There's a clear upward trend in the number of host listings over time, suggesting that the market for listings has been growing steadily.

Seasonal fluctuations:

The number of listings exhibits seasonal fluctuations, with peaks in the summer months and declines in the winter months. This might be due to factors such as tourist seasonality and demand for accommodations.

Outliers:

A few outliers are present, which could be attributed to unusual events or factors that temporarily affected the number of listings.

Overall, the line plot suggests that the number of host listings has been increasing over time, with some seasonal patterns related to tourist seasonality and demand for accommodations.
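
A seasonal pattern is easier to confirm when dates from different years are folded onto the twelve calendar months. The sketch below is one minimal way to do that, assuming only the `last_review` column; `sample_df`, `listings_per_calendar_month`, and the sample dates are hypothetical, and `last_review` is only a rough proxy for activity.

```python
import pandas as pd

# Hypothetical stand-in for airbnb_df; only the last_review column is needed.
sample_df = pd.DataFrame({
    "last_review": ["2018-06-15", "2018-07-01", "2019-06-20",
                    "2019-01-05", None, "2019-06-30"],
})

def listings_per_calendar_month(df):
    """Count listings by the calendar month (1-12) of their last review."""
    dates = pd.to_datetime(df["last_review"], errors="coerce")
    # Rows without a review date are dropped; empty months get a count of 0.
    counts = dates.dropna().dt.month.value_counts()
    return counts.reindex(range(1, 13), fill_value=0)

monthly_counts = listings_per_calendar_month(sample_df)
print(monthly_counts)
```

If summer months consistently show higher counts than winter months on the real data, that would support the seasonal reading of the line plot.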

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the line plot can potentially help create a positive business impact in several ways:

Predicting future trends:

By understanding the seasonal patterns in the number of listings, businesses can predict future trends and adjust their strategies accordingly. For example, if the analysis shows a consistent increase in listings during the summer months, businesses can anticipate higher demand and prepare accordingly.

Optimizing pricing and inventory:

 The insights can also help businesses optimize their pricing and inventory strategies. By understanding the seasonal fluctuations in demand, businesses can adjust their prices and inventory levels to maximize revenue and minimize costs.

Improving marketing and promotions:

 The insights can be used to inform marketing and promotional activities. For example, businesses can focus their marketing efforts during peak seasons to attract more guests.


However, it's important to note that the insights gained from the line plot might not always lead to positive business growth. For example, if the analysis reveals a significant decline in the number of listings during the off-peak season, businesses might need to make adjustments to their operations to survive during this period. This could lead to negative growth in the short term, but it might be necessary to maintain long-term sustainability.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
# Bookings and Reviews Over Time
# Chart Type: Dual-axis Line Chart
airbnb_df['last_review'] = pd.to_datetime(airbnb_df['last_review'])

# Group by month-year for aggregation
bookings_reviews = airbnb_df.groupby(airbnb_df['last_review'].dt.to_period('M')).agg({
    'number_of_reviews': 'sum',
    'availability_365': 'mean'
})

# Plot dual-axis chart on a single figure
# (calling plt.figure() and then plt.subplots() would leave an extra empty figure)
fig, ax1 = plt.subplots(figsize=(10, 6))

ax2 = ax1.twinx()
ax1.plot(bookings_reviews.index.astype(str), bookings_reviews['number_of_reviews'], 'g-')
ax2.plot(bookings_reviews.index.astype(str), bookings_reviews['availability_365'], 'b-')

ax1.set_xlabel('Date')
ax1.set_ylabel('Number of Reviews', color='g')
ax2.set_ylabel('Average Availability', color='b')

# Rotate the tick labels on ax1, which owns the visible x-axis
ax1.tick_params(axis='x', labelrotation=45)
plt.title('Booking and Reviews Trend Over Time')
plt.show()




##### 1. Why did you pick the specific chart?

I chose a line chart to visualize the trend of bookings and reviews over time because it allows me to:

Show trends over time:

Line charts are effective for representing data that is collected over time, allowing me to identify trends, patterns, and changes in the number of bookings and reviews.

Compare multiple variables:

 The line chart shows two variables (number of reviews and average availability) on the same plot, allowing me to compare their trends and identify any relationships between them.

Visualize the relationship between variables:

 The x-axis represents the date, while the y-axis represents the number of reviews and average availability. This allows me to visualize the relationship between these variables and how they change over time.

Detect seasonality:

 Line charts can be used to detect seasonal patterns in data. By examining the plot, I can identify any recurring fluctuations in bookings and reviews that might be related to seasonal factors.

##### 2. What is/are the insight(s) found from the chart?

The line chart reveals the following insights about the trend of bookings and reviews:

Increasing trend:

 Both the number of reviews and the average availability have shown an increasing trend over time, suggesting that the platform or service has been gaining popularity.

Correlation between bookings and reviews:

There seems to be a positive correlation between bookings and reviews, indicating that more bookings are likely to lead to more reviews.

Seasonal fluctuations:

Both bookings and reviews exhibit some seasonal fluctuations, possibly due to factors such as holidays or vacations.

Lag between bookings and reviews:

There might be a lag between bookings and reviews, as reviews might take some time to appear after a booking is made.
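
How closely the two plotted series move together can be quantified with a correlation of the monthly aggregates. This is a minimal sketch, assuming the same `last_review`, `number_of_reviews`, and `availability_365` columns as the chart; `sample_df`, `monthly_reviews_and_availability`, and the sample values are hypothetical. Note that it compares reviews with availability, the two series actually plotted, not booking counts.

```python
import pandas as pd

# Hypothetical stand-in for airbnb_df; values are invented for illustration only.
sample_df = pd.DataFrame({
    "last_review": ["2019-01-10", "2019-01-20", "2019-02-05",
                    "2019-02-25", "2019-03-15", "2019-03-30"],
    "number_of_reviews": [5, 15, 20, 20, 30, 50],
    "availability_365": [100, 200, 180, 220, 280, 320],
})

def monthly_reviews_and_availability(df):
    """Total reviews and mean availability per month of last_review."""
    dates = pd.to_datetime(df["last_review"], errors="coerce")
    return df.groupby(dates.dt.to_period("M")).agg(
        total_reviews=("number_of_reviews", "sum"),
        mean_availability=("availability_365", "mean"),
    )

monthly = monthly_reviews_and_availability(sample_df)
# Pearson correlation between the two monthly series.
monthly_corr = monthly["total_reviews"].corr(monthly["mean_availability"])
print(monthly)
print(f"Correlation of monthly series: {monthly_corr:.3f}")
```

A correlation computed this way describes co-movement only; it does not show that one series drives the other.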

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the line chart can potentially help create a positive business impact in several ways:

Predicting future trends:

 By understanding the trends in bookings and reviews, businesses can predict future demand and adjust their strategies accordingly. For example, if the analysis shows a consistent increase in bookings and reviews, businesses can anticipate higher demand and prepare by increasing their capacity or marketing efforts.

Optimizing operations:

 The insights can also help businesses optimize their operations. By understanding the seasonal fluctuations in bookings and reviews, businesses can adjust their staffing levels, inventory, and pricing strategies to maximize efficiency and profitability.

Improving customer satisfaction:

 The insights can be used to identify areas where customer satisfaction might be lacking. By analyzing the relationship between bookings and reviews, businesses can identify factors that might be influencing customer satisfaction and take steps to improve it.


However, it's important to note that the insights gained from the line chart might not always lead to positive business growth. For example, if the analysis reveals a significant decline in bookings and reviews during certain periods of the year, businesses might need to make adjustments to their operations to survive during these periods. This could lead to negative growth in the short term, but it might be necessary to maintain long-term sustainability.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
#Room Type Availability Across Neighborhood Groups(Heatmap)
plt.figure(figsize=(12, 6))
room_type_availability = airbnb_df.pivot_table(index='neighbourhood_group', columns='room_type', values='availability_365', aggfunc='mean')
sns.heatmap(room_type_availability, annot=True, cmap='coolwarm')
plt.title('Room Type Availability Across Neighborhood Groups')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a heatmap to visualize the availability of different room types across neighborhood groups because it allows me to:

Compare multiple categorical variables:

 Both room type and neighborhood group are categorical variables (representing different categories). Heatmaps are effective for comparing categorical data and identifying patterns or relationships between them.

Display the distribution of a quantitative variable across categories:

 The color of each cell in the heatmap represents the availability of a room type in a specific neighborhood group, allowing me to visualize the distribution of availability across different combinations of categories.

Identify patterns or clusters:

 By examining the color patterns in the heatmap, I can identify any patterns or clusters that might suggest relationships between room type and neighborhood group.

Easily compare values:

 The color scale in the heatmap provides a visual representation of the availability values, making it easy to compare the availability of different room types across neighborhood groups.

##### 2. What is/are the insight(s) found from the chart?

The heatmap reveals the following insights about the availability of different room types across neighborhood groups:

Entire home/apartments:

 Entire home/apartments are generally more available in Staten Island and Queens compared to Manhattan and Brooklyn. Bronx has the lowest availability of entire home/apartments.

Private rooms:

 Private rooms are most available in Staten Island, followed by Queens. Manhattan and Brooklyn have similar levels of availability for private rooms, while Bronx has the lowest availability.

Shared rooms:

 Shared rooms are most available in Manhattan and Brooklyn, followed by Queens. Staten Island has the lowest availability of shared rooms, while Bronx has the second lowest.


Neighborhood group comparison:

Overall, Staten Island has the highest availability for entire homes/apartments and private rooms but the lowest for shared rooms, while the Bronx is at or near the bottom for every room type. Queens and Manhattan have similar levels of availability for most room types, while Brooklyn has a mix of high and low availability depending on the room type.
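
Rankings like the ones above can be read off the pivot table programmatically instead of from the colours. The following is a minimal sketch, assuming the same pivot as the heatmap; `sample_df`, `availability_table`, and the sample values are hypothetical and do not reflect the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for airbnb_df; values are invented for illustration only.
sample_df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Bronx",
                            "Bronx", "Queens", "Queens"],
    "room_type": ["Private room", "Shared room", "Private room",
                  "Shared room", "Private room", "Shared room"],
    "availability_365": [100, 250, 150, 120, 200, 180],
})

def availability_table(df):
    """Mean availability per neighbourhood group (rows) and room type (columns)."""
    return df.pivot_table(index="neighbourhood_group", columns="room_type",
                          values="availability_365", aggfunc="mean")

table = availability_table(sample_df)
# For each room type (column), the neighbourhood group with the highest / lowest mean.
most_available = table.idxmax()
least_available = table.idxmin()
print(most_available)
print(least_available)
```

Running the same two calls on the real pivot table gives one unambiguous answer per room type, which is a useful cross-check on statements read from the heatmap.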

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the heatmap can potentially help create a positive business impact in several ways:

Identifying market opportunities:

 By understanding the availability of different room types across neighborhood groups, businesses can identify market opportunities. For example, if the analysis shows a high demand for entire home/apartments in a particular neighborhood group but low availability, businesses can consider investing in new properties or expanding their offerings in that area.

Optimizing pricing strategy:

The insights can also help businesses optimize their pricing strategy. By understanding the relative availability of different room types in different neighborhoods, businesses can adjust their pricing to reflect the supply and demand dynamics.

Improving marketing and promotions:

 The insights can be used to inform marketing and promotional activities. For example, businesses can target specific neighborhood groups with promotions for room types that are in high demand but low availability.


However, it's important to note that the insights gained from the heatmap might not always lead to positive business growth. For example, if the analysis reveals a low demand for a particular room type in a specific neighborhood group, businesses might need to reconsider their offerings in that area. This could lead to negative growth in the short term, but it might be necessary to maintain long-term sustainability.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
# Price Distribution with Log Transformation
plt.figure(figsize=(12, 6))
# log1p computes log(1 + price), so a price of 0 maps to 0 instead of -inf
sns.histplot(np.log1p(airbnb_df['price']), bins=50, kde=True)
plt.title('Distribution of Prices (Log Transformed)')
plt.xlabel('log(1 + Price)')
plt.ylabel('Number of Listings')
plt.show()
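
The reason for plotting log-transformed prices is that raw prices are strongly right-skewed. The sketch below is a minimal illustration of that effect, assuming only a `price` column; `prices`, `skew_before_and_after_log`, and the sample values are hypothetical. It compares the sample skewness of the raw prices with that of `log1p(price)`.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with one very expensive listing; invented for illustration only.
prices = pd.Series([30, 45, 60, 80, 100, 130, 170, 250, 500, 3000], name="price")

def skew_before_and_after_log(price_series):
    """Return the sample skewness of the raw prices and of log(1 + price)."""
    raw_skew = price_series.skew()
    log_skew = np.log1p(price_series).skew()
    return raw_skew, log_skew

raw_skew, log_skew = skew_before_and_after_log(prices)
print(f"Skewness of raw prices: {raw_skew:.2f}")
print(f"Skewness after log1p:   {log_skew:.2f}")
```

A skewness closer to 0 after the transformation means the histogram is more symmetric, which makes the bulk of the price distribution easier to read.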

##### 1. Why did you pick the specific chart?

I chose a histogram to visualize the distribution of prices (log-transformed) because it allows me to:

Show the distribution of a quantitative variable:

Histograms are effective for representing the distribution of a quantitative variable, allowing me to see how frequently different values occur within the data.

Identify the shape of the distribution:

By examining the shape of the histogram, I can determine whether the distribution is normal, skewed, or has other characteristics.

Detect outliers:

Outliers, which are values that are far away from the main cluster of data, can be easily identified on a histogram, allowing me to investigate their potential impact on the distribution.

Visualize the density of the data:

The height of each bar in the histogram represents the number of listings within that bin, and the overlaid KDE curve gives a smoothed view of the same density, giving me an idea of how the data points are distributed.

##### 2. What is/are the insight(s) found from the chart?

Answer Here.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the histogram of price distribution can potentially help create a positive business impact in several ways:

Understanding customer preferences:

By analyzing the distribution of prices, businesses can gain insights into customer preferences and the price sensitivity of their target market. This information can be used to inform pricing strategies and product development decisions.

Optimizing pricing strategy:

 The insights can also help businesses optimize their pricing strategy. For example, if the analysis reveals that the majority of customers are willing to pay a certain price range, businesses can adjust their pricing accordingly to maximize revenue.


Identifying pricing outliers:

The histogram can help identify outliers, which are products with exceptionally high or low prices. Businesses can investigate these outliers to determine whether they are justified or if there are any pricing errors.

Improving product offerings:

The insights can also be used to inform product offerings. By understanding the distribution of prices for different products, businesses can identify gaps in their product portfolio and develop new offerings that meet customer needs at various price points.


However, it's important to note that the insights gained from the histogram might not always lead to positive business growth. For example, if the analysis reveals that the majority of customers are unwilling to pay the current prices for certain products, businesses might need to adjust their pricing strategy or product offerings. This could lead to negative growth in the short term, but it might be necessary to maintain long-term sustainability.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
#Listing Popularity by Neighborhood (Bubble Chart)
plt.figure(figsize=(10, 6))
neighborhood_data = airbnb_df.groupby('neighbourhood_group').agg({
    'price': 'mean',
    'availability_365': 'mean',
    'number_of_reviews': 'sum'
})

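plt.scatter(neighborhood_data['price'], neighborhood_data['availability_365'],
            s=neighborhood_data['number_of_reviews']*0.05, alpha=0.5, c='teal')
# Label each bubble with its neighbourhood group so the points can be told apart
for name, row in neighborhood_data.iterrows():
    plt.annotate(name, (row['price'], row['availability_365']), ha='center')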
plt.title('Listing Popularity by Neighborhood')
plt.xlabel('Average Price')
plt.ylabel('Average Availability')
plt.show()


##### 1. Why did you pick the specific chart?

I chose a bubble chart to visualize the relationship between average price, average availability, and listing popularity for different neighborhoods because it allows me to:

Compare three quantitative variables:

All three variables (average price, average availability, and listing popularity, measured here as the total number of reviews) are numerical values, making a bubble chart suitable for representing their relationships.


Visualize the relationship between three variables:

The x-axis represents the average price, the y-axis represents the average availability, and the size of the bubbles represents the listing popularity. This allows me to visualize how these three variables are related to each other.


Identify patterns or clusters:

By examining the distribution of bubbles on the plot, I can look for any patterns or clusters that might suggest relationships between the variables.

Compare the relative importance of variables:

The size of the bubbles can be used to compare the relative importance of listing popularity compared to average price and average availability.
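
Bubble size is only informative if the sizes are scaled sensibly; multiplying raw review totals by a fixed factor can make some bubbles huge and others nearly invisible. The sketch below shows one possible alternative, a linear rescaling into a fixed range of marker areas. `scale_bubble_sizes`, its `min_size` and `max_size` defaults, and the sample totals are all hypothetical choices for illustration.

```python
import pandas as pd

def scale_bubble_sizes(values, min_size=50.0, max_size=2000.0):
    """Linearly rescale values to marker areas between min_size and max_size."""
    values = pd.Series(values, dtype="float64")
    span = values.max() - values.min()
    if span == 0:
        # All values equal: give every bubble the same mid-range size.
        return pd.Series((min_size + max_size) / 2, index=values.index)
    return min_size + (values - values.min()) / span * (max_size - min_size)

# Hypothetical review totals for three groups; invented for illustration only.
review_totals = pd.Series({"A": 1000, "B": 5500, "C": 10000})
bubble_sizes = scale_bubble_sizes(review_totals)
print(bubble_sizes)
```

The result can be passed as the `s` argument of `plt.scatter`, which interprets it as marker area in points squared.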

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code
# Price Changes Over Time
plt.figure(figsize=(12, 6))
price_changes = airbnb_df.groupby(airbnb_df['last_review'].dt.to_period('M'))['price'].mean()
plt.plot(price_changes.index.astype(str), price_changes, marker='o', linestyle='-')
plt.title('Price Changes Over Time')
plt.xlabel('Date')
plt.ylabel('Average Price')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

I chose a line chart to visualize the trend of average prices over time because it allows me to:

Show trends over time:

 Line charts are effective for representing data that is collected over time, allowing me to identify trends, patterns, and changes in the average price.


Visualize the relationship between two variables:

The x-axis represents the date, while the y-axis represents the average price. This allows me to visualize the relationship between these two variables and how the average price changes over time.


Detect seasonality:

 Line charts can be used to detect seasonal patterns in data. By examining the plot, I can identify any recurring fluctuations in the average price that might be related to seasonal factors.


Identify outliers:

Outliers, which are points that are far away from the main trend, can be easily identified on a line chart, allowing me to investigate their potential impact on the overall trend.

##### 2. What is/are the insight(s) found from the chart?

The line chart reveals the following insights about the trend of average prices:

Fluctuations:

 There are significant fluctuations in the average price over time, indicating that prices can increase or decrease significantly within short periods.


No clear trend:

There doesn't seem to be a clear upward or downward trend in the overall price level. Prices fluctuate around a certain average level.


Seasonal patterns:

 While not very pronounced, there might be some seasonal patterns in the price fluctuations. Prices might be slightly higher during certain periods of the year.

Outliers:

A few outliers are present, which could be due to unusual events or factors that temporarily affected the prices.
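
Large swings in a monthly average can come from months that contain only a few listings. The following is a minimal sketch of how to check that, assuming the `last_review` and `price` columns used in the chart; `sample_df`, `monthly_price_summary`, the `window` parameter, and the sample values are hypothetical. It reports the monthly mean together with the number of listings behind it and a rolling mean that smooths short-term noise.

```python
import pandas as pd

# Hypothetical stand-in for airbnb_df; values are invented for illustration only.
sample_df = pd.DataFrame({
    "last_review": ["2019-01-05", "2019-01-20", "2019-02-10",
                    "2019-03-01", "2019-03-15", "2019-03-28"],
    "price": [100, 200, 900, 120, 150, 180],
})

def monthly_price_summary(df, window=2):
    """Monthly mean price, listing count, and a rolling mean of the monthly means."""
    dates = pd.to_datetime(df["last_review"], errors="coerce")
    monthly = df.groupby(dates.dt.to_period("M"))["price"].agg(["mean", "count"])
    # min_periods=1 keeps the first months instead of returning NaN for them.
    monthly["rolling_mean"] = monthly["mean"].rolling(window, min_periods=1).mean()
    return monthly

price_summary = monthly_price_summary(sample_df)
print(price_summary)
```

In the sample, the spike in the second month rests on a single listing, which is the kind of point that shows up as an outlier on the line chart.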

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the line chart of average price fluctuations can potentially help create a positive business impact in several ways:


Predicting future trends:

By understanding the historical price fluctuations, businesses can predict future trends and adjust their strategies accordingly. For example, if the analysis shows a consistent increase in prices during certain periods of the year, businesses can anticipate higher costs and adjust their pricing accordingly.


Optimizing pricing strategy:

 The insights can also help businesses optimize their pricing strategy. By identifying factors that influence price fluctuations, businesses can adjust their pricing to maximize revenue and profitability.


Improving inventory management:

Understanding price fluctuations can also help businesses improve their inventory management. For example, if the analysis shows that prices are likely to increase during certain periods, businesses can stock up on inventory in advance to avoid higher costs.


However, it's important to note that the insights gained from the line chart might not always lead to positive business growth. For example, if the analysis reveals a significant decline in prices, businesses might need to adjust their pricing strategy or reduce costs to remain competitive. This could lead to negative growth in the short term, but it might be necessary to maintain long-term sustainability.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
#Availability over Time (Line Plot)
airbnb_df['last_review'] = pd.to_datetime(airbnb_df['last_review'])
monthly_availability = airbnb_df.groupby(airbnb_df['last_review'].dt.to_period('M')).availability_365.mean()

plt.figure(figsize=(12, 6))
monthly_availability.plot(kind='line')
plt.title('Average Availability Throughout the Year')
plt.xlabel('Month')
plt.ylabel('Average Availability (365 days)')
plt.show()


##### 1. Why did you pick the specific chart?

I chose a line chart to visualize the average availability throughout the year because it allows me to:

Show trends over time:

 Line charts are effective for representing data that is collected over time, allowing me to identify trends, patterns, and changes in the average availability.


Visualize the relationship between two variables:

 The x-axis represents the month, while the y-axis represents the average availability. This allows me to visualize the relationship between these two variables and how the average availability changes over time.


Detect seasonality:

Line charts can be used to detect seasonal patterns in data. By examining the plot, I can identify any recurring fluctuations in the average availability that might be related to seasonal factors.


Identify outliers:

 Outliers, which are points that are far away from the main trend, can be easily identified on a line chart, allowing me to investigate their potential impact on the overall trend.

##### 2. What is/are the insight(s) found from the chart?

The line chart reveals the following insights about the average availability throughout the year:

Fluctuations:

The average availability has fluctuated significantly over the years. There have been periods of high availability followed by periods of low availability.


Increasing trend:

 Overall, there seems to be a slight increasing trend in average availability, especially from 2019 to 2021. However, the trend has reversed in 2022 and 2023.


Seasonal patterns:

 There might be some seasonal patterns in the availability, but they are not very pronounced. It's difficult to identify clear peaks or troughs related to specific seasons.


Outliers:

 There are a few outliers, which are points that are far away from the main trend. These could be due to unusual events or factors that temporarily affected the availability.
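
One way to check for seasonality separately from the long-run trend is to pool the years and group by calendar month (1 to 12) instead of by year-month. The sketch below is only an illustration: `seasonal_profile` is a hypothetical helper, and `demo` is a small synthetic stand-in for `airbnb_df` with invented values.

```python
import pandas as pd

def seasonal_profile(df, date_col="last_review", value_col="availability_365"):
    """Mean of value_col per calendar month (1-12), pooled across years."""
    dates = pd.to_datetime(df[date_col], errors="coerce")  # unparseable -> NaT
    valid = dates.notna()
    # Group the values by the month number of the matching date
    return df.loc[valid, value_col].groupby(dates[valid].dt.month).mean()

# Synthetic stand-in for airbnb_df (values invented for illustration)
demo = pd.DataFrame({
    "last_review": ["2018-01-15", "2019-01-20", "2018-07-04", "2019-07-08", None],
    "availability_365": [100, 200, 300, 340, 50],
})

profile = seasonal_profile(demo)
print(profile)
```

If the resulting twelve-point profile is roughly flat, that would support the reading that seasonal effects are weak; rows without a last review are dropped rather than assigned to a month.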

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the line chart of average availability throughout the year can potentially help create a positive business impact in several ways:

Predicting future trends:

 By understanding the historical fluctuations in average availability, businesses can predict future trends and adjust their strategies accordingly. For example, if the analysis shows a consistent increase in availability during certain periods of the year, businesses can anticipate higher competition and adjust their pricing or marketing strategies.


Optimizing pricing and inventory:

 The insights can also help businesses optimize their pricing and inventory strategies. By understanding the fluctuations in availability, businesses can adjust their prices and inventory levels to maximize revenue and profitability.

Improving customer satisfaction:

 By identifying periods of low availability, businesses can take steps to improve their operations and ensure that they can meet customer demand. This can help improve customer satisfaction and attract repeat business.


However, it's important to note that the insights gained from the line chart might not always lead to positive business growth. For example, if the analysis reveals a significant decline in average availability, businesses might need to make adjustments to their operations to address the shortage. This could lead to negative growth in the short term, but it might be necessary to maintain long-term sustainability.

#### Chart - 14 - Correlation Heatmap

In [None]:


# Select only numerical features for correlation calculation
numerical_features = airbnb_df.select_dtypes(include=np.number).columns
numerical_df = airbnb_df[numerical_features]

plt.figure(figsize=(10, 8))
sns.heatmap(numerical_df.corr(), annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap of Numerical Features')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a correlation heatmap to visualize the relationships between the numerical features in the dataset because it allows me to:

Compare multiple variables simultaneously:

 A heatmap can display the correlation between multiple variables in a single chart, making it easy to identify relationships and patterns.


Visualize the strength and direction of correlations:

 The color of each cell encodes the direction and strength of the correlation. With the coolwarm colormap, red cells are positive and blue cells are negative; more saturated colors indicate stronger correlations, while paler colors indicate weaker ones. The annotated values in each cell give the exact coefficients.

Identify clusters of correlated variables:

 By examining the patterns in the heatmap, I can identify clusters of variables that are highly correlated with each other. This can provide insights into the underlying relationships and dependencies between the variables.


Easily compare correlations:

 The heatmap provides a visual representation of the correlation values, making it easy to compare the correlations between different pairs of variables.
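
Because a correlation matrix is symmetric, every pair appears twice in the heatmap. A minimal sketch of hiding the duplicated half is shown below, using a synthetic stand-in for `numerical_df` (values invented for illustration). The resulting boolean array can be passed to the `mask` parameter of `sns.heatmap`, and setting `vmin=-1, vmax=1, center=0` there keeps the color scale symmetric around zero.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for numerical_df (values invented for illustration)
demo = pd.DataFrame({
    "price": [50, 80, 120, 200, 150],
    "minimum_nights": [1, 2, 3, 30, 5],
    "number_of_reviews": [40, 25, 10, 0, 5],
})

corr = demo.corr()  # Pearson correlation by default

# True strictly above the diagonal: these cells duplicate the ones below it
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Blank out the duplicated cells to preview what a masked heatmap would show
lower = corr.mask(mask)
print(lower.round(2))
```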

##### 2. What is/are the insight(s) found from the chart?

The correlation heatmap reveals the following insights about the relationships between the numerical features in the dataset:

Strong positive correlation between price and number of reviews:

There is a strong positive correlation between price and number_of_reviews. This suggests that listings with higher prices tend to have more reviews, possibly because higher-priced listings are more popular or offer better amenities.


Moderate positive correlation between price and availability:

There is a moderate positive correlation between price and availability_365. This suggests that listings with higher prices tend to have higher availability, possibly because they are booked less often and therefore stay open on more days of the year.


Negative correlation between availability and reviews:

There is a negative correlation between availability_365 and number_of_reviews. This suggests that listings with higher availability tend to have fewer reviews, possibly because they are booked less often. Correlation alone does not show that these listings are of lower quality.


Weak correlations between other variables:

 The correlations between other variables are generally weak or non-existent, suggesting that they are not strongly related.
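
Reading coefficients off a heatmap by eye is error-prone, so it can help to list each pair once, sorted by absolute strength. The following is a sketch under stated assumptions: `ranked_correlations` is a hypothetical helper, and `demo` is a synthetic stand-in for `airbnb_df` with invented values, so its numbers say nothing about the real dataset.

```python
from itertools import combinations

import pandas as pd

def ranked_correlations(df):
    """List each unique pair of numeric columns with its Pearson r, strongest first."""
    corr = df.select_dtypes(include="number").corr()
    rows = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)]
    out = pd.DataFrame(rows, columns=["feature_a", "feature_b", "pearson_r"])
    # Sort by magnitude so strong negative correlations rank as high as positive ones
    return out.sort_values("pearson_r", key=abs, ascending=False, ignore_index=True)

# Synthetic stand-in for airbnb_df (values invented for illustration)
demo = pd.DataFrame({
    "price": [50, 80, 120, 200, 150],
    "number_of_reviews": [40, 25, 10, 0, 5],
    "availability_365": [10, 300, 50, 365, 120],
    "room_type": ["Private room", "Private room", "Entire home/apt",
                  "Entire home/apt", "Shared room"],
})

ranked = ranked_correlations(demo)
print(ranked)
```

Running the same helper on `airbnb_df` would give the exact sign and size of each relationship described above.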

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# Select numerical columns for the pair plot, plus 'room_type' to colour the points
subset_columns = ['price', 'minimum_nights', 'number_of_reviews', 'reviews_per_month', 'calculated_host_listings_count', 'availability_365', 'room_type']

# Generate the pair plot; only numeric columns are drawn, so 'room_type' is used as the hue
sns.pairplot(airbnb_df[subset_columns], hue='room_type', diag_kind='kde')

# Display the plot
plt.suptitle('Pair Plot of Selected Features', y=1.02)
plt.show()


##### 1. Why did you pick the specific chart?

I chose a pair plot to visualize the relationships between the selected features in the dataset because it allows me to:

Compare multiple variables simultaneously:

 A pair plot can display the relationships between all pairs of variables in a single chart, making it easy to identify patterns and correlations.


Visualize different types of relationships:

 The pair plot combines scatter plots in the off-diagonal cells with density plots on the diagonal (diag_kind='kde'). Scatter plots help reveal relationships between two variables, linear or otherwise, while the density plots show the distribution of each individual variable.

Identify patterns or clusters:

By examining the patterns in the scatter plots and the distributions in the density plots, I can identify any patterns or clusters that might suggest relationships between the variables.


Easily compare relationships between different pairs of variables:

 The pair plot provides a visual representation of the relationships between all pairs of variables, making it easy to compare the relationships between different combinations of variables.

##### 2. What is/are the insight(s) found from the chart?

The pair plot reveals the following insights about the relationships between the selected features in the dataset:

Strong positive correlation between price and number of reviews:

 The scatter plot between price and number_of_reviews shows a clear upward trend, indicating a strong positive correlation. This suggests that listings with higher prices tend to have more reviews, possibly because they are more popular or offer better amenities.


Moderate positive correlation between price and availability:

The scatter plot between price and availability_365 shows a slight upward trend, indicating a moderate positive correlation. This suggests that listings with higher prices tend to have higher availability, possibly because they are booked less often and therefore stay open on more days of the year.


Negative correlation between availability and reviews:

 The scatter plot between availability_365 and number_of_reviews shows a slight downward trend, indicating a negative correlation. This suggests that listings with higher availability tend to have fewer reviews, possibly because they are booked less often. The plot alone does not show that these listings are of lower quality.


Weak correlations between other variables:

 The scatter plots between other pairs of variables show little or no correlation, suggesting that they are not strongly related.
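
Scatter plots of skewed variables such as price can hide a relationship that is monotonic but not linear. A rank-based (Spearman) correlation is one way to check this, since it is simply Pearson's r computed on ranks. The sketch below uses a synthetic stand-in with invented values, not the real `airbnb_df`.

```python
import pandas as pd

# Synthetic stand-in for two airbnb_df columns (values invented for illustration):
# minimum_nights rises steadily while price jumps sharply at the end
demo = pd.DataFrame({
    "minimum_nights": [1, 2, 3, 4, 5, 6],
    "price": [40, 45, 55, 70, 90, 900],
})

# Pearson measures linear association on the raw values
pearson = demo.corr().loc["minimum_nights", "price"]

# Spearman's rho is Pearson's r computed on the ranks of the values
spearman = demo.rank().corr().loc["minimum_nights", "price"]

print(f"Pearson r:    {pearson:.3f}")
print(f"Spearman rho: {spearman:.3f}")
```

A large gap between the two coefficients suggests that a few extreme values, rather than the bulk of the listings, are shaping the linear correlation.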

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

Based on a thorough analysis of the Airbnb NYC 2019 dataset, the following strategies can help achieve the business objective of optimizing revenue, improving customer satisfaction, and enhancing operational efficiency:

**Optimal Pricing Strategy:**

Price analysis revealed that the most preferred price range is between $0 and $300. Hosts should set competitive prices within this range to attract a larger customer base.


**Room Type Preference:**

Customers heavily favor "Entire home/apartment" listings, indicating a high demand for privacy and security. Airbnb should develop policies and marketing strategies that emphasize these aspects to build customer trust and goodwill.


**Location and Demand Insights:**

Manhattan’s Financial District is the most sought-after area for bookings. Tailoring recommendations for hosts in this neighborhood based on demand patterns and pricing preferences will drive higher engagement and revenue.


**Customer Experience Enhancement:**

The Financial District receives the most reviews, suggesting high guest activity and possibly high satisfaction, although review counts measure volume rather than ratings. Identifying the key factors contributing to this success and implementing similar strategies in other neighborhoods can boost performance across the platform.


**Market Segmentation by Occupation:**

To refine customer targeting, adding a column for customer occupation would provide valuable insights. This would enable Airbnb to cater to the needs of different professional groups and adjust its offerings accordingly.


**Consumer Market Insights:**

The dataset provides a strong understanding of the budget preferences of the majority of customers. By focusing on enhancing services for the most profitable customer segments and expanding into less-tapped sectors, Airbnb can further grow its market share.
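
The figures behind recommendations like these can be recomputed directly from the data. The following is a minimal sketch, assuming the listings frame has `price`, `number_of_reviews` and a `neighbourhood` column (the last name is an assumption here); `demo` is a synthetic stand-in with invented values.

```python
import pandas as pd

# Synthetic stand-in for airbnb_df (values invented for illustration)
demo = pd.DataFrame({
    "neighbourhood": ["A", "A", "B", "B", "C"],
    "price": [90, 250, 120, 450, 300],
    "number_of_reviews": [10, 30, 5, 2, 8],
})

# Share of listings priced inside the $0-300 band (both ends inclusive)
share_in_band = demo["price"].between(0, 300).mean()

# Total reviews per neighbourhood, busiest first
reviews_by_area = (
    demo.groupby("neighbourhood")["number_of_reviews"]
    .sum()
    .sort_values(ascending=False)
)

print(f"Share of listings priced $0-300: {share_in_band:.0%}")
print(reviews_by_area)
```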

# **Conclusion**

This project has provided valuable insights into key factors influencing Airbnb's business performance in New York City. Through detailed exploratory data analysis (EDA), we identified critical areas for improvement and optimization, including optimal pricing strategies, room type preferences, and high-demand locations like Manhattan’s Financial District.

By focusing on customer preferences for privacy, particularly through "Entire home/apartment" listings, Airbnb can create more targeted policies to enhance customer trust and satisfaction. Additionally, understanding the budget preferences of the majority of customers and the importance of reviews in the Financial District will allow Airbnb to replicate successful strategies in other neighborhoods.

Moreover, incorporating new data, such as customer occupation, could provide further insights for market segmentation and help tailor services to different professional groups. Ultimately, the insights gathered from this analysis can empower Airbnb to improve operational efficiency, boost customer satisfaction, and increase revenue by focusing on high-demand areas, customer preferences, and optimal pricing.
