# **Project Name**    - Airbnb Bookings Analysis


##### **Project Type**    - EDA
##### **Contribution**    - Individual

# **Project Summary -**

This project analyzes the Airbnb NYC 2019 dataset to uncover insights on pricing, neighborhood trends, room types, and availability. Through data visualization and statistical analysis, it identifies key factors influencing demand and revenue, helping Airbnb optimize pricing strategies, enhance customer experience, and achieve sustainable market growth across New York City.

# **Problem Statement**


The Airbnb market in New York City is highly competitive, with thousands of listings varying in price, location, and availability. However, hosts and travelers often lack clear insights into what factors influence pricing, demand, and occupancy. This project aims to analyze the Airbnb NYC 2019 dataset to identify patterns and relationships among variables such as neighborhood, room type, and availability. By exploring these trends through data cleaning, visualization, and statistical analysis, the study seeks to uncover actionable insights that can help optimize pricing strategies, improve listing performance, and support data-driven decisions for both hosts and policymakers.

#### **Define Your Business Objective?**

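Use data-driven insights into Airbnb pricing and demand in New York City to help hosts price competitively and to help Airbnb identify where listings and demand are concentrated.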

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import display  # explicit import so display() also works outside a notebook
from scipy.stats import skew, kurtosis

### Dataset Loading

In [None]:
# Load Dataset
DATA_PATH = r"/content/Airbnb NYC 2019.csv"  # Colab path; adjust if running elsewhere

### Dataset First View

In [None]:
# Dataset First Look
df = pd.read_csv(DATA_PATH)
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isna().sum()

In [None]:
# Visualizing the missing values
# Checking Null Value by plotting Heatmap
sns.heatmap(df.isnull(), cbar=False)

### What did you know about your dataset?

The dataset used in this project is sourced from Airbnb listings in New York City for the year 2019. It contains detailed information about 48,895 listings across the five boroughs: Manhattan, Brooklyn, Queens, the Bronx, and Staten Island. The dataset consists of 16 columns, including features like name, host_name, neighbourhood, room_type, price, minimum_nights, number_of_reviews, reviews_per_month, availability_365, and last_review.

There are no duplicate records after data cleaning, and missing values in columns like reviews_per_month and last_review are handled in the data wrangling step. This dataset provides a comprehensive view of Airbnb's market dynamics in NYC, enabling detailed analysis of pricing trends, neighborhood popularity, and room-type preferences.
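As a complement to the raw null counts and the heatmap, the share of missing values per column can be tabulated. The following is a minimal sketch; `missing_summary` is a hypothetical helper name, and the toy frame uses made-up values in place of the real dataset.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values per column, worst first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns that actually have gaps
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    "name": ["Loft", None, "Studio", "Room"],
    "price": [150, 80, 200, 60],
    "reviews_per_month": [1.2, np.nan, np.nan, 0.4],
})
print(missing_summary(toy))
```

On the real data, the same call would be `missing_summary(df)`.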

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

id : Unique identifier for each Airbnb listing

name : Name or title of the listing provided by the host

host_id : Unique identification number assigned to each host

host_name : Name of the host who listed the property

neighbourhood_group : Broad geographical area or borough where the listing is located (e.g., Manhattan, Brooklyn, Queens, Bronx, Staten Island)

neighbourhood : Specific locality or neighborhood within the borough

latitude : Geographical latitude coordinate of the listing location

longitude : Geographical longitude coordinate of the listing location

room_type : Type of room being offered (Entire home/apartment, Private room, Shared room)

price : Cost per night for the listing in USD

minimum_nights : Minimum number of nights required for booking the listing

number_of_reviews : Total number of reviews received by the listing

last_review : Date of the most recent review given by a guest

reviews_per_month : Average number of reviews received per month

calculated_host_listings_count : Total number of listings managed by the same host

availability_365 : Number of days in a year the listing is available for booking

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for col in df.columns:
    print(f"No. of unique values in {col}: {df[col].nunique()}")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
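# Write your code to make your dataset analysis ready.

df_original_shape = df.shape
# Strip stray whitespace from column names
df.columns = [c.strip() for c in df.columns]
# Drop fully duplicated rows, if any
df = df.drop_duplicates().reset_index(drop=True)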

In [None]:
# Convert date columns
if 'last_review' in df.columns:
    df['last_review'] = pd.to_datetime(df['last_review'], errors='coerce')


In [None]:
# Handle missing values
if 'reviews_per_month' in df.columns:
    df['reviews_per_month'] = df['reviews_per_month'].fillna(0)
df.fillna({'name': 'Unknown', 'host_name': 'Unknown'}, inplace=True)

In [None]:
# --- 3. Summary Statistics ---
print("\n--- Summary ---")
print("Original shape:", df_original_shape)
print("After cleaning:", df.shape)
print("\nMissing values (top 10):")
print(df.isnull().sum().sort_values(ascending=False).head(10))
print("\nDescriptive stats:\n", df.describe().T.head(10))

In [None]:
# Select only numerical columns
num_df = df.select_dtypes(include=[np.number])

In [None]:
num_df

In [None]:
# Calculate descriptive stats
eda = pd.DataFrame({
    'Mean': num_df.mean(),
    'Median': num_df.median(),
    'Mode': num_df.mode().iloc[0],
    'Std Dev': num_df.std(),
    'Range': num_df.max() - num_df.min(),
    'Skewness': num_df.apply(lambda x: skew(x.dropna())),
    'Kurtosis': num_df.apply(lambda x: kurtosis(x.dropna()))
})

print("Exploratory Data Analysis Summary:")
display(eda.round(2))

### What all manipulations have you done and insights you found?

Data Loading and Cleaning:

Imported the dataset and verified successful loading.

Checked for missing and duplicate values. Duplicates were removed to ensure data integrity.

Filled missing values in reviews_per_month with 0, and replaced missing name and host_name entries with "Unknown".

Converted the last_review column to datetime format for proper temporal analysis.

Standardized column names by trimming whitespaces to maintain consistency.

Feature Formatting and Transformation:

Verified data types with df.info(); the numeric columns needed no conversion, and last_review was the only field reformatted.

Clipped extreme price values to reduce the influence of outliers during visualization.

Created additional subsets for neighborhoods, room types, and availability for targeted analysis.

Descriptive and Statistical Analysis:

Generated summary statistics (mean, median, standard deviation) for numerical columns to understand central tendencies and dispersion.

Analyzed categorical variables like neighbourhood_group and room_type using frequency counts.

Visualization and Pattern Discovery:

Histogram of Prices: Showed that most listings were in the lower price range, while a few luxury listings caused right-skewness.

Top Neighbourhoods by Listing Count: Showed that the neighborhoods with the most listings are concentrated in Manhattan and Brooklyn, making them the most active boroughs for Airbnb.

Boxplots by Neighbourhood: Revealed that Manhattan had the highest median prices, whereas Bronx and Queens offered budget-friendly options.

Room Type Distribution: Found that "Entire home/apartment" listings dominate the market, followed by "Private room".

Availability Analysis: Showed that many listings were not available throughout the year, indicating seasonal demand or part-time hosting.






Insights and Findings:

Pricing Trends:

Prices varied significantly across neighborhoods, with Manhattan commanding the highest average prices, followed by Brooklyn.

The presence of extreme high-end listings in central areas contributed to overall price variability.

Neighborhood Popularity:

Manhattan and Brooklyn together account for the majority of listings, reflecting strong tourist demand and host activity.

Less central areas like the Bronx and Staten Island have limited listings, catering mostly to local or budget travelers.

Room Type Preferences:

Most guests prefer Entire homes/apartments for privacy and comfort, though private rooms remain a popular budget alternative.

Host Behavior:

Several hosts manage multiple listings (calculated_host_listings_count > 1), indicating a growing trend toward professional hosting.

Guest Engagement:

Properties with frequent reviews tend to be more affordable and centrally located, suggesting higher turnover and occupancy.

Availability Insights:

Listings with high availability (close to 365 days) may experience lower booking rates, whereas listings with moderate availability are likely in high demand.
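The duplicate removal and price clipping described above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's exact method: `clean_listings`, the `price_clipped` column, and the quantile cap are hypothetical choices, and the toy values are made up.

```python
import pandas as pd


def clean_listings(frame: pd.DataFrame, upper_quantile: float = 0.99) -> pd.DataFrame:
    """Drop exact duplicate rows and add a capped price column for plotting."""
    out = frame.drop_duplicates().copy()
    cap = out["price"].quantile(upper_quantile)
    # Keep the raw price untouched; store the capped version separately
    out["price_clipped"] = out["price"].clip(upper=cap)
    return out


# Toy frame standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    "id": [1, 2, 2, 3, 4],
    "price": [100, 150, 150, 90, 10000],
})
cleaned = clean_listings(toy, upper_quantile=0.75)
print(cleaned)
```

Keeping the clipped values in a separate column means summary statistics can still be computed on the original prices, while charts use the capped ones.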

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

# Price Distribution (Histogram)
plt.figure(figsize=(8,5))
plt.hist(df['price'].dropna(), bins=50, color='teal', alpha=0.7)
plt.title('Distribution of Price')
plt.xlabel('Price (USD)')
plt.ylabel('Number of Listings')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A histogram was selected because it effectively shows the distribution of continuous numerical data, in this case, the price per night of Airbnb listings. It helps identify the range, frequency, and spread of prices across all listings, highlighting how many properties fall within specific price brackets. This visualization is ideal for detecting skewness, outliers, and dominant pricing clusters.

##### 2. What is/are the insight(s) found from the chart?

The price distribution is highly right-skewed, meaning most Airbnb listings are priced at lower to mid-range levels, while a small number of listings have very high prices.

The majority of listings are concentrated under $200 per night, indicating affordability for a large customer base.

The presence of a few luxury listings with extremely high prices pulls the distribution tail toward the right.
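The reading of the histogram can be backed by numbers: the share of listings below a price threshold, and simple indicators of right skew. A minimal sketch follows; `share_below` is a hypothetical helper, and the toy prices are made up.

```python
import pandas as pd


def share_below(prices: pd.Series, threshold: float) -> float:
    """Fraction of non-missing prices strictly below the threshold."""
    prices = prices.dropna()
    return float((prices < threshold).mean())


# Toy prices standing in for df['price'] (values are made up)
toy_prices = pd.Series([50, 75, 120, 150, 180, 220, 900, 5000])

print("Share under $200:", share_below(toy_prices, 200))
# A mean well above the median and a positive skew both point to a right tail
print("Mean > median:", toy_prices.mean() > toy_prices.median())
print("Skewness:", round(toy_prices.skew(), 2))
```

On the real data, `share_below(df['price'], 200)` would give the exact proportion behind the "under $200" statement.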

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

---Positive Business Impact:

Hosts can use this insight to strategically price their properties within competitive ranges, increasing booking potential.

Airbnb as a platform can promote mid-range listings to appeal to budget-conscious travelers, driving higher occupancy rates.

The data helps in market segmentation, enabling differentiated pricing strategies for luxury vs. budget travelers.


---Potential Negative Insights:

The large price gap between average and luxury listings may suggest market imbalance, where premium listings cater to a small niche while budget listings dominate.

Extremely high-priced properties might experience lower occupancy and reduced host revenue, potentially discouraging new premium hosts.

#### Chart - 2

In [None]:
# Chart - 2 visualization code

# Top 10 Neighbourhoods
if 'neighbourhood' in df.columns:
    plt.figure(figsize=(10,6))
    df['neighbourhood'].value_counts().nlargest(10).plot(kind='bar', color='slateblue')
    plt.title('Top 10 Neighbourhoods by Listings')
    plt.xlabel('Neighbourhood')
    plt.ylabel('Count')
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()



##### 1. Why did you pick the specific chart?

A bar chart was used because it is ideal for comparing categorical data, such as neighborhoods, by their number of listings. It clearly shows which areas have the highest Airbnb activity, allowing easy identification of popular and less represented neighborhoods. The bar chart effectively visualizes frequency distribution across categories, helping interpret market concentration.

##### 2. What is/are the insight(s) found from the chart?

The neighborhoods with the most listings are concentrated in Manhattan and Brooklyn, so these two boroughs dominate the Airbnb market.

Queens, the Bronx, and Staten Island contribute relatively few listings, indicating lower host participation or demand.

High listing concentration in central areas reflects strong tourist demand and high accommodation turnover.

Lesser-represented neighborhoods might indicate untapped market potential or zoning restrictions limiting Airbnb activity.
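Because the chart counts neighborhoods while the insights speak about boroughs, a borough-level count on `neighbourhood_group` makes the comparison explicit. The sketch below uses a made-up toy frame; `top_two_share` is a hypothetical name for the combined share of the two largest boroughs.

```python
import pandas as pd

# Toy frame standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    "neighbourhood_group": (
        ["Manhattan"] * 4 + ["Brooklyn"] * 3 + ["Queens"] * 2 + ["Bronx"]
    ),
})

# Listings per borough, largest first
borough_counts = toy["neighbourhood_group"].value_counts()

# Combined share of the two boroughs with the most listings
top_two_share = borough_counts.nlargest(2).sum() / borough_counts.sum()

print(borough_counts)
print(f"Top two boroughs hold {top_two_share:.0%} of listings")
```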

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

---Positive Business Impact:

Airbnb can focus on resource allocation and marketing strategies in high-demand neighborhoods to maintain service quality.

For hosts, identifying popular neighborhoods can help in strategic property investment, increasing profitability.

Underrepresented areas can be targeted with promotions or incentives to expand Airbnb's market presence, balancing city-wide demand.



---Potential Negative Insights:

Overconcentration of listings in popular neighborhoods could lead to market saturation, pricing competition, and reduced host profitability.

High Airbnb density in certain areas may cause community pushback or regulatory restrictions, potentially limiting future growth.

#### Chart - 3

In [None]:
# Chart - 3 visualization code

# Room Type Distribution
if 'room_type' in df.columns:
    plt.figure(figsize=(6,6))
    df['room_type'].value_counts().plot.pie(autopct='%1.1f%%', startangle=140, colors=sns.color_palette("pastel"))
    plt.title('Room Type Distribution')
    plt.ylabel('')
    plt.tight_layout()
    plt.show()



##### 1. Why did you pick the specific chart?

A pie chart was selected because it effectively illustrates the proportion of categorical variables, in this case the distribution of different room types offered on Airbnb. It provides a quick visual understanding of which room types dominate the market and how other categories compare in relative share. This makes it ideal for analyzing market composition and customer preferences.

##### 2. What is/are the insight(s) found from the chart?

The majority of listings are "Entire home/apartment", indicating that most hosts offer complete private spaces for guests.

"Private room" listings form the second-largest segment, appealing to budget travelers or those seeking a social stay.

"Shared room" listings hold a very small percentage, showing limited preference among users.

The dominance of entire apartments suggests a growing trend toward privacy, comfort, and exclusivity among Airbnb guests.
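The exact shares behind the pie chart can be printed with `value_counts(normalize=True)`. This is a small sketch on made-up data; the category labels are assumed to match the values stored in `room_type`.

```python
import pandas as pd

# Toy room types standing in for df['room_type'] (values are made up)
room_types = pd.Series(
    ["Entire home/apt"] * 5 + ["Private room"] * 4 + ["Shared room"] * 1
)

# Percentage share of each room type, largest first
shares = room_types.value_counts(normalize=True).mul(100).round(1)
print(shares)
```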

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

---Positive Business Impact:

These insights help Airbnb tailor marketing strategies by emphasizing privacy-oriented listings, which attract a larger customer base.

Hosts can use this information to choose room types strategically when listing properties; focusing on entire apartments can increase booking rates and profitability.

Airbnb can also use this trend to refine its recommendation algorithms to match user preferences better, improving customer satisfaction.

---Potential Negative Insights:

The overwhelming preference for entire apartments may lead to a decline in shared or budget accommodations, reducing affordability for low-income travelers.

Cities might experience housing shortages or rental inflation if too many apartments are converted into full-time Airbnb rentals.

This imbalance could trigger regulatory challenges or negative public perception toward Airbnb's impact on local housing markets.

#### Chart - 4

In [None]:
# Chart - 4 visualization code

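# Price by Room Type
if 'room_type' in df.columns:
    # Clip prices at $500 for plotting only, so extreme listings do not flatten the boxes
    plot_df = df.assign(price=df['price'].clip(upper=500))
    plt.figure(figsize=(8,6))
    sns.boxplot(data=plot_df, x='room_type', y='price', palette='Set2')
    plt.title('Price Distribution by Room Type (Clipped at $500)')
    plt.xlabel('Room Type')
    plt.ylabel('Price (USD)')
    plt.tight_layout()
    plt.show()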



##### 1. Why did you pick the specific chart?

A box plot was chosen because it effectively displays the distribution, spread, and outliers of price values across different room types. It helps compare central tendencies (median prices) and variability among categories, making it ideal for understanding how room type influences pricing. The chart also highlights extreme values (outliers), which can indicate luxury or premium listings.

##### 2. What is/are the insight(s) found from the chart?

Entire home/apartment listings have the highest median price, confirming that complete privacy and space come at a premium.

Private rooms are moderately priced and cater to budget-conscious travelers.

Shared rooms are the least expensive, showing minimal price variation due to low demand.

There are noticeable outliers in the "Entire home/apartment" category, representing luxury or high-end accommodations.
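The medians and spreads read off the box plot can be confirmed with per-group quartiles. A minimal sketch on made-up data follows; the interquartile range (IQR) is used here as the measure of price variation, which is an assumption about what "variation" means in the text above.

```python
import pandas as pd

# Toy frame standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 3 + ["Private room"] * 3 + ["Shared room"] * 3,
    "price": [150, 200, 1000, 60, 80, 100, 30, 40, 50],
})

# One row per room type, one column per quantile (0.25, 0.5, 0.75)
quartiles = toy.groupby("room_type")["price"].quantile([0.25, 0.5, 0.75]).unstack()

# Interquartile range: width of the box in the box plot
quartiles["iqr"] = quartiles[0.75] - quartiles[0.25]
print(quartiles)
```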

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

---Positive Business Impact:

Hosts can use this analysis to optimize pricing strategies based on room type and market positioning. For example, offering private rooms at slightly competitive prices can attract more bookings.

Airbnb can use these insights to recommend optimal price ranges for new hosts based on room type trends.

The chart helps identify high-value room segments, guiding targeted promotions for premium listings.

---Potential Negative Insights:

The large price gap between entire apartments and other room types may limit affordability, discouraging some potential guests.

Overpricing in certain categories can lead to lower occupancy rates, particularly in highly competitive markets.

Excessive reliance on premium listings might narrow Airbnb's customer base, reducing inclusivity and long-term growth potential.

#### Chart - 5

In [None]:
# Chart - 5 visualization code

# Correlation Matrix
corr_features = ['price', 'minimum_nights', 'number_of_reviews',
                 'reviews_per_month', 'calculated_host_listings_count', 'availability_365']
corr = df[corr_features].corr()
plt.figure(figsize=(8,6))
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Matrix (Key Numeric Features)')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A heatmap of the correlation matrix was chosen to visually represent the relationships between numerical variables in the dataset. This chart helps identify positive or negative correlations among features such as price, reviews, availability, and host listings. The color gradient allows quick interpretation of strong or weak relationships, making it an excellent choice for feature analysis and predictive modeling.

##### 2. What is/are the insight(s) found from the chart?

The correlation between price and other variables is generally weak, suggesting that price depends more on categorical factors like neighborhood or room type than on numerical metrics alone.

Number of reviews shows a weak negative correlation with price, hinting that lower-priced listings tend to attract somewhat more bookings and reviews.

Reviews per month and number of reviews are positively correlated, reflecting consistent guest engagement.

Availability_365 shows a weak or inverse relationship with reviews, suggesting that listings booked frequently are less available throughout the year.

Calculated_host_listings_count has minimal correlation with other variables, implying that having multiple properties doesn't directly affect price or occupancy.
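Pearson correlation, the default in `DataFrame.corr()`, is sensitive to extreme values such as very expensive listings. A rank-based (Spearman) correlation is one way to check whether a monotonic relationship is being hidden by outliers. The sketch below uses made-up numbers and is not a result from the dataset.

```python
import pandas as pd

# Toy frame with one extreme price (values are made up)
toy = pd.DataFrame({
    "price": [40, 60, 80, 120, 200, 5000],
    "number_of_reviews": [90, 70, 50, 30, 10, 1],
})

# Linear correlation, pulled around by the outlier
pearson = toy.corr(method="pearson").loc["price", "number_of_reviews"]

# Rank correlation, which only looks at ordering
spearman = toy.corr(method="spearman").loc["price", "number_of_reviews"]

print(f"Pearson:  {pearson:.2f}")
print(f"Spearman: {spearman:.2f}")
```

On the real data, `df[corr_features].corr(method="spearman")` would produce the rank-based counterpart of the heatmap.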

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

---Positive Business Impact:

The correlation insights help Airbnb and hosts identify key influencing factors for bookings and revenue.

Understanding weak correlations with price reinforces the need to focus on non-numeric features (like location and room type) for price optimization.

The data supports feature selection for predictive modeling, helping build efficient models for pricing or churn prediction.

Airbnb can use the insights to develop smarter pricing algorithms that account for complex, non-linear relationships.

---Potential Negative Insights:

The weak correlation among most numeric variables suggests that price prediction cannot rely solely on quantitative attributes, requiring more complex data and computation.

Hosts relying only on numeric performance metrics (e.g., reviews, availability) may misjudge pricing strategies, leading to potential revenue loss.

#### Chart - 6

In [None]:
# Chart - 6 visualization code

# Price vs Reviews by Room Type
if all(col in df.columns for col in ['price', 'number_of_reviews', 'room_type']):
    plt.figure(figsize=(8,6))
    sns.scatterplot(data=df, x='number_of_reviews', y='price', hue='room_type', alpha=0.6)
    plt.title('Price vs Number of Reviews (Colored by Room Type)')
    plt.xlabel('Number of Reviews')
    plt.ylabel('Price (USD)')
    plt.legend(title='Room Type')
    plt.tight_layout()
    plt.show()
else:
    print("Required columns are missing in the dataset.")


##### 1. Why did you pick the specific chart?

A scatter plot was selected because it effectively displays the relationship between two continuous variables, in this case price and number of reviews. By adding color differentiation for room types, it becomes easier to observe how room category influences the relationship between pricing and review activity. This type of visualization helps identify patterns, clusters, and outliers, offering insights into demand behavior and guest preferences.

##### 2. What is/are the insight(s) found from the chart?

Listings with lower prices tend to receive more reviews, indicating higher occupancy and booking frequency among budget-friendly options.

Entire homes/apartments, though priced higher, generally receive fewer reviews, suggesting they cater to a niche or less frequent traveler segment.

Private rooms form a dense cluster in the mid-range price and higher review zone, implying steady popularity and affordability.

Shared rooms remain the least preferred, showing both low prices and fewer reviews.

Outliers with very high prices and low reviews likely represent luxury or premium listings with limited demand.
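Dense scatter plots are hard to read, so the price-versus-reviews pattern can also be summarized by price band. The following is a sketch under stated assumptions: the band edges and labels are arbitrary choices for illustration, and the toy values are made up.

```python
import pandas as pd

# Toy frame standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    "price": [40, 70, 90, 130, 180, 250, 400, 900],
    "number_of_reviews": [120, 80, 60, 40, 35, 10, 5, 1],
})

# Assign each listing to a price band (edges are illustrative, not from the source)
bands = pd.cut(
    toy["price"],
    bins=[0, 100, 200, 500, float("inf")],
    labels=["<=100", "101-200", "201-500", ">500"],
    include_lowest=True,  # keeps listings priced at exactly 0 in the first band
)

# Median review count within each band
median_reviews = toy.groupby(bands, observed=True)["number_of_reviews"].median()
print(median_reviews)
```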

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

---Positive Business Impact:

Hosts can use these insights to balance pricing with demand: moderately priced listings tend to attract more guests and reviews, improving visibility and booking rates.

Airbnb can leverage this relationship to optimize its recommendation algorithm, promoting listings that provide better value for money.

The analysis helps segment the market effectively, guiding promotional campaigns for specific room types or price categories.

Encourages hosts to maintain competitive pricing while ensuring guest satisfaction to drive review volume and reputation.

---Potential Negative Insights:

The inverse relationship between price and reviews indicates that high-end listings may struggle with lower occupancy, impacting host revenue.

Overemphasis on affordability may cause price undercutting, leading to reduced profitability for hosts in highly competitive neighborhoods.

Some luxury hosts might reduce listing frequency or exit the platform if consistent low-demand trends persist.

#### Chart - 7

In [None]:
# Chart - 7 visualization code

# Price by Top 6 Neighbourhoods
if 'neighbourhood' in df.columns:
    top_neigh = df['neighbourhood'].value_counts().nlargest(6).index
    plt.figure(figsize=(10,6))
    sns.boxplot(
        data=df[df['neighbourhood'].isin(top_neigh)],
        x='neighbourhood',
        y='price',
        palette='viridis'
    )
    plt.title('Price Distribution by Top 6 Neighbourhoods')
    plt.xlabel('Neighbourhood')
    plt.ylabel('Price (USD)')
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()
else:
    print("Column 'neighbourhood' not found in DataFrame.")


##### 1. Why did you pick the specific chart?

A box plot was chosen because it effectively shows the spread, median, and variability of prices across the top six neighborhoods. It provides a clear visual comparison of how prices differ between high-demand areas while highlighting outliers and price ranges. This makes it ideal for understanding location-based pricing trends, which are crucial in the Airbnb market.

##### 2. What is/are the insight(s) found from the chart?

The Manhattan neighborhoods in the chart stand out with the highest median prices, consistent with Manhattan's position as a premium tourist destination.

The Brooklyn neighborhoods follow, offering moderately high prices that are generally more affordable than Manhattan's.

Neighborhoods in Queens, the Bronx, and Staten Island do not appear among the top six by listing count, so their lower median prices have to be confirmed with a borough-level comparison on neighbourhood_group.

Significant price variability within certain neighborhoods (especially in Manhattan and Brooklyn) suggests a mix of both luxury and affordable listings.

Outliers in all neighborhoods indicate premium listings or special accommodations with unique value propositions.
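To tie each plotted neighborhood back to its borough, the listing count and median price can be tabulated per (borough, neighborhood) pair. This is a minimal sketch on made-up data; `top_n` is a hypothetical parameter mirroring the "top 6" used in the chart.

```python
import pandas as pd

# Toy frame standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 3 + ["Brooklyn"] * 2 + ["Queens"],
    "neighbourhood": ["Harlem"] * 3 + ["Williamsburg"] * 2 + ["Astoria"],
    "price": [100, 140, 120, 90, 110, 70],
})

top_n = 2  # the chart above uses 6; 2 keeps the toy output short

summary = (
    toy.groupby(["neighbourhood_group", "neighbourhood"])["price"]
    .agg(["count", "median"])
    .sort_values("count", ascending=False)
    .head(top_n)
    .reset_index()
)
print(summary)
```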

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

---Positive Business Impact:

Hosts can use these insights to adjust pricing strategies based on neighborhood trends, aligning their prices with local competition to maximize occupancy.

Airbnb can target promotions and dynamic pricing tools to help hosts optimize revenue in premium or underperforming areas.

The analysis aids in location-based market segmentation, allowing Airbnb to identify high-value regions for marketing investment.

Encourages investment decisions for new hosts by identifying which neighborhoods have high earning potential.

---Potential Negative Insights:

High pricing concentration in Manhattan could lead to market saturation, making it harder for new hosts to compete profitably.

Low-priced areas may experience reduced host profitability, potentially discouraging future listings.

Over-commercialization in popular neighborhoods might trigger local community concerns or regulatory restrictions, affecting Airbnb's long-term operations.

#### Chart - 8

In [None]:
# Chart - 8 visualization code

# Bubble Chart: Neighbourhood Averages
if 'neighbourhood' in df.columns:
    agg = df.groupby('neighbourhood').agg(
        avg_price=('price','mean'),
        avg_avail=('availability_365','mean'),
        listings=('id','count')
    ).reset_index()
    agg_top = agg.nlargest(20, 'listings')

    plt.figure(figsize=(10,7))
    plt.scatter(
        agg_top['avg_price'],
        agg_top['avg_avail'],
        s=agg_top['listings']*3,
        alpha=0.6,
        color='coral',
        edgecolor='k'
    )
    for _, row in agg_top.iterrows():
        plt.text(row['avg_price'], row['avg_avail'], row['neighbourhood'], fontsize=8)
    plt.title('Avg Price vs Avg Availability by Neighbourhood (Bubble = Listings)')
    plt.xlabel('Average Price (USD)')
    plt.ylabel('Average Availability (days/year)')
    plt.tight_layout()
    plt.show()

##### 1. Why did you pick the specific chart?

A bubble chart is ideal for visualizing three variables simultaneously: in this case, average price, average availability, and number of listings. Each bubble represents a neighborhood, where:

*   The x-axis shows average price,
*   The y-axis shows average availability, and
*   The bubble size represents the number of listings.

This allows quick identification of neighborhoods with high demand, affordability, and market presence, making it highly suitable for comparative spatial analysis.

##### 2. What is/are the insight(s) found from the chart?

Neighborhoods with lower average prices often show higher availability, indicating slower booking turnover or less tourist demand.

Premium areas with high average prices generally have low availability, suggesting they are frequently booked despite being costlier.

Some mid-range neighborhoods achieve a balance between price and availability, showing potential for stable occupancy and consistent host income.

The bubble sizes reveal which neighborhoods dominate Airbnb listings; these tend to be more competitive but also more visible to guests.
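Whether average price and average availability really move in opposite directions across neighborhoods can be checked with a single correlation on the aggregated table. The sketch below uses made-up data constructed to be perfectly inverse, so its output illustrates the calculation only and says nothing about the real dataset.

```python
import pandas as pd

# Toy frame standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    "id": range(1, 7),
    "neighbourhood": ["A", "A", "B", "B", "C", "C"],
    "price": [90, 110, 190, 210, 290, 310],
    "availability_365": [310, 290, 210, 190, 90, 110],
})

# Same aggregation as the bubble chart: one row per neighbourhood
agg = toy.groupby("neighbourhood").agg(
    avg_price=("price", "mean"),
    avg_avail=("availability_365", "mean"),
    listings=("id", "count"),
)

# Correlation across neighbourhoods, not across individual listings
r = agg["avg_price"].corr(agg["avg_avail"])
print(agg)
print(f"Correlation between average price and average availability: {r:.2f}")
```

With only 20 neighborhoods in the chart, such a coefficient is descriptive rather than conclusive.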

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

---Positive Business Impact:

Hosts can strategically position pricing to improve occupancy based on the price-availability relationship of similar neighborhoods.

Airbnb can identify underserved but high-demand areas, guiding expansion or targeted promotions.

Helps in dynamic pricing model development, enabling better demand forecasting and optimized revenue per neighborhood.

Insights can be used to attract new hosts to areas with fewer listings but strong booking potential.

---Potential Negative Insights:

Over-concentration of listings in a few neighborhoods may lead to market saturation and reduced profitability for hosts.

Areas with high availability but low prices might indicate low guest interest or poor location appeal, requiring marketing intervention.

Continuous growth in already dense neighborhoods could create regulatory challenges or community pushback against over-tourism.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

To achieve Airbnb's business objectives of maximizing occupancy, host profitability, and customer satisfaction, the following strategies are suggested:

**Dynamic Pricing Optimization:**
Implement intelligent pricing models based on location, seasonality, and demand trends. High-demand areas like Manhattan can sustain premium pricing, while affordable areas can attract budget travelers with discounts.

**Expand in High-Potential Neighborhoods:**
Target neighborhoods showing moderate prices and strong availability for promotional campaigns. These regions have growth potential and can balance the platform's market coverage.

**Host Advisory & Training:**
Educate hosts on optimizing listings (improving descriptions, photos, and reviews) to boost visibility and guest trust, thereby increasing conversion rates.

**Data-Driven Marketing:**
Focus digital marketing efforts on neighborhoods with high availability but lower bookings. Personalized offers and visibility boosts can convert idle listings into active revenue sources.

**Customer Experience Enhancement:**
Encourage frequent reviews and responsive communication between hosts and guests to strengthen platform reliability and retention.

# **Conclusion**

The Airbnb NYC 2019 analysis revealed meaningful patterns in customer preferences, pricing behavior, and market dynamics. Key findings showed that Manhattan and Brooklyn dominate the market with the highest listing prices, while Queens, the Bronx, and Staten Island attract budget travelers due to their affordability. The study highlighted that Entire home/apartment listings command the highest prices, whereas Private rooms maintain consistent demand. Correlation analysis indicated that price is only weakly related to any single numeric variable and depends more on categorical factors such as neighborhood and room type.

Overall, the project emphasized the importance of data-driven decision-making for optimizing Airbnb's business performance. By adopting dynamic pricing, focusing on high-potential yet underexplored neighborhoods, and enhancing host and guest experiences, Airbnb can improve profitability and customer satisfaction. The insights gained through this analysis provide a clear path for sustainable growth and competitive advantage in the evolving short-term rental market.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***