# **Project Name**    -



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member** -  Deepanshu Singh

# **Project Summary -**

    Airbnb started in 2008 and has changed the way people travel by allowing guests to stay in local homes instead of hotels. Today, it is a global platform that helps people find unique and affordable accommodations. This project performs Exploratory Data Analysis (EDA) on an Airbnb dataset containing around 49,000 listings and 16 columns. The goal is to explore and understand the data to find key insights about pricing, hosts, availability, and customer behavior.

    We used Python libraries like Pandas and NumPy for data cleaning and manipulation, and Matplotlib and Seaborn for creating visualizations. The first step was to load and inspect the data to check its structure, data types, and missing values. Data cleaning involved checking for duplicates, fixing data types (converting last_review to datetime and the ID columns to strings), and handling missing values. Listings with a zero or negative price and listings with a minimum stay above 365 nights were removed so that these outliers do not affect the analysis.

    In the univariate analysis, we explored single columns like price, room type, and number of reviews. For example, most listings are for “Entire homes/apartments,” followed by “Private rooms.” The price distribution is right-skewed, meaning most listings have moderate prices, but a few are very expensive. Visualizations like histograms and boxplots helped us understand the overall data spread and patterns.

    Next, in the bivariate analysis, we compared two variables to find relationships. For example, we studied how price changes with room type or how the number of reviews relates to the price. Boxplots showed that entire homes are usually more expensive than private or shared rooms. Scatter plots helped us see if listings with more reviews tend to have higher prices or not. These visuals make it easier to find relationships that could guide future pricing or business decisions.

    In multivariate analysis, we looked at how multiple factors interact. For instance, we can analyze how price varies by room type and location together. Using correlation heatmaps, we checked how strongly numerical columns like price, minimum nights, and reviews are related. Geographical visualizations using latitude and longitude can show the concentration of listings in popular areas or tourist zones.

    Feature engineering is a natural extension of this work. Useful candidates include host size (based on calculated_host_listings_count) and availability categories (based on availability_365). The dataset has no guest-capacity or host_since column, so features such as price per person or host tenure would require additional data.

    The notebook contains 15 charts, including a count plot of room types, boxplots of price by room type, scatter plots of number of reviews vs. price, a correlation heatmap, and a latitude/longitude scatter plot of listing locations. Each of these visuals provided a different type of insight about the Airbnb market.

    From the analysis, some key findings include:

    * Entire homes are the most expensive and also the most common type of listing.
    * Lower-priced listings tend to have more reviews, which suggests they are booked more often.
    * Manhattan has the highest average price, followed by Brooklyn, reflecting a location advantage.
    * Most hosts have a single listing, while a small number of hosts manage many listings.

    These findings can help Airbnb make data-driven business decisions. For example, they can guide hosts on pricing strategies, help identify high-demand areas, and improve guest satisfaction by promoting top-rated listings. The company can also use these insights for marketing campaigns and to encourage new hosts to join by showing successful trends.

    In conclusion, this EDA helps Airbnb understand its market better, supports smarter business planning, and improves both host and guest experiences. The clean data, visual insights, and clear patterns discovered through this analysis form a strong foundation for future data modeling and strategic decision-making.


# **GitHub Link -**

https://github.com/7068945943/airbnb-data-analysis

# **Problem Statement**


#### **Define Your Business Objective?**

    Airbnb has become one of the most popular platforms for short-term rentals, connecting millions of guests and hosts around the world. With a large volume of data generated daily from listings, reviews, and bookings, understanding this data is essential for improving user experience, optimizing pricing strategies, and enhancing business growth.

    The main problem addressed in this project is to explore and analyze the Airbnb dataset containing around 49,000 listings with 16 features. The objective is to identify patterns, trends, and relationships between variables such as price, location, availability, and room type. Through this analysis, we aim to answer key business questions like:

    What factors most influence the price of an Airbnb listing?

    How do room types and locations impact pricing and availability?

    What is the relationship between the number of reviews and listing performance?

    Which areas or neighborhoods are the most popular among guests?

    By performing Exploratory Data Analysis (EDA), we aim to gain meaningful insights that can help Airbnb and its hosts make data-driven decisions. This includes understanding market trends, improving host performance, enhancing customer satisfaction, and identifying opportunities for business growth.

    The analysis will involve data cleaning, transformation, and visualization using Python libraries like Pandas, NumPy, Matplotlib, and Seaborn. The outcome will be a set of clear visual insights, patterns, and recommendations that highlight the factors influencing Airbnb’s overall performance and help in strategic decision-making.

    The main business objective of this project is to use data analysis to help Airbnb and its hosts make smarter, data-driven decisions. By analyzing listing details, host activity, and customer preferences, we can identify what drives higher pricing, better ratings, and increased booking frequency.

    Specific business goals include:

    Price Optimization: Identify key factors (like room type, location, and amenities) that influence listing prices to help hosts set competitive yet profitable rates.

    Market Demand Analysis: Understand which areas or neighborhoods attract more guests and why, to guide investment or marketing strategies.

    Host Performance Improvement: Analyze host behavior, reviews, and availability patterns to provide recommendations for improving visibility and guest satisfaction.

    Customer Insights: Discover what guests value the most—price, location, or amenities—to help tailor better experiences.

    Strategic Decision Support: Use insights from data to assist Airbnb in improving its overall platform policies, marketing campaigns, and customer engagement strategies.

    The ultimate goal is to convert raw Airbnb data into valuable insights that drive growth, efficiency, and improved user satisfaction for both guests and hosts.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd        # For data manipulation
import numpy as np         # For numerical computations
import matplotlib.pyplot as plt  # For plotting
import seaborn as sns      # For advanced visualizations

### Dataset Loading

In [None]:
# Load Dataset
df = pd.read_csv('/content/Airbnb NYC 2019.csv')
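
The guidelines above ask for exception handling, so the load step can be wrapped in a small helper. The sketch below is one possible approach and is not part of the original notebook: `load_listings` and `REQUIRED_COLUMNS` are hypothetical names, and the column set is an assumption based on columns used later in this notebook.

```python
import pandas as pd

# Assumed subset of columns that the later analysis depends on (hypothetical constant).
REQUIRED_COLUMNS = {"price", "room_type", "neighbourhood_group", "minimum_nights"}


def load_listings(path):
    """Read the listings CSV and fail early with a readable message (hypothetical helper)."""
    try:
        data = pd.read_csv(path)
    except FileNotFoundError as exc:
        raise FileNotFoundError(
            f"Dataset not found at {path!r}; check the upload path."
        ) from exc

    # Verify that the columns needed downstream are present
    missing = REQUIRED_COLUMNS - set(data.columns)
    if missing:
        raise ValueError(f"Dataset is missing expected columns: {sorted(missing)}")
    return data
```

In this notebook the call would be `df = load_listings('/content/Airbnb NYC 2019.csv')`.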

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

As we can see, the last_review column has the object data type, so we first convert it to datetime. The id and host_id columns are identifiers rather than quantities, so we convert them to strings.

In [None]:
# Convert 'last_review' to datetime
df['last_review'] = pd.to_datetime(df['last_review'])
# Convert columns to correct data types
df['host_id'] = df['host_id'].astype(str)
df['id'] = df['id'].astype(str)


In [None]:
df.info()

#### Duplicate Values

In [None]:
df.duplicated()

In [None]:
# Dataset Duplicate Value Count
duplicate_rows = df[df.duplicated()]
print(f"Number of completely duplicate rows: {duplicate_rows.shape[0]}")


The count is 0, so the dataset contains no fully duplicated rows.

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
missing_counts = df.isnull().sum()
missing_counts = missing_counts[missing_counts > 0]

# Plot
plt.figure(figsize=(10,6))
missing_counts.sort_values().plot(kind='barh', color='salmon')
plt.title("Missing Values per Column")
plt.xlabel("Number of Missing Values")
plt.show()

In [None]:
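# Handle missing values.
# Assign the result back instead of calling fillna(inplace=True) on a column,
# which is unreliable in recent pandas versions.
df['name'] = df['name'].fillna("No name provided")
df['host_name'] = df['host_name'].fillna("Unknown")
df['reviews_per_month'] = df['reviews_per_month'].fillna(0)
# Placeholder date marks listings that have never been reviewed
df['last_review'] = df['last_review'].fillna(pd.Timestamp("2000-01-01"))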

In [None]:
df.isnull().sum()
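
Filling `last_review` with a placeholder date keeps the column free of nulls, but the placeholder then behaves like a real date in any time-based chart. One alternative, sketched here on a made-up sample with a hypothetical `has_review` column, is to leave missing dates as `NaT` and record the absence of reviews in a boolean flag.

```python
import pandas as pd

# Made-up sample: the second listing has never been reviewed
sample = pd.DataFrame({
    "number_of_reviews": [9, 0, 45],
    "last_review": pd.to_datetime(["2018-10-19", None, "2019-05-21"]),
    "reviews_per_month": [0.21, None, 0.38],
})

# Flag listings with at least one review (hypothetical helper column)
sample["has_review"] = sample["last_review"].notna()

# A listing without reviews has a review rate of zero, so 0 is a natural fill value
sample["reviews_per_month"] = sample["reviews_per_month"].fillna(0)

print(sample)
```

With this layout, date-based plots can filter on `has_review` instead of having to know which date is the placeholder.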

### What did you know about your dataset?

The dataset contains around 49,000 Airbnb listings with 16 columns. It includes listing details (name, price, room type), host information, location (neighbourhood group, neighbourhood, coordinates), availability, and review data. Most listings are in Manhattan and Brooklyn. The majority of room types are 'Entire home/apt' and 'Private room'. Price is right-skewed, with most listings under $500. Missing values were found and filled in name, host_name, last_review, and reviews_per_month, and there are no fully duplicated rows. The dataset is useful for analyzing pricing, host behavior, customer activity, and location trends.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

* id: listing ID
* name: listing title
* host_id: host ID
* host_name: host's name
* neighbourhood_group: main city area (e.g., Manhattan)
* neighbourhood: specific location
* latitude / longitude: geographical coordinates
* room_type: type of accommodation
* price: nightly price in USD
* minimum_nights: minimum stay required
* number_of_reviews: total reviews
* last_review: date of last review
* reviews_per_month: average monthly reviews
* calculated_host_listings_count: number of listings by the host
* availability_365: available days in a year

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
unique_counts = df.nunique()
print(unique_counts)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# Remove listings with zero or negative price
df = df[df['price'] > 0]

# Drop listings with unreasonable minimum_nights values (more than a year).
# .copy() avoids SettingWithCopyWarning in the assignments below.
df = df[df['minimum_nights'] <= 365].copy()

# Standardize column names
df.columns = df.columns.str.lower().str.replace(" ", "_")
# Strip leading/trailing spaces in object columns
for col in df.select_dtypes(include='object').columns:
    df[col] = df[col].str.strip()
df.reset_index(drop=True, inplace=True)
print("Data ready for analysis:")
df.info()  # info() prints directly; wrapping it in print() would also print None

In [None]:
df.head()
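
Several charts later in this notebook handle price outliers with a fixed `price < 500` cutoff. A common data-driven alternative is the 1.5 × IQR rule. The sketch below applies it to a made-up price series; it illustrates the technique only and is not the method used in this notebook.

```python
import pandas as pd

# Made-up prices with one extreme value
prices = pd.Series([60, 80, 100, 120, 150, 200, 5000], name="price")

# Interquartile range and the conventional upper fence (Q3 + 1.5 * IQR)
q1, q3 = prices.quantile([0.25, 0.75])
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr

# Keep only prices at or below the fence
trimmed = prices[prices <= upper_fence]

print(f"Q1={q1}, Q3={q3}, upper fence={upper_fence}")
print(trimmed.tolist())
```

A fence computed this way adapts to the data, whereas a fixed cutoff such as $500 is easier to explain in chart titles. Either choice should be stated next to the chart.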

### What all manipulations have you done and insights you found?

Manipulations performed:

Checked for duplicate rows (none were found), filled missing reviews_per_month with 0, filled missing last_review with a placeholder date, filled missing name with "No name provided", filled missing host_name with "Unknown", removed listings with price <= 0 or minimum_nights > 365, converted last_review to datetime, converted id and host_id to string, standardized column names, and stripped whitespace in string columns.

Insights found:

Most listings are in Manhattan and Brooklyn, room types are mostly entire homes or private rooms, prices are right-skewed with most under $500, listings with more reviews tend to have lower prices, and some hosts manage multiple listings.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
sns.countplot(data=df, x='room_type', order=df['room_type'].value_counts().index, palette='Set2')
plt.title('Distribution of Room Types')
plt.xlabel('Room Type')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

To show how listings are distributed across room types (e.g., Entire home/apt, Private room) and which types are most common.

##### 2. What is/are the insight(s) found from the chart?

Most listings are Entire home/apt and Private room. Listing counts measure supply rather than demand, but they suggest that hosts mainly offer, and guests likely prefer, private spaces; Shared rooms are the least common.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, focusing on high-demand room types can guide marketing, pricing, and host strategy to improve user satisfaction and revenue.

Yes, over-reliance on shared rooms or over-supply in already saturated areas like Manhattan may lead to poor guest experiences or host earnings decline.
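
The count plot shows absolute counts; percentage shares make the same comparison easier to quote. A minimal sketch on made-up data (the ten rows below are illustrative and do not reflect the real proportions):

```python
import pandas as pd

# Made-up sample of ten listings
sample = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 5 + ["Private room"] * 4 + ["Shared room"]
})

# normalize=True returns fractions; convert them to percentages
shares = sample["room_type"].value_counts(normalize=True).mul(100).round(1)

print(shares)
```

On the real data, the same line with `df['room_type']` would give the share of each room type.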

#### Chart - 2

In [None]:
# Chart - 2 visualization code
filtered_df = df[df['price'] < 500]

plt.figure(figsize=(8,6))
sns.boxplot(data=filtered_df, x='room_type', y='price', palette='Set1')
plt.title('Price Distribution by Room Type (Under $500)')
plt.xlabel('Room Type')
plt.ylabel('Price')
plt.show()

##### 1. Why did you pick the specific chart?

To compare how prices vary across different room types and identify pricing patterns while minimizing the effect of extreme outliers.

##### 2. What is/are the insight(s) found from the chart?

Entire homes tend to have higher and more variable prices; private rooms are moderately priced; shared rooms are cheapest and less variable.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, by understanding price differences by room type, Airbnb can tailor pricing suggestions, improve user targeting, and help hosts competitively price their listings.

Yes, if Airbnb encourages hosts to overprice private or shared rooms based on high entire-home prices, it may reduce bookings and hurt host revenue due to customer price sensitivity.
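
The boxplot can be backed by numbers: the median is the line inside each box and the interquartile range (IQR) is the box height. The sketch below computes both per room type on made-up prices; `price_iqr` is a hypothetical helper name.

```python
import pandas as pd

# Made-up prices per room type
sample = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 3 + ["Private room"] * 3 + ["Shared room"] * 2,
    "price": [150, 200, 250, 60, 80, 100, 40, 50],
})


def price_iqr(series):
    """Interquartile range: spread of the middle 50% of values."""
    return series.quantile(0.75) - series.quantile(0.25)


# Named aggregation: one output column per statistic
summary = sample.groupby("room_type")["price"].agg(
    median="median",
    iqr=price_iqr,
    listings="count",
)

print(summary)
```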

#### Chart - 3

In [None]:
# Chart - 3 visualization code
plt.figure(figsize=(10,6))
sns.scatterplot(data=df[df['price'] < 500], x='longitude', y='latitude', hue='neighbourhood_group', alpha=0.4, palette='tab10')
plt.title('Geographical Distribution of Listings (Under $500)')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.legend(title='Neighbourhood Group')
plt.show()

##### 1. Why did you pick the specific chart?

To visually understand the spatial distribution of listings across New York City and identify area-specific density and listing clusters.

##### 2. What is/are the insight(s) found from the chart?

Listings are heavily concentrated in Manhattan and Brooklyn, especially around central areas; Queens and Bronx have sparse coverage.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, Airbnb can use this to identify underrepresented areas for expansion, balance listing supply geographically, and avoid oversaturation in already dense neighborhoods.

Yes, high listing concentration in central zones like Manhattan may lead to overcompetition among hosts, reduced profitability, and increased scrutiny from local housing regulators.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
plt.figure(figsize=(8,6))
sns.scatterplot(data=df[df['price'] < 500], x='number_of_reviews', y='price', alpha=0.3, color='teal')
plt.title('Price vs Number of Reviews (Under $500)')
plt.xlabel('Number of Reviews')
plt.ylabel('Price')
plt.show()

##### 1. Why did you pick the specific chart?

To explore the relationship between price and popularity (measured by number of reviews) and determine how pricing influences customer engagement.

##### 2. What is/are the insight(s) found from the chart?

Listings with lower prices tend to receive more reviews, suggesting they are booked more frequently and considered better value by guests.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, Airbnb can use this to encourage new hosts to start with competitive pricing to attract guests, increase bookings, and accumulate reviews faster—improving visibility and trust.

Yes, if too many hosts lower their prices excessively to chase reviews, it may lead to unsustainable earnings, reduced service quality, and a race to the bottom that damages the brand and user experience.
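
A scatter plot with many overlapping points is hard to read by eye, so a correlation coefficient is a useful companion. Because both price and review counts are skewed, a rank-based (Spearman) correlation is often more informative than Pearson. The sketch below computes Spearman as the Pearson correlation of the ranks, on made-up values chosen to be perfectly monotonic; it says nothing about the strength of the relationship in the real data.

```python
import pandas as pd

# Made-up sample: reviews fall steadily as price rises
sample = pd.DataFrame({
    "price": [50, 80, 120, 200, 400],
    "number_of_reviews": [150, 90, 60, 20, 5],
})

# Pearson measures linear association on the raw values
pearson = sample["price"].corr(sample["number_of_reviews"])

# Spearman is Pearson applied to the ranks, so it captures any monotonic trend
spearman = sample["price"].rank().corr(sample["number_of_reviews"].rank())

print(f"Pearson: {pearson:.2f}, Spearman: {spearman:.2f}")
```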

#### Chart - 5

In [None]:
# Chart - 5 visualization code
plt.figure(figsize=(8,6))
sns.boxplot(data=df, x='neighbourhood_group', y='availability_365', palette='Pastel1')
plt.title('Availability of Listings by Neighbourhood Group')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Availability (days/year)')
plt.show()

##### 1. Why did you pick the specific chart?

To analyze how often listings are available across different city areas, which helps understand supply reliability and seasonal trends.

##### 2. What is/are the insight(s) found from the chart?

Median availability differs noticeably between neighbourhood groups. Staten Island and the Bronx show the highest median availability, while Manhattan and Brooklyn have many listings with low or zero availability.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, Airbnb can prioritize promotion and algorithm ranking of listings with high availability, ensuring consistent guest experience and booking rates.

Yes, if Airbnb includes too many listings with low or zero availability, it may frustrate users, reduce trust in the platform, and increase search drop-offs.
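
Two numbers summarize what the boxplot shows for each area: the median availability and the share of listings with zero available days. A minimal sketch on made-up values (the figures below are illustrative, not results from the dataset):

```python
import pandas as pd

# Made-up availability values for three neighbourhood groups
sample = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn",
                            "Staten Island", "Staten Island"],
    "availability_365": [0, 30, 365, 0, 0, 200, 300],
})

summary = sample.groupby("neighbourhood_group")["availability_365"].agg(
    median_days="median",
    # Percentage of listings that are never available
    pct_zero=lambda s: (s == 0).mean() * 100,
)

print(summary.round(1))
```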

#### Chart - 6

In [None]:
# Chart - 6 visualization code
plt.figure(figsize=(8,6))
avg_price = df[df['price'] < 500].groupby('neighbourhood_group')['price'].mean().sort_values(ascending=False)
sns.barplot(x=avg_price.index, y=avg_price.values, palette='Blues_d')
plt.title('Average Price by Neighbourhood Group (Under $500)')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Average Price')
plt.show()

##### 1. Why did you pick the specific chart?

To compare average listing prices across different city regions and identify which areas are most or least expensive for guests.

##### 2. What is/are the insight(s) found from the chart?

Manhattan has the highest average price, followed by Brooklyn, while Bronx and Staten Island are much more affordable—reflecting both demand and property value differences.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, Airbnb can tailor dynamic pricing strategies, guide budget-conscious travelers to lower-cost areas, and help hosts position their listings competitively.

Yes, over-promoting high-cost areas like Manhattan could alienate price-sensitive users, and pushing listings only in low-cost areas may limit platform revenue and premium segment engagement.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
plt.figure(figsize=(8,6))
sns.boxplot(data=df, x='neighbourhood_group', y='number_of_reviews', palette='Accent')
plt.title('Number of Reviews by Neighbourhood Group')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Number of Reviews')
plt.ylim(0, 300)  # Limit to reduce outlier effect
plt.show()

##### 1. Why did you pick the specific chart?

To understand guest engagement and listing popularity across neighbourhood groups by visualizing how many reviews listings typically get.

##### 2. What is/are the insight(s) found from the chart?

Brooklyn and Manhattan listings tend to have higher review counts, suggesting they are more frequently booked or have longer market presence; Staten Island and Bronx lag behind.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, Airbnb can identify high-performing neighborhoods for potential expansion, provide new hosts insights on where demand is strong, and promote listings in areas with good guest interaction.


Yes, if Airbnb focuses only on high-review areas, other regions may be neglected, stunting balanced growth and diversity in offerings across the platform.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
plt.figure(figsize=(8,6))
sns.barplot(
    data=df[df['reviews_per_month'].notnull()],
    x='room_type',
    y='reviews_per_month',
    estimator=np.mean,
    palette='Set3'
)
plt.title('Average Reviews per Month by Room Type')
plt.xlabel('Room Type')
plt.ylabel('Avg. Reviews per Month')
plt.show()

##### 1. Why did you pick the specific chart?

To evaluate which room type receives the most consistent monthly engagement from guests based on review frequency.

##### 2. What is/are the insight(s) found from the chart?

Private rooms tend to get the highest average reviews per month, likely due to affordability and consistent demand; shared rooms lag behind in engagement.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, Airbnb can recommend optimal room types to new hosts based on guest behavior, helping increase occupancy and review generation faster.

Yes, overemphasizing only high-review-per-month room types may reduce platform diversity, causing hosts of other types (e.g., entire homes) to feel unsupported or underperforming.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
plt.figure(figsize=(8,6))
sns.boxplot(data=df, x='room_type', y='minimum_nights', palette='Set2')
plt.yscale('log')  # Log scale to manage extreme outliers
plt.title('Minimum Nights by Room Type')
plt.xlabel('Room Type')
plt.ylabel('Minimum Nights (Log Scale)')
plt.show()

##### 1. Why did you pick the specific chart?

To examine how the minimum stay requirements vary across different room types and identify booking flexibility for guests.

##### 2. What is/are the insight(s) found from the chart?

Entire homes often have higher minimum night requirements, indicating longer stays or stricter host rules.

Private rooms usually allow more flexibility with 1–2 night stays.

The raw data contained a few extreme minimums (over 1,000 nights), which are likely input errors. These were removed during data wrangling, so the values in this chart top out at 365 nights.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, Airbnb can encourage hosts to reduce minimum nights to boost short-term bookings, especially in private/shared room categories where demand is flexible.

Yes, long minimum stays can reduce booking volume, frustrate short-term travelers, and reduce platform conversion rates—especially if left unmonitored or caused by data entry errors.



#### Chart - 10

In [None]:
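# Chart - 10 visualization code
# Exclude the 2000-01-01 placeholder assigned to listings without reviews
reviewed = df[df['last_review'] > pd.Timestamp("2000-01-01")]
# Each listing is counted once, in the month of its most recent review
reviews_over_time = reviewed.groupby(reviewed['last_review'].dt.to_period('M')).size()

plt.figure(figsize=(12,6))
reviews_over_time.plot()
plt.title('Listings by Month of Most Recent Review')
plt.xlabel('Month-Year')
plt.ylabel('Number of Listings')
plt.xticks(rotation=45)
plt.show()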

##### 1. Why did you pick the specific chart?

To track when listings were last reviewed, as a proxy for how guest engagement has changed over time, and to reveal seasonality, growth trends, or dips.

##### 2. What is/are the insight(s) found from the chart?

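The counts rise steeply toward 2019. Each listing is counted once, in the month of its most recent review, so recent months dominate and the curve reflects how recently listings were active rather than total review volume.

The dataset is a 2019 snapshot, so any dips cannot be attributed to later events such as the COVID-19 pandemic.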

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, by identifying seasonal trends and disruptions, Airbnb can better plan marketing campaigns, promotions, and prepare hosts for fluctuations.

Yes, prolonged dips in reviews signal lower bookings, which can indicate market saturation or external shocks hurting platform growth.
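
The `last_review` column stores only the most recent review date of each listing, so a monthly series built from it counts listings, not individual reviews. The sketch below shows the grouping step on a made-up sample in which one listing has no review date.

```python
import pandas as pd

# Made-up sample: three reviewed listings and one without any review
sample = pd.DataFrame({
    "last_review": pd.to_datetime(["2019-05-03", "2019-05-20", "2019-06-01", None])
})

# Drop missing dates, convert to monthly periods, and count listings per month
monthly = (
    sample["last_review"]
    .dropna()
    .dt.to_period("M")
    .value_counts()
    .sort_index()
)

print(monthly)
```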

#### Chart - 11

In [None]:
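# Chart - 11 visualization code
# Keep one row per host so the y-axis counts hosts rather than listings
host_listing_counts = df.drop_duplicates(subset='host_id')['calculated_host_listings_count']

plt.figure(figsize=(10,6))
# discrete=True draws one bar per integer listing count
sns.histplot(host_listing_counts, discrete=True, color='purple', kde=False)
plt.title('Distribution of Number of Listings per Host')
plt.xlabel('Number of Listings per Host')
plt.ylabel('Count of Hosts')
plt.xlim(0, 30)  # Limit x-axis to focus on typical range
plt.show()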

##### 1. Why did you pick the specific chart?

To understand host concentration — how many hosts manage multiple listings vs. single listings.

##### 2. What is/are the insight(s) found from the chart?

Most hosts have only 1 listing.

A small number of hosts manage many listings (up to 30+), indicating possible professional hosts or property managers.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, Airbnb can tailor support and policies differently for casual hosts versus professional hosts managing multiple properties.

Yes, heavy concentration of listings under a few hosts might lead to market dominance, reduce platform diversity, and invite regulatory scrutiny.
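
Host concentration can also be expressed as two percentages: the share of hosts with a single listing and the share of listings owned by multi-listing hosts. A minimal sketch on made-up host IDs (variable names are hypothetical):

```python
import pandas as pd

# Made-up sample: four hosts owning seven listings in total
sample = pd.DataFrame({
    "host_id": ["a", "b", "b", "c", "c", "c", "d"],
    "id": ["1", "2", "3", "4", "5", "6", "7"],
})

# Number of listings per host
listings_per_host = sample.groupby("host_id").size()

# Share of hosts that have exactly one listing
single_host_share = (listings_per_host == 1).mean() * 100

# Share of all listings that belong to hosts with more than one listing
multi_host_listing_share = (
    listings_per_host[listings_per_host > 1].sum() / len(sample) * 100
)

print(f"Single-listing hosts: {single_host_share:.1f}%")
print(f"Listings held by multi-listing hosts: {multi_host_listing_share:.1f}%")
```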

#### Chart - 12

In [None]:
# Chart - 12 visualization code
plt.figure(figsize=(8,6))
sns.scatterplot(
    data=df[df['reviews_per_month'].notnull()],
    x='calculated_host_listings_count',
    y='reviews_per_month',
    alpha=0.5,
    color='green'
)
plt.title('Reviews per Month vs. Number of Listings per Host')
plt.xlabel('Number of Listings per Host')
plt.ylabel('Reviews per Month')
plt.show()

##### 1. Why did you pick the specific chart?

To see if hosts with multiple listings get more monthly reviews on average, indicating host activity or guest preference.

##### 2. What is/are the insight(s) found from the chart?

Hosts with few listings show a wide range of reviews per month.

Hosts with many listings tend to have moderate to low reviews per listing, possibly due to divided attention or varied listing quality.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, Airbnb can design differentiated engagement and support programs for hosts based on portfolio size to improve listing performance.

Yes, if large hosts neglect individual listing quality, it could reduce guest satisfaction and overall platform reputation.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
room_neighbourhood = pd.crosstab(df['neighbourhood_group'], df['room_type'], normalize='index') * 100

room_neighbourhood.plot(kind='bar', stacked=True, figsize=(10,6), colormap='tab20')
plt.title('Room Type Distribution Across Neighbourhood Groups (%)')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Percentage of Listings')
plt.legend(title='Room Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()


##### 1. Why did you pick the specific chart?

To understand how room types are distributed in each major neighborhood group, showing supply diversity.

##### 2. What is/are the insight(s) found from the chart?

Manhattan and Brooklyn have a higher percentage of Entire homes/apartments.

The Bronx and Staten Island listings are dominated more by Private rooms and Shared rooms.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, this can inform marketing and inventory management strategies tailored for neighborhood-specific demand.

Yes, a lack of diverse room types in certain neighborhoods may reduce the attractiveness for some guest segments.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(10,8))
numeric_cols = ['price', 'minimum_nights', 'number_of_reviews', 'reviews_per_month', 'calculated_host_listings_count', 'availability_365']
corr = df[numeric_cols].corr()

sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f", square=True)
plt.title('Correlation Heatmap of Numerical Variables')
plt.show()

##### 1. Why did you pick the specific chart?

To quickly identify relationships and dependencies between important numerical variables like price, reviews, availability, and listings per host.

##### 2. What is/are the insight(s) found from the chart?

Reviews per month strongly correlates with number of reviews, which is expected.

Price has a weak correlation with most variables, suggesting it is driven more by location and room type than by these numeric metrics.

Availability is weakly correlated with host listing count, suggesting hosts with more listings may keep them active year-round.
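
Reading the strongest relationships off a heatmap by eye is error-prone, so it can help to list the variable pairs ranked by absolute correlation. The sketch below does this on made-up numbers; the values and the resulting ranking are illustrative only.

```python
from itertools import combinations

import pandas as pd

# Made-up numeric sample
sample = pd.DataFrame({
    "price": [100, 150, 80, 300, 120],
    "number_of_reviews": [10, 40, 5, 60, 20],
    "reviews_per_month": [0.5, 2.0, 0.2, 3.1, 1.0],
    "availability_365": [365, 10, 200, 90, 0],
})

corr = sample.corr()

# Collect each unordered pair once, skipping the diagonal
pairs = pd.Series(
    {(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)}
)

# Rank pairs by the strength of the correlation, regardless of sign
pairs = pairs.sort_values(key=abs, ascending=False)

print(pairs.round(2))
```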

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(
    df[df['price'] < 500],
    vars=['price', 'number_of_reviews', 'reviews_per_month', 'availability_365'],
    hue='room_type',
    diag_kind='kde',
    plot_kws={'alpha':0.6}
)
plt.suptitle('Pair Plot of Numerical Variables by Room Type (Prices < $500)', y=1.02)
plt.show()

##### 1. Why did you pick the specific chart?

A pair plot shows the distribution of each numerical variable and every pairwise relationship in a single figure. Adding hue by room type helps compare these distributions and relationships across room categories.



##### 2. What is/are the insight(s) found from the chart?

Entire homes/apartments usually show higher prices.

Shared rooms tend to have fewer reviews and lower prices.

Patterns differ by room type, guiding targeted business strategies.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

1. Implement dynamic pricing – Use patterns in room type, location, and availability to suggest competitive prices and maximize bookings.

2. Optimize minimum night rules – Many listings have unnecessarily high minimum stays; reducing them can increase guest flexibility and booking frequency.

3. Encourage more reviews – Listings with frequent reviews attract more guests. Airbnb can nudge guests to leave reviews and support hosts in improving service.

4. Promote room type variety by area – Some areas lack diversity in room types (e.g., only private rooms); diversifying can meet more guest needs and boost occupancy.

5. Support hosts differently by size – Single-property hosts need help getting started, while multi-property hosts need tools for managing listings efficiently.

6. Use seasonal review data for promotions – Identify high and low seasons from review trends and offer discounts in low periods to balance demand.

7. Improve data quality – Clean or prevent null values (like host_name) and extreme outliers (e.g., 1000+ minimum nights) to ensure reliable insights.
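
The data-quality recommendation can be made concrete with an automated check that runs before any analysis. The sketch below is one possible shape for such a check: `quality_report` is a hypothetical helper, the 365-night threshold mirrors the filter used in the wrangling step, and the sample rows are made up.

```python
import pandas as pd


def quality_report(data, max_min_nights=365):
    """Count rows that break basic data-quality rules (hypothetical helper)."""
    return {
        "null_host_name": int(data["host_name"].isna().sum()),
        "non_positive_price": int((data["price"] <= 0).sum()),
        "excessive_minimum_nights": int(
            (data["minimum_nights"] > max_min_nights).sum()
        ),
    }


# Made-up sample with one problem of each kind
sample = pd.DataFrame({
    "host_name": ["John", None, "Ana"],
    "price": [100, 0, 80],
    "minimum_nights": [1, 1250, 30],
})

report = quality_report(sample)
print(report)
```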

# **Conclusion**

Through detailed analysis of the Airbnb dataset, we uncovered meaningful insights into host behavior, guest preferences, pricing dynamics, and listing performance. By visualizing relationships between key variables—such as room type, location, price, availability, and review activity—we identified patterns that can directly support smarter business decisions.

To achieve Airbnb’s business objectives, the company should adopt data-driven strategies including dynamic pricing, better host support, optimized stay requirements, and enhanced review engagement. Addressing data quality and promoting listing diversity across neighborhoods will also improve customer satisfaction and platform efficiency.

In summary, leveraging these insights will not only help Airbnb enhance user experience for both guests and hosts but also drive higher occupancy rates, stronger brand trust, and long-term platform growth.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***