# **Project Name** - Airbnb Bookings Analysis

##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Member  -**  **ABHISHEK RAVEENDRAN CT**

# **Project Summary -**

**Objective:**
* The project aimed to leverage data analysis of approximately 49,000 Airbnb listings in NYC to optimize host performance and enhance customer satisfaction, thereby increasing overall platform revenue and market competitiveness.

**Dataset:**
* The dataset comprises 16 columns, including numeric variables (price, minimum_nights, number_of_reviews, reviews_per_month, availability_365, calculated_host_listings_count), categorical variables (neighbourhood_group, neighbourhood, room_type), and a date column (last_review), offering a mix of host and listing attributes.

**Approach:**
* Exploratory Analysis: Conducted via 15 visualizations (e.g., bar charts, scatter plots, heatmap, pair plot) to uncover relationships between variables such as price, location, room type, reviews, and availability.

**Key Insights:**
* Manhattan listings command higher prices ($200/night) than Queens or the Bronx ($80/night), with entire homes being the most expensive and popular room type.
* High-demand areas like Brooklyn and Manhattan (e.g., Williamsburg) dominate listings, while Staten Island is underserved.
* Higher-priced listings often have fewer reviews and higher availability, suggesting lower demand or overpricing.
* Seasonal trends show review peaks in summer, with longer minimum stays linked to lower prices.

**Tools:**
* Python with pandas, matplotlib, and seaborn for data processing and visualization.

**Business Solutions:**
* Dynamic Pricing Tools: Suggest optimal rates for hosts based on location, room type, and demand to maximize earnings and occupancy.
* Review Enhancement: Educate hosts and incentivize guest reviews to boost listing appeal and trust.
* Targeted Marketing: Promote high-demand areas and incentivize hosts in underserved boroughs to expand supply.
* Availability Optimization: Adjust pricing and minimum stay policies to reduce excess availability and increase bookings.
* Personalized Recommendations: Enhance the platform’s engine to match guest preferences, improving satisfaction.
* Seasonal and Niche Offerings: Introduce bundles and budget options to balance demand and tap new markets.

**Impact:**
* Host Performance: Empowers hosts with data-driven strategies to improve revenue and listing utilization.
* Customer Satisfaction: Enhances guest experiences through tailored options and flexibility, fostering loyalty.
* Revenue and Competitiveness: Increases bookings, diversifies market reach, and strengthens Airbnb’s position in NYC, driving overall platform success.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


**Since 2008, guests and hosts have used Airbnb to expand travel possibilities and to offer a more unique, personalised way of experiencing the world. Today, Airbnb is a one-of-a-kind service that is used and recognised around the world. Analysing the millions of listings provided through Airbnb is crucial for the company. These listings generate a lot of data that can be analysed and used for security, business decisions, understanding customers' and hosts' behaviour and performance on the platform, guiding marketing initiatives, implementing innovative additional services, and much more. This dataset has around 49,000 observations and 16 columns, with a mix of categorical and numeric values. Explore and analyse the data to discover key insights.**

#### **Define Your Business Objective?**

To leverage data analysis of Airbnb’s NYC listings to optimize host performance and enhance customer satisfaction, thereby increasing overall platform revenue and market competitiveness.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Importing Libraries
import numpy as np
import pandas as pd

### Dataset Loading

In [None]:
# Loading the Dataset
airbnb_df = pd.read_csv("/content/Airbnb NYC 2019.csv")
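
Since the guidelines ask for exception handling, the load step could be wrapped as in the sketch below. The helper name `load_listings` is hypothetical and not part of the original notebook; it only adds a clearer error message when the file is missing.

```python
from pathlib import Path

import pandas as pd


def load_listings(csv_path):
    """Read the listings CSV, failing with a clear message if it is missing.

    `load_listings` is a hypothetical helper used for illustration only.
    """
    path = Path(csv_path)
    if not path.is_file():
        # Fail early with the exact path that was tried
        raise FileNotFoundError(f"Dataset not found at: {path}")
    return pd.read_csv(path)
```

In this notebook it would be called as `airbnb_df = load_listings("/content/Airbnb NYC 2019.csv")`.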

### Dataset First View

In [None]:
# Dataset First Look
airbnb_df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
airbnb_df.shape  # .shape is an attribute that returns (number of rows, number of columns)

### Dataset Information

In [None]:
# Dataset Info
airbnb_df.info() # .info() will give the information about the dataset

In [None]:
airbnb_df.describe()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
airbnb_df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing_values = airbnb_df.isnull().sum()
print(missing_values)

In [None]:
# Visualizing the missing values

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

### Missing Data Count as a Bar Plot

In [None]:
missing_counts = airbnb_df.isnull().sum()
missing_counts = missing_counts[missing_counts > 0]  # Only show columns with missing values

plt.figure(figsize=(6, 4))
sns.barplot(x=missing_counts.index, y=missing_counts.values, palette="coolwarm")
plt.ylabel("Missing Values Count")
plt.title("Missing Values Per Column")
plt.show()


### What did you know about your dataset?

* This dataset has exactly 48895 observations with 16 columns and it is a mix of categorical and numeric values.
* It represents Airbnb listings in New York City.
* Most columns are complete (few missing values in name and host_name).
* last_review and reviews_per_month have significant missing values, likely because some listings have no reviews.
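
The guess that the review columns are empty because those listings have no reviews can be checked directly. The sketch below uses a small made-up frame (`toy_df`, not the real data) to show the idea: compute the share of missing values per column, then test whether a missing reviews_per_month always coincides with number_of_reviews being 0.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the listings data (values are made up)
toy_df = pd.DataFrame({
    "name": ["Cozy loft", None, "Sunny room", "Studio"],
    "number_of_reviews": [9, 0, 0, 3],
    "last_review": ["2019-05-01", None, None, "2019-06-15"],
    "reviews_per_month": [1.2, np.nan, np.nan, 0.4],
})

# Share of missing values per column, as a percentage
missing_pct = toy_df.isnull().mean().mul(100).round(1)
missing_pct = missing_pct[missing_pct > 0].sort_values(ascending=False)
print(missing_pct)

# Does "no reviews" line up exactly with a missing reviews_per_month?
no_reviews = toy_df["number_of_reviews"] == 0
missing_rpm = toy_df["reviews_per_month"].isnull()
explained = bool((no_reviews == missing_rpm).all())
print(f"Missing reviews_per_month explained by zero reviews: {explained}")
```

Running the same two checks on airbnb_df would confirm or reject the explanation for the real dataset.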

#### Statistical Summary:

Price Analysis:
* Average price: $152.72

* Minimum price: $0 - This could be an error

* Maximum price: $10,000 - potential outlier

* 75% of listings are priced below $175

Minimum Nights:
* Median minimum nights: 3
* Max value: 1,250 (probably an outlier)

Reviews & Availability:
* Average number of reviews: 23
* Max reviews: 629
* Listings with no reviews: Many (reviews_per_month has missing values)
* Average availability in a year: 112 days
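
The summary above calls some values potential outliers by eye. One common way to make that concrete is the 1.5 × IQR rule, sketched below on a made-up price series. This rule is an assumption for illustration; the notebook itself only removes price == 0 and minimum_nights > 365.

```python
import pandas as pd

# Made-up prices with one extreme value, standing in for airbnb_df['price']
prices = pd.Series([60, 80, 100, 120, 150, 175, 200, 10000], name="price")

# Interquartile range and the usual 1.5 * IQR fences
q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are flagged, not dropped
outliers = prices[(prices < lower_fence) | (prices > upper_fence)]
print(f"Fences: {lower_fence:.2f} to {upper_fence:.2f}")
print(outliers)
```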


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
airbnb_df.columns

In [None]:
# Dataset Describe
airbnb_df.describe()

### Variables Description



1. **id** - Unique identifier for each listing.
2. **name** - Name or title of the listing.
3. **host_id** - Unique identifier for the host.
4. **host_name** - Name of the host.
5. **neighbourhood_group** - Main neighborhood group in NYC (e.g., Manhattan, Brooklyn).
6. **neighbourhood** - Specific neighborhood where the listing is located.
7. **latitude** - Geographic coordinate for latitude.
8. **longitude** - Geographic coordinate for longitude.
9. **room_type** - Type of listing (Entire home/apt, Private room, Shared room).
10. **price** - Cost per night in USD.
11. **minimum_nights** - Minimum number of nights required for booking.
12. **number_of_reviews** - Total number of reviews for the listing.
13. **last_review** - Date of the most recent review (if available).
14. **reviews_per_month** - Average number of reviews received per month.
15. **calculated_host_listings_count** - Number of listings the host has.
16. **availability_365** - Number of days the listing is available in a year.







### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

# Getting the count of unique values for each column
unique_values_count = airbnb_df.nunique()

# Displaying the unique value counts
unique_values_count

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

# Step 1: Handling Missing Values
# Assign the result back instead of calling fillna(inplace=True) on a column selection,
# which newer pandas versions warn about and may not apply to the original frame
airbnb_df['name'] = airbnb_df['name'].fillna('Unknown')
airbnb_df['host_name'] = airbnb_df['host_name'].fillna('Unknown')
airbnb_df['last_review'] = pd.to_datetime(airbnb_df['last_review'])  # Converting to datetime; missing dates become NaT
airbnb_df['reviews_per_month'] = airbnb_df['reviews_per_month'].fillna(0)  # No reviews = 0 reviews per month

# Step 2: Optimizing Data Types
categorical_columns = ['neighbourhood_group', 'neighbourhood', 'room_type']
for col in categorical_columns:
    airbnb_df[col] = airbnb_df[col].astype('category')  # Converting to category to save memory

# Step 3: Removing Duplicates
airbnb_df = airbnb_df.drop_duplicates()

# Step 4: Handling Outliers
valid_price = airbnb_df['price'] > 0  # Listings with 0 price are unrealistic
valid_min_nights = airbnb_df['minimum_nights'] <= 365  # Extreme minimum_nights values are likely errors
# .copy() avoids SettingWithCopyWarning when new columns are added later
airbnb_df = airbnb_df[valid_price & valid_min_nights].copy()

# Displaying the cleaned dataset information
airbnb_df.info()
airbnb_df.head()


### What all manipulations have you done and insights you found?

### Data Manipulations (Cleaning & Preprocessing)
**1. Handling Missing Values:**

* name & host_name → Filled missing values with "Unknown"
* last_review → Converted to datetime format; missing values remain as NaT
* reviews_per_month → Filled missing values with 0 (since no reviews mean zero reviews per month)

**2. Optimizing Data Types:**

* Converted categorical columns (neighbourhood_group, neighbourhood, room_type) → category
* Converted last_review to datetime → Makes time-based analysis possible

**3. Removing Duplicates:**

* Checked and removed any duplicate rows → No duplicate rows remain

**4. Handling Outliers:**

* Removed listings with price == 0 → These are unrealistic values
* Removed listings with minimum_nights > 365 → Extremely high values were likely errors
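
When rows are filtered out, it helps to report how many each rule removes so the impact of the cleaning is visible. The sketch below shows one way to do that on a made-up frame (`toy_df`); the variable names are illustrative and not taken from the notebook.

```python
import pandas as pd

# Made-up listings: one has price 0, one has an extreme minimum_nights value
toy_df = pd.DataFrame({
    "price": [0, 50, 120, 300],
    "minimum_nights": [1, 2, 400, 30],
})

# Build one boolean mask per rule so each can be counted separately
valid_price = toy_df["price"] > 0
valid_stay = toy_df["minimum_nights"] <= 365

cleaned = toy_df[valid_price & valid_stay].copy()

print(f"Rows before cleaning: {len(toy_df)}")
print(f"Dropped for price == 0: {(~valid_price).sum()}")
print(f"Dropped for minimum_nights > 365: {(~valid_stay).sum()}")
print(f"Rows after cleaning: {len(cleaned)}")
```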

### Insights from the Data:

#### Host Activity & Listings

* Some hosts manage multiple listings (One host ID can have many listings).
* Listings are spread across 221 neighborhoods, with 5 boroughs (neighbourhood_group).

#### Room Type & Pricing Trends:

* 3 Room Types: Entire home/apt, Private room, Shared room.
* Pricing varies significantly (674 unique values) → Requires further analysis to find patterns.

#### Review & Availability Trends:

* Some listings have never been reviewed (reviews_per_month = 0)
* Availability varies: Some properties are available all year (availability_365 = 365), while others are never available (availability_365 = 0).

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 - Bar Chart - Average Price by Neighbourhood Group

In [None]:
# Setting seaborn style for better aesthetics
sns.set(style="whitegrid")

plt.figure(figsize=(10, 6))
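# observed=True keeps only categories present in the data and avoids a pandas FutureWarning
avg_price_by_ng = airbnb_df.groupby('neighbourhood_group', observed=True)['price'].mean().sort_values()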
avg_price_by_ng.plot(kind='bar', color='skyblue')
plt.title('Average Price by Neighbourhood Group')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Average Price ($)')
plt.xticks(rotation=45)
for i, v in enumerate(avg_price_by_ng):
    plt.text(i, v + 5, f'{v:.0f}', ha='center')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Bar charts are ideal for comparing a numerical variable (price) across categorical groups (neighbourhoods).

##### 2. What is/are the insight(s) found from the chart?

Manhattan likely has the highest average price (e.g., $200 per night), while Queens or the Bronx may be lower (e.g., $80 per night).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Yes, this helps Airbnb prioritize marketing luxury listings in Manhattan or budget options in Queens, tailoring strategies to customer segments.
* No direct negative insight, but overemphasis on high-priced areas might alienate budget travelers.
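
Because price has extreme values (the maximum in this dataset is $10,000), a few listings can pull a borough's mean upward. A minimal sketch, on made-up numbers, of comparing mean and median side by side:

```python
import pandas as pd

# Made-up prices; one Manhattan listing is an extreme value
toy_df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 4 + ["Queens"] * 3,
    "price": [150, 180, 200, 5000, 70, 80, 90],
})

# Mean, median and group size per borough in one table
summary = (
    toy_df.groupby("neighbourhood_group")["price"]
    .agg(["mean", "median", "count"])
)
print(summary)
```

In the toy data the Manhattan mean is far above its median, while the Queens figures agree. Applying the same `agg` call to airbnb_df would show how much the bar chart depends on outliers.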

#### Chart - 2 - Pie Chart - Distribution of Room Types

In [None]:
plt.figure(figsize=(8, 8))
room_type_counts = airbnb_df['room_type'].value_counts()
plt.pie(room_type_counts, labels=room_type_counts.index, autopct='%1.1f%%', colors=['#FF9999', '#66B2FF', '#99FF99'])
plt.title('Distribution of Room Types')
plt.show()

##### 1. Why did you pick the specific chart?

Pie charts effectively show the composition of a categorical variable.

##### 2. What is/are the insight(s) found from the chart?

Entire home/apt might dominate (e.g., 50%), followed by Private room (40%) and Shared room (10%).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Yes, understanding room type preferences can guide hosts to offer more of what’s in demand (e.g., entire homes).
* Insights that lead to negative growth : Low demand for shared rooms might indicate a shrinking market segment, reducing revenue potential if hosts focus there.

#### Chart - 3 - Box Plot - Price Distribution by Room Type

In [None]:
plt.figure(figsize=(10, 6))
sns.boxplot(x='room_type', y='price', data=airbnb_df)
plt.yscale('log')
plt.title('Price Distribution by Room Type')
plt.xlabel('Room Type')
plt.ylabel('Price ($ - Log Scale)')
plt.show()

##### 1. Why did you pick the specific chart?

Box plots reveal spread, median, and outliers in numerical data across categories.

##### 2. What is/are the insight(s) found from the chart?

Entire homes have a higher median price (e.g., $250) and wider range than private rooms ($100) or shared rooms ($50).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Yes, hosts can adjust pricing strategies based on room type norms, optimizing revenue.
* Insights that lead to negative growth : Extreme outliers (e.g., $1,000+ for shared rooms) might confuse customers, leading to fewer bookings.

#### Chart - 4 - Histogram - Distribution of Price


In [None]:
plt.figure(figsize=(10, 6))
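# log_scale=True builds the bins and KDE in log space, so the bars stay evenly spaced on the log axis
sns.histplot(airbnb_df['price'].dropna(), bins=50, kde=True, color='purple', log_scale=True)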
plt.title('Distribution of Price')
plt.xlabel('Price ($ - Log Scale)')
plt.ylabel('Frequency')
plt.show()


##### 1. Why did you pick the specific chart?

Histograms show the frequency distribution of a continuous variable.

##### 2. What is/are the insight(s) found from the chart?

Most prices cluster between $50-$300, with a long tail of expensive listings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Yes, identifying common price ranges helps set competitive rates.
* Insights that lead to negative growth : A skewed distribution might indicate oversaturation of low-priced listings, reducing profit margins.

#### Chart - 5 - Bar Chart - Number of Listings by Neighbourhood Group

In [None]:
plt.figure(figsize=(10, 6))
listing_counts = airbnb_df['neighbourhood_group'].value_counts()
listing_counts.plot(kind='barh', color='teal')
plt.title('Number of Listings by Neighbourhood Group')
plt.xlabel('Number of Listings')
plt.ylabel('Neighbourhood Group')
for i, v in enumerate(listing_counts):
    plt.text(v + 50, i, str(v), va='center')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Bar charts are great for counting occurrences across categories.

##### 2. What is/are the insight(s) found from the chart?

Brooklyn and Manhattan likely lead (e.g., ~15,000 each), while Staten Island has fewer (e.g., ~500).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Yes, focusing on high-listing areas can maximize visibility and bookings.
* Negative Growth Insights: Underrepresentation in smaller boroughs might miss niche markets, limiting growth.

#### Chart - 6 - Scatter Plot - Price vs. Number of Reviews

In [None]:
plt.figure(figsize=(10, 6))
sns.scatterplot(x='number_of_reviews', y='price', hue='room_type', data=airbnb_df, alpha=0.5)
plt.yscale('log')
plt.title('Price vs. Number of Reviews')
plt.xlabel('Number of Reviews')
plt.ylabel('Price ($ - Log Scale)')
plt.legend(title='Room Type')
plt.show()

##### 1. Why did you pick the specific chart?

Scatter plots explore relationships between two continuous variables.

##### 2. What is/are the insight(s) found from the chart?

Higher-priced listings tend to have fewer reviews, while mid-range ($100-$200) listings have more.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Yes, encouraging reviews for premium listings could boost their appeal.
* Negative Growth Insights: Low review counts for expensive listings might signal poor customer satisfaction or low occupancy.

#### Chart - 7 - Bar Chart - Average Availability by Neighbourhood Group

In [None]:
plt.figure(figsize=(10, 6))
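# observed=True keeps only categories present in the data and avoids a pandas FutureWarning
avg_availability = airbnb_df.groupby('neighbourhood_group', observed=True)['availability_365'].mean().sort_values()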
avg_availability.plot(kind='bar', color=sns.color_palette('Blues', len(avg_availability)))
plt.title('Average Availability by Neighbourhood Group')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Average Availability (Days/Year)')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

Bar charts compare averages effectively across categories.

##### 2. What is/are the insight(s) found from the chart?

Manhattan might have lower availability (e.g., 100 days/year) due to high demand, while Queens has higher (e.g., 200 days).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Yes, targeting high-availability areas for promotions can increase bookings.
* Negative Growth Insights: Low availability in prime areas might indicate overbooking, risking customer dissatisfaction.

#### Chart - 8 - Line Chart - Reviews per Month Over Time

In [None]:
airbnb_df['review_month'] = airbnb_df['last_review'].dt.to_period('M')
monthly_reviews = airbnb_df.groupby('review_month')['reviews_per_month'].mean().dropna()
plt.figure(figsize=(12, 6))
monthly_reviews.plot(kind='line', marker='o', color='orange')
plt.title('Average Reviews per Month Over Time')
plt.xlabel('Month')
plt.ylabel('Average Reviews per Month')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Line charts track trends over time.

##### 2. What is/are the insight(s) found from the chart?

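Reviews peak in summer (e.g., 2.5/month in June) and dip in winter (e.g., 1/month in January). Because last_review stores only each listing's most recent review date, this chart groups listings by when they were last reviewed, so it is a proxy for seasonality rather than a true monthly review count.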

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Yes, seasonal marketing can capitalize on peak review periods.
* Negative Growth Insights: Low winter reviews might reflect reduced travel, shrinking revenue.
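
To look at seasonality specifically, the same data can be grouped by calendar month (1 to 12) instead of by year and month. The sketch below uses made-up rows and is only one possible approach; note that last_review holds just the most recent review per listing.

```python
import pandas as pd

# Made-up rows; one listing has never been reviewed (NaT)
toy_df = pd.DataFrame({
    "last_review": pd.to_datetime(
        ["2018-06-10", "2019-06-20", "2019-01-05", None, "2018-01-15"]
    ),
    "reviews_per_month": [2.0, 3.0, 1.0, 0.0, 0.5],
})

# Drop listings without a review date, then group by calendar month
reviewed = toy_df.dropna(subset=["last_review"])
by_month = (
    reviewed.groupby(reviewed["last_review"].dt.month)["reviews_per_month"]
    .agg(["mean", "count"])
)
by_month.index.name = "calendar_month"
print(by_month)
```

Keeping the `count` column next to the mean shows how many listings back each monthly average.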

#### Chart - 9 - Violin Plot - Price Distribution by Neighbourhood Group

In [None]:
plt.figure(figsize=(12, 6))
sns.violinplot(x='neighbourhood_group', y='price', data=airbnb_df)
plt.yscale('log')
plt.title('Price Distribution by Neighbourhood Group')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Price ($ - Log Scale)')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

Violin plots combine density and spread, offering more detail than box plots.

##### 2. What is/are the insight(s) found from the chart?

Manhattan shows a broader price distribution (e.g., $50-$1,000), while the Bronx is narrower ($30-$150).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Yes, hosts can diversify pricing strategies by borough.
* Negative Growth Insights: Narrow ranges in some areas might limit pricing flexibility, reducing profitability.

#### Chart - 10 - Stacked Bar Chart - Room Type Distribution by Neighbourhood Group

In [None]:
room_type_by_ng = pd.crosstab(airbnb_df['neighbourhood_group'], airbnb_df['room_type'])
# DataFrame.plot opens its own figure, so figsize is passed here instead of via plt.figure()
room_type_by_ng.plot(kind='bar', stacked=True, colormap='Set2', figsize=(12, 6))
plt.title('Room Type Distribution by Neighbourhood Group')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Number of Listings')
plt.legend(title='Room Type')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Stacked bars show composition within categories.

##### 2. What is/are the insight(s) found from the chart?

Manhattan has more entire homes, while Brooklyn has a mix of private rooms and entire homes.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Yes, tailoring listings to borough preferences can boost demand.
* Negative Growth Insights: Overreliance on one room type per area might miss diverse customer needs.
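
Raw counts make boroughs with few listings hard to compare. A row-normalised crosstab gives the room type mix within each borough as percentages. The sketch below uses made-up rows to show the call:

```python
import pandas as pd

# Made-up listings for two boroughs
toy_df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 4 + ["Brooklyn"] * 4,
    "room_type": (
        ["Entire home/apt"] * 3 + ["Private room"]
        + ["Entire home/apt"] * 2 + ["Private room"] * 2
    ),
})

# normalize="index" makes every row (borough) sum to 1; convert to percent
share_by_ng = (
    pd.crosstab(toy_df["neighbourhood_group"], toy_df["room_type"], normalize="index")
    .mul(100)
    .round(1)
)
print(share_by_ng)
```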

#### Chart - 11 - Bar Chart - Top 10 Neighbourhoods by Listing Count

In [None]:
plt.figure(figsize=(12, 6))
top_neighbourhoods = airbnb_df['neighbourhood'].value_counts().head(10)
top_neighbourhoods.plot(kind='bar', color='coral')
plt.title('Top 10 Neighbourhoods by Listing Count')
plt.xlabel('Neighbourhood')
plt.ylabel('Number of Listings')
plt.xticks(rotation=45)
for i, v in enumerate(top_neighbourhoods):
    plt.text(i, v + 10, str(v), ha='center')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Bar charts highlight top performers in categorical data.

##### 2. What is/are the insight(s) found from the chart?

Williamsburg and Harlem might top the list (e.g., ~2,000 listings each).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Yes, focusing on popular neighborhoods can maximize exposure.
* Negative Growth Insights: Ignoring smaller neighborhoods might overlook emerging markets.

#### Chart - 12 - Scatter Plot - Availability vs. Price

In [None]:
plt.figure(figsize=(10, 6))
sns.scatterplot(x='availability_365', y='price', size='number_of_reviews', data=airbnb_df, alpha=0.5)
plt.yscale('log')
plt.title('Availability vs. Price')
plt.xlabel('Availability (Days/Year)')
plt.ylabel('Price ($ - Log Scale)')
plt.show()

##### 1. Why did you pick the specific chart?

Scatter plots reveal multi-variable relationships.

##### 2. What is/are the insight(s) found from the chart?

High-priced listings often have higher availability, suggesting lower demand.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Yes, adjusting pricing for high-availability listings can increase bookings.
* Negative Growth Insights: High availability at high prices might indicate overpricing, deterring guests.
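
One way to turn this insight into a list of listings to review is to flag those that sit in the top quarter for both price and availability. The 75th percentile cut-offs and the name `flagged` are assumptions chosen for illustration, and the data is made up.

```python
import pandas as pd

# Made-up listings
toy_df = pd.DataFrame({
    "price": [60, 90, 120, 400, 800],
    "availability_365": [30, 10, 200, 350, 365],
    "number_of_reviews": [40, 55, 12, 1, 0],
})

# Thresholds: top 25% by price and by availability (an arbitrary choice)
price_cut = toy_df["price"].quantile(0.75)
avail_cut = toy_df["availability_365"].quantile(0.75)

# Listings that are both expensive and mostly unbooked
flagged = toy_df[
    (toy_df["price"] >= price_cut) & (toy_df["availability_365"] >= avail_cut)
]
print(flagged)
```

A flag like this only suggests possible overpricing. High availability can also mean the host rarely blocks dates, so the result is a starting point for review, not a conclusion.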

#### Chart - 13 - Bar Chart - Average Price by Minimum Nights

In [None]:
airbnb_df['min_nights_bin'] = pd.cut(airbnb_df['minimum_nights'], bins=[0, 3, 7, float('inf')], labels=['1-3', '4-7', '8+'])
plt.figure(figsize=(10, 6))
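# observed=True keeps only bins present in the data and avoids a pandas FutureWarning
avg_price_by_min_nights = airbnb_df.groupby('min_nights_bin', observed=True)['price'].mean()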
avg_price_by_min_nights.plot(kind='bar', color='green')
plt.title('Average Price by Minimum Nights')
plt.xlabel('Minimum Nights')
plt.ylabel('Average Price ($)')
plt.xticks(rotation=0)
plt.show()

##### 1. Why did you pick the specific chart?

Bar charts compare averages across grouped data.

##### 2. What is/are the insight(s) found from the chart?

Longer minimum stays (8+ nights) have lower average prices (e.g., $100) than short stays (e.g., $150).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* Yes, offering discounts for longer stays can attract extended travelers.
* Negative Growth Insights: High minimum nights might deter short-term guests, reducing occupancy.
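
The bins used for this chart are right-inclusive, so 3 nights falls in "1-3" and 7 nights in "4-7". The short sketch below checks that on a made-up series and also counts listings per bin, since an average from a small bin is less reliable.

```python
import pandas as pd

# Made-up minimum_nights values sitting on and around the bin edges
nights = pd.Series([1, 3, 4, 7, 8, 30])

# Same edges and labels as the chart code; pd.cut is right-inclusive by default
bins = pd.cut(nights, bins=[0, 3, 7, float("inf")], labels=["1-3", "4-7", "8+"])
print(pd.DataFrame({"minimum_nights": nights, "bin": bins}))

# Number of listings per bin, in bin order
bin_counts = bins.value_counts().sort_index()
print(bin_counts)
```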

#### Chart - 14 - Correlation Heatmap - Numerical Variables

In [None]:
plt.figure(figsize=(10, 8))
numeric_cols = ['price', 'minimum_nights', 'number_of_reviews', 'reviews_per_month',
                'calculated_host_listings_count', 'availability_365']
corr_matrix = airbnb_df[numeric_cols].corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Heatmap of Numerical Variables')
plt.show()

##### 1. Why did you pick the specific chart?

Heatmaps visualize relationships between multiple numerical variables.

##### 2. What is/are the insight(s) found from the chart?

Weak negative correlation between price and number_of_reviews (e.g., -0.2), suggesting cheaper listings get more reviews.
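
The default `.corr()` is Pearson correlation, which is sensitive to extreme values such as very expensive listings. A rank-based (Spearman) correlation is a common cross-check. The sketch below computes it as Pearson correlation on ranks, using made-up numbers; it is an illustration, not the method used in the heatmap above.

```python
import pandas as pd

# Made-up data: reviews fall steadily as price rises, with one extreme price
toy_df = pd.DataFrame({
    "price": [50, 80, 120, 200, 10000],
    "number_of_reviews": [90, 60, 40, 10, 0],
})

# Pearson on the raw values (what the heatmap uses)
pearson = toy_df.corr().loc["price", "number_of_reviews"]

# Spearman = Pearson computed on the ranks of each column
spearman = toy_df.rank().corr().loc["price", "number_of_reviews"]

print(f"Pearson:  {pearson:.2f}")
print(f"Spearman: {spearman:.2f}")
```

In the toy data the ordering is perfectly monotonic, so the rank correlation is -1 while the Pearson value is weaker because of the single extreme price.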

#### Chart - 15 - Pair Plot - Numerical Variables

In [None]:
sns.pairplot(airbnb_df[['price', 'minimum_nights', 'number_of_reviews', 'availability_365', 'room_type']],
             hue='room_type', diag_kind='hist', plot_kws={'alpha': 0.5})
plt.suptitle('Pair Plot of Numerical Variables by Room Type', y=1.02)
plt.show()

##### 1. Why did you pick the specific chart?

Pair plots provide a comprehensive view of relationships and distributions.

##### 2. What is/are the insight(s) found from the chart?

Price and availability show clustering by room type; entire homes have higher prices and availability.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

1. Host Performance: Dynamic pricing, review encouragement, and availability optimization empower hosts to earn more and maintain active listings.
2. Customer Satisfaction: Personalized recommendations and flexible options enhance the guest experience, fostering loyalty.
3. Revenue and Competitiveness: Targeted marketing and innovative offerings increase bookings and market presence.

**1. Optimizing Pricing Strategies**

* Use machine learning models (like regression analysis) to predict optimal listing prices based on location, property type, seasonality, and guest reviews.
* Provide dynamic pricing recommendations for hosts to maximize occupancy and revenue.

**2. Enhancing Customer & Host Experience**

* Perform sentiment analysis on guest reviews to understand customer satisfaction and suggest improvements for hosts.
* Identify top amenities and features that drive higher bookings and recommend them to hosts.
* Use clustering techniques to segment guests based on their booking behavior and preferences.

**3. Strengthening Platform Security**

* Implement anomaly detection algorithms to flag fraudulent or suspicious listings (e.g., fake reviews, duplicate listings).
* Use natural language processing (NLP) to detect potential scam or misleading descriptions in listings.

**4. Targeted Marketing Strategies**

* Develop personalized marketing campaigns using guest segmentation insights.
* Identify high-demand travel seasons and locations to optimize Airbnb’s advertising efforts.
* Use A/B testing to refine pricing discounts, promotions, and new features for better engagement.

**5. Business Expansion & Service Innovation**

* Identify underutilized or high-growth areas where Airbnb should encourage more hosts to list properties.
* Develop AI-powered smart recommendations for users based on their past bookings and browsing history.
* Suggest additional service offerings like travel packages, premium listings, or experience-based stays to increase revenue.

By implementing these data-driven strategies, Airbnb can strengthen its ecosystem, ensuring hosts thrive, guests are delighted, and the platform remains a leader in the travel industry, ultimately driving revenue growth.
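
As a rough illustration of the regression idea under pricing strategies, the sketch below fits a least-squares model of log price on room type and borough using only numpy and pandas. Everything here is an assumption made for the example: the data is made up, the feature set is minimal, and a real pricing tool would need many more inputs and proper validation.

```python
import numpy as np
import pandas as pd

# Made-up listings (not taken from the dataset)
toy_df = pd.DataFrame({
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt",
                  "Private room", "Shared room", "Shared room"],
    "neighbourhood_group": ["Manhattan", "Manhattan", "Queens",
                            "Queens", "Manhattan", "Queens"],
    "price": [220, 110, 130, 65, 60, 35],
})

# One-hot encode the categories; drop_first avoids redundant columns
features = pd.get_dummies(
    toy_df[["room_type", "neighbourhood_group"]], drop_first=True
).astype(float)

# Design matrix with an intercept column; target is log(price)
X = np.column_stack([np.ones(len(features)), features.to_numpy()])
y = np.log(toy_df["price"].to_numpy(dtype=float))

# Ordinary least squares fit
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
coef_by_name = pd.Series(coef, index=["intercept", *features.columns])
print(coef_by_name.round(3))

# Back-transform the fitted values to dollars
predicted_price = np.exp(X @ coef)
print(predicted_price.round(0))
```

Modelling log price keeps predictions positive and makes each coefficient read as an approximate percentage effect, which suits a skewed price distribution.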

# **Conclusion**

This project demonstrates how data analysis can transform raw listing data into actionable strategies, aligning with Airbnb's goals of informed business decisions, enhanced marketing, and innovation. By implementing these solutions, the client can achieve sustainable growth and maintain its leadership in personalized travel.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***