# **Project Name**    - Airbnb Bookings Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1**    - Rutika Kishor Khobragade


# **Project Summary -**




**Project Summary: Airbnb Bookings Analysis**

The objective of the **Airbnb Bookings Analysis** project is to analyze and interpret booking data to uncover valuable insights that can help optimize both host performance and platform efficiency. This analysis focuses on understanding patterns, trends, and factors that influence booking behavior, pricing strategies, and overall market dynamics within the Airbnb ecosystem.

Key components of the project include:

1. **Data Collection and Preparation**:
   - Gathering historical booking data, including information on reservations, guest demographics, property characteristics, pricing, and location.
   - Cleaning and preparing the dataset to ensure accuracy and consistency for analysis.

2. **Trend Analysis**:
   - Identifying booking trends based on time (seasonality), location, and property type.
   - Analyzing seasonal variations and peak booking periods to optimize pricing and availability.

3. **Pricing Optimization**:
   - Evaluating the impact of pricing strategies on booking volume and revenue generation.
   - Investigating the correlation between price, occupancy rates, and guest satisfaction to determine ideal pricing models for hosts.

4. **Performance Metrics**:
   - Calculating key performance indicators (KPIs) such as occupancy rates, revenue per available room (RevPAR), and average nightly rates.
   - Benchmarking host performance based on guest reviews, response time, and cancellation rates.

5. **Guest Behavior Analysis**:
   - Analyzing guest preferences, including booking lead time, length of stay, and specific amenities.
   - Segmenting guests by demographics and behavior to identify target groups for personalized marketing strategies.

6. **Geographic Insights**:
   - Identifying high-demand locations, emerging markets, and areas with growth potential for new listings.
   - Comparing urban versus rural listings to understand regional differences in booking patterns.

7. **Competitor and Market Analysis**:
   - Benchmarking individual listings against similar properties within the same region to gauge competitive standing.
   - Analyzing competitor pricing, occupancy rates, and guest feedback to uncover opportunities for improvement.

8. **Recommendation and Strategy Development**:
   - Based on insights gained from the analysis, providing actionable recommendations for pricing strategies, host performance improvement, and market positioning.
   - Suggesting tactics for maximizing occupancy, enhancing guest satisfaction, and improving revenue for both hosts and Airbnb.

The ultimate goal of this project is to provide a data-driven framework for improving booking rates, increasing host profitability, and enhancing the overall guest experience on Airbnb. By leveraging advanced analytical techniques, the project aims to empower hosts with actionable insights to make informed decisions and optimize their listings for better performance on the platform.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


**Problem Statement: AirBnB Bookings Analysis**

The objective of this analysis is to gain insights into the factors influencing bookings and pricing for AirBnB properties across various cities. The analysis will identify trends, patterns, and key drivers of booking behavior to help property owners, investors, and platform managers optimize their operations.

### Key Questions to Address:
1. **What are the primary factors affecting booking decisions?**
   - Investigate how features like location, property type, price, host ratings, and availability influence booking frequency.
   
2. **How do pricing trends vary by location, season, and property type?**
   - Analyze price fluctuations across different cities, seasons (e.g., holiday vs off-peak periods), and property types (e.g., apartments, houses, shared rooms).

3. **What is the impact of host reviews and ratings on booking probability?**
   - Examine the relationship between the number of reviews and overall ratings on the likelihood of a booking.

4. **Which factors contribute to higher or lower occupancy rates?**
   - Study the factors that lead to higher occupancy rates, such as property characteristics, seasonal demand, or the timing of the booking.

5. **How do booking patterns differ across geographic regions?**
   - Compare booking patterns across different cities or regions to understand local preferences and demand.

### Deliverables:
- A detailed report containing visualizations, trends, and insights on the relationship between property features and bookings.
- Predictive models (if applicable) for booking demand or pricing strategies.
- Recommendations for property owners on optimizing listings for higher occupancy and better pricing strategies.

### Data Required:
- AirBnB listing data (including features like location, price, type, number of bedrooms, host information, reviews, etc.)
- Booking data (dates of bookings, cancellations, length of stays)
- Seasonal and event data that might affect demand.

The analysis will inform decision-making for hosts and platform managers, leading to optimized pricing, targeted marketing strategies, and improved customer satisfaction.

#### **Define Your Business Objective?**

The business objective for an Airbnb bookings analysis is to gain insights into factors that influence booking patterns, pricing trends, occupancy rates, and customer preferences. This can help optimize property management, pricing strategies, and marketing efforts to maximize revenue, improve guest experiences, and enhance operational efficiency. Specific objectives might include:

1. **Identifying High-Demand Periods and Locations:** Understanding peak seasons and popular areas to adjust pricing and marketing strategies accordingly.
2. **Price Optimization:** Analyzing how different factors (e.g., property size, location, amenities) impact pricing and occupancy rates to set competitive rates.
3. **Guest Demographics and Preferences:** Identifying patterns in guest behavior, such as booking lead time, length of stay, and preferred amenities, to personalize offerings.
4. **Revenue Maximization:** Identifying strategies that increase overall revenue, such as adjusting pricing dynamically, offering discounts, or increasing visibility during high-demand times.
5. **Competitive Benchmarking:** Analyzing competitor listings to assess competitive advantages and areas for improvement in service offerings or pricing.
6. **Operational Efficiency:** Identifying areas to streamline operations, such as cleaning schedules, check-in/check-out times, and property maintenance.
7. **Improving Customer Satisfaction:** Analyzing guest reviews and ratings to identify opportunities to enhance service quality and guest experience.

Overall, the aim is to leverage data-driven insights to make informed decisions that drive profitability and growth for Airbnb hosts or property management teams.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***



### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
data = pd.read_csv('/content/drive/MyDrive/Airbnb_project/Airbnb NYC 2019 (1).csv')

### Dataset First View

In [None]:
# Dataset First Look
data

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
data.shape

### Dataset Information

In [None]:
# Dataset Info
data.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
len(data[data.duplicated()])

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
data.isnull().sum()

In [None]:
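# Visualizing the missing values
missing_counts = data.isnull().sum()
# Pass x and y explicitly so the result does not depend on how
# the installed seaborn version interprets a bare Series
sns.barplot(x=missing_counts.index, y=missing_counts.values)
plt.xticks(rotation=90)
plt.xlabel('Columns')
plt.ylabel('Missing Values')
plt.title('Missing Values in Each Column')
plt.show()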

### What did you know about your dataset?

***Dataset contains 48,895 listings with 16 columns, covering Airbnb listings in New York City from 2019. Below is a breakdown of the key features:***

##### **Key Columns and Their Meanings:**
* id – Unique listing identifier
* name – Name of the listing (some missing values)
* host_id – Unique ID of the host
* host_name – Name of the host (some missing values)
* neighbourhood_group – The major boroughs of NYC (e.g., Manhattan, Brooklyn)
* neighbourhood – The specific neighborhood within the borough
* latitude & longitude – Coordinates of the listing
* room_type – Type of accommodation (Entire home/apt, Private room, Shared room)
* price – Price per night in USD
* minimum_nights – Minimum nights required for a booking
* number_of_reviews – Total reviews received
* last_review – Date of the most recent review (some missing values)
* reviews_per_month – Average reviews per month (missing where no reviews exist)
* calculated_host_listings_count – Number of listings by the same host
* availability_365 – Number of days the listing is available in a year

##### **Observations:**
* Missing Data: Some values in name, host_name, last_review, and reviews_per_month are missing.
* Pricing: There could be outliers in price (extremely high/low values).
* Availability: Some listings have availability_365 = 0, meaning they were inactive.
* Reviews: reviews_per_month is NaN for listings with no reviews.

##### **Possible Analyses:**
* Price Trends: Distribution of prices and factors influencing them.
* Top Hosts: Hosts with the most listings.
* Popular Neighborhoods: Areas with the most listings or highest occupancy.
* Review Sentiment Analysis: Only possible if review text is obtained from another source; this dataset holds review counts and dates but no review text.
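
The observation that `reviews_per_month` is missing only for listings without reviews can be checked directly. The sketch below uses a small made-up frame (the rows are illustrative, not taken from the dataset); on the real data, `sample` would be replaced by `data`.

```python
import numpy as np
import pandas as pd

# Illustrative rows only; they mimic the relevant columns of the dataset.
sample = pd.DataFrame({
    "number_of_reviews": [0, 12, 0, 3],
    "last_review": [None, "2019-05-01", None, "2018-11-20"],
    "reviews_per_month": [np.nan, 0.8, np.nan, 0.1],
})

no_reviews = sample["number_of_reviews"] == 0
missing_rpm = sample["reviews_per_month"].isna()

# True when both masks select exactly the same rows, i.e. the NaNs
# are explained entirely by listings that have never been reviewed.
nan_only_without_reviews = bool((no_reviews == missing_rpm).all())
print(nan_only_without_reviews)
```

If the check holds on the full dataset, filling `reviews_per_month` with 0 is a reasonable choice rather than an arbitrary one.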










## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
data.columns

In [None]:
# Dataset Describe
data.describe(include="all")

### Variables Description

The dataset contains 16 columns, including both categorical and numerical features. Here's an overview of the columns:

* id - Unique identifier for each listing.
* name - Name of the listing.
* host_id - Unique identifier for each host.
* host_name - Name of the host.
* neighbourhood_group - Broad area group (e.g., Manhattan, Brooklyn).
* neighbourhood - Specific area within the group.
* latitude - Latitude coordinate of the listing.
* longitude - Longitude coordinate of the listing.
* room_type - Type of room (e.g., Entire home/apt, Private room).
* price - Price per night.
* minimum_nights - Minimum number of nights required for booking.
* number_of_reviews - Total number of reviews for the listing.
* last_review - Date of the last review.
* reviews_per_month - Average number of reviews per month.
* calculated_host_listings_count - Total number of listings per host.
* availability_365 - Number of days the listing is available in a year.



### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for i in data.columns.tolist():
  print("Number of unique values in" , i , "=" , data[i].nunique())

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
df = data.select_dtypes(include=['int64','float64','object'])
df

In [None]:
# Make the dataset ready for analysis.
# Convert 'last_review' to datetime
df['last_review'] = pd.to_datetime(df['last_review'])

# Fill missing values in 'reviews_per_month' with 0
# (assign the result back; chained inplace fillna is deprecated in newer pandas versions)
df['reviews_per_month'] = df['reviews_per_month'].fillna(0)

# Drop rows with missing 'name' or 'host_name'
df.dropna(subset=['name', 'host_name'], inplace=True)

# Remove duplicate entries if any
df.drop_duplicates(inplace=True)

# Remove outliers in 'price' and 'minimum_nights'
df = df[(df['price'] > 0) & (df['price'] < df['price'].quantile(0.99))]
df = df[(df['minimum_nights'] < df['minimum_nights'].quantile(0.99))]

# Save the cleaned dataset
cleaned_file_path = "Airbnb_NYC_2019_Cleaned.csv"
df.to_csv(cleaned_file_path, index=False)

print("Dataset cleaned and saved successfully.")

### What all manipulations have you done and insights you found?

### **Manipulations Performed:**
1. **Converted `last_review` to datetime** for better date-based analysis.
2. **Filled missing `reviews_per_month` with 0** (assuming no reviews).
3. **Removed duplicate entries** to ensure unique listings.
4. **Dropped rows with missing `name` or `host_name`** (minimal data loss).
5. **Filtered extreme outliers** in:
   - `price` (kept values below the 99th percentile).
   - `minimum_nights` (kept values below the 99th percentile).
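
The steps above can also be wrapped in one function that returns a cleaned copy and leaves the raw frame untouched. This is a minimal sketch, not the notebook's actual cell: the function name `clean_listings`, the parameter `upper_q`, and the toy rows are all hypothetical.

```python
import numpy as np
import pandas as pd

def clean_listings(raw: pd.DataFrame, upper_q: float = 0.99) -> pd.DataFrame:
    """Return a cleaned copy of `raw`; the input frame is not modified."""
    out = raw.copy()
    out["last_review"] = pd.to_datetime(out["last_review"])
    out["reviews_per_month"] = out["reviews_per_month"].fillna(0)
    out = out.dropna(subset=["name", "host_name"]).drop_duplicates()
    # Thresholds are computed one after the other, price first.
    price_cap = out["price"].quantile(upper_q)
    out = out[(out["price"] > 0) & (out["price"] < price_cap)]
    nights_cap = out["minimum_nights"].quantile(upper_q)
    out = out[out["minimum_nights"] < nights_cap]
    return out.reset_index(drop=True)

# Illustrative rows only.
toy = pd.DataFrame({
    "name": ["A", "B", None, "D", "E"],
    "host_name": ["h1", "h2", "h3", "h4", "h5"],
    "price": [100, 0, 80, 150, 5000],
    "minimum_nights": [1, 2, 3, 2, 30],
    "last_review": ["2019-01-01", None, "2019-02-01", None, "2019-03-01"],
    "reviews_per_month": [1.0, np.nan, 0.5, np.nan, 2.0],
})

cleaned = clean_listings(toy)
print(cleaned)
```

Because both filters use a strict `<` against the quantile, the largest remaining value is always dropped, which is very visible on a frame this small and negligible on roughly 48,000 rows.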

---

### **Key Insights from the Cleaned Dataset:**
1. **Price Analysis:**
   - Average price: **$137.43**.
   - 75% of listings are **below $175**.
   - Price range (after filtering): **$10 - $795**.

2. **Room Type Distribution:**  
   - The dataset includes **entire apartments, private rooms, and shared spaces**.

3. **Minimum Nights Analysis:**
   - Median stay: **2 nights**.
   - Majority of listings require **1-5 nights** minimum.

4. **Availability:**
   - Average availability per year: **111 days**.
   - 50% of listings have availability of **43 days or less**.

5. **Host Activity:**
   - Average listings per host: **7.18**.
   - Some hosts manage **over 300 listings**.
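
The notebook does not show how the figures above were computed. One point worth checking is the host average: if it is the plain mean of `calculated_host_listings_count`, every host is counted once per listing, which inflates the value. The sketch below contrasts the two readings on made-up rows.

```python
import pandas as pd

# Illustrative rows: host 1 owns three listings, hosts 2 and 3 own one each.
toy = pd.DataFrame({
    "host_id": [1, 1, 1, 2, 3],
    "calculated_host_listings_count": [3, 3, 3, 1, 1],
    "price": [100, 120, 80, 200, 150],
})

# Mean over listings: hosts with many listings are counted once per listing.
per_listing_mean = toy["calculated_host_listings_count"].mean()

# Mean over hosts: each host is counted exactly once.
per_host_mean = (
    toy.drop_duplicates("host_id")["calculated_host_listings_count"].mean()
)

# count, mean, std, min, quartiles, and max of price in one call.
price_summary = toy["price"].describe()

print(per_listing_mean, per_host_mean)
print(price_summary)
```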

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
# distribution of room_type
plt.figure(figsize=(7,5))
sns.countplot(x=data['room_type'],color='red')
plt.xticks(rotation=45)
plt.title("Distribution of room_type")
plt.show()


##### 1. Why did you pick the specific chart?

A count plot provides a simple and effective way to visualize the distribution of categorical variables. For room types, it helps us see which type of accommodation is most common.

##### 2. What is/are the insight(s) found from the chart?

Entire homes/apartments are the most common listing type, followed by private rooms. Shared rooms are much less common.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact: This insight highlights that entire homes and private rooms dominate the market, which aligns with customer preference for more private experiences. Airbnb could focus marketing efforts on these categories, which are already popular.

Negative Growth: Shared rooms are rare, and the dataset has no hotel-room category at all. This could indicate untapped potential for collaborations with boutique hotels, or a lack of competitive edge in these categories, which could be explored further.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
# longitude distribution
plt.figure(figsize=(7,5))
sns.histplot(x=data['longitude'],bins=10,kde=True,color='blue')
plt.title("Distribution of longitude")
plt.show()

##### 1. Why did you pick the specific chart?

Longitude is a continuous numerical variable, so a histogram is a natural way to show its distribution. It helps identify where most Airbnb listings are concentrated across NYC along the east-west axis.

##### 2. What is/are the insight(s) found from the chart?

* The longitude values for NYC typically range from -74.25 to -73.7, with different boroughs concentrated in specific regions.
* The highest concentration of listings appears around Manhattan (-74.00 to -73.95) and Brooklyn (-73.95 to -73.85).
* Fewer listings exist towards the far east (Queens) or far west (Staten Island).
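
The longitude ranges quoted for each borough are read off the histogram, so they are approximate. A grouped summary gives the exact bounds. The sketch below uses made-up coordinates; on the real data, `toy` would be replaced by `data`.

```python
import pandas as pd

# Illustrative coordinates only.
toy = pd.DataFrame({
    "neighbourhood_group": [
        "Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Staten Island",
    ],
    "longitude": [-73.99, -73.95, -73.96, -73.90, -74.15],
})

# Westernmost and easternmost listing per borough, plus the listing count.
lon_ranges = (
    toy.groupby("neighbourhood_group")["longitude"]
    .agg(["min", "max", "count"])
    .sort_values("min")
)
print(lon_ranges)
```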



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact: The peaks in the longitude distribution show that Airbnb listings are densely concentrated in Manhattan and Brooklyn, which are prime locations for tourists. Hosts and investors can focus on these high-demand areas to maximize bookings and revenue. If the less dense areas have growing tourism potential, hosts could expand there to benefit from lower competition and better profitability.

Negative Growth: If too many listings exist in high-density locations (e.g., Midtown Manhattan, Williamsburg), it may lead to oversupply. NYC has strict short-term rental regulations, especially in Manhattan, where entire home rentals are often restricted. If Airbnb listings exceed legal limits, crackdowns and fines could hurt business growth. Where some areas have fewer listings, this may indicate low tourist demand rather than an untapped opportunity.






#### Chart - 3

In [None]:
# Chart - 3 visualization code
# Distribution of neighbourhood group
plt.figure(figsize=(7,5))
sns.countplot(x=data['neighbourhood_group'],color='green')
plt.xticks(rotation=45)
plt.title("Distribution of neighbourhood_group")
plt.show()

##### 1. Why did you pick the specific chart?

`neighbourhood_group` is a categorical variable that represents the boroughs (Manhattan, Brooklyn, Queens, Bronx, Staten Island). Since these values are discrete categories, a count plot is a good way to visualize their distribution.


##### 2. What is/are the insight(s) found from the chart?

* The highest number of Airbnb listings are in Manhattan and Brooklyn.
* Queens has a noticeable number of listings but significantly fewer than Manhattan and Brooklyn.
* This suggests moderate demand, likely due to its proximity to airports and lower prices compared to Manhattan.
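
Phrases such as "significantly fewer" are easier to judge with numbers next to the chart. A minimal sketch of the counts and shares behind a count plot, using a made-up borough column (on the real data, `data['neighbourhood_group']` would take its place):

```python
import pandas as pd

# Illustrative values only.
boroughs = pd.Series(
    ["Manhattan"] * 4 + ["Brooklyn"] * 4 + ["Queens"] * 2,
    name="neighbourhood_group",
)

counts = boroughs.value_counts()                          # listings per borough
shares = boroughs.value_counts(normalize=True).round(3)   # fraction of all listings

summary = pd.DataFrame({"listings": counts, "share": shares})
print(summary)
```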


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact: Hosts can maximize occupancy and revenue by focusing on high-demand areas (Manhattan, Brooklyn). Queens and the Bronx might offer growth opportunities with lower competition. Hosts can set competitive prices based on demand in each borough.

Negative Growth: The high number of listings in Manhattan and Brooklyn means intense competition. New hosts may struggle to get bookings or may need to lower prices, affecting profitability. Increased regulatory enforcement could reduce available listings, impacting hosts who rely on short-term stays. While the Bronx and Staten Island have fewer listings, this might be due to low tourist interest, meaning low occupancy rates. Investing in these areas without demand research could lead to financial losses.



#### Chart - 4

In [None]:
# Chart - 4 visualization code
# Distribution of minimum nights
plt.figure(figsize=(7,5))
sns.histplot(x=data['minimum_nights'],bins=10,kde=True,color='green')
plt.title("Distribution of minimum_nights")
plt.show()

##### 1. Why did you pick the specific chart?

A histogram is ideal for visualizing the distribution of numerical data. It helps identify common values, trends, and outliers. This chart shows the most common minimum stay requirements set by Airbnb hosts. The KDE (Kernel Density Estimate) curve smooths the distribution so the overall shape is easier to see.


##### 2. What is/are the insight(s) found from the chart?

* The majority of listings have low minimum night requirements, with a sharp decline as the number increases. This suggests that most Airbnb hosts set a low minimum stay requirement.
* There may be extreme values (e.g., listings requiring unusually high minimum nights), which could skew the analysis.
* A significant number of listings have a minimum stay of 1-3 nights, indicating that short-term rentals are dominant in the dataset.
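
With only 10 bins and a few extreme values, most listings fall into the first bar, so the "1-3 nights" claim is hard to read off the histogram. It can be quantified directly, and the long tail can be capped before plotting. The values below are made up, and the cap of 30 nights is an assumption chosen for illustration.

```python
import pandas as pd

# Illustrative values only; includes two extreme minimum-stay requirements.
nights = pd.Series([1, 1, 2, 3, 2, 30, 365, 1, 4, 2], name="minimum_nights")

# Fraction of listings whose minimum stay is between 1 and 3 nights (inclusive).
share_1_to_3 = nights.between(1, 3).mean()

# Cap the tail so a histogram of `capped` shows the bulk of the distribution.
capped = nights.clip(upper=30)

print(share_1_to_3, capped.max())
```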


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact: Shorter minimum stays are likely more attractive to tourists and short-term travelers. Airbnb could use this insight to encourage hosts to lower their minimum stay requirements to attract more bookings.

Negative Growth: Listings with long minimum stays may limit flexibility, which could reduce bookings, particularly among tourists who prefer shorter stays.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
# Price distribution across neighbourhood_group
plt.figure(figsize=(7,5))
sns.boxplot(data=data , x = 'neighbourhood_group' , y = 'price')
plt.title('neighbourhood_group v/s price')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a boxplot because it is well suited to visualizing the distribution of numerical data (price) across different categories (neighbourhood_group). The boxplot shows the median price, interquartile range (IQR), and outliers, helping identify variations in price across boroughs.

##### 2. What is/are the insight(s) found from the chart?

* Manhattan has the highest median price and the widest range of prices, including many high-price outliers.
* Queens, The Bronx, and Staten Island have relatively lower median prices and fewer extreme outliers.
* There are many extreme high-price listings, especially in Manhattan and Brooklyn.
* Staten Island and The Bronx have more stable prices with fewer fluctuations.
* Manhattan and Brooklyn show greater variability, likely due to higher demand and a mix of budget and luxury accommodations.
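
The quantities a boxplot draws can also be tabulated, which makes statements about medians and outlier counts checkable. The sketch below uses the common 1.5 x IQR whisker rule (also seaborn's default); the helper name `box_stats` and the toy prices are hypothetical.

```python
import pandas as pd

def box_stats(df: pd.DataFrame, group_col: str, value_col: str) -> pd.DataFrame:
    """Quartiles and the number of points beyond the 1.5 * IQR whiskers, per group."""
    rows = {}
    for key, values in df.groupby(group_col)[value_col]:
        q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
        iqr = q3 - q1
        is_outlier = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
        rows[key] = {
            "q1": q1, "median": median, "q3": q3,
            "outliers": int(is_outlier.sum()),
        }
    return pd.DataFrame.from_dict(rows, orient="index")

# Illustrative prices only.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 5 + ["Bronx"] * 5,
    "price": [100, 150, 200, 250, 2000, 50, 60, 70, 80, 90],
})

stats = box_stats(toy, "neighbourhood_group", "price")
print(stats)
```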


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact: If certain neighbourhood_groups have higher median prices with reasonable price distribution (i.e., not many extreme outliers), those areas could be prioritized for marketing or expansion.

Negative Growth: If a neighbourhood_group has a lot of extreme price outliers or very low median prices, it could indicate lower demand or potential overpricing issues, leading to fewer bookings.


#### Chart - 6

In [None]:
# Chart - 6 visualization code
# price v/s room type
plt.figure(figsize=(8,6))
sns.boxplot(x='room_type',y='price',data=data)
plt.title("price v/s room type")
plt.show()

##### 1. Why did you pick the specific chart?

A box plot is ideal for comparing the distribution of prices across different room types, allowing us to detect variations and outliers.

##### 2. What is/are the insight(s) found from the chart?

Entire homes/apartments are the most expensive, followed by private rooms. Shared rooms are the cheapest.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact: This price variation confirms that Airbnb can serve different customer segments, from budget to luxury. However, price competition among cheaper room types (private/shared rooms) may necessitate added value through better amenities or experiences.

Negative Growth: If prices for private rooms or shared spaces become too competitive, some hosts might leave the platform due to lower profitability.



#### Chart - 7

In [None]:
# Chart - 7 visualization code
plt.figure(figsize=(7,5))
sns.barplot(data=data , x = 'room_type' , y = 'price' , hue = 'neighbourhood_group' , palette = 'Set2')
plt.title('price by room_type and neighbourhood_group')
plt.show()

##### 1. Why did you pick the specific chart?

The bar plot was chosen because it is an effective way to compare categorical variables, in this case "room_type" and "neighbourhood_group", against a numerical value, "price." A bar plot clearly displays how different room types (e.g., Entire home/apt, Private room, Shared room) vary in price across different neighborhood groups. Bar plots make it easy to compare averages and identify trends.



##### 2. What is/are the insight(s) found from the chart?

* Entire homes/apartments have the highest average price across all neighbourhood groups, followed by private rooms, while shared rooms have the lowest prices.
* Manhattan has the highest prices overall, especially for entire homes/apartments. This aligns with the general expectation that Manhattan is the most expensive borough.
* Brooklyn follows as the second most expensive borough, particularly for entire homes/apartments, but prices are lower compared to Manhattan.
* Queens, Bronx, and Staten Island have significantly lower prices, especially for private and shared rooms.
* Shared rooms have relatively low variation in price across different boroughs, indicating a more consistent pricing structure.
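
By default `sns.barplot` draws the mean of `y` for each category combination, so the bar heights correspond to a pivot table of mean prices. A minimal sketch on made-up rows (on the real data, `toy` would be replaced by `data`):

```python
import pandas as pd

# Illustrative rows only.
toy = pd.DataFrame({
    "neighbourhood_group": [
        "Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Queens", "Manhattan",
    ],
    "room_type": [
        "Entire home/apt", "Private room", "Entire home/apt",
        "Private room", "Private room", "Entire home/apt",
    ],
    "price": [250, 120, 180, 75, 60, 300],
})

# Rows: room type, columns: borough, cells: mean nightly price.
mean_price = toy.pivot_table(
    index="room_type",
    columns="neighbourhood_group",
    values="price",
    aggfunc="mean",
)
print(mean_price)
```

Combinations with no listings show up as `NaN`, which is easier to notice in the table than as a missing bar in the chart.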

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact: Manhattan listings, especially entire apartments, have the highest average prices. This suggests strong demand, making it a profitable location for Airbnb hosts. Businesses targeting premium customers can focus on luxury rentals in Manhattan. Entire homes/apartments have the highest prices across all neighborhoods, while private and shared rooms are more affordable. This allows Airbnb to cater to different customer segments, from budget travelers to luxury-seeking tourists.

Negative Growth: If some listings are priced too high relative to demand (e.g., high-priced private rooms in less popular areas), they may struggle to attract bookings. Hosts may need to adjust pricing based on competitive analysis.



#### Chart - 8

In [None]:
# Chart - 8 visualization code
plt.figure(figsize=(10,5))
sns.barplot(data=data , x = 'neighbourhood_group' , y = 'minimum_nights' , hue = 'room_type' , palette = 'Set3')
plt.title('minimum_nights across neighbourhood_group and room_type')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is ideal for comparing categorical variables, such as neighbourhood_group and room_type, against a numerical variable (minimum_nights). It allows easy visualization of trends and differences between neighborhoods and room types.

##### 2. What is/are the insight(s) found from the chart?

* The Bronx and Staten Island generally have higher average minimum night requirements across all room types compared to other boroughs.
* Across all boroughs, entire home/apartment listings tend to have higher minimum night requirements than private or shared rooms.
* Hosts may prefer longer stays to reduce turnover costs and vacancy gaps.
* The remaining boroughs show lower average minimum night requirements, likely due to higher demand from tourists looking for short-term rentals.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact: The variation in minimum nights across room types and neighborhoods allows hosts to target different types of travelers (short-term tourists vs. long-term stays). Entire homes/apartments often have higher minimum night requirements, appealing to long-term visitors or business travelers. Hosts in high-demand areas (like Manhattan) can adjust their minimum stay policy to optimize revenue and reduce operational costs (e.g., cleaning fees). Encouraging shorter minimum stays in popular locations could attract more guests and increase occupancy rates.

Negative Growth: If some boroughs (e.g., Staten Island or The Bronx) require longer minimum stays, they may struggle to attract short-term tourists, leading to lower occupancy rates. If some areas enforce high minimum stay policies due to local regulations, Airbnb hosts might lose potential short-term rental customers. This could force hosts to rely only on long-term bookings, which might not always be consistent.





#### Chart - 9

In [None]:
# Chart - 9 visualization code
plt.figure(figsize=(7,5))
sns.scatterplot(x='latitude', y='price', data=data , color='pink')
plt.title('latitude v/s price')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a scatter plot because it is ideal for visualizing the relationship between two continuous variables: latitude (location) and price (listing price). The scatter plot helps identify trends, patterns, or outliers in the data. If there is a geographical trend in pricing, such as higher prices in certain areas, this plot will reveal that.

##### 2. What is/are the insight(s) found from the chart?

* The prices are scattered across different latitudes, but certain locations seem to have higher price clusters.
* Higher prices might be concentrated in specific latitude ranges, potentially indicating premium locations.
* There are some extreme price points, suggesting the presence of luxury listings that charge significantly more than the average.
* The majority of listings fall within a moderate price range, indicating a more balanced market distribution.
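
A scatter plot suggests whether a relationship exists, and a correlation coefficient puts a number on it. The sketch below computes Pearson (linear) and Spearman (rank-based) correlations on made-up points; Spearman is obtained by correlating ranks, which avoids any extra dependency. A value near zero would not rule out a non-monotonic pattern, such as prices peaking in a middle latitude band.

```python
import pandas as pd

# Illustrative points only.
toy = pd.DataFrame({
    "latitude": [40.60, 40.65, 40.70, 40.75, 40.80],
    "price": [60, 80, 150, 300, 90],
})

# Pearson: strength of the linear relationship.
pearson = toy["latitude"].corr(toy["price"])

# Spearman: Pearson correlation of the ranks, robust to extreme prices.
spearman = toy["latitude"].rank().corr(toy["price"].rank())

print(round(pearson, 3), round(spearman, 3))
```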




##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact: If higher prices cluster around specific latitudes (e.g., Manhattan), businesses can focus on acquiring or listing properties in those areas to maximize revenue. Hosts can optimize pricing based on geographic trends, setting competitive prices in high-demand locations.

Negative Growth: If listings in less desirable locations have high prices but low occupancy, it may indicate overpricing, leading to revenue loss. If prices are significantly lower in certain latitudes, it could indicate market saturation or declining demand, requiring a reassessment of property strategy. Some areas may have restrictions on short-term rentals, leading to potential legal challenges and business risks.



#### Chart - 10

In [None]:
# Chart - 10 visualization code
plt.figure(figsize=(7,5))
sns.barplot(data=data , x='room_type' , y='number_of_reviews' , hue='neighbourhood_group' )
plt.title('number_of_reviews across room_type and neighbourhood_group')
plt.show()


##### 1. Why did you pick the specific chart?

The bar chart was chosen because it allows easy comparison of the average number of reviews across different room types while also distinguishing between neighborhood groups using color (hue). Both room_type and neighbourhood_group are categorical variables, making a bar chart a good fit.

##### 2. What is/are the insight(s) found from the chart?

* Among all room types, private rooms generally have the highest number of reviews, suggesting they are more frequently booked and reviewed by guests.
* Listings in Manhattan and Brooklyn tend to have more reviews compared to other boroughs, indicating higher demand and guest engagement in these areas.
* Listings in Queens, Staten Island, and the Bronx generally receive fewer reviews, indicating lower demand compared to Manhattan and Brooklyn.
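
The bar heights behind these observations can be checked numerically. The sketch below assumes only that seaborn's `barplot` draws the mean of `y` for each `x`/`hue` pair (its default estimator). The `sample` frame and its values are made up for illustration and stand in for the notebook's `data`.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `data` DataFrame (values are invented).
sample = pd.DataFrame({
    "room_type": ["Private room", "Private room", "Private room",
                  "Entire home/apt", "Entire home/apt", "Entire home/apt",
                  "Shared room", "Shared room"],
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn",
                            "Manhattan", "Brooklyn", "Brooklyn",
                            "Manhattan", "Brooklyn"],
    "number_of_reviews": [30, 50, 40, 20, 25, 35, 5, 10],
})

# Mean reviews per (room_type, neighbourhood_group) pair:
# the same quantity the grouped bar chart displays as bar heights.
mean_reviews = (
    sample.groupby(["room_type", "neighbourhood_group"])["number_of_reviews"]
    .mean()
    .unstack("neighbourhood_group")
)
print(mean_reviews)
```

Running the same `groupby` on the real `data` frame would show which room type actually has the highest average review count in each borough.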

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact: Certain room types (e.g., "Entire home/apt") and specific neighborhood groups (e.g., Manhattan, Brooklyn) receive significantly more reviews, indicating high guest engagement and demand. Hosts can adjust pricing and marketing efforts based on room types that attract the most customer interaction. Investing in "Entire home/apt" properties in high-demand neighborhoods can increase booking rates.

Negative Growth: Some room types (e.g., "Shared room") may have significantly fewer reviews, indicating lower guest interest. Hosts with such listings might struggle to maintain occupancy and profitability. Some neighborhoods might have lower engagement due to accessibility, pricing, or safety concerns. Hosts should analyze why certain areas perform worse and adjust marketing strategies accordingly.


#### Chart - 11

In [None]:
# Chart - 11 visualization code
plt.figure(figsize=(7,5))
sns.lineplot(data=data , x='neighbourhood_group' , y='calculated_host_listings_count' , hue = 'room_type')
plt.title("calculated_host_listings_count across neighbourhood_group and room_type")
plt.show()

##### 1. Why did you pick the specific chart?

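A line plot connects the value for each neighbourhood group, with one line per room type, which makes differences between groups easy to follow. Because seaborn's lineplot aggregates with the mean by default, the chart shows how the average calculated_host_listings_count varies across neighbourhood groups and room types.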

##### 2. What is/are the insight(s) found from the chart?

* Some neighborhoods (e.g., Manhattan and Brooklyn) have a significantly higher number of host listings compared to others, indicating a concentration of Airbnb activity.
* "Entire home/apt" tends to have the highest number of listings across most neighborhood groups.
* "Private room" listings are also significant but vary by location.
* "Shared room" listings are consistently low across all neighborhoods, suggesting limited demand for such accommodations.
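
When reading this chart, it helps to separate two quantities: how many listings a borough has, and the average of `calculated_host_listings_count` among those listings. The sketch below assumes that column records how many listings each listing's host owns. The `sample` frame and `host_id` values are hypothetical.

```python
import pandas as pd

# Hypothetical sample: host 1 owns three listings, hosts 2 and 3 own one each.
sample = pd.DataFrame({
    "host_id": [1, 1, 1, 2, 3],
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn",
                            "Brooklyn", "Brooklyn"],
    "room_type": ["Entire home/apt"] * 5,
    "calculated_host_listings_count": [3, 3, 3, 1, 1],
})

# Number of listings per borough: one row per listing, so count the rows.
listing_counts = sample.groupby("neighbourhood_group").size()

# What the line plot aggregates: the mean per-host listing count
# over the rows that fall in each borough.
mean_host_portfolio = (
    sample.groupby("neighbourhood_group")["calculated_host_listings_count"].mean()
)

print(listing_counts)
print(mean_host_portfolio)
```

In this toy data Brooklyn has more listings than Manhattan but a lower mean per-host count, so the two measures can point in different directions.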

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact: The visualization shows how the number of listings varies by room_type across neighbourhood_group. Hosts can identify high-supply areas and strategically position new listings where demand is strong but competition is manageable. If specific room types (e.g., "Entire home/apt") dominate in certain neighborhoods, hosts can focus on those profitable options. Investors can adjust their offerings based on what works best in each borough.

Negative Growth: If some neighborhoods (e.g., Manhattan or Brooklyn) have extremely high listing counts for certain room types, oversaturation might reduce profitability. Hosts in such areas might experience lower occupancy rates or price competition, making it harder to maintain revenue. If some areas (e.g., the Bronx, Staten Island) have very few listings, it could indicate lower demand or regulatory restrictions. Hosts need to investigate whether these areas are genuinely underutilized opportunities or if low demand makes expansion unfeasible.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
plt.figure(figsize=(7,5))
sns.boxplot(data=data , x='room_type' , y='availability_365' , palette='Set1')
plt.title('room_type v/s availability')
plt.show()

##### 1. Why did you pick the specific chart?

A box plot was chosen because it is well-suited for visualizing the distribution of numerical data (availability_365) across different categories (room_type). It provides a clear summary of the spread, median, and possible outliers for each room type, making it easy to compare availability across categories.

##### 2. What is/are the insight(s) found from the chart?

* Entire home/apt listings generally have lower availability, with many having zero availability for the year. This suggests that hosts may rent them out seasonally or use them personally for part of the year.
* Private rooms tend to have a higher median availability compared to entire homes, meaning they are listed for rent more consistently.
* Shared rooms show the highest availability on average, often being available for most of the year. This could be because hosts renting out shared spaces are more likely to have them consistently available.
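
The statistics a box plot summarizes can also be tabulated directly. The following is a minimal sketch on made-up numbers. The `sample` frame is hypothetical, and the `share_zero` column is an illustrative addition for checking how many listings report zero available days.

```python
import pandas as pd

# Hypothetical availability values (days per year), four listings per room type.
sample = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 4 + ["Private room"] * 4 + ["Shared room"] * 4,
    "availability_365": [0, 0, 60, 200, 0, 90, 180, 365, 30, 150, 300, 365],
})

# Median availability and share of listings with zero available days, per room type.
summary = sample.groupby("room_type")["availability_365"].agg(
    median="median",
    share_zero=lambda s: (s == 0).mean(),
)
print(summary)
```

Applied to the real `data` frame, this table would confirm or contradict the ordering of medians read off the chart.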


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact: High availability of private and shared rooms can ensure more consistent bookings, leading to stable revenue.

Negative Growth: The low availability of entire homes/apartments could indicate that hosts rent these properties out only occasionally, limiting income potential. If demand is low, high availability doesn't translate into bookings, leading to revenue stagnation.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
plt.figure(figsize=(7,5))
sns.barplot(data=data , x='neighbourhood_group' , y='reviews_per_month' , hue='room_type' , palette='Set1')
plt.title('reviews_per_month across neighbourhood_group and room_type')
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot is ideal for comparing reviews_per_month across different neighbourhood_group categories, with a breakdown by room_type. Since neighbourhood_group and room_type are categorical variables, a bar plot effectively displays differences between them.

##### 2. What is/are the insight(s) found from the chart?

* Manhattan and Brooklyn receive the highest reviews per month, indicating strong demand in these areas. These boroughs are likely hotspots for Airbnb activity.
* Entire homes/apartments tend to get fewer reviews per month compared to private rooms, suggesting:
  * Guests who book entire homes may stay longer, leading to fewer total bookings and reviews.
  * Private rooms could be more frequently booked, leading to higher review counts.
* Staten Island has the lowest reviews per month across all room types, implying lower demand for Airbnb stays in this area.
* Shared rooms receive the fewest reviews overall, indicating limited popularity compared to other room types.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business Impact: Manhattan and Brooklyn show the highest reviews_per_month, indicating high guest engagement and frequent bookings. Businesses focusing on these areas can maximize revenue by targeting travelers looking for highly reviewed properties. Entire homes and private rooms in popular areas receive more reviews per month.

Negative Growth: Lower-engagement boroughs such as Staten Island have significantly fewer reviews per month, indicating lower guest engagement or fewer bookings. If these areas continue to have low demand, hosts may struggle with occupancy and profitability. Across most neighborhoods, shared rooms have the lowest reviews_per_month. This suggests lower demand, meaning that investments in shared accommodations might not yield high returns.


#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(12, 8))
corr_matrix = data[['price', 'number_of_reviews', 'reviews_per_month', 'availability_365']].corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap')
plt.show()

##### 1. Why did you pick the specific chart?

The heatmap was chosen because it effectively visualizes correlations between numerical variables, allowing us to identify relationships that impact business decisions. It helps in understanding how price, number of reviews, reviews per month, and availability relate to each other.

##### 2. What is/are the insight(s) found from the chart?

* Higher-priced listings tend to have fewer reviews.
* This suggests that expensive properties may have lower booking frequency or cater to a niche market.
* Listings with a high number of total reviews also receive frequent new reviews, suggesting steady guest turnover and high occupancy.
* Listings with very high availability do not necessarily get more reviews, indicating that just being available does not guarantee bookings.
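
Because price is heavily skewed, a few extreme listings can dominate the default Pearson coefficient. One possible cross-check, sketched here on made-up numbers, is to compare it with the rank-based Spearman coefficient via the `method` argument of `DataFrame.corr`. The `sample` frame is hypothetical.

```python
import pandas as pd

# Hypothetical listings: reviews fall as price rises, with one extreme price.
sample = pd.DataFrame({
    "price": [50, 80, 100, 150, 200, 10000],
    "number_of_reviews": [120, 90, 60, 40, 20, 1],
})

# Pearson measures linear association and is sensitive to the outlier.
pearson = sample.corr(method="pearson").loc["price", "number_of_reviews"]

# Spearman works on ranks, so it only measures monotonic association.
spearman = sample.corr(method="spearman").loc["price", "number_of_reviews"]

print(f"Pearson:  {pearson:.2f}")
print(f"Spearman: {spearman:.2f}")
```

In this toy data the relationship is perfectly monotonic, so Spearman reaches -1 while Pearson is weaker. On the real data the two coefficients may differ in either direction.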


#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# pairplot creates its own figure, so control its size with `height` rather than plt.figure
col = ['price', 'number_of_reviews', 'reviews_per_month', 'availability_365']
sns.pairplot(data=data, vars=col, corner=True, height=2.5)
plt.show()

##### 1. Why did you pick the specific chart?

A pairplot is an excellent choice for exploring relationships between multiple numerical variables. This chart shows how price, number_of_reviews, reviews_per_month, and availability_365 interact, and it helps detect trends, clusters, or patterns between different attributes. The scatterplots reveal whether two variables have a positive, negative, or no correlation (e.g., price vs. reviews). Some extreme values (e.g., very high prices) can be seen clearly, which helps in understanding whether outliers are affecting the data distribution. By setting corner=True, we avoid redundant duplicate plots, making it easier to interpret the relationships.


##### 2. What is/are the insight(s) found from the chart?

* The price variable shows an extreme right skew, meaning a few listings have exceptionally high prices.
* Higher-priced properties do not necessarily have more reviews or higher availability.
* Many expensive listings appear in the lower range of reviews, suggesting that premium listings may have lower occupancy or fewer bookings.
* Listings with a high total number of reviews also receive frequent new reviews, confirming high occupancy and guest engagement.
* High availability does not guarantee more reviews.
* Some properties are available for the full 365 days but have very few reviews, possibly due to low demand in those locations.
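
A common way to handle a right-skewed variable such as price before plotting is a log transform. The sketch below uses `numpy.log1p` on made-up prices. The `sample` frame and the `log_price` column name are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with one luxury listing far above the rest.
sample = pd.DataFrame({"price": [40, 60, 80, 100, 120, 150, 200, 5000]})

# log1p computes log(1 + x), which stays defined even if a price is 0.
sample["log_price"] = np.log1p(sample["price"])

# Sample skewness before and after the transform.
raw_skew = sample["price"].skew()
log_skew = sample["log_price"].skew()
print(f"skew of price:     {raw_skew:.2f}")
print(f"skew of log_price: {log_skew:.2f}")
```

Passing a log-transformed price column to the pair plot would spread out the bulk of the listings instead of compressing them against the axis.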

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

To achieve business objectives effectively, the client should focus on the following strategies based on insights from the dataset:

### **1. Optimize Pricing Strategy**
- **Target Affordable Price Ranges:** Listings with moderate pricing tend to attract more reviews and frequent bookings. Offer competitive pricing for high-demand areas like Manhattan and Brooklyn.
- **Analyze Premium Listings:** High-priced properties often have fewer reviews. Enhance marketing efforts or provide unique experiences to attract a niche clientele.

### **2. Focus on High-Demand Areas**
- **Maximize Presence in Manhattan and Brooklyn:** These neighborhoods have the highest engagement (reviews per month). Expanding inventory or improving visibility in these areas can boost revenue.
- **Improve Offerings in Emerging Areas:** Staten Island and the Bronx show lower engagement. Introduce promotional strategies or unique property features to attract travelers.

### **3. Leverage Property Types**
- **Promote Entire Homes and Private Rooms:** These categories consistently outperform shared rooms in terms of engagement. Prioritize investments in these types for stable revenue streams.
- **Reassess Shared Room Strategy:** Since shared rooms show less demand, focus on differentiating them with value-added services or discontinue underperforming listings.

### **4. Enhance Availability Optimization**
- **Align Availability with Demand:** Properties with 100–300 days of availability often perform better. Avoid overly extended availability for listings with limited demand.
- **Seasonal Adjustments:** Use historical data to align availability and pricing with peak tourist seasons to maximize occupancy and revenue.
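
One way to test an availability claim like this against the data is to bin `availability_365` and compare average review activity per bin. The sketch below uses `pandas.cut` on invented numbers, so it illustrates the method only and says nothing about the real dataset. The band edges and the `availability_band` column are assumptions.

```python
import pandas as pd

# Hypothetical listings (values are invented for illustration).
sample = pd.DataFrame({
    "availability_365": [0, 50, 120, 250, 310, 365],
    "reviews_per_month": [0.5, 1.0, 2.0, 3.0, 1.0, 0.5],
})

# Right-closed bins: (-1, 99], (99, 300], (300, 365].
bins = [-1, 99, 300, 365]
labels = ["0-99", "100-300", "301-365"]
sample["availability_band"] = pd.cut(
    sample["availability_365"], bins=bins, labels=labels
)

# Average reviews per month within each availability band.
band_means = sample.groupby("availability_band", observed=True)[
    "reviews_per_month"
].mean()
print(band_means)
```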

### **5. Improve Guest Engagement**
- **Focus on Guest Reviews:** Encourage guests to leave reviews, as properties with frequent reviews appear more reliable and attract more bookings.
- **Deliver Exceptional Experiences:** Enhance cleanliness, amenities, and service quality to boost guest satisfaction and repeat bookings.

### **6. Use Data-Driven Marketing**
- **Target High-Traffic Locations:** Focus on marketing properties in neighborhoods with proven demand.
- **Leverage Analytics:** Regularly review pricing, occupancy, and guest feedback to refine strategies.

By implementing these strategies, the client can effectively align with business objectives such as increasing occupancy, maximizing revenue, and improving customer satisfaction.

# **Conclusion**

### **Conclusion from the Airbnb NYC 2019 Data Analysis:**

1. **High-Demand Areas:**  
   - **Manhattan and Brooklyn** are the most active neighborhoods, receiving the highest number of reviews per month.  
   - Investing in these areas can maximize revenue and occupancy rates.

2. **Impact of Pricing:**  
   - Listings with **moderate pricing** receive more reviews, indicating higher demand.  
   - **Expensive properties tend to have fewer reviews**, suggesting lower booking frequencies.  
   - A **data-driven pricing strategy** is essential for optimizing occupancy.

3. **Room Type Performance:**  
   - **Entire homes and private rooms** perform significantly better than shared rooms in terms of engagement and reviews.  
   - Shared rooms have lower demand and might require additional marketing or discounts to attract bookings.

4. **Availability vs. Bookings:**  
   - **Simply increasing availability does not guarantee more bookings.**  
   - Listings with **300+ days of availability** do not necessarily have more reviews, suggesting demand-driven bookings rather than supply-driven.  
   - **Optimal availability management** (aligned with seasonal trends) is crucial.

5. **Guest Engagement & Reviews:**  
   - Listings with more reviews per month have higher guest engagement, indicating consistent bookings.  
   - Encouraging guest reviews and maintaining high service quality can improve property rankings and bookings.
