# **Project Name**    -    AIRBNB NYC 2019 BOOKING DATA ANALYSIS



##### **Project Type**    - EDA
##### **Contribution**    - Individual

# **Project Summary -**

The project centers around the exploratory data analysis (EDA) of the Airbnb dataset for New York City (NYC), which provides insights into Airbnb listings in the city, including their pricing, location, availability, and other key attributes. The goal of this project is to uncover patterns, trends, and relationships within the data, and to prepare it for potential modeling or further analysis. The analysis focuses on data wrangling, visualizations, and extracting meaningful insights from the dataset, ultimately aimed at better understanding the dynamics of Airbnb in NYC.

The dataset includes columns with information such as the listing’s price, availability, host details, neighborhood, number of reviews, location coordinates, room type, and more. To begin, the dataset was examined thoroughly to understand its structure and assess its quality. It contains both categorical and numerical data, which require different handling techniques to extract meaningful insights.

This project is divided into five major segments:

1. Knowing The Data

2. Understanding The Variables Present in Data

3. Data Wrangling

4. Data Visualization, Storytelling & Experimenting with charts

5. Solution to Business Objective


**Knowing The Data and Understanding The Variables** --

To work with any dataset and extract insights from it, the first step is to understand the dataset. This includes mounting the drive and reading the data. Next, we need to understand the variables (columns) present in the dataset and the type of data each one holds.

**Data Wrangling** --  
One of the first steps in any EDA project is preparing the data for analysis. In particular, some variables had to be converted to appropriate data types, ensuring that date columns were in datetime format and numerical columns were in integer or float formats.
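The type conversions described above can be illustrated on a tiny hypothetical sample rather than the real CSV. This is a minimal sketch assuming the raw values arrive as strings; `errors="coerce"` turns values that cannot be parsed into `NaT`/`NaN` instead of raising an error:

```python
import pandas as pd

# Hypothetical three-row sample mimicking raw CSV columns (all read as strings)
raw = pd.DataFrame({
    "last_review": ["2019-05-21", "not a date", None],
    "price": ["149", "225", "150"],
    "reviews_per_month": ["0.21", None, "0.38"],
})

clean = raw.copy()
# Unparseable or missing dates become NaT instead of raising an error
clean["last_review"] = pd.to_datetime(clean["last_review"], errors="coerce")
# Prices in this sample are whole numbers, so an integer type fits
clean["price"] = pd.to_numeric(clean["price"]).astype("int64")
# Missing review rates become NaN, so this column stays float
clean["reviews_per_month"] = pd.to_numeric(clean["reviews_per_month"], errors="coerce")

print(clean.dtypes)
```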



**Data Exploration and Visualization**  --

A series of visualizations was created using popular libraries such as Matplotlib and Seaborn to reveal patterns in the data. This project includes 15 visualization charts of different types, such as bar charts, histograms, box plots, pie charts, line plots, heatmaps, a correlation matrix heatmap, scatter plots, and pair plots. Each chart is different from the others and shows a different kind of relationship between variables.


**Room Types and Pricing** -- A comparison of different room types (e.g., entire home/apt, private room, shared room) and their respective prices revealed some interesting trends. Entire apartments and homes were generally more expensive than private and shared rooms, which is consistent with common expectations.

**Geographic Distribution** -- A key aspect of the analysis was the geographic distribution of Airbnb listings across NYC. A scatter plot of listings by latitude and longitude showed that Manhattan, Brooklyn, and Queens had the highest concentration of listings. This clustering coincides with areas that are more popular among tourists and therefore in higher demand, which underscores the impact of location on price and demand: more popular areas command higher prices.

**Availability Trends** -- The availability of listings was analyzed over time to explore any seasonal trends. A time-series plot revealed a clear seasonal pattern, with availability peaking during the summer months and declining during the winter. This trend suggests that hosts are more likely to list their properties during the tourist-heavy summer season, while availability drops during the colder months when tourism generally slows down.

**Popularity and Reviews** -- The relationship between the number of reviews and the price of listings was explored. The analysis shows that listings with more reviews were typically more expensive, suggesting that higher-rated or more popular listings attract higher prices. This relationship likely indicates that positive reviews drive demand, leading to higher pricing.
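Because nightly prices are heavily right-skewed, the strength (and even the sign) of a correlation between price and number of reviews can depend on a handful of extreme listings. The minimal sketch below uses a small hypothetical sample, not the real data, to compare Pearson correlation with Spearman rank correlation (computed here as the Pearson correlation of the ranks, so no extra library is needed):

```python
import pandas as pd

# Hypothetical listings: modest prices plus one luxury outlier with few reviews
listings = pd.DataFrame({
    "price": [60, 80, 95, 120, 150, 2000],
    "number_of_reviews": [5, 12, 30, 45, 60, 1],
})

# Pearson measures linear association and is sensitive to the single extreme price
pearson = listings["price"].corr(listings["number_of_reviews"])

# Spearman is the Pearson correlation of ranks, so the outlier has far less influence
spearman = listings["price"].rank().corr(listings["number_of_reviews"].rank())

print(f"Pearson:  {pearson:.2f}")
print(f"Spearman: {spearman:.2f}")
```

In this toy sample the two measures disagree in sign, which is why checking both on `airbnb_df` is a cheap way to test how robust a price and reviews relationship is.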



# **GitHub Link -**

https://github.com/Iamshilpashah

# **Problem Statement**


The objective of this analysis is to explore and gain insights into the factors influencing Airbnb listings in New York City. Specifically, we aim to understand the relationships between different variables, such as price, room type, neighborhood group, availability, and number of reviews. By analyzing these factors, the goal is to identify patterns, trends, and correlations that can provide actionable insights. Key questions to address include:


*   How do different room types (e.g., Entire Home, Private Room, Shared Room) affect the price of a listing?
*   What is the impact of location (neighborhood group and neighborhood) on pricing and availability?
*   How do reviews and host activity (e.g., number of listings a host has) influence the success of a listing (price, reviews, and availability)?
*   Are there any seasonal or temporal trends in pricing or reviews?




#### **Define Your Business Objective?**

The main business objective of this project is to analyze and extract actionable insights from the Airbnb listings in New York City to help stakeholders (including Airbnb hosts, property managers, investors, and even Airbnb itself) make informed decisions that can enhance performance, optimize pricing strategies, and improve user experience.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception handling, production-grade code, and deployment-ready code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will give them an advantage over other students during Star Student selection.

     [ Note: - Deployment-ready code is defined as a whole .ipynb notebook that can be executed in one go without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. For every chart, make sure the following format is followed and the questions are answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

Let's dive deeper into the AIRBNB NYC 2019 Dataset and gain insights.

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import re


### Dataset Loading

In [None]:
# Mounting drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Reading dataset
airbnb_df = pd.read_csv("/content/drive/MyDrive/Datasets/Airbnb NYC 2019.csv")


### Dataset First View

In [None]:
# Dataset First Look
airbnb_df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
airbnb_df.shape

In [None]:
# Fetching columns
airbnb_df.columns

### Dataset Information

In [None]:
# Dataset Info
airbnb_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
airbnb_df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
airbnb_df.isnull().sum()

In [None]:
# Visualizing the missing values

# Installing missingno first
!pip install missingno

# Importing the library
import missingno as msno

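# msno.heatmap shows the nullity correlation between columns (how strongly their missing values co-occur).
# It creates its own figure, so the size is passed to it directly instead of calling plt.figure first.
msno.heatmap(airbnb_df, figsize = (10, 6))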

# Display the plot
plt.show()

### What did you know about your dataset?

The dataset contains 48,895 rows and 16 columns, representing data on Airbnb listings in New York City. This data includes information about listings, host details, reviews, pricing, and availability. The dataset can help analyze various aspects of Airbnb's performance in the NYC market, such as pricing patterns, host activity, and the popularity of different neighborhoods.

The columns hold three types of data:

*   Numeric: id, host_id, latitude, longitude, price, minimum_nights, number_of_reviews, reviews_per_month, calculated_host_listings_count, availability_365.
*   Categorical: name, host_name, neighbourhood_group, neighbourhood, room_type.
*   Datetime: last_review (read as text from the CSV and converted to datetime during data wrangling).

**Missing Data**:
The dataset contains null values in some columns:

The "name" column contains 16 null values, "host_name" contains 21, and "last_review" and "reviews_per_month" each contain 10,052. The derived columns "days_since_last_reviews" and "month", which are created later in this notebook from "last_review", inherit the same 10,052 null values.
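The share of missing values is often more informative than the raw count. A minimal sketch on a hypothetical sample (the notebook itself would use `airbnb_df`); the last line checks whether two columns are missing on exactly the same rows, which the equal counts for "last_review" and "reviews_per_month" suggest but do not prove:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with gaps in the same columns the notebook reports
sample = pd.DataFrame({
    "name": ["Cozy room", None, "Loft", "Studio"],
    "last_review": ["2019-05-21", None, None, "2019-07-01"],
    "reviews_per_month": [0.21, np.nan, np.nan, 1.5],
})

missing = pd.DataFrame({
    "missing_count": sample.isnull().sum(),
    # The mean of a boolean mask is the fraction of rows that are missing
    "missing_pct": sample.isnull().mean().mul(100).round(2),
}).sort_values("missing_count", ascending=False)
print(missing)

# True if both columns are null on exactly the same rows
same_rows = bool((sample["last_review"].isna() == sample["reviews_per_month"].isna()).all())
print("missing on the same rows:", same_rows)
```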

**Data Quality:**
The dataset appears to be fairly comprehensive but could have inconsistencies, such as outliers in price (e.g., very high prices for luxury listings) or missing data in some review-related columns.

It might be necessary to handle outliers in columns like price and minimum_nights to ensure accurate analysis.
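One common way to handle such outliers is the interquartile-range (IQR) rule. The sketch below applies it to a hypothetical price series; the 1.5 multiplier is the conventional Tukey choice, not something this notebook prescribes, and the same code could be pointed at `airbnb_df['price']` or `airbnb_df['minimum_nights']`:

```python
import pandas as pd

# Hypothetical nightly prices, including a zero and one extreme value
prices = pd.Series([0, 65, 80, 95, 110, 125, 150, 175, 200, 10000], name="price")

# First and third quartiles and the interquartile range
q1, q3 = prices.quantile([0.25, 0.75])
iqr = q3 - q1

# Tukey's fences: values more than 1.5 * IQR outside the quartiles are flagged
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = prices[(prices < lower) | (prices > upper)]
trimmed = prices[prices.between(lower, upper)]

# A price of 0 falls inside the fences here, so invalid values need a separate check
invalid = prices[prices <= 0]

print(f"fences: [{lower:.2f}, {upper:.2f}]")
print("flagged as outliers:", outliers.tolist())
print("non-positive prices:", invalid.tolist())
```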

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
cols = list(airbnb_df.columns)
cols

In [None]:
# Dataset Describe
airbnb_df.describe()

### Variables Description

The dataset has 16 variables. The variables present in the AIRBNB NYC dataset are as follows:

**'id'**: Unique identifier for each listing (numeric).

**'name'**: Name or title of the listing (string).

**'host_id'**: Unique identifier for the host (numeric).

**'host_name'**: Name of the host (string).

**'neighbourhood_group'**: The larger geographical area in NYC (e.g., Manhattan, Brooklyn) (categorical).

**'neighbourhood'**: Specific neighborhood within NYC (categorical).

**'latitude'**: Latitude coordinate of the listing (numeric).

**'longitude'**: Longitude coordinate of the listing (numeric).

**'room_type'**: Type of room being rented (Entire home/apt, Private room, or Shared room) (categorical).

**'price'**: Price per night for the listing (numeric).

**'minimum_nights'**: Minimum number of nights required to book the listing (numeric).

**'number_of_reviews'**: Total number of reviews received by the listing (numeric).

**'last_review'**: Date of the last review for the listing (datetime).

**'reviews_per_month'**: Average number of reviews per month (numeric).

**'calculated_host_listings_count'**: Number of listings a host has (numeric).

**'availability_365'**: Number of days the listing is available in a year (numeric).

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

# axis=0 (the default) counts distinct values per column; axis='columns' would count them per row
airbnb_df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

# 1. Grouping the data by neighbourhood group and printing each group
df_grouped = airbnb_df.groupby('neighbourhood_group')
for name, group in df_grouped:
  print(f"Neighbourhood Group: {name}")
  # 'group' already holds this group's rows, so get_group() is not needed
  print(group)

In [None]:
# 2. Convert Data types
#converting last_review to datetime format
airbnb_df['last_review'] = pd.to_datetime(airbnb_df['last_review'], errors = 'coerce')

#testing
airbnb_df.dtypes

In [None]:
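# 3. Creating new columns "is_expensive", "days_since_last_reviews", "price_per_night", and "is_host_superhost"
# Flag listings priced above 300 per night
airbnb_df['is_expensive'] = airbnb_df['price'] > 300

# Days between the last review and the day the notebook is run (changes on every run; NaN if never reviewed)
airbnb_df['days_since_last_reviews'] = (pd.to_datetime('today') - airbnb_df['last_review']).dt.days

# Note: 'price' is already a nightly rate, so this is price divided by the minimum stay
airbnb_df['price_per_night'] = airbnb_df['price'] / airbnb_df['minimum_nights']

# Note: this flags hosts with more than one listing; it is not Airbnb's official Superhost status
airbnb_df['is_host_superhost'] = airbnb_df['calculated_host_listings_count'] > 1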

print(airbnb_df[['is_expensive', 'days_since_last_reviews','price_per_night', 'is_host_superhost']].head())

#testing columns
print("Now all columns are:")
list(airbnb_df.columns)

In [None]:
# 4. Extract the length of the listing name (missing names become the string 'nan')
airbnb_df['name_length'] = airbnb_df['name'].astype(str).str.len()

# Flag names containing any character other than letters, digits, or whitespace (e.g., punctuation, underscores, emoji)
airbnb_df['name_contains_special_char'] = airbnb_df['name'].astype(str).str.contains(r'[^a-zA-Z0-9\s]', regex = True)

# Check the first few rows
print(airbnb_df[['name', 'name_length', 'name_contains_special_char']].head())

In [None]:
# 5. Group by 'neighbourhood_group' and calculate the average price
avg_price_neighbourhood = airbnb_df.groupby('neighbourhood_group')['price'].mean().reset_index()

# Group by 'room_type' and get the average price per room type
avg_price_room_type = airbnb_df.groupby('room_type')['price'].mean().reset_index()

# Print the results
print(avg_price_neighbourhood)
print(avg_price_room_type)

In [None]:
#6. Is Manhattan preferred over other neighbourhoods or not?

# Filter the dataset for listings in Manhattan and in other neighbourhoods
manhattan_listings = airbnb_df[airbnb_df['neighbourhood_group'] == "Manhattan"]
other_listings = airbnb_df[airbnb_df['neighbourhood_group'] != "Manhattan"]

# Calculate average price and average number of reviews
avg_price_manhattan = manhattan_listings['price'].mean()
avg_price_other = other_listings['price'].mean()

avg_reviews_manhattan = manhattan_listings['number_of_reviews'].mean()
avg_reviews_other = other_listings['number_of_reviews'].mean()

print(f"Average price - Manhattan: {avg_price_manhattan:.2f}, other neighbourhoods: {avg_price_other:.2f}")
print(f"Average reviews - Manhattan: {avg_reviews_manhattan:.2f}, other neighbourhoods: {avg_reviews_other:.2f}")

# Number of reviews is used as a rough proxy for bookings (demand); a higher price alone does not show preference
if avg_reviews_manhattan > avg_reviews_other:
  print("Manhattan is preferred over other neighbourhoods.")
elif avg_reviews_manhattan < avg_reviews_other:
  print("Other neighbourhoods are preferred over Manhattan.")
else:
  print("No significant preference is observed between Manhattan and other neighbourhoods.")

In [None]:
airbnb_df.shape

### What all manipulations have you done and insights you found?

I performed six manipulations:


1.   The first code groups the data by "neighbourhood_group" and prints each group.

2.   The second code converts the "last_review" column to datetime format.

3.   The third code creates four new columns: "is_expensive", "days_since_last_reviews", "price_per_night", and "is_host_superhost" (which flags hosts with more than one listing).

4.   The fourth code extracts the length of each listing name and checks whether the name contains special characters (any character other than a letter, digit, or whitespace, such as punctuation or an underscore).


5.   The fifth code provides two insights: the average price by neighbourhood group and the average price by room type.

6.   The last code compares the average price and the average number of reviews of listings in Manhattan with those of all other neighbourhoods, and reports whether Manhattan or the other neighbourhoods appear to be preferred, or whether no significant preference is observed. In this analysis, Manhattan was not found to be the more preferred neighbourhood.
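The Manhattan comparison can also be expressed with a single `groupby`, which puts both metrics side by side in one table. A minimal sketch on hypothetical rows; `np.where` simply labels each row as "Manhattan" or "Other":

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the notebook itself would use airbnb_df
df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Queens", "Bronx"],
    "price": [250, 150, 120, 90, 70],
    "number_of_reviews": [10, 20, 30, 25, 15],
})

# Label each row as Manhattan or Other, then aggregate both metrics at once
area = np.where(df["neighbourhood_group"] == "Manhattan", "Manhattan", "Other")
summary = df.groupby(area)[["price", "number_of_reviews"]].mean()
print(summary)
```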






## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
avg_price_by_room_type = airbnb_df.groupby('room_type')['price'].mean()
plt.figure(figsize = (10,6))
avg_price_by_room_type.plot(kind = 'bar', color = 'blue')
plt.xlabel('Room Type')
plt.ylabel('Average Price')
plt.title('Average Price By Room Type')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart was picked because it is well suited to displaying the average price by room type: it presents categorical comparisons effectively.

##### 2. What is/are the insight(s) found from the chart?

Room type has three categories: "Entire home/apt", "Private room", and "Shared room". Each bar represents the average price for one room type: above 200 for "Entire home/apt", between 75 and 100 for "Private room", and below 75 for "Shared room".

"Entire home/apt" has the highest average price of the three room types.
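Because prices are right-skewed, a room type's mean can be pulled upward by a few luxury listings. A simple robustness check is to report the median and the listing count alongside the mean. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical listings; one luxury entire home pulls its group's mean upward
df = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 4 + ["Private room"] * 3 + ["Shared room"] * 2,
    "price": [150, 180, 200, 1500, 60, 75, 90, 40, 55],
})

# Mean, median, and number of listings per room type in one table
stats = df.groupby("room_type")["price"].agg(["mean", "median", "count"])
print(stats)
```

When the mean and median of a group differ a lot, as for the entire homes here, the median is usually the safer summary of a "typical" price.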

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, the insights gained from analyzing the average price by room type can have a significant positive impact on the business, particularly for an Airbnb-like platform or a property management business.

No insight here points directly to negative growth, because the business can use it to adjust its pricing, marketing, and inventory strategies, leading to better-informed decision-making.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
top_hosts = airbnb_df.groupby(['host_name', 'name'])['reviews_per_month'].mean().reset_index().sort_values(by=['reviews_per_month'], ascending=False).head(10)
plt.figure(figsize = (10,6))
sns.barplot(data = top_hosts, y = "host_name", x = "reviews_per_month", hue = "name", dodge = False)
plt.xlabel("Reviews per month")
plt.ylabel("Host name")
plt.title("Top 10 Hosts and Listings by Reviews per Month")
plt.legend(title = "Listing Name", bbox_to_anchor = (1.05, 1), loc = "upper left")
plt.show()

##### 1. Why did you pick the specific chart?

The barplot was chosen because it allows for a clear comparison of reviews per month for the top 10 hosts, with a visual distinction for different listings (via the hue parameter). It efficiently shows the relationship between the host name, their listings, and the average reviews per month. The barplot is ideal for ranking and comparing categories (hosts) and their associated values (reviews per month), especially when there's a need to group or distinguish multiple variables (listings).

##### 2. What is/are the insight(s) found from the chart?

The chart highlights the top 10 hosts with the highest average reviews per month, indicating their popularity and engagement with guests.

It reveals which specific listings under these hosts attract more reviews, showing that some hosts might have multiple high-performing listings.



##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, the insights can have a positive impact: the business can leverage successful hosts for partnerships, promotions, or ambassador programs to enhance platform visibility and trust.

Yes, one insight could be negative for business growth: if a small number of hosts drive most reviews, the business might become too dependent on them for growth. This could cause problems if any of these hosts suffer reputational damage or leave the platform.
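The dependency risk described above can be quantified as the share of all reviews that belongs to the top N hosts. A minimal sketch on hypothetical data; it groups by `host_id` rather than `host_name` (an assumption of this sketch) because different hosts can share the same name, and `top_n` is an illustrative value:

```python
import pandas as pd

# Hypothetical review counts per listing, with the owning host's id
df = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "number_of_reviews": [300, 200, 150, 20, 15, 10, 5, 5, 3, 2],
})

# Total reviews per host, largest first
reviews_per_host = df.groupby("host_id")["number_of_reviews"].sum().sort_values(ascending=False)

top_n = 2  # illustrative choice
top_share = reviews_per_host.head(top_n).sum() / reviews_per_host.sum()
print(f"Top {top_n} hosts account for {top_share:.1%} of all reviews")
```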

#### Chart - 3

In [None]:
# Chart - 3 visualization code
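# Average reviews per month for each room type (groupby sorts the room types alphabetically)
avg_reviews_per_month = airbnb_df.groupby("room_type")["reviews_per_month"].mean()
# Take the labels from the aggregated result so each bar is matched to its own room type
room_types = avg_reviews_per_month.index

plt.figure(figsize = (10,6))
plt.bar(room_types, avg_reviews_per_month.values, color = 'turquoise')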
plt.xlabel("Room Type")
plt.ylabel("Average Reviews per Month")
plt.title("Average Reviews per Month by Room Type")
plt.xticks(rotation = 45)
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart was chosen because it is an effective way to compare the average reviews per month across different room types. The categorical data (room types) on the x-axis allows easy comparison with the numerical data (average reviews per month) on the y-axis. The chart clearly shows which room types receive the most engagement (reviews), and the bars make it easy to read the average reviews per month for each room type.

##### 2. What is/are the insight(s) found from the chart?

The chart shows how different room types (e.g., entire home, private room, shared room) compare in terms of average reviews per month. This highlights the level of guest engagement for each room type.

Certain room types may have significantly higher reviews per month, indicating that guests are more likely to leave reviews for those options, potentially due to a better experience or higher demand.
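Bar charts built from two separately computed sequences can silently mislabel bars when the two orders differ. The sketch below, on hypothetical rows, shows that `unique()` keeps the order of first appearance while `groupby` sorts its keys, and that taking labels and values from the same aggregated Series keeps them aligned:

```python
import pandas as pd

# Hypothetical rows in the order they might appear in the CSV
df = pd.DataFrame({
    "room_type": ["Private room", "Entire home/apt", "Shared room", "Private room"],
    "reviews_per_month": [2.0, 1.0, 0.5, 4.0],
})

# unique() keeps the order of first appearance ...
print(list(df["room_type"].unique()))

# ... while groupby sorts its keys alphabetically by default
avg = df.groupby("room_type")["reviews_per_month"].mean()
print(list(avg.index))

# Labels and values taken from the same Series always line up
labels, values = avg.index.tolist(), avg.tolist()
print(list(zip(labels, values)))
```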

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, the gained insights can create a positive business impact. The business can focus on promoting room types that receive more reviews, which indicates strong demand and guest satisfaction, leading to better-targeted marketing strategies. By understanding which room types get more reviews, the business can also identify characteristics that enhance the guest experience, leading to better customer retention and higher ratings in the future.

Yes, there is an insight that could lead to negative growth: if room types with fewer reviews are not meeting guest expectations, customer satisfaction could decline over time, damaging the platform's image and reducing overall growth. The business needs to address this issue to prevent a future decline.




#### Chart - 4

In [None]:
# Chart - 4 visualization code
plt.figure(figsize = (10,6))
sns.histplot(airbnb_df['price'], kde=True, color='blue', bins=30)
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.title('Price Distribution')
plt.show()

##### 1. Why did you pick the specific chart?

A histogram with KDE (Kernel Density Estimate) was chosen because it provides a clear visualization of the distribution of prices in the dataset. The histogram shows the frequency of price ranges, while the KDE adds a smooth curve to highlight the overall distribution pattern. This combination helps to understand the central tendency, spread, and skewness of the price data effectively.

##### 2. What is/are the insight(s) found from the chart?

The chart shows the frequency of different price ranges, helping to understand how prices are distributed across the dataset.

If the distribution is skewed to the right (i.e., there are many lower-priced listings and fewer higher-priced ones), this could indicate that most listings are priced lower, with a small number of higher-priced ones.

The histogram's peaks and valleys show where most listings fall in terms of pricing, allowing the business to identify common price ranges for properties.
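The right skew can be measured directly, and a log transform is a common way to make the bulk of the distribution visible. A minimal sketch on a hypothetical price series; `np.log1p` is used (an illustrative choice) because it stays defined for a price of 0:

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed nightly prices
prices = pd.Series([45, 60, 75, 90, 100, 120, 150, 200, 350, 2500])

print(f"skew (raw):   {prices.skew():.2f}")

# log1p(x) = log(1 + x) compresses the long right tail
log_prices = np.log1p(prices)
print(f"skew (log1p): {log_prices.skew():.2f}")
```

A positive skew that shrinks substantially after the transform suggests that plotting `np.log1p(airbnb_df['price'])` would show the common price ranges more clearly than the raw histogram.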

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, the insights can help. Knowing the most common price ranges helps the business tailor its offerings to the majority of customers while adjusting for demand in higher- or lower-priced categories.

If a significant portion of listings is clustered around a certain price range, businesses that price too far above or below this range may face lower demand. Overpricing could lead to low occupancy, while underpricing might result in missed revenue opportunities.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
plt.figure(figsize = (10,6))
sns.boxplot(data = airbnb_df, x = 'room_type', y = 'price', color = 'grey')
plt.xlabel("Room Type")
plt.ylabel("Price")
plt.title("Price Distribution by Room Type")
plt.show()

##### 1. Why did you pick the specific chart?

A boxplot was chosen because it effectively displays the distribution of prices by room type, including key statistics like the median, interquartile range (IQR), and potential outliers. It provides a clear view of how prices vary within each room type, making it easier to compare distributions across categories.

##### 2. What is/are the insight(s) found from the chart?

The boxplot reveals the range of prices for each room type, showing the variability within each category. For example, "entire home" might have a wide price range, while "shared room" may have a narrower spread.

The median line within each box marks the middle price for each room type: half of the listings are priced below it and half above.

Any outliers (dots outside the whiskers) highlight listings with prices significantly different from most others, which could either indicate luxury properties or errors in pricing.
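The quantities a boxplot draws can also be computed per group, which makes it possible to count the flagged listings instead of reading them off the chart. A minimal sketch on hypothetical data, using the same 1.5 × IQR upper fence that matplotlib and seaborn boxplots use by default:

```python
import pandas as pd

# Hypothetical listings for two room types
df = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 5 + ["Shared room"] * 5,
    "price": [100, 150, 200, 250, 900, 30, 40, 50, 60, 70],
})

# Quartiles per room type: the box spans q1..q3 and the line inside it is the median
box = df.groupby("room_type")["price"].quantile([0.25, 0.5, 0.75]).unstack()
box.columns = ["q1", "median", "q3"]
box["upper_fence"] = box["q3"] + 1.5 * (box["q3"] - box["q1"])
print(box)

# Count listings above their own room type's upper fence
fence = df["room_type"].map(box["upper_fence"])
n_outliers = (df["price"] > fence).groupby(df["room_type"]).sum()
print(n_outliers)
```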

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, the gained insights can create a positive business impact. The business can assess which room types have higher or lower price variability, guiding decisions on which types to focus on or promote based on demand and profitability.

Yes, outliers can point to negative growth, especially in higher-priced rooms, where they could signal overpricing for certain properties. If these prices are not justified by the quality of the listing, they may result in low occupancy and negative customer feedback, damaging the platform’s reputation.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
plt.figure(figsize = (25, 10))
avg_price_neighbourhood = airbnb_df.groupby('neighbourhood')['price'].mean().sort_values(ascending = False)
avg_price_neighbourhood = avg_price_neighbourhood.head(100)
plt.bar(avg_price_neighbourhood.index, avg_price_neighbourhood.values, color = 'salmon')
plt.title('Top 100 Neighbourhoods by Average Price')
plt.xlabel('Neighbourhood')
plt.ylabel("Average Price")
plt.xticks(rotation = 90, ha = 'right')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart was chosen to display the average price by neighbourhood because it provides a clear and direct comparison of prices across different neighbourhoods. By selecting the top 100 neighbourhoods, the chart allows for easy identification of which areas are more expensive. The bar chart format also helps to visualize rankings effectively, making it easy to compare neighbourhoods side by side.

##### 2. What is/are the insight(s) found from the chart?

The chart highlights which neighbourhoods have the highest average prices for Airbnb listings, helping to identify areas that are more expensive.

The top neighbourhoods with the highest average prices can be identified, which may indicate high-demand, premium locations.
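Neighbourhoods with only one or two listings can top an average-price ranking purely by chance. A minimal sketch that keeps only neighbourhoods with a minimum number of listings before ranking; the area names and the threshold of 3 are hypothetical and would need tuning on the real data:

```python
import pandas as pd

# Hypothetical listings; "Area A" has a single expensive listing
df = pd.DataFrame({
    "neighbourhood": ["Area A"] + ["Area B"] * 3 + ["Area C"] * 3,
    "price": [800, 250, 300, 220, 150, 160, 140],
})

stats = df.groupby("neighbourhood")["price"].agg(["mean", "count"])

min_listings = 3  # assumed threshold
reliable = stats[stats["count"] >= min_listings].sort_values("mean", ascending=False)
print(reliable)
```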



##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, the gained insights can create a positive business impact. Knowing which neighbourhoods have higher prices can help tailor marketing campaigns to attract more customers to those areas, especially customers willing to pay a premium for location.
By understanding pricing trends in different neighbourhoods, the business can adjust its pricing strategies to maximize revenue, for example by offering discounts or promotions in less expensive areas to attract more guests.

Yes, one insight could affect the business negatively: if the average price in certain neighbourhoods is much higher than the market norm and is not justified by the property’s value, occupancy rates could fall. Customers may avoid these areas if they perceive the listings as overpriced compared to similar offerings elsewhere. Pricing should therefore be sustainable.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
room_type_counts = airbnb_df['room_type'].value_counts()
plt.figure(figsize = (12, 8))
# autopct prints each slice's percentage share on the chart
room_type_counts.plot.pie(colors = ['violet', 'pink', 'grey'], autopct = '%1.1f%%')
plt.title("Room Type Distribution")
plt.ylabel('')
plt.show()

##### 1. Why did you pick the specific chart?

The pie chart was chosen because it effectively visualizes the proportional distribution of different room types in the Airbnb dataset. Pie charts are ideal for showing relative sizes of categories as they highlight the share of each category (in this case, room types) within the total. The color scheme (violet, pink, and grey) also helps to differentiate the segments and make the chart more visually appealing, enhancing the viewer's ability to interpret the data.

##### 2. What is/are the insight(s) found from the chart?

The chart likely reveals the relative popularity of various room types available on Airbnb. Insights include identifying which room type dominates in terms of listings (for example, if most properties are "Entire homes" or "Private rooms"), and understanding how other room types like "Shared rooms" compare. If a single room type comprises a large share, it indicates that it is the most common listing on Airbnb. Conversely, a more even distribution might suggest a balanced variety of options available to guests.
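The exact shares behind a pie chart can be printed with `value_counts(normalize=True)`, which is useful for quoting the split in the text. A minimal sketch on a hypothetical column of room types:

```python
import pandas as pd

# Hypothetical room types for eight listings
room_type = pd.Series([
    "Entire home/apt", "Private room", "Entire home/apt", "Shared room",
    "Private room", "Entire home/apt", "Private room", "Entire home/apt",
])

# normalize=True returns fractions; multiply by 100 for percentages
shares = room_type.value_counts(normalize=True).mul(100).round(1)
print(shares)
```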

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes. Understanding which room types are most common can guide hosts or Airbnb in adjusting pricing strategies, optimizing supply, and targeting customer preferences.

Yes, the insight could also point to negative growth. If one room type (for example, "Entire homes") dominates the pie chart by a significant margin, this could indicate a lack of diversity in room offerings, and relying too heavily on a single room type might limit the target customer base. If demand for that room type decreases due to market shifts, seasonality, or changing consumer preferences, the business could suffer. For instance, if demand for "Entire homes" drops because of a rise in solo or budget travelers, Airbnb may lose out on a growing segment of the market. The insights could therefore signal a need for more diverse room types, a better balance between luxury and budget options, and awareness of changing consumer preferences to avoid stagnation or negative growth in the Airbnb business.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
plt.figure(figsize = (10,6))
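# 'deep' is a seaborn palette name, so it goes to palette (not color); small, semi-transparent points reduce overplotting
sns.scatterplot(data = airbnb_df, x = 'longitude', y = 'latitude', hue = 'room_type', palette = 'deep', s = 5, alpha = 0.5)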
plt.title("Latitude vs. Longitude of Listings")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()

##### 1. Why did you pick the specific chart?

The scatter plot was chosen because it provides a clear visualization of the geographic distribution of Airbnb listings based on their longitude and latitude. This type of chart is ideal for showing spatial relationships, allowing us to see the clustering or spread of listings across different areas. The hue argument (colored by room type) adds an extra layer of information, helping us understand how different room types are distributed geographically. This can offer valuable insights for both customer preferences and business strategies.

##### 2. What is/are the insight(s) found from the chart?

The scatter plot can reveal clusters of listings in specific regions (e.g., city centers or tourist hotspots). These clusters might indicate high demand in those areas, while other regions may have sparse listings.
Large gaps or areas with few listings might signal regions where Airbnb has a limited presence, presenting potential for expansion.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, these insights can lead to a positive business impact. By identifying areas with high demand (i.e., clusters of listings) or under-served locations (i.e., gaps in listings), Airbnb can focus marketing efforts or expand its offerings into these regions.

Yes, some insights could point to negative growth. If the chart reveals a large concentration of listings in specific high-demand areas (e.g., city centers), that could suggest market saturation. Overcrowding in popular locations might increase competition between hosts, driving down prices and reducing profitability, which could hurt the business if hosts are unable to sustain competitive pricing or differentiate their offerings. Conversely, if the chart shows gaps in certain regions, especially areas with growing tourism or residential development, Airbnb might be missing business opportunities, and leaving these gaps unaddressed could mean a loss of potential growth.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
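airbnb_df['last_review'] = pd.to_datetime(airbnb_df['last_review'])
airbnb_df['month'] = airbnb_df['last_review'].dt.month
# Mean availability_365 grouped by the month of each listing's most recent review;
# reindex so all 12 months get a cell even if a month has no reviews
availability_by_month = airbnb_df.groupby('month')['availability_365'].mean().reindex(range(1, 13))

plt.figure(figsize = (12, 3))
# Passing the labels to heatmap keeps them centred on the cells
# (plt.xticks(range(12), ...) would place them on the cell edges)
sns.heatmap(availability_by_month.values.reshape(1, -1), annot=True, fmt='.0f', cmap='YlGnBu',
            xticklabels=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
            yticklabels=False)
plt.title("Average availability_365 by Month of Last Review")
plt.show()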


##### 1. Why did you pick the specific chart?

The heatmap was chosen because it is an effective way to show how average availability (`availability_365`) varies with the month of each listing's most recent review. Its color gradient makes it quick to see which months have higher or lower average availability, so seasonal patterns and fluctuations stand out at a glance, which can be faster to scan than a standard line or bar chart.

##### 2. What is/are the insight(s) found from the chart?

The heatmap may reveal seasonal patterns in listing availability, for example higher average availability in off-season months and lower availability during peak months (e.g., summer or holiday periods). Because the month is taken from `last_review`, these patterns are a proxy for seasonality rather than a direct month-by-month availability measure.

Months with low average availability may correspond to periods of high demand, while months with high availability may reflect the off-peak season, when demand is lower.
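
The heatmap values can also be read programmatically to name the lowest- and highest-availability months. The sketch below assumes the same `last_review` and `availability_365` columns. It drops listings without a review date and reindexes to all 12 months. The helper names are hypothetical, and the toy rows are made up for illustration.

```python
import pandas as pd

def mean_availability_by_review_month(df: pd.DataFrame) -> pd.Series:
    """Mean availability_365 per month of last_review, with all 12 months present."""
    dated = df.dropna(subset=['last_review']).copy()
    dated['month'] = pd.to_datetime(dated['last_review']).dt.month
    # Months with no reviews become NaN instead of disappearing from the result
    return dated.groupby('month')['availability_365'].mean().reindex(range(1, 13))

def lowest_and_highest_months(avg_by_month: pd.Series) -> tuple:
    """Return (month with lowest mean availability, month with highest)."""
    s = avg_by_month.dropna()
    return int(s.idxmin()), int(s.idxmax())

# Toy data for illustration only; None mimics a listing with no reviews
toy = pd.DataFrame({
    'last_review': ['2019-01-05', '2019-01-20', '2019-06-01', '2019-06-15', '2019-12-24', None],
    'availability_365': [300, 280, 40, 60, 150, 365],
})
avg = mean_availability_by_review_month(toy)
print(avg)
print(lowest_and_highest_months(avg))
```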

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights can positively impact the business. Understanding which months have lower availability can help Airbnb plan promotions or host incentives around seasonal demand, for example encouraging hosts to open more dates in busy months and offering discounts in slower months. This could boost bookings during off-peak seasons and help maximize revenue.

Yes, some patterns could indicate negative growth. If a significant portion of listings show high availability during low-demand months, it could imply that many listings depend on slower periods for bookings, which might not be sustainable for long-term growth. This may result in insufficient bookings and lower overall profitability.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
plt.figure(figsize = (10, 6))
sns.histplot(airbnb_df['minimum_nights'], kde=True, color='green', bins= 30)
plt.title("Minimum Nights Distribution")
plt.xlabel('Minimum Nights')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

The histogram with KDE (Kernel Density Estimate) was chosen because it effectively visualizes the distribution of a continuous variable, in this case, the minimum number of nights required for booking. The histogram allows us to see the frequency of different values (i.e., how many listings require a certain number of minimum nights), while the KDE provides a smoothed curve that shows the overall trend or distribution pattern. This combination of the histogram and KDE offers both the discrete count of listings and the general shape of the data, making it easier to identify patterns such as peaks, outliers, or skewness.

##### 2. What is/are the insight(s) found from the chart?

The histogram will likely show that most listings require a low number of minimum nights (likely around 1 or 2), indicating that many hosts prefer to keep their listings flexible and accessible for short-term stays.
The histogram is right-skewed, meaning that the majority of listings have low minimum nights, while a smaller proportion have much higher minimum night requirements. This suggests that short-term stays dominate.
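
The right skew described above can be checked numerically. For a right-skewed variable, the mean sits above the median and the sample skewness is positive. The sketch below assumes a `minimum_nights` Series. The 2-night "short stay" cut-off and the helper name are illustrative assumptions, and the values are toy data.

```python
import pandas as pd

def min_nights_summary(min_nights: pd.Series, short_stay_max: int = 2) -> dict:
    """Summarize skew in minimum_nights; short_stay_max is an assumed cut-off."""
    s = min_nights.dropna()
    return {
        'median': float(s.median()),
        'mean': float(s.mean()),          # pulled upward by a few very long minimums
        'skew': float(s.skew()),          # > 0 means a long right tail
        'share_short': float((s <= short_stay_max).mean()),
    }

# Toy values for illustration only; on the real data use airbnb_df['minimum_nights']
toy = pd.Series([1, 1, 1, 2, 2, 3, 30, 365])
summary = min_nights_summary(toy)
print(summary)
```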

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights from this chart can guide growth strategies. Understanding the distribution of minimum nights can help Airbnb suggest that hosts adjust their minimum-night requirements for better occupancy, especially during peak seasons when travelers may be looking for short stays.

Yes, some insights might signal potential negative growth. If many listings in areas with high tourist demand require a high number of minimum nights, this could limit bookings, especially where short-term travelers prefer flexible accommodation. For example, in cities where tourists typically stay for only a few days, listings with high minimum-night requirements could reduce overall occupancy rates, decreasing revenue for Airbnb hosts and for the platform overall.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
plt.figure(figsize = (10, 6))
sns.boxplot(airbnb_df, x='neighbourhood_group', y='reviews_per_month', color='orange')
plt.title('Reviews per Month by Neighbourhood Group')
plt.xlabel("Neighbourhood Group")
plt.ylabel("Reviews per Month")
plt.show()

##### 1. Why did you pick the specific chart?

The boxplot was chosen because it effectively shows the distribution of reviews per month across different neighborhood groups. It provides a clear view of the median, variability, and outliers (extremely high or low values), which helps compare how neighborhoods perform in terms of reviews.

##### 2. What is/are the insight(s) found from the chart?

The median line shows the typical number of reviews per month for each neighborhood group.

The spread of the box indicates how much review numbers vary within each group. Points outside the whiskers represent listings with unusually high or low reviews.

 The boxplot allows comparison between different neighborhood groups in terms of reviews.
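
The medians, boxes, and outlier points in the chart can be reproduced as a table. By default, a seaborn boxplot draws whiskers at 1.5 × IQR beyond the quartiles and plots points outside that range individually. The sketch below applies that same rule per group. The helper name and toy values are illustrative, and exact agreement with the plotted fliers is assumed rather than guaranteed across library versions.

```python
import pandas as pd

def iqr_outlier_counts(df: pd.DataFrame, group_col: str, value_col: str,
                       whis: float = 1.5) -> pd.DataFrame:
    """Per-group median, quartiles and count of points beyond the whiskers."""
    rows = []
    for name, values in df.groupby(group_col)[value_col]:
        v = values.dropna()                     # reviews_per_month has missing values
        q1, q3 = v.quantile(0.25), v.quantile(0.75)
        iqr = q3 - q1
        low, high = q1 - whis * iqr, q3 + whis * iqr
        rows.append({group_col: name, 'median': v.median(), 'q1': q1, 'q3': q3,
                     'n_outliers': int(((v < low) | (v > high)).sum())})
    return pd.DataFrame(rows).set_index(group_col)

# Toy data for illustration only
toy = pd.DataFrame({
    'neighbourhood_group': ['Brooklyn'] * 7 + ['Manhattan'] * 4,
    'reviews_per_month': [1, 2, 2, 3, 3, 4, 20, 0.5, 1, 1.5, 2],
})
out = iqr_outlier_counts(toy, 'neighbourhood_group', 'reviews_per_month')
print(out)
```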

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights can help by:

- Targeting popular areas for marketing to boost bookings.
- Improving low-performing areas through promotions and host guidance.
- Helping hosts adjust their strategies based on neighborhood performance to increase reviews.

The insights can point to negative growth through:

- Low review numbers in some areas, which could indicate weak demand and affect overall growth.
- High variability in reviews, which suggests inconsistency and could hurt Airbnb's reputation.
- Outliers with low reviews, which could point to underperforming listings that may hinder business growth in those regions.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
airbnb_df['last_review'] = pd.to_datetime(airbnb_df['last_review'])
price_trend = airbnb_df.groupby('last_review')['price'].mean()
plt.figure(figsize = (10, 6))
price_trend.plot(kind = 'line', color = 'blue', linewidth = 2)
plt.title('Average Price Trend Over Time')
plt.xlabel('Date')
plt.ylabel('Average Price')
plt.xticks(rotation = 0)
plt.show()

##### 1. Why did you pick the specific chart?

The line chart was chosen because it effectively shows trends over time, which is ideal for analyzing the average price trend of Airbnb listings. By plotting the average price for each `last_review` date, the line chart illustrates how prices vary over time, helping to identify seasonal fluctuations, long-term trends, or irregular price behavior. Note that the dataset holds a single `price` per listing, so the x-axis reflects when listings were last reviewed rather than when that price was charged.
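
Daily averages can be noisy because many dates have only a handful of listings. One common smoothing option, offered here as an alternative and not as the chart's original method, is to resample to calendar months before plotting. The sketch below assumes `last_review` and `price` columns. The helper name is hypothetical, and the toy rows are made up.

```python
import pandas as pd

def monthly_average_price(df: pd.DataFrame) -> pd.Series:
    """Average price per calendar month of last_review ('MS' = month-start bins)."""
    dated = df.dropna(subset=['last_review']).copy()
    dated['last_review'] = pd.to_datetime(dated['last_review'])
    series = dated.set_index('last_review')['price'].sort_index()
    # Months with no reviews become NaN rather than being silently skipped
    return series.resample('MS').mean()

# Toy data for illustration only; the undated row is dropped
toy = pd.DataFrame({
    'last_review': ['2019-01-03', '2019-01-20', '2019-03-05', None],
    'price': [100, 200, 300, 999],
})
monthly = monthly_average_price(toy)
print(monthly)
```

On the real data, `monthly_average_price(airbnb_df).plot()` would draw a smoother line than grouping by individual dates.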

##### 2. What is/are the insight(s) found from the chart?

Price Fluctuations: The line chart can reveal whether the average price of listings has increased or decreased over time.

Seasonal Trends: If the chart shows spikes or dips at certain times of the year, these could indicate seasonal price variations (e.g., higher prices during holidays or peak tourist seasons).

Price Stability: If the line is relatively smooth, prices have likely been stable over the observed period.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights can create a positive business impact in several ways:

Guide Pricing Strategies: Airbnb can use the trend to recommend price adjustments during peak and off-peak seasons, maximizing revenue.

Market Predictions: Predicting price trends can help Airbnb and hosts anticipate demand and adjust pricing accordingly.

Customer Targeting: Understanding price fluctuations helps Airbnb target specific customer segments, offering discounts or promotions during slower periods.


Yes, insights that could lead to negative growth include:

Rising Prices: If prices are consistently increasing, it could price out some customers, especially during periods of high competition. This could reduce bookings and hurt Airbnb’s market share.

Price Instability: Significant fluctuations in prices may lead to customer confusion or dissatisfaction, hurting long-term business growth.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
reviews_trend = airbnb_df.groupby('last_review')['number_of_reviews'].sum()
plt.figure(figsize = (10, 8))
reviews_trend.plot(kind = 'line', color = 'red', linewidth = 2)
plt.title('Number of Reviews Trend Over Time')
plt.xlabel('Date')
plt.ylabel('Number of Reviews')
plt.grid(True)
plt.xticks(rotation = 0)
plt.show()

##### 1. Why did you pick the specific chart?

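The line chart was chosen because it effectively shows trends over time. By plotting the number of reviews over time, the chart lets us observe changes in customer engagement and identify patterns such as increases during certain months or years, helping to understand how the platform's activity evolves. Because the data is grouped by `last_review`, each point sums the total review counts of listings whose most recent review fell on that date, so the chart is a proxy for engagement rather than a count of reviews written on each day.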

##### 2. What is/are the insight(s) found from the chart?

Growth in Reviews: If the line slopes upward, the number of reviews is increasing, suggesting that more people are engaging with Airbnb over time.

Seasonal Variations: Peaks in the graph may represent specific months in 2019 when more people are booking, such as during holidays or tourist seasons, while dips could point to quieter periods.

Fluctuating Engagement: A steady rise or fall could show consistent customer activity, or it may highlight periods of stagnant growth or decline in reviews.
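
Whether the trend is really rising or falling can be checked by aggregating per year and computing the year-over-year percentage change. The sketch below assumes `last_review` and `number_of_reviews` columns and attributes each listing's reviews to the year of its last review, matching the chart's grouping. The helper name is hypothetical and the rows are toy data. If the final year in the data is only partially covered, its total will look artificially low.

```python
import pandas as pd

def yearly_review_growth(df: pd.DataFrame) -> pd.DataFrame:
    """Sum number_of_reviews by year of last_review and add year-over-year % change."""
    dated = df.dropna(subset=['last_review'])
    year = pd.to_datetime(dated['last_review']).dt.year.rename('year')
    totals = dated.groupby(year)['number_of_reviews'].sum()
    # pct_change compares each year with the previous one; the first year is NaN
    return pd.DataFrame({'total_reviews': totals,
                         'pct_change': totals.pct_change() * 100})

# Toy data for illustration only
toy = pd.DataFrame({
    'last_review': ['2017-03-01', '2017-09-10', '2018-05-02', '2019-02-14', '2019-06-30'],
    'number_of_reviews': [10, 10, 30, 20, 25],
})
growth = yearly_review_growth(toy)
print(growth)
```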

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights could have a positive impact by:

Optimizing Marketing: Recognizing when reviews spike can help Airbnb time promotions or target customers during those high-activity periods.

Encouraging More Reviews: A declining or stagnant review trend might prompt Airbnb to encourage hosts to engage guests for more reviews or implement strategies to boost activity.

Better Forecasting: Understanding when reviews peak can help forecast future demand, ensuring resources and marketing are focused on the right times.


Yes, negative trends could include:

Decreasing Number of Reviews: A downward trend could indicate less user activity, possibly due to customer dissatisfaction or less frequent bookings. This could signal a potential slowdown in growth.

Inconsistent Review Activity: Significant fluctuations in the graph, like sharp drops or irregular patterns, might indicate instability in user engagement, which can be harmful to long-term business growth.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
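plt.figure(figsize = (12, 8))
correlation_matrix = airbnb_df.corr(numeric_only=True)
# annot prints each coefficient; vmin/vmax pin the colour scale to [-1, 1]
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)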
plt.title("Correlation Heatmap")
plt.show()

##### 1. Why did you pick the specific chart?

The correlation heatmap was chosen because it provides a clear and visual representation of the relationships between multiple numeric variables. It highlights positive or negative correlations between features in the dataset, helping to identify important patterns and interactions. The heatmap is useful for quickly spotting which variables are strongly correlated with others, which can inform decisions.

##### 2. What is/are the insight(s) found from the chart?

Insights found from the chart are:

Positive correlation: When two variables show a strong positive correlation, one tends to increase as the other increases. Pairs that measure related things, such as number_of_reviews and reviews_per_month, would be expected to correlate positively.

Negative correlation: A negative correlation indicates that as one variable increases, the other tends to decrease. For example, if availability_365 and price were negatively correlated, it could suggest that listings with higher availability charge lower prices because of weaker demand, although correlation alone does not establish that cause.

No correlation: Variables with a correlation close to zero may not have a significant relationship with each other.
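
With many numeric columns, the strongest relationships can be listed directly instead of scanned from the heatmap. The sketch below keeps only the upper triangle of the correlation matrix so each pair appears once, then sorts the pairs by absolute Pearson correlation. The helper name and toy columns are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def top_correlations(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n variable pairs with the largest absolute Pearson correlation."""
    corr = df.corr(numeric_only=True)
    # Keep only the upper triangle (k=1 drops the diagonal) so each pair appears once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    pairs = upper.stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

# Toy data for illustration only
toy = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [2, 4, 6, 8, 10],       # exactly 2 * a
    'c': [1, 0, 1, 0, 1],        # uncorrelated with a
    'room_type': list('xyzxy'),  # non-numeric, ignored by numeric_only=True
})
top = top_correlations(toy)
print(top)
```

On the real data, `top_correlations(airbnb_df, n=10)` would list the pairs worth inspecting first.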

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
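# pairplot builds its own figure, so size it with height= rather than plt.figure(),
# and set the title on the grid's figure instead of the last subplot
g = sns.pairplot(airbnb_df[['price', 'minimum_nights', 'number_of_reviews', 'room_type']], hue='room_type', palette='dark', height=2.5)
g.figure.suptitle("Pair Plot: Price, Minimum Nights, and Number of Reviews", y=1.02)
plt.show()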

##### 1. Why did you pick the specific chart?

The pair plot was chosen because it shows the relationships between multiple variables in a single view. It plots pairwise relationships between price, minimum_nights, and number_of_reviews, with each variable's distribution on the diagonal. Coloring the points by room_type makes it easy to distinguish how each room type is distributed across these features.

##### 2. What is/are the insight(s) found from the chart?

Insights found from the chart are:

Price vs. Number of Reviews: The plot might show that listings with more reviews tend to have higher prices, suggesting that popular listings with positive customer feedback command higher rates.

Price vs. Minimum Nights: We may observe that listings with a higher price could have a higher minimum nights requirement, especially for premium or luxury listings.

Room Type Distribution: Different room types may cluster differently in terms of price and minimum nights, allowing for insights into how room type influences pricing or stay requirements.

Trends by Room Type: By observing how the points for each room type are distributed across the axes, we can infer if certain room types (like entire homes or shared rooms) tend to have higher prices or fewer minimum nights.
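
The room-type differences suggested by the pair plot can be summarized in a small table. Medians are used here because price and minimum_nights are heavily right-skewed, so a few extreme listings would distort means. The sketch assumes the same columns as the chart. The helper name and toy rows are illustrative only.

```python
import pandas as pd

def room_type_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Median price, minimum_nights and number_of_reviews per room_type."""
    cols = ['price', 'minimum_nights', 'number_of_reviews']
    return (df.groupby('room_type')[cols]
              .median()
              .sort_values('price', ascending=False))

# Toy data for illustration only
toy = pd.DataFrame({
    'room_type': ['Entire home/apt'] * 3 + ['Private room'] * 3 + ['Shared room'] * 2,
    'price': [150, 250, 200, 70, 60, 80, 40, 35],
    'minimum_nights': [3, 2, 30, 1, 2, 1, 1, 1],
    'number_of_reviews': [10, 0, 5, 20, 3, 8, 1, 2],
})
profile = room_type_profile(toy)
print(profile)
```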

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

To help the client achieve the business objective of analyzing and extracting actionable insights from the Airbnb listings in New York City, here are several tailored recommendations that can assist stakeholders (including Airbnb hosts, property managers, investors, and Airbnb itself) in making informed decisions to enhance performance, optimize pricing strategies, and improve user experience:

1. Optimize Pricing Strategies for Airbnb Hosts.

2. Enhance User Experience for Guests.

3. Improve Visibility for Hosts and Listings.

4. Boost Marketing Strategies for Airbnb Hosts.

5. Monitor and Improve Performance Metrics.

6. Utilize Reviews and Feedback to Drive Continuous Improvement.

# **Conclusion**

In conclusion, the exploratory data analysis of the Airbnb NYC dataset provided important insights into the pricing, popularity, and availability of Airbnb listings in New York City. The findings demonstrated that factors such as room type, location, reviews, and seasonality play significant roles in determining the prices and availability of listings. This analysis serves as a foundation for further investigations, such as predictive modeling to forecast pricing trends or optimize Airbnb listings. The project showcases the value of data wrangling and visualization in understanding complex datasets and extracting actionable insights.
