<a href="https://colab.research.google.com/github/stratnaparkhi026/EDA-project-Airbnb-Booking-Analysis/blob/main/Copy_of_Sample_EDA_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -  **Capstone Project: EDA Airbnb Bookings Analysis**



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Candidate Name**    - Saurabh Ratnaparkhi

# **Project Summary -**

This Airbnb project gathers and analyzes data on Airbnb listings, hosts, and guests to gain insight into the trends, patterns, and impacts of short-term rentals in New York City (NYC). It aims to provide a better understanding of the NYC Airbnb market and its implications for the housing market, tourism industry, and local communities.

Key Objectives:-

Data Collection: The project involves collecting data from various sources, including the Airbnb platform, publicly available datasets, and other relevant resources. This data includes information about listings, hosts, pricing, occupancy rates, and geographical distribution.

Data Analysis: Once the data is collected, it is analyzed to identify key trends and patterns in the NYC Airbnb market. This analysis may include examining factors such as listing types, property locations, rental prices, guest reviews, and host characteristics.

Impact Assessment: The project aims to assess the impact of Airbnb on the local housing market, tourism industry, and communities. This involves studying the effects of short-term rentals on rental prices, housing availability, neighborhood dynamics, and the overall economy.

Policy Recommendations: Based on the findings of the analysis, the project may provide recommendations for policymakers and relevant stakeholders. These recommendations may address issues such as housing affordability, regulation of short-term rentals, taxation, and community engagement.

Visualization and Reporting: The project may involve creating visualizations, infographics, and reports to present the findings in a clear and accessible manner. These materials can help stakeholders understand the complexities of the NYC Airbnb market and make informed decisions.

Overall, the NYC Airbnb project aims to provide a comprehensive understanding of the Airbnb market in New York City and its impact on various aspects of the city's economy and communities. The project's findings and recommendations can inform policy decisions, facilitate informed discussions, and contribute to the ongoing debate surrounding the regulation and management of short-term rentals in NYC.

# **GitHub Link -**

https://github.com/stratnaparkhi026/EDA-project-Airbnb-Booking-Analysis/blob/main/Copy_of_Sample_EDA_Submission_Template.ipynb

# **Problem Statement**

The main objective of this exploratory data analysis (EDA) project is to understand the dynamics of Airbnb rentals in New York City (NYC) by analyzing various factors such as pricing, availability, neighborhood groups, room types, and customer feedback. By examining the average price, number of listings, availability, and customer reviews, we aim to gain insights into the pricing patterns, popularity of neighborhoods, customer satisfaction, and market opportunities. This analysis will provide valuable information for potential investors or hosts to make informed decisions and understand the key factors influencing the success and profitability of Airbnb rentals in NYC.

#### **Define Your Business Objective?**

The primary business objective of this project is to optimize the client's Airbnb rental business in New York City by maximizing occupancy rates, profitability, and guest satisfaction. By implementing data-driven strategies and insights, the aim is to achieve higher occupancy rates, increase revenue, and ensure exceptional guest experiences, ultimately driving the success and growth of the client's business in the competitive NYC Airbnb market.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np

# Import Visualization Libraries
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
url = 'https://raw.githubusercontent.com/stratnaparkhi026/EDA-project-Airbnb-Booking-Analysis/main/Copy%20of%20Airbnb%20NYC%202019.csv'
airbnb_df = pd.read_csv(url)
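
The guidelines above ask for exception handling, so the loading step could be wrapped as in the minimal sketch below. `load_listings` is a hypothetical helper that is not part of this notebook, and the in-memory CSV only stands in for the real file so the example runs without network access.

```python
import io

import pandas as pd


def load_listings(source):
    """Read a CSV into a DataFrame, returning None if it cannot be loaded.

    `source` may be a URL, a local path, or a file-like object.
    Hypothetical helper for illustration only.
    """
    try:
        return pd.read_csv(source)
    except (OSError, pd.errors.EmptyDataError, pd.errors.ParserError) as err:
        # OSError covers missing files and failed downloads
        print(f"Could not load dataset: {err}")
        return None


# Usage with an in-memory CSV (made-up values)
sample = io.StringIO("id,price\n1,100\n2,150\n")
sample_df = load_listings(sample)

# A path that does not exist is reported instead of raising
missing_df = load_listings("no_such_file.csv")
```

Returning `None` keeps the sketch short; a stricter notebook might prefer to re-raise so that later cells do not run on missing data.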

### Dataset First View

In [None]:
# Dataset First Look
airbnb_df.head()

In [None]:
airbnb_df.tail()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
row_count, column_count = airbnb_df.shape
print("Number of rows:", row_count)
print("Number of columns:", column_count)

### Dataset Information

In [None]:
# Dataset Info
airbnb_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_count = airbnb_df.duplicated().sum()
print("Number of duplicate values:", duplicate_count)

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
airbnb_df.isnull().sum()

In [None]:
# Visualizing the missing values
plt.figure(figsize=(5, 3))
sns.heatmap(airbnb_df.isnull(), cbar=False, cmap='Reds')
plt.title('Missing Values Heatmap')
plt.show()

In [None]:
# No. of room types available for rent
airbnb_df.room_type.unique()

In [None]:
# How many neighbourhood groups are there?
airbnb_df.neighbourhood_group.unique()

In [None]:
# How many neighborhoods are there?
airbnb_df.neighbourhood.nunique()

In [None]:
# Total number of hosts
airbnb_df.host_id.nunique()

In [None]:
# Total number of listings
airbnb_df.id.nunique()

### What did you know about your dataset?


1.   The dataset has 16 columns and 48,895 rows.
2.   A significant number of values seem to be missing from the last_review and reviews_per_month columns.
3.   The rows are entirely unique meaning there is no repeated record.
4.   There are five neighborhood groups, 221 neighborhoods, 37,457 hosts, 48,895 listings, and three types of rooms
     available for rental.
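
To put a number on "a significant number of values seem to be missing", the share of nulls per column can be computed directly. The sketch below uses a small made-up frame named `toy` in place of `airbnb_df`; only the column names are taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for airbnb_df; the values are invented for illustration
toy = pd.DataFrame({
    "price": [100, 150, 80, 200],
    "last_review": ["2019-05-01", None, None, "2019-06-15"],
    "reviews_per_month": [0.5, np.nan, np.nan, 1.2],
})

# isnull() gives True/False per cell; the column mean of that is the missing fraction
missing_pct = toy.isnull().mean().mul(100).round(1)
print(missing_pct)
```

Applied to the real data, the same expression would be `airbnb_df.isnull().mean().mul(100)`.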

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
airbnb_df.columns

In [None]:
# Dataset Describe
airbnb_df.describe()

### Variables Description

1.   ID : Unique ID of the Airbnb listing
2.   Name : Name of the Airbnb listing
3.   Host ID : Unique ID of the host
4.   Host name : Name of the host
5.   Neighbourhood group : Borough in which the listing is located
6.   Neighbourhood : Neighbourhood within the neighbourhood group
7.   Latitude : Latitude coordinate of the listing
8.   Longitude : Longitude coordinate of the listing
9.   Room type : Type of room offered
10.  Price : Listed price in dollars
11.  Minimum nights : Minimum number of nights a guest must book
12.  Number of reviews : Total number of reviews for the listing
13.  Last review : Date of the most recent review
14.  Reviews per month : Average number of reviews per month
15.  Calculated host listings count : Total number of listings owned by the host
16.  Availability 365 : Number of days in the year the listing is available

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for i in airbnb_df.columns.tolist():
  print("No. of unique values in",i,"is",airbnb_df[i].nunique())

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
nan_count = airbnb_df.isna().sum()
print(nan_count)

In [None]:
airbnb_df.dropna(subset=['name','host_name',"last_review","reviews_per_month"], inplace=True)
print(airbnb_df.isna().sum())
print(airbnb_df.shape)

**Handling the outliers in the price column**


In [None]:
# boxplot for price column
sns.boxplot(x = airbnb_df['price'])
plt.show()

In [None]:
# Helper that returns the IQR-based lower and upper bounds for a column.
def iqr_technique(DFcolumn):
  Q1 = np.percentile(DFcolumn, 25)
  Q3 = np.percentile(DFcolumn, 75)
  IQR = Q3 - Q1 # interquartile range
  lower_range = Q1 - (1.5 * IQR)
  upper_range = Q3 + (1.5 * IQR)

  return lower_range,upper_range

In [None]:
# Executing the outlier function
lower_bound, upper_bound = iqr_technique(airbnb_df['price'])
# .copy() makes airbnb_df1 an independent DataFrame, so later in-place updates
# do not trigger pandas' SettingWithCopyWarning
airbnb_df1 = airbnb_df[(airbnb_df.price > lower_bound) & (airbnb_df.price < upper_bound)].copy()

# Outliers are now removed from the price column; confirm with a boxplot and compare the shapes
sns.boxplot(x = airbnb_df1['price'])
plt.show()
print("shape of the dataset before removing outlier:", airbnb_df.shape)
print("shape of the dataset after removing outlier:", airbnb_df1.shape)

### What all manipulations have you done and insights you found?

*   We removed the rows with null values in the 'name', 'host_name', 'last_review' & 'reviews_per_month' columns.
*   After that, we confirmed that all null values were removed using isna().sum() and checked the new size with the shape attribute.
*   We then handled the outliers using the IQR approach, as the box plot showed a large number of outliers in the 'price' column.
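
The IQR rule used above can be followed by hand on a tiny example. The prices below are made up; the arithmetic is the same as in `iqr_technique`, with `np.percentile` using its default linear interpolation.

```python
import numpy as np

# Eight invented prices, one of them an obvious outlier
prices = np.array([50, 60, 70, 80, 90, 100, 110, 1000])

q1, q3 = np.percentile(prices, [25, 75])   # 67.5 and 102.5
iqr = q3 - q1                              # 35.0
lower = q1 - 1.5 * iqr                     # 15.0
upper = q3 + 1.5 * iqr                     # 155.0

# Keep only the values strictly inside the bounds, as the notebook does
kept = prices[(prices > lower) & (prices < upper)]
print(lower, upper)
print(kept)
```

Here the value 1000 falls above the upper bound of 155 and is dropped, while the other seven prices are kept.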

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

### **Chart - 1: Distribution of Airbnb Bookings Price Range.**

The price column has a minimum value of 0, which is surprising because a price of 0 doesn't make business sense. To overcome this problem, let's replace each price of 0 with an appropriate value: the median price of the same room_type.

In [None]:
# Chart - 1 visualization code
# Fill prices of 0 with the median non-zero price of the same room type.
for room in ['Entire home/apt', 'Private room', 'Shared room']:
  is_room = airbnb_df1.room_type == room
  median_price = airbnb_df1.loc[is_room & (airbnb_df1.price != 0), 'price'].median()
  airbnb_df1.loc[is_room & (airbnb_df1.price == 0), 'price'] = median_price
airbnb_df1.shape # after updating price column
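
As an alternative sketch, the same per-room-type median fill can be written without naming each room type, by treating 0 as missing and using `groupby(...).transform`. The frame `toy` and the column `price_filled` are illustrative names with made-up values, not part of the notebook.

```python
import pandas as pd

# Toy data: two listings have a price of 0
toy = pd.DataFrame({
    "room_type": ["Private room", "Private room", "Private room",
                  "Shared room", "Shared room"],
    "price": [0, 60, 80, 0, 40],
})

# Turn zeros into NaN so they are ignored when the medians are computed
nonzero = toy["price"].where(toy["price"] != 0)

# Median of the non-zero prices within each room type, aligned row by row
medians = nonzero.groupby(toy["room_type"]).transform("median")

# Fill only the rows that were zero
toy["price_filled"] = nonzero.fillna(medians)
print(toy)
```

The zero-priced private room receives 70 (the median of 60 and 80) and the zero-priced shared room receives 40.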

In [None]:
# Create a figure with a custom size
plt.figure(figsize=(12, 5))

# Set the seaborn theme to darkgrid
sns.set_theme(style='darkgrid')

# Create a histogram of the 'price' column of the airbnb_df1 dataframe
# using the sns histplot function, in red, with a KDE curve overlaid
sns.histplot(airbnb_df1['price'], color='r', kde=True)

# Add labels to the x-axis and y-axis (histplot shows counts by default)
plt.xlabel('Price', fontsize=14)
plt.ylabel('Count', fontsize=14)

# Add a title to the plot
plt.title('Distribution of Airbnb Prices',fontsize=15)
plt.show()

##### **1. Why did you pick the specific chart?**

*   A histogram displays the distribution of a dataset by showing how often values fall within each range (bin), which makes it useful for spotting patterns such as peaks or skew.
*   Thus, we used the histogram plot to analyse the price distribution in the dataset.
*   A boxplot summarizes the key statistical characteristics of a dataset, including the median, quartiles, and range, in a single plot. Boxplots are useful for identifying outliers, comparing the distribution of multiple datasets, and understanding the dispersion of the data.
*   Thus, I used the box plot in the data wrangling step to analyse the outliers and the interquartile range, including the median, maximum, and minimum values.

##### **2. What is/are the insight(s) found from the chart?**

*   The range of prices being charged on Airbnb appears to be from 20 to 330 dollars, with the majority of listings falling in the price range of 50 to 150 dollars.
*   The distribution of prices appears to have a peak in the 50 to 150 dollars range, with a relatively lower density of listings in higher and lower price ranges.
*   There may be fewer listings available at prices above 250 dollars, as the density of listings drops significantly in this range.

##### **3. Will the gained insights help creating a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

A histogram and a box plot alone cannot give complete information about the data. They were used here only to see how the price values are distributed across the dataset.

### **Chart - 2: Highest number of apartments owned by the host.**

In [None]:
# Number of listings owned by the hosts according to host_name column.
airbnb_df1['host_name'].value_counts().reset_index().head(10)

In [None]:
airbnb_df1['host_name'].value_counts()[:10].plot(kind='barh')
plt.title('Top hosts(not distinct) with highest number of listings on Airbnb')

In [None]:
# Number of listings owned by the hosts according to host_id column.
airbnb_df1["host_id"].value_counts().reset_index().head(10)

In [None]:
airbnb_df1['host_id'].value_counts()[:10].plot.bar()
plt.title('Top host_id with highest number of listings on Airbnb')

In [None]:
# Top 10 distinct hosts (host_id + host_name) having highest number of listings on airbnb
top_host = airbnb_df1.groupby(['host_id','host_name'])['calculated_host_listings_count'].max().sort_values(ascending=False).head(10).reset_index()
h_name = top_host['host_name']
count = top_host['calculated_host_listings_count']
# Figure Size
plt.figure(figsize=(15, 5))
# Vertical Bar Plot
plt.bar(h_name, count)
# Show Plot
plt.title("Top hosts(distinct) with highest number of listings on Airbnb")
plt.show()

##### **1. Why did you pick the specific chart?**

A bar plot is an effective visualization technique for comparing the hosts with the highest number of listings in the Airbnb dataset. This visual representation allows for easy interpretation and identification of trends. Viewers can quickly compare the heights of the bars to determine which hosts have the highest number of listings.

##### **2. What is/are the insight(s) found from the chart?**

The name Michael appears 309 times in the host_name column, which might suggest that a host called Michael has the highest number of rooms. However, the most frequent host_id appears only 164 times. Several different hosts can share the same name, which is why the highest count for host_name differs from the highest count for host_id.
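
The difference between counting by name and counting by ID can be reproduced on a toy example. The IDs and rows below are made up; only the column names come from the dataset.

```python
import pandas as pd

# Three different hosts share the name "Michael"
toy = pd.DataFrame({
    "host_id": [101, 101, 202, 303, 404],
    "host_name": ["Michael", "Michael", "Michael", "Michael", "Sonder (NYC)"],
})

by_name = toy["host_name"].value_counts()   # counts rows per name
by_id = toy["host_id"].value_counts()       # counts rows per distinct host

# Number of distinct host IDs hiding behind each name
hosts_per_name = toy.groupby("host_name")["host_id"].nunique()

print(by_name)
print(by_id)
print(hosts_per_name)
```

"Michael" appears four times by name, but no single host ID appears more than twice, because three distinct hosts carry that name. Grouping by both `host_id` and `host_name` avoids the ambiguity.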

##### **3. Will the gained insights help creating a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

*   Sonder (NYC) has the maximum number of rooms for guests, with multiple apartments in the same building and across different neighbourhoods, so it is likely a very important host for Airbnb.
*   The analysis can have a positive business impact by helping Airbnb maintain a good relationship with the top hosts who own the highest number of listings.


### **Chart - 3: Which are the top 10 neighbourhoods having maximum number of apartments on Airbnb?**

In [None]:
airbnb_df1['neighbourhood'].value_counts().head(10)

In [None]:
# Plotting top 10 neighbourhoods which are having maximum number of apartments on Airbnb
# (Series.value_counts is used because the top-level pd.value_counts is deprecated)
airbnb_df1['neighbourhood'].value_counts().head(10).plot.bar()
plt.title("Top neighbourhoods with highest number of listings")

##### **1. Why did you pick the specific chart?**

A column plot is an effective visualization technique for comparing the neighbourhoods with the highest number of listings on Airbnb. Viewers can quickly compare the heights of the columns to determine which neighbourhoods have the highest number of listings.

##### **2. What is/are the insight(s) found from the chart?**

Bedford-Stuyvesant and Williamsburg are the neighbourhoods with the highest number of properties on Airbnb.

##### **3. Will the gained insights help creating a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

*   Analyzing the distribution of listings across neighbourhoods can provide insights into the popularity.
*   Identifying the most popular neighbourhoods can help hosts focus their marketing efforts and optimize their listings.
*   Concentrating resources in the neighbourhoods with the highest number of listings can increase bookings and profitability.

### **Chart - 4: Which are the top 3 neighbourhoods in each neighbourhood group by maximum price?**

In [None]:
# Dataframe of each neighbourhood group
df_manhattan=airbnb_df1[airbnb_df1['neighbourhood_group']=='Manhattan']
df_queens=airbnb_df1[airbnb_df1['neighbourhood_group']=='Queens']
df_brooklyn=airbnb_df1[airbnb_df1['neighbourhood_group']=='Brooklyn']
df_bronx=airbnb_df1[airbnb_df1['neighbourhood_group']=='Bronx']
df_staten=airbnb_df1[airbnb_df1['neighbourhood_group']=='Staten Island']

In [None]:
# Top 3 neighbourhoods in Manhattan which are having maximum prices
my_df1 = df_manhattan.groupby(['neighbourhood'])['price'].max().sort_values(ascending=False).reset_index().head(3)
print('Top 3 neighbourhoods in Manhattan which are having maximum prices')
print(my_df1)

In [None]:
# Top 3 neighbourhoods in Staten Island which are having maximum prices
print('Top 3 neighbourhoods in Staten Island which are having maximum prices')
df_staten.groupby(['neighbourhood'])['price'].max().sort_values(ascending=False).reset_index().head(3)

In [None]:
# top 3 neighbourhoods in bronx which are having maximum prices
print('Top 3 neighbourhoods in Bronx which are having maximum prices')
df_bronx.groupby(['neighbourhood'])['price'].max().sort_values(ascending=False).reset_index().head(3)

In [None]:
# top 3 neighbourhoods in Queens which are having maximum prices
print('Top 3 neighbourhoods in Queens which are having maximum prices')
df_queens.groupby(['neighbourhood'])['price'].max().sort_values(ascending=False).reset_index().head(3)

In [None]:
# top 3 neighbourhoods in brooklyn which are having maximum prices
print('Top 3 neighbourhoods in Brooklyn which are having maximum prices')
df_brooklyn.groupby(['neighbourhood'])['price'].max().sort_values(ascending=False).reset_index().head(3)
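
The five per-borough cells above repeat the same steps, so they could be collapsed into one grouped computation. The sketch below is one possible way to do that, shown on made-up prices; `toy`, `max_price`, and `top3` are illustrative names.

```python
import pandas as pd

# Toy listings in two neighbourhood groups (prices are invented)
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 5 + ["Bronx"] * 3,
    "neighbourhood": ["Harlem", "Chelsea", "SoHo", "Midtown", "Harlem",
                      "Fordham", "Mott Haven", "Fordham"],
    "price": [120, 300, 280, 250, 90, 60, 75, 95],
})

# Maximum price per neighbourhood within each neighbourhood group
max_price = toy.groupby(
    ["neighbourhood_group", "neighbourhood"], as_index=False
)["price"].max()

# Sort by price within each group, then keep the first three rows per group
top3 = (
    max_price.sort_values(["neighbourhood_group", "price"], ascending=[True, False])
    .groupby("neighbourhood_group")
    .head(3)
    .reset_index(drop=True)
)
print(top3)
```

A group with fewer than three neighbourhoods, such as the toy Bronx data here, simply returns all of its rows.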

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

### **Chart - 5: How neighbourhood is related to reviews ?**

#### **Top 5 Neighbourhoods having highest reviews per month**

In [None]:
my_df1 = airbnb_df1.groupby(['neighbourhood'])['reviews_per_month'].sum().sort_values(ascending=False).reset_index().head()
print(my_df1)

# Plotting the pie chart
plt.pie(my_df1['reviews_per_month'], labels=my_df1['neighbourhood'], autopct='%1.1f%%', startangle=90)
plt.title('Top 5 neighbourhoods having highest reviews per month')
plt.show()

#### **Top 5 Neighbourhoods having highest number of reviews**

In [None]:
my_df2 = airbnb_df1.groupby(['neighbourhood'])['number_of_reviews'].sum().sort_values(ascending=False).reset_index().head()
print(my_df2)

# Plotting the pie chart
plt.pie(my_df2['number_of_reviews'], labels=my_df2['neighbourhood'], autopct='%1.1f%%', startangle=90)
plt.title('Top 5 neighbourhoods having highest number of reviews')
plt.show()

##### **1. Why did you pick the specific chart?**

A pie chart is a suitable visualization for comparing the top neighbourhoods by reviews in the Airbnb dataset. Each slice corresponds to a neighbourhood, and its size reflects that neighbourhood's share of the reviews among the five shown. By comparing the slices, it is easy to see which neighbourhood has the highest number of reviews and to identify the most-reviewed neighbourhoods for accommodation in NYC.
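
One point worth keeping in mind when reading these pies: the percentages are computed over the five plotted neighbourhoods only, not over all reviews in the dataset. The sketch below shows the difference on made-up review counts, with placeholder labels `n0` to `n24`.

```python
import pandas as pd

# 25 hypothetical neighbourhoods: five large ones and twenty small ones
counts = [500, 400, 300, 200, 100] + [75] * 20
reviews = pd.Series(counts, index=[f"n{i}" for i in range(25)],
                    name="number_of_reviews")

top5 = reviews.sort_values(ascending=False).head(5)

# What a pie chart of the top five displays
share_within_top5 = (top5 / top5.sum() * 100).round(1)

# Share of the same neighbourhoods relative to every review
share_of_all = (top5 / reviews.sum() * 100).round(1)

print(share_within_top5)
print(share_of_all)
```

In this toy data the largest neighbourhood takes a third of the pie but holds only about a sixth of all reviews.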

##### **2. What is/are the insight(s) found from the chart?**

Bedford-Stuyvesant is the neighbourhood with the highest number of total reviews and the highest number of reviews per month.

##### **3. Will the gained insights help creating a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

By considering the reviews from the customers across all the neighbourhoods, hosts and guests can align their strategies and choices with the preferences and trends observed in the market.

### **Chart - 6: What is the average price of Airbnb listings categorized by room type?**

In [None]:
# applying groupby over 'neighbourhood_group' and 'room_type'
# then taking the mean of price and unstacking for clear visualization

avg_price_df = airbnb_df1.groupby(['neighbourhood_group','room_type'])['price'].mean().unstack()
print(avg_price_df)

# plotting the barplot
avg_price_df.plot.bar(figsize=(15,5),ylabel='Average Price calculated')
plt.title("Average price for each room type according to each neighbourhood group")
plt.show()
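
The groupby, mean, and unstack steps produce a table with one row per neighbourhood group and one column per room type. `DataFrame.pivot_table` gives the same layout in one call, as this sketch on made-up prices shows; `toy`, `via_groupby`, and `via_pivot` are illustrative names.

```python
import pandas as pd

# Toy listings (prices are invented)
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Entire home/apt", "Private room"],
    "price": [200, 180, 90, 150, 60],
})

# The approach used in the notebook
via_groupby = toy.groupby(["neighbourhood_group", "room_type"])["price"].mean().unstack()

# The same table built with pivot_table
via_pivot = toy.pivot_table(index="neighbourhood_group", columns="room_type",
                            values="price", aggfunc="mean")

print(via_groupby)
print(via_pivot)
```

Either form can be passed to `.plot.bar()`; the choice is mostly a matter of readability.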

##### **1. Why did you pick the specific chart?**

A column chart is used in the Airbnb dataset to visualize the distribution of listing prices across different room types. This plot allows for a clear comparison of prices among the categories, with each room type represented by a separate column. The column chart provides valuable insights into the price dynamics and trends associated with each room type, assisting hosts and guests in making informed decisions based on their budget and preferences.

##### **2. What is/are the insight(s) found from the chart?**

*   The bar plot allows for a visual representation of the price ranges and variations among different room types. Guests can quickly identify the price brackets associated with each room type, helping them narrow down their search and find accommodations that fit their budget.
*   Hosts can leverage the information from the bar plot to make informed pricing decisions. By understanding the distribution of prices for different room types, hosts can set competitive and attractive prices for their listings.

##### **3. Will the gained insights help creating a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

The bar plot shows how the average price of each room type varies across neighbourhood groups. Hosts can use this information to benchmark their listings against comparable ones. By offering competitive prices for their room type and location, hosts can attract more bookings and enhance guest satisfaction.

### **Chart - 7: What is the distribution of room types across different neighbourhood groups?**

In [None]:
# Set figure size
# plt.figure(figsize=(7, 5))

# Set title
plt.title("Room Type on Neighbourhood Group")

# Defining the desired color palette
colors = ['#41d3bd', '#f05d5e', '#272932']

# Setting the theme and color palette for the plot
sns.set_theme(style="whitegrid", palette=colors)

# Creating a countplot showing the distribution of room types in each neighbourhood group
sns.countplot(x=airbnb_df1.neighbourhood_group, hue=airbnb_df1.room_type)
plt.xlabel('Neighbourhood Group')
plt.ylabel('Count')

# Add a legend for the room types
plt.legend(title='Room Type')

# Adjust the plot layout
plt.tight_layout()

# Display the plot
plt.show()

##### **1. Why did you pick the specific chart?**

A countplot shows the distribution of room types in each neighbourhood group. This chart was selected because it effectively visualizes the relationship between the categorical variables "neighbourhood_group" and "room_type" by displaying the counts of each room type within each neighbourhood group. It allows for easy comparison and identification of the dominant room types in different neighbourhood groups.

##### **2. What is/are the insight(s) found from the chart?**

*   The chart provides insights into the distribution of room types across different neighbourhood groups.
*   It helps identify the most common room type in each neighbourhood group.
*   It reveals any variations in room type distribution between neighbourhood groups.

##### **3. Will the gained insights help creating a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

*   The gained insights can positively impact business by informing marketing strategies and target audience selection.
*   They can help hosts optimize pricing and revenue by aligning with prevalent room types in each neighbourhood group.
*   The insights assist in resource allocation and inventory management based on demand patterns.
*   No insights from the chart directly lead to negative growth. However, a limited distribution of room types in a neighborhood group may require hosts to consider diversifying offerings or targeting alternative markets to mitigate potential negative impacts.

### **Chart - 8: What is the distribution of room types, overall and across locations?**

#### **Room type distribution**

In [None]:
airbnb_df1['room_type'].value_counts().plot(kind='bar',color=['g','b','y'])
plt.title("Total number of listings across each room_type")
plt.show()

Let's see how room_type is distributed across locations. Is there any place where one particular room_type dominates the others, despite their overall ratios?

**Scatter plot**

In [None]:
plt.figure(figsize=(15,8))
sns.scatterplot(x=airbnb_df1['longitude'],y=airbnb_df1['latitude'], hue=airbnb_df1['room_type']).set(title='Room type scatter plot on map')
plt.show()

##### **1. Why did you pick the specific chart?**

A scatter plot of longitude against latitude, colored by room type, approximates a map of New York City. This chart was selected to visualize the geographical distribution of room types across different neighborhoods.

##### **2. What is/are the insight(s) found from the chart?**

We can notice the following:

1) Most listings are Entire home/apt or Private room; there are only a few Shared rooms.

2) Hosts therefore mostly offer an entire home/apartment or a private room rather than a shared room.

##### **3. Will the gained insights help creating a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

It helps identify clusters of the various room types in specific areas of the city, which makes it easier for Airbnb to identify the most popular room types.

### **Chart - 9: Top neighbourhood groups having highest number of reviews?**

In [None]:
my_df1 = airbnb_df1.groupby(['neighbourhood_group'])['number_of_reviews'].sum().sort_values(ascending=False).reset_index()
print(my_df1)

# Bar plot of the total number of reviews per neighbourhood group
fig, ax = plt.subplots(figsize=(8, 5))
sns.barplot(x = 'neighbourhood_group', y = 'number_of_reviews', data = my_df1, ax=ax).set(title='Top neighbourhood groups having highest number of reviews')
plt.show()

##### **1. Why did you pick the specific chart?**

A bar chart is used in the Airbnb dataset to visualize the highest number of reviews gained by the neighbourhood groups. This plot allows for a clear comparison of neighbourhood groups each represented by a separate column.

##### **2. What is/are the insight(s) found from the chart?**

Brooklyn has the highest number of reviews, and Staten Island has the lowest. Manhattan has the second-highest number of reviews, after Brooklyn.

##### **3. Will the gained insights help creating a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

*   Higher review counts suggest more bookings, although a review count on its own does not show whether guests were satisfied.
*   Positive reviews contribute to improved reputation and credibility for hosts, attracting more guests and increasing bookings.
*   However, it's important to manage negative reviews and address any issues to maintain a positive reputation.

### **Chart - 10: Stay Requirement counts by Minimum Nights**

In [None]:
# Group the DataFrame by the minimum_nights column and count the number of rows in each group
min_nights_count = airbnb_df1.groupby('minimum_nights').size().reset_index(name = 'count')

# Sort the resulting DataFrame in ascending order by the minimum_nights column
min_nights_count = min_nights_count.sort_values('minimum_nights', ascending=True)

# Select the first 15 rows, i.e. the 15 shortest minimum-stay values
min_nights_count = min_nights_count.head(15)

# Reset the index
min_nights_count = min_nights_count.reset_index(drop=True)

# Display the resulting DataFrame
min_nights_count
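
For reference, the same count-per-value table can be obtained more compactly with `value_counts` followed by `sort_index`. This is a sketch on made-up values; `toy` and `counts` are illustrative names.

```python
import pandas as pd

# Toy minimum_nights values (invented)
toy = pd.Series([1, 2, 2, 3, 1, 30, 2, 1, 2], name="minimum_nights")

# Count listings per minimum_nights value, ordered by the value itself,
# and keep the 15 shortest stay requirements
counts = toy.value_counts().sort_index().head(15)
print(counts)
```

The result is a Series indexed by `minimum_nights`, so `counts.index` and `counts.values` can be passed straight to `plt.plot`.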

In [None]:
# Extract the minimum_nights and count columns from the DataFrame
minimum_nights = min_nights_count['minimum_nights']
count = min_nights_count['count']

# Set the figure size
plt.figure(figsize=(12, 4))

# Create the line plot and mark each data point
plt.plot(minimum_nights, count, color='brown')
plt.scatter(minimum_nights, count, color='red')

# Add axis labels and a title
plt.xlabel('Minimum Nights', fontsize=14)
plt.ylabel('Count', fontsize=14)
plt.title('Stay Requirement by Minimum Nights', fontsize=15)

# Show the plot
plt.show()

##### **1. Why did you pick the specific chart?**

A line plot shows how the number of listings changes with the minimum-night requirement. This information can help hosts and guests make informed decisions based on how common each minimum stay is on Airbnb.

##### **2. What is/are the insight(s) found from the chart?**

*   The majority of listings on Airbnb have a minimum stay requirement of 1 or 2 nights, with 9487 and 9799 listings, respectively.
*   The number of listings decreases as the required length of stay increases, with 6341 listings requiring a minimum stay of 3 nights, and so on.

##### **3. Will the gained insights help creating a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

*   Hosts may choose to set higher minimum stay requirements during peak seasons or events to maximize revenue. Analyzing stay requirement counts by minimum nights can help hosts optimize pricing and availability to achieve higher occupancy rates and revenue.
*   Some guests may prefer shorter stays, while others may prefer longer stays. By aligning their minimum stay requirements with customer preferences, hosts can attract more bookings and improve guest satisfaction.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
# mask = np.triu(np.ones_like(airbnb_df.corr(), dtype=bool))
cmap = sns.diverging_palette(100, 7, s=75, l=40, n=5, center="light", as_cmap=True)
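# Restrict the correlation to numeric columns so corr() does not fail on text columns (pandas >= 1.5)
hm = sns.heatmap(data=airbnb_df1.corr(numeric_only=True), annot=True, cmap=cmap, fmt='.2f')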
plt.show()

##### 1. Why did you pick the specific chart?

The correlation heatmap was chosen because it provides a comprehensive overview of the relationships between variables in the dataset. It allows for easy identification of patterns and associations between different features, helping to uncover potential insights and understand the underlying structure of the data.

##### 2. What is/are the insight(s) found from the chart?

*   The heatmap provides a visual representation of the pairwise correlations between the numeric variables in the Airbnb dataset.
*   The insights gained from the heatmap can help understand the interdependencies between variables and their relationships.
*   These insights can inform decision-making processes by identifying strong correlations and potential factors influencing certain outcomes.
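
To move from the picture to specific pairs, the correlation matrix can be ranked by absolute value. The sketch below is one way to do that under stated assumptions: the `toy` frame is synthetic, and its column names are only assumed to resemble those in the Airbnb data.

```python
import numpy as np
import pandas as pd

# Synthetic data: reviews_per_month is built from number_of_reviews,
# while price is generated independently of both.
rng = np.random.default_rng(0)
n = 200
reviews = rng.poisson(20, n)
toy = pd.DataFrame({
    "number_of_reviews": reviews,
    "reviews_per_month": reviews / 12 + rng.normal(0, 0.2, n),
    "price": rng.normal(150, 40, n),
})

# Correlate numeric columns only.
corr = toy.select_dtypes("number").corr()

# Keep the upper triangle (k=1 skips the diagonal) so each pair appears once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Flatten to one row per pair and sort by absolute correlation.
pairs = upper.stack().dropna().rename("correlation").reset_index()
pairs.columns = ["feature_a", "feature_b", "correlation"]
pairs = pairs.sort_values("correlation", key=lambda s: s.abs(), ascending=False)

print(pairs)
```

Ranking by absolute value treats strong negative and strong positive relationships the same way, which matches how the heatmap is usually read.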

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# Create a pairplot using the seaborn library to visualize the relationships between different variables in the Airbnb NYC dataset
sns.pairplot(airbnb_df1)
# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

The pair plot was chosen to visualize the relationships between different variables in the Airbnb NYC dataset in a concise and comprehensive manner.

##### 2. What is/are the insight(s) found from the chart?

*   The pair plot is a visual representation of the relationships between pairs of variables in the Airbnb 2019 NYC dataset.
*   It displays scatter plots for each pair of numeric variables and, on the diagonal, a histogram of each variable's own distribution. Non-numeric columns are not plotted.
*   By examining the scatter plots, we can identify patterns such as linear relationships, where two variables have a strong positive or negative correlation.
*   Clusters of points in the scatter plots can indicate groups or subsets within the data that share similar characteristics.
*   Outliers, represented as data points deviating significantly from the general trend, can also be detected in the pair plot.
*   The pair plot provides insights into the correlations and distributions between variables, allowing us to identify potential associations and dependencies.
*   These insights can be valuable for further analysis and decision-making processes, such as selecting relevant features for predictive modeling or understanding the factors influencing certain outcomes.
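
Outliers spotted visually in the scatter plots can also be flagged numerically. A minimal sketch using the common 1.5 × IQR rule follows; this rule is swapped in for illustration and is not taken from the notebook, and the `prices` values are made up.

```python
import pandas as pd

# Hypothetical nightly prices with one extreme value.
prices = pd.Series([80, 95, 100, 110, 120, 125, 130, 150, 160, 2500], name="price")

# Interquartile range and the usual 1.5 * IQR fences.
q1, q3 = prices.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the fences are treated as outliers.
outliers = prices[(prices < lower) | (prices > upper)]

print(outliers)
```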

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?


Based on the analysis of the NYC Airbnb dataset, here are some suggestions to help the client achieve their business objectives:
1.   Optimize Pricing Strategy: Analyze the relationship between pricing and factors such as location, property type, and amenities. Adjust prices accordingly to maximize occupancy rates and profitability. Consider implementing dynamic pricing strategies based on demand and seasonality.
2.   Improve Listing Quality: Enhance the attractiveness and competitiveness of listings by improving the quality of descriptions, photos, and amenities. Highlight unique selling points and ensure accurate and detailed information to attract potential guests.
3.   Enhance Customer Experience: Focus on providing excellent customer service and ensuring a positive experience for guests. Promptly address any issues or concerns, and encourage guests to leave reviews to build credibility and attract more bookings.
4.   Maintain Property Availability: Ensure a consistent availability of properties throughout the year, especially during peak tourist seasons and events. Plan maintenance and renovation schedules to minimize disruptions and maximize occupancy rates.
5.   Monitor Competitor Activity: Stay updated on the offerings and prices of competitors in the local market. Adjust strategies accordingly to remain competitive and capture potential guests.
6.   Consider Special Offers and Discounts: Implement promotional offers, discounts, or loyalty programs to attract guests and encourage repeat bookings. Monitor the impact of these offers on occupancy rates and profitability.
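
As a starting point for the pricing suggestion, a host could compare typical prices by location and room type. The sketch below assumes columns named `neighbourhood_group`, `room_type`, and `price`; both the names and the values are illustrative, not results from the project data.

```python
import pandas as pd

# Hypothetical listings (values invented for illustration).
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Brooklyn"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt",
                  "Entire home/apt", "Private room", "Private room"],
    "price": [220, 110, 260, 150, 70, 80],
})

# Median price for each area / room type combination.
median_price = toy.pivot_table(
    index="neighbourhood_group",
    columns="room_type",
    values="price",
    aggfunc="median",
)

print(median_price)
```

The median is used rather than the mean because a few very expensive listings would otherwise distort the comparison.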





# **Conclusion**

The Airbnb project aimed to provide a comprehensive analysis of the short-term rental market in New York City using a dataset that includes various attributes of Airbnb listings. By examining the information available, we were able to gain valuable insights into the trends and patterns within the industry.

Room types provided insights into the variety of accommodations available on Airbnb. The dataset included information on the type of room listed, such as entire homes, private rooms, or shared rooms. This analysis helped us understand the preferences of hosts and guests and provided valuable information on the diversity of options for travelers in NYC.

Price was a significant aspect of the project, as it directly impacted both hosts and guests. The dataset included the price of each listing, allowing us to analyze the range and distribution of prices across different neighborhoods and room types. This information helped us identify areas with higher-priced accommodations and understand the factors that influenced pricing.

Minimum nights indicated the minimum duration for which guests had to book a listing. By examining this attribute, we gained insights into the preferences of hosts and the demand for longer stays in the Airbnb market.

The number of reviews and the date of the last review provided important metrics to evaluate the popularity and performance of listings. Analyzing the number of reviews allowed us to understand the level of engagement and satisfaction among guests. Additionally, the reviews per month metric provided insights into the frequency of bookings and the overall demand for a particular listing.

The calculated host listings count attribute indicated the total count of listings for each host. This allowed us to identify hosts with multiple listings and assess the involvement of professional hosts in the Airbnb market.
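
A rough way to quantify the presence of multi-listing hosts is sketched below. It assumes a `host_id` column identifying the host of each listing; the frame is toy data, and the threshold of more than one listing is an illustrative choice.

```python
import pandas as pd

# Hypothetical listings, one row per listing.
toy = pd.DataFrame({"host_id": [1, 1, 1, 2, 3, 3, 4]})

# Number of listings per host.
listings_per_host = toy["host_id"].value_counts()

# Hosts with more than one listing, and the share of listings they control.
multi_hosts = listings_per_host[listings_per_host > 1]
share_multi = multi_hosts.sum() / len(toy)

print(f"{len(multi_hosts)} multi-listing hosts hold {share_multi:.0%} of listings")
```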

Finally, the availability 365 attribute captured the number of days per year on which a listing is available for booking. This information helped us distinguish listings that are offered nearly year-round from those with limited availability. Because it is a single yearly total, it does not by itself show when during the year demand is highest.

In conclusion, by examining attributes such as listing details, host information, location, pricing, reviews, and availability, we gained valuable insights into the industry's dynamics, trends, and patterns. This information can help stakeholders, including hosts, guests, policymakers, and researchers, make informed decisions and understand the impact of short-term rentals on the city.

