# **Project Name - Short-Term Rental Bookings Analysis**

**Author:** Prahlad Somapurkar  
This notebook explores pricing, availability, and customer behavior in a short-term rental dataset using Python exploratory data analysis techniques.


### **Project Type - Exploratory Data Analysis**


# **Project Summary -**

*   **The purpose of the analysis:** to understand the factors that influence short-term rental prices in New York City and to identify patterns across the variables. The analysis provides useful information for travelers and hosts in the city, along with insights for the short-term rental business.
*   This project involved exploring and cleaning a dataset to prepare it for analysis.
*   This project also helps to understand customer preferences and popular areas for booking.
*   The insights can support better pricing and business decisions in the rental market.

# **Problem Statements -**

1. What are the most popular neighborhoods for short-term rentals in New York City?

2. How has the short-term rental market in New York City changed over time?

3. Are there any patterns or trends in the types of properties that are being rented out as short-term rentals in New York City?

4. Are there any factors that seem to be correlated with the prices of short-term rental rent

# **Let's Begin !**




### **Importing the necessary libraries**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt     #for visualization
%matplotlib inline
import seaborn as sns               #for visualization
import warnings
warnings.filterwarnings('ignore')

### **Load short-term rental Dataset**

In [None]:
rental_df = pd.read_csv('/content/Airbnb NYC 2019.csv')
rental_df

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.94190,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.10,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48890,36484665,Charming one bedroom - newly renovated rowhouse,8232441,Sabrina,Brooklyn,Bedford-Stuyvesant,40.67853,-73.94995,Private room,70,2,0,,,2,9
48891,36485057,Affordable room in Bushwick/East Williamsburg,6570630,Marisol,Brooklyn,Bushwick,40.70184,-73.93317,Private room,40,4,0,,,2,36
48892,36485431,Sunny Studio at Historical Neighborhood,23492952,Ilgar & Aysel,Manhattan,Harlem,40.81475,-73.94867,Entire home/apt,115,10,0,,,1,27
48893,36485609,43rd St. Time Square-cozy single bed,30985759,Taz,Manhattan,Hell's Kitchen,40.75751,-73.99112,Shared room,55,1,0,,,6,2


## **About the Dataset – short-term rental Bookings**

*   This short-term rental dataset contains nearly 49,000 observations from New York, with 16 columns of data.

*   The data includes both categorical and numeric values, providing a diverse range of information about the listings.

*   This Dataset may be useful for analyzing trends and patterns in the short-term rental market in New York and

In [None]:
rental_df.head(3)

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365


## **UNDERSTAND THE GIVEN VARIABLES**

**Listing_id :-** This is a unique identifier for each listing in the dataset.

**Listing_name :-** This is the name or title of the listing, as it appears on the short-term rental website.

**Host_id :-** This is a unique identifier for each host in the dataset.

**Host_name :-** This is the name of the host as it appears on the short-term rental website.

**

# **Data Exploration and Data Cleaning**

In [None]:
rental_df.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [None]:
rental_df.head().T

Unnamed: 0,0,1,2,3,4
id,2539,2595,3647,3831,5022
name,Clean & quiet apt home by the park,Skylit Midtown Castle,THE VILLAGE OF HARLEM....NEW YORK !,Cozy Entire Floor of Brownstone,Entire Apt: Spacious Studio/Loft by central park
host_id,2787,2845,4632,4869,7192
host_name,John,Jennifer,Elisabeth,LisaRoxanne,Laura
neighbourhood_group,Brooklyn,Manhattan,Manhattan,Brooklyn,Manhattan
neighbourhood,Kensington,Midtown,Harlem,Clinton Hill,East Harlem
latitude,40.64749,40.75362,40.80902,40.68514,40.79851
longitude,-73.97237,-73.98377,-73.9419,-73.95976,-73.94399
room_type,Private room,Entire home/apt,Private room,Entire home/apt,Entire home/apt
price,149,225,150,89,80


In [None]:
#checking what are the variables here:
rental_df.columns

Index(['id', 'name', 'host_id', 'host_name', 'neighbourhood_group',
       'neighbourhood', 'latitude', 'longitude', 'room_type', 'price',
       'minimum_nights', 'number_of_reviews', 'last_review',
       'reviews_per_month', 'calculated_host_listings_count',
       'availability_365'],
      dtype='object')



---


*   **First, rename a few columns to make the variables easier to understand -**

In [None]:
rename_col = {'id':'listing_id','name':'listing_name','number_of_reviews':'total_reviews','calculated_host_listings_count':'host_listings_count'}

In [None]:
# use the pandas rename method to rename the columns
rental_df = rental_df.rename(columns = rename_col)
rental_df.head(2)

Unnamed: 0,listing_id,listing_name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,total_reviews,last_review,reviews_per_month,host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355


In [None]:
#checking shape of Airbnb dataset
rental_df.shape

(48895, 16)

In [None]:
#basic information about the dataset
rental_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48895 entries, 0 to 48894
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   listing_id           48895 non-null  int64  
 1   listing_name         48879 non-null  object 
 2   host_id              48895 non-null  int64  
 3   host_name            48874 non-null  object 
 4   neighbourhood_group  48895 non-null  object 
 5   neighbourhood        48895 non-null  object 
 6   latitude             48895 non-null  float64
 7   longitude            48895 non-null  float64
 8   room_type            48895 non-null  object 
 9   price                48895 non-null  int64  
 10  minimum_nights       48895 non-null  int64  
 11  total_reviews        48895 non-null  int64  
 12  last_review          38843 non-null  object 
 13  reviews_per_month    38843 non-null  float64
 14  host_listings_count  48895 non-null  int64  
 15  availability_365     48895 non-null 

**So, listing_name, host_name, neighbourhood_group, neighbourhood, room_type and last_review (stored as date strings) are object columns, i.e. categorical or text variables.**

**While listing_id, host_id, latitude, longitude, price, minimum_nights, total_reviews, reviews_per_month, host_listings_count and availability_365 are numerical variables.**
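
The same split can be read off the dtypes programmatically. The following is a minimal sketch, assuming a small hypothetical frame `toy_df` that mimics a few of the dataset's columns (the values are illustrative only):

```python
import pandas as pd

# Hypothetical toy frame standing in for rental_df (illustrative values only)
toy_df = pd.DataFrame({
    "host_name": ["John", "Jennifer", "Elisabeth"],
    "room_type": ["Private room", "Entire home/apt", "Private room"],
    "price": [149, 225, 150],
    "reviews_per_month": [0.21, 0.38, None],
})

# Columns with an integer or float dtype
numeric_cols = toy_df.select_dtypes(include="number").columns.tolist()

# Everything else is treated as categorical/text in this sketch
categorical_cols = toy_df.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```

On the real data the same two calls could be applied to `rental_df` directly; identifier columns such as `listing_id` and `host_id` would still show up as numeric even though they are labels rather than measurements.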

---

In [None]:
# drop duplicate rows (if any) and check the remaining row counts
rental_df = rental_df.drop_duplicates()
rental_df.count()

Unnamed: 0,0
listing_id,48895
listing_name,48879
host_id,48895
host_name,48874
neighbourhood_group,48895
neighbourhood,48895
latitude,48895
longitude,48895
room_type,48895
price,48895


**So, there are no duplicate rows in the dataset: the row count is still 48895 after `drop_duplicates()`.**
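
Counting duplicates before dropping them makes this check explicit. A minimal sketch on a hypothetical toy frame (`toy_df` and its values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical toy frame with one fully duplicated row
toy_df = pd.DataFrame({
    "listing_id": [1, 2, 2, 3],
    "price": [100, 80, 80, 120],
})

# duplicated() flags every repeat of an earlier, identical row
n_duplicates = int(toy_df.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row
deduped = toy_df.drop_duplicates().reset_index(drop=True)

print(n_duplicates, len(deduped))
```

If `rental_df.duplicated().sum()` returns 0, as the unchanged row count here suggests, `drop_duplicates()` leaves the frame as it was.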

---

In [None]:
# checking null values of each columns
rental_df.isnull().sum()

Unnamed: 0,0
listing_id,0
listing_name,16
host_id,0
host_name,21
neighbourhood_group,0
neighbourhood,0
latitude,0
longitude,0
room_type,0
price,0



Only a few values are missing in **host_name** and **listing_name**, so we can simply fill them with placeholder values.
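
To put "only a few" into numbers, the missing counts can be expressed as a share of all rows. The sketch below uses the counts reported by the outputs in this notebook (16 and 21 nulls, 48895 rows, and 38843 non-null values for `last_review`); the variable names are only illustrative:

```python
# Row and null counts as reported by info() and isnull().sum() in this notebook
total_rows = 48895
missing = {
    "listing_name": 16,
    "host_name": 21,
    "last_review": 48895 - 38843,
}

# Share of missing values per column, in percent
share_pct = {col: round(100 * n / total_rows, 3) for col, n in missing.items()}

for col, pct in share_pct.items():
    print(f"{col}: {pct}% missing")
```

The two name columns are missing in well under 0.1% of rows, whereas the review columns are missing in roughly a fifth of them, which is why they are handled separately.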




In [None]:
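# assign the result back instead of calling fillna(inplace=True) on a column selection
rental_df['listing_name'] = rental_df['listing_name'].fillna('unknown')
rental_df['host_name'] = rental_df['host_name'].fillna('no_name')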

In [None]:
# confirm that the null values have been filled
rental_df[['host_name','listing_name']].isnull().sum()

Unnamed: 0,0
host_name,0
listing_name,0


Now, the columns **last_review** and **reviews_per_month** each have 10052 null values.

The **last_review** column is less relevant to our analysis than **total_reviews** and **reviews_per_month**, so we can drop it.

**listing_id** also not that much of important for our analysis but i dont remove because of **listing_id** and **listing_name** is pair and removing list

In [None]:
rental_df = rental_df.drop(['last_review'], axis=1)     # removing the last_review column because it is not needed for the analysis

In [None]:
rental_df.info()      # the last_review column is deleted

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48895 entries, 0 to 48894
Data columns (total 15 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   listing_id           48895 non-null  int64  
 1   listing_name         48895 non-null  object 
 2   host_id              48895 non-null  int64  
 3   host_name            48895 non-null  object 
 4   neighbourhood_group  48895 non-null  object 
 5   neighbourhood        48895 non-null  object 
 6   latitude             48895 non-null  float64
 7   longitude            48895 non-null  float64
 8   room_type            48895 non-null  object 
 9   price                48895 non-null  int64  
 10  minimum_nights       48895 non-null  int64  
 11  total_reviews        48895 non-null  int64  
 12  reviews_per_month    38843 non-null  float64
 13  host_listings_count  48895 non-null  int64  
 14  availability_365     48895 non-null  int64  
dtypes: float64(3), int64(7), object(5)
m

The **reviews_per_month** column also contains null values. These appear to belong to listings without reviews (rows with 0 reviews show NaN here), so replacing the NaNs with 0 makes sense -

In [None]:
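# NaN -> 0; note that the int64 cast also truncates fractional rates (e.g. 0.21 becomes 0), drop .astype('int64') to keep the decimals
rental_df['reviews_per_month'] = rental_df['reviews_per_month'].fillna(0).astype('int64')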

In [None]:
# the null values are replaced by 0 value
rental_df['reviews_per_month'].isnull().sum()

np.int64(0)

In [None]:
rental_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48895 entries, 0 to 48894
Data columns (total 15 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   listing_id           48895 non-null  int64  
 1   listing_name         48895 non-null  object 
 2   host_id              48895 non-null  int64  
 3   host_name            48895 non-null  object 
 4   neighbourhood_group  48895 non-null  object 
 5   neighbourhood        48895 non-null  object 
 6   latitude             48895 non-null  float64
 7   longitude            48895 non-null  float64
 8   room_type            48895 non-null  object 
 9   price                48895 non-null  int64  
 10  minimum_nights       48895 non-null  int64  
 11  total_reviews        48895 non-null  int64  
 12  reviews_per_month    48895 non-null  int64  
 13  host_listings_count  48895 non-null  int64  
 14  availability_365     48895 non-null  int64  
dtypes: float64(2), int64(8), object(5)
m

**So there are no null values left in the 'reviews_per_month' column** because we replaced them with 0. This makes sense because there is no other data from which those missing values could be recovered.
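
One side effect of the cell above is worth illustrating: casting to `int64` after filling does more than remove NaNs. A minimal sketch on a hypothetical series of monthly review rates (values chosen for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical monthly review rates, including one missing value
rates = pd.Series([0.21, 0.38, np.nan, 4.64], name="reviews_per_month")

# Filling only: NaN becomes 0.0 and the decimals are preserved
filled_float = rates.fillna(0)

# Filling plus integer cast: every value is truncated toward zero
filled_int = rates.fillna(0).astype("int64")

print(filled_float.tolist())
print(filled_int.tolist())
```

With the cast, rates below one review per month collapse to 0 and become indistinguishable from listings without reviews, so keeping the float dtype may be preferable when the column is analysed later.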




In [None]:
# now check a sample of the dataset: columns renamed, null values filled, last_review column removed
rental_df.sample(5)

---
### **Check Unique Value for variables and doing some experiments -**

In [None]:
# check unique values for listing/property Ids
# all the listing ids are different and each listings are different here.
rental_df['listing_id'].nunique()

In [None]:
# so there are 221 unique neighborhood in Dataset
rental_df['neighbourhood'].nunique()

In [None]:
#and total 5 unique neighborhood_group in Dataset
rental_df['neighbourhood_group'].nunique()

In [None]:
# 11453 distinct host names in the dataset (distinct hosts would be counted with host_id, since names can repeat)
rental_df['host_name'].nunique()

In [None]:
# most of the listing/property are different in Dataset
rental_df['listing_name'].nunique()

**Note** - It appears that some listings share the same name but belong to different hosts in different neighbourhoods of a neighbourhood_group.


In [None]:
rental_df[rental_df['host_name']=='David']['listing_name'].nunique()

# listings whose host is named David have 402 distinct listing names (host_name can match several different hosts called David)

In [None]:
rental_df[rental_df['listing_name']==rental_df['host_name']].head()

# there are a few listings where the listing name is identical to the host name

In [None]:
rental_df.loc[(rental_df['neighbourhood_group']=='Queens') & (rental_df['host_name']=='Alex')].head(4)

# Hosts with the same name have listings in the same or in different neighbourhoods of one neighbourhood group,
# e.g. hosts named Alex have listings across several neighbourhoods of the Queens neighbourhood_group


---

# **Describe the Dataset and removing outliers**

In [None]:
# describe the DataFrame
rental_df.describe()

**Note** - The price column is central to the analysis, so we first look for large outliers in it.

In [None]:
sns.boxplot(x = rental_df['price'])

plt.show()

---

### **using IQR technique**

In [None]:
# helper that returns the IQR-based bounds used to remove outliers in important columns
def iqr_technique(DFcolumn):
  Q1 = np.percentile(DFcolumn, 25)
  Q3 = np.percentile(DFcolumn, 75)
  IQR = Q3 - Q1                                         # interquartile range
  lower_range = Q1 - (1.5 * IQR)
  upper_range = Q3 + (1.5 * IQR)

  return lower_range,upper_range

In [None]:
lower_bound,upper_bound = iqr_technique(rental_df['price'])

rental_df = rental_df[(rental_df.price>lower_bound) & (rental_df.price<upper_bound)]
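
To see what the IQR rule does, it helps to run it on a handful of numbers. This is a minimal sketch with a hypothetical price array; `iqr_bounds` is an illustrative re-statement of the rule, not the notebook's own function:

```python
import numpy as np

def iqr_bounds(values):
    """Return (lower, upper) bounds at 1.5 * IQR beyond the quartiles."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical prices with one extreme value
prices = np.array([40, 60, 80, 100, 120, 140, 160, 180, 1000])

lower, upper = iqr_bounds(prices)

# Keep only values strictly inside the bounds, as in the filter above
kept = prices[(prices > lower) & (prices < upper)]

print(lower, upper)
print(kept)
```

Here the quartiles are 80 and 160, so the bounds are -40 and 280 and only the value 1000 is discarded. Because prices cannot be negative, the lower bound rarely removes anything; the rule mostly trims the expensive tail.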

In [None]:
# so the outliers are removed from price column now check with boxplot and also check shape of new Dataframe!

sns.boxplot(x = rental_df['price'])
print(rental_df.shape)

In [None]:
# so here outliers are removed, see the new max price
print(rental_df['price'].max())



---

# **Data Visualization**




   **(1) Distribution Of short-term rental Bookings Price Range Using Histogram**


In [None]:
# Create a figure with a custom size
plt.figure(figsize=(12, 5))

# Set the seaborn theme to darkgrid
sns.set_theme(style='darkgrid')

# Create a histogram of the 'price' column of the rental_df dataframe with a KDE overlay
# using sns histplot (distplot is deprecated in recent seaborn releases) and specifying the color as red
sns.histplot(rental_df['price'], kde=True, stat='density', color='r')

# Add labels to the x-axis and y-axis
plt.xlabel('Price', fontsize=14)
plt.ylabel('Density', fontsize=14)

# Add a title to the plot
plt.title('Distribution of Airbnb Prices',fontsize=15)

**observations -->**

*   Prices charged for short-term rentals range from about **20 to 330 dollars**, with the majority of listings falling in the range of **50 to 150 dollars.**

*   The distribution of prices peaks in the **50 to 150 dollars range**, with a relatively lower density of listings at higher and lower prices.

*   There may be fewer 
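
A statement such as "the majority of listings fall between 50 and 150 dollars" can be checked numerically rather than read off the plot. A minimal sketch, assuming a hypothetical `toy_prices` series (illustrative values, not the dataset):

```python
import pandas as pd

# Hypothetical prices standing in for rental_df['price']
toy_prices = pd.Series([45, 60, 75, 90, 110, 130, 150, 200, 250, 320], name="price")

# Boolean mask of prices inside the band (both ends inclusive)
in_band = toy_prices.between(50, 150)

# The mean of a boolean mask is the share of True values
share_in_band = in_band.mean()

# Quartiles summarise where the bulk of the distribution sits
quartiles = toy_prices.quantile([0.25, 0.5, 0.75])

print(f"share in band: {share_in_band:.0%}")
print(quartiles)
```

Applied to `rental_df['price']`, the same two lines would give the actual share of listings in that band and the quartiles to report next to the histogram.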



---


   **(2) Total Listing/Property count in Each Neighborhood Group using Count plot**




In [None]:
# Count the number of listings in each neighborhood group and store the result in a Pandas series
counts = rental_df['neighbourhood_group'].value_counts()

# Reset the index of the series so that the neighborhood groups become columns in the resulting dataframe
Top_Neighborhood_group = counts.reset_index()

# Rename the columns of the dataframe to be more descriptive
Top_Neighborhood_group.columns = ['Neighborhood_Groups', 'Listing_Counts']

# display the resulting DataFrame
Top_Neighborhood_group


In [None]:
# Set the figure size
plt.figure(figsize=(12, 8))

# Create a countplot of the neighbourhood group data
sns.countplot(x='neighbourhood_group', data=rental_df)

# Set the title of the plot
plt.title('Neighbourhood_group Listing Counts in NYC', fontsize=15)

# Set the x-axis label
plt.xlabel('Neighbourhood_Group', fontsize=14)

# Set the y-axis label
plt.ylabel('total listings counts', fontsize=14)


**Observations -->**

*   Manhattan and Brooklyn have the highest number of short-term rental listings, with over 19,000 listings each.

*   Queens and the Bronx have significantly fewer listings than Manhattan and Brooklyn, with 5,567 and 1,070 listings, respectively.

*   Staten Island has the fewest listings, with only 365.

*   The distribution of listings across the differ



---

**(3) Average Price Of Each Neighborhood Group using Point Plot**

In [None]:
# Group the Airbnb dataset by neighborhood group and calculate the mean of each numeric column
grouped = rental_df.groupby("neighbourhood_group").mean(numeric_only=True)

# Reset the index of the grouped dataframe so that the neighborhood group becomes a column
neighbourhood_group_avg_price = grouped.reset_index()

# Rename the "price" column to "avg_price"
neighbourhood_group_avg_price = round(neighbourhood_group_avg_price.rename(columns={"price": "avg_price"}),2)

# Select only the "neighbourhood_group" and "avg_price" columns
neighbourhood_group_avg_price[['neighbourhood_group', 'avg_price']].head()

In [None]:
# Create the point plot, using np.mean as the estimator
sns.pointplot(x = 'neighbourhood_group', y='price', data=rental_df, estimator = np.mean)

# Add axis labels and a title
plt.xlabel('Neighbourhood Group',fontsize=14)
plt.ylabel('Average Price',fontsize=14)
plt.title('Average Price by Neighbourhood Group',fontsize=15)

**Observations -->**

*   The average price of a listing in New York City varies significantly across neighbourhood groups, with **Manhattan having the highest average price at 146 dollars/day** and **the Bronx having the lowest at about 77 dollars/day.**

*   In second graph price distribution is very high in Manhattan and Brooklyn.
but Manhattan have more varity in price range, you can see in secon

---

**(4) Price Distribution Of Each Neighborhood Group using Violin Plot**

*Additional note: Analysis refined for portfolio presentation.*


In [None]:
# Create the violin plot for price distribution in each Neighbourhood_groups

ax= sns.violinplot(x='neighbourhood_group',y='price',data= rental_df)

**Observations -->**

*   Manhattan and Brooklyn show the widest price distributions, and Manhattan has the most diverse price range, as the violin plot shows.

*   Queens and the Bronx have similar price distributions. In Queens most listings fall between 50 and 100 dollars, and prices are less diverse than in Manhattan and Brooklyn.
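
The "diversity" of prices seen in the violin plot can be quantified with a per-group median and interquartile range. A minimal sketch on hypothetical data (`toy_df` and its prices are assumptions for illustration):

```python
import pandas as pd

# Hypothetical prices for two neighbourhood groups
toy_df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 4 + ["Bronx"] * 4,
    "price": [80, 150, 220, 300, 50, 60, 70, 80],
})

def iqr(series):
    """Interquartile range: spread of the middle 50% of the values."""
    return series.quantile(0.75) - series.quantile(0.25)

# One row per group with its median price and price spread
spread = toy_df.groupby("neighbourhood_group")["price"].agg(median="median", iqr=iqr)

print(spread)
```

A larger `iqr` corresponds to a wider violin body, so a table like this gives the numbers behind the visual comparison.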

---
**(4) Top Neighborhoods by Listing/property using Bar plot**

In [None]:
# create a new DataFrame that displays the top 10 neighborhoods in the Airbnb NYC dataset based on the number of listings in each neighborhood
Top_Neighborhoods = rental_df['neighbourhood'].value_counts()[:10].reset_index()

# rename the columns of the resulting DataFrame to 'Top_Neighborhoods' and 'Listing_Counts'
Top_Neighborhoods.columns = ['Top_Neighborhoods', 'Listing_Counts']

# display the resulting DataFrame
Top_Neighborhoods



In [None]:
# Get the top 10 neighborhoods by listing count
top_10_neigbourhoods = rental_df['neighbourhood'].value_counts().nlargest(10)

# Create a list of colors to use for the bars
colors = ['c', 'g', 'olive', 'y', 'm', 'orange', '#C0C0C0', '#800000', '#008000', '#000080']

# Create a bar plot of the top 10 neighborhoods using the specified colors
top_10_neigbourhoods.plot(kind='bar', figsize=(15, 6), color = colors)

# Set the x-axis label
plt.xlabel('Neighbourhood', fontsize=14)

# Set the y-axis label
plt.ylabel('Total Listing Counts', fontsize=14)

# Set the title of the plot
plt.title('Listings by Top Neighborhoods in NYC', fontsize=15)


**Observations -->**

*   The top neighborhoods in New York City in terms of listing counts are Williamsburg, Bedford-Stuyvesant, Harlem, Bushwick, and the Upper West Side.

*   The top neighborhoods are primarily located in Brooklyn and Manhattan. This may be due to the fact that these boroughs have a higher overall population and a higher demand for housing.

*   The number of listings alone may



---

**(5) Top Hosts With More Listing/Property using Bar chart**

In [None]:
# create a new DataFrame that displays the top 10 hosts in the Airbnb NYC dataset based on the number of listings each host has
top_10_hosts = rental_df['host_name'].value_counts()[:10].reset_index()

# rename the columns of the resulting DataFrame to 'host_name' and 'Total_listings'
top_10_hosts.columns = ['host_name', 'Total_listings']

# display the resulting DataFrame
top_10_hosts



In [None]:
# Get the top 10 hosts by listing count
top_hosts = rental_df['host_name'].value_counts()[:10]

# Create a bar plot of the top 10 hosts
top_hosts.plot(kind='bar', color='peru', figsize=(18, 7))

# Set the x-axis label
plt.xlabel('top10_hosts', fontsize=14)

# Set the y-axis label
plt.ylabel('total_NYC_listings', fontsize=14)

# Set the title of the plot
plt.title('top 10 hosts on the basis of no of listings in entire NYC!', fontsize=15)


**Observations -->**

*   The top three hosts in terms of total listings are Michael, David, and John, who have 383, 368, and 276 listings, respectively.

*   There is a relatively large gap between the top two hosts and the rest of the hosts. For example, John has 276 listings, which is significantly fewer than Michael's 383 listings.

*   In this top10 list Mike has 184 listings, which is signif
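
Because the ranking above counts `host_name`, different hosts who share a first name are merged into one bar. A minimal sketch of the difference, using a hypothetical toy frame (names and IDs are illustrative only):

```python
import pandas as pd

# Hypothetical listings: two different hosts are both called David
toy_df = pd.DataFrame({
    "host_id":   [1, 1, 2, 3, 3, 3, 4],
    "host_name": ["David", "David", "David", "Michael", "Michael", "Michael", "John"],
})

# Counting by name merges hosts that share a name
by_name = toy_df["host_name"].value_counts()

# Counting by host_id (keeping the name for display) separates them
by_host = (
    toy_df.groupby(["host_id", "host_name"])
    .size()
    .sort_values(ascending=False)
    .rename("total_listings")
    .reset_index()
)

print(by_name)
print(by_host)
```

In the toy data "David" has 3 listings by name, but no single host called David has more than 2. On the real data, grouping by `host_id` would show whether the top names reflect large individual hosts or simply common first names.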



---

**(6) Number Of Active Hosts Per Location Using Line Chart**

In [None]:
# create a new DataFrame that displays the number of listings in each neighborhood group, used here as a proxy for host activity
# (to count distinct hosts instead, use ['host_id'].nunique())
hosts_per_location = rental_df.groupby('neighbourhood_group')['listing_id'].count().reset_index()

# rename the columns of the resulting DataFrame to 'Neighbourhood_Groups' and 'Host_counts'
hosts_per_location.columns = ['Neighbourhood_Groups', 'Host_counts']

# display the resulting DataFrame
hosts_per_location



In [None]:
# Group the data by neighbourhood_group and count the number of listings for each group
hosts_per_location = rental_df.groupby('neighbourhood_group')['listing_id'].count()

# Get the list of neighbourhood_group names
locations = hosts_per_location.index

# Get the list of host counts for each neighbourhood_group
host_counts = hosts_per_location.values

# Set the figure size
plt.figure(figsize=(12, 5))

# Create the line chart with some experiments using marker function
plt.plot(locations, host_counts, marker='o', ms=12, mew=4, mec='r')

# Add a title and labels to the x-axis and y-axis
plt.title('Number of Active Hosts per Location', fontsize='15')
plt.xlabel('Location', fontsize='14')
plt.ylabel('Number of Active Hosts', fontsize='14')

# Show the plot
plt.show()

**Observations -->**

*   Manhattan has the largest count with 19501, and Brooklyn has the second largest with 19415. Note that these figures count listings (`listing_id`), so a host with several listings is counted once per listing.

* Queens follows with 5567 and the Bronx with 1070, while Staten Island has the fewest with 365.

*   Brooklyn and Manhattan have the largest number of hosts, with more than double the number of hosts in Queens and more than 18 times the number of hosts in 
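
The cells above count `listing_id` per group, which yields the number of listings rather than the number of distinct hosts. A minimal sketch showing both figures side by side, on a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical listings: host 1 and host 3 each own more than one listing
toy_df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Queens", "Queens"],
    "listing_id": [10, 11, 12, 13, 14],
    "host_id":    [1, 1, 2, 3, 3],
})

# Named aggregation: rows per group vs. distinct host IDs per group
per_group = toy_df.groupby("neighbourhood_group").agg(
    listings=("listing_id", "count"),
    hosts=("host_id", "nunique"),
)

print(per_group)
```

The gap between the two columns indicates how many listings belong to multi-listing hosts, which a listing count alone cannot show.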



---
**(7) Average Minimum Price In Neighborhoods using Scatter and Bar chart**


In [None]:
# create a new DataFrame that displays the average price of Airbnb rentals in each neighborhood
neighbourhood_avg_price = rental_df.groupby("neighbourhood")["price"].mean().reset_index().rename(columns={"price": "avg_price"})

# select the 10 neighborhoods with the lowest average prices
neighbourhood_avg_price = neighbourhood_avg_price.sort_values("avg_price").head(10)

# join the resulting DataFrame with the 'neighbourhood_group' column from the Airbnb NYC dataset, dropping any duplicate entries
neighbourhood_avg_price_sorted_with_group = neighbourhood_avg_price.join(rental_df[['neighbourhood', 'neighbourhood_group']].drop_duplicates().set_index('neighbourhood'),
on='neighbourhood')

# Display the resulting data without the index (Styler.hide(axis="index") replaces the removed hide_index())
display(neighbourhood_avg_price_sorted_with_group.style.hide(axis="index"))


In [None]:
# Group the data by neighborhood and calculate the average price
neighbourhood_avg_price = rental_df.groupby("neighbourhood")["price"].mean()

# Create a new DataFrame with the average price for each neighborhood
neighbourhood_prices = pd.DataFrame({"neighbourhood": neighbourhood_avg_price.index, "avg_price": neighbourhood_avg_price.values})

# Merge the average price data with the original DataFrame so that each listing's coordinates carry its neighborhood average
df = rental_df.merge(neighbourhood_prices, on="neighbourhood")

# Create the scatter plot of listing coordinates, colored by the average neighborhood price
ax = df.plot.scatter(x="longitude", y="latitude", c="avg_price", title="Average Airbnb Price by Neighborhoods in New York City", figsize=(12,6), cmap="plasma")
ax

In [None]:
# Extract the values from the dataset
neighborhoods = neighbourhood_avg_price_sorted_with_group['neighbourhood']
prices = neighbourhood_avg_price_sorted_with_group['avg_price']

# Create the bar plot
plt.figure(figsize=(15,5))
plt.bar(neighborhoods, prices,width=0.5, color = 'orchid')
plt.xlabel('Neighborhood')
plt.ylabel('Average Price')
plt.title('Average Price by Neighborhood')

# Show the plot
plt.show()

**Observations -->**

* All of the neighborhoods listed are located in the outer boroughs of New York City (Bronx, Queens, and Staten Island). This suggests that these neighborhoods may have a lower overall cost of living compared to neighborhoods in Manhattan and Brooklyn.

*  Most of these neighborhoods are located in the Bronx and Staten Island. These boroughs tend to have a lower overall cost 
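
The table of cheapest neighbourhoods with their boroughs can also be produced in one step by grouping on both columns, which avoids the separate join. A minimal sketch on a hypothetical toy frame (prices are illustrative only):

```python
import pandas as pd

# Hypothetical listings in three neighbourhoods
toy_df = pd.DataFrame({
    "neighbourhood_group": ["Bronx", "Bronx", "Queens", "Queens", "Manhattan", "Manhattan"],
    "neighbourhood": ["Hunts Point", "Hunts Point", "Corona", "Corona", "Midtown", "Midtown"],
    "price": [40, 50, 55, 65, 200, 250],
})

# Average price per (group, neighbourhood), then the lowest averages first
cheapest = (
    toy_df.groupby(["neighbourhood_group", "neighbourhood"], as_index=False)["price"]
    .mean()
    .rename(columns={"price": "avg_price"})
    .sort_values("avg_price")
    .head(2)
)

print(cheapest)
```

This relies on each neighbourhood belonging to exactly one neighbourhood group; if a name occurred in two groups, the two approaches would give different tables.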



---
**(8) Total Counts Of Each Room Type**


In [None]:
# create a new DataFrame that displays the number of listings of each room type in the Airbnb NYC dataset
top_room_type = rental_df['room_type'].value_counts().reset_index()

# rename the columns of the resulting DataFrame to 'Room_Type' and 'Total_counts'
top_room_type.columns = ['Room_Type', 'Total_counts']

# display the resulting DataFrame
top_room_type



In [None]:
# Set the figure size
plt.figure(figsize=(10, 6))

# Get the room type counts
room_type_counts = rental_df['room_type'].value_counts()

# Set the labels and sizes for the pie chart
labels = room_type_counts.index
sizes = room_type_counts.values

# Create the pie chart
plt.pie(sizes, labels=labels, autopct='%1.1f%%')

# Add a legend to the chart
plt.legend(title='Room Type', bbox_to_anchor=(0.8, 0, 0.5, 1), fontsize='12')

# Show the plot
plt.show()


**Observations -->**

*  The majority of listings on short-term rental are for entire homes or apartments, with 22784 listings, followed by private rooms with 21996 listings, and shared rooms with 1138 listings.

*  There is a significant difference in the number of listings for each room type. For example, there are almost 20 times as many listings for entire homes or apartments as there are for 
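
The shares and ratios mentioned in these observations follow directly from the three counts. A short worked sketch using the counts quoted above (the variable names are illustrative):

```python
import pandas as pd

# Room type counts as quoted in the observations above
room_counts = pd.Series(
    {"Entire home/apt": 22784, "Private room": 21996, "Shared room": 1138}
)

# Percentage share of each room type
share_pct = (100 * room_counts / room_counts.sum()).round(1)

# How many entire homes/apartments there are per shared room
ratio_entire_to_shared = room_counts["Entire home/apt"] / room_counts["Shared room"]

print(share_pct)
print(round(ratio_entire_to_shared, 1))
```

Entire homes and private rooms each account for just under half of the listings, while shared rooms make up about 2.5%.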



---
**(9) Stay Requirement counts by Minimum Nights using Bar chart**


In [None]:
# Group the DataFrame by the minimum_nights column and count the number of rows in each group
min_nights_count = rental_df.groupby('minimum_nights').size().reset_index(name = 'count')

# Sort the resulting DataFrame in descending order by the count column
min_nights_count = min_nights_count.sort_values('count', ascending=False)

# Select the top 15 rows
min_nights_count = min_nights_count.head(15)

# Reset the index
min_nights_count = min_nights_count.reset_index(drop=True)

# Display the resulting DataFrame
min_nights_count

In [None]:
# Extract the minimum_nights and count columns from the DataFrame
minimum_nights = min_nights_count['minimum_nights']
count = min_nights_count['count']

# Set the figure size
plt.figure(figsize=(12, 4))

# Create the bar plot
plt.bar(minimum_nights, count)

# Add axis labels and a title
plt.xlabel('Minimum Nights', fontsize='14')
plt.ylabel('Count', fontsize='14')
plt.title('Stay Requirement by Minimum Nights', fontsize='15')

# Show the plot
plt.show()

**Observations -->**

*   The majority of listings on short-term rental have a minimum stay requirement of 1 or 2 nights, with 12067 and 11080 listings, respectively.

*   The number of listings with a minimum stay requirement decreases as the length of stay increases, with 7375 listings requiring a minimum stay of 3 nights, and so on.

*   There are relatively few listings with a minimum stay req
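
Since `minimum_nights` takes many distinct values, grouping it into ranges can make the pattern easier to read than a bar per value. A minimal sketch using `pd.cut`; the bin edges, labels, and sample values are assumptions chosen for illustration:

```python
import pandas as pd

# Hypothetical minimum-night requirements
min_nights = pd.Series([1, 1, 2, 3, 5, 7, 14, 30, 30, 90], name="minimum_nights")

# Bin edges are right-inclusive: (0, 1], (1, 3], (3, 7], (7, 30], (30, inf]
bins = [0, 1, 3, 7, 30, float("inf")]
labels = ["1 night", "2-3 nights", "4-7 nights", "8-30 nights", "over 30 nights"]

buckets = pd.cut(min_nights, bins=bins, labels=labels)

# Count listings per bucket, keeping the bucket order
bucket_counts = buckets.value_counts(sort=False)

print(bucket_counts)
```

The resulting counts can be plotted with the same `plt.bar` call as above, with one bar per range instead of one per distinct value.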



---
**(10) Total Reviews by Each Neighborhood Group using Pie Chart**


In [None]:
# Group the data by neighborhood group and calculate the total number of reviews
reviews_by_neighbourhood_group = rental_df.groupby("neighbourhood_group")["total_reviews"].sum()

# Create a pie chart
plt.pie(reviews_by_neighbourhood_group, labels=reviews_by_neighbourhood_group.index, autopct='%1.1f%%')
plt.title("Number of Reviews by Neighborhood Group in New York City", fontsize='15')

# Display the chart
plt.show()



**Observations -->**

*   Brooklyn has the largest share of total reviews on short-term rental, with 43.3%, followed by Manhattan with 38.9%.

* Queens has the third largest share of total reviews, with 14.2%, followed by the Bronx with 2.6% and Staten Island with 1.0%.

*   The data suggests that short-term rental is more popular in Brooklyn and Manhattan compared to the other neighborhood groups
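
Total reviews partly reflect how many listings a borough has, so it can help to compare each group's share of reviews with its share of listings. The sketch below uses the listing counts and review percentages quoted elsewhere in this notebook; the ratio itself is an illustrative measure, not part of the original analysis:

```python
import pandas as pd

# Listing counts per neighbourhood group, as quoted in this notebook
listings = pd.Series({
    "Manhattan": 19501, "Brooklyn": 19415, "Queens": 5567,
    "Bronx": 1070, "Staten Island": 365,
})

# Share of total reviews in percent, as quoted in the observations above
review_share_pct = pd.Series({
    "Brooklyn": 43.3, "Manhattan": 38.9, "Queens": 14.2,
    "Bronx": 2.6, "Staten Island": 1.0,
})

# Share of listings in percent
listing_share_pct = 100 * listings / listings.sum()

# Ratio above 1: more reviews than the listing share alone would suggest
relative_review_intensity = (review_share_pct / listing_share_pct).round(2)

print(relative_review_intensity.sort_values(ascending=False))
```

With these figures Manhattan's ratio is below 1 while the ratio for Queens is above 1, which suggests that the large review totals in Brooklyn and Manhattan are driven mainly by the number of listings there.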



---
**(11) Number of Max. Reviews by Each Neighborhood Group using Pie Chart**





In [None]:
# Group the Airbnb data by neighbourhood group
reviews_by_neighbourhood_group = rental_df.groupby("neighbourhood_group")["total_reviews"].max()

# Create a pie chart to visualize the distribution of maximum number of reviews among different neighbourhood groups
plt.pie(reviews_by_neighbourhood_group, labels=reviews_by_neighbourhood_group.index, autopct='%1.1f%%')

# Add a title to the chart
plt.title("Number of maximum Reviews by Neighborhood Group in NYC", fontsize='15')

# Display the chart
plt.show()



**Observations -->**

*   Queens and Manhattan seem to be the most popular neighborhood groups for reviewing, as both have a high maximum number of reviews.

*   Queens has the highest percentage of reviews at 26.5%, but it has the third highest number of listings, behind Manhattan and Brooklyn. This suggests that Queens may be a particularly popular destination for tourists or visitors, even though 
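
A maximum describes a single listing, so it can help to look at the rows behind each slice of the chart. The sketch below uses `idxmax` on made-up data to pull the most-reviewed listing per group; `toy_df` and its values are hypothetical, and this is an alternative view rather than the notebook's own method.

```python
import pandas as pd

# Hypothetical toy data standing in for rental_df
toy_df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "neighbourhood_group": ["Queens", "Queens", "Manhattan", "Manhattan"],
    "total_reviews": [90, 12, 80, 45],
})

# Row label of the most-reviewed listing within each neighbourhood group
top_idx = toy_df.groupby("neighbourhood_group")["total_reviews"].idxmax()

# Full rows for those listings, highest review count first
top_listings = toy_df.loc[top_idx].sort_values("total_reviews", ascending=False)

print(top_listings)
```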



---
**(12) Most Reviewed Room Type per Month in Neighbourhood Groups**


In [None]:
# create a figure of size (10, 8)
f, ax = plt.subplots(figsize=(10, 8))

# create a stripplot that displays the number of reviews per month for each room type in the Airbnb NYC dataset
ax = sns.stripplot(x='room_type', y='reviews_per_month', hue='neighbourhood_group', dodge=True, data=rental_df, palette='Set1')

# set the title of the plot
ax.set_title('Most Reviewed room_types in each Neighbourhood Groups', fontsize='14')



**Observations -->**

*   Private rooms received the most reviews per month, with Manhattan recording the highest value for private rooms at more than 50 reviews/month.

*   Manhattan and Queens received the most reviews for the Entire home/apt room type.

*   Shared rooms received fewer reviews compared to other room types a
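
Individual points in a strip plot are hard to compare by eye, so a small summary table can back up the observations. This is a sketch on made-up data (`toy_df` is hypothetical) that takes the highest `reviews_per_month` for each room type and neighbourhood group.

```python
import pandas as pd

# Hypothetical toy data standing in for rental_df
toy_df = pd.DataFrame({
    "room_type": ["Private room", "Private room", "Entire home/apt", "Shared room"],
    "neighbourhood_group": ["Manhattan", "Queens", "Manhattan", "Queens"],
    "reviews_per_month": [5.0, 3.0, 2.0, 0.5],
})

# Highest reviews_per_month for every room type / neighbourhood group pair
summary = toy_df.pivot_table(
    index="room_type",
    columns="neighbourhood_group",
    values="reviews_per_month",
    aggfunc="max",
)

print(summary)
```

Combinations with no listings appear as `NaN` in the table.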

**(13) Count of Each Room Type in Entire NYC Using Multiple Bar Plot**

In [None]:
# Now analyse the count of room types in each neighbourhood group in NYC

# Set the size of the plot
plt.rcParams['figure.figsize'] = (8, 5)

# Create a countplot using seaborn
ax = sns.countplot(y='room_type', hue='neighbourhood_group', data=rental_df, palette='bright')

# Calculate the total number of room_type values
total = len(rental_df['room_type'])

# Add percentage labels to each bar in the plot
for p in ax.patches:
    # Skip empty (zero-width) patches, which would otherwise be labelled 0.0%
    if p.get_width() == 0:
        continue
    percentage = '{:.1f}%'.format(100 * p.get_width() / total)
    x = p.get_x() + p.get_width() + 0.02
    y = p.get_y() + p.get_height() / 2
    ax.annotate(percentage, (x, y))

# Add a title to the plot
plt.title('count of each room types in entire NYC', fontsize='15')

# Add a label to the x-axis
plt.xlabel('Room counts', fontsize='14')

# Rotate the x-tick labels
plt.xticks(rotation=90)

# Add a label to the y-axis
plt.ylabel('Rooms', fontsize='14')

# Display the plot
plt.show()



**Observations -->**

*   Entire home/apt listings are most common in Manhattan, at around 24.6% of all listed properties, followed by Brooklyn with around 19.5%.

*   Private rooms are most common in Brooklyn, at 21.9% of all listed properties, followed by Manhattan with 16.9% and Queens with 7.3%.

*   Very few of the total listed have shared rooms listed on short-
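
The percentages annotated on the bars can be cross-checked with a contingency table. Here is a minimal sketch using `pd.crosstab` on made-up data (`toy_df` is hypothetical); `normalize="all"` divides every cell by the total number of listings, matching the way the bar labels above are computed.

```python
import pandas as pd

# Hypothetical toy data standing in for rental_df
toy_df = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Private room", "Shared room"],
    "neighbourhood_group": ["Manhattan", "Brooklyn", "Brooklyn",
                            "Queens", "Manhattan"],
})

# Share of all listings in each room type / neighbourhood group cell, in percent
pct_table = pd.crosstab(
    toy_df["room_type"], toy_df["neighbourhood_group"], normalize="all"
) * 100

print(pct_table.round(1))
```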



---

**(14) Map Neighbourhood Groups and Room Types Using Latitude and Longitude Scatter Plots**

In [None]:
#trying to find where the coordinates belong from the latitude and longitude

# set the default figure size for the seaborn library
sns.set(rc={"figure.figsize": (10, 8)})

# create a scatter plot that displays the longitude and latitude of the listings in the Airbnb NYC dataset
ax = sns.scatterplot(data=rental_df, x="longitude", y="latitude", hue='neighbourhood_group', palette='bright')

# set the title of the plot
ax.set_title('Location Co-ordinates', fontsize='14')



In [None]:
# Let's observe the type of room_types

# set the default figure size for the seaborn library
sns.set(rc={"figure.figsize": (10, 8)})

# create a scatter plot that displays the longitude and latitude of the listings in the Airbnb NYC dataset with room_types.
ax = sns.scatterplot(x=rental_df.longitude, y=rental_df.latitude, hue=rental_df.room_type, palette='muted')

# set the title of the plot
ax.set_title('Distribution of type of rooms across NYC', fontsize='14')





---
**(15) Price variations in NYC Neighbourhood groups using scatter plot**


In [None]:
# Let's have an idea of the price variations in neighborhood_groups

# create a scatter plot that displays the longitude and latitude of the listings in the Airbnb NYC dataset, with the color of each point indicating the price of the listing
lat_long = rental_df.plot(kind='scatter', x='longitude', y='latitude', label='price_variations', c='price',
                  cmap=plt.get_cmap('jet'), colorbar=True, alpha=0.4, figsize=(10, 8))

# add a legend to the plot
lat_long.legend()


**Observations -->**

*   As the attached image shows, the range of prices for accommodations in Manhattan is particularly high, indicating that it is the most expensive place to stay in NYC, probably because of its many attractive amenities.

*   Manhattan is likely to attract many tourists and visitors because it has more places worth visiting, which would explain why its prices are higher than in the other neighbourhood groups.

*   Trave
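
A colour-coded map gives a visual impression of prices; a grouped summary puts numbers on it. The sketch below runs on made-up data (`toy_df` and its prices are hypothetical) and reports the median alongside the mean and maximum, since a few very expensive listings can pull the mean upwards.

```python
import pandas as pd

# Hypothetical toy data standing in for rental_df
toy_df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn"],
    "price": [150, 200, 1000, 80, 120],
})

# Median, mean and maximum price per neighbourhood group, most expensive first
price_summary = (
    toy_df.groupby("neighbourhood_group")["price"]
    .agg(["median", "mean", "max"])
    .sort_values("median", ascending=False)
)

print(price_summary)
```

In the toy data the single 1000 listing lifts the Manhattan mean to 450 while the median stays at 200.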

---
**(16) Find Best Location Listing/Property Location For Travelers and Hosts**

In [None]:
# Group the data by neighborhood and calculate the average number of reviews
neighbourhood_avg_reviews = rental_df.groupby("neighbourhood")["total_reviews"].mean()

# Create a new DataFrame with the average number of reviews for each neighborhood
neighbourhood_reviews = pd.DataFrame({"neighbourhood": neighbourhood_avg_reviews.index, "avg_reviews": neighbourhood_avg_reviews.values})

# Merge the average number of reviews data with the original DataFrame
df = rental_df.merge(neighbourhood_reviews, on="neighbourhood")

# Create a scatter plot of listing coordinates, coloured by the neighbourhood's average number of reviews
ax = df.plot.scatter(x="longitude", y="latitude", c="avg_reviews", title="Average Airbnb Reviews by Neighborhoods in New York City", figsize=(14,8), cmap="plasma")

# Display the scatter plot
plt.show()



**Observations -->**

* A photo of this map is attached because it offers a valuable insight: the neighborhoods near the airport in Queens have a higher average number of reviews, probably because they attract many tourists and visitors who are passing through the area.

*   There could also be other factors contributing to the high average number of reviews in these neighborhoods. For ex
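
The per-neighbourhood average used for the map can also be attached to every row without building a second DataFrame and merging it back. This is an alternative sketch, not the notebook's own approach, using `groupby(...).transform("mean")` on made-up data (`toy_df` is hypothetical).

```python
import pandas as pd

# Hypothetical toy data standing in for rental_df
toy_df = pd.DataFrame({
    "neighbourhood": ["Astoria", "Astoria", "Harlem"],
    "total_reviews": [10, 30, 8],
})

# transform returns one value per original row, aligned with the row index,
# so the result can be assigned directly as a new column
toy_df["avg_reviews"] = (
    toy_df.groupby("neighbourhood")["total_reviews"].transform("mean")
)

print(toy_df)
```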



---
**(17) Correlation Heatmap Visualization**




In [None]:
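# Calculate pairwise correlations between the numeric columns only
# (text columns cannot be correlated and raise an error in recent pandas versions)
corr = rental_df.select_dtypes(include='number').corr()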

# Display the correlation between columns
corr



In [None]:
# Set the figure size
plt.figure(figsize=(12,6))

# Visualize correlations as a heatmap
sns.heatmap(corr, cmap='BrBG',annot=True)

# Display heatmap
plt.show()


**Observations -->**

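*   There is a moderate positive correlation (0.58) between the host_id and id columns. Both are identifiers rather than measurements, so this most likely reflects that newer hosts tend to have newer listings (assuming IDs are assigned in increasing order over time) and carries no pricing insight.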

*   There is a weak positive correlation (0.17) between the price column and the calculated_host_listings_count column, which suggests that hosts with more listings tend to charge higher prices for their
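
For the question of which factors relate to price, it can be easier to read a single sorted column than the full heatmap. A minimal sketch on made-up data follows (`toy_df` is hypothetical and its perfectly linear values are chosen only to make the output obvious).

```python
import pandas as pd

# Hypothetical toy data standing in for rental_df
toy_df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "price": [100, 150, 200, 250],
    "minimum_nights": [1, 2, 3, 4],
    "availability_365": [300, 200, 100, 0],
})

# Keep numeric columns only, then take the correlations with price
numeric = toy_df.select_dtypes(include="number")
price_corr = numeric.corr()["price"].drop("price").sort_values(ascending=False)

print(price_corr)
```

Correlation only measures linear association, so a weak value such as 0.17 is better treated as a hint than as evidence of cause.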



---

**(18) Pair Plot Visualization**


In [None]:
# create a pairplot using the seaborn library to visualize the relationships between different variables in the Airbnb NYC dataset
sns.pairplot(rental_df)

# show the plot
plt.show()



*   A pair plot consists of multiple scatterplots arranged in a grid, with each scatterplot showing the relationship between two variables.

*   It can be used to visualize relationships between multiple variables and to identify patterns in the data.

---
## **BUSINESS CONCLUSION :-**



*   Manhattan and Brooklyn appear to have the highest demand for short-term rentals, as evidenced by the large number of listings in these neighborhood groups. This could make them attractive areas for hosts to invest in property.

*   Manhattan is world-famous for its parks, museums, buildings, town, liberty, gardens, markets, island and also its substantial number of tourists throughout the year ,it make

---

# **Thank You**




## Final Key Takeaways 📌
- Pricing varies significantly by location and room type.
- Availability patterns suggest seasonal demand.
- Such insights help hosts optimize pricing and occupancy.
