# **Project Name**    -  **Airbnb Bookings Analysis**



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**


In this project, we analyze Airbnb's 2019 listing data for New York City (NYC). NYC is one of the most famous cities in the world and a top global destination for visitors attracted to its museums, entertainment, restaurants, UN offices, and commerce.

The project began with building an understanding of the Airbnb dataset: its size, the listed properties and their availability, prices, locations, and review counts. We explored the number of properties listed, host characteristics, room types, and the availability of different properties, among other factors. Further analysis aimed to understand the significance of the reviews left by Airbnb users.

Exploratory data analysis projects on Airbnb typically investigate patterns and trends in the pricing, popularity, and availability of listings. These findings can be used to gain insights into consumer behavior and preferences, as well as to inform marketing and business strategies for hosts and for Airbnb as a company. Data visualization and summary statistics are used to analyze the data and draw meaningful conclusions.

In this type of analysis, data visualizations such as line plots, scatter plots, and bar charts are used to help identify trends, patterns, and relationships in the data. For instance, a bar chart can be used to show the distribution of properties across different neighborhoods in a city.

Overall, exploratory data analysis provides crucial insights for the Airbnb platform to improve customer satisfaction and enhance rental revenues. The insights also benefit renters who can use the data generated to gain a deeper understanding of the landscape and make informed decisions.

# **GitHub Link -**

GitHub Link.

# **Problem Statement**


The purpose of this exploratory data analysis project is to examine the factors that influence customer bookings and preferences. The dataset used in this analysis includes information on hosts, room type and location, price, minimum stays, availability, and review counts.

The aim is to identify insights and patterns in the data that can help the company understand what drives customer retention and inform future decisions on host listings, location, price, customer service, and marketing strategy.

#### **Define Your Business Objective?**

1.   Recommend marketing campaign strategies and predict the destination neighbourhoods that are in high demand.

2.   Using Exploratory Data Analysis, find the most demanded room type and neighbourhood_group.

3.   Find the average number of days guests prefer to stay in a single visit, by room type and neighbourhood_group.

4.   Find the most sought-after price bracket, in which the most bookings happen and the most reviews are received.

5.   Find the neighbourhood_group in which the top hosts have the most listings, and explain the reason behind it.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that the following format is answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import missingno as msno
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
import os
print(os.getcwd())

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

In [None]:
file_path='/content/drive/MyDrive/Almabetter/Airbnb_NYC_2019.csv'

In [None]:
Air_df=pd.read_csv(file_path)

### Dataset First View

In [None]:
# Dataset First Look
Air_df.head(5).T

In [None]:

Air_df.rename(columns={'id':'listing_id','name':'listing_name','number_of_reviews':'total_reviews'},inplace=True)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
Air_df.shape


### Dataset Information

In [None]:
# Dataset Info
Air_df.info()

#### Duplicate Values

In [None]:
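# Dataset Duplicate Value Count
print('Duplicate rows:', Air_df.duplicated().sum())
# Drop exact duplicate rows, then show the non-null count per column
Air_df = Air_df.drop_duplicates()
Air_df.count()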
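
To see how many rows a de-duplication step removes, the duplicates can be counted first. The following is a minimal sketch on a made-up frame; the column names mirror the notebook, but the values are hypothetical.

```python
import pandas as pd

# Hypothetical miniature of the listings table; the values are made up.
demo = pd.DataFrame({
    "listing_id": [1, 2, 2, 3],
    "host_id": [10, 11, 11, 12],
    "price": [100, 80, 80, 150],
})

# Count rows that are exact copies of an earlier row.
n_duplicates = int(demo.duplicated().sum())

# Drop them and reset the index so row labels stay contiguous.
deduped = demo.drop_duplicates().reset_index(drop=True)

print(n_duplicates, len(deduped))
```

Counting before dropping makes it possible to report whether the step changed the data at all.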

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
Air_df.isnull().sum()

In [None]:
# Fill missing host and listing names with placeholder labels
Air_df['host_name'] = Air_df['host_name'].fillna('--No Name--')
Air_df['listing_name'] = Air_df['listing_name'].fillna('*--Unknown--*')

In [None]:
# Listings without reviews have no monthly rate: fill with 0 and keep the float dtype,
# because casting to int64 would truncate fractional rates such as 0.21 to 0
Air_df['reviews_per_month'] = Air_df['reviews_per_month'].fillna(0)
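
Monthly review rates are usually fractional, so the dtype chosen after filling missing values matters. The following is a minimal sketch on made-up values (not taken from the dataset) that compares filling alone with filling followed by an integer cast.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly review rates, including one missing value.
rpm = pd.Series([0.21, 4.64, np.nan, 0.10], name="reviews_per_month")

# Filling keeps the fractional information.
filled = rpm.fillna(0)

# An integer cast drops everything after the decimal point.
truncated = filled.astype("int64")

print(filled.tolist())
print(truncated.tolist())
```

Most of the variation between the listings disappears after the cast, which would weaken any later comparison based on this column.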


In [None]:
Air_df = Air_df.drop(['last_review'], axis=1)

In [None]:
Air_df.isnull().sum()

In [None]:
Air_df.dtypes

### What did you know about your dataset?

The Airbnb dataset has 48895 rows and 16 columns. The columns are described below.

* **Listing_id:** Unique ID identifying an Airbnb listing
* **Listing_name:** Name of the listed property on the platform
* **Host_id:** Unique ID identifying an Airbnb host
* **Host_name:** Name under which the host is registered
* **Neighbourhood_group:** Group of areas in which the listing is located
* **Neighbourhood:** Area that falls under the neighbourhood_group
* **Latitude:** Latitude coordinate of the listing
* **Longitude:** Longitude coordinate of the listing
* **Room_type:** Type of room offered
* **Price:** Listed price of the room
* **Minimum_nights:** Minimum number of nights required for a booking
* **Total_reviews:** Total number of reviews given by visitors
* **Last_review:** Date of the most recent review
* **Reviews_per_month:** Average number of reviews received per month
* **Calculated_host_listings_count:** Total number of listings registered under the host
* **Availability_365:** Number of days in a year for which the listing is available for booking

In this dataset, **Last_review** had 10025 null values.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
Air_df.columns

In [None]:
# Dataset Describe
Air_df.describe()


### Variables Description

Answer Here

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
#Unique value of listing_id
Air_df['listing_id'].nunique()

In [None]:
#Unique value of neighbourhood
Air_df['neighbourhood'].nunique()

In [None]:
#Unique value of neighbourhood_group
Air_df['neighbourhood_group'].value_counts()

In [None]:
#Unique value of host_name
Air_df['host_name'].nunique()

In [None]:
#Unique value of listing_name
Air_df['listing_name'].nunique()

In [None]:
#Unique value of room type
Air_df['room_type'].value_counts()

In [None]:
Air_df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

**1 Neighbourhood_group Listing Counts in NYC**

In [None]:
# Set the figure size
plt.figure(figsize=(4, 4))

# Create a countplot of the neighbourhood group data
sns.countplot(x='neighbourhood_group', data=Air_df)

# Set the title of the plot
plt.title('Neighbourhood_group Listing Counts in NYC', fontsize=15)

# Set the x-axis label
plt.xlabel('Neighbourhood_Group', fontsize=14)

# Set the y-axis label
plt.ylabel('Total Listings Counts', fontsize=14)

# Rotate the x-axis labels for better readability (optional)
plt.xticks(rotation=45)

# Show the plot
plt.show()


##### 1. Why did you pick the specific chart?

A count plot shows the number of listings in each neighbourhood group at a glance.

##### 2. What is/are the insight(s) found from the chart?

There are five neighbourhood groups: Brooklyn, Manhattan, Queens, Staten Island, and the Bronx.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These insights can help grow the business by targeting the market areas where listing activity is concentrated.
Manhattan and Brooklyn have the highest number of listings on Airbnb, with over 19,000 listings each.

Queens and the Bronx have significantly fewer listings than Manhattan and Brooklyn, with 5,567 and 1,070 listings, respectively.

Staten Island has the fewest listings, with only 365.
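
Raw counts are easier to compare when expressed as shares of the total. The following is a minimal sketch on a made-up series of ten listings, so the percentages are illustrative and not the dataset's figures.

```python
import pandas as pd

# Hypothetical neighbourhood_group column with ten listings.
groups = pd.Series(
    ["Manhattan"] * 4 + ["Brooklyn"] * 4 + ["Queens", "Bronx"],
    name="neighbourhood_group",
)

# normalize=True returns fractions instead of counts; convert to percent.
share = groups.value_counts(normalize=True).mul(100).round(1)

print(share)
```

On the real data the same call would be applied to `Air_df['neighbourhood_group']`.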


#### Chart - 2

**2 Check busiest host**

In [None]:
#Busiest_host
busiest_hosts = Air_df.groupby(['host_name', 'host_id','room_type'])['total_reviews'].max().reset_index()
busiest_hosts = busiest_hosts.sort_values(by='total_reviews', ascending=False).head(10)
busiest_hosts



In [None]:
name = busiest_hosts['host_name']
reviews = busiest_hosts['total_reviews']

fig = plt.figure(figsize = (8, 5))
plt.bar(name, reviews, color ='chocolate', width = 0.4)
plt.xlabel("Name of the Host")
plt.ylabel("Number of Reviews")
plt.title("Busiest Hosts", fontsize=14)
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is a good way to see which hosts, and which of their room types, attract the most customers.

##### 2. What is/are the insight(s) found from the chart?

This chart shows that the top 10 busiest hosts mostly offer private rooms, with a few offering an entire home/apt. This suggests that private rooms are booked more frequently than the other room types.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

We can learn from the busiest hosts and their room types, for example the locality, facilities, aesthetics, and service that satisfy customers and bring repeat visits. Based on this, we can arrange training for the host community so that other hosts can have the same positive impact.


#### Chart - 3

In [None]:
total_room_type = Air_df['room_type'].value_counts().reset_index()

# rename the columns of the resulting DataFrame to 'Room_Type' and 'Total_counts'
total_room_type.columns = ['Room_Type', 'Total_counts']

# display the resulting DataFrame
total_room_type

In [None]:
# Chart - 3 visualization code


plt.figure(figsize=(4, 4))

# Create a countplot of the room type data
sns.countplot(x='room_type', data=Air_df)

# Set the title of the plot
plt.title('Types of Room', fontsize=15)

# Set the x-axis label
plt.xlabel('Types of room', fontsize=14)

# Set the y-axis label
plt.ylabel('Total room Counts', fontsize=14)

# Rotate the x-axis labels for better readability (optional)
plt.xticks(rotation=45)

# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

**4 Checking top 15 neighbourhoods on the basis of no of listings in entire NYC**

In [None]:
# Top 15 neighbourhoods by number of listings
top_15_neighbourhoods = Air_df['neighbourhood'].value_counts()[:15]
colors = ['c', 'g', 'olive', 'y', 'm', 'orange', '#C0C0C0', '#800000', '#008000', '#000080', '#E9B824','#FFA1F5', '#279EFF']
top_15_neighbourhoods.plot(kind='bar', color=colors)
plt.xlabel('Neighbourhood')
plt.ylabel('Counts in entire NYC')
plt.title('Top 15 neighbourhoods in entire NYC on the basis of count of listings')
plt.show()


##### 1. Why did you pick the specific chart?

This chart shows which neighbourhoods have the most listings.

##### 2. What is/are the insight(s) found from the chart?

Williamsburg, Bedford-Stuyvesant and Harlem are the top 3 neighbourhoods by number of listings, while Chelsea, the Lower East Side and Astoria have the lowest counts within the top 15.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

With this data we can pick target areas in which to improve services. It also helps increase profit by limiting the advertising market area, which minimizes cost.

#### Chart - 5

**5 Average price in different room**

In [None]:
# Chart - 5 visualization code
avg_price = Air_df.groupby(["room_type"])["price"].mean()
colors = ['#79155B', '#C23373', '#F6635C']
a = avg_price.plot.bar(figsize = (5,5), fontsize = 10, color=colors)
a.set_xlabel("Room type", fontsize = 11)
a.set_ylabel("average price", fontsize = 11)
a.set_title("Average price in different room ", fontsize=12)

##### 1. Why did you pick the specific chart?

This chart shows the average price of the 3 room types.

##### 2. What is/are the insight(s) found from the chart?

This chart shows that the average price of Entire home/apt is higher than that of the other two room types. This points to a profit-oriented approach to new listings that benefits all stakeholders.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This chart calls for special attention to the availability and service quality of this room type, to support positive customer sentiment and good reviews.

#### Chart - 6

**6 Room types with the highest review counts**

In [None]:
# Chart - 6 visualization code
# Highest review count reached by a single listing in each room type
customers_reviews = Air_df.groupby(["room_type"])["total_reviews"].max()
colors = ['#F2CD5C', '#B4B4B3', '#219C90']
a = customers_reviews.plot.bar(figsize = (5,5), fontsize = 10, color=colors)
a.set_xlabel("Types of room", fontsize = 11)
a.set_ylabel("Maximum number of reviews", fontsize = 11)
a.set_title("Room types with the highest review counts", fontsize=12)

##### 1. Why did you pick the specific chart?

This chart shows which room type received the most reviews from customers. The dataset records review counts only, not whether a review was positive.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that private rooms received the most reviews, while entire home/apt and shared rooms received almost the same number.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Private rooms received the most reviews, entire home/apt the second most, and shared rooms the fewest.

#### Chart - 7

**7 Most demanded room by customers**

In [None]:

# Number of listings per room type, used here as a proxy for demand
most_demand_room = Air_df.groupby(['room_type'])["host_id"].count()
colors = ['#E7AB79', '#B25068', '#774360']
b = most_demand_room.plot.bar(figsize = (4,4), fontsize = 10, color=colors)
b.set_xlabel("Room type", fontsize = 12)
b.set_ylabel("No of listings", fontsize = 12)
b.set_title("Most demanded room type", fontsize=14)


##### 1. Why did you pick the specific chart?

This chart shows which room type is most in demand.

##### 2. What is/are the insight(s) found from the chart?

From this chart we found that shared rooms are the least in demand and entire house/apt is the most in demand, followed by private rooms. The data therefore suggests ensuring the availability and service quality of the room types in demand.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

It tells us that most people want to rent an entire house/apt, so we should try to list most rental properties as entire house/apt or private rooms.

#### Chart - 8

**8 Average Stays in different room types**

In [None]:
# Chart - 8 visualization code
Air_df.groupby('room_type')['minimum_nights'].mean().plot(figsize= (3,4), kind='bar', color='#F73D93')
plt.title('Average Stays in different room types', fontsize = 14)
plt.xlabel('Room types', fontsize = 12)
plt.ylabel('Average Stays', fontsize = 12 )

##### 1. Why did you pick the specific chart?

We chose this chart to show the average stay, measured by minimum_nights, for the different room types.

##### 2. What is/are the insight(s) found from the chart?

This chart indicates how many days facilities need to be provided for on average, and when a room of each type is likely to become available for booking again.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

From this chart we can estimate business profit with basic calculations, such as how many times a room type can be booked in a month, and then set the booking price after considering expenses.
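
The monthly turnover calculation mentioned above can be sketched in a few lines. The stay lengths are the approximate averages reported in this notebook's summary (8, 6 and 5 nights); the 30-day month and the assumption of back-to-back bookings with no idle days are simplifications introduced here for illustration.

```python
# Approximate average stays per room type, as reported in this notebook.
avg_stay_nights = {"Entire home/apt": 8, "Shared room": 6, "Private room": 5}

# Assumption: a 30-day month with bookings placed back to back.
DAYS_PER_MONTH = 30

# Upper bound on complete stays that fit into one month.
max_bookings_per_month = {
    room: DAYS_PER_MONTH // nights for room, nights in avg_stay_nights.items()
}

for room, bookings in max_bookings_per_month.items():
    print(f"{room}: at most {bookings} bookings per month")
```

These figures are upper bounds; real occupancy would lower them.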

#### Chart - 9

**9 Average price of different neighbourhood groups**

In [None]:
# Chart - 9 visualization code
avg_price = Air_df.groupby(["neighbourhood_group"])["price"].mean()
colors = [ '#809A6F', '#A25B5B', '#CC9C75', '#D5D8B5', '#AEC3AE']
a = avg_price.plot.bar(figsize = (5,5), fontsize = 10, color=colors)
a.set_xlabel("Neighbourhood Group", fontsize = 11)
a.set_ylabel("Average price", fontsize = 11)
a.set_title("Average price of different neighbourhood groups", fontsize=12)

##### 1. Why did you pick the specific chart?

This chart shows price difference between the neighbourhood groups.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that Manhattan has the highest average price among the neighbourhood groups. This indicates stronger demand for Manhattan than for the other groups.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The higher average price suggests that higher listing prices are feasible there for more profit, and that more listings can be added in the high-demand neighbourhood groups. Offline advertising spend can be allocated accordingly.

#### Chart - 10

**10 Top Host Name in entire NYC on the basis of count of listings**

In [None]:
top_10_host_name= Air_df['host_name'].value_counts()[:10]
top_10_host_name

In [None]:
# Chart - 10 visualization code
# Top 10 host names by number of listings in entire NYC
top_10_host_name = Air_df['host_name'].value_counts()[:10]
colors = [ '#219C90', '#E9B824', '#EE9322', '#D83F31','#F31559', '#FF52A2']
top_10_host_name.plot(kind='bar', color=colors)
plt.xlabel('Host Name')
plt.ylabel('Number of listings')
plt.title('Top Host Name in entire NYC on the basis of count of listings')

##### 1. Why did you pick the specific chart?

This chart shows the top 10 host names by number of listings.

##### 2. What is/are the insight(s) found from the chart?

This chart shows that Michael, David and Sonder are the top 3 host names.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

With this data we can identify the top 10 hosts and work with them to improve services. It also helps increase profit by limiting the advertising market area, which minimizes cost.
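
Host names are not unique: several different hosts can share a first name, so counting by `host_name` can merge unrelated hosts into one bar. The following is a minimal sketch, on made-up data, of counting listings per `host_id` instead.

```python
import pandas as pd

# Hypothetical listings: one host with three listings and
# four different hosts who happen to share a first name.
demo = pd.DataFrame({
    "host_id": [1, 1, 1, 2, 3, 4, 5],
    "host_name": ["Sonder"] * 3 + ["Michael"] * 4,
})

# Counting by name merges the four unrelated "Michael" hosts.
by_name = demo["host_name"].value_counts()

# Counting by id (keeping the name as a label) separates them.
by_host = (
    demo.groupby(["host_id", "host_name"])
    .size()
    .sort_values(ascending=False)
)

print(by_name)
print(by_host)
```

In this toy example the name count ranks "Michael" first, while the per-host count shows that the largest single host is the one with id 1.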

#### Chart - 11

**11 Room Availability throughout Neighbourhood/Room Type using line plot and scatter plot**

In [None]:
# Chart - 11 visualization code
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(22, 6))
ax = axes.flatten()

sns.lineplot(data=Air_df, x='neighbourhood_group', y='availability_365', hue='room_type', ax=ax[0])
ax[0].set_title('Room Availability throughout Neighbourhood/Room Type')

sns.scatterplot(data=Air_df, x='price', y='total_reviews', hue='room_type', ax=ax[1])
ax[1].set_title('Price vs Number of Reviews')
sns.despine(fig, left=True)

##### 1. Why did you pick the specific chart?

The first chart shows how busy or available each neighbourhood group is for booking throughout the year. In the second, we try to draw the relationship between price and number of reviews.

##### 2. What is/are the insight(s) found from the chart?

The first chart shows that Staten Island is the busiest of all, even for the least demanded shared rooms, while Manhattan and Brooklyn show decent bookings and availability.

The second chart shows a negative relationship between price and number of reviews. Cheaper rooms usually have higher occupancy and hence more reviews.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The occupancy or vacancy of each room type in each area can point to different solutions, such as listing or delisting a room_type in a neighbourhood group. The price vs reviews chart shows the booking volume at reasonable prices, and hence the price range that generates revenue.
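
A visual impression of a negative relationship can be backed by a rank correlation. The following is a minimal sketch on made-up values; it computes Spearman's coefficient as the Pearson correlation of the ranks, which avoids any dependency beyond pandas.

```python
import pandas as pd

# Hypothetical listings where review counts fall as price rises.
demo = pd.DataFrame({
    "price": [50, 80, 120, 200, 400],
    "total_reviews": [300, 150, 90, 20, 5],
})

# Spearman's rho equals the Pearson correlation of the ranked values.
rho = demo["price"].rank().corr(demo["total_reviews"].rank())

print(round(rho, 3))
```

A value close to -1 indicates a consistently decreasing relationship; on the real data the coefficient would be expected to be much weaker than in this idealised example.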



#### Chart - 12

In [None]:
# Chart - 12 visualization code

fig, ax = plt.subplots(figsize=(6, 6))

sns.countplot(data=Air_df[Air_df['availability_365']  == 365], x='neighbourhood_group', hue='room_type', palette='GnBu_d')
plt.title('No. of Neighbourhood Group Properties Available 365 days', fontsize=12)
plt.show()

##### 1. Why did you pick the specific chart?

This chart gives an idea of the availability of each room_type in the respective neighbourhood groups.

##### 2. What is/are the insight(s) found from the chart?

This plot gives a clear picture of which room_type is available all year round. For example, private rooms have the most year-round availability except in Manhattan, and shared rooms have the least regardless of neighbourhood group.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The plot above shows which room type in which neighbourhood group has the most and the least availability throughout the year. This can guide listing the room types in demand and delisting the least requested ones.

Negative Insight - In Manhattan, Entire home/apt has the most year-round availability. Earlier charts showed that the same room_type also has the most bookings, so the high vacancy may be due to the large number of listings of this room_type in the area.

#### Chart - 13

In [None]:

# Chart - average price in every neighbourhood group by room type
avg_price_df = Air_df.groupby(['neighbourhood_group','room_type'])['price'].mean().unstack()
avg_price_df

In [None]:
avg_price_df.plot.bar(figsize=(10,5),ylabel='Average Price calculated')

##### 1. Why did you pick the specific chart?

This chart shows the relationship between price and room_type in the different neighbourhood_groups. The grouped bar plot compares the price of each room type across neighbourhood groups.

##### 2. What is/are the insight(s) found from the chart?

This chart shows clear trends in price by room type across neighbourhood groups. Entire home/apt has the highest average price, followed by Private room and Shared room. Manhattan has the highest average price of all the neighbourhood groups.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This chart gives good business insight into revenue: it shows which room type and which neighbourhood group are likely to bring the highest monetary benefit.

Negative Insight: If hosts see this insight, they may be less attracted to the room types and neighbourhood groups with low average prices, so business expansion there may face hurdles.

#### Chart - 14 - Correlation Heatmap

In [None]:
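# Correlation Heatmap visualization code
# Restrict to numeric columns; recent pandas versions raise an error on text columns
corr = Air_df.select_dtypes(include='number').corr(method='kendall')
plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True)
plt.show()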

##### 1. Why did you pick the specific chart?

The correlation heatmap visualizes the pairwise (bivariate) correlation between the numerical columns of the DataFrame as colour shades. It gives a quick overview of a large amount of data.

##### 2. What is/are the insight(s) found from the chart?

In the heatmap matrix, some pairs of variables are positively correlated and some are negatively correlated, e.g.

Positive corr : total_reviews and reviews_per_month, calculated_host_listings_count and availability_365

Negative corr : price and longitude, total_reviews and listing_id, reviews_per_month and minimum_nights
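
A correlation matrix is only defined for numeric columns, so text columns such as `room_type` need to be excluded first. The following is a minimal sketch on made-up data. It uses the default Pearson method rather than the Kendall method used in the notebook cell, purely to keep the example free of extra dependencies.

```python
import pandas as pd

# Hypothetical frame mixing a text column with numeric columns.
demo = pd.DataFrame({
    "room_type": ["Private room", "Shared room", "Shared room", "Entire home/apt"],
    "price": [100, 60, 40, 150],
    "total_reviews": [10, 50, 80, 5],
})

# Keep numeric columns only, then compute pairwise correlations.
numeric = demo.select_dtypes(include="number")
corr = numeric.corr()  # Pearson by default

print(corr)
```

The resulting matrix can be passed to `sns.heatmap` in the same way as in the cell above.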

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(Air_df, hue='room_type',
             x_vars=['price', 'total_reviews','reviews_per_month','availability_365'],
             y_vars=['price', 'total_reviews','reviews_per_month','availability_365'],
             kind='scatter', diag_kind= 'hist')

##### 1. Why did you pick the specific chart?

A pair plot shows the relationship between every pair of selected numerical variables in the dataset. It presents all the charts in one grid, like a dashboard.

##### 2. What is/are the insight(s) found from the chart?

The relationships between pairs of variables and the separated clusters give some insights: price and number of reviews have a negative relationship (the higher the price, the fewer reviews per month); reviews per month vs availability_365 shows fewer than 20 reviews per month; and the availability_365 vs price cluster shows that entire home/apt properties are available at higher prices.

The histograms on the diagonal show the frequency distribution of each variable, including its skewness.

#### Chart - 17

In [None]:
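fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(15, 6))
ax = axes.flatten()

# Price distribution per neighbourhood group
sns.violinplot(data=Air_df, x='neighbourhood_group', y='price', ax=ax[0])

# Same distribution split by room type, drawn on the second axis
sns.violinplot(data=Air_df, x='neighbourhood_group', y='price', hue='room_type', ax=ax[1])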

#### Chart - 16

**16 visualization of each neighbourhood_group using latitude and longitude**

In [None]:



plt.figure(figsize = (12,6))
sns.scatterplot(x = Air_df["longitude"], y = Air_df["latitude"], hue = Air_df["neighbourhood_group"])
plt.show()

##### 1. Why did you pick the specific chart?

This chart shows the location of the different neighbourhood groups in the city by longitude and latitude.

##### 2. What is/are the insight(s) found from the chart?

The scatter plot shows that Manhattan and Brooklyn lie at similar longitudes, and together they account for almost 85% of listings. Staten Island lies on the outskirts, so it has fewer bookings as well as fewer listings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This chart shows that locations nearer to the prime hotspots can attract more bookings, so we should try to list more of the room types in demand there and increase overall revenue by attracting customers.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

We have identified the top 15 neighbourhoods where the number of bookings is high. Marketing campaigns and advertisements can focus on those areas to maximize bookings, and narrowing the focus areas reduces marketing cost.

Most guests do not prefer shared rooms and instead choose an entire home/apt or a private room. Manhattan and Brooklyn are the most demanded neighbourhood groups.

The average stays in Entire home/apt, Shared room and Private room are approximately 8, 6 and 5 days respectively. From these we can estimate the average number of bookings per month for each room type.

The most popular booking prices fall within the $10-400 bracket, where almost 95% of bookings happen and from which most reviews come. In addition, private rooms and entire home/apt receive the most reviews.
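
The share of listings inside a price bracket can be checked with a boolean mask. The following is a minimal sketch on made-up prices, so the resulting share is illustrative and not the dataset's figure.

```python
import pandas as pd

# Hypothetical nightly prices for ten listings.
prices = pd.Series([40, 75, 120, 250, 399, 400, 650, 1200, 90, 10], name="price")

# between() is inclusive on both ends by default.
in_bracket = prices.between(10, 400)

# The mean of a boolean mask is the fraction of True values.
share_pct = in_bracket.mean() * 100

print(f"{share_pct:.1f}% of listings fall in the $10-400 bracket")
```

On the real data the same mask can also be used to compare the review counts inside and outside the bracket.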

The listings of the top hosts are also concentrated in Manhattan and Brooklyn, as are the higher prices and most of the bookings. Hosts appear to be drawn to Manhattan and Brooklyn by the higher prices and booking volumes seen in the earlier charts, so a higher commission could be asked from hosts there to realise more profit.

# **Conclusion**

From the above Exploratory Data Analysis of Airbnb Dataset we can conclude that:

Manhattan and Brooklyn are the two most distinguished, expensive and upscale areas of NYC. Although the location of a property has a strong effect on its price, a property in a popular location will not necessarily stay occupied most of the time.

Guests who prefer an entire home or apartment tend to stay a bit longer, and this is also the most booked room type across the neighbourhood groups.

The findings from an exploratory data analysis project on Airbnb can help both hosts and guests make more informed decisions. Hosts can learn more about what amenities guests are looking for and how to price their property competitively. Guests, in turn, can use a few parameters to make decisions about the location, amenities, and price of the properties they want to book.

The given Airbnb dataset is large but lacks some features that would be needed to value a property reliably. Overall, conducting an exploratory data analysis project on Airbnb can provide valuable insights into the dynamics of the short-term rental market and enhance the user experience for both hosts and guests.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***