# **Project Name** - ***Airbnb Project***



# **Project Summary - Airbnb Bookings Analysis**

Airbnb is an online marketplace that connects people who want to rent out their homes with people who are looking for accommodation in that locale. It currently covers more than 100,000 cities and 220 countries worldwide. For hosts, it is a way to earn income from their property, at the risk that a guest might damage it. For guests, it can offer relatively inexpensive accommodation, at the risk that the property is not as appealing as the listing suggested.

For this project we analyze Airbnb’s New York City (NYC) data. NYC is not only one of the most famous cities in the world but also a top global destination for visitors drawn to its museums, entertainment, restaurants and commerce. According to the Office of the New York State Comptroller, NYC hosted 66.6 million visitors in 2019.

Analyzing the thousands of listings on Airbnb is crucial for the company. Our main objective is to identify the key metrics that influence property listings on the platform. To do this, we explore and visualize the Airbnb NYC dataset using basic exploratory data analysis (EDA) techniques. We examine how listings are distributed by location, price range, room type, listing name, and other related factors. Looking at the dataset from these different angles yields insights that can help Airbnb's marketing, finance and technical teams make strategic, data-driven decisions.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


**This project aims to analyze Airbnb data using Python and its libraries: to perform data cleaning and preparation, develop interactive visualizations, and create dynamic plots to gain insights into pricing variations, availability patterns, reviews per month, and location-based trends.**

#### **Define Your Business Objective?**

- The goal of the project is to gather and analyze detailed information about bookings across the neighbourhood groups, in order to provide insights about a particular area by preference, room type, and price.
- We explored relationships among different columns and derived meaningful insights to assess business impact.

### Data Cleaning ###
1. Remove null and duplicate values from the dataset
2. Clean the dataset for better results

### Data Visualization ###


# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np                               # numerical operations
import pandas as pd                              # data loading and manipulation
import matplotlib.pyplot as plt                  # plotting
# %matplotlib inline                             # Jupyter Notebook magic command
import seaborn as sns                            # statistical visualization
import missingno as msno                         # visualizing missing data

plt.rcParams['figure.figsize'] = (10, 7)         # default size of figures

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

#File path of Airbnb dataset in google drive
file_path = "/content/drive/MyDrive/Colab Notebooks/Dataset/Airbnb NYC 2019.csv"
airbnb_df = pd.read_csv(file_path)
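
Since the guidelines ask for exception handling, the load step could be wrapped defensively. The following is a minimal sketch, not the loader used in this notebook: the helper name `load_listings` and its `required_columns` check are assumptions made for illustration.

```python
import pandas as pd


def load_listings(path, required_columns=("price", "room_type")):
    """Read a listings CSV and fail with a clear message if it is unusable.

    `load_listings` and `required_columns` are hypothetical names used
    only for this sketch.
    """
    try:
        df = pd.read_csv(path)
    except FileNotFoundError as exc:
        raise FileNotFoundError(f"Dataset not found at: {path}") from exc

    # Check that the columns the analysis relies on are present
    missing = [col for col in required_columns if col not in df.columns]
    if missing:
        raise ValueError(f"Dataset is missing expected columns: {missing}")
    return df
```

It could then be called as `airbnb_df = load_listings(file_path)` in place of the bare `pd.read_csv` call.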

### Dataset First View

In [None]:
# Dataset First Look
airbnb_df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
row , column = airbnb_df.shape
print("No of rows:",row)
print("No of columns:",column)

### Dataset Information

In [None]:
# Dataset Info
airbnb_df.info()
airbnb_df.describe()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
# duplicated() returns a boolean mask; summing it gives the number of duplicate rows
airbnb_df.duplicated().sum()
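
A quick way to count duplicate rows is to sum the boolean mask returned by `duplicated()`. The sketch below uses a small made-up frame (the values are assumptions, not rows from the dataset):

```python
import pandas as pd

# Toy data (hypothetical values): the last row repeats the first one
toy_df = pd.DataFrame({
    "host_id": [1, 2, 1],
    "price": [100, 80, 100],
})

# duplicated() marks every repeat of an earlier row as True,
# so summing the boolean mask gives the number of duplicate rows
duplicate_count = int(toy_df.duplicated().sum())
print("Duplicate rows:", duplicate_count)

# drop_duplicates() returns a copy with the repeats removed
deduplicated_df = toy_df.drop_duplicates()
print("Rows after removing duplicates:", len(deduplicated_df))
```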

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
airbnb_df.isnull().sum()

In [None]:
# Visualizing the missing values
msno.bar(airbnb_df)
plt.show()

### What did you know about your dataset?

From the observations above, the name, host_name, neighbourhood_group, neighbourhood, room_type and last_review fields are of object type, and the remaining fields are numeric.

1.   There are no duplicate rows in this dataset.
2.   There are missing values in the name, host_name, reviews_per_month and last_review columns.
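
Beyond raw counts, the share of missing values per column can help decide whether to drop a column or only the affected rows. A minimal sketch on a made-up frame (the column values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) with gaps in two columns
toy_df = pd.DataFrame({
    "name": ["Cozy room", None, "Loft", "Studio"],
    "reviews_per_month": [0.5, np.nan, np.nan, 1.2],
    "price": [100, 80, 150, 120],
})

# isnull().mean() is the fraction of missing entries per column
missing_pct = (toy_df.isnull().mean() * 100).round(1)

# Keep only columns that actually have gaps, largest share first
missing_pct = missing_pct[missing_pct > 0].sort_values(ascending=False)
print(missing_pct)
```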



## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
airbnb_df.columns

In [None]:
# Dataset Describe
airbnb_df.describe(include="all")

### Variables Description



1. id (int64) - Unique ID of the listing.
2. name (object) - Name of the listing.
3. host_id (int64) - ID of the host who rents out the property.
4. host_name (object) - Name of the host.
5. neighbourhood_group (object) - The region that contains several smaller neighbourhoods.
6. neighbourhood (object) - A geographically smaller area within a city or region that has its own local identity and characteristics.
7. latitude (float64) - Latitude of the listing.
8. longitude (float64) - Longitude of the listing.
9. room_type (object) - Type of room: Private room, Entire home/apt, or Shared room.
10. price (int64) - Rental price of the listing.
11. minimum_nights (int64) - Minimum number of nights for which the home or room must be rented.
12. number_of_reviews (int64) - Total number of reviews for the listing.
13. last_review (object) - Date of the most recent review.
14. reviews_per_month (float64) - Average number of reviews per month.
15. calculated_host_listings_count (int64) - Number of listings the host has.
16. availability_365 (int64) - Number of days in a year for which the listing is available.







### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
airbnb_df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
airbnb_df.drop(['last_review',"reviews_per_month"], axis=1, inplace=True)
airbnb_df.dropna(inplace=True)
airbnb_df

In [None]:
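def outliers(col):
  """Draw a box plot of the given column to inspect its outliers."""
  # Passing the column as y keeps the box vertical, matching the y-axis label
  sns.boxplot(y=airbnb_df[col], color='#A059BF')
  plt.ylabel(col)
  plt.show()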

In [None]:
# List the numeric columns that can be passed to outliers()
print(airbnb_df.select_dtypes(include='number').columns.tolist())
# x=input("enter column name from above List: ")
# outliers(x)
outliers("price")
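
Box plots conventionally flag points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles. The same rule can be applied numerically, as in this minimal sketch; `iqr_bounds` is a hypothetical helper and the prices are made up:

```python
import pandas as pd


def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) fences of the IQR rule for a numeric Series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Toy prices (hypothetical values) with one obvious extreme
prices = pd.Series([50, 60, 70, 80, 90, 100, 5000])
lower, upper = iqr_bounds(prices)

# Values outside the fences are treated as outliers
outlier_mask = (prices < lower) | (prices > upper)
print(prices[outlier_mask].tolist())  # → [5000]
```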

In [None]:
#removing outliers
airbnb_df.drop(airbnb_df[airbnb_df['minimum_nights']>=600].index,inplace=True,axis=0)

In [None]:
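# Handling invalid prices of 0 by replacing them with the most frequent price
x = airbnb_df['price'].mode()
# Assigning the result back avoids relying on an in-place change to a column selection
airbnb_df['price'] = airbnb_df['price'].replace(0, x[0])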

### What all manipulations have you done and insights you found?

- The columns last_review and reviews_per_month had a large number of null values and were not needed for this analysis, so we dropped them. The remaining rows with null values were then dropped.
- We replaced the price values of 0 with the mode of price, and dropped the rows with outlier values in the minimum_nights column (600 nights or more).
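
The manipulations described above could be gathered into one reusable function. This is a sketch under assumptions: `clean_listings` and `max_min_nights` are hypothetical names, and the toy rows are made up rather than taken from the dataset.

```python
import pandas as pd


def clean_listings(df, max_min_nights=600):
    """Return a cleaned copy of the listings; the input frame is left untouched."""
    # Drop the sparse review columns, then any rows that still contain nulls
    cleaned = df.drop(columns=["last_review", "reviews_per_month"], errors="ignore")
    cleaned = cleaned.dropna()

    # Remove rows whose minimum stay is an extreme outlier
    cleaned = cleaned[cleaned["minimum_nights"] < max_min_nights].copy()

    # Replace zero prices with the most frequent price among the remaining rows
    price_mode = cleaned["price"].mode()[0]
    cleaned["price"] = cleaned["price"].replace(0, price_mode)
    return cleaned


# Toy data (hypothetical values)
toy_df = pd.DataFrame({
    "name": ["A", "B", None, "D", "E", "F"],
    "price": [100, 0, 80, 100, 100, 150],
    "minimum_nights": [1, 2, 3, 1000, 5, 7],
    "last_review": ["2019-01-01", None, None, None, None, None],
    "reviews_per_month": [0.1, None, None, None, None, None],
})

cleaned_df = clean_listings(toy_df)
print(cleaned_df)
```

Returning a copy instead of modifying the frame in place makes the cell safe to re-run, which helps when the whole notebook must execute in one go.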

## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1) Avg. Price per Location

In [None]:
# Chart - 1 visualization code
#Average_price of property according to the location
avg_price_preffered_df = airbnb_df.groupby(['neighbourhood_group','room_type'], as_index=False)['price'].mean().rename(columns={'price':'Average price'})
avg_price_preffered_df

In [None]:
#Unstack the group by information for plot the graph
avg_price_for_room_type = airbnb_df.groupby(['neighbourhood_group','room_type'])['price'].mean().unstack()
avg_price_for_room_type

In [None]:
# Chart of average price of property according to location
avg_price_for_room_type.plot.bar()
plt.title("average price of each neighbourhood group with room types")
plt.ylabel("Price")
plt.xlabel("Location")
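
The same neighbourhood-by-room-type table can also be built with `pivot_table`, which should give the same layout as the `groupby` plus `unstack` approach. A minimal sketch on made-up prices (the toy values are assumptions, not figures from the dataset):

```python
import pandas as pd

# Toy data (hypothetical values)
toy_df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Brooklyn"],
    "room_type": ["Private room", "Entire home/apt", "Private room", "Private room", "Entire home/apt"],
    "price": [120, 250, 70, 90, 180],
})

# Rows are neighbourhood groups, columns are room types, cells are mean prices
avg_price = toy_df.pivot_table(
    index="neighbourhood_group",
    columns="room_type",
    values="price",
    aggfunc="mean",
)
print(avg_price)
```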

##### 1. Why did you pick the specific chart?

I chose a bar chart because the data is categorical, and a bar chart is well suited to comparing values across categories.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that Manhattan has an exceptionally high average price for all room types, while the other locations have prices in the range of 50-140.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Based on these insights, hosts can improve their business by listing properties in Manhattan, where each room or home can command a higher price.

#### Chart - 2) Host no. per Location

In [None]:
# Chart - 2 visualization code
# Group by neighbourhood_group to count the host_id entries per location
# Note: count() counts one entry per listing, so a host with several listings is counted once per listing
no_of_host_per_location = airbnb_df.groupby('neighbourhood_group',as_index=False)['host_id'].count().sort_values(['host_id'], ascending=False).rename(columns={'neighbourhood_group':'Location','host_id':'Host'})
no_of_host_per_location

In [None]:
# Chart to show the number of hosts according to the location
# The label is set on the plotted line so that plt.legend() can pick it up
plt.plot(no_of_host_per_location['Location'], no_of_host_per_location['Host'], label='No. of Host')
plt.legend()
plt.title('Number of host per location')
plt.ylabel('Host')
plt.xlabel('Location')


##### 1. Why did you pick the specific chart?

I chose a line chart because I wanted to see how host presence varies across locations, and a line chart makes this trend easy to follow.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that Manhattan has the highest number of hosts, at more than 20k, whereas Staten Island has as few as 50.
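
Note that counting `host_id` entries per group and counting distinct hosts can give different numbers when a host has several listings. A minimal sketch on made-up rows showing both side by side (the result column names `listings` and `distinct_hosts` are chosen for illustration):

```python
import pandas as pd

# Toy data (hypothetical values): host 1 has two listings in Manhattan
toy_df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn", "Brooklyn"],
    "host_id": [1, 1, 2, 3, 4],
})

summary = toy_df.groupby("neighbourhood_group")["host_id"].agg(
    listings="count",          # one entry per row, i.e. per listing
    distinct_hosts="nunique",  # each host counted once per group
)
print(summary)
```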


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

As per the observation, Manhattan has the most hosts, so business competition there is high. If you want less competition, you can choose Staten Island.

#### Chart - 3) Top Booking Property

In [None]:
# Chart - 3 visualization code
# Count how often each property name appears within a neighbourhood group (used here as the number of bookings)
highest_bookings = airbnb_df.groupby(['neighbourhood_group','name']).size().reset_index(name='No of bookings').rename(columns={'name': 'Property Name'}).sort_values(by='No of bookings', ascending=False)
top_ten_highest_bookings = highest_bookings[:10]
top_ten_highest_bookings

In [None]:
#Graph for the top 10 highest bookings
top_ten_highest_bookings.plot.bar(x='Property Name',y = 'No of bookings')
plt.title('Property with the most booking')
plt.ylabel('No of bookings')
plt.xlabel('Property Name')
plt.show()

##### 1. Why did you pick the specific chart?

Property name is categorical and the number of bookings is numerical, so a bar graph is the appropriate way to depict this.

##### 2. What is/are the insight(s) found from the chart?

The chart visualizes the top 10 properties with the highest number of bookings in the Airbnb dataset. We can easily infer from the graph that all the top ten properties have the same number of bookings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


Provides insight into the success of certain hosts and their properties. Recognizing popular properties can inform targeted marketing efforts for those specific listings. High-performing hosts may be approached for potential partnerships or collaborations. Displaying top-performing properties can enhance customer trust in the platform.

Other hosts may feel neglected if not featured, potentially impacting relationships. Focusing solely on the number of bookings may overlook other factors like customer satisfaction.

#### Chart - 4) Finding Total count of each room types

In [None]:
# Chart - 4 visualization code

#Finding unique values from column 'room_type'
airbnb_room_type = airbnb_df.room_type.unique()
airbnb_room_type

In [None]:
#Which is the most listed room type?
airbnb_roomtype_count = dict(airbnb_df.room_type.value_counts())
airbnb_roomtype_count

In [None]:
#Creating Dataset
room_type = list(airbnb_roomtype_count.keys())
data = list(airbnb_roomtype_count.values())

#Creating color parameters
colors = ( "skyblue", "red", "yellow",)

#Creating explode data
explode = (0.03, 0.03, 0.1)

#Wedge properties
wp = { 'linewidth' : 1, 'edgecolor' : "pink" }

#Creating autopct arguments
def func(pct, allvalues):
    absolute = int(pct / 100.*np.sum(allvalues))
    return "{:.1f}%\n({:d})".format(pct, absolute)

#Creating Pie Chart
fig, airbnb_pie_chart = plt.subplots(figsize =(12, 8))
wedges, texts, autotexts = airbnb_pie_chart.pie(data, autopct = lambda pct: func(pct, data),
                                                explode = explode,
                                                shadow = True,
                                                colors = colors,
                                                startangle = 0,
                                                wedgeprops = wp,
                                                textprops = dict(color ="black"))

#Adding legend
airbnb_pie_chart.legend(wedges, room_type,
                        title ="Room Type",
                        loc ="upper left",
                        bbox_to_anchor=(1, 0., 0.,1))

plt.setp(autotexts, size = 15, weight = "bold")
airbnb_pie_chart.set_title("Count of Listed Rooms")

##### 1. Why did you pick the specific chart?

A pie chart provides a visual representation of the proportion of each room type in the dataset, making the distribution easy to grasp.

##### 2. What is/are the insight(s) found from the chart?

Entire home/apt listings are the highest in count (52.2%), private rooms account for 45.4%, and shared rooms are the lowest (2.4%).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


Helps in tailoring marketing strategies based on the popularity of different room types.

#### Chart - 5) Finding Relation between neighbourhood group and availability of rooms

In [None]:
# Chart - 5 visualization code

# Create a box plot to show the relation between the number of availability of rooms in neighbourhood group
ax = sns.boxplot(data=airbnb_df, x='neighbourhood_group',y='availability_365',palette='plasma')
ax.set_title('Relation between Neighbourhood group & Availability of rooms')
ax.set_ylabel('Availability 365 Days')
ax.set_xlabel('Neighbourhood Group')


##### 1. Why did you pick the specific chart?

The box plot visualizes the relationship between room availability and neighborhood groups, providing insights into distribution and outliers.

##### 2. What is/are the insight(s) found from the chart?

Availability is highest in Staten Island and lowest in Brooklyn.
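
To back a box-plot reading with numbers, the quartiles per neighbourhood group can be tabulated. A minimal sketch on made-up availability values (not taken from the dataset):

```python
import pandas as pd

# Toy data (hypothetical values)
toy_df = pd.DataFrame({
    "neighbourhood_group": ["Staten Island"] * 3 + ["Brooklyn"] * 3,
    "availability_365": [200, 300, 365, 0, 30, 90],
})

# describe() reports the quartiles that a box plot draws as the box edges and median line
quartiles = (
    toy_df.groupby("neighbourhood_group")["availability_365"]
    .describe()[["25%", "50%", "75%"]]
)
print(quartiles)
```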

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Informs pricing strategies based on the availability of rooms in different neighborhood groups. Understanding the distribution of room availability aids in balancing supply and demand, optimizing the marketplace. Can contribute to a better user experience by setting appropriate expectations for room availability.

#### Chart - 6) Top 25 most used words from listing names

In [None]:
# Chart - 6 visualization code
# Creating empty list to store name strings
airbnb_names=[]

# Getting name string from 'name' column and appending it to the empty list
for name in airbnb_df.name:
    airbnb_names.append(name)

# Setting a function to split name strings into separate words
def split_name(name):
    ns = str(name).split()
    return ns

# Creating empty list to store the count of words
names_count = []

# Getting name string to append it to the names_count list
for n in airbnb_names:
    for word in split_name(n):
        word = word.lower()
        names_count.append(word)

In [None]:
#Importing Counter from the collections module to count the words and find the 25 most used ones
from collections import Counter

#Counting most common words
count_words = Counter(names_count).most_common()
count_words[:25]

In [None]:

#Cleaning the list by removing common filler words (matched by word, so the result does not depend on hard-coded counts)
words_to_remove = {'in', 'the', 'to', 'of', '-', 'a'}
top_25_cleaned = [e for e in count_words if e[0] not in words_to_remove]
top_25 = top_25_cleaned[:25]
top_25
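
The whole word-counting step can be condensed into a few lines. A minimal sketch using only the standard library; the listing names and the `stop_words` set are made up for illustration:

```python
from collections import Counter

# Toy listing names (hypothetical values)
listing_names = [
    "Cozy room in the heart of Brooklyn",
    "Cozy private room",
    "Private studio - near park",
]

# Filler words to ignore, matched by the word itself
stop_words = {"in", "the", "to", "of", "-", "a"}

# Lowercase each name, split on whitespace, and skip the filler words
words = [
    word
    for name in listing_names
    for word in str(name).lower().split()
    if word not in stop_words
]

top_words = Counter(words).most_common(3)
print(top_words)  # → [('cozy', 2), ('room', 2), ('private', 2)]
```

Filtering by the word rather than by a (word, count) pair keeps the filter valid even when the counts change after data cleaning.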

In [None]:
#Converting the data into DataFrame
word_count_df = pd.DataFrame(top_25)
word_count_df.rename(columns={0:'Words',1:'Counts'},inplace=True)
word_count_df

In [None]:
#Plotting the Chart
count_viz = sns.barplot(data = word_count_df,x='Words',y='Counts')
count_viz.set_title('Top 25 Used Words for Listing Names')
count_viz.set_ylabel('Count of words')
count_viz.set_xlabel('Words')

#Rotating the x-axis tick labels so they do not overlap
count_viz.tick_params(axis='x', labelrotation=90)

##### 1. Why did you pick the specific chart?

A bar chart is well suited to comparing counts across categories, so I used a bar chart to compare the word counts.

##### 2. What is/are the insight(s) found from the chart?

From the chart above, we see the top 25 words used in the listing name. We can use the word cloud visualization method to help us better understand the chart.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Hosts can name their listings using the frequently used words observed above to attract more attention.

#### Chart - 7) Top 10 hosts with most listings

In [None]:
# Chart - 7 visualization code
#Creating DataFrame of host id with the number of listings
#The column names are set explicitly so they do not depend on the pandas version
count_host_id_df = airbnb_df['host_id'].value_counts().rename_axis('host_id').reset_index(name='count')

#Storing top 10 hosts with most listings
top_host_id = count_host_id_df.head(10)
top_host_id

In [None]:
# Plotting the Chart
top_host_chart = sns.barplot(x= 'host_id', y= 'count', color='y', data=top_host_id, order=top_host_id.sort_values('count',ascending = False).host_id)
plt.xticks(rotation=90)
top_host_chart.set_title('Hosts with most listings in New York')
top_host_chart.set_xlabel('Host IDs')
top_host_chart.set_ylabel('Count of listings')

##### 1. Why did you pick the specific chart?

Identifies and displays the top hosts based on the count of their listings.

##### 2. What is/are the insight(s) found from the chart?

The bar graph shows the top 10 host IDs with the most listings. The topmost is 219517861.
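
To put the top hosts in context, their combined share of all listings can be computed from the same counts. A minimal sketch on made-up host IDs (`top_n` is a hypothetical parameter):

```python
import pandas as pd

# Toy data (hypothetical values): one entry per listing
host_ids = pd.Series([1, 1, 1, 2, 2, 3, 4, 5, 6, 7], name="host_id")

# value_counts() sorts hosts by their number of listings, largest first
listings_per_host = host_ids.value_counts()

top_n = 2
top_listings = listings_per_host.head(top_n).sum()
top_share = top_listings / len(host_ids) * 100
print(f"Top {top_n} hosts hold {top_listings} listings ({top_share:.1f}% of all listings)")
```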

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Informs marketing strategies by identifying hosts with a significant number of listings. Displaying information about top hosts can enhance the user experience for guests.

Focusing solely on the number of listings may overlook other aspects of host performance.

#### Chart - 8) Top three hosts based on their turnover

In [None]:
# Chart - 8 visualization code
# Sum the listing prices per host (used here as a measure of turnover)
top_host=airbnb_df.groupby(['host_name','host_id'])['price'].sum().reset_index()
top_host.rename(columns={'price':'total_price'},inplace=True)
top_host.head()

In [None]:
# Find the top three hosts based on their turnover
top_3=top_host.sort_values('total_price',ascending=False).iloc[:3,:3]
top_3

In [None]:
#Creating a bar chart plot
sns.set(rc={'figure.figsize':(8,10)})
top_3_host_chart = sns.barplot(x='host_name',y='total_price',data = top_3)
top_3_host_chart.set_title('Three hosts based on their turnover')
top_3_host_chart.set_xlabel('Host name')
top_3_host_chart.set_ylabel('Total price')

##### 1. Why did you pick the specific chart?

A bar chart clearly visualizes the total turnover of the top three hosts based on the sum of prices for their listings.



##### 2. What is/are the insight(s) found from the chart?

From the chart we can see that Sonder has the highest turnover among the top three.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Understanding which hosts contribute the most to total turnover can inform business strategies. Featuring top hosts in marketing materials can attract both hosts and guests.

Other hosts may feel neglected if not featured, potentially impacting relationships. Focusing solely on turnover may overlook other aspects of host performance.

#### Chart - 9) Average minimum nights per location

In [None]:
# Chart - 9 visualization code
# Find the average minimum nights across the different neighbourhood groups
total_nights=airbnb_df.groupby('neighbourhood_group')['minimum_nights'].mean().reset_index()
final_nights=total_nights.sort_values('minimum_nights',ascending=False)
final_nights.head()


In [None]:
#Creating a chart of average minimum nights per location
Night_per_location = sns.barplot(x='neighbourhood_group',y='minimum_nights',data = final_nights)
Night_per_location.set_title('Average minimum no. of nights per location', weight='bold')
Night_per_location.set_ylabel('Average minimum nights')
Night_per_location.set_xlabel('Neighbourhood group')

##### 1. Why did you pick the specific chart?

The chart allows for a comparison of the average minimum nights spent in different neighborhood groups.


##### 2. What is/are the insight(s) found from the chart?

Provides insights into customer behavior related to minimum nights, aiding in strategic decision-making. Helps identify areas with high demand for longer stays, informing marketing strategies.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Understanding average minimum nights contributes to optimizing the balance of supply and demand.

#### Chart - 10) Average minimum nights per room type

In [None]:
# Chart - 10 visualization code
# Finding the average minimum nights across the different room types
total_room=airbnb_df.groupby('room_type')['minimum_nights'].mean().reset_index()
room_types=total_room.sort_values('minimum_nights',ascending=True)
room_types

In [None]:
#Creating dataset
labels=list(room_types['room_type'])
sizes=list(room_types['minimum_nights'])
# create color parameter
colors=['green','pink','yellow']
# create explode
explode = (0.07, 0.07, 0.07)
#creating pie chart
plt.pie(sizes,explode=explode,labels=labels,colors=colors,autopct='%1.1f%%',shadow=True)
plt.title('Average minimum nights per room type', fontsize=20)
plt.axis("equal")
plt.show()

##### 1. Why did you pick the specific chart?

Provides a visual representation of the proportion of average minimum nights of each room type.



##### 2. What is/are the insight(s) found from the chart?

Entire home/apt has the highest average minimum nights and private room the lowest.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Helps identify areas with high demand for longer stays, informing marketing strategies. Provides insights into customer behavior related to minimum nights, aiding in strategic decision-making. Understanding average minimum nights spent contributes to optimizing the balance of supply and demand.

#### Chart - 11) Hosts with the most reviews

In [None]:
# Chart - 11 visualization code
#calculating the reviews
highest_review=airbnb_df.groupby(['host_id','host_name']).agg({'number_of_reviews':'sum'}).reset_index()
highest_review.sort_values(by='number_of_reviews',ascending=False,inplace=True)
highest_review=highest_review[:10]
highest_review


In [None]:

#plotting the graph of hosts with the highest number of reviews
sns.barplot(x='host_id', y='number_of_reviews', data=highest_review, palette='viridis')
plt.xlabel('Host ID')
plt.ylabel('Number of Reviews')
plt.title('Distribution of Reviews for Top Hosts')
plt.xticks(rotation=90)
plt.show()

##### 1. Why did you pick the specific chart?

It helps in identifying hosts with the highest number of reviews, providing insight into the popularity of these hosts.



##### 2. What is/are the insight(s) found from the chart?

Indicates the popularity of certain hosts based on the frequency of reviews.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Highlighting top hosts with positive reviews can be leveraged for marketing and attracting more guests. Positive reviews contribute to the reputation of the hosts and the overall platform.

High review numbers do not necessarily indicate overall quality or guest satisfaction.



#### Chart - 12) Density of property within a neighbourhood group with location

In [None]:
# Chart - 12 visualization code
#scatter plot for location
location=sns.scatterplot(x='longitude',y='latitude',data=airbnb_df,hue='neighbourhood_group')
location.set_title("Density of property within a neighbourhood group with location")


##### 1. Why did you pick the specific chart?

The scatter plot is used to visualize the geographical distribution of properties based on latitude and longitude, with different colors representing different neighborhood groups.

##### 2. What is/are the insight(s) found from the chart?

Indicates density patterns in different areas, highlighting areas with a higher concentration of properties. The most dense area is Brooklyn with respect to longitude and latitude.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

It helps in understanding the spatial distribution of properties across different neighborhoods. Helps users (both hosts and guests) understand the distribution of properties, aiding in decision-making. Can be used for marketing by showcasing the diversity of property locations within different neighborhood groups.

#### Chart - 13) Correlation of different numerical attributes.

In [None]:
# Chart - 13 visualization code
# Correlation Heatmap visualization code
numeric_df=airbnb_df.select_dtypes(include=['int64','float64'])
sns.heatmap(numeric_df.corr(),annot=True,cmap='coolwarm')

##### 1. Why did you pick the specific chart?

The correlation heatmap is used to visualize the correlation matrix of a dataset. It helps identify relationships and dependencies between different numerical variables in the dataset.

##### 2. What is/are the insight(s) found from the chart?


Warmer colors (e.g., shades of red) indicate that as one variable increases, the other tends to increase as well. Cooler colors (e.g., shades of blue) indicate that as one variable increases, the other tends to decrease. The diagonal line represents the correlation of a variable with itself and is always 1.
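
To read the heatmap more systematically, the correlation matrix can be flattened into a ranked list of variable pairs. A minimal sketch on made-up numbers; the column names match the dataset but the values are assumptions:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values)
toy_df = pd.DataFrame({
    "price": [50, 100, 150, 200, 250],
    "minimum_nights": [1, 2, 3, 4, 5],
    "availability_365": [300, 200, 250, 100, 50],
})

corr = toy_df.corr()

# Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Flatten to (variable, variable) pairs and rank by absolute correlation
pairs = upper.stack().dropna().sort_values(key=abs, ascending=False)
print(pairs)
```

Ranking by absolute value keeps strong negative relationships near the top alongside strong positive ones.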

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


Businesses can optimize resources by understanding which factors influence each other. Insights from the chart can inform decision-making by highlighting variables that strongly correlate, aiding in strategic planning.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?

- Allocate marketing resources more heavily in Manhattan and Brooklyn due to their high host concentration and customer demand.
- Since 'Entire home/apt' is the most listed and preferred type, consider promoting and expanding this category.
- Explore ways to increase the popularity of 'Shared Room' listings, as they currently constitute a small percentage.
- Implement dynamic pricing strategies for Manhattan, as it has the highest average payment across all room types. This can maximize revenue.
- Recognize the high availability in Staten Island and tailor promotions or incentives to increase bookings.
- Address the lower availability in Brooklyn; this might involve encouraging hosts to list their properties or partnering with them.
- Capitalize on the trend of longer stays in private rooms in Brooklyn and Manhattan. Consider promotions or packages for extended bookings.
- Encourage hosts to use popular keywords like 'bedroom,' 'cozy,' 'private,' etc., in their listings to attract more attention.
- Provide support and incentives for hosts with a high turnover, especially those in the top three like Sonder(nyc), Blueground, and Sally.
- Leverage the popularity of the top ten hosts with the most reviews in marketing campaigns to build trust and attract more customers.
- Utilize the density information in Brooklyn for targeted advertising or promotions in specific regions.

# **Conclusion**

Through this exploratory data analysis (EDA) and visualization, we gained several interesting insights into the Airbnb rental market. This Airbnb dataset proved to be a rich dataset with a variety of columns that allowed us to explore each significant column in depth. For example, we learned that:

- Manhattan is the main focus in New York for hosts doing business.

- Manhattan and Brooklyn have the highest number of hosts.

- Customers pay the highest average amount in Manhattan for all three room types.

- 'Entire home/apt' is the most listed room type at 52%, and ‘Shared Room’ is the least listed at only 2.4% of the total.

- Availability over the year is highest in Staten Island and lowest in Brooklyn.

- People stay for longer durations in private rooms in Brooklyn and Manhattan.

- Words such as ‘bedroom’, ‘cozy’, ‘private’, ‘apartment’ and ‘spacious’ are used more frequently than words such as ‘park’, ‘near’, ‘village’ and ‘heart’.

- The top 10 hosts account for about 2.6% (1270 listings) of the whole dataset. The top three hosts based on their turnover are Sonder(nyc), Blueground and Sally, and the best host is Sonder(nyc).

- More customers preferred Manhattan than Brooklyn for overnight stays.

- Entire home/apt has the highest average minimum nights and private room the lowest.

- The top ten hosts with the most reviews can help in marketing.

- The most dense area is Brooklyn with respect to longitude and latitude.

- The heatmap can be used to show correlation, which describes how one attribute relates to another.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***