# **Project Name**    - **AirBnb Bookings Analysis**



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Individual**      - Namira Mujawar

# **Project Summary -**

The project is an exploratory data analysis (EDA) of Airbnb bookings data, aimed at gaining insights into the patterns and trends of the Airbnb market. The dataset includes information about Airbnb bookings in a particular location, such as the listing description, the booking date, room type, reviews, and the price.

The project will begin by cleaning and preprocessing the data, handling any missing or incorrect values, and transforming the data into a usable format for analysis. The next step will be to conduct a comprehensive analysis of the data, using descriptive statistics, visualization tools, and statistical techniques to identify patterns and trends in the data.

The analysis will focus on answering questions such as:

* What is the distribution of Airbnb listings by location, price, and availability?
* What factors influence the price of Airbnb listings?
* What is the seasonal pattern of Airbnb bookings?
* What are the most popular neighborhoods for Airbnb bookings?

The project will conclude by summarizing the findings and providing insights that can be used to optimize Airbnb bookings, improve customer experience, and increase profitability for Airbnb hosts.


# **GitHub Link -**

https://github.com/NamiraMujawar/airbnb-booking-analysis-eda

# **Problem Statement**



Since 2008, guests and hosts have used Airbnb to expand travel possibilities and present a more unique, personalised way of experiencing the world. Today, Airbnb has become a one-of-a-kind service that is used and recognised around the world. Analysing the millions of listings provided through Airbnb is crucial for the company. These listings generate a lot of data that can be analysed and used for security, business decisions, understanding customers' and providers' (hosts') behaviour and performance on the platform, guiding marketing initiatives, implementing innovative additional services, and much more. This dataset has around 49,000 observations and 16 columns, with a mix of categorical and numeric values. Explore and analyse the data to discover key insights.

#### **Define Your Business Objective?**


* Understanding the market demand: The analysis can help to understand the demand for Airbnb bookings in a particular location and identify the popular times for booking. This information can be used to optimize the pricing strategy, plan marketing campaigns, and adjust supply accordingly.

* Optimizing the pricing strategy: The analysis can help to identify the factors that affect the price of Airbnb listings, such as location, amenities, and availability. This information can be used to optimize the pricing strategy, set competitive prices, and improve the profitability of the Airbnb business.

* Improving customer experience: The analysis can help to identify the common amenities and features that customers are looking for in Airbnb listings. This information can be used to improve the customer experience, add new features to the listings, and increase customer satisfaction.

* Identifying new business opportunities: The analysis can help to identify new business opportunities in the Airbnb market, such as identifying under-served areas or niches. This information can be used to develop new products or services that cater to these segments and increase the market share.

In summary, the business objective of Airbnb booking data analysis is to gain insights that can be used to optimize the Airbnb business, improve customer experience, and increase profitability.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits. 
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Vizualization in  a structured way while following "UBM" Rule. 

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import numpy as np
import pandas as pd
import math  # standard-library module; numpy.math is not available in recent NumPy releases
from numpy import loadtxt
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import rcParams

import warnings
warnings.filterwarnings('ignore')

### Dataset Loading

In [None]:
# Load Dataset
Airbnb = pd.read_csv('Airbnb_NYC_2019.csv')
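
The guidelines above ask for exception handling so that the notebook runs in one go. A minimal sketch of a guarded loader follows; `load_listings` is a hypothetical helper name, not part of the original notebook, and it only checks that the file exists before reading it.

```python
from pathlib import Path

import pandas as pd


def load_listings(path):
    """Read the listings CSV, failing early with a clear message if it is missing."""
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"Dataset not found: {csv_path.resolve()}")
    return pd.read_csv(csv_path)


# Example call with the file name used in this notebook:
# Airbnb = load_listings('Airbnb_NYC_2019.csv')
```

Failing early with the resolved path makes a wrong working directory easier to spot than a traceback from deep inside `read_csv`.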

### Dataset First View

In [None]:
# Dataset First Look
Airbnb.head()

In [None]:
#Bottom 5 rows of dataset
Airbnb.tail()

In [None]:
Airbnb.head(10)

In [None]:
# DataType of each columns
Airbnb.dtypes

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
Airbnb.shape

### Dataset Information

In [None]:
# Dataset Info
Airbnb.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
len(Airbnb[Airbnb.duplicated()])


In [None]:
# Sum of the duplicate values
Airbnb.duplicated().sum()

In [None]:
#remove duplicate values from the dataset
Airbnb.drop_duplicates(inplace=True)
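
As a small illustration of what the duplicate checks above count, here is a sketch on a toy frame. The column values are made up and only stand in for the real listings table.

```python
import pandas as pd

# Toy data standing in for the listings table (values are made up)
listings = pd.DataFrame({
    "host_id": [1, 1, 2, 3],
    "room_type": ["Private room", "Private room", "Shared room", "Entire home/apt"],
    "price": [80, 80, 40, 150],
})

# Count fully identical rows; the first occurrence of each row is not flagged
n_duplicates = int(listings.duplicated().sum())

# Keep the first occurrence of each row and rebuild a clean index
deduplicated = listings.drop_duplicates().reset_index(drop=True)

print(n_duplicates, len(deduplicated))
```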

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(Airbnb.isnull().sum())

In [None]:
# Boolean mask of the missing values (True where a value is null)
Airbnb.isnull()

In [None]:
# Visualizing the missing values
# Checking Null Value by plotting Heatmap
sns.heatmap(Airbnb.isnull(), cbar=False)
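
Alongside the heatmap, a per-column table of missing counts and percentages can be easier to read. The sketch below uses a toy frame with made-up values; on the real data the same expression would be applied to `Airbnb`.

```python
import numpy as np
import pandas as pd

# Toy data with a few gaps (values are made up)
listings = pd.DataFrame({
    "name": ["Loft", None, "Studio", "Room"],
    "reviews_per_month": [0.5, np.nan, np.nan, 2.0],
    "price": [120, 90, 75, 60],
})

# One row per column: how many values are missing and what share that is
missing = pd.DataFrame({
    "missing_count": listings.isnull().sum(),
    "missing_pct": listings.isnull().mean() * 100,
}).sort_values("missing_count", ascending=False)

print(missing)
```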

### What did you know about your dataset?

An Airbnb booking analysis dataset typically contains information about Airbnb bookings in a particular location, including the listing description, booking date, number of guests, price, and other related attributes. This dataset consists of 48,895 rows and 16 columns.

Some attributes found in the Airbnb booking analysis dataset include:

* Booking information
* Pricing information
* Reviews and ratings
* Host information

The dataset can be used to conduct a variety of analyses, such as understanding the market demand, optimizing the pricing strategy, improving customer experience, and identifying new business opportunities in the Airbnb market. It can also be used to develop predictive models that can help to forecast future demand and optimize the Airbnb business.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
Airbnb.columns

In [None]:
#Dataset Columns
for col in Airbnb.columns:
    print(col)

In [None]:
# Dataset Describe
Airbnb_dataframe = pd.DataFrame(Airbnb)

print(Airbnb_dataframe.describe())

In [None]:
# Dataset Describe
Airbnb.describe(include='all')

### Variables Description 

Variables found in an Airbnb booking analysis dataset:

Host ID: A unique identifier for the Airbnb host.

Listing name: The name of the Airbnb listing.

Neighborhood: The neighborhood where the Airbnb listing is located.

Latitude and Longitude: The geographical coordinates of the Airbnb listing.

Room type: The type of room available for booking, such as private room, shared room or entire home/apartment.

Price: The nightly price of the Airbnb listing.

Minimum nights: The minimum number of nights required to book the Airbnb listing.

Ratings: The overall rating and review scores of the Airbnb listing, as rated by previous guests.

Host response rate: The rate at which the Airbnb host responds to inquiries and booking requests.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for i in Airbnb.columns.tolist():
  print("No. of unique values in ",i,"is",Airbnb[i].nunique(),".")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
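# Preview the data without the identifier and free-text columns.
# The result is not assigned, so Airbnb keeps these columns for the later cells that still use them.
Airbnb.drop(['name','id','host_name','last_review'], axis=1)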

In [None]:
#Examining Changes
Airbnb.head(5)

In [None]:
# Replace missing 'reviews_per_month' values with zero (fillna returns a new frame, so assign it back)
Airbnb = Airbnb.fillna({'reviews_per_month':0})
# Examining changes
Airbnb.reviews_per_month.isnull().sum()
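
`DataFrame.fillna` returns a new object unless `inplace=True` is passed, so the result has to be assigned for the imputation to stick. A minimal sketch on a toy column with made-up values shows the difference:

```python
import numpy as np
import pandas as pd

reviews = pd.DataFrame({"reviews_per_month": [1.2, np.nan, 0.4]})

# Without assignment the original frame is unchanged
reviews.fillna({"reviews_per_month": 0})
still_missing = int(reviews["reviews_per_month"].isnull().sum())

# Assigning the result keeps the imputed values
reviews = reviews.fillna({"reviews_per_month": 0})
missing_after = int(reviews["reviews_per_month"].isnull().sum())

print(still_missing, missing_after)
```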

In [None]:
# Remove every row that still contains a NaN in any column
Airbnb.dropna(how='any',inplace=True)
Airbnb.info() # .info() prints a concise summary of the dataframe

In [None]:
#Examine Continous Variables
Airbnb.describe()

In [None]:
#Print all the columns names
Airbnb.columns


In [None]:
# Importing datetime modules
from datetime import datetime
from datetime import date

In [None]:
def parse_string_to_date(datestring):
  # Parse a 'YYYY-MM-DD' string into a pandas Timestamp
  return pd.to_datetime(datestring, format='%Y-%m-%d')

In [None]:
Airbnb['last_review'] = pd.to_datetime(Airbnb['last_review'], format='%Y-%m-%d')

In [None]:
Airbnb.info()

In [None]:
# to change the data type of price to integer
Airbnb['price'] = Airbnb['price'].astype(int)

In [None]:
# to get the mean price by neighbourhood
Airbnb.groupby('neighbourhood')['price'].mean()

In [None]:
# to create a new column: available days per year divided by the minimum number of nights
Airbnb['availability_of_the_rooms_per_min_nights'] = Airbnb['availability_365'] / Airbnb['minimum_nights'] 
print(Airbnb['availability_of_the_rooms_per_min_nights'])

In [None]:
Airbnb.drop(Airbnb.index[729], inplace=True)

In [None]:
# Let's check the column 'last review'

Airbnb['last_review']

In [None]:
# 'last_review' is a date column, so we convert it to datetime here.
# Null and out-of-range dates are handled in the cells that follow.

Airbnb['last_review'] = pd.to_datetime(Airbnb['last_review'])

In [None]:
# Let's check the min and max timestamps

Airbnb['last_review'].min(), Airbnb['last_review'].max()

In [None]:
# O boy! the max date of review is the year 2058. Let's find out how many such bogus dates are there, and fix them

Airbnb[Airbnb['last_review'].apply(lambda x: x.year) > 2022]

In [None]:
#let's change these wrong review dates to the median review date, giving benefit of doubt to the host

Airbnb.loc[Airbnb[Airbnb['last_review'].apply(lambda x: x.year) > 2022].index, 'last_review'] = Airbnb['last_review'].median()

In [None]:
# Now let's impute the null values with the median date in the dataset

Airbnb.loc[Airbnb['last_review'].isnull(), 'last_review'] = Airbnb['last_review'].median()
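
The date clean-up above can be sketched on a toy column with the vectorised `.dt.year` accessor instead of `apply`. The dates are made up, and one choice here is an assumption that differs from the cells above: the median is computed from the plausible dates only, so the bogus values do not influence it.

```python
import pandas as pd

# Toy review dates: one far-future value and one missing value (made up)
reviews = pd.DataFrame({
    "last_review": pd.to_datetime(
        ["2018-05-01", "2019-06-15", "2058-06-16", None, "2019-01-10"]
    )
})

cutoff_year = 2022  # same threshold as the cells above

# Median of the plausible dates only (NaT fails the comparison and is excluded)
valid = reviews.loc[reviews["last_review"].dt.year <= cutoff_year, "last_review"]
median_date = valid.median()

# Replace both the bogus and the missing dates with that median
bogus = reviews["last_review"].dt.year > cutoff_year
to_fix = bogus | reviews["last_review"].isnull()
reviews.loc[to_fix, "last_review"] = median_date

print(reviews)
```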

In [None]:

Airbnb[Airbnb.duplicated()]

In [None]:
Airbnb.drop_duplicates(keep='first', inplace=True)

In [None]:
Airbnb.duplicated(subset=['host_name', 'latitude', 'longitude', 'price']).sum()

In [None]:
# find these duplicate entries and manually confirm our hunch

temp = Airbnb.loc[Airbnb.duplicated(subset=['host_name', 'latitude', 'longitude', 'price'], keep=False)].copy()
temp = temp.groupby(['host_name', 'latitude', 'longitude', 'price'])
for key, subdf in temp:
    print(key)
    print(pd.DataFrame(subdf), '\n')
    break

In [None]:
del temp, subdf
Airbnb.drop_duplicates(subset=['host_name', 'latitude', 'longitude', 'price'], inplace=True)
Airbnb.info()
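
To make the subset-based duplicate check concrete, here is a sketch on toy listings with made-up values. `keep=False` flags every member of a duplicate group for inspection, while `drop_duplicates` keeps the first row of each group.

```python
import pandas as pd

# Toy listings: the first two rows share host, location and price (made up)
listings = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "host_name": ["Ana", "Ana", "Ben", "Cy"],
    "latitude": [40.70, 40.70, 40.80, 40.60],
    "longitude": [-73.90, -73.90, -73.95, -74.00],
    "price": [100, 100, 100, 55],
})
key = ["host_name", "latitude", "longitude", "price"]

# Every row that belongs to a duplicate group on the key columns
suspects = listings[listings.duplicated(subset=key, keep=False)]

# One row per key combination (the first occurrence is kept)
unique_listings = listings.drop_duplicates(subset=key)

print(len(suspects), len(unique_listings))
```

Note that two rows with different `id` values still count as duplicates here, which is why inspecting `suspects` before dropping is worthwhile.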

In [None]:
Airbnb.isnull().sum().sort_values(ascending=False)

In [None]:
Airbnb['reviews_per_month'].min(), Airbnb['reviews_per_month'].max()

In [None]:
# Impute missing reviews_per_month with the median (hard-coded here as 0.79)

Airbnb.loc[Airbnb['reviews_per_month'].isnull(), 'reviews_per_month'] = 0.79

In [None]:
# check the variability in the columns of dataframe

for col in Airbnb.columns:
    print(f'Column {col} \t has {Airbnb[col].nunique()} unique values')

In [None]:
# Let's check the host_name value counts

Airbnb.host_name.value_counts()

In [None]:
# Let's check the neighbourhood group values

Airbnb['neighbourhood_group'].value_counts()

In [None]:
Airbnb.neighbourhood.value_counts().sort_values(ascending=False)[:10]

In [None]:
# Impute NAME column with 'blank'
Airbnb.loc[Airbnb['name'].isnull(), 'name'] = 'blank'    

# Impute host id with 0
Airbnb.loc[Airbnb['host_id'].isnull(), 'host_id'] = 0   


# Impute host name with 'blank'
Airbnb.loc[Airbnb['host_name'].isnull(), 'host_name'] = 'blank'

In [None]:
Airbnb.loc[Airbnb['neighbourhood_group']=='manhatan', 'neighbourhood_group'] = 'Manhattan'
Airbnb.loc[Airbnb['neighbourhood_group']=='brookln', 'neighbourhood_group'] = 'Brooklyn'
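
When several misspellings need fixing, a single mapping passed to `replace` keeps the corrections in one place. A sketch on a toy series follows; the misspellings mirror the ones handled above, and the `corrections` name is only illustrative.

```python
import pandas as pd

groups = pd.Series(["Manhattan", "manhatan", "Brooklyn", "brookln", "Queens"])

# Map each known misspelling to its canonical label
corrections = {"manhatan": "Manhattan", "brookln": "Brooklyn"}
cleaned = groups.replace(corrections)

print(cleaned.value_counts())
```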

In [None]:
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="MyApp")

# Check sample location
location = geolocator.geocode("Manhattan")
location

In [None]:
#We impute availability 365 with median value 127
Airbnb['availability_365'] = Airbnb['availability_365'].fillna(127)

In [None]:
# Let us check the range of 'minimum_nights'
Airbnb['minimum_nights'].min(), Airbnb['minimum_nights'].max()

In [None]:
# Let's impute the 'minimum nights' feature with the median 3

Airbnb['minimum_nights'] = Airbnb['minimum_nights'].fillna(3)

In [None]:
Airbnb.price.mean()

In [None]:
# Impute the price with the mean (assign back instead of calling inplace on a column view)
Airbnb['price'] = Airbnb['price'].fillna(Airbnb['price'].mean())

In [None]:
Airbnb.isnull().sum().sort_values(ascending=False)

In [None]:
combined_df = pd.DataFrame(columns=['name', 'host_id', 'longitude', 'latitude', 'neighbourhood', 'neighbourhood_group'])

In [None]:
combined_df = pd.DataFrame(columns=['name', 'host_id', 'price', '', 'number_of_reviews', 'minimum_nights'])

In [None]:
combined_df.head()

In [None]:
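def loc_from_coord(lat, lon):
    # Hypothetical helper: reverse-geocode a coordinate pair with the geolocator created above.
    # The address keys used here are assumptions about the Nominatim response.
    location = geolocator.reverse((lat, lon))
    address = location.raw.get('address', {}) if location else {}
    return address.get('neighbourhood') or address.get('suburb')

# Fill missing neighbourhoods from the coordinates; skip when nothing is missing
idx = Airbnb.loc[Airbnb.neighbourhood.isnull()].index
if len(idx) > 0:
    Airbnb.loc[idx, 'neighbourhood'] = Airbnb.loc[idx].apply(lambda x: \
                                                loc_from_coord(x.latitude, x.longitude), axis=1)

In [None]:
Airbnb.loc[idx].head()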


In [None]:
#which roomtype has a demand

prf_areas = Airbnb.groupby(['neighbourhood_group', 'room_type'])['minimum_nights'].count().reset_index() 
prf_areas = prf_areas.sort_values(by='minimum_nights', ascending=False) 
prf_areas.head(3)

In [None]:
# Maximum price by area, highest first

Airbnb.groupby('neighbourhood_group',as_index=False)['price'].max().sort_values(['price'], ascending = False).rename(columns = {'price': 'Maximum price', 'neighbourhood_group':'Area'})

In [None]:
#area wise maximum price

max_price_Airbnb = Airbnb.groupby('neighbourhood_group', as_index=False)['price'].max().sort_values(['price'], ascending = True).rename (columns= {'price': 'Maximum price', 'neighbourhood_group':'Area'})

In [None]:
max_price_Airbnb

In [None]:
#area wise minimum price

min_price_Airbnb = Airbnb.groupby('neighbourhood_group',as_index=False)['price'].min().sort_values(['price'], ascending = True).rename(columns={"price":"Minimum price", 'neighbourhood_group':'Area'})
min_price_Airbnb

In [None]:
mearge_price_Airbnb = pd.merge(max_price_Airbnb,min_price_Airbnb,on='Area')

In [None]:
mearge_price_Airbnb
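
The minimum and maximum price per area can also be computed in one pass with named aggregation, which avoids building two frames and merging them. This is a sketch on toy prices with made-up values; `price_range` is an illustrative name.

```python
import pandas as pd

listings = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Queens"],
    "price": [200, 90, 150, 60, 75],
})

# One row per area with both statistics side by side
price_range = (
    listings.groupby("neighbourhood_group", as_index=False)
    .agg(min_price=("price", "min"), max_price=("price", "max"))
    .rename(columns={"neighbourhood_group": "Area"})
)

print(price_range)
```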

In [None]:
# Hosts with the most reviews, grouped by room type and location (top 18 rows)
most_reviews_Airbnb = Airbnb.groupby(['host_name', 'room_type', 'neighbourhood_group', 'neighbourhood'])['number_of_reviews'].max().reset_index() 
most_reviews_Airbnb = most_reviews_Airbnb.sort_values(by='number_of_reviews', ascending=False).head(18)

most_reviews_Airbnb

In [None]:
#Average price of property based on location and Room type
avg_price = Airbnb.groupby(['neighbourhood_group', 'room_type'])['price'].mean().reset_index().rename(columns={'neighbourhood_group':'Area'})



In [None]:
avg_price

In [None]:
Airbnb_concat = pd.concat([min_price_Airbnb,max_price_Airbnb],axis=0)

In [None]:
Airbnb_concat 

In [None]:
Airbnb_concat1 = pd.concat([min_price_Airbnb,max_price_Airbnb],axis=1)

In [None]:
Airbnb_concat1
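
The two `concat` calls above produce quite different shapes, and both differ from a merge. The sketch below uses two tiny frames with made-up prices to show the contrast.

```python
import pandas as pd

min_price = pd.DataFrame({"Area": ["Bronx", "Queens"], "Minimum price": [10, 10]})
max_price = pd.DataFrame({"Area": ["Bronx", "Queens"], "Maximum price": [2500, 10000]})

# axis=0 appends rows; columns missing from one frame are filled with NaN
stacked = pd.concat([min_price, max_price], axis=0)

# axis=1 places frames side by side on the index; "Area" appears twice
side_by_side = pd.concat([min_price, max_price], axis=1)

# merge joins on the key, giving one row per Area with both price columns
merged = pd.merge(min_price, max_price, on="Area")

print(stacked.shape, side_by_side.shape, merged.shape)
```

For a per-area comparison table the merged form is usually the one wanted, since it has no repeated key column and no NaN padding.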

### What all manipulations have you done and insights you found?

Data Wrangling Techniques:

Handling missing values: Missing values can be handled using techniques such as imputation, substitution, or removal.

Removing duplicates: Duplicates can be removed to ensure that the analysis is based on unique data points.

Handling outliers: Outliers can be handled by removing or replacing them with more appropriate values.

Converting data types: The data types of the variables can be converted to ensure that they are in a consistent format for analysis.

Merging and splitting data: Data can be merged or split to create new variables or group data points for analysis.
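
As one possible way to handle outliers, the sketch below clips a toy price series to the usual 1.5 × IQR fences. The prices are made up, and the 1.5 multiplier is a common convention rather than something taken from this notebook.

```python
import pandas as pd

prices = pd.Series([50, 60, 70, 80, 90, 100, 5000])

# Interquartile range and the conventional 1.5 * IQR fences
q1, q3 = prices.quantile(0.25), prices.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Replace values outside the fences with the nearest fence value
clipped = prices.clip(lower=lower, upper=upper)

print(lower, upper, clipped.max())
```

Clipping keeps every row, which matters when the other columns of an extreme listing are still informative; dropping the rows is the alternative when they are likely data-entry errors.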

Insights from Airbnb Booking Analysis Data:

* Price distribution: The analysis can provide insights into the distribution of prices for Airbnb listings in a particular location, identifying the range and median prices.

* Seasonal patterns: The analysis can reveal the seasonal patterns of Airbnb bookings, such as peak booking periods during holidays or special events.

* Location preferences: The analysis can identify the most popular neighborhoods or locations for Airbnb bookings, as well as the factors that influence these preferences.

* Customer ratings and reviews: The analysis can provide insights into the overall satisfaction of customers with Airbnb listings, based on their ratings and reviews.

These insights can help to optimize the Airbnb business, improve customer experience, and increase profitability for Airbnb hosts.

## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 pie chart for room_type

In [None]:
# Chart - 1 visualization code
print(Airbnb.room_type.value_counts())
print(" ")
# Dependant Variable Column Visualization
Airbnb['room_type'].value_counts().plot(kind='pie',
                              figsize=(15,6),
                               autopct="%1.1f%%",
                               startangle=90,
                               shadow=True,
                               labels=['Entire home/apt(%)','Private room (%)','Shared room (%)'],
                               colors=['skyblue','red','green'],
                               explode=[0,0,0]
                              )

##### 1. Why did you pick the specific chart?

A pie chart is a circular chart that is divided into slices to represent the proportion of each category in a dataset. In the case of Airbnb booking analysis, the pie chart can be used to visualize the proportion of each room type available for booking, such as private room, shared room, or entire home/apartment.

The pie chart is a suitable visualization for this variable because it allows viewers to quickly understand the relative proportions of each room type. The circular shape of the chart makes it easy to compare the sizes of each slice, and the use of different colors or labels can help to distinguish each category.

##### 2. What is/are the insight(s) found from the chart?

In this Airbnb booking analysis, entire homes/apartments are the most preferred room type with 52.3%, private rooms come second with 45.5%, and shared rooms account for only 2.2%.
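
The percentages shown on a pie chart can be checked numerically with `value_counts(normalize=True)`. This sketch uses a toy series with made-up counts, not the notebook's data.

```python
import pandas as pd

# Toy room types: 5 entire homes, 4 private rooms, 1 shared room
room_type = pd.Series(
    ["Entire home/apt"] * 5 + ["Private room"] * 4 + ["Shared room"] * 1
)

# Share of each category in percent, largest first
share = (room_type.value_counts(normalize=True) * 100).round(1)

print(share)
```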

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The positive business impact of this insight from Airbnb booking analysis is that it can help the company and its hosts to optimize their listings and increase profitability.

Firstly, the fact that more than half of the customers prefer to book entire homes and apartments can provide insights into the demand for larger and more private spaces. Hosts can use this information to adjust their listings to better cater to this demand, such as by offering more spacious or luxurious accommodations. This can help hosts to attract more bookings and potentially increase their revenue.

Secondly, the high demand for private rooms can also provide opportunities for hosts to offer more personalized services and experiences to their guests. Hosts can provide amenities and services that are tailored to the preferences of their guests, such as providing local recommendations or offering personalized tours. This can help to improve customer satisfaction and loyalty, and potentially lead to repeat bookings and positive reviews.

Finally, the low demand for shared rooms can provide insights into the types of accommodations that are less popular among customers. Hosts can use this information to adjust their listings or pricing strategies to better reflect customer preferences, which can help to increase their occupancy rates and revenue.

In summary, the insight that customers prefer to book entire homes and apartments and private rooms can provide valuable insights for hosts and Airbnb to optimize their listings and improve profitability by catering to customer preferences.

#### Chart - 2 neighbourhood_group Vs. calculated_host_listings_count(Bivariate with Categorical - Numerical)




In [None]:
# Chart - 2 visualization code
#  Showing Average calculated_host_listings_count Percentage wise
# Showing top neighbourhood_group

print((Airbnb.groupby(['neighbourhood_group'])['calculated_host_listings_count'].mean()*100).sort_values(ascending = False).reset_index(name="Average booking %").head(10))
print(" ")


# neighbourhood_group vs. average calculated_host_listings_count (scaled by 100)
# Visualizing the neighbourhood groups with the highest averages
plt.rcParams['figure.figsize'] = (12, 7)
color = plt.cm.copper(np.linspace(0, 0.5, 20))
((Airbnb.groupby(['neighbourhood_group'])['calculated_host_listings_count'].mean())*100).sort_values(ascending = False).head(10).plot.bar(color = ['violet','indigo','b','g','y','orange','r'])
plt.title(" neighbourhood_group with most calculated_host_listings_count percentage", fontsize = 20)
plt.xlabel('neighbourhood_group', fontsize = 15)
plt.ylabel('percentage', fontsize = 15)
plt.show()

In [None]:
#  Showing Average calculated_host_listings_count Percentage wise
# Showing top neighbourhood_group
print((Airbnb.groupby(['neighbourhood_group'])['calculated_host_listings_count'].mean()*100).sort_values(ascending = True).reset_index(name="Average Room_type %").head(10))
print(" ")



# neighbourhood_group vs. average calculated_host_listings_count (scaled by 100)
# Visualizing the neighbourhood groups in ascending order of the average
plt.rcParams['figure.figsize'] = (12, 7)
color = plt.cm.copper(np.linspace(0, 0.5, 20))
((Airbnb.groupby(['neighbourhood_group'])['calculated_host_listings_count'].mean())*100).sort_values(ascending = True).head(10).plot.bar(color = ['violet','indigo','b','g','y','orange','r'])
plt.title(" neighbourhood_group with least calculated_host_listings_count ", fontsize = 20)
plt.xlabel('neighbourhood_group', fontsize = 15)
plt.ylabel('count_of_booking', fontsize = 15)

##### 1. Why did you pick the specific chart?

A bar plot chart between calculated_host_listings_count and neighbourhood_group can be useful in visualizing the distribution of the number of listings by host in each neighborhood group. The calculated_host_listings_count column represents the number of listings by a particular host in the Airbnb platform, while the neighbourhood_group column represents the different groups of neighborhoods in a particular city.

By plotting a bar plot chart between these two variables, we can quickly see which neighborhood group has the highest number of listings per host and which has the lowest. It can also help identify any outliers in the data, such as hosts with an unusually high or low number of listings.

This information can be helpful for both hosts and guests. Hosts can use this information to see how their listing(s) compare to other hosts in their neighborhood group, while guests can use this information to make more informed decisions when selecting a neighborhood to stay in based on the number of available listings per host.

##### 2. What is/are the insight(s) found from the chart?

According to the Airbnb dataset, people gave the greatest preference to the Manhattan neighbourhood group (more than 800), followed by Queens (more than 200). The Bronx, Brooklyn and Staten Island have almost similar counts.
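
The quantity behind these bars is the mean of `calculated_host_listings_count` within each neighbourhood group, which describes listings per host rather than bookings. A sketch of that computation on toy data with made-up values, without the scaling by 100:

```python
import pandas as pd

listings = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Queens", "Bronx"],
    "calculated_host_listings_count": [10, 2, 4, 1],
})

# Mean host listing count per group, largest first
avg_listings = (
    listings.groupby("neighbourhood_group")["calculated_host_listings_count"]
    .mean()
    .sort_values(ascending=False)
)

print(avg_listings)
```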

##### 3. Will the gained insights help creating a positive business impact? 
The positive business impact of this information for Airbnb could be that it helps them understand the most popular and in-demand neighborhoods for their users. This information can help Airbnb make informed decisions about where to focus their marketing efforts and investment in terms of adding new listings or improving existing ones.

For example, by knowing that Manhattan and Queens are the most popular neighborhoods, Airbnb can prioritize acquiring more listings in those areas, as they are likely to generate the highest bookings and revenue. Similarly, Airbnb can work to improve the quality of listings in these neighborhoods to ensure that guests have a positive experience, leading to more repeat bookings and positive reviews.

Additionally, this information can help Airbnb optimize their pricing strategy in different neighborhoods. They can use this information to set prices that reflect the demand for each neighborhood and adjust prices during peak times to maximize revenue.



There may be some potential negative business impacts for Airbnb based on this information as well.

Firstly, if there are limited listings in the most popular neighborhoods such as Manhattan and Queens, it may lead to increased competition among hosts and potentially result in higher prices for guests. This could lead to guests choosing alternative accommodation options or even choosing to stay in less popular neighborhoods with more affordable prices.

Additionally, if the popularity of certain neighborhoods changes over time, Airbnb may need to shift their strategy and resources to adapt to these changes. For example, if a once-popular neighborhood becomes less in demand, Airbnb may need to consider reducing their investment in that area and focusing on other neighborhoods instead.

Another potential negative impact is that hosts in less popular neighborhoods may feel neglected or undervalued by Airbnb, which may affect their motivation to continue listing their properties on the platform. Airbnb may need to find ways to incentivize hosts in these neighborhoods to keep their listings active and attract more guests to these areas.

Overall, while understanding the booking preferences of users in different neighborhoods can be valuable for Airbnb, it is important for the company to carefully consider both the positive and negative impacts of this information on their business and their users.

#### Chart - 3 room_type share by availability_365 and number_of_reviews




In [None]:
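# Chart - 3 visualization code
# Room-type counts for listings available at most 359 days a year
print(Airbnb.loc[Airbnb['availability_365']<=359, 'room_type'].value_counts())
print(" ")

# Room-type share for listings available at most 9 days a year.
# Labels are taken from the index so each slice carries the correct room type.
low_availability = Airbnb.loc[Airbnb['availability_365']<=9, 'room_type'].value_counts()
low_availability.plot(kind='pie',
                              figsize=(15,6),
                               autopct="%1.1f%%",
                               startangle=90,
                               shadow=True,
                               labels=low_availability.index,
                               colors=['skyblue','red','green'],
                               explode=[0,0,0]
                              )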

In [None]:
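# Room-type counts for listings with at most 359 reviews
print(Airbnb.loc[Airbnb['number_of_reviews']<=359, 'room_type'].value_counts())
print(" ")

# Room-type share for listings with at most 190 reviews.
# Labels are taken from the index so each slice carries the correct room type.
few_reviews = Airbnb.loc[Airbnb['number_of_reviews']<=190, 'room_type'].value_counts()
few_reviews.plot(kind='pie',
                              figsize=(15,6),
                               autopct="%1.1f%%",
                               startangle=90,
                               shadow=True,
                               labels=few_reviews.index,
                               colors=['skyblue','red','green'],
                               explode=[0,0,0]
                              )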

##### 1. Why did you pick the specific chart?

A pie chart is a circular chart that is divided into slices to represent the proportion of each category in a dataset. In the case of Airbnb booking analysis, the pie chart can be used to visualize the proportion of each room type available for booking, such as private room, shared room, or entire home/apartment.

The pie chart is a suitable visualization for this variable because it allows viewers to quickly understand the relative proportions of each room type. The circular shape of the chart makes it easy to compare the sizes of each slice, and the use of different colors or labels can help to distinguish each category.

##### 2. What is/are the insight(s) found from the chart?

A pie chart of number of reviews can provide insights into the popularity of accommodations. It can show the proportion of accommodations that have a certain number of reviews, which can be useful for understanding which accommodations are most frequently booked and reviewed. For example, it may show that a large proportion of accommodations have fewer than 10 reviews, or that a small proportion have a very large number of reviews.

A pie chart of room type can provide insights into the distribution of different types of accommodations. It can show the proportion of accommodations that are private rooms, shared rooms, or entire homes or apartments, which can be useful for understanding the different types of accommodations that are available and which types are most popular. For example, it may show that most accommodations are entire homes or apartments, or that there is a relatively even distribution of different types of accommodations.

##### 3. Will the gained insights help creating a positive business impact? 
Insights from a pie chart of price can help businesses determine which price ranges are most popular among Airbnb users. This can help them decide whether they should adjust their pricing strategy to make their accommodations more competitive in the market. For example, if the pie chart shows that a large proportion of accommodations fall into the mid-range price category, a business may want to consider lowering their prices to make their accommodations more attractive to customers.

Insights from a pie chart of number of reviews can help businesses understand the popularity of their accommodations and how they compare to competitors. If the pie chart shows that a large proportion of accommodations have a high number of reviews, a business may want to prioritize getting more reviews in order to stay competitive. On the other hand, if the pie chart shows that many accommodations have fewer reviews, a business may be able to gain a competitive advantage by actively soliciting more reviews from satisfied customers.

Insights from a pie chart of room type can help businesses understand the types of accommodations that are most popular among Airbnb users. If the pie chart shows that most accommodations are entire homes or apartments, a business may want to consider focusing on that type of accommodation to appeal to a larger customer base. On the other hand, if the pie chart shows that shared rooms or private rooms are more popular, a business may want to consider offering more of those types of accommodations to stay competitive in the market.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
plt.style.use('fivethirtyeight')
plt.figure(figsize=(13,7))
plt.title("Neighbourhood Group")
g = plt.pie(Airbnb.neighbourhood_group.value_counts(), labels=Airbnb.neighbourhood_group.value_counts().index,autopct='%1.1f%%', startangle=180)
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart is a circular chart that is divided into slices to represent the proportion of each category in a dataset. In the case of Airbnb booking analysis, the pie chart can be used to visualize the proportion of listings in each neighborhood group, such as Manhattan, Brooklyn, Queens, Bronx, and Staten Island.

The pie chart is a suitable visualization for this variable because it allows viewers to quickly understand the relative proportions of each neighborhood group. The slices make it easy to compare the size of each category, and the use of different colors and labels helps to distinguish each one.

##### 2. What is/are the insight(s) found from the chart?

As we can see, Manhattan is the most popular neighbourhood group with 46.0%, Brooklyn is in second position with 44.4%, and Queens is third, followed by Bronx and Staten Island respectively.
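The percentages printed on the pie chart can also be computed directly, which makes them easier to quote and compare. The following is a minimal sketch, assuming a small made-up DataFrame in place of the real `Airbnb` data; the helper name `group_shares` is hypothetical.

```python
import pandas as pd

# Made-up toy data standing in for the real `Airbnb` DataFrame
toy = pd.DataFrame({
    "neighbourhood_group": [
        "Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Manhattan",
        "Queens", "Bronx", "Brooklyn", "Manhattan", "Queens",
    ]
})

def group_shares(df, column="neighbourhood_group"):
    """Return the percentage share of each category, largest first."""
    return (df[column].value_counts(normalize=True) * 100).round(1)

shares = group_shares(toy)
print(shares)
```

On the real data, `group_shares(Airbnb)` should reproduce the values shown by `autopct` in the chart.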

##### 3. Will the gained insights help creating a positive business impact? 
The insights gained from the pie chart of the popularity of different neighborhoods in New York City, specifically that Manhattan and Brooklyn are the most popular neighborhoods among Airbnb users, can certainly help create a positive business impact for a business operating in the hospitality industry in New York City.

For example, if a business is looking to expand its operations and add new properties to its portfolio, the insights from the pie chart can inform where the business should focus its efforts. Based on the popularity of Manhattan and Brooklyn, a business may want to prioritize acquiring properties in those neighborhoods to appeal to the largest customer base.

Additionally, the insights gained from the pie chart can inform a business's marketing strategy. If a business has properties in both Manhattan and Brooklyn, it may want to highlight the popularity of those neighborhoods in its marketing materials to attract more customers. This can help the business stand out from competitors and increase its bookings.

Overall, the insights gained from the pie chart can certainly help a business create a positive impact by making data-driven decisions to improve its operations and appeal to the largest customer base possible.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
plt.figure(figsize=(13,7))
plt.title("Type of Room")
# Pass the column by keyword; recent seaborn versions do not accept it as a positional argument
sns.countplot(x='room_type', data=Airbnb, palette="muted")
plt.show()

##### 1. Why did you pick the specific chart?

Bar charts are a commonly used visualization tool that are easy to understand and interpret. They are effective in displaying categorical data and can show the relative size of each category.

Bar charts can be used to compare data between different categories, which makes them useful for comparing bookings between different room types.

Bar charts are customizable and can be used to display additional information, such as the total number of bookings or the percentage of bookings for each room type.

Bar charts can also be used to show changes in data over time, by creating a stacked bar chart that shows the breakdown of bookings by room type for each time period.

##### 2. What is/are the insight(s) found from the chart?

More than 7,000 listings are entire homes/apartments and more than 5,000 are private rooms, while only a few are shared rooms. This suggests that people prefer to book an entire home/apt, followed by private rooms, and that shared rooms are rarely booked.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

* Positive impact:

By understanding that most users prefer to book entire homes or apartments, Airbnb can adjust their supply to meet the demand and encourage more hosts to list their entire properties. This can help improve the user experience and increase customer satisfaction, which can lead to repeat business and positive reviews.
Airbnb can also invest in targeted marketing campaigns to promote the booking of entire homes or apartments, which can lead to an increase in bookings and revenue.


* Negative impact:

If Airbnb does not have enough supply of entire homes or apartments to meet the high demand, it can result in frustrated customers who may opt to use other platforms to book their accommodations. This can lead to a loss of revenue for Airbnb and damage to their reputation.
The low demand for shared rooms can also have a negative impact if Airbnb has invested heavily in this area. For example, if Airbnb has launched marketing campaigns to promote shared rooms, but the demand remains low, it can result in a waste of resources and a negative return on investment.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
plt.figure(figsize=(13,7))
plt.title("Room Type on Neighbourhood Group")
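# Pass columns by keyword; recent seaborn versions do not accept them as positional arguments
sns.countplot(x='neighbourhood_group', hue='room_type', data=Airbnb, palette="muted")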
plt.show()

##### 1. Why did you pick the specific chart?

A multi-bar graph is useful for visualizing how the room types are distributed across the different neighborhood groups, providing a clear comparison of the listing mix and popularity of each area.

##### 2. What is/are the insight(s) found from the chart?

Insights gained from a multi-bar graph of neighborhood and count of bookings in an Airbnb booking analysis can include identifying the most popular and least popular neighborhoods, understanding which neighborhoods have the highest and lowest demand for Airbnb accommodations, and helping hosts and property managers adjust their pricing, marketing and listing strategies accordingly.
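The counts behind the multi-bar graph can be tabulated with a cross-tabulation, which gives exact numbers for each neighborhood group and room type. This is only a sketch on made-up rows; the column names follow the ones used in the chart code.

```python
import pandas as pd

# Made-up toy data standing in for the real `Airbnb` DataFrame
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Queens"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Private room", "Entire home/apt", "Shared room"],
})

# Number of listings for every (neighbourhood group, room type) pair
counts = pd.crosstab(toy["neighbourhood_group"], toy["room_type"])

# Share of each room type within a neighbourhood group (each row sums to 1)
shares = pd.crosstab(
    toy["neighbourhood_group"], toy["room_type"], normalize="index"
).round(2)

print(counts)
print(shares)
```

The row-wise shares make groups of very different size comparable, which the raw bar heights do not.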

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained from a multi-bar graph of neighborhood and count of bookings in an Airbnb booking analysis can help create a positive or negative business impact. By understanding which neighborhoods have the highest and lowest demand, hosts and property managers can adjust their pricing, marketing and listing strategies to optimize revenue and improve customer satisfaction. However, if certain neighborhoods are consistently underperforming, it may be necessary to reevaluate the suitability of those areas for Airbnb accommodations.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
plt.style.use('classic')
plt.figure(figsize=(13,7))
plt.title("Neighbourhood Group vs. Availability Room")
sns.boxplot(data=Airbnb, x='neighbourhood_group',y='availability_365',palette="dark")
plt.show()

##### 1. Why did you pick the specific chart?

I chose boxplots for the analysis of neighborhood versus availability of rooms in Airbnb bookings because boxplots are an effective way to summarize and visualize the distribution of a dataset, particularly when comparing groups. They allow for easy identification of outliers and the examination of quartiles, median, and range of values. In this case, boxplots can help identify any differences in the availability of rooms across different neighborhoods and provide insights into potential patterns or trends that can inform decision-making for both hosts and guests.

##### 2. What is/are the insight(s) found from the chart?

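The boxplots help identify neighborhood groups with consistently high or low availability, compare the median and spread of availability between groups, and detect potential outliers that may require further investigation. Because availability_365 is a single yearly count per listing, the chart cannot show seasonal patterns.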
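The quartiles that each box draws can be read off numerically as well. A minimal sketch, assuming made-up availability values in place of the real `Airbnb` data:

```python
import pandas as pd

# Made-up toy data standing in for the real `Airbnb` DataFrame
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 3 + ["Brooklyn"] * 3,
    "availability_365": [0, 100, 200, 10, 20, 30],
})

# Quartiles per neighbourhood group, plus the interquartile range (height of the box)
summary = (
    toy.groupby("neighbourhood_group")["availability_365"]
    .describe()[["25%", "50%", "75%"]]
    .assign(IQR=lambda t: t["75%"] - t["25%"])
)
print(summary)
```

A larger IQR means availability varies more between listings in that group.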

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the analysis of neighborhood versus availability of rooms in Airbnb bookings using boxplots can have both positive and negative impacts on the business. On the positive side, identifying neighborhoods with consistently high availability can help hosts make informed decisions on pricing and marketing strategies, ultimately increasing their revenue. On the negative side, discovering neighborhoods with low availability may negatively impact the booking experience for guests, leading to a decrease in customer satisfaction and potentially a decrease in bookings. Overall, the insights gained from the analysis can help inform strategic decision-making for both hosts and Airbnb as a business.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
# figsize is passed to DataFrame.plot directly; a separate plt.figure() call would only leave an empty figure
ax = Airbnb[Airbnb.price<500].plot(kind='scatter', x='longitude', y='latitude', label='availability_365', c='price', cmap=plt.get_cmap('jet'), colorbar=True, alpha=0.4, figsize=(13,7))
ax.legend()
plt.show()

##### 1. Why did you pick the specific chart?

I picked a scatter plot to visualize the relationship between the longitude, latitude, and price variables in Airbnb booking analysis because it is an effective way to display the distribution of data points in two dimensions. By plotting the longitude and latitude on the x- and y-axis, respectively, we can display the geographic distribution of the Airbnb listings. By mapping price to the color of the data points, we can observe any patterns or trends in pricing across locations; availability_365 appears only as the legend label and is not encoded in the points. The scatter plot also enables us to easily identify any outliers or clusters that may require further exploration. Overall, a scatter plot is a useful tool for exploring the spatial distribution of Airbnb listings and its relationship with pricing.

##### 2. What is/are the insight(s) found from the chart?

The chart helps identify clusters of high or low pricing, explore relationships between geographic location and price, and uncover potential outliers that may require further investigation.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the analysis of the scatter plot of longitude and latitude colored by price can have both positive and negative impacts on the business. For example, identifying higher-priced and lower-priced locations can help hosts optimize their pricing and marketing strategies, ultimately increasing revenue. However, identifying a cluster of listings with high pricing may negatively impact the booking experience for guests, leading to a decrease in customer satisfaction and potentially a decrease in bookings. Overall, the insights gained from the analysis can help inform strategic decision-making for both hosts and Airbnb as a business.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
plt.style.use('classic')
plt.figure(figsize=(13,7))
plt.title("Neighbourhood Group Price Distribution < 400")
sns.boxplot(y="price",x ='neighbourhood_group' ,data = Airbnb[Airbnb.price<400])
plt.show()

##### 1. Why did you pick the specific chart?

I chose boxplots for the analysis of price and neighborhood group in Airbnb booking analysis because boxplots are an effective way to compare the distribution of a continuous variable (price) across different categories (neighborhood group), allowing for easy identification of differences and potential outliers in pricing patterns.

##### 2. What is/are the insight(s) found from the chart?

Identifying neighborhoods with higher or lower pricing, detecting potential outliers that may require further investigation, and understanding the overall distribution of pricing patterns across different neighborhood groups. These insights can inform pricing and marketing strategies for hosts and help Airbnb make data-driven decisions.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the analysis of the boxplot for price and neighborhood group in Airbnb booking analysis can have both positive and negative impacts on the business. For example, identifying neighborhoods with higher pricing can help hosts optimize their pricing strategies, ultimately increasing revenue. However, if the analysis reveals that certain neighborhoods have significantly lower pricing, this may indicate a lack of demand in that area, potentially leading to decreased bookings. Overall, the insights gained from the analysis can help inform strategic decision-making for both hosts and Airbnb as a business.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
plt.figure(figsize=(12,8))
sns.scatterplot(x=Airbnb.longitude,y=Airbnb.latitude,hue=Airbnb.neighbourhood_group)
plt.show()

##### 1. Why did you pick the specific chart?

I picked a scatter plot of longitude against latitude, colored by neighbourhood_group, because it is an effective way to display the geographic distribution of the Airbnb listings in two dimensions. By plotting the longitude and latitude on the x- and y-axis, respectively, the chart forms a rough map of the listings, and the color shows which neighborhood group each listing belongs to. The scatter plot also enables us to easily identify any outliers or clusters that may require further exploration.

##### 2. What is/are the insight(s) found from the chart?

The chart helps identify where listings cluster within each neighborhood group, compare how densely each group is covered, and uncover any listings whose location looks unusual and may require further investigation.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the analysis of the scatter plot of longitude and latitude by neighborhood group can have both positive and negative impacts on the business. For example, identifying densely and sparsely covered locations can help hosts and Airbnb decide where to focus their listing and marketing strategies, ultimately increasing revenue. However, a very dense cluster of listings also means stronger competition among hosts in that area, which may reduce bookings for individual listings. Overall, the insights gained from the analysis can help inform strategic decision-making for both hosts and Airbnb as a business.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
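# figsize=(12,8) on its own only assigns a variable; create the figure explicitly
plt.figure(figsize=(12,8))
sns.boxenplot(x='price',data=Airbnb)
plt.show()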

##### 1. Why did you pick the specific chart?

I chose a boxenplot for the analysis of price in Airbnb booking analysis because it is a more detailed version of a boxplot, showing the distribution of data in greater detail. The boxenplot displays not only the median and quartiles, but also the shape of the distribution beyond the quartiles, allowing for a more comprehensive understanding of the data. This can be particularly useful in identifying potential outliers and understanding the spread of the data, ultimately leading to more informed decision-making.

##### 2. What is/are the insight(s) found from the chart?

Identifying the range and distribution of prices, detecting any potential outliers that may require further investigation, and understanding the spread and shape of the data. These insights can inform pricing and marketing strategies for hosts and help Airbnb make data-driven decisions.
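A boxenplot is a letter-value plot: besides the median and quartiles it draws further boxes at the eighths, sixteenths, and so on. The idea can be sketched with plain quantiles. The helper `letter_values` and the toy prices below are hypothetical, and the result only approximates what seaborn computes internally.

```python
import pandas as pd

def letter_values(values, depth=3):
    """Lower/upper quantiles at tail fractions 1/2, 1/4, 1/8, ... (one row per level)."""
    s = pd.Series(values, dtype="float64")
    rows = []
    for k in range(1, depth + 1):
        tail = 0.5 ** k
        rows.append({
            "tail_fraction": tail,
            "lower": s.quantile(tail),
            "upper": s.quantile(1 - tail),
        })
    return pd.DataFrame(rows)

# Made-up prices with one extreme value, standing in for Airbnb['price']
toy_prices = [40, 50, 60, 75, 90, 100, 120, 150, 200, 10000]
lv = letter_values(toy_prices)
print(lv)
```

The first row is the median, the second row the quartiles of an ordinary boxplot, and each further row describes a thinner slice of the tails.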

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the analysis of the boxenplot for price in Airbnb booking analysis can have both positive and negative impacts on the business. For example, identifying the range and distribution of prices can help hosts optimize their pricing strategies, ultimately increasing revenue. However, if the analysis reveals that prices are consistently high or low, this may impact the booking experience for guests, leading to a decrease in customer satisfaction and potentially a decrease in bookings. Overall, the insights gained from the analysis can help inform strategic decision-making for both hosts and Airbnb as a business.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
plt.figure(figsize=(12,8))
# Average price of listings with a one-night minimum stay, per room type and neighbourhood group
df = Airbnb[Airbnb['minimum_nights']==1]
df1 = df.groupby(['room_type','neighbourhood_group'])['price'].mean().sort_values(ascending=True)
df1.plot(kind='bar')
plt.title('Average Price for rooms in neighbourhood group')
plt.ylabel('Average Daily Price')
plt.xlabel('Room Type, Neighbourhood Group')
plt.show()
print('List of Average Price per night based on the neighbourhood group')
pd.DataFrame(df1).sort_values(by='room_type')

In [None]:
print('Top 15 most expensive neighbourhoods in Airbnb listings are:')
# Average only the numeric price column; averaging the text column 'neighbourhood' fails in recent pandas
df4 = (Airbnb.dropna(subset=["price"])
       .groupby("neighbourhood")[["price"]].mean()
       .sort_values(by="price", ascending=False)
       .rename(index=str, columns={"price": "Average price per night based on neighbourhood"})
       .head(15))

df4.plot(kind='bar')
plt.show()
pd.DataFrame(df4)

In [None]:
print('The 15 least expensive neighbourhoods in Airbnb listings are:')
# Average only the numeric price column; averaging the text column 'neighbourhood' fails in recent pandas
df4 = (Airbnb.dropna(subset=["price"])
       .groupby("neighbourhood")[["price"]].mean()
       .sort_values(by="price", ascending=False)
       .rename(index=str, columns={"price": "Average price per night based on neighbourhood"})
       .tail(15))

df4.plot(kind='bar')
plt.show()
pd.DataFrame(df4)

In [None]:
print('Neighbourhoods with the most listings:')
# Count listings per neighbourhood (non-null host_name) and name the column explicitly
df5 = (Airbnb.groupby('neighbourhood')['host_name'].count()
       .sort_values(ascending=False)
       .to_frame('Listing Count'))

df5.head(15).plot(kind='barh')
plt.show()
pd.DataFrame(df5.head(15))

In [None]:
print('Neighbourhoods with the fewest listings:')
# Count listings per neighbourhood (non-null host_name) and name the column explicitly
df5 = (Airbnb.groupby('neighbourhood')['host_name'].count()
       .sort_values(ascending=False)
       .to_frame('Listing Count'))

df5.tail(15).plot(kind='barh')
plt.show()
pd.DataFrame(df5.tail(15))

##### 1. Why did you pick the specific chart?

A bar plot is a very useful and easy graph for understanding patterns. It gives a clear idea about the data, which helps in taking business decisions.

##### 2. What is/are the insight(s) found from the chart?

The charts show the average price for rooms in each neighbourhood group, the 15 most expensive and 15 least expensive neighbourhoods, and the neighbourhoods with the most and fewest listings. This type of data needs to be visualized in order to take business decisions.

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained from the analysis of the bar plots in Airbnb booking analysis can help create both a positive and a negative business impact. Hosts can optimize their pricing by comparing their listings with the average price in their neighbourhood and room type. However, neighbourhoods with very high average prices or very few listings can limit the choices available to guests, which may impact the booking experience and lead to a decrease in customer satisfaction and bookings.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
plt.figure(figsize=(15,8))
sns.scatterplot(y=Airbnb['price'],x=Airbnb['availability_365'])

##### 1. Why did you pick the specific chart?

In an Airbnb booking analysis, a scatter plot can be used to investigate the relationship between two numerical variables, here listing price and availability_365. By plotting these variables against each other, you can identify any potential patterns or trends in the data, such as whether listings that are available for more days of the year tend to have higher or lower prices.

Scatter plots can also be useful in identifying any potential outliers in the data, which are individual points that are far away from the general trend in the plot. Outliers can be important to consider in an Airbnb booking analysis, as they may indicate unusual or extreme cases that could be affecting the overall trends in the data.

##### 2. What is/are the insight(s) found from the chart?

Insights gained from a scatter plot in an Airbnb booking analysis can include identifying positive or negative relationships between variables, determining whether these relationships are linear or non-linear, detecting any potential outliers in the data, and identifying clusters or patterns in the dataset. For example, a scatter plot might show that higher review scores are associated with higher listing prices, or that there is a diminishing return on investment for higher listing prices. These insights can be used to inform decisions around pricing, marketing, and customer satisfaction. However, it is important to interpret the results of a scatter plot with caution, as correlation does not necessarily imply causation and other factors may be influencing the relationships between variables.
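A correlation coefficient puts a number on the relationship that the scatter plot shows. The sketch below uses made-up values for `price` and `availability_365` and the default Pearson method of pandas; it measures association only and says nothing about causation.

```python
import pandas as pd

# Made-up toy data standing in for the real `Airbnb` DataFrame
toy = pd.DataFrame({
    "price": [50, 100, 150, 200],
    "availability_365": [10, 40, 20, 30],
})

# Pearson correlation: +1 is a perfect increasing line, -1 a perfect decreasing line, 0 no linear relation
r = toy["price"].corr(toy["availability_365"])
print(round(r, 2))
```

A value close to 0 on the real data would suggest that availability alone tells little about price.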

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from a scatter plot in an Airbnb booking analysis can have both positive and negative business impacts. By identifying positive relationships between variables such as higher review scores being associated with higher listing prices, businesses can optimize pricing strategies, improve customer satisfaction and drive revenue. However, detecting potential outliers, non-linear relationships or clusters of data might also highlight potential negative business impacts such as lower than expected occupancy rates or lower review scores than competitors. Thus, businesses should carefully consider all insights gained from scatter plots and use them to inform decisions and improve their overall performance.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
# Get Kendall correlation between the numeric variables (text columns cannot be correlated)
corr = Airbnb.select_dtypes(include='number').corr(method='kendall')
plt.figure(figsize=(15,8))
sns.heatmap(corr, annot=True)
plt.show()
Airbnb.columns

In [None]:
Airbnb.shape

In [None]:
Airbnb.head(15)

##### 1. Why did you pick the specific chart?

A correlation heatmap is a useful way to visualize the correlations between multiple numerical variables in a dataset. It uses color coding to represent the strength and direction of the correlations between variables, with darker colors indicating stronger positive or negative correlations.

In an Airbnb booking analysis, a correlation heatmap can be used to investigate the relationships between numerical variables such as listing price, minimum nights, number of reviews, and availability. For example, you might use a correlation heatmap to investigate whether there is a positive correlation between listing price and availability, or whether there is a negative correlation between listing price and the number of reviews.

##### 2. What is/are the insight(s) found from the chart?

* Identifying strong positive or negative correlations: A correlation heatmap can help identify which variables are strongly positively or negatively correlated with each other. For example, you might find that listing price is strongly positively correlated with the number of bedrooms, indicating that larger properties tend to command higher prices.

* Identifying weak or non-existent correlations: A correlation heatmap can also help identify variables that are weakly or not at all correlated with each other. For example, you might find that there is little correlation between listing price and the number of reviews a property has received, indicating that reviews may not be a strong predictor of price.

* Identifying potential multicollinearity: A correlation heatmap can help identify variables that are highly correlated with each other, which can be an indication of multicollinearity. Multicollinearity can be a problem in statistical analyses such as regression, as it can lead to unstable estimates and unreliable results.

* Identifying variables that need a closer look: A correlation heatmap does not show individual outliers, but an unexpectedly strong or weak correlation can be a sign that a few extreme values are distorting the relationship. Such variables are worth checking with a box plot or scatter plot, as unusual or extreme cases could be affecting the overall trends in the data.
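The strongly correlated pairs mentioned in the list above can also be pulled out of the correlation matrix programmatically, which is handy when the heatmap has many cells. This is a minimal sketch: the helper `strong_pairs`, the 0.8 threshold, and the toy values are assumptions, and it uses the default Pearson method rather than the Kendall method of the heatmap.

```python
import pandas as pd

def strong_pairs(df, threshold=0.8):
    """List pairs of numeric columns whose absolute correlation is at least `threshold`."""
    corr = df.select_dtypes(include="number").corr()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, so each pair appears once
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], round(float(r), 3)))
    return pairs

# Made-up toy data; the column names are assumed to exist in the real dataset
toy = pd.DataFrame({
    "number_of_reviews": [1, 5, 10, 20, 40],
    "reviews_per_month": [0.1, 0.5, 1.0, 2.0, 4.0],
    "price": [120, 80, 150, 60, 100],
    "room_type": ["Private room", "Shared room", "Entire home/apt",
                  "Shared room", "Private room"],
})

pairs = strong_pairs(toy)
print(pairs)
```

Pairs returned here are candidates for multicollinearity; one variable of each pair could be dropped before a regression.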

#### Chart - 15 - Pair Plot 

In [None]:
# Pair Plot visualization code
sns.pairplot(Airbnb, hue="room_type")

##### 1. Why did you pick the specific chart?

A pair plot, also known as a scatter plot matrix, is a useful way to visualize the relationships between multiple numerical variables in a dataset. Each scatter plot in the matrix displays the relationship between two variables, while the diagonal plots show the distribution of each variable. This allows you to quickly identify any patterns or trends in the data, as well as potential outliers.

In an Airbnb booking analysis, a pair plot can be used to visualize the relationships between variables such as listing price, number of bedrooms, and review scores. For example, you might use a pair plot to investigate whether there is a relationship between listing price and the number of bedrooms, or whether there is a relationship between review scores and the number of reviews.

##### 2. What is/are the insight(s) found from the chart?

A pair plot can help identify patterns and relationships between multiple numerical variables in a dataset. By examining the scatter plots in the matrix, you can identify any potential correlations between variables, such as a positive correlation between listing price and the number of bedrooms. This information can be useful in making informed decisions around pricing and marketing.

In addition, a pair plot can help identify any potential outliers in the data. Outliers can be seen as individual points that are far away from the general trend in a scatter plot. Outliers can be important to consider in an Airbnb booking analysis, as they may indicate unusual or extreme cases that could be affecting the overall trends in the data.

#### Chart - 16 

In [None]:
# Histogram of each numeric column to see the data distribution
for col in Airbnb.describe().columns:
    fig = plt.figure(figsize=(9, 6))
    ax = fig.gca()
    feature = Airbnb[col]
    # sns.distplot is deprecated; histplot with a KDE overlay gives the same kind of view
    sns.histplot(feature, kde=True, stat="density", ax=ax)
    ax.axvline(feature.mean(), color='magenta', linestyle='dashed', linewidth=2)  # mean
    ax.axvline(feature.median(), color='cyan', linestyle='dashed', linewidth=2)   # median
    ax.set_title(col)
plt.show()

# Box plot of each numeric column to see the data distribution
for col in Airbnb.describe().columns:
    fig = plt.figure(figsize=(9, 6))
    ax = fig.gca()
    Airbnb.boxplot(col, ax=ax)
    ax.set_title('Box plot of ' + col)
plt.show()

##### 1. Why did you pick the specific chart?

Histograms are a useful way to visualize the distribution of a numerical variable. By dividing the range of values into bins and counting the number of data points that fall into each bin, a histogram can give insights into the shape, central tendency, and spread of the data. This can be particularly useful in an Airbnb booking analysis when trying to understand the distribution of variables such as listing prices, number of bedrooms, or review scores.

On the other hand, box plots are a great tool for understanding the distribution of a numerical variable as well as its potential outliers. Box plots display the median, quartiles, and outliers of a dataset, allowing you to quickly understand the spread and skewness of the data. In an Airbnb booking analysis, box plots could be useful in visualizing the distribution of listing prices or the number of reviews.

##### 2. What is/are the insight(s) found from the chart?

Histograms can provide insights into the distribution of a numerical variable, such as the distribution of listing prices, review scores, or number of bedrooms. The shape of the histogram can provide information about the central tendency of the data (e.g., whether it is normally distributed or skewed), as well as the spread of the data (e.g., whether it is narrow or wide). Histograms can also highlight potential outliers, which may be important to consider in an Airbnb booking analysis.

Box plots can also provide insights into the distribution of a numerical variable, as well as its central tendency, spread, and potential outliers. The box in the plot represents the interquartile range (IQR), which includes the middle 50% of the data. The line in the box represents the median, and the whiskers extending from the box indicate the range of the data, excluding any outliers. Outliers can be plotted as individual points beyond the whiskers. Box plots can be useful in identifying any extreme values that may be affecting the overall distribution of the data, as well as any potential skewness.
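The whisker limits described above follow the common 1.5 × IQR rule, which is the matplotlib default. A short sketch on made-up prices shows how the limits and the outliers are obtained; the helper name `iqr_bounds` is hypothetical.

```python
import pandas as pd

def iqr_bounds(values, k=1.5):
    """Return (low, high) limits; points outside them are drawn as outliers."""
    s = pd.Series(values, dtype="float64")
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1  # height of the box
    return q1 - k * iqr, q3 + k * iqr

# Made-up prices with one extreme value, standing in for Airbnb['price']
toy_prices = [40, 50, 60, 75, 90, 100, 120, 150, 200, 10000]
low, high = iqr_bounds(toy_prices)
outliers = [p for p in toy_prices if p < low or p > high]
print(low, high, outliers)
```

The whiskers themselves end at the most extreme data points that still lie inside these limits.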

#### Chart - 17 - Box Plot 

In [None]:
# Box Plot for price attribute with respective room type
Airbnb.boxplot(column='price',by='room_type')

##### 1. Why did you pick the specific chart?

I picked the Box Plot for the price attribute with respect to room_type chart because it is an effective way to visually represent the distribution of prices for each room type.

The Box Plot displays the median price (represented by the horizontal line in the box), as well as the upper and lower quartiles (the top and bottom of the box), and the range of prices (the vertical lines or "whiskers" extending from the box). This allows us to quickly compare the distribution of prices for each room type and identify any outliers or extreme values.

Additionally, the Box Plot enables us to easily identify any differences in the distribution of prices between the room types. For example, if one room type has a wider range of prices or more outliers, this can indicate that there are differences in the quality or amenities of the listings within that room type.

Overall, the Box Plot is a useful tool for exploring the distribution of prices for each room type and identifying any patterns or differences that may exist.

##### 2. What is/are the insight(s) found from the chart?

In Airbnb booking data analysis, the Box Plot for the price attribute with respect to room type shows that entire home/apt is the most expensive room type, followed by private rooms and then shared rooms. Also, the entire home/apt and private room types have outliers at 10000.
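The ranking of room types can be checked with the median price per room type, which is the line inside each box and is not pulled up by outliers the way the mean is. A minimal sketch on made-up prices:

```python
import pandas as pd

# Made-up toy data standing in for the real `Airbnb` DataFrame
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 3 + ["Private room"] * 3 + ["Shared room"] * 2,
    "price": [150, 200, 10000, 60, 80, 100, 30, 50],
})

by_room = toy.groupby("room_type")["price"]
medians = by_room.median().sort_values(ascending=False)
means = by_room.mean()

print(medians)
print(means)
```

In the toy data the single 10000 listing barely moves the median of its group but dominates the mean, which is why the box plot is read through the median.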

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ? 
Based on the analysis of Airbnb booking data, there are several recommendations that could help the client achieve their business objectives:

* **Improve the quality of listings:** Since we found a positive correlation between the number of reviews and the booking price, the client should focus on improving the quality of listings on their platform. They could provide more resources and guidance to help hosts create more attractive, higher-quality listings.
* **Focus on popular neighborhoods:** Since Manhattan and Queens were found to be the most popular neighborhoods, the client could focus on acquiring more listings in these areas and improving the quality of existing ones. This could help increase bookings and revenue.
* **Optimize pricing strategy:** Based on the booking patterns and preferences of users, the client could adjust their pricing strategy to reflect the demand for different room types and neighborhoods. They could use dynamic pricing to adjust prices during peak times and maximize revenue.
* **Improve user experience:** To give guests a better experience, the client could invest in technology and tools that improve the booking process and the communication between hosts and guests. They could also provide more personalized recommendations based on guests' preferences and booking history.
* **Expand to new markets:** The client could expand their platform to new markets or neighborhoods that are currently underserved by Airbnb. This could help them attract new hosts and guests and increase their revenue and market share.

Overall, by focusing on these recommendations, the client can achieve their business objectives of increasing bookings, revenue, and user satisfaction.

# **Conclusion**

In conclusion, the analysis of Airbnb booking data provides valuable insights into the booking patterns and preferences of users. Through the analysis of various variables such as the type of room, neighborhood, and pricing, we can identify trends and patterns that can help Airbnb optimize their business strategy and provide a better experience for their users.

We found that entire apartments were the most popular type of room booked by guests, followed by private rooms and shared rooms. Additionally, we found that Manhattan and Queens were the most popular neighborhoods, with the highest number of listings per host. We also discovered a positive correlation between the number of reviews and the booking price, which may suggest that guests are willing to pay more for higher-quality listings.
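As a rough illustration of how a correlation like the one between number of reviews and price can be checked, the sketch below computes a Pearson correlation coefficient from scratch. The helper `pearson_r` and the two sample lists are hypothetical and are not values from the dataset; they only show the calculation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical values, for illustration only.
number_of_reviews = [0, 5, 12, 30, 45, 80]
price = [90, 95, 110, 120, 150, 160]

r = pearson_r(number_of_reviews, price)
print(f"Pearson r = {r:.2f}")
```

A value of r close to +1 indicates a strong positive linear relationship. It does not by itself show that more reviews cause higher prices.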

Overall, this analysis provides valuable insights into the Airbnb platform and can help the company make data-driven decisions to optimize their business strategy and improve the user experience. By understanding the booking preferences of users, Airbnb can prioritize their efforts and resources to maximize revenue and provide a better experience for their hosts and guests.
