<a href="https://colab.research.google.com/github/Ritesh-saini74/EDA-project/blob/main/Airbnb_Bookings_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Airbnb Bookings Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**



* Airbnb is an online marketplace that connects people who want to rent out their property with people who are looking for accommodations, typically for short stays.


* Airbnb offers hosts a relatively easy way to earn some income from their property.




* Today, Airbnb is a one-of-a-kind service that is used and recognized around the world. Analyzing the millions of listings on Airbnb is crucial for the company. These listings generate a lot of data that can be analyzed and used for security, business decisions, understanding customers’ and hosts’ behavior and performance on the platform, guiding marketing initiatives, implementing innovative additional services and much more.

* For this project we analyze Airbnb’s New York City (NYC) data from 2019. NYC is not only one of the most famous cities in the world but also a top global destination for visitors drawn to its museums, entertainment, restaurants and commerce.


* This dataset has 48895 observations and 16 columns, with a mix of categorical and numeric values. Let’s explore and analyze the data to discover key insights.





# What is EDA?

* Exploratory data analysis (EDA) is used by data scientists to analyze and investigate data sets and summarize their main characteristics.



* EDA is used to draw insights from the data. Data scientists and analysts try to find patterns, relationships and anomalies in the data using statistical graphs and other visualization techniques. The following are part of EDA:

1. Get maximum insights from a data set
2. Uncover underlying structure
3. Extract important variables from the dataset
4. Detect outliers and anomalies (if any)
5. Test underlying assumptions
6. Determine the optimal factor settings

* The main purpose of EDA is to detect errors and outliers and to understand the patterns in the data. It allows analysts to understand the data better before making any assumptions. The outcomes of EDA help businesses know their customers, expand their business and make decisions accordingly.
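As a rough illustration of the outlier-detection step listed above, the sketch below flags values that fall outside the 1.5 × IQR fences. The `prices` series is invented for the example and is not taken from the Airbnb dataset, and the 1.5 multiplier is a common convention rather than something this notebook prescribes.

```python
import pandas as pd

# Toy data, invented for illustration only (not from the Airbnb dataset)
prices = pd.Series([80, 95, 100, 110, 120, 130, 150, 2000])

# Interquartile range and the conventional 1.5 * IQR fences
q1, q3 = prices.quantile(0.25), prices.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the fences are flagged as potential outliers
outliers = prices[(prices < lower) | (prices > upper)]
print(outliers.tolist())
```

The same rule could be applied to a column such as `price` once the dataset is loaded. Whether a flagged value is an error or a genuine luxury listing still needs a manual look.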






# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


1. **Business trend** - Collecting and organizing listing information is an important means of providing personalized recommendations to tenants. The availability, location, type, etc. of a listing is an important basis for ensuring that the tenant can find the right listing. Using listing information to capture business changes and trends is the foundation that helps Airbnb deliver quality service.

2. **Customized service** - Customer satisfaction is an important reference for customized services. Through a detailed analysis of customer satisfaction, Airbnb can identify the characteristics of high-quality homes and promote them across all listings to further increase customer satisfaction.

3. **Pricing model** - Airbnb's customers fall into two broad categories (hosts and guests), and for both parties prices are an important indicator of satisfaction. A well-established pricing model can help Airbnb meet the price needs of its customers.

#### **Define Your Business Objective?**

The company aims to create a safe and trusted community where hosts and guests can connect and have positive experiences.

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

### Dataset Loading

In [None]:
# Load Dataset

from google.colab import drive
drive.mount('/content/drive')

In [None]:
dataset = pd.read_csv('/content/Airbnb NYC 2019.csv')

### Dataset First View

In [None]:
# Dataset First Look

dataset.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
dataset.shape

*The dataset has 48895 rows and 16 columns.*

### Dataset Information

In [None]:
# Dataset Info

dataset.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count

dupli_count = dataset.duplicated().sum()
print(f'Number of duplicate rows: {dupli_count}')

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

dataset.isnull().sum()

In [None]:
# Visualizing the missing values

plt.figure(figsize=(5,3))
sns.heatmap(dataset.isnull())
plt.title('Missing value heatmap')
plt.show()

### What did you know about your dataset?

As the counts and the heatmap above show, the `last_review` and `reviews_per_month` columns each have 10052 missing values, the `name` column has 16 and the `host_name` column has 21.
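Raw counts are easier to judge when expressed as a share of all rows. The sketch below shows one way to do that, and also checks whether two columns are missing on exactly the same rows. The `toy` frame is a made-up stand-in for the real dataset, which would be substituted in practice.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) that mimics the columns with gaps
toy = pd.DataFrame({
    "name": ["Cozy loft", None, "Sunny room", "Studio"],
    "last_review": ["2019-05-01", None, None, "2019-06-12"],
    "reviews_per_month": [0.5, np.nan, np.nan, 1.2],
})

# Share of missing values per column, as a percentage
missing_pct = toy.isnull().mean().mul(100).round(1)
print(missing_pct.sort_values(ascending=False))

# Are the two review columns missing on exactly the same rows?
same_rows = (toy["last_review"].isnull() == toy["reviews_per_month"].isnull()).all()
print(same_rows)
```

If the two review columns turn out to be missing together in the real data, that would be consistent with listings that have never been reviewed.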

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
dataset.columns

In [None]:
# Dataset Describe

dataset.describe()

### Variables Description

ID: Unique ID of the listing

Name: Name of the Airbnb listing

Host ID: Unique ID of the host

Host Name: Name of the host

Neighbourhood Group: Borough in which the listing is located

Neighbourhood: Area within the neighbourhood group

Latitude: Latitude of the listing

Longitude: Longitude of the listing

Room Type: Type of room listed

Price: Price of the listing

Minimum Nights: Minimum number of nights that must be booked

Number of Reviews: Total number of reviews

Last Review: Date of the most recent review

Reviews per Month: Number of reviews per month

Calculated Host Listings Count: Total number of listings of the host

Availability 365: Number of days in the year the listing is available

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

dataset.nunique()

## ***3. Data Wrangling***

### Data Wrangling Code

In [None]:
# Create a copy of the current dataset

df = dataset.copy()

In [None]:
# Checking the number of rows with zero price

df[df.price==0].shape

*We will drop the entries where the price equals zero.*

In [None]:
# Removing rows having zero price values (.copy() avoids SettingWithCopyWarning in later assignments)

df = df[df['price'] != 0].copy()

*Both `name` and `host_name` have very few missing values, so we fill them with 'unknown' and 'anonymous' respectively.*

In [None]:
# Fill missing values in name and host_name with placeholder strings

df['name'] = df['name'].fillna('unknown')
df['host_name'] = df['host_name'].fillna('anonymous')

In [None]:
# Filling null values with zero

df['reviews_per_month'] = df['reviews_per_month'].fillna(0)

In [None]:
# drop the last review column

df.drop(['last_review'],axis=1,inplace=True)

In [None]:
# new shape

df.shape

### What all manipulations have you done and insights you found?

The original dataset contained 48895 rows and 16 columns.

The cleaned dataset has 48884 rows and 15 columns: we removed the 11 rows where the price is zero and dropped the `last_review` column.
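The wrangling steps above could also be collected in one function so they can be re-run on a fresh copy of the data. This is only a sketch: `clean_listings` and the `sample` frame are hypothetical names and data introduced for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def clean_listings(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the wrangling steps described above and return a new frame."""
    cleaned = raw[raw["price"] != 0].copy()          # drop zero-price rows
    cleaned["name"] = cleaned["name"].fillna("unknown")
    cleaned["host_name"] = cleaned["host_name"].fillna("anonymous")
    cleaned["reviews_per_month"] = cleaned["reviews_per_month"].fillna(0)
    return cleaned.drop(columns=["last_review"])     # drop the review date

# Tiny made-up sample, only to exercise the function
sample = pd.DataFrame({
    "name": ["A", None, "C"],
    "host_name": ["Ann", "Bob", None],
    "price": [100, 0, 80],
    "last_review": ["2019-01-01", None, None],
    "reviews_per_month": [1.0, np.nan, np.nan],
})
cleaned_sample = clean_listings(sample)
print(cleaned_sample.shape)
print(cleaned_sample.isnull().sum().sum())
```

Because the function returns a new frame, the raw data stays untouched and the cleaning can be repeated after any change to the rules.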

In [None]:
# Checking null value again

df.isnull().sum()

In [None]:
# Correlation between numeric variables (Kendall's tau); numeric_only needs pandas >= 1.5

corr = df.corr(method="kendall", numeric_only=True)
fig = plt.figure(figsize=(8, 6))
sns.heatmap(corr, annot=True)
plt.show()


What can we learn about different hosts and areas?

In [None]:
host_areas =dataset.groupby(['host_name','neighbourhood_group'])['calculated_host_listings_count'].max().reset_index()
host_areas.sort_values(by='calculated_host_listings_count',ascending=False).head(5)

We find that the host Sonder (NYC) has the highest number of listings in Manhattan, followed by Blueground.
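Host names are not guaranteed to be unique, so grouping on `host_name` alone can merge different hosts who share a name. A hedged alternative is to count rows per `host_id` and area, as sketched below on made-up rows. The `listings` frame and its values are invented for the example.

```python
import pandas as pd

# Made-up rows: two different hosts share the name "Alex"
listings = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 3, 3],
    "host_name": ["Alex", "Alex", "Alex", "Sam", "Sam", "Sam"],
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn",
                            "Queens", "Queens", "Manhattan"],
})

# Count listings per host_id and area; host_name is kept only as a label
per_host = (
    listings.groupby(["host_id", "host_name", "neighbourhood_group"])
    .size()
    .reset_index(name="listings")
    .sort_values("listings", ascending=False)
)
print(per_host.head())
```

In this toy data the two hosts named "Alex" stay separate because the grouping key includes `host_id`.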

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1



In [None]:
##assign the numerical column to variable

numerical_columns=list(dataset.select_dtypes(['int64','float64']).columns)
numerical_features=pd.Index(numerical_columns)
numerical_features

In [None]:
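# Plotting distplots to analyze the distribution of all numerical features
# Note: sns.distplot is deprecated in recent seaborn releases; sns.histplot(..., kde=True) is the suggested replacement
for col in numerical_features:
    plt.figure(figsize=(7, 5))
    sns.distplot(x=dataset[col])
    plt.xlabel(col)
    plt.show()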

##### 1. Why did you pick the specific chart?

A distplot, or distribution plot, depicts the variation in the data distribution. The Seaborn distplot represents the overall distribution of a continuous variable, and Seaborn is used together with Matplotlib to draw it.

##### 2. What is/are the insight(s) found from the chart?

The plots show whether each numerical feature is approximately normally distributed or skewed.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

A histogram cannot tell us about individual data points. It only shows how the values of a variable are distributed.
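Whether a distribution is roughly symmetric or skewed can also be checked numerically rather than only by eye. The sketch below uses pandas' sample skewness on two invented columns; the data and column names are assumptions made for the example.

```python
import pandas as pd

# Invented values: one symmetric column, one with a long right tail
toy = pd.DataFrame({
    "symmetric": [1, 2, 3, 4, 5, 6, 7],
    "right_skewed": [1, 1, 2, 2, 3, 4, 50],
})

# Sample skewness: near 0 for symmetric data, positive for a right tail
skewness = toy.skew()
print(skewness)
```

Applied to the numerical columns of the dataset, a strongly positive value would point to a long right tail, which is the pattern a few very expensive listings would produce.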

#### Chart - 2

In [None]:
# Group the dataset by neighbourhood group and calculate the average price

neighborhood_avg_price = df.groupby('neighbourhood_group')['price'].mean()

# Set the style of the plot
sns.set(style="whitegrid")

# Create a bar plot

plt.figure(figsize=(10, 6))
ax = sns.barplot(x=neighborhood_avg_price.index, y=neighborhood_avg_price)

# Set the title and axis labels
plt.title('Neighbourhood Group vs Average Price')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Average Price')
plt.xticks(rotation=30)
plt.show()

##### 1. Why did you pick the specific chart?

The bar graph helps to compare the different sets of data among different groups easily. It shows the relationship using two axes, in which the categories are on one axis and the discrete values are on the other axis.

##### 2. What is/are the insight(s) found from the chart?

Analyzing the average prices of different neighborhood groups in the Airbnb 2019 NYC data set using a bar plot can provide valuable insights.
By visually comparing the heights of the bars, we can identify which neighborhoods have higher or lower average prices. This information can
be crucial for both hosts and guests. Hosts can gain insights into which neighborhoods tend to command higher prices and potentially adjust
their pricing strategies accordingly. Guests, on the other hand, can use this information to make informed decisions about where to book
accommodations based on their budget and preferences. Additionally, the ordered arrangement of the bars can reveal trends, highlighting
which neighborhood groups consistently have higher or lower average prices. Overall, this analysis can help stakeholders in the Airbnb market
understand the variations in prices across different neighborhood groups, enabling them to make data-driven decisions and optimize their
experiences in the NYC market.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from analyzing the average prices of different neighborhood groups in the Airbnb 2019 NYC data set can have a positive
business impact by helping hosts strategically set prices and maximize profits, and enabling guests to find accommodations within their
desired budget.

However, there is a potential for negative growth if certain neighborhoods consistently have lower average prices, indicating a
lack of demand or less desirable features.

#### Chart - 3

In [None]:
# Group the data by price and calculate the average availability

availability_by_group = df.groupby('price')['availability_365'].mean()

# Create a line plot

plt.figure(figsize=(8, 6))
sns.lineplot(x=availability_by_group.index, y=availability_by_group.values)

# Set the title and axis labels
plt.title('Average Availability of Listings by Price')
plt.xlabel('Price')
plt.ylabel('Average Availability (in days)')

# Display the plot
plt.show()

##### 1. Why did you pick the specific chart?

A line plot connects data points in order along the x-axis, which shows how one numeric variable changes as another increases. It presents the data in a simple way and is easy to interpret.


##### 2. What is/are the insight(s) found from the chart?

The line plot shows how average availability changes with price in the Airbnb 2019 NYC dataset.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the line plot can have a positive business impact by optimizing pricing strategies and helping guests make informed
choices.
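Grouping on every distinct price can give a jagged line, because many individual prices are backed by only a handful of listings. One possible refinement is to bin prices into bands before averaging. The sketch below uses `pd.cut` on invented data, and the bin edges are arbitrary assumptions that would need tuning on the real dataset.

```python
import pandas as pd

# Made-up listings, for illustration only
toy = pd.DataFrame({
    "price": [40, 60, 90, 120, 180, 250, 400, 900],
    "availability_365": [10, 30, 100, 120, 200, 220, 300, 340],
})

# Arbitrary example bin edges (an assumption, tune to the real data)
bins = [0, 100, 200, 500, 1000]
toy["price_band"] = pd.cut(toy["price"], bins=bins)

# Mean availability per price band
availability_by_band = (
    toy.groupby("price_band", observed=True)["availability_365"].mean()
)
print(availability_by_band)
```

Each band now averages over several listings, so the resulting curve is less sensitive to single unusual listings.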

#### Chart - 4

In [None]:

# Group the data by neighbourhood group and room type, and calculate the mean price
# (stored under a new name so the cleaned DataFrame `df` is not overwritten)

avg_price = df.groupby(['neighbourhood_group', 'room_type'], as_index=False)[['price']].mean()

# Create a figure
plt.figure(figsize=(12, 6))

# Create a bar plot with x as neighbourhood group, y as price, hue as room type
ax = sns.barplot(x="neighbourhood_group", y="price", data=avg_price, hue='room_type')

# Set the x and y labels, and the title of the plot
plt.xlabel('Neighbourhood Group')
plt.ylabel('Average Price')
plt.title('Average Price by Room Type')
# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

Bar graphs are used to compare values between different groups.

Advantages:

* Show each data category in a frequency distribution.
* Display relative numbers or proportions of multiple categories.
* Summarize a large data set in visual form.
* Clarify trends better than tables do.
* Estimate key values at a glance.
* Permit a visual check of the accuracy and reasonableness of calculations.

##### 2. What is/are the insight(s) found from the chart?

The plot shows how the average price differs between room types within each neighbourhood group, allowing guests to identify
accommodations that align with their budget. It also makes unusually high or low group averages easy to spot, providing an understanding of
pricing patterns across the market.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights can have a positive business impact by optimizing pricing strategies and meeting guest preferences. However, there is a potential for negative
growth if certain room types are consistently overpriced or if intense price competition arises.

#### Chart - 5

In [None]:
# Count of room types in a small hand-entered sample of listings

neighbourhood_group = ['Brooklyn', 'Manhattan', 'Queens', 'Manhattan', 'Brooklyn', 'Staten Island', 'Queens', 'Bronx', 'Queens', 'Bronx']
room_type = ['Entire home/apt', 'Entire home/apt', 'Private room', 'Private room', 'Private room', 'Entire home/apt', 'Entire home/apt', 'Private room', 'Shared room', 'Entire home/apt']

# Count how often each room type occurs: for every entry, increment its
# count in room_dict, starting from 0 if the room type has not been seen yet.
room_dict = {}

for i in room_type:
    room_dict[i] = room_dict.get(i, 0) + 1

# Plot a bar graph: room types (dictionary keys) on the x-axis and
# their counts (dictionary values) on the y-axis.
plt.bar(list(room_dict.keys()), list(room_dict.values()), color='green', edgecolor='blue')


plt.title('Room Types')
plt.xlabel('Room Type')
plt.ylabel('Count')
plt.show()


##### 1. Why did you pick the specific chart?

Barplot shows the relationship between a numeric and a categoric variable. Each entity of the categoric variable is represented as a bar. The size of the bar represents its numeric value.

##### 2. What is/are the insight(s) found from the chart?

We found that Entire home/apt is the most common room type overall, and that prices for Entire home/apt are high in Brooklyn and Manhattan.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Entire home/apt is the most common room type, which suggests that guests prefer it or value privacy, since the shared room category is the least common. Shared rooms therefore need attention.
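The room-type counts can also be taken directly from a column instead of a hand-typed list. The sketch below uses `value_counts` on a hypothetical stand-in for the `room_type` column; with the real data, the dataset's own column would be used in its place.

```python
import pandas as pd

# Hypothetical stand-in for the `room_type` column of the full dataset
room_type = pd.Series(
    ["Entire home/apt", "Private room", "Entire home/apt",
     "Shared room", "Private room", "Entire home/apt"],
    name="room_type",
)

# Absolute counts and percentage shares, sorted from most to least common
counts = room_type.value_counts()
shares = room_type.value_counts(normalize=True).mul(100).round(1)
print(pd.DataFrame({"count": counts, "share_%": shares}))
```

Reporting shares alongside counts makes it easier to compare categories of very different sizes.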

#### Chart - 6

In [None]:
# Number of reviews in term of area

area_reviews = dataset.groupby(['neighbourhood_group'])['number_of_reviews'].max().reset_index()
area = area_reviews['neighbourhood_group']
review = area_reviews['number_of_reviews']
fig = plt.figure(figsize =(10,5))

plt.bar(area, review, color ="blue", width =0.5)
plt.xlabel('Area')
plt.ylabel('Review')
plt.title("Number of Reviews in terms of area")
plt.show()


##### 1. Why did you pick the specific chart?

A barplot (or barchart) is one of the most common types of graphic. It shows the relationship between a numeric and a categoric variable

##### 2. What is/are the insight(s) found from the chart?

Ranked by the highest number of reviews on a single listing, the neighbourhood groups are ordered Queens, Manhattan, Brooklyn, Staten Island and Bronx.
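The chart above uses `.max()`, so each bar reflects the single most-reviewed listing in an area rather than the area as a whole. A sketch of how several statistics could be compared side by side is given below. The `toy` values are invented and chosen so that the ranking by maximum differs from the ranking by total.

```python
import pandas as pd

# Invented values: Queens has one heavily reviewed listing,
# Manhattan has more reviews in total
toy = pd.DataFrame({
    "neighbourhood_group": ["Queens", "Queens", "Manhattan",
                            "Manhattan", "Manhattan"],
    "number_of_reviews": [600, 5, 300, 250, 200],
})

# Maximum, total and median number of reviews per area
review_stats = toy.groupby("neighbourhood_group")["number_of_reviews"].agg(
    ["max", "sum", "median"]
)
print(review_stats)
```

Which statistic is appropriate depends on the question: the maximum highlights standout listings, while the total or median says more about the area overall.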

#### Chart - 7

In [None]:
# Number of review Vs price

price_area = dataset.groupby(['price'])['number_of_reviews'].max().reset_index()
price_list = price_area['price']
review = price_area['number_of_reviews']
fig =plt.figure(figsize =(10,5))

plt.scatter(price_list, review)
plt.xlabel('Price')
plt.ylabel('Number of reviews')
plt.title('Number of Reviews VS Price')
plt.show()


##### 1. Why did you pick the specific chart?

A scatter plot (aka scatter chart, scatter graph) uses dots to represent values for two different numeric variables. The position of each dot on the horizontal and vertical axis indicates values for an individual data point. Scatter plots are used to observe relationships between variables.

##### 2. What is/are the insight(s) found from the chart?

From the above visualization we can say that lower-priced listings tend to have more reviews, which suggests that most guests prefer to stay at cheaper listings.
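A visual impression from a scatter plot can be backed up with a rank correlation. The sketch below computes Spearman's rho as the Pearson correlation of the ranks, which is its definition. The `toy` numbers are invented and perfectly monotonic, so they only demonstrate the mechanics and say nothing about the real data.

```python
import pandas as pd

# Invented, perfectly decreasing relationship (for demonstration only)
toy = pd.DataFrame({
    "price": [50, 80, 120, 200, 500],
    "number_of_reviews": [90, 70, 40, 20, 5],
})

# Spearman's rho = Pearson correlation computed on the ranks
rho = toy["price"].rank().corr(toy["number_of_reviews"].rank())
print(round(rho, 3))
```

A value near -1 would indicate that reviews fall as price rises, a value near 0 would indicate no monotonic relationship.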

#### Chart - 8

In [None]:
# Which hosts are the busiest, and why?

busy_hosts = dataset.groupby(['host_id','host_name','room_type'])['number_of_reviews'].max().reset_index()
busy_hosts = busy_hosts.sort_values(by = 'number_of_reviews', ascending =False).head(10)

name_hosts = busy_hosts['host_name']
review_got = busy_hosts['number_of_reviews']

fig = plt.figure(figsize =(10,5))

plt.bar(name_hosts,review_got, color ='purple', width =0.5)
plt.xlabel('Name of the Host')
plt.ylabel('Review')
plt.title("Busiest Host in terms of reviews")
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot or bar chart is a graph that represents categories of data with rectangular bars whose lengths or heights are proportional to the values they represent. Bar plots can be drawn horizontally or vertically. A bar chart shows comparisons between discrete categories. One axis of the plot represents the categories being compared, while the other axis represents the measured values corresponding to those categories.


##### 2. What is/are the insight(s) found from the chart?

We found the busiest hosts to be:

Dona, Ji, Maya, Carol, Danielle

These hosts list their properties as Entire home or Private room, which most guests prefer, and their listings also have the highest numbers of reviews.

#### Chart - 9

In [None]:
# Which Hosts are charging higher price?

Highest_price= dataset.groupby(['host_id','host_name','room_type','neighbourhood_group'])['price'].max().reset_index()
Highest_price= Highest_price.sort_values(by = 'price', ascending =False).head(10)

name_of_host = Highest_price ['host_name']
price_charge = Highest_price['price']

fig = plt.figure(figsize =(10,5))

plt.bar(name_of_host,price_charge , color ='orange', width =0.5)
plt.xlabel('Name of the Host')
plt.ylabel('Price')
plt.title("Hosts with maximum price charges")
plt.show()


##### 1. Why did you pick the specific chart?

A bar plot or bar chart is a graph that represents categories of data with rectangular bars whose lengths or heights are proportional to the values they represent. Bar plots can be drawn horizontally or vertically. A bar chart shows comparisons between discrete categories. One axis of the plot represents the categories being compared, while the other axis represents the measured values corresponding to those categories.


##### 2. What is/are the insight(s) found from the chart?

The 10 hosts charging the highest prices are:


Jelena, Kathrine, Erin, Matt, Olson, Amy, Rum, Jessica, Sally, Jack

#### Chart - 10

In [None]:
# What is the room count in overall NYC according to the listing of room types?

plt.rcParams['figure.figsize'] = (8, 5)
ax= sns.countplot(y='room_type',hue='neighbourhood_group',data=dataset,palette='bright')

total = len(dataset['room_type'])
for p in ax.patches:
        percentage = '{:.1f}%'.format(100 * p.get_width()/total)
        x = p.get_x() + p.get_width() + 0.02
        y = p.get_y() + p.get_height()/2
        ax.annotate(percentage, (x, y))

plt.title('Count of each room type in NYC')
plt.xlabel('Room Counts')
plt.xticks(rotation=90)
plt.ylabel('Room Type')

plt.show()

##### 1. Why did you pick the specific chart?

The countplot is used to represent the occurrence(counts) of the observation present in the categorical variable. It uses the concept of a bar chart for the visual depiction.


##### 2. What is/are the insight(s) found from the chart?

Manhattan has the most Entire home/apt listings, around 27% of all listed properties, followed by Brooklyn with around 19.6%.
Private rooms are most common in Brooklyn, at 20.7% of all listed properties, followed by Manhattan with 16.3% and Queens with 6.9%.
We can infer that Brooklyn, Queens and the Bronx have more private rooms, while Manhattan, which has the highest number of listings in NYC, has more Entire home/apt listings.
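The percentages read off the chart can also be produced as a table. The sketch below uses `pd.crosstab` with `normalize="all"`, which divides every cell by the total number of rows. The `toy` frame is invented for the example, so its numbers do not correspond to the figures quoted above.

```python
import pandas as pd

# Invented rows, for illustration only
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn",
                            "Brooklyn", "Queens"],
    "room_type": ["Entire home/apt", "Private room", "Private room",
                  "Entire home/apt", "Private room"],
})

# normalize="all" divides every cell by the total number of listings
share = pd.crosstab(
    toy["room_type"], toy["neighbourhood_group"], normalize="all"
).mul(100)
print(share.round(1))
```

Using `normalize="index"` or `normalize="columns"` instead would give shares within each room type or within each neighbourhood group.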

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?

---


The idea of starting their own Airbnb attracts many people looking to make money from an existing asset, such as their house or inner-city apartment. Perhaps it has something to do with the freedom of being your own boss, or maybe it’s about the chance to meet interesting people from different parts of the world.

If you have available space, running an Airbnb is a great way to make extra money on the side, and if things go well, it can even become your primary source of income. But having a successful Airbnb isn’t simply listing your rental property and waiting for the bookings to roll in. There’s actually a lot more to it.

There are plenty of effective strategies that you can employ to become a successful host with just a single property. Here are our top tips on running an Airbnb business successfully.

##### High-quality photos make a difference

Photos that communicate your property’s unique features can make your listing appealing to potential guests. You should take beautiful photos that capture the character of your place, the surrounding area, and any amenities and show your property from all angles.

Make sure that your rental space is clean before photographing, and try to use natural lighting. You may want to hire a professional photographer because professional photos that capture the atmosphere of the listing can help it stand out from the rest and get more bookings.

##### Respond to Airbnb guests immediately

When travelers see interesting Airbnb properties, they are likely to contact several hosts simultaneously. The faster you respond to booking inquiries, the higher your chances of getting the guests. If prospective guests don’t hear from you quickly, they will simply go elsewhere.

If you want to be successful, you should be ready to respond to your guests at any given moment. Be polite and helpful when it comes to special requests. Conversations with guests require a substantial time commitment, and you should be prepared to devote some portion of each day to this important task, which is rather challenging.

##### Make your guests feel welcome

Your Airbnb rental will be your guests’ home away from home, so make sure your place is clean and tidy before guests come there. In addition to stocking your listing with necessities, you should consider having books, movies, and entertainment options for your guests.

##### Price your listing reasonably

The main reason travelers prefer Airbnb rentals is that they are more affordable than staying in a hotel. So you should do research to find out what other similar rentals in your neighborhood are charging and set a realistic rate.

You may want to consider using software that will allow you to apply dynamic pricing to your listing. Setting a competitive rate will enhance your chances of attracting guests and having a high occupancy rate.

##### Hire professional cleaners

Cleaning the property yourself is one way to cut costs. But you may quickly find that this task becomes incredibly time-consuming, especially for short-term lets. And if you do full-time rentals, the turnaround time between guests may be fast, so hiring a cleaning service may be the best option. Just make sure to provide your cleaners with a cleaning checklist.

##### Create a comprehensive welcome guide

If you would like your guests to enjoy their stay without asking lots of questions, you’ll need to provide them with a printed copy of a detailed welcome book. It should list everything about your property and what guests need to know about their stay.


**Running an Airbnb allows you to create additional income on your own terms with available space that may otherwise have been sitting empty. It’s up to you to decide whether you keep it as a part-time hustle or work to grow your hosting business to make more money. Just keep in mind that to do it right, you need to invest some time and effort.**


# **Conclusion**

We find that the host Sonder (NYC) has the highest number of listings in Manhattan, followed by Blueground.

We found that Entire home/apt is the most common room type overall, and that prices for Entire home/apt are high in Brooklyn and Manhattan.

From the price and review visualization we can say that lower-priced listings tend to have more reviews, which suggests that most guests prefer to stay at cheaper listings.

We found the busiest hosts to be:

Dona, Ji, Maya, Carol, Danielle

These hosts list their properties as Entire home or Private room, which most guests prefer, and their listings also have the highest numbers of reviews.

The 10 hosts charging the highest prices are:
Jelena, Kathrine, Erin, Matt, Olson, Amy, Rum, Jessica, Sally and Jack

Max Price is 10000 USD

From the visualizations we found that most people are likely to stay in an Entire home or Private room in Manhattan, Brooklyn and Queens, and that visitors prefer to stay in rooms with a lower listing price.

We have also examined the correlations between the different variables.

Manhattan has the most Entire home/apt listings, around 27% of all listed properties, followed by Brooklyn with around 19.6%.

Private rooms are most common in Brooklyn, at 20.7% of all listed properties, followed by Manhattan with 16.3% and Queens with 6.9%. We can infer that Brooklyn, Queens and the Bronx have more private rooms, while Manhattan, which has the highest number of listings in NYC, has more Entire home/apt listings.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***