# **Project Name**    - Airbnb Booking Analysis (NYC)



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 -** Simarjeet Kaur Wade

# **Project Summary -**

Since 2008, guests and hosts have used Airbnb to expand travel possibilities and to offer a more unique, personalized way of experiencing the world. Today, Airbnb is a one-of-a-kind service that is used and recognized worldwide. Analyzing the millions of listings on the platform is crucial for the company. These listings generate a lot of data that can be analyzed and used for security, business decisions, understanding customer and host behavior and performance on the platform, guiding marketing initiatives, implementing innovative additional services, and much more.
This dataset has around 49,000 observations and 16 columns, with a mix of categorical and numeric values.

For this project, I'm using Google Colab, a web-based IDE, with the Python programming language to write the script. An IDE (Integrated Development Environment) is a software application used for software development. The Airbnb data used here is publicly shared on the internet under a Creative Commons license. Before loading the data into the IDE, I first import the external libraries/modules needed for visualization and analysis.



# **GitHub Link -**

Name - Simarjeet Kaur Wade.  
Github Link -

# **Problem Statement**


Explore and analyze the data to discover key understandings (not limited to these) such as :

- What can we learn about different hosts and areas?
- What can we learn from predictions? (e.g., locations, prices, reviews)
- Which hosts are the busiest and why?
- Is there any noticeable difference of traffic among different areas and what could be the reason for it?

#### **Define Your Business Objective?**

The Airbnb EDA project aims to help people who want to rent out their properties on Airbnb. By analyzing data about listings in a specific city, the project wants to find patterns and trends in the market. This information can then be used to give advice to potential hosts on how to make their listings better and earn more money.

The main goal of the project is to provide useful insights about the market, such as which neighborhoods and types of properties are popular, when people are more likely to book, and how different features affect prices and occupancy rates. With this knowledge, the project can suggest practical tips to potential hosts on how to improve their listings and attract more bookings.

The ultimate objective of the project is to help potential hosts succeed on Airbnb by increasing their chances of getting bookings and making more money. This is what Airbnb wants too - they want to connect travelers with interesting and affordable places to stay, while also helping hosts earn extra income by sharing their space.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
air_bnb = pd.read_csv('/content/drive/MyDrive/Projects/Airbnb/Airbnb NYC 2019.csv')
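Since the guidelines above ask for exception handling, the load step could be wrapped so that a missing or malformed file produces a readable message instead of a traceback. The following is only a minimal sketch: `load_listings` is a hypothetical helper name, not part of the original notebook, and the file name in the demo call is made up.

```python
import pandas as pd


def load_listings(csv_path):
    """Read a listings CSV; return None (with a message) if it cannot be loaded.

    Hypothetical helper for illustration, not part of the original notebook.
    """
    try:
        return pd.read_csv(csv_path)
    except FileNotFoundError:
        print(f"File not found: {csv_path}")
    except pd.errors.EmptyDataError:
        print(f"File is empty: {csv_path}")
    except pd.errors.ParserError as exc:
        print(f"Could not parse {csv_path}: {exc}")
    return None


# A path that does not exist yields None rather than an unhandled exception.
missing = load_listings("this_file_does_not_exist.csv")
```

In the notebook, the same helper would be called with the Drive path used above, followed by a check that the result is not `None` before continuing.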

### Dataset First View

In [None]:
# Dataset First Look
air_bnb.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
air_bnb.shape

### Dataset Information

In [None]:
# Dataset Info
air_bnb.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
# duplicated() flags repeated rows; summing the boolean mask gives their count
air_bnb.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
air_bnb.isnull().sum()

In [None]:
# Check which columns contain missing values (True = at least one null)
air_bnb.isna().any()
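To see how much is missing rather than only where, the counts can be turned into a small table of counts and percentages. This is a sketch under assumptions: `missing_value_summary` is a hypothetical helper, and the demo frame is made up for illustration rather than taken from the Airbnb data.

```python
import pandas as pd


def missing_value_summary(df):
    """Return count and percentage of missing values for columns that have any."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(df) * 100).round(2),
    })
    # Keep only affected columns, worst first.
    affected = summary[summary["missing_count"] > 0]
    return affected.sort_values("missing_count", ascending=False)


# Tiny made-up frame for illustration only (not the Airbnb data).
demo = pd.DataFrame({
    "name": ["Loft", None, "Studio", "Flat"],
    "price": [100, 80, 120, 95],
    "last_review": [None, None, "2019-05-01", None],
})
summary = missing_value_summary(demo)
print(summary)
```

Calling `missing_value_summary(air_bnb)` would give the equivalent table for the real data, and plotting its `missing_pct` column as a bar chart is one way to get an actual visualization of the gaps.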

### What did you know about your dataset?

The Airbnb booking analysis dataset comprises 48,895 rows and 16 columns (variables), with a size of a little over 6 MB. There are no duplicate rows in the dataset; however, there are null values in four columns: name, host_name, last_review and reviews_per_month.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
air_bnb.columns

In [None]:
# Dataset Describe
air_bnb.describe()

### Variables Description

The price ranges from $0 to $10,000 with a mean of $152.72. The minimum number of nights varies from 1 to 1,250 with a mean of about 7. On average, a listing has received about 23 reviews.
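A minimum price of 0 and a maximum far above the mean both suggest checking for zero-priced listings and outliers. One common approach, which is my assumption here and not something the notebook itself applies, is the 1.5 × IQR rule. The helper name `iqr_bounds` and the sample prices are hypothetical.

```python
import pandas as pd


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the first and third quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up prices for illustration: one zero and one extreme value.
prices = pd.Series([0, 50, 60, 70, 80, 90, 100, 110, 120, 10000])
lower, upper = iqr_bounds(prices)

zero_priced = int((prices == 0).sum())                    # listings priced at 0
outliers = prices[(prices < lower) | (prices > upper)]   # values outside the fences
print(lower, upper, zero_priced, outliers.tolist())
```

Applied to `air_bnb['price']`, the same two checks would show how many listings are priced at zero and how many fall outside the fences.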

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
columns_to_check = [
    'id', 'host_id', 'host_name', 'neighbourhood_group', 'neighbourhood',
    'latitude', 'longitude', 'room_type', 'price', 'minimum_nights',
    'number_of_reviews', 'last_review', 'reviews_per_month',
    'calculated_host_listings_count', 'availability_365',
]
for column in columns_to_check:
    print(f"Unique values in {column} are : ", air_bnb[column].nunique())

## 3. ***Data Wrangling***

A few columns contain null values. For each, we can decide whether to remove the column or to replace the nulls with a suitable placeholder. Dropping a column should generally be avoided, as it may remove relevant information. A common rule of thumb is to drop a column when more than 25-30% of its values are missing; however, the decision ultimately depends on the information the variable carries and on the business objective.

The percentage of missing values in each affected column is given below -    
1. name - 0.03%
2. host_name - 0.04%
3. last_review - 20.55%
4. reviews_per_month - 20.55%

We can drop the last two columns for ease of work and replace the missing values in the first two.
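The rule of thumb above can be sketched as a small check that lists the columns whose missing share exceeds a chosen threshold. `columns_over_missing_threshold` is a hypothetical helper, the 0.25 default simply mirrors the lower end of the 25-30% range mentioned above, and the demo frame is made up.

```python
import pandas as pd


def columns_over_missing_threshold(df, threshold=0.25):
    """List columns whose fraction of missing values exceeds `threshold`."""
    missing_fraction = df.isna().mean()
    return missing_fraction[missing_fraction > threshold].index.tolist()


# Made-up frame for illustration only.
demo = pd.DataFrame({
    "host_name": ["Ann", None, "Raj", "Mia", "Lee"],        # 20% missing
    "last_review": [None, None, "2019-05-01", None, None],  # 80% missing
    "price": [100, 80, 120, 95, 60],                         # complete
})
default_hits = columns_over_missing_threshold(demo)
strict_hits = columns_over_missing_threshold(demo, threshold=0.10)
print(default_hits, strict_hits)
```

Note that with a 25% threshold, a missing share of about 20.5% would not trigger a drop on its own; dropping those two columns here is a convenience choice, as stated above.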

### Data Wrangling Code

In [None]:
air_bnb.head()

In [None]:
air_bnb.shape

In [None]:
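# Note: plain assignment does not copy; air_bnb_test and air_bnb refer to the same
# DataFrame, so the in-place changes below also modify air_bnb.
air_bnb_test = air_bnb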
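Plain assignment binds a second name to the same DataFrame, whereas `.copy()` creates an independent one. The small demonstration below uses made-up data and hypothetical variable names to show the difference; it is an illustration, not a change to the notebook's approach.

```python
import pandas as pd

original = pd.DataFrame({"price": [100, 80], "last_review": ["2019-05-01", None]})

alias = original                 # second name for the same object
independent = original.copy()    # separate DataFrame with its own data

# Dropping in place through the copy leaves the original untouched...
independent.drop(columns=["last_review"], inplace=True)
columns_after_copy_drop = list(original.columns)   # still includes last_review

# ...while dropping through the alias changes the original as well.
alias.drop(columns=["last_review"], inplace=True)
columns_after_alias_drop = list(original.columns)

print(columns_after_copy_drop, columns_after_alias_drop)
```

If the raw data needs to stay available for later comparison, `air_bnb.copy()` would be the safer choice; with the alias, later cells that use `air_bnb` see the cleaned frame.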

In [None]:
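# Replace missing names with placeholders (assigning the result avoids chained in-place fillna)
air_bnb_test['name'] = air_bnb_test['name'].fillna('No Name')
air_bnb_test['host_name'] = air_bnb_test['host_name'].fillna('NO_NAME')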

In [None]:
air_bnb_test.isna().any()

In [None]:
# Drop Unnecessary/Unwanted columns
air_bnb_test.drop(['last_review', 'reviews_per_month'], axis=1, inplace=True)

In [None]:
air_bnb_test.shape

### What all manipulations have you done and insights you found?

As covered above, we removed two columns that are not needed for this analysis (last_review and reviews_per_month) and replaced the null values in name and host_name. The shape is now (48895, 14).

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 - Bar Chart

In [None]:
air_bnb.columns

Room Type Analysis

In [None]:
room_type_count = air_bnb['room_type'].value_counts()
room_type_count

In [None]:
plt.bar(room_type_count.index, room_type_count.values)
plt.xlabel('Room Type')
plt.ylabel('Count')
plt.title('Room Types Distribution')
plt.show()

Neighbourhood Group Analysis

In [None]:
neighbourhood_group_count = air_bnb['neighbourhood_group'].value_counts()
neighbourhood_group_count

In [None]:
plt.bar(neighbourhood_group_count.index, neighbourhood_group_count.values)
plt.xlabel('Neighbourhood Group')
plt.ylabel('Count')
plt.title('Neighbourhood Group Distribution')
plt.show()

Neighbourhood Analysis

In [None]:
neighbourhood_count = air_bnb['neighbourhood'].value_counts()
neighbourhood_count

In [None]:
import plotly.express as px
fig = px.bar(neighbourhood_count)
fig.show()

Host Name Analysis

In [None]:
host_name_count = air_bnb['host_name'].value_counts()
host_name_count
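Host names are not unique, so counting by `host_name` alone can merge different hosts who share a name. A hedged alternative is to count listings per `host_id` and keep the name for readability. `listings_per_host` is a hypothetical helper and the rows below are made up.

```python
import pandas as pd


def listings_per_host(df, top_n=5):
    """Count listings per (host_id, host_name) pair, largest first."""
    return (
        df.groupby(["host_id", "host_name"])
        .size()
        .sort_values(ascending=False)
        .head(top_n)
        .rename("listing_count")
        .reset_index()
    )


# Made-up rows: two different hosts share the name "Michael".
demo = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 3, 3],
    "host_name": ["Michael", "Michael", "Michael", "Dana", "Dana", "Dana"],
})
by_name = demo["host_name"].value_counts()   # merges host 1 and host 2
by_host = listings_per_host(demo)            # keeps them separate
print(by_name)
print(by_host)
```

In this toy example the name-based count ties "Michael" with "Dana", while the id-based count shows that a single host named Dana has the most listings.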

##### 1. Why did you pick the specific chart?

A bar chart is well suited to univariate analysis of categorical variables: it shows how many listings fall into each category.

##### 2. What is/are the insight(s) found from the chart?

1. There are three room types: shared room, private room and entire home/apartment. Entire home/apartment has the most listings, while shared room has the fewest.
2. Manhattan and Brooklyn have the most listings, while Staten Island has the fewest.
3. Williamsburg leads among neighbourhoods, while Willowbrook is last in the ranking.
4. Michael is the most common host name across listings. Host names are not unique, so this count may combine several different hosts.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Analyzing room types and understanding the requirements of your target audience can indeed have a positive impact on business. By gaining insights into the distribution of room types and identifying the most in-demand types, you can better cater to the preferences and needs of potential customers.

With this information, you can make informed decisions regarding pricing, marketing strategies, and resource allocation. It can help you optimize your offerings, attract the right customers, and maximize revenue. By aligning your business strategy with the identified room types, you can effectively promote your business and improve customer satisfaction.

#### Chart - 2 Scatter Plot

**Price Vs Room Type Analysis**

In [None]:
air_bnb.columns

In [None]:
fig = px.scatter(air_bnb, x="price", y="room_type",color='room_type')
fig.show()
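A scatter plot against a categorical axis shows the spread, but a grouped summary makes the typical price per room type easier to read. The sketch below is an assumption about how one might summarise it; `price_by_room_type` is a hypothetical helper and the data is made up.

```python
import pandas as pd


def price_by_room_type(df):
    """Summarise price per room type, ordered by median price (highest first)."""
    return (
        df.groupby("room_type")["price"]
        .agg(["count", "median", "mean", "max"])
        .sort_values("median", ascending=False)
    )


# Made-up listings for illustration only.
demo = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt", "Entire home/apt",
                  "Private room", "Private room", "Shared room"],
    "price": [150, 200, 4000, 70, 90, 40],
})
stats = price_by_room_type(demo)
print(stats)
```

The median is used for ordering because a few very expensive listings can pull the mean far above what a typical listing costs, as the toy "Entire home/apt" rows show.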

**Review vs Room Type Analysis**

In [None]:
fig = px.scatter(air_bnb, x="number_of_reviews", y="room_type",color='room_type')
fig.show()

**Price Vs Availability**

In [None]:
fig = px.scatter(air_bnb, x="availability_365", y="price",color='price')
fig.show()

##### 1. Why did you pick the specific chart?

A scatter plot shows how two measures relate to each other and makes outliers easy to spot.

##### 2. What is/are the insight(s) found from the chart?

1. Shared room prices lie between 0 and 2K, private room prices between 0 and 3K, and entire home/apartment prices between 0 and 4K (the costliest). Entire home/apartment listings also show many outliers.
2. The review counts show the following ranges:
   (a) Private room listings typically have between 0 and 500 reviews.
   (b) Entire home/apartment listings tend to have between 0 and 350 reviews.
   (c) Shared room listings generally have between 0 and 130 reviews.
3. Plotting price against availability over 365 days for all room types shows that the majority of prices fall within the 0-2K range, regardless of room type.
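The statement that most prices fall within a given range can be backed by a number rather than read off the chart. A minimal sketch, assuming a hypothetical helper `share_at_or_below` and made-up prices:

```python
import pandas as pd


def share_at_or_below(prices, limit):
    """Fraction of values in `prices` that are less than or equal to `limit`."""
    return float((prices <= limit).mean())


# Made-up prices for illustration only.
prices = pd.Series([40, 70, 90, 150, 200, 350, 999, 2500])
share_2k = share_at_or_below(prices, 2000)
share_500 = share_at_or_below(prices, 500)
print(share_2k, share_500)
```

Running `share_at_or_below(air_bnb['price'], 2000)` on the real data would quantify the "majority" claim, and trying lower limits would show whether a much narrower range already covers most listings.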

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

By analyzing data related to prices, room availability, and the number of reviews, businesses can gain valuable insights that can contribute to their growth and significantly impact their business and marketing strategies.

Understanding the pricing trends within the market enables businesses to make informed decisions when setting their own prices. By identifying the range of prices for different room types, businesses can align their pricing strategy with market standards and customer expectations, ensuring they remain competitive.

Examining the availability of rooms provides businesses with insights into their inventory management. By understanding the availability patterns, businesses can optimize their resource allocation, manage bookings efficiently, and potentially increase their revenue by maximizing occupancy rates.

Additionally, the number of reviews plays a crucial role in building trust and credibility among potential customers. A higher number of positive reviews can significantly impact a business's reputation, leading to increased bookings and customer loyalty. By monitoring and analyzing reviews, businesses can identify areas for improvement and take necessary actions to enhance customer satisfaction.

Utilizing data-driven insights from these factors, businesses can make better-informed decisions regarding their pricing strategies, resource allocation, and marketing efforts. By aligning their offerings with market trends and customer preferences, businesses can achieve growth, attract more customers, and effectively promote their services, resulting in long-term business success.

#### Chart - 14 - Correlation Heatmap

In [None]:
air_bnb.columns

In [None]:
plt.figure(figsize=(20,10))
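# numeric_only=True restricts the correlation to numeric columns (text columns would raise an error in recent pandas)
sns.heatmap(data = air_bnb.corr(numeric_only=True), annot=True, cmap = 'coolwarm')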
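Identifier columns such as `id` and `host_id` are labels rather than measurements, so their correlations are hard to interpret. One option, which is my assumption and not the notebook's approach, is to exclude them before computing the matrix. `numeric_correlation` is a hypothetical helper and the demo data is made up.

```python
import pandas as pd


def numeric_correlation(df, exclude=("id", "host_id")):
    """Correlation matrix over numeric columns, skipping identifier columns."""
    numeric = df.select_dtypes(include="number")
    numeric = numeric.drop(columns=[c for c in exclude if c in numeric.columns])
    return numeric.corr()


# Made-up rows for illustration only.
demo = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "host_id": [10, 11, 12, 13],
    "room_type": ["Private room", "Shared room", "Private room", "Entire home/apt"],
    "price": [50, 100, 150, 200],
    "minimum_nights": [8, 6, 4, 2],
})
corr = numeric_correlation(demo)
print(corr)
```

The resulting matrix can be passed to `sns.heatmap` in the same way as above.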

##### 1. Why did you pick the specific chart?

A heatmap represents data values as colours, which makes it easy to scan the correlation matrix for strongly and weakly related pairs of variables.

##### 2. What is/are the insight(s) found from the chart?

Certain visual patterns stand out in the correlation matrix. The red blocks mark the strongest correlations, notably between 'id' and 'host_id', which implies that these two variables are related to each other.

In contrast, the dark blue blocks mark the lowest correlation values, which appear for the number of reviews. This indicates that the number of reviews is not strongly correlated with the other variables in the dataset.

#### Chart - 15 - Pair Plot

In [None]:
sns.pairplot(data = air_bnb)

##### 1. Why did you pick the specific chart?

Pairplot allows us to plot pairwise relationships between variables within a dataset.

##### 2. What is/are the insight(s) found from the chart?

The pair plot shows the pairwise relationships between the numeric variables. Focusing on price against length of stay, listings that can be booked for a single night generally show higher prices than listings tied to longer stays.

This suggests a relationship between price and duration of stay: hosts may charge a premium for short stays, such as a single night, while offering lower rates for longer ones.
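The price versus stay-length pattern can be checked numerically by grouping listings into stay-length buckets and comparing median prices. This is a sketch under assumptions: it treats `minimum_nights` as the stay-length variable, the bucket edges are arbitrary choices of mine, `median_price_by_stay_length` is a hypothetical helper, and the demo rows are made up.

```python
import pandas as pd


def median_price_by_stay_length(df):
    """Median price per minimum-nights bucket (bucket edges are illustrative)."""
    edges = [0, 1, 7, 30, float("inf")]
    labels = ["1 night", "2-7 nights", "8-30 nights", "31+ nights"]
    # Bins are right-inclusive: (0, 1], (1, 7], (7, 30], (30, inf]
    buckets = pd.cut(df["minimum_nights"], bins=edges, labels=labels)
    return df.groupby(buckets, observed=False)["price"].median()


# Made-up listings for illustration only.
demo = pd.DataFrame({
    "minimum_nights": [1, 1, 3, 5, 14, 30, 45, 60],
    "price": [200, 180, 150, 130, 100, 90, 80, 60],
})
medians = median_price_by_stay_length(demo)
print(medians)
```

If the real data shows a clearly decreasing median across buckets, that would support the observation above; a flat or irregular pattern would suggest the effect seen in the pair plot is driven by a few extreme points.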

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

To achieve the business objective, here are some suggestions:

1. Make the Airbnb website and app easy to use and enjoyable for both hosts and guests.

2. Offer a wide range of different places to stay, so there's something for everyone.

3. Take safety seriously by verifying hosts, having reviews from past guests, and secure payment systems.

4. Build a strong community of hosts by providing support and resources.

5. Encourage hosts and guests to be environmentally friendly and promote sustainable practices.

6. Work with governments and follow local rules to avoid any legal issues.

7. Expand into new markets and adapt to their specific needs and preferences.

8. Embrace new technologies to make the Airbnb experience even better.

By focusing on these areas, Airbnb can achieve its goals of growing, keeping customers happy, and being a trusted and successful brand.

# **Conclusion**

Airbnb changed the way people travel by helping them find cool and different places to stay, and letting regular people make money by renting out their homes. It has lots of different types of places to choose from and has been part of the sharing economy. But sometimes it has caused problems with high rent and safety issues. Overall, Airbnb has made a big impact on how we travel and is working to make things better and safer.
