# **Project Name**    - Airbnb Bookings Analysis


##### **Project Type**    - Exploratory Data Analysis
##### **Contribution**    - Individual
##### **Team Member 1** - Saurabh Sharma
##### **Team Member 2** - Harshika Singhal



# **Project Summary -**
<b> Airbnb, as in “Air Bed and Breakfast,” is a service that lets property owners rent out their spaces to travelers looking for a place to stay. Travelers can rent a space for multiple people to share, a shared space with private rooms, or the entire property for themselves. The model also lets hosts customize and personalize their guests’ experience. Airbnb was started in 2008 by Brian Chesky and Joe Gebbia and is based in San Francisco, California. The platform is accessible via website and mobile app.

This project aims to analyze Airbnb booking data to gain insights and make data-driven decisions for improving operations and customer experience. By examining various aspects of the data, such as booking patterns, customer preferences, and revenue trends, this project seeks to provide valuable information to optimize resource allocation, marketing strategies, and overall performance.

# **GitHub Link -**

Name - Saurabh Sharma

link - https://github.com/SaurabhSharma-1994


Name - Harshika Singhal

Link - https://github.com/dashboard

# **Problem Statement**


**Since 2008, guests and hosts have used Airbnb to expand travel possibilities and to offer a more unique, personalized way of experiencing the world. Today, Airbnb has become a one-of-a-kind service that is used and recognized around the world. Analysis of the millions of listings provided through Airbnb is crucial for the company. These listings generate a lot of data that can be analyzed and used for security, business decisions, understanding customers' and providers' (hosts') behavior and performance on the platform, guiding marketing initiatives, implementing innovative additional services, and much more.**




#### **Define Your Business Objective?**

**Explore and analyze the data to discover key understandings (not limited to these) such as :**
* Location of neighbourhoods and number of hotels
* Types of room in each location
* Neighbourhood group and availability of rooms
* Average length of stay in each type of room

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
!pip install klib

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from collections import Counter
import klib

### Dataset Loading

In [None]:
from google.colab import drive      #Here we are connecting our drive to our colab
drive.mount('/content/drive')


In [None]:
file_path = '/content/drive/MyDrive/Air BnB EDA Project/Data Set/Airbnb NYC 2019.csv'     #This is the path of our file in our Google drive


In [None]:
# Load Dataset
airbnb=pd.read_csv(file_path)     #Here we are reading the file using the Pandas library

### Dataset First View

In [None]:
# Dataset First Look
airbnb.head()     #Here we are viewing the first 5 rows of the dataset; the default value for head() is 5.

In [None]:
airbnb.tail()     #Here we are reading the last 5 rows of dataset and default value for our tail() is 5.

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
airbnb.shape      #Shape is used to check the number of rows and columns in a dataset.

In [None]:
airbnb.count() #Counts the non-null values in each column

In [None]:
airbnb.count(axis=1)   #Counts the non-null values in each row

### Dataset Information

In [None]:
# Dataset Info
airbnb.info()     #info() gives a brief summary of the Airbnb dataset: column names, non-null counts, data types and memory usage.

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
len(airbnb[airbnb.duplicated()])    #This code is used to find the number of duplicated rows in the dataframe ‘airbnb’.

In [None]:
# Missing Values/Null Values Count
airbnb.isnull().sum()

#### Missing Values/Null Values

In [None]:
# Visualizing the missing values
sns.heatmap(airbnb.isnull())


In [None]:
# Use fillna() method to replace the NULL values with a specified value.
airbnb.fillna(0, inplace=True)

In [None]:
airbnb.isnull().sum()     #Here we are checking if we still have any null value or not
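Filling every column with `0` also puts a `0` into text and date columns. As an alternative, the sketch below fills each column with a placeholder that suits its type. It is a minimal illustration on a made-up three-row table (the values are invented, not taken from the dataset), and it assumes the columns with missing values include `name`, `last_review` and `reviews_per_month`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the listings table (values invented for illustration)
sample = pd.DataFrame({
    "name": ["Cozy loft", None, "Sunny room"],
    "last_review": ["2019-05-21", None, "2018-11-03"],
    "reviews_per_month": [0.21, np.nan, 1.50],
})

# Fill each column with a placeholder that matches its meaning and dtype
filled = sample.fillna({"name": "Unknown", "reviews_per_month": 0})

# last_review is deliberately left missing so that a later date conversion yields NaT
remaining_nulls = filled.isnull().sum()
print(remaining_nulls)
```

Leaving `last_review` untouched here is a design choice: a missing date stays missing instead of turning into a numeric placeholder.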

In [None]:
airbnb.describe()     #Here we are checking the statistical information about the Airbnb dataset, such as count, mean, standard deviation, minimum and maximum values, quartiles, etc.

### What did you know about your dataset?

OBSERVATIONS:-
* The data contains a total of 48895 rows

* The dataset contains a total of 16 columns
* Min Price is 0 USD and Max Price is 10000 USD
* Mean Price is about 152 USD
* On average, the minimum stay (`minimum_nights`) is about 7 nights.
* There were a total of four columns with null values.



## ***2. Understanding Your Variables***

In [None]:
#Here we are checking all the Columns present in our airbnb dataset
list(airbnb.columns)

In [None]:
#Here we are checking the statistical information about the Airbnb dataset, such as count, mean, standard deviation, minimum and maximum values, quartiles, etc.
#include='all' will show all numeric and non-numeric values in the data set
airbnb.describe(include='all')

### Variables Description

* **id:** The unique id of each listing.

* **name:** The name (title) of the listing.

* **host_id:** The unique id of the host.

* **host_name:** The name of the host.

* **neighbourhood_group:** The borough in which the listing is located.

* **neighbourhood:** The neighbourhood within the neighbourhood group.

* **latitude:** Latitude of the listing.

* **longitude:** Longitude of the listing.

* **room_type:** Type of room (Private room, Shared room, Entire home/apt).

* **price:** Price of the room in USD.

* **minimum_nights:** Minimum number of nights required for a booking.

* **number_of_reviews:** Number of reviews given by customers.

* **last_review:** Date on which the latest review was given.

* **reviews_per_month:** Average number of reviews per month.

* **calculated_host_listings_count:** Total number of listings of the host.

* **availability_365:** Number of days in the year for which the listing is available.
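One way to sanity-check what `calculated_host_listings_count` holds is to recount the listings per `host_id` and compare. The sketch below does this on a small invented table; it assumes the column stores the number of listings that belong to the same host, which is worth verifying on the real data in the same way.

```python
import pandas as pd

# Hypothetical sample: host 10 owns three listings, host 20 owns one
sample = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "host_id": [10, 10, 10, 20],
    "calculated_host_listings_count": [3, 3, 3, 1],
})

# Count listings per host and broadcast the count back to every row
recomputed = sample.groupby("host_id")["id"].transform("count")

# True for every row where the stored value matches the recomputed one
matches = recomputed == sample["calculated_host_listings_count"]
print(matches.all())
```

If the check holds on the full dataset, the column describes hosts rather than guests, which matters for any chart that reads it as a head count.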

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for i in airbnb.columns.tolist():
  print("No. of unique values in",i,"is",airbnb[i].nunique(),".")

## 3. ***Data Wrangling***

### Data Wrangling Code

What all manipulations have you done and insights you found?


# **1. Dropped the host name as it is not very useful in our analysis**

In [None]:
# Write your code to make your dataset analysis ready.
airbnb.drop(["host_name"], axis=1, inplace = True)
airbnb.head()

# **2. Last Review**

In [None]:
#Checking the "last_review" column in our dataset so that we can understand its data type and get insight into the data
airbnb['last_review'].head()


In [None]:
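#Used the to_datetime function to convert 'last_review' from data type object to data type datetime
#errors="coerce" turns values that are not valid dates (such as the 0 placeholders from fillna) into NaT instead of raising an error
airbnb['last_review'] = pd.to_datetime(airbnb['last_review'], format = "%Y-%m-%d", errors = "coerce")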
airbnb['last_review']
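Because missing values were filled with `0` earlier, `last_review` can hold a mix of date strings and integer placeholders. The sketch below shows, on a made-up two-element series, how the `errors="coerce"` option of `pd.to_datetime` treats such values. It is an illustration of the option, not a claim about how the notebook was originally run.

```python
import pandas as pd

# Hypothetical column: one valid date string and the integer 0 left behind by fillna(0)
raw = pd.Series(["2018-10-19", 0], dtype=object)

# errors="coerce" turns anything that does not match the format into NaT
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")
print(parsed)
```

Rows that end up as `NaT` are then skipped automatically by date filters and by `groupby` on the date column.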

In [None]:
# Use dropna() to remove rows having null values
airbnb.dropna().head()

**We see that the minimum price is zero, which looks incorrect, so we assign $100 as the minimum price**

In [None]:
# Creating a custom function that replaces a price of zero with 100
def min_price(x):
  if x == 0:
    return 100
  else:
    return x

In [None]:
# Applying custom function for setting minimum value in price as $100
airbnb["price"] = airbnb["price"].apply(min_price)
airbnb[airbnb["price"]==100]
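The same replacement can also be written without a row-wise function. The sketch below uses `Series.replace` on a small made-up price series; the numbers are illustrative only, and `100` is the replacement value chosen above.

```python
import pandas as pd

# Hypothetical prices, two of which are recorded as zero
prices = pd.Series([0, 149, 225, 0, 89])

# Replace every zero with 100 in one vectorized call
cleaned = prices.replace(0, 100)
print(cleaned.tolist())
```

On a column with tens of thousands of rows, a vectorized call like this is usually faster than `apply` with a Python function and gives the same result.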

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

# **Chart - 1 Location of neighbourhood vs number of hotel**

In [None]:
# Chart - 1 Neighbourhood Location
df_airbnb = airbnb.groupby(["neighbourhood_group"], as_index=False).count()
fig, ax = plt.subplots(figsize=(10, 8))
df_airbnb_sorted = df_airbnb.sort_values("id", ascending=False)
x = sns.barplot(data=df_airbnb_sorted, x="neighbourhood_group", y="id", palette='plasma')
x.set_title("NEIGHBOURHOOD LOCATION")
x.set_xlabel("Neighbourhood")
x.set_ylabel("Number of Hotels")



##### 1. Why did you pick the specific chart?

 The chart allows us to compare the number of hotels across different neighborhood groups. By looking at the lengths of the bars, you can quickly identify which neighborhoods have a higher concentration of hotels and which have fewer.

##### 2. What is/are the insight(s) found from the chart?

1. The chart helps identify the neighborhood groups with the highest number of hotels. The taller the bar, the greater the number of hotels in that particular neighborhood group.

2. We can observe how the number of hotels is distributed across different neighborhood groups. This insight can be valuable for understanding the hotel market in the area and identifying popular or less-explored neighborhoods for potential accommodation options.

3. The chart also reveals imbalances or gaps in the distribution of hotels across neighborhood groups. Some neighborhood groups have significantly fewer hotels than others, which indicates potential opportunities for growth or untapped markets.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained from the provided bar chart can potentially lead to a positive business impact.

**Positive Growth Insights:**

1.  Identifying neighborhoods with a high concentration of hotels can highlight areas with a high demand for accommodation. This information can help businesses make informed decisions about expanding or establishing new hotels in those neighborhoods, potentially leading to increased revenue and market share.

2.  Based on the insights gained from the chart, stakeholders such as hotel owners, investors, or tourism agencies can make informed decisions regarding investment, marketing efforts, or resource allocation in different neighborhood groups.

**Negative Growth Insights:**

1.  If certain neighborhood groups have a significantly higher number of hotels compared to others, it may indicate an oversaturated market. This could lead to intense competition, reduced profitability, and difficulty in attracting customers. Businesses entering such markets may face challenges in achieving sustainable growth and profitability.

# **Chart - 2 Types of Room in each location**

In [None]:
# Chart - 2 Types of Room in each location
x = sns.countplot(data=airbnb, x="neighbourhood_group", hue="room_type")
x.set_title("TYPES OF ROOM IN EACH LOCATION")
x.set_xlabel("Neighbourhood")
x.set_ylabel("Number of Hotels")

plt.show()

##### 1. Why did you pick the specific chart?

This visualization will provide insights into the distribution of room types across different neighborhood groups. It can help identify which types of rooms are more prevalent in each neighborhood group, allowing businesses or analysts to understand the market dynamics and cater their services accordingly.

##### 2. What is/are the insight(s) found from the chart?

By comparing the heights of the bars within each neighborhood group, we can identify the most common or popular room types in different neighborhoods. In this graph we can see that in Manhattan entire homes/apartments are more common (suggesting higher demand) than private rooms or shared rooms, whereas in the other neighborhood groups private rooms are more common than entire homes/apartments. This information can be valuable for understanding the preferences and demands of customers in specific areas.
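The comparison read off the count plot can also be put into a table. The sketch below builds a cross-tabulation of neighbourhood group against room type on a handful of invented rows. It only illustrates the technique; the counts are not those of the real dataset.

```python
import pandas as pd

# Hypothetical listings (invented for illustration)
sample = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Queens"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Private room", "Private room", "Shared room"],
})

# Rows: neighbourhood groups, columns: room types, cells: number of listings
counts = pd.crosstab(sample["neighbourhood_group"], sample["room_type"])

# Most common room type in each neighbourhood group
most_common = counts.idxmax(axis=1)
print(counts)
print(most_common)
```

Applied to the full dataset, `counts` gives the exact numbers behind each bar, which makes statements such as "more common in Manhattan" easy to verify.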

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights from the count plot can potentially help create a positive business impact. However, it's important to note that specific business outcomes can vary depending on the context and the actions taken based on the insights.

Positive Business Impact:

1.  Targeted Marketing: Understanding the prevalent room types in each neighborhood group can help businesses tailor their marketing strategies to target specific customer preferences. By aligning offerings and promotions with popular room types, businesses can attract more customers and increase bookings.

2.  Market Differentiation: Identifying variations in room type distribution across neighborhood groups can help businesses differentiate themselves from competitors. By offering unique room types or focusing on underrepresented options, businesses can stand out in the market and attract customers seeking something different.

3.  Market Expansion: Discovering neighborhoods with a lower count of a specific room type presents growth opportunities. By expanding offerings in those areas, businesses can tap into underserved markets and potentially capture new customers, leading to positive growth.

Negative Growth Insights:

1.  Oversaturated Markets: If certain room types are heavily concentrated in specific neighborhood groups, it may indicate oversaturation. Businesses entering such neighborhoods with similar offerings may face intense competition, lower occupancy rates, and reduced profitability. This insight could lead to negative growth if not addressed appropriately.

2.  Limited Demand: If a particular room type has a consistently low count across all neighborhood groups, it suggests limited demand. Businesses focusing exclusively on such room types may struggle to attract customers and experience negative growth. Careful evaluation of the market potential and customer preferences is crucial before investing in these room types.

3.  Inefficient Resource Allocation: If the count plot reveals significant variations in room type distribution across neighborhoods, it can highlight inefficiencies in resource allocation. Businesses heavily investing in room types with low demand in specific neighborhoods might experience negative growth due to misalignment between supply and demand.

# Chart - 3 Types of room and average prices per person

In [None]:
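# Chart - 3 Types of room and mean prices per person
# Note: calculated_host_listings_count is the number of listings per host; it is used here as a rough stand-in for the number of people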
df_mpp = airbnb.groupby(["room_type"], as_index = False).agg({"price": "mean", "calculated_host_listings_count": "mean"})
df_mpp["avg. pay per person"] = df_mpp["price"]/df_mpp["calculated_host_listings_count"]
x= sns.barplot(data = df_mpp, x = "room_type", y = "avg. pay per person")

# Naming the Chart
x.set_title("AVERAGE PRICE OF ROOM PER PERSON")

# Naming Y & X axis
x.set_ylabel("Price")
x.set_xlabel("Type of Rooms")

plt.show()

##### 1. Why did you pick the specific chart?

 This chart is used to compare different categories of data. Here, you are comparing the average price of different types of rooms. The x-axis represents the different types of rooms and the y-axis represents the average price per person. The height of each bar represents the average price per person for that particular type of room.

##### 2. What is/are the insight(s) found from the chart?

 The average price per person is highest for hotel rooms and lowest for shared rooms. This means that hotel rooms are the most expensive and shared rooms are the cheapest option for travelers. Another possible insight is that the average price per person is similar for private rooms and entire homes/apartments. This means that there is not much difference in cost between renting a private room or an entire home/apartment.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the chart can help create a positive business impact. For example, if you are an Airbnb host, you can use this information to set the price of your listing. You can also use this information to determine which type of room is most popular among travelers and adjust your listing accordingly. However, there are no insights that lead to negative growth. The insights gained from the chart are purely descriptive and do not provide any causal relationship between the variables. Therefore, it is up to the business owner to interpret the data and make decisions based on their own goals and objectives.

# Chart - 4 Neighbourhood group and Availability of rooms

In [None]:
# Chart - 4 Neighbourhood group and Availability of rooms

plt.figure(figsize=(14,6))
ax = sns.boxplot(data=airbnb, x='neighbourhood_group',y='availability_365',palette='rocket')

# Naming the Chart
ax.set_title('Relation between Neighbourhood group & Availability of rooms')

# Naming X & Y axis
ax.set_ylabel('Availability 365 Days')
ax.set_xlabel('Neighbourhood Group')

#Adjusting the size of the x-axis tick labels
ax.tick_params(axis='x', labelsize=15)

##### 1. Why did you pick the specific chart?

The boxplot is used to show the relationship between the Neighbourhood group and the availability of rooms. It allows you to see the distribution of availability within each neighbourhood group and identify any differences or patterns between them.

##### 2. What is/are the insight(s) found from the chart?

Staten Island has the highest availability of rooms over 365 days, followed by the Bronx.
Brooklyn and Manhattan have the lowest availability of rooms.
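A numeric summary can back up what the boxplot shows. The sketch below computes the median and mean of `availability_365` per neighbourhood group on invented values and sorts the groups by median, which corresponds to the centre line of each box. The numbers are placeholders, not results from the dataset.

```python
import pandas as pd

# Hypothetical availability values (invented for illustration)
sample = pd.DataFrame({
    "neighbourhood_group": ["Staten Island", "Staten Island", "Bronx",
                            "Bronx", "Manhattan", "Manhattan"],
    "availability_365": [300, 200, 180, 120, 40, 0],
})

# Median and mean availability per group, highest median first
summary = (
    sample.groupby("neighbourhood_group")["availability_365"]
    .agg(["median", "mean"])
    .sort_values("median", ascending=False)
)
print(summary)
```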

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights regarding the availability of rooms in different Neighbourhood groups can guide strategic planning for the business. Understanding which neighborhoods have higher availability (such as Staten Island and the Bronx) and which have lower availability (such as Brooklyn and Manhattan) can inform decisions related to property acquisition, expansion, and investment.

# Chart - 5 Average time stay in each type of room

In [None]:
# Chart - 5 Average time stay in each type of room

#Here we are estimating the average stay in Airbnb by averaging the minimum nights for each room type
airbnb_average_stay = airbnb.groupby(["room_type"], as_index = False).agg({"minimum_nights": "mean"}).sort_values(by = "minimum_nights")

#Creating a Barplot using Seaborn library
x = sns.barplot(data = airbnb_average_stay, x = "room_type", y = "minimum_nights")

# Naming the Chart
x.set_title("AVERAGE TIME STAY IN EACH TYPE OF ROOM")

# Naming the  X and Y axis
x.set_ylabel("Night Stayed")
x.set_xlabel("Type of Rooms")

##### 1. Why did you pick the specific chart?

The bar plot displays the average time stayed in each type of room. The height of each bar represents the average minimum nights stayed for each room type.

##### 2. What is/are the insight(s) found from the chart?

The chart provides information about the average minimum nights stayed for each room type. Using minimum nights as a proxy for length of stay, it indicates that, on average, guests in private rooms stay for approximately 5 nights, those in shared rooms for around 6 nights, and guests in entire homes/apartments for approximately 8 nights.
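Averages of `minimum_nights` can be pulled upward by a few listings with very long minimum stays, so it may help to look at the median next to the mean. The sketch below does this on invented values; the single 365-night listing is a made-up outlier included to show the effect.

```python
import pandas as pd

# Hypothetical minimum-night values, including one extreme outlier
sample = pd.DataFrame({
    "room_type": ["Private room", "Private room", "Private room",
                  "Entire home/apt", "Entire home/apt", "Entire home/apt"],
    "minimum_nights": [1, 2, 3, 2, 3, 365],
})

# Mean and median minimum nights per room type
stay_summary = sample.groupby("room_type")["minimum_nights"].agg(["mean", "median"])
print(stay_summary)
```

When the mean is far above the median, as for the second room type here, the bar height in a mean-based chart says more about the outliers than about a typical listing.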

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive business impact**

* By understanding the average time stayed in different room types, hosts and property owners can optimize their revenue. They can adjust their pricing strategies, minimum night requirements, and availability to maximize occupancy and revenue. For example, if guests tend to stay longer in entire homes/apartments, hosts can prioritize these listings and potentially command higher rates for longer stays.


**Negative growth impact**

* If hosts solely focus on accommodating guests with longer stays, it may lead to a reduced number of bookings or limited capacity utilization. This could potentially result in negative growth.

* If guests generally prefer shorter stays in certain room types, hosts may face increased competition in those segments. This could lead to a more saturated market, potentially affecting pricing dynamics and overall profitability.

# **Chart - 6 Hotel Used Every Year**

In [None]:
#Chart - 6 Hotel Used Every Year

#Grouping and Counting
df_air_bnb_huey = airbnb.groupby(["last_review"], as_index = False).count()

#Date Range Selection
start_date = '2010-01-01'
end_date = '2019-12-31'
mask = (df_air_bnb_huey['last_review'] >= start_date) & (df_air_bnb_huey['last_review'] <= end_date)

#Data Filtering
df_air_bnb_huey = df_air_bnb_huey.loc[mask]

#Line Plot Creation
x = sns.lineplot(data = df_air_bnb_huey, x="last_review", y="id")

#Chart Title and Axis Labels (x-axis shows the date, y-axis shows the count):
x.set_title("HOTEL BOOKED EVERY YEAR")
x.set_ylabel("Count of hotel booked")
x.set_xlabel("Year")

##### 1. Why did you pick the specific chart?

The line plot was chosen for the scenario because it effectively visualizes the time series data of hotel bookings over time, enabling trend analysis, showcasing the sequential relationship, and providing insights into the distribution of bookings.

##### 2. What is/are the insight(s) found from the chart?


The chart depicting the count of hotel bookings over time reveals the insights such as seasonal patterns, long-term trends, peak periods of high demand, and notable changes in booking patterns, providing valuable information for optimizing pricing, resource planning, and marketing strategies in the hospitality industry.

In the above chart we can see that the count of hotel bookings increased significantly after 2016 and was highest in 2019.
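A yearly trend is easier to read when the dates are aggregated by year rather than by individual day. The sketch below groups a few invented `last_review` dates by calendar year and counts the listings in each. It assumes `last_review` has already been converted to datetime and that missing dates are stored as `NaT`.

```python
import pandas as pd

# Hypothetical listings with their last review date (one is missing)
sample = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "last_review": pd.to_datetime(
        ["2016-03-01", "2018-07-15", "2019-01-02", "2019-06-30", None]
    ),
})

# Drop listings without a review date, then count listings per calendar year
reviewed = sample.dropna(subset=["last_review"])
per_year = reviewed.groupby(reviewed["last_review"].dt.year)["id"].count()
print(per_year)
```

The resulting series has one value per year and can be passed to a bar or line plot directly.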

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Insights Leading Positive Business Impact:**
* Insights such as seasonal patterns, long-term trends, and peak periods of high demand can help businesses optimize pricing, resource planning, and marketing strategies, leading to increased revenue, profitability, and customer satisfaction.

**Insights Leading to Negative Growth:**
* As of now there are no insights that lead to negative growth, but if the long-term trend shows a consistent decline, businesses need to respond with strategies to counteract negative trends and regain growth. Additionally, missed opportunities from ineffective utilization of seasonal patterns or peak periods, as well as inaccurate forecasting, can lead to slower growth or missed revenue targets.

# **Chart - 7 Room Type v/s Number of People**

In [None]:
# Chart - 7 Room Type v/s Number of People

#Figure Size
plt.figure(figsize=(12, 6))

#Boxplot Creation
x = sns.boxenplot(x="room_type", y="calculated_host_listings_count", data=airbnb)

#Chart Title and Axis Labels:
x.set_title("ROOM TYPE V/S NUMBER OF PEOPLE")
x.set_ylabel("Number of occupants")
x.set_xlabel("Type of room")

##### 1. Why did you pick the specific chart?

The boxenplot was chosen because it enables easy comparison of the distribution of occupants across room types, identification of outliers, presentation of summary statistics, and visualization of relationships between room types and the number of occupants.

##### 2. What is/are the insight(s) found from the chart?

The chart provides potential insights such as variations in occupancy levels across different room types: shared rooms have the fewest occupants and entire homes/apartments have the most.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**

* The gained insights can help businesses optimize resource allocation, inform pricing strategies, and guide property management decisions, leading to improved efficiency, revenue maximization, enhanced guest experience, and positive customer reviews.

**Insights Leading to Negative Growth:**

*  Failure to address capacity constraints identified by the insights or misaligned marketing strategies may result in missed revenue opportunities, dissatisfied customers, and hindered growth.

# **Chart - 8 Neighbourhood with most number of Hotels**

In [None]:
#Grouping and Counting
df_nmh_airbnb = airbnb.groupby(["neighbourhood"]).count()
neighborhood_counts = Counter(airbnb['neighbourhood'])

#Top 10 Neighborhoods
top_10 = neighborhood_counts.most_common(10)

#Bar Plot Creation:
x = sns.barplot(x = "neighbourhood", y="count", data = pd.DataFrame(top_10, columns=["neighbourhood", "count"]))

#Axis Manipulation:
plt.xticks(rotation=90)

#Chart Title and Axis Labels:
x.set_title("Count of Neighbourhood")
x.set_ylabel("Frequency")
x.set_xlabel("Neighbourhood")

##### 1. Why did you pick the specific chart?

The bar plot was selected because it effectively presents the count of listings in different neighborhoods, allowing for easy comparison and identification of the most popular neighborhood.

##### 2. What is/are the insight(s) found from the chart?

The chart provides potential insights into the popularity of neighborhoods based on the count of listings, the concentration of listings in specific neighborhoods, market demand for accommodation in different areas, and potential investment opportunities in neighborhoods with lower listing counts.

The above chart shows the 10 most popular neighborhoods, ranked by number of hotels.
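The top-N ranking can also be obtained with pandas alone. The sketch below uses `value_counts`, which returns counts sorted in descending order, on a short invented list of neighbourhoods and keeps the top two. The names and counts are placeholders.

```python
import pandas as pd

# Hypothetical neighbourhood column (invented for illustration)
neighbourhoods = pd.Series(
    ["Williamsburg", "Williamsburg", "Williamsburg", "Harlem", "Harlem", "Astoria"],
    name="neighbourhood",
)

# value_counts sorts by frequency, so head(n) yields the n most common values
top_2 = neighbourhoods.value_counts().head(2)
print(top_2)
```

For the chart above, the same call with `head(10)` would give the ten neighbourhoods with the most listings.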

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**
* By focusing on popular neighborhoods with a higher count of listings, aligning offerings with guest preferences, and exploring untapped markets, businesses can increase revenue, enhance customer satisfaction, and establish a competitive advantage.

**Insights Leading to Negative Growth:**
* Potential risks include oversaturation in popular neighborhoods, leading to increased competition and potential downward pressure on prices, as well as neglecting neighborhoods with a lower count of listings, missing out on untapped market potential and growth opportunities.

# **Chart - 9 Number of night v/s Number of reviews**

In [None]:
# Chart - 9 Number of night v/s Number of reviews
#Figure Size:
fig,ax = plt.subplots(figsize = (12, 6))

#Scatter Plot Creation:
x = sns.scatterplot(data = airbnb, x="minimum_nights", y="number_of_reviews")

#Chart Title and Axis Labels:
x.set_title("NUMBER OF NIGHTS V/S NUMBER OF REVIEWS")
x.set_ylabel("Number of Reviews")
x.set_xlabel("Night Stayed")

##### 1. Why did you pick the specific chart?

The scatter plot was selected because it effectively visualizes the relationship between the number of nights stayed and the number of reviews. It allows for observing patterns, trends, and potential correlations between the two variables at the individual listing level. Additionally, the scatter plot displays the distribution of the variables and provides insights into data density, particularly when there are many data points.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals whether there is a correlation between the number of nights stayed and the number of reviews received. Higher numbers of reviews indicate higher guest engagement, satisfaction, or longer stays.

In the above chart we can see that the number of reviews is higher for listings where guests stay for fewer nights.
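The direction of the relationship seen in the scatter plot can be summarized with a rank correlation. The sketch below computes Spearman's coefficient through `DataFrame.corr(method="spearman")` on invented values chosen to be perfectly decreasing. On real data the coefficient would be expected to lie somewhere between -1 and 0 if the pattern described above holds.

```python
import pandas as pd

# Hypothetical listings: longer minimum stays paired with fewer reviews
sample = pd.DataFrame({
    "minimum_nights": [1, 2, 3, 7, 30],
    "number_of_reviews": [120, 80, 45, 10, 2],
})

# Spearman correlation works on ranks, so it is less sensitive to extreme values
rho = sample.corr(method="spearman").loc["minimum_nights", "number_of_reviews"]
print(rho)
```

A rank-based measure is used here as a design choice because both columns are heavily skewed, and a few extreme listings would dominate a Pearson coefficient.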

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**

* The analysis reveals that there is a correlation between the number of nights stayed and the number of reviews received.
Guests whose stays are shorter give more reviews.

* The scatter plot provides insights into the specific stay durations that generate a higher number of reviews, helping businesses identify the lengths of stay that result in increased review activity.

**Negative Business Growth:**
* Fewer people stay at a hotel for a long period of time, and people who stay for a long period of time give fewer reviews.

# **Chart - 10 Scatter plot according to neighbourhood group**

In [None]:
# Chart - 10 Scatter plot according to neighbourhood group

#Figure Size:
plt.figure(figsize=(14,6))

#Scatter Plot Creation:
sns.scatterplot(x=airbnb['longitude'],y=airbnb['latitude'], hue=airbnb['neighbourhood_group']).set(title='Scatter plot according to neighbourhood group')
plt.show()

##### 1. Why did you pick the specific chart?

The scatter plot was selected because it effectively visualizes the geographical distribution of Airbnb listings based on latitude and longitude coordinates. It allows for the identification of spatial patterns, clusters, and trends in the data. Additionally, the use of a categorical hue to differentiate neighborhood groups adds another dimension to the visualization, providing insights into the distribution of listings across different areas. The scatter plot's clear representation of data points on a map enhances readability and facilitates the interpretation of the geographical distribution of listings.

##### 2. What is/are the insight(s) found from the chart?

These insights help us in understanding the popularity and availability of accommodations in different neighborhood groups, identifying unique characteristics or high-density areas, and gaining knowledge about the geographic features that attract hosts and guests to specific regions.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**

* The gained insights can help businesses achieve a positive impact by Targeted Marketing, Pricing Optimization and Business Expansion.

**Risks and Considerations:**

* While the insights themselves do not inherently lead to negative growth, businesses should be aware of potential risks, such as:

* Oversaturated markets, and neglect of neighborhood groups with lower popularity, which may result in missed growth opportunities and limited market reach.

# **Chart - 11 Room Type Distribution on Scatterplot across New York City**

In [None]:
#Chart - 11 Room Type Distribution on Scatterplot across New York City

#Figure Size:
plt.figure(figsize=(14,6))

#Scatter Plot Creation:
sns.scatterplot(x=airbnb['longitude'],y=airbnb['latitude'], hue=airbnb['room_type']).set(title='ROOM TYPE SCATTER PLOT ON MAP')

#Displaying the Plot
plt.show()

##### 1. Why did you pick the specific chart?

The scatter plot on a map was selected because it effectively visualizes the geographical distribution of Airbnb listings based on latitude and longitude coordinates, with different room types represented by different colors. The key reasons for choosing this chart are geographical representation and variable comparison: it shows the geographical relationships between data points, allowing for a better understanding of the distribution of room types in specific regions or neighborhoods.

##### 2. What is/are the insight(s) found from the chart?

In the graph above, the density of shared rooms is very low, while the density of private rooms and entire homes/apartments is much higher.
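The visual impression of density can be backed up with a simple count per room type. The following is a minimal sketch, assuming a `room_type` column like the one used in the scatter plot; the `sample` DataFrame and its values are hypothetical stand-ins for `airbnb`, not results from the dataset.

```python
import pandas as pd

# Hypothetical miniature stand-in for the `airbnb` DataFrame (illustrative values only).
sample = pd.DataFrame({
    "room_type": ["Entire home/apt", "Private room", "Private room",
                  "Entire home/apt", "Shared room", "Entire home/apt"],
})

# Number of listings per room type, largest first.
room_counts = sample["room_type"].value_counts()

# The same information expressed as a percentage of all listings.
room_share = (sample["room_type"].value_counts(normalize=True) * 100).round(1)

print(room_counts)
print(room_share)
```

Running the same two calls on `airbnb['room_type']` would give the actual counts and shares behind the scatter plot.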

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**
* Analyzing the geographical distribution of Airbnb listings and room types enables businesses to target marketing efforts, optimize pricing strategies, and allocate resources effectively, which can increase revenue and profitability.

**Negative Business Impact:**
* Businesses should be cautious of oversaturation and neglecting growth opportunities in specific areas to avoid potential negative impacts on profitability and market expansion.

# **Chart - 12 Neighbourhood vs Price**

In [None]:
# Chart - 12 Neighbourhood vs Price

#Figure Size:
fig = plt.figure(figsize=(30, 8))

#Line Plot Creation:
sns.lineplot(data=airbnb, x='neighbourhood', y='price', hue='room_type',palette="hls")

#X-Axis Label Rotation:
plt.xticks(rotation=90)

#Chart Title and Axis Labels:
plt.title('Neighbourhood vs Price')
plt.ylabel("Price")
plt.xlabel("Neighbourhood")

##### 1. Why did you pick the specific chart?

The line plot effectively visualizes the relationship between neighborhood, price, and room type in the Airbnb dataset, allowing for comparisons of price variations across different neighborhoods and room types, identification of trends, and differentiation of room types within each neighborhood.

##### 2. What is/are the insight(s) found from the chart?




The line plot above shows that Entire home/apt listings have the highest prices and shared rooms the lowest prices across the neighbourhoods.
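The ordering of room types by price can also be checked numerically with a pivot table. This is a minimal sketch, assuming the `neighbourhood`, `room_type`, and `price` columns used in the line plot; the `sample` DataFrame and its prices are hypothetical illustrations, not figures from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for `airbnb` (illustrative prices only).
sample = pd.DataFrame({
    "neighbourhood": ["Midtown", "Midtown", "Midtown", "Harlem", "Harlem", "Harlem"],
    "room_type": ["Entire home/apt", "Private room", "Shared room",
                  "Entire home/apt", "Private room", "Shared room"],
    "price": [300, 120, 60, 180, 80, 40],
})

# Median price for every neighbourhood / room type combination.
price_table = sample.pivot_table(
    index="neighbourhood", columns="room_type", values="price", aggfunc="median"
)

# Room type with the highest and lowest median price in each neighbourhood.
most_expensive = price_table.idxmax(axis=1)
cheapest = price_table.idxmin(axis=1)

print(price_table)
print(most_expensive)
print(cheapest)
```

The median is used here as one reasonable choice because listing prices tend to contain extreme values; the mean would work with `aggfunc="mean"`.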


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**
* The gained insights regarding the relationship between neighborhood, price, and room type can positively impact businesses by enabling pricing optimization, targeted marketing, and the identification of competitive advantages through differentiated pricing and niche market targeting.

**Negative Business Impact**

* While the gained insights themselves do not inherently lead to negative growth, businesses should be cautious of potential risks, such as pricing disparities and misalignment with market demand, which could impact customer satisfaction, market balance, resource utilization, and revenue generation.


# **Chart - 13 Minimum Stay for a single booking.**

In [None]:
#Grouping and Aggregating Data:
stay_airbnb = airbnb.groupby(['neighbourhood_group'], as_index=False)['minimum_nights'].mean()
stay_airbnb

In [None]:
#Creation of Empty List:
stay_type = []

#Loop over the average minimum nights and classify each stay into 3 categories: 1.shortterm_visit, 2.midterm_visit, 3.longterm_visit
for i in stay_airbnb['minimum_nights']:
  if i <= 5:
    stay_type.append('shortterm_visit')                 # Less than or equal to 5 days is Short Term Visit - For Business/Leisure/Personal
  elif i <= 90:
    stay_type.append('midterm_visit')                   # More than 5 and up to 90 days is Mid Term Visit - For Backpackers
  else:
    stay_type.append('longterm_visit')                  # More than 90 days is Long Term Visit

In [None]:
#DataFrame Column Creation:

stay_airbnb['Visit Type'] = stay_type
stay_airbnb
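The same three-way categorization can be written without an explicit loop by using `pd.cut`. This is a minimal sketch under the thresholds stated above (5 and 90 nights); the `stay` DataFrame and its group names and values are hypothetical stand-ins for `stay_airbnb`.

```python
import pandas as pd

# Hypothetical stand-in for `stay_airbnb` (illustrative averages only).
stay = pd.DataFrame({
    "neighbourhood_group": ["A", "B", "C", "D", "E"],
    "minimum_nights": [4.5, 5.0, 8.0, 90.0, 120.0],
})

# Right-closed bins: (-inf, 5], (5, 90], (90, inf), matching the if/elif/else rules.
stay["Visit Type"] = pd.cut(
    stay["minimum_nights"],
    bins=[-float("inf"), 5, 90, float("inf")],
    labels=["shortterm_visit", "midterm_visit", "longterm_visit"],
)

print(stay)
```

Because `pd.cut` uses right-closed intervals by default, the boundary values 5 and 90 fall into the short-term and mid-term categories respectively, as in the loop.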

In [None]:
#Figure Size:
fig = plt.figure(figsize=(14, 6))

#Styling
sns.set_theme(style="ticks")

#Bar Plot Creation
sns.barplot(data= stay_airbnb, y='minimum_nights', x='Visit Type', hue ='neighbourhood_group', palette="Dark2")

#Chart Title
plt.title('Minimum nights vs Visit Type')

##### 1. Why did you pick the specific chart?

The bar plot was chosen to compare the average minimum-night requirement across the categorized visit types, grouped by neighborhood group, allowing for clear differentiation and easy interpretation of the relationship between these variables.

##### 2. What is/are the insight(s) found from the chart?

* Based on the minimum mandatory stay set by hosts, hosts in Manhattan, Queens and Brooklyn prefer customers booking at least a 'Mid-term visit', whereas hosts in Bronx and Staten Island prefer customers booking at least a 'Short-term visit'.

* Bronx and Staten Island can be preferred for shorter stays over the other neighbourhood groups, making them budget friendly to some extent.

* Manhattan and Brooklyn are posh areas, and their higher mandatory stays for a single booking make trips there more expensive.

* Different marketing initiatives can be rolled out in these neighbourhood groups based on their mandatory stay periods.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**

The gained insights on host preferences and mandatory stay periods have the potential to create a positive business impact. By optimizing pricing strategies based on demand for different stay durations in specific neighborhoods, businesses can maximize revenue and maintain competitiveness. Targeted marketing campaigns tailored to neighborhood-specific preferences can attract the desired audience, leading to increased bookings. By aligning offerings with customer preferences, businesses can enhance customer satisfaction, resulting in positive reviews, repeat bookings, and positive word-of-mouth recommendations, fostering long-term success.



**Negative Business Impact**

While the gained insights themselves do not inherently lead to negative growth, businesses should consider potential risks such as competitive pricing and market segmentation. Careful pricing strategies should be implemented to avoid pricing themselves out of the market, ensuring affordability and maintaining competitiveness. Additionally, businesses should avoid overemphasizing specific neighborhoods or stay durations to prevent neglecting other market segments, ensuring a diverse customer base and avoiding exclusion of potential customers. By considering these factors, businesses can mitigate risks and foster sustainable growth.

#### Chart - 14 - Correlation Heatmap

In [None]:
#Chart - 14 - Correlation Heatmap
klib.corr_plot(airbnb)

##### 1. Why did you pick the specific chart?

A correlation heatmap provides a visual representation of the relationships between variables in a dataset. It helps identify strong correlations, detect multicollinearity, guide variable selection, and inform feature engineering decisions. By examining the heatmap, patterns and relationships can be identified, aiding in data analysis, model building, and variable manipulation to gain insights and improve predictive accuracy.

##### 2. What is/are the insight(s) found from the chart?

* A correlation coefficient with a large absolute value indicates a strong linear relationship between two variables, e.g. number of reviews and reviews per month have a correlation coefficient of 0.59, which indicates a moderate-to-strong positive correlation.
* A correlation coefficient close to zero indicates little or no linear relationship between two variables, e.g. host id and minimum nights have a correlation coefficient of -0.019, which indicates they have almost no linear dependence on each other.
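The coefficients shown in a correlation heatmap can also be read off directly from a correlation matrix. This is a minimal sketch using pandas' `DataFrame.corr` (Pearson by default); the `sample` DataFrame is a hypothetical stand-in for `airbnb`, so the numbers it produces are not the ones quoted above.

```python
import pandas as pd

# Hypothetical stand-in for the numeric columns of `airbnb` (illustrative values only).
sample = pd.DataFrame({
    "number_of_reviews": [0, 5, 12, 40, 90, 150],
    "reviews_per_month": [0.0, 0.3, 0.8, 1.5, 2.9, 4.0],
    "minimum_nights": [30, 1, 3, 2, 1, 2],
})

# Pairwise Pearson correlation matrix of all columns.
corr = sample.corr()

# Coefficient for one specific pair of variables.
pair_corr = corr.loc["number_of_reviews", "reviews_per_month"]

print(corr.round(2))
print(round(pair_corr, 2))
```

Looking up individual pairs this way makes it easier to quote exact values alongside the heatmap.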

#### Chart - 15 - Pair Plot

In [None]:
#A pairplot plots the pairwise relationships in a dataset. Here we can see the distribution of each pair of variables by neighbourhood group.
sns.pairplot(data = airbnb, hue='neighbourhood_group')

#Figure-level title (plt.title would only label a single subplot of the grid)
plt.suptitle('Pair plot with different Location', y=1.02)


# NOTE: This pairplot takes approximately 10 minutes to execute
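One way to shorten the long execution time is to draw the pair plot from a random subsample and a reduced set of columns. This is a minimal sketch, not the approach used in this notebook: the `sample` DataFrame is synthetic, and the column selection and the sample size of 200 are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `airbnb` (random illustrative values only).
rng = np.random.default_rng(0)
n = 1000
sample = pd.DataFrame({
    "neighbourhood_group": rng.choice(["Manhattan", "Brooklyn", "Queens"], size=n),
    "price": rng.integers(30, 500, size=n),
    "minimum_nights": rng.integers(1, 30, size=n),
    "number_of_reviews": rng.integers(0, 200, size=n),
})

# Keep only the columns of interest and a fixed-size random subsample.
columns = ["price", "minimum_nights", "number_of_reviews", "neighbourhood_group"]
subset = sample[columns].sample(n=200, random_state=42)

# The smaller frame can then be passed to the plot, e.g.:
# sns.pairplot(data=subset, hue="neighbourhood_group")
print(subset.shape)
```

A subsample trades some detail for speed, so rare outliers may not appear in the reduced plot.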

##### 1. Why did you pick the specific chart?




The pairplot is a useful visualization tool that allows us to explore the pairwise relationships and distributions of variables in the Airbnb dataset, with the ability to differentiate the plots based on the neighborhood groups. By examining the scatter plots and histograms, we can gain insights into how variables are related and understand the distribution patterns across different locations. The pairplot aids in identifying potential correlations, outliers, and trends, providing a comprehensive view of the data and facilitating further analysis.

##### 2. What is/are the insight(s) found from the chart?

* From the pair plot above, we can conclude that most customers visit Manhattan, followed by Brooklyn.
* Prices, minimum night stays, and average listing prices are higher for the Manhattan region.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Based on the analysis that we have done, here are some recommendations to help the client achieve their business objectives:

* Pricing Optimization: Utilize the insights gained from the analysis of neighborhood, room type, and minimum nights to optimize pricing strategies. Adjust prices based on demand and customer preferences in specific neighborhoods, offering competitive rates for different stay durations. This can maximize revenue and attract the desired audience.

* Targeted Marketing: Tailor marketing efforts to specific neighborhoods and stay durations. Develop targeted campaigns that highlight the unique features and advantages of accommodations in each neighborhood. Promote shorter stays in Bronx and Staten Island as budget-friendly options, while emphasizing luxury and longer stays in Manhattan and Brooklyn. This can increase customer engagement and bookings.

* Customer Satisfaction: Align offerings with customer preferences to enhance satisfaction. Consider the average minimum nights stayed, room types, and neighborhood preferences to provide accommodation options that meet customer requirements. Continuously monitor customer feedback and reviews to address any issues and ensure a positive guest experience.

* Market Expansion: Explore growth opportunities in neighborhoods with high demand and limited supply. Identify areas where shorter stays or specific room types are in demand and consider expanding property acquisition or management in those locations. This can help tap into underserved markets and drive business growth.

* Competitive Analysis: Continuously monitor competitors in the market, especially in posh areas like Manhattan and Brooklyn. Stay updated on their pricing strategies, room types offered, and customer reviews. Identify opportunities to differentiate offerings and provide unique value propositions to maintain a competitive edge.


By implementing these recommendations, the client can enhance pricing strategies, target marketing efforts, improve customer satisfaction, expand into new markets, stay competitive, and make data-driven decisions. These actions can help achieve the business objectives of increasing revenue, attracting customers, and driving positive business growth in the Airbnb market.

# **Conclusion**

* Manhattan and Brooklyn are the posh areas in NY, as they have the maximum footfall and their prices and number of listings are on the higher side.

* Manhattan and Brooklyn have the highest number of hosts.

* Manhattan has the highest combined number of Private rooms and Entire home/apt listings, followed by Brooklyn.

* The most expensive accommodations, priced at 10,000 USD, are available in Manhattan, Brooklyn and Queens.

* The most popular hosts, based on number of reviews and calculated host listing counts, include Sonder, Blueground and Kara, to name a few.

* Staten Island listings appear to be available for booking for more of the year than those in other neighbourhood groups.

* Sonder, Blueground and Sally are some of the top hosts based on their turnover.

* Financial District, Midtown and Chelsea are some of the top neighbourhoods based on their turnover.

* Shared rooms have the most availability compared with other room types, while Entire home/apt listings, which make up the highest proportion of room types, are mostly on the expensive end.

* Fort Wadsworth and Woodrow, both in Staten Island, are expensive neighbourhoods based on median listed price.

* Most hosts set a mandatory minimum stay of 5 nights for a single booking, but the average is higher in Manhattan, Brooklyn and Queens.

* Bronx and Staten Island are mostly preferred for shorter visits, while the other neighbourhood groups are preferred for slightly longer stays.
