# **Project Name**    -



##### **Project Type**    - Airbnb Bookings Analysis - Exploratory Data Analysis
##### **Contribution**    - Individual
##### **Team Member 1 -** Aditya Saxena

# **Project Summary -**


Airbnb has become a popular way for people to rent out their homes or find a place to stay. In this project, we look closely at the data from Airbnb to learn about where listings are located, how much they cost, what guests like, and what makes hosts successful.

This project aims to analyze Airbnb data to gain insights into customer preferences, top hosts and preferred accommodations. By using data analysis tools and techniques, we explore various aspects such as distribution of listings, pricing patterns, customer reviews, and factors affecting host revenue. The findings from this analysis can provide valuable information for hosts to optimize their listings, for customers to make informed decisions, and for policymakers to understand the impact on local housing markets.

First, we compare the average prices of the different accommodation types on the Airbnb platform: private rooms, shared rooms, and entire homes/apartments. Pricing plays a crucial role in booking decisions and in revenue generation for hosts, so we look for patterns and trends in pricing across these accommodation types.

Next, we identify the top hosts in terms of the number of listings they manage. Hosts play a pivotal role in shaping the Airbnb experience, as they are responsible for providing accommodations and hospitality to guests.

We then analyze client preferences across accommodation types, including private rooms, shared rooms, and entire homes/apartments. By understanding which accommodation types clients prefer most, businesses in the hospitality industry can tailor their offerings to customer demand and enhance customer satisfaction.

After that, we identify the areas and locations that clients prefer most for accommodation bookings. Knowing where clients prefer to stay lets businesses in the hospitality industry tailor their offerings accordingly.

We also analyze the availability of accommodations across different locations and identify the areas with the highest and lowest availability. By understanding availability patterns, businesses in the hospitality industry can optimize their inventory management, pricing strategies, and marketing efforts to meet customer demand and maximize revenue.

By analyzing data from Airbnb listings, we aim to identify and profile the most prolific hosts, shedding light on their practices, strategies, and impact on the short-term rental market.

Finally, we look at what makes hosts successful. We analyze factors like how well their listing is described, how often their place is available, and how competitive their prices are. By understanding these factors, hosts can make changes to earn more money.

This project gives insights into how Airbnb works for both hosts and guests. Hosts can learn how to improve their listings to attract more guests and earn more money. Guests can learn what to look for when booking a place. Policymakers can also use this information to make rules that benefit everyone involved. This study shows how data can help us understand and improve the world of short-term rentals.



# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**



The project aims to provide actionable insights that can be utilized by Airbnb hosts to enhance their hosting practices and improve guest experiences. By leveraging data analysis techniques, hosts can optimize their listings, attract more guests, and ultimately increase their earnings on the platform. Additionally, this project will contribute to a better understanding of customer preferences and behaviors in the short-term rental market.
Understanding Airbnb data and its implications can benefit both hosts and guests on the platform. Hosts can optimize their listings to attract more guests and increase revenue, while guests can enjoy improved experiences by selecting accommodations that align with their preferences. Furthermore, policymakers and industry stakeholders can utilize these insights to make informed decisions that support the growth and sustainability of the short-term rental market.



#### **Define Your Business Objective?**


Help hosts figure out how much they should charge for their place on Airbnb, so they can make more money and attract more guests.

Find out who's doing really well on Airbnb by managing lots of listings. We can learn from them and maybe even work together to make Airbnb even better.

Understand what guests like the most when they book on Airbnb—whether it's having a private room, sharing a room, or having the whole place to themselves. This way, hosts can make their listings more appealing to guests.

Figure out where everyone wants to stay when they use Airbnb. This helps hosts and property owners know where to buy or rent properties and how to market them to get more bookings.

Find out where it's easy or hard to book a place on Airbnb in different areas. Hosts can adjust their availability and prices to match what guests are looking for, so they can fill up their places and make more money.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np  # NumPy - numerical arrays and math helpers
import pandas as pd  # pandas - loading and analysing tabular data

### Dataset Loading

In [None]:
# Load Dataset

url = 'https://drive.google.com/uc?id=1iwYbhVcKE4Ze70rz-U37biPOkIFP7Tej'

df=pd.read_csv(url)
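
The guidelines above mention exception handling, so here is a minimal sketch of a guarded loader. The helper name `load_listings` and the temporary sample file are illustrative assumptions, not part of the original notebook; the same function could be pointed at the `url` used above.

```python
import os
import tempfile

import pandas as pd


def load_listings(source):
    """Read a listings CSV from a path or URL; return None if it cannot be read."""
    try:
        return pd.read_csv(source)
    except (OSError, pd.errors.EmptyDataError, pd.errors.ParserError) as err:
        # OSError covers missing files and network failures; the pandas errors cover bad content
        print(f"Could not load dataset from {source!r}: {err}")
        return None


# Toy demonstration: a temporary file stands in for the real dataset (assumed values).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_listings.csv")
    pd.DataFrame({"id": [1, 2], "price": [100, 80]}).to_csv(sample_path, index=False)

    sample_df = load_listings(sample_path)                                  # loads normally
    missing_df = load_listings(os.path.join(tmp_dir, "does_not_exist.csv")) # returns None
```

Returning `None` keeps the notebook running so that later cells can check for a failed load; raising the error again would be an equally reasonable design.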

### Dataset First View

In [None]:
# Dataset First Look

df.head(5) #It will return first 5 rows of data


### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count

df.shape  # (rows, columns) tuple; shape is an attribute, not a function call

### Dataset Information

In [None]:
# Dataset Info

df.info()  # Prints each column's dtype and its count of non-null values

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count

dup_count = df.duplicated().sum() #This function will count duplicated rows in entire dataset

print(dup_count)

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

missing_value = df.isnull().sum().sum()  # Total number of missing values in the entire dataset

print(f"missing_value: {missing_value}")


In [None]:
# Visualizing the missing values
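
The cell above has no code yet. One possible way to visualize missing values is a horizontal bar chart of null counts per column. The sketch below uses a small toy frame with assumed values so that it runs on its own; in the notebook, `toy_df` would be replaced by `df`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Toy frame (assumed values) with a few gaps to plot.
toy_df = pd.DataFrame({
    "name": ["Cozy room", None, "Loft"],
    "price": [80, 120, 200],
    "last_review": ["2019-05-01", None, None],
    "reviews_per_month": [0.5, np.nan, np.nan],
})

missing_per_column = toy_df.isnull().sum()                        # null count for each column
missing_per_column = missing_per_column[missing_per_column > 0]   # keep only columns with gaps

# Horizontal bars keep long column names readable.
ax = missing_per_column.sort_values().plot(kind="barh", color="salmon")
ax.set_xlabel("Missing values")
ax.set_title("Missing values per column")
plt.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a script, a call to `plt.show()` would be needed.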

### What did you know about your dataset?

The Airbnb dataset contains information about various aspects of listings on the platform. This includes details such as room type (entire home/apartment, private room, shared room), location (neighbourhood group, neighbourhood, latitude and longitude), pricing, availability, reviews, and hosts. Each row represents a unique listing, and each column represents an attribute or feature of that listing. Overall, the dataset provides a broad view of the Airbnb ecosystem, allowing us to analyze trends, patterns, and insights to optimize pricing, improve guest experiences, and inform strategic decisions for hosts and property managers.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns

df.columns

In [None]:
# Dataset Describe

df.describe()

### Variables Description


id: Unique identifier for each listing.

name: Name of the listing provided by the host.

host_id: Unique identifier for the host of the listing.

host_name: Name of the host.

neighbourhood_group: Location of the listing categorized into groups (e.g., boroughs in a city).

neighbourhood: Specific area or neighbourhood of the listing.

latitude: Latitude coordinate of the listing location.

longitude: Longitude coordinate of the listing location.

room_type: Type of accommodation offered in the listing (e.g., private room, entire home/apartment).

price: Price of the listing per night.

minimum_nights: Minimum number of nights required for booking the listing.

number_of_reviews: Total number of reviews received for the listing.

last_review: Date of the most recent review received for the listing.

reviews_per_month: Average number of reviews received per month for the listing.

calculated_host_listings_count: Total number of listings managed by the host.

availability_365: Number of days the listing is available for booking within a year.



### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

for col in df.columns:  # Iterate over every column of the DataFrame

    uniq_val = df[col].unique()  # Array of the distinct values in this column

    print("Unique values for column ", col, ":")  # Print the column name
    print(uniq_val)
    print("Total unique values:", len(uniq_val))  # Print the number of distinct values
    print()  # Blank line to separate the output of consecutive columns
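
When only the number of distinct values per column is needed, `DataFrame.nunique()` gives the same totals as the loop above in one call. The sketch below runs on a toy frame with assumed values; in the notebook it would be applied to `df`.

```python
import pandas as pd

# Toy frame (assumed values) standing in for the listings data.
toy_df = pd.DataFrame({
    "room_type": ["Private room", "Entire home/apt", "Private room", "Shared room"],
    "neighbourhood_group": ["Brooklyn", "Manhattan", "Brooklyn", "Queens"],
    "price": [60, 150, 75, 40],
})

# Distinct values per column, largest first.
unique_counts = toy_df.nunique().sort_values(ascending=False)
print(unique_counts)
```

Note that `nunique()` ignores missing values by default, whereas `unique()` includes `NaN` in its result; passing `dropna=False` makes the two counts agree.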


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

#Objective 1. What are the average prices for different types of accommodations (private room, entire home/apt)?
print("Objective 1. What are the average prices for different types of accommodations (private room, entire home/apt)?")
print()
df['Price_per_night'] = df['price']/df['minimum_nights']  # Derived column: listed price divided by the minimum number of nights

pvt_rooms = df[df['room_type']=='Private room']   # Filtering room = private room

avg_private = pvt_rooms['Price_per_night'].mean()  #Getting average price of private room

print(f"Average price of private room in all neighbourhood: {round(avg_private,0)}")

Shared_room = df[df['room_type']=='Shared room']  # Filtering room = Shared room

avg_shared = Shared_room['Price_per_night'].mean() #Getting average price of shared room

print(f"Average price of Shared room in all neighbourhood: {round(avg_shared,0)}")

entire_home = df[df['room_type']=='Entire home/apt']  # Filtering room = Entire home/apt

avg_entire = entire_home['Price_per_night'].mean()    #Getting average price of entire home

print(f"Average price of Entire home/apt in all neighbourhood: {round(avg_entire,0)}")
print()

#Objective 2. Who are the top hosts in terms of the number of listings they manage?
print("Objective 2. Who are the top hosts in terms of the number of listings they manage?")
print()

#Top host for private rooms
p_r = df[df['room_type']=='Private room']

host_Pvt_room = p_r['host_name'].value_counts()  # Number of private-room listings per host name

max_host = host_Pvt_room.idxmax()   # Host name with the most private-room listings

print(f"Top host in private rooms: {max_host}")


#Top host for shared rooms
p_r_a = df[df['room_type']=='Shared room']

host_shared_room = p_r_a['host_name'].value_counts()  # Number of shared-room listings per host name

max_host_shared = host_shared_room.idxmax()      # Host name with the most shared-room listings

print(f"Top host in shared rooms: {max_host_shared}")

#Top host for Entire home/apt
p_r_b = df[df['room_type']=='Entire home/apt']

host_entire_room = p_r_b['host_name'].value_counts()   # Number of entire-home listings per host name

max_host_entire = host_entire_room.idxmax()         # Host name with the most entire-home listings

print(f"Top host in Entire home: {max_host_entire}")

print()

#Objective 3. What are the most preferred types of accommodations (private rooms, shared rooms, entire homes/apartments)?
print("Objective 3. What are the most preferred types of accommodations (private rooms, shared rooms, entire homes/apartments)?")
print()
pvt_filter=df[df['room_type']=='Private room']  #Filtering data of private rooms

add_checks_p =pvt_filter['reviews_per_month'].sum()    # Sum of reviews_per_month, used as a proxy for how often private rooms are booked


shared_filter=df[df['room_type']=='Shared room']  #Filtering data of shared rooms

add_check_s=shared_filter['reviews_per_month'].sum()  # Same proxy for shared rooms


Entire_filter=df[df['room_type']=='Entire home/apt']  #Filtering data of entire homes

add_check_e=Entire_filter['reviews_per_month'].sum()   # Same proxy for entire homes/apartments



#Getting the most preferred accommodation from the three totals
if add_checks_p>=add_check_s and add_checks_p>=add_check_e:
  print("The most preferred types of accommodations : Private Room")
elif add_check_s >= add_checks_p and add_check_s >= add_check_e:
  print("The most preferred types of accommodations : Shared Room")
else:
  print("The most preferred types of accommodations : Entire home/apt")
print()

#Objective 4.What are the most preferred area and location by clients
print("Objective 4.What are the most preferred area and location by clients")
print()

#Getting preferred location
pre_location_c = df['neighbourhood_group'].value_counts()  # Number of listings in each neighbourhood group

pre_location_max=pre_location_c.idxmax()  # Neighbourhood group with the most listings, returned as a plain string

print(f"Most preferred location by customers: {pre_location_max}")

#Getting preferred area

pre_area_c = df['neighbourhood'].value_counts()  # Number of listings in each neighbourhood

pre_area_max=pre_area_c.idxmax()  # Neighbourhood with the most listings

print(f"Most preferred area by customers: {pre_area_max}")
print()


#Objective 5.Highest and lowest availability as per location
print("Objective 5.Highest and lowest availability as per location")
print()

brooklyn_a = df[df['neighbourhood_group']=='Brooklyn']  #Filtering data for Brooklyn location
brooklyn_add=brooklyn_a['availability_365'].sum()   #Getting sum of Brooklyn location

bronx_a = df[df['neighbourhood_group']=='Bronx']  #Filtering data for Bronx location
bronx_add=bronx_a['availability_365'].sum()   #Getting sum of Bronx location

manhattan_a = df[df['neighbourhood_group']=='Manhattan']  #Filtering data for Manhattan location
manhattan_add=manhattan_a['availability_365'].sum()   #Getting sum of Manhattan location

queens_a = df[df['neighbourhood_group']=='Queens']   #Filtering data for Queens location
queens_add=queens_a['availability_365'].sum()     #Getting sum of Queens location

staten_a = df[df['neighbourhood_group']=='Staten Island']  #Filtering data for Staten Island location
staten_add=staten_a['availability_365'].sum()             #Getting sum of Staten location


# Dictionary with each location as key and its total availability as value
di = {'Brooklyn':brooklyn_add, 'Bronx':bronx_add,'Manhattan':manhattan_add,'Queens':queens_add,'Staten Island':staten_add }

max_availity= max(di,key=di.get)  # key=di.get makes max() compare the totals and return the matching location

print(f"Maximum availability as per location: {max_availity}")

min_availity= min(di,key=di.get)   # Same idea with min() for the lowest total

print(f"Minimum availability as per location: {min_availity}")


### What all manipulations have you done and insights you found?

Average Prices for Different Types of Accommodations:

Calculated the average prices for private rooms, shared rooms, and entire homes/apartments across all neighbourhoods.

Insight: Entire homes/apartments have the highest average price, followed by private rooms and shared rooms. This suggests that guests are willing to pay more for the privacy and amenities offered by entire accommodations.

Top Hosts in Terms of the Number of Listings They Manage:

Identified the hosts with the highest number of listings in each category: private rooms, shared rooms, and entire homes.

Insight: Certain hosts, such as David, Sergii, and Sonder (NYC), manage a significant number of listings, indicating their prominence in the Airbnb market. This could be due to factors such as reputation, property management skills, or the size of their property portfolio.

Most Preferred Types of Accommodations:

Determined which type of accommodation (private rooms, shared rooms, or entire homes/apartments) is most preferred by guests.

Insight: Entire homes/apartments are the most preferred type of accommodation. This preference could be driven by factors such as privacy, convenience, and amenities available in entire accommodations.

Most Preferred Area and Location by Clients:

Identified the most preferred location (neighbourhood group) and area (neighbourhood) based on the number of listings.

Insight: Manhattan is the most preferred location and Williamsburg the most preferred area, suggesting that these places offer desirable amenities, attractions, and convenience for Airbnb guests. Factors such as proximity to popular landmarks, transportation options, and safety may influence these preferences.

Highest and Lowest Availability as per Location:

Determined the locations with the highest and lowest total availability of Airbnb listings.

Insight: Manhattan has the highest total availability of listings and Staten Island the lowest. Because these totals are summed across all listings in a location, they mainly reflect the supply of listings in each area; on their own they do not show whether demand is high or low.
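
The per-room-type and per-location summaries described above can also be computed with `groupby`, which avoids one filter per category. This is a sketch on a toy frame with assumed values; it averages the raw `price` column rather than the derived `Price_per_night`, which is an assumption made to keep the example short.

```python
import pandas as pd

# Toy frame (assumed values) with the columns used in the wrangling code.
toy_df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Queens"],
    "room_type": ["Entire home/apt", "Private room", "Private room", "Entire home/apt", "Shared room"],
    "price": [200, 100, 60, 140, 40],
    "availability_365": [300, 100, 50, 150, 365],
})

# One mean per room type instead of three separate filters.
avg_price_by_type = toy_df.groupby("room_type")["price"].mean()

# One total per neighbourhood group instead of five separate filters.
total_availability = toy_df.groupby("neighbourhood_group")["availability_365"].sum()

print(avg_price_by_type.round(0))
print("Highest availability:", total_availability.idxmax())
print("Lowest availability:", total_availability.idxmin())
```

A side benefit of this form is that it keeps working if the data contains a location or room type that was not hard-coded.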

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

avg_price_vis = [round(avg_private,0),round(avg_shared,0),round(avg_entire,0)]  # Average prices computed in the wrangling code above
room_types_vis = ['Private Room', 'Shared Room', 'Entire Home/Apt']  #Room types


plt.bar(room_types_vis,avg_price_vis,width=0.5)  #Plotting bar chart
plt.ylim([10, 100])  # Set the limits of the y axis
plt.ylabel("Average Price")  # Label of y axis
plt.xlabel("Room type")  # Label of x axis
plt.title("Average prices of room types") #Title of graph
plt.show()


##### 1. Why did you pick the specific chart?



I chose a bar chart because it is a good way to show the average prices of the listing types. The bars have different heights, so you can easily see which listing type is cheap and which is costly. A bar chart is also easy to understand and makes the comparison straightforward.

##### 2. What is/are the insight(s) found from the chart?

On average, entire homes/apartments are the most expensive listing type on Airbnb and shared rooms are the cheapest.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Hosts can use the average prices as benchmarks to set competitive rates for their listings. By aligning their prices with the market average, hosts can attract more guests and potentially increase their bookings and revenue.

In simpler terms, if hosts set their prices much higher or lower than the average, they might have trouble getting guests. If prices are way too high, people on a budget might not book, and if they're too low, guests might wonder if there's something wrong with the place. This could mean fewer bookings and less money for hosts. It's important for hosts to find a balance and set prices that are fair and competitive.
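
To illustrate the benchmarking idea, the sketch below compares each listing's price with a benchmark for its room type and flags listings that sit far from it. It uses the median rather than the mean as the benchmark, since a few very expensive listings can pull a mean upward. The toy data and the 0.5 to 1.5 tolerance band are hypothetical choices, not values from the analysis.

```python
import pandas as pd

# Toy listings (assumed values).
toy_df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "room_type": ["Private room"] * 3 + ["Entire home/apt"] * 3,
    "price": [50, 60, 200, 150, 160, 40],
})

# Benchmark: median price of each room type, broadcast back onto every row.
toy_df["benchmark"] = toy_df.groupby("room_type")["price"].transform("median")
toy_df["price_ratio"] = toy_df["price"] / toy_df["benchmark"]

# Hypothetical tolerance band: flag listings more than 50% away from the benchmark.
LOWER, UPPER = 0.5, 1.5
outliers = toy_df[(toy_df["price_ratio"] < LOWER) | (toy_df["price_ratio"] > UPPER)]
print(outliers[["id", "room_type", "price", "benchmark"]])
```

Here listing 3 is flagged as far above the private-room benchmark and listing 6 as far below the entire-home benchmark.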

#### Chart - 2

In [None]:
# Chart - 2 visualization code

# Labels built from the top hosts found in the wrangling code above
host_labels = [f"{max_host} (Private Rooms)", f"{max_host_shared} (Shared Rooms)", f"{max_host_entire} (Entire Homes)"]

pvt_c=host_Pvt_room.max()   # Listing count of the top private-room host
sh_c=host_shared_room.max()   # Listing count of the top shared-room host
ent_c=host_entire_room.max()   # Listing count of the top entire-home host

max_room_count = [pvt_c,sh_c,ent_c]

plt.bar(host_labels,max_room_count,width=0.5,color = 'skyblue')
plt.ylabel("Number of listings managed by host")  # Label of y axis
plt.xlabel("Top host (room type)")  # Label of x axis
plt.title("Listing counts of the top hosts by room type") #Title of graph
plt.show()




##### 1. Why did you pick the specific chart?

A bar chart allows us to compare the listing counts of the top hosts across room types. Each bar represents the top host for a specific room type.

##### 2. What is/are the insight(s) found from the chart?

The top host for private rooms, in terms of the number of listings managed, is David.

Sergii is the top host for shared rooms.

Sonder (NYC) is the top host for entire homes/apartments and manages the most listings across all three accommodation types.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


Being recognized as a top host can enhance the reputation of hosts on the Airbnb platform, potentially attracting more guests and increasing bookings.

The presence of top hosts with a large number of listings may increase competition for other hosts, making it more challenging for them to attract guests and secure bookings.
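
One caveat when ranking hosts is that `host_name` is not guaranteed to be unique, so counting by name can merge different hosts who share a name. The sketch below, on toy data with assumed values, contrasts counting by `host_name` with counting by `host_id`.

```python
import pandas as pd

# Toy data (assumed values): two different hosts are both called "David".
toy_df = pd.DataFrame({
    "host_id": [10, 10, 11, 12, 12, 12, 13],
    "host_name": ["David", "David", "David",
                  "Sonder (NYC)", "Sonder (NYC)", "Sonder (NYC)", "Sergii"],
    "room_type": ["Private room"] * 3 + ["Entire home/apt"] * 3 + ["Shared room"],
})

# Counting by name merges the two hosts called "David" into one entry.
by_name = toy_df["host_name"].value_counts()

# Counting by host_id keeps them apart; host_name is carried along for readability.
by_id = (
    toy_df.groupby(["host_id", "host_name"])
    .size()
    .sort_values(ascending=False)
)

print(by_name.head(3))
print(by_id.head(3))
```

In this toy example "David" appears to manage three listings when counted by name, while no single host with that name manages more than two.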

#### Chart - 3

In [None]:
# Chart - 3 visualization code

room_counts = df['room_type'].value_counts()  # Number of listings of each room type


room_counts.plot(kind='pie')  # Plot the counts as a pie chart
plt.title('Most preferred type of listing by customers')  # Title of the graph
plt.ylabel('')  # pandas labels the y axis by default; an empty string removes that label

plt.show()  # Display the chart

##### 1. Why did you pick the specific chart?

I used a pie chart because it is an effective way to visualize how listings are distributed across accommodation types. The largest slice identifies the most common, and by this measure the most preferred, type of accommodation.

##### 2. What is/are the insight(s) found from the chart?

The largest slice represents "Entire Homes/Apartments", which suggests that this type of accommodation is the most preferred among customers.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

"Entire Homes/Apartments" are the most preferred type of accommodation, so hosts can focus on expanding their portfolio of entire homes/apartments or invest in improving the quality and amenities of such accommodations.

If hosts offer only entire homes/apartments without catering to other segments, they may limit their market reach, so they should consider other listing types too.
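
The size of each slice can be stated as a percentage with `value_counts(normalize=True)`. The sketch below uses a toy Series with assumed values in place of `df['room_type']`.

```python
import pandas as pd

# Toy room types (assumed values).
room_types = pd.Series(
    ["Entire home/apt"] * 5 + ["Private room"] * 4 + ["Shared room"],
    name="room_type",
)

# Share of listings per room type, in percent.
share = room_types.value_counts(normalize=True).mul(100).round(1)
print(share)
```

The same percentages can be printed on the pie chart itself by passing `autopct='%1.1f%%'` to the `plot(kind='pie')` call.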

#### Chart - 4

In [None]:
# Chart - 4 visualization code

pre_location_c = df.loc[:,['neighbourhood_group']].value_counts()  #Getting count as per location

pre_location_c_reset = pre_location_c.reset_index() #To convert data into 2D structure

pre_location_c_reset.columns = ['location', 'count']  # Rename columns


plt.scatter(pre_location_c_reset['location'], pre_location_c_reset['count']) # Plotting the scatter plot

# Adding labels and title
plt.xlabel('Neighbourhood Group(Location)')
plt.ylabel('Number of listings')
plt.title('Counts of Accommodations in Each Neighbourhood Group')

plt.show()

##### 1. Why did you pick the specific chart?

I used a scatter chart to plot the number of listings in each neighbourhood group. Scatter charts are normally used to visualize the relationship between two continuous variables; here, one point per location is enough to compare the frequency counts and identify the most preferred location.

##### 2. What is/are the insight(s) found from the chart?

By comparing the heights of the points on the scatter chart, we can easily see that Manhattan has the most listings and Staten Island the fewest.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Understanding the most preferred locations by clients allows businesses to focus their marketing efforts more effectively. They can tailor their advertising campaigns to target specific demographics or geographic areas, leading to increased brand awareness and customer engagement.

Focusing solely on the most preferred locations identified in the scatter chart may lead to neglecting underperforming areas. Ignoring these areas can result in missed opportunities for growth and revenue generation.

#### Chart - 5

In [None]:
# Chart - 5 visualization code

co_Avail_loc = [brooklyn_add,bronx_add,manhattan_add,queens_add,staten_add]  # Total availability per location from the wrangling code above
location_a = ['Brooklyn','Bronx','Manhattan','Queens','Staten Island']  # Location names, in the same order as the totals

plt.bar(location_a, co_Avail_loc,color='Violet') #Plotting bar chart
# Adding labels and title
plt.xlabel('Location')
plt.ylabel('Total available days (sum of availability_365)')
plt.title("Availability as per location")
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart allows us to compare availability across locations. Each bar represents the total availability within a specific location.

##### 2. What is/are the insight(s) found from the chart?

By comparing the bars, we can easily see that Manhattan has the highest availability and Staten Island the lowest.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Businesses can allocate more resources, such as inventory or staff, to the Manhattan location where there is higher availability, thus maximizing potential sales and customer satisfaction.

Focusing solely on Manhattan due to its high availability might lead to neglect of other locations, such as Staten Island. If not managed carefully, this could lead to decreased customer satisfaction and loyalty in those areas, ultimately resulting in negative growth.
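
Because the chart sums `availability_365` over all listings, a location with many listings can have a high total even if each listing is open on relatively few days. The sketch below, on toy data with assumed values, puts the total next to the listing count and the mean per listing so that both views can be compared.

```python
import pandas as pd

# Toy data (assumed values): many listings in one location, few in the other.
toy_df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 4 + ["Staten Island"] * 2,
    "availability_365": [100, 50, 150, 100, 180, 200],
})

# Named aggregation: total days, number of listings, and mean days per listing.
availability = toy_df.groupby("neighbourhood_group")["availability_365"].agg(
    total_days="sum",
    listings="count",
    mean_days="mean",
)
print(availability)
```

In this toy example the location with the highest total is not the one with the highest mean per listing, which is why the two measures are worth reading together.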

#### Chart - 6

In [None]:

# Extract latitude and longitude from the data
latitude = df['latitude']

longitude = df['longitude']

# Plotting the scatter plot, axis and label

plt.scatter(longitude, latitude, color='blue')
plt.title('Geographical Distribution of Listings')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.grid(True) #To add grid in graph
plt.show()

##### 1. Why did you pick the specific chart?


For the scatter plot of latitude vs. longitude, it's a good choice when visualizing geographical data because it allows you to directly map points on a two-dimensional plane using latitude and longitude coordinates.

##### 2. What is/are the insight(s) found from the chart?


By observing clusters of points on the scatter plot, you can identify areas with a high concentration of listings. Dense clusters suggest popular neighborhoods or regions where there's a high demand for accommodation.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Understanding the concentration of listings in certain geographical areas allows businesses to focus their marketing efforts more effectively. They can tailor promotional campaigns to specific neighborhoods or regions where there's a high demand for accommodation.

Disparities in listing distribution across different geographical regions could result in uneven revenue streams. If businesses disproportionately invest in high-demand areas while neglecting others, it may lead to imbalanced growth and missed opportunities in underserved markets.

#### Chart - 7

In [None]:
# Chart - 7 visualization code

counts = df.groupby(['neighbourhood_group', 'room_type']).size().unstack(fill_value=0)  # Count listings per (location, room type) pair; unstack turns the room types into columns

counts.plot(kind='bar') #This function is used here to plot bar graph
plt.title('Counts of Room Types by Neighbourhood Group')  #This is used to put title of graph
plt.xlabel('Neighbourhood Group')  #Title for x-axis
plt.ylabel('Count')  #Title for y-axis

plt.legend(title='Room Type') #Legend : use to show title of legend

plt.show()

##### 1. Why did you pick the specific chart?

A grouped bar chart allows us to compare the counts of different room types within each neighbourhood group. Each group of bars represents a neighbourhood group, and the bars within it show the counts of the different room types, making it easy to compare the room types within each group.

##### 2. What is/are the insight(s) found from the chart?

Manhattan: The dominant room type is Entire home/apt.

Brooklyn: Private room listings are the most common, followed closely by Entire home/apt.

Queens: Private room listings are the most common.

Bronx: Private room listings are the most common.

Staten Island: Private room listings are the most common.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Hosts can tailor their marketing strategies and listing descriptions to highlight the predominant room types in each neighborhood group.

Hosts offering less popular room types may face competition from hosts offering more preferred accommodation options.
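
Raw counts make small neighbourhood groups hard to compare with large ones. One option is `pd.crosstab` with `normalize="index"`, which gives the share of each room type within each group. The sketch below runs on toy data with assumed values.

```python
import pandas as pd

# Toy data (assumed values).
toy_df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 4 + ["Brooklyn"] * 4,
    "room_type": [
        "Entire home/apt", "Entire home/apt", "Entire home/apt", "Private room",
        "Private room", "Private room", "Entire home/apt", "Shared room",
    ],
})

# Each row sums to 1: the mix of room types inside one neighbourhood group.
mix = pd.crosstab(toy_df["neighbourhood_group"], toy_df["room_type"], normalize="index")
print(mix.round(2))

# Dominant room type per neighbourhood group.
print(mix.idxmax(axis=1))
```

Plotting `mix` with `mix.plot(kind="bar", stacked=True)` would give bars of equal height whose segments show these shares.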

#### Chart - 8 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

numerical_columns = ['latitude', 'longitude', 'price', 'minimum_nights', 'number_of_reviews',
                     'reviews_per_month', 'calculated_host_listings_count', 'availability_365']
numeric_df = df[numerical_columns]

# Create a correlation matrix
correlation_matrix = numeric_df.corr()

# Plot the correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()

##### 1. Why did you pick the specific chart?

A correlation heatmap allows you to quickly identify which pairs of variables are positively or negatively correlated. This shows which variables tend to move together and helps identify potential patterns or dependencies in the data; correlation on its own does not show that a change in one variable causes a change in another.

##### 2. What is/are the insight(s) found from the chart?

The 'reviews_per_month' variable is highly correlated with the 'number_of_reviews' variable. The correlations for the other variable pairs can be read from the heatmap in the same way.
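
Reading every cell of the heatmap by eye is error-prone, so the pairs can also be ranked by strength. This is a minimal sketch on synthetic data; the `demo` frame and its values are invented for illustration, and in the notebook the same steps would be applied to `correlation_matrix`.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only; not the Airbnb dataset.
rng = np.random.default_rng(0)
reviews = rng.integers(0, 200, size=100)
demo = pd.DataFrame({
    "number_of_reviews": reviews,
    "reviews_per_month": reviews / 40 + rng.normal(0, 0.2, size=100),
    "price": rng.integers(30, 500, size=100),
})

corr = demo.corr()

# Keep only the upper triangle (k=1 drops the diagonal) so each pair appears once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Rank pairs by absolute correlation, strongest first.
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```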

#### Chart - 9 - Pair Plot

In [None]:
# Pair Plot visualization code

numeric_columns = ['latitude', 'longitude', 'price', 'minimum_nights', 'number_of_reviews',
                   'reviews_per_month', 'calculated_host_listings_count', 'availability_365']
numeric_df = df[numeric_columns]

# Create a pair plot
sns.pairplot(numeric_df)
plt.show()

##### 1. Why did you pick the specific chart?

A pair plot visualizes the pairwise relationships between the numerical variables in the dataset. Each scatterplot in the grid shows the relationship between two variables, which helps identify potential correlations or patterns.

##### 2. What is/are the insight(s) found from the chart?

availability_365 appears to be highly correlated with latitude and longitude.

The weakest relationship is between calculated_host_listings_count and reviews_per_month.

In the same way, the pair plot lets us inspect the relationship between any individual pair of variables.
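
A scatterplot shows the shape of a relationship but not its strength, so a visual impression from the pair plot is worth confirming with a number. The sketch below uses a hypothetical helper, `pair_correlation`, and made-up data; it reports Pearson (linear) and Spearman (rank-based, monotonic) correlation for one pair of columns.

```python
import numpy as np
import pandas as pd


def pair_correlation(frame, x, y):
    """Return Pearson and Spearman correlation for two columns of `frame`."""
    pearson = frame[x].corr(frame[y])
    # Spearman correlation is the Pearson correlation of the ranks.
    spearman = frame[x].rank().corr(frame[y].rank())
    return {"pearson": pearson, "spearman": spearman}


# Made-up data: y increases with x, but not along a straight line.
demo = pd.DataFrame({"x": np.arange(1, 51)})
demo["y"] = demo["x"] ** 3

result = pair_correlation(demo, "x", "y")
print(result)
```

In the notebook, the call might look like `pair_correlation(numeric_df, 'availability_365', 'latitude')`. A large gap between the two values suggests a curved relationship or the influence of outliers.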

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

Adjust pricing strategies based on the average prices for different types of accommodations. For example, consider offering competitive pricing for private and shared rooms while maximizing revenue from entire homes/apartments.
Utilize dynamic pricing algorithms to adjust prices based on factors such as demand, seasonality, and competitor prices in specific neighborhoods.

Identify opportunities to attract and retain top hosts by offering incentives, such as preferred listing placements or promotional support.
Provide host training and resources to improve property management skills, customer service, and guest satisfaction, leading to positive reviews and increased bookings.

Focus marketing efforts on promoting entire homes/apartments, the most preferred type of accommodation among guests. Highlight the advantages of booking entire accommodations, such as privacy, amenities, and convenience.
Offer incentives or discounts to guests who book entire accommodations to encourage bookings in this category.

Allocate resources to enhance offerings in preferred locations such as Manhattan and Williamsburg. This may include expanding the number of listings, improving property amenities, and enhancing the overall guest experience.
Collaborate with local businesses and attractions to create value-added experiences for guests staying in preferred locations, increasing the attractiveness of these areas.

# **Conclusion**

• Private rooms have the lowest average price among the different accommodation types, while entire homes/apartments have the highest average price. Shared rooms fall in between.

• David is the top host for private rooms, Sergii is the top host for shared rooms, and Sonder (NYC) is the top host for entire homes/apartments. These hosts manage the highest number of listings in their respective categories.

• The most preferred type of accommodation among guests is entire homes/apartments. This suggests that guests generally prefer having the entire space to themselves rather than sharing with others.

• The most preferred location by clients is Manhattan, and the most preferred area is Williamsburg. This indicates that these areas are popular among Airbnb guests, possibly due to their attractions, amenities, or convenience.

• Manhattan has the highest availability, indicating a greater number of listings and possibly more options for guests. Conversely, Staten Island has the lowest availability, suggesting fewer listings and potentially higher demand in this area.
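
The summary figures behind these conclusions come from simple aggregations. The following is a minimal sketch, assuming the column names `host_name`, `room_type`, `neighbourhood_group` and `price`; the `sample` rows and host names are invented and are not results from the dataset.

```python
import pandas as pd

# Invented rows for illustration only; in the notebook, use `df` instead.
sample = pd.DataFrame({
    "host_name": ["Ana", "Ana", "Ben", "Cara", "Ben", "Ana"],
    "room_type": ["Private room", "Private room", "Entire home/apt",
                  "Shared room", "Entire home/apt", "Entire home/apt"],
    "neighbourhood_group": ["Brooklyn", "Queens", "Manhattan",
                            "Bronx", "Manhattan", "Brooklyn"],
    "price": [60, 80, 200, 90, 240, 180],
})

# Average price per room type, cheapest first.
avg_price = sample.groupby("room_type")["price"].mean().sort_values()

# Host with the most listings in each room type.
top_host = (
    sample.groupby(["room_type", "host_name"]).size()
    .reset_index(name="listings")
    .sort_values("listings", ascending=False)
    .drop_duplicates("room_type")
    .set_index("room_type")["host_name"]
)

# Number of listings per neighbourhood group.
listings_by_group = sample["neighbourhood_group"].value_counts()

print(avg_price)
print(top_host)
print(listings_by_group)
```

On real data, grouping by a unique host identifier (assumed here to be a `host_id` column) rather than `host_name` avoids merging different hosts who share a first name.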

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***