
# **Project Name**    - Airbnb Bookings Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual



# **Project Summary -**


### Project Summary: Airbnb Booking Analysis

#### Objective
The goal of this project was to analyze the Airbnb dataset and provide insights to achieve a positive business impact. By examining various aspects of the data, including pricing, reviews, neighborhood groups, and room types, we aimed to uncover patterns, trends, and correlations that could inform strategic decision-making and improvements. The analysis revealed key findings related to pricing strategy, listing optimization, demand, and competitive analysis, which can help the client make data-driven decisions.

#### Key Analyses and Findings
1. **Pricing Analysis**:
   - **Relationship Exploration**: By investigating the relationship between price and factors such as neighborhood group, room type, and availability, we identified appropriate pricing ranges for different listing categories. This insight helps in setting competitive prices and maximizing revenue.

2. **Neighborhood Analysis**:
   - **Market Focus**: By analyzing the popularity and demand for different neighborhood groups, we identified specific areas where the client can concentrate their marketing efforts and investments to attract more guests.

3. **Review Management**:
   - **Guest Feedback**: Studying the number of reviews and reviews per month highlighted the importance of responding to guest feedback, addressing concerns, and actively encouraging positive reviews. This can enhance guest satisfaction and increase bookings.

4. **Host Analysis**:
   - **Host Collaboration**: By identifying hosts with a high number of listings, the client can collaborate with them to ensure a consistent and quality experience for guests.

#### Recommendations
The analysis of the Airbnb dataset provided valuable insights and recommendations for the client. These insights include:
- **Pricing Strategy**: Implement appropriate pricing ranges for different listing categories to set competitive prices and maximize revenue.
- **Listing Optimization**: Optimize listings based on identified patterns and trends to improve visibility and appeal.
- **Neighborhood Targeting**: Focus marketing efforts and investments on high-demand neighborhoods to attract more guests.
- **Review Management**: Improve guest satisfaction by actively managing reviews and addressing feedback.
- **Host Involvement**: Collaborate with high-performing hosts to ensure consistent and quality guest experiences.

Overall, these recommendations can help the client optimize operations, improve guest satisfaction, and drive business growth in the Airbnb market.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


Airbnb faces several challenges in maximizing revenue, improving guest satisfaction, and maintaining a competitive advantage in the market. These challenges include:

- **Pricing Strategy**: Identifying optimal pricing ranges for different listing categories to stay competitive and maximize revenue.
- **Market Focus**: Determining which neighborhoods to target for marketing efforts and investments to attract more guests.
- **Guest Satisfaction**: Understanding the importance of review management in enhancing guest satisfaction and increasing bookings.


#### **Define Your Business Objective?**

The primary business objective of this project is to provide Airbnb hosts with data-driven pricing strategies that will enable them to:

- Maximize Occupancy Rates: Ensure high booking rates to maintain steady income.
- Increase Revenue: Optimize pricing to achieve the highest possible earnings while maintaining competitiveness.
- Enhance Guest Satisfaction: Set prices that meet or exceed guest expectations, leading to positive reviews and repeat bookings.
- Maintain Market Competitiveness: Continuously adapt to the dynamic short-term rental market to attract more guests compared to competitors.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.

     The additional credits will give those students an advantage during Star Student selection.

     [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


### Dataset Loading

In [None]:
# Upload the dataset file from your local machine (works in Google Colab only)
from google.colab import files
uploaded = files.upload()

### Dataset First View

In [None]:
# Dataset First Look
df = pd.read_csv('/content/Airbnb NYC 2019.csv')
df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
plt.figure(figsize=(5,3))
sns.heatmap(df.isnull(),cbar=False,cmap='Reds')
plt.title('Missing Values')
plt.show()

### What did you know about your dataset?

- The dataset contains information for Airbnb listings in NYC in 2019.
- It has 48895 rows and 16 columns.
- There are missing values in the columns 'name', 'host_name', 'last_review' and 'reviews_per_month'.
- The columns are of different data types including object, integer and float.
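To put the missing-value counts in proportion, one option is to express them as a percentage of rows. The sketch below uses a small made-up frame in place of `df`; on the real data the same expression would be `df.isnull().mean().mul(100)`.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df with a few missing entries
toy = pd.DataFrame({
    "name": ["A", None, "C", "D"],
    "reviews_per_month": [1.0, np.nan, np.nan, 0.5],
    "price": [100, 80, 150, 60],
})

# Share of missing values per column, in percent, largest first
missing_pct = toy.isnull().mean().mul(100).round(1).sort_values(ascending=False)
print(missing_pct)
```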

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

In [None]:
df.columns


- id: The unique identifier for each Airbnb listing.
- name: The title or name given to the listing by the host.
- host_id: The unique identifier for the host of the listing.
- host_name: The name of the host.
- neighbourhood_group: The larger geographical area (NYC borough) in which the listing is located.
- neighbourhood: The specific neighborhood or area within the larger geographical area where the listing is situated.
- latitude: The latitude coordinate of the listing's location.
- longitude: The longitude coordinate of the listing's location.
- room_type: The type of room or space available for rent (e.g., Private room, Entire home/apt, Shared room).
- price: The nightly rental price for the listing.
- minimum_nights: The minimum number of nights a guest must book to stay at the listing.
- number_of_reviews: The total number of reviews received for the listing.
- last_review: The date of the most recent review for the listing.
- reviews_per_month: The average number of reviews the listing receives per month.
- calculated_host_listings_count: The number of properties or accommodations the host has listed on Airbnb.
- availability_365: This column indicates the number of days within a year that the listing is available for booking.

### Check Unique Values for each variable.

In [None]:
df.columns

In [None]:
# Check the number of unique values for each variable.
df.nunique()

In [None]:
#unique values of id
df['id'].unique()

In [None]:
#unique values of name
df['name'].unique()

In [None]:
#unique values of host_id
df['host_id'].unique()

In [None]:
#unique values of host name
df['host_name'].unique()

In [None]:
#unique values of neighbourhood group
df['neighbourhood_group'].unique()

In [None]:
#unique values of neighbourhood
df['neighbourhood'].unique()

In [None]:
#unique values of latitude
df['latitude'].unique()



In [None]:
#unique values of longitude
df['longitude'].unique()


In [None]:
#unique values of room type
df['room_type'].unique()

In [None]:
#unique values of price
df['price'].unique()


In [None]:
#unique values of minimum night
df['minimum_nights'].unique()


In [None]:
#unique values of number of reviews
df['number_of_reviews'].unique()


In [None]:
#unique values of last review
df['last_review'].unique()


In [None]:
#unique values of reviews per month
df['reviews_per_month'].unique()


In [None]:
#unique values of calculated host listing count
df['calculated_host_listings_count'].unique()


In [None]:
#unique values of availability_365
df['availability_365'].unique()

## 3. ***Data Wrangling***

In [None]:
df.columns

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
#fill the missing area of name
#check datatype of name
df['name'].dtypes

In [None]:
#datatype is object so we use mode
df['name'].mode()

In [None]:
# Fill missing names with the mode ('Hillside Hotel'); assign back instead of calling fillna(inplace=True) on a column
df['name'] = df['name'].fillna(df['name'].mode()[0])

In [None]:
#check missing values of name
df['name'].isnull().sum()

In [None]:
#fill missing values of host_name
# check datatype of host name
df['host_name'].dtypes

In [None]:
#check mode of host_name
df['host_name'].mode()

In [None]:
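# Fill missing host names with the mode ('Michael'); assign back instead of calling fillna(inplace=True) on a column
df['host_name'] = df['host_name'].fillna(df['host_name'].mode()[0])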

In [None]:
#check null value of host_name
df['host_name'].isnull().sum()

In [None]:
#fill missing values of reviews_per_month
#check data type
df['reviews_per_month'].dtypes

In [None]:
#check data is skewed or not
sns.histplot(data=df,x='reviews_per_month',kde=True)
plt.show()

In [None]:
#data is right skewed so we use median
df['reviews_per_month'].median()

In [None]:
# Fill missing reviews_per_month values with the median (robust to the right skew)
df['reviews_per_month'] = df['reviews_per_month'].fillna(df['reviews_per_month'].median())

In [None]:
#check null values of reviews_per_month
df['reviews_per_month'].isnull().sum()

In [None]:
#fill missing values of last_review
#check data type of last review
df['last_review'].dtypes

In [None]:

#change data type
df['last_review'] = pd.to_datetime(df['last_review'])
#find mode of last review
df['last_review'].mode()

In [None]:
# Fill missing last_review dates with the most frequent date (the mode)
df['last_review'] = df['last_review'].fillna(df['last_review'].mode()[0])

In [None]:
#check null value of last_review
df['last_review'].isnull().sum()

In [None]:
df.head(2)

In [None]:
# Find which neighbourhood group has the most listings
df['neighbourhood_group'].value_counts()



In [None]:
# Average price of each room type, highest first
df.groupby('room_type')['price'].mean().sort_values(ascending=False)

In [None]:
# Create a new 'year' column from last_review (this includes the dates imputed above)
df['year'] = df['last_review'].dt.year

In [None]:
df

In [None]:
# Count listings by the year of their most recent review
df['last_review'].dt.year.value_counts()

In [None]:


# Group listings by availability: all 365 days, modest, least, and unavailable
# Note: with these thresholds, listings with exactly 100 available days fall into no group


# 365 days availability
available_365 = df[df['availability_365'] == 365]

#modest availability
modest_available = df[(df['availability_365'] > 100) & (df['availability_365'] < 365)]
# Least availability
# Use bitwise AND operator '&' to combine conditions
least_available = df[(df['availability_365'] > 0) & (df['availability_365'] < 100)]

# Unavailable listings
unavailable = df[df['availability_365'] == 0]

# Print the counts
print(f"Listings available 365 days: {available_365.shape[0]}")
print(f"Listings available modest : {modest_available.shape[0]}")
print(f"Listings with least availability: {least_available.shape[0]}")
print(f"Unavailable listings: {unavailable.shape[0]}")


In [None]:
# Reset the index of the 365-day subset (available_365 is already a DataFrame, so no conversion is needed)
ava_365 = available_365.reset_index()



In [None]:
# Room types among listings available all 365 days
ava_365['room_type'].value_counts()

In [None]:
# Year of the most recent review for listings available all 365 days
ava_365['year'].value_counts()



In [None]:
# Reset the index of the modestly available subset
mod_ava = modest_available.reset_index()

In [None]:
# Room types among modestly available listings
mod_ava['room_type'].value_counts()

In [None]:
# Year of the most recent review for modestly available listings

mod_ava['year'].value_counts()

In [None]:
# Reset the index of the least available subset
ava_least = least_available.reset_index()




In [None]:
# Room types among the least available listings
ava_least['room_type'].value_counts()

In [None]:
# Year of the most recent review for the least available listings
ava_least['year'].value_counts()

In [None]:
# Reset the index of the unavailable subset
ava_0 = unavailable.reset_index()

In [None]:
# Room types among unavailable listings
ava_0['room_type'].value_counts()

In [None]:
# Year of the most recent review for unavailable listings
ava_0['year'].value_counts()

In [None]:
# Overall count of each room type
df['room_type'].value_counts()

In [None]:
# Which neighbourhood group has the most listings of each room type?

# Create a crosstab of room type (rows) and neighbourhood group (columns)
room_type_by_neighbourhood = pd.crosstab(df['room_type'], df['neighbourhood_group'])

# For each room type (row), get the neighbourhood group with the most listings
top_group_per_room_type = room_type_by_neighbourhood.idxmax(axis=1)

# Print the results
print(f"Neighbourhood group with the most listings of each room type:\n{top_group_per_room_type}")


In [None]:
# Number of listings in each neighbourhood group
df['neighbourhood_group'].value_counts()


In [None]:
# Drop the host_id column from the ava_365 subset
ava_365.drop('host_id',axis=1,inplace=True)
ava_365

### What all manipulations have you done and insights you found?


**Handling Missing Values:**

- **Filling Missing Values**:

 - Fill missing values in the 'name' column with the most frequent name, "Hillside Hotel".
 - Fill missing values in the 'host_name' column with the most frequent host name, "Michael".
 - Fill missing values in the 'reviews_per_month' column with the median value.
 - Fill missing values in the 'last_review' column with the most frequent date, '2019-06-23'.
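The fill steps above could also be written as a single call. This is a sketch on a made-up toy frame rather than the real `df`: it computes each fill value from the data and assigns the result back, since recent pandas versions warn about `fillna(..., inplace=True)` on a single selected column and, under Copy-on-Write, that chained call no longer updates the parent DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the Airbnb data (values are made up)
toy = pd.DataFrame({
    "name": ["Cozy loft", None, "Cozy loft", "Sunny room"],
    "host_name": ["Michael", "David", None, "Michael"],
    "reviews_per_month": [0.5, np.nan, 4.0, 1.0],
    "last_review": pd.to_datetime(["2019-06-23", None, "2019-06-23", "2018-01-01"]),
})

# Compute each fill value from the data instead of hard-coding it
fill_values = {
    "name": toy["name"].mode()[0],                           # most frequent text
    "host_name": toy["host_name"].mode()[0],                 # most frequent text
    "reviews_per_month": toy["reviews_per_month"].median(),  # robust to right skew
    "last_review": toy["last_review"].mode()[0],             # most frequent date
}

# Assign the result back rather than using inplace=True on a column selection
toy = toy.fillna(value=fill_values)
print(toy.isnull().sum().sum())
```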

**Insights:**

- The dataset contains missing values in the 'name', 'host_name', 'reviews_per_month', and 'last_review' columns.
- The distribution of 'reviews_per_month' is right-skewed, indicating that most listings receive a relatively small number of reviews per month, while a few listings receive a large number of reviews.
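A quick numeric check can back up the visual impression of skew. The sketch below uses made-up values: `Series.skew()` returns a positive number for a right-skewed sample, and the long tail pulls the mean above the median, which is why the median is the more representative fill value.

```python
import pandas as pd

# Made-up right-skewed sample: many small values and one large one
reviews_per_month = pd.Series([0.1, 0.2, 0.3, 0.3, 0.5, 0.8, 1.0, 9.0])

skewness = reviews_per_month.skew()  # > 0 indicates a right (positive) skew
mean = reviews_per_month.mean()
median = reviews_per_month.median()

print(f"skew={skewness:.2f}, mean={mean:.2f}, median={median:.2f}")
```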

**Findings**

1- **Neighborhood with the Highest Listings:**

 - Identify which neighborhood has the highest number of listings.

2- **Room Type with the Highest Price:**

 - Determine which room type has the highest average price.

3- **Year of the Most Recent Review:**

 - Identify the year in which the most listings received their most recent review.

4- **Room Type Availability:**

 - Determine which room type is most available, modestly available, least available, and mostly unavailable.

 - Analyze availability by year for each room type.

5- **Predominant Room Type:**

 - Identify the most common room type.

6- **Area with the Most Available Rooms:**

 - Determine in which area rooms are mostly available.

**Insights:**

- Manhattan has the highest number of listings with 21,661, while Staten Island has the lowest with 373.

- Entire homes/apartments have the highest average price at $211.79, while shared rooms have the lowest at $70.12.

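- 2019 is the most common year of a listing's most recent review (35,261 listings), and 2011 is the least common (7). The 2019 figure is inflated because missing last_review dates were filled with the 2019 mode date.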

- Listing availability (measured by days available per year):
  - Most available (365 days): 1,295 listings.
  - Modestly available (101 to 364 days): 17,830 listings.
  - Least available (1 to 99 days): 12,200 listings.
  - Unavailable (0 days): 17,533 listings.
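With the thresholds used in the availability code (`> 100` for modest and `< 100` for least), a listing with exactly 100 available days falls into none of the four groups. One way to guarantee that every listing lands in exactly one bucket is `pd.cut` with contiguous bins. The sketch below uses made-up values and assumes, as a labeled choice, that 100 days counts as modest.

```python
import pandas as pd

# Made-up availability values, including the boundary value 100
availability = pd.Series([0, 0, 45, 99, 100, 101, 200, 364, 365])

# Right-closed, contiguous bins: (-1, 0], (0, 99], (99, 364], (364, 365]
bins = [-1, 0, 99, 364, 365]
labels = ["unavailable", "least", "modest", "all year"]
bucket = pd.cut(availability, bins=bins, labels=labels)

# sort=False keeps the categories in bin order
counts = bucket.value_counts(sort=False)
print(counts)
```

Because the bins are contiguous, the bucket counts always add up to the number of listings.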

- Among listings available all 365 days, private rooms are the most common room type, while entire homes/apartments are the most common type among modestly available, least available, and unavailable listings.

- In 2019, private rooms were mostly available, entire homes/apartments were modestly available, least available, and often unavailable.

- Manhattan has a higher number of shared rooms and entire homes/apartments, while Brooklyn has a higher number of private rooms.
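The `idxmax` direction decides which question the crosstab answers. On a small made-up frame, `axis=1` returns the neighbourhood group with the most listings for each room type (the question behind the finding above), while `axis=0` returns the most common room type within each neighbourhood group.

```python
import pandas as pd

# Made-up listings, each with a room type and a neighbourhood group
toy = pd.DataFrame({
    "room_type": ["Private room", "Private room", "Private room",
                  "Entire home/apt", "Entire home/apt", "Shared room"],
    "neighbourhood_group": ["Brooklyn", "Brooklyn", "Manhattan",
                            "Manhattan", "Manhattan", "Queens"],
})

table = pd.crosstab(toy["room_type"], toy["neighbourhood_group"])

# axis=1: for each room type (row), the neighbourhood group with the most listings
top_group_per_room_type = table.idxmax(axis=1)

# axis=0: for each neighbourhood group (column), the most common room type
top_room_type_per_group = table.idxmax(axis=0)

print(top_group_per_room_type)
print(top_room_type_per_group)
```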

**New Column**

 - **Year Column:**

  - Create a new 'year' column holding the year of each listing's last_review date.

**Insight:**

- The 'year' column makes it easy to count listings by the year of their most recent review.

**Delete Column**

 - **Drop Host ID Column:**
  - Drop the 'host_id' column in the 'ava_365' dataset.

## ***4. Data Visualization, Storytelling & Experimenting with Charts: Understand the Relationships Between Variables***

In [None]:
df['room_type'].unique()

#### Chart - 1

Bar chart showing the number of listings in each neighbourhood group

In [None]:
# Chart - 1 visualization code
# Number of listings in each neighbourhood group
# (.count() counts rows, i.e. listings; .nunique() would count distinct neighbourhoods)
listing_count = df.groupby('neighbourhood_group')['neighbourhood'].count().reset_index(name='listings')

sns.barplot(x='neighbourhood_group', y='listings', data=listing_count)
plt.xlabel('Neighbourhood group')
plt.ylabel('Number of listings')
plt.title('Number of Listings in Each Neighbourhood Group')
plt.show()



##### 1. Why did you pick the specific chart?

A bar plot suits this comparison because it shows a single count (the number of listings) for each neighbourhood group side by side.

##### 2. What is/are the insight(s) found from the chart?

From this chart I learned that Manhattan has the most listings, while Staten Island has the fewest.
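Note that `groupby(...)['neighbourhood'].count()` counts rows, i.e. listings, not distinct neighbourhoods. The made-up example below contrasts it with `nunique()`, which is the call to use for the number of distinct neighbourhoods in each group.

```python
import pandas as pd

# Made-up listings: two of the Manhattan rows share a neighbourhood
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn", "Brooklyn"],
    "neighbourhood": ["Harlem", "Harlem", "Chelsea", "Bushwick", "Williamsburg"],
})

by_group = toy.groupby("neighbourhood_group")["neighbourhood"]
listings = by_group.count()    # rows (listings) per group
distinct = by_group.nunique()  # distinct neighbourhood names per group

print(pd.DataFrame({"listings": listings, "distinct_neighbourhoods": distinct}))
```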

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The insights gained from analyzing the chart can potentially help create a positive business impact. For example:

- Identifying popular neighbourhoods

- Understanding price trends

- Improving customer experience

#### Chart - 2

Stacked chart showing the relationship between neighbourhood group and the count of each room type.

In [None]:
# Chart - 2 visualization code
# relationship between neighbourhood group and the count of each room type
count_of_room_type=df.groupby(['neighbourhood_group','room_type']).size().unstack()
count_of_room_type.plot(kind='bar',stacked=True)
plt.xlabel('neighbourhood group')
plt.ylabel('count')
plt.title('Stacked Bar Chart: Neighborhood Group vs Room Type')
plt.legend(title='room type')
plt.show()


##### 1. Why did you pick the specific chart?

I picked a stacked bar plot because it is a good way to show the relationship between two categorical variables.

##### 2. What is/are the insight(s) found from the chart?

We can observe from the graph that:

- Brooklyn has the highest number of private rooms.

- Manhattan has the highest number of entire homes/apartments and shared rooms.

- Staten Island has the lowest number of every room type.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, the insights gained from analyzing an Airbnb dataset can potentially help create a positive business impact. For example:

- Hosts can optimize the pricing of their listings.

- It helps in understanding the demand pattern for each room type in different neighbourhoods.
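Raw counts are dominated by the groups with the most listings. A complement worth considering is to normalise the crosstab by row, so each neighbourhood group shows the share of each room type; `pd.crosstab(..., normalize='index')` does this. The frame below is a made-up stand-in for `df`.

```python
import pandas as pd

# Made-up listings standing in for df
toy = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Brooklyn", "Brooklyn",
                            "Manhattan", "Manhattan", "Manhattan", "Manhattan"],
    "room_type": ["Private room", "Private room", "Entire home/apt",
                  "Entire home/apt", "Entire home/apt", "Entire home/apt", "Shared room"],
})

# normalize='index' divides each row by its total, so every row sums to 1
shares = pd.crosstab(toy["neighbourhood_group"], toy["room_type"], normalize="index")
print(shares.round(2))
```

Plotting `shares` with `kind='bar', stacked=True` would give bars of equal height, making the room-type mix comparable across groups of very different sizes.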

#### Chart - 3

Scatter plot showing the relationship between coordinates and neighbourhood groups.

In [None]:
# Chart - 3 visualization code
# Listing locations (longitude vs latitude), coloured by neighbourhood group
plt.figure(figsize=(10, 6))
sns.scatterplot(x='longitude',y='latitude',hue='neighbourhood_group',data=df)
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Scatter Plot: Coordinates vs Neighborhood Group')
plt.legend(title='Neighbourhood Group')
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot is well suited to two continuous variables. Plotting longitude against latitude and colouring the points by neighbourhood group effectively draws a map of where the listings in each group are located.

##### 2. What is/are the insight(s) found from the chart?

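Because longitude and latitude are map coordinates, the plot reproduces the layout of New York City: each neighbourhood group forms its own geographic cluster, and points are densest in Manhattan and Brooklyn, the two groups with the most listings. Any apparent linear pattern between latitude and longitude reflects the shape of the city rather than a meaningful correlation between the two variables.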

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The insights gained from mapping listing coordinates can potentially help create a positive business impact in certain scenarios. For example:

- Route planning (e.g., transport links between dense listing clusters and popular destinations)

- Location-based marketing strategies

#### Chart - 4

Bar plot to show relationship between different neighbourhood group and avg price of the listing situated there.

In [None]:
# Chart - 4 visualization code
#relationship between different neighbourhood group and avg price of the listing situated there.
avg_price=df.groupby('neighbourhood_group')['price'].mean().reset_index()
plt.figure(figsize=(10,6))
sns.barplot(x='neighbourhood_group',y='price',data=avg_price)
plt.xlabel('neighbourhood group')
plt.ylabel('avg price')
plt.title('Bar Plot: Neighborhood Group vs Average Price')
plt.show()


##### 1. Why did you pick the specific chart?

Using a bar plot to visualize the average prices is effective in presenting the data in a straightforward manner. It allows for a quick understanding of the average price ranges for each neighborhood group and helps in identifying any significant disparities or trends.

##### 2. What is/are the insight(s) found from the chart?

Insights observed from the charts are:

- Manhattan group has the highest avg price.

- Whereas Bronx has the lowest avg price for listings.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The insights gained from analyzing the relationship between the neighborhood group and average price can potentially help create a positive business impact. Here are a few ways these insights can be valuable:

- Helps in defining price segments.

- Helps in marketing to and targeting specific customer segments.

#### Chart - 5

Line chart to show relation between different room type and avg minimum nights.

In [None]:
# Chart - 5 visualization code
# relation between different room type and avg minimum nights.
avg_min_night = df.groupby('room_type')['minimum_nights'].mean().reset_index()
plt.figure(figsize=(10, 6))
sns.lineplot(x='room_type', y='minimum_nights', data=avg_min_night)
plt.xlabel('Room Type')
plt.ylabel('Average Minimum Nights')
plt.title('Average Minimum Nights by Room Type')
plt.show()



##### 1. Why did you pick the specific chart?

A line chart connects the average minimum nights of the three room types, which makes the differences between them easy to follow. Because room type is categorical rather than continuous, the connecting lines only link the category averages; a bar chart would convey the same comparison.

##### 2. What is/are the insight(s) found from the chart?

Insights observed from the graph:

- The average minimum stay falls from entire homes/apartments to private rooms and then rises again for shared rooms, so private rooms have the lowest average minimum stay.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The insights gained from analyzing the average minimum night and room type can potentially have a positive business impact in the following ways:

Understanding the average minimum night for each room type allows businesses to set appropriate pricing strategies. Room types with longer average minimum nights may command higher rates, while shorter minimum nights may attract more flexible or last-minute bookings. This information enables businesses to optimize pricing and revenue management strategies to maximize profitability.

#### Chart - 6

Line plot showing relationship between different room type and their avg pricing.

In [None]:
# Chart - 6 visualization code
# relationship between different room type and their avg pricing.
avg_price=df.groupby('room_type')['price'].mean().reset_index()
plt.figure(figsize=(10, 6))
sns.lineplot(x='room_type', y='price', data=avg_price)
plt.xlabel('Room Type')
plt.ylabel('Avg Price')
plt.title('Price by Room Type')
plt.show()

##### 1. Why did you pick the specific chart?

A line chart is good at showing a trend, so I used it to show the relationship between room type and the average price of each.

##### 2. What is/are the insight(s) found from the chart?

The line slopes downward: entire homes/apartments have the highest average price, followed by private rooms and then shared rooms.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The insights gained from visualizing the relationship between room type and price can potentially create a positive business impact. It can help in following ways:

- Pricing optimization.

- Competitive Positioning.

- Customer Segmentation.

#### Chart - 7

Pie chart to show percentage relation between room type and their total reviews

In [None]:
# Chart - 7 visualization code
#percentage relation between room type and their total reviews

total_review=df.groupby('room_type')['number_of_reviews'].sum()

plt.pie(total_review,labels=total_review.index,autopct='%1.1f%%',startangle=90)
plt.axis('equal')
plt.title('Distribution of Reviews by Room Type')
plt.show()


##### 1. Why did you pick the specific chart?

A pie chart can be used to show the distribution of reviews among different room types. Each room type is represented as a slice of the pie, with the size of the slice proportional to the number of reviews. This type of graph provides a visual representation of the relative popularity of each room type.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that entire homes/apartments account for about 51% of all reviews, more than any other room type, which suggests they are the most popular. Shared rooms receive the smallest share of reviews.
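The percentages on the pie come from `autopct='%1.1f%%'`, which labels each slice with its value divided by the total, times 100, to one decimal place. The same figures can be computed directly; the review totals below are made up for illustration.

```python
import pandas as pd

# Made-up review totals per room type, standing in for the groupby result
total_review = pd.Series({"Entire home/apt": 500, "Private room": 400, "Shared room": 100})

# Each slice's share of all reviews, in percent, rounded like autopct='%1.1f%%'
percent = (total_review / total_review.sum() * 100).round(1)
print(percent)
```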

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The insights gained from visualizing the distribution of reviews among different room types using a pie chart can potentially have a positive business impact. For example, they can help with:

- Identifying popular room types.

- Enhancing services and products.

- Pricing and Revenue Management

#### Chart - 8

Horizontal bar chart to show relation between number of reviews and neighbourhood group.




In [None]:
# Chart - 8 visualization code
total_review_count=df.groupby('neighbourhood_group')['number_of_reviews'].sum().reset_index()
total_review_sorted=total_review_count.sort_values('number_of_reviews',ascending=False)

plt.figure(figsize=(10,6))
plt.barh(total_review_sorted['neighbourhood_group'], total_review_sorted['number_of_reviews'])
plt.gca().invert_yaxis()  # put the group with the most reviews at the top
plt.xlabel('Number of Reviews')
plt.ylabel('Neighbourhood Group')
plt.title('Number of Reviews by Neighbourhood Group')
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot allows you to compare the number of reviews across different neighborhood groups, which are categorical variables. Each neighborhood group is represented by a separate bar, making it easy to compare the heights of the bars to understand the differences in the number of reviews.

##### 2. What is/are the insight(s) found from the chart?

From the graph I can conclude that:

- Brooklyn has the highest number of reviews.

- Staten Island has the fewest reviews, making it the least popular group by this measure.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The insights gained from the graph can help in the following ways:

- Identifying popular neighbourhoods.

- Improving guest satisfaction through reviews.

- Strengthening competitive positioning.

#### Chart - 9

Bar chart showing the 10 most popular listings based on number of reviews.

In [None]:
# Chart - 9 visualization code
#top 10 popular listing on the basis of reviews.

top_10_listing=df.nlargest(10,'number_of_reviews')

plt.figure(figsize=(8, 4))
sns.barplot(x='number_of_reviews',y='name',data=top_10_listing,color='blue')
plt.xlabel('number of reviews')
plt.ylabel('Listing name')
plt.title('Top 10 Listings')
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot is well suited for comparing values across categories. Here each bar represents a listing name and its length represents the number of reviews, so the popularity of the listings can be compared at a glance.

##### 2. What is/are the insight(s) found from the chart?

The chart shows the 10 most-reviewed listings, which are located in different neighbourhoods.

Among them, "Room near JFK Queen Bed" has the most reviews, making it the most popular listing by this measure.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes. Identifying the 10 listings with the most reviews lets businesses highlight these popular listings to potential customers. Promoting them can increase their visibility and attract more bookings, leading to higher occupancy rates and increased revenue.

#### Chart - 10

Line chart to show trend in the year of last review date

In [None]:
# Chart - 10 visualization code
#trend in the year of last review date

counting_reviews=df['year'].value_counts().sort_index()

plt.figure(figsize=(10,6))
plt.plot(counting_reviews.index,counting_reviews.values,marker='o')
plt.xlabel('Year')
plt.ylabel('Number of Reviews')
plt.title('Trend in Number of Reviews by Year')
plt.xticks(counting_reviews.index)
plt.show()


##### 1. Why did you pick the specific chart?

A line chart connects data points in order, conveying continuity between them. This makes it appropriate for visualizing how review activity progresses over time and for reading the overall trend.

##### 2. What is/are the insight(s) found from the chart?

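The line trends upward: the number of listings whose most recent review falls in a given year rises from 2011 to 2019, which suggests that most listings were still being actively reviewed in 2019. Two caveats apply: the 2019 value is inflated because missing last_review dates were filled with a 2019 date, and earlier years are naturally smaller because a listing's last review moves forward every time it receives a new one.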

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes the insight gained from the chart will help in the following ways:

**Identifying Seasonal Patterns**: Examining review activity over time can reveal seasonal patterns or fluctuations in customer activity, which helps in planning and allocating resources. For example, if reviews peak during certain months, businesses can adjust their marketing efforts or pricing strategies to capitalize on the increased demand. Because this chart aggregates by year, spotting monthly peaks requires grouping the dates by month instead.
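One way to look for monthly peaks is to group dates by `dt.month` instead of `dt.year`. A sketch on made-up dates follows; on the real data the input would be `df['last_review']`, keeping in mind that it records only each listing's most recent review and that imputed dates would add an artificial spike in the imputed month.

```python
import pandas as pd

# Made-up review dates standing in for df['last_review']
last_review = pd.Series(pd.to_datetime([
    "2019-06-23", "2019-06-30", "2019-07-01", "2018-12-15", "2019-01-02",
]))

# Count dates per calendar month (1 = January ... 12 = December), in month order
by_month = last_review.dt.month.value_counts().sort_index()
print(by_month)
```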

#### Chart - 11

Stacked bar chart showing the relationship between neighbourhood groups, room types and average reviews per month.

In [None]:
# Chart - 11 visualization code
# Relationship between neighbourhood groups, room types and average reviews per month.
# unstack() turns each room type into a column, so each becomes one stacked segment.
avg_review_per_month = df.groupby(['neighbourhood_group', 'room_type'])['reviews_per_month'].mean().unstack()
avg_review_per_month.plot(kind='bar', stacked=True)
plt.xlabel('Neighbourhood Group')
plt.ylabel('Avg Reviews per Month')
plt.title('Stacked Bar Chart: Neighbourhood Group vs Room Type vs Avg Reviews per Month')
plt.legend(title='Room Type')
plt.show()

##### 1. Why did you pick the specific chart?

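This chart allows a visual comparison of average review activity across room types and neighbourhood groups. It shows which room types and neighbourhood groups have higher or lower average reviews per month, helping identify areas of higher customer engagement. Because the bars are stacked, each bar's total height is the sum of the room-type averages rather than an overall average. Comparisons are therefore most reliable segment by segment.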

##### 2. What is/are the insight(s) found from the chart?

This chart provides insights such as:

- Queens has the highest average reviews per month.

- Brooklyn has the lowest average reviews per month.

- Entire home/apt has the highest average reviews per month, and Shared room has the lowest.
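Stacked segments add up. Rankings such as "highest average reviews per month" are therefore easiest to confirm by computing the averages directly. The sketch below is a minimal version that uses the same column names as the chart code. The toy rows are hypothetical and only show how the call behaves.

```python
import pandas as pd


def mean_reviews_per_month(df: pd.DataFrame, by: str) -> pd.Series:
    """Average reviews_per_month for each value of `by`, highest first.

    Missing reviews_per_month values (never-reviewed listings) are skipped
    by .mean(), as in the stacked-bar code; all-missing groups sort last.
    """
    return df.groupby(by)['reviews_per_month'].mean().sort_values(ascending=False)


# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    'neighbourhood_group': ['Queens', 'Queens', 'Brooklyn', 'Brooklyn'],
    'room_type': ['Private room', 'Entire home/apt', 'Private room', 'Shared room'],
    'reviews_per_month': [2.0, 3.0, 1.0, None],
})
by_group = mean_reviews_per_month(toy, 'neighbourhood_group')
by_room = mean_reviews_per_month(toy, 'room_type')
print(by_group)
print(by_room)
```

Unlike a stacked bar height, these group averages weight each listing equally, so a group dominated by one room type is not inflated by summing averages.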

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. The stacked bar chart of average reviews per month for each room type within each neighbourhood group can help create a positive business impact by:

- Identifying popular room types.

- Understanding customer preferences.

- Improving customer satisfaction.

#### Chart - 12

Pie chart showing the distribution of the total host listing count across neighbourhood groups.

In [None]:
# Chart - 12 visualization code
# Distribution of the total host listing count across neighbourhood groups.
# calculated_host_listings_count is summed over every listing (row) in each group.
total_calculated_host_list = df.groupby('neighbourhood_group')['calculated_host_listings_count'].sum()

plt.figure(figsize=(10, 6))
plt.pie(total_calculated_host_list, labels=total_calculated_host_list.index, autopct='%1.1f%%')
plt.axis('equal')  # draw the pie as a circle
plt.title('Distribution of Total Host Listing Count by Neighbourhood Group')
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart represents the percentage share of the listing count held by each neighbourhood group. A percentage distribution like this is easy to interpret at a glance.

##### 2. What is/are the insight(s) found from the chart?

From the chart we can observe that:

- Manhattan has the highest percentage of the total host listing count.

- Staten Island has the lowest percentage.
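One caveat applies to the metric. Assume `calculated_host_listings_count` stores each host's total number of listings, repeated on every one of that host's rows. Summing it per group then counts a host with n listings n times, for n² in total, so large multi-listing hosts dominate the pie. The sketch below compares that sum with a plain share of listings per group. It runs on hypothetical toy data.

```python
import pandas as pd


def summed_host_counts(df: pd.DataFrame) -> pd.Series:
    """What the pie chart sums: each row repeats its host's total listing count."""
    return df.groupby('neighbourhood_group')['calculated_host_listings_count'].sum()


def listing_share(df: pd.DataFrame) -> pd.Series:
    """Percentage of listings (rows) in each neighbourhood group."""
    return df['neighbourhood_group'].value_counts(normalize=True) * 100


# Hypothetical toy data: host 1 has three Manhattan listings, host 2 one in Brooklyn
toy = pd.DataFrame({
    'host_id': [1, 1, 1, 2],
    'neighbourhood_group': ['Manhattan', 'Manhattan', 'Manhattan', 'Brooklyn'],
    'calculated_host_listings_count': [3, 3, 3, 1],
})
summed = summed_host_counts(toy)
share = listing_share(toy)
print(summed)  # Manhattan 9 (3 rows x 3), Brooklyn 1, i.e. a 90% / 10% pie
print(share)   # Manhattan 75.0, Brooklyn 25.0
```

If the two measures rank the groups the same way on the real data, the chart's conclusions hold. The listing share is the more direct measure of where listings are concentrated.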

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights help create a positive business impact. The pie chart allows businesses to identify the neighbourhood groups with the highest calculated host listing counts. This information can help businesses focus their efforts on neighbourhoods with a higher concentration of hosts and listings, which may indicate higher demand or popularity in those areas. By targeting these high-performing neighbourhood groups, businesses can allocate resources more effectively and increase their chances of attracting customers.

#### Chart - 13

Box plot showing the relationship between availability_365 and room type.

In [None]:
# Chart - 13 visualization code
#relation between availability 365 and room type.
sns.boxplot(x='room_type', y='availability_365', data=df, palette='Set1')
plt.xlabel('Room Type')
plt.ylabel('Availability for 365 Days')
plt.title('Availability for 365 Days by Room Type')
plt.show()

##### 1. Why did you pick the specific chart?

We can use a box plot to visualize the relationship between room type and availability for 365 days. A box plot provides a concise summary of the distribution of data, including the median, quartiles, and potential outliers.

##### 2. What is/are the insight(s) found from the chart?

Shared rooms are available on more days of the year than private rooms and entire homes/apartments.
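The box plot's medians and quartiles can also be printed as a table, which makes the comparison between room types exact. The sketch below is minimal and uses the `room_type` and `availability_365` columns from the chart code. The toy rows are hypothetical.

```python
import pandas as pd


def availability_summary(df: pd.DataFrame) -> pd.DataFrame:
    """First quartile, median and third quartile of availability_365 per room type."""
    grouped = df.groupby('room_type')['availability_365']
    summary = pd.DataFrame({
        'q1': grouped.quantile(0.25),
        'median': grouped.median(),
        'q3': grouped.quantile(0.75),
    })
    # Room types with the highest median availability first
    return summary.sort_values('median', ascending=False)


# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    'room_type': ['Shared room'] * 3 + ['Private room'] * 3 + ['Entire home/apt'] * 3,
    'availability_365': [300, 365, 200, 0, 100, 50, 10, 0, 200],
})
summary = availability_summary(toy)
print(summary)
```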

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

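The insights from visualizing the relationship between room type and 365-day availability can help create a positive business impact in the following ways:

- **Optimizing pricing and revenue:** room types that stay available for much of the year may be under-booked, which could justify revisiting their prices.

- **Enhancing resource allocation:** effort and investment can be directed toward the room types that show the strongest demand.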

#### Chart - 14 - Heatmap: Average Price by Neighbourhood Group and Room Type

In [None]:
# Heatmap visualization code
# Relationship between average price, neighbourhood group and room type

# Create a pivot table with the average price for each combination of neighbourhood group and room type
pivot_table = df.pivot_table(index='neighbourhood_group', columns='room_type', values='price', aggfunc='mean')

plt.figure(figsize=(10, 6))
# annot=True writes each cell's value; fmt='.0f' rounds it to a whole number
sns.heatmap(pivot_table, fmt='.0f', annot=True, cmap='coolwarm')
plt.xlabel('Room Type')
plt.ylabel('Neighbourhood Group')
plt.title('Heatmap: Average Price in Neighbourhood Group vs. Room Type')
plt.show()

##### 1. Why did you pick the specific chart?

The heatmap allows us to compare average prices across neighbourhood groups and room types. The color intensity represents the price level, making it easy to spot areas or accommodation types with higher or lower average prices.

##### 2. What is/are the insight(s) found from the chart?

From the chart we can observe the following:

- Prices vary by neighbourhood group.

- Prices differ by room type.

- The Manhattan / Entire home/apt and Brooklyn / Shared room cells stand out from the rest of the heatmap. The chart shows average prices, so these are differences in price level rather than correlations.
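The row and column averages behind "price variation by neighbourhood group" and "price difference by room type" can be added to the same pivot table with `margins=True`. The `All` margins are computed from the underlying listings, so they are weighted by listing count rather than being the mean of the cell values. The sketch below is minimal and runs on hypothetical toy data.

```python
import pandas as pd


def average_price_table(df: pd.DataFrame) -> pd.DataFrame:
    """Average price per neighbourhood group x room type, plus overall averages."""
    return df.pivot_table(index='neighbourhood_group', columns='room_type',
                          values='price', aggfunc='mean',
                          margins=True, margins_name='All')


# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    'neighbourhood_group': ['Manhattan', 'Manhattan', 'Manhattan', 'Brooklyn', 'Brooklyn'],
    'room_type': ['Entire home/apt', 'Entire home/apt', 'Shared room',
                  'Entire home/apt', 'Shared room'],
    'price': [200, 300, 80, 150, 50],
})
table = average_price_table(toy)
print(table.round(1))
```

Passing `table.drop(index='All', columns='All')` to `sns.heatmap` keeps the heatmap itself unchanged while the margins remain available for reporting.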

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(df, vars=['price', 'number_of_reviews', 'reviews_per_month'], hue='neighbourhood_group')
plt.show()

##### 1. Why did you pick the specific chart?

A pair plot of price, reviews per month and number of reviews is a good way to explore the relationships between these variables. It shows the pairwise relationships and the distribution of each variable, with neighbourhood group used as the hue parameter.

##### 2. What is/are the insight(s) found from the chart?

The pair plot shows the pairwise relationships between price, reviews per month and total number of reviews, with points colored by neighbourhood group.
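A rank (Spearman) correlation matrix can put numbers on what the pair plot shows for the same three variables. Spearman correlation is a suggested complement here, not part of the original analysis. It is chosen because price and review counts tend to be heavily right-skewed and rank correlation is less sensitive to extreme values. The toy data is hypothetical.

```python
import pandas as pd


def spearman_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Spearman rank correlation between price and the two review variables."""
    cols = ['price', 'number_of_reviews', 'reviews_per_month']
    # DataFrame.corr drops missing values pairwise, so a missing
    # reviews_per_month only affects the pairs that involve that column
    return df[cols].corr(method='spearman')


# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    'price': [50, 100, 150, 200],
    'number_of_reviews': [40, 30, 20, 10],
    'reviews_per_month': [0.5, 1.0, 1.5, 2.0],
})
corr = spearman_matrix(toy)
print(corr.round(2))
```

The matrix can be drawn with `sns.heatmap(corr, annot=True, cmap='coolwarm')` if a true correlation heatmap is wanted.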

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?


### Suggestions to Achieve Business Objectives

#### 1. Pricing Strategy:
- **Analyze Price Distribution:** Conduct a thorough analysis of price distribution across different neighborhoods, room types, and seasons to identify optimal pricing ranges.
- **Dynamic Pricing:** Implement a dynamic pricing strategy that adjusts prices based on demand, seasonality, and competitor pricing.
- **Weekend/Weekday Pricing:** Differentiate pricing for weekends and weekdays to capture higher demand during weekends.

#### 2. Market Focus:
- **Identify High-Demand Neighborhoods:** Analyze booking patterns and popularity of different neighborhoods to identify areas with high demand.
- **Targeted Marketing:** Focus marketing efforts on high-demand neighborhoods through targeted advertising and promotions.
- **Explore New Markets:** Consider expanding into new neighborhoods with potential for growth.

#### 3. Guest Satisfaction:
- **Active Review Management:** Respond promptly to guest reviews, address concerns, and actively encourage positive reviews.
- **Enhance Listing Descriptions:** Provide accurate and detailed descriptions of listings, highlighting amenities and unique features.
- **Improve Communication:** Maintain clear and proactive communication with guests throughout the booking process.

#### 4. Additional Suggestions:
- **Host Collaboration:** Partner with high-performing hosts to ensure consistent quality and guest satisfaction.
- **Data-Driven Decision Making:** Leverage data analytics to continuously monitor performance, identify trends, and make informed decisions.

By implementing these strategies, the client can optimize pricing, target the right markets, and enhance guest satisfaction, ultimately leading to increased revenue and business growth.

# **Conclusion**




The analysis of this dataset has provided valuable insights to help drive business growth. By examining various aspects of the data, such as price, reviews, neighborhood groups, and room types, we have gained important information to achieve business objectives.

For example:

- **Pricing Strategy:** There is a relationship between neighborhood groups, neighborhoods, and room types that can help set competitive prices.
- **Customer Demands:** Identifying top features of listings based on location and room type will help meet customer needs.
- **Host Collaboration:** Partnering with hosts who have the most listings can ensure a consistent and quality experience for guests.
- **Review Engagement:** Actively engaging with reviews and ratings will help attract more customers.

These insights are essential for optimizing pricing, targeting the right markets, and improving guest satisfaction, leading to increased revenue and business growth.
