# **Project Name**    - **Airbnb booking analysis**



##### **Project Type**    - EDA(exploratory data analysis)
##### **Contribution**    - Individual


# **Project Summary: Airbnb Booking Analysis Using Python**



**Introduction:**
The Airbnb Booking Analysis project aims to explore and analyze a dataset containing information about Airbnb bookings. Leveraging the power of Python and popular data analysis libraries, this project delves into understanding patterns, trends, and insights related to property listings, pricing, and guest preferences on the Airbnb platform.

**Objective:**
The primary goal of this project is to extract actionable insights that can benefit both Airbnb hosts and potential guests. By utilizing Python for data analysis, the project aims to uncover key factors influencing pricing, identify popular property types, and explore geographic trends in booking preferences. The findings can assist hosts in optimizing their listings and help travelers make informed decisions based on their preferences and budget.

**Tools and Technologies:**
The project utilizes Python as the main programming language for data manipulation, analysis, and visualization. Key libraries such as Pandas, NumPy, and Matplotlib are employed for data processing and plotting. Seaborn, a statistical data visualization library, is utilized to create visually appealing and informative charts. Jupyter Notebooks serve as the development environment, allowing for an interactive and iterative analysis process.

**Data Exploration:**
The dataset includes various attributes such as property type, neighborhood, pricing, and reviews. Initial exploration involves loading the data into a Pandas DataFrame, examining data types, and checking for missing values. Descriptive statistics provide a snapshot of the dataset, guiding subsequent analysis.

**Price Analysis:**
A significant aspect of the project involves understanding the factors influencing property pricing. Visualization tools are employed to illustrate the distribution of prices, and statistical measures are used to identify outliers. By exploring the relationship between variables such as room type, location, and pricing, the project aims to provide insights into optimal pricing strategies for hosts.

**Geographic Trends:**
Geographic analysis involves visualizing the distribution of listings across neighborhoods and identifying patterns in booking preferences. Heatmaps and geographical plots are generated to highlight popular areas, helping hosts focus their marketing efforts and travelers find accommodations in desired locations.

**Room Type Preferences:**
Analyzing room type preferences provides valuable insights into the types of properties that attract guests. Bar charts and pie charts are employed to visually represent the distribution of different room types. Understanding guest preferences can assist hosts in tailoring their offerings to better meet market demand.

**Conclusion and Recommendations:**
In conclusion, the Airbnb Booking Analysis project utilizes Python to uncover actionable insights for both hosts and guests. By exploring pricing dynamics, geographic trends, and room type preferences, the project provides valuable information to enhance the Airbnb experience. Hosts can optimize their listings based on data-driven recommendations, while guests can make more informed booking decisions aligned with their preferences.

**Future Work:**
Future iterations of the project could include predictive modeling to forecast pricing trends, sentiment analysis on reviews to gauge guest satisfaction, and the integration of external datasets for a more comprehensive analysis. Additionally, deploying interactive dashboards using tools like Plotly or Tableau could enhance the accessibility of insights for a broader audience.

In summary, the Airbnb Booking Analysis project showcases the power of Python in extracting meaningful insights from data, contributing to the continuous improvement of the Airbnb platform for hosts and guests alike.

# **GitHub Link -**

https://github.com/RAHULNEGI1620/AIRBNB


# **Problem Statement**


**The challenge is to conduct a comprehensive analysis of the Airbnb dataset for the year 2019, aiming to derive actionable insights that can inform strategic decision-making and enhance the overall user experience. The analysis should encompass various facets of the platform, including but not limited to property characteristics, pricing dynamics, customer reviews, and geographical trends. Key objectives include understanding the factors influencing property pricing, identifying popular locations, and uncovering patterns that can contribute to optimizing the Airbnb business model.**

#### **Define Your Business Objective?**

Business Objectives:

**Optimize Pricing Strategy**: Explore the dataset to identify factors influencing property prices, such as location, property type, amenities, and seasonality. Develop insights to help Airbnb hosts set competitive and attractive prices, maximizing revenue while ensuring value for guests.

**Enhance User Experience**: Analyze customer reviews and feedback to understand common themes, preferences, and pain points. Propose recommendations to improve the overall user experience on the platform, fostering positive interactions between hosts and guests.

**Geographical Expansion Planning**: Identify and analyze popular locations and trends in user demand. Provide insights to guide Airbnb's strategic decisions for expanding into new markets or strengthening its presence in existing ones.

**Host Engagement and Support**: Examine host-related data to understand host behaviors, satisfaction, and challenges. Develop strategies to engage hosts effectively, providing support and incentives to maintain a thriving community of property owners.

**Marketing and Customer Acquisition**: Analyze the dataset to identify effective marketing channels, customer acquisition strategies, and factors influencing user decisions. Provide insights to refine marketing efforts and attract new hosts and guests to the platform.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

### Dataset Loading

In [None]:
# Load Dataset


In [None]:
from google.colab import drive
drive.mount('/content/drive')

### Dataset First View

In [None]:
# Dataset First Look
airbnb = pd.read_csv("/content/Airbnb NYC 2019.csv")
airbnb

In [None]:
airbnb.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
num_rows=airbnb.shape[0]
num_columns=airbnb.shape[1]
print(f"The dataset has {num_rows} rows and {num_columns} columns")

### Dataset Information

In [None]:
# Dataset Info
airbnb.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
airbnb.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# missing values
airbnb.isnull().sum()

In [None]:
airbnb.fillna({'name': 'N/A'}, inplace=True)

In [None]:
# Missing Values/Null Values Count
missing_values_count = airbnb.isnull().sum()
missing_values_count = missing_values_count[missing_values_count > 0]
print(missing_values_count)

In [None]:
airbnb.drop(['latitude'], axis=1, inplace=True)
airbnb.head()

In [None]:
# Visualizing the missing values
plt.figure(figsize=(10, 6))
bars = plt.bar(missing_values_count.index, missing_values_count)
plt.title('Number of Missing Values by Column')
plt.xlabel('Columns')
plt.ylabel('Number of Missing Values')

# Adding labels to the bars
for i in range(len(bars)):
    # Use positional access; the Series is indexed by column name, not by integer
    yval = missing_values_count.iloc[i]
    plt.text(bars[i].get_x() + bars[i].get_width() / 2, yval, f'{int(yval)}', ha='center', va='bottom')

plt.show()


In [None]:
airbnb.fillna({'reviews_per_month': 0}, inplace=True)

In [None]:
airbnb.isnull().sum()

### What did you know about your dataset?

This dataset has 48,895 observations and 16 columns, with a mix of categorical and numeric values.
Several columns contain missing values, as shown in the bar chart above.
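
The missing-value counts above are easier to judge as a share of all rows. The following is a minimal sketch of that idea, assuming a tiny made-up frame (`sample`) in place of the real data; the column names mirror the dataset, but the values are illustrative only.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for the Airbnb data (values are illustrative only).
sample = pd.DataFrame({
    "name": ["Cozy loft", None, "Sunny room", "Studio"],
    "last_review": ["2019-05-01", None, None, "2019-06-15"],
    "reviews_per_month": [1.2, np.nan, np.nan, 0.4],
    "price": [120, 80, 60, 150],
})

# Count missing values per column and express them as a percentage of all rows.
missing = sample.isnull().sum()
summary = pd.DataFrame({
    "missing": missing,
    "percent": (missing / len(sample) * 100).round(1),
})

# Keep only the columns that actually have missing values.
summary = summary[summary["missing"] > 0].sort_values("missing", ascending=False)
print(summary)
```

Applied to the `airbnb` DataFrame, the same pattern would show at a glance whether a column is missing a handful of values or a large share of them, which may guide the choice between filling and dropping.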


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
airbnb.columns

In [None]:
# Dataset Describe
airbnb.describe()

### Variables Description

 Here's a brief variable description for each column:

1. **id:** Unique identifier for each listing.
2. **name:** Descriptive name of the listing.
3. **host_id:** Unique identifier for each host.
4. **host_name:** Name of the host.
5. **neighbourhood_group:** The borough or area of the listing.
6. **neighbourhood:** The specific neighborhood within the borough.
7. **latitude:** Latitude coordinates of the listing.
8. **longitude:** Longitude coordinates of the listing.
9. **room_type:** Type of room (e.g., Entire home/apt, Private room, Shared room).
10. **price:** Price per night for the listing.
11. **minimum_nights:** Minimum number of nights required to book the listing.
12. **number_of_reviews:** Total number of reviews for the listing.
13. **last_review:** Date of the last review.
14. **reviews_per_month:** Average number of reviews per month.
15. **calculated_host_listings_count:** Number of listings by the host.
16. **availability_365:** Number of days the listing is available in a year.
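
Since `last_review` is described as a date, it may be worth parsing it into a datetime type before any time-based analysis. This is a hedged sketch on made-up rows; treating a missing `last_review` as "never reviewed" is an assumption, not something the dataset description states.

```python
import pandas as pd

# Made-up rows for illustration; only the column names come from the dataset.
sample = pd.DataFrame({
    "last_review": ["2019-05-21", None, "2018-11-03"],
    "number_of_reviews": [45, 0, 9],
})

# Parse the text dates; unparseable or missing entries become NaT.
sample["last_review"] = pd.to_datetime(sample["last_review"], errors="coerce")

# Assumption: a missing date means the listing has not been reviewed yet.
sample["never_reviewed"] = sample["last_review"].isna()
print(sample.dtypes)
```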



### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
# Display the count of unique values for each variable
for column in airbnb.columns:
    unique_count = airbnb[column].nunique()
    print(f"Number of unique values for {column}: {unique_count}")




## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

In [None]:
#Columns 'neighbourhood_group', 'neighbourhood', and 'room_type' are categorical variables, so it'll be more appropriate to convert them from an object to a category type.
airbnb['neighbourhood_group'] = airbnb['neighbourhood_group'].astype('category')
airbnb['neighbourhood'] = airbnb['neighbourhood'].astype('category')
airbnb['room_type'] = airbnb['room_type'].astype('category')

In [None]:
# Now we can see the unique categories in this column:
airbnb['neighbourhood_group'].cat.categories

In [None]:
airbnb.head(10)

In [None]:
airbnb.tail()

**1. What can we learn about different hosts and areas?**

In [None]:
host_areas = airbnb.groupby(['host_id','neighbourhood_group'])['calculated_host_listings_count'].max().reset_index()
host_areas.sort_values(by='calculated_host_listings_count',ascending=False).head(5)


In [None]:
from matplotlib import pyplot as plt
airbnb['host_id'].plot(kind='line', figsize=(8, 4), title='host_id')
plt.gca().spines[['top', 'right']].set_visible(False)

We found that the host Sonder (NYC) has the highest number of listings in Manhattan, followed by Blueground.


**2. What can we learn about room types and their prices by area?**

In [None]:
room_price_area_wise = airbnb.groupby(['neighbourhood_group','room_type'])['price'].max().reset_index()
room_price_area_wise.sort_values(by='price',ascending=False).head(10)

### What all manipulations have you done and insights you found?

After data wrangling, which included dropping some unnecessary columns, we found that the host Sonder (NYC) has the highest number of listings in Manhattan, followed by Blueground. We also found that guests prefer entire homes and private rooms over shared rooms.
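
The grouping above is keyed on `host_id`, so the result does not show host names directly. A minimal sketch of one way to carry `host_name` through the grouping follows; the hosts in `sample` are hypothetical placeholders, not hosts from the dataset.

```python
import pandas as pd

# Hypothetical hosts for illustration only.
sample = pd.DataFrame({
    "host_id": [1, 1, 1, 2, 2, 3],
    "host_name": ["Host A", "Host A", "Host A", "Host B", "Host B", "Host C"],
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Manhattan", "Queens"],
})

# Count listings per host and area, keeping the host name alongside the id.
listings = (
    sample.groupby(["host_id", "host_name", "neighbourhood_group"])
    .size()
    .reset_index(name="listings")
    .sort_values("listings", ascending=False)
)
print(listings.head(3))
```

Counting rows with `size()` is a different measure from the `calculated_host_listings_count` column used above; the two should be compared before drawing conclusions from either.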

In [None]:
#outliers

In [None]:
airbnb.describe()

Here are some observations from the output:

'price': The minimum price is 0, which is implausible for a bookable listing, and the maximum price is 10,000, which is far above the 75th percentile. It also has a high standard deviation.

'minimum_nights': The maximum value is 1,250, which is much higher than the 75th percentile.

'number_of_reviews': The maximum value is 629, which is much higher than the 75th percentile.

'reviews_per_month': The maximum value is 58.5, which is much higher than the 75th percentile.

'calculated_host_listings_count': The maximum value is 327, which is much higher than the 75th percentile.
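
To turn "much higher than the 75th percentile" into an explicit rule, one common option is the 1.5 × IQR fence, which is also the default whisker rule in seaborn box plots. The sketch below applies it to a short made-up price series; the numbers are illustrative and are not taken from the dataset.

```python
import pandas as pd

# Made-up prices for illustration, including a zero and one extreme value.
prices = pd.Series([0, 60, 75, 90, 100, 120, 150, 175, 200, 10000], name="price")

# Interquartile range and the usual 1.5 * IQR fences.
q1, q3 = prices.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers.
outliers = prices[(prices < lower) | (prices > upper)]
print(f"fences: [{lower}, {upper}]")
print(outliers)
```

In this sample the zero price falls inside the fences, so an IQR rule alone would not flag it. Implausible values such as a price of 0 likely need a separate, explicit check.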

In [None]:
outlier_cols = [
    'price', 'minimum_nights', 'number_of_reviews', 'reviews_per_month', 'calculated_host_listings_count'
]

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
for col in outlier_cols:
    plt.figure(figsize=(5, 5))
    sns.boxplot(x=airbnb[col])
    plt.show()

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1
**Which room types attract the most guests?**



In [None]:
# Chart - 1 visualization code
# Count listings per room type directly from the dataset instead of a hand-typed list
room_counts = airbnb['room_type'].value_counts()

plt.bar(room_counts.index.astype(str), room_counts.values, color='green', edgecolor='blue')
plt.title('Room Types')
plt.xlabel('Room Type')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is a common way to visualize the distribution of categorical data or the relationship between a categorical variable and a numerical variable.

##### 2. What is/are the insight(s) found from the chart?

We found that Entire home/apt is the most common room type overall, and that prices for entire homes/apartments are highest in Brooklyn and Manhattan.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact Insight:

One positive insight could be that "Entire home/apartment" listings in high-demand neighborhoods (e.g., Manhattan) tend to have higher prices compared to other room types. This could indicate that hosts offering entire homes in popular areas are able to command premium prices, possibly due to the convenience and exclusivity of having an entire living space to oneself. This insight may lead to strategies such as investing in marketing for entire home/apartment listings in these high-demand neighborhoods, potentially maximizing revenue.

Negative Growth Insight:

On the flip side, a potentially negative insight could be that "Shared room" listings across various neighborhoods have consistently lower prices, and the demand for such listings is not as strong. This could suggest that shared rooms may not be as appealing to Airbnb guests, leading to lower occupancy rates and potentially limiting revenue opportunities. Hosts might consider re-evaluating the pricing strategy for shared rooms or explore ways to enhance the appeal of shared accommodations to attract more guests.
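
The price claims above can be checked with a per-room-type summary. This is a sketch on made-up rows, using the median rather than the maximum because the median is less sensitive to the extreme prices noted earlier; that choice is an assumption, not the notebook's method.

```python
import pandas as pd

# Made-up listings for illustration only.
sample = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Private room", "Shared room", "Entire home/apt"],
    "price": [200, 180, 80, 70, 40, 220],
})

# Number of listings and median price for each room type.
by_room = (
    sample.groupby("room_type")["price"]
    .agg(listings="count", median_price="median")
    .sort_values("listings", ascending=False)
)
print(by_room)
```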

#### Chart - 2
**Is there a relationship between the number of reviews a neighborhood group has received and its location in the city?**

In [None]:
# Chart - 2 visualization code
area_reviews = airbnb.groupby(['neighbourhood_group'])['number_of_reviews'].max().reset_index()
area_reviews

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

# Create a bar chart of the number of reviews for each neighborhood group
plt.bar(area_reviews['neighbourhood_group'], area_reviews['number_of_reviews'])
plt.xlabel('Neighbourhood group')
plt.ylabel('Number of reviews')
plt.title('Number of reviews by neighbourhood group')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is a common way to visualize the distribution of categorical data or the relationship between a categorical variable and a numerical variable

##### 2. What is/are the insight(s) found from the chart?

Queens received the highest number of reviews, followed by Manhattan and Brooklyn.
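
Note that the chart code takes the `max()` per group, so it ranks areas by their single most-reviewed listing. A hedged sketch with made-up numbers shows how that ranking can differ from one based on the total or the mean:

```python
import pandas as pd

# Made-up review counts for illustration only.
sample = pd.DataFrame({
    "neighbourhood_group": ["Queens", "Queens", "Manhattan",
                            "Manhattan", "Manhattan", "Brooklyn"],
    "number_of_reviews": [600, 5, 300, 250, 200, 100],
})

# Compare three aggregations side by side.
stats = sample.groupby("neighbourhood_group")["number_of_reviews"].agg(["max", "sum", "mean"])
print(stats)
print("Top by max:", stats["max"].idxmax())
print("Top by sum:", stats["sum"].idxmax())
```

In this sample, one area leads on the maximum while another leads on the total, so it may be worth stating which aggregation an insight is based on.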

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Impact:**

1. **Identifying Hotspots:** High review counts in specific neighborhood groups indicate potential business hotspots, aiding targeted investments and expansion.

2. **Customer Engagement:** Increased reviews suggest active customer engagement, fostering positive word-of-mouth and attracting more clients.

**Negative Growth:**

1. **Low Activity Areas:** Consistently low review counts in certain neighborhoods may signal reduced business activity, posing challenges for operations in those areas.

2. **Negative Feedback:** High counts of negative reviews can erode customer trust, impacting satisfaction and potentially leading to a decline in business.

**Justification:**

The specifics of positive and negative impacts depend on a thorough analysis of the dataset, considering factors like average ratings, review distribution, and correlation with external influences.

#### Chart - 3
**What can we learn about different hosts and areas?**

In [None]:
# Chart - 3 visualization code

host_areas = airbnb.groupby(['host_id', 'neighbourhood_group'])['calculated_host_listings_count'].max().reset_index()
host_areas.sort_values(by='calculated_host_listings_count', ascending=False).head(5)

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

# Top five host/area pairs by listing count, taken from 'host_areas' above
top_hosts = host_areas.sort_values(by='calculated_host_listings_count', ascending=False).head(5)

# Label each bar with host_id and area so hosts in the same area do not overlap
labels = top_hosts['host_id'].astype(str) + ' (' + top_hosts['neighbourhood_group'].astype(str) + ')'

# Create a horizontal bar chart
plt.barh(labels, top_hosts['calculated_host_listings_count'], color='skyblue')
plt.xlabel('Calculated Host Listings Count')
plt.ylabel('Host ID (neighbourhood group)')
plt.title('Top Hosts by Listings Count')
plt.show()


##### 2. What is/are the insight(s) found from the chart?

We found that the host Sonder (NYC) has the highest number of listings in Manhattan, followed by Blueground.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


**Positive Business Impact:**

***Business Collaboration***: Identifying top hosts offers collaboration opportunities, fostering strategic partnerships for business expansion.

***Market Optimization***: Recognizing dominant hosts in neighborhoods enables tailored strategies, optimizing operations for market understanding.

**Negative Growth Concern:**

***Overdependence Risk***: A single dominant host poses an overdependence risk. Dependency on a few hosts may lead to negative growth if they face issues or reduce listings.

***Intense Competition Challenges***: High host counts indicate intense competition. Fierce competition may challenge pricing and customer acquisition, impacting profit margins.

***Lack of Diversity Warning***: Dominant hosts suggest a lack of market diversity. This limits market options and creates vulnerability if the dominating host encounters challenges.

#### Chart - 4
**What can we learn about room types and their prices by area?**

In [None]:
# Chart - 4 visualization code
#What we learn from room type and their prices according to area?
room_price_area_wise = airbnb.groupby(['neighbourhood_group','room_type'])['price'].max().reset_index()
room_price_area_wise.sort_values(by='price', ascending=False).head(10)

In [None]:
neighbourhood_group = ['Brooklyn','Manhattan','Queens','Manhattan','Brooklyn','Staten Island','Queens','Bronx','Queens','Bronx']
room_type = ['Entire home/apt','Entire home/apt','Private room','Private room','Private room','Entire home/apt','Entire home/apt','Private room','Shared room','Entire home/apt']

room_dict = {}

for i in room_type:
  room_dict[i] = room_dict.get(i,0)+1

plt.bar(room_dict.keys(), room_dict.values(), color='green', edgecolor='blue')
plt.title('Room Types')
plt.xlabel('Room Type')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is a common way to visualize the distribution of categorical data or the relationship between a categorical variable and a numerical variable.

##### 2. What is/are the insight(s) found from the chart?

The graph shows the popularity of room types by location.
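
To look at price by both area and room type in one view, a pivot table is one option. The sketch below uses made-up rows and the median as the aggregate; both are assumptions for illustration, whereas the notebook's own grouping uses `max()`.

```python
import pandas as pd

# Made-up listings for illustration only.
sample = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Brooklyn"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Entire home/apt", "Private room", "Private room"],
    "price": [250, 150, 100, 150, 60, 80],
})

# Rows are areas, columns are room types, cells hold the median price.
table = sample.pivot_table(
    index="neighbourhood_group",
    columns="room_type",
    values="price",
    aggfunc="median",
)
print(table)
```

A table in this shape can be passed directly to `sns.heatmap` if a visual comparison is preferred.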

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:
Popular Room Type: The dominance of 'Entire home/apt' indicates a high demand for complete accommodations, presenting an opportunity for hosts offering such units to meet market preferences.

Negative Impact:
Limited Diversity: If the majority of listings are 'Entire home/apt,' it may suggest a lack of diversity in room types. This could limit options for budget-conscious travelers seeking alternatives.

Competitive Challenges: High demand for 'Entire home/apt' may lead to increased competition among hosts offering such accommodations, potentially affecting pricing strategies and profit margins.

#### Chart - 5
**What can we learn from the data? (e.g., locations, prices, reviews)**

In [None]:
# Chart - 5 visualization code

area_reviews = airbnb.groupby(['neighbourhood_group'])['number_of_reviews'].max().reset_index()
area_reviews

In [None]:
area = area_reviews['neighbourhood_group']
review = area_reviews['number_of_reviews']
fig = plt.figure(figsize =(10,5))

plt.bar(area, review, color = "blue", width =0.5)
plt.xlabel('Area')
plt.ylabel('Review')
plt.title("Number of reviews in terms of area")
plt.show()

##### 2. What is/are the insight(s) found from the chart?

The chart shows the maximum number of reviews received by a single listing in each neighborhood group. Because it uses the same aggregation as Chart 2, the ranking is the same: Queens is highest, followed by Manhattan and Brooklyn.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:
Manhattan Dominance: The high number of reviews in Manhattan suggests a thriving market, potentially indicating a strong demand for accommodations. This can be advantageous for businesses targeting this popular area.

Opportunity for Improvement: Areas with lower review counts might present opportunities for improvement and targeted marketing efforts to boost customer engagement.

Negative Impact:
Overcrowded Market: While high reviews in Manhattan are positive, it could also mean increased competition, potentially making it challenging for new or smaller businesses to gain visibility.

Underperformance in Some Areas: Lower review counts in certain neighborhood groups may indicate underperformance or less popularity, posing challenges for businesses in those areas.

#### Chart - 6
**Price VS No of reviews**

In [None]:
# Chart - 6 visualization code


price_area = airbnb.groupby(['price'])['number_of_reviews'].max().reset_index()
price_area.head(10)

In [None]:
price_list = price_area['price']
review = price_area['number_of_reviews']
fig = plt.figure(figsize = (10,5))

plt.scatter(price_list, review)
plt.xlabel('Price')
plt.ylabel('Number of reviews')
plt.title('Number of reviews vs price')
plt.show()

##### 1. Why did you pick the specific chart?

This type of plot is useful for understanding the nature of the relationship between the variables and for identifying any outliers or clusters in the data.

##### 2. What is/are the insight(s) found from the chart?

This visualization suggests that lower-priced listings tend to attract more guests and receive more reviews, while high-priced listings receive few. The scatter plot also shows that there is no clear linear correlation between price and number of reviews.
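
A rank correlation can put a number on how price and review count move together without assuming a linear relationship. This is a minimal sketch on made-up values, using Spearman's coefficient through `DataFrame.corr`; it is an illustrative check, not part of the original analysis.

```python
import pandas as pd

# Made-up prices and review counts for illustration only.
sample = pd.DataFrame({
    "price": [50, 80, 100, 150, 300, 1000],
    "number_of_reviews": [120, 40, 200, 15, 60, 2],
})

# Spearman correlation works on ranks, so the extreme price does not dominate.
rho = sample[["price", "number_of_reviews"]].corr(method="spearman").loc[
    "price", "number_of_reviews"
]
print(f"Spearman correlation: {rho:.2f}")
```

A value near zero would support the "no clear correlation" reading, while a clearly negative value would support the observation that cheaper listings collect more reviews.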


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:
Diverse Price-Review Distribution: The scatter plot highlights a diverse distribution of reviews across different price ranges, indicating that various accommodations, regardless of price, receive attention and reviews.

Potential for Competitive Pricing: Businesses can identify a range of price points that attract reviews, allowing for strategic pricing decisions to remain competitive and appealing to a broader audience.

Negative Impact:
No Clear Correlation: The absence of a clear correlation between price and reviews suggests that higher prices do not necessarily guarantee more reviews. This may pose challenges for businesses relying solely on higher pricing strategies for success.

Competitive Challenges: Accommodations at various price levels may face intense competition, making it crucial for businesses to differentiate themselves through other factors such as quality, amenities, or unique offerings.

#### Chart - 7
**Which hosts are the busiest, and what is the reason behind it?**

In [None]:
# Chart - 7 visualization code


busy_hosts = airbnb.groupby(['host_id','room_type'])['number_of_reviews'].max().reset_index()
busy_hosts = busy_hosts.sort_values(by = 'number_of_reviews', ascending = False).head(10)
busy_hosts

In [None]:
from matplotlib import pyplot as plt
busy_hosts.plot(kind='scatter', x='host_id', y='number_of_reviews', s=32, alpha=.8)
plt.gca().spines[['top', 'right',]].set_visible(False)

##### 2. What is/are the insight(s) found from the chart?

The scatter plot shows the number of reviews for the ten busiest hosts, indicating that some hosts receive significantly more reviews than others.
The top busiest hosts are as follows:
Dona, ji, Maya, Carol, Danielle
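
The grouping above ranks hosts by their single most-reviewed listing. As a hedged alternative, "busiest" could also be read as the total number of reviews across all of a host's listings. The sketch below compares both on hypothetical hosts:

```python
import pandas as pd

# Hypothetical hosts and review counts for illustration only.
sample = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 3, 3],
    "host_name": ["Host A", "Host A", "Host B", "Host C", "Host C", "Host C"],
    "number_of_reviews": [400, 50, 420, 200, 150, 150],
})

# Total reviews per host next to the count for the host's top listing.
busy = (
    sample.groupby(["host_id", "host_name"])["number_of_reviews"]
    .agg(total="sum", top_listing="max")
    .sort_values("total", ascending=False)
)
print(busy)
```

Here the host with the most-reviewed single listing is not the host with the most reviews overall, so the definition of "busiest" affects the ranking.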

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:
Top Performers Recognition: High review counts for specific hosts highlight their popularity and success, offering an opportunity for recognition and potential collaborations.

Business Expansion: Hosts with consistently high reviews may consider expanding their offerings or services, leveraging their positive reputation to attract more guests.

Negative Impact:
Market Dominance Concerns: If a few hosts dominate the reviews, it may raise concerns about market diversity and potential challenges for newer hosts to gain visibility.

Risk of Dependency: Businesses overly reliant on a small number of hosts may face risks if these hosts experience a downturn or decide to reduce their listings.

#### Chart - 8
**Which hosts are charging higher price?**

In [None]:
# Chart - 8 visualization code


Highest_price = airbnb.groupby(['host_id','room_type','neighbourhood_group'])['price'].max().reset_index()
Highest_price = Highest_price.sort_values(by = 'price', ascending = False).head(10)
Highest_price

In [None]:
# The grouping above has no host_name column, so label the bars by host_id
name_of_host = Highest_price['host_id'].astype(str)
price_charge = Highest_price['price']

fig = plt.figure(figsize =(10,5))

plt.bar(name_of_host,price_charge,color ='orange', width = 0.5)
plt.xlabel('Host ID')
plt.ylabel('Price')
plt.xticks(rotation=45)
plt.title("Hosts with maximum price charges")
plt.show()

##### 2. What is/are the insight(s) found from the chart?

The 10 hosts charging the maximum price are as follows:
Jelena, Kathrine, Erin, Matt, Olson, Amy, Rum, Jessica, Sally, Jack.
The maximum price is 10,000 USD.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The chart highlights the hosts with the highest price charges, signaling potential benefits in terms of premium positioning and profitability. However, hosts should carefully consider market exclusivity and maintain competitiveness to ensure sustained success.

#### Chart - 9
**Is there any traffic difference among different areas and what could be the reason for it?**

In [None]:
# Chart - 9 visualization code


airbnb.head(10)


In [None]:
traffic_areas = airbnb.groupby(['neighbourhood_group','room_type'])['minimum_nights'].count().reset_index()
traffic_areas = traffic_areas.sort_values(by = 'minimum_nights', ascending = False).head(10)
traffic_areas

In [None]:
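# Label each bar with area and room type; count() above gives the number of listings
areas_Traffic = traffic_areas['neighbourhood_group'].astype(str) + ' / ' + traffic_areas['room_type'].astype(str)
room_stayed = traffic_areas['minimum_nights']

fig = plt.figure(figsize = (10,5))
plt.bar(areas_Traffic,room_stayed,color ="blue",width = 0.5)

plt.xlabel("Neighbourhood group / room type")
plt.ylabel("Number of listings")
plt.xticks(rotation=45, ha='right')
plt.title("Traffic by area and room type (listing counts)")
plt.show()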

##### 2. What is/are the insight(s) found from the chart?

Most guests are likely to stay in entire homes and private rooms, which are concentrated in Manhattan, Brooklyn, and Queens. Visitors also prefer rooms with a lower listing price.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact:
Stay Duration Understanding: The chart provides insights into the minimum nights booked for different room types, aiding hosts in understanding guest preferences for stay duration in high-traffic areas.

Strategic Planning: Hosts can use this information to strategically plan their offerings, adjusting minimum night requirements based on room types and demand patterns.

Negative Impact:
Potential Booking Constraints: If minimum night requirements are too high, it may limit bookings, especially for shorter stays. Hosts should carefully balance requirements to avoid deterring potential guests.

Room-Type Disparities: Significant differences in minimum nights booked across room types may highlight disparities in demand. Hosts should consider adjusting their offerings to align with market preferences.

#### Correlation Heatmap
**What is the correlation between different variables?**

In [None]:
# Chart - 10 visualization code


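# Restrict to numeric columns; recent pandas versions raise an error on text columns
corr = airbnb.select_dtypes(include='number').corr(method = 'kendall')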
fig = plt.figure(figsize = (12,6))
sns.heatmap(corr, annot = True)
airbnb.columns

##### 1. Why did you pick the specific chart?

We computed the Kendall rank correlation matrix for the Airbnb dataset and chose a heatmap because it shows the correlations between all pairs of variables in a single view. The heatmap is drawn with the seaborn library.
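
A correlation matrix can also be read as a ranked list of variable pairs, which makes the strongest relationships easier to report. The sketch below is illustrative only: it uses made-up rows and swaps in Spearman for Kendall (both are rank-based) so that the example depends on pandas and NumPy alone.

```python
import numpy as np
import pandas as pd

# Made-up listings for illustration only.
sample = pd.DataFrame({
    "room_type": ["Private room", "Entire home/apt", "Shared room",
                  "Private room", "Entire home/apt"],
    "price": [80, 200, 40, 90, 250],
    "number_of_reviews": [10, 50, 0, 30, 5],
    "reviews_per_month": [0.5, 2.1, 0.0, 1.4, 0.3],
})

# Keep numeric columns only, then compute a rank-based correlation matrix.
numeric = sample.select_dtypes(include="number")
corr = numeric.corr(method="spearman")

# Keep each pair once (upper triangle) and sort by absolute strength.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values(key=abs, ascending=False)
print(pairs)
```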

##### 2. What is/are the insight(s) found from the chart?

The correlation heatmap shows how strongly each pair of numeric variables is related, which can point to both positive and negative impacts on the business.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Leveraging insights from the correlation heatmap can empower businesses to make data-driven decisions, enhance customer experiences, and adapt to market dynamics. Positive impacts include informed decision-making and strategic offerings, while negative impacts underscore the importance of risk mitigation and adaptation to changing business conditions.

#### Count Plot
**What is the room count across NYC for each room type?**

In [None]:
# Chart - 11 visualization code
# What is the room count across NYC for each room type?

plt.rcParams['figure.figsize'] = (8,5)
ax = sns.countplot(y='room_type',hue='neighbourhood_group',data=airbnb,palette='bright')

# Annotate each bar with its share of all listings
total = len(airbnb['room_type'])
for p in ax.patches:
    percentage = '{:.1f}%'.format(100 * p.get_width()/total)
    x = p.get_x() + p.get_width() + 0.02
    y = p.get_y() + p.get_height()/2
    ax.annotate(percentage, (x,y))

plt.title('Count of each room type in NYC')
plt.xlabel('Number of listings')
plt.xticks(rotation=90)
plt.ylabel('Room type')
plt.show()

##### 1. Why did you pick the specific chart?

It allows quick comparisons of room type prevalence in different areas, aiding in understanding the diversity of accommodations available.

##### 2. What is/are the insight(s) found from the chart?

The chart provides valuable insights into the room type distribution in different neighborhood groups in NYC, empowering hosts and businesses to make informed decisions and tailor their offerings to meet market preferences effectively.
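
The percentages annotated on the chart can also be produced as a table, which is easier to quote in a report. This is a minimal sketch on made-up rows using `pd.crosstab` with `normalize="all"`, so that each cell is a share of all listings:

```python
import pandas as pd

# Made-up listings for illustration only.
sample = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Brooklyn", "Queens"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Entire home/apt", "Private room",
                  "Entire home/apt", "Private room", "Private room", "Shared room"],
})

# Share of all listings for each room type and area, in percent.
share = (
    pd.crosstab(sample["room_type"], sample["neighbourhood_group"], normalize="all")
    .mul(100)
    .round(1)
)
print(share)
```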

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

1. **Optimize Pricing:**
   - Implement dynamic pricing algorithms.
   - Provide hosts with market-based pricing recommendations.
   - Encourage competitive pricing with incentives.

2. **Enhance User Experience:**
   - Address review-based insights to improve satisfaction.
   - Implement features aligned with user preferences.
   - Focus on user education for a positive experience.

3. **Geographical Expansion:**
   - Prioritize expansion in high-demand regions.
   - Tailor marketing to target demographics.
   - Establish local partnerships for a better user experience.

4. **Host Engagement and Support:**
   - Develop robust support with resources and responsive service.
   - Introduce recognition programs for engaged hosts.
   - Gather feedback for continual improvement.

5. **Marketing and Acquisition:**
   - Refine marketing using effective channels.
   - Utilize targeted advertising for new users.
   - Leverage social media and influencers for brand visibility.

6. **Competitive Edge:**
   - Monitor competitors for emerging trends.
   - Benchmark against industry leaders.
   - Innovate based on competitor insights.

7. **Data-Driven Decisions:**
   - Foster a culture of data-driven decision-making.
   - Conduct regular data reviews and strategy sessions.
   - Invest in data literacy training for stakeholders.

Implementing these concise recommendations will contribute to achieving business objectives, driving growth, and maintaining competitiveness in the online accommodation marketplace. Regular reassessment based on data and market trends is key for sustained success.

# **Conclusion**

This project involves optimizing Airbnb's business strategies based on a thorough analysis of the 2019 dataset. By implementing dynamic pricing, enhancing user experience, strategically expanding into high-demand regions, engaging hosts effectively, refining marketing approaches, staying competitive through continuous analysis, and fostering a data-driven culture, Airbnb aims to drive growth, improve customer satisfaction, and maintain a strong position in the competitive short-term rental market. Regular reassessment and adaptation to market trends will be crucial for sustained success.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***