<a href="https://colab.research.google.com/github/SouvikChakraborty472/Exploratory_Data_Analysis/blob/main/EDA_AirBnb_Bookings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Exploratory Data Analysis of Airbnb Bookings



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Member -** Souvik Chakraborty


# **Project Summary -**

Since 2008, Airbnb has transformed travel by offering a unique and personalized way for guests and hosts to connect, enhancing the travel experience globally. Today, Airbnb is a widely recognized and utilized service worldwide.

The millions of listings on Airbnb generate vast amounts of data, which are pivotal for the company. Analyzing this data is essential for:

1. **Security**: Ensuring the safety of both guests and hosts by identifying and mitigating potential risks.
2. **Business Decisions**: Informing strategic decisions to improve operations, expand services, and enhance user experience.
3. **Understanding Behavior**: Gaining insights into the behaviors and preferences of both guests and hosts to tailor services and improve satisfaction.
4. **Performance Analysis**: Assessing the performance of listings and hosts to identify best practices and areas for improvement.
5. **Marketing Initiatives**: Guiding targeted marketing campaigns and promotional efforts to attract and retain users.
6. **Innovative Services**: Implementing new and innovative services that add value to the platform and address user needs.

Through comprehensive data analysis, Airbnb can continue to innovate, improve its offerings, and maintain its position as a leader in the travel and hospitality industry.

This dataset contains approximately 49,000 observations with 16 columns, comprising both categorical and numeric values. This diverse data can provide a comprehensive overview of various aspects of Airbnb listings, allowing for detailed analysis and insights into trends, behaviors, and performance metrics across the platform.

# **GitHub Link -**

https://github.com/SouvikChakraborty472/Exploratory_Data_Analysis/blob/main/EDA_AirBnb_Bookings.ipynb

# **Problem Statement**


To explore and analyze the dataset effectively, we apply a range of data-analysis and visualization techniques to answer the following questions:

1. What can we learn about different hosts and areas?
2. What can we learn from room types and their prices in each area?
3. What can we learn from the data (e.g., locations, prices, reviews)?
4. Which hosts are the busiest, and why?
5. Which hosts charge higher prices?
6. Is there any difference in traffic among areas, and what could explain it?
7. What is the correlation between the different variables?
8. How many listings of each room type are there across NYC?

#### **Define Your Business Objective?**

### Business Objective

The business objectives for enhancing Airbnb's operations, host performance, and customer satisfaction are as follows:

1. **Develop Targeted Marketing Strategies:**
   - Promote Manhattan listings as premium options for affluent travelers and business professionals.
   - Highlight Brooklyn’s cultural appeal for younger travelers and solo adventurers.
   - Emphasize affordability and unique local experiences in Queens and the Bronx.

2. **Enhance Host Support and Development:**
   - Provide Manhattan hosts with resources to maintain luxury standards.
   - Assist Brooklyn hosts in marketing private rooms effectively.
   - Guide Queens and Bronx hosts in emphasizing affordability and local charm.

3. **Improve User Experience:**
   - Enhance search functionality for better accommodation filtering.
   - Implement personalized recommendations using machine learning algorithms.

4. **Guide Investment and Property Development:**
   - Offer insights to investors on demand trends for different room types and locations.
   - Encourage the development of new properties in high-demand areas.

5. **Enhance Customer Service:**
   - Improve query resolution times and ensure prompt issue resolution.
   - Provide proactive customer support to address potential issues early.

6. **Implement Dynamic Pricing Strategies:**
   - Use data analytics to optimize pricing based on demand, seasonality, and competition.
   - Offer targeted discounts and promotions to attract guests during low-demand periods.

7. **Establish Loyalty Programs:**
   - Develop loyalty programs for repeat guests with benefits like discounts and priority bookings.
   - Recognize and reward high-performing hosts with incentives and recognition.

8. **Introduce Innovation and Additional Services:**
   - Launch innovative services like curated local experiences and personalized travel itineraries.
   - Explore additional revenue streams through travel insurance, local partnerships, and premium services.

These objectives aim to strategically enhance Airbnb's market position, improve host and guest experiences, and drive growth through targeted initiatives and innovations.







# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception handling, production-grade code and deployment-ready code will be a plus. Those students will be awarded additional credits.

     The additional credits will give them an advantage over other students during Star Student selection.

     [ Note: Deployment-ready code means the whole .ipynb notebook should execute in one go without a single error logged. ]

3.   Every piece of logic should have proper comments.
4.   You may add as many charts as you want. Make sure each chart answers the following questions:


```
# Chart visualization code
```


*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical and meaningful charts with important insights.


[ Hints: Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

### Dataset Loading

In [None]:
# Mount Drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
file_path = '/content/drive/My Drive/AirbnbNYC.csv'
# Load the CSV file
df = pd.read_csv(file_path)
df

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
num_rows, num_columns = df.shape

print("Number of rows:", num_rows)
print("Number of columns:", num_columns)

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
num_duplicates = df.duplicated().sum()
print("Number of duplicates:", num_duplicates)

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
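# Filling missing values with a placeholder
# (assign the result back; inplace=True on a column selection is unreliable under pandas Copy-on-Write)
df['name'] = df['name'].fillna('Absent')
df['host_name'] = df['host_name'].fillna('Absent')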
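A pattern that keeps working under pandas' Copy-on-Write behaviour is to assign the result of `fillna` back to the column instead of calling it with `inplace=True` on `df['col']`. The sketch below uses a small made-up frame (not the Airbnb data) and also reports the percentage of missing values per column, which is useful before deciding how to fill them.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the values are made up, not taken from the Airbnb dataset
sample = pd.DataFrame({
    "name": ["Cozy loft", None, "Sunny room", None],
    "host_name": ["Ana", "Ben", None, "Dee"],
    "price": [120, 80, 95, np.nan],
})

# Percentage of missing values per column
missing_pct = sample.isnull().mean() * 100

# Fill the text columns by assigning the result back (no inplace on a column selection)
for col in ["name", "host_name"]:
    sample[col] = sample[col].fillna("Absent")

print(missing_pct.round(1))
print(sample)
```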

In [None]:
# Visualizing the missing values
sns.heatmap(df.isnull())

### What did you know about your dataset?

The Airbnb dataset, with 48,895 observations and 16 columns, is a valuable resource for data analysis. It contains a mix of categorical and numeric data. Data like this helps Airbnb make informed decisions about security, business strategy, customer and host behavior, and platform performance. From it, I can gain insights that guide marketing efforts, the development of new services, and more. It is a rich source of information for understanding and optimizing the Airbnb experience for both guests and hosts, contributing to Airbnb's global recognition as a unique and personalized travel service since 2008.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Drop columns not used in this analysis: coordinates and review-date fields
df.drop(['latitude','longitude','last_review','reviews_per_month'],axis=1,inplace=True)

In [None]:
# Dataset Describe
df.describe()

The summary statistics show that the average price is 152.72, the average minimum stay is 7.03 nights, and listings have 23.27 reviews on average. On average, a listing's host has 7.14 listings, and listings are available for booking on 112.78 days per year.

Most importantly, the minimum price is zero and the maximum price is 10,000. Something isn't right with the data, so we need to look into this issue and check for outliers.

In [None]:
# Checking for outliers
df.agg({'price':['min','mean','median','max','count']})

The minimum value is zero, which does not make sense since listings on Airbnb are not free, and the maximum value of 10,000 just seems too high. Notice how the mean is considerably higher than the median, an indication that the data is right-skewed.
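To see why a mean well above the median signals right skew, consider a toy set of nightly prices (illustrative numbers only, not from the dataset) in which one luxury listing sits far above the rest:

```python
import pandas as pd

# Illustrative nightly prices in USD; one extreme value mimics a luxury listing
prices = pd.Series([60, 75, 80, 90, 100, 110, 10000])

mean_price = prices.mean()
median_price = prices.median()

# A single outlier drags the mean far above the median, and the skewness is positive
print(f"mean={mean_price:.2f}, median={median_price:.2f}, skew={prices.skew():.2f}")
```

The median (90) barely notices the 10,000 value, while the mean jumps above 1,500, which is why the median is the safer summary for this column.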

Let's use a boxplot to get a better understanding of price distribution:

In [None]:
# Plotting the boxplot of the price data (set the theme before drawing so it applies)
sns.set_theme(style='white')
plt.figure(figsize=(10,5))
ax = sns.boxplot(y='price', data=df)
ax.set_title('Price Distribution')
plt.ylabel('Price')
plt.show()

In [None]:
# As expected, there are outliers in the data.
# To handle them, we use quantile-based trimming: listings outside the 10th-90th percentile range are removed.
# First, let's print the 10th and 90th percentiles of the price column.
# Low quantile
q_low = df['price'].quantile(0.10)
print(q_low)
# High quantile
q_high = df['price'].quantile(0.90)
print(q_high)


The 10th percentile is 49 USD and the 90th percentile is 269 USD. Since we are dealing with Airbnb listings, it is not uncommon for a few places, especially luxurious ones, to be much more expensive than others.

Because we will be computing statistics on this data and the mean is sensitive to outliers, let's remove the listings whose price falls outside the range of 49 USD to 269 USD. Note that this drops those rows rather than capping their values, so up to roughly 20% of the listings (those below the 10th or above the 90th percentile) are excluded from the rest of the analysis.
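Dropping rows (trimming) and quantile-based flooring and capping are different treatments: capping keeps every row and clips values to the percentile bounds with `Series.clip`. A sketch comparing the two on illustrative prices (not the real data), chosen so the 10th and 90th percentiles land exactly on data points:

```python
import pandas as pd

# Illustrative prices, including a zero and an extreme value like those in the raw data
prices = pd.Series([0, 45, 60, 80, 100, 150, 200, 260, 280, 300, 10000])

q_low = prices.quantile(0.10)
q_high = prices.quantile(0.90)

# Trimming: drop rows outside the [q_low, q_high] range
trimmed = prices[prices.between(q_low, q_high)]

# Flooring and capping: keep every row, but clip values to the percentile bounds
capped = prices.clip(lower=q_low, upper=q_high)

print(q_low, q_high)
print(len(trimmed), len(capped))
```

Trimming shrinks the sample, while capping keeps all 11 rows but replaces the 0 and the 10000 with the bounds; which one is preferable depends on whether the extreme rows should still count towards group sizes.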

In [None]:
#removing the values below 10th percentile and above 90th percentile
df = df.drop(df[df['price']<q_low].index)
df = df.drop(df[df['price']>q_high].index)

In [None]:
# Let's create a new boxplot and check the result
sns.set_theme(style='white')
plt.figure(figsize=(10,5))
ax = sns.boxplot(y='price', data=df)
ax.set_title('Price Distribution after removing outliers')
plt.ylabel('Price')
plt.show()

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for i in df.columns:
  print("No. of unique values in ",i,"is",df[i].nunique())

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Highest number of listings per host in each neighbourhood group
host_areas = df.groupby(['host_name','neighbourhood_group'])['calculated_host_listings_count'].max().reset_index()
host_areas.sort_values(by='calculated_host_listings_count',ascending=False)

We find that the host Sonder(NYC) has the highest number of listings in Manhattan, followed by Blueground.

In [None]:
# Maximum price of each room type in each neighbourhood group
room_price_area_wise = df.groupby(['neighbourhood_group','room_type'])['price'].max().reset_index()
room_price_area_wise.sort_values(by='price', ascending=False)

In [None]:
# Maximum number of reviews received by a single listing in each neighbourhood group
area_reviews = df.groupby(['neighbourhood_group'])['number_of_reviews'].max().reset_index()
area_reviews
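Note that `max()` returns the review count of the single most-reviewed listing in each group, not the group's total. A toy comparison (made-up numbers) shows that the two aggregations can rank areas differently:

```python
import pandas as pd

# Made-up listings: each row is one listing with its review count
listings = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Queens", "Queens"],
    "number_of_reviews": [10, 20, 30, 50, 5],
})

# Aggregate the same column both ways
by_area = listings.groupby("neighbourhood_group")["number_of_reviews"].agg(["max", "sum"])
print(by_area)
```

Here Queens wins on `max` (one listing with 50 reviews) while Manhattan wins on `sum` (60 reviews in total), so the choice of aggregation should match the question being asked.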

In [None]:
# Busiest hosts (by number of reviews) and their room types
busy_hosts = df.groupby(['host_id','host_name','room_type'])['number_of_reviews'].max().reset_index()
busy_hosts = busy_hosts.sort_values(by = 'number_of_reviews', ascending =False).head(20)
busy_hosts

In [None]:
# Hosts charging the highest prices
Highest_price= df.groupby(['host_id','host_name','room_type','neighbourhood_group'])['price'].max().reset_index()
Highest_price= Highest_price.sort_values(by = 'price', ascending =False).head(20)
Highest_price

In [None]:
# Number of listings per neighbourhood group and room type (count of non-null minimum_nights values)
traffic_areas = df.groupby(['neighbourhood_group','room_type'])['minimum_nights'].count().reset_index()
traffic_areas = traffic_areas.sort_values(by ='minimum_nights',ascending = False).head(10)
traffic_areas
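Counting `minimum_nights` simply counts the non-null rows in each group, so `traffic_areas` holds the number of listings per group rather than any measure of nights. When that column has no missing values, `groupby(...).size()` gives the same numbers; a sketch on made-up rows:

```python
import pandas as pd

# Made-up rows standing in for listings
listings = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Brooklyn", "Manhattan", "Manhattan", "Manhattan"],
    "room_type": ["Private room", "Private room", "Entire home/apt", "Entire home/apt", "Private room"],
    "minimum_nights": [1, 30, 2, 3, 1],
})

keys = ["neighbourhood_group", "room_type"]

# .count() on a column counts its non-null values in each group
counted = listings.groupby(keys)["minimum_nights"].count()

# .size() counts rows per group directly, independent of any column
sized = listings.groupby(keys).size()

print((counted == sized).all())
```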

In [None]:
# Maximum number of reviews received by a listing at each price point
price_area = df.groupby(['price'])['number_of_reviews'].max().reset_index()
price_area.head(10)

### What all manipulations have you done and insights you found?

1. **Top Hosts in Manhattan:**
   - **Sonder(NYC)**: Highest number of listings.
   - **Blueground**: Second highest number of listings.

2. **Preferred Room Types:**
   - **Entire home/apt**: Highest number of listings overall.
   - **Price Trends**: Higher prices for entire home/apartments in Brooklyn and Manhattan.

3. **Guest Preferences:**
   - Lower-priced listings tend to have more reviews, which suggests guests prefer them.
   - Areas with high preference: Manhattan, Brooklyn, and Queens.

4. **Busiest Hosts:**
   1. Dona
   2. Ji
   3. Maya
   4. Carol
   5. Danielle

   - These hosts list room types like Entire home and Private room, which are highly preferred and have higher reviews.

5. **Hosts Charging Maximum Prices:**
   - Hosts: Jelena, Kathrine, Erin, Matt, Olson, Amy, Rum, Jessica, Sally, Jack
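   - Maximum price listed: $10,000 USD in the raw data (after the percentile trimming above, no remaining listing is priced above the 90th percentile of 269 USD)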

### Key Insights
- **Most Preferred Room Types**: Entire home and Private room.
- **Popular Areas**: Manhattan, Brooklyn, and Queens.
- **Guest Behavior**: Favor lower-priced listings with higher reviews.

This summary provides an overview of host activities, pricing trends, and guest preferences in the Airbnb market of NYC.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 Room count in NYC according to Room type

In [None]:
plt.rcParams['figure.figsize'] = (8, 5)
ax = sns.countplot(y='room_type', hue='neighbourhood_group', data=df, palette='bright')

# Annotate each bar with its share of all listings
total = len(df['room_type'])
for p in ax.patches:
    percentage = '{:.1f}%'.format(100 * p.get_width() / total)
    x = p.get_x() + p.get_width() + 0.02
    y = p.get_y() + p.get_height() / 2
    ax.annotate(percentage, (x, y))

plt.title('Count of each room type in NYC')
# Horizontal count plot: counts are on the x-axis, room types on the y-axis
plt.xlabel('Number of listings')
plt.ylabel('Room type')

plt.show()
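The percentage annotated on each bar is that bar's count divided by the total number of listings. The same table can be produced without reading plot patches by using `pd.crosstab` with `normalize='all'`; a sketch on made-up rows (in the notebook the input would be `df`):

```python
import pandas as pd

# Made-up listings standing in for df
listings = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Private room", "Shared room"],
    "neighbourhood_group": ["Manhattan", "Brooklyn", "Brooklyn", "Brooklyn", "Queens"],
})

# Share of all listings in each (room type, neighbourhood group) cell, as a percentage
share = pd.crosstab(listings["room_type"], listings["neighbourhood_group"], normalize="all") * 100
print(share.round(1))
```

Every cell is divided by the grand total, so the whole table sums to 100%, matching how the bar annotations are computed.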

##### 1. Why did you pick the specific chart?

Comparison: A count plot (bar chart) grouped by neighbourhood allows quick comparison between categories. For example, you can easily see which room type is the most or least frequent in each borough.

##### 2. What is/are the insight(s) found from the chart?

Manhattan has the most entire home/apt listings, about 27% of all listed properties, followed by Brooklyn with about 19.6%.

Brooklyn has the most private rooms, 20.7% of all listed properties, followed by Manhattan with 16.3% and Queens with 6.9%.

We can infer that Brooklyn, Queens and the Bronx have more private rooms, while Manhattan, which has the highest number of listings in NYC, has more entire homes/apartments.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Targeted Marketing Campaigns

Manhattan:

Luxury and Privacy: Market Manhattan listings as luxury options that offer entire homes/apartments for travelers seeking privacy and exclusivity. Highlight the unique, high-end experiences available in Manhattan.
Corporate Stays: Promote entire homes/apartments for business travelers who may prefer the convenience and privacy of having an entire space.

Brooklyn:

Community and Culture: Emphasize Brooklyn's cultural vibrancy and the communal experience of staying in private rooms. Target young travelers, solo adventurers, and budget-conscious guests.
Extended Stays: Position private rooms as ideal for longer stays, offering a home-like environment in a trendy neighborhood.

Queens and Bronx:

Affordability and Local Experience: Highlight the affordability and local charm of private rooms in Queens and the Bronx. Market these areas to travelers seeking a more authentic, local experience at a lower cost.


#### Chart - 2 Price distribution by neighbourhood groups

In [None]:
# Price distribution by neighbourhood groups
plt.figure(figsize=(12,8))
ax = sns.violinplot(x="neighbourhood_group", y="price", data=df)
ax.set_title('Price Distribution by neighbourhood groups')
plt.show()

##### 1. Why did you pick the specific chart?

Violin plots are used in data visualization for several reasons:

1. **Combining Box Plot and Density Plot Features**: Violin plots combine the features of a box plot and a kernel density plot. This allows for a comprehensive visualization that shows both the summary statistics (like median, interquartile range, etc.) and the distribution shape of the data.

2. **Visualizing Distribution Shape**: Unlike box plots, which only provide summary statistics, violin plots show the full distribution of the data, including multimodal distributions. This makes it easier to see where the data is concentrated and if there are any gaps or unusual patterns.

3. **Comparing Multiple Distributions**: Violin plots are particularly useful for comparing the distributions of multiple groups or categories. The side-by-side nature of violins makes it straightforward to compare shapes, centers, and spreads of different distributions.

4. **Handling Large Data Sets**: For large data sets, violin plots can provide a clearer and more informative summary of the data distribution compared to histograms or scatter plots, which can become cluttered.

5. **Identifying Skewness and Modality**: The shape of the violin reveals skewness, long tails and multiple peaks. Because the density estimate smooths over individual points, box plots, which draw outliers explicitly, remain better for spotting individual outliers.

### When to Use Violin Plots

- When you need to visualize the distribution of a continuous variable.
- When comparing the distributions of a continuous variable across different categories or groups.
- When you want to understand the density and distribution shape along with summary statistics.
- When dealing with complex distributions that are not well represented by box plots or histograms alone.

### Example Use Cases

- **Comparing Test Scores**: Visualizing the distribution of test scores across different classes or schools.
- **Gene Expression Data**: Showing the distribution of gene expression levels across different conditions or experiments.
- **Financial Data**: Comparing the distribution of returns for different stocks or financial instruments.

In summary, violin plots are a powerful tool for visualizing data distributions and comparing multiple groups, providing a detailed view that combines the strengths of both box plots and density plots.

##### 2. What is/are the insight(s) found from the chart?

We can see that Manhattan has the widest price range and is the most expensive borough. Brooklyn has the second-highest prices, while the Bronx appears to be the most affordable.
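The visual ranking can be cross-checked numerically with per-borough medians and quartiles, which, unlike means, are not pulled up by a few expensive listings. A sketch on made-up prices (in the notebook the input would be `df[['neighbourhood_group', 'price']]`):

```python
import pandas as pd

# Made-up prices per borough
listings = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 3 + ["Brooklyn"] * 3 + ["Bronx"] * 3,
    "price": [150, 200, 250, 90, 120, 160, 50, 65, 80],
})

# 25th, 50th and 75th percentiles per borough, sorted by median from most to least expensive
summary = (
    listings.groupby("neighbourhood_group")["price"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
    .sort_values(0.5, ascending=False)
)
print(summary)
```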

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Tailored Marketing Strategies

Manhattan:

Luxury Market: Focus marketing efforts on promoting the luxury and exclusivity of Manhattan listings. Highlight the premium experiences and amenities available in this borough to attract high-end travelers.
Business Travelers: Target business travelers who may be willing to pay a premium for proximity to corporate offices and business hubs.

Brooklyn:

Cultural Appeal: Emphasize the cultural attractions, trendy neighborhoods, and unique experiences in Brooklyn. Promote it as a slightly more affordable yet exciting alternative to Manhattan.
Young Professionals: Target young professionals and creatives who are drawn to Brooklyn's vibrant scene and are willing to pay for a mid-range price.

Bronx:

Budget-Friendly Options: Market the Bronx as a budget-friendly option for families, long-term stays, and budget-conscious travelers. Highlight the value for money and affordability of this borough.
Local Experiences: Promote local attractions and unique experiences in the Bronx that offer a different perspective of New York City.

#### Chart - 3 Correlation between different variables

In [None]:
# Chart - 3 Visualizing code: Kendall correlation heatmap of the numeric columns
# (identifier columns id and host_id are excluded because their correlations carry no meaning)
numerical_df = df.select_dtypes(include=['number']).drop(columns=['id', 'host_id'], errors='ignore')

corr = numerical_df.corr(method="kendall")
fig = plt.figure(figsize=(12, 6))
sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm')
plt.show()

##### 1. Why did you pick the specific chart?

A heatmap of the correlation matrix is used for several reasons:

1. **Compact Overview**: It shows the pairwise correlation of every numeric variable in a single grid, so relationships can be scanned at a glance instead of plotting each pair separately.

2. **Strength and Direction at a Glance**: The colour scale encodes both the sign and the magnitude of each coefficient, and the annotations give the exact values.

3. **Rank-Based Measure**: Kendall's tau measures whether two variables tend to increase together without assuming a linear relationship, which suits skewed columns such as price and number_of_reviews.

4. **Guiding Further Analysis**: Strongly correlated pairs point to relationships worth exploring in later charts, while weak ones show which variables move largely independently.

Overall, a correlation heatmap is a fundamental exploratory tool for summarizing how the numeric variables in a dataset relate to one another.

##### 2. What is/are the insight(s) found from the chart?

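The heatmap shows the pairwise Kendall correlations between the numeric variables, such as price, minimum_nights, number_of_reviews, calculated_host_listings_count and availability_365. Values close to +1 or -1 indicate a strong relationship, while values near 0 indicate little or no monotonic relationship.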
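Kendall's tau, selected above with `method="kendall"`, is rank based: it counts how often pairs of observations move in the same direction, so any strictly increasing relationship scores exactly 1 even when it is not linear. The sketch below computes tau-a by hand (assuming no ties, made-up data) and compares it with the Pearson coefficient that pandas uses by default:

```python
from itertools import combinations

import pandas as pd

# y rises monotonically with x but not linearly (made-up data)
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 100]


def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) / number of pairs; assumes no ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


tau = kendall_tau(x, y)
pearson = pd.Series(x).corr(pd.Series(y))  # Pearson is the pandas default method

print(f"kendall={tau:.3f}, pearson={pearson:.3f}")
```

Pearson comes out below 1 because the relationship curves upward, while Kendall stays at 1 because the ordering is perfectly preserved.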

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Correlation analysis is a powerful statistical tool that can provide valuable insights for business development. Understanding how different variables move together can help a business grow and make informed decisions.

#### Chart - 4 Traffic areas based on number of listings

In [None]:
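# Bars of listing counts per room type, with one colour per neighbourhood group.
# plt.bar would draw bars with repeated room-type labels on top of one another, so a hue is used instead.
# Note: the 'minimum_nights' column of traffic_areas holds a row count, not a number of nights.
fig = plt.figure(figsize=(10,5))

sns.barplot(data=traffic_areas, x='room_type', y='minimum_nights', hue='neighbourhood_group')

plt.xlabel("Room Type")
plt.ylabel("Number of Listings")
plt.title("Traffic Areas based on Number of Listings")
plt.show()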

##### 1. Why did you pick the specific chart?

Highlighting Trends: When bars are ordered in a meaningful way (e.g., ascending or descending order, or by category), bar plots can help highlight trends or patterns in the data.

##### 2. What is/are the insight(s) found from the chart?

### Insights on Room Preferences and Locations

Our analysis reveals the following key insights about Airbnb guests' preferences:

1. **Preferred Room Types**:
    - **Entire Home/Apartments** and **Private Rooms** account for most listings, which suggests they are the room types guests prefer.

2. **Popular Locations**:
    - These room types are listed most often in **Manhattan**, **Brooklyn**, and **Queens**.

3. **Pricing Preferences**:
    - Guests appear to favor listings with lower prices, indicating price sensitivity in their booking decisions.


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

These insights can significantly help Airbnb in optimizing their business strategies:

1. **Focus on Preferred Room Types**:
    - Encouraging Listings: Airbnb can encourage more hosts to list entire homes and private rooms, especially in high-demand areas like Manhattan, Brooklyn, and Queens.
    - Quality Assurance: Implementing quality assurance measures for these room types to ensure they meet guest expectations, which can lead to higher satisfaction and repeat bookings.

2. **Geographic Targeting**:
    - Marketing Campaigns: Airbnb can tailor marketing campaigns to promote listings in Manhattan, Brooklyn, and Queens, highlighting the availability of entire homes and private rooms.
    - Balancing Supply and Demand: Encouraging hosts in less popular areas to improve their listings or offer competitive pricing can help balance demand across different locations.

#### Chart - 5 Busiest host in terms of reviews

In [None]:
name_hosts = busy_hosts['host_name']
review_got = busy_hosts['number_of_reviews']

fig = plt.figure(figsize =(20,5))

plt.bar(name_hosts,review_got, color ='purple', width =0.5)
plt.xlabel('Name of the Host')
plt.ylabel('Number of Reviews')
plt.xticks(rotation=45, ha='right')  # Rotate long host names so they stay readable
plt.title("Busiest Host in terms of reviews")
plt.show()

##### 1. Why did you pick the specific chart?

Comparative Analysis: A bar chart makes it easy to compare a single metric, the number of reviews, across many hosts and to rank them.

##### 2. What is/are the insight(s) found from the chart?

We have identified the busiest hosts on Airbnb based on their room types and the number of reviews they have received. The top hosts are:

1. Dona
2. Ji
3. Maya
4. Carol
5. Danielle

Reasons for Their Success
Preferred Room Types: These hosts primarily list their properties as either "Entire home" or "Private room," which are the most preferred room types among Airbnb users. This preference is likely contributing to their high booking rates and busy schedules.

High Number of Reviews: These hosts also have a significant number of reviews, indicating that they have had many guests and likely provide a satisfactory experience that encourages guests to leave feedback.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Understanding why these hosts are so successful can provide several actionable insights for Airbnb and other hosts on the platform:

Encouraging Preferred Room Types: Airbnb can encourage more hosts to list entire homes and private rooms, particularly in high-demand areas. This could increase booking rates and overall satisfaction.

Highlighting Successful Hosts: Airbnb can use these top hosts as case studies or examples to educate other hosts on best practices. This might include tips on property management, guest communication, and providing a superior guest experience.

#### Chart - 6 Number of reviews by area

In [None]:
area = area_reviews['neighbourhood_group']
review = area_reviews['number_of_reviews']
fig = plt.figure(figsize =(10,5))

plt.bar(area, review, color ="blue", width =0.5)
plt.xlabel('Area')
plt.ylabel('Reviews of the Most-Reviewed Listing')
plt.title("Maximum Number of Reviews per Listing in each area")
plt.show()

##### 1. Why did you pick the specific chart?

Comparative Analysis: A bar chart allows a quick comparison of one metric, here the review count, across the five neighbourhood groups.

##### 2. What is/are the insight(s) found from the chart?

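Because `area_reviews` takes the maximum, each bar shows the review count of the most-reviewed listing in that area. Manhattan has the listing with the highest number of reviews, Brooklyn and Queens have roughly equal maxima, and Staten Island's top listing has the fewest reviews.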

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Host Education and Support:
Host Training Programs: Offering training and support to hosts in Manhattan to maintain high standards, and providing additional resources to hosts in Brooklyn, Queens, and Staten Island to improve their listings and attract more reviews.
Best Practices Sharing: Sharing best practices from highly reviewed listings in Manhattan with hosts in other boroughs to help them improve their performance.

#### Chart - 7 histogram for room types

In [None]:
# Plotting histogram for room types to look for which are most preferred room types.
plt.rcParams['figure.figsize'] = (10,5)
hp = sns.histplot(df['room_type'], color= 'red')
hp.set_xlabel('Room type')
hp.set_ylabel('Number of listings')
plt.show()

##### 1. Why did you pick the specific chart?

Frequency Distribution:

Histograms display the frequency distribution of a dataset, showing how often each range of values occurs. This is helpful for understanding the prevalence of certain values within the dataset.

##### 2. What is/are the insight(s) found from the chart?

The histogram plot reveals that the most preferred room types among Airbnb customers are entire homes/apartments and private rooms. Specifically:

1. Entire Home/Apartment: This room type is the most popular, constituting 52% of the total listings. This indicates a strong preference for privacy and the convenience of having an entire place to oneself.

2. Private Room: Following closely, private rooms make up 45.7% of the listings. This suggests that a significant number of travelers are looking for a balance between cost-effectiveness and privacy, opting for a private sleeping area while possibly sharing common spaces with the host or other guests.
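The percentages quoted above are shares of all listings. They can be read off directly with `value_counts(normalize=True)` instead of estimated from the histogram's bar heights; a sketch on made-up rows (running it on `df['room_type']` would give the notebook's actual figures):

```python
import pandas as pd

# Made-up room types standing in for df["room_type"]
room_types = pd.Series(
    ["Entire home/apt"] * 5 + ["Private room"] * 4 + ["Shared room"] * 1
)

# Fraction of listings per room type, converted to percentages (sorted largest first)
shares = room_types.value_counts(normalize=True) * 100
print(shares.round(1))
```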

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Improved Customer Experience
Optimized Search and Recommendations: By enhancing the search algorithms to prioritize entire homes/apartments and private rooms, Airbnb can provide a more personalized and relevant browsing experience. This can lead to higher conversion rates as customers find what they are looking for more easily.
Detailed Listings: Encouraging hosts to provide comprehensive details, high-quality photos, and clear descriptions for these preferred room types can improve the attractiveness of the listings and result in better customer engagement and satisfaction.

#### Chart - 8 Percentage of neighbourhood

In [None]:
# Visualizing using pie chart
df['neighbourhood_group'].value_counts().plot(kind = 'pie', figsize = (8,8), autopct = '%1.1f%%', fontsize = 15)
plt.title("Neighbourhood Group", fontsize = 25)
plt.show()

##### 1. Why did you pick the specific chart?

Pie charts are a popular tool for visualizing data for several reasons:

1. **Simplicity**: Pie charts are straightforward and easy to understand. They provide a clear visual representation of the parts of a whole, making it easy to compare proportions.

2. **Quick Insights**: The visual nature of pie charts allows viewers to quickly grasp the relative sizes of different categories. This is useful for presentations and reports where you need to convey information efficiently.

3. **Comparative Analysis**: Pie charts are particularly effective when you want to show the relationship of individual parts to the whole. For example, they can be used to illustrate market share distribution among companies or the percentage breakdown of a budget.

4. **Visual Appeal**: When designed well, pie charts can be visually appealing and engaging. They can attract attention and make data more approachable, especially for audiences that might find raw numbers overwhelming.

5. **Highlighting Key Data**: By emphasizing certain slices (using colors, exploding slices, etc.), pie charts can draw attention to key data points, making it easier to highlight important information.

However, pie charts are not always the best choice for data visualization. They become cluttered and hard to read with many categories, and slices of similar size are difficult to compare. In such cases, a bar chart is usually more effective.
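As a concrete example of the bar-chart alternative, the sketch below draws neighbourhood-group shares as a horizontal bar chart, where small groups stay easy to compare. It runs on hypothetical toy data. In the notebook, drop the toy `df` and use the real one.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data; replace with the notebook's real df
df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Brooklyn", "Manhattan", "Queens", "Brooklyn",
                            "Manhattan", "Bronx", "Brooklyn", "Staten Island", "Manhattan"],
})

# Percentage of listings per group, sorted ascending so the largest bar ends up at the top
shares = df["neighbourhood_group"].value_counts(normalize=True).mul(100).sort_values()

fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(list(shares.index), shares.values)
for y, value in enumerate(shares.values):
    ax.text(value, y, f" {value:.1f}%", va="center")  # label each bar with its percentage
ax.set_xlabel("Share of listings (%)")
ax.set_title("Listings by neighbourhood group")
plt.tight_layout()
plt.show()
```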

##### 2. What is/are the insight(s) found from the chart?

The pie chart shows each neighbourhood group's share of all listings, with the exact percentages printed on the slices. Read together with the room-type breakdown, entire homes or apartments make up 52% of listings. This suggests that travelers place a high value on privacy and the convenience of having an entire place to themselves.

Private rooms account for a further 45.7% of listings. This suggests that a significant portion of Airbnb customers look for a balance between affordability and privacy, since private rooms typically cost less than entire homes or apartments but still offer a private sleeping area.

Lastly, shared rooms are the least common room type, making up only a small fraction of the listings. This suggests that most travelers prefer not to share a room with strangers and favor more private accommodation options.

These insights highlight the importance of offering a diverse range of accommodation types to meet varying customer preferences, with a clear emphasis on privacy and exclusivity.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, the insights derived from the analysis can significantly help create a positive business impact for Airbnb in several ways:

### Strategic Planning and Inventory Management
1. **Inventory Allocation**: Entire homes or apartments already make up 52% of listings. If that share reflects guest demand, Airbnb can encourage hosts to list more entire homes or apartments, which would help meet demand and improve booking rates.
2. **Supply Optimization**: With private rooms making up 45.7% of listings, Airbnb can also focus on attracting more private-room listings, particularly in high-demand areas, so that supply keeps pace with customer preferences.
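Deciding where to recruit private rooms "in high-demand areas" depends on each area's current mix, and the overall 52% / 45.7% split hides that. A row-normalized `pd.crosstab` gives the room-type mix for each neighbourhood group. This is a sketch on hypothetical toy data. It assumes `room_type` and `neighbourhood_group` are the notebook's column names.

```python
import pandas as pd

# Hypothetical toy data with the two columns the breakdown needs
df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn", "Brooklyn",
                            "Brooklyn", "Queens", "Queens", "Queens"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Private room",
                  "Private room", "Entire home/apt", "Private room", "Private room", "Shared room"],
})

# Each row is one group's room-type mix in percent (rows sum to ~100)
mix = (pd.crosstab(df["neighbourhood_group"], df["room_type"], normalize="index") * 100).round(1)
print(mix)

# Most common room type per group
dominant = mix.idxmax(axis=1)
print(dominant)
```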




## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

Based on the comprehensive analysis and insights derived from the Airbnb dataset, here are several business solutions that can be implemented to enhance Airbnb's operations, host performance, and customer satisfaction:

### 1. **Targeted Marketing Strategies**

#### Manhattan
- **Luxury and Exclusivity Campaigns**: Promote listings as premium options offering entire homes/apartments, targeting affluent travelers seeking privacy and luxury.
- **Corporate Stays Promotions**: Focus on attracting business travelers by highlighting the convenience and exclusivity of entire home/apartment listings in Manhattan.

#### Brooklyn
- **Community and Culture Focus**: Market Brooklyn listings by emphasizing the vibrant local culture and the communal experience of staying in private rooms. Target younger travelers, solo adventurers, and those seeking authentic local experiences.
- **Extended Stay Offers**: Create promotional packages for long-term stays in private rooms, catering to digital nomads, students, and professionals looking for affordable, home-like accommodations.

#### Queens and Bronx
- **Affordable Travel Campaigns**: Highlight the affordability and unique local experiences in Queens and the Bronx. Promote private rooms as cost-effective alternatives for budget-conscious travelers.
- **Local Experience Packages**: Offer packages that include local tours, experiences, and recommendations to attract travelers interested in exploring these areas more deeply.

### 2. **Host Support and Development**

- **Manhattan Hosts**: Provide resources and training for maintaining high standards of luxury and exclusivity. Encourage hosts to enhance their listings with premium amenities and services to attract high-paying guests.
- **Brooklyn Hosts**: Support hosts with marketing strategies for private rooms, including tips on creating appealing and affordable listings. Offer workshops on leveraging Brooklyn’s cultural appeal to attract more guests.
- **Queens and Bronx Hosts**: Guide hosts on how to market their listings effectively by emphasizing affordability and local charm. Provide tools for enhancing guest experience through personalized touches and local recommendations.

### 3. **Enhanced User Experience**

- **Customized Search Functionality**: Improve the search features on the platform to allow users to filter listings based on preferred room type, location, and price range. This will help guests find the most suitable accommodations quickly and efficiently.
- **Personalized Recommendations**: Use machine learning algorithms to provide personalized accommodation recommendations based on user preferences, past behavior, and current trends.
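The filter described under Customized Search Functionality amounts to combining boolean masks. `filter_listings` below is a hypothetical helper for illustration only and is not part of any Airbnb API. It assumes `price`, `room_type` and `neighbourhood_group` columns like the notebook's `df`, and it runs on toy data.

```python
import pandas as pd

def filter_listings(df, room_type=None, neighbourhood_group=None, min_price=0, max_price=None):
    """Hypothetical search filter: keep listings that satisfy every criterion given."""
    mask = df["price"] >= min_price
    if max_price is not None:
        mask &= df["price"] <= max_price
    if room_type is not None:
        mask &= df["room_type"] == room_type
    if neighbourhood_group is not None:
        mask &= df["neighbourhood_group"] == neighbourhood_group
    return df[mask].sort_values("price")  # cheapest first

# Hypothetical toy listings
listings = pd.DataFrame({
    "name": ["Loft", "Studio", "Spare room", "Bunk"],
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Queens"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Shared room"],
    "price": [250, 150, 70, 35],
})

print(filter_listings(listings, room_type="Entire home/apt", max_price=200))
```

Any criterion left as `None` is skipped, so the same function handles both broad and narrow searches.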

### 4. **Investment and Property Development**

- **Strategic Investment Guidance**: Provide insights to potential investors and property developers about the demand trends for entire homes/apartments in Manhattan and private rooms in Brooklyn, Queens, and the Bronx. Guide them in making informed investment decisions.
- **Development of New Properties**: Encourage the development of new properties in high-demand areas with a focus on the preferred room types. This can help meet the growing demand and expand the inventory of desirable listings.

### 5. **Customer Service Enhancements**

- **Improved Query Resolution**: Shorten query resolution times so that customer issues are resolved promptly, and implement a robust feedback mechanism to continuously improve service quality.
- **Proactive Customer Support**: Offer proactive support to guests and hosts, addressing potential issues before they escalate. Provide training and incentives for customer service agents to ensure high levels of satisfaction.

### 6. **Pricing Strategies**

- **Dynamic Pricing Models**: Implement dynamic pricing strategies that adjust based on demand, seasonality, and competitive pricing. Use data analytics to optimize pricing for different room types and locations.
- **Discounts and Promotions**: Offer targeted discounts and promotional offers to attract guests during low-demand periods. Provide special deals for long-term stays and repeat customers.
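A first step toward "using data analytics to optimize pricing for different room types and locations" is a benchmark: the median price for each neighbourhood group and room type. Each listing's gap to that benchmark then flags possible over- or under-pricing. This is a static baseline, not a dynamic pricing model, because adjusting for demand and seasonality would need time-series booking data. The toy data and column names below are assumptions.

```python
import pandas as pd

# Hypothetical toy listings: two groups x two room types
df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 4 + ["Brooklyn"] * 4,
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Private room"] * 2,
    "price": [200, 300, 90, 110, 150, 170, 60, 80],
})

keys = ["neighbourhood_group", "room_type"]

# Benchmark table: median nightly price per group/room-type pair
# (the median is less sensitive than the mean to a few extreme prices)
baseline = df.groupby(keys)["price"].median().unstack()
print(baseline)

# How far each listing sits above (+) or below (-) its peer median, in percent
peer_median = df.groupby(keys)["price"].transform("median")
df["gap_pct"] = ((df["price"] - peer_median) / peer_median * 100).round(1)
print(df)
```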

### 7. **Loyalty Programs**

- **Guest Loyalty Programs**: Develop loyalty programs for repeat guests, offering benefits such as discounts, priority bookings, and exclusive offers.
- **Host Recognition and Rewards**: Recognize and reward hosts with high performance and excellent reviews. Provide incentives for maintaining high standards and delivering exceptional guest experiences.
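Total review volume per host is one rough proxy for "high performance" that uses only listing-level columns. The sketch assumes `host_id`, `host_name` and `number_of_reviews` columns and uses toy data. Review counts measure activity rather than satisfaction, so a real recognition scheme would also need review scores.

```python
import pandas as pd

# Hypothetical toy data: one row per listing
df = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 3, 3],
    "host_name": ["Ana", "Ana", "Ben", "Chen", "Chen", "Chen"],
    "number_of_reviews": [120, 80, 45, 10, 15, 5],
})

# Total reviews and number of listings per host, most-reviewed first
host_stats = (
    df.groupby(["host_id", "host_name"])
      .agg(total_reviews=("number_of_reviews", "sum"),
           listings=("number_of_reviews", "count"))
      .sort_values("total_reviews", ascending=False)
)
top_hosts = host_stats.head(2)  # hypothetical cut-off for a recognition shortlist
print(top_hosts)
```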

### 8. **Innovation and Additional Services**

- **Innovative Offerings**: Introduce new and innovative services, such as curated local experiences, concierge services, and personalized travel itineraries.
- **Additional Revenue Streams**: Explore additional revenue streams, such as travel insurance, local partnerships, and premium services for both guests and hosts.




# **Conclusion**

By implementing these business solutions, Airbnb can enhance its market presence, improve customer and host satisfaction, and drive overall growth. These strategies leverage the insights gained from data analysis to create targeted, effective, and innovative approaches that address the diverse needs of Airbnb’s users.
