<a href="https://colab.research.google.com/github/KushangShah/CapstoneProject/blob/main/Sample_EDA_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Exploratory Data Analysis of Airbnb Dataset: Unveiling Insights for Optimal Stays



##### **Project Type**    - EDA
##### **Contribution**    - Kushang Shah(Individual)


# **Project Summary -**

Title: Exploratory Data Analysis of Airbnb Dataset: Unveiling Insights for Optimal Stays

Introduction:
This project aimed to perform an in-depth exploratory data analysis (EDA) on a comprehensive Airbnb dataset. The dataset consisted of 48,895 entries, each containing 16 informative columns, such as listing details, host information, location attributes, pricing, and reviews. By analyzing this dataset, we sought to unravel key insights and patterns that could enhance the understanding of Airbnb listings and facilitate better decision-making for both hosts and guests.

Data Loading and Understanding
The initial step involved loading the dataset into a Pandas dataframe, enabling us to explore its structure and familiarize ourselves with the variables. Through this process, we gained valuable knowledge about the dataset's size, column types, and potential areas for analysis.

Data Cleaning for Accurate Analysis
Data cleaning is crucial for maintaining data integrity and ensuring accurate analysis. We addressed missing values by implementing appropriate strategies, either through imputation or by removing rows/columns with excessive missing data. Furthermore, we conducted a thorough examination for duplicate entries, efficiently eliminating any redundancies in the dataset.

Descriptive Statistics: Unveiling Central Tendencies and Variability
To gain a comprehensive understanding of the dataset, we computed descriptive statistics for numerical columns, such as price, minimum nights, and number of reviews. By calculating measures like mean, median, minimum, maximum, and quartiles, we obtained a clear picture of the dataset's central tendencies and variability. Concurrently, we explored the frequency distribution of categorical variables, shedding light on the distribution of different categories within the dataset.

Data Visualization: Unleashing Patterns and Trends
Data visualization is a powerful tool that enables us to uncover hidden patterns and trends. Through various visualizations, including histograms, bar charts, scatter plots, and heatmaps, we embarked on a journey to explore relationships between variables and identify noteworthy insights. For instance, visualizations helped us analyze the distribution of prices across different neighborhoods and room types, ultimately enabling us to discern any spatial trends or disparities.

Feature Engineering: Augmenting Analysis Dimensions
To enrich our analysis, we delved into feature engineering. This process involved creating new features or modifying existing ones to extract more meaningful insights. By deriving additional features, such as the host's average reviews per month or the host's total listings, we were able to uncover fresh dimensions of analysis that provided richer context and enhanced our understanding of the dataset.

Correlation Analysis: Discovering Relationships
Correlation analysis was instrumental in identifying relationships between variables. By calculating correlation coefficients and visualizing them through a correlation matrix, we unraveled significant correlations that offered valuable insights. This analysis allowed us to identify factors that potentially influence pricing or impact the number of reviews, thereby empowering hosts and guests with crucial information for their decision-making processes.

Temporal Analysis: Unveiling Trends Over Time
The dataset contained temporal information, such as the last review date. Through temporal analysis, we explored trends over time, seasonal patterns, and any changes in host activity or reviews. This analysis provided valuable insights into the dynamic nature of Airbnb listings and revealed temporal factors that may influence bookings or reviews.

Conclusion:
In conclusion, this project's comprehensive exploratory data analysis of the Airbnb dataset has successfully unveiled key insights and patterns. By following a systematic approach that encompassed data loading, cleaning, descriptive statistics, data visualization, feature engineering, correlation analysis, and temporal analysis, we gained a profound understanding of the dataset's nuances. The analysis yielded actionable insights for hosts to optimize their listings and for guests to make informed decisions when booking stays. Ultimately, this project underscores the significance of EDA in extracting meaningful insights from data.

# **GitHub Link -**

https://github.com/KushangShah

# **Problem Statement**


The goal of this project is to analyze the Airbnb dataset and address the following problem:

"How can we gain insights into the factors influencing the pricing and availability of Airbnb listings in a specific location?"

Key Components of the Problem Statement:

1. Pricing Analysis: Identify the factors that significantly impact the pricing of Airbnb listings, such as room type, location, host characteristics, and amenities. Determine the extent to which each factor contributes to pricing variations.

2. Availability Analysis: Investigate the factors affecting the availability of Airbnb listings throughout the year. Analyze seasonal patterns, identify periods of high and low availability, and explore potential correlations between availability and pricing.

3. Location Influence: Examine the influence of specific neighborhoods or neighborhood groups on pricing and availability. Determine whether certain locations have higher demand or are associated with higher prices.

4. Host Impact: Evaluate the impact of host characteristics, such as the number of listings they manage and their hosting history, on pricing and availability. Assess whether experienced or highly-rated hosts tend to charge more or have better availability.

5. Recommendations: Based on the analysis, provide recommendations for both hosts and potential guests. Suggest strategies for hosts to optimize pricing and improve availability based on the identified influential factors. Offer insights for guests to find suitable listings based on pricing and availability patterns.

By addressing this problem, we aim to provide valuable insights and recommendations to both hosts and guests in the Airbnb ecosystem, enabling them to make informed decisions and optimize their experience on the platform.

#### **Define Your Business Objective?**

The primary business objective related to the Airbnb dataset analysis is to maximize the revenue and utilization of Airbnb listings by understanding the factors influencing pricing and availability. This involves:

1. Optimizing Pricing Strategy: Gain insights into the key factors affecting pricing variations for Airbnb listings. By identifying the most influential factors, hosts can strategically set competitive prices to attract guests while maximizing their revenue.

2. Enhancing Listing Availability: Understand the factors impacting the availability of listings throughout the year. By analyzing seasonal patterns and demand fluctuations, hosts can optimize their listing availability to ensure maximum utilization and minimize periods of low occupancy.

3. Improving Guest Experience: Provide valuable insights and recommendations to potential guests regarding suitable listings based on pricing and availability. Enhancing the guest experience contributes to positive reviews, increased bookings, and potentially higher revenue for hosts.

4. Supporting Business Decisions: The analysis of the Airbnb dataset can help inform strategic business decisions related to expansion, investment, and resource allocation. Understanding the market dynamics and influential factors can guide decision-makers in making informed choices to optimize business outcomes.

Ultimately, the business objective is to drive profitability, increase occupancy rates, and improve customer satisfaction within the Airbnb ecosystem by leveraging data-driven insights and recommendations.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the questions in the following format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

### Dataset First View

In [None]:
# Dataset First Look
airbnb_df=pd.read_csv('/content/drive/MyDrive/CSV files/Airbnb NYC 2019.csv')
airbnb_df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
airbnb_df.shape

### Dataset Information

In [None]:
# Dataset Info
airbnb_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
# duplicated() flags fully repeated rows; summing the flags gives a single count
airbnb_df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
miss_value = airbnb_df.isna()
miss_value.sum()

In [None]:
# Visualizing the missing values
import missingno as ms
ms.matrix(airbnb_df)
plt.show()

### What did you know about your dataset?

The Airbnb NYC 2019 dataset contains information about properties listed on the Airbnb platform, with one row per listing. It has the following fields:

ID: A unique identifier for each listing.

Name: The title or name of the listing.

Host ID: A unique identifier for the host of the listing.

Host Name: The name of the host.

Neighbourhood Group: The group or category of the neighborhood where the listing is located.

Neighbourhood: The specific neighborhood where the listing is situated.

Latitude: The latitude coordinates of the listing's location.

Longitude: The longitude coordinates of the listing's location.

Room Type: The type of room or accommodation being offered (e.g., entire home/apartment, private room, shared room).

Price: The price per night for the listing.

Minimum Nights: The minimum number of nights required to book the listing.

Number of Reviews: The total number of reviews received for the listing.

Last Review: The date of the last review for the listing.

Reviews per Month: The average number of reviews per month for the listing.

Calculated Host Listings Count: The total number of listings managed by the host.

Availability 365: The number of days the listing is available for booking within a year.
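
The field list above can be cross-checked against the loaded data. The sketch below is one possible way to do that: `summarize_columns` is a hypothetical helper (not part of the original notebook) that reports each column's dtype, missing-value count and share, and number of distinct values. The small `sample_df` is made-up data standing in for `airbnb_df`.

```python
import numpy as np
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, missing %, unique count."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(2),
        "unique": df.nunique(),
    })
    # Columns with the most missing values come first
    return summary.sort_values("missing", ascending=False)


# Made-up sample standing in for airbnb_df
sample_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "room_type": ["Private room", "Entire home/apt", "Private room", "Shared room"],
    "price": [80, 150, 60, 40],
    "reviews_per_month": [0.5, np.nan, 1.2, np.nan],
})

column_summary = summarize_columns(sample_df)
print(column_summary)
```

Calling `summarize_columns(airbnb_df)` in the notebook would give the same overview for the real data in a single table.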

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
airbnb_df.columns

In [None]:
# Dataset Describe
airbnb_df.describe()

### Variables Description

1. **ID:** Unique listing identifier.

2. **Name:** Title or brief description of the listing.

3. **Host ID:** Unique identifier for the listing's host.

4. **Host Name:** Name of the host managing the listing.

5. **Neighbourhood Group:** Categorization of the neighborhood.

6. **Neighbourhood:** Specific location or area of the listing.

7. **Latitude/Longitude:** Geographic coordinates of the listing.

8. **Room Type:** Type of accommodation (e.g., entire home, private room).

9. **Price:** Cost per night for booking.

10. **Minimum Nights:** Minimum required nights for booking.

11. **Number of Reviews:** Cumulative count of reviews received.

12. **Last Review:** Date of the most recent review.

13. **Reviews per Month:** Average monthly review count.

14. **Calculated Host Listings Count:** Total number of listings managed by the host.

15. **Availability 365:** Number of days the listing is available in a year.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
# nunique() returns the number of distinct values in every column at once
airbnb_df.nunique()

In [None]:
# ID
unique_ids = airbnb_df['id'].nunique()
print("Number of unique IDs:", unique_ids)

In [None]:
# Name
unique_names = airbnb_df['name'].nunique()
print("Unique names:", unique_names)

In [None]:
# Host ID
unique_host_ids = airbnb_df['host_id'].nunique()
print("Number of unique host IDs:", unique_host_ids)

In [None]:
# Host Name
unique_host_names = airbnb_df['host_name'].nunique()
print("Unique host names:", unique_host_names)

In [None]:
# Neighbourhood Group
unique_neighbourhood_groups = airbnb_df['neighbourhood_group'].nunique()
print("Unique neighbourhood groups:", unique_neighbourhood_groups)

In [None]:
# Neighbourhood
unique_neighbourhoods = airbnb_df['neighbourhood'].nunique()
print("Unique neighbourhoods:", unique_neighbourhoods)

In [None]:
# Latitude
unique_latitudes = airbnb_df['latitude'].nunique()
print("Unique latitudes:", unique_latitudes)

In [None]:
# Longitude
unique_longitudes = airbnb_df['longitude'].nunique()
print("Unique longitudes:", unique_longitudes)

In [None]:
# Room Type
unique_room_types = airbnb_df['room_type'].nunique()
print("Unique room types:", unique_room_types)

In [None]:
# Price
unique_prices = airbnb_df['price'].nunique()
print("Unique prices:", unique_prices)

In [None]:
# Minimum Nights
unique_min_nights = airbnb_df['minimum_nights'].nunique()
print("Unique minimum nights:", unique_min_nights)


In [None]:
# Number of Reviews
unique_num_reviews = airbnb_df['number_of_reviews'].nunique()
print("Unique number of reviews:", unique_num_reviews)

In [None]:
# Last Review
unique_last_reviews = airbnb_df['last_review'].nunique()
print("Unique last reviews:", unique_last_reviews)

In [None]:
# Reviews per Month
unique_reviews_per_month = airbnb_df['reviews_per_month'].nunique()
print("Unique reviews per month:", unique_reviews_per_month)

In [None]:
# Calculated Host Listings Count
unique_host_listings_count = airbnb_df['calculated_host_listings_count'].nunique()
print("Unique calculated host listings count:", unique_host_listings_count)

In [None]:
# Availability 365
unique_availabilities = airbnb_df['availability_365'].nunique()
print("Unique availabilities:", unique_availabilities)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
airbnb_df.info()

In [None]:
# Drop the host identifiers; errors='ignore' keeps the cell re-runnable
airbnb_df = airbnb_df.drop(columns=['host_id', 'host_name'], errors='ignore')

In [None]:
airbnb_df['availability_365'].value_counts()

In [None]:
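# Replace 0 with 'Unknown' in availability_365.
# Note: mixing strings with integers turns the column into object dtype,
# so it is no longer treated as numeric (for example by describe() or corr()).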
airbnb_df['availability_365'] = airbnb_df['availability_365'].replace(0, 'Unknown')

# output
airbnb_df['availability_365'].value_counts()

### What all manipulations have you done and insights you found?

1. **Dropping Columns:**
   - **Description:** Removed 'host_id' and 'host_name' columns from `airbnb_df`.
   - **Insights:** Enhances focus on relevant aspects, simplifies analysis by reducing dimensionality.

2. **Replacing 0 with 'Unknown':**
   - **Description:** Replaced 0 in 'availability_365' with 'Unknown' using `replace()` function.
   - **Insights:** Provides a descriptive label for listings with 0 availability, aiding in differentiation and potential analysis.

3. **Checking Updated Value Counts:**
   - **Description:** Checked updated value counts of 'availability_365' after replacement.
   - **Insights:** Examining distribution, identifying patterns, and understanding frequency of different availability periods in the dataset.
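
One side effect of the replacement described above is that mixing the string 'Unknown' with integers turns the column into `object` dtype, which excludes it from numeric summaries. A possible alternative, shown here only as a sketch on made-up data, keeps `availability_365` numeric and records the zero-availability listings in a separate boolean column. The column name `availability_unknown` is a hypothetical choice, not part of the original notebook.

```python
import pandas as pd

# Made-up sample standing in for airbnb_df
sample_df = pd.DataFrame({"availability_365": [0, 365, 120, 0, 30]})

# What the replacement does to the dtype: integers mixed with a string become object
replaced = sample_df["availability_365"].replace(0, "Unknown")
print(replaced.dtype)

# Alternative: flag zero-availability listings and leave the original column numeric
sample_df["availability_unknown"] = sample_df["availability_365"].eq(0)

print(sample_df["availability_365"].dtype)
print(sample_df["availability_unknown"].sum())
```

With the flag in place, the listings can still be separated by availability status, while the numeric column remains usable in statistics and correlations.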

## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

In [None]:
# Correlate numeric columns only; text columns cannot be correlated
airbnb_df.corr(numeric_only=True)

#### Chart - 1

In [None]:
# Chart - 1 visualization code

# creating bar graph
g = airbnb_df['neighbourhood_group'].value_counts().plot(kind='bar')
g.set_title('Distribution of listings By Neighbourhood Group')
g.set_xlabel('Neighbourhood Group')
g.set_ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

To visualize the distribution of listings across the different neighbourhood groups.

##### 2. What is/are the insight(s) found from the chart?

The chart shows which neighbourhood groups have the highest number of listings, which can help in targeting specific areas for business opportunities.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. The chart shows where listings are concentrated, so based on the result we can focus on the neighbourhood groups with the most listings.

#### Chart - 2

In [None]:
# Chart - 2 visualization code

# creating pie chart
pie = airbnb_df['room_type'].value_counts().plot(kind='pie', autopct='%1.1f%%')
pie.set_title('Room Type Distribution')
pie.set_ylabel('')
plt.show()

##### 1. Why did you pick the specific chart?

To show the proportion of each room type in the dataset.

##### 2. What is/are the insight(s) found from the chart?

Determine the most common room types available, which can inform business decisions on property investments or marketing strategies.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Insights from the Airbnb dataset indicate a positive business impact through high demand for entire homes/apartments and private rooms. However, the limited demand for shared rooms suggests potential challenges and slower growth for hosts in that category. Additional factors like pricing and customer preferences should be considered for informed decision-making.

#### Chart - 3

In [None]:
# Chart - 3 visualization code

# creating histogram
plt.figure(figsize=(10,6))
plt.hist(airbnb_df['price'])
plt.title('Price Distribution')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

To see the distribution of listing prices.

##### 2. What is/are the insight(s) found from the chart?

Understand the price range of listings and identify any outliers.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The Airbnb dataset can help create a positive business impact by identifying popular price ranges and optimizing pricing strategies. Overpriced listings and limited availability in affordable ranges may lead to negative growth and should be addressed.
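
Outliers in the price distribution can also be identified numerically. The sketch below applies the common 1.5 × IQR rule, which is an assumption here since the notebook does not state an outlier criterion. `iqr_outlier_bounds` is a hypothetical helper and the prices are made-up values standing in for `airbnb_df['price']`.

```python
import pandas as pd


def iqr_outlier_bounds(values: pd.Series, k: float = 1.5):
    """Return (lower, upper) bounds of the k * IQR rule for a numeric Series."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up nightly prices standing in for airbnb_df['price']
prices = pd.Series([40, 60, 75, 80, 90, 100, 120, 150, 10000])

lower, upper = iqr_outlier_bounds(prices)
# Values outside the bounds are treated as outliers under this rule
outliers = prices[(prices < lower) | (prices > upper)]
print(f"bounds: ({lower}, {upper})")
print(outliers.tolist())
```

Prices flagged this way could be inspected separately or excluded before plotting, so that a few extreme listings do not compress the rest of the histogram.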

#### Chart - 4

In [None]:
# Chart - 4 visualization code
# Latitude vs. Longitude in Scatter Plot.
plt.figure(figsize=(10, 8))
plt.scatter(airbnb_df['longitude'], airbnb_df['latitude'], alpha=0.6, c='b', edgecolors='k')
plt.title('Latitude vs. Longitude')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.grid(True)
plt.show()


##### 1. Why did you pick the specific chart?

To plot the geographical locations of listings.

##### 2. What is/are the insight(s) found from the chart?

Visualize the spatial distribution of listings and identify patterns based on latitude and longitude.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The scatter plot of latitude vs. longitude does not directly provide insights for creating a positive business impact. Instead, it visualizes the geographic distribution of the Airbnb listings. However, by analyzing the scatter plot in combination with other data and factors, such as popular neighborhoods or proximity to attractions, hosts and businesses can gain insights that may contribute to a positive business impact. These insights can inform decisions related to targeting specific areas, optimizing marketing strategies, and identifying potential growth opportunities.

#### Chart - 5

In [None]:
airbnb_df['neighbourhood_group'].value_counts()

In [None]:
# Chart - 5 visualization code
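# Box Plot - Price Distribution by Neighbourhood Group
# DataFrame.boxplot creates its own figure, so the size is passed via figsize
airbnb_df.boxplot(column='price', by='neighbourhood_group', figsize=(10, 8))
plt.suptitle('')  # remove the automatic "Boxplot grouped by ..." super-title
plt.title('Price Distribution by Neighbourhood Group')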
plt.xlabel('Neighbourhood Group')
plt.ylabel('Price')
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

To compare the price distributions across the different neighbourhood groups.

##### 2. What is/are the insight(s) found from the chart?

The chart shows how prices vary between neighbourhood groups.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Insights from the box plot of price distribution by neighborhood group show variations in prices across different areas. Manhattan and Brooklyn have the highest number of listings, indicating potential positive business impact. However, the presence of higher-priced listings in Manhattan may lead to negative growth for budget-conscious travelers in that area.
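
The differences between neighbourhood groups can also be summarized as a table. The following is a minimal sketch on made-up data (the prices are not taken from the dataset) that reports the listing count, median, mean, and maximum price per group. The median is included because it is less affected by a few very expensive listings than the mean.

```python
import pandas as pd

# Made-up sample standing in for airbnb_df
sample_df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Queens"],
    "price": [200, 150, 1000, 90, 110, 70],
})

# One row per neighbourhood group, most expensive (by median) first
price_summary = (
    sample_df.groupby("neighbourhood_group")["price"]
    .agg(listings="count", median="median", mean="mean", max="max")
    .sort_values("median", ascending=False)
)
print(price_summary)
```

In this made-up sample, the single 1000 listing pulls the Manhattan mean well above its median, which is the kind of gap a box plot shows as an outlier point.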

#### Chart - 6

In [None]:
# Chart - 6 visualization code

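# creating line chart - number of reviews by last review date
plt.figure(figsize=(10, 6))
# Parse the dates so the x-axis is a true time axis rather than text labels
last_review_dates = pd.to_datetime(airbnb_df['last_review'])
airbnb_df.groupby(last_review_dates)['number_of_reviews'].sum().plot(kind='line')
plt.title('No. of Reviews over time')
plt.xlabel('Last review date')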
plt.ylabel('No. of reviews')
plt.show()

##### 1. Why did you pick the specific chart?

To track review activity over time, using the total review counts of listings grouped by the date of their last review.

##### 2. What is/are the insight(s) found from the chart?

Analyze trends in review activity and assess the popularity of listings over different periods.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Grouping by the `last_review` column and summing `number_of_reviews` gives, for each date, the total number of reviews of the listings that were last reviewed on that date, which indicates how recently the reviewed listings were active.
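
Daily sums can be noisy, so a monthly view may be easier to read. The sketch below is one possible approach on made-up data: it parses `last_review` as dates and sums `number_of_reviews` per calendar month of the last review. Monthly aggregation is an assumption here; the notebook itself groups by individual dates.

```python
import pandas as pd

# Made-up sample standing in for airbnb_df
sample_df = pd.DataFrame({
    "last_review": ["2019-01-05", "2019-01-20", "2019-03-02", None],
    "number_of_reviews": [10, 5, 7, 0],
})

# Parse the text dates; missing or unparseable values become NaT
sample_df["last_review"] = pd.to_datetime(sample_df["last_review"], errors="coerce")

# Sum review counts per calendar month ("MS" = month start) of the last review
monthly_reviews = (
    sample_df.dropna(subset=["last_review"])
    .groupby(pd.Grouper(key="last_review", freq="MS"))["number_of_reviews"]
    .sum()
)
print(monthly_reviews)
```

Months without any last-review date appear with a sum of 0, which keeps the time axis continuous when the result is plotted as a line.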

#### Chart - 7

In [None]:
# Chart - 7 visualization code
# Stacked Bar Chart - Neighbourhood Group Distribution by Room Type
# DataFrame.plot creates its own figure, so the size is passed via figsize
pd.crosstab(airbnb_df['neighbourhood_group'], airbnb_df['room_type']).plot(kind='bar', stacked=True, figsize=(10, 8))
plt.title('Neighbourhood Group Distribution by Room Type')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Count')
plt.show()


##### 1. Why did you pick the specific chart?

To visualize the distribution of room types within each neighbourhood group.

##### 2. What is/are the insight(s) found from the chart?

Identify the predominant room types in different neighbourhood groups, enabling targeted marketing or investment decisions.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights from the stacked bar chart of room type distribution by neighborhood group can contribute to a positive business impact. It helps identify popular room types in each neighborhood, enabling hosts to align their offerings with demand.
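
Because the neighbourhood groups differ in size, raw counts make room-type mixes hard to compare. A possible complement, sketched here on made-up data, is to normalize the cross-tabulation by row so that each neighbourhood group's room types sum to 1.

```python
import pandas as pd

# Made-up sample standing in for airbnb_df
sample_df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Entire home/apt",
                  "Private room", "Private room", "Entire home/apt"],
})

# normalize="index" divides each row by its total, giving shares per group
room_type_share = pd.crosstab(
    sample_df["neighbourhood_group"],
    sample_df["room_type"],
    normalize="index",
)
print(room_type_share.round(2))
```

Plotting `room_type_share` with `kind='bar', stacked=True` would give bars of equal height, which makes the room-type proportions directly comparable across groups.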

#### Chart - 8

In [None]:
# Chart - 8 visualization code
#Violin Plot - Price Distribution by Neighbourhood Group and Room Type:
plt.figure(figsize=(10, 8))
sns.violinplot(data=airbnb_df, x='neighbourhood_group', y='price', hue='room_type')
plt.title('Price Distribution by Neighbourhood Group and Room Type')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Price')
plt.show()

##### 1. Why did you pick the specific chart?

To compare the price distributions across neighbourhood groups and room types simultaneously.

##### 2. What is/are the insight(s) found from the chart?

Assess the price ranges and variations within neighbourhood groups and room types, identifying potential pricing strategies.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained from the violin plot of price distribution by neighborhood group and room type can support creating a positive business impact. It helps identify price ranges that are popular in each neighborhood and for each room type, allowing hosts to optimize pricing strategies. However, if there are limited bookings or high prices in certain room types or neighborhoods, it may result in negative growth due to decreased demand or affordability concerns.

#### Chart - 9

In [None]:
# Chart - 9 visualization
# Scatter Plot - Minimum Nights vs. Price
# DataFrame.plot creates its own figure, so the size is passed via figsize
airbnb_df.plot(kind='scatter', x='minimum_nights', y='price', figsize=(10, 8))
plt.title('Minimum Nights vs. Price')
plt.xlabel("Minimum Nights")
plt.ylabel('Price')
plt.show()

##### 1. Why did you pick the specific chart?

 To analyze the relationship between the minimum number of nights and the listing price.

##### 2. What is/are the insight(s) found from the chart?

Assess whether longer minimum stay requirements correlate with higher or lower prices, helping in setting minimum night policies or pricing strategies.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

There is a possibility of negative growth if there is a strong positive correlation between minimum nights and price. If higher prices are associated with longer minimum night requirements, it may deter potential guests, especially those seeking shorter stays. This can lead to a decrease in bookings and a potential negative impact on business growth. Hosts should carefully consider the balance between price and minimum night requirements to avoid potential negative consequences.
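
The strength and direction of the relationship can be quantified rather than judged by eye. The sketch below, on made-up data, computes the Pearson correlation and, as an added assumption, the rank-based Spearman correlation, which is less sensitive to extreme values such as very long minimum stays.

```python
import pandas as pd

# Made-up sample standing in for airbnb_df
sample_df = pd.DataFrame({
    "minimum_nights": [1, 2, 3, 5, 30],
    "price": [100, 120, 90, 150, 80],
})

columns = ["minimum_nights", "price"]

# Pearson measures linear association; Spearman works on ranks
pearson = sample_df[columns].corr(method="pearson").loc["minimum_nights", "price"]
spearman = sample_df[columns].corr(method="spearman").loc["minimum_nights", "price"]

print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```

A coefficient close to 0 would indicate that minimum night requirements and price move largely independently, while values near +1 or -1 would indicate a strong positive or negative relationship.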

#### Chart - 10

In [None]:
# Chart - 10 visualization
#Scatter Plot - Number of Reviews vs. Reviews per Month:
plt.figure(figsize=(10, 8))
plt.scatter(airbnb_df['number_of_reviews'], airbnb_df['reviews_per_month'], alpha=0.5, color='blue')
plt.title('Number of Reviews vs. Reviews per Month')
plt.xlabel('Number of Reviews')
plt.ylabel('Reviews per Month')
plt.grid(True)
plt.legend(['Data points'])
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

To explore the relationship between the total number of reviews and the average number of reviews per month.



##### 2. What is/are the insight(s) found from the chart?

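Listings with more total reviews tend to have higher monthly review rates; the correlation matrix (Chart 14) reports a positive correlation of 0.549868 between `number_of_reviews` and `reviews_per_month`.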

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights from the data suggest that higher number_of_reviews and reviews_per_month can create a positive business impact by attracting more bookings and indicating guest satisfaction. However, lower values may lead to negative growth, indicating lower guest engagement and potential dissatisfaction.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
neighbourhood_prices = airbnb_df.groupby('neighbourhood')['price'].mean().sort_values()

# line chart - avg. price by neighbourhood
plt.figure(figsize = (12, 6))
neighbourhood_prices.plot(kind='line', marker='o', linestyle='-', color='blue')
plt.title("Average price of listings by Neighbourhood")
plt.xlabel('Neighbourhood')
plt.ylabel('Average price')
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

To compare the average prices across different neighbourhoods.

##### 2. What is/are the insight(s) found from the chart?

Identify neighbourhoods with higher or lower average prices, helping in market analysis or property investment decisions.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights from the average price of listings in different neighborhoods can help create a positive business impact. Lower average prices in some neighborhoods like Bull's Head and Hunts Point may attract budget-conscious travelers and increase bookings. However, higher average prices in neighborhoods like Tribeca and Fort Wadsworth may limit the target audience and potentially lead to negative growth if demand is insufficient.
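
An average price can be driven by a single listing when a neighbourhood has very few listings. The sketch below shows one way to guard against that by ranking only neighbourhoods with a minimum number of listings. The threshold `MIN_LISTINGS` and the neighbourhood labels "A", "B", and "C" are made-up assumptions for illustration.

```python
import pandas as pd

# Made-up sample standing in for airbnb_df
sample_df = pd.DataFrame({
    "neighbourhood": ["A", "A", "A", "B", "B", "B", "C"],
    "price": [100, 120, 140, 60, 70, 80, 900],
})

MIN_LISTINGS = 3  # assumed threshold; tune it to the data

# Mean price and listing count per neighbourhood
neighbourhood_stats = sample_df.groupby("neighbourhood")["price"].agg(
    mean_price="mean", listings="count"
)

# Keep only neighbourhoods with enough listings, then rank by mean price
reliable = neighbourhood_stats[neighbourhood_stats["listings"] >= MIN_LISTINGS]
ranked = reliable.sort_values("mean_price", ascending=False)
print(ranked)
```

In this sample, neighbourhood "C" has the highest average but only one listing, so it is excluded from the ranking.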

#### Chart - 12

In [None]:
# Chart - 12 visualization
#Box Plot - Price Distribution by Room Type and Neighbourhood Group
plt.figure(figsize=(10, 8))
sns.boxplot(data=airbnb_df, x='room_type', y='price', hue='neighbourhood_group')
plt.title('Price Distribution by Room Type and Neighbourhood Group')
plt.xlabel('Room Type')
plt.ylabel('Price')
plt.xticks(rotation=45)
plt.legend(title='Neighbourhood Group')
plt.show()

##### 1. Why did you pick the specific chart?

To visualize the price distributions based on both room type and neighbourhood group.

##### 2. What is/are the insight(s) found from the chart?

Assess price variations across room types and neighbourhood groups simultaneously, informing pricing strategies or investment decisions.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

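Yes, the gained insights help create a positive business impact: comparing prices across room types and neighbourhood groups at the same time can inform pricing strategies and investment decisions.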

#### Chart - 13

In [None]:
# Chart - 13 visualization
#Bar Chart - Number of Listings by Neighbourhood:
plt.figure(figsize=(10, 8))
airbnb_df['neighbourhood'].value_counts().nlargest(10).plot(kind='bar')
plt.title('Top 10 Neighbourhoods with Highest Listing Counts')
plt.xlabel('Neighbourhood')
plt.ylabel('Listing Count')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

To identify the top 10 neighbourhoods with the highest number of listings.

##### 2. What is/are the insight(s) found from the chart?

 Recognize popular neighbourhoods with a large number of listings, guiding marketing efforts or investment decisions.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the top 10 neighbourhoods with the highest listing counts can potentially create a positive business impact. These neighbourhoods, such as Williamsburg, Bedford-Stuyvesant, and Harlem, have a high demand for Airbnb accommodations, indicating a potential market opportunity. However, without further analysis, it is difficult to determine if there are any specific insights that could lead to negative growth.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(10, 8))
# numeric_only=True restricts the correlation to numeric columns
sns.heatmap(airbnb_df.corr(numeric_only=True), annot=True, cmap="coolwarm", linewidths=0.5)
plt.title("Correlation Matrix")
plt.show()

##### 1. Why did you pick the specific chart?

The template asks for this specific chart. Beyond that, visualizing correlation matrices through heatmaps is a popular method for exploring variable relationships. Key reasons for using this technique include:

1. **Easy Interpretation:** Heatmaps use color gradients to represent correlation values, making relationships easy to interpret. With the `coolwarm` colormap used here, warm (red) shades mark positive correlations and cool (blue) shades mark negative ones, with more saturated colors indicating stronger relationships.

2. **Comprehensive Overview:** The heatmap provides a quick, comprehensive view of overall patterns in variable associations, allowing the identification of correlated clusters or weak/no correlations.

3. **Identifying Relationships:** Heatmaps reveal relationships not immediately apparent in numerical values alone, aiding in the identification of strong positive/negative correlations for further analysis or decision-making.

4. **Feature Selection:** Helps identify variables that are strongly correlated with the target, and predictors that are highly correlated with each other. Dropping one variable from each redundant pair can improve model performance and interpretability.

5. **Data Preprocessing:** Assists in detecting multicollinearity by highlighting high correlations between variables, guiding feature engineering or data preprocessing to address potential impacts on model performance and interpretation.

In summary, the correlation matrix heatmap is a potent visualization tool for understanding variable relationships, uncovering patterns, and making informed decisions in data analysis.
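The multicollinearity check mentioned above can also be done numerically instead of by eye. The following is a minimal sketch under stated assumptions: `strong_pairs` is a hypothetical helper, the 0.8 threshold is an arbitrary choice, and `toy_df` holds made-up values rather than the Airbnb data.

```python
import numpy as np
import pandas as pd


def strong_pairs(df: pd.DataFrame, threshold: float = 0.8) -> pd.Series:
    """List variable pairs whose absolute Pearson correlation meets the threshold."""
    corr = df.select_dtypes(include="number").corr()
    # Keep the upper triangle only, so each pair appears once and the diagonal is skipped.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    strong = pairs[pairs.abs() >= threshold]
    return strong.sort_values(key=lambda s: s.abs(), ascending=False)


# Toy data (hypothetical values); the text column is ignored by select_dtypes.
toy_df = pd.DataFrame(
    {
        "number_of_reviews": [1, 2, 3, 4, 5],
        "reviews_per_month": [0.2, 0.4, 0.6, 0.8, 1.0],
        "price": [3, 1, 4, 1, 5],
        "room_type": ["a", "b", "a", "b", "a"],
    }
)

strong = strong_pairs(toy_df, threshold=0.8)
print(strong)
```

Only the first two toy columns are reported, because they are perfectly proportional to each other. Applied to `airbnb_df`, the same call would list whichever numeric pairs exceed the chosen threshold.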

##### 2. What is/are the insight(s) found from the chart?

The insights from `airbnb_df` Correlation Matrix:

1. **Price:**
   - Weak negative correlation (-0.150019) with 'longitude,' suggesting prices decrease slightly as longitude coordinates increase.
   - Very weak positive correlation (0.042799) with 'minimum_nights.' At this magnitude there is effectively no linear relationship between price and minimum stay requirements.

2. **Number of Reviews:**
   - Negative correlation (-0.319760) with 'id,' suggesting listings with higher IDs have fewer reviews, potentially due to newer listings.
   - Positive correlation (0.549868) with 'reviews_per_month,' showing listings with more reviews tend to have higher monthly review rates.

3. **Calculated Host Listings Count:**
   - Weak positive correlation (0.133272) with 'id,' indicating that listings from hosts with many listings tend to have higher listing IDs, possibly because they were added more recently.
   - Weak positive correlation (0.127960) with 'minimum_nights,' suggesting hosts with more listings may set slightly longer minimum stay requirements.

These correlations describe relationships within the `airbnb_df` dataset that bear on pricing factors, review dynamics, and host characteristics. Most of the coefficients are weak, and correlation alone does not establish causation.
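Coefficients such as 0.04 and 0.55 are easier to compare when they are ranked by magnitude and given a rough strength label. The sketch below assumes a hypothetical helper `correlations_with` and rule-of-thumb cut-offs (0.1, 0.3, 0.5) that are a convention chosen for illustration, not something taken from the analysis. The toy columns are made up.

```python
import pandas as pd


def strength_label(r: float) -> str:
    """Map a correlation coefficient to a rough label (cut-offs are a rule of thumb)."""
    size = abs(r)
    if size < 0.1:
        return "negligible"
    if size < 0.3:
        return "weak"
    if size < 0.5:
        return "moderate"
    return "strong"


def correlations_with(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Rank numeric columns by absolute Pearson correlation with the target column."""
    r = df.select_dtypes(include="number").corr()[target].drop(target)
    r = r.sort_values(key=lambda s: s.abs(), ascending=False)
    return pd.DataFrame({"r": r, "strength": r.map(strength_label)})


# Toy data (hypothetical values, not the Airbnb dataset).
toy_df = pd.DataFrame(
    {
        "price": [1, 2, 3, 4, 5],
        "strongly_related": [2, 4, 6, 8, 10],
        "moderately_related": [3, 1, 4, 1, 5],
        "unrelated": [1, 0, 0, 0, 1],
    }
)

ranked = correlations_with(toy_df, "price")
print(ranked)
```

Under these cut-offs, the 0.042799 between price and minimum nights would be labelled negligible, and the 0.549868 between review count and reviews per month would be labelled strong.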

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# Select the columns for the pair plot
columns = ['price', 'minimum_nights', 'number_of_reviews', 'reviews_per_month', 'calculated_host_listings_count']

sns.pairplot(airbnb_df[columns])


##### 1. Why did you pick the specific chart?

1. **Scatter Plots:**
   - **Price and Longitude:** Indicates a potential negative correlation, suggesting a slight decrease in price as longitude coordinates increase. ('longitude' is not among the plotted `columns`, so this relationship is taken from the correlation heatmap.)
   - **Price and Minimum Nights:** Shows a scatter with no clear pattern, suggesting a weak correlation between price and minimum nights.
   - **Number of Reviews and ID:** Displays a negative correlation, indicating that listings with higher IDs tend to have fewer reviews. ('id' is not among the plotted `columns`, so this relationship is also taken from the correlation heatmap.)
   - **Number of Reviews and Reviews Per Month:** Shows a positive correlation, suggesting listings with more reviews also have higher monthly review rates.
   - **Calculated Host Listings Count and ID:** Indicates a weak positive correlation, suggesting listings from hosts with many listings tend to have higher listing IDs. (Taken from the correlation heatmap, since 'id' is not plotted.)
   - **Calculated Host Listings Count and Minimum Nights:** Displays a scatter with no clear pattern, suggesting a weak correlation between host listings count and minimum nights.

2. **Diagonal Histograms:**
   - **Price:** Histogram provides insights into the distribution of prices, helping identify patterns or outliers.
   - **Minimum Nights:** Histogram shows the distribution of minimum nights, aiding in understanding the typical stay requirements.
   - **Number of Reviews:** Histogram illustrates the distribution of review counts, offering insights into the popularity of listings.
   - **Reviews Per Month:** Histogram provides information on the distribution of review rates over months.
   - **Calculated Host Listings Count:** Histogram shows the distribution of the number of listings per host.

3. **Correlation:**
   - **Observation of Scatter Plots:** The slope of the point cloud indicates the direction of a correlation (positive or negative), while how tightly the points cluster around a line indicates its strength.
   - **No Clear Pattern:** Scatter plots with no distinct pattern suggest little or no linear correlation between the plotted variables.

The pair plot serves as a comprehensive visualization tool, offering insights into variable relationships, distributions, and potential correlations within the dataset.
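If the plotted columns are strongly right-skewed, as prices and review counts often are, most points in a pair plot pile up near the origin. One common remedy is to log-transform the columns and, for large datasets, plot a random sample. The sketch below is an assumption-laden illustration: `prepare_for_pairplot` is a hypothetical helper, `max_rows` and `seed` are arbitrary, and the toy values are made up.

```python
import numpy as np
import pandas as pd


def prepare_for_pairplot(
    df: pd.DataFrame, columns: list, max_rows: int = 2000, seed: int = 0
) -> pd.DataFrame:
    """Drop incomplete rows, optionally subsample, and apply log1p to every column."""
    subset = df[columns].dropna()
    if len(subset) > max_rows:
        subset = subset.sample(n=max_rows, random_state=seed)
    # log1p(x) = log(1 + x) handles zeros; it assumes all values are non-negative.
    return np.log1p(subset).add_prefix("log1p_")


# Toy data (hypothetical values); the last row is dropped because a value is missing.
toy_df = pd.DataFrame(
    {
        "price": [0, 99, 999],
        "number_of_reviews": [0, 9, None],
    }
)

prepared = prepare_for_pairplot(toy_df, ["price", "number_of_reviews"])
print(prepared)

# With seaborn available, the transformed frame can be plotted as before:
# sns.pairplot(prepared)
```

Log-transformed axes change how the scatter plots read, so any pattern found this way describes relative rather than absolute differences.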

##### 2. What is/are the insight(s) found from the chart?

The insights gained from the pair plot provide valuable information for data-driven decision-making. Understanding variable relationships, patterns, and correlations is crucial for identifying influential factors, exploring opportunities, and making informed business decisions. This knowledge enables strategic planning and performance evaluation, helping businesses optimize operations and capitalize on areas for improvement.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

1. **Optimize Pricing Strategy:**
   - Identify overpriced listings and adjust to enhance competitiveness.
   - Address limited availability in affordable ranges through incentives or discounts.

2. **Focus on Popular Neighbourhoods:**
   - Prioritize marketing efforts on Manhattan and Brooklyn.
   - Highlight unique features of these neighbourhoods to attract potential guests.

3. **Understand Room Type Preferences:**
   - Focus on entire home/apartment and private room listings.
   - Implement competitive pricing and enhance listing features for these room types.

4. **Analyze Review Trends:**
   - Monitor review sentiment to identify areas for improvement.
   - Encourage proactive review collection to enhance overall reputation.

5. **Price Distribution by Neighbourhood Group and Room Type:**
   - Set competitive prices based on trends in each neighbourhood and room type.
   - Differentiate pricing based on demand, adjusting according to market conditions.

These solutions align with business objectives, ensuring effective pricing, targeting popular areas, focusing on preferred accommodations, understanding guest satisfaction, and adapting pricing to market trends. Continuous monitoring and analysis are essential for ongoing refinement and adaptation to changing conditions.
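The pricing suggestions above can be made concrete by comparing each listing with the median price of its own neighbourhood group and room type. This is a minimal sketch, not the method used in the analysis: `flag_overpriced` is a hypothetical helper, the factor of 2 is an arbitrary threshold, the column names `neighbourhood_group`, `room_type`, and `price` are assumed, and the toy rows are made up.

```python
import pandas as pd


def flag_overpriced(df: pd.DataFrame, factor: float = 2.0) -> pd.DataFrame:
    """Flag listings priced above `factor` times the median of their group and room type."""
    out = df.copy()
    # transform("median") returns one value per row, aligned with the original index.
    out["group_median_price"] = out.groupby(
        ["neighbourhood_group", "room_type"]
    )["price"].transform("median")
    out["overpriced"] = out["price"] > factor * out["group_median_price"]
    return out


# Toy data (hypothetical values, not the Airbnb dataset).
toy_df = pd.DataFrame(
    {
        "neighbourhood_group": ["Manhattan"] * 3 + ["Brooklyn"] * 2,
        "room_type": ["Entire home/apt"] * 3 + ["Private room"] * 2,
        "price": [100, 120, 400, 50, 60],
    }
)

flagged = flag_overpriced(toy_df)
print(flagged)
```

The median is used instead of the mean so that a few extreme prices do not shift the benchmark. The factor should be tuned to the client's tolerance before acting on the flags.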

# **Conclusion**

1. **Neighbourhood Distribution:**
   - Concentration of listings in Manhattan and Brooklyn.
   - Opportunity for client to focus efforts and resources on these popular neighbourhoods.

2. **Room Type Preferences:**
   - Majority of listings are entire home/apartment and private room.
   - Emphasizes the importance of catering to preferences for these room types.

3. **Pricing Strategy:**
   - Identification of overpriced listings and limited availability in affordable ranges.
   - Opportunity for client to adjust pricing, address overpricing, and introduce promotions to boost bookings.

4. **Review Trends:**
   - Analysis of reviews and last review dates for gauging guest satisfaction.
   - Emphasis on continuous monitoring, addressing negative feedback, and enhancing overall guest experiences.

5. **Price Distribution by Neighbourhood Group and Room Type:**
   - Insights from violin plot to guide competitive pricing.
   - Aligning prices with market trends for specific neighbourhoods and room types to optimize revenue.

In conclusion, the EDA provides actionable insights for the client to enhance business operations, improve customer experiences, and gain a competitive edge in the Airbnb market. Continued monitoring and adaptation of strategies are essential for ongoing success.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***