
# **Project Name**    - Exploratory Data Analysis Using Airbnb Dataset



##### **Project Type**    - EDA
##### **Contribution**    - Individual

# **Project Summary -**

Title: Exploratory Data Analysis of Airbnb Dataset: Unveiling Insights for Optimal Stays

Introduction:
This project aimed to perform an in-depth exploratory data analysis (EDA) on a comprehensive Airbnb dataset. The dataset consisted of 48,895 entries, each containing 16 informative columns, such as listing details, host information, location attributes, pricing, and reviews. By analyzing this dataset, we sought to unravel key insights and patterns that could enhance the understanding of Airbnb listings and facilitate better decision-making for both hosts and guests.

Data Loading and Understanding
The initial step involved loading the dataset into a Pandas dataframe, enabling us to explore its structure and familiarize ourselves with the variables. Through this process, we gained valuable knowledge about the dataset's size, column types, and potential areas for analysis.

Data Cleaning for Accurate Analysis
Data cleaning is crucial for maintaining data integrity and ensuring accurate analysis. We addressed missing values by implementing appropriate strategies, either through imputation or by removing rows/columns with excessive missing data. Furthermore, we conducted a thorough examination for duplicate entries, efficiently eliminating any redundancies in the dataset.

Descriptive Statistics: Unveiling Central Tendencies and Variability
To gain a comprehensive understanding of the dataset, we computed descriptive statistics for numerical columns, such as price, minimum nights, and number of reviews. By calculating measures like mean, median, minimum, maximum, and quartiles, we obtained a clear picture of the dataset's central tendencies and variability. Concurrently, we explored the frequency distribution of categorical variables, shedding light on the distribution of different categories within the dataset.

Data Visualization: Unleashing Patterns and Trends
Data visualization is a powerful tool that enables us to uncover hidden patterns and trends. Through various visualizations, including histograms, bar charts, scatter plots, and heatmaps, we embarked on a journey to explore relationships between variables and identify noteworthy insights. For instance, visualizations helped us analyze the distribution of prices across different neighborhoods and room types, ultimately enabling us to discern any spatial trends or disparities.

Feature Engineering: Augmenting Analysis Dimensions
To enrich our analysis, we delved into feature engineering. This process involved creating new features or modifying existing ones to extract more meaningful insights. By deriving additional features, such as the host's average reviews per month or the host's total listings, we were able to uncover fresh dimensions of analysis that provided richer context and enhanced our understanding of the dataset.

Correlation Analysis: Discovering Relationships
Correlation analysis was instrumental in identifying relationships between variables. By calculating correlation coefficients and visualizing them through a correlation matrix, we unraveled significant correlations that offered valuable insights. This analysis allowed us to identify factors that potentially influence pricing or impact the number of reviews, thereby empowering hosts and guests with crucial information for their decision-making processes.

Temporal Analysis: Unveiling Trends Over Time
The dataset contained temporal information, such as the last review date. Through temporal analysis, we explored trends over time, seasonal patterns, and any changes in host activity or reviews. This analysis provided valuable insights into the dynamic nature of Airbnb listings and revealed temporal factors that may influence bookings or reviews.

Conclusion:
In conclusion, this project's comprehensive exploratory data analysis of the Airbnb dataset has successfully unveiled key insights and patterns. By following a systematic approach that encompassed data loading, cleaning, descriptive statistics, data visualization, feature engineering, correlation analysis, and temporal analysis, we gained a profound understanding of the dataset's nuances. The analysis yielded actionable insights for hosts to optimize their listings and for guests to make informed decisions when booking stays. Ultimately, this project underscores the significance of EDA in extracting meaningful insights from data.

# **GitHub Link -**

https://github.com/Negiamit034

# **Problem Statement**


The goal of this project is to analyze the Airbnb dataset and address the following problem:

"How can we gain insights into the factors influencing the pricing and availability of Airbnb listings in a specific location?"

Key Components of the Problem Statement:

1. Pricing Analysis: Identify the factors that significantly impact the pricing of Airbnb listings, such as room type, location, host characteristics, and amenities. Determine the extent to which each factor contributes to pricing variations.

2. Availability Analysis: Investigate the factors affecting the availability of Airbnb listings throughout the year. Analyze seasonal patterns, identify periods of high and low availability, and explore potential correlations between availability and pricing.

3. Location Influence: Examine the influence of specific neighborhoods or neighborhood groups on pricing and availability. Determine whether certain locations have higher demand or are associated with higher prices.

4. Host Impact: Evaluate the impact of host characteristics, such as the number of listings they manage and their hosting history, on pricing and availability. Assess whether experienced or highly-rated hosts tend to charge more or have better availability.

5. Recommendations: Based on the analysis, provide recommendations for both hosts and potential guests. Suggest strategies for hosts to optimize pricing and improve availability based on the identified influential factors. Offer insights for guests to find suitable listings based on pricing and availability patterns.

By addressing this problem, we aim to provide valuable insights and recommendations to both hosts and guests in the Airbnb ecosystem, enabling them to make informed decisions and optimize their experience on the platform.

#### **Define Your Business Objective?**

The primary business objective related to the Airbnb dataset analysis is to maximize the revenue and utilization of Airbnb listings by understanding the factors influencing pricing and availability. This involves:

1. Optimizing Pricing Strategy: Gain insights into the key factors affecting pricing variations for Airbnb listings. By identifying the most influential factors, hosts can strategically set competitive prices to attract guests while maximizing their revenue.

2. Enhancing Listing Availability: Understand the factors impacting the availability of listings throughout the year. By analyzing seasonal patterns and demand fluctuations, hosts can optimize their listing availability to ensure maximum utilization and minimize periods of low occupancy.

3. Improving Guest Experience: Provide valuable insights and recommendations to potential guests regarding suitable listings based on pricing and availability. Enhancing the guest experience contributes to positive reviews, increased bookings, and potentially higher revenue for hosts.

4. Supporting Business Decisions: The analysis of the Airbnb dataset can help inform strategic business decisions related to expansion, investment, and resource allocation. Understanding the market dynamics and influential factors can guide decision-makers in making informed choices to optimize business outcomes.

Ultimately, the business objective is to drive profitability, increase occupancy rates, and improve customer satisfaction within the Airbnb ecosystem by leveraging data-driven insights and recommendations.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart follows the format below.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
airbnb_df=pd.read_csv('/content/drive/MyDrive/Airbnb NYC 2019.csv')

### Dataset First View

In [None]:
airbnb_df # Dataset First Look

### Dataset Rows & Columns count

In [None]:
airbnb_df.shape # Dataset Rows & Columns count

### Dataset Information

In [None]:
airbnb_df.info()     #Dataset Info

#### Duplicate Values

In [None]:
airbnb_df.duplicated().sum() # Dataset Duplicate Value Count (number of fully duplicated rows)

#### Missing Values/Null Values

In [None]:
missing_value=airbnb_df.isna()   # Missing Values/Null Values Count
missing_value.sum()

In [None]:
import missingno as msno    # Visualizing the missing values
msno.matrix(airbnb_df)
plt.show()

1. Reason for choosing the matrix plot:
The matrix plot is often used as an initial visualization to quickly identify the presence and patterns of missing data across columns and rows. It helps visually highlight the density and distribution of missing values.

2. Insights from the matrix plot:
The insights obtained from the matrix plot may include:

   - Identifying columns with a high concentration of missing values.
   - Observing clusters or patterns of missingness in certain areas of the dataset.
   - Recognizing potential relationships between missing values in different columns or rows.

3. Potential positive business impact of gained insights:
The gained insights can help create a positive business impact by:

   - Guiding data cleaning and preprocessing efforts to handle missing values appropriately.
   - Improving data quality by identifying and addressing areas with a high concentration of missing values.
   - Enhancing decision-making and analysis by ensuring more complete and reliable data.

4. Potential negative growth insights:
If the matrix plot reveals a substantial amount of missing values in critical columns related to essential business factors (e.g., pricing, customer satisfaction), it could lead to negative growth. The insights gained would highlight the need to address data quality issues, as relying on incomplete or unreliable data may result in flawed analyses, inaccurate decision-making, and potentially dissatisfied customers.
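
A minimal sketch of how the missing values seen in the matrix plot could be quantified and handled. The rows below are hypothetical toy data that only mimic a few of the dataset's columns, and both fill rules (0 for `reviews_per_month`, a placeholder for `name`) are assumptions rather than the cleaning steps used in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows that mimic a few Airbnb columns with missing values
toy_df = pd.DataFrame({
    "name": ["Cozy loft", None, "Sunny room"],
    "number_of_reviews": [12, 0, 3],
    "last_review": ["2019-05-21", None, "2019-06-02"],
    "reviews_per_month": [0.8, np.nan, 0.2],
})

# Share of missing values per column, largest first
missing_share = toy_df.isna().mean().sort_values(ascending=False)
print(missing_share)

cleaned_df = toy_df.copy()
# Assumption: a listing without any review has a review rate of zero
cleaned_df["reviews_per_month"] = cleaned_df["reviews_per_month"].fillna(0)
# Assumption: a placeholder is acceptable for a missing listing title
cleaned_df["name"] = cleaned_df["name"].fillna("Unknown")
print(cleaned_df)
```

On the real dataframe the same calls would be applied to `airbnb_df`; whether filling or dropping is appropriate depends on how each column is used later.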

### What did you know about your dataset?

The dataset describes Airbnb listings in New York City for 2019 (the file is `Airbnb NYC 2019.csv`). It contains 48,895 listings, each described by the following 16 columns:

ID: A unique identifier for each listing.

Name: The title or name of the listing.

Host ID: A unique identifier for the host of the listing.

Host Name: The name of the host.

Neighbourhood Group: The group or category of the neighborhood where the listing is located.

Neighbourhood: The specific neighborhood where the listing is situated.

Latitude: The latitude coordinates of the listing's location.

Longitude: The longitude coordinates of the listing's location.

Room Type: The type of room or accommodation being offered (e.g., entire home/apartment, private room, shared room).

Price: The price per night for the listing.

Minimum Nights: The minimum number of nights required to book the listing.

Number of Reviews: The total number of reviews received for the listing.

Last Review: The date of the last review for the listing.

Reviews per Month: The average number of reviews per month for the listing.

Calculated Host Listings Count: The total number of listings managed by the host.

Availability 365: The number of days the listing is available for booking within a year.

## ***2. Understanding Your Variables***

In [None]:
airbnb_df.columns #Dataset columns

In [None]:
airbnb_df.describe() # Dataset Describe

### Variables Description

Description of each variable in the Airbnb dataset:

1. ID: A unique identifier for each listing. It helps to distinguish one listing from another.

2. Name: The title or name of the listing. It provides a brief description or title for the property.

3. Host ID: A unique identifier for the host of the listing. It helps identify the host associated with a particular listing.

4. Host Name: The name of the host. It indicates the name of the person who owns or manages the listing.

5. Neighbourhood Group: The group or category of the neighborhood where the listing is located. It classifies the neighborhood based on a broader grouping or category.

6. Neighbourhood: The specific neighborhood where the listing is situated. It provides the name of the neighborhood or area where the property is located.

7. Latitude: The latitude coordinates of the listing's location. It represents the geographic location on the Earth's surface in terms of latitude.

8. Longitude: The longitude coordinates of the listing's location. It represents the geographic location on the Earth's surface in terms of longitude.

9. Room Type: The type of room or accommodation being offered. It describes the type of space available for booking, such as an entire home/apartment, private room, or shared room.

10. Price: The price per night for the listing. It indicates the cost of booking the property for one night.

11. Minimum Nights: The minimum number of nights required to book the listing. It specifies the minimum duration of stay set by the host.

12. Number of Reviews: The total number of reviews received for the listing. It represents the cumulative count of reviews provided by guests who have stayed at the property.

13. Last Review: The date of the last review for the listing. It indicates the most recent date when a review was posted for the property.

14. Reviews per Month: The average number of reviews per month for the listing. It calculates the average monthly review count based on the total number of reviews and the duration of time the listing has been available.

15. Calculated Host Listings Count: The total number of listings managed by the host. It denotes the count of all the properties the host is managing or offering on the platform.

16. Availability 365: The number of days the listing is available for booking within a year. It represents the total count of days the property is available for booking out of 365 days in a year.

### Check Unique Values for each variable.

In [None]:
airbnb_df.columns

In [None]:
# Number of unique values in every column (ID, name, host, location, room type, price, reviews, availability)
for column in airbnb_df.columns:
    print(f"Unique values in {column}:", airbnb_df[column].nunique())


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
airbnb_df.info()

In [None]:
airbnb_df=airbnb_df.drop(columns=['host_id','host_name'],axis=1)

In [None]:
airbnb_df['availability_365'].value_counts()

In [None]:
# Replace 0 with 'Unknown' in the 'availability_365' column
# (the column now mixes strings and integers, so its dtype becomes object)
airbnb_df['availability_365'] = airbnb_df['availability_365'].replace(0,'Unknown')

# Check the updated value counts
airbnb_df['availability_365'].value_counts()

### What all manipulations have you done and insights you found?

1. Dropping Columns:
   - I dropped the 'host_id' and 'host_name' columns from the `airbnb_df` dataframe using the `drop()` function with the `columns` parameter and `axis=1`.
   - Description: This manipulation removes the 'host_id' and 'host_name' columns from the dataset.
   - Insights: Dropping these columns keeps the focus on the variables that are more relevant to the analysis. It reduces the dimensionality of the data and simplifies the later steps.

2. Replacing 0 with 'Unknown':
   - I replaced the value 0 in the 'availability_365' column with the string 'Unknown' using the `replace()` function.
   - Description: This manipulation replaces all occurrences of 0 in the 'availability_365' column with the value 'Unknown'.
   - Insights: Replacing 0 with 'Unknown' gives a more descriptive label to listings that report 0 available days. This separates them from other listings and can highlight their particular characteristics or booking restrictions. Note that the column then mixes strings and integers, so it is no longer purely numeric.

3. Checking Updated Value Counts:
   - I checked the updated value counts of the 'availability_365' column after replacing 0 with 'Unknown' using the `value_counts()` function.
   - Description: This step shows how often each distinct value occurs in the 'availability_365' column.
   - Insights: The updated value counts show the distribution of availability periods within the dataset. I can compare the frequency of different availability values (e.g., Unknown, 365 days) and look for patterns or trends in the data.

Overall, the manipulations I performed remove unnecessary columns and modify one column to provide clearer information. They streamline the analysis by focusing on relevant variables and make the 'availability_365' column easier to interpret.
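
As an alternative sketch (not the approach used above), the zero-availability listings could be flagged in a separate column so that `availability_365` stays numeric. The toy values and the column name `availability_unknown` are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical toy values standing in for the real 'availability_365' column
toy_df = pd.DataFrame({"availability_365": [0, 365, 120, 0]})

# Flag the zero-availability listings in a separate (hypothetical) column
toy_df["availability_unknown"] = toy_df["availability_365"].eq(0)

# The original column stays numeric, so statistics on it still work
known_mean = toy_df.loc[~toy_df["availability_unknown"], "availability_365"].mean()

print(toy_df["availability_unknown"].value_counts())
print("Mean availability of the remaining listings:", known_mean)
```

The trade-off is one extra column in exchange for keeping numeric operations such as correlations and histograms available on `availability_365`.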

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

In [None]:
airbnb_df.corr(numeric_only=True) # Restrict to numeric columns; text columns cannot be correlated

#### Chart - 1

In [None]:
airbnb_df.info()

In [None]:
#Bar Chart - Neighbourhood Group Distribution:
import matplotlib.pyplot as plt

# Plotting the bar plot
ax = airbnb_df['neighbourhood_group'].value_counts().plot(kind='bar')

# Setting the title and labels
ax.set_title('Distribution of Listings by Neighbourhood Group')
ax.set_xlabel('Neighbourhood Group')
ax.set_ylabel('Count')

# Displaying the plot
plt.show()

##### 1. Why did you pick the specific chart?

To visualize the distribution of listings across different neighbourhood groups.

##### 2. What is/are the insight(s) found from the chart?

Identify which neighbourhood groups have the highest number of listings, which can help in targeting specific areas for business opportunities.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. The distribution of listings across neighbourhood groups shows where listings are concentrated, so based on the result we can focus on the neighbourhood groups with the most listings.

#### Chart - 2

In [None]:

airbnb_df['room_type'].value_counts()

In [None]:
#Pie Chart - Room Type Distribution:
import matplotlib.pyplot as plt

# Plotting the pie chart
ax = airbnb_df['room_type'].value_counts().plot(kind='pie', autopct='%1.1f%%')

# Setting the title and labels
ax.set_title('Room Type Distribution')
ax.set_ylabel('')

# Displaying the plot
plt.show()

##### 1. Why did you pick the specific chart?

To show the proportion of each room type in the dataset.

##### 2. What is/are the insight(s) found from the chart?

Determine the most common room types available, which can inform business decisions on property investments or marketing strategies.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Insights from the Airbnb dataset indicate a positive business impact through high demand for entire homes/apartments and private rooms. However, the limited demand for shared rooms suggests potential challenges and slower growth for hosts in that category. Additional factors like pricing and customer preferences should be considered for informed decision-making.

#### Chart - 3

In [None]:
import matplotlib.pyplot as plt

# Plotting the histogram
plt.figure(figsize=(10, 6))  # Adjust the figure size if desired
plt.hist(airbnb_df['price'], bins=20, edgecolor='black')

# Setting the title and labels
plt.title('Price Distribution')
plt.xlabel('Price')
plt.ylabel('Frequency')

# Adding grid lines for better readability
plt.grid(True, axis='y')

# Displaying the plot
plt.show()


##### 1. Why did you pick the specific chart?

To visualize the distribution of listing prices.

##### 2. What is/are the insight(s) found from the chart?

Understand the price range of listings and identify any outliers or skewed distributions.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Insights from the price distribution histogram of the Airbnb dataset can help create a positive business impact by identifying popular price ranges and optimizing pricing strategies. Overpriced listings and limited availability in affordable ranges may lead to negative growth and should be addressed.
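
To complement the histogram, skew and outliers can also be checked numerically. The sketch below uses hypothetical toy prices and the common 1.5 × IQR fence, which is an assumed rule of thumb and not a threshold taken from this analysis.

```python
import pandas as pd

# Hypothetical toy prices with one extreme value
prices = pd.Series([50, 80, 100, 120, 150, 200, 10000], name="price")

# A mean far above the median points to a right-skewed distribution
print("Mean:", prices.mean(), "Median:", prices.median())

# Interquartile range and the usual upper fence for flagging outliers
q1, q3 = prices.quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
outliers = prices[prices > upper_fence]

print("Upper fence:", upper_fence)
print("Prices above the fence:", outliers.tolist())
```

On the real data the same steps would run on `airbnb_df['price']`; plotting only the prices below the fence usually makes the bulk of the distribution easier to read.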

#### Chart - 4

In [None]:
#Scatter Plot - Latitude vs. Longitude:

# Plotting the scatter plot
plt.figure(figsize=(10, 8))  # Adjust the figure size if desired
plt.scatter(airbnb_df['longitude'], airbnb_df['latitude'], alpha=0.6, c='b', edgecolors='k')

# Setting the title and labels
plt.title('Latitude vs. Longitude')
plt.xlabel('Longitude')
plt.ylabel('Latitude')

# Adding grid lines for better readability
plt.grid(True)

# Displaying the plot
plt.show()


##### 1. Why did you pick the specific chart?

 To plot the geographical locations of listings.

##### 2. What is/are the insight(s) found from the chart?

Visualize the spatial distribution of listings and identify clusters or patterns based on latitude and longitude.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The scatter plot of latitude vs. longitude does not directly provide insights for creating a positive business impact. Instead, it visualizes the geographic distribution of the Airbnb listings. However, by analyzing the scatter plot in combination with other data and factors, such as popular neighborhoods or proximity to attractions, hosts and businesses can gain insights that may contribute to a positive business impact. These insights can inform decisions related to targeting specific areas, optimizing marketing strategies, and identifying potential growth opportunities.

#### Chart - 5

In [None]:
airbnb_df['neighbourhood_group'].value_counts()

In [None]:
#Box Plot - Price Distribution by Neighbourhood Group:
# Create the axes first and pass them in; otherwise boxplot(by=...) opens its own figure
fig, ax = plt.subplots(figsize=(10, 8))  # Adjust the figure size if desired
airbnb_df.boxplot(column='price', by='neighbourhood_group', ax=ax)

# Setting the title and labels
plt.suptitle('')  # Remove the automatic "Boxplot grouped by ..." super-title
ax.set_title('Price Distribution by Neighbourhood Group')
ax.set_xlabel('Neighbourhood Group')
ax.set_ylabel('Price')

# Rotating the x-axis labels for better readability
plt.xticks(rotation=45)

# Displaying the plot
plt.show()


##### 1. Why did you pick the specific chart?

To compare the price distributions across different neighbourhood groups.

##### 2. What is/are the insight(s) found from the chart?

Identify variations in prices between neighbourhood groups, helping in pricing strategies or targeting specific market segments.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Insights from the box plot of price distribution by neighborhood group show variations in prices across different areas. Manhattan and Brooklyn have the highest number of listings, indicating potential positive business impact. However, the presence of higher-priced listings in Manhattan may lead to negative growth for budget-conscious travelers in that area.
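
The comparison in the box plot can be backed by a small summary table. This is a sketch on hypothetical toy rows; the column names match the dataset, but the prices are made up for illustration.

```python
import pandas as pd

# Hypothetical toy listings
toy_df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Queens"],
    "price": [200, 300, 100, 140, 80],
})

# Listing count, median and mean price per group, most expensive group first
price_summary = (
    toy_df.groupby("neighbourhood_group")["price"]
    .agg(["count", "median", "mean"])
    .sort_values("median", ascending=False)
)
print(price_summary)
```

The median is used for sorting because it is less sensitive to a few very expensive listings than the mean.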

#### Chart - 6

In [None]:
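#Line Chart - Number of Reviews over Time:
plt.figure(figsize=(10, 6))  # Adjust the figure size if desired
# Parse the review dates so the x-axis is a real time axis instead of plain strings
review_dates = pd.to_datetime(airbnb_df['last_review'])
airbnb_df.groupby(review_dates)['number_of_reviews'].sum().plot(kind='line')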

# Setting the title and labels
plt.title('Number of Reviews over Time')
plt.xlabel('Last Review Date')
plt.ylabel('Number of Reviews')

# Displaying the plot
plt.show()


##### 1. Why did you pick the specific chart?

To track review activity over time, using the total number of reviews of listings grouped by the date of their most recent review.

##### 2. What is/are the insight(s) found from the chart?

Analyze trends in review activity and assess the popularity or performance of listings over different periods.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

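Yes. Grouping by the "last_review" column and summing "number_of_reviews" shows, for each date, the total review count of the listings that were last reviewed on that date. Because each listing appears only under its most recent review date, the chart indicates how recently listings were active rather than the number of reviews written on each day.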
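
Daily sums can be noisy, so one option is to aggregate by month. The sketch below uses hypothetical toy rows, and the monthly grouping is an assumption rather than a step taken in this notebook.

```python
import pandas as pd

# Hypothetical toy rows; the last listing has never been reviewed
toy_df = pd.DataFrame({
    "last_review": ["2019-05-21", "2019-05-30", "2019-06-02", None],
    "number_of_reviews": [12, 5, 3, 0],
})

# Parse the date strings; missing dates become NaT
review_dates = pd.to_datetime(toy_df["last_review"])

# Group by calendar month; rows with a missing date are dropped by groupby
monthly_reviews = toy_df.groupby(review_dates.dt.to_period("M"))["number_of_reviews"].sum()
print(monthly_reviews)
```

Calling `monthly_reviews.plot(kind='line')` on the result would give a smoother version of the chart above.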

#### Chart - 7

In [None]:
#Stacked Bar Chart - Neighbourhood Group Distribution by Room Type:
# Pass figsize to plot(); a separate plt.figure() call would leave an empty figure behind
pd.crosstab(airbnb_df['neighbourhood_group'], airbnb_df['room_type']).plot(kind='bar', stacked=True, figsize=(10, 8))

# Setting the title and labels
plt.title('Neighbourhood Group Distribution by Room Type')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Count')

# Displaying the plot
plt.show()


##### 1. Why did you pick the specific chart?

To visualize the distribution of room types within each neighbourhood group.

##### 2. What is/are the insight(s) found from the chart?

Identify the predominant room types in different neighbourhood groups, enabling targeted marketing or investment decisions.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights from the stacked bar chart of room type distribution by neighborhood group can contribute to a positive business impact. It helps identify popular room types in each neighborhood, enabling hosts to align their offerings with demand.
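
Absolute counts make small neighbourhood groups hard to compare, so the room type mix can also be expressed as shares within each group. This is a sketch on hypothetical toy rows.

```python
import pandas as pd

# Hypothetical toy listings
toy_df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Shared room",
                  "Private room", "Private room"],
})

# normalize="index" turns each row into shares that sum to 1
room_share = pd.crosstab(
    toy_df["neighbourhood_group"], toy_df["room_type"], normalize="index"
)
print(room_share)
```

Plotting `room_share` as a stacked bar chart gives bars of equal height, which makes the mix of room types directly comparable across groups.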

#### Chart - 8

In [None]:
#Violin Plot - Price Distribution by Neighbourhood Group and Room Type:
plt.figure(figsize=(10, 8))  # Adjust the figure size if desired
sns.violinplot(data=airbnb_df, x='neighbourhood_group', y='price', hue='room_type')

# Setting the title and labels
plt.title('Price Distribution by Neighbourhood Group and Room Type')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Price')

# Displaying the plot
plt.show()

##### 1. Why did you pick the specific chart?

To compare the price distributions across neighbourhood groups and room types simultaneously.

##### 2. What is/are the insight(s) found from the chart?

Assess the price ranges and variations within neighbourhood groups and room types, identifying potential pricing strategies.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained from the violin plot of price distribution by neighborhood group and room type can support creating a positive business impact. It helps identify price ranges that are popular in each neighborhood and for each room type, allowing hosts to optimize pricing strategies. However, if there are limited bookings or high prices in certain room types or neighborhoods, it may result in negative growth due to decreased demand or affordability concerns.

#### Chart - 9

In [None]:
#Scatter Plot - Minimum Nights vs. Price:
# Pass figsize to plot(); a separate plt.figure() call would leave an empty figure behind
airbnb_df.plot(kind='scatter', x='minimum_nights', y='price', figsize=(10, 8))

# Setting the title and labels
plt.title('Minimum Nights vs. Price')
plt.xlabel('Minimum Nights')
plt.ylabel('Price')

# Displaying the plot
plt.show()

##### 1. Why did you pick the specific chart?

 To analyze the relationship between the minimum number of nights and the listing price.

##### 2. What is/are the insight(s) found from the chart?

Assess whether longer minimum stay requirements correlate with higher or lower prices, helping in setting minimum night policies or pricing strategies.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

There is a possibility of negative growth if there is a strong positive correlation between minimum nights and price. If higher prices are associated with longer minimum night requirements, it may deter potential guests, especially those seeking shorter stays. This can lead to a decrease in bookings and a potential negative impact on business growth. Hosts should carefully consider the balance between price and minimum night requirements to avoid potential negative consequences.
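
The direction and strength of the relationship can be checked with correlation coefficients instead of reading it off the scatter plot. The sketch below uses hypothetical toy values; the Spearman coefficient is included as an assumption that a rank-based measure suits skewed data such as prices.

```python
import pandas as pd

# Hypothetical toy listings
toy_df = pd.DataFrame({
    "minimum_nights": [1, 2, 3, 30, 7],
    "price": [150, 120, 100, 60, 90],
})

# Pearson measures linear association
pearson = toy_df["minimum_nights"].corr(toy_df["price"])

# Spearman works on ranks and is less sensitive to extreme values
spearman = toy_df.corr(method="spearman").loc["minimum_nights", "price"]

print("Pearson:", round(pearson, 3))
print("Spearman:", round(spearman, 3))
```

A coefficient near zero on the real data would suggest that minimum night requirements and price are largely unrelated, in which case the concern above would not apply.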






#### Chart - 10

In [None]:
#Scatter Plot - Number of Reviews vs. Reviews per Month:
plt.figure(figsize=(10, 8))  # Adjust the figure size if desired
plt.scatter(airbnb_df['number_of_reviews'], airbnb_df['reviews_per_month'], alpha=0.5, color='blue')

# Setting the title and labels
plt.title('Number of Reviews vs. Reviews per Month')
plt.xlabel('Number of Reviews')
plt.ylabel('Reviews per Month')

# Customizing the plot
plt.grid(True)
plt.legend(['Data points'])
plt.tight_layout()

# Displaying the plot
plt.show()

##### 1. Why did you pick the specific chart?

To explore the relationship between the total number of reviews and the average number of reviews per month.

##### 2. What is/are the insight(s) found from the chart?

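Listings with more total reviews tend to have higher monthly review rates. The correlation matrix in Chart 14 reports a positive correlation of 0.549868 between 'number_of_reviews' and 'reviews_per_month'.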

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The chart suggests that higher values of number_of_reviews and reviews_per_month can create a positive business impact, since frequently reviewed listings signal guest engagement and can attract more bookings. Conversely, low values may point to lower guest engagement or potential dissatisfaction, which can contribute to negative growth.

#### Chart - 11

In [None]:
#Line Chart - Average Price by Neighbourhood
neighbourhood_prices = airbnb_df.groupby('neighbourhood')['price'].mean().sort_values()

# Creating the line chart
plt.figure(figsize=(12, 6))  # Adjust the figure size if desired
neighbourhood_prices.plot(kind='line', marker='o', linestyle='-', color='blue')

# Setting the title and labels
plt.title('Average Price of Listings by Neighbourhood')
plt.xlabel('Neighbourhood')
plt.ylabel('Average Price')

# Customizing the plot
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()

# Displaying the plot
plt.show()

##### 1. Why did you pick the specific chart?

To compare the average prices across different neighbourhoods.

##### 2. What is/are the insight(s) found from the chart?

The chart identifies neighbourhoods with higher or lower average prices, which helps with market analysis and property investment decisions.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights from the average price of listings in different neighborhoods can help create a positive business impact. Lower average prices in some neighborhoods like Bull's Head and Hunts Point may attract budget-conscious travelers and increase bookings. However, higher average prices in neighborhoods like Tribeca and Fort Wadsworth may limit the target audience and potentially lead to negative growth if demand is insufficient.
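An average price can be misleading when a neighbourhood has only a handful of listings, so it may help to look at the listing count next to each mean before naming the cheapest or most expensive areas. The sketch below uses a hypothetical toy DataFrame (`sample_df`, neighbourhood labels `A`, `B`, `C`, and the `min_listings` threshold are all assumptions for illustration); on the real data it would be applied to `airbnb_df`.

```python
import pandas as pd

# Hypothetical toy data standing in for airbnb_df; values are illustrative only.
sample_df = pd.DataFrame({
    "neighbourhood": ["A", "A", "A", "B", "B", "C"],
    "price": [100, 120, 110, 60, 80, 900],
})

# Aggregate the mean price together with the number of listings behind it.
summary = (
    sample_df.groupby("neighbourhood")["price"]
    .agg(mean_price="mean", listings="count")
    .sort_values("mean_price")
)

# Keep only neighbourhoods with enough listings for the mean to be meaningful.
# The threshold of 2 is an arbitrary choice for this toy example.
min_listings = 2
reliable = summary[summary["listings"] >= min_listings]

print(reliable.head(5))  # lowest average prices
print(reliable.tail(5))  # highest average prices
```

In this toy example, neighbourhood `C` has the highest mean price but is filtered out because that mean rests on a single listing.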

#### Chart - 12

In [None]:
#Box Plot - Price Distribution by Room Type and Neighbourhood Group
plt.figure(figsize=(10, 8))  # Adjust the figure size if desired
sns.boxplot(data=airbnb_df, x='room_type', y='price', hue='neighbourhood_group')

# Setting the title and labels
plt.title('Price Distribution by Room Type and Neighbourhood Group')
plt.xlabel('Room Type')
plt.ylabel('Price')

# Customizing the plot
plt.xticks(rotation=45)
plt.legend(title='Neighbourhood Group')

# Displaying the plot
plt.show()

##### 1. Why did you pick the specific chart?

To visualize the price distributions based on both room type and neighbourhood group.

##### 2. What is/are the insight(s) found from the chart?

The chart shows how prices vary across room types and neighbourhood groups at the same time, which can inform pricing strategies and investment decisions.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

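Yes, the gained insights can help create a positive business impact. Seeing how prices vary by room type within each neighbourhood group lets hosts benchmark their own prices against comparable listings.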
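The comparison shown in the box plot can also be summarized as a table of median prices, which is easier to quote than a chart. The following is a minimal sketch on hypothetical data: `sample_df`, its category labels, and its prices are assumptions for illustration, and on the real data the same `pivot_table` call would be applied to `airbnb_df`.

```python
import pandas as pd

# Hypothetical toy data standing in for airbnb_df; labels and prices are illustrative only.
sample_df = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Brooklyn"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt",
                  "Entire home/apt", "Private room", "Private room"],
    "price": [200, 90, 240, 150, 60, 70],
})

# Median price for every (room type, neighbourhood group) combination.
# The median is used because it is less affected by extreme prices than the mean.
median_price = sample_df.pivot_table(
    index="room_type",
    columns="neighbourhood_group",
    values="price",
    aggfunc="median",
)

print(median_price)
```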

#### Chart - 13

In [None]:
#Bar Chart - Number of Listings by Neighbourhood:
plt.figure(figsize=(10, 8))  # Adjust the figure size if desired
airbnb_df['neighbourhood'].value_counts().nlargest(10).plot(kind='bar')

# Setting the title and labels
plt.title('Top 10 Neighbourhoods with Highest Listing Counts')
plt.xlabel('Neighbourhood')
plt.ylabel('Listing Count')

# Rotating the x-axis labels for better readability
plt.xticks(rotation=45)

# Displaying the plot
plt.show()

##### 1. Why did you pick the specific chart?

To identify the top 10 neighbourhoods with the highest number of listings.

##### 2. What is/are the insight(s) found from the chart?

The chart highlights the neighbourhoods with the largest number of listings, which can guide marketing efforts and investment decisions.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the top 10 neighbourhoods with the highest listing counts can potentially create a positive business impact. Neighbourhoods such as Williamsburg, Bedford-Stuyvesant, and Harlem have a large supply of Airbnb accommodations, which may reflect strong demand and a potential market opportunity. A high listing count also means more competition among hosts, however, and listing counts alone do not measure demand, so without further analysis it is difficult to determine whether any specific insight points to negative growth.
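Raw counts show how much supply sits in each neighbourhood; expressing them as a share of all listings makes the concentration easier to compare. The sketch below uses a hypothetical toy DataFrame (`sample_df` and the labels `A`, `B`, `C` are assumptions for illustration); on the real data it would be applied to `airbnb_df`.

```python
import pandas as pd

# Hypothetical toy data standing in for airbnb_df; values are illustrative only.
sample_df = pd.DataFrame({"neighbourhood": ["A"] * 5 + ["B"] * 3 + ["C"] * 2})

# value_counts() sorts neighbourhoods from most to fewest listings.
top = sample_df["neighbourhood"].value_counts().to_frame("listings")

# Share of all listings held by each neighbourhood.
top["share"] = top["listings"] / top["listings"].sum()

# Keep the two largest neighbourhoods (the notebook's chart uses the top 10).
top = top.head(2)
print(top)
```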

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(10, 8))
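# Restrict to numeric columns: recent pandas versions raise an error when corr() meets text columns
sns.heatmap(airbnb_df.select_dtypes(include='number').corr(), annot=True, cmap="coolwarm", linewidths=0.5)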
plt.title("Correlation Matrix")
plt.show()

##### 1. Why did you pick the specific chart?

Visualizing the correlation matrix as a heatmap is a commonly used technique to explore the relationships between variables in a dataset. Here's why this chart is chosen:

1. Easy Interpretation: A heatmap provides a visual representation of the correlation values between variables using color gradients. This makes it easy to interpret the strength and direction of the relationships between variables. With the "coolwarm" colormap used here, strong positive correlations appear in warm (red) shades, strong negative correlations appear in cool (blue) shades, and values near zero appear in pale, neutral shades.

2. Comprehensive Overview: The correlation matrix heatmap allows you to quickly grasp the overall patterns of associations between variables in a single chart. By examining the entire matrix, you can identify clusters of variables that are highly correlated or variables that have weak or no correlations with others.

3. Identifying Relationships: Heatmaps help in identifying relationships that may not be immediately evident from numerical values alone. You can identify variables that have strong positive or negative correlations, which can guide further analysis or decision-making.

4. Feature Selection: Heatmaps can assist in feature selection in two ways. Variables that are highly correlated with the target variable are candidate predictors, while variables that are highly correlated with each other may provide redundant or overlapping information. Keeping only one variable from each highly correlated group can help improve model performance and interpretability.

5. Data Preprocessing: Heatmaps can be useful in identifying variables with high correlations, indicating potential multicollinearity. Multicollinearity can impact the performance and interpretation of regression models, and detecting such relationships can guide feature engineering or data preprocessing steps.

Overall, the correlation matrix heatmap is a powerful visualization tool that allows you to understand the relationships between variables in a concise and intuitive manner. It helps in uncovering patterns, identifying dependencies, and making informed decisions during data analysis.
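The redundancy and multicollinearity checks described in points 4 and 5 can be sketched in code by listing every pair of variables whose absolute correlation exceeds a chosen threshold. This is a minimal illustration on hypothetical data: `sample_df`, its columns `a`, `b`, `c`, and the threshold of 0.8 are assumptions, not values from the Airbnb dataset. On the real data the correlation matrix would come from the numeric columns of `airbnb_df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column "b" is deliberately almost a multiple of "a".
sample_df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],
    "c": [5, 1, 4, 2, 3],
})

corr = sample_df.corr()

# Keep only the upper triangle (k=1 excludes the diagonal) so each pair
# appears once; all other cells become NaN.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Flatten to one row per pair and keep the strongly correlated ones.
# NaN cells never pass the comparison, so they are filtered out as well.
threshold = 0.8  # arbitrary cut-off chosen for this example
pairs = upper.stack()
high_pairs = pairs[pairs.abs() > threshold]

print(high_pairs)
```

Any pair reported here is a candidate for dropping or combining one of its variables before fitting a regression model.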

##### 2. What is/are the insight(s) found from the chart?

Based on the correlation matrix of the `airbnb_df` dataset, here are some insights that can be derived:

1. Price:
   - There is a negative correlation (-0.150019) between 'price' and 'longitude', suggesting that as the longitude coordinates increase, the price tends to decrease slightly.
   - There is a very weak positive correlation (0.042799) between 'price' and 'minimum_nights', indicating almost no linear relationship between price and minimum stay requirements.

2. Number of Reviews:
   - There is a negative correlation (-0.319760) between 'number_of_reviews' and 'id', suggesting that listings with higher IDs tend to have fewer reviews. This relationship could be influenced by the fact that newer listings have had less time to accumulate reviews.
   - There is a positive correlation (0.549868) between 'number_of_reviews' and 'reviews_per_month', indicating that listings with higher review counts tend to have higher monthly review rates.

3. Calculated Host Listings Count:
   - There is a weak positive correlation (0.133272) between 'calculated_host_listings_count' and 'id', indicating that listings with higher listing IDs are slightly more likely to belong to hosts with many listings. Note that 'id' is the listing ID, not the host ID.
   - There is a weak positive correlation (0.127960) between 'calculated_host_listings_count' and 'minimum_nights', suggesting that hosts with more listings may set slightly longer minimum stay requirements.

These insights provide some understanding of the relationships between variables in the dataset.

#### Chart - 15 - Pair Plot

In [None]:
# Select the columns for the pair plot
columns = ['price', 'minimum_nights', 'number_of_reviews', 'reviews_per_month', 'calculated_host_listings_count']

# Create the pair plot
sns.pairplot(airbnb_df[columns])


##### 1. Why did you pick the specific chart?

I picked the pair plot because it allows us to visualize the relationships between multiple variables in a single plot. By plotting each variable against every other variable, we can quickly identify any potential correlations or patterns between them.

From the pair plot, we can gain several insights:

1. Scatter plots: The scatter plots show the relationship between two variables. We can observe if there is a linear or non-linear relationship between variables such as price, minimum nights, number of reviews, reviews per month, and calculated host listings count.

2. Diagonal histograms: The histograms on the diagonal show the distribution of each variable individually. We can examine the distribution of variables such as price, minimum nights, number of reviews, reviews per month, and calculated host listings count to identify any patterns or outliers.

3. Correlation: By examining the scatter plots and observing the general trend of the data points, we can get an idea of the correlation between variables. Positive correlation is indicated by a positive slope in the scatter plot, while negative correlation is indicated by a negative slope. No significant correlation is indicated by a scatter plot with no clear pattern.

##### 2. What is/are the insight(s) found from the chart?

The insights gained from the pair plot can help in understanding the relationships between variables and identifying potential patterns or trends in the data. This information can be useful for making data-driven decisions, identifying influential factors, and exploring potential opportunities or areas of improvement in the business.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***