# **Project Name**    - AirBnb Capstone Project SQL Pandas



In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
!pip install pandas pandasql missingno

##### **Project Type**    - EDA/Regression/Classification/Unsupervised
##### **Contribution**    - Individual(SUMIT Kaushik)


# **Project Summary -**

The analysis of Airbnb data for New York City provides valuable insights into pricing strategies, booking patterns, and guest preferences across different neighbourhoods and room types. By leveraging these insights, property managers and hosts can make informed decisions to enhance guest satisfaction, optimize pricing, and improve occupancy rates. This data-driven approach not only helps in maximizing revenue but also in delivering better customer experiences in a competitive market.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The objective of this project is to analyze Airbnb data for New York City to uncover actionable insights related to pricing, booking patterns, and guest preferences across various neighbourhoods and room types. The goal is to address several key questions and challenges faced by Airbnb hosts and property managers:

1. **How do average prices for Airbnb listings vary across different neighbourhoods in New York City?**

2. **What are the average minimum nights required for different room types, and how does this vary by neighbourhood group?**

3. **What is the average number of reviews received for different room types, and how does this vary by neighbourhood group?**



# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


5. You have to create at least 20 logical & meaningful charts having important insights.

[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]







# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sqlite3
import missingno as msno
import seaborn as sns


### Dataset Loading

In [None]:
# Load Dataset
df=pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Data Sets/Airbnb/Airbnb_NYC.csv")


In [None]:
conn = sqlite3.connect('Airbnb_NYC.db')

df.to_sql('Airbnb_NYC',conn,if_exists='replace',index = False)

conn.close()

In [None]:
# Reopen the connection; it stays open for the SQL queries in the rest of the notebook
conn = sqlite3.connect('Airbnb_NYC.db')
user = pd.read_sql_query('''select * from Airbnb_NYC''',conn)

### Dataset First View

In [None]:
# Dataset First Look
pd.read_sql_query('''

select * from Airbnb_NYC

''',conn)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print(df.shape)
print(df.columns)

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing_values = df.isnull().sum()
print(missing_values)
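Raw counts are easier to judge next to the share of rows they represent. The sketch below is one possible way to report both; the helper name `missing_summary` and the toy values are illustrative assumptions, not part of the Airbnb data.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages for columns that have any."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns with at least one missing value, largest first
    return summary[summary["missing_count"] > 0].sort_values(
        "missing_count", ascending=False
    )


# Toy frame with made-up values standing in for the real dataset
toy = pd.DataFrame({
    "price": [100, 150, 80, 200],
    "last_review": ["2019-01-01", None, None, "2019-05-05"],
    "reviews_per_month": [0.5, np.nan, np.nan, np.nan],
})
result = missing_summary(toy)
print(result)
```

On the notebook's data the same helper could be called as `missing_summary(df)`.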

In [None]:
!pip install missingno

In [None]:
# Visualizing the missing values
msno.matrix(df)

### What did you know about your dataset?

Answer Here

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

Answer Here

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
cursor = conn.cursor()
cursor.execute('''
ALTER TABLE Airbnb_NYC
DROP COLUMN last_review
''')
cursor.execute('''
ALTER TABLE Airbnb_NYC
DROP COLUMN reviews_per_month
''')

# Commit the changes
conn.commit()


In [None]:
cursor = conn.cursor()
cursor.execute('''
ALTER TABLE Airbnb_NYC
DROP COLUMN id
''')


# Commit the changes
conn.commit()
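The `ALTER TABLE ... DROP COLUMN` statements above need SQLite 3.35 or newer and raise an error if the cell is re-run after the column is already gone. A minimal sketch of a re-runnable alternative, assuming it is acceptable to rewrite the table through pandas; the helper `drop_columns_via_pandas` and the `demo_listings` table are hypothetical names used only for illustration.

```python
import sqlite3

import pandas as pd


def drop_columns_via_pandas(conn, table, columns):
    """Rewrite `table` without `columns`; columns that are absent are skipped."""
    frame = pd.read_sql_query(f'SELECT * FROM "{table}"', conn)
    frame = frame.drop(columns=columns, errors="ignore")
    frame.to_sql(table, conn, if_exists="replace", index=False)
    return list(frame.columns)


# In-memory demo database with made-up rows
demo_conn = sqlite3.connect(":memory:")
pd.DataFrame(
    {"id": [1, 2], "name": ["a", "b"], "price": [100, 150]}
).to_sql("demo_listings", demo_conn, index=False)

remaining = drop_columns_via_pandas(demo_conn, "demo_listings", ["id", "last_review"])
# Running it a second time is harmless because missing columns are ignored
rerun = drop_columns_via_pandas(demo_conn, "demo_listings", ["id", "last_review"])
print(remaining, rerun)
demo_conn.close()
```

The trade-off is that the whole table is read into memory and written back, which is fine for a dataset of this size but not for very large tables.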


In [None]:
df1 = pd.read_sql_query('''SELECT * FROM Airbnb_NYC''', conn)
df1.columns

In [None]:
pd.read_sql_query('''
select name , host_id from Airbnb_NYC
''', conn)

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
df1.columns

In [None]:
# Chart - 1 visualization code
chart1 = pd.read_sql_query('''
SELECT neighbourhood_group, AVG(price) AS Average_Price, room_type
FROM Airbnb_NYC
GROUP BY neighbourhood_group, room_type;
''', conn)

# Plotting with Seaborn
plt.figure(figsize=(5, 3))
sns.barplot(data=chart1, x='neighbourhood_group', y='Average_Price', hue='room_type')
plt.title('Average Price by Neighbourhood Group')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Average Price')
plt.legend(title='Room Type')
plt.xticks(rotation=45)  # Rotate x-axis labels if needed
plt.show()

##### 1. Why did you pick the specific chart?

I picked a bar plot for this chart because it is effective for comparing discrete categories. The data is grouped by neighbourhood_group with room_type as a secondary factor, so a grouped bar plot is a natural choice for displaying these comparisons.

##### 2. What is/are the insight(s) found from the chart?

From the bar plot of average price by neighbourhood group and room type, several insights can be gleaned:

1. Price Differences by Neighbourhood Group:
Higher or Lower Prices: You can identify which neighbourhood groups generally have higher or lower average prices. For instance, if one neighbourhood group consistently shows higher average prices compared to others, it might indicate that it is a more expensive area.

2. Patterns Across Neighbourhoods:
Consistent Trends: You might observe consistent patterns across neighbourhood groups. For example, if the price difference between room types is consistent across all neighbourhood groups, it suggests that the relative pricing of room types is similar regardless of location.
Anomalies or Outliers: Some neighbourhoods might have unique pricing patterns. For example, a specific neighbourhood might have very high prices for all room types, or very low prices compared to others.

3. Economic Insights:
Market Segment Analysis: By comparing prices across room types and neighbourhoods, you can gain insights into different market segments. For example, if higher-end neighbourhoods have high average prices for entire homes but lower prices for private rooms, it could indicate that entire homes are more sought after in those areas.
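To check whether the relative pricing of room types really is similar across locations, the grouped averages can be pivoted into a table and compared as a ratio. This is a sketch on made-up numbers shaped like the `chart1` query result; the group labels `A` and `B` and the column `entire_to_private_ratio` are illustrative assumptions.

```python
import pandas as pd

# Made-up averages in the same shape as the chart1 query result
toy_chart1 = pd.DataFrame({
    "neighbourhood_group": ["A", "A", "B", "B"],
    "room_type": ["Entire home/apt", "Private room"] * 2,
    "Average_Price": [200.0, 100.0, 150.0, 60.0],
})

# One row per neighbourhood group, one column per room type
price_table = toy_chart1.pivot(
    index="neighbourhood_group", columns="room_type", values="Average_Price"
)

# A roughly constant ratio would suggest room types are priced similarly
# relative to each other regardless of location
price_table["entire_to_private_ratio"] = (
    price_table["Entire home/apt"] / price_table["Private room"]
)
print(price_table.round(2))
```

With the real data, `chart1` could be used in place of `toy_chart1`.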

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


The insights gained from the bar plot of average price by neighbourhood group and room type can indeed have a significant impact on business decisions, both positive and potentially negative. Here’s how they might influence your business:

**Positive Business Impact**

**Informed Pricing Strategy:**

*   Optimal Pricing: By understanding which neighbourhoods and room types command higher prices, you can set competitive rates for your properties. If you notice that entire homes in upscale areas have higher average prices, you might price your own properties accordingly to maximize revenue.
*   Dynamic Pricing: Insights into price variations can help you implement dynamic pricing strategies, adjusting rates based on neighbourhood demand and room type trends.

**Targeted Marketing:**

*   Focus on High-Demand Areas: If certain neighbourhoods consistently show higher average prices, you might focus your marketing efforts on these areas to attract guests willing to pay more.
*   Highlight Premium Features: For neighbourhoods with high average prices, emphasize the unique or premium features of your properties to justify higher rates.

**Strategic Investment Decisions:**

*   Identify Growth Areas: High average prices in certain neighbourhoods might indicate growth potential. Investing in properties in these areas could yield high returns.
*   Diversify Portfolio: Understanding which room types are more profitable in various neighbourhoods allows you to diversify your property portfolio strategically.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
chart2 = pd.read_sql_query('''
select room_type , avg(minimum_nights) as Avg_Minimum_Nights from Airbnb_NYC
group by room_type
''', conn)
plt.figure(figsize=(4, 4))
sns.barplot(data=chart2, x='room_type', y='Avg_Minimum_Nights', palette='viridis')

# Customizing the plot
plt.title('Average Minimum Nights by Room Type')
plt.xlabel('Room Type')
plt.ylabel('Average Minimum Nights')
plt.xticks(rotation=45)  # Rotate x-axis labels if needed
plt.show()

##### 1. Why did you pick the specific chart?

Answer. **Data Structure:** You have categorical data (room_type) and a numerical summary (Avg_Minimum_Nights). A bar chart is ideal for comparing numerical values across categories.

**Purpose:** The bar chart effectively shows how average minimum nights differ across different room types, making it easy to compare these differences visually.

##### 2. What is/are the insight(s) found from the chart?

**1. Room Type Comparison**
Insight: You can see how the average number of minimum nights required varies between different room types. For example, if "Entire home/apt" has a higher average minimum stay compared to "Private room" and "Shared room," it indicates that entire homes or apartments typically have longer minimum stay requirements.

**2. Patterns and Trends**
Insight: Identifying which room types require longer or shorter minimum stays can help in understanding market demand or pricing strategies. For instance, if "Entire home/apt" requires more nights on average, it may suggest that these accommodations are aimed at longer-term stays or are more likely to attract vacationers who prefer renting a whole unit.
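An average of minimum nights can be pulled upward by a handful of listings with very long minimum stays, so it may help to look at the median next to the mean before drawing conclusions. A small sketch on made-up values (the numbers are illustrative, not taken from the dataset):

```python
import pandas as pd

# Made-up listings; one entire home has an extreme 365-night minimum
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 4 + ["Private room"] * 4,
    "minimum_nights": [2, 3, 3, 365, 1, 1, 2, 2],
})

# Mean, median and max side by side make skew from extreme values visible
stay_stats = toy.groupby("room_type")["minimum_nights"].agg(["mean", "median", "max"])
print(stay_stats)
```

If the mean and median differ sharply for a room type, the bar chart of averages is mostly describing a few extreme listings rather than a typical one.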

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained from analyzing the "Average Minimum Nights by Room Type" can certainly contribute to positive business impacts, but they also have the potential to reveal challenges or lead to negative growth if not properly addressed. Here's how the insights can be both beneficial and potentially detrimental, along with specific justifications:

**Positive Business Impact**

**Optimizing Pricing Strategies:**

**Insight:** Understanding that "Entire home/apt" has a higher average minimum stay can help in setting appropriate pricing strategies. For example, if you know these units are in demand for longer stays, you might offer discounts for extended bookings or adjust rates based on seasonality and booking patterns.

**Impact:** Optimized pricing can increase revenue and attract the right customer segments.

**Targeted Marketing:**

**Insight:** If "Private room" has a lower average minimum stay, it might be popular among short-term travelers or budget-conscious guests. Tailoring marketing efforts to highlight affordability and flexibility can attract this demographic.

**Impact:** Targeted marketing strategies can improve booking rates and customer satisfaction.

**Improving Operational Efficiency:**

**Insight:** By understanding which room types have different minimum stay requirements, property managers can make strategic decisions about property usage, such as converting or adjusting room types to meet market demands.


#### Chart - 3

In [None]:
# Chart - 3 visualization code
chart3 = pd.read_sql_query('''
SELECT neighbourhood_group, AVG(price) AS average_price
FROM Airbnb_NYC
GROUP BY neighbourhood_group;
''', conn)
plt.figure(figsize=(12, 6))
sns.barplot(data=chart3, x='neighbourhood_group', y='average_price', palette='viridis')

# Customizing the plot
plt.title('Average Price by Neighbourhood_group')
plt.xlabel('Neighbourhood_group')
plt.ylabel('Average Price')
plt.xticks(rotation=45)  # Rotate x-axis labels if they overlap
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is ideal for comparing numerical values across different categories. It effectively displays the size of each category in relation to others. In this case, it shows how the average price varies among different neighbourhood groups.

##### 2. What is/are the insight(s) found from the chart?

**Price Distribution Across Neighbourhoods:**

**Insight:** The chart shows which neighbourhoods have higher or lower average prices. For example, if the bars for certain neighbourhoods are significantly higher than others, it indicates that these areas have higher average rental prices.

**Market Segmentation:**

Insight: By examining the average prices, you can segment the market into high-end, mid-range, and budget-friendly neighbourhoods. This segmentation helps in targeting different customer segments more effectively.
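The segmentation described above can be sketched with quantile bins. This is only one possible rule, assuming three equally sized tiers; the group labels `G1` to `G6`, the prices, and the tier names are made up for illustration.

```python
import pandas as pd

# Made-up averages in the same shape as the chart3 query result
toy_chart3 = pd.DataFrame({
    "neighbourhood_group": ["G1", "G2", "G3", "G4", "G5", "G6"],
    "average_price": [50.0, 80.0, 100.0, 120.0, 150.0, 200.0],
})

# Split the groups into three equally sized tiers by average price
toy_chart3["price_tier"] = pd.qcut(
    toy_chart3["average_price"],
    q=3,
    labels=["budget", "mid-range", "high-end"],
)
print(toy_chart3)
```

With only a few neighbourhood groups the tiers are coarse; applying the same idea to the finer-grained `neighbourhood` column would give a more useful segmentation.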

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Enhanced Pricing Strategies:**

**Impact:** Adjusting prices according to neighbourhood trends can optimize revenue. For instance, charging a premium in high-demand areas or offering competitive rates in less expensive neighbourhoods can attract more bookings and increase profitability.

**Targeted Marketing Efforts:**

**Impact:** Tailoring marketing messages to highlight the unique features of properties in high-priced areas (e.g., luxury amenities) or budget-friendly options in lower-priced areas can attract the right audience, improving booking rates.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
chart4 = pd.read_sql_query('''
SELECT neighbourhood_group, AVG(minimum_nights) AS average_minimum_nights
FROM Airbnb_NYC
GROUP BY neighbourhood_group;
''', conn)
plt.figure(figsize=(8, 4))
sns.barplot(data=chart4, x='neighbourhood_group', y='average_minimum_nights', palette='viridis')

# Customizing the plot
plt.title('Average Minimum Nights by Neighbourhood Group')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Average Minimum Nights')
plt.xticks(rotation=45)  # Rotate x-axis labels if they overlap
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart effectively compares the average minimum nights required across different categories (neighbourhood groups). Each bar represents a neighbourhood group, and its height indicates the average minimum nights.

##### 2. What is/are the insight(s) found from the chart?

**Insights from the Chart**

**Minimum Stay Requirements by Neighbourhood:**

**Insight:** The chart shows which neighbourhood groups require longer or shorter minimum stays. For example, if a neighbourhood group has taller bars, it indicates a higher average minimum stay requirement.

**Insight:** Different neighbourhoods may cater to different types of travelers. For instance, neighbourhoods with higher average minimum nights might be targeted at vacationers or long-term stays, while those with lower averages might attract short-term visitors.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**

**Optimized Booking Policies:**

**Impact:** Adjusting policies to fit neighbourhood-specific patterns can lead to higher occupancy rates and improved guest satisfaction. For example, if a neighbourhood has a higher average minimum stay, offering incentives or promotions for longer stays could attract more guests.

**Targeted Marketing Strategies:**

**Impact:** Tailoring marketing messages based on these insights can attract the right audience and increase bookings. For instance, emphasizing the convenience and flexibility of shorter stays in certain areas can appeal to travelers seeking brief accommodations.

**Enhanced Operational Efficiency:**

**Impact:** Efficient operations aligned with neighbourhood trends can improve guest experiences, reduce operational costs, and increase overall profitability.

**Strategic Pricing Adjustments:**

**Impact:** Adjusting pricing strategies based on minimum stay requirements can optimize revenue and align prices with market conditions.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
chart5 = pd.read_sql_query('''
SELECT room_type, AVG(number_of_reviews) AS average_reviews
FROM Airbnb_NYC
GROUP BY room_type;
''', conn)
plt.figure(figsize=(4, 3))
sns.barplot(data=chart5, x='room_type', y='average_reviews', palette='viridis')

# Customizing the plot
plt.title('Average reviews by room_type')
plt.xlabel('room_type')
plt.ylabel('Average reviews')
plt.xticks(rotation=45)  # Rotate x-axis labels if they overlap
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is ideal for comparing average values across different categories. It allows you to see which room types receive more or fewer reviews on average.

##### 2. What is/are the insight(s) found from the chart?

**Insights from the Chart**

**Performance by Room Type:**

**Insight:** The chart will reveal how different room types perform in terms of the number of reviews. For example, if the bar for “Entire Home” is significantly higher than for “Private Room” or “Shared Room,” it indicates that entire homes tend to receive more reviews on average.

**Guest Preferences:**

**Insight:** A higher average for a room type might indicate guest preference for that type of accommodation. Note that `number_of_reviews` is a count of reviews, not a rating, so it reflects how often listings are booked and reviewed more directly than how satisfied guests were.

**Insight:** Room types with a higher average might also reflect better guest experiences or higher-quality accommodations, but confirming this would require rating data.

**Marketing and Pricing Strategy:**

**Insight:** If certain room types receive significantly more reviews, it might be beneficial to adjust marketing strategies and pricing to highlight these options more prominently.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Impacts**

**Impact:** By promoting room types with higher average reviews, you can leverage positive guest experiences to attract more bookings. For example, if "Entire Home" options consistently receive more reviews, marketing these as premium offerings can appeal to guests looking for higher-quality or more private accommodations.

**Impact:** Focused improvements on room types with lower average reviews can enhance overall guest satisfaction. Investing in upgrades or better service for these room types can lead to improved reviews and higher occupancy rates over time.

**Impact:** Adjusting pricing strategies to reflect the positive feedback for these room types can maximize revenue. For instance, if "Entire Home" options have high average reviews, pricing these higher could be justified by the higher guest satisfaction and demand.

**Potential Negative Impacts**

**Neglecting Lower-Performing Room Types:**

**Impact:** If significant improvements are not made for room types with lower average reviews, these types may continue to underperform, leading to lost revenue opportunities and potential dissatisfaction among guests who choose these options.

**Overemphasis on Review Metrics:**

**Impact:** Relying solely on average reviews without considering other metrics (like booking frequency or total number of reviews) might lead to skewed strategies. For instance, a room type with high average reviews but a small sample size may not be as reliable an indicator of general guest satisfaction.
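The sample-size caveat can be checked by reporting the number of listings and the total review count next to each average. A minimal sketch on made-up values; the output column names `listings`, `average_reviews`, and `total_reviews` are illustrative choices.

```python
import pandas as pd

# Made-up listings; "Shared room" has a single listing with many reviews
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 3 + ["Private room"] * 3 + ["Shared room"],
    "number_of_reviews": [10, 20, 30, 5, 15, 40, 90],
})

# Count, mean and sum together show how much data sits behind each average
review_stats = toy.groupby("room_type")["number_of_reviews"].agg(
    listings="count",
    average_reviews="mean",
    total_reviews="sum",
)
print(review_stats)
```

In this toy example "Shared room" has the highest average but rests on one listing, which is exactly the situation the caveat warns about.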

#### Chart - 6

In [None]:
# Chart - 6 visualization code
# Average number of reviews per room type within each neighbourhood group
chart6 = pd.read_sql_query('''
SELECT neighbourhood_group, room_type, AVG(number_of_reviews) AS average_reviews
FROM Airbnb_NYC
GROUP BY neighbourhood_group, room_type;
''', conn)
plt.figure(figsize=(6, 3))
sns.barplot(data=chart6, x='neighbourhood_group', y='average_reviews', hue='room_type', palette='viridis')

# Customizing the plot
plt.title('Average Reviews by Neighbourhood Group and Room Type')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Average Number of Reviews')
plt.legend(title='Room Type')
plt.xticks(rotation=45)  # Rotate x-axis labels if they overlap
plt.show()

##### 1. Why did you pick the specific chart?

Bar plots effectively display comparisons between different categories, making it easy to compare average reviews for different room types within each neighbourhood group.

##### 2. What is/are the insight(s) found from the chart?

**Insights from the Chart**

**Performance by Room Type:**

**Insight:** The chart will reveal how different room types perform in terms of the number of reviews. For example, if the bar for “Entire Home” is significantly higher than for “Private Room” or “Shared Room,” it indicates that entire homes tend to receive more reviews on average.

**Guest Preferences:**

**Insight:** Higher average reviews for certain room types might indicate higher guest satisfaction or preferences for those types of accommodations.

**Insight:** Room types with higher average reviews might reflect better guest experiences or higher-quality accommodations.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:** Understanding review trends by room type and neighbourhood group can lead to more effective marketing, operational improvements, and better alignment with guest preferences.

**Potential Negative Impacts:** Misinterpreting or incorrectly analyzing review data could lead to misguided strategies. Ensure the data is accurately represented and interpreted to avoid potential negative outcomes.

#### Chart - 13

In [None]:
df1.head()

In [None]:
# Chart - 13 visualization code
chart7 = pd.read_sql_query('''
SELECT AVG(availability_365) AS Average_Availability, neighbourhood_group, AVG(price) AS Average_price
FROM Airbnb_NYC
GROUP BY neighbourhood_group;
''', conn)
fig, ax1 = plt.subplots(figsize=(12, 8))

# Plotting Average Availability
ax1.set_xlabel('Neighbourhood Group')
ax1.set_ylabel('Average Availability (Days per Year)', color='tab:blue')
sns.lineplot(data=chart7, x='neighbourhood_group', y='Average_Availability', ax=ax1, color='tab:blue', marker='o')
ax1.tick_params(axis='y', labelcolor='tab:blue')

# Create a second y-axis for Average Price
ax2 = ax1.twinx()
ax2.set_ylabel('Average Price', color='tab:red')
sns.lineplot(data=chart7, x='neighbourhood_group', y='Average_price', ax=ax2, color='tab:red', marker='o')
ax2.tick_params(axis='y', labelcolor='tab:red')

plt.title('Dual Axis Line Plot of Average Availability and Average Price by Neighbourhood Group')
plt.xticks(rotation=45)
plt.grid(True)
plt.show()



##### 1. Why did you pick the specific chart?

**Comparing Two Metrics Simultaneously:**

The Dual Axis Line Plot allows you to display two different variables on the same chart but with separate y-axes. This is crucial when you need to compare metrics that are on different scales or units. In your case, Average_Availability (measured in days) and Average_price (measured in currency) are on different scales and cannot be directly compared on a single y-axis.

##### 2. What is/are the insight(s) found from the chart?

**Overall Trends:** Observe the trends for both average availability and average price. For instance, if both metrics increase or decrease together, it may suggest a related trend where higher availability correlates with higher or lower prices.

**Neighbourhood Specific Trends:** Compare trends across individual neighbourhood groups. Some neighbourhoods may show distinct patterns in availability and price, which could be influenced by local factors or market conditions.
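Whether the two lines "move together" can be summarized with a correlation coefficient instead of judged by eye. The sketch below uses made-up numbers shaped like the `chart7` result; with only a handful of neighbourhood groups such a coefficient is a rough indication at best.

```python
import pandas as pd

# Made-up averages in the same shape as the chart7 query result
toy_chart7 = pd.DataFrame({
    "neighbourhood_group": ["G1", "G2", "G3", "G4"],
    "Average_Availability": [100.0, 150.0, 200.0, 250.0],
    "Average_price": [200.0, 150.0, 120.0, 80.0],
})

# Pearson correlation: negative means price falls as availability rises
corr = toy_chart7["Average_Availability"].corr(toy_chart7["Average_price"])
print(round(corr, 3))
```

Computing the same correlation on the listing-level columns `availability_365` and `price` would use far more data points than the grouped averages.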

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact**

**1. Optimizing Pricing Strategies:**

Insight: If the chart reveals that neighbourhoods with higher average availability have lower average prices, property managers and investors can adjust pricing strategies. By aligning prices more closely with demand and availability, they can optimize their revenue. For example, in areas with high availability but low prices, increasing the price might be feasible if the demand increases.

**Business Impact:** This optimization can lead to improved profitability and better alignment with market demand, ultimately enhancing revenue.

**2. Targeted Marketing Efforts:**

**Insight:** Identifying neighbourhoods with high prices and low availability can help in targeting marketing efforts. By focusing on areas with higher demand, businesses can effectively promote their properties and attract more bookings.

**Business Impact:** Better-targeted marketing can increase occupancy rates and attract high-value clients, contributing to business growth.

**3. Strategic Investment Decisions:**

**Insight:** The chart helps in recognizing areas with high demand (low availability) and high prices. Investors can target such high-demand areas for future investments, leading to potentially higher returns.

**Business Impact:** Investing in high-demand neighbourhoods can yield better returns and reduce risks associated with oversupply in less desirable areas.

**Potential Negative Growth**

**1. Oversupply and Low Prices:**

**Insight:** If the data shows that neighbourhoods with high average availability also have low average prices, it indicates an oversupply situation. This oversupply can lead to decreased revenue per listing.

**Negative Impact:** Businesses might struggle with lower profitability in these areas. If prices are too low to cover operational costs, it can lead to financial losses and negatively impact overall growth.

**2. High Prices and Low Availability:**

**Insight:** Neighbourhoods with high prices and low availability might indicate a saturated market with high entry barriers. High prices combined with low availability could mean that customers are less willing to pay premium rates, especially if alternatives are available.

**Negative Impact:** Businesses operating in these areas might face challenges in maintaining high occupancy rates, leading to potential revenue loss or lower growth.

**3. Misalignment with Market Trends:**

**Insight:** If trends show a disconnect between average availability and pricing, businesses might be misaligned with current market conditions. For example, maintaining high prices in an area with increasing availability might lead to decreased demand.

**Negative Impact:** Misalignment with market trends can result in reduced bookings and revenue, as potential customers might turn to more competitively priced options.

#### Chart - 14 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(df1)
plt.suptitle('Pair Plot of the Dataset', y=1.02, fontsize=16)
plt.show()

##### 1. Why did you pick the specific chart?

I picked the pair plot (scatterplot matrix), because it allows for a comprehensive visualization of the relationships between pairs of variables in a dataset. This is particularly useful when exploring a dataset with multiple numeric variables, as it provides a quick and visual way to identify patterns, correlations, and potential outliers in the data.

##### 2. What is/are the insight(s) found from the chart?

<ul>
<li>
<p><strong>Correlation</strong>: The pair plot helps identify linear relationships between pairs of variables. For example, if points in a scatter plot tend to form a straight line, it indicates a strong linear relationship.</p>
</li>
<li>
<p><strong>Distribution</strong>: The diagonal plots show the distribution of each variable. Deviations from normality or unusual patterns can be observed, which may indicate the need for further investigation.</p>
</li>
<li>
<p><strong>Outliers</strong>: Outliers can be identified as points that fall far away from the main clusters in scatter plots.</p>
</li>
<li>
<p><strong>Variable Importance</strong>: By observing the scatter plots and histograms, we can get a sense of which variables might be important in predicting the target variable.</p>
</li>
</ul>
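Outliers spotted visually in the scatter plots can be cross-checked with a simple rule. The sketch below uses the common 1.5 × IQR fence; the helper name `iqr_outlier_mask`, the multiplier `k`, and the toy prices are assumptions for illustration, not a method taken from this analysis.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values lying more than k * IQR outside the quartiles."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Made-up prices with one extreme value
toy_prices = pd.Series([80, 90, 100, 110, 120, 5000])
mask = iqr_outlier_mask(toy_prices)
print(toy_prices[mask])
```

On the notebook's data the same helper could be applied to a column such as `df1["price"]`.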

#### Chart - 15 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
numeric_cols = df1.select_dtypes(include=['int64', 'float64']).columns
correlation_matrix = df1[numeric_cols].corr()

plt.figure(figsize=(12, 10))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=.5)
plt.title('Correlation Heatmap', fontsize=16)
plt.show()

##### 1. Why did you pick the specific chart?

<ul>
<li>
<p>I chose a correlation heatmap for the following reasons:</p>
<ol>
<li>
<p><strong>Visualizing Relationships</strong>: A heatmap is effective for visualizing the strength and direction of relationships between numeric variables in a dataset.</p>
</li>
<li>
<p><strong>Matrix Representation</strong>: The heatmap represents the correlation matrix, which is a concise way to display pairwise correlations.</p>
</li>
<li>
<p><strong>Color Mapping</strong>: The use of colors (e.g., 'coolwarm' colormap) helps in quickly identifying positive and negative correlations.</p>
</li>
<li>
<p><strong>Annotation</strong>: Annotating the heatmap with correlation coefficients provides detailed information without overcrowding the visualization.</p>
</li>
</ol>
</li>
</ul>

##### 2. What is/are the insight(s) found from the chart?

<ul>
<li>
<p><strong>Strong Correlations</strong>: Strong positive correlations (values close to 1) indicate that as one variable increases, the other variable tends to increase as well. Strong negative correlations (values close to -1) indicate that as one variable increases, the other variable tends to decrease.</p>
</li>
<li>
<p><strong>Weak Correlations</strong>: Weak correlations (values close to 0) suggest little to no linear relationship between variables.</p>
</li>
<li>
<p><strong>Correlation Patterns</strong>: Patterns in the heatmap can reveal relationships between variables that may be of interest for further analysis. For example, high positive correlations between certain features may indicate redundancy in the dataset, while high negative correlations may suggest potential trade-offs.</p>
</li>
</ul>
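Reading the strongest relationships off a heatmap gets harder as the number of columns grows, so it can help to list the top pairs directly. A minimal sketch, assuming a helper named `top_correlations` (hypothetical) and a made-up frame with columns `a`, `b`, and `c`:

```python
import numpy as np
import pandas as pd


def top_correlations(frame: pd.DataFrame, n: int = 3) -> pd.Series:
    """Return the n variable pairs with the largest absolute correlation."""
    corr = frame.select_dtypes("number").corr()
    # Keep only the upper triangle so each pair appears once
    # and the diagonal of 1.0 values is excluded
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    pairs = upper.stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)


# Made-up data: b is exactly 2 * a, c tends to fall as a rises
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
})
top = top_correlations(toy)
print(top)
```

With the notebook's data, `top_correlations(df1)` would list the pairs that stand out most in the heatmap.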

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

**Summary of Recommendations:**

**Dynamic Pricing:** Adjust rates based on availability and demand.

**Targeted Marketing:** Focus on high-demand areas and tailor messages to different customer segments.

**Strategic Investment:** Invest in high-demand and emerging neighbourhoods.

**Operational Efficiency:** Optimize property management and improve review management.

**Customer Experience:** Personalize guest experiences and address common feedback.

# **Conclusion**

<p><strong>After conducting a thorough analysis of the Airbnb dataset, the following conclusions can be drawn:</strong></p>
<ol>
    <li>
        <p><strong>Pricing and Availability Dynamics</strong>:</p>
        <ul>
            <li><strong>Dual Axis Line Plot</strong>: The relationship between average price and average availability across neighbourhoods reveals how supply and demand interact. Neighbourhoods with low availability and high prices are in high demand, while areas with high availability and low prices might be oversupplied. This insight underscores the importance of dynamic pricing strategies to optimize revenue based on market conditions.</li>
        </ul>
    </li>
    <li>
        <p><strong>Market Segmentation and Positioning</strong>:</p>
        <ul>
            <li><strong>Grouped Bar Plot and Scatter Plot</strong>: These plots highlight variations in key metrics across different neighbourhood groups. High-demand areas with lower availability might justify higher prices, whereas areas with high availability could benefit from competitive pricing or targeted promotions. Effective market segmentation and positioning can enhance competitiveness and attract a diverse clientele.</li>
        </ul>
    </li>
    <li>
        <p><strong>Investment and Growth Opportunities</strong>:</p>
        <ul>
            <li><strong>Correlation Heatmap</strong>: This visualization helps identify which variables are strongly correlated, guiding investment decisions. For instance, investing in neighbourhoods with high demand (low availability) and high prices can yield better returns. Additionally, understanding correlations between operational metrics and performance indicators helps in making informed investment choices.</li>
        </ul>
    </li>
    <li>
        <p><strong>Operational Efficiency and Customer Satisfaction</strong>:</p>
        <ul>
            <li><strong>Correlation Heatmap and Average Reviews by Room Type</strong>: Insights into how operational factors and customer reviews correlate with pricing and availability can guide improvements. Addressing operational inefficiencies and focusing on improving review scores can enhance overall property performance and customer satisfaction.</li>
        </ul>
    </li>
</ol>

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***