<a href="https://colab.research.google.com/github/KushangShah/CapstoneProject/blob/main/Sample_EDA_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -



##### **Project Type**    - EDA
##### **Contribution**    - Kushang Shah(Individual)


# **Project Summary -**

Title: Exploratory Data Analysis of Airbnb Dataset: Unveiling Insights for Optimal Stays

Introduction:
This project aimed to perform an in-depth exploratory data analysis (EDA) on a comprehensive Airbnb dataset. The dataset consisted of 48,895 entries, each containing 16 informative columns, such as listing details, host information, location attributes, pricing, and reviews. By analyzing this dataset, we sought to unravel key insights and patterns that could enhance the understanding of Airbnb listings and facilitate better decision-making for both hosts and guests.

Data Loading and Understanding
The initial step involved loading the dataset into a Pandas dataframe, enabling us to explore its structure and familiarize ourselves with the variables. Through this process, we gained valuable knowledge about the dataset's size, column types, and potential areas for analysis.

Data Cleaning for Accurate Analysis
Data cleaning is crucial for maintaining data integrity and ensuring accurate analysis. We addressed missing values by implementing appropriate strategies, either through imputation or by removing rows/columns with excessive missing data. Furthermore, we conducted a thorough examination for duplicate entries, efficiently eliminating any redundancies in the dataset.
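
A minimal sketch of this cleaning step, run on a small made-up frame that borrows the dataset's column names. The fill values chosen here (0 for `reviews_per_month`, a placeholder for a missing `name`) are assumptions for illustration and not necessarily the strategy used in this project.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample that borrows column names from the Airbnb data.
sample = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "name": ["Cozy room", None, None, "Loft"],
    "number_of_reviews": [10, 0, 0, 4],
    "reviews_per_month": [0.8, np.nan, np.nan, 0.3],
})

# Drop exact duplicate rows first so they are not counted twice.
cleaned = sample.drop_duplicates().copy()

# Assumption: a listing with no reviews has a review rate of 0.
cleaned["reviews_per_month"] = cleaned["reviews_per_month"].fillna(0)

# Assumption: a missing title is replaced by a placeholder instead of dropping the row.
cleaned["name"] = cleaned["name"].fillna("Unknown")

# Remaining missing values per column.
print(cleaned.isna().sum())
```

Imputing keeps every listing in the analysis; dropping rows would be the alternative when a column is too sparse to fill sensibly.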

Descriptive Statistics: Unveiling Central Tendencies and Variability
To gain a comprehensive understanding of the dataset, we computed descriptive statistics for numerical columns, such as price, minimum nights, and number of reviews. By calculating measures like mean, median, minimum, maximum, and quartiles, we obtained a clear picture of the dataset's central tendencies and variability. Concurrently, we explored the frequency distribution of categorical variables, shedding light on the distribution of different categories within the dataset.
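
As a rough illustration, the summary statistics and frequency counts described above could be computed as follows. The rows are made up; only the column names come from the dataset.

```python
import pandas as pd

# Made-up rows; only the column names mirror the Airbnb data.
sample = pd.DataFrame({
    "room_type": ["Private room", "Entire home/apt", "Private room", "Shared room"],
    "price": [60, 200, 80, 40],
    "minimum_nights": [1, 3, 2, 1],
})

# Count, mean, std, min, quartiles and max of the numerical columns.
numeric_summary = sample[["price", "minimum_nights"]].describe()

# Frequency distribution of a categorical column.
room_type_counts = sample["room_type"].value_counts()

print(numeric_summary)
print(room_type_counts)
```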

Data Visualization: Unleashing Patterns and Trends
Data visualization is a powerful tool that enables us to uncover hidden patterns and trends. Through various visualizations, including histograms, bar charts, scatter plots, and heatmaps, we embarked on a journey to explore relationships between variables and identify noteworthy insights. For instance, visualizations helped us analyze the distribution of prices across different neighborhoods and room types, ultimately enabling us to discern any spatial trends or disparities.
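
One possible way to compare prices across neighbourhood groups and room types is a grouped bar chart of median prices. The sketch below uses made-up rows, and the choice of the median as the aggregate is an assumption, made so that a few very expensive listings do not dominate the bars.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; only the column names mirror the Airbnb data.
sample = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Queens"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Entire home/apt", "Private room", "Private room"],
    "price": [220, 180, 110, 160, 70, 60],
})

# Rows: neighbourhood group, columns: room type, cells: median nightly price.
median_price = sample.pivot_table(
    index="neighbourhood_group",
    columns="room_type",
    values="price",
    aggfunc="median",
)

# One cluster of bars per neighbourhood group, one bar per room type.
ax = median_price.plot(kind="bar", title="Median price by neighbourhood group and room type")
ax.set_xlabel("Neighbourhood group")
ax.set_ylabel("Median price per night")
plt.tight_layout()
plt.show()
```

Combinations that do not occur in the data (here, an entire home in Queens) show up as missing cells in the table and as gaps in the chart.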

Feature Engineering: Augmenting Analysis Dimensions
To enrich our analysis, we delved into feature engineering. This process involved creating new features or modifying existing ones to extract more meaningful insights. By deriving additional features, such as the host's average reviews per month or the host's total listings, we were able to uncover fresh dimensions of analysis that provided richer context and enhanced our understanding of the dataset.
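
A sketch of how such host-level features might be derived with a grouped transform. The new column names `host_total_listings` and `host_avg_reviews_per_month` are hypothetical, and the rows are made up.

```python
import pandas as pd

# Made-up rows; only id, host_id and reviews_per_month mirror the Airbnb data.
sample = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "host_id": [1, 1, 2, 3],
    "reviews_per_month": [1.0, 3.0, 0.5, 2.0],
})

# Hypothetical feature: number of listings in the sample owned by the same host.
sample["host_total_listings"] = sample.groupby("host_id")["id"].transform("count")

# Hypothetical feature: mean review rate across all of a host's listings.
sample["host_avg_reviews_per_month"] = (
    sample.groupby("host_id")["reviews_per_month"].transform("mean")
)

print(sample)
```

Using `transform` instead of `agg` keeps one row per listing, so the host-level values can be compared directly against listing-level columns such as price.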

Correlation Analysis: Discovering Relationships
Correlation analysis was instrumental in identifying relationships between variables. By calculating correlation coefficients and visualizing them through a correlation matrix, we unraveled significant correlations that offered valuable insights. This analysis allowed us to identify factors that potentially influence pricing or impact the number of reviews, thereby empowering hosts and guests with crucial information for their decision-making processes.
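
For illustration, a correlation matrix over the numerical columns could be computed as below. The values are made up, so the coefficients say nothing about the real dataset.

```python
import pandas as pd

# Made-up rows; only the column names mirror the Airbnb data.
sample = pd.DataFrame({
    "room_type": ["Private room", "Entire home/apt", "Private room",
                  "Entire home/apt", "Shared room"],
    "price": [50, 100, 150, 200, 250],
    "minimum_nights": [1, 2, 2, 4, 5],
    "availability_365": [300, 240, 200, 120, 60],
})

# Keep numerical columns only, then compute pairwise Pearson correlations.
corr = sample.select_dtypes(include="number").corr()
print(corr.round(2))

# In the notebook, seaborn could render the matrix as a heatmap:
# sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
```

A coefficient only measures linear association, so a strong value suggests a relationship worth inspecting and does not show that one variable drives the other.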

Temporal Analysis: Unveiling Trends Over Time
The dataset contained temporal information, such as the last review date. Through temporal analysis, we explored trends over time, seasonal patterns, and any changes in host activity or reviews. This analysis provided valuable insights into the dynamic nature of Airbnb listings and revealed temporal factors that may influence bookings or reviews.
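
A small sketch of one temporal view: counting listings by the month of their most recent review. The rows are made up. Because `last_review` stores only the latest review date, this is a proxy for recent activity and not a full review history.

```python
import pandas as pd

# Made-up rows; only id and last_review mirror the Airbnb data.
sample = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "last_review": ["2019-05-21", "2019-05-03", "2019-06-15", None],
})

# Parse strings to datetime; listings without a review become NaT.
sample["last_review"] = pd.to_datetime(sample["last_review"])

# Keep reviewed listings and count them per calendar month.
reviewed = sample.dropna(subset=["last_review"])
reviews_by_month = (
    reviewed["last_review"].dt.to_period("M").value_counts().sort_index()
)

print(reviews_by_month)
```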

Conclusion:
In conclusion, this project's comprehensive exploratory data analysis of the Airbnb dataset has successfully unveiled key insights and patterns. By following a systematic approach that encompassed data loading, cleaning, descriptive statistics, data visualization, feature engineering, correlation analysis, and temporal analysis, we gained a profound understanding of the dataset's nuances. The analysis yielded actionable insights for hosts to optimize their listings and for guests to make informed decisions when booking stays. Ultimately, this project underscores the significance of EDA in extracting meaningful insights from data.

# **GitHub Link -**

https://github.com/KushangShah

# **Problem Statement**


The goal of this project is to analyze the Airbnb dataset and address the following problem:

"How can we gain insights into the factors influencing the pricing and availability of Airbnb listings in a specific location?"

Key Components of the Problem Statement:

1. Pricing Analysis: Identify the factors that significantly impact the pricing of Airbnb listings, such as room type, location, host characteristics, and amenities. Determine the extent to which each factor contributes to pricing variations.

2. Availability Analysis: Investigate the factors affecting the availability of Airbnb listings throughout the year. Analyze seasonal patterns, identify periods of high and low availability, and explore potential correlations between availability and pricing.

3. Location Influence: Examine the influence of specific neighborhoods or neighborhood groups on pricing and availability. Determine whether certain locations have higher demand or are associated with higher prices.

4. Host Impact: Evaluate the impact of host characteristics, such as the number of listings they manage and their hosting history, on pricing and availability. Assess whether experienced or highly-rated hosts tend to charge more or have better availability.

5. Recommendations: Based on the analysis, provide recommendations for both hosts and potential guests. Suggest strategies for hosts to optimize pricing and improve availability based on the identified influential factors. Offer insights for guests to find suitable listings based on pricing and availability patterns.

By addressing this problem, we aim to provide valuable insights and recommendations to both hosts and guests in the Airbnb ecosystem, enabling them to make informed decisions and optimize their experience on the platform.

#### **Define Your Business Objective?**

The primary business objective related to the Airbnb dataset analysis is to maximize the revenue and utilization of Airbnb listings by understanding the factors influencing pricing and availability. This involves:

1. Optimizing Pricing Strategy: Gain insights into the key factors affecting pricing variations for Airbnb listings. By identifying the most influential factors, hosts can strategically set competitive prices to attract guests while maximizing their revenue.

2. Enhancing Listing Availability: Understand the factors impacting the availability of listings throughout the year. By analyzing seasonal patterns and demand fluctuations, hosts can optimize their listing availability to ensure maximum utilization and minimize periods of low occupancy.

3. Improving Guest Experience: Provide valuable insights and recommendations to potential guests regarding suitable listings based on pricing and availability. Enhancing the guest experience contributes to positive reviews, increased bookings, and potentially higher revenue for hosts.

4. Supporting Business Decisions: The analysis of the Airbnb dataset can help inform strategic business decisions related to expansion, investment, and resource allocation. Understanding the market dynamics and influential factors can guide decision-makers in making informed choices to optimize business outcomes.

Ultimately, the business objective is to drive profitability, increase occupancy rates, and improve customer satisfaction within the Airbnb ecosystem by leveraging data-driven insights and recommendations.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Every piece of logic should have proper comments.
4. You may add as many charts as you want. Make sure the following questions are answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

### Dataset First View

In [None]:
# Dataset First Look
airbnb_df=pd.read_csv('/content/drive/MyDrive/CSV files/Airbnb NYC 2019.csv')
airbnb_df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
airbnb_df.shape

### Dataset Information

In [None]:
# Dataset Info
airbnb_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
airbnb_df.duplicated().sum()  # number of fully duplicated rows, as a single integer

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
miss_value = airbnb_df.isna()
miss_value.sum()

In [None]:
# Visualizing the missing values
import missingno as ms
ms.matrix(airbnb_df)
plt.show()

### What did you know about your dataset?

The dataset loaded above (`Airbnb NYC 2019.csv`) describes properties listed on the Airbnb platform. It contains 48,895 rows and 16 columns, each capturing one attribute of a listing:

ID: A unique identifier for each listing.

Name: The title or name of the listing.

Host ID: A unique identifier for the host of the listing.

Host Name: The name of the host.

Neighbourhood Group: The group or category of the neighborhood where the listing is located.

Neighbourhood: The specific neighborhood where the listing is situated.

Latitude: The latitude coordinates of the listing's location.

Longitude: The longitude coordinates of the listing's location.

Room Type: The type of room or accommodation being offered (e.g., entire home/apartment, private room, shared room).

Price: The price per night for the listing.

Minimum Nights: The minimum number of nights required to book the listing.

Number of Reviews: The total number of reviews received for the listing.

Last Review: The date of the last review for the listing.

Reviews per Month: The average number of reviews per month for the listing.

Calculated Host Listings Count: The total number of listings managed by the host.

Availability 365: The number of days the listing is available for booking within a year.
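
The type, missing-value count, and number of distinct values of every column can be gathered into one overview table. This is a sketch on a made-up frame; in the notebook the same three expressions could be applied to `airbnb_df`.

```python
import numpy as np
import pandas as pd

# Made-up rows; only the column names mirror the Airbnb data.
sample = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Manhattan", "Brooklyn"],
    "price": [149, 225, 89],
    "reviews_per_month": [0.21, np.nan, 4.64],
})

# One row per column: data type, missing values, distinct non-null values.
overview = pd.DataFrame({
    "dtype": sample.dtypes.astype(str),
    "missing": sample.isna().sum(),
    "unique": sample.nunique(),
})

print(overview)
```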

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
airbnb_df.columns

In [None]:
# Dataset Describe
airbnb_df.describe()

### Variables Description

1. **ID:** Unique listing identifier.

2. **Name:** Title or brief description of the listing.

3. **Host ID:** Unique identifier for the listing's host.

4. **Host Name:** Name of the host managing the listing.

5. **Neighbourhood Group:** Categorization of the neighborhood.

6. **Neighbourhood:** Specific location or area of the listing.

7. **Latitude/Longitude:** Geographic coordinates of the listing.

8. **Room Type:** Type of accommodation (e.g., entire home, private room).

9. **Price:** Cost per night for booking.

10. **Minimum Nights:** Minimum required nights for booking.

11. **Number of Reviews:** Cumulative count of reviews received.

12. **Last Review:** Date of the most recent review.

13. **Reviews per Month:** Average monthly review count.

14. **Calculated Host Listings Count:** Total number of listings managed by the host.

15. **Availability 365:** Number of days the listing is available in a year.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
airbnb_df.nunique()

In [None]:
# ID
unique_ids = airbnb_df['id'].nunique()
print("Number of unique IDs:", unique_ids)

In [None]:
# Name
unique_names = airbnb_df['name'].nunique()
print("Unique names:", unique_names)

In [None]:
# Host ID
unique_host_ids = airbnb_df['host_id'].nunique()
print("Number of unique host IDs:", unique_host_ids)

In [None]:
# Host Name
unique_host_names = airbnb_df['host_name'].nunique()
print("Unique host names:", unique_host_names)

In [None]:
# Neighbourhood Group
unique_neighbourhood_groups = airbnb_df['neighbourhood_group'].nunique()
print("Unique neighbourhood groups:", unique_neighbourhood_groups)

In [None]:
# Neighbourhood
unique_neighbourhoods = airbnb_df['neighbourhood'].nunique()
print("Unique neighbourhoods:", unique_neighbourhoods)

In [None]:
# Latitude
unique_latitudes = airbnb_df['latitude'].nunique()
print("Unique latitudes:", unique_latitudes)

In [None]:
# Longitude
unique_longitudes = airbnb_df['longitude'].nunique()
print("Unique longitudes:", unique_longitudes)

In [None]:
# Room Type
unique_room_types = airbnb_df['room_type'].nunique()
print("Unique room types:", unique_room_types)

In [None]:
# Price
unique_prices = airbnb_df['price'].nunique()
print("Unique prices:", unique_prices)

In [None]:
# Minimum Nights
unique_min_nights = airbnb_df['minimum_nights'].nunique()
print("Unique minimum nights:", unique_min_nights)


In [None]:
# Number of Reviews
unique_num_reviews = airbnb_df['number_of_reviews'].nunique()
print("Unique number of reviews:", unique_num_reviews)

In [None]:
# Last Review
unique_last_reviews = airbnb_df['last_review'].nunique()
print("Unique last reviews:", unique_last_reviews)

In [None]:
# Reviews per Month
unique_reviews_per_month = airbnb_df['reviews_per_month'].nunique()
print("Unique reviews per month:", unique_reviews_per_month)

In [None]:
# Calculated Host Listings Count
unique_host_listings_count = airbnb_df['calculated_host_listings_count'].nunique()
print("Unique calculated host listings count:", unique_host_listings_count)

In [None]:
# Availability 365
unique_availabilities = airbnb_df['availability_365'].nunique()
print("Unique availabilities:", unique_availabilities)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***