
# **Project Name - Airbnb Bookings Analysis**    



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Name**    - Akash Saini


# **Project Summary -**

The objective of this project is to conduct an exploratory data analysis (EDA) on Airbnb listings data in New York City to gain insights into the rental landscape. By analyzing various attributes such as listing details, host information, location data, pricing, and availability, we aim to understand key trends, patterns, and relationships within the dataset specific to New York City.

# **GitHub Link -**

https://github.com/Skylord096/Capstone-Project-1

# **Problem Statement**


**1. Price Disparity Analysis:**

* Investigate the factors contributing to price disparities among Airbnb listings in different neighborhoods or boroughs of New York City.
* Explore whether there are significant differences in pricing based on room types (entire home/apartment, private room, shared room).

**2. Host Behavior and Performance:**

* Analyze the relationship between host characteristics (such as host_id, host_name) and listing performance metrics (number of reviews, reviews per month).
* Identify top-performing hosts in terms of overall satisfaction and booking frequency.

**3. Temporal Patterns and Seasonality:**

* Examine temporal patterns in Airbnb bookings, including seasonal fluctuations, peak booking months, and trends over time.
* Investigate whether there are specific events or holidays that influence booking rates and pricing.

**4. Availability and Booking Trends:**

* Study the availability of listings throughout the year and analyze factors influencing listing availability (e.g., minimum nights required, host activity).
* Explore booking trends and demand patterns for different types of accommodations (e.g., entire home/apartment vs. private room).

**5. Geospatial Analysis:**

* Conduct a geospatial analysis to identify spatial clusters of Airbnb listings and assess their proximity to popular tourist attractions, transportation hubs, and amenities.
* Investigate spatial disparities in listing density and pricing across different neighborhoods or boroughs.

**6. Customer Satisfaction and Reviews:**

* Analyze customer reviews and ratings to identify common themes, sentiments, and factors driving guest satisfaction or dissatisfaction.
* Explore the relationship between listing attributes (such as price, location, room type) and customer reviews.

**7. Impact on Local Housing Market:**

* Assess the impact of Airbnb listings on the local housing market, including rental prices, housing availability, and neighborhood gentrification.
* Investigate whether there are correlations between the concentration of Airbnb listings and changes in property values or rental affordability.

**8. Regulatory Compliance and Policy Implications:**

* Evaluate the extent to which Airbnb listings comply with local regulations and zoning ordinances in New York City.
* Assess the effectiveness of existing policies or regulatory measures aimed at regulating short-term rentals and addressing concerns related to housing affordability and community impact.

#### **Define Your Business Objective?**



The business objective for analyzing Airbnb listings data in New York City is to gain actionable insights that can inform decision-making and strategy development for stakeholders involved in the short-term rental market. This includes Airbnb hosts, property managers, tourists, policymakers, and regulatory agencies. Specifically, the business objectives could be:

**Optimizing Revenue and Occupancy Rates:**

*   Identify factors that contribute to higher occupancy rates and pricing for Airbnb listings in different neighborhoods or property types.
*   Develop strategies to optimize listing performance and maximize revenue for hosts and property managers.

**Enhancing Customer Experience:**

* Understand guest preferences, satisfaction levels, and pain points through analysis of reviews and ratings.
* Implement improvements to listings and services to enhance the overall guest experience and drive positive reviews and repeat bookings.

**Market Segmentation and Targeting:**

* Segment the market based on traveler demographics, preferences, and booking behaviors.
* Tailor marketing efforts and pricing strategies to target specific customer segments effectively.

**Compliance and Risk Management:**

* Ensure compliance with local regulations and legal requirements governing short-term rentals in New York City.
* Identify potential risks and liabilities associated with non-compliance and develop strategies to mitigate them.

**Competitive Analysis and Benchmarking:**

* Benchmark performance against competitors and industry standards to identify areas of strength and areas for improvement.
* Monitor market trends and competitor activities to stay competitive in the dynamic short-term rental market.

**Community and Stakeholder Engagement:**

* Foster positive relationships with local communities by addressing concerns related to housing affordability, neighborhood disruption, and regulatory compliance.
* Engage with stakeholders, including policymakers, residents, and advocacy groups, to collaborate on sustainable solutions and responsible tourism practices.

By achieving these business objectives, stakeholders can unlock the full potential of the Airbnb platform while ensuring positive outcomes for hosts, guests, local communities, and the broader tourism ecosystem in New York City. The insights derived from the analysis will enable informed decision-making, strategic planning, and proactive measures to address challenges and capitalize on opportunities in the short-term rental market.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is used and the questions below are answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]
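
A minimal sketch of what each level of the "UBM" rule can look like in pandas. The toy DataFrame is made up for illustration; only the column names (`neighbourhood_group`, `room_type`, `price`) are borrowed from the Airbnb dataset used later in this notebook.

```python
import pandas as pd

# Toy data (hypothetical values); column names mirror the Airbnb dataset
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Queens", "Queens"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt",
                  "Private room", "Private room", "Shared room"],
    "price": [200, 100, 150, 70, 60, 40],
})

# U - Univariate: distribution of a single variable
room_type_counts = toy["room_type"].value_counts()

# B - Bivariate (numerical vs. categorical): price summarised per borough
median_price_by_group = toy.groupby("neighbourhood_group")["price"].median()

# M - Multivariate: price against two categorical variables at once
price_pivot = toy.pivot_table(
    index="neighbourhood_group", columns="room_type", values="price", aggfunc="mean"
)

print(room_type_counts)
print(median_price_by_group)
print(price_pivot)
```

Each of these tables maps naturally onto a chart type (bar chart, box plot, heatmap), which is one way to keep the required charts organised.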





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt      #for visualisation
import seaborn as sns         #for visualisation
import folium
import warnings
warnings.filterwarnings('ignore')

### Dataset Loading

In [None]:
# Load Dataset

# Specify the file path
filepath = '/content/drive/MyDrive/Almabetter Projects/Module-2/Capstone Project-2/Airbnb NYC 2019.csv'

#load the file
airbnb_df = pd.read_csv(filepath)
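
Because the guidelines reward exception handling, the loading step could be wrapped in a small helper that fails with a clear message when the file is missing. This is only a sketch: `load_listings` is a hypothetical helper name, and the demo writes a tiny temporary CSV with made-up rows so that it runs without the Google Drive path above.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_listings(path):
    """Read a listings CSV into a DataFrame, raising a clear error if the file is missing."""
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(f"Dataset not found at: {csv_path}")
    return pd.read_csv(csv_path)


# Demo on a throwaway CSV (made-up rows) so the sketch is self-contained
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo_listings.csv"
    demo_path.write_text("id,price\n1,100\n2,250\n")
    demo_df = load_listings(demo_path)

    # A missing file produces a readable error instead of a long traceback
    try:
        load_listings(Path(tmp_dir) / "missing.csv")
    except FileNotFoundError as err:
        error_message = str(err)

print(demo_df.shape)
print(error_message)
```

In the notebook, the equivalent call would be `airbnb_df = load_listings(filepath)`.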

### Dataset First View

In [None]:
# Dataset First Look
airbnb_df

In [None]:
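# A cell only renders its last expression, so display() is used to show both views
from IPython.display import display
display(airbnb_df.head())
display(airbnb_df.tail())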

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count

# Get the number of rows and columns
num_rows, num_columns = airbnb_df.shape

# Print the results
print("Number of rows:", num_rows)
print("Number of columns:", num_columns)

## **About the Dataset - Airbnb Bookings**

*   This Airbnb dataset contains nearly 49,000 observations from New York City, with 16 columns of data.

*   The data includes both categorical and numeric values, providing a diverse range of information about the listings.

*   The dataset may be useful for analyzing trends and patterns in the Airbnb market in New York City and for gaining insights into the preferences and behavior of Airbnb users in the area.

*   It covers Airbnb listings in New York City in 2019, so the analysis reflects how Airbnb was used in NYC during that year.

### Dataset Information

In [None]:
# Dataset Info
# 1. Shape of the Dataset
print("Dataset Shape:")
print(airbnb_df.shape)
print()

# 2. Column Names
print("Column Names:")
print(airbnb_df.columns)
print()

# 3. Data Types
print("Data Types:")
print(airbnb_df.dtypes)
print()

# 4. Summary Statistics
print("Summary Statistics:")
print(airbnb_df.describe())
print()

# 5. Missing Values
print("Missing Values:")
print(airbnb_df.isnull().sum())
print()

# 6. Unique Values
print("Unique Values:")
for column in airbnb_df.columns:
    if airbnb_df[column].dtype == 'object':  # Only for categorical columns
        print(f"Column: {column}, Unique Values: {airbnb_df[column].nunique()}")


#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count

# Count duplicate rows
duplicate_count = airbnb_df.duplicated().sum()

print("Total Duplicate Rows:", duplicate_count)

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

# Count missing values
missing_values_count = airbnb_df.isnull().sum()

print("Missing Values Count:")
print(missing_values_count)


In [None]:
import matplotlib.pyplot as plt

# Visualize missing values as a heatmap
plt.figure(figsize=(10, 6))
sns.heatmap(airbnb_df.isna())

### What did you know about your dataset?

Based on the information gathered from the Airbnb dataset, here's what we know:

1. **Dataset Shape**:
   - The dataset contains **48,895** rows and **16** columns, which gives an overview of its size and structure.

2. **Column Names**:
   - We have a list of column names, which represent the different attributes or features available in the dataset.

3. **Data Types**:
   - We know the data types of each column, including whether they contain numerical or categorical data.

4. **Summary Statistics**:
   - We have calculated summary statistics for numerical columns, giving insights into the central tendency, dispersion, and potential outliers in the data.

5. **Missing Values**:
   - We have identified columns with missing values and know the count of missing values in each column.

6. **Unique Values**:
   - For categorical columns, we have counted the number of unique values, providing an understanding of the variety and distribution of different categories.

7. **Duplicate Values**:
   - We have checked for duplicate rows in the dataset and counted the total number of duplicate rows, if any.

8. **Visualizing Missing Values**:
   - We have visualized missing values using a heatmap, which helps identify patterns and concentrations of missing data across different columns.

Overall, this information provides us with a comprehensive understanding of the Airbnb dataset, laying the groundwork for further exploratory data analysis and insights generation. We can now proceed with more in-depth analysis, visualization, and interpretation to uncover meaningful patterns and trends in the data.
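
The separate checks above (data types, missing values, unique values) can also be collected into one table with a row per column. A minimal sketch, assuming a hypothetical helper name `column_overview` and a made-up toy DataFrame; in the notebook it would be called on `airbnb_df`.

```python
import numpy as np
import pandas as pd


def column_overview(df):
    """Return one row per column: dtype, missing count, missing share (%) and unique count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(2),
        "unique": df.nunique(),  # nunique() ignores missing values by default
    })


# Toy data with made-up values
toy = pd.DataFrame({
    "room_type": ["Private room", "Entire home/apt", "Private room", None],
    "price": [80, 150, 95, 120],
    "reviews_per_month": [0.5, np.nan, 1.2, np.nan],
})

overview = column_overview(toy)
print(overview)
```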

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
airbnb_df.columns

In [None]:
# Dataset Describe
airbnb_df.describe(include='all')

### Variables Description

Understanding the variables (columns) in the `airbnb_df` DataFrame is crucial for conducting effective exploratory data analysis (EDA) and deriving insights from the dataset. Let's briefly discuss each variable and its potential significance:

1. **id**: Unique identifier for each Airbnb listing. This variable can be useful for referencing specific listings or joining with other datasets.

2. **name**: Title or description of the Airbnb listing. While this variable provides textual information about the listing, it may not be directly relevant for quantitative analysis.

3. **host_id**: Unique identifier for the host of the listing. This variable allows us to identify listings associated with each host and potentially analyze host behavior or performance.

4. **host_name**: Name of the host. Similar to the `name` variable, this variable provides information about the host but may not have a direct impact on quantitative analysis.

5. **neighbourhood_group**: Grouping of neighborhoods (boroughs) in New York City. This categorical variable can be useful for analyzing spatial patterns and comparing listings across different neighborhoods.

6. **neighbourhood**: Specific neighborhood where the Airbnb listing is located. Like `neighbourhood_group`, this variable provides spatial information that can be used for geographic analysis.

7. **latitude** and **longitude**: Coordinates of the listing's location. These numerical variables are essential for geospatial analysis and visualization, allowing us to plot listings on maps and identify spatial patterns.

8. **room_type**: Type of room available for rent (e.g., Entire home/apt, Private room, Shared room). This categorical variable provides information about the accommodation type and can be used for segmentation and analysis based on room type.

9. **price**: Price per night for the listing. This numerical variable is a key metric for analyzing pricing trends, identifying outliers, and understanding the distribution of prices across listings.

10. **minimum_nights**: Minimum number of nights required for booking. This numerical variable provides information about booking restrictions and can be useful for understanding host policies.

11. **number_of_reviews**: Total number of reviews for the listing. This numerical variable reflects the popularity and level of engagement with the listing, which can be indicative of listing quality and guest satisfaction.

12. **last_review**: Date of the last review. This temporal variable provides information about the recency of reviews and can be useful for analyzing trends over time.

13. **reviews_per_month**: Average number of reviews per month. This numerical variable complements `number_of_reviews` by providing a normalized measure of review activity over time.

14. **calculated_host_listings_count**: Number of listings managed by the host. This numerical variable provides insights into host activity and scale of operation.

15. **availability_365**: Number of days the listing is available for booking within the next year. This numerical variable reflects listing availability and can be useful for analyzing booking patterns and seasonality.

Understanding these variables allows us to formulate hypotheses, conduct exploratory data analysis, and derive insights that can inform various stakeholders in the Airbnb ecosystem, including hosts, guests, and policymakers.
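
Some of the variables described above are read from CSV as plain text. The sketch below, on made-up rows, converts `last_review` to a datetime and the low-cardinality text columns to the `category` dtype. Whether this conversion is worthwhile for the real data is an assumption to check against the `dtypes` output.

```python
import pandas as pd

# Toy rows (hypothetical values); column names follow the variable list above
toy = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Manhattan", "Brooklyn"],
    "room_type": ["Private room", "Entire home/apt", "Private room"],
    "last_review": ["2019-05-21", None, "2018-11-03"],
})

# Parse review dates; missing or unparseable entries become NaT instead of raising
toy["last_review"] = pd.to_datetime(toy["last_review"], errors="coerce")

# Store repeated labels as categories to save memory and make the intent explicit
for column in ["neighbourhood_group", "room_type"]:
    toy[column] = toy[column].astype("category")

print(toy.dtypes)
print(toy["last_review"].dt.year)
```

With `last_review` stored as a datetime, components such as year or month become available through the `.dt` accessor for time-based grouping.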

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
print("Unique Values:")
for column in airbnb_df.columns:
    # Report the unique-value count for every column, not only the categorical ones
    print(f"Column: {column}, Unique Values: {airbnb_df[column].nunique()}")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
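
One possible starting point for this step, shown on made-up rows rather than the real data: treat a missing `reviews_per_month` as zero when the listing has no reviews, give missing text fields a placeholder, and drop exact duplicate rows. Which of these choices suit the actual dataset is an assumption to verify against the missing-value and duplicate counts computed earlier.

```python
import numpy as np
import pandas as pd

# Toy rows (hypothetical values) with the kinds of gaps the steps below address
toy = pd.DataFrame({
    "name": ["Cozy loft", None, "Sunny room", "Sunny room"],
    "host_name": ["Ana", "Ben", None, None],
    "number_of_reviews": [12, 0, 3, 3],
    "reviews_per_month": [0.8, np.nan, 0.2, 0.2],
})

cleaned = toy.copy()

# A listing without reviews has no monthly review rate, so 0 is a natural fill value
no_reviews = cleaned["number_of_reviews"] == 0
cleaned.loc[no_reviews, "reviews_per_month"] = (
    cleaned.loc[no_reviews, "reviews_per_month"].fillna(0)
)

# Keep rows with missing text fields, but make the gap explicit
cleaned[["name", "host_name"]] = cleaned[["name", "host_name"]].fillna("Unknown")

# Remove exact duplicate rows and renumber the index
cleaned = cleaned.drop_duplicates().reset_index(drop=True)

print(cleaned)
```

Working on a copy keeps the raw DataFrame untouched, so the cleaned result can be compared against the original at any point.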

### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***