<a href="https://colab.research.google.com/github/NRTPRIME/EDA-Capstone-Project-1-/blob/main/EDA_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Airbnb Booking Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**

The objective of this project is to conduct a comprehensive analysis of Airbnb bookings in New York City. By analyzing the data, we aim to gain insights into various aspects of the Airbnb market in the city, including pricing trends, popularity of neighborhoods, and key factors influencing booking success.

The project will involve the following steps:
*  The first step of this project involves obtaining a dataset that contains information about Airbnb bookings in New York City. The dataset should include details such as listing information, host details, pricing, availability, and customer reviews. Various sources, including publicly available data and the Airbnb website, will be explored to gather the necessary information.
*  After obtaining the dataset, the next step is to clean and prepare the collected data. This involves handling missing values, removing duplicates, and addressing inconsistencies. Data transformations and feature engineering will also be performed to ensure the dataset is suitable for analysis.
*  The project will then move on to exploratory data analysis to uncover patterns, relationships, and trends within the dataset. Factors such as neighborhood popularity, pricing variations, and customer preferences will be explored. Visualizations and statistical techniques will be employed to extract meaningful insights.
*  The pricing analysis will examine the factors influencing the pricing of Airbnb listings in New York City. Key determinants of pricing, such as property type, location, amenities, and availability, will be identified. Regression analysis or machine learning algorithms will be used to build a predictive model for estimating listing prices. 
*  The neighborhood analysis will evaluate the popularity of different neighborhoods in New York City for Airbnb bookings. The number of listings, average prices, and customer reviews across various neighborhoods will be analyzed.
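
The neighbourhood analysis in the last step could be sketched with a pandas `groupby`. This is a minimal illustration rather than the project's final method: the helper name `summarize_neighbourhoods` is hypothetical, the toy rows are invented for the example, and the column names are assumed to match the dataset loaded later in this notebook.

```python
import pandas as pd

def summarize_neighbourhoods(df: pd.DataFrame) -> pd.DataFrame:
    """Listing count, mean price and total reviews per neighbourhood group."""
    return (
        df.groupby("neighbourhood_group")
          .agg(
              listings=("price", "size"),
              mean_price=("price", "mean"),
              total_reviews=("number_of_reviews", "sum"),
          )
          .sort_values("listings", ascending=False)
    )

# Toy data for illustration only (not real listings)
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Queens"],
    "price": [200, 100, 80, 60],
    "number_of_reviews": [10, 0, 5, 2],
})
toy_summary = summarize_neighbourhoods(toy)
print(toy_summary)
```

On the real data the same call would be `summarize_neighbourhoods(abnb)`, assuming `abnb` has already been loaded and cleaned.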









# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The problem is to analyze Airbnb bookings in New York City and address challenges related to pricing optimization, neighborhood popularity, customer satisfaction, and personalized recommendations. The aim is to provide actionable insights and recommendations to improve the Airbnb experience for hosts and guests, enhancing efficiency and effectiveness in the booking process.

#### **Define Your Business Objective?**

The business objective of the New York Airbnb Booking Analysis is to optimize pricing strategies for hosts, enhance the guest experience, improve customer satisfaction, and provide personalized recommendations. This analysis aims to maximize revenue for hosts, help guests find suitable accommodations, and provide valuable insights for Airbnb stakeholders.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required. 
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits. 
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]
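
One possible reading of the "UBM" rule is sketched below with one chart of each kind. The toy frame is invented for illustration, and the choice of histogram, bar chart and coloured scatter plot is an assumption, not a requirement of the guidelines.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy data for illustration only (not real listings)
toy = pd.DataFrame({
    "room_type": ["Private room", "Entire home/apt", "Private room", "Shared room"],
    "price": [60, 180, 75, 40],
    "latitude": [40.70, 40.76, 40.68, 40.73],
    "longitude": [-73.95, -73.98, -73.94, -73.90],
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# U: distribution of a single numerical variable
axes[0].hist(toy["price"], bins=4)
axes[0].set_title("Univariate: price")

# B: numerical vs categorical (mean price per room type)
mean_price = toy.groupby("room_type")["price"].mean()
axes[1].bar(mean_price.index, mean_price.values)
axes[1].set_title("Bivariate: mean price by room type")

# M: three variables at once (location coloured by price)
axes[2].scatter(toy["longitude"], toy["latitude"], c=toy["price"])
axes[2].set_title("Multivariate: location and price")

fig.tight_layout()
```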





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')
abnb = pd.read_csv("/content/Airbnb NYC 2019.csv")
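
Since the guidelines ask for exception handling, the load step could be wrapped as in the sketch below. The helper name `load_listings` and its error messages are hypothetical; only `pd.read_csv` and the path come from this notebook.

```python
import pandas as pd

def load_listings(path: str) -> pd.DataFrame:
    """Read the listings CSV, failing with a clear message if it is missing or empty."""
    try:
        return pd.read_csv(path)
    except FileNotFoundError as err:
        # Re-raise with a hint about the most likely cause in Colab
        raise FileNotFoundError(
            f"Dataset not found at {path!r}; check the upload or Drive path."
        ) from err
    except pd.errors.EmptyDataError as err:
        raise ValueError(f"{path!r} contains no data.") from err
```

With this helper, the load above would become `abnb = load_listings("/content/Airbnb NYC 2019.csv")`.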

### Dataset First View

In [None]:
# Dataset First Look
abnb.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
abnb.shape

The dataset has 48,895 rows and 16 columns.

### Dataset Information

In [None]:
# Dataset Info
abnb.info()

#### Duplicate Values

In [None]:
# Dataset Duplicates: boolean mask that is True for every duplicated row
duplicate_mask = abnb.duplicated()
duplicate_mask

In [None]:
# Dataset Duplicates Value Count
duplicate_count = abnb.duplicated().sum()
duplicate_count

The dataset has no duplicate rows.

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
abnb_null = abnb.isnull().sum()
abnb_null

In [None]:
# Visualizing the missing values
abnb_null = abnb.isnull().sum()
plt.figure(figsize=(10, 4))
abnb_null.plot(kind='bar')
plt.xlabel('Variables')
plt.ylabel('Missing Values Count')
plt.title('Missing Values by Variables')
plt.show()

### What did you know about your dataset?

* Four columns contain null values.
* The 'name' column has 16 null values.
* The 'host_name' column has 21 null values.
* The 'last_review' and 'reviews_per_month' columns each have 10,052 null values.
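
Raw counts are easier to judge as a share of all rows; for example, 10,052 nulls out of 48,895 rows is roughly 20.6%. The sketch below shows one way to tabulate this. The helper name `missing_summary` is hypothetical and the toy frame is invented for illustration.

```python
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of nulls for columns that have at least one null."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Keep only columns with nulls, largest first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Toy data for illustration only (not real listings)
toy = pd.DataFrame({
    "name": ["Loft", None, "Studio", "Room"],
    "price": [120, 90, 75, 60],
    "reviews_per_month": [0.5, None, None, 1.2],
})
toy_missing = missing_summary(toy)
print(toy_missing)
```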

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
abnb.columns

In [None]:
# Dataset Describe
abnb.describe()

### Variables Description 

The dataset has 48,895 rows and 16 columns (variables). Let's go through what each variable represents.

* id : Unique id of the listing
* name : Name of the listed property on the platform
* host_id : Unique id of the host
* host_name : Name of the host
* neighbourhood_group : Borough in which the listing is located
* neighbourhood : Neighbourhood within the neighbourhood_group
* latitude : Latitude coordinate of the listing
* longitude : Longitude coordinate of the listing
* room_type : Type of room offered by the listing
* price : Price of the listing
* minimum_nights : Minimum number of nights a guest must book
* number_of_reviews : Number of reviews the listing has received
* last_review : Date of the most recent review
* reviews_per_month : Average number of reviews per month
* calculated_host_listings_count : Total number of listings registered under the host
* availability_365 : Number of days in the year on which the listing is available for booking

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
abnb.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Make the dataset analysis-ready.
# Drop the last_review column, which is not needed for this analysis
abnb.drop(['last_review'], axis=1, inplace=True)

# reviews_per_month has many null values; replace them with 0
abnb.fillna({'reviews_per_month': 0}, inplace=True)

# name and host_name have a few null values; replace them with 'Unknown' and 'no_name' respectively.
# Assign the result back, because fillna(inplace=True) on a column selection may not update abnb.
abnb['name'] = abnb['name'].fillna('Unknown')
abnb['host_name'] = abnb['host_name'].fillna('no_name')

# Check the null counts again
abnb.isna().sum()

In [None]:
abnb.head()

In [None]:
# Drop listings with a price of 0, because a zero price is not plausible
abnb_zero = abnb[abnb['price'] == 0].index
abnb.drop(abnb_zero, inplace=True)

# Drop listings with an availability of 0 days in the year
abnb_avail = abnb[abnb['availability_365'] == 0].index
abnb.drop(abnb_avail, inplace=True)

# Check the summary statistics again
abnb.describe()

### What all manipulations have you done and insights you found?

* We dropped the last_review column because it is not needed for this analysis.
* The reviews_per_month column has many null values. We replaced them with 0, because reviews_per_month can be zero but should not be null.
* The name and host_name columns have a few null values, which we replaced with 'Unknown' and 'no_name' respectively.
* Some rows have a price of 0, and some have an availability_365 of 0 (not available on any day of the year). We dropped these rows because a zero price is not justifiable and a listing that is unavailable all year round is not considered valid.
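
The manipulations listed above could also be gathered into one function that returns a cleaned copy and leaves the input untouched. This is a sketch under assumptions: the helper name `clean_listings` is hypothetical, the toy rows are invented for illustration, and the rules mirror the steps described in this section.

```python
import pandas as pd

def clean_listings(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy; the input frame is left unchanged."""
    cleaned = df.drop(columns=["last_review"])
    cleaned = cleaned.fillna(
        {"reviews_per_month": 0, "name": "Unknown", "host_name": "no_name"}
    )
    # Keep only rows with a non-zero price and non-zero availability
    keep = (cleaned["price"] != 0) & (cleaned["availability_365"] != 0)
    return cleaned[keep]

# Toy data for illustration only (not real listings)
toy = pd.DataFrame({
    "name": ["Loft", None, "Studio"],
    "host_name": ["Ann", "Bob", None],
    "last_review": ["2019-05-01", None, "2019-06-12"],
    "reviews_per_month": [0.5, None, 1.0],
    "price": [120, 0, 80],
    "availability_365": [200, 30, 10],
})
toy_clean = clean_listings(toy)
print(toy_clean)
```

Returning a new frame instead of using `inplace=True` makes the cell safe to re-run, since the raw data is never modified.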

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
plt.figure(figsize=(8, 5))
sns.scatterplot(x='longitude', y='latitude', data=abnb, hue='neighbourhood_group')
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot of longitude against latitude, coloured by neighbourhood group, shows where the listings of each neighbourhood group are located in the city.

##### 2. What is/are the insight(s) found from the chart?

* Brooklyn and Manhattan appear to be more densely covered with listings than the other neighbourhood groups.
* Brooklyn, Manhattan, Queens and the Bronx are directly connected neighbourhood groups, while Staten Island is not directly connected to them.




##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

The chart suggests that locations nearer to the prime hotspots can attract more bookings. Listing more of the in-demand room types in those areas could therefore attract customers and increase overall revenue.

#### Chart - 2

In [None]:
# Chart - 2 visualization code


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot 

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ? 
Explain Briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***