### Airbnb Storytelling Case Study

### **Problem Background**  
- Imagine you are a **data analyst at Airbnb**. In recent months, the company has experienced a **significant decline in revenue**.  
- As travel restrictions ease and more people begin to travel again, **Airbnb wants to ensure it is well-prepared** to capitalize on this shift.

### **Objective**  
- To help **Airbnb determine its next strategic moves**, you have been assigned the task of **analyzing a dataset of Airbnb listings in New York**.

---

### **Presentation - I** (Internal Team)  
- **Data Analysis Managers**: Oversee the data analysts and manage the team's processes; they have **basic technical expertise**.  
- **Lead Data Analyst**: Supervises the **entire team of data and business analysts** and possesses **strong technical knowledge**.

---

### **Presentation - II** (Business Leadership)  
- **Head of Acquisitions and Operations, NYC**: Responsible for **property and host acquisitions**, including **securing top properties, negotiating prices, and defining service agreements**.  
- **Head of User Experience, NYC**: Focuses on **customer preferences** and manages **property listings on Airbnb’s website and app**. Their role involves **optimizing property rankings across neighborhoods and cities** to **maximize visibility and engagement for hosts**.  

In [33]:
# Import the necessary libraries
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

In [34]:
# Reading Data from file
air = pd.read_csv("AB_NYC_2019.csv")
air.head(5)

| | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2539 | Clean & quiet apt home by the park | 2787 | John | Brooklyn | Kensington | 40.64749 | -73.97237 | Private room | 149 | 1 | 9 | 19-10-2018 | 0.21 | 6 | 365 |
| 1 | 2595 | Skylit Midtown Castle | 2845 | Jennifer | Manhattan | Midtown | 40.75362 | -73.98377 | Entire home/apt | 225 | 1 | 45 | 21-05-2019 | 0.38 | 2 | 355 |
| 2 | 3647 | THE VILLAGE OF HARLEM....NEW YORK ! | 4632 | Elisabeth | Manhattan | Harlem | 40.80902 | -73.9419 | Private room | 150 | 3 | 0 | NaN | NaN | 1 | 365 |
| 3 | 3831 | Cozy Entire Floor of Brownstone | 4869 | LisaRoxanne | Brooklyn | Clinton Hill | 40.68514 | -73.95976 | Entire home/apt | 89 | 1 | 270 | 05-07-2019 | 4.64 | 1 | 194 |
| 4 | 5022 | Entire Apt: Spacious Studio/Loft by central park | 7192 | Laura | Manhattan | East Harlem | 40.79851 | -73.94399 | Entire home/apt | 80 | 10 | 9 | 19-11-2018 | 0.1 | 1 | 0 |


In [35]:
# Checking the dataset shape
air.shape

(48895, 16)

- The dataset contains 48,895 rows (one per listing) and 16 columns.
- Next, we check whether the dataset contains any missing values.

In [37]:
# Checking for missing values
air.isnull().sum()

id                                    0
name                                 16
host_id                               0
host_name                            21
neighbourhood_group                   0
neighbourhood                         0
latitude                              0
longitude                             0
room_type                             0
price                                 0
minimum_nights                        0
number_of_reviews                     0
last_review                       10052
reviews_per_month                 10052
calculated_host_listings_count        0
availability_365                      0
dtype: int64
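The raw counts above are easier to weigh as shares of the 48,895 rows: the 10,052 gaps in `last_review` and `reviews_per_month` are roughly 20.6% of listings, while the gaps in `name` and `host_name` are well under 0.1%. A minimal sketch of computing that share per column, assuming a small hypothetical `sample` frame built from the five rows shown by `air.head(5)` (swap in `air` for the real data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `air`: a few columns from the five head() rows above
sample = pd.DataFrame({
    "host_name": ["John", "Jennifer", "Elisabeth", "LisaRoxanne", "Laura"],
    "number_of_reviews": [9, 45, 0, 270, 9],
    "last_review": ["19-10-2018", "21-05-2019", np.nan, "05-07-2019", "19-11-2018"],
    "reviews_per_month": [0.21, 0.38, np.nan, 4.64, 0.1],
})

# isnull().mean() gives the fraction of missing values per column;
# multiply by 100 for a percentage and list the worst columns first
missing_pct = sample.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct.round(2))
```

In this five-row sample, one of five rows (row 2) lacks a review date and rate, so both columns show 20% missing.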

In [38]:
# Drop columns that are not needed for the analysis: `id` and `name` are identifiers and free text,
# and `last_review` is missing for 10,052 listings
air.drop(['id', 'name', 'last_review'], axis=1, inplace=True)

In [39]:
# View whether the columns are dropped
air.head(5)

| | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2787 | John | Brooklyn | Kensington | 40.64749 | -73.97237 | Private room | 149 | 1 | 9 | 0.21 | 6 | 365 |
| 1 | 2845 | Jennifer | Manhattan | Midtown | 40.75362 | -73.98377 | Entire home/apt | 225 | 1 | 45 | 0.38 | 2 | 355 |
| 2 | 4632 | Elisabeth | Manhattan | Harlem | 40.80902 | -73.9419 | Private room | 150 | 3 | 0 | NaN | 1 | 365 |
| 3 | 4869 | LisaRoxanne | Brooklyn | Clinton Hill | 40.68514 | -73.95976 | Entire home/apt | 89 | 1 | 270 | 4.64 | 1 | 194 |
| 4 | 7192 | Laura | Manhattan | East Harlem | 40.79851 | -73.94399 | Entire home/apt | 80 | 10 | 9 | 0.1 | 1 | 0 |


In [40]:
air.reviews_per_month.isnull().sum()

10052

In [41]:
# reviews_per_month is missing for 10,052 listings; these appear to be listings with no reviews
# (e.g. row 2 above has 0 reviews), so replace the missing rates with 0
air.fillna({'reviews_per_month': 0}, inplace=True)

In [42]:
air.reviews_per_month.isnull().sum()

0
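Filling with 0 rests on the assumption that a missing `reviews_per_month` means the listing has never been reviewed, as with row 2 in the `head()` output. A quick sketch of checking that assumption before filling, again on a hypothetical `sample` frame built from those five rows:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `air` (values from the five head() rows above)
sample = pd.DataFrame({
    "number_of_reviews": [9, 45, 0, 270, 9],
    "reviews_per_month": [0.21, 0.38, np.nan, 4.64, 0.1],
})

missing_rate = sample["reviews_per_month"].isna()
no_reviews = sample["number_of_reviews"].eq(0)

# Every missing rate should belong to a listing with 0 reviews...
only_unreviewed = (sample.loc[missing_rate, "number_of_reviews"] == 0).all()
# ...and every listing with 0 reviews should be missing its rate
same_rows = (missing_rate == no_reviews).all()
print(only_unreviewed, same_rows)

# Only fill once both checks pass
filled = sample.fillna({"reviews_per_month": 0})
```

If either check returned `False` on the full data, 0 would no longer be a safe fill value for every gap.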

In [43]:
# reviews_per_month no longer has any missing values
# Next, check the unique values of the categorical columns
air.room_type.unique()

array(['Private room', 'Entire home/apt', 'Shared room'], dtype=object)

In [44]:
len(air.room_type.unique())

3

In [45]:
air.neighbourhood_group.unique()

array(['Brooklyn', 'Manhattan', 'Queens', 'Staten Island', 'Bronx'],
      dtype=object)

In [46]:
len(air.neighbourhood_group.unique())

5

In [47]:
len(air.neighbourhood.unique())

221
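`nunique()` returns the same distinct counts as `len(...unique())`, for several columns in a single call. A sketch on a hypothetical `sample` frame made from the five `head()` rows, so the counts are smaller than the 3, 5 and 221 found in the full data:

```python
import pandas as pd

# Hypothetical stand-in for `air` (categorical columns of the five head() rows above)
sample = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Manhattan", "Manhattan", "Brooklyn", "Manhattan"],
    "neighbourhood": ["Kensington", "Midtown", "Harlem", "Clinton Hill", "East Harlem"],
    "room_type": ["Private room", "Entire home/apt", "Private room",
                  "Entire home/apt", "Entire home/apt"],
})

# Number of distinct values per column, returned as one Series
distinct_counts = sample[["room_type", "neighbourhood_group", "neighbourhood"]].nunique()
print(distinct_counts)
```

One difference to keep in mind: `nunique()` skips NaN by default, while `len(unique())` counts it as a value. These three columns have no missing values, so both approaches agree here.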

In [48]:
# Save the cleaned dataset (path is relative to the notebook's working directory; adjust as needed)
air.to_csv('airbnb_final.csv', index=False, header=True)
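Building the output path with `pathlib` keeps the export portable across machines, and reading the file back is a cheap way to confirm the round trip lost nothing. A minimal sketch, assuming a temporary folder (hypothetical; in the notebook, `Path("airbnb_final.csv")` next to the notebook would do) and a small `sample` frame in place of `air`:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the cleaned `air` frame
sample = pd.DataFrame({
    "host_id": [2787, 2845, 4632],
    "neighbourhood_group": ["Brooklyn", "Manhattan", "Manhattan"],
    "price": [149, 225, 150],
    "reviews_per_month": [0.21, 0.38, 0.0],
})

# Hypothetical output location: a fresh temporary directory
out_dir = Path(tempfile.mkdtemp())
out_path = out_dir / "airbnb_final.csv"

# index=False avoids writing the row index as an extra unnamed column
sample.to_csv(out_path, index=False)

# Read the file back and confirm columns, dtypes and values survived
reloaded = pd.read_csv(out_path)
pd.testing.assert_frame_equal(sample, reloaded)
print(reloaded.shape)
```

Passing `index=False` matters on the way out: without it, reloading the CSV would add an extra `Unnamed: 0` column.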

In [49]:
# Top 10 hosts by number of listings
air.host_id.value_counts().head(10)

host_id
219517861    327
107434423    232
30283594     121
137358866    103
16098958      96
12243051      96
61391963      91
22541573      87
200380610     65
7503643       52
Name: count, dtype: int64
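`value_counts()` on `host_id` counts rows per host directly, while the dataset also carries a precomputed `calculated_host_listings_count`. For the top host the two agree (327 in both this output and the `Sonder (NYC)` rows in the next cell). A sketch of checking agreement for every host, using hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data: host 1 has three listings, host 2 has one
sample = pd.DataFrame({
    "host_id": [1, 1, 1, 2],
    "calculated_host_listings_count": [3, 3, 3, 1],
})

# Listings per host, counted from the rows themselves
listings_per_host = sample["host_id"].value_counts()

# The precomputed count is repeated on each row, so take one value per host
precomputed = sample.groupby("host_id")["calculated_host_listings_count"].first()

# Align both Series on host_id and compare element-wise
consistent = listings_per_host.sort_index().eq(precomputed.sort_index()).all()
print(consistent)
```

A `False` here on the full data would suggest the precomputed column was calculated on a different snapshot than the rows in the file.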

In [50]:
# Sort listings by their host's total listing count, largest portfolios first
air2 = air.sort_values(by="calculated_host_listings_count", ascending=False)
air2.head()

| | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 39773 | 219517861 | Sonder (NYC) | Manhattan | Hell's Kitchen | 40.76037 | -73.99744 | Entire home/apt | 185 | 29 | 1 | 1.0 | 327 | 332 |
| 41463 | 219517861 | Sonder (NYC) | Manhattan | Financial District | 40.70782 | -74.01227 | Entire home/apt | 396 | 2 | 8 | 2.12 | 327 | 289 |
| 41469 | 219517861 | Sonder (NYC) | Manhattan | Financial District | 40.7062 | -74.01192 | Entire home/apt | 498 | 2 | 8 | 2.5 | 327 | 255 |
| 38294 | 219517861 | Sonder (NYC) | Manhattan | Financial District | 40.70771 | -74.00641 | Entire home/apt | 229 | 29 | 1 | 0.73 | 327 | 219 |
| 41468 | 219517861 | Sonder (NYC) | Manhattan | Financial District | 40.70726 | -74.0106 | Entire home/apt | 229 | 2 | 2 | 0.77 | 327 | 351 |
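Because a host's listing count is repeated on every one of their listings, sorting by `calculated_host_listings_count` lists the same host many times; all five rows above belong to Sonder (NYC). Dropping duplicates after sorting leaves one row per host. A sketch with hypothetical toy rows (`Host B` and `Host C` are placeholder names, not values from the dataset):

```python
import pandas as pd

# Hypothetical toy rows; only the Sonder (NYC) id and count come from the output above
sample = pd.DataFrame({
    "host_id": [219517861, 219517861, 107434423, 30283594, 30283594],
    "host_name": ["Sonder (NYC)", "Sonder (NYC)", "Host B", "Host C", "Host C"],
    "calculated_host_listings_count": [327, 327, 232, 121, 121],
})

# Sort by portfolio size, then keep the first (largest) row for each host
top_hosts = (
    sample.sort_values("calculated_host_listings_count", ascending=False)
    .drop_duplicates(subset="host_id")
    [["host_id", "host_name", "calculated_host_listings_count"]]
    .reset_index(drop=True)
)
print(top_hosts)
```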
