# AirBnB Case Study

Using Data from AirBnB (2019)

In [1]:
import numpy as np
import pandas as pd

In [2]:
data = pd.read_csv('../AirBnB Case Study/AB_NYC_2019.csv')

In [27]:
data.head(5)

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


#### How many neighborhood groups are there and which one shows up the most?

In [14]:
data.groupby('neighbourhood_group').count()['id'].sort_values(axis=0, ascending=False)

neighbourhood_group
Manhattan        21661
Brooklyn         20104
Queens            5666
Bronx             1091
Staten Island      373
Name: id, dtype: int64

There are 5 neighborhood groups (the five boroughs), and Manhattan appears most often, with 21,661 listings.
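
The same tally can be produced more directly with `value_counts`, while `nunique` answers the "how many groups" part explicitly. The sketch below runs on a small made-up frame (`toy` and its rows are hypothetical, only the column name matches the dataset), so the numbers it prints are not the notebook's results.

```python
import pandas as pd

# Hypothetical stand-in for the listings table; the rows are invented for illustration.
toy = pd.DataFrame({
    "neighbourhood_group": [
        "Manhattan", "Brooklyn", "Manhattan", "Queens", "Manhattan", "Brooklyn",
    ],
})

# Number of distinct neighbourhood groups.
n_groups = toy["neighbourhood_group"].nunique()

# Listings per group; value_counts sorts from most to least frequent by default.
group_counts = toy["neighbourhood_group"].value_counts()

# Label of the most frequent group.
top_group = group_counts.idxmax()

print(n_groups, top_group)
print(group_counts)
```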

#### Are private rooms the most popular in Manhattan?

In [18]:
data[data['neighbourhood_group'] == 'Manhattan'].groupby('room_type').count()['id']

room_type
Entire home/apt    13199
Private room        7982
Shared room          480
Name: id, dtype: int64

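No. Entire homes/apartments are the most common room type in Manhattan (13,199 listings), ahead of private rooms (7,982) and shared rooms (480).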
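
To compare room types across every neighbourhood group at once rather than filtering one group at a time, a cross-tabulation is one option. This is a minimal sketch on invented rows (`toy` is hypothetical), not a rerun of the analysis above.

```python
import pandas as pd

# Hypothetical rows; only the column names mirror the real dataset.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn", "Brooklyn"],
    "room_type": [
        "Entire home/apt", "Entire home/apt", "Private room",
        "Private room", "Private room",
    ],
})

# Rows are neighbourhood groups, columns are room types, cells are listing counts.
table = pd.crosstab(toy["neighbourhood_group"], toy["room_type"])

# For each group, the room type with the largest count.
most_common = table.idxmax(axis=1)

print(table)
print(most_common)
```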

#### Which hosts are the busiest based on their reviews?

In [24]:
data.groupby('host_id').count().sort_values('number_of_reviews', ascending=False).head(5)['number_of_reviews']

host_id
219517861    327
107434423    232
30283594     121
137358866    103
16098958      96
Name: number_of_reviews, dtype: int64

In [31]:
data[data['host_id'] == 219517861]['host_name'].head(1)

38293    Sonder (NYC)
Name: host_name, dtype: object

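The five host IDs above are ranked with `count()`, which tallies the number of listing rows per host that have a non-null `number_of_reviews` value. It does not add up the reviews themselves, so this ranking measures how many listings a host has rather than how many reviews they received. By that measure the top host is ID 219517861, 'Sonder (NYC)', with 327 listings. Ranking by total reviews would require summing `number_of_reviews` per host instead.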
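
The difference between counting rows and summing reviews can be seen on a tiny example. The frame below is hypothetical and the output names `listings` and `total_reviews` are made up for this sketch. It only illustrates that the two rankings can disagree, and says nothing about which host would come out on top in the real data.

```python
import pandas as pd

# Hypothetical hosts: host 1 has many listings with few reviews,
# host 2 has a single heavily reviewed listing.
toy = pd.DataFrame({
    "host_id": [1, 1, 1, 2, 3],
    "number_of_reviews": [2, 0, 1, 50, 10],
})

# Named aggregation: one column per statistic, computed per host.
per_host = toy.groupby("host_id")["number_of_reviews"].agg(
    listings="count",      # number of listing rows (what count() measures)
    total_reviews="sum",   # reviews added up across the host's listings
)

top_by_listings = per_host["listings"].idxmax()
top_by_reviews = per_host["total_reviews"].idxmax()

print(per_host)
print(top_by_listings, top_by_reviews)
```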

#### Which neighborhood group has the highest average price?

In [36]:
data.groupby('neighbourhood_group')['price'].mean().sort_values(ascending=False)

neighbourhood_group
Manhattan        196.875814
Brooklyn         124.383207
Staten Island    114.812332
Queens            99.517649
Bronx             87.496792
Name: price, dtype: float64

Manhattan has the highest average price, at about $197 (196.88).

#### Which neighborhood group has the highest total price?

In [41]:
data.groupby('neighbourhood_group')['price'].max().sort_values(ascending=False)

neighbourhood_group
Brooklyn         10000
Manhattan        10000
Queens           10000
Staten Island     5000
Bronx             2500
Name: price, dtype: int64

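Brooklyn, Manhattan, and Queens tie for the most expensive single listing, at $10,000 each. Note that this code reports the maximum listing price in each group, not a sum of prices across the group.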
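
If "total price" is read as the sum of all listing prices in a group, the ranking can differ from the ranking by maximum. The sketch below computes both side by side on invented rows (`toy` is hypothetical). It is one possible reading of the question, not the notebook's result.

```python
import pandas as pd

# Hypothetical prices: one expensive listing in Queens, several cheaper ones in the Bronx.
toy = pd.DataFrame({
    "neighbourhood_group": ["Bronx", "Bronx", "Bronx", "Queens"],
    "price": [100, 100, 100, 250],
})

# Maximum and summed price per group in a single pass.
summary = toy.groupby("neighbourhood_group")["price"].agg(["max", "sum"])

top_by_max = summary["max"].idxmax()  # group with the priciest single listing
top_by_sum = summary["sum"].idxmax()  # group with the largest combined price

print(summary)
print(top_by_max, top_by_sum)
```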

#### Which top 5 hosts have the highest total price?

In [46]:
data.groupby('host_id')['price'].sum().sort_values(ascending=False).head(5)

host_id
219517861    82795
107434423    70331
156158778    37097
205031545    35294
30283594     33581
Name: price, dtype: int64

These five host IDs have the highest combined listing price, that is, the sum of `price` across all of a host's listings.

#### Who currently has no (zero) availability with a review count of 100 or more?

In [58]:
data.query('number_of_reviews > 100 & availability_365 == 0').groupby('host_id').count()

host_id,id,name,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
7490,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
36897,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
67778,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
79402,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
116382,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
142878742,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
143944704,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
159156636,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1
176185168,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1


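There are 144 hosts with at least one listing that has more than 100 reviews and zero availability. Note that the query uses `> 100`, so listings with exactly 100 reviews are excluded even though the question asks for 100 or more.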
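
A filter that matches the "100 or more" wording would use `>=`, and `nunique` gives the number of distinct hosts without building a full grouped table. The following is a sketch on made-up rows (`toy` is hypothetical), so its count is unrelated to the figure reported above.

```python
import pandas as pd

# Hypothetical listings covering the boundary case of exactly 100 reviews.
toy = pd.DataFrame({
    "host_id": [10, 10, 20, 30, 40],
    "number_of_reviews": [100, 150, 99, 100, 300],
    "availability_365": [0, 0, 0, 5, 0],
})

# ">= 100" keeps listings with exactly 100 reviews; "== 0" keeps fully unavailable ones.
mask = (toy["number_of_reviews"] >= 100) & (toy["availability_365"] == 0)
matching = toy[mask]

# Distinct hosts that own at least one matching listing.
n_hosts = matching["host_id"].nunique()

print(matching)
print(n_hosts)
```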

#### What host has the highest total of prices and where are they located?

In [64]:
data.groupby('host_id')['price'].sum().sort_values(ascending=False).head(1)

host_id
219517861    82795
Name: price, dtype: int64

In [72]:
data[data['host_id'] == 219517861].groupby(['neighbourhood_group', 'neighbourhood']).count()['id'].sort_values(axis=0, ascending=False)

neighbourhood_group  neighbourhood     
Manhattan            Financial District    218
                     Murray Hill            50
                     Theater District       27
                     Hell's Kitchen         15
                     Chelsea                 7
                     Upper East Side         6
                     Midtown                 4
Name: id, dtype: int64

The host with the highest combined listing price is ID 219517861, Sonder (NYC). They operate exclusively in Manhattan, and the Financial District accounts for 218 of their 327 listings.

#### When did Danielle from Queens last receive a review?

In [77]:
data.query("host_name == 'Danielle' & neighbourhood_group == 'Queens'").sort_values('last_review', ascending=False).head(1)[['host_name', 'neighbourhood_group', 'last_review']]

Unnamed: 0,host_name,neighbourhood_group,last_review
22469,Danielle,Queens,2019-07-08


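The last review was on 2019-07-08. Because `host_name` is not a unique identifier, this is the most recent review across all Queens listings whose host is named Danielle.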
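
Sorting `last_review` as text works here only because the dates are in ISO `YYYY-MM-DD` form. Parsing the column with `pd.to_datetime` makes the comparison explicit, and grouping by `host_id` separates different hosts who share a first name. The sketch below assumes a small hypothetical frame (`toy`); its dates are invented apart from the column layout.

```python
import pandas as pd

# Hypothetical rows: two different hosts named Danielle, one listing without reviews.
toy = pd.DataFrame({
    "host_id": [1, 1, 2],
    "host_name": ["Danielle", "Danielle", "Danielle"],
    "neighbourhood_group": ["Queens", "Queens", "Queens"],
    "last_review": ["2019-07-08", None, "2018-01-15"],
})

# Convert text dates to real timestamps; missing values become NaT.
toy["last_review"] = pd.to_datetime(toy["last_review"])

subset = toy[(toy["host_name"] == "Danielle") & (toy["neighbourhood_group"] == "Queens")]

# Most recent review per distinct host; max() skips NaT.
latest_per_host = subset.groupby("host_id")["last_review"].max()

print(latest_per_host)
```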