# AirBnB NY Locations Data Case Study

In this final project, your task will be to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?



In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [3]:
air_bnb = pd.read_csv('files/AB_NYC_2019.csv')
air_bnb.head()

,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [4]:
print('\n', air_bnb.shape)


 (48895, 16)


In [5]:
print('\n', air_bnb.index)


 RangeIndex(start=0, stop=48895, step=1)


In [6]:
print('\n', air_bnb.columns)


 Index(['id', 'name', 'host_id', 'host_name', 'neighbourhood_group',
       'neighbourhood', 'latitude', 'longitude', 'room_type', 'price',
       'minimum_nights', 'number_of_reviews', 'last_review',
       'reviews_per_month', 'calculated_host_listings_count',
       'availability_365'],
      dtype='object')


In [7]:
print('\n', air_bnb.dtypes)


 id                                  int64
name                               object
host_id                             int64
host_name                          object
neighbourhood_group                object
neighbourhood                      object
latitude                          float64
longitude                         float64
room_type                          object
price                               int64
minimum_nights                      int64
number_of_reviews                   int64
last_review                        object
reviews_per_month                 float64
calculated_host_listings_count      int64
availability_365                    int64
dtype: object
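`last_review` is loaded as `object` (plain strings), so any date comparison on it relies on the ISO `YYYY-MM-DD` text happening to sort correctly. A minimal sketch of asking `read_csv` to parse the column as dates instead; the three-row CSV is hypothetical toy data, not the real file:

```python
import io

import pandas as pd

# Hypothetical toy CSV with a few of the AB_NYC_2019.csv columns.
csv_text = """id,host_name,last_review,number_of_reviews
1,John,2018-10-19,9
2,Elisabeth,,0
3,Laura,2018-11-19,9
"""

# parse_dates turns last_review into a datetime column; empty cells become NaT.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["last_review"])
print(df.dtypes)
print(df["last_review"].max())  # max() skips NaT
```

On the real file the same argument would be `pd.read_csv('files/AB_NYC_2019.csv', parse_dates=['last_review'])`.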


In [11]:
# How many neighborhood groups are available and which shows up the most?
## It's 5 and Manhattan shows up the most.

air_bnb_nhGrp_count = air_bnb.groupby('neighbourhood_group', as_index=False).count().sort_values('id', ascending=False).reset_index(drop=True)
print(len(air_bnb_nhGrp_count))  # number of neighbourhood groups
print(air_bnb_nhGrp_count.loc[0, 'neighbourhood_group'])  # group with the most listings


5
Manhattan
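The same answer can be read off a single column without counting every other column per group. A minimal sketch using `value_counts()` and `nunique()` on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy listings; only neighbourhood_group matters here.
air_bnb = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Brooklyn", "Manhattan", "Queens", "Manhattan", "Brooklyn"],
})

# value_counts() sorts counts in descending order, so the first label is the most frequent group.
group_counts = air_bnb["neighbourhood_group"].value_counts()
n_groups = air_bnb["neighbourhood_group"].nunique()
most_common = group_counts.index[0]
print(n_groups, most_common)
```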


In [12]:
# Are private rooms the most popular in Manhattan?
## No, entire home/apt is the most popular in Manhattan.

air_bnb_room_man = air_bnb.groupby(['neighbourhood_group', 'room_type']).count().loc['Manhattan'].sort_values('id', ascending=False)
air_bnb_room_man

room_type,id,name,host_id,host_name,neighbourhood,latitude,longitude,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
Entire home/apt,13199,13193,13199,13196,13199,13199,13199,13199,13199,13199,9967,9967,13199,13199
Private room,7982,7979,7982,7976,7982,7982,7982,7982,7982,7982,6309,6309,7982,7982
Shared room,480,480,480,480,480,480,480,480,480,480,356,356,480,480
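Expressing "most popular" as a share of listings can make the comparison clearer. A minimal sketch on hypothetical toy data, filtering to Manhattan and using `value_counts(normalize=True)`:

```python
import pandas as pd

# Hypothetical toy listings.
air_bnb = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn", "Manhattan"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt", "Private room", "Shared room"],
})

# Keep Manhattan rows only, then compute the share of each room type.
manhattan = air_bnb[air_bnb["neighbourhood_group"] == "Manhattan"]
room_share = manhattan["room_type"].value_counts(normalize=True)
print(room_share)
```

Applied to the Manhattan counts in the table above, entire homes/apartments make up roughly 61% of listings (13,199 of 21,661).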


In [15]:
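# Which hosts are the busiest based on their reviews?
## Sonder (NYC), although .count() tallies rows per host, so this ranks hosts by number of listings rather than by total reviews.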

air_bnb_busy_host = air_bnb.groupby(['host_id', 'host_name']).count().sort_values('number_of_reviews', ascending=False)
air_bnb_busy_host

host_id,host_name,id,name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
219517861,Sonder (NYC),327,327,327,327,327,327,327,327,327,327,207,207,327,327
107434423,Blueground,232,232,232,232,232,232,232,232,232,232,28,28,232,232
30283594,Kara,121,121,121,121,121,121,121,121,121,121,43,43,121,121
137358866,Kazuya,103,103,103,103,103,103,103,103,103,103,51,51,103,103
16098958,Jeremy & Laura,96,96,96,96,96,96,96,96,96,96,61,61,96,96
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13543967,Paulina,1,1,1,1,1,1,1,1,1,1,1,1,1,1
13541655,Michael,1,1,1,1,1,1,1,1,1,1,1,1,1,1
13540183,Ashley,1,1,1,1,1,1,1,1,1,1,0,0,1,1
13538150,Mariana,1,1,1,1,1,1,1,1,1,1,1,1,1,1
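In the cell above every column holds the same count per host (for example 327 for Sonder (NYC)), which is the number of listings, not the number of reviews. Ranking by reviews requires summing `number_of_reviews`. A minimal sketch on hypothetical toy data where the two rankings disagree:

```python
import pandas as pd

# Hypothetical toy listings: host A has more listings, host B has more reviews.
air_bnb = pd.DataFrame({
    "host_id": [1, 1, 1, 2],
    "host_name": ["A", "A", "A", "B"],
    "number_of_reviews": [5, 0, 10, 400],
})

# count() returns the number of listings per host.
listings_per_host = air_bnb.groupby(["host_id", "host_name"])["number_of_reviews"].count()

# sum() adds up the reviews themselves.
reviews_per_host = (
    air_bnb.groupby(["host_id", "host_name"])["number_of_reviews"]
    .sum()
    .sort_values(ascending=False)
)
print(reviews_per_host.head())
```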


In [20]:
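# Which neighborhood group has the highest average price?
## Manhattan

# Select 'price' before aggregating; calling .mean() on the whole frame raises a TypeError on text columns in recent pandas.
air_bnb_nh_Hprice = air_bnb.groupby('neighbourhood_group')['price'].mean().sort_values(ascending=False)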
air_bnb_nh_Hprice

neighbourhood_group
Manhattan        196.875814
Brooklyn         124.383207
Staten Island    114.812332
Queens            99.517649
Bronx             87.496792
Name: price, dtype: float64

In [19]:
# Which neighborhood group has the highest total price?
## Manhattan

air_bnb_nh_HTprice = air_bnb.groupby('neighbourhood_group')['price'].sum().sort_values(ascending=False)
air_bnb_nh_HTprice

neighbourhood_group
Manhattan        4264527
Brooklyn         2500600
Queens            563867
Bronx              95459
Staten Island      42825
Name: price, dtype: int64
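The average and total price can come from a single `groupby` pass using named aggregation. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy listings.
air_bnb = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Queens", "Queens"],
    "price": [200, 100, 120, 90, 60],
})

# Named aggregation: each keyword becomes an output column (column, function).
price_stats = (
    air_bnb.groupby("neighbourhood_group")
    .agg(mean_price=("price", "mean"), total_price=("price", "sum"))
    .sort_values("mean_price", ascending=False)
)
print(price_stats)
```

A total also grows with the number of listings, which is why the two outputs above order the groups differently: Staten Island ranks third by average price but last by total.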

In [22]:
# Which top 5 hosts have the highest total price?
## Sonder (NYC), Blueground, Sally, Red Awning, Kara

air_bnb_host_HTprice = air_bnb.groupby(['host_id', 'host_name'])['price'].sum().sort_values(ascending=False).head(5)
air_bnb_host_HTprice

host_id    host_name   
219517861  Sonder (NYC)    82795
107434423  Blueground      70331
156158778  Sally           37097
205031545  Red Awning      35294
30283594   Kara            33581
Name: price, dtype: int64
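`Series.nlargest(n)` combines the sort and the `head(n)` step. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy listings.
air_bnb = pd.DataFrame({
    "host_id": [10, 10, 20, 30, 30, 40],
    "host_name": ["A", "A", "B", "C", "C", "D"],
    "price": [100, 150, 300, 50, 60, 20],
})

# Sum nightly prices per host, then keep the three largest totals.
host_totals = air_bnb.groupby(["host_id", "host_name"])["price"].sum()
top_hosts = host_totals.nlargest(3)
print(top_hosts)
```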

In [26]:
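# Who currently has no (zero) availability with a review count of 100 or more?
## Not answered by this cell: .count() counts listings per host, so it keeps hosts with 100 or more listings and never checks availability_365.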

air_bnb_host_reviews = air_bnb.groupby(['host_id','host_name'], as_index = False).count().sort_values('number_of_reviews', ascending=False)
air_bnb_host_reviews_100 = air_bnb_host_reviews[air_bnb_host_reviews['number_of_reviews'] >= 100]
air_bnb_host_reviews_100 

,host_id,host_name,id,name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
34629,219517861,Sonder (NYC),327,327,327,327,327,327,327,327,327,327,207,207,327,327
29393,107434423,Blueground,232,232,232,232,232,232,232,232,232,232,28,28,232,232
19564,30283594,Kara,121,121,121,121,121,121,121,121,121,121,43,43,121,121
31064,137358866,Kazuya,103,103,103,103,103,103,103,103,103,103,51,51,103,103
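The question is about individual listings, so it can be answered with a row-level boolean filter that checks both conditions. A minimal sketch on hypothetical toy data; note that an `availability_365` of 0 may reflect a fully booked calendar or one the host has blocked, and the column alone does not distinguish the two:

```python
import pandas as pd

# Hypothetical toy listings.
air_bnb = pd.DataFrame({
    "host_name": ["A", "B", "C", "D"],
    "availability_365": [0, 0, 200, 0],
    "number_of_reviews": [150, 20, 300, 100],
})

# Both conditions must hold: zero availability AND at least 100 reviews.
fully_booked_popular = air_bnb[
    (air_bnb["availability_365"] == 0) & (air_bnb["number_of_reviews"] >= 100)
]
print(fully_booked_popular["host_name"].tolist())
```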


In [35]:
# What host has the highest total of prices and where are they located?
## Sonder (NYC); its largest host-neighbourhood total is in the Financial District

air_bnb_TOPhost_HTprice = air_bnb.groupby(['host_id', 'host_name', 'neighbourhood'])['price'].sum().sort_values(ascending=False).head(1)
air_bnb_TOPhost_HTprice

host_id    host_name     neighbourhood     
219517861  Sonder (NYC)  Financial District    57738
Name: price, dtype: int64
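Grouping by host and neighbourhood at once returns the largest single host-neighbourhood pair. Its 57,738 is below Sonder (NYC)'s overall total of 82,795, so the rest of that host's listings lie in other neighbourhoods. A minimal two-step sketch on hypothetical toy data: fix the top host by overall total first, then break that total down by neighbourhood:

```python
import pandas as pd

# Hypothetical toy listings.
air_bnb = pd.DataFrame({
    "host_id": [1, 1, 1, 2],
    "host_name": ["A", "A", "A", "B"],
    "neighbourhood": ["Financial District", "Financial District", "Midtown", "Harlem"],
    "price": [200, 150, 100, 300],
})

# Step 1: host with the highest total price across all of their listings.
host_totals = air_bnb.groupby(["host_id", "host_name"])["price"].sum()
top_host_id, top_host_name = host_totals.idxmax()

# Step 2: where that host's total comes from.
top_host_rows = air_bnb[air_bnb["host_id"] == top_host_id]
by_neighbourhood = (
    top_host_rows.groupby("neighbourhood")["price"].sum().sort_values(ascending=False)
)
print(top_host_name, by_neighbourhood.to_dict())
```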

In [37]:
# When did Danielle from Queens last receive a review?
## July 8th, 2019
air_bnb_Danielle = air_bnb[(air_bnb['host_name'] == 'Danielle') & (air_bnb['neighbourhood_group'] == 'Queens')].sort_values('last_review', ascending=False)
air_bnb_Danielle

,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
22469,18173787,Cute Tiny Room Family Home by LGA NO CLEANING FEE,26432133,Danielle,Queens,East Elmhurst,40.7638,-73.87238,Private room,48,1,436,2019-07-08,16.03,5,337
21517,17222454,Sun Room Family Home LGA Airport NO CLEANING FEE,26432133,Danielle,Queens,East Elmhurst,40.76367,-73.87088,Private room,48,1,417,2019-07-07,14.36,5,338
20403,16276632,Cozy Room Family Home LGA Airport NO CLEANING FEE,26432133,Danielle,Queens,East Elmhurst,40.76335,-73.87007,Private room,48,1,510,2019-07-06,16.22,5,341
22068,17754072,Bed in Family Home Near LGA Airport,26432133,Danielle,Queens,East Elmhurst,40.76389,-73.87155,Shared room,38,1,224,2019-07-06,7.96,5,80
7086,5115372,Comfy Room Family Home LGA Airport NO CLEANING...,26432133,Danielle,Queens,East Elmhurst,40.76374,-73.87103,Private room,54,1,430,2019-07-03,13.45,5,347
33861,26814763,One bedroom with full bed / 1 stop from Manhattan,201647469,Danielle,Queens,Long Island City,40.74565,-73.94699,Private room,108,2,13,2019-06-20,1.74,1,333
27021,21386105,Quiet & clean 1br haven with balcony near the ...,154256662,Danielle,Queens,Astoria,40.77134,-73.92424,Entire home/apt,250,3,1,2018-01-02,0.05,1,180
16349,13151075,ASTORIA APARTMENT OUTDOOR SPACE,18051286,Danielle,Queens,Astoria,40.77221,-73.92901,Private room,50,1,0,,,1,0
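Sorting `last_review` as text works above only because the dates are ISO formatted. The output also contains four different `host_id` values named Danielle in Queens; the latest review, 2019-07-08, belongs to host 26432133. A minimal sketch on hypothetical toy data that converts the column to datetimes and reports the latest review per host:

```python
import pandas as pd

# Hypothetical toy listings.
air_bnb = pd.DataFrame({
    "host_id": [100, 100, 200, 300],
    "host_name": ["Danielle"] * 4,
    "neighbourhood_group": ["Queens", "Queens", "Queens", "Brooklyn"],
    "last_review": ["2019-07-08", "2019-07-03", None, "2019-07-09"],
})

# Convert to datetimes so max() compares dates; missing values become NaT.
air_bnb["last_review"] = pd.to_datetime(air_bnb["last_review"])

queens_danielle = air_bnb[
    (air_bnb["host_name"] == "Danielle") & (air_bnb["neighbourhood_group"] == "Queens")
]

# Several hosts share the name, so group by host_id rather than host_name.
latest_per_host = queens_danielle.groupby("host_id")["last_review"].max()
print(latest_per_host)
```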


## Further Questions

1. Which host has the most listings?

In [38]:
air_bnb_listings = air_bnb.groupby(['host_id', 'host_name']).count()['id'].sort_values(ascending=False)
air_bnb_listings

## Sonder (NYC)

host_id    host_name   
219517861  Sonder (NYC)    327
107434423  Blueground      232
30283594   Kara            121
137358866  Kazuya          103
12243051   Sonder           96
                          ... 
48818023   Sarah             1
48819868   Nick              1
48823036   Fred              1
48823279   Chris             1
2438       Tasos             1
Name: id, Length: 37439, dtype: int64
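The dataset also includes a precomputed `calculated_host_listings_count` column, which can serve as a cross-check on the row counts. A minimal sketch on hypothetical toy data; whether the two agree for every host in the real file is not checked here:

```python
import pandas as pd

# Hypothetical toy listings.
air_bnb = pd.DataFrame({
    "host_id": [1, 1, 1, 2, 3, 3],
    "host_name": ["A", "A", "A", "B", "C", "C"],
    "calculated_host_listings_count": [3, 3, 3, 1, 2, 2],
})

# size() counts rows per host, including rows with missing values.
listings = air_bnb.groupby(["host_id", "host_name"]).size().sort_values(ascending=False)

# Take one precomputed value per host and align it to the same index before comparing.
precomputed = air_bnb.groupby(["host_id", "host_name"])["calculated_host_listings_count"].first()
mismatches = listings != precomputed.reindex(listings.index)
print(listings.head(), mismatches.sum())
```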

2. How many listings have completely open availability?

In [40]:
len(air_bnb[air_bnb['availability_365']==365])

## 1,295

1295
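A boolean mask sums to its number of `True` values, and its mean is the corresponding share. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy listings.
air_bnb = pd.DataFrame({"availability_365": [365, 0, 365, 120, 364]})

# True where a listing is open every day of the year.
is_fully_open = air_bnb["availability_365"] == 365
fully_open = is_fully_open.sum()    # count of True values
share_open = is_fully_open.mean()   # fraction of listings
print(fully_open, share_open)
```

On the real data, 1,295 of 48,895 listings is about 2.6%.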

3. What room_types have the highest review numbers?

In [41]:
air_bnb_roomType_reviews = air_bnb.groupby('room_type').count()['number_of_reviews']
air_bnb_roomType_reviews

## Entire home/apt (note: .count() counts listings per room type, not the reviews themselves)

room_type
Entire home/apt    25409
Private room       22326
Shared room         1160
Name: number_of_reviews, dtype: int64
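The three counts above add up to 48,895, the total number of rows, so they are listing counts per room type. Comparing by reviews requires summing `number_of_reviews`. A minimal sketch on hypothetical toy data that shows both side by side:

```python
import pandas as pd

# Hypothetical toy listings: many lightly reviewed entire homes, one heavily reviewed private room.
air_bnb = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Shared room"],
    "number_of_reviews": [1, 2, 50, 4],
})

# "count" = listings per room type, "sum" = total reviews per room type.
by_room = air_bnb.groupby("room_type")["number_of_reviews"].agg(["count", "sum"])
print(by_room)
```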

# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered some more details that were not asked above, please describe them here.


In [None]:
Which hosts are the busiest and why?
Sonder (NYC) is the busiest host: it has the most listings (327) and the highest total price (82,795), ahead of Blueground (232 listings, 70,331).
Entire home/apt is the most common room type, both in Manhattan and across the whole dataset.
Manhattan has the most listings, the highest average price (about 196.88) and the highest total price (4,264,527).
Hosts with 100 or more listings (Sonder (NYC), Blueground, Kara and Kazuya) tend to be the busiest.