# AirBnB NY Locations Data Case Study

Goal: Explore various questions using pandas

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?



In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [79]:
air_bnb = pd.read_csv('AB_NYC_2019.csv')
air_bnb.head()

,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [136]:
hosts = air_bnb.sort_values('host_id', ascending = True)
hosts.head()

,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
30604,23669201,Great Price: Williamsburg Brooklyn Loft off L ...,2438,Tasos,Brooklyn,Williamsburg,40.71412,-73.94447,Entire home/apt,95,45,1,2018-03-17,0.06,1,0
2290,1101224,THE PUTNAM,2571,Teedo,Brooklyn,Bedford-Stuyvesant,40.68674,-73.93845,Entire home/apt,182,9,27,2019-05-21,0.37,1,23
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
13963,10593675,"La Spezia room. Clean, quiet and comfortable bed",2787,John,Brooklyn,Bensonhurst,40.60951,-73.97642,Shared room,79,1,15,2018-09-29,0.43,6,180
13583,10160215,Torre del Lago Room.,2787,John,Brooklyn,Gravesend,40.60755,-73.9741,Private room,79,1,17,2019-06-26,0.4,6,174


In [81]:
# How many neighborhood groups are available and which shows up the most?

neigh_group = air_bnb.groupby('neighbourhood_group').count().sort_values('id', ascending = False)
neigh_group[['id']]

## There are 5 neighborhood groups. Manhattan shows up the most with 21,661 entries

neighbourhood_group,id
Manhattan,21661
Brooklyn,20104
Queens,5666
Bronx,1091
Staten Island,373
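
The same counts can also be obtained without a full `groupby(...).count()`. The sketch below uses `value_counts()` on a small made-up frame (`toy` and its values are illustrative assumptions, not rows from `AB_NYC_2019.csv`):

```python
import pandas as pd

# Toy stand-in for air_bnb; the values are made up for illustration.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "neighbourhood_group": ["Manhattan", "Brooklyn", "Manhattan",
                            "Queens", "Manhattan", "Brooklyn"],
})

# value_counts() counts rows per group and sorts in descending order by default.
group_counts = toy["neighbourhood_group"].value_counts()

n_groups = toy["neighbourhood_group"].nunique()  # number of distinct groups
top_group = group_counts.idxmax()                # group with the most listings
print(n_groups, top_group, group_counts[top_group])
```

Applied to the real data, the same two calls on `air_bnb['neighbourhood_group']` should answer both halves of the question in one pass.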


In [82]:
# Are private rooms the most popular in Manhattan?

room_type = air_bnb.groupby(['neighbourhood_group', 'room_type']).count()
room_type[['host_id']]

## No, entire home/apt is 13,199 vs private room 7,982

neighbourhood_group,room_type,host_id
Bronx,Entire home/apt,379
Bronx,Private room,652
Bronx,Shared room,60
Brooklyn,Entire home/apt,9559
Brooklyn,Private room,10132
Brooklyn,Shared room,413
Manhattan,Entire home/apt,13199
Manhattan,Private room,7982
Manhattan,Shared room,480
Queens,Entire home/apt,2096
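
A wide table can make the per-group comparison easier to read than the stacked output above. This is a minimal sketch using `pd.crosstab` on made-up rows (the `toy` frame is an assumption for illustration only):

```python
import pandas as pd

# Toy stand-in for air_bnb; the values are made up for illustration.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Brooklyn"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Private room", "Private room", "Shared room"],
})

# Rows: neighbourhood group, columns: room type, cells: number of listings.
room_counts = pd.crosstab(toy["neighbourhood_group"], toy["room_type"])

# idxmax(axis=1) returns, for each group, the room type with the most listings.
most_common = room_counts.idxmax(axis=1)
print(room_counts)
print(most_common)
```

Combinations that never occur are filled with 0, so every group gets a value for every room type.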


In [144]:
# Which hosts are the busiest based on their reviews?
# Select the columns of interest before summing so text columns are not aggregated
busiest = air_bnb.groupby('host_id')[['id', 'number_of_reviews']].sum().sort_values(by='number_of_reviews', ascending=False)
busiest.head()

# host_id = 37312959 had the most with 2273 reviews

host_id,id,number_of_reviews
37312959,51232186,2273
344035,155403744,2205
26432133,74542317,2017
35524316,76057984,1971
40176101,70447963,1818
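
The summed `id` column carries no meaning, and the table does not show who each host is. One possible alternative, sketched here on made-up data (the `toy` frame and the output column names `total_reviews` and `listings` are illustrative assumptions), groups on both `host_id` and `host_name` and uses named aggregation:

```python
import pandas as pd

# Toy stand-in for air_bnb; the values are made up for illustration.
toy = pd.DataFrame({
    "host_id": [10, 10, 20, 30, 30, 30],
    "host_name": ["Ana", "Ana", "Ben", "Cy", "Cy", "Cy"],
    "number_of_reviews": [50, 70, 200, 10, 20, 30],
})

# Named aggregation: each keyword becomes an output column.
busiest_toy = (
    toy.groupby(["host_id", "host_name"], as_index=False)
       .agg(total_reviews=("number_of_reviews", "sum"),
            listings=("number_of_reviews", "count"))
       .sort_values("total_reviews", ascending=False)
)
print(busiest_toy.head())
```

Showing the listing count next to the review total helps separate hosts with many lightly reviewed listings from hosts with a few heavily reviewed ones.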


In [87]:
# Which neighborhood group has the highest average price?

# Select 'price' before aggregating: mean() over text columns raises a TypeError in pandas 2.x
avg_price_neigh = air_bnb.groupby('neighbourhood_group')[['price']].mean().sort_values('price', ascending = False)
avg_price_neigh

# Manhattan has the highest average price with $196.88

neighbourhood_group,price
Manhattan,196.875814
Brooklyn,124.383207
Staten Island,114.812332
Queens,99.517649
Bronx,87.496792


In [152]:
# Which neighborhood group has the highest total price?
# Note: .max() below reports the highest single listing price per group, not the sum of prices
max_price_neigh = air_bnb.groupby('neighbourhood_group')[['price']].max().sort_values('price', ascending = False)
max_price_neigh
# Brooklyn, Manhattan, and Queens all had at least one property costing $10,000

neighbourhood_group,price
Brooklyn,10000
Manhattan,10000
Queens,10000
Staten Island,5000
Bronx,2500
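
"Highest total price" and "highest single price" can point to different groups. The sketch below computes the mean, maximum, and sum side by side on made-up data (the `toy` frame is an assumption for illustration, not the real listings):

```python
import pandas as pd

# Toy stand-in for air_bnb; the values are made up for illustration.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Brooklyn", "Brooklyn", "Brooklyn"],
    "price": [500, 100, 150, 400],
})

# One pass over the groups, three statistics per group.
price_stats = toy.groupby("neighbourhood_group")["price"].agg(["mean", "max", "sum"])

highest_total = price_stats["sum"].idxmax()   # largest sum of listing prices
highest_single = price_stats["max"].idxmax()  # most expensive single listing
print(price_stats)
print(highest_total, highest_single)
```

In this toy frame the two answers differ, which is why it is worth stating which statistic a "highest price" claim refers to.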


In [153]:
# Which top 5 hosts have the highest total price?
high_price_host = air_bnb.groupby('host_id')[['price']].sum().sort_values(by='price', ascending=False)
high_price_host.head()

host_id,price
219517861,82795
107434423,70331
156158778,37097
205031545,35294
30283594,33581


In [111]:
# Who currently has no (zero) availability with a review count of 100 or more?
no_avail_host = air_bnb[air_bnb['availability_365'] == 0]
no_avail_host_100 = no_avail_host[no_avail_host['number_of_reviews'] >= 100]
print(no_avail_host_100.shape)
no_avail_host_100[['host_id', 'host_name','number_of_reviews']].head()
# 162 listings have no availability and 100+ reviews

(162, 16)


,host_id,host_name,number_of_reviews
8,7490,MaryEllen,118
94,79402,Christiana,168
132,129352,Sol,193
174,193722,Coral,114
180,67778,Doug,206
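
Each row of the filtered frame is a listing, so its row count is a number of listings; a host with several matching listings appears more than once. A minimal sketch on made-up data (the `toy` frame is an illustrative assumption) that combines both conditions in one mask and counts listings and distinct hosts separately:

```python
import pandas as pd

# Toy stand-in for air_bnb; the values are made up for illustration.
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 4],
    "availability_365": [0, 0, 0, 120, 0],
    "number_of_reviews": [150, 101, 99, 300, 100],
})

# Combine both conditions with & (each side needs its own parentheses).
mask = (toy["availability_365"] == 0) & (toy["number_of_reviews"] >= 100)
booked = toy[mask]

n_listings = len(booked)               # rows that satisfy both conditions
n_hosts = booked["host_id"].nunique()  # distinct hosts behind those rows
print(n_listings, n_hosts)
```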


In [221]:
# What host has the highest total of prices and where are they located?

# As seen above
high_price_host = air_bnb.groupby('host_id')[['price']].sum().sort_values(by='price', ascending=False)
high_price_host.head()
# Host ID 219517861 has the highest total of prices with $82,795 over all properties

host_id,price
219517861,82795
107434423,70331
156158778,37097
205031545,35294
30283594,33581


In [220]:
high_price_host_loc = air_bnb[air_bnb['host_id']==219517861]
high_price_host_final = high_price_host_loc[['host_id', 'host_name','neighbourhood_group', 'neighbourhood', 'price']]
display(high_price_host_final.head())
high_price_host_neigh_tot = air_bnb[air_bnb['host_id']==219517861].groupby('neighbourhood', as_index=False)['price'].sum().sort_values(by=['price'],ascending=False)
display(high_price_host_neigh_tot)
high_price_host_neigh_loc = air_bnb[air_bnb['host_id']==219517861].groupby('neighbourhood', as_index=False)['price'].count().sort_values(by=['price'],ascending=False)
high_price_host_neigh_loc

# Their name is Sonder (NYC), and their listings are in various Manhattan neighborhoods

,host_id,host_name,neighbourhood_group,neighbourhood,price
38293,219517861,Sonder (NYC),Manhattan,Financial District,302
38294,219517861,Sonder (NYC),Manhattan,Financial District,229
38588,219517861,Sonder (NYC),Manhattan,Financial District,232
39769,219517861,Sonder (NYC),Manhattan,Murray Hill,262
39770,219517861,Sonder (NYC),Manhattan,Murray Hill,255


,neighbourhood,price
1,Financial District,57738
4,Murray Hill,11005
5,Theater District,7743
2,Hell's Kitchen,2789
0,Chelsea,1761
6,Upper East Side,958
3,Midtown,801


,neighbourhood,price
1,Financial District,218
4,Murray Hill,50
5,Theater District,27
2,Hell's Kitchen,15
0,Chelsea,7
6,Upper East Side,6
3,Midtown,4
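
The two per-neighbourhood tables above (total price and listing count) could be produced by a single `groupby`. The following is a sketch on made-up data; the `toy` frame, the host ID 7, and the output column names `listings` and `total_price` are illustrative assumptions:

```python
import pandas as pd

# Toy stand-in for air_bnb; the values are made up for illustration.
toy = pd.DataFrame({
    "host_id": [7, 7, 7, 7, 8],
    "neighbourhood": ["Financial District", "Financial District",
                      "Murray Hill", "Chelsea", "Harlem"],
    "price": [300, 200, 250, 100, 80],
})

# Filter to one host, then compute both statistics per neighbourhood at once.
per_neigh = (
    toy[toy["host_id"] == 7]
    .groupby("neighbourhood")
    .agg(listings=("price", "count"), total_price=("price", "sum"))
    .sort_values("total_price", ascending=False)
)
print(per_neigh)
```

Naming the count column `listings` also avoids a table whose `price` header actually holds a count.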


In [134]:
# When did Danielle from Queens last receive a review?

danielle_review = air_bnb[air_bnb['host_name']=='Danielle']
danielle_review_queens = danielle_review[danielle_review['neighbourhood_group']=='Queens'].sort_values(by=['last_review'],ascending=False)
danielle_review_queens[['host_name','host_id', 'neighbourhood_group', 'last_review']]
# The latest review for a Danielle from Queens was on 2019-07-08

,host_name,host_id,neighbourhood_group,last_review
22469,Danielle,26432133,Queens,2019-07-08
21517,Danielle,26432133,Queens,2019-07-07
20403,Danielle,26432133,Queens,2019-07-06
22068,Danielle,26432133,Queens,2019-07-06
7086,Danielle,26432133,Queens,2019-07-03
33861,Danielle,201647469,Queens,2019-06-20
27021,Danielle,154256662,Queens,2018-01-02
16349,Danielle,18051286,Queens,
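
Sorting `last_review` as text works here because the dates are in ISO `YYYY-MM-DD` form. The table also shows that several different host IDs share the name Danielle. A hedged sketch on made-up rows (the `toy` frame is an assumption) that parses the dates and reports the latest review overall and per host:

```python
import pandas as pd

# Toy stand-in for air_bnb; the values are made up for illustration.
toy = pd.DataFrame({
    "host_name": ["Danielle", "Danielle", "Danielle", "Maria"],
    "host_id": [1, 1, 2, 3],
    "neighbourhood_group": ["Queens", "Queens", "Queens", "Queens"],
    "last_review": ["2019-07-08", "2019-06-20", None, "2019-07-09"],
})

# Parse to real timestamps; missing values become NaT.
toy["last_review"] = pd.to_datetime(toy["last_review"])

mask = (toy["host_name"] == "Danielle") & (toy["neighbourhood_group"] == "Queens")

latest = toy.loc[mask, "last_review"].max()  # max() skips NaT
latest_per_host = toy[mask].groupby("host_id")["last_review"].max()
print(latest)
print(latest_per_host)
```

A host whose listings have no reviews at all shows up as `NaT` in the per-host result.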


## Further Questions

1. Which host has the most listings?

In [137]:
most_listings = air_bnb.groupby('host_id').count().sort_values(by=['id'],ascending=False)
most_listings[['id','price']].head()
#Host ID 219517861 has the most listings

host_id,id,price
219517861,327,327
107434423,232,232
30283594,121,121
137358866,103,103
16098958,96,96


2. How many listings have completely open availability?

In [139]:
complete_avail = air_bnb[air_bnb['availability_365']==365]
print(complete_avail.shape)
complete_avail[['host_id', 'host_name','availability_365']].head()
# There are 1295 listings with full availability

(1295, 16)


,host_id,host_name,availability_365
0,2787,John,365
2,4632,Elisabeth,365
36,7355,Vt,365
38,45445,Harriet,365
97,82685,Elliott,365


3. What room_types have the highest review numbers?

In [142]:
air_room_types = air_bnb.groupby('room_type').count()
air_room_types[['number_of_reviews']]
# Note: count() tallies listings per room type, not reviews (use sum() for review totals)
# Entire home/apt has the most listings

room_type,number_of_reviews
Entire home/apt,25409
Private room,22326
Shared room,1160
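
`count()` on `number_of_reviews` returns how many listings have a value in that column, while `sum()` adds up the reviews themselves. The sketch below shows both on made-up data (the `toy` frame and the column names `listings` and `total_reviews` are illustrative assumptions):

```python
import pandas as pd

# Toy stand-in for air_bnb; the values are made up for illustration.
toy = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Shared room"],
    "number_of_reviews": [5, 10, 40, 2],
})

# count -> number of listings per room type; sum -> total reviews per room type.
reviews = toy.groupby("room_type")["number_of_reviews"].agg(
    listings="count", total_reviews="sum"
)

most_listings_type = reviews["listings"].idxmax()
most_reviews_type = reviews["total_reviews"].idxmax()
print(reviews)
print(most_listings_type, most_reviews_type)
```

In this toy frame the room type with the most listings is not the one with the most reviews, so the two statistics should not be used interchangeably.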


# Final Conclusion

There are five neighborhood groups with the following number of listings: Manhattan (21,661), Brooklyn (20,104), Queens (5,666), Bronx (1,091), and Staten Island (373). Private rooms seemed to be more popular than entire homes and shared rooms, except in Manhattan, where private rooms came second to entire homes. The five highest review totals for any one host were 2,273, 2,205, 2,017, 1,971, and 1,818.

The average price per listing for each neighborhood group was: Manhattan \$196.88, Brooklyn \$124.38, Staten Island \$114.81, Queens \$99.52, and Bronx \$87.50. The highest single listing price in each neighborhood group was: Brooklyn \$10,000, Manhattan \$10,000, Queens \$10,000, Staten Island \$5,000, and Bronx \$2,500.

The five hosts with the highest combined listing prices were host ID 219517861 (\$82,795), host ID 107434423 (\$70,331), host ID 156158778 (\$37,097), host ID 205031545 (\$35,294), and host ID 30283594 (\$33,581). The top host, host ID 219517861, has the host name Sonder (NYC) and also has the most listings: 327, all in Manhattan. They are distributed across the following neighborhoods: Financial District (218 listings, combined total \$57,738), Murray Hill (50 listings, combined total \$11,005), Theater District (27 listings, combined total \$7,743), Hell's Kitchen (15 listings, combined total \$2,789), Chelsea (7 listings, combined total \$1,761), Upper East Side (6 listings, combined total \$958), and Midtown (4 listings, combined total \$801).

There are 162 listings with no availability and 100 or more reviews. The latest review for a host named Danielle in Queens was on 2019-07-08. There are 1,295 listings with full availability. Counting listings by room type gives: Entire home/apt 25,409, Private room 22,326, and Shared room 1,160.