# AirBnB NY Locations Data Case Study

Your task will be to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

This is to simulate what you will face when you are out in the wild. 

Happy Coding!

In [114]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime

In [120]:
air_bnb = pd.read_csv('AB_NYC_2019.csv')
air_bnb.head()
air_bnb['last_review'] = pd.to_datetime(air_bnb['last_review']) #Setting the last_review column to a datetime type

In [179]:
# Which hosts are the busiest and why?
hosts = air_bnb.groupby('host_id').calculated_host_listings_count.count().nlargest(5)
hosts

host_id
219517861    327
107434423    232
30283594     121
137358866    103
12243051      96
Name: calculated_host_listings_count, dtype: int64

In [180]:
# How many neighborhood groups are available and which shows up the most?
air_bnb.neighbourhood_group.unique() # 5
air_bnb.groupby('neighbourhood_group').neighbourhood_group.count().nlargest(5) # Manhattan


neighbourhood_group
Manhattan        21661
Brooklyn         20104
Queens            5666
Bronx             1091
Staten Island      373
Name: neighbourhood_group, dtype: int64
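
As a hedged aside, the number of groups and the most frequent group can be computed directly instead of being read off a trailing comment. The sketch below uses a small invented frame (`toy`) standing in for `air_bnb`; only the column name `neighbourhood_group` comes from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for air_bnb; the rows are invented for illustration.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Brooklyn", "Manhattan", "Queens", "Manhattan"],
})

# Number of distinct neighbourhood groups.
n_groups = toy["neighbourhood_group"].nunique()

# Listings per group, sorted from most to least frequent.
group_counts = toy["neighbourhood_group"].value_counts()

# Label of the group that appears most often.
most_common = group_counts.idxmax()

print(n_groups, most_common)
```

Applied to `air_bnb`, the same three calls should reproduce the count of 5 groups and the Manhattan result shown above.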

In [30]:
# Are private rooms the most popular in Manhattan? Nah. It's Entire home/apt
manhattan_query = air_bnb.query("neighbourhood_group == 'Manhattan'")
manhattan_query.room_type.value_counts()

Entire home/apt    13199
Private room        7982
Shared room          480
Name: room_type, dtype: int64

In [181]:
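# Which hosts are the busiest based on their reviews? Michael getting big checks with 417 reviews
# Note: count() tallies rows (listings) per host, so number_of_reviews below is a listing count, not a review total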
by_host = air_bnb.groupby('host_id').count()
by_host.sort_values(by=['number_of_reviews'], ascending=False).head(5)

| host_id | id | name | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 219517861 | 327 | 327 | 327 | 327 | 327 | 327 | 327 | 327 | 327 | 327 | 327 | 207 | 207 | 327 | 327 |
| 107434423 | 232 | 232 | 232 | 232 | 232 | 232 | 232 | 232 | 232 | 232 | 232 | 28 | 28 | 232 | 232 |
| 30283594 | 121 | 121 | 121 | 121 | 121 | 121 | 121 | 121 | 121 | 121 | 121 | 43 | 43 | 121 | 121 |
| 137358866 | 103 | 103 | 103 | 103 | 103 | 103 | 103 | 103 | 103 | 103 | 103 | 51 | 51 | 103 | 103 |
| 16098958 | 96 | 96 | 96 | 96 | 96 | 96 | 96 | 96 | 96 | 96 | 96 | 61 | 61 | 96 | 96 |
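
The cell above calls `count()`, which tallies rows per host, so it ranks hosts by number of listings rather than by reviews. A minimal sketch of ranking by total reviews follows, assuming that summing `number_of_reviews` per host is the intended measure of "busiest". The frame `toy` and its values are invented for illustration; only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for air_bnb; values are invented for illustration.
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 3, 3],
    "host_name": ["Ann", "Ann", "Bo", "Cy", "Cy", "Cy"],
    "number_of_reviews": [10, 5, 40, 1, 2, 3],
})

# count() tallies rows per host, so this ranks hosts by number of listings.
listings_per_host = toy.groupby("host_id")["number_of_reviews"].count()

# sum() adds up the review counts, which ranks hosts by total reviews.
reviews_per_host = (
    toy.groupby(["host_id", "host_name"])["number_of_reviews"]
    .sum()
    .nlargest(2)
)

print(reviews_per_host)
```

In the toy data the host with the most listings (host 3) is not the host with the most reviews (host 2), which is the distinction the two aggregations are meant to expose.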


In [187]:
# Which neighborhood group has the highest average price? Manhattan
# Select the price column before aggregating so non-numeric columns are never averaged
neighbor = air_bnb.groupby('neighbourhood_group')['price'].mean()
neighbor.nlargest(5)

neighbourhood_group
Manhattan        196.875814
Brooklyn         124.383207
Staten Island    114.812332
Queens            99.517649
Bronx             87.496792
Name: price, dtype: float64

In [188]:
# Which neighborhood group has the highest total price? Manhattan with $4,264,527
# air_bnb.sort_values(by=['price'], ascending=False)
# Select the price column before aggregating so non-numeric columns are never summed
neighborhood_total = air_bnb.groupby('neighbourhood_group')['price'].sum()
neighborhood_total.nlargest(5)

neighbourhood_group
Manhattan        4264527
Brooklyn         2500600
Queens            563867
Bronx              95459
Staten Island      42825
Name: price, dtype: int64

In [184]:
#Which 5 hosts have the highest total price?
host_total_price = air_bnb.groupby('host_id').sum(numeric_only=True)  # sum numeric columns only
host_total_price.sort_values(by=['price'], ascending=False).head(5)

| host_id | id | latitude | longitude | price | minimum_nights | number_of_reviews | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|
| 219517861 | 10885561678 | 13316.25823 | -24198.18856 | 82795 | 4353 | 1281 | 397.56 | 106929 | 98588 |
| 107434423 | 7210036953 | 9451.60418 | -17166.13165 | 70331 | 7470 | 29 | 6.04 | 53824 | 58884 |
| 156158778 | 332529233 | 488.73929 | -887.71735 | 37097 | 12 | 1 | 1.0 | 144 | 776 |
| 205031545 | 1415225676 | 1996.92821 | -3624.34656 | 35294 | 750 | 127 | 21.21 | 2401 | 10796 |
| 30283594 | 1611854192 | 4931.41347 | -8952.50779 | 33581 | 3767 | 65 | 3.94 | 14641 | 37924 |


In [190]:
# Who currently has no (zero) availability with a review count of 100 or more?
no_availability = air_bnb.loc[(air_bnb['availability_365'] == 0) & (air_bnb['number_of_reviews'] >= 100)] # < All those fools
no_availability

|  | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 5203 | Cozy Clean Guest Room - Family Apt | 7490 | MaryEllen | Manhattan | Upper West Side | 40.80178 | -73.96723 | Private room | 79 | 2 | 118 | 2017-07-21 | 0.99 | 1 | 0 |
| 94 | 20913 | Charming 1 bed GR8 WBurg LOCATION! | 79402 | Christiana | Brooklyn | Williamsburg | 40.70984 | -73.95775 | Entire home/apt | 100 | 5 | 168 | 2018-07-22 | 1.57 | 1 | 0 |
| 132 | 30031 | NYC artists’ loft with roof deck | 129352 | Sol | Brooklyn | Greenpoint | 40.73494 | -73.95030 | Private room | 50 | 3 | 193 | 2019-05-20 | 1.86 | 1 | 0 |
| 174 | 44221 | Financial District Luxury Loft | 193722 | Coral | Manhattan | Financial District | 40.70666 | -74.01374 | Entire home/apt | 196 | 3 | 114 | 2019-06-20 | 1.06 | 1 | 0 |
| 180 | 45556 | Fort Greene, Brooklyn: Center Bedroom | 67778 | Doug | Brooklyn | Fort Greene | 40.68863 | -73.97691 | Private room | 65 | 2 | 206 | 2019-06-30 | 1.92 | 2 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 29581 | 22705516 | The Quietest Block in Manhattan :) | 127740507 | Kathleen | Manhattan | Harlem | 40.83102 | -73.94181 | Private room | 65 | 2 | 103 | 2019-07-07 | 5.89 | 2 | 0 |
| 30461 | 23574142 | queens get away!! | 176185168 | Janet | Queens | Laurelton | 40.68209 | -73.73662 | Private room | 65 | 1 | 119 | 2018-12-24 | 7.79 | 1 | 0 |
| 31250 | 24267706 | entire sunshine of the spotless mind room | 21074914 | Albert | Brooklyn | Bedford-Stuyvesant | 40.68234 | -73.91318 | Private room | 49 | 1 | 102 | 2019-07-05 | 6.73 | 3 | 0 |
| 32670 | 25719044 | COZY Room for Female Guests | 40119874 | Stephany | Brooklyn | Prospect-Lefferts Gardens | 40.66242 | -73.94417 | Private room | 48 | 1 | 131 | 2019-05-31 | 9.97 | 2 | 0 |


In [191]:
# What host has the highest total of prices and where are they located? #Sonder in Manhattan - HostID of 219517861
host_and_where = air_bnb.groupby(['host_id', 'neighbourhood_group']).sum(numeric_only=True)  # sum numeric columns only
host_and_where.sort_values(by=['price'], ascending=False).head(5)

| host_id | neighbourhood_group | id | latitude | longitude | price | minimum_nights | number_of_reviews | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|
| 219517861 | Manhattan | 10885561678 | 13316.25823 | -24198.18856 | 82795 | 4353 | 1281 | 397.56 | 106929 | 98588 |
| 107434423 | Manhattan | 7142993903 | 9370.18553 | -17018.18255 | 69741 | 7410 | 29 | 6.04 | 53360 | 58347 |
| 205031545 | Manhattan | 1415225676 | 1996.92821 | -3624.34656 | 35294 | 750 | 127 | 21.21 | 2401 | 10796 |
| 30283594 | Manhattan | 1611854192 | 4931.41347 | -8952.50779 | 33581 | 3767 | 65 | 3.94 | 14641 | 37924 |
| 156158778 | Manhattan | 232134838 | 326.02619 | -591.83023 | 29194 | 8 | 1 | 1.0 | 96 | 711 |


In [121]:
# When did Danielle from Queens last receive a review? # July 8th, 2019
the_danielles = air_bnb.loc[(air_bnb['host_name'] == 'Danielle') & (air_bnb['neighbourhood_group'] == 'Queens')]
the_danielles['last_review'].max()

Timestamp('2019-07-08 00:00:00')
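
Filtering on `host_name` alone treats every Queens host named Danielle as one person. A hedged sketch of checking each `host_id` separately follows; the frame `toy` and its dates are invented for illustration, and only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for air_bnb; rows and dates are invented.
toy = pd.DataFrame({
    "host_id": [10, 10, 20, 30],
    "host_name": ["Danielle", "Danielle", "Danielle", "Sam"],
    "neighbourhood_group": ["Queens", "Queens", "Queens", "Queens"],
    "last_review": ["2019-01-05", None, "2019-03-01", "2019-06-01"],
})
toy["last_review"] = pd.to_datetime(toy["last_review"])

# Same filter as the cell above: hosts named Danielle in Queens.
mask = (toy["host_name"] == "Danielle") & (toy["neighbourhood_group"] == "Queens")

# Latest review per host_id; max() skips missing dates (NaT).
last_by_host = toy.loc[mask].groupby("host_id")["last_review"].max()

print(last_by_host)
```

If the result has more than one row, the single date reported above is the latest review across several different hosts who share the name.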

## Further Questions

1. Which host has the most listings?

In [158]:
#Sonder in Manhattan has the most with 327
host_list_sums = air_bnb.groupby(['host_name', 'neighbourhood_group']).max()
host_list_sums.sort_values(by=['calculated_host_listings_count'], ascending=False).head()
# pandas emitted a warning (not an error) on the groupby line, most likely a FutureWarning because max() cannot
# compare the mixed str/NaN values in `name`; that column is dropped from the output


| host_name | neighbourhood_group | id | host_id | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sonder (NYC) | Manhattan | 35937891 | 219517861 | Upper East Side | 40.76447 | -73.96295 | Private room | 699 | 29 | 20 | 2019-06-26 | 4.52 | 327 | 365 |
| Blueground | Manhattan | 36404972 | 107434423 | West Village | 40.79094 | -73.9491 | Entire home/apt | 481 | 120 | 2 | 2019-05-16 | 0.39 | 232 | 365 |
| Blueground | Brooklyn | 36404936 | 107434423 | Williamsburg | 40.71493 | -73.96365 | Entire home/apt | 312 | 30 | 0 | NaT | NaN | 232 | 349 |
| Kara | Manhattan | 36309947 | 157658093 | West Village | 40.82434 | -73.93606 | Private room | 1170 | 120 | 39 | 2019-06-27 | 1.48 | 121 | 365 |
| Kazuya | Queens | 34396674 | 137358866 | Woodside | 40.768 | -73.87123 | Private room | 70 | 30 | 3 | 2019-06-23 | 1.0 | 103 | 273 |
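
One way to sidestep aggregation problems with text columns is to select the single column of interest before calling `max()`. The sketch below assumes that only `calculated_host_listings_count` is needed for this question. The frame `toy` is invented for illustration, with a `name` column that mixes strings and missing values.

```python
import pandas as pd

# Hypothetical stand-in for air_bnb; rows are invented for illustration.
toy = pd.DataFrame({
    "host_name": ["Ann", "Ann", "Bo"],
    "neighbourhood_group": ["Queens", "Queens", "Bronx"],
    "name": ["Loft", None, "Studio"],  # mixed str/missing values
    "calculated_host_listings_count": [2, 2, 1],
})

# Pick the one numeric column first, then aggregate and sort.
most_listings = (
    toy.groupby(["host_name", "neighbourhood_group"])["calculated_host_listings_count"]
    .max()
    .sort_values(ascending=False)
)

print(most_listings.head())
```

Because `max()` only ever sees the integer column here, the `name` column cannot interfere with the aggregation.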


2. How many listings have completely open availability?

In [165]:
# 1295 listings have 365 days of availability
three_six_five_listings = air_bnb.loc[air_bnb['availability_365'] == 365]
three_six_five_listings['availability_365'].count()

1295

3. What room_types have the highest review numbers?

In [176]:
# Entire Home/Apt layouts have the highest number of reviews
# Select number_of_reviews before aggregating so non-numeric columns are never summed
rooms = air_bnb.groupby('room_type')['number_of_reviews'].sum()
rooms


room_type
Entire home/apt    580403
Private room       538346
Shared room         19256
Name: number_of_reviews, dtype: int64

# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered some more details that were not asked above, please describe them here.

-- Add your conclusion --

In [None]:
'''It's hard to tell why certain hosts are the busiest without further information. If this dataset included 
customer testimonials, the "why" part of question 1 could be reasonably answered. Private rooms are not the 
most popular room type in Manhattan, and given the high average price of about $197 (the highest in the NYC area) 
and the extraordinarily high total price, this stands to reason. Upon further investigation of the data set, 
this study found that in all but one case, aggregating the results by host_id versus host_name 
did not change the outcome of the questions asked. This is because each of the top hosts in the area is a 
single individual/group under one name.

While it is not possible to compare the average costs of AirBnB rentals in other cities due to the limitations
of the data set at present, the author will conclude, based on anecdotal experience, that NYC AirBnB rentals
are extraordinarily overpriced, and that the single individuals who own 
'''
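
The host_id versus host_name comparison mentioned in the conclusion can be checked with a short sketch like the one below. It is not the author's original check: the frame `toy` and its values are invented, and only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for air_bnb; values are invented for illustration.
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 4],
    "host_name": ["Ann", "Ann", "Bo", "Cy", "Cy"],
    "price": [100, 150, 80, 60, 90],
})

# Number of distinct host_ids behind each host_name.
ids_per_name = toy.groupby("host_name")["host_id"].nunique()

# Names shared by more than one host_id; totals by name merge these hosts.
shared_names = ids_per_name[ids_per_name > 1]

# Totals grouped each way, for side-by-side comparison.
total_by_id = toy.groupby("host_id")["price"].sum()
total_by_name = toy.groupby("host_name")["price"].sum()

print(shared_names)
```

Any name that appears in `shared_names` is a case where grouping by `host_name` can give a different answer than grouping by `host_id`.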