# AirBnB NY Locations Data Case Study

In this final project, your task is to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?


This is to simulate what you will face when you are out in the wild. 

Happy Coding!

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [267]:
air_bnb = pd.read_csv('../files/AB_NYC_2019.csv')

In [270]:
# How many neighborhood groups are available and which shows up the most?
print(air_bnb.neighbourhood_group.unique())
print('\n5 neighbourhood groups available')
print('\n')
print(air_bnb.groupby('neighbourhood_group', as_index=False).count()[['neighbourhood_group', 'id']])
print('\nManhattan shows up 21661 times')



['Brooklyn' 'Manhattan' 'Queens' 'Staten Island' 'Bronx']

5 neighbourhood groups available


  neighbourhood_group     id
0               Bronx   1091
1            Brooklyn  20104
2           Manhattan  21661
3              Queens   5666
4       Staten Island    373

Manhattan shows up 21661 times
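
The counts above are typed into the print statements by hand. As a minimal sketch, `value_counts()` can produce both the number of groups and the most frequent one directly. The tiny `sample` frame below is made up for illustration and only stands in for `air_bnb`.

```python
import pandas as pd

# Hypothetical miniature stand-in for air_bnb (made-up rows, real column name).
sample = pd.DataFrame({
    "neighbourhood_group": [
        "Manhattan", "Brooklyn", "Manhattan", "Queens", "Manhattan", "Brooklyn",
    ],
})

# value_counts() returns one row per group, sorted from most to least frequent.
group_counts = sample["neighbourhood_group"].value_counts()

n_groups = sample["neighbourhood_group"].nunique()  # number of distinct groups
top_group = group_counts.idxmax()                   # label with the highest count
top_count = int(group_counts.max())                 # that highest count

print(f"{n_groups} neighbourhood groups available")
print(f"{top_group} shows up {top_count} times")
```

Deriving the message from the data keeps the printed answer correct if the CSV is updated.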


In [265]:
# Are private rooms the most popular in Manhattan?

print(air_bnb[air_bnb['neighbourhood_group'] == 'Manhattan'].groupby('room_type').count()['id'])
print('\nIn Manhattan, entire home/apt has over 5000 more listings than private rooms')


room_type
Entire home/apt    13199
Private room        7982
Shared room          480
Name: id, dtype: int64

In Manhattan, entire home/apt has over 5000 more listings than private rooms
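
Here "popular" is read as "most listings". A small sketch of the same comparison expressed as shares rather than raw counts, using `value_counts(normalize=True)`. The rows in `sample` are invented for illustration.

```python
import pandas as pd

# Hypothetical rows; only the two columns used by the question.
sample = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn", "Brooklyn"],
    "room_type": [
        "Entire home/apt", "Entire home/apt", "Private room", "Private room", "Shared room",
    ],
})

manhattan = sample[sample["neighbourhood_group"] == "Manhattan"]

# normalize=True turns counts into fractions that sum to 1.
room_share = manhattan["room_type"].value_counts(normalize=True)
most_common_room = room_share.idxmax()

print((room_share * 100).round(1))
print(f"Most common room type in Manhattan: {most_common_room}")
```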


In [264]:
# Which hosts are the busiest based on their reviews?

# Select the column before aggregating so non-numeric columns are never summed.
mostReviews = air_bnb.groupby(['host_id', 'host_name'])[['reviews_per_month']].sum()
print(mostReviews.sort_values('reviews_per_month', ascending=False).head())
print('\nSonder (NYC) has the most reviews per month')



                        reviews_per_month
host_id   host_name                      
219517861 Sonder (NYC)             397.56
244361589 Row NYC                  111.72
232251881 Lakshmee                  80.63
26432133  Danielle                  68.02
137274917 David                     62.89

Sonder (NYC) has the most reviews per month
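
Summing `reviews_per_month` is one way to measure "busiest"; total `number_of_reviews` is another, and the two can disagree. A sketch that computes both with named aggregation, on made-up rows (the host names and numbers below are hypothetical):

```python
import pandas as pd

# Hypothetical rows. reviews_per_month is assumed to be missing for
# listings that have never been reviewed.
sample = pd.DataFrame({
    "host_id": [1, 1, 2, 3],
    "host_name": ["Ana", "Ana", "Ben", "Cy"],
    "reviews_per_month": [2.0, 3.0, 4.5, None],
    "number_of_reviews": [10, 20, 90, 0],
})

# Named aggregation: one output column per (source column, function) pair.
host_activity = sample.groupby(["host_id", "host_name"]).agg(
    monthly_rate=("reviews_per_month", "sum"),   # missing values are skipped
    total_reviews=("number_of_reviews", "sum"),
)

busiest_by_rate = host_activity["monthly_rate"].idxmax()
busiest_by_total = host_activity["total_reviews"].idxmax()

print(host_activity)
print("Busiest by monthly rate:", busiest_by_rate)
print("Busiest by total reviews:", busiest_by_total)
```

In this toy data the two metrics pick different hosts, so it is worth stating which definition of "busiest" an answer uses.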


In [144]:
# Which neighborhood group has the highest average price?
# Select 'price' before aggregating; mean() over text columns fails in recent pandas versions.
highestAvgPrice = air_bnb.groupby('neighbourhood_group')['price'].mean()
highestAvgPrice = highestAvgPrice.sort_values(ascending=False)
print(highestAvgPrice)
print('\nManhattan has the highest average price')

neighbourhood_group
Manhattan        196.875814
Brooklyn         124.383207
Staten Island    114.812332
Queens            99.517649
Bronx             87.496792
Name: price, dtype: float64

Manhattan has the highest average price


In [146]:
# Which neighborhood group has the highest total price?
totalPrice = air_bnb.groupby('neighbourhood_group')['price'].sum()
totalPrice = totalPrice.sort_values(ascending=False)
print(totalPrice)
print('\nManhattan has the highest total price')

neighbourhood_group
Manhattan        4264527
Brooklyn         2500600
Queens            563867
Bronx              95459
Staten Island      42825
Name: price, dtype: int64

Manhattan has the highest total price
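
The average and total rankings differ (Staten Island is third by average price but last by total) because a group's total is its average multiplied by its number of listings. A quick check using only the counts and averages printed in the outputs above, assuming every listing has a price:

```python
import pandas as pd

# Listing counts and average prices copied from the earlier cell outputs.
summary = pd.DataFrame(
    {
        "listings": [1091, 20104, 21661, 5666, 373],
        "avg_price": [87.496792, 124.383207, 196.875814, 99.517649, 114.812332],
    },
    index=["Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"],
)

# total = average * count; round to undo the 6-decimal truncation of the averages.
summary["total_price"] = (summary["listings"] * summary["avg_price"]).round().astype(int)

ranked = summary.sort_values("total_price", ascending=False)
print(ranked)
```

The reconstructed totals match the `sum()` output, which confirms that Staten Island's low total comes from its small listing count rather than from low prices.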


In [158]:
# Which top 5 hosts have the highest total price?
topHosts = air_bnb.groupby(['host_id', 'host_name'])['price'].sum()
topHosts = topHosts.sort_values(ascending=False)
topHosts.head()


host_id    host_name   
219517861  Sonder (NYC)    82795
107434423  Blueground      70331
156158778  Sally           37097
205031545  Red Awning      35294
30283594   Kara            33581
Name: price, dtype: int64

In [271]:
# Who currently has no (zero) availability with a review count of 100 or more?

noAvail = air_bnb[['host_id', 'host_name', 'number_of_reviews', 'availability_365']]
# Sum per host: a total availability of 0 means every listing of that host is unavailable.
noAvail = noAvail.groupby(['host_id', 'host_name']).sum()
noAvail = noAvail[(noAvail['availability_365'] == 0) & (noAvail['number_of_reviews'] >= 100)]
print(noAvail)
print('\n135 hosts with 100+ reviews have no availability')


                      number_of_reviews  availability_365
host_id   host_name                                      
7490      MaryEllen                 118                 0
36897     Lydia                     107                 0
79402     Christiana                168                 0
129352    Sol                       193                 0
193722    Coral                     114                 0
...                                 ...               ...
143944704 Ash                       104                 0
155125855 Vicente                   125                 0
176185168 Janet                     119                 0
187487947 Diego                     164                 0
209549523 Mariluz                   241                 0

[135 rows x 2 columns]

135 hosts with 100+ reviews have no availability
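
The cell above answers the question at the host level: reviews and availability are summed across all of a host's listings. The question could also be read per listing. A sketch of both readings on made-up rows, to show that they can return different hosts:

```python
import pandas as pd

# Hypothetical rows: host 1 has one fully booked listing and one open listing.
sample = pd.DataFrame({
    "host_id": [1, 1, 2],
    "host_name": ["Ana", "Ana", "Ben"],
    "number_of_reviews": [150, 5, 120],
    "availability_365": [0, 30, 0],
})

# Listing-level reading: any single listing with 0 availability and 100+ reviews.
booked = sample[(sample["availability_365"] == 0) & (sample["number_of_reviews"] >= 100)]
listing_level_hosts = set(booked["host_id"].tolist())

# Host-level reading (as in the cell above): totals per host must meet both conditions.
per_host = sample.groupby("host_id")[["number_of_reviews", "availability_365"]].sum()
qualifying = per_host[(per_host["availability_365"] == 0) & (per_host["number_of_reviews"] >= 100)]
host_level_hosts = set(qualifying.index.tolist())

print("Listing-level:", listing_level_hosts)
print("Host-level:", host_level_hosts)
```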


In [236]:
# What host has the highest total of prices and where are they located?
hostTotal = air_bnb.groupby(['host_name','neighbourhood_group'])[['price']].sum()
print(hostTotal.sort_values('price', ascending = False).head(1))
print('\nSonder (NYC) has the highest total price and is located in Manhattan.')


                                  price
host_name    neighbourhood_group       
Sonder (NYC) Manhattan            82795

Sonder (NYC) has the highest total price and is located in Manhattan.
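
Grouping by `host_name` alone merges different hosts who happen to share a name. A sketch, on invented rows, of how adding `host_id` to the grouping keys can change the answer:

```python
import pandas as pd

# Hypothetical rows: two different hosts (ids 10 and 11) are both named "Sam".
sample = pd.DataFrame({
    "host_id": [10, 11, 12, 12],
    "host_name": ["Sam", "Sam", "Lee", "Lee"],
    "neighbourhood_group": ["Queens", "Queens", "Bronx", "Bronx"],
    "price": [100, 120, 90, 95],
})

# Grouping by name pools both Sams into one total.
by_name = sample.groupby(["host_name", "neighbourhood_group"])["price"].sum()

# Grouping by id keeps each account separate.
by_id = sample.groupby(["host_id", "host_name", "neighbourhood_group"])["price"].sum()

top_by_name = by_name.idxmax()
top_by_id = by_id.idxmax()

print("Top by name:", top_by_name, by_name.max())
print("Top by id:  ", top_by_id, by_id.max())
```

For a distinctive name such as Sonder (NYC) the two groupings likely agree, but common first names are safer to group by `host_id`.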


In [263]:
# When did Danielle from Queens last receive a review?
danielle = air_bnb[(air_bnb['host_name'] == 'Danielle') & (air_bnb['neighbourhood_group'] == 'Queens')]
danielle = danielle.sort_values('last_review', ascending = False)
danielle = danielle[['host_name', 'last_review']]
print(danielle.head())
print('\nLast review is from 2019-07-08')

      host_name last_review
22469  Danielle  2019-07-08
21517  Danielle  2019-07-07
20403  Danielle  2019-07-06
22068  Danielle  2019-07-06
7086   Danielle  2019-07-03

Last review is from 2019-07-08
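
Sorting `last_review` as text works here because the dates are in ISO `YYYY-MM-DD` form. A sketch of a more explicit variant that parses the column with `pd.to_datetime` and takes the maximum, on made-up rows. Note that filtering on the name alone may match more than one host called Danielle.

```python
import pandas as pd

# Hypothetical rows; last_review is assumed to be missing for unreviewed listings.
sample = pd.DataFrame({
    "host_name": ["Danielle", "Danielle", "Danielle", "Ana"],
    "neighbourhood_group": ["Queens", "Queens", "Brooklyn", "Queens"],
    "last_review": ["2019-07-06", None, "2019-07-09", "2019-07-08"],
})

# Parse to real dates; missing values become NaT.
sample["last_review"] = pd.to_datetime(sample["last_review"])

mask = (sample["host_name"] == "Danielle") & (sample["neighbourhood_group"] == "Queens")

# max() skips NaT, so unreviewed listings do not affect the result.
latest_review = sample.loc[mask, "last_review"].max()

print(f"Last review is from {latest_review.date()}")
```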


## Further Questions

1. Which host has the most listings?

In [346]:
mostListings = air_bnb.groupby(['host_id','host_name']).count()
mostListings = mostListings.sort_values('id', ascending = False)['id'].head(1)
mostListings




host_id    host_name   
219517861  Sonder (NYC)    327
Name: id, dtype: int64

2. How many listings have completely open availability?

In [349]:
completelyOpen = air_bnb[['id','availability_365']]
completelyOpen[completelyOpen['availability_365'] == 365].count()


id                  1295
availability_365    1295
dtype: int64

3. What room_types have the highest review numbers?

In [357]:
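reviewsByRoom = air_bnb.groupby('room_type')['number_of_reviews'].sum()
print(reviewsByRoom)
# Sort first: head(1) on the unsorted result only returns the alphabetically first room type.
reviewsByRoom.sort_values(ascending=False).head(1)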


room_type
Entire home/apt    580403
Private room       538346
Shared room         19256
Name: number_of_reviews, dtype: int64


room_type
Entire home/apt    580403
Name: number_of_reviews, dtype: int64

# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered some more details that were not asked above, please describe them here.

-- Add your conclusion --