# AirBnB NY Locations Data Case Study

In this final project, your task will be to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest and best based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment. 
**Be advised:** I will go dark for the entire assignment period. Any questions you would like to ask about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild. 

Happy Coding!

In [177]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
air_bnb = pd.read_csv('AB_NYC_2019.csv')
air_bnb.head()  # not displayed: a notebook cell only echoes its last expression
air_bnb.keys()  # column names of the DataFrame

Index(['id', 'name', 'host_id', 'host_name', 'neighbourhood_group',
       'neighbourhood', 'latitude', 'longitude', 'room_type', 'price',
       'minimum_nights', 'number_of_reviews', 'last_review',
       'reviews_per_month', 'calculated_host_listings_count',
       'availability_365'],
      dtype='object')

In [178]:
# 1. Which hosts are the busiest and why?
# Select the column before aggregating so text columns are not aggregated too
# (pandas 2.x no longer silently drops non-numeric columns in groupby sums).
print('\nSum of reviews per month, indicating how busy each host is.')
print(air_bnb.groupby(['host_id', 'host_name'])[['reviews_per_month']].sum().sort_values('reviews_per_month', ascending=False).head(5))

print('\nCount of properties.')
print(air_bnb.groupby(['host_id', 'host_name']).count().sort_values('id', ascending=False).head(5)[['name']])
print('\nSonder (NYC) is the busiest because it has the most properties for rent.\n')


Sum of reviews per month, indicating how busy each host is.
                        reviews_per_month
host_id   host_name                      
219517861 Sonder (NYC)             397.56
244361589 Row NYC                  111.72
232251881 Lakshmee                  80.63
26432133  Danielle                  68.02
137274917 David                     62.89

Count of properties.
                          name
host_id   host_name           
219517861 Sonder (NYC)     327
107434423 Blueground       232
30283594  Kara             121
137358866 Kazuya           103
16098958  Jeremy & Laura    96

Sonder (NYC) is the busiest because it has the most properties for rent.
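Both "busy" measures can be computed in one pass with pandas named aggregation, which makes it easy to see whether the host with the most reviews per month is also the host with the most listings. A minimal sketch on a small, made-up DataFrame (the rows are hypothetical, not taken from AB_NYC_2019.csv) that reuses the dataset's column names:

```python
import pandas as pd

# Hypothetical sample rows that mimic the AB_NYC_2019.csv columns used above
listings = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'host_id': [10, 10, 10, 20, 20],
    'host_name': ['Host A', 'Host A', 'Host A', 'Host B', 'Host B'],
    'reviews_per_month': [1.5, 0.5, None, 4.0, 3.0],
})

# One row per host: number of listings and total reviews per month (NaN is skipped)
busy = (
    listings.groupby(['host_id', 'host_name'])
    .agg(listings=('id', 'count'), reviews_per_month=('reviews_per_month', 'sum'))
    .sort_values('reviews_per_month', ascending=False)
)
print(busy)
```

In this toy data Host B has fewer listings but more reviews per month, which is exactly the case where the two measures of "busy" disagree.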



In [179]:
# 2. How many neighborhood groups are available and which shows up the most?

print(air_bnb.groupby('neighbourhood_group').count().sort_values('neighbourhood', ascending=False)[['neighbourhood']])
print('\nFive neighbourhood groups, with Manhattan showing up the most.')

                     neighbourhood
neighbourhood_group               
Manhattan                    21661
Brooklyn                     20104
Queens                        5666
Bronx                         1091
Staten Island                  373

Five neighbourhood groups, with Manhattan showing up the most.
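`nunique()` answers the "how many" part directly, and `value_counts()` returns the per-group counts already sorted in descending order, so neither needs a `groupby`. A short sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical sample of the 'neighbourhood_group' column
groups = pd.Series(
    ['Manhattan', 'Brooklyn', 'Manhattan', 'Queens', 'Manhattan', 'Brooklyn'],
    name='neighbourhood_group',
)

n_groups = groups.nunique()      # number of distinct neighbourhood groups
counts = groups.value_counts()   # listings per group, most frequent first
most_common = counts.idxmax()    # label of the most frequent group

print(n_groups, most_common)
print(counts)
```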


In [180]:
# 3. Are private rooms the most popular in Manhattan?
# Interpreting this as the most popular to rent out, i.e. the count of private rooms available

print(air_bnb.groupby(['neighbourhood_group', 'room_type']).count()[['name']])
print('\nNo. There are 7,979 private rooms but 13,193 entire homes/apartments for rent in Manhattan.\n')

                                      name
neighbourhood_group room_type             
Bronx               Entire home/apt    379
                    Private room       652
                    Shared room         59
Brooklyn            Entire home/apt   9558
                    Private room     10127
                    Shared room        413
Manhattan           Entire home/apt  13193
                    Private room      7979
                    Shared room        480
Queens              Entire home/apt   2096
                    Private room      3372
                    Shared room        198
Staten Island       Entire home/apt    176
                    Private room       188
                    Shared room          9

No. There are 7,979 private rooms but 13,193 entire homes/apartments for rent in Manhattan.
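Raw counts answer the question here, but proportions make the comparison independent of how many listings each borough has. `pd.crosstab` with `normalize='index'` gives the share of each room type within each borough; a sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical listings: borough and room type only
rooms = pd.DataFrame({
    'neighbourhood_group': ['Manhattan'] * 5 + ['Brooklyn'] * 3,
    'room_type': ['Entire home/apt', 'Entire home/apt', 'Entire home/apt',
                  'Private room', 'Shared room',
                  'Private room', 'Private room', 'Entire home/apt'],
})

# Share of each room type within each borough (each row sums to 1.0)
shares = pd.crosstab(rooms['neighbourhood_group'], rooms['room_type'], normalize='index')
print(shares.round(2))

# Most common room type in Manhattan
print(shares.loc['Manhattan'].idxmax())
```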



In [181]:
# 4. Which hosts are the busiest and best based on their reviews?
print(air_bnb.groupby(['host_id', 'host_name', 'neighbourhood_group'])[['reviews_per_month']].sum().sort_values('reviews_per_month', ascending=False).head(5))
print('\nSonder (NYC) is the busiest, getting almost 400 reviews per month.\n')


                                            reviews_per_month
host_id   host_name    neighbourhood_group                   
219517861 Sonder (NYC) Manhattan                       397.56
244361589 Row NYC      Manhattan                       111.72
232251881 Lakshmee     Queens                           80.63
26432133  Danielle     Queens                           68.02
137274917 David        Manhattan                        62.89

Sonder (NYC) is the busiest, getting almost 400 reviews per month.
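Two details affect this ranking. `reviews_per_month` is typically empty (NaN) in this dataset for listings that have never been reviewed, and a host's sum grows with the number of listings they own. A sketch on hypothetical data that fills the gaps with 0 and reports both the total and the per-listing average, so a large host and a small but heavily reviewed host can be told apart:

```python
import pandas as pd

# Hypothetical rows; NaN stands in for a listing that has never been reviewed
listings = pd.DataFrame({
    'host_id': [10, 10, 10, 20],
    'host_name': ['Host A', 'Host A', 'Host A', 'Host B'],
    'reviews_per_month': [2.0, 1.0, float('nan'), 2.5],
})

# Treat "never reviewed" as zero reviews per month
listings['reviews_per_month'] = listings['reviews_per_month'].fillna(0)

per_host = listings.groupby(['host_id', 'host_name'])['reviews_per_month'].agg(['sum', 'mean'])
print(per_host.sort_values('sum', ascending=False))   # busiest overall
print(per_host.sort_values('mean', ascending=False))  # most reviewed per listing
```

Host A leads on the total while Host B leads per listing; which column answers "best" is a judgement call.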



In [182]:
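# 5. Which neighborhood group has the highest average price?
# Assumes 'price' covers the minimum stay, so dividing by 'minimum_nights' gives a nightly rate.
# If 'price' is already a nightly rate, average air_bnb['price'] directly instead.
air_bnb['price_per_night'] = air_bnb['price'] / air_bnb['minimum_nights']
print(air_bnb.groupby('neighbourhood_group')[['price_per_night']].mean().sort_values('price_per_night', ascending=False))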
print("\nManhattan has the highest average price, at almost $90 per night.")

                     price_per_night
neighbourhood_group                 
Manhattan                  86.945981
Staten Island              65.941963
Brooklyn                   57.428778
Queens                     55.307232
Bronx                      50.703610

Manhattan has the highest average price, at almost $90 per night.
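The figures above rest on the assumption that `price` covers the whole minimum stay. If `price` is instead a per-night rate (worth checking before relying on either result), the average should be taken over `price` itself. A sketch on hypothetical rows that computes both readings side by side:

```python
import pandas as pd

# Hypothetical listings; prices and minimum stays are made up
listings = pd.DataFrame({
    'neighbourhood_group': ['Manhattan', 'Manhattan', 'Queens', 'Queens'],
    'price': [200, 100, 80, 60],
    'minimum_nights': [2, 1, 1, 3],
})

averages = pd.DataFrame({
    # Reading 1: 'price' is already a nightly rate
    'mean_price': listings.groupby('neighbourhood_group')['price'].mean(),
    # Reading 2: 'price' covers the minimum stay (the approach used above)
    'mean_price_per_night': (listings['price'] / listings['minimum_nights'])
        .groupby(listings['neighbourhood_group']).mean(),
})
print(averages.sort_values('mean_price', ascending=False))
```

The ranking may agree under both readings while the dollar amounts differ a lot, so the answer's "$90 per night" depends on which reading is correct.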


In [183]:
# 6. Which neighborhood group has the highest total price?
print(air_bnb.groupby('neighbourhood_group')[['price']].sum().sort_values('price', ascending=False))
print('\nManhattan, with Total price of $4,264,527.\n')

                       price
neighbourhood_group         
Manhattan            4264527
Brooklyn             2500600
Queens                563867
Bronx                  95459
Staten Island          42825

Manhattan, with Total price of $4,264,527.
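Because Manhattan and Brooklyn also have the most listings, a total partly reflects listing volume. Expressing each group's total as a share of the overall sum makes that concentration explicit; a sketch on hypothetical prices:

```python
import pandas as pd

# Hypothetical listings
listings = pd.DataFrame({
    'neighbourhood_group': ['Manhattan', 'Manhattan', 'Brooklyn', 'Bronx'],
    'price': [300, 200, 150, 50],
})

totals = listings.groupby('neighbourhood_group')['price'].sum().sort_values(ascending=False)
shares = (totals / totals.sum() * 100).round(1)  # percent of the overall total
print(pd.DataFrame({'total_price': totals, 'share_pct': shares}))
```

Using the totals printed above, Manhattan's $4,264,527 is about 57% of the $7,467,278 summed across all five groups.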



In [184]:
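# 7. Which top 5 hosts have the highest total price?
# Note: grouping by neighbourhood as well splits a host's listings across rows, so this ranks
# host/neighbourhood pairs; Sonder (NYC)'s combined Manhattan total is 82,795 (see question 9).
air_bnb.groupby(['host_id', 'host_name', 'neighbourhood_group', 'neighbourhood'])[['price']].sum().sort_values('price', ascending=False).head(5)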

                                                             price
host_id   host_name    neighbourhood_group neighbourhood            
219517861 Sonder (NYC) Manhattan           Financial District  57738
205031545 Red Awning   Manhattan           Midtown             35294
3750764   Kevin        Manhattan           Chelsea             18780
836168    Henry        Manhattan           Upper West Side     15000
1177497   Jessica      Brooklyn            Clinton Hill        14850
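To rank hosts rather than host/neighbourhood pairs, group by the host columns only and take the five largest totals. A minimal sketch on hypothetical rows, where one host's listings are spread over two neighbourhoods:

```python
import pandas as pd

# Hypothetical listings; Host A owns listings in two neighbourhoods
listings = pd.DataFrame({
    'host_id': [1, 1, 2, 3],
    'host_name': ['Host A', 'Host A', 'Host B', 'Host C'],
    'neighbourhood': ['Midtown', 'Chelsea', 'Midtown', 'Harlem'],
    'price': [100, 150, 200, 120],
})

# Sum every listing a host owns, regardless of neighbourhood
host_totals = listings.groupby(['host_id', 'host_name'])['price'].sum()
top_hosts = host_totals.nlargest(5)
print(top_hosts)
```

Grouped by neighbourhood as well, Host B's single 200 listing would outrank both of Host A's rows, even though Host A's listings total 250.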


In [185]:
# 8. Who currently has no (zero) availability with a review count of 100 or more?
# Each row is a listing; reset_index() keeps the original row label in the 'index' column.
air_bnb[(air_bnb['number_of_reviews'] >= 100) & (air_bnb['availability_365'] == 0)][['host_id', 'host_name', 'availability_365', 'number_of_reviews']].sort_values('number_of_reviews', ascending=False).reset_index()

     index    host_id       host_name  availability_365  number_of_reviews
0      471     792159           Wanda                 0                480
1     9974   22959695  Gurpreet Singh                 0                424
2     9976   22959695  Gurpreet Singh                 0                408
3    22104  121391142         Deloris                 0                368
4     5876   21641206        Veronica                 0                351
..     ...        ...             ...               ...                ...
157  19377   26073602            Anna                 0                101
158  12375   22423049         Abraham                 0                100
159  16190   42399786         Braydon                 0                100
160  19459   96148809         Raymond                 0                100

[161 rows x 5 columns]
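The table lists listings, not hosts, so a host with several qualifying listings (Gurpreet Singh appears twice) is counted more than once. Collapsing to one row per host answers "who" directly; a sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical listings; Host B has two qualifying listings, Host C has too few reviews
listings = pd.DataFrame({
    'host_id': [1, 2, 2, 3],
    'host_name': ['Host A', 'Host B', 'Host B', 'Host C'],
    'number_of_reviews': [150, 420, 110, 90],
    'availability_365': [0, 0, 0, 0],
})

fully_booked = listings[(listings['number_of_reviews'] >= 100) & (listings['availability_365'] == 0)]

# One row per host: how many qualifying listings and their combined reviews
hosts = (
    fully_booked.groupby(['host_id', 'host_name'])
    .agg(listings=('number_of_reviews', 'count'), total_reviews=('number_of_reviews', 'sum'))
    .sort_values('total_reviews', ascending=False)
)
print(hosts)
print(len(hosts), 'distinct hosts')
```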


In [186]:
# 9. What host has the highest total of prices and where are they located?
air_bnb.groupby(['host_id', 'host_name', 'neighbourhood_group'])[['price']].sum().sort_values('price', ascending=False).head(1)


                                            price
host_id   host_name    neighbourhood_group       
219517861 Sonder (NYC) Manhattan            82795
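Grouping by `neighbourhood_group` answers "where" cleanly only if a host operates in a single borough; a host with listings in two boroughs would be split across two rows and could lose the top spot. Aggregating per host and listing every borough they appear in avoids that. A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical listings; Host A operates in two boroughs
listings = pd.DataFrame({
    'host_id': [1, 1, 2],
    'host_name': ['Host A', 'Host A', 'Host B'],
    'neighbourhood_group': ['Manhattan', 'Brooklyn', 'Manhattan'],
    'price': [150, 100, 200],
})

by_host = listings.groupby(['host_id', 'host_name']).agg(
    total_price=('price', 'sum'),
    # Comma-separated list of every borough the host has listings in
    locations=('neighbourhood_group', lambda s: ', '.join(sorted(s.unique()))),
)
top = by_host.sort_values('total_price', ascending=False).head(1)
print(top)
```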


In [187]:
# 10. When did Danielle from Queens last receive a review?

air_bnb[(air_bnb['host_name'] == 'Danielle') & (air_bnb['neighbourhood_group'] == 'Queens')].sort_values('last_review', ascending = False)[['host_id','host_name','neighbourhood_group','last_review']]

         host_id host_name neighbourhood_group last_review
22469   26432133  Danielle              Queens  2019-07-08
21517   26432133  Danielle              Queens  2019-07-07
20403   26432133  Danielle              Queens  2019-07-06
22068   26432133  Danielle              Queens  2019-07-06
7086    26432133  Danielle              Queens  2019-07-03
33861  201647469  Danielle              Queens  2019-06-20
27021  154256662  Danielle              Queens  2018-01-02
16349   18051286  Danielle              Queens         NaN
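Sorting `last_review` as text works only because the dates are stored as ISO strings (YYYY-MM-DD). Parsing them with `pd.to_datetime` makes the comparison explicit and gives the latest review per host, which also makes it clear that the Danielles are different hosts. A sketch with hypothetical rows:

```python
import pandas as pd

# Hypothetical rows; None stands in for a listing with no review yet
reviews = pd.DataFrame({
    'host_id': [101, 101, 202, 303],
    'host_name': ['Danielle'] * 4,
    'last_review': ['2019-07-08', '2019-07-03', '2018-01-02', None],
})

# Missing values (None) become NaT, which max() skips
reviews['last_review'] = pd.to_datetime(reviews['last_review'])

latest_per_host = reviews.groupby('host_id')['last_review'].max()
print(latest_per_host)
print('Most recent overall:', latest_per_host.max().date())
```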


## Further Questions

1. Which host has the most listings?

In [188]:
air_bnb.groupby(['host_id','host_name']).count().sort_values('id', ascending = False).head(1)[['name']]

                        name
host_id   host_name         
219517861 Sonder (NYC)   327
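`count()` skips missing values, so the result can depend on which column is counted: the cell sorts by `id` but displays `name`, and some listings have no name (the Manhattan room-type counts in question 3 add up to 21,652 named listings, nine fewer than the 21,661 listings in question 2). `size()` counts rows regardless of missing values. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical listings; one listing has no name
listings = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'host_id': [10, 10, 10, 20],
    'name': ['Loft', None, 'Studio', 'Room'],
})

per_host_count = listings.groupby('host_id')['name'].count()  # skips the missing name
per_host_size = listings.groupby('host_id').size()            # counts every row
print(per_host_count.to_dict(), per_host_size.to_dict())
print('Most listings:', per_host_size.idxmax())
```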


2. How many listings have completely open availability?

In [189]:
air_bnb[air_bnb['availability_365'] == 365].count()[['id']]

id    1295
dtype: int64
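A boolean mask can be summed directly (each `True` counts as 1), which avoids depending on a particular column's non-missing values; its mean gives the share. A sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical availability values
availability = pd.Series([365, 0, 365, 120, 364], name='availability_365')

fully_open = (availability == 365).sum()         # listings open every day of the year
share_open = (availability == 365).mean() * 100  # same thing as a percentage
print(fully_open, round(share_open, 1))
```

With the counts shown above, 1,295 of the 48,895 listings (the total of the neighbourhood-group counts in question 2) is about 2.6%.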

3. What room_types have the highest review numbers?

In [190]:
air_bnb.groupby('room_type')[['number_of_reviews']].mean().round().sort_values('number_of_reviews', ascending=False)

                 number_of_reviews
room_type                         
Private room                  24.0
Entire home/apt               23.0
Shared room                   17.0
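Review counts tend to be skewed, with a few listings collecting hundreds of reviews (question 8 shows several above 400), so a mean can be pulled up by a handful of listings. Reporting the median next to the mean is a quick check; a sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical listings; one private room has an unusually high review count
listings = pd.DataFrame({
    'room_type': ['Private room'] * 4 + ['Shared room'] * 3,
    'number_of_reviews': [0, 2, 3, 400, 5, 6, 7],
})

summary = (
    listings.groupby('room_type')['number_of_reviews']
    .agg(['mean', 'median', 'sum'])
    .sort_values('mean', ascending=False)
)
print(summary)
```

Here private rooms win on the mean, but shared rooms win on the median, so the choice of statistic changes the answer.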


# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered some more details that were not asked above, please describe them here.

-- Add your conclusion --

In [192]:
print('\n1. Which hosts are the busiest and why?\n')
print('\nSum of reviews per month, indicating how busy each host is.')
print(air_bnb.groupby(['host_id', 'host_name'])[['reviews_per_month']].sum().sort_values('reviews_per_month', ascending=False).head(5))

print('\nCount of properties.')
print(air_bnb.groupby(['host_id', 'host_name']).count().sort_values('id', ascending=False).head(5)[['name']])
print('\nSonder (NYC) is the busiest because it has the most properties for rent.\n')

print('\n2. How many neighborhood groups are available and which shows up the most?\n')
print(air_bnb.groupby('neighbourhood_group').count().sort_values('neighbourhood', ascending=False)[['neighbourhood']])
print('\nFive neighbourhood groups, with Manhattan showing up the most.\n')

print('\n3. Are private rooms the most popular in Manhattan?\n')
print(air_bnb.groupby(['neighbourhood_group', 'room_type']).count()[['name']])
print('\nNo. There are 7,979 private rooms but 13,193 entire homes/apartments for rent in Manhattan.\n')

print('\n4. Which hosts are the busiest and best based on their reviews?\n')
print(air_bnb.groupby(['host_id', 'host_name', 'neighbourhood_group'])[['reviews_per_month']].sum().sort_values('reviews_per_month', ascending=False).head(5))
print('\nSonder (NYC) is the busiest, getting almost 400 reviews per month.\n')

print('\n5. Which neighborhood group has the highest average price?\n')
print(air_bnb.groupby('neighbourhood_group')[['price_per_night']].mean().round(decimals=2).sort_values('price_per_night', ascending=False))
print("\nManhattan has the highest average price, at almost $90 per night.")

print('\n6. Which neighborhood group has the highest total price?\n')
print(air_bnb.groupby('neighbourhood_group')[['price']].sum().sort_values('price', ascending=False))
print('\nManhattan, with Total price of $4,264,527.\n')

print('\n7. Which top 5 hosts have the highest total price?\n')
print(air_bnb.groupby(['host_id', 'host_name', 'neighbourhood_group', 'neighbourhood'])[['price']].sum().sort_values('price', ascending=False).head(5))

print('\n8. Who currently has no (zero) availability with a review count of 100 or more?\n')
print(air_bnb[(air_bnb['number_of_reviews'] >= 100) & (air_bnb['availability_365'] == 0)][['host_id', 'host_name', 'availability_365', 'number_of_reviews']].sort_values('number_of_reviews', ascending=False).reset_index())

print('\n9. What host has the highest total of prices and where are they located?\n')
print(air_bnb.groupby(['host_id', 'host_name', 'neighbourhood_group'])[['price']].sum().sort_values('price', ascending=False).head(1))

print('\n10. When did Danielle from Queens last receive a review?\n')
print(air_bnb[(air_bnb['host_name'] == 'Danielle') & (air_bnb['neighbourhood_group'] == 'Queens')].sort_values('last_review', ascending=False)[['host_id','host_name','neighbourhood_group','last_review']])
print('\nThere are four hosts named Danielle in Queens. The most recent review for any of them was on 7/8/2019, for host_id 26432133.\n')




1. Which hosts are the busiest and why?


Sum of reviews per month, indicating how busy each host is.
                        reviews_per_month
host_id   host_name                      
219517861 Sonder (NYC)             397.56
244361589 Row NYC                  111.72
232251881 Lakshmee                  80.63
26432133  Danielle                  68.02
137274917 David                     62.89

Count of properties.
                          name
host_id   host_name           
219517861 Sonder (NYC)     327
107434423 Blueground       232
30283594  Kara             121
137358866 Kazuya           103
16098958  Jeremy & Laura    96

Sonder (NYC) is the busiest because it has the most properties for rent.


2. How many neighborhood groups are available and which shows up the most?

                     neighbourhood
neighbourhood_group               
Manhattan                    21661
Brooklyn                     20104
Queens                        5666
Bronx                         1091
Staten 