# AirBnB NY Locations Data Case Study

In this final project, your task is to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest, based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment.
**Be advised:** I will go dark for the entire assignment period. Any questions you have about the data or the project **MUST** be asked before the time starts. Once the clock has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild.

Happy Coding!

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
air_bnb = pd.read_csv('./AB_NYC_2019.csv')
air_bnb.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [26]:
# How many neighborhood groups are available and which shows up the most?
air_bnb.neighbourhood_group.unique()
neighbourhoods = air_bnb.groupby('neighbourhood_group', as_index=False).count()[['neighbourhood_group', 'neighbourhood']]
sorted_neighbourhoods = neighbourhoods.sort_values(['neighbourhood'], ascending=False)
print(sorted_neighbourhoods)




  neighbourhood_group  neighbourhood
2           Manhattan          21661
1            Brooklyn          20104
3              Queens           5666
0               Bronx           1091
4       Staten Island            373
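
The `unique()` call in the cell above is evaluated but never displayed, because a notebook cell only echoes its last expression. A minimal sketch of one way to answer both halves of the question directly, run here on a small made-up frame (`toy` and its rows are illustrative, not taken from the dataset; only the column name matches):

```python
import pandas as pd

# Hypothetical stand-in for air_bnb; only the column name matches the dataset.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Brooklyn", "Manhattan", "Queens", "Manhattan"],
})

# Number of distinct neighbourhood groups.
n_groups = toy["neighbourhood_group"].nunique()

# Listings per group, sorted from most to least frequent by default.
group_counts = toy["neighbourhood_group"].value_counts()

print(n_groups)
print(group_counts)
```

On the real data, the same two calls on `air_bnb` should give the group count and the ranking in one step.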


In [4]:
# Are private rooms the most popular in Manhattan?
neighbourhood = air_bnb.groupby(['neighbourhood_group', 'room_type'])
print(neighbourhood[['id']].count())



                                        id
neighbourhood_group room_type             
Bronx               Entire home/apt    379
                    Private room       652
                    Shared room         60
Brooklyn            Entire home/apt   9559
                    Private room     10132
                    Shared room        413
Manhattan           Entire home/apt  13199
                    Private room      7982
                    Shared room        480
Queens              Entire home/apt   2096
                    Private room      3372
                    Shared room        198
Staten Island       Entire home/apt    176
                    Private room       188
                    Shared room          9
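
The table above covers every borough, so the Manhattan answer has to be picked out by eye. A possible alternative is to filter to Manhattan first and express each room type as a share of its listings. This is only a sketch on made-up rows (`toy` is hypothetical; the column names follow the dataset):

```python
import pandas as pd

# Hypothetical rows; column names follow the dataset.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt", "Private room"],
})

# Keep only Manhattan listings, then compute each room type's share of them.
manhattan = toy[toy["neighbourhood_group"] == "Manhattan"]
room_share = manhattan["room_type"].value_counts(normalize=True)

print(room_share)
```

Note that this treats "popular" as "most listed", which is also what the count above measures; it says nothing about bookings.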


In [5]:
# Which hosts are the busiest, based on their reviews?
# Note: count() here tallies each host's listings that have a non-null reviews_per_month.
groupby_host_id = air_bnb.groupby('host_id')
hosts = groupby_host_id[['reviews_per_month']].count()
sorted_hosts = hosts.sort_values(['reviews_per_month'], ascending=False).head()
print(sorted_hosts)

           reviews_per_month
host_id                     
219517861                207
61391963                  79
16098958                  61
137358866                 51
7503643                   49
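
The figures above are counts of listings per host that have a `reviews_per_month` value, not counts of reviews. If "busiest" is taken to mean "most reviews received overall" (an assumption, since the assignment does not define it), summing `number_of_reviews` per host is one alternative. A sketch on made-up rows:

```python
import pandas as pd

# Hypothetical rows; column names follow the dataset.
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 3, 3],
    "number_of_reviews": [10, 5, 40, 1, 2, 3],
})

# Total reviews across all of each host's listings, largest first.
total_reviews = (
    toy.groupby("host_id")["number_of_reviews"]
    .sum()
    .sort_values(ascending=False)
)

print(total_reviews.head())
```

In the toy data, host 3 has the most listings but host 2 has the most reviews, which is the distinction the two approaches draw.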


In [6]:
# Which neighborhood group has the highest average price?
average = air_bnb.groupby('neighbourhood_group')['price'].mean()
average.sort_values(ascending=False)

neighbourhood_group
Manhattan        196.875814
Brooklyn         124.383207
Staten Island    114.812332
Queens            99.517649
Bronx             87.496792
Name: price, dtype: float64

In [7]:
# Which neighborhood group has the highest total price?
print(air_bnb.groupby('neighbourhood_group')['price'].sum())

neighbourhood_group
Bronx              95459
Brooklyn         2500600
Manhattan        4264527
Queens            563867
Staten Island      42825
Name: price, dtype: int64
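
The total price of a group depends on both its average price and how many listings it has, so it can help to view the mean, sum, and listing count side by side. A minimal sketch using a single `agg` call, on made-up rows (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical rows; column names follow the dataset.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Brooklyn"],
    "price": [200, 100, 50, 60, 70],
})

# Mean, total, and listing count per group, ordered by average price.
price_stats = (
    toy.groupby("neighbourhood_group")["price"]
    .agg(["mean", "sum", "count"])
    .sort_values("mean", ascending=False)
)

print(price_stats)
```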


In [51]:
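# Which top 5 hosts have the highest total price?
# numeric_only=True restricts the sum to numeric columns (pandas 2.0+ no longer drops non-numeric columns by default).
top5 = air_bnb.groupby('host_name').sum(numeric_only=True).sort_values('price', ascending=False)
top5.head(5)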


host_name,id,host_id,latitude,longitude,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
Sonder (NYC),10885561678,71782340547,13316.25823,-24198.18856,82795,4353,1281,397.56,106929,98588
Blueground,7210036953,24924786136,9451.60418,-17166.13165,70331,7470,29,6.04,53824,58884
Michael,7430617239,22673153604,16984.89137,-30841.28573,66895,4600,11081,475.82,1043,38888
David,7943862898,28633613354,16414.34392,-29804.53757,65844,3754,8103,508.61,907,44171
Alex,5496620312,19850092761,11361.10533,-20635.86235,52563,1651,6204,443.44,475,30031
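
One caveat with grouping on `host_name`: first names such as Michael, David, and Alex are likely shared by several different hosts, so their listings get pooled into one row. Summing every numeric column also produces meaningless totals for `id`, `latitude`, and `longitude`. A hedged alternative is to group on `host_id` (carrying the name along for readability) and sum only `price`. Sketch on made-up rows:

```python
import pandas as pd

# Hypothetical rows: two different hosts happen to share the name "Michael".
toy = pd.DataFrame({
    "host_id": [1, 2, 2, 3],
    "host_name": ["Michael", "Michael", "Michael", "Sonder (NYC)"],
    "price": [100, 150, 50, 250],
})

# Total listed price per individual host, top 5.
top_hosts = (
    toy.groupby(["host_id", "host_name"])["price"]
    .sum()
    .nlargest(5)
)

print(top_hosts)
```

In the toy data, grouping by name alone would rank "Michael" first (300), while grouping by `host_id` ranks Sonder (NYC) first (250). Rows with a missing `host_name` are dropped by `groupby` by default.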


In [9]:
# Who currently has no (zero) availability with a review count of 100 or more?
no_avail = air_bnb[(air_bnb['availability_365'] == 0) & (air_bnb['number_of_reviews'] >= 100)]

no_avail.count()


id                                162
name                              162
host_id                           162
host_name                         161
neighbourhood_group               162
neighbourhood                     162
latitude                          162
longitude                         162
room_type                         162
price                             162
minimum_nights                    162
number_of_reviews                 162
last_review                       162
reviews_per_month                 162
calculated_host_listings_count    162
availability_365                  162
dtype: int64
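
The `count()` output says how many listings match, but the question asks who. One way to list the hosts behind those listings is to select the identifying columns from the filtered rows. A sketch on made-up rows (`toy` and the host names in it are hypothetical):

```python
import pandas as pd

# Hypothetical rows; column names follow the dataset.
toy = pd.DataFrame({
    "host_id": [1, 2, 3],
    "host_name": ["Ana", "Ben", "Cy"],
    "availability_365": [0, 0, 120],
    "number_of_reviews": [150, 20, 300],
})

# Same filter as above: no availability and at least 100 reviews.
mask = (toy["availability_365"] == 0) & (toy["number_of_reviews"] >= 100)

# Show who the hosts are, most-reviewed first.
booked_hosts = (
    toy.loc[mask, ["host_id", "host_name", "number_of_reviews"]]
    .sort_values("number_of_reviews", ascending=False)
)

print(booked_hosts)
```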

In [17]:
# What host has the highest total of prices and where are they located?
print(air_bnb.groupby(['host_name','neighbourhood'])[['price']].sum().nlargest(1,['price']))


                                 price
host_name    neighbourhood            
Sonder (NYC) Financial District  57738
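
Because the grouping above is on `host_name` and `neighbourhood` together, the figure shown is the largest total within a single neighbourhood, not necessarily a host's overall total. A possible two-step variant: find the host with the highest overall total first, then look up where their listings are. Sketch on made-up rows (`toy` is hypothetical, and grouping on `host_id` rather than `host_name` is my assumption):

```python
import pandas as pd

# Hypothetical rows; column names follow the dataset.
toy = pd.DataFrame({
    "host_id": [1, 1, 1, 2],
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Queens"],
    "price": [100, 120, 90, 300],
})

# Step 1: host with the highest total price across all listings.
top_host_id = toy.groupby("host_id")["price"].sum().idxmax()

# Step 2: where that host's listings are located.
locations = toy.loc[toy["host_id"] == top_host_id, "neighbourhood_group"].value_counts()

print(top_host_id)
print(locations)
```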


In [18]:
# When did Danielle from Queens last receive a review?
print(air_bnb[(air_bnb['host_name']=='Danielle') & (air_bnb['neighbourhood_group']=='Queens')]['last_review'])


7086     2019-07-03
16349           NaN
20403    2019-07-06
21517    2019-07-07
22068    2019-07-06
22469    2019-07-08
27021    2018-01-02
33861    2019-06-20
Name: last_review, dtype: object
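
The printed series lists every matching listing, so the most recent date has to be read off by eye. Parsing `last_review` as dates and taking the maximum gives a single answer. A sketch on made-up rows (`toy` is hypothetical); note that several different hosts may be named Danielle, so this assumes the question means any of them:

```python
import pandas as pd

# Hypothetical rows; column names follow the dataset.
toy = pd.DataFrame({
    "host_name": ["Danielle", "Danielle", "Danielle", "Danielle"],
    "neighbourhood_group": ["Queens", "Queens", "Queens", "Brooklyn"],
    "last_review": ["2019-07-03", None, "2019-07-08", "2019-07-09"],
})

mask = (toy["host_name"] == "Danielle") & (toy["neighbourhood_group"] == "Queens")

# Parse the strings as dates; missing values become NaT and are skipped by max().
latest = pd.to_datetime(toy.loc[mask, "last_review"]).max()

print(latest.date())
```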


## Further Questions

1. Which host has the most listings?

In [23]:
print(air_bnb.groupby('host_name')['name'].count().nlargest())

host_name
Michael         417
David           403
Sonder (NYC)    327
John            294
Alex            279
Name: name, dtype: int64


2. How many listings have completely open availability?

In [24]:
open365 = air_bnb[air_bnb['availability_365'] >= 365]
open365.id.count()

1295

3. What room_types have the highest review numbers?

In [25]:
print(air_bnb.groupby('room_type')['number_of_reviews'].sum())

room_type
Entire home/apt    580403
Private room       538346
Shared room         19256
Name: number_of_reviews, dtype: int64
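
A total favours whichever room type has the most listings. Looking at the average number of reviews per listing alongside the total is one way to separate those two effects. A minimal sketch on made-up rows (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical rows; column names follow the dataset.
toy = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt", "Entire home/apt", "Private room"],
    "number_of_reviews": [10, 20, 30, 50],
})

# Total and per-listing average of reviews for each room type.
review_stats = toy.groupby("room_type")["number_of_reviews"].agg(["sum", "mean"])

print(review_stats)
```

In the toy data, entire homes have the larger total while the private room has the larger per-listing average.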


# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered more details that were not asked about above, please describe them here.

-- Add your conclusion --

<h6> Questions </h6>
<ol start="2">
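    <li>There are 5 neighbourhood groups; Manhattan appears most often (21,661 listings).</li>
    <li>No. In Manhattan, entire homes/apartments (13,199) are more common than private rooms (7,982).</li>
    <li>Host 219517861 was the busiest.</li>
    <li>Manhattan (average price of about 196.88)</li>
    <li>Manhattan (total price of 4,264,527)</li>
    <li>Sonder (NYC), Blueground, Michael, David, Alex</li>
    <li>162 listings</li>
    <li>Sonder (NYC), in the Financial District</li>
    <li>7/8/2019</li>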
</ol>
<h6> Further Questions </h6>
<ol>
    <li>Michael (417 listings)</li>
    <li>1,295 listings</li>
    <li>Entire home/apt (580,403 reviews in total)</li>
</ol>
    