# AirBnB NY Locations Data Case Study

In this final project, your task is to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest, based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment.
**Be Advised:** I will go dark for the entire assignment period. Any questions you have about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild. 

Happy Coding!

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [3]:
air_bnb = pd.read_csv('./AB_NYC_2019.csv')
air_bnb.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [6]:
# How many neighborhood groups are available and which shows up the most?
hoods = air_bnb[['id','neighbourhood_group']].groupby('neighbourhood_group').count().sort_values('id',ascending = False)
hoods 
# 5 neighborhood groups are available, Manhattan shows up the most. 

Unnamed: 0_level_0,id
neighbourhood_group,Unnamed: 1_level_1
Manhattan,21661
Brooklyn,20104
Queens,5666
Bronx,1091
Staten Island,373


In [14]:
# Are private rooms the most popular in Manhattan?
manhattan = air_bnb[air_bnb['neighbourhood_group'] == 'Manhattan']
manhattan = manhattan[['neighbourhood_group','room_type']].groupby('room_type').count().sort_values('room_type', ascending = False)
manhattan

# No. Entire home/apt is the most popular room type in Manhattan (13199 listings vs. 7982 private rooms).

Unnamed: 0_level_0,neighbourhood_group
room_type,Unnamed: 1_level_1
Shared room,480
Private room,7982
Entire home/apt,13199
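
The counts above can also be expressed as shares, which makes the comparison between room types easier to read. The following is a minimal sketch, not the notebook's original method: the helper name `room_type_share` and the toy DataFrame are assumptions made for illustration, and only the column names come from the dataset.

```python
import pandas as pd

def room_type_share(df: pd.DataFrame, group: str) -> pd.Series:
    """Return each room type's share of listings in one neighbourhood group."""
    in_group = df[df["neighbourhood_group"] == group]
    # value_counts sorts by frequency, largest first, so the most
    # popular room type ends up in the first row.
    return in_group["room_type"].value_counts(normalize=True)

# Toy data for illustration only (not taken from AB_NYC_2019.csv)
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Private room"],
})
shares = room_type_share(toy, "Manhattan")
print(shares)
```

On the real data this would be called as `room_type_share(air_bnb, 'Manhattan')`.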


In [44]:
# Which hosts are the busiest, based on their reviews?
busiest = air_bnb[['host_id','number_of_reviews','availability_365']].groupby('host_id').sum().sort_values('number_of_reviews', ascending = False)
busiest
#tophost = air_bnb[air_bnb['host_id'] == 37312959]

# host_id 37312959 has the highest number of reviews (2273)

Unnamed: 0_level_0,number_of_reviews,availability_365
host_id,Unnamed: 1_level_1,Unnamed: 2_level_1
37312959,2273,824
344035,2205,3723
26432133,2017,1443
35524316,1971,2559
40176101,1818,342
...,...,...
140338526,0,0
24508767,0,0
140323391,0,0
140312311,0,0


In [46]:
# Which neighborhood group has the highest average price?
avgPriceByHood = air_bnb[['neighbourhood_group','price']].groupby('neighbourhood_group').mean().sort_values('price',ascending = False).round(decimals=2)
avgPriceByHood

# Manhattan has highest average price - $196.88

Unnamed: 0_level_0,price
neighbourhood_group,Unnamed: 1_level_1
Manhattan,196.88
Brooklyn,124.38
Staten Island,114.81
Queens,99.52
Bronx,87.5


In [54]:
# Which neighborhood group has the highest total price?
highestPrice = air_bnb[['neighbourhood_group','price']].groupby('neighbourhood_group').sum().sort_values('price',ascending = False)
highestPrice

# Manhattan has the highest total price 

Unnamed: 0_level_0,price
neighbourhood_group,Unnamed: 1_level_1
Manhattan,4264527
Brooklyn,2500600
Queens,563867
Bronx,95459
Staten Island,42825


In [64]:
#Which top 5 hosts have the highest total price?
highestPricedHosts = air_bnb[['host_id','price']].groupby('host_id').sum().sort_values('price', ascending = False)
highestPricedHosts.head(5)

Unnamed: 0_level_0,price
host_id,Unnamed: 1_level_1
219517861,82795
107434423,70331
156158778,37097
205031545,35294
30283594,33581


In [75]:
# Who currently has no (zero) availability with a review count of 100 or more?
# Hosts whose listings have zero total availability over the next 365 days
notAvailable = air_bnb[['host_id','availability_365']].groupby('host_id').sum()
notAvailable = notAvailable[notAvailable['availability_365'] == 0]

# Hosts with at least 100 reviews summed across all of their listings
highlyReviewed = air_bnb[['host_id','number_of_reviews']].groupby('host_id').sum()
highlyReviewed = highlyReviewed[highlyReviewed['number_of_reviews'] >= 100]

# Keep only the hosts that satisfy both conditions
highlyReviewedNoAvail = highlyReviewed.merge(notAvailable, on = 'host_id', how = 'inner')
highlyReviewedNoAvail = highlyReviewedNoAvail.sort_values('number_of_reviews', ascending = False)
highlyReviewedNoAvail


Unnamed: 0_level_0,number_of_reviews,availability_365
host_id,Unnamed: 1_level_1,Unnamed: 2_level_1
22959695,1157,0
99392252,732,0
121391142,693,0
792159,480,0
37818581,479,0
...,...,...
21090508,100,0
140293912,100,0
96148809,100,0
22423049,100,0
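
The result above is aggregated per host, so a host with several listings only appears if every one of them has zero availability. The question can also be read per listing. The sketch below shows that alternative reading; the helper name `booked_out_listings` and the toy rows are assumptions for illustration, while the column names come from the dataset.

```python
import pandas as pd

def booked_out_listings(df: pd.DataFrame, min_reviews: int = 100) -> pd.DataFrame:
    """Listings with zero availability and at least `min_reviews` reviews."""
    mask = (df["availability_365"] == 0) & (df["number_of_reviews"] >= min_reviews)
    cols = ["host_id", "host_name", "name", "number_of_reviews"]
    return df.loc[mask, cols].sort_values("number_of_reviews", ascending=False)

# Toy data for illustration only (not taken from AB_NYC_2019.csv)
toy = pd.DataFrame({
    "host_id": [1, 2, 3, 1],
    "host_name": ["A", "B", "C", "A"],
    "name": ["a", "b", "c", "d"],
    "number_of_reviews": [150, 99, 300, 100],
    "availability_365": [0, 0, 10, 0],
})
result = booked_out_listings(toy)
print(result)
```

Neither reading is obviously the intended one. The per-host version answers "which hosts are fully booked", and the per-listing version answers "which listings are fully booked".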


In [115]:
# What host has the highest total of prices and where are they located?
richHost = air_bnb[['host_id','price']].groupby('host_id').sum().sort_values('price', ascending = False).head(1)
richHost

richestHost = air_bnb.merge(richHost, on = 'host_id', how = 'inner')
richestHost = richestHost[['host_id', 'neighbourhood_group','id']].groupby(["host_id",'neighbourhood_group']).count()
richestHost

# host_id 219517861 has the highest total price ($82795) and is located in Manhattan. 

Unnamed: 0_level_0,Unnamed: 1_level_0,id
host_id,neighbourhood_group,Unnamed: 2_level_1
219517861,Manhattan,327


In [98]:
# When did Danielle from Queens last receive a review?
lastReview = air_bnb[air_bnb['host_name'] == 'Danielle']
lastReview = lastReview[lastReview['neighbourhood_group'] == 'Queens']

lastReview = lastReview.sort_values('last_review', ascending = False).head(1)
lastReview[['last_review']]

# Danielle from Queens received their last review on 2019-07-08


Unnamed: 0,last_review
22469,2019-07-08
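
Sorting `last_review` as text works here because the dates are stored in `YYYY-MM-DD` form, but parsing them first is a bit more robust. The following is a small sketch under that assumption; the helper name `last_review_for` and the toy rows are hypothetical. Like the cell above, it treats every listing with the same `host_name` in the same neighbourhood group as one host, which may merge several different people.

```python
import pandas as pd

def last_review_for(df: pd.DataFrame, host_name: str, group: str) -> pd.Timestamp:
    """Most recent review date for a host name within one neighbourhood group."""
    mask = (df["host_name"] == host_name) & (df["neighbourhood_group"] == group)
    # Listings without reviews have a missing last_review; coerce them to NaT
    dates = pd.to_datetime(df.loc[mask, "last_review"], errors="coerce")
    # max() skips NaT and returns NaT only if no date is available at all
    return dates.max()

# Toy data for illustration only (not taken from AB_NYC_2019.csv)
toy = pd.DataFrame({
    "host_name": ["Danielle", "Danielle", "Danielle", "Maria"],
    "neighbourhood_group": ["Queens", "Queens", "Brooklyn", "Queens"],
    "last_review": ["2019-01-05", None, "2019-06-30", "2019-07-01"],
})
latest = last_review_for(toy, "Danielle", "Queens")
print(latest)
```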


## Further Questions

1. Which host has the most listings?

In [109]:
mostListings = air_bnb[['id','host_id']].groupby('host_id').count().sort_values('id', ascending = False)
mostListings
hostWithTheMost = air_bnb[air_bnb['host_id'] == 219517861]
hostWithTheMost[['host_name']].head(1)

# Sonder (NYC) has the most listings (327)

Unnamed: 0,host_name
38293,Sonder (NYC)
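
The cell above looks up the host by a hard-coded `host_id`. One possible way to avoid the hard-coded value is to take the ID straight from the grouped counts. This is a sketch only; the helper name `host_with_most_listings` and the toy rows are assumptions for illustration.

```python
import pandas as pd

def host_with_most_listings(df: pd.DataFrame) -> tuple:
    """Return (host_id, host_name, listing_count) for the host with the most listings."""
    counts = df.groupby("host_id")["id"].count()
    top_id = counts.idxmax()  # host_id with the largest listing count
    # Take the name from the first listing that belongs to that host
    top_name = df.loc[df["host_id"] == top_id, "host_name"].iloc[0]
    return int(top_id), top_name, int(counts.loc[top_id])

# Toy data for illustration only (not taken from AB_NYC_2019.csv)
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "host_id": [10, 20, 20, 20],
    "host_name": ["A", "B", "B", "B"],
})
top = host_with_most_listings(toy)
print(top)
```

If two hosts tie, `idxmax` returns the first one it encounters, so ties would need separate handling.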


2. How many listings have completely open availability?

In [113]:
openAvail = air_bnb.loc[air_bnb['availability_365'] == 365, ['id','availability_365']].count()
openAvail

# 1295 listings are completely available.

id                  1295
availability_365    1295
dtype: int64

3. What room_types have the highest review numbers?

In [116]:
reviewsByRoomType = air_bnb[['room_type', 'number_of_reviews']].groupby('room_type').sum().sort_values('number_of_reviews',ascending = False)
reviewsByRoomType


Unnamed: 0_level_0,number_of_reviews
room_type,Unnamed: 1_level_1
Entire home/apt,580403
Private room,538346
Shared room,19256
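
Total reviews depend on how many listings each room type has, so a per-listing average can give a different ranking than the totals above. The sketch below puts both side by side; the helper name `reviews_by_room_type`, the output column names, and the toy rows are assumptions for illustration.

```python
import pandas as pd

def reviews_by_room_type(df: pd.DataFrame) -> pd.DataFrame:
    """Total reviews, listing count, and mean reviews per listing by room type."""
    summary = df.groupby("room_type")["number_of_reviews"].agg(
        total_reviews="sum",
        listings="count",
        mean_reviews="mean",
    )
    return summary.sort_values("mean_reviews", ascending=False)

# Toy data for illustration only (not taken from AB_NYC_2019.csv)
toy = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room"],
    "number_of_reviews": [10, 20, 40],
})
summary = reviews_by_room_type(toy)
print(summary)
```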


# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered any details that were not asked about above, please describe them here.

-- Add your conclusion --

In [None]:
# The most expensive neighborhood group to rent in is Manhattan, then Brooklyn. Entire home/apts are the most expensive type of rental.
# From my analysis, I think it's ridiculous that one host can have 327 listings within NY on AirBnB, which lowkey explains why the city is so expensive.


