# AirBnB NY Locations Data Case Study

In this final project, your task is to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
7. Are private rooms the most popular in Manhattan?
8. Which hosts are the busiest, based on their reviews?
9. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment.
**Be advised:** I will go dark for the entire assignment period. Any questions you have about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild.

Happy Coding!

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [3]:
df = pd.read_csv('AB_NYC_2019.csv')
df.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [42]:
# How many neighborhood groups are available and which shows up the most?
nh_count = df['neighbourhood_group'].nunique()
most_common_nh = df['neighbourhood_group'].value_counts().idxmax()
print(nh_count, "total unique neighbourhood groups")
print(most_common_nh, "shows up the most")

5 total unique neighbourhood groups
Manhattan shows up the most


In [43]:
# Are private rooms the most popular in manhattan?
private_rooms = ((df['room_type'] == 'Private room') & (df['neighbourhood_group'] == 'Manhattan')).sum()
shared_rooms = ((df['room_type'] == 'Shared room') & (df['neighbourhood_group'] == 'Manhattan')).sum()
entire_rooms = ((df['room_type'] == 'Entire home/apt') & (df['neighbourhood_group'] == 'Manhattan')).sum()
if private_rooms > shared_rooms and private_rooms > entire_rooms:
    print("Private rooms are the most common in Manhattan at a count of", private_rooms)
else:
    print("Private rooms are not the largest or most common")

count = df[df['neighbourhood_group'] == 'Manhattan']
print(count['room_type'].value_counts())

Private rooms are not the largest or most common
room_type
Entire home/apt    13199
Private room        7982
Shared room          480
Name: count, dtype: int64
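
The comparison above can also be read straight off `value_counts()`, which avoids one boolean mask per room type. The following is a minimal sketch on a small made-up table (the `sample` rows are hypothetical, not taken from `AB_NYC_2019.csv`); with the real data, `sample` would be replaced by `df`.

```python
import pandas as pd

# Hypothetical sample rows that mimic two columns of the real dataset
sample = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Manhattan", "Brooklyn"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Shared room", "Private room"],
})

# Keep only Manhattan listings, then count and normalise the room types
manhattan = sample[sample["neighbourhood_group"] == "Manhattan"]
room_counts = manhattan["room_type"].value_counts()
room_share = manhattan["room_type"].value_counts(normalize=True)

# The most common room type is the label with the largest count
most_common_room = room_counts.idxmax()
print(most_common_room, "is the most common room type in the sample")
print(room_share.round(2))
```

`normalize=True` reports each room type as a fraction of the Manhattan listings, which makes the gap between room types easier to compare than raw counts.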


In [66]:
# Which hosts are the busiest, based on their reviews?
# idxmax() returns an index label, so look the host up with .loc rather than .iloc
busiest_idx = df['reviews_per_month'].idxmax()
print(df.loc[busiest_idx, 'host_name'], "has the highest reviews_per_month")

Row NYC has the highest reviews_per_month
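
The cell above picks the single listing with the highest `reviews_per_month`, which is one reading of "busiest". Another reading is the host whose listings collect the most reviews overall. A minimal sketch of that alternative, on made-up rows (the `sample` values are hypothetical), groups by `host_id` so that different hosts who share a name stay separate:

```python
import pandas as pd

# Hypothetical sample rows; two different hosts are both named "Michael"
sample = pd.DataFrame({
    "host_id": [1, 1, 2, 3],
    "host_name": ["Michael", "Michael", "Michael", "Dana"],
    "number_of_reviews": [10, 5, 12, 3],
})

# Sum reviews across each host's listings and keep the name for display
host_reviews = (
    sample.groupby("host_id")
    .agg(host_name=("host_name", "first"), total_reviews=("number_of_reviews", "sum"))
    .sort_values("total_reviews", ascending=False)
)

# The first row after sorting is the host with the most reviews in total
top_host = host_reviews.iloc[0]
print(top_host["host_name"], "has", top_host["total_reviews"], "reviews in total")
```

Which definition is appropriate depends on how "busiest" is meant; the two can name different hosts.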


In [44]:
# Which neighborhood group has the highest average price?
groups_average = df.groupby('neighbourhood_group')['price'].mean()
print(groups_average)
print(groups_average.idxmax(), "has the highest average price")

neighbourhood_group
Bronx             87.496792
Brooklyn         124.383207
Manhattan        196.875814
Queens            99.517649
Staten Island    114.812332
Name: price, dtype: float64
Manhattan has the highest average price


In [45]:
# Which neighborhood group has the highest total price?
groups_total = df.groupby('neighbourhood_group')['price'].sum()
print(groups_total)
print(groups_total.idxmax(), "has the highest total price")

neighbourhood_group
Bronx              95459
Brooklyn         2500600
Manhattan        4264527
Queens            563867
Staten Island      42825
Name: price, dtype: int64
Manhattan has the highest total price


In [47]:
# Which top 5 hosts have the highest total price?
# Note: grouping by host_name merges different hosts who share the same name
host_totals = df.groupby('host_name')['price'].sum()
print(host_totals.nlargest(5), "are the top 5")

host_name
Sonder (NYC)    82795
Blueground      70331
Michael         66895
David           65844
Alex            52563
Name: price, dtype: int64 are the top 5
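
Names such as Michael, David, and Alex in this ranking are likely shared by many unrelated hosts, so a total keyed on `host_name` may combine several people. The sketch below uses made-up rows (all `sample` values are hypothetical) to show how the ranking can change when the grouping key is `host_id` instead:

```python
import pandas as pd

# Hypothetical sample: two separate hosts named "Michael", one host with two listings
sample = pd.DataFrame({
    "host_id": [10, 11, 12, 12],
    "host_name": ["Michael", "Michael", "Sonder (NYC)", "Sonder (NYC)"],
    "price": [100, 150, 120, 110],
})

# Grouping by name adds both Michaels together
by_name = sample.groupby("host_name")["price"].sum()

# Grouping by id keeps every host account separate
by_id = sample.groupby("host_id")["price"].sum()

print(by_name.nlargest(2))
print(by_id.nlargest(2))
```

In this sample the name-based ranking puts "Michael" first, while the id-based ranking puts host 12 first.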


In [48]:
# Who currently has no (zero) availability with a review count of 100 or more?
# idxmax() on a boolean mask returns the label of the first True row only
first_match = ((df['availability_365'] == 0) & (df['number_of_reviews'] >= 100)).idxmax()
print(df.loc[first_match, 'host_name'])

MaryEllen
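
Calling `idxmax()` on a boolean mask returns only the first row where the condition holds, so the cell above names a single host even if several qualify. A minimal sketch of listing every match, using made-up rows (the `sample` values are hypothetical):

```python
import pandas as pd

# Hypothetical sample rows with the three columns the question needs
sample = pd.DataFrame({
    "host_name": ["Ana", "Ben", "Cara", "Dev"],
    "availability_365": [0, 0, 12, 0],
    "number_of_reviews": [150, 40, 300, 101],
})

# True for listings that are fully booked and have at least 100 reviews
mask = (sample["availability_365"] == 0) & (sample["number_of_reviews"] >= 100)

# Select every matching host name, dropping repeats
booked_hosts = sample.loc[mask, "host_name"].unique().tolist()
print(booked_hosts)
```

On the real data the same mask applied to `df` would return all qualifying hosts rather than just the first.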


In [49]:
# What host has the highest total of prices and where are they located?
prices = df.groupby(['host_name','neighbourhood'])['price'].sum().reset_index()
index = prices['price'].idxmax()
print(prices.loc[index])

host_name              Sonder (NYC)
neighbourhood    Financial District
price                         57738
Name: 25263, dtype: object
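
The cell above takes the maximum over (host, neighbourhood) pairs, which is not necessarily the host with the highest total across all of their listings. One possible two-step approach is sketched below on made-up rows (the `sample` values and host names are hypothetical): first find the host with the largest total, then break that host's listings down by neighbourhood.

```python
import pandas as pd

# Hypothetical sample: host 1 has the largest total, host 2 has the largest single pair
sample = pd.DataFrame({
    "host_id": [1, 1, 1, 2],
    "host_name": ["Acme Stays", "Acme Stays", "Acme Stays", "Bo"],
    "neighbourhood": ["Midtown", "Harlem", "Midtown", "Astoria"],
    "price": [100, 90, 110, 250],
})

# Step 1: the host whose listings add up to the highest total price
top_host_id = sample.groupby("host_id")["price"].sum().idxmax()

# Step 2: where that host's listings are, ordered by total price per neighbourhood
top_rows = sample[sample["host_id"] == top_host_id]
locations = top_rows.groupby("neighbourhood")["price"].sum().sort_values(ascending=False)

print(top_rows["host_name"].iloc[0])
print(locations)
```

Here the pairwise maximum would pick "Bo" in Astoria (250), while the host-level total picks "Acme Stays" (300 across two neighbourhoods).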


In [51]:
# When did Danielle from Queens last receive a review?
sub_data = df[['host_name','neighbourhood_group','last_review']]
sub_data = sub_data[(sub_data['host_name'] == "Danielle") & (sub_data['neighbourhood_group'] == "Queens")].copy()
sub_data['last_review'] = pd.to_datetime(sub_data['last_review'])
sub_data = sub_data.sort_values(by='last_review')
print(sub_data.iloc[-1],"\n") # NaT: sort_values places listings with no review date last
print(sub_data.iloc[-2]) # Most recent actual review date, since only the last row is NaT here

host_name              Danielle
neighbourhood_group      Queens
last_review                 NaT
Name: 16349, dtype: object 

host_name                         Danielle
neighbourhood_group                 Queens
last_review            2019-07-08 00:00:00
Name: 22469, dtype: object
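
Because `sort_values` places missing dates (`NaT`) at the end by default, the last row after sorting is not a reliable answer. A minimal sketch of an alternative is to take the maximum of the date column, which skips `NaT`. The rows below are made up for illustration (the dates are hypothetical, not the dataset's):

```python
import pandas as pd

# Hypothetical sample: one Queens listing has never been reviewed
sample = pd.DataFrame({
    "host_name": ["Danielle", "Danielle", "Danielle", "Danielle"],
    "neighbourhood_group": ["Queens", "Queens", "Queens", "Brooklyn"],
    "last_review": ["2019-01-01", None, "2019-03-05", "2019-06-30"],
})

# Parse the dates; missing values become NaT
sample["last_review"] = pd.to_datetime(sample["last_review"])

# Restrict to Danielle's Queens listings and take the latest date, ignoring NaT
mask = (sample["host_name"] == "Danielle") & (sample["neighbourhood_group"] == "Queens")
latest_review = sample.loc[mask, "last_review"].max()
print(latest_review.date())
```

This does not depend on how many listings lack a review date, whereas indexing with `iloc[-2]` only works when exactly one row is `NaT`.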


## Further Questions

1. Which host has the most listings?

In [41]:
host_count = df['host_name'].value_counts()
index = host_count.idxmax()
print(index, "has the most listings")
print(host_count)

Michael has the most listings
host_name
Michael              417
David                403
Sonder (NYC)         327
John                 294
Alex                 279
                    ... 
Rhonycs                1
Brandy-Courtney        1
Shanthony              1
Aurore And Jamila      1
Ilgar & Aysel          1
Name: count, Length: 11452, dtype: int64


2. How many listings have completely open availability?

In [12]:
sub_data = df[df['availability_365'] == 365]
print(sub_data['availability_365'].count(), "Total listings that have been open 365 days of the year")

1295 Total listings that have been open 365 days of the year


3. What room_types have the highest review numbers?

In [55]:
room_reviews = df.groupby('room_type')['number_of_reviews'].sum()
print(room_reviews.idxmax(), "has the most reviews")
print(room_reviews)

Entire home/apt has the most reviews
room_type
Entire home/apt    580403
Private room       538346
Shared room         19256
Name: number_of_reviews, dtype: int64


# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered some more details that were not asked above, please describe them here.

-- Add your conclusion --

## Seems like a pretty chaotic housing market.
## Probably why it's so expensive to own a house in NYC.
## All of them are being rented out.
## I'm not a real estate agent, but it sounds like a pretty good business model.
