# AirBnB NY Locations Data Case Study

In this final project, your task is to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment. 
**Be Advised:** I will go dark for this entire assignment period. Any questions you would like to ask about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild.

Happy Coding!

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
air_bnb = pd.read_csv('AB_NYC_2019.csv')
air_bnb.head()



,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [30]:
# How many neighborhood groups are available and which shows up the most?
# Relevant info is in the neighbourhood_group column
# Find the number of unique values and the most common one


neighborhood_groups_count = air_bnb['neighbourhood_group'].value_counts()  
unique_neighborhood_groups = air_bnb['neighbourhood_group'].nunique()
most_common_group = neighborhood_groups_count.idxmax()

print(f"There are {unique_neighborhood_groups} neighborhood groups and {most_common_group} is the most commonly occurring group.")

There are 5 neighborhood groups and Manhattan is the most commonly occurring group.
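
To see how dominant the most common group is, it can help to look at shares as well as raw counts. The sketch below uses a small invented DataFrame (the `toy` rows are made up for illustration and are not from `AB_NYC_2019.csv`) and the `normalize=True` option of `value_counts`.

```python
import pandas as pd

# Toy stand-in for the real data; these rows are invented for illustration.
toy = pd.DataFrame({
    "neighbourhood_group": [
        "Manhattan", "Brooklyn", "Manhattan", "Queens", "Manhattan", "Brooklyn",
    ],
})

# Raw counts per group, largest first
counts = toy["neighbourhood_group"].value_counts()

# Same counts expressed as a fraction of all listings
shares = toy["neighbourhood_group"].value_counts(normalize=True)

summary = pd.DataFrame({"listings": counts, "share": shares.round(3)})
print(summary)
```

On the real data, the same two calls on `air_bnb['neighbourhood_group']` should give the count and share for each of the five groups.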


In [33]:
# Are private rooms the most popular in Manhattan?
# Find the room type with the maximum count
# Check whether it is 'Private room' and print True or False

manhattan_listings = air_bnb[air_bnb['neighbourhood_group'] == 'Manhattan']

# Count the room types in Manhattan
room_type_counts_manhattan = manhattan_listings['room_type'].value_counts()

most_popular_room_type = room_type_counts_manhattan.idxmax()

if most_popular_room_type == 'Private room':
    print("True")
else:
    print("False")

False
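
A bare True/False hides which room type actually came out on top. As a minimal sketch, assuming the same column names as the dataset, the helper below (`room_type_summary` is a hypothetical name, and the `toy` rows are invented) returns the full counts so the answer can be read directly.

```python
import pandas as pd

def room_type_summary(df, group):
    """Return room-type counts for one neighbourhood group, largest first."""
    subset = df[df["neighbourhood_group"] == group]
    return subset["room_type"].value_counts()

# Toy stand-in for the real data; these rows are invented for illustration.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 4 + ["Brooklyn"] * 2,
    "room_type": [
        "Entire home/apt", "Entire home/apt", "Private room", "Shared room",
        "Private room", "Private room",
    ],
})

counts = room_type_summary(toy, "Manhattan")
top_type = counts.idxmax()

print(counts)
print(f"Most common room type: {top_type}")
print(f"Private rooms are the most common: {top_type == 'Private room'}")
```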


In [11]:
# Which hosts are the busiest based on their reviews?

## We want to group the data by host (host ID and host name),
## then sum the reviews across each host's listings using the number_of_reviews column,
## then sort by total reviews and print the top hosts

busiest_hosts_by_reviews = air_bnb.groupby(['host_id', 'host_name']).agg(
    total_reviews=('number_of_reviews', 'sum')
).sort_values(by='total_reviews', ascending=False)

## Print the top 10 
print(busiest_hosts_by_reviews.head(10))

                                          total_reviews
host_id   host_name                                    
37312959  Maya                                     2273
344035    Brooklyn&   Breakfast    -Len-           2205
26432133  Danielle                                 2017
35524316  Yasu & Akiko                             1971
40176101  Brady                                    1818
4734398   Jj                                       1798
16677326  Alex And Zeena                           1355
6885157   Randy                                    1346
219517861 Sonder (NYC)                             1281
23591164  Angela                                   1269
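
One caveat with grouping on both `host_id` and `host_name`: by default, pandas `groupby` drops rows whose group key is missing, so a host with no recorded name would silently disappear from the ranking. The sketch below, on invented rows, groups on `host_id` alone and carries the name along with a `first` aggregation. Whether the real file has missing host names is something to check, not something assumed here.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data; host 2 deliberately has no name.
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3],
    "host_name": ["Ann", "Ann", np.nan, "Cy"],
    "number_of_reviews": [10, 5, 40, 7],
})

# Grouping on both keys drops host 2, because its host_name is NaN
by_id_and_name = toy.groupby(["host_id", "host_name"])["number_of_reviews"].sum()

# Grouping on host_id only keeps every host; the name is attached afterwards
by_id = toy.groupby("host_id").agg(
    host_name=("host_name", "first"),
    total_reviews=("number_of_reviews", "sum"),
).sort_values("total_reviews", ascending=False)

print(by_id_and_name)
print(by_id)
```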


In [12]:
# Which neighborhood group has the highest average price?
# This is similar to the last problem
## Group data by neighborhood group, calculate average price for each group, sort groups by average price

avg_price_by_neighborhood_group = air_bnb.groupby('neighbourhood_group').agg(
    avg_price=('price', 'mean')
).sort_values(by='avg_price', ascending=False)

# Display the neighborhood groups with the highest average prices
avg_price_by_neighborhood_group.head(5)  

neighbourhood_group,avg_price
Manhattan,196.875814
Brooklyn,124.383207
Staten Island,114.812332
Queens,99.517649
Bronx,87.496792
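
A mean can be pulled upward by a handful of very expensive listings, so it may be worth comparing it with the median before drawing conclusions. The sketch below uses invented groups `A` and `B` (not real neighbourhood groups) to show how the two statistics can disagree.

```python
import pandas as pd

# Toy stand-in for the real data; group A contains one extreme price.
toy = pd.DataFrame({
    "neighbourhood_group": ["A", "A", "A", "B", "B", "B"],
    "price": [100, 110, 1000, 150, 160, 170],
})

# Mean, median, and listing count per group in one pass
price_stats = toy.groupby("neighbourhood_group")["price"].agg(
    ["mean", "median", "count"]
)

print(price_stats)
print("Highest mean:  ", price_stats["mean"].idxmax())
print("Highest median:", price_stats["median"].idxmax())
```

If the mean and the median rank the real neighbourhood groups the same way, the average-price answer is more robust to outliers.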


In [13]:
# Which neighborhood group has the highest total price?
# Group by neighborhood group, sum prices per group, sort by total price

total_price_by_neighborhood_group = air_bnb.groupby('neighbourhood_group').agg(
    total_price=('price', 'sum')
).sort_values(by='total_price', ascending=False)

# Display the neighborhood group with the highest total price
total_price_by_neighborhood_group.head(1)

neighbourhood_group,total_price
Manhattan,4264527


In [14]:
# Which top 5 hosts have the highest total price?
# Group by host, add the prices together, sort by total price
# Print out the top 5
total_price_by_host = air_bnb.groupby(['host_id', 'host_name']).agg(
    total_price=('price', 'sum')
).sort_values(by='total_price', ascending=False)

# Display the top 5 hosts with the highest total price
total_price_by_host.head(5) 

host_id,host_name,total_price
219517861,Sonder (NYC),82795
107434423,Blueground,70331
156158778,Sally,37097
205031545,Red Awning,35294
30283594,Kara,33581


In [17]:
# Who currently has no (zero) availability with a review count of 100 or more?
# Check availability_365 for availability
# Check number_of_reviews for the review count

hosts_zero_availability = air_bnb[(air_bnb['availability_365'] == 0) & (air_bnb['number_of_reviews'] >= 100)]

#print out the results
print(hosts_zero_availability)

             id                                       name    host_id  \
8          5203         Cozy Clean Guest Room - Family Apt       7490   
94        20913         Charming 1 bed GR8 WBurg LOCATION!      79402   
132       30031           NYC artists’ loft with roof deck     129352   
174       44221             Financial District Luxury Loft     193722   
180       45556      Fort Greene, Brooklyn: Center Bedroom      67778   
...         ...                                        ...        ...   
29581  22705516         The Quietest Block in Manhattan :)  127740507   
30461  23574142                          queens get away!!  176185168   
31250  24267706  entire sunshine of the spotless mind room   21074914   
32670  25719044                COZY Room for Female Guests   40119874   
35014  27759146     Cozy corner near Empire State Building  209549523   

        host_name neighbourhood_group              neighbourhood  latitude  \
8       MaryEllen           Manhattan        
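
The question asks "who", so the filtered listings can be reduced to a list of hosts. A minimal sketch on invented rows, using the same filter and then keeping one row per host with `drop_duplicates`:

```python
import pandas as pd

# Toy stand-in for the real data; these rows are invented for illustration.
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3],
    "host_name": ["Ann", "Ann", "Bo", "Cy"],
    "number_of_reviews": [120, 150, 99, 300],
    "availability_365": [0, 0, 0, 10],
})

# Same two conditions as the cell above
mask = (toy["availability_365"] == 0) & (toy["number_of_reviews"] >= 100)

# Keep only the host columns and collapse multiple listings per host
booked_hosts = (
    toy.loc[mask, ["host_id", "host_name"]]
    .drop_duplicates()
    .reset_index(drop=True)
)

print(booked_hosts)
print(f"{len(booked_hosts)} host(s) match both criteria.")
```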

In [18]:
# What host has the highest total of prices and where are they located?
# Group by host ID, host name, and location, sum up total prices, sort by total, print out info on our top earner
# The neighborhood group and neighborhood are included to answer the "where" question

total_price_by_host_location = air_bnb.groupby(['host_id', 'host_name', 'neighbourhood_group', 'neighbourhood']).agg(
    total_price=('price', 'sum')
).sort_values(by='total_price', ascending=False)

# Get the top-selling host with their location details
top_selling_host = total_price_by_host_location.head(1)

print(top_selling_host)


                                                               total_price
host_id   host_name    neighbourhood_group neighbourhood                  
219517861 Sonder (NYC) Manhattan           Financial District        57738
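
Grouping by host and location together ranks host-location pairs, which is not always the same as ranking hosts. An alternative, sketched below on invented rows, is to find the top host by overall total first and then break that host's total down by location.

```python
import pandas as pd

# Toy stand-in for the real data; these rows are invented for illustration.
toy = pd.DataFrame({
    "host_id": [1, 1, 1, 2, 2],
    "host_name": ["Ann", "Ann", "Ann", "Bo", "Bo"],
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Queens", "Queens"],
    "neighbourhood": ["Midtown", "Harlem", "Bushwick", "Astoria", "Astoria"],
    "price": [100, 100, 100, 120, 130],
})

# Step 1: total price per host, regardless of location
host_totals = toy.groupby("host_id")["price"].sum()
top_host_id = host_totals.idxmax()

# Step 2: where that host's listings are, with the total per location
top_host_locations = (
    toy[toy["host_id"] == top_host_id]
    .groupby(["neighbourhood_group", "neighbourhood"])["price"]
    .sum()
    .sort_values(ascending=False)
)

# For comparison: ranking host-location pairs picks a different host here
pair_totals = toy.groupby(
    ["host_id", "neighbourhood_group", "neighbourhood"]
)["price"].sum()

print(top_host_id, host_totals[top_host_id])
print(top_host_locations)
print(pair_totals.idxmax())
```

In the toy data the two approaches disagree, which is the case worth checking for on the real data.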


In [23]:
# When did Danielle from Queens last receive a review?
# Find listings with host name Danielle in the neighbourhood group Queens
# Convert last_review to datetime first; errors='coerce' turns missing or unparseable values into NaT so .max() works on dates
air_bnb['last_review'] = pd.to_datetime(air_bnb['last_review'], errors='coerce')
danielle_reviews = air_bnb[(air_bnb['host_name'] == 'Danielle') & (air_bnb['neighbourhood_group'] == 'Queens')]

# Get the latest review date (the maximum of the dates with reviews)
danielle_last_review = danielle_reviews['last_review'].max()

print(danielle_last_review)



2019-07-08 00:00:00
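
Filtering on the name alone could match more than one host called Danielle. As a hedged sketch on invented rows, grouping the matches by `host_id` shows the latest review per host, and `.date()` drops the `00:00:00` time part when printing.

```python
import pandas as pd

# Toy stand-in for the real data; two different hosts share the name Danielle.
toy = pd.DataFrame({
    "host_id": [10, 10, 20, 30],
    "host_name": ["Danielle", "Danielle", "Danielle", "Sam"],
    "neighbourhood_group": ["Queens", "Queens", "Queens", "Queens"],
    "last_review": ["2019-07-08", "2019-06-01", None, "2019-07-09"],
})

# Missing or unparseable dates become NaT
toy["last_review"] = pd.to_datetime(toy["last_review"], errors="coerce")

mask = (toy["host_name"] == "Danielle") & (toy["neighbourhood_group"] == "Queens")

# Latest review per matching host; a host with no reviews shows NaT
latest_per_host = toy[mask].groupby("host_id")["last_review"].max()

# Latest review across all matching hosts, printed as a plain date
latest_overall = latest_per_host.max()

print(latest_per_host)
print(latest_overall.date())
```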


## Further Questions

1. Which host has the most listings?

In [24]:
# Group by host ID and host name, count each host's listings (rows), sort by total listings
# Note: calculated_host_listings_count already holds a host's listing count on every one of
# that host's rows, so summing it would multiply the count by the number of rows

most_listings_by_host = air_bnb.groupby(['host_id', 'host_name']).agg(
    total_listings=('id', 'count')
).sort_values(by='total_listings', ascending=False)

# Get the host with the most listings (top of list)
top_host_with_most_listings = most_listings_by_host.head(1)

print(top_host_with_most_listings)

                        total_listings
host_id   host_name                   
219517861 Sonder (NYC)             327
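
A quick sanity check for listing counts: `calculated_host_listings_count` appears to be a per-host figure repeated on every one of that host's rows, so summing it inflates the result. The sketch below, on invented rows that assume this repetition, compares the sum against a plain row count.

```python
import pandas as pd

# Toy stand-in for the real data; host 1 has 3 listings, host 2 has 1.
toy = pd.DataFrame({
    "id": [101, 102, 103, 201],
    "host_id": [1, 1, 1, 2],
    "calculated_host_listings_count": [3, 3, 3, 1],
})

listing_check = toy.groupby("host_id").agg(
    summed_count=("calculated_host_listings_count", "sum"),  # inflated: 3 * 3
    row_count=("id", "count"),                               # actual rows
    stated_count=("calculated_host_listings_count", "max"),  # per-host value
)

print(listing_check)
```

When `row_count` and `stated_count` agree, either can be reported as the number of listings; `summed_count` is their product.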


2. How many listings have completely open availability?

In [26]:
# Looking for listings where availability_365 is equal to 365
# Count them and return how many

completely_open_listings = air_bnb[air_bnb['availability_365'] == 365]

# Count the number of such listings using .shape
total_open_availability = completely_open_listings.shape[0]

print(f"{total_open_availability} listings are completely available for booking.")

1295 listings are completely available for booking.


3. What room_types have the highest review numbers?

In [27]:
# Group by room_type, sum total review numbers, sort by total review numbers

reviews_by_room_type = air_bnb.groupby('room_type').agg(
    total_reviews=('number_of_reviews', 'sum')
).sort_values(by='total_reviews', ascending=False)

# Display the room types with the highest review numbers
print(reviews_by_room_type)

                 total_reviews
room_type                     
Entire home/apt         580403
Private room            538346
Shared room              19256
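
Total reviews depend on how many listings each room type has, so an average per listing gives a complementary view. A minimal sketch on invented rows, using named aggregation on the grouped `number_of_reviews` column:

```python
import pandas as pd

# Toy stand-in for the real data; these rows are invented for illustration.
toy = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt", "Entire home/apt", "Shared room"],
    "number_of_reviews": [10, 20, 30, 50],
})

# Total, per-listing average, and listing count for each room type
review_stats = toy.groupby("room_type")["number_of_reviews"].agg(
    total_reviews="sum",
    avg_reviews="mean",
    listings="count",
).sort_values("total_reviews", ascending=False)

print(review_stats)
```

In the toy data the room type with the most total reviews is not the one with the most reviews per listing, so both columns are worth reading together.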


# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered some more details that were not asked above, please describe them here.

-- Add your conclusion --

In [None]:
How many neighborhood groups are available and which shows up the most?
There are 5 unique neighborhood groups and Manhattan is the most commonly occurring group.

Are private rooms the most popular in Manhattan?
No. Private rooms are not the most common room type in Manhattan.

Which hosts are the busiest based on their reviews?
For the top 5 hosts, the host ID, host name, and total review count are as follows:
37312959  Maya                                     2273
344035    Brooklyn&   Breakfast    -Len-           2205
26432133  Danielle                                 2017
35524316  Yasu & Akiko                             1971
40176101  Brady                                    1818

Which neighborhood group has the highest average price?
Manhattan, with an average price of about $196.88.

Which neighborhood group has the highest total price?
Manhattan, with a total of $4,264,527.

Which top 5 hosts have the highest total price?
The following hosts have the highest total prices.
Each row shows the host ID, host name, and total price:
219517861	Sonder (NYC)	82795
107434423	Blueground	70331
156158778	Sally	37097
205031545	Red Awning	35294
30283594	Kara	33581

Who currently has no (zero) availability with a review count of 100 or more?
For this one we just printed the relevant listings. There are listings that satisfy both criteria.

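What host has the highest total of prices and where are they located?
Sonder (NYC) (host ID 219517861) has the highest total price, $82,795 across all listings. Its largest location total, $57,738, is in the Financial District in Manhattan.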

When did Danielle from Queens last receive a review?
Danielle from Queens last received a review on 2019-07-08.
