# AirBnB NY Locations Data Case Study

In this final project, your task is to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
7. Are private rooms the most popular in Manhattan?
8. Which hosts are the busiest based on their reviews?
9. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment. 
**Be Advised:** I will go dark for the entire assignment period. Any questions you would like to ask about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild.

Happy Coding!

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [3]:
air_bnb = pd.read_csv('AB_NYC_2019.csv')
air_bnb.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [58]:
# How many neighborhood groups are available and which shows up the most?
all_neighborhood_groups = air_bnb.groupby('neighbourhood_group').size()

most_popular_neighborhood = all_neighborhood_groups.idxmax()

count_most_popular_neighborhood = all_neighborhood_groups.max()


In [12]:
# Are private rooms the most popular in manhattan?
manhattan_data = air_bnb[air_bnb['neighbourhood_group'] == 'Manhattan']

count_room_types = manhattan_data['room_type'].value_counts()

most_popular_room_type = count_room_types.idxmax()

if most_popular_room_type == 'Private room':
    print('Private rooms are the most desired type of rental.')
else:
    print('Private rooms are NOT the most desired type of rental')


Private rooms are NOT the most desired type of rental


In [59]:
# Which hosts are the busiest based on their reviews?
host_reviews = air_bnb.groupby('host_id')['number_of_reviews'].sum()

host_with_the_most = host_reviews.sort_values(ascending = False)


In [60]:
# Which neighborhood group has the highest average price?
average_price_by_neighborhood = air_bnb.groupby('neighbourhood_group')['price'].mean()

highest_priced_neighborhood_by_average = average_price_by_neighborhood.idxmax()


In [61]:
# Which neighborhood group has the highest total price?
highest_total_price_by_neighborhood = air_bnb.groupby('neighbourhood_group')['price'].sum()

most_expensive_neighborhood_with_rentals = highest_total_price_by_neighborhood.idxmax()


In [62]:
# Which top 5 hosts have the highest total price?
total_price_per_host = air_bnb.groupby('host_id')['price'].sum()

top_5_hosts = total_price_per_host.sort_values(ascending = False).head()


In [63]:
# Who currently has no (zero) availability with a review count of 100 or more?
refined_search_air_bnb = air_bnb[air_bnb['number_of_reviews'] >= 100]

hosts_with_nothing_available = refined_search_air_bnb[refined_search_air_bnb['availability_365'] == 0]

specific_hosts_with_nothing_available = hosts_with_nothing_available['host_id'].unique()
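
The question asks *who* has no availability, while the cell above returns only `host_id` values. A minimal sketch of one way to list host names alongside the ids follows. The `sample` frame and its values are made up for illustration and only mimic the column names of `AB_NYC_2019.csv`.

```python
import pandas as pd

# Hypothetical sample rows standing in for AB_NYC_2019.csv (values are made up).
sample = pd.DataFrame({
    'host_id': [1, 1, 2, 3],
    'host_name': ['Ann', 'Ann', 'Bob', 'Cy'],
    'number_of_reviews': [150, 20, 120, 300],
    'availability_365': [0, 0, 10, 0],
})

# Combine both conditions in a single boolean mask.
mask = (sample['number_of_reviews'] >= 100) & (sample['availability_365'] == 0)

# Keep one row per host so each name is listed once.
booked_hosts = sample.loc[mask, ['host_id', 'host_name']].drop_duplicates()
print(booked_hosts)
```

Swapping `sample` for `air_bnb` should apply the same filter to the full data set, assuming the column names match those shown in `air_bnb.head()`.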


In [64]:
# What host has the highest total of prices and where are they located?
total_price_by_host = air_bnb.groupby('host_id')['price'].sum()

host_with_highest_total_prices = total_price_by_host.idxmax()

location_of_host_with_highest_total = air_bnb.loc[air_bnb['host_id'] == host_with_highest_total_prices, 'neighbourhood_group'].iloc[0]


In [65]:
# When did Danielle from Queens last receive a review?
danielle_from_queens_reviews = air_bnb[(air_bnb['host_name'] == 'Danielle') & (air_bnb['neighbourhood_group'] == 'Queens')]

sorted_reviews_for_danielle = danielle_from_queens_reviews.sort_values(by = 'last_review', ascending = False)

date_of_last_review = sorted_reviews_for_danielle.iloc[0]['last_review']
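
The cell above sorts `last_review` as text, which works because the dates are stored in ISO `YYYY-MM-DD` form. As an alternative sketch, the column can be parsed with `pd.to_datetime` so that missing reviews become `NaT` and are skipped by `max()`. The `sample` frame below is invented for illustration and does not reflect the real answer.

```python
import pandas as pd

# Hypothetical sample rows (values are made up, not taken from the data set).
sample = pd.DataFrame({
    'host_name': ['Danielle', 'Danielle', 'Danielle', 'Laura'],
    'neighbourhood_group': ['Queens', 'Queens', 'Brooklyn', 'Queens'],
    'last_review': ['2019-07-08', None, '2019-07-20', '2019-06-01'],
})

# Select listings hosted by a Danielle in Queens.
is_danielle_in_queens = (
    (sample['host_name'] == 'Danielle')
    & (sample['neighbourhood_group'] == 'Queens')
)

# Parse the text dates; missing values become NaT and are ignored by max().
review_dates = pd.to_datetime(sample.loc[is_danielle_in_queens, 'last_review'])
latest_review = review_dates.max()
print(latest_review.date())
```

Note that more than one host may be named Danielle, so grouping by `host_id` as well may be needed to separate them.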


## Further Questions

1. Which host has the most listings?

In [66]:
number_of_listings_per_host = air_bnb.groupby('host_id').size()

host_that_lists_the_most = number_of_listings_per_host.idxmax()

most_listings_count = number_of_listings_per_host.max()


2. How many listings have completely open availability?

In [67]:
all_year_available_listings_count = air_bnb[air_bnb['availability_365'] == 365].shape[0]


3. What room_types have the highest review numbers?

In [68]:
review_count_by_room_type = air_bnb.groupby('room_type')['number_of_reviews'].sum()

room_type_with_highest_reviews = review_count_by_room_type[review_count_by_room_type == review_count_by_room_type.max()]


# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered some more details that were not asked above, please describe them here.

-- Add your conclusion --

In [69]:
print("Number of neighborhoods:", len(all_neighborhood_groups))
print("Neighborhood that appears the most:", most_popular_neighborhood)
if most_popular_room_type == 'Private room':
    print('Private rooms are the most desired type of rental.')
else:
    print('Private rooms are NOT the most desired type of rental')
print('The top 3 busiest hosts according to the number of reviews they receive are:')
print(host_with_the_most.head(3))
print('The neighborhood group with the highest average price for rentals is:', highest_priced_neighborhood_by_average)
print('The neighborhood with the highest TOTAL price of rentals is:', most_expensive_neighborhood_with_rentals)
print('The top 5 hosts with the highest total price of rentals are:') 
print(top_5_hosts)
print('Hosts with no rentals available that have 100 or more reviews:') 
print(specific_hosts_with_nothing_available)
print('Host with the highest total of prices:', host_with_highest_total_prices)
print('This is where they are located:', location_of_host_with_highest_total)
print('The last time Danielle from Queens received a review was:', date_of_last_review)
print('The host with the most listings is:', host_that_lists_the_most)
print('The number of listings that are available during the entire year are:', all_year_available_listings_count)
print('Room types with the highest review numbers:')
print(room_type_with_highest_reviews)

Number of neighborhoods: 5
Neighborhood that appears the most: Manhattan
Private rooms are NOT the most desired type of rental
The top 3 busiest hosts according to the number of reviews they receive are:
host_id
37312959    2273
344035      2205
26432133    2017
Name: number_of_reviews, dtype: int64
The neighborhood group with the highest average price for rentals is: Manhattan
The neighborhood with the highest TOTAL price of rentals is: Manhattan
The top 5 hosts with the highest total price of rentals are:
host_id
219517861    82795
107434423    70331
156158778    37097
205031545    35294
30283594     33581
Name: price, dtype: int64
Hosts with no rentals available that have 100 or more reviews:
[     7490     79402    129352    193722     67778    239208    303939
    522065    683975    242506    792159   1311398   1358312   1360043
     36897   1492339   1503831   1649300   1935605   1146958   2265770
   2275829   2361715   1787284   1215949   2494666   3558158   3778274
   3880974 

In [57]:
print('I would draw the conclusion that entire home/apartment rentals located in Manhattan')
print('with the least availability and the highest total listing prices')
print('identify the hosts that own the most popular rental properties.')
print('I would use the included code to search for the top ten hosts that own properties')
print('fulfilling these specific requirements. Here are the results:')

# Keep only entire home/apartment listings in Manhattan.
entire_home_rentals_manhattan = air_bnb[(air_bnb['neighbourhood_group'] == 'Manhattan') & (air_bnb['room_type'] == 'Entire home/apt')]

# Per host: total listing price and the lowest availability across their listings.
host_stats = entire_home_rentals_manhattan.groupby('host_id').agg({'price': 'sum', 'availability_365': 'min'})

# Note: ascending = False sorts both columns from highest to lowest,
# so the rows returned are the hosts with the MOST availability, not the least.
top_10_hosts = host_stats.sort_values(by = ['availability_365', 'price'], ascending = False).head(10)

# Attach host names to the host ids.
top_10_hosts_id_num_and_names = top_10_hosts.merge(air_bnb[['host_id', 'host_name']], how = 'left', on = 'host_id')

print(top_10_hosts_id_num_and_names)

I would draw the conclusion that entire home/apartment rentals located in Manhattan
with the least availability and the highest total listing prices
identify the hosts that own the most popular rental properties.
I would use the included code to search for the top ten hosts that own properties
fulfilling these specific requirements. Here are the results:
     host_id  price  availability_365        host_name
0    4382127   9999               365             Matt
1  271248669   6500               365            Jenny
2     213266   5000               365          Jessica
3   45863742   3750               365            James
4  229458601   3200               365              Kay
5  101080203   3000               365  Luxury Property
6    1581845   2400               365             Indi
7   10767841   2000               365            Manon
8  270294045   1600               365        Christian
9    5118419   1500               365             Greg
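
Every row in the table above shows `availability_365` of 365, which is the opposite of the "least availability" the conclusion describes. A minimal sketch of a ranking that puts the lowest availability first and breaks ties by the highest total price is given below. It uses a made-up `sample` frame rather than the real data, so the result is illustrative only.

```python
import pandas as pd

# Hypothetical sample rows (values are made up for illustration).
sample = pd.DataFrame({
    'host_id': [1, 1, 2, 3, 4],
    'host_name': ['Ann', 'Ann', 'Bob', 'Cy', 'Dee'],
    'price': [200, 300, 900, 400, 100],
    'availability_365': [0, 5, 365, 0, 0],
})

# Per host: total listing price and the lowest availability across listings.
host_stats = sample.groupby('host_id').agg({'price': 'sum', 'availability_365': 'min'})

# Sort availability ascending (least available first), price descending.
ranked = host_stats.sort_values(
    by=['availability_365', 'price'], ascending=[True, False]
)

# Drop duplicate id/name pairs before merging so hosts with several
# listings do not produce repeated rows.
host_names = sample[['host_id', 'host_name']].drop_duplicates()
ranked_with_names = ranked.reset_index().merge(host_names, how='left', on='host_id')
print(ranked_with_names)
```

Passing a list to `ascending` gives each sort key its own direction, and calling `.head(10)` on `ranked` would limit the result to ten hosts as in the original cell.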
