# AirBnB NY Locations Data Case Study

In this final project, your task is to use the data provided to find evidence that answers the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment.
**Be advised:** I will go dark for the entire assignment period. Any questions you have about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild.

Happy Coding!

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [3]:
air_bnb = pd.read_csv('AB_NYC_2019 - AB_NYC_2019.csv')
air_bnb.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


In [4]:
# How many neighborhood groups are available and which shows up the most?


neighborhood_counts = air_bnb['neighbourhood_group'].value_counts()

num_neighborhood_groups = len(neighborhood_counts)

most_common_neighborhood_group = neighborhood_counts.idxmax()

print(f"There are {num_neighborhood_groups} neighborhood groups available.")
print(f"The neighborhood group that shows up the most is: {most_common_neighborhood_group}")


There are 5 neighborhood groups available.
The neighborhood group that shows up the most is: Manhattan


In [5]:
# Are private rooms the most popular in Manhattan?
manhattan_data = air_bnb[air_bnb['neighbourhood_group'] == 'Manhattan']

room_type_counts = manhattan_data['room_type'].value_counts()


print(room_type_counts)


Entire home/apt    13199
Private room        7982
Shared room          480
Name: room_type, dtype: int64
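
The counts above already point to "no": in Manhattan, entire homes/apartments (13199) outnumber private rooms (7982). As a minimal sketch, the same check can be made explicit with shares instead of raw counts. The `toy` frame below is made-up stand-in data, not the real CSV, and the variable names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real dataset; values are made up.
toy = pd.DataFrame({
    'neighbourhood_group': ['Manhattan', 'Manhattan', 'Manhattan', 'Brooklyn', 'Manhattan'],
    'room_type': ['Entire home/apt', 'Private room', 'Entire home/apt', 'Private room', 'Shared room'],
})

manhattan = toy[toy['neighbourhood_group'] == 'Manhattan']

# normalize=True returns each room type's share of Manhattan listings
room_type_share = manhattan['room_type'].value_counts(normalize=True)

most_common_room_type = room_type_share.idxmax()
private_is_most_popular = most_common_room_type == 'Private room'

print(room_type_share)
print('Most common room type in Manhattan:', most_common_room_type)
print('Private rooms are the most popular:', private_is_most_popular)
```

Swapping `toy` for `air_bnb` should give the shares for the real data. Note that "popular" is measured here by listing count, which is one possible reading of the question.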


In [7]:
# Which hosts are the busiest based on their reviews?

# Total number of reviews across all of each host's listings
reviews_per_host = air_bnb.groupby('host_id')['number_of_reviews'].sum().reset_index()

busiest_hosts = reviews_per_host.sort_values(by='number_of_reviews', ascending=False)

# Mean reviews_per_month per host (a review frequency, not a rating score)
avg_reviews_per_month = air_bnb.groupby('host_id')['reviews_per_month'].mean().reset_index()

busiest_hosts = busiest_hosts.merge(avg_reviews_per_month, on='host_id', how='left')

# The dataset has no 'review_scores_rating' column, so only the review total is renamed
busiest_hosts.rename(columns={'number_of_reviews': 'total_reviews'}, inplace=True)

top_10_busiest_hosts = busiest_hosts.head(10)
print(top_10_busiest_hosts)



     host_id  total_reviews  reviews_per_month
0   37312959           2273          10.706000
1     344035           2205           4.307692
2   26432133           2017          13.604000
3   35524316           1971           3.665455
4   40176101           1818           6.030000
5    4734398           1798           7.680000
6   16677326           1355           3.290833
7    6885157           1346           1.676667
8  219517861           1281           1.920580
9   23591164           1269           5.845000
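
The table above identifies hosts only by `host_id`. One way to make it easier to read is to build the summary in a single `groupby(...).agg(...)` call that also carries the host name and the number of listings. This is a sketch on made-up toy data; the output column names (`total_reviews`, `avg_reviews_per_month`, `listings`) are illustrative choices, and it assumes each `host_id` maps to one `host_name`.

```python
import pandas as pd

# Hypothetical toy rows; values are made up for illustration.
toy = pd.DataFrame({
    'host_id': [1, 1, 2, 3, 3, 3],
    'host_name': ['Ana', 'Ana', 'Ben', 'Cy', 'Cy', 'Cy'],
    'number_of_reviews': [10, 5, 40, 1, 2, 3],
    'reviews_per_month': [1.0, 0.5, 2.0, None, 0.2, 0.4],
})

host_summary = (
    toy.groupby('host_id')
    .agg(
        host_name=('host_name', 'first'),                   # assumes one name per host_id
        total_reviews=('number_of_reviews', 'sum'),
        avg_reviews_per_month=('reviews_per_month', 'mean'),  # missing values are skipped
        listings=('host_name', 'size'),                     # rows (listings) per host
    )
    .sort_values('total_reviews', ascending=False)
    .reset_index()
)

print(host_summary.head(10))
```

Having `listings` next to `total_reviews` helps separate hosts who are busy because of a few heavily reviewed listings from hosts who simply have many listings.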


In [10]:
# Which neighborhood group has the highest average price?
grouped_neighborhood = air_bnb.groupby('neighbourhood_group')

average_price_by_group = grouped_neighborhood['price'].mean()

highest_average_price_neighborhood = average_price_by_group.idxmax()

print(f"The neighborhood group with the highest average price is: {highest_average_price_neighborhood}")

The neighborhood group with the highest average price is: Manhattan


In [11]:
# Which neighborhood group has the highest total price?

neighborhood_group_prices = air_bnb.groupby('neighbourhood_group')['price'].sum()

highest_total_price_neighborhood = neighborhood_group_prices.idxmax()

print("Neighborhood group with the highest total price:", highest_total_price_neighborhood)


Neighborhood group with the highest total price: Manhattan


In [12]:
# Which top 5 hosts have the highest total price?
# Here "total price" is taken as nightly price times available days (availability_365)
air_bnb['total_price'] = air_bnb['price'] * air_bnb['availability_365']

hosts_total_price = air_bnb.groupby('host_id')['total_price'].sum().reset_index()

sorted_hosts_total_price = hosts_total_price.sort_values(by='total_price', ascending=False)

top_5_hosts = sorted_hosts_total_price.head(5)

print(top_5_hosts)

         host_id  total_price
34646  219517861     24563716
29407  107434423     18021038
19574   30283594     10448235
34051  205031545      7686699
1966      836168      6376000
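
"Total price" can be read in more than one way. The cell above multiplies price by `availability_365`, while a plain sum of listing prices per host is another reasonable reading. The sketch below, on made-up toy data, shows that the two readings can rank hosts differently. The name `potential_revenue` is an illustrative label, not a column in the dataset.

```python
import pandas as pd

# Hypothetical toy rows; values are made up for illustration.
toy = pd.DataFrame({
    'host_id': [1, 1, 2, 3],
    'price': [100, 200, 500, 50],
    'availability_365': [365, 0, 10, 365],
})

by_host = toy.assign(
    potential_revenue=toy['price'] * toy['availability_365']
).groupby('host_id')

# Reading 1: sum of nightly prices over a host's listings
sum_of_prices = by_host['price'].sum().sort_values(ascending=False)

# Reading 2: price weighted by the number of available days
potential_revenue = by_host['potential_revenue'].sum().sort_values(ascending=False)

print(sum_of_prices)
print(potential_revenue)
```

In this toy data host 2 leads on summed prices but comes last once availability is factored in, so it is worth stating which reading an answer uses.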


In [17]:
# Who currently has no (zero) availability with a review count of 100 or more?

filtered_data = air_bnb[(air_bnb['availability_365'] == 0) & (air_bnb['number_of_reviews'] >= 100)]


result = filtered_data[['name', 'number_of_reviews']]
print(result)

                                            name  number_of_reviews
8             Cozy Clean Guest Room - Family Apt                118
94            Charming 1 bed GR8 WBurg LOCATION!                168
132             NYC artists’ loft with roof deck                193
174               Financial District Luxury Loft                114
180        Fort Greene, Brooklyn: Center Bedroom                206
...                                          ...                ...
29581         The Quietest Block in Manhattan :)                103
30461                          queens get away!!                119
31250  entire sunshine of the spotless mind room                102
32670                COZY Room for Female Guests                131
35014     Cozy corner near Empire State Building                112

[162 rows x 2 columns]
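
The question asks "who", but the result above lists listing names. A possible follow-up, sketched here on made-up toy data, is to collapse the filtered rows to one row per host. The `listings` column name is an illustrative choice.

```python
import pandas as pd

# Hypothetical toy rows; values are made up for illustration.
toy = pd.DataFrame({
    'host_id': [1, 1, 2, 3],
    'host_name': ['Ana', 'Ana', 'Ben', 'Cy'],
    'name': ['Loft', 'Studio', 'Room', 'Flat'],
    'availability_365': [0, 0, 0, 12],
    'number_of_reviews': [150, 120, 99, 300],
})

# Same filter as the cell above: no availability and at least 100 reviews
booked_out = toy[(toy['availability_365'] == 0) & (toy['number_of_reviews'] >= 100)]

# One row per host, with the number of qualifying listings
hosts = (
    booked_out.groupby(['host_id', 'host_name'])
    .size()
    .reset_index(name='listings')
)

print(hosts)
```

Grouping on `host_id` as well as `host_name` keeps different hosts who share a first name apart.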


In [22]:
# What host has the highest total of prices and where are they located?

host_prices = air_bnb.groupby('host_id')['price'].sum().reset_index()
highest_price_host = host_prices.loc[host_prices['price'].idxmax()]
highest_price_host_id = highest_price_host['host_id']
# A host may have several listings; .iloc[0] reports the location of the first one only
host_location = air_bnb.loc[air_bnb['host_id'] == highest_price_host_id, ['host_name', 'neighbourhood_group', 'neighbourhood']].iloc[0]
print("Host with the highest total prices:")
print("Host Name:", host_location['host_name'])
print("Location:", host_location['neighbourhood'], "in", host_location['neighbourhood_group'])

Host with the highest total prices:
Host Name: Sonder (NYC)
Location: Financial District in Manhattan


In [27]:
# When did Danielle from Queens last receive a review?


danielle_queens_reviews = air_bnb[(air_bnb['host_name'] == 'Danielle') & (air_bnb['neighbourhood_group'] == 'Queens')]
danielle_queens_reviews_sorted = danielle_queens_reviews.sort_values(by='last_review', ascending=False)
last_review_date = danielle_queens_reviews_sorted['last_review'].iloc[0]


print("Danielle from Queens last received a review on:", last_review_date)


Danielle from Queens last received a review on: 2019-07-08
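
Sorting `last_review` as text works here because the dates are in ISO `YYYY-MM-DD` form. A slightly more defensive variant, sketched on made-up toy data, converts the column to real dates first and takes the maximum, which skips missing values. It also does not rely on string ordering.

```python
import pandas as pd

# Hypothetical toy rows; names and dates are made up for illustration.
toy = pd.DataFrame({
    'host_name': ['Danielle', 'Danielle', 'Danielle', 'Sam'],
    'neighbourhood_group': ['Queens', 'Queens', 'Brooklyn', 'Queens'],
    'last_review': ['2019-05-01', None, '2019-06-15', '2019-04-20'],
})

# Missing dates become NaT and are ignored by max()
toy['last_review'] = pd.to_datetime(toy['last_review'])

mask = (toy['host_name'] == 'Danielle') & (toy['neighbourhood_group'] == 'Queens')
last_review_date = toy.loc[mask, 'last_review'].max()

print('Last review on:', last_review_date.date())
```

If more than one host in Queens is named Danielle, this returns the latest date across all of them; grouping by `host_id` would separate them.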


## Further Questions

1. Which host has the most listings?

In [28]:
host_listings_count = air_bnb['host_id'].value_counts()
host_with_most_listings = host_listings_count.idxmax()
most_listings_count = host_listings_count.max()

print(f"The host with the most listings is host_id: {host_with_most_listings}, with {most_listings_count} listings.")


The host with the most listings is host_id: 219517861, with 327 listings.


2. How many listings have completely open availability?

In [29]:
open_avail_count = len(air_bnb[air_bnb['availability_365'] == 365])

print("Number of listings with completely open availability:", open_avail_count)


Number of listings with completely open availability: 1295


3. What room_types have the highest review numbers?

In [30]:
room_types_review_avg = air_bnb.groupby('room_type')['number_of_reviews'].mean()

sorted_room_types = room_types_review_avg.sort_values(ascending=False)

room_types_review_count = air_bnb.groupby('room_type')['number_of_reviews'].sum()

print("Average Number of Reviews for each Room Type:")
print(sorted_room_types)

print("\nTotal Number of Reviews for each Room Type:")
print(room_types_review_count)


Average Number of Reviews for each Room Type:
room_type
Private room       24.112962
Entire home/apt    22.842418
Shared room        16.600000
Name: number_of_reviews, dtype: float64

Total Number of Reviews for each Room Type:
room_type
Entire home/apt    580403
Private room       538346
Shared room         19256
Name: number_of_reviews, dtype: int64
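
The two outputs above rank room types differently: private rooms lead on the per-listing average, while entire homes/apartments lead on the total. Putting mean, sum, and listing count side by side makes that easier to see. This is a sketch on made-up toy data, and `review_stats` is an illustrative name.

```python
import pandas as pd

# Hypothetical toy rows; values are made up for illustration.
toy = pd.DataFrame({
    'room_type': ['Private room', 'Private room',
                  'Entire home/apt', 'Entire home/apt', 'Entire home/apt'],
    'number_of_reviews': [10, 30, 15, 15, 15],
})

# One row per room type with mean, total, and number of listings
review_stats = (
    toy.groupby('room_type')['number_of_reviews']
    .agg(['mean', 'sum', 'count'])
    .sort_values('mean', ascending=False)
)

print(review_stats)
```

In the toy data, as in the outputs above, the room type with the higher average is not the one with the higher total, because the other type has more listings.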


# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered any details that were not asked about above, please describe them here.

-- Add your conclusion --