# AirBnB NY Locations Data Case Study

Your task will be to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

This is to simulate what you will face when you are out in the wild. 

Happy Coding!

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
air_bnb = pd.read_csv('./AB_NYC_2019.csv')
air_bnb.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


### Question 1

How many neighborhood groups are available and which shows up the most?



In [3]:
air_bnb.groupby(['neighbourhood_group']).describe()

,id,id,id,id,id,id,id,id,host_id,host_id,...,calculated_host_listings_count,calculated_host_listings_count,availability_365,availability_365,availability_365,availability_365,availability_365,availability_365,availability_365,availability_365
neighbourhood_group,count,mean,std,min,25%,50%,75%,max,count,mean,...,75%,max,count,mean,std,min,25%,50%,75%,max
Bronx,1091.0,22734920.0,10234020.0,44096.0,16174880.5,23879304.0,31899087.0,36442252.0,1091.0,105609900.0,...,2.0,37.0,1091.0,165.758937,135.247098,0.0,37.0,148.0,313.5,365.0
Brooklyn,20104.0,18256850.0,10833200.0,2539.0,8704323.75,18876042.5,27843948.75,36485057.0,20104.0,56715260.0,...,2.0,232.0,20104.0,100.232292,126.275775,0.0,0.0,28.0,188.0,365.0
Manhattan,21661.0,18774940.0,11167930.0,2595.0,9162161.0,19116844.0,29541214.0,36487245.0,21661.0,67830620.0,...,2.0,327.0,21661.0,111.97941,132.677836,0.0,0.0,36.0,230.0,365.0
Queens,5666.0,21755000.0,10376870.0,12937.0,13960418.25,22564596.0,30768797.25,36484363.0,5666.0,96156800.0,...,3.0,103.0,5666.0,144.451818,135.538597,0.0,2.0,98.0,286.0,365.0
Staten Island,373.0,21597470.0,10393100.0,42882.0,15532430.0,22977021.0,30082958.0,36438336.0,373.0,98533600.0,...,3.0,8.0,373.0,199.678284,131.852,0.0,78.0,219.0,333.0,365.0


In [4]:
# Answer: It looks like there are 5 neighborhood groups, the mode of which is Manhattan.
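
The `describe()` table answers the question, but the counts are buried among many other statistics. A more direct route is `nunique()` plus `value_counts()`. This is a minimal sketch on a made-up `toy` frame (an assumption; only the column name matches the real CSV), not the notebook's actual data.

```python
import pandas as pd

# Toy stand-in for air_bnb; the rows are invented for illustration.
toy = pd.DataFrame({
    "neighbourhood_group": [
        "Manhattan", "Brooklyn", "Manhattan", "Queens", "Manhattan", "Brooklyn",
    ],
})

# Number of distinct neighbourhood groups.
n_groups = toy["neighbourhood_group"].nunique()

# Listings per group, sorted from most to least frequent.
group_counts = toy["neighbourhood_group"].value_counts()

# Label of the most frequent group (the mode).
most_common = group_counts.idxmax()

print(n_groups, most_common)
print(group_counts)
```

On the real data the same two calls would be run on `air_bnb` instead of `toy`.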

### Question 2

Are private rooms the most popular in manhattan?



In [5]:
air_bnb[air_bnb['neighbourhood_group']=='Manhattan'].groupby(['room_type']).describe()

,id,id,id,id,id,id,id,id,host_id,host_id,...,calculated_host_listings_count,calculated_host_listings_count,availability_365,availability_365,availability_365,availability_365,availability_365,availability_365,availability_365,availability_365
room_type,count,mean,std,min,25%,50%,75%,max,count,mean,...,75%,max,count,mean,std,min,25%,50%,75%,max
Entire home/apt,13199.0,18668600.0,11370150.0,2595.0,8833426.5,19011527.0,29848697.5,36485431.0,13199.0,65576970.0,...,2.0,327.0,13199.0,117.140996,134.282211,0.0,0.0,42.0,245.0,365.0
Private room,7982.0,18807590.0,10823340.0,3647.0,9585945.0,19049897.0,28935059.0,36487245.0,7982.0,69823140.0,...,2.0,327.0,7982.0,101.845026,128.367346,0.0,0.0,29.0,188.0,365.0
Shared room,480.0,21156150.0,10950690.0,12048.0,11698076.0,22350176.0,31091598.5,36485609.0,480.0,96667200.0,...,4.0,28.0,480.0,138.572917,146.525946,0.0,0.0,81.0,320.0,365.0


In [6]:
# Private rooms are not the most popular. Entire homes or apartments are.
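
Counting listings measures how common a room type is, which is not quite the same as how popular it is with guests. One possible proxy for popularity, and it is only an assumption, is the average number of reviews per listing. The sketch below puts both numbers side by side on an invented `toy` frame.

```python
import pandas as pd

# Toy stand-in for air_bnb; values are invented for illustration.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 5 + ["Brooklyn"],
    "room_type": [
        "Entire home/apt", "Entire home/apt", "Entire home/apt",
        "Private room", "Private room", "Private room",
    ],
    "number_of_reviews": [10, 20, 30, 50, 40, 5],
})

manhattan = toy[toy["neighbourhood_group"] == "Manhattan"]

# listings = how many rows per room type (supply)
# mean_reviews = average reviews per listing (a rough demand proxy)
summary = manhattan.groupby("room_type")["number_of_reviews"].agg(
    listings="count",
    mean_reviews="mean",
)

print(summary)
```

In this toy data the most common type and the most reviewed type differ, which is exactly the case where the two readings of "popular" give different answers.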

### Question 3

Which hosts are the busiest based on their reviews?



In [7]:
# I noticed some NaN in the reviews_per_month, so I'm getting rid of that nonsense.
# I'm guessing those instances were caused by places that don't have reviews anyway,
# so I don't have to worry about deleting contenders.
air_bnb_full = air_bnb.dropna(axis=0)
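
One caveat with `dropna(axis=0)`: it drops a row if any column is missing, not only `reviews_per_month`. If a listing has reviews but a blank name, it disappears too. A narrower alternative is the `subset` argument, sketched here on an invented `toy` frame, together with a check of the guess that missing `reviews_per_month` only occurs where `number_of_reviews` is 0.

```python
import numpy as np
import pandas as pd

# Toy stand-in for air_bnb; values are invented for illustration.
toy = pd.DataFrame({
    "name": ["Loft", None, "Studio"],
    "number_of_reviews": [12, 3, 0],
    "reviews_per_month": [1.5, 0.4, np.nan],
})

# Drops every row with a NaN in any column (also loses the unnamed listing).
dropped_any = toy.dropna(axis=0)

# Drops only rows where reviews_per_month itself is missing.
dropped_subset = toy.dropna(subset=["reviews_per_month"])

# Check the assumption: missing reviews_per_month only where there are no reviews.
only_unreviewed = (
    toy.loc[toy["reviews_per_month"].isna(), "number_of_reviews"].eq(0).all()
)

print(len(dropped_any), len(dropped_subset), only_unreviewed)
```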

In [8]:
# I want to make sure that there's not a duplicate host id for any reason.
# It turns out there is. I'll have to group by host_id.
air_bnb_full.host_id.count() == air_bnb_full.host_id.nunique()

False

In [9]:
air_bnb_full.groupby(['host_id'])['reviews_per_month'].sum().sort_values(ascending = False)

host_id
219517861    397.56
244361589    111.72
232251881     80.63
26432133      68.02
137274917     62.89
              ...  
10071119       0.01
2840710        0.01
7919277        0.01
72747          0.01
4350748        0.01
Name: reviews_per_month, Length: 30232, dtype: float64

In [10]:
air_bnb_full[air_bnb_full['host_id'] == 219517861]

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
38294,30181945,Sonder | 180 Water | Premier 1BR + Rooftop,219517861,Sonder (NYC),Manhattan,Financial District,40.70771,-74.00641,Entire home/apt,229,29,1,2019-05-29,0.73,327,219
38588,30347708,Sonder | 180 Water | Charming 1BR + Rooftop,219517861,Sonder (NYC),Manhattan,Financial District,40.70743,-74.00443,Entire home/apt,232,29,1,2019-05-21,0.60,327,159
39769,30937590,Sonder | The Nash | Artsy 1BR + Rooftop,219517861,Sonder (NYC),Manhattan,Murray Hill,40.74792,-73.97614,Entire home/apt,262,2,8,2019-06-09,1.86,327,91
39770,30937591,Sonder | The Nash | Lovely Studio + Rooftop,219517861,Sonder (NYC),Manhattan,Murray Hill,40.74771,-73.97528,Entire home/apt,255,2,14,2019-06-10,2.59,327,81
39771,30937594,Sonder | The Nash | Brilliant Studio + Rooftop,219517861,Sonder (NYC),Manhattan,Murray Hill,40.74845,-73.97446,Entire home/apt,245,2,4,2019-06-08,0.94,327,137
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
44346,34183895,Sonder | Stock Exchange | Intimate 1BR + Kitchen,219517861,Sonder (NYC),Manhattan,Financial District,40.70630,-74.01254,Entire home/apt,247,2,4,2019-06-23,2.35,327,232
44534,34284409,Sonder | 116 John | Modern Studio + Gym,219517861,Sonder (NYC),Manhattan,Financial District,40.70825,-74.00482,Entire home/apt,100,29,1,2019-06-06,0.88,327,358
44670,34341994,Sonder | 116 John | Polished 2BR + Gym,219517861,Sonder (NYC),Manhattan,Financial District,40.70781,-74.00525,Entire home/apt,179,29,1,2019-06-24,1.00,327,339
45148,34566104,Sonder | Stock Exchange | Warm Studio + Lounge,219517861,Sonder (NYC),Manhattan,Financial District,40.70598,-74.01069,Entire home/apt,222,2,1,2019-05-29,0.73,327,315


In [11]:
# Ah, there are duplicate host ids because some hosts are companies with many listings.
# It only makes sense that the most reviews per month belongs to one of these.
# EDIT: I realized later that there are also duplicate host_names because
# the csv has one row per listing, not per host.

# Sonder (NYC) is the host with the most reviews per month, and therefore probably the busiest.

### Question 4

Which neighborhood group has the highest average price?


In [12]:
round(air_bnb.groupby(['neighbourhood_group'])['price'].mean(), 2)

neighbourhood_group
Bronx             87.50
Brooklyn         124.38
Manhattan        196.88
Queens            99.52
Staten Island    114.81
Name: price, dtype: float64

In [13]:
# It looks like Manhattan has the highest average price at $196.88

### Question 5

Which neighborhood group has the highest total price?


In [14]:
round(air_bnb.groupby(['neighbourhood_group'])['price'].max(), 2)

neighbourhood_group
Bronx             2500
Brooklyn         10000
Manhattan        10000
Queens           10000
Staten Island     5000
Name: price, dtype: int64

In [15]:
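# Reading the question as the highest single listing price, there's a three-way tie:
# Brooklyn, Manhattan, and Queens each have a listing priced at $10,000.
# Note that .max() does not add prices up; a true total per group would use .sum().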
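
The cell above uses `.max()`, which returns the priciest single listing per group rather than a total. Since a total is count times mean, the earlier outputs already give a rough estimate: Manhattan is about 21,661 × 196.88 ≈ 4.26 million, Brooklyn about 20,104 × 124.38 ≈ 2.50 million, and Queens about 5,666 × 99.52 ≈ 0.56 million, so Manhattan would lead on total price as well. These are approximations because the means are rounded. The sketch below shows the `max` versus `sum` difference on an invented `toy` frame.

```python
import pandas as pd

# Toy stand-in for air_bnb; values are invented for illustration.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn", "Queens"],
    "price": [200, 150, 250, 500, 90],
})

# Compute both statistics side by side for each group.
by_group = toy.groupby("neighbourhood_group")["price"].agg(["max", "sum"])

highest_single = by_group["max"].idxmax()  # group with the priciest listing
highest_total = by_group["sum"].idxmax()   # group with the largest summed price

print(by_group)
print(highest_single, highest_total)
```

On the real data, `air_bnb.groupby('neighbourhood_group')['price'].sum()` would give the exact totals.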

### Question 6

Which 5 hosts have the highest total prices?

In [16]:
round(air_bnb.groupby(['host_id','host_name'])['price'].max().sort_values(ascending = False).head(), 2)

host_id   host_name
20582832  Kathrine     10000
5143901   Erin         10000
72390391  Jelena       10000
1235070   Olson         9999
4382127   Matt          9999
Name: price, dtype: int64

In [17]:
# First off, I felt so satisfied thinking of that command line. I'm getting more used to this!

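# Secondly, the top 5 hosts are Kathrine, Erin, Jelena, Olson, and Matt, whose priciest
# listings are $10,000 (the first three) or $9,999 (the last two).
# Note that .max() ranks hosts by their single priciest listing, not by the total of their prices.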
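
If "total prices" is read literally, the ranking needs `.sum()` per host instead of `.max()`. A minimal sketch on an invented `toy` frame, using `nlargest` to take the top rows; with the real data the call would be `nlargest(5)` on `air_bnb`.

```python
import pandas as pd

# Toy stand-in for air_bnb; hosts and prices are invented for illustration.
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 3, 3],
    "host_name": ["A", "A", "B", "C", "C", "C"],
    "price": [100, 120, 10000, 80, 90, 70],
})

# Sum of listing prices per host (host_id disambiguates repeated names).
totals = toy.groupby(["host_id", "host_name"])["price"].sum()

# Top 2 hosts by total price.
top_by_total = totals.nlargest(2)

# For comparison: top 2 hosts by their single priciest listing.
top_by_max = toy.groupby(["host_id", "host_name"])["price"].max().nlargest(2)

print(top_by_total)
print(top_by_max)
```

Host C has cheap listings but several of them, so it outranks host A on total price while losing on maximum price.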

### Question 7

Who currently has no (zero) availability with a review count of 100 or more?

In [18]:
air_bnb[(air_bnb['availability_365']==0) & (air_bnb['number_of_reviews']>100)]

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
8,5203,Cozy Clean Guest Room - Family Apt,7490,MaryEllen,Manhattan,Upper West Side,40.80178,-73.96723,Private room,79,2,118,2017-07-21,0.99,1,0
94,20913,Charming 1 bed GR8 WBurg LOCATION!,79402,Christiana,Brooklyn,Williamsburg,40.70984,-73.95775,Entire home/apt,100,5,168,2018-07-22,1.57,1,0
132,30031,NYC artists’ loft with roof deck,129352,Sol,Brooklyn,Greenpoint,40.73494,-73.95030,Private room,50,3,193,2019-05-20,1.86,1,0
174,44221,Financial District Luxury Loft,193722,Coral,Manhattan,Financial District,40.70666,-74.01374,Entire home/apt,196,3,114,2019-06-20,1.06,1,0
180,45556,"Fort Greene, Brooklyn: Center Bedroom",67778,Doug,Brooklyn,Fort Greene,40.68863,-73.97691,Private room,65,2,206,2019-06-30,1.92,2,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
29581,22705516,The Quietest Block in Manhattan :),127740507,Kathleen,Manhattan,Harlem,40.83102,-73.94181,Private room,65,2,103,2019-07-07,5.89,2,0
30461,23574142,queens get away!!,176185168,Janet,Queens,Laurelton,40.68209,-73.73662,Private room,65,1,119,2018-12-24,7.79,1,0
31250,24267706,entire sunshine of the spotless mind room,21074914,Albert,Brooklyn,Bedford-Stuyvesant,40.68234,-73.91318,Private room,49,1,102,2019-07-05,6.73,3,0
32670,25719044,COZY Room for Female Guests,40119874,Stephany,Brooklyn,Prospect-Lefferts Gardens,40.66242,-73.94417,Private room,48,1,131,2019-05-31,9.97,2,0


In [19]:
# Uh, all of these people? Am I missing something? My answer is this whole DataFrame!
# Note: the filter uses > 100, so listings with exactly 100 reviews are left out.
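
The question says "100 or more", which corresponds to `>= 100` rather than `> 100`. Because the answer is a long table, it can also help to reduce it to a listing count and a de-duplicated list of hosts. A minimal sketch on an invented `toy` frame:

```python
import pandas as pd

# Toy stand-in for air_bnb; values are invented for illustration.
toy = pd.DataFrame({
    "host_id": [1, 2, 3, 3, 4],
    "host_name": ["Ann", "Bo", "Cy", "Cy", "Di"],
    "availability_365": [0, 0, 0, 0, 12],
    "number_of_reviews": [100, 99, 150, 101, 300],
})

# Zero availability AND at least 100 reviews (note >=, not >).
mask = (toy["availability_365"] == 0) & (toy["number_of_reviews"] >= 100)

# How many listings match.
n_listings = int(mask.sum())

# Which hosts they belong to, one row per host.
booked_hosts = toy.loc[mask, ["host_id", "host_name"]].drop_duplicates()

print(n_listings)
print(booked_hosts)
```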

### Question 8

What host has the highest total of prices and where are they located?



In [20]:
air_bnb.groupby(['host_id', 'host_name', 'neighbourhood_group'])['price'].sum().sort_values(ascending = False)

host_id    host_name     neighbourhood_group
219517861  Sonder (NYC)  Manhattan              82795
107434423  Blueground    Manhattan              69741
205031545  Red Awning    Manhattan              35294
30283594   Kara          Manhattan              33581
156158778  Sally         Manhattan              29194
                                                ...  
91034542   Maureen       Manhattan                 10
205820814  Luz           Bronx                     10
52777892   Amy           Manhattan                 10
10132166   Aymeric       Brooklyn                   0
13709292   Qiuchi        Manhattan                  0
Name: price, Length: 37554, dtype: int64

In [21]:
# Unsurprisingly, that title belongs to Sonder (NYC) and they are located in Manhattan.

### Question 9

When did Danielle from Queens last receive a review?



In [22]:
air_bnb[(air_bnb['host_name']=='Danielle') & (air_bnb['neighbourhood_group']=='Queens')]['last_review']

7086     2019-07-03
16349           NaN
20403    2019-07-06
21517    2019-07-07
22068    2019-07-06
22469    2019-07-08
27021    2018-01-02
33861    2019-06-20
Name: last_review, dtype: object

In [23]:
# Of the 8 Queens listings hosted by someone named Danielle (possibly fewer than 8 people),
# the most recent review was on 2019-07-08.
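
Two details are worth checking here: `last_review` is stored as text (dtype `object`), and several rows may belong to the same host. Converting to datetimes and grouping by `host_id` handles both. This is a minimal sketch on an invented `toy` frame; the host ids are hypothetical.

```python
import pandas as pd

# Toy stand-in for air_bnb; host ids and dates are invented for illustration.
toy = pd.DataFrame({
    "host_id": [10, 10, 20],
    "host_name": ["Danielle", "Danielle", "Danielle"],
    "neighbourhood_group": ["Queens", "Queens", "Queens"],
    "last_review": ["2019-07-03", None, "2019-07-08"],
})

is_match = (toy["host_name"] == "Danielle") & (toy["neighbourhood_group"] == "Queens")
danielles = toy[is_match].copy()

# Parse the text dates; missing values become NaT and are skipped by max().
danielles["last_review"] = pd.to_datetime(danielles["last_review"])

# Most recent review for each distinct Danielle, and across all of them.
latest_per_host = danielles.groupby("host_id")["last_review"].max()
latest_overall = danielles["last_review"].max()

print(danielles["host_id"].nunique())
print(latest_per_host)
print(latest_overall)
```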

## Further Questions

1. Which host has the most listings?

In [44]:
air_bnb.groupby(['host_id', 'host_name']).count()['id'].sort_values(ascending=False)

host_id    host_name     
219517861  Sonder (NYC)      327
107434423  Blueground        232
30283594   Kara              121
137358866  Kazuya            103
16098958   Jeremy & Laura     96
                            ... 
13543967   Paulina             1
13541655   Michael             1
13540183   Ashley              1
13538150   Mariana             1
274321313  Kat                 1
Name: id, Length: 37439, dtype: int64

In [None]:
# Again, Sonder has the most listings with 327 instances.

2. How many listings have completely open availability?

In [47]:
air_bnb[air_bnb['availability_365']!=0].id.count()

31362

In [None]:
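# It appears that there are 31,362 listings with at least one available day.
# Note that "completely open" could also be read as availability_365 == 365, which this does not count.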
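
The filter `!= 0` counts listings with any availability at all. If "completely open" is taken to mean available all 365 days, the condition would be `== 365`. A minimal sketch on an invented `toy` frame showing both counts:

```python
import pandas as pd

# Toy stand-in for air_bnb; values are invented for illustration.
toy = pd.DataFrame({"availability_365": [0, 365, 120, 365, 0, 1]})

# Listings with at least one available day.
any_open = int((toy["availability_365"] > 0).sum())

# Listings available every day of the year.
fully_open = int((toy["availability_365"] == 365).sum())

# Share of listings with at least one available day.
share_any_open = any_open / len(toy)

print(any_open, fully_open, round(share_any_open, 2))
```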

3. What room_types have the highest review numbers?

In [55]:
air_bnb.groupby('room_type')['number_of_reviews'].sum()

room_type
Entire home/apt    580403
Private room       538346
Shared room         19256
Name: number_of_reviews, dtype: int64

In [None]:
# Entire homes or apartments have the highest number of reviews with 580,403.

# Final Conclusion

In this cell, write your final conclusion for each of the questions asked.

Also, if you uncovered some more details that were not asked above, please describe them here.
______________________________________________________________________________________________

I already described how I felt about some of the answers, but here I'll give a brief synopsis of each one:

1. The 5 groups made it apparent that we're studying NYC with this data (although that was already revealed by the name of the csv). Manhattan having the most AirBnB listings makes me believe that it is either the area with the highest population or the biggest tourist attraction.

2. Entire apartments being the most popular first made me wonder if that just meant they're better, but that's already basically inherently true. I think the true lesson to be learned here is that most of the tourists in Manhattan are willing to pay more. In retrospect, I wonder if I truly found the "most popular" by just finding the most common type.

3. This was the answer that first made me realize that not all of these AirBnB's are privately owned. Being an organization, I'm guessing that they would have better marketing, and I suspect they would push their clients to do reviews as well, which is why I think they get the most reviews.

4. It didn't surprise me that Manhattan had the highest average price because of what I already stated in #2, which is that tourists in Manhattan are possibly willing to pay more.

5. Highest price is almost always indicative of an outlier, so I don't think this takes away from the idea that Manhattan has the most money flowing. It was surprising to me that there was an exact three way tie.

6. Oh hey! These must be the three owners of the three way tie from the last problem! Nothing too meta to learn about this one, except that the other two are only a dollar off.

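7. Plenty of places with no availability have gotten over 100 reviews! They can only be a fraction of the 17,533 listings (48,895 total minus the 31,362 with some availability) that show zero availability at all. AirBnBs remain busy.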

8. Ok, ok, I get it. Sonder and Manhattan have the most AirBnB action. It's hard to deny at this point.

9. My conclusion for this one is that sometimes a question is impossible, and you have to make your own interpretation.

FQ1. Sonder, being the biggest organization, has the most listings naturally. I'm surprised that the third highest, Kara, appears to be the name of an individual. Perhaps somebody in real estate?

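FQ2. There are 31,362 listings with at least one available day. That's about 64% of the 48,895 listings, so roughly a third show no availability at all. I would've expected more availability, but it seems like AirBnBs truly are thriving in this time period.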

FQ3. If entire homes/apartments weren't the type with the most reviews, I would have been very surprised, seeing as they are the most common type.

CONCLUSION: AirBnBs were very popular when this data was taken, and especially so in Manhattan. Organizations such as Sonder tend to be the most bustling and get the most action, and apartments and houses are the most popular type of place to stay at.