# AirBnB NY Locations Data Case Study

In this final project, your task will be to take the data provided and find evidence to answer the following questions.

1. Which hosts are the busiest and why?
2. How many neighborhood groups are available and which shows up the most?
3. Are private rooms the most popular in Manhattan?
4. Which hosts are the busiest based on their reviews?
5. Which neighborhood group has the highest average price?
6. Which neighborhood group has the highest total price?
7. Which top 5 hosts have the highest total price?
8. Who currently has no (zero) availability with a review count of 100 or more?
9. What host has the highest total of prices and where are they located?
10. When did Danielle from Queens last receive a review?

You will be given **4 hours** to complete this assignment. 
**Be Advised:** I will go dark for this entire assignment period. That said, any questions you would like to ask about the data or the project **MUST** be asked before the time starts. Once the time has started, I can no longer give information.

This is to simulate what you will face when you are out in the wild.

Happy Coding!

=========================================================================================================================

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

### Exploring the data set

```python
air_bnb = pd.read_csv('AB_NYC_2019.csv')
air_bnb.head()
```

| | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2539 | Clean & quiet apt home by the park | 2787 | John | Brooklyn | Kensington | 40.64749 | -73.97237 | Private room | 149 | 1 | 9 | 2018-10-19 | 0.21 | 6 | 365 |
| 1 | 2595 | Skylit Midtown Castle | 2845 | Jennifer | Manhattan | Midtown | 40.75362 | -73.98377 | Entire home/apt | 225 | 1 | 45 | 2019-05-21 | 0.38 | 2 | 355 |
| 2 | 3647 | THE VILLAGE OF HARLEM....NEW YORK ! | 4632 | Elisabeth | Manhattan | Harlem | 40.80902 | -73.9419 | Private room | 150 | 3 | 0 | NaN | NaN | 1 | 365 |
| 3 | 3831 | Cozy Entire Floor of Brownstone | 4869 | LisaRoxanne | Brooklyn | Clinton Hill | 40.68514 | -73.95976 | Entire home/apt | 89 | 1 | 270 | 2019-07-05 | 4.64 | 1 | 194 |
| 4 | 5022 | Entire Apt: Spacious Studio/Loft by central park | 7192 | Laura | Manhattan | East Harlem | 40.79851 | -73.94399 | Entire home/apt | 80 | 10 | 9 | 2018-11-19 | 0.1 | 1 | 0 |


```python
air_bnb.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48895 entries, 0 to 48894
Data columns (total 16 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              48895 non-null  int64  
 1   name                            48879 non-null  object 
 2   host_id                         48895 non-null  int64  
 3   host_name                       48874 non-null  object 
 4   neighbourhood_group             48895 non-null  object 
 5   neighbourhood                   48895 non-null  object 
 6   latitude                        48895 non-null  float64
 7   longitude                       48895 non-null  float64
 8   room_type                       48895 non-null  object 
 9   price                           48895 non-null  int64  
 10  minimum_nights                  48895 non-null  int64  
 11  number_of_reviews               48895 non-null  int64  
 12  last_review                     
```

#### I wanted to investigate the host_ids with a NaN host_name.

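| | id | host_id | host_name |
|---|---|---|---|
| 360 | 100184 | 526653 | NaN |
| 2700 | 1449546 | 7779204 | NaN |
| 5745 | 4183989 | 919218 | NaN |
| 6075 | 4446862 | 23077718 | NaN |
| 6582 | 4763327 | 24576978 | NaN |
| 8163 | 6292866 | 32722063 | NaN |
| 8257 | 6360224 | 33134899 | NaN |
| 8852 | 6786181 | 32722063 | NaN |
| 9138 | 6992973 | 5162530 | NaN |
| 9817 | 7556587 | 39608626 | NaN |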
9817,7556587,39608626,


 - There are 21 rows with a NaN host_name
 - These rows belong to 18 unique host_ids
 - Non-null host_names can also be shared by different hosts (for example, several different hosts are named Danielle)
 - This may become an issue later if we group by host_name, so host_id is the safer grouping key

 - 48895 rows, 16 columns
 - name, host_name, last_review and reviews_per_month have null values.
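A minimal pandas sketch of the two checks above: counting NaN host_name rows and the hosts behind them, and finding names used by more than one host_id. The notebook's own code for this step is not shown, so this is an illustration under an assumption: a tiny `air_bnb` stand-in built from rows visible in the outputs of this notebook (ids, host_ids and names copied from them) rather than the full CSV.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the notebook's air_bnb frame, using rows shown in the outputs
air_bnb = pd.DataFrame({
    "id": [100184, 1449546, 6292866, 6786181, 2539, 5115372, 13151075],
    "host_id": [526653, 7779204, 32722063, 32722063, 2787, 26432133, 18051286],
    "host_name": [np.nan, np.nan, np.nan, np.nan, "John", "Danielle", "Danielle"],
})

# Rows whose host_name is missing
missing = air_bnb.loc[air_bnb["host_name"].isna(), ["id", "host_id", "host_name"]]
n_missing_rows = len(missing)                   # rows with a NaN host_name
n_missing_hosts = missing["host_id"].nunique()  # distinct hosts behind those rows

# Names used by more than one host_id (groupby drops NaN keys by default)
hosts_per_name = air_bnb.groupby("host_name")["host_id"].nunique()
shared_names = hosts_per_name[hosts_per_name > 1].index.tolist()

print(n_missing_rows, n_missing_hosts, shared_names)
```

On the full data, the same calls would be run against the frame loaded from `AB_NYC_2019.csv`.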

### Which hosts are the busiest based on their reviews?


| | host_id | host_name | sum_of_reviews | sum_of_reviews_rank |
|---|---|---|---|---|
| 21304 | 37312959 | Maya | 2273 | 1.000000 |
| 1052 | 344035 | Brooklyn& Breakfast -Len- | 2205 | 0.999973 |
| 18626 | 26432133 | Danielle | 2017 | 0.999947 |
| 20872 | 35524316 | Yasu & Akiko | 1971 | 0.999920 |
| 21921 | 40176101 | Brady | 1818 | 0.999893 |
| ... | ... | ... | ... | ... |
| 21806 | 39695769 | Avra | 0 | 0.192340 |
| 21809 | 39706334 | Erin | 0 | 0.192340 |
| 21812 | 39724060 | Jaime | 0 | 0.192340 |
| 21816 | 39731713 | Polina | 0 | 0.192340 |


| | host_id | host_name | num_of_listings | availability_365 | avg_reviews_per_month | avg_reviews_per_month_rank |
|---|---|---|---|---|---|---|
| 35152 | 228415932 | Louann | 1.0 | 134.000000 | 20.940000 | 1.000000 |
| 35807 | 244361589 | Row NYC | 9.0 | 292.555556 | 18.620000 | 0.999973 |
| 32091 | 156684502 | Nalicia | 3.0 | 25.666667 | 18.126667 | 0.999947 |
| 34510 | 217379941 | Brent | 1.0 | 28.000000 | 15.780000 | 0.999920 |
| 23486 | 47621202 | Dona | 2.0 | 253.000000 | 13.990000 | 0.999893 |
| ... | ... | ... | ... | ... | ... | ... |
| 37434 | 274273284 | Anastasia | 1.0 | 180.000000 | NaN | 0.192340 |
| 37435 | 274298453 | Adrien | 1.0 | 15.000000 | NaN | 0.192340 |
| 37436 | 274307600 | Jonathan | 1.0 | 341.000000 | NaN | 0.192340 |
| 37437 | 274311461 | Scott | 1.0 | 176.000000 | NaN | 0.192340 |


| | host_id | host_name | sum_of_reviews | sum_of_reviews_rank | num_of_listings | availability_365 | avg_reviews_per_month | avg_reviews_per_month_rank |
|---|---|---|---|---|---|---|---|---|
| 0 | 37312959 | Maya | 2273 | 1.000000 | 5.0 | 164.800000 | 10.706000 | 0.999225 |
| 1 | 344035 | Brooklyn& Breakfast -Len- | 2205 | 0.999973 | 13.0 | 286.384615 | 4.307692 | 0.955234 |
| 2 | 26432133 | Danielle | 2017 | 0.999947 | 5.0 | 288.600000 | 13.604000 | 0.999866 |
| 3 | 35524316 | Yasu & Akiko | 1971 | 0.999920 | 11.0 | 232.636364 | 3.665455 | 0.929272 |
| 4 | 40176101 | Brady | 1818 | 0.999893 | 7.0 | 48.857143 | 6.030000 | 0.987660 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 37434 | 39695769 | Avra | 0 | 0.192340 | 1.0 | 0.000000 | NaN | 0.192340 |
| 37435 | 39706334 | Erin | 0 | 0.192340 | 1.0 | 0.000000 | NaN | 0.192340 |
| 37436 | 39724060 | Jaime | 0 | 0.192340 | 1.0 | 79.000000 | NaN | 0.192340 |
| 37437 | 39731713 | Polina | 0 | 0.192340 | 1.0 | 0.000000 | NaN | 0.192340 |


 - There are 37439 unique hosts in this data set
 - Based on the total sum of reviews for each host, the following are busiest:
     - Maya, with 2273 reviews
     - Brooklyn& Breakfast -Len-, with 2205 reviews
     - Danielle, with 2017 reviews
 - Based on the average number of reviews per month for each host, the following are busiest:
     - Louann with ~21 reviews/month
     - Row NYC with ~19 reviews/month
     - Nalicia with ~18 reviews/month
 - Having the most reviews does not necessarily mean a host receives many reviews per month: none of the top three hosts by total reviews is in the top three by average reviews per month. Different factors may drive each metric. For example, the number of listings a host has may explain a high total review count, while a low minimum number of nights may explain a high monthly review rate. These are only speculations; other possible factors and their correlation with each metric should be investigated.
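The code behind the three host tables is not shown, so here is a hedged sketch of one way to build them: a per-host `groupby` with named aggregations, then `Series.rank(pct=True)` for the percentile-rank columns. The frame is a made-up toy, not real data, and it reproduces the gap described above: the host with the most total reviews is not the host with the highest review rate. Note that `rank()` leaves NaN averages unranked by default (`na_option='keep'`); the tables above still assign a rank to NaN rows, so the original presumably handled missing values differently.

```python
import pandas as pd

# Toy listings with made-up values, standing in for the notebook's air_bnb frame
air_bnb = pd.DataFrame({
    "host_id": [1, 1, 2, 3, 3, 3],
    "host_name": ["Ann", "Ann", "Bo", "Cy", "Cy", "Cy"],
    "number_of_reviews": [10, 30, 25, 0, 5, 5],
    "reviews_per_month": [1.0, 3.0, 8.0, None, 0.5, 0.5],
    "availability_365": [100, 200, 0, 365, 0, 35],
})

# One row per host: total reviews, listing count, mean availability, mean review rate
per_host = air_bnb.groupby(["host_id", "host_name"], as_index=False).agg(
    sum_of_reviews=("number_of_reviews", "sum"),
    num_of_listings=("number_of_reviews", "count"),
    availability_365=("availability_365", "mean"),
    avg_reviews_per_month=("reviews_per_month", "mean"),  # NaN values are skipped
)

# Percentile ranks in (0, 1]; tied values share their average rank by default
per_host["sum_of_reviews_rank"] = per_host["sum_of_reviews"].rank(pct=True)
per_host["avg_reviews_per_month_rank"] = per_host["avg_reviews_per_month"].rank(pct=True)

busiest_by_total = per_host.nlargest(1, "sum_of_reviews")["host_name"].item()
busiest_by_rate = per_host.nlargest(1, "avg_reviews_per_month")["host_name"].item()
print(busiest_by_total, busiest_by_rate)  # the two metrics pick different hosts
```

Grouping on `host_id` together with `host_name` keeps the name for display while hosts that share a name stay separate.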

### How many neighborhood groups are available and which shows up the most?

| | neighbourhood_group | id (listing count) |
|---|---|---|
| 0 | Manhattan | 21661 |
| 1 | Brooklyn | 20104 |
| 2 | Queens | 5666 |
| 3 | Bronx | 1091 |
| 4 | Staten Island | 373 |


 - There are 5 different neighborhood groups available. Manhattan shows up the most.
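Both parts of this answer come from two calls on the `neighbourhood_group` column: `nunique()` for the number of groups and `value_counts()` for listings per group. This is a sketch on a made-up toy Series standing in for `air_bnb["neighbourhood_group"]`.

```python
import pandas as pd

# Toy stand-in for air_bnb["neighbourhood_group"] (made-up rows)
neighbourhood_group = pd.Series(
    ["Manhattan", "Brooklyn", "Manhattan", "Queens", "Brooklyn", "Manhattan"],
    name="neighbourhood_group",
)

n_groups = neighbourhood_group.nunique()      # how many distinct groups exist
counts = neighbourhood_group.value_counts()   # listings per group, largest first
most_common = counts.idxmax()

print(n_groups, most_common)
print(counts)
```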

### Are private rooms the most popular in Manhattan?

Total number of Manhattan listings = 21661


| | room_type | id (listing count) |
|---|---|---|
| 0 | Entire home/apt | 13199 |
| 1 | Private room | 7982 |
| 2 | Shared room | 480 |


 - No. By number of listings, entire home/apt listings are the most common in the Manhattan neighbourhood group (13199, about 61% of 21661), followed by private rooms (7982, about 37%) and shared rooms (480, about 2%).
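Filtering to Manhattan and calling `value_counts()` on `room_type` gives the counts in the table, and `normalize=True` turns them into shares. A sketch on made-up toy rows standing in for `air_bnb`:

```python
import pandas as pd

# Toy rows with made-up values, standing in for air_bnb
air_bnb = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Manhattan", "Brooklyn"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Shared room", "Private room"],
})

manhattan = air_bnb[air_bnb["neighbourhood_group"] == "Manhattan"]
room_counts = manhattan["room_type"].value_counts()               # listings per room type
room_share = manhattan["room_type"].value_counts(normalize=True)  # same, as fractions

print(room_counts.idxmax(), room_share["Entire home/apt"])
```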

### Which neighborhood group has the highest average price?

| | neighbourhood_group | average_price | listings_count |
|---|---|---|---|
| 0 | Manhattan | 196.875814 | 21661 |
| 1 | Brooklyn | 124.383207 | 20104 |
| 2 | Staten Island | 114.812332 | 373 |
| 3 | Queens | 99.517649 | 5666 |
| 4 | Bronx | 87.496792 | 1091 |


 - Manhattan has the highest average listing price, at ~197 dollars.
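A table with `average_price` and `listings_count` columns can be produced with one `groupby` and named aggregations on the price column. This is a sketch, not the notebook's code, run on a made-up toy frame.

```python
import pandas as pd

# Toy rows with made-up prices, standing in for air_bnb
air_bnb = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Bronx"],
    "price": [200, 150, 100, 120, 80],
})

avg_price = (
    air_bnb.groupby("neighbourhood_group")["price"]
    .agg(average_price="mean", listings_count="count")  # one row per group
    .sort_values("average_price", ascending=False)
    .reset_index()
)
print(avg_price)
```

Keeping `listings_count` next to the mean is useful context: Staten Island ranks third by average price with only 373 listings.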

### Which neighborhood group has the highest total price?

| | neighbourhood_group | price (sum) |
|---|---|---|
| 0 | Manhattan | 4264527 |
| 1 | Brooklyn | 2500600 |
| 2 | Queens | 563867 |
| 3 | Bronx | 95459 |
| 4 | Staten Island | 42825 |


 - When comparing the total price sum of all listings in each neighbourhood_group, Manhattan has the highest total price. 
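The totals should equal average price times listing count for each group. For example, Staten Island's 114.812332 × 373 ≈ 42,825, which matches the table. A sketch of the sum plus that consistency check, on made-up toy data:

```python
import numpy as np
import pandas as pd

# Toy rows with made-up prices, standing in for air_bnb
air_bnb = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Bronx"],
    "price": [200, 150, 100, 120, 80],
})

grouped = air_bnb.groupby("neighbourhood_group")["price"]
total_price = grouped.sum().sort_values(ascending=False)  # total price per group

# Cross-check: each total equals average price times listing count
stats = grouped.agg(["mean", "count"])
consistent = np.allclose(stats["mean"] * stats["count"], total_price.reindex(stats.index))

print(total_price.idxmax(), consistent)
```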

### Which top 5 hosts have the highest total price?

| | host_id | host_name |
|---|---|---|
| 0 | 2787 | John |
| 1 | 2845 | Jennifer |
| 2 | 4632 | Elisabeth |
| 3 | 4869 | LisaRoxanne |
| 4 | 7192 | Laura |
| ... | ... | ... |
| 48884 | 274307600 | Jonathan |
| 48886 | 274311461 | Scott |
| 48888 | 274321313 | Kat |
| 48892 | 23492952 | Ilgar & Aysel |


| | host_id | total price sum | host_name |
|---|---|---|---|
| 0 | 219517861 | 82795 | Sonder (NYC) |
| 1 | 107434423 | 70331 | Blueground |
| 2 | 156158778 | 37097 | Sally |
| 3 | 205031545 | 35294 | Red Awning |
| 4 | 30283594 | 33581 | Kara |


 - Sonder (NYC), Blueground, Sally, Red Awning and Kara have the highest total price.
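The two tables above suggest a two-step approach: sum prices per `host_id`, then attach names from a deduplicated host_id/host_name lookup. A hedged sketch of that pattern on made-up toy data. Grouping by `host_id` rather than `host_name` avoids merging different hosts who share a name.

```python
import pandas as pd

# Toy rows with made-up values, standing in for air_bnb
air_bnb = pd.DataFrame({
    "host_id": [10, 10, 20, 30, 40, 50, 60, 60],
    "host_name": ["Ann", "Ann", "Bo", "Cy", "Di", "Ed", "Fay", "Fay"],
    "price": [100, 300, 350, 50, 200, 10, 90, 100],
})

# Sum listing prices per host
price_by_host = (
    air_bnb.groupby("host_id", as_index=False)["price"].sum()
    .rename(columns={"price": "total price sum"})
)

# One name per host_id, used as a lookup table
names = air_bnb[["host_id", "host_name"]].drop_duplicates("host_id")

top5 = price_by_host.nlargest(5, "total price sum").merge(names, on="host_id", how="left")
print(top5)
```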

### Who currently has no (zero) availability with a review count of 100 or more?

| | host_id | availability_365 | number_of_reviews | host_name |
|---|---|---|---|---|
| 0 | 22959695 | 0 | 1061 | Gurpreet Singh |
| 1 | 99392252 | 0 | 732 | Michael |
| 2 | 121391142 | 0 | 693 | Deloris |
| 3 | 792159 | 0 | 480 | Wanda |
| 4 | 37818581 | 0 | 432 | Sofia |
| ... | ... | ... | ... | ... |
| 143 | 26073602 | 0 | 101 | Anna |
| 144 | 84141923 | 0 | 100 | Marisha |
| 145 | 96148809 | 0 | 100 | Raymond |
| 146 | 42399786 | 0 | 100 | Braydon |




 - 148 unique hosts have at least one individual listing with 0 availability and number_of_reviews >= 100.
 - Other Rangers reported 136 hosts. The difference is that they grouped by host_id, summed number_of_reviews across all of a host's listings (including listings with fewer than 100 reviews), and listed hosts with 0 availability and a summed number_of_reviews >= 100. I tried to reproduce that approach in the table below.
 - Comparing the two analyses, there are 12 hosts whose summed number of reviews is >= 100 even though none of their individual listings has 100 or more reviews.

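| | host_id | id | latitude | longitude | price | minimum_nights | number_of_reviews | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 7490 | 5203 | 40.80178 | -73.96723 | 79 | 2 | 118 | 0.99 | 1 | 0 |
| 134 | 36897 | 282443 | 40.71601 | -73.99123 | 90 | 3 | 107 | 1.17 | 1 | 0 |
| 307 | 79402 | 20913 | 40.70984 | -73.95775 | 100 | 5 | 168 | 1.57 | 1 | 0 |
| 452 | 129352 | 30031 | 40.73494 | -73.95030 | 50 | 3 | 193 | 1.86 | 1 | 0 |
| 639 | 193722 | 44221 | 40.70666 | -74.01374 | 196 | 3 | 114 | 1.06 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32017 | 155125855 | 66208914 | 122.24282 | -221.95150 | 394 | 3 | 125 | 6.53 | 9 | 0 |
| 32287 | 159156636 | 65759674 | 122.27159 | -221.97225 | 360 | 3 | 286 | 16.06 | 9 | 0 |
| 32988 | 176185168 | 23574142 | 40.68209 | -73.73662 | 65 | 1 | 119 | 7.79 | 1 | 0 |
| 33398 | 187487947 | 182096207 | 244.39486 | -443.73467 | 459 | 6 | 164 | 23.03 | 36 | 0 |

Because every column was summed per host_id, the id, latitude, longitude, price, minimum_nights, reviews_per_month and calculated_host_listings_count values are sums for hosts with several listings (for example, the latitude of 244.39486 for host 187487947 is the sum of six coordinates).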
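To see how the per-listing and per-host approaches can disagree, here is a sketch comparing both on made-up toy data. It assumes that in the second approach availability_365 is summed per host as well, as the summed columns in the table above suggest. The set differences show that hosts can drop out in either direction. A host can qualify only when its reviews are summed, or it can be dropped because one of its other listings still has availability.

```python
import pandas as pd

# Made-up listings: host 1 qualifies both ways, host 2 only when reviews are summed,
# host 3 only per listing (one of its listings still has availability)
air_bnb = pd.DataFrame({
    "host_id":           [1,   2,  2,  3,   3,  4],
    "availability_365":  [0,   0,  0,  0,   30, 0],
    "number_of_reviews": [120, 60, 50, 150, 10, 20],
})

# Approach 1: individual listings with 0 availability and >= 100 reviews
per_listing = air_bnb[(air_bnb["availability_365"] == 0) & (air_bnb["number_of_reviews"] >= 100)]
hosts_per_listing = set(per_listing["host_id"])

# Approach 2 (assumed): sum both columns per host, then filter the sums
sums = air_bnb.groupby("host_id")[["availability_365", "number_of_reviews"]].sum()
hosts_summed = set(sums[(sums["availability_365"] == 0) & (sums["number_of_reviews"] >= 100)].index)

only_summed = hosts_summed - hosts_per_listing       # reach 100 only in total
only_per_listing = hosts_per_listing - hosts_summed  # dropped by the summed filter
print(sorted(only_summed), sorted(only_per_listing))
```

Comparing the two host sets directly, rather than only their sizes, shows which hosts account for the gap between 148 and 136.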


### What host has the highest total of prices and where are they located?

array(['Manhattan'], dtype=object)

 - Sonder (NYC) has the highest sum of prices. 
 - All of its listings are in the Manhattan neighbourhood group.
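An output like `array(['Manhattan'], dtype=object)` is what `Series.unique()` returns. Applied to all of a host's rows, it lists every neighbourhood group the host appears in, so a single value means all listings are in that group. A sketch with made-up hosts:

```python
import pandas as pd

# Toy rows with made-up hosts and prices, standing in for air_bnb
air_bnb = pd.DataFrame({
    "host_id": [1, 1, 1, 2, 2],
    "host_name": ["Big Host", "Big Host", "Big Host", "Small Host", "Small Host"],
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn", "Queens"],
    "price": [300, 250, 200, 100, 90],
})

totals = air_bnb.groupby("host_id")["price"].sum()
top_host = totals.idxmax()  # host_id with the largest total

rows = air_bnb[air_bnb["host_id"] == top_host]
top_name = rows["host_name"].iloc[0]
locations = rows["neighbourhood_group"].unique()  # every group the host lists in
print(top_name, locations)
```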

### When did Danielle from Queens last receive a review?

| | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7086 | 5115372 | Comfy Room Family Home LGA Airport NO CLEANING... | 26432133 | Danielle | Queens | East Elmhurst | 40.76374 | -73.87103 | Private room | 54 | 1 | 430 | 2019-07-03 | 13.45 | 5 | 347 |
| 16349 | 13151075 | ASTORIA APARTMENT OUTDOOR SPACE | 18051286 | Danielle | Queens | Astoria | 40.77221 | -73.92901 | Private room | 50 | 1 | 0 | NaN | NaN | 1 | 0 |
| 20403 | 16276632 | Cozy Room Family Home LGA Airport NO CLEANING FEE | 26432133 | Danielle | Queens | East Elmhurst | 40.76335 | -73.87007 | Private room | 48 | 1 | 510 | 2019-07-06 | 16.22 | 5 | 341 |
| 21517 | 17222454 | Sun Room Family Home LGA Airport NO CLEANING FEE | 26432133 | Danielle | Queens | East Elmhurst | 40.76367 | -73.87088 | Private room | 48 | 1 | 417 | 2019-07-07 | 14.36 | 5 | 338 |
| 22068 | 17754072 | Bed in Family Home Near LGA Airport | 26432133 | Danielle | Queens | East Elmhurst | 40.76389 | -73.87155 | Shared room | 38 | 1 | 224 | 2019-07-06 | 7.96 | 5 | 80 |
| 22469 | 18173787 | Cute Tiny Room Family Home by LGA NO CLEANING FEE | 26432133 | Danielle | Queens | East Elmhurst | 40.7638 | -73.87238 | Private room | 48 | 1 | 436 | 2019-07-08 | 16.03 | 5 | 337 |
| 27021 | 21386105 | Quiet & clean 1br haven with balcony near the ... | 154256662 | Danielle | Queens | Astoria | 40.77134 | -73.92424 | Entire home/apt | 250 | 3 | 1 | 2018-01-02 | 0.05 | 1 | 180 |
| 33861 | 26814763 | One bedroom with full bed / 1 stop from Manhattan | 201647469 | Danielle | Queens | Long Island City | 40.74565 | -73.94699 | Private room | 108 | 2 | 13 | 2019-06-20 | 1.74 | 1 | 333 |


 - There are four hosts named Danielle in Queens (four different host_ids).
 - The Danielle with five listings (host_id 26432133) last received a review on July 8, 2019.
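Because several hosts share the name, the last review date is best taken per `host_id` after converting `last_review` to datetimes. The sketch below uses the Queens rows from the table above. It also adds one hypothetical non-Queens Danielle (host_id 999, invented for illustration) to show the filter at work.

```python
import pandas as pd

# Queens rows from the Danielle table above, plus one hypothetical non-Queens row (host 999)
air_bnb = pd.DataFrame({
    "host_id": [26432133, 18051286, 26432133, 26432133, 26432133, 26432133,
                154256662, 201647469, 999],
    "host_name": ["Danielle"] * 9,
    "neighbourhood_group": ["Queens"] * 8 + ["Brooklyn"],
    "last_review": ["2019-07-03", None, "2019-07-06", "2019-07-07", "2019-07-06",
                    "2019-07-08", "2018-01-02", "2019-06-20", "2019-07-09"],
})

mask = (air_bnb["host_name"] == "Danielle") & (air_bnb["neighbourhood_group"] == "Queens")
danielle = air_bnb[mask].copy()
danielle["last_review"] = pd.to_datetime(danielle["last_review"])  # None becomes NaT

# Most recent review per Danielle host (NaT for a host with no reviews)
last_by_host = danielle.groupby("host_id")["last_review"].max()
print(last_by_host)
```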

## Further Questions

1. Which host has the most listings?

| | host_id | number_listings | host_name |
|---|---|---|---|
| 0 | 219517861 | 327 | Sonder (NYC) |
| 1 | 107434423 | 232 | Blueground |
| 2 | 30283594 | 121 | Kara |
| 3 | 137358866 | 103 | Kazuya |
| 4 | 16098958 | 96 | Jeremy & Laura |
| ... | ... | ... | ... |
| 37452 | 13540183 | 1 | Ashley |
| 37453 | 13538150 | 1 | Mariana |
| 37454 | 13535952 | 1 | Nastassia |
| 37455 | 13533446 | 1 | Daniel |


 - Extra: Sonder (NYC), Blueground, Kara, Kazuya, and Jeremy & Laura have the highest number of listings.
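Counting rows per `host_id` with `groupby(...).size()` and then attaching one name per host produces a table shaped like the one above. A sketch on made-up toy data; the notebook's exact code is not shown.

```python
import pandas as pd

# Toy rows with made-up hosts, standing in for air_bnb
air_bnb = pd.DataFrame({
    "host_id": [5, 5, 5, 7, 7, 9],
    "host_name": ["Ann", "Ann", "Ann", "Bo", "Bo", "Cy"],
})

# Count listings (rows) per host, then attach one name per host_id
listings_per_host = (
    air_bnb.groupby("host_id").size().rename("number_listings").reset_index()
    .merge(air_bnb[["host_id", "host_name"]].drop_duplicates("host_id"), on="host_id")
    .sort_values("number_listings", ascending=False, ignore_index=True)
)
print(listings_per_host.head())
```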

2. How many listings have completely open availability?

| | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2539 | Clean & quiet apt home by the park | 2787 | John | Brooklyn | Kensington | 40.64749 | -73.97237 | Private room | 149 | 1 | 9 | 2018-10-19 | 0.21 | 6 | 365 |
| 2 | 3647 | THE VILLAGE OF HARLEM....NEW YORK ! | 4632 | Elisabeth | Manhattan | Harlem | 40.80902 | -73.94190 | Private room | 150 | 3 | 0 | NaN | NaN | 1 | 365 |
| 36 | 11452 | Clean and Quiet in Brooklyn | 7355 | Vt | Brooklyn | Bedford-Stuyvesant | 40.68876 | -73.94312 | Private room | 35 | 60 | 0 | NaN | NaN | 1 | 365 |
| 38 | 11943 | Country space in the city | 45445 | Harriet | Brooklyn | Flatbush | 40.63702 | -73.96327 | Private room | 150 | 1 | 0 | NaN | NaN | 1 | 365 |
| 97 | 21644 | Upper Manhattan, New York | 82685 | Elliott | Manhattan | Harlem | 40.82803 | -73.94731 | Private room | 89 | 1 | 1 | 2018-10-09 | 0.11 | 1 | 365 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 48744 | 36415840 | A BEAUTIFUL SPACE IN HEART OF WILLIAMSBURG | 223715460 | Simon And Julian | Brooklyn | Williamsburg | 40.71091 | -73.96560 | Entire home/apt | 499 | 30 | 0 | NaN | NaN | 1 | 365 |
| 48844 | 36453952 | West Village Studio on quiet cobblestone street | 115491896 | Will | Manhattan | West Village | 40.73620 | -74.00827 | Entire home/apt | 205 | 1 | 0 | NaN | NaN | 1 | 365 |
| 48868 | 36473253 | Heaven for you(only for guy) | 261338177 | Diana | Brooklyn | Gravesend | 40.59118 | -73.97119 | Shared room | 25 | 7 | 0 | NaN | NaN | 6 | 365 |
| 48880 | 36481315 | The Raccoon Artist Studio in Williamsburg New ... | 208514239 | Melki | Brooklyn | Williamsburg | 40.71232 | -73.94220 | Entire home/apt | 120 | 1 | 0 | NaN | NaN | 3 | 365 |


 - 1295 listings have completely open availability
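Completely open availability means `availability_365 == 365`. For scale, 1295 of 48895 listings is about 2.6%. A sketch of the count and share on a made-up toy column:

```python
import pandas as pd

# Toy column with made-up values, standing in for air_bnb["availability_365"]
air_bnb = pd.DataFrame({"availability_365": [365, 0, 365, 200, 12]})

fully_open = air_bnb[air_bnb["availability_365"] == 365]  # open every day of the year
n_open = len(fully_open)
share_open = n_open / len(air_bnb)
print(n_open, share_open)
```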

3. What room_types have the highest review numbers?

| | room_type | id (listing count) |
|---|---|---|
| 0 | Entire home/apt | 25409 |
| 1 | Private room | 22326 |
| 2 | Shared room | 1160 |


 - In this data set, there are 25409 entire home/apt listings, 22326 private room listings and 1160 shared room listings.

| room_type | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Entire home/apt | 25409.0 | 22.842418 | 42.408837 | 0.0 | 1.0 | 5.0 | 23.0 | 488.0 |
| Private room | 22326.0 | 24.112962 | 47.286746 | 0.0 | 1.0 | 5.0 | 25.0 | 629.0 |
| Shared room | 1160.0 | 16.6 | 34.185006 | 0.0 | 0.0 | 4.0 | 16.25 | 454.0 |


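 - Private-room listings have the highest average number of reviews per listing (~24.1), ahead of entire homes/apartments (~22.8) and shared rooms (~16.6). The medians are close (5, 5 and 4), so the gap is modest.
 - By total reviews, entire homes/apartments come first because there are more of them: 25409 × 22.842418 ≈ 580,403 reviews versus 22326 × 24.112962 ≈ 538,346 for private rooms.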
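Whether a room type "has the highest review numbers" depends on the aggregate: the mean favours private rooms, while the sum favours the more numerous entire homes/apartments. A sketch computing both side by side on made-up toy data:

```python
import pandas as pd

# Toy rows with made-up review counts, standing in for air_bnb
air_bnb = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 4 + ["Private room"] * 2 + ["Shared room"],
    "number_of_reviews": [10, 10, 10, 10, 30, 0, 2],
})

review_stats = air_bnb.groupby("room_type")["number_of_reviews"].agg(["count", "mean", "median", "sum"])
print(review_stats)
print(review_stats["mean"].idxmax(), review_stats["sum"].idxmax())  # different leaders
```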

# Final Conclusion

The data analyzed are AirBnB listings in New York City in 2019.
The data has 48895 unique listings, 37439 unique hosts and includes the following information:
 - Listing Info
     - Neighbourhood
     - Neighbourhood group
     - Geographical Coordinates
     - Room Types
     - Price
     - Minimum number of nights for reservation
     - Number of Reviews
     - Date of Last Review
     - Number of Reviews Per Month
     - Number of Days Available in a Year
 - Host Info
     - Host Name
     - Calculated Number of Listings

The questions were split into sections to create a more coherent analysis.

## Host Analysis
Which hosts are the busiest and why?
Which hosts are the busiest based on their reviews?
 - Several metrics can indicate how busy a host is.
 - One is the total number of reviews:
     - These are the top 3 hosts with the highest total number of reviews.
         - Maya, with 2273 reviews
         - Brooklyn& Breakfast -Len-, with 2205 reviews
         - Danielle, with 2017 reviews

 - Another is the average number of reviews per month:
     - These are the top 3 hosts with the highest average number of reviews per month.
         - Louann with ~21 reviews/month
         - Row NYC with ~19 reviews/month
         - Nalicia with ~18 reviews/month

 - A third is the number of listings the host has to take care of:
     - These are the top 3 hosts with the highest number of listings.
         - Sonder (NYC) with 327 listings
         - Blueground with 232 listings
         - Kara with 121 listings

Which top 5 hosts have the highest total price?
 - Among the hosts, these are the top 5 with the highest sum of listing prices
     - Sonder (NYC) with ~83k
     - Blueground with ~70k
     - Sally with ~37k
     - Red Awning with ~35k
     - Kara with ~34k

What host has the highest total of prices and where are they located?
 - Sonder (NYC) has the highest total of prices.
 - All of Sonder (NYC)'s listings are in Manhattan.

When did Danielle from Queens last receive a review?
 - There are four hosts named Danielle in Queens, each with a unique host_id, but only one has multiple listings, so we analyze the Danielle with host_id 26432133.
 - Danielle (host_id 26432133) last received a review on July 8, 2019.

## Neighborhood Group Analysis
How many neighborhood groups are available and which shows up the most?
 - Five neighborhood groups are available, listed below with the number of listings in each group.
     - Manhattan with 21661 listings
     - Brooklyn with 20104 listings
     - Queens with 5666 listings
     - Bronx with 1091 listings
     - Staten Island with 373 listings

Which neighborhood group has the highest average price?
 - Listings in Manhattan have the highest average price of ~197 dollars.
 
Which neighborhood group has the highest total price?
 - Manhattan has the highest total sum of listing prices, with ~4.3 million dollars total.

## Listing Analysis
Are private rooms the most popular in Manhattan?
 - No. Entire homes/apartments are the most common listing type in Manhattan (~13000 listings)
 - Followed by private rooms (~8000 listings)
 - And shared rooms (480 listings)

Who currently has no (zero) availability with a review count of 100 or more?
 - 148 unique hosts have at least one individual listing with 0 availability and number_of_reviews >= 100.
 - Other Rangers reported 136 hosts. The difference is that they grouped by host_id, summed number_of_reviews across all of a host's listings (including listings with fewer than 100 reviews), and listed hosts with 0 availability and a summed number_of_reviews >= 100. I reproduced that approach in the zero-availability analysis above.
 - Comparing the two analyses, there are 12 hosts whose summed number of reviews is >= 100 even though none of their individual listings has 100 or more reviews.