# Airbnb Data Science Drills

Welcome! This notebook contains step-by-step pandas drills using your Airbnb Open Data dataset. Each section has instructions, space for your code, and helpful hints. Feel free to experiment and add extra cells as you learn.

---

In [2]:
# 1. Import pandas and load the data
import pandas as pd
airbnb = pd.read_csv('/Users/chandlershortlidge/Downloads/Airbnb_Open_Data.csv')
airbnb.head()  # Display first few rows




Unnamed: 0,id,NAME,host id,host_identity_verified,host name,neighbourhood group,neighbourhood,lat,long,country,...,service fee,minimum nights,number of reviews,last review,reviews per month,review rate number,calculated host listings count,availability 365,house_rules,license
0,1001254,Clean & quiet apt home by the park,80014485718,unconfirmed,Madaline,Brooklyn,Kensington,40.64749,-73.97237,United States,...,$193,10.0,9.0,10/19/2021,0.21,4.0,6.0,286.0,Clean up and treat the home the way you'd like...,
1,1002102,Skylit Midtown Castle,52335172823,verified,Jenna,Manhattan,Midtown,40.75362,-73.98377,United States,...,$28,30.0,45.0,5/21/2022,0.38,4.0,2.0,228.0,Pet friendly but please confirm with me if the...,
2,1002403,THE VILLAGE OF HARLEM....NEW YORK !,78829239556,,Elise,Manhattan,Harlem,40.80902,-73.9419,United States,...,$124,3.0,0.0,,,5.0,1.0,352.0,"I encourage you to use my kitchen, cooking and...",
3,1002755,,85098326012,unconfirmed,Garry,Brooklyn,Clinton Hill,40.68514,-73.95976,United States,...,$74,30.0,270.0,7/5/2019,4.64,4.0,1.0,322.0,,
4,1003689,Entire Apt: Spacious Studio/Loft by central park,92037596077,verified,Lyndon,Manhattan,East Harlem,40.79851,-73.94399,United States,...,$41,10.0,9.0,11/19/2018,0.1,3.0,1.0,289.0,"Please no smoking in the house, porch or on th...",


## 2. Explore the data
- What columns are in the dataset?
- How many rows are there?

*Hint: Use `airbnb.columns` and `len(airbnb)` or `airbnb.shape`.*

In [5]:
airbnb.columns


Index(['id', 'NAME', 'host id', 'host_identity_verified', 'host name',
       'neighbourhood group', 'neighbourhood', 'lat', 'long', 'country',
       'country code', 'instant_bookable', 'cancellation_policy', 'room type',
       'Construction year', 'price', 'service fee', 'minimum nights',
       'number of reviews', 'last review', 'reviews per month',
       'review rate number', 'calculated host listings count',
       'availability 365', 'house_rules', 'license'],
      dtype='object')
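
The cell above lists the columns but leaves the row-count question open. A minimal sketch on a made-up toy frame (not the real data) shows both options from the hint. On the real dataset the same calls would be made on `airbnb`; the `Length: 102599` reported in a later output of this notebook suggests that is what `len(airbnb)` returns.

```python
import pandas as pd

# Toy stand-in for the real dataset (values are made up for illustration).
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "neighbourhood": ["Kensington", "Midtown", "Harlem"],
})

n_rows, n_cols = toy.shape   # shape is a (rows, columns) tuple
print(n_rows, n_cols)        # 3 2
print(len(toy))              # 3 (len counts rows only)
```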

## 3. Data cleaning
- The `price` column is stored as strings that include a `$`.
- We need to remove dollar signs, commas, and extra whitespace before converting to integers or floats.

In [24]:
# Remove dollar signs, commas, and extra whitespace from price
airbnb["price_clean"] = airbnb["price"].str.replace("$", "", regex=False)
airbnb["price_clean"] = airbnb["price_clean"].str.replace(",", "", regex=False)
airbnb["price_clean"] = airbnb["price_clean"].str.strip()

# Convert to float (float also accommodates missing values, which int cannot)
airbnb["price_clean"] = airbnb["price_clean"].astype(float)
airbnb

Unnamed: 0,id,NAME,host id,host_identity_verified,host name,neighbourhood group,neighbourhood,lat,long,country,...,minimum nights,number of reviews,last review,reviews per month,review rate number,calculated host listings count,availability 365,house_rules,license,price_clean
0,1001254,Clean & quiet apt home by the park,80014485718,unconfirmed,Madaline,Brooklyn,Kensington,40.64749,-73.97237,United States,...,10.0,9.0,10/19/2021,0.21,4.0,6.0,286.0,Clean up and treat the home the way you'd like...,,966.0
1,1002102,Skylit Midtown Castle,52335172823,verified,Jenna,Manhattan,Midtown,40.75362,-73.98377,United States,...,30.0,45.0,5/21/2022,0.38,4.0,2.0,228.0,Pet friendly but please confirm with me if the...,,142.0
2,1002403,THE VILLAGE OF HARLEM....NEW YORK !,78829239556,,Elise,Manhattan,Harlem,40.80902,-73.94190,United States,...,3.0,0.0,,,5.0,1.0,352.0,"I encourage you to use my kitchen, cooking and...",,620.0
3,1002755,,85098326012,unconfirmed,Garry,Brooklyn,Clinton Hill,40.68514,-73.95976,United States,...,30.0,270.0,7/5/2019,4.64,4.0,1.0,322.0,,,368.0
4,1003689,Entire Apt: Spacious Studio/Loft by central park,92037596077,verified,Lyndon,Manhattan,East Harlem,40.79851,-73.94399,United States,...,10.0,9.0,11/19/2018,0.10,3.0,1.0,289.0,"Please no smoking in the house, porch or on th...",,204.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
102594,6092437,Spare room in Williamsburg,12312296767,verified,Krik,Brooklyn,Williamsburg,40.70862,-73.94651,United States,...,1.0,0.0,,,3.0,1.0,227.0,No Smoking No Parties or Events of any kind Pl...,,844.0
102595,6092990,Best Location near Columbia U,77864383453,unconfirmed,Mifan,Manhattan,Morningside Heights,40.80460,-73.96545,United States,...,1.0,1.0,7/6/2015,0.02,2.0,2.0,395.0,House rules: Guests agree to the following ter...,,837.0
102596,6093542,"Comfy, bright room in Brooklyn",69050334417,unconfirmed,Megan,Brooklyn,Park Slope,40.67505,-73.98045,United States,...,3.0,0.0,,,5.0,1.0,342.0,,,988.0
102597,6094094,Big Studio-One Stop from Midtown,11160591270,unconfirmed,Christopher,Queens,Long Island City,40.74989,-73.93777,United States,...,2.0,5.0,10/11/2015,0.10,3.0,1.0,386.0,,,546.0
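
The three `str.replace` / `str.strip` steps can also be collapsed into one pass. The sketch below is one possible alternative, not the notebook's method: it uses a regular expression plus `pd.to_numeric`, and the sample strings are made up for illustration.

```python
import pandas as pd

# Made-up sample values, including a missing one and stray whitespace.
raw = pd.Series(["$966 ", "$1,060", None, " $142"])

# Remove "$", "," and any whitespace in one pass, then convert.
# errors="coerce" turns anything unparseable into NaN instead of raising.
clean = pd.to_numeric(
    raw.str.replace(r"[$,\s]", "", regex=True),
    errors="coerce",
)
print(clean)
```

Using `errors="coerce"` is a design choice: it keeps the conversion from failing on an unexpected value, at the cost of silently producing NaN, so it is worth checking `clean.isna().sum()` afterwards.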


### A. Filter by price less than $200

In [30]:
price_200 = airbnb[airbnb["price_clean"] < 200.0]
price_200

Unnamed: 0,id,NAME,host id,host_identity_verified,host name,neighbourhood group,neighbourhood,lat,long,country,...,minimum nights,number of reviews,last review,reviews per month,review rate number,calculated host listings count,availability 365,house_rules,license,price_clean
1,1002102,Skylit Midtown Castle,52335172823,verified,Jenna,Manhattan,Midtown,40.75362,-73.98377,United States,...,30.0,45.0,5/21/2022,0.38,4.0,2.0,228.0,Pet friendly but please confirm with me if the...,,142.0
6,1004650,BlissArtsSpace!,61300605564,,Alberta,Brooklyn,Bedford-Stuyvesant,40.68688,-73.95596,United States,...,45.0,49.0,10/5/2017,0.40,5.0,1.0,224.0,Please no shoes in the house so bring slippers...,,71.0
14,1009068,Wonderful Guest Bedroom in Manhattan for SINGLES,88920244552,verified,Leonardo,Manhattan,Upper West Side,40.79826,-73.96113,United States,...,2.0,113.0,7/5/2019,0.91,3.0,1.0,68.0,,,149.0
22,1013487,CBG CtyBGd HelpsHaiti rm#1:1-4,53754237010,verified,Mary,Brooklyn,Park Slope,40.68069,-73.97706,United States,...,2.0,130.0,7/1/2019,1.09,4.0,6.0,54.0,Arrival time can be no later than 10:00PM. No ...,,133.0
35,1020667,Large B&B Style rooms,57938703679,unconfirmed,Jared,Brooklyn,Bedford-Stuyvesant,40.68310,-73.95473,United States,...,1.0,74.0,5/12/2019,0.66,,2.0,76.0,"No smoking, this is non-negotiable. No illega...",,191.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
102536,6060404,Relaxing bedroom in 6br apartment,61210811059,unconfirmed,John,Brooklyn,Bushwick,40.69034,-73.91666,United States,...,1.0,1.0,7/28/2015,0.02,1.0,1.0,73.0,Overall please treat the apartment as you woul...,,70.0
102546,6065927,LG Bright Apartment,10408461200,verified,Katie,Manhattan,Morningside Heights,40.80481,-73.96375,United States,...,21.0,4.0,1/6/2018,0.09,4.0,1.0,313.0,#NAME?,,129.0
102563,6075316,Large 1Br Apt + Balcony in LES!,59240106692,unconfirmed,Elena,Manhattan,Lower East Side,40.72115,-73.98308,United States,...,5.0,1.0,9/16/2015,0.02,2.0,1.0,133.0,,,162.0
102578,6083601,Sunny & Spacious near Central Park,41624631525,verified,Guida,Manhattan,East Harlem,40.79674,-73.94449,United States,...,5.0,4.0,1/6/2016,0.08,1.0,1.0,323.0,,,177.0
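
The filter above works by building a boolean mask and indexing with it. A small sketch on made-up data shows two details worth knowing: rows with a missing price compare as `False` and are dropped, and masks can be combined with `&` (each condition in parentheses). The `room type` values here are illustrative assumptions.

```python
import pandas as pd

# Toy data; values are invented for illustration.
toy = pd.DataFrame({
    "room type": ["Private room", "Entire home/apt", "Private room", "Shared room"],
    "price_clean": [142.0, 620.0, None, 71.0],
})

under_200 = toy["price_clean"] < 200   # boolean mask; NaN compares as False
cheap_private = toy[under_200 & (toy["room type"] == "Private room")]

print(len(toy[under_200]))   # 2
print(len(cheap_private))    # 1
```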


### B. Aggregate the data

1. What is the mean price per neighborhood? 

In [43]:
avg_price_hood = airbnb.groupby("neighbourhood")["price_clean"].mean()
avg_price_hood

neighbourhood
Allerton           636.343750
Arden Heights      804.888889
Arrochar           625.764706
Arverne            652.125561
Astoria            639.035275
                      ...    
Windsor Terrace    579.784848
Woodhaven          630.518325
Woodlawn           587.137931
Woodrow            709.333333
Woodside           634.588336
Name: price_clean, Length: 224, dtype: float64

2. What is the median price per neighborhood? 

In [None]:
median_price_hood = airbnb.groupby("neighbourhood")["price_clean"].median()
median_price_hood

neighbourhood
Allerton            630.0
Arden Heights       889.0
Arrochar            629.0
Arverne             676.0
Astoria             645.0
                    ...  
Windsor Terrace     551.0
Woodhaven           636.0
Woodlawn            684.0
Woodrow            1025.0
Woodside            639.0
Name: price_clean, Length: 224, dtype: float64
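
Mean and median can also be computed in a single `groupby` call with `.agg`, which puts them side by side for comparison. A minimal sketch on made-up data (the neighbourhood labels `A` and `B` are placeholders):

```python
import pandas as pd

# Toy data; one high price in "A" pulls its mean above its median.
toy = pd.DataFrame({
    "neighbourhood": ["A", "A", "A", "B", "B"],
    "price_clean": [100.0, 200.0, 600.0, 50.0, 150.0],
})

# One row per neighbourhood, one column per statistic.
summary = toy.groupby("neighbourhood")["price_clean"].agg(["mean", "median"])
print(summary)
```

A large gap between the two columns for a group usually points to a skewed price distribution or a few outliers in that group.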

3. Sort the mean prices from highest to lowest

In [None]:
avg_price_hood.sort_values(ascending=False)

# By default, sort_values() sorts from smallest to largest (ascending=True).
# Setting ascending=False reverses the order: largest to smallest.

neighbourhood
New Dorp                  1045.000000
Chelsea, Staten Island    1042.000000
Fort Wadsworth            1024.000000
Little Neck                817.750000
Jamaica Hills              812.904762
                             ...     
Richmondtown               373.400000
Breezy Point               309.888889
Rossville                  300.666667
Spuyten Duyvil             293.444444
Lighthouse Hill            107.666667
Name: price_clean, Length: 224, dtype: float64
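
The whole-number means at the top of this ranking suggest (though the output does not confirm it) that those neighbourhoods contain very few listings, which makes their averages unreliable. One way to check is to carry a listing count alongside the mean and rank only groups above a threshold. The sketch uses made-up data, and `MIN_LISTINGS` is a hypothetical cutoff chosen for illustration.

```python
import pandas as pd

# Toy data: "A" has a single, expensive listing.
toy = pd.DataFrame({
    "neighbourhood": ["A", "B", "B", "B", "C", "C"],
    "price_clean": [1000.0, 300.0, 500.0, 400.0, 600.0, 800.0],
})

stats = toy.groupby("neighbourhood")["price_clean"].agg(["mean", "count"])

MIN_LISTINGS = 2  # hypothetical threshold; tune it for the real dataset
ranked = stats[stats["count"] >= MIN_LISTINGS].sort_values("mean", ascending=False)
print(ranked)
```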

## 4. Creating a new column
- Create a new column called `price_per_min_nights` = `price_clean` / `minimum nights`.
- Show the first 5 rows of this new column.

*Hint: Use `airbnb['new_col'] = ...` syntax.*

In [55]:
airbnb["price_per_min_nights"] = airbnb["price_clean"] / airbnb["minimum nights"]

airbnb["price_per_min_nights"].min()

np.float64(-779.0)
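
A negative minimum of `-779.0` suggests that at least one listing has a negative `minimum nights` value, assuming prices themselves are positive. The sketch below, on made-up data, treats non-positive minimum stays as invalid (an assumption, not something the dataset documents) by masking them to NaN before dividing, and then shows the first rows as the drill asks.

```python
import pandas as pd

# Toy data with one negative and one zero minimum stay.
toy = pd.DataFrame({
    "price_clean": [966.0, 142.0, 779.0, 500.0],
    "minimum nights": [10.0, 30.0, -1.0, 0.0],
})

# Keep only positive minimum stays; everything else becomes NaN.
valid_nights = toy["minimum nights"].where(toy["minimum nights"] > 0)
toy["price_per_min_nights"] = toy["price_clean"] / valid_nights

print(toy["price_per_min_nights"].head())
```

Masking to NaN rather than dropping rows keeps the frame the same length, so the invalid listings can still be inspected later.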

## 6. Data cleaning
- Are there any missing values in the `reviews per month` column?
- Replace missing values in `reviews per month` with 0.

*Hint: Use `.isna().sum()` and `.fillna()`.*

In [None]:
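# fillna() returns a new Series, so the airbnb DataFrame itself is not modified here
new_reviews_per_month = airbnb["reviews per month"]
new_reviews_per_month = new_reviews_per_month.fillna(0)
# Prints True/False per row; append .sum() to get a single count of missing values
print(new_reviews_per_month.isnull())
```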

0         False
1         False
2         False
3         False
4         False
          ...  
102594    False
102595    False
102596    False
102597    False
102598    False
Name: reviews per month, Length: 102599, dtype: bool
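
To answer the first question directly, count the missing values before filling, as the hint suggests. The sketch below uses made-up data and also assigns the filled column back to the frame, which the cell above does not do.

```python
import pandas as pd

# Toy column with two missing values.
toy = pd.DataFrame({"reviews per month": [0.21, 0.38, None, 4.64, None]})

missing_before = toy["reviews per month"].isna().sum()

# Assign the result back so the DataFrame itself is updated.
toy["reviews per month"] = toy["reviews per month"].fillna(0)
missing_after = toy["reviews per month"].isna().sum()

print(missing_before, missing_after)  # 2 0
```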


## 7. Bonus drill: List comprehensions
- Make a list of host names with more than 100 reviews.
- How many unique hosts are in that list?

*Hint: Try `[name for name in airbnb[airbnb['number of reviews'] > 100]['host name']]`.*

In [None]:
# Your code here
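
One possible approach, sketched on made-up data rather than the real dataset: filter the rows first, build the list with a comprehension that skips missing names, then count distinct names with `set`.

```python
import pandas as pd

# Toy data; names and review counts are invented for illustration.
toy = pd.DataFrame({
    "host name": ["Jenna", "Garry", "Jenna", "Mary", None],
    "number of reviews": [145.0, 270.0, 113.0, 74.0, 130.0],
})

popular = toy[toy["number of reviews"] > 100]

# Skip missing host names so they are not counted as a host.
host_names = [name for name in popular["host name"] if pd.notna(name)]
unique_hosts = len(set(host_names))

print(host_names)    # ['Jenna', 'Garry', 'Jenna']
print(unique_hosts)  # 2
```

Different hosts can share a first name, so counting distinct `host id` values would be a stricter measure of unique hosts than counting names.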
