# Airbnb data investigation
## Seattle Airbnb Open Data
> A sneak peek into the Airbnb activity in Seattle, WA, USA
### Context
> Since 2008, guests and hosts have used Airbnb to travel in a more unique, personalized way. As part of the Airbnb Inside initiative, this dataset describes the listing activity of homestays in Seattle, WA.

### Content
> The following Airbnb activity is included in this Seattle dataset:

>* Listings, including full descriptions and average review score
>* Reviews, including unique id for each reviewer and detailed comments
>* Calendar, including listing id and the price and availability for that day

### Acknowledgement
> This dataset is part of Airbnb Inside, and the original source can be found [here](http://insideairbnb.com/get-the-data.html).

## Boston Airbnb Open Data
> A sneak peek into the Airbnb activity in Boston, MA, USA
### Context
> Since 2008, guests and hosts have used Airbnb to travel in a more unique, personalized way. As part of the Airbnb Inside initiative, this dataset describes the listing activity of homestays in Boston, MA.

### Content
> The following Airbnb activity is included in this Boston dataset:

>* Listings, including full descriptions and average review score
>* Reviews, including unique id for each reviewer and detailed comments
>* Calendar, including listing id and the price and availability for that day

### Acknowledgement
> This dataset is part of Airbnb Inside, and the original source can be found [here](http://insideairbnb.com/get-the-data.html).

# Load data

In [10]:
# data location
%ls ../../Datasets

Boston Airbnb Open Data/        Dataset of USED CARS.zip
Boston Airbnb Open Data.zip     Netflix_movie_and_TV_shows.csv
Car Sales.xlsx - car_data.csv   Netflix_movie_and_TV_shows.zip
Car sales report.zip            Seattle_Airbnb/
Dataset of USED CARS.csv        Seattle_Airbnb.zip


In [11]:
# set data location
data_dir = '../../Datasets/'
boston_dir = data_dir+"Boston Airbnb Open Data/"
seattle_dir = data_dir+'Seattle_Airbnb/'

In [12]:
import os
# all boston datasets and seattle datasets
bs_all,sa_all = [],[]
for root,dirs,files in os.walk(boston_dir):
    for file in files:
        bs_all.append(os.path.join(root,file))
for root,dirs,files in os.walk(seattle_dir):
    for file in files:
        sa_all.append(os.path.join(root,file))

In [13]:
bs_all

['../../Datasets/Boston Airbnb Open Data/reviews.csv',
 '../../Datasets/Boston Airbnb Open Data/listings.csv',
 '../../Datasets/Boston Airbnb Open Data/calendar.csv']

In [14]:
sa_all

['../../Datasets/Seattle_Airbnb/reviews.csv',
 '../../Datasets/Seattle_Airbnb/listings.csv',
 '../../Datasets/Seattle_Airbnb/calendar.csv']

> ## Load all datasets

In [15]:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns',100)

In [16]:
tmp_df = pd.read_csv(bs_all[0])
tmp_df.head()

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,1178162,4724140,2013-05-21,4298113,Olivier,My stay at islam's place was really cool! Good...
1,1178162,4869189,2013-05-29,6452964,Charlotte,Great location for both airport and city - gre...
2,1178162,5003196,2013-06-06,6449554,Sebastian,We really enjoyed our stay at Islams house. Fr...
3,1178162,5150351,2013-06-15,2215611,Marine,The room was nice and clean and so were the co...
4,1178162,5171140,2013-06-16,6848427,Andrew,Great location. Just 5 mins walk from the Airp...


In [17]:
for col in tmp_df.columns:
    print(col,':',tmp_df[col].unique())

listing_id : [ 1178162  7246272 13658522 ...  6425405 13101775  7462268]
id : [ 4724140  4869189  5003196 ... 85797088 97264637 98550693]
date : ['2013-05-21' '2013-05-29' '2013-06-06' ... '2011-08-15' '2012-09-15'
 '2012-11-21']
reviewer_id : [ 4298113  6452964  6449554 ... 77129134 15799803 90128094]
reviewer_name : ['Olivier' 'Charlotte' 'Sebastian' ... 'Faustino' 'Kriti' 'Vid']
comments : ["My stay at islam's place was really cool! Good location, 5min away from subway, then 10min from downtown. The room was nice, all place was clean. Islam managed pretty well our arrival, even if it was last minute ;) i do recommand this place to any airbnb user :)"
 'Great location for both airport and city - great amenities in the house: Plus Islam was always very helpful even though he was away'
 "We really enjoyed our stay at Islams house. From the outside the house didn't look so inviting but the inside was very nice! Even though Islam himself was not there everything was prepared for our arri

In [18]:
tmp_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 68275 entries, 0 to 68274
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   listing_id     68275 non-null  int64 
 1   id             68275 non-null  int64 
 2   date           68275 non-null  object
 3   reviewer_id    68275 non-null  int64 
 4   reviewer_name  68275 non-null  object
 5   comments       68222 non-null  object
dtypes: int64(3), object(3)
memory usage: 3.1+ MB
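
> The summary above shows `date` stored as `object` (plain strings) and 53 reviews without a comment (68275 rows, 68222 non-null). A minimal sketch of one way to handle both, using a small made-up frame in place of `reviews.csv`; the names `reviews`, `n_missing` and `reviews_clean` are illustrative and not part of the notebook:

```python
import pandas as pd

# Toy stand-in for reviews.csv (comment text is made up for illustration)
reviews = pd.DataFrame({
    "listing_id": [1178162, 1178162, 1178162],
    "id": [4724140, 4869189, 5003196],
    "date": ["2013-05-21", "2013-05-29", "2013-06-06"],
    "comments": ["Great stay", None, "Clean room"],
})

# Parse the date strings into datetime64 so they can be sorted and grouped by period
reviews["date"] = pd.to_datetime(reviews["date"], format="%Y-%m-%d")

# Count the reviews with no comment text, then drop them
n_missing = int(reviews["comments"].isna().sum())
reviews_clean = reviews.dropna(subset=["comments"]).reset_index(drop=True)

print(reviews["date"].dtype, n_missing, len(reviews_clean))
```

> Whether to drop or keep the empty comments depends on the question being asked; dropping only makes sense if the comment text itself is analysed.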


In [19]:
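# both cities ship the same three files: 'reviews', 'listings', and 'calendar'
dict_keys = ['reviews','listings','calendar']
# create a dictionary of dataframes for both boston and seattle, keyed by file name
# (match on the name rather than the list position: os.walk returns files in arbitrary order)
dict_bs, dict_sa = {}, {}
for paths,dict_city in ((bs_all,dict_bs),(sa_all,dict_sa)):
    for path in paths:
        dict_key = os.path.splitext(os.path.basename(path))[0]
        if dict_key in dict_keys:
            dict_city[dict_key] = pd.read_csv(path)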

> ## Wrangle data

In [42]:
dict_sa['reviews'].sample(5)

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
64962,9075558,54564940,2015-11-21,16113799,Forrest,Mostafa's home was affordable and conveniently...
53569,941467,5137547,2013-06-14,4156189,Hendrik,Thank you Rochelle for great time! The room an...
50020,258571,908555,2012-02-07,1689228,Rob,Nick was awesome-- very helpful and accommodat...
39788,5056580,38709247,2015-07-18,2004337,Michelle,This place is truly a one-of-a-kind gem nestle...
6157,4022127,49995660,2015-10-08,41190929,Olivia,Our host was excellent. He made sure that we h...


In [46]:
dict_sa['listings'].sample(5)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,notes,transit,thumbnail_url,medium_url,picture_url,xl_picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,street,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,city,state,zipcode,market,smart_location,country_code,country,latitude,longitude,is_location_exact,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,square_feet,price,weekly_price,monthly_price,security_deposit,cleaning_fee,guests_included,extra_people,minimum_nights,maximum_nights,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,requires_license,license,jurisdiction_names,instant_bookable,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,reviews_per_month
1384,6973790,https://www.airbnb.com/rooms/6973790,20160104002432,2016-01-04,"Downtown, A/C, Hottub, Parking S1",This urban suite is tastefully decorated and i...,,This urban suite is tastefully decorated and i...,none,,,,https://a0.muscache.com/ac/pictures/88396130/e...,https://a0.muscache.com/im/pictures/88396130/e...,https://a0.muscache.com/ac/pictures/88396130/e...,https://a0.muscache.com/ac/pictures/88396130/e...,4411144,https://www.airbnb.com/users/show/4411144,Emma,2012-12-15,"Seattle, Washington, United States",My husband and I are travel junkies! We love B...,within a day,63%,100%,f,https://a2.muscache.com/ac/users/4411144/profi...,https://a2.muscache.com/ac/users/4411144/profi...,Belltown,9.0,9.0,"['email', 'phone', 'linkedin', 'reviews', 'kba']",t,t,"2nd Avenue, Seattle, WA 98121, United States",Belltown,Belltown,Downtown,Seattle,WA,98121,Seattle,"Seattle, WA",US,United States,47.616026,-122.348525,t,Condominium,Entire home/apt,4,1.0,1.0,2.0,Real Bed,"{TV,""Cable TV"",Internet,""Wireless Internet"",""A...",,$99.00,,,,$75.00,1,$0.00,1,1125,today,t,14,38,65,336,2016-01-04,11,2015-07-03,2015-12-13,98.0,10.0,10.0,9.0,9.0,9.0,10.0,f,,WASHINGTON,f,strict,f,t,3,1.77
1436,631445,https://www.airbnb.com/rooms/631445,20160104002432,2016-01-04,Downtown Seattle View Condo,,A few blocks between Pike Place Market and the...,A few blocks between Pike Place Market and the...,none,The neighborhood is so FUN! Located in between...,,Cabs and buses are easy to move around with. I...,https://a2.muscache.com/ac/pictures/9077564/6a...,https://a2.muscache.com/im/pictures/9077564/6a...,https://a2.muscache.com/ac/pictures/9077564/6a...,https://a2.muscache.com/ac/pictures/9077564/6a...,3139639,https://www.airbnb.com/users/show/3139639,Alicia,2012-08-02,"Seattle, Washington, United States",I live in downtown Seattle and enjoy being out...,within an hour,100%,100%,t,https://a2.muscache.com/ac/pictures/5081fc8f-c...,https://a2.muscache.com/ac/pictures/5081fc8f-c...,Belltown,1.0,1.0,"['email', 'phone', 'reviews', 'jumio']",t,t,"1st Avenue, Seattle, WA 98121, United States",Belltown,Belltown,Downtown,Seattle,WA,98121,Seattle,"Seattle, WA",US,United States,47.613483,-122.348432,t,Apartment,Entire home/apt,4,1.0,1.0,1.0,Real Bed,"{TV,""Cable TV"",Internet,""Wireless Internet"",""A...",,$165.00,"$1,100.00","$4,000.00",,$75.00,3,$0.00,2,1125,yesterday,t,29,52,82,357,2016-01-04,38,2012-08-13,2015-11-28,94.0,9.0,10.0,10.0,10.0,10.0,9.0,f,,WASHINGTON,f,strict,f,f,1,0.92
1187,685600,https://www.airbnb.com/rooms/685600,20160104002432,2016-01-04,Steps from magical Discovery Park-1,A truly magnificent location in the heart of S...,WELCOME TO SEATTLE!!! I'm a professional ja...,A truly magnificent location in the heart of S...,none,"Magnolia has an ""island"" mentality and feel, t...",Continental breakfast items provided upon requ...,"* Plentiful street parking, both in front and ...",https://a0.muscache.com/ac/pictures/34499734/6...,https://a0.muscache.com/im/pictures/34499734/6...,https://a0.muscache.com/ac/pictures/34499734/6...,https://a0.muscache.com/ac/pictures/34499734/6...,3497328,https://www.airbnb.com/users/show/3497328,Nikki,2012-09-07,"Seattle, Washington, United States",,within a day,100%,100%,f,https://a0.muscache.com/ac/users/3497328/profi...,https://a0.muscache.com/ac/users/3497328/profi...,Magnolia,4.0,4.0,"['email', 'phone', 'facebook', 'google', 'link...",t,t,"W Emerson St, Seattle, WA 98199, United States",Magnolia,Lawton Park,Magnolia,Seattle,WA,98199,Seattle,"Seattle, WA",US,United States,47.655131,-122.407818,t,House,Private room,2,1.0,1.0,1.0,Real Bed,"{TV,""Cable TV"",Internet,""Wireless Internet"",Po...",250.0,$75.00,$500.00,"$1,700.00",$100.00,$50.00,1,$10.00,2,1125,4 months ago,t,30,60,90,365,2016-01-04,8,2014-08-10,2015-05-04,100.0,10.0,9.0,10.0,10.0,10.0,10.0,f,,WASHINGTON,f,strict,f,f,4,0.47
428,7325455,https://www.airbnb.com/rooms/7325455,20160104002432,2016-01-04,"Private, Standalone Apartment!",Our brand NEW garden apartment in the heart of...,A private garden apartment with private entran...,Our brand NEW garden apartment in the heart of...,none,Convenient location to get anywhere in the Sea...,,Tons!,https://a2.muscache.com/ac/pictures/56c15f3a-a...,https://a2.muscache.com/im/pictures/56c15f3a-a...,https://a2.muscache.com/ac/pictures/56c15f3a-a...,https://a2.muscache.com/ac/pictures/56c15f3a-a...,24017361,https://www.airbnb.com/users/show/24017361,Ryan,2014-11-21,US,,within an hour,100%,100%,f,https://a2.muscache.com/ac/users/24017361/prof...,https://a2.muscache.com/ac/users/24017361/prof...,,1.0,1.0,"['email', 'phone', 'facebook', 'reviews', 'kba']",t,t,"Thackeray Place Northeast, Seattle, WA 98105, ...",,Wallingford,Other neighborhoods,Seattle,WA,98105,Seattle,"Seattle, WA",US,United States,47.657367,-122.326742,f,Bungalow,Entire home/apt,2,1.0,0.0,1.0,Real Bed,"{TV,""Wireless Internet"",Kitchen,""Free Parking ...",,$75.00,,,$500.00,$20.00,1,$0.00,1,1125,2 weeks ago,t,25,55,85,360,2016-01-04,1,2015-12-22,2015-12-22,100.0,10.0,10.0,10.0,10.0,10.0,10.0,f,,WASHINGTON,f,flexible,f,f,1,1.0
2078,278192,https://www.airbnb.com/rooms/278192,20160104002432,2016-01-04,1 BdRm BunkBed for 3 Near Downtown,Located on North Beacon Hill a great room in t...,***SHARED SPACE ***We require ALL guests stayi...,Located on North Beacon Hill a great room in t...,none,,Wifi: Campdavid p/w g14classified,"Close to Beacon Hill Light Rail station, Bus l...",https://a2.muscache.com/ac/pictures/a638089e-6...,https://a2.muscache.com/im/pictures/a638089e-6...,https://a2.muscache.com/ac/pictures/a638089e-6...,https://a2.muscache.com/ac/pictures/a638089e-6...,862329,https://www.airbnb.com/users/show/862329,Prez & Cherie,2011-07-24,"Seattle, Washington, United States",Cherie and I are Seattle Natives with a taste ...,within a few hours,89%,100%,f,https://a2.muscache.com/ac/users/862329/profil...,https://a2.muscache.com/ac/users/862329/profil...,North Beacon Hill,11.0,11.0,"['email', 'phone', 'facebook', 'reviews', 'kba']",t,t,"23rd Ave S, Seattle, WA 98144, United States",North Beacon Hill,North Beacon Hill,Beacon Hill,Seattle,WA,98144,Seattle,"Seattle, WA",US,United States,47.577415,-122.3023,t,House,Private room,3,1.0,1.0,3.0,Real Bed,"{TV,""Cable TV"",Internet,""Wireless Internet"",Ki...",,$79.00,,,$95.00,$50.00,1,$15.00,1,31,5 days ago,t,30,60,90,365,2016-01-04,14,2012-03-20,2014-07-22,95.0,9.0,8.0,10.0,10.0,8.0,9.0,f,,WASHINGTON,f,strict,t,t,11,0.3


> Check the columns and select the essential ones:
> 
> * 'id', 'neighbourhood_group_cleansed', 'host_response_time', 'host_response_rate', 'host_acceptance_rate', 'name', 'notes', 'transit', 'host_verifications', 'host_has_profile_pic', 'host_identity_verified', 'latitude', 'longitude'
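
> The column selection above could be sketched as follows. This is an assumption about how the selection might be done, not the notebook's own code: `essential_cols`, `listings` and `listings_small` are illustrative names, and the toy frame holds only a few columns of two sampled rows.

```python
import pandas as pd

# Columns named in the note above ('notes' is the column name in listings.csv)
essential_cols = [
    "id", "neighbourhood_group_cleansed", "host_response_time",
    "host_response_rate", "host_acceptance_rate", "name", "notes", "transit",
    "host_verifications", "host_has_profile_pic", "host_identity_verified",
    "latitude", "longitude",
]

# Toy stand-in for dict_sa['listings'] with a subset of its columns
listings = pd.DataFrame({
    "id": [6973790, 631445],
    "listing_url": ["https://www.airbnb.com/rooms/6973790",
                    "https://www.airbnb.com/rooms/631445"],
    "name": ["Downtown, A/C, Hottub, Parking S1", "Downtown Seattle View Condo"],
    "host_response_rate": ["63%", "100%"],
    "latitude": [47.616026, 47.613483],
    "longitude": [-122.348525, -122.348432],
})

# Keep only the essential columns that are present, so a missing column
# does not raise a KeyError
keep = [c for c in essential_cols if c in listings.columns]
listings_small = listings[keep].copy()

# Percent strings such as '63%' become fractions between 0 and 1
listings_small["host_response_rate"] = (
    pd.to_numeric(listings_small["host_response_rate"].str.rstrip("%")) / 100
)

print(listings_small)
```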

In [55]:
dict_sa['listings'].name.unique()

array(['Stylish Queen Anne Apartment',
       'Bright & Airy Queen Anne Apartment',
       'New Modern House-Amazing water view', ...,
       'Private apartment view of Lake WA',
       'Amazing View with Modern Comfort!', 'Large Lakefront Apartment'],
      dtype=object)

In [23]:
dict_sa['calendar'].sample(5)

Unnamed: 0,listing_id,date,available,price
662303,8409773,2016-07-15,t,$130.00
159988,7667990,2016-05-01,f,
77756,9812439,2016-01-15,f,
732568,714043,2016-01-17,f,
1262636,4163204,2016-04-14,t,$300.00


> The tables are large, and merging them directly produces a result too big to work with comfortably. Drop the non-essential columns and reduce the granularity of the data before merging.
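
> One way to reduce the granularity of the calendar table is to roll the daily rows up to one row per listing and month. The sketch below is an assumption about how that could look, on a made-up frame shaped like `calendar.csv`; the second price and the names `monthly`, `available_rate` and `mean_price` are illustrative.

```python
import numpy as np
import pandas as pd

# Toy stand-in for calendar.csv: daily rows, 't'/'f' flags, prices as '$' strings
calendar = pd.DataFrame({
    "listing_id": [8409773, 8409773, 7667990, 7667990],
    "date": ["2016-07-15", "2016-07-16", "2016-05-01", "2016-05-02"],
    "available": ["t", "t", "f", "f"],
    "price": ["$130.00", "$1,250.00", np.nan, np.nan],
})

# Convert each column to a type that can be aggregated
calendar["date"] = pd.to_datetime(calendar["date"])
calendar["available"] = calendar["available"].eq("t")
calendar["price"] = pd.to_numeric(
    calendar["price"].str.replace(r"[$,]", "", regex=True)
)

# One row per listing and month instead of one row per listing and day
monthly = (
    calendar
    .assign(month=calendar["date"].dt.to_period("M"))
    .groupby(["listing_id", "month"], as_index=False)
    .agg(
        days=("date", "count"),
        available_rate=("available", "mean"),
        mean_price=("price", "mean"),
    )
)

print(monthly)
```

> Unavailable days carry no price, so `mean_price` is the average over available days only and is missing for months with no available day.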

In [25]:
dict_bs['reviews'].columns

Index(['listing_id', 'id', 'date', 'reviewer_id', 'reviewer_name', 'comments'], dtype='object')

In [26]:
dict_bs['listings'].columns

Index(['id', 'listing_url', 'scrape_id', 'last_scraped', 'name', 'summary',
       'space', 'description', 'experiences_offered', 'neighborhood_overview',
       'notes', 'transit', 'access', 'interaction', 'house_rules',
       'thumbnail_url', 'medium_url', 'picture_url', 'xl_picture_url',
       'host_id', 'host_url', 'host_name', 'host_since', 'host_location',
       'host_about', 'host_response_time', 'host_response_rate',
       'host_acceptance_rate', 'host_is_superhost', 'host_thumbnail_url',
       'host_picture_url', 'host_neighbourhood', 'host_listings_count',
       'host_total_listings_count', 'host_verifications',
       'host_has_profile_pic', 'host_identity_verified', 'street',
       'neighbourhood', 'neighbourhood_cleansed',
       'neighbourhood_group_cleansed', 'city', 'state', 'zipcode', 'market',
       'smart_location', 'country_code', 'country', 'latitude', 'longitude',
       'is_location_exact', 'property_type', 'room_type', 'accommodates',
       'bathrooms',

In [27]:
dict_bs['calendar'].columns

Index(['listing_id', 'date', 'available', 'price'], dtype='object')

> Merge the dataframes for Boston

In [108]:
df_bs = dict_bs['reviews'].merge(dict_bs['listings'], how='inner', left_on='listing_id', right_on='id')
df_bs = df_bs.merge(dict_bs['calendar'], how='inner', on='listing_id')
df_bs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24920375 entries, 0 to 24920374
Columns: 104 entries, listing_id to price_y
dtypes: float64(18), int64(18), object(68)
memory usage: 19.3+ GB


> Merge the dataframes for Seattle

In [109]:
df_sa = dict_sa['reviews'].merge(dict_sa['listings'], how='inner', left_on='listing_id', right_on='id')
df_sa = df_sa.merge(dict_sa['calendar'], how='inner', on='listing_id')
df_sa.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30969885 entries, 0 to 30969884
Columns: 101 entries, listing_id to price_y
dtypes: float64(17), int64(16), object(68)
memory usage: 23.3+ GB
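
> The row counts above follow from the shape of the join: reviews and calendar both hold many rows per listing, so merging them on `listing_id` pairs every review with every calendar day of the same listing. The Boston result, 24,920,375 rows, equals 68,275 reviews × 365, which is consistent with one calendar row per listing per day for a year. A hedged sketch of an alternative, on toy data with illustrative names (`naive`, `compact`, `n_reviews`, `available_rate`), is to aggregate each table to one row per listing before merging:

```python
import pandas as pd

# Toy tables: listing 1 has three reviews, listing 2 has one; two calendar days each
reviews = pd.DataFrame({"listing_id": [1, 1, 1, 2], "id": [10, 11, 12, 13]})
listings = pd.DataFrame({"id": [1, 2], "name": ["A", "B"]})
calendar = pd.DataFrame({
    "listing_id": [1, 1, 2, 2],
    "available": [True, False, True, True],
})

# Same merge pattern as the cells above: rows multiply per listing
naive = (
    reviews
    .merge(listings, how="inner", left_on="listing_id", right_on="id",
           suffixes=("_review", "_listing"))
    .merge(calendar, how="inner", on="listing_id")
)
# 3 reviews x 2 days + 1 review x 2 days = 8 rows

# Aggregate first, so every table has exactly one row per listing
review_counts = (
    reviews.groupby("listing_id").size().rename("n_reviews").reset_index()
)
availability = (
    calendar.groupby("listing_id")["available"].mean()
    .rename("available_rate").reset_index()
)

# validate= makes pandas raise if a key is unexpectedly duplicated
compact = (
    listings.rename(columns={"id": "listing_id"})
    .merge(review_counts, on="listing_id", how="left", validate="one_to_one")
    .merge(availability, on="listing_id", how="left", validate="one_to_one")
)

print(len(naive), len(compact))
```

> The aggregated frame loses per-review and per-day detail, so this only fits questions that can be answered at the listing level.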
