# Airbnb data analysis
### Questions:
>* What is the monthly price range in each region of Boston and Seattle?
>
>* What is the busiest (most popular) time of year in each region of Boston and Seattle?
>  
>* Can we predict the likely price from the host's profile (e.g., 'neighbourhood_group_cleansed', 'host_response_time', 'host_response_rate', 'host_acceptance_rate', 'name', 'notes', 'transit', 'host_verifications', 'host_has_profile_pic', 'host_identity_verified', 'instant_bookable', 'require_guest_profile_picture', 'require_guest_phone_verification'), region, and month?

# Load data

In [1]:
# data location
%ls ../../Datasets

Boston Airbnb Open Data/        Dataset of USED CARS.zip
Boston Airbnb Open Data.zip     Netflix_movie_and_TV_shows.csv
Car Sales.xlsx - car_data.csv   Netflix_movie_and_TV_shows.zip
Car sales report.zip            Seattle_Airbnb/
Dataset of USED CARS.csv        Seattle_Airbnb.zip


In [2]:
# set data location
data_dir = '../../Datasets/'
boston_dir = data_dir+"Boston Airbnb Open Data/"
seattle_dir = data_dir+'Seattle_Airbnb/'

In [3]:
import os
# all boston datasets and seattle datasets
bs_all,sa_all = [],[]
for root,dirs,files in os.walk(boston_dir):
    for file in files:
        bs_all.append(os.path.join(root,file))
for root,dirs,files in os.walk(seattle_dir):
    for file in files:
        sa_all.append(os.path.join(root,file))

In [4]:
bs_all

['../../Datasets/Boston Airbnb Open Data/reviews.csv',
 '../../Datasets/Boston Airbnb Open Data/listings.csv',
 '../../Datasets/Boston Airbnb Open Data/calendar.csv']

In [5]:
sa_all

['../../Datasets/Seattle_Airbnb/reviews.csv',
 '../../Datasets/Seattle_Airbnb/listings.csv',
 '../../Datasets/Seattle_Airbnb/calendar.csv']

> ## Load all datasets

In [6]:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns',100)

In [7]:
tmp_df = pd.read_csv(bs_all[0])
tmp_df.head()

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,1178162,4724140,2013-05-21,4298113,Olivier,My stay at islam's place was really cool! Good...
1,1178162,4869189,2013-05-29,6452964,Charlotte,Great location for both airport and city - gre...
2,1178162,5003196,2013-06-06,6449554,Sebastian,We really enjoyed our stay at Islams house. Fr...
3,1178162,5150351,2013-06-15,2215611,Marine,The room was nice and clean and so were the co...
4,1178162,5171140,2013-06-16,6848427,Andrew,Great location. Just 5 mins walk from the Airp...


In [8]:
for col in tmp_df.columns:
    print(col,':',tmp_df[col].unique())

listing_id : [ 1178162  7246272 13658522 ...  6425405 13101775  7462268]
id : [ 4724140  4869189  5003196 ... 85797088 97264637 98550693]
date : ['2013-05-21' '2013-05-29' '2013-06-06' ... '2011-08-15' '2012-09-15'
 '2012-11-21']
reviewer_id : [ 4298113  6452964  6449554 ... 77129134 15799803 90128094]
reviewer_name : ['Olivier' 'Charlotte' 'Sebastian' ... 'Faustino' 'Kriti' 'Vid']
comments : ["My stay at islam's place was really cool! Good location, 5min away from subway, then 10min from downtown. The room was nice, all place was clean. Islam managed pretty well our arrival, even if it was last minute ;) i do recommand this place to any airbnb user :)"
 'Great location for both airport and city - great amenities in the house: Plus Islam was always very helpful even though he was away'
 "We really enjoyed our stay at Islams house. From the outside the house didn't look so inviting but the inside was very nice! Even though Islam himself was not there everything was prepared for our arri

In [9]:
tmp_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 68275 entries, 0 to 68274
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   listing_id     68275 non-null  int64 
 1   id             68275 non-null  int64 
 2   date           68275 non-null  object
 3   reviewer_id    68275 non-null  int64 
 4   reviewer_name  68275 non-null  object
 5   comments       68222 non-null  object
dtypes: int64(3), object(3)
memory usage: 3.1+ MB


In [10]:
# both datasets contain 'reviews', 'listings', and 'calendar' files, so use these names as dictionary keys
dict_keys = ['reviews','listings','calendar']
# create a dictionary of dataframes for both boston and seattle;
# load each file by name, since the order returned by os.walk is not guaranteed
dict_bs, dict_sa = {}, {}
for dict_key in dict_keys:
    dict_bs[dict_key] = pd.read_csv(os.path.join(boston_dir, dict_key+'.csv'))
    dict_sa[dict_key] = pd.read_csv(os.path.join(seattle_dir, dict_key+'.csv'))

> ## Wrangle data

In [11]:
dict_sa['reviews'].sample(5)

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
7168,2191169,25556099,2015-01-19,17218594,Samantha,"This AirBNB was a great value for the space, a..."
66443,3959460,35776417,2015-06-22,29410594,Noreen,"Beth was a welcoming and helpful host, and we ..."
32259,719233,41924369,2015-08-10,30025417,Alex,The listing description was spot on. It whole ...
12045,1606171,23565127,2014-12-04,12364968,Matt,The room os nice. Great on a short notice. Dai...
31371,6646843,42459413,2015-08-13,6865729,Elena,"We had a great time, the apartment was clean a..."


In [12]:
dict_sa['listings'].sample(5)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,notes,transit,thumbnail_url,medium_url,picture_url,xl_picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,street,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,city,state,zipcode,market,smart_location,country_code,country,latitude,longitude,is_location_exact,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,square_feet,price,weekly_price,monthly_price,security_deposit,cleaning_fee,guests_included,extra_people,minimum_nights,maximum_nights,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,requires_license,license,jurisdiction_names,instant_bookable,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,reviews_per_month
2849,9117301,https://www.airbnb.com/rooms/9117301,20160104002432,2016-01-04,Diamond 11 : Walk to Downtown,The Diamond Suite at the 11th Avenue Inn Bed a...,,The Diamond Suite at the 11th Avenue Inn Bed a...,none,,,,https://a2.muscache.com/ac/pictures/9fe1bf5f-7...,https://a2.muscache.com/im/pictures/9fe1bf5f-7...,https://a2.muscache.com/ac/pictures/9fe1bf5f-7...,https://a2.muscache.com/ac/pictures/9fe1bf5f-7...,31509,https://www.airbnb.com/users/show/31509,David,2009-08-13,"Seattle, Washington, United States",I have lived in the Capitol Hill neighborhood ...,within an hour,100%,100%,t,https://a1.muscache.com/ac/users/31509/profile...,https://a1.muscache.com/ac/users/31509/profile...,Capitol Hill,4.0,4.0,"['email', 'phone', 'facebook', 'linkedin', 're...",t,t,"11th Avenue East, Seattle, WA 98102, United St...",Capitol Hill,Broadway,Capitol Hill,Seattle,WA,98102,Seattle,"Seattle, WA",US,United States,47.620403,-122.319309,t,Apartment,Entire home/apt,2,1.0,1.0,1.0,Real Bed,"{TV,""Wireless Internet"",Kitchen,""Free Parking ...",,$99.00,,,,,1,$0.00,1,28,a week ago,t,18,18,18,61,2016-01-04,0,,,,,,,,,,f,,WASHINGTON,f,strict,f,f,1,
1365,5847934,https://www.airbnb.com/rooms/5847934,20160104002432,2016-01-04,Belltown Court Sunset Suite,Our newest and brightest addition to Sea to Sk...,Brand new to Sea to Sky's boutique collection ...,Our newest and brightest addition to Sea to Sk...,none,Live like a local in the vibrant Belltown neig...,Please note that major holidays require a 4 or...,(URL HIDDEN) Our guests also use Uber and Lyft...,https://a2.muscache.com/ac/pictures/72840060/e...,https://a2.muscache.com/im/pictures/72840060/e...,https://a2.muscache.com/ac/pictures/72840060/e...,https://a2.muscache.com/ac/pictures/72840060/e...,430709,https://www.airbnb.com/users/show/430709,Sea To Sky Rentals,2011-03-08,"Seattle, Washington, United States",Rental and Management company representing ove...,within a day,88%,100%,f,https://a2.muscache.com/ac/users/430709/profil...,https://a2.muscache.com/ac/users/430709/profil...,Belltown,36.0,36.0,"['email', 'phone', 'facebook', 'linkedin', 're...",t,t,"2nd Ave and Battery St, Seattle, WA 98121, Uni...",Belltown,Belltown,Downtown,Seattle,WA,98121,Seattle,"Seattle, WA",US,United States,47.613737,-122.347775,f,Apartment,Entire home/apt,6,2.0,2.0,3.0,Real Bed,"{TV,""Cable TV"",""Wireless Internet"",Pool,Kitche...",,$230.00,,,,$209.00,1,$0.00,2,1125,today,t,26,48,78,348,2016-01-04,10,2015-05-10,2015-12-13,98.0,10.0,10.0,10.0,10.0,10.0,9.0,f,,WASHINGTON,f,strict,t,t,31,1.25
3029,1142039,https://www.airbnb.com/rooms/1142039,20160104002432,2016-01-04,Secluded Setting in North Seattle,We offer two cozy rooms in a secluded north Se...,Cozy house in a single family home in a cottag...,We offer two cozy rooms in a secluded north Se...,none,Single family homes. Schools and activities ar...,We are vegetarian. We prefer no cooking of meat.,"Public transportation is easily 2-blocks away,...",https://a2.muscache.com/ac/pictures/30263884/4...,https://a2.muscache.com/im/pictures/30263884/4...,https://a2.muscache.com/ac/pictures/30263884/4...,https://a2.muscache.com/ac/pictures/30263884/4...,225830,https://www.airbnb.com/users/show/225830,Lee & Steve,2010-09-06,"Seattle, Washington, United States","I am an community organizer, urban farmer and ...",within a few hours,100%,100%,f,https://a0.muscache.com/ac/users/225830/profil...,https://a0.muscache.com/ac/users/225830/profil...,Victory Heights,2.0,2.0,"['email', 'phone', 'reviews', 'kba']",t,t,"19th Avenue Northeast, Seattle, WA 98115, Unit...",Victory Heights,Victory Heights,Lake City,Seattle,WA,98115,Seattle,"Seattle, WA",US,United States,47.701221,-122.30815,t,House,Private room,1,1.0,1.0,1.0,Real Bed,"{""Wireless Internet"",Kitchen,""Free Parking on ...",,$50.00,$280.00,$750.00,,,1,$0.00,1,1125,4 weeks ago,t,27,29,59,334,2016-01-04,20,2014-10-13,2015-11-17,95.0,9.0,9.0,10.0,10.0,9.0,9.0,f,,WASHINGTON,f,flexible,f,f,2,1.34
3584,2211594,https://www.airbnb.com/rooms/2211594,20160104002432,2016-01-04,Modern Townhouse in Capitol Hill,You are welcome to stay in a modern townhouse ...,This is a three story townhouse that has it al...,You are welcome to stay in a modern townhouse ...,none,"With a Walk Score of 95/100, this is a fantast...",This space is not handicap-accessible.,Public transportation is only a minute's walk ...,https://a1.muscache.com/ac/pictures/29845332/f...,https://a1.muscache.com/im/pictures/29845332/f...,https://a1.muscache.com/ac/pictures/29845332/f...,https://a1.muscache.com/ac/pictures/29845332/f...,11058653,https://www.airbnb.com/users/show/11058653,David And Amelia,2014-01-06,"Seattle, Washington, United States","Interested in sports, exercising, music, psych...",within an hour,100%,100%,f,https://a2.muscache.com/ac/users/11058653/prof...,https://a2.muscache.com/ac/users/11058653/prof...,Minor,1.0,1.0,"['email', 'phone', 'reviews', 'jumio']",t,t,"18th Avenue, Seattle, WA 98122, United States",Minor,Stevens,Capitol Hill,Seattle,WA,98122,Seattle,"Seattle, WA",US,United States,47.617818,-122.30641,t,House,Entire home/apt,4,2.5,2.0,2.0,Real Bed,"{TV,Internet,""Wireless Internet"",Kitchen,""Indo...",,$126.00,,,$500.00,$105.00,2,$50.00,2,30,2 weeks ago,t,30,60,90,365,2016-01-04,69,2014-03-11,2016-01-02,93.0,10.0,10.0,10.0,9.0,10.0,9.0,f,,WASHINGTON,f,strict,f,f,1,3.11
3063,170469,https://www.airbnb.com/rooms/170469,20160104002432,2016-01-04,Private Bed & Bath in Ballard,Your cozy room with full-sized bed includes a ...,Your private bedroom with attached bath is on ...,Your cozy room with full-sized bed includes a ...,none,"Ballard is very popular with trendy shops, bar...","* Check-in is 3pm, check-out is at Noon. * I ...","Street parking is free, but hard to find durin...",https://a2.muscache.com/ac/pictures/af2aa4a4-4...,https://a2.muscache.com/im/pictures/af2aa4a4-4...,https://a2.muscache.com/ac/pictures/af2aa4a4-4...,https://a2.muscache.com/ac/pictures/af2aa4a4-4...,756099,https://www.airbnb.com/users/show/756099,Carie,2011-06-28,"Seattle, Washington, United States","I've lived in Seattle for 25 years, am a Life ...",within an hour,100%,100%,f,https://a2.muscache.com/ac/pictures/7937d2aa-4...,https://a2.muscache.com/ac/pictures/7937d2aa-4...,Ballard,1.0,1.0,"['email', 'phone', 'facebook', 'jumio']",t,t,"Alonzo Ave NW, Seattle, WA 98117, United States",Ballard,Whittier Heights,Ballard,Seattle,WA,98117,Seattle,"Seattle, WA",US,United States,47.678605,-122.374046,t,Townhouse,Private room,2,1.0,1.0,1.0,Real Bed,"{Internet,""Wireless Internet"",""Pets live on th...",,$60.00,,,,,1,$0.00,1,730,today,t,26,56,86,359,2016-01-04,0,,,,,,,,,,f,,WASHINGTON,f,flexible,f,f,1,


> Check the columns and select the essential ones:
> 
> * 'id', 'neighbourhood_group_cleansed', 'host_response_time', 'host_response_rate', 'host_acceptance_rate', 'name', 'notes', 'transit', 'host_verifications', 'host_has_profile_pic', 'host_identity_verified', 'instant_bookable', 'require_guest_profile_picture', 'require_guest_phone_verification', 'price', 'review_scores_rating', 'review_scores_accuracy', 'review_scores_cleanliness', 'review_scores_checkin', 'review_scores_communication', 'review_scores_location', 'review_scores_value'
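
> A minimal sketch of the column selection described above. The helper `select_existing` and the tiny `demo` frame are hypothetical stand-ins, not part of the notebook; the listings file spells the notes column 'notes', so that spelling is used here. The helper reports any requested column that is absent instead of raising.

```python
import pandas as pd

# Columns listed in the note above ('notes' is the spelling used in listings.csv).
ESSENTIAL_COLS = [
    'id', 'neighbourhood_group_cleansed', 'host_response_time',
    'host_response_rate', 'host_acceptance_rate', 'name', 'notes', 'transit',
    'host_verifications', 'host_has_profile_pic', 'host_identity_verified',
    'instant_bookable', 'require_guest_profile_picture',
    'require_guest_phone_verification', 'price', 'review_scores_rating',
    'review_scores_accuracy', 'review_scores_cleanliness',
    'review_scores_checkin', 'review_scores_communication',
    'review_scores_location', 'review_scores_value',
]

def select_existing(df, wanted):
    """Return (copy of df with the wanted columns that exist, list of missing names)."""
    present = [c for c in wanted if c in df.columns]
    missing = [c for c in wanted if c not in df.columns]
    return df[present].copy(), missing

# Hypothetical stand-in for dict_sa['listings']; values echo the sample rows above.
demo = pd.DataFrame({
    'id': [9117301, 5847934],
    'listing_url': ['https://www.airbnb.com/rooms/9117301',
                    'https://www.airbnb.com/rooms/5847934'],
    'price': ['$99.00', '$230.00'],
})
subset, missing = select_existing(demo, ESSENTIAL_COLS)
print(subset.columns.tolist())
print(len(missing), 'requested columns not found')
```

> On the real data this would be called as `select_existing(dict_sa['listings'], ESSENTIAL_COLS)`, and the same for Boston.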

In [13]:
dict_sa['listings'].name.unique()

array(['Stylish Queen Anne Apartment',
       'Bright & Airy Queen Anne Apartment',
       'New Modern House-Amazing water view', ...,
       'Private apartment view of Lake WA',
       'Amazing View with Modern Comfort!', 'Large Lakefront Apartment'],
      dtype=object)

In [24]:
dict_sa['calendar'].listing_id.nunique()

3818

> The data is very large, so merging the tables directly would produce a result too big to work with. Drop the non-essential columns and reduce the granularity of the data first.
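
> One possible way to reduce the granularity is to collapse the daily calendar to one row per listing and month. This is only a sketch: `monthly_calendar` is a hypothetical helper, and it assumes that calendar prices are strings such as '$85.00' (as in the listings table) and that 'available' holds 't'/'f' flags.

```python
import pandas as pd

def monthly_calendar(calendar):
    """Collapse a daily calendar to one row per (listing_id, month)."""
    cal = calendar.copy()
    # Assumption: price looks like '$1,250.00'; strip '$' and ',' before converting.
    cal['price'] = pd.to_numeric(
        cal['price'].astype(str).str.replace(r'[$,]', '', regex=True),
        errors='coerce',  # missing or unparsable prices become NaN
    )
    cal['month'] = pd.to_datetime(cal['date']).dt.month
    # Assumption: 't' means the night is available.
    cal['available'] = cal['available'].eq('t')
    return (cal.groupby(['listing_id', 'month'], as_index=False)
               .agg(price_min=('price', 'min'),
                    price_mean=('price', 'mean'),
                    price_max=('price', 'max'),
                    available_share=('available', 'mean')))

# Hypothetical stand-in for dict_sa['calendar'].
demo = pd.DataFrame({
    'listing_id': [1, 1, 1, 2],
    'date': ['2016-01-04', '2016-01-05', '2016-02-01', '2016-01-04'],
    'available': ['t', 'f', 't', 'f'],
    'price': ['$85.00', None, '$1,250.00', None],
})
monthly = monthly_calendar(demo)
print(monthly)
```

> The minimum and maximum per month speak to the price-range question, and the share of available nights is one rough proxy for how busy a month is.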

In [15]:
dict_bs['reviews'].columns

Index(['listing_id', 'id', 'date', 'reviewer_id', 'reviewer_name', 'comments'], dtype='object')

In [16]:
dict_bs['listings'].columns

Index(['id', 'listing_url', 'scrape_id', 'last_scraped', 'name', 'summary',
       'space', 'description', 'experiences_offered', 'neighborhood_overview',
       'notes', 'transit', 'access', 'interaction', 'house_rules',
       'thumbnail_url', 'medium_url', 'picture_url', 'xl_picture_url',
       'host_id', 'host_url', 'host_name', 'host_since', 'host_location',
       'host_about', 'host_response_time', 'host_response_rate',
       'host_acceptance_rate', 'host_is_superhost', 'host_thumbnail_url',
       'host_picture_url', 'host_neighbourhood', 'host_listings_count',
       'host_total_listings_count', 'host_verifications',
       'host_has_profile_pic', 'host_identity_verified', 'street',
       'neighbourhood', 'neighbourhood_cleansed',
       'neighbourhood_group_cleansed', 'city', 'state', 'zipcode', 'market',
       'smart_location', 'country_code', 'country', 'latitude', 'longitude',
       'is_location_exact', 'property_type', 'room_type', 'accommodates',
       'bathrooms',

In [17]:
dict_bs['calendar'].columns

Index(['listing_id', 'date', 'available', 'price'], dtype='object')

> Merge the DataFrames for Boston

In [108]:
df_bs = dict_bs['reviews'].merge(dict_bs['listings'], how='inner', left_on='listing_id', right_on='id')
df_bs = df_bs.merge(dict_bs['calendar'], how='inner', on='listing_id')
df_bs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24920375 entries, 0 to 24920374
Columns: 104 entries, listing_id to price_y
dtypes: float64(18), int64(18), object(68)
memory usage: 19.3+ GB


> Merge the DataFrames for Seattle

In [109]:
df_sa = dict_sa['reviews'].merge(dict_sa['listings'], how='inner', left_on='listing_id', right_on='id')
df_sa = df_sa.merge(dict_sa['calendar'], how='inner', on='listing_id')
df_sa.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30969885 entries, 0 to 30969884
Columns: 101 entries, listing_id to price_y
dtypes: float64(17), int64(16), object(68)
memory usage: 23.3+ GB
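
> Both merges above pair every review of a listing with every calendar row of the same listing, which is why the results run to tens of millions of rows. A lighter alternative, sketched here under assumptions, is to join a listings subset onto a per-listing monthly summary instead. The frames `listings_demo` and `monthly_demo` and the helper `merge_reduced` are hypothetical; `validate='many_to_one'` asks pandas to raise an error if a listing id appears more than once on the listings side.

```python
import pandas as pd

def merge_reduced(listings_subset, monthly):
    """Attach listing attributes to a per-listing monthly summary."""
    merged = monthly.merge(
        listings_subset,
        how='inner',
        left_on='listing_id',
        right_on='id',
        validate='many_to_one',  # each listing id must be unique in listings_subset
    )
    # 'id' duplicates 'listing_id' after the join, so drop it.
    return merged.drop(columns='id')

# Hypothetical, hand-made inputs for illustration only.
listings_demo = pd.DataFrame({
    'id': [1, 2],
    'neighbourhood_group_cleansed': ['Capitol Hill', 'Downtown'],
})
monthly_demo = pd.DataFrame({
    'listing_id': [1, 1, 2],
    'month': [1, 2, 1],
    'price_mean': [85.0, 90.0, 60.0],
})
merged = merge_reduced(listings_demo, monthly_demo)
print(merged.shape)
```

> With a many-to-one join the row count cannot exceed that of the monthly summary, so the merged frame stays at roughly listings × months rather than reviews × calendar days.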
