# Airbnb data investigation
## Seattle Airbnb Open Data
> A sneak peek into the Airbnb activity in Seattle, WA, USA
### Context
> Since 2008, guests and hosts have used Airbnb to travel in a more unique, personalized way. As part of the Inside Airbnb initiative, this dataset describes the listing activity of homestays in Seattle, WA.

### Content
> The following Airbnb activity is included in this Seattle dataset:

>* Listings, including full descriptions and average review score
>* Reviews, including unique id for each reviewer and detailed comments
>* Calendar, including listing id and the price and availability for that day

### Acknowledgement
> This dataset is part of Inside Airbnb, and the original source can be found [here](http://insideairbnb.com/get-the-data.html).

## Boston Airbnb Open Data
> A sneak peek into the Airbnb activity in Boston, MA, USA
### Context
> Since 2008, guests and hosts have used Airbnb to travel in a more unique, personalized way. As part of the Inside Airbnb initiative, this dataset describes the listing activity of homestays in Boston, MA.

### Content
> The following Airbnb activity is included in this Boston dataset:

> * Listings, including full descriptions and average review score
> * Reviews, including unique id for each reviewer and detailed comments
> * Calendar, including listing id and the price and availability for that day

### Acknowledgement
> This dataset is part of Inside Airbnb, and the original source can be found [here](http://insideairbnb.com/get-the-data.html).

# Load data

In [7]:
# data location
%ls ../../Datasets

Boston Airbnb Open Data/        Dataset of USED CARS.zip
Boston Airbnb Open Data.zip     Netflix_movie_and_TV_shows.csv
Car Sales.xlsx - car_data.csv   Netflix_movie_and_TV_shows.zip
Car sales report.zip            Seattle_Airbnb/
Dataset of USED CARS.csv        Seattle_Airbnb.zip


In [60]:
# set data location
data_dir = '../../Datasets/'
boston_dir = data_dir+"Boston Airbnb Open Data/"
seattle_dir = data_dir+'Seattle_Airbnb/'

In [75]:
import os

def list_files(directory):
    """Return the path of every file found under `directory`."""
    paths = []
    for root, dirs, files in os.walk(directory):
        for file in files:
            paths.append(os.path.join(root, file))
    return paths

# all Boston datasets and all Seattle datasets
bs_all, sa_all = list_files(boston_dir), list_files(seattle_dir)

In [76]:
bs_all

['../../Datasets/Boston Airbnb Open Data/reviews.csv',
 '../../Datasets/Boston Airbnb Open Data/listings.csv',
 '../../Datasets/Boston Airbnb Open Data/calendar.csv']

In [77]:
sa_all

['../../Datasets/Seattle_Airbnb/reviews.csv',
 '../../Datasets/Seattle_Airbnb/listings.csv',
 '../../Datasets/Seattle_Airbnb/calendar.csv']

> ## Load all datasets

In [78]:
import pandas as pd
import numpy as np

In [92]:
tmp_df = pd.read_csv(bs_all[0])
tmp_df.head()

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,1178162,4724140,2013-05-21,4298113,Olivier,My stay at islam's place was really cool! Good...
1,1178162,4869189,2013-05-29,6452964,Charlotte,Great location for both airport and city - gre...
2,1178162,5003196,2013-06-06,6449554,Sebastian,We really enjoyed our stay at Islams house. Fr...
3,1178162,5150351,2013-06-15,2215611,Marine,The room was nice and clean and so were the co...
4,1178162,5171140,2013-06-16,6848427,Andrew,Great location. Just 5 mins walk from the Airp...


In [93]:
for col in tmp_df.columns:
    print(col,':',tmp_df[col].unique())

listing_id : [ 1178162  7246272 13658522 ...  6425405 13101775  7462268]
id : [ 4724140  4869189  5003196 ... 85797088 97264637 98550693]
date : ['2013-05-21' '2013-05-29' '2013-06-06' ... '2011-08-15' '2012-09-15'
 '2012-11-21']
reviewer_id : [ 4298113  6452964  6449554 ... 77129134 15799803 90128094]
reviewer_name : ['Olivier' 'Charlotte' 'Sebastian' ... 'Faustino' 'Kriti' 'Vid']
comments : ["My stay at islam's place was really cool! Good location, 5min away from subway, then 10min from downtown. The room was nice, all place was clean. Islam managed pretty well our arrival, even if it was last minute ;) i do recommand this place to any airbnb user :)"
 'Great location for both airport and city - great amenities in the house: Plus Islam was always very helpful even though he was away'
 "We really enjoyed our stay at Islams house. From the outside the house didn't look so inviting but the inside was very nice! Even though Islam himself was not there everything was prepared for our arri

In [94]:
tmp_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 68275 entries, 0 to 68274
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   listing_id     68275 non-null  int64 
 1   id             68275 non-null  int64 
 2   date           68275 non-null  object
 3   reviewer_id    68275 non-null  int64 
 4   reviewer_name  68275 non-null  object
 5   comments       68222 non-null  object
dtypes: int64(3), object(3)
memory usage: 3.1+ MB
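The `info()` output above shows two things worth handling before analysis: `date` is stored as `object` rather than a datetime, and `comments` has 53 missing values (68275 rows, 68222 non-null). A minimal sketch of one way to deal with both follows. The helper name `clean_reviews` and the three sample rows are illustrative assumptions, not part of the dataset, and dropping rows without comments is only one possible choice.

```python
import pandas as pd

def clean_reviews(reviews: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with `date` parsed and rows without comments dropped."""
    out = reviews.copy()
    # errors="coerce" turns unparseable dates into NaT instead of raising
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    # keep only rows that actually carry review text
    out = out.dropna(subset=["comments"])
    return out.reset_index(drop=True)

# hypothetical stand-in for the reviews table (values are made up)
sample_reviews = pd.DataFrame({
    "listing_id": [1, 1, 2],
    "date": ["2013-05-21", "2013-05-29", "2013-06-06"],
    "comments": ["Great stay", None, "Clean room"],
})
cleaned_reviews = clean_reviews(sample_reviews)
print(cleaned_reviews.dtypes)
```

If the review text is not needed (for example, when only counting reviews per listing), the `dropna` step can be skipped so that those 53 reviews still count.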


In [95]:
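# both cities ship the same three files: 'reviews', 'listings', and 'calendar'
dict_keys = ['reviews', 'listings', 'calendar']
# build one dictionary of DataFrames per city, keyed by file name rather than
# by list position, because os.walk does not guarantee the order of files
paths_bs = {os.path.splitext(os.path.basename(p))[0]: p for p in bs_all}
paths_sa = {os.path.splitext(os.path.basename(p))[0]: p for p in sa_all}
dict_bs = {key: pd.read_csv(paths_bs[key]) for key in dict_keys}
dict_sa = {key: pd.read_csv(paths_sa[key]) for key in dict_keys}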

> ## Wrangle data

In [122]:
dict_sa['reviews'].sample(5)

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
9537,5078129,35850548,2015-06-22,34541106,Jake,What you get is better than what you see!\nVer...
67337,1472532,50757672,2015-10-14,42067880,Fred,We never met Sid but talked with and correspon...
55302,1107845,18418587,2014-08-26,18235181,Gary,Loved staying in Owens place. A perfect settin...
11452,2016613,43073659,2015-08-17,25519145,Steven,We had a great experience dealing with the own...
48091,2978929,13630660,2014-06-01,15692815,Michael,Everything was great; no complaints. Sarah see...


In [123]:
dict_sa['listings'].sample(5)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,...,review_scores_value,requires_license,license,jurisdiction_names,instant_bookable,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,reviews_per_month
2407,7679928,https://www.airbnb.com/rooms/7679928,20160104002432,2016-01-04,Private room with Deck,Cute private bedroom with bathroom next door. ...,Private room with closet space. Great deck wit...,Cute private bedroom with bathroom next door. ...,none,,...,10.0,f,,WASHINGTON,f,flexible,f,f,1,0.6
2233,8845310,https://www.airbnb.com/rooms/8845310,20160104002432,2016-01-04,Sweet Little House,This 1909 home is full of charm (890sf) is wel...,,This 1909 home is full of charm (890sf) is wel...,none,,...,,f,,WASHINGTON,f,flexible,f,f,1,
3047,8131762,https://www.airbnb.com/rooms/8131762,20160104002432,2016-01-04,Ground Floor unit on Lake Washingon,"We have an apartment in our house, with a bedr...",,"We have an apartment in our house, with a bedr...",none,,...,,f,,WASHINGTON,f,flexible,f,f,1,
1228,7035498,https://www.airbnb.com/rooms/7035498,20160104002432,2016-01-04,Hip Flat - Great dining steps away!,Perfect city-stay local for work or play. With...,"Location, location, location! This is the perf...",Perfect city-stay local for work or play. With...,none,The Jewel Box is in a pretty perfect location....,...,9.0,f,,WASHINGTON,f,strict,f,f,2,5.71
2380,2809796,https://www.airbnb.com/rooms/2809796,20160104002432,2016-01-04,Garden Cottage near beach/downtown,Enjoy this private oasis just minutes from dow...,This bright and airy cottage has everything yo...,Enjoy this private oasis just minutes from dow...,none,,...,10.0,f,,WASHINGTON,t,moderate,f,f,1,2.28


In [121]:
dict_sa['calendar'].sample(5)

Unnamed: 0,listing_id,date,available,price
1318080,3994634,2016-03-09,t,$31.00
315260,6675927,2016-09-25,f,
1132833,670021,2016-08-29,f,
175462,7839535,2016-09-22,f,
1055364,6305798,2016-06-01,t,$134.00
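The calendar sample shows `price` stored as text such as `$31.00`, empty on days when the listing is unavailable, and `available` stored as `t`/`f` flags. A minimal sketch of converting these to numeric and boolean types follows. The helper name `clean_calendar` and the sample rows are illustrative assumptions; the thousands separator in `$1,134.00` is also an assumption about how larger prices might be written.

```python
import pandas as pd

def clean_calendar(calendar: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with typed `date`, `available`, and `price` columns."""
    out = calendar.copy()
    out["date"] = pd.to_datetime(out["date"])
    # 't' / 'f' flags become real booleans
    out["available"] = out["available"].map({"t": True, "f": False})
    # strip the currency symbol and any thousands separator, then parse;
    # missing prices stay missing (NaN)
    out["price"] = pd.to_numeric(
        out["price"].str.replace(r"[$,]", "", regex=True)
    )
    return out

# hypothetical stand-in for the calendar table (values are made up)
sample_calendar = pd.DataFrame({
    "listing_id": [10, 10, 11],
    "date": ["2016-03-09", "2016-03-10", "2016-06-01"],
    "available": ["t", "f", "t"],
    "price": ["$31.00", None, "$1,134.00"],
})
cleaned_calendar = clean_calendar(sample_calendar)
print(cleaned_calendar)
```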


> Merge the DataFrames for Boston

In [108]:
df_bs = dict_bs['reviews'].merge(dict_bs['listings'], how='inner', left_on='listing_id', right_on='id')
df_bs = df_bs.merge(dict_bs['calendar'], how='inner', on='listing_id')
df_bs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24920375 entries, 0 to 24920374
Columns: 104 entries, listing_id to price_y
dtypes: float64(18), int64(18), object(68)
memory usage: 19.3+ GB
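The row count above deserves a second look. The reviews table has 68,275 rows, and 24,920,375 = 68,275 × 365, which is consistent with every review being paired with 365 calendar rows of the same listing. An inner merge on a key that repeats on both sides produces, for each key, the product of the two row counts. The sketch below estimates that size before merging; the function name `estimate_merge_rows` and the toy frames are illustrative assumptions.

```python
import pandas as pd

def estimate_merge_rows(left: pd.DataFrame, right: pd.DataFrame, key: str) -> int:
    """Row count an inner merge on `key` would produce, without merging."""
    left_counts = left[key].value_counts()
    right_counts = right[key].value_counts()
    # per key: rows on the left times rows on the right;
    # keys present on only one side contribute 0
    return int(left_counts.mul(right_counts, fill_value=0).sum())

# toy data: listing 1 has two reviews, listing 2 has one, listing 3 has none
toy_reviews = pd.DataFrame({"listing_id": [1, 1, 2], "comments": ["a", "b", "c"]})
# three calendar days for each of listings 1, 2, and 3
toy_calendar = pd.DataFrame({
    "listing_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "date": ["2016-01-01", "2016-01-02", "2016-01-03"] * 3,
})

estimated = estimate_merge_rows(toy_reviews, toy_calendar, "listing_id")
merged = toy_reviews.merge(toy_calendar, how="inner", on="listing_id")
print(estimated, len(merged))
```

Running a check like this on the real tables before merging would flag a 19 GB result in advance.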


> Merge the DataFrames for Seattle

In [109]:
df_sa = dict_sa['reviews'].merge(dict_sa['listings'], how='inner', left_on='listing_id', right_on='id')
df_sa = df_sa.merge(dict_sa['calendar'], how='inner', on='listing_id')
df_sa.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30969885 entries, 0 to 30969884
Columns: 101 entries, listing_id to price_y
dtypes: float64(17), int64(16), object(68)
memory usage: 23.3+ GB
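Both merged frames exceed 19 GB because reviews and calendar each hold many rows per listing and are joined on `listing_id`. One possible alternative, assuming per-listing summaries are enough for the analysis, is to aggregate each table to one row per listing first and only then join onto listings. The sketch below uses made-up toy frames; the column names `n_reviews` and `availability_rate` are hypothetical.

```python
import pandas as pd

# toy stand-ins for the three tables (values are made up)
listings = pd.DataFrame({"id": [1, 2, 3], "name": ["A", "B", "C"]})
reviews = pd.DataFrame({"listing_id": [1, 1, 2], "comments": ["x", "y", "z"]})
calendar = pd.DataFrame({
    "listing_id": [1, 1, 2, 2, 3, 3],
    "available": ["t", "f", "t", "t", "f", "f"],
})

# one row per listing: number of reviews
review_counts = (
    reviews.groupby("listing_id").size().rename("n_reviews").reset_index()
)
# one row per listing: share of calendar days marked available
availability = (
    calendar.assign(is_available=calendar["available"].eq("t"))
    .groupby("listing_id")["is_available"].mean()
    .rename("availability_rate").reset_index()
)

# validate="one_to_one" raises if either side repeats a key,
# so the result cannot silently multiply rows
summary = (
    listings
    .merge(review_counts, how="left", left_on="id", right_on="listing_id",
           validate="one_to_one")
    .drop(columns="listing_id")
    .merge(availability, how="left", left_on="id", right_on="listing_id",
           validate="one_to_one")
    .drop(columns="listing_id")
)
# listings without any review get a count of 0 instead of NaN
summary["n_reviews"] = summary["n_reviews"].fillna(0).astype(int)
print(summary)
```

The result keeps one row per listing, so its size is bounded by the listings table rather than by the product of reviews and calendar days.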
