# Seattle Airbnbs

Data from Airbnb listings in Seattle will be used to answer the following questions:
1. How high or low is a host's earning potential?
2. What is the potential return on investment (ROI) for a given host?

### Import the goods

In [1]:
```python
# Data frameworks
import pandas as pd
import numpy as np

# Data visualization
import matplotlib.pyplot as plt

# Machine Learning library
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import Pipeline
```

In [2]:
```python
calendar = pd.read_csv("data/calendar.csv")
listings = pd.read_csv("data/listings.csv")
reviews = pd.read_csv("data/reviews.csv")
```

### Inspect the goods

In [3]:
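```python
calendar.head()
```

| | listing_id | date | available | price |
|---|---|---|---|---|
| 0 | 241032 | 2016-01-04 | t | $85.00 |
| 1 | 241032 | 2016-01-05 | t | $85.00 |
| 2 | 241032 | 2016-01-06 | f | NaN |
| 3 | 241032 | 2016-01-07 | f | NaN |
| 4 | 241032 | 2016-01-08 | f | NaN |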
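The `price` column is stored as text such as `$85.00` and is empty on unavailable nights, while `available` uses `t`/`f` flags, so neither can be used numerically as loaded. A minimal cleaning sketch, assuming every non-missing price follows the `$` plus digits pattern shown above (stripping a `,` thousands separator is an extra precaution, not something visible in this output); `clean_calendar` is a hypothetical helper, not part of the original notebook:

```python
import numpy as np
import pandas as pd


def clean_calendar(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of a calendar frame with typed columns.

    Assumes `price` holds strings like "$85.00" (NaN when the night is
    unavailable) and `available` holds "t"/"f" flags.
    """
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    # Map the t/f flags to real booleans.
    out["available"] = out["available"].map({"t": True, "f": False})
    # Drop "$" and "," before converting; missing prices stay NaN.
    out["price"] = (
        out["price"]
        .str.replace("$", "", regex=False)
        .str.replace(",", "", regex=False)
        .astype(float)
    )
    return out


# Small frame mirroring the first rows of calendar.head().
sample = pd.DataFrame({
    "listing_id": [241032] * 3,
    "date": ["2016-01-04", "2016-01-05", "2016-01-06"],
    "available": ["t", "t", "f"],
    "price": ["$85.00", "$85.00", np.nan],
})
cleaned = clean_calendar(sample)
print(cleaned.dtypes)
```

On the full data this would be `calendar = clean_calendar(calendar)`. Leaving missing prices as `NaN` rather than filling them avoids inventing nightly rates for dates the listing was not offered.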


In [4]:
```python
listings.head()
```

| | id | listing_url | scrape_id | last_scraped | name | summary | space | description | experiences_offered | neighborhood_overview | ... | review_scores_value | requires_license | license | jurisdiction_names | instant_bookable | cancellation_policy | require_guest_profile_picture | require_guest_phone_verification | calculated_host_listings_count | reviews_per_month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 241032 | https://www.airbnb.com/rooms/241032 | 20160104002432 | 2016-01-04 | Stylish Queen Anne Apartment | NaN | Make your self at home in this charming one-be... | Make your self at home in this charming one-be... | none | NaN | ... | 10.0 | f | NaN | WASHINGTON | f | moderate | f | f | 2 | 4.07 |
| 1 | 953595 | https://www.airbnb.com/rooms/953595 | 20160104002432 | 2016-01-04 | Bright & Airy Queen Anne Apartment | Chemically sensitive? We've removed the irrita... | Beautiful, hypoallergenic apartment in an extr... | Chemically sensitive? We've removed the irrita... | none | Queen Anne is a wonderful, truly functional vi... | ... | 10.0 | f | NaN | WASHINGTON | f | strict | t | t | 6 | 1.48 |
| 2 | 3308979 | https://www.airbnb.com/rooms/3308979 | 20160104002432 | 2016-01-04 | New Modern House-Amazing water view | New modern house built in 2013. Spectacular s... | Our house is modern, light and fresh with a wa... | New modern house built in 2013. Spectacular s... | none | Upper Queen Anne is a charming neighborhood fu... | ... | 10.0 | f | NaN | WASHINGTON | f | strict | f | f | 2 | 1.15 |
| 3 | 7421966 | https://www.airbnb.com/rooms/7421966 | 20160104002432 | 2016-01-04 | Queen Anne Chateau | A charming apartment that sits atop Queen Anne... | NaN | A charming apartment that sits atop Queen Anne... | none | NaN | ... | NaN | f | NaN | WASHINGTON | f | flexible | f | f | 1 | NaN |
| 4 | 278830 | https://www.airbnb.com/rooms/278830 | 20160104002432 | 2016-01-04 | Charming craftsman 3 bdm house | Cozy family craftman house in beautiful neighb... | Cozy family craftman house in beautiful neighb... | Cozy family craftman house in beautiful neighb... | none | We are in the beautiful neighborhood of Queen ... | ... | 9.0 | f | NaN | WASHINGTON | f | strict | f | f | 1 | 0.89 |
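Even in these five rows, listing 7421966 has no `review_scores_value` or `reviews_per_month`, and `license` is empty throughout. Before choosing features, it can help to see how widespread such gaps are. A minimal sketch of a per-column missing-value share; `missing_share` is a hypothetical helper and the toy frame only copies a few values from the output above:

```python
import numpy as np
import pandas as pd


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values in each column, highest first."""
    return df.isna().mean().sort_values(ascending=False)


# Toy frame built from a few cells of listings.head().
toy_listings = pd.DataFrame({
    "id": [241032, 953595, 7421966],
    "review_scores_value": [10.0, 10.0, np.nan],
    "reviews_per_month": [4.07, 1.48, np.nan],
    "license": [np.nan, np.nan, np.nan],
})
shares = missing_share(toy_listings)
print(shares)
```

On the real data, `missing_share(listings).head(20)` would list the sparsest columns; columns that are almost entirely empty (as `license` appears to be here) are candidates for dropping rather than imputing.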


In [5]:
```python
reviews.head()
```

| | listing_id | id | date | reviewer_id | reviewer_name | comments |
|---|---|---|---|---|---|---|
| 0 | 7202016 | 38917982 | 2015-07-19 | 28943674 | Bianca | Cute and cozy place. Perfect location to every... |
| 1 | 7202016 | 39087409 | 2015-07-20 | 32440555 | Frank | Kelly has a great room in a very central locat... |
| 2 | 7202016 | 39820030 | 2015-07-26 | 37722850 | Ian | Very spacious apartment, and in a great neighb... |
| 3 | 7202016 | 40813543 | 2015-08-02 | 33671805 | George | Close to Seattle Center and all it has to offe... |
| 4 | 7202016 | 41986501 | 2015-08-10 | 34959538 | Ming | Kelly was a great host and very accommodating ... |
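Each review row carries a `listing_id` and a `date`, so the number of reviews per listing and the span of review dates come out of a single `groupby`. A sketch on a small hypothetical frame: the three 7202016 dates are copied from the output above, while the 953595 review is made up for illustration; `review_summary` is a hypothetical helper:

```python
import pandas as pd


def review_summary(reviews: pd.DataFrame) -> pd.DataFrame:
    """Per-listing review count plus first and last review dates."""
    dated = reviews.assign(date=pd.to_datetime(reviews["date"]))
    # Named aggregation: one output column per (name, function) pair.
    return dated.groupby("listing_id")["date"].agg(
        n_reviews="count", first_review="min", last_review="max"
    )


toy_reviews = pd.DataFrame({
    "listing_id": [7202016, 7202016, 7202016, 953595],
    "date": ["2015-07-19", "2015-07-20", "2015-07-26", "2015-08-02"],
})
summary = review_summary(toy_reviews)
print(summary)
```

The review count is only a rough proxy for how often a listing is booked, since not every guest leaves a review.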


In [26]:
```python
# Total number of rows in each dataframe
print('Number of rows in each dataframe:')
print('calendar ==> {}'.format(len(calendar)))
print('listings ==> {}'.format(len(listings)))
print('reviews ==> {}'.format(len(reviews)))

print('')

# Number of distinct listings referenced by each dataframe
print('Number of unique listing IDs in each dataframe:')
print('calendar ==> {}'.format(calendar['listing_id'].nunique()))
print('listings ==> {}'.format(listings['id'].nunique()))
print('reviews ==> {}'.format(reviews['listing_id'].nunique()))
```

```
Number of rows in each dataframe:
calendar ==> 1393570
listings ==> 3818
reviews ==> 84849

Number of unique listing IDs in each dataframe:
calendar ==> 3818
listings ==> 3818
reviews ==> 3191
```
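The counts divide evenly: 1,393,570 calendar rows over 3,818 listings is exactly 365 rows per listing on average, which is consistent with one row per listing per day for a year starting 2016-01-04. An average alone does not prove every listing has 365 rows, so a `groupby` check is the safer test. A sketch on a hypothetical toy calendar; `rows_per_listing` is a hypothetical helper:

```python
import pandas as pd


def rows_per_listing(calendar: pd.DataFrame) -> pd.Series:
    """Number of calendar rows for each listing_id."""
    return calendar.groupby("listing_id").size()


# The arithmetic from the counts above.
print(1393570 / 3818)  # 365.0

# Hypothetical toy calendar: two listings, three days each.
toy_calendar = pd.DataFrame({
    "listing_id": [1, 1, 1, 2, 2, 2],
    "date": pd.date_range("2016-01-04", periods=3).tolist() * 2,
})
counts = rows_per_listing(toy_calendar)
print(counts.unique())  # [3]
```

On the real data, `rows_per_listing(calendar).eq(365).all()` returns `True` only if every listing has a full year of rows.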


In [27]:
```python
print(calendar['listing_id'].head())
print(reviews['listing_id'].head())
```

```
0    241032
1    241032
2    241032
3    241032
4    241032
Name: listing_id, dtype: int64
0    7202016
1    7202016
2    7202016
3    7202016
4    7202016
Name: listing_id, dtype: int64
```


In [34]:
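```python
# Unique listing IDs from calendar (row 0) and reviews (row 1), side by side
matches = pd.DataFrame([calendar['listing_id'].unique(), reviews['listing_id'].unique()])
matches
```

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 3808 | 3809 | 3810 | 3811 | 3812 | 3813 | 3814 | 3815 | 3816 | 3817 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 241032 | 953595 | 3308979 | 7421966 | 278830 | 5956968 | 1909058 | 856550 | 4948745 | 2493658 | ... | 1844791.0 | 6120046.0 | 262764.0 | 8578490.0 | 3383329.0 | 8101950.0 | 8902327.0 | 10267360.0 | 9604740.0 | 10208623.0 |
| 1 | 7202016 | 3946674 | 7833113 | 8308353 | 4277026 | 7735100 | 4701141 | 7934963 | 2934389 | 6888107 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |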
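`unique()` returns IDs in order of first appearance, so column *k* of this frame pairs the *k*-th calendar listing with the *k*-th reviewed listing, which are unrelated. Because reviews contributes only 3,191 IDs against 3,818, the last 627 cells of row 1 are `NaN`, which upcasts those columns to float; that is why row 0 shows values such as `1844791.0`. A set comparison answers the overlap question directly. A minimal sketch on toy data; `compare_ids` is a hypothetical helper:

```python
import pandas as pd


def compare_ids(left: pd.Series, right: pd.Series) -> dict:
    """Count IDs found in both series, only in left, and only in right."""
    a, b = set(left.unique()), set(right.unique())
    return {
        "in_both": len(a & b),
        "only_left": len(a - b),
        "only_right": len(b - a),
    }


# Toy data standing in for calendar['listing_id'] and reviews['listing_id'].
cal_ids = pd.Series([241032, 241032, 953595, 3308979])
rev_ids = pd.Series([953595, 953595, 3308979])
print(compare_ids(cal_ids, rev_ids))
# {'in_both': 2, 'only_left': 1, 'only_right': 0}
```

On the real data, `compare_ids(calendar['listing_id'], reviews['listing_id'])` would give 627 for `only_left` if every reviewed listing also appears in the calendar. For a per-row flag instead of counts, `listings['id'].isin(reviews['listing_id'])` marks which listings have at least one review.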
