# Data Cleaning - Airbnb Listings

## Introduction

In the following notebook, I will be cleaning an aggregation of Airbnb Listings Data of the San Francisco area. This aggregation consists of listings data from 11/2018 through 10/2019.

The aggregation source code can be found [here](https://github.com/KishenSharma6/Airbnb-SF_ML_-_Text_Analysis/blob/master/Airbnb%20Raw%20Data%20Aggregation.ipynb)

Raw data can be found [here](https://github.com/KishenSharma6/Airbnb-SF_ML_-_Text_Analysis/tree/master/Data/01_Raw/SF%20Airbnb%20Raw%20Data)

In [2]:
#Read in libraries
import dask.dataframe as dd
import swifter

import pandas as pd
import pandas_profiling

import re

import numpy as np
from scipy import stats

import matplotlib.pyplot as plt
import seaborn as sns

**Set Additional Settings for Notebook**

In [3]:
#supress future warnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

#Set plot aesthetics for notebook
sns.set(style='whitegrid', palette='pastel', color_codes=True)

#Increase number of columns and rows displayed by Pandas
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows',200)

**Read in Data**

In [4]:
#Set path to get aggregated listings data
path = r'C:\Users\kishe\Documents\Data Science\Projects\Python Projects\In Progress\Air BnB - SF\SF Airbnb Raw Data - Aggregated\12_26_2019_Listings.csv'

#Read in Airbnb Listings Data
listings = pd.read_csv(path,index_col=0, low_memory=False, sep=',')


## Pandas Profiling Report

In [5]:
# #Create Pandas Profiling Report for listings data
# profile = listings.profile_report(title='Airbnb Listings Report')

# #Write profile to an HTML file
# profile.to_file(output_file="Airbnb Listings Report.html")

# #Capture rejected variables
# rejected_variables = profile.get_rejected_variables(threshold=0.9)

# #View variables that were rejected
# print('Variables rejected for having a greater collinearity than .9:', rejected_variables)

In [6]:
# #Variables rejected for having a collinearity > .9
# ['availability_60', 'availability_90', 'calculated_host_listings_count_entire_homes', 'country_code', 
# 'host_total_listings_count', 'last_scraped', 'maximum_minimum_nights', 'maximum_nights', 
# 'maximum_nights_avg_ntm', 'minimum_maximum_nights', 'minimum_minimum_nights', 'minimum_nights', 'minimum_nights_avg_ntm']

In [7]:
# #View pandas profile for Listings data
# profile

# Data Cleaning

**Preview data in original format**

In [8]:
#Print original listings shape
print('Original listings shape:',listings.shape)

#View listings head
display(listings.head())

Original listings shape: (88937, 106)


Unnamed: 0,access,accommodates,amenities,availability_30,availability_365,availability_60,availability_90,bathrooms,bed_type,bedrooms,beds,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,calendar_last_scraped,calendar_updated,cancellation_policy,city,cleaning_fee,country,country_code,description,experiences_offered,extra_people,first_review,guests_included,has_availability,host_about,host_acceptance_rate,host_has_profile_pic,host_id,host_identity_verified,host_is_superhost,host_listings_count,host_location,host_name,host_neighbourhood,host_picture_url,host_response_rate,host_response_time,host_since,host_thumbnail_url,host_total_listings_count,host_url,host_verifications,house_rules,id,instant_bookable,interaction,is_business_travel_ready,is_location_exact,jurisdiction_names,last_review,last_scraped,latitude,license,listing_url,longitude,market,maximum_maximum_nights,maximum_minimum_nights,maximum_nights,maximum_nights_avg_ntm,medium_url,minimum_maximum_nights,minimum_minimum_nights,minimum_nights,minimum_nights_avg_ntm,monthly_price,name,neighborhood_overview,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,notes,number_of_reviews,number_of_reviews_ltm,picture_url,price,property_type,require_guest_phone_verification,require_guest_profile_picture,requires_license,review_scores_accuracy,review_scores_checkin,review_scores_cleanliness,review_scores_communication,review_scores_location,review_scores_rating,review_scores_value,reviews_per_month,room_type,scrape_id,security_deposit,smart_location,space,square_feet,state,street,summary,thumbnail_url,transit,weekly_price,xl_picture_url,zipcode
0,*Full access to patio and backyard (shared wit...,3,"{TV,""Cable TV"",Internet,Wifi,Kitchen,""Pets liv...",0,77,0,1,1.0,Real Bed,1.0,2.0,1,1.0,0.0,0.0,2019-04-03,a week ago,moderate,San Francisco,$100.00,United States,US,New update: the house next door is under const...,none,$25.00,2009-07-23,2,t,We are a family with 2 boys born in 2009 and 2...,,t,1169,t,t,1.0,"San Francisco, California, United States",Holly,Duboce Triangle,https://a0.muscache.com/im/pictures/efdad96a-3...,100%,within an hour,2008-07-31,https://a0.muscache.com/im/pictures/efdad96a-3...,1.0,https://www.airbnb.com/users/show/1169,"['email', 'phone', 'facebook', 'reviews', 'kba']",* No Pets - even visiting guests for a short t...,958,t,A family of 4 lives upstairs with their dog. N...,f,t,"{""SAN FRANCISCO""}",2019-03-16,2019-04-03,37.76931,STR-0001256,https://www.airbnb.com/rooms/958,-122.43386,San Francisco,30.0,1.0,30,30.0,,30.0,1.0,1,1.0,"$4,200.00","Bright, Modern Garden Unit - 1BR/1B",*Quiet cul de sac in friendly neighborhood *St...,Duboce Triangle,Western Addition,,Due to the fact that we have children and a do...,183,51.0,https://a0.muscache.com/im/pictures/b7c2a199-4...,$170.00,Apartment,f,f,t,10.0,10.0,10.0,10.0,10.0,97.0,10.0,1.55,Entire home/apt,20190400000000.0,$100.00,"San Francisco, CA","Newly remodeled, modern, and bright garden uni...",,CA,"San Francisco, CA, United States",New update: the house next door is under const...,,*Public Transportation is 1/2 block away. *Ce...,"$1,120.00",,94117
1,"Our deck, garden, gourmet kitchen and extensiv...",5,"{Internet,Wifi,Kitchen,Heating,""Family/kid fri...",0,0,0,0,1.0,Real Bed,2.0,3.0,1,1.0,0.0,0.0,2019-04-03,4 months ago,strict_14_with_grace_period,San Francisco,$100.00,United States,US,We live in a large Victorian house on a quiet ...,none,$0.00,2009-05-03,2,t,Philip: English transplant to the Bay Area and...,,t,8904,t,f,2.0,"San Francisco, California, United States",Philip And Tania,Bernal Heights,https://a0.muscache.com/im/users/8904/profile_...,80%,within a day,2009-03-02,https://a0.muscache.com/im/users/8904/profile_...,2.0,https://www.airbnb.com/users/show/8904,"['email', 'phone', 'reviews', 'kba', 'work_ema...","Please respect the house, the art work, the fu...",5858,f,,f,t,"{""SAN FRANCISCO""}",2017-08-06,2019-04-03,37.74511,,https://www.airbnb.com/rooms/5858,-122.42102,San Francisco,60.0,30.0,60,60.0,,60.0,30.0,30,30.0,"$5,500.00",Creative Sanctuary,I love how our neighborhood feels quiet but is...,Bernal Heights,Bernal Heights,,All the furniture in the house was handmade so...,111,0.0,https://a0.muscache.com/im/pictures/17714/3a7a...,$235.00,Apartment,f,f,t,10.0,10.0,10.0,10.0,10.0,98.0,9.0,0.92,Entire home/apt,20190400000000.0,,"San Francisco, CA",We live in a large Victorian house on a quiet ...,,CA,"San Francisco, CA, United States",,,The train is two blocks away and you can stop ...,"$1,600.00",,94110
2,,2,"{TV,Internet,Wifi,Kitchen,""Free street parking...",30,365,60,90,4.0,Real Bed,1.0,1.0,9,0.0,9.0,0.0,2019-04-03,17 months ago,strict_14_with_grace_period,San Francisco,$50.00,United States,US,Nice and good public transportation. 7 minute...,none,$12.00,2009-08-31,1,t,7 minutes walk to UCSF. 15 minutes walk to US...,,t,21994,t,f,10.0,"San Francisco, California, United States",Aaron,Cole Valley,https://a0.muscache.com/im/users/21994/profile...,100%,within a few hours,2009-06-17,https://a0.muscache.com/im/users/21994/profile...,10.0,https://www.airbnb.com/users/show/21994,"['email', 'phone', 'reviews', 'jumio', 'govern...","No party, No smoking, not for any kinds of smo...",7918,f,,f,t,"{""SAN FRANCISCO""}",2016-11-21,2019-04-03,37.76669,,https://www.airbnb.com/rooms/7918,-122.4525,San Francisco,60.0,32.0,60,60.0,,60.0,32.0,32,32.0,"$1,685.00",A Friendly Room - UCSF/USF - San Francisco,"Shopping old town, restaurants, McDonald, Whol...",Cole Valley,Haight Ashbury,,Please email your picture id with print name (...,17,0.0,https://a0.muscache.com/im/pictures/26356/8030...,$65.00,Apartment,f,f,t,8.0,9.0,8.0,9.0,9.0,85.0,8.0,0.15,Private room,20190400000000.0,$200.00,"San Francisco, CA",Room rental-sunny view room/sink/Wi Fi (inner ...,,CA,"San Francisco, CA, United States",Nice and good public transportation. 7 minute...,,N Juda Muni and bus stop. Street parking.,$485.00,,94117
3,,2,"{TV,Internet,Wifi,Kitchen,""Free street parking...",30,365,60,90,4.0,Real Bed,1.0,1.0,9,0.0,9.0,0.0,2019-04-03,17 months ago,strict_14_with_grace_period,San Francisco,$50.00,United States,US,Nice and good public transportation. 7 minute...,none,$12.00,2014-09-08,1,t,7 minutes walk to UCSF. 15 minutes walk to US...,,t,21994,t,f,10.0,"San Francisco, California, United States",Aaron,Cole Valley,https://a0.muscache.com/im/users/21994/profile...,100%,within a few hours,2009-06-17,https://a0.muscache.com/im/users/21994/profile...,10.0,https://www.airbnb.com/users/show/21994,"['email', 'phone', 'reviews', 'jumio', 'govern...",no pet no smoke no party inside the building,8142,f,,f,t,"{""SAN FRANCISCO""}",2018-09-12,2019-04-03,37.76487,,https://www.airbnb.com/rooms/8142,-122.45183,San Francisco,90.0,32.0,90,90.0,,90.0,32.0,32,32.0,"$1,685.00",Friendly Room Apt. Style -UCSF/USF - San Franc...,,Cole Valley,Haight Ashbury,,Please email your picture id with print name (...,8,1.0,https://a0.muscache.com/im/pictures/27832/3b1f...,$65.00,Apartment,f,f,t,9.0,10.0,9.0,10.0,9.0,93.0,9.0,0.14,Private room,20190400000000.0,$200.00,"San Francisco, CA",Room rental Sunny view Rm/Wi-Fi/TV/sink/large ...,,CA,"San Francisco, CA, United States",Nice and good public transportation. 7 minute...,,"N Juda Muni, Bus and UCSF Shuttle. small shopp...",$490.00,,94117
4,Guests have access to everything listed and sh...,5,"{TV,Internet,Wifi,Kitchen,Heating,""Family/kid ...",30,90,60,90,1.5,Real Bed,2.0,2.0,2,2.0,0.0,0.0,2019-04-03,4 months ago,strict_14_with_grace_period,San Francisco,$225.00,United States,US,Pls email before booking. Interior featured i...,none,$150.00,2009-09-25,2,t,Always searching for a perfect piece at Europe...,,t,24215,t,f,2.0,"San Francisco, California, United States",Rosy,Alamo Square,https://a0.muscache.com/im/users/24215/profile...,100%,within an hour,2009-07-02,https://a0.muscache.com/im/users/24215/profile...,2.0,https://www.airbnb.com/users/show/24215,"['email', 'phone', 'reviews', 'kba']",House Manual and House Rules will be provided ...,8339,f,,f,t,"{""SAN FRANCISCO""}",2018-08-11,2019-04-03,37.77525,STR-0000264,https://www.airbnb.com/rooms/8339,-122.43637,San Francisco,1125.0,7.0,1125,1125.0,,1125.0,7.0,7,7.0,,Historic Alamo Square Victorian,,Western Addition/NOPA,Western Addition,,tax ID on file tax ID on file,27,1.0,https://a0.muscache.com/im/pictures/6f84a7c2-e...,$785.00,House,t,t,t,10.0,10.0,10.0,10.0,10.0,97.0,9.0,0.23,Entire home/apt,20190400000000.0,$0.00,"San Francisco, CA",Please send us a quick message before booking ...,,CA,"San Francisco, CA, United States",Pls email before booking. Interior featured i...,,,,,94117


### Column Removal

**Removing rejected columns from Pandas Report**

These columns have a correlation of over 90% with another variable in the dataset.

In [9]:
#Create list of columns to drop
collinear= ['availability_60', 'availability_90', 'calculated_host_listings_count_entire_homes', 'country_code', 
'host_total_listings_count', 'last_scraped', 'maximum_minimum_nights', 'maximum_nights', 
'maximum_nights_avg_ntm', 'minimum_maximum_nights', 'minimum_minimum_nights', 'minimum_nights', 'minimum_nights_avg_ntm']

#Remove collinear columns from listings
listings.drop(columns = collinear, inplace = True)

#Updated listings shape
print('Updated listings shape:', listings.shape)

Updated listings shape: (88937, 93)


**Removing redundant columns**

Columns city, street, and smart_location appear to encode the same information. Columns neighbourhood and neighbourhood_cleansed also appear to do the same. 

Keeping city and neighbourhood_cleansed

In [10]:
#Cols to drop
cols = ['street', 'smart_location','neighbourhood']

#Dropping redundant columns
listings.drop(columns=cols, inplace=True)

#Updated listings shape
print('Updated listings shape:', listings.shape)

Updated listings shape: (88937, 90)


**Remove columns with homogenous values**

In [11]:
#Capture columns with homogeneous values and store as list in cols
cols = list(listings.columns[listings.nunique() <= 1])

#Drop cols
listings.drop(columns=cols, axis = 1, inplace=True)

#Updated listings shape
print('Updated listings shape:', listings.shape)

Updated listings shape: (88937, 80)


**Check for additional columns with mostly homogenous values**

In [12]:
#Capture columns with homogeneous values and store as list in cols
print(listings.columns[listings.nunique() <= 2])

#Per Pandas Report, state and country are not boolean values
#Investigating rows where listings.country == 'Mexico'
listings[listings.country == 'Mexico']

Index(['country', 'host_has_profile_pic', 'host_identity_verified',
       'host_is_superhost', 'instant_bookable', 'is_location_exact',
       'require_guest_phone_verification', 'require_guest_profile_picture',
       'requires_license', 'state'],
      dtype='object')


Unnamed: 0,access,accommodates,amenities,availability_30,availability_365,bathrooms,bed_type,bedrooms,beds,calculated_host_listings_count,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,calendar_last_scraped,calendar_updated,cancellation_policy,city,cleaning_fee,country,description,extra_people,first_review,guests_included,host_about,host_has_profile_pic,host_id,host_identity_verified,host_is_superhost,host_listings_count,host_location,host_name,host_neighbourhood,host_picture_url,host_response_rate,host_response_time,host_since,host_thumbnail_url,host_url,host_verifications,house_rules,id,instant_bookable,interaction,is_location_exact,last_review,latitude,license,listing_url,longitude,maximum_maximum_nights,monthly_price,name,neighborhood_overview,neighbourhood_cleansed,notes,number_of_reviews,number_of_reviews_ltm,picture_url,price,property_type,require_guest_phone_verification,require_guest_profile_picture,requires_license,review_scores_accuracy,review_scores_checkin,review_scores_cleanliness,review_scores_communication,review_scores_location,review_scores_rating,review_scores_value,reviews_per_month,room_type,scrape_id,security_deposit,space,square_feet,state,summary,transit,weekly_price,zipcode
4767,,6,"{TV,Internet,Wifi,""Air conditioning"",Kitchen,E...",0,0,3.5,Real Bed,3.0,4.0,1,0.0,0.0,2019-04-03,today,super_strict_60,San Francisco,$0.00,Mexico,ONEFINESTAY is proud to present this 3 bedroom...,$0.00,,6,"Hi, \r\n\r\nMy name is Sally and I am part of ...",t,156158778,f,f,458.0,"London, England, United Kingdom",Sally,Battersea,https://a0.muscache.com/im/pictures/user/e7869...,100%,within an hour,2017-10-25,https://a0.muscache.com/im/pictures/user/e7869...,https://www.airbnb.com/users/show/156158778,"['email', 'phone', 'work_email']",Pets not allowed. Check in from 16:00 to 16:00.,23298702,t,,f,,37.79574,,https://www.airbnb.com/rooms/23298702,-122.42566,1125.0,,Three Bridges Penthouse by ONEFINESTAY,,Pacific Heights,Home Truths: - There is a dedicated elevator ...,0,0.0,https://a0.muscache.com/im/pictures/f00cf518-5...,"$8,000.00",Apartment,f,f,f,,,,,,,,,Entire home/apt,20190400000000.0,"$1,500.00","Included amenities: Wifi, iPhone, Welcome Pack...",,,ONEFINESTAY is proud to present this 3 bedroom...,,,94109.0
20223,,6,"{TV,Internet,Wifi,""Air conditioning"",Kitchen,E...",10,25,3.5,Real Bed,3.0,4.0,2,,,2018-12-06,today,super_strict_60,San Francisco,$0.00,Mexico,ONEFINESTAY is proud to present this 3 bedroom...,$0.00,,6,"Hi, \r\n\r\nMy name is Sally and I am part of ...",t,156158778,f,f,419.0,"London, England, United Kingdom",Sally,Battersea,https://a0.muscache.com/im/pictures/user/e7869...,100%,within an hour,2017-10-25,https://a0.muscache.com/im/pictures/user/e7869...,https://www.airbnb.com/users/show/156158778,"['email', 'phone', 'work_email']",Pets not allowed. Check in from 16:00 to 16:00.,23298702,t,,f,,37.795744,,https://www.airbnb.com/rooms/23298702,-122.425657,,,Three Bridges Penthouse by ONEFINESTAY,,Pacific Heights,Home Truths: - There is a dedicated elevator ...,0,,https://a0.muscache.com/im/pictures/f00cf518-5...,"$8,000.00",Apartment,f,f,f,,,,,,,,,Entire home/apt,20181210000000.0,"$1,500.00","Included amenities: Wifi, iPhone, Welcome Pack...",,,ONEFINESTAY is proud to present this 3 bedroom...,,,94109.0
27186,,6,"{TV,Internet,Wifi,""Air conditioning"",Kitchen,E...",7,7,3.5,Real Bed,3.0,4.0,1,0.0,0.0,2019-02-01,today,super_strict_60,San Francisco,$0.00,Mexico,ONEFINESTAY is proud to present this 3 bedroom...,$0.00,,6,"Hi, \r\n\r\nMy name is Sally and I am part of ...",t,156158778,f,f,394.0,"London, England, United Kingdom",Sally,Battersea,https://a0.muscache.com/im/pictures/user/e7869...,97%,within an hour,2017-10-25,https://a0.muscache.com/im/pictures/user/e7869...,https://www.airbnb.com/users/show/156158778,"['email', 'phone', 'work_email']",Pets not allowed. Check in from 16:00 to 16:00.,23298702,t,,f,,37.795744,,https://www.airbnb.com/rooms/23298702,-122.425657,1125.0,,Three Bridges Penthouse by ONEFINESTAY,,Pacific Heights,Home Truths: - There is a dedicated elevator ...,0,0.0,https://a0.muscache.com/im/pictures/f00cf518-5...,"$8,000.00",Apartment,f,f,f,,,,,,,,,Entire home/apt,20190200000000.0,"$1,500.00","Included amenities: Wifi, iPhone, Welcome Pack...",,,ONEFINESTAY is proud to present this 3 bedroom...,,,94109.0
34502,,6,"{TV,Internet,Wifi,""Air conditioning"",Kitchen,E...",11,14,3.5,Real Bed,3.0,4.0,1,0.0,0.0,2019-01-09,yesterday,super_strict_60,San Francisco,$0.00,Mexico,ONEFINESTAY is proud to present this 3 bedroom...,$0.00,,6,"Hi, \r\n\r\nMy name is Sally and I am part of ...",t,156158778,f,f,390.0,"London, England, United Kingdom",Sally,Battersea,https://a0.muscache.com/im/pictures/user/e7869...,,,2017-10-25,https://a0.muscache.com/im/pictures/user/e7869...,https://www.airbnb.com/users/show/156158778,"['email', 'phone', 'work_email']",Pets not allowed. Check in from 16:00 to 16:00.,23298702,t,,f,,37.795744,,https://www.airbnb.com/rooms/23298702,-122.425657,1125.0,,Three Bridges Penthouse by ONEFINESTAY,,Pacific Heights,Home Truths: - There is a dedicated elevator ...,0,0.0,https://a0.muscache.com/im/pictures/f00cf518-5...,"$8,000.00",Apartment,f,f,f,,,,,,,,,Entire home/apt,20190110000000.0,"$1,500.00","Included amenities: Wifi, iPhone, Welcome Pack...",,,ONEFINESTAY is proud to present this 3 bedroom...,,,94109.0
56734,,6,"{TV,Internet,Wifi,""Air conditioning"",Kitchen,E...",0,0,3.5,Real Bed,3.0,4.0,1,0.0,0.0,2019-03-06,today,super_strict_60,San Francisco,$0.00,Mexico,ONEFINESTAY is proud to present this 3 bedroom...,$0.00,,6,"Hi, \r\n\r\nMy name is Sally and I am part of ...",t,156158778,f,f,394.0,"London, England, United Kingdom",Sally,Battersea,https://a0.muscache.com/im/pictures/user/e7869...,100%,within an hour,2017-10-25,https://a0.muscache.com/im/pictures/user/e7869...,https://www.airbnb.com/users/show/156158778,"['email', 'phone', 'work_email']",Pets not allowed. Check in from 16:00 to 16:00.,23298702,t,,f,,37.79574,,https://www.airbnb.com/rooms/23298702,-122.42566,1125.0,,Three Bridges Penthouse by ONEFINESTAY,,Pacific Heights,Home Truths: - There is a dedicated elevator ...,0,0.0,https://a0.muscache.com/im/pictures/f00cf518-5...,"$8,000.00",Apartment,f,f,f,,,,,,,,,Entire home/apt,20190310000000.0,"$1,500.00","Included amenities: Wifi, iPhone, Welcome Pack...",,,ONEFINESTAY is proud to present this 3 bedroom...,,,94109.0


Rows with Mexico in the country column belong to the same host. This listing is in Pacifica Heights, a neighborhood in San Francisco. We can remove country and state column for containing mostly homogenous values.

In [13]:
#Dropping country column
listings.drop(columns=['country','state'], inplace = True)

#Updated listings shape
print('Updated listings shape:', listings.shape)

Updated listings shape: (88937, 78)


**Remove columns with majority NA values**

Columns with greater than 30% missing values will be removed.

In [14]:
#Calculate which columns have mostly NA values
print(listings.isnull().mean().sort_values(ascending=False).head(15))

square_feet               0.982673
monthly_price             0.847757
weekly_price              0.846217
notes                     0.380764
license                   0.359266
access                    0.338228
interaction               0.327501
transit                   0.290936
neighborhood_overview     0.265739
house_rules               0.264524
host_about                0.239158
security_deposit          0.203695
review_scores_value       0.196836
review_scores_checkin     0.196780
review_scores_location    0.196780
dtype: float64


Keeping weekly_price and monthly_price, missing value would indicate that the listing is unavailable for use of over 7 or 30 days respectively

In [15]:
#Store weekly_price and monthly_price
keep = listings[['weekly_price', 'monthly_price']]

#Dropping columns with more than 30% values missing
listings= listings[listings.columns[(listings.isnull().mean() < .30 )]]

#Concatenate keep cols back into listings
pd.concat([listings, keep], axis=1)

#Updated listings shape
#print('Updated listings shape:', listings.shape)

Unnamed: 0,accommodates,amenities,availability_30,availability_365,bathrooms,bed_type,bedrooms,beds,calculated_host_listings_count,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,calendar_last_scraped,calendar_updated,cancellation_policy,city,cleaning_fee,description,extra_people,first_review,guests_included,host_about,host_has_profile_pic,host_id,host_identity_verified,host_is_superhost,host_listings_count,host_location,host_name,host_neighbourhood,host_picture_url,host_response_rate,host_response_time,host_since,host_thumbnail_url,host_url,host_verifications,house_rules,id,instant_bookable,is_location_exact,last_review,latitude,listing_url,longitude,maximum_maximum_nights,name,neighborhood_overview,neighbourhood_cleansed,number_of_reviews,number_of_reviews_ltm,picture_url,price,property_type,require_guest_phone_verification,require_guest_profile_picture,requires_license,review_scores_accuracy,review_scores_checkin,review_scores_cleanliness,review_scores_communication,review_scores_location,review_scores_rating,review_scores_value,reviews_per_month,room_type,scrape_id,security_deposit,space,summary,transit,zipcode,weekly_price,monthly_price
0,3,"{TV,""Cable TV"",Internet,Wifi,Kitchen,""Pets liv...",0,77,1.0,Real Bed,1.0,2.0,1,0.0,0.0,2019-04-03,a week ago,moderate,San Francisco,$100.00,New update: the house next door is under const...,$25.00,2009-07-23,2,We are a family with 2 boys born in 2009 and 2...,t,1169,t,t,1.0,"San Francisco, California, United States",Holly,Duboce Triangle,https://a0.muscache.com/im/pictures/efdad96a-3...,100%,within an hour,2008-07-31,https://a0.muscache.com/im/pictures/efdad96a-3...,https://www.airbnb.com/users/show/1169,"['email', 'phone', 'facebook', 'reviews', 'kba']",* No Pets - even visiting guests for a short t...,958,t,t,2019-03-16,37.76931,https://www.airbnb.com/rooms/958,-122.43386,30.0,"Bright, Modern Garden Unit - 1BR/1B",*Quiet cul de sac in friendly neighborhood *St...,Western Addition,183,51.0,https://a0.muscache.com/im/pictures/b7c2a199-4...,$170.00,Apartment,f,f,t,10.0,10.0,10.0,10.0,10.0,97.0,10.0,1.55,Entire home/apt,2.019040e+13,$100.00,"Newly remodeled, modern, and bright garden uni...",New update: the house next door is under const...,*Public Transportation is 1/2 block away. *Ce...,94117,"$1,120.00","$4,200.00"
1,5,"{Internet,Wifi,Kitchen,Heating,""Family/kid fri...",0,0,1.0,Real Bed,2.0,3.0,1,0.0,0.0,2019-04-03,4 months ago,strict_14_with_grace_period,San Francisco,$100.00,We live in a large Victorian house on a quiet ...,$0.00,2009-05-03,2,Philip: English transplant to the Bay Area and...,t,8904,t,f,2.0,"San Francisco, California, United States",Philip And Tania,Bernal Heights,https://a0.muscache.com/im/users/8904/profile_...,80%,within a day,2009-03-02,https://a0.muscache.com/im/users/8904/profile_...,https://www.airbnb.com/users/show/8904,"['email', 'phone', 'reviews', 'kba', 'work_ema...","Please respect the house, the art work, the fu...",5858,f,t,2017-08-06,37.74511,https://www.airbnb.com/rooms/5858,-122.42102,60.0,Creative Sanctuary,I love how our neighborhood feels quiet but is...,Bernal Heights,111,0.0,https://a0.muscache.com/im/pictures/17714/3a7a...,$235.00,Apartment,f,f,t,10.0,10.0,10.0,10.0,10.0,98.0,9.0,0.92,Entire home/apt,2.019040e+13,,We live in a large Victorian house on a quiet ...,,The train is two blocks away and you can stop ...,94110,"$1,600.00","$5,500.00"
2,2,"{TV,Internet,Wifi,Kitchen,""Free street parking...",30,365,4.0,Real Bed,1.0,1.0,9,9.0,0.0,2019-04-03,17 months ago,strict_14_with_grace_period,San Francisco,$50.00,Nice and good public transportation. 7 minute...,$12.00,2009-08-31,1,7 minutes walk to UCSF. 15 minutes walk to US...,t,21994,t,f,10.0,"San Francisco, California, United States",Aaron,Cole Valley,https://a0.muscache.com/im/users/21994/profile...,100%,within a few hours,2009-06-17,https://a0.muscache.com/im/users/21994/profile...,https://www.airbnb.com/users/show/21994,"['email', 'phone', 'reviews', 'jumio', 'govern...","No party, No smoking, not for any kinds of smo...",7918,f,t,2016-11-21,37.76669,https://www.airbnb.com/rooms/7918,-122.45250,60.0,A Friendly Room - UCSF/USF - San Francisco,"Shopping old town, restaurants, McDonald, Whol...",Haight Ashbury,17,0.0,https://a0.muscache.com/im/pictures/26356/8030...,$65.00,Apartment,f,f,t,8.0,9.0,8.0,9.0,9.0,85.0,8.0,0.15,Private room,2.019040e+13,$200.00,Room rental-sunny view room/sink/Wi Fi (inner ...,Nice and good public transportation. 7 minute...,N Juda Muni and bus stop. Street parking.,94117,$485.00,"$1,685.00"
3,2,"{TV,Internet,Wifi,Kitchen,""Free street parking...",30,365,4.0,Real Bed,1.0,1.0,9,9.0,0.0,2019-04-03,17 months ago,strict_14_with_grace_period,San Francisco,$50.00,Nice and good public transportation. 7 minute...,$12.00,2014-09-08,1,7 minutes walk to UCSF. 15 minutes walk to US...,t,21994,t,f,10.0,"San Francisco, California, United States",Aaron,Cole Valley,https://a0.muscache.com/im/users/21994/profile...,100%,within a few hours,2009-06-17,https://a0.muscache.com/im/users/21994/profile...,https://www.airbnb.com/users/show/21994,"['email', 'phone', 'reviews', 'jumio', 'govern...",no pet no smoke no party inside the building,8142,f,t,2018-09-12,37.76487,https://www.airbnb.com/rooms/8142,-122.45183,90.0,Friendly Room Apt. Style -UCSF/USF - San Franc...,,Haight Ashbury,8,1.0,https://a0.muscache.com/im/pictures/27832/3b1f...,$65.00,Apartment,f,f,t,9.0,10.0,9.0,10.0,9.0,93.0,9.0,0.14,Private room,2.019040e+13,$200.00,Room rental Sunny view Rm/Wi-Fi/TV/sink/large ...,Nice and good public transportation. 7 minute...,"N Juda Muni, Bus and UCSF Shuttle. small shopp...",94117,$490.00,"$1,685.00"
4,5,"{TV,Internet,Wifi,Kitchen,Heating,""Family/kid ...",30,90,1.5,Real Bed,2.0,2.0,2,0.0,0.0,2019-04-03,4 months ago,strict_14_with_grace_period,San Francisco,$225.00,Pls email before booking. Interior featured i...,$150.00,2009-09-25,2,Always searching for a perfect piece at Europe...,t,24215,t,f,2.0,"San Francisco, California, United States",Rosy,Alamo Square,https://a0.muscache.com/im/users/24215/profile...,100%,within an hour,2009-07-02,https://a0.muscache.com/im/users/24215/profile...,https://www.airbnb.com/users/show/24215,"['email', 'phone', 'reviews', 'kba']",House Manual and House Rules will be provided ...,8339,f,t,2018-08-11,37.77525,https://www.airbnb.com/rooms/8339,-122.43637,1125.0,Historic Alamo Square Victorian,,Western Addition,27,1.0,https://a0.muscache.com/im/pictures/6f84a7c2-e...,$785.00,House,t,t,t,10.0,10.0,10.0,10.0,10.0,97.0,9.0,0.23,Entire home/apt,2.019040e+13,$0.00,Please send us a quick message before booking ...,Pls email before booking. Interior featured i...,,94117,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88932,5,"{TV,Wifi,""Free parking on premises"",""Free stre...",28,88,2.0,Real Bed,2.0,4.0,4,4.0,0.0,2019-09-12,today,flexible,Daly City,$70.00,,$20.00,,4,,t,286256082,f,f,4.0,US,Francis,Crocker Amazon,https://a0.muscache.com/im/pictures/user/15d11...,100%,within an hour,2019-08-16,https://a0.muscache.com/im/pictures/user/15d11...,https://www.airbnb.com/users/show/286256082,"['email', 'phone', 'offline_government_id', 's...",Quiet hours: 10pm - 8am,38533186,f,t,,37.70707,https://www.airbnb.com/rooms/38533186,-122.44812,29.0,!159D 2B1B newly renovated quiet unit w/parking,,Crocker Amazon,0,0.0,https://a0.muscache.com/im/pictures/38cf6009-b...,$150.00,House,f,f,f,,,,,,,,,Private room,2.019091e+13,$100.00,,,,94014,,
88933,4,"{TV,Internet,Wifi,Kitchen,""Free street parking...",0,86,1.0,Real Bed,2.0,2.0,3,1.0,0.0,2019-09-12,today,flexible,San Francisco,,2 bedroom apartment in the lively Inner Sunset...,$0.00,,1,"I am a retired software engineer, recently re...",t,29536420,t,t,1.0,"San Francisco, California, United States",Claudia,Inner Sunset,https://a0.muscache.com/im/users/29536420/prof...,100%,within an hour,2015-03-17,https://a0.muscache.com/im/users/29536420/prof...,https://www.airbnb.com/users/show/29536420,"['email', 'phone', 'reviews', 'jumio', 'offlin...",,38544501,t,t,,37.76324,https://www.airbnb.com/rooms/38544501,-122.47466,1125.0,Inner Sunset Monthly rental,Many great Asian restaurants & shops including...,Inner Sunset,0,0.0,https://a0.muscache.com/im/pictures/d9926f9a-9...,$125.00,Apartment,f,f,t,,,,,,,,,Entire home/apt,2.019091e+13,,There are a washer and dryer in the closet for...,2 bedroom apartment in the lively Inner Sunset...,The N-Judah stops at 16th & Judah (15th going ...,94122,,
88934,4,"{TV,Wifi,Kitchen,""Free parking on premises"",""P...",0,315,2.5,Real Bed,3.0,2.0,2,0.0,0.0,2019-09-12,today,flexible,San Francisco,,Pitch Perfect 3 Bedroom townhouse apartment st...,$0.00,,1,I am a 50 year old originally from India and i...,t,9536576,t,t,1.0,"San Francisco, California, United States",Raj,Pacific Heights,https://a0.muscache.com/im/users/9536576/profi...,100%,within an hour,2013-10-20,https://a0.muscache.com/im/users/9536576/profi...,https://www.airbnb.com/users/show/9536576,"['email', 'phone', 'reviews', 'kba']",,38550921,t,t,,37.79199,https://www.airbnb.com/rooms/38550921,-122.44106,1125.0,"Designer 3BR, 2.5 bath SF Apartment in Pac Hei...",* Close to Alta Plaza Park * Walking distance ...,Pacific Heights,0,0.0,https://a0.muscache.com/im/pictures/80c82053-1...,$450.00,Apartment,f,f,t,,,,,,,,,Entire home/apt,2.019091e+13,,The first level consists of formal living and ...,Pitch Perfect 3 Bedroom townhouse apartment st...,* Bus transportation is close by,94115,,
88935,2,"{Wifi,""Smoke detector"",""Carbon monoxide detect...",16,66,1.0,Real Bed,1.0,1.0,2,2.0,0.0,2019-09-12,today,strict_14_with_grace_period,San Francisco,$40.00,Private Master Suite located in outer Sunset. ...,$0.00,,1,,t,291879348,f,f,0.0,US,Jason,Outer Sunset,https://a0.muscache.com/im/pictures/user/cca19...,,,2019-09-04,https://a0.muscache.com/im/pictures/user/cca19...,https://www.airbnb.com/users/show/291879348,"['phone', 'offline_government_id', 'government...",Quiet hour after 10pm Please do not invite oth...,38556299,t,t,,37.74284,https://www.airbnb.com/rooms/38556299,-122.49925,1125.0,Private Master bedroom w/ private bathroom -1,Safe and quiet neighborhood.,Parkside,0,0.0,https://a0.muscache.com/im/pictures/95233473-5...,$75.00,House,f,f,t,,,,,,,,,Private room,2.019091e+13,$0.00,"This is a share environment,please be aware th...",Private Master Suite located in outer Sunset. ...,"Bus, Uber, Lyft",94116,,


In [74]:
listings


Unnamed: 0,accommodates,amenities,availability_30,availability_365,bathrooms,bed_type,bedrooms,beds,calculated_host_listings_count,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,calendar_last_scraped,calendar_updated,cancellation_policy,city,cleaning_fee,description,extra_people,first_review,guests_included,host_about,host_has_profile_pic,host_id,host_identity_verified,host_is_superhost,host_listings_count,host_location,host_name,host_neighbourhood,host_picture_url,host_response_rate,host_response_time,host_since,host_thumbnail_url,host_url,host_verifications,house_rules,id,instant_bookable,is_location_exact,last_review,latitude,listing_url,longitude,maximum_maximum_nights,name,neighborhood_overview,neighbourhood_cleansed,number_of_reviews,number_of_reviews_ltm,picture_url,price,property_type,require_guest_phone_verification,require_guest_profile_picture,requires_license,review_scores_accuracy,review_scores_checkin,review_scores_cleanliness,review_scores_communication,review_scores_location,review_scores_rating,review_scores_value,reviews_per_month,room_type,scrape_id,security_deposit,space,summary,transit,zipcode
0,3,"{TV,""Cable TV"",Internet,Wifi,Kitchen,""Pets liv...",0,77,1.0,Real Bed,1.0,2.0,1,0.0,0.0,2019-04-03,a week ago,moderate,San Francisco,$100.00,New update: the house next door is under const...,$25.00,2009-07-23,2,We are a family with 2 boys born in 2009 and 2...,t,1169,t,t,1.0,"San Francisco, California, United States",Holly,Duboce Triangle,https://a0.muscache.com/im/pictures/efdad96a-3...,100%,within an hour,2008-07-31,https://a0.muscache.com/im/pictures/efdad96a-3...,https://www.airbnb.com/users/show/1169,"['email', 'phone', 'facebook', 'reviews', 'kba']",* No Pets - even visiting guests for a short t...,958,t,t,2019-03-16,37.76931,https://www.airbnb.com/rooms/958,-122.43386,30.0,"Bright, Modern Garden Unit - 1BR/1B",*Quiet cul de sac in friendly neighborhood *St...,Western Addition,183,51.0,https://a0.muscache.com/im/pictures/b7c2a199-4...,$170.00,Apartment,f,f,t,10.0,10.0,10.0,10.0,10.0,97.0,10.0,1.55,Entire home/apt,2.019040e+13,$100.00,"Newly remodeled, modern, and bright garden uni...",New update: the house next door is under const...,*Public Transportation is 1/2 block away. *Ce...,94117
1,5,"{Internet,Wifi,Kitchen,Heating,""Family/kid fri...",0,0,1.0,Real Bed,2.0,3.0,1,0.0,0.0,2019-04-03,4 months ago,strict_14_with_grace_period,San Francisco,$100.00,We live in a large Victorian house on a quiet ...,$0.00,2009-05-03,2,Philip: English transplant to the Bay Area and...,t,8904,t,f,2.0,"San Francisco, California, United States",Philip And Tania,Bernal Heights,https://a0.muscache.com/im/users/8904/profile_...,80%,within a day,2009-03-02,https://a0.muscache.com/im/users/8904/profile_...,https://www.airbnb.com/users/show/8904,"['email', 'phone', 'reviews', 'kba', 'work_ema...","Please respect the house, the art work, the fu...",5858,f,t,2017-08-06,37.74511,https://www.airbnb.com/rooms/5858,-122.42102,60.0,Creative Sanctuary,I love how our neighborhood feels quiet but is...,Bernal Heights,111,0.0,https://a0.muscache.com/im/pictures/17714/3a7a...,$235.00,Apartment,f,f,t,10.0,10.0,10.0,10.0,10.0,98.0,9.0,0.92,Entire home/apt,2.019040e+13,,We live in a large Victorian house on a quiet ...,,The train is two blocks away and you can stop ...,94110
2,2,"{TV,Internet,Wifi,Kitchen,""Free street parking...",30,365,4.0,Real Bed,1.0,1.0,9,9.0,0.0,2019-04-03,17 months ago,strict_14_with_grace_period,San Francisco,$50.00,Nice and good public transportation. 7 minute...,$12.00,2009-08-31,1,7 minutes walk to UCSF. 15 minutes walk to US...,t,21994,t,f,10.0,"San Francisco, California, United States",Aaron,Cole Valley,https://a0.muscache.com/im/users/21994/profile...,100%,within a few hours,2009-06-17,https://a0.muscache.com/im/users/21994/profile...,https://www.airbnb.com/users/show/21994,"['email', 'phone', 'reviews', 'jumio', 'govern...","No party, No smoking, not for any kinds of smo...",7918,f,t,2016-11-21,37.76669,https://www.airbnb.com/rooms/7918,-122.45250,60.0,A Friendly Room - UCSF/USF - San Francisco,"Shopping old town, restaurants, McDonald, Whol...",Haight Ashbury,17,0.0,https://a0.muscache.com/im/pictures/26356/8030...,$65.00,Apartment,f,f,t,8.0,9.0,8.0,9.0,9.0,85.0,8.0,0.15,Private room,2.019040e+13,$200.00,Room rental-sunny view room/sink/Wi Fi (inner ...,Nice and good public transportation. 7 minute...,N Juda Muni and bus stop. Street parking.,94117
3,2,"{TV,Internet,Wifi,Kitchen,""Free street parking...",30,365,4.0,Real Bed,1.0,1.0,9,9.0,0.0,2019-04-03,17 months ago,strict_14_with_grace_period,San Francisco,$50.00,Nice and good public transportation. 7 minute...,$12.00,2014-09-08,1,7 minutes walk to UCSF. 15 minutes walk to US...,t,21994,t,f,10.0,"San Francisco, California, United States",Aaron,Cole Valley,https://a0.muscache.com/im/users/21994/profile...,100%,within a few hours,2009-06-17,https://a0.muscache.com/im/users/21994/profile...,https://www.airbnb.com/users/show/21994,"['email', 'phone', 'reviews', 'jumio', 'govern...",no pet no smoke no party inside the building,8142,f,t,2018-09-12,37.76487,https://www.airbnb.com/rooms/8142,-122.45183,90.0,Friendly Room Apt. Style -UCSF/USF - San Franc...,,Haight Ashbury,8,1.0,https://a0.muscache.com/im/pictures/27832/3b1f...,$65.00,Apartment,f,f,t,9.0,10.0,9.0,10.0,9.0,93.0,9.0,0.14,Private room,2.019040e+13,$200.00,Room rental Sunny view Rm/Wi-Fi/TV/sink/large ...,Nice and good public transportation. 7 minute...,"N Juda Muni, Bus and UCSF Shuttle. small shopp...",94117
4,5,"{TV,Internet,Wifi,Kitchen,Heating,""Family/kid ...",30,90,1.5,Real Bed,2.0,2.0,2,0.0,0.0,2019-04-03,4 months ago,strict_14_with_grace_period,San Francisco,$225.00,Pls email before booking. Interior featured i...,$150.00,2009-09-25,2,Always searching for a perfect piece at Europe...,t,24215,t,f,2.0,"San Francisco, California, United States",Rosy,Alamo Square,https://a0.muscache.com/im/users/24215/profile...,100%,within an hour,2009-07-02,https://a0.muscache.com/im/users/24215/profile...,https://www.airbnb.com/users/show/24215,"['email', 'phone', 'reviews', 'kba']",House Manual and House Rules will be provided ...,8339,f,t,2018-08-11,37.77525,https://www.airbnb.com/rooms/8339,-122.43637,1125.0,Historic Alamo Square Victorian,,Western Addition,27,1.0,https://a0.muscache.com/im/pictures/6f84a7c2-e...,$785.00,House,t,t,t,10.0,10.0,10.0,10.0,10.0,97.0,9.0,0.23,Entire home/apt,2.019040e+13,$0.00,Please send us a quick message before booking ...,Pls email before booking. Interior featured i...,,94117
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88932,5,"{TV,Wifi,""Free parking on premises"",""Free stre...",28,88,2.0,Real Bed,2.0,4.0,4,4.0,0.0,2019-09-12,today,flexible,Daly City,$70.00,,$20.00,,4,,t,286256082,f,f,4.0,US,Francis,Crocker Amazon,https://a0.muscache.com/im/pictures/user/15d11...,100%,within an hour,2019-08-16,https://a0.muscache.com/im/pictures/user/15d11...,https://www.airbnb.com/users/show/286256082,"['email', 'phone', 'offline_government_id', 's...",Quiet hours: 10pm - 8am,38533186,f,t,,37.70707,https://www.airbnb.com/rooms/38533186,-122.44812,29.0,!159D 2B1B newly renovated quiet unit w/parking,,Crocker Amazon,0,0.0,https://a0.muscache.com/im/pictures/38cf6009-b...,$150.00,House,f,f,f,,,,,,,,,Private room,2.019091e+13,$100.00,,,,94014
88933,4,"{TV,Internet,Wifi,Kitchen,""Free street parking...",0,86,1.0,Real Bed,2.0,2.0,3,1.0,0.0,2019-09-12,today,flexible,San Francisco,,2 bedroom apartment in the lively Inner Sunset...,$0.00,,1,"I am a retired software engineer, recently re...",t,29536420,t,t,1.0,"San Francisco, California, United States",Claudia,Inner Sunset,https://a0.muscache.com/im/users/29536420/prof...,100%,within an hour,2015-03-17,https://a0.muscache.com/im/users/29536420/prof...,https://www.airbnb.com/users/show/29536420,"['email', 'phone', 'reviews', 'jumio', 'offlin...",,38544501,t,t,,37.76324,https://www.airbnb.com/rooms/38544501,-122.47466,1125.0,Inner Sunset Monthly rental,Many great Asian restaurants & shops including...,Inner Sunset,0,0.0,https://a0.muscache.com/im/pictures/d9926f9a-9...,$125.00,Apartment,f,f,t,,,,,,,,,Entire home/apt,2.019091e+13,,There are a washer and dryer in the closet for...,2 bedroom apartment in the lively Inner Sunset...,The N-Judah stops at 16th & Judah (15th going ...,94122
88934,4,"{TV,Wifi,Kitchen,""Free parking on premises"",""P...",0,315,2.5,Real Bed,3.0,2.0,2,0.0,0.0,2019-09-12,today,flexible,San Francisco,,Pitch Perfect 3 Bedroom townhouse apartment st...,$0.00,,1,I am a 50 year old originally from India and i...,t,9536576,t,t,1.0,"San Francisco, California, United States",Raj,Pacific Heights,https://a0.muscache.com/im/users/9536576/profi...,100%,within an hour,2013-10-20,https://a0.muscache.com/im/users/9536576/profi...,https://www.airbnb.com/users/show/9536576,"['email', 'phone', 'reviews', 'kba']",,38550921,t,t,,37.79199,https://www.airbnb.com/rooms/38550921,-122.44106,1125.0,"Designer 3BR, 2.5 bath SF Apartment in Pac Hei...",* Close to Alta Plaza Park * Walking distance ...,Pacific Heights,0,0.0,https://a0.muscache.com/im/pictures/80c82053-1...,$450.00,Apartment,f,f,t,,,,,,,,,Entire home/apt,2.019091e+13,,The first level consists of formal living and ...,Pitch Perfect 3 Bedroom townhouse apartment st...,* Bus transportation is close by,94115
88935,2,"{Wifi,""Smoke detector"",""Carbon monoxide detect...",16,66,1.0,Real Bed,1.0,1.0,2,2.0,0.0,2019-09-12,today,strict_14_with_grace_period,San Francisco,$40.00,Private Master Suite located in outer Sunset. ...,$0.00,,1,,t,291879348,f,f,0.0,US,Jason,Outer Sunset,https://a0.muscache.com/im/pictures/user/cca19...,,,2019-09-04,https://a0.muscache.com/im/pictures/user/cca19...,https://www.airbnb.com/users/show/291879348,"['phone', 'offline_government_id', 'government...",Quiet hour after 10pm Please do not invite oth...,38556299,t,t,,37.74284,https://www.airbnb.com/rooms/38556299,-122.49925,1125.0,Private Master bedroom w/ private bathroom -1,Safe and quiet neighborhood.,Parkside,0,0.0,https://a0.muscache.com/im/pictures/95233473-5...,$75.00,House,f,f,t,,,,,,,,,Private room,2.019091e+13,$0.00,"This is a share environment,please be aware th...",Private Master Suite located in outer Sunset. ...,"Bus, Uber, Lyft",94116


**Removing columns containing URL data**

URL columns contain only url link to images, which is not pertinent to our analysis.

In [None]:
#Drop columns containing URL's
listings = listings[listings.columns.drop(list(listings.filter(regex='url$')))]

#Updated listings shape
print('Updated listings shape:', listings.shape)

### Column Specific Cleaning

Cleaning specific columns in listings data in which there were specific value issues spotted in the Pandas Profiling report.

In [None]:
#Create list of cols that contain $%,{}[]"'
cols = ['cleaning_fee','extra_people','price','host_response_rate','security_deposit',
        'host_verifications','amenities']

#Remove $%,{}[]"'
listings[cols] = listings[cols].replace('[$,%{}\"\'\[\]]', ' ', regex=True)

#Remove white space between numerics
listings[cols] = listings[cols].replace('(?<=\d)\s+(?=\d)', '', regex=True)

#Check
listings[cols].head(3)

**Mapping 1's and 0's to Booleans**

There are several columns that contain strings 't' and 'f' to signify True and False. I will be updating these strings with the appropriate boolean values.

In [None]:
#List of columns to convert t's to 1's and f's to 0's
cols = ['host_has_profile_pic', 'host_identity_verified', 'host_is_superhost','instant_bookable','is_location_exact',
       'require_guest_phone_verification','require_guest_profile_picture','requires_license']

#Create dictionary to map True and False
mymap = {'t':True, 'f':False}

#Replace t's and f's with 1 and 0
listings[cols]=listings[cols].applymap(lambda s: mymap.get(s) if s in mymap else s)

#Fill missing values with a False
listings[cols] = listings[cols].fillna(0)

#Check
listings[cols].head(3)

**City Column**

In [None]:
#Replace neighborhood information with San Fancisco
listings.city.replace('^(?!South|D|V|Br|Ba|Nor).*', 'San Francisco', regex=True, inplace=True)

#Create list of outliers
outliers = ['Bay Area', 'Nor cal', 'Vallejo']

#Investigate rows with these outlier cities
listings[listings.city.isin(outliers)]

Outlier properties are SF properties. Updating city column

In [None]:
#Update city column
listings.city[listings.city.isin(outliers)]= 'San Francisco'

#Strip white space
listings.city = listings.city.str.strip()

#Check city values
listings.groupby('city')['city'].count()

**Miscellaneous column cleaning**

In [None]:
#convert 'a week ago' to '1 week ago' in calendar_updated
listings['calendar_updated'].replace('a week ago', '1 week ago', inplace=True)

**Zipcode**

In [None]:
import os

os.environ["GOOGLE_API_KEY"] = "AIzaSyAxsYeHC9EVsnvV7eiqto8JkwT43cXJm3g"
import geocoder

reverse = geocoder.google([37.727920, -122.440290], method = 'reverse', key = 'AIzaSyAxsYeHC9EVsnvV7eiqto8JkwT43cXJm3g')

print('San Francisco, California', reverse.city)

In [None]:
#Capture lat and long of rows with CA as the zipcode. 
print(listings[['latitude', 'longitude', 'zipcode']][listings.zipcode =='CA'])

#Per lat/long, zip = 94112. Updating 
listings.zipcode[listings.zipcode =='CA'] = 94112

In [None]:
#Remove trailing 0's
listings['zipcode'] = listings['zipcode'].replace('\.0$', '', regex=True)

#Remove CA\s from zipcode
listings.zipcode.replace('CA\s','',inplace = True, regex=True)

### Missing Values

In [None]:
#Total data points available
total = listings.notna().sum().sort_values(ascending=True)

#Capture total number of missing data per column
total_missing = listings.isna().sum().sort_values(ascending=False)

#Calculate the % of missing data per column
percent = (listings.isnull().sum()/listings.isnull().count()).sort_values(ascending=False)


#Concatenate into a pd dataframe
missing_data = pd.concat([total, total_missing, percent], axis=1, keys=['Total','Missing', 'Missing %'])

#Format percentage  
missing_data['Missing %'] = missing_data['Missing %'].apply(lambda x: x * 100)

#Sort missing_data by Missing Percent
missing_data= missing_data.sort_values(by = 'Missing %', ascending=False)

#View columns with data missing
missing_data.loc[missing_data['Missing %'] > 0]

At this time, I will leave the data as is. We have a lot of rows to build a model from. 

### Data Type Conversion

In [None]:
listings.dtypes

#Need to be numeric (int64)
cleaning_fee, extra_people, host_response_rate                              

#Need to be date
calendar_last_scraped, calendar_updated, first_review  

#/need to be bool
host_has_profile_pic, host_identity_verified, host_is_superhost

**Format boolean columns**

In [None]:
#Converting boolean columns to int8
cols = ['host_has_profile_pic', 'host_identity_verified', 'host_is_superhost','instant_bookable','is_location_exact',
       'require_guest_phone_verification','require_guest_profile_picture','requires_license']

#Convert cols to int 8 to save memory and check
listings[cols] = listings[cols].astype('int8', errors='ignore')

#check
listings[cols].dtypes

**Format date columns**

In [None]:
#List of columns to convert to dates
cols = ['calendar_last_scraped','first_review', 'host_since', 'last_review']

#Convert cols to date time
listings[cols] = listings[cols].swifter.apply(pd.to_datetime,errors='coerce', axis=1)

#Check
listings[cols].dtypes

**Formatting strings to numerics**

In [None]:
#List of columns to convert to numeric
cols = ['cleaning_fee','extra_people','price','host_response_rate','security_deposit']

#Remove $,commas, and % from cols in listings
listings[cols] = listings[cols].replace('[$%,]', '', regex=True)

#Convert columns in cols to numeric
listings[cols]= listings[cols].swifter.apply(pd.to_numeric, axis=1)

#Check
listings[cols].dtypes

In [None]:
#Convert zipcodes to numeric
listings.zipcode = listings.zipcode.swifter.apply(pd.to_numeric, axis=1)

# Remove this if not needed

In [None]:
#Present listings memory usage
#listings.memory_usage(deep=True).sort_values(ascending=False)

- 'int8' for small integers
- 'category' for strings with few unique values
- 'Sparse' if most values are 0 or NaN


In [None]:
# listings = listings.astype({ : 'int8',
#                              : 'category',
#                              : 'Sparse[int/str]'})

# int8: accommodates

#### Listings Outlier Removal for Price variable

In [None]:
#Airbnb price distributions
listings.price.plot(kind = 'hist', bins=55,  title='Price Distribution in San Francisco',
          legend = True, figsize=(10,6));

#Get axis object
ax = plt.gca()

#Format X axis
ax.get_xaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "$ {:,}".format(int(x))))

#Mute grid lines
ax.grid(b=False, which ='major', axis = 'x')

In [None]:
print(listings.shape)

In [None]:
#Calculate IQR of price
q25 = listings['price'].quantile(0.25)
q75= listings['price'].quantile(0.75)
iqr = q75 - q25

#Print percentiles
print('Percentiles: 25th={:.3f}, 75th={:.3f} \nIQR= {:.3f}'.format(q25, q75, iqr))

#Calculate outlier cutoffs
cut_off =1.5 * iqr
lower, upper = q25 - cut_off, q75 + iqr

#Identify outliers
outliers = [x for x in listings.price if x < lower or x > upper]
print("Number of outliers identified: {}".format(len(outliers)))

#Remove outliers
outliers_removed = [x for x in listings.price if x >= lower and x <= upper]
print('Non-outlier observations: {}'.format(len(outliers_removed)))

#Update df
listings = listings[listings.price.isin(outliers_removed)]
listings.shape

In [None]:
#Airbnb price distributions(outliers removed)
listings.price.plot(kind = 'hist', bins=40,  title='Price Distribution in San Francisco',
           figsize=(10,6));

#Capture mean and median of price
mean = listings.price.mean()
median = listings.price.median()

#Plot mean and median
plt.axvline(mean, color='r', linewidth=2, linestyle='--', label= str(round(mean,2)))
plt.axvline(median, color='green',linewidth=2, linestyle='--', label= str(median))

#Get axis object
ax = plt.gca()

#Format X axis
ax.get_xaxis().set_major_formatter(plt.FuncFormatter(lambda x, loc: "${:,}".format(int(x))))

#Mute grid lines
plt.grid(b=False, which ='major', axis = 'x')

plt.legend(loc='upper right',frameon=True, fancybox=True)


In [None]:
# #Set path to write listings
# path = r'Data\02_Intermediate\12_24_2019_listings_cleaned.csv'

# #Write listings to path
# listings.to_csv(path)