## Final Project Submission

Please fill out:
* Student name: Laura Lewis
* Student pace: full time
* Scheduled project review date/time: 15 May 2019, 3:30pm BST
* Instructor name: Joe San Pietro
* Blog post URL:
***

# Table of contents

[1. Introduction and project aims](#Introduction-and-project-aims)

[2. The dataset](#The-dataset)

[3. Cleaning and pre-processing](#Cleaning-and-pre-processing)

[4. Exploratory data analysis](#Exploratory-data-analysis)

[5. Building a neural network](#Building-a-neural-network)

[6. Conclusions and recommendations](#Conclusions-and-recommendations)

***

# Introduction and project aims

- Short description of Airbnb
- Why Airbnb pricing is important
- Difficult thing to do correctly, to balance revenue and occupancy (explanation)
- Several pricing algorithms exist, including Airbnb's own, but all of them require you to enter a base price first (and sometimes also a minimum and maximum price).
- This project aims to build a neural network to predict the base price for properties in London.

***

# The dataset

- Insideairbnb.com - anti-Airbnb lobby group that scrapes data
- Includes data on all Airbnb listings in London that are live on x (date)
- Limitations - messy data. Most importantly, only includes advertised price, not actual average price paid or the price advertised on the calendar - each day can have a different price.
- Advertised price is set by the listing owner and can be any amount. This is the price that you see on Airbnb if you don't enter dates. The sensible option is to set it at the lowest possible price that your property is actually listed at, in order to entice more customers in. However, a lot of people do not know how to set up Airbnb listings well, and so this is sometimes also set at very high values.
- This dataset can be used as a proof of concept. A more accurate version could be built using data on the actual average nightly rates paid, e.g. from sites like AirDNA.

***

# Cleaning and pre-processing

In this section...

### Importing the libraries and data

In [1]:
# Importing required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [17]:
raw_df = pd.read_csv('data/listings.csv')
print(f"The dataset contains {len(raw_df)} Airbnb listings")
pd.set_option('display.max_columns', len(raw_df.columns)) # To view all columns
raw_df.head(3)

The dataset contains 79671 Airbnb listings




Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,experiences_offered,neighborhood_overview,notes,transit,access,interaction,house_rules,thumbnail_url,medium_url,picture_url,xl_picture_url,host_id,host_url,host_name,host_since,host_location,host_about,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_thumbnail_url,host_picture_url,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,street,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,city,state,zipcode,market,smart_location,country_code,country,latitude,longitude,is_location_exact,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,square_feet,price,weekly_price,monthly_price,security_deposit,cleaning_fee,guests_included,extra_people,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,calendar_last_scraped,number_of_reviews,number_of_reviews_ltm,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,requires_license,license,jurisdiction_names,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,13913,https://www.airbnb.com/rooms/13913,20190409040957,2019-04-11,Holiday London DB Room Let-on going,My bright double bedroom with a large window h...,"Hello Everyone, I'm offering my lovely double ...",My bright double bedroom with a large window h...,business,Finsbury Park is a friendly melting pot commun...,For art lovers I can give guest my Tate Member...,The flat only a 10 minute walk to Finsbury Par...,Guest will have access to the self catering ki...,I like to have little chats with my guest over...,I'm an artist and have my artwork up on the wa...,,,https://a0.muscache.com/im/pictures/985879/b06...,,54730,https://www.airbnb.com/users/show/54730,Alina,2009-11-16,"London, England, United Kingdom",I am a Multi-Media Visual Artist and Creative ...,within a day,60%,,f,https://a0.muscache.com/im/users/54730/profile...,https://a0.muscache.com/im/users/54730/profile...,LB of Islington,4.0,4.0,"['email', 'phone', 'facebook', 'reviews']",t,f,"Islington, Greater London, United Kingdom",LB of Islington,Islington,,Islington,Greater London,N4 3,London,"Islington, United Kingdom",GB,United Kingdom,51.56802,-0.11121,t,Apartment,Private room,2,1.0,1.0,0.0,Real Bed,"{TV,""Cable TV"",Wifi,Kitchen,""Paid parking off ...",538.0,$65.00,$333.00,"$1,176.00",$100.00,$15.00,1,$15.00,1,29,1,1,29,29,1.0,29.0,4 months ago,t,10,39,68,343,2019-04-11,14,3,2010-08-18,2018-06-17,95.0,9.0,10.0,9.0,10.0,9.0,9.0,f,,,f,f,moderate,f,f,3,1,2,0,0.13
1,15400,https://www.airbnb.com/rooms/15400,20190409040957,2019-04-11,Bright Chelsea Apartment. Chelsea!,Lots of windows and light. St Luke's Gardens ...,Bright Chelsea Apartment This is a bright one...,Lots of windows and light. St Luke's Gardens ...,romantic,It is Chelsea.,The building next door is in the process of be...,The underground stations are South Kensington ...,There are two wardrobes for guests exclusive u...,If I am in the country I like to welcome my gu...,NO SMOKING PLEASE.. No unauthorised guests. No...,,,https://a0.muscache.com/im/pictures/428392/462...,,60302,https://www.airbnb.com/users/show/60302,Philippa,2009-12-05,"Kensington, England, United Kingdom","English, grandmother, I have travelled quite ...",within a few hours,100%,,f,https://a0.muscache.com/im/users/60302/profile...,https://a0.muscache.com/im/users/60302/profile...,Chelsea,1.0,1.0,"['email', 'phone', 'reviews', 'jumio', 'govern...",t,t,"London, United Kingdom",Chelsea,Kensington and Chelsea,,London,,SW3,London,"London, United Kingdom",GB,United Kingdom,51.48796,-0.16898,t,Apartment,Entire home/apt,2,1.0,1.0,1.0,Real Bed,"{TV,""Cable TV"",Internet,Wifi,""Air conditioning...",,$100.00,$600.00,"$2,250.00",$150.00,$50.00,2,$0.00,3,50,3,3,50,50,3.0,50.0,5 weeks ago,t,4,4,4,134,2019-04-11,81,0,2009-12-21,2018-03-30,95.0,10.0,10.0,10.0,10.0,10.0,9.0,f,,,f,f,strict_14_with_grace_period,t,t,1,1,0,0,0.71
2,17402,https://www.airbnb.com/rooms/17402,20190409040957,2019-04-11,Superb 3-Bed/2 Bath & Wifi: Trendy W1,"Open from June 2018 after a 3-year break, we a...",Ready again from June 2018 for bookings after ...,"Open from June 2018 after a 3-year break, we a...",none,"Location, location, location! You won't find b...",This property has new flooring throughout. Gue...,You can walk to tourist London or take numerou...,Full use of whole independent apartment,"Always available by email or phone (before, du...",The apartment benefits from new flooring throu...,,,https://a0.muscache.com/im/pictures/5673eb4f-a...,,67564,https://www.airbnb.com/users/show/67564,Liz,2010-01-04,"London, England, United Kingdom",We are Liz and Jack. We manage a number of ho...,within a few hours,62%,,t,https://a0.muscache.com/im/users/67564/profile...,https://a0.muscache.com/im/users/67564/profile...,Fitzrovia,16.0,16.0,"['email', 'phone', 'reviews', 'jumio', 'offlin...",t,t,"London, Fitzrovia, United Kingdom",Fitzrovia,Westminster,,London,Fitzrovia,W1T4BP,London,"London, United Kingdom",GB,United Kingdom,51.52098,-0.14002,t,Apartment,Entire home/apt,6,2.0,3.0,3.0,Real Bed,"{TV,Wifi,Kitchen,""Paid parking off premises"",E...",,$500.00,"$1,378.00",,$350.00,$65.00,4,$10.00,3,365,3,3,365,365,3.0,365.0,yesterday,t,30,60,89,364,2019-04-11,39,14,2011-03-21,2018-10-15,93.0,10.0,9.0,9.0,9.0,10.0,9.0,f,,,f,f,strict_14_with_grace_period,f,f,13,13,0,0,0.4


### Dropping columns

NLP will not be used in the creation of an initial model (although it could be used to augment the model later, e.g. through sentiment analysis). Therefore, free-text columns will be dropped for now, as will other columns that are not useful for predicting price (e.g. URLs, host name and other host-related features that are unrelated to the property).

In [31]:
cols_to_drop = ['listing_url', 'scrape_id', 'last_scraped', 'name', 'summary', 'space', 'description', 'neighborhood_overview', 'notes', 'transit', 'access', 'interaction', 'house_rules', 'thumbnail_url', 'medium_url', 'picture_url', 'xl_picture_url', 'host_id', 'host_url', 'host_name', 'host_location', 'host_about', 'host_thumbnail_url', 'host_picture_url', 'host_neighbourhood', 'host_verifications', 'calendar_last_scraped']
df = raw_df.drop(cols_to_drop, axis=1)

Unnamed: 0,id,experiences_offered,host_since,host_location,host_response_time,host_response_rate,host_acceptance_rate,host_is_superhost,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,street,neighbourhood,neighbourhood_cleansed,neighbourhood_group_cleansed,city,state,zipcode,market,smart_location,country_code,country,latitude,longitude,is_location_exact,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,square_feet,price,weekly_price,monthly_price,security_deposit,cleaning_fee,guests_included,extra_people,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,number_of_reviews,number_of_reviews_ltm,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,requires_license,license,jurisdiction_names,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,13913,business,2009-11-16,"London, England, United Kingdom",within a day,60%,,f,LB of Islington,4.0,4.0,"['email', 'phone', 'facebook', 'reviews']",t,f,"Islington, Greater London, United Kingdom",LB of Islington,Islington,,Islington,Greater London,N4 3,London,"Islington, United Kingdom",GB,United Kingdom,51.56802,-0.11121,t,Apartment,Private room,2,1.0,1.0,0.0,Real Bed,"{TV,""Cable TV"",Wifi,Kitchen,""Paid parking off ...",538.0,$65.00,$333.00,"$1,176.00",$100.00,$15.00,1,$15.00,1,29,1,1,29,29,1.0,29.0,4 months ago,t,10,39,68,343,14,3,2010-08-18,2018-06-17,95.0,9.0,10.0,9.0,10.0,9.0,9.0,f,,,f,f,moderate,f,f,3,1,2,0,0.13
1,15400,romantic,2009-12-05,"Kensington, England, United Kingdom",within a few hours,100%,,f,Chelsea,1.0,1.0,"['email', 'phone', 'reviews', 'jumio', 'govern...",t,t,"London, United Kingdom",Chelsea,Kensington and Chelsea,,London,,SW3,London,"London, United Kingdom",GB,United Kingdom,51.48796,-0.16898,t,Apartment,Entire home/apt,2,1.0,1.0,1.0,Real Bed,"{TV,""Cable TV"",Internet,Wifi,""Air conditioning...",,$100.00,$600.00,"$2,250.00",$150.00,$50.00,2,$0.00,3,50,3,3,50,50,3.0,50.0,5 weeks ago,t,4,4,4,134,81,0,2009-12-21,2018-03-30,95.0,10.0,10.0,10.0,10.0,10.0,9.0,f,,,f,f,strict_14_with_grace_period,t,t,1,1,0,0,0.71
2,17402,none,2010-01-04,"London, England, United Kingdom",within a few hours,62%,,t,Fitzrovia,16.0,16.0,"['email', 'phone', 'reviews', 'jumio', 'offlin...",t,t,"London, Fitzrovia, United Kingdom",Fitzrovia,Westminster,,London,Fitzrovia,W1T4BP,London,"London, United Kingdom",GB,United Kingdom,51.52098,-0.14002,t,Apartment,Entire home/apt,6,2.0,3.0,3.0,Real Bed,"{TV,Wifi,Kitchen,""Paid parking off premises"",E...",,$500.00,"$1,378.00",,$350.00,$65.00,4,$10.00,3,365,3,3,365,365,3.0,365.0,yesterday,t,30,60,89,364,39,14,2011-03-21,2018-10-15,93.0,10.0,9.0,9.0,9.0,10.0,9.0,f,,,f,f,strict_14_with_grace_period,f,f,13,13,0,0,0.4
3,24328,family,2009-09-28,"London, England, United Kingdom",within a day,91%,,t,Battersea,3.0,3.0,"['email', 'phone', 'reviews', 'jumio', 'offlin...",t,t,"London, United Kingdom",LB of Wandsworth,Wandsworth,,London,,SW11 5GX,London,"London, United Kingdom",GB,United Kingdom,51.47298,-0.16376,t,Townhouse,Entire home/apt,4,1.5,2.0,2.0,Real Bed,"{TV,""Cable TV"",Internet,Wifi,Kitchen,""Free par...",1001.0,$175.00,"$1,050.00","$3,500.00",$250.00,$70.00,2,$0.00,30,1125,30,30,1125,1125,30.0,1125.0,5 weeks ago,t,12,14,44,319,92,0,2010-11-15,2016-09-07,98.0,10.0,10.0,10.0,10.0,9.0,9.0,f,,,f,f,moderate,t,t,1,1,0,0,0.9
4,25023,none,2010-04-03,"Rome, Lazio, Italy",within an hour,100%,,t,Wimbledon,1.0,1.0,"['email', 'phone', 'reviews']",t,f,"Wimbledon, London, United Kingdom",LB of Wandsworth,Wandsworth,,Wimbledon,London,SW19 6QH,London,"Wimbledon, United Kingdom",GB,United Kingdom,51.44687,-0.21874,t,Apartment,Entire home/apt,4,1.0,2.0,2.0,Real Bed,"{TV,Wifi,Kitchen,""Free parking on premises"",El...",700.0,$65.00,$630.00,"$2,515.00",$250.00,$50.00,2,$11.00,4,100,4,4,100,100,4.0,100.0,5 weeks ago,t,0,0,14,16,27,5,2016-03-05,2019-03-11,91.0,10.0,9.0,9.0,9.0,9.0,9.0,f,,,f,f,moderate,f,f,1,1,0,0,0.71
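
The drop step above can be made a little more defensive. The sketch below uses a small toy frame (hypothetical values, not the real listings data) and an abbreviated column list. It reports any requested columns that are absent before dropping the ones that exist, which helps catch typos in a long list of column names.

```python
import pandas as pd

# Toy stand-in for raw_df (hypothetical values, not the real dataset)
toy_raw = pd.DataFrame({
    "id": [1, 2],
    "listing_url": ["https://example.com/1", "https://example.com/2"],
    "name": ["Flat A", "Flat B"],
    "price": ["$65.00", "$100.00"],
})

# Abbreviated list for illustration; 'host_about' is deliberately absent from the toy frame
cols_to_drop = ["listing_url", "name", "host_about"]

# Requested columns that do not exist in the frame
missing = [c for c in cols_to_drop if c not in toy_raw.columns]

# errors="ignore" skips absent labels instead of raising a KeyError
toy_df = toy_raw.drop(columns=cols_to_drop, errors="ignore")

print("Not found:", missing)
print("Remaining:", list(toy_df.columns))
```

Returning a new frame rather than using `inplace=True` keeps the raw data untouched, so the cell can be re-run safely.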


Other columns can be dropped because they contain a majority of null entries.

In [36]:
df.isna().sum()

id                                                  0
experiences_offered                                 0
host_since                                        248
host_location                                     473
host_response_time                              25699
host_response_rate                              25699
host_acceptance_rate                            79671
host_is_superhost                                 248
host_neighbourhood                              19643
host_listings_count                               248
host_total_listings_count                         248
host_verifications                                  0
host_has_profile_pic                              248
host_identity_verified                            248
street                                              0
neighbourhood                                       0
neighbourhood_cleansed                              0
neighbourhood_group_cleansed                    79671
city                        

In [37]:
df.drop(['host_acceptance_rate', 'neighbourhood_group_cleansed', 'square_feet', 'weekly_price', 'monthly_price', 'license', 'jurisdiction_names'], axis=1, inplace=True)
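
Instead of reading the null counts by eye, the mostly-null columns could be selected from the share of missing values per column. This is a minimal sketch on a toy frame (made-up values). The 50% threshold is an assumption chosen to match "a majority of null entries", and it has not been checked that it reproduces the hand-picked list above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df (hypothetical values)
toy_df = pd.DataFrame({
    "bedrooms": [1.0, 2.0, 3.0, 1.0],
    "square_feet": [538.0, np.nan, np.nan, np.nan],
    "license": [np.nan, np.nan, np.nan, np.nan],
    "host_response_rate": ["60%", np.nan, "100%", "62%"],
})

# Share of missing values per column (mean of a boolean mask)
null_share = toy_df.isna().mean()

threshold = 0.5  # assumed cut-off: drop columns that are more than half null
mostly_null = null_share[null_share > threshold].index.tolist()

toy_clean = toy_df.drop(columns=mostly_null)
print(mostly_null)
```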

In [46]:
df.set_index('id', inplace=True) # The id will be used as the index, as this could be useful in future e.g. if a separate dataset containing reviews for each property is linked to this one

Unnamed: 0_level_0,experiences_offered,host_since,host_response_time,host_response_rate,host_is_superhost,host_neighbourhood,host_listings_count,host_total_listings_count,host_verifications,host_has_profile_pic,host_identity_verified,street,neighbourhood,neighbourhood_cleansed,city,state,zipcode,market,smart_location,country_code,country,latitude,longitude,is_location_exact,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,price,security_deposit,cleaning_fee,guests_included,extra_people,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,number_of_reviews,number_of_reviews_ltm,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,requires_license,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
13913,business,2009-11-16,within a day,60%,f,LB of Islington,4.0,4.0,"['email', 'phone', 'facebook', 'reviews']",t,f,"Islington, Greater London, United Kingdom",LB of Islington,Islington,Islington,Greater London,N4 3,London,"Islington, United Kingdom",GB,United Kingdom,51.56802,-0.11121,t,Apartment,Private room,2,1.0,1.0,0.0,Real Bed,"{TV,""Cable TV"",Wifi,Kitchen,""Paid parking off ...",$65.00,$100.00,$15.00,1,$15.00,1,29,1,1,29,29,1.0,29.0,4 months ago,t,10,39,68,343,14,3,2010-08-18,2018-06-17,95.0,9.0,10.0,9.0,10.0,9.0,9.0,f,f,f,moderate,f,f,3,1,2,0,0.13
15400,romantic,2009-12-05,within a few hours,100%,f,Chelsea,1.0,1.0,"['email', 'phone', 'reviews', 'jumio', 'govern...",t,t,"London, United Kingdom",Chelsea,Kensington and Chelsea,London,,SW3,London,"London, United Kingdom",GB,United Kingdom,51.48796,-0.16898,t,Apartment,Entire home/apt,2,1.0,1.0,1.0,Real Bed,"{TV,""Cable TV"",Internet,Wifi,""Air conditioning...",$100.00,$150.00,$50.00,2,$0.00,3,50,3,3,50,50,3.0,50.0,5 weeks ago,t,4,4,4,134,81,0,2009-12-21,2018-03-30,95.0,10.0,10.0,10.0,10.0,10.0,9.0,f,f,f,strict_14_with_grace_period,t,t,1,1,0,0,0.71
17402,none,2010-01-04,within a few hours,62%,t,Fitzrovia,16.0,16.0,"['email', 'phone', 'reviews', 'jumio', 'offlin...",t,t,"London, Fitzrovia, United Kingdom",Fitzrovia,Westminster,London,Fitzrovia,W1T4BP,London,"London, United Kingdom",GB,United Kingdom,51.52098,-0.14002,t,Apartment,Entire home/apt,6,2.0,3.0,3.0,Real Bed,"{TV,Wifi,Kitchen,""Paid parking off premises"",E...",$500.00,$350.00,$65.00,4,$10.00,3,365,3,3,365,365,3.0,365.0,yesterday,t,30,60,89,364,39,14,2011-03-21,2018-10-15,93.0,10.0,9.0,9.0,9.0,10.0,9.0,f,f,f,strict_14_with_grace_period,f,f,13,13,0,0,0.4


In [53]:
df.head(3)

Unnamed: 0_level_0,experiences_offered,host_since,host_response_time,host_response_rate,host_is_superhost,host_listings_count,host_total_listings_count,host_has_profile_pic,host_identity_verified,street,neighbourhood,neighbourhood_cleansed,city,state,zipcode,market,smart_location,country_code,country,latitude,longitude,is_location_exact,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,price,security_deposit,cleaning_fee,guests_included,extra_people,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,number_of_reviews,number_of_reviews_ltm,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,requires_license,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
13913,business,2009-11-16,within a day,60%,f,4.0,4.0,t,f,"Islington, Greater London, United Kingdom",LB of Islington,Islington,Islington,Greater London,N4 3,London,"Islington, United Kingdom",GB,United Kingdom,51.56802,-0.11121,t,Apartment,Private room,2,1.0,1.0,0.0,Real Bed,"{TV,""Cable TV"",Wifi,Kitchen,""Paid parking off ...",$65.00,$100.00,$15.00,1,$15.00,1,29,1,1,29,29,1.0,29.0,4 months ago,t,10,39,68,343,14,3,2010-08-18,2018-06-17,95.0,9.0,10.0,9.0,10.0,9.0,9.0,f,f,f,moderate,f,f,3,1,2,0,0.13
15400,romantic,2009-12-05,within a few hours,100%,f,1.0,1.0,t,t,"London, United Kingdom",Chelsea,Kensington and Chelsea,London,,SW3,London,"London, United Kingdom",GB,United Kingdom,51.48796,-0.16898,t,Apartment,Entire home/apt,2,1.0,1.0,1.0,Real Bed,"{TV,""Cable TV"",Internet,Wifi,""Air conditioning...",$100.00,$150.00,$50.00,2,$0.00,3,50,3,3,50,50,3.0,50.0,5 weeks ago,t,4,4,4,134,81,0,2009-12-21,2018-03-30,95.0,10.0,10.0,10.0,10.0,10.0,9.0,f,f,f,strict_14_with_grace_period,t,t,1,1,0,0,0.71
17402,none,2010-01-04,within a few hours,62%,t,16.0,16.0,t,t,"London, Fitzrovia, United Kingdom",Fitzrovia,Westminster,London,Fitzrovia,W1T4BP,London,"London, United Kingdom",GB,United Kingdom,51.52098,-0.14002,t,Apartment,Entire home/apt,6,2.0,3.0,3.0,Real Bed,"{TV,Wifi,Kitchen,""Paid parking off premises"",E...",$500.00,$350.00,$65.00,4,$10.00,3,365,3,3,365,365,3.0,365.0,yesterday,t,30,60,89,364,39,14,2011-03-21,2018-10-15,93.0,10.0,9.0,9.0,9.0,10.0,9.0,f,f,f,strict_14_with_grace_period,f,f,13,13,0,0,0.4


host_listings_count and host_total_listings_count are the same in all but 248 cases, and in those cases both values are NaN (NaN never compares equal to NaN, so these rows are counted as mismatches). Therefore one of these columns can be dropped. The calculated_host_listings_count columns, which split a host's listings by type, will also be dropped, as they will be highly correlated (one will be the total of the others).

In [61]:
# NaN != NaN evaluates to True, so rows where both values are missing count as mismatches
mismatch = df.host_listings_count != df.host_total_listings_count
print(mismatch.sum())
df.loc[mismatch]

248


Unnamed: 0_level_0,experiences_offered,host_since,host_response_time,host_response_rate,host_is_superhost,host_listings_count,host_total_listings_count,host_has_profile_pic,host_identity_verified,street,neighbourhood,neighbourhood_cleansed,city,state,zipcode,market,smart_location,country_code,country,latitude,longitude,is_location_exact,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,price,security_deposit,cleaning_fee,guests_included,extra_people,minimum_nights,maximum_nights,minimum_minimum_nights,maximum_minimum_nights,minimum_maximum_nights,maximum_maximum_nights,minimum_nights_avg_ntm,maximum_nights_avg_ntm,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,number_of_reviews,number_of_reviews_ltm,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,requires_license,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
381741,none,,,,,,,,,"London, United Kingdom",LB of Brent,Brent,London,,NW10,London,"London, United Kingdom",GB,United Kingdom,51.53571,-0.24497,f,Apartment,Entire home/apt,6,2.0,2.0,2.0,Real Bed,"{TV,""Cable TV"",Internet,Wifi,Kitchen,""Pets liv...",$100.00,$250.00,$35.00,1,$0.00,1,1125,1,1,1125,1125,1.0,1125.0,today,t,11,20,37,310,7,0,2012-06-10,2013-01-02,100.0,9.0,9.0,9.0,10.0,8.0,9.0,f,f,f,strict_14_with_grace_period,f,f,1,1,0,0,0.08
388743,none,,,,,,,,,"Lee, London, United Kingdom",LB of Lewisham,Lewisham,Lee,London,SE12 0PT,London,"Lee, United Kingdom",GB,United Kingdom,51.45615,0.00956,f,Apartment,Private room,1,,1.0,1.0,Real Bed,{Internet},$80.00,,,1,$0.00,1,1125,1,1,1125,1125,1.0,1125.0,never,t,30,60,90,365,0,0,,,,,,,,,,f,f,f,flexible,f,f,1,0,1,0,
396100,none,,,,,,,,,"London, United Kingdom",LB of Bromley,Bromley,London,,SE26 4EQ,London,"London, United Kingdom",GB,United Kingdom,51.42273,-0.05679,t,House,Private room,4,1.5,2.0,4.0,Real Bed,"{TV,Wifi,""Pets allowed"",""Pets live on this pro...",$60.00,$0.00,$0.00,4,$0.00,1,7,1,2,7,7,1.3,7.0,4 days ago,t,16,37,61,123,20,4,2017-04-04,2019-01-04,99.0,10.0,10.0,10.0,10.0,9.0,9.0,f,f,f,flexible,f,f,6,0,6,0,0.81
400441,none,,,,,,,,,"London, United Kingdom",LB of Newham,Newham,London,,E12 6UW,London,"London, United Kingdom",GB,United Kingdom,51.54841,0.04934,t,Apartment,Private room,2,2.0,1.0,1.0,Real Bed,"{TV,Wifi,Kitchen,""Free parking on premises"",He...",$30.00,,,1,$20.00,180,1825,180,180,1825,1825,180.0,1825.0,21 months ago,t,30,60,90,365,0,0,,,,,,,,,,f,f,f,moderate,f,f,1,0,1,0,
423592,none,,,,,,,,,"Barking, United Kingdom",LB of Barking and Dagenham,Barking and Dagenham,Barking,,IG11 8LJ,London,"Barking, United Kingdom",GB,United Kingdom,51.53922,0.07048,t,Apartment,Private room,2,0.0,1.0,1.0,Real Bed,{},$70.00,,,1,$0.00,1,1125,1,1,1125,1125,1.0,1125.0,85 months ago,t,30,60,90,365,0,0,,,,,,,,,,f,f,f,flexible,f,f,1,0,1,0,
456696,none,,,,,,,,,"London, England, United Kingdom",LB of Newham,Newham,London,England,E16,London,"London, United Kingdom",GB,United Kingdom,51.50869,0.08019,f,House,Private room,2,,1.0,1.0,Real Bed,{},$75.00,,,1,$0.00,1,1125,1,1,1125,1125,1.0,1125.0,85 months ago,t,30,60,90,365,0,0,,,,,,,,,,f,f,f,flexible,f,f,1,0,1,0,
460680,none,,,,,,,,,"London, United Kingdom",LB of Newham,Newham,London,,E12 5AB,London,"London, United Kingdom",GB,United Kingdom,51.55088,0.0463,t,Apartment,Private room,2,1.0,1.0,1.0,Real Bed,"{Internet,Wifi,Kitchen,""Free parking on premis...",$110.00,,,1,$0.00,1,1125,1,1,1125,1125,1.0,1125.0,never,t,30,60,90,365,0,0,,,,,,,,,,f,f,f,flexible,f,f,1,0,1,0,
471753,none,,,,,,,,,"London, England, United Kingdom",LB of Newham,Newham,London,England,E12,London,"London, United Kingdom",GB,United Kingdom,51.54563,0.04566,f,House,Private room,2,1.0,1.0,1.0,Real Bed,"{TV,Internet,Wifi,Kitchen,""Smoking allowed"",Br...",$95.00,,,3,$35.00,1,180,1,1,180,180,1.0,180.0,never,t,30,60,90,365,0,0,,,,,,,,,,f,f,f,flexible,f,f,1,0,1,0,
486512,social,,,,,,,,,"Greater London, England, United Kingdom",LB of Redbridge,Redbridge,Greater London,England,IG5,London,"Greater London, United Kingdom",GB,United Kingdom,51.58583,0.06578,t,House,Private room,2,2.5,2.0,2.0,Real Bed,"{TV,""Cable TV"",Wifi,Kitchen,""Free parking on p...",$50.00,$100.00,,1,$20.00,5,60,5,5,60,60,5.0,60.0,12 months ago,t,30,60,90,365,0,0,,,,,,,,,,f,t,f,strict_14_with_grace_period,f,f,1,0,1,0,
510198,none,,,,,,,,,"London, United Kingdom",LB of Redbridge,Redbridge,London,,E12 5DZ,London,"London, United Kingdom",GB,United Kingdom,51.56385,0.03662,t,Bed and breakfast,Private room,2,2.0,1.0,1.0,Real Bed,"{TV,""Cable TV"",Internet,Wifi,Kitchen,""Free par...",$50.00,$200.00,$20.00,2,$10.00,2,90,2,2,90,90,2.0,90.0,9 months ago,t,30,60,90,365,0,0,,,,,,,,,,f,f,f,moderate,f,f,2,0,2,0,


In [62]:
df.drop(['host_total_listings_count', 'calculated_host_listings_count', 'calculated_host_listings_count_entire_homes', 'calculated_host_listings_count_private_rooms', 'calculated_host_listings_count_shared_rooms'], axis=1, inplace=True)
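
The comparison above counts NaN rows as mismatches because NaN never compares equal to itself. The sketch below, on toy data (made-up values), separates genuine mismatches from rows where both columns are missing, and shows `Series.equals` as a NaN-aware alternative.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the two host listing count columns (hypothetical values)
toy = pd.DataFrame({
    "host_listings_count": [4.0, 1.0, np.nan, 16.0],
    "host_total_listings_count": [4.0, 1.0, np.nan, 16.0],
})
a = toy["host_listings_count"]
b = toy["host_total_listings_count"]

mismatch = a != b               # True wherever values differ, including NaN vs NaN
both_nan = a.isna() & b.isna()  # rows where both values are missing

n_mismatch = int(mismatch.sum())
n_real_mismatch = int((mismatch & ~both_nan).sum())

# Series.equals treats NaNs in the same position as equal
identical = a.equals(b)

print(n_mismatch, n_real_mismatch, identical)
```

If `n_real_mismatch` is zero, the two columns carry the same information and one of them is redundant.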

There are multiple columns for property location, including an attempt by the site that originally scraped the data to clean up the neighbourhood locations. Some of these columns can be dropped. Because all of the listings are in London, columns relating to city and country can be dropped. One column for area (borough) will be kept: 'neighbourhood_cleansed'. 'zipcode' (postcode) will be kept for now and investigated further below.

In [68]:
df.drop(['street', 'neighbourhood', 'city', 'state', 'market', 'smart_location', 'country_code', 'country', 'is_location_exact'], axis=1, inplace=True)
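
One way to support dropping city- and country-level columns is to check how many distinct values each column holds. This is a minimal sketch on a toy frame (made-up values). Whether columns such as 'market' are strictly constant in the real data is an assumption that has not been verified here.

```python
import pandas as pd

# Toy stand-in for the location columns (hypothetical values)
toy = pd.DataFrame({
    "country_code": ["GB", "GB", "GB"],
    "country": ["United Kingdom", "United Kingdom", "United Kingdom"],
    "market": ["London", "London", "London"],
    "neighbourhood_cleansed": ["Islington", "Westminster", "Wandsworth"],
})

# Number of distinct values per column, counting NaN as a value
n_unique = toy.nunique(dropna=False)

# Columns with a single value carry no information for predicting price
constant_cols = n_unique[n_unique <= 1].index.tolist()

print(constant_cols)
```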

There are multiple columns for minimum and maximum night stays, but only the two main ones (minimum_nights and maximum_nights) will be used, as there are few differences between e.g. minimum_nights and minimum_minimum_nights. The latter presumably reflects the fact that min/max night stays can vary over the year. The default (i.e. most frequently applied) min/max night stay values will be used instead.

In [71]:
# Count the listings where minimum_nights and minimum_minimum_nights disagree
(df.minimum_nights != df.minimum_minimum_nights).sum()

4698
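
The same check can be repeated for each of the other min/max variants before they are dropped. The following is a minimal sketch rather than part of the original analysis: `count_mismatches` is a hypothetical helper, and `toy` is a small made-up DataFrame that only borrows the column names from the dataset.

```python
import pandas as pd

# Toy stand-in for df: the values are invented purely for illustration
toy = pd.DataFrame({
    "minimum_nights":         [1, 2, 3, 5],
    "minimum_minimum_nights": [1, 2, 2, 5],
    "maximum_minimum_nights": [1, 4, 3, 5],
    "maximum_nights":         [30, 60, 90, 365],
    "minimum_maximum_nights": [30, 60, 90, 365],
    "maximum_maximum_nights": [30, 60, 1125, 365],
})


def count_mismatches(frame, base, variants):
    """For each variant column, count the rows that differ from the base column."""
    return {col: int((frame[base] != frame[col]).sum()) for col in variants}


min_diffs = count_mismatches(
    toy, "minimum_nights", ["minimum_minimum_nights", "maximum_minimum_nights"]
)
max_diffs = count_mismatches(
    toy, "maximum_nights", ["minimum_maximum_nights", "maximum_maximum_nights"]
)
print(min_diffs)
print(max_diffs)
```

Dividing each count by `len(frame)` would give the share of listings affected, which is easier to judge than a raw count.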

In [72]:
df.drop(['minimum_minimum_nights', 'maximum_minimum_nights', 'minimum_maximum_nights', 'maximum_maximum_nights', 'minimum_nights_avg_ntm', 'maximum_nights_avg_ntm'], axis=1, inplace=True)

In [76]:
df.head(3)

id,experiences_offered,host_since,host_response_time,host_response_rate,host_is_superhost,host_listings_count,host_has_profile_pic,host_identity_verified,neighbourhood_cleansed,zipcode,latitude,longitude,property_type,room_type,accommodates,bathrooms,bedrooms,beds,bed_type,amenities,price,security_deposit,cleaning_fee,guests_included,extra_people,minimum_nights,maximum_nights,calendar_updated,has_availability,availability_30,availability_60,availability_90,availability_365,number_of_reviews,number_of_reviews_ltm,first_review,last_review,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,requires_license,instant_bookable,is_business_travel_ready,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,reviews_per_month
13913,business,2009-11-16,within a day,60%,f,4.0,t,f,Islington,N4 3,51.56802,-0.11121,Apartment,Private room,2,1.0,1.0,0.0,Real Bed,"{TV,""Cable TV"",Wifi,Kitchen,""Paid parking off ...",$65.00,$100.00,$15.00,1,$15.00,1,29,4 months ago,t,10,39,68,343,14,3,2010-08-18,2018-06-17,95.0,9.0,10.0,9.0,10.0,9.0,9.0,f,f,f,moderate,f,f,0.13
15400,romantic,2009-12-05,within a few hours,100%,f,1.0,t,t,Kensington and Chelsea,SW3,51.48796,-0.16898,Apartment,Entire home/apt,2,1.0,1.0,1.0,Real Bed,"{TV,""Cable TV"",Internet,Wifi,""Air conditioning...",$100.00,$150.00,$50.00,2,$0.00,3,50,5 weeks ago,t,4,4,4,134,81,0,2009-12-21,2018-03-30,95.0,10.0,10.0,10.0,10.0,10.0,9.0,f,f,f,strict_14_with_grace_period,t,t,0.71
17402,none,2010-01-04,within a few hours,62%,t,16.0,t,t,Westminster,W1T4BP,51.52098,-0.14002,Apartment,Entire home/apt,6,2.0,3.0,3.0,Real Bed,"{TV,Wifi,Kitchen,""Paid parking off premises"",E...",$500.00,$350.00,$65.00,4,$10.00,3,365,yesterday,t,30,60,89,364,39,14,2011-03-21,2018-10-15,93.0,10.0,9.0,9.0,9.0,10.0,9.0,f,f,f,strict_14_with_grace_period,f,f,0.4


### Description of each column:
- experiences_offered - slightly unclear as it does not appear to directly relate to Airbnb Experiences, but this seems to be the main recommended category of travel type, e.g. business
- host_since - date that the host first joined Airbnb
- host_response_time - average amount of time the host takes to reply to messages
- host_response_rate - proportion of messages that the host replies to
- host_is_superhost - whether or not the host is a superhost, which is a mark of quality for the top-rated and most experienced hosts, and can increase your search ranking on Airbnb
- host_listings_count - how many listings the host has in total
- host_has_profile_pic - whether or not the host has provided a picture of themselves
- host_identity_verified - whether or not the host has been verified with ID
- neighbourhood_cleansed - the London borough the property is in
- zipcode - postcode of the property
- latitude
- longitude
- property_type - type of property, e.g. house or flat
- room_type - type of listing, e.g. entire home, private room or shared room
- accommodates - how many people the property accommodates
- bathrooms - number of bathrooms
- bedrooms - number of bedrooms
- beds - number of beds
- bed_type - type of bed, e.g. real bed or sofa-bed
- amenities - list of amenities
- price - nightly advertised price (the target variable)
- security_deposit - the amount required as a security deposit
- cleaning_fee - the amount of the cleaning fee (a fixed amount paid per booking)
- guests_included - the number of guests covered by the advertised price
- extra_people - the price per additional guest above the guests_included number
- minimum_nights - the minimum length of stay
- maximum_nights - the maximum length of stay
- calendar_updated - when the host last updated the calendar
- has_availability - whether there are any nights available to be booked
- availability_30 - how many nights are available to be booked in the next 30 days
- availability_60 - how many nights are available to be booked in the next 60 days
- availability_90 - how many nights are available to be booked in the next 90 days
- availability_365 - how many nights are available to be booked in the next 365 days
- number_of_reviews - the number of reviews left for the property
- number_of_reviews_ltm - the number of reviews left for the property in the last twelve months
- first_review - the date of the first review
- last_review - the date of the most recent review
- review_scores_rating - guests can score properties overall from 1 to 5 stars (stored in this dataset on a larger scale, e.g. 95.0 in the rows above, presumably out of 100)
- review_scores_accuracy - guests can score the accuracy of a property's description from 1 to 5 stars (this and the sub-scores below appear in the dataset as values such as 9.0 and 10.0, presumably out of 10)
- review_scores_cleanliness - guests can score a property's cleanliness from 1 to 5 stars
- review_scores_checkin - guests can score their check-in from 1 to 5 stars
- review_scores_communication - guests can score a host's communication from 1 to 5 stars
- review_scores_location - guests can score a property's location from 1 to 5 stars
- review_scores_value - guests can score a booking's value for money from 1 to 5 stars
- requires_license - whether or not the property requires a license to operate on Airbnb
- instant_bookable - whether or not the property can be instant booked (i.e. booked straight away, without having to message the host first and wait to be accepted)
- is_business_travel_ready - whether or not the property is deemed to be particularly suitable for business travel, which comes with certain requirements (e.g. WiFi)
- cancellation_policy - the type of cancellation policy, e.g. strict or moderate
- require_guest_profile_picture - whether or not the guest is required to have a profile picture in order to book
- require_guest_phone_verification - whether or not the guest is required to have a phone number verified in order to book
- reviews_per_month - calculated field of the average number of reviews left by guests each month

### Cleaning individual columns
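
Several of the columns described above are stored as strings that need converting before modelling: prices such as `$65.00`, rates such as `60%`, `t`/`f` flags and dates. The following is a minimal sketch of that kind of conversion, not the notebook's actual cleaning code. The helper names are hypothetical, `sample` is a made-up DataFrame, and the thousands separator in `$1,200.00` is an assumption about how larger prices are formatted.

```python
import pandas as pd

# Toy rows mimicking the string formats seen in the df.head() output;
# the "$1,200.00" value assumes larger prices use a comma separator
sample = pd.DataFrame({
    "price": ["$65.00", "$100.00", "$1,200.00"],
    "host_response_rate": ["60%", "100%", None],
    "host_is_superhost": ["f", "f", "t"],
    "host_since": ["2009-11-16", "2009-12-05", "2010-01-04"],
})


def clean_currency(series):
    """Strip '$' and ',' characters, then convert to float."""
    return series.str.replace(r"[$,]", "", regex=True).astype(float)


def clean_percent(series):
    """Convert strings like '60%' to a fraction; missing values stay missing."""
    return pd.to_numeric(series.str.rstrip("%")) / 100


def clean_flag(series):
    """Map 't'/'f' to True/False; any other value becomes missing."""
    return series.map({"t": True, "f": False})


cleaned = sample.assign(
    price=clean_currency(sample["price"]),
    host_response_rate=clean_percent(sample["host_response_rate"]),
    host_is_superhost=clean_flag(sample["host_is_superhost"]),
    host_since=pd.to_datetime(sample["host_since"]),
)
print(cleaned.dtypes)
```

The same `clean_currency` helper would apply to the other currency-formatted columns (security_deposit, cleaning_fee and extra_people), with a separate decision needed on how to treat their missing values.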

***

# Exploratory data analysis

In this section...

***

# Building a neural network

In this section...

- Try with and without lat/long
- Try with and without occupancy columns?
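
One way to organise these comparisons is to define each candidate feature set up front and then train the same model on each. The following is a minimal sketch under stated assumptions: `feature_sets` is a hypothetical helper, and the `availability_*` columns are taken to be the "occupancy columns", which the notes above do not confirm.

```python
import pandas as pd

LOCATION_COLS = ["latitude", "longitude"]
# Assumption: the availability_* columns are the "occupancy columns"
OCCUPANCY_COLS = [
    "availability_30", "availability_60", "availability_90", "availability_365",
]


def feature_sets(frame, target="price"):
    """Return named lists of feature columns to compare (hypothetical helper)."""
    all_cols = [c for c in frame.columns if c != target]
    return {
        "all": all_cols,
        "no_lat_long": [c for c in all_cols if c not in LOCATION_COLS],
        "no_occupancy": [c for c in all_cols if c not in OCCUPANCY_COLS],
    }


# Empty toy frame: only the column names matter for this illustration
toy = pd.DataFrame(columns=[
    "price", "accommodates", "latitude", "longitude",
    "availability_30", "availability_365",
])
variants = feature_sets(toy)
for name, cols in variants.items():
    print(name, cols)
```

Each list can then be used to select the model inputs, e.g. `frame[variants["no_lat_long"]]`, so that every experiment shares the same target and the same train/test split.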

***

# Conclusions and recommendations

Conclusions

**Potential directions for future work**
- Include a wider geographic area, e.g. other major cities around the world, for which data is also available from Insideairbnb.com
- Use better quality/more accurate data with actual prices paid per night
- Augment the model with NLP of listing descriptions and/or reviews, e.g. for sentiment analysis or looking for keywords
- Augment the model with a convolutional neural network to attempt to assess the quality of images (images are hugely important on Airbnb)
- In addition to predicting base prices, a sequence model could be used to calculate daily rates using data on seasonality and occupancy