## Introduction
- This document explores the AirBnB dataset containing listings and attributes for approximately 300 listings.

In [1]:
#import all the needed packages and set plots to be embedded inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
import re #Regex
from geopy.geocoders import Nominatim
import reverse_geocoder as rg 
from geopy.point import Point
from geopy.extra.rate_limiter import RateLimiter

%matplotlib inline

In [2]:
#Load the dataset into a panda dataframe
df = pd.read_csv('Bookings UK(1-3).csv')
df2 = pd.read_csv('Bookings UK(23-27).csv')
df3 = pd.read_csv('Bookings US(1-3).csv')
df4 = pd.read_csv('Bookings US(23-27).csv')

In [3]:
#Dimensions of the dataset
df.shape

(300, 6)

In [4]:
#Number of unique values in the dataset
df.nunique()

Title       292
Beds         27
Location    292
Price       174
Reviews     241
Host          3
dtype: int64

In [5]:
#Summary of the dataframe
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300 entries, 0 to 299
Data columns (total 6 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Title     300 non-null    object
 1   Beds      294 non-null    object
 2   Location  300 non-null    object
 3   Price     300 non-null    object
 4   Reviews   283 non-null    object
 5   Host      213 non-null    object
dtypes: object(6)
memory usage: 14.2+ KB


In [6]:
#Descriptive statistics of the data
df.describe()

Unnamed: 0,Title,Beds,Location,Price,Reviews,Host
count,300,294,300,300,283,213
unique,292,27,292,174,241,3
top,"<meta content=""Private Shepherds Hut - Off gri...","<span class=""dir dir-ltr"">1 double bed</span>","<div class=""t1jojoys dir dir-ltr"" id=""title_24...",$165 per night,"<span aria-hidden=""true"" class=""r1dxllyb dir d...","<div class=""t1mwk1n0 dir dir-ltr"">Superhost</div>"
freq,2,56,2,9,14,152


In [7]:
df.head()

Unnamed: 0,Title,Beds,Location,Price,Reviews,Host
0,"<meta content=""David Emanuel Suite, Grade 11* ...","<span class=""dir dir-ltr"">1 bed</span>","<div class=""t1jojoys dir dir-ltr"" id=""title_71...",$623 per night,,
1,"<meta content=""Berney - converted railway carr...","<span class=""dir dir-ltr"">2 double beds</span>","<div class=""t1jojoys dir dir-ltr"" id=""title_71...",$179 per night,"<span aria-hidden=""true"" class=""r1dxllyb dir d...",
2,"<meta content=""Superking Room- Proper B&amp;B ...","<span class=""dir dir-ltr"">1 king bed</span>","<div class=""t1jojoys dir dir-ltr"" id=""title_42...",$199 per night,"<span aria-hidden=""true"" class=""r1dxllyb dir d...","<div class=""t1mwk1n0 dir dir-ltr"">Superhost</div>"
3,"<meta content=""Kidwelly Farmhouse B&amp;B -The...","<span class=""dir dir-ltr"">2 beds</span>","<div class=""t1jojoys dir dir-ltr"" id=""title_38...",$115 per night,"<span aria-hidden=""true"" class=""r1dxllyb dir d...","<div class=""t1mwk1n0 dir dir-ltr"">Superhost</div>"
4,"<meta content=""The Lake Lodge (Windermere)"" i...","<span class=""dir dir-ltr"">4 beds</span>","<div class=""t1jojoys dir dir-ltr"" id=""title_26...",$670 per night,"<span aria-hidden=""true"" class=""r1dxllyb dir d...","<div class=""t1mwk1n0 dir dir-ltr"">Superhost</div>"


In [8]:
#Null values in the dataset
df.isnull().sum()

Title        0
Beds         6
Location     0
Price        0
Reviews     17
Host        87
dtype: int64

## Strucure of the dataset
- There are not more than 300 listings in the datasets, all scraped from the Airbnb website, with 6 features(title, number of beds, location, price, reviews and category of the host). There are both numeric and categorical data types. 
- The four different datasets are of listings from two region; US and UK with dates from 1-3 December and 23-27 December. The data will mostly help show how the prices of the listings vary from the two sets of dates and also the availability of listings in the two regions.


## Main features of interest
- I'm mostly interested in finding out which specific locations will bring in more profit if an individual decides to list a space via the Airbnb site. Also, more information about that space, number of beds, reviews given on the specific lisings and also type of the listing example, a cabin, cottage or an entire home, will greatly assist in the decision making process.

# Cleaning Data

In [9]:
# Copy of the originl datasets
df_clean = df.copy()
df2_clean = df2.copy()
df3_clean = df3.copy()
df4_clean = df4.copy()

## Issue 1: Unnecessary data in the Dataset(HTML Tags)

### Define:
- Use the lstrip and rstrip functions to get rid of the unwanted characters of the strings in the dataset.


### Code

In [10]:
df2_clean['Title'][1]

'<meta content="NEW Luxury Romantic Cottage - Idyllic Rural Bliss" itemprop="name"/>'

### Titles

In [11]:
#Strip the html tags from Listings titles
df_clean['Title'] = df_clean['Title'].map(lambda x: x.lstrip('<meta content="').rstrip(' itemprop="name"/>'))

In [12]:
df2_clean['Title'] = df2_clean['Title'].map(lambda x: x.lstrip('<meta content="').rstrip(' itemprop="name"/>'))

In [13]:
df3_clean['Title'] = df3_clean['Title'].map(lambda x: x.lstrip('<meta content="').rstrip(' itemprop="name"/>'))

In [14]:
df4_clean['Title'] = df3_clean['Title'].map(lambda x: x.lstrip('<meta content="').rstrip(' itemprop="name"/>'))

## Beds

In [15]:
df_clean['Beds'][0]

'<span class="dir dir-ltr">1 bed</span>'

In [16]:
##Strip the html tags from Listings number of beds
df_clean['Beds'] = df_clean['Beds'].astype(str).str.lstrip('<span class="dir dir-ltr">').str.rstrip('</span>')

In [17]:
df2_clean['Beds'] = df2_clean['Beds'].astype(str).str.lstrip('<span class="dir dir-ltr">').str.rstrip('</span>')

In [18]:
df3_clean['Beds'] = df3_clean['Beds'].astype(str).str.lstrip('<span class="dir dir-ltr">').str.rstrip('</span>')

In [19]:
df4_clean['Beds'] = df4_clean['Beds'].astype(str).str.lstrip('<span class="dir dir-ltr">').str.rstrip('</span>')

## Location

In [20]:
df_clean['Location'][0]

'<div class="t1jojoys dir dir-ltr" id="title_717228152145251237">Private room in Gileston</div>'

In [21]:
##Strip the html tags from Listings locations
df_clean['Location'] = df_clean['Location'].map(lambda x: x.lstrip('<div class="t1jojoys dir dir-ltr" id="title_717228152145251237">').rstrip('</div>'))

In [22]:
df2_clean['Location'] = df2_clean['Location'].map(lambda x: x.lstrip('<div class="t1jojoys dir dir-ltr" id="title_717228152145251237">').rstrip('</div>'))

In [23]:
df3_clean['Location'] = df3_clean['Location'].map(lambda x: x.lstrip('<div class="t1jojoys dir dir-ltr" id="title_717228152145251237">').rstrip('</div>'))

In [24]:
df4_clean['Location'] = df4_clean['Location'].map(lambda x: x.lstrip('<div class="t1jojoys dir dir-ltr" id="title_717228152145251237">').rstrip('</div>'))

In [25]:
#Some title tags info from the Locations were not removed using the previous strip function, the split function will get rid of those
df_clean['Location'] = df_clean['Location'].astype(str).str.split('\>', n=1, expand=True)[1]

In [26]:
df2_clean['Location'] = df2_clean['Location'].astype(str).str.split('\>', n=1, expand=True)[1]

In [27]:
df3_clean['Location'] = df3_clean['Location'].astype(str).str.split('\>', n=1, expand=True)[1]

In [28]:
df4_clean['Location'] = df4_clean['Location'].astype(str).str.split('\>', n=1, expand=True)[1]

## Reviews

In [29]:
df_clean['Reviews'][4]

'<span aria-hidden="true" class="r1dxllyb dir dir-ltr">4.93 (164)</span>'

In [30]:
##Strip the html tags from Listings reviews
df_clean['Reviews'] = df_clean['Reviews'].astype(str).str.lstrip('<span aria-hidden="true" class="r1dxllyb dir dir-ltr">').str.rstrip('</span>')

In [31]:
df2_clean['Reviews'] = df2_clean['Reviews'].astype(str).str.lstrip('<span aria-hidden="true" class="r1dxllyb dir dir-ltr">').str.rstrip('</span>')

In [32]:
df3_clean['Reviews'] = df3_clean['Reviews'].astype(str).str.lstrip('<span aria-hidden="true" class="r1dxllyb dir dir-ltr">').str.rstrip('</span>')

In [33]:
df4_clean['Reviews'] = df4_clean['Reviews'].astype(str).str.lstrip('<span aria-hidden="true" class="r1dxllyb dir dir-ltr">').str.rstrip('</span>')

## Host Info

In [34]:
df_clean['Host'][4]

'<div class="t1mwk1n0 dir dir-ltr">Superhost</div>'

In [35]:
##Strip the html tags from Listings host info
df_clean['Host'] = df_clean['Host'].astype(str).str.lstrip('<div class="t1mwk1n0 dir dir-ltr">').str.rstrip('</div>')


In [36]:
df2_clean['Host'] = df2_clean['Host'].astype(str).str.lstrip('<div class="t1mwk1n0 dir dir-ltr">').str.rstrip('</div>')


In [37]:
df3_clean['Host'] = df3_clean['Host'].astype(str).str.lstrip('<div class="t1mwk1n0 dir dir-ltr">').str.rstrip('</div>')


In [38]:
df4_clean['Host'] = df4_clean['Host'].astype(str).str.lstrip('<div class="t1mwk1n0 dir dir-ltr">').str.rstrip('</div>')


### Test

## T1

In [39]:
df_clean['Beds'].sample(4)

17      1 king bed
142    1 queen bed
278     1 king bed
213               
Name: Beds, dtype: object

In [40]:
df2_clean['Beds'].sample(2)

181    4 bed
92     3 bed
Name: Beds, dtype: object

In [41]:
df3_clean['Beds'].sample(2)

287          6 bed
272    2 queen bed
Name: Beds, dtype: object

In [42]:
df4_clean['Beds'].sample(2)

41         2 bed
37    1 king bed
Name: Beds, dtype: object

## T2

In [43]:
df_clean['Title'].sample(4)

52     Treetop Tent in Dark Skies Park - Red Dragon's...
275    Hound Tor Annexe, everything you need in one plac
145                                        The Vale  Cab
92                         The Chamber at White Rose Tow
Name: Title, dtype: object

In [44]:
df2_clean['Title'].sample(4)

165          Coastal Village.  Manorbier. NrTenby. Doubl
256    CORRIEHALL STOPOVER-hut3of4 CHARTREUS 2 single...
172    5 Star, 2 Bedroom Scandinavian Lodge with Hot Tub
29                                 The place to bee... 🐝
Name: Title, dtype: object

In [45]:
df3_clean['Title'].sample(4)

169                        Cozy Farmhouse - Pet friendly
217        The Dude's Abode A-Frame Private Ocean Access
215                 'Cozy Cabin 9 Mi fr. Glacier "Deer"'
288    Charming Cabin near Hood Canal, Lake Cushman &...
Name: Title, dtype: object

In [46]:
df4_clean['Title'].sample(4)

62                                       Off-grid itHous
211                                           Birdsong R
257       WAVERLY’S WHITE BUNGALOW OVER THE WATER~~~~~~~
299    SA Beach Suite #1 - Beachfront Apartment on Ca...
Name: Title, dtype: object

## T3

In [47]:
df_clean['Location'].sample(5)

201                            None
151    Private room in East Farndon
214                            None
183                Cabin in Cumbria
100           Home in Tyne and Wear
Name: Location, dtype: object

In [48]:
df2_clean['Location'].sample(5)

88          Cabin in North Yorkshire
216          Shepherd’s hut in Conwy
162              Farm stay in Beulah
287                             None
298    Home in Bowness on Windermere
Name: Location, dtype: object

In [49]:
df3_clean['Location'].sample(5)

88     Private room in Phenix City
209                           None
158        Bungalow in Nevada City
198         Townhouse in Surf City
234             Home in Scottsdale
Name: Location, dtype: object

In [50]:
df4_clean['Location'].sample(5)

156    Home in Jacksonville Beach
223          Cabin in Sevierville
188           Cabin in Broken Bow
248               Cabin in Athens
126           Tiny home in Marion
Name: Location, dtype: object

## T4

In [51]:
df_clean['Reviews'].sample(5)

147    4.87 (45)
96     4.79 (58)
145    4.67 (70)
72      4.9 (21)
140    4.59 (58)
Name: Reviews, dtype: object

In [52]:
df2_clean['Reviews'].sample(5)

103      4.8 (50)
245    4.98 (110)
57      4.9 (241)
133     4.88 (16)
197     4.91 (22)
Name: Reviews, dtype: object

In [53]:
df3_clean['Reviews'].sample(5)

43            New
270    4.99 (113)
81       5.0 (17)
137      4.9 (49)
259     4.8 (213)
Name: Reviews, dtype: object

In [54]:
df4_clean['Reviews'].sample(5)

297     4.94 (54)
112     4.9 (347)
172    4.84 (126)
252    4.96 (328)
115     4.8 (254)
Name: Reviews, dtype: object

## T5

In [55]:
df_clean['Host'].sample(5)

256    Superhost
159    Superhost
71     Superhost
47     Superhost
266     Rare fin
Name: Host, dtype: object

In [56]:
df2_clean['Host'].sample(5)

85              
280    Superhost
293     Rare fin
259     Rare fin
134             
Name: Host, dtype: object

In [57]:
df3_clean['Host'].sample(5)

232    Superhost
227     Rare fin
167     Rare fin
37          Plus
189     Rare fin
Name: Host, dtype: object

In [58]:
df4_clean['Host'].sample(5)

229    Superhost
244    Superhost
230             
196    Superhost
80     Superhost
Name: Host, dtype: object

In [59]:
df_clean.sample(5)

Unnamed: 0,Title,Beds,Location,Price,Reviews,Host
81,Eden Hideaway,3 bed,Farm stay in Scottish Borders,$355 per night,5.0 (48),
237,The Hive @ Braeside Retreats,1 bed,Tiny home in Thurso,$157 per night,,
46,Hobbit House @ Sychpwll,2 double bed,Earthen home in Llandrinio,$172 per night,4.92 (77),Superhost
120,Tiny Home A frame cabin SILVA,1 queen bed,Tiny home in Cowes,$151 per night,4.87 (68),Superhost
105,Under The Waves Unique Apartment Filey Sea F,1 bed,Vacation home in North Yorkshire,$361 per night,,


In [60]:
df2_clean.sample(5)

Unnamed: 0,Title,Beds,Location,Price,Reviews,Host
12,Oak-UK36258 (UK36258),1 double bed,Home in Llanfair Caereinion,$189 per night,New,
187,"Ardlea cottage, 2 en-suite bedroomed cottage,",3 bed,Cottage in Fionnphort,$247 per night,New,Superhost
197,Peak House Farm (Hereford Double Ensuite),1 bed,,$102 per night,4.91 (22),
202,The Hazel Hide - Luxury Eco A-Frame Cab,4 bed,Cabin in Ashington,$253 per night,New,
146,Longwood farm. white cabin. Peace and tranquility,3 bed,Cabin in North Yorkshire,$241 per night,5.0 (22),


In [61]:
df3_clean.sample(5)

Unnamed: 0,Title,Beds,Location,Price,Reviews,Host
66,"Private Cabin/Hot Tub, Next to Purgatory Resort!",1 queen bed,Cabin in Durango,$274 per night,4.81 (187),Superhost
259,Yosemite's Stoneoaks Cab,2 queen bed,Cabin in Yosemite National Park,$405 per night,4.8 (213),
82,NEW~rate discount~ Private Country Retreat w/P...,7 bed,Home in Apopka,$560 per night,5.0 (21),Superhost
28,Luxury Resort Home - New Pool/Hot Tub &amp; Gr...,6 bed,Home in Scottsdale,"$334 per night, originally $373",4.99 (119),Superhost
143,JUNIPER HILL cab,3 bed,Cabin in Wilmington,$263 per night,4.99 (165),Rare fin


In [62]:
df4_clean.sample(5)

Unnamed: 0,Title,Beds,Location,Price,Reviews,Host
141,"Cozy Winters in Blue Ridge, Pets Welcome!",3 bed,Cabin in Estes Park,"$394 per night, originally $640",4.9 (315),Rare fin
278,AFrame Featured in Dwell &amp; Sunset mags 4mi...,2 king bed,Home in Fountain Hills,"$481 per night, originally $588",4.94 (54),Rare fin
91,Vintage Airstream w/shelter. Stay dry in the w...,3 bed,Tower in Easton,$670 per night,4.97 (59),Superhost
56,☆Lakefront ☆ Resort Style Pool☆ Private Beach☆,1 queen bed,Cabin in Vilas,$451 per night,5.0 (4),
123,Lux Estate near Park w. Chef Kitchen &amp; Wat...,1 king bed,,"$314 per night, originally $348",4.97 (152),Superhost


## Issue 2: Duplicates in the datasets

### Define
- Use the drop_duplicates function to get rid of the duplcate values

### Code

In [63]:
df_clean.duplicated().sum()

8

In [64]:
df_clean.drop_duplicates(inplace=True)

In [65]:
df2_clean.duplicated().sum()

13

In [66]:
df2_clean.drop_duplicates(inplace=True)

In [67]:
df3_clean.duplicated().sum()

52

In [68]:
df3_clean.drop_duplicates(inplace=True)

### Test

In [69]:
df2_clean.duplicated().sum()

0

In [70]:
df_clean.duplicated().sum()

0

## Issue 3: More than one variable in the Location column

## Define 
- The location columns consists of both the location and type of the listing eg Home, Cabin
- Use the split function to separate the different variables

## Code

In [71]:
df2_clean['Location'].head()

0                      None
1                      None
2    Tiny home in Mickleton
3      Tiny home in Idrigil
4    Private room in London
Name: Location, dtype: object

In [72]:
#The split function will separate the varlues at 'in'
new = df_clean['Location'].str.split(" in ", n=1, expand=True)

In [73]:
new2 = df2_clean['Location'].str.split(" in ", n=1, expand=True)

In [74]:
new3 = df3_clean['Location'].str.split(" in ", n=1, expand=True)

In [75]:
new4 = df4_clean['Location'].str.split(" in ", n=1, expand=True)

In [76]:
#Each variable will be assigned a different column
df_clean['Type'] = new[0]
df_clean['Locate'] = new[1]

In [77]:
df2_clean['Type'] = new2[0]
df2_clean['Locate'] = new2[1]

In [78]:
df3_clean['Type'] = new3[0]
df3_clean['Locate'] = new3[1]

In [79]:
df4_clean['Type'] = new4[0]
df4_clean['Locate'] = new4[1]

In [80]:
#Drop the Location column since it's no longer needed
df_clean.drop(columns=['Location'], inplace=True)

In [81]:
df2_clean.drop(columns=['Location'], inplace=True)

In [82]:
df3_clean.drop(columns=['Location'], inplace=True)

In [83]:
df4_clean.drop(columns=['Location'], inplace=True)

In [84]:
df_clean.rename(columns={'Locate':'Location'}, inplace=True)

In [85]:
df2_clean.rename(columns={'Locate':'Location'}, inplace=True)
df3_clean.rename(columns={'Locate':'Location'}, inplace=True)
df4_clean.rename(columns={'Locate':'Location'}, inplace=True)

## Test

In [86]:
df_clean.sample(10)

Unnamed: 0,Title,Beds,Price,Reviews,Host,Type,Location
157,Regency townhous,5 bed,"$543 per night, originally $1,673",,Superhost,Townhouse,London
53,Secluded Picture Postcard Cottage with Pool,5 bed,$351 per night,4.96 (157),Rare fin,Home,Swaffham
57,"Idyllic, one-of-a-kind 17th century rural",6 bed,$106 per night,4.95 (187),Superhost,,
145,The Vale Cab,1 double bed,$92 per night,4.67 (70),,Cabin,"Woodhall spa, lincolnshire"
227,MegaPod 2 at Lee Wick Farm Cottages &amp; Glam...,,$137 per night,4.75 (8),Superhost,Private room,St. Osyth
177,Wigwam Lodge (Wheelchair Friendly),3 bed,$165 per night,4.88 (16),Superhost,Cabin,Storwoo
196,Dingle den caravan h,4 bed,$82 per night,4.41 (78),Rare fin,Campsite,Wales
5,"Swallows Loft, The Old Vicarag",2 bed,"$180 per night, originally $206",4.98 (109),Rare fin,Condo,Far Sawrey
129,The Shepherd’s Hut at Hafoty Boeth,1 double bed,"$101 per night, originally $114",5.0 (40),Superhost,Shepherd’s hut,Denbighshire
217,Harry's Stable - 4* NITB approved country esc,1 double bed,$122 per night,5.0 (13),,Farm stay,Comber


In [87]:
df2_clean.sample(10)

Unnamed: 0,Title,Beds,Price,Reviews,Host,Type,Location
261,Harney Peak (UK40074),2 double bed,$195 per night,,,Home,Portinscale
237,Cosy 1 bedroom Shepherd's hut with indoor f,3 bed,$137 per night,4.72 (18),,Shepherd’s hut,Llandegla
47,Serenity Lodge Otterburn with Hot Tub,6 bed,$485 per night,4.97 (37),Rare fin,Cabin,Old Town Farm
8,Berney - converted railway carriag,2 double bed,$179 per night,New,,Train,Trowse Newton
140,'Sea Breeze' sleeps 4 Selsey caravan 2 bathrooms,2 bed,$105 per night,4.81 (47),Rare fin,Cabin,Selsey
216,5* Shepherds Hut in Snowdonia with mountain views,1 double bed,"$144 per night, originally $163",4.99 (297),Superhost,Shepherd’s hut,Conwy
166,Geodesic D,1 double bed,$192 per night,,Superhost,,
152,Morven View Lodge NC500,6 single bed,$186 per night,4.83 (6),,Cabin,Highlan
59,Little Trenant Barn on Helford River Near Cons,3 bed,$198 per night,4.95 (108),Plus,Barn,Constantine
102,Stunning Apartment in Victorian Villa with Gard,2 bed,$275 per night,4.95 (22),Rare fin,Condo,Torbay


In [88]:
df3_clean.sample(10)

Unnamed: 0,Title,Beds,Price,Reviews,Host,Type,Location
219,"Modern Cabin, Private Fishing Lake, Near Sequoias",3 bed,"$215 per night, originally $331",4.99 (163),Superhost,Home,Springville
262,The Cove Cab,2 bed,"$429 per night, originally $477",4.98 (209),Rare fin,Cabin,Sherman
75,Romantic 1 bedroom cabin close to Smugglers Notch,1 king bed,$206 per night,4.93 (355),Superhost,Cabin,Cambridge
77,Awesome Lakefront A-Frame Cabin! Unique Interi...,5 bed,"$265 per night, originally $338",5.0 (47),Superhost,Home,Lake Ozark
206,"Lincoln Log Lodge (Lakefront, Kayaks, Arcade g...",2 queen bed,$144 per night,4.92 (96),Rare fin,Cabin,Bitely
5,Panoramic Views &amp; Dog Heaven at The Silo!,2 bed,$234 per night,5.0 (77),Rare fin,Barn,Corvallis
148,Lost Elk Cabin: NEW Listing! Relaxing &amp; pe...,5 bed,"$204 per night, originally $269",5.0 (43),,Cabin,Packwoo
64,Kilo Cabin ★TVD★ Lakeside Hideou,4 bed,"$179 per night, originally $229",4.94 (297),Superhost,Cabin,Covington
177,Belle â,1 bed,$301 per night,4.96 (165),Superhost,Cabin,Smithville
282,Snowbird-friendly oceanfront condo with ocean ...,2 bed,"$91 per night, originally $162",4.33 (57),,Apartment,Fort Walton Beach


In [89]:
df4_clean.sample(10)

Unnamed: 0,Title,Beds,Price,Reviews,Host,Type,Location
204,Freshly Remodeled Cabin with Brand New Hot Tub!,6 bed,$417 per night,4.96 (165),Rare fin,Cabin,Prineville
186,Beautiful Log Cabin on The Bay,1 king bed,$443 per night,4.59 (138),,Cabin,Broken Bow
159,Vintage Forest Service Log Cabin 9 mi fr Glac,2 sofa bed,"$75 per night, originally $92",4.94 (255),Superhost,Tiny home,Marble Falls
100,"Luxury 5 BR Cabin, Pool, Lake, 40 min from Atl",3 bed,$304 per night,4.97 (271),Superhost,,
149,The Martingale at Max Patch,1 queen bed,$110 per night,4.94 (66),Superhost,Tiny home,Clyde
121,NEW! Oceanfront Retreat w/ Pool Access &amp; B...,2 bed,"$126 per night, originally $201",4.92 (201),Superhost,Cabin,Martin City
289,Back Roads Cabin R,2 bed,$163 per night,4.9 (91),Superhost,Tiny home,Mills River
160,FREE Breakfast - The Tiny - 192 SQ FT Tiny Hous,2 single bed,$64 per night,4.75 (28),Superhost,Houseboat,Colfax
66,"Private Cabin/Hot Tub, Next to Purgatory Resort!",1 queen bed,$188 per night,4.98 (123),Superhost,Dome,Kalispell
143,JUNIPER HILL cab,3 bed,"$289 per night, originally $334",4.99 (386),Superhost,Cabin,Fredonia


## Issue 4: The Price column contains both Original Price and Discounted Prices

## Define
- Separate the Price column into Original and Actual price

## Code

In [90]:
df_clean['Price'].sample(10)

95     $275 per night
17     $165 per night
111    $272 per night
261    $199 per night
68     $183 per night
178    $148 per night
13      $92 per night
220     $69 per night
236    $173 per night
73     $175 per night
Name: Price, dtype: object

In [91]:
#Use the split function to separate the different Prices columns
prices = df_clean['Price'].str.split("originally", n=1, expand=True)

In [92]:
prices2 = df2_clean['Price'].str.split("originally", n=1, expand=True)

In [93]:
prices3 = df3_clean['Price'].str.split("originally", n=1, expand=True)

In [94]:
prices4 = df4_clean['Price'].str.split("originally", n=1, expand=True)

In [95]:
#Assign the different variables to columns
df_clean['Current Price'] = prices[0]
df_clean['Original Price'] = prices[1]

In [96]:
df2_clean['Current Price'] = prices2[0]
df2_clean['Original Price'] = prices2[1]

In [97]:
df3_clean['Current Price'] = prices3[0]
df3_clean['Original Price'] = prices3[1]

In [98]:
df4_clean['Current Price'] = prices4[0]
df4_clean['Original Price'] = prices4[1]

In [99]:
df_clean.drop(columns=['Price'], inplace=True)
df2_clean.drop(columns=['Price'], inplace=True)
df3_clean.drop(columns=['Price'], inplace=True)
df4_clean.drop(columns=['Price'], inplace=True)

In [100]:
df_clean.sample(10)

Unnamed: 0,Title,Beds,Reviews,Host,Type,Location,Current Price,Original Price
195,Fantastic two double bedroom annex with park...,2 bed,4.85 (150),Superhost,Guesthouse,Falmouth,$206 per night,
107,"New Barn, Sedgeford, North Norfolk",3 double bed,4.77 (215),Rare fin,,,$187 per night,
221,Standard King Ensuite 'Ruthin room' at Maenan ...,2 single bed,,,Hotel room,Conwy Principal Area,$157 per night,
5,"Swallows Loft, The Old Vicarag",2 bed,4.98 (109),Rare fin,Condo,Far Sawrey,"$180 per night,",$206
130,Fell Head glamping pod at Howgills Hideaway,2 bed,4.96 (219),Superhost,Tiny home,Cumbria,$97 per night,
240,The Byre @ Braeside Retreats,1 bed,,,Tiny home,Murkle,$157 per night,
257,No.6 on the bay,4 bed,4.86 (160),Superhost,Condo,Porthcawl,$206 per night,
297,1 Craiglinnhe Court...sleeps 2-4 guests.,3 bed,4.97 (220),Rare fin,Home,Highland Council,$179 per night,
50,Serendipity,1 king bed,4.99 (537),Rare fin,Farm stay,Scarborough,$114 per night,
47,Camping Pod 2 on Loch Shore - Dalavich,1 single bed,4.77 (66),Superhost,Hut,Dalavich,"$82 per night,",$89


In [101]:
df2_clean.sample(10)

Unnamed: 0,Title,Beds,Reviews,Host,Type,Location,Current Price,Original Price
235,The Glasshouse (luxury 2 bed with private hot ...,2 king bed,4.91 (11),,Farm stay,Newburgh,$687 per night,
201,Spacious room with king size bed and en-su,1 bed,New,,Private room,Cornwall,$144 per night,
226,Mermaid’s Hut \nBrightlings,,5.0 (14),Superhost,Hut,Brightlingsea,$55 per night,
13,Cow Shed: Daisy at Easton Farm Park,2 bed,4.78 (32),,Farm stay,Woodbridge,$149 per night,
122,Cottage on Private Estate near Ch,3 bed,4.79 (99),Rare fin,Cottage,Northumberlan,$202 per night,
156,Self contained garden studio ( The Snug),1 bed,5.0 (6),,Tiny home,Saint Merryn,$139 per night,
206,Inviting 4 Berth caravan in Hebden Bridg,2 double bed,5.0 (10),,Chalet,West Yorkshire,$157 per night,
179,"Beach front, private garden, seaside.",4 bed,4.65 (17),,Chalet,Abersoch,$125 per night,
202,The Hazel Hide - Luxury Eco A-Frame Cab,4 bed,New,,Cabin,Ashington,$253 per night,
210,"Cala Fearnadh Off-Grid cabin, Bunessan, Mull",3 bed,4.93 (149),Rare fin,Cabin,Bunessan,$219 per night,


In [102]:
df3_clean.sample(10)

Unnamed: 0,Title,Beds,Reviews,Host,Type,Location,Current Price,Original Price
108,Modern Getaway w/ POOL &amp; HOT TUB and AMAZI...,6 bed,4.88 (48),Superhost,Home,Dalton,"$535 per night,",$727
178,Claimjumper Creekside Cabin R,1 queen bed,4.89 (542),Superhost,,,"$292 per night,",$391
193,"Mountains To Deserts, You'll find it all here!",4 bed,4.94 (156),Superhost,Cabin,Teasdale,$218 per night,
262,The Cove Cab,2 bed,4.98 (209),Rare fin,Cabin,Sherman,"$429 per night,",$477
37,"Malibu, Carbon Beach- Bungalow Twelv",2 bed,4.85 (315),Plus,Bungalow,Malibu,$539 per night,
30,"Joshua Tree Casita, Amazing National Park Views",1 queen bed,4.98 (337),Superhost,Guesthouse,Joshua Tree,$175 per night,
66,"Private Cabin/Hot Tub, Next to Purgatory Resort!",1 queen bed,4.81 (187),Superhost,Cabin,Durango,$274 per night,
183,"Black Barn Ranch, Catskills. Enjoy the fall fo...",3 bed,4.88 (89),Superhost,Home,Kerhonkson,$592 per night,
73,Dove Haven | Find Your Peac,5 bed,New,Superhost,Villa,New Haven,$332 per night,
169,Cozy Farmhouse - Pet friendly,1 bed,New,,Home,Denver,$166 per night,


In [103]:
df4_clean.sample(10)

Unnamed: 0,Title,Beds,Reviews,Host,Type,Location,Current Price,Original Price
184,The Wandering Star Cabin Joshua Tree - Epic Vi...,5 bed,4.85 (137),Superhost,Tiny home,Mentone,$207 per night,
42,True Cold War Relic Atlas F Missile Silo / Bunk,2 queen bed,4.95 (55),,,,"$375 per night,",$424
72,The Alpine A Frame - Cozy Cabin with Barrel Sau,2 bed,4.88 (374),Rare fin,Condo,Las Vegas,"$215 per night,",$262
130,Beautiful Luxury Yurt bordering on Flathead Lak,1 king bed,4.55 (205),,Cabin,Broken Bow,$289 per night,
266,NEW! Sunny Forest Hideaway w/ Hot Tub &amp; Ga...,1 queen bed,4.95 (274),Superhost,Home,Waco,$160 per night,
224,The Barn at Slick Rock,4 bed,4.92 (288),Rare fin,Home,Joshua Tree,"$763 per night,","$1,217"
264,The Local Chapter Big Bend Luxury Yurt 2 NEW,2 king bed,4.99 (80),Superhost,Home,Joshua Tree,"$832 per night,","$1,639"
112,NEW~rate discount~ Private Country Retreat w/P...,1 king bed,4.9 (347),Rare fin,Home,Wapato,$185 per night,
41,Young Wild &amp; TREE(HOUSE) private hot tub,2 bed,4.88 (26),Superhost,Tiny home,Quincy,$316 per night,
183,"Black Barn Ranch, Catskills. Enjoy the fall fo...",8 bed,4.91 (206),Superhost,Home,Atlanta,"$654 per night,",$912


## Issue 5: Listings reviews has two variables in one column

### Define
- The reviews column should be split into two columns ie Ratings and the number of reviews a listing has

### Code

In [104]:
df_clean['Reviews'].sample(5)

298    4.98 (127)
10      4.94 (33)
212      4.8 (10)
296     4.99 (74)
104    4.91 (409)
Name: Reviews, dtype: object

In [105]:
# Use the split function to separate the Reviews columns
rates = df_clean['Reviews'].str.split(" ", n=1, expand=True)

In [106]:
rates2 = df2_clean['Reviews'].str.split(" ", n=1, expand=True)

In [107]:
rates3 = df3_clean['Reviews'].str.split(" ", n=1, expand=True)

In [108]:
rates4 = df4_clean['Reviews'].str.split(" ", n=1, expand=True)

In [109]:
df_clean['Rating'] = rates[0]
df_clean['No.of reviews'] = rates[1]

In [110]:
df2_clean['Rating'] = rates2[0]
df2_clean['No.of reviews'] = rates2[1]

In [111]:
df3_clean['Rating'] = rates3[0]
df3_clean['No.of reviews'] = rates3[1]

In [112]:
df4_clean['Rating'] = rates4[0]
df4_clean['No.of reviews'] = rates4[1]

In [113]:
#DRop the Reviews column since it's no longer needed
df_clean.drop(columns=['Reviews'], inplace=True)
df2_clean.drop(columns=['Reviews'], inplace=True)
df3_clean.drop(columns=['Reviews'], inplace=True)
df4_clean.drop(columns=['Reviews'], inplace=True)

In [114]:
df_clean.sample(5)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,No.of reviews
238,King / Family in a Characterful Georgian Mans,3 bed,,Private room,Hampshire,$166 per night,,,
233,Peak House Farm (Hereford Double Ensuite),1 bed,,,,$103 per night,,4.91,(22)
249,New Forest Country Home with Hot Tub &amp; Ope...,9 bed,Superhost,Guest suite,Hampshire,$764 per night,,4.89,(97)
231,"Double or twin with Ensuite, Kinguss",1 double bed,,Hotel room,Kingussie,$133 per night,,4.38,(41)
59,Beautifully Handcrafted cosy Log cab,4 bed,Superhost,Cabin,Broad Haven,$175 per night,,4.97,(39)


In [115]:
df2_clean.sample(5)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,No.of reviews
87,Under The Waves Unique Apartment Filey Sea F,1 bed,,Vacation home,North Yorkshire,$481 per night,,,
157,Friendly log cab,,Superhost,Cabin,Llandrillo,$334 per night,,4.71,(7)
155,Lodge 5 (Family Style) - Yorkshire Dales,3 king bed,,Private room,North Yorkshire,$451 per night,,4.33,(3)
47,Serenity Lodge Otterburn with Hot Tub,6 bed,Rare fin,Cabin,Old Town Farm,$485 per night,,4.97,(37)
231,Railway Carriages Hinton Admiral,3 bed,Rare fin,Train,Hinton,$259 per night,,4.5,(60)


In [116]:
df3_clean.sample(5)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,No.of reviews
80,Mountain G,2 bed,,Home,Fancy Gap,$149 per night,,4.92,(97)
154,New modern lakefront cabin with stunning views!,5 bed,Superhost,Cabin,Anaconda,$697 per night,,4.98,(127)
95,Adventure Cabin 74,3 bed,,Tiny home,Truckee,$325 per night,,4.71,(48)
280,NEW! Dog-Friendly ‘Sunset Cabin’ w/ Wood Fire ...,1 queen bed,,Cabin,Thorndale,$131 per night,,5.0,(5)
242,Award-Winning Forest Getaway: @thesearanchhous,2 bed,Rare fin,Home,The Sea Ranch,"$464 per night,",$618,4.99,(160)


In [117]:
df4_clean.sample(5)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,No.of reviews
203,"Mountains To Deserts, You'll find it all here!",8 bed,Rare fin,Home,Joshua Tree,"$860 per night,","$1,108",4.95,(298)
26,"Tiny House Cozy Cabin by Zion, Grand Canyon, Bryc",3 queen bed,,Home,Saint Helena Islan,$437 per night,,4.63,(8)
295,NEW! Sunny Forest Hideaway w/ Hot Tub &amp; Ga...,2 bed,Superhost,Tiny home,Liberty Hill,$157 per night,,4.93,(76)
242,Award-Winning Forest Getaway: @thesearanchhous,1 queen bed,Superhost,Tiny home,Asheville,$145 per night,,4.93,(276)
96,Santa Fe Super D,1 king bed,Superhost,Condo,Whitethorn,$694 per night,,4.99,(80)


## Issue 6: Unecessary data in the dataset

### Define
- Get rid of the unecessary information in the dataset. Example; the word beds in the Beds column, the brackets in the No of reviews column

### Code

In [118]:
#Use the strip functions to get rid of the extra data that was scraped and is not needed in the analysis
df_clean['No.of reviews'] = df_clean['No.of reviews'].astype(str).str.lstrip('(').str.rstrip(')')

In [119]:
df2_clean['No.of reviews'] = df2_clean['No.of reviews'].astype(str).str.lstrip('(').str.rstrip(')')

In [120]:
df3_clean['No.of reviews'] = df3_clean['No.of reviews'].astype(str).str.lstrip('(').str.rstrip(')')

In [121]:
df4_clean['No.of reviews'] = df4_clean['No.of reviews'].astype(str).str.lstrip('(').str.rstrip(')')

In [122]:
df4_clean['Beds'] = df4_clean['Beds'].str[0]


In [123]:
df3_clean['Beds'] = df3_clean['Beds'].str[0]


In [124]:
df2_clean['Beds'] = df2_clean['Beds'].str[0]


In [125]:
df_clean['Beds'] = df_clean['Beds'].str[0]

In [126]:
df_clean['Current Price'] = df_clean['Current Price'].astype(str).str.lstrip('$').str.rstrip('per night,')


In [127]:
df2_clean['Current Price'] = df2_clean['Current Price'].astype(str).str.lstrip('$').str.rstrip('per night,')


In [128]:
df3_clean['Current Price'] = df3_clean['Current Price'].astype(str).str.lstrip('$').str.rstrip('per night,')


In [129]:
df4_clean['Current Price'] = df4_clean['Current Price'].astype(str).str.lstrip('$').str.rstrip('per night,')


In [130]:
#Get rid of the dollar sign in the Original Price column using the replace function
df_clean['Original Price'] = df_clean['Original Price'].str.replace(r'[^0-9]+', '')
df2_clean['Original Price'] = df2_clean['Original Price'].str.replace(r'[^0-9]+', '')
df3_clean['Original Price'] = df3_clean['Original Price'].str.replace(r'[^0-9]+', '')
df4_clean['Original Price'] = df4_clean['Original Price'].str.replace(r'[^0-9]+', '')


### Test

In [131]:
df_clean.sample(5)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,No.of reviews
170,Park of Drumquhassle Twin Room in Estate Mans,2,,Private room,Drymen,206,,4.6,5.0
197,Super King/Twin Luxury Apartment (Garbh-bheinn),1,Superhost,Farm stay,Highland Council,118,182.0,5.0,8.0
237,The Hive @ Braeside Retreats,1,,Tiny home,Thurso,157,,,
92,The Chamber at White Rose Tow,1,,Private room,Highland Council,240,,,
137,Stylish Cosy Lodge in Grizedale Fores,4,Rare fin,Cabin,Satterthwaite,262,,4.99,231.0


In [132]:
df3_clean.sample(5)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,No.of reviews
38,East Side Beehiv,1,Superhost,,,303,,4.9,182
149,The Martingale at Max Patch,3,Superhost,Cabin,Hot Springs,161,205.0,4.97,88
27,Chic Desert Homestead near JoshuaTree🌵,1,Superhost,Cabin,Twentynine Palms,132,,5.0,16
137,Mountain Retreat w/ Views of Lake Pend Oreill,4,,Home,Sagle,345,,4.9,49
89,"🌅Gorgeous Oceanview, Shelter Cove, Oceanfront! 🌊",1,Superhost,Condo,Whitethorn,703,,4.99,80


In [133]:
df2_clean.sample(5)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,No.of reviews
123,"Glorious Lotus Belle Tent, Bamburgh",,,,,172,,4.8,5
226,Mermaid’s Hut \nBrightlings,,Superhost,Hut,Brightlingsea,55,,5.0,14
240,Luxury Hut-Bedviews/Shower/LBurn/Wc/Stars/Dog/WiF,1.0,Superhost,Shepherd’s hut,Usk,135,,5.0,72
38,Large luxury coastal glamping pod and outdoor ...,1.0,,Farm stay,Weston,206,,5.0,7
147,Oak Lodge at Avonvale Holiday Lodges,7.0,,Cabin,Evesham,227,,4.76,77


In [134]:
df_clean.sample(5)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,No.of reviews
219,The Old School House - Loft Apartment 8,4,Superhost,Apartment,Lytham St Annes,157,,New,
149,"Royal Portrush, accesible Flax Mill",1,,Castle,Coleraine,710,,5.0,4.0
242,Spacious room with king size bed and en-su,1,,Private room,Cornwall,145,,New,
5,"Swallows Loft, The Old Vicarag",2,Rare fin,Condo,Far Sawrey,180,206.0,4.98,109.0
180,Crookston House B&amp;B Ballroom Bed,1,Superhost,Private room,Scottish Borders,157,,5.0,12.0


## Issue 7: None values in the datasets

### Define
- Use the replace function to get rid of 'None' in the dataset

### Code

In [135]:
#Using Numpy, fill in the empty values with np.nan
df_clean = df_clean.fillna(value=np.nan)

In [136]:
df2_clean = df2_clean.fillna(value=np.nan)

In [137]:
df3_clean = df3_clean.fillna(value=np.nan)

In [138]:
df4_clean = df4_clean.fillna(value=np.nan)

In [139]:
df_clean = df_clean.replace('None', np.NaN)
df2_clean = df2_clean.replace('None' ,np.NaN)
df3_clean = df3_clean.replace('None' , np.NaN)
df4_clean = df4_clean.replace('None' , np.NaN)

In [140]:
#convert the blank spaces to nan using Numpy
df_clean.replace(r'^\s*$', np.nan, regex=True)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,No.of reviews
0,"David Emanuel Suite, Grade 11* Gileston M",1,,,,623,,,
1,Berney - converted railway carriag,2,,Train,Trowse Newton,179,,New,
2,Superking Room- Proper B&amp;B - Pool- No Book...,1,Superhost,Private room,"Little Whelnetham, Bury St Edmunds",199,,5.0,39
3,Kidwelly Farmhouse B&amp;B -The Lof,2,Superhost,Private room,Kidwelly,115,,5.0,23
4,The Lake Lodge (Windermere),4,Superhost,Home,Cumbria,670,,4.93,164
...,...,...,...,...,...,...,...,...,...
295,Cosy 1 bedroom Shepherd's hut with indoor f,3,,Shepherd’s hut,Llandegla,138,,4.72,18
296,Apartment with stunning views across Fistral B...,4,Superhost,Apartment,Newquay,168,,4.99,74
297,1 Craiglinnhe Court...sleeps 2-4 guests.,3,Rare fin,Home,Highland Council,179,,4.97,220
298,Glamping Under Stars Shepherd's Hut -,1,,Shepherd’s hut,Rhosesmor,132,,4.98,127


In [141]:
df2_clean.replace(r'^\s*$', np.nan, regex=True)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,No.of reviews
0,"David Emanuel Suite, Grade 11* Gileston M",1,,,,839,,,
1,NEW Luxury Romantic Cottage - Idyllic Rural Bliss,2,Rare fin,,,249,,4.98,172
2,Mirts Mansion - Little Country Houses SS,1,Rare fin,Tiny home,Mickleton,208,,4.87,126
3,"The Croft Chalet Pods - pod 2, Loch K",1,Superhost,Tiny home,Idrigil,151,,4.96,94
4,Small bed bedroom in a cosy Victorian fl,1,Rare fin,Private room,London,80,,4.95,20
...,...,...,...,...,...,...,...,...,...
295,Hafod Station (UK35030),2,,Home,Kent,226,353,4.86,7
296,"Stables - Uninterrupted Sea Views, Sleeps 8, L...",6,Rare fin,Home,Croyde,478,575,4.89,9
297,Carreg Gleision (UK33739),2,,Home,Mano,179,,,
298,Old Belfield (CC128241),3,,Home,Bowness on Windermere,221,253,4.27,15


In [142]:
df3_clean.replace(r'^\s*$', np.nan, regex=True)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,No.of reviews
0,Sleeps 10 | Beach Access | Tiki Bar | Dog Frie...,1,,Home,Bolivar Peninsula,277,,New,
1,Iconic Glass Mansion - Huge Views - Best Loc,1,Superhost,Home,Osage Beach,485,901,4.98,50
2,The ATL Treehouse with hot tub/ heated spac,1,Superhost,Treehouse,East Point,294,,4.79,403
3,"❤️ Calico Cabin Mtn Views! Hot tub, Firepit, p...",3,Superhost,Cabin,Blue Ridge,209,435,4.95,175
4,I Bar Ranch One of a kind Off Grid Cab,1,Superhost,Cabin,Challis,173,,4.99,212
...,...,...,...,...,...,...,...,...,...
291,Great Views | Desert Oasis | Pet-Friendly| Hot...,4,Superhost,,,196,290,4.85,88
292,Lovely Bear Cabin on the White Riv,4,Superhost,Cabin,Mountain View,222,,4.96,26
293,Glamping Riverfront w Private Pavilion WiFi sh...,4,Superhost,,,234,,4.89,47
297,CASA DE SUNSET-JTNP -The Ultimate Desert Getaway,4,Rare fin,Home,Twentynine Palms,171,206,4.99,113


In [143]:
df4_clean.replace(r'^\s*$', np.nan, regex=True)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,No.of reviews
0,Sleeps 10 | Beach Access | Tiki Bar | Dog Frie...,1,Rare fin,Place to stay,Hildale,531,853,4.87,168
1,Iconic Glass Mansion - Huge Views - Best Loc,3,Rare fin,Cottage,Dauphin Islan,578,,4.57,28
2,The ATL Treehouse with hot tub/ heated spac,1,Superhost,Treehouse,East Point,427,,4.79,403
3,"❤️ Calico Cabin Mtn Views! Hot tub, Firepit, p...",1,Plus,Treehouse,Asheville,447,,4.98,591
4,I Bar Ranch One of a kind Off Grid Cab,2,Superhost,Cabin,Indian River,668,848,4.96,249
...,...,...,...,...,...,...,...,...,...
295,NEW! Sunny Forest Hideaway w/ Hot Tub &amp; Ga...,2,Superhost,Tiny home,Liberty Hill,157,,4.93,76
296,"Butterfly Cottage near COTA F1, Lockhart &amp;...",1,Superhost,Cottage,Whittier,164,,4.98,182
297,CASA DE SUNSET-JTNP -The Ultimate Desert Getaway,2,,Home,Fountain Hills,481,588,4.94,54
298,Book your Winter Wonderland Getaway Today!,1,Superhost,Camper/RV,Savannah,102,112,4.79,237


### Test

In [144]:
df_clean.sample(5)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,No.of reviews
295,Cosy 1 bedroom Shepherd's hut with indoor f,3,,Shepherd’s hut,Llandegla,138,,4.72,18
155,Honeysuckle Pod - Bradley Hall Rural Escapes,1,Superhost,Tiny home,Whitchurch,151,,5.0,21
119,The Strickland Arms - Lina's LogBy,6,,Yurt,Penrith,133,,4.89,97
140,Eco Pod 2,4,,Tiny home,Glenmore,97,,4.59,58
62,CaeManal - Manal,1,Superhost,Farm stay,Ceredigion,162,,5.0,64


In [145]:
df2_clean.sample(5)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,No.of reviews
35,"Hafod y Rhedwydd, an off-grid cottage in an SSSI",3,Superhost,Cottage,Gwyne,309,,4.9,31
66,Vibrant Flat with Fantastic Views by the Royal...,2,Plus,Apartment,Edinburgh,546,,4.8,165
37,Oak House Shepherds Hu,3,Superhost,Shepherd’s hut,Everton,271,,4.95,149
76,Secluded tranquil woodland lodge with amazing ...,1,Superhost,Tiny home,Mortehoe,179,,4.99,68
257,Ploughman's Retreat - Stunning Vintage Hu,1,,Shepherd’s hut,South Alkham,225,,5.0,9


In [146]:
df3_clean.sample(5)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,No.of reviews
278,AFrame Featured in Dwell &amp; Sunset mags 4mi...,4,Superhost,Cabin,Big Bear Lake,280,428.0,4.79,151
140,Double Down - Family MTB Friendly Lake Front Liv,8,Superhost,Home,Bella Vista,431,,4.93,14
82,NEW~rate discount~ Private Country Retreat w/P...,7,Superhost,Home,Apopka,560,,5.0,21
35,Surf City Heart! Island Center with Excellent ...,6,Superhost,Home,Surf City,770,885.0,5.0,59
236,Modern Cabins near Lake Austin w/ Cowboy Pool!,7,Superhost,Cabin,Austin,387,538.0,4.95,76


In [147]:
df4_clean.sample(5)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,No.of reviews
276,Joshua tree! Stargazing~Hot Tub~Firepit~ Hammocks,2,Superhost,Tiny home,Liberty Hill,157,,4.93,76
195,Casa Mami Pioneertow,5,Superhost,Home,Scottsdale,724,,5.0,7
94,Dome #2 at Smoky Mountains Glamping (Read Deta...,3,,,,177,287.0,4.69,36
181,Private Hillside Cabin Near Broken Bow Lake in...,2,Rare fin,Cabin,Gatlinburg,801,,4.92,464
116,Glamping Dome in Nature-Read Details First-4WD!,1,,Home,Logan,787,,4.78,23


## Issue 8: Renaming columns

### Define
- Use the rename function to rename some of the columns to a more relatable name

### Code

In [148]:
#Use the rename function to rename some of the columns
df_clean.rename(columns={"No.of reviews":"Reviews"}, inplace = True)

In [149]:
df2_clean.rename(columns={"No.of reviews":"Reviews"}, inplace = True)

In [150]:
df3_clean.rename(columns={"No.of reviews":"Reviews"}, inplace = True)

In [151]:
df4_clean.rename(columns={"No.of reviews":"Reviews"}, inplace = True)

### Test

In [152]:
df_clean.sample(5)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,Reviews
293,Snowd,1,Superhost,Shepherd’s hut,Talsarnau,97,,4.89,187
109,North Lodg,3,,Cottage,Cumbria,482,,4.71,7
251,"Farm Stay Shepherds Hut, In Beautiful Oak Orchard",1,Rare fin,Shepherd’s hut,Devon,150,,5.0,101
235,Super Manor House bed and breakfast 1,1,,Private room,Powys,167,,4.91,11
193,Rural farmhouse between Hay and Brec,5,Superhost,Home,Llandefalle,253,,4.9,20


In [153]:
df2_clean.sample(5)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,Reviews
45,Sandpiper Lodg,3,Superhost,Barn,Norfolk,344,,5.0,4
270,Ty Arts,3,,Home,Llanddeiniolen,259,,4.71,7
277,Town Centre / Outside Terrace / Parking / Wif,1,Superhost,Apartment,Stratford Upon Avon,166,,4.87,282
241,"Primrose Farm, Glamping With hot Tub - Bluebell",4,Superhost,Tent,Chacewater,242,,4.73,52
190,"Glampio Coed Glamping - Porthor, Aberd",1,,,,151,,4.96,27


In [154]:
df3_clean.sample(5)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,Reviews
14,3-Story Container Home +spa +pool +rooftop deck,3,Superhost,Home,Joshua Tree,328,409.0,5.0,3
224,The Barn at Slick Rock,1,Rare fin,Barn,Hendersonville,151,210.0,4.96,238
13,Little Lake Hous,2,Superhost,Home,Fond du Lac,221,,4.74,197
136,Bee Our Guest Mountain Top Glamping D,1,,Dome,Mars Hill,161,213.0,5.0,10
2,The ATL Treehouse with hot tub/ heated spac,1,Superhost,Treehouse,East Point,294,,4.79,403


In [155]:
df4_clean.sample(5)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,Reviews
128,Adventure Cabin 74,4,Superhost,Cabin,Sapulpa,232,,4.85,103
216,Last minute thanksgiving cabin? 1 hr from Chicag,1,,Home,North Myrtle Beach,217,276.0,5.0,4
62,Off-grid itHous,3,Superhost,Cabin,Eclectic,400,,4.8,30
263,The Red Barn at the Lakef,2,,Cabin,Shipshewana,344,402.0,4.77,62
243,Zion Canyon BnB Room-4,3,Superhost,Cabin,Maggie Valley,271,433.0,4.92,26


## Issue 9: Float values in the Original Price column

### Define
- Convert the Original Prices column from float to Int datatype

### Code

In [156]:
#Since the float to int conversion is not possible with nan in the Prices columns, first fillna with '0' then use astype function to convert float to int

df_clean['Original Price'] = df_clean['Original Price'].fillna(0).astype(np.int64, errors='ignore')
df2_clean['Original Price'] = df2_clean['Original Price'].fillna(0).astype(np.int64, errors='ignore')
df3_clean['Original Price'] = df3_clean['Original Price'].fillna(0).astype(np.int64, errors='ignore')
df4_clean['Original Price'] = df4_clean['Original Price'].fillna(0).astype(np.int64, errors='ignore')

In [157]:
df2_clean['Host'].unique()

array(['', 'Rare fin', 'Superhost', 'Plus'], dtype=object)

### Test

In [158]:
df_clean.sample(5)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,Reviews
244,Esk - beautiful timber cabin in The Lake Distric,2.0,Superhost,Dome,Irton,183,0,4.93,15
94,"Beautiful large house, idyllic woodland gard",6.0,Superhost,Home,Aberdeenshire,392,0,4.86,28
113,Armadilla 2 at Lee Wick Farm Cottages &amp; Gl...,,Superhost,Private room,St Osyth,137,0,4.56,16
271,The Tool Shed nestled in idyllic countrysid,1.0,Superhost,Hut,Redfor,69,0,4.94,226
222,Peaceful space in a unique historic h,1.0,,Private room,North Yorkshire,109,0,4.92,13


## Issue 10: Null values in the datasets

### Define
- Use the dropna function to get rid of all null values

### Code

In [159]:
df_clean.isnull().sum()

Title              0
Beds               6
Host               0
Type              21
Location          21
Current Price      0
Original Price     0
Rating             0
Reviews           30
dtype: int64

In [160]:
df2_clean.isnull().sum()

Title              0
Beds               7
Host               0
Type              25
Location          25
Current Price      0
Original Price     0
Rating             0
Reviews           43
dtype: int64

In [161]:
df3_clean.isnull().sum()

Title              0
Beds               0
Host               0
Type              21
Location          21
Current Price      0
Original Price     0
Rating             0
Reviews           12
dtype: int64

In [162]:
df4_clean.isnull().sum()

Title              0
Beds               2
Host               0
Type              23
Location          23
Current Price      0
Original Price     0
Rating             0
Reviews           14
dtype: int64

In [163]:
#Use the dropna function to remove any null values
df_clean.dropna(subset=['Beds', 'Type', 'Location', 'Reviews'], inplace=True)
df2_clean.dropna(subset=['Beds', 'Type', 'Location', 'Reviews'], inplace=True)
df3_clean.dropna(subset=['Beds', 'Type', 'Location', 'Reviews'], inplace=True)
df4_clean.dropna(subset=['Beds', 'Type', 'Location', 'Reviews'], inplace=True)

### Test

In [164]:
df_clean.isnull().sum()

Title             0
Beds              0
Host              0
Type              0
Location          0
Current Price     0
Original Price    0
Rating            0
Reviews           0
dtype: int64

In [165]:
df2_clean.isnull().sum()

Title             0
Beds              0
Host              0
Type              0
Location          0
Current Price     0
Original Price    0
Rating            0
Reviews           0
dtype: int64

In [166]:
df3_clean.isnull().sum()

Title             0
Beds              0
Host              0
Type              0
Location          0
Current Price     0
Original Price    0
Rating            0
Reviews           0
dtype: int64

In [167]:
df4_clean.isnull().sum()

Title             0
Beds              0
Host              0
Type              0
Location          0
Current Price     0
Original Price    0
Rating            0
Reviews           0
dtype: int64

## Issue 11: 'Rare fin' in the Host column and missing Host info

### Define
- 'Rare find' should be in the Host columns, not 'Rare fin'
- Since not all listings had the Host information, use the replace function to replace the missing information with np.nan

### Code

In [168]:
df4_clean['Host'].unique()

array(['Rare fin', 'Superhost', 'Plus', ''], dtype=object)

In [169]:
#df_clean[df_clean['Host']].str.contains("Rare fin", na = False).apply(lambda x: str(x)+ 'd')
#Use the lambda function to fix the Host information that has a missing 'd'

df_clean['Host'] = df_clean['Host'].apply(lambda x: str(x) +'d' if 'Rare fin' in x else x)
df2_clean['Host'] = df2_clean['Host'].apply(lambda x: str(x) +'d' if 'Rare fin' in x else x)
df3_clean['Host'] = df3_clean['Host'].apply(lambda x: str(x) +'d' if 'Rare fin' in x else x)
df4_clean['Host'] = df4_clean['Host'].apply(lambda x: str(x) +'d' if 'Rare fin' in x else x)


In [170]:
#convert the blank spaces to nan using Numpy

df_clean['Host'].replace('', np.nan, inplace=True)
df2_clean['Host'].replace('', np.nan, inplace=True)
df3_clean['Host'].replace('', np.nan, inplace=True)
df4_clean['Host'].replace('', np.nan, inplace=True)

### Test

In [171]:
df_clean['Host'].unique()

array(['Superhost', 'Rare find', nan, 'Plus'], dtype=object)

In [172]:
df_clean.sample(5)

Unnamed: 0,Title,Beds,Host,Type,Location,Current Price,Original Price,Rating,Reviews
196,Dingle den caravan h,4,Rare find,Campsite,Wales,82,0,4.41,78
171,"Merrydown's Shepherd's Hut, smart and cosy",1,Superhost,Shepherd’s hut,Dalwoo,110,0,5.0,26
285,Listers Lodg,4,Rare find,Cottage,Brotton,97,0,4.76,470
74,"‘Lake View Lodge’ (44) Winderemere, Bowness",3,Rare find,Chalet,Windermere,246,0,4.78,169
89,Secluded Barn set within private 150 acres,6,Superhost,Farm stay,Caistor,115,0,4.94,127


In [173]:
#Change the Current Price column datatype to an Integer

df_clean['Current Price'] = df_clean['Current Price'].fillna(0).astype(np.int64, errors='ignore')
df2_clean['Current Price'] = df2_clean['Current Price'].fillna(0).astype(np.int64, errors='ignore')
df3_clean['Current Price'] = df3_clean['Current Price'].fillna(0).astype(np.int64, errors='ignore')
df4_clean['Current Price'] = df4_clean['Current Price'].fillna(0).astype(np.int64, errors='ignore')

In [174]:
#Use the astype function to chnage the datatype to Int

df_clean['Beds'] = df_clean['Beds'].astype(int)
df2_clean['Beds'] = df2_clean['Beds'].astype(int)
df3_clean['Beds'] = df3_clean['Beds'].astype(int)
df4_clean['Beds'] = df4_clean['Beds'].astype(int)

In [175]:
df_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 238 entries, 2 to 299
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Title           238 non-null    object
 1   Beds            238 non-null    int32 
 2   Host            185 non-null    object
 3   Type            238 non-null    object
 4   Location        238 non-null    object
 5   Current Price   238 non-null    int64 
 6   Original Price  238 non-null    int64 
 7   Rating          238 non-null    object
 8   Reviews         238 non-null    object
dtypes: int32(1), int64(2), object(6)
memory usage: 17.7+ KB


## Issue 12: Rename the columns

### Define
- Use the rename function to change the name of some columns

### Code

In [176]:
df_clean.rename(columns={"Title":"Name", "Type":"Listing_type", "Original Price":"OriginalPrice", "Current Price":"DiscountedPrice"}, inplace = True)
df2_clean.rename(columns={"Title":"Name", "Type":"Listing_type", "Original Price":"OriginalPrice", "Current Price":"DiscountedPrice"}, inplace = True)
df3_clean.rename(columns={"Title":"Name", "Type":"Listing_type", "Original Price":"OriginalPrice", "Current Price":"DiscountedPrice"}, inplace = True)
df4_clean.rename(columns={"Title":"Name", "Type":"Listing_type", "Original Price":"OriginalPrice", "Current Price":"DiscountedPrice"}, inplace = True)

### Test

In [177]:
df_clean.head()

Unnamed: 0,Name,Beds,Host,Listing_type,Location,DiscountedPrice,OriginalPrice,Rating,Reviews
2,Superking Room- Proper B&amp;B - Pool- No Book...,1,Superhost,Private room,"Little Whelnetham, Bury St Edmunds",199,0,5.0,39
3,Kidwelly Farmhouse B&amp;B -The Lof,2,Superhost,Private room,Kidwelly,115,0,5.0,23
4,The Lake Lodge (Windermere),4,Superhost,Home,Cumbria,670,0,4.93,164
5,"Swallows Loft, The Old Vicarag",2,Rare find,Condo,Far Sawrey,180,206,4.98,109
6,"Unique Scottish Country House Trossachs,Calland",8,,Home,Brig o'Turk,585,0,4.89,48


## Issue 13: Get cordinates from the Locations

### Define
- Use geolocator to generate the cordinates from the locations

### Code

In [None]:
#Initialize Nominatim API
geolocator = Nominatim(timeout=10, user_agent= "geoapiExercises")

In [179]:
#To avoid getting the Attribute error, create functions that make sure there are no Null values being passed in the geolocator function
def get_latitude(x):
  if hasattr(x,'latitude') and (x.latitude is not None): 
     return x.latitude


def get_longitude(x):
  if hasattr(x,'longitude') and (x.longitude is not None): 
     return x.longitude
    

In [180]:
#Use the geolocator function to get the Latitudes and Longitudes

geolocate_column = df_clean['Location'].apply(geolocator.geocode)
df_clean['Latitude'] = geolocate_column.apply(get_latitude)
df_clean['Longitude'] = geolocate_column.apply(get_longitude)

In [181]:
geolocate_column = df2_clean['Location'].apply(geolocator.geocode)
df2_clean['Latitude'] = geolocate_column.apply(get_latitude)
df2_clean['Longitude'] = geolocate_column.apply(get_longitude)

In [182]:
geolocate_column = df3_clean['Location'].apply(geolocator.geocode)
df3_clean['Latitude'] = geolocate_column.apply(get_latitude)
df3_clean['Longitude'] = geolocate_column.apply(get_longitude)

In [183]:
geolocate_column = df4_clean['Location'].apply(geolocator.geocode)
df4_clean['Latitude'] = geolocate_column.apply(get_latitude)
df4_clean['Longitude'] = geolocate_column.apply(get_longitude)

In [184]:
#Use the dropna function to remove any null values in the cordinates columns

df_clean.dropna(subset=['Latitude', 'Longitude'], inplace=True)
df2_clean.dropna(subset=['Latitude', 'Longitude'], inplace=True)
df3_clean.dropna(subset=['Latitude', 'Longitude'], inplace=True)
df4_clean.dropna(subset=['Latitude', 'Longitude'], inplace=True)

In [185]:
''''
def reverseGeocode(coordinates): 
    result = rg.search(coordinates)
    return (result)

'''''

"'\ndef reverseGeocode(coordinates): \n    result = rg.search(coordinates)\n    return (result)\n\n"

In [186]:
nom = Nominatim(user_agent="http")


In [194]:
#The RateLimiter helps avoid the timeout error when getting the Addresses

geo = RateLimiter(geolocator.geocode, min_delay_seconds=5)

df_clean['Address'] = df_clean['Location'].apply(geo).apply(nom.geocode)



KeyboardInterrupt: 

In [None]:

geo = RateLimiter(geolocator.geocode, min_delay_seconds=5)

df2_clean['Address'] = df2_clean['Location'].apply(geo).apply(nom.geocode)


In [None]:

geo = RateLimiter(geolocator.geocode, min_delay_seconds=5)

df3_clean['Address'] = df3_clean['Location'].apply(geo).apply(nom.geocode)


In [None]:

geo = RateLimiter(geolocator.geocode, min_delay_seconds=5)

df4_clean['Address'] = df4_clean['Location'].apply(geo).apply(nom.geocode)


In [195]:
#df_clean['Address'].sample(5)

In [None]:

'''''
def get_city(x):
  if hasattr(x,'city') and (x.city is not None): 
     return x.city

def get_country(x):
  if hasattr(x,'country') and (x.country is not None): 
     return x.country

def get_zipcode(x):
  if hasattr(x,'zipcode') and (x.zipcode is not None): 
     return x.zipcode
    
def get_state(x):
  if hasattr(x,'state') and (x.state is not None): 
     return x.state
    
'''''

    

In [None]:
'''''

geolocate_columns = df_clean['Address'].apply(geolocator.geocode)
df_clean['city'] = geolocate_column.apply(get_city)
df_clean['country'] = geolocate_column.apply(get_country)
df_clean['state'] = geolocate_column.apply(get_state)
df_clean['zipcode'] = geolocate_column.apply(get_zipcode)

'''''

#df_clean['latitude'] = geolocate_column.apply(get_latitude)
#df_clean['longitude'] = geolocate_column.apply(get_longitude)

In [196]:
#Create a functions thet gets locations details from the coordinates eg State name or Zipcodes

def city_state_country(row):
    coord = f"{row['Latitude']}, {row['Longitude']}"
    
    #location = geolocator.reverse(Point(row['Latitude'], row['Longitude']))
    location = geolocator.reverse(coord, exactly_one=True)
    address = location.raw['address']
    code = address.get('country_code')
    zipcode = address.get('postcode')
    state = address.get('state', '')
    country = address.get('country', '')
    row['State'] = state
    row['Country'] = country
    row['ZipCodes'] = zipcode
    row['Codes'] = code
    return row


In [198]:
#Call the function in the different dataframes

df_clean = df_clean.apply(city_state_country, axis=1)
df2_clean = df2_clean.apply(city_state_country, axis=1)
df3_clean = df3_clean.apply(city_state_country, axis=1)
df4_clean = df4_clean.apply(city_state_country, axis=1)



In [199]:
df_clean.isnull().sum()

Name                0
Beds                0
Host               49
Listing_type        0
Location            0
DiscountedPrice     0
OriginalPrice       0
Rating              0
Reviews             0
Latitude            0
Longitude           0
State               0
Country             0
ZipCodes            9
Codes               0
dtype: int64

### Test

In [200]:
df_clean['ZipCodes'][190]

'LL65 2NZ'

In [201]:
df_clean['Codes'][190]

'gb'

In [202]:
df_clean['Location'][190]

'Rhoscolyn'

In [205]:
df_clean.head()

Unnamed: 0,Name,Beds,Host,Listing_type,Location,DiscountedPrice,OriginalPrice,Rating,Reviews,Latitude,Longitude,State,Country,ZipCodes,Codes
2,Superking Room- Proper B&amp;B - Pool- No Book...,1,Superhost,Private room,"Little Whelnetham, Bury St Edmunds",199,0,5.0,39,52.206821,0.755366,England,United Kingdom,IP30 0TJ,gb
3,Kidwelly Farmhouse B&amp;B -The Lof,2,Superhost,Private room,Kidwelly,115,0,5.0,23,51.736387,-4.306231,Cymru / Wales,United Kingdom,SA17 4UD,gb
4,The Lake Lodge (Windermere),4,Superhost,Home,Cumbria,670,0,4.93,164,54.614314,-2.94209,England,United Kingdom,CA11 0LF,gb
5,"Swallows Loft, The Old Vicarag",2,Rare find,Condo,Far Sawrey,180,206,4.98,109,54.351053,-2.957659,England,United Kingdom,LA22 0LQ,gb
6,"Unique Scottish Country House Trossachs,Calland",8,,Home,Brig o'Turk,585,0,4.89,48,56.229672,-4.362475,Alba / Scotland,United Kingdom,FK17 8HS,gb


In [206]:
df_clean.isnull().sum()

Name                0
Beds                0
Host               49
Listing_type        0
Location            0
DiscountedPrice     0
OriginalPrice       0
Rating              0
Reviews             0
Latitude            0
Longitude           0
State               0
Country             0
ZipCodes            9
Codes               0
dtype: int64

In [207]:
#Use the copy function to copy the cleaned datasets to other differently named datasets

BnB_UK1 = df_clean.copy()
BnB_UK2 = df2_clean.copy()
BnB_US1 = df3_clean.copy()
BnB_US2 = df4_clean.copy()

In [208]:
BnB_UK1.to_csv('BnB_UK1.csv', index=False)
BnB_UK2.to_csv('BnB_UK2.csv', index=False)
BnB_US1.to_csv('BnB_US1.csv',  index=False)
BnB_US2.to_csv('BnB_US2.csv',  index=False)