# Melbourne Airbnb Sentiment Analysis
Welcome to my analysis of Melbourne Airbnb data. The questions I aim to answer are as follows:
- What are the most common words used in the transit descriptions, and how do they relate to the listing price?
- How do prices correlate with transit descriptions?

My hypothesis is as follows: 

**_places with the shortest distance to transit (i.e. the most accessible) will have the highest prices_**
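Testing this hypothesis needs a measurable notion of "distance to transit", but the transit descriptions are free text. One possible approach is to pull out explicit walking times such as "8-10 minutes walk". The sketch below is an assumption, not part of the original workflow: the regular expression and the helper name `walking_minutes` are hypothetical, and they only catch phrases where a number comes before "min(s)"/"minute(s)" and is followed shortly by "walk".

```python
import re
from typing import Optional

# Hypothetical pattern: a number or a range ("5", "8-10"), then "min"/"mins"/"minute(s)",
# then up to 15 characters (no full stop) before the word "walk".
WALK_PATTERN = re.compile(
    r"(\d+)(?:\s*-\s*(\d+))?\s*(?:minutes|minute|mins|min)\b[^.]{0,15}?\bwalk",
    re.IGNORECASE,
)


def walking_minutes(transit: str) -> Optional[float]:
    """Return the first walking time (in minutes) mentioned in a transit description, or None."""
    match = WALK_PATTERN.search(transit)
    if match is None:
        return None
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) else low
    return (low + high) / 2  # use the midpoint of a range such as "8-10"


print(walking_minutes("8-10 minutes walk to local train station"))  # 9.0
print(walking_minutes("the tram is right outside"))                 # None
```

A column built this way could then be compared with `price`, for example with pandas' `Series.corr(..., method="spearman")`, which is less sensitive to a few extreme prices than a Pearson correlation. Phrases like "two blocks away" or "right outside" carry no number and would need separate handling.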

In this analysis, I will clean the data, create a word cloud from the most common words in the 'transit' column, and check whether certain words are associated with higher or lower prices.
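One simple way to check whether certain words go with the price level is to compare, for each word, the median price of the listings whose transit description mentions it. The sketch below assumes the data is available as (transit text, price) pairs; the function names and the `min_count` threshold are hypothetical, and the toy rows are for illustration only.

```python
import re
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def median_price_by_word(rows: Iterable[Tuple[str, float]], min_count: int = 2) -> Dict[str, float]:
    """Map each word to the median price of the listings whose transit text contains it.

    Words that appear in fewer than `min_count` listings are dropped, because a
    median over very few prices says little.
    """
    prices_by_word = defaultdict(list)
    for text, price in rows:
        for word in set(tokenize(text)):  # count each word at most once per listing
            prices_by_word[word].append(price)
    return {word: median(prices) for word, prices in prices_by_word.items() if len(prices) >= min_count}


# Toy rows for illustration only
rows = [
    ("the tram is right outside", 35),
    ("tram, train or bus - everything is right here.", 82),
    ("yes the bus is exactly two blocks away", 60),
]
by_word = median_price_by_word(rows)
print(by_word["tram"], by_word["bus"])  # 58.5 71.0
```

The median is used instead of the mean so that a handful of very expensive listings cannot dominate a word's score.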

In [5]:
!pip install wordcloud
!pip install langdetect



In [2]:
import os
import numpy as np
import pandas as pd

# List all the files that I can use
for dirname, _, filenames in os.walk('MelbourneAirbnb'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
print("\n############################################\n")
print("\n".join(sorted(os.listdir('.'))))

MelbourneAirbnb\calendar_dec18.csv
MelbourneAirbnb\cleansed_listings_dec18.csv
MelbourneAirbnb\listings_summary_dec18.csv
MelbourneAirbnb\neighbourhoods.csv
MelbourneAirbnb\reviews_dec18.csv
MelbourneAirbnb\reviews_summary_dec18.csv

############################################

Melbourne Sentiment Analysis.ipynb
MelbourneAirbnb
cleansed_listings_dec18.xlsx
initial_transit_cleaned.csv
output.png



In [19]:
df = pd.read_excel("cleansed_listings_dec18.xlsx")

In [20]:
# view all the columns in the file, some columns will not be used in my analysis
print(df.columns.values)

['id' 'listing_url' 'scrape_id' 'last_scraped' 'name' 'summary' 'space'
 'description' 'neighborhood_overview' 'notes' 'transit' 'access'
 'interaction' 'house_rules' 'picture_url' 'host_id' 'host_url'
 'host_name' 'host_since' 'host_location' 'host_about'
 'host_response_time' 'host_response_rate' 'host_is_superhost'
 'host_thumbnail_url' 'host_picture_url' 'host_neighborhood'
 'host_verifications' 'host_has_profile_pic' 'host_identity_verified'
 'street' 'neighborhood' 'city' 'suburb' 'state' 'zipcode'
 'smart_location' 'country_code' 'country' 'latitude' 'longitude'
 'is_location_exact' 'property_type' 'room_type' 'accommodates'
 'bathrooms' 'bedrooms' 'beds' 'bed_type' 'amenities' 'price'
 'weekly_price' 'monthly_price' 'security_deposit' 'cleaning_fee'
 'guests_included' 'extra_people' 'minimum_nights' 'maximum_nights'
 'calendar_updated' 'has_availability' 'availability_30' 'availability_60'
 'availability_90' 'availability_365' 'calendar_last_scraped'
 'number_of_reviews' 'first

In [21]:
df.head()

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,summary,space,description,neighborhood_overview,notes,...,review_scores_location,review_scores_value,requires_license,license,instant_bookable,cancellation_policy,require_guest_profile_picture,require_guest_phone_verification,calculated_host_listings_count,reviews_per_month
0,9835,https://www.airbnb.com/rooms/9835,20181200000000,2018-07-12,Beautiful Room & House,,"House: Clean, New, Modern, Quite, Safe. 10Km f...","House: Clean, New, Modern, Quite, Safe. 10Km f...",Very safe! Family oriented. Older age group.,,...,9.0,9.0,f,,f,strict_14_with_grace_period,f,f,1,0.04
1,10803,https://www.airbnb.com/rooms/10803,20181200000000,2018-07-12,Room in Cool Deco Apartment in Brunswick,A large air conditioned room with queen spring...,The apartment is Deco/Edwardian in style and h...,A large air conditioned room with queen spring...,This hip area is a crossroads between two grea...,,...,9.0,9.0,f,,t,moderate,t,t,1,1.5
2,12936,https://www.airbnb.com/rooms/12936,20181200000000,2018-07-12,St Kilda 1BR APT+BEACHSIDE+VIEWS+PARKING+WIFI+AC,RIGHT IN THE HEART OF ST KILDA! It doesn't get...,FREE WiFi FREE in-building remote controlled g...,RIGHT IN THE HEART OF ST KILDA! It doesn't get...,A stay at our apartment means you can enjoy so...,First floor apartment with both lift and stair...,...,9.0,9.0,f,,f,strict_14_with_grace_period,f,f,17,0.15
3,15246,https://www.airbnb.com/rooms/15246,20181200000000,2018-07-12,Large private room-close to city,"Comfortable, relaxed house, a home away from ...",The atmosphere is relaxed and easy going. You ...,"Comfortable, relaxed house, a home away from ...","This is a great neighbourhood – it is quiet,...",A simple self service breakfast is available ...,...,9.0,9.0,f,,f,moderate,f,f,3,0.3
4,16760,https://www.airbnb.com/rooms/16760,20181200000000,2018-07-12,Melbourne BnB near City & Sports,,We offer comfortable accommodation in Inner Me...,We offer comfortable accommodation in Inner Me...,,,...,10.0,9.0,f,,f,moderate,f,f,1,0.74


In [33]:
# create a dataframe with only the columns I want
# !NOTE: drop the rows where transit is NaN so that only listings with transit data remain
keep_col = ['transit', 'price', 'weekly_price', 'monthly_price', 'host_neighborhood', 'street', 'neighborhood', 'city', 'suburb', 'state']
new_df = df[keep_col]
# override the old df; .copy() avoids a SettingWithCopyWarning when the 'language' column is added later
df = new_df[new_df['transit'].notna()].copy()
df.head(100)

Unnamed: 0,transit,price,weekly_price,monthly_price,host_neighborhood,street,neighborhood,city,suburb,state
0,"yes the bus 305,309 is exactly two blocks away...",60,,,,"Bulleen, VIC, Australia",Balwyn North,Manningham,Bulleen,VIC
1,easy transport options - the tram is right out...,35,200.0,803.0,Brunswick,"Brunswick East, VIC, Australia",Brunswick,Moreland,Brunswick East,VIC
2,our apartment is located within walking distan...,159,1253.0,4452.0,St Kilda,"St Kilda, VIC, Australia",St Kilda,Port Phillip,St Kilda,VIC
3,public transport is super convenient with a ch...,50,250.0,920.0,Thornbury,"Thornbury, VIC, Australia",Thornbury,Darebin,Thornbury,VIC
5,if you re arriving via the airport tullamarine...,98,540.0,,,"Berwick, VIC, Australia",,Casey,Berwick,VIC
...,...,...,...,...,...,...,...,...,...,...
120,east malvern is an established and well to do ...,50,350.0,975.0,Malvern East,"Malvern East, VIC, Australia",Malvern East,Stonnington,Malvern East,VIC
121,there is a tram stop in front of the apartment...,132,895.0,3763.0,Central Business District,"Melbourne, VIC, Australia",Central Business District,Melbourne,Melbourne,VIC
122,"getting around is easy from st kilda, which is...",246,1967.0,5931.0,St Kilda,"St Kilda, VIC, Australia",St Kilda,Port Phillip,St Kilda,VIC
123,"tram, train or bus - everything is right here.",82,500.0,800.0,South Yarra,"South Yarra, VIC, Australia",South Yarra,Stonnington,South Yarra,VIC


In [23]:
print("The dataset has {} rows and {} columns.".format(*df.shape))

The dataset has 14912 rows and 10 columns.


# Data Cleaning: Remove Non-English Transit Comments

- The first step is to use the langdetect library to keep only English 'transit' comments.
- The second step is to replace garbled characters such as 'â€šÃ„Ã' with spaces.
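Replacing garbled characters with spaces throws away the character that was originally intended. A sequence such as `‚Äì` is exactly what an en dash (U+2013) turns into when its UTF-8 bytes are decoded as Mac Roman, so, assuming that is how the text was damaged, the damage can be reversed by re-encoding. The helper below is a hypothetical sketch of that idea, offered as an alternative to deletion; it leaves any text it cannot round-trip unchanged.

```python
def fix_mojibake(text: str) -> str:
    """Undo one round of UTF-8 text that was mistakenly decoded as Mac Roman.

    Assumption: the garbling came from that specific mis-decoding. If the text cannot
    be re-encoded as Mac Roman, or the resulting bytes are not valid UTF-8, the
    original text is returned unchanged.
    """
    try:
        return text.encode("mac_roman").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# "\u201a\u00c4\u00ec" is how an en dash looks after the mis-decoding
garbled = "a great neighbourhood \u201a\u00c4\u00ec it is quiet"
print(fix_mojibake(garbled) == "a great neighbourhood \u2013 it is quiet")  # True
print(fix_mojibake("caf\u00e9"))  # café (not valid UTF-8 after re-encoding, so returned unchanged)
```

Plain ASCII text passes through untouched. Text that legitimately contains Mac Roman characters which happen to form valid UTF-8 would be "repaired" wrongly, so spot-checking a sample of changed rows is worthwhile.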

In [24]:
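from typing import Optional

from langdetect import DetectorFactory, detect
from langdetect.lang_detect_exception import LangDetectException

# langdetect is non-deterministic by default; fixing the seed makes the results reproducible
DetectorFactory.seed = 0

# automatically classify the language of each input text
# !NOTE: avoid re-running this cell and the one below; detection over all rows takes about 2 minutes
def language_detection(text: str) -> Optional[str]:
    try:
        return detect(text)
    except LangDetectException:  # raised when the text has no detectable features (e.g. only digits)
        return None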
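Because running the detector over every row is slow, an optional speed-up (not part of the original workflow) is a cheap pre-filter that drops descriptions written mostly in non-Latin scripts, such as Chinese, before calling `language_detection`. The helper names and the 0.5 cut-off below are hypothetical; text in other Latin-script languages (French, German and so on) still needs langdetect.

```python
def ascii_letter_ratio(text: str) -> float:
    """Share of alphabetic characters in `text` that are plain ASCII letters (0.0 if there are none)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch.isascii() for ch in letters) / len(letters)


def worth_detecting(text: str, threshold: float = 0.5) -> bool:
    """Hypothetical pre-filter: only mostly-ASCII text is passed on to the slower language detector."""
    return ascii_letter_ratio(text) >= threshold


print(ascii_letter_ratio("tram stop"))  # 1.0
print(worth_detecting("地铁站"))        # False
```

In the notebook this could be applied as `df[df['transit'].apply(worth_detecting)]` before the timed detection cell, so that langdetect only sees the remaining rows.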

In [34]:
%%time
df["language"] = df['transit'].apply(language_detection)
new_df = df

Wall time: 2min 1s


In [2]:
# drop rows whose detected language is not English (this also drops rows where detection failed)
new_df = df.drop(df[df.language != 'en'].index)

NameError: name 'df' is not defined

In [1]:
import re

def clean_text(string_in):
    # Replace every character except letters, digits and . - / , ( ) with a space
    string_in = re.sub(r"[^a-zA-Z0-9.\-/,()]", " ", str(string_in))
    string_in = re.sub(r" +", " ", string_in)  # collapse runs of spaces into a single space
    string_in = string_in.lower()               # convert everything to lower case

    return string_in.strip()

new_df["transit"] = new_df.transit.apply(clean_text)
new_df = new_df[new_df['transit'] != ""]  # drop descriptions that became empty after cleaning
new_df.to_csv('initial_transit_cleaned.csv', index=False)
print("The dataset has {} rows and {} columns.".format(*new_df.shape))

NameError: name 'new_df' is not defined

In [41]:
new_df.head(30)

Unnamed: 0,transit,price,weekly_price,monthly_price,host_neighborhood,street,neighborhood,city,suburb,state,language
0,"yes the bus 305,309 is exactly two blocks away...",60,,,,"Bulleen, VIC, Australia",Balwyn North,Manningham,Bulleen,VIC,en
1,easy transport options - the tram is right out...,35,200.0,803.0,Brunswick,"Brunswick East, VIC, Australia",Brunswick,Moreland,Brunswick East,VIC,en
2,our apartment is located within walking distan...,159,1253.0,4452.0,St Kilda,"St Kilda, VIC, Australia",St Kilda,Port Phillip,St Kilda,VIC,en
3,public transport is super convenient with a ch...,50,250.0,920.0,Thornbury,"Thornbury, VIC, Australia",Thornbury,Darebin,Thornbury,VIC,en
5,if you re arriving via the airport tullamarine...,98,540.0,,,"Berwick, VIC, Australia",,Casey,Berwick,VIC,en
6,"8-10 minutes walk to local train station, and ...",50,335.0,1400.0,,"Reservoir, VIC, Australia",,Darebin,Reservoir,VIC,en
7,numerous buses on victoria parade will take yo...,100,,,,"East Melbourne, VIC, Australia",,Melbourne,East Melbourne,VIC,en
8,"monash university, clayton campus is a 10 minu...",98,535.0,,,"Oakleigh East, VIC, Australia",,Monash,Oakleigh East,VIC,en
10,there are two tram lines right outside the apa...,98,800.0,,Richmond,"Richmond, VIC, Australia",Richmond,Yarra,Richmond,VIC,en
11,our apartment is located within walking distan...,190,1743.0,5572.0,St Kilda,"St Kilda, VIC, Australia",St Kilda,Port Phillip,St Kilda,VIC,en
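With the transit text cleaned, the next step in the plan is the word cloud, which is driven by word frequencies. A minimal sketch of that counting step follows. The stop-word list is a small hypothetical one; in practice a fuller list would be used (the wordcloud package ships one as `STOPWORDS`), and the resulting counts could be passed as a dict to `WordCloud.generate_from_frequencies`.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Small, hypothetical stop-word list for illustration only
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "are", "to", "of", "in", "on", "from", "it", "there"}


def top_words(descriptions: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent non-stop-words across all transit descriptions."""
    counts = Counter()
    for text in descriptions:
        counts.update(word for word in re.findall(r"[a-z]+", text.lower()) if word not in STOP_WORDS)
    return counts.most_common(n)


sample = [
    "tram, train or bus - everything is right here.",
    "the tram is right outside",
    "yes the bus 305,309 is exactly two blocks away",
]
print(top_words(sample, n=3))  # [('tram', 2), ('bus', 2), ('right', 2)]
```

Counting only alphabetic tokens means route numbers such as "305" are ignored, which keeps the word cloud focused on descriptive words.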
