# Prices and Accessibility

As the word cloud from the NLTK analysis shows, the words that most often appear together include "walk", "minutes", "tram", "train" and "stations". This suggests that hosts describe accessibility in terms of _walking distance_ and the availability of _tram_ and _train station_ services.

Thus, my initial hypothesis still stands: **_more accessible Airbnbs are more expensive_**.

To find out how price relates to accessibility, I need to give each listing an "accessibility score" that rewards more accessible places. Here are some ideas I can implement:

- Attach a positivity score based on the number of attractions available in transit
- Attach a transport_options score based on the number of transport options available
- Attach a walking score: the shorter the time taken to reach transport, the higher the score
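
The second and third ideas can be sketched directly on the transit text. This is a minimal sketch, not the scoring used later in this notebook: the keyword groups in `TRANSPORT_MODES`, the `1 / (1 + minutes)` formula and both function names are assumptions chosen for illustration.

```python
import re

# Hypothetical keyword groups; each group counts as one transport option.
TRANSPORT_MODES = {
    "tram": ("tram", "trams"),
    "train": ("train", "trains", "station", "stations"),
    "bus": ("bus", "buses"),
}

def transport_options_score(text):
    """Count how many distinct transport modes are mentioned in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sum(any(k in words for k in keywords) for keywords in TRANSPORT_MODES.values())

def walking_score(text):
    """Map the shortest 'N minutes' mention to a score in (0, 1]; lower time, higher score."""
    times = [int(n) for n in re.findall(r"(\d+)\s*minutes", text.lower())]
    if not times:
        return 0.0  # no time stated in the description
    return 1.0 / (1.0 + min(times))

sample = "5 minutes walk to the tram and 10 minutes to the train station"
print(transport_options_score(sample), walking_score(sample))
```

The walking score relies on every variation of "minute" already being standardised to "minutes", which is why the cleaning step comes first.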

But first, let's clean the data a bit more by standardising the word "minute". I want every variation of that word to become "minutes".

In [4]:
import numpy as np
import pandas as pd

df = pd.read_csv("initial_transit_cleaned.csv")
df.head(20)

Unnamed: 0,transit,price,weekly_price,monthly_price,host_neighborhood,street,neighborhood,city,suburb,state,language
0,"yes the bus 305,309 is exactly two blocks away...",60,,,,"Bulleen, VIC, Australia",Balwyn North,Manningham,Bulleen,VIC,en
1,easy transport options - the tram is right out...,35,200.0,803.0,Brunswick,"Brunswick East, VIC, Australia",Brunswick,Moreland,Brunswick East,VIC,en
2,our apartment is located within walking distan...,159,1253.0,4452.0,St Kilda,"St Kilda, VIC, Australia",St Kilda,Port Phillip,St Kilda,VIC,en
3,public transport is super convenient with a ch...,50,250.0,920.0,Thornbury,"Thornbury, VIC, Australia",Thornbury,Darebin,Thornbury,VIC,en
4,if you re arriving via the airport tullamarine...,98,540.0,,,"Berwick, VIC, Australia",,Casey,Berwick,VIC,en
5,"8-10 minutes walk to local train station, and ...",50,335.0,1400.0,,"Reservoir, VIC, Australia",,Darebin,Reservoir,VIC,en
6,numerous buses on victoria parade will take yo...,100,,,,"East Melbourne, VIC, Australia",,Melbourne,East Melbourne,VIC,en
7,"monash university, clayton campus is a 10 minu...",98,535.0,,,"Oakleigh East, VIC, Australia",,Monash,Oakleigh East,VIC,en
8,there are two tram lines right outside the apa...,98,800.0,,Richmond,"Richmond, VIC, Australia",Richmond,Yarra,Richmond,VIC,en
9,our apartment is located within walking distan...,190,1743.0,5572.0,St Kilda,"St Kilda, VIC, Australia",St Kilda,Port Phillip,St Kilda,VIC,en


In [7]:
import re

def clean_text(string_in):
    """Normalise one transit description."""
    text = str(string_in).lower()  # lower-case first so "Min" and "Mins" are matched too
    # Replace a standalone "min", "mins" or "minute" with "minutes"
    text = re.sub(r"(?:^|\s)(?:min|mins|minute)(?=\s|$)", " minutes ", text)
    text = re.sub(r"-", " ", text)   # replace hyphens with spaces
    text = re.sub(r" +", " ", text)  # collapse runs of spaces into one space
    return text.strip()

# Drop missing descriptions before cleaning; str() would otherwise turn NaN into the text "nan"
df = df[df["transit"].notna()].copy()
df["transit"] = df["transit"].apply(clean_text)
df.to_csv("initial_transit_cleaned.csv", index=False)
df.head(20)

Unnamed: 0,transit,price,weekly_price,monthly_price,host_neighborhood,street,neighborhood,city,suburb,state,language
0,"yes the bus 305,309 is exactly two blocks away...",60,,,,"Bulleen, VIC, Australia",Balwyn North,Manningham,Bulleen,VIC,en
1,easy transport options the tram is right outside.,35,200.0,803.0,Brunswick,"Brunswick East, VIC, Australia",Brunswick,Moreland,Brunswick East,VIC,en
2,our apartment is located within walking distan...,159,1253.0,4452.0,St Kilda,"St Kilda, VIC, Australia",St Kilda,Port Phillip,St Kilda,VIC,en
3,public transport is super convenient with a ch...,50,250.0,920.0,Thornbury,"Thornbury, VIC, Australia",Thornbury,Darebin,Thornbury,VIC,en
4,if you re arriving via the airport tullamarine...,98,540.0,,,"Berwick, VIC, Australia",,Casey,Berwick,VIC,en
5,"8 10 minutes walk to local train station, and ...",50,335.0,1400.0,,"Reservoir, VIC, Australia",,Darebin,Reservoir,VIC,en
6,numerous buses on victoria parade will take yo...,100,,,,"East Melbourne, VIC, Australia",,Melbourne,East Melbourne,VIC,en
7,"monash university, clayton campus is a 10 minu...",98,535.0,,,"Oakleigh East, VIC, Australia",,Monash,Oakleigh East,VIC,en
8,there are two tram lines right outside the apa...,98,800.0,,Richmond,"Richmond, VIC, Australia",Richmond,Yarra,Richmond,VIC,en
9,our apartment is located within walking distan...,190,1743.0,5572.0,St Kilda,"St Kilda, VIC, Australia",St Kilda,Port Phillip,St Kilda,VIC,en
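
The pattern above only matches variants that are preceded by whitespace, so forms glued to a number, such as the `.1min walk to lygon st` description that appears in a later table, pass through unchanged. One possible extension is sketched below. It is an assumption, not the cleaning actually applied in this notebook, and `normalise_minutes` is a hypothetical name.

```python
import re

def normalise_minutes(text):
    """Rewrite 'min', 'mins', 'minute' and 'minutes' as ' minutes', even when glued to a number."""
    # (?<![a-z]) avoids matching inside words such as "admin";
    # \b avoids matching the start of words such as "minimum".
    text = re.sub(r"(?<![a-z])(?:minutes|minute|mins|min)\b", " minutes", text.lower())
    return re.sub(r" +", " ", text).strip()

print(normalise_minutes("places to walk to .1min walk to lygon st"))
# places to walk to .1 minutes walk to lygon st
```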


# Performing Sentiment Analysis with VADER + custom words

Using NLTK's VADER, I will add custom words to the lexicon as positive indicators.

- First, apply the lexicon with the attractions keywords
- Then, apply it with the transport keywords
- Lastly, combine the two scores into one

### Attractions

In [38]:
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer

Analyzer = SentimentIntensityAnalyzer()
attractions_map = {
    "cbd": 1.0,
    "city": 1.0,
    "supermarket": 1.0,
    "supermarkets": 1.0,
    "shopping": 2.0,
    "cafes": 2.0,
    "cafe": 2.0,
    "beach": 2.0,
    "beaches": 2.0,
    "docklands": 1.0,
    "dockland": 1.0,
    "university": 1.0,
    "restaurant": 2.0,
    "restaurants": 2.0,
    "sports": 1.0,
    "mall": 2.0,
    "market": 2.0,
    "malls": 2.0,
    "grocery": 2.0,
    "groceries": 2.0,
    "casino": 2.0,
    "parliament": 1.0,
    "town": 2.0,
    "hotel": 1.0
}

transport_map = {
    "tram": 1.0,
    "trams": 1.0,
    "station": 1.0,
    "stations": 1.0,
    "free": 2.0,
    "airport": 2.0,
    "bus": 1.0,
    "buses": 1.0,
    "car": 1.0,
    "cars": 1.0,
    "parking": 1.0,
    "trains": 1.0,
    "train": 1.0,
    "cab": 1.0,
    "cabs": 1.0,
    "taxi": 1.0,
    "taxis": 1.0,
    "uber": 1.0,
    "drive": 1.0,
    "walk": 1.0,
    "bike": 1.0,
    "bikes": 1.0,
    "walking": 1.0,
    "walks": 1.0,
    "ride": 1.0,
    "skybus": 1.0,
    "railway": 1.0,
    "metro": 1.0

}
Analyzer.lexicon.update(attractions_map) # add to lexicon

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\brend\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [41]:
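# Run this only after attractions_score has been computed in In [40]; the cell numbers give the real execution order
all(map(Analyzer.lexicon.pop, attractions_map)) # remove from lexicon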

True
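
One caveat with `pop`: if a custom word already had an entry in the lexicon before `update`, popping it deletes that original entry as well instead of restoring it. The sketch below shows one way to make the change temporary. It is an illustration only: `temporary_words` is a hypothetical helper, and a plain dict with made-up values stands in for `Analyzer.lexicon` so the example runs without NLTK.

```python
from contextlib import contextmanager

@contextmanager
def temporary_words(lexicon, extra):
    """Add `extra` to `lexicon`, then restore the original state on exit."""
    missing = object()  # sentinel meaning "word was not in the lexicon"
    saved = {word: lexicon.get(word, missing) for word in extra}
    lexicon.update(extra)
    try:
        yield lexicon
    finally:
        for word, old in saved.items():
            if old is missing:
                lexicon.pop(word, None)  # word was new: remove it
            else:
                lexicon[word] = old      # word existed: put the old value back

# A plain dict standing in for Analyzer.lexicon (values are made up).
lexicon = {"good": 1.9, "free": 2.3}
with temporary_words(lexicon, {"free": 2.0, "tram": 1.0}):
    inside = dict(lexicon)  # custom values are active here
print(inside)
print(lexicon)  # original entries are back
```

With a helper like this, each scoring pass could sit inside its own `with` block, so the attractions and transport words would never be active at the same time.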

In [39]:
# getting only the positive score
def positive_score(text):
    positive_value = Analyzer.polarity_scores(text)['pos']
    return positive_value

In [40]:
%%time

df['attractions_score'] = df['transit'].apply(positive_score)
df.head(20)

Wall time: 4.78 s


Unnamed: 0,transit,price,weekly_price,monthly_price,host_neighborhood,street,neighborhood,city,suburb,state,language,attractions_score,transport_score,total_score
6,numerous buses on victoria parade will take yo...,100,,,,"East Melbourne, VIC, Australia",,Melbourne,East Melbourne,VIC,en,0.161,0.322,0.483
10,"yes, minutes to trains, trams and buses but mo...",228,1575.0,5100.0,Central Business District,"Melbourne, VIC, Australia",Central Business District,Melbourne,Melbourne,VIC,en,0.206,0.279,0.443
24,i have a myki available for you. it will have ...,64,,,Central Business District,"Melbourne, VIC, Australia",Central Business District,Melbourne,Melbourne,VIC,en,0.126,0.126,0.252
30,trams at the door. hire a bike. book the airpo...,140,,,Central Business District,"Melbourne, VIC, Australia",Central Business District,Melbourne,Melbourne,VIC,en,0.187,0.375,0.562
44,places to walk to .1min walk to lygon st .8min...,161,1030.0,3400.0,Richmond,"Carlton, VIC, Australia",Carlton,Melbourne,Carlton,VIC,en,0.145,0.276,0.393
49,flinders street station is a few minutes walk ...,220,1400.0,3862.0,Central Business District,"Melbourne, VIC, Australia",Central Business District,Melbourne,Melbourne,VIC,en,0.11,0.326,0.436
53,close and convenient public transport. easy ac...,190,900.0,2500.0,,"Parkville, VIC, Australia",Parkville,Melbourne,Parkville,VIC,en,0.266,0.266,0.532
54,"tram stop almost at the door, and walking dist...",155,910.0,3300.0,North Melbourne,"North Melbourne, VIC, Australia",North Melbourne,Melbourne,North Melbourne,VIC,en,0.0,0.294,0.294
55,the apartment is perfectly located for you to ...,90,,,Parkville,"Parkville, VIC, Australia",Parkville,Melbourne,Parkville,VIC,en,0.213,0.239,0.434
57,there is a tram to the city at end of street 1...,84,,,East Melbourne,"East Melbourne, VIC, Australia",East Melbourne,Melbourne,East Melbourne,VIC,en,0.059,0.216,0.275


### Transport

In [42]:
Analyzer.lexicon.update(transport_map) # add to lexicon

# remember to remove the previous dict (attractions_map) from the lexicon before proceeding
def positive_score(text):
    positive_value = Analyzer.polarity_scores(text)['pos']
    return positive_value

In [32]:
# Run this only after transport_score has been computed
all(map(Analyzer.lexicon.pop, transport_map)) # remove from lexicon

True

In [43]:
%%time

df['transport_score'] = df['transit'].apply(positive_score)
df["total_score"] = df["attractions_score"] + df["transport_score"]


Wall time: 5.32 s


In [45]:
df = df.loc[df['city'] == "Melbourne"]
df.to_excel("transit_with_scores.xlsx")

In [46]:
df.head(20)

Unnamed: 0,transit,price,weekly_price,monthly_price,host_neighborhood,street,neighborhood,city,suburb,state,language,attractions_score,transport_score,total_score
6,numerous buses on victoria parade will take yo...,100,,,,"East Melbourne, VIC, Australia",,Melbourne,East Melbourne,VIC,en,0.161,0.322,0.483
10,"yes, minutes to trains, trams and buses but mo...",228,1575.0,5100.0,Central Business District,"Melbourne, VIC, Australia",Central Business District,Melbourne,Melbourne,VIC,en,0.206,0.279,0.485
24,i have a myki available for you. it will have ...,64,,,Central Business District,"Melbourne, VIC, Australia",Central Business District,Melbourne,Melbourne,VIC,en,0.126,0.126,0.252
30,trams at the door. hire a bike. book the airpo...,140,,,Central Business District,"Melbourne, VIC, Australia",Central Business District,Melbourne,Melbourne,VIC,en,0.187,0.375,0.562
44,places to walk to .1min walk to lygon st .8min...,161,1030.0,3400.0,Richmond,"Carlton, VIC, Australia",Carlton,Melbourne,Carlton,VIC,en,0.145,0.276,0.421
49,flinders street station is a few minutes walk ...,220,1400.0,3862.0,Central Business District,"Melbourne, VIC, Australia",Central Business District,Melbourne,Melbourne,VIC,en,0.11,0.326,0.436
53,close and convenient public transport. easy ac...,190,900.0,2500.0,,"Parkville, VIC, Australia",Parkville,Melbourne,Parkville,VIC,en,0.266,0.266,0.532
54,"tram stop almost at the door, and walking dist...",155,910.0,3300.0,North Melbourne,"North Melbourne, VIC, Australia",North Melbourne,Melbourne,North Melbourne,VIC,en,0.0,0.294,0.294
55,the apartment is perfectly located for you to ...,90,,,Parkville,"Parkville, VIC, Australia",Parkville,Melbourne,Parkville,VIC,en,0.213,0.239,0.452
57,there is a tram to the city at end of street 1...,84,,,East Melbourne,"East Melbourne, VIC, Australia",East Melbourne,Melbourne,East Melbourne,VIC,en,0.059,0.216,0.275
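
With the scores in place, a first check of the hypothesis is the correlation between `price` and `total_score`. The sketch below uses only the ten rows displayed above, so it illustrates the call and is far too small a sample to support any conclusion. On the full data the same line would be applied to `df` instead of `sample`, which is a name introduced here for illustration.

```python
import pandas as pd

# price and total_score copied from the ten rows displayed above
sample = pd.DataFrame({
    "price": [100, 228, 64, 140, 161, 220, 190, 155, 90, 84],
    "total_score": [0.483, 0.485, 0.252, 0.562, 0.421, 0.436, 0.532, 0.294, 0.452, 0.275],
})

# Pearson correlation coefficient between price and score (always between -1 and 1)
r = sample["price"].corr(sample["total_score"])
print(f"Pearson r = {r:.3f}")
```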
