# Applied Data Science Capstone Project #

## Business Problem ##

A group of medical professionals seek to invest in a rejuvenation clinic that provides infusion-based therapy to clients. Their target clienteles are young, professional women in Melbourne, Australia. The investors seek to find a location where the demographics include a large population of young professional women who live in suburbs associated with high-income status. The location must be within 800 metres of a train station. The investors would also like to know the location statistics of current successful clinics, most notably, popular venues according to social media. The venue location statistics from these successful clinics will be looked upon when selecting the final site location.


## Data ##

The data for this problem will come from a variety of sources. Suburbs across Melbourne will be classed as ‘high-income’ according to Australian Census data released from the Australian Taxation Office. The top 10 postcodes who have the highest average taxable income in Victoria will be selected from the 2011/2012, 2012/2013, 2013/2014, 2014/2015, 2015/2016 financial years, respectively. From there, data from the Australian Bureau of Statistics will determine which suburbs have the highest population of young, professional women; that is, women who are 20<34 years of age, and have an income > $1, 250 per week. This data will be used to construct a cluster map to determine possible site locations.

Using Foursquare, venue location data from current successful anti-aging clinics will then be analysed to determine the most common venues associated with these successful clinics. Each potential site location will then be ranked according to how ‘associated’ each suburb is to the venue location data. The suburb with the most associated venues will be the site location for the rejuvenation clinic.



# Methodology #

Methodology
This report consists of four sections. The first section involved downloading and pre-processing the required datasets. After determining top ten postcodes, only those postcodes with a population of >1, 000 20-34-year-olds were selected for further analysis. Part 2 involved creating a cluster map of Melbourne with potential site locations for the rejuvenation clinic. Part 3 involved attaining venue location data from current successful anti-aging clinics. A google maps query of 'anti-aging clinics' in Melbourne was conducted and only those with a 5-star customer rating were further analysed. Analysis involved a Foursquare API search of most popular venues associated with the successful clinics. Part 4 involved searching for an ideal location for the clinic. This was done by adding parameters from the previous section to search query. The suburb/postcode with the most associated venues in the parameter search was the suburb/postcode for site location. The parameters were location site must be within 800 metres of a cafe, gym/fitness centre, bookstore, grocery store and train station. 
 

# Part 1: Downloading Dataset and Pre-processing #

This section involves downloading and preprocessing the required datasets. After determining top ten postcodes, only those postcodes with a population of >1, 000 20-34 year-olds will be selected for further analysis. 

In [1]:
#Import required Libraries
import pandas as pd
import numpy as np
!pip install pgeocode
import pgeocode
from geopy.geocoders import Nominatim
from IPython.display import Image
from IPython.core.display import HTML
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import json
import requests
from pandas.io.json import json_normalize
from folium import plugins
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
print('Libraries Imported!')

Collecting pgeocode
  Downloading https://files.pythonhosted.org/packages/40/32/477ae060daf5a54a26caeb7c63901bb8c017c70fc8888c3073e29a11982e/pgeocode-0.1.1-py2.py3-none-any.whl
Installing collected packages: pgeocode
Successfully installed pgeocode-0.1.1
Libraries Imported!


In [2]:
#Import datasets

# Assign spreadsheet filename to `file'
file1 = 'taxstats2012individual07topandbottom10postcodesbystateterritory.xlsx' #2011/2012 financial year
file2 = 'taxstats2013individual07topandbottom10postcodesbystate.xlsx' #2012/2013 financial year
file3 = 'taxstats2014individual07topandbottom10postcodes.xlsx' #2013/2014 financial year
file4 = 'taxstats2015individual07topandbottom10postcodesstateterritory.xlsx' ##2014/2015 financial year
file5 = 'taxstats2016individual07topandbottom10postcodesstateterritory.xlsx' #2015/2016 financial year

# Load spreadsheets
xl1 = pd.ExcelFile(file1)
xl2 = pd.ExcelFile(file2)
xl3 = pd.ExcelFile(file3)
xl4 = pd.ExcelFile(file4)
xl5 = pd.ExcelFile(file5)

# Print the sheet names
print(xl1.sheet_names), (xl2.sheet_names), (xl3.sheet_names), (xl4.sheet_names), (xl5.sheet_names)

['Individuals Tax Title & Notes', 'Aus', 'NSW', 'VIC', 'QLD', 'SA', 'WA', 'TAS', 'NT', 'ACT']


(None,
 ['Notes', 'Aus', 'ACT', 'NSW', 'NT', 'QLD', 'SA', 'TAS', 'VIC', 'WA'],
 ['Notes', 'Aus', 'ACT', 'NSW', 'NT', 'QLD', 'SA', 'TAS', 'VIC', 'WA'],
 ['Notes', 'Aus', 'ACT', 'NSW', 'NT', 'QLD', 'SA', 'TAS', 'VIC', 'WA'],
 ['Notes',
  'Individuals Table 7A',
  'Individuals Table 7B',
  'Individuals Table 7C'])

In [3]:
# Select the 'VIC' sheet in 2012/2013 data to construct a dataframe
df1 = pd.read_excel('taxstats2012individual07topandbottom10postcodesbystateterritory.xlsx') #Load 2012/2013 data
df1 = xl1.parse('VIC') #Extract only Victorian Postcodes
#Rename columns
df1 = df1.rename(columns={df1.columns[0]: "Postcode", df1.columns[2]: "Suburb" })
df2012 = df1.loc[2:11, ['Postcode', 'Suburb']]
#Select only the top 10
print(df2012)
df2012.shape

   Postcode                           Suburb
2      3142                HAWKSBURN, TOORAK
3      3944                          PORTSEA
4      3186  BRIGHTON, BRIGHTON NORTH, DENDY
5      3126      CAMBERWELL EAST, CANTERBURY
6      3206         ALBERT PARK, MIDDLE PARK
7      3144  KOOYONG, MALVERN, MALVERN NORTH
8      3002                   EAST MELBOURNE
9      3929                         FLINDERS
10     3143         ARMADALE, ARMADALE NORTH
11     3101                      COTHAM, KEW


(10, 2)

In [4]:
#Do the same for the remainder years

#2012/2013
df2 = pd.read_excel('taxstats2013individual07topandbottom10postcodesbystate.xlsx') #Load 2012/2013
df2 = xl2.parse('VIC') #Extract only Victorian Postcodes
df2 = df2.rename(columns={df2.columns[0]: "Postcode", df2.columns[2]: "Suburb" }) #Rename columns
df2013 = df2.loc[3:12, ['Postcode', 'Suburb']] #Select only the top 10
#print(df2013)

#2013/2014
df3 = pd.read_excel('taxstats2014individual07topandbottom10postcodes.xlsx') #Load 2013/2014 data
df3 = xl3.parse('VIC') #Extract only Victorian Postcodes
df3 = df3.rename(columns={df3.columns[0]: "Postcode", df3.columns[2]: "Suburb" }) #Rename columns
df2014 = df3.loc[3:12, ['Postcode', 'Suburb']] #Select only the top 10
#print(df2014)

#2014/2015
df4 = pd.read_excel('taxstats2015individual07topandbottom10postcodesstateterritory.xlsx') #Load 2014/2015 data
df4 = xl4.parse('VIC') #Extract only Victorian Postcodes
df4 = df4.rename(columns={df4.columns[0]: "Postcode", df4.columns[2]: "Suburb" }) #Rename columns
df2015 = df4.loc[3:12, ['Postcode', 'Suburb']] #Select only the top 10
#print(df2015)

#2015/2016
df5 = pd.read_excel('taxstats2015individual07topandbottom10postcodesstateterritory.xlsx') #Load 2015/2016 data
df5 = xl5.parse('Individuals Table 7B') #Extract only Victorian Postcodes
df5 = df5.rename(columns={df5.columns[3]: "Postcode", df5.columns[4]: "Suburb" }) #Rename columns
df2016 = df5.loc[122:131, ['Postcode', 'Suburb']] #Select only the top 10
#print(df2016)

In [5]:
#Merge datsets
df_final = pd.concat([df2012,df2013,df2014,df2015,df2016]).drop_duplicates().reset_index(drop=True)
print(df_final)
df_final.shape

   Postcode                                             Suburb
0      3142                                  HAWKSBURN, TOORAK
1      3944                                            PORTSEA
2      3186                    BRIGHTON, BRIGHTON NORTH, DENDY
3      3126                        CAMBERWELL EAST, CANTERBURY
4      3206                           ALBERT PARK, MIDDLE PARK
5      3144                    KOOYONG, MALVERN, MALVERN NORTH
6      3002                                     EAST MELBOURNE
7      3929                                           FLINDERS
8      3143                           ARMADALE, ARMADALE NORTH
9      3101                                        COTHAM, KEW
10     3761                                         ST ANDREWS
11     3874  MCLOUGHLINS BEACH, WOODSIDE, WOODSIDE BEACH, W...
12     3928                                         MAIN RIDGE
13     3141                                        SOUTH YARRA
14     3813                               TYNONG, TYNON

(15, 2)

## Attaining Geocoordinates of Postcodes ##

In [6]:
#Find latitude and longitude values for each postcode

#nomi = pgeocode.Nominatim('au')
#nomi.query_postal_code("3142")

#nomi = pgeocode.Nominatim('au')
#nomi.query_postal_code("3944")

#nomi = pgeocode.Nominatim('au')
#nomi.query_postal_code("3186")

#nomi = pgeocode.Nominatim('au')
#nomi.query_postal_code("3126")

#nomi = pgeocode.Nominatim('au')
#nomi.query_postal_code("3206")
    
#nomi = pgeocode.Nominatim('au')
#nomi.query_postal_code("3144")

#nomi = pgeocode.Nominatim('au')
#nomi.query_postal_code("3002")

#nomi = pgeocode.Nominatim('au')
#nomi.query_postal_code("3929")

#nomi = pgeocode.Nominatim('au')
#nomi.query_postal_code("3143")

#nomi = pgeocode.Nominatim('au')
#nomi.query_postal_code("3101")

#nomi = pgeocode.Nominatim('au')
#nomi.query_postal_code("3761")

#nomi = pgeocode.Nominatim('au')
#nomi.query_postal_code("3874")

#nomi = pgeocode.Nominatim('au')
#nomi.query_postal_code("3928")

#nomi = pgeocode.Nominatim('au')
#nomi.query_postal_code("3141")

#nomi = pgeocode.Nominatim('au')
#nomi.query_postal_code("3813")

In [7]:
#Add Latitude and Longitude columns to dataframe
df_final['Latitude'] = '-37.8416', '-38.3167', '-37.9056', '-37.8247', '-37.8333', '-37.8399', '-38.4739', '-37.85', '-37.8064', ' -37.5782', '-38.5238', '-38.3973', '-38.3973', '-37.8333', '-38.0834'
df_final['Longitude'] = '145.016', '144.717', '145.008', '145.078', '144.981', '145.032', '145.019', '145.027', '145.031', '145.307', '146.804', '144.972', '144.972', '144.983', '145.621'
df_final

Unnamed: 0,Postcode,Suburb,Latitude,Longitude
0,3142,"HAWKSBURN, TOORAK",-37.8416,145.016
1,3944,PORTSEA,-38.3167,144.717
2,3186,"BRIGHTON, BRIGHTON NORTH, DENDY",-37.9056,145.008
3,3126,"CAMBERWELL EAST, CANTERBURY",-37.8247,145.078
4,3206,"ALBERT PARK, MIDDLE PARK",-37.8333,144.981
5,3144,"KOOYONG, MALVERN, MALVERN NORTH",-37.8399,145.032
6,3002,EAST MELBOURNE,-38.4739,145.019
7,3929,FLINDERS,-37.85,145.027
8,3143,"ARMADALE, ARMADALE NORTH",-37.8064,145.031
9,3101,"COTHAM, KEW",-37.5782,145.307


Downloading data from Australian Bereau of Statistics and selecting for populations >1,000 20-34 year olds

In [8]:
#Demographics were extracted from Australian 2016 census data. Target was number of persons aged between 20-34 per postcode

#3142: http://www.censusdata.abs.gov.au/CensusOutput/copsub2016.NSF/All%20docs%20by%20catNo/2016~Community%20Profile~POA3142/$File/GCP_POA3142.zip?OpenElement
#3944: http://www.censusdata.abs.gov.au/CensusOutput/copsub2016.NSF/All%20docs%20by%20catNo/2016~Community%20Profile~POA3142/$File/GCP_POA3944.zip?OpenElement
#3186: http://www.censusdata.abs.gov.au/CensusOutput/copsub2016.NSF/All%20docs%20by%20catNo/2016~Community%20Profile~POA3186/$File/GCP_POA3186.zip?OpenElement
#3126: http://www.censusdata.abs.gov.au/CensusOutput/copsub2016.NSF/All%20docs%20by%20catNo/2016~Community%20Profile~POA3126/$File/GCP_POA3126.zip?OpenElement
#3206: http://www.censusdata.abs.gov.au/CensusOutput/copsub2016.NSF/All%20docs%20by%20catNo/2016~Community%20Profile~POA3206/$File/GCP_POA3206.zip?OpenElement
#3002: http://www.censusdata.abs.gov.au/CensusOutput/copsub2016.NSF/All%20docs%20by%20catNo/2016~Community%20Profile~POA3002/$File/GCP_POA3002.zip?OpenElement
#3929: http://www.censusdata.abs.gov.au/CensusOutput/copsub2016.NSF/All%20docs%20by%20catNo/2016~Community%20Profile~POA3929/$File/GCP_POA3929.zip?OpenElement
#3143: http://www.censusdata.abs.gov.au/CensusOutput/copsub2016.NSF/All%20docs%20by%20catNo/2016~Community%20Profile~POA3143/$File/GCP_POA3143.zip?OpenElement
#3101: http://www.censusdata.abs.gov.au/CensusOutput/copsub2016.NSF/All%20docs%20by%20catNo/2016~Community%20Profile~POA3101/$File/GCP_POA3101.zip?OpenElement
#3761: http://www.censusdata.abs.gov.au/CensusOutput/copsub2016.NSF/All%20docs%20by%20catNo/2016~Community%20Profile~POA3761/$File/GCP_POA3761.zip?OpenElement
#3874: http://www.censusdata.abs.gov.au/CensusOutput/copsub2016.NSF/All%20docs%20by%20catNo/2016~Community%20Profile~POA3874/$File/GCP_POA3874.zip?OpenElement
#3928: http://www.censusdata.abs.gov.au/CensusOutput/copsub2016.NSF/All%20docs%20by%20catNo/2016~Community%20Profile~POA3928/$File/GCP_POA3928.zip?OpenElement
#3141: http://www.censusdata.abs.gov.au/CensusOutput/copsub2016.NSF/All%20docs%20by%20catNo/2016~Community%20Profile~POA3141/$File/GCP_POA3141.zip?OpenElement
#3813: http://www.censusdata.abs.gov.au/CensusOutput/copsub2016.NSF/All%20docs%20by%20catNo/2016~Community%20Profile~POA3813/$File/GCP_POA3813.zip?OpenElement
#Assign population data
df_final_demo = df_final.assign(n20_34 = [2997,16,3196,1272,1663,2139,1750,59,2545,5319,200,44,29,11723,119])
df_final_demo

Unnamed: 0,Postcode,Suburb,Latitude,Longitude,n20_34
0,3142,"HAWKSBURN, TOORAK",-37.8416,145.016,2997
1,3944,PORTSEA,-38.3167,144.717,16
2,3186,"BRIGHTON, BRIGHTON NORTH, DENDY",-37.9056,145.008,3196
3,3126,"CAMBERWELL EAST, CANTERBURY",-37.8247,145.078,1272
4,3206,"ALBERT PARK, MIDDLE PARK",-37.8333,144.981,1663
5,3144,"KOOYONG, MALVERN, MALVERN NORTH",-37.8399,145.032,2139
6,3002,EAST MELBOURNE,-38.4739,145.019,1750
7,3929,FLINDERS,-37.85,145.027,59
8,3143,"ARMADALE, ARMADALE NORTH",-37.8064,145.031,2545
9,3101,"COTHAM, KEW",-37.5782,145.307,5319


In [9]:
#Delete suburbs with n20_34 populations <1000
DEM_df = df_final_demo.drop(df_final_demo.index[[1, 7, 10, 11, 12, 14]])
DEM_df

Unnamed: 0,Postcode,Suburb,Latitude,Longitude,n20_34
0,3142,"HAWKSBURN, TOORAK",-37.8416,145.016,2997
2,3186,"BRIGHTON, BRIGHTON NORTH, DENDY",-37.9056,145.008,3196
3,3126,"CAMBERWELL EAST, CANTERBURY",-37.8247,145.078,1272
4,3206,"ALBERT PARK, MIDDLE PARK",-37.8333,144.981,1663
5,3144,"KOOYONG, MALVERN, MALVERN NORTH",-37.8399,145.032,2139
6,3002,EAST MELBOURNE,-38.4739,145.019,1750
8,3143,"ARMADALE, ARMADALE NORTH",-37.8064,145.031,2545
9,3101,"COTHAM, KEW",-37.5782,145.307,5319
13,3141,SOUTH YARRA,-37.8333,144.983,11723


This section involves selecting for females >20<34 who earn >$1, 250 per week. 

In [10]:
#Add income column for number of females >20<34 who earn >$1, 250 per week - nwomen_high_income 
df_income = DEM_df.assign(nwomen_high_income = [447,427,128,298,294,371,387,639,2092])
df_income

Unnamed: 0,Postcode,Suburb,Latitude,Longitude,n20_34,nwomen_high_income
0,3142,"HAWKSBURN, TOORAK",-37.8416,145.016,2997,447
2,3186,"BRIGHTON, BRIGHTON NORTH, DENDY",-37.9056,145.008,3196,427
3,3126,"CAMBERWELL EAST, CANTERBURY",-37.8247,145.078,1272,128
4,3206,"ALBERT PARK, MIDDLE PARK",-37.8333,144.981,1663,298
5,3144,"KOOYONG, MALVERN, MALVERN NORTH",-37.8399,145.032,2139,294
6,3002,EAST MELBOURNE,-38.4739,145.019,1750,371
8,3143,"ARMADALE, ARMADALE NORTH",-37.8064,145.031,2545,387
9,3101,"COTHAM, KEW",-37.5782,145.307,5319,639
13,3141,SOUTH YARRA,-37.8333,144.983,11723,2092


In [11]:
#Attain info on dataframe
df_income.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9 entries, 0 to 13
Data columns (total 6 columns):
Postcode              9 non-null object
Suburb                9 non-null object
Latitude              9 non-null object
Longitude             9 non-null object
n20_34                9 non-null int64
nwomen_high_income    9 non-null int64
dtypes: int64(2), object(4)
memory usage: 504.0+ bytes


In [12]:
#Make columns as numeric for further analysis
df_income[["Postcode", "Latitude", "Longitude"]] = DEM_df[["Postcode", "Latitude", "Longitude"]].apply(pd.to_numeric)
df_income.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9 entries, 0 to 13
Data columns (total 6 columns):
Postcode              9 non-null int64
Suburb                9 non-null object
Latitude              9 non-null float64
Longitude             9 non-null float64
n20_34                9 non-null int64
nwomen_high_income    9 non-null int64
dtypes: float64(2), int64(3), object(1)
memory usage: 504.0+ bytes


# Part 2: Create a Cluster Map of Possible Location Sites #

A map of Melbourne will be created with the clusters as potential site locations. The map will be used in section 4.

In [13]:
#Attain Latitude and Longitude of Melbourne
address = 'Melbourne, Victoria, Australia'
geolocator = Nominatim(user_agent="MEL_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The latitude and longitide of Melbourne is {}, {}, respectively.'.format(latitude, longitude))

The latitude and longitide of Melbourne is -37.8142176, 144.9631608, respectively.


In [14]:
#Creat a map of Melbourne with a zoom of 11 
MEL_map = folium.Map(location=[-37.8142176, 144.9631608], zoom_start=11)

# display world map
MEL_map

In [15]:
#Create a map according to feature group (nwomen_high_income)
#Create feature group to graph
nwomen_high_income = folium.map.FeatureGroup()

# Add latitude and longitude to feature group
for lat, lng, in zip(df_income.Latitude, df_income.Longitude):
    nwomen_high_income.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=10, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add nwomen_high_income to map
MEL_map.add_child(nwomen_high_income)

In [16]:
# add pop-up text to each marker on the map

latitudes = list(df_income.Latitude)
longitudes = list(df_income.Longitude)
labels = list(df_income.Suburb)

for lat, lng, label in zip(latitudes, longitudes, labels):
    folium.Marker([lat, lng], popup=label).add_to(MEL_map)    
    
# add nwomen_high_income to map
MEL_map.add_child(nwomen_high_income)

# Part3: Attaining Venue Location Statistics #

This section involves attaining the venue location data of current successful anti-aging clinics. A google maps  query of 'anti-aging clinics' in Melbourne will be conducted and only those with a 5 star customer rating will be further analysed. Analysis will involve a Foraquare API search of most popular venues associated with the successful clinics. 

In [17]:
#Foursquare credentials - will be deleted prior to online publication
CLIENT_ID = 'QPENQ4U03JYRKTBKH3FSCP5W5EK3DWSAP0ZBV3NK4HRZQACO' # Foursquare ID
CLIENT_SECRET = 'EKZUCEE3YG5FM5WWSYMTXCMEOMVTQHJMMSUMEMFPL3DY4LMK' # Foursquare Secret
VERSION = '20180605' # API version

In [18]:
#Google search (anti-aging clinc five star) Latitude and longitude

google_search = {'Venue Number': [1, 2, 3, 4, 5], 'Postcode': [3149, 3127, 3144, 3126, 3123], 'Latitude': [-37.8850259, -37.8236321, -37.8522658, -37.8241399, -37.8240792], 'Longitude': [145.1298489, 145.0952924, 145.0345854
, 145.0788732, 145.0483116]}
df_google = pd.DataFrame(data=google_search)
df_google


Unnamed: 0,Venue Number,Postcode,Latitude,Longitude
0,1,3149,-37.885026,145.129849
1,2,3127,-37.823632,145.095292
2,3,3144,-37.852266,145.034585
3,4,3126,-37.82414,145.078873
4,5,3123,-37.824079,145.048312


In [19]:
#Get location data of venues

Venue_1_Latitude = df_google.loc[0, 'Latitude'] 
Venue_1_Longitude = df_google.loc[0, 'Longitude'] 

Venue_number = df_google.loc[0, 'Venue Number'] 

print('Latitude and longitude values of Venue Number {} are {} and {}, respectively.'.format(Venue_number, 
                                                               Venue_1_Latitude, 
                                                               Venue_1_Longitude))

Latitude and longitude values of Venue Number 1 are -37.8850259 and 145.1298489, respectively.


## List top 10 venue Categories within a 800 metre radius ##

This section involves using Foursquare API to search for the top 10 venue categories within a 800 metre radius of each successful clinic. This will be done by first selecting the top 100 venues, grouping them by postcode, performing a count, and then determining top 10 venues for each postcode.

In [20]:
#Top 100 venues associated with each successful clinic (venue number)
radius = 800
LIMIT = 100
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    Venue_1_Latitude, 
    Venue_1_Longitude, 
    radius,
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=QPENQ4U03JYRKTBKH3FSCP5W5EK3DWSAP0ZBV3NK4HRZQACO&client_secret=EKZUCEE3YG5FM5WWSYMTXCMEOMVTQHJMMSUMEMFPL3DY4LMK&v=20180605&ll=-37.8850259,145.1298489&radius=800&limit=100'

In [21]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ca4210cdb04f507f7833c19'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'totalResults': 8,
  'suggestedBounds': {'ne': {'lat': -37.8778258928, 'lng': 145.13895452323203},
   'sw': {'lat': -37.89222590720001, 'lng': 145.120743276768}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4beb65770acf76b0a9883cc8',
       'name': 'Subway',
       'location': {'address': '317 Stephensons Rd.',
        'crossStreet': 'btwn Virginia & Winbourne',
        'lat': -37.87793330883979,
        'lng': 145.12858022744646,
        'labeledLatLngs': [{'la

In [22]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [23]:
#Create dataframe

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Subway,Sandwich Place,-37.877933,145.12858
1,Marisa,Pizza Place,-37.882939,145.127364
2,Mount Waverley Fish & Chips,Fish & Chips Shop,-37.882794,145.12752
3,Woolworths,Supermarket,-37.878312,145.127951
4,Scotchmans Creek,Park,-37.884214,145.13639


In [24]:
#Get data on all postcodes/suburbs

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(
                CLIENT_ID, 
                CLIENT_SECRET, 
                lat, 
                lng, 
                VERSION, 
                radius, 
                LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postcode', 
                  'Postcode Latitude', 
                  'Postocde Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    return(nearby_venues)

In [25]:
google_venues = getNearbyVenues(names=df_google['Postcode'],
                                   latitudes=df_google['Latitude'],
                                   longitudes=df_google['Longitude']
                                  )

3149
3127
3144
3126
3123


In [26]:
print(google_venues.shape)
google_venues.head()

(45, 7)


Unnamed: 0,Postcode,Postcode Latitude,Postocde Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,3149,-37.885026,145.129849,Mt Waverley Shopping Centre,-37.884601,145.126464,Department Store
1,3149,-37.885026,145.129849,Marisa,-37.882939,145.127364,Pizza Place
2,3149,-37.885026,145.129849,Mount Waverley Fish & Chips,-37.882794,145.12752,Fish & Chips Shop
3,3149,-37.885026,145.129849,STA Travel,-37.885613,145.125904,Tourist Information Center
4,3127,-37.823632,145.095292,Burger Burger,-37.823732,145.097748,Burger Joint


In [27]:
# Create a count to determine the venue location with the most venues
google_venues.groupby('Postcode').count()

Unnamed: 0_level_0,Postcode Latitude,Postocde Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Postcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
3123,12,12,12,12,12,12
3126,13,13,13,13,13,13
3127,6,6,6,6,6,6
3144,10,10,10,10,10,10
3149,4,4,4,4,4,4


In [28]:
#How many unique categories are there within the potential location sites
print('There are {} uniques categories.'.format(len(google_venues['Venue Category'].unique())))

There are 28 uniques categories.


##### Analyze category data about each location ####

In [29]:
# one hot encoding
google_onehot = pd.get_dummies(google_venues[['Venue Category']], prefix="", prefix_sep="")

# add venue number column back to dataframe
google_onehot['Postcode'] = google_venues['Postcode'] 

# move neighborhood column to the first column
fixed_columns = [google_onehot.columns[-1]] + list(google_onehot.columns[:-1])
google_onehot = google_onehot[fixed_columns]
google_onehot.head()

Unnamed: 0,Postcode,Bakery,Bar,Beer Garden,Bookstore,Burger Joint,Café,Dance Studio,Deli / Bodega,Department Store,...,Pet Store,Pizza Place,Pub,Shopping Mall,Thai Restaurant,Tourist Information Center,Train Station,Vietnamese Restaurant,Wine Shop,Yoga Studio
0,3149,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
1,3149,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
2,3149,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,3149,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
4,3127,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [30]:
google_onehot.shape

(45, 29)

In [31]:
# Group category frequency via location
google_grouped = google_onehot.groupby('Postcode').sum().reset_index()
google_grouped.head(6)

Unnamed: 0,Postcode,Bakery,Bar,Beer Garden,Bookstore,Burger Joint,Café,Dance Studio,Deli / Bodega,Department Store,...,Pet Store,Pizza Place,Pub,Shopping Mall,Thai Restaurant,Tourist Information Center,Train Station,Vietnamese Restaurant,Wine Shop,Yoga Studio
0,3123,0,1,0,0,0,4,1,0,0,...,0,0,1,0,0,0,1,1,0,1
1,3126,1,0,0,1,0,4,0,1,0,...,0,0,0,1,1,0,1,0,0,0
2,3127,0,0,0,0,1,2,0,0,0,...,0,0,0,0,0,0,0,0,1,0
3,3144,0,0,1,0,0,4,0,0,0,...,1,0,1,0,1,0,0,0,0,0
4,3149,0,0,0,0,0,0,0,0,1,...,0,1,0,0,0,1,0,0,0,0


In [32]:
# Display top venues in each location site

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [33]:
#Pick number of venues
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Postcode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# construct a new dataframe
location_venues_sorted = pd.DataFrame(columns=columns)
location_venues_sorted['Postcode'] = google_grouped['Postcode']

for ind in np.arange(google_grouped.shape[0]):
    location_venues_sorted.iloc[ind, 1:] = return_most_common_venues(google_grouped.iloc[ind, :], num_top_venues)
location_venues_sorted

Unnamed: 0,Postcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,3123,Café,Yoga Studio,Pub,Bar,Dance Studio,Park,Gym,Vietnamese Restaurant,Train Station,Shopping Mall
1,3126,Café,Bakery,Bookstore,Grocery Store,Knitting Store,Deli / Bodega,Gourmet Shop,Shopping Mall,Thai Restaurant,Train Station
2,3127,Café,Wine Shop,Burger Joint,Food,Grocery Store,Yoga Studio,Bar,Beer Garden,Bookstore,Dance Studio
3,3144,Café,Beer Garden,Thai Restaurant,Pub,Pet Store,Liquor Store,Gym / Fitness Center,Yoga Studio,Food,Bar
4,3149,Tourist Information Center,Pizza Place,Department Store,Fish & Chips Shop,Yoga Studio,Grocery Store,Bar,Beer Garden,Bookstore,Burger Joint


As can be seen, the 1st most common venue for 4/5 successful clinics is a cafe within a 800 metre radius. Thus, as a prerequisite to site location, there must be a cafe within 500 metres of the site location. The next step will be to get the counts of each category. 

In [34]:
location_venues_sorted.stack().value_counts()

Bar                           4
Yoga Studio                   4
Café                          4
Beer Garden                   3
Grocery Store                 3
Bookstore                     3
Food                          2
Thai Restaurant               2
Pub                           2
Train Station                 2
Shopping Mall                 2
Burger Joint                  2
Dance Studio                  2
Knitting Store                1
Department Store              1
Gym / Fitness Center          1
3127                          1
3126                          1
Fish & Chips Shop             1
Deli / Bodega                 1
3144                          1
Pet Store                     1
3123                          1
Park                          1
Gym                           1
3149                          1
Wine Shop                     1
Pizza Place                   1
Gourmet Shop                  1
Liquor Store                  1
Tourist Information Center    1
Bakery  

Results show there are at least 4 Bars and Yoga studios located within a 500 metre radius in 4/5 locations. Also, bakery is present in 3/4 locations. These parameters will be used to select for site location. 

# Part 4: Finding the Best Location for the Clinic #

This next section involves searching for an ideal location for the clinic. This will be done by adding parameters from the previous section to search query. The suburb/postcode with the most associated venues in the parameter search will be the suburb/postcode for site location. The parameters are location site must be within 500 metres of a cafe, yoga studio, and bakery. The location must also be within 800 metres of a train station.

In [35]:
#Dataframe for site location
df_income

Unnamed: 0,Postcode,Suburb,Latitude,Longitude,n20_34,nwomen_high_income
0,3142,"HAWKSBURN, TOORAK",-37.8416,145.016,2997,447
2,3186,"BRIGHTON, BRIGHTON NORTH, DENDY",-37.9056,145.008,3196,427
3,3126,"CAMBERWELL EAST, CANTERBURY",-37.8247,145.078,1272,128
4,3206,"ALBERT PARK, MIDDLE PARK",-37.8333,144.981,1663,298
5,3144,"KOOYONG, MALVERN, MALVERN NORTH",-37.8399,145.032,2139,294
6,3002,EAST MELBOURNE,-38.4739,145.019,1750,371
8,3143,"ARMADALE, ARMADALE NORTH",-37.8064,145.031,2545,387
9,3101,"COTHAM, KEW",-37.5782,145.307,5319,639
13,3141,SOUTH YARRA,-37.8333,144.983,11723,2092


In [36]:
#Explore 1st Postcode
df_income.loc[0, 'Postcode']

3142

In [37]:
#Explore 1st Postcode
P_3142_latitude = df_income.loc[0, 'Latitude'] # neighborhood latitude value
P_3142_longitude = df_income.loc[0, 'Longitude'] # neighborhood longitude value

Postcode = df_income.loc[0, 'Postcode'] # neighborhood postcode

print('Latitude and longitude values of {} are {}, {}.'.format(Postcode, 
                                                               P_3142_latitude, 
                                                               P_3142_longitude))

Latitude and longitude values of 3142 are -37.8416, 145.016.


In [38]:
# Get top 100 venues within 500 metre radius

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 800 # define radius
# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    P_3142_latitude, 
    P_3142_longitude, 
    radius, 
    LIMIT)
url # display URL



'https://api.foursquare.com/v2/venues/explore?&client_id=QPENQ4U03JYRKTBKH3FSCP5W5EK3DWSAP0ZBV3NK4HRZQACO&client_secret=EKZUCEE3YG5FM5WWSYMTXCMEOMVTQHJMMSUMEMFPL3DY4LMK&v=20180605&ll=-37.8416,145.016&radius=800&limit=100'

In [39]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ca4210edd57977ce67d95c8'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'totalResults': 18,
  'suggestedBounds': {'ne': {'lat': -37.834399992799995,
    'lng': 145.02510025932872},
   'sw': {'lat': -37.848800007200005, 'lng': 145.00689974067126}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '518f104e498e2be9551d4239',
       'name': 'Monkey Bean',
       'location': {'address': '475 Toorak Rd',
        'lat': -37.841024306995365,
        'lng': 145.008769727197,
        'labeledLatLngs': [{'label': 'display',
          'lat': -37.841

In [40]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [41]:
#Create dataframe

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Monkey Bean,Café,-37.841024,145.00877
1,Toorak Foodstore,Deli / Bodega,-37.840917,145.009481
2,Townhouse,Café,-37.841208,145.010046
3,Trak Lounge,Rock Club,-37.840854,145.007779
4,Zanuba Bar & Restaurant,Café,-37.8405,145.01021


In [42]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

18 venues were returned by Foursquare.


In [43]:
#Repeat for all postcodes/suburbs

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postcode',        
                  'Postcode Latitude', 
                  'Postcode Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [44]:
# type your answer here

melbourne_venues = getNearbyVenues(names=df_income['Postcode'],
                                   latitudes=df_income['Latitude'],
                                   longitudes=df_income['Longitude']
                                  )



3142
3186
3126
3206
3144
3002
3143
3101
3141


In [45]:
print(melbourne_venues.shape)
melbourne_venues.head()

(113, 7)


Unnamed: 0,Postcode,Postcode Latitude,Postcode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,3142,-37.8416,145.016,Woolworths,-37.842029,145.017163,Supermarket
1,3142,-37.8416,145.016,Safeway,-37.841784,145.014682,Grocery Store
2,3142,-37.8416,145.016,Amcal Chemist,-37.843432,145.020528,Pharmacy
3,3142,-37.8416,145.016,Toorak Village Pizza,-37.841013,145.010805,Pizza Place
4,3186,-37.9056,145.008,JB Hi-Fi,-37.907418,145.009877,Electronics Store


In [46]:
melbourne_venues.groupby('Postcode').count()

Unnamed: 0_level_0,Postcode Latitude,Postcode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Postcode,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
3002,9,9,9,9,9,9
3126,13,13,13,13,13,13
3141,22,22,22,22,22,22
3142,4,4,4,4,4,4
3143,25,25,25,25,25,25
3144,8,8,8,8,8,8
3186,18,18,18,18,18,18
3206,14,14,14,14,14,14


In [47]:
print('There are {} uniques categories.'.format(len(melbourne_venues['Venue Category'].unique())))

There are 51 uniques categories.


In [48]:
# one hot encoding
melbourne_onehot = pd.get_dummies(melbourne_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
melbourne_onehot['Postcode'] = melbourne_venues['Postcode'] 

# move neighborhood column to the first column
fixed_columns = [melbourne_onehot.columns[-1]] + list(melbourne_onehot.columns[:-1])
melbourne_onehot = melbourne_onehot[fixed_columns]

melbourne_onehot.head()

Unnamed: 0,Postcode,Athletics & Sports,Australian Restaurant,Bakery,Baseball Field,Bistro,Bookstore,Botanical Garden,Café,Chinese Restaurant,...,Supermarket,Taco Place,Tea Room,Tennis Stadium,Thai Restaurant,Theater,Trail,Train Station,Video Store,Wine Shop
0,3142,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
1,3142,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,3142,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,3142,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,3186,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [49]:
melbourne_onehot.shape

(113, 52)

Select for Cafe, Bakery, yoga studio and restaurant

In [50]:
melbourne_grouped = melbourne_onehot.groupby('Postcode').sum().reset_index()
melbourne_grouped.head(10)

Unnamed: 0,Postcode,Athletics & Sports,Australian Restaurant,Bakery,Baseball Field,Bistro,Bookstore,Botanical Garden,Café,Chinese Restaurant,...,Supermarket,Taco Place,Tea Room,Tennis Stadium,Thai Restaurant,Theater,Trail,Train Station,Video Store,Wine Shop
0,3002,0,1,1,0,0,0,0,3,0,...,0,0,0,0,0,0,0,0,0,0
1,3126,0,0,1,0,0,1,0,4,0,...,0,0,0,0,1,0,0,1,0,0
2,3141,0,2,1,0,1,0,2,1,0,...,0,0,1,0,1,1,1,0,0,0
3,3142,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
4,3143,0,0,0,0,0,1,0,6,1,...,1,0,0,0,1,0,0,0,0,0
5,3144,1,0,0,1,0,0,0,0,0,...,0,0,0,1,0,0,0,1,0,1
6,3186,0,0,0,0,0,0,0,3,0,...,1,1,0,0,0,0,0,1,1,0
7,3206,0,2,0,0,1,0,1,1,0,...,0,0,1,0,1,0,0,0,0,0


In [51]:
print(melbourne_grouped.shape)

(8, 52)


In [52]:
list(melbourne_grouped)

['Postcode',
 'Athletics & Sports',
 'Australian Restaurant',
 'Bakery',
 'Baseball Field',
 'Bistro',
 'Bookstore',
 'Botanical Garden',
 'Café',
 'Chinese Restaurant',
 'Coffee Shop',
 'Convenience Store',
 'Costume Shop',
 'Deli / Bodega',
 'Dessert Shop',
 'Electronics Store',
 'Fast Food Restaurant',
 'French Restaurant',
 'Frozen Yogurt Shop',
 'Garden',
 'Gourmet Shop',
 'Greek Restaurant',
 'Grocery Store',
 'Gym / Fitness Center',
 'Hotel',
 'Indian Restaurant',
 'Italian Restaurant',
 'Japanese Restaurant',
 'Knitting Store',
 'Korean Restaurant',
 'Lake',
 'Light Rail Station',
 'Liquor Store',
 'Movie Theater',
 'Park',
 'Pharmacy',
 'Pizza Place',
 'Restaurant',
 'Sandwich Place',
 'Seafood Restaurant',
 'Shopping Mall',
 'Steakhouse',
 'Supermarket',
 'Taco Place',
 'Tea Room',
 'Tennis Stadium',
 'Thai Restaurant',
 'Theater',
 'Trail',
 'Train Station',
 'Video Store',
 'Wine Shop']

Given the fact there were no Yoga Studios in the query, gym/fitness centre was substituted for the query as most gyms/fitness centres have yoga classes/groups. 

In [53]:
final_df = melbourne_grouped[['Postcode', 'Café', 'Bookstore', 'Train Station', 'Gym / Fitness Center', 'Grocery Store']]

In [54]:
final_df

Unnamed: 0,Postcode,Café,Bookstore,Train Station,Gym / Fitness Center,Grocery Store
0,3002,3,0,0,0,1
1,3126,4,1,1,0,1
2,3141,1,0,0,0,1
3,3142,0,0,0,0,1
4,3143,6,1,0,0,2
5,3144,0,0,1,0,0
6,3186,3,0,1,1,0
7,3206,1,0,0,0,0


In [55]:
location_site = pd.merge(final_df, df_income, how='inner', on=None, left_on=None, right_on=None,
         left_index=False, right_index=False, sort=True,
         suffixes=('_x', '_y'), copy=True, indicator=False,
         validate=None)

In [56]:
location_site.drop(['n20_34'], axis=1)

Unnamed: 0,Postcode,Café,Bookstore,Train Station,Gym / Fitness Center,Grocery Store,Suburb,Latitude,Longitude,nwomen_high_income
0,3002,3,0,0,0,1,EAST MELBOURNE,-38.4739,145.019,371
1,3126,4,1,1,0,1,"CAMBERWELL EAST, CANTERBURY",-37.8247,145.078,128
2,3141,1,0,0,0,1,SOUTH YARRA,-37.8333,144.983,2092
3,3142,0,0,0,0,1,"HAWKSBURN, TOORAK",-37.8416,145.016,447
4,3143,6,1,0,0,2,"ARMADALE, ARMADALE NORTH",-37.8064,145.031,387
5,3144,0,0,1,0,0,"KOOYONG, MALVERN, MALVERN NORTH",-37.8399,145.032,294
6,3186,3,0,1,1,0,"BRIGHTON, BRIGHTON NORTH, DENDY",-37.9056,145.008,427
7,3206,1,0,0,0,0,"ALBERT PARK, MIDDLE PARK",-37.8333,144.981,298


In [57]:
#Drop latitude, longitude and postcodes without train stations
final_location_site = location_site.drop(['Latitude', 'Longitude', 'Train Station', 'n20_34'], axis=1)
final_location_site = final_location_site.drop(final_location_site.index[[0, 2, 3, 4, 7]])
final_location_site

Unnamed: 0,Postcode,Café,Bookstore,Gym / Fitness Center,Grocery Store,Suburb,nwomen_high_income
1,3126,4,1,0,1,"CAMBERWELL EAST, CANTERBURY",128
5,3144,0,0,0,0,"KOOYONG, MALVERN, MALVERN NORTH",294
6,3186,3,0,1,0,"BRIGHTON, BRIGHTON NORTH, DENDY",427


# Results/Discussion #

Results show a total of 15 postcodes qualified for high-income suburbs based on data provided by the Australian Taxation Office. Location data results showed a total of five 5-star anti-aging clinics were available for further analysis. Camberwell East and Canterbury were the best location site for the new clinic. The downside to this is the rather low number of young, professional women compared to other postcodes/suburbs. Brighton, Brighton North and Dendy (3186) are ranked in second place for location site as they have a total of 5 associated venues, but they do have a rather large population of young, professional women compared to Camberwell East and Canterbury. 
Yoga studios were associated with 4/5 successful clinics; however query results returned no yoga studios when searching for ideal location. This was later changed to gym/fitness centre as most gyms/fitness centres contain yoga classes/groups. Only 3186 returned a positive result for gym/fitness centre. The next step would be for the investors to experience firsthand the 3186 and 3126 business districts to find an ideal location site.


# Conclusion #

The purpose of this study was to select a site location for a group of investors specific to their clientele. The location site had to be within 800 metres of a train station and had to have associated venues with current successful clinics. An associated venue search for current successful clinics found cafe, bakery and yoga studios to be most associated with these clinics. Camberwell East and Canterbury were the suburbs with the most associated venues from the successful clinics. A downside to Camberwell East and Canterbury is the number of young, professional women; the lowest of all suburbs.
Limitations of this study include geographic isolation. These results are based in Melbourne, Australia and it is hypothesized different search results will be produced for different geographic locations. Also, these search results are not certain, given sporadic opening and closing of venues. If a site location were to be chosen based on associated venues, these venues could change location/venue anytime.


