# Alcohol and Crime in Neighbourhoods in the Australian Capital Territory
## Part 1 – Background and Problem
### Discussion of Background
Over the last several years in Australia, there have been strong concerns about alcohol and violent crime. The Australian state of New South Wales introduced laws restricting trading hours for businesses supplying alcohol to the public, such as bottle shops and bars. Since the introduction of the laws, there has been a vigorous debate about whether there is an association between the supply of alcohol and violent crime. Many businesses have argued that there is not and want the laws repealed. The Australian Capital Territory currently does not have New South Wales-style alcohol control laws. This study aims to inform policy and policing decision-making in the Australian Capital Territory so that those decisions have a data-driven basis. The target audience is the Australian Capital Territory Government.

### Description of Problem
To the author's knowledge, no research has examined alcohol and violent crime in the Australian Capital Territory. To address this gap, the present study examines whether neighbourhoods with more venues that supply alcohol have higher levels of violent crime than neighbourhoods with fewer of these venues. This research could inform policing decisions to prevent and address violent crime. The problem is important to solve so that decisions on alcohol and crime control are objectively informed.

## Part 2 – Dataset and Solution
### Overview of Dataset and Solution
The study will use crime data from the Australian Federal Police, neighbourhood population estimates from the Australian Capital Territory Government and venue data from Foursquare. OPTICS, a density-based clustering algorithm related to DBSCAN that handles clusters of varying density, will be applied to both the crime per capita figures and the venue data to determine whether violent crime and alcohol venues show similar clustering. A brute-force comparison will also be used to test whether suburbs that share a cluster in one dataset also share a cluster in the other.

### Description of Crime Data
The Australian Federal Police publish crime data for the Australian Capital Territory at https://www.data.act.gov.au/Justice-Safety-and-Emergency/ACT-Crime-Statistics/2egm-dieb, including features such as:
* counts of types of crime (such as assault, homicide and sexual assault)
* a breakdown of each quarter (such as Q2 Apr-June 2019)
* a breakdown by neighbourhood (Amaroo, Lyneham etc.).

The Python3 geopy.geocoders package will be used to add longitude and latitude values to the above dataset. Only violent crime will be examined. Canberra neighbourhood population estimates will also be used to transform the data to per capita crime rates.

This data has been manually manipulated in Excel to normalise its layout and uploaded again to https://raw.githubusercontent.com/caseyj2/IBM-Data-Science-Capstone/master/Crime%20in%20the%20Australian%20Capital%20Territory%2C%201%20July%202018-30%20June%202019.csv.

The investigation uses crime statistics for the Australian fiscal year that ended 30 June 2019.

### Description of Foursquare Data
Foursquare contains venue data for the Australian Capital Territory. The study will use this data to extract the details of venues that supply alcohol (bars, bottle shops etc.). Examples of the features used will be:
* venue name
* longitude
* latitude.

### Description of Neighbourhood Population Estimates
Population estimates are available from the Australian Capital Territory Government via the API endpoint http://www.data.act.gov.au/kci6-ugxa. The dataset's features are population estimates by:
* neighbourhood
* sex
* age.

For this project, the estimates will be aggregated by neighbourhood only, because the study is not examining variation by sex or age.
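The aggregation described above can be sketched with pandas as follows. This is a minimal illustration on invented rows: the column names (`suburb`, `male_age_0` and so on) follow the pattern used in the population query in this notebook, but the values and the reduced set of age columns are assumptions made purely for demonstration.

```python
import pandas as pd

# Toy rows with invented values; the real dataset has one column per sex
# and single year of age.
toy_population = pd.DataFrame({
    "suburb": ["Acton", "Acton", "Ainslie"],
    "male_age_0": [1, 2, 3],
    "male_age_1": [4, 5, 6],
    "female_age_0": [7, 8, 9],
    "female_age_1": [10, 11, 12],
})

# Sum every age/sex column into a single population figure per row.
age_columns = [c for c in toy_population.columns if "_age_" in c]
toy_population["Population"] = toy_population[age_columns].sum(axis=1)

# Collapse to one row per neighbourhood, discarding the sex and age breakdown.
population_by_neighbourhood = (
    toy_population.assign(Neighbourhood=toy_population["suburb"].str.upper())
    .groupby("Neighbourhood", as_index=False)["Population"]
    .sum()
)
print(population_by_neighbourhood)
```

The notebook itself performs the summation on the server side through the query, so this sketch only shows the equivalent client-side operation.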

# Part 3 - Methodology

## Inferential Statistics Used
This study did not use any inferential statistics, because the available data were population-level counts or estimates rather than samples. Inferential statistics (t-tests, ANOVAs etc.) rest on assumptions, such as random sampling and the approximate normality of sample means that the central limit theorem provides. The data sources could not be validated against these assumptions.

## Machine Learning Technique Used and Why
The study used the unsupervised machine learning technique called OPTICS clustering. This is similar to DBSCAN, but examines the local density of the data so that it can accommodate datasets with clusters of varying density. Reasons for this approach included:
* upon manual inspection, the data sources showed features with clusters of varying densities
* the venue dataset from Foursquare was high-dimensional, and using OPTICS reduced the need for hyperparameter tuning (such as choosing the number of clusters in k-means or the search radius in DBSCAN)
* OPTICS is available in scikit-learn and is a well-developed, reliable algorithm.
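To illustrate the varying-density property mentioned above, here is a minimal sketch that runs scikit-learn's `OPTICS` on two synthetic blobs, one tight and one diffuse. The data, the random seed and the `min_samples=5` setting are assumptions chosen for the illustration and are not taken from the study.

```python
import numpy as np
from sklearn.cluster import OPTICS

# Two synthetic groups of points with very different densities.
rng = np.random.RandomState(0)
dense = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(40, 2))
sparse = rng.normal(loc=(5.0, 5.0), scale=0.8, size=(40, 2))
points = np.vstack([dense, sparse])

# Fit OPTICS; no cluster count or fixed search radius is supplied.
labels = OPTICS(min_samples=5).fit(points).labels_

# labels holds one integer per point; -1 marks points treated as noise.
print(sorted(set(labels)))
```

The exact labels depend on the data and on parameters such as `min_samples`, so the output is best inspected rather than assumed.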

## Exploratory Data Analysis
### Overview
OPTICS clustering was used on per capita crime data and venue data from Foursquare to extract latent feature clustering in the datasets. Initially, K-means was used; however, the results were not meaningful, so the analysis changed to OPTICS because it can handle variable-density data with outliers. Python code was developed to check whether there were common patterns of clustering in the two datasets. No common patterns were found.
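One way to check for common clustering patterns is to compare which pairs of neighbourhoods share a cluster in each dataset. The sketch below is a hedged illustration of that idea, not the study's actual comparison code: the helper `co_clustered_pairs` is hypothetical, and the labels and placeholder names A to D are invented.

```python
from itertools import combinations

def co_clustered_pairs(labels_by_name):
    """Return the set of name pairs that share a non-noise cluster label."""
    pairs = set()
    for a, b in combinations(sorted(labels_by_name), 2):
        # -1 is the noise label, so two noise points do not count as a pair.
        if labels_by_name[a] == labels_by_name[b] != -1:
            pairs.add((a, b))
    return pairs

# Invented cluster labels for four placeholder neighbourhoods.
crime_labels = {"A": 0, "B": 0, "C": 1, "D": -1}
venue_labels = {"A": 2, "B": 2, "C": 2, "D": -1}

# Pairs that are clustered together in both datasets.
common = co_clustered_pairs(crime_labels) & co_clustered_pairs(venue_labels)
print(common)
```

Comparing pairs rather than raw labels avoids the problem that cluster numbers are arbitrary and need not match between the two clusterings.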

### Packages Used
The analysis used the five nonstandard packages below (NumPy is also imported and is installed as a pandas dependency). Standard Python 3.7 objects were also used (such as sleep etc.). Versions were pinned to avoid compatibility issues on future re-runs. A description of each package is below.
* pandas (Version 0.25.3) - pandas provided data analysis and wrangling features.
* sodapy (Version 2.0.0) - sodapy allowed interfacing more simply with the Australian Capital Territory Government's data endpoints.
* scikit-learn (Version 0.22) - scikit-learn provided OPTICS clustering and supporting functions.
* foursquare (Version 1!2019.9.11) - foursquare provided venue data.
* geopy (Version 1.20.0) - geopy allowed attributing neighbourhoods to geocodes and vice-versa.

In [1]:
%pip install pandas==0.25.3 sodapy==2.0.0 scikit-learn==0.22 foursquare==1!2019.9.11 geopy==1.20.0

Note: you may need to restart the kernel to use updated packages.


In [2]:
#import packages
import foursquare
import geopy
import numpy as np
import pandas as pd
from os import getenv
from sodapy import Socrata
from sklearn.cluster import OPTICS
from time import sleep
from geopy.geocoders import Nominatim
from sklearn.preprocessing import normalize

### Wrangle Crime Data
For the crime data, a multi-step wrangling process was used. As published, the data was not suitable for use in pandas because of its unorthodox format. To fix this, I manually re-formatted the data in Excel into a usable layout. Then, I used pandas to import the data and translate the neighbourhood names to the same set of identifiers as in the population dataset. Neighbourhood population estimates were then used to transform the data to a per capita basis.
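Before writing the name translations, it helps to see which neighbourhood names fail to match between the two datasets. The following is a minimal sketch of one way to do that with an outer merge and pandas' `indicator` option. The toy frames and their counts are invented for illustration, and this check is an assumption about how mismatches could be found rather than a step taken from the original analysis.

```python
import pandas as pd

# Toy stand-ins for the crime and population tables (values invented).
crime = pd.DataFrame({
    "Neighbourhood": ["ACTON", "KINGSTON", "BARTON"],
    "Assaults": [1, 2, 3],
})
population = pd.DataFrame({
    "Neighbourhood": ["ACTON", "KINGSTON-BARTON"],
    "Population": [100, 200],
})

# An outer merge keeps every name; the _merge column records its origin.
check = crime.merge(population, on="Neighbourhood", how="outer", indicator=True)

# Names present in only one table need translating before the real merge.
unmatched = check.loc[check["_merge"] != "both", ["Neighbourhood", "_merge"]]
print(unmatched)
```

Names flagged as `left_only` or `right_only` are the candidates for the kind of renaming carried out in the cells below.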

In [3]:
#Read crime data from csv file
raw_crime_data = pd.read_csv('https://raw.githubusercontent.com/caseyj2/IBM-Data-Science-Capstone/master/Crime%20in%20the%20Australian%20Capital%20Territory%2C%201%20July%202018-30%20June%202019.csv')

In [4]:
#Filter for violent crime types
aggregate_crime_data = raw_crime_data[['Neighbourhood','1 Homicide','2a Assault - FV','2b Assault - Non-FV','3 Sexual Assault','4 Other offences against a person','5a Robbery - armed','5b Robbery - other']].copy()

#Convert Neighbourhood column to string and remove leading and trailing whitespaces
aggregate_crime_data.Neighbourhood=aggregate_crime_data.Neighbourhood.apply(str).apply(str.strip)

#Normalise Neighbourhood names to suburb_populations
crime_neighbourhood_names = {
    'KINGSTON': 'KINGSTON-BARTON',
    'BARTON': 'KINGSTON-BARTON',
    'OAKS ESTATE': 'KOWEN',
    'BEARD': 'KOWEN',
    'FYSHWICK': 'ACT EAST',
    'COOMBS': 'ACT SOUTH WEST',
    'DENMAN PROSPECT': 'ACT SOUTH WEST',
    'WRIGHT': 'ACT SOUTH WEST',
    'MONCRIEFF': 'GUNGAHLIN',
    'SYMONSTON': 'KOWEN',
    'PIALLIGO': 'KOWEN',
    'KENNY': 'GUNGAHLIN',
    'BLACK MOUNTAIN': 'BELCONNEN',
    'CAPITAL HILL': 'KINGSTON-BARTON',
    'DUNTROON': 'CAMPBELL',
    'HARMAN': 'KOWEN',
    'JACKA': 'GUNGAHLIN',
    'KINLYSIDE': 'KOWEN',
    'RUSSELL': 'CAMPBELL',
    'STROMLO': 'ACT SOUTH WEST',
    'TAYLOR': 'HACKETT',
    'THARWA': 'BANKS',
    'THROSBY': 'GUNGAHLIN',
    'URIARRA': 'ACT SOUTH WEST',
    'WILLIAMSDALE': 'BANKS',
}

#Replace each listed name with its mapped name; other names are left unchanged
aggregate_crime_data['Neighbourhood']=aggregate_crime_data['Neighbourhood'].replace(crime_neighbourhood_names)

#Aggregate rows that now share a Neighbourhood name
aggregate_crime_data=aggregate_crime_data.groupby('Neighbourhood').sum()

#Reset index
aggregate_crime_data=aggregate_crime_data.reset_index()

#Print aggregate_crime_data
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(aggregate_crime_data)

Unnamed: 0,Neighbourhood,1 Homicide,2a Assault - FV,2b Assault - Non-FV,3 Sexual Assault,4 Other offences against a person,5a Robbery - armed,5b Robbery - other
0,ACT EAST,1,0,29,2,3,4,1
1,ACT SOUTH WEST,0,22,12,2,5,1,0
2,ACTON,0,5,9,5,1,0,1
3,AINSLIE,0,16,12,26,2,2,0
4,AMAROO,0,16,10,8,2,1,0
5,ARANDA,0,3,6,0,1,0,0
6,BANKS,0,17,6,2,5,1,1
7,BELCONNEN,0,36,115,52,12,13,6
8,BONNER,0,19,0,3,1,0,0
9,BONYTHON,0,12,8,3,3,0,0


In [5]:
#Construct a Socrata client for the Australian Capital Territory Government's population by suburb estimates. An app token must be available to the process in environment variable POPULATION_ACT_ID.
data_act_client = Socrata("www.data.act.gov.au", app_token=getenv('POPULATION_ACT_ID'))

#Build the sum of every single-year-of-age column (ages 0 to 85) for both sexes
age_columns = [f'{sex}_age_{age}' for sex in ('male', 'female') for age in range(86)]
population_query = 'UPPER(suburb) AS Neighbourhood,' + '+'.join(age_columns) + ' AS Population'

#Query the population estimates dataset
suburb_populations = data_act_client.get("kci6-ugxa", select=population_query, where='year=\'2019-06-30T00:00:00.000\'', content_type='csv')

#Discard Socrata query object
del data_act_client

#Convert results into Pandas dataframe
suburb_populations=pd.DataFrame.from_records(suburb_populations[1::], columns=suburb_populations[0])

#Convert Neighbourhood column to string and remove leading and trailing whitespaces
suburb_populations.Neighbourhood=suburb_populations.Neighbourhood.apply(str).apply(str.strip)

#Convert Population column to integer
suburb_populations.Population=suburb_populations.Population.apply(int)

#Normalise Neighbourhood names to crime_data
population_neighbourhood_names = {
    'CIVIC': 'CITY',
    'LAKE BURLEY GRIFFIN': 'CITY',
    'GUNGAHLIN EAST': 'GUNGAHLIN',
    'GUNGAHLIN TC': 'GUNGAHLIN',
    'GUNGAHLIN WEST': 'GUNGAHLIN',
    'MOUNT TAYLOR': 'WESTON',
    'TUGGERANONG': 'ISABELLA PLAINS',
    'NAMADGI': 'ACT SOUTH WEST',
    'GOOROMON': 'DUNLOP',
}

#Replace each listed name with its mapped name; other names are left unchanged
suburb_populations['Neighbourhood']=suburb_populations['Neighbourhood'].replace(population_neighbourhood_names)

#Aggregate rows that now share a Neighbourhood name
suburb_populations=suburb_populations.groupby('Neighbourhood').sum()

#Reset index
suburb_populations=suburb_populations.reset_index()

#Print suburb_populations
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(suburb_populations)

Unnamed: 0,Neighbourhood,Population
0,ACT EAST,865
1,ACT SOUTH WEST,9530
2,ACTON,2431
3,AINSLIE,5346
4,AMAROO,6042
5,ARANDA,2505
6,BANKS,4817
7,BELCONNEN,6773
8,BONNER,5875
9,BONYTHON,3600


In [6]:
#Place populations and crime statistics into one dataframe
collated_crime_data=suburb_populations.merge(aggregate_crime_data, on='Neighbourhood')

#Remove records of zero population to avoid division by zero errors later
collated_crime_data=collated_crime_data[collated_crime_data.Population > 0]

#Print collated_crime_data
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(collated_crime_data)

Unnamed: 0,Neighbourhood,Population,1 Homicide,2a Assault - FV,2b Assault - Non-FV,3 Sexual Assault,4 Other offences against a person,5a Robbery - armed,5b Robbery - other
0,ACT EAST,865,1,0,29,2,3,4,1
1,ACT SOUTH WEST,9530,0,22,12,2,5,1,0
2,ACTON,2431,0,5,9,5,1,0,1
3,AINSLIE,5346,0,16,12,26,2,2,0
4,AMAROO,6042,0,16,10,8,2,1,0
5,ARANDA,2505,0,3,6,0,1,0,0
6,BANKS,4817,0,17,6,2,5,1,1
7,BELCONNEN,6773,0,36,115,52,12,13,6
8,BONNER,5875,0,19,0,3,1,0,0
9,BONYTHON,3600,0,12,8,3,3,0,0


In [7]:
#Free unnecessary values
del aggregate_crime_data

In [8]:
#Omit Neighbourhood and population and divide by respective population values
crime_per_capita = collated_crime_data.loc[:,"1 Homicide":].div(collated_crime_data.Population, axis=0)

#Reinsert Neighbourhood
crime_per_capita = collated_crime_data[['Neighbourhood']].join(crime_per_capita)

#Print crime_per_capita
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(crime_per_capita)

Unnamed: 0,Neighbourhood,1 Homicide,2a Assault - FV,2b Assault - Non-FV,3 Sexual Assault,4 Other offences against a person,5a Robbery - armed,5b Robbery - other
0,ACT EAST,0.001156,0.0,0.033526,0.002312,0.003468,0.004624,0.001156
1,ACT SOUTH WEST,0.0,0.002308,0.001259,0.00021,0.000525,0.000105,0.0
2,ACTON,0.0,0.002057,0.003702,0.002057,0.000411,0.0,0.000411
3,AINSLIE,0.0,0.002993,0.002245,0.004863,0.000374,0.000374,0.0
4,AMAROO,0.0,0.002648,0.001655,0.001324,0.000331,0.000166,0.0
5,ARANDA,0.0,0.001198,0.002395,0.0,0.000399,0.0,0.0
6,BANKS,0.0,0.003529,0.001246,0.000415,0.001038,0.000208,0.000208
7,BELCONNEN,0.0,0.005315,0.016979,0.007678,0.001772,0.001919,0.000886
8,BONNER,0.0,0.003234,0.0,0.000511,0.00017,0.0,0.0
9,BONYTHON,0.0,0.003333,0.002222,0.000833,0.000833,0.0,0.0


In [9]:
#Free unnecessary variable
del collated_crime_data

### OPTICS Clustering of Crime Per Capita Data
After transforming the data to a per capita basis, OPTICS clustering was used to group the data into clusters. Note that the label -1 marks outliers (noise points) and is not a cluster.
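The clustering cell passes the per capita figures through scikit-learn's `normalize` before fitting. With its default arguments, `normalize` rescales each row to unit L2 norm, so neighbourhoods end up being compared on their mix of offence types rather than on their overall crime level. The sketch below shows this effect on two invented rows; the numbers are assumptions for illustration only.

```python
import numpy as np
from sklearn.preprocessing import normalize

# Two invented rows of per capita rates with the same offence mix,
# the second at ten times the overall level.
rates = np.array([
    [0.002, 0.004],
    [0.020, 0.040],
])

# Default behaviour: norm='l2', axis=1, i.e. each row is scaled to length 1.
unit_rows = normalize(rates)
print(unit_rows)

# Both rows become identical after row-wise normalisation.
print(np.allclose(unit_rows[0], unit_rows[1]))
```

If overall crime level should influence the clusters, a column-wise scaler would be a different design choice from the one used here.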

In [10]:
#Run OPTICS clustering
clusters = OPTICS().fit(normalize(crime_per_capita.loc[:,"1 Homicide":]))

#Insert cluster labels into crime_per_capita dataframe
crime_per_capita.insert(0, 'Cluster', clusters.labels_)

#Print crime_per_capita with cluster labels
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(crime_per_capita)

#Drop redundant variable
del clusters

Unnamed: 0,Cluster,Neighbourhood,1 Homicide,2a Assault - FV,2b Assault - Non-FV,3 Sexual Assault,4 Other offences against a person,5a Robbery - armed,5b Robbery - other
0,-1,ACT EAST,0.001156,0.0,0.033526,0.002312,0.003468,0.004624,0.001156
1,2,ACT SOUTH WEST,0.0,0.002308,0.001259,0.00021,0.000525,0.000105,0.0
2,3,ACTON,0.0,0.002057,0.003702,0.002057,0.000411,0.0,0.000411
3,-1,AINSLIE,0.0,0.002993,0.002245,0.004863,0.000374,0.000374,0.0
4,-1,AMAROO,0.0,0.002648,0.001655,0.001324,0.000331,0.000166,0.0
5,-1,ARANDA,0.0,0.001198,0.002395,0.0,0.000399,0.0,0.0
6,-1,BANKS,0.0,0.003529,0.001246,0.000415,0.001038,0.000208,0.000208
7,1,BELCONNEN,0.0,0.005315,0.016979,0.007678,0.001772,0.001919,0.000886
8,-1,BONNER,0.0,0.003234,0.0,0.000511,0.00017,0.0,0.0
9,2,BONYTHON,0.0,0.003333,0.002222,0.000833,0.000833,0.0,0.0


In [11]:
#Summarise clusters
crime_summary_by_cluster=crime_per_capita[["Cluster","Neighbourhood"]].groupby('Cluster').agg(lambda x: ', '.join(x)).join(crime_per_capita.groupby(['Cluster']).mean())
crime_summary_by_cluster.insert(loc=crime_summary_by_cluster.shape[1], column='Total', value=crime_summary_by_cluster.sum(axis=1))
crime_summary_by_cluster.sort_values(by='Cluster',inplace=True)
display(crime_summary_by_cluster)

Unnamed: 0_level_0,Neighbourhood,1 Homicide,2a Assault - FV,2b Assault - Non-FV,3 Sexual Assault,4 Other offences against a person,5a Robbery - armed,5b Robbery - other,Total
Cluster,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
-1,"ACT EAST, AINSLIE, AMAROO, ARANDA, BANKS, BONN...",2.5e-05,0.01043,0.022199,0.001367,0.00127,0.000592,0.003618,0.039501
0,"CITY, GARRAN, HUME, MOLONGLO, PHILLIP",0.000741,0.003128,0.051387,0.003217,0.001861,0.000369,0.001306,0.06201
1,"BELCONNEN, BRUCE, NICHOLLS, WARAMANGA, WESTON,...",0.0,0.001783,0.005068,0.001946,0.000765,0.000345,0.000148,0.010054
2,"ACT SOUTH WEST, BONYTHON, CASEY, GIRALANG, ISA...",0.0,0.002824,0.001581,0.000557,0.000567,4.2e-05,3.8e-05,0.005609
3,"ACTON, KALEEN, RED HILL, TURNER, WATSON",0.0,0.001309,0.002277,0.001428,0.000407,7.6e-05,0.000129,0.005626


In [12]:
canberra_neighbourhoods = ['ACT EAST','ACT SOUTH WEST','ACTON','AINSLIE','AMAROO','ARANDA','BANKS','BARTON','BEARD','BELCONNEN','BLACK MOUNTAIN','BONNER','BONYTHON','BRADDON','BRUCE','CALWELL','CAMPBELL','CAPITAL HILL','CASEY','CHAPMAN','CHARNWOOD','CHIFLEY','CHISHOLM','CITY','CONDER','COOK','COOMBS','CRACE','CURTIN','DEAKIN','DENMAN PROSPECT','DICKSON','DOWNER','DUFFY','DUNLOP','DUNTROON','EVATT','FADDEN','FARRER','FISHER','FLOREY','FLYNN','FORDE','FORREST','FRANKLIN','FRASER','FYSHWICK','GARRAN','GILMORE','GIRALANG','GORDON','GOWRIE','GREENWAY','GRIFFITH','GUNGAHLIN','HACKETT','HALL','HARMAN','HARRISON','HAWKER','HIGGINS','HOLDER','HOLT','HUGHES','HUME','ISAACS','ISABELLA PLAINS','JACKA','KALEEN','KAMBAH','KENNY','KINGSTON','KINGSTON-BARTON','KINLYSIDE','KOWEN','LATHAM','LAWSON','LYNEHAM','LYONS','MACARTHUR','MACGREGOR','MACQUARIE','MAJURA','MAWSON','MCKELLAR','MELBA','MITCHELL','MOLONGLO','MONASH','MONCRIEFF','NARRABUNDAH','NGUNNAWAL','NICHOLLS',"O'CONNOR","O'MALLEY",'OAKS ESTATE','OXLEY','PAGE','PALMERSTON','PEARCE','PHILLIP','PIALLIGO','RED HILL','REID','RICHARDSON','RIVETT','RUSSELL','SCULLIN','SPENCE','STIRLING','STROMLO','SYMONSTON','TAYLOR','THARWA','THEODORE','THROSBY','TORRENS','TURNER','URIARRA','WANNIASSA','WARAMANGA','WATSON','WEETANGERA','WESTON','WILLIAMSDALE','WRIGHT','YARRALUMLA']
geopy.geocoders.options.default_timeout = None
canberra_geolocator = Nominatim(user_agent='canberra_explorer')

#This function gets the latitude and longitude of an ACT suburb as a string lat,lon
def getCoordinates(neighbourhood):
    #Pause between requests to respect the geocoding service's rate limit
    sleep(1)
    try:
        location = canberra_geolocator.geocode(neighbourhood+', Australian Capital Territory')
        return str(location.latitude)+','+str(location.longitude)
    except Exception:
        #Retry the same neighbourhood if the lookup fails
        return getCoordinates(neighbourhood)

#Alternative names mapped to the neighbourhood names used in the crime data.
#None of these keys appear in canberra_neighbourhoods, so the mapping only takes effect if they are added to that list.
neighbourhood_aliases = {
    'CIVIC': 'CITY',
    'LAKE BURLEY GRIFFIN': 'CITY',
    'GUNGAHLIN EAST': 'GUNGAHLIN',
    'GUNGAHLIN TC': 'GUNGAHLIN',
    'GUNGAHLIN WEST': 'GUNGAHLIN',
    'MOUNT TAYLOR': 'WESTON',
    'TUGGERANONG': 'ISABELLA PLAINS',
    'NAMADGI': 'ACT SOUTH WEST',
    'GOOROMON': 'DUNLOP',
}

#This function returns the ACT suburb of a geocode
def getNeighhourhood(coordinates):
    #Pause between requests to respect the geocoding service's rate limit
    sleep(1)
    try:
        results=canberra_geolocator.reverse(coordinates)[0].upper().split(', ')
    except Exception:
        #Retry the same coordinates if the lookup fails
        return getNeighhourhood(coordinates)

    #Return the first address component that is a known neighbourhood
    for result in results:
        if result in canberra_neighbourhoods:
            return neighbourhood_aliases.get(result, result)

    #No known neighbourhood found, i.e. the point is outside Canberra
    return ''

In [13]:
#Set Foursquare client using environmental variables FOURSQUARE_ID and FOURSQUARE_SECRET for security purposes
foursquare_client = foursquare.Foursquare(client_id=getenv('FOURSQUARE_ID'), client_secret=getenv('FOURSQUARE_SECRET'), version='20180605')

#For each neighbourhood, get its coordinate pair.
suburb_coordinates_pairs=[getCoordinates(neighbourhood) for neighbourhood in crime_per_capita.loc[:,"Neighbourhood"].values]

#Set a dataframe to populate Foursquare results into.
foursquare_results = pd.DataFrame([], columns=['Coordinates', 'Venue Category']) 

#For each coordinate pair from above and type of alcohol venue, retrieve Foursquare results
i=0

for coordinates_pair in suburb_coordinates_pairs:
    for term in ['bar','bottleshop', 'liquor', 'pubs', 'nightclub', 'night club']:
        results=foursquare_client.venues.search(params={'query': term, 'll':coordinates_pair, 'radius': 10000, 'intent': 'browse'})
        for result in results['venues']:
            category=''
            if 'categories' in result:
                if len(result['categories'])>=1:
                    category=result['categories'][0]['name']

            coordinates=''
            if 'location' in result:
                if 'lat' in result['location']:
                    if 'lng' in result['location']:
                        coordinates=str(result['location']['lat'])+','+str(result['location']['lng'])

            if coordinates!='' and category!='':
                foursquare_results.loc[i,'Coordinates'] = coordinates
                foursquare_results.loc[i,'Venue Category'] = category
                i=i+1

#Free iterator.          
del i

#Print Foursquare results.
display(foursquare_results)

Unnamed: 0,Coordinates,Venue Category
0,"-35.27838266040374,149.1284813365466",Italian Restaurant
1,"-35.27621,149.13118",Coffee Shop
2,"-35.27672820851512,149.12116327197995",Pub
3,"-35.28378140300602,149.11763629994482",Bar
4,"-35.272128055408906,149.13458721595302",Japanese Restaurant
...,...,...
7051,"-35.23850279208846,149.0661310608485",Café
7052,"-35.2775068395008,149.12577125398246",Gym
7053,"-35.276227144727024,149.12734722518945",Sports Bar
7054,"-35.27992591694459,149.1344690322876",Café


In [14]:
#Remove duplicate results and print outcome.
unique_results=foursquare_results.groupby('Coordinates').first().reset_index()
unique_results['Neighbourhood']=[getNeighhourhood(coordinates) for coordinates in unique_results.loc[:,'Coordinates'].values]
unique_results=unique_results[unique_results.Neighbourhood!='']
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(unique_results)

Unnamed: 0,Coordinates,Venue Category,Neighbourhood
1,"-35.15616526661593,149.14698337220318",Liquor Store,BONNER
3,"-35.164786,149.131966",Liquor Store,AMAROO
4,"-35.170138,149.069735",Office,HALL
5,"-35.17081,149.07079",Restaurant,HALL
6,"-35.171254,149.12822",Athletics & Sports,AMAROO
7,"-35.17654,149.13395",Italian Restaurant,GUNGAHLIN
8,"-35.1783724820464,149.09832084549436",Field,NICHOLLS
9,"-35.18303,149.1345",Salon / Barbershop,GUNGAHLIN
10,"-35.18328558626071,149.13263251656377",Bar,GUNGAHLIN
11,"-35.18343,149.13739",Salon / Barbershop,GUNGAHLIN


In [15]:
#Pivot Foursquare unique results so each venue category becomes a column, filling blanks with zeros.
venues_by_suburb=unique_results.groupby(['Neighbourhood','Venue Category']).count().unstack(level=-1,fill_value=0)

#Flatten the column labels to the venue category names
venues_by_suburb.columns=list(venues_by_suburb['Coordinates'].columns)

#Reset index on venues by suburb
venues_by_suburb.reset_index(inplace=True)

#Remove rows with no neighbourhood, i.e. out-of-Canberra results.
venues_by_suburb=venues_by_suburb[venues_by_suburb.Neighbourhood!='']

#Print venues_by_suburb
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(venues_by_suburb)

Unnamed: 0,Neighbourhood,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,Bar,Basketball Stadium,Beer Garden,Brewery,Building,Burger Joint,Butcher,Café,Candy Store,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Rec Center,Dance Studio,Dessert Shop,Diner,Dive Bar,Dog Run,Electronics Store,Ethiopian Restaurant,Event Space,Field,Flower Shop,Food,Furniture / Home Store,Gaming Cafe,Gastropub,Gay Bar,General College & University,General Entertainment,German Restaurant,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Health & Beauty Service,Hotel Bar,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Karaoke Bar,Liquor Store,Locksmith,Lounge,Market,Massage Studio,Medical Lab,Miscellaneous Shop,Multiplex,Music Venue,New American Restaurant,Nightclub,Non-Profit,Noodle House,Office,Other Nightlife,Outdoors & Recreation,Pharmacy,Pizza Place,Pool,Pub,Racetrack,Restaurant,Road,Salon / Barbershop,Sandwich Place,Snack Place,Social Club,Spa,Sports Bar,Sports Club,Stables,Steakhouse,Supermarket,Sushi Restaurant,Tennis Court,Thai Restaurant,Thrift / Vintage Store,Warehouse,Wine Bar
0,ACTON,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,AINSLIE,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
2,AMAROO,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,ARANDA,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,BARTON,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1
5,BELCONNEN,0,0,0,0,0,2,0,0,0,0,0,0,3,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,1,2,0,0,0,0,0,1,1,1,0,0,0,1,1,0,0,0,1,0,0,0,0,0,2,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0
6,BONNER,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,BRADDON,0,0,0,0,1,3,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1
8,BRUCE,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,CALWELL,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0


### OPTICS Clustering of Venue Data
OPTICS clustering was used to group the venue data into clusters. Note that the label -1 marks outliers (noise points) and is not a cluster.

In [16]:
#Run OPTICS clustering
clusters = OPTICS().fit(normalize(venues_by_suburb.loc[:,venues_by_suburb.columns[1]:]))

#Insert cluster labels into venues_by_suburb dataframe
venues_by_suburb.insert(0, 'Cluster', clusters.labels_)

#Print venues_by_suburb with cluster labels
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    display(venues_by_suburb)

#Drop redundant variable
del clusters

Unnamed: 0,Cluster,Neighbourhood,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,Bar,Basketball Stadium,Beer Garden,Brewery,Building,Burger Joint,Butcher,Café,Candy Store,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Rec Center,Dance Studio,Dessert Shop,Diner,Dive Bar,Dog Run,Electronics Store,Ethiopian Restaurant,Event Space,Field,Flower Shop,Food,Furniture / Home Store,Gaming Cafe,Gastropub,Gay Bar,General College & University,General Entertainment,German Restaurant,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Health & Beauty Service,Hotel Bar,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Karaoke Bar,Liquor Store,Locksmith,Lounge,Market,Massage Studio,Medical Lab,Miscellaneous Shop,Multiplex,Music Venue,New American Restaurant,Nightclub,Non-Profit,Noodle House,Office,Other Nightlife,Outdoors & Recreation,Pharmacy,Pizza Place,Pool,Pub,Racetrack,Restaurant,Road,Salon / Barbershop,Sandwich Place,Snack Place,Social Club,Spa,Sports Bar,Sports Club,Stables,Steakhouse,Supermarket,Sushi Restaurant,Tennis Court,Thai Restaurant,Thrift / Vintage Store,Warehouse,Wine Bar
0,-1,ACTON,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,-1,AINSLIE,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
2,0,AMAROO,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,-1,ARANDA,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,-1,BARTON,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1
5,-1,BELCONNEN,0,0,0,0,0,2,0,0,0,0,0,0,3,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,1,2,0,0,0,0,0,1,1,1,0,0,0,1,1,0,0,0,1,0,0,0,0,0,2,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0
6,0,BONNER,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,-1,BRADDON,0,0,0,0,1,3,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1
8,-1,BRUCE,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,-1,CALWELL,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
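
The `normalize` call in the cell above is not imported in this section. Assuming it is scikit-learn's `sklearn.preprocessing.normalize` with its default L2 norm, each neighbourhood's row of venue counts is scaled to unit length before clustering. The sketch below reimplements that scaling in plain Python (the helper `l2_normalize_row` and the counts are illustrative, not taken from the notebook) to show one consequence: neighbourhoods are compared by their mix of venue types rather than by how many venues they have.

```python
import math

def l2_normalize_row(row):
    """Scale a row of counts to unit Euclidean length (all-zero rows stay zero)."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm if norm else 0.0 for v in row]

# Hypothetical counts for three venue categories: [Bar, Liquor Store, Cafe]
one_liquor_store = [0, 1, 0]
three_liquor_stores = [0, 3, 0]
bar_and_cafe = [2, 0, 2]

# The first two rows become identical after scaling, so a density-based
# method would treat them as the same point despite the different counts
print(l2_normalize_row(one_liquor_store))
print(l2_normalize_row(three_liquor_stores))
print(l2_normalize_row(bar_and_cafe))
```

If this assumption holds, a cluster in the venue data is best read as a group of neighbourhoods with a similar venue profile, not a similar number of venues.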


In [17]:
#Aggregate by cluster: list the member neighbourhoods and average the venue counts
cluster_members = venues_by_suburb.groupby('Cluster')['Neighbourhood'].agg(', '.join)
venue_summary_by_cluster = cluster_members.to_frame().join(venues_by_suburb.groupby('Cluster').mean(numeric_only=True))

#Add the total of the mean venue counts, sort clusters by it and print the summary
venue_summary_by_cluster['Total'] = venue_summary_by_cluster.sum(axis=1, numeric_only=True)
venue_summary_by_cluster.sort_values(by='Total', inplace=True)
display(venue_summary_by_cluster)

Cluster,Neighbourhood,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,Bar,Basketball Stadium,Beer Garden,Brewery,...,Stables,Steakhouse,Supermarket,Sushi Restaurant,Tennis Court,Thai Restaurant,Thrift / Vintage Store,Warehouse,Wine Bar,Total
1,"CHISHOLM, DUNTROON, HUGHES, MACQUARIE, MELBA, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.333333,0.0,0.0,0.0,0.0,2.333333
0,"AMAROO, BONNER, CONDER, DICKSON, HAWKER, OAKS ...",0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,...,0.0,0.142857,0.0,0.0,0.142857,0.0,0.0,0.0,0.142857,3.285714
-1,"ACTON, AINSLIE, ARANDA, BARTON, BELCONNEN, BRA...",0.020833,0.020833,0.020833,0.125,0.104167,0.520833,0.020833,0.020833,0.041667,...,0.020833,0.020833,0.041667,0.020833,0.125,0.041667,0.041667,0.020833,0.145833,6.458333
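
The `Total` column is the row-wise sum of the per-category means, which equals the mean number of venues per neighbourhood in that cluster. A minimal sketch with made-up counts (not the notebook's data, and with illustrative variable names) shows the equivalence:

```python
# Hypothetical venue counts for two neighbourhoods in one cluster,
# over three categories: [Bar, Pub, Tennis Court]
cluster_rows = [
    [1, 0, 2],
    [0, 1, 1],
]

n = len(cluster_rows)

# Per-category means, as produced by a group-wise mean
category_means = [sum(col) / n for col in zip(*cluster_rows)]

# "Total" as computed in the notebook: sum of the per-category means
total_from_means = sum(category_means)

# Equivalent view: mean of each neighbourhood's total venue count
mean_of_totals = sum(sum(row) for row in cluster_rows) / n

print(category_means, total_from_means, mean_of_totals)
```

Read this way, a `Total` of 2.33 means the neighbourhoods in that cluster have about 2.33 recorded venues each on average.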


### Analysis of Results

In [18]:
#Check whether two neighbourhoods share a (non-outlier) cluster in the given dataframe,
#which is either crime_per_capita or venues_by_suburb
def inSameCluster(neighbourhood1, neighbourhood2, df):
    #Look up each neighbourhood's cluster label; an empty result means it is absent from df
    clusters1 = df.loc[df.Neighbourhood == neighbourhood1, 'Cluster'].values
    clusters2 = df.loc[df.Neighbourhood == neighbourhood2, 'Cluster'].values
    if len(clusters1) == 0 or len(clusters2) == 0:
        return False

    #Outliers (label -1) are never treated as sharing a cluster
    if clusters1[0] == -1 or clusters2[0] == -1:
        return False

    return clusters1[0] == clusters2[0]
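
`inSameCluster` filters the DataFrame twice per call, and the call is repeated for every pair of neighbourhoods. One possible alternative, sketched here with hypothetical names (`same_cluster`, `labels`) and made-up labels, is to build a neighbourhood-to-cluster mapping once and compare plain dictionary lookups. With the notebook's data, the mapping could presumably be built with something like `dict(zip(df.Neighbourhood, df.Cluster))`.

```python
NOISE = -1  # label used for outliers

def same_cluster(name1, name2, labels):
    """Return True only if both names are labelled, neither is noise, and the labels match."""
    c1 = labels.get(name1)
    c2 = labels.get(name2)
    if c1 is None or c2 is None:
        return False  # neighbourhood missing from this dataset
    if c1 == NOISE or c2 == NOISE:
        return False  # outliers never count as related
    return c1 == c2

# Illustrative labels only, not the notebook's results
labels = {"A": 0, "B": 0, "C": 1, "D": -1, "E": -1}

print(same_cluster("A", "B", labels))  # same cluster
print(same_cluster("A", "C", labels))  # different clusters
print(same_cluster("D", "E", labels))  # both outliers
print(same_cluster("A", "Z", labels))  # "Z" is not in the data
```

The rules are the same as in the function above: missing neighbourhoods and outliers are never related to anything.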

In [19]:
#Get list of neighbourhoods in either dataset and sort alphabetically
neighbourhoods=list(set(unique_results.Neighbourhood)|set(crime_per_capita.Neighbourhood))
neighbourhoods.sort()

#Count the neighbourhoods
neighbourhood_max = len(neighbourhoods)

#Set variables for results
cluster_relations = 0
silhouetted_relations = 0

crime_cluster_relations = 0
venue_cluster_relations = 0

#For each pair of neighbourhoods, check whether they share a cluster in each dataset
for i in range(0,neighbourhood_max):
    for j in range(i+1,neighbourhood_max):
        neighbourhood1 = neighbourhoods[i]
        neighbourhood2 = neighbourhoods[j]
        
        crime_cluster_related = inSameCluster(neighbourhood1, neighbourhood2, crime_per_capita)
        venue_cluster_related = inSameCluster(neighbourhood1, neighbourhood2, venues_by_suburb)
        
        if crime_cluster_related:
            crime_cluster_relations += 1
            
        if venue_cluster_related:
            venue_cluster_relations += 1
        
        if crime_cluster_related or venue_cluster_related:
            cluster_relations += 1
            
        if crime_cluster_related and venue_cluster_related:
            silhouetted_relations += 1

#Print the results
message = 'Neighbourhoods related in a crime cluster: '+str(crime_cluster_relations)
message += '\nNeighbourhoods related in a venues cluster: '+str(venue_cluster_relations)
message += '\nNeighbourhoods related in either cluster: '+str(cluster_relations)
message += '\nNeighbourhoods related in both clusters: '+str(silhouetted_relations)
print(message)

#Remove unnecessary objects
del neighbourhoods, neighbourhood_max, cluster_relations, silhouetted_relations, crime_cluster_relations, venue_cluster_relations, crime_cluster_related, venue_cluster_related, message

Neighbourhoods related in a crime cluster: 50
Neighbourhoods related in a venues cluster: 36
Neighbourhoods related in either cluster: 86
Neighbourhoods related in both clusters: 0
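
These figures count pairs of neighbourhoods, not individual neighbourhoods: a cluster with n members contributes n(n-1)/2 related pairs. As a sanity check, the cluster sizes reported in Part 4 (crime clusters of 5, 6, 6 and 5 neighbourhoods; venue clusters of 7 and 6) reproduce the first two figures. The helper name `related_pairs` is illustrative.

```python
from math import comb

def related_pairs(cluster_sizes):
    """Number of unordered pairs that share a cluster, given the cluster sizes."""
    return sum(comb(n, 2) for n in cluster_sizes)

crime_cluster_sizes = [5, 6, 6, 5]  # crime clusters 0 to 3
venue_cluster_sizes = [7, 6]        # venue clusters 0 and 1

print(related_pairs(crime_cluster_sizes))  # 50
print(related_pairs(venue_cluster_sizes))  # 36
```

The "either" count follows from inclusion-exclusion: 50 + 36 - 0 = 86, which is consistent with no pair being related in both datasets.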


# Part 4 - Results
## Clustering of Crime Data

### Outliers
The outliers are ACT East, Ainslie, Amaroo, Aranda, Banks, Bonner, Braddon, Calwell, Campbell, Chapman, Charnwood, Chifley, Chisholm, Conder, Cook, Crace, Curtin, Deakin, Dickson, Downer, Duffy, Dunlop, Evatt, Fadden, Farrer, Fisher, Florey, Flynn, Forde, Forrest, Franklin, Fraser, Gilmore, Gordon, Gowrie, Greenway, Griffith, Gungahlin, Hackett, Hall, Harrison, Hawker, Higgins, Holder, Holt, Hughes, Isaacs, Kambah, Kingston-Barton, Kowen, Latham, Lawson, Lyneham, Lyons, Macarthur, Macquarie, Majura, Mawson, McKellar, Melba, Mitchell, Monash, Narrabundah, Ngunnawal, O'Connor, O'Malley, Oxley, Page, Palmerston, Pearce, Reid, Richardson, Rivett, Scullin, Spence, Stirling, Theodore, Torrens, Wanniassa and Weetangera.

### Cluster 0
Cluster 0 includes City, Garran, Hume, Molonglo and Phillip. Cluster 0 has the highest rates of all types of violent crime and the highest rate overall.

### Cluster 1
Cluster 1 has Belconnen, Bruce, Nicholls, Waramanga, Weston and Yarralumla. Cluster 1 has the second-highest rates of all types of violent crime, except family violence, which is second lowest. Cluster 1 also has the second-highest rate overall.

### Cluster 2
Cluster 2's neighbourhoods are ACT South West, Bonython, Casey, Giralang, Isabella Plains and Macgregor. Cluster 2 has the lowest rates of all types of violent crime, except family violence, which is second highest, and other offences against a person, which is second lowest. Cluster 2 also has the lowest rate overall.

### Cluster 3
Cluster 3 has Acton, Kaleen, Red Hill, Turner and Watson. Cluster 3 has the second-lowest rates of all types of violent crime, except family violence and other offences against a person, which are lowest. Cluster 3 also has the second-lowest rate overall.

## Clustering of Venue Data
### Cluster 0
Cluster 0 contains Amaroo, Bonner, Conder, Dickson, Hawker, Oaks Estate and Pearce. Cluster 0 has the lowest number of alcohol-related venues in the Foursquare data.

### Cluster 1
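Cluster 1 has the neighbourhoods of Chisholm, Duntroon, Hughes, Macquarie, Melba and Reid. In the summary table, Cluster 1 has the lowest mean number of venues per neighbourhood (a Total of about 2.33, against about 3.29 for Cluster 0 and about 6.46 for the outliers).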

### Outliers
The outliers are Acton, Ainslie, Aranda, Barton, Belconnen, Braddon, Bruce, Calwell, Chapman, Charnwood, City, Crace, Curtin, Deakin, Downer, Evatt, Flynn, Forrest, Franklin, Fyshwick, Garran, Greenway, Griffith, Gungahlin, Hall, Holder, Holt, Hume, Kaleen, Kambah, Kingston, Lyneham, Mawson, McKellar, Mitchell, Narrabundah, Ngunnawal, Nicholls, O'Connor, Page, Phillip, Stirling, Turner, Wanniassa, Waramanga, Watson, Weston and Yarralumla.

## Overlap of Clustering
The analysis found no overlap between the crime clusters and the venue clusters: of the 86 pairs of neighbourhoods that shared a cluster in either dataset, none shared a cluster in both.
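
The zero overlap can be checked directly from the cluster memberships listed above. The sketch below (variable names are illustrative) intersects every crime cluster with every venue cluster. It suggests a stronger reading of the result: no neighbourhood was assigned to a non-outlier cluster in both datasets, so no pair could overlap.

```python
from math import comb

# Cluster memberships as listed in the Results section (outliers excluded)
crime_clusters = {
    0: {"City", "Garran", "Hume", "Molonglo", "Phillip"},
    1: {"Belconnen", "Bruce", "Nicholls", "Waramanga", "Weston", "Yarralumla"},
    2: {"ACT South West", "Bonython", "Casey", "Giralang", "Isabella Plains", "Macgregor"},
    3: {"Acton", "Kaleen", "Red Hill", "Turner", "Watson"},
}
venue_clusters = {
    0: {"Amaroo", "Bonner", "Conder", "Dickson", "Hawker", "Oaks Estate", "Pearce"},
    1: {"Chisholm", "Duntroon", "Hughes", "Macquarie", "Melba", "Reid"},
}

# A pair is related in both datasets only if both members share a crime
# cluster and a venue cluster, so count pairs within each intersection
pairs_in_both = sum(
    comb(len(c & v), 2)
    for c in crime_clusters.values()
    for v in venue_clusters.values()
)

# Neighbourhoods that belong to some cluster in both datasets
clustered_in_both = set().union(*crime_clusters.values()) & set().union(*venue_clusters.values())

print(pairs_in_both)      # 0
print(clustered_in_both)  # set()
```

The comparison relies on neighbourhood names matching across the two datasets. The crime data lists "Kingston-Barton" while the venue data lists "Kingston" and "Barton" separately, but all three are outliers, so this naming difference does not affect the count.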

# Part 5 - Discussion
The analysis found no evidence of an association between the availability of alcohol-related venues and patterns of violent crime. However, a large share of neighbourhoods were classified as outliers in each dataset, so subsequent research is needed to validate the results. The discovery of clusters for patterns of violent crime also merits further research to determine the drivers of the patterns.