# Crime in Montevideo, Uruguay
### Uruguay is considered a safe country in Latin America, especially compared to its neighbors. However, crime in Uruguay, and in its capital Montevideo in particular, has increased significantly in the last decade. 
### The aim of this analysis is to better understand which parts of Montevideo have more crime and how crime relates to other variables (police presence and economic level).
### To do this, the following data will be used:
### - Foursquare information on venues in Montevideo. Foursquare is mainly used by tourists or by high-income Uruguayans. For this reason, the venues returned by Foursquare will be used as a proxy for areas that are high-income and have a higher police presence.
### - Location of Police Stations in Montevideo. This information was obtained from a Government site (Ministerio de Desarrollo Social).
### - Number of Crimes during 2019 reported per Police Station. This information was also obtained from a Government site (Ministerio del Interior).

#### Import libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# install geopy; comment out the next line if it is already installed
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Libraries imported.


#### Set credentials for Foursquare and obtain venue information from Montevideo, Uruguay

In [31]:
CLIENT_ID = 'JGSHF3NQXMWZ44ER1KJKECJOWWRWZNOLEOANVP3J2EWFH4GK' # your Foursquare ID
CLIENT_SECRET = 'RTALTWRS2KQFDPERC3SPRJNNFWJZSHDIOUPHIBLQB5T52K1V' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: JGSHF3NQXMWZ44ER1KJKECJOWWRWZNOLEOANVP3J2EWFH4GK
CLIENT_SECRET:RTALTWRS2KQFDPERC3SPRJNNFWJZSHDIOUPHIBLQB5T52K1V


In [32]:
address = 'Montevideo, UY'

geolocator = Nominatim(user_agent="mvd_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Montevideo are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Montevideo are -34.9059039, -56.1913569.


In [33]:
radius = 10000 # search radius in meters
LIMIT = 1000 # limit on the number of venues returned by the Foursquare API

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=JGSHF3NQXMWZ44ER1KJKECJOWWRWZNOLEOANVP3J2EWFH4GK&client_secret=RTALTWRS2KQFDPERC3SPRJNNFWJZSHDIOUPHIBLQB5T52K1V&ll=-34.9059039,-56.1913569&v=20180605&radius=10000&limit=1000'
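
Displaying the full URL also displays the client secret in the notebook output. A minimal sketch of an alternative, assuming the credentials are stored in environment variables (the names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `build_explore_url` and `redact` are hypothetical), builds the query string with the standard library and shows only a redacted version:

```python
import os
from urllib.parse import urlencode

# Hypothetical environment variable names; placeholders are used when they are unset.
client_id = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')
client_secret = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')

def build_explore_url(lat, lng, radius, limit, version='20180605'):
    """Return the explore URL with properly encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

def redact(url):
    """Hide the secret before displaying the URL."""
    return url.replace(client_secret, '***')

explore_url = build_explore_url(-34.9059039, -56.1913569, 10000, 1000)
print(redact(explore_url))
```

With `requests`, the same parameter dictionary can instead be passed through the `params` argument of `requests.get`, which performs the encoding itself.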

In [34]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f382ff5dd2e0829bbf23112'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Montevideo',
  'headerFullLocation': 'Montevideo',
  'headerLocationGranularity': 'city',
  'totalResults': 203,
  'suggestedBounds': {'ne': {'lat': -34.81590380999991,
    'lng': -56.081818032194775},
   'sw': {'lat': -34.99590399000009, 'lng': -56.30089576780523}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c7eeaf48e6495210aa112bd',
       'name': 'Auditorio Nacional del Sodre Dra. Adela Reta',
       'location': {'address': 'Andes',
        'crossStreet': 'Mercedes',
        'lat': -34.9044896879589,
        'lng': -56.19839471139537,
        'labeledLatLngs': [

In [35]:
# function that extracts the category of the venue
def get_category_type(row):
    # the column is named 'categories' or 'venue.categories' depending on the endpoint
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [36]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# keep only the first category name for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# rename columns
nearby_venues.columns = ['Name','Category','Latitude','Longitude']

nearby_venues.head()


Unnamed: 0,Name,Category,Latitude,Longitude
0,Auditorio Nacional del Sodre Dra. Adela Reta,Concert Hall,-34.90449,-56.198395
1,Ashot Shawarma,Kebab Restaurant,-34.907314,-56.190219
2,Smart Hotel Montevideo,Hotel,-34.908406,-56.197819
3,La Ibérica,Furniture / Home Store,-34.905319,-56.201069
4,Candy Bar Palermo,Bar,-34.910187,-56.18502


In [37]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


#### Cluster the venues by location. No geospatial data for Montevideo's neighborhoods was found, so the clusters are used instead; on inspection, they are good proxies for the neighborhoods.

In [40]:
# set number of clusters
kclusters = 7

mvd_clustering = nearby_venues.drop(['Name','Category'], axis=1) # keep only Latitude and Longitude

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(mvd_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 1, 3, 3, 1, 3, 3, 3, 1, 3])
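
K-means measures Euclidean distance, and on raw coordinates a degree of longitude at Montevideo's latitude covers less ground than a degree of latitude. The following is a minimal sketch, not the method used above, of converting coordinates to approximate kilometer offsets with an equirectangular approximation before clustering. The helper `to_local_km` and the constant `KM_PER_DEGREE` are assumptions introduced for illustration.

```python
import math

KM_PER_DEGREE = 111.19  # approximate length of one degree of latitude, in km

def to_local_km(lat, lon, ref_lat, ref_lon):
    """Equirectangular approximation: offsets in km from a reference point."""
    x = (lon - ref_lon) * KM_PER_DEGREE * math.cos(math.radians(ref_lat))
    y = (lat - ref_lat) * KM_PER_DEGREE
    return x, y

# Reference point: the Montevideo coordinates returned by the geocoder.
ref_lat, ref_lon = -34.9059039, -56.1913569

# Shrink factor for longitude at this latitude (roughly 0.82).
lon_scale = math.cos(math.radians(ref_lat))

# First two venues from the venue table.
venues = [(-34.90449, -56.198395), (-34.907314, -56.190219)]
venues_km = [to_local_km(lat, lon, ref_lat, ref_lon) for lat, lon in venues]
print(round(lon_scale, 3), venues_km)
```

The resulting `(x, y)` pairs could be passed to `KMeans` in place of the raw latitude and longitude columns. Over a 10 km radius the difference is small, so the cluster assignments may change little.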

In [41]:
# add clustering labels
nearby_venues.insert(0, 'Cluster Labels', kmeans.labels_)

In [42]:
nearby_venues.head()

Unnamed: 0,Cluster Labels,Name,Category,Latitude,Longitude
0,3,Auditorio Nacional del Sodre Dra. Adela Reta,Concert Hall,-34.90449,-56.198395
1,1,Ashot Shawarma,Kebab Restaurant,-34.907314,-56.190219
2,3,Smart Hotel Montevideo,Hotel,-34.908406,-56.197819
3,3,La Ibérica,Furniture / Home Store,-34.905319,-56.201069
4,1,Candy Bar Palermo,Bar,-34.910187,-56.18502


In [47]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme: one color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add venue markers to the map, colored by cluster label (0 to kclusters-1)
for lat, lon, name, cluster in zip(nearby_venues['Latitude'], nearby_venues['Longitude'], nearby_venues['Name'], nearby_venues['Cluster Labels']):
    label = folium.Popup(str(name) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Load information about Police Stations in Uruguay. The data was previously obtained from a Uruguayan government site and downloaded to a .txt file.

In [44]:
df_police = pd.read_csv('comisarias.txt') #import Uruguay Police information

# remove parentheses from the location fields (raw string avoids invalid escape warnings)
df_police['LON'] = df_police['LON'].replace(r'[()]', '', regex=True)
df_police['LAT'] = df_police['LAT'].replace(r'[()]', '', regex=True)
df_police.head()

Unnamed: 0,NRO,RECURSO,NOMBRE,DESCRIPCION,DIRECCION,TELEFONO,CORREO,OBSERVACION,LON,LAT
0,1,Policía,Seccional 12 Cerro Largo,,undefined undefined,,,,-53.981129636258,-32.342184518895564
1,2,Policía,Seccional 14 Cerro Largo,,undefined undefined,,,,-54.17770399994951,-32.34168200088537
2,3,Policía,Seccional 4 C.Largo,,undefined undefined,,,,-53.74546099990412,-32.10700600085456
3,4,Policía,Seccional 5 C.Largo,,undefined undefined,,,,-54.12363099994595,-31.960464000876435
4,5,Policía,Sub. Cria. C.Largo,,undefined undefined,,,,-54.16259799994918,-31.872311000876657


### Drop unnecessary fields and only consider Police Stations in Uruguay's capital: Montevideo

In [308]:
df_police_mvd = df_police[df_police['NOMBRE'].str.contains('Montevideo')] #select Police only in Montevideo
df_police_mvd = df_police_mvd.reset_index(drop=True) #reset index
df_police_mvd = df_police_mvd.drop(['RECURSO','NRO','DESCRIPCION','DIRECCION','TELEFONO','CORREO','OBSERVACION'], axis=1) #drop empty fields
df_police_mvd.columns = ['Name','Longitude','Latitude']

df_police_mvd['Longitude'] = df_police_mvd['Longitude'].astype(float)
df_police_mvd['Latitude'] = df_police_mvd['Latitude'].astype(float)
df_police_mvd.dtypes

df_police_mvd = df_police_mvd.drop(axis=0, index=18) #drop "Jefatura"
df_police_mvd = df_police_mvd.reset_index(drop=True)

df_police_mvd.head()

Unnamed: 0,Name,Longitude,Latitude
0,Comisaria Seccional 18 Montevideo.,-56.085709,-34.800053
1,Comisaria Seccional 16 Montevideo.,-56.144316,-34.850426
2,Comisaria Seccional 25 Montevideo,-56.116258,-34.837025
3,Comisaria Seccional 17 Montevideo,-56.164098,-34.795132
4,Comisaria Seccional 10 Montevideo,-56.15028,-34.907021


In [309]:
# add Seccional field: the first number that appears in the station name
# (str.extract assigns the whole column at once and avoids chained assignment)
df_police_mvd['Seccional'] = df_police_mvd['Name'].str.extract(r'(\d+)', expand=False)


In [326]:
df_police_mvd['Seccional']=df_police_mvd['Seccional'].astype(int)
df_police_mvd.head()

Unnamed: 0,Name,Longitude,Latitude,Seccional
0,Comisaria Seccional 18 Montevideo.,-56.085709,-34.800053,18
1,Comisaria Seccional 16 Montevideo.,-56.144316,-34.850426,16
2,Comisaria Seccional 25 Montevideo,-56.116258,-34.837025,25
3,Comisaria Seccional 17 Montevideo,-56.164098,-34.795132,17
4,Comisaria Seccional 10 Montevideo,-56.15028,-34.907021,10
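
The Seccional number is taken to be the first run of digits in the station name. A small standalone sketch of that rule, using the standard `re` module, shows how names without a number could be handled. The helper `seccional_number` and the third sample name are hypothetical.

```python
import re

def seccional_number(name):
    """Return the first integer in a station name, or None if there is none."""
    match = re.search(r'\d+', name)
    return int(match.group()) if match else None

names = [
    'Comisaria Seccional 18 Montevideo.',
    'Comisaria Seccional 10 Montevideo',
    'Jefatura Montevideo',  # hypothetical name without a number
]
numbers = [seccional_number(n) for n in names]
print(numbers)
```

Returning `None` for names without digits makes such rows easy to find and drop before the column is converted to integers.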


### Load the information on Crimes per Police Station. The data was previously obtained from a Uruguayan government site and downloaded to an Excel file.

In [313]:
# import number of crimes reported per police station
df_denuncias = pd.read_excel('Denuncias.xlsx')
df_denuncias.head()

Unnamed: 0,Seccional,Denuncias (Hurto y Rapina)
0,9,350
1,8,639
2,7,395
3,6,158
4,5,297


In [314]:
df_denuncias = df_denuncias.drop(df_denuncias.index[[25,26]]) #drop PREFECTURA
df_denuncias = df_denuncias.drop(axis=0, index=2) #drop empty row

In [315]:
df_denuncias['Seccional'] = df_denuncias['Seccional'].astype(int)

In [316]:
df_denuncias.head()

Unnamed: 0,Seccional,Denuncias (Hurto y Rapina)
0,9,350
1,8,639
3,6,158
4,5,297
5,4,250


### Merge Police Station location data with Number of Crimes

In [317]:
df_police_merged = pd.merge(df_police_mvd,df_denuncias, on='Seccional')

In [319]:
df_police_merged.columns = ['Name','Longitude','Latitude','Seccional','N_Crimes']
df_police_merged.head()

Unnamed: 0,Name,Longitude,Latitude,Seccional,N_Crimes
0,Comisaria Seccional 18 Montevideo.,-56.085709,-34.800053,18,789
1,Comisaria Seccional 16 Montevideo.,-56.144316,-34.850426,16,928
2,Comisaria Seccional 25 Montevideo,-56.116258,-34.837025,25,544
3,Comisaria Seccional 17 Montevideo,-56.164098,-34.795132,17,1213
4,Comisaria Seccional 10 Montevideo,-56.15028,-34.907021,10,260
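
By default `pd.merge` performs an inner join, so a station with no matching `Seccional` in the other table is dropped without warning. A minimal sketch of a check that could be run before merging is given here. The helper `compare_keys` is hypothetical, and the key lists are illustrative values rather than the full data.

```python
def compare_keys(left_keys, right_keys):
    """Report which keys match and which would be lost by an inner join."""
    left, right = set(left_keys), set(right_keys)
    return {
        'matched': sorted(left & right),
        'only_left': sorted(left - right),
        'only_right': sorted(right - left),
    }

# Illustrative values only, not the full data.
station_keys = [18, 16, 25, 17, 10]
crime_keys = [18, 16, 25, 17, 9]
report = compare_keys(station_keys, crime_keys)
print(report)
```

With the dataframes from this notebook, the call would be `compare_keys(df_police_mvd['Seccional'], df_denuncias['Seccional'])`.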


## Generate Map with Police Station location, bubble size indicating Number of Crimes, and venue locations.

In [331]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# add venue markers to the map (all in blue, so the cluster color scheme is not needed here)
for lat, lon, name, cluster in zip(nearby_venues['Latitude'], nearby_venues['Longitude'], nearby_venues['Name'], nearby_venues['Cluster Labels']):
    label = folium.Popup(str(name) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7).add_to(map_clusters)
    
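# add police station markers; the radius grows with the number of reported crimes
for Name, Latitude, Longitude, Crimes in zip(df_police_merged['Name'], df_police_merged['Latitude'], df_police_merged['Longitude'],df_police_merged['N_Crimes']):
    label = '{}, {}'.format(Crimes, Name) # the original referenced an undefined variable `Number`; the crime count is assumed
    label = folium.Popup(label, parse_html=True)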
    folium.CircleMarker(
        [Latitude, Longitude],
        radius=(Crimes/100)*5,
        popup=label,
        color='crimson',
        fill=True,
        fill_color='crimson',
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
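
The marker radius above is proportional to the number of crimes, so the circle area grows with the square of that number and large stations look disproportionately big. A common alternative, sketched here under the assumption that area should be proportional to crimes, scales the radius by the square root. The helper `bubble_radius` and the `max_radius` value of 40 pixels are hypothetical choices.

```python
import math

def bubble_radius(n_crimes, max_crimes, max_radius=40):
    """Radius in pixels such that circle area is proportional to n_crimes."""
    return max_radius * math.sqrt(n_crimes / max_crimes)

# Crime counts from the first rows of the merged table.
crimes = [789, 928, 544, 1213, 260]
radii = [bubble_radius(c, max(crimes)) for c in crimes]
print([round(r, 1) for r in radii])
```

The returned value could be passed as the `radius` argument of `folium.CircleMarker`, which is expressed in pixels.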

# Results
### The map shows a higher number of reported crimes in the outskirts of the city, where there are practically no Foursquare venues. As mentioned previously, the Foursquare app is used mainly by tourists and high-income Uruguayans, so the location of its venues can be used as a proxy for more heavily policed and better-off areas.
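
Because the venue query used a 10 km radius around the geocoded city center, it can help to check which police stations fall inside that radius before comparing them with venue locations. The sketch below is an assumption-laden illustration, not part of the original analysis: it uses the haversine formula on two stations from the merged table, and the helper `haversine_m` is hypothetical.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    r = 6371000  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

center = (-34.9059039, -56.1913569)  # geocoded Montevideo coordinates
query_radius = 10000                 # radius used in the venue query, in meters

# Two stations from the merged table: Seccional -> (latitude, longitude)
stations = {18: (-34.800053, -56.085709), 10: (-34.907021, -56.15028)}
inside = {
    s: haversine_m(center[0], center[1], lat, lon) <= query_radius
    for s, (lat, lon) in stations.items()
}
print(inside)
```

Stations outside the radius may show no nearby venues simply because the query did not reach them, so they may need to be treated separately when interpreting the map.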


# Conclusion
### The main conclusion of the analysis is that crime and police presence/economic status appear to be inversely related: more crimes are reported in areas with less police presence and/or lower economic status.