# Correlations of crime and land use in Chicago

# Capstone project IBM Data Science

I will explore the correlations between the number and types of crimes committed in a community area and the venues located there.

The relation between crime and socioeconomic conditions is unclear. Reports often contradict each other on whether difficult economic conditions create more crime or whether a booming economy, which increases the availability of money and goods, promotes it. Nevertheless, it is important not to dismiss possible connections between the economy and crime. In this project, I will try to find out whether the crimes committed in community areas with similar socioeconomic conditions are related to land use. For example, suppose that A and B are community areas with similar economic conditions, but area A is much safer than area B. I want to see whether the types of venues in area A are similar to those in area B. If they are not, I will study which types of venues are present in the safer area. The venues could be parks, liquor stores, or restaurants, among others.

Also, a venue that affects crime in one type of area does not necessarily affect it in another. One hypothesis is that parks are such a venue. In areas with a high crime rate, social interaction can draw people into crime. Parks promote social interaction, so parks in those areas would contribute to crime. In areas with a low crime rate, on the other hand, parks would have no such effect.

This project could help policymakers to design safer neighborhoods by building or closing venues to reduce crime. 

# Data

To reduce the effects of variables not considered in this project, I will only compare neighborhoods within the same city. Because of data availability, I chose Chicago. The datasets on crime and on the economic status of Chicago's neighborhoods were provided in previous courses of this certification.

In [1]:
import csv
import pandas as pd

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans


import folium # map rendering library
import numpy as np

In [3]:
economic_data = pd.read_csv("Census_Data_Selected_socioeconomic_indicators_in_Chicago__2008___2012.csv")
economic_data.head(2)

Unnamed: 0,COMMUNITY_AREA_NUMBER,COMMUNITY_AREA_NAME,PERCENT OF HOUSING CROWDED,PERCENT HOUSEHOLDS BELOW POVERTY,PERCENT AGED 16+ UNEMPLOYED,PERCENT AGED 25+ WITHOUT HIGH SCHOOL DIPLOMA,PERCENT AGED UNDER 18 OR OVER 64,PER_CAPITA_INCOME,HARDSHIP_INDEX
0,1.0,Rogers Park,7.7,23.6,8.7,18.2,27.5,23939,39.0
1,2.0,West Ridge,7.8,17.2,8.8,20.8,38.5,23040,46.0


This dataset contains 9 columns. I will only describe the ones that are not self-explanatory.

**Percent of housing crowded**: Percentage of occupied housing units with more than one person per room.

**Percent households below poverty**: Percentage of households with an income below the federal poverty level. In 2012 that level was $23,050 for a family of 4.

**Hardship index**: A score that combines the 6 socioeconomic indicators in this dataset. A higher value indicates greater hardship.

I plan to cluster this dataset to obtain groups of community areas with similar socioeconomic conditions.
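
As a rough sketch of what this clustering step could look like, the block below standardizes a few indicators and groups the areas with `KMeans`. The five rows are copied from the previews in this notebook. The choice of columns, `n_clusters=2`, and `random_state=0` are my assumptions for illustration, not settings taken from the project.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Five rows copied from the dataset preview; the real input would be economic_data
sample = pd.DataFrame({
    "COMMUNITY_AREA_NAME": ["Rogers Park", "West Ridge", "Uptown", "Lincoln Square", "North Center"],
    "PERCENT OF HOUSING CROWDED": [7.7, 7.8, 3.8, 3.4, 0.3],
    "PERCENT HOUSEHOLDS BELOW POVERTY": [23.6, 17.2, 24.0, 10.9, 7.5],
    "PERCENT AGED 16+ UNEMPLOYED": [8.7, 8.8, 8.9, 8.2, 5.2],
    "PER_CAPITA_INCOME": [23939, 23040, 35787, 37524, 57123],
})

features = sample.drop(columns="COMMUNITY_AREA_NAME")

# Put every indicator on the same scale so that income does not dominate the distances
scaled = StandardScaler().fit_transform(features)

# n_clusters=2 is an arbitrary choice for this tiny sample
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)
sample["CLUSTER"] = kmeans.labels_
print(sample[["COMMUNITY_AREA_NAME", "CLUSTER"]])
```

Standardizing first matters here because per capita income is measured in dollars while the other indicators are percentages.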

In [13]:
crime_data=pd.read_csv("Chicago_Crime_Data.csv")
crime_data=crime_data.drop(["CASE_NUMBER","DATE","BLOCK","IUCR","BEAT","DISTRICT","WARD","X_COORDINATE","Y_COORDINATE","YEAR","UPDATEDON","LOCATION"], axis=1)
crime_data.head(2)

Unnamed: 0,ID,PRIMARY_TYPE,DESCRIPTION,LOCATION_DESCRIPTION,ARREST,DOMESTIC,COMMUNITY_AREA_NUMBER,FBICODE,LATITUDE,LONGITUDE
0,3512276,THEFT,FROM BUILDING,SMALL RETAIL STORE,False,False,58.0,6,41.807441,-87.703956
1,3406613,THEFT,$500 AND UNDER,OTHER,False,False,23.0,6,41.89828,-87.716406


I kept only the information relevant to the project: the crimes, their types, and their locations. **FBICODE** is a code that classifies the type of crime.
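
To illustrate how these columns could be turned into the number and types of crimes per community area, here is a small sketch using `pd.crosstab`. The first two records match the preview above; the remaining ones are made up so that the table has something to count.

```python
import pandas as pd

# Toy records: the first two rows come from the preview above, the rest are made up
crimes = pd.DataFrame({
    "PRIMARY_TYPE": ["THEFT", "THEFT", "BATTERY", "THEFT", "BATTERY"],
    "COMMUNITY_AREA_NUMBER": [58.0, 23.0, 23.0, 23.0, 58.0],
})

# Rows are community areas, columns are crime types, cells are counts
crime_counts = pd.crosstab(crimes["COMMUNITY_AREA_NUMBER"], crimes["PRIMARY_TYPE"])

# Total number of crimes per area
crime_counts["TOTAL"] = crime_counts.sum(axis=1)
print(crime_counts)
```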

## Venues in a community area

To obtain the venues in an area, I need a location for that area. I take the location of an area to be the average latitude and longitude of the crimes committed within it.

In [17]:
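# Average the coordinates of all crimes in each community area.
# Selecting the two columns first avoids averaging non-numeric columns.
location = crime_data.groupby("COMMUNITY_AREA_NUMBER")[["LATITUDE", "LONGITUDE"]].mean()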
location.head(2)

COMMUNITY_AREA_NUMBER,LATITUDE,LONGITUDE
1.0,42.013645,-87.673434
2.0,41.995488,-87.699195


I create a dataframe that adds the latitude and longitude of each community area to its name, number, and socioeconomic indicators.

In [19]:
location_name=pd.merge(economic_data,location, on="COMMUNITY_AREA_NUMBER")

In [22]:
location_name.head()

Unnamed: 0,COMMUNITY_AREA_NUMBER,COMMUNITY_AREA_NAME,PERCENT OF HOUSING CROWDED,PERCENT HOUSEHOLDS BELOW POVERTY,PERCENT AGED 16+ UNEMPLOYED,PERCENT AGED 25+ WITHOUT HIGH SCHOOL DIPLOMA,PERCENT AGED UNDER 18 OR OVER 64,PER_CAPITA_INCOME,HARDSHIP_INDEX,LATITUDE,LONGITUDE
0,1.0,Rogers Park,7.7,23.6,8.7,18.2,27.5,23939,39.0,42.013645,-87.673434
1,2.0,West Ridge,7.8,17.2,8.8,20.8,38.5,23040,46.0,41.995488,-87.699195
2,3.0,Uptown,3.8,24.0,8.9,11.8,22.2,35787,20.0,41.965424,-87.653356
3,4.0,Lincoln Square,3.4,10.9,8.2,13.4,25.5,37524,17.0,41.971693,-87.688144
4,5.0,North Center,0.3,7.5,5.2,4.5,26.2,57123,6.0,41.945273,-87.682741
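
`pd.merge` performs an inner join by default, so a community area that is missing from either table is dropped without warning. Chicago has 77 community areas, while the venue query further down prints 70 names, which suggests that some rows were lost at this step. The sketch below shows one way to detect that with a left join and `indicator=True`. The two small tables are toy data: the coordinates are the ones shown above, and Uptown is left out of the coordinates on purpose.

```python
import pandas as pd

economic = pd.DataFrame({
    "COMMUNITY_AREA_NUMBER": [1.0, 2.0, 3.0],
    "COMMUNITY_AREA_NAME": ["Rogers Park", "West Ridge", "Uptown"],
})

# Coordinates indexed by area number, as produced by the groupby; area 3 is left out on purpose
coords = pd.DataFrame(
    {"LATITUDE": [42.013645, 41.995488], "LONGITUDE": [-87.673434, -87.699195]},
    index=pd.Index([1.0, 2.0], name="COMMUNITY_AREA_NUMBER"),
)

# A left join keeps every area; the _merge column records where each row came from
checked = pd.merge(economic, coords, on="COMMUNITY_AREA_NUMBER", how="left", indicator=True)

# Areas that have no coordinates
missing = checked.loc[checked["_merge"] == "left_only", "COMMUNITY_AREA_NAME"].tolist()
print(missing)
```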


I obtain the venues from the Foursquare API using the functions defined in the course.

In [25]:
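import os

# Read the Foursquare credentials from environment variables instead of hardcoding them
# (the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are my own choice)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version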

In [23]:
LIMIT = 50 # maximum number of venues returned by the Foursquare API (originally 100)

radius = 500 # search radius in meters

In [28]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Community area', 
                  'Area Latitude', 
                  'Area Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [29]:
chicago_venues = getNearbyVenues(names=location_name['COMMUNITY_AREA_NAME'],
                                   latitudes=location_name['LATITUDE'],
                                   longitudes=location_name['LONGITUDE']
                                  )

Rogers Park
West Ridge
Uptown
Lincoln Square
North Center
Lake View
Lincoln Park
Near North Side
Norwood Park
Jefferson Park
Forest Glen
Albany Park
Portage Park
Irving Park
Dunning
Montclaire
Belmont Cragin
Hermosa
Avondale
Logan Square
Humboldt park
West Town
Austin
West Garfield Park
East Garfield Park
Near West Side
North Lawndale
South Lawndale
Lower West Side
Loop
Near South Side
Douglas
Fuller Park
Grand Boulevard
Kenwood
Hyde Park
Woodlawn
South Shore
Chatham
Avalon Park
South Chicago
Burnside
Calumet Heights
Roseland
Pullman
South Deering
East Side
West Pullman
Riverdale
Hegewisch
Garfield Ridge
Brighton Park
McKinley Park
Bridgeport
New City
West Elsdon
Gage Park
Clearing
West Lawn
Chicago Lawn
West Englewood
Englewood
Greater Grand Crossing
Ashburn
Auburn Gresham
Beverly
Washington Height
Morgan Park
O'Hare
Edgewater
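
The list comprehension inside `getNearbyVenues` assumes that the response always contains a group and that every venue has at least one category; otherwise it raises an `IndexError` or `KeyError`. The sketch below shows a more defensive way to parse the same fields. `parse_venues` is a hypothetical helper, and `sample_response` is a hand-written dictionary that contains only the fields read by the function, not a complete API response.

```python
def parse_venues(payload):
    """Extract (name, lat, lng, category) tuples from an explore-style response dict."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Fall back to None when a venue has no category instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"], venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Hand-written payload containing only the fields read above (not a full API response)
sample_response = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Pottawattomie Park",
                   "location": {"lat": 42.015112, "lng": -87.676928},
                   "categories": [{"name": "Park"}]}},
        {"venue": {"name": "Uncategorized venue",
                   "location": {"lat": 42.0, "lng": -87.67},
                   "categories": []}},
    ]}]}
}

venues = parse_venues(sample_response)
print(venues)
```

Returning `None` for a missing category keeps the venue in the table, so it can still be counted even though it cannot be assigned to a type.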


In [30]:
print(chicago_venues.shape)
chicago_venues.head()

(1208, 7)


Unnamed: 0,Community area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Rogers Park,42.013645,-87.673434,Taqueria & Restaurant Cd. Hidalgo,42.011634,-87.674484,Mexican Restaurant
1,Rogers Park,42.013645,-87.673434,El Famous Burrito,42.010421,-87.674204,Mexican Restaurant
2,Rogers Park,42.013645,-87.673434,Romanian Kosher Sausage Co.,42.012765,-87.674692,Deli / Bodega
3,Rogers Park,42.013645,-87.673434,Pottawattomie Park,42.015112,-87.676928,Park
4,Rogers Park,42.013645,-87.673434,Bark Place,42.01008,-87.675223,Pet Store


The venues in each area are stored in the chicago_venues dataframe.
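
To compare community areas by the types of venues they contain, one possible next step is to turn this table into one row per area with the share of each venue category. The sketch below does this with `pd.get_dummies` on a toy table. The Rogers Park rows follow the preview above; the West Ridge rows are invented for illustration.

```python
import pandas as pd

# Rogers Park rows follow the preview above; the West Ridge rows are invented
venues = pd.DataFrame({
    "Community area": ["Rogers Park", "Rogers Park", "Rogers Park", "Rogers Park",
                       "West Ridge", "West Ridge"],
    "Venue Category": ["Mexican Restaurant", "Mexican Restaurant", "Deli / Bodega", "Park",
                       "Park", "Pet Store"],
})

# One indicator column per venue category
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Community area", venues["Community area"])

# The mean of the indicators is the share of each category among an area's venues
category_share = onehot.groupby("Community area").mean()
print(category_share.round(2))
```

Using shares rather than raw counts makes areas comparable even when the API returns a different number of venues for each of them.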