# COGS 108 - Final Project - Crime Watch Effectiveness and Distribution

# Overview

<span style="background-color: #FFFF00">Fill in later once have a bit more data analysis finished</span>

# Names

- Alexander Schonken
- Jeffrey Chu
- Jennifer Dich
- Owen Clinton

# Group Members IDs

- A13331901
- A13613249
- A14230996
- A13340650

# Research Question

The question our group is trying to answer is: <b>What factors contribute to the types of crime committed in Los Angeles within a specific area throughout the day and night?</b> We also want to address what a data-driven response would look like. One specific example we want to look into is whether street light coverage or a greater police presence contributes to lower overall crime rates in neighborhoods at night. We will use this data to consider whether moving police patrols to areas with less streetlight coverage would lead to a reduction in overall crime.

## Background and Prior Work

The datasets we plan to use combine streetlight data for the city of Los Angeles with as much police data as we can gather: locations where officers are called, the types of crimes committed, arrests made, and where officers consistently stop people. Used together, these datasets give us a more holistic view of the situation we are trying to analyze. The datasets do not all hold the same data in a homogeneous way (e.g. some are missing years), but by increasing the number of datasets we use and inferring connections from the data present, we aim to build a clear picture, so that our call to action on the practices needed to increase the safety of Los Angeles is backed by valid data.

Before starting our exploratory data analysis, we did not know much about the topic beyond the inherent feeling of safety one gets in an area with street lights compared to a dark street, so our starting hypothesis is based mainly on intuition and personal experience. The datasets we have found so far contain much more than location data, which gives us a wide view of this narrow topic; we can then progressively narrow our scope to reveal interesting answers and insightful conclusions.

We weren’t able to find any other projects asking questions about the correlation between streetlights and police officer patrol locations, so our project seems to be fairly unique in the combination of data it uses to reveal insights into public safety. However, there have been projects about streetlights and their effects on the number of crimes per year. One example is the report by Rice University's Kinder Institute for Urban Research, which found it very difficult to recommend that cities add streetlights as a direct way to combat crime. On the other side of the argument is a 2019 research paper by CrimeLab New York, which found a significant drop in index crimes (murder, aggravated assault, and other more "deadly" crimes) in the areas that received lighting improvements. The paper also found that citizens living in those areas were very pleased to have the additional infrastructure in their neighborhoods. So in that study, increased street lighting was associated with less crime, with the side effect of increasing the general happiness of citizens in the area.

Since the research our group has been able to find lands on both sides of the question of whether increased street lighting helps combat crime, our team's question, which focuses more specifically on police patrol positions relative to well-lit areas, should shed some new light on the subject and add insight into an apparently contested matter. Hopefully our project can contribute to the public good of Los Angeles (and by extension other cities) by discovering ways to increase the city's safety without increasing costs for taxpayers.

References (include links):
1) Kinder Institute for Urban Research (Rice University) - What Happens in the Shadows: Streetlights and How They Relate To Crime
https://kinder.rice.edu/sites/g/files/bxs1676/f/documents/Kinder%20Streetlights%20and%20Crime%20report.pdf

2) CrimeLab New York - "Can Street Lighting Reduce Crime?":
https://urbanlabs.uchicago.edu/projects/crime-lights-study

# Hypothesis


We hypothesize that areas with more street light coverage will have less overall crime than areas with less coverage. If so, well-lit areas would need less police monitoring, which would free officers to expand their range to cover areas that are less safe.

# Dataset(s)

*Fill in your dataset information here*

(Copy this information for each dataset)
- Dataset Name:
- Link to the dataset:
- Number of observations:

1-2 sentences describing each dataset. 

If you plan to use multiple datasets, add 1-2 sentences about how you plan to combine these datasets.

# Setup

In [80]:
# Imports

%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import folium
import folium.plugins as plugins
import shapely
import os

# Need to pip install
import altair as alt
import geopandas as gpd

# Configure libraries

# Setup seaborn
sns.set()
sns.set_context('talk')

# Change options for displaying DataFrames
pd.options.display.max_rows = 7
pd.options.display.max_columns = 50

# Round decimals in DataFrames
pd.set_option('display.precision', 2)

# Allow altair to display graphs in notebook
alt.renderers.enable('notebook')

# Setup altair to save graphs to external file
def json_dir(data, data_dir = 'altairdata'):
    # Create the output directory if it does not already exist
    os.makedirs(data_dir, exist_ok = True)
    return alt.pipe(data, alt.to_json(filename = data_dir + '/{prefix}-{hash}.{extension}'))
alt.data_transformers.register('json_dir', json_dir)
alt.data_transformers.enable('json_dir', data_dir = 'mydata')

DataTransformerRegistry.enable('json_dir')

File Paths for all the Datasets

In [2]:
LAArrestDataPath = r'ProjectData/LAArrestData.csv'
LACrimeDataPath = r'ProjectData/LACrimeData.csv'
LAStreetlightLocationPath = r'ProjectData/LAStreetlightLocations.geojson'
LADistrictsPath = r'ProjectData/LADistricts.geojson'
HomeValuePath = r'ProjectData/MedianPricePerSquareFoot.csv'
LAZipCodesPath = r'ProjectData/LAZipCodes.geojson'
LAZipCodePopulationPath = r'ProjectData/LAPopulationbyZipCode.csv'
LACityZipCodesPath = r'ProjectData/LACityZipCodes.geojson'

Initialize all datasets into geopandas and pandas dataframes

In [3]:
arrests = pd.read_csv(LAArrestDataPath)
crimes = pd.read_csv(LACrimeDataPath)
homeValue = pd.read_csv(HomeValuePath)
populations = pd.read_csv(LAZipCodePopulationPath)
streetlights = gpd.read_file(LAStreetlightLocationPath)
districts = gpd.read_file(LADistrictsPath)
zipCodes = gpd.read_file(LAZipCodesPath)
cityzipCodes = gpd.read_file(LACityZipCodesPath)

# Data Cleaning

Drop unused columns and put the crime locations into the same format as the arrest locations

In [4]:
arrests = arrests.drop(['Area ID','Area Name','Charge Group Code',
                        'Arrest Type Code','Charge','Address',
                        'Cross Street','Charge Description'], axis = 1)

In [5]:
crimes['Location'] = '(' + crimes['LAT'].map(str) + ', ' + crimes['LON'].map(str) + ')'

In [6]:
crimes = crimes.drop(['Date Rptd','AREA ','AREA NAME','Part 1-2','Crm Cd',
                      'Mocodes','Premis Cd','Premis Desc','Weapon Used Cd',
                      'Weapon Desc','Status','Crm Cd 1','Crm Cd 2','Crm Cd 3',
                      'Crm Cd 4','LOCATION','Cross Street','LON','LAT'], axis = 1)

In [7]:
arrests.columns = ['Report ID','Date','Time','Reporting District',
                   'Arrest Age','Arrest Sex','Arrest Descent',
                   'Description','Location']
crimes.columns = ['Report ID','Date','Time','Reporting District',
                  'Crime Description','Victim Age','Victim Sex',
                  'Victim Descent','Status','Location']

Drop rows with NaN values in the columns we need

In [8]:
arrests = arrests.dropna(subset = ['Time', 'Arrest Age', 'Arrest Sex', 'Arrest Descent', 'Description'])

In [9]:
crimes = crimes.dropna(subset = ['Time', 'Crime Description', 'Victim Age', 'Victim Sex', 'Victim Descent'])

Convert string location points to geometry for geopandas

In [10]:
def string_to_point(string):
    string_tuple = tuple(map(float, string.replace('(','').replace(')','').split(',')))
    point = shapely.geometry.Point(string_tuple[1],string_tuple[0])
    return point
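The locations are stored as `(lat, lon)` strings, while a point geometry expects x (longitude) first and y (latitude) second, which is why `string_to_point` swaps the two values. A minimal library-free sketch of the same parsing step, where `parse_location` is a hypothetical helper name and not part of the notebook:

```python
def parse_location(text):
    """Parse a '(lat, lon)' string and return (lon, lat), i.e. (x, y) order."""
    lat, lon = (float(part) for part in text.strip().strip('()').split(','))
    return lon, lat


# Sample string taken from the arrests output in this notebook.
print(parse_location('(34.0508, -118.4592)'))
```

Returning the pair in (x, y) order mirrors the argument order used when the point is built.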

In [11]:
arrestsgeometry = arrests['Location'].apply(string_to_point)
crimesgeometry = crimes['Location'].apply(string_to_point)

Add zip codes to the crimes, arrests, and streetlights dataframes

In [12]:
arrests_geo =  gpd.GeoDataFrame(arrests['Report ID'], geometry = arrestsgeometry)
crimes_geo = gpd.GeoDataFrame(crimes['Report ID'], geometry = crimesgeometry)

In [13]:
arrests_geo.crs = zipCodes.crs
crimes_geo.crs = zipCodes.crs

In [14]:
arrestswithzip = gpd.sjoin(arrests_geo,zipCodes, how = 'inner', op = 'within')
crimeswithzip = gpd.sjoin(crimes_geo, zipCodes, how = 'inner', op = 'within')
streetlights = gpd.sjoin(streetlights, zipCodes, how = 'inner', op = 'within')
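Conceptually, a `within` spatial join assigns each point to the polygon that contains it, and an inner join drops points that fall in no polygon. The sketch below illustrates that idea with a plain ray-casting test. It is not how geopandas implements the join, and the function names and the toy square "zip codes" are made up for illustration.

```python
def point_in_polygon(x, y, polygon):
    """Return True if (x, y) lies inside polygon, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_zone(point, zones):
    """Return the name of the first zone containing point, else None."""
    for name, polygon in zones.items():
        if point_in_polygon(point[0], point[1], polygon):
            return name
    return None


# Two made-up square "zip code" polygons in (lon, lat) order.
toy_zones = {
    'A': [(0, 0), (1, 0), (1, 1), (0, 1)],
    'B': [(1, 0), (2, 0), (2, 1), (1, 1)],
}
print(assign_zone((0.5, 0.5), toy_zones))
print(assign_zone((1.5, 0.25), toy_zones))
print(assign_zone((3.0, 3.0), toy_zones))  # outside every zone
```

A point that returns `None` here corresponds to a row that an inner join would discard.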

In [15]:
arrestsmergeframe = pd.DataFrame(arrestswithzip)
crimesmergeframe = pd.DataFrame(crimeswithzip)

In [16]:
arrests = pd.merge(arrests, arrestsmergeframe, on = 'Report ID')
crimes = pd.merge(crimes, crimesmergeframe, on = 'Report ID')

In [17]:
arrests = arrests.drop(['OBJECTID','index_right'], axis = 1)
crimes = crimes.drop(['OBJECTID','index_right'], axis = 1)

In [18]:
streetlightsdf = pd.DataFrame(streetlights)

Extract the hour of day from the HHMM-formatted time values

In [19]:
def gethour(integer):
    return int(integer/100)

In [20]:
arrests['Time'] = arrests['Time'].astype(int)

In [21]:
crimes['Hour'] = crimes['Time'].apply(gethour)
arrests['Hour'] = arrests['Time'].apply(gethour)
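The times are stored as HHMM numbers (for example `1830` for 6:30 pm), so dividing by 100 and discarding the remainder gives the hour. A small sketch of that conversion using integer division, where `hour_from_hhmm` is a hypothetical name used only for this illustration:

```python
def hour_from_hhmm(value):
    """Return the hour (0-23) from a time stored as an HHMM number, e.g. 1830."""
    return int(value) // 100


# 5 stands for 00:05, which has no leading digits once stored as a number.
for raw in (1040, 615, 2315, 5):
    print(raw, '->', hour_from_hhmm(raw))
```

For the non-negative values that occur here, this gives the same result as `int(value / 100)`.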

Make a list of unique zipcodes

In [22]:
crimesarray = crimes['ZIPCODE'].unique()
arrestsarray = arrests['ZIPCODE'].unique()

In [23]:
zipCodeArray = list(set(crimesarray) | set(arrestsarray))

In [24]:
zipCodedf = pd.DataFrame(zipCodeArray, columns = ['Zip'])

Keep housing and population data only for zip codes where crimes or arrests occurred

In [25]:
homeValue['Zip'] = homeValue['RegionName'].astype(str)

In [26]:
populations['Zip'] = populations['Zip Code'].astype(str)

In [27]:
homeValue = pd.merge(zipCodedf, homeValue, how = 'left', on = 'Zip')

In [28]:
populations = pd.merge(zipCodedf, populations, how = 'left', on = 'Zip')

In [29]:
homeValue = homeValue.fillna(0)

In [30]:
populations = populations.fillna(0)
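The merges above only line up if both sides store the zip code as the same type, which is why the `Zip` columns are cast to strings first. The sketch below mimics a left join followed by filling the gaps with 0, using plain dictionaries. The helper name `left_join_fill` and the population figures are made up for illustration.

```python
def left_join_fill(keys, lookup, fill=0):
    """Return {key: lookup[key]} for every key, using fill when key is absent."""
    return {key: lookup.get(key, fill) for key in keys}


# Hypothetical values; the lookup keys start out as integers.
raw_population = {90025: 42000, 90033: 49000}
population = {str(zip_code): count for zip_code, count in raw_population.items()}

zip_codes = ['90025', '90033', '91343']
print(left_join_fill(zip_codes, raw_population))  # integer keys never match
print(left_join_fill(zip_codes, population))      # string keys match
```

One consequence of filling with 0 is that a zip code with no data looks the same as one whose value really is zero, which matters when reading the popups later.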

Add each zip code's area in square kilometers to the city zip code geodataframe

In [42]:
cityzipCodes['Area'] = cityzipCodes['geometry'].to_crs({'init': 'epsg:3395'}).map(lambda p: p.area / 10**6)
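EPSG:3395 is a projected coordinate system measured in metres, so `p.area` is in square metres and dividing by `10**6` converts it to square kilometres. The sketch below shows the same unit conversion on a made-up rectangle, computing the polygon area with the shoelace formula. The function name is hypothetical.

```python
def polygon_area_m2(vertices):
    """Shoelace formula: area of a simple polygon given (x, y) vertices in metres."""
    total = 0.0
    n = len(vertices)
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# A made-up 2 km by 3 km rectangle, with coordinates in metres.
rectangle = [(0, 0), (2000, 0), (2000, 3000), (0, 3000)]
area_km2 = polygon_area_m2(rectangle) / 10**6
print(area_km2)
```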

# Data Analysis & Results

- 3 maps 
- 1 choropleth with counts popup including total, population, #streetlights, house value
- 1 heatmap with time
- 1 choropleth with popup graphs

- statistical analysis of streetlights
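One possible form the statistical analysis of streetlights could take is a correlation between streetlight density and crime rate across zip codes. The sketch below is only an assumption about what such a check might look like, not the analysis this project has settled on. The function name and all values are made up and say nothing about the real result.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up per-zip-code values, for illustration only.
lights_per_km2 = [10.0, 20.0, 30.0, 40.0]
crimes_per_1000 = [8.0, 6.0, 4.0, 2.0]
r = pearson_r(lights_per_km2, crimes_per_1000)
print(round(r, 3))
```

Normalizing by area and population before correlating avoids simply rediscovering that larger zip codes have more of everything.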

We start with a few mapped visualizations of our data.

To do this, we first tally the number of crimes committed and arrests made in each zip code.

In [32]:
crimeCountDict = crimes['ZIPCODE'].value_counts()
arrestCountDict = arrests['ZIPCODE'].value_counts()
streetlightsDict = streetlightsdf['ZIPCODE'].value_counts()

In [33]:
crimeCounts = pd.DataFrame.from_dict(crimeCountDict).reset_index()
arrestCounts = pd.DataFrame.from_dict(arrestCountDict).reset_index()
streetlightsCounts = pd.DataFrame.from_dict(streetlightsDict).reset_index()

In [34]:
crimeCounts.columns = ['Zip','Crime Count']
arrestCounts.columns = ['Zip', 'Arrest Count']
streetlightsCounts.columns = ['Zip', 'Streetlight Count']

In [35]:
counts = pd.merge(crimeCounts, arrestCounts, how = 'outer', on = 'Zip')
counts = pd.merge(counts, streetlightsCounts, how = 'left', on = 'Zip')

In [36]:
counts = counts.fillna(0)

In [37]:
counts['Total'] = counts['Crime Count'] + counts['Arrest Count']
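The tallying steps above amount to counting reports per zip code, keeping every zip code that appears in either tally (an outer join), treating missing counts as 0, and adding the two. A small library-free sketch of the same logic with made-up zip code lists:

```python
from collections import Counter

# Made-up zip codes, one entry per crime report or arrest.
crime_zips = ['90025', '90025', '90033']
arrest_zips = ['90033', '91343']

crime_counts = Counter(crime_zips)
arrest_counts = Counter(arrest_zips)

# Outer join: keep every zip code that appears in either tally.
totals = {}
for zip_code in sorted(set(crime_counts) | set(arrest_counts)):
    crimes_n = crime_counts.get(zip_code, 0)    # missing -> 0, like fillna(0)
    arrests_n = arrest_counts.get(zip_code, 0)
    totals[zip_code] = {'Crime Count': crimes_n,
                        'Arrest Count': arrests_n,
                        'Total': crimes_n + arrests_n}

for zip_code, row in totals.items():
    print(zip_code, row)
```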

We then map out the zip codes and add a popup to each one that displays its data

In [38]:
choroplethPopup = folium.Map([34.055862, -118.326904])
zipcodeLayer = folium.FeatureGroup(name = 'Zip Codes')
transparent = {'fillColor': '#00000000', 'color': '#00000000'}
folium.Choropleth(
    geo_data = cityzipCodes,
    name = 'Choropleth',
    data = counts,
    columns = ['Zip', 'Total'],
    key_on = 'feature.properties.ZIPCODE',
    fill_color = 'BuPu',
    fill_opacity = 0.7,
    line_opacity = 0.2,
    legend_name = 'Total Crimes').add_to(choroplethPopup)

for i in zipCodeArray:
    datarow = counts.where(counts['Zip'] == i).dropna().iloc[0]
    gs = folium.GeoJson(zipCodes.loc[zipCodes['ZIPCODE'] == i], style_function = lambda x: transparent)
    zipCodelabel = i
    crimeslabel = datarow['Crime Count']
    arrestslabel = datarow['Arrest Count']
    totallabel = datarow['Total']
    populationlabel = populations.where(populations['Zip'] == i).dropna().iloc[0]['Total Population']
    homeValuelabel = int(homeValue.where(homeValue['Zip'] == i).dropna().iloc[0]['2019-10'])
    streetlightslabel = datarow['Streetlight Count']
    popuphtml = """
            <html>
                <body>
                    <h1>%s</h1>
                    <p>Number of Crimes Committed: %s</p>
                    <p>Number of Arrests Made: %s</p>
                    <p>Total: %s</p>
                    <p>Median Home Price Per Square Foot: %s</p>
                    <p>Population: %s</p>
                    <p>Number of Streetlights: %s</p>
                </body>
            </html>"""%(zipCodelabel, str(crimeslabel), str(arrestslabel), str(totallabel), 
                        str(homeValuelabel), str(populationlabel), str(streetlightslabel))
    folium.Popup(popuphtml,max_width = 250).add_to(gs)
    gs.add_to(zipcodeLayer)
choroplethPopup.add_child(zipcodeLayer)
folium.LayerControl().add_to(choroplethPopup)

<folium.map.LayerControl at 0x22823daaa08>

In [138]:
choroplethPopup.save('LabeledChoropleth.html')

Next we move on to a dynamic heatmap of crime activity

In [64]:
crimesHeatMap = crimes[['Report ID', 'Location', 'Hour']]
arrestsHeatMap = arrests[['Report ID', 'Location', 'Hour']]

In [65]:
heatMap  = pd.merge(crimesHeatMap, arrestsHeatMap, how = 'outer', on = ['Report ID', 'Location', 'Hour'])

In [106]:
heatMap['Hour'] = heatMap['Hour'].replace(24.0, 0.0)

In [107]:
heatMap[['Lat', 'Lon']] = heatMap['Location'].str.split(',', expand = True)

In [108]:
# regex = False treats the parentheses as literal characters
heatMap['Lat'] = heatMap['Lat'].str.replace('(','', regex = False).astype(float)
heatMap['Lon'] = heatMap['Lon'].str.replace(')','', regex = False).astype(float)

In [109]:
heatMap['count'] = 1

In [110]:
hourList = []
for i in heatMap.Hour.sort_values().unique():
    hourList.append(heatMap.loc[heatMap.Hour == i, ['Lat', 'Lon', 'count']].groupby(['Lat', 'Lon']).sum().reset_index().values.tolist())
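The loop above builds one list per hour, and each inner entry is a `[lat, lon, count]` triple for a distinct location. The sketch below reproduces that nested structure without pandas so the shape is easier to see. The function name and the events are made up for illustration.

```python
from collections import defaultdict


def build_hour_frames(events):
    """Group (hour, lat, lon) events into one [lat, lon, count] list per hour."""
    per_hour = defaultdict(lambda: defaultdict(int))
    for hour, lat, lon in events:
        per_hour[hour][(lat, lon)] += 1
    frames = []
    for hour in sorted(per_hour):
        frames.append([[lat, lon, count]
                       for (lat, lon), count in sorted(per_hour[hour].items())])
    return frames


# Made-up events as (hour, lat, lon).
toy_events = [
    (6, 34.09, -118.33),
    (18, 34.04, -118.20),
    (18, 34.04, -118.20),
    (18, 34.05, -118.45),
]
frames = build_hour_frames(toy_events)
print(frames)
```

Note that a frame is only created for hours that have at least one event, so the position of a frame in the list matches the hour of day only when every hour is present.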

In [136]:
timeHeatMap = folium.Map([34.055862, -118.326904])
plugins.HeatMapWithTime(
    hourList,
    radius = 5,
    gradient = {0.2: 'blue', 0.4: 'lime', 0.6: 'orange', 1: 'red'},
    min_opacity = 0.5,
    max_opacity = 0.8,
    use_local_extrema = True
    ).add_to(timeHeatMap)

<folium.plugins.heat_map_withtime.HeatMapWithTime at 0x22930dfc848>

In [139]:
timeHeatMap.save('HeatMap.html')

Now we make charts of our data and put them on the map

In [None]:
chartDict = {}
for i in cityzipCodes['ZIP']:
    pass  # TODO: build the chart for each zip code

In [143]:
arrests

| | Report ID | Date | Time | Reporting District | Arrest Age | Arrest Sex | Arrest Descent | Description | Location | geometry | ZIPCODE | Hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5614161 | 04/29/2019 | 1040 | 842 | 41 | M | H | Robbery | (34.0508, -118.4592) | POINT (-118.45920 34.05080) | 90025 | 10 |
| 1 | 5806609 | 11/23/2019 | 1830 | 457 | 23 | F | H | Robbery | (34.0477, -118.2047) | POINT (-118.20470 34.04770) | 90033 | 18 |
| 2 | 5615197 | 04/30/2019 | 615 | 663 | 27 | M | O | Burglary | (34.0907, -118.3384) | POINT (-118.33840 34.09070) | 90038 | 6 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1226136 | 5612940 | 04/27/2019 | 2315 | 1797 | 42 | M | H | Aggravated Assault | (34.2248, -118.4967) | POINT (-118.49670 34.22480) | 91343 | 23 |
| 1226137 | 5612654 | 04/27/2019 | 1555 | 1832 | 38 | F | H | Aggravated Assault | (33.942, -118.2739) | POINT (-118.27390 33.94200) | 90003 | 15 |
| 1226138 | 5612691 | 04/27/2019 | 1545 | 1203 | 32 | F | B | Aggravated Assault | (33.9994, -118.3108) | POINT (-118.31080 33.99940) | 90062 | 15 |


# Ethics & Privacy

*Fill in your ethics & privacy discussion here*

# Conclusion & Discussion

*Fill in your discussion information here*