## UD's Crime Heat Map

Quick background information:
- I found this website and was interested in scraping the data from it: https://www.crimemapping.com/map/location/19716,%20Newark,%20DE,%20USA?id=dHA9MCNsb2M9NjgzNDc4MSNsbmc9NTQjcGw9MTIyNTQyMSNsYnM9MTQ6MTI4OTkyNw==#

- I converted ALL crime incident reports for zip code 19716 (Newark, DE) and the surrounding 2-mile radius into a CSV file
- This heat map depicts ALL crime incidents from 8-31-2022 to 11-30-2022

Goals for this project:
- assess which areas around UD's campus have the most crime
- perform EDA and data cleaning

In [1]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

In [2]:
df = pd.read_csv("UDEL Crimes.csv")

In [3]:
df1 = df.copy()

In [5]:
df.head(2)

Unnamed: 0,Type,Description,Incident #,Location,Agency,Date,Unnamed: 6,Unnamed: 7
0,,LARCENY/SHOPLIFTING,3122041000.0,100 BLOCK MAIN ST,Newark Police,11/28/22 20:18,,Obtained from: https://www.crimemapping.com/ma...
1,,LARCENY/FROM VEHICLE/NOT ATTACHED,3122041000.0,200 BLOCK MAIN ST,Newark Police,11/28/22 17:09,,two mile radius from zipcode of 19716


In [6]:
df.columns

Index(['Type', 'Description', 'Incident #', 'Location', 'Agency', 'Date',
       'Unnamed: 6', 'Unnamed: 7'],
      dtype='object')

In [7]:
df.drop(["Type", "Unnamed: 6", "Unnamed: 7"], axis = 1, inplace=True)

In [8]:
df.columns

Index(['Description', 'Incident #', 'Location', 'Agency', 'Date'], dtype='object')

In [9]:
#let's create month, day of month, year, and then time column as well
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 653 entries, 0 to 652
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Description  653 non-null    object 
 1   Incident #   625 non-null    float64
 2   Location     653 non-null    object 
 3   Agency       653 non-null    object 
 4   Date         653 non-null    object 
dtypes: float64(1), object(4)
memory usage: 25.6+ KB
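The `info()` output shows `Incident #` stored as `float64`, which is what pandas falls back to when a numeric column has missing values. Since an incident number is an identifier rather than a quantity, one option is to read it as text. The sketch below uses a small made-up CSV (the rows and incident number are hypothetical, not taken from the real file) to show the idea:

```python
import io
import pandas as pd

# Hypothetical two-row sample laid out like the real file (values are made up).
sample_csv = io.StringIO(
    "Description,Incident #,Location,Agency,Date\n"
    "LARCENY/SHOPLIFTING,3122041234,100 BLOCK MAIN ST,Newark Police,11/28/22 20:18\n"
    "LARCENY/FROM VEHICLE,,200 BLOCK MAIN ST,Newark Police,11/28/22 17:09\n"
)

# Reading the identifier as a string keeps every digit and avoids the ".0" suffix;
# missing incident numbers become <NA> instead of NaN.
incidents = pd.read_csv(sample_csv, dtype={"Incident #": "string"})
print(incidents["Incident #"].tolist())
```

The same `dtype` argument could be passed when reading "UDEL Crimes.csv", assuming the column name matches.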


In [10]:
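#"Date" is stored as a string (object), not a datetime, so split it on "/";
#the original "Date" column is overwritten with the day of the month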
df[["Month", "Date", "Year"]] = df["Date"].str.split("/", expand=True)
df.head(2)

Unnamed: 0,Description,Incident #,Location,Agency,Date,Month,Year
0,LARCENY/SHOPLIFTING,3122041000.0,100 BLOCK MAIN ST,Newark Police,28,11,22 20:18
1,LARCENY/FROM VEHICLE/NOT ATTACHED,3122041000.0,200 BLOCK MAIN ST,Newark Police,28,11,22 17:09


In [11]:
df[["Year", "Time"]] = df["Year"].str.split(" ", expand=True)
df.head(2)

Unnamed: 0,Description,Incident #,Location,Agency,Date,Month,Year,Time
0,LARCENY/SHOPLIFTING,3122041000.0,100 BLOCK MAIN ST,Newark Police,28,11,22,20:18
1,LARCENY/FROM VEHICLE/NOT ATTACHED,3122041000.0,200 BLOCK MAIN ST,Newark Police,28,11,22,17:09
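As an alternative to splitting the string twice, the date parts can also be pulled out by parsing the column once with `pd.to_datetime`. This is only a sketch on a hypothetical three-row sample; it assumes every value follows the `MM/DD/YY HH:MM` layout seen above, and the `Day`, `Hour`, and `Weekday` column names are my own:

```python
import pandas as pd

# Hypothetical sample using the "MM/DD/YY HH:MM" layout seen in the Date column.
sample = pd.DataFrame({"Date": ["11/28/22 20:18", "11/28/22 17:09", "10/05/22 09:05"]})

# Parse once into real timestamps; values that don't match the format become NaT.
parsed = pd.to_datetime(sample["Date"], format="%m/%d/%y %H:%M", errors="coerce")

# The .dt accessor exposes each component as a numeric column.
sample["Month"] = parsed.dt.month
sample["Day"] = parsed.dt.day
sample["Year"] = parsed.dt.year
sample["Hour"] = parsed.dt.hour
sample["Weekday"] = parsed.dt.day_name()

print(sample)
```

Working from real timestamps gives integer columns instead of strings, which makes grouping by hour or weekday easier later on.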


In [12]:
df.shape

(653, 8)

In [13]:
df.isnull().sum()

Description     0
Incident #     28
Location        0
Agency          0
Date            0
Month           0
Year            0
Time            0
dtype: int64

In [17]:
#First we need to remove "BLOCK" from each address and then convert the address
#into lat/lon. Quick test with the geocoding package first:

df.iloc[2, 2]

'300 BLOCK PAPER MILL RD'

In [15]:
x = df.iloc[2, 2].lower() + ', Newark, DE 19716, United States'
print(x)

300 block paper mill rd, Newark, DE 19716, United States


In [16]:
from geopy.geocoders import Nominatim
app = Nominatim(user_agent="test")
address = "300 Paper Mill Rd, Newark, DE 19711, United States"
# address = "First St SE, Washington, DC 20004, United States"
location = app.geocode(address).raw
location['lat'], location['lon']

('39.691342', '-75.74942')

In [19]:
#removing "BLOCK" from address:

banned = ["BLOCK"]
f = lambda x: ' '.join([item for item in x.split() if item not in banned])
df["Location"] = df["Location"].apply(f)

In [20]:
df.head(2)

Unnamed: 0,Description,Incident #,Location,Agency,Date,Month,Year,Time
0,LARCENY/SHOPLIFTING,3122041000.0,100 MAIN ST,Newark Police,28,11,22,20:18
1,LARCENY/FROM VEHICLE/NOT ATTACHED,3122041000.0,200 MAIN ST,Newark Police,28,11,22,17:09
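The same cleanup can also be written with pandas' vectorized string methods instead of a lambda. A minimal sketch on a hypothetical sample (the third address is made up to show that only the standalone word is removed):

```python
import pandas as pd

# Hypothetical addresses; the last one checks that "BLOCK" inside a longer word survives.
sample = pd.Series(["100 BLOCK MAIN ST", "300 BLOCK PAPER MILL RD", "BLOCKBUSTER WAY"])

cleaned = (
    sample.str.replace(r"\bBLOCK\b", "", regex=True)  # drop the standalone word
    .str.replace(r"\s+", " ", regex=True)             # collapse the leftover double space
    .str.strip()
)

print(cleaned.tolist())
```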


In [23]:
#some locations cannot be converted automatically, so for now I will use an except clause
#while iterating through the dataset

In [21]:
test_df = df.copy()  #copy so the geocoding columns and row drops below don't alter df

In [24]:
#be aware the code below will take a couple of minutes to run, as it goes through
#the whole dataset converting each address into lat coordinates first, then lon coordinates

In [None]:
data = []
app = Nominatim(user_agent="test")  #create the geocoder once instead of on every row
for value in test_df["Location"]:
    try:
        address = value + ", Newark, DE 19716, United States"
        location = app.geocode(address).raw
        data.append(location['lat'])
    except Exception:
        #geocode() returned None (no match) or the request failed
        data.append("unknown")
test_df["Latitude"] = data

In [None]:
data = []
app = Nominatim(user_agent="test")  #create the geocoder once instead of on every row
for value in test_df["Location"]:
    try:
        address = value + ", Newark, DE 19716, United States"
        location = app.geocode(address).raw
        data.append(location['lon'])
    except Exception:
        #geocode() returned None (no match) or the request failed
        data.append("unknown")
test_df["Longitude"] = data
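The two loops above look up every address twice, once for latitude and once for longitude, and repeat the lookup for addresses that appear in many rows. One possible way to cut the run time is to geocode each unique address once and fill both columns from a cache. The sketch below is not the method used in this notebook: `geocode_locations` and `FakeHit` are hypothetical names, and a fake geocoder stands in for the network call so the example runs offline.

```python
import pandas as pd


def geocode_locations(locations, geocode, suffix=", Newark, DE 19716, United States"):
    """Return a Latitude/Longitude DataFrame aligned with the Series `locations`.

    `geocode` is any callable that takes an address string and returns an object
    with `latitude` and `longitude` attributes, or None when nothing matches.
    """
    cache = {}
    for place in pd.unique(locations):  # each distinct address is looked up once
        try:
            hit = geocode(place + suffix)
        except Exception:
            hit = None  # treat a failed request like "no match"
        if hit is None:
            cache[place] = (float("nan"), float("nan"))
        else:
            cache[place] = (hit.latitude, hit.longitude)
    coords = [cache[place] for place in locations]
    return pd.DataFrame(coords, columns=["Latitude", "Longitude"], index=locations.index)


class FakeHit:
    """Stand-in for a geocoder result (hypothetical, for the offline demo only)."""

    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


calls = []


def fake_geocode(address):
    calls.append(address)
    if address.startswith("300 PAPER MILL RD"):
        return FakeHit(39.691342, -75.74942)
    return None


sample = pd.Series(["300 PAPER MILL RD", "NOWHERE LN", "300 PAPER MILL RD"])
coords = geocode_locations(sample, fake_geocode)
print(coords)
print(len(calls), "lookups for", len(sample), "rows")
```

With geopy, the `geocode` argument could be `Nominatim(...).geocode`, optionally wrapped in `geopy.extra.rate_limiter.RateLimiter` with `min_delay_seconds=1` to space out requests. Unmatched addresses come back as NaN rather than the string "unknown", so they could be removed with `dropna`.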

In [None]:
#saving the df into csv so I don't have to wait for results again:
test_df.to_csv('test_data_processed.csv')

In [None]:
#we need to drop the "unknown" observations for lat/lon before mapping
test_df.drop(test_df.loc[test_df['Latitude']=="unknown"].index, inplace=True)

In [None]:
test_df.drop(test_df.loc[test_df['Longitude']=="unknown"].index, inplace=True)
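Because the geocoder returns coordinates as strings (see the `('39.691342', '-75.74942')` output earlier), the two columns hold text even after the "unknown" rows are gone. A possible one-step alternative is to convert both columns to numbers and drop whatever fails to convert. This is a sketch on a hypothetical sample; the first row's coordinates are made up:

```python
import pandas as pd

# Hypothetical sample: string coordinates plus one row that could not be geocoded.
sample = pd.DataFrame({
    "Location": ["100 MAIN ST", "NOWHERE LN", "300 PAPER MILL RD"],
    "Latitude": ["39.6837", "unknown", "39.691342"],
    "Longitude": ["-75.7497", "unknown", "-75.74942"],
})

# errors="coerce" turns anything non-numeric (such as "unknown") into NaN.
for col in ["Latitude", "Longitude"]:
    sample[col] = pd.to_numeric(sample[col], errors="coerce")

# Keep only the rows where both coordinates are present.
mapped = sample.dropna(subset=["Latitude", "Longitude"]).reset_index(drop=True)
print(mapped)
```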

In [None]:
#for mapping coordinates we will need to convert the df lat/lon columns into an array
#(.copy() so adding a column later doesn't trigger SettingWithCopyWarning)
crime_locations = test_df[["Latitude", "Longitude"]].copy()

In [None]:
#giving every row a constant intensity of .1 (can mess around with this later)

crime_locations["Intensity"] = 0.1
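One way to "mess around" with the intensity is to weight each point by how many incidents share its coordinates, so busy blocks stand out more than a constant .1 allows. The sketch below is an assumption about what could work, not something this notebook does; the coordinates are made up and the `Incidents` column name is hypothetical.

```python
import pandas as pd

# Hypothetical coordinates: three incidents at one spot, one at another.
sample = pd.DataFrame({
    "Latitude": [39.6837, 39.6837, 39.6913, 39.6837],
    "Longitude": [-75.7497, -75.7497, -75.7494, -75.7497],
})

# Count incidents per coordinate pair.
counts = sample.groupby(["Latitude", "Longitude"]).size().reset_index(name="Incidents")

# Scale the counts to the 0-1 range so the busiest location has intensity 1.
counts["Intensity"] = counts["Incidents"] / counts["Incidents"].max()

# A list of [lat, lon, weight] rows, the same shape as the array built in this notebook.
heat_data = counts[["Latitude", "Longitude", "Intensity"]].values.tolist()
print(heat_data)
```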

In [None]:
crime_locations.head()

In [None]:
#converting to array with all data required to map df
crime_locations_array = crime_locations.to_numpy()

In [None]:
#what the array/other info looks like
crime_locations_array

In [None]:
# first attempt at heat map
import folium
from folium import plugins

map_heatmap = folium.Map([39.6749886, -75.7490016], tiles='CartoDB Positron', zoom_start=13)

# plugins.HeatMap(crimes).add_to(map_heatmap)
plugins.HeatMap(crime_locations_array).add_to(map_heatmap)

map_heatmap

In [None]:
#to save map as interactive HTML file:
# map_heatmap.save('map.html')