# New Orleans Incidents and Venues

In [7]:
import numpy as np
import pandas as pd
!pip install shapely
from shapely.geometry import Polygon, Point, MultiPolygon



## 1. Introduction
---
New Orleans is a beautiful and complicated city in southern Louisiana with a 300-year history. It is known for jazz, Creole and Cajun cuisine, and Mardi Gras. The city has generally been the most populous in the state, but it lost a large share of its residents after Hurricane Katrina in 2005. For 2019, it is estimated to be home to **423,656 people** in **13 districts** with **72 distinct neighborhoods**, for a population density of around 858 persons per square kilometer. Along with its rich culture, however, the city has been troubled by crime, poverty, and flooding. The New Orleans Police Department is estimated to be understaffed by 30-50% depending on the unit [(see this article)](https://www.fox8live.com/story/35820316/new-orleans-police-pay-increase-is-attempt-to-stop-terrible-attrition-rate/ "Police Troubles") and must therefore stretch its resources by whatever means it has. To do so, it needs to know which times and areas are most problematic and what kinds of incidents to expect. This project focuses on the relationship between New Orleans locales and 911 (emergency/crime) incident calls, seeking patterns that could help distribute local police patrols and response units more effectively. It draws on data about venues, neighborhood zones, and 911 calls to explore these patterns. Traffic and miscellaneous incidents are *excluded* from the project.
 
Specifically, we want to answer the **question**:
 * Can we use machine learning to cluster New Orleans areas based on 911 incidents and venue types present?

We will also *explore*:
 * Is there a correlation between venue type occurrence and incident type occurrence?  

The answers may help the following **stakeholders**:
* Local police and emergency responders
* Government officials allocating city resources in business licenses (if certain venue types trend with crime) or police funding

Additionally, if the information from this project is combined with housing data, it could be useful for:
* Real estate investors and developers
* Non-locals looking to move to the area

Considering the problem and stakeholders, we can create a map of New Orleans where each neighborhood is clustered according to frequency of 911 incidents and venue types. This may inform where and what type of emergency resource is most needed and around which venue type(s) the incidents occur. 

## 2. Data
---
### 2.1 Data Sources
---
Data for this project comes from four main sources:

1. [**Calls for Service 2018**](https://data.nola.gov/api/views/9san-ivhk/rows.csv?accessType=DOWNLOAD "Calls for Service 2018 csv")   
This is a log of all 911 calls in New Orleans in 2018. It contains 175 different types of incidents ranging from traffic violations to homicide. Each incident’s time, location, priority, and a range of other factors are listed. This project uses the incidents related to neighborhood safety and crime. Incident types such as traffic, medical, and miscellaneous calls are *excluded*.
2. [**Foursquare API**](https://developer.foursquare.com/docs/api/endpoints "Foursquare endpoints")    
Venues and their types can be retrieved based on geolocations from a given neighborhood’s centroid (listed on Wikipedia). 
3. [**Wikipedia list of Neighborhoods**](https://en.wikipedia.org/wiki/Neighborhoods_in_New_Orleans "NOLA Neighborhoods")   
A table of New Orleans neighborhoods and their centroid coordinates can be scraped from Wikipedia using pandas.
4. [**Neighborhood boundaries**](https://data.nola.gov/api/views/abhb-x4ch/rows.csv?accessType=DOWNLOAD "Neighborhood Boundaries")    
This is a list of New Orleans neighborhoods and their geographic boundaries as defined by the US census. Incidents from Calls for Service will be assigned to neighborhoods based on geolocation using this data.

### 2.2 Data Cleaning
---
The Calls for Service 2018 data is a large table of over 460,000 entries. Many fields contain missing values or content irrelevant to this study. Cleaning the table takes four steps. First, essential columns are identified and the rest are dropped. Second, rows whose incident type does not pertain to neighborhood safety or crime are removed. Third, missing values are dropped or filled. Finally, the columns are renamed.

Scanning the columns in the Calls for Service data set, we see each incident’s police log number, its type and the police code for that type, its initial type and code, time stamps for when the call was created, dispatched, arrived at, and closed, the disposition (what happened when the responders arrived), and several other fields. For this study, the type and location fields are essential. We also keep the block address, zip, beat, and police district for additional location information. One time stamp (TimeCreate) and the type code (Type_) are retained for reference and later analysis.

Summary of Columns | Column Names | Total  
--- | --- | ---  
**Columns kept** | Type_, TypeText, TimeCreate, Beat, BLOCK_ADDRESS, Zip, PoliceDistrict, Location | 8  
**Columns discarded** | NOPD_Item, Priority, InitialType, InitialTypeText, InitialPriority, MapX, MapY, TimeDispatch, TimeArrive, TimeClosed, Disposition, DispositionText, SelfInitiated | 13  
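
The column selection can also be expressed as a keep-list at load time. The following is a minimal sketch, not the notebook's actual loading code (which was removed for sharing): it assumes the data is read with `pd.read_csv`, and the inline CSV is an abridged stand-in for the real download.

```python
import io
import pandas as pd

# Columns the study keeps (from the summary table).
KEEP = ['Type_', 'TypeText', 'TimeCreate', 'Beat',
        'BLOCK_ADDRESS', 'Zip', 'PoliceDistrict', 'Location']

# Abridged stand-in for the real CSV: one sample row, fewer columns than the full file.
sample_csv = io.StringIO(
    'NOPD_Item,Type_,TypeText,Priority,TimeCreate,Beat,BLOCK_ADDRESS,Zip,PoliceDistrict,Location\n'
    'B0894018,107,SUSPICIOUS PERSON,2C,02/08/2018 04:55:15 PM,2K03,'
    '018XX Cambronne St,70118,2,"(29.9543012, -90.12741579)"\n'
)

# usecols loads only the named columns, so discarded fields are never read into the frame.
calls = pd.read_csv(sample_csv, usecols=KEEP)
print(calls.columns.tolist())
```

Naming the columns to keep, rather than those to drop, means an unexpected extra column in a future export is ignored instead of slipping through.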

The next big item for this data set is retaining rows with a relevant type field. As stated before, we want rows that are related to neighborhood safety and crime. Some examples of types to be discarded are traffic related incidents, MEDICAL, and AREA CHECK. For a full list, check below:  

Type Removed | Note  
--- | ---  
'ABANDONED BOAT' | --    
'ABANDONED VEHICLE' | --  
'AIRPLANE CRASH' | --  
'AREA CHECK' | --  
'ATTACHMENT' | --  
'AUTO ACCIDENT' | --  
'AUTO ACCIDENT CITY VEHICLE' | --  
'AUTO ACCIDENT FATALITY' | --  
'AUTO ACCIDENT POLICE VEHICLE' | --  
'AUTO ACCIDENT WITH INJURY' | --  
'BUSINESS CHECK' | --  
'CAD TEST' | --  
'COMPLAINT OTHER' | --  
'DIRECTED PATROL' | --  
'DIRECTED TRAFFIC ENFORCEMENT' | --  
'DISTURBANCE (OTHER)' | May be safety related, but not enough info is available    
'ELECTRONIC MONITORING' | --  
'FLOOD EVENT' | --  
'FLOODED VEHICLE' | --  
'INCIDENT REQUESTED BY ANOTHER AGENCY' | --  
'LOST PROPERTY' | --  
'MEDICAL' | --  
'MEDICAL - NALOXONE' | --  
'MEET AN OFFICER' | --  
'MUNICIPAL ATTACHMENT' | --  
'OFFICER NEEDS ASSISTANCE' | --  
'PARADE ITEM NUMBER' | --  
'RECOVERY OF VEHICLE' | --  
'RESIDENCE CHECK' | --  
'RETURN FOR ADDITIONAL INFO' | --  
'SILENT 911 CALL' | --  
'TAKING TEMPORARY POSSESSION' | --  
'TOW IMPOUNDED VEHICLE (PRIVATE)' | --  
'TRAFFIC ATTACHMENT' | --  
'TRAFFIC CONGESTION' | --  
'TRAFFIC INCIDENT' | --  
'TROOP N AREA - BUSINESS - RESIDENCE CHECK' | --  
'UNDERPASS MONITORING OR CLOSURE' | --  
'WALKING BEAT' | --  
'WARR STOP WITH RELEASE' | --  
**TOTAL REMOVED** | **40**  
**TOTAL RETAINED** | **135**  

One could argue that some retained types should have been removed, or that some removed types should have been kept. Without consulting experts, however, the items above are what we will discard.
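
As a small illustration of this filter, the sketch below removes rows by type name. It uses only four of the 40 removed types and a toy DataFrame, so the variable names and data are assumptions for illustration rather than the notebook's own cell.

```python
import pandas as pd

# A few of the removed types from the table (not the full list of 40).
REMOVED_TYPES = {'AREA CHECK', 'MEDICAL', 'TRAFFIC INCIDENT', 'COMPLAINT OTHER'}

# Toy rows; the type labels appear in this notebook's sample data and tables.
calls = pd.DataFrame({
    'TypeText': ['MENTAL PATIENT', 'COMPLAINT OTHER', 'SUSPICIOUS PERSON',
                 'MEDICAL', 'THEFT'],
})

# Keep rows whose type is not in the removal set.
# .copy() gives an independent frame, so later edits do not act on a slice.
kept = calls[~calls['TypeText'].isin(REMOVED_TYPES)].copy()
print(kept['TypeText'].tolist())
```

Filtering by name keeps the removal list readable and independent of how the type labels happen to sort.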



After filtering the columns and rows, we drop all rows with NaN values in the Location field. Rows whose location is recorded as (0.0, 0.0) are dropped as well. Lastly, columns are renamed as follows:  

Initial Name | Final Name   
--- | ---  
Type_ | Code  
TypeText | Type  
TimeCreate | Time  
Beat | Beat  
BLOCK_ADDRESS | Address  
Zip | Zip  
PoliceDistrict | District  
Location | Coordinates    
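
The last two steps can be sketched on a toy frame as follows. This is an illustrative variant under stated assumptions, not the notebook's own cell: the three rows are abridged from the sample data, and the rename uses a name-to-name mapping (`RENAMES`, a name introduced here) so it does not depend on column order.

```python
import numpy as np
import pandas as pd

# Mapping from the rename table; Beat and Zip keep their names.
RENAMES = {
    'Type_': 'Code', 'TypeText': 'Type', 'TimeCreate': 'Time',
    'BLOCK_ADDRESS': 'Address', 'PoliceDistrict': 'District',
    'Location': 'Coordinates',
}

# Toy rows abridged from the sample data: one (0.0, 0.0), one missing, one valid.
calls = pd.DataFrame({
    'Type_': ['103M', '67C', '107'],
    'TypeText': ['MENTAL PATIENT', 'THEFT FROM EXTERIOR', 'SUSPICIOUS PERSON'],
    'Location': ['(0.0, 0.0)', np.nan, '(29.9543012, -90.12741579)'],
})

# Treat the (0.0, 0.0) placeholder as missing, then drop rows without a location.
calls['Location'] = calls['Location'].replace('(0.0, 0.0)', np.nan)
calls = calls.dropna(subset=['Location']).reset_index(drop=True)

# rename() matches by name; keys that are absent from the frame are ignored by default.
calls = calls.rename(columns=RENAMES)
print(calls)
```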

Data from both Foursquare and Wikipedia is cleaned and formatted as it is retrieved. The neighborhood boundary information is stored as MULTIPOLYGON text from a different kernel. We convert it into a Polygon object from the shapely library in Python. Later, we can pass the coordinates from Calls for Service as points and test whether each one is contained in a polygon, thereby assigning it a neighborhood. The necessary columns from this data set are "the_geom" and "Neighborhood"; "the_geom" is renamed "Poly". Finally, the neighborhood names in the Wikipedia list and the Neighborhood Boundaries data set are reconciled. The only difference is an additional space in "St. Anthony" in the Neighborhood Boundaries data set.  
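
The containment test can be sketched with shapely as follows. The two rectangular "neighborhoods" and the helper `find_neighborhood` are made up for illustration; only `Point`, `Polygon`, and `contains` come from shapely. Note that the Calls for Service coordinates are (latitude, longitude) while the boundary vertices are (longitude, latitude), so the order is swapped when building the point.

```python
from shapely.geometry import Point, Polygon

# Made-up rectangles in (longitude, latitude) order, for illustration only.
hoods = {
    'WEST BOX': Polygon([(-90.2, 29.9), (-90.1, 29.9), (-90.1, 30.0), (-90.2, 30.0)]),
    'EAST BOX': Polygon([(-90.1, 29.9), (-90.0, 29.9), (-90.0, 30.0), (-90.1, 30.0)]),
}

def find_neighborhood(lat, lon, polygons):
    """Return the name of the first polygon containing the point, or None."""
    point = Point(lon, lat)  # shapely expects (x, y) = (longitude, latitude)
    for name, poly in polygons.items():
        if poly.contains(point):
            return name
    return None

print(find_neighborhood(29.9543012, -90.12741579, hoods))  # inside the western box
print(find_neighborhood(29.5, -90.05, hoods))              # outside both boxes
```

`contains` is false for points lying exactly on a boundary, so an incident on a shared edge would be left unassigned by this sketch.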

### 2.3 Data Examples
---
1. **Calls for Service 2018** (before cleaning)

In [4]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,NOPD_Item,Type_,TypeText,Priority,InitialType,InitialTypeText,InitialPriority,MapX,MapY,TimeCreate,...,TimeArrive,TimeClosed,Disposition,DispositionText,SelfInitiated,Beat,BLOCK_ADDRESS,Zip,PoliceDistrict,Location
0,B0918118,103M,MENTAL PATIENT,2C,67,THEFT,1G,37369000,3513814,02/08/2018 08:35:29 PM,...,02/08/2018 08:37:45 PM,02/08/2018 09:57:14 PM,NAT,Necessary Action Taken,N,2S02,087XX S Claiborne Ave,70118.0,2,"(0.0, 0.0)"
1,A1964518,67C,THEFT FROM EXTERIOR,1A,62C,SIMPLE BURGLARY VEHICLE,1G,0,0,01/17/2018 04:10:29 PM,...,01/18/2018 02:34:08 AM,01/18/2018 03:14:11 AM,GOA,GONE ON ARRIVAL,N,7L03,147XX Chef Menteur Hwy,,0,
2,B0853118,21,COMPLAINT OTHER,1J,103,DISTURBANCE (OTHER),1C,3696210,550646,02/08/2018 10:29:36 AM,...,02/08/2018 06:21:40 PM,02/08/2018 06:28:41 PM,NAT,Necessary Action Taken,N,7A01,037XX Downman Rd,70126.0,3,"(30.00763148, -90.02092967)"
3,B0901318,103,DISTURBANCE (OTHER),1C,103,DISTURBANCE (OTHER),1C,3674468,523681,02/08/2018 05:44:10 PM,...,02/08/2018 06:09:17 PM,02/08/2018 06:52:40 PM,NAT,Necessary Action Taken,N,6F02,029XX S Saratoga St,70115.0,6,"(29.93415313, -90.09054327)"
4,B0894018,107,SUSPICIOUS PERSON,2C,107,SUSPICIOUS PERSON,2C,3662712,530883,02/08/2018 04:55:15 PM,...,02/08/2018 05:02:55 PM,02/08/2018 05:09:17 PM,GOA,GONE ON ARRIVAL,N,2K03,018XX Cambronne St,70118.0,2,"(29.9543012, -90.12741579)"


**After cleaning**

In [5]:

# Drop unnecessary columns
callsdf.drop(columns=['NOPD_Item', 'Priority', 'InitialType', 'InitialTypeText', 'InitialPriority', 'MapX', 'MapY', 'TimeDispatch', 'TimeArrive', 'TimeClosed', 'Disposition', 'DispositionText', 'SelfInitiated'], inplace=True)

# Drop unnecessary types: rs holds positions in the alphabetically sorted list of type names
types = sorted(callsdf['TypeText'].unique().tolist())
rs = [0, 1, 18, 19, 24, 25, 26, 27, 28, 29, 38, 39, 42, 51, 52, 55, 60, 67, 68, 88, 92, 93, 94, 96, 101, 105, 109, 119, 121, 122, 128, 153, 159, 160, 161, 162, 163, 169, 173, 174]   # non-crime incidents
removes = [types[i] for i in rs]  # names of traffic, medical, and misc incidents

# Keep everything else; .copy() makes civildf independent of callsdf,
# which avoids SettingWithCopyWarning in the steps below
civildf = callsdf[~callsdf.TypeText.isin(removes)].copy()

# Drop rows with NaN or (0.0, 0.0) values in Location
civildf['Location'] = civildf['Location'].replace('(0.0, 0.0)', np.nan)
civildf.dropna(subset=['Location'], inplace=True)

# Reset the index
civildf.reset_index(drop=True, inplace=True)

# Change location values from strings into [latitude, longitude] lists of floats
civildf['Location'] = civildf['Location'].str.replace('[()]', '', regex=True)
civildf['Location'] = civildf['Location'].apply(lambda s: [float(v) for v in s.split(',')])

# Rename columns
civildf.columns = ['Code', 'Type', 'Time', 'Beat', 'Address', 'Zip', 'District', 'Coordinates']
civildf.head()


Unnamed: 0,Code,Type,Time,Beat,Address,Zip,District,Coordinates
0,107,SUSPICIOUS PERSON,02/08/2018 04:55:15 PM,2K03,018XX Cambronne St,70118.0,2,"[29.9543012, -90.12741579]"
1,107,SUSPICIOUS PERSON,02/08/2018 05:20:45 PM,5I04,Andry St & N Claiborne Ave,70117.0,5,"[29.9668437, -90.0168289]"
2,62B,BUSINESS BURGLARY,02/08/2018 05:19:41 AM,7L01,046XX Michoud Blvd,70129.0,7,"[30.03203358, -89.92870826]"
3,29,DEATH,02/08/2018 10:16:26 AM,6P01,014XX General Taylor St,70115.0,6,"[29.92592877, -90.09646117]"
4,67,THEFT,02/08/2018 10:54:10 AM,5L01,023XX N Tonti St,70117.0,5,"[29.98070722, -90.05459674]"


2. **Foursquare API**  
Venues and their types will be combined with the Wikipedia list of Neighborhoods later.

3. **Wikipedia List of Neighborhoods**

In [19]:
tables = pd.read_html('https://en.wikipedia.org/wiki/Neighborhoods_in_New_Orleans', header=0)
neighborhoods = tables[0]
neighborhoods.head()


Unnamed: 0,Neighborhood,Longitude,Latitude
0,U.S. NAVAL BASE,-90.026093,29.946085
1,ALGIERS POINT,-90.051606,29.952462
2,WHITNEY,-90.042357,29.9472
3,AUDUBON,-90.12145,29.932994
4,OLD AURORA,-90.0,29.92444


4. **Neighborhood Boundaries** (before cleaning)

In [41]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,the_geom,OBJECTID,Neighborhood,Shape_Length,Shape_Area
0,MULTIPOLYGON (((-90.057938033305 29.9380212048...,906,LOWER GARDEN DISTRICT,30616.527778,30395410.0
1,MULTIPOLYGON (((-90.100964619377 29.9171524047...,883,EAST RIVERSIDE,12713.023743,6908365.0
2,MULTIPOLYGON (((-90.029798602096 29.9825104216...,889,FLORIDA DEV,5959.031328,1540363.0
3,MULTIPOLYGON (((-90.112151624385 29.9773204178...,881,DIXON,13225.197641,7050618.0
4,MULTIPOLYGON (((-90.090187615105 29.9286374080...,912,MILAN,16762.463146,14702410.0


**After cleaning**

In [42]:
boundariesdf.columns = ['Poly', 'id', 'Neighborhood', 'length', 'area']  # rename columns
boundariesdf.drop(columns=['id', 'length', 'area'], inplace=True)
df1 = boundariesdf  # alias for the same DataFrame, not a copy
# Strip the MULTIPOLYGON keyword and all parentheses, leaving "lon lat, lon lat, ..."
df1['Poly'] = df1['Poly'].str.replace(r'MULTIPOLYGON|[()]', '', regex=True)
for hood in range(len(df1)):
    pairs = df1.loc[hood, 'Poly'].split(",")  # one "lon lat" string per vertex
    # convert each vertex into a [lon, lat] list of floats
    df1.at[hood, 'Poly'] = [[float(v) for v in pair.split()] for pair in pairs]
# Convert each vertex list into a shapely Polygon. All vertices go into one ring,
# so a boundary with several parts or holes would be flattened.
df1['Poly'] = df1['Poly'].apply(Polygon)
boundariesdf.replace(to_replace='ST.  ANTHONY', value='ST. ANTHONY', inplace=True)
boundariesdf.head()

Unnamed: 0,Poly,Neighborhood
0,"POLYGON ((-90.05793803330501 29.93802120487, -...",LOWER GARDEN DISTRICT
1,"POLYGON ((-90.100964619377 29.917152404751, -9...",EAST RIVERSIDE
2,"POLYGON ((-90.02979860209599 29.982510421634, ...",FLORIDA DEV
3,"POLYGON ((-90.11215162438501 29.977320417804, ...",DIXON
4,"POLYGON ((-90.09018761510499 29.928637408086, ...",MILAN
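
As an alternative to stripping the text by hand, shapely can parse MULTIPOLYGON strings directly with `shapely.wkt.loads`. The sketch below is not the approach used in the cell above; the WKT string is a made-up square in the same layout as "the_geom".

```python
from shapely import wkt
from shapely.geometry import Point

# Made-up WKT in the same MULTIPOLYGON layout as the_geom (one square, no holes).
text = 'MULTIPOLYGON (((-90.1 29.9, -90.0 29.9, -90.0 30.0, -90.1 30.0, -90.1 29.9)))'

shape = wkt.loads(text)  # parses the text into a MultiPolygon object
print(shape.geom_type)
print(len(shape.geoms))  # number of member polygons
print(shape.contains(Point(-90.05, 29.95)))  # containment works on the MultiPolygon itself
```

Parsing the text this way keeps multi-part boundaries and holes intact, at the cost of working with MultiPolygon rather than Polygon objects.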
