# Crimes by District Area

## New data

To represent the crimes committed in each district area, we use two additional datasets: [**police_districts.geojson**](https://data.cityofchicago.org/Public-Safety/Boundaries-Police-Districts-current-/fthy-xz3r) and [**police_stations.csv**](https://www.chicago.gov/city/en/depts/cpd/dataset/police_stations.html).

The **police_districts.geojson** file contains information for the 25 district areas in the city of Chicago. Each row has the following information:

| Attribute | Description | Domain |
| --- | --- | --- |
| *dist_label* | Label of the district area | String |
| *dist_num* | Number identifier for the district area | String |
| *geometry* | Geometric representation of the district area | Geometry |

The **police_stations.csv** dataset provides information on the current 23 police stations in Chicago:

| Attribute | Description | Domain |
| --- | --- | --- |
| *DISTRICT* | Number identifier for the district area where the station is located | String |
| *DISTRICT NAME* | Name of the district area where the station is located | String |
| *ADDRESS* | Street address of the police station | String |
| *CITY* | City where the police station is located | String |
| *STATE* | State abbreviation where the police station is located | String |
| *ZIP* | ZIP code of the station's location | Integer |
| *WEBSITE* | URL of the official website for the police station or district | String |
| *PHONE* | Main contact phone number for the station | String |
| *FAX* | Fax number for the station | String |
| *TTY* | Teletypewriter (TTY) number for text communication assistance | String |
| *X COORDINATE* | X coordinate for the station location in the local grid | Float |
| *Y COORDINATE* | Y coordinate for the station location in the local grid | Float |
| *LATITUDE* | Latitude coordinate of the station location | Float |
| *LONGITUDE* | Longitude coordinate of the station location | Float |
| *LOCATION* | Coordinates of the station in (latitude, longitude) format | String |


## District areas

A preprocessing step is needed first to identify which community areas belong to each district area.

In [1]:
import requests
import json

response = requests.get("https://gitlab.com/drvicsana/crime-milp-project/-/raw/main/data/stations.json")
stations = json.loads(response.text)

response = requests.get("https://gitlab.com/drvicsana/crime-milp-project/-/raw/main/geojson/raw_community_areas.json")
community_areas = json.loads(response.text)


In [2]:
import pandas as pd

df_sta = pd.read_csv("data/police_stations.csv")
df_sta = df_sta[['DISTRICT', 'DISTRICT NAME', 'ADDRESS']]

df_sta.rename(columns={
    'DISTRICT': 'dist_num',
    'DISTRICT NAME': 'dist_name', 
    'ADDRESS': 'station_name'
}, inplace=True)

# Assigning '0' as the unofficial number of the Headquarters district
df_sta.loc[df_sta['dist_num'] == 'Headquarters', 'dist_num'] = '0'

print(f'Number of Districts:\n{len(df_sta)}')
df_sta.head()

Number of Districts:
23


Unnamed: 0,dist_num,dist_name,station_name
0,0,Headquarters,3510 S Michigan Ave
1,18,Near North,1160 N Larrabee St
2,19,Town Hall,850 W Addison St
3,20,Lincoln,5400 N Lincoln Ave
4,22,Morgan Park,1900 W Monterey Ave


In [3]:
import geopandas as gpd

gdf_dist = gpd.read_file('data/police_districts.geojson')
gdf_dist = gdf_dist[gdf_dist['dist_num'] != '31']

gdf_dist.drop(columns='dist_label', inplace=True)

df_sta = df_sta.merge(gdf_dist, on=['dist_num'], how='outer')

print(f'Number of Districts:\n{len(df_sta)}')
df_sta.head()

Number of Districts:
23


Unnamed: 0,dist_num,dist_name,station_name,geometry
0,0,Headquarters,3510 S Michigan Ave,
1,1,Central,1718 S State St,"MULTIPOLYGON (((-87.62437 41.88886, -87.62419 ..."
2,10,Ogden,3315 W Ogden Ave,"MULTIPOLYGON (((-87.68604 41.86661, -87.68603 ..."
3,11,Harrison,3151 W Harrison St,"MULTIPOLYGON (((-87.70679 41.90283, -87.70679 ..."
4,12,Near West,1412 S Blue Island Ave\n,"MULTIPOLYGON (((-87.65742 41.90351, -87.65739 ..."
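
The outer merge above keeps stations that have no matching district boundary, which is why the **Headquarters** row ends up with an empty `geometry`. As a minimal sketch (using small hypothetical frames in place of `df_sta` and `gdf_dist`, with plain strings standing in for geometries), the `indicator` option of `merge` can make such unmatched rows explicit:

```python
import pandas as pd

# Hypothetical miniature versions of df_sta and gdf_dist
stations_demo = pd.DataFrame({
    "dist_num": ["0", "1", "10"],
    "dist_name": ["Headquarters", "Central", "Ogden"],
})
districts_demo = pd.DataFrame({
    "dist_num": ["1", "10"],
    "geometry": ["MULTIPOLYGON A", "MULTIPOLYGON B"],  # placeholders, not real geometries
})

# indicator=True adds a `_merge` column telling which side each key came from
merged_demo = stations_demo.merge(
    districts_demo, on="dist_num", how="outer", indicator=True
)

# Rows marked 'left_only' are stations without a district boundary
unmatched = merged_demo.loc[merged_demo["_merge"] == "left_only", "dist_name"].tolist()
print(unmatched)
```

In this toy example only the Headquarters row is flagged, mirroring the missing geometry seen in the output above.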


In [4]:
df_dist = pd.DataFrame(stations)
df_dist = df_dist.merge(df_sta, on=['station_name'], how='outer')
df_dist.drop(columns=['longitude', 'latitude'], inplace=True)

print(f'Number of Districts:\n{len(df_dist)}')
df_dist.head()

Number of Districts:
23


Unnamed: 0,station_name,community_area,dist_num,dist_name,geometry
0,1160 N Larrabee St,NEAR NORTH SIDE,18,Near North,"MULTIPOLYGON (((-87.63068 41.92623, -87.6296 4..."
1,1412 S Blue Island Ave\n,NEAR WEST SIDE,12,Near West,"MULTIPOLYGON (((-87.65742 41.90351, -87.65739 ..."
2,1438 W 63rd St,WEST ENGLEWOOD,7,Englewood,"MULTIPOLYGON (((-87.63076 41.7942, -87.63076 4..."
3,1718 S State St,NEAR SOUTH SIDE,1,Central,"MULTIPOLYGON (((-87.62437 41.88886, -87.62419 ..."
4,1900 W Monterey Ave,MORGAN PARK,22,Morgan Park,"MULTIPOLYGON (((-87.63632 41.73618, -87.63592 ..."
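
Note that the address of district 12 carries a trailing `\n` in the outputs above. The merge on `station_name` only matches when both sources contain exactly the same string, so stray whitespace is a potential source of silent mismatches. The following sketch, built on hypothetical frames rather than the real data, shows how stripping whitespace from the key before merging could guard against this:

```python
import pandas as pd

# Hypothetical frames: one key carries a trailing newline
left_demo = pd.DataFrame({
    "station_name": ["1412 S Blue Island Ave"],
    "community_area": ["NEAR WEST SIDE"],
})
right_demo = pd.DataFrame({
    "station_name": ["1412 S Blue Island Ave\n"],
    "dist_num": ["12"],
})

# Without cleaning, the keys differ and the outer merge yields two half-empty rows
raw = left_demo.merge(right_demo, on="station_name", how="outer")

# Stripping surrounding whitespace on both sides makes the keys comparable
left_clean = left_demo.assign(station_name=left_demo["station_name"].str.strip())
right_clean = right_demo.assign(station_name=right_demo["station_name"].str.strip())
clean = left_clean.merge(right_clean, on="station_name", how="outer")

print(len(raw), len(clean))
```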


In [5]:
# Converting to GeoDataFrame to be able to represent the districts in a map
gdf_dist = gpd.GeoDataFrame(df_dist, geometry='geometry', crs="EPSG:4326")

# Dropping rows with missing values (Headquarters has no geometry) so every row can be drawn
gdf_dist.dropna(inplace=True)

print(f'Number of Districts:\n{len(gdf_dist)}')
gdf_dist.head()

Number of Districts:
22


Unnamed: 0,station_name,community_area,dist_num,dist_name,geometry
0,1160 N Larrabee St,NEAR NORTH SIDE,18,Near North,"MULTIPOLYGON (((-87.63068 41.92623, -87.6296 4..."
1,1412 S Blue Island Ave\n,NEAR WEST SIDE,12,Near West,"MULTIPOLYGON (((-87.65742 41.90351, -87.65739 ..."
2,1438 W 63rd St,WEST ENGLEWOOD,7,Englewood,"MULTIPOLYGON (((-87.63076 41.7942, -87.63076 4..."
3,1718 S State St,NEAR SOUTH SIDE,1,Central,"MULTIPOLYGON (((-87.62437 41.88886, -87.62419 ..."
4,1900 W Monterey Ave,MORGAN PARK,22,Morgan Park,"MULTIPOLYGON (((-87.63632 41.73618, -87.63592 ..."
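
Calling `dropna()` without arguments removes a row if *any* column is missing, not only `geometry`. A possible refinement, sketched here on a hypothetical frame with a placeholder gap in another column, is to restrict the check to the column that matters for the map:

```python
import pandas as pd

# Hypothetical frame: strings stand in for geometries
demo = pd.DataFrame({
    "dist_num": ["0", "18", "12"],
    "community_area": ["DOUGLAS", "NEAR NORTH SIDE", None],  # made-up gap in an unrelated column
    "geometry": [None, "MULTIPOLYGON A", "MULTIPOLYGON B"],
})

# Drops every row with a missing value in any column
any_missing = demo.dropna()

# Drops only the rows whose geometry is missing
geometry_missing = demo.dropna(subset=["geometry"])

print(len(any_missing), len(geometry_missing))
```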


The map below shows both the districts (red) and the community areas (blue). Hovering over or clicking on a polygon displays additional information about it.

A static label with the district number is also placed at the centroid of each district area to make districts easier to identify.

In [6]:
import folium

m = folium.Map(location=[41.840019, -87.691057], zoom_start=11, width="100%", tiles="CartoDB positron")

# Adding districts
for _, polygon in gdf_dist.iterrows():
    geo_j = folium.GeoJson(
        data=polygon['geometry'],
        tooltip=folium.Tooltip(
            f"<b>District:</b> {polygon['dist_num']}<br>"
            f"<b>Name:</b> {polygon['dist_name']}"
        ),
        popup=folium.Popup(
            f"<b>District:</b> {polygon['dist_num']}<br>"
            f"<b>Name:</b> {polygon['dist_name']}",
            max_width=250
        ),
        style_function=lambda feature: {
            'fillColor': '#ff0000',
            'color': '#ff0000',
            'weight': 3,
            'fillOpacity': 0.1
        },
        highlight_function=lambda feature: {
            'fillColor': '#ffffff',
            'color': '#cc0000',
            'weight': 4,
            'fillOpacity': 1
        }
    )
    geo_j.add_to(m)

    centroid = polygon['geometry'].centroid
    folium.Marker(
        location=[centroid.y, centroid.x],
        popup=f"District: {polygon['dist_num']}",
        icon=folium.DivIcon(
            html=(
                f'<div style="font-size: 12pt; color: black; font-weight: bold;">'
                f'{polygon["dist_num"]}'
                f'</div>'
            )
        )
    ).add_to(m)

# Adding community areas
for polygon in community_areas["features"]:
    geo_j = folium.GeoJson(
        data=polygon,
        tooltip=folium.Tooltip(
            f"<b>Name:</b> {polygon['properties']['community_area']}<br>"
            f"<b>Population:</b> {polygon['properties']['population']} hab.<br>"
            f"<b>Area:</b> {polygon['properties']['area']} km²"
        ),
        popup=folium.Popup(
            f"<b>Name:</b> {polygon['properties']['community_area']}<br>"
            f"<b>Population:</b> {polygon['properties']['population']} hab.<br>"
            f"<b>Area:</b> {polygon['properties']['area']} km²",
            max_width=250
        ),
        style_function=lambda feature: {
            'fillColor': '#0066a2',
            'color': '#0066a2',
            'weight': 1,
            'fillOpacity': 0.1
        },
        highlight_function=lambda feature: {
            'fillColor': '#ffffff',
            'color': '#003366',
            'weight': 2,
            'fillOpacity': 1
        }
    )
    geo_j.add_to(m)

folium.LayerControl().add_to(m)

m

Using the map, we manually identified the corresponding district for each community area. We have designated **DOUGLAS** as the community area for the **Headquarters** district since it was originally assigned to this district. The distribution is as follows:

- **District 0**: DOUGLAS
- **District 1**: NEAR SOUTH SIDE, LOOP
- **District 2**: FULLER PARK, GRAND BOULEVARD, HYDE PARK, KENWOOD, OAKLAND, WASHINGTON PARK
- **District 3**: WOODLAWN, GREATER GRAND CROSSING, SOUTH SHORE
- **District 4**: SOUTH DEERING, AVALON PARK, BURNSIDE, CALUMET HEIGHTS, EAST SIDE, HEGEWISCH, SOUTH CHICAGO
- **District 5**: PULLMAN, RIVERDALE, ROSELAND, WEST PULLMAN
- **District 6**: AUBURN GRESHAM, CHATHAM
- **District 7**: WEST ENGLEWOOD, ENGLEWOOD
- **District 8**: CHICAGO LAWN, ARCHER HEIGHTS, ASHBURN, CLEARING, GARFIELD RIDGE, WEST ELSDON, WEST LAWN
- **District 9**: BRIDGEPORT, ARMOUR SQUARE, BRIGHTON PARK, GAGE PARK, MCKINLEY PARK, NEW CITY
- **District 10**: NORTH LAWNDALE, SOUTH LAWNDALE
- **District 11**: EAST GARFIELD PARK, HUMBOLDT PARK, WEST GARFIELD PARK
- **District 12**: NEAR WEST SIDE, LOWER WEST SIDE, WEST TOWN
- **District 14**: LOGAN SQUARE, AVONDALE
- **District 15**: AUSTIN
- **District 16**: JEFFERSON PARK, DUNNING, EDISON PARK, FOREST GLEN, NORWOOD PARK, OHARE, PORTAGE PARK
- **District 17**: ALBANY PARK, IRVING PARK, NORTH PARK
- **District 18**: NEAR NORTH SIDE, LINCOLN PARK
- **District 19**: LAKE VIEW, NORTH CENTER
- **District 20**: LINCOLN SQUARE, EDGEWATER, UPTOWN
- **District 22**: MORGAN PARK, BEVERLY, MOUNT GREENWOOD, WASHINGTON HEIGHTS
- **District 24**: ROGERS PARK, WEST RIDGE
- **District 25**: BELMONT CRAGIN, HERMOSA, MONTCLARE
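
The manual assignment above could be cross-checked programmatically. The sketch below is one possible approach, not the method used in this notebook: it assigns each community area to the district that contains its centroid, using a pure-Python ray-casting test. The helper names and the square toy polygons are hypothetical, and the simplification ignores holes, multi-part geometries, and concave shapes whose centroid falls outside the polygon.

```python
def ring_centroid(ring):
    """Area-weighted centroid of a simple ring given as [(x, y), ...]."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)


def point_in_ring(point, ring):
    """Ray-casting test: count crossings of a horizontal ray going right from `point`."""
    x, y = point
    inside = False
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


# Toy data: two square "districts" and two square "community areas"
districts_demo = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
areas_demo = {
    "X": [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)],
    "Y": [(2.5, 0.5), (3.5, 0.5), (3.5, 1.5), (2.5, 1.5)],
}

# Assign each area to the first district containing its centroid (None if no district does)
assignment = {}
for name, ring in areas_demo.items():
    centroid = ring_centroid(ring)
    assignment[name] = next(
        (d for d, d_ring in districts_demo.items() if point_in_ring(centroid, d_ring)),
        None,
    )

print(assignment)
```

Community areas that straddle several districts would still need a manual decision, so a check like this complements the visual inspection rather than replacing it.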

In the absence of defined boundaries for the **Headquarters** district, we apply those of **DOUGLAS** in the corresponding entry of the `geometry` field.

In [7]:
# Looking for the record that corresponds to 'DOUGLAS'
for i, feature in enumerate(community_areas["features"]):
    if feature["properties"]["community_area"] == 'DOUGLAS':
        print(i, feature["properties"]["community_area"])

34 DOUGLAS
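
The index found above is hard-coded in the next cell, which would break if the order of the features ever changed. As a hedged alternative, a small lookup helper (the name `find_feature_index` and the toy feature list are hypothetical) could return the position by name and fail loudly when the name is absent:

```python
def find_feature_index(features, name, key="community_area"):
    """Return the position of the first feature whose property `key` equals `name`."""
    for i, feature in enumerate(features):
        if feature["properties"].get(key) == name:
            return i
    raise ValueError(f"{name!r} not found in features")


# Toy stand-in for community_areas["features"]
features_demo = [
    {"properties": {"community_area": "LOOP"}},
    {"properties": {"community_area": "DOUGLAS"}},
]

idx = find_feature_index(features_demo, "DOUGLAS")
print(idx)
```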


In [8]:
from shapely.geometry import Polygon, MultiPolygon

coords_douglas = community_areas["features"][34]["geometry"]["coordinates"]
polygons = [Polygon(polygon[0]) for polygon in coords_douglas]
multipoly_douglas = MultiPolygon(polygons)
df_dist.loc[df_dist['dist_num'] == '0', 'geometry'] = multipoly_douglas

print(f'Number of Districts:\n{len(df_dist)}')
df_dist.head()

Number of Districts:
23


Unnamed: 0,station_name,community_area,dist_num,dist_name,geometry
0,1160 N Larrabee St,NEAR NORTH SIDE,18,Near North,"MULTIPOLYGON (((-87.63068 41.92623, -87.6296 4..."
1,1412 S Blue Island Ave\n,NEAR WEST SIDE,12,Near West,"MULTIPOLYGON (((-87.65742 41.90351, -87.65739 ..."
2,1438 W 63rd St,WEST ENGLEWOOD,7,Englewood,"MULTIPOLYGON (((-87.63076 41.7942, -87.63076 4..."
3,1718 S State St,NEAR SOUTH SIDE,1,Central,"MULTIPOLYGON (((-87.62437 41.88886, -87.62419 ..."
4,1900 W Monterey Ave,MORGAN PARK,22,Morgan Park,"MULTIPOLYGON (((-87.63632 41.73618, -87.63592 ..."


The **Central** and **Wentworth** districts overlap with the **Headquarters** district, so for better visualization, the latter will be added last in the choropleth.

To do so, the **Headquarters** row of the `df_dist` dataframe must be moved to the end.

In [9]:
# Moving the Headquarters row (dist_num '0') to the end so that it is drawn last
is_hq = df_dist['dist_num'] == '0'
df_dist = pd.concat([df_dist[~is_hq], df_dist[is_hq]], ignore_index=True)

# Converting to GeoDataFrame
gdf_dist = gpd.GeoDataFrame(df_dist, geometry='geometry', crs="EPSG:4326")
gdf_dist

Unnamed: 0,station_name,community_area,dist_num,dist_name,geometry
0,1160 N Larrabee St,NEAR NORTH SIDE,18,Near North,"MULTIPOLYGON (((-87.63068 41.92623, -87.6296 4..."
1,1412 S Blue Island Ave\n,NEAR WEST SIDE,12,Near West,"MULTIPOLYGON (((-87.65742 41.90351, -87.65739 ..."
2,1438 W 63rd St,WEST ENGLEWOOD,7,Englewood,"MULTIPOLYGON (((-87.63076 41.7942, -87.63076 4..."
3,1718 S State St,NEAR SOUTH SIDE,1,Central,"MULTIPOLYGON (((-87.62437 41.88886, -87.62419 ..."
4,1900 W Monterey Ave,MORGAN PARK,22,Morgan Park,"MULTIPOLYGON (((-87.63632 41.73618, -87.63592 ..."
5,2150 N California Ave,LOGAN SQUARE,14,Shakespeare,"MULTIPOLYGON (((-87.69257 41.93943, -87.69253 ..."
6,2255 E 103rd St,SOUTH DEERING,4,South Chicago,"MULTIPOLYGON (((-87.55547 41.76135, -87.55475 ..."
7,3120 S Halsted St,BRIDGEPORT,9,Deering,"MULTIPOLYGON (((-87.63193 41.86015, -87.63117 ..."
8,3151 W Harrison St,EAST GARFIELD PARK,11,Harrison,"MULTIPOLYGON (((-87.70679 41.90283, -87.70679 ..."
9,3315 W Ogden Ave,NORTH LAWNDALE,10,Ogden,"MULTIPOLYGON (((-87.68604 41.86661, -87.68603 ..."


## Crimes

In [10]:
response = requests.get("https://gitlab.com/drvicsana/crime-milp-project/-/raw/main/data/crimes.json")
crimes = json.loads(response.text)

In [11]:
df_cr = pd.DataFrame(crimes)

print(f'Number of Crimes:\n{len(df_cr)}')
df_cr

Number of Crimes:
234896


Unnamed: 0,type,arrest,domestic,community_area,longitude,latitude
0,OTHER OFFENSE,False,True,ENGLEWOOD,-87.649437,41.771782
1,SEX OFFENSE,True,False,GREATER GRAND CROSSING,-87.597001,41.763338
2,SEX OFFENSE,False,False,JEFFERSON PARK,-87.766404,41.985875
3,WEAPONS VIOLATION,False,False,ENGLEWOOD,-87.652840,41.762615
4,BATTERY,True,True,WEST TOWN,-87.699285,41.900506
...,...,...,...,...,...,...
234891,HOMICIDE,True,False,EDGEWATER,-87.658318,41.993457
234892,HOMICIDE,False,False,LOWER WEST SIDE,-87.638918,41.857173
234893,HOMICIDE,False,False,ROSELAND,-87.621374,41.711753
234894,HOMICIDE,False,False,SOUTH LAWNDALE,-87.728122,41.841506


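Here, each crime's community area is replaced by the representative community area of its district (the first one in each list, which is where the police station is located), following the assignment created earlier. We then merge the crimes with `df_dist` and verify that the number of rows remains unchanged.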

In [12]:
_0  = ['DOUGLAS']
_1  = ['NEAR SOUTH SIDE', 'LOOP']
_2  = ['FULLER PARK', 'GRAND BOULEVARD', 'HYDE PARK', 'KENWOOD', 'OAKLAND', 'WASHINGTON PARK']
_3  = ['WOODLAWN', 'GREATER GRAND CROSSING', 'SOUTH SHORE']
_4  = ['SOUTH DEERING', 'AVALON PARK', 'BURNSIDE', 'CALUMET HEIGHTS', 'EAST SIDE', 'HEGEWISCH', 'SOUTH CHICAGO']
_5  = ['PULLMAN', 'RIVERDALE', 'ROSELAND', 'WEST PULLMAN']
_6  = ['AUBURN GRESHAM', 'CHATHAM']
_7  = ['WEST ENGLEWOOD', 'ENGLEWOOD']
_8  = ['CHICAGO LAWN', 'ARCHER HEIGHTS', 'ASHBURN', 'CLEARING', 'GARFIELD RIDGE', 'WEST ELSDON', 'WEST LAWN']
_9  = ['BRIDGEPORT', 'ARMOUR SQUARE', 'BRIGHTON PARK', 'GAGE PARK', 'MCKINLEY PARK', 'NEW CITY']
_10 = ['NORTH LAWNDALE', 'SOUTH LAWNDALE']
_11 = ['EAST GARFIELD PARK', 'HUMBOLDT PARK', 'WEST GARFIELD PARK']
_12 = ['NEAR WEST SIDE', 'LOWER WEST SIDE', 'WEST TOWN']
_14 = ['LOGAN SQUARE', 'AVONDALE']
_15 = ['AUSTIN']
_16 = ['JEFFERSON PARK', 'DUNNING', 'EDISON PARK', 'FOREST GLEN', 'NORWOOD PARK', 'OHARE', 'PORTAGE PARK']
_17 = ['ALBANY PARK', 'IRVING PARK', 'NORTH PARK']
_18 = ['NEAR NORTH SIDE', 'LINCOLN PARK']
_19 = ['LAKE VIEW', 'NORTH CENTER']
_20 = ['LINCOLN SQUARE', 'EDGEWATER', 'UPTOWN']
_22 = ['MORGAN PARK', 'BEVERLY', 'MOUNT GREENWOOD', 'WASHINGTON HEIGHTS']
_24 = ['ROGERS PARK', 'WEST RIDGE']
_25 = ['BELMONT CRAGIN', 'HERMOSA', 'MONTCLARE']

comm_map = {} 
for dist in [_0, _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _14, _15, _16, _17, _18, _19, _20, _22, _24, _25]:
    for comm in dist:
        comm_map[comm] = dist[0]

df_cr['community_area'] = df_cr['community_area'].replace(comm_map)
df_cr = df_cr.merge(df_dist, on=['community_area'])

gdf_cr = gpd.GeoDataFrame(df_cr, geometry='geometry', crs="EPSG:4326")

print(f'Number of Crimes:\n{len(gdf_cr)}')
gdf_cr

Number of Crimes:
234896


Unnamed: 0,type,arrest,domestic,community_area,longitude,latitude,station_name,dist_num,dist_name,geometry
0,OTHER OFFENSE,False,True,WEST ENGLEWOOD,-87.649437,41.771782,1438 W 63rd St,7,Englewood,"MULTIPOLYGON (((-87.63076 41.7942, -87.63076 4..."
1,SEX OFFENSE,True,False,WOODLAWN,-87.597001,41.763338,7040 S Cottage Grove Ave,3,Grand Crossing,"MULTIPOLYGON (((-87.58001 41.79348, -87.57949 ..."
2,SEX OFFENSE,False,False,JEFFERSON PARK,-87.766404,41.985875,5151 N Milwaukee Ave,16,Jefferson Park,"MULTIPOLYGON (((-87.80655 42.01896, -87.80655 ..."
3,WEAPONS VIOLATION,False,False,WEST ENGLEWOOD,-87.652840,41.762615,1438 W 63rd St,7,Englewood,"MULTIPOLYGON (((-87.63076 41.7942, -87.63076 4..."
4,BATTERY,True,True,NEAR WEST SIDE,-87.699285,41.900506,1412 S Blue Island Ave\n,12,Near West,"MULTIPOLYGON (((-87.65742 41.90351, -87.65739 ..."
...,...,...,...,...,...,...,...,...,...,...
234891,HOMICIDE,True,False,LINCOLN SQUARE,-87.658318,41.993457,5400 N Lincoln Ave,20,Lincoln,"MULTIPOLYGON (((-87.66029 41.99092, -87.66029 ..."
234892,HOMICIDE,False,False,NEAR WEST SIDE,-87.638918,41.857173,1412 S Blue Island Ave\n,12,Near West,"MULTIPOLYGON (((-87.65742 41.90351, -87.65739 ..."
234893,HOMICIDE,False,False,PULLMAN,-87.621374,41.711753,727 E 111th St,5,Calumet,"MULTIPOLYGON (((-87.58776 41.72231, -87.58762 ..."
234894,HOMICIDE,False,False,NORTH LAWNDALE,-87.728122,41.841506,3315 W Ogden Ave,10,Ogden,"MULTIPOLYGON (((-87.68604 41.86661, -87.68603 ..."
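
Because the merge on `community_area` is an inner join, a community area missing from the mapping would silently drop its crimes, and an area listed under two districts would overwrite its own entry in the dictionary. The sketch below shows one way such problems might be detected before merging. It uses only a subset of the assignment given earlier, keyed by district number, and a short hypothetical list of crime records.

```python
from collections import Counter

# Subset of the manual assignment; the first area in each list hosts the station
districts = {
    "0": ["DOUGLAS"],
    "1": ["NEAR SOUTH SIDE", "LOOP"],
    "7": ["WEST ENGLEWOOD", "ENGLEWOOD"],
}

# An area appearing under more than one district would be ambiguous
counts = Counter(area for areas in districts.values() for area in areas)
duplicates = [area for area, n in counts.items() if n > 1]

# Map every community area to the representative area of its district
comm_map = {area: areas[0] for areas in districts.values() for area in areas}

# Hypothetical community areas of four crime records
crime_areas = ["ENGLEWOOD", "LOOP", "DOUGLAS", "ENGLEWOOD"]

# Areas absent from the mapping would be lost in an inner merge
unmapped = sorted(set(crime_areas) - set(comm_map))
mapped = [comm_map[area] for area in crime_areas if area in comm_map]

print(duplicates, unmapped, mapped)
```

Keying the lists by district number in a single dictionary also avoids maintaining one variable per district.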


## Choropleth

We can now build the choropleth showing the number of crimes in each district area.

In [13]:
m = folium.Map(location=[41.840019, -87.691057], zoom_start=11, width="100%", tiles="CartoDB positron")

# Counting crimes per representative community area (one per district)
df = df_cr.groupby("community_area").count()["type"].reset_index().rename({"type": "crime_count"}, axis="columns")
# Attaching the counts to the district geometries
gdf_dist = gdf_dist.merge(df, on='community_area', how='left')
df = gdf_dist[['community_area', 'dist_name', 'crime_count']]
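
The left merge above leaves `crime_count` empty for any district without recorded crimes. Whether that happens in this dataset is not shown here, so the following is only a sketch on hypothetical data: it counts rows with `groupby(...).size()` and fills missing counts with zero so that every district gets a numeric value.

```python
import pandas as pd

# Hypothetical crimes, already mapped to representative community areas
crimes_demo = pd.DataFrame({
    "type": ["BATTERY", "THEFT", "HOMICIDE", "THEFT"],
    "community_area": ["WEST ENGLEWOOD", "WEST ENGLEWOOD", "DOUGLAS", "NEAR SOUTH SIDE"],
})
# Hypothetical district table; AUSTIN has no crimes in the toy data
districts_demo = pd.DataFrame({
    "community_area": ["DOUGLAS", "NEAR SOUTH SIDE", "WEST ENGLEWOOD", "AUSTIN"],
    "dist_name": ["Headquarters", "Central", "Englewood", "Austin"],
})

# size() counts rows per group regardless of missing values in other columns
counts = crimes_demo.groupby("community_area").size().reset_index(name="crime_count")

# The left merge keeps every district; districts without crimes get NaN, replaced by 0
joined = districts_demo.merge(counts, on="community_area", how="left")
joined["crime_count"] = joined["crime_count"].fillna(0).astype(int)

print(joined)
```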

In [14]:
# Creating the choropleth
choropleth = folium.Choropleth(
    geo_data=gdf_dist,
    data=df,
    columns=["community_area", "crime_count"],
    key_on="feature.properties.community_area",
    fill_color='OrRd',
    fill_opacity=1,
    line_opacity=1,
    line_color='black',
    line_weight=2
).add_to(m)

folium.GeoJson(
    data=gdf_dist,
    tooltip=folium.GeoJsonTooltip(
        fields=["dist_name", "crime_count"],
        aliases=['District Area:', 'Crime Count:'],
        localize=True,
        labels=True
    ),
    popup=folium.GeoJsonPopup(
        fields=["dist_name", "crime_count"],
        aliases=['District Area:', 'Crime Count:'],
        localize=True,
        labels=True
    ),
    style_function=lambda feature: {
        'fillColor': 'transparent',
        'color': 'transparent',
        'weight': 0
    },
    highlight_function=lambda feature: {
        'fillColor': '#ffffff',
        'color': '#000000',
        'weight': 3,
        'fillOpacity': 0.6
    }
).add_to(m)

folium.LayerControl().add_to(m)

m
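
By default the choropleth splits the value range into equally spaced classes, so a few districts with very high counts can push most of the others into the lightest colors. One option, sketched here with made-up counts rather than the real results, is to compute quantile-based class edges that could then be passed to the `bins` argument of `folium.Choropleth`:

```python
import statistics

# Hypothetical per-district crime counts (not taken from the dataset)
crime_counts = [1200, 3400, 5600, 7800, 9100, 10400, 15000, 21000]

# Quartile cut points; the 'inclusive' method keeps them within the observed range
cuts = statistics.quantiles(crime_counts, n=4, method="inclusive")

# The edges must span the full data range, from the minimum to the maximum
bin_edges = [min(crime_counts), *cuts, max(crime_counts)]

print(bin_edges)
```

With edges like these, each color class would hold roughly the same number of districts.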