Skip to content

Latest commit

 

History

History
358 lines (282 loc) · 11 KB

crime data.md

File metadata and controls

358 lines (282 loc) · 11 KB

CRIME DATA

About the data?

In this presentation, we embark on a detailed exploration of reported crime incidents in Chicago, a dataset that extends from 2001 to the current date, excluding the most recent seven days. This data is extracted from the Chicago Police Department's Citizen Law Enforcement Analysis and Reporting (CLEAR) system, reflects a broad spectrum of crime incidents, with a notable exception for murders where each victim's data is separately recorded. A key aspect of this dataset is its commitment to the privacy of crime victims. To this end, the information is generalized to the block level, without pinpointing specific locations. It's important to highlight that the dataset encompasses unverified reports and preliminary crime classifications that may be subject to change following further investigation. This aspect underscores the dynamic and somewhat tentative nature of the data. Given the potential for mechanical or human error, the Chicago Police Department explicitly states that the accuracy, completeness, timeliness, or correct sequencing of the data cannot be guaranteed. As a result, this dataset should not be used for time-based comparative purposes. This presentation aims to provide a data-driven narrative on public safety in Chicago. We will delve into this rich dataset, publicly available under the terms provided by the City of Chicago and offered 'AS IS' by Google, to uncover patterns, understand trends, and offer insights into the complex domain of urban crime and safety."

For information on SQL PULL: https://github.com/dsrichard97/chicagosql

For more information please visit the following link: https://dsrichard97.github.io/chicago_crime/

1. LOADING THE DATA

import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans

# Load data
crimedf = pd.read_csv("~/Desktop/crimedataquery.csv")
crimedf.head(5)
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
unique_key case_number date block iucr primary_type description location_description arrest domestic ... ward community_area fbi_code x_coordinate y_coordinate year updated_on latitude longitude location
0 11587320 JC140984 2019-01-11 12:01:00.000000 UTC 001XX N WABASH AVE 1122 DECEPTIVE PRACTICE COUNTERFEIT CHECK OTHER False False ... 42.0 32.0 10 1176785.0 1901619.0 2019 2019-02-13 04:01:17.000000 UTC 41.885389 -87.626266 (41.885389294, -87.626265771)
1 11792012 JC389748 2019-08-12 01:30:00.000000 UTC 001XX N DEARBORN ST 1152 DECEPTIVE PRACTICE ILLEGAL USE CASH CARD RESTAURANT False False ... 42.0 32.0 11 1175916.0 1901339.0 2019 2019-08-19 03:53:06.000000 UTC 41.884641 -87.629465 (41.884640562, -87.629465296)
2 11695143 JC272910 2019-05-21 08:05:00.000000 UTC 0000X E WACKER PL 1305 CRIMINAL DAMAGE CRIMINAL DEFACEMENT RESTAURANT True False ... 42.0 32.0 14 1176954.0 1902140.0 2019 2019-06-30 03:41:21.000000 UTC 41.886815 -87.625629 (41.886815123, -87.625629401)
3 13107804 JG302026 2023-06-15 11:54:00.000000 UTC 003XX N LOWER MICHIGAN AVE 2027 NARCOTICS POSSESS - CRACK SIDEWALK True False ... 42.0 32.0 18 1177249.0 1902230.0 2023 2023-08-19 03:40:26.000000 UTC 41.887055 -87.624543 (41.887055407, -87.624543366)
4 13226319 JG442967 2023-09-28 06:50:00.000000 UTC 0000X S STATE ST 0313 ROBBERY ARMED - OTHER DANGEROUS WEAPON SMALL RETAIL STORE False False ... 34.0 32.0 03 1176389.0 1900278.0 2023 2023-10-06 03:43:01.000000 UTC 41.881718 -87.627760 (41.88171846, -87.627760426)

5 rows × 22 columns

REPORTING INFORMATION

  1. What are the crime types in 2022-2023?
  2. What are the top 5 crimes in 2022-2023?
  3. Where are the hotspots located?

For this study we are intrested in looking at 2022-2023 crime information. In general, our crime data goes back to 2019 from our sql pull. Reference our previous sql pull for more information.

# Convert 'date' column to datetime format
crimedf['date'] = pd.to_datetime(crimedf['date'])

# Filter data for the years 2022 and 2023
crimedf_filtered = crimedf[crimedf['date'].dt.year.isin([2022, 2023])]

2. EXPLORATORY DATA ANALYSIS (EDA)

crime_count_2022_2023 = crimedf_filtered['primary_type'].value_counts().reset_index()
crime_count_2022_2023.columns = ['primary_type', 'count']

# Plot
plt.figure(figsize=(10, 6))
sns.barplot(data=crime_count_2022_2023, x='count', y='primary_type')
plt.title('Crime Types in Chicago (2022-2023)')
plt.xlabel('Count')
plt.ylabel('Crime Type')
plt.show()

png

For question one, we notice that deceptive practice is the highest crime in Chicago. It is recommended for Chicago to combat these crimes by investing more in cybersecurtiy and services to help the community from fradulant crimes.

Top 5 CRIMES

top_5_crimes = crime_count_2022_2023.head(5)

# Plot
sns.barplot(data=top_5_crimes, x='count', y='primary_type')
plt.title('Top 5 Crimes in Chicago in 2022-2023')
plt.xlabel('Count')
plt.ylabel('Crime Type')
plt.show()

png

3. BLOCK LEVEL ANALYSIS FOR SPECIFIC CRIMES

# DECEPTIVE PRACTICE in 001XX N STATE ST
deceptive_practice_block = crimedf_filtered[
    (crimedf_filtered['primary_type'] == 'DECEPTIVE PRACTICE') & 
    (crimedf_filtered['block'] == '001XX N STATE ST')
].groupby('block').size().reset_index(name='count')

# BATTERY in 006XX S CENTRAL AVE
battery_block = crimedf_filtered[
    (crimedf_filtered['primary_type'] == 'BATTERY') & 
    (crimedf_filtered['block'] == '006XX S CENTRAL AVE')
].groupby('block').size().reset_index(name='count')

4. TEMPORAL ANALYSIS - USING MONTHLY TRENDS

# Example for DECEPTIVE PRACTICE
deceptive_practice_data = crimedf_filtered[
    (crimedf_filtered['primary_type'] == 'DECEPTIVE PRACTICE') &
    (crimedf_filtered['block'] == '001XX N STATE ST')
]
deceptive_practice_data['month'] = deceptive_practice_data['date'].dt.to_period('M')
monthly_deceptive = deceptive_practice_data.groupby('month').size().reset_index(name='count')

# Plotting
plt.figure(figsize=(12, 6))
plt.plot(monthly_deceptive['month'].dt.to_timestamp(), monthly_deceptive['count'])
plt.title('Monthly Trend of Deceptive Practice at 001XX N STATE ST (2019-2023)')
plt.xlabel('Month')
plt.ylabel('Count of Crimes')
plt.xticks(rotation=45)
plt.show()
/Users/richarddiaz/opt/anaconda3/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py:1162: UserWarning: Converting to PeriodArray/Index representation will drop timezone information.
  warnings.warn(
/var/folders/55/6xmr2dls3kl02hf3b8mx94qw0000gn/T/ipykernel_4484/746959681.py:6: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  deceptive_practice_data['month'] = deceptive_practice_data['date'].dt.to_period('M')

png

5. K-MEANS CLUSTERING FOR GEOSPATIAL DATA

# Assuming crimedf_filtered is already defined and cleaned
# K-means clustering
coords = crimedf_filtered[['latitude', 'longitude']]
kmeans = KMeans(n_clusters=5, random_state=0).fit(coords)
crimedf_filtered['cluster'] = kmeans.labels_

# Plotting clusters with a legend
plt.figure(figsize=(10, 6))
scatter = plt.scatter(crimedf_filtered['longitude'], crimedf_filtered['latitude'], 
                      c=crimedf_filtered['cluster'], cmap='viridis', label=crimedf_filtered['cluster'])

# Create a legend
plt.legend(*scatter.legend_elements(), title="Clusters")
plt.title('Crime Clusters in Chicago')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.show()

png

  • Cluster 0: This is one group or 'cluster' of crime incidents as identified by the K-means algorithm. All points in this cluster are more similar to each other (in terms of their geographical location - latitude and longitude) than they are to points in other clusters.
  • Cluster 1: This represents a different group of crime incidents, again grouped based on their proximity to each other.
  • Clusters 2, 3, and 4: Similarly, these labels represent additional groups of crime incidents.

The goal of this clustering is to identify 'hotspots' of crime in Chicago based on geographical data. Each cluster represents a geographical area where crimes have occurred with higher density compared to other areas. By examining these clusters, you can gain insights into which areas require more attention or resources for crime prevention and law enforcement.

The more dense the crime for specific location translates to more law enforcement in that area by targeting the top 5 crimes mentioned before.