# Dallas Police Incident Reports

This dataset contains detailed information on police incidents, including time, location, crime types, and demographics. It has 86 columns and 1,347,410 rows. This is a medium-sized dataset, large enough to support predictive modeling.

Given the nature of this dataset, we can build a **Predictive Analysis Workflow** that covers:
* **Data Cleaning & Feature Engineering**
* **Geo-Coordinate extraction**
* **Label Encoding & Time Features**
* **Crime Type Forecasting** using classification
* **Risk Modeling** using HeatMap (I considered KDE and Spatial Clustering but they're not as efficient).

First, we want to process the data and create a modeling dataset with the following features:
* **Latitude & Longitude:** spatial coordinates
* **Month, DayOfWeek, Hour:** time-based features
* **IncidentCategory:** encoded crime type

In [65]:
import streamlit as st
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [67]:
# Download the CSV from the source first:
# https://www.dallasopendata.com/Public-Safety/Police-Incidents/qv6i-rri7/about_data
# The URL above is the dataset's landing page, not a CSV file, so we read the downloaded copy.
df = pd.read_csv('Police_Incidents_20250505.csv')

# We grab the first few rows for inspection.
df.head()


Unnamed: 0,Incident Number w/year,Year of Incident,Service Number ID,Watch,Call (911) Problem,Type of Incident,Type Location,Type of Property,Incident Address,Apartment Number,...,NIBRS Code,NIBRS Group,NIBRS Type,Update Date,X Coordinate,Y Cordinate,Zip Code,City,State,Location1
818926,122960-2023,2023,122960-2023-01,2,58 - ROUTINE INVESTIGATION,CRIMINAL TRESPASS AFFIDAVIT,Other,,2029 S BUCKNER BLVD,,...,999,C,999 - No Coded,2023-07-30 14:12:47.0000000,2526947.0,6959796.0,75217.0,DALLAS,TX,"2029 S BUCKNER BLVD\nDALLAS, TX 75217\n(32.747..."
304888,166799-2018,2018,166799-2018-01,2,12R - RESIDENTIAL ALARM,ALARM INCIDENT REPORT (NO OFFENSE),Apartment Complex/Building,,4412 MCKINNEY AVE,13.0,...,999,C,999 - No Coded,2018-07-30 15:04:23.0000000,2494047.0,6985400.0,75205.0,DALLAS,TX,"4412 MCKINNEY AVE\nDALLAS, TX 75205\n(32.81988..."
1243428,025871-2022,2022,025871-2022-01,3,09V - UUMV,UNAUTHORIZED USE OF MOTOR VEH - AUTOMOBILE,Parking (Business),Motor Vehicle,1601 MCKINNEY AVE,,...,240,A,Coded,2024-12-18 11:42:37.0000000,2489361.0,6973666.0,75202.0,DALLAS,TX,"1601 MCKINNEY AVE\nDALLAS, TX 75202\n(32.78753..."
708232,048018-2020,2020,048018-2020-01,1,58 - ROUTINE INVESTIGATION,"THEFT OF SERVICE > OR EQUAL $2,500 <$30K PC31....",Parking (Business),Motor Vehicle,2992 FOREST LN,,...,26A,A,Coded,2020-07-11 01:09:15.0000000,2466652.0,7017502.0,75234.0,DALLAS,TX,"2992 FOREST LN\nDALLAS, TX 75234\n(32.90900301..."
287187,162873-2020,2020,162873-2020-02,3,40/01 - OTHER,"CRIM MISCHIEF >OR EQUAL $750 < $2,500",Apartment Complex/Building,,11333 AMANDA LN,1004.0,...,290,A,Coded,2022-10-18 10:49:18.0000000,2524166.0,7006383.0,75238.0,DALLAS,TX,"11333 AMANDA LN\nDALLAS, TX 75238\n(32.8758880..."


Now, we can:
1) Split the data into training and test sets
2) Train a classification model to predict **IncidentCategory** (crime type)
3) Evaluate performance
4) Generate a heatmap using Folium

In [40]:
import re
from sklearn.preprocessing import LabelEncoder
from datetime import datetime

In [42]:
# We want to copy the original dataset for processing
df = df.copy()

# We need to extract coordinates from 'Location1' as Latitude and Longitude
def extract_coords(location):
    match = re.search(r'\(([-\d.]+),\s*([-\d.]+)\)', str(location))
    if match:
        return float(match.group(1)), float(match.group(2))
    return None, None

# We apply the function to each row and store the results in two new columns
df[['Latitude', 'Longitude']] = df['Location1'].apply(lambda x: pd.Series(extract_coords(x)))

df.head()

Unnamed: 0,Incident Number w/year,Year of Incident,Service Number ID,Watch,Call (911) Problem,Type of Incident,Type Location,Type of Property,Incident Address,Apartment Number,...,NIBRS Type,Update Date,X Coordinate,Y Cordinate,Zip Code,City,State,Location1,Latitude,Longitude
0,227907-2015,2015,227907-2015-01,2,58 - ROUTINE INVESTIGATION,ILLEGAL DUMPING 1000 LBS OR MORE,Outdoor Area Public/Private,,3000 S LEDBETTER DR,,...,,2015-10-09 14:57:32.0000000,2453378.0,6944537.0,75211.0,DALLAS,TX,"3000 S LEDBETTER DR\nDALLAS, TX 75211\n(32.707...",32.707467,-96.925177
1,238085-2019,2019,238085-2019-01,3,19 - SHOOTING,ASSAULT (AGG) -DEADLY WEAPON,Condominium/Townhome Parking,,2525 PLAYERS CT,1402.0,...,Coded,2023-06-26 10:09:29.0000000,2469747.0,7047277.0,75287.0,DALLAS,TX,"2525 PLAYERS CT\nDALLAS, TX 75287\n(32.9913260...",32.991326,-96.866043
2,126427-2021,2021,126427-2021-01,3,41/20 - ROBBERY - IN PROGRESS,ROBBERY OF BUSINESS (AGG),Rest Area,,3938 S POLK ST,,...,Coded,2022-03-07 14:30:48.0000000,2479206.0,6940459.0,75224.0,DALLAS,TX,"3938 S POLK ST\nDALLAS, TX 75224\n(32.69675, -...",32.69675,-96.839985
3,249485-2016,2016,249485-2016-01,1,6X - MAJOR DIST (VIOLENCE),CRIM MISCHIEF >OR EQUAL $100 BUT <$750,Single Family Residence - Occupied,,3134 UTAH AVE,,...,,2016-10-24 09:35:09.0000000,2488556.0,6944035.0,75216.0,DALLAS,TX,"3134 UTAH AVE\nDALLAS, TX 75216\n(32.706170014...",32.70617,-96.809269
4,100969-2022,2022,100969-2022-01,1,6XA - MAJOR DIST AMBULANCE,ASSAULT -BODILY INJURY ONLY,Apartment Parking Lot,,12708 SCHROEDER RD,,...,Coded,2022-06-21 15:48:09.0000000,2502145.0,7022398.0,75243.0,DALLAS,TX,"12708 SCHROEDER RD\nDALLAS, TX 75243\n(32.9203...",32.920389,-96.760836
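
As a side note, the row-wise `apply` above can be slow on a table with more than a million rows. The sketch below shows a vectorized alternative using `Series.str.extract`. It is only an illustration: the sample rows are made up, and it assumes the coordinates always appear as `(lat, lon)` at the end of `Location1`, as in the preview above.

```python
import pandas as pd

# Hypothetical sample rows imitating the 'Location1' format shown above.
sample = pd.DataFrame({
    "Location1": [
        "3938 S POLK ST\nDALLAS, TX 75224\n(32.69675, -96.839985)",
        "2029 S BUCKNER BLVD\nDALLAS, TX 75217",  # no coordinates present
        None,                                      # missing value
    ]
})

# One capture group per coordinate; rows without a match become NaN.
pattern = r"\((-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)\)"
coords = sample["Location1"].str.extract(pattern).astype(float)
coords.columns = ["Latitude", "Longitude"]

sample = sample.join(coords)
print(sample[["Latitude", "Longitude"]])
```

Rows without coordinates end up as `NaN`, so they can be removed later with the same `dropna` call used in this notebook.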


In [44]:
# Now, we want to convert 'Date1 of Occurrence' to datetime and extract features
df['Date1 of Occurrence'] = pd.to_datetime(df['Date1 of Occurrence'])
df['Month'] = df['Date1 of Occurrence'].dt.month
df['DayOfWeek'] = df['Date1 of Occurrence'].dt.dayofweek
df['Hour'] = pd.to_datetime(df['Time1 of Occurrence'], format='%H:%M').dt.hour

df.head()

Unnamed: 0,Incident Number w/year,Year of Incident,Service Number ID,Watch,Call (911) Problem,Type of Incident,Type Location,Type of Property,Incident Address,Apartment Number,...,Y Cordinate,Zip Code,City,State,Location1,Latitude,Longitude,Month,DayOfWeek,Hour
0,227907-2015,2015,227907-2015-01,2,58 - ROUTINE INVESTIGATION,ILLEGAL DUMPING 1000 LBS OR MORE,Outdoor Area Public/Private,,3000 S LEDBETTER DR,,...,6944537.0,75211.0,DALLAS,TX,"3000 S LEDBETTER DR\nDALLAS, TX 75211\n(32.707...",32.707467,-96.925177,10,3,5
1,238085-2019,2019,238085-2019-01,3,19 - SHOOTING,ASSAULT (AGG) -DEADLY WEAPON,Condominium/Townhome Parking,,2525 PLAYERS CT,1402.0,...,7047277.0,75287.0,DALLAS,TX,"2525 PLAYERS CT\nDALLAS, TX 75287\n(32.9913260...",32.991326,-96.866043,11,0,19
2,126427-2021,2021,126427-2021-01,3,41/20 - ROBBERY - IN PROGRESS,ROBBERY OF BUSINESS (AGG),Rest Area,,3938 S POLK ST,,...,6940459.0,75224.0,DALLAS,TX,"3938 S POLK ST\nDALLAS, TX 75224\n(32.69675, -...",32.69675,-96.839985,7,4,16
3,249485-2016,2016,249485-2016-01,1,6X - MAJOR DIST (VIOLENCE),CRIM MISCHIEF >OR EQUAL $100 BUT <$750,Single Family Residence - Occupied,,3134 UTAH AVE,,...,6944035.0,75216.0,DALLAS,TX,"3134 UTAH AVE\nDALLAS, TX 75216\n(32.706170014...",32.70617,-96.809269,10,0,0
4,100969-2022,2022,100969-2022-01,1,6XA - MAJOR DIST AMBULANCE,ASSAULT -BODILY INJURY ONLY,Apartment Parking Lot,,12708 SCHROEDER RD,,...,7022398.0,75243.0,DALLAS,TX,"12708 SCHROEDER RD\nDALLAS, TX 75243\n(32.9203...",32.920389,-96.760836,6,0,6
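
If some date or time values are malformed, `pd.to_datetime` raises an error by default. The sketch below shows one way to tolerate such rows with `errors="coerce"`. The sample values and the `%Y-%m-%d` date format are assumptions for illustration; the real file may use a different format.

```python
import pandas as pd

# Hypothetical rows; the column names match the notebook, the values are made up.
sample = pd.DataFrame({
    "Date1 of Occurrence": ["2023-07-30", "2022-06-20", "not a date"],
    "Time1 of Occurrence": ["14:12", "06:05", None],
})

# errors="coerce" turns unparseable entries into NaT instead of raising.
dates = pd.to_datetime(sample["Date1 of Occurrence"], format="%Y-%m-%d", errors="coerce")
times = pd.to_datetime(sample["Time1 of Occurrence"], format="%H:%M", errors="coerce")

sample["Month"] = dates.dt.month
sample["DayOfWeek"] = dates.dt.dayofweek  # Monday=0 ... Sunday=6
sample["Hour"] = times.dt.hour

print(sample[["Month", "DayOfWeek", "Hour"]])
```

The coerced rows carry `NaN` in the derived columns, so a later `dropna` on `Month`, `DayOfWeek`, and `Hour` removes them.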


In [47]:
# Next, we want to encode categorical columns
cat_columns = LabelEncoder()
df['IncidentCategory'] = cat_columns.fit_transform(df['Type of Incident'].astype(str))

df['IncidentCategory'].head()

0    135
1     12
2    198
3     54
4     16
Name: IncidentCategory, dtype: int64
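
The integer codes above are hard to read on their own. A fitted `LabelEncoder` keeps the original labels in `classes_`, and `inverse_transform` maps codes back to them. The sketch below uses a few made-up labels in place of the full `Type of Incident` column.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical incident labels standing in for 'Type of Incident'.
incident_types = [
    "ROBBERY OF BUSINESS (AGG)",
    "ASSAULT -BODILY INJURY ONLY",
    "ROBBERY OF BUSINESS (AGG)",
    "ILLEGAL DUMPING 1000 LBS OR MORE",
]

encoder = LabelEncoder()
codes = encoder.fit_transform(incident_types)

# classes_ holds the original labels in sorted order; the position is the code.
code_to_label = dict(enumerate(encoder.classes_))

# inverse_transform maps codes (e.g. model predictions) back to labels.
decoded = encoder.inverse_transform(codes)
print(code_to_label)
print(list(decoded))
```

In this notebook the same calls would apply to the `cat_columns` encoder, for example to translate the class numbers in a classification report back to incident types.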

In [49]:
# Now, we drop rows with missing key features
df = df.dropna(subset=['Latitude', 'Longitude', 'Month', 'DayOfWeek', 'Hour', 'IncidentCategory'])

df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
Index: 7942 entries, 0 to 7999
Data columns (total 92 columns):
 #   Column                                     Non-Null Count  Dtype         
---  ------                                     --------------  -----         
 0   Incident Number w/year                     7942 non-null   object        
 1   Year of Incident                           7942 non-null   int64         
 2   Service Number ID                          7942 non-null   object        
 3   Watch                                      7942 non-null   int64         
 4   Call (911) Problem                         7311 non-null   object        
 5   Type of Incident                           7942 non-null   object        
 6   Type  Location                             7930 non-null   object        
 7   Type of Property                           1638 non-null   object        
 8   Incident Address                           7942 non-null   object        
 9   Apartment Number        

Unnamed: 0,Incident Number w/year,Year of Incident,Service Number ID,Watch,Call (911) Problem,Type of Incident,Type Location,Type of Property,Incident Address,Apartment Number,...,Zip Code,City,State,Location1,Latitude,Longitude,Month,DayOfWeek,Hour,IncidentCategory
0,227907-2015,2015,227907-2015-01,2,58 - ROUTINE INVESTIGATION,ILLEGAL DUMPING 1000 LBS OR MORE,Outdoor Area Public/Private,,3000 S LEDBETTER DR,,...,75211.0,DALLAS,TX,"3000 S LEDBETTER DR\nDALLAS, TX 75211\n(32.707...",32.707467,-96.925177,10,3,5,135
1,238085-2019,2019,238085-2019-01,3,19 - SHOOTING,ASSAULT (AGG) -DEADLY WEAPON,Condominium/Townhome Parking,,2525 PLAYERS CT,1402.0,...,75287.0,DALLAS,TX,"2525 PLAYERS CT\nDALLAS, TX 75287\n(32.9913260...",32.991326,-96.866043,11,0,19,12
2,126427-2021,2021,126427-2021-01,3,41/20 - ROBBERY - IN PROGRESS,ROBBERY OF BUSINESS (AGG),Rest Area,,3938 S POLK ST,,...,75224.0,DALLAS,TX,"3938 S POLK ST\nDALLAS, TX 75224\n(32.69675, -...",32.69675,-96.839985,7,4,16,198
3,249485-2016,2016,249485-2016-01,1,6X - MAJOR DIST (VIOLENCE),CRIM MISCHIEF >OR EQUAL $100 BUT <$750,Single Family Residence - Occupied,,3134 UTAH AVE,,...,75216.0,DALLAS,TX,"3134 UTAH AVE\nDALLAS, TX 75216\n(32.706170014...",32.70617,-96.809269,10,0,0,54
4,100969-2022,2022,100969-2022-01,1,6XA - MAJOR DIST AMBULANCE,ASSAULT -BODILY INJURY ONLY,Apartment Parking Lot,,12708 SCHROEDER RD,,...,75243.0,DALLAS,TX,"12708 SCHROEDER RD\nDALLAS, TX 75243\n(32.9203...",32.920389,-96.760836,6,0,6,16


Next, we want to proceed with **Crime Risk Modeling using Folium** to generate a heatmap of incident concentration.

In [52]:
#First, we import the libraries
import folium
from folium.plugins import HeatMap

In [54]:
# Now, we are creating a folium map centered around the average location
center_lat = df['Latitude'].mean()
center_lon = df['Longitude'].mean()
map_heatmap = folium.Map(location=[center_lat, center_lon], zoom_start=12)

# Here, we are preparing the data for heatmap
heat_data = df[['Latitude', 'Longitude']].values.tolist()

# We are adding a heatmap layer
HeatMap(heat_data).add_to(map_heatmap)

# We are saving our result heatmap to HTML
output_path = "Police_Incidents_heatmap.html"
map_heatmap.save(output_path)

output_path

display(map_heatmap)
## Please check the current directory for the .html file to view the interactive results.
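
Passing every incident as its own point can produce a large HTML file when the table has many rows. One possible mitigation, sketched below, is to aggregate nearby incidents into grid cells and pass `[lat, lon, weight]` triples, a form that `HeatMap` also accepts. The sample coordinates and the rounding precision are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical coordinates; in the notebook these come from df[['Latitude', 'Longitude']].
points = pd.DataFrame({
    "Latitude":  [32.70747, 32.70749, 32.99133, 32.69675],
    "Longitude": [-96.92518, -96.92521, -96.86604, -96.83999],
})

# Round to 3 decimals (roughly 100 m) so nearby incidents share one grid cell.
grid = points.round(3)

# Count incidents per cell; each row becomes [lat, lon, weight].
weighted = (
    grid.groupby(["Latitude", "Longitude"])
        .size()
        .reset_index(name="Weight")
)
heat_data = weighted.values.tolist()
print(heat_data)
```

The resulting `heat_data` list could then be passed to `HeatMap(heat_data).add_to(map_heatmap)` in place of the unaggregated points.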

We want to classify incidents using a random forest (**RandomForestClassifier**) and summarize the results with **classification_report**.

In [58]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

In [60]:
X = df[['Latitude', 'Longitude', 'Month', 'DayOfWeek', 'Hour']]
y = df['IncidentCategory']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.05      0.04      0.04        53
           2       0.00      0.00      0.00         3
           3       0.00      0.00      0.00         4
           5       0.00      0.00      0.00         6
           6       0.00      0.00      0.00        10
           7       0.00      0.00      0.00         1
           8       0.00      0.00      0.00         2
          11       0.00      0.00      0.00         5
          12       0.00      0.00      0.00        30
          13       0.00      0.00      0.00         6
          14       0.00      0.00      0.00         2
          15       0.00      0.00      0.00        14
          16       0.00      0.00      0.00        44
          18       0.00      0.00      0.00         1
          21       0.00      0.00      0.00         1
          23       0.00      0.00      0.00         1
          24       0.00      0.00      0.00        10
          25       0.00    
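
The report above lists many classes with only a handful of test rows and scores of 0.00. One common mitigation, not applied in this notebook, is to merge rare incident types into a single catch-all class before encoding. The sketch below shows the idea on made-up labels; the `min_count` threshold and the `"OTHER"` label are hypothetical choices.

```python
import pandas as pd

# Hypothetical label column standing in for 'Type of Incident'.
labels = pd.Series(
    ["THEFT"] * 6 + ["ASSAULT"] * 4 + ["ILLEGAL DUMPING"] + ["FORGERY"]
)

# Hypothetical threshold: classes with fewer rows than this are merged.
min_count = 3

counts = labels.value_counts()
rare = counts[counts < min_count].index

# Replace every rare label with a single catch-all class.
grouped = labels.where(~labels.isin(rare), "OTHER")
print(grouped.value_counts())
```

Note that this changes the prediction target, so scores before and after grouping are not directly comparable.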



In [62]:
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.holtwinters import ExponentialSmoothing

df['IncidentMonth'] = pd.DatetimeIndex(df['Date1 of Occurrence']).to_period('M')
monthly_counts = df.groupby('IncidentMonth').size().rename('IncidentCount').to_timestamp()
monthly_counts = monthly_counts.dropna()

model = ExponentialSmoothing(monthly_counts, trend='additive', seasonal=None, initialization_method="estimated")
fit_model = model.fit()

forecast = fit_model.forecast(6)
print(monthly_counts)
print("Forecast:")
print(forecast)

IncidentMonth
2000-09-01     1
2006-05-01     1
2007-01-01     1
2013-01-01     1
2013-05-01     1
              ..
2025-01-01    51
2025-02-01    62
2025-03-01    62
2025-04-01    57
2025-05-01     9
Name: IncidentCount, Length: 142, dtype: int64
Forecast:
142    34.347540
143    34.583746
144    34.819952
145    35.056158
146    35.292364
147    35.528571
dtype: float64
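
The forecast above is indexed by integers (142 to 147) rather than dates, and the series starts with isolated months such as 2000-09 and 2006-05. Both suggest that the monthly index has gaps and no fixed frequency. The sketch below shows one way to build a gap-free monthly series with `resample`, using made-up dates. The last month in the real data (2025-05, count 9) may also be incomplete, which is an assumption based on the file name.

```python
import pandas as pd

# Hypothetical occurrence dates with a gap (no incidents in February 2024).
dates = pd.to_datetime(["2024-01-05", "2024-01-20", "2024-03-02", "2024-04-11"])

monthly_counts = (
    pd.Series(1, index=dates)
      .resample("MS")   # month-start bins, one row per calendar month
      .sum()            # months with no incidents get a count of 0
      .rename("IncidentCount")
)

print(monthly_counts)
print(monthly_counts.index.freq)
```

A series built this way carries a monthly frequency on its index, so a model fitted on it can return forecasts labeled with dates.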


