# How did the COVID-19 lockdowns impact our crime landscape?
## Philadelphia Crime Incident Analysis (2018–Present)

### Research Focus
This notebook evaluates how crime patterns shifted across three periods:
- **Before lockdowns**: 2018–2019
- **During lockdowns**: 2020–2021
- **After lockdowns**: 2023–last complete year (2022 is excluded as a transition year)

### Key Questions
1. How did overall crime distributions shift across the three periods?
2. Did burglary patterns change around the March 2020 lockdown?
3. Is there evidence of a **displacement effect**—residential burglaries dropping while commercial burglaries rise?

### Table of Contents
1. [Reproducibility](#reproducibility)
2. [Data Source & Assumptions](#data-source)
3. [Imports & Settings](#imports)
4. [Data Loading](#data-loading)
5. [Data Validation](#data-validation)
6. [Transformations](#transformations)
7. [Period Comparison](#period-comparison)
8. [Visualization](#visualization)
9. [Insights](#insights)
10. [Conclusion](#conclusion)
11. [Completion Checklist](#completion-checklist)

In [None]:
# Reproducibility <a id="reproducibility"></a>
import os
import sys
import platform
import pandas as pd
import numpy as np
import matplotlib
import seaborn as sns

print('Python:', sys.version.replace('\n', ' '))
print('Platform:', platform.platform())
print('Conda env:', os.environ.get('CONDA_DEFAULT_ENV', 'unknown'))
print('pandas:', pd.__version__)
print('numpy:', np.__version__)
print('matplotlib:', matplotlib.__version__)
print('seaborn:', sns.__version__)

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
print('Random seed set to', RANDOM_SEED)

## Data Source & Assumptions <a id="data-source"></a>
- **Source file**: `data/crime_incidents_combined.parquet`
- **Primary date field**: `dispatch_date` (converted to datetime)
- **Crime labels**: `text_general_code`
- **Lockdown marker**: March 2020

**Period Definitions**:
- **Before**: 2018–2019
- **During**: 2020–2021
- **After**: 2023–last complete year

**Note**: 2022 is treated as a transition year and is excluded from the before/during/after comparisons.

In [None]:
# Imports & settings <a id="imports"></a>
from pathlib import Path
import warnings
import matplotlib.pyplot as plt  # needed for plt.subplots in the Visualization cell

warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:.2f}'.format)

# Plot styling
matplotlib.rcParams['figure.dpi'] = 120
matplotlib.rcParams['savefig.dpi'] = 300
sns.set_theme(style='whitegrid', context='talk')

PROJECT_ROOT = Path.cwd().parent
DATA_PATH = PROJECT_ROOT / 'data' / 'crime_incidents_combined.parquet'
REPORTS_DIR = PROJECT_ROOT / 'reports'
REPORTS_DIR.mkdir(parents=True, exist_ok=True)

print('Data path:', DATA_PATH)
print('Reports dir:', REPORTS_DIR)

## Data Loading <a id="data-loading"></a>
Load the crime incidents dataset and standardize date fields.

In [None]:
# Load data
df = pd.read_parquet(DATA_PATH)

# Ensure dispatch_date is datetime
df['dispatch_date'] = pd.to_datetime(df['dispatch_date'], errors='coerce')

# Basic cleanup
df = df.dropna(subset=['dispatch_date'])
df['year'] = df['dispatch_date'].dt.year
df['month'] = df['dispatch_date'].dt.to_period('M').dt.to_timestamp()

print(f'Loaded {len(df):,} records')
print('Date range:', df['dispatch_date'].min(), 'to', df['dispatch_date'].max())
print('Columns:', list(df.columns))

## Data Validation <a id="data-validation"></a>
Check missingness, date range coverage, and identify the last complete year.

In [None]:
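# Missingness checks
# Note: rows with a missing dispatch_date were dropped in the loading cell,
# so this value is 0% by construction at this point.
missing_dispatch = df['dispatch_date'].isna().mean() * 100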
missing_text = df['text_general_code'].isna().mean() * 100
print(f'Missing dispatch_date: {missing_dispatch:.2f}%')
print(f'Missing text_general_code: {missing_text:.2f}%')

# Determine last complete year (12 months of data)
months_per_year = df.groupby('year')['month'].nunique().reset_index(name='n_months')
complete_years = months_per_year[months_per_year['n_months'] == 12]['year']
last_complete_year = int(complete_years.max())
print('Last complete year:', last_complete_year)

# Quick count summary
year_counts = df.groupby('year').size().reset_index(name='incidents')
display(year_counts.tail(10))

## Transformations <a id="transformations"></a>
Define the analysis periods and create burglary subcategories for residential vs. commercial.

In [None]:
# Define analysis periods
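def assign_period(year: int, last_complete: int) -> str | None:
    # Body reconstructed from the Period Definitions above (the cell was truncated);
    # the labels 'Before', 'During', and 'After' are assumed from those definitions.
    if year in (2018, 2019):
        return 'Before'
    if year in (2020, 2021):
        return 'During'
    if 2023 <= year <= last_complete:
        return 'After'
    return None  # 2022 (transition year) and incomplete years are excluded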

df['period'] = df['year'].apply(lambda y: assign_period(y, last_complete_year))
df_period = df.dropna(subset=['period']).copy()

# Burglary type classification
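def classify_burglary(text: str) -> str | None:
    # Body reconstructed (the cell was truncated). Output labels follow the
    # Interpretation section; the non-residential source label is an assumption,
    # so verify it against df['text_general_code'].unique().
    if text == 'Burglary Residential':
        return 'Residential Burglary'
    if text == 'Burglary Non-Residential':
        return 'Commercial Burglary'
    return None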

df_period['burglary_type'] = df_period['text_general_code'].apply(classify_burglary)

print('Period counts:')
print(df_period['period'].value_counts())
print('\nBurglary type counts (within period data):')
print(df_period['burglary_type'].value_counts(dropna=False))

## Period Comparison <a id="period-comparison"></a>
Compare overall incidents and burglary types by period using monthly averages to normalize for different period lengths.
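
The normalization described above can be sketched on toy data. This is a minimal illustration, not the notebook's actual computation. It assumes an incident-level frame with `month`, `period`, and `burglary_type` columns (as in `df_period`). The names `toy`, `monthly_counts`, and `period_monthly_avg`, and all values, are hypothetical.

```python
import pandas as pd

# Toy incident-level data (illustrative only): one row per incident.
toy = pd.DataFrame({
    'month': pd.to_datetime([
        '2019-01-01', '2019-01-01', '2019-02-01', '2019-02-01',
        '2020-04-01', '2020-05-01', '2020-05-01', '2020-05-01',
    ]),
    'period': ['Before'] * 4 + ['During'] * 4,
    'burglary_type': [
        'Residential Burglary', 'Residential Burglary',
        'Residential Burglary', 'Commercial Burglary',
        'Commercial Burglary', 'Commercial Burglary',
        'Commercial Burglary', 'Residential Burglary',
    ],
})

# Step 1: count incidents per (period, month, burglary_type).
monthly_counts = (
    toy.groupby(['period', 'month', 'burglary_type'])
    .size()
    .reset_index(name='incidents')
)

# Step 2: average the monthly counts within each period and burglary type.
period_monthly_avg = (
    monthly_counts.groupby(['period', 'burglary_type'])['incidents']
    .mean()
    .reset_index(name='monthly_avg')
)
print(period_monthly_avg)
```

One caveat of this approach: a month with no incidents of a given type does not appear in `monthly_counts`, so the mean is taken only over months with at least one incident. Reindexing to a full month grid and filling with zeros would count those months as well.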

In [None]:
# Monthly totals for overall incidents
monthly_total = (

# Monthly totals for burglary types
monthly_burglary = (

# Period summaries (monthly averages)
period_total_summary = (

period_burglary_summary = (

display(period_total_summary)
display(period_burglary_summary)

## Visualization <a id="visualization"></a>
Create a multi-line time series chart for residential vs. commercial burglaries and annotate the March 2020 lockdown.
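
As a rough sketch of the chart described above, the following draws two lines from a month-indexed pivot and marks March 2020. The counts are made up for illustration, and `toy_pivot` and the annotation text are hypothetical choices rather than the notebook's actual styling.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for scripts; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative monthly counts (made-up numbers, not results from the data).
months = pd.date_range('2019-11-01', periods=8, freq='MS')
toy_pivot = pd.DataFrame(
    {
        'Residential Burglary': [50, 48, 47, 45, 30, 25, 28, 31],
        'Commercial Burglary': [20, 21, 19, 22, 30, 41, 35, 33],
    },
    index=months,
)

fig, ax = plt.subplots(figsize=(10, 5))
for col in toy_pivot.columns:
    ax.plot(toy_pivot.index, toy_pivot[col], label=col, linewidth=2)

# Mark the lockdown month with a dashed vertical line and a text label.
lockdown_date = pd.Timestamp('2020-03-01')
ax.axvline(lockdown_date, color='red', linestyle='--', linewidth=2)
ax.annotate(
    'Lockdown begins (Mar 2020)',
    xy=(lockdown_date, toy_pivot.to_numpy().max()),
    xytext=(10, 0),
    textcoords='offset points',
    color='red',
    va='top',
)
ax.set_xlabel('Month')
ax.set_ylabel('Incidents per Month')
ax.legend(title='Burglary Type')
fig.tight_layout()
```

Anchoring the annotation at the series maximum keeps the label near the top of the plot regardless of the data range.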

In [None]:
# Prepare time series for plotting
burglary_ts = (

# Pivot for lines
burglary_pivot = burglary_ts.pivot(index='month', columns='burglary_type', values='incidents').fillna(0)

fig, ax = plt.subplots(figsize=(14, 7))
colors = {

for col in burglary_pivot.columns:
    ax.plot(burglary_pivot.index, burglary_pivot[col], label=col, linewidth=2.2, color=colors.get(col))

# Lockdown annotation
lockdown_date = pd.Timestamp('2020-03-01')
ax.axvline(lockdown_date, color='red', linestyle='--', linewidth=2)
ax.annotate(

ax.set_title('Monthly Burglary Trends in Philadelphia\nResidential vs. Commercial', fontweight='bold')
ax.set_xlabel('Month')
ax.set_ylabel('Incidents per Month')
ax.legend(title='Burglary Type')
ax.set_xlim(burglary_pivot.index.min(), burglary_pivot.index.max())

fig.tight_layout()
output_path = REPORTS_DIR / 'covid_lockdown_burglary_trends.png'
fig.savefig(output_path, bbox_inches='tight')
print('Saved chart to', output_path)

## Insights <a id="insights"></a>
Quantify the displacement effect by comparing monthly averages before vs. during lockdowns for each burglary subtype.
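
The comparison can be sketched as a percent change between period averages, relative to the earlier period. The example below uses made-up monthly averages and assumes the period labels are `Before`, `During`, and `After`. `toy_summary`, `wide`, and `displacement` are hypothetical names.

```python
import pandas as pd

# Illustrative monthly averages (made-up numbers, not results from the data).
toy_summary = pd.DataFrame({
    'burglary_type': ['Residential Burglary'] * 3 + ['Commercial Burglary'] * 3,
    'period': ['Before', 'During', 'After'] * 2,
    'monthly_avg': [400.0, 300.0, 330.0, 150.0, 180.0, 135.0],
})

# One row per burglary type, one column per period.
wide = toy_summary.pivot(index='burglary_type', columns='period', values='monthly_avg')

# Percent change relative to the earlier period in each comparison.
wide['Change During vs Before (%)'] = (wide['During'] - wide['Before']) / wide['Before'] * 100
wide['Change After vs During (%)'] = (wide['After'] - wide['During']) / wide['During'] * 100

# Displacement pattern: residential down and commercial up during lockdowns.
change = wide['Change During vs Before (%)']
displacement = bool(change['Residential Burglary'] < 0 and change['Commercial Burglary'] > 0)

print(wide.round(2))
print('Displacement pattern present:', displacement)
```

A sign check like this only describes the direction of change. It does not establish that lockdowns caused the shift, and it does not account for seasonality or longer-running trends.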

In [None]:
# Displacement effect metrics
summary = period_burglary_summary.pivot(index='burglary_type', columns='period', values='monthly_avg')

summary['Change During vs Before (%)'] = (

summary['Change After vs During (%)'] = (

display(summary.round(2))

### Interpretation
- If **Residential Burglary** decreases during 2020–2021 while **Commercial Burglary** increases, this supports a displacement effect.
- Use the table above to compare the percentage changes in the **During vs Before** column.
- For the after period, check whether commercial burglary stays elevated or returns to pre-lockdown levels.

## Conclusion <a id="conclusion"></a>
This notebook provides a structured comparison of crime patterns before, during, and after COVID-19 lockdowns, with a specific focus on burglary displacement. The results combine period summaries with a time-series visualization annotated to March 2020.

## Completion Checklist <a id="completion-checklist"></a>
- [ ] Notebook executed end-to-end with no errors
- [ ] Outputs preserved (do not clear outputs)
- [ ] Chart saved to `reports/covid_lockdown_burglary_trends.png`
- [ ] Results reviewed for displacement effect
- [ ] `docs/NOTEBOOK_COMPLETION_REPORT.md` updated
- [ ] `docs/NOTEBOOK_QUICK_REFERENCE.md` updated