# COVID-19 Lockdown Crime Landscape (Generated: 2026-02-02)
## Philadelphia Crime Incident Analysis (2018–2025)

## Summary
COVID lockdowns significantly altered Philadelphia's crime landscape.

**Key findings:**
- Overall incident volume declined during 2020–2021 relative to the 2018–2019 baseline.
- Burglary displacement patterns were observed, with residential burglaries dropping while commercial burglaries rose during lockdown.
- Distribution shifts across crime categories are statistically significant (see chi-square tests in Findings).

## Methods
**Period definitions** (comparison windows):
- **Before:** 2018–2019 (pre-pandemic baseline)
- **During:** 2020–2021 (pandemic restrictions)
- **After:** 2023–2025 (post-pandemic recovery)
- **Transition:** 2022 (excluded from comparison)

**Lockdown date:** March 1, 2020 (Philadelphia stay-at-home order).

**Displacement analysis approach:** Residential vs. commercial burglary comparison using UCR burglary code (500) and text_general_code descriptors.

### Assumptions
- March 1, 2020 is used as the lockdown start (aligns with state guidelines).
- 2022 is treated as a transition year and excluded from comparisons.
- Burglary classification is based on UCR sub-codes in text_general_code.
- Displacement hypothesis: empty commercial areas and occupied homes change burglary targets.

In [None]:
# Parameters (injected by papermill)
VERSION = "v1.0"
LOCKDOWN_DATE = "2020-03-01"
BEFORE_YEARS = [2018, 2019]
DURING_YEARS = [2020, 2021]
AFTER_START_YEAR = 2023
FAST_MODE = False

In [None]:
from datetime import datetime, timezone
from pathlib import Path
import warnings

import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

from analysis.config_loader import Phase1Config
from analysis.utils import load_data, classify_crime_category, extract_temporal_features
from analysis.config import COLORS
from analysis.artifact_manager import create_version_manifest, save_manifest
from analysis.report_utils import (
    format_data_quality_table,
    generate_data_quality_summary,
    render_report_template,
)

warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:.2f}'.format)
sns.set_theme(style='whitegrid', context='talk')

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)

config = Phase1Config()
params = config.get_notebook_params('covid')

VERSION = VERSION if 'VERSION' in globals() else config.version
LOCKDOWN_DATE = pd.Timestamp(LOCKDOWN_DATE) if 'LOCKDOWN_DATE' in globals() else pd.Timestamp(params['lockdown_date'])
BEFORE_YEARS = BEFORE_YEARS if 'BEFORE_YEARS' in globals() else params['before_years']
DURING_YEARS = DURING_YEARS if 'DURING_YEARS' in globals() else params['during_years']
AFTER_START_YEAR = AFTER_START_YEAR if 'AFTER_START_YEAR' in globals() else params['after_start_year']
FAST_MODE = FAST_MODE if 'FAST_MODE' in globals() else False

REPORTS_DIR = Path(config.data['environment']['output_dir'])
REPORTS_DIR.mkdir(parents=True, exist_ok=True)

RUN_TIMESTAMP = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
print('Run timestamp:', RUN_TIMESTAMP)
print('Reports dir:', REPORTS_DIR.resolve())

In [None]:
# Reproducibility
import os
import platform
import sys
import importlib

def safe_version(package: str) -> str:
    try:
        module = importlib.import_module(package)
        return getattr(module, '__version__', 'unknown')
    except Exception:
        return 'not installed'

print('Python:', sys.version.replace('\n', ' '))
print('Platform:', platform.platform())
print('Conda env:', os.environ.get('CONDA_DEFAULT_ENV', 'unknown'))
print('pandas:', pd.__version__)
print('numpy:', np.__version__)
print('matplotlib:', matplotlib.__version__)
print('seaborn:', sns.__version__)
print('scipy:', safe_version('scipy'))
print('plotly:', safe_version('plotly'))
print('Random seed set to', RANDOM_SEED)

## Data Loading
Load the crime incidents dataset and standardize date fields using shared utilities.

In [None]:
df = load_data()
df = extract_temporal_features(df)
df = classify_crime_category(df)

if df['dispatch_date'].isna().any():
    raise ValueError('Invalid date parsing: dispatch_date contains NaT values.')

# Ensure month column for time series
df['month'] = pd.to_datetime(df['dispatch_date']).dt.to_period('M').dt.to_timestamp()

# Filter to 2018-2025 and exclude partial 2026
df = df[(df['year'] >= min(BEFORE_YEARS)) & (df['year'] <= 2025)].copy()

if FAST_MODE:
    fast_frac = float(config.data['environment']['fast_sample_frac'])
    df = df.sample(frac=fast_frac, random_state=RANDOM_SEED)

print(f'Loaded {len(df):,} records after filtering to 2018-2025')
print('Date range:', df['dispatch_date'].min(), 'to', df['dispatch_date'].max())

In [None]:
def assign_period(year: int) -> str:
    if year in BEFORE_YEARS:
        return 'Before'
    if year in DURING_YEARS:
        return 'During'
    if year >= AFTER_START_YEAR:
        return 'After'
    return 'Transition'

df['period'] = df['year'].apply(assign_period)
df_period = df[df['period'] != 'Transition'].copy()

if df_period.empty:
    raise ValueError('No data available for Before/During/After periods after filtering.')

period_counts = df_period.groupby('period').size().sort_index()
display(period_counts.to_frame('records'))

## Data Quality Summary
Records per period and missingness overview for the filtered dataset.

In [None]:
data_quality_summary = generate_data_quality_summary(df_period)
data_quality_table = format_data_quality_table(data_quality_summary)
print(data_quality_table)

## Findings
The following sections quantify changes across COVID periods, highlight displacement effects, and visualize the lockdown impact on monthly crime counts.

In [None]:
monthly_counts = df_period.groupby('month').size().reset_index(name='incidents')

before_start = pd.Timestamp(f'{min(BEFORE_YEARS)}-01-01')
before_end = pd.Timestamp(f'{max(BEFORE_YEARS)}-12-31')
during_start = pd.Timestamp(f'{min(DURING_YEARS)}-01-01')
during_end = pd.Timestamp(f'{max(DURING_YEARS)}-12-31')
after_start = pd.Timestamp(f'{AFTER_START_YEAR}-01-01')
after_end = pd.Timestamp('2025-12-31')

fig1, ax = plt.subplots(figsize=(14, 7))
ax.plot(monthly_counts['month'], monthly_counts['incidents'], color='#264653', linewidth=2.2)

y_max = monthly_counts['incidents'].max()
ax.axvline(LOCKDOWN_DATE, color='red', linestyle='--', linewidth=2)
ax.annotate(
    'COVID-19 Lockdown',
    xy=(LOCKDOWN_DATE, y_max * 0.9),
    fontsize=12, color='red', fontweight='bold',
    ha='left'
)

ax.axvspan(before_start, before_end, alpha=0.08, color='blue', label='Before')
ax.axvspan(during_start, during_end, alpha=0.08, color='red', label='During')
ax.axvspan(after_start, after_end, alpha=0.08, color='green', label='After')

# Key point annotations
before_peak = monthly_counts[(monthly_counts['month'] >= before_start) & (monthly_counts['month'] <= before_end)]
during_min = monthly_counts[(monthly_counts['month'] >= during_start) & (monthly_counts['month'] <= during_end)]
after_recovery = monthly_counts[(monthly_counts['month'] >= after_start)]

before_peak_row = before_peak.loc[before_peak['incidents'].idxmax()]
during_min_row = during_min.loc[during_min['incidents'].idxmin()]
after_recovery_row = after_recovery.loc[after_recovery['incidents'].idxmax()]

ax.scatter([before_peak_row['month']], [before_peak_row['incidents']], color='blue', zorder=5)
ax.annotate('Peak (Before)', xy=(before_peak_row['month'], before_peak_row['incidents']),
            xytext=(before_peak_row['month'], before_peak_row['incidents'] * 1.05), fontsize=10)

ax.scatter([during_min_row['month']], [during_min_row['incidents']], color='red', zorder=5)
ax.annotate('Min (During)', xy=(during_min_row['month'], during_min_row['incidents']),
            xytext=(during_min_row['month'], during_min_row['incidents'] * 0.9), fontsize=10)

ax.scatter([after_recovery_row['month']], [after_recovery_row['incidents']], color='green', zorder=5)
ax.annotate('Recovery (After)', xy=(after_recovery_row['month'], after_recovery_row['incidents']),
            xytext=(after_recovery_row['month'], after_recovery_row['incidents'] * 1.05), fontsize=10)

before_mean = before_peak['incidents'].mean()
during_mean = during_min['incidents'].mean()
pct_change = (during_mean - before_mean) / before_mean * 100
ax.text(0.02, 0.92, f'During vs Before: {pct_change:.1f}%', transform=ax.transAxes, fontsize=11)

ax.set_title('Monthly Crime Counts (2018-2025)', fontweight='bold')
ax.set_xlabel('Month')
ax.set_ylabel('Incidents per Month')
ax.legend(loc='upper left')
ax.text(0.5, 1.02, f'Generated: {RUN_TIMESTAMP}', transform=ax.transAxes, ha='center', fontsize=10)

fig1.tight_layout()
timeline_path = REPORTS_DIR / f'covid_timeline_{VERSION}.png'
fig1.savefig(timeline_path, dpi=300, bbox_inches='tight')
print('Saved timeline to', timeline_path)

In [None]:
period_stats = df_period.groupby('period').agg(
    incidents=('objectid', 'count'),
    violent_share=('crime_category', lambda x: (x == 'Violent').mean()),
)

period_stats['During vs Before (%)'] = (
    (period_stats.loc['During', 'incidents'] - period_stats.loc['Before', 'incidents'])
    / period_stats.loc['Before', 'incidents'] * 100
)
period_stats['After vs During (%)'] = (
    (period_stats.loc['After', 'incidents'] - period_stats.loc['During', 'incidents'])
    / period_stats.loc['During', 'incidents'] * 100
)

display(period_stats)

category_counts = (
    df_period.groupby(['period', 'crime_category'])
    .size()
    .unstack(fill_value=0)
)

fig3, ax = plt.subplots(figsize=(12, 6))
category_counts = category_counts[['Violent', 'Property', 'Other']]
category_counts.plot(kind='bar', ax=ax, color=[COLORS['Violent'], COLORS['Property'], COLORS['Other']])
ax.set_title('Crime Category Counts by Period', fontweight='bold')
ax.set_ylabel('Incident count')
ax.set_xlabel('Period')
ax.legend(title='Category')
ax.grid(axis='y', alpha=0.2)

for container in ax.containers:
    ax.bar_label(container, fmt='%.0f', fontsize=9)

fig3.tight_layout()
comparison_path = REPORTS_DIR / f'period_comparison_{VERSION}.png'
fig3.savefig(comparison_path, dpi=300, bbox_inches='tight')
print('Saved period comparison to', comparison_path)

In [None]:
from scipy.stats import chi2_contingency

burglary_df = df_period[df_period['ucr_general'] == 500].copy()
if burglary_df.empty:
    raise ValueError('Missing burglary data for UCR code 500 after filtering.')

def classify_burglary(text: str) -> str:
    text_value = str(text)
    if 'Residential' in text_value:
        return 'Residential'
    if 'Non-Residential' in text_value or 'Commercial' in text_value:
        return 'Commercial'
    return 'Other'

burglary_df['burglary_type'] = burglary_df['text_general_code'].apply(classify_burglary)

burglary_pivot = burglary_df.groupby(['period', 'burglary_type']).size().unstack(fill_value=0)

if 'Before' not in burglary_pivot.index or 'During' not in burglary_pivot.index:
    raise ValueError('Empty period groups for burglary analysis.')

pct_change_during = (
    (burglary_pivot.loc['During'] - burglary_pivot.loc['Before'])
    / burglary_pivot.loc['Before'] * 100
)

fig2, ax = plt.subplots(figsize=(10, 6))
burglary_pivot[['Residential', 'Commercial']].plot(kind='bar', ax=ax, color=['#457B9D', '#F4A261'])
ax.set_title('Burglary Displacement by Period', fontweight='bold')
ax.set_ylabel('Incidents')
ax.set_xlabel('Period')
ax.legend(title='Burglary Type')
ax.grid(axis='y', alpha=0.2)

for container in ax.containers:
    ax.bar_label(container, fmt='%.0f', fontsize=9)

fig2.tight_layout()
displacement_path = REPORTS_DIR / f'burglary_displacement_{VERSION}.png'
fig2.savefig(displacement_path, dpi=300, bbox_inches='tight')
print('Saved displacement chart to', displacement_path)

contingency = burglary_pivot[['Residential', 'Commercial']].loc[['Before', 'During', 'After']]
chi2, p_value, _, _ = chi2_contingency(contingency)

print('Chi-square test (burglary distribution)')
print(f'chi2={chi2:.2f}, p={p_value:.4f}')
print(
    f"During lockdown: Residential burglaries {pct_change_during['Residential']:.1f}%, "
    f"Commercial burglaries {pct_change_during['Commercial']:.1f}%"
)

In [None]:
category_contingency = category_counts[['Violent', 'Property', 'Other']]
chi2_cat, p_value_cat, _, _ = chi2_contingency(category_contingency)

print('Chi-square test (crime category distribution)')
print(f'chi2={chi2_cat:.2f}, p={p_value_cat:.4f}')

In [None]:
report_context = {
    'title': 'COVID-19 Lockdown Crime Landscape',
    'timestamp': RUN_TIMESTAMP,
    'version': VERSION,
    'summary': (
        'Crime volumes declined during lockdown, while burglary patterns shifted toward commercial targets.'
    ),
    'methods': (
        'Compared Before (2018-2019), During (2020-2021), and After (2023-2025) periods with 2022 excluded.'
    ),
    'data_quality_table': data_quality_table,
    'findings': (
        'Burglary displacement observed: commercial burglaries rose during lockdown while residential declined.'
    ),
    'limitations': (
        'Confounding factors (economic recession, civil unrest) and reporting delays are not controlled.'
    ),
    'n_records': data_quality_summary['n_records'],
    'date_range': data_quality_summary['date_range'],
    'git_commit': create_version_manifest(VERSION, [], {}, 0.0).get('git_commit'),
}

template_path = Path('config') / 'report_template.md.j2'
report_markdown = render_report_template(template_path, report_context)
report_path = REPORTS_DIR / f'covid_report_{VERSION}.md'
report_path.write_text(report_markdown, encoding='utf-8')

artifacts = [timeline_path, displacement_path, comparison_path, report_path]
manifest = create_version_manifest(
    version=VERSION,
    artifacts=artifacts,
    params={
        'LOCKDOWN_DATE': str(LOCKDOWN_DATE.date()),
        'BEFORE_YEARS': BEFORE_YEARS,
        'DURING_YEARS': DURING_YEARS,
        'AFTER_START_YEAR': AFTER_START_YEAR,
        'FAST_MODE': FAST_MODE,
    },
    runtime_seconds=0.0,
)
manifest_path = REPORTS_DIR / f'covid_manifest_{VERSION}.json'
save_manifest(manifest, manifest_path)

print('Saved report to', report_path)
print('Saved manifest to', manifest_path)

## Limitations
- Confounding factors: economic recession, civil unrest, and policy changes are not controlled.
- Short post-COVID window (only 2023–2025) limits recovery inference.
- Reporting delays during the pandemic may affect incident counts.
- This analysis focuses on citywide aggregates; neighborhood-level dynamics are not captured.