# Is Philadelphia Actually Getting Safer?
## Annual Crime Trend Analysis (2015-2025)

A comprehensive analysis of Philadelphia crime trends from 2015 through 2025, comparing violent and property crimes to determine whether the city is genuinely becoming safer.

### Key Questions
- What has happened to crime rates in Philadelphia over the past decade?
- How do violent crimes compare to property crimes in terms of trends?
- Did 2020-2021 represent peak crime years?
- What is the percentage change from peak crime to 2024-2025?

### Analysis Approach
This notebook aggregates annual incident counts (raw counts, not per-capita rates), classifies crimes as violent or property offenses, and visualizes the trends to assess Philadelphia's safety trajectory.
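
The core aggregation can be sketched on a few made-up incidents. This sketch assumes each incident already carries a category label (the notebook derives it in Section 3) and uses `pd.crosstab` as a compact alternative to the `groupby`/`pivot` pair used later. The data are hypothetical.

```python
import pandas as pd

# Hypothetical toy incidents standing in for the real parquet file
incidents = pd.DataFrame({
    "dispatch_date": ["2020-03-01", "2020-07-15", "2021-01-09", "2021-05-30", "2021-11-02"],
    "crime_category": ["Violent", "Property", "Property", "Violent", "Other"],
})

# Extract the calendar year from each dispatch date
incidents["year"] = pd.to_datetime(incidents["dispatch_date"]).dt.year

# One row per year, one column per category; each cell is an incident count
annual_table = pd.crosstab(incidents["year"], incidents["crime_category"])
print(annual_table)
```

Because every cell is a count, each row sums to that year's total incidents.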

In [None]:
# ============================================================================
# SECTION 1: Load and Explore Philadelphia Crime Data
# ============================================================================

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from pathlib import Path

warnings.filterwarnings('ignore')

# Configure visualization settings
plt.style.use('default')
sns.set_theme(style="whitegrid")
plt.rcParams['figure.figsize'] = (14, 8)
plt.rcParams['font.size'] = 11

# Load the Philadelphia crime data
data_path = Path('../data/crime_incidents_combined.parquet')
df = pd.read_parquet(data_path)

# Display basic information about the dataset
print("=" * 80)
print("PHILADELPHIA CRIME INCIDENTS DATASET - OVERVIEW")
print("=" * 80)
print(f"\nDataset Shape: {df.shape[0]:,} incidents, {df.shape[1]} features")
print("\nColumn Names and Data Types:")
print(df.dtypes)
print("\nFirst 5 rows of the dataset:")
print(df.head())
print("\nDate Range:")
# Convert categorical date to datetime first
dispatch_dates = pd.to_datetime(df['dispatch_date'].astype(str))
print(f"  Earliest incident: {dispatch_dates.min()}")
print(f"  Latest incident:   {dispatch_dates.max()}")
print(f"\nDataset is missing {df.isnull().sum().sum():,} total values across all columns")


## Table of Contents
1. Load and Explore Philadelphia Crime Data ✓
2. Aggregate Crime Data by Year
3. Classify Crimes as Violent or Property
4. Calculate Year-over-Year Trends
5. Identify Peak Crime Year and Calculate Percentage Drop
6. Create Trend Line Visualization
7. Generate Insights and Summary Statistics

In [None]:
# ============================================================================
# SECTION 2: Aggregate Crime Data by Year
# ============================================================================

# Extract year from dispatch_date - convert from categorical to string first
df['year'] = pd.to_datetime(df['dispatch_date'].astype(str)).dt.year

# Get the data from 2015 onwards, excluding incomplete 2026 data
df_filtered = df[(df['year'] >= 2015) & (df['year'] <= 2025)].copy()

# Aggregate total crimes by year
annual_crimes = df_filtered.groupby('year').size().reset_index(name='total_crimes')

print("\n" + "=" * 80)
print("SECTION 2: ANNUAL CRIME AGGREGATION (2015-2025)")
print("=" * 80)
print("\nTotal Crimes by Year:")
print(annual_crimes.to_string(index=False))
print(f"\nTotal crimes in dataset (2015-2025): {annual_crimes['total_crimes'].sum():,}")
print(f"Average crimes per year: {annual_crimes['total_crimes'].mean():,.0f}")
print(f"Peak year (by count): {annual_crimes.loc[annual_crimes['total_crimes'].idxmax(), 'year']:.0f} "
      f"with {annual_crimes['total_crimes'].max():,} crimes")
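
The filter above hard-codes 2025 as the last complete year. Here is a minimal sketch of a data-driven check. It assumes, as a heuristic rather than the notebook's own rule, that a year is complete when its latest recorded incident falls in December. `complete_years` is a hypothetical helper.

```python
import pandas as pd

def complete_years(dates: pd.Series) -> list[int]:
    """Return years whose latest recorded incident falls in December.

    Heuristic assumption: a year whose data stop before December is
    treated as partial and should be excluded from annual comparisons.
    """
    parsed = pd.to_datetime(dates.astype(str))
    # Latest timestamp per calendar year, reduced to its month number
    last_month_per_year = parsed.groupby(parsed.dt.year).max().dt.month
    return [int(year) for year, month in last_month_per_year.items() if month == 12]

sample = pd.Series(["2024-01-05", "2024-12-30", "2025-06-01", "2025-12-31", "2026-02-14"])
print(complete_years(sample))  # [2024, 2025]
```

Applied to `df['dispatch_date']`, its output could replace the hard-coded year bounds used to build `df_filtered`.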

In [None]:
# ============================================================================
# SECTION 3: Classify Crimes as Violent or Property
# ============================================================================

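# Define crime categories. classify_crime() matches each keyword as a
# case-insensitive substring of text_general_code (not an exact match), so
# 'Theft' also catches codes such as 'Thefts' and 'Motor Vehicle Theft'.
# 'Weapon Violations' is grouped as violent by choice here; the FBI's UCR
# program treats weapon violations as a Part II offense.
violent_crime_keywords = [
    'Aggravated Assault',
    'Robbery',
    'Homicide',
    'Rape',
    'Weapon Violations',
    'Kidnapping'
]

# Arson belongs here: the FBI's UCR program classifies it as a Part I
# property crime, not a violent crime
property_crime_keywords = [
    'Burglary',
    'Theft',
    'Motor Vehicle Theft',
    'Vandalism/Criminal Mischief',
    'Stolen Property',
    'Receiving Stolen Property',
    'Fraud',
    'Embezzlement',
    'Forgery',
    'Arson'
]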

# Create a function to classify crimes
def classify_crime(crime_type):
    """Classify crime as Violent, Property, or Other"""
    if pd.isna(crime_type):
        return 'Other'
    
    crime_text = str(crime_type).strip()
    
    for violent_keyword in violent_crime_keywords:
        if violent_keyword.lower() in crime_text.lower():
            return 'Violent'
    
    for property_keyword in property_crime_keywords:
        if property_keyword.lower() in crime_text.lower():
            return 'Property'
    
    return 'Other'

# Apply classification using text_general_code column
df_filtered['crime_category'] = df_filtered['text_general_code'].apply(classify_crime)

# Aggregate by year and crime category
annual_by_category = df_filtered.groupby(['year', 'crime_category']).size().reset_index(name='count')
annual_by_category_pivot = annual_by_category.pivot(index='year', columns='crime_category', values='count').fillna(0)

print("\n" + "=" * 80)
print("SECTION 3: CRIME CLASSIFICATION - VIOLENT VS. PROPERTY")
print("=" * 80)
print("\nCrime Categories Distribution (All Years):")
print(df_filtered['crime_category'].value_counts())
print("\nAnnual Breakdown by Crime Category:")
print(annual_by_category_pivot.astype(int))

# Extract the Violent and Property counts as year-indexed Series
violent_df = annual_by_category[annual_by_category['crime_category'] == 'Violent'].copy().set_index('year')['count']
property_df = annual_by_category[annual_by_category['crime_category'] == 'Property'].copy().set_index('year')['count']

# Combine with the totals; pandas aligns the three Series on the shared 'year' index
annual_crimes_reset = annual_crimes.set_index('year')
trend_data = pd.DataFrame({
    'total_crimes': annual_crimes_reset['total_crimes'],
    'violent': violent_df,
    'property': property_df
}).reset_index()

trend_data['violent'] = trend_data['violent'].fillna(0).astype(int)
trend_data['property'] = trend_data['property'].fillna(0).astype(int)

print("\nCombined Trend Data:")
print(trend_data)
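
`text_general_code` repeats a small set of distinct values across many incidents, so the same substring rule can be evaluated once per distinct code and then broadcast with `Series.map`. This sketch uses shortened, hypothetical keyword lists. `classify` and `classify_series` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Shortened keyword lists for illustration; same rule as classify_crime:
# case-insensitive substring match, violent keywords checked first
VIOLENT = ["Aggravated Assault", "Robbery", "Homicide", "Rape"]
PROPERTY = ["Burglary", "Theft", "Vandalism/Criminal Mischief"]

def classify(code) -> str:
    if pd.isna(code):
        return "Other"
    text = str(code).lower()
    if any(k.lower() in text for k in VIOLENT):
        return "Violent"
    if any(k.lower() in text for k in PROPERTY):
        return "Property"
    return "Other"

def classify_series(codes: pd.Series) -> pd.Series:
    # Classify each distinct code once, then look every row up in the dict
    lookup = {code: classify(code) for code in codes.dropna().unique()}
    # astype(object) avoids categorical fillna errors; missing codes become 'Other'
    return codes.astype("object").map(lookup).fillna("Other")

codes = pd.Series(["Robbery Firearm", "Thefts", "Thefts", "Fraud", None, "Motor Vehicle Theft"])
print(classify_series(codes).tolist())
# ['Violent', 'Property', 'Property', 'Other', 'Other', 'Property']
```

The per-row cost becomes a dictionary lookup, while the keyword loop runs only once per distinct code.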

In [None]:
# ============================================================================
# SECTION 4: Calculate Year-over-Year Trends
# ============================================================================

# Calculate year-over-year percentage change
trend_data['violent_yoy_change'] = trend_data['violent'].pct_change() * 100
trend_data['property_yoy_change'] = trend_data['property'].pct_change() * 100
trend_data['total_yoy_change'] = trend_data['total_crimes'].pct_change() * 100

# Calculate centered 3-year moving averages to smooth trends; the first and
# last years are NaN because they lack a full 3-year window
trend_data['violent_ma3'] = trend_data['violent'].rolling(window=3, center=True).mean()
trend_data['property_ma3'] = trend_data['property'].rolling(window=3, center=True).mean()

print("\n" + "=" * 80)
print("SECTION 4: YEAR-OVER-YEAR TRENDS & MOVING AVERAGES")
print("=" * 80)
print("\nYear-over-Year Percentage Change:")
trend_display = trend_data[['year', 'violent', 'violent_yoy_change', 'property', 'property_yoy_change', 'total_yoy_change']].copy()
print(trend_display.to_string(index=False))

print("\nMoving Average (3-Year Window) for Trend Smoothing:")
ma_display = trend_data[['year', 'violent_ma3', 'property_ma3']].copy()
print(ma_display.to_string(index=False))
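
On toy counts, the two calculations above behave as follows. `pct_change` leaves the first year as NaN, and a centered 3-year window leaves both the first and last years as NaN. This is why the dashed trend lines in Section 6 stop one year short at each end. The numbers are hypothetical.

```python
import pandas as pd

counts = pd.Series([100, 120, 90, 90], index=[2019, 2020, 2021, 2022])

# Year-over-year % change; the first year has no prior year, so it is NaN
yoy = counts.pct_change() * 100

# Centered window: year t averages t-1, t and t+1, so both edges are NaN
ma3 = counts.rolling(window=3, center=True).mean()

print(yoy.round(1).tolist())
print(ma3.round(2).tolist())
```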

In [None]:
# ============================================================================
# SECTION 5: Identify Peak Crime Year and Calculate Percentage Drop
# ============================================================================

# Find peak crime year for violent crimes
violent_peak_idx = trend_data['violent'].idxmax()
violent_peak_year = int(trend_data.loc[violent_peak_idx, 'year'])
violent_peak_count = int(trend_data.loc[violent_peak_idx, 'violent'])

# Find peak crime year for property crimes
property_peak_idx = trend_data['property'].idxmax()
property_peak_year = int(trend_data.loc[property_peak_idx, 'year'])
property_peak_count = int(trend_data.loc[property_peak_idx, 'property'])

# Find peak crime year for total crimes
total_peak_idx = trend_data['total_crimes'].idxmax()
total_peak_year = int(trend_data.loc[total_peak_idx, 'year'])
total_peak_count = int(trend_data.loc[total_peak_idx, 'total_crimes'])

# Get current year values (2024/2025 - latest available)
latest_year = trend_data['year'].max()
latest_violent = int(trend_data[trend_data['year'] == latest_year]['violent'].values[0])
latest_property = int(trend_data[trend_data['year'] == latest_year]['property'].values[0])
latest_total = int(trend_data[trend_data['year'] == latest_year]['total_crimes'].values[0])

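# Calculate percentage changes from peak to latest. Each peak is the maximum
# over all years (including the latest), so these values are always <= 0.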
violent_pct_change = ((latest_violent - violent_peak_count) / violent_peak_count) * 100
property_pct_change = ((latest_property - property_peak_count) / property_peak_count) * 100
total_pct_change = ((latest_total - total_peak_count) / total_peak_count) * 100

# Also calculate for earliest year (2015) for comparison
year_2015_data = trend_data[trend_data['year'] == 2015]
violent_2015 = int(year_2015_data['violent'].values[0]) if len(year_2015_data) > 0 else 0
property_2015 = int(year_2015_data['property'].values[0]) if len(year_2015_data) > 0 else 0
total_2015 = int(year_2015_data['total_crimes'].values[0]) if len(year_2015_data) > 0 else 0

violent_pct_vs_2015 = ((latest_violent - violent_2015) / violent_2015) * 100 if violent_2015 > 0 else 0
property_pct_vs_2015 = ((latest_property - property_2015) / property_2015) * 100 if property_2015 > 0 else 0
total_pct_vs_2015 = ((latest_total - total_2015) / total_2015) * 100 if total_2015 > 0 else 0

print("\n" + "=" * 80)
print("SECTION 5: PEAK CRIME YEAR & PERCENTAGE CHANGE ANALYSIS")
print("=" * 80)

print(f"\n📊 VIOLENT CRIMES:")
print(f"  Peak Year: {violent_peak_year} with {violent_peak_count:,} incidents")
print(f"  {latest_year} Count: {latest_violent:,} incidents")
print(f"  Change from Peak: {violent_pct_change:+.1f}%")
print(f"  Change vs. 2015: {violent_pct_vs_2015:+.1f}%")
print(f"  Absolute Change: {latest_violent - violent_peak_count:,} incidents")

print(f"\n📊 PROPERTY CRIMES:")
print(f"  Peak Year: {property_peak_year} with {property_peak_count:,} incidents")
print(f"  {latest_year} Count: {latest_property:,} incidents")
print(f"  Change from Peak: {property_pct_change:+.1f}%")
print(f"  Change vs. 2015: {property_pct_vs_2015:+.1f}%")
print(f"  Absolute Change: {latest_property - property_peak_count:,} incidents")

print(f"\n📊 TOTAL CRIMES:")
print(f"  Peak Year: {total_peak_year} with {total_peak_count:,} incidents")
print(f"  {latest_year} Count: {latest_total:,} incidents")
print(f"  Change from Peak: {total_pct_change:+.1f}%")
print(f"  Change vs. 2015: {total_pct_vs_2015:+.1f}%")
print(f"  Absolute Change: {latest_total - total_peak_count:,} incidents")
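
The peak comparison reduces to (latest - peak) / peak × 100. The peak is the maximum over every year, including the latest one, so the result can never be positive. It is zero only when the latest year is itself the peak. Here is a minimal sketch with a hypothetical helper name and toy counts.

```python
import pandas as pd

def change_from_peak(counts: pd.Series) -> tuple[int, float]:
    """Return (peak_year, % change from the peak year to the last year)."""
    peak_year = int(counts.idxmax())  # first year on ties, like the notebook
    peak = counts.loc[peak_year]
    latest = counts.iloc[-1]
    return peak_year, float((latest - peak) / peak * 100)

counts = pd.Series([800, 1000, 950, 700], index=[2019, 2020, 2021, 2022])
print(change_from_peak(counts))
```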

## Data Quality Check
Let's verify our crime classification is working properly by examining a few examples from each category.

In [None]:
print("\n" + "=" * 80)
print("DATA QUALITY CHECK: CRIME CLASSIFICATION EXAMPLES")
print("=" * 80)

for category in ['Violent', 'Property', 'Other']:
    print(f"\nSample {category} Crimes:")
    samples = df_filtered[df_filtered['crime_category'] == category]['text_general_code'].unique()[:5]
    for sample in samples:
        if pd.notna(sample):
            print(f"  • {sample}")
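
Sampling five codes per category can miss a misclassified code. The sketch below lists every distinct code with its assigned category and incident count instead. `df_check` is a hypothetical stand-in for `df_filtered`. `observed=True` keeps the output to combinations that actually occur when the columns are categorical.

```python
import pandas as pd

# Hypothetical stand-in for df_filtered's two relevant columns
df_check = pd.DataFrame({
    "text_general_code": ["Robbery Firearm", "Thefts", "Thefts", "Fraud", "Other Assaults"],
    "crime_category": ["Violent", "Property", "Property", "Property", "Other"],
})

# Every distinct code with its assigned category and incident count
audit = (
    df_check.groupby(["crime_category", "text_general_code"], observed=True)
    .size()
    .rename("incidents")
    .reset_index()
    .sort_values(["crime_category", "incidents"], ascending=[True, False])
)
print(audit.to_string(index=False))
```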

---

# SECTION 6: Create Trend Line Visualization

This section creates a comprehensive visualization showing violent and property crime trends over the decade.

In [None]:
# ============================================================================
# SECTION 6: Create Trend Line Visualization
# ============================================================================

# Create a professional trend line chart
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 12))

# Color scheme
color_violent = '#d62728'  # red
color_property = '#1f77b4'  # blue

# --- Plot 1: Violent vs Property Crime Trends with Actual & Moving Average ---
ax1.plot(trend_data['year'], trend_data['violent'], 
         marker='o', linestyle='-', linewidth=2.5, 
         color=color_violent, alpha=0.6, label='Violent Crimes (Actual)', markersize=6)
ax1.plot(trend_data['year'], trend_data['violent_ma3'], 
         linestyle='--', linewidth=3, 
         color=color_violent, alpha=0.9, label='Violent Crimes (3-Yr Trend)')

ax1.plot(trend_data['year'], trend_data['property'], 
         marker='s', linestyle='-', linewidth=2.5, 
         color=color_property, alpha=0.6, label='Property Crimes (Actual)', markersize=6)
ax1.plot(trend_data['year'], trend_data['property_ma3'], 
         linestyle='--', linewidth=3, 
         color=color_property, alpha=0.9, label='Property Crimes (3-Yr Trend)')

# Mark peak years
ax1.scatter([violent_peak_year], [violent_peak_count], 
           color=color_violent, s=200, marker='*', zorder=5, 
           edgecolors='black', linewidth=1.5, label=f'Violent Peak ({violent_peak_year})')
ax1.scatter([property_peak_year], [property_peak_count], 
           color=color_property, s=200, marker='*', zorder=5, 
           edgecolors='black', linewidth=1.5, label=f'Property Peak ({property_peak_year})')

# Formatting
ax1.set_xlabel('Year', fontsize=12, fontweight='bold')
ax1.set_ylabel('Number of Incidents', fontsize=12, fontweight='bold')
ax1.set_title('Philadelphia Crime Trends (2015-2025)\nViolent vs. Property Crimes with 3-Year Trend Lines', 
             fontsize=14, fontweight='bold', pad=20)
ax1.grid(True, alpha=0.3)
ax1.legend(loc='best', fontsize=10, framealpha=0.95)
ax1.set_xticks(range(int(trend_data['year'].min()), int(trend_data['year'].max())+1))
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{int(x):,}'))

# --- Plot 2: Year-over-Year Percentage Change ---
width = 0.35
years = trend_data['year'].values[1:]  # Skip first year (no YoY)
violent_changes = trend_data['violent_yoy_change'].values[1:]
property_changes = trend_data['property_yoy_change'].values[1:]

x = np.arange(len(years))
ax2.bar(x - width/2, violent_changes, width, label='Violent Crimes YoY %', 
        color=color_violent, alpha=0.8, edgecolor='black', linewidth=0.5)
ax2.bar(x + width/2, property_changes, width, label='Property Crimes YoY %', 
        color=color_property, alpha=0.8, edgecolor='black', linewidth=0.5)

# Add zero line
ax2.axhline(y=0, color='black', linestyle='-', linewidth=0.8)

# Formatting
ax2.set_xlabel('Year', fontsize=12, fontweight='bold')
ax2.set_ylabel('Year-over-Year % Change', fontsize=12, fontweight='bold')
ax2.set_title('Year-over-Year Percentage Change in Crime Incidents', 
             fontsize=14, fontweight='bold', pad=20)
ax2.set_xticks(x)
ax2.set_xticklabels(years.astype(int))
ax2.legend(loc='best', fontsize=10, framealpha=0.95)
ax2.grid(True, alpha=0.3, axis='y')
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, p: f'{y:.0f}%'))

plt.tight_layout()
Path('../reports').mkdir(parents=True, exist_ok=True)  # create the output folder if needed
plt.savefig('../reports/philadelphia_safety_trend_chart.png', dpi=300, bbox_inches='tight')
plt.show()

print("\n✅ Chart saved to: reports/philadelphia_safety_trend_chart.png")

---

# SECTION 7: Generate Insights and Summary Statistics

## Key Findings Summary

In [None]:
# ============================================================================
# SECTION 7: Generate Insights and Summary Statistics
# ============================================================================

print("\n" + "=" * 80)
print("🔍 COMPREHENSIVE INSIGHTS: IS PHILADELPHIA GETTING SAFER?")
print("=" * 80)

# Calculate decade statistics
decade_start = trend_data[trend_data['year'] == 2015]['total_crimes'].values[0]
decade_end = trend_data[trend_data['year'] == latest_year]['total_crimes'].values[0]
decade_change_pct = ((decade_end - decade_start) / decade_start) * 100

print(f"\n📈 DECADE-WIDE PERSPECTIVE (2015-{latest_year}):")
print(f"   Crime in 2015:        {int(decade_start):,} incidents")
print(f"   Crime in {latest_year}:        {int(decade_end):,} incidents")
print(f"   Net Change:           {int(decade_end - decade_start):,} incidents ({decade_change_pct:+.1f}%)")

# Peak year insights
years_since_peak_total = latest_year - total_peak_year
years_since_peak_violent = latest_year - violent_peak_year
years_since_peak_property = latest_year - property_peak_year

print(f"\n🏔️  PEAK CRIME ANALYSIS:")
print(f"   Peak Year (Total):    {total_peak_year} ({total_peak_count:,} incidents)")
print(f"   Years Since Peak:     {years_since_peak_total} years")
print(f"   Improvement:          {abs(total_pct_change):.1f}% reduction since peak")
print(f"\n   Peak Year (Violent):  {violent_peak_year} ({violent_peak_count:,} incidents)")
print(f"   Years Since Peak:     {years_since_peak_violent} years")
print(f"   Improvement:          {abs(violent_pct_change):.1f}% reduction since peak")
print(f"\n   Peak Year (Property): {property_peak_year} ({property_peak_count:,} incidents)")
print(f"   Years Since Peak:     {years_since_peak_property} years")
print(f"   Improvement:          {abs(property_pct_change):.1f}% reduction since peak")

# Calculate average annual crimes by period. The boundaries are hard-coded and the
# 'Peak' label assumes a 2021-2022 peak; check it against the peak years in Section 5.
early_period = trend_data[trend_data['year'] <= 2017]
mid_period = trend_data[(trend_data['year'] > 2017) & (trend_data['year'] <= 2020)]
peak_period = trend_data[(trend_data['year'] > 2020) & (trend_data['year'] <= 2022)]
recovery_period = trend_data[trend_data['year'] > 2022]

print(f"\n📊 CRIME BY PERIOD:")
print(f"   2015-2017 (Early):    {early_period['total_crimes'].mean():,.0f} incidents/year avg")
print(f"   2018-2020 (Pre-Peak): {mid_period['total_crimes'].mean():,.0f} incidents/year avg")
print(f"   2021-2022 (Peak):     {peak_period['total_crimes'].mean():,.0f} incidents/year avg")
print(f"   2023-{latest_year} (Recovery):  {recovery_period['total_crimes'].mean():,.0f} incidents/year avg")

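# Calculate violent and property shares of all incidents. 'Other' incidents are
# neither, so the two shares do not sum to 100% and each is computed directly.
avg_violent_ratio = (trend_data['violent'] / trend_data['total_crimes'] * 100).mean()
avg_property_ratio = (trend_data['property'] / trend_data['total_crimes'] * 100).mean()
latest_violent_ratio = latest_violent / latest_total * 100
latest_property_ratio = latest_property / latest_total * 100

print(f"\n⚖️  VIOLENT VS PROPERTY CRIME SHARE:")
print(f"   Decade Average:       {avg_violent_ratio:.1f}% violent, {avg_property_ratio:.1f}% property")
print(f"   {latest_year} Share:         {latest_violent_ratio:.1f}% violent, {latest_property_ratio:.1f}% property")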

# Determine trajectory. total_pct_change is measured from the peak, so it is
# never positive; it is 0 only when the latest year is itself the peak year.
if total_pct_change < -10:
    trajectory = "📉 SIGNIFICANT IMPROVEMENT"
    assessment = "Philadelphia is genuinely getting safer"
elif total_pct_change < 0:
    trajectory = "📉 MODERATE IMPROVEMENT"
    assessment = "Philadelphia is getting slightly safer"
else:
    trajectory = "📈 WORSENING"
    assessment = "Philadelphia's crime is increasing"

print(f"\n{'=' * 80}")
print(f"🎯 FINAL VERDICT: {trajectory}")
print(f"{'=' * 80}")
print(f"\n{assessment}.")

if violent_pct_change < property_pct_change:
    print(f"\n✓ Violent crimes have improved more than property crimes.")
    print(f"  → Violent crimes down {abs(violent_pct_change):.1f}% from peak")
    print(f"  → Property crimes down {abs(property_pct_change):.1f}% from peak")
else:
    print(f"\n✓ Property crimes have improved more than violent crimes.")
    print(f"  → Property crimes down {abs(property_pct_change):.1f}% from peak")
    print(f"  → Violent crimes down {abs(violent_pct_change):.1f}% from peak")

print(f"\n💡 INTERPRETATION:")
# Note: these cut-offs (-15% and -5%) differ from the verdict cut-offs above (-10% and 0%)
if total_pct_change < -15:
    print(f"   This is NOT just a feeling: Philadelphia has made substantial progress.")
    print(f"   A {abs(total_pct_change):.0f}% drop represents real, measurable improvement.")
elif total_pct_change < -5:
    print(f"   Yes, there has been improvement, though not dramatic.")
    print(f"   A {abs(total_pct_change):.0f}% reduction is meaningful but recovery is incomplete.")
else:
    print(f"   The feeling may not match reality. Crime hasn't significantly improved.")
    print(f"   Focus should be on understanding specific crime drivers.")

print(f"\n" + "=" * 80)
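
The notebook keeps an "Other" bucket, so the violent and property shares do not add up to 100%. Making the remainder explicit gives a three-way split that does. The counts below are hypothetical.

```python
import pandas as pd

trend = pd.DataFrame({
    "year": [2023, 2024],
    "violent": [20, 15],
    "property": [60, 55],
    "total_crimes": [100, 90],
})

# 'Other' is whatever is neither violent nor property
trend["other"] = trend["total_crimes"] - trend["violent"] - trend["property"]

# Divide each count column by the row total, row by row
shares = trend[["violent", "property", "other"]].div(trend["total_crimes"], axis=0) * 100
print(shares.round(1))
```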

In [None]:
# Create a summary statistics table
summary_stats = pd.DataFrame({
    'Metric': [
        'Peak Year',
        'Peak Count',
        f'{latest_year} Count',
        'Change from Peak',
        'Change % from Peak',
        '2015 Count',
        f'Change vs 2015 ({(latest_year-2015)} years)',
        'Change % vs 2015'
    ],
    'Violent Crimes': [
        f'{violent_peak_year}',
        f'{violent_peak_count:,}',
        f'{latest_violent:,}',
        f'{latest_violent - violent_peak_count:,}',
        f'{violent_pct_change:.1f}%',
        f'{violent_2015:,}',
        f'{latest_violent - violent_2015:,}',
        f'{violent_pct_vs_2015:.1f}%'
    ],
    'Property Crimes': [
        f'{property_peak_year}',
        f'{property_peak_count:,}',
        f'{latest_property:,}',
        f'{latest_property - property_peak_count:,}',
        f'{property_pct_change:.1f}%',
        f'{property_2015:,}',
        f'{latest_property - property_2015:,}',
        f'{property_pct_vs_2015:.1f}%'
    ],
    'Total Crimes': [
        f'{total_peak_year}',
        f'{total_peak_count:,}',
        f'{latest_total:,}',
        f'{latest_total - total_peak_count:,}',
        f'{total_pct_change:.1f}%',
        f'{total_2015:,}',
        f'{latest_total - total_2015:,}',
        f'{total_pct_vs_2015:.1f}%'
    ]
})

print("\n" + "=" * 80)
print("SUMMARY STATISTICS TABLE")
print("=" * 80)
print(summary_stats.to_string(index=False))

# Export summary to CSV for reference
summary_stats.to_csv('../reports/philadelphia_crime_trend_summary.csv', index=False)
print("\n✅ Summary statistics saved to: reports/philadelphia_crime_trend_summary.csv")

# Export trend data
trend_export = trend_data[['year', 'violent', 'property', 'total_crimes', 
                            'violent_yoy_change', 'property_yoy_change']].copy()
trend_export.to_csv('../reports/philadelphia_crime_annual_trends.csv', index=False)
print("✅ Annual trends exported to: reports/philadelphia_crime_annual_trends.csv")

## Conclusion

Based on the comprehensive analysis of Philadelphia crime data from 2015 to 2025:

### The Answer: **It's Complicated, But Overall YES**

**The Reality:**
- Philadelphia experienced a significant spike in crime around 2020-2021; Section 5 reports the exact peak year for each category
- Since that peak, the city has made measurable progress in reducing both violent and property crimes
- The current crime level is substantially lower than during the pandemic years

**What This Means:**
- If residents' perception of danger peaked during 2020-2021, it makes sense that the city "feels" safer now, because it objectively is safer than at that peak
- However, a drop from the peak does not by itself mean crime is back to pre-pandemic levels; the "Change vs. 2015" figures in Section 5 answer that question
- Violent and property crimes can recover at different rates; the Section 7 output reports which category has fallen further from its peak

**Key Takeaway:**
Philadelphia's improving crime statistics over the past 2-3 years suggest that the city is genuinely becoming safer, not just that it feels that way. However, a complete picture also requires comparing current levels with the 2015 baseline, keeping in mind that these figures are raw incident counts rather than per-capita rates.