# Phase 2: Spatial & Socioeconomic Analysis Summary

**Purpose:** Consolidate and validate all Phase 2 analyses for Philadelphia crime data.

**Requirements Addressed:**
- PATROL-01: Hotspot clustering analysis
- PATROL-02: Robbery temporal patterns
- PATROL-03: District severity scoring
- HYP-SOCIO: Census tract crime rates

---

## 0. Reproducibility Cell

In [1]:
import time
RUNTIME_START = time.time()

import pandas as pd
import geopandas as gpd
import json
from pathlib import Path
from datetime import datetime

# Reproducibility
print(f"pandas: {pd.__version__}")
print(f"geopandas: {gpd.__version__}")
print(f"Execution timestamp: {datetime.now().isoformat()}")

pandas: 2.3.3
geopandas: 1.1.2
Execution timestamp: 2026-02-02T20:01:50.215983


In [2]:
# Path setup - find repo root
def find_repo_root():
    """Find repository root by looking for config directory."""
    current = Path.cwd()
    for parent in [current] + list(current.parents):
        if (parent / 'config').exists() and (parent / 'data').exists():
            return parent
    raise FileNotFoundError("Could not find repo root")

REPO_ROOT = find_repo_root()
REPORTS_DIR = REPO_ROOT / 'reports'
DATA_DIR = REPO_ROOT / 'data'
BOUNDARIES_DIR = DATA_DIR / 'boundaries'

print(f"Repo root: {REPO_ROOT}")
print(f"Reports directory: {REPORTS_DIR}")

Repo root: /Users/dustinober/Projects/Crime Incidents Philadelphia
Reports directory: /Users/dustinober/Projects/Crime Incidents Philadelphia/reports


## 1. Validation Results

In [3]:
# Run validation script
import sys
sys.path.insert(0, str(REPO_ROOT / 'scripts'))
from validate_phase2 import validate_phase2, cross_reference_outputs

results = validate_phase2(REPO_ROOT)
xref = cross_reference_outputs(REPO_ROOT)

print("="*60)
print("PHASE 2 VALIDATION STATUS")
print("="*60)
print(f"\nArtifact checks: {results['summary']['passed']} passed, {results['summary']['failed']} failed")
print(f"Cross-reference: {'PASSED' if xref['passed'] else 'FAILED'}")

if xref['issues']:
    print("\nIssues:")
    for issue in xref['issues']:
        print(f"  - {issue}")

if xref['warnings']:
    print("\nWarnings:")
    for warning in xref['warnings']:
        print(f"  - {warning}")

PHASE 2 VALIDATION STATUS

Artifact checks: 14 passed, 0 failed
Cross-reference: PASSED
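
The `cross_reference_outputs` helper lives in `scripts/validate_phase2.py` and is not shown in this notebook. As a rough illustration only, a cross-reference check could compare counts that several artifacts report independently and return the same `passed`, `issues`, and `warnings` keys that the cell above reads. The function name `cross_reference_counts`, its arguments, the specific checks, and the 10% warning threshold are all assumptions, not the project's actual implementation.

```python
def cross_reference_counts(boundary_tracts, rate_rows, reliable, flagged):
    """Hypothetical consistency check across Phase 2 outputs.

    Returns a dict with the keys the notebook reads: passed, issues, warnings.
    """
    issues, warnings = [], []

    # The tract boundaries and the tract rate table should cover the same tracts.
    if boundary_tracts != rate_rows:
        issues.append(
            f"Tract count mismatch: {boundary_tracts} boundaries vs {rate_rows} rate rows"
        )

    # Reliable and flagged tracts should partition the full set.
    if reliable + flagged != rate_rows:
        issues.append(
            f"Reliable ({reliable}) + flagged ({flagged}) != total ({rate_rows})"
        )

    # Flagged tracts are expected; the 10% threshold here is an arbitrary assumption.
    if rate_rows and flagged / rate_rows > 0.10:
        warnings.append(f"{flagged} of {rate_rows} tracts are flagged as unreliable")

    return {"passed": not issues, "issues": issues, "warnings": warnings}


# Counts reported elsewhere in this notebook
result = cross_reference_counts(boundary_tracts=408, rate_rows=408, reliable=389, flagged=19)
print(result)
```

Keeping hard failures (`issues`) separate from soft signals (`warnings`) lets the notebook report a pass while still surfacing items worth a manual look.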


## 2. Infrastructure Summary (02-01)

In [4]:
# Load boundary data
police_gdf = gpd.read_file(BOUNDARIES_DIR / 'police_districts.geojson')
tracts_gdf = gpd.read_file(BOUNDARIES_DIR / 'census_tracts_pop.geojson')

print("="*60)
print("INFRASTRUCTURE (02-01)")
print("="*60)
print(f"\nPolice Districts: {len(police_gdf)} boundaries loaded")
print(f"Census Tracts: {len(tracts_gdf)} tracts loaded")

# Population summary
pop_col = 'total_pop' if 'total_pop' in tracts_gdf.columns else 'population'
if pop_col in tracts_gdf.columns:
    total_pop = tracts_gdf[pop_col].sum()
    print(f"Total Population: {total_pop:,.0f}")
    print(f"Population per tract (mean): {tracts_gdf[pop_col].mean():,.0f}")

INFRASTRUCTURE (02-01)

Police Districts: 21 boundaries loaded
Census Tracts: 408 tracts loaded
Total Population: 1,581,531
Population per tract (mean): 3,876


## 3. Hotspot Analysis Summary (PATROL-01, 02-02)

In [5]:
# Load hotspot data
centroids_gdf = gpd.read_file(REPORTS_DIR / 'hotspot_centroids.geojson')

print("="*60)
print("HOTSPOT ANALYSIS (PATROL-01)")
print("="*60)
print(f"\nHotspot clusters identified: {len(centroids_gdf)}")
print("\nOutputs:")
print("  - Static heatmap: hotspot_heatmap.png")
print("  - Interactive map: hotspot_heatmap.html")
print("  - Cluster centroids: hotspot_centroids.geojson")

# Show cluster summary if available
cluster_summary_path = REPORTS_DIR / 'hotspot_cluster_summary.csv'
if cluster_summary_path.exists():
    cluster_summary = pd.read_csv(cluster_summary_path)
    print("\nCluster Summary:")
    print(cluster_summary.to_string(index=False))

HOTSPOT ANALYSIS (PATROL-01)

Hotspot clusters identified: 33

Outputs:
  - Static heatmap: hotspot_heatmap.png
  - Interactive map: hotspot_heatmap.html
  - Cluster centroids: hotspot_centroids.geojson

Cluster Summary:
 cluster    point_x   point_y  incident_count
       0 -75.154885 39.987794           33045
       1 -75.006965 40.067145            3710
       2 -74.965690 40.082645            5934
       3 -75.219232 40.029997           10284
       4 -75.193417 40.010858            5797
       5 -75.072181 39.999377            6546
       6 -75.007934 40.100985            3673
       7 -75.211262 40.003966            8016
       8 -75.266571 39.975784            8603
       9 -74.979370 40.096823            6172
      10 -75.263800 40.055096            3525
      11 -75.025374 40.111907            4444
      12 -75.032189 40.076293            7206
      13 -75.029478 40.101395            3268
      14 -75.231398 39.883697            6843
      15 -75.083519 40.074793            50
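
The summary table has one row per cluster with a centroid (`point_x`, `point_y`) and an `incident_count`. The notebook does not state which clustering algorithm produced the labels, so the following is only a minimal sketch of how such a table could be derived once each incident carries a `cluster` label. The toy coordinates and the convention that `-1` marks unclustered noise (as DBSCAN-style algorithms use) are assumptions.

```python
import pandas as pd

# Toy incidents; the coordinates and labels are made up for illustration.
incidents = pd.DataFrame({
    "point_x": [-75.15, -75.16, -75.14, -75.00, -75.01, -75.30],
    "point_y": [39.98, 39.99, 40.00, 40.06, 40.07, 39.90],
    "cluster": [0, 0, 0, 1, 1, -1],  # -1 = noise (assumed convention)
})

# Drop noise points before summarising.
clustered = incidents[incidents["cluster"] >= 0]

# One row per cluster: mean coordinates as the centroid, plus a size count.
summary = (
    clustered.groupby("cluster")
    .agg(
        point_x=("point_x", "mean"),
        point_y=("point_y", "mean"),
        incident_count=("point_x", "size"),
    )
    .reset_index()
)
print(summary.to_string(index=False))
```

A mean of longitude and latitude is a reasonable centroid at city scale; for larger extents, projecting to a planar CRS first would be the safer choice.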

## 4. Robbery Temporal Patterns Summary (PATROL-02, 02-03)

In [6]:
print("="*60)
print("ROBBERY TEMPORAL PATTERNS (PATROL-02)")
print("="*60)

# Load and display patrol recommendations
rec_path = REPORTS_DIR / 'robbery_patrol_recommendations.md'
if rec_path.exists():
    with open(rec_path, encoding='utf-8') as f:
        rec_content = f.read()
    print("\n" + rec_content)
else:
    print("\nPatrol recommendations file not found.")

print("\nOutputs:")
print("  - Temporal heatmap: robbery_temporal_heatmap.png")
print("  - By-district heatmap: robbery_temporal_by_district.png")
print("  - Recommendations: robbery_patrol_recommendations.md")

ROBBERY TEMPORAL PATTERNS (PATROL-02)

# Robbery Temporal Analysis Recommendations

**Generated:** 2026-02-02 19:49
**Requirement:** PATROL-02

## Key Findings

- Total robbery incidents analyzed: 136,917
- Peak period: Tuesday 00-04
- Evening/night robberies (16:00-24:00): 35.7% of total
- Weekend robberies (Fri-Sun): 42.5% of total

## Actionable Recommendations

- Prioritize Tuesday 00-04 for robbery prevention patrols - 5,272 incidents vs 1,289 in lowest period
- Peak time bin: 00-04 (25.8% of all robberies)
- Peak day: Tuesday (14.5% of all robberies)

## Peak Periods (Top 5)

| Day | Time | Incidents |
|-----|------|----------:|
| Tuesday | 00-04 | 5,272 |
| Thursday | 00-04 | 5,232 |
| Wednesday | 00-04 | 5,193 |
| Saturday | 00-04 | 5,046 |
| Friday | 00-04 | 4,933 |

## Lowest Periods (Top 5)

| Day | Time | Incidents |
|-----|------|----------:|
| Wednesday | 08-12 | 1,289 |
| Thursday | 08-12 | 1,313 |
| Tuesday | 08-12 | 1,343 |
| Monday | 08-12 | 1,370 |
| Friday | 08-12 |
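
The report above counts robberies by day of week and four-hour time bin (labels such as `00-04` and `08-12`). The analysis code itself is not part of this notebook, so the following is only a sketch of one way to build such a table with pandas. The column name `dispatch_time` and the toy timestamps are assumptions.

```python
import pandas as pd

# Toy timestamps, made up for illustration.
robberies = pd.DataFrame({
    "dispatch_time": pd.to_datetime([
        "2024-01-02 01:15",  # Tuesday
        "2024-01-02 03:40",  # Tuesday
        "2024-01-03 09:05",  # Wednesday
        "2024-01-06 23:30",  # Saturday
    ])
})

# Floor each hour to the start of its four-hour bin: 0, 4, 8, ..., 20.
hour = robberies["dispatch_time"].dt.hour
start = (hour // 4) * 4

# Build labels in the report's style, e.g. "00-04" or "20-24".
robberies["time_bin"] = (
    start.astype(str).str.zfill(2) + "-" + (start + 4).astype(str).str.zfill(2)
)
robberies["day"] = robberies["dispatch_time"].dt.day_name()

# Count incidents per (day, time bin), busiest first.
counts = (
    robberies.groupby(["day", "time_bin"]).size()
    .sort_values(ascending=False)
    .reset_index(name="incidents")
)
print(counts.to_string(index=False))
```

Integer division by the bin width keeps the binning explicit and avoids edge-case questions about which side of an interval is closed.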

## 5. District Severity Summary (PATROL-03, 02-04)

In [7]:
print("="*60)
print("DISTRICT SEVERITY SCORING (PATROL-03)")
print("="*60)

# Load severity ranking
ranking_df = pd.read_csv(REPORTS_DIR / 'district_severity_ranking.csv')

# Identify severity score column
score_col = next((c for c in ranking_df.columns if 'severity' in c.lower() and 'score' in c.lower()), None)
if score_col is None:
    score_col = next((c for c in ranking_df.columns if 'score' in c.lower()), ranking_df.columns[-1])

print(f"\nDistricts ranked: {len(ranking_df)}")
print("\nTop 5 Priority Districts:")
print(ranking_df.head(5).to_string(index=False))

# Statistics
print("\nSeverity Score Statistics:")
print(f"  Mean: {ranking_df[score_col].mean():.1f}")
print(f"  Median: {ranking_df[score_col].median():.1f}")
print(f"  Std Dev: {ranking_df[score_col].std():.1f}")
print(f"  Districts >= 70: {(ranking_df[score_col] >= 70).sum()}")

DISTRICT SEVERITY SCORING (PATROL-03)

Districts ranked: 21

Top 5 Priority Districts:
 Rank  District  Severity Score  Total Crimes  Violent %  YoY Change %  Rate per 100K  Population
    1        24            81.6        249408        9.6           9.4       387304.0       64396
    2        22            79.6        218812       11.9          -3.3       334003.0       65512
    3        25            77.8        222837       12.5          -1.3       276782.0       80510
    4        15            72.7        277255       10.5          -6.3       201753.0      137423
    5        12            71.4        199793       10.7          -9.7       288052.0       69360

Severity Score Statistics:
  Mean: 58.6
  Median: 61.5
  Std Dev: 16.5
  Districts >= 70: 6
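
The `Rate per 100K` column in the table above appears to be `Total Crimes` divided by `Population`, scaled by 100,000. The short check below recomputes it from the five printed rows; the formula is inferred from the numbers, not taken from the scoring code. Every rate exceeds 100,000, which means each of these districts recorded more incidents than it has residents over the period the counts cover.

```python
# (district, total_crimes, population, reported_rate_per_100k) copied from the table above
rows = [
    (24, 249408, 64396, 387304.0),
    (22, 218812, 65512, 334003.0),
    (25, 222837, 80510, 276782.0),
    (15, 277255, 137423, 201753.0),
    (12, 199793, 69360, 288052.0),
]

computed = {}
for district, crimes, population, reported in rows:
    rate = crimes / population * 100_000
    computed[district] = rate
    # Agreement to the nearest whole number is all the rounded table allows.
    assert round(rate) == reported, (district, rate, reported)
    print(f"District {district}: {rate:,.0f} per 100K")
```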


## 6. Census Tract Crime Rates Summary (HYP-SOCIO, 02-05)

In [8]:
print("="*60)
print("CENSUS TRACT CRIME RATES (HYP-SOCIO)")
print("="*60)

# Load rates data
rates_df = pd.read_csv(REPORTS_DIR / 'tract_crime_rates.csv')

# Identify rate column (skip the boolean 'rate_reliable' flag, whose name also contains 'rate')
rate_col = next((c for c in rates_df.columns if 'rate' in c.lower() and c != 'rate_reliable'), None)
if rate_col is None:
    rate_col = 'total_crime_rate' if 'total_crime_rate' in rates_df.columns else rates_df.columns[-1]

print(f"\nTracts analyzed: {len(rates_df)}")

# Check for reliable flag
if 'rate_reliable' in rates_df.columns:
    reliable = rates_df['rate_reliable'].sum() if rates_df['rate_reliable'].dtype == bool else (rates_df['rate_reliable'] == True).sum()
    flagged = len(rates_df) - reliable
    print(f"Reliable rates: {reliable}")
    print(f"Flagged tracts: {flagged} (low/zero population)")

# Rate statistics (only reliable tracts)
if 'rate_reliable' in rates_df.columns:
    reliable_rates = rates_df[rates_df['rate_reliable'] == True][rate_col]
else:
    reliable_rates = rates_df[rate_col]

print("\nCrime Rate Statistics (per 100,000):")
print(f"  Mean: {reliable_rates.mean():,.0f}")
print(f"  Median: {reliable_rates.median():,.0f}")
print(f"  Std Dev: {reliable_rates.std():,.0f}")
print(f"  Min: {reliable_rates.min():,.0f}")
print(f"  Max: {reliable_rates.max():,.0f}")

CENSUS TRACT CRIME RATES (HYP-SOCIO)

Tracts analyzed: 408
Reliable rates: 389
Flagged tracts: 19 (low/zero population)

Crime Rate Statistics (per 100,000):
  Mean: 259,687
  Median: 187,047
  Std Dev: 531,403
  Min: 39,386
  Max: 9,964,228


In [9]:
# Display flagged tracts report
flagged_path = REPORTS_DIR / 'flagged_tracts_report.md'
if flagged_path.exists():
    with open(flagged_path, encoding='utf-8') as f:
        flagged_content = f.read()
    print("\nFlagged Tracts Report:")
    print("-" * 40)
    print(flagged_content)


Flagged Tracts Report:
----------------------------------------
# Flagged Census Tracts Report

## Summary

- Total census tracts: 408
- Tracts with reliable population (>= 100): 389
- Tracts flagged as unreliable: 19

## Methodology

Crime rates are calculated per 100,000 residents (FBI UCR convention).
Tracts with population below 100 are flagged as unreliable because:
- Small population denominators produce unstable rates
- May represent non-residential areas (parks, industrial zones)
- Statistical inference unreliable with small populations

## Flagged Tracts (population < 100)

| GEOID | Population | Total Crimes | Note |
|-------|------------|--------------|------|
| 42101036901 | 0 | 2236 | Zero pop |
| 42101980001 | 0 | 6051 | Zero pop |
| 42101980002 | 50 | 6248 | Low pop (50) |
| 42101980003 | 0 | 3391 | Zero pop |
| 42101980100 | 45 | 3110 | Low pop (45) |
| 42101980300 | 0 | 8732 | Zero pop |
| 42101980400 | 0 | 4226 | Zero pop |
| 42101980500 | 0 | 1813 | Zero pop |
| 421
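
The methodology above (rates per 100,000 residents, with tracts under 100 residents flagged) can be sketched as follows. Only the threshold and the `rate_reliable` column name come from this notebook; the other column names, the made-up `EXAMPLE` tract, and the choice to leave unreliable rates as missing values are assumptions.

```python
import pandas as pd

MIN_POPULATION = 100  # threshold stated in the flagged tracts report

# Three rows copied from the flagged-tract table plus one made-up populated tract.
tracts = pd.DataFrame({
    "GEOID": ["42101036901", "42101980002", "42101980100", "EXAMPLE"],
    "population": [0, 50, 45, 4000],
    "total_crimes": [2236, 6248, 3110, 8000],
})

tracts["rate_reliable"] = tracts["population"] >= MIN_POPULATION

# Compute rates only where the denominator is trustworthy; others stay missing.
reliable = tracts["rate_reliable"]
tracts["crime_rate"] = float("nan")
tracts.loc[reliable, "crime_rate"] = (
    tracts.loc[reliable, "total_crimes"] / tracts.loc[reliable, "population"] * 100_000
)
print(tracts.to_string(index=False))
```

Restricting the division to reliable rows avoids divide-by-zero results for zero-population tracts and keeps extreme small-denominator rates out of summary statistics.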

## 7. Combined Recommendations

In [10]:
print("="*60)
print("COMBINED PHASE 2 RECOMMENDATIONS")
print("="*60)

# Get top severity districts
top_districts = ranking_df.head(5)['District'].tolist() if 'District' in ranking_df.columns else ranking_df.iloc[:, 0].head(5).tolist()

print("""
PATROL RESOURCE ALLOCATION:

1. **Priority Districts** (highest severity scores):
   Focus patrol resources on these districts based on composite severity:""")
for i, dist in enumerate(top_districts, 1):
    print(f"   {i}. District {dist}")

print(f"""
2. **Hotspot Coverage**:
   - {len(centroids_gdf)} crime hotspot clusters identified citywide
   - Use interactive map (hotspot_heatmap.html) for tactical deployment
   - Centroids available in GeoJSON for GPS integration

3. **Robbery Prevention**:
   - Peak period: 00:00-04:00 (late night/early morning)
   - Focus on overnight shifts all week (the top five periods fall Tuesday through Saturday)
   - District-specific patterns vary (see by-district heatmap)

4. **Socioeconomic Considerations**:
   - 19 tracts flagged with unreliable rates (zero/low population)
   - High-rate tracts may need targeted intervention
   - Consider population density when interpreting absolute crime counts
""")

COMBINED PHASE 2 RECOMMENDATIONS

PATROL RESOURCE ALLOCATION:

1. **Priority Districts** (highest severity scores):
   Focus patrol resources on these districts based on composite severity:
   1. District 24
   2. District 22
   3. District 25
   4. District 15
   5. District 12

2. **Hotspot Coverage**:
   - 33 crime hotspot clusters identified citywide
   - Use interactive map (hotspot_heatmap.html) for tactical deployment
   - Centroids available in GeoJSON for GPS integration

3. **Robbery Prevention**:
   - Peak period: 00:00-04:00 (late night/early morning)
   - Focus on overnight shifts all week (the top five periods fall Tuesday through Saturday)
   - District-specific patterns vary (see by-district heatmap)

4. **Socioeconomic Considerations**:
   - 19 tracts flagged with unreliable rates (zero/low population)
   - High-rate tracts may need targeted intervention
   - Consider population density when interpreting absolute crime counts



## 8. Artifact Manifest

In [11]:
# Create comprehensive manifest
manifest = {
    'phase': 2,
    'name': 'Spatial & Socioeconomic Analysis',
    'created': datetime.now().isoformat(),
    'validation': {
        'passed': results['summary']['passed'],
        'failed': results['summary']['failed'],
        'cross_reference_passed': xref['passed']
    },
    'artifacts': {
        'infrastructure': [
            {'path': 'data/boundaries/police_districts.geojson', 'type': 'geojson', 'count': len(police_gdf)},
            {'path': 'data/boundaries/census_tracts_pop.geojson', 'type': 'geojson', 'count': len(tracts_gdf)},
            {'path': 'config/phase2_config.yaml', 'type': 'config'}
        ],
        'hotspots': [
            {'path': 'reports/hotspot_heatmap.png', 'type': 'image'},
            {'path': 'reports/hotspot_heatmap.html', 'type': 'html'},
            {'path': 'reports/hotspot_centroids.geojson', 'type': 'geojson', 'count': len(centroids_gdf)},
            {'path': 'reports/hotspot_cluster_summary.csv', 'type': 'csv'},
            {'path': 'data/processed/crimes_with_clusters.parquet', 'type': 'data'}
        ],
        'robbery': [
            {'path': 'reports/robbery_temporal_heatmap.png', 'type': 'image'},
            {'path': 'reports/robbery_temporal_by_district.png', 'type': 'image'},
            {'path': 'reports/robbery_patrol_recommendations.md', 'type': 'markdown'}
        ],
        'severity': [
            {'path': 'reports/district_severity_choropleth.png', 'type': 'image'},
            {'path': 'reports/district_severity_ranking.csv', 'type': 'csv', 'count': len(ranking_df)},
            {'path': 'reports/district_severity_ranking.md', 'type': 'markdown'},
            {'path': 'reports/districts_scored.geojson', 'type': 'geojson'}
        ],
        'census': [
            {'path': 'reports/tract_crime_rates.png', 'type': 'image'},
            {'path': 'reports/tract_crime_rates.csv', 'type': 'csv', 'count': len(rates_df)},
            {'path': 'reports/tracts_with_rates.geojson', 'type': 'geojson'},
            {'path': 'reports/flagged_tracts_report.md', 'type': 'markdown'},
            {'path': 'data/processed/tract_crime_rates.parquet', 'type': 'data'}
        ]
    }
}

# Save manifest
manifest_path = REPORTS_DIR / 'phase2_manifest.json'
with open(manifest_path, 'w') as f:
    json.dump(manifest, f, indent=2)

print(f"Manifest saved to: {manifest_path}")
print(f"\nTotal artifacts: {sum(len(v) for v in manifest['artifacts'].values())}")

Manifest saved to: /Users/dustinober/Projects/Crime Incidents Philadelphia/reports/phase2_manifest.json

Total artifacts: 20
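
The manifest records paths but the cell above does not check that they exist. A minimal sketch of such a check follows, assuming the manifest keeps the `artifacts` layout shown above. The helper name `missing_artifacts` is hypothetical, and the demo runs against a temporary directory instead of the real repository.

```python
import json
import tempfile
from pathlib import Path


def missing_artifacts(manifest, root):
    """Return manifest paths that do not exist under root (hypothetical helper)."""
    root = Path(root)
    return [
        entry["path"]
        for group in manifest["artifacts"].values()
        for entry in group
        if not (root / entry["path"]).exists()
    ]


# Self-contained demo in a temporary directory with a two-entry manifest.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "reports").mkdir()
    (root / "reports" / "hotspot_cluster_summary.csv").write_text("cluster\n0\n")

    demo_manifest = {
        "artifacts": {
            "hotspots": [
                {"path": "reports/hotspot_cluster_summary.csv", "type": "csv"},
                {"path": "reports/hotspot_heatmap.png", "type": "image"},
            ]
        }
    }
    # Round-trip through JSON, as the saved manifest would be read back.
    loaded = json.loads(json.dumps(demo_manifest))
    missing = missing_artifacts(loaded, root)

print(missing)  # ['reports/hotspot_heatmap.png']
```

In the notebook, the same helper could be called as `missing_artifacts(manifest, REPO_ROOT)` right after the manifest is written.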


## 9. Next Steps for Phase 3

In [12]:
print("="*60)
print("NEXT STEPS - PHASE 3 READINESS")
print("="*60)

print("""
Phase 2 deliverables ready for Phase 3 (Predictive Modeling):

1. **Spatial Features Available**:
   - District severity scores for feature engineering
   - Census tract crime rates for normalization
   - Hotspot cluster assignments for geographic features

2. **Temporal Patterns Established**:
   - Robbery temporal peaks identified
   - Day-of-week and hour patterns documented

3. **Data Quality Verified**:
   - 98.4% of records have valid coordinates
   - 389 of 408 census tracts have reliable population data
   - All 21 police district boundaries validated

4. **Infrastructure Ready**:
   - Spatial utilities tested (spatial_utils.py)
   - Config loader for Phase 2 parameters
   - Validation scripts for artifact checking
""")

NEXT STEPS - PHASE 3 READINESS

Phase 2 deliverables ready for Phase 3 (Predictive Modeling):

1. **Spatial Features Available**:
   - District severity scores for feature engineering
   - Census tract crime rates for normalization
   - Hotspot cluster assignments for geographic features

2. **Temporal Patterns Established**:
   - Robbery temporal peaks identified
   - Day-of-week and hour patterns documented

3. **Data Quality Verified**:
   - 98.4% of records have valid coordinates
   - 389 of 408 census tracts have reliable population data
   - All 21 police district boundaries validated

4. **Infrastructure Ready**:
   - Spatial utilities tested (spatial_utils.py)
   - Config loader for Phase 2 parameters
   - Validation scripts for artifact checking



## 10. Phase 2 Completion

In [13]:
runtime_seconds = time.time() - RUNTIME_START

print("\n" + "="*60)
print("PHASE 2 COMPLETE")
print("="*60)
print("\nAll requirements satisfied:")
print("  [x] PATROL-01: Hotspot clustering")
print("  [x] PATROL-02: Robbery temporal patterns")
print("  [x] PATROL-03: District severity scoring")
print("  [x] HYP-SOCIO: Census tract crime rates")
print(f"\nTotal artifacts: {sum(len(v) for v in manifest['artifacts'].values())}")
print(f"Validation: {results['summary']['passed']} passed, {results['summary']['failed']} failed")
print(f"Cross-reference: {'PASSED' if xref['passed'] else 'FAILED'}")
print(f"\nRuntime: {runtime_seconds:.1f} seconds")
print(f"\nPhase 2 Summary Notebook completed: {datetime.now().isoformat()}")


PHASE 2 COMPLETE

All requirements satisfied:
  [x] PATROL-01: Hotspot clustering
  [x] PATROL-02: Robbery temporal patterns
  [x] PATROL-03: District severity scoring
  [x] HYP-SOCIO: Census tract crime rates

Total artifacts: 20
Validation: 14 passed, 0 failed
Cross-reference: PASSED

Runtime: 0.8 seconds

Phase 2 Summary Notebook completed: 2026-02-02T20:01:50.522124
