# Crime Incidents Philadelphia - Analysis Notebook Suite

Welcome to the Crime Incidents Philadelphia analysis platform. This notebook suite provides an interactive, step-by-step analysis of crime trends, patterns, and hotspots across Philadelphia from 2006 to present.

## Project Overview

**Data Source**: [OpenDataPhilly](https://opendataphilly.org/) - Carto API for Philadelphia Police Department crime incidents

**Time Range**: 2006 to present (continuously updated)

**Goal**: Identify temporal and spatial patterns of crime to support data-driven public safety analysis.

## Workflow Overview

This analysis is organized into six phases, each with dedicated notebooks (paths below are relative to `notebooks/`):

### Phase 1: Data Ingestion
- **Purpose**: Download and consolidate monthly crime data from OpenDataPhilly API
- **Notebook**: `phase_01_data_ingestion/01_scrape_and_consolidate.ipynb`
- **Output**: Consolidated Parquet file with optimized data types
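
As a rough illustration of the monthly download step, the sketch below builds one request URL per month against a Carto SQL endpoint. The endpoint URL, the table name `incidents_part1_part2`, and the column `dispatch_date_time` are assumptions made for this example; the values actually used by the notebook should be taken from `config.ini`. The helper names are hypothetical.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed values for illustration; the real ones should come from config.ini.
CARTO_SQL_URL = "https://phl.carto.com/api/v2/sql"
TABLE = "incidents_part1_part2"
DATE_COLUMN = "dispatch_date_time"


def month_bounds(year: int, month: int) -> tuple[date, date]:
    """Return the first day of the month and the first day of the next month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def build_month_url(year: int, month: int, fmt: str = "csv") -> str:
    """Build a request URL for one month of incidents (half-open date range)."""
    start, end = month_bounds(year, month)
    query = (
        f"SELECT * FROM {TABLE} "
        f"WHERE {DATE_COLUMN} >= '{start.isoformat()}' "
        f"AND {DATE_COLUMN} < '{end.isoformat()}'"
    )
    return f"{CARTO_SQL_URL}?{urlencode({'q': query, 'format': fmt})}"


# Example: the request for December 2006 ends at 2007-01-01 (exclusive).
url = build_month_url(2006, 12)
```

Using a half-open range (`>=` start, `<` next month) avoids counting incidents twice at month boundaries when the monthly files are later concatenated.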

### Phase 2: Exploration
- **Purpose**: Understand data structure, identify quality issues, and profile distributions
- **Notebooks**:
  - `phase_02_exploration/01_data_overview.ipynb` — Basic data shape, types, and distributions
  - `phase_02_exploration/02_data_quality_assessment.ipynb` — Duplicates, missing values, outliers
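
The quality checks named above (duplicates, missing values, outliers) could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the column names `dc_key`, `lat`, and `lng`, the function name, and the rough bounding box around Philadelphia are all assumptions.

```python
import pandas as pd

# Rough bounding box around Philadelphia (assumed; adjust as needed).
LAT_RANGE = (39.85, 40.15)
LNG_RANGE = (-75.30, -74.95)


def quality_report(df, key_columns, lat_col="lat", lng_col="lng"):
    """Summarize duplicates, missing values, and out-of-area coordinates."""
    has_coords = df[lat_col].notna() & df[lng_col].notna()
    in_box = df[lat_col].between(*LAT_RANGE) & df[lng_col].between(*LNG_RANGE)
    return {
        "rows": len(df),
        # Rows that repeat an earlier row on the key columns.
        "duplicate_rows": int(df.duplicated(subset=key_columns).sum()),
        "missing_by_column": df.isna().sum().to_dict(),
        # Coordinates that are present but fall outside the bounding box.
        "outside_bbox": int((has_coords & ~in_box).sum()),
    }


# Toy data, made up for illustration only.
sample = pd.DataFrame(
    {
        "dc_key": [1, 1, 2, 3],
        "lat": [39.95, 39.95, None, 0.0],
        "lng": [-75.16, -75.16, -75.10, 0.0],
    }
)
report = quality_report(sample, key_columns=["dc_key"])
```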

### Phase 3: Processing
- **Purpose**: Clean data and create analytical features
- **Notebooks**:
  - `phase_03_processing/01_data_cleaning.ipynb` — Handle missing values, remove duplicates, standardize formats
  - `phase_03_processing/02_feature_engineering.ipynb` — Temporal, spatial, and aggregate features
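
For the temporal features, a sketch along these lines may help. It assumes a timestamp column named `dispatch_date_time`; the function name and the output column names are hypothetical, not taken from the notebook.

```python
import pandas as pd


def add_temporal_features(df, ts_col="dispatch_date_time"):
    """Return a copy of df with basic calendar features derived from ts_col."""
    out = df.copy()
    # Unparseable timestamps become NaT instead of raising.
    ts = pd.to_datetime(out[ts_col], errors="coerce")
    out["year"] = ts.dt.year
    out["month"] = ts.dt.month
    out["hour"] = ts.dt.hour
    out["day_of_week"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    out["is_weekend"] = ts.dt.dayofweek >= 5
    return out


# Toy data, made up for illustration only.
raw = pd.DataFrame(
    {"dispatch_date_time": ["2024-01-06 23:30:00", "2024-01-08 08:00:00"]}
)
features = add_temporal_features(raw)
```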

### Phase 4: Analysis
- **Purpose**: Discover temporal trends, categorical patterns, and statistical relationships
- **Notebooks**:
  - `phase_04_analysis/01_temporal_analysis.ipynb` — Time-series trends, seasonality
  - `phase_04_analysis/02_categorical_analysis.ipynb` — Crime types, districts, cross-tabulations
  - `phase_04_analysis/03_statistical_summaries.ipynb` — Correlations, distributions, summary reports
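
Two of the analyses listed above, monthly trend counts and a crime-type-by-district cross-tabulation, can be sketched with plain pandas as follows. The column names `dispatch_date_time`, `text_general_code`, and `dc_dist` are assumptions, and the rows are toy data.

```python
import pandas as pd

# Toy data, made up for illustration only.
incidents = pd.DataFrame(
    {
        "dispatch_date_time": pd.to_datetime(
            ["2024-01-03", "2024-01-20", "2024-02-11", "2024-02-12", "2024-02-28"]
        ),
        "text_general_code": ["Thefts", "Fraud", "Thefts", "Thefts", "Fraud"],
        "dc_dist": ["06", "06", "12", "12", "06"],
    }
)

# Incidents per calendar month (index is a monthly PeriodIndex).
monthly_counts = incidents.groupby(
    incidents["dispatch_date_time"].dt.to_period("M")
).size()

# Rows are crime types, columns are police districts, cells are counts.
type_by_district = pd.crosstab(incidents["text_general_code"], incidents["dc_dist"])
```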

### Phase 5: Visualization
- **Purpose**: Create interactive maps, dashboards, and visual reports
- **Notebooks**:
  - `phase_05_visualization/01_crime_maps_and_hotspots.ipynb` — Folium maps, KDE hotspot detection
  - `phase_05_visualization/02_trend_analysis_dashboards.ipynb` — Plotly dashboards, heatmaps
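
To make the idea behind KDE hotspot detection concrete, here is a dependency-free sketch of a 2D Gaussian kernel density estimate evaluated at a few candidate locations. It is not the notebook's implementation, which may rely on a library routine; the function name, bandwidth, and coordinates are assumptions chosen for illustration.

```python
import math


def gaussian_kde_2d(points, grid, bandwidth):
    """Evaluate a 2D Gaussian kernel density estimate at each grid location.

    Coordinates are treated as planar, which is only a rough approximation
    for latitude/longitude over a city-sized area.
    """
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)
    densities = []
    for gx, gy in grid:
        total = 0.0
        for px, py in points:
            sq_dist = (gx - px) ** 2 + (gy - py) ** 2
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        densities.append(norm * total)
    return densities


# Toy (lat, lng) coordinates, made up for illustration only.
points = [(39.950, -75.160), (39.951, -75.161), (39.949, -75.159), (40.050, -75.050)]
grid = [(39.95, -75.16), (40.00, -75.10), (40.05, -75.05)]

densities = gaussian_kde_2d(points, grid, bandwidth=0.01)
# The grid location with the highest estimated density is the hotspot candidate.
hotspot = grid[densities.index(max(densities))]
```

The bandwidth controls how far each incident's influence spreads: a larger value merges nearby clusters, a smaller one isolates them.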

### Phase 6: Modeling (Future)
- **Purpose**: Build predictive and classification models
- **Notebooks**:
  - `phase_06_modeling/01_forecasting_exploration.ipynb` — Time-series forecasting
  - `phase_06_modeling/02_classification_models.ipynb` — Crime type/district prediction

## Quick Start

### First Time Setup

1. **Install dependencies** (if not already done):
   ```bash
   pip install -r requirements.txt
   ```

2. **Run Phase 1** to download and consolidate data:
   - Open: `notebooks/phase_01_data_ingestion/01_scrape_and_consolidate.ipynb`
   - Follow the cells to scrape monthly data and create the consolidated Parquet file

3. **Proceed sequentially** through phases 2-5 for analysis and visualization.

### Regular Analysis Sessions

- **To update data**: Run the Phase 1 notebook (or the refresh script described under Data Refresh Strategy)
- **To skip to analysis**: If the data is already current, go directly to Phase 2
- **To focus on visuals**: Jump to Phase 5 if data and processing are complete

## Data Refresh Strategy

Choose one approach:

**Option A (Recommended)**: Run Phase 1 notebook at the start of each analysis session
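- Ensures the latest data is available
- Requires only a simple cell at the top of the notebook: run `!python scripts/helper/refresh_data.py` (the `!` prefix runs a shell command from a notebook cell) or import the refresh code directly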

**Option B**: External scheduled refresh (e.g., GitHub Actions, cron)
- Data is refreshed automatically
- Notebooks always work with current data
- Requires infrastructure setup

**Option C**: Manual refresh on-demand
- Users explicitly choose when to refresh
- Data may be less fresh, but there is less overhead
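
Whichever option is chosen, a small staleness check can decide whether a refresh is worth running. The sketch below is an assumption-laden example: the function name, the 24-hour threshold, and the example path are hypothetical and not part of the project.

```python
import time
from pathlib import Path


def needs_refresh(data_path, max_age_hours=24.0, now=None):
    """Return True if the file is missing or older than max_age_hours."""
    path = Path(data_path)
    if not path.exists():
        return True
    # `now` can be injected for testing; defaults to the current time.
    now = time.time() if now is None else now
    age_hours = (now - path.stat().st_mtime) / 3600
    return age_hours > max_age_hours


# Hypothetical usage at the top of a notebook:
# if needs_refresh("data/processed/crime_incidents.parquet"):
#     ...run the Phase 1 refresh...
```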

## Project Structure

```
.
├── config.ini                  # Configuration (API endpoints, paths)
├── data/                       # Data storage
│   ├── raw/                    # Original downloads
│   └── processed/              # Consolidated Parquet files
├── notebooks/                  # Jupyter notebook suite (you are here)
│   ├── 00_start_here.ipynb     # Master index
│   ├── phase_01_data_ingestion/
│   ├── phase_02_exploration/
│   ├── phase_03_processing/
│   ├── phase_04_analysis/
│   ├── phase_05_visualization/
│   └── phase_06_modeling/
├── scripts/
│   └── helper/                 # ETL/helper scripts (scrape, consolidate)
├── src/                        # Reusable library code
│   ├── data/                   # Data loading and utilities
│   ├── analysis/               # Statistical profiling and analysis
│   ├── geospatial/             # Spatial analysis and mapping
│   └── utils/                  # Configuration and general utilities
├── visualizations/             # Generated maps and charts
├── requirements.txt            # Python dependencies
└── README.md                   # Project documentation
```

## Reusable Modules

Notebooks leverage these reusable components from `src/`:

- **`src.data.loader`**: Load Parquet files and datasets
- **`src.analysis.profiler`**: Data profiling, statistical summaries, cross-tabulations
- **`src.geospatial.analyzer`**: GeoDataFrame conversion, hotspot detection, map generation
- **`src.utils.config`**: Configuration management and path resolution
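
As an illustration of what configuration management and path resolution might involve, the sketch below parses an INI file with the standard library and resolves paths against a project root. It does not reproduce the real `src.utils.config` API: the section names, keys, and the `load_paths` function are hypothetical.

```python
import configparser
from pathlib import Path

# Hypothetical config.ini content; the real file may use different sections and keys.
EXAMPLE_CONFIG = """
[api]
base_url = https://phl.carto.com/api/v2/sql

[paths]
raw_dir = data/raw
processed_dir = data/processed
"""


def load_paths(config_text, project_root):
    """Parse INI text and resolve every [paths] entry against project_root."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    root = Path(project_root)
    return {key: root / value for key, value in parser["paths"].items()}


paths = load_paths(EXAMPLE_CONFIG, "/tmp/project")
```

Resolving paths in one place lets notebooks in different `phase_*` folders share the same data locations regardless of their working directory.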

## Key Dependencies

- **pandas**: Data manipulation and analysis
- **geopandas**: Spatial data handling
- **folium**: Interactive mapping
- **plotly**: Interactive dashboards and charts
- **scikit-learn**: Machine learning (future modeling)
- **requests**: API calls for data ingestion
- **pyarrow**: Parquet file support

## Tips for Effective Use

1. **Run cells sequentially** within each notebook to maintain state
2. **Review markdown cells** for context and explanations
3. **Modify parameters** (date ranges, filtering criteria) to customize analyses
4. **Export outputs** as needed (CSV, HTML, PNG)
5. **Keep notebooks focused** on their phase; add complex custom analysis as new notebooks

## Next Steps

### To get started immediately:
1. Navigate to: `notebooks/phase_01_data_ingestion/01_scrape_and_consolidate.ipynb`
2. Run all cells to download and consolidate data
3. Move to Phase 2 for initial data exploration

### To skip data ingestion (if data already exists):
1. Jump to: `notebooks/phase_02_exploration/01_data_overview.ipynb`
2. Verify data is loaded correctly
3. Proceed through subsequent phases

---

**Version**: 1.0  
**Last Updated**: 2026-01-27  
**Contact**: For questions or issues, refer to the project README or documentation.
