Data cleaning and exploratory analysis of LAPD crime incident records, prepared for visualization in Tableau.
This project analyzes over 1.5 million crime incidents reported by the Los Angeles Police Department between 2010 and 2017. The Python script cleans and enriches the raw dataset for use in Tableau dashboards.
- Normalizes whitespace in address fields
- Converts integer time values to HH:MM format for Tableau compatibility
- Splits combined lat/long Location field into separate Latitude and Longitude columns
- Drops vague MO Codes column
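The cleaning steps above could look roughly like the sketch below. It is not taken from exploration.py: the column names Address and Time Occurred, and the "(lat, long)" string layout of Location, are assumptions made for illustration.

```python
import pandas as pd


def clean_incidents(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning steps to a copy of the raw frame."""
    out = df.copy()

    # Collapse runs of whitespace in the address and trim both ends.
    # "Address" is an assumed column name.
    out["Address"] = (
        out["Address"].str.replace(r"\s+", " ", regex=True).str.strip()
    )

    # Integer time to HH:MM: zero-pad to four digits first, so 45 becomes
    # "0045" and then "00:45". "Time Occurred" is an assumed column name.
    padded = out["Time Occurred"].astype(int).astype(str).str.zfill(4)
    out["Time Occurred"] = padded.str[:2] + ":" + padded.str[2:]

    # Assumed layout "(34.0522, -118.2437)": strip the parentheses,
    # split on the comma, and convert each half to a number.
    coords = out["Location"].str.strip("() ").str.split(",", expand=True)
    out["Latitude"] = pd.to_numeric(coords[0].str.strip())
    out["Longitude"] = pd.to_numeric(coords[1].str.strip())

    # Drop the MO Codes column.
    return out.drop(columns=["MO Codes"])
```

Zero-padding before slicing matters because early-morning times stored as integers lose their leading zeros.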
- Days Between Date Occurred and Date Reported - reporting delay in days
- Month Occurred - extracted month name for seasonal analysis
- Victim Age Group - bucketed age ranges (0-10, 11-20, ..., 91-99)
- Time Category - time-of-day buckets (Late Night, Early Morning, Morning, etc.)
- Victim Descent Type - full ethnicity names mapped from LAPD descent codes
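As a minimal sketch, three of these derived columns can be computed directly with pandas. The input column names Date Occurred, Date Reported, and Victim Age are assumptions, and the script itself may build the age groups from data/lookup/agegroup.csv rather than with pd.cut.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add reporting delay, month name, and age group to a copy of df."""
    out = df.copy()

    # Assumed input column names; dates are parsed month-first by default.
    occurred = pd.to_datetime(out["Date Occurred"])
    reported = pd.to_datetime(out["Date Reported"])

    # Reporting delay in whole days.
    out["Days Between Date Occurred and Date Reported"] = (
        reported - occurred
    ).dt.days

    # Full English month name, e.g. "January".
    out["Month Occurred"] = occurred.dt.month_name()

    # Right-closed bins: (-1, 10], (10, 20], ..., (90, 99].
    edges = [-1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 99]
    labels = ["0-10"] + [
        f"{lo + 1}-{hi}" for lo, hi in zip(edges[1:-1], edges[2:])
    ]
    out["Victim Age Group"] = pd.cut(
        out["Victim Age"], bins=edges, labels=labels
    )
    return out
```

Ages outside 0 to 99, or missing ages, fall outside every bin and come back as missing values in Victim Age Group.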
The raw dataset is too large for GitHub (~456 MB). Download it from the source:
Crime Data from 2010 to 2019 - City of Los Angeles Open Data
Place the downloaded CSV as data/Crime_Data_2010_2017.csv before running the script.
pip install pandas numpy
python exploration.py
The cleaned dataset will be saved to data/lapd_crime_dataset_cleaned.csv.
├── exploration.py # Data cleaning and feature engineering script
├── data/
│ ├── lookup/
│ │ ├── agegroup.csv # Age to age group mapping
│ │ ├── time.csv # Time to time-of-day category mapping
│ │ └── victimdescent.csv # LAPD descent code to ethnicity mapping
│ └── LAPD_Reporting_Districts.zip # LAPD reporting district boundaries (shapefile)
└── README.md
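The files under data/lookup/ lend themselves to a left join against the main table. The sketch below shows that pattern for descent codes only. The column names Victim Descent and Victim Descent Type are hypothetical, and the small inline frame stands in for victimdescent.csv, whose actual layout is not documented here.

```python
import pandas as pd


def map_descent(df: pd.DataFrame, lookup: pd.DataFrame) -> pd.DataFrame:
    """Attach the full descent name to each incident via a left join.

    Both frames are assumed to share a "Victim Descent" code column.
    validate="many_to_one" raises if the lookup repeats a code.
    """
    return df.merge(
        lookup, on="Victim Descent", how="left", validate="many_to_one"
    )


# Illustrative stand-in for data/lookup/victimdescent.csv.
lookup = pd.DataFrame({
    "Victim Descent": ["H", "W"],
    "Victim Descent Type": ["Hispanic/Latin/Mexican", "White"],
})

incidents = pd.DataFrame({"Victim Descent": ["H", "W", "-"]})
mapped = map_descent(incidents, lookup)
```

A left join keeps every incident row; codes that are absent from the lookup end up with a missing Victim Descent Type instead of being dropped.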
- Python (Pandas, NumPy) - data cleaning and transformation
- Tableau - visualization and dashboards
- Franco Neo Recasata
- Jaime Tanedo