1. Overview

This notebook loads the raw Metropolitan Police crime dataset and the 2019 Index of Multiple Deprivation (IMD) dataset.
The goal is to explore their structure, check data quality, and identify the transformations required before modelling.

2. Objectives

Inspect raw tables

Check column types and missing values

Explore basic distributions

Validate that borough codes align across datasets

Identify obvious cleaning requirements
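The borough-alignment objective can be checked with plain set differences. The sketch below assumes only that each dataset has one column of borough labels (names or codes). The helper `compare_boroughs` and the toy labels are hypothetical. The light normalisation (trim whitespace, lower-case) is also an assumption and will not reconcile genuinely different spellings.

```python
import pandas as pd


def compare_boroughs(crime_boroughs: pd.Series, imd_boroughs: pd.Series) -> dict:
    """Compare borough labels from two datasets (hypothetical helper).

    Labels are trimmed and lower-cased before comparison; spelling variants
    such as "City of London" vs "London, City of" still need manual mapping.
    """
    def normalise(s: pd.Series) -> set:
        return set(s.dropna().astype(str).str.strip().str.lower())

    crime_set = normalise(crime_boroughs)
    imd_set = normalise(imd_boroughs)
    return {
        "only_in_crime": sorted(crime_set - imd_set),  # no IMD match
        "only_in_imd": sorted(imd_set - crime_set),    # no crime rows
        "shared": len(crime_set & imd_set),
    }


# Toy example with hypothetical labels
crime = pd.Series(["Camden", "Hackney ", "Westminster", "Camden"])
imd = pd.Series(["camden", "Hackney", "Islington"])
result = compare_boroughs(crime, imd)
print(result)
```

In this notebook it would be called with `crime_raw["Borough"]` and whichever column of `imd_raw` holds the borough name or code. Both difference lists should be empty before the two tables are joined.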

3. Raw Crime Data Structure

The raw crime data contains monthly crime counts aggregated by borough and major crime category.

This notebook previews:

Number of rows & columns

Time coverage

Borough coverage

Variability in crime counts
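The four previews above can be gathered into one summary. This is a minimal sketch. It assumes a long-format table with one row per borough, category and month, plus hypothetical columns `Borough`, `Month` (formatted `YYYY-MM`) and `Count`. Only `Borough` appears in the code below; the other names must be replaced with the real ones from `crime_raw`.

```python
import pandas as pd


def preview_structure(df, borough_col="Borough", month_col="Month", count_col="Count"):
    """Summarise shape, time coverage, borough coverage and count variability.

    Column names are assumptions about a long-format crime table.
    """
    months = pd.to_datetime(df[month_col], format="%Y-%m")
    borough_totals = df.groupby(borough_col)[count_col].sum()
    return {
        "n_rows": df.shape[0],
        "n_cols": df.shape[1],
        "first_month": months.min(),
        "last_month": months.max(),
        "n_months": months.nunique(),
        "n_boroughs": df[borough_col].nunique(),
        "count_min": df[count_col].min(),
        "count_median": df[count_col].median(),
        "count_max": df[count_col].max(),
        # Spread of total crime between the quietest and busiest borough
        "borough_total_range": (borough_totals.min(), borough_totals.max()),
    }


toy = pd.DataFrame({
    "Borough": ["Camden", "Camden", "Hackney", "Hackney"],
    "Month": ["2019-01", "2019-02", "2019-01", "2019-02"],
    "Count": [120, 95, 80, 300],
})
summary = preview_structure(toy)
print(summary)
```

If the raw file is instead wide, with one column per month, it would need reshaping with `DataFrame.melt` before a summary like this applies.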

4. Raw IMD 2019 Structure

The IMD dataset contains borough-level deprivation indicators, including:

Overall IMD score

Income domain

Crime domain

Education, Health, Living Environment, etc.

These indicators will later serve as explanatory variables for crime forecasting.
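One way to attach the IMD indicators to the crime data is a left join on borough. This is a sketch with toy frames only. The column names `imd_score` and `income_score` and all the numbers are made up and are not real IMD values. Because IMD 2019 is a single snapshot, each borough's indicators repeat for every month. `validate="many_to_one"` makes pandas raise an error if a borough appears more than once in the IMD table. `indicator=True` flags crime rows with no IMD match.

```python
import pandas as pd

# Toy crime rows; Westminster deliberately has no IMD record
crime = pd.DataFrame({
    "Borough": ["Camden", "Camden", "Westminster"],
    "Month": ["2019-01", "2019-02", "2019-01"],
    "Count": [120, 95, 80],
})
# Toy IMD table: one row per borough, invented values
imd = pd.DataFrame({
    "Borough": ["Camden", "Hackney"],
    "imd_score": [20.0, 32.0],
    "income_score": [0.15, 0.22],
})

# Many crime rows map to one IMD row; the _merge column records match status
features = crime.merge(imd, on="Borough", how="left",
                       validate="many_to_one", indicator=True)
unmatched = features.loc[features["_merge"] == "left_only", "Borough"].unique()
features = features.drop(columns="_merge")

print("Boroughs without IMD data:", list(unmatched))
print(features)
```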

5. Initial Insights

Crime counts are highly skewed

Seasonality is visible at borough level

IMD scores vary strongly between boroughs

No obvious missing values in IMD

Crime dataset requires standardisation and aggregation
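The skewness and seasonality observations can be quantified with two short calculations. This sketch uses invented monthly totals for one hypothetical borough. It computes sample skewness before and after a `log1p` transform, a common way to reduce right skew in counts, and builds a month-of-year profile.

```python
import numpy as np
import pandas as pd

# Toy monthly totals for one hypothetical borough over two years
# (illustrative numbers only, with a large December spike)
months = pd.date_range("2018-01-01", periods=24, freq="MS")
values = [100, 90, 110, 105, 120, 130, 140, 135, 115, 110, 100, 400,
          105, 95, 115, 110, 125, 135, 145, 140, 120, 115, 105, 900]
counts = pd.Series(values, index=months, name="Count")

# Sample skewness of raw counts and of log1p-transformed counts
raw_skew = counts.skew()
log_skew = np.log1p(counts).skew()

# Month-of-year profile: mean count for each calendar month (1 = January)
seasonal_profile = counts.groupby(counts.index.month).mean()

print(f"skewness raw={raw_skew:.2f}, log1p={log_skew:.2f}")
print(seasonal_profile)
```

For the full dataset, the same profile per borough would group by the borough column and the calendar month of the (hypothetical) parsed `Month` column.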

6. Next Steps

Move to notebook 02_clean_crime.ipynb to clean, normalise, and reshape the crime dataset.
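The standardisation and aggregation noted above might look roughly like the following. This is only a sketch of one possible approach, not the contents of `02_clean_crime.ipynb`. The columns `MajorCategory`, `Month` and `Count` are hypothetical.

```python
import pandas as pd

# Toy raw rows with inconsistent borough labels and duplicate keys
raw = pd.DataFrame({
    "Borough": [" camden", "Camden", "HACKNEY", "Hackney"],
    "MajorCategory": ["Theft", "Theft", "Burglary", "Burglary"],
    "Month": ["2019-01", "2019-01", "2019-02", "2019-02"],
    "Count": [10, 5, 7, 3],
})

clean = raw.assign(
    # Standardise borough labels: trim whitespace and use title case
    Borough=raw["Borough"].str.strip().str.title(),
    # Parse month strings to the first day of each month
    Month=pd.to_datetime(raw["Month"], format="%Y-%m"),
)

# Aggregate to one total per borough, major category and month
monthly = (
    clean.groupby(["Borough", "MajorCategory", "Month"], as_index=False)["Count"]
    .sum()
)
print(monthly)
```

`str.title()` turns "City of London" into "City Of London" and "Kingston upon Thames" into "Kingston Upon Thames". An explicit mapping to the official borough names is therefore safer for real data.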

In [None]:
%run ../notebook_init.py

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from utils import RAW_DIR

In [None]:
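from IPython.display import display

crime_raw = pd.read_csv(RAW_DIR / "crime_raw.csv")
# Reading .xlsx files requires the openpyxl package
imd_raw = pd.read_excel(RAW_DIR / "imd_2019_raw.xlsx")

# Display each preview separately; a bare tuple renders as one hard-to-read output
display(crime_raw.head())
display(imd_raw.head())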

In [None]:
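# Column dtypes, non-null counts and memory usage (printed)
crime_raw.info()
# Summary statistics for numeric and non-numeric columns (displayed)
crime_raw.describe(include='all')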

In [None]:
# Missing values per column in each raw table
display(crime_raw.isna().sum())
display(imd_raw.isna().sum())

In [None]:
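# Number of rows per borough, ordered from most to least frequent
plt.figure(figsize=(12, 4))
sns.countplot(data=crime_raw, x="Borough", order=crime_raw["Borough"].value_counts().index)
plt.xticks(rotation=90)
plt.title("Row Count per Borough")
plt.tight_layout()
plt.show()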