# BTown Tickets

> An analysis of parking tickets in Brampton, Ontario

## Motivating Questions

### Personal interest

I drive regularly in Brampton, so I want to know:

- Where are the most ticketed places in Brampton?
  - Why are they the most ticketed places?

- What are the most common reasons for parking tickets in Brampton? The rarest?
  - What are the most expensive offenses I could commit? The cheapest?
    - Should I expect the fines to vary for the same kind of offense? If so, by how much?
  - What kinds of parking violations generate the most revenue for the city?

- What times of the year have the greatest volume of tickets? The least?
  - Are there seasonal/holiday patterns in parking offenses?
  - Did Brampton's parking violation behaviour get better or worse from 2013 to 2018? (the period of the dataset)
  - How frequently are Bramptonians getting parking tickets?


### Story: Brampton's driving reputation

Brampton drivers have a bad reputation (as of 2024) (CITATION?). Does this reputation hold for Brampton's parking habits?

- How does Brampton compare to other cities (particularly Vaughan, Toronto, and Mississauga) in terms of:
  - Volume of tickets (proportional to driving population, active police force, etc.)
  - Severity of penalties/fines
  - Severity of the nature of offenses
  - Frequency of tickets

## Imports

In [1]:
from pathlib import Path
import locale
import pandas as pd
import numpy as np

from btowntickets import utils

In [2]:
# Config
locale.setlocale(locale.LC_ALL, '')

'English_Canada.1252'

## Load Data

Convert the raw `.csv` to Parquet format (if that hasn't been done already):

- It's more space efficient than `.csv`
- It can store type information
- It loads faster into memory

[Apache Parquet](https://parquet.apache.org/)
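
To illustrate the "type information" point, here is a minimal sketch on a made-up three-column frame (not the real dataset). It round-trips the frame through CSV text and shows that the categorical, timezone-aware datetime, and nullable float dtypes do not survive. A Parquet file written with `pyarrow` should keep them, which is why the cleaned data is cached in that format.

```python
import io

import pandas as pd

# Toy frame with the same kinds of columns as the tickets dataset (values are made up)
toy = pd.DataFrame(
    {
        "ISSUEDATE": pd.to_datetime(["2013-01-01", "2013-01-02"], utc=True),
        "LICSTATEPROV": pd.Categorical(["ON", "QC"]),
        "VIOFINE": pd.array([35.0, 40.0], dtype="Float64"),
    }
)

# Round-trip through CSV text: only the values survive, not the dtypes
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
from_csv = pd.read_csv(buffer)

print("Original dtypes:")
print(toy.dtypes)
print("After CSV round trip:")
print(from_csv.dtypes)
```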

In [3]:
def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Clean the parking tickets dataset"""
    # For now, we remove tickets missing either an issue date or issue time
    df = df.dropna(axis="index", how="any", subset=["ISSUEDATE", "ISSUETIME"])

    # Drop exact duplicate rows, keeping the first occurrence
    df = df.drop_duplicates()
    return df

In [4]:
df = None
CLEAN_DATA = Path("../data/clean/Penalty_Notices-Brampton.parquet")
try:
    df = utils.LoadProcess(
        data_loader=utils.ParquetLoader(CLEAN_DATA)
    ).run()
    print(f"Loaded '{CLEAN_DATA.name}'")
except FileNotFoundError:
    # No cached Parquet file yet: build it from the raw CSV
    df_raw = utils.LoadProcess(
        data_loader=utils.CSVLoader(Path("../data/raw/Penalty_Notices-Parking_Tickets-2013_2018.csv"))
    ).run()
    print("Loaded .csv")
    df = clean(df_raw)
    print("Cleaned df")
    df.to_parquet(CLEAN_DATA, engine="pyarrow")
    print("Saved .parquet")

Loaded .csv
Cleaned df
Saved .parquet


In [5]:
df.dtypes

X                             Float64
Y                             Float64
OBJECTID                        Int64
ADDRESS                        object
ISSUEDATE         datetime64[ns, UTC]
LICSTATEPROV                 category
VIODESCRIPTION               category
VIOFINE                       Float64
VOIDSTATUS                   category
ISSUETIME         datetime64[ns, UTC]
dtype: object

## High-Level Metrics

Totals

In [12]:
print(f"{len(df)} tickets issued over the following period:")
print(df["ISSUEDATE"].describe().loc[["min", "max"]])

376668 tickets issued over the following period:
min    2013-01-01 00:00:00+00:00
max    2018-07-27 00:00:00+00:00
Name: ISSUEDATE, dtype: object


In [14]:
# How many voided parking tickets?
df["VOIDSTATUS"].value_counts(dropna=False)

VOIDSTATUS
NO    314624
XX     56477
VO      5567
Name: count, dtype: int64

In [None]:
# TODO: Figure out what 'XX' means. Is this supposed to mean 'not applicable'?
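
One way to start on this TODO is to check whether `XX` is confined to a particular period, which might hint at a change in how tickets were recorded. The sketch below uses made-up rows in place of the real `df`. The column names match the dataset, but the values and any pattern they show are purely illustrative.

```python
import pandas as pd

# Made-up rows standing in for the real tickets DataFrame
toy = pd.DataFrame(
    {
        "ISSUEDATE": pd.to_datetime(
            ["2013-03-01", "2013-06-15", "2014-02-10", "2014-09-30", "2015-01-05"],
            utc=True,
        ),
        "VOIDSTATUS": pd.Categorical(["XX", "XX", "NO", "VO", "NO"]),
    }
)

# Count tickets per (year, void status) to see when each status occurs
status_by_year = pd.crosstab(toy["ISSUEDATE"].dt.year, toy["VOIDSTATUS"])
print(status_by_year)
```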

For the rest of the analysis, we only look at non-void tickets (`VOIDSTATUS == "NO"`). Note that this also excludes the unexplained `XX` tickets.

In [16]:
nonvoid = df.loc[df["VOIDSTATUS"] == "NO"]
nonvoid["VOIDSTATUS"].value_counts(dropna=False)

VOIDSTATUS
NO    314624
VO         0
XX         0
Name: count, dtype: int64
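
The zero counts for `VO` and `XX` show up because `VOIDSTATUS` is a categorical column, and filtering rows does not remove categories. A small sketch on toy data (not the real frame) of how the unused categories could be dropped:

```python
import pandas as pd

toy = pd.DataFrame({"VOIDSTATUS": pd.Categorical(["NO", "XX", "NO", "VO"])})

# Filtering rows keeps every category, so value_counts() would still list VO and XX with 0
kept = toy.loc[toy["VOIDSTATUS"] == "NO"].copy()

# Drop the categories that no longer occur in the filtered frame
kept["VOIDSTATUS"] = kept["VOIDSTATUS"].cat.remove_unused_categories()
print(kept["VOIDSTATUS"].value_counts())
```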

## Why are people getting tickets? And how much do they pay?

In [None]:
# TODO: Produce visuals instead of tables. Consider using seaborn, plotly, etc.

In [29]:
parking_ticket_reasons = (
    nonvoid.groupby(["VIODESCRIPTION"], observed=True)["VIOFINE"]
    .describe()
    .reset_index()
    .rename(columns={"mean": "Fine mean", "std": "Fine std", "min": "Fine min", "max": "Fine max"})
    # Rank violation types by median fine, highest first
    .sort_values(by="50%", ascending=False)
)
parking_ticket_reasons.head(10)

Unnamed: 0,VIODESCRIPTION,count,Fine mean,Fine std,Fine min,25%,50%,75%,Fine max
3,PARK ACCESSIBLE PARKING SPACE ON STREET/NO PERMIT,44.0,305.795455,83.887682,90.0,350.0,350.0,350.0,350.0
15,PARK IN ACCESSIBLE PARKING SPACE/NO PERMIT,1346.0,249.919019,119.393711,0.0,175.0,350.0,350.0,350.0
1,OBSTRUCT ACCESS AISLE,1269.0,192.487786,100.738257,0.0,150.0,150.0,300.0,350.0
21,PARK LARGE MOTOR VEHICLE ON STREET,2628.0,106.341134,27.456173,0.0,100.0,125.0,125.0,125.0
33,PARK WITHIN 3M OF FIRE HYDRANT,3805.0,90.479632,23.173542,0.0,100.0,100.0,100.0,100.0
13,PARK IN A DESIGNATED FIRE ROUTE,9623.0,110.354567,39.194162,-17.0,100.0,100.0,150.0,150.0
39,STOP PROHIBITED TIME AS POSTED,1842.0,85.377307,28.22143,0.0,100.0,100.0,100.0,100.0
38,STOP IN PROHIBITED AREA,2297.0,86.145842,27.29579,-19.0,100.0,100.0,100.0,100.0
20,PARK INTERFERING WITH SNOW REMOVAL AND/OR WINT...,3153.0,68.914209,15.897066,0.0,75.0,75.0,75.0,75.0
46,STAND IN NO PARKING LOADING ZONE,4.0,50.0,0.0,50.0,50.0,50.0,50.0,50.0


In [48]:
# TODO: Figure out why some fines are negative
nonvoid["VIOFINE"].describe()

count     314624.0
mean     40.896573
std      27.876275
min          -19.0
25%           35.0
50%           35.0
75%           40.0
max          350.0
Name: VIOFINE, dtype: Float64
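
A possible first step for the negative-fine TODO is to isolate those rows and see which violation types they belong to. The sketch below runs on made-up rows, not the real `nonvoid` frame. Whether negative values are refunds, adjustments, or data-entry errors is not something the data shown here can tell us.

```python
import pandas as pd

# Made-up rows standing in for the non-void tickets
toy = pd.DataFrame(
    {
        "VIODESCRIPTION": pd.Categorical(
            [
                "STOP IN PROHIBITED AREA",
                "PARK ON PRIVATE PROPERTY",
                "PARK ON PRIVATE PROPERTY",
                "PARK OBSTRUCTING SIDEWALK",
            ]
        ),
        "VIOFINE": pd.array([-19.0, -15.0, 40.0, 40.0], dtype="Float64"),
    }
)

# Keep only the rows whose fine is below zero
negative = toy.loc[toy["VIOFINE"] < 0]

# How many negative fines per violation type, and how large are they?
summary = negative.groupby("VIODESCRIPTION", observed=True)["VIOFINE"].agg(
    ["count", "min", "max"]
)
print(summary)
```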

### Top 5 most expensive reasons for parking tickets

In [32]:
parking_ticket_reasons.sort_values(by="Fine max", ascending=False).head(5)

Unnamed: 0,VIODESCRIPTION,count,Fine mean,Fine std,Fine min,25%,50%,75%,Fine max
3,PARK ACCESSIBLE PARKING SPACE ON STREET/NO PERMIT,44.0,305.795455,83.887682,90.0,350.0,350.0,350.0,350.0
1,OBSTRUCT ACCESS AISLE,1269.0,192.487786,100.738257,0.0,150.0,150.0,300.0,350.0
15,PARK IN ACCESSIBLE PARKING SPACE/NO PERMIT,1346.0,249.919019,119.393711,0.0,175.0,350.0,350.0,350.0
13,PARK IN A DESIGNATED FIRE ROUTE,9623.0,110.354567,39.194162,-17.0,100.0,100.0,150.0,150.0
29,PARK ON PRIVATE PROPERTY,49839.0,39.365808,3.924825,-15.0,40.0,40.0,40.0,150.0


### Top 5 cheapest reasons for parking tickets

In [33]:
parking_ticket_reasons.sort_values(by="Fine min", ascending=False).tail(5)
# Why are they negative lol?

Unnamed: 0,VIODESCRIPTION,count,Fine mean,Fine std,Fine min,25%,50%,75%,Fine max
29,PARK ON PRIVATE PROPERTY,49839.0,39.365808,3.924825,-15.0,40.0,40.0,40.0,150.0
37,PARKING 2:00 AM TO 6:00 AM PROHIBITED,161712.0,34.356683,3.683123,-17.0,35.0,35.0,35.0,40.0
13,PARK IN A DESIGNATED FIRE ROUTE,9623.0,110.354567,39.194162,-17.0,100.0,100.0,150.0,150.0
27,PARK ON MUNICIPAL PROPERTY,10025.0,37.871421,6.89529,-17.0,40.0,40.0,40.0,40.0
38,STOP IN PROHIBITED AREA,2297.0,86.145842,27.29579,-19.0,100.0,100.0,100.0,100.0


### Most common reasons for parking tickets

In [34]:
parking_ticket_reasons.sort_values(by="count", ascending=False).head(5)

Unnamed: 0,VIODESCRIPTION,count,Fine mean,Fine std,Fine min,25%,50%,75%,Fine max
37,PARKING 2:00 AM TO 6:00 AM PROHIBITED,161712.0,34.356683,3.683123,-17.0,35.0,35.0,35.0,40.0
29,PARK ON PRIVATE PROPERTY,49839.0,39.365808,3.924825,-15.0,40.0,40.0,40.0,150.0
23,PARK OBSTRUCTING SIDEWALK,17241.0,37.961893,7.366468,0.0,40.0,40.0,40.0,40.0
16,PARK IN EXCESS OF 3 HOURS,11209.0,28.865198,4.582428,-5.0,30.0,30.0,30.0,30.0
14,PARK IN A PROHIBITED AREA,10675.0,33.64445,5.523662,0.0,35.0,35.0,35.0,35.0


### Rarest reasons for parking tickets

In [35]:
parking_ticket_reasons.sort_values(by="count", ascending=False).tail(5)

Unnamed: 0,VIODESCRIPTION,count,Fine mean,Fine std,Fine min,25%,50%,75%,Fine max
43,PARK IN FRONT OF LANEWAY,5.0,24.0,13.416408,0.0,30.0,30.0,30.0,30.0
28,PARK ON PEDESTRIAN CROSSOVER,4.0,30.0,0.0,30.0,30.0,30.0,30.0,30.0
46,STAND IN NO PARKING LOADING ZONE,4.0,50.0,0.0,50.0,50.0,50.0,50.0,50.0
45,PARK ON CROSSWALK,3.0,40.0,0.0,40.0,40.0,40.0,40.0,40.0
19,PARK IN TAXICAB STAND,1.0,40.0,,40.0,40.0,40.0,40.0,40.0


### What kinds of parking offenses generate the most revenue for the city?

In [40]:
nonvoid.groupby("VIODESCRIPTION", observed=True)["VIOFINE"].sum().sort_values(ascending=False).head(5)

VIODESCRIPTION
PARKING 2:00 AM TO 6:00 AM PROHIBITED    5555888.0
PARK ON PRIVATE PROPERTY                 1961952.5
PARK IN A DESIGNATED FIRE ROUTE          1061942.0
PARK OBSTRUCTING SIDEWALK                 654501.0
PARK ON MUNICIPAL PROPERTY                379661.0
Name: VIOFINE, dtype: Float64
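
Raw totals are easier to compare as shares of the whole. The sketch below uses toy fines, not the real figures, and turns per-violation sums into percentages. It assumes every fine was paid in full, which the dataset shown here does not confirm, so "revenue" really means "fines issued".

```python
import pandas as pd

# Made-up fines standing in for the non-void tickets
toy = pd.DataFrame(
    {
        "VIODESCRIPTION": [
            "PARKING 2:00 AM TO 6:00 AM PROHIBITED",
            "PARKING 2:00 AM TO 6:00 AM PROHIBITED",
            "PARK IN A DESIGNATED FIRE ROUTE",
            "PARK IN EXCESS OF 3 HOURS",
        ],
        "VIOFINE": [35.0, 35.0, 100.0, 30.0],
    }
)

# Total fines issued per violation type, largest first
revenue = toy.groupby("VIODESCRIPTION")["VIOFINE"].sum().sort_values(ascending=False)

# Express each total as a percentage of all fines issued
share = (revenue / revenue.sum() * 100).round(1)
print(share)
```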

## Spatial Trends

In [None]:
# TODO: Display a heat map of parking violation fines overlaid on a map of Brampton (consider GeoPandas)

Most frequently ticketed places in Brampton

> Recreation centres are at the top of the list. Not unexpected.

In [36]:
nonvoid["ADDRESS"].value_counts(dropna=False)

ADDRESS
CENTRAL PARK DR AT 150         3545
PETER ROBERTSON BLVD AT 995    2443
RAY LAWSON BLVD AT 500         1565
JOHN ST                        1206
40 FINCHGATE BLVD              1197
                               ... 
MOUNTAINASH RD NEAR 202           1
PROUSE DR AT 11                   1
MOUNTAINASH RD NEAR 195           1
BEAR RUN RD NEAR 08               1
LONG MEADOW RD NEAR 32            1
Name: count, Length: 60641, dtype: int64
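
With 60,641 distinct address strings, many of them one-offs on the same street, it may help to aggregate by street. The sketch below is an assumption-heavy example. The helper `street_name` is hypothetical, and it only handles the formats visible in the output above (`<street> AT <number>`, `<street> NEAR <number>`, `<number> <street>`, and a bare street). The real column may contain other patterns.

```python
import re

import pandas as pd

# Sample address strings copied from the output above
addresses = pd.Series(
    [
        "CENTRAL PARK DR AT 150",
        "MOUNTAINASH RD NEAR 202",
        "MOUNTAINASH RD NEAR 195",
        "JOHN ST",
        "40 FINCHGATE BLVD",
    ]
)


def street_name(address: str) -> str:
    """Reduce an address string to its street name (hypothetical helper)."""
    # Strip a trailing "AT <number>" or "NEAR <number>" suffix
    street = re.sub(r"\s+(AT|NEAR)\s+\d+$", "", address.strip())
    # Strip a leading civic number such as "40 "
    return re.sub(r"^\d+\s+", "", street)


streets = addresses.map(street_name)
print(streets.value_counts())
```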

Parking violations by provincial/state license plate

In [47]:
nonvoid["LICSTATEPROV"].value_counts()

LICSTATEPROV
ON    303656
QC      3367
AB      1462
AZ       513
NY       493
       ...  
SD         1
WY         1
HI         1
           0
MX         0
Name: count, Length: 68, dtype: int64
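
The counts are dominated by Ontario plates, so a proportion may be more telling than the raw table. A minimal sketch on toy plates (not the real column):

```python
import pandas as pd

# Made-up plates standing in for nonvoid["LICSTATEPROV"]
plates = pd.Series(pd.Categorical(["ON", "ON", "ON", "QC", "NY"]), name="LICSTATEPROV")

# normalize=True returns proportions instead of raw counts
shares = plates.value_counts(normalize=True)
out_of_province = 1.0 - shares["ON"]
print(f"{out_of_province:.1%} of tickets went to non-Ontario plates")
```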

## Seasonal & Annual Trends in Parking Violations

In [None]:
# TODO: Analysis of seasonal & annual trends in parking violations
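
A starting point for the seasonal analysis could be a simple count per calendar month. The sketch below uses made-up dates. On the real data, keep in mind that the dataset ends on 2018-07-27, so August through December are covered by one fewer year than January through July.

```python
import pandas as pd

# Made-up issue dates standing in for nonvoid["ISSUEDATE"]
toy = pd.DataFrame(
    {
        "ISSUEDATE": pd.to_datetime(
            ["2013-01-05", "2013-01-20", "2013-07-04", "2014-01-11", "2014-12-25"],
            utc=True,
        )
    }
)

# Tickets per calendar month, pooled across years (1 = January, 12 = December)
by_month = (
    toy.groupby(toy["ISSUEDATE"].dt.month)
    .size()
    .reindex(range(1, 13), fill_value=0)  # show months with no tickets as 0
)
print(by_month)
```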

In [None]:
# TODO: How frequently is someone in Brampton given a parking ticket?
# Every second? Every day? Every week? Every month?
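
A rough answer can be worked out from the totals reported earlier: 314,624 non-void tickets between 2013-01-01 and 2018-07-27. The sketch below is back-of-the-envelope arithmetic only. It treats tickets as evenly spread over the period, which the time-of-day and seasonal questions will almost certainly contradict.

```python
from datetime import date

# Figures reported earlier in this notebook
NONVOID_TICKETS = 314_624
FIRST_DAY = date(2013, 1, 1)
LAST_DAY = date(2018, 7, 27)

# Length of the observation period in days
days = (LAST_DAY - FIRST_DAY).days

# Average rate, and the average gap between consecutive tickets
tickets_per_day = NONVOID_TICKETS / days
minutes_between_tickets = days * 24 * 60 / NONVOID_TICKETS

print(
    f"{days} days, {tickets_per_day:.1f} tickets/day, "
    f"one ticket every {minutes_between_tickets:.1f} minutes on average"
)
```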

In [None]:
# TODO: Are there correlations between season and parking tickets? If so, how do those correlations change based on location?

In [None]:
# TODO: Do the clock shifts for daylight saving time have any effect on parking violations?
#   I'm guessing parking violations get more frequent when the clocks move forward

In [None]:
# TODO: Are there correlations between time of day and parking violations?
#   Probably, considering 'PARKING 2:00 AM TO 6:00 AM PROHIBITED' is the most common violation
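
A first look at time of day could be a count per hour taken from `ISSUETIME`. The sketch below uses made-up timestamps. One open assumption: the column is typed as UTC, but it is not clear whether the clock values are really UTC or local time mislabelled as UTC. If they are true UTC, converting first with `.dt.tz_convert("America/Toronto")` would be needed before reading the hours.

```python
import pandas as pd

# Made-up issue times standing in for nonvoid["ISSUETIME"]
toy = pd.DataFrame(
    {
        "ISSUETIME": pd.to_datetime(
            [
                "2013-01-05 02:15",
                "2013-01-06 03:40",
                "2013-01-06 03:55",
                "2013-01-07 14:05",
            ],
            utc=True,
        )
    }
)

# Tickets per hour of day (0-23), in clock order
by_hour = toy["ISSUETIME"].dt.hour.value_counts().sort_index()
print(by_hour)
```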

In [None]:
# TODO: Did Brampton's parking situation get worse or better over the time period of the dataset (2013 - 2018)?
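
For the year-over-year question, the partial final year matters: the data stops on 2018-07-27, so raw yearly counts would make 2018 look artificially low. The sketch below runs on made-up dates and compares only the same January-to-late-July window in every year.

```python
import pandas as pd

# Made-up issue dates standing in for nonvoid["ISSUEDATE"]
toy = pd.DataFrame(
    {
        "ISSUEDATE": pd.to_datetime(
            [
                "2016-03-01",
                "2016-11-20",
                "2017-02-14",
                "2017-06-30",
                "2017-10-09",
                "2018-05-05",
            ],
            utc=True,
        )
    }
)

# Naive yearly totals (the last year is incomplete)
per_year = toy.groupby(toy["ISSUEDATE"].dt.year).size()

# Day-of-year of the dataset's last date (leap years shift this by one day)
cutoff = pd.Timestamp("2018-07-27").dayofyear

# Keep only tickets issued on or before that day-of-year, then count per year
same_window = toy.loc[toy["ISSUEDATE"].dt.dayofyear <= cutoff]
per_year_same_window = same_window.groupby(same_window["ISSUEDATE"].dt.year).size()

print(per_year)
print(per_year_same_window)
```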

## Brampton vs the GTA: How bad are we, really?

In [None]:
# TODO: Compare Brampton to one of Toronto, Mississauga, or Vaughan. Take into account differences in:
# - total population size
# - population density
# - driving population size
# - law enforcement population size
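
Whatever denominators end up being used, the comparison will probably come down to a normalized rate. The sketch below shows the idea with a hypothetical helper, `tickets_per_1000`, and placeholder inputs. The city names and numbers are invented for illustration and are not real ticket counts or census figures.

```python
def tickets_per_1000(tickets: int, population: int) -> float:
    """Tickets issued per 1,000 residents (hypothetical helper)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return tickets * 1000 / population


# Placeholder inputs only: NOT real ticket counts or census figures
placeholder_cities = {
    "City A": {"tickets": 50_000, "population": 500_000},
    "City B": {"tickets": 120_000, "population": 2_000_000},
}

rates = {
    name: tickets_per_1000(**figures) for name, figures in placeholder_cities.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} tickets per 1,000 residents")
```

The same function works for the other denominators in the list (drivers, enforcement officers) by swapping what is passed as `population`. The ticket counts being compared should also cover the same time span.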