# Toronto Parking Tickets Analysis #

#### Dataset Description ####

Approximately 2.8 million parking tickets are issued annually across the City of Toronto. This dataset contains non-identifiable information relating to each parking ticket issued for each calendar year. The tickets are issued by Toronto Police Services (TPS) personnel as well as persons certified and authorized to issue tickets by TPS.

This dataset contains complete records only. Incomplete records in the City database are not included. Records may be incomplete for a variety of reasons, e.g. the vehicle registration is out-of-province, or the ticket was paid before staff entered the ticket data. The volume of incomplete records is low relative to the overall volume, so their absence has an insignificant impact on trend analysis.

Data source: https://open.toronto.ca/dataset/parking-tickets/

In [144]:
# Import libraries
import pandas as pd
import requests
import numpy as np
from geopy.geocoders import Nominatim
import config

import plotly.express as px
import plotly.io as pio
import plotly.graph_objects as go

In [3]:
# Get the dataset metadata by passing the package id to the package_show endpoint
# For example, to retrieve the metadata for this dataset:

url = "https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/package_show"
params = { "id": "8c233bc2-1879-44ff-a0e4-9b69a9032c54"}
package = requests.get(url, params = params).json()
#print(package['result']['resources'])
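
The commented-out `print` above dumps the whole resource list, which is hard to read. As a minimal sketch, the helper below pulls out just the name, format and URL of each resource. The function name `summarise_resources` and the sample payload are hypothetical; the payload is hand-written to resemble a CKAN `package_show` response and is not real output from the Toronto API.

```python
def summarise_resources(package):
    """Return (name, format, url) for each resource in a package_show-style dict."""
    resources = package.get("result", {}).get("resources", [])
    return [(r.get("name"), r.get("format"), r.get("url")) for r in resources]


# Hypothetical payload for illustration only (not fetched from the API)
sample_package = {
    "success": True,
    "result": {
        "resources": [
            {"name": "parking-tickets-2018", "format": "ZIP",
             "url": "https://example.org/parking-tickets-2018.zip"},
            {"name": "parking-tickets-readme", "format": "XLS",
             "url": "https://example.org/readme.xls"},
        ]
    },
}

summary = summarise_resources(sample_package)
for name, fmt, link in summary:
    print(f"{name:<26}{fmt:<6}{link}")
```

In the notebook, the same helper could be called on the `package` variable returned by `requests.get(...).json()`.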

### Data Exploration and Cleaning ###

In [132]:
# Import and prepare data

# read_csv already returns a DataFrame, so no extra conversion is needed
df = pd.read_csv('parking-tickets-2018/Parking_Tags_Data_2018_1.csv')
print('Number of rows and cols: ', str(df.shape))

Number of rows and cols:  (750000, 11)


In [5]:
df.head()

Unnamed: 0,tag_number_masked,date_of_infraction,infraction_code,infraction_description,set_fine_amount,time_of_infraction,location1,location2,location3,location4,province
0,***92517,20180101,16,PARK-WITHIN 9M INTERSECT ROAD,50,0.0,S/S,PRYOR AVE,E/O,CLOVERDALE RD,ON
1,***71708,20180101,29,PARK PROHIBITED TIME NO PERMIT,30,2.0,NR,266 DOVERCOURT RD,,,ON
2,***92311,20180101,29,PARK PROHIBITED TIME NO PERMIT,30,2.0,NR,15 FAIRBANK AVE,,,ON
3,***92312,20180101,29,PARK PROHIBITED TIME NO PERMIT,30,2.0,NR,15 FAIRBANK AVE,,,ON
4,***71709,20180101,29,PARK PROHIBITED TIME NO PERMIT,30,3.0,NR,266 DOVERCOURT RD,,,ON


In [6]:
df.dtypes

tag_number_masked          object
date_of_infraction          int64
infraction_code             int64
infraction_description     object
set_fine_amount             int64
time_of_infraction        float64
location1                  object
location2                  object
location3                  object
location4                  object
province                   object
dtype: object

In [134]:
# check for NaN values
print('Number of rows and cols: ', str(df.shape))
print(df.isna().sum(axis = 0))

Number of rows and cols:  (750000, 11)
tag_number_masked              0
date_of_infraction             0
infraction_code                0
infraction_description         0
set_fine_amount                0
time_of_infraction           677
location1                  70403
location2                     90
location3                 703627
location4                 703387
province                       0
dtype: int64


In [127]:
# check if there are records from other provinces 
print(df['province'].unique())
print(df.groupby(['province']).tag_number_masked.count().nlargest(5))

['ON' 'QC' 'MI' 'AZ' 'VT' 'AB' 'NM' 'NY' 'WV' 'NJ' 'PA' 'BC' 'PQ' 'IL'
 'OH' 'NS' 'SK' 'XX' 'FL' 'MB' 'MN' 'VA' 'TX' 'OR' 'MD' 'NB' 'GA' 'CA'
 'MA' 'NC' 'CO' 'PE' 'IN' 'NF' 'IA' 'CT' 'WI' 'WA' 'NH' 'RI' 'AL' 'MO'
 'KY' 'LA' 'ME' 'OK' 'AR' 'SD' 'NV' 'TN' 'SC' 'NT' 'NE' 'MS' 'DC' 'ID'
 'MT' 'KS' 'NU' 'UT' 'YT' 'DE' 'GO' 'AK' 'ND' 'HI']
province
ON    726528
QC      7608
AB      2498
AZ      1605
NY      1584
Name: tag_number_masked, dtype: int64


- <b>location2</b> appears to hold the physical street address.
- When a street address is not available, <b>location2</b> and <b>location4</b> together give the nearest intersection, while <b>location3</b> holds a position code such as E/O.
- The <b>province</b> column contains values other than Ontario.

A spot check of records whose <b>province</b> is not Ontario shows that their addresses do exist in Toronto. The non-Ontario values are likely an encoding or system error.
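
The address-versus-intersection distinction can be checked across a whole column rather than by eye. The sketch below labels each row from a small hand-made sample; it assumes, based on the sample rows shown earlier, that the cross street sits in `location4`. The `location_type` column and the third (empty) row are hypothetical and exist only for illustration.

```python
import pandas as pd

# Two rows copied from the head() output above, plus one made-up empty row
sample = pd.DataFrame({
    "location2": ["266 DOVERCOURT RD", "PRYOR AVE", None],
    "location4": [None, "CLOVERDALE RD", None],
})

# A numbered street address starts with a digit; missing values count as False
starts_with_digit = sample["location2"].str.match(r"\d", na=False)
has_cross_street = sample["location2"].notna() & sample["location4"].notna()

sample["location_type"] = "unknown"
sample.loc[has_cross_street & ~starts_with_digit, "location_type"] = "intersection"
sample.loc[starts_with_digit, "location_type"] = "address"

print(sample["location_type"].value_counts())
```

Running the same labelling on `df` would show how many tickets carry only an intersection, which matters later when geocoding.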

In [107]:
# extract the year, month and day of each infraction into their own columns
df['infraction_yr'] = df.date_of_infraction.astype(str).str[:4]
df['infraction_mth'] = df.date_of_infraction.astype(str).str[4:6]
df['infraction_date'] = df.date_of_infraction.astype(str).str[6:8]
df.tail()

Unnamed: 0,tag_number_masked,date_of_infraction,infraction_code,infraction_description,set_fine_amount,time_of_infraction,location1,location2,location3,location4,province,infraction_yr,infraction_mth,infraction_date
749995,***36033,20180517,207,PARK MACHINE-REQD FEE NOT PAID,30,1237.0,NR,37 ELM ST,,,ON,2018,5,17
749996,***52274,20180517,3,PARK ON PRIVATE PROPERTY,30,1237.0,AT,33 DAVISVILLE AVE,,,ON,2018,5,17
749997,***53832,20180517,5,PARK-SIGNED HWY-PROHIBIT DY/TM,50,1237.0,OPP,188 MC CAUL ST,,,ON,2018,5,17
749998,***56548,20180517,9,STOP-SIGNED HWY-PROHIBIT TM/DY,60,1237.0,NR,124 AVENUE RD,,,ON,2018,5,17
749999,***59300,20180517,29,PARK PROHIBITED TIME NO PERMIT,30,1237.0,NR,508 MARKHAM ST,,,ON,2018,5,17
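
As an alternative to string slicing, the integer dates can be parsed into real timestamps, which also gives access to attributes such as the weekday. This is a sketch on a two-row sample using dates that appear in the output above; the column names `infraction_ts` and `infraction_weekday` are hypothetical additions, not part of the dataset.

```python
import pandas as pd

sample = pd.DataFrame({"date_of_infraction": [20180101, 20180517]})

# Parse YYYYMMDD integers into datetime64 values
parsed = pd.to_datetime(sample["date_of_infraction"].astype(str), format="%Y%m%d")

sample["infraction_ts"] = parsed
sample["infraction_mth"] = parsed.dt.month
sample["infraction_weekday"] = parsed.dt.day_name()

print(sample)
```

A timestamp column also sorts chronologically on a plot axis without the need to build a separate `YYYY-MM-DD` string.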


### Preliminary Analysis ###

In [9]:
# count the tickets issued on each calendar day
daily_infraction_df = pd.DataFrame(df.groupby(['infraction_yr','infraction_mth', 'infraction_date']).tag_number_masked.count())
daily_infraction_df = daily_infraction_df.reset_index()
# build a YYYY-MM-DD label to use on the x-axis of the plots
daily_infraction_df['infraction_ymd'] = daily_infraction_df['infraction_yr'].map(str) + '-' + daily_infraction_df['infraction_mth'].map(str) + '-' + daily_infraction_df['infraction_date'].map(str)
daily_infraction_df.head()

Unnamed: 0,infraction_yr,infraction_mth,infraction_date,tag_number_masked,infraction_ymd
0,2018,1,1,1269,2018-01-01
1,2018,1,2,5489,2018-01-02
2,2018,1,3,5104,2018-01-03
3,2018,1,4,5002,2018-01-04
4,2018,1,5,4177,2018-01-05


In [135]:
# plot the number of infractions per day

fig = px.line(daily_infraction_df, x='infraction_ymd', y='tag_number_masked', title='Number of Infractions Per Day')
fig.show()

In [11]:
# plot infraction volume by month

fig2 = px.bar(daily_infraction_df, 
              x='infraction_mth', 
              y='tag_number_masked',
              title='Number of Infractions Per Month') 
fig2.show()

In [15]:
# display as heatmap

fig_heatmap = go.Figure(data=go.Heatmap(
                        z=daily_infraction_df['tag_number_masked'],
                        x=daily_infraction_df['infraction_date'], 
                        y=daily_infraction_df['infraction_mth'],
                        colorscale='Mint'
                        ))

fig_heatmap.update_xaxes(title_text = 'Date')
fig_heatmap.update_yaxes(title_text = 'Month')
fig_heatmap.update_layout(title_text='Number of Tickets Issued Per Day')

fig_heatmap.show()
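
The heatmap above passes three flat columns to plotly. An explicit month-by-day grid can make the layout easier to inspect before plotting. The sketch below pivots a tiny sample into such a grid; the January counts come from the daily table above, while the February counts are made up for illustration.

```python
import pandas as pd

daily = pd.DataFrame({
    "infraction_mth": ["01", "01", "02", "02"],
    "infraction_date": ["01", "02", "01", "02"],
    "tag_number_masked": [1269, 5489, 4000, 4100],  # February values are invented
})

# Rows are months, columns are days of the month, cells are ticket counts
grid = daily.pivot(index="infraction_mth",
                   columns="infraction_date",
                   values="tag_number_masked")

print(grid)
```

Days that do not exist in a month (such as February 30) would appear as `NaN` in the grid. The 2-D values could then be handed to `go.Heatmap` as `z=grid.values`, with `x=grid.columns` and `y=grid.index`.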

In [31]:
# Highest to the lowest Total Fine Amount by type

fine_type_df = pd.DataFrame(df.groupby(['infraction_yr','infraction_description', 'set_fine_amount']).tag_number_masked.count()).reset_index()
fine_type_df['total_fine_amt'] = fine_type_df['set_fine_amount'] * fine_type_df['tag_number_masked']

print(fine_type_df.sort_values(by=['total_fine_amt', 'tag_number_masked'], ascending=False))

    infraction_yr          infraction_description  set_fine_amount  \
84           2018  PARK-SIGNED HWY-PROHIBIT DY/TM               50   
43           2018        PARK ON PRIVATE PROPERTY               30   
139          2018   STOP-SIGNED HIGHWAY-RUSH HOUR              150   
51           2018  PARK PROHIBITED TIME NO PERMIT               30   
34           2018  PARK MACHINE-REQD FEE NOT PAID               30   
..            ...                             ...              ...   
102          2018      STAND SIGNED TAXICAB STAND                0   
110          2018    STAND VEHICLE-SIGNED HIGHWAY                0   
111          2018  STAND VEHICLE-SIGNED HIGHWAY-3                0   
125          2018    STOP SIDE STOPPED/PARKED VEH                0   
144          2018  STOP/STAND/PARK VEND NO PERMIT                0   

     tag_number_masked  total_fine_amt  
84              118228         5911400  
43              143717         4311510  
139              25037         37555
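
The total for each row is simply the ticket count multiplied by the set fine, for example 118228 × 50 = 5911400 for the first row above. A possible variation, sketched below on a toy sample, sums the fine directly per description with named aggregation, so a description that appears with more than one fine amount still ends up on a single row. The ticket counts here are invented; only the descriptions and fine amounts are taken from the output above.

```python
import pandas as pd

# Toy sample: 3 tickets at $50 and 2 tickets at $30
tickets = pd.DataFrame({
    "infraction_description": ["PARK-SIGNED HWY-PROHIBIT DY/TM"] * 3
                              + ["PARK ON PRIVATE PROPERTY"] * 2,
    "set_fine_amount": [50, 50, 50, 30, 30],
})

fine_summary = (
    tickets.groupby("infraction_description")
    .agg(tickets_issued=("set_fine_amount", "size"),
         total_fine_amt=("set_fine_amount", "sum"))
    .sort_values("total_fine_amt", ascending=False)
    .reset_index()
)

print(fine_summary)
```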

In [86]:
# plot highest to lowest single ticket amount by type 

fig3 = go.Figure()

fig3.add_trace(go.Bar(x=fine_type_df['infraction_description'], 
                      y=fine_type_df['set_fine_amount'])
                ) 

fig3.update_layout(xaxis=dict({'categoryorder':'total descending'},
                              tickfont=dict(size=8)), 
                   title_text='Highest to Lowest Single Ticket Amount by Type',
                   yaxis_title='Single Ticket Amount',
                  )
                  
fig3.show()

In [96]:
# highest to lowest total fine amount and frequency

fig4 = go.Figure()

fig4.add_trace(go.Bar(x=fine_type_df['infraction_description'], 
                      y=fine_type_df['tag_number_masked'],
                      name='Number of Tickets Issued'
                     )
                ) 

fig4.add_trace(go.Scatter(x=fine_type_df['infraction_description'], 
                          y=fine_type_df['total_fine_amt'],
                          yaxis='y2',
                          mode="markers",
                          name='Total Fine Amount')
              )


fig4.update_layout(xaxis=dict({'categoryorder':'total descending'},
                              tickfont=dict(size=8)), 
                   title_text='Highest to Lowest Total Fine Amount and Frequency in Log Scale',
                   yaxis_title='Number of Tickets Issued',
                   yaxis=dict(type='log'),
                   yaxis2=dict(anchor='x',
                              overlaying='y',
                              side='right',
                              type='log',
                              title='Total Fine Amount'),
                   legend=dict(orientation='h',
                               y=1.1),
                   height=600
                  )
fig4.show()

### In which areas of Toronto do infractions occur? ###

In [228]:
# test dataframe
#del test_data, test_df
test_data = pd.read_csv('test_data.csv')
test_df = pd.DataFrame(test_data)
print('Number of rows and cols: ', str(test_df.shape))

Number of rows and cols:  (49, 11)


In [266]:
# Geocoding test: convert street addresses to latitude and longitude coordinates

# Try converting one physical address using geopy
geolocator = Nominatim(user_agent='toronto_parking_ticket_analysis')
latitude = geolocator.geocode('3 DE JONG ST, Toronto, Ontario, Canada').latitude
print(latitude)

# Can geopy handle intersections?
location1 = geolocator.geocode('HAYHURST RD and KIPLING AVE, Toronto, Ontario')
print((location1.latitude, location1.longitude))

location2 = geolocator.geocode('NEWCASTLE ST and ROYAL YORK RD, Toronto, Ontario, Canada')
print(location2)

# Geocode the postal code taken from the intersection result
location3 = geolocator.geocode('M8Y 2R3')
print(location3.latitude, location3.longitude)

43.76603304166667
(43.6920292, -79.5577647)
Newcastle Street, Royal York Road, Etobicoke—Lakeshore, Etobicoke, Toronto, Golden Horseshoe, Ontario, M8Y 2R3, Canada
43.6185434 -79.4995218


The geopy Nominatim geocoder handles full street addresses well, but not intersections: the address returned for an intersection is not fully accurate. One possible workaround is to take the postal code from the returned address and geocode that instead.

In [259]:
# Patch intersection data: if a numbered address is not available,
# combine the two street names into an intersection name
# Add Toronto, Ontario, Canada to the end of each address
# Rows that cannot be geocoded (mostly intersections) are skipped and counted

skipped_rows = 0

for i in range(len(test_df)):
    street = test_df.loc[i, 'location2']
    # a numbered address starts with a digit; otherwise treat the row as an intersection
    if street[0].isdigit():
        full_location = street + ', Toronto, Ontario, Canada'
    else:
        full_location = street + ' and ' + test_df.loc[i, 'location4'] + ', Toronto, Ontario, Canada'
    test_df.loc[i, 'full_location'] = full_location

    try:
        # geocode once per row and reuse the result for both coordinates
        result = geolocator.geocode(full_location)
        test_df.loc[i, 'latitude'] = result.latitude
        test_df.loc[i, 'longitude'] = result.longitude
    except Exception:
        # geocode returns None when nothing matches, so .latitude raises AttributeError
        skipped_rows += 1

print(skipped_rows)

11


In [260]:
test_df.head()

Unnamed: 0,tag_number_masked,date_of_infraction,infraction_code,infraction_description,set_fine_amount,time_of_infraction,location1,location2,location3,location4,province,full_location,latitude,longitude
0,***71720,20180101,5,PARK-SIGNED HWY-PROHIBIT DY/TM,50,12,NR,202 DOVERCOURT RD,,,ON,"202 DOVERCOURT RD, Toronto, Ontario, Canada",43.647129,-79.423922
1,***61115,20180101,5,PARK-SIGNED HWY-PROHIBIT DY/TM,50,19,W/S,GREAT WEST DR,S/O,DE JONG ST,ON,"GREAT WEST DR and DE JONG ST, Toronto, Ontario...",,
2,***61117,20180101,5,PARK-SIGNED HWY-PROHIBIT DY/TM,50,22,NR,3 DE JONG ST,,,ON,"3 DE JONG ST, Toronto, Ontario, Canada",43.766033,-79.274797
3,***61118,20180101,5,PARK-SIGNED HWY-PROHIBIT DY/TM,50,23,OPP,54 ZEZEL WAY,,,ON,"54 ZEZEL WAY, Toronto, Ontario, Canada",43.766148,-79.275421
4,***92520,20180101,5,PARK-SIGNED HWY-PROHIBIT DY/TM,50,24,E/S,FORD ST,S/O,ST CLAIR AVE W,ON,"FORD ST and ST CLAIR AVE W, Toronto, Ontario, ...",,
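
Many tickets share the same address (for example, 15 FAIRBANK AVE appears on consecutive rows in the full data), so geocoding each distinct address only once should reduce the number of requests considerably. The sketch below shows that idea with a stand-in geocoder so it runs offline. `geocode_unique`, `fake_geocode` and `FakeLocation` are hypothetical names; in the notebook, `geolocator.geocode` would be passed in instead.

```python
from collections import namedtuple

FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])
calls = []  # records every lookup so the saving is visible


def fake_geocode(address):
    """Stand-in for geolocator.geocode: knows one address, returns None otherwise."""
    calls.append(address)
    known = {"202 DOVERCOURT RD, Toronto, Ontario, Canada": FakeLocation(43.647129, -79.423922)}
    return known.get(address)


def geocode_unique(addresses, geocode):
    """Geocode each distinct address once; map it to (lat, lon) or None."""
    cache = {}
    for address in addresses:
        if address in cache:
            continue
        result = geocode(address)
        cache[address] = (result.latitude, result.longitude) if result is not None else None
    return cache


addresses = [
    "202 DOVERCOURT RD, Toronto, Ontario, Canada",
    "202 DOVERCOURT RD, Toronto, Ontario, Canada",
    "FORD ST and ST CLAIR AVE W, Toronto, Ontario, Canada",
]
coords = geocode_unique(addresses, fake_geocode)
print(coords)
print("lookups made:", len(calls))
```

The resulting dictionary could be mapped back onto the `full_location` column. For the real service, geopy also provides `RateLimiter` in `geopy.extra.rate_limiter`, which can wrap `geolocator.geocode` to space out requests.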
