# Section 1
The City of Baltimore maintains a database of parking citations issued within the city. More information about the dataset can be found here. You can download the dataset as a CSV file here. Unless stated otherwise, you should only consider citations written before January 1, 2019.

### Data cleaning and prep

In [39]:
import pandas as pd
import numpy as np
from datetime import datetime

In [None]:
# initial read of file, forcing dtypes
data = 'Parking_Citations.csv'
cols_keep = ['Citation', 'Make', 'ViolFine', 'ViolDate', 'OpenPenalty', 'PoliceDistrict']
cols_types = ['int64', 'str', 'float64', 'str', 'float64', 'str']
df = pd.read_csv(data, usecols=cols_keep, dtype=dict(zip(cols_keep, cols_types)))
# df = pd.read_csv(data, usecols=cols_keep, dtype=dict(zip(cols_keep, cols_types)), parse_dates=['ViolDate'])
df.head()

In [41]:
# convert the ViolDate column to pandas datetime; an explicit format makes inference unnecessary
df.ViolDate = pd.to_datetime(df.ViolDate, format='%m/%d/%Y %I:%M:%S %p')
df.dtypes

Citation                   int64
Make                      object
ViolFine                 float64
ViolDate          datetime64[ns]
OpenPenalty              float64
PoliceDistrict            object
dtype: object
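The introduction restricts the analysis to citations written before January 1, 2019, and once `ViolDate` is a datetime column that cutoff can be applied with a boolean mask. The following is a minimal sketch on toy data, not the notebook's own cell; the names `toy`, `cutoff`, and `before_2019` are illustrative assumptions.

```python
import pandas as pd

# Toy stand-in for the citations table (the real data comes from Parking_Citations.csv).
toy = pd.DataFrame({
    "Citation": [1, 2, 3],
    "ViolFine": [32.0, 52.0, 102.0],
    "ViolDate": pd.to_datetime(
        [
            "12/31/2018 11:59:59 PM",
            "01/01/2019 12:00:00 AM",
            "06/15/2017 08:30:00 AM",
        ],
        format="%m/%d/%Y %I:%M:%S %p",
    ),
})

# Keep only rows strictly before midnight on January 1, 2019.
cutoff = pd.Timestamp("2019-01-01")
before_2019 = toy[toy["ViolDate"] < cutoff]

print(before_2019["Citation"].tolist())  # [1, 3]
print(before_2019["ViolFine"].mean())    # 67.0
```

The comparison is strict (`<`), so a citation stamped exactly at midnight on January 1, 2019 is excluded.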

### Q1. For all citations, what is the mean violation fine?

In [45]:
mean_vio_fine = df.ViolFine.mean()
print(round(mean_vio_fine,10))

49.8959492313


### Q2. Find the police district that has the highest mean violation fine. What is that mean violation fine? Keep in mind that Baltimore is divided into nine police districts, so clean the data accordingly.

In [48]:
# find unique values in district column:
districts = df.PoliceDistrict.unique()
print(districts)

[nan 'Eastern' 'Western' 'Northern' 'Central' 'Southeastern' 'Notheastern'
 'Northwestern' 'NORTHERN' 'Southern' 'SOUTHERN' 'Southwestern'
 'SOUTHWESTERN' 'SOUTHEASTERN' 'CENTRAL' 'WESTERN' 'EASTERN'
 'NORTHWESTERN' 'NORTHEASTERN']


In [51]:
# fix the misspelled 'Notheastern'; remaining case differences are handled by lowercasing when grouping
df.PoliceDistrict = df.PoliceDistrict.replace('Notheastern', 'NORTHEASTERN')
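An alternative to fixing the typo and lowercasing later is to normalize the labels in one step and then check how many distinct districts remain. This is a sketch under assumptions: the `raw` series is a hypothetical sample of the values printed above, and title-casing is one possible normalization, not the one the notebook uses.

```python
import pandas as pd

# Hypothetical sample of the raw labels, including a missing value and the typo.
raw = pd.Series([None, "Eastern", "EASTERN", "Notheastern", "NORTHEASTERN", "Southern"])

# Fix the known misspelling first, then unify the case.
clean = raw.replace("Notheastern", "Northeastern").str.title()

# Missing values stay missing; the remaining labels collapse to one spelling each.
labels = sorted(clean.dropna().unique())
print(labels)       # ['Eastern', 'Northeastern', 'Southern']
print(len(labels))  # 3 here; on the full column the expected count is 9
```

Checking the number of unique labels against nine is a quick way to confirm that no other misspellings are left in the full column.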

In [66]:
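# group df by lowercased district name and find the mean violation fine
pd.set_option('display.precision',10)
dist_gp = df.groupby(df['PoliceDistrict'].str.lower())
# numeric_only=True skips the non-numeric columns, which newer pandas versions no longer drop silently
dist_gp.mean(numeric_only=True).sort_values(by=['ViolFine'])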

                    Citation       ViolFine    OpenPenalty
PoliceDistrict
central         73842492.001  44.8028723651  38.5895606517
northern        76267226.602  48.1452263091  32.2822440838
southeastern    74997810.382   48.750941886  28.7512592981
eastern         74697449.202  50.8005944339  64.4070971809
western         72524726.431  53.4737451941  67.4415652815
southern        77515468.806    54.38742581   30.107563325
southwestern    72115918.031  58.2980085349  56.8853129445
northwestern    70819505.604  59.7556851742  57.0510189014
northeastern    73509503.138  61.3523935667  48.3817218543
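The table is sorted in ascending order, so the answer to Q2 is its last row: northeastern, with a mean violation fine of 61.3523935667. The district and its mean can also be pulled out directly instead of reading them off the sorted table. The sketch below uses toy data; `sample`, `mean_fine`, `top_district`, and `top_mean` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Toy data with mixed-case labels and one row that has no district.
sample = pd.DataFrame({
    "PoliceDistrict": ["Central", "CENTRAL", "Northeastern", "NORTHEASTERN", None],
    "ViolFine": [40.0, 50.0, 60.0, 62.0, 500.0],
})

# Group on the lowercased label; rows with a missing district are dropped by groupby.
mean_fine = sample.groupby(sample["PoliceDistrict"].str.lower())["ViolFine"].mean()

top_district = mean_fine.idxmax()  # label of the largest mean
top_mean = mean_fine.max()         # the largest mean itself

print(top_district, top_mean)  # northeastern 61.0
```

Selecting the `ViolFine` column before calling `mean()` also avoids averaging identifier columns such as `Citation`, whose mean carries no meaning.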
