# Citation Revenue

## Executive Summary

The daily citation revenue on and after October 15th, 2020 is significantly greater than the average daily revenue.
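
One way a claim like this could be checked is a one-sided, one-sample t-test that compares the post-October 15 daily revenue against the overall daily mean. The sketch below is only an illustration under that assumption: the revenue figures are synthetic, and the choice of test is not confirmed by the notebook.

```python
import pandas as pd
from scipy import stats

# Synthetic daily revenue, for illustration only (not the notebook's data).
daily_revenue = pd.Series(
    [90, 110, 100, 95, 105, 150, 170, 160, 155, 165],
    index=pd.date_range('2020-10-10', periods=10, freq='D'),
    dtype=float,
)

# Average daily revenue across the whole (synthetic) period.
overall_mean = daily_revenue.mean()

# Daily revenue on and after October 15, 2020.
post = daily_revenue[daily_revenue.index >= '2020-10-15']

# One-sided, one-sample t-test: is the post-period mean above the overall mean?
t_stat, p_value = stats.ttest_1samp(post, overall_mean, alternative='greater')
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

A small p-value here would support "significantly greater", but only for whatever significance level is chosen in advance.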

---

## Acquire
1. Acquire the dataset, [Los Angeles Parking Citations](https://www.kaggle.com/cityofLA/los-angeles-parking-citations).
1. Import libraries.
1. Load the dataset using pandas.
1. Display the shape and first/last 2 rows.
1. Display general statistics of the dataset, including the number of unique values in each column.
1. Uncover the number of missing values in each column.
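
The acquire steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's `src` code: the inline CSV stands in for the Kaggle file, and its column names and values are assumptions.

```python
import io
import pandas as pd

# Tiny stand-in for the Kaggle CSV; column names and values are illustrative only.
csv_text = """Ticket number,Issue Date,Issue time,Fine amount,Meter Id
1001,2020-10-15T00:00:00,1251.0,73.0,
1002,2020-10-15T00:00:00,805.0,73.0,
1003,2020-10-16T00:00:00,,63.0,WF75
"""

# Load the dataset using pandas (a real run would pass the CSV file path instead).
df = pd.read_csv(io.StringIO(csv_text))

# Shape and first/last 2 rows
print(df.shape)
print(df.head(2))
print(df.tail(2))

# General statistics, with the number of unique values in each column
summary = df.describe(include='all').T
summary['n_unique'] = df.nunique()
print(summary)

# Number of missing values in each column
missing = df.isna().sum()
print(missing)
```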

## Prepare
- Remove spaces + capitalization from each column name.
- Drop features missing >=74.42\% of their values. 
- Drop unused features: `vin`, `rp_state_plate`, `make`, `body_style`, `color`, `marked_time`, `color_description`, `body_style_description`, `agency_description`, `meter_id`, `ticket_number`, and `violation_code`.
- Cast `issue_date` and `issue_time` to datetime data types.
- Transform `latitude` and `longitude` coordinates from the NAD 1983 State Plane California V (FIPS 0405, feet) projection to EPSG:4326 (World Geodetic System 1984).
- Filter dates from 2017-01-01 to 2021-04-12.
- Filter for street sweeping citations.
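
Several of the preparation steps above can be sketched on a toy frame. This is a hedged illustration rather than the contents of `prepare.py`: the sample rows, the `violation_description` column, and the `'STREET CLEAN'` match string are assumptions, and the coordinate transformation is left out.

```python
import pandas as pd

# Illustrative rows; the real dataset has many more columns.
df = pd.DataFrame({
    'Issue Date': ['2016-12-31', '2017-01-01', '2020-10-15', '2021-04-13'],
    'Violation Description': ['NO PARK/STREET CLEAN', 'NO PARK/STREET CLEAN',
                              'NO PARK/STREET CLEAN', 'METER EXP.'],
    'Fine amount': [73.0, 73.0, 73.0, 63.0],
    'Meter Id': [None, None, None, 'WF75'],
})

# Remove spaces and capitalization from each column name.
df.columns = [name.replace(' ', '_').lower() for name in df.columns]

# Drop features missing at least 74.42% of their values.
missing_share = df.isna().mean()
df = df.drop(columns=missing_share[missing_share >= 0.7442].index)

# Cast issue_date to datetime, then keep 2017-01-01 through 2021-04-12.
df['issue_date'] = pd.to_datetime(df['issue_date'])
in_range = df['issue_date'].between('2017-01-01', '2021-04-12')

# Keep street sweeping citations (the matching string is an assumption).
is_sweeping = df['violation_description'].str.contains('STREET CLEAN', na=False)

df_sweep = df[in_range & is_sweeping].reset_index(drop=True)
print(df_sweep)
```

Computing the missing share with `isna().mean()` keeps the threshold readable as a fraction of rows, so the 74.42% cutoff appears in the code exactly as it does in the list.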

In [None]:
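# Import libraries
import pandas as pd
from datetime import datetime
from scipy import stats
import plotly.express as px

# Plotting libraries used by the exploration cells below
import matplotlib.pyplot as plt
import seaborn as sns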

import src

import warnings
warnings.filterwarnings('ignore')

In [None]:
# Load the data
df = src.get_sweep_data()

In [None]:
df.head()

# Fixing and updating functions in `prepare.py`

In [None]:
# Normalize column names
formatted_feature_names = [x.replace(' ', '_').lower() for x in df.columns.to_list()]
df.columns = formatted_feature_names

In [None]:
# Drop unused features
df = src.drop_features(df)
df.head()

## Fix and update `add_features` function in `prepare.py`

- Cast `issue_date` to datetime.
- `issue_time`
    - Add the time to `issue_date`
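
Adding the time of day to `issue_date` could look like the sketch below. It assumes `issue_time` is a float in HHMM form (for example, `1251.0` for 12:51) with no missing values, and the `issue_datetime` column name is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'issue_date': pd.to_datetime(['2020-10-15', '2020-10-16']),
    'issue_time': [1251.0, 805.0],  # assumed HHMM encoding
})

# Split the HHMM float into hours and minutes.
hhmm = df['issue_time'].astype(int)
hours = pd.to_timedelta(hhmm // 100, unit='h')
minutes = pd.to_timedelta(hhmm % 100, unit='m')

# Add the time of day to the calendar date.
df['issue_datetime'] = df['issue_date'] + hours + minutes
print(df['issue_datetime'])
```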

In [None]:
# Cast issue_date to datetime
df.issue_date = pd.to_datetime(df.issue_date)
print(df.issue_date[0])

In [None]:
type(df.issue_date[0])

In [None]:
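# Convert `issue_time` from a float to a datetime value.
# Assumes the float encodes HHMM (e.g. 1251.0 is 12:51); unparseable values become NaT.
issue_time_text = df.issue_time.map('{:04.0f}'.format)
df['issue_time'] = pd.to_datetime(issue_time_text, format='%H%M', errors='coerce')
df.issue_time[0].time()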

In [None]:
# FIX TIME FEATURES
# Create features using issue_date and issue_time
df = df.assign(
    day_of_week=df.issue_date.dt.day_name(),
    issue_year=df.issue_date.dt.year,
    issue_hour=df.issue_time.dt.hour,
    issue_minute=df.issue_time.dt.minute,
)
print(df.issue_hour.dtype)
print(df.issue_minute.dtype)

# Cast new features from float to int dtype.
df.issue_year = df.issue_year.astype(int)
df.issue_hour = df.issue_hour.astype(int)
df.issue_minute = df.issue_minute.astype(int)

In [None]:
# Prepare the data
df_citations = src.prep_sweep_data(df)

# Show the first two rows
df_citations.head(2)

In [None]:
# Check the feature data types and non-null counts.
df_citations.info()

# Exploration: REDO ALL VISUALS
- Use plotly
- Move code to `explore.py`

## How much daily revenue is generated from street sweeper citations?
### Daily Revenue from Street Sweeper Citations
The number of street sweeping citations increased in October 2020.

In [None]:
# Daily street sweeping citation revenue
daily_revenue = df_citations.groupby('issue_date').fine_amount.sum()
daily_revenue.index = pd.to_datetime(daily_revenue.index)

In [None]:
sns.set_context('talk')

# Plot daily revenue from street sweeping citations
daily_revenue.plot(figsize=(14, 7), label='Revenue', color='DodgerBlue')
plt.axhline(daily_revenue.mean(), color='black', label='Average Revenue')

plt.title("Daily Revenue from Street Sweeping Citations")
plt.xlabel('')
plt.ylabel("Revenue (in thousands)")

plt.xticks(rotation=0, horizontalalignment='center', fontsize=13)
plt.yticks(range(0, 1_000_000, 200_000), ['$0', '$200', '$400', '$600', '$800',])
plt.ylim(0, 1_000_000)

plt.legend(loc=2, framealpha=.8);

### Anomaly: Declaration of Local Emergency

In [None]:
sns.set_context('talk')

# Plot daily revenue from street sweeping citations
daily_revenue.plot(figsize=(14, 7), label='Revenue', color='DodgerBlue')
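# Shade the local emergency period; timestamps avoid passing raw strings to a date axis
plt.axvspan(pd.Timestamp('2020-03-16'), pd.Timestamp('2020-10-14'), color='grey', alpha=.25)
plt.text(pd.Timestamp('2020-03-29'), 890_000, 'Declaration of\nLocal Emergency', fontsize=11)
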
plt.title("Daily Revenue from Street Sweeping Citations")
plt.xlabel('')
plt.ylabel("Revenue (in thousands)")

plt.xticks(rotation=0, horizontalalignment='center', fontsize=13)
plt.yticks(range(0, 1_000_000, 200_000), ['$0', '$200', '$400', '$600', '$800',])
plt.ylim(0, 1_000_000)

plt.legend(loc=2, framealpha=.8);

In [None]:
sns.set_context('talk')

# Plot daily revenue from street sweeping citations
daily_revenue.plot(figsize=(14, 7), label='Revenue', color='DodgerBlue')
plt.axhline(daily_revenue.mean(), color='black', label='Average Revenue')
plt.axvline(datetime(2020, 10, 15), color='red', linestyle="--", label='October 15, 2020')

plt.title("Daily Revenue from Street Sweeping Citations")
plt.xlabel('')
plt.ylabel("Revenue")  # tick labels below already carry the K suffix

plt.xticks(rotation=0, horizontalalignment='center', fontsize=13)
plt.yticks(range(0, 1_000_000, 200_000), ['$0', '$200K', '$400K', '$600K', '$800K',])
plt.ylim(0, 1_000_000)

plt.legend(loc=2, framealpha=.8);

In [None]:
## Add section comparing citation time distributions before and after parking was enforced.

# Results