# Acquire

Let's acquire the parking citations data from our file.
1. Import libraries.
1. Load the dataset.
1. Display the shape and first/last 2 rows.
1. Display general information about the dataset, including the number of unique values in each column.
1. Display the number of missing values in each column.
1. Display descriptive statistics for all numeric features.
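
The inspection steps in the list above can be sketched on a small stand-in DataFrame. The `sample` frame and its column names are hypothetical and only illustrate the calls; the real data comes from `acquire.get_sweep_data()`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the citations data (values are made up).
sample = pd.DataFrame({
    "Ticket number": [1001, 1002, 1003, 1004],
    "Fine amount": [73.0, 73.0, np.nan, 93.0],
    "Agency": [54, 54, 51, 56],
    "Make": ["TOYT", "HOND", None, "TOYT"],
})

# Shape and the first/last 2 rows.
shape = sample.shape
first_last = pd.concat([sample.head(2), sample.tail(2)])

# General information, with the number of unique values per column.
overview = pd.DataFrame({
    "dtype": sample.dtypes.astype(str),
    "non_null": sample.count(),
    "n_unique": sample.nunique(),
})

# Missing values per column.
missing = sample.isna().sum()

# Descriptive statistics for the numeric columns only.
numeric_stats = sample.describe()

print(shape)
print(overview)
print(missing)
print(numeric_stats)
```

`nunique()` ignores missing values by default, so a column's unique count and missing count are reported separately.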

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sys
import time
import folium

# Selenium tools for scraping YouTube comments.
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Insert path to the source folder.
sys.path.insert(1, 'src/')
import acquire
import prepare

# Filter warnings
from warnings import filterwarnings
filterwarnings('ignore')

## Street Sweeper Citations
Data: [Los Angeles Parking Citations](https://www.kaggle.com/cityofLA/los-angeles-parking-citations)<br>
Load the dataset and filter for:
- Citations issued from 2017-01-01 to 2020-12-22.
- Street sweeping violations, identified by the `Violation Description` column.
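
A minimal sketch of the two filters above, on made-up rows. The column names follow the Kaggle dataset, but the description string matched here (`STREET CLEAN`) is an assumption, and this is not necessarily how `acquire.get_sweep_data()` does it.

```python
import pandas as pd

# Hypothetical rows for illustration; not real citations.
raw = pd.DataFrame({
    "Issue Date": ["2016-12-31", "2017-01-01", "2019-06-15",
                   "2020-12-22", "2020-12-23"],
    "Violation Description": ["NO PARK/STREET CLEAN", "NO PARK/STREET CLEAN",
                              "METER EXP.", "NO PARK/STREET CLEAN",
                              "NO PARK/STREET CLEAN"],
})
raw["Issue Date"] = pd.to_datetime(raw["Issue Date"])

# Date window, inclusive on both ends.
start, end = pd.Timestamp("2017-01-01"), pd.Timestamp("2020-12-22")
in_range = raw["Issue Date"].between(start, end)

# Assumed marker for street sweeping violations.
is_sweeping = raw["Violation Description"].str.contains("STREET CLEAN", na=False)

sweep = raw.loc[in_range & is_sweeping].reset_index(drop=True)
print(sweep)
```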

In [None]:
df = acquire.get_sweep_data()

In [None]:
df = prepare.prep_sweep_data(df)

In [None]:
df.head(2)

In [None]:
df.tail(2)

In [None]:
df.describe()

In [None]:
df.hist(figsize=(16, 8))
plt.tight_layout();

__Initial findings__
- `Issue time` and `Marked Time` look roughly normally distributed. Note: compare against a Poisson distribution.
- Interestingly, the time-of-day distribution of this human activity is approximately normal.
- Agencies with codes of 50 and above write the most parking citations.
- Most fine amounts are less than $100.00.
- A few license plates are null or invalid entries.

## Social Media, YouTube, and News Articles

- https://www.latimes.com/california/story/2020-03-16/los-angeles-parking-ticket-street-sweeping-coronavirus-covid19
- https://www.latimes.com/california/story/2020-10-15/street-sweeping-parking-enforcement-resumes-today
- https://abc7.com/society/las-resumed-parking-enforcement-prompts-outcry/7079278/
- https://labss.maps.arcgis.com/apps/webappviewer/index.html?id=51d5f486a30a4c9f9397f31cdaa3ae17
- https://www.youtube.com/watch?v=UcVexGcW27k
- https://www.youtube.com/watch?v=arNAJ4DgGMk
- https://streetsla.lacity.org/

In [None]:
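# Collect the text of every matched element on the page.
data = []

with Chrome() as driver:
    wait = WebDriverWait(driver, 15)
    driver.get("https://www.youtube.com/watch?v=kuhhT_cBtFU&t=2s")

    # Press END repeatedly so the page lazy-loads more comments.
    # 200 scrolls with a 15-second pause each takes about 50 minutes.
    for _ in range(200):
        wait.until(EC.visibility_of_element_located((By.TAG_NAME, "body"))).send_keys(Keys.END)
        time.sleep(15)

    # "#content" may match page elements other than comments as well.
    for comment in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#content"))):
        data.append(comment.text)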

In [None]:
data[0]

# Prepare

- Standardize each column name: lowercase, with spaces replaced by underscores.
- Cast `Plate Expiry Date` to datetime data type.
- Cast `Issue Date` and `Issue Time` to datetime data types.
- Drop columns missing >=74.42\% of their values. 
- Drop missing values.
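
The preparation steps above can be sketched as follows. This is not the code in `prepare.prep_sweep_data`. The sample values, the `HHMM` and `YYYYMM` numeric formats, and the choice to cast after dropping missing rows are all assumptions made for the sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical raw rows; the formats shown are assumptions.
raw = pd.DataFrame({
    "Issue Date": ["2020-10-15", "2020-10-16", "2020-10-17", "2020-10-18"],
    "Issue time": [804.0, 1230.0, 45.0, np.nan],            # assumed HHMM
    "Plate Expiry Date": [202101.0, 202012.0, 202106.0, 202103.0],  # assumed YYYYMM
    "Marked Time": [np.nan, np.nan, np.nan, 800.0],
})

clean = raw.copy()

# Lowercase the column names and replace spaces with underscores.
clean.columns = (clean.columns.str.strip()
                              .str.lower()
                              .str.replace(" ", "_", regex=False))

# Drop columns missing at least 74.42% of their values.
missing_share = clean.isna().mean()
clean = clean.loc[:, missing_share < 0.7442]

# Drop rows with any remaining missing value.
clean = clean.dropna().reset_index(drop=True)

# Cast the date and time columns (done last so no NaN handling is needed).
clean["issue_date"] = pd.to_datetime(clean["issue_date"])
clean["issue_time"] = pd.to_datetime(
    clean["issue_time"].astype(int).astype(str).str.zfill(4), format="%H%M"
).dt.time
clean["plate_expiry_date"] = pd.to_datetime(
    clean["plate_expiry_date"].astype(int).astype(str), format="%Y%m"
)

print(clean)
```

In this sample, `marked_time` is 75% missing and is dropped by the threshold, and the row without an issue time is removed by `dropna()`.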

# Exploratory Data Analysis

```python
from folium import plugins  # MarkerCluster lives in folium.plugins

# Base map centered on downtown Los Angeles, with a scale bar.
m = folium.Map(location=[34.0522, -118.2437],
               min_zoom=8,
               max_bounds=True,
               control_scale=True)

mc = plugins.MarkerCluster()

# Add one marker per citation; the cluster groups nearby markers.
for index, row in df.iterrows():
    mc.add_child(
        folium.Marker(location=[float(row['latitude']), float(row['longitude'])],
                      popup='Cited {} {} at {}'.format(row['day_of_week'],
                                                       row['issue_date'].strftime('%Y-%m-%d'),
                                                       row['issue_time'][:-3]))
    )

m.add_child(mc)

m.save('citations_2020.html')
```

## Social Media
- YouTube
- Twitter

### Hypothesis Test 1

**Initial Query**
Was the number of citations issued in October 2020 significantly greater than in previous Octobers?

**Statistical Test**

$H_0$: There is no significant difference...

$H_a$: There is a significant difference...
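
One possible way to test this is sketched below. The notebook does not name a test, so the one-sided permutation test on mean daily citation counts is a swapped-in choice made here because the query asks whether October 2020 was "greater". The daily counts are synthetic and the helper `permutation_test_greater` is hypothetical.

```python
import numpy as np

def permutation_test_greater(sample_a, sample_b, n_permutations=10_000, seed=0):
    """One-sided permutation test: is mean(sample_a) greater than mean(sample_b)?"""
    rng = np.random.default_rng(seed)
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])

    # Count how often a random relabeling gives a difference at least
    # as large as the observed one.
    at_least_as_large = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if diff >= observed:
            at_least_as_large += 1

    # The add-one correction keeps the p-value strictly above zero.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value

# Synthetic daily citation counts, for illustration only (not real data).
rng = np.random.default_rng(42)
october_2020 = rng.poisson(lam=120, size=31)
previous_octobers = rng.poisson(lam=100, size=93)

observed_diff, p_value = permutation_test_greater(october_2020, previous_octobers)

alpha = 0.05
reject_null = p_value < alpha
print(f"difference in mean daily count: {observed_diff:.2f}, p-value: {p_value:.4f}")
```

With real data, the two samples would be the daily counts for October 2020 and for the earlier Octobers in `df`. A t-test or a rank-based test would be reasonable alternatives.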

### Hypothesis Test 2

**Initial Query**
Question

**Statistical Test**

$H_0$:

$H_a$:

### Hypothesis Test 3

**Initial Query**
Question

**Statistical Test**

$H_0$:

$H_a$:

# Conclusions