# San Diego Downtown Parking Model and Predictor

When creating **`intuitive data visualization`**, a good habit is to start by configuring the plotting toolboxes, `matplotlib` and `seaborn`.


In [1]:
%matplotlib inline
import matplotlib
import seaborn as sns
sns.set(style="darkgrid", color_codes=True)
matplotlib.rcParams['figure.dpi'] = 144


Another good habit is to list the Python libraries used at the top of the notebook.


In [5]:
import os
import glob
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import folium
from folium.plugins import MarkerCluster


This modeling activity spans *three consecutive Jupyter notebooks* that together present the end-to-end flow. In each notebook, the method, the assumptions, and the code can be investigated and refactored further. I walk through the code according to my best understanding so far.

1. `Data Collection and Visualization`
2. `Data Investigation and Wrangle/Clean`
3. `Prediction Model Build and Performance`


This notebook covers data collection and visualization, the first part of the end-to-end flow. It is divided into the following sections:

1. **`Problem Statement and the Proposed Solving Flow`**
2. **`Data Collection`**
3. **`Mapping and Visualization`**


## Problem Statement and the Proposed Solving Flow

## Data Collection

Two types of data are available from the warehouse managed by the San Diego government, the [San Diego Data Warehouse](https://data.sandiego.gov/datasets/):

1. Transaction records of parking meters [Parking Meter Transactions](https://data.sandiego.gov/datasets/parking-meters-transactions/)
2. Geo locations of parking meters [Parking Meter Locations](https://data.sandiego.gov/datasets/parking-meters-locations/)


In [33]:
df_loc = pd.read_csv('../data/parking/treas_parking_meters_loc_datasd.csv')

In [34]:
df_loc.set_index('pole', inplace=True)
df_loc.head()

pole,zone,area,sub_area,config_id,config_name,longitude,latitude
CC-1003,City,Barrio Logan,1000 CESAR CHAVEZ WAY,9000,2 Hour Max $1.25 HR 8am-6pm Mon-Sat,-117.145178,32.700353
CC-1005,City,Barrio Logan,1000 CESAR CHAVEZ WAY,9000,2 Hour Max $1.25 HR 8am-6pm Mon-Sat,-117.145178,32.700352
CC-1011,City,Barrio Logan,1000 CESAR CHAVEZ WAY,9000,2 Hour Max $1.25 HR 8am-6pm Mon-Sat,-117.145349,32.700155
CC-1013,City,Barrio Logan,1000 CESAR CHAVEZ WAY,9000,2 Hour Max $1.25 HR 8am-6pm Mon-Sat,-117.145405,32.700107
CC-1015,City,Barrio Logan,1000 CESAR CHAVEZ WAY,9000,2 Hour Max $1.25 HR 8am-6pm Mon-Sat,-117.145539,32.699987


In [8]:
df_tran_2018 = pd.read_csv('../data/parking/treas_parking_payments_2018_datasd.csv')

In [11]:
df_tran_2018.head()

Unnamed: 0,uuid,meter_type,pole_id,trans_amt,pay_method,trans_start,meter_expire
0,SSJ50718010100571650,SS,J-507,50,CASH,2018-01-01 00:57:16,2018-01-01 00:57:16
1,MS2400E118010101163925,MS,2-400E1,25,CASH,2018-01-01 01:16:39,2018-01-01 01:16:39
2,MS2400E118010101164225,MS,2-400E1,25,CASH,2018-01-01 01:16:42,2018-01-01 01:16:42
3,SSG40018010101274125,SS,G-400,25,CASH,2018-01-01 01:27:41,2018-01-01 01:27:41
4,SSFI4151801010137505,SS,FI-415,5,CASH,2018-01-01 01:37:50,2018-01-01 01:37:50


In [12]:
print(len(df_tran_2018)) 
print(len(df_loc))

9524389
4931



In [13]:
print(df_tran_2018['pole_id'].nunique())

4827


In [42]:
# Keep only the meters that appear in the 2018 transactions and have a known location.
df_loc_2018 = pd.DataFrame(index=df_tran_2018['pole_id'].unique())
df_loc_2018 = df_loc_2018.join(df_loc, how='inner')
print(len(df_loc_2018))
df_loc_2018.head()

4611


Unnamed: 0,zone,area,sub_area,config_id,config_name,longitude,latitude
1-1004,Downtown,Core - Columbia,1000 FIRST AVE,9000,2 Hour Max $1.25 HR 8am-6pm Mon-Sat,-117.163929,32.715904
1-1006,Downtown,Core - Columbia,1000 FIRST AVE,9000,2 Hour Max $1.25 HR 8am-6pm Mon-Sat,-117.16393,32.716037
1-1008,Downtown,Core - Columbia,1000 FIRST AVE,9000,2 Hour Max $1.25 HR 8am-6pm Mon-Sat,-117.163931,32.716169
1-1020,Downtown,Core - Columbia,1000 FIRST AVE,9115,15 Min Max $1.25 HR 8am-6pm Mon-Sat,-117.161278,32.71789
1-1310,Downtown,Core - Columbia,1300 FIRST AVE,12466,2 Hour Max $1.25 HR 8am-4pm Mon-Fri 8am-6pm Sat,-117.163951,32.719024
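
The inner join keeps only pole IDs present in both tables, which is why the 4827 unique pole IDs in the transactions shrink to 4611 located meters. A minimal sketch, on toy data rather than the real files, of how the unmatched IDs could be listed with `merge` and `indicator=True`; the pole IDs and coordinates below are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the real tables (IDs and coordinates are illustrative only).
tran = pd.DataFrame({'pole_id': ['A-1', 'A-1', 'B-2', 'C-3']})
loc = pd.DataFrame(
    {'latitude': [32.71, 32.72], 'longitude': [-117.16, -117.15]},
    index=pd.Index(['A-1', 'B-2'], name='pole'),
)

# One row per pole ID seen in the transactions.
poles = pd.DataFrame({'pole_id': tran['pole_id'].unique()})

# A left merge with indicator=True flags IDs that have no location record.
matched = poles.merge(loc, left_on='pole_id', right_index=True,
                      how='left', indicator=True)
missing = matched.loc[matched['_merge'] == 'left_only', 'pole_id'].tolist()

print(missing)
```

Applied to the real frames, the same pattern would list the pole IDs that have transactions but no entry in the locations file.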


In [49]:
# Drop transactions whose pole has no location record.
df_tran_2018 = df_tran_2018.join(pd.DataFrame(index=df_loc_2018.index), how='inner', on='pole_id')
df_tran_2018.head()

Unnamed: 0,uuid,meter_type,pole_id,trans_amt,pay_method,trans_start,meter_expire
0,SSJ50718010100571650,SS,J-507,50,CASH,2018-01-01 00:57:16,2018-01-01 00:57:16
4131,SSJ507180102092048250,SS,J-507,250,CREDIT CARD,2018-01-02 09:20:48,2018-01-02 12:00:00
13141,SSJ50718010212104925,SS,J-507,25,CASH,2018-01-02 12:10:49,2018-01-02 12:22:49
17371,SSJ50718010213181765,SS,J-507,65,CASH,2018-01-02 13:18:17,2018-01-02 13:49:29
19344,SSJ507180102135049250,SS,J-507,250,CREDIT CARD,2018-01-02 13:50:49,2018-01-02 15:50:49


## Mapping and Visualization


In this section, we use [Folium Map](https://python-visualization.github.io/folium/) to place interactive markers on a map. In addition, the `MarkerCluster` plugin groups nearby markers and shows the number of markers in each regional cluster on the map.


In [63]:
sd_map = folium.Map(location=[32.7174209, -117.1627714], zoom_start=14) 

In [67]:
marker_cluster = MarkerCluster().add_to(sd_map)

In [69]:
# Add one clustered marker per meter; the popup shows the pole ID.
for pole, row in df_loc_2018.iterrows():
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=str(pole),
        icon=folium.Icon(color='green', icon='cloud')
    ).add_to(marker_cluster)

In [70]:
display(sd_map)
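
The map, cluster, and marker loop above could be wrapped in a small helper so that the same code serves both the full map and a single-area map. The sketch below is one possible arrangement, not the notebook's own code: the function names `select_meters` and `build_cluster_map` are hypothetical, and the toy frame only mimics the columns of `df_loc_2018`.

```python
import pandas as pd


def select_meters(df, area=None):
    """Return (pole, lat, lon) tuples, optionally restricted to one area."""
    subset = df if area is None else df[df['area'] == area]
    return [(str(r.Index), float(r.latitude), float(r.longitude))
            for r in subset.itertuples()]


def build_cluster_map(df, area=None, center=(32.7174209, -117.1627714), zoom=14):
    """Build a Folium map with one clustered marker per meter."""
    # Imported here so select_meters stays usable without folium installed.
    import folium
    from folium.plugins import MarkerCluster

    fmap = folium.Map(location=list(center), zoom_start=zoom)
    cluster = MarkerCluster().add_to(fmap)
    for pole, lat, lon in select_meters(df, area):
        folium.Marker(
            location=[lat, lon],
            popup=pole,
            icon=folium.Icon(color='green', icon='cloud'),
        ).add_to(cluster)
    return fmap


# Toy frame with the same columns as df_loc_2018 (values are illustrative).
demo = pd.DataFrame(
    {'area': ['Gaslamp', 'Marina'],
     'latitude': [32.711, 32.709],
     'longitude': [-117.160, -117.168]},
    index=['X-1', 'Y-2'],
)
print(select_meters(demo, area='Marina'))
```

In the notebook, `display(build_cluster_map(df_loc_2018))` would then render all meters, and passing `area=...` would restrict the markers to one area.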


### Find Unique Parking Areas


In [72]:
df_loc_2018['area'].unique()

array(['Core - Columbia', 'Cortez Hill', 'Bankers Hill', 'Marina',
       'Gaslamp', 'East Village', 'Hillcrest', 'Golden Hill',
       'North Park', 'Point Loma', 'Mission Hills', 'Barrio Logan',
       'Little Italy', 'University Heights', 'Talmadge', 'Midtown',
       'College', 'Five Points', 'Mission Beach'], dtype=object)

In [79]:
df_loc_2018.groupby(by='area').size().sort_values(ascending=False)

area
East Village          1039
Bankers Hill           660
Hillcrest              631
Core - Columbia        513
Little Italy           360
Gaslamp                352
Cortez Hill            301
Marina                 245
North Park             116
Mission Hills          107
University Heights      99
Five Points             75
Barrio Logan            34
Talmadge                32
Midtown                 16
Golden Hill             14
Point Loma               8
College                  5
Mission Beach            4
dtype: int64
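
In the counts above, the four largest areas (East Village, Bankers Hill, Hillcrest, Core - Columbia) hold 2843 of the 4611 meters. As a small sketch on toy data, `value_counts` yields the same descending counts as the `groupby(...).size().sort_values(ascending=False)` call, and `normalize=True` turns them into shares; the frame below is a placeholder, not the real location table.

```python
import pandas as pd

# Toy location table: three meters in one area, one in another.
toy_loc = pd.DataFrame({'area': ['East Village'] * 3 + ['Gaslamp']})

counts = toy_loc['area'].value_counts()                # meters per area, descending
shares = toy_loc['area'].value_counts(normalize=True)  # fraction of all meters

print(counts.to_dict())
print(shares.to_dict())
```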

### Mapping a Specific Parking Area

In [91]:
sd_map = folium.Map(location=[32.7174209, -117.1627714], zoom_start=14) 

In [92]:
marker_cluster = MarkerCluster().add_to(sd_map)

In [93]:
# Same marker loop, restricted to the meters in University Heights.
for pole, row in df_loc_2018[df_loc_2018['area'] == 'University Heights'].iterrows():
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=str(pole),
        icon=folium.Icon(color='green', icon='cloud')
    ).add_to(marker_cluster)

In [94]:
display(sd_map)

In [97]:
# Note: despite the "_sdsu" suffix, this subset holds the University Heights meters.
df_loc_2018_sdsu = df_loc_2018[df_loc_2018['area'] == 'University Heights']
df_loc_2018_sdsu.head()

Unnamed: 0,zone,area,sub_area,config_id,config_name,longitude,latitude
EL-1819,Mid-City,University Heights,1800 EL CAJON BLVD,9000,2 Hour Max $1.25 HR 8am-6pm Mon-Sat,-117.145512,32.755176
EL-1821,Mid-City,University Heights,1800 EL CAJON BLVD,9000,2 Hour Max $1.25 HR 8am-6pm Mon-Sat,-117.145268,32.755176
EL-2002,Mid-City,University Heights,2000 EL CAJON BLVD,9000,2 Hour Max $1.25 HR 8am-6pm Mon-Sat,-117.14356,32.755338
EL-2006,Mid-City,University Heights,2000 EL CAJON BLVD,9000,2 Hour Max $1.25 HR 8am-6pm Mon-Sat,-117.143242,32.75534
EL-2008,Mid-City,University Heights,2000 EL CAJON BLVD,9000,2 Hour Max $1.25 HR 8am-6pm Mon-Sat,-117.14306,32.755341


In [98]:
# Attach the location columns to each transaction; the inner join drops poles outside the subset.
df_tran_2018_sdsu = df_tran_2018.join(df_loc_2018_sdsu, on='pole_id', how='inner')

In [99]:
df_tran_2018_sdsu.head()

Unnamed: 0,uuid,meter_type,pole_id,trans_amt,pay_method,trans_start,meter_expire,zone,area,sub_area,config_id,config_name,longitude,latitude
58,SSMO173718010109313025,SS,MO-1737,25,CASH,2018-01-01 09:31:30,2018-01-01 09:31:30,Uptown,University Heights,1700 MONROE AVE,8999,1 Hour Max $1.25 HR 8am-6pm Mon-Sat,-117.146457,32.759051
1935,SSMO173718010208071225,SS,MO-1737,25,CASH,2018-01-02 08:07:12,2018-01-02 08:19:12,Uptown,University Heights,1700 MONROE AVE,8999,1 Hour Max $1.25 HR 8am-6pm Mon-Sat,-117.146457,32.759051
3123,SSMO173718010208494115,SS,MO-1737,15,CASH,2018-01-02 08:49:41,2018-01-02 08:56:53,Uptown,University Heights,1700 MONROE AVE,8999,1 Hour Max $1.25 HR 8am-6pm Mon-Sat,-117.146457,32.759051
23929,SSMO1737180102151209125,SS,MO-1737,125,CREDIT CARD,2018-01-02 15:12:09,2018-01-02 16:12:09,Uptown,University Heights,1700 MONROE AVE,8999,1 Hour Max $1.25 HR 8am-6pm Mon-Sat,-117.146457,32.759051
27866,SSMO1737180102162711125,SS,MO-1737,125,CREDIT CARD,2018-01-02 16:27:11,2018-01-02 17:27:11,Uptown,University Heights,1700 MONROE AVE,8999,1 Hour Max $1.25 HR 8am-6pm Mon-Sat,-117.146457,32.759051
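
In the sample rows above, `trans_start` and `meter_expire` are still plain strings, and `trans_amt` appears to be in cents (125 buys one hour at the $1.25/hour rate). A hedged sketch, using three of the rows shown, of how the paid duration could be derived once the columns are parsed; the column name `paid_minutes` is introduced here for illustration only.

```python
import pandas as pd

# Three of the University Heights sample rows shown above.
rows = pd.DataFrame({
    'trans_amt': [25, 15, 125],
    'trans_start': ['2018-01-02 08:07:12', '2018-01-02 08:49:41',
                    '2018-01-02 15:12:09'],
    'meter_expire': ['2018-01-02 08:19:12', '2018-01-02 08:56:53',
                     '2018-01-02 16:12:09'],
})

# Parse the timestamp strings into datetime64 columns.
for col in ('trans_start', 'meter_expire'):
    rows[col] = pd.to_datetime(rows[col])

# Paid time in minutes (hypothetical helper column).
rows['paid_minutes'] = (
    (rows['meter_expire'] - rows['trans_start']).dt.total_seconds() / 60
)

print(rows[['trans_amt', 'paid_minutes']])
```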


### Save the dataframes for the next step

In [103]:
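# Both calls assume that the ./data_saved directory already exists.
# The transactions file is gzip-compressed: pass compression='gzip' to pd.read_pickle when loading it.
df_tran_2018.to_pickle('./data_saved/2018_transactions.pkl', compression='gzip')
df_loc_2018.to_pickle('./data_saved/2018_parking_locations.pkl')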
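
Because the transactions file is written with gzip compression under a plain `.pkl` name, pandas cannot infer the compression from the extension when reading it back. A minimal round-trip sketch, assuming a temporary directory in place of `./data_saved` and a one-row toy frame in place of the real data:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'pole_id': ['J-507'], 'trans_amt': [50]})

with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, 'data_saved')
    os.makedirs(out_dir, exist_ok=True)  # to_pickle does not create folders
    path = os.path.join(out_dir, '2018_transactions.pkl')

    df.to_pickle(path, compression='gzip')
    # '.pkl' does not imply gzip, so the compression is passed explicitly.
    restored = pd.read_pickle(path, compression='gzip')

print(restored.equals(df))
```

The next notebook would need the same `compression='gzip'` argument when loading the transactions, while the uncompressed locations file can be read with the default settings.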