# CityIQ Project SafeStreet

<br> ORGANIZATION: IGNITE | SCALE | SAN DIEGO 2019 Smart Cities Hackathon and Innovation Program

<br> PURPOSE: The goal of the CityIQ Project SafeStreet is to analyze CityIQ sensor data in order to determine what features predict traffic-related deaths on San Diego County roadways.

<br> <b>UPDATE: The City of San Diego has ended its contract with the CityIQ data provider ( Ubicquia, Inc. ), and data is no longer accessible through the API. This notebook therefore exists as a CityIQ data analysis proof-of-concept.</b>

<br> CityIQ API Documentation Sources:
<br> https://docs.cityiq.io/Default.htm
<br> https://github.com/CityIQ
<br> https://github.com/Ubicquia/CityIQ
<br> https://github.com/opensandiego/sd-smart-streetlight-tools

# Imports:

In [None]:
import numpy as np
import pandas as pd
import os

# CityIQ API requests.
import requests

# Epoch calculations.
import time
import datetime

# Visualizations imports.
import matplotlib.pyplot as plt
import folium
from folium.plugins import HeatMap

# Function Definitions:

# HTTP GET Request
Meta-/data acquisition is performed by GET requests to the CityIQ REST APIs.

In [None]:
def get_request(url, headers, params):
    # Send the GET request and decode the JSON body of the response.
    response = requests.get(url, headers=headers, params=params)
    return response.json()

# Retrieve Data by assetUid
The purpose of this function is to perform GET requests for each asset from a list of assetUids and return the contents as a list.

In [None]:
def data_by_assetUid(auid, h, p):
    arr = []
    for i in auid:
        uri = "https://sandiego.cityiq.io/api/v2/event/assets/" + i + "/events"
        get = get_request(uri, h, p)
        if get['metaData']['totalRecords'] > 0:
            arr.append(get['content'])
    return(arr)
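
If a request fails ( for example, because the access token has expired ), the response may be a JSON error body without a `metaData` key, in which case the lookup above would raise a `KeyError`. The following is a minimal sketch of a more defensive variant. The `fetch` argument is a hypothetical stand-in for `get_request`, injected so the loop can be exercised without network access, and the shape of the error payload is an assumption.

```python
def data_by_asset_uid_safe(asset_uids, headers, params, fetch,
                           base_url="https://sandiego.cityiq.io/api/v2/event/assets/"):
    """Collect the 'content' list of every asset that returned at least one record."""
    pages = []
    for uid in asset_uids:
        response = fetch(base_url + uid + "/events", headers, params)
        # .get() tolerates responses that lack 'metaData' (e.g. an error body).
        total = response.get("metaData", {}).get("totalRecords", 0)
        if total > 0:
            pages.append(response.get("content", []))
    return pages


# Hypothetical canned responses standing in for the CityIQ Event Service.
def fake_fetch(url, headers, params):
    if "asset-a" in url:
        return {"metaData": {"totalRecords": 1}, "content": [{"assetUid": "asset-a"}]}
    if "asset-b" in url:
        return {"metaData": {"totalRecords": 0}, "content": []}
    return {"error": "unauthorized"}


pages = data_by_asset_uid_safe(["asset-a", "asset-b", "asset-c"], {}, {}, fake_fetch)
print(pages)
```

Only the first asset contributes a page here: the second has no records and the third returns an error body.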

# Flatten List
The purpose of this function is to flatten the list that is generated by the data_by_assetUid function.

In [None]:
def flatten(arr):
    list_o_dics = []
    for l in arr:
        for dic in l:
            list_o_dics.append(dic)
    return(list_o_dics)
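
The same one-level flattening can be written with the standard library. This is a small sketch of an equivalent helper ( `flatten_chain` is a name introduced here for illustration ), using `itertools.chain.from_iterable`:

```python
from itertools import chain


def flatten_chain(pages):
    """Flatten one level of nesting: a list of lists becomes a single list."""
    return list(chain.from_iterable(pages))


# Toy input shaped like the output of data_by_assetUid: one inner list per asset.
nested = [[{"id": 1}], [{"id": 2}, {"id": 3}]]
flat = flatten_chain(nested)
print(flat)
```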

# Retrieve Data by Date

The purpose of the following functions is to create a list of dates from which to retrieve asset data. Specify how many days back you want to go ( from yesterday's date ).

In [None]:
def millis(dt):
    epoch = datetime.datetime(1970, 1, 1)  # Unix epoch as a naive datetime ( utcfromtimestamp is deprecated in Python 3.12 ).
    return(int((dt - epoch).total_seconds() * 1000.0))

def get_date_list(days_back):    
    base = datetime.datetime.today() - datetime.timedelta(1)
    dates = [(base - datetime.timedelta(days=x)).strftime('%m/%d/%Y') for x in range(0, days_back)]
    epoch_dates = [(millis(datetime.datetime.strptime(x + ' 00:00:00','%m/%d/%Y %H:%M:%S')),millis(datetime.datetime.strptime(x + ' 23:59:59','%m/%d/%Y %H:%M:%S'))) for x in dates]
    return(epoch_dates)
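
Because `get_date_list` starts from the current date, its output changes from day to day. The sketch below shows the same idea with an explicit reference day, so the result is reproducible. `to_millis` and `day_windows` are hypothetical names, and naive datetimes are read as UTC, as in `millis` above.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)


def to_millis(dt):
    """Milliseconds between the Unix epoch and a naive datetime read as UTC."""
    return int((dt - EPOCH).total_seconds() * 1000)


def day_windows(last_day, days_back):
    """Return (start_ms, end_ms) pairs for `days_back` days, newest first."""
    windows = []
    for offset in range(days_back):
        day = last_day - datetime.timedelta(days=offset)
        start = datetime.datetime.combine(day, datetime.time(0, 0, 0))
        end = datetime.datetime.combine(day, datetime.time(23, 59, 59))
        windows.append((to_millis(start), to_millis(end)))
    return windows


# Two daily windows ending on 2019-05-01.
windows = day_windows(datetime.date(2019, 5, 1), 2)
print(windows)
```

Each tuple can be passed as the `startTime` and `endTime` parameters of an event request.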

# Data Acquisition:

# Additional Datasets

# locations.csv
This dataset contains extracts of the assets and locations from the San Diego CityIQ system. Refer to the CityIQ developer documentation for details about these data records. The data are extracted using the cityiq Python package. See the ExtractAssets.ipynb notebook for the extract process.

In [None]:
# Source: https://data.sandiegodata.org/dataset/sandiego-gov-cityiq_objects
# Define path to data.
loc_sdrdl_fp = "./locations.csv"

# pd_collisions_datasd.csv
This dataset contains traffic collision reports within the City of San Diego. Generally, a report is not taken for property damage-only collisions that do not involve hit & run or DUI. The California Highway Patrol is responsible for handling collisions occurring on the freeway.

In [None]:
# Define path to data.
collisions_fp = "./pd_collisions_datasd.csv"

# pedestrians.csv
This dataset contains counts of pedestrians in walkways in San Diego, from August 2018 through March 2019, aggregated to 15-minute intervals. Use the CityIQ Assets and Locations dataset for the geographic positions of the walkways.

The original data also has values for counts in each direction of the walkway, and the speed. Unfortunately, the geographic data for the walkways -- lines -- are usually wrong, with many walkways being incorrectly long and poorly positioned. The result is that many of the speed values are much too high, so for this dataset the speed and direction values are dropped.

In [None]:
# Source: https://data.sandiegodata.org/dataset/cityiq-pedestrians
# Define path to data.
ped_sdrdl_fp = "./pedestrians.csv"

# CityIQ API

# URLs
The data acquisition from the CityIQ REST APIs is performed by GET requests to the following URLs.

In [None]:
urls = { 'uaa': "https://auth.aa.cityiq.io/oauth/token", # CityIQ UAA.
         'meta': "https://sandiego.cityiq.io/api/v2/metadata/assets/search", # CityIQ Metadata Service.
         'event_by_bbox': "https://sandiego.cityiq.io/api/v2/event/locations/events", # CityIQ Event Service ( retrieve by bbox ).
         'event_by_auid': "https://sandiego.cityiq.io/api/v2/event/assets/" # CityIQ Event Service ( retrieve by assetUid ).
         # ,'event_by_luid': "https://sandiego.cityiq.io/api/v2/event/locations/" # CityIQ Event Service ( retrieve by locationUid ).
       }

# Postman-Tokens
The GET request header requires a parameter called "Postman-Token", which varies by type of asset/event request. We do not have access to the Metrology or Media zones, so we cannot access their data ( the corresponding tokens are commented out ).

In [None]:
ptokens = { 'uaa': "8c5854aa-9b88-482f-978a-77916bfd9792", # CityIQ UAA.
            'meta': "f51ab022-3288-4fbe-a7d2-5781fe449378", # CityIQ Metadata Service.
            'pkin_auid': "1284f57d-1a27-4b7d-947b-a53ece2a01b8", # CityIQ Parking Event Service ( retrieve by assetId ).
            'pkin_bbox': "01d2538e-beeb-4498-964f-6f97a9538d5c", # CityIQ Parking Event Service ( retrieve by bbox ).
            'pkin_luid': "a79e2341-0b9e-4dc1-8d10-29f8e28e3c99", # CityIQ Parking Event Service ( retrieve by locationId ).
            'pkout_auid': "687c56b4-b1fe-47a2-8312-58f92e4276aa", # CityIQ Parking Event Service ( retrieve by assetId ).
            'pkout_bbox': "114157a1-7f40-495b-a1f3-716c9505d8fb", # CityIQ Parking Event Service ( retrieve by bbox ).
            'pkout_luid': "cd48f6dc-e034-41a2-9aa2-fbbad13e6bfc", # CityIQ Parking Event Service ( retrieve by locationId ).
            'pedevt_auid': "833733ae-b5b6-4168-a9a9-610f5b0fe6cb", # CityIQ Pedestrian Event Service ( retrieve by assetId ).
            'pedevt_bbox': "f51ab022-3288-4fbe-a7d2-5781fe449378", # CityIQ Pedestrian Event Service ( retrieve by bbox ).
            'pedevt_luid': "22b8e9c1-be36-4aef-9eb8-4a37f32f8679", # CityIQ Pedestrian Event Service ( retrieve by locationId ).
            'tfevt_auid': "26685807-3fbb-43ce-b200-e0f5ebb53073", # CityIQ Traffic Event Service ( retrieve by assetId ).
            'tfevt_bbox': "4ed0a27c-c91e-4baa-93b1-3a7434c25e1b", # CityIQ Traffic Event Service ( retrieve by bbox ).
            'tfevt_luid': "8f62aa35-2223-4351-a462-9b1a9f2a5913", # CityIQ Traffic Event Service ( retrieve by locationId ).
            'humidity': "a7299bd9-5a78-422a-ba3d-f6bfb142b74a", # CityIQ Environmental Event Service ( retrieve by assetId ).
            'orientation': "308e24be-78fc-40c8-9eb7-a37dab020494", # CityIQ Environmental Event Service ( retrieve by assetId ).
            'pressure': "f9b81deb-cd11-47ce-be04-13a1dc858f63", # CityIQ Environmental Event Service ( retrieve by assetId ).
            'temperature': "82bd2ae9-c991-4a80-abd2-7e54abcbcd79", # CityIQ Environmental Event Service ( retrieve by assetId ).
            # ,'energy_alert': "a9f2834c-8cdf-4ee0-a73c-6aa3614db52f",  # CityIQ Metrology Event Service ( retrieve by assetId ).
            # ,'energy_timeseries': "f71bae62-2f6c-48dd-b3fd-4f6b58f6823b", # CityIQ Metrology Event Service ( retrieve by assetId).
            # ,'metrology': "8e876aef-25f9-46e9-bfb3-96ff628023bd" # CityIQ Metrology Event Service ( retrieve by assetUid).
          }

# Predix Zone IDs
The GET request header requires Predix zone IDs, which define separate data streams ( ENVIRONMENTAL, ENERGY-METERING, MEDIA, PARKING, PEDESTRIAN, TRAFFIC ). The zone IDs indicate the different sub-APIs that we have access to, which allow us to pull their corresponding event type data. See the access token 'scope' value for which data streams we have access to.

In [None]:
zone_ids = { 'environmental': "SD-IE-ENVIRONMENTAL",
             'parking': "SD-IE-PARKING",
             'pedestrian': "SD-IE-PEDESTRIAN",
             'traffic': "SD-IE-TRAFFIC"
             # ,'audio': "SD-ID-AUDIO",
             # ,'image': "SD-IE-IMAGE",
             # ,'metrology': "SD-IE-METROLOGY",
             # ,'video': "SD-IE-VIDEO"
           }

# Boundary Boxes
Meta-/data can be filtered using the boundary box parameter, which is defined by northwest and southeast latitude and longitude coordinates, and represents a geographic area. For example, this parameter could be used to collect asset metadata ( e.g. assetUids ) for assets along specific Vision Zero corridors ( may require sequential queries to obtain entire corridor ).



In [None]:
bboxes = { 'el_cajon_blvd': "33.077762:-117.663817,32.559574:-116.584410",
           'downtown_san_diego': "32.718987:-117.174244,32.707356:-117.154850"
           # ,'4000_4100_university_ave': "32.749516:-117.104581,32.749862:-117.109569",
           # ,'university_ave': "32.747555:-117.073596,32.749721:-117.173804",
           # ,'petco_park': "32.709496:-117.159329,32.705434:-117.154362"
         }
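
To make the string format concrete, here is a minimal sketch that parses a boundary box of the form `"lat:lon,lat:lon"` ( northwest corner first, southeast corner second ) and checks whether a point falls inside it. `parse_bbox` and `bbox_contains` are hypothetical helpers, not part of the CityIQ API.

```python
def parse_bbox(bbox):
    """Split 'lat:lon,lat:lon' into ((nw_lat, nw_lon), (se_lat, se_lon))."""
    northwest, southeast = bbox.split(",")
    nw_lat, nw_lon = (float(v) for v in northwest.split(":"))
    se_lat, se_lon = (float(v) for v in southeast.split(":"))
    return (nw_lat, nw_lon), (se_lat, se_lon)


def bbox_contains(bbox, lat, lon):
    """True when the point lies between the northwest and southeast corners."""
    (nw_lat, nw_lon), (se_lat, se_lon) = parse_bbox(bbox)
    return se_lat <= lat <= nw_lat and nw_lon <= lon <= se_lon


downtown = "32.718987:-117.174244,32.707356:-117.154850"
inside = bbox_contains(downtown, 32.715, -117.160)
outside = bbox_contains(downtown, 32.750, -117.100)
print(inside, outside)
```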

# Timestamps
Event data can be pulled for a specific time interval, bounded by start and end timestamps given in Unix time ( milliseconds since the Unix epoch ).

In [None]:
# Define the interval with timestamps in the following format: YEAR-MONTH-DAY HOUR:MINUTE:SECOND.
START = "2019-05-1 00:00:00" 
END = "2019-05-1 23:59:59"

# Convert into tuples for time module.
start_tuple = time.strptime(START, '%Y-%m-%d %H:%M:%S')
end_tuple = time.strptime(END, '%Y-%m-%d %H:%M:%S')

# Define timestamps in milliseconds ( to nearest second ).
STARTTS = int(1000*time.mktime(start_tuple))
ENDTS = int(1000*time.mktime(end_tuple))
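
Note that `time.mktime` interprets the tuple in the local time zone of the machine running the notebook, so the resulting timestamps can differ between machines. As a sketch of a zone-independent alternative, the conversion below attaches an explicit UTC offset. The fixed offset of UTC-7 ( Pacific Daylight Time, which applies to San Diego on this date ) and the helper name `to_epoch_millis` are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the interval is meant in Pacific Daylight Time (UTC-7).
PDT = timezone(timedelta(hours=-7))


def to_epoch_millis(stamp, tz):
    """Parse 'YYYY-MM-DD HH:MM:SS' in the given time zone into epoch milliseconds."""
    local = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return int(local.timestamp() * 1000)


start_ms = to_epoch_millis("2019-05-1 00:00:00", PDT)
end_ms = to_epoch_millis("2019-05-1 23:59:59", PDT)
print(start_ms, end_ms)
```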

# Asset Types
Types of assets ( sensors ) available. Each asset can only be of a single type.

In [None]:
assets = { 'camera': "CAMERA", 
           'environmental': "ENV_SENSOR", 
           'em': "EM_SENSOR",
           'microphone': "MIC", 
           'node': 'NODE'
         }

# Event Types
Types of events ( collections of data from sensors ) available. Each asset may have multiple event types ( e.g. CAMERA asset that collects PEDEVT and TFEVT events ).

In [None]:
events = { 'humidity': "HUMIDITY",
           'orientation': "ORIENTATION",
           'pedestrian': "PEDEVT",
           'parking_in': "PKIN",
           'parking_out': "PKOUT",
           'pressure': "PRESSURE",
           'temperature': "TEMPERATURE",
           'traffic': "TFEVT"
         }
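
Since each event type is served by exactly one zone, the pairing can be kept in a lookup table rather than chosen by hand for every request. The sketch below is one way to do that. The environmental, pedestrian, and traffic pairings follow the request headers used in this notebook; the parking pairing is an assumption, and `zone_for_event` is a hypothetical helper.

```python
# Zone IDs as defined in this notebook.
zone_ids = {"environmental": "SD-IE-ENVIRONMENTAL", "parking": "SD-IE-PARKING",
            "pedestrian": "SD-IE-PEDESTRIAN", "traffic": "SD-IE-TRAFFIC"}

# Which zone serves which event type (PKIN/PKOUT -> parking is assumed).
EVENT_ZONE = {"HUMIDITY": "environmental", "ORIENTATION": "environmental",
              "PRESSURE": "environmental", "TEMPERATURE": "environmental",
              "PKIN": "parking", "PKOUT": "parking",
              "PEDEVT": "pedestrian", "TFEVT": "traffic"}


def zone_for_event(event_type):
    """Return the Predix zone ID to send with requests for `event_type`."""
    return zone_ids[EVENT_ZONE[event_type]]


print(zone_for_event("PRESSURE"))
print(zone_for_event("TFEVT"))
```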

# Location Types
Types of software-defined boundaries that the camera assets use to detect events.

In [None]:
locations = { 'parking': "PARKING_ZONE",
              'pedestrian': "WALKWAY",
              'traffic': "TRAFFIC_LANE"
            }

# CityIQ Access Token

Let's first grab the access token, so that we can make the API calls.

<b> Unfortunately, due to the unavailability of the public client id and client secret credentials, the Postman-Token is no longer valid, and thus the API is no longer accessible through this notebook ( please see the UPDATE section at the head of this notebook for further information ). </b>

In [None]:
# Define header dictionary.
uaa_headers = { 'Authorization': "Basic UHVibGljQWNjZXNzOnVWZWVNdWl1ZTRrPQ==",
                'cache-control': "no-cache",
                'Postman-Token': ptokens['uaa']
              }

# Define parameter dictionary.    
uaa_params = {"grant_type":"client_credentials"}    

# Return access token and other information as dictionary.
atoken = get_request(urls['uaa'], uaa_headers, uaa_params)
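
The `Authorization` value above follows the standard HTTP Basic scheme: the string `client_id:client_secret`, base64-encoded and prefixed with `Basic `. The sketch below shows how such a value is assembled and taken apart again. The credentials are placeholders for illustration only, and both helper names are hypothetical.

```python
import base64


def basic_auth_header(client_id, client_secret):
    """Build an HTTP Basic Authorization header value."""
    raw = "{}:{}".format(client_id, client_secret).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def split_basic_auth(header_value):
    """Recover (client_id, client_secret) from a Basic header value."""
    encoded = header_value.split(" ", 1)[1]
    client_id, client_secret = base64.b64decode(encoded).decode("utf-8").split(":", 1)
    return client_id, client_secret


# Placeholder credentials, not real ones.
header = basic_auth_header("example-client", "example-secret")
print(header)
print(split_basic_auth(header))
```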

# CityIQ Asset Metadata

Next let's grab the asset metadata.

# All Metadata

In [None]:
# Define header dictionary.
meta_headers = { 'Authorization': atoken['token_type'] + " " + atoken['access_token'],
                 'Predix-Zone-Id': zone_ids['traffic'], # Any zone ID will work for metadata.
                 'cache-control': "no-cache",
                 'Postman-Token': ptokens['meta']
               }

# Define parameter dictionary.
ATYPE = "assetType:" + assets['camera'] # Filter response by asset type.
Q = ATYPE # Query using a filter ( assetType, eventTypes, mediaType ).
PAGE = 0 # Indicates page number ( default = 0 ).
SIZE = 30000 # Maximum number of records to return per page ( default = 20 ).
meta_params = { #"bbox": bboxes['downtown_san_diego'], # Comment line to return all assets, irrespective of coordinates.
                #"q": Q, # Comment line to return all asset types.
                "page": PAGE,
                "size": SIZE,
              }

# Return asset metadata and other information as dictionary.
all_meta_dict = get_request(urls['meta'], meta_headers, meta_params)

# Only Pedestrian Event Asset Metadata

In [None]:
# Define header dictionary.
meta_headers = { 'Authorization': atoken['token_type'] + " " + atoken['access_token'],
                 'Predix-Zone-Id': zone_ids['pedestrian'], # Any zone ID will work for metadata.
                 'cache-control': "no-cache",
                 'Postman-Token': ptokens['meta']
               }

# Define parameter dictionary.
ETYPE = "eventTypes:" + events['pedestrian'] # Filter response by event type.
Q = ETYPE # Query using a filter ( assetType, eventTypes, mediaType ).
PAGE = 0 # Indicates page number ( default = 0 ).
SIZE = 30000 # Maximum number of records to return per page ( default = 20 ).
meta_params = { #"bbox": bboxes['downtown_san_diego'], # Comment line to return all assets, irrespective of coordinates.
                "q": Q, # Comment line to return all asset types.
                "page": PAGE,
                "size": SIZE,
              }

# Return asset metadata and other information as dictionary.
ped_meta_dict = get_request(urls['meta'], meta_headers, meta_params)

# Only Traffic Event Asset Metadata

In [None]:
# Define header dictionary.
meta_headers = { 'Authorization': atoken['token_type'] + " " + atoken['access_token'],
                 'Predix-Zone-Id': zone_ids['traffic'], # Any zone ID will work for metadata.
                 'cache-control': "no-cache",
                 'Postman-Token': ptokens['meta']
               }

# Define parameter dictionary.
ETYPE = "eventTypes:" + events['traffic'] # Filter response by event type.
Q = ETYPE # Query using a filter ( assetType, eventTypes, mediaType ).
PAGE = 0 # Indicates page number ( default = 0 ).
SIZE = 30000 # Maximum number of records to return per page ( default = 20 ).
meta_params = { #"bbox": bboxes['downtown_san_diego'], # Comment line to return all assets, irrespective of coordinates.
                "q": Q, # Comment line to return all asset types.
                "page": PAGE,
                "size": SIZE,
              }

# Return asset metadata and other information as dictionary.
traf_meta_dict = get_request(urls['meta'], meta_headers, meta_params)

# Only Humidity Event Asset Metadata

In [None]:
# Define header dictionary.
meta_headers = { 'Authorization': atoken['token_type'] + " " + atoken['access_token'],
                 'Predix-Zone-Id': zone_ids['environmental'], # Any zone ID will work for metadata.
                 'cache-control': "no-cache",
                 'Postman-Token': ptokens['meta']
               }

# Define parameter dictionary.
ETYPE = "eventTypes:" + events['humidity'] # Filter response by event type.
Q = ETYPE # Query using a filter ( assetType, eventTypes, mediaType ).
PAGE = 0 # Indicates page number ( default = 0 ).
SIZE = 30000 # Maximum number of records to return per page ( default = 20 ).
meta_params = { #"bbox": bboxes['downtown_san_diego'], # Comment line to return all assets, irrespective of coordinates.
                "q": Q, # Comment line to return all asset types.
                "page": PAGE,
                "size": SIZE,
              }

# Return asset metadata and other information as dictionary.
hum_meta_dict = get_request(urls['meta'], meta_headers, meta_params)

# Only Orientation Event Asset Metadata

In [None]:
# Define header dictionary.
meta_headers = { 'Authorization': atoken['token_type'] + " " + atoken['access_token'],
                 'Predix-Zone-Id': zone_ids['environmental'], # Any zone ID will work for metadata.
                 'cache-control': "no-cache",
                 'Postman-Token': ptokens['meta']
               }

# Define parameter dictionary.
ETYPE = "eventTypes:" + events['orientation'] # Filter response by event type.
Q = ETYPE # Query using a filter ( assetType, eventTypes, mediaType ).
PAGE = 0 # Indicates page number ( default = 0 ).
SIZE = 30000 # Maximum number of records to return per page ( default = 20 ).
meta_params = { #"bbox": bboxes['downtown_san_diego'], # Comment line to return all assets, irrespective of coordinates.
                "q": Q, # Comment line to return all asset types.
                "page": PAGE,
                "size": SIZE,
              }

# Return asset metadata and other information as dictionary.
ori_meta_dict = get_request(urls['meta'], meta_headers, meta_params)

# Only Pressure Event Asset Metadata

In [None]:
# Define header dictionary.
meta_headers = { 'Authorization': atoken['token_type'] + " " + atoken['access_token'],
                 'Predix-Zone-Id': zone_ids['environmental'], # Any zone ID will work for metadata.
                 'cache-control': "no-cache",
                 'Postman-Token': ptokens['meta']
               }

# Define parameter dictionary.
ETYPE = "eventTypes:" + events['pressure'] # Filter response by event type.
Q = ETYPE # Query using a filter ( assetType, eventTypes, mediaType ).
PAGE = 0 # Indicates page number ( default = 0 ).
SIZE = 30000 # Maximum number of records to return per page ( default = 20 ).
meta_params = { #"bbox": bboxes['downtown_san_diego'], # Comment line to return all assets, irrespective of coordinates.
                "q": Q, # Comment line to return all asset types.
                "page": PAGE,
                "size": SIZE,
              }

# Return asset metadata and other information as dictionary.
pres_meta_dict = get_request(urls['meta'], meta_headers, meta_params)

# Only Temperature Event Asset Metadata

In [None]:
# Define header dictionary.
meta_headers = { 'Authorization': atoken['token_type'] + " " + atoken['access_token'],
                 'Predix-Zone-Id': zone_ids['environmental'], # Any zone ID will work for metadata.
                 'cache-control': "no-cache",
                 'Postman-Token': ptokens['meta']
               }

# Define parameter dictionary.
ETYPE = "eventTypes:" + events['temperature'] # Filter response by event type.
Q = ETYPE # Query using a filter ( assetType, eventTypes, mediaType ).
PAGE = 0 # Indicates page number ( default = 0 ).
SIZE = 30000 # Maximum number of records to return per page ( default = 20 ).
meta_params = { #"bbox": bboxes['downtown_san_diego'], # Comment line to return all assets, irrespective of coordinates.
                "q": Q, # Comment line to return all asset types.
                "page": PAGE,
                "size": SIZE,
              }

# Return asset metadata and other information as dictionary.
temp_meta_dict = get_request(urls['meta'], meta_headers, meta_params)
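
The six event-type cells above differ only in the zone ID and the `q` filter, so they could be collapsed into a single loop. The following is a sketch under stated assumptions: `fetch` is a hypothetical stand-in for `get_request` ( injected here so the example runs without network access ), and the token values are placeholders.

```python
def fetch_metadata_by_event(event_zone_pairs, token, postman_token, fetch, url, size=30000):
    """Query the metadata service once per event type and key the results by event type."""
    results = {}
    for event_type, zone_id in event_zone_pairs:
        headers = {"Authorization": "Bearer " + token,
                   "Predix-Zone-Id": zone_id,
                   "cache-control": "no-cache",
                   "Postman-Token": postman_token}
        params = {"q": "eventTypes:" + event_type, "page": 0, "size": size}
        results[event_type] = fetch(url, headers, params)
    return results


# Hypothetical fetch that echoes the filter instead of calling the API.
def fake_fetch(url, headers, params):
    return {"content": [], "query": params["q"], "zone": headers["Predix-Zone-Id"]}


pairs = [("PEDEVT", "SD-IE-PEDESTRIAN"), ("TFEVT", "SD-IE-TRAFFIC"),
         ("HUMIDITY", "SD-IE-ENVIRONMENTAL")]
meta = fetch_metadata_by_event(
    pairs, "placeholder-token", "placeholder-postman-token", fake_fetch,
    "https://sandiego.cityiq.io/api/v2/metadata/assets/search")
print(sorted(meta))
```

Keeping the results in one dictionary keyed by event type also removes the need for a separate variable per event type.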

# Generate Event Type Heat Maps
Generate heat maps showing the density of nodes with assets actively recording the specified event type. This will enable a visualization of the geographic distribution of the nodes.

In [None]:
# Convert the all_meta_dict dictionary into a DataFrame.
df = pd.DataFrame(all_meta_dict['content'])

# The number of unique non-null parentAssetUids should match the number of nodes ( 3013 ); nodes themselves have a parentAssetUid of None and are filtered out here.
display(len(df[~df.parentAssetUid.isna()].parentAssetUid.unique()))

# Verify that every sensor belongs to one of the nodes.
display((len(df[df.assetType == "NODE"].assetUid.unique()) == len(df[df.assetType == "NODE"].assetUid))) # Verifies that each node has a unique assetUid ( equivalent to parentAssetUid for sensors ).
display(df[df.assetType != "NODE"].parentAssetUid.isna().unique()) # Verifies that none of the sensors lack a parentAssetUid.
display(df[~df.parentAssetUid.isna()].parentAssetUid.isin(df[df.assetType == "NODE"].assetUid.unique()).unique()) # Verifies that each sensor's parentAssetUid corresponds to an existing node.

# Generate a dictionary containing all of the event types each node measures.
evts = { auid: list(sorted(set(flatten(df[(df.parentAssetUid == auid) & (~df.eventTypes.isna())].eventTypes)))) \
         for auid in df[df.assetType == "NODE"].assetUid.unique() 
       }

# Generate a dictionary containing the coordinates of each node.
coors = { auid: str(df[df.assetUid == auid].coordinates.values) \
         for auid in df[df.assetType == "NODE"].assetUid.unique() 
        }

# Convert dictionaries into DataFrames.
evts_df = pd.DataFrame({'node_assetUid': list(evts.keys()), 'eventTypes': list(evts.values())})
coors_df = pd.DataFrame({'node_assetUid': list(coors.keys()), 'coordinates': list(coors.values())})

# Verify that node_assetUids match up by index.
display(np.all((evts_df.node_assetUid.values == coors_df.node_assetUid.values) == True))

# Create a DataFrame containing the coordinates and eventTypes for each node.
coordinates = coors_df.coordinates.str.strip("[]''").str.split(':')
all_df = pd.DataFrame()
all_df["latitude"] = [float(coordinates[i][0]) for i in coordinates.index]
all_df["longitude"] = [float(coordinates[i][1]) for i in coordinates.index]
all_df["eventTypes"] = list(evts_df.eventTypes.values)

# One-hot-encoding of eventTypes. ENERGY_TIMESERIES ends up as two separate Series, depending on its order in the original lists.
one_df = all_df.eventTypes.apply(', '.join).str.get_dummies(sep=", ")
one_df = pd.concat([all_df, one_df], axis=1)

# Define mean latitude and longitude to start map at.
lat_mean = one_df.describe().at['mean','latitude']
long_mean = one_df.describe().at['mean','longitude']
m = folium.Map([lat_mean, long_mean],
                      zoom_start=10)

# Apply the heat map to the Folium map, for the given event type ( e.g. PRESSURE ).
HeatMap(one_df[['latitude', 'longitude', 'PRESSURE']].values, min_opacity =0.4).add_to(m)
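
The one-hot-encoding step is the least obvious part of the cell above, so here is a small, self-contained sketch of it on toy data. The coordinates and event lists are made up for illustration; only the pandas calls mirror the cell.

```python
import pandas as pd

# Toy stand-in for the per-node eventTypes lists (values are illustrative).
nodes = pd.DataFrame({
    "latitude": [32.71, 32.72, 32.73],
    "longitude": [-117.16, -117.15, -117.14],
    "eventTypes": [["HUMIDITY", "PRESSURE"], ["PEDEVT"], []],
})

# Join each list into one string, then split it back into indicator columns.
one_hot = nodes.eventTypes.apply(", ".join).str.get_dummies(sep=", ")
encoded = pd.concat([nodes, one_hot], axis=1)

# Rows in the shape passed to HeatMap: [latitude, longitude, weight].
heat_rows = encoded[["latitude", "longitude", "PRESSURE"]].values.tolist()
print(one_hot)
print(heat_rows)
```

Each indicator column holds 1 where a node records that event type and 0 otherwise, which is what lets a single column serve as the heat map weight.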

# CityIQ Asset Event Data

Next, let's grab the event data. We retrieve it from the REST API; a WebSocket may be used in the future to obtain near-real-time data. Each event type is requested separately, because each requires its own zone ID, eventType, and locationType parameters ( different event types query different sub-APIs ).

As a test case, we will collect all of the event data for each asset on a single node, which is specified below ( this node has assets that collect pedestrian, traffic, and environmental events ). We will accomplish this by requesting event data by the assetUid of each asset. To grab these assetUids, we will first convert the metadata dictionaries to pandas DataFrames.

In [None]:
# Define parentAssetUid to extract events from a single node.
pauid = '08cbccff-cdd0-404f-af62-0346a4480d5c'

In [None]:
# Convert dictionaries to DataFrames.
all_meta_df = pd.DataFrame(all_meta_dict['content'])
ped_meta_df = pd.DataFrame(ped_meta_dict['content'])
traf_meta_df = pd.DataFrame(traf_meta_dict['content'])
hum_meta_df = pd.DataFrame(hum_meta_dict['content'])
ori_meta_df = pd.DataFrame(ori_meta_dict['content'])
pres_meta_df = pd.DataFrame(pres_meta_dict['content'])
temp_meta_df = pd.DataFrame(temp_meta_dict['content'])

# Pedestrian Events

In [None]:
# Define header dictionary.
pedevt_headers = { 'Authorization': atoken['token_type'] + " " + atoken['access_token'],
                   'Predix-Zone-Id': zone_ids['pedestrian'],
                   'cache-control': "no-cache",
                   'Postman-Token': ptokens['pedevt_auid']
                 }

# Define parameter dictionary.
SIZE = 100 # Maximum number of records to return per page ( default = 20 ).
pedevt_params = { "eventType": events['pedestrian'],
                  #"bbox": bboxes['downtown_san_diego'],
                  #"locationType": "WALKWAY",#locations['pedestrian'],
                  "startTime": STARTTS,
                  "endTime": ENDTS,
                  "pageSize": SIZE
                }

# Return event data by assetUid.
pedevt_arr = data_by_assetUid(ped_meta_df[ped_meta_df['parentAssetUid'] == pauid].assetUid.values, pedevt_headers, pedevt_params)
pedevt_arr = flatten(pedevt_arr) # Extract the inner list of dictionaries from the list.

# Traffic Events

In [None]:
# Define header dictionary.
tfevt_headers = { 'Authorization': atoken['token_type'] + " " + atoken['access_token'],
                  'Predix-Zone-Id': zone_ids['traffic'],
                  'cache-control': "no-cache",
                  'Postman-Token': ptokens['tfevt_auid']
                }

# Define parameter dictionary.
SIZE = 3000 # Maximum number of records to return per page ( default = 20 ).
tfevt_params = { "eventType": events['traffic'],
                 #"bbox": bboxes['downtown_san_diego'],
                 #"locationType": locations['traffic'],
                 "startTime": STARTTS,
                 "endTime": ENDTS,
                 "pageSize": SIZE
               }

# Return event data by assetUid.
tfevt_arr = data_by_assetUid(traf_meta_df[traf_meta_df['parentAssetUid'] == pauid].assetUid.values, tfevt_headers, tfevt_params)
tfevt_arr = flatten(tfevt_arr) # Extract the inner list of dictionaries from the list.

# Humidity Events

In [None]:
# Define header dictionary.
hum_headers = { 'Authorization': atoken['token_type'] + " " + atoken['access_token'],
                'Predix-Zone-Id': zone_ids['environmental'],
                'cache-control': "no-cache",
                'Postman-Token': ptokens['humidity']
              }

# Define parameter dictionary.
SIZE = 1000 # Maximum number of records to return per page ( default = 20 ).
hum_params = { "eventType": events['humidity'],
               #"bbox": bboxes['downtown_san_diego'],
               #"locationType": locations['traffic'],
               "startTime": STARTTS,
               "endTime": ENDTS,
               "pageSize": SIZE
             }

# Return event data by assetUid.
hum_arr = data_by_assetUid(hum_meta_df[hum_meta_df['parentAssetUid'] == pauid].assetUid.values, hum_headers, hum_params)
hum_arr = flatten(hum_arr) # Extract the inner list of dictionaries from the list.

# Orientation Events

In [None]:
# Define header dictionary.
ori_headers = { 'Authorization': atoken['token_type'] + " " + atoken['access_token'],
                'Predix-Zone-Id': zone_ids['environmental'],
                'cache-control': "no-cache",
                'Postman-Token': ptokens['orientation']
              }

# Define parameter dictionary.
SIZE = 1000 # Maximum number of records to return per page ( default = 20 ).
ori_params = { "eventType": events['orientation'],
               #"bbox": bboxes['downtown_san_diego'],
               #"locationType": locations['traffic'],
               "startTime": STARTTS,
               "endTime": ENDTS,
               "pageSize": SIZE
             }

# Return event data by assetUid.
ori_arr = data_by_assetUid(ori_meta_df[ori_meta_df['parentAssetUid'] == pauid].assetUid.values, ori_headers, ori_params)
ori_arr = flatten(ori_arr)  # Extract the inner list of dictionaries from the list.

# Pressure Events

In [None]:
# Define header dictionary.
pres_headers = { 'Authorization': atoken['token_type'] + " " + atoken['access_token'],
                 'Predix-Zone-Id': zone_ids['environmental'],
                 'cache-control': "no-cache",
                 'Postman-Token': ptokens['pressure']
               }

# Define parameter dictionary.
SIZE = 1000 # Maximum number of records to return per page ( default = 20 ).
pres_params = { "eventType": events['pressure'],
                #"bbox": bboxes['downtown_san_diego'],
                #"locationType": locations['traffic'],
                "startTime": STARTTS,
                "endTime": ENDTS,
                "pageSize": SIZE
              }

# Return event data by assetUid.
pres_arr = data_by_assetUid(pres_meta_df[pres_meta_df['parentAssetUid'] == pauid].assetUid.values, pres_headers, pres_params)
pres_arr = flatten(pres_arr) # Extract the inner list of dictionaries from the list.

# Temperature Events

In [None]:
# Define header dictionary.
temp_headers = { 'Authorization': atoken['token_type'] + " " + atoken['access_token'],
                 'Predix-Zone-Id': zone_ids['environmental'],
                 'cache-control': "no-cache",
                 'Postman-Token': ptokens['temperature']
               }

# Define parameter dictionary.
SIZE = 1000 # Maximum number of records to return per page ( default = 20 ).
temp_params = { "eventType": events['temperature'],
                #"bbox": bboxes['downtown_san_diego'],
                #"locationType": locations['traffic'],
                "startTime": STARTTS,
                "endTime": ENDTS,
                "pageSize": SIZE
               }

# Return event data by assetUid.
temp_arr = data_by_assetUid(temp_meta_df[temp_meta_df['parentAssetUid'] == pauid].assetUid.values, temp_headers, temp_params)
temp_arr = flatten(temp_arr) # Extract the inner list of dictionaries from the list.

# CityIQ Asset Media Data
We currently do not have access to the Metrology or Media zones, so we cannot access their data.

In [None]:
# Define CityIQ Media Service URL.
# media_url = ""

# MEDIA = "" # Media type ().
# MTYPE = "mediaType:" + MEDIA # Filter response by media type. CAMERA is the only sensor that will generate mediaType.

# Data Analysis:
The goal of Vision Zero is to reduce the count of traffic-related deaths to zero. Traffic-related deaths occur when there are collisions with certain properties between vehicles, between vehicles and pedestrians, or between vehicles and bicyclists, and are a subset of the possible results of a collision ( which also include non-/injurious collisions ). Collisions occur when objects intersect in time and space. We want to learn what properties lead to the collisions that result in traffic-related deaths. 

# Asset Parser

We need to parse each asset to extract all the data we wish to collect. Below is a list of the columns of interest:

* assetUid
* eventType
* measures ( dictionary-type; need to extract values from keys )
* properties ( dictionary-type; need to extract values from keys )
* timestamp

In [None]:
# Convert lists to DataFrames.
pedevts_df = pd.DataFrame(pedevt_arr)
tfevts_df = pd.DataFrame(tfevt_arr)
humevts_df = pd.DataFrame(hum_arr)
orievts_df = pd.DataFrame(ori_arr)
presevts_df = pd.DataFrame(pres_arr)
tempevts_df = pd.DataFrame(temp_arr)

In [None]:
# Extract event measures and properties from dictionaries, and append to DataFrames.
pedevts_measures_df = pd.DataFrame(pedevts_df.measures.tolist())
pedevts_properties_df = pd.DataFrame(pedevts_df.properties.tolist())
pedevts_df.drop(['measures','properties'],axis=1,inplace=True)
pedevts = pd.concat([pedevts_df, pedevts_measures_df, pedevts_properties_df],axis=1)

tfevts_measures_df = pd.DataFrame(tfevts_df.measures.tolist())
tfevts_properties_df = pd.DataFrame(tfevts_df.properties.tolist())
tfevts_df.drop(['measures','properties'],axis=1,inplace=True)
tfevts = pd.concat([tfevts_df, tfevts_measures_df, tfevts_properties_df],axis=1)

humevts_measures_df = pd.DataFrame(humevts_df.measures.tolist())
humevts_properties_df = pd.DataFrame(humevts_df.properties.tolist())
humevts_df.drop(['measures','properties'],axis=1,inplace=True)
humevts = pd.concat([humevts_df, humevts_measures_df, humevts_properties_df],axis=1)

orievts_measures_df = pd.DataFrame(orievts_df.measures.tolist())
orievts_properties_df = pd.DataFrame(orievts_df.properties.tolist())
orievts_df.drop(['measures','properties'],axis=1,inplace=True)
orievts = pd.concat([orievts_df, orievts_measures_df, orievts_properties_df],axis=1)

presevts_measures_df = pd.DataFrame(presevts_df.measures.tolist())
presevts_properties_df = pd.DataFrame(presevts_df.properties.tolist())
presevts_df.drop(['measures','properties'],axis=1,inplace=True)
presevts = pd.concat([presevts_df, presevts_measures_df, presevts_properties_df],axis=1)

tempevts_measures_df = pd.DataFrame(tempevts_df.measures.tolist())
tempevts_properties_df = pd.DataFrame(tempevts_df.properties.tolist())
tempevts_df.drop(['measures','properties'],axis=1,inplace=True)
tempevts = pd.concat([tempevts_df, tempevts_measures_df, tempevts_properties_df],axis=1)

# Display DataFrames.
display(pedevts.sample(3))
display(tfevts.sample(3))
display(humevts.sample(3))
display(orievts.sample(3))
display(presevts.sample(3))
display(tempevts.sample(3))
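
The six extraction blocks above repeat the same steps, so they could be replaced by one helper. The sketch below uses `pd.json_normalize`, a different technique from the `tolist` and `concat` approach in the cell: it prefixes nested fields with their parent key ( for example `measures.count` ) instead of lifting them to bare column names. The sample records and their field names inside `measures` and `properties` are hypothetical.

```python
import pandas as pd


def expand_events(records):
    """Flatten nested 'measures' and 'properties' dicts into top-level columns."""
    # json_normalize names nested fields '<parent>.<child>', e.g. 'measures.count'.
    return pd.json_normalize(records)


# Hypothetical records; the field names inside measures/properties are made up.
sample = [
    {"assetUid": "a1", "eventType": "PEDEVT", "timestamp": 1556694000000,
     "measures": {"count": 2}, "properties": {"direction": "N"}},
    {"assetUid": "a1", "eventType": "PEDEVT", "timestamp": 1556694060000,
     "measures": {"count": 0}, "properties": {"direction": "S"}},
]
events_df = expand_events(sample)
print(events_df.columns.tolist())
```

The prefixed names also avoid collisions when a key appears in both `measures` and `properties`.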

In [None]:
# Print dimensions of event DataFrames (rows, columns).
print("Pedestrian events: ", pedevts.shape)
print("Traffic events: ", tfevts.shape)
print("Humidity events: ", humevts.shape)
print("Orientation events: ", orievts.shape)
print("Pressure events: ", presevts.shape)
print("Temperature events: ", tempevts.shape)

It looks like for 5-1-2019, 89 pedestrian events, 2,144 traffic events, and 49 events of each environmental event type were captured by the assets on this node.

Let's rename the columns, so that we can append everything together.

In [None]:
# Rename event features. 
pedevts.columns = ["pedevt_" + cols for cols in list(pedevts)]
tfevts.columns = ["tfevt_" + cols for cols in list(tfevts)]
humevts.columns = ["humevt_" + cols for cols in list(humevts)]
orievts.columns = ["orievt_" + cols for cols in list(orievts)]
presevts.columns = ["presevt_" + cols for cols in list(presevts)]
tempevts.columns = ["tempevt_" + cols for cols in list(tempevts)]

In [None]:
# Append all event DataFrames together ( tempevts data determined to be outside the scope of this project ).
all_data_df = pd.concat([pedevts, tfevts, humevts, orievts, presevts], sort=False)  # DataFrame.append was removed in pandas 2.0.

# Write to CSV

In [None]:
filepath = './' 
# Example filename ( format: parentAssetUid_DATE.csv ): 08cbccff-cdd0-404f-af62-0346a4480d5c_2019-05-1.csv
all_data_df.to_csv(os.path.join(filepath,r'08cbccff-cdd0-404f-af62-0346a4480d5c_2019-05-1.csv'),index=False)
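
Rather than hard-coding the filename, it can be derived from the node and date already defined in the notebook. A minimal sketch, where `csv_path` is a hypothetical helper and the arguments mirror the `pauid` and `START` values used above:

```python
import os


def csv_path(directory, parent_asset_uid, start):
    """Build '<directory>/<parentAssetUid>_<DATE>.csv' from the query settings."""
    date_part = start.split(" ")[0]  # keep the date, drop the time of day
    return os.path.join(directory, "{}_{}.csv".format(parent_asset_uid, date_part))


path = csv_path("./", "08cbccff-cdd0-404f-af62-0346a4480d5c", "2019-05-1 00:00:00")
print(path)
```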