# Improving the Quality of Urban Life

### Authors: Mingwei (Yuka) Xu, Shirley Wang

# Introduction

## Background

Sidewalk Labs Toronto aims to improve urban life.  To that end, it uses Numina sensors to observe how people interact with their environment from day to day.  This notebook provides tools for analyzing those interactions, in the hope of giving further insight into which changes would make life better for people who pass through the area.
    
## Data

The data is collected from 3 sensors located around Sidewalk Labs: Streetscape (located at the exhibition area inside the building), Under Raincoat (located to the side of the building under a large tarp), and Outside (located in front of the building with some tables and a walkway).  The Numina sensors let us request either the number of pedestrians counted in an area during a time interval, or a heatmap of the area during a time interval.  For privacy reasons, the sensors do not store any information about the people they see, so a person who walks out of frame and back in during the same interval is likely to be counted more than once.  As a result, at times of high movement, such as events at Sidewalk Labs, many people are probably counted more than once, and the counts for an area can occasionally be implausibly large.
    
We only have data for specific time ranges for each sensor:
- Streetscape: Feb 20 2019 - Jan 12 2020
- Outside: Mar 20 2019 - Jan 12 2020
- Under Raincoat: Mar 20 2019 - Dec 6 2019
    
To use this notebook, please store your login credentials for Numina's API in a separate Python file called `login.py`.  The file `login.py` should contain:

```
login = "yourname@mail.utoronto.ca"
pwd = "yourpassword"
```

These values should correspond to your own Numina account.  Run the four cells below to load all required imports and to get access to Numina's data.

    
## Privacy Policy

Numina's own privacy policy is "Intelligence without Surveillance", although it is debatable how well it upholds that.  We created this notebook with that policy in mind: we tried to provide useful information about how pedestrians interact with the area around Sidewalk Labs while including nothing that could be used to directly identify a person, to protect the privacy of everyone caught on camera.  We always query the data instead of storing it in an external file.  The heatmaps do not display the number of pedestrians counted, so information about a specific person can't be inferred from them.  The only place where actual counts appear is part 3, which shows the number of pedestrians observed per hour; one hour is a long enough time interval that it would be very difficult to trace a single person's movements within it.


In [1]:
import requests
import json
import glob
import datetime
from time import sleep
from calendar import monthrange
import math
import pandas as pd
import numpy as np
from IPython.display import display

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import matplotlib.cm as cm
import matplotlib.widgets as mpl_widgets
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

import ipywidgets as widgets
from ipywidgets import Layout, interact, interactive, HBox, VBox

from statsmodels.tsa.holtwinters import ExponentialSmoothing

# ignore the warnings
import warnings
warnings.filterwarnings('ignore')

In [2]:
# store login data in login.py
%run login.py

In [3]:
loginquery = f"""
mutation {{
  logIn(
      email:\"{login}\",
      password:\"{pwd}\") {{
    jwt {{
      token
      exp
    }}
  }}
}}
"""

In [4]:
url = 'https://api.numina.co/graphql'
mylogin = requests.post(url, json={'query': loginquery})

# parse the response once; a failed login may come back without usable "data"
response = mylogin.json()
if response.get("data"):
    jwt = response['data']['logIn']['jwt']
    token = jwt['token']
    expdate = jwt['exp']
    print("Success.  This session is active until", expdate)
else:
    print("Failure to Access Numina.  Please check your login credentials.")

Success.  This session is active until 2020-03-23T23:25:37.608580


# How Does Dwell Time Change?

One important thing to consider when planning a public space is where people tend to dwell and which paths they tend to take.  Both follow naturally from the layout of the area a person moves through.  The widget below lets the user choose a day or a month and view heatmaps for that time range at a chosen time interval, to see how the way people interact with the space varies over that range.

The selections for time intervals are:
- For 1 Day: 15 Minutes, 1 Hour, or 6 Hours
- For 1 Month: 6 Hours, or 1 Day

### Tabs:
1. Heatmap Over Time: Shows heatmaps for that time range, in time interval increments.  Gives an idea of where people tend to be and what paths they took through that time range. 
2. Most Common Spots: Takes the sum of all heatmaps in Heatmap Over Time and shows the areas with higher heat, to give an idea of spots people tended to be in throughout that entire time range.
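
The idea behind the Most Common Spots tab can be sketched in plain Python.  This is a simplified illustration, not the plotting code itself: it assumes each heatmap is a list of `[x, y, value]` triples, sums the values per point, and keeps only points above the lowest quarter of the observed range.  The function name `most_common_spots` and the toy data are hypothetical.

```python
from collections import defaultdict

def most_common_spots(heatmaps):
    """Sum [x, y, value] triples across heatmaps and keep the hotter points."""
    totals = defaultdict(float)
    for heatmap in heatmaps:
        for x, y, value in heatmap:
            totals[(x, y)] += value
    if not totals:
        return {}
    low, high = min(totals.values()), max(totals.values())
    # keep points above the lowest quarter of the observed range
    bound = low + (high - low) / 4
    return {point: total for point, total in totals.items() if total > bound}

# two toy heatmaps with made-up values; real ones cover a 640 x 480 frame
morning = [[10, 20, 0.2], [30, 40, 0.9]]
evening = [[10, 20, 0.1], [30, 40, 0.8], [50, 60, 0.5]]
spots = most_common_spots([morning, evening])
print(spots)
```

Only the point at `(30, 40)` survives here, because the other two sums fall below the cutoff.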

### Notes: 
- The number of people in each heatmap is not shown for privacy purposes.
- Every time a selection is made, we re-query Numina for the data.  This means that loading is slower (especially if you choose day by 15 minutes or month by 6 hours), but this way we avoid security and memory issues that come with storing the data.  The loading bar will turn green once everything is loaded and finished rendering.
- Transitions for playing can be slow when showing days in a month, due to the high volume of data involved in making the heatmap.
- The movements of the points during the transitions are not indicative of any actual movements of people; they're just there to make the transitions look cool.
- We only have information for each area within a specific range, specified above in the **Data** section. Choosing any date outside this range will result in nothing being shown on the maps.
- Plotly's FigureWidget does not currently support animations, so we use a basic Output widget to capture the animated heatmap plot.  This means that if you switch tabs to the other plot and then reselect data to be redrawn, the animated plot may end up horribly smushed when you switch back to it.  To avoid this problem, always have the Heatmap Over Time tab selected when you reselect data.
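
Since dates outside the available range produce empty maps, it can help to check a date before selecting it.  The sketch below copies the ranges from the **Data** section; the names `AVAILABLE` and `has_data` are hypothetical, and treating both endpoints as inclusive is our assumption.

```python
import datetime

# date ranges copied from the Data section; endpoints assumed inclusive
AVAILABLE = {
    "Streetscape": (datetime.date(2019, 2, 20), datetime.date(2020, 1, 12)),
    "Outside": (datetime.date(2019, 3, 20), datetime.date(2020, 1, 12)),
    "Under Raincoat": (datetime.date(2019, 3, 20), datetime.date(2019, 12, 6)),
}

def has_data(area, day):
    """Return True if `day` falls inside the range recorded for `area`."""
    first, last = AVAILABLE[area]
    return first <= day <= last

print(has_data("Streetscape", datetime.date(2019, 12, 25)))
print(has_data("Under Raincoat", datetime.date(2019, 12, 25)))
```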

In [5]:
# useful global stuff
locations = {"Streetscape": ["SWLSANDBOX1", "streetscape.png"], 
             "Under Raincoat": ["SWLSANDBOX2", "under_raincoat.png"], 
             "Outside": ["SWLSANDBOX3", "outside.png"]}

minutes15 = datetime.timedelta(minutes=15)
onehour = datetime.timedelta(hours=1)
sixhours = datetime.timedelta(hours=6)
oneday = datetime.timedelta(days=1)

In [6]:
# widgets

error_messages = widgets.Output()

# day selections
day_picker = widgets.DatePicker(
    description='Date:',
    disabled=False
)

day_timerange_dd = widgets.Dropdown(options = ["15 Minutes", "1 Hour", "6 Hours"], 
                                    description="By:")

area_dd = widgets.Dropdown(options=list(locations.keys()), description="Area:")

day_select = widgets.Button(description='Select Day', disabled=False, button_style='')

day_selections = widgets.VBox([day_picker, day_timerange_dd, area_dd, day_select, error_messages])

# month selections
months_choices = {}
for i in range(1,13):
    months_choices[datetime.date(2019, i, 1).strftime('%B')] = i

month_dd = widgets.Dropdown(options = list(months_choices.keys()), description="Month:")
year_dd = widgets.Dropdown(options = [2019, 2020], description="Year:")
month_timerange_dd = widgets.Dropdown(options = ["6 Hours", "1 Day"], description="By:")

month_select = widgets.Button(description='Select Month', disabled=False, button_style='')

month_selections = widgets.VBox([month_dd, year_dd, month_timerange_dd, area_dd, month_select, error_messages])

# loading bar
loading1 = widgets.IntProgress(value=0, min=0, max=10, step=1, description='Loading:', 
                               bar_style='info', orientation='horizontal')

# output widgets
heatmap_by_time = widgets.Output() #go.FigureWidget() 
heatmap_popular_areas = go.FigureWidget()

# selections in folding accordion tabs
selections1 = widgets.Accordion(children=[day_selections, month_selections])
selections1.set_title(0, 'Show One Day')
selections1.set_title(1, 'Show One Month')

# output graph tabs
tab1 = widgets.Tab([heatmap_by_time, heatmap_popular_areas])
tab1.set_title(0, "Heatmap Over Time")
tab1.set_title(1, "Most Common Spots")

In [7]:
# functions

# when Select Day is pressed
def process_date_selection(clicked):
    if day_picker.value:
        startday = datetime.datetime(day_picker.value.year, day_picker.value.month, day_picker.value.day)
        
        if day_timerange_dd.value == "15 Minutes":
            timegap = minutes15
            num_iters = 24 * 4
        elif day_timerange_dd.value == "1 Hour":
            timegap = onehour
            num_iters = 24 
        else:
            timegap = sixhours
            num_iters = 4
            
        location = locations[area_dd.value][0]
        img = locations[area_dd.value][1]
        
        loading1.max = num_iters + 5
        loading1.value = 0
        loading1.bar_style = "info"
        
        title = " Day " + startday.strftime(format="%B %d, %Y")
        display_heatmaps(startday, timegap, location, num_iters, img, title)
        
    else:
        error_messages.clear_output()
        with error_messages:
            print("Please Select A Day")
            
            
# when Select Month is pressed
def process_month_selection(clicked):
    year = year_dd.value
    month = months_choices[month_dd.value]
    startday = datetime.datetime(year=year, month=month, day=1)

    if month_timerange_dd.value == "6 Hours":
        timegap = sixhours
        num_iters = 4 * monthrange(year, month)[1]
    else:
        timegap = oneday
        num_iters = monthrange(year, month)[1]

    location = locations[area_dd.value][0]
    img = locations[area_dd.value][1]
    
    loading1.max = num_iters + 5
    loading1.value = 0
    loading1.bar_style = "info"
    
    title = " Month " + startday.strftime(format="%B %Y")
    display_heatmaps(startday, timegap, location, num_iters, img, title)
    
    
# call the heatmap making functions
def display_heatmaps(startday, timegap, location, num_iters, img, title):
    heatmaps = query_heatmap_info(startday, timegap, location, num_iters)
    create_heatmaps(heatmaps, img, num_iters, title)
        

# query Numina for the heatmaps
def query_heatmap_info(starttime, timegap, location, num_iters):
    heatmaps = {}
    # since we can only get 1 heatmap per query, we have to loop to get multiple heatmaps
    for i in range(num_iters):
        str_start = (starttime + timegap * i).strftime("%Y-%m-%dT%H:%M:%S")
        str_end = (starttime + timegap * (i + 1)).strftime("%Y-%m-%dT%H:%M:%S")

        heatmapquery = f"""
        query {{
          feedHeatmaps(
            serialno:\"{location}\",
            startTime:\"{str_start}\",
            endTime:\"{str_end}\",
            objClasses:["pedestrian"],
            timezone:"America/New_York") {{
            edges {{
              node {{
                time
                objClass
                heatmap
              }}
            }}
          }}
        }}
        """

        heatmap_request = requests.post(url, json={'query': heatmapquery}, headers = {'Authorization':token})
        heatmaps[str_start] = heatmap_request.json()["data"]["feedHeatmaps"]["edges"][0]["node"]["heatmap"]
        loading1.value = loading1.value + 1
        
    return heatmaps


# create the two heatmaps for display
def create_heatmaps(heatmaps, img, num_iters, title):
    # put heatmap data into a dataframe to draw heatmaps as scatterplots
    data = pd.DataFrame(columns=["x", "y", "value", "DateTime"])
    for heat in heatmaps:
        new = pd.DataFrame(heatmaps[heat])
        # need some data even if no one there so the frame is there on the graph
        if len(new) == 0:
            new = pd.DataFrame({0: [0, 0], 1: [0, 0], 2: [0, 1]})

        new["DateTime"] = heat
        new = new.rename(columns={0: "x", 1: "y", 2: "value"})
        # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
        data = pd.concat([data, new])
    data["value"] = data["value"].astype(float)
    
    loading1.value = loading1.value + 1
    
    # due to the high volume of data in a heatmap, to effectively visualize and plot
    # the graph with the animations we cut down on the data by keeping every 2nd or 3rd point
    if data["DateTime"].value_counts().values[0] > 20000:
        cut_data = data[(data["x"] % 3 == 0) & (data["y"] % 3 == 0)]
    elif data["DateTime"].value_counts().values[0] > 10000:
        cut_data = data[(data["x"] % 2 == 0) & (data["y"] % 2 == 0)]
    else:
        cut_data = data

    if cut_data["DateTime"].value_counts().values[0] > 4000:
        # clunky when switching cause so much data to plot
        stayonframe = 3000
    else:
        stayonframe = 2000
        
    heatmap_colorscale = px.colors.sequential.Jet[1:5]
    loading1.value = loading1.value + 1

    # heatmap with animations
    fig = px.scatter(cut_data, x="x", y="y", color="value", animation_frame="DateTime",
                     range_x=[0, 640], range_y=[480, 0], color_continuous_scale=heatmap_colorscale,
                     width=800, height=700, opacity=0.5, range_color=[0, 1],
                     labels={"x": "", "y": ""})
    fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = stayonframe
    fig.layout.updatemenus[0].buttons[0].args[1]["transition"]["duration"] = 1000
    
    # add background image
    fig.add_layout_image(
            dict(
                source=img,
                xref="x",
                yref="y",
                x=0,
                y=0,
                sizex=640,
                sizey=480,
                sizing="stretch",
                opacity=1,
                layer="below")
    )
    fig.update_layout(title="Heatmaps of" + title,
                      xaxis=dict(showgrid=False, zeroline=False, ticks='',
                      showticklabels=False),
                      yaxis=dict(showgrid=False, zeroline=False, ticks='',
                      showticklabels=False))
    fig.update_traces(marker=dict(size=10))
    
    # unstable but necessary for animations, see Note for explanation 
    heatmap_by_time.clear_output()
    with heatmap_by_time:
        fig.show()
    
    loading1.value = loading1.value + 1
    
    # get data for heatmap of common areas
    summed_heatmap = data.groupby(["x", "y"])["value"].sum()
    summed_heatmap = summed_heatmap.reset_index()
    # drop the placeholder points at the origin (the second condition previously repeated "x")
    summed_heatmap = summed_heatmap[(summed_heatmap["x"] != 0) & (summed_heatmap["y"] != 0)]

    if (len(summed_heatmap)) > 0:
        max_val = max(list(summed_heatmap["value"]))
        min_val = min(list(summed_heatmap["value"]))
        bound = (max_val - min_val) / 4 + min_val
    else:
        max_val = 1
        bound = 0
        
    summed_heatmap = summed_heatmap[summed_heatmap["value"] > bound]
    
    loading1.value = loading1.value + 1
    
    # create heatmap of the common areas
    fig = px.scatter(summed_heatmap, x="x", y="y", color="value", range_x=[0, 640], 
                     range_y=[480, 0], color_continuous_scale=heatmap_colorscale,
                     width=800, height=600, opacity=0.3, range_color=[0, max_val],
                     labels={"x": "", "y": ""})
    fig.add_layout_image(dict(source=img,
                                xref="x",
                                yref="y",
                                x=0,
                                y=0,
                                sizex=640,
                                sizey=480,
                                sizing="stretch",
                                opacity=1,
                                layer="below"))
    fig.update_layout(title = "Heatmap of Most Commonly Visited Areas on" + title, 
                      xaxis=dict(showgrid=False, zeroline=False, ticks='',
                      showticklabels=False),
                      yaxis=dict(showgrid=False, zeroline=False, ticks='',
                      showticklabels=False))
    
    heatmap_popular_areas.data = []
    heatmap_popular_areas.add_traces(fig.data)
    heatmap_popular_areas.layout = fig.layout
    loading1.value = loading1.value + 1
    loading1.bar_style = "success"

In [8]:
# buttons activate functions
day_select.on_click(process_date_selection)
month_select.on_click(process_month_selection)

In [9]:
# display widgets
display(selections1)
display(loading1)
display(tab1)

# Dwell Time During Events vs No Events

Many events are held at 307 Sidewalk Labs. Two questions matter here: do people tend to dwell in certain spots or follow certain desire lines during events, and do these patterns differ from days without events? Since we do not have a list of events at 307, we use our own definition of an event.

> If at least 500 people are counted within a single daytime hour of a day, we define it as an **event day**. Otherwise, it is a **no event day**.

This widget lets us compare dwell patterns on an event day and a no event day over a chosen time period by showing their heatmaps side by side.
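
The event-day definition above can be sketched as follows.  This is an illustration on made-up hourly counts, not the notebook's own `check_events`; the function name `event_days` is hypothetical, and treating hours 08 through 19 as daytime mirrors the check used in this notebook's code.

```python
from collections import defaultdict
import datetime

def event_days(hourly_counts, threshold=500, first_hour=8, last_hour=19):
    """Map each date to True if any daytime hour reached the threshold."""
    flags = defaultdict(bool)
    for timestamp, count in hourly_counts:
        # only daytime hours can turn a day into an event day
        if first_hour <= timestamp.hour <= last_hour:
            flags[timestamp.date()] |= count >= threshold
    return dict(flags)

counts = [
    (datetime.datetime(2019, 6, 1, 13), 620),  # busy afternoon hour
    (datetime.datetime(2019, 6, 1, 14), 180),
    (datetime.datetime(2019, 6, 2, 13), 240),
    (datetime.datetime(2019, 6, 2, 23), 510),  # busy, but outside daytime
]
flags = event_days(counts)
print(flags)
```

Here June 1 is flagged as an event day, while June 2 is not, because its only busy hour falls at night.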

### How to use:
First use `Date` to select the range of dates you are interested in. Then choose the event day and the no event day from `Event` and `No Event` respectively. Select the location (`Streetscape`, `Under Raincoat`, or `Outside`) in `Location`, and the time range (`15 min`, `1 hour`, `6 hours`, or `day`) in `By`. Finally, use `Time` to select the start time at which the heatmaps of the two days are compared.

### Notes:

- As with the previous widget, the number of people in each heatmap is not shown for privacy purposes.
- As with the previous widget, every time a selection is made we re-query Numina for the data, so the process can be slow, though it should take no more than 30 seconds.
- Note that data from the SWL sensors starts on `2019-02-20` and ends on `2020-01-12`; we have no data outside this period.

In [10]:
# read the counts data according to the parameters
def read_count(locations, str_start, str_end, interval):
    all_counts = pd.DataFrame(columns=["time", "Streetscape", 
                                       "Under Raincoat", "Outside"])
    # loop over the locations to get the different result
    for loc in locations:
        countquery = f"""
        query{{
          feedCountMetrics(
            serialnos:\"{locations[loc][0]}\",
            startTime:\"{str_start}\",
            endTime:\"{str_end}\",
            objClasses:["pedestrian"],
            timezone:"America/New_York",
            interval:\"{interval}\") {{
            edges {{
              node {{
                serialno
                result
                objClass
                time
              }}
            }}
          }}
        }}
        """
        counts = requests.post(url, json={'query': countquery}, 
                               headers = {'Authorization':token})
        counts = counts.json()['data']['feedCountMetrics']['edges']
        counts_lst = []
        for i in range(len(counts)):
            counts_lst += [counts[i]['node']]
        counts = pd.DataFrame.from_dict(
            counts_lst, orient='columns').drop(
            ['objClass','serialno'],axis=1).rename(columns={'result':str(loc)})
        # create new dataframe or fill in na
        if all_counts['time'].count() == 0:
            # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
            all_counts = pd.concat([all_counts, counts])
        else:
            all_counts = all_counts.combine_first(counts)
    # convert the number of people to float
    all_counts[["Streetscape","Under Raincoat", "Outside"]] = \
                all_counts[["Streetscape","Under Raincoat", "Outside"]].astype(float)
    # sum all data together to get the overall information
    all_counts['total'] = all_counts[["Streetscape","Under Raincoat", "Outside"]].sum(1)
    # change the time to timestamp
    all_counts['time'] = pd.to_datetime(
            all_counts['time'], format="%Y-%m-%dT%H:%M:%S%z")
    return all_counts

In [11]:
# find a way to define an event day
# define it as 500 people in an hour at daytime
def check_events(count_data):
    # read the counts
    sub_count = read_count(locations,datetime.datetime.strftime(
                        count_data['time'].iloc[0],'%Y-%m-%dT%H:%M:%S'),
                               datetime.datetime.strftime(
                        count_data['time'].iloc[-1],
                                   '%Y-%m-%dT%H:%M:%S'),"1h")
    # check the events if they have 500 people in an hour
    sub_count['event'] = sub_count['total'].apply(lambda x: 1 if x>=500 else 0)
    # check if this will happen at daytime
    sub_count['hour'] = sub_count['time'].apply(lambda x: str(x).split(' ')[1][0:2])
    sub_count = sub_count[sub_count['hour']>'07']
    sub_count = sub_count[sub_count['hour']<'20']
    # drop returns a new dataframe, so the result has to be assigned back
    sub_count = sub_count.drop('hour',axis=1)
    # keep the time format to be consistent
    sub_count['time'] = sub_count['time'].apply(lambda x: str(x).split(' ')[0]+\
                                                " 00:00:00-05:00:00")
    sub_count = sub_count.groupby('time').max().reset_index()
    sub_count['time'] = pd.to_datetime(
            sub_count['time'], format="%Y-%m-%d %H:%M:%S%z")
    # merge the dataframe with origin one
    count_data = count_data.merge(sub_count[['time','event']],on='time')
    
    return count_data

In [12]:
# read the heatmaps for question 2
def read_heatmap(location, str_start, str_end):
    heatmapquery = f"""
    query {{
      feedHeatmaps(
        serialno:\"{locations[location][0]}\",
        startTime:\"{str_start}\",
        endTime:\"{str_end}\",
        objClasses:["pedestrian"],
        timezone:"America/New_York") {{
        edges {{
          node {{
            time
            objClass
            heatmap
          }}
        }}
      }}
    }}
    """

    heatmap_request = requests.post(url, json={'query': heatmapquery}, 
                                    headers = {'Authorization':token})
    heatmaps = heatmap_request.json()["data"]["feedHeatmaps"]["edges"][0]["node"]["heatmap"]
    # only keep the x- and y-axis and its value
    heatmaps = pd.DataFrame(heatmaps, columns=["x", "y", "value"])
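    # add two placeholder points so the colour scale always spans 0 to 1
    # (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
    heatmaps = pd.concat([heatmaps, pd.DataFrame(
            {"x": [0, 0], "y": [0, 0], "value": [0, 1]})])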
    return heatmaps

In [13]:
# prepare work: create dataframe for question 2
dates = [datetime.date(2019, 2, j) for j in range(20, 29)]
dates += [datetime.date(i, j, 1) for i in range(2019, 2020) for j in range(3, 13)]
dates += [datetime.date(2020, 1, i) for i in range(1, 13)]
hour = [str(i).zfill(2)+":"+j+":00" for i in range(0,24) for j in ["00","15","30","45"]]
all_events = check_events(read_count(
        locations,"2019-02-20T00:00:00","2020-01-12T00:00:00","24h"))
events_option = all_events[all_events['event']==1]['time'].astype(str).str[:10]
non_events_option = all_events[all_events['event']==0]['time'].astype(str).str[:10]

In [14]:
# define the function to complete question 2
def compare_events(date, event, non_event, location, interval, time):
    # calculate the time
    time_event_asdate = datetime.datetime.strptime(event+" "+time,
                    "%Y-%m-%d %H:%M:%S")
    time_non_event_asdate = datetime.datetime.strptime(non_event+" "+time,
                    "%Y-%m-%d %H:%M:%S")
    if interval == '15 min':
        time_end_event = str(time_event_asdate+datetime.timedelta(minutes=15))
        time_non_end_event = str(time_non_event_asdate+datetime.timedelta(minutes=15))
    # must match the '1 hour' option offered in the By dropdown
    elif interval == '1 hour':
        time_end_event = str(time_event_asdate+datetime.timedelta(hours = 1))
        time_non_end_event = str(time_non_event_asdate+datetime.timedelta(hours=1))
    elif interval == '6 hours':
        time_end_event = str(time_event_asdate+datetime.timedelta(hours = 6))
        time_non_end_event = str(time_non_event_asdate+datetime.timedelta(hours=6))
    else:
        time_end_event = str(time_event_asdate+datetime.timedelta(days = 1))
        time_non_end_event = str(time_non_event_asdate+datetime.timedelta(days=1))
    time_end_event = time_end_event.replace(" ","T")
    time_non_end_event = time_non_end_event.replace(" ","T")
    
    # create heatmaps according to the event or non event day
    heatmaps_event = read_heatmap(location, 
                    datetime.datetime.strftime(
                        time_event_asdate,'%Y-%m-%dT%H:%M:%S'), time_end_event)
    heatmaps_non_event = read_heatmap(location, 
                    datetime.datetime.strftime(
                        time_non_event_asdate,'%Y-%m-%dT%H:%M:%S'), time_non_end_event)
    # plot heatmaps
    fig,axs = plt.subplots(1,2,figsize=(20,7),sharey=True)
    ax = axs[0].scatter(x=heatmaps_event["x"], y=heatmaps_event["y"],
                            c=heatmaps_event["value"],cmap="jet",vmin=0, vmax=1)
    axs[0].set_title("Event Day "+str(event),fontsize=15)
    axs[1].scatter(x=heatmaps_non_event["x"], y=heatmaps_non_event["y"],
                   c= heatmaps_non_event["value"],cmap="jet",vmin=0, vmax=1)
    axs[1].set_title("No Event Day "+str(non_event),fontsize=15)
    for i in range(0,2):
        if location == 'Streetscape':
            img = plt.imread("streetscape.png")
        elif location == 'Under Raincoat':
            img = plt.imread("under_raincoat.png")
        else:
            img = plt.imread("outside.png")
        axs[i].imshow(img,zorder=0)
        axs[i].set_yticks([])
        axs[i].set_xticks([])
        axs[i].axis('off')
    # color bar
    cbar = fig.colorbar(ax, ax=axs.ravel().tolist())
    fig.suptitle('Heatmap for '+location,fontsize=20)
    plt.show()


# interactive widget
date_widget = widgets.SelectionRangeSlider(
             options=dates,
             index=(0, 7),
             description='Date:',
             disabled=False, layout=Layout(width='45%'))
event_widget = widgets.Dropdown(
             options=events_option,
             description='Event:',
             disabled=False, layout=Layout(width='45%'))
non_event_widget = widgets.Dropdown(
             options=non_events_option,
             description='No Event:',
             disabled=False, layout=Layout(width='45%'))
# update interactive widget
def update_event_day(*args):
    # define the start and the end period
    start = datetime.datetime.combine(date_widget.value[0], 
                        datetime.datetime.min.time())
    end = datetime.datetime.combine(date_widget.value[1],
                        datetime.datetime.min.time())
    event_widget.options = [time for time in events_option if \
                            ((datetime.datetime.strptime(time,
                    "%Y-%m-%d") >= start) & (datetime.datetime.strptime(time,
                    "%Y-%m-%d") <= end))]
    
    non_event_widget.options = [time for time in non_events_option if\
                                ((datetime.datetime.strptime(time,
                    "%Y-%m-%d") >= start) & (datetime.datetime.strptime(time,
                    "%Y-%m-%d") <= end))]
    
date_widget.observe(update_event_day, 'value')


# present the dashboard
widget = interactive(compare_events, 
         date = date_widget,
         event = event_widget, 
         non_event = non_event_widget, #change to non_event
         location = widgets.Dropdown(
             options=['Streetscape','Under Raincoat','Outside'],
             description='Location:',
             disabled=False, layout=Layout(width='45%')), 
         interval = widgets.Dropdown(
             options=['15 min','1 hour','6 hours','day'],
             description='By:',
             disabled=False, layout=Layout(width='45%')), 
         time = widgets.Dropdown(
             options=hour,
             description='Time:',
             disabled=False, layout=Layout(width='45%')), )

controls = HBox(widget.children[1:-1], 
                layout = Layout(flex_flow='row wrap'))
output = widget.children[-1]

In [15]:
# final display the dashboard
display(VBox([widget.children[0], controls, output]))

# When Should 307 Schedule Maintenance?

Sidewalk Labs requires maintenance after every 500 visitors or 500 hours.  500 hours is roughly 3 weeks.  We consider maintenance separately for each sensor, since the sensors observe different numbers of people and can be maintained independently.  For each sensor, we fit a Holt-Winters Exponential Smoothing time series model to the number of people observed every hour and use it to predict future visitors.  However, because of the extremely high counts on event days, much of the data ends up in the residuals, so the model has issues.  From observation, there isn't any interval where 3 weeks go by without 500 visitors, so we have decided to disregard the 500 hours requirement and focus entirely on when 500 visitors is reached.
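
As a quick check of the arithmetic, 500 hours is 500 / 24 ≈ 20.8 days, which is just under 3 weeks.  The visitor rule can be sketched as a running total that triggers maintenance each time it reaches 500.  The function name `maintenance_points`, the made-up counts, and the choice to restart the tally at zero after each trigger are our assumptions for illustration.

```python
def maintenance_points(hourly_counts, limit=500):
    """Return indices of the hours at which the running total reaches `limit`."""
    points = []
    running = 0
    for hour, count in enumerate(hourly_counts):
        running += count
        if running >= limit:
            points.append(hour)
            running = 0  # assumption: the tally restarts after maintenance
    return points

counts = [120, 200, 90, 150, 300, 250, 40]  # made-up hourly counts
points = maintenance_points(counts)
# number of hours needed to reach the limit each time
gaps = [later - earlier for earlier, later in zip([-1] + points, points)]
print(points, gaps)
```

The `gaps` list corresponds to the kind of intervals summarized in a "time to 500 visitors" view: here the first 500 visitors take 4 hours and the next 500 take 2.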

### Tabs:
1. Counts & Predictions: shows the observed counts every hour as well as the fitted Holt-Winters Exponential Smoothing model, and predictions for visitors for the next month.  The light blue lines represent when 500 visitors are reached, not counting days we have flagged as days with events.  The purple lines represent when 500 visitors are reached for the predicted visitors.  However, the model cannot predict when events happen, so the true amount of maintenance will most likely be higher than what's predicted.  The user can use the x-axis slider at the bottom of the plot to adjust the time interval shown in the plot.
2. Time to 500 Visitors: shows all the time intervals needed to achieve 500 visitors, not counting days we have flagged as days with events.  The purple is actual time intervals counted with the data, and the yellow is from the fitted model shown in Counts & Predictions.  This plot will hopefully give an idea of how long it takes to reach 500 visitors, to know how often maintenance should be scheduled.
3. Hours of Over 500: shows the hours that have over 500 pedestrians counted.  It also has red lines at 7am and 8pm, which is around the earliest and latest that people will work for their jobs.  This plot will hopefully give an idea of when events happen and at what times the most people show up, so that Sidewalk Labs knows to avoid hours with high counts.
4. Hours 500 Reached: shows the hours when 500 visitors is reached.  It also has red lines at 7am and 8pm, which is around the earliest and latest that people will work for their jobs.  This plot will hopefully give an idea of when Sidewalk Labs reaches 500 visitors, and when Sidewalk Labs needs to prepare for scheduling maintenance.  
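
For readers unfamiliar with Holt-Winters smoothing, the recurrences behind the fitted line can be sketched by hand.  This notebook imports `ExponentialSmoothing` from statsmodels for the model, so the function below is only a simplified additive version with fixed smoothing parameters, written for illustration.  The name `holt_winters_additive`, the initialization scheme, and the toy series with a four-step cycle are all our assumptions, and the series must cover at least two full seasons.

```python
def holt_winters_additive(series, season_length, alpha, beta, gamma, horizon):
    """Fit additive Holt-Winters smoothing and forecast `horizon` steps ahead."""
    m = season_length
    first = sum(series[:m]) / m            # mean of the first season
    second = sum(series[m:2 * m]) / m      # mean of the second season
    level = first
    trend = (second - first) / m
    season = [value - first for value in series[:m]]

    for t, value in enumerate(series):
        previous_level = level
        seasonal = season[t % m]
        # update level, trend, and the seasonal term for this position
        level = alpha * (value - seasonal) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        season[t % m] = gamma * (value - level) + (1 - gamma) * seasonal

    n = len(series)
    return [level + step * trend + season[(n + step - 1) % m]
            for step in range(1, horizon + 1)]

# toy series that repeats the same four-step pattern with no trend
series = [10, 20, 30, 40] * 6
forecast = holt_winters_additive(series, season_length=4,
                                 alpha=0.5, beta=0.5, gamma=0.5, horizon=4)
print(forecast)
```

On this perfectly periodic input the forecast simply repeats the pattern.  Real counts with large event-day spikes do not fit such a regular cycle, which is one way to see why the model struggles on this data.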

### Notes:
- The Counts & Predictions plot is prone to crashing, due to the overwhelming amount of data plotted.  If it doesn't load and a sad face is displayed instead, press the "Render in Browser" button to load the graph directly in a separate browser tab, which bypasses the in-notebook crash.  Restarting your computer can also fix the problem.
- It can take some time to load a selection, since the Counts & Predictions plot has a lot going on in it.  The loading bar turns green when everything is done.
- The Counts & Predictions plot may look crowded.  Please use the x-axis slider at the bottom of the plot to focus on specific parts of it and see the data more clearly.

In [16]:
# widgets

# inputs
area_dd2 = widgets.Dropdown(options=list(locations.keys()), description="Area:")
area_btn = widgets.Button(description='Select Area', disabled=False, button_style='')

# loading bar
loading2 = widgets.IntProgress(value=0, min=0, max=12, step=1, description='Loading:', 
                               bar_style='info', orientation='horizontal')

# outputs
time_graph = go.FigureWidget()
average_timeto500 = go.FigureWidget()
expected_event_hours = go.FigureWidget()
reached_500_hours = go.FigureWidget()

browser_render = widgets.Button(description='Render in Browser', disabled=False, button_style='')

# putting them together

area_selections = widgets.VBox([area_dd2, area_btn])
time_tab = widgets.VBox([browser_render, time_graph])

# selections in folding accordion tabs
selections2 = widgets.Accordion(children=[area_selections])
selections2.set_title(0, 'Select Area')

# output graph tabs
tab2 = widgets.Tab([time_tab, average_timeto500, expected_event_hours, reached_500_hours])
tab2.set_title(0, "Counts & Predictions")
tab2.set_title(1, "Time to 500 Visitors")
tab2.set_title(2, "Hours of Over 500")
tab2.set_title(3, "Hours 500 Reached")

In [17]:
# functions

# reference to big timeseries figure in case notebook renderer crashes
timeseries_figure = None

# in case notebook renderer crashes for predictions plot
def show_in_browser(click):
    if timeseries_figure:
        timeseries_figure.show(renderer="browser")


# function to call all other functions needed
def update_maintenance_graphs(click):
    global timeseries_figure
    loading2.value = 0
    loading2.bar_style = "info"
    loading2.value = loading2.value + 1
    
    count_data = process_area_selection()
    loading2.value = loading2.value + 1
    
    maybe_events = count_data[count_data["result"] >= 500]
    our_event_days = list(maybe_events["date"].value_counts().index)
    loading2.value = loading2.value + 1
    
    over_capacity, gaps, over_capacity_dates, over_capacity_times = get_visitor_data(count_data, our_event_days)
    loading2.value = loading2.value + 1
    
    model, predictions, predicted_time_gaps, reached_capacity = fit_timeseries_model(count_data)
    loading2.value = loading2.value + 1
    
    figure = draw_timeseries_graph(count_data, model, predictions, over_capacity, reached_capacity)
    timeseries_figure = figure
    loading2.value = loading2.value + 1
    
    hours_graphs(count_data, over_capacity_times, maybe_events, gaps, predicted_time_gaps)
    
    loading2.value = loading2.value + 1
    loading2.bar_style = "success"
    

# an area was selected, get data for it
def process_area_selection():
    location = locations[area_dd2.value][0]
    if area_dd2.value == "Streetscape":
        starttime = "2019-02-20T00:00:00"
        endtime = "2020-01-12T00:00:00"
    
    elif area_dd2.value == "Under Raincoat":
        starttime = "2019-03-20T00:00:00"
        endtime = "2019-12-06T00:00:00"
        
    else: # Outside
        starttime = "2019-03-19T00:00:00"
        endtime = "2020-01-12T00:00:00"
    
    hourquery = f"""
    query {{
      feedCountMetrics(
        serialnos:[\"{location}\"],
        startTime:\"{starttime}\",
        endTime:\"{endtime}\",
        objClasses:["pedestrian"],
        timezone:"America/New_York",
        interval:"1h") {{
        edges {{
          node {{
            serialno
            result
            objClass
            time
          }}
        }}
      }}
    }}
    """
    response = requests.post(url, json={'query': hourquery}, headers={'Authorization': token})
    counts_list = response.json()["data"]["feedCountMetrics"]["edges"]
    # build the frame in one step; DataFrame.append was removed in pandas 2.0
    count_data = pd.DataFrame([edge["node"] for edge in counts_list])
        
        
    count_data = count_data.reset_index(drop=True)
    count_data = count_data.rename(columns={"time": "datetime"})
    count_data["date"] = count_data["datetime"].apply(lambda x: x[0:10])
    count_data["time"] = count_data["datetime"].apply(lambda x: x[11:19])
    
    return count_data


# count for 500 visitors and sort through data
def get_visitor_data(count_data, our_event_days):
    visitors = 0
    over_capacity = []
    gaps = []
    over_capacity_dates = []
    over_capacity_times = []

    days_skipped = 0  # event days skipped since 500 visitors were last reached
    for i, r in count_data.iterrows():
        if r["date"] not in our_event_days:

            visitors = visitors + r["result"]
            if visitors > 500:
                visitors = 0

                if len(over_capacity) > 0:
                    last = over_capacity[-1]
                    curr_time = datetime.datetime.strptime(r["datetime"], "%Y-%m-%dT%H:%M:%S%z")
                    last_time = datetime.datetime.strptime(last, "%Y-%m-%dT%H:%M:%S%z")
                    time_gap = curr_time - last_time
                    # take out the skipped event days (one full day each)
                    time_gap = time_gap - datetime.timedelta(days=1) * days_skipped
                    gaps.append(time_gap)

                over_capacity.append(r["datetime"])
                over_capacity_dates.append(r["date"])
                over_capacity_times.append(r["time"])
                days_skipped = 0
        else:
            if r["time"] == '00:00:00':
                days_skipped = days_skipped + 1
                
    gaps_hours = []
    for timegap in gaps:
        gaps_hours.append(timegap.days * 24 + timegap.seconds // 3600)
                
    return over_capacity, gaps_hours, over_capacity_dates, over_capacity_times


# fit the timeseries model and find when 500 visitors are predicted to be reached
def fit_timeseries_model(count_data):
    fit_data = count_data[["datetime", "result"]]
    fit_data = fit_data.set_index("datetime")
    fit_data = pd.Series(fit_data["result"])
    
    # 24 seasonal periods since we have hourly data 
    model = ExponentialSmoothing(fit_data, seasonal_periods=24, trend='add', seasonal='add').fit()
    predictions = pd.DataFrame(model.forecast(768)).reset_index().rename(columns={"index": "Datetime", 0: "Visitors"})
    predictions["Visitors"] = predictions["Visitors"].apply(lambda x: int(max(x, 0)))
    
    # find when 500 visitors are reached
    visitors = 0
    predicted_time_gaps = []
    reached_capacity = []
    for i, r in predictions.iterrows():
        visitors = visitors + r["Visitors"]
        if visitors > 500:
            visitors = 0
            if len(predicted_time_gaps) == 0:
                last = predictions.loc[0]["Datetime"].to_pydatetime()
            else:
                last = reached_capacity[len(reached_capacity) - 1].to_pydatetime()
            curr = r["Datetime"].to_pydatetime()
            gap = curr - last
            predicted_time_gaps.append(gap)
            reached_capacity.append(r["Datetime"])
            
    # convert the predicted time gaps into whole hours
    gaps_hours = []
    for timegap in predicted_time_gaps:
        gaps_hours.append(timegap.days * 24 + timegap.seconds // 3600)
            
    return model, predictions, gaps_hours, reached_capacity


# draw the large timeseries plot
def draw_timeseries_graph(count_data, model, predictions, over_capacity, reached_capacity):
    fig = go.Figure() 
    # observed counts
    fig.add_trace(go.Scatter(x=count_data["datetime"], y=count_data["result"],
                             mode="lines", name="Observed"))
    # model fitted values
    fig.add_trace(go.Scatter(x=model.fittedvalues.index, y=model.fittedvalues.values,
                             mode='lines', name='Fitted'))
    # model predictions for next month
    fig.add_trace(go.Scatter(x=predictions["Datetime"], y=predictions["Visitors"],
                             mode='lines', name='Predicted'))
    # mark hours with over 500 pedestrians, which we treat as likely events
    fig.add_trace(go.Scatter(x = count_data[count_data["result"] > 500]["datetime"],
                             y = count_data[count_data["result"] > 500]["result"], 
                             name= "Expected Events", mode='markers'))
    loading2.value = loading2.value + 1

    # draw vertical lines whenever 500 visitors is reached
    for i in range(len(over_capacity)):
        fig.add_shape(
            # Line Vertical
            dict(
                type="line",
                x0=over_capacity[i],
                y0=0,
                x1=over_capacity[i],
                y1=max(count_data["result"]) + 100,
                line=dict(color="aqua", width=1)
        ))

    # so that the vertical lines show up in the legend
    # (skipped when 500 visitors were never reached, to avoid an IndexError)
    if len(over_capacity) > 0:
        fig.add_trace(go.Scatter(x=[over_capacity[0]], y=[0],
                                 mode='lines', name='500 Visitors Reached', line=dict(color="aqua")))
    
    loading2.value = loading2.value + 1

    # draw vertical lines whenever 500 visitors is predicted to be reached
    for i in range(len(reached_capacity)):
        fig.add_shape(
            # Line Vertical
            dict(
                type="line",
                x0=reached_capacity[i],
                y0=0,
                x1=reached_capacity[i],
                y1=max(count_data["result"]) + 100,
                line=dict(color="rgb(231,107,243)", width=1)
        ))
        
    if len(reached_capacity) > 0:
        spot = reached_capacity[0]
    else:
        spot = "2020-01-12T00:00:00-05:00"

    # so that colour appears in legend
    fig.add_trace(go.Scatter(x=[spot], y=[0],
                             mode='lines', name='Predicted 500 Visitors Reached', 
                             line=dict(color="rgb(231,107,243)")))

    fig.update_layout(xaxis_rangeslider_visible=True, width=900, height=600, 
                      xaxis=dict(title="DateTime"), yaxis=dict(title="Number of Pedestrians"), 
                      title="Observed and Predicted Pedestrian Counts and When 500 Visitors are Reached")
    
    time_graph.data = []
    time_graph.add_traces(fig.data)
    time_graph.layout = fig.layout
    return fig


# draw the other graphs
def hours_graphs(count_data, over_capacity_times, maybe_events, gaps_hours, predicted_gaps_hours):
    # clean data into format for graphing
    times = list(count_data["time"].value_counts().index)
    times.sort()
    time_counts = pd.DataFrame({"Time": times})
    time_counts["Count"] = 0

    these_hours = pd.Series(over_capacity_times).value_counts()
    for i, r in time_counts.iterrows():
        if r["Time"] in these_hours.index:
            time_counts.loc[i, "Count"] = these_hours[r["Time"]]

    # create bar graph of the hours at which the running total reaches 500 visitors
    fig = px.bar(time_counts, x='Time', y='Count')
    fig.add_shape(
            # Line Vertical
            dict(
                type="line",
                x0="07:00:00",
                y0=0,
                x1="07:00:00",
                y1=max(time_counts["Count"]) + 5,
                line=dict(color="red", width=3)
    ))

    fig.add_shape(
            # Line Vertical
            dict(
                type="line",
                x0="20:00:00",
                y0=0,
                x1="20:00:00",
                y1=max(time_counts["Count"]) + 5,
                line=dict(color="red", width=3)
    ))

    fig.update_layout(title="Hours When Sidewalk Labs Reaches Over 500 Visitors")
    reached_500_hours.data = []
    reached_500_hours.add_traces(fig.data)
    reached_500_hours.layout = fig.layout
    loading2.value = loading2.value + 1
    
    # clean data into format for graphing
    times = list(count_data["time"].value_counts().index)
    times.sort()
    time_counts = pd.DataFrame({"Time": times})
    time_counts["Count"] = 0

    these_hours = maybe_events["time"].value_counts()
    for i, r in time_counts.iterrows():
        if r["Time"] in these_hours.index:
            time_counts.loc[i, "Count"] = these_hours[r["Time"]]

    # create bar graph of the hours in which 500 or more pedestrians were counted
    fig = px.bar(time_counts, x='Time', y='Count')
    fig.add_shape(
            dict(
                type="line",
                x0="07:00:00",
                y0=0,
                x1="07:00:00",
                y1=max(time_counts["Count"]) + 5,
                line=dict(color="red", width=3)
    ))

    fig.add_shape(
            dict(
                type="line",
                x0="20:00:00",
                y0=0,
                x1="20:00:00",
                y1=max(time_counts["Count"]) + 5,
                line=dict(color="red", width=3)
    ))

    fig.update_layout(title="Hours With Over 500 Pedestrians Counted")    
    
    expected_event_hours.data = []
    expected_event_hours.add_traces(fig.data)
    expected_event_hours.layout = fig.layout
    loading2.value = loading2.value + 1
    
    # create histogram of how many hours it takes to reach 500 visitors for both
    # observed and predicted
    fig = go.Figure()
    fig.add_trace(go.Histogram(x=gaps_hours, 
                               marker=dict(color="fuchsia", line=dict(color="grey")), 
                               xbins=dict(size=10),
                               name="Hours To 500 Visitors"))
    fig.add_trace(go.Histogram(x=predicted_gaps_hours, 
                               marker=dict(color="rgb(255, 217, 102)"),
                               xbins=dict(size=10), 
                               name="Predicted Hours To 500 Visitors Next Month"))

    fig.update_layout(barmode='overlay', width=900, height=600, 
                      xaxis=dict(title="Hours"), yaxis=dict(title="Count"), 
                      title="Number of Hours to Reach 500 Visitors")
    fig.update_traces(opacity=0.75)
    
    average_timeto500.data = []
    average_timeto500.add_traces(fig.data)
    average_timeto500.layout = fig.layout
    loading2.value = loading2.value + 1

In [18]:
area_btn.on_click(update_maintenance_graphs)
browser_render.on_click(show_in_browser)

In [19]:
display(selections2)
display(loading2)
display(tab2)

Accordion(children=(VBox(children=(Dropdown(description='Area:', options=('Streetscape', 'Under Raincoat', 'Ou…

IntProgress(value=0, bar_style='info', description='Loading:', max=12)

Tab(children=(VBox(children=(Button(description='Render in Browser', style=ButtonStyle()), FigureWidget({
    …