# Ourself

## Notes from Bevan
Here is a link to our Google Drive folder with the Ourself data. There are two files: one for the installation that is still up at Roosevelt Plaza Park, and one for the installation that was up at the Kroc Center.

The file contains data in the following format:
    
    datestamp, timestamp, behavior

The software ran a state machine that tracks what it thinks people are doing so that it knows how to control the lights and sounds; the logger recorded the status of that state machine. It recorded one data point every second for as long as a sensor was tripped.
The sensors are microwave sensors to detect motion, and weight sensors to detect when someone is on the central platform. 
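
Because rows exist only while a sensor is tripped, any missing second means nothing was detected. A minimal sketch of making those gaps explicit, assuming a one-row-per-second log; the sample rows and the "IDLE" label are invented for illustration and are not in the real files:

```python
import pandas as pd

# Hypothetical log: rows exist only for seconds when a sensor was tripped.
log = pd.Series(
    ["ARR", "OCC", "OCC", "DEP"],
    index=pd.to_datetime([
        "2016-05-10 15:53:01",
        "2016-05-10 15:53:02",
        "2016-05-10 15:53:03",
        "2016-05-10 15:53:06",
    ]),
)

# Build a gap-free one-second timeline spanning the log.
timeline = pd.date_range(log.index.min(), log.index.max(), freq="s")

# Seconds with no record become "IDLE" (a label invented for this sketch).
full = log.reindex(timeline, fill_value="IDLE")
print(full)
```

With the gaps filled in, durations can be read off as row counts rather than inferred from timestamps.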

If a sensor trip awakens it from a null state, it records this as someone arriving - "ARR".
If the weight sensors are tripped, it records this as occupied - "OCC".
If the weight sensors disengage and the microwave sensors are then activated, it interprets this as someone departing - "DEP".
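
The three rules above can be sketched as a small transition function. This is a guess at the logic from the description, not the installation's actual code; the function name, the argument names, and the behavior when no sensor is tripped are all assumptions:

```python
def next_state(state, weight_tripped, motion_tripped):
    """Return the next logged state, or None for the null (idle) state.

    Hypothetical reconstruction of the rules described in the notes.
    """
    if weight_tripped:
        return "OCC"          # someone is on the central platform
    if motion_tripped:
        if state in ("OCC", "DEP"):
            return "DEP"      # weight released, motion still nearby
        return "ARR"          # woken from null, or still approaching
    return None               # assumption: no sensor tripped means null


# Hypothetical (weight, motion) readings, one per second.
readings = [
    (False, True),   # motion wakes the state machine
    (False, True),
    (True, True),    # someone steps onto the platform
    (True, False),
    (False, True),   # they step off
    (True, False),   # someone else steps on right away
    (False, False),  # everything goes quiet
]

state, logged = None, []
for weight, motion in readings:
    state = next_state(state, weight, motion)
    logged.append(state)

print(logged)
```

In this run the state goes from "DEP" straight back to "OCC" without passing through "ARR", which is the kind of sequence to expect when several people are near the sculpture.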

There are, of course, ambiguities in the interpretation. Someone may have left the platform while someone else entered. Sometimes it may flip from "DEParted" straight to "OCCupied" again without hitting "ARRival", for example when many people were milling around the sculpture.

We are interested in overall trends, especially the following:

    - on how many occasions was the platform occupied?
    - how long was the total time of occupation?
    - how long was the average time of occupation?
    - what was the ratio of "ARR" + "DEP" times to the "OCC" time?
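
A minimal sketch of how these four quantities could be computed, assuming a gap-free log with one row per second. The sample states are made up, and the variable names are illustrative rather than taken from the analysis below:

```python
import pandas as pd

# Hypothetical one-row-per-second log of state labels.
idx = pd.date_range("2016-05-10 15:53:00", periods=10, freq="s")
states = pd.Series(
    ["ARR", "OCC", "OCC", "OCC", "DEP", "ARR", "OCC", "OCC", "OCC", "DEP"],
    index=idx,
)
occupied = states == "OCC"

# A new occasion starts wherever occupied flips from False to True.
starts = occupied & ~occupied.shift(1, fill_value=False)
n_occasions = int(starts.sum())

# With one row per second, counting rows gives durations in seconds.
total_seconds = int(occupied.sum())
mean_seconds = total_seconds / n_occasions

# Ratio of "ARR" + "DEP" time to "OCC" time.
other_seconds = int(states.isin(["ARR", "DEP"]).sum())
ratio = other_seconds / total_seconds

print(n_occasions, total_seconds, mean_seconds, ratio)
```

Counting False-to-True flips treats back-to-back visitors as a single occasion, so the number of occasions is best read as a lower bound on the number of visits.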

In [124]:
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
import numpy as np
output_notebook()

In [125]:
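# get data: one timestamp column plus one column per logged state
df = pd.read_csv(
    '../data/Ourself_datalog_KROC_03-13-2017.TXT',
    names=['dateTime', 'ARR', 'OCC', 'DEP'],
    dtype={'dateTime': object, 'ARR': str, 'OCC': str, 'DEP': str},
)
# empty fields become 0 so the comparisons below evaluate to False
df.fillna(0, inplace=True)

# one boolean flag per state
df['arrived'] = df.ARR == 'ARR'
df['occupied'] = df.OCC == 'OCC'
df['departed'] = df.DEP == 'DEP'

# index by timestamp so the frame can be sliced and resampled by time
df.index = pd.to_datetime(df.dateTime)

df = df[['arrived', 'occupied', 'departed']]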

In [126]:
# sanity check: rows flagged as both departed and occupied
df[df.departed & df.occupied]

dateTime,arrived,occupied,departed
2016-05-10 15:53:01,True,True,True
2016-05-10 15:53:03,True,True,True
2016-05-10 15:53:04,True,True,True
2016-05-10 15:53:05,True,True,True
2016-05-10 15:53:17,True,True,True
2016-05-10 15:53:18,True,True,True
2016-05-10 15:53:19,True,True,True
2016-05-10 15:53:21,True,True,True
2016-05-10 15:53:25,True,True,True
2016-05-10 15:53:28,True,True,True


In [127]:
pd.crosstab(df.arrived, columns = [df.occupied, df.departed])

occupied,False,False,True,True
departed,False,True,False,True
arrived,,,,
False,4563,321262,2797961,1
True,1581349,10,0,15


In [128]:
df.loc['2016-05-11 15:10'].resample('30S').mean().head()

dateTime,arrived,occupied,departed
2016-05-11 15:10:00,True,False,False
2016-05-11 15:10:30,True,False,False


In [200]:
full_idx = pd.date_range(start=df.index.min().strftime('%Y-%m-%d %H:%M'), end = df.index.max(), freq='T')
full_idx

DatetimeIndex(['2016-05-06 16:53:00', '2016-05-06 16:54:00',
               '2016-05-06 16:55:00', '2016-05-06 16:56:00',
               '2016-05-06 16:57:00', '2016-05-06 16:58:00',
               '2016-05-06 16:59:00', '2016-05-06 17:00:00',
               '2016-05-06 17:01:00', '2016-05-06 17:02:00',
               ...
               '2016-12-23 14:33:00', '2016-12-23 14:34:00',
               '2016-12-23 14:35:00', '2016-12-23 14:36:00',
               '2016-12-23 14:37:00', '2016-12-23 14:38:00',
               '2016-12-23 14:39:00', '2016-12-23 14:40:00',
               '2016-12-23 14:41:00', '2016-12-23 14:42:00'],
              dtype='datetime64[ns]', length=332510, freq='T')

In [251]:
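# Flag each 30-second bin in which a state was seen at least once, then
# reindex onto the one-minute grid. Bins labelled at :30 are not in
# full_idx and are dropped; minutes with no data become NaN.
df_rs = df.resample('30S', label = 'left').max().reindex(full_idx)
# minutes with no data count as unoccupied
df_rs['occupied'] = df_rs.occupied.fillna(0)
# blank out zeros so that only arrivals and departures are drawn as markers
df_rs['arrived'] = df_rs.arrived.replace(0, np.nan)
df_rs['departed'] = df_rs.departed.replace(0, np.nan)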

In [214]:
def plot_day(dt, datf=df_rs):
    # select the rows for the requested day from the frame passed in
    date_df = datf.loc[dt]
    
    TOOLS = 'box_zoom, reset'
    
    # create a new plot with a title and axis labels
    p = figure(tools=TOOLS, width=900, height=150, x_axis_type="datetime", title = 'Activity on {}'.format(dt))
    p.grid.grid_line_alpha=0

    # add a line renderer with legend and line thickness
    p.line(date_df.index, date_df.occupied, line_width=1, color = 'grey', legend = 'Occupied')
    
    p.square(date_df.index, date_df.arrived - 1.05, color = 'teal', legend = 'Arrive')
    p.square(date_df.index, date_df.departed - 1.05, color = 'firebrick', legend = 'Depart')

    # show the results
    show(p)
    # ts_arr = date_df[date_df.arrived == True].index


date_list = [x.strftime("%Y-%m-%d") for x in pd.date_range(start='2016-05-20', end = '2016-05-25', freq='D')]

for dt in date_list:
    plot_day(dt)
    

In [229]:
# work on a copy so that df_rs itself is not modified
df_diff = df_rs.copy()
# occupancy in the previous one-minute bin
df_diff['occupied_ts'] = df_diff.occupied.fillna(0).shift(1)

In [230]:
df_diff['occupied'] = df_diff.occupied - df_diff.occupied_ts

In [231]:
plot_day('2016-05-25', datf=df_diff)

In [241]:
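# `occupied` now holds the minute-to-minute change computed above:
# +1 marks the start of an occupied spell, -1 the end
df_diff = df_diff.occupied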

In [245]:
# number of times the platform went from unoccupied to occupied
(df_diff == 1).sum()

8000

In [253]:
df_rs.occupied.sum(skipna=True)

55228.0