## **Final Project - Predicting Pedestrian Crossings**
### Cindy Chen
#### Last Updated: 5/30/2022


## **Project Overview:**

### *Question 1: Describe in detail the substantive problem you are interested in working on for this assignment*

---

My machine learning problem of interest is predicting whether a pedestrian will cross the street in front of a vehicle, based on footage from self-driving vehicles.  I selected this problem for its salient and serious nature: the growing adoption of computer vision technology in self-driving cars heightens the risk of pedestrian casualties and injuries, especially while the technology is still nascent.  The critical consequences of an autonomous vehicle hitting even a single pedestrian demand acute attention to pedestrian safety.  Accordingly, the ability to accurately predict whether someone is about to cross the street in front of a self-driving car is extremely important.  When I was given this question for an internship application's technical evaluation, I had not yet worked with sequential data for classification.  I would therefore like to re-approach this classification problem with my newly acquired knowledge of Recurrent Neural Nets (RNNs).

### *Question 2: Describe your data (including the features, target, and the general type of data you are working with)*
---
This project focuses on **annotated** sequential video data about pedestrian crossings. Specifically, videos of pedestrians crossing a street have been annotated frame by frame with bounding box dimensions and positions, predictive labels such as "hand wave" and "look" when they occur in the frame, and outcome labels for whether the pedestrian is crossing in the frame (as well as whether they crossed at any point in the video).  The data set therefore consists of text, bounding box coordinates, and numeric data; no image data is involved since it has all been translated into annotations.

The dataset for this project comes from the Joint Attention in Autonomous Driving (JAAD) project from York University, which is a publicly-available dataset aimed at helping train models on pedestrian and driver behaviors, as well as factors that motivate them.

Each row of the dataframe consists of metadata (the video id and a specific pedestrian id), as well as ordered lists with one entry per frame in which that pedestrian appears.

**FEATURES:**
* frame_numbers - the frame (starting from 0) of the annotated video

* bounding_boxes - Series of boxes that align with the frame_numbers field.  Each box is constructed of [box x, box y, box width, box height], where x and y represent the upper left hand corner of the box

* moving slow, stopped, handwave, look, clear path, moving fast, looking, standing, slow down, nod, speed up - Annotated pedestrian attributes whose values are binary (1 or 0) based on whether that action occurred in the frame.

* crossing - Whether the pedestrian is crossing in the specific frame. I include this as an independent variable because drivers usually know whether someone has already started crossing the street.

**TARGET:**

* crossing - Whether or not the pedestrian is crossing in the specific frame.  My target variable will be whether the pedestrian is crossing in the NEXT frame.
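
As a minimal sketch of the target definition above (illustrative only, using a made-up `crossing` sequence rather than JAAD data), the label attached to each frame is the `crossing` value one frame later, while the current value stays available as a feature:

```python
# Hypothetical per-frame crossing flags for one pedestrian (not real JAAD data)
crossing = [0, 0, 0, 1, 1, 1]

# Features come from frames 0..n-2; the target is crossing at frames 1..n-1
current_crossing = crossing[:-1]   # used as an input feature
next_crossing = crossing[1:]       # used as the prediction target

for t, (now, nxt) in enumerate(zip(current_crossing, next_crossing)):
    print(f"frame {t}: crossing now={now}, target (frame {t + 1})={nxt}")
```

The last frame has no "next" frame, so it can only ever serve as a target, never as an input.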

In [1]:
%pip install keras_tuner



In [2]:
import ast
import pandas as pd
import numpy as np
import tensorflow as tf

# preprocessing
from imblearn.over_sampling import RandomOverSampler
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# for reproducibility
tf.random.set_seed(20220429)

# RNN
from tensorflow.keras.optimizers import Adam, Adagrad, Adadelta
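# import from tensorflow.keras throughout so that layers, models, and optimizers come from the same package
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import LSTM, Dense, Dropout, Flatten, Activation, BatchNormalization, GlobalMaxPooling1D, SimpleRNN, Bidirectional, GRU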
from sklearn.model_selection import train_test_split
from tensorflow.python.keras.utils import np_utils

# keras tuner
import keras_tuner as kt

In [3]:
# read the data
# data had to be converted to excel format because there is an issue reading the data as a csv
pedestrians_df = pd.read_excel('pedestrian_df_excel.xlsx', sheet_name = 'pedestrian_df')

In [4]:
# apply function over each column to remove extra quotations in the data

for col_name in ['bounding_boxes',
                 'frame_numbers',
                 'moving slow',
                 'stopped',
                 'handwave',
                 'look',
                 'clear path',
                 'crossing',
                 'moving fast',
                 'looking',
                 'standing',
                 'slow down',
                 'nod',
                 'speed up']:
                 pedestrians_df[col_name] = pedestrians_df[col_name].apply(ast.literal_eval)

In [5]:
# view the original nested data frame
pedestrians_df.head()

Unnamed: 0,video_id,ped_ind,frame_numbers,bounding_boxes,moving slow,stopped,handwave,look,clear path,crossing,moving fast,looking,standing,slow down,nod,speed up,cross_overall
0,video_0071,1,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[[1209, 598, 51, 191], [1214, 598, 52, 192], [...","[False, False, False, False, False, False, Fal...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",False
1,video_0071,2,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[[1249, 621, 51, 127], [1254, 620, 51, 129], [...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",True
2,video_0204,1,"[3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, ...","[[1135, 673, 28, 97], [1139, 672, 29, 92], [11...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",True
3,video_0204,3,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[[906, 670, 35, 65], [906, 672, 32, 65], [907,...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",True
4,video_0204,2,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[[1152, 657, 42, 114], [1158, 657, 42, 117], [...","[False, False, False, False, False, False, Fal...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[True, True, True, True, True, True, True, Tru...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",False


In [6]:
# Let's take a more in-depth look at that first row:
print(pedestrians_df.iloc[0])

video_id                                                 video_0071
ped_ind                                                           1
frame_numbers     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
bounding_boxes    [[1209, 598, 51, 191], [1214, 598, 52, 192], [...
moving slow       [False, False, False, False, False, False, Fal...
stopped           [True, True, True, True, True, True, True, Tru...
handwave          [False, False, False, False, False, False, Fal...
look              [False, False, False, False, False, False, Fal...
clear path        [False, False, False, False, False, False, Fal...
crossing          [False, False, False, False, False, False, Fal...
moving fast       [False, False, False, False, False, False, Fal...
looking           [False, False, False, False, False, False, Fal...
standing          [False, False, False, False, False, False, Fal...
slow down         [False, False, False, False, False, False, Fal...
nod               [False, False, False, False, F

### *Question 3: Describe why you think Deep Learning systems would be a good methodological choice, and the specific architecture(s) you think appropriate to explore*
---

Deep learning is well suited to the annotated pedestrian crossing data because the predictions rest on sequential/historical data.  After all, a driver's (or autonomous vehicle's) decision to slow down, stop, maintain the same speed, or speed up upon seeing a pedestrian has much to do with the pedestrian's sequence of observed actions. Order matters!  Are they stationary and staring at the car, or do they keep waving to signal their intention to cross? If they wave intermittently, does that mean they are about to cross?

Basic classification models like random forests and logistic regressions can neither efficiently nor effectively incorporate history or sequences when predicting what will happen next.  Accordingly, I plan to use a **Recurrent Neural Network**, which incorporates sequential information and can drop information as needed.  Within the realm of RNNs, I plan to experiment with different cells, such as LSTMs and GRUs, when running my models.
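
To make the "order matters" point concrete, here is a small NumPy sketch (not the models used later, and with made-up features and random weights). It feeds the same three frames in two different orders to an order-blind summary and to a bare-bones recurrent cell of the form h_t = tanh(W_x x_t + W_h h_{t-1}):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy sequences containing the same frames in a different order.
# Each frame has 2 hypothetical binary features: [handwave, look]
seq_a = np.array([[1, 0], [0, 0], [0, 1]], dtype=float)
seq_b = seq_a[::-1]  # same frames, reversed order

# Order-blind summary (roughly what a model without sequence awareness sees)
bag_a, bag_b = seq_a.sum(axis=0), seq_b.sum(axis=0)

# Minimal recurrent cell with random, untrained weights
W_x = rng.normal(size=(3, 2))
W_h = rng.normal(size=(3, 3))

def final_hidden_state(seq):
    """Run the recurrence over the sequence and return the last hidden state."""
    h = np.zeros(3)
    for x_t in seq:
        h = np.tanh(W_x @ x_t + W_h @ h)
    return h

h_a, h_b = final_hidden_state(seq_a), final_hidden_state(seq_b)
print("order-blind summaries equal:", np.array_equal(bag_a, bag_b))
print("recurrent states equal:", np.allclose(h_a, h_b))
```

The summed features cannot tell the two sequences apart, whereas the recurrent state depends on the order in which the frames arrive.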

### *Question 4: Show the code and outputs of 3-5 different experiments with variants of your proposed architecture(s)*
---
### Step 1. Preprocess data

#### **NOTE: PREPROCESSING THE DATA WAS THE MOST DIFFICULT PART**

Add padding to my data set so that each annotated video is the same length (930 frames)

* I chose to pad at the beginning of the video

In [7]:
# length (in frames) of the longest pedestrian track; every padded array must match this length
max_ped_len = 930

for row_id in range(0, 666, 1):
  
  for col_name in ['moving slow', 'stopped', 'handwave', 'look', 'clear path', 
                 'crossing', 'moving fast', 'looking', 'standing', 'slow down', 'nod', 'speed up']:
                   test_len = pedestrians_df.iloc[row_id][col_name]
                   selected_ped_len = len(test_len)
                   
                   padded_array = np.pad(test_len, (max_ped_len - selected_ped_len,0), mode = "constant")[0:930]
                   
                   # replace original values with padded data
                   pedestrians_df.at[row_id, col_name] = padded_array

  # reset all the frame numbers
  pedestrians_df.at[row_id, 'frame_numbers'] = np.arange(0, 930, 1)

  # create fake bounding boxes for padding
  test_len = pedestrians_df.iloc[row_id]['bounding_boxes']
  selected_ped_len = len(test_len)
  padded_array = np.pad(test_len, [(max_ped_len - selected_ped_len, 0), (0, 0)], mode = "constant")
  pedestrians_df.at[row_id, 'bounding_boxes'] = padded_array
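
The padding step above can be sketched on a toy example. This is only an illustration with made-up values and a stand-in maximum length of 6 instead of 930; it shows how `np.pad` adds entries at the front of a 1-D label list and rows at the front of a 2-D bounding box array:

```python
import numpy as np

max_len = 6  # stand-in for the 930-frame maximum

# 1-D per-frame label: pad on the left only
labels = np.array([True, True, False])
padded_labels = np.pad(labels, (max_len - len(labels), 0), mode="constant")

# 2-D bounding boxes [x, y, width, height]: pad rows at the front, leave columns alone
boxes = np.array([[10, 20, 5, 8], [11, 20, 5, 8], [12, 21, 5, 9]])
padded_boxes = np.pad(boxes, [(max_len - len(boxes), 0), (0, 0)], mode="constant")

print(padded_labels)
print(padded_boxes)
```

With `mode="constant"` the default fill value is 0, which becomes `False` for boolean labels and an all-zero box for the bounding boxes.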

In [8]:
# preview the new data frame (I expect to see lots of zeros and False values in the first part of the preview since this ensures that my
# bounding boxes are appearing correctly)
pedestrians_df.head()

Unnamed: 0,video_id,ped_ind,frame_numbers,bounding_boxes,moving slow,stopped,handwave,look,clear path,crossing,moving fast,looking,standing,slow down,nod,speed up,cross_overall
0,video_0071,1,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",False
1,video_0071,2,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",True
2,video_0204,1,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0,...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",True
3,video_0204,3,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0,...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",True
4,video_0204,2,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...","[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...","[False, False, False, False, False, False, Fal...",False


In [9]:
# take only the first 400 pedestrian rows due to RAM capacity constraints while using Google Colab Pro
pedestrians_df = pedestrians_df[0:400]

### Step 2. Unravel the data frame

The data frame is nested, so I must unravel it until each row represents a single frame of a single pedestrian. Certain data points, such as the bounding box values, must also be separated into their own columns.

In [10]:
def unravel_dataframe(input_dataframe):
  # incorporate bounding boxes last
  attributes = ['frame_numbers', 'moving slow', 'stopped', 'handwave', 'look', 'clear path', 'crossing', 'moving fast',
              'looking', 'standing', 'slow down', 'nod', 'speed up']
              
  new_ped = pd.DataFrame() #create empty data frame
  final_df = pd.DataFrame() #create empty data frame
  bounding = pd.DataFrame() #create empty data frame
  
  # run loop to unravel the nested lists and format as a data frame
  for i in range(len(input_dataframe)):
    
    new_ped = pd.DataFrame() #clear dataframe
    
    for j in attributes:
      # convert a nested list into a data frame column specific to a single pedestrian
      new_column = pd.DataFrame(input_dataframe.iloc[i][j]) 
      
      # concatenate the column to the other columns specific to a single pedestrian
      new_ped = pd.concat([new_ped, new_column], axis = 1) 

    # convert existing pedestrians_df columns for concatenation
    input_dataframe['video_id'] = input_dataframe['video_id'].astype(str)
    input_dataframe['ped_ind'] = input_dataframe['ped_ind'].astype(str)

    # create a unique pedestrian id based on the video ID and the pedestrian_ind #
    new_ped['unique_pedestrian'] = input_dataframe.iloc[i]['video_id']  + "_" + input_dataframe.iloc[i]['ped_ind']
    
    # incorporate bounding box dimensions
    bounding = pd.DataFrame(input_dataframe.iloc[i]['bounding_boxes'])
  
    new_ped = pd.concat([new_ped, bounding], axis = 1)
    final_df = pd.concat([final_df, new_ped]) # append these concatenated columns to my final data frame (DataFrame.append was removed in pandas 2.0)

  # rename certain columns
  final_df.columns = ['frame_numbers', 'moving slow', 'stopped', 'handwave', 'look', 'clear path', 'crossing', 'moving fast',
                    'looking', 'standing','slow down', 'nod', 'speed up', 'unique_pedestrian', 'bb_x', 'bb_y', 'bb_width', 'bb_height']

  # rearrange my columns and reset my index
  final_df = final_df[['unique_pedestrian', 'frame_numbers', 'bb_x', 'bb_y', 'bb_width', 'bb_height', 'moving slow', 'stopped', 'handwave', 'look', 'clear path',
                     'moving fast', 'looking', 'standing','slow down', 'nod', 'speed up', 'crossing']].reset_index(inplace = False)


  # remove the index column that was created
  final_df = final_df.drop(["index"], axis = 1, inplace = False)

  # create a new data frame, which helps us keep track of different versions of the data frame
  df_with_numeric = pd.DataFrame()

  # replace the boolean values with 1.0 or 0.0 since I want to run calculations on these values
  for i in final_df.columns:
    df_with_numeric[i] = final_df[i].replace({True: 1.0, False: 0.0})

  # alter column names that have spaces in them to include underscores instead
  df_with_numeric.columns = [c.replace(' ', '_') for c in df_with_numeric.columns]

  return(df_with_numeric)

In [11]:
df_with_numeric = unravel_dataframe(pedestrians_df)

In [12]:
# review the current status of the data frame
df_with_numeric.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 372000 entries, 0 to 371999
Data columns (total 18 columns):
 #   Column             Non-Null Count   Dtype  
---  ------             --------------   -----  
 0   unique_pedestrian  372000 non-null  object 
 1   frame_numbers      372000 non-null  int64  
 2   bb_x               372000 non-null  float64
 3   bb_y               372000 non-null  float64
 4   bb_width           372000 non-null  float64
 5   bb_height          372000 non-null  float64
 6   moving_slow        372000 non-null  float64
 7   stopped            372000 non-null  float64
 8   handwave           372000 non-null  float64
 9   look               372000 non-null  float64
 10  clear_path         372000 non-null  float64
 11  moving_fast        372000 non-null  float64
 12  looking            372000 non-null  float64
 13  standing           372000 non-null  float64
 14  slow_down          372000 non-null  float64
 15  nod                372000 non-null  float64
 16  sp

In [13]:
# preview the end of the data frame
df_with_numeric.tail()

Unnamed: 0,unique_pedestrian,frame_numbers,bb_x,bb_y,bb_width,bb_height,moving_slow,stopped,handwave,look,clear_path,moving_fast,looking,standing,slow_down,nod,speed_up,crossing
371995,video_0137_0,925,1563.0,578.777778,162.444444,500.669591,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
371996,video_0137_0,926,1591.0,577.333333,164.333333,502.377193,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
371997,video_0137_0,927,1619.0,575.888889,166.222222,504.084795,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
371998,video_0137_0,928,1647.0,574.444444,168.111111,505.792398,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
371999,video_0137_0,929,1675.0,573.0,170.0,507.5,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0


Convert the unnested rows into arrays, row by row, and then group them by unique pedestrian. Each of the 400 pedestrians yields 64 rolling windows of 99 frames, and each frame has 16 features (the 15 predictors plus the current `crossing` value).

Start a new window only every 10 frames, which keeps the data set manageable without compromising the variety of sequences.

In [14]:
# define empty arrays
bigger_frame2 = []
X2 = []
frames2 = []
stop_pt = 0 # define the stopping point as a default

for single_pedestrian in df_with_numeric['unique_pedestrian'].unique():
  # define absolute stop points
  abs_start_pt = df_with_numeric.unique_pedestrian.ne(single_pedestrian).idxmin() + 200
  abs_stop_pt = abs_start_pt + 929 - 200 # stopping pt is 929 frames not 930 as the 930th crossing var will be predicted

  start_pt = abs_start_pt # set the first start point

  while stop_pt < abs_stop_pt:
    stop_pt = start_pt + 100
    frames2 = []
  
    for i in range(start_pt, stop_pt-1): # stop_pt is exclusive in the range
      new_array = np.array(df_with_numeric.loc[i, 'bb_x':'crossing']) # include crossing as a variable
      new_array = np.reshape(new_array, -1)
      frames2 = np.append(frames2, new_array)
  
    frames2 = frames2.reshape(99, 16)

    bigger_frame2 = np.array(frames2, dtype = float)
    X2 = np.append(X2, bigger_frame2)
    
    # skip 10 frames so that our training set can cover more ground
    start_pt = start_pt + 10

In [15]:
X2 = X2.reshape(64*400, 99, 16) # 64 * 400 videos

In [55]:
# define the Y variable
y_prelim = np.array(df_with_numeric['crossing'], dtype=int) # isolate 'crossing' variable
new_y = [] # create new empty array which will house the edited sequence
new_y_array = []
y2 = []

for single_pedestrian in df_with_numeric['unique_pedestrian'].unique():
  
  # define absolute stop points where I am extracting y-value
  abs_start_pt = df_with_numeric.unique_pedestrian.ne(single_pedestrian).idxmin() + 299 # take the 300th value (because each sequence is now 100 frames but I skip the first 200 frames)
  abs_stop_pt = abs_start_pt + (929 - 299) # this takes the 930th value from each
  copy_val = abs_start_pt

  while copy_val <= abs_stop_pt:
    new_y = y_prelim[copy_val]
    new_y_array = np.array(new_y, dtype = int)
    y2 = np.append(y2, new_y_array)
    copy_val = copy_val + 10
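
For reference, the same windowing can be sketched without explicit Python loops. This is an alternative illustration, not the code that produced the arrays used below. It assumes one pedestrian's frames are available as a `(930, 16)` array whose last column is `crossing`, and it uses a synthetic stand-in array named `frames`:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

n_frames, n_features = 930, 16
window, step, first_start = 99, 10, 200

# Stand-in for one pedestrian's (930, 16) block; the last column plays the role of `crossing`
frames = np.arange(n_frames * n_features, dtype=float).reshape(n_frames, n_features)

# All length-99 windows along the time axis -> (832, 16, 99); move time back to axis 1
all_windows = sliding_window_view(frames, window, axis=0).transpose(0, 2, 1)

# Keep windows starting at frame 200, 210, ..., 830 so that a next-frame target always exists
starts = np.arange(first_start, n_frames - window, step)
X_ped = all_windows[starts]            # (64, 99, 16)
y_ped = frames[starts + window, -1]    # `crossing` in the frame right after each window

print(X_ped.shape, y_ped.shape)
```

Each window covers frames `start` to `start + 98`, and its label is read from frame `start + 99`, which mirrors the start and stop points used in the loops above.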

### Step 3. Create training and test sets and scale data

In [17]:
# preview the X array that we have created for this problem
print("Length of X2:", len(X2))
print("Shape of X2:", X2.shape)
print("Preview of X2:", X2[0])

Length of X2: 25600
Shape of X2: (25600, 99, 16)
Preview of X2: [[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]


In [18]:
print("Length of Y2:", len(y2))
print("Shape of Y2:", y2.shape)

Length of Y2: 25600
Shape of Y2: (25600,)


In [19]:
# train test split the X and Y data
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X2, y2, test_size=0.1, shuffle=True)

print(X_train2.shape, X_test2.shape, y_train2.shape, y_test2.shape)

(23040, 99, 16) (2560, 99, 16) (23040,) (2560,)
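
One caveat with a purely random split: windows from the same pedestrian start 10 frames apart and are 99 frames long, so neighboring windows share 89 frames and near-duplicates can land in both the training and test sets. The sketch below is one possible alternative, not what was run in this notebook. It holds out whole pedestrians instead, assuming the windows are stored pedestrian by pedestrian with 64 windows each (the names `groups`, `train_idx`, and `test_idx` are illustrative):

```python
import numpy as np

n_pedestrians, windows_per_ped = 400, 64

# Assumption: windows are stored pedestrian by pedestrian, 64 in a row for each
groups = np.repeat(np.arange(n_pedestrians), windows_per_ped)   # shape (25600,)

rng = np.random.default_rng(20220429)
shuffled_peds = rng.permutation(n_pedestrians)
test_peds = shuffled_peds[: n_pedestrians // 10]                # hold out 10% of pedestrians

test_mask = np.isin(groups, test_peds)
train_idx, test_idx = np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# With the real arrays this would index X2[train_idx], y2[train_idx], and so on
print(len(train_idx), len(test_idx))
```

scikit-learn's `GroupShuffleSplit` offers the same idea through the familiar splitter interface.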


In [45]:
X_train2[2]

array([[188.526316, 628.333333,  86.710526, ...,   0.      ,   0.      ,
          1.      ],
       [190.973684, 628.6     ,  85.805415, ...,   0.      ,   0.      ,
          1.      ],
       [193.421053, 628.866667,  84.939005, ...,   0.      ,   0.      ,
          1.      ],
       ...,
       [588.      , 637.      ,  57.      , ...,   0.      ,   0.      ,
          1.      ],
       [598.      , 638.      ,  57.      , ...,   0.      ,   0.      ,
          1.      ],
       [607.      , 638.      ,  58.      , ...,   0.      ,   0.      ,
          1.      ]])

In [51]:
df_with_numeric[313:325]

Unnamed: 0,unique_pedestrian,frame_numbers,bb_x,bb_y,bb_width,bb_height,moving_slow,stopped,handwave,look,clear_path,moving_fast,looking,standing,slow_down,nod,speed_up,crossing
313,video_0071_1,313,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
314,video_0071_1,314,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
315,video_0071_1,315,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
316,video_0071_1,316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
317,video_0071_1,317,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
318,video_0071_1,318,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
319,video_0071_1,319,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
320,video_0071_1,320,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
321,video_0071_1,321,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
322,video_0071_1,322,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Step 4. Run Models
####  Model #1: Recurrent Neural Net
> * uses Adadelta optimizer
> * smallest of the 4 models in terms of layers
> * no bidirectional layers

In [21]:
# defining the model

hidden_size = 400

dense_out_1 = 20
dense_out_2 = 1

model = Sequential()
model.add(SimpleRNN(32, return_sequences = True))
model.add(LSTM(hidden_size))
model.add(Dense(dense_out_1, activation="relu"))
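# sigmoid, not softmax: softmax over a single unit always outputs 1.0
# (the logged runs below were produced with softmax, hence the constant accuracy)
model.add(Dense(dense_out_2, activation="sigmoid"))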

model.compile(loss="binary_crossentropy", optimizer="adadelta", metrics=["accuracy"])

In [22]:
# fitting the network (This takes about a minute per epoch). Output indicates we should consider slowing down learning process with Dropout.

with tf.device('/device:GPU:0'): #"/GPU:0": Short-hand notation for the first GPU of your machine that is visible to TensorFlow.

  history = model.fit(
      X_train2,
      y_train2,
      epochs = 2,
      batch_size = 40,
      verbose = 2
  )

Epoch 1/2
576/576 - 32s - loss: 0.6433 - accuracy: 0.1829 - 32s/epoch - 55ms/step
Epoch 2/2
576/576 - 28s - loss: 0.4228 - accuracy: 0.1829 - 28s/epoch - 49ms/step


In [23]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn (SimpleRNN)      (40, 99, 32)              1568      
                                                                 
 lstm (LSTM)                 (40, 400)                 692800    
                                                                 
 dense (Dense)               (40, 20)                  8020      
                                                                 
 dense_1 (Dense)             (40, 1)                   21        
                                                                 
Total params: 702,409
Trainable params: 702,409
Non-trainable params: 0
_________________________________________________________________


In [24]:
# test score
model.evaluate(X_test2, y_test2)



[0.32437556982040405, 0.18671874701976776]
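
The training accuracy logged above is identical in both epochs (0.1829) even though the loss falls. One pattern that produces exactly this behavior is a single-unit output layer with a softmax activation: softmax normalizes across the units of a layer, so with only one unit the output is always 1.0 and every sample is predicted as the positive class. The plain-Python sketch below (not Keras code) shows the effect and contrasts it with a sigmoid:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic function mapping a single logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

for logit in (-3.0, 0.0, 3.0):
    print(f"logit={logit:+.1f}  softmax={softmax([logit])}  sigmoid={sigmoid(logit):.3f}")
```

When every prediction is the positive class, accuracy simply equals the share of positive labels in the data, regardless of what the network has learned.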

#### Model #2: Recurrent Neural Net
> * Uses adam optimizer
> * Incorporates dropout and recurrent dropout
> * No bidirectional layers (this seemed to reduce performance)
> * Uses 10% validation split

In [25]:
# defining the second model

model2 = Sequential()
model2.add(SimpleRNN(32, return_sequences = True, input_shape = X_train2.shape[-2:]))
model2.add(LSTM(hidden_size, dropout = 0.2, recurrent_dropout = 0.2, return_sequences = True))
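model2.add(SimpleRNN(32, return_sequences = True)) # input_shape is only needed on the first layer
model2.add(SimpleRNN(24, dropout = 0.1))
model2.add(Dense(100, activation="relu"))
# sigmoid, not softmax: softmax over a single unit always outputs 1.0
# (the logged runs below were produced with softmax, hence the constant accuracy)
model2.add(Dense(1, activation="sigmoid"))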

model2.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])



In [26]:
with tf.device('/device:GPU:0'): #"/GPU:0": Short-hand notation for the first GPU of your machine that is visible to TensorFlow.

  history2 = model2.fit(
      X_train2,
      y_train2,
      epochs=2,
      batch_size=40,
      validation_split = 0.1,
      verbose=2,
      shuffle=True)

Epoch 1/2
519/519 - 354s - loss: 0.2143 - accuracy: 0.1830 - val_loss: 0.1792 - val_accuracy: 0.1819 - 354s/epoch - 683ms/step
Epoch 2/2
519/519 - 345s - loss: 0.1828 - accuracy: 0.1830 - val_loss: 0.1757 - val_accuracy: 0.1819 - 345s/epoch - 665ms/step


In [27]:
model2.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn_1 (SimpleRNN)    (None, 99, 32)            1568      
                                                                 
 lstm_1 (LSTM)               (None, 99, 400)           692800    
                                                                 
 simple_rnn_2 (SimpleRNN)    (None, 99, 32)            13856     
                                                                 
 simple_rnn_3 (SimpleRNN)    (None, 24)                1368      
                                                                 
 dense_2 (Dense)             (None, 100)               2500      
                                                                 
 dense_3 (Dense)             (None, 1)                 101       
                                                                 
Total params: 712,193
Trainable params: 712,193
Non-tr

In [28]:
# evaluate test set results
model2.evaluate(X_test2, y_test2)



[0.17101629078388214, 0.18671874701976776]

#### Model #3: Recurrent Neural Net

> * Uses Adagrad optimizer
> * Incorporates more RNN layers with recurrent dropout and dropout
> * Includes Flatten, Dropout, and Batch Normalization transformations in the dense layers
> * Includes 2 bidirectional layers
> * No validation split

In [29]:
# defining the third model

model3 = Sequential()
model3.add(SimpleRNN(32, return_sequences = True))
model3.add(Bidirectional(LSTM(hidden_size, return_sequences = True)))
model3.add(SimpleRNN(64, dropout = 0.1, return_sequences = True))
model3.add(LSTM(hidden_size, dropout = 0.1, return_sequences = True))
model3.add(SimpleRNN(64, dropout = 0.1, return_sequences = True))
model3.add(LSTM(hidden_size, recurrent_dropout = 0.2, return_sequences = True))
model3.add(Bidirectional(LSTM(hidden_size, return_sequences = True)))
model3.add(Flatten())
model3.add(Dense(250, activation = "relu"))
model3.add(BatchNormalization())
model3.add(Dense(45, activation = "relu"))
model3.add(Dropout(0.1))
model3.add(Dense(1, activation="softmax"))

model3.compile(loss="binary_crossentropy", optimizer="adagrad", metrics=["accuracy"])



In [30]:
with tf.device('/device:GPU:0'): #"/GPU:0": Short-hand notation for the first GPU of your machine that is visible to TensorFlow.

  history3 = model3.fit(
      X_train2,
      y_train2,
      epochs=3,
      batch_size=30,
      verbose=2
  )

Epoch 1/3
768/768 - 300s - loss: 0.1894 - accuracy: 0.1829 - 300s/epoch - 390ms/step
Epoch 2/3
768/768 - 288s - loss: 0.1707 - accuracy: 0.1829 - 288s/epoch - 374ms/step
Epoch 3/3
768/768 - 287s - loss: 0.1654 - accuracy: 0.1829 - 287s/epoch - 374ms/step


In [31]:
model3.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn_4 (SimpleRNN)    (30, 99, 32)              1568      
                                                                 
 bidirectional (Bidirectiona  (30, 99, 800)            1385600   
 l)                                                              
                                                                 
 simple_rnn_5 (SimpleRNN)    (30, 99, 64)              55360     
                                                                 
 lstm_3 (LSTM)               (30, 99, 400)             744000    
                                                                 
 simple_rnn_6 (SimpleRNN)    (30, 99, 64)              29760     
                                                                 
 lstm_4 (LSTM)               (30, 99, 400)             744000    
                                                      

In [32]:
# evaluate test set results
model3.evaluate(X_test2, y_test2)



[0.15172900259494781, 0.18671874701976776]

#### Model #4: Recurrent Neural Net
> * Uses a different number of units in each SimpleRNN layer, with the unit count decreasing from layer to layer
> * Back-to-back SimpleRNN layers
> * Uses an Adam optimizer
> * One epoch
> * 3 consecutive dense layers
> * No bidirectional layers
> * Incorporates a single GRU layer

In [33]:
# defining the fourth model

model4 = Sequential()
model4.add(SimpleRNN(240, return_sequences = True))
model4.add(SimpleRNN(124, dropout = 0.1, return_sequences = True))
model4.add(LSTM(90, recurrent_dropout = 0.2, return_sequences = True))
model4.add(GRU(40, return_sequences = True))
model4.add(LSTM(90, recurrent_dropout = 0.2, return_sequences = True))
model4.add(SimpleRNN(32, return_sequences = True))
model4.add(Flatten())
model4.add(Dense(250, activation = "relu"))
model4.add(BatchNormalization())
model4.add(Dense(120, activation = "relu"))
model4.add(Dense(1, activation="softmax"))

model4.compile(loss="binary_crossentropy", optimizer = "adam", metrics=["accuracy"])



In [57]:
with tf.device('/device:GPU:0'): #"/GPU:0": Short-hand notation for the first GPU of your machine that is visible to TensorFlow.

  history4 = model4.fit(
      X_train2,
      y_train2,
      epochs=1,
      batch_size=10,
      verbose=2,
      shuffle = True
  )

2304/2304 - 1131s - loss: 0.2029 - accuracy: 0.1829 - 1131s/epoch - 491ms/step


In [58]:
model4.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn_7 (SimpleRNN)    (10, 99, 240)             61680     
                                                                 
 simple_rnn_8 (SimpleRNN)    (10, 99, 124)             45260     
                                                                 
 lstm_6 (LSTM)               (10, 99, 90)              77400     
                                                                 
 gru (GRU)                   (10, 99, 40)              15840     
                                                                 
 lstm_7 (LSTM)               (10, 99, 90)              47160     
                                                                 
 simple_rnn_9 (SimpleRNN)    (10, 99, 32)              3936      
                                                                 
 flatten_1 (Flatten)         (10, 3168)               

In [59]:
# evaluate test set results
model4.evaluate(X_test2, y_test2)



[0.18910416960716248, 0.18671874701976776]
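
The step counts in the training logs for Models #3 and #4 (768 steps at batch size 30, 2304 steps at batch size 10) can be cross-checked against each other, since the number of steps per epoch is the sample count divided by the batch size, rounded up. The sketch below is only a sanity check: the helper names are hypothetical, and the training-set size is inferred from the logs rather than read from the notebook.

```python
import math

def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of batches per epoch; the last batch may be partial."""
    return math.ceil(n_samples / batch_size)

def sample_range(steps: int, batch_size: int) -> range:
    """All sample counts that would produce exactly `steps` batches."""
    return range((steps - 1) * batch_size + 1, steps * batch_size + 1)

# Step counts and batch sizes taken from the Model #3 and Model #4 logs.
model3_range = sample_range(768, 30)
model4_range = sample_range(2304, 10)

# Training-set sizes consistent with both logs at once.
low = max(model3_range.start, model4_range.start)
high = min(model3_range.stop, model4_range.stop) - 1
print(f"training samples between {low} and {high}")
```

Both logs are consistent with the same training set, which fits the fact that neither model uses a validation split.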

#### Model #5: Experiment with Keras Tuner


In [60]:
# define the model-building function for Keras Tuner

def build_model(hp):
  # hp.Choice returns the same value every time the same name ('units') is requested,
  # so all three SimpleRNN layers share one unit count per trial
  units = hp.Choice('units', [8, 16, 32])
  model5 = Sequential()
  model5.add(SimpleRNN(units, return_sequences = True, input_shape = X_train2.shape[-2:]))
  model5.add(LSTM(hidden_size, dropout = 0.2, recurrent_dropout = 0.2, return_sequences = True))
  # input_shape is only needed on the first layer
  model5.add(SimpleRNN(units, return_sequences = True))
  model5.add(SimpleRNN(units, dropout = 0.1))
  model5.add(Dense(100, activation="relu"))
  model5.add(Dense(1, activation="softmax"))

  model5.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
  return model5


In [61]:
tuner = kt.RandomSearch(
    build_model,
    objective='loss',
    max_trials=5)

INFO:tensorflow:Reloading Oracle from existing project ./untitled_project/oracle.json


In [62]:
tuner.search(X_train2, y_train2, epochs=5, validation_split = 0.1)
best_model = tuner.get_best_models()[0]

Trial 2 Complete [00h 20m 26s]
loss: 0.17017751932144165

Best loss So Far: 0.17017751932144165
Total elapsed time: 00h 40m 52s
INFO:tensorflow:Oracle triggered exit


In [63]:
best_model.evaluate(X_test2, y_test2)



[0.16210821270942688, 0.18671874701976776]

### *Question 5: Select the model that best fits the problem you are trying to solve and discuss why it fulfills this purpose*

After running five (5) models, I consider my second model to be the best, based on its classification performance against my test set and its simplicity.  Because all of my models obtain identical test accuracy (about 0.187), simplicity is the deciding factor.  The identical scores also guide my next steps: refining my RNN architectures to see if I can increase performance and generate more variance in my metrics.
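
A possible explanation for the identical accuracy, worth checking before refining the architectures, is the output layer. Models #3 through #5 end in `Dense(1, activation="softmax")`, and a softmax over a single value is always exactly 1.0, whatever the input. If predictions are thresholded at 0.5, every sample is then predicted as the positive class, and accuracy equals the share of positive labels regardless of what the network has learned. The sketch below illustrates this with plain Python; the pre-activation values and labels are hypothetical, and `sigmoid` is shown only as the usual alternative for a single-unit binary output, not as something the notebook uses.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of values."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def accuracy(probs, labels, threshold=0.5):
    preds = [1 if p > threshold else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Hypothetical pre-activation outputs of a single-unit layer, one per sample.
raw_outputs = [-3.2, -0.5, 0.0, 1.7, 4.1]
# Hypothetical labels with a minority positive class.
labels = [0, 0, 1, 0, 0]

# Softmax is applied per sample over that sample's single output unit.
softmax_probs = [softmax([z])[0] for z in raw_outputs]
sigmoid_probs = [sigmoid(z) for z in raw_outputs]

softmax_acc = accuracy(softmax_probs, labels)
positive_share = sum(labels) / len(labels)

print("softmax outputs:", softmax_probs)
print("sigmoid outputs:", [round(p, 3) for p in sigmoid_probs])
print("accuracy with softmax:", softmax_acc, "positive share:", positive_share)
```

If this is what is happening, the reported accuracy would simply reflect the fraction of crossing labels in each split, and swapping the final activation would be a cheap first experiment.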

I used test set accuracy as the primary metric for comparing my models, because performance on unseen data is extremely important, especially in the context of recognizing pedestrian crossings.

All five models are RNNs; I limited myself to this type of architecture because of the sequential nature of my data.  Within that constraint, I was able to experiment with layers specific to RNNs, such as LSTMs, GRUs, and Bidirectional layers, as well as standard neural network layers like dropout and batch normalization.

Of course, I acknowledge that a test accuracy in the 0.18 range is far too low to be an appropriate solution for a serious issue like detecting pedestrian crossings.  After all, we need nearly 100% accuracy, since a pedestrian injury or fatality comes at an incredibly high cost.  However, I believe this is a compelling starting point for tackling the very serious challenge of detecting pedestrian crossing intent from the actions leading up to the crossing.

#### References
---
Rasouli, A., Kotseruba, I., and Tsotsos, J. K. (2017). "Agreeing to cross: How drivers and pedestrians communicate." IEEE Intelligent Vehicles Symposium (IV), pp. 264-269.
