# **W281_Fall_2022_Final_Report_Driver_Behavior_Detection**
## by Hoon Kim, Kai Ying, Ram Senthamarai, Dmitry Baron

## Overview
### Goal
The CDC reported that 3,100 people were killed and 424,000 were injured in crashes involving a distracted driver in 2019. Dashcams are now a common and mature technology. If we can use the videos and images they capture to classify distracted driver behaviors, we can help develop assistive tools that reduce the risk of driver distraction and improve road safety.

### Problem Statement
The main objective of our project is to design a classification algorithm that distinguishes ten types of driver behavior by identifying differences in the driver’s facial orientation and posture during each action: 1) Safe Driving, 2) Texting (right), 3) Phone Call (right), 4) Texting (left), 5) Phone Call (left), 6) Fiddling With Console, 7) Drinking, 8) Reaching Back, 9) Fixing Looks, 10) Conversing.

### Dataset
The dataset comes from a [Kaggle competition](https://www.kaggle.com/competitions/state-farm-distracted-driver-detection/data) hosted by State Farm. Kaggle provides both a training and a testing dataset. Because the testing dataset has no class labels, we excluded it from this class project.

The training dataset consists of 22,424 images captured at a fixed angle from an in-car dashcam. Each image is labeled with one of the 10 driver behaviors listed above and is a 640 x 480 RGB JPEG.


In [None]:
# Environment Config & Library Imports
%load_ext autoreload
%autoreload 2
%matplotlib inline


import time
import os
from tqdm.notebook import tqdm
from collections import defaultdict

import numpy as np
import pandas as pd

import seaborn as sns
from matplotlib import pyplot as plt
import matplotlib.ticker as ticker
import matplotlib

import cv2 as cv
import torch
from torchvision import transforms

import transformers
import eda_helpers
import feature_helpers
import viz
import configuration
import customdataset
import enums

device = 'cpu'
config = configuration.Configuration()
face_config = configuration.FaceConfig(config)
pose_config = configuration.PoseConfig(config)
vizualizer = viz.Vizualizer(config, face_config, pose_config, tqdm=tqdm)
feature_extractor = feature_helpers.FeatureExtractor(config, face_config, pose_config, tqdm)

IMAGE_TYPES = [enums.ImageTypes.ORIGINAL, enums.ImageTypes.POSE, enums.ImageTypes.FACE]


In [None]:
# Distribution of Training Data
vizualizer.plot_raw_class_counts()

In [None]:
# Display Sample Images
vizualizer.plot_samples(4)

Given the large amount of training data and our limited computing resources, we reduced the dataset by sampling 600 images from each of the ten classes, keeping only images in which an MTCNN face detector could detect the driver’s face. We then split the sampled data into train, validation, and test sets using an 80-10-10 split.
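The sampling and split described above can be sketched in plain Python. This is only an illustration under stated assumptions: the `sample_and_split` function, its input format (a filename-to-label dictionary), and the toy filenames are hypothetical and are not the project's `SampleSplitter`.

```python
import random
from collections import defaultdict


def sample_and_split(labels_by_file, per_class=600, fractions=(0.8, 0.1, 0.1), seed=0):
    """Sample `per_class` files per label, then split each class into train/validation/test.

    `labels_by_file` maps a filename to its class label (hypothetical input format).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for filename, label in sorted(labels_by_file.items()):
        by_class[label].append(filename)

    splits = {"train": [], "validation": [], "test": []}
    for label, files in sorted(by_class.items()):
        if len(files) < per_class:
            raise ValueError(f"class {label} has only {len(files)} usable images")
        chosen = rng.sample(files, per_class)  # sampling without replacement
        n_train = round(per_class * fractions[0])
        n_val = round(per_class * fractions[1])
        splits["train"] += [(f, label) for f in chosen[:n_train]]
        splits["validation"] += [(f, label) for f in chosen[n_train:n_train + n_val]]
        splits["test"] += [(f, label) for f in chosen[n_train + n_val:]]
    return splits


# Toy example: 10 classes with 700 face-detected images each.
toy = {f"c{c}_img{i}.jpg": f"c{c}" for c in range(10) for i in range(700)}
splits = sample_and_split(toy)
print({name: len(items) for name, items in splits.items()})
# {'train': 4800, 'validation': 600, 'test': 600}
```

Splitting inside each class keeps the three sets balanced, with 480, 60, and 60 images per class.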

In [None]:
%%time
# Extract face from each image. Takes a few hours to run.
def extract_faces():
    eda_helpers.FaceExtractor(config, tqdm).extract_faces(face_config.FEATURES_FOLDER, face_config.FACE_SUMMARY_NAME, config.ANNOTATION_FILE)
# extract_faces() # Commented out as this takes a long time to run on our 20,000+ images.


In [None]:
# Distribution of Face Identification
vizualizer.plot_faces_summary()

In [None]:
# Display the results of face detection.
vizualizer.display_faces(per_group_count = 3)

In [None]:
# Split data into train, validation, test
def split_dataset():
    splitter = eda_helpers.SampleSplitter(config, face_config, pose_config, tqdm=tqdm)
    splitter.sample(config.class_dict.keys(), samples_per_class=[600, 480, 60, 60], out_file=config.ANNOTATION_FILE)

# split_dataset()  # Commented out because this is a one-time task.



## Feature Extraction
### Facial, Hand, Torso Keypoint Position Features
#### Intuition
Many of the behaviors in our classes could be distinguished by the difference in the driver’s head orientation, hand position, and body orientation. For example, fiddling with the center console usually involves a driver stretching their right arm and having their head slightly lower and towards the right, trying to press buttons on the console.


In [None]:
# Sample image of Fiddling With Console (c6)


Since the camera angle is fixed, the driver’s orientation in the image is relatively stable within each class. Hence, the relative positions of the detected body parts are stable across images in which the driver is behaving the same way. This lets us capture differences that come from the behavior itself rather than from the noise that varying camera angles would introduce.

For driver facial orientation, we rely on key points of the driver’s eyes, nose, and lips. From the passenger-seat perspective, the more tightly the nose, lips, and eyes are clustered in the image, the more likely the driver is facing forward, because one of the eyes and parts of the nose and lips are then not clearly visible. Conversely, the more spread out these key points are, the more likely the driver has turned away from the road.

For hand position and body orientation, we rely on key points of the driver’s shoulders, elbows, wrists, and hips. From these key points, we can roughly trace the posture of the driver’s arms and torso.


#### Data Preprocessing
Each image was resized and zero-padded on the edges so that it keeps its aspect ratio while fitting the input size expected by our human pose estimation model.
We then passed each image to the [TensorFlow MoveNet model](https://www.tensorflow.org/hub/tutorials/movenet) for keypoint detection.
The model outputs the coordinates and confidence score of each detected key point.
Instead of aligning the detected body across images, we anchored on the detected nose and computed the position of every other keypoint relative to the nose coordinate.
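The two preprocessing steps above can be sketched with NumPy alone. The function names are hypothetical, the nearest-neighbour resize is a simple stand-in for a proper image resize, the 192-pixel target size is an assumption, and using per-axis (y, x) offsets from the nose rather than a single Euclidean distance is also an assumption. The nose is taken to be keypoint index 0, which matches MoveNet's output order.

```python
import numpy as np


def pad_to_square(image):
    """Zero-pad an H x W x C image on its shorter side so it becomes square."""
    h, w = image.shape[:2]
    side = max(h, w)
    top, left = (side - h) // 2, (side - w) // 2
    padded = np.zeros((side, side) + image.shape[2:], dtype=image.dtype)
    padded[top:top + h, left:left + w] = image
    return padded


def resize_nearest(image, size):
    """Nearest-neighbour resize of a square image to size x size."""
    idx = np.arange(size) * image.shape[0] // size
    return image[idx][:, idx]


def nose_relative_offsets(keypoints, nose_index=0):
    """Return flattened (dy, dx) offsets of every non-nose keypoint from the nose.

    `keypoints` is a (K, 2) array of (y, x) coordinates.
    """
    keypoints = np.asarray(keypoints, dtype=float)
    offsets = keypoints - keypoints[nose_index]
    return np.delete(offsets, nose_index, axis=0).ravel()


# Toy example: a 480 x 640 frame and three made-up keypoints (nose first).
frame = np.ones((480, 640, 3), dtype=np.uint8)
model_input = resize_nearest(pad_to_square(frame), 192)  # 192 is an assumed input size
print(model_input.shape)
print(nose_relative_offsets([[0.30, 0.50], [0.28, 0.52], [0.60, 0.40]]))
```

Because every offset is measured from the nose, a driver who sits slightly higher or further left in the frame produces nearly the same feature vector.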


#### Visualization
Below are some example images of how the human pose detection works on our training dataset.

In [None]:
%%time
# Extract pose using MoveNet
def extract_pose():
    original_backend = matplotlib.get_backend()
    print(f'Switching MatPlotLib backend from {original_backend} to Agg')
    # Pose extraction uses plt's canvas. We need a non-interactive backend to avoid memory leaks.
    matplotlib.use('Agg')
    pose_extractor = eda_helpers.PoseExtractor(config, face_config, pose_config, tqdm)
    pose_extractor.extract_poses(pose_config.FEATURES_FOLDER, pose_config.SUMMARY_NAME)
    matplotlib.use(original_backend)

# extract_pose()

In [None]:
# Examples of extracted pose
%matplotlib inline
# reset matplotlib's backend in case the previous cell left the Agg backend active.
vizualizer.display_poses(rows=3)

#### Evaluation
We can see clear differences across classes in the distributions of keypoint distance from the nose, which makes this feature a good candidate for our classifier. For example, the right wrist y-coordinates are much lower in the reaching back class than in the other classes. Similarly, for drinking and phone calls, the right wrist is mostly level with the nose.


In [None]:
# Plot distribution of keypoint features distance
vizualizer.plot_keypoints_relative_positions()

### Arm Angle Feature
#### Intuition
Here we again take advantage of the relatively stable body part detection that the fixed camera angle provides. The driver’s arm angles can help us detect distraction, for example by distinguishing a driver who has both hands on the wheel from one who is using the right hand to hold a phone or a beverage, operate the console, or reach for something on a seat. Our intuition is that safe driving falls into the former category (both hands on the steering wheel) and unsafe driving into the latter (one arm off the wheel).
Our goal in this feature extraction exercise is to locate the lower right and left arms and measure their angles.


#### Data Preprocessing
We first use TensorFlow’s pretrained body segmentation model “bodypix” to segment the left and right lower arms in the training data. The bodypix package color-codes different body parts, so knowing the colors of the lower right and left arms lets us isolate their pixel areas. We then estimated the center of each area to locate the arm, and calculated the arm’s angle by fitting a straight line through the area with a least-squares polynomial fit (numpy.polyfit).
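As a rough illustration of the line-fitting step, the sketch below takes a boolean mask of one lower arm and returns its center and angle. The `arm_angle` function and the synthetic mask are hypothetical. In the real pipeline the mask would come from selecting the arm's color in the bodypix output, which is not shown here.

```python
import numpy as np


def arm_angle(mask):
    """Return (center_x, center_y, angle_rad) for the pixels set in a boolean mask.

    The angle is that of the least-squares line y = m * x + b in image
    coordinates, where y grows downward. Returns None if no line can be fitted.
    """
    ys, xs = np.nonzero(mask)
    if xs.size < 2 or xs.max() == xs.min():
        return None  # arm not detected, or perfectly vertical
    slope, _intercept = np.polyfit(xs, ys, deg=1)
    return float(xs.mean()), float(ys.mean()), float(np.arctan(slope))


# Synthetic "arm": a 3-pixel-thick diagonal band in a 50 x 50 mask.
mask = np.zeros((50, 50), dtype=bool)
for i in range(10, 40):
    mask[i - 1:i + 2, i] = True

center_x, center_y, angle = arm_angle(mask)
print(center_x, center_y, np.degrees(angle))
```

The band descends one pixel per column, so the fitted angle is close to 45 degrees. Returning `None` for undetected arms is one way to produce the missing values that the box plots later drop.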


#### Visualization

Arms are identified and angles are calculated below.


In [None]:
# Visualization on detected arms

#### Evaluation
In the box plots we can see a distinguishable difference in angle and dispersion between class 0 (safe driving) and most of the other classes, especially for the right arm, which leads us to believe that this is a useful feature.
The angles of the left arm are less telling, but in combination with the right arm they should still be informative.


In [None]:
# Plot distribution of arm orientations

# `df` is assumed to hold one row per image with 'Class', 'LeftOrient' and 'RightOrient' columns.
def plot_arm_orientations(df, column, title):
    # Drop images where an arm was not detected, and build the boolean mask
    # from the same filtered frame so that mask and data share an index.
    df_without_na = df.dropna()
    class_tags = df['Class'].unique()
    orients = [df_without_na.loc[df_without_na['Class'] == tag, column].tolist() for tag in class_tags]
    ax = sns.boxplot(data=orients)
    ax.set_xticklabels(class_tags.tolist())
    ax.set(xlabel='Class tags', ylabel='Orientation (in rad)', title=title)
    plt.show()

# Box plot for the left arm
plot_arm_orientations(df, 'LeftOrient', 'Left arm orientations across different classes')
# Box plot for the right arm
plot_arm_orientations(df, 'RightOrient', 'Right arm orientations across different classes')


#### Other Trials and Errors
We also tried other features such as HOG, raw pixels, and Canny edge detection. These features require the images to be aligned, which is hard to achieve, so they performed poorly when we used them as-is.

We also tried eye detection, but it has low coverage because many images have objects covering part or all of the eyes (e.g., sunglasses).

In addition, we tried using a CNN to extract features. With our limited computational resources, this feature also performed poorly.

In [None]:
%%time
# Load images
# faces, original_images, poses, y, filenames = load_data(30, other_types=[enums.ImageTypes.ORIGINAL, enums.ImageTypes.POSE, enums.ImageTypes.FACE], included_labels=config.included_labels)
feature_extractor = feature_helpers.FeatureExtractor(config, face_config, pose_config, tqdm)
data = feature_extractor.load_data(image_types=IMAGE_TYPES, sample_type=enums.SampleType.TRAIN_TEST_VALIDATION, shuffle=True)

# load existing features and re-generate them only if needed.
features_list = feature_extractor.load_feature_vectors(config.FEATURE_VECTORS_FOLDER, data[enums.DataColumn.FILENAME.value], data[enums.DataColumn.LABEL.value])


In [None]:
%%time
# Extract HOG feature
hog_features, hogs = feature_extractor.get_hog_features(data[enums.ImageTypes.FACE.name.lower()])

In [None]:
%%time
# Extract pixel feature
pixel_features = feature_extractor.get_pixel_features(data[enums.ImageTypes.FACE.name.lower()])

In [None]:
%%time
# Extract CNN feature
# Running on a GPU does not seem to help much.
cnn_features = feature_extractor.get_cnn_features(data[enums.ImageTypes.ORIGINAL.name.lower()], device='cpu')

In [None]:
%%time
# Extract Canny edges
canny_features, cannies = feature_extractor.get_canny_features(data[enums.ImageTypes.FACE.name.lower()])

In [None]:
%%time
# Extract pose (pixel features of the rendered pose images)
pose_features = feature_extractor.get_pixel_features(data[enums.ImageTypes.POSE.name.lower()])

In [None]:
# Extract number of eyes detected feature
eye_count = feature_extractor.detect_eyes(config.TRAIN_DATA, data[enums.DataColumn.LABEL.name.lower()], data[enums.DataColumn.FILENAME.name.lower()])

def eye_summary(eye_count):
    df = pd.DataFrame(eye_count, columns=['count'])
    print(f'0: {df[df["count"] == 0].shape[0]}, 1: {df[df["count"] == 1].shape[0]}, 2: {df[df["count"] == 2].shape[0]}')
          
eye_summary(eye_count)


In [None]:
%%time
# Save the generated feature vectors.
features_list = [pixel_features, hog_features, cnn_features, canny_features, pose_features, body_parts_features]
feature_extractor.save_feature_vectors(config.FEATURE_VECTORS_FOLDER, data['filename'], data['label'], features_list)


In [None]:
# EDA on features

print(f'Loaded {data.shape[0]} samples.')
print(f'hog_features:{hog_features.shape}, hog_features.min:{np.min(hog_features)}, hog_features.max:{np.max(hog_features)}')
print(f'pixel_features:{pixel_features.shape}, pixel_features.min:{np.min(pixel_features)}, pixel_features.max:{np.max(pixel_features)}')
print(f'cnn_features:{cnn_features.shape}, cnn_features.min:{np.min(cnn_features)}, cnn_features.max:{np.max(cnn_features)}')
print(f'canny_features:{canny_features.shape}, canny_features.min:{np.min(canny_features)}, canny_features.max:{np.max(canny_features)}')
print(f'pose_features:{pose_features.shape}, pose_features.min:{np.min(pose_features)}, pose_features.max:{np.max(pose_features)}')
print()


In [None]:
# Visualize features
vizualizer.plot_features(included_labels=config.class_dict.keys())

## Classification
### Dimensionality Reduction
We tried two dimensionality reduction methods, PCA and t-SNE. With both, clusters of classes are clearly visible after the reduction.
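To make the PCA step concrete, here is a minimal NumPy sketch that computes principal components with an SVD and reports the cumulative explained variance ratio. It is a stand-in for illustration, not the project's `get_PCA` helper, and the data is synthetic: 26 features driven by 3 latent factors.

```python
import numpy as np


def pca_svd(X, n_components):
    """Project X onto its top principal components.

    Returns the projected data and the explained variance ratio per component.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    _U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_ratio = S ** 2 / np.sum(S ** 2)
    projected = X_centered @ Vt[:n_components].T
    return projected, explained_ratio[:n_components]


rng = np.random.default_rng(0)
# Synthetic stand-in for a feature matrix: 200 samples, 26 features, 3 latent factors.
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 26)) + 0.01 * rng.normal(size=(200, 26))

projected, ratio = pca_svd(X, n_components=5)
print(projected.shape)
print(np.cumsum(ratio))
```

On this toy data the first three components explain almost all of the variance, which is the kind of elbow the cumulative explained variance plot is meant to reveal.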


#### PCA

In [None]:
# Helpers for visualizing
def plot_PCA(X_list, names, n_components, max_components, out_file='pca.jpg'):
    pca_list, xpca_list = feature_extractor.get_PCA(X_list, n_components=n_components)
    plt.figure(figsize=(15,5))
    colors = ['b-', 'g-', 'r-', 'k-', 'y-']
    plot_labels = [f'{name} features' for name in names]
    for i in range(len(X_list)):
        plt.plot(np.cumsum(pca_list[i].explained_variance_ratio_), colors[i], label=plot_labels[i])
    # plt.xticks(np.arange(max_components)+1)
    plt.yticks(np.linspace(0, 1, 8))
    plt.grid(visible=True)
    plt.xlabel('Number of components')
    plt.ylabel('Cumulative explained variance ratio')
    plt.legend()
    plt.title('Explaining Power Of Principal Components')
    plt.tight_layout(pad=0.1, h_pad=None, w_pad=None, rect=None)
    plt.savefig(f'{config.OUTPUT_FOLDER}/report_plots/{out_file}', dpi=300)
    plt.show()

def plot_classes(X, y, ax, title, included_labels):
    colormap = plt.cm.gist_rainbow # hsv tab20 #nipy_spectral #, Set1,Paired
    colorst = [colormap(i) for i in np.linspace(0, 1.0, len(np.unique(y)))]
    markers = ['o', 'v', 's', 'p', 'x', '>', '*', '<', 'P', '^']
    for k, label in enumerate(included_labels):
        marker = markers[k % len(markers)]
        if X.shape[1] == 2:
            ax.scatter(X[y==label, 0], X[y==label, 1], facecolors=colorst[k], marker=marker, label=config.class_dict[label])
        else:
            ax.scatter(X[y==label, 0], X[y==label, 1], X[y==label, 2], facecolors=colorst[k], marker=marker, label=config.class_dict[label])
    ax.set_title(title)
    
def plot_components(features_list, X_pcas, X_tsnes, names, included_labels=LABELS_TO_INCLUDE, out_file='clustering.jpg'):
    # project the features into 2 dimensions
    fig, ax = plt.subplots(nrows=len(features_list), ncols=2, figsize=(10,5))
    if len(features_list) == 1:
        ax = [ax]

    # NOTE: y (the class labels) is read from the notebook's global scope.
    for i in range(len(features_list)):
        # Use the included_labels argument rather than the global default.
        plot_classes(X_pcas[i], y, ax[i][0], title=f'{names[i]} PCA', included_labels=included_labels)
        plot_classes(X_tsnes[i], y, ax[i][1], title=f'{names[i]} tSNE', included_labels=included_labels)
    
    handles, plot_labels = ax[0][0].get_legend_handles_labels()
    fig.legend(handles, plot_labels, loc='upper center')
    plt.tight_layout(pad=0.1, h_pad=None, w_pad=12, rect=None)
    plt.savefig(f'{config.OUTPUT_FOLDER}/report_plots/{out_file}', dpi=300)
    plt.show()


In [None]:
%%time
# Plot PCA
plot_PCA([cnn_features, keypoints_features], ['CNN', 'Keypoints'], n_components=[200, 26], max_components=200, out_file='pca.jpg')


In [None]:
%%time
# Visualize components
def visualize_components():
    features_list = [cnn_features, keypoints_features]
    n_components = [2, 2]
    names = ['CNN', 'Keypoints']
    pcas = feature_extractor.get_PCA(features_list, n_components)[-1]
    tsnes = feature_extractor.get_tsne(features_list, n_components=2)
    plot_components(features_list, pcas, tsnes, names, included_labels=LABELS_TO_INCLUDE)
visualize_components()


     

### Hyperparameter Search
#### Setup
#### Evaluation


### Random Forest
#### Justification


#### Evaluation

### Logistic Regression
#### Justification

#### Evaluation

## Lesson Learned and Future Improvement
### Data
Noise in the data affects the quality of our features. For example, eyes are hard to detect when the driver is wearing sunglasses. In addition, the faces in the training data are often not front-facing, which makes facial keypoint detection harder.

### Classifiers
Logistic regression works best when the features are not strongly correlated, and our features are not perfectly independent. For example, the relative position features could be correlated with the arm angle feature.
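One simple way to check for this kind of dependence is to look at pairwise correlations between feature columns. The sketch below is illustrative only: the `highly_correlated_pairs` function, the 0.8 threshold, the feature names, and the synthetic data are all assumptions, not results from our dataset.

```python
import numpy as np


def highly_correlated_pairs(X, names, threshold=0.8):
    """List feature pairs whose absolute Pearson correlation is at least `threshold`."""
    corr = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


rng = np.random.default_rng(0)
# Synthetic features: the arm angle is constructed to follow the wrist offset closely.
wrist_dy = rng.normal(size=500)
arm_angle = 0.9 * wrist_dy + 0.1 * rng.normal(size=500)
unrelated = rng.normal(size=500)
X = np.column_stack([wrist_dy, arm_angle, unrelated])

for a, b, r in highly_correlated_pairs(X, ["right_wrist_dy", "right_arm_angle", "unrelated"]):
    print(f"{a} vs {b}: r = {r:.2f}")
```

Pairs flagged this way are candidates for dropping one feature, combining the two, or relying on regularization.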
