
Key Point-Based Naturalistic Driving Action Recognition (KNDAR)

This project includes our solution for Track 3 of the 2022 AI City Challenge, which addresses naturalistic driving action recognition. Given video input from three different views of a driver inside a car (right side, dashboard, and rearview mirror cameras), we developed a method to identify which of 18 different actions the driver is performing:

  1. Normal forward driving
  2. Drinking
  3. Phone call (right)
  4. Phone call (left)
  5. Eating
  6. Text (right)
  7. Text (left)
  8. Hair / makeup
  9. Reaching behind
  10. Adjust control panel
  11. Pick up from floor (driver)
  12. Pick up from floor (passenger)
  13. Talk to passenger at the right
  14. Talk to passenger at backseat
  15. Yawning
  16. Hand on head
  17. Singing with music
  18. Shaking or dancing with music

Method

The main idea of this work is that driver activity can be determined from the movement of key points on their body. As such, we can use a pre-trained pose estimation model to identify key points (e.g., nose, right ear, left wrist), and derive features from these key points. Features include angles between key points (e.g., angle of the segments created by nose, left ear, and left eye), distances between key points (e.g., left wrist to left hip), position information (distance from center to corner of the image), shifts between some of the current key points and the respective ones in the last key frame, and shifts between certain angles and the respective angles in the last key frame.
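To make the feature types concrete, here is a minimal sketch, not the project's exact implementation, of how an angle formed by three key points, a distance between two key points, and a shift relative to the last key frame could be computed from 2D coordinates; the coordinates and the choice of angle vertex below are purely illustrative.

    import numpy as np

    def angle(a, b, c):
        """Angle in degrees at vertex b, formed by the segments b->a and b->c."""
        a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
        v1, v2 = a - b, c - b
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
        return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

    def distance(a, b):
        """Euclidean distance between two key points."""
        return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

    # Illustrative (x, y) pixel coordinates for the current and the last key frame.
    kp = {"nose": (410, 220), "left_ear": (445, 215), "left_eye": (425, 210),
          "left_wrist": (520, 430), "left_hip": (480, 470)}
    kp_prev = {"left_wrist": (505, 455)}

    features = [
        angle(kp["nose"], kp["left_ear"], kp["left_eye"]),   # an angle-type feature
        distance(kp["left_wrist"], kp["left_hip"]),          # a distance-type feature
        distance(kp["left_wrist"], kp_prev["left_wrist"]),   # a shift vs. the last key frame
    ]
    print(features)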

[Figures: pose detection and face detection examples]

Installation

conda create --name aic22 python=3.9.0
conda activate aic22
conda install pytorch==1.9.1 torchvision==0.10.1 cudatoolkit=10.2 -c pytorch
conda install -c conda-forge py-xgboost-gpu
git clone https://github.com/davidanastasiu/kndar.git
cd kndar
python -m pip install -r requirements.txt
# install kapao
git clone https://github.com/wmcnally/kapao.git
cd kapao && python data/scripts/download_models.py
cd ..

Note: for newer devices, such as the NVIDIA RTX 3090, you may use PyTorch 1.10 with cudatoolkit 11.3.

conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge

You will also need to patch kapao/utils/general.py, replacing line 539 with the following (adding the .clone() call):

coords = coords.clone().reshape((nl, -1, 2))

Dataset

Download the Track 3 dataset from the 2022 AI City Challenge. Any other datasets must follow the same format exactly. In particular, labels should exist in each driver's videos directory in CSV format. Alternatively, a labels text file named "dataset_name.txt" (e.g., A2.txt) can be added to the labels subdirectory in the format:

video_id activity_id ts_start ts_end
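As an illustration, a labels file in this format could be read as follows. This is a minimal sketch that assumes whitespace-separated fields; the timestamps are kept as strings since their exact format (e.g., seconds vs. clock time) is not specified here.

    from dataclasses import dataclass

    @dataclass
    class LabelRow:
        video_id: str
        activity_id: int
        ts_start: str  # kept as strings; the time format depends on the dataset
        ts_end: str

    def read_labels(path):
        """Parse a labels file with lines of the form: video_id activity_id ts_start ts_end."""
        rows = []
        with open(path) as fh:
            for line in fh:
                parts = line.split()
                if len(parts) != 4:
                    continue  # skip blank or malformed lines
                video_id, activity_id, ts_start, ts_end = parts
                rows.append(LabelRow(video_id, int(activity_id), ts_start, ts_end))
        return rows

    # e.g., rows = read_labels("labels/A1.txt")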

Workflow

The KNDAR framework has 3 stages:

  1. Extract features from the input data.
  2. Train or apply frame classification model on extracted features.
  3. Merge classified frames into predicted action-consistent segments.

In the following, we will give an example of applying a pre-trained KNDAR model to dataset A2, and training a new model for dataset A1.

Inference example

  1. Optionally, extract features from the A2 dataset videos. The dataset will be written to A2-all-5-f.npz and should be similar to the existing features/A2-all-5-f.npz. Note that KAPAO's key point extraction is stochastic, so the key points, and therefore the derived features, will differ slightly between runs.

    python extract.py --dataset /path/to/A2 --view all --face --skip 5 --output-path A2-all-5-f.npz
  2. Perform inference on the test set using the trained model A1-all-5-f-600-8.pkl. Results will be written to models/A1-all-5-f-600-8-result.txt. If you skipped step 1, change the --test path to features/A2-all-5-f.npz to use the existing extracted features. (A simplified sketch of the frame-merging step follows this example.)

    python test.py --test A2-all-5-f.npz --model models/A1-all-5-f-600-8.pkl --mgap 90 --minlen 350 --maxp 0.15
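The --mgap, --minlen, and --maxp parameters control how classified key frames are merged into action segments (stage 3 of the workflow). The sketch below is not KNDAR's actual merge algorithm; it is a simplified illustration of the kind of post-processing such parameters suggest, assuming both --mgap and --minlen are measured in frames.

    def merge_frames(labels, mgap=90, minlen=350):
        """labels: list of (frame_index, predicted_class) for consecutive key frames.
        Returns a list of (class, start_frame, end_frame) segments."""
        segments = []
        for frame, cls in labels:
            if segments and segments[-1][0] == cls and frame - segments[-1][2] <= mgap:
                # extend the current segment of the same class across a small gap
                segments[-1][2] = frame
            else:
                segments.append([cls, frame, frame])
        # drop segments that are too short to be a plausible activity
        return [tuple(s) for s in segments if s[2] - s[1] >= minlen]

    # e.g., merge_frames([(0, 1), (5, 1), (10, 1), (900, 3)])  # illustrative input only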

Training example

  1. Extract features from the A1 dataset videos. The labels in labels/A1.txt, or the CSV label files in the video directories, will be used to embed labels in the extracted feature dataset.

    python extract.py --dataset /path/to/A1 --view all --face --skip 5 --output-path A1-all-5-f.npz
  2. Train the frame classification model. Choose one or more values for each of the --nestimators and --max-depth meta-parameters. If providing multiple values, separate them with commas.

    python train.py --dataset A1-all-5-f.npz --nestimators 600 --max-depth 8 

The model will be written to models/A1-all-5-f-600-8.pkl. Additionally, several figures will be saved showing validation mean logloss, mean error, and feature importance.

  3. Perform inference on the test set using the trained model. Results will be written to models/A1-all-5-f-600-8-result-test.txt.
    python test.py --test A2-all-5-f.npz --model models/A1-all-5-f-600-8.pkl --mgap 90 --minlen 350 --maxp 0.15

Extracted features

Following is a list of features that are computed by KNDAR in each key frame for the driver.

Angles

  1. nose, left lower corner, left upper corner (of the image)
  2. left wrist, left lower corner, left upper corner (of the image)
  3. right wrist, left lower corner, left upper corner (of the image)
  4. left elbow (left shoulder, left elbow, left wrist)
  5. left shoulder (left elbow, left shoulder, left hip)
  6. right elbow (right shoulder, right elbow, right wrist)
  7. right shoulder (right elbow, right shoulder, right hip)
  8. left eye, nose, right eye
  9. nose, left shoulder, right shoulder
  10. nose, right shoulder, left shoulder
  11. nose, left ear, left eye
  12. nose, right ear, right eye
  13. left ear, right hip, right shoulder
  14. right ear, left hip, left shoulder
  15. left shoulder, right hip, right shoulder
  16. right shoulder, left hip, left shoulder

Distances

  1. nose to lower left corner
  2. left wrist to lower left corner
  3. right wrist to lower left corner
  4. nose to upper left corner
  5. left wrist to upper left corner
  6. right wrist to upper left corner
  7. nose to left shoulder
  8. nose to right shoulder
  9. nose to left ear
  10. nose to right ear
  11. left eye to right eye
  12. left ear to right ear
  13. left ear to left wrist
  14. right ear to right wrist
  15. left elbow to left hip
  16. right elbow to right hip
  17. left wrist to left hip
  18. right wrist to right hip

Positions

  1. relative bounding box width
  2. relative bounding box height
  3. relative position of bounding box center x
  4. relative position of bounding box center y

Position shifts (between current frame and last key frame)

  1. nose
  2. left eye
  3. right eye
  4. left ear
  5. right ear
  6. left shoulder
  7. right shoulder
  8. left elbow
  9. right elbow
  10. left wrist
  11. right wrist

Angle differences (between current frame and last key frame)

  1. nose, left lower corner, left upper corner (of the image)
  2. left wrist, left lower corner, left upper corner (of the image)
  3. right wrist, left lower corner, left upper corner (of the image)
  4. left elbow (left shoulder, left elbow, left wrist)
  5. left shoulder (left elbow, left shoulder, left hip)
  6. right elbow (right shoulder, right elbow, right wrist)
  7. right shoulder (right elbow, right shoulder, right hip)

Facial features [optional, only if --face parameter is invoked]

  1. distance between lower part of upper lip and top part of bottom lip

Results of the best submitted model

We submitted several results to the AIC 2022 challenge. Of all the models we submitted, the following model had the best performance on the full test set (a rough configuration sketch follows the list).

  • all-5-f-600-8:
    • extract features from all camera views, using keyframes every 5 frames,
    • extract both pose and facial features,
    • train an XGBoost model with 600 estimators and max tree depth of 8,
    • perform classification inference and merge key frame labels.
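As a rough sketch of the classifier configuration described above (train.py sets this up through its CLI flags; the feature matrix and labels here are random placeholders with the 171 features and 18 classes mentioned elsewhere in this README):

    import numpy as np
    from xgboost import XGBClassifier

    # Placeholder frame-level data: 171 features per key frame, 18 activity classes.
    X = np.random.rand(1000, 171)
    y = np.random.randint(0, 18, size=1000)

    clf = XGBClassifier(n_estimators=600, max_depth=8)  # 600 trees, max tree depth 8
    clf.fit(X, y)
    print(clf.predict(X[:5]))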

Feature importances of the best model

We analyzed the XGBoost feature importances for the best-performing model and found that the following 10 features had the highest weights among the 171 extracted features (a sketch for inspecting these importances follows the lists below).

[Figure: feature importances for our best model]

Dashboard View

  • 19 - distance - nose to upper left corner of the image
  • 22 - distance - nose to left shoulder
  • 30 - distance - left elbow to left hip

Rearview Mirror View

  • 59 - angle - right wrist, left lower corner, left upper corner of the image
  • 62 - angle - right elbow (right shoulder, right elbow, right wrist)
  • 76 - distance - nose to upper left corner
  • 78 - distance - right wrist to upper left corner

Right Side Camera View

  • 120 - angle - right shoulder (right elbow, right shoulder, right hip)
  • 132 - distance - right wrist to lower left corner
  • 135 - distance - right wrist to upper left corner
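A ranking like the one above could in principle be reproduced from the saved model with something like the following sketch, which assumes the pickled object exposes the scikit-learn XGBoost interface (feature_importances_) and prints only feature indices, since the index-to-name mapping follows KNDAR's extraction order.

    import pickle
    import numpy as np

    with open("models/A1-all-5-f-600-8.pkl", "rb") as fh:
        model = pickle.load(fh)

    # Rank features by the classifier's importance scores and print the top 10 indices.
    importances = np.asarray(model.feature_importances_)
    top10 = np.argsort(importances)[::-1][:10]
    for idx in top10:
        print(f"feature {idx}: importance {importances[idx]:.4f}")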

Execution instructions to re-create the AIC22 model and results

If you would just like to run inference using the stored datasets and model, skip the training steps or see the inference example above. Otherwise, perform the following steps in order (a driver-script sketch covering steps 2-4 follows the list). The steps assume you have already downloaded the Track 3 dataset from AIC 2022. Note that results will vary slightly due to the stochastic nature of the KAPAO key point extractor.

  1. Create conda environment and install required packages. See installation instructions above.

  2. Extract features from the videos. Datasets will be written to A1-all-5-f.npz and A2-all-5-f.npz.

    python extract.py --dataset /path/to/A1 --view all --face --skip 5 --output-path A1-all-5-f.npz
    python extract.py --dataset /path/to/A2 --view all --face --skip 5 --output-path A2-all-5-f.npz
  3. Train the frame-level driver action classification model. The model will be stored in models/A1-all-5-f-600-8.pkl.

    python train.py --dataset A1-all-5-f.npz --nestimators 600 --max-depth 8 
  4. Perform inference on the test set using the trained model. Results will be written to models/A1-all-5-f-600-8-result-test.txt.

    python test.py --test A2-all-5-f.npz --model models/A1-all-5-f-600-8.pkl --mgap 90 --minlen 350 --maxp 0.15
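For convenience, steps 2-4 above could be chained in a small driver script. This is only a sketch of the documented commands, with /path/to/A1 and /path/to/A2 as placeholders for your local dataset locations.

    import subprocess

    commands = [
        "python extract.py --dataset /path/to/A1 --view all --face --skip 5 --output-path A1-all-5-f.npz",
        "python extract.py --dataset /path/to/A2 --view all --face --skip 5 --output-path A2-all-5-f.npz",
        "python train.py --dataset A1-all-5-f.npz --nestimators 600 --max-depth 8",
        "python test.py --test A2-all-5-f.npz --model models/A1-all-5-f-600-8.pkl --mgap 90 --minlen 350 --maxp 0.15",
    ]

    for cmd in commands:
        print(f"Running: {cmd}")
        subprocess.run(cmd, shell=True, check=True)  # stop the pipeline on any failure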

Citation

If you found our work useful, please cite our paper:

@inproceedings{vats-aic22,
   author    = {Arpita Vats and David C. Anastasiu},
   title     = {Key Point-Based Driver Activity Recognition},
   volume    = {1},
   month     = {July},
   booktitle = {2022 IEEE Conference on Computer Vision and Pattern Recognition Workshops},
   series    = {CVPRW'22},
   year      = {2022},
   pages     = {},
   location  = {New Orleans, LA, USA},
}

Future Work

Here are some ideas we thought of but did not have time to try/implement:

  • Object detection, e.g., using yolov5s6: look for objects such as a cell phone or bottle among the detections, then extract features such as the distance from the object center to the right or left wrist key points.
  • Use a longer history of key frames during feature extraction.
  • Improve the driver identification algorithm.
  • Improve the merge algorithm; specifically, it should not rely on challenge activity statistics.
