
Key Point-Based Naturalistic Driving Action Recognition (KNDAR)

This project includes our solution for Track 3 of the 2022 AI City Challenge, which addresses naturalistic driving action recognition. Given video input from three different views of a driver inside a car (right side, dashboard, and rearview mirror cameras), we developed a method to identify which of 18 different actions the driver is performing:

  1. Normal forward driving
  2. Drinking
  3. Phone call (right)
  4. Phone call (left)
  5. Eating
  6. Text (right)
  7. Text (left)
  8. Hair / makeup
  9. Reaching behind
  10. Adjust control panel
  11. Pick up from floor (driver)
  12. Pick up from floor (passenger)
  13. Talk to passenger at the right
  14. Talk to passenger at backseat
  15. Yawning
  16. Hand on head
  17. Singing with music
  18. Shaking or dancing with music

Method

The main idea of this work is that driver activity can be determined from the movement of key points on their body. As such, we can use a pre-trained pose estimation model to identify key points (e.g., nose, right ear, left wrist), and derive features from these key points. Features include angles between key points (e.g., angle of the segments created by nose, left ear, and left eye), distances between key points (e.g., left wrist to left hip), position information (distance from center to corner of the image), shifts between some of the current key points and the respective ones in the last key frame, and shifts between certain angles and the respective angles in the last key frame.
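To make the feature types concrete, here is a minimal sketch, not the project's exact implementation, of how an angle formed by three key points, a distance between two key points, and a shift relative to the last key frame could be computed from 2D coordinates; the coordinates and the choice of angle vertex below are purely illustrative.

    import numpy as np

    def angle(a, b, c):
        """Angle in degrees at vertex b, formed by the segments b->a and b->c."""
        a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
        v1, v2 = a - b, c - b
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
        return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

    def distance(a, b):
        """Euclidean distance between two key points."""
        return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

    # Illustrative (x, y) pixel coordinates for the current and the last key frame.
    kp = {"nose": (410, 220), "left_ear": (445, 215), "left_eye": (425, 210),
          "left_wrist": (520, 430), "left_hip": (480, 470)}
    kp_prev = {"left_wrist": (505, 455)}

    features = [
        angle(kp["nose"], kp["left_ear"], kp["left_eye"]),   # an angle-type feature
        distance(kp["left_wrist"], kp["left_hip"]),          # a distance-type feature
        distance(kp["left_wrist"], kp_prev["left_wrist"]),   # a shift vs. the last key frame
    ]
    print(features)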

[Figures: pose detection and face detection examples]

Installation

conda create --name aic22 python=3.9.0
conda activate aic22
conda install pytorch==1.9.1 torchvision==0.10.1 cudatoolkit=10.2 -c pytorch
conda install -c conda-forge py-xgboost-gpu
git clone https://github.com/davidanastasiu/kndar.git
cd kndar
python -m pip install -r requirements.txt
# install kapao
git clone https://github.com/wmcnally/kapao.git
cd kapao && python data/scripts/download_models.py
cd ..

Note: for newer devices, such as the NVIDIA RTX 3090, you may use PyTorch 1.10 with cudatoolkit 11.3.

conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge

You will also need to patch kapao/utils/general.py, replacing line 539 with the following (adding the .clone() call):

coords = coords.clone().reshape((nl, -1, 2))

Dataset

Download the Track 3 dataset from the 2022 AI City Challenge. Any other datasets must follow the same format exactly. In particular, labels should exist in each driver's videos directory in CSV format. Alternatively, a labels text file named "dataset_name.txt" (e.g., A2.txt) can be added to the labels subdirectory in the format:

video_id activity_id ts_start ts_end
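As an illustration, a labels file in this format could be read as follows. This is a minimal sketch that assumes whitespace-separated fields; the timestamps are kept as strings since their exact format (e.g., seconds vs. clock time) is not specified here.

    from dataclasses import dataclass

    @dataclass
    class LabelRow:
        video_id: str
        activity_id: int
        ts_start: str  # kept as strings; the time format depends on the dataset
        ts_end: str

    def read_labels(path):
        """Parse a labels file with lines of the form: video_id activity_id ts_start ts_end."""
        rows = []
        with open(path) as fh:
            for line in fh:
                parts = line.split()
                if len(parts) != 4:
                    continue  # skip blank or malformed lines
                video_id, activity_id, ts_start, ts_end = parts
                rows.append(LabelRow(video_id, int(activity_id), ts_start, ts_end))
        return rows

    # e.g., rows = read_labels("labels/A1.txt")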

Workflow

The KNDAR framework has 3 stages:

  1. Extract features from the input data.
  2. Train or apply frame classification model on extracted features.
  3. Merge classified frames into predicted action-consistent segments.

In the following, we will give an example of applying a pre-trained KNDAR model to dataset A2, and training a new model for dataset A1.

Inference example

  1. Optionally, extract features from the A2 dataset videos. The dataset will be written to A2-all-5-f.npz and should be similar to the existing features/A2-all-5-f.npz. Note that KAPAO's key point extraction is stochastic, so the key points, and therefore the derived features, will differ slightly between runs.

    python extract.py --dataset /path/to/A2 --view all --face --skip 5 --output-path A2-all-5-f.npz
  2. Perform inference on the test set using the trained model A1-all-5-f-600-8.pkl. Results will be written to models/A1-all-5-f-600-8-result.txt. If you skipped step 1, change the --test path to features/A2-all-5-f.npz to use the existing extracted features. (A simplified sketch of the frame-merging step follows this example.)

    python test.py --test A2-all-5-f.npz --model models/A1-all-5-f-600-8.pkl --mgap 90 --minlen 350 --maxp 0.15
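The --mgap, --minlen, and --maxp parameters control how classified key frames are merged into action segments (stage 3 of the workflow). The sketch below is not KNDAR's actual merge algorithm; it is a simplified illustration of the kind of post-processing such parameters suggest, assuming both --mgap and --minlen are measured in frames.

    def merge_frames(labels, mgap=90, minlen=350):
        """labels: list of (frame_index, predicted_class) for consecutive key frames.
        Returns a list of (class, start_frame, end_frame) segments."""
        segments = []
        for frame, cls in labels:
            if segments and segments[-1][0] == cls and frame - segments[-1][2] <= mgap:
                # extend the current segment of the same class across a small gap
                segments[-1][2] = frame
            else:
                segments.append([cls, frame, frame])
        # drop segments that are too short to be a plausible activity
        return [tuple(s) for s in segments if s[2] - s[1] >= minlen]

    # e.g., merge_frames([(0, 1), (5, 1), (10, 1), (900, 3)])  # illustrative input only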

Training example

  1. Extract features from the A1 dataset videos. The labels in labels/A1.txt, or the CSV label files in the video directories, will be used to embed labels in the extracted feature dataset.

    python extract.py --dataset /path/to/A1 --view all --face --skip 5 --output-path A1-all-5-f.npz
  2. Train the frame classification model. Choose one or more values for each of the --nestimators and --max-depth meta-parameters. If providing multiple values, separate them with commas.

    python train.py --dataset A1-all-5-f.npz --nestimators 600 --max-depth 8 

The model will be written to models/A1-all-5-f-600-8.pkl. Additionally, several figures will be saved showing validation mean logloss, mean error, and feature importance.

  3. Perform inference on the test set using the trained model. Results will be written to models/A1-all-5-f-600-8-result-test.txt.
    python test.py --test A2-all-5-f.npz --model models/A1-all-5-f-600-8.pkl --mgap 90 --minlen 350 --maxp 0.15

Extracted features

Following is a list of features that are computed by KNDAR in each key frame for the driver.

Angles

  1. nose, left lower corner, left upper corner (of the image)
  2. left wrist, left lower corner, left upper corner (of the image)
  3. right wrist, left lower corner, left upper corner (of the image)
  4. left elbow (left shoulder, left elbow, left wrist)
  5. left shoulder (left elbow, left shoulder, left hip)
  6. right elbow (right shoulder, right elbow, right wrist)
  7. right shoulder (right elbow, right shoulder, right hip)
  8. left eye, nose, right eye
  9. nose, left shoulder, right shoulder
  10. nose, right shoulder, left shoulder
  11. nose, left ear, left eye
  12. nose, right ear, right eye
  13. left ear, right hip, right shoulder
  14. right ear, left hip, left shoulder
  15. left shoulder, right hip, right shoulder
  16. right shoulder, left hip, left shoulder

Distances

  1. nose to lower left corner
  2. left wrist to lower left corner
  3. right wrist to lower left corner
  4. nose to upper left corner
  5. left wrist to upper left corner
  6. right wrist to upper left corner
  7. nose to left shoulder
  8. nose to right shoulder
  9. nose to left ear
  10. nose to right ear
  11. left eye to right eye
  12. left ear to right ear
  13. left ear to left wrist
  14. right ear to right wrist
  15. left elbow to left hip
  16. right elbow to right hip
  17. left wrist to left hip
  18. right wrist to right hip

Positions

  1. relative bounding box width
  2. relative bounding box height
  3. relative position of bounding box center x
  4. relative position of bounding box center y

Position shifts (between current frame and last key frame)

  1. nose
  2. left eye
  3. right eye
  4. left ear
  5. right ear
  6. left shoulder
  7. right shoulder
  8. left elbow
  9. right elbow
  10. left wrist
  11. right wrist

Angle differences (between current frame and last key frame)

  1. nose, left lower corner, left upper corner (of the image)
  2. left wrist, left lower corner, left upper corner (of the image)
  3. right wrist, left lower corner, left upper corner (of the image)
  4. left elbow (left shoulder, left elbow, left wrist)
  5. left shoulder (left elbow, left shoulder, left hip)
  6. right elbow (right shoulder, right elbow, right wrist)
  7. right shoulder (right elbow, right shoulder, right hip)

Facial features [optional, only if --face parameter is invoked]

  1. distance between lower part of upper lip and top part of bottom lip

Results of the best submitted model

We submitted several results to the AIC 2022 challenge. Of all the models we submitted, the following model had the best performance on the full test set (a rough configuration sketch follows the list).

  • all-5-f-600-8:
    • extract features from all camera views, using keyframes every 5 frames,
    • extract both pose and facial features,
    • train an XGBoost model with 600 estimators and max tree depth of 8,
    • perform classification inference and merge key frame labels.
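As a rough sketch of the classifier configuration described above (train.py sets this up through its CLI flags; the feature matrix and labels here are random placeholders with the 171 features and 18 classes mentioned elsewhere in this README):

    import numpy as np
    from xgboost import XGBClassifier

    # Placeholder frame-level data: 171 features per key frame, 18 activity classes.
    X = np.random.rand(1000, 171)
    y = np.random.randint(0, 18, size=1000)

    clf = XGBClassifier(n_estimators=600, max_depth=8)  # 600 trees, max tree depth 8
    clf.fit(X, y)
    print(clf.predict(X[:5]))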

Feature importances of the best model

We analyzed the XGBoost feature importances for the best-performing model and found that the following 10 features had the highest weights among the 171 extracted features (a sketch for inspecting these importances follows the lists below).

[Figure: feature importances for our best model]

Dashboard View

  • 19 - distance - nose to upper left corner of the image
  • 22 - distance - nose to left shoulder
  • 30 - distance - left elbow to left hip

Rearview Mirror View

  • 59 - angle - right wrist, left lower corner, left upper corner of the image
  • 62 - angle - right elbow (right shoulder, right elbow, right wrist)
  • 76 - distance - nose to upper left corner
  • 78 - distance - right wrist to upper left corner

Right Side Camera View

  • 120 - angle - right shoulder (right elbow, right shoulder, right hip)
  • 132 - distance - right wrist to lower left corner
  • 135 - distance - right wrist to upper left corner
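A ranking like the one above could in principle be reproduced from the saved model with something like the following sketch, which assumes the pickled object exposes the scikit-learn XGBoost interface (feature_importances_) and prints only feature indices, since the index-to-name mapping follows KNDAR's extraction order.

    import pickle
    import numpy as np

    with open("models/A1-all-5-f-600-8.pkl", "rb") as fh:
        model = pickle.load(fh)

    # Rank features by the classifier's importance scores and print the top 10 indices.
    importances = np.asarray(model.feature_importances_)
    top10 = np.argsort(importances)[::-1][:10]
    for idx in top10:
        print(f"feature {idx}: importance {importances[idx]:.4f}")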

Execution instructions to re-create the AIC22 model and results

If you would just like to run inference using the stored datasets and model, skip the training steps or see the inference example above. Otherwise, perform the following steps in order (a driver-script sketch covering steps 2-4 follows the list). The steps assume you have already downloaded the Track 3 dataset from AIC 2022. Note that results will vary slightly due to the stochastic nature of the KAPAO key point extractor.

  1. Create conda environment and install required packages. See installation instructions above.

  2. Extract features from the videos. Datasets will be written to A1-all-5-f.npz and A2-all-5-f.npz.

    python extract.py --dataset /path/to/A1 --view all --face --skip 5 --output-path A1-all-5-f.npz
    python extract.py --dataset /path/to/A2 --view all --face --skip 5 --output-path A2-all-5-f.npz
  3. Train the frame-level driver action classification model. The model will be stored in models/A1-all-5-f-600-8.pkl.

    python train.py --dataset A1-all-5-f.npz --nestimators 600 --max-depth 8 
  4. Perform inference on the test set using the trained model. Results will be written to models/A1-all-5-f-600-8-result-test.txt.

    python test.py --test A2-all-5-f.npz --model models/A1-all-5-f-600-8.pkl --mgap 90 --minlen 350 --maxp 0.15
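For convenience, steps 2-4 above could be chained in a small driver script. This is only a sketch of the documented commands, with /path/to/A1 and /path/to/A2 as placeholders for your local dataset locations.

    import subprocess

    commands = [
        "python extract.py --dataset /path/to/A1 --view all --face --skip 5 --output-path A1-all-5-f.npz",
        "python extract.py --dataset /path/to/A2 --view all --face --skip 5 --output-path A2-all-5-f.npz",
        "python train.py --dataset A1-all-5-f.npz --nestimators 600 --max-depth 8",
        "python test.py --test A2-all-5-f.npz --model models/A1-all-5-f-600-8.pkl --mgap 90 --minlen 350 --maxp 0.15",
    ]

    for cmd in commands:
        print(f"Running: {cmd}")
        subprocess.run(cmd, shell=True, check=True)  # stop the pipeline on any failure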

Citation

If you found our work useful, please cite our paper:

@inproceedings{vats-aic22,
   author    = {Arpita Vats and David C. Anastasiu},
   title     = {Key Point-Based Driver Activity Recognition},
   volume    = {1},
   month     = {July},
   booktitle = {2022 IEEE Conference on Computer Vision and Pattern Recognition Workshops},
   series    = {CVPRW'22},
   year      = {2022},
   pages     = {},
   location  = {New Orleans, LA, USA},
}

Future Work

Here are some ideas we thought of but did not have time to try/implement:

  • Object detection, e.g., using yolov5s6: look for objects such as a cell phone or bottle among the detections, then extract features such as the distance from the object center to the right or left wrist key points.
  • Use a longer history of key frames during feature extraction.
  • Improve the driver identification algorithm.
  • Improve the merge algorithm; specifically, it should not rely on challenge activity statistics.
