Skip to content
This repo is to apply pose machinese to find attention of pedestrians.
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.
openpose @ bc460c8

Pedestrian Attention Detection

1 Introduction

This repository contains the code and the documentation for the EECS 504 project (Winter 2018).

1.1 Rationale

In the drive towards increased vehicle autonomy, one hurdle is the detection and anticipation of pedestrians. Of course, given the complex nature of human interaction with the environment, especially in cityscapes, we thought that studying the attention of people crossing the streets is paramount to building better self driving cars.

Now, if we can classify the most common mistakes that people make which will take away attention from the road (like appreciating those expensive shoes or chatting with your significant, probably soon to be widowed other), we essentially have more information at the autonomous vehicle to make a better call.

While RCNN and its ilk cite:DBLP:journals/corr/RenHG015,DBLP:journals/corr/Girshick15,DBLP:journals/corr/GirshickDDM13 can detect people well, intent is not known from the masks and that’s an additional step that needs to be done.

By using a pose estimator cite:DBLP:journals/corr/CaoSWS16, we can reduce the problem of pedestrian attention/awareness detection to one that works on classifying poses of skeletons.

Another possible advantage of this approach (only envisioned, of course) is that we can generate synthetic data as we the pose machine already reduces the state space search to a much smaller domain (angles and their relative movements) and generating data for this domain should be a lot easier as well.

1.2 Scope of this work.

In this work, we restrict ourselves to the common patterns of distractions. One of them is not looking at the ground and another is gazing into the deep, dark void of a smartphone.

Also, we use the pose estimator from CMU cite:DBLP:journals/corr/CaoSWS16,simon2017hand to get the skeletons. We then trained it on a minimal set of data we could acquire to train an ensemble of single class SVMs. The data was essentially the angles between the different keypoints output by the previous stage.

We classify single frames and not sequences in this work.

1.3 Future work

There are tons of ways to expand this. I’m listing some of them and will add more as we go along. Issues have been opened for all these.

  • [ ] Use tracking when a person is occluded (Particle filters) and their last movement was towards the road.
  • [ ] Use LSTMs to train what people are likely to do from a given action.
  • [ ] Generate synthetic data maybe using GANs or Unity3D?
  • [ ] Better metric for error.
  • [ ] Replace SVMs with a single neural net and use it in training of the pose estimator.

1.4 A brief introduction to the architecture

  • Get the pose estimate.
  • Get angles between the different skeletons.
  • Use the ensemble of single class SVMs to see if they fall in any of the dangerous categories.

NOTE: Additionally, in the project implementation, we used Hough transforms to find the pavement and reduce the region of interest to only the road. However, with further thought, it may be better to actually look at the entire region.

1.5 SubModules

The openpose library is used as a sub-module here.

2 References

@article{DBLP:journals/corr/CaoSWS16, author = {Zhe Cao and Tomas Simon and Shih{-}En Wei and Yaser Sheikh}, title = {Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields}, journal = {CoRR}, volume = {abs/1611.08050}, year = {2016}, url = {}, archivePrefix = {arXiv}, eprint = {1611.08050}, timestamp = {Mon, 13 Aug 2018 16:47:26 +0200}, biburl = {}, bibsource = {dblp computer science bibliography,} }

@article{DBLP:journals/corr/abs-1802-00434, author = {Riza Alp G{"{u}}ler and Natalia Neverova and Iasonas Kokkinos}, title = {DensePose: Dense Human Pose Estimation In The Wild}, journal = {CoRR}, volume = {abs/1802.00434}, year = {2018}, url = {}, archivePrefix = {arXiv}, eprint = {1802.00434}, timestamp = {Mon, 13 Aug 2018 16:46:31 +0200}, biburl = {}, bibsource = {dblp computer science bibliography,} }

@article{DBLP:journals/corr/GirshickDDM13, author = {Ross B. Girshick and Jeff Donahue and Trevor Darrell and Jitendra Malik}, title = {Rich feature hierarchies for accurate object detection and semantic segmentation}, journal = {CoRR}, volume = {abs/1311.2524}, year = {2013}, url = {}, archivePrefix = {arXiv}, eprint = {1311.2524}, timestamp = {Mon, 13 Aug 2018 16:48:09 +0200}, biburl = {}, bibsource = {dblp computer science bibliography,} }

@article{DBLP:journals/corr/Girshick15, author = {Ross B. Girshick}, title = {Fast {R-CNN}}, journal = {CoRR}, volume = {abs/1504.08083}, year = {2015}, url = {}, archivePrefix = {arXiv}, eprint = {1504.08083}, timestamp = {Mon, 13 Aug 2018 16:49:11 +0200}, biburl = {}, bibsource = {dblp computer science bibliography,} }

@article{DBLP:journals/corr/RenHG015, author = {Shaoqing Ren and Kaiming He and Ross B. Girshick and Jian Sun}, title = {Faster {R-CNN:} Towards Real-Time Object Detection with Region Proposal Networks}, journal = {CoRR}, volume = {abs/1506.01497}, year = {2015}, url = {}, archivePrefix = {arXiv}, eprint = {1506.01497}, timestamp = {Mon, 13 Aug 2018 16:46:02 +0200}, biburl = {}, bibsource = {dblp computer science bibliography,} }

@inproceedings{simon2017hand, author = {Tomas Simon and Hanbyul Joo and Iain Matthews and Yaser Sheikh}, booktitle = {CVPR}, title = {Hand Keypoint Detection in Single Images using Multiview Bootstrapping}, year = {2017} }

You can’t perform that action at this time.