Detecting and counting small objects - Analysis, review and application to counting


Analysis, review and application of Finding Tiny Faces (P. Hu) [1] with a focus on counting the many faces in a demonstration/crowd.
RecVis (MVA) course - Alexandre Attia, Sharone Dayan
You can find our pre-print report on arXiv.


The paper, published at CVPR 2017, addresses the detection of small objects (faces, in our case) in images. It uses scale-specific detectors built on features from a single deep feature hierarchy and studies three aspects of the problem: scale invariance, image resolution and contextual reasoning. The algorithm relies on foveal descriptors, i.e. descriptors that blur the peripheral image so that they encode just enough information about the context, mimicking human vision.
Detecting small objects is still an open challenge, and we would like to extend this approach and experiment with it on different applications. The goal is to understand the choices made in the paper in depth, together with their applications to security and identification. We mainly focus on the inference part, using a TensorFlow implementation adapted from this repo.

Face detection benchmark

First, we aim at comparing the Tiny Faces algorithm with other face detection models.
We use two sub-folders of the WIDER FACE dataset (Parade and Dresses) to compare our model with Faster R-CNN for face detection (using MXNet), MTCNN [6] (using MXNet), Haar Cascade [2] and HOG [3].
This benchmark can be found in the Benchmark notebook.
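To make the comparison concrete, detections from each model can be scored against the ground-truth boxes. The sketch below is an illustration only: it assumes boxes in `(x1, y1, x2, y2)` format and a greedy one-to-one matching at an IoU threshold of 0.5, neither of which is stated above; the function names `iou` and `precision_recall` are hypothetical and may differ from what the notebook does.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, iou_threshold=0.5):
    """Greedily match each detection to at most one unmatched ground-truth box.

    The 0.5 threshold is an assumption made for this sketch.
    """
    matched = set()
    true_positives = 0
    for det in detections:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth face can be claimed only once
            overlap = iou(det, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

Running `precision_recall` once per model on the same images gives numbers that can be compared side by side.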

Image resolution influence

The performance of the Tiny Faces algorithm depends on the image resolution: as explained in the original paper, this parameter strongly affects face detection. We ran the inference part on progressively downscaled images and plotted how the number of detected faces varies.
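The experiment can be sketched as a loop over downscaling factors. This is a minimal illustration, not the repository's code: `detect_faces` is a hypothetical callable standing in for the inference function, the factors are arbitrary, and the image is kept as a nested list with nearest-neighbour subsampling so the sketch runs without dependencies (an actual pipeline would resize with an image library).

```python
def downscale(image, factor):
    """Nearest-neighbour subsampling: keep every `factor`-th row and column."""
    return [row[::factor] for row in image[::factor]]


def count_by_resolution(image, detect_faces, factors=(1, 2, 4, 8)):
    """Return {factor: number of detections} for each downscaling factor.

    `detect_faces` is assumed to take an image and return a list of boxes.
    """
    counts = {}
    for factor in factors:
        small = downscale(image, factor)
        counts[factor] = len(detect_faces(small))
    return counts
```

Plotting the returned counts against the factors shows how quickly detections are lost as the resolution drops.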

Face Recognition

Face recognition is another possible application of the paper, so we aim to build a Python pipeline for it, using face alignment [4] and face embedding [5] to classify faces.
The first application we explore is counting the many different faces (numerous people appearing at different sizes in the picture) in a video of a crowded public demonstration.
This application can be found in this notebook.
To achieve this, we have to match people from one frame to the next so that no person is counted twice. We count people with face detection and match them with face recognition, using a linear SVM for the face classification.
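The idea of counting without duplicates can be illustrated with a simplified stand-in. Note that this sketch does not use a linear SVM: it swaps in a plain nearest-neighbour check on face embeddings with a distance threshold, which is an assumption made to keep the example short. The function names and the threshold of 0.6 are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def count_unique_faces(frames, threshold=0.6):
    """Count distinct people across frames.

    `frames` is a list of frames, each a list of face embeddings.
    A face is treated as already seen when its embedding lies closer than
    `threshold` (an assumed value) to any previously stored embedding.
    """
    known = []
    for embeddings in frames:
        for emb in embeddings:
            if not any(euclidean(emb, seen) < threshold for seen in known):
                known.append(emb)  # new person: remember this embedding
    return len(known)
```

For example, `count_unique_faces([[(0, 0), (1, 1)], [(0.1, 0), (5, 5)]])` treats `(0.1, 0)` as the same person as `(0, 0)` and counts the other embeddings as distinct people.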

Repository organisation

notebooks: folder with the notebooks for the different applications and experiments
File for the people matching used to count people (cf. the Counting people notebook)
Inference function: detecting faces in one (or multiple) pictures
Tiny Faces model
Misc: overlaying bounding boxes


[1] Peiyun Hu and Deva Ramanan. Finding Tiny Faces. 2017.
[2] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. 2001.
[3] Navneet Dalal and Bill Triggs. Histograms of Oriented Gradients for Human Detection. 2005.
[4] Vahid Kazemi and Josephine Sullivan. One Millisecond Face Alignment with an Ensemble of Regression Trees. 2014.
[5] Florian Schroff, Dmitry Kalenichenko and James Philbin. FaceNet: A Unified Embedding for Face Recognition and Clustering. 2015.
[6] Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li and Yu Qiao. Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks. 2016.