This repository contains the experiments on MSCOCO classes described in Section 5 (Discussion) of the paper Unsupervised Hard Example Mining from Videos for Improved Object Detection.

We used the original version of py-faster-rcnn-ft to fine-tune a VGG16 network pretrained on the ImageNet dataset, converting it into a binary classifier for a single MSCOCO category. With this classifier as the backbone network of Faster RCNN, we labelled every frame of a video for the presence of that category. From the labelled frames, our algorithm identified the frames containing hard negatives. Finally, we fine-tuned the network again after including those frames and evaluated it for improvements on held-out validation and test sets.

For our research, we carried out experiments on two MSCOCO categories, Dog and Train.

Steps :-

1. Prepare a Faster RCNN object detector on an MSCOCO category

Follow the steps mentioned in the py-faster-rcnn-ft repository to prepare a VGG16 Faster RCNN network trained on an MSCOCO category of your choice.

2. Label the videos with detections

Scrape the web and download videos that are likely to contain many instances of your chosen category. Helper code to download YouTube videos can be found here. Once the videos have been downloaded, run the detections code to label each frame of every video with bounding boxes and confidence scores for that category. See Usage.

The videos we used are listed below :-

  1. Dog videos
  2. Train videos

3. Hard negative mining

The detections code outputs a txt file containing frame-wise labels and bounding box information. Run the hard negative mining code on this detections txt file to output the frames containing hard negatives, along with a txt file containing the bounding box information for those frames. See Usage.
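The idea behind this step can be illustrated with a minimal sketch. It assumes that a "flicker" is a confident detection with no overlapping confident detection in the adjacent frames, and treats such detections as hard negative candidates. The function names, the tuple layout of the detections, and all thresholds are hypothetical; the repository's own mining code and txt format may differ.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def find_flickers(detections, score_thresh=0.8, iou_thresh=0.5, window=1):
    """Return (frame, box) pairs for confident detections that have no
    overlapping confident detection within `window` frames on either side.

    detections: iterable of (frame_idx, x1, y1, x2, y2, score) tuples
    (an assumed layout, not the repository's txt format).
    """
    by_frame = defaultdict(list)
    for frame, x1, y1, x2, y2, score in detections:
        if score >= score_thresh:  # keep only confident detections
            by_frame[frame].append((x1, y1, x2, y2))

    flickers = []
    for frame in sorted(by_frame):
        for box in by_frame[frame]:
            supported = any(
                iou(box, other) >= iou_thresh
                for offset in range(-window, window + 1)
                if offset != 0
                for other in by_frame.get(frame + offset, [])
            )
            if not supported:  # isolated in time: hard negative candidate
                flickers.append((frame, box))
    return flickers


# Made-up detections: a box tracked over frames 0-2 and one isolated box.
example = [
    (0, 10, 10, 50, 50, 0.90),
    (1, 12, 11, 52, 51, 0.95),
    (2, 11, 10, 51, 50, 0.90),
    (5, 200, 200, 240, 240, 0.85),  # no neighbour in frames 4 or 6
    (6, 10, 10, 50, 50, 0.30),      # below score_thresh, ignored
]
flickers = find_flickers(example)
```

In this toy input only the detection in frame 5 is returned, since the boxes in frames 0 to 2 support one another across adjacent frames.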

4. Include the video frames containing hard negatives in the COCO dataset and fine-tune

Use the COCO annotations editor located inside utils to include the frames containing hard negatives in the MSCOCO dataset. Once the frames have been included in the COCO dataset, fine-tune to get an improved network. See Usage.
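As a rough illustration of what this step involves, the sketch below appends new frames to a COCO-style annotation dictionary. The `images` and `annotations` fields follow the standard COCO layout, but the function name, the `frames` input structure, and the choice to store each hard negative box under a caller-supplied `category_id` are assumptions; how the editor in utils actually encodes hard negatives is not described here.

```python
import copy


def add_hard_negative_frames(coco, frames, category_id):
    """Return a copy of `coco` with extra frames and their boxes appended.

    coco: dict with "images" and "annotations" lists (COCO layout).
    frames: list of dicts with "file_name", "width", "height" and
            "boxes" as (x, y, w, h) tuples (hypothetical input format).
    category_id: id under which the boxes are stored (an assumption).
    """
    merged = copy.deepcopy(coco)  # leave the original annotations untouched
    next_image_id = max((im["id"] for im in merged["images"]), default=0) + 1
    next_ann_id = max((a["id"] for a in merged["annotations"]), default=0) + 1

    for frame in frames:
        merged["images"].append({
            "id": next_image_id,
            "file_name": frame["file_name"],
            "width": frame["width"],
            "height": frame["height"],
        })
        for x, y, w, h in frame["boxes"]:
            merged["annotations"].append({
                "id": next_ann_id,
                "image_id": next_image_id,
                "category_id": category_id,
                "bbox": [x, y, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
            next_ann_id += 1
        next_image_id += 1
    return merged


# Toy example with made-up ids and file names.
base = {
    "images": [{"id": 7, "file_name": "a.jpg", "width": 640, "height": 480}],
    "annotations": [{"id": 3, "image_id": 7, "category_id": 18,
                     "bbox": [1, 2, 30, 40], "area": 1200, "iscrowd": 0}],
    "categories": [{"id": 18, "name": "dog"}],
}
new_frames = [{"file_name": "video1_000123.jpg", "width": 1280, "height": 720,
               "boxes": [(10, 20, 100, 50), (300, 300, 40, 40)]}]
merged = add_hard_negative_frames(base, new_frames, category_id=99)
```

Ids are allocated above the current maximum so that new entries cannot collide with existing COCO ids.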

Results :-

A summary of the results is mentioned below :-

| Category | Model | Training Iterations | Training Hyperparams | Validation set AP | Test set AP |
|---|---|---|---|---|---|
| Dog | Baseline | 29000 | LR : 1e-3 for 10k, 1e-4 for 10k-20k, 1e-5 for 20k-29k | 26.9 | 25.3 |
| Dog | Flickers as HN | 22000 | LR : 1e-4 for 15k, 1e-5 for 15k-22k | 28.1 | 26.4 |
| Train | Baseline | 26000 | LR : 1e-3, stepsize : 10k, lr decay : 0.1 | 33.9 | 33.2 |
| Train | Flickers as HN | 24000 | LR : 1e-3, stepsize : 10k, lr decay : 0.1 | 35.4 | 33.7 |
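The learning-rate settings in the results read as step schedules: the rate is multiplied by the decay factor every `stepsize` iterations. The short sketch below, whose function name and defaults are illustrative, computes the rate at a given iteration under that assumption. It mirrors the usual step policy, base_lr * gamma^floor(iter / stepsize), and is not taken from the repository's solver files.

```python
def step_lr(iteration, base_lr=1e-3, stepsize=10_000, gamma=0.1):
    """Learning rate under a step schedule: decay by `gamma` every `stepsize` iterations."""
    return base_lr * gamma ** (iteration // stepsize)


# Train baseline settings: LR 1e-3, stepsize 10k, lr decay 0.1, 26000 iterations.
train_schedule = [(it, step_lr(it)) for it in (0, 9_999, 10_000, 20_000, 25_999)]

# Dog "Flickers as HN" settings read as base LR 1e-4 with a single step at 15k.
dog_hn_schedule = [(it, step_lr(it, base_lr=1e-4, stepsize=15_000))
                   for it in (0, 14_999, 15_000, 21_999)]

for it, lr in train_schedule:
    print(f"iter {it:>6}: lr = {lr:.0e}")
```

Under this reading, the Dog baseline row (1e-3, then 1e-4, then 1e-5 at 10k-iteration boundaries) corresponds to the same defaults as the Train rows.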

A few examples of the reduction in false positives achieved for the 'Dog' category are shown below :-

Baseline Flickers as HN

