Bottom-Up Top-Down Object Features
==================================

These visual features were introduced by Anderson et al. in the paper *Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering*.

They are extracted with a Faster R-CNN trained on the Visual Genome dataset to detect objects and their attributes (shapes, colors, ...).

``multimodal`` provides a class to download and use these features, extracted on the COCO image dataset. They can be used for most Visual Question Answering and Captioning models that use this dataset for images.

.. code-block:: python

    from multimodal.features import COCOBottomUpFeatures

    bottomup = COCOBottomUpFeatures(
        features="coco-bottomup-36",
        dir_data="/data/multimodal",
    )
    image_id = 13455
    feats = bottomup[image_id]
    print(feats.keys())
    # ['image_w', 'image_h', 'num_boxes', 'boxes', 'features']
    print(feats["features"].shape)  # numpy array
    # (36, 2048)
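
The returned arrays can then be fed to a downstream VQA or captioning model. The snippet below is a minimal sketch of a common preprocessing step, assuming PyTorch is installed; the conversion to tensors and the normalization of box coordinates by image size are illustrative choices, not part of the ``multimodal`` API, and the ``(x1, y1, x2, y2)`` box layout is an assumption.

.. code-block:: python

    import numpy as np
    import torch

    from multimodal.features import COCOBottomUpFeatures

    bottomup = COCOBottomUpFeatures(
        features="coco-bottomup-36",
        dir_data="/data/multimodal",
    )
    feats = bottomup[13455]

    # Object features as a (num_boxes, 2048) float tensor.
    features = torch.from_numpy(np.asarray(feats["features"], dtype=np.float32))

    # Normalize box coordinates to [0, 1] using the image size, a common
    # preprocessing step for VQA models. Assumes boxes are stored as
    # (x1, y1, x2, y2) pixel coordinates (illustrative, not guaranteed
    # by the API).
    boxes = np.asarray(feats["boxes"], dtype=np.float32)
    w, h = feats["image_w"], feats["image_h"]
    boxes /= np.array([w, h, w, h], dtype=np.float32)
    boxes = torch.from_numpy(boxes)

    print(features.shape)  # torch.Size([36, 2048])
    print(boxes.shape)     # torch.Size([36, 4])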
.. autoclass:: multimodal.features.COCOBottomUpFeatures
    :members:
    :private-members:
    :special-members: __getitem__