Advertisement search interface based on image similarity.

Siamese

Siamese is a search interface for newspaper advertisements based on image similarity. It returns a set of nearest neighbors for a query image grouped by time period, which can be set at various lengths. It includes a graphical interface that presents the top 10 nearest neighbors along with a timeline of nearest neighbors for each year in the dataset.

Background

Siamese was created during Melvin Wevers's (UU) term as KB Researcher-in-Residence to explore a set of 426,777 high-resolution images of historical newspaper advertisements from two Dutch national newspapers: Algemeen Handelsblad (1945-1969) and NRC Handelsblad (1970-1994). Vector representations of the original images were obtained from the next-to-last layer of the TensorFlow Inception image classifier, which describes each image as a vector of 2048 floats. These representations were indexed for approximate nearest neighbor search with Annoy, with separate indices created for a number of time period lengths. To speed up web access, a set of thumbnails was generated by scaling the images down to a maximum height of 300 pixels.
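
To make the idea of nearest neighbor search over image vectors concrete, here is a minimal sketch that ranks a handful of toy vectors by Euclidean distance to a query. It uses exact brute-force search in plain Python as a stand-in for Annoy's approximate index, and 3-dimensional vectors instead of the 2048-dimensional Inception descriptions. The identifiers and the `nearest_neighbors` helper are hypothetical and not part of Siamese.

```python
import math


def nearest_neighbors(query_id, vectors, n_nns=2):
    """Return the n_nns identifiers closest to query_id by Euclidean distance."""
    query = vectors[query_id]
    distances = [
        (math.dist(query, vec), other_id)
        for other_id, vec in vectors.items()
        if other_id != query_id  # skip the query image itself
    ]
    distances.sort()  # smallest distance first
    return [other_id for _, other_id in distances[:n_nns]]


# Toy 3-dimensional vectors; the real descriptions have 2048 floats.
toy_vectors = {
    "ad-a": [0.0, 0.0, 0.0],
    "ad-b": [0.1, 0.0, 0.0],
    "ad-c": [1.0, 1.0, 1.0],
    "ad-d": [0.0, 0.3, 0.0],
}

print(nearest_neighbors("ad-a", toy_vectors))  # ['ad-b', 'ad-d']
```

Brute-force search compares the query against every vector, which is why an approximate index such as Annoy becomes attractive for a collection of several hundred thousand images.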

Usage

Given the availability of appropriately structured data, indices for e.g. each year and decade can be built with:

import annoy_indexer

indexer = annoy_indexer.AnnoyIndexer(vector_dir='vectors', index_dir='indices-eucl', n_dimensions=2048, metric='euclidean')
indexer.build(n_trees=100, step_sizes=[10, 1])

These can now be queried with an identifier for a specific image from the set. To retrieve, for example, the 5 nearest neighbors for each decade:

indexer.load(step_sizes=[10])
indexer.query_all('KBNRC01:000028496:mpeg21:a0065', n_nns=[5])
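
The effect of the step size can be sketched as follows: images are bucketed into periods of `step` years, and neighbors are then looked up within each bucket. How annoy_indexer aligns period boundaries is not stated here, so the sketch assumes periods are counted from the first year of the dataset (1945). The `period_start` and `group_by_period` helpers and the sample identifiers are hypothetical.

```python
from collections import defaultdict

FIRST_YEAR = 1945  # first year of the dataset described above


def period_start(year, step, first_year=FIRST_YEAR):
    """Return the first year of the step-sized period containing `year`."""
    return first_year + ((year - first_year) // step) * step


def group_by_period(image_years, step):
    """Map each period start to the sorted identifiers published in that period."""
    periods = defaultdict(list)
    for image_id, year in image_years.items():
        periods[period_start(year, step)].append(image_id)
    return {start: sorted(ids) for start, ids in sorted(periods.items())}


# Hypothetical identifiers with their publication years.
sample = {"ad-1": 1945, "ad-2": 1954, "ad-3": 1955, "ad-4": 1972}

print(group_by_period(sample, step=10))  # three ten-year buckets
print(group_by_period(sample, step=1))   # one bucket per year
```

With `step=10` the first two sample images share a period, while with `step=1` every image lands in its own year, mirroring the decade and year indices built with `step_sizes=[10, 1]`.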

Web API

Running annoy_web.py starts a Bottle web application that accepts parameters:

  • urn: the identifier of the query image
  • nns: the number of nearest neighbors to be returned
  • step: the time scale for the query, specified as a number of years
  • vectors: whether or not the vectors of the images should be included in the response
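
A request to the web application could be assembled as shown below. Only the four parameter names come from the list above; the host, port, and root path, as well as the value format for `vectors`, are assumptions that should be checked against annoy_web.py. The `build_query_url` helper is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of a locally running instance; adjust host, port and route as needed.
BASE_URL = "http://localhost:8080/"


def build_query_url(urn, nns=5, step=10, vectors=False, base_url=BASE_URL):
    """Build a GET URL carrying the urn, nns, step and vectors parameters."""
    params = {
        "urn": urn,
        "nns": nns,
        "step": step,
        # Assumption: the flag is passed as 1 or 0.
        "vectors": int(vectors),
    }
    # urlencode percent-encodes the colons in the identifier.
    return base_url + "?" + urlencode(params)


url = build_query_url("KBNRC01:000028496:mpeg21:a0065", nns=5, step=10)
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client once the application is running.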

Online demo

More information

For more information, instructions and examples, see http://lab.kb.nl/tool/siamese.