Unsupervised clustering of movie posters with features extracted from Convolutional Neural Network
Latest commit 6e21802 Nov 28, 2018





Unsupervised clustering of movie posters using features extracted from a Convolutional Neural Network. The visualization uses Flask for the backend and D3.js for the frontend.

This project is divided into 3 main scripts:

  • get_posters.py
    • retrieve the posters from impawards.com.
    • create a thumbnail of each poster for the visualization.
  • get_features_from_cnn.py
    • extract features from the last convolutional layer of a pre-trained ConvNet (VGG-16 or ResNet50).
  • get_data_visu.py
    • reduce dimensionality with UMAP for the data visualization.
    • compute the cosine similarity and extract the 6 "closest" images for each poster.
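The similarity step of get_data_visu.py can be sketched roughly as follows. This is a minimal NumPy illustration, not the project's actual code: the function name `closest_posters` and the layout of the features (one row per poster) are assumptions made for the example.

```python
import numpy as np


def closest_posters(features, k=6):
    """Return, for each row, the indices of the k most cosine-similar other rows.

    `features` is assumed to be an (n_posters, n_dims) array with one
    feature vector per poster (hypothetical layout).
    """
    features = np.asarray(features, dtype=np.float64)
    # Normalize each row to unit length so a dot product equals cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    # Full n x n similarity matrix: this is where the O(n^2) memory goes.
    similarity = unit @ unit.T
    # Exclude each poster from its own neighbor list.
    np.fill_diagonal(similarity, -np.inf)
    # Sort each row by decreasing similarity and keep the first k columns.
    return np.argsort(-similarity, axis=1)[:, :k]
```

Row i of the returned array lists the indices of the posters closest to poster i, most similar first.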

To get a description of each script's parameters:

  • python src/get_XXX.py --help



  • Linux/Unix/OSX (requirement for wget)
  • Python 3.3+
  • ImageMagick
  • PostgreSQL

Python packages

  • BeautifulSoup 4.4
  • Tensorflow
  • Keras
  • Pandas
  • requests
  • sklearn
  • numpy
  • PIL
  • flask


Extracting the features from the ConvNet is slow if you do not have a GPU. Computing the similarity between every pair of posters requires O(n^2) memory, which amounts to around 32 GB of RAM.
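To see where the quadratic cost comes from: a dense n x n similarity matrix of 64-bit floats needs 8 n^2 bytes. The sketch below estimates that size and shows one common way to reduce it, by computing similarities in row blocks and keeping only the top k neighbors per poster. This block-wise variant is an alternative technique, not something the project's scripts are known to do. The names `dense_matrix_gib` and `closest_posters_chunked` and the default chunk size are assumptions for illustration.

```python
import numpy as np


def dense_matrix_gib(n, bytes_per_value=8):
    """Size in GiB of a dense n x n matrix (8 bytes per value for float64)."""
    return n * n * bytes_per_value / 2**30


def closest_posters_chunked(features, k=6, chunk_size=1024):
    """Top-k cosine neighbors per row, holding only chunk_size x n similarities at once."""
    features = np.asarray(features, dtype=np.float64)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    n = unit.shape[0]
    # A poster can have at most n - 1 neighbors other than itself.
    neighbors = np.empty((n, min(k, n - 1)), dtype=np.int64)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Similarities between this block of rows and every poster.
        block = unit[start:stop] @ unit.T
        # Mask self-similarity so a poster is never its own neighbor.
        block[np.arange(stop - start), np.arange(start, stop)] = -np.inf
        neighbors[start:stop] = np.argsort(-block, axis=1)[:, : neighbors.shape[1]]
    return neighbors
```

For example, `dense_matrix_gib(65536)` evaluates to 32.0, so a figure of about 32 GB would match a float64 matrix over roughly 65,000 items. The actual number of posters and the data type used by the scripts are not stated here.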


Clone the repository:

$ git clone https://github.com/adrz/movie-posters-convnet.git
$ cd movie-posters-convnet/
$ virtualenv -p python3 env
$ source env/bin/activate
$ pip install -r requirements-gpu.txt

Create the PostgreSQL database (assuming PostgreSQL is already installed):

$ psql -U postgres -c "create user movieposters;"
$ psql -U postgres -c "create database movieposters;"
$ psql -U postgres -c "alter user movieposters with encrypted password 'yourpassword';"
$ psql -U postgres -c "grant all privileges on database movieposters to movieposters;"



After cloning, run the three scripts below, which will:

  • download posters from 1920 to 2016
  • compute features
  • compute the data-visualization features

$ python src/get_posters.py -c config/development.conf
$ python src/get_features_from_cnn.py -c config/development.conf
$ python src/get_data_visu.py -c config/development.conf
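
The three steps can also be chained from a small Python driver. This is a hedged sketch, not a script shipped with the repository: the helper names `build_pipeline_commands` and `run_pipeline` are hypothetical, and it assumes the script paths and the -c option described in this README.

```python
import subprocess
import sys

# Script order as described in this README:
# download posters, extract features, prepare the visualization data.
PIPELINE_SCRIPTS = [
    "src/get_posters.py",
    "src/get_features_from_cnn.py",
    "src/get_data_visu.py",
]


def build_pipeline_commands(config_path, python=sys.executable):
    """Return one argv list per pipeline step, each using the same config file."""
    return [[python, script, "-c", config_path] for script in PIPELINE_SCRIPTS]


def run_pipeline(config_path):
    """Run each step in order and stop at the first failure."""
    for command in build_pipeline_commands(config_path):
        # check=True raises CalledProcessError if a step exits with a non-zero status.
        subprocess.run(command, check=True)
```

Calling `run_pipeline("config/development.conf")` from the repository root, with the virtualenv activated, would run the same three steps in sequence.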

Then grab a coffee...


$ source env/bin/activate
$ configapi=./config/development.conf
$ python app.py

Then open index.html in your favorite browser:

$ chromium


$ chromium


Cherry-picked examples from the 200 closest pairs of posters (ranked by cosine distance):


This project is licensed under the MIT License - see the LICENSE.md file for details