Alex Bozarth edited this page Jul 12, 2018 · 21 revisions

Short Name

Create a web app to interact with machine learning generated image captions

Short Description

Use an open-source image caption generator deep learning model to filter images based on their content in a web application.

Offering Type

Artificial Intelligence

Introduction

The introduction of the IBM Model Asset eXchange (MAX) has given application developers without data science experience easy access to prebuilt machine learning models. This code pattern shows how simple it can be to create a web app that utilizes a MAX model. The web app uses the Image Caption Generator from MAX and creates a simple web UI that allows users to filter images based on the descriptions given by the model.

Author

By Alex Bozarth and Daniel Jalova

Code

Video

Overview

According to an IBM study, 2.5 quintillion bytes of data are created every day. Much of that data is unstructured: large texts, audio recordings, and images. To do something useful with it, we must first convert it to structured data.

In this Code Pattern we will use one of the models from the Model Asset Exchange (MAX), an exchange where developers can find and experiment with open source deep learning models. Specifically, we will use the Image Caption Generator to create a web application that captions images and lets the user filter images based on their content. The web application provides an interactive user interface backed by a lightweight Python server using Tornado. The server takes in images via the UI, sends them to the model's REST endpoint, and displays the generated captions in the UI. The model's REST endpoint is set up using the Docker image provided on MAX. The web UI displays the generated caption for each image, along with an interactive word cloud for filtering images by caption.
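The caption-based filtering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the app's actual code (the web UI does this in JavaScript with D3.js and d3-cloud). The stop-word list, the function names, and the sample captions are all hypothetical.

```python
from collections import Counter

# Hypothetical stop-word list; the real app may choose its words differently.
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "with", "and", "is", "at", "to"}


def word_counts(captions):
    """Count non-stop words across all captions (image name -> caption text)."""
    counts = Counter()
    for caption in captions.values():
        counts.update(w for w in caption.lower().split() if w not in STOP_WORDS)
    return counts


def filter_images(captions, word):
    """Return the names of images whose caption contains the selected word."""
    word = word.lower()
    return sorted(
        name for name, caption in captions.items()
        if word in caption.lower().split()
    )


# Made-up captions standing in for model output.
captions = {
    "beach.jpg": "a man riding a wave on a surfboard",
    "street.jpg": "a man walking down a street",
    "dog.jpg": "a dog sitting on a beach",
}

cloud = word_counts(captions)              # word sizes for the cloud
selected = filter_images(captions, "man")  # images kept after clicking "man"
```

The word counts would drive the size of each word in the cloud, and selecting a word narrows the displayed images to those returned by `filter_images`.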

When the reader has completed this Code Pattern, they will understand how to:

  • Build a Docker image of the Image Caption Generator MAX Model
  • Deploy a deep learning model with a REST endpoint
  • Generate captions for an image using the MAX Model's REST API
  • Run a web application that uses the model's REST API
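As a rough sketch of the third goal, the snippet below posts an image to a locally running model container and picks the most probable caption. It uses only the Python standard library. The URL, the `image` form field, and the response shape (a `predictions` list with `caption` and `probability` entries) are assumptions about the model's API; check the API documentation served by your deployment before relying on them.

```python
import json
import os
import urllib.request
import uuid

# Assumed local endpoint; adjust host, port, and path to match your deployment.
MODEL_URL = "http://localhost:5000/model/predict"


def encode_multipart(field, filename, data, content_type="image/jpeg"):
    """Encode one file as multipart/form-data; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def best_caption(response):
    """Pick the most probable caption from the (assumed) response payload."""
    predictions = response.get("predictions", [])
    if not predictions:
        return None
    return max(predictions, key=lambda p: p["probability"])["caption"]


def caption_image(path, url=MODEL_URL):
    """POST an image file to the model endpoint and return its best caption."""
    with open(path, "rb") as f:
        body, content_type = encode_multipart("image", os.path.basename(path), f.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as reply:
        return best_caption(json.load(reply))
```

With the model container running, `caption_image("photo.jpg")` would return a caption string, or `None` if the response contains no predictions.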

Architecture

Flow

  1. Server sends default images to Model API and receives caption data.
  2. User interacts with Web UI containing default content and uploads image(s).
  3. Web UI requests caption data for image(s) from Server and updates content when data is returned.
  4. Server sends image(s) to Model API and receives caption data to return to Web UI.
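The flow above can be modeled with a small class that takes the captioning call as a parameter. This is a simplified sketch, not the actual Tornado server: `CaptionServer`, `handle_upload`, and `fake_model` are hypothetical names, and the fake model stands in for a real request to the Model API so the example runs without a container.

```python
class CaptionServer:
    """Toy model of the server's role in the flow (no HTTP involved)."""

    def __init__(self, caption_fn, default_images):
        self.caption_fn = caption_fn
        # Step 1: caption the default images once at startup.
        self.captions = {
            name: caption_fn(data) for name, data in default_images.items()
        }

    def handle_upload(self, images):
        # Steps 3 and 4: caption the uploaded images and return the new
        # caption data so the web UI can update its content.
        new_captions = {name: self.caption_fn(data) for name, data in images.items()}
        self.captions.update(new_captions)
        return new_captions


def fake_model(data):
    """Stand-in for the Model API call; returns a made-up caption."""
    return f"an image of {len(data)} bytes"


server = CaptionServer(fake_model, {"default.jpg": b"\x00" * 4})
# Step 2: the user uploads an image through the web UI.
uploaded = server.handle_upload({"mine.jpg": b"\x00" * 10})
```

Passing the captioning function in as an argument keeps the flow testable without a running model; in a real deployment it would be replaced by a function that calls the model's REST endpoint.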

Included Components

  • IBM Model Asset Exchange: A place for developers to find and use free and open source deep learning models.
  • Docker: Docker is a tool designed to make it easier to create, deploy, and run applications by using containers.

Featured Technologies

  • Python: Python is a programming language that lets you work more quickly and integrate your systems more effectively.
  • jQuery: jQuery is a cross-platform JavaScript library designed to simplify the client-side scripting of HTML.
  • Bootstrap 3: Bootstrap is a free and open-source front-end library for designing websites and web applications.
  • Pexels: Pexels provides high quality and completely free stock photos licensed under the Creative Commons Zero (CC0) license.

Blog

Links

Libraries used in this Code Pattern

  • D3.js: D3.js is a JavaScript library for manipulating documents based on data.
  • d3-cloud: A Wordle-inspired word cloud layout written in JavaScript.
  • Featherlight: Featherlight is a very lightweight jQuery lightbox plugin.
  • Glyphicons: GLYPHICONS is a library of precisely prepared monochromatic icons and symbols, created with an emphasis on simplicity and easy orientation.
  • Image Picker: Image Picker is a simple jQuery plugin that transforms a select element into a more user friendly graphical interface.