HughKu/Im2txt

Note:

This repo aims to provide a ready-to-go TensorFlow environment for image-captioning inference with a pre-trained model. For training from scratch or fine-tuning, please refer to the TensorFlow Model Repo.

Contents

  • Model Overview
  • Requirements
  • Get Pre-trained Model
  • Generating Captions
  • Encountering Issues

Model Overview

Introduction

The Show and Tell model is a deep neural network that learns how to describe the content of images. For example:

Example captions

Show and Tell: A Neural Image Caption Generator

A TensorFlow implementation of the image-to-text model described in the paper:

"Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge."

Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan.

IEEE Transactions on Pattern Analysis and Machine Intelligence (2016).

Full text available at: http://arxiv.org/abs/1609.06647

Architecture

Please refer to the original TensorFlow Model Repo for the full details; a minimal sketch of the encoder-decoder wiring is given below for orientation.
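The gist: a CNN encoder (Inception v3 in the paper) turns the image into a fixed-length embedding, which seeds the state of an LSTM decoder that then emits the caption one word at a time. The following is a minimal, self-contained sketch of that wiring in TensorFlow 1.x; the sizes are made up and a dense layer stands in for the real Inception v3 encoder, so treat it as illustration, not the repo's actual code:

import tensorflow as tf

# Hypothetical sizes, for illustration only.
vocab_size, embed_size, num_lstm_units = 12000, 512, 512

images = tf.placeholder(tf.float32, [None, 299, 299, 3])  # Inception v3 input size
input_seqs = tf.placeholder(tf.int64, [None, None])       # caption word ids

# Encoder: the real model runs Inception v3 here; a single dense layer
# stands in for it so the sketch is self-contained.
image_embeddings = tf.layers.dense(tf.layers.flatten(images), embed_size)

# Word embeddings for the decoder input.
embedding_map = tf.get_variable("word_embedding", [vocab_size, embed_size])
seq_embeddings = tf.nn.embedding_lookup(embedding_map, input_seqs)

# Decoder: an LSTM whose state is initialized by feeding in the image
# embedding, then unrolled over the caption words.
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(num_lstm_units)
with tf.variable_scope("lstm") as lstm_scope:
    zero_state = lstm_cell.zero_state(tf.shape(images)[0], tf.float32)
    _, initial_state = lstm_cell(image_embeddings, zero_state)
    lstm_scope.reuse_variables()
    lstm_outputs, _ = tf.nn.dynamic_rnn(lstm_cell, seq_embeddings,
                                        initial_state=initial_state,
                                        scope=lstm_scope)

# Per-step scores over the vocabulary; during inference the next caption
# word is read off the softmax of these logits.
logits = tf.layers.dense(lstm_outputs, vocab_size)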

Requirements

Install Required Packages

I strongly suggest running pip install -r requirement.txt from the repo root to get all needed packages at once.

Alternatively, you can install the required packages manually; note that the repo targets TensorFlow r1.9.
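Either way, it is worth confirming that the installed TensorFlow matches the version this repo targets:

import tensorflow as tf
print(tf.__version__)  # expect 1.9.x for this repo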

Get Pre-trained Model

Download the fine-tuned Inception v3 parameters (trained for over 1M iterations). You will get the 4 files below; make sure to put them all into im2txt/model/Hugh/train/:

  • newmodel.ckpt-2000000.data-00000-of-00001
  • newmodel.ckpt-2000000.index
  • newmodel.ckpt-2000000.meta
  • checkpoint
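As a quick sanity check that the download is complete, you can restore the checkpoint directly (a minimal sketch, assuming TensorFlow 1.x; note that a checkpoint is addressed by its common prefix, not by any single file):

import tensorflow as tf

checkpoint_path = "im2txt/model/Hugh/train/newmodel.ckpt-2000000"

# The .meta file holds the graph definition; .data/.index hold the weights.
saver = tf.train.import_meta_graph(checkpoint_path + ".meta")
with tf.Session() as sess:
    saver.restore(sess, checkpoint_path)
    print("Checkpoint restored.")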

Generating Captions

Your downloaded Show and Tell model can generate captions for any JPEG image. The following command generates captions for a test image:

python im2txt/run_inference.py \
  --checkpoint_path="im2txt/model/Hugh/train/newmodel.ckpt-2000000" \
  --vocab_file="im2txt/data/Hugh/word_counts.txt" \
  --input_files="im2txt/data/images/test.jpg"

Example output:

Captions for image test.jpg:
  0) a young boy wearing a hat and tie . (p=0.000195)
  1) a young boy wearing a blue shirt and tie . (p=0.000100)
  2) a young boy wearing a blue shirt and a tie . (p=0.000045)

Note: you may get different results, as some variation between trained models is expected. The p values are the joint probabilities the model assigns to each caption, i.e. the product of the per-word probabilities, which is why they are so small.
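Those p values come from beam search: im2txt keeps the few most probable partial captions at each step and scores each finished caption by the product of its per-word probabilities. The toy sketch below mimics that procedure end to end; the vocabulary and the random next-word table are stand-ins for the real model's LSTM softmax, so only the mechanics carry over:

import numpy as np

# Toy stand-in for the decoder: a fixed random "next word" table keyed
# on the previous word only, so the example runs without the model.
vocab = ["<S>", "</S>", "a", "boy", "young", "wearing", "hat", "tie", "."]
rng = np.random.default_rng(0)
table = rng.random((len(vocab), len(vocab)))
table /= table.sum(axis=1, keepdims=True)

def beam_search(beam_size=3, max_len=10):
    start, end = vocab.index("<S>"), vocab.index("</S>")
    beams = [([start], 0.0)]              # (word ids, log probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            probs = table[seq[-1]]        # model's next-word distribution
            for w in np.argsort(probs)[-beam_size:]:
                candidates.append((seq + [int(w)], logp + np.log(probs[w])))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, logp in candidates[:beam_size]:
            (finished if seq[-1] == end else beams).append((seq, logp))
        if not beams:
            break
    ranked = sorted(finished or beams, key=lambda c: c[1], reverse=True)
    for i, (seq, logp) in enumerate(ranked):
        words = " ".join(vocab[w] for w in seq[1:])
        print(f"  {i}) {words} (p={np.exp(logp):.6f})")

beam_search()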

Here is the input image (test.jpg):
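One more detail worth knowing: --vocab_file maps the decoder's word ids back to words. If your captions come out garbled, a quick peek can confirm the file looks the way im2txt expects; the format below (one word and its count per line) is an assumption borrowed from the upstream im2txt vocabulary files, so check your own copy:

# Assumed format: one "word count" pair per line, as in the upstream
# im2txt vocabulary files.
with open("im2txt/data/Hugh/word_counts.txt") as f:
    for line in f.readlines()[:5]:
        word, count = line.split()
        print(word, int(count))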

Encountering Issues

First, check out this thread; it is likely that you will find an answer there. Otherwise, open an issue and I will try to help you.
