tests on GPU and minor modifications
1 parent 3bcf9b2, commit 443b595
Showing 4 changed files with 37 additions and 45 deletions.
@@ -0,0 +1,28 @@
# Diverse Beam Search

This code implements diverse beam search, an approximate inference algorithm that generates diverse decodings. This repository demos the method for image captioning using [neuraltalk2][1].

## Requirements
You will need to install [torch](http://torch.ch/) and the following packages:
- `nn`
- `nngraph`
- `image`
- `loadcaffe`
- `hdf5` (optional, depending on how you want to input data)

The [torch/distro](https://github.com/torch/distro) repository is a convenient way to install torch, since it also installs several of these packages.

If you are using a GPU, you will also need `cutorch` and `cunn`. If the image-captioning checkpoint was trained using `cudnn`, you will need `cudnn` as well: download it from NVIDIA's [website](https://developer.nvidia.com/cudnn) and add it to your `LD_LIBRARY_PATH`.
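
One way to check the `LD_LIBRARY_PATH` step is sketched below in Python (not part of this repository, which is Lua/Torch). The helper name `find_cudnn` is hypothetical, and the sketch assumes a Linux system where the library files are named `libcudnn*.so*`; adjust the pattern if your installation differs.

```python
import os
from pathlib import Path


def find_cudnn(env=None):
    """Return the LD_LIBRARY_PATH directories that contain a libcudnn shared library.

    `env` is any mapping of environment variables; it defaults to os.environ.
    The file pattern is an assumption about how cuDNN is named on Linux.
    """
    env = os.environ if env is None else env
    hits = []
    for entry in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if not entry:
            continue  # skip empty segments such as a trailing ':'
        directory = Path(entry)
        if directory.is_dir() and any(directory.glob("libcudnn*.so*")):
            hits.append(str(directory))
    return hits


if __name__ == "__main__":
    found = find_cudnn()
    if found:
        print("cudnn found in:", ", ".join(found))
    else:
        print("no libcudnn found on LD_LIBRARY_PATH")
```

An empty result suggests that the directory holding the downloaded library still needs to be added to `LD_LIBRARY_PATH`.
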
Any of the checkpoints distributed by Andrej Karpathy with the [neuraltalk2][1] repository can be used with this code. You can also train your own model using [neuraltalk2][1] and use this code to sample diverse sentences.

## Generating Diverse Sequences
After installing all the dependencies, you should be able to obtain diverse captions by running:
```
$ th -model /path/to/model.t7 -num_images 1 -image_folder eval_images -gpuid -1
```
To run a beam search of size 10 (`-B 10`) with 5 diverse groups (`-M 5`) and a diversity strength of 0.5 (`-lambda 0.5`) on the same image, run:
```
$ th -model /path/to/model.t7 -B 10 -M 5 -lambda 0.5 -num_images 1 -image_folder eval_images -gpuid -1
```
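
To make the roles of the beam size, the number of groups and the diversity strength concrete, here is a minimal Python sketch of a group-wise beam search. It is not the code in this repository (which is Lua/Torch) and makes several assumptions: the `B` beams are split evenly into `M` groups, diversity is measured by a Hamming-style count of how often earlier groups chose the same token at the current step, the penalty is used only for selection, and decoding runs for a fixed number of steps with no end-of-sequence handling. `toy_log_probs` and `VOCAB` are hypothetical stand-ins for the captioning model and its vocabulary.

```python
import math

VOCAB = ["a", "dog", "cat", "runs", "sits", "sleeps"]  # hypothetical toy vocabulary


def toy_log_probs(prefix):
    """Hypothetical stand-in for the model: log-probabilities over VOCAB given a prefix."""
    last = prefix[-1] if prefix else 0
    scores = [((3 * i + 5 * last + 7 * len(prefix)) % 11) / 2.0
              for i in range(len(VOCAB))]
    log_z = math.log(sum(math.exp(s) for s in scores))
    return [s - log_z for s in scores]


def diverse_beam_search(log_prob_fn, vocab_size, B, M, lam, steps):
    """Return M groups, each a list of (token_tuple, log_prob) with B // M entries."""
    assert B % M == 0, "this sketch assumes B is divisible by M"
    width = B // M
    groups = [[((), 0.0)] for _ in range(M)]
    for _ in range(steps):
        # counts[tok]: how many beams in earlier groups picked tok at this step
        counts = [0] * vocab_size
        for g in range(M):
            candidates = []
            for seq, logp in groups[g]:
                step_lp = log_prob_fn(seq)
                for tok in range(vocab_size):
                    new_logp = logp + step_lp[tok]
                    # penalise tokens already used by earlier groups (selection only)
                    augmented = new_logp - lam * counts[tok]
                    candidates.append((augmented, new_logp, seq + (tok,)))
            candidates.sort(key=lambda c: c[0], reverse=True)
            groups[g] = [(seq, lp) for _, lp, seq in candidates[:width]]
            for seq, _ in groups[g]:
                counts[seq[-1]] += 1
    return groups


if __name__ == "__main__":
    result = diverse_beam_search(toy_log_probs, len(VOCAB), B=10, M=5, lam=0.5, steps=4)
    for g, beams in enumerate(result):
        for seq, logp in beams:
            print(g, round(logp, 3), " ".join(VOCAB[t] for t in seq))
```

In this sketch, a strength of 0 makes every group repeat the same ordinary beam search, while larger values push later groups away from the tokens that earlier groups selected.
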

## References
[1]: https://github.com/karpathy/neuraltalk2

Binary file not shown.
@@ -0,0 +1 @@
[{"captions":[{"logp":-5.0367479324341,"sentence":"a dog with a frisbee in its mouth"},{"logp":-6.2141075134277,"sentence":"a small dog with a frisbee in its mouth"},{"logp":-6.6345801353455,"sentence":"a dog is playing with a frisbee in the grass"},{"logp":-7.0890574455261,"sentence":"a black and white dog with a frisbee in its mouth"},{"logp":-7.9146413803101,"sentence":"a dog is playing with a frisbee in the yard"},{"logp":-8.264461517334,"sentence":"a small dog with a frisbee in his mouth"},{"logp":-9.0053958892822,"sentence":"a black and white dog with a frisbee in his mouth"},{"logp":-9.5824060440063,"sentence":"a dog with a frisbee in its mouth running in a field"},{"logp":-9.7516613006592,"sentence":"the dog is running in the grass with a frisbee"},{"logp":-9.9132232666016,"sentence":"the dog is in the grass with a frisbee"}],"image_id":"dog.jpg"}]
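
The results file above is a JSON list with one entry per image, each holding an `image_id` and a list of `captions` with `logp` and `sentence` fields. A small Python sketch for reading it follows; `summarize` is a hypothetical helper, `SAMPLE` is an abbreviated copy of the output above (three of the ten captions), and the distinct-word ratio is only a rough illustrative measure of diversity, not a metric used by this repository.

```python
import json

# Abbreviated copy of the results above: three of the ten captions for dog.jpg.
SAMPLE = (
    '[{"captions":['
    '{"logp":-5.0367479324341,"sentence":"a dog with a frisbee in its mouth"},'
    '{"logp":-6.6345801353455,"sentence":"a dog is playing with a frisbee in the grass"},'
    '{"logp":-9.7516613006592,"sentence":"the dog is running in the grass with a frisbee"}'
    '],"image_id":"dog.jpg"}]'
)


def summarize(results):
    """Map each image_id to its best caption, caption count and distinct-word ratio."""
    summary = {}
    for entry in results:
        # Higher (less negative) log-probability means a more likely caption.
        captions = sorted(entry["captions"], key=lambda c: c["logp"], reverse=True)
        words = [w for c in captions for w in c["sentence"].split()]
        summary[entry["image_id"]] = {
            "best": captions[0]["sentence"],
            "num_captions": len(captions),
            "distinct_word_ratio": len(set(words)) / len(words),
        }
    return summary


if __name__ == "__main__":
    for image_id, info in summarize(json.loads(SAMPLE)).items():
        print(image_id, info)
```

To use it on a real results file, load the file contents with `json.load` and pass the resulting list to `summarize`.
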