Mindspore Implementation of "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
- Convolutional Neural Networks
- Long Short Term Memory Cells
- Attention Mechanism (a minimal MindSpore sketch follows the requirements below)
- Ascend910 or RTX 3090
- Mindspore=2.0.0
- Python=3.8.0
- mode=ms.PYNATIVE_MODE
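The attention component can be summarized with the sketch below. It illustrates the soft (additive) attention described in the paper and is not the exact code in this repository; the class name, layer names, and dimensions are illustrative assumptions.

```python
# Minimal soft-attention sketch in MindSpore (illustrative, not the repo's exact code).
import numpy as np
import mindspore as ms
import mindspore.nn as nn
import mindspore.ops as ops

class SoftAttention(nn.Cell):
    """Additive attention over the encoder's spatial feature vectors."""
    def __init__(self, encoder_dim, decoder_dim, attention_dim):
        super().__init__()
        self.encoder_att = nn.Dense(encoder_dim, attention_dim)  # project image features
        self.decoder_att = nn.Dense(decoder_dim, attention_dim)  # project decoder hidden state
        self.full_att = nn.Dense(attention_dim, 1)               # scalar attention score per location

    def construct(self, encoder_out, decoder_hidden):
        # encoder_out: (batch, num_pixels, encoder_dim), decoder_hidden: (batch, decoder_dim)
        att1 = self.encoder_att(encoder_out)                      # (batch, num_pixels, attention_dim)
        att2 = self.decoder_att(decoder_hidden).expand_dims(1)    # (batch, 1, attention_dim)
        scores = self.full_att(ops.tanh(att1 + att2)).squeeze(2)  # (batch, num_pixels)
        alpha = ops.softmax(scores, axis=-1)                      # attention weights over locations
        context = (encoder_out * alpha.expand_dims(2)).sum(axis=1)  # (batch, encoder_dim)
        return context, alpha

# Quick shape check with random features (e.g. a 14x14x2048 CNN feature map flattened to 196 vectors).
feats = ms.Tensor(np.random.randn(2, 196, 2048).astype(np.float32))
hidden = ms.Tensor(np.random.randn(2, 512).astype(np.float32))
context, alpha = SoftAttention(2048, 512, 512)(feats, hidden)
print(context.shape, alpha.shape)  # (2, 2048) (2, 196)
```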
Clone the repo:
git clone https://github.com/NicholasKX/ShowAttendTell.git
- Prepare the dataset (Flickr8k).
- Extract the archive, move the images into a folder named Images, and put the caption text in captions.txt.
- Place the Images folder and captions.txt inside a folder named flickr8k (expected layout and a small loading sketch below).
- Use Andrej Karpathy's training, validation, and test splits (train.csv, val.csv, test.csv).
flickr8k
|-- Images
|   |-- 1000268201_693b08cb0e.jpg
|   |-- ......
|-- captions.txt
|-- train.csv
|-- val.csv
|-- test.csv
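The sketch below shows one way this layout can be read into (image path, caption) pairs. It assumes captions.txt is a CSV with an `image,caption` header (the common Kaggle Flickr8k format); adjust the parsing if your file differs. It is not the repository's own data-loading code.

```python
# Minimal sketch: read the flickr8k layout above into (image_path, caption) pairs.
import csv
from pathlib import Path

def load_pairs(flickr8k_root):
    root = Path(flickr8k_root)
    pairs = []
    with open(root / "captions.txt", newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)  # expects "image,caption" columns (assumption)
        for row in reader:
            image_path = root / "Images" / row["image"]
            pairs.append((image_path, row["caption"].strip()))
    return pairs

pairs = load_pairs("flickr8k")
print(len(pairs), pairs[0])
```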
- Run the following command:
python train.py
- Specify the checkpoint file path in train.py.
- You can also change the hyperparameters in train.py.
- The trained model is saved in the model_saved folder (see the checkpoint sketch below).
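The snippet below is a minimal sketch of the MindSpore checkpoint calls involved in saving to and loading from model_saved; the placeholder network and file name are illustrative, not the exact code in train.py.

```python
# Minimal MindSpore checkpoint sketch (illustrative placeholder network and path).
import os
import mindspore as ms
import mindspore.nn as nn

os.makedirs("model_saved", exist_ok=True)
net = nn.Dense(10, 10)  # stands in for the encoder/decoder built in train.py

# Saving: write the network parameters into the model_saved folder.
ms.save_checkpoint(net, "model_saved/example.ckpt")

# Loading: restore the parameters into a freshly built network before inference.
param_dict = ms.load_checkpoint("model_saved/example.ckpt")
ms.load_param_into_net(net, param_dict)
```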
- Download the checkpoint file and put it in the model_saved folder.
- Run the following command:
python caption.py --img <path_to_image> --beam_size <beam_size>
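For example, with an image from the Flickr8k Images folder and a beam size of 5 (both values are illustrative):

python caption.py --img flickr8k/Images/1000268201_693b08cb0e.jpg --beam_size 5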
- Run the following command:
python evaluation.py
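BLEU is the standard metric for this task (the original paper reports BLEU-1 through BLEU-4 and METEOR). The snippet below is a minimal sketch of computing corpus-level BLEU-4 with NLTK on toy data; it is not the code inside evaluation.py.

```python
# Minimal BLEU-4 sketch with NLTK (toy data, illustrative only).
from nltk.translate.bleu_score import corpus_bleu

# Each hypothesis is scored against all reference captions of its image.
references = [[["a", "dog", "runs", "on", "the", "beach"],
               ["a", "dog", "is", "running", "along", "the", "shore"]]]
hypotheses = [["a", "dog", "is", "running", "on", "the", "beach"]]

bleu4 = corpus_bleu(references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25))
print(f"BLEU-4: {bleu4:.4f}")
```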
Some of the results obtained are shown below:
Caption: a dog is running on the beach.
Caption: a man is standing on top of a mountain.
Bad Case:
Caption: a man rides a motorcycle.