This project uses source code created by Satoshi Tsutsui. See also his paper based on that code on arXiv: Using Artificial Tokens to Control Languages for Multilingual Image Caption Generation (arXiv:1706.06275).
- Raspberry Pi 3
- Camera module
- External speaker
- Mobile battery
- Set up the Raspbian OS environment by referring to the Raspberry Pi web site (this guide assumes the RASPBIAN LITE version)
- Enable the camera module by running
sudo raspi-config
and selecting Interfacing Options -> Camera -> "Would you like..." -> Yes, then reboot
- Change the time zone as necessary in raspi-config via Localisation Options -> Change Timezone
- Update the OS with
sudo apt-get update
followed by
sudo apt-get upgrade
Install the required programs and tools with the commands below.
sudo apt-get install python3-pip
sudo pip3 install chainer==1.19.0
sudo pip3 install scipy
sudo pip3 install h5py
sudo apt-get install python-h5py
sudo apt-get install libopenjp2-7-dev
sudo apt-get install libtiff5
sudo pip3 install Pillow
sudo apt-get install espeak
sudo apt-get install python3-picamera
sudo apt-get install libatlas-base-dev
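After the installs finish, a quick check that Python 3 can find each package may save time before downloading the models. The following is a minimal sketch, not part of the original project: the function names `missing_modules` and `report` are made up for illustration, and the module list assumes the usual import names (Pillow is imported as `PIL`).

```python
import importlib.util

# Import names of the packages installed above (assumed; Pillow imports as PIL).
REQUIRED_MODULES = ["chainer", "scipy", "h5py", "PIL", "picamera"]


def missing_modules(names):
    """Return the names from `names` that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def report(names=REQUIRED_MODULES):
    """Build a one-line summary of the check."""
    missing = missing_modules(names)
    if missing:
        return "Missing: " + ", ".join(missing)
    return "All required modules found."


print(report())
```

This only confirms that each module can be located. It does not import the modules, so a package that is present but broken would still pass.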
Download the code from the Git repository and test the sample code with the commands below.
sudo apt-get install git
git clone https://github.com/apple2373/chainer-caption.git
cd chainer-caption
bash download.sh
python3 sample_code_beam.py --rnn-model ./data/caption_en_model40.model --cnn-model ./data/ResNet50.model --vocab ./data/MSCOCO/mscoco_caption_train2014_processed_dic.json --gpu -1 --img ./sample_imgs/COCO_val2014_000000185546.jpg
If everything is installed properly, the result below is displayed, along with some warnings.
<sos> a bathroom with a toilet and a shower <eos>
-6.967587262392044
<sos> a bathroom with a toilet , sink , and mirror <eos>
-7.618740811944008
<sos> a bathroom with a toilet , sink , and shower <eos>
-8.537529528141022
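To use this output from another program, it has to be split into captions and numbers. The sketch below pairs each `<sos> ... <eos>` line with the number that follows it and skips anything else, such as warnings. It assumes that the number is a score where higher (less negative) means better, which matches the ordering above but is not confirmed here. The function names are hypothetical.

```python
SAMPLE = """<sos> a bathroom with a toilet and a shower <eos>
-6.967587262392044
<sos> a bathroom with a toilet , sink , and mirror <eos>
-7.618740811944008
<sos> a bathroom with a toilet , sink , and shower <eos>
-8.537529528141022
"""


def parse_beam_output(text):
    """Return a list of (caption, score) pairs found in `text`."""
    results = []
    caption = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("<sos>"):
            # Drop the start/end markers and tidy the spacing before commas.
            words = [w for w in line.split() if w not in ("<sos>", "<eos>")]
            caption = " ".join(words).replace(" ,", ",")
        elif caption is not None:
            try:
                score = float(line)
            except ValueError:
                continue  # not a number (e.g. a warning); keep waiting
            results.append((caption, score))
            caption = None
    return results


def best_caption(text):
    """Return the caption with the highest score, or None if none was found."""
    results = parse_beam_output(text)
    if not results:
        return None
    return max(results, key=lambda pair: pair[1])[0]


print(best_caption(SAMPLE))
```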
Run the captioning program with
python3 image_captioning.py
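For orientation, the overall flow on the device (take a photo, caption it, speak the caption) can be sketched as below. This is not the contents of image_captioning.py, which is not shown here. It is a rough illustration that reuses the sample_code_beam.py command from above, assumes it is run from inside the chainer-caption directory, and assumes the first caption printed is the best one. The function names and the file name `capture.jpg` are hypothetical.

```python
import subprocess

# Model and vocabulary paths copied from the sample command in this README.
RNN_MODEL = "./data/caption_en_model40.model"
CNN_MODEL = "./data/ResNet50.model"
VOCAB = "./data/MSCOCO/mscoco_caption_train2014_processed_dic.json"


def capture_image(path):
    """Take one still photo with the camera module (needs python3-picamera)."""
    from picamera import PiCamera  # imported here so the rest loads off the Pi
    camera = PiCamera()
    try:
        camera.capture(path)
    finally:
        camera.close()


def caption_command(image_path):
    """Argument list equivalent to the sample_code_beam.py call shown earlier."""
    return ["python3", "sample_code_beam.py",
            "--rnn-model", RNN_MODEL, "--cnn-model", CNN_MODEL,
            "--vocab", VOCAB, "--gpu", "-1", "--img", image_path]


def first_caption(output):
    """Return the first '<sos> ... <eos>' line without its markers, or None."""
    for line in output.splitlines():
        if line.startswith("<sos>"):
            return line.replace("<sos>", "").replace("<eos>", "").strip()
    return None


def describe_scene(image_path="capture.jpg"):  # hypothetical file name
    """Capture a photo, caption it, and read the caption aloud with espeak."""
    capture_image(image_path)
    done = subprocess.run(caption_command(image_path),
                          stdout=subprocess.PIPE, universal_newlines=True)
    caption = first_caption(done.stdout)
    if caption:
        subprocess.run(["espeak", caption])
    return caption
```

On the Raspberry Pi, calling `describe_scene()` would run one capture-and-speak cycle. Starting a new Python process per photo reloads the models every time, so a long-running program would more likely load them once and reuse them.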