Official pytorch implementation of paper "Remote Sensing Image Captioning Based on Multi-Layer Aggregated Transformer"


Remote Sensing Image Captioning Based on Multi-layer Aggregated Transformer

Here we provide the PyTorch implementation of the paper "Remote Sensing Image Captioning Based on Multi-Layer Aggregated Transformer".

For more information, please see our paper, accepted by IEEE GRSL 2022: [IEEE | Lab Server]

MLAT

Train

First, download the RSICD dataset and preprocess the data:

python create_input_files.py --karpathy_json_path ./RSICD_captions/dataset_rsicd.json --image_folder ./RSICD_captions/images/
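The `--karpathy_json_path` argument points to an annotation file in the Karpathy-split layout commonly used for captioning datasets. As a quick sanity check before preprocessing, you can inspect the split sizes; note the schema below (`images`, `split`, `sentences`) is an assumption based on that convention, not verified against this repo's loader:

```python
import json
from collections import Counter

def summarize_splits(json_path):
    """Count images per split and captions per image in a
    Karpathy-style annotation file (assumed schema)."""
    with open(json_path) as f:
        dataset = json.load(f)
    # each entry in "images" carries its split name and caption list
    split_counts = Counter(img["split"] for img in dataset["images"])
    captions_per_image = [len(img["sentences"]) for img in dataset["images"]]
    return split_counts, captions_per_image
```

For RSICD you would call `summarize_splits("./RSICD_captions/dataset_rsicd.json")` and expect `train`/`val`/`test` keys in the returned counter.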

After that, you can find the resulting metadata files in ./data/. Second, train the model:

python train5.py

Note: During training, beam search is not used when computing scores on the validation set. To obtain the evaluation score with the beam search strategy, run the following command to score the test set:

Test

python eval.py
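Beam search keeps the `k` highest-scoring partial captions at each decoding step instead of greedily taking the single best token. A minimal, framework-agnostic sketch of the idea (not the repo's implementation; `step_fn` is a hypothetical stand-in for one decoder step):

```python
import math

def beam_search(step_fn, start_token, end_token, beam_size=3, max_len=20):
    """Generic beam search.

    step_fn(seq) -> dict mapping each candidate next token to its
    probability given the partial sequence `seq`.
    Returns the highest log-probability completed sequence.
    """
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, p in step_fn(seq).items():
                candidates.append((seq + [tok], score + math.log(p)))
        # keep only the best `beam_size` expansions
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == end_token:
                finished.append((seq, score))  # beam completed
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # include any unfinished beams at max_len
    return max(finished, key=lambda c: c[1])[0]
```

In a captioning model, `step_fn` would run the Transformer decoder on the partial caption and the image features, and return the softmax over the vocabulary.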

Citation:

@ARTICLE{9709791,
  author={Liu, Chenyang and Zhao, Rui and Shi, Zhenwei},
  journal={IEEE Geoscience and Remote Sensing Letters}, 
  title={Remote-Sensing Image Captioning Based on Multilayer Aggregated Transformer}, 
  year={2022},
  volume={19},
  number={},
  pages={1-5},
  doi={10.1109/LGRS.2022.3150957}}

Reference:

Thanks to the following repository: a-PyTorch-Tutorial-to-Image-Captioning
