Code for the MMM 2023 paper: Textual Concept Expansion with Commonsense Knowledge to Improve Dual-Stream Image-Text Matching.
- Python 3.6
- PyTorch 1.1.0
- NumPy (>1.12.1)
- TensorBoard
- torchtext
- pycocotools
- pytorch-lightning 1.2.8
- transformers 4.5.1
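To compare an existing environment against the list above, a small script can print the installed version of each package next to the listed one. This is a minimal sketch, not part of the repository: the pip distribution names (for example `torch` for PyTorch) are assumptions, and on Python 3.6 it relies on the `importlib_metadata` backport being installed.

```python
# Report installed package versions next to the versions listed above.
try:
    from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
except ImportError:
    # Python 3.6/3.7: assumes the importlib_metadata backport is installed.
    from importlib_metadata import version, PackageNotFoundError

# Assumed pip distribution names mapped to the versions listed above.
# None means the list gives no version for that package.
LISTED = {
    "torch": "1.1.0",
    "numpy": ">1.12.1",
    "tensorboard": None,
    "torchtext": None,
    "pycocotools": None,
    "pytorch-lightning": "1.2.8",
    "transformers": "4.5.1",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report(listed=LISTED):
    """Return (name, listed version, installed version) rows for printing."""
    rows = []
    for name, wanted in listed.items():
        found = installed_version(name)
        rows.append((name, wanted or "any", found or "not installed"))
    return rows


for name, wanted, found in report():
    print("{:<18} listed: {:<8} installed: {}".format(name, wanted, found))
```

The script only reports; it does not install anything or decide whether a mismatch matters.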
Download the dataset created by SCAN:
wget https://iudata.blob.core.windows.net/scan/data.zip
wget https://iudata.blob.core.windows.net/scan/vocab.zip
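Both downloads are zip archives and need to be unpacked before use. The following is a minimal sketch using only the Python standard library; the destination directory `scan_data` is an assumption, so adjust it to match the `--data_path` you pass to training.

```python
import os
import zipfile


def extract_archives(archive_names, dest_dir):
    """Extract every archive that exists into dest_dir.

    Returns the list of archive names that were actually extracted.
    """
    extracted = []
    for name in archive_names:
        if not os.path.isfile(name):
            print("skipping {} (not found)".format(name))
            continue
        os.makedirs(dest_dir, exist_ok=True)
        with zipfile.ZipFile(name) as archive:
            archive.extractall(dest_dir)
        extracted.append(name)
    return extracted


# "scan_data" is a hypothetical destination, not a path taken from the repository.
extract_archives(["data.zip", "vocab.zip"], "scan_data")
```

Archives that have not been downloaded yet are skipped with a message, so the script can be rerun safely.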
First, please download the stanford-postagger and put it into './stanford-postagger'.
The concept vocabulary and other training data are generated by generate_vocab.py:
python generate_vocab.py
The training data for the multi-label classifier is generated by ML-Classifier/data/coco_data.py.
Train our model on SLURM:
sbatch tce_run_with_slurm_das_train.sh
DATA_PATH="./data/coco_annotations/Concept_annotations_coco_vocab"
LOG_PATH="./runs/coco_new/"
python train_coco.py --data_path "../../data/scan_data/data" \
--batch_size 512 \
--num_attribute 300 \
--model_name "$LOG_PATH/CVSE_COCO_data_omp_train_top_300_five_caption/" \
--concept_name "$DATA_PATH/data_omp_train_top_300_five_caption/category_concepts.json" \
--inp_name "$DATA_PATH/data_omp_train_top_300_five_caption/coco_concepts_glove_word2vec.pkl" \
--resume "none" \
--adj_file "$DATA_PATH/data_omp_train_top_300_five_caption/coco_adj_concepts.pkl" \
--adj_gen_mode "ReComplex" \
--t 0.3 \
--alpha 0.9 \
--attribute_path "$DATA_PATH/data_omp_train_top_300_five_caption/" \
--test_on "five" \
--re_weight 0.2 \
--logger_name "$LOG_PATH/CVSE_COCO/data_omp_train_top_300_five_caption_prediction/"
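The training command repeats the same dataset tag (`data_omp_train_top_300_five_caption`) in six paths. To run it outside SLURM or switch to another concept set, the argument list can be built from three values. This is a sketch, not part of the repository: the helper `build_train_command` and the `TAG` variable are hypothetical names, while every flag and value is copied from the command above.

```python
import posixpath
import shlex

DATA_PATH = "./data/coco_annotations/Concept_annotations_coco_vocab"
LOG_PATH = "./runs/coco_new/"
TAG = "data_omp_train_top_300_five_caption"  # hypothetical name for the repeated tag


def build_train_command(data_path=DATA_PATH, log_path=LOG_PATH, tag=TAG):
    """Assemble the train_coco.py argument list shown above (helper is illustrative)."""
    concept_dir = posixpath.join(data_path, tag)
    return [
        "python", "train_coco.py",
        "--data_path", "../../data/scan_data/data",
        "--batch_size", "512",
        "--num_attribute", "300",
        # A trailing "" keeps the trailing slash used in the original command.
        "--model_name", posixpath.join(log_path, "CVSE_COCO_" + tag, ""),
        "--concept_name", posixpath.join(concept_dir, "category_concepts.json"),
        "--inp_name", posixpath.join(concept_dir, "coco_concepts_glove_word2vec.pkl"),
        "--resume", "none",
        "--adj_file", posixpath.join(concept_dir, "coco_adj_concepts.pkl"),
        "--adj_gen_mode", "ReComplex",
        "--t", "0.3",
        "--alpha", "0.9",
        "--attribute_path", posixpath.join(concept_dir, ""),
        "--test_on", "five",
        "--re_weight", "0.2",
        "--logger_name", posixpath.join(log_path, "CVSE_COCO", tag + "_prediction", ""),
    ]


cmd = build_train_command()
print(" ".join(shlex.quote(part) for part in cmd))
# To launch directly instead of printing:
# import subprocess; subprocess.run(cmd, check=True)
```

Printing the command first makes it easy to check the resolved paths before a long training run starts.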
Test our model on SLURM.
Generate the Textual Concept Expansion by:
sbatch ml_run_with_slurm_das_test.sh
We have uploaded our trained models here. You can download them and put them into the model directory.
Test the model by:
sbatch tce_run_with_slurm_das_test.sh
@inproceedings{liang2023textual,
  title={Textual Concept Expansion with Commonsense Knowledge to Improve Dual-Stream Image-Text Matching},
  author={Liang, Mingliang and Liu, Zhuoran and Larson, Martha},
  booktitle={MultiMedia Modeling: 29th International Conference, MMM 2023, Bergen, Norway, January 9--12, 2023, Proceedings, Part I},
  pages={421--433},
  year={2023},
  organization={Springer}
}
We borrow code from "CVSE" and "Multi-label Text Classification with BERT and PyTorch Lightning".