TCE-CVSE

Code for the MMM 2023 paper: Textual Concept Expansion with Commonsense Knowledge to Improve Dual-Stream Image-Text Matching.

Requirements and Installation

  • Python 3.6
  • PyTorch 1.1.0
  • NumPy (>1.12.1)
  • TensorBoard
  • torchtext
  • pycocotools
  • pytorch-lightning 1.2.8
  • transformers 4.5.1

Data

Download the dataset created by SCAN:

wget https://iudata.blob.core.windows.net/scan/data.zip
wget https://iudata.blob.core.windows.net/scan/vocab.zip
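Both downloads are zip archives and need to be unpacked before training. A minimal sketch of that step in Python follows; the helper name `extract_archives` and the target directory `./data/scan_data` are assumptions chosen for illustration, so point the target at whatever location you later pass as `--data_path`.

```python
import zipfile
from pathlib import Path


def extract_archives(archives, target_dir):
    """Unpack each existing zip archive into target_dir.

    Returns the file names of the archives that were unpacked.
    """
    target = Path(target_dir)
    extracted = []
    for archive in map(Path, archives):
        if not archive.is_file():
            # Not downloaded yet (or saved elsewhere): skip rather than fail.
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(archive.name)
    return extracted


if __name__ == "__main__":
    # "./data/scan_data" is an assumed location, not one fixed by this repository.
    done = extract_archives(["data.zip", "vocab.zip"], "./data/scan_data")
    print("extracted:", done)
```

Archives that are missing are skipped silently, so an empty result means neither file was found in the current directory.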

First, download the Stanford POS tagger and put it into './stanford-postagger'. Then generate the concept vocabulary and the other training data with generate_vocab.py:

python generate_vocab.py

The training data for the multi-label classifier is generated by ML-Classifier/data/coco_data.py.

Train the model

Train our model on SLURM: sbatch tce_run_with_slurm_das_train.sh.

DATA_PATH="./data/coco_annotations/Concept_annotations_coco_vocab"
LOG_PATH="./runs/coco_new/"

python train_coco.py --data_path "../../data/scan_data/data" \
  --batch_size 512 \
  --num_attribute 300 \
  --model_name "$LOG_PATH/CVSE_COCO_data_omp_train_top_300_five_caption/" \
  --concept_name "$DATA_PATH/data_omp_train_top_300_five_caption/category_concepts.json" \
  --inp_name "$DATA_PATH/data_omp_train_top_300_five_caption/coco_concepts_glove_word2vec.pkl" \
  --resume "none" \
  --adj_file "$DATA_PATH/data_omp_train_top_300_five_caption/coco_adj_concepts.pkl" \
  --adj_gen_mode "ReComplex" \
  --t 0.3 \
  --alpha 0.9 \
  --attribute_path "$DATA_PATH/data_omp_train_top_300_five_caption/" \
  --test_on "five" \
  --re_weight 0.2 \
  --logger_name "$LOG_PATH/CVSE_COCO/data_omp_train_top_300_five_caption_prediction/"
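The training command reads three concept files from the same directory (`--concept_name`, `--inp_name`, and `--adj_file`). Before submitting a SLURM job, it can help to confirm they exist. The following is a small sketch of such a check, not part of the repository; the helper name `missing_inputs` is hypothetical, and the file names are copied from the command above.

```python
from pathlib import Path

# File names taken from the training command above.
REQUIRED_FILES = (
    "category_concepts.json",
    "coco_concepts_glove_word2vec.pkl",
    "coco_adj_concepts.pkl",
)


def missing_inputs(attribute_dir):
    """Return the required concept files that are absent from attribute_dir."""
    base = Path(attribute_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    # Same directory as --attribute_path in the training command.
    attribute_dir = (
        "./data/coco_annotations/Concept_annotations_coco_vocab/"
        "data_omp_train_top_300_five_caption"
    )
    missing = missing_inputs(attribute_dir)
    print("missing files:", missing if missing else "none")
```

If any file is reported missing, rerun generate_vocab.py (see the Data section) before training.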

Evaluation

Evaluation also runs on SLURM. First, generate the textual concept expansion: sbatch ml_run_with_slurm_das_test.sh

We have uploaded our trained models here. Download them and put them into the model directory, then test the model: sbatch tce_run_with_slurm_das_test.sh

Citation

@inproceedings{liang2023textual,
  title={Textual Concept Expansion with Commonsense Knowledge to Improve Dual-Stream Image-Text Matching},
  author={Liang, Mingliang and Liu, Zhuoran and Larson, Martha},
  booktitle={MultiMedia Modeling: 29th International Conference, MMM 2023, Bergen, Norway, January 9--12, 2023, Proceedings, Part I},
  pages={421--433},
  year={2023},
  organization={Springer}
}

We borrow code from "CVSE" and "Multi-label Text Classification with BERT and PyTorch Lightning".
