A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video
This repository contains the code for the paper A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video.
This repository assumes that you run the code with Docker.
- Setup wandb
Please make wandb account and make .wandb_api_key.txt file.
cd docker_setting
echo $YOUR_WAND_API_KEY > .wandb_api_key.txt
Please edit `.project_config.sh` to specify your work directory.
emacs .project_config.sh
WORK_DIR=/path/to/your/work/dir
cd docker_setting
zsh build.sh
cd ./experiments/coco_pretrain_flan_base
zsh env_setup.sh
zsh ./tools/interactive.sh
cd /work
huggingface-cli download tohoku-nlp/multi-vidsum --local-dir . --local-dir-use-symlinks False --repo-type dataset
tar -xzvf download.tar.gz --no-same-owner
cd experiments/coco_pretrain_flan_base
zsh env_setup.sh
zsh ./scripts/prepro.sh ./configs/prepro_config_train.jsonnet
zsh ./scripts/prepro.sh ./configs/prepro_config_valid.jsonnet
zsh ./scripts/prepro.sh ./configs/prepro_config_test.jsonnet
# Train with 4 GPUs
zsh ./scripts/train.sh ./configs/train_config.jsonnet 0 1 2 3
cd experiments/anet_fine_tuning_flan_base_wide
zsh env_setup.sh
zsh ./scripts/prepro.sh ./configs/prepro_config_train.jsonnet
zsh ./scripts/prepro.sh ./configs/prepro_config_valid.jsonnet
zsh ./scripts/prepro.sh ./configs/prepro_config_test.jsonnet
# Train with 4 GPUs
zsh ./scripts/train.sh ./configs/train_config.jsonnet 0 1 2 3
When you just want to use our pretrained model, you need to edit `./configs/train_config.jsonnet`. Please change `model_name_or_path` to `model_name_or_path: “%s/pretrained_models/Video2TextRerankerT5PL” % DOWNLOAD_DIR,` as commented out in the file. When you want to use your trained model following the above instructions, you don’t need to edit `./configs/train_config.jsonnet`.
cd experiments/anet_fine_tuning_flan_base_wide
zsh env_setup.sh
# Please use config file that is suitable for your captioning model. Please refer to the contents of ./configs for details.
zsh ./scripts/prepro.sh ./configs/prepro_config_instruct_blip_t5_few_shot.jsonnet
# Inference with 4 GPUs
zsh ./scripts/test.sh ./configs/inference_normalized_balanced_seq_beam_8_instruct_blip_t5_few_shot_config.jsonnet 0 1 2 3
Results will be saved in `/work/experiment_results/anet_fine_tuning_flan_base_wide_seed_42/logs/reranking_result_${date}.json`.
cd experiments/anet_fine_tuning_flan_base_wide
zsh env_setup.sh
zsh ./scripts/prepro.sh ./configs/prepro_config_train.jsonnet
zsh ./scripts/prepro.sh ./configs/prepro_config_valid.jsonnet
zsh ./scripts/prepro.sh ./configs/prepro_config_test.jsonnet
# Inference with 4 GPUs
zsh ./scripts/test.sh ./configs/test_config.jsonnet 0 1 2 3
Results will be saved in `/work/experiment_results/anet_fine_tuning_flan_base_wide_seed_42/logs/generation_result_${date}.json`.