Mallory24/cae_dataset

The official repository for creating the causal action-effect (CAE) dataset for the IJCNLP-AACL 2023 paper: Implicit Affordance Acquisition via Causal Action–Effect Modeling in the Video Domain


Quick Start:

If you find this dataset helpful, please cite:

@inproceedings{yang2023affordance,
  title={Implicit Affordance Acquisition via Causal Action–Effect Modeling in the Video Domain},
  author={Yang, Hsiu-Yu and Silberer, Carina},
  booktitle={Proceedings of the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 13th International Joint Conference on Natural Language Processing},
  year={2023}
}

CAE Dataset Creation Steps:

Before You Start:

Get the relevant resources: the HowTo100M metadata (HowTo100M_v1.csv) and subtitles (raw_caption_superclean.json), and the concreteness ratings of Brysbaert et al. (Concreteness_ratings_Brysbaert_et_al_BRM.txt). The commands below expect these files under meta_info/.

Step 1: Get a list of sure/unsure result verbs

An ideal result verb possesses two properties: it is (1) visual and (2) effect-causing.

$ cd result_verbs
$ python get_result_verbs.py

It should produce two JSON files: (1) sure_result_verbs.json and (2) unsure_result_verbs.json.
sure_result_verbs.json will be used for the following steps.
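
As a quick sanity check, you can load the verb list. A minimal sketch; whether the file holds a list or a dict keyed by verb is an assumption here, so adjust the access pattern to the actual structure:

import json

# Inspect the verbs kept by get_result_verbs.py.
# NOTE: the internal layout of sure_result_verbs.json (list vs. dict)
# is an assumption; adjust to the actual structure.
with open("sure_result_verbs.json") as f:
    sure_verbs = json.load(f)

print(len(sure_verbs), "sure result verbs")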

Step 2: Get relevant video clips from HowTo100M to derive the CAE dataset

* Note: We downsample the video pool by selecting only the top-15 most-viewed videos per wikiHow task id; the list of downloaded video ids is cae_vids.txt.
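
For illustration, the selection could look like the sketch below; the column names ("task_id", "views") are assumptions about the HowTo100M metadata rather than its verified schema, and the released cae_vids.txt remains the authoritative list:

import pandas as pd

# Hypothetical top-15 selection per wikiHow task; "task_id" and "views"
# are assumed column names, not the verified HowTo100M_v1.csv schema.
meta = pd.read_csv("meta_info/HowTo100M/HowTo100M_v1.csv")
top15 = (
    meta.sort_values("views", ascending=False)
        .groupby("task_id", sort=False)
        .head(15)
)
top15["video_id"].to_csv("meta_info/cae_vids.txt", index=False, header=False)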

python prepare_cae.py --meta_file meta_info/HowTo100M/HowTo100M_v1.csv \
--vids_file meta_info/cae_vids.txt \
--subtitles meta_info/HowTo100M/raw_caption_superclean.json \
--result_verbs meta_info/sure_result_verbs.json \
--concrete_word_file meta_info/Concreteness_ratings_Brysbaert_et_al_BRM.txt \
--categories arts,cars,computers,education,family,food,health,hobbies,holidays,home,personal,pets,sports \
--output_dir $CAE/subtitles \
--cache_dir $CAE/subtitles/domain_cache \
--process all

After running the above code, you should see the following folder structure:

    ├── $CAE/subtitles
        ├── domain_cache
        │   ├── arts
        │   └── ...
        ├── cae.json
        ├── single_result_verbs_video_clips.json
        ├── single_result_verbs_video_clips_by_vid.json
        ├── single_frames_verbs_stats.json
        ├── single_verbs_nouns_stats.json
        ├── multiple_result_verbs_video_clips.json
        └── multiple_result_verbs_video_clips_by_vid.json
  • cae.json is structured along video clip id (see the loading sketch after this list):

{"roigpbZ6Dpc_38": {"vid": "roigpbZ6Dpc", "vid seg": 38, "time stamp": "1:30:1:36", "caption": "you so to start off we're going to make", "domain": "arts", "frames": ["Building"], "verb": "make", "nouns": []},
 "roigpbZ6Dpc_73": {"vid": "roigpbZ6Dpc", "vid seg": 73, "time stamp": "3:18:3:22", "caption": "make this hot cocoa where you want to", "domain": "arts", "frames": ["Building"], "verb": "make", "nouns": ["cocoa"]},
 ...
}
  • single_result_verbs_video_clips.json is structured along the FrameNet frame:

{"Building": {"make": {"arts": {"roigpbZ6Dpc":
  [{"vid": "roigpbZ6Dpc", "vid seg": 38, "time stamp": "1:30:1:36", "caption": "you so to start off we're going to make", "domain": "arts", "frames": ["Building"], "verb": "make", "nouns": []},
   {"vid": "roigpbZ6Dpc", "vid seg": 73, "time stamp": "3:18:3:22", "caption": "make this hot cocoa where you want to", "domain": "arts", "frames": ["Building"], "verb": "make", "nouns": ["cocoa"]},
   ...
  • single_result_verbs_video_clips_by_vid.json is structured along a unique video id:

{"roigpbZ6Dpc":
  {"38": {"vid": "roigpbZ6Dpc", "vid seg": 38, "time stamp": "1:30:1:36", "caption": "you so to start off we're going to make", "domain": "arts", "verbs": ["make"], "all_frames": [["Building"]], "all_nouns": [[]]},
   "73": {"vid": "roigpbZ6Dpc", "vid seg": 73, "time stamp": "3:18:3:22", "caption": "make this hot cocoa where you want to", "domain": "arts", "verbs": ["make"], "all_frames": [["Building"]], "all_nouns": [["cocoa"]]},
   ...
  • For additional statistics:

    1. single_frames_verbs_stats.json: video clip counts (by unique video clip id) across verbs and video domains.
    2. single_verbs_nouns_stats.json: (verb, noun) co-occurrence statistics across verbs and video domains.
  • For video clips containing multiple result verbs, check:

    1. multiple_result_verbs_video_clips.json
    2. multiple_result_verbs_video_clips_by_vid.json

* Note: single_result_verbs_video_clips.json and single_frames_verbs_stats.json will be used for the following steps.
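
The sketch below shows one way to load cae.json and work with the records; the parse_span helper and its reading of the "time stamp" field (start_min:start_sec:end_min:end_sec, consistent with the "1:30:1:36" examples above) are assumptions for illustration, not part of the released code:

import json
from collections import Counter

with open("cae.json") as f:
    cae = json.load(f)

def parse_span(stamp):
    # Assumed format "start_min:start_sec:end_min:end_sec",
    # e.g. "1:30:1:36" -> (90, 96) seconds.
    sm, ss, em, es = (int(x) for x in stamp.split(":"))
    return sm * 60 + ss, em * 60 + es

# Count clips per result verb across the whole dataset.
verb_counts = Counter(clip["verb"] for clip in cae.values())
print(verb_counts.most_common(10))

print(parse_span(cae["roigpbZ6Dpc_38"]["time stamp"]))  # (90, 96)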

Step 3: Split the CAE dataset into train/val/test sets

* Note:

  • Since we would like to test verb generalization ability, we tailored the seen verb classes to exclude verbs in the Kinetics-400 dataset, which was used to train the video feature extraction model.
  • This step is customizable according to your experimental setup.
python split_cae.py --video_clips subtitles/single_result_verbs_video_clips.json \
--frame_verb_stats subtitles/single_frames_verbs_stats.json \
--fixed_seen_verb_list meta_info/kinectics400_joint_verb_labels.txt \
--seeds '42' \
--categories arts,cars,computers,education,family,food,health,hobbies,holidays,home,personal,pets,sports \
--output_dir $CAE/single_result_verb_exp

After running the above code, you should see the following folder structure:

    ├── $CAE/single_result_verb_exp
    │   ├── 42
    │   │    ├── eval_table
    │   │    │    └── eval_table.json
    │   │    ├── train
    │   │    │    └── train.json
    │   │    ├── val
    │   │    │    └── val.json
    │   │    └── test
    │   │         └── test.json
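
Loading the resulting split could look like this minimal sketch (it assumes $CAE is set as in the commands above and that each split file holds clip records like those in Step 2; the exact schema of train.json/val.json/test.json is not verified here):

import json
import os

split_root = os.path.join(os.environ["CAE"], "single_result_verb_exp", "42")

splits = {}
for name in ("train", "val", "test"):
    with open(os.path.join(split_root, name, f"{name}.json")) as f:
        splits[name] = json.load(f)

for name, records in splits.items():
    print(name, len(records))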
