EvLab-MIT/LexiContrastiveGrd

Code for both "Visual Grounding Helps Learn Word Meanings in Low-Data Regimes" and "Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling"

Environment

Python 3.9, with the Hugging Face transformers and datasets packages.

Training requires the pt_framework repo (link).

Evaluation requires the lm_eval repo (link).

Install this repo via pip install -e .
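A minimal environment-setup sketch, assuming the two dependency repos have been cloned locally and are themselves pip-installable (the local paths below are placeholders, not paths confirmed by this repo):

# Hugging Face dependencies (versions unpinned; Python 3.9 assumed)
pip install transformers datasets
# training and evaluation dependencies, installed from local clones (placeholder paths)
pip install -e /path/to/pt_framework
pip install -e /path/to/lm_eval
# this repo itself, run from its root
pip install -e .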

Model Training

Dataset preparation

Conceptual-12M is used in both papers. Because downloading the images for this dataset can be difficult, we provide visual states precomputed with a DINO-pretrained ViT-Base. These precomputed states and the other required files are packaged in a zip file, which can be downloaded from this link.

This repo uses environment variables to locate the dataset and to decide where the models and evaluation results will be stored; see the following two variables:

export ROOT_DIR_FREQ_ORG="/path/to/store_folder"
export DATASET_ROOT_DIR_FREQ="/path/to/dataset_folder"

The downloaded zip file should be extracted to ${DATASET_ROOT_DIR_FREQ} so that the Conceptual-12M folder will be directly under ${DATASET_ROOT_DIR_FREQ}.
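After extraction, the expected layout is the following (the files inside Conceptual-12M are whatever the zip provides; they are not enumerated here):

${DATASET_ROOT_DIR_FREQ}/
    Conceptual-12M/
        ...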

Ground-Only Training (Conceptual-12M)

To train the models, go to the scripts folder. The training command is generally the following:

python -m torch.distributed.launch --nproc_per_node=1 --master_port=29123 general_train.py --setting ${SETTING}

The SETTING variable decides which model will be trained.

LexiContrastive Grounding (LCG)

The LCG models use the following SETTING variable: ground_only/exp_clip.py:idx_base_bs128_${size}_lcg_ly6_git_like_clip_s${seed}. Here ${size} can be one of 100K, 500K, 1M, 5M, 15M, or 50M, representing the number of tokens in the training captions, and ${seed} can be one of 1, 2, 11, or 12. The SETTING variable points the training script to a config function defined inside the src/llm_devo/configs/ground_only/exp_clip.py file.
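As a concrete example, training an LCG model on 15M caption tokens with seed 1 (both values are illustrative picks from the lists above) looks like:

cd scripts
SETTING="ground_only/exp_clip.py:idx_base_bs128_15M_lcg_ly6_git_like_clip_s1"
python -m torch.distributed.launch --nproc_per_node=1 --master_port=29123 general_train.py --setting ${SETTING}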

CLIP

The Visual + Language models in the first paper (also the CLIP models shown in the second paper) use the following SETTING variable: ground_only/exp_clip.py:idx_base_bs512_${size}_cached_git_like_np1_clip_s${seed}.

The Visual + Word models in the first paper use the following SETTING variable: ground_only/exp_clip.py:idx_single_words_bs512_${size}_cached_git_like_np1_clip_s${seed}.

GIT

The Visual + Language models in the first paper (also the GIT models shown in the second paper) use the following SETTING variable: ground_only/exp_git_lang_only.py:idx_base_${size}_dino_cached_50M_gitl_s${seed}.

The Visual + Word models in the first paper use the following SETTING variable: ground_only/exp_git_lang_only.py:idx_single_words_${size}_dino_cached_50M_gitl_s${seed}.

Language-Only

The Language-Only models in both papers use the following SETTING variable: ground_only/exp_git_lang_only.py:txt_base_${size}_noimg_tie_lyrs_6_gitl_s${seed}.

The Word-Only Baseline models in the first paper use the following SETTING variable: ground_only/exp_git_lang_only.py:txt_single_words_${size}_noimg_tie_lyrs_6_gitl_s${seed}.
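To sweep one model family over seeds, a simple shell loop works; the sketch below uses the Language-Only setting with a 15M-token size and assumes the same seed values (1, 2, 11, 12) listed above for LCG:

cd scripts
for seed in 1 2 11 12; do
    SETTING="ground_only/exp_git_lang_only.py:txt_base_15M_noimg_tie_lyrs_6_gitl_s${seed}"
    python -m torch.distributed.launch --nproc_per_node=1 --master_port=29123 general_train.py --setting ${SETTING}
done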

Flamingo

COMING SOON.

Mixed Unground-Ground Training

COMING SOON.

Model Evaluation

Word-Relatedness

See the README file in the ./src/llm_devo/word_sim/ folder.

Semantic-Feature Prediction

See the README file in the ./src/llm_devo/word_norm/ folder.

Lexical-Relation Prediction

See the README file in the ./src/llm_devo/lexical_relation/ folder.

Part-Of-Speech Prediction

See the README file in the ./src/llm_devo/pos_pred/ folder.

Context-based Word Understanding

See the README file in the ./src/llm_devo/word_understand/ folder.
