This repository contains the Cross Modal Transformer for Intentonomy project for the CS 395T Deep Learning Seminar course with Philipp Krähenbühl.
The project is built off of Intentonomy: a Dataset and Study towards Human Intent Understanding. You can find its README in `intent.md`.
- Create a data folder in this repository by running `mkdir -p ./data/2020intent/annotations`.
- Download the annotations to `./data/2020intent/annotations/` as indicated in DATA.md, but instead of downloading the original data files (`intentonomy_[train/val/test]2020.json`), download the modified files (`intentonomy_[train/val/test]2020_ht_bert.json`) from this Google Drive directory.
- Open up a `tmux` session, `cd` to this repository, and run `python3 download_data.py`.
- You can press `ctrl+b` then `d` to detach from the tmux session, then run `tmux attach` to reconnect to it. This lets the dataset download continue in the background, even after you log off.
- Log in to `eldar-11.cs.utexas.edu`.
- Download the latest Miniconda into your scratch space on Condor:

  ```
  cd /scratch/cluster/${USER}/
  wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  chmod 755 Miniconda3-latest-Linux-x86_64.sh
  ./Miniconda3-latest-Linux-x86_64.sh
  ```

- Follow through the installation process:
  - Answer `yes` to the license agreement.
  - Install Miniconda3 to `/scratch/cluster/${USER}/miniconda3`.
  - Answer `yes` to `conda init`.
  - Run `conda config --set auto_activate_base false` so that `conda` does not activate the base environment in every new shell.
  - Do anything else it tells you to do for `.bashrc` if needed.
- Add `conda`'s path to your `PATH` environment variable. One way of doing this is to add `PATH=${PATH}:/scratch/cluster/${USER}/miniconda3/bin/` to your `~/.profile`.
- Clone this repository to anywhere in your Condor scratch space.
- `cd` to the `env/` folder of this repository and set up the environment by running `./install.sh`.
- Run `conda activate intent`. If you get the error `conda: command not found`, run `source /scratch/cluster/${USER}/miniconda3/etc/profile.d/conda.sh` first.
- Test that your environment works by running the following in `python`:

  ```
  >>> import torch
  >>> torch.cuda.is_available()
  True
  >>>
  ```
Here's an example command:
```
python3 -m train \
    --name baseline \
    --model_type vis_baseline \
    --bs 50 \
    --lr 1e-3 \
    --linear_warmup \
    --warmup_epochs 5 \
    --epochs 50 \
    --use_loc_loss \
    --loc_loss_alpha 1.0
```
The model will be trained, with checkpoints saved to `models/baseline` and logs written to `logs/baseline`. You can run `tensorboard --logdir logs` to view them. This command trains with a batch size of 50 and a learning rate of 0.001 that is linearly warmed up from 0 over the first 5 epochs, runs for 50 epochs, and uses the localization loss with weight 1.0.
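The warmup behavior enabled by `--linear_warmup` and `--warmup_epochs` can be sketched as follows. This is a hypothetical minimal version for illustration (the exact ramp formula is an assumption; the real schedule lives in the training code):

```python
def warmup_lr(epoch, base_lr=1e-3, warmup_epochs=5):
    """Linearly ramp the learning rate toward base_lr over the first
    warmup_epochs epochs, then hold it constant afterwards."""
    if epoch < warmup_epochs:
        # assumption: the ramp reaches base_lr at the end of warmup
        return base_lr * (epoch + 1) / warmup_epochs
    return base_lr

# learning rate per epoch with the flags above (--lr 1e-3 --warmup_epochs 5)
schedule = [warmup_lr(e) for e in range(8)]
```

With these settings the rate grows by 2e-4 per epoch until it reaches 1e-3 at epoch 5, then stays there for the remaining epochs.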
For the Visual VirTex model, run the following:
```
python3 -m train \
    --name virtex \
    --model_type vis_virtex \
    --bs 50 \
    --lr 1e-3 \
    --linear_warmup \
    --warmup_epochs 5 \
    --epochs 50
```
For the Visual Swin Transformer (Small) model, run:
```
python3 -m train \
    --name swin_small \
    --model_type vis_swin_small \
    --bs 50 \
    --lr 1e-5 \
    --wd 1e-8 \
    --opt_type adamw \
    --epochs 50
```
You can specify the optimizer type with the `--opt_type` parameter; `--wd` sets the optimizer's weight decay. The hyperparameters given here are the ones Swin Transformers are fine-tuned with on ImageNet, and we tried the same settings for Intentonomy.
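For reference, `adamw` denotes Adam with decoupled weight decay, where the `--wd` value multiplies the weights directly instead of being folded into the gradient. A minimal single-parameter sketch (hypothetical illustration, not the project's actual optimizer code):

```python
def adamw_step(param, grad, m, v, t, lr=1e-5, wd=1e-8,
               b1=0.9, b2=0.999, eps=1e-8):
    """One AdamW update on a scalar parameter.

    m and v are the running first/second moment estimates;
    t is the 1-based step count."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)  # bias correction
    v_hat = v / (1 - b2 ** t)
    # decoupled weight decay: wd scales the parameter itself rather than
    # the gradient -- this is what distinguishes AdamW from Adam + L2
    param = param - lr * m_hat / (v_hat ** 0.5 + eps) - lr * wd * param
    return param, m, v
```

In PyTorch this corresponds to constructing `torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=1e-8)`, matching the `--lr` and `--wd` flags above.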
For the Visual + Hashtag models, run:
- Baseline:
  ```
  python3 -m train \
      --name baseline \
      --model_type baseline \
      --bs 50 \
      --lr 1e-3 \
      --linear_warmup \
      --warmup_epochs 5 \
      --epochs 50 \
      --use_loc_loss \
      --loc_loss_alpha 1.0 \
      --use_hashtags
  ```
- VirTex:
  ```
  python3 -m train \
      --name virtex \
      --model_type virtex \
      --bs 50 \
      --lr 1e-3 \
      --linear_warmup \
      --warmup_epochs 5 \
      --epochs 50 \
      --use_hashtags
  ```
- Swin Transformer (Tiny):
  ```
  python3 -m train \
      --name swin_tiny \
      --model_type swin_tiny \
      --bs 50 \
      --lr 1e-5 \
      --wd 1e-8 \
      --opt_type adamw \
      --epochs 50 \
      --use_hashtags
  ```