
# Universal Mini-batch Consistent Set Encoders

**Scalable Set Encoding with Universal Mini-Batch Consistency and Unbiased Full Set Gradient Approximation** (ICML 2023)
Jeffrey Willette\*, Seanie Lee\*, Bruno Andreis, Kenji Kawaguchi, Juho Lee, Sung Ju Hwang

arXiv

This is the official repository for UMBC: Scalable Set Encoding with Universal Mini-Batch Consistency and Unbiased Full Set Gradient Approximation (ICML 2023).

*(Figure: UMBC concept)*

## Motivating Example

In this task, the set encoder takes a set of points in 2D space and outputs the parameters of a Gaussian mixture model that maximize the likelihood of the data. The sets arrive in one of four different streams, and the set encoder must process the incoming streaming points without storing any of them in memory. Our UMBC model accomplishes this even when using non-MBC components such as self-attention, which yields the strongest overall model on this task.

$$\huge {\color{green}✓} = \text{MBC model} \quad\quad {\color{red}✗} = \text{non-MBC model}$$

**UMBC (with Set Transformer)** $\huge {\color{green}✓}$

*(GIFs: single point stream, chunk stream, class stream, one-each stream)*

**Set Transformer** $\huge {\color{red}✗}$

*(GIFs: single point stream, chunk stream, class stream, one-each stream)*
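To make the MBC property concrete, here is a minimal sketch (not the repository's code) of what consistency means for a simple mean-pooling encoder: processing the stream in chunks while keeping only constant-size state must give the same output as encoding the full set at once. Self-attention over the whole set does not satisfy this, which is why naively chunking a Set Transformer is non-MBC.

```python
import torch

def encode_full(x, phi):
    # non-streaming: pool the features of the whole set at once
    return phi(x).mean(dim=0)

def encode_streaming(chunks, phi):
    # streaming: keep only a running sum and a count, never the raw points
    total, count = 0.0, 0
    for c in chunks:
        total = total + phi(c).sum(dim=0)
        count += c.shape[0]
    return total / count

phi = torch.nn.Linear(2, 8)  # stand-in per-element feature extractor
x = torch.randn(100, 2)      # a set of 100 points in 2D
chunks = torch.split(x, 25)  # the same set, arriving as a stream of chunks

full = encode_full(x, phi)
stream = encode_streaming(chunks, phi)
print(torch.allclose(full, stream, atol=1e-5))  # True: mean pooling is MBC
```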

## Experiment Commands and Datasets

Individual experiment code can be found in the respective `camelyon`, `celeba`, `mvn`, and `text` directories.

### CelebA

The dataset can be found here.

```bash
cd celeba
bash run.sh "GPU" 100 umbc 128 true [train|test]
```

### Camelyon16

The dataset can be found here in an S3 bucket. Our preprocessing code is in `data/preprocessing/camelyon.py`.

```bash
cd camelyon
python trainer.py \
    --gpus "0,1,2,3,4" \
    --mode [pretrain|pretrain-test|finetune|finetune-test] \
    --model sse-umbc \
    --k 64 \
    --attn-act softmax \
    --grad-set-size 256
```

### Text

The EURLEX57K dataset download script can be found here.

```bash
cd text
bash run.sh "0" 100 true
```

### MVN

MVN data is generated randomly each time the dataloader is asked for a new sample via its `sample()` method. The code can be found in `data/toy_classification.py`.
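As a rough illustration of this on-the-fly generation, here is a hypothetical sketch of the kind of sampler `sample()` implements, assuming a synthetic Gaussian-mixture task; the names and parameters below are illustrative, not the repository's API.

```python
import numpy as np

def sample(n_points=512, n_classes=4, rng=None):
    """Illustrative stand-in for the dataset's sample() method:
    draw a fresh set of 2D points from a random Gaussian mixture."""
    rng = rng or np.random.default_rng()
    means = rng.normal(0.0, 3.0, size=(n_classes, 2))   # random component means
    labels = rng.integers(0, n_classes, size=n_points)  # component assignment per point
    points = means[labels] + rng.normal(0.0, 0.5, size=(n_points, 2))
    return points.astype(np.float32), labels

x, y = sample()  # a brand-new set on every call; nothing is cached
```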

```bash
cd mvn
python trainer.py \
    --slot-type random \
    --mode [train|test] \
    --gpu "0" \
    --model sse-umbc \
    --heads 4 \
    --epochs 50 \
    --attn-act softmax \
    --grad-correction True \
    --train-set-size 512 \
    --grad-set-size 8
```
## Citation

```bibtex
@inproceedings{willette2023umbc,
  title={Scalable Set Encoding with Universal Mini-Batch Consistency and Unbiased Full Set Gradient Approximation},
  author={Jeffrey Willette and Seanie Lee and Bruno Andreis and Kenji Kawaguchi and Juho Lee and Sung Ju Hwang},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2023}
}
```

## Layout

```text
├── data
│   └── preprocessing
├── example-gifs
├── umbc
│   ├── camelyon
│   ├── celeba
│   ├── models
│   │   ├── diem
│   │   └── layers
│   ├── mvn
│   └── text
└── utils
```

## Running tests

- Tests for mini-batch consistency can be found in `test.py`
- To run the tests, navigate to the top-level directory and run:

```bash
make test
```
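The repository's tests exercise its own models; as a self-contained sketch of the property they check, the hypothetical example below (not code from `test.py`) verifies that a toy sum-pooling encoder produces the same output for the full set as for any partition of it into chunks.

```python
import torch

class SumPool(torch.nn.Module):
    """Toy MBC encoder: featurize each element, then sum-pool."""
    def __init__(self, d_in=2, d_out=8):
        super().__init__()
        self.phi = torch.nn.Linear(d_in, d_out)

    def forward(self, x):
        return self.phi(x).sum(dim=0)

def test_minibatch_consistency():
    torch.manual_seed(0)
    enc = SumPool()
    x = torch.randn(64, 2)
    full = enc(x)
    for chunk in (1, 4, 16, 32):  # any partition of the set...
        partial = sum(enc(c) for c in torch.split(x, chunk))
        assert torch.allclose(full, partial, atol=1e-5)  # ...must agree

test_minibatch_consistency()
```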
