
Cross-Modal Conceptualization in Bottleneck Models

How can you and your model start speaking in the same concepts?

🎯 Abstract

Concept Bottleneck Models (CBMs) assume that training examples (e.g., x-ray images) are annotated with high-level concepts (e.g., types of abnormalities), and perform classification by first predicting the concepts, followed by predicting the label relying on these concepts. The main difficulty in using CBMs comes from having to choose concepts that are predictive of the label and then having to annotate training examples with these concepts. In our approach, we adopt a more moderate assumption and instead use text descriptions (e.g., radiology reports) accompanying the images in training to guide the induction of concepts.

*Figure: overview of our cross-modal approach.*

Our cross-modal approach (see above) treats concepts as discrete latent variables and promotes concepts that (1) are predictive of the label, and (2) can be predicted reliably from both the image and text. Through experiments conducted on datasets ranging from synthetic datasets (e.g., synthetic images with generated descriptions) to realistic medical imaging datasets, we demonstrate that cross-modal learning encourages the induction of interpretable concepts while also facilitating disentanglement. Our results also suggest that this guidance leads to increased robustness by suppressing the reliance on shortcut features.
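
The tie loss referenced in the experiments table below (e.g., JS($f, c$)) is what couples the two modalities: it pushes the concept distribution predicted from the text to agree with the one predicted from the image. Below is a minimal PyTorch sketch of a Jensen–Shannon version; the names and shapes are illustrative, not the repository's exact code:

```python
# Illustrative sketch (not the repository's implementation): a Jensen-Shannon
# tie loss encouraging the text-side (f) and image-side (c) concept
# distributions to agree, cf. JS(f, c) in the experiments table.
import torch
import torch.nn.functional as F

def js_tie_loss(text_logits: torch.Tensor, image_logits: torch.Tensor) -> torch.Tensor:
    """JS divergence between text-side and image-side concept distributions."""
    f = F.softmax(text_logits, dim=-1)   # text-side concept distribution
    c = F.softmax(image_logits, dim=-1)  # image-side concept distribution
    m = 0.5 * (f + c)                    # mixture midpoint
    # F.kl_div(input, target) expects log-probabilities as `input` and
    # computes KL(target || exp(input)).
    return 0.5 * (F.kl_div(m.log(), f, reduction="batchmean")
                  + F.kl_div(m.log(), c, reduction="batchmean"))
```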

🚀 Getting Started

1. Set up the conda environment:

   ```bash
   make set-up-env
   ```

2. Create a `.env` file from `.env.example` (optional, only needed if you use Telegram to monitor pipelines).

3. Set up the experiment registry in ClearML:

   ```bash
   clearml-init
   ```

4. Download the data for Shapes, CUB-200, MSCOCO, and MIMIC-CXR:

   ```bash
   make download_data
   ```

5. Pre-process CUB-200:

   ```bash
   make preprocess_cub
   ```

6. Pre-process MIMIC-CXR:

   ```bash
   make preprocess_mimic
   ```

7. Now you are ready to run your first experiment (see the example after this list). Checkpoints and explanations will appear in the Hydra `outputs/` directory (already in `.gitignore`):

   ```bash
   python main.py dataset.batch_size=64 seed=42 +experiment={XXX}
   ```
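
For instance, replacing `{XXX}` with one of the experiment IDs from the navigation table in the Experiments section (here `E39-SHP`, picked purely as an illustration) gives:

```bash
python main.py dataset.batch_size=64 seed=42 +experiment=E39-SHP
```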
    

🤔 FAQ

1. How can I retrieve explanations?

   Our pipeline saves explanations to `results.json`, which can be found in the experiment folder. For visualization you can use the `inspect.ipynb` notebook; a minimal loading sketch is given after this list.

2. How can I assess my model in terms of DCI?

   Measuring disentanglement, completeness, and informativeness is moved outside the default pipeline and can be performed via the `metrics.ipynb` notebook; an illustrative sketch of the metric is also given below.
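
If you want to inspect the raw output without the notebook, a sketch along these lines should work. The path and the schema of `results.json` are assumptions (the layout of the experiment folder is not documented here), so the snippet only loads the file and reports its top-level structure:

```python
# Minimal sketch: load an experiment's results.json and peek at its
# structure. `<experiment_folder>` is a hypothetical placeholder for a
# run directory produced under Hydra's outputs/.
import json
from pathlib import Path

results_path = Path("outputs") / "<experiment_folder>" / "results.json"
with results_path.open() as fh:
    results = json.load(fh)

# Report the structure before assuming any particular schema.
if isinstance(results, dict):
    print("top-level keys:", list(results.keys()))
elif isinstance(results, list):
    print(f"{len(results)} records; first record:", results[0])
```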
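For DCI, the `metrics.ipynb` notebook is the reference; the sketch below is only a compact illustration of the standard DCI recipe (Eastwood & Williams, 2018), assuming you have concept activations and discrete ground-truth factors as NumPy arrays. It is not the repository's implementation:

```python
# Illustrative DCI sketch: fit one probe per ground-truth factor, then
# derive disentanglement/completeness from the feature-importance matrix.
import numpy as np
from scipy.stats import entropy
from sklearn.ensemble import GradientBoostingClassifier

def dci(codes: np.ndarray, factors: np.ndarray) -> dict:
    """codes: (N, K) concept activations; factors: (N, M) discrete factor labels."""
    n_codes, n_factors = codes.shape[1], factors.shape[1]
    importance = np.zeros((n_codes, n_factors))
    accuracies = []
    for j in range(n_factors):
        probe = GradientBoostingClassifier().fit(codes, factors[:, j])
        importance[:, j] = probe.feature_importances_
        accuracies.append(probe.score(codes, factors[:, j]))  # informativeness proxy
    # Disentanglement: 1 - entropy of each code's importance across factors,
    # weighted by the code's share of total importance.
    p_code = importance / (importance.sum(axis=1, keepdims=True) + 1e-12)
    disent_per_code = 1.0 - entropy(p_code.T, base=n_factors)
    code_weight = importance.sum(axis=1) / (importance.sum() + 1e-12)
    # Completeness: 1 - entropy of each factor's importance across codes.
    p_factor = importance / (importance.sum(axis=0, keepdims=True) + 1e-12)
    complet_per_factor = 1.0 - entropy(p_factor, base=n_codes)
    return {
        "disentanglement": float(disent_per_code @ code_weight),
        "completeness": float(complet_per_factor.mean()),
        "informativeness": float(np.mean(accuracies)),
    }
```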

🧬 Experiments

The following table helps you navigate all experimental setups. The configuration files are located in this directory. Some of them are outdated and are kept only so we can revisit our hypotheses later; for clarity, we omit those from the navigation table below. Feel free to reuse our setups and add new ones here.

| Model    | EID-DATASET        | pretrained | act_fn  | norm_fn | slot_norm | dummy_concept | dummy_tokens | reg_dist | tie_loss*   |
|----------|--------------------|------------|---------|---------|-----------|---------------|--------------|----------|-------------|
| Standard | E35-SHP            |            | relu    | -       | -         | -             | -            | -        | -           |
| Standard | E36-SHP            |            | sigmoid | -       | -         | -             | -            | -        | -           |
| Standard | E36-CUB            |            | sigmoid | -       | -         | -             | -            | -        | -           |
| Standard | E36-MIM            |            | sigmoid | -       | -         | -             | -            | -        | -           |
| Standard | E36-SHP-NOROBUST   |            | sigmoid | -       | -         | -             | -            | -        | -           |
| Standard | E37-SHP            |            | gumbel  | -       | -         | -             | -            | -        | -           |
| XCBs     | E38-SHP            |            | sigmoid | softmax |           | -             | -            |          | JS($f, c$)  |
| XCBs     | E39-SHP            |            | gumbel  | softmax |           | -             | -            |          | JS($f, c$)  |
| XCBs     | E39-CUB            |            | gumbel  | softmax |           | -             | -            |          | JS($f, c$)  |
| XCBs     | E39-MIM            |            | gumbel  | softmax |           | -             | -            |          | JS($f, c$)  |
| XCBs     | E39-SHP-NOROBUST   |            | gumbel  | softmax |           | -             | -            |          | JS($f, c$)  |
| XCBs     | E40-SHP            |            | gumbel  | softmax |           | -             | -            |          | KL($f, c$)  |
| XCBs     | E41-SHP            |            | gumbel  | softmax |           | -             | -            |          | KL($c, f$)  |
| XCBs     | E42-SHP            |            | gumbel  | entmax  |           | -             | -            |          | JS($f, c$)  |
| XCBs     | E43-SHP            |            | gumbel  | softmax |           |               |              |          | JS($f, c$)  |
| XCBs     | E44-SHP            |            | gumbel  | softmax |           |               |              |          | JS($f, c$)  |
| XCBs     | E45-SHP            |            | gumbel  | entmax  |           |               |              |          | JS($f, c$)  |
| XCBs     | E46-SHP            |            | gumbel  | entmax  |           |               |              |          | JS($f, c$)  |
| XCBs     | E47-SHP            |            | gumbel  | softmax |           | -             | -            |          | JS($f, c$)  |
| XCBs     | E47-MIM            |            | gumbel  | softmax |           | -             | -            |          | JS($f, c$)  |
| XCBs     | E47-SHP-NOISE      |            | gumbel  | softmax |           | -             | -            |          | JS($f, c$)  |
| XCBs     | E47-SHP-REDUNDANCY |            | gumbel  | softmax |           | -             | -            |          | JS($f, c$)  |
| XCBs     | E48-SHP            |            | gumbel  | entmax  |           | -             | -            |          | JS($f, c$)  |
| Standard | E49-SHP            |            | relu    | -       | -         | -             | -            | -        | -           |
| Standard | E50-SHP            |            | sigmoid | -       | -         | -             | -            | -        | -           |
| Standard | E51-SHP            |            | gumbel  | -       | -         | -             | -            | -        | -           |
| XCBs     | E52-SHP            |            | sigmoid | softmax |           | -             | -            |          | JS($f, c$)  |
| XCBs     | E53-SHP            |            | gumbel  | softmax |           | -             | -            |          | JS($f, c$)  |
| XCBs     | E54-SHP            |            | gumbel  | softmax |           | -             | -            |          | KL($f, c$)  |
| XCBs     | E55-SHP            |            | gumbel  | softmax |           | -             | -            |          | KL($c, f$)  |
| XCBs     | E56-SHP            |            | gumbel  | entmax  |           | -             | -            |          | JS($f, c$)  |
| XCBs     | E57-SHP            |            | gumbel  | softmax |           |               |              |          | JS($f, c$)  |
| XCBs     | E58-SHP            |            | gumbel  | softmax |           |               |              |          | JS($f, c$)  |
| XCBs     | E59-SHP            |            | gumbel  | entmax  |           |               |              |          | JS($f, c$)  |
| XCBs     | E60-SHP            |            | gumbel  | entmax  |           |               |              |          | JS($f, c$)  |
| XCBs     | E61-SHP            |            | gumbel  | softmax |           | -             | -            |          | JS($f, c$)  |
| XCBs     | E62-SHP            |            | gumbel  | entmax  |           | -             | -            |          | JS($f, c$)  |

\* KL stands for Kullback–Leibler divergence and JS for Jensen–Shannon divergence.

📖 Citation

Our research paper "Cross-Modal Conceptualization in Bottleneck Models" was published in the proceedings of EMNLP 2023. To cite it, please use the following BibTeX entry:

@inproceedings{alukaev-etal-2023-cross,
    title = "Cross-Modal Conceptualization in Bottleneck Models",
    author = "Alukaev, Danis  and
      Kiselev, Semen  and
      Pershin, Ilya  and
      Ibragimov, Bulat  and
      Ivanov, Vladimir  and
      Kornaev, Alexey  and
      Titov, Ivan",
    editor = "Bouamor, Houda  and
      Pino, Juan  and
      Bali, Kalika",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-main.318",
    pages = "5241--5253",
    abstract = "Concept Bottleneck Models (CBMs) assume that training examples (e.g., x-ray images) are annotated with high-level concepts (e.g., types of abnormalities), and perform classification by first predicting the concepts, followed by predicting the label relying on these concepts. However, the primary challenge in employing CBMs lies in the requirement of defining concepts predictive of the label and annotating training examples with these concepts. In our approach, we adopt a more moderate assumption and instead use text descriptions (e.g., radiology reports), accompanying the images, to guide the induction of concepts. Our crossmodal approach treats concepts as discrete latent variables and promotes concepts that (1) are predictive of the label, and (2) can be predicted reliably from both the image and text. Through experiments conducted on datasets ranging from synthetic datasets (e.g., synthetic images with generated descriptions) to realistic medical imaging datasets, we demonstrate that crossmodal learning encourages the induction of interpretable concepts while also facilitating disentanglement.",
}
