<a href="https://colab.research.google.com/github/dlfelps/ml_portfolio/blob/main/concept_bottleneck.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook provides an example of an explainable AI technique called [Concept Bottleneck Models](https://arxiv.org/abs/2007.04612).

It has a companion [blog post](https://dlfelps.github.io/2024/06/03/few-shot.html).

It is part of Daniel Felps' [ML portfolio](https://github.com/dlfelps/ml_portfolio/tree/main).

# SETUP ENVIRONMENT

In [1]:
!git clone https://github.com/dlfelps/datasets.git
!git clone https://github.com/dlfelps/concept_bottleneck.git
!pip install pytorchcv

Cloning into 'datasets'...
remote: Enumerating objects: 18, done.
remote: Counting objects: 100% (18/18), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 18 (delta 5), reused 15 (delta 5), pack-reused 0
Receiving objects: 100% (18/18), 6.23 KiB | 3.11 MiB/s, done.
Resolving deltas: 100% (5/5), done.
Cloning into 'concept_bottleneck'...
remote: Enumerating objects: 33, done.
remote: Counting objects: 100% (33/33), done.
remote: Compressing objects: 100% (24/24), done.
remote: Total 33 (delta 8), reused 30 (delta 8), pack-reused 0
Receiving objects: 100% (33/33), 9.24 MiB | 13.80 MiB/s, done.
Resolving deltas: 100% (8/8), done.
Collecting pytorchcv
  Downloading pytorchcv-0.0.67-py2.py3-none-any.whl (532 kB)
     532.4/532.4 kB 9.6 MB/s eta 0:00:00
Installing collected packages: pytorchcv
Successfully installed pytorchcv-0.0.67


# IMPORTS

In [2]:
from pathlib import Path
import pickle
import numpy as np
import re
import pandas as pd
import os


from pytorchcv.model_provider import get_model

from datasets.CUB200 import CUB200, CUB200_attributes
from concept_bottleneck.cav import CAV
from concept_bottleneck.interpretablePredictor import InterpretablePredictor
from concept_bottleneck.conceptBottleneck import ConceptBottleneck
from concept_bottleneck.utils import predict_embeddings

import torch
from torchvision.transforms import v2
from torch.utils.data import DataLoader

from sklearn.metrics import accuracy_score

# DOWNLOAD DATASET (CUB)

In [5]:
# Resize, convert to float tensors, and normalize with ImageNet statistics
cub = CUB200('.', download=True, is_test=False, transform = v2.Compose([
      v2.Resize((224,224)),
      v2.ToImage(),  # Convert to tensor, only needed if you had a PIL image
      v2.ToDtype(torch.float32, scale=True),  # Normalize expects float input
      v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])]))



Files already downloaded and verified
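
The transform pipeline above scales pixel values to [0, 1] and then standardizes each channel with the ImageNet mean and standard deviation. A minimal NumPy sketch of that arithmetic follows, assuming an 8-bit RGB image in channel-first layout. The helper name `normalize_image` is hypothetical and is not part of torchvision.

```python
import numpy as np

# Per-channel statistics copied from the transform above (ImageNet values).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_image(img_uint8):
    """Hypothetical helper: mimic ToDtype(scale=True) followed by Normalize.

    img_uint8: array of shape (3, H, W) with values in 0..255.
    """
    scaled = img_uint8.astype(np.float32) / 255.0  # scale to [0, 1]
    # Broadcast the per-channel statistics over height and width.
    return (scaled - MEAN[:, None, None]) / STD[:, None, None]

# A made-up 2x2 image whose every pixel is mid-gray (value 128).
img = np.full((3, 2, 2), 128, dtype=np.uint8)
out = normalize_image(img)
print(out.shape)
print(out[:, 0, 0])  # one normalized value per channel
```

Because the same statistics are applied to the training and test sets, the pretrained network sees inputs on the scale it was trained with.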


# COMPUTE VECTOR EMBEDDINGS

In [6]:
cub_dataloader = DataLoader(cub, batch_size=100)
cub_res = get_model('resnet18_cub', pretrained=True, root='.')
cub_res.output = torch.nn.Identity()
cub_res = cub_res.to('cuda')
cub_embeddings, ids = predict_embeddings(cub_dataloader, cub_res, device='cuda')

Predicting embeddings: 100%|██████████| 58/58 [00:30<00:00,  1.92batch/s]


# LOAD INTERPRETABLE FEATURES FROM CUB

In [7]:
cub_attr = CUB200_attributes('.', X=cub_embeddings, ids = ids, download=False)
_,attributes = cub_attr.get_xy()

id_map = cub.get_id_class_mapper()
classes = np.array([id_map[i] for i in ids])

# CONCEPT BOTTLENECK MODEL

In [8]:
# download pretrained models
!wget https://github.com/dlfelps/ml_portfolio/raw/main/pretrained_models/cav.pkl
!wget https://github.com/dlfelps/ml_portfolio/raw/main/pretrained_models/ip.pkl

--2024-05-25 23:23:00--  https://github.com/dlfelps/ml_portfolio/raw/main/pretrained_models/cav.pkl
Resolving github.com (github.com)... 20.205.243.166
Connecting to github.com (github.com)|20.205.243.166|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/dlfelps/ml_portfolio/main/pretrained_models/cav.pkl [following]
--2024-05-25 23:23:00--  https://raw.githubusercontent.com/dlfelps/ml_portfolio/main/pretrained_models/cav.pkl
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7899855 (7.5M) [application/octet-stream]
Saving to: ‘cav.pkl’


2024-05-25 23:23:01 (367 MB/s) - ‘cav.pkl’ saved [7899855/7899855]

--2024-05-25 23:23:01--  https://github.com/dlfelps/ml_portfolio/raw/main/pretrained_models/ip

In [9]:
cbm = ConceptBottleneck()
# cbm.fit(cub_embeddings, attributes, classes) # uncomment to train from scratch
cbm.load_concept_predictors('cav.pkl') # load pretrained
cbm.load_interpretable_predictor('ip.pkl') # load pretrained
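
A concept bottleneck model predicts in two stages: embeddings are first mapped to human-readable concepts, and the final label is then predicted from those concepts alone. The sketch below illustrates the idea on synthetic data with plain least-squares models. It is an assumption-laden stand-in, not the implementation inside `ConceptBottleneck`; every name, size, and value in it is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the notebook's arrays (all values are made up).
n, d, k = 300, 16, 4                          # samples, embedding dim, concepts
X = rng.normal(size=(n, d))                   # plays the role of cub_embeddings
A = (X @ rng.normal(size=(d, k)) > 0) * 1.0   # binary concepts, like attributes
y = (2 * A[:, 0] + A[:, 1]).astype(int)       # label depends on concepts only

def add_bias(M):
    """Append a column of ones so the linear models get an intercept."""
    return np.hstack([M, np.ones((len(M), 1))])

def fit_linear(F, T):
    """Least-squares weights mapping features F to targets T."""
    W, *_ = np.linalg.lstsq(add_bias(F), T, rcond=None)
    return W

train, test = slice(0, 200), slice(200, n)

# Stage 1: one linear probe per concept (embeddings -> concepts).
W_concept = fit_linear(X[train], A[train])
# Stage 2: label predictor that sees only the concepts (one-hot targets).
W_label = fit_linear(A[train], np.eye(4)[y[train]])

def predict(X_new):
    """Return (predicted concepts, predicted labels) for new embeddings."""
    concepts = (add_bias(X_new) @ W_concept > 0.5) * 1.0  # the bottleneck
    labels = (add_bias(concepts) @ W_label).argmax(axis=1)
    return concepts, labels

concepts_hat, y_hat = predict(X[test])
print("concept accuracy:", (concepts_hat == A[test]).mean())
print("label accuracy:  ", (y_hat == y[test]).mean())
```

Since the label predictor only ever receives the concept vector, each prediction can be traced back to the concepts that were switched on, which is what makes the model interpretable. In this sketch the second stage is fit on ground-truth concepts, which corresponds to the independent training scheme described in the paper.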


# ACCURACY

In [10]:
cub_test = CUB200('.', download=True, is_test=True, transform = v2.Compose([
      v2.Resize((224,224)),
      v2.ToImage(),  # Convert to tensor, only needed if you had a PIL image
      v2.ToDtype(torch.float32, scale=True),  # Normalize expects float input
      v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])]))

Files already downloaded and verified


In [11]:
cub_dataloader = DataLoader(cub_test, batch_size=100)
cub_embeddings, ids = predict_embeddings(cub_dataloader, cub_res, device='cuda')
id_map = cub.get_id_class_mapper()
classes = np.array([id_map[i] for i in ids])

Predicting embeddings: 100%|██████████| 60/60 [00:30<00:00,  1.96batch/s]


In [12]:
preds = cbm.predict(cub_embeddings)


In [13]:
accuracy_score(classes, preds)

0.5945945945945946
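
With its default settings, `accuracy_score` returns the fraction of samples whose predicted label matches the true label, so the value above means roughly 59% of the test images were classified correctly. A minimal pure-Python sketch of the same calculation follows; the function name `accuracy` and the example labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Made-up labels for illustration; 3 of the 4 predictions match.
print(accuracy([3, 1, 4, 1], [3, 1, 4, 2]))  # 0.75
```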