# 🍕 Introduction to Slice Discovery with `domino`

This notebook introduces Domino, a method for identifying slices of data on which a machine learning model underperforms. 


**Useful links:**
- 📄 [ICLR 2022 Paper](https://arxiv.org/abs/2203.14960)
- 💻 [GitHub](https://github.com/HazyResearch/domino)
- 📘 [Docs](https://domino-slice.readthedocs.io/en/latest/)
- 🌍 [BlogPost]()

In [1]:
!pip install "domino[clip] @ git+https://github.com/HazyResearch/domino@main"
!pip install git+https://github.com/openai/CLIP.git


In [3]:
from domino import explore, DominoSlicer
import meerkat as mk

In [4]:
# Use a GPU if one is available; otherwise fall back to the CPU.
# If you are running this notebook on Google Colab, you can enable a GPU by going to
# "Runtime" -> "Change runtime type" and selecting "GPU" under "Hardware accelerator".
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

## 💾 Downloading the data
First, we'll download some data to explore. We're going to use the [Imagenette dataset](https://github.com/fastai/imagenette#image%E7%BD%91), a small subset of the original [ImageNet](https://www.image-net.org/update-mar-11-2021.php).  This dataset is made up of 10 classes (e.g. "garbage truck", "gas pump", "golf ball").
- Download time: <1 minute
- Download size: 130M

In [5]:
%load_ext autoreload
%autoreload 2



In [16]:
df = mk.datasets.get("imagenette")

# we'll only use the first 512 examples of the validation split to keep things fast
df = df[df["split"] == "valid"][:512]

## 🤖 Loading a model and computing predictions
Next, we'll load in the model we are going to audit: a ResNet18 pretrained on the full ImageNet (courtesy of [TorchVision](https://pytorch.org/vision/stable/models.html)). We'll compute the model's prediction for each example in the Imagenette validation dataset we loaded above. 

In [17]:
import torch
from torchvision.models import resnet18
import torchvision.transforms as transforms
model = resnet18(pretrained=True)



In [18]:
# 1. Define transform
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
])

# 2. Create a new column that lazily applies the transform to each image
df["input"] = df["img"].defer(transform)
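For reference, `ToTensor` followed by `Normalize` amounts to a simple per-channel affine map: pixel values are rescaled from 0-255 to [0, 1], then each channel has the ImageNet mean subtracted and is divided by the ImageNet standard deviation. The plain-Python sketch below applies that arithmetic to a single RGB pixel; the helper `normalize_pixel` is hypothetical and only illustrates the math (it skips the resize and crop steps).

```python
# Per-channel ImageNet statistics, copied from the transform above.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb_uint8):
    """Mimic ToTensor() followed by Normalize() for one RGB pixel (hypothetical helper).

    ToTensor() rescales 0-255 integers to floats in [0, 1]; Normalize() then
    computes (x - mean) / std independently for each channel.
    """
    scaled = [value / 255.0 for value in rgb_uint8]
    return [(x - m) / s for x, m, s in zip(scaled, MEAN, STD)]


# Normalized values for a black and a white pixel.
print(normalize_pixel((0, 0, 0)))
print(normalize_pixel((255, 255, 255)))
```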

In [19]:
# 1. Move the model to device
model.to(DEVICE).eval()

# 2. Define a function that runs a forward pass over a batch
@torch.no_grad()
def predict(input: mk.TensorColumn):
    x: torch.Tensor = input.to_tensor().to(DEVICE)  # get the underlying torch tensor and move it to DEVICE
    out: torch.Tensor = model(x)  # run the forward pass

    # Return a dictionary with one key for each of the new columns. Each value in the
    # dictionary should have the same length as the batch.
    return {
        "pred": out.cpu().numpy().argmax(axis=-1),
        "probs": torch.softmax(out, dim=-1).cpu().numpy(),
    }

# 3. Apply the function. `predict` operates on batches, so we set `is_batched_fn=True`.
# `map` matches the function's parameter name (`input`) to the column of the same name,
# so only the "input" column is loaded and the others are skipped.
pred_df = df.map(
    function=predict,
    is_batched_fn=True,
    batch_size=32,
    pbar=True
)
df = mk.concat([df, pred_df], axis=1)
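The batched `predict`/`map` combination above can be mimicked in plain Python to see what it computes. The sketch below uses neither meerkat nor torch; `batched_predict` and the fake logits are hypothetical. It shows why 512 rows with `batch_size=32` are processed in 16 batches, and that each row gets an argmax prediction plus a softmax probability vector.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def batched_predict(rows, batch_size=32):
    """Hypothetical stand-in for `df.map(predict, is_batched_fn=True, ...)`.

    `rows` is a list of logit vectors, one per example. Each batch yields a
    predicted class (argmax of the logits, which equals the argmax of the
    softmax) and a probability vector per example; the per-batch outputs are
    concatenated in order.
    """
    preds, probs, n_batches = [], [], 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        n_batches += 1
        for logits in batch:
            preds.append(max(range(len(logits)), key=logits.__getitem__))
            probs.append(softmax(logits))
    return {"pred": preds, "probs": probs, "n_batches": n_batches}


# 512 fake examples over 3 made-up classes.
fake_logits = [[float(i % 3), 1.5, -1.0] for i in range(512)]
result = batched_predict(fake_logits)
print(result["n_batches"], result["pred"][:3])
```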



## 🎯 Computing average metrics

Next, we'll compute the model's average accuracy across all ten Imagenette classes.

In [21]:
df["correct"] = df["pred"] == df["label_idx"].to_numpy()
accuracy = df["correct"].mean()
print(f"Micro accuracy across the ten Imagenette classes: {accuracy:0.3}")

Micro accuracy across the ten Imagenette classes: 0.521
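"Micro" accuracy pools every example together, so classes with more validation examples count for more. A macro average instead computes accuracy per class and averages those numbers. The sketch below contrasts the two on a made-up toy example (the helper `micro_and_macro_accuracy` is hypothetical); the gap between them is one early hint that performance is uneven across subsets of the data.

```python
from collections import defaultdict


def micro_and_macro_accuracy(labels, preds):
    """Micro accuracy pools all examples; macro averages per-class accuracies."""
    correct = [y == p for y, p in zip(labels, preds)]
    micro = sum(correct) / len(correct)

    per_class = defaultdict(list)
    for y, is_correct in zip(labels, correct):
        per_class[y].append(is_correct)
    macro = sum(sum(v) / len(v) for v in per_class.values()) / len(per_class)
    return micro, macro


# Toy example: class 571 has 4 examples (3 correct), class 574 has 1 (wrong).
labels = [571, 571, 571, 571, 574]
preds = [571, 571, 571, 0, 0]
print(micro_and_macro_accuracy(labels, preds))  # (0.6, 0.375)
```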


## 🔎 Discovering underperforming slices

Whatever a model's average accuracy, it may still underperform on interesting slices of data. Slice discovery methods (SDMs) are automated algorithms that aim to identify these slices. Most SDMs follow the three-step procedure shown in the figure below: (1) embed, (2) slice, and (3) describe. For each of these steps, the `domino` package provides implementations of several algorithms under a common API, which makes it easy to compose a custom slice discovery method from different choices at each step.

<div>
<img src="attachment:509a0045-9a12-4397-a206-749ea863d6ec.png" width="500"/>
</div>
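To make the "common API" idea concrete, here is a toy sketch of how three interchangeable steps compose into one method. The class `SliceDiscoveryMethod` and the `toy_*` step functions are hypothetical illustrations, not part of `domino`; in this notebook the corresponding roles are played by `embed`, `DominoSlicer`, and `describe`.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical type aliases for the three stages; these are NOT domino's API.
Embedder = Callable[[Sequence[str]], List[List[float]]]
Slicer = Callable[[List[List[float]], Sequence[int], Sequence[float]], List[int]]
Describer = Callable[[List[List[float]], List[int]], List[str]]


@dataclass
class SliceDiscoveryMethod:
    """Compose an SDM from interchangeable embed / slice / describe steps."""
    embed_fn: Embedder
    slice_fn: Slicer
    describe_fn: Describer

    def run(self, inputs, targets, pred_probs):
        embeddings = self.embed_fn(inputs)                          # (1) embed
        slice_ids = self.slice_fn(embeddings, targets, pred_probs)  # (2) slice
        return slice_ids, self.describe_fn(embeddings, slice_ids)   # (3) describe


# Toy stand-ins: embed a string by its length, put misclassified examples in
# slice 1, and describe each slice by its index.
def toy_embed(inputs):
    return [[float(len(x))] for x in inputs]


def toy_slice(embeddings, targets, pred_probs):
    return [int(round(p) != t) for t, p in zip(targets, pred_probs)]


def toy_describe(embeddings, slice_ids):
    return [f"slice {s}" for s in sorted(set(slice_ids))]


sdm = SliceDiscoveryMethod(toy_embed, toy_slice, toy_describe)
print(sdm.run(["a", "bb", "ccc"], targets=[1, 0, 1], pred_probs=[0.9, 0.8, 0.2]))
```

Swapping any one step (for example, a different embedding model) only requires passing a different function with the same signature.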


Below, we use `domino` to discover slices of the Imagenette data on which the model underperforms.
In this tutorial we focus on one class at a time. We start with the class "gas pump", but feel free to try a different class by setting the `LABEL_IDX` constant to one of the ImageNet indices below:

```
{'cassette player': 482,
 'garbage truck': 569,
 'tench': 0,
 'english springer spaniel': 217,
 'church': 497,
 'parachute': 701,
 'french horn': 566,
 'chainsaw': 491,
 'golf ball': 574,
 'gas pump': 571}
```

In [22]:
LABEL_IDX = 571

# convert to a binary task 
df["prob"] = df["probs"][:, LABEL_IDX]
df["target"] = (df["label_idx"] == LABEL_IDX)
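The binarization above keeps two numbers per example: the predicted probability of the chosen class (`prob`) and whether the example truly belongs to it (`target`). The sketch below repeats that logic in plain Python on made-up rows (the three-class probability dictionaries are invented; the class indices come from the mapping above). The third row is a gas pump the model assigns low probability, exactly the kind of error slice discovery looks for.

```python
# Subset of the class-name -> ImageNet-index mapping shown above.
CLASS_TO_IDX = {"gas pump": 571, "golf ball": 574, "garbage truck": 569}
IDX_TO_CLASS = {idx: name for name, idx in CLASS_TO_IDX.items()}

label_idx = 571  # plays the role of LABEL_IDX

# Toy rows: (true ImageNet label, made-up probabilities keyed by ImageNet index).
rows = [
    (571, {571: 0.90, 574: 0.05, 569: 0.05}),
    (574, {571: 0.30, 574: 0.60, 569: 0.10}),
    (571, {571: 0.20, 574: 0.10, 569: 0.70}),
]

# "prob": predicted probability of the chosen class; "target": is it that class?
prob = [probs[label_idx] for _, probs in rows]
target = [true == label_idx for true, _ in rows]
print(IDX_TO_CLASS[label_idx], prob, target)
```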


### 📊 1. Embed

Domino encodes the validation images alongside text in a cross-modal embedding space using a model like CLIP.

In [23]:
from domino import embed
df = embed(
    df, 
    input_col="img",
    encoder="clip", 
    modality="image",
    device=DEVICE
)
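Because images and text are embedded in the same space, an image embedding can be compared directly with a text embedding. CLIP-style embeddings are commonly compared with cosine similarity; the sketch below computes it for invented 3-dimensional vectors (real CLIP embeddings have hundreds of dimensions, and these numbers and phrases are made up for illustration).

```python
import math


def cosine_similarity(u, v):
    """Dot product of the two vectors after scaling each to unit length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Invented embeddings standing in for clip(img) and clip(output_phrase).
image_emb = [0.9, 0.1, 0.0]
text_embs = {
    "a photo of a gas pump.": [0.8, 0.2, 0.1],
    "a photo of a golf ball.": [0.0, 0.1, 0.9],
}

# Rank the phrases by similarity to the image, most similar first.
ranked = sorted(
    text_embs,
    key=lambda phrase: cosine_similarity(image_emb, text_embs[phrase]),
    reverse=True,
)
print(ranked[0])
```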



### 🍕 2. Slice

Using an error-aware mixture model, Domino identifies regions in the embedding space with a high concentration of errors.
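A rough intuition for "error-aware": in an ordinary Gaussian mixture, each example is softly assigned to components according to how well each one explains its embedding. Domino's mixture also models the target and the prediction within each component, and the names `y_log_likelihood_weight` and `y_hat_log_likelihood_weight` suggest those terms are upweighted in log space. The sketch below illustrates that idea for a single example under this assumption; `weighted_responsibilities` and all log-likelihood values are hypothetical, and this is not domino's implementation.

```python
import math


def weighted_responsibilities(log_prior, log_p_emb, log_p_y, log_p_yhat,
                              y_weight=40.0, y_hat_weight=40.0):
    """Soft assignment of one example to each mixture component (illustrative).

    Each argument is a list with one entry per component. The target and
    prediction log-likelihoods are scaled by their weights, then the
    per-component scores are normalized with a stable softmax.
    """
    scores = [
        lp + le + y_weight * ly + y_hat_weight * lh
        for lp, le, ly, lh in zip(log_prior, log_p_emb, log_p_y, log_p_yhat)
    ]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# The example being scored is a gas pump (target = 1) given a low predicted
# probability. Both components explain its embedding and target equally well,
# but low predictions are common only in component 1.
log_prior = [math.log(0.5), math.log(0.5)]
log_p_emb = [-3.0, -3.0]
log_p_y = [math.log(0.9), math.log(0.9)]     # P(target = 1 | component)
log_p_yhat = [math.log(0.1), math.log(0.8)]  # P(low prediction | component)

print(weighted_responsibilities(log_prior, log_p_emb, log_p_y, log_p_yhat))
print(weighted_responsibilities(log_prior, log_p_emb, log_p_y, log_p_yhat,
                                y_weight=0.0, y_hat_weight=0.0))
```

With large weights the example is pulled almost entirely into the error-heavy component; with both weights set to zero, the assignment falls back to the embedding alone (here an even split).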

In [24]:
domino = DominoSlicer(
    y_log_likelihood_weight=40,
    y_hat_log_likelihood_weight=40,
    n_mixture_components=25,
    n_slices=5
)

domino.fit(data=df, embeddings="clip(img)", targets="target", pred_probs="prob")

df["domino_slices"] = domino.predict_proba(
    data=df, embeddings="clip(img)", targets="target", pred_probs="prob"
)
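`predict_proba` returns, for each example, a soft membership score for each of the `n_slices` slices. One simple way to sanity-check the result is to compute a membership-weighted error rate per slice, treating an example as an error when its thresholded probability disagrees with its target. The helper `slice_error_rates`, the 0.5 threshold, and the toy data below are illustrative assumptions, not part of `domino`.

```python
def slice_error_rates(memberships, targets, probs, threshold=0.5):
    """Membership-weighted error rate for each slice (hypothetical helper).

    memberships: one row per example, one soft score per slice.
    targets / probs: binary labels and predicted probabilities for the class.
    """
    errors = [int((p >= threshold) != bool(t)) for t, p in zip(targets, probs)]
    n_slices = len(memberships[0])
    rates = []
    for s in range(n_slices):
        weights = [row[s] for row in memberships]
        total = sum(weights)
        weighted_errors = sum(w * e for w, e in zip(weights, errors))
        rates.append(weighted_errors / total if total else 0.0)
    return rates


# Toy data: 4 examples, 2 slices; the last two examples are errors.
memberships = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
targets = [1, 1, 1, 0]
probs = [0.9, 0.7, 0.2, 0.6]
rates = slice_error_rates(memberships, targets, probs)
print([round(r, 3) for r in rates])
```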

 16%|[38;2;241;122;74m█▌        [0m| 16/100 [00:00<00:00, 91.42it/s]


### ✏️ 3. Describe

Finally, to help practitioners understand the commonalities among the examples in each slice, Domino generates natural language descriptions of the slices. To do so, it leverages the cross-modal embeddings computed in Step 1, surfacing the text nearest to the slice in embedding space.
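One way to "surface the text nearest to a slice", sketched under simplifying assumptions: take the membership-weighted average of the image embeddings in the slice, subtract the average of the remaining examples, and rank candidate phrases by their dot product with that difference vector. This captures the idea of the describe step but is not necessarily the exact scoring `describe` uses; `rank_descriptions`, the phrases, and all vectors are hypothetical.

```python
def weighted_mean(vectors, weights):
    """Average of `vectors`, each scaled by its weight."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total for d in range(dim)]


def rank_descriptions(image_embs, memberships, text_embs, top_k=2):
    """Rank phrases by alignment with (slice centroid - rest centroid)."""
    inside = weighted_mean(image_embs, memberships)
    outside = weighted_mean(image_embs, [1.0 - m for m in memberships])
    direction = [a - b for a, b in zip(inside, outside)]
    scores = {
        phrase: sum(d * t for d, t in zip(direction, emb))
        for phrase, emb in text_embs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Invented 2-d embeddings: the slice's images point along the first axis.
image_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
memberships = [0.95, 0.9, 0.05, 0.1]  # soft membership in the slice
text_embs = {
    "a photo of a gas pump at night.": [1.0, 0.0],
    "a photo of a gas pump in daylight.": [0.0, 1.0],
    "a photo of a pump.": [0.5, 0.5],
}
print(rank_descriptions(image_embs, memberships, text_embs))
```

Subtracting the rest-of-data centroid favors phrases that distinguish the slice rather than phrases that describe the whole class.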

In [28]:
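from domino import generate_candidate_descriptions

# Candidate phrases are produced by filling the [MASK] tokens in these templates.
phrase_templates = [
    "a photo of [MASK].",
    "a photo of {} [MASK].",
    "a photo of [MASK] {}.",
    "a photo of [MASK] {} [MASK].",
]

# This step downloads a BERT masked-language model (bert-base-uncased). On a machine
# without CUDA it may fail with "AssertionError: Torch not compiled with CUDA enabled".
text_df = generate_candidate_descriptions(
    templates=phrase_templates,
    num_candidates=10_000
)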


In [23]:
text_df = embed(
    text_df, 
    input_col="output_phrase", 
    encoder="clip",
    device=DEVICE
)


In [None]:
from domino import describe

df["target"] = df["target"].astype(int)

# score every candidate phrase against the first discovered slice (slice_idx=0)
descriptions = describe(
    data=df,
    embeddings="clip(img)",
    targets="target",
    slices="domino_slices",
    text=text_df,
    text_embeddings="clip(output_phrase)",
    slice_idx=0
)

# show the 10 highest-scoring descriptions
descriptions[(-descriptions["score"]).argsort()][:10]

## 🧗🏾 Exploring discovered slices 

In [41]:
explore(
    data=df["img_path", "img", "label", "prob", "target", "clip(img)", "domino_slices"],
    embeddings="clip(img)",
    pred_probs="prob",
    targets="target",
    slices="domino_slices",
    text=text_df,
    text_embeddings="clip(output_phrase)",
) 
