# PyTorch: IBA (Per-Sample Bottleneck)

This notebook shows how to apply the input-level bottleneck to a pretrained 4-layer LSTM model on IMDB.

Make sure to run this notebook in an environment with all dependencies installed.

In [None]:
# set your CUDA device
%env CUDA_VISIBLE_DEVICES=0

%load_ext autoreload
%autoreload 2

import torch
from torch import nn
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt 
import os
from tqdm import tqdm_notebook
import json
from PIL import Image
import numpy as np
import sys

import iba

from iba.models import NLPAttributor
from iba.models import build_attributor

env: CUDA_VISIBLE_DEVICES=0


## Load and check data

Change this to the path of the config file (_dir/configs/deep_lstm.py).

In [None]:
# this assumes the work dir is the project dir
cfg_path = 'project_dir/configs/deep_lstm.py'

In [None]:
from iba.datasets import build_dataset
import mmcv
cfg = mmcv.Config.fromfile(cfg_path)
cfg.attribution_cfg['input_mask']['show'] = True

In [None]:
dataset = build_dataset(cfg.data['attribution'])

aclImdb_v1.tar.gz: 100%|██████████| 84.1M/84.1M [00:05<00:00, 14.2MB/s]
data/vector_cache/glove.6B.zip: 862MB [02:42, 5.30MB/s]                           
100%|█████████▉| 399211/400000 [00:21<00:00, 18973.05it/s]

In [None]:
datapoint = next(iter(dataset))

In [None]:
# examine one data point
print("Plain text: {}".format(datapoint['input_text']))
print("Processed text as tensor: {}".format(datapoint['input']))
print("Target class: {}".format(datapoint['target']))
print("File name: {}".format(datapoint['input_name']))
print("Text length: {}".format(datapoint['input_length']))

Plain text: Zentropa has much in common with The Third Man, another noir-like film set among the rubble of postwar Europe. Like TTM, there is much inventive camera work. There is an innocent American who gets emotionally involved with a woman he doesn't really understand, and whose naivety is all the more striking in contrast with the natives.<br /><br />But I'd have to say that The Third Man has a more well-crafted storyline. Zentropa is a bit disjointed in this respect. Perhaps this is intentional: it is presented as a dream/nightmare, and making it too coherent would spoil the effect. <br /><br />This movie is unrelentingly grim--"noir" in more than one sense; one never sees the sun shine. Grim, but intriguing, and frightening.
Processed text as tensor: tensor([13824,    52,    81,    12,  1125,    20,     2,   852,   135,     4,
          164,     0,    23,   293,   769,     2, 15259,     7, 13683,  2278,
            3,    45,     0,     4,    46,    10,    81,  4385,   391,   170,

# Information flow to generate an input-level attribution map for text data

In [None]:
device='cuda:0'

In [None]:
attributor = build_attributor(cfg.attributor, default_args=dict(device=device))

  "num_layers={}".format(dropout, num_layers))
100%|█████████▉| 399211/400000 [00:39<00:00, 18973.05it/s]

In [None]:
from iba.datasets import nlp_collate_fn
dataloader = DataLoader(dataset, batch_size=8, shuffle=False, collate_fn=nlp_collate_fn)
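Reviews differ in length, so a batch of them has to be brought to a common length before it can be stacked. The sketch below illustrates that idea with plain Python lists. It is not the implementation of `nlp_collate_fn`: the helper `pad_collate` and the `pad_id` value are hypothetical, and the real function may order, pad, or pack sequences differently.

```python
def pad_collate(batch, pad_id=0):
    """Pad token-id lists to the longest sequence in the batch (hypothetical helper).

    pad_id is a placeholder; use whatever index the vocabulary reserves for padding.
    """
    lengths = [len(item["input"]) for item in batch]
    max_len = max(lengths)
    padded = [
        item["input"] + [pad_id] * (max_len - len(item["input"]))
        for item in batch
    ]
    targets = [item["target"] for item in batch]
    # keep the original lengths so padded positions can be ignored later
    return {"input": padded, "target": targets, "input_length": lengths}


batch = pad_collate([
    {"input": [13824, 52, 81], "target": 1},
    {"input": [164, 23], "target": 0},
])
```

Keeping `input_length` alongside the padded ids mirrors the `input_length` field of the data point printed earlier.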

Estimate the feature distribution for the hidden-layer information bottleneck.

In [None]:
attributor.estimate(dataloader, cfg.estimation_cfg)

In [None]:
attributor.feat_iba.estimator.mean().shape

torch.Size([256])
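The estimator keeps one mean per hidden channel, which is why the shape above is 256. As a rough illustration of how such per-channel statistics can be accumulated batch by batch without storing all features, here is a minimal sketch using Welford's online algorithm. The class `RunningStats` is hypothetical and written in plain Python; the estimator inside `iba` may be implemented differently.

```python
import math


class RunningStats:
    """Per-channel running mean and variance (Welford's algorithm)."""

    def __init__(self, num_channels):
        self.count = 0
        self.mean_ = [0.0] * num_channels
        self.m2 = [0.0] * num_channels  # running sum of squared deviations

    def update(self, feature):
        """Fold one feature vector (one value per channel) into the statistics."""
        self.count += 1
        for c, x in enumerate(feature):
            delta = x - self.mean_[c]
            self.mean_[c] += delta / self.count
            self.m2[c] += delta * (x - self.mean_[c])

    def mean(self):
        return list(self.mean_)

    def std(self):
        # population standard deviation; zero until at least one sample is seen
        if self.count == 0:
            return [0.0] * len(self.m2)
        return [math.sqrt(m / self.count) for m in self.m2]


# toy hidden features with 2 channels instead of 256
stats = RunningStats(num_channels=2)
for hidden in [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]:
    stats.update(hidden)
```

After the loop, `stats.mean()` and `stats.std()` each return one value per channel, analogous to the length-256 vector returned by `attributor.feat_iba.estimator.mean()`.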

## Train Attributor on a sample text
The training pipeline is integrated into the *attributor* class.

In [None]:
datapoint = next(iter(dataset))

In [None]:
label = datapoint['target']

In [None]:
text = datapoint['input_text']

In [None]:
input = datapoint['input']

In [None]:
# expand target 10 times to match the batch inside attributor
target = torch.tensor([label]).double().expand(10,-1)
attributor.set_text(text)
attributor.make_attribution(input.to(device),
                            target.to(device),
                            attribution_cfg=cfg.attribution_cfg)

## Display feature mask from IBA (already summed over channels)

Tokens are highlighted according to their attribution value: dark red marks a token that is very important for the model's decision, while a lighter color marks a token that is less important.
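To make the color scheme concrete, the following sketch maps per-token scores to the opacity of a red background and returns an HTML string. It is only an illustration: `highlight_tokens` is a hypothetical helper, the scores are made up, and `show_feat_mask` may normalize and render its mask differently.

```python
import html


def highlight_tokens(tokens, scores):
    """Return HTML where higher-scoring tokens get a more opaque red background."""
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    parts = []
    for token, score in zip(tokens, scores):
        alpha = (score - lo) / span  # 0.0 = least important, 1.0 = most important
        parts.append(
            '<span style="background-color: rgba(255, 0, 0, {:.2f})">{}</span>'.format(
                alpha, html.escape(token)
            )
        )
    return " ".join(parts)


tokens = ["grim", ",", "but", "intriguing"]
scores = [0.9, 0.1, 0.2, 0.7]  # made-up attribution values for illustration
markup = highlight_tokens(tokens, scores)
```

In a notebook, the resulting string could be rendered with `IPython.display.HTML(markup)`.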

In [None]:
# a tokenizer is needed to split the text into tokens so that each token can be assigned an attribution value
from torchtext.data.utils import get_tokenizer
tokenizer = get_tokenizer('basic_english')

In [None]:
attributor.show_feat_mask(tokenizer=tokenizer, show=True)

## Display final word mask learned from input IB

In [None]:
attributor.show_input_mask(tokenizer=tokenizer, show=True)