# Visualizing Attention

In this notebook we fine-tune an ALBERT model with transfer learning to classify the articles of the Livedoor News Corpus.  
We then interpret the model's predictions by visualizing its attention weights.  
(https://www.rondhuit.com/download.html)

## Setup
Import the required libraries.

In [1]:
import os, re
import glob, csv, time
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from transformers import AutoTokenizer, AlbertForSequenceClassification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

import warnings
warnings.filterwarnings('ignore')

path = './text/'
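# Map each category directory to its path. Label IDs are assigned by enumeration order,
# so the names are sorted to keep the mapping identical across runs.
list_dir = {name: path + name + '/' for name in sorted(os.listdir(path)) if not os.path.isfile(os.path.join(path, name))}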
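Each category's label ID is its position in `list_dir`, and `os.listdir` returns entries in arbitrary order, so the mapping is only reproducible (for example when the checkpoint in `./checkpoints/` is reloaded later) if the names are ordered explicitly. The label IDs in the sample output below (dokujo-tsushin = 0, kaden-channel = 2, ..., topic-news = 8) are consistent with alphabetical order. A minimal sketch of a sorted mapping, using a hypothetical `build_label_map` helper and a throwaway directory in place of `./text/`:

```python
import os
import tempfile

def build_label_map(root):
    """Map each category subdirectory of `root` to an integer label ID.

    Sorting makes the IDs independent of the order os.listdir returns.
    """
    names = sorted(name for name in os.listdir(root)
                   if os.path.isdir(os.path.join(root, name)))
    return {name: i for i, name in enumerate(names)}

# Hypothetical miniature of ./text/: three category folders plus a stray file
with tempfile.TemporaryDirectory() as root:
    for name in ['sports-watch', 'dokujo-tsushin', 'kaden-channel']:
        os.mkdir(os.path.join(root, name))
    open(os.path.join(root, 'README.txt'), 'w').close()
    label_map = build_label_map(root)

print(label_map)
# {'dokujo-tsushin': 0, 'kaden-channel': 1, 'sports-watch': 2}
```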

## Preprocessing
Load and preprocess the text data.

In [2]:
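tsv_fname = "./datasets/all_text.tsv"
os.makedirs(os.path.dirname(tsv_fname), exist_ok=True)
brackets_tail = re.compile('【[^】]*】$')
brackets_head = re.compile('^【[^】]*】')

def remove_brackets(inp):
    # Strip a 【...】 tag from the end and from the start of a title
    output = re.sub(brackets_head, '', re.sub(brackets_tail, '', inp))
    return output

def read_title_text(f):
    # Skip the first two lines
    next(f)
    next(f)
    title = next(f)  # the 3rd line is the title
    title = remove_brackets(title)
    text = ''.join([l.strip() for l in f])  # the rest is the body, joined into one line
    return title[:-1], text  # drop the trailing newline of the title

with open(tsv_fname, "w", encoding='utf8') as wf:
    for i, (k, v) in enumerate(list_dir.items()):
        # Every .txt file in the category directory except the license file
        files = [fname for fname in glob.glob(v + '*.txt') if 'LICENSE.txt' not in fname]
        for f_path in files:
            with open(f_path, 'r', encoding='utf-8') as f:
                title, text = read_title_text(f)
            row = [k, '%d'%i, title, text]  # category name, label ID, title, body
            wf.write('\t'.join(row) + '\n')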
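`read_title_text` relies on the layout of a Livedoor article file: the article URL on line 1, a timestamp on line 2, the title on line 3, and the body after that. The sketch below replays the same logic on a made-up article held in an `io.StringIO`, which also shows what `remove_brackets` does. It uses `rstrip('\n')` instead of `[:-1]`, which is a small deviation that also tolerates a title line without a trailing newline.

```python
import io
import re

# Same patterns as the notebook: a 【...】 tag at the end or the start of a title
brackets_tail = re.compile('【[^】]*】$')
brackets_head = re.compile('^【[^】]*】')

def remove_brackets(inp):
    return re.sub(brackets_head, '', re.sub(brackets_tail, '', inp))

def read_title_text(f):
    next(f)  # line 1: article URL (skipped)
    next(f)  # line 2: timestamp (skipped)
    title = remove_brackets(next(f)).rstrip('\n')  # line 3: title
    text = ''.join(l.strip() for l in f)           # remaining lines: body
    return title, text

# A made-up article in the assumed Livedoor layout
sample = io.StringIO(
    "https://example.com/article/1\n"
    "2012-01-01T00:00:00+0900\n"
    "【速報】サンプル記事のタイトル\n"
    "本文の一行目。\n"
    "本文の二行目。\n"
)
title, text = read_title_text(sample)
print(title)  # サンプル記事のタイトル
print(text)   # 本文の一行目。本文の二行目。
```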

In [3]:
cols = ['name', 'label', 'title', 'sentence']
# Load the data
df = pd.read_csv(tsv_fname, delimiter='\t', header=None, names=cols)
df = df.dropna(subset=['sentence'])

# Inspect the data
print(f'Data size: {df.shape}')
df.sample(10)

Data size: (7362, 4)


| | name | label | title | sentence |
|---|---|---|---|---|
| 2991 | livedoor-homme | 3 | 年収1000万円のビジネスパーソンに聞いた「子供を進学させたい大学ランキング ベスト20」 ... | こんにちは。「ビズリーチ年収1000万円研究所」所長の佐藤和男です。この研究所では、年収10... |
| 631 | dokujo-tsushin | 0 | 独身女のひな祭り | 女の子の節句「ひな祭り」。バレンタインデー、ホワイトデーよりも忘れられがちだが、立派な日本の... |
| 5080 | smax | 6 | Samsung、LTE対応Android 4.0 ICS搭載スマートフォン「GALAXY R... | ゼロから始めるスマートフォンSamsung Electronics（サムスン電子）は5月31... |
| 2974 | livedoor-homme | 3 | Paul Smith \| ポール・スミスが提案するスーツの現在進行形[5/5] | PS Paul Smithトレンドの先を往く新鮮なシルエットがファッション感度をアピールす... |
| 7029 | topic-news | 8 | 解散するSDN48への発言で秋元康に批判殺到 | プロデューサーの秋元康氏が16日、Google＋上で3月31日に解散することが決まっているS... |
| 5839 | sports-watch | 7 | ショック、“鬼軍曹”山本小鉄氏が急逝 | 現役時代は、星野勘太郎氏とのタッグ「ヤマハブラザーズ」で活躍し、引退後は、新日本プロレスの鬼... |
| 6379 | sports-watch | 7 | 五輪目指す山崎静代、芸能との両立には「両方をやってこそ意味がある」 | 日本テレビ「NEWS ZERO」（13日放送分）では、女子ボクシング全日本選手権で優勝し、ロ... |
| 5202 | smax | 6 | 本日予約開始！AQUOS PHONE ZETA SH-09DとOptimus it L-05... | 本日22日（金）から全国のドコモショップなどにて事前予約が開始されたスマートフォン「AQUO... |
| 1797 | kaden-channel | 2 | 日本版HuluはHuluにしてHuluにあらず？ | 最近、米国のビデオオンデマンドサービス「Hulu」の日本語版サービスが開始されたが、個人的に... |
| 2499 | kaden-channel | 2 | オール電化の家庭に衝撃！　経産省専門委がオール電化割引に廃止要請 | キッチンや風呂のエネルギーも電気でまかなうのが「オール電化」。取り入れている家庭も多いだろう... |


## Loading the pretrained model
We use a pretrained model that can be downloaded through HuggingFace's [Transformers](https://huggingface.co/transformers/) library.

In [4]:
model_path = "ALINEAR/albert-japanese-v2"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AlbertForSequenceClassification.from_pretrained(model_path, num_labels=len(list_dir), output_attentions=True)

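device = "cuda:0" if torch.cuda.is_available() else "cpu"
model.train()
model.to(device)

# Only the encoder and the classification head are passed to the optimizer;
# the embedding layer and the pooler still receive gradients but are never updated.
optimizer = optim.Adam([
    {'params': model.albert.encoder.parameters(), 'lr': 5e-5},
    {'params': model.classifier.parameters(), 'lr': 5e-5}
], betas=(0.9, 0.999))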

Some weights of the model checkpoint at ALINEAR/albert-japanese-v2 were not used when initializing AlbertForSequenceClassification: ['predictions.bias', 'predictions.LayerNorm.weight', 'predictions.LayerNorm.bias', 'predictions.dense.weight', 'predictions.dense.bias', 'predictions.decoder.weight', 'predictions.decoder.bias']
- This IS expected if you are initializing AlbertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing AlbertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of AlbertForSequenceClassification were not initialized from the model checkpoint at ALINEAR/albert-japanese-v2 and are newly initialized: ['classifier.weight', 'c
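The optimizer above is built from two parameter groups, `model.albert.encoder` and `model.classifier`, each with its own learning rate. Parameters outside these groups (in ALBERT, the embedding layer and the pooler) still get gradients from `loss.backward()`, but `optimizer.step()` never changes them. A toy PyTorch sketch, with a hypothetical `ToyClassifier` standing in for ALBERT:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

class ToyClassifier(nn.Module):
    """Hypothetical stand-in for ALBERT: embeddings -> encoder -> classifier."""
    def __init__(self):
        super().__init__()
        self.embeddings = nn.Embedding(10, 4)
        self.encoder = nn.Linear(4, 4)
        self.classifier = nn.Linear(4, 2)

    def forward(self, ids):
        h = torch.tanh(self.encoder(self.embeddings(ids))).mean(dim=1)
        return self.classifier(h)

toy = ToyClassifier()
# Same structure as the notebook: only the encoder and classifier are registered
optimizer = optim.Adam([
    {'params': toy.encoder.parameters(), 'lr': 5e-5},
    {'params': toy.classifier.parameters(), 'lr': 5e-5},
], betas=(0.9, 0.999))

emb_before = toy.embeddings.weight.detach().clone()
enc_before = toy.encoder.weight.detach().clone()

ids = torch.tensor([[1, 2, 3], [4, 5, 6]])
labels = torch.tensor([0, 1])
loss = nn.functional.cross_entropy(toy(ids), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(torch.equal(toy.embeddings.weight, emb_before))  # True: not in the optimizer
print(torch.equal(toy.encoder.weight, enc_before))     # False: updated by Adam
```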

## Preprocessing (continued)
Tokenize the raw text with the pretrained model's tokenizer, then split the data into training and validation sets.

In [5]:
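max_len = 256  # every sequence is truncated or padded to exactly this many tokens

def tokenize(x):
    # Text -> token IDs, truncated or padded with ID 0 to max_len.
    # Note: no special tokens ([CLS]/[SEP]) are added and no attention mask is built,
    # so the model also attends to the padding positions.
    x = tokenizer.tokenize(x)
    x = tokenizer.convert_tokens_to_ids(x)
    if len(x) > max_len:
        x = x[:max_len]
    else:
        num = max_len - len(x)
        x = x + [0 for _ in range(num)]
    return x

def to_ids(x):
    # Despite its name, this returns the token strings (not IDs), padded with '-' to max_len;
    # they are used only to label the tokens in the attention visualization.
    x = tokenizer.tokenize(x)
    if len(x) > max_len:
        x = x[:max_len]
    else:
        num = max_len - len(x)
        x = x + ['-' for _ in range(num)]
    return x

df['tokens'] = df['sentence'].apply(tokenize)          # model input (token IDs)
df['tokenized_text'] = df['sentence'].apply(to_ids)    # human-readable tokens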
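`tokenize` pads with ID 0 but never tells the model which positions are padding. A minimal sketch of a hypothetical `pad_or_truncate` helper that builds an attention mask alongside the padded IDs; it assumes, as the notebook does, that 0 is the padding ID (`tokenizer.pad_token_id` gives the actual value). The mask could then be passed to the model as `attention_mask=...`. With the Transformers tokenizer, `tokenizer(text, max_length=max_len, padding='max_length', truncation=True)` does this in one call and also adds the model's special tokens, although retraining with such inputs would change the results shown later in this notebook.

```python
def pad_or_truncate(ids, max_len, pad_id=0):
    """Truncate `ids` to max_len or right-pad them with pad_id.

    Returns the padded IDs and a mask with 1 for real tokens, 0 for padding.
    """
    ids = list(ids[:max_len])
    n_pad = max_len - len(ids)
    mask = [1] * len(ids) + [0] * n_pad
    return ids + [pad_id] * n_pad, mask

print(pad_or_truncate([5, 6, 7], 5))           # ([5, 6, 7, 0, 0], [1, 1, 1, 0, 0])
print(pad_or_truncate(list(range(1, 9)), 5))   # ([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])
```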

In [6]:
df_train, df_test = train_test_split(df[['label', 'tokens', 'tokenized_text']], test_size=0.3, random_state=42)
df_train.to_csv('./datasets/train.csv')
df_test.to_csv('./datasets/test.csv')
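`to_csv` stores list-valued columns such as `tokens` as their string representation, so if `train.csv` and `test.csv` are read back later (the notebook itself keeps using the in-memory `df_train` and `df_test`), those columns have to be parsed again. A minimal sketch with `ast.literal_eval` on a toy frame:

```python
import ast
import io
import pandas as pd

# Toy frame shaped like df_train: a label and a list of token IDs per row
frame = pd.DataFrame({'label': [7, 2], 'tokens': [[16, 29, 26], [16, 5683, 31]]})

buf = io.StringIO()
frame.to_csv(buf)          # lists are written as text, e.g. "[16, 29, 26]"
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)
print(repr(restored.loc[0, 'tokens']))  # '[16, 29, 26]' (a str, not a list)

# Parse the strings back into Python lists
restored['tokens'] = restored['tokens'].apply(ast.literal_eval)
print(restored.loc[0, 'tokens'])        # [16, 29, 26]
```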

In [7]:
BATCH_SIZE = 8

steps_per_epoch = int(df_train.shape[0] / BATCH_SIZE)
print(df_train.shape)
print(steps_per_epoch)

valid_steps = int(df_test.shape[0] / BATCH_SIZE)
print(df_test.shape)
print(valid_steps)

(5153, 3)
644
(2209, 3)
276
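Because the step counts are computed as `int(n / BATCH_SIZE)`, the final partial batch is skipped: 5153 = 644 × 8 + 1 and 2209 = 276 × 8 + 1, so one training example and one validation example are never used in an epoch. `calc_acc` below depends on this, since it divides by `batch_size * batch`. The arithmetic as a small check (the helper name is hypothetical):

```python
def steps_and_remainder(n_examples, batch_size):
    """Number of full batches, and how many examples a final partial batch would hold."""
    steps = n_examples // batch_size  # equals int(n / batch_size) for positive n
    return steps, n_examples - steps * batch_size

for name, n in [('train', 5153), ('valid', 2209)]:
    print(name, *steps_and_remainder(n, 8))
# train 644 1
# valid 276 1
```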


## Training

In [8]:
def calc_acc(predictions, ground_truths, batch, batch_size):
    tmp = []
    for i in range(len(predictions)):
        for j in range(batch_size):
            tmp.append(1 if predictions[i][j] == ground_truths[i][j] else 0)
    acc = sum(tmp) / (batch_size * batch)
    return acc

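def exec_model(df, steps_per_epoch, phase='train'):
    # Enable dropout only while training; evaluate with it disabled
    if phase == 'train':
        model.train()
    else:
        model.eval()
    t_iter_start = time.time()
    predictions, ground_truths = [], []
    epoch_loss = 0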

    for batch in range(steps_per_epoch):
        iteration = batch + 1
        start, end = batch * BATCH_SIZE, (batch + 1) * BATCH_SIZE
        inputs = torch.tensor(df['tokens'].values[start: end].tolist())
        labels = torch.tensor(df['label'].values[start: end].tolist())
        # Move the batch to the device
        inputs = inputs.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()
        # Forward pass (gradients are tracked only in the training phase)
        with torch.set_grad_enabled(phase == 'train'):
            outputs = model(input_ids=inputs, labels=labels)
            loss, logit = outputs[:2]
            _, preds = torch.max(logit, 1)  # predicted label
            predictions.append(preds.cpu().numpy())
            ground_truths.append(labels.data.cpu().numpy())

            if phase == 'train':
                loss.backward()
                optimizer.step()

            if (iteration % 100 == 0):
                t_iter_finish = time.time()
                duration = t_iter_finish - t_iter_start
                acc = calc_acc(predictions, ground_truths, iteration, BATCH_SIZE)
                if phase == 'train':
                    print('Iteration {}|| Loss: {:.4f}|| 100iter: {:.4f} sec|| Accuracy: {:.4f}'.format(iteration, loss.item(), duration, acc))
                else:
                    print('Valid Step: {}|| Loss: {:.4f}|| 100iter: {:.4f} sec|| Accuracy: {:.4f}'.format(iteration, loss.item(), duration, acc))
                t_iter_start = time.time()
        epoch_loss += loss.item()
        del inputs, labels
    acc = calc_acc(predictions, ground_truths, iteration, BATCH_SIZE)
    return acc, epoch_loss, (np.array(predictions).flatten(), np.array(ground_truths).flatten())

predictions, ground_truths = [], []

is_train = False  # set to True to train; otherwise the checkpoint saved in ./checkpoints/ is loaded below
if is_train:
    num_epochs = 10
    for epoch in range(1, num_epochs + 1):
        t_epoch_start = time.time()
        epoch_acc, epoch_loss, _ = exec_model(df_train, steps_per_epoch)

        # Loss and accuracy for this epoch
        t_epoch_finish = time.time()
        duration = t_epoch_finish - t_epoch_start
        template = 'Epoch {}: {:.4f} sec. || Loss: {:.4f} || Accuracy: {}'
        print(template.format(epoch, duration, epoch_loss, epoch_acc))

        t_epoch_start = time.time()
        valid_acc, valid_loss, y = exec_model(df_test, valid_steps, phase='valid')
        print(classification_report(y[1], y[0]))
        t_epoch_finish = time.time()
        duration = t_epoch_finish - t_epoch_start
        template = 'Valid {}: {:.4f} sec. || Loss: {:.4f} || Accuracy: {}'
        print(template.format(epoch, duration, valid_loss, valid_acc))

        model.save_pretrained('./checkpoints/')
        t_epoch_start = time.time()
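`calc_acc` counts matches with nested loops and assumes every batch holds exactly `batch_size` predictions. An equivalent vectorized form, sketched here as a hypothetical `calc_acc_vectorized` (not part of the original notebook), concatenates the per-batch arrays and averages the element-wise matches, which also works if a last batch is shorter:

```python
import numpy as np

def calc_acc(predictions, ground_truths, batch, batch_size):
    # Loop version from the notebook
    tmp = []
    for i in range(len(predictions)):
        for j in range(batch_size):
            tmp.append(1 if predictions[i][j] == ground_truths[i][j] else 0)
    return sum(tmp) / (batch_size * batch)

def calc_acc_vectorized(predictions, ground_truths):
    # Flatten the list of per-batch arrays and take the mean of the matches
    return float(np.mean(np.concatenate(predictions) == np.concatenate(ground_truths)))

preds = [np.array([0, 1, 2, 3]), np.array([3, 3, 1, 0])]
truth = [np.array([0, 1, 2, 0]), np.array([3, 2, 1, 0])]
print(calc_acc(preds, truth, batch=2, batch_size=4))  # 0.75
print(calc_acc_vectorized(preds, truth))              # 0.75
```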

In [9]:
torch.cuda.empty_cache()

model = AlbertForSequenceClassification.from_pretrained('./checkpoints/')
model.eval()
model.to(device)

inputs = torch.tensor(df_test['tokens'].values[:10].tolist())
inputs = inputs.to(device)

df_test.head()

| | label | tokens | tokenized_text |
|---|---|---|---|
| 6338 | 7 | `[16, 29, 26, 145, 36, 4780, 73, 1577, 5, 99, 3...` | `[▁, 1, 月, 30, 日, 08, :, 00, (, 日本, 時間, ), 時点で,...` |
| 1919 | 2 | `[16, 5683, 31, 5928, 2005, 2763, 42, 2001, 13,...` | `[▁, 子ども, や, ペット, がいる, 家庭, では, 部屋, の, 水, 拭, き, ...` |
| 4396 | 5 | `[16, 15131, 11187, 2268, 120, 5348, 108, 252, ...` | `[▁, 誕生日, と並んで, 相手, への, 贈, り, 物, に, “, 気, 合い, ”...` |
| 4963 | 6 | `[16, 16549, 13, 910, 20682, 9444, 27, 110, 123...` | `[▁, シャープ, の, ハイ, スペック, スマートフォン, 「, a, qu, os, ...` |
| 5421 | 6 | `[16, 14682, 14547, 442, 2110, 16, 4377, 16, 25...` | `[▁, gal, ax, y, ▁s, ▁, iii, ▁, sc, -, 06, d, の...` |
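In [9] puts the reloaded model into evaluation mode with `model.eval()` before extracting attention. The minimal PyTorch sketch below (a toy module, not ALBERT) shows why this matters: in training mode dropout zeroes each activation with probability `p` and scales the survivors by `1/(1-p)`, while in eval mode it is the identity, so only eval-mode outputs are deterministic.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

net.train()               # dropout active: random units zeroed, the rest rescaled
train_out = net(x)

net.eval()                # dropout disabled
eval_a, eval_b = net(x), net(x)

print(torch.equal(eval_a, eval_b))     # True: deterministic
print(torch.equal(eval_a, net[0](x)))  # True: identical to the Linear layer alone
```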


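## Visualizing attention
Highlight the tokens that ALBERT attended to when classifying each article; the more attention a token received, the stronger its highlight.  
A token's score is the attention paid to it from the first position of the sequence (the position whose hidden state the classification head reads), summed over all heads and all layers.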

In [10]:
start, end = 0, 10
inputs = torch.tensor(df_test['tokens'].values[start: end].tolist())
labels = torch.tensor(df_test['label'].values[start: end].tolist())

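def compute_attentions(sentence, target_class_idx):
    # target_class_idx is not used; the prediction comes from the model itself
    with torch.no_grad():
        outputs = model(sentence, output_attentions=True)
    # Index instead of tuple-unpacking so this also works when the model returns a ModelOutput
    logit, attention = outputs[0], outputs[-1]
    _, preds = torch.max(logit, 1)
    # attention holds one (batch, heads, seq, seq) tensor per layer:
    # sum over heads, keep the row for query position 0, and accumulate over layers -> (batch, seq)
    sum_attention = torch.sum(attention[0], 1)[:, 0, :]
    for i in range(1, len(attention)):
        sum_attention = torch.add(sum_attention, torch.sum(attention[i], 1)[:, 0, :])
    return sum_attention, preds

attentions, preds = compute_attentions(inputs.to(device), labels)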
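`compute_attentions` folds the per-layer attention tensors into a single score per token. The sketch below checks that its loop matches one stack-and-sum on random stand-in tensors; the layer, head and sequence sizes are arbitrary, not the configuration of ALINEAR/albert-japanese-v2. Because each attention row is a softmax distribution, every token-score vector sums to layers × heads.

```python
import torch

torch.manual_seed(0)
num_layers, batch, heads, seq = 12, 2, 12, 6
# Stand-in for the attentions tuple returned with output_attentions=True:
# one (batch, heads, seq, seq) tensor per layer, each row a probability distribution
attention = tuple(torch.softmax(torch.randn(batch, heads, seq, seq), dim=-1)
                  for _ in range(num_layers))

# Loop version, as in compute_attentions: sum over heads, keep query position 0
sum_attention = torch.sum(attention[0], 1)[:, 0, :]
for i in range(1, len(attention)):
    sum_attention = sum_attention + torch.sum(attention[i], 1)[:, 0, :]

# Vectorized version: stack to (layers, batch, heads, seq, seq), then reduce
stacked = torch.stack(attention)
vectorized = stacked[:, :, :, 0, :].sum(dim=(0, 2))  # -> (batch, seq)

print(sum_attention.shape)                        # torch.Size([2, 6])
print(torch.allclose(sum_attention, vectorized))  # True
```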

In [11]:
from IPython.display import HTML, display
import html

for i in range(start, end):
    attention = attentions
    # Pair the i-th test article's token IDs with its token strings
    df = df_test.reset_index(drop=True).loc[i]
    df = [d for d in df.values[1:]]
    df = pd.DataFrame(df).T.rename(columns={0: 'id', 1: 'token'})
    df['attention'] = attention[i].cpu().detach().numpy()
    # Standardize the scores, then min-max scale them to [0, 1]
    mean = df['attention'].mean()
    std = df['attention'].std()
    df['normalized_attention'] = (df['attention'] - mean) / std
    vmax = df['normalized_attention'].max()
    vmin = df['normalized_attention'].min()
    df['normalized_attention'] = (df['normalized_attention'] - vmin) / (vmax - vmin)
    pclass = list(list_dir.keys())[preds[i]]
    tclass = list(list_dir.keys())[labels[i]]
    print('predict class is %s'%pclass)
    print('ground truth class is %s'%tclass)

    html_output = ''
    for idx, row in df.iterrows():
        # Lower the red channel as the score rises: white (#ffffff) for the lowest score, cyan (#00ffff) for the highest
        color = '%02xffff' % (255 - int(row['normalized_attention'] * 255))
        # Escape the token so characters such as '<' or '&' are shown as text
        html_output += '<span style="background-color: #%s">%s</span>'%(color, html.escape(str(row['token'])))

    display(HTML(html_output))
    print()

predict class is sports-watch
ground truth class is sports-watch



predict class is kaden-channel
ground truth class is kaden-channel



predict class is peachy
ground truth class is peachy



predict class is smax
ground truth class is smax



predict class is smax
ground truth class is smax



predict class is sports-watch
ground truth class is sports-watch



predict class is sports-watch
ground truth class is sports-watch



predict class is smax
ground truth class is smax



predict class is topic-news
ground truth class is topic-news



predict class is peachy
ground truth class is peachy
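The visualization loop maps each token's score to a background color. A self-contained sketch of the same mapping as a hypothetical `highlight_html` helper: since standardizing is an increasing affine transform, min-max scaling the raw scores yields the same [0, 1] values (up to rounding) as the notebook's standardize-then-scale sequence, and tokens go through `html.escape` so that characters such as `<` or `&` are displayed rather than interpreted. It assumes the scores are not all equal.

```python
import html
import numpy as np

def highlight_html(tokens, scores):
    """Render tokens as <span>s whose cyan background deepens with the score."""
    s = np.asarray(scores, dtype=float)
    norm = (s - s.min()) / (s.max() - s.min())  # min-max scale to [0, 1]
    spans = []
    for tok, w in zip(tokens, norm):
        red = 255 - int(w * 255)  # red 255 -> white, red 0 -> cyan
        spans.append('<span style="background-color: #%02xffff">%s</span>'
                     % (red, html.escape(tok)))
    return ''.join(spans)

out = highlight_html(['a', '<b>', 'c'], [0.0, 1.0, 2.0])
print(out)

# Standardizing first does not change the min-max result
x = np.random.default_rng(0).random(10)
z = (x - x.mean()) / x.std(ddof=1)
print(np.allclose((z - z.min()) / (z.max() - z.min()),
                  (x - x.min()) / (x.max() - x.min())))  # True
```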



