<a href="https://colab.research.google.com/github/danielhou13/cogs402longformer/blob/main/src/CaptumLongformerSequenceClassification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook adapts the Captum question-answering tutorial to the Longformer sequence classification task. It uses the model's embedding layer to compute word attributions for examples of your choice, or for the entire dataset if needed.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Import dependencies

In [2]:
pip install transformers --quiet

In [3]:
pip install captum --quiet

In [4]:
pip install datasets --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.

In [5]:
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

In [6]:
from captum.attr import visualization as viz
from captum.attr import IntegratedGradients, LayerConductance, LayerIntegratedGradients
from captum.attr import configure_interpretable_embedding_layer, remove_interpretable_embedding_layer

import torch
import pandas as pd

In [7]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

## Import model

In [8]:
from transformers import LongformerForSequenceClassification, LongformerTokenizer, LongformerConfig
# Hugging Face Hub ID of the fine-tuned model; replace it with a local path if your model is saved elsewhere
model_path = 'danielhou13/longformer-finetuned_papers_v2'
#model_path = 'danielhou13/longformer-finetuned-new-cogs402'

# load model
model = LongformerForSequenceClassification.from_pretrained(model_path, num_labels = 2)
model.to(device)
model.eval()
model.zero_grad()

# load tokenizer
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")


Create functions that give us the input ids and the position ids for the text we want to examine

In [9]:
def predict(inputs, position_ids=None, attention_mask=None):
    output = model(inputs,
                   position_ids=position_ids,
                   attention_mask=attention_mask)
    return output.logits

In [10]:
ref_token_id = tokenizer.pad_token_id # Token used to build the reference (baseline) input
sep_token_id = tokenizer.sep_token_id # Separator token appended to the end of the text
cls_token_id = tokenizer.cls_token_id # Token prepended to the start of the text

In [11]:
max_length = 2046
def construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id):

    text_ids = tokenizer.encode(text, truncation = True, add_special_tokens=False, max_length = max_length)
    # construct input token ids
    input_ids = [cls_token_id] + text_ids + [sep_token_id]
    # construct reference token ids 
    ref_input_ids = [cls_token_id] + [ref_token_id] * len(text_ids) + [sep_token_id]

    return torch.tensor([input_ids], device=device), torch.tensor([ref_input_ids], device=device), len(text_ids)

def construct_input_ref_pos_id_pair(input_ids):
    seq_length = input_ids.size(1)
    position_ids = torch.arange(seq_length, dtype=torch.long, device=device)
    # we could potentially also use random permutation with `torch.randperm(seq_length, device=device)`
    ref_position_ids = torch.zeros(seq_length, dtype=torch.long, device=device)

    position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
    ref_position_ids = ref_position_ids.unsqueeze(0).expand_as(input_ids)
    return position_ids, ref_position_ids
    
def construct_attention_mask(input_ids):
    return torch.ones_like(input_ids)
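
To make the baseline construction concrete, here is a minimal sketch in plain Python lists rather than tensors, so it runs without the tokenizer. The token ids (`CLS_ID`, `SEP_ID`, `PAD_ID`) and the helper names are assumptions for illustration only; in the notebook the real values come from the tokenizer.

```python
# Hypothetical special-token ids, for illustration only.
CLS_ID, SEP_ID, PAD_ID = 0, 2, 1

def build_input_and_baseline(text_ids, max_length=2046):
    """Return (input_ids, ref_input_ids) as plain lists of equal length."""
    text_ids = text_ids[:max_length]  # mimic truncation of the content tokens
    input_ids = [CLS_ID] + text_ids + [SEP_ID]
    # The baseline keeps the special tokens and replaces every content token with padding.
    ref_input_ids = [CLS_ID] + [PAD_ID] * len(text_ids) + [SEP_ID]
    return input_ids, ref_input_ids

def build_position_ids(seq_length):
    """Real positions count up from 0; reference positions are all zeros."""
    return list(range(seq_length)), [0] * seq_length

# Made-up content token ids.
input_ids, ref_input_ids = build_input_and_baseline([713, 16, 10, 2225])
position_ids, ref_position_ids = build_position_ids(len(input_ids))

print(input_ids)
print(ref_input_ids)
print(position_ids)
print(ref_position_ids)
```

Because the input and the baseline have the same length and share their special tokens, the two sequences differ only at the content tokens. With `max_length = 2046` content tokens plus the two special tokens, a sequence holds at most 2048 tokens.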

Import dataset and take a few examples from it for testing purposes

Here we import the papers dataset

In [12]:
from datasets import load_dataset
import numpy as np
cogs402_ds = load_dataset("danielhou13/cogs402dataset")["test"]


Here we import the news dataset

In [13]:
# cogs402_ds2 = load_dataset('hyperpartisan_news_detection', 'bypublisher')['validation']
# val_size = 5000
# val_indices = np.random.randint(0, len(cogs402_ds2), val_size)
# val_ds = cogs402_ds2.select(val_indices)
# labels2 = map(int, val_ds['hyperpartisan'])
# labels2 = list(labels2)
# val_ds = val_ds.add_column("labels", labels2)

In [14]:
# Returns class probabilities; the class to attribute (1 = positive, 0 = negative) is selected with `target` in lig.attribute
def custom_forward(inputs, position_ids=None, attention_mask=None):
    preds = predict(inputs,
                   position_ids=position_ids,
                   attention_mask=attention_mask
                   )
    return torch.softmax(preds, dim = 1)
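
As a small illustration of what the forward function hands to the attribution method, the sketch below applies a softmax to made-up logits in plain Python and reads off the probability at the target index. The `softmax` helper and the logit values are assumptions for this example, not part of the notebook.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-1.2, 2.3]  # made-up logits for [negative class, positive class]
probs = softmax(logits)

target = 1  # index of the class whose probability is being attributed
print(probs, probs[target])
```

With two classes, the probability at index 1 equals the sigmoid of the difference between the two logits, so attributing it explains what moves the model toward the positive class.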

Perform Layer Integrated Gradients using the longformer's embeddings

In [15]:
def summarize_attributions(attributions):
    attributions = attributions.sum(dim=-1).squeeze(0)
    attributions = attributions / torch.linalg.norm(attributions)
    return attributions
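
The summarization step can be checked by hand on a tiny example. The following sketch uses nested Python lists in place of tensors (an assumption made so it runs without torch); it sums each token's attributions over the embedding dimension and then divides by the L2 norm across tokens.

```python
import math

def summarize(attributions):
    """attributions: one list per token, holding one value per embedding dimension."""
    per_token = [sum(dims) for dims in attributions]  # collapse the embedding axis
    norm = math.sqrt(sum(v * v for v in per_token))   # L2 norm across tokens
    return [v / norm for v in per_token]

toy = [
    [0.5, 0.5, 1.0],    # token 0 sums to 2.0
    [-1.0, 0.0, -1.0],  # token 1 sums to -2.0
    [0.25, 0.5, 0.25],  # token 2 sums to 1.0
]
scores = summarize(toy)
print(scores)
```

The per-token sums are 2, -2, and 1, and their L2 norm is 3, so each score is the sum divided by 3. Note that this normalization fixes the sum of squares at 1, not the sum itself, which is why the total attribution score of a long document can exceed 1 in magnitude.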

In [16]:
lig = LayerIntegratedGradients(custom_forward, model.longformer.embeddings)
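
For intuition about what integrated gradients computes, here is a self-contained sketch on a toy model in plain Python. It is not Captum's implementation: the model `f`, its weights, and the midpoint Riemann rule are all assumptions chosen to keep the example short. It averages the gradient along the straight path from the baseline to the input, scales by the input difference, and then compares the sum of the attributions with the change in model output.

```python
import math

W = (1.5, -2.0, 0.5)  # made-up weights of the toy model

def f(x):
    """Toy differentiable model: sigmoid of a weighted sum."""
    return 1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(W, x))))

def grad_f(x):
    """Analytic gradient of f with respect to x."""
    s = f(x)
    return [s * (1.0 - s) * w for w in W]

def integrated_gradients(x, baseline, n_steps=50):
    """Midpoint Riemann approximation of integrated gradients."""
    total = [0.0] * len(x)
    for step in range(n_steps):
        alpha = (step + 0.5) / n_steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_f(point)):
            total[i] += g
    # Scale the averaged gradient by the distance from the baseline.
    return [(xi - b) * t / n_steps for xi, b, t in zip(x, baseline, total)]

x = [1.0, -0.5, 1.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)

# Completeness check: the attributions should sum to f(x) - f(baseline).
delta = sum(attributions) - (f(x) - f(baseline))
print(attributions, delta)
```

The gap `delta` plays the same role as the convergence delta requested with `return_convergence_delta=True`: a value near zero suggests that the number of steps is large enough for the approximation to be trusted.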

This function builds the example and baseline inputs needed for integrated gradients, computes the attributions, and adds them to our visualization tool. It also stores the attributions and tokens for each example in dictionaries so that we can examine the attribution scores for each example further.

In [17]:
vis_data_records = []
all_attributions = {}
all_tokens = {}

def get_token_attributions(dataset, example):
  # Read from the dataset passed in, not from the global cogs402_ds
  text = dataset['text'][example]
  label = dataset['labels'][example]

  input_ids, ref_input_ids, sep_id = construct_input_ref_pair(text, ref_token_id, sep_token_id, cls_token_id)
  position_ids, ref_position_ids = construct_input_ref_pos_id_pair(input_ids)
  attention_mask = construct_attention_mask(input_ids)

  indices = input_ids[0].detach().tolist()
  all_tokens_curr = tokenizer.convert_ids_to_tokens(indices)

  all_tokens[str(example)] = all_tokens_curr

  attributions, delta = lig.attribute(inputs=input_ids,
                                    baselines=ref_input_ids,
                                    return_convergence_delta=True,
                                    additional_forward_args=(position_ids, attention_mask),
                                    target=1,
                                    n_steps=250,
                                    internal_batch_size = 2)

  attributions_sum = summarize_attributions(attributions)

  all_attributions[str(example)] = attributions_sum

  score = predict(input_ids, position_ids, attention_mask)

  # storing couple samples in an array for visualization purposes
  vis_data_records.append(viz.VisualizationDataRecord(
                        attributions_sum,
                        torch.softmax(score, dim = 1).max(),
                        torch.argmax(torch.softmax(score, dim = 1)),
                        label,
                        str(1),
                        attributions_sum.sum(),       
                        all_tokens_curr,
                        delta)
  )

In [18]:
get_token_attributions(cogs402_ds, 976)
get_token_attributions(cogs402_ds, 891)
# get_token_attributions(cogs402_ds, 605)
# get_token_attributions(cogs402_ds, 148)

Visualize the attributions for the words

In [19]:
# # storing couple samples in an array for visualization purposes
# score_vis = viz.VisualizationDataRecord(
#                         attributions_sum,
#                         torch.softmax(score, dim = 1).max(),
#                         torch.argmax(torch.softmax(score, dim = 1)),
#                         label,
#                         str(1),
#                         attributions_sum.sum(),       
#                         all_tokens,
#                         delta)

print('\033[1m', 'Visualization For Score', '\033[0m')
_ = viz.visualize_text(vis_data_records)

Visualization For Score


True Label,Predicted Label,Attribution Label,Attribution Score,Word Importance
1.0,1 (1.00),1.0,9.32,"#s Published Ġas Ġa Ġconference Ġpaper Ġin ĠInternational ĠConference Ġof ĠComputer ĠVision Ġ( IC CV ) Ġ2017 Ġ ĠSpeaking Ġthe ĠSame ĠLanguage : ĠMatch ing ĠMachine Ġto ĠHuman ĠCapt ions Ġby ĠAd vers arial ĠTraining ĠRak sh ith ĠShe tty 1 Ġ ĠMarcus ĠRoh r bach 2 , 3 Ġ Ġar X iv : 17 03 . 10 476 v 2 Ġ[ cs . CV ] Ġ6 ĠNov Ġ2017 Ġ ĠMario ĠFritz 1 Ġ1 Ġ ĠLisa ĠAnne ĠHendricks 2 Ġ ĠBer nt ĠS chie le 1 Ġ ĠMax ĠPlan ck ĠInstitute Ġfor ĠIn format ics , ĠSa ar land ĠIn format ics ĠCampus , ĠSa arb ru Ì Ī ck en , ĠGermany Ġ2 Ġ3 ĠUC ĠBerkeley ĠE EC S , ĠCA , ĠUnited ĠStates ĠFacebook ĠAI ĠResearch Ġ ĠAbstract ĠWhile Ġstrong Ġprogress Ġhas Ġbeen Ġmade Ġin Ġimage Ġcaption ing Ġrecently , Ġmachine Ġand Ġhuman Ġcapt ions Ġare Ġstill Ġquite Ġdistinct . ĠThis Ġis Ġprimarily Ġdue Ġto Ġthe Ġdeficiencies Ġin Ġthe Ġgenerated Ġword Ġdistribution , Ġvocabulary Ġsize , Ġand Ġstrong Ġbias Ġin Ġthe Ġgenerators Ġtowards Ġfrequent Ġcapt ions . ĠFurthermore , Ġhumans ĠâĢĵ Ġrightfully Ġso ĠâĢĵ Ġgenerate Ġmultiple , Ġdiverse Ġcapt ions , Ġdue Ġto Ġthe Ġinherent Ġambiguity Ġin Ġthe Ġcaption ing Ġtask Ġwhich Ġis Ġnot Ġexplicitly Ġconsidered Ġin Ġtoday âĢ Ļ s Ġsystems . ĠTo Ġaddress Ġthese Ġchallenges , Ġwe Ġchange Ġthe Ġtraining Ġobjective Ġof Ġthe Ġcaption Ġgenerator Ġfrom Ġreprodu cing Ġground truth Ġcapt ions Ġto Ġgenerating Ġa Ġset Ġof Ġcapt ions Ġthat Ġis Ġindistinguishable Ġfrom Ġhuman Ġwritten Ġcapt ions . ĠInstead Ġof Ġhand craft ing Ġsuch Ġa Ġlearning Ġtarget , Ġwe Ġemploy Ġadvers arial Ġtraining Ġin Ġcombination Ġwith Ġan Ġapproximate ĠG umb el Ġsam pler Ġto Ġimplicitly Ġmatch Ġthe Ġgenerated Ġdistribution Ġto Ġthe Ġhuman Ġone . ĠWhile Ġour Ġmethod Ġachieves Ġcomparable Ġperformance Ġto Ġthe Ġstate - of - the - art Ġin Ġterms Ġof Ġthe Ġcorrectness Ġof Ġthe Ġcapt ions , Ġwe Ġgenerate Ġa Ġset Ġof Ġdiverse Ġcapt ions Ġthat Ġare Ġsignificantly Ġless Ġbiased Ġand Ġbetter Ġmatch Ġthe Ġglobal Ġun i -, Ġbi - Ġand Ġtri - gram Ġdistributions Ġof Ġthe Ġhuman Ġcapt ions . 
Ġ ĠO urs : Ġa Ġperson Ġon Ġsk is Ġjumping Ġover Ġa Ġramp Ġ ĠO urs : Ġa Ġsk ier Ġis Ġmaking Ġa Ġturn Ġon Ġa Ġcourse Ġ ĠO urs : Ġa Ġcross Ġcountry Ġsk ier Ġmakes Ġhis Ġway Ġthrough Ġthe Ġsnow Ġ ĠO urs : Ġa Ġsk ier Ġis Ġheaded Ġdown Ġa Ġsteep Ġslope Ġ ĠBas eline : Ġa Ġman Ġriding Ġsk is Ġdown Ġa Ġsnow Ġcovered Ġslope Ġ ĠFigure Ġ1 : ĠFour Ġimages Ġfrom Ġthe Ġtest Ġset , Ġall Ġrelated Ġto Ġskiing , Ġshown Ġwith Ġcapt ions Ġfrom Ġour Ġadvers arial Ġmodel Ġand Ġa Ġbaseline . ĠBas eline Ġmodel Ġdescribes Ġall Ġfour Ġimages Ġwith Ġone Ġgeneric Ġcaption , Ġwhereas Ġour Ġmodel Ġproduces Ġdiverse Ġand Ġmore Ġimage Ġspecific Ġcapt ions . ĠAs Ġwe Ġanalyze Ġin Ġthis Ġpaper , Ġthis Ġis Ġlikely Ġdue Ġto Ġartifacts Ġand Ġdeficiencies Ġin Ġthe Ġstatistics Ġof Ġthe Ġgenerated Ġcapt ions , Ġwhich Ġis Ġmore Ġapparent Ġwhen Ġobserving Ġmultiple Ġsamples . ĠSpecifically , Ġwe Ġobserve Ġthat Ġstate - of - the - art Ġsystems Ġfrequently ĠâĢ ľ reve al Ġthemselves âĢ Ŀ Ġby Ġgenerating Ġa Ġdifferent Ġword Ġdistribution Ġand Ġusing Ġsmaller Ġvocabulary . ĠFurther Ġscrutiny Ġreveals Ġthat Ġgeneral ization Ġfrom Ġthe Ġtraining Ġset Ġis Ġstill Ġchallenging Ġand Ġgeneration Ġis Ġbiased Ġto Ġfrequent Ġfragments Ġand Ġcapt ions . ĠAlso , Ġtoday âĢ Ļ s Ġsystems Ġare Ġevaluated Ġto Ġproduce Ġa Ġsingle Ġcaption . ĠYet , Ġmultiple Ġpotentially Ġdistinct Ġcapt ions Ġare Ġtypically Ġcorrect Ġfor Ġa Ġsingle Ġimage ĠâĢĵ Ġa Ġproperty Ġthat Ġis Ġreflected Ġin Ġhuman Ġground - truth . ĠThis Ġdiversity Ġis Ġnot Ġequally Ġreproduced Ġby Ġstate - of - the - art Ġcaption Ġgenerators Ġ[ 40 , Ġ23 ]. ĠTherefore , Ġour Ġgoal Ġis Ġto Ġmake Ġimage Ġcapt ions Ġless Ġdistinguish able Ġfrom Ġhuman Ġones ĠâĢĵ Ġsimilar Ġin Ġthe Ġspirit Ġto Ġa ĠTuring Ġ Ġ1 . ĠIntroduction ĠImage Ġcaption ing Ġsystems Ġhave Ġa Ġvariety Ġof Ġapplications Ġranging Ġfrom Ġmedia Ġretrieval Ġand Ġtagging Ġto Ġassistance Ġfor Ġthe Ġvisually Ġimpaired . 
ĠIn Ġparticular , Ġmodels Ġwhich Ġcombine Ġstate - of - the - art Ġimage Ġrepresentations Ġbased Ġon Ġdeep Ġconv olution al Ġnetworks Ġand Ġdeep Ġrecurrent Ġlanguage Ġmodels Ġhave Ġled Ġto Ġever Ġincreasing Ġperformance Ġon Ġevaluation Ġmetrics Ġsuch Ġas ĠC ID Er Ġ[ 39 ] Ġand ĠMET E OR Ġ[ 8 ] Ġas Ġcan Ġbe Ġseen Ġe . g . Ġon Ġthe ĠC OC O Ġimage ĠCaption Ġchallenge Ġleader board Ġ[ 6 ]. ĠDespite Ġthese Ġadvances , Ġit Ġis Ġoften Ġeasy Ġfor Ġhumans Ġto Ġdifferentiate Ġbetween Ġmachine Ġand Ġhuman Ġcapt ions ĠâĢĵ Ġparticularly Ġwhen Ġobserving Ġmultiple Ġcapt ions Ġfor Ġa Ġsingle Ġimage . Ġ1 Ġ Ġ Č 2 . ĠRelated ĠWork Ġ Ġa Ġbus Ġthat Ġhas Ġpulled Ġinto Ġthe Ġside Ġof Ġthe Ġstreet Ġa Ġbus Ġis Ġparked Ġat Ġthe Ġside Ġof Ġthe Ġroad Ġa Ġwhite Ġbus Ġis Ġparked Ġnear Ġa Ġcurb Ġwith Ġpeople Ġwalking Ġby Ġ Ġa Ġgroup Ġof Ġpeople Ġstanding Ġoutside Ġin Ġa Ġold Ġmuseum Ġan Ġairplane Ġshow Ġwhere Ġpeople Ġstand Ġaround Ġa Ġline Ġof Ġplanes Ġparked Ġat Ġan Ġairport Ġshow Ġ ĠBase ĠâĢ¢ Ġa Ġbus Ġis Ġparked Ġon Ġthe Ġside Ġof Ġline Ġthe Ġroad ĠâĢ¢ Ġa Ġbus Ġthat Ġis Ġparked Ġin Ġthe Ġstreet Ġa Ġbus Ġis Ġparked Ġin Ġthe Ġstreet Ġnext Ġto Ġa Ġbus Ġ Ġa Ġgroup Ġof Ġpeople Ġstanding Ġaround Ġa Ġplane Ġa Ġgroup Ġof Ġpeople Ġstanding Ġaround Ġa Ġplane Ġa Ġgroup Ġof Ġpeople Ġstanding Ġaround Ġa Ġplane Ġ ĠO urs Ġ ĠFigure Ġ2 : ĠTwo Ġexamples Ġcomparing Ġmultiple Ġcapt ions Ġgenerated Ġby Ġour Ġadvers arial Ġmodel Ġand Ġthe Ġbaseline . ĠBi - gram s Ġwhich Ġare Ġtop - 20 Ġfrequent Ġbi - gram s Ġin Ġthe Ġtraining Ġset Ġare Ġmarked Ġin Ġred Ġ( e . g ., ĠâĢ ľ a Ġgroup âĢ Ŀ Ġand ĠâĢ ľ group Ġof âĢ Ŀ ). ĠCapt ions Ġwhich Ġare Ġrepl icas Ġfrom Ġtraining Ġset Ġare Ġmarked Ġwith ĠâĢ¢ Ġ. ĠTest . ĠWe Ġalso Ġembrace Ġthe Ġambiguity Ġof Ġthe Ġtask Ġand Ġextend Ġour Ġinvestigation Ġto Ġpredicting Ġsets Ġof Ġcapt ions Ġfor Ġa Ġsingle Ġimage Ġand Ġevaluating Ġtheir Ġquality , Ġparticularly Ġin Ġterms Ġof Ġthe Ġdiversity Ġin Ġthe Ġgenerated Ġset . 
ĠIn Ġcontrast , Ġpopular Ġapproaches Ġto Ġimage Ġcaption ing Ġare Ġtrained Ġwith Ġan Ġobjective Ġto Ġreproduce Ġthe Ġcapt ions Ġas Ġprovided Ġby Ġthe Ġground - truth . ĠInstead Ġof Ġrelying Ġon Ġhand craft ing Ġloss - fun ctions Ġto Ġachieve Ġour Ġgoal , Ġwe Ġpropose Ġan Ġadvers arial Ġtraining Ġmechanism Ġfor Ġimage Ġcaption ing . ĠFor Ġthis Ġwe Ġbuild Ġon ĠGener ative ĠAd vers arial ĠNetworks Ġ( GAN s ) Ġ[ 14 ], Ġwhich Ġhave Ġbeen Ġsuccessfully Ġused Ġto Ġgenerate Ġmainly Ġcontinuous Ġdata Ġdistributions Ġsuch Ġas Ġimages Ġ[ 9 , Ġ30 ], Ġalthough Ġexceptions Ġexist Ġ[ 27 ]. ĠIn Ġcontrast Ġto Ġimages , Ġcapt ions Ġare Ġdiscrete , Ġwhich Ġposes Ġa Ġchallenge Ġwhen Ġtrying Ġto Ġback prop agate Ġthrough Ġthe Ġgeneration Ġstep . ĠTo Ġovercome Ġthis Ġobstacle , Ġwe Ġuse Ġa ĠG umb el Ġsam pler Ġ[ 20 , Ġ28 ] Ġthat Ġallows Ġfor Ġend - to - end Ġtraining . ĠWe Ġaddress Ġthe Ġproblem Ġof Ġcaption Ġset Ġgeneration Ġfor Ġimages Ġand Ġdiscuss Ġmetrics Ġto Ġmeasure Ġthe Ġcaption Ġdiversity Ġand Ġcompare Ġit Ġto Ġhuman Ġground - truth . ĠWe Ġcontribute Ġa Ġnovel Ġsolution Ġto Ġthis Ġproblem Ġusing Ġan Ġadvers arial Ġformulation . ĠThe Ġevaluation Ġof Ġour Ġmodel Ġshows Ġthat Ġaccuracy Ġof Ġgenerated Ġcapt ions Ġis Ġon Ġpar Ġto Ġthe Ġstate - of - the - art , Ġbut Ġwe Ġgreatly Ġincrease Ġthe Ġdiversity Ġof Ġthe Ġcaption Ġsets Ġand Ġbetter Ġmatch Ġthe Ġground - truth Ġstatistics Ġin Ġseveral Ġmeasures . ĠQual itatively , Ġour Ġmodel Ġproduces Ġmore Ġdiverse Ġcapt ions Ġacross Ġimages Ġcontaining Ġsimilar Ġcontent Ġ( Figure Ġ1 ) Ġand Ġwhen Ġsampling Ġmultiple Ġcapt ions Ġfor Ġan Ġimage Ġ( see Ġsupplementary ) 1 Ġ. Ġ1 Ġhttps :// goo . gl / 3 y R V n q Ġ ĠImage ĠDescription . ĠEarly Ġcaption ing Ġmodels Ġrely Ġon Ġfirst Ġrecognizing Ġvisual Ġelements , Ġsuch Ġas Ġobjects , Ġattributes , Ġand Ġactivities , Ġand Ġthen Ġgenerating Ġa Ġsentence Ġusing Ġlanguage Ġmodels Ġsuch Ġas Ġa Ġtemplate Ġmodel Ġ[ 13 ], Ġn - gram Ġmodel Ġ[ 22 ], Ġor Ġstatistical Ġmachine Ġtranslation Ġ[ 34 ]. 
ĠAdv ances Ġin Ġdeep Ġlearning Ġhave Ġled Ġto Ġend - to - end Ġtrain able Ġmodels Ġthat Ġcombine Ġdeep Ġconv olution al Ġnetworks Ġto Ġextract Ġvisual Ġfeatures Ġand Ġrecurrent Ġnetworks Ġto Ġgenerate Ġsentences Ġ[ 11 , Ġ41 , Ġ21 ]. ĠThough Ġmodern Ġdescription Ġmodels Ġare Ġcapable Ġof Ġproducing Ġcoherent Ġsentences Ġwhich Ġaccurately Ġdescribe Ġan Ġimage , Ġthey Ġtend Ġto Ġproduce Ġgeneric Ġsentences Ġwhich Ġare Ġreplicated Ġfrom Ġthe Ġtrain Ġset Ġ[ 10 ]. ĠFurthermore , Ġan Ġimage Ġcan Ġcorrespond Ġto Ġmany Ġvalid Ġdescriptions . ĠHowever , Ġat Ġtest Ġtime , Ġsentences Ġgenerated Ġwith Ġmethods Ġsuch Ġas Ġbeam Ġsearch Ġare Ġgenerally Ġvery Ġsimilar . Ġ[ 40 , Ġ23 ] Ġfocus Ġon Ġincreasing Ġsentence Ġdiversity Ġby Ġintegrating Ġa Ġdiversity Ġpromoting Ġhe uristic Ġinto Ġbeam Ġsearch . Ġ[ 42 ] Ġattempts Ġto Ġincrease Ġthe Ġdiversity Ġin Ġcaption Ġgeneration Ġby Ġtraining Ġan Ġensemble Ġof Ġcaption Ġgenerators Ġeach Ġspecializing Ġin Ġdifferent Ġportions Ġof Ġthe Ġtraining Ġset . ĠIn Ġcontrast , Ġwe Ġfocus Ġon Ġimproving Ġdiversity Ġof Ġgenerated Ġcapt ions Ġusing Ġa Ġsingle Ġmodel . ĠOur Ġmethod Ġachieves Ġthis Ġby Ġlearning Ġa Ġcorresponding Ġmodel Ġusing Ġa Ġdifferent Ġtraining Ġloss Ġas Ġopposed Ġto Ġafter Ġtraining Ġhas Ġcompleted . ĠWe Ġnote Ġthat Ġgenerating Ġdiverse Ġsentences Ġis Ġalso Ġa Ġchallenge Ġin Ġvisual Ġquestion Ġgeneration , Ġsee Ġconcurrent Ġwork Ġ[ 19 ], Ġand Ġin Ġlanguage - only Ġdialogue Ġgeneration Ġstudied Ġin Ġthe Ġlinguistic Ġcommunity , Ġsee Ġe . g . Ġ[ 23 , Ġ24 ]. ĠWhen Ġtraining Ġrecurrent Ġdescription Ġmodels , Ġthe Ġmost Ġcommon Ġmethod Ġis Ġto Ġpredict Ġa Ġword Ġw t Ġconditioned Ġon Ġan Ġimage Ġand Ġall Ġprevious Ġground Ġtruth Ġwords . ĠAt Ġtest Ġtime , Ġeach Ġword Ġis Ġpredicted Ġconditioned Ġon Ġan Ġimage Ġand Ġpreviously Ġpredicted Ġwords . ĠConsequently , Ġat Ġtest Ġtime Ġpredicted Ġwords Ġmay Ġbe Ġconditioned Ġon Ġwords Ġthat Ġwere Ġincorrectly Ġpredicted Ġby Ġthe Ġmodel . 
ĠBy Ġonly Ġtraining Ġon Ġground Ġtruth Ġwords , Ġthe Ġmodel Ġsuffers Ġfrom Ġexposure Ġbias Ġ[ 31 ] Ġand Ġcannot Ġeffectively Ġlearn Ġto Ġrecover Ġwhen Ġit Ġpredicts Ġan Ġincorrect Ġword Ġduring Ġtraining . ĠTo Ġavoid Ġthis , Ġ[ 4 ] Ġproposes Ġa Ġscheduled Ġsampling Ġtraining Ġscheme Ġwhich Ġbegins Ġby Ġtraining Ġwith Ġground Ġtruth Ġwords , Ġbut Ġthen Ġslowly Ġconditions Ġgenerated Ġwords Ġon Ġwords Ġpreviously Ġproduced Ġby Ġthe Ġmodel . ĠHowever , Ġ[ 17 ] Ġshows Ġthat Ġthe Ġscheduled Ġsampling Ġalgorithm Ġis Ġinconsistent Ġand Ġthe Ġoptimal Ġsolution Ġunder Ġthis Ġobjective Ġdoes Ġnot Ġconverge Ġto Ġthe Ġtrue Ġdata Ġdistribution . ĠTaking Ġa Ġdifferent Ġdirection , Ġ[ 31 ] Ġproposes Ġto Ġaddress Ġthe Ġexposure Ġbias Ġby Ġgradually Ġmixing Ġa Ġsequence Ġlevel Ġloss Ġ( BLE U Ġscore ) Ġusing ĠRE IN FOR CE Ġrule Ġwith Ġthe Ġstandard Ġmaximum Ġlikelihood Ġtraining . ĠSeveral Ġother Ġworks Ġhave Ġfollowed Ġthis Ġup Ġwith Ġusing Ġreinforcement Ġlearning Ġbased Ġapproaches Ġto Ġdirectly Ġoptimize Ġthe Ġevaluation Ġmetrics Ġlike ĠB LE U , ĠMET E OR Ġand ĠC IDER Ġ[ 33 , Ġ25 ]. ĠHowever , Ġoptimizing Ġthe Ġevaluation Ġmetrics Ġdoes Ġnot Ġdirectly Ġaddress Ġthe Ġdiversity Ġof Ġthe Ġ Ġ Č generated Ġcapt ions . ĠSince Ġall Ġcurrent Ġevaluation Ġmetrics Ġuse Ġn - gram Ġmatching Ġto Ġscore Ġthe Ġcapt ions , Ġcapt ions Ġusing Ġmore Ġfrequent Ġn - gram s Ġare Ġlikely Ġto Ġachieve Ġbetter Ġscores Ġthan Ġones Ġusing Ġrare r Ġand Ġmore Ġdiverse Ġn - gram s . ĠIn Ġthis Ġwork , Ġwe Ġformulate Ġour Ġcaption Ġgenerator Ġas Ġa Ġgener ative Ġadvers arial Ġnetwork . ĠWe Ġdesign Ġa Ġdiscrim inator Ġthat Ġexplicitly Ġencourages Ġgenerated Ġcapt ions Ġto Ġbe Ġdiverse Ġand Ġindistinguishable Ġfrom Ġhuman Ġcapt ions . ĠThe Ġgenerator Ġis Ġtrained Ġwith Ġan Ġadvers arial Ġloss Ġwith Ġthis Ġdiscrim inator . 
ĠConsequently , Ġour Ġmodel Ġgenerates Ġcapt ions Ġthat Ġbetter Ġreflect Ġthe Ġway Ġhumans Ġdescribe Ġimages Ġwhile Ġmaintaining Ġsimilar Ġcorrectness Ġas Ġdetermined Ġby Ġa Ġhuman Ġevaluation . ĠGener ative ĠAd vers arial ĠNetworks . ĠThe ĠGener ative ĠAd vers arial ĠNetworks Ġ( GAN s ) Ġ[ 14 ] Ġframework Ġlearns Ġgener ative Ġmodels Ġwithout Ġexplicitly Ġdefining Ġa Ġloss Ġfrom Ġa Ġtarget Ġdistribution . ĠInstead , ĠG AN s Ġlearn Ġa Ġgenerator Ġusing Ġa Ġloss Ġfrom Ġa Ġdiscrim inator Ġwhich Ġtries Ġto Ġdifferentiate Ġreal Ġand Ġgenerated Ġsamples , Ġwhere Ġthe Ġgenerated Ġsamples Ġcome Ġfrom Ġthe Ġgenerator . ĠWhen Ġtraining Ġto Ġgenerate Ġreal Ġimages , #/s"
1.0,0 (1.00),1.0,-0.02,"#s ar X iv : 17 05 . 0 39 16 v 1 Ġ[ cs . MA ] Ġ10 ĠMay Ġ2017 Ġ ĠUnder Ġconsideration Ġfor Ġpublication Ġin ĠTheory Ġand ĠPractice Ġof ĠLogic ĠProgramming Ġ Ġ1 Ġ ĠSol ving ĠDist ributed ĠCon str aint ĠOptim ization ĠProblems ĠUsing ĠLogic ĠProgramming ĠTie p ĠLe , ĠTr an ĠCao ĠSon , ĠEn ric o ĠP onte lli , ĠWilliam ĠYe oh ĠComputer ĠScience ĠDepartment ĠNew ĠMexico ĠState ĠUniversity ĠLas ĠCru ces , ĠNM , Ġ8 800 1 , ĠUSA ĠE - mail : Ġ{ tile , Ġt son , Ġep on tell , Ġw ye oh } @ cs . n ms u . edu Ġsubmitted Ġ1 ĠJanuary Ġ2003 ; Ġrevised Ġ1 ĠJanuary Ġ2003 ; Ġaccepted Ġ1 ĠJanuary Ġ2003 Ġ ĠAbstract ĠThis Ġpaper Ġexplores Ġthe Ġuse Ġof ĠAnswer ĠSet ĠProgramming Ġ( AS P ) Ġin Ġsolving ĠDist ributed ĠCon str aint ĠOptim ization ĠProblems Ġ( DC OP s ). ĠThe Ġpaper Ġprovides Ġthe Ġfollowing Ġnovel Ġcontributions : Ġ( 1 ) ĠIt Ġshows Ġhow Ġone Ġcan Ġformulate ĠDC OP s Ġas Ġlogic Ġprograms ; Ġ( 2 ) ĠIt Ġintroduces ĠASP - DP OP , Ġthe Ġfirst ĠDC OP Ġalgorithm Ġthat Ġis Ġbased Ġon Ġlogic Ġprogramming ; Ġ( 3 ) ĠIt Ġexperiment ally Ġshows Ġthat ĠASP - DP OP Ġcan Ġbe Ġup Ġto Ġtwo Ġorders Ġof Ġmagnitude Ġfaster Ġthan ĠDP OP Ġ( its Ġimperative Ġprogramming Ġcounterpart ) Ġas Ġwell Ġas Ġsolve Ġsome Ġproblems Ġthat ĠDP OP Ġfails Ġto Ġsolve , Ġdue Ġto Ġmemory Ġlimitations ; Ġand Ġ( 4 ) ĠIt Ġdemonstrates Ġthe Ġapplic ability Ġof ĠASP Ġin Ġa Ġwide Ġarray Ġof Ġmulti - agent Ġproblems Ġcurrently Ġmodeled Ġas ĠDC OP s . 1 ĠUnder Ġconsideration Ġin ĠTheory Ġand ĠPractice Ġof ĠLogic ĠProgramming Ġ( T PL P ). ĠKEY WOR DS : ĠDC OP ; ĠDP OP ; ĠLogic ĠProgramming ; ĠASP Ġ Ġ1 ĠIntroduction ĠDist ributed ĠCon str aint ĠOptim ization ĠProblems Ġ( DC OP s ) Ġare Ġoptimization Ġproblems Ġwhere Ġagents Ġneed Ġto Ġcoordinate Ġthe Ġassignment Ġof Ġvalues Ġto Ġtheir ĠâĢ ľ local âĢ Ŀ Ġvariables Ġto Ġmaximize Ġthe Ġoverall Ġsum Ġof Ġresulting Ġconstraint Ġutilities Ġ( Mod i Ġet Ġal . Ġ2005 ; ĠPet cu Ġand ĠF alt ings Ġ2005 a ; ĠMa iller Ġand ĠLess er Ġ2004 ; ĠYe oh Ġand ĠYok oo Ġ2012 ). 
ĠThe Ġprocess Ġis Ġsubject Ġto Ġlimitations Ġon Ġthe Ġcommunication Ġcapabilities Ġof Ġthe Ġagents ; Ġin Ġparticular , Ġeach Ġagent Ġcan Ġonly Ġexchange Ġinformation Ġwith Ġneighboring Ġagents Ġwithin Ġa Ġgiven Ġtop ology . ĠDC OP s Ġare Ġwell - su ited Ġfor Ġmodeling Ġmulti - agent Ġcoordination Ġand Ġresource Ġallocation Ġproblems , Ġwhere Ġthe Ġprimary Ġinteractions Ġare Ġbetween Ġlocal Ġsubs ets Ġof Ġagents . ĠResearchers Ġhave Ġused ĠDC OP s Ġto Ġmodel Ġvarious Ġproblems , Ġsuch Ġas Ġthe Ġdistributed Ġscheduling Ġof Ġmeetings Ġ( Ma hes war an Ġet Ġal . Ġ2004 ; ĠZ ivan Ġet Ġal . Ġ2014 ), Ġdistributed Ġallocation Ġof Ġtargets Ġto Ġsensors Ġin Ġa Ġnetwork Ġ( Far inelli Ġet Ġal . Ġ2008 ), Ġdistributed Ġallocation Ġof Ġresources Ġin Ġdisaster Ġevacuation Ġscenarios Ġ( L ass Ġet Ġal . Ġ2008 ), Ġthe Ġdistributed Ġmanagement Ġof Ġpower Ġdistribution Ġnetworks Ġ( K umar Ġet Ġal . Ġ2009 ; ĠJ ain Ġet Ġal . Ġ2012 ), Ġthe Ġdistributed Ġgeneration Ġof Ġcoalition Ġstructures Ġ( U eda Ġet Ġal . Ġ2010 ) Ġand Ġthe Ġdistributed Ġcoordination Ġof Ġlogistics Ġoperations Ġ( Le Ì ģ a ute Ì ģ Ġand ĠF alt ings Ġ2011 ). Ġ1 Ġ ĠThis Ġarticle Ġextends Ġour Ġprevious Ġconference Ġpaper Ġ( Le Ġet Ġal . Ġ2015 ) Ġin Ġthe Ġfollowing Ġmanner : Ġ( 1 ) ĠIt Ġprovides Ġa Ġmore Ġthorough Ġdescription Ġof Ġthe ĠASP - DP OP Ġalgorithm ; Ġ( 2 ) ĠIt Ġelabor ates Ġon Ġthe Ġalgorithm âĢ Ļ s Ġtheoretical Ġproperties Ġwith Ġcomplete Ġproofs ; Ġand Ġ( 3 ) ĠIt Ġincludes Ġadditional Ġexperimental Ġresults . Ġ Ġ Č 2 Ġ ĠTie p ĠLe , ĠTr an ĠCao ĠSon , ĠEn ric o ĠP onte lli , Ġand ĠWilliam ĠYe oh Ġ ĠThe Ġfield Ġhas Ġmatured Ġconsiderably Ġover Ġthe Ġpast Ġdecade , Ġsince Ġthe Ġseminal ĠAD OP T Ġpaper Ġ( Mod i Ġet Ġal . Ġ2005 ), Ġas Ġresearchers Ġcontinue Ġto Ġdevelop Ġmore Ġsophisticated Ġsolving Ġalgorithms . ĠThe Ġmajority Ġof Ġthe ĠDC OP Ġresolution Ġalgorithms Ġcan Ġbe Ġclassified Ġin Ġone Ġof Ġthree Ġclasses : Ġ( 1 ) ĠSearch - based Ġalgorithms , Ġlike ĠAD OP T Ġ( Mod i Ġet Ġal . 
Ġ2005 ) Ġand Ġits Ġvariants Ġ( Ye oh Ġet Ġal . Ġ2009 ; ĠYe oh Ġet Ġal . Ġ2010 ; ĠGutierrez Ġet Ġal . Ġ2011 ; ĠGutierrez Ġet Ġal . Ġ2013 ), ĠAFB Ġ( G ers h man Ġet Ġal . Ġ2009 ), Ġand ĠMGM Ġ( Ma hes war an Ġet Ġal . Ġ2004 ), Ġwhere Ġthe Ġagents Ġenumer ate Ġcombinations Ġof Ġvalue Ġassignments Ġin Ġa Ġdecentralized Ġmanner ; Ġ( 2 ) ĠIn ference - based Ġalgorithms , Ġlike ĠDP OP Ġ( P etc u Ġand ĠF alt ings Ġ2005 a ) Ġand Ġits Ġvariants Ġ( P etc u Ġand ĠF alt ings Ġ2005 b ; ĠPet cu Ġand ĠF alt ings Ġ2007 ; ĠPet cu Ġet Ġal . Ġ2007 ; ĠPet cu Ġet Ġal . Ġ2008 ), Ġmax - sum Ġ( Far inelli Ġet Ġal . Ġ2008 ), Ġand ĠAction ĠG DL Ġ( V iny als Ġet Ġal . Ġ2011 ), Ġwhere Ġthe Ġagents Ġuse Ġdynamic Ġprogramming Ġtechniques Ġto Ġpropagate Ġaggreg ated Ġinformation Ġto Ġother Ġagents ; Ġand Ġ( 3 ) ĠSam pling - based Ġalgorithms , Ġlike ĠD UCT Ġ( Ott ens Ġet Ġal . Ġ2012 ) Ġand ĠD - G ib bs Ġ( N guyen Ġet Ġal . Ġ2013 ; ĠFi ore tto Ġet Ġal . Ġ2014 ), Ġwhere Ġthe Ġagents Ġsample Ġthe Ġsearch Ġspace Ġin Ġa Ġdecentralized Ġmanner . ĠThe Ġexisting Ġalgorithms Ġhave Ġbeen Ġdesigned Ġand Ġdeveloped Ġalmost Ġexclusively Ġusing Ġimperative Ġprogramming Ġtechniques , Ġwhere Ġthe Ġalgorithms Ġdefine Ġa Ġcontrol Ġflow , Ġthat Ġis , Ġa Ġsequence Ġof Ġcommands Ġto Ġbe Ġexecuted . ĠIn Ġaddition , Ġthe Ġlocal Ġsol ver Ġemployed Ġby Ġeach Ġagent Ġis Ġan ĠâĢ ľ ad - h oc âĢ Ŀ Ġimplementation . ĠIn Ġthis Ġpaper , Ġwe Ġare Ġinterested Ġin Ġinvestigating Ġthe Ġbenefits Ġof Ġusing Ġdecl ar ative Ġprogramming Ġtechniques Ġto Ġsolve ĠDC OP s , Ġalong Ġwith Ġthe Ġuse Ġof Ġa Ġgeneral Ġconstraint Ġsol ver , Ġused Ġas Ġa Ġblack Ġbox , Ġas Ġeach Ġagent âĢ Ļ s Ġlocal Ġconstraint Ġsol ver . 
ĠSpecifically , Ġwe Ġpropose Ġan Ġintegration Ġof ĠDist ributed ĠPse udo - tree ĠOptim ization ĠProcedure Ġ( DP OP ) Ġ( P etc u Ġand ĠF alt ings Ġ2005 a ), Ġa Ġpopular ĠDC OP Ġalgorithm , Ġwith ĠAnswer ĠSet ĠProgramming Ġ( AS P ) Ġ( N iem ela Ì Ī Ġ1999 ; ĠMare k Ġand ĠTr us z cz yn Ì ģ ski Ġ1999 ) Ġas Ġthe Ġlocal Ġconstraint Ġsol ver Ġof Ġeach Ġagent . ĠThis Ġpaper Ġprovides Ġthe Ġfirst Ġstep Ġin Ġbrid ging Ġthe Ġareas Ġof ĠDC OP s Ġand ĠASP ; Ġin Ġthe Ġprocess , Ġwe Ġoffer Ġnovel Ġcontributions Ġto Ġboth Ġthe ĠDC OP Ġfield Ġas Ġwell Ġas Ġthe ĠASP Ġfield . ĠFor Ġthe ĠDC OP Ġcommunity , Ġwe Ġdemonstrate Ġthat Ġthe Ġuse Ġof ĠASP Ġas Ġa Ġlocal Ġconstraint Ġsol ver Ġprovides Ġa Ġnumber Ġof Ġbenefits , Ġincluding Ġthe Ġability Ġto Ġcapitalize Ġon Ġ( i ) Ġthe Ġhighly Ġexpressive ĠASP Ġlanguage Ġto Ġmore Ġconcise ly Ġdefine Ġinput Ġinstances Ġ( e . g ., Ġby Ġrepresenting Ġconstraint Ġutilities Ġas Ġimplicit Ġfunctions Ġinstead Ġof Ġexplicitly Ġenumer ating Ġtheir Ġextensions ) Ġand Ġ( ii ) Ġthe Ġhighly Ġoptimized ĠASP Ġsol vers Ġto Ġexploit Ġproblem Ġstructure Ġ( e . g ., Ġpropag ating Ġhard Ġconstraints Ġto Ġensure Ġconsistency ). ĠFor Ġthe ĠASP Ġcommunity , Ġthe Ġpaper Ġmakes Ġthe Ġequally Ġimportant Ġcontribution Ġof Ġincreasing Ġthe Ġapplic ability Ġof ĠASP Ġto Ġmodel Ġand Ġsolve Ġa Ġwide Ġarray Ġof Ġmulti - agent Ġcoordination Ġand Ġresource Ġallocation Ġproblems , Ġcurrently Ġmodeled Ġas ĠDC OP s . ĠFurthermore , Ġit Ġalso Ġdemonstrates Ġthat Ġgeneral , Ġoff - the - she lf ĠASP Ġsol vers , Ġwhich Ġare Ġcontinuously Ġhon ed Ġand Ġimproved , Ġcan Ġbe Ġcoupled Ġwith Ġdistributed Ġmessage Ġpassing Ġprotocols Ġto Ġoutper form Ġspecialized Ġimperative Ġsol vers . ĠThe Ġpaper Ġis Ġorganized Ġas Ġfollows . ĠIn ĠSection Ġ2 , Ġwe Ġreview Ġthe Ġbasic Ġdefinitions Ġof ĠDC OP s , Ġthe ĠDP OP Ġalgorithm , Ġand ĠASP . ĠIn ĠSection Ġ3 , Ġwe Ġdescribe Ġin Ġdetail Ġthe Ġstructure Ġof Ġthe Ġnovel ĠASP - based ĠDC OP Ġsol ver , Ġcalled ĠASP - DP OP , Ġand Ġits Ġimplementation . 
ĠSection Ġ4 Ġprovides Ġan Ġanalysis Ġof Ġthe Ġproperties Ġof ĠASP - DP OP , Ġincluding Ġproofs Ġof Ġsound ness Ġand Ġcomple teness Ġof ĠASP - DP OP . ĠSection Ġ5 Ġprovides Ġsome Ġexperimental Ġresults , Ġwhile ĠSection Ġ6 Ġreviews Ġrelated Ġwork . ĠFinally , ĠSection Ġ7 Ġprovides Ġconclusions Ġand Ġindications Ġfor Ġfuture Ġwork . Ġ Ġ Č S olving ĠDist ributed ĠCon str aint ĠOptim ization ĠProblems ĠUsing ĠLogic ĠProgramming Ġ Ġ3 Ġ Ġ2 ĠBackground ĠIn Ġthis Ġsection , Ġwe Ġpresent Ġan Ġoverview Ġof ĠDC OP s , Ġwe Ġdescribe ĠDP OP , Ġa Ġcomplete Ġdistributed Ġalgorithm Ġto Ġsolve ĠDC OP s , Ġand Ġprovide Ġsome Ġfundamental Ġdefinitions Ġof ĠASP . Ġ2 . 1 ĠDist ributed ĠCon str aint ĠOptim ization ĠProblems ĠA ĠDist ributed ĠCon str aint ĠOptim ization ĠProblem Ġ( DC OP ) Ġ( Mod i Ġet Ġal . Ġ2005 ; ĠPet cu Ġand ĠF alt ings Ġ2005 a ; ĠMa iller Ġand ĠLess er Ġ2004 ; ĠYe oh Ġand ĠYok oo Ġ2012 ) Ġcan Ġbe Ġdescribed Ġas Ġa Ġtuple ĠM Ġ= Ġh X Ġ, ĠD , ĠF , ĠA , ĠÎ± i Ġwhere : ĠâĢ¢ ĠX Ġ= Ġ{ x 1 Ġ, Ġ. Ġ. Ġ. Ġ, Ġx n Ġ} Ġis Ġa Ġfinite Ġset Ġof Ġ( dec ision ) Ġvariables ; ĠâĢ¢ ĠD Ġ= Ġ{ D 1 Ġ, Ġ. Ġ. Ġ. Ġ, ĠD n Ġ} Ġis Ġa Ġset Ġof Ġfinite Ġdomains , Ġwhere ĠDi Ġis Ġthe Ġdomain Ġof Ġthe Ġvariable Ġx i ĠâĪ Ī ĠX Ġ, Ġfor Ġ1 Ġâī¤ Ġi Ġâī¤ Ġn ; ĠâĢ¢ ĠF Ġ= Ġ{ f 1 Ġ, Ġ. Ġ. Ġ. Ġ, Ġf m Ġ} Ġis Ġa Ġfinite Ġset Ġof Ġconstraints , Ġwhere Ġf j Ġis Ġa Ġk j Ġ- ary Ġfunction Ġf j Ġ: ĠDj 1 ĠÃĹ ĠDj 2 ĠÃĹ Ġ. Ġ. Ġ. ĠÃĹ ĠDj kj Ġ7 âĨĴ ĠR ĠâĪ ª Ġ{ âĪĴ âĪ ŀ } Ġthat Ġspecifies Ġthe Ġutility Ġof Ġeach Ġcombination Ġof Ġvalues Ġof Ġvariables Ġin Ġits Ġscope ; Ġthe Ġscope Ġis Ġden oted Ġby Ġsc p ( f j Ġ) Ġ= Ġ{ x j 1 Ġ, Ġ. Ġ. Ġ. Ġ, Ġx j kj Ġ}; 2 ĠâĢ¢ ĠA Ġ= Ġ{ a 1 Ġ, Ġ. Ġ. Ġ. Ġ, Ġap Ġ} Ġis Ġa Ġfinite Ġset Ġof Ġagents ; Ġand ĠâĢ¢ ĠÎ± Ġ: ĠX Ġ7 âĨĴ ĠA Ġmaps Ġeach Ġvariable Ġto Ġan Ġagent . ĠWe Ġsay Ġthat Ġa Ġvariable Ġx Ġis Ġowned Ġby Ġan Ġagent Ġa Ġif ĠÎ± ( x ) Ġ= Ġa . ĠWe Ġdenote Ġwith ĠÎ± i Ġthe Ġset Ġof Ġall Ġvariables Ġthat Ġare Ġowned Ġby Ġan Ġagent Ġa i Ġ, Ġi . e ., ĠÎ± i Ġ= Ġ{ x ĠâĪ Ī ĠX Ġ| Î± ( x ) Ġ= Ġa i Ġ} . 
ĠEach Ġconstraint Ġin ĠF Ġcan Ġbe Ġeither Ġhard , Ġindicating Ġthat Ġsome Ġvalue Ġcombinations Ġresult Ġin Ġa Ġutility Ġof ĠâĪĴ âĪ ŀ Ġand Ġmust Ġbe Ġavoided , Ġor Ġsoft , Ġindicating Ġthat Ġall Ġvalue Ġcombinations Ġresult Ġin Ġa Ġfinite Ġutility Ġand Ġneed Ġnot Ġbe Ġavoided . ĠA Ġvalue Ġassignment Ġis Ġa Ġ( partial Ġor Ġcomplete ) Ġfunction Ġx Ġthat Ġmaps Ġvariables Ġof ĠX Ġto Ġvalues Ġin ĠD Ġsuch Ġthat , Ġif Ġx ( xi Ġ) Ġis Ġdefined , Ġthen Ġx ( xi Ġ) ĠâĪ Ī ĠDi Ġfor Ġi Ġ= Ġ1 , Ġ. Ġ. Ġ. Ġ, Ġn . ĠFor Ġthe Ġsake Ġof Ġsimplicity , Ġand Ġwith Ġa Ġslight Ġabuse Ġof Ġnotation , Ġwe Ġwill Ġoften Ġdenote Ġx ( xi Ġ) Ġsimply Ġwith Ġx i Ġ. ĠGiven Ġa Ġconstraint Ġf j Ġand Ġa Ġcomplete Ġvalue Ġassignment Ġx Ġfor Ġall Ġdecision Ġvariables , Ġwe Ġdenote Ġwith Ġx f j Ġthe Ġprojection Ġof Ġx Ġto Ġthe Ġvariables Ġin Ġsc p ( f j Ġ); Ġwe Ġrefer Ġto Ġthis Ġas Ġa Ġpartial Ġvalue Ġassignment Ġfor Ġf j Ġ. ĠFor Ġa ĠDC OP ĠM , Ġwe Ġdenote Ġwith #/s"
,,,,


Next, we can look in depth at the attribution scores for each token of a single example.

In [20]:
example = 891
attributions_sum = all_attributions[f"{example}"]
all_tokens2 = all_tokens[f"{example}"]

See which tokens had the strongest (most positive and most negative) attributions. Change k to control how many tokens are returned.

Note: The attributions are computed with respect to the positive class, so the tokens that contributed most to a negative prediction are the bottom-k attributed tokens.

In [21]:
def get_topk_attributed_tokens(attrs, all_tokens, k=20):
    # k largest attributions: tokens pushing hardest toward the positive class
    values, indices = torch.topk(attrs, k)
    top_tokens = [all_tokens[idx] for idx in indices]
    return top_tokens, values, indices

In [22]:
def get_botk_attributed_tokens(attrs, all_tokens, k=20):
    # k smallest attributions: tokens pushing hardest toward the negative class
    values, indices = torch.topk(attrs, k, largest=False)
    bot_tokens = [all_tokens[idx] for idx in indices]
    return bot_tokens, values, indices
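To make the sign convention concrete, here is a minimal sketch on a toy tensor. The tokens and scores in `toy_tokens` and `toy_attrs` are invented for illustration and do not come from the dataset; the sketch only shows how `torch.topk` behaves with and without `largest=False`.

```python
import torch

# Hypothetical toy data: five tokens with invented attribution scores.
toy_tokens = ["Ġalgorithms", "Ġthe", "ĠProgramming", "Ġsearch", "ĠLogic"]
toy_attrs = torch.tensor([0.42, 0.01, -0.34, 0.13, -0.11])

# Largest scores first: tokens that push toward the positive class.
top_vals, top_idx = torch.topk(toy_attrs, k=2)
top_toy = [toy_tokens[i] for i in top_idx.tolist()]

# Smallest scores first: tokens that push toward the negative class.
bot_vals, bot_idx = torch.topk(toy_attrs, k=2, largest=False)
bot_toy = [toy_tokens[i] for i in bot_idx.tolist()]

print(top_toy)  # ['Ġalgorithms', 'Ġsearch']
print(bot_toy)  # ['ĠProgramming', 'ĠLogic']
```

Note that `torch.topk` raises an error when k exceeds the number of elements, so for short examples it may be worth clamping k to `attrs.numel()` before calling the helpers.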

Convert the values, their indices, and the tokens into pandas DataFrames for visualization. df_high is sorted from the highest attribution to the lowest; df_low, which holds the lowest attributions, is sorted from lowest to highest.

In [23]:
top_words_start, top_words_val_start, top_word_ind_start = get_topk_attributed_tokens(attributions_sum, all_tokens2)
bot_words_start, bot_words_val_start, bot_word_ind_start = get_botk_attributed_tokens(attributions_sum, all_tokens2)

df_high = pd.DataFrame({'Word(Index), Attribution': ["{} ({}), {}".format(word, pos, round(val.item(),2)) for word, pos, val in zip(top_words_start, top_word_ind_start, top_words_val_start)]})

df_low = pd.DataFrame({'Word(Index), Attribution': ["{} ({}), {}".format(word, pos, round(val.item(),2)) for word, pos, val in zip(bot_words_start, bot_word_ind_start, bot_words_val_start)]})
# df_start.style.apply(['cell_ids: False'])

# ['{}({})'.format(token, str(i)) for i, token in enumerate(all_tokens)]

In [24]:
df_high

Unnamed: 0,"Word(Index), Attribution"
0,"Ġalgorithms (720), 0.42"
1,"Ġalgorithms (695), 0.32"
2,"Ġalgorithms (808), 0.22"
3,"ĠSearch (717), 0.17"
4,"Ġalgorithms (704), 0.16"
5,"Ġsolving (694), 0.15"
6,"Ġresearchers (688), 0.14"
7,"Ġalgorithm (616), 0.13"
8,"Ġalgorithms (908), 0.13"
9,"Ġsearch (948), 0.13"


In [25]:
df_low

Unnamed: 0,"Word(Index), Attribution"
0,"ĠProgramming (32), -0.34"
1,"ĠProgramming (48), -0.23"
2,"ĠProgramming (136), -0.22"
3,"Ġprograms (178), -0.18"
4,"ar (1), -0.16"
5,"Ġprogramming (229), -0.15"
6,"ĠProgramming (286), -0.14"
7,"ĠComputer (68), -0.13"
8,"Ġprogramming (200), -0.13"
9,"ĠLogic (31), -0.11"


In [26]:
d = {"tokens":all_tokens2, "attribution":attributions_sum[:len(all_tokens2)].cpu()}

Many tokens appear several times in an example, at different positions. Position may well affect a token's attribution, but if we only care about the tokens themselves, we can sum the attributions of all duplicates to get an aggregate attribution per token.

In [27]:
df_attrib = pd.DataFrame(d)
aggregation_functions = {'attribution': 'sum'}
df_new = df_attrib.groupby(df_attrib['tokens']).aggregate(aggregation_functions)
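As a small illustration of the aggregation step, the sketch below runs the same kind of groupby on invented data (the `toy` frame is hypothetical). It also computes the mean and count per token, since a plain sum tends to favor tokens that simply occur often; whether sum or mean is more appropriate depends on the question being asked.

```python
import pandas as pd

# Hypothetical toy data: repeated tokens with invented attribution scores.
toy = pd.DataFrame({
    "tokens": ["Ġalgorithms", "Ġthe", "Ġalgorithms", "ĠProgramming", "Ġthe"],
    "attribution": [0.5, 0.25, 0.25, -0.5, -0.125],
})

# Sum over duplicates, as in the cell above.
toy_sum = toy.groupby("tokens")["attribution"].sum()

# Sum, mean, and occurrence count side by side for comparison.
toy_stats = toy.groupby("tokens")["attribution"].agg(["sum", "mean", "count"])

print(toy_sum.sort_values(ascending=False))
print(toy_stats)
```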

In [28]:
highest_attrib_tokens = df_new.sort_values(by=['attribution'], ascending=False)
highest_attrib_tokens[:10]

Unnamed: 0_level_0,attribution
tokens,Unnamed: 1_level_1
Ġalgorithms,1.356365
Ġalgorithm,0.257546
ĠSearch,0.172695
Ġsolving,0.144294
Ġresearchers,0.141114
.,0.133192
Ġsearch,0.129129
Ġagents,0.08483
Ġto,0.084031
Ġand,0.083772


In [29]:
lowest_attrib_tokens = df_new.sort_values(by=['attribution'])
lowest_attrib_tokens[:10]

Unnamed: 0_level_0,attribution
tokens,Unnamed: 1_level_1
ĠProgramming,-1.03193
ĠLogic,-0.297072
Ġprogramming,-0.269932
Ġprograms,-0.178757
ar,-0.160461
ĠComputer,-0.134841
Ġ,-0.127074
),-0.111708
ĠPractice,-0.087991
ĠProblems,-0.082674


Using the notebook https://colab.research.google.com/drive/1lktilbL1IY4nBanlzCdP8TLsBNfUsl_U?usp=sharing, we can generate files that hold the attributions over the entire dataset, covering both the positive and negative classes.

In [30]:
df_word = pd.read_csv("/content/drive/MyDrive/cogs402longformer/results/papers/papers_attributions/longformer_emb_papers.csv")
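The linked notebook is not reproduced here, so its exact aggregation procedure is not shown. The sketch below is only one plausible way to build a per-token table across several examples; it assumes a mean per token, and all names and scores (`toy_examples`, `toy_all`, `toy_word`) are invented for illustration. Sorting in descending order puts positive-class tokens at the head of the table and negative-class tokens at the tail, which matches how df_word is sliced in the following cells.

```python
import pandas as pd

# Hypothetical per-example results: token lists paired with invented scores.
toy_examples = {
    "0": (["Ġlearning", "Ġthe", "Ġcode"], [0.5, 0.25, -0.25]),
    "1": (["Ġlearning", "Ġcode", "Ġcode"], [0.25, -0.5, -0.25]),
}

# Flatten every (token, score) pair from every example into one long table.
rows = []
for example_id, (tokens, scores) in toy_examples.items():
    for token, score in zip(tokens, scores):
        rows.append({"example": example_id, "tokens": token, "attribution": score})
toy_all = pd.DataFrame(rows)

# Assumption: average each token's attribution over all of its occurrences,
# then sort so the most positive tokens come first.
toy_word = (
    toy_all.groupby("tokens", as_index=False)["attribution"]
    .mean()
    .sort_values("attribution", ascending=False)
    .reset_index(drop=True)
)

print(toy_word)
```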

Here we see the highest attributions for the positive class: these tokens have the most influence when the model predicts the positive class.

In [31]:
df_word[:15]

Unnamed: 0,tokens,attribution
0,.,0.28601
1,Ġlearning,0.161365
2,Ġneural,0.12117
3,Ġthe,0.107979
4,",",0.107825
5,Ġdata,0.087721
6,Ġtraining,0.052719
7,Ġto,0.051323
8,ĠAI,0.048482
9,Ġdataset,0.047799


Here we see the most negative attributions: these tokens have the most influence when the model predicts the negative class.

In [32]:
# Last 15 rows in reverse order (a stop of -15 would return only 14 rows)
df_word[:-16:-1]

Unnamed: 0,tokens,attribution
30061,Ġprogramming,-0.115309
30060,Ġprogram,-0.073491
30059,Ġprograms,-0.069745
30058,Ġlanguages,-0.069667
30057,Ġlanguage,-0.052829
30056,Ġcode,-0.041096
30055,Ġsoftware,-0.035409
30054,Ġ.,-0.034809
30053,ĠProgramming,-0.029539
30052,Ġcompiler,-0.026975
