# Approach 4

This notebook uses approach 4 to generate templates, focusing on positive and negative templates. One possible application is testing the *Vocabulary* linguistic capability with an **MFT** (Minimum Functionality Test).

The steps of this approach are:

1. Classify the instances using the *Oracle*
2. Keep only the instances classified unanimously
3. Split each instance into sentences
4. Classify the sentences using the *Oracle*
5. Keep only the sentences classified unanimously
6. Keep only the sentences with high prediction confidence
7. Rank the words of each sentence
8. Keep only the sentences whose relevant words (verbs or adjectives) are ranked highly
9. Keep only the sentences with high confidence in the prediction of the relevant words
10. Replace the relevant words with masks
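Steps 2, 5 and 6 are filters over the oracle's predictions. Below is a minimal sketch of those filters. It assumes the oracle outputs are stacked into arrays with one row per oracle model, and that "high confidence" means every oracle model's top-class probability exceeds the 0.9 threshold reported in the logs. Both are assumptions: the function names are hypothetical, and `PosNegTemplateGeneratorApp4` may aggregate the scores differently.

```python
import numpy as np

def unanimous_mask(oracle_preds):
    """Hypothetical helper: True where every oracle model predicts the same label.

    oracle_preds: shape (n_models, n_samples), integer labels.
    """
    oracle_preds = np.asarray(oracle_preds)
    return np.all(oracle_preds == oracle_preds[0], axis=0)

def confident_mask(oracle_probs, threshold=0.9):
    """Hypothetical helper: True where every model's top-class probability > threshold.

    oracle_probs: shape (n_models, n_samples, n_classes).
    """
    oracle_probs = np.asarray(oracle_probs)
    return np.all(oracle_probs.max(axis=2) > threshold, axis=0)

# Toy oracle output: 3 models x 3 sentences x 2 classes (negative, positive)
probs = np.array([
    [[0.05, 0.95], [0.97, 0.03], [0.40, 0.60]],  # oracle model 1
    [[0.02, 0.98], [0.30, 0.70], [0.20, 0.80]],  # oracle model 2
    [[0.04, 0.96], [0.92, 0.08], [0.10, 0.90]],  # oracle model 3
])
preds = probs.argmax(axis=2)  # labels per model and sentence

keep = unanimous_mask(preds) & confident_mask(probs, threshold=0.9)
print(keep)  # [ True False False]
```

Sentence 1 is dropped because the models disagree. Sentence 2 is unanimous but is dropped because one model is only 60% confident.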

In [1]:
%config Completer.use_jedi = False
import sys
sys.path.append('../../')

## Loading the dataset, the target model, and the auxiliary models

In [2]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

movie_reviews_rt_df = pd.read_csv('./data/data-rt-1000test-human.csv')
movie_reviews_rt_df.head(5)

|   | label | text | words |
|---|---|---|---|
| 0 | 1 | windtalkers celebrates the human spirit and packs an emotional wallop . | 11 |
| 1 | 0 | human nature is a goofball movie in the way that malkovich was but it tries too hard . | 18 |
| 2 | 0 | depicts the sorriest and most sordid of human behavior on the screen then laughs at how clever it's being . | 20 |
| 3 | 0 | human nature in short isn't nearly as funny as it thinks it is neither is it as smart . | 20 |
| 4 | 1 | once again director jackson strikes a rewarding balance between emotion on the human scale and action/effects on the spectacular scale . | 21 |


In [3]:
import re
import numpy as np
import torch
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def pre_process(text):
    # Lowercase, then replace every character in the ASCII range '!'..'@'
    # (most punctuation plus digits) with a space
    text = text.lower()
    text = re.sub('["\',!-.:-@0-9/]', ' ', text)
    return text

# Wrapper that adapts the Hugging Face output to a (predictions, probabilities) pair
class SentimentAnalysisModelWrapper:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer

    def __predict(self, text_input):
        text_preprocessed = pre_process(text_input)
        tokenized = self.tokenizer(text_preprocessed, padding=True, truncation=True, max_length=512,
                                   add_special_tokens=True, return_tensors="pt")

        with torch.no_grad():  # inference only, no gradient tracking
            logits = self.model(**tokenized)[0]  # first output = logits, shape (1, n_classes)
        prob = softmax(logits, dim=-1).numpy()  # explicit dim avoids the implicit-dim warning
        pred = np.argmax(prob)

        return pred, prob

    def predict_label(self, text_inputs):
        return self.predict(text_inputs)[0]

    def predict_proba(self, text_inputs):
        return self.predict(text_inputs)[1]

    def predict(self, text_inputs):
        if isinstance(text_inputs, str):
            text_inputs = [text_inputs]

        preds = []
        probs = []

        for text_input in text_inputs:
            pred, prob = self.__predict(text_input)
            preds.append(pred)
            probs.append(prob[0])

        return np.array(preds), np.array(probs)  # e.g. ([0, 1], [[0.99, 0.01], [0.03, 0.97]])

# Helper function to load a model from the Hugging Face Hub and wrap it
def load_model(model_name):
    print(f'Loading model {model_name}...')
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    return SentimentAnalysisModelWrapper(model, tokenizer)

# Hugging Face hosted model names
movie_reviews_models = {
    'bert': 'textattack/bert-base-uncased-rotten-tomatoes',
    'albert': 'textattack/albert-base-v2-rotten-tomatoes',
    'distilbert': 'textattack/distilbert-base-uncased-rotten-tomatoes',
    'roberta': 'textattack/roberta-base-rotten-tomatoes',
    'xlnet': 'textattack/xlnet-base-cased-rotten-tomatoes'
}
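CheckList's `suite.run` calls the prediction function on a list of texts and expects a tuple of predictions and confidences. That is the `(preds, probs)` pair `predict` returns. The following NumPy-only sketch shows the shapes involved. `fake_logits` is a hypothetical stand-in for the tokenizer plus model, so the example runs without downloading any weights.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Subtract the row max before exponentiating, for numerical stability
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def fake_logits(text):
    # Hypothetical classifier: returns a batch of one, shape (1, n_classes)
    return np.array([[1.0, 3.0]]) if "great" in text else np.array([[2.0, -1.0]])

def predict(texts):
    if isinstance(texts, str):
        texts = [texts]
    probs = np.vstack([softmax(fake_logits(t)) for t in texts])  # (n_texts, n_classes)
    preds = probs.argmax(axis=1)                                   # (n_texts,)
    return preds, probs

preds, probs = predict(["a great film", "a dull film"])
print(preds)        # [1 0]
print(probs.shape)  # (2, 2)
```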

In [4]:
m0 = load_model(movie_reviews_models['bert'])
m1 = load_model(movie_reviews_models['albert'])
m2 = load_model(movie_reviews_models['distilbert'])
m3 = load_model(movie_reviews_models['roberta'])
m4 = load_model(movie_reviews_models['xlnet'])

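# Oracle ensembles: each one contains every model except the corresponding target
models_1 = [m1, m2, m3, m4]  # oracle for BERT
models_2 = [m0, m2, m3, m4]  # oracle for ALBERT
models_3 = [m0, m1, m3, m4]  # oracle for DistilBERT
models_4 = [m0, m1, m2, m4]  # oracle for RoBERTa
models_5 = [m0, m1, m2, m3]  # oracle for XLNet
# Target models
model_bert = m0
model_albert = m1
model_distilbert = m2
model_roberta = m3
model_xlnet = m4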

Loading model textattack/bert-base-uncased-rotten-tomatoes...
Loading model textattack/albert-base-v2-rotten-tomatoes...
Loading model textattack/distilbert-base-uncased-rotten-tomatoes...
Loading model textattack/roberta-base-rotten-tomatoes...


Some weights of the model checkpoint at textattack/roberta-base-rotten-tomatoes were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Loading model textattack/xlnet-base-cased-rotten-tomatoes...
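The five oracle lists above follow a leave-one-out pattern: each target model is paired with an oracle made of the other four. The generic helper below is a hypothetical sketch, not part of the notebook. It builds the same pairs from any list, comparing positions rather than objects, so it also works for models that do not define equality.

```python
def leave_one_out(models):
    """Pair each model with the list of all the other models (its oracle)."""
    return [(target, [m for j, m in enumerate(models) if j != i])
            for i, target in enumerate(models)]

names = ["bert", "albert", "distilbert", "roberta", "xlnet"]
for target, oracle in leave_one_out(names):
    print(target, oracle)
```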


## Generating the templates
The word-ranking method used in `PosNegTemplateGenerator` is the Replace-1 Score.
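The Replace-1 Score ranks each word by how much the classifier's output changes when only that word is replaced. This sketch is not `PosNegTemplateGeneratorApp4`'s implementation. The replacement token (`"[UNK]"`), the sign convention (perturbed minus original probability of the predicted class) and ranking by absolute value are all assumptions. They were chosen to be consistent with the `rank_score` values in the logs below, for example `too` scoring -0.27 in a negative review and being ranked first. `toy_probs` is a hypothetical classifier.

```python
import numpy as np

def replace1_scores(tokens, prob_fn, replacement="[UNK]"):
    """Change in P(predicted class) when each token is replaced (assumed convention).

    prob_fn maps a string to a 1-D array of class probabilities.
    """
    original = prob_fn(" ".join(tokens))
    label = int(np.argmax(original))
    scores = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + [replacement] + tokens[i + 1:]
        scores.append(float(prob_fn(" ".join(perturbed))[label] - original[label]))
    return label, scores

def rank_words(tokens, scores, k):
    """Return the k (word, index, score) triples with the largest |score|."""
    order = sorted(range(len(tokens)), key=lambda i: abs(scores[i]), reverse=True)
    return [(tokens[i], i, scores[i]) for i in order[:k]]

def toy_probs(text):
    # Hypothetical classifier: "great" pushes towards positive, "boring" towards negative
    words = text.split()
    p_pos = 0.5 + 0.3 * ("great" in words) - 0.2 * ("boring" in words)
    return np.array([1.0 - p_pos, p_pos])

tokens = "a great but boring film".split()
label, scores = replace1_scores(tokens, toy_probs)
top = rank_words(tokens, scores, k=2)
print(label)                         # 1
print([word for word, _, _ in top])  # ['great', 'boring']
```

As in the logs, the index in each triple is the token position in the sentence, and a word can rank highly whether its score is positive or negative.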

In [8]:
from template_generator.tasks.sentiment_analisys import PosNegTemplateGeneratorApp4

tg0 = PosNegTemplateGeneratorApp4(model_bert, models_1)
tg1 = PosNegTemplateGeneratorApp4(model_albert, models_2)
tg2 = PosNegTemplateGeneratorApp4(model_distilbert, models_3)
tg3 = PosNegTemplateGeneratorApp4(model_roberta, models_4)
tg4 = PosNegTemplateGeneratorApp4(model_xlnet, models_5)

### Initial number of instances: 5

In [9]:
# Sampling instances
np.random.seed(220)
n_instances = 5
df_sampled = movie_reviews_rt_df.sample(n_instances)

instances = df_sampled['text'].tolist()

In [10]:
# Each generator keeps its results (read back below with to_dataframe() and .lexicons)
for tg in (tg0, tg1, tg2, tg3, tg4):
    tg.generate_templates(instances, n_masks=2, ranked_words_count=4)

Predicting inputs...
:: Instance predictions done.
Filtering instances classified unanimously...
:: 5 instances remaining.
Converting texts to sentences...
:: 5 sentences were generated.
Predicting inputs...
:: Sentence predictions done.
Filtering instances classified unanimously...
:: 5 sentences remaining.
Filtering instances by classification score greater than 0.9
:: 5 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
4
Filtering instances by relevant words...
['VERB', 'ADJ']
{word: remarkable, index: 11, tag: ADJ, rank_score: 0.00026857852935791016}
{word: renner, index: 7, tag: NOUN, rank_score: 0.00020641088485717773}
{word: film, index: 12, tag: NOUN, rank_score: 0.00020247697830200195}
{word: star, index: 5, tag: NOUN, rank_score: 0.00018262863159179688}
 
['VERB', 'ADJ']
{word: reinforces, index: 0, tag: NOUN, rank_score: -0.002393782138824463}
{word: the, index: 1, tag: DET, rank_score: -0.0006895065307617188}
{word: heal, index: 19, tag: VERB, rank_score: -0.0001

#### Execution time for 5 instances: 9.7 s

In [11]:
df0 = tg0.to_dataframe()
df0

|   | label | original_text | masked_text | template_text |
|---|---|---|---|---|
| 0 | 1 | reinforces the often forgotten fact of the world's remarkably varying human population and mindset and its capacity to heal using creative natural and ancient antidotes . | reinforces the often forgotten fact of the world 's remarkably varying human population and mindset and its capacity to {mask} using {mask} natural and ancient antidotes . | reinforces the often forgotten fact of the world 's remarkably varying human population and mindset and its capacity to {pos_verb} using {pos_adj} natural and ancient antidotes . |
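In the table above, `masked_text` is the tokenized sentence (note `world 's`) with the selected words replaced by `{mask}`. `template_text` then replaces each mask with a lexicon tag. Below is a minimal sketch of these two steps on a shortened fragment of that sentence. It assumes the token positions and their tags are already known. How the generator chooses between `pos_*` and `neg_*` is not shown in this notebook, so the tags are passed in by hand. Both helper names are hypothetical.

```python
def mask_tokens(tokens, indices, placeholder="{mask}"):
    """Replace the tokens at the given positions with a placeholder."""
    positions = set(indices)
    return " ".join(placeholder if i in positions else tok for i, tok in enumerate(tokens))

def to_template(tokens, tagged_indices):
    """tagged_indices maps a position to a lexicon name, e.g. {3: "pos_verb"}."""
    return " ".join("{" + tagged_indices[i] + "}" if i in tagged_indices else tok
                    for i, tok in enumerate(tokens))

tokens = "its capacity to heal using creative natural and ancient antidotes .".split()
print(mask_tokens(tokens, [3, 5]))
print(to_template(tokens, {3: "pos_verb", 5: "pos_adj"}))
```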


In [12]:
df1 = tg1.to_dataframe()
df1

|   | label | original_text | masked_text | template_text |
|---|---|---|---|---|


In [13]:
df2 = tg2.to_dataframe()
df2

|   | label | original_text | masked_text | template_text |
|---|---|---|---|---|


In [14]:
df3 = tg3.to_dataframe()
df3

|   | label | original_text | masked_text | template_text |
|---|---|---|---|---|


In [15]:
df4 = tg4.to_dataframe()
df4

|   | label | original_text | masked_text | template_text |
|---|---|---|---|---|
| 0 | 1 | writer-director david jacobson and his star jeremy renner have made a remarkable film that explores the monster's psychology not in order to excuse him but rather to demonstrate that his pathology evolved from human impulses that grew hideously twisted . | writer-director david jacobson and his star jeremy renner have made a {mask} film that {mask} the monster 's psychology not in order to excuse him but rather to demonstrate that his pathology evolved from human impulses that grew hideously twisted . | writer-director david jacobson and his star jeremy renner have made a {pos_adj} film that {pos_verb} the monster 's psychology not in order to excuse him but rather to demonstrate that his pathology evolved from human impulses that grew hideously twisted . |


In [16]:
tg0.lexicons

{'pos_verb': ['heal'], 'neg_verb': [], 'pos_adj': ['creative'], 'neg_adj': []}

In [17]:
tg1.lexicons

{'pos_verb': [], 'neg_verb': [], 'pos_adj': [], 'neg_adj': []}

In [18]:
tg2.lexicons

{'pos_verb': [], 'neg_verb': [], 'pos_adj': [], 'neg_adj': []}

In [19]:
tg3.lexicons

{'pos_verb': [], 'neg_verb': [], 'pos_adj': [], 'neg_adj': []}

In [20]:
tg4.lexicons

{'pos_verb': ['explores'],
 'neg_verb': [],
 'pos_adj': ['remarkable'],
 'neg_adj': []}

### Initial number of instances: 100

In [21]:
# Using all 100 instances
instances = movie_reviews_rt_df['text'].tolist()

In [22]:
tg0 = PosNegTemplateGeneratorApp4(model_bert, models_1)
tg1 = PosNegTemplateGeneratorApp4(model_albert, models_2)
tg2 = PosNegTemplateGeneratorApp4(model_distilbert, models_3)
tg3 = PosNegTemplateGeneratorApp4(model_roberta, models_4)
tg4 = PosNegTemplateGeneratorApp4(model_xlnet, models_5)

# Each generator keeps its results (read back below with to_dataframe() and .lexicons)
for tg in (tg0, tg1, tg2, tg3, tg4):
    tg.generate_templates(instances, n_masks=2, ranked_words_count=4)

Predicting inputs...
:: Instance predictions done.
Filtering instances classified unanimously...
:: 13 instances remaining.
Converting texts to sentences...
:: 16 sentences were generated.
Predicting inputs...
:: Sentence predictions done.
Filtering instances classified unanimously...
:: 13 sentences remaining.
Filtering instances by classification score greater than 0.9
:: 11 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
4
Filtering instances by relevant words...
['VERB', 'ADJ']
{word: emotional, index: 8, tag: ADJ, rank_score: -0.0002446174621582031}
{word: celebrates, index: 1, tag: VERB, rank_score: -7.253885269165039e-05}
{word: and, index: 5, tag: CONJ, rank_score: -4.464387893676758e-05}
{word: wallop, index: 9, tag: NOUN, rank_score: -3.629922866821289e-05}
 
['VERB', 'ADJ']
{word: too, index: 15, tag: ADV, rank_score: -0.26553088426589966}
{word: hard, index: 16, tag: ADJ, rank_score: -0.019599199295043945}
{word: tries, index: 14, tag: VERB, rank_score: -0.0002

In [23]:
df0 = tg0.to_dataframe()
df0

|   | label | original_text | masked_text | template_text |
|---|---|---|---|---|
| 0 | 1 | windtalkers celebrates the human spirit and packs an emotional wallop . | windtalkers {mask} the human spirit and packs an {mask} wallop . | windtalkers {pos_verb} the human spirit and packs an {pos_adj} wallop . |
| 1 | 1 | once again director jackson strikes a rewarding balance between emotion on the human scale and action/effects on the spectacular scale . | once again director jackson {mask} a {mask} balance between emotion on the human scale and action/effects on the spectacular scale . | once again director jackson {neg_verb} a {pos_verb} balance between emotion on the human scale and action/effects on the spectacular scale . |
| 2 | 1 | reinforces the often forgotten fact of the world's remarkably varying human population and mindset and its capacity to heal using creative natural and ancient antidotes . | reinforces the often forgotten fact of the world 's remarkably varying human population and mindset and its capacity to {mask} using {mask} natural and ancient antidotes . | reinforces the often forgotten fact of the world 's remarkably varying human population and mindset and its capacity to {pos_verb} using {pos_adj} natural and ancient antidotes . |
| 3 | 1 | it takes this never-ending confusion and hatred puts a human face on it evokes shame among all who are party to it and even promotes understanding . | it {mask} this never-ending confusion and hatred puts a human face on it {mask} shame among all who are party to it and even promotes understanding . | it {neg_verb} this never-ending confusion and hatred puts a human face on it {pos_verb} shame among all who are party to it and even promotes understanding . |


In [24]:
df1 = tg1.to_dataframe()
df1

|   | label | original_text | masked_text | template_text |
|---|---|---|---|---|


In [25]:
df2 = tg2.to_dataframe()
df2

|   | label | original_text | masked_text | template_text |
|---|---|---|---|---|


In [26]:
df3 = tg3.to_dataframe()
df3

|   | label | original_text | masked_text | template_text |
|---|---|---|---|---|


In [27]:
df4 = tg4.to_dataframe()
df4

|   | label | original_text | masked_text | template_text |
|---|---|---|---|---|
| 0 | 1 | it takes this never-ending confusion and hatred puts a human face on it evokes shame among all who are party to it and even promotes understanding . | it {mask} this never-ending confusion and hatred puts a human face on it {mask} shame among all who are party to it and even promotes understanding . | it {pos_verb} this never-ending confusion and hatred puts a human face on it {pos_verb} shame among all who are party to it and even promotes understanding . |
| 1 | 1 | writer-director david jacobson and his star jeremy renner have made a remarkable film that explores the monster's psychology not in order to excuse him but rather to demonstrate that his pathology evolved from human impulses that grew hideously twisted . | writer-director david jacobson and his star jeremy renner have made a {mask} film that {mask} the monster 's psychology not in order to excuse him but rather to demonstrate that his pathology evolved from human impulses that grew hideously twisted . | writer-director david jacobson and his star jeremy renner have made a {pos_adj} film that {pos_verb} the monster 's psychology not in order to excuse him but rather to demonstrate that his pathology evolved from human impulses that grew hideously twisted . |
| 2 | 1 | kaufman and jonze take huge risks to ponder the whole notion of passion -- our desire as human beings for passion in our lives and the emptiness one feels when it is missing . | kaufman and jonze take huge risks to {mask} the whole notion of passion -- our desire as human beings for passion in our lives and the emptiness one feels when it is {mask} . | kaufman and jonze take huge risks to {pos_verb} the whole notion of passion -- our desire as human beings for passion in our lives and the emptiness one feels when it is {neg_verb} . |


In [28]:
tg0.lexicons

{'pos_verb': ['rewarding', 'evokes', 'celebrates', 'heal'],
 'neg_verb': ['takes', 'strikes'],
 'pos_adj': ['creative', 'emotional'],
 'neg_adj': []}

In [29]:
tg1.lexicons

{'pos_verb': [], 'neg_verb': [], 'pos_adj': [], 'neg_adj': []}

In [30]:
tg2.lexicons

{'pos_verb': [], 'neg_verb': [], 'pos_adj': [], 'neg_adj': []}

In [31]:
tg3.lexicons

{'pos_verb': [], 'neg_verb': [], 'pos_adj': [], 'neg_adj': []}

In [32]:
tg4.lexicons

{'pos_verb': ['takes', 'explores', 'evokes', 'ponder'],
 'neg_verb': ['missing'],
 'pos_adj': ['remarkable'],
 'neg_adj': []}

#### Execution time for 100 instances: 4 min 17.8 s

## Checklist

#### Model BERT

In [33]:
import checklist
from checklist.editor import Editor
from checklist.test_suite import TestSuite
from checklist.test_types import MFT

In [34]:
lexicons = tg0.lexicons
templates = tg0.template_texts
masked = tg0.masked_texts
labels = [sent.prediction.label for sent in tg0.sentences]

editor = Editor()
editor.add_lexicon('pos_verb', lexicons['pos_verb'])
editor.add_lexicon('neg_verb', lexicons['neg_verb'])
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [35]:
for i, (template, label) in enumerate(zip(templates, labels)):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary",
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary"))
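Each template expands into every combination of values from the lexicons it references. This explains the "Predicting 8 examples" lines for BERT (4 `pos_verb` × 2 `pos_adj`, or 2 `neg_verb` × 4 `pos_verb`) and the 4 cases for XLNet. The sketch below reproduces those counts. It assumes that a tag repeated within one template is filled with the same value, as in Python's `str.format`. The 4 cases observed for XLNet's first template, which uses `{pos_verb}` twice, are consistent with this assumption. `count_cases` is a hypothetical helper, and the templates are shortened fragments of the ones above.

```python
from math import prod
from string import Formatter

def count_cases(template, lexicons):
    """Number of fillings when each distinct tag ranges over its lexicon (assumed semantics)."""
    tags = {field for _, field, _, _ in Formatter().parse(template) if field}
    return prod(len(lexicons[tag]) for tag in tags)

# Lexicons copied from the tg0 (BERT) and tg4 (XLNet) outputs above
bert_lex = {'pos_verb': ['rewarding', 'evokes', 'celebrates', 'heal'],
            'neg_verb': ['takes', 'strikes'],
            'pos_adj': ['creative', 'emotional'],
            'neg_adj': []}
xlnet_lex = {'pos_verb': ['takes', 'explores', 'evokes', 'ponder'],
             'neg_verb': ['missing'],
             'pos_adj': ['remarkable'],
             'neg_adj': []}

print(count_cases("windtalkers {pos_verb} the human spirit and packs an {pos_adj} wallop .", bert_lex))  # 8
print(count_cases("it {pos_verb} this never-ending confusion and hatred puts a human face on it {pos_verb} shame .", xlnet_lex))  # 4
```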

In [36]:
suite.run(model_bert.predict, overwrite=True)

Running Test: MFT with vocabullary - template1
Predicting 8 examples
Running Test: MFT with vocabullary - template2
Predicting 8 examples
Running Test: MFT with vocabullary - template3
Predicting 8 examples
Running Test: MFT with vocabullary - template4
Predicting 8 examples


In [37]:
suite.summary()

Vocabullary

Test: MFT with vocabullary - template1
Test cases:      8
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template2
Test cases:      8
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template3
Test cases:      8
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template4
Test cases:      8
Fails (rate):    0 (0.0%)

In [38]:
suite.save('./suites/posneg-approach4-bert.suite')

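#### Model ALBERT

No templates were generated for ALBERT (`tg1.lexicons` is empty), so this suite contains no tests and the cells below produce no output.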

In [39]:
lexicons = tg1.lexicons
templates = tg1.template_texts
masked = tg1.masked_texts
labels = [sent.prediction.label for sent in tg1.sentences]

editor = Editor()
editor.add_lexicon('pos_verb', lexicons['pos_verb'])
editor.add_lexicon('neg_verb', lexicons['neg_verb'])
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [40]:
for i, (template, label) in enumerate(zip(templates, labels)):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary",
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary"))

In [41]:
suite.run(model_albert.predict, overwrite=True)

In [42]:
suite.summary()

In [43]:
suite.save('./suites/posneg-approach4-albert.suite')

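#### Model DistilBERT

No templates were generated for DistilBERT (`tg2.lexicons` is empty), so this suite contains no tests and the cells below produce no output.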

In [44]:
lexicons = tg2.lexicons
templates = tg2.template_texts
masked = tg2.masked_texts
labels = [sent.prediction.label for sent in tg2.sentences]

editor = Editor()
editor.add_lexicon('pos_verb', lexicons['pos_verb'])
editor.add_lexicon('neg_verb', lexicons['neg_verb'])
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [45]:
for i, (template, label) in enumerate(zip(templates, labels)):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary",
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary"))

In [46]:
suite.run(model_distilbert.predict, overwrite=True)

In [47]:
suite.summary()

In [48]:
suite.save('./suites/posneg-approach4-distilbert.suite')

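#### Model RoBERTa

No templates were generated for RoBERTa (`tg3.lexicons` is empty), so this suite contains no tests and the cells below produce no output.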

In [49]:
lexicons = tg3.lexicons
templates = tg3.template_texts
masked = tg3.masked_texts
labels = [sent.prediction.label for sent in tg3.sentences]

editor = Editor()
editor.add_lexicon('pos_verb', lexicons['pos_verb'])
editor.add_lexicon('neg_verb', lexicons['neg_verb'])
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [50]:
for i, (template, label) in enumerate(zip(templates, labels)):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary",
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary"))

In [51]:
suite.run(model_roberta.predict, overwrite=True)

In [52]:
suite.summary()

In [53]:
suite.save('./suites/posneg-approach4-roberta.suite')

#### Model XLNet

In [54]:
lexicons = tg4.lexicons
templates = tg4.template_texts
masked = tg4.masked_texts
labels = [sent.prediction.label for sent in tg4.sentences]

editor = Editor()
editor.add_lexicon('pos_verb', lexicons['pos_verb'])
editor.add_lexicon('neg_verb', lexicons['neg_verb'])
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [55]:
for i, (template, label) in enumerate(zip(templates, labels)):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary",
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary"))

In [56]:
suite.run(model_xlnet.predict, overwrite=True)

Running Test: MFT with vocabullary - template1
Predicting 4 examples
Running Test: MFT with vocabullary - template2
Predicting 4 examples
Running Test: MFT with vocabullary - template3
Predicting 4 examples


In [57]:
suite.summary()

Vocabullary

Test: MFT with vocabullary - template1
Test cases:      4
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template2
Test cases:      4
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template3
Test cases:      4
Fails (rate):    0 (0.0%)

In [58]:
suite.save('./suites/posneg-approach4-xlnet.suite')

## Loading a test suite

In [59]:
from checklist.test_suite import TestSuite
suite = TestSuite.from_file('./suites/posneg-approach4-bert.suite')

suite.visual_summary_table()

Please wait as we prepare the table data...