# Approach 5

This notebook uses Approach 5 to generate templates, focusing on positive and negative templates. One possible application is testing the "Vocabulary" linguistic capability with an MFT (Minimum Functionality Test).

The steps of this approach are:

1. Split the instances into sentences
2. Rank the words of each sentence
3. Filter the sentences by length (5 words or more)
4. Filter for sentences that contain relevant words (verbs or adjectives)
5. Filter for sentences whose relevant words from the previous step are predicted with high confidence
6. Replace the relevant words with masks
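A rough sketch of steps 1, 3 and 4 may help make the filtering concrete. It is not the `template_generator` implementation: the function names, the regex-based sentence splitter, the whitespace word count and the pre-tagged input are all assumptions made for illustration.

```python
import re

MIN_WORDS = 5             # step 3 threshold
RELEVANT_TAGS = {"ADJ"}   # assumption: universal-style POS tags, as printed in the logs


def split_sentences(text):
    # Step 1 (naive): split after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def long_enough(sentence, min_words=MIN_WORDS):
    # Step 3: count whitespace-separated tokens (punctuation tokens included).
    return len(sentence.split()) >= min_words


def has_relevant_word(tagged_words, relevant_tags=RELEVANT_TAGS):
    # Step 4: keep a sentence only if at least one word carries a relevant tag.
    # `tagged_words` is a list of (word, tag) pairs produced by some POS tagger.
    return any(tag in relevant_tags for _, tag in tagged_words)


sentences = split_sentences("skip the film . buy the philip glass soundtrack cd .")
kept = [s for s in sentences if long_enough(s)]
```

Here the first sentence has only four tokens and is dropped, while the second one passes the length filter.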

In [1]:
%config Completer.use_jedi = False
import sys
sys.path.append('../../')

## Loading the dataset, the target model, and the auxiliary models

In [2]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

movie_reviews_rt_df = pd.read_csv('./data/data-rt-100samples.csv')
movie_reviews_rt_df.head(5)

Unnamed: 0,label,text,words
0,1,allen's underestimated charm delivers more goodies than lumps of coal .,11
1,0,skip the film and buy the philip glass soundtrack cd .,11
2,0,involving at times but lapses quite casually into the absurd .,11
3,0,while hoffman's performance is great the subject matter goes nowhere .,11
4,1,a flick about our infantilized culture that isn't entirely infantile .,11


In [3]:
import re
import numpy as np
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

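def pre_proccess(text):
    text = text.lower()
    # Replace every ASCII character from '!' through '@' (punctuation and digits) with a space.
    # This is the same set the original pattern matched, without the stray empty group.
    text = re.sub('[!-@]', ' ', text)
    return text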

# Wrapper to adapt output format
class SentimentAnalisysModelWrapper:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        
    def __predict(self, text_input):
        text_preprocessed = pre_proccess(text_input)
        tokenized = self.tokenizer(text_preprocessed, padding=True, truncation=True, max_length=512, 
                                    add_special_tokens = True, return_tensors="pt")
        
        tensor_logits = self.model(**tokenized)
        # Pass dim explicitly: the implicit dimension choice is deprecated and emits a warning.
        prob = softmax(tensor_logits[0], dim=-1).detach().numpy()
        pred = np.argmax(prob)
        
        return pred, prob
    
    def predict_label(self, text_inputs):
        return self.predict(text_inputs)[0]
        
    def predict_proba(self, text_inputs):
        return self.predict(text_inputs)[1]
        
    def predict(self, text_inputs):
        if isinstance(text_inputs, str):
            text_inputs = [text_inputs]
        
        preds = []
        probs = []

        for text_input in text_inputs:
            pred, prob = self.__predict(text_input)
            preds.append(pred)
            probs.append(prob[0])

        return np.array(preds), np.array(probs) # ([0, 1], [[0.99, 0.01], [0.03, 0.97]])

# Helper function to load and wrap a model from Hugging Face
def load_model(model_name):
    print(f'Loading model {model_name}...')
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    return SentimentAnalisysModelWrapper(model, tokenizer)

# Hugging Face hosted model names 
movie_reviews_models = {
    'bert': 'textattack/bert-base-uncased-rotten-tomatoes', 
    'albert': 'textattack/albert-base-v2-rotten-tomatoes', 
    'distilbert': 'textattack/distilbert-base-uncased-rotten-tomatoes', 
    'roberta': 'textattack/roberta-base-rotten-tomatoes', 
    'xlnet': 'textattack/xlnet-base-cased-rotten-tomatoes'
}
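The wrapper's `predict` returns a pair of labels and per-class probabilities, as the comment on its return line indicates. The following pure-Python sketch shows that output shape for a batch of made-up logits; the `softmax` and `predict` functions here are illustrative stand-ins, not the wrapper itself.

```python
import math


def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    highest = max(logits)
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def predict(batch_logits):
    # One probability row per input; the label is the index of the largest probability.
    probs = [softmax(row) for row in batch_logits]
    preds = [max(range(len(row)), key=row.__getitem__) for row in probs]
    return preds, probs


# Hypothetical logits for two inputs of a binary classifier.
preds, probs = predict([[2.0, -2.0], [-1.5, 2.0]])
```

Each row of `probs` sums to 1, and `preds` holds one class index per input.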

In [4]:
m0 = load_model(movie_reviews_models['bert'])
m1 = load_model(movie_reviews_models['albert'])
m2 = load_model(movie_reviews_models['distilbert'])
m3 = load_model(movie_reviews_models['roberta'])
m4 = load_model(movie_reviews_models['xlnet'])

# Oracle models for each target: the four models other than the target itself
models_1 = [m1, m2, m3, m4]
models_2 = [m0, m2, m3, m4]
models_3 = [m0, m1, m3, m4]
models_4 = [m0, m1, m2, m4]
models_5 = [m0, m1, m2, m3]
# Target models
model_bert = m0
model_albert = m1
model_distilbert = m2
model_roberta = m3
model_xlnet = m4

Loading model textattack/bert-base-uncased-rotten-tomatoes...
Loading model textattack/albert-base-v2-rotten-tomatoes...
Loading model textattack/distilbert-base-uncased-rotten-tomatoes...
Loading model textattack/roberta-base-rotten-tomatoes...


Some weights of the model checkpoint at textattack/roberta-base-rotten-tomatoes were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Loading model textattack/xlnet-base-cased-rotten-tomatoes...
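The five oracle lists in the cell above follow a leave-one-out pattern: each target is paired with the other four models. A small sketch of building those lists programmatically, with strings standing in for the loaded models (the `leave_one_out` helper is hypothetical):

```python
def leave_one_out(models):
    # For every target name, collect all other models in their original order.
    return {
        target: [model for name, model in models.items() if name != target]
        for target in models
    }


# Strings stand in for the wrapped models m0..m4.
models = {"bert": "m0", "albert": "m1", "distilbert": "m2", "roberta": "m3", "xlnet": "m4"}
oracles = leave_one_out(models)
```

With this layout, `oracles["bert"]` plays the role of `models_1` and `oracles["xlnet"]` the role of `models_5`.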


# Generating the templates
The word-ranking method used in PosNegTemplateGenerator is the Replace-1 Score.
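As a minimal sketch, assume the Replace-1 Score of a word is the change in the model's score when that single word is replaced by a placeholder token. The scoring function, the `[UNK]` placeholder, the sign convention and the ordering by absolute value are assumptions for illustration; the generator's actual implementation may differ.

```python
def replace_1_scores(words, score_fn, placeholder="[UNK]"):
    # Score of the unmodified sentence.
    base = score_fn(words)
    scores = []
    for index, word in enumerate(words):
        # Replace exactly one word and measure how much the score moves.
        perturbed = words[:index] + [placeholder] + words[index + 1:]
        scores.append((word, index, base - score_fn(perturbed)))
    # Most influential words first, regardless of the direction of the change.
    return sorted(scores, key=lambda item: abs(item[2]), reverse=True)


def toy_score(words):
    # Hypothetical stand-in for a classifier's positive-class probability.
    weights = {"great": 0.4, "dull": -0.3}
    return 0.5 + sum(weights.get(word, 0.0) for word in words)


ranked = replace_1_scores(["a", "great", "but", "dull", "movie"], toy_score)
```

In this toy setup `great` ranks first and `dull` second, while the neutral words score zero.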

In [5]:
from template_generator.tasks.sentiment_analisys import PosNegTemplateGeneratorApp5

tg0 = PosNegTemplateGeneratorApp5(model_bert, models_1)
tg1 = PosNegTemplateGeneratorApp5(model_albert, models_2)
tg2 = PosNegTemplateGeneratorApp5(model_distilbert, models_3)
tg3 = PosNegTemplateGeneratorApp5(model_roberta, models_4)
tg4 = PosNegTemplateGeneratorApp5(model_xlnet, models_5)

### Initial number of instances: 5

In [6]:
# Sampling instances
np.random.seed(220)
n_instances = 5
df_sampled = movie_reviews_rt_df.sample(n_instances)

instances = [x for x in df_sampled['text'].values]

In [7]:
templates0 = tg0.generate_templates(instances, n_masks=2, ranked_words_count=4)
templates1 = tg1.generate_templates(instances, n_masks=2, ranked_words_count=4)
templates2 = tg2.generate_templates(instances, n_masks=2, ranked_words_count=4)
templates3 = tg3.generate_templates(instances, n_masks=2, ranked_words_count=4)
templates4 = tg4.generate_templates(instances, n_masks=2, ranked_words_count=4)

Converting texts to sentences...
:: 6 sentences were generated.
Ranking words using Replace-1 Score...


:: Word ranking done.
Filtering instances by contaning a minimmum of words: 5...
:: 6 sentences remaining.
4
Filtering instances by relevant words...
['ADJ']
{word: of, index: 17, tag: ADP, rank_score: -0.00011205673217773438}
{word: wonders, index: 16, tag: NOUN, rank_score: -7.635354995727539e-05}
{word: elegantly, index: 0, tag: ADV, rank_score: -5.036592483520508e-05}
{word: detailed, index: 15, tag: ADJ, rank_score: -1.6689300537109375e-05}
 
['ADJ']
{word: future, index: 2, tag: NOUN, rank_score: -0.06550866365432739}
{word: hopes, index: 4, tag: VERB, rank_score: -0.0017756223678588867}
{word: one, index: 3, tag: NUM, rank_score: 0.0016388297080993652}
{word: for, index: 0, tag: ADP, rank_score: 0.0016148090362548828}
 
['ADJ']
{word: of, index: 15, tag: ADP, rank_score: -0.0016689896583557129}
{word: for, index: 11, tag: ADP, rank_score: -0.0008890032768249512}
{word: more, index: 13, tag: ADJ, rank_score: -0.000647127628326416}
{word: member, index: 14, tag: NOUN, rank_score: 

#### Execution time for 5 instances: 9.7s

In [8]:
df0 = tg0.to_dataframe()
df0

Unnamed: 0,label,original_text,masked_text,template_text


In [9]:
df1 = tg1.to_dataframe()
df1

Unnamed: 0,label,original_text,masked_text,template_text


In [10]:
df2 = tg2.to_dataframe()
df2

Unnamed: 0,label,original_text,masked_text,template_text


In [11]:
df3 = tg3.to_dataframe()
df3

Unnamed: 0,label,original_text,masked_text,template_text


In [12]:
df4 = tg4.to_dataframe()
df4

Unnamed: 0,label,original_text,masked_text,template_text
0,1,by no means a great movie but it is a refreshingly forthright one .,by no means a {mask} movie but it is a refreshingly {mask} one .,by no means a {pos_adj} movie but it is a refreshingly {pos_adj} one .


In [13]:
tg0.lexicons

{'pos_adj': [], 'neg_adj': []}

In [14]:
tg1.lexicons

{'pos_adj': [], 'neg_adj': []}

In [15]:
tg2.lexicons

{'pos_adj': [], 'neg_adj': []}

In [16]:
tg3.lexicons

{'pos_adj': [], 'neg_adj': []}

In [17]:
tg4.lexicons

{'pos_adj': ['great', 'forthright'], 'neg_adj': []}
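The relation between `original_text`, `masked_text`, `template_text` and the lexicons can be sketched as follows. The `build_template` helper is hypothetical, and so is the way polarity is supplied (1 for positive, 0 for negative, given per masked word); how the generator actually decides polarity is not shown in this notebook.

```python
MASK = "{mask}"
PLACEHOLDERS = {1: "pos_adj", 0: "neg_adj"}  # assumption: 1 = positive, 0 = negative


def build_template(tokens, masked_indices, polarities):
    lexicons = {"pos_adj": [], "neg_adj": []}
    masked = list(tokens)
    template = list(tokens)
    for index, polarity in zip(masked_indices, polarities):
        key = PLACEHOLDERS[polarity]
        # The removed word goes into the lexicon matching its polarity.
        if tokens[index] not in lexicons[key]:
            lexicons[key].append(tokens[index])
        masked[index] = MASK
        template[index] = "{" + key + "}"
    return " ".join(masked), " ".join(template), lexicons


tokens = "by no means a great movie but it is a refreshingly forthright one .".split()
masked_text, template_text, lexicons = build_template(tokens, [4, 11], [1, 1])
```

For this sentence the sketch reproduces the row and the lexicon shown above for `tg4`.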

### Initial number of instances: 100

In [18]:
# Using all 100 instances
instances = [x for x in movie_reviews_rt_df['text'].values]

In [19]:
tg0 = PosNegTemplateGeneratorApp5(model_bert, models_1)
tg1 = PosNegTemplateGeneratorApp5(model_albert, models_2)
tg2 = PosNegTemplateGeneratorApp5(model_distilbert, models_3)
tg3 = PosNegTemplateGeneratorApp5(model_roberta, models_4)
tg4 = PosNegTemplateGeneratorApp5(model_xlnet, models_5)

In [20]:
templates0 = tg0.generate_templates(instances, n_masks=2, ranked_words_count=4)

Converting texts to sentences...
:: 134 sentences were generated.
Ranking words using Replace-1 Score...


:: Word ranking done.
Filtering instances by contaning a minimmum of words: 5...
:: 118 sentences remaining.
4
Filtering instances by relevant words...
['ADJ']
{word: goodies, index: 6, tag: NOUN, rank_score: 0.005269169807434082}
{word: lumps, index: 8, tag: NOUN, rank_score: 0.0032875537872314453}
{word: underestimated, index: 2, tag: ADJ, rank_score: 0.0025653839111328125}
{word: than, index: 7, tag: ADP, rank_score: 0.0022441744804382324}
 
['ADJ']
{word: skip, index: 0, tag: VERB, rank_score: -0.3447835445404053}
{word: and, index: 3, tag: CONJ, rank_score: -0.012676358222961426}
{word: philip, index: 6, tag: NOUN, rank_score: 0.0009868144989013672}
{word: the, index: 1, tag: DET, rank_score: 0.00048273801803588867}
 
['ADJ']
{word: lapses, index: 4, tag: VERB, rank_score: -0.4460386037826538}
{word: into, index: 7, tag: ADP, rank_score: -0.0017092227935791016}
{word: times, index: 2, tag: NOUN, rank_score: 0.0014830231666564941}
{word: the, index: 8, tag: DET, rank_score: 0.00117

In [21]:
templates1 = tg1.generate_templates(instances, n_masks=2, ranked_words_count=4)

Converting texts to sentences...
:: 134 sentences were generated.
Ranking words using Replace-1 Score...


:: Word ranking done.
Filtering instances by contaning a minimmum of words: 5...
:: 118 sentences remaining.
4
Filtering instances by relevant words...
['ADJ']
{word: underestimated, index: 2, tag: ADJ, rank_score: 0.26489365100860596}
{word: delivers, index: 4, tag: NOUN, rank_score: -0.2100902497768402}
{word: 's, index: 1, tag: PRT, rank_score: -0.19919916987419128}
{word: of, index: 9, tag: ADP, rank_score: 0.19583535194396973}
 
['ADJ']
{word: skip, index: 0, tag: VERB, rank_score: -0.38275548815727234}
{word: buy, index: 4, tag: VERB, rank_score: 0.07416665554046631}
{word: and, index: 3, tag: CONJ, rank_score: -0.06269365549087524}
{word: philip, index: 6, tag: NOUN, rank_score: -0.044966161251068115}
 
['ADJ']
{word: lapses, index: 4, tag: VERB, rank_score: -0.5091991424560547}
{word: at, index: 1, tag: ADP, rank_score: 0.03710907697677612}
{word: the, index: 8, tag: DET, rank_score: 0.03633362054824829}
{word: involving, index: 0, tag: VERB, rank_score: 0.02584761381149292}
 


In [22]:
templates2 = tg2.generate_templates(instances, n_masks=2, ranked_words_count=4)

Converting texts to sentences...
:: 134 sentences were generated.
Ranking words using Replace-1 Score...


:: Word ranking done.
Filtering instances by contaning a minimmum of words: 5...
:: 118 sentences remaining.
4
Filtering instances by relevant words...
['ADJ']
{word: delivers, index: 4, tag: NOUN, rank_score: -0.4025241732597351}
{word: goodies, index: 6, tag: NOUN, rank_score: -0.3876534700393677}
{word: 's, index: 1, tag: PRT, rank_score: -0.3092106580734253}
{word: underestimated, index: 2, tag: ADJ, rank_score: 0.27075690031051636}
 
['ADJ']
{word: skip, index: 0, tag: VERB, rank_score: -0.35948944091796875}
{word: the, index: 1, tag: DET, rank_score: -0.1605040431022644}
{word: cd, index: 9, tag: NOUN, rank_score: 0.08838433027267456}
{word: film, index: 2, tag: NOUN, rank_score: -0.07738423347473145}
 
['ADJ']
{word: lapses, index: 4, tag: VERB, rank_score: -0.20442330837249756}
{word: the, index: 8, tag: DET, rank_score: 0.03566247224807739}
{word: times, index: 2, tag: NOUN, rank_score: 0.031717002391815186}
{word: into, index: 7, tag: ADP, rank_score: -0.027649283409118652}
 

In [23]:
templates3 = tg3.generate_templates(instances, n_masks=2, ranked_words_count=4)

Converting texts to sentences...
:: 134 sentences were generated.
Ranking words using Replace-1 Score...


:: Word ranking done.
Filtering instances by contaning a minimmum of words: 5...
:: 118 sentences remaining.
4
Filtering instances by relevant words...
['ADJ']
{word: than, index: 7, tag: ADP, rank_score: -0.9492137432098389}
{word: delivers, index: 4, tag: NOUN, rank_score: -0.1357308030128479}
{word: charm, index: 3, tag: NOUN, rank_score: -0.05397576093673706}
{word: goodies, index: 6, tag: NOUN, rank_score: -0.04204082489013672}
 
['ADJ']
{word: skip, index: 0, tag: VERB, rank_score: -0.16566622257232666}
{word: buy, index: 4, tag: VERB, rank_score: -0.05741769075393677}
{word: soundtrack, index: 8, tag: NOUN, rank_score: 0.028899729251861572}
{word: the, index: 1, tag: DET, rank_score: 0.021614551544189453}
 
['ADJ']
{word: lapses, index: 4, tag: VERB, rank_score: -0.7677843570709229}
{word: casually, index: 6, tag: ADV, rank_score: -0.1309245228767395}
{word: into, index: 7, tag: ADP, rank_score: -0.07934045791625977}
{word: involving, index: 0, tag: VERB, rank_score: 0.047416388

In [24]:
templates4 = tg4.generate_templates(instances, n_masks=2, ranked_words_count=4)

Converting texts to sentences...
:: 134 sentences were generated.
Ranking words using Replace-1 Score...


:: Word ranking done.
Filtering instances by contaning a minimmum of words: 5...
:: 118 sentences remaining.
4
Filtering instances by relevant words...
['ADJ']
{word: delivers, index: 4, tag: NOUN, rank_score: -0.6409180164337158}
{word: than, index: 7, tag: ADP, rank_score: -0.20038652420043945}
{word: charm, index: 3, tag: NOUN, rank_score: -0.03175818920135498}
{word: coal, index: 10, tag: NOUN, rank_score: -0.031139791011810303}
 
['ADJ']
{word: and, index: 3, tag: CONJ, rank_score: -0.4954366385936737}
{word: cd, index: 9, tag: NOUN, rank_score: -0.3499169647693634}
{word: film, index: 2, tag: NOUN, rank_score: -0.3317224979400635}
{word: skip, index: 0, tag: VERB, rank_score: -0.28074198961257935}
 
['ADJ']
{word: lapses, index: 4, tag: VERB, rank_score: -0.885071337223053}
{word: absurd, index: 9, tag: NOUN, rank_score: -0.0652279257774353}
{word: involving, index: 0, tag: VERB, rank_score: -0.03035956621170044}
{word: into, index: 7, tag: ADP, rank_score: -0.02604728937149048}


In [25]:
df0 = tg0.to_dataframe()
df0

Unnamed: 0,label,original_text,masked_text,template_text
0,1,intelligent caustic take on a great writer and dubious human being .,{mask} caustic take on a great writer and dubious {mask} being .,{pos_adj} caustic take on a great writer and dubious {neg_adj} being .
1,0,a long dull procession of despair set to cello music culled from a minimalist funeral .,a long {mask} procession of despair set to cello music culled from a {mask} funeral .,a long {neg_adj} procession of despair set to cello music culled from a {neg_adj} funeral .
2,0,as comedic spotlights go notorious c .,as {mask} spotlights go {mask} c .,as {neg_adj} spotlights go {neg_adj} c .
3,0,an empty exercise a florid but ultimately vapid crime melodrama with lots of surface flash but little emotional resonance .,an empty exercise a florid but ultimately {mask} crime melodrama with lots of surface flash but {mask} emotional resonance .,an empty exercise a florid but ultimately {neg_adj} crime melodrama with lots of surface flash but {neg_adj} emotional resonance .
4,0,hugh grant's act is so consuming that sometimes it's difficult to tell who the other actors in the movie are .,hugh grant 's act is so {mask} that sometimes it 's {mask} to tell who the other actors in the movie are .,hugh grant 's act is so {neg_adj} that sometimes it 's {neg_adj} to tell who the other actors in the movie are .
5,1,more concerned with overall feelings broader ideas and open-ended questions than concrete story and definitive answers soderbergh's solaris is a gorgeous and deceptively minimalist cinematic tone poem .,more concerned with overall feelings broader ideas and open-ended questions than concrete story and definitive answers soderbergh 's solaris is a {mask} and {mask} minimalist cinematic tone poem .,more concerned with overall feelings broader ideas and open-ended questions than concrete story and definitive answers soderbergh 's solaris is a {pos_adj} and {pos_adj} minimalist cinematic tone poem .
6,0,as well-acted and well-intentioned as all or nothing is however the film comes perilously close to being too bleak too pessimistic and too unflinching for its own good .,as well-acted and well-intentioned as all or nothing is however the film comes perilously close to being too bleak too {mask} and too {mask} for its own good .,as well-acted and well-intentioned as all or nothing is however the film comes perilously close to being too bleak too {neg_adj} and too {pos_adj} for its own good .
7,0,a dreary incoherent self-indulgent mess of a movie in which a bunch of pompous windbags drone on inanely for two hours .,a dreary incoherent {mask} mess of a movie in which a bunch of {mask} windbags drone on inanely for two hours .,a dreary incoherent {neg_adj} mess of a movie in which a bunch of {neg_adj} windbags drone on inanely for two hours .
8,1,scooby dooby doo / and shaggy too / you both look and sound great .,scooby dooby doo / and shaggy too / you both look and {mask} {mask} .,scooby dooby doo / and shaggy too / you both look and {pos_adj} {pos_adj} .
9,1,whereas the extremely competent hitman films such as pulp fiction and get shorty resonate a sardonic verve to their caustic purpose for existing who is cletis tout ?,whereas the extremely competent hitman films such as pulp fiction and get shorty resonate a {mask} verve to their {mask} purpose for existing who is cletis tout ?,whereas the extremely competent hitman films such as pulp fiction and get shorty resonate a {neg_adj} verve to their {neg_adj} purpose for existing who is cletis tout ?


In [26]:
df1 = tg1.to_dataframe()
df1

Unnamed: 0,label,original_text,masked_text,template_text


In [27]:
df2 = tg2.to_dataframe()
df2

Unnamed: 0,label,original_text,masked_text,template_text


In [28]:
df3 = tg3.to_dataframe()
df3

Unnamed: 0,label,original_text,masked_text,template_text
0,1,a fascinating dark thriller that keeps you hooked on the delicious pulpiness of its lurid fiction .,a {mask} dark thriller that keeps you hooked on the {mask} pulpiness of its lurid fiction .,a {pos_adj} dark thriller that keeps you hooked on the {pos_adj} pulpiness of its lurid fiction .
1,0,an empty exercise a florid but ultimately vapid crime melodrama with lots of surface flash but little emotional resonance .,an {mask} exercise a florid but ultimately {mask} crime melodrama with lots of surface flash but little emotional resonance .,an {neg_adj} exercise a florid but ultimately {neg_adj} crime melodrama with lots of surface flash but little emotional resonance .
2,1,scooby dooby doo / and shaggy too / you both look and sound great .,scooby dooby doo / and shaggy too {mask} you both look and sound {mask} .,scooby dooby doo / and shaggy too {neg_adj} you both look and sound {pos_adj} .
3,0,is an inexpressible and drab wannabe looking for that exact niche .,is an {mask} and {mask} wannabe looking for that exact niche .,is an {neg_adj} and {neg_adj} wannabe looking for that exact niche .


In [29]:
df4 = tg4.to_dataframe()
df4

Unnamed: 0,label,original_text,masked_text,template_text
0,1,by no means a great movie but it is a refreshingly forthright one .,by no means a {mask} movie but it is a refreshingly {mask} one .,by no means a {pos_adj} movie but it is a refreshingly {pos_adj} one .
1,0,cacoyannis is perhaps too effective in creating an atmosphere of dust-caked stagnation and labored gentility .,cacoyannis is perhaps too {mask} in creating an atmosphere of {mask} stagnation and labored gentility .,cacoyannis is perhaps too {pos_adj} in creating an atmosphere of {neg_adj} stagnation and labored gentility .
2,1,how much you are moved by the emotional tumult of [fran?ois and mich?le's] relationship depends a lot on how interesting and likable you find them .,how {mask} you are moved by the {mask} tumult of [ fran ? ois and mich ? le 's ] relationship depends a lot on how interesting and likable you find them .,how {pos_adj} you are moved by the {pos_adj} tumult of [ fran ? ois and mich ? le 's ] relationship depends a lot on how interesting and likable you find them .
3,1,more concerned with overall feelings broader ideas and open-ended questions than concrete story and definitive answers soderbergh's solaris is a gorgeous and deceptively minimalist cinematic tone poem .,more concerned with overall feelings broader ideas and open-ended questions than concrete story and definitive answers soderbergh 's solaris is a {mask} and deceptively minimalist {mask} tone poem .,more concerned with overall feelings broader ideas and open-ended questions than concrete story and definitive answers soderbergh 's solaris is a {pos_adj} and deceptively minimalist {pos_adj} tone poem .
4,1,the charming result is festival in cannes .,the {mask} result is {mask} in cannes .,the {pos_adj} result is {pos_adj} in cannes .
5,0,as well-acted and well-intentioned as all or nothing is however the film comes perilously close to being too bleak too pessimistic and too unflinching for its own good .,as {mask} and well-intentioned as all or nothing is however the film comes perilously close to being too {mask} too pessimistic and too unflinching for its own good .,as {pos_adj} and well-intentioned as all or nothing is however the film comes perilously close to being too {pos_adj} too pessimistic and too unflinching for its own good .
6,1,one of the best examples of how to treat a subject you're not fully aware is being examined much like a photo of yourself you didn't know was being taken .,one of the {mask} examples of how to treat a subject you 're not fully {mask} is being examined much like a photo of yourself you did n't know was being taken .,one of the {pos_adj} examples of how to treat a subject you 're not fully {pos_adj} is being examined much like a photo of yourself you did n't know was being taken .
7,0,is an inexpressible and drab wannabe looking for that exact niche .,is an {mask} and {mask} wannabe looking for that exact niche .,is an {neg_adj} and {neg_adj} wannabe looking for that exact niche .


In [30]:
tg0.lexicons

{'pos_adj': ['intelligent',
  'great',
  'unflinching',
  'sound',
  'deceptively',
  'gorgeous'],
 'neg_adj': ['pessimistic',
  'dull',
  'self-indulgent',
  'comedic',
  'drab',
  'human',
  'inexpressible',
  'sardonic',
  'minimalist',
  'vapid',
  'consuming',
  'caustic',
  'notorious',
  'little',
  'difficult',
  'pompous']}

In [31]:
tg1.lexicons

{'pos_adj': [], 'neg_adj': []}

In [32]:
tg2.lexicons

{'pos_adj': [], 'neg_adj': []}

In [33]:
tg3.lexicons

{'pos_adj': ['great', 'delicious', 'fascinating'],
 'neg_adj': ['drab', 'inexpressible', 'empty', '/', 'vapid']}

In [34]:
tg4.lexicons

{'pos_adj': ['aware',
  'effective',
  'charming',
  'great',
  'emotional',
  'cinematic',
  'much',
  'well-acted',
  'forthright',
  'bleak',
  'best',
  'gorgeous',
  'festival'],
 'neg_adj': ['dust-caked', 'inexpressible', 'drab']}

#### Execution time for 100 instances: 4m 17.8s

## Checklist

#### Model BERT

In [35]:
import checklist
from checklist.editor import Editor
from checklist.test_suite import TestSuite
from checklist.test_types import MFT

In [36]:
lexicons = tg0.lexicons
templates0 = tg0.template_texts
masked = tg0.masked_texts
labels = [sent.prediction.label for sent in tg0.sentences]

editor = Editor()
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [37]:
for template, label, i in zip(templates0, labels, range(len(templates0))):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary", 
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary")) 

In [38]:
suite.run(model_bert.predict, overwrite=True)

Running Test: MFT with vocabullary - template1
Predicting 96 examples


Running Test: MFT with vocabullary - template2
Predicting 16 examples
Running Test: MFT with vocabullary - template3
Predicting 16 examples
Running Test: MFT with vocabullary - template4
Predicting 16 examples
Running Test: MFT with vocabullary - template5
Predicting 16 examples
Running Test: MFT with vocabullary - template6
Predicting 6 examples
Running Test: MFT with vocabullary - template7
Predicting 96 examples
Running Test: MFT with vocabullary - template8
Predicting 16 examples
Running Test: MFT with vocabullary - template9
Predicting 6 examples
Running Test: MFT with vocabullary - template10
Predicting 16 examples
Running Test: MFT with vocabullary - template11
Predicting 16 examples
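The example counts in this log are consistent with the lexicon sizes of `tg0` (6 positive and 16 negative adjectives) if a placeholder that appears twice in a template receives the same word in both positions: 6 x 16 = 96 for a template mixing `{pos_adj}` and `{neg_adj}`, 16 for one using only `{neg_adj}`, and 6 for one using only `{pos_adj}`. The sketch below mimics that behavior with the standard library only; it is an illustration of the counting, not CheckList's implementation, and `expand_template` is a hypothetical helper.

```python
from itertools import product
from string import Formatter


def expand_template(template, lexicons):
    # Distinct placeholder names, in order of first appearance.
    fields = []
    for _, name, _, _ in Formatter().parse(template):
        if name and name not in fields:
            fields.append(name)
    # One word per distinct placeholder; repeated placeholders share that word.
    combos = product(*(lexicons[field] for field in fields))
    return [template.format(**dict(zip(fields, combo))) for combo in combos]


lexicons = {
    "pos_adj": ["intelligent", "great", "unflinching", "sound", "deceptively", "gorgeous"],
    "neg_adj": ["pessimistic", "dull", "self-indulgent", "comedic", "drab", "human",
                "inexpressible", "sardonic", "minimalist", "vapid", "consuming",
                "caustic", "notorious", "little", "difficult", "pompous"],
}

mixed = expand_template(
    "{pos_adj} caustic take on a great writer and dubious {neg_adj} being .", lexicons)
same = expand_template(
    "a long {neg_adj} procession of despair set to cello music culled from a {neg_adj} funeral .",
    lexicons)
print(len(mixed), len(same))  # 96 16
```

This also explains generated sentences in which the same adjective appears twice, such as those listed among the example fails in the summary.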


In [39]:
suite.summary()

Vocabullary

Test: MFT with vocabullary - template1
Test cases:      96
Fails (rate):    25 (26.0%)

Example fails:
0.0 deceptively caustic take on a great writer and dubious pompous being .
----
0.0 deceptively caustic take on a great writer and dubious minimalist being .
----
0.0 deceptively caustic take on a great writer and dubious little being .
----


Test: MFT with vocabullary - template2
Test cases:      16
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template3
Test cases:      16
Fails (rate):    3 (18.8%)

Example fails:
0.9 as notorious spotlights go notorious c .
----
0.9 as human spotlights go human c .
----
0.8 as little spotlights go little c .
----


Test: MFT with vocabullary - template4
Test cases:      16
Fails (rate):    2 (12.5%)

Example fails:
0.8 an empty exercise a florid but ultimately human crime melodrama with lots of surface flash but human emotional resonance .
----
1.0 an empty exercise a florid but ultimately comedic crime melodrama with lots

In [40]:
suite.save('./suites/posneg-approach5-bert.suite')

#### Model ALBERT

In [41]:
lexicons = tg1.lexicons
templates1 = tg1.template_texts
masked = tg1.masked_texts
labels = [sent.prediction.label for sent in tg1.sentences]

editor = Editor()
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [42]:
for template, label, i in zip(templates1, labels, range(len(templates1))):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary", 
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary")) 

In [43]:
suite.run(model_albert.predict, overwrite=True)

In [44]:
suite.summary()

In [45]:
suite.save('./suites/posneg-approach5-albert.suite')

#### Model DistilBERT

In [46]:
lexicons = tg2.lexicons
templates2 = tg2.template_texts
masked = tg2.masked_texts
labels = [sent.prediction.label for sent in tg2.sentences]

editor = Editor()
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [47]:
for template, label, i in zip(templates2, labels, range(len(templates2))):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary", 
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary")) 

In [48]:
suite.run(model_distilbert.predict, overwrite=True)

In [49]:
suite.summary()

In [50]:
suite.save('./suites/posneg-approach5-distilbert.suite')

#### Model RoBERTa

In [51]:
lexicons = tg3.lexicons
templates3 = tg3.template_texts
masked = tg3.masked_texts
labels = [sent.prediction.label for sent in tg3.sentences]

editor = Editor()
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [52]:
for template, label, i in zip(templates3, labels, range(len(templates3))):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary", 
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary")) 

In [53]:
suite.run(model_roberta.predict, overwrite=True)

Running Test: MFT with vocabullary - template1
Predicting 3 examples


Running Test: MFT with vocabullary - template2
Predicting 5 examples
Running Test: MFT with vocabullary - template3
Predicting 15 examples
Running Test: MFT with vocabullary - template4
Predicting 5 examples


In [54]:
suite.summary()

Vocabullary

Test: MFT with vocabullary - template1
Test cases:      3
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template2
Test cases:      5
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template3
Test cases:      15
Fails (rate):    12 (80.0%)

Example fails:
0.1 scooby dooby doo / and shaggy too vapid you both look and sound delicious .
----
0.2 scooby dooby doo / and shaggy too drab you both look and sound delicious .
----
0.0 scooby dooby doo / and shaggy too empty you both look and sound great .
----


Test: MFT with vocabullary - template4
Test cases:      5
Fails (rate):    0 (0.0%)
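The per-test numbers in this summary can be combined into a single failure rate for the suite. The sketch below does the arithmetic on the four (test cases, fails) pairs reported above; the `failure_rate` helper is hypothetical and not part of CheckList.

```python
def failure_rate(results):
    # `results` is a list of (test_cases, fails) pairs, one per test.
    total = sum(cases for cases, _ in results)
    fails = sum(failed for _, failed in results)
    rate = 100.0 * fails / total if total else 0.0
    return fails, total, rate


# Values copied from the four tests in the summary above.
roberta_results = [(3, 0), (5, 0), (15, 12), (5, 0)]
fails, total, rate = failure_rate(roberta_results)
print(f"{fails}/{total} failed ({rate:.1f}%)")  # 12/28 failed (42.9%)
```

All failures come from template3, where 12 of its 15 cases fail.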






In [55]:
suite.save('./suites/posneg-approach5-roberta.suite')

#### Model XLNet

In [56]:
lexicons = tg4.lexicons
templates4 = tg4.template_texts
masked = tg4.masked_texts
labels = [sent.prediction.label for sent in tg4.sentences]

editor = Editor()
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [57]:
for template, label, i in zip(templates4, labels, range(len(templates4))):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary", 
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary")) 

In [58]:
suite.run(model_xlnet.predict, overwrite=True)

Running Test: MFT with vocabullary - template1
Predicting 13 examples


Running Test: MFT with vocabullary - template2
Predicting 39 examples
Running Test: MFT with vocabullary - template3
Predicting 13 examples
Running Test: MFT with vocabullary - template4
Predicting 13 examples
Running Test: MFT with vocabullary - template5
Predicting 13 examples
Running Test: MFT with vocabullary - template6
Predicting 13 examples
Running Test: MFT with vocabullary - template7
Predicting 13 examples
Running Test: MFT with vocabullary - template8
Predicting 3 examples


In [59]:
suite.summary()

Vocabullary

Test: MFT with vocabullary - template1
Test cases:      13
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template2
Test cases:      39
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template3
Test cases:      13
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template4
Test cases:      13
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template5
Test cases:      13
Fails (rate):    4 (30.8%)

Example fails:
0.0 the bleak result is bleak in cannes .
----
0.1 the much result is much in cannes .
----
0.4 the aware result is aware in cannes .
----


Test: MFT with vocabullary - template6
Test cases:      13
Fails (rate):    1 (7.7%)

Example fails:
0.6 as effective and well-intentioned as all or nothing is however the film comes perilously close to being too effective too pessimistic and too unflinching for its own good .
----


Test: MFT with vocabullary - template7
Test cases:      13
Fails (rate):    4 (30.8%)

Example fails:
0.2 one o

In [60]:
suite.save('./suites/posneg-approach5-xlnet.suite')

# Loading the test suite

In [5]:
from checklist.test_suite import TestSuite
suite = TestSuite.from_file('./suites/posneg-approach5-roberta.suite')

suite.visual_summary_table()

Please wait as we prepare the table data...


SuiteSummarizer(stats={'npassed': 0, 'nfailed': 0, 'nfiltered': 0}, test_infos=[{'name': 'Test: MFT with vocab…