# Approach 3

This notebook uses Approach 3 to generate templates, focusing on positive and negative templates. One possible application is testing the *Vocabulary* linguistic capability with an **MFT** (Minimum Functionality Test).

The steps of this approach are:

1. Split each instance into sentences
2. Classify the sentences using the *Oracle*
3. Keep the sentences classified unanimously
4. Keep the sentences predicted with high confidence
5. Rank the words of each sentence
6. Keep the sentences whose relevant words (verbs or adjectives) are well ranked
7. Keep the sentences whose relevant words are predicted with high confidence
8. Replace the relevant words with masks
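Steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the `PosNegTemplateGeneratorApp3` implementation: the function name `filter_unanimous_confident`, the array layout, and the rule that every oracle model must exceed the threshold are assumptions. The 0.9 threshold is the value printed in this notebook's logs.

```python
import numpy as np

def filter_unanimous_confident(probs, threshold=0.9):
    """Return indices of sentences that all oracle models label identically
    and with confidence above `threshold`.

    probs: array of shape (n_models, n_sentences, n_classes) holding the
    class probabilities predicted by each oracle model (assumed layout).
    """
    probs = np.asarray(probs)
    labels = probs.argmax(axis=2)      # predicted label per model and sentence
    confidence = probs.max(axis=2)     # probability of that predicted label

    # Step 3: keep sentences on which every model predicts the same label.
    unanimous = (labels == labels[0]).all(axis=0)
    # Step 4: keep sentences where every model is confident (assumed rule).
    confident = (confidence > threshold).all(axis=0)

    return np.flatnonzero(unanimous & confident)

# Toy example: 2 oracle models, 3 sentences, 2 classes.
toy_probs = [
    [[0.95, 0.05], [0.40, 0.60], [0.08, 0.92]],
    [[0.97, 0.03], [0.70, 0.30], [0.45, 0.55]],
]
kept = filter_unanimous_confident(toy_probs)
print(kept)
```

In the toy data, the second sentence is dropped because the models disagree, and the third because one model is not confident enough.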

In [1]:
%config Completer.use_jedi = False
import sys
sys.path.append('../../')

## Loading the dataset, the target model, and the auxiliary models

In [2]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

movie_reviews_rt_df = pd.read_csv('./data/data-rt-100samples.csv')
movie_reviews_rt_df.head(5)

Unnamed: 0,label,text,words
0,1,allen's underestimated charm delivers more goodies than lumps of coal .,11
1,0,skip the film and buy the philip glass soundtrack cd .,11
2,0,involving at times but lapses quite casually into the absurd .,11
3,0,while hoffman's performance is great the subject matter goes nowhere .,11
4,1,a flick about our infantilized culture that isn't entirely infantile .,11


In [3]:
import re
import numpy as np
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def pre_proccess(text):
    # Lowercase the text, then replace quotes, digits, and most ASCII punctuation with spaces
    text = text.lower()
    text = re.sub(r'["\',!-.:-@0-9/]', ' ', text)
    return text

# Wrapper to adapt output format
class SentimentAnalisysModelWrapper:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        
    def __predict(self, text_input):
        text_preprocessed = pre_proccess(text_input)
        tokenized = self.tokenizer(text_preprocessed, padding=True, truncation=True, max_length=512, 
                                    add_special_tokens = True, return_tensors="pt")
        
        tensor_logits = self.model(**tokenized)
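        # Normalize over the class dimension; passing dim explicitly avoids PyTorch's implicit-dim warning
        prob = softmax(tensor_logits[0], dim=-1).detach().numpy()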
        pred = np.argmax(prob)
        
        return pred, prob
    
    def predict_label(self, text_inputs):
        return self.predict(text_inputs)[0]
        
    def predict_proba(self, text_inputs):
        return self.predict(text_inputs)[1]
        
    def predict(self, text_inputs):
        if isinstance(text_inputs, str):
            text_inputs = [text_inputs]
        
        preds = []
        probs = []

        for text_input in text_inputs:
            pred, prob = self.__predict(text_input)
            preds.append(pred)
            probs.append(prob[0])

        return np.array(preds), np.array(probs) # ([0, 1], [[0.99, 0.01], [0.03, 0.97]])

# Auxiliary function to load and wrap a model from Hugging Face
def load_model(model_name):
    print(f'Loading model {model_name}...')
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    return SentimentAnalisysModelWrapper(model, tokenizer)

# Hugging Face hosted model names 
movie_reviews_models = {
    'bert': 'textattack/bert-base-uncased-rotten-tomatoes', 
    'albert': 'textattack/albert-base-v2-rotten-tomatoes', 
    'distilbert': 'textattack/distilbert-base-uncased-rotten-tomatoes', 
    'roberta': 'textattack/roberta-base-rotten-tomatoes', 
    'xlnet': 'textattack/xlnet-base-cased-rotten-tomatoes'
}

In [4]:
m0 = load_model(movie_reviews_models['bert'])
m1 = load_model(movie_reviews_models['albert'])
m2 = load_model(movie_reviews_models['distilbert'])
m3 = load_model(movie_reviews_models['roberta'])
m4 = load_model(movie_reviews_models['xlnet'])

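# Oracles: for each target model, the other four models act as the oracle
models_1 = [m1, m2, m3, m4]  # oracle for BERT
models_2 = [m0, m2, m3, m4]  # oracle for ALBERT
models_3 = [m0, m1, m3, m4]  # oracle for DistilBERT
models_4 = [m0, m1, m2, m4]  # oracle for RoBERTa
models_5 = [m0, m1, m2, m3]  # oracle for XLNet
# Target models
model_bert = m0
model_albert = m1
model_distilbert = m2
model_roberta = m3
model_xlnet = m4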

Loading model textattack/bert-base-uncased-rotten-tomatoes...
Loading model textattack/albert-base-v2-rotten-tomatoes...
Loading model textattack/distilbert-base-uncased-rotten-tomatoes...
Loading model textattack/roberta-base-rotten-tomatoes...


Some weights of the model checkpoint at textattack/roberta-base-rotten-tomatoes were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Loading model textattack/xlnet-base-cased-rotten-tomatoes...


# Generating the templates
The word-ranking method used in `PosNegTemplateGenerator` is the Replace-1 Score.
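The notebook does not define the Replace-1 Score. One plausible reading, assumed here, is that each word is replaced in turn by a placeholder token and the score is the resulting drop in the probability of the originally predicted class. The function `replace1_scores`, the `[UNK]` placeholder, and the toy classifier below are all hypothetical. Ranking by absolute value is also an assumption, chosen because it matches the order of the scores printed in this notebook's logs.

```python
import numpy as np

def replace1_scores(words, predict_proba, placeholder="[UNK]"):
    """Score each word by how much the predicted-class probability drops
    when that single word is replaced by `placeholder` (assumed definition)."""
    base = predict_proba(" ".join(words))
    label = int(np.argmax(base))
    scores = []
    for i in range(len(words)):
        perturbed = words[:i] + [placeholder] + words[i + 1:]
        prob = predict_proba(" ".join(perturbed))
        scores.append(float(base[label] - prob[label]))
    return scores

def toy_predict_proba(text):
    """Hypothetical stand-in classifier: more positive when 'great' is present."""
    p_pos = 0.9 if "great" in text.split() else 0.5
    return np.array([1.0 - p_pos, p_pos])

words = "a great movie".split()
scores = replace1_scores(words, toy_predict_proba)
# Rank words from the largest to the smallest score magnitude.
ranked = sorted(zip(words, scores), key=lambda pair: abs(pair[1]), reverse=True)
print(ranked)
```

With the toy classifier, only replacing `great` changes the prediction, so it is ranked first.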

In [5]:
from template_generator.tasks.sentiment_analisys import PosNegTemplateGeneratorApp3

tg0 = PosNegTemplateGeneratorApp3(model_bert, models_1)
tg1 = PosNegTemplateGeneratorApp3(model_albert, models_2)
tg2 = PosNegTemplateGeneratorApp3(model_distilbert, models_3)
tg3 = PosNegTemplateGeneratorApp3(model_roberta, models_4)
tg4 = PosNegTemplateGeneratorApp3(model_xlnet, models_5)

### Initial number of instances: 5

In [6]:
# Sampling instances
np.random.seed(220)
n_instances = 5
df_sampled = movie_reviews_rt_df.sample(n_instances)

instances = [x for x in df_sampled['text'].values]

In [7]:
templates0 = tg0.generate_templates(instances, n_masks=2, ranked_words_count=4)
templates1 = tg1.generate_templates(instances, n_masks=2, ranked_words_count=4)
templates2 = tg2.generate_templates(instances, n_masks=2, ranked_words_count=4)
templates3 = tg3.generate_templates(instances, n_masks=2, ranked_words_count=4)
templates4 = tg4.generate_templates(instances, n_masks=2, ranked_words_count=4)

Converting texts to sentences...
:: 6 sentences were generated.
Predicting inputs...




:: Sentence predictions done.
Filtering instances classified unanimously...
:: 3 sentences remaining.
Filtering instances by classification score greater than 0.9
:: 2 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
4
Filtering instances by relevant words...
['ADJ']
{word: of, index: 17, tag: ADP, rank_score: -0.00011205673217773438}
{word: wonders, index: 16, tag: NOUN, rank_score: -7.635354995727539e-05}
{word: elegantly, index: 0, tag: ADV, rank_score: -5.036592483520508e-05}
{word: detailed, index: 15, tag: ADJ, rank_score: -1.6689300537109375e-05}
 
['ADJ']
{word: refreshingly, index: 10, tag: ADV, rank_score: 0.0010156035423278809}
{word: no, index: 1, tag: DET, rank_score: 0.000680387020111084}
{word: by, index: 0, tag: ADP, rank_score: 0.0005499720573425293}
{word: forthright, index: 11, tag: ADJ, rank_score: -0.0003120899200439453}
 
:: 0 sentences remaining.
Filtering instances by relevant words classification score greater than 0.9
:: 0 sent
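
The relevant-word filter and the masking step reflected in the log above can be sketched as follows. This is an illustration under assumptions, not the library's code: it assumes a sentence is kept only when at least `n_masks` of its `ranked_words_count` top-ranked words carry a relevant tag (here `ADJ`), and that those words are then replaced by `{mask}`. The function `mask_relevant_words` and the toy ranking are hypothetical.

```python
def mask_relevant_words(tokens, ranked, n_masks=2, ranked_words_count=4,
                        relevant_tags=("ADJ",)):
    """Return the masked sentence, or None when the sentence is filtered out.

    tokens: list of words in the sentence.
    ranked: list of (index, tag) pairs ordered from most to least relevant.
    """
    top = ranked[:ranked_words_count]
    relevant = [idx for idx, tag in top if tag in relevant_tags]
    if len(relevant) < n_masks:
        return None  # not enough well-ranked relevant words (assumed rule)

    to_mask = set(relevant[:n_masks])
    masked_tokens = ["{mask}" if i in to_mask else tok
                     for i, tok in enumerate(tokens)]
    return " ".join(masked_tokens)

tokens = "a charming and clever film".split()
# Hypothetical ranking with two adjectives among the top four words.
masked = mask_relevant_words(
    tokens, [(1, "ADJ"), (4, "NOUN"), (3, "ADJ"), (2, "CONJ")])
# Only one adjective among the top-ranked words: the sentence is dropped.
dropped = mask_relevant_words(
    tokens, [(1, "ADJ"), (4, "NOUN"), (2, "CONJ"), (0, "DET")])
print(masked)
print(dropped)
```

Under this assumed rule, both sentences shown in the log would be dropped, since each has a single `ADJ` among its four top-ranked words.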

#### Execution time for 5 instances: 9.7s

In [8]:
df0 = tg0.to_dataframe()
df0

Unnamed: 0,label,original_text,masked_text,template_text


In [9]:
df1 = tg1.to_dataframe()
df1

Unnamed: 0,label,original_text,masked_text,template_text


In [10]:
df2 = tg2.to_dataframe()
df2

Unnamed: 0,label,original_text,masked_text,template_text


In [11]:
df3 = tg3.to_dataframe()
df3

Unnamed: 0,label,original_text,masked_text,template_text


In [12]:
df4 = tg4.to_dataframe()
df4

Unnamed: 0,label,original_text,masked_text,template_text
0,1,by no means a great movie but it is a refreshingly forthright one .,by no means a {mask} movie but it is a refreshingly {mask} one .,by no means a {pos_adj} movie but it is a refreshingly {pos_adj} one .


In [13]:
tg0.lexicons

{'pos_adj': [], 'neg_adj': []}

In [14]:
tg1.lexicons

{'pos_adj': [], 'neg_adj': []}

In [15]:
tg2.lexicons

{'pos_adj': [], 'neg_adj': []}

In [16]:
tg3.lexicons

{'pos_adj': [], 'neg_adj': []}

In [17]:
tg4.lexicons

{'pos_adj': ['forthright', 'great'], 'neg_adj': []}

### Initial number of instances: 100

In [18]:
# Using all 100 instances
instances = [x for x in movie_reviews_rt_df['text'].values]

In [19]:
tg0 = PosNegTemplateGeneratorApp3(model_bert, models_1)
tg1 = PosNegTemplateGeneratorApp3(model_albert, models_2)
tg2 = PosNegTemplateGeneratorApp3(model_distilbert, models_3)
tg3 = PosNegTemplateGeneratorApp3(model_roberta, models_4)
tg4 = PosNegTemplateGeneratorApp3(model_xlnet, models_5)

templates0 = tg0.generate_templates(instances, n_masks=2, ranked_words_count=4)

Converting texts to sentences...
:: 134 sentences were generated.
Predicting inputs...




:: Sentence predictions done.
Filtering instances classified unanimously...
:: 96 sentences remaining.
Filtering instances by classification score greater than 0.9
:: 64 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
4
Filtering instances by relevant words...
['ADJ']
{word: lapses, index: 4, tag: VERB, rank_score: -0.4460386037826538}
{word: into, index: 7, tag: ADP, rank_score: -0.0017092227935791016}
{word: times, index: 2, tag: NOUN, rank_score: 0.0014830231666564941}
{word: the, index: 8, tag: DET, rank_score: 0.001171112060546875}
 
['ADJ']
{word: with, index: 7, tag: ADP, rank_score: -0.026811659336090088}
{word: and, index: 9, tag: CONJ, rank_score: -0.01743096113204956}
{word: vibrance, index: 8, tag: NOUN, rank_score: -0.006856083869934082}
{word: warmth, index: 10, tag: NOUN, rank_score: -0.004583418369293213}
 
['ADJ']
{word: does, index: 1, tag: VERB, rank_score: -0.76806640625}
{word: n't, index: 2, tag: ADV, rank_score: -0.41215908527374

In [20]:
templates1 = tg1.generate_templates(instances, n_masks=2, ranked_words_count=4)

Converting texts to sentences...
:: 134 sentences were generated.
Predicting inputs...




:: Sentence predictions done.
Filtering instances classified unanimously...
:: 95 sentences remaining.
Filtering instances by classification score greater than 0.9
:: 74 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
4
Filtering instances by relevant words...
['ADJ']
{word: lapses, index: 4, tag: VERB, rank_score: -0.5091991424560547}
{word: at, index: 1, tag: ADP, rank_score: 0.03710907697677612}
{word: the, index: 8, tag: DET, rank_score: 0.03633362054824829}
{word: involving, index: 0, tag: VERB, rank_score: 0.02584761381149292}
 
['ADJ']
{word: warmth, index: 10, tag: NOUN, rank_score: -0.28011375665664673}
{word: and, index: 9, tag: CONJ, rank_score: -0.20466208457946777}
{word: along, index: 6, tag: ADP, rank_score: 0.06337320804595947}
{word: minutes, index: 2, tag: NOUN, rank_score: 0.04299193620681763}
 
['ADJ']
{word: n't, index: 2, tag: ADV, rank_score: -0.24675434827804565}
{word: sufficient, index: 4, tag: ADJ, rank_score: -0.036492288112

In [21]:
templates2 = tg2.generate_templates(instances, n_masks=2, ranked_words_count=4)

Converting texts to sentences...
:: 134 sentences were generated.
Predicting inputs...




:: Sentence predictions done.
Filtering instances classified unanimously...
:: 109 sentences remaining.
Filtering instances by classification score greater than 0.9
:: 82 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
4
Filtering instances by relevant words...
['ADJ']
{word: delivers, index: 4, tag: NOUN, rank_score: -0.4025241732597351}
{word: goodies, index: 6, tag: NOUN, rank_score: -0.3876534700393677}
{word: 's, index: 1, tag: PRT, rank_score: -0.3092106580734253}
{word: underestimated, index: 2, tag: ADJ, rank_score: 0.27075690031051636}
 
['ADJ']
{word: lapses, index: 4, tag: VERB, rank_score: -0.20442330837249756}
{word: the, index: 8, tag: DET, rank_score: 0.03566247224807739}
{word: times, index: 2, tag: NOUN, rank_score: 0.031717002391815186}
{word: into, index: 7, tag: ADP, rank_score: -0.027649283409118652}
 
['ADJ']
{word: while, index: 0, tag: ADP, rank_score: 0.24310076236724854}
{word: performance, index: 3, tag: NOUN, rank_score: -0.

In [22]:
templates3 = tg3.generate_templates(instances, n_masks=2, ranked_words_count=4)

Converting texts to sentences...
:: 134 sentences were generated.
Predicting inputs...




:: Sentence predictions done.
Filtering instances classified unanimously...
:: 96 sentences remaining.
Filtering instances by classification score greater than 0.9
:: 71 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
4
Filtering instances by relevant words...
['ADJ']
{word: lapses, index: 4, tag: VERB, rank_score: -0.7677843570709229}
{word: casually, index: 6, tag: ADV, rank_score: -0.1309245228767395}
{word: into, index: 7, tag: ADP, rank_score: -0.07934045791625977}
{word: involving, index: 0, tag: VERB, rank_score: 0.04741638898849487}
 
['ADJ']
{word: intelligent, index: 0, tag: ADJ, rank_score: -0.38076797127723694}
{word: on, index: 3, tag: ADP, rank_score: -0.36160558462142944}
{word: take, index: 2, tag: NOUN, rank_score: -0.35228511691093445}
{word: writer, index: 6, tag: NOUN, rank_score: 0.24448877573013306}
 
['ADJ']
{word: warmth, index: 10, tag: NOUN, rank_score: -0.015053033828735352}
{word: and, index: 9, tag: CONJ, rank_score: -0.008

In [23]:
templates4 = tg4.generate_templates(instances, n_masks=2, ranked_words_count=4)

Converting texts to sentences...
:: 134 sentences were generated.
Predicting inputs...




:: Sentence predictions done.
Filtering instances classified unanimously...
:: 96 sentences remaining.
Filtering instances by classification score greater than 0.9
:: 69 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
4
Filtering instances by relevant words...
['ADJ']
{word: lapses, index: 4, tag: VERB, rank_score: -0.885071337223053}
{word: absurd, index: 9, tag: NOUN, rank_score: -0.0652279257774353}
{word: involving, index: 0, tag: VERB, rank_score: -0.03035956621170044}
{word: into, index: 7, tag: ADP, rank_score: -0.02604728937149048}
 
['ADJ']
{word: zings, index: 5, tag: NOUN, rank_score: -0.011150479316711426}
{word: warmth, index: 10, tag: NOUN, rank_score: -0.005404651165008545}
{word: and, index: 9, tag: CONJ, rank_score: -0.0021588802337646484}
{word: with, index: 7, tag: ADP, rank_score: -0.001538395881652832}
 
['ADJ']
{word: n't, index: 2, tag: ADV, rank_score: -0.1689351201057434}
{word: simply, index: 0, tag: ADV, rank_score: -0.006873

In [24]:
df0 = tg0.to_dataframe()
df0

Unnamed: 0,label,original_text,masked_text,template_text
0,0,a long dull procession of despair set to cello music culled from a minimalist funeral .,a long {mask} procession of despair set to cello music culled from a {mask} funeral .,a long {neg_adj} procession of despair set to cello music culled from a {neg_adj} funeral .
1,0,an empty exercise a florid but ultimately vapid crime melodrama with lots of surface flash but little emotional resonance .,an empty exercise a florid but ultimately {mask} crime melodrama with lots of surface flash but {mask} emotional resonance .,an empty exercise a florid but ultimately {neg_adj} crime melodrama with lots of surface flash but {neg_adj} emotional resonance .
2,0,hugh grant's act is so consuming that sometimes it's difficult to tell who the other actors in the movie are .,hugh grant 's act is so {mask} that sometimes it 's {mask} to tell who the other actors in the movie are .,hugh grant 's act is so {neg_adj} that sometimes it 's {neg_adj} to tell who the other actors in the movie are .
3,1,more concerned with overall feelings broader ideas and open-ended questions than concrete story and definitive answers soderbergh's solaris is a gorgeous and deceptively minimalist cinematic tone poem .,more concerned with overall feelings broader ideas and open-ended questions than concrete story and definitive answers soderbergh 's solaris is a {mask} and {mask} minimalist cinematic tone poem .,more concerned with overall feelings broader ideas and open-ended questions than concrete story and definitive answers soderbergh 's solaris is a {pos_adj} and {pos_adj} minimalist cinematic tone poem .
4,0,a dreary incoherent self-indulgent mess of a movie in which a bunch of pompous windbags drone on inanely for two hours .,a dreary incoherent {mask} mess of a movie in which a bunch of {mask} windbags drone on inanely for two hours .,a dreary incoherent {neg_adj} mess of a movie in which a bunch of {neg_adj} windbags drone on inanely for two hours .
5,0,is an inexpressible and drab wannabe looking for that exact niche .,is an {mask} and {mask} wannabe looking for that exact niche .,is an {neg_adj} and {neg_adj} wannabe looking for that exact niche .


In [25]:
df1 = tg1.to_dataframe()
df1

Unnamed: 0,label,original_text,masked_text,template_text


In [26]:
df2 = tg2.to_dataframe()
df2

Unnamed: 0,label,original_text,masked_text,template_text


In [27]:
df3 = tg3.to_dataframe()
df3

Unnamed: 0,label,original_text,masked_text,template_text
0,1,a fascinating dark thriller that keeps you hooked on the delicious pulpiness of its lurid fiction .,a {mask} dark thriller that keeps you hooked on the {mask} pulpiness of its lurid fiction .,a {pos_adj} dark thriller that keeps you hooked on the {pos_adj} pulpiness of its lurid fiction .
1,0,an empty exercise a florid but ultimately vapid crime melodrama with lots of surface flash but little emotional resonance .,an {mask} exercise a florid but ultimately {mask} crime melodrama with lots of surface flash but little emotional resonance .,an {neg_adj} exercise a florid but ultimately {neg_adj} crime melodrama with lots of surface flash but little emotional resonance .
2,0,is an inexpressible and drab wannabe looking for that exact niche .,is an {mask} and {mask} wannabe looking for that exact niche .,is an {neg_adj} and {neg_adj} wannabe looking for that exact niche .


In [28]:
df4 = tg4.to_dataframe()
df4

Unnamed: 0,label,original_text,masked_text,template_text
0,1,by no means a great movie but it is a refreshingly forthright one .,by no means a {mask} movie but it is a refreshingly {mask} one .,by no means a {pos_adj} movie but it is a refreshingly {pos_adj} one .
1,0,cacoyannis is perhaps too effective in creating an atmosphere of dust-caked stagnation and labored gentility .,cacoyannis is perhaps too {mask} in creating an atmosphere of {mask} stagnation and labored gentility .,cacoyannis is perhaps too {pos_adj} in creating an atmosphere of {neg_adj} stagnation and labored gentility .
2,1,more concerned with overall feelings broader ideas and open-ended questions than concrete story and definitive answers soderbergh's solaris is a gorgeous and deceptively minimalist cinematic tone poem .,more concerned with overall feelings broader ideas and open-ended questions than concrete story and definitive answers soderbergh 's solaris is a {mask} and deceptively minimalist {mask} tone poem .,more concerned with overall feelings broader ideas and open-ended questions than concrete story and definitive answers soderbergh 's solaris is a {pos_adj} and deceptively minimalist {pos_adj} tone poem .
3,1,the charming result is festival in cannes .,the {mask} result is {mask} in cannes .,the {pos_adj} result is {pos_adj} in cannes .
4,0,as well-acted and well-intentioned as all or nothing is however the film comes perilously close to being too bleak too pessimistic and too unflinching for its own good .,as {mask} and well-intentioned as all or nothing is however the film comes perilously close to being too {mask} too pessimistic and too unflinching for its own good .,as {pos_adj} and well-intentioned as all or nothing is however the film comes perilously close to being too {pos_adj} too pessimistic and too unflinching for its own good .
5,0,is an inexpressible and drab wannabe looking for that exact niche .,is an {mask} and {mask} wannabe looking for that exact niche .,is an {neg_adj} and {neg_adj} wannabe looking for that exact niche .


In [29]:
tg0.lexicons

{'pos_adj': ['deceptively', 'gorgeous'],
 'neg_adj': ['little',
  'self-indulgent',
  'consuming',
  'minimalist',
  'difficult',
  'pompous',
  'dull',
  'inexpressible',
  'drab',
  'vapid']}

In [30]:
tg1.lexicons

{'pos_adj': [], 'neg_adj': []}

In [31]:
tg2.lexicons

{'pos_adj': [], 'neg_adj': []}

In [32]:
tg3.lexicons

{'pos_adj': ['fascinating', 'delicious'],
 'neg_adj': ['inexpressible', 'vapid', 'empty', 'drab']}

In [33]:
tg4.lexicons

{'pos_adj': ['effective',
  'great',
  'charming',
  'cinematic',
  'gorgeous',
  'bleak',
  'festival',
  'well-acted',
  'forthright'],
 'neg_adj': ['drab', 'inexpressible', 'dust-caked']}

#### Execution time for 100 instances: 4m 17.8s

## Checklist

#### Model BERT

In [34]:
import checklist
from checklist.editor import Editor
from checklist.test_suite import TestSuite
from checklist.test_types import MFT

In [35]:
lexicons = tg0.lexicons
templates0 = tg0.template_texts
masked = tg0.masked_texts
labels = [sent.prediction.label for sent in tg0.sentences]

editor = Editor()
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [36]:
for template, label, i in zip(templates0, labels, range(len(templates0))):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary", 
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary")) 

In [37]:
suite.run(model_bert.predict, overwrite=True)

Running Test: MFT with vocabullary - template1
Predicting 10 examples




Running Test: MFT with vocabullary - template2
Predicting 10 examples
Running Test: MFT with vocabullary - template3
Predicting 10 examples
Running Test: MFT with vocabullary - template4
Predicting 2 examples
Running Test: MFT with vocabullary - template5
Predicting 10 examples
Running Test: MFT with vocabullary - template6
Predicting 10 examples


In [38]:
suite.summary()

Vocabullary

Test: MFT with vocabullary - template1
Test cases:      10
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template2
Test cases:      10
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template3
Test cases:      10
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template4
Test cases:      2
Fails (rate):    1 (50.0%)

Example fails:
0.0 more concerned with overall feelings broader ideas and open-ended questions than concrete story and definitive answers soderbergh 's solaris is a deceptively and deceptively minimalist cinematic tone poem .
----


Test: MFT with vocabullary - template5
Test cases:      10
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template6
Test cases:      10
Fails (rate):    1 (10.0%)

Example fails:
0.6 is an consuming and consuming wannabe looking for that exact niche .
----
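
The test-case counts in the summary above follow from how the placeholders are filled. As the failing example "deceptively and deceptively" shows, a placeholder name repeated within a template receives the same word in every position, so a template that uses only `{neg_adj}` yields one case per word in that lexicon (10 here), and one that uses only `{pos_adj}` yields 2. The sketch below reproduces this counting in plain Python. `expand_template` is a hypothetical helper written for illustration and is not part of CheckList.

```python
import itertools
import re

def expand_template(template, lexicons):
    """Fill `{key}` placeholders with every combination of lexicon words.
    A key that appears several times gets the same word each time."""
    keys = sorted(set(re.findall(r"\{(\w+)\}", template)))
    filled = []
    for combo in itertools.product(*(lexicons[k] for k in keys)):
        text = template
        for key, word in zip(keys, combo):
            text = text.replace("{" + key + "}", word)
        filled.append(text)
    return filled

# Toy lexicons (hypothetical sizes: 2 positive, 3 negative adjectives).
toy_lexicons = {
    "pos_adj": ["gorgeous", "charming"],
    "neg_adj": ["drab", "dull", "vapid"],
}
same_key = expand_template("a {neg_adj} and {neg_adj} film .", toy_lexicons)
mixed = expand_template("too {pos_adj} but {neg_adj} .", toy_lexicons)
print(len(same_key), len(mixed))
```

A template with a repeated key grows linearly with the lexicon, while one that mixes keys grows with the product of the lexicon sizes.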






In [39]:
suite.save('./suites/posneg-approach3-bert.suite')

#### Model ALBERT

In [40]:
lexicons = tg1.lexicons
templates1 = tg1.template_texts
masked = tg1.masked_texts
labels = [sent.prediction.label for sent in tg1.sentences]

editor = Editor()
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [41]:
for template, label, i in zip(templates1, labels, range(len(templates1))):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary", 
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary")) 

In [42]:
suite.run(model_albert.predict, overwrite=True)

In [43]:
suite.summary()

In [44]:
suite.save('./suites/posneg-approach3-albert.suite')

#### Model DistilBERT

In [45]:
lexicons = tg2.lexicons
templates2 = tg2.template_texts
masked = tg2.masked_texts
labels = [sent.prediction.label for sent in tg2.sentences]

editor = Editor()
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [46]:
for template, label, i in zip(templates2, labels, range(len(templates2))):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary", 
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary")) 

In [47]:
suite.run(model_distilbert.predict, overwrite=True)

In [48]:
suite.summary()

In [49]:
suite.save('./suites/posneg-approach3-distilbert.suite')

#### Model RoBERTa

In [50]:
lexicons = tg3.lexicons
templates3 = tg3.template_texts
masked = tg3.masked_texts
labels = [sent.prediction.label for sent in tg3.sentences]

editor = Editor()
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [51]:
for template, label, i in zip(templates3, labels, range(len(templates3))):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary", 
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary")) 

In [52]:
suite.run(model_roberta.predict, overwrite=True)

Running Test: MFT with vocabullary - template1
Predicting 2 examples




Running Test: MFT with vocabullary - template2
Predicting 4 examples
Running Test: MFT with vocabullary - template3
Predicting 4 examples


In [53]:
suite.summary()

Vocabullary

Test: MFT with vocabullary - template1
Test cases:      2
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template2
Test cases:      4
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template3
Test cases:      4
Fails (rate):    0 (0.0%)






In [54]:
suite.save('./suites/posneg-approach3-roberta.suite')

#### Model XLNet

In [55]:
lexicons = tg4.lexicons
templates4 = tg4.template_texts
masked = tg4.masked_texts
labels = [sent.prediction.label for sent in tg4.sentences]

editor = Editor()
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [56]:
for template, label, i in zip(templates4, labels, range(len(templates4))):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary", 
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary")) 

In [57]:
suite.run(model_xlnet.predict, overwrite=True)

Running Test: MFT with vocabullary - template1
Predicting 9 examples




Running Test: MFT with vocabullary - template2
Predicting 27 examples
Running Test: MFT with vocabullary - template3
Predicting 9 examples
Running Test: MFT with vocabullary - template4
Predicting 9 examples
Running Test: MFT with vocabullary - template5
Predicting 9 examples
Running Test: MFT with vocabullary - template6
Predicting 3 examples


In [58]:
suite.summary()

Vocabullary

Test: MFT with vocabullary - template1
Test cases:      9
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template2
Test cases:      27
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template3
Test cases:      9
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template4
Test cases:      9
Fails (rate):    2 (22.2%)

Example fails:
0.0 the bleak result is bleak in cannes .
----
0.2 the cinematic result is cinematic in cannes .
----


Test: MFT with vocabullary - template5
Test cases:      9
Fails (rate):    1 (11.1%)

Example fails:
0.6 as effective and well-intentioned as all or nothing is however the film comes perilously close to being too effective too pessimistic and too unflinching for its own good .
----


Test: MFT with vocabullary - template6
Test cases:      3
Fails (rate):    0 (0.0%)






In [59]:
suite.save('./suites/posneg-approach3-xlnet.suite')

# Loading the test suite

In [60]:
from checklist.test_suite import TestSuite
suite = TestSuite.from_file('./suites/posneg-approach3-bert.suite')

suite.visual_summary_table()

Please wait as we prepare the table data...


SuiteSummarizer(stats={'npassed': 0, 'nfailed': 0, 'nfiltered': 0}, test_infos=[{'name': 'Test: MFT with vocab…