# Approach 2

This notebook uses Approach 2 to generate templates, focusing on positive and negative templates. One possible application is testing the *Vocabulary* linguistic capability with an **MFT** (Minimum Functionality Test).

The steps of this approach are:

1. Rank the words of the complete instances
2. Split the instances into sentences
3. Keep the sentences that contain at least one of the top-ranked words from step 1
4. Rank the words of each sentence
5. Keep the sentences that contain relevant words (adjectives or verbs)
6. Classify the sentences using the *Oracle*
7. Replace the relevant words with masks
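
Steps 2 and 3 can be illustrated with a minimal sketch. The helper names `split_sentences` and `filter_by_ranked_words` are hypothetical, and the naive punctuation-based splitter is an assumption; the generator used in this notebook may split and match sentences differently.

```python
import re

def split_sentences(instance):
    """Naively split a review into sentences after ., ! or ? (assumption)."""
    parts = re.split(r'(?<=[.!?])\s+', instance.strip())
    return [p for p in parts if p]

def filter_by_ranked_words(sentences, ranked_words):
    """Keep only the sentences that contain at least one top-ranked word."""
    top = {w.lower() for w in ranked_words}
    return [s for s in sentences if top & set(s.lower().split())]

instance = "a real clunker . the cast is charming ."
sentences = split_sentences(instance)
kept = filter_by_ranked_words(sentences, ["clunker"])
print(sentences)
print(kept)
```

Matching on whole whitespace-separated tokens keeps the filter from firing on substrings, which suits the pre-tokenized reviews in this dataset.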

In [1]:
%config Completer.use_jedi = False
import sys
sys.path.append('../../')

## Loading the dataset, the target model, and the auxiliary models

In [2]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

movie_reviews_rt_df = pd.read_csv('./data/data-rt-100samples.csv')
movie_reviews_rt_df.head(5)

Unnamed: 0,label,text,words
0,1,allen's underestimated charm delivers more goodies than lumps of coal .,11
1,0,skip the film and buy the philip glass soundtrack cd .,11
2,0,involving at times but lapses quite casually into the absurd .,11
3,0,while hoffman's performance is great the subject matter goes nowhere .,11
4,1,a flick about our infantilized culture that isn't entirely infantile .,11


In [3]:
import re
import numpy as np
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

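def pre_proccess(text):
    # Lowercase, then replace ASCII punctuation and digits ('!' through '@') with spaces
    text = text.lower()
    # The stray empty group "()" after the character class matched nothing and was dropped
    text = re.sub('["\',!-.:-@0-9/]', ' ', text)
    return text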

# Wrapper to adapt output format
class SentimentAnalisysModelWrapper:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        
    def __predict(self, text_input):
        text_preprocessed = pre_proccess(text_input)
        tokenized = self.tokenizer(text_preprocessed, padding=True, truncation=True, max_length=512, 
                                    add_special_tokens = True, return_tensors="pt")
        
        tensor_logits = self.model(**tokenized)
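        # Softmax over the class dimension; an explicit dim avoids PyTorch's implicit-dimension warning
        prob = softmax(tensor_logits[0], dim=-1).detach().numpy()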
        pred = np.argmax(prob)
        
        return pred, prob
    
    def predict_label(self, text_inputs):
        return self.predict(text_inputs)[0]
        
    def predict_proba(self, text_inputs):
        return self.predict(text_inputs)[1]
        
    def predict(self, text_inputs):
        if isinstance(text_inputs, str):
            text_inputs = [text_inputs]
        
        preds = []
        probs = []

        for text_input in text_inputs:
            pred, prob = self.__predict(text_input)
            preds.append(pred)
            probs.append(prob[0])

        return np.array(preds), np.array(probs) # ([0, 1], [[0.99, 0.01], [0.03, 0.97]])

# Auxiliary function to load and wrap a model from Hugging Face
def load_model(model_name):
    print(f'Loading model {model_name}...')
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    return SentimentAnalisysModelWrapper(model, tokenizer)

# Hugging Face hosted model names 
movie_reviews_models = {
    'bert': 'textattack/bert-base-uncased-rotten-tomatoes', 
    'albert': 'textattack/albert-base-v2-rotten-tomatoes', 
    'distilbert': 'textattack/distilbert-base-uncased-rotten-tomatoes', 
    'roberta': 'textattack/roberta-base-rotten-tomatoes', 
    'xlnet': 'textattack/xlnet-base-cased-rotten-tomatoes'
}

In [4]:
m0 = load_model(movie_reviews_models['bert'])
m1 = load_model(movie_reviews_models['albert'])
m2 = load_model(movie_reviews_models['distilbert'])
m3 = load_model(movie_reviews_models['roberta'])
m4 = load_model(movie_reviews_models['xlnet'])

# Oracle ensembles: each list leaves out one model, which is then used as the target
models_1 = [m1, m2, m3, m4]
models_2 = [m0, m2, m3, m4]
models_3 = [m0, m1, m3, m4]
models_4 = [m0, m1, m2, m4]
models_5 = [m0, m1, m2, m3]
# Target models (one per oracle ensemble above)
model_bert = m0
model_albert = m1
model_distilbert = m2
model_roberta = m3
model_xlnet = m4

Loading model textattack/bert-base-uncased-rotten-tomatoes...
Loading model textattack/albert-base-v2-rotten-tomatoes...
Loading model textattack/distilbert-base-uncased-rotten-tomatoes...
Loading model textattack/roberta-base-rotten-tomatoes...


Some weights of the model checkpoint at textattack/roberta-base-rotten-tomatoes were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Loading model textattack/xlnet-base-cased-rotten-tomatoes...


# Generating the templates
The word-ranking method used by `PosNegTemplateGenerator` is the Replace-1 Score.
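
The notebook does not spell out how the Replace-1 Score is computed. The sketch below assumes one common formulation: replace one word at a time with a neutral placeholder and record how much the probability of the originally predicted class changes. The function `replace1_scores`, the `[UNK]` placeholder, and the toy classifier `toy_predict_proba` are all hypothetical, and the sign convention of the library's `rank_score` may differ.

```python
import numpy as np

def toy_predict_proba(text):
    """Stand-in classifier: returns [P(neg), P(pos)] from a tiny word list."""
    weights = {"bad": -2.0, "clunker": -3.0, "charming": 2.5}
    logit = sum(weights.get(w, 0.0) for w in text.split())
    p_pos = 1.0 / (1.0 + np.exp(-logit))
    return np.array([1.0 - p_pos, p_pos])

def replace1_scores(text, predict_proba, placeholder="[UNK]"):
    """Score each word by the change in the predicted-class probability
    when that word alone is replaced by the placeholder (assumed definition)."""
    words = text.split()
    base = predict_proba(text)
    label = int(np.argmax(base))
    scores = []
    for i, word in enumerate(words):
        replaced = words[:i] + [placeholder] + words[i + 1:]
        new = predict_proba(" ".join(replaced))
        scores.append((word, float(new[label] - base[label])))
    # Words whose replacement moves the prediction the most come first
    return sorted(scores, key=lambda ws: abs(ws[1]), reverse=True)

ranking = replace1_scores("a real clunker .", toy_predict_proba)
print(ranking[0])
```

With this definition, a strongly negative score means the word was supporting the original prediction, since removing it lowers the model's confidence.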

In [5]:
from template_generator.tasks.sentiment_analisys import PosNegTemplateGeneratorApp2

tg0 = PosNegTemplateGeneratorApp2(model_bert, models_1)
tg1 = PosNegTemplateGeneratorApp2(model_albert, models_2)
tg2 = PosNegTemplateGeneratorApp2(model_distilbert, models_3)
tg3 = PosNegTemplateGeneratorApp2(model_roberta, models_4)
tg4 = PosNegTemplateGeneratorApp2(model_xlnet, models_5)
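
Each generator pairs a target model with the four remaining models, which act as the oracle. How `PosNegTemplateGeneratorApp2` combines the auxiliary predictions is not shown in this notebook; the sketch below assumes a unanimous vote, where a sentence receives a label only if every auxiliary model agrees. `oracle_label` and `StubModel` are hypothetical names used for illustration.

```python
import numpy as np

class StubModel:
    """Stand-in exposing the same predict_label interface as the wrapper above."""
    def __init__(self, positive_words):
        self.positive_words = set(positive_words)

    def predict_label(self, texts):
        if isinstance(texts, str):
            texts = [texts]
        # Label 1 if the text contains any word this stub considers positive
        return np.array([int(bool(self.positive_words & set(t.split())))
                         for t in texts])

def oracle_label(sentence, oracle_models):
    """Return the shared label if all oracle models agree, otherwise None."""
    votes = {int(m.predict_label(sentence)[0]) for m in oracle_models}
    return votes.pop() if len(votes) == 1 else None

oracle = [StubModel(["charming"]), StubModel(["charming", "snappy"])]
agreed = oracle_label("the charming result", oracle)
disputed = oracle_label("snappy patter", oracle)
print(agreed, disputed)
```

Leaving the target model out of its own oracle keeps the expected labels independent of the model under test.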

### Initial number of instances: 5

In [6]:
# Sampling instances
np.random.seed(220)
n_instances = 5
df_sampled = movie_reviews_rt_df.sample(n_instances)

instances = df_sampled['text'].tolist()

In [7]:
templates0 = tg0.generate_templates(instances, n_masks=2, ranked_words_count=4)
templates1 = tg1.generate_templates(instances, n_masks=2, ranked_words_count=4)
templates2 = tg2.generate_templates(instances, n_masks=2, ranked_words_count=4)
templates3 = tg3.generate_templates(instances, n_masks=2, ranked_words_count=4)
templates4 = tg4.generate_templates(instances, n_masks=2, ranked_words_count=4)

Ranking words using Replace-1 Score...




Converting texts to sentences...
:: 6 sentences were generated.
Filtering instances by contaning ranked words...
:: 1 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
4
Filtering instances by relevant words...
['ADJ']
{word: future, index: 2, tag: NOUN, rank_score: -0.06550866365432739}
{word: hopes, index: 4, tag: VERB, rank_score: -0.0017756223678588867}
{word: one, index: 3, tag: NUM, rank_score: 0.0016388297080993652}
{word: for, index: 0, tag: ADP, rank_score: 0.0016148090362548828}
 
:: 0 sentences remaining.
Predicting inputs...
:: Sentence predictions done.
Ranking words using Replace-1 Score...
Converting texts to sentences...
:: 6 sentences were generated.
Filtering instances by contaning ranked words...
:: 1 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
4
Filtering instances by relevant words...
['ADJ']
{word: for, index: 0, tag: ADP, rank_score: -0.12896734476089478}
{word: the, index: 1, tag: DET, rank_sc

#### Execution time for 5 instances: 9.7s

In [8]:
df0 = tg0.to_dataframe()
df0

Unnamed: 0,label,original_text,masked_text,template_text


In [9]:
df1 = tg1.to_dataframe()
df1

Unnamed: 0,label,original_text,masked_text,template_text


In [10]:
df2 = tg2.to_dataframe()
df2

Unnamed: 0,label,original_text,masked_text,template_text


In [11]:
df3 = tg3.to_dataframe()
df3

Unnamed: 0,label,original_text,masked_text,template_text


In [12]:
df4 = tg4.to_dataframe()
df4

Unnamed: 0,label,original_text,masked_text,template_text


In [13]:
tg0.lexicons

{'pos_adj': [], 'neg_adj': []}

In [14]:
tg1.lexicons

{'pos_adj': [], 'neg_adj': []}

In [15]:
tg2.lexicons

{'pos_adj': [], 'neg_adj': []}

In [16]:
tg3.lexicons

{'pos_adj': [], 'neg_adj': []}

In [17]:
tg4.lexicons

{'pos_adj': [], 'neg_adj': []}

### Initial number of instances: 100

In [18]:
# Using all 100 instances
instances = movie_reviews_rt_df['text'].tolist()

In [19]:
tg0 = PosNegTemplateGeneratorApp2(model_bert, models_1)
tg1 = PosNegTemplateGeneratorApp2(model_albert, models_2)
tg2 = PosNegTemplateGeneratorApp2(model_distilbert, models_3)
tg3 = PosNegTemplateGeneratorApp2(model_roberta, models_4)
tg4 = PosNegTemplateGeneratorApp2(model_xlnet, models_5)

In [20]:
templates0 = tg0.generate_templates(instances, n_masks=2, ranked_words_count=4)

Ranking words using Replace-1 Score...




Converting texts to sentences...
:: 134 sentences were generated.
Filtering instances by contaning ranked words...
:: 23 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
4
Filtering instances by relevant words...
['ADJ']
{word: clunker, index: 11, tag: NOUN, rank_score: -0.0016779303550720215}
{word: a, index: 4, tag: DET, rank_score: -0.0002397298812866211}
{word: well-made, index: 5, tag: ADJ, rank_score: -0.00020068883895874023}
{word: thoughtful, index: 6, tag: ADJ, rank_score: -0.0001347064971923828}
 
['ADJ']
{word: this, index: 9, tag: DET, rank_score: -0.0038509368896484375}
{word: regard, index: 10, tag: NOUN, rank_score: 0.002340257167816162}
{word: guard, index: 12, tag: NOUN, rank_score: 0.0020131468772888184}
{word: on, index: 11, tag: ADP, rank_score: 0.0017519593238830566}
 
['ADJ']
{word: trailers, index: 10, tag: NOUN, rank_score: -0.004239559173583984}
{word: bad, index: 7, tag: ADJ, rank_score: -0.0035950541496276855}
{word: could, in

In [21]:
templates1 = tg1.generate_templates(instances, n_masks=2, ranked_words_count=4)

Ranking words using Replace-1 Score...




Converting texts to sentences...
:: 134 sentences were generated.
Filtering instances by contaning ranked words...
:: 24 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
4
Filtering instances by relevant words...
['ADJ']
{word: clunker, index: 11, tag: NOUN, rank_score: -0.05216914415359497}
{word: nonetheless, index: 12, tag: ADV, rank_score: -0.012312531471252441}
{word: but, index: 9, tag: CONJ, rank_score: -0.00903022289276123}
{word: well-acted, index: 7, tag: ADJ, rank_score: 0.0024683475494384766}
 
['ADJ']
{word: delivers, index: 13, tag: NOUN, rank_score: -0.46632376313209534}
{word: in, index: 8, tag: ADP, rank_score: -0.04138880968093872}
{word: guard, index: 12, tag: NOUN, rank_score: -0.026207327842712402}
{word: regard, index: 10, tag: NOUN, rank_score: 0.021691203117370605}
 
['ADJ']
{word: as, index: 6, tag: ADV, rank_score: -0.07609403133392334}
{word: trailers, index: 10, tag: NOUN, rank_score: -0.074942946434021}
{word: could, index: 

In [22]:
templates2 = tg2.generate_templates(instances, n_masks=2, ranked_words_count=4)

Ranking words using Replace-1 Score...




Converting texts to sentences...
:: 134 sentences were generated.
Filtering instances by contaning ranked words...
:: 24 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
4
Filtering instances by relevant words...
['ADJ']
{word: clunker, index: 8, tag: NOUN, rank_score: -0.3396601974964142}
{word: but, index: 9, tag: CONJ, rank_score: 0.21684283018112183}
{word: thoughtful, index: 6, tag: ADJ, rank_score: -0.20432782173156738}
{word: clunker, index: 11, tag: NOUN, rank_score: 0.11272972822189331}
 
['ADJ']
{word: delivers, index: 13, tag: NOUN, rank_score: -0.055378258228302}
{word: regard, index: 10, tag: NOUN, rank_score: -0.04361259937286377}
{word: on, index: 11, tag: ADP, rank_score: 0.039002180099487305}
{word: and, index: 7, tag: CONJ, rank_score: -0.025519728660583496}
 
['ADJ']
{word: bad, index: 7, tag: ADJ, rank_score: -0.1291724443435669}
{word: trailers, index: 10, tag: NOUN, rank_score: -0.04799562692642212}
{word: could, index: 0, tag: VER

In [23]:
templates3 = tg3.generate_templates(instances, n_masks=2, ranked_words_count=4)

Ranking words using Replace-1 Score...




Converting texts to sentences...
:: 134 sentences were generated.
Filtering instances by contaning ranked words...
:: 22 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
4
Filtering instances by relevant words...
['ADJ']
{word: clunker, index: 2, tag: NOUN, rank_score: -0.5367560386657715}
{word: ., index: 3, tag: ., rank_score: -0.0029336214065551758}
{word: real, index: 1, tag: ADJ, rank_score: 0.002181828022003174}
{word: a, index: 0, tag: DET, rank_score: 0.0017777681350708008}
 
['ADJ']
{word: delivers, index: 13, tag: NOUN, rank_score: -0.5631450414657593}
{word: and, index: 7, tag: CONJ, rank_score: -0.038094162940979004}
{word: this, index: 9, tag: DET, rank_score: -0.022512376308441162}
{word: regard, index: 10, tag: NOUN, rank_score: 0.00884091854095459}
 
['ADJ']
{word: bad, index: 7, tag: ADJ, rank_score: -0.012041032314300537}
{word: country, index: 2, tag: NOUN, rank_score: -0.009274661540985107}
{word: be, index: 5, tag: VERB, rank_score:

In [24]:
templates4 = tg4.generate_templates(instances, n_masks=2, ranked_words_count=4)

Ranking words using Replace-1 Score...




Converting texts to sentences...
:: 134 sentences were generated.
Filtering instances by contaning ranked words...
:: 21 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
4
Filtering instances by relevant words...
['ADJ']
{word: clunker, index: 2, tag: NOUN, rank_score: -0.9947958588600159}
{word: a, index: 0, tag: DET, rank_score: -0.00312197208404541}
{word: real, index: 1, tag: ADJ, rank_score: -0.00022745132446289062}
{word: ., index: 3, tag: ., rank_score: 0.0}
 
['ADJ']
{word: satisfies, index: 2, tag: NOUN, rank_score: -0.6157572865486145}
{word: sometimes, index: 0, tag: ADV, rank_score: -0.6004771590232849}
{word: like, index: 3, tag: ADP, rank_score: -0.575774610042572}
{word: nothing, index: 1, tag: NOUN, rank_score: -0.4366878569126129}
 
['ADJ']
{word: bad, index: 7, tag: ADJ, rank_score: -0.018970727920532227}
{word: country, index: 2, tag: NOUN, rank_score: -0.005220293998718262}
{word: as, index: 8, tag: ADP, rank_score: -0.00503683090209

In [25]:
df0 = tg0.to_dataframe()
df0

Unnamed: 0,label,original_text,masked_text,template_text
0,0,a well-made thoughtful well-acted clunker but a clunker nonetheless .,a {mask} {mask} well-acted clunker but a clunker nonetheless .,a {pos_adj} {neg_adj} well-acted clunker but a clunker nonetheless .
1,0,a dreary incoherent self-indulgent mess of a movie in which a bunch of pompous windbags drone on inanely for two hours .,a dreary incoherent {mask} mess of a movie in which a bunch of {mask} windbags drone on inanely for two hours .,a dreary incoherent {neg_adj} mess of a movie in which a bunch of {neg_adj} windbags drone on inanely for two hours .
2,1,whereas the extremely competent hitman films such as pulp fiction and get shorty resonate a sardonic verve to their caustic purpose for existing who is cletis tout ?,whereas the extremely competent hitman films such as pulp fiction and get shorty resonate a {mask} verve to their {mask} purpose for existing who is cletis tout ?,whereas the extremely competent hitman films such as pulp fiction and get shorty resonate a {neg_adj} verve to their {neg_adj} purpose for existing who is cletis tout ?


In [26]:
df1 = tg1.to_dataframe()
df1

Unnamed: 0,label,original_text,masked_text,template_text
0,1,a story an old and scary one about the monsters we make and the vengeance they take .,a story an {mask} and {mask} one about the monsters we make and the vengeance they take .,a story an {neg_adj} and {neg_adj} one about the monsters we make and the vengeance they take .


In [27]:
df2 = tg2.to_dataframe()
df2

Unnamed: 0,label,original_text,masked_text,template_text
0,0,as comedic spotlights go notorious c .,as {mask} spotlights go {mask} c .,as {neg_adj} spotlights go {neg_adj} c .
1,1,the charming result is festival in cannes .,the {mask} result is {mask} in cannes .,the {pos_adj} result is {pos_adj} in cannes .
2,0,is an inexpressible and drab wannabe looking for that exact niche .,is an {mask} and {mask} wannabe looking for that exact niche .,is an {neg_adj} and {neg_adj} wannabe looking for that exact niche .


In [28]:
df3 = tg3.to_dataframe()
df3

Unnamed: 0,label,original_text,masked_text,template_text
0,1,the charming result is festival in cannes .,the {mask} result is {mask} in cannes .,the {pos_adj} result is {neg_adj} in cannes .
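
The `masked_text` and `template_text` columns in the row above can be reproduced with a small sketch. It assumes the relevant adjectives and their polarity are already known (here they are passed in by hand); `build_template` is a hypothetical helper, not the generator's actual code.

```python
def build_template(sentence, polar_words):
    """Replace each selected word with {mask} and with its lexicon slot.

    polar_words maps a word to 'pos_adj' or 'neg_adj'.
    """
    masked, template = [], []
    lexicons = {"pos_adj": [], "neg_adj": []}
    for token in sentence.split():
        slot = polar_words.get(token)
        if slot is None:
            # Ordinary word: copied unchanged into both outputs
            masked.append(token)
            template.append(token)
        else:
            masked.append("{mask}")
            template.append("{" + slot + "}")
            if token not in lexicons[slot]:
                lexicons[slot].append(token)
    return " ".join(masked), " ".join(template), lexicons

masked, template, lexicons = build_template(
    "the charming result is festival in cannes .",
    {"charming": "pos_adj", "festival": "neg_adj"},
)
print(masked)
print(template)
print(lexicons)
```

The removed words are collected per slot so they can later be registered as lexicons and substituted back into any template that uses the same slot name.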


In [29]:
df4 = tg4.to_dataframe()
df4

Unnamed: 0,label,original_text,masked_text,template_text
0,0,half of it is composed of snappy patter and pseudo-sophisticated cultural observations while the remainder .,half of it is composed of {mask} patter and {mask} cultural observations while the remainder .,half of it is composed of {pos_adj} patter and {neg_adj} cultural observations while the remainder .
1,0,is an inexpressible and drab wannabe looking for that exact niche .,is an {mask} and {mask} wannabe looking for that exact niche .,is an {neg_adj} and {neg_adj} wannabe looking for that exact niche .


In [30]:
tg0.lexicons

{'pos_adj': ['well-made'],
 'neg_adj': ['pompous', 'sardonic', 'caustic', 'thoughtful', 'self-indulgent']}

In [31]:
tg1.lexicons

{'pos_adj': [], 'neg_adj': ['scary', 'old']}

In [32]:
tg2.lexicons

{'pos_adj': ['charming', 'festival'],
 'neg_adj': ['comedic', 'drab', 'inexpressible', 'notorious']}

In [33]:
tg3.lexicons

{'pos_adj': ['charming'], 'neg_adj': ['festival']}

In [34]:
tg4.lexicons

{'pos_adj': ['snappy'],
 'neg_adj': ['pseudo-sophisticated', 'drab', 'inexpressible']}

#### Execution time for 100 instances: 4m 17.8s

## Checklist

#### Model BERT

In [35]:
import checklist
from checklist.editor import Editor
from checklist.test_suite import TestSuite
from checklist.test_types import MFT

In [37]:
lexicons = tg0.lexicons
templates0 = tg0.template_texts
masked = tg0.masked_texts
labels = [sent.prediction.label for sent in tg0.sentences]

editor = Editor()
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [38]:
# Build one MFT per template; the expected label comes from each sentence's stored prediction
for i, (template, label) in enumerate(zip(templates0, labels)):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary",
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary"))

In [39]:
suite.run(model_bert.predict, overwrite=True)

Running Test: MFT with vocabullary - template1
Predicting 5 examples




Running Test: MFT with vocabullary - template2
Predicting 5 examples
Running Test: MFT with vocabullary - template3
Predicting 5 examples
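
The example counts reported above follow from how placeholders are filled: distinct placeholder names are combined as a Cartesian product, while a name that appears twice in one template receives the same word in both positions. The sketch below imitates that behavior using only the standard library; `expand_template` is a hypothetical stand-in, not CheckList's `Editor.template`.

```python
import itertools
import string

def expand_template(template, lexicons):
    """Fill every {name} slot; repeated names share a single value."""
    # Collect distinct placeholder names in order of first appearance
    names = []
    for _, field, _, _ in string.Formatter().parse(template):
        if field and field not in names:
            names.append(field)
    results = []
    for combo in itertools.product(*(lexicons[n] for n in names)):
        text = template.format(**dict(zip(names, combo)))
        if text not in results:  # mimics remove_duplicates=True
            results.append(text)
    return results

lexicons = {
    "pos_adj": ["well-made"],
    "neg_adj": ["pompous", "sardonic", "caustic", "thoughtful", "self-indulgent"],
}
mixed = expand_template("a {pos_adj} {neg_adj} clunker .", lexicons)
repeated = expand_template("a {neg_adj} verve to their {neg_adj} purpose", lexicons)
print(len(mixed), len(repeated))
```

With one positive and five negative adjectives, both templates yield five sentences: 1 x 5 for the mixed template, and 5 for the template that reuses the same slot name.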


In [40]:
suite.summary()

Vocabullary

Test: MFT with vocabullary - template1
Test cases:      5
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template2
Test cases:      5
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template3
Test cases:      5
Fails (rate):    1 (20.0%)

Example fails:
0.0 whereas the extremely competent hitman films such as pulp fiction and get shorty resonate a pompous verve to their pompous purpose for existing who is cletis tout ?
----
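
An MFT case passes when the predicted label equals the expected label, and the summary reports the share of mismatches. Here is a minimal sketch of that bookkeeping, using illustrative predictions chosen to match the 1-in-5 failure above; `mft_fail_rate` is a hypothetical helper.

```python
import numpy as np

def mft_fail_rate(predicted, expected_label):
    """Return (number of fails, fail rate in percent) for one MFT."""
    predicted = np.asarray(predicted)
    fails = int(np.sum(predicted != expected_label))
    return fails, 100.0 * fails / len(predicted)

# Illustrative predictions only: four correct, one wrong
fails, rate = mft_fail_rate([1, 1, 0, 1, 1], expected_label=1)
print(f"Fails (rate):    {fails} ({rate:.1f}%)")
```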






In [41]:
suite.save('./suites/posneg-approach2-bert.suite')

#### Model ALBERT

In [44]:
lexicons = tg1.lexicons
templates1 = tg1.template_texts
masked = tg1.masked_texts
labels = [sent.prediction.label for sent in tg1.sentences]

editor = Editor()
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [45]:
# Build one MFT per template; the expected label comes from each sentence's stored prediction
for i, (template, label) in enumerate(zip(templates1, labels)):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary",
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary"))

In [46]:
suite.run(model_albert.predict, overwrite=True)

Running Test: MFT with vocabullary - template1
Predicting 2 examples




In [47]:
suite.summary()

Vocabullary

Test: MFT with vocabullary - template1
Test cases:      2
Fails (rate):    0 (0.0%)






In [48]:
suite.save('./suites/posneg-approach2-albert.suite')

#### Model DistilBERT

In [49]:
lexicons = tg2.lexicons
templates2 = tg2.template_texts
masked = tg2.masked_texts
labels = [sent.prediction.label for sent in tg2.sentences]

editor = Editor()
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [50]:
# Build one MFT per template; the expected label comes from each sentence's stored prediction
for i, (template, label) in enumerate(zip(templates2, labels)):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary",
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary"))

In [51]:
suite.run(model_distilbert.predict, overwrite=True)

Running Test: MFT with vocabullary - template1
Predicting 4 examples




Running Test: MFT with vocabullary - template2
Predicting 2 examples
Running Test: MFT with vocabullary - template3
Predicting 4 examples


In [52]:
suite.summary()

Vocabullary

Test: MFT with vocabullary - template1
Test cases:      4
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template2
Test cases:      2
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template3
Test cases:      4
Fails (rate):    1 (25.0%)

Example fails:
0.8 is an comedic and comedic wannabe looking for that exact niche .
----






In [53]:
suite.save('./suites/posneg-approach2-distilbert.suite')

#### Model RoBERTa

In [54]:
lexicons = tg3.lexicons
templates3 = tg3.template_texts
masked = tg3.masked_texts
labels = [sent.prediction.label for sent in tg3.sentences]

editor = Editor()
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [55]:
# Build one MFT per template; the expected label comes from each sentence's stored prediction
for i, (template, label) in enumerate(zip(templates3, labels)):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary",
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary"))

In [56]:
suite.run(model_roberta.predict, overwrite=True)

Running Test: MFT with vocabullary - template1
Predicting 1 examples




In [57]:
suite.summary()

Vocabullary

Test: MFT with vocabullary - template1
Test cases:      1
Fails (rate):    0 (0.0%)






In [58]:
suite.save('./suites/posneg-approach2-roberta.suite')

#### Model XLNet

In [59]:
lexicons = tg4.lexicons
templates4 = tg4.template_texts
masked = tg4.masked_texts
labels = [sent.prediction.label for sent in tg4.sentences]

editor = Editor()
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [60]:
# Build one MFT per template; the expected label comes from each sentence's stored prediction
for i, (template, label) in enumerate(zip(templates4, labels)):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary",
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary"))

In [61]:
suite.run(model_xlnet.predict, overwrite=True)

Running Test: MFT with vocabullary - template1
Predicting 3 examples




Running Test: MFT with vocabullary - template2
Predicting 3 examples


In [62]:
suite.summary()

Vocabullary

Test: MFT with vocabullary - template1
Test cases:      3
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template2
Test cases:      3
Fails (rate):    1 (33.3%)

Example fails:
0.6 is an pseudo-sophisticated and pseudo-sophisticated wannabe looking for that exact niche .
----






In [63]:
suite.save('./suites/posneg-approach2-xlnet.suite')

# Loading the test suite

In [64]:
from checklist.test_suite import TestSuite
suite = TestSuite.from_file('./suites/posneg-approach2-bert.suite')

suite.visual_summary_table()

Please wait as we prepare the table data...


SuiteSummarizer(stats={'npassed': 0, 'nfailed': 0, 'nfiltered': 0}, test_infos=[{'name': 'Test: MFT with vocab…