# Random approach

This notebook uses the random approach to generate templates, focusing on positive and negative templates. One possible application is testing the *Vocabulary* linguistic capability with an **MFT** (Minimum Functionality Test).

The steps of this approach are:

1. Split the instances into sentences
2. Randomly select a sample of *K* sentences
3. Rank the words of each sentence
4. Predict each sentence using the *Oracle*
5. Replace the relevant words with masks
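
Steps 1 and 2 can be sketched with the standard library alone. This is a rough illustration under stated assumptions, not the package's code: `split_into_sentences` and `sample_sentences` are hypothetical helpers, and the splitter is deliberately naive.

```python
import random
import re

def split_into_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

def sample_sentences(instances, k, seed=220):
    """Steps 1-2: flatten instances into sentences, then draw k at random."""
    sentences = [s for text in instances for s in split_into_sentences(text)]
    rng = random.Random(seed)  # local RNG keeps the draw reproducible
    return rng.sample(sentences, min(k, len(sentences)))

instances = [
    "skip the film . buy the soundtrack .",
    "involving at times but lapses into the absurd .",
]
print(sample_sentences(instances, k=2))
```

Using a local `random.Random` instance instead of the global seed keeps the sample stable even if other code draws random numbers in between.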

In [1]:
%config Completer.use_jedi = False
import sys
import random

sys.path.append('../../')
random.seed(220) 

## Loading the dataset, the target model, and the auxiliary models

In [2]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

movie_reviews_rt_df = pd.read_csv('./data/data-rt-100samples.csv')
movie_reviews_rt_df.head(5)

Unnamed: 0,label,text,words
0,1,allen's underestimated charm delivers more goodies than lumps of coal .,11
1,0,skip the film and buy the philip glass soundtrack cd .,11
2,0,involving at times but lapses quite casually into the absurd .,11
3,0,while hoffman's performance is great the subject matter goes nowhere .,11
4,1,a flick about our infantilized culture that isn't entirely infantile .,11


In [3]:
import re
import numpy as np
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

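def pre_proccess(text):
    # Lowercase, then replace every ASCII character from '!' through '@'
    # (punctuation, digits, and symbols) with a space
    text = text.lower()
    text = re.sub(r'["\',!-.:-@0-9/]', ' ', text)
    return text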

# Wrapper to adapt output format
class SentimentAnalisysModelWrapper:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        
    def __predict(self, text_input):
        text_preprocessed = pre_proccess(text_input)
        tokenized = self.tokenizer(text_preprocessed, padding=True, truncation=True, max_length=512, 
                                    add_special_tokens = True, return_tensors="pt")
        
        tensor_logits = self.model(**tokenized)
        prob = softmax(tensor_logits[0], dim=-1).detach().numpy()
        pred = np.argmax(prob)
        
        return pred, prob
    
    def predict_label(self, text_inputs):
        return self.predict(text_inputs)[0]
        
    def predict_proba(self, text_inputs):
        return self.predict(text_inputs)[1]
        
    def predict(self, text_inputs):
        if isinstance(text_inputs, str):
            text_inputs = [text_inputs]
        
        preds = []
        probs = []

        for text_input in text_inputs:
            pred, prob = self.__predict(text_input)
            preds.append(pred)
            probs.append(prob[0])

        return np.array(preds), np.array(probs) # ([0, 1], [[0.99, 0.01], [0.03, 0.97]])

# Helper function to load and wrap a model from Hugging Face
def load_model(model_name):
    print(f'Loading model {model_name}...')
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    return SentimentAnalisysModelWrapper(model, tokenizer)

# Hugging Face hosted model names 
movie_reviews_models = {
    'bert': 'textattack/bert-base-uncased-rotten-tomatoes', 
    'albert': 'textattack/albert-base-v2-rotten-tomatoes', 
    'distilbert': 'textattack/distilbert-base-uncased-rotten-tomatoes', 
    'roberta': 'textattack/roberta-base-rotten-tomatoes', 
    'xlnet': 'textattack/xlnet-base-cased-rotten-tomatoes'
}
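
The wrapper's `predict` returns a pair: the predicted labels and the per-class probabilities, one row per input. The sketch below mirrors that contract with the standard library only, so it runs without downloading a model. `fake_logits` is a hypothetical stand-in for the transformer output, not part of the notebook.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fake_logits(text):
    # Hypothetical stand-in for a model: favors class 1 when "great" appears.
    return [0.0, 2.0] if "great" in text else [2.0, 0.0]

def predict(texts):
    """Mirror the wrapper's contract: (labels, probabilities) for a batch."""
    if isinstance(texts, str):
        texts = [texts]
    probs = [softmax(fake_logits(t)) for t in texts]
    preds = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return preds, probs

preds, probs = predict(["a great film", "goes nowhere"])
print(preds)
```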

In [4]:
m1 = load_model(movie_reviews_models['albert'])
m2 = load_model(movie_reviews_models['distilbert'])
m3 = load_model(movie_reviews_models['roberta'])
m4 = load_model(movie_reviews_models['xlnet'])

# Models to be used as oracle
models = [m1, m2, m3, m4]
# Target model
model = load_model(movie_reviews_models['bert'])

Loading model textattack/albert-base-v2-rotten-tomatoes...
Loading model textattack/distilbert-base-uncased-rotten-tomatoes...
Loading model textattack/roberta-base-rotten-tomatoes...


Some weights of the model checkpoint at textattack/roberta-base-rotten-tomatoes were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Loading model textattack/xlnet-base-cased-rotten-tomatoes...
Loading model textattack/bert-base-uncased-rotten-tomatoes...
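
How the generator combines the auxiliary models into an oracle is not shown in this notebook. One plausible reading, offered here purely as an assumption, is a majority vote over their predicted labels. `ConstantModel` and `oracle_vote` are hypothetical names used only for illustration.

```python
from collections import Counter

class ConstantModel:
    """Hypothetical stand-in exposing predict_label like the wrapper above."""
    def __init__(self, label):
        self.label = label

    def predict_label(self, texts):
        return [self.label for _ in texts]

def oracle_vote(models, texts):
    """Return, for each text, the label predicted by most models.

    Ties resolve to the label that received its first vote earliest."""
    all_preds = [m.predict_label(texts) for m in models]
    voted = []
    for per_text in zip(*all_preds):
        voted.append(Counter(per_text).most_common(1)[0][0])
    return voted

toy_models = [ConstantModel(1), ConstantModel(1), ConstantModel(0), ConstantModel(1)]
print(oracle_vote(toy_models, ["some sentence .", "another one ."]))
```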


# Generating the templates
The word-ranking method used in `PosNegTemplateGenerator` is the Replace-1 Score.
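
Replace-1 Score is commonly described as rating a word by how much the model output changes when that word alone is swapped for a placeholder. The sketch below illustrates the idea under that assumption and is not the package's implementation: `toy_positive_prob` is a made-up scoring function standing in for `predict_proba`, and `unk_token` is a hypothetical placeholder.

```python
def toy_positive_prob(text):
    # Hypothetical stand-in for a classifier's positive-class probability.
    weights = {"great": 0.4, "nowhere": -0.3}
    score = 0.5 + sum(weights.get(w, 0.0) for w in text.split())
    return min(1.0, max(0.0, score))

def replace_1_scores(sentence, prob_fn, unk_token="[UNK]"):
    """Score each word by the absolute change in prob_fn when it is replaced."""
    words = sentence.split()
    base = prob_fn(sentence)
    scores = []
    for i, word in enumerate(words):
        perturbed = words[:i] + [unk_token] + words[i + 1:]
        scores.append((word, abs(base - prob_fn(" ".join(perturbed)))))
    # Most influential words first
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

ranking = replace_1_scores("the performance is great but goes nowhere", toy_positive_prob)
print([word for word, _ in ranking[:2]])
```

Each sentence of *n* words costs *n* + 1 model calls, which is consistent with ranking being the slow part of the run.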

In [5]:
from template_generator.tasks.sentiment_analisys import PosNegTemplateGeneratorRandom

tg = PosNegTemplateGeneratorRandom(model, models)

### Initial number of instances: 5

In [6]:
# Sampling instances
np.random.seed(220)
n_instances = 5
df_sampled = movie_reviews_rt_df.sample(n_instances)

instances = df_sampled['text'].tolist()

In [7]:
templates = tg.generate_templates(instances, n_masks=2, k_templates=1)

Converting texts to sentences...
:: 6 sentences were generated.
Ranking words using Replace-1 Score...
:: Word ranking done.
Predicting inputs...
:: Sentence predictions done.


#### Execution time for 5 instances: 2.1s

In [8]:
df = tg.to_dataframe()
df

Unnamed: 0,label,original_text,masked_text,template_text
0,0,while hoffman's performance is great the subject matter goes nowhere .,while hoffman 's performance is {mask} the {mask} matter goes nowhere .,while hoffman 's performance is {pos_adj} the {neg_adj} matter goes nowhere .


In [9]:
tg.lexicons

{'pos_adj': ['great'], 'neg_adj': ['subject']}
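
The `masked_text` and `template_text` columns differ only in the placeholder written at each selected position. A minimal sketch of that substitution follows. `build_template` and its `tags` argument are hypothetical, and how the generator decides that a word belongs to `pos_adj` or `neg_adj` is not covered here.

```python
def build_template(tokens, tags):
    """Replace the tokens at the given positions with placeholders.

    tokens: list of words; tags: {position: lexicon_name}."""
    masked = ["{mask}" if i in tags else tok for i, tok in enumerate(tokens)]
    template = ["{" + tags[i] + "}" if i in tags else tok
                for i, tok in enumerate(tokens)]
    lexicons = {}
    for i, name in tags.items():
        lexicons.setdefault(name, []).append(tokens[i])
    return " ".join(masked), " ".join(template), lexicons

tokens = "while hoffman 's performance is great the subject matter goes nowhere .".split()
masked, template, lexicons = build_template(tokens, {5: "pos_adj", 7: "neg_adj"})
print(masked)
print(template)
print(lexicons)
```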

### Initial number of instances: 100

In [10]:
# Using all 100 instances
instances = movie_reviews_rt_df['text'].tolist()

In [11]:
tg = PosNegTemplateGeneratorRandom(model, models)
templates = tg.generate_templates(instances, n_masks=2, k_templates=18)

Converting texts to sentences...
:: 134 sentences were generated.
Ranking words using Replace-1 Score...
:: Word ranking done.
Predicting inputs...
:: Sentence predictions done.


#### Execution time for 100 instances: 45.5s

In [12]:
df = tg.to_dataframe()
df

Unnamed: 0,label,original_text,masked_text,template_text
0,0,.,.,.
1,0,opens at a funeral ends on the protagonist's death bed and doesn't get much livelier in the three hours in between .,opens at a {mask} ends on the protagonist 's death bed and does n't get {mask} livelier in the three hours in between .,opens at a {neg_adj} ends on the protagonist 's death bed and does n't get {pos_adj} livelier in the three hours in between .
2,0,what remains is a variant of the nincompoop benigni persona here a more annoying though less angry version of the irresponsible sandlerian manchild undercut by the voice of the star of road trip .,what remains is a variant of the {mask} benigni persona here a more annoying though less {mask} version of the irresponsible sandlerian manchild undercut by the voice of the star of road trip .,what remains is a variant of the {pos_adj} benigni persona here a more annoying though less {neg_adj} version of the irresponsible sandlerian manchild undercut by the voice of the star of road trip .
3,0,plympton will find room for one more member of his little band a professional screenwriter .,plympton will find room for one {mask} member of his little band a {mask} screenwriter .,plympton will find room for one {pos_adj} member of his little band a {neg_adj} screenwriter .
4,0,the two leads are almost good enough to camouflage the dopey plot but so much naturalistic small talk delivered in almost muffled exchanges eventually has a lulling effect .,the two leads are almost good enough to camouflage the dopey plot but so much {mask} small talk delivered in almost muffled exchanges eventually has a {mask} effect .,the two leads are almost good enough to camouflage the dopey plot but so much {pos_adj} small talk delivered in almost muffled exchanges eventually has a {neg_adj} effect .
5,0,a dreary incoherent self-indulgent mess of a movie in which a bunch of pompous windbags drone on inanely for two hours .,a dreary incoherent {mask} mess of a movie in which a bunch of {mask} windbags drone on inanely for two hours .,a dreary incoherent {neg_adj} mess of a movie in which a bunch of {neg_adj} windbags drone on inanely for two hours .
6,1,reveals how important our special talents can be when put in service of of others .,reveals how {mask} our {mask} talents can be when put in service of of others .,reveals how {pos_adj} our {pos_adj} talents can be when put in service of of others .
7,1,elegantly produced and expressively performed the six musical numbers crystallize key plot moments into minutely detailed wonders of dreamlike ecstasy .,elegantly produced and expressively performed the six musical numbers crystallize {mask} plot moments into minutely {mask} wonders of dreamlike ecstasy .,elegantly produced and expressively performed the six musical numbers crystallize {neg_adj} plot moments into minutely {pos_adj} wonders of dreamlike ecstasy .
8,0,if you're really renting this you're not interested in discretion in your entertainment choices you're interested in anne geddes john grisham and thomas kincaid .,if you 're really renting this you 're not {mask} in discretion in your entertainment choices you 're {mask} in anne geddes john grisham and thomas kincaid .,if you 're really renting this you 're not {neg_adj} in discretion in your entertainment choices you 're {neg_adj} in anne geddes john grisham and thomas kincaid .
9,1,a story an old and scary one about the monsters we make and the vengeance they take .,a story an {mask} and {mask} one about the monsters we make and the vengeance they take .,a story an {neg_adj} and {pos_adj} one about the monsters we make and the vengeance they take .


In [13]:
tg.lexicons

{'pos_adj': ['detailed',
  'special',
  'important',
  'scary',
  'naturalistic',
  'well-made',
  'astonish',
  'intelligent',
  'cultural',
  'much',
  'more',
  'nincompoop',
  'ironic'],
 'neg_adj': ['human',
  'unbearable',
  'key',
  'angry',
  'old',
  'self-indulgent',
  'thoughtful',
  'ridiculous',
  'pseudo-sophisticated',
  'interested',
  'professional',
  'funeral',
  'lulling',
  'pompous']}
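
With 13 entries in `pos_adj` and 14 in `neg_adj`, a template that uses both placeholders expands to 13 × 14 = 182 sentences. A template that repeats a single placeholder yields one sentence per entry, since the same word fills both slots, and a template with no placeholder yields exactly one sentence. These figures are consistent with the 182, 13, 14, and 1 example counts reported in the CheckList run. The sketch reproduces the arithmetic in plain Python; `expand` is a hypothetical helper, not the CheckList `Editor`.

```python
from itertools import product
import re

def expand(template, lexicons):
    """Fill every distinct placeholder with every combination of lexicon words."""
    keys = sorted(set(re.findall(r"\{(\w+)\}", template)))
    filled = []
    for combo in product(*(lexicons[k] for k in keys)):
        filled.append(template.format(**dict(zip(keys, combo))))
    return filled

toy_lexicons = {
    "pos_adj": [f"pos{i}" for i in range(13)],  # same size as pos_adj above
    "neg_adj": [f"neg{i}" for i in range(14)],  # same size as neg_adj above
}
print(len(expand("an {neg_adj} and {pos_adj} one", toy_lexicons)))       # 182
print(len(expand("how {pos_adj} our {pos_adj} talents", toy_lexicons)))  # 13
print(len(expand(".", toy_lexicons)))                                    # 1
```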

## Checklist

In [14]:
from checklist.editor import Editor
from checklist.test_suite import TestSuite
from checklist.test_types import MFT

In [15]:
lexicons = tg.lexicons
templates = tg.template_texts
masked = tg.masked_texts
labels = [sent.prediction.label for sent in tg.sentences]

editor = Editor()
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [16]:
# One MFT per template; i is the 1-based template index used in the test name
for i, (template, label) in enumerate(zip(templates, labels), start=1):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary", 
        name=f"Test: MFT with vocabullary - template{i}",
        description="Checking if the model can handle vocabullary"))

In [17]:
suite.run(model.predict, overwrite=True)

Running Test: MFT with vocabullary - template1
Predicting 1 examples
Running Test: MFT with vocabullary - template2
Predicting 182 examples
Running Test: MFT with vocabullary - template3
Predicting 182 examples
Running Test: MFT with vocabullary - template4
Predicting 182 examples
Running Test: MFT with vocabullary - template5
Predicting 182 examples
Running Test: MFT with vocabullary - template6
Predicting 14 examples
Running Test: MFT with vocabullary - template7
Predicting 13 examples
Running Test: MFT with vocabullary - template8
Predicting 182 examples
Running Test: MFT with vocabullary - template9
Predicting 14 examples
Running Test: MFT with vocabullary - template10
Predicting 182 examples
Running Test: MFT with vocabullary - template11
Predicting 182 examples
Running Test: MFT with vocabullary - template12
Predicting 1 examples
Running Test: MFT with vocabullary - template13
Predicting 1 examples
Running Test: MFT with vocabullary - template14
Predicting 13 examples
Running Test: MFT with vocabullary - template15
Predicting 182 examples
Running Test: MFT with vocabullary - template16
Predicting 14 examples
Running 

In [18]:
suite.summary()

Vocabullary

Test: MFT with vocabullary - template1
Test cases:      1
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template2
Test cases:      182
Fails (rate):    5 (2.7%)

Example fails:
0.6 opens at a key ends on the protagonist 's death bed and does n't get special livelier in the three hours in between .
----
0.7 opens at a thoughtful ends on the protagonist 's death bed and does n't get special livelier in the three hours in between .
----
0.6 opens at a angry ends on the protagonist 's death bed and does n't get special livelier in the three hours in between .
----


Test: MFT with vocabullary - template3
Test cases:      182
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template4
Test cases:      182
Fails (rate):    16 (8.8%)

Example fails:
1.0 plympton will find room for one detailed member of his little band a thoughtful screenwriter .
----
0.6 plympton will find room for one intelligent member of his little band a key screenwriter .
----
0.9 plympton
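
For an MFT, a test case fails when the predicted label differs from the expected one, and the rate is the number of fails divided by the number of test cases. The sketch below reproduces the percentages in the summary under that reading. `fail_rate` is a hypothetical helper, not part of CheckList.

```python
def fail_rate(predictions, expected):
    """Return (number of fails, percentage rounded to one decimal)."""
    fails = sum(1 for p, e in zip(predictions, expected) if p != e)
    return fails, round(100 * fails / len(expected), 1)

# 182 cases expecting label 0, with 5 predicted as 1 (as in template2)
expected = [0] * 182
predictions = [1] * 5 + [0] * 177
print(fail_rate(predictions, expected))  # (5, 2.7)
```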

In [19]:
suite.save('./suites/posneg-random.suite')

# Loading the test suite

In [20]:
from checklist.test_suite import TestSuite
suite = TestSuite.from_file('./suites/posneg-random.suite')

suite.visual_summary_table()

Please wait as we prepare the table data...


SuiteSummarizer(stats={'npassed': 0, 'nfailed': 0, 'nfiltered': 0}, test_infos=[{'name': 'Test: MFT with vocab…