# Random approach

This notebook uses the random approach to generate templates, focusing on positive and negative templates. One possible application is testing the *Vocabulary* linguistic capability with the **MFT** (Minimum Functionality Test).

The steps of this approach are:

1. Split the instances into sentences
2. Randomly select a sample of *K* sentences
3. Rank the words of each sentence
4. Predict the label of each sentence using the *Oracle*
5. Replace the relevant words with masks
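
The steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the regex sentence splitter, the helper names `split_sentences` and `mask_words`, and the hand-written word ranking are hypothetical and are not taken from the `template_generator` package.

```python
import random
import re

def split_sentences(text):
    # Naive splitter (assumption): break on whitespace that follows ., ! or ?
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

def mask_words(sentence, ranked_words, n_masks=2):
    # Replace the n_masks highest-ranked words with a {mask} placeholder.
    to_mask = set(ranked_words[:n_masks])
    return ' '.join('{mask}' if tok in to_mask else tok for tok in sentence.split())

instances = ["i loved it ! the plot is thin .", "a dull , lifeless film ."]

# Steps 1-2: split into sentences and draw K of them at random.
sentences = [s for text in instances for s in split_sentences(text)]
rng = random.Random(220)
sampled = rng.sample(sentences, k=2)

# Steps 3-5: a hand-written ranking stands in for the real word scorer and oracle.
masked = mask_words("i loved it !", ranked_words=["loved"], n_masks=1)
print(masked)  # i {mask} it !
```

A local `random.Random` instance keeps the sampling reproducible without touching the global seed.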

In [1]:
%config Completer.use_jedi = False
import sys
import random

sys.path.append('../../')
random.seed(220) 

## Loading the dataset, the target model, and the auxiliary models

In [2]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

movie_reviews_rt_df = pd.read_csv('./data/data-rt-1000samples.csv')
movie_reviews_rt_df.head(5)

Unnamed: 0,label,text,words
0,1,drumline ably captures the complicated relationships in a marching band .,11
1,1,delivers roughly equal amounts of beautiful movement and inside information .,11
2,1,saved from being merely way-cool by a basic credible compassion .,11
3,1,this is a movie full of grace and ultimately hope .,11
4,1,the imax screen enhances the personal touch of manual animation .,11


In [3]:
import re
import numpy as np
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

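def pre_proccess(text):
    # Lowercase, then replace quotes, digits, and most ASCII punctuation with spaces.
    # The ranges `!-.` and `:-@` already cover parentheses, so no extra group is needed.
    text = text.lower()
    text = re.sub('["\',!-.:-@0-9/]', ' ', text)
    return text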

# Wrapper to adapt output format
class SentimentAnalisysModelWrapper:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        
    def __predict(self, text_input):
        text_preprocessed = pre_proccess(text_input)
        tokenized = self.tokenizer(text_preprocessed, padding=True, truncation=True, max_length=512, 
                                    add_special_tokens = True, return_tensors="pt")
        
        tensor_logits = self.model(**tokenized)
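        # Softmax over the class dimension; passing dim explicitly avoids the implicit-dimension warning
        prob = softmax(tensor_logits[0], dim=-1).detach().numpy()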
        pred = np.argmax(prob)
        
        return pred, prob
    
    def predict_label(self, text_inputs):
        return self.predict(text_inputs)[0]
        
    def predict_proba(self, text_inputs):
        return self.predict(text_inputs)[1]
        
    def predict(self, text_inputs):
        if isinstance(text_inputs, str):
            text_inputs = [text_inputs]
        
        preds = []
        probs = []

        for text_input in text_inputs:
            pred, prob = self.__predict(text_input)
            preds.append(pred)
            probs.append(prob[0])

        return np.array(preds), np.array(probs) # ([0, 1], [[0.99, 0.01], [0.03, 0.97]])

# Helper function to load and wrap a model from Hugging Face
def load_model(model_name):
    print(f'Loading model {model_name}...')
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    return SentimentAnalisysModelWrapper(model, tokenizer)

# Hugging Face hosted model names 
movie_reviews_models = {
    'bert': 'textattack/bert-base-uncased-rotten-tomatoes', 
    'albert': 'textattack/albert-base-v2-rotten-tomatoes', 
    'distilbert': 'textattack/distilbert-base-uncased-rotten-tomatoes', 
    'roberta': 'textattack/roberta-base-rotten-tomatoes', 
    'xlnet': 'textattack/xlnet-base-cased-rotten-tomatoes'
}

In [4]:
m1 = load_model(movie_reviews_models['albert'])
m2 = load_model(movie_reviews_models['distilbert'])
m3 = load_model(movie_reviews_models['roberta'])
m4 = load_model(movie_reviews_models['xlnet'])

# Models to be used as oracle
models = [m1, m2, m3, m4]
# Target model
model = load_model(movie_reviews_models['bert'])

Loading model textattack/albert-base-v2-rotten-tomatoes...
Loading model textattack/distilbert-base-uncased-rotten-tomatoes...
Loading model textattack/roberta-base-rotten-tomatoes...


Some weights of the model checkpoint at textattack/roberta-base-rotten-tomatoes were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Loading model textattack/xlnet-base-cased-rotten-tomatoes...
Loading model textattack/bert-base-uncased-rotten-tomatoes...
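
The notebook does not show how the four auxiliary models are combined into the oracle. The sketch below assumes a simple majority vote, which may differ from what `template_generator` actually does. `StubModel` and `oracle_predict` are hypothetical names; the stub only mimics the `predict_label` method of the wrapper defined above.

```python
from collections import Counter

class StubModel:
    """Hypothetical stand-in for a wrapped classifier exposing predict_label."""
    def __init__(self, label):
        self.label = label

    def predict_label(self, text_inputs):
        return [self.label for _ in text_inputs]

def oracle_predict(models, texts):
    # Collect one label per model per text, then keep the most common label
    # together with the fraction of models that voted for it.
    votes = [m.predict_label(texts) for m in models]
    results = []
    for per_text in zip(*votes):
        label, count = Counter(per_text).most_common(1)[0]
        results.append((label, count / len(models)))
    return results

stubs = [StubModel(1), StubModel(1), StubModel(0), StubModel(1)]
result = oracle_predict(stubs, ["i loved it !"])
print(result)  # [(1, 0.75)]
```

With an even number of models a tie is possible; in this sketch `Counter.most_common` then returns the label that was counted first, so a real oracle would need an explicit tie-breaking rule.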


# Generating the templates
The word-ranking method used in PosNegTemplateGenerator is the Replace-1 Score.
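
As a rough illustration of the idea behind a replace-one ranking, the sketch below scores each word by how much the predicted probability changes when that single word is swapped for a neutral placeholder. This definition is an assumption, not the package's implementation; `toy_positive_proba`, `replace_one_scores`, and the `[UNK]` placeholder are hypothetical.

```python
def toy_positive_proba(sentence):
    # Hypothetical stand-in for a model: counts cue words to fake a probability.
    positive, negative = {"loved", "great"}, {"dull", "boring"}
    tokens = sentence.split()
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    return min(1.0, max(0.0, 0.5 + 0.4 * score))

def replace_one_scores(sentence, proba_fn, placeholder="[UNK]"):
    tokens = sentence.split()
    base = proba_fn(sentence)
    scores = []
    for i, tok in enumerate(tokens):
        # Replace exactly one token and measure the change in probability.
        perturbed = tokens[:i] + [placeholder] + tokens[i + 1:]
        scores.append((tok, abs(base - proba_fn(' '.join(perturbed)))))
    # Highest score first: the words the prediction depends on most.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

ranking = replace_one_scores("i loved it !", toy_positive_proba)
print(ranking[0][0])  # loved
```

Each sentence needs one extra model call per word, so the cost of ranking grows with sentence length.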

In [5]:
from template_generator.tasks.sentiment_analisys import PosNegTemplateGeneratorRandom

tg = PosNegTemplateGeneratorRandom(model, models)

### Initial number of instances: 5

In [6]:
# Sampling instances
np.random.seed(220)
n_instances = 5
df_sampled = movie_reviews_rt_df.sample(n_instances)

instances = df_sampled['text'].tolist()

In [7]:
templates = tg.generate_templates(instances, n_masks=2, k_templates=1)

Converting texts to sentences...
:: 9 sentences were generated.
Ranking words using Replace-1 Score...


:: Word ranking done.
Predicting inputs...
:: Sentence predictions done.


#### Execution time for 5 instances: 2.1s

In [8]:
df = tg.to_dataframe()
df

Unnamed: 0,label,original_text,masked_text,template_text
0,1,i loved it !,i {mask} it !,i {pos_verb} it !


In [9]:
tg.lexicons

{'pos_verb': ['loved'], 'neg_verb': [], 'pos_adj': [], 'neg_adj': []}

### Initial number of instances: 100

In [10]:
# Using all 100 instances
instances = movie_reviews_rt_df['text'].tolist()

In [11]:
tg = PosNegTemplateGeneratorRandom(model, models)
templates = tg.generate_templates(instances, n_masks=2, k_templates=100)

Converting texts to sentences...
:: 1473 sentences were generated.
Ranking words using Replace-1 Score...


:: Word ranking done.
Predicting inputs...
:: Sentence predictions done.


#### Execution time for 100 instances: 45.5s

In [12]:
df = tg.to_dataframe()
df

Unnamed: 0,label,original_text,masked_text,template_text
0,1,birthday girl doesn't try to surprise us with plot twists but rather seems to enjoy its own transparency .,birthday girl {mask} n't try to surprise us with plot twists but rather {mask} to enjoy its own transparency .,birthday girl {neg_verb} n't try to surprise us with plot twists but rather {neg_verb} to enjoy its own transparency .
1,1,reinforces the often forgotten fact of the world's remarkably varying human population and mindset and its capacity to heal using creative natural and ancient antidotes .,reinforces the often forgotten fact of the world 's remarkably varying human population and mindset and its capacity to {mask} using {mask} natural and ancient antidotes .,reinforces the often forgotten fact of the world 's remarkably varying human population and mindset and its capacity to {pos_verb} using {pos_adj} natural and ancient antidotes .
2,0,keep the movie from ever reaching the comic heights it obviously desired .,{mask} the movie from ever reaching the {mask} heights it obviously desired .,{pos_verb} the movie from ever reaching the {neg_adj} heights it obviously desired .
3,0,"it's as if a bored cage spent the duration of the film's shooting schedule waiting to scream : "" got aids yet ? """,it 's as if a {mask} cage {mask} the duration of the film 's shooting schedule waiting to scream : `` got aids yet ? ``,it 's as if a {neg_adj} cage {neg_verb} the duration of the film 's shooting schedule waiting to scream : `` got aids yet ? ``
4,0,lanie's professional success means she must be a failure at life because she's driven by ambition and doesn't know how to have fun .,lanie 's professional success means she must be a failure at life because she 's driven by ambition and {mask} n't {mask} how to have fun .,lanie 's professional success means she must be a failure at life because she 's driven by ambition and {neg_verb} n't {pos_verb} how to have fun .
...,...,...,...,...
95,0,in the not-too-distant future movies like ghost ship will be used as analgesic balm for overstimulated minds .,in the not-too-distant future movies like ghost ship will be {mask} as {mask} balm for overstimulated minds .,in the not-too-distant future movies like ghost ship will be {neg_verb} as {neg_adj} balm for overstimulated minds .
96,0,.,.,.
97,0,it doesn't believe in itself it has no sense of humor?it's just plain bored .,it does n't {mask} in itself it has no sense of humor ? it {mask} just plain bored .,it does n't {pos_verb} in itself it has no sense of humor ? it {neg_verb} just plain bored .
98,0,just how extreme are these ops ?,just how {mask} {mask} these ops ?,just how {pos_adj} {neg_verb} these ops ?


In [13]:
tg.lexicons

{'pos_verb': ['gloss',
  'create',
  'gets',
  'thank',
  'beginning',
  'savor',
  'speaks',
  'keep',
  'storytelling',
  'realize',
  'orchestrates',
  'watched',
  'gives',
  'amazing',
  'is',
  'proven',
  'believe',
  'logic',
  'overlook',
  'heal',
  'makes',
  'astonishing',
  'get',
  'marking',
  'keeps',
  'admire',
  'know'],
 'neg_verb': ['escape',
  'bedevils',
  'seems',
  'begins',
  'ending',
  'yawp',
  'employ',
  'see',
  "'s",
  'justify',
  'has',
  'told',
  'mounting',
  'can',
  'used',
  'be',
  'sink',
  'think',
  'produced',
  'fascinate',
  'disgusting',
  'suit',
  'take',
  'stuffs',
  'sell',
  'will',
  'lacking',
  'are',
  'directed',
  'disintegrates',
  'telegraphed',
  'lose',
  'associated',
  'lived',
  'plays',
  'describe',
  'exist',
  'may',
  'named',
  'remains',
  'wreak',
  'was',
  'could',
  'have',
  'does',
  'fails',
  'spent',
  'forced',
  'scattered',
  'overcome'],
 'pos_adj': ['cinematic',
  'remarkable',
  'creative',
  'vis

## Checklist

In [14]:
from checklist.editor import Editor
from checklist.test_suite import TestSuite
from checklist.test_types import MFT

In [15]:
lexicons = tg.lexicons
templates = tg.template_texts
masked = tg.masked_texts
labels = [sent.prediction.label for sent in tg.sentences]

editor = Editor()
editor.add_lexicon('pos_verb', lexicons['pos_verb'])
editor.add_lexicon('neg_verb', lexicons['neg_verb'])
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [16]:
for i, (template, label) in enumerate(zip(templates, labels), start=1):
    # Fill the template placeholders with the registered lexicons
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    # One MFT per template; every generated example shares the template's label
    suite.add(MFT(
        data=t.data,
        labels=int(label),
        capability="Vocabullary", 
        name=f"Test: MFT with vocabullary - template{i}",
        description="Checking if the model can handle vocabullary"))
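
To make the number of generated test cases easier to reason about, the sketch below expands a template as a Cartesian product over lexicon entries in plain Python. It is not CheckList's implementation; `expand_template` and `toy_lexicons` are hypothetical, and the sketch assumes that a placeholder repeated within a template receives the same word in every slot.

```python
import re
from itertools import product

def expand_template(template, lexicons):
    # Distinct placeholder names, in order of first appearance.
    names = list(dict.fromkeys(re.findall(r'\{(\w+)\}', template)))
    filled = []
    for combo in product(*(lexicons[name] for name in names)):
        # A repeated placeholder is filled with the same word everywhere.
        filled.append(template.format(**dict(zip(names, combo))))
    return filled

toy_lexicons = {"pos_verb": ["loved", "admire"], "neg_adj": ["dreary", "dull", "bored"]}

one_slot = expand_template("i {pos_verb} it !", toy_lexicons)
two_slots = expand_template("{pos_verb} the movie from ever reaching the {neg_adj} heights .", toy_lexicons)
print(one_slot)        # ['i loved it !', 'i admire it !']
print(len(two_slots))  # 6
```

Under this reading, a template with two different placeholders yields the product of the two lexicon sizes, and a template with no placeholder yields a single example. This is consistent with the run recorded in this notebook, where template1 uses `{neg_verb}` twice and produces 50 examples, the number of `neg_verb` entries listed above.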

In [17]:
suite.run(model.predict, overwrite=True)

Running Test: MFT with vocabullary - template1
Predicting 50 examples


Running Test: MFT with vocabullary - template2
Predicting 756 examples
Running Test: MFT with vocabullary - template3
Predicting 945 examples
Running Test: MFT with vocabullary - template4
Predicting 1750 examples
Running Test: MFT with vocabullary - template5
Predicting 1350 examples
Running Test: MFT with vocabullary - template6
Predicting 1400 examples
Running Test: MFT with vocabullary - template7
Predicting 1350 examples
Running Test: MFT with vocabullary - template8
Predicting 1 examples
Running Test: MFT with vocabullary - template9
Predicting 28 examples
Running Test: MFT with vocabullary - template10
Predicting 1350 examples
Running Test: MFT with vocabullary - template11
Predicting 1 examples
Running Test: MFT with vocabullary - template12
Predicting 945 examples
Running Test: MFT with vocabullary - template13
Predicting 1 examples
Running Test: MFT with vocabullary - template14
Predicting 1 examples
Running Test: MFT with vocabullary - template15
Predicting 1750 examples
Run

In [18]:
suite.summary()

Vocabullary

Test: MFT with vocabullary - template1
Test cases:      50
Fails (rate):    49 (98.0%)

Example fails:
0.0 birthday girl 's n't try to surprise us with plot twists but rather 's to enjoy its own transparency .
----
0.0 birthday girl exist n't try to surprise us with plot twists but rather exist to enjoy its own transparency .
----
0.0 birthday girl justify n't try to surprise us with plot twists but rather justify to enjoy its own transparency .
----


Test: MFT with vocabullary - template2
Test cases:      756
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template3
Test cases:      945
Fails (rate):    81 (8.6%)

Example fails:
0.9 gives the movie from ever reaching the dreary heights it obviously desired .
----
0.9 gives the movie from ever reaching the rival heights it obviously desired .
----
1.0 amazing the movie from ever reaching the many heights it obviously desired .
----


Test: MFT with vocabullary - template4
Test cases:      1750
Fails (rate):    0 

In [19]:
suite.save('./suites/posneg-random.suite')

# Loading the test suite

In [20]:
from checklist.test_suite import TestSuite
suite = TestSuite.from_file('./suites/posneg-random.suite')

suite.visual_summary_table()

Please wait as we prepare the table data...


SuiteSummarizer(stats={'npassed': 0, 'nfailed': 0, 'nfiltered': 0}, test_infos=[{'name': 'Test: MFT with vocab…