# Approach 1

This notebook applies approach 1 to generate templates, focusing on positive and negative templates. One possible application is testing the *Vocabulary* linguistic capability with an **MFT** (Minimum Functionality Test).

The steps of this approach are:

1. Rank the words of the complete instances
2. Split the instances into sentences
3. Keep only the sentences that contain at least one of the top-ranked words from the previous step
4. Keep only the sentences that contain relevant words (adjectives or verbs)
5. Classify the sentences using the *Oracle*
6. Keep only the sentences that were classified unanimously
7. Replace the relevant words with masks
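
As a rough illustration of steps 2, 3 and 7, the sketch below uses plain Python only. The helper names (`split_sentences`, `contains_ranked_word`, `mask_words`), the regex-based sentence splitter, and the hard-coded word lists are assumptions made for this example. They are not the `template_generator` implementation, which may tokenize and tag words differently.

```python
import re

def split_sentences(text):
    """Step 2: naive split after '.', '!' or '?' (stand-in for a real sentence tokenizer)."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

def contains_ranked_word(sentence, ranked_words):
    """Step 3: True if the sentence holds at least one top-ranked word."""
    tokens = set(sentence.lower().split())
    return any(word in tokens for word in ranked_words)

def mask_words(sentence, relevant_words, n_masks=2):
    """Step 7: replace up to n_masks relevant words with the {mask} placeholder."""
    masked, used = [], 0
    for token in sentence.split():
        if used < n_masks and token.lower() in relevant_words:
            masked.append('{mask}')
            used += 1
        else:
            masked.append(token)
    return ' '.join(masked)

text = "the plot is thin . but the acting is incredible ! i loved it ."
ranked = ['incredible', 'loved']           # hypothetical output of step 1
relevant = {'incredible', 'loved', 'is'}   # hypothetical adjectives/verbs from step 4

sentences = split_sentences(text)
kept = [s for s in sentences if contains_ranked_word(s, ranked)]
masked = [mask_words(s, relevant) for s in kept]
print(masked)
```

The first sentence is dropped because it contains no ranked word. The remaining two have their relevant words replaced by `{mask}`.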

In [1]:
%config Completer.use_jedi = False
import sys
sys.path.append('../../')

## Loading the dataset, the target model, and the auxiliary models

In [2]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

movie_reviews_rt_df = pd.read_csv('./data/data-rt-1000samples.csv')
movie_reviews_rt_df.head(5)

Unnamed: 0,label,text,words
0,1,drumline ably captures the complicated relationships in a marching band .,11
1,1,delivers roughly equal amounts of beautiful movement and inside information .,11
2,1,saved from being merely way-cool by a basic credible compassion .,11
3,1,this is a movie full of grace and ultimately hope .,11
4,1,the imax screen enhances the personal touch of manual animation .,11


In [3]:
import re
import numpy as np
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def pre_proccess(text):
    text = text.lower()
    # Replace quotes, digits and punctuation with spaces
    text = re.sub('["\',!-.:-@0-9/]', ' ', text)
    return text

# Wrapper to adapt output format
class SentimentAnalisysModelWrapper:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        
    def __predict(self, text_input):
        text_preprocessed = pre_proccess(text_input)
        tokenized = self.tokenizer(text_preprocessed, padding=True, truncation=True, max_length=512, 
                                    add_special_tokens = True, return_tensors="pt")
        
        tensor_logits = self.model(**tokenized)
        # Pass dim explicitly to avoid PyTorch's implicit-dimension warning
        prob = softmax(tensor_logits[0], dim=-1).detach().numpy()
        pred = np.argmax(prob)
        
        return pred, prob
    
    def predict_label(self, text_inputs):
        return self.predict(text_inputs)[0]
        
    def predict_proba(self, text_inputs):
        return self.predict(text_inputs)[1]
        
    def predict(self, text_inputs):
        if isinstance(text_inputs, str):
            text_inputs = [text_inputs]
        
        preds = []
        probs = []

        for text_input in text_inputs:
            pred, prob = self.__predict(text_input)
            preds.append(pred)
            probs.append(prob[0])

        return np.array(preds), np.array(probs) # ([0, 1], [[0.99, 0.01], [0.03, 0.97]])

# Helper function to load and wrap a model from Hugging Face
def load_model(model_name):
    print(f'Loading model {model_name}...')
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    return SentimentAnalisysModelWrapper(model, tokenizer)

# Hugging Face hosted model names 
movie_reviews_models = {
    'bert': 'textattack/bert-base-uncased-rotten-tomatoes', 
    'albert': 'textattack/albert-base-v2-rotten-tomatoes', 
    'distilbert': 'textattack/distilbert-base-uncased-rotten-tomatoes', 
    'roberta': 'textattack/roberta-base-rotten-tomatoes', 
    'xlnet': 'textattack/xlnet-base-cased-rotten-tomatoes'
}

In [4]:
m1 = load_model(movie_reviews_models['albert'])
m2 = load_model(movie_reviews_models['distilbert'])
m3 = load_model(movie_reviews_models['roberta'])
m4 = load_model(movie_reviews_models['xlnet'])

# Models to be used as oracle
models = [m1, m2, m3, m4]
# Target model
model = load_model(movie_reviews_models['bert'])

Loading model textattack/albert-base-v2-rotten-tomatoes...
Loading model textattack/distilbert-base-uncased-rotten-tomatoes...
Loading model textattack/roberta-base-rotten-tomatoes...


Some weights of the model checkpoint at textattack/roberta-base-rotten-tomatoes were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Loading model textattack/xlnet-base-cased-rotten-tomatoes...
Loading model textattack/bert-base-uncased-rotten-tomatoes...
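
The four auxiliary models act together as the oracle, and only sentences on which they all agree are kept. A minimal sketch of such a unanimous vote follows, assuming each model exposes `predict_label(texts)` returning an array of labels, as the wrapper above does. `ToyModel` and `unanimous_labels` are hypothetical names introduced here so the example runs without downloading any checkpoint.

```python
import numpy as np

class ToyModel:
    """Hypothetical stand-in exposing the wrapper's predict_label(texts) method."""
    def __init__(self, rule):
        self.rule = rule

    def predict_label(self, text_inputs):
        return np.array([self.rule(t) for t in text_inputs])

def unanimous_labels(oracle_models, texts):
    """Return one label per text, or None when the oracle models disagree."""
    # votes has shape (n_models, n_texts)
    votes = np.stack([m.predict_label(texts) for m in oracle_models])
    agreed = (votes == votes[0]).all(axis=0)
    return [int(label) if ok else None for label, ok in zip(votes[0], agreed)]

toy_oracle = [
    ToyModel(lambda t: int('good' in t)),
    ToyModel(lambda t: int('good' in t or 'fine' in t)),
]
texts = ['a good film', 'a fine film', 'a dull film']
print(unanimous_labels(toy_oracle, texts))
```

Texts that come back as `None` would be discarded, since the oracle gave no unanimous label for them.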


# Generating the templates
The word-ranking method used by PosNegTemplateGenerator is the Replace-1 Score.
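
A minimal sketch of a Replace-1 style score follows, assuming the usual definition: the change in the model's confidence for its predicted class when a single word is replaced by an unknown token. The function `replace1_scores`, the `[UNK]` token, the toy classifier, and the sort by absolute magnitude are assumptions made for illustration. The generator's actual implementation may differ.

```python
import numpy as np

def replace1_scores(predict_proba, text, unk_token='[UNK]'):
    """Score each word by the confidence change caused by replacing it with unk_token."""
    words = text.split()
    base = predict_proba([text])[0]
    target = int(np.argmax(base))  # class predicted for the unmodified text
    scores = []
    for i, word in enumerate(words):
        replaced = words[:i] + [unk_token] + words[i + 1:]
        prob = predict_proba([' '.join(replaced)])[0]
        scores.append((word, float(base[target] - prob[target])))
    # Assumption: the largest absolute change ranks first
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)

def toy_predict_proba(texts):
    """Hypothetical classifier: P(positive) rises with 'incredible', falls with 'boring'."""
    out = []
    for t in texts:
        tokens = t.split()
        p_pos = 0.5 + 0.4 * ('incredible' in tokens) - 0.4 * ('boring' in tokens)
        out.append([1.0 - p_pos, p_pos])
    return np.array(out)

ranking = replace1_scores(toy_predict_proba, 'the acting is incredible')
print(ranking[0][0])
```

Any callable that maps a list of texts to an array of class probabilities, such as the wrapper's `predict_proba`, could be passed in place of `toy_predict_proba`.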

In [5]:
from template_generator.tasks.sentiment_analisys import PosNegTemplateGeneratorApp1

tg = PosNegTemplateGeneratorApp1(model, models)

showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml


### Initial number of instances: 5

In [6]:
# Sampling instances
np.random.seed(220)
n_instances = 5
df_sampled = movie_reviews_rt_df.sample(n_instances)

instances = [x for x in df_sampled['text'].values]

In [7]:
templates = tg.generate_templates(instances, n_masks=2, ranked_words_count=4)

Ranking words using Replace-1 Score...
Converting texts to sentences...
:: 9 sentences were generated.
Filtering instances by contaning ranked words...
:: 3 sentences remaining.
4
Filtering instances by relevant words...
['VERB', 'ADJ']
{word: loved, index: 24, tag: VERB, rank_score: -0.0017096400260925293}
{word: i, index: 23, tag: NOUN, rank_score: -0.0009018778800964355}
{word: it, index: 25, tag: PRON, rank_score: 0.00016921758651733398}
{word: !, index: 26, tag: ., rank_score: 0.0}
 
['VERB', 'ADJ']
{word: 'performance, index: 29, tag: NOUN, rank_score: 0.0009433627128601074}
{word: incredible, index: 32, tag: ADJ, rank_score: 0.0006940364837646484}
{word: is, index: 31, tag: VERB, rank_score: -0.00015503168106079102}
{word: 's, index: 28, tag: PRT, rank_score: 0.0001334547996520996}
 
['VERB', 'ADJ']
{word: overcome, index: 21, tag: VERB, rank_score: -0.0029458999633789062}
{word: part, index: 6, tag: NOUN, rank_score: -0.0015453696250915527}
{word: always, index: 4, tag: ADV, rank_score: -0.0013220906257629395}
{word

#### Execution time for 5 instances: 9.7s

In [8]:
df = tg.to_dataframe()
df

Unnamed: 0,label,original_text,masked_text,template_text
0,1,gollum's 'performance' is incredible !,gollum 's 'performance ' {mask} {mask} !,gollum 's 'performance ' {pos_verb} {pos_adj} !
1,1,"but tongue-in-cheek preposterousness has always been part of for the most part wilde's droll whimsy helps "" being earnest "" overcome its weaknesses and parker's creative interference .",but tongue-in-cheek preposterousness has always been part of for the most part wilde 's droll whimsy helps `` being earnest `` {mask} its weaknesses and parker 's {mask} interference .,but tongue-in-cheek preposterousness has always been part of for the most part wilde 's droll whimsy helps `` being earnest `` {neg_verb} its weaknesses and parker 's {pos_adj} interference .


In [9]:
tg.lexicons

{'pos_verb': ['is'],
 'neg_verb': ['overcome'],
 'pos_adj': ['creative', 'incredible'],
 'neg_adj': []}

### Initial number of instances: 100

In [10]:
# Using all 100 instances
instances = [x for x in movie_reviews_rt_df['text'].values]

In [11]:
tg = PosNegTemplateGeneratorApp1(model, models)
templates = tg.generate_templates(instances, n_masks=2, ranked_words_count=4)

Ranking words using Replace-1 Score...
Converting texts to sentences...
:: 1473 sentences were generated.
Filtering instances by contaning ranked words...
:: 305 sentences remaining.
4
Filtering instances by relevant words...
['VERB', 'ADJ']
{word: nonsense, index: 1, tag: NOUN, rank_score: -0.0002835392951965332}
{word: this, index: 2, tag: DET, rank_score: -0.0002015233039855957}
{word: sad, index: 0, tag: ADJ, rank_score: -3.2961368560791016e-05}
{word: ., index: 3, tag: ., rank_score: 0.0}
 
['VERB', 'ADJ']
{word: money, index: 9, tag: NOUN, rank_score: -0.04729241132736206}
{word: cry, index: 6, tag: VERB, rank_score: 0.0039604902267456055}
{word: you, index: 4, tag: PRON, rank_score: 0.002862989902496338}
{word: for, index: 7, tag: ADP, rank_score: 0.0012971758842468262}
 
['VERB', 'ADJ']
{word: movie, index: 4, tag: NOUN, rank_score: -6.133317947387695e-05}
{word: just, index: 6, tag: ADV, rank_score: -5.97834587097168e-05}
{word: is, index: 5, tag: VERB, rank_score: -5.501508712768555e-05}
{word: a, index: 7, tag: D

#### Execution time for 100 instances: 4m 17.8s

In [12]:
df = tg.to_dataframe()
df

Unnamed: 0,label,original_text,masked_text,template_text
0,0,still i thought it could have been more .,still {mask} thought it could have been {mask} .,still {neg_adj} thought it could have been {pos_adj} .
1,1,quite good at providing some good old fashioned spooks .,quite {mask} at {mask} some good old fashioned spooks .,quite {pos_adj} at {pos_verb} some good old fashioned spooks .
2,0,lucy's a dull girl that's all .,lucy 's a {mask} girl that {mask} all .,lucy 's a {neg_adj} girl that {neg_verb} all .
3,0,william shatner as a pompous professor is the sole bright spot .,william shatner as a {mask} professor is the {mask} bright spot .,william shatner as a {neg_adj} professor is the {neg_adj} bright spot .
4,0,dreary highly annoying .,{mask} highly {mask} .,{neg_adj} highly {neg_verb} .
...,...,...,...,...
104,1,"the gorgeously elaborate continuation of "" the lord of the rings "" trilogy is so huge that a column of words cannot adequately describe co-writer/director peter jackson's expanded vision of j .",the gorgeously elaborate continuation of `` the lord of the rings `` trilogy is so {mask} that a column of words {mask} not adequately describe co-writer/director peter jackson 's expanded vision of j .,the gorgeously elaborate continuation of `` the lord of the rings `` trilogy is so {neg_adj} that a column of words {neg_verb} not adequately describe co-writer/director peter jackson 's expanded vision of j .
105,0,it's hard to say who might enjoy this are there tolstoy groupies out there ?,it 's hard to {mask} who {mask} enjoy this are there tolstoy groupies out there ?,it 's hard to {neg_verb} who {neg_verb} enjoy this are there tolstoy groupies out there ?
106,0,there are far worse messages to teach a young audience which will probably be perfectly happy with the sloppy slapstick comedy .,there are far {mask} messages to teach a young audience which will probably be perfectly {mask} with the sloppy slapstick comedy .,there are far {neg_adj} messages to teach a young audience which will probably be perfectly {pos_adj} with the sloppy slapstick comedy .
107,0,parker should be commended for taking a fresh approach to familiar material but his determination to remain true to the original text leads him to adopt a somewhat mannered tone .,parker {mask} be {mask} for taking a fresh approach to familiar material but his determination to remain true to the original text leads him to adopt a somewhat mannered tone .,parker {neg_verb} be {pos_verb} for taking a fresh approach to familiar material but his determination to remain true to the original text leads him to adopt a somewhat mannered tone .


In [13]:
tg.lexicons

{'pos_verb': ['makes',
  'is',
  'fun',
  'shine',
  'feel',
  'love',
  'keeps',
  'making',
  'dazzling',
  'talento',
  'enjoy',
  'getting',
  'commended',
  'moving',
  'keep',
  'jarring',
  'gotten',
  'comes',
  'thank',
  'gives',
  'watching',
  'being',
  'como',
  'bring',
  'seem',
  'saving',
  'look',
  'providing',
  'find',
  'goes'],
 'neg_verb': ['required',
  'say',
  'annoying',
  'missing',
  'should',
  'expect',
  'take',
  'maddening',
  'might',
  'does',
  'quibble',
  'may',
  'sticking',
  "'s",
  'have',
  'guide',
  'seems',
  'be',
  'works',
  'sounded',
  'heavy-handed',
  'avoid',
  'wind',
  'adding',
  'pull',
  'can',
  'diminishing',
  'fails',
  'involving',
  'has',
  'milked',
  'thought',
  'coupled',
  'will',
  'desired',
  'could',
  'staged',
  'fall',
  'swinging',
  'overcome',
  'are',
  'plays',
  'report',
  'work',
  'play',
  'executed',
  'regret',
  'go',
  'expects',
  'hoping',
  'leaving',
  'dabbles',
  'feels'],
 'pos_adj': [

## Checklist

In [14]:
import checklist
from checklist.editor import Editor
from checklist.test_suite import TestSuite
from checklist.test_types import MFT

In [15]:
lexicons = tg.lexicons
templates = tg.template_texts
masked = tg.masked_texts
labels = [sent.prediction.label for sent in tg.sentences]

editor = Editor()
editor.add_lexicon('pos_verb', lexicons['pos_verb'])
editor.add_lexicon('neg_verb', lexicons['neg_verb'])
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [16]:
for i, (template, label) in enumerate(zip(templates, labels)):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary", 
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabullary")) 
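
To give an intuition for how many test cases a template yields, the sketch below expands placeholders with a plain Cartesian product. It is not CheckList's `Editor.template`. It is a simplified stand-in that assumes, as the example counts in the run below suggest, that a placeholder name repeated within a template receives the same word in every position. `expand_template` and the toy lexicons are hypothetical.

```python
import itertools
import re

def expand_template(template, lexicons):
    """Fill each distinct {placeholder} with every word of its lexicon."""
    names = []
    for name in re.findall(r'\{(\w+)\}', template):
        if name not in names:
            names.append(name)
    examples = []
    for combo in itertools.product(*(lexicons[n] for n in names)):
        text = template
        for name, value in zip(names, combo):
            text = text.replace('{' + name + '}', value)
        if text not in examples:  # drop duplicate sentences
            examples.append(text)
    return examples

toy_lexicons = {
    'neg_adj': ['dull', 'dreary'],
    'neg_verb': ['fails', 'drags', 'bores'],
}
two_names = expand_template("lucy 's a {neg_adj} girl that {neg_verb} all .", toy_lexicons)
one_name = expand_template("a {neg_adj} professor is the {neg_adj} spot .", toy_lexicons)
print(len(two_names), len(one_name))
```

With two distinct placeholders the count is the product of the lexicon sizes. With one placeholder used twice it is just the size of that lexicon.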

In [17]:
suite.run(model.predict, overwrite=True)

Running Test: MFT with vocabullary - template1
Predicting 2016 examples
Running Test: MFT with vocabullary - template2
Predicting 1259 examples
Running Test: MFT with vocabullary - template3
Predicting 2544 examples
Running Test: MFT with vocabullary - template4
Predicting 48 examples
Running Test: MFT with vocabullary - template5
Predicting 2544 examples
Running Test: MFT with vocabullary - template6
Predicting 30 examples
Running Test: MFT with vocabullary - template7
Predicting 1259 examples
Running Test: MFT with vocabullary - template8
Predicting 1259 examples
Running Test: MFT with vocabullary - template9
Predicting 1440 examples
Running Test: MFT with vocabullary - template10
Predicting 1440 examples
Running Test: MFT with vocabullary - template11
Predicting 2544 examples
Running Test: MFT with vocabullary - template12
Predicting 1440 examples
Running Test: MFT with vocabullary - template13
Predicting 1590 examples
Running Test: MFT with vocabullary - template14
Predicting 1590 examples
Running Test: MFT with vocabullary - template15
Predicting 1590

In [18]:
suite.summary()

Vocabullary

Test: MFT with vocabullary - template1
Test cases:      2016
Fails (rate):    35 (1.7%)

Example fails:
0.6 still [ thought it could have been naturalness .
----
0.7 still [ thought it could have been seductive .
----
0.6 still nifty thought it could have been more .
----


Test: MFT with vocabullary - template2
Test cases:      1259
Fails (rate):    107 (8.5%)

Example fails:
0.1 quite much at being some good old fashioned spooks .
----
0.0 quite innocent at talento some good old fashioned spooks .
----
0.0 quite easy at moving some good old fashioned spooks .
----


Test: MFT with vocabullary - template3
Test cases:      2544
Fails (rate):    961 (37.8%)

Example fails:
0.7 lucy 's a bad girl that pull all .
----
1.0 lucy 's a huge girl that has all .
----
0.9 lucy 's a commercial girl that have all .
----


Test: MFT with vocabullary - template4
Test cases:      48
Fails (rate):    19 (39.6%)

Example fails:
0.8 william shatner as a rent professor is the rent bright spo

In [19]:
suite.save('./suites/posneg-approach1.suite')

# Loading the test suite

In [20]:
from checklist.test_suite import TestSuite
suite = TestSuite.from_file('./suites/posneg-approach1.suite')

suite.visual_summary_table()

Please wait as we prepare the table data...


SuiteSummarizer(stats={'npassed': 0, 'nfailed': 0, 'nfiltered': 0}, test_infos=[{'name': 'Test: MFT with vocab…