# Approach 1

This notebook uses Approach 1 to generate templates, focusing on positive and negative templates. One possible application is testing the *Vocabulary* linguistic capability with an **MFT** (Minimum Functionality Test).

The steps of this approach are:

1. Rank the words of the complete instances
2. Split the instances into sentences
3. Keep the sentences that contain at least one of the top-ranked words from step 1
4. Keep the sentences that contain relevant words (adjectives or verbs)
5. Classify the sentences using the *Oracle*
6. Keep the sentences that were classified unanimously
7. Replace the relevant words with masks
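
Steps 5 and 6 can be illustrated with a minimal sketch, assuming the oracle is a list of classifiers that each return a label for a sentence. The classifiers below (`clf_a`, `clf_b`, `clf_c`) are toy keyword-based stand-ins, not the models used in this notebook, and `unanimous_label` is a hypothetical helper name.

```python
from typing import Callable, List, Optional

Classifier = Callable[[str], int]  # returns 1 (positive) or 0 (negative)


def unanimous_label(sentence: str, oracle: List[Classifier]) -> Optional[int]:
    """Return the label shared by every oracle model, or None if they disagree."""
    votes = {clf(sentence) for clf in oracle}
    return votes.pop() if len(votes) == 1 else None


# Toy stand-in classifiers (hypothetical; a real oracle would use trained models).
def clf_a(s: str) -> int:
    return int("love" in s or "charming" in s)


def clf_b(s: str) -> int:
    return int("love" in s)


def clf_c(s: str) -> int:
    return int("awful" not in s)


oracle = [clf_a, clf_b, clf_c]
sentences = [
    "you'll probably love it .",
    "it's pauly shore awful .",
    "the charming result .",
]

# Step 5: classify every sentence; step 6: keep only unanimous decisions.
labeled = [(s, unanimous_label(s, oracle)) for s in sentences]
kept = [(s, label) for s, label in labeled if label is not None]
```

The third sentence is dropped because the toy classifiers disagree on it; only sentences with a single shared label move on to the masking step.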

In [1]:
%config Completer.use_jedi = False
import sys
sys.path.append('../../')

## Loading the dataset, the target model, and the auxiliary models

In [2]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

movie_reviews_rt_df = pd.read_csv('./data/data-rt-100samples.csv')
movie_reviews_rt_df.head(5)

Unnamed: 0,label,text,words
0,1,allen's underestimated charm delivers more goodies than lumps of coal .,11
1,0,skip the film and buy the philip glass soundtrack cd .,11
2,0,involving at times but lapses quite casually into the absurd .,11
3,0,while hoffman's performance is great the subject matter goes nowhere .,11
4,1,a flick about our infantilized culture that isn't entirely infantile .,11


In [3]:
import re
import numpy as np
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def pre_proccess(text):
    text = text.lower()
    # Replace quotes, ASCII punctuation, and digits with spaces
    text = re.sub(r'["\',!-.:-@0-9/]', ' ', text)
    return text

# Wrapper to adapt output format
class SentimentAnalisysModelWrapper:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        
    def __predict(self, text_input):
        text_preprocessed = pre_proccess(text_input)
        tokenized = self.tokenizer(text_preprocessed, padding=True, truncation=True, max_length=512, 
                                    add_special_tokens = True, return_tensors="pt")
        
        tensor_logits = self.model(**tokenized)
        # tensor_logits[0] holds the logits; normalize explicitly over the class dimension
        prob = softmax(tensor_logits[0], dim=-1).detach().numpy()
        pred = np.argmax(prob)
        
        return pred, prob
    
    def predict_label(self, text_inputs):
        return self.predict(text_inputs)[0]
        
    def predict_proba(self, text_inputs):
        return self.predict(text_inputs)[1]
        
    def predict(self, text_inputs):
        if isinstance(text_inputs, str):
            text_inputs = [text_inputs]
        
        preds = []
        probs = []

        for text_input in text_inputs:
            pred, prob = self.__predict(text_input)
            preds.append(pred)
            probs.append(prob[0])

        return np.array(preds), np.array(probs) # ([0, 1], [[0.99, 0.01], [0.03, 0.97]])

# Helper function to load and wrap a model from Hugging Face
def load_model(model_name):
    print(f'Loading model {model_name}...')
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    return SentimentAnalisysModelWrapper(model, tokenizer)

# Hugging Face hosted model names 
movie_reviews_models = {
    'bert': 'textattack/bert-base-uncased-rotten-tomatoes', 
    'albert': 'textattack/albert-base-v2-rotten-tomatoes', 
    'distilbert': 'textattack/distilbert-base-uncased-rotten-tomatoes', 
    'roberta': 'textattack/roberta-base-rotten-tomatoes', 
    'xlnet': 'textattack/xlnet-base-cased-rotten-tomatoes'
}
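
The wrapper above turns the raw logits returned by a model into class probabilities with a softmax and takes the arg-max as the predicted label. A minimal sketch of that computation using only the standard library; the logits are made-up values, not output from any of the models listed here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-1.2, 2.3]  # made-up values for the classes [negative, positive]
probs = softmax(logits)

# The arg-max over the probabilities gives the predicted label.
pred = max(range(len(probs)), key=probs.__getitem__)
```

With these values the positive class receives almost all of the probability mass, so `pred` is the index of the positive class.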

In [4]:
m1 = load_model(movie_reviews_models['albert'])
m2 = load_model(movie_reviews_models['distilbert'])
m3 = load_model(movie_reviews_models['roberta'])
m4 = load_model(movie_reviews_models['xlnet'])

# Models to be used as oracle
models = [m1, m2, m3, m4]
# Target model
model = load_model(movie_reviews_models['bert'])

Loading model textattack/albert-base-v2-rotten-tomatoes...
Loading model textattack/distilbert-base-uncased-rotten-tomatoes...
Loading model textattack/roberta-base-rotten-tomatoes...


Some weights of the model checkpoint at textattack/roberta-base-rotten-tomatoes were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Loading model textattack/xlnet-base-cased-rotten-tomatoes...
Loading model textattack/bert-base-uncased-rotten-tomatoes...


# Generating the templates
The word-ranking method used by PosNegTemplateGenerator is the Replace-1 Score.
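
As a rough illustration, the Replace-1 Score as described in the DeepWordBug paper measures how much a model's score changes when a single word is replaced by an unknown token. The sketch below assumes that formulation. `toy_positive_score`, its word weights, and the `[UNK]` token are made-up stand-ins, and the implementation inside `template_generator` may differ, for example in sign convention or in which class probability is scored.

```python
def toy_positive_score(tokens):
    """Made-up scorer standing in for a model's positive-class probability."""
    weights = {"love": 0.4, "charming": 0.3, "bad": -0.4, "awful": -0.5}
    raw = 0.5 + sum(weights.get(t, 0.0) for t in tokens)
    return min(1.0, max(0.0, raw))


def replace_1_scores(tokens, score_fn, unk="[UNK]"):
    """Score each word by the change in score_fn when that word becomes unk."""
    base = score_fn(tokens)
    scores = []
    for i, word in enumerate(tokens):
        replaced = tokens[:i] + [unk] + tokens[i + 1:]
        scores.append((word, base - score_fn(replaced)))
    return scores


tokens = "you will probably love it .".split()

# Rank by the magnitude of the change (an assumption about the ordering).
ranked = sorted(
    replace_1_scores(tokens, toy_positive_score),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)
top_word = ranked[0][0]
```

Only `love` carries weight in the toy scorer, so it is the single word whose replacement moves the score and it ends up ranked first.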

In [5]:
from template_generator.tasks.sentiment_analisys import PosNegTemplateGeneratorApp1

tg = PosNegTemplateGeneratorApp1(model, models)

### Initial number of instances: 5

In [6]:
# Sampling instances
np.random.seed(220)
n_instances = 5
df_sampled = movie_reviews_rt_df.sample(n_instances)

instances = [x for x in df_sampled['text'].values]

In [8]:
templates = tg.generate_templates(instances, n_masks=2, ranked_words_count=4)

Ranking words using Replace-1 Score...


Converting texts to sentences...
:: 6 sentences were generated.
Filtering instances by contaning ranked words...
:: 1 sentences remaining.
4
Filtering instances by relevant words...
['VERB', 'ADJ']
{word: future, index: 2, tag: NOUN, rank_score: -0.001435995101928711}
{word: for, index: 0, tag: ADP, rank_score: -0.0009976029396057129}
{word: hopes, index: 4, tag: VERB, rank_score: -0.00036329030990600586}
{word: one, index: 3, tag: NUM, rank_score: -0.0001398324966430664}
 
:: 0 sentences remaining.
Predicting inputs...
:: Sentence predictions done.


#### Execution time for 5 instances: 9.7s

In [9]:
df = tg.to_dataframe()
df

Unnamed: 0,label,original_text,masked_text,template_text


In [10]:
tg.lexicons

{'pos_verb': [], 'neg_verb': [], 'pos_adj': [], 'neg_adj': []}

### Initial number of instances: 100

In [11]:
# Using all 100 instances
instances = [x for x in movie_reviews_rt_df['text'].values]

In [12]:
tg = PosNegTemplateGeneratorApp1(model, models)
templates = tg.generate_templates(instances, n_masks=2, ranked_words_count=4)

Ranking words using Replace-1 Score...


Converting texts to sentences...
:: 134 sentences were generated.
Filtering instances by contaning ranked words...
:: 23 sentences remaining.
4
Filtering instances by relevant words...
['VERB', 'ADJ']
{word: well-made, index: 5, tag: ADJ, rank_score: -0.0003566145896911621}
{word: clunker, index: 11, tag: NOUN, rank_score: -0.00032889842987060547}
{word: clunker, index: 8, tag: NOUN, rank_score: 0.00023746490478515625}
{word: thoughtful, index: 6, tag: ADJ, rank_score: -0.00020372867584228516}
 
['VERB', 'ADJ']
{word: and, index: 7, tag: CONJ, rank_score: -0.40014511346817017}
{word: this, index: 9, tag: DET, rank_score: -0.3538123369216919}
{word: regard, index: 10, tag: NOUN, rank_score: -0.012109756469726562}
{word: guard, index: 12, tag: NOUN, rank_score: 0.0036880970001220703}
 
['VERB', 'ADJ']
{word: bad, index: 7, tag: ADJ, rank_score: -0.014482975006103516}
{word: trailers, index: 10, tag: NOUN, rank_score: -0.007329761981964111}
{word: as, index: 6, tag: ADV, rank_score: -0.00

#### Execution time for 100 instances: 4m 17.8s

In [13]:
df = tg.to_dataframe()
df

Unnamed: 0,label,original_text,masked_text,template_text
0,0,a well-made thoughtful well-acted clunker but a clunker nonetheless .,a {mask} {mask} well-acted clunker but a clunker nonetheless .,a {pos_adj} {neg_adj} well-acted clunker but a clunker nonetheless .
1,0,could the country bears really be as bad as its trailers ?,{mask} the country bears really be as {mask} as its trailers ?,{neg_verb} the country bears really be as {neg_adj} as its trailers ?
2,0,it's pauly shore awful .,it {mask} pauly {mask} awful .,it {neg_verb} pauly {neg_adj} awful .
3,0,don't say you weren't warned .,{mask} n't say you {mask} n't warned .,{neg_verb} n't say you {neg_verb} n't warned .
4,1,you'll probably love it .,you {mask} probably {mask} it .,you {pos_verb} probably {pos_verb} it .
5,0,hits all the verbal marks it should .,{mask} all the verbal marks it {mask} .,{pos_verb} all the verbal marks it {neg_verb} .
6,1,the charming result is festival in cannes .,the {mask} result is {mask} in cannes .,the {pos_adj} result is {pos_adj} in cannes .
7,0,/ but daphne you're too buff / fred thinks he's tough / and velma - wow you've lost weight !,/ but daphne you {mask} too buff / fred thinks he 's tough / and velma - wow you 've {mask} weight !,/ but daphne you {neg_verb} too buff / fred thinks he 's tough / and velma - wow you 've {neg_verb} weight !
8,1,is inspiring ironic and revelatory of just how ridiculous and money-oriented the record industry really is .,is {mask} ironic and revelatory of just how {mask} and money-oriented the record industry really is .,is {pos_verb} ironic and revelatory of just how {neg_adj} and money-oriented the record industry really is .


In [14]:
tg.lexicons

{'pos_verb': ["'ll", 'hits', 'love', 'inspiring'],
 'neg_verb': ['lost', "'s", 'do', 'could', 'should', 'were', "'re"],
 'pos_adj': ['well-made', 'charming', 'festival'],
 'neg_adj': ['bad', 'shore', 'thoughtful', 'ridiculous']}
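
The relation between `masked_text`, `template_text`, and the lexicons can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the hypothetical `build_template` helper receives the positions, POS tags, and polarities of the relevant words as input, whereas the generator derives them itself.

```python
from collections import defaultdict


def build_template(tokens, relevant):
    """relevant maps token index -> (pos_tag, polarity), e.g. {1: ("adj", "pos")}."""
    masked, template = list(tokens), list(tokens)
    lexicons = defaultdict(list)
    for idx, (tag, polarity) in relevant.items():
        key = f"{polarity}_{tag}"
        lexicons[key].append(tokens[idx])  # the replaced word feeds the lexicon
        masked[idx] = "{mask}"
        template[idx] = "{" + key + "}"
    return " ".join(masked), " ".join(template), dict(lexicons)


# Tokens and polarities copied from row 6 of the table above.
tokens = "the charming result is festival in cannes .".split()
relevant = {1: ("adj", "pos"), 4: ("adj", "pos")}

masked_text, template_text, lexicons = build_template(tokens, relevant)
```

For this row the sketch reproduces the `masked_text` and `template_text` columns and contributes `charming` and `festival` to the `pos_adj` lexicon.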

## Checklist

In [15]:
import checklist
from checklist.editor import Editor
from checklist.test_suite import TestSuite
from checklist.test_types import MFT

In [16]:
lexicons = tg.lexicons
templates = tg.template_texts
masked = tg.masked_texts
labels = [sent.prediction.label for sent in tg.sentences]

editor = Editor()
editor.add_lexicon('pos_verb', lexicons['pos_verb'])
editor.add_lexicon('neg_verb', lexicons['neg_verb'])
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [17]:
# One MFT per template; every example generated from a template shares that template's label
for i, (template, label) in enumerate(zip(templates, labels), start=1):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary",
        name=f"Test: MFT with vocabullary - template{i}",
        description="Checking if the model can handle vocabullary"))

In [18]:
suite.run(model.predict, overwrite=True)

Running Test: MFT with vocabullary - template1
Predicting 12 examples


Running Test: MFT with vocabullary - template2
Predicting 28 examples
Running Test: MFT with vocabullary - template3
Predicting 28 examples
Running Test: MFT with vocabullary - template4
Predicting 7 examples
Running Test: MFT with vocabullary - template5
Predicting 4 examples
Running Test: MFT with vocabullary - template6
Predicting 28 examples
Running Test: MFT with vocabullary - template7
Predicting 3 examples
Running Test: MFT with vocabullary - template8
Predicting 7 examples
Running Test: MFT with vocabullary - template9
Predicting 16 examples
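
The example counts reported above follow from how the placeholders are filled: each distinct placeholder name ranges over its lexicon, and a name that appears twice in a template appears to receive the same word in both positions, which is consistent with template 5 yielding 4 examples rather than 16. The sketch below reproduces the counts with the standard library. `expand` is a hypothetical helper that mimics this behavior; it is not CheckList's implementation.

```python
import re
from itertools import product

lexicons = {
    "pos_verb": ["'ll", "hits", "love", "inspiring"],
    "neg_verb": ["lost", "'s", "do", "could", "should", "were", "'re"],
    "pos_adj": ["well-made", "charming", "festival"],
    "neg_adj": ["bad", "shore", "thoughtful", "ridiculous"],
}


def expand(template, lexicons):
    """Fill every distinct {name} with each word of lexicons[name]."""
    # dict.fromkeys keeps first-seen order while dropping repeated names.
    names = list(dict.fromkeys(re.findall(r"\{(\w+)\}", template)))
    combos = product(*(lexicons[n] for n in names))
    return [template.format(**dict(zip(names, combo))) for combo in combos]


t1 = "a {pos_adj} {neg_adj} well-acted clunker but a clunker nonetheless ."
t5 = "you {pos_verb} probably {pos_verb} it ."

n1 = len(expand(t1, lexicons))  # 3 pos_adj x 4 neg_adj
n5 = len(expand(t5, lexicons))  # 4 pos_verb, same word in both slots
```

Template 1 combines two different lexicons, so its size is the product of their lengths, while template 5 uses a single lexicon and grows only linearly.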


In [19]:
suite.summary()

Vocabullary

Test: MFT with vocabullary - template1
Test cases:      12
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template2
Test cases:      28
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template3
Test cases:      28
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template4
Test cases:      7
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template5
Test cases:      4
Fails (rate):    2 (50.0%)

Example fails:
0.2 you hits probably hits it .
----
0.0 you 'll probably 'll it .
----


Test: MFT with vocabullary - template6
Test cases:      28
Fails (rate):    14 (50.0%)

Example fails:
0.9 love all the verbal marks it should .
----
1.0 inspiring all the verbal marks it 're .
----
1.0 inspiring all the verbal marks it were .
----


Test: MFT with vocabullary - template7
Test cases:      3
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template8
Test cases:      7
Fails (rate):    6 (85.7%)

Example fails:
1.0 / but daphne you 's too

In [20]:
suite.save('./suites/posneg-approach1.suite')

# Loading the test suite

In [21]:
from checklist.test_suite import TestSuite
suite = TestSuite.from_file('./suites/posneg-approach1.suite')

suite.visual_summary_table()

Please wait as we prepare the table data...


SuiteSummarizer(stats={'npassed': 0, 'nfailed': 0, 'nfiltered': 0}, test_infos=[{'name': 'Test: MFT with vocab…

# Test: all templates in a single MFT

In [22]:
lexicons = tg.lexicons
templates = tg.template_texts
labels = [sent.prediction.label for sent in tg.sentences]

editor = Editor()
editor.add_lexicon('pos_verb', lexicons['pos_verb'])
editor.add_lexicon('neg_verb', lexicons['neg_verb'])
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [23]:
data = []
lbl = []
for template, label in zip(templates, labels):
    t = editor.template(template, remove_duplicates=True, labels=int(label))
    data.extend(t.data)
    lbl.extend(t.labels)

suite.add(MFT(
    data=data,
    labels=lbl,
    capability="Vocabullary",
    name="Template Generator - Vocabulary in MFT",
    description="Testing the model for vocabulary capability"))

In [24]:
suite.run(model.predict, overwrite=True)

Running Template Generator - Vocabulary in MFT
Predicting 133 examples


In [25]:
suite.summary()

Vocabullary

Template Generator - Vocabulary in MFT
Test cases:      133
Fails (rate):    22 (16.5%)

Example fails:
1.0 love all the verbal marks it do .
----
1.0 love all the verbal marks it were .
----
0.9 / but daphne you 're too buff / fred thinks he 's tough / and velma - wow you 've 're weight !
----
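
An MFT example counts as a failure when the predicted label differs from the expected one, and the reported rate is the number of failures divided by the number of test cases. A minimal sketch of that bookkeeping, with made-up labels in the first part and the numbers from the summary above in the second; `failure_rate` is a hypothetical helper name.

```python
def failure_rate(expected, predicted):
    """Return (number of failures, failure rate in percent)."""
    fails = sum(e != p for e, p in zip(expected, predicted))
    return fails, 100.0 * fails / len(expected)


# Made-up labels, for illustration only.
fails, rate = failure_rate([1, 1, 0, 0], [1, 0, 0, 1])

# Numbers reported in the summary: 22 failures out of 133 test cases.
reported_rate = round(100.0 * 22 / 133, 1)
```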




