# Approach 5

This notebook uses Approach 5 to generate templates, focusing on positive and negative templates. One possible application is testing the "Vocabulary" linguistic capability with an MFT (Minimum Functionality Test).

The steps of this approach are:

1. Split the instances into sentences
2. Rank the words in each sentence
3. Filter the sentences by length (5 or more words)
4. Keep sentences that contain relevant words (verbs or adjectives)
5. Keep sentences with high prediction confidence for the relevant words from the previous step
6. Replace the relevant words with masks
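Steps 3 to 6 can be sketched for a single sentence as follows. This is a minimal illustration, not the code of `PosNegTemplateGeneratorApp5`. It assumes that each sentence is already tokenized, POS-tagged and scored (steps 1 and 2), that only the `ranked_words_count` words with the largest absolute score are candidates, and that a sentence needs `n_masks` verbs or adjectives among them. Step 5 is represented by a hypothetical `confidence_fn` compared against a threshold such as the `min_classification_score=0.8` passed later in the notebook.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

RELEVANT_TAGS = {"VERB", "ADJ"}  # step 4: verbs or adjectives


@dataclass
class Token:
    word: str
    tag: str      # universal POS tag, e.g. "VERB", "ADJ", "NOUN"
    score: float  # word-ranking score from step 2


def mask_sentence(
    tokens: List[Token],
    confidence_fn: Callable[[List[Token]], float],  # hypothetical stand-in for step 5
    min_words: int = 5,
    ranked_words_count: int = 4,
    n_masks: int = 2,
    min_confidence: float = 0.8,
) -> Optional[str]:
    # Step 3: keep only sentences with at least `min_words` words
    if len(tokens) < min_words:
        return None
    # Step 4: among the top-ranked words, keep verbs and adjectives
    order = sorted(range(len(tokens)), key=lambda i: abs(tokens[i].score), reverse=True)
    top = order[:ranked_words_count]
    relevant = [i for i in top if tokens[i].tag in RELEVANT_TAGS][:n_masks]
    if len(relevant) < n_masks:
        return None
    # Step 5: keep the sentence only if the confidence is high enough
    if confidence_fn([tokens[i] for i in relevant]) < min_confidence:
        return None
    # Step 6: replace the selected words with a mask placeholder
    words = [t.word for t in tokens]
    for i in relevant:
        words[i] = "{mask}"
    return " ".join(words)


# Illustrative tokens with made-up scores
tokens = [
    Token("skip", "VERB", -0.34), Token("the", "DET", 0.0005),
    Token("film", "NOUN", 0.001), Token("and", "CONJ", -0.013),
    Token("buy", "VERB", -0.01), Token("the", "DET", 0.0002),
    Token("soundtrack", "NOUN", 0.0009),
]
print(mask_sentence(tokens, confidence_fn=lambda ts: 0.9))
# {mask} the film and {mask} the soundtrack
```

Returning `None` for rejected sentences keeps each filter a simple early exit, which mirrors how the log reports the number of sentences remaining after each stage.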

In [1]:
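# Disable Jedi-based autocompletion in Jupyter
%config Completer.use_jedi = False
import sys
# Make the project's local packages (e.g. template_generator) importable
sys.path.append('../../')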

## Loading the dataset, the target model, and the auxiliary models

In [2]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

movie_reviews_rt_df = pd.read_csv('./data/data-rt-100samples.csv')
movie_reviews_rt_df.head(5)

| Unnamed: 0 | label | text | words |
|---|---|---|---|
| 0 | 1 | allen's underestimated charm delivers more goodies than lumps of coal . | 11 |
| 1 | 0 | skip the film and buy the philip glass soundtrack cd . | 11 |
| 2 | 0 | involving at times but lapses quite casually into the absurd . | 11 |
| 3 | 0 | while hoffman's performance is great the subject matter goes nowhere . | 11 |
| 4 | 1 | a flick about our infantilized culture that isn't entirely infantile . | 11 |


In [3]:
import re
import numpy as np
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def pre_proccess(text):
    # Lowercase, then replace punctuation, digits and symbols with spaces
    text = text.lower()
    text = re.sub(r'["\',!-.:-@0-9/()]', ' ', text)
    return text

# Wrapper to adapt output format
class SentimentAnalisysModelWrapper:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        
    def __predict(self, text_input):
        text_preprocessed = pre_proccess(text_input)
        tokenized = self.tokenizer(text_preprocessed, padding=True, truncation=True, max_length=512, 
                                    add_special_tokens = True, return_tensors="pt")
        
        tensor_logits = self.model(**tokenized)
        # Softmax over the class dimension; an explicit dim avoids PyTorch's implicit-dim warning
        prob = softmax(tensor_logits[0], dim=-1).detach().numpy()
        pred = np.argmax(prob)
        
        return pred, prob
    
    def predict_label(self, text_inputs):
        return self.predict(text_inputs)[0]
        
    def predict_proba(self, text_inputs):
        return self.predict(text_inputs)[1]
        
    def predict(self, text_inputs):
        if isinstance(text_inputs, str):
            text_inputs = [text_inputs]
        
        preds = []
        probs = []

        for text_input in text_inputs:
            pred, prob = self.__predict(text_input)
            preds.append(pred)
            probs.append(prob[0])

        return np.array(preds), np.array(probs) # ([0, 1], [[0.99, 0.01], [0.03, 0.97]])

# Helper function to load and wrap a model from Hugging Face
def load_model(model_name):
    print(f'Loading model {model_name}...')
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    return SentimentAnalisysModelWrapper(model, tokenizer)

# Hugging Face hosted model names 
movie_reviews_models = {
    'bert': 'textattack/bert-base-uncased-rotten-tomatoes', 
    'albert': 'textattack/albert-base-v2-rotten-tomatoes', 
    'distilbert': 'textattack/distilbert-base-uncased-rotten-tomatoes', 
    'roberta': 'textattack/roberta-base-rotten-tomatoes', 
    'xlnet': 'textattack/xlnet-base-cased-rotten-tomatoes'
}
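The character class in `pre_proccess` is easier to read once you notice that its ranges `!-.` and `:-@`, together with `/` and `0-9`, cover every ASCII character from `!` (0x21) to `@` (0x40). The standalone check below re-declares the pattern rather than importing the notebook's function, confirms the equivalence, and shows the practical effect: apostrophes and hyphens become spaces, so `it's` reaches the tokenizer as `it s` and `well-made` as `well made`.

```python
import re

# Character class used by pre_proccess
PRE_PROCESS_CLASS = r"[\"',!-.:-@0-9/]"
# Equivalent explicit range: every ASCII character from "!" (0x21) to "@" (0x40)
EXPLICIT_CLASS = r"[!-@]"


def clean(text: str, pattern: str) -> str:
    """Lowercase and replace every match of `pattern` with a space."""
    return re.sub(pattern, " ", text.lower())


# Both classes match exactly the same ASCII characters
assert all(
    clean(chr(c), PRE_PROCESS_CLASS) == clean(chr(c), EXPLICIT_CLASS)
    for c in range(128)
)

print(clean("It's a well-made film, 10/10!", PRE_PROCESS_CLASS).split())
# ['it', 's', 'a', 'well', 'made', 'film']
```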

In [4]:
m1 = load_model(movie_reviews_models['albert'])
m2 = load_model(movie_reviews_models['distilbert'])
m3 = load_model(movie_reviews_models['roberta'])
m4 = load_model(movie_reviews_models['xlnet'])

# Models used as oracles
models = [m1, m2, m3, m4]
# Target model
model = load_model(movie_reviews_models['bert'])

Loading model textattack/albert-base-v2-rotten-tomatoes...
Loading model textattack/distilbert-base-uncased-rotten-tomatoes...
Loading model textattack/roberta-base-rotten-tomatoes...


Some weights of the model checkpoint at textattack/roberta-base-rotten-tomatoes were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Loading model textattack/xlnet-base-cased-rotten-tomatoes...
Loading model textattack/bert-base-uncased-rotten-tomatoes...


# Generating the templates
The word-ranking method used by PosNegTemplateGenerator is the Replace-1 Score.
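A Replace-1 style score ranks a word by how much the model's output changes when that single word is replaced. The sketch below assumes a generic `prob_fn` returning class probabilities and a hypothetical `"[UNK]"` replacement token; the actual replacement token, sign convention and ordering used by `PosNegTemplateGeneratorApp5` are not shown in this notebook. Ranking by absolute value is an inference from the logs further down, where, for example, `skip` (-0.34) is listed before `and` (-0.013) and `philip` (0.00099).

```python
from typing import Callable, List, Sequence, Tuple


def replace1_scores(
    words: Sequence[str],
    prob_fn: Callable[[str], Sequence[float]],
    placeholder: str = "[UNK]",  # hypothetical replacement token
) -> List[Tuple[str, int, float]]:
    """Replace one word at a time and measure how the probability of the
    originally predicted class changes."""
    base = prob_fn(" ".join(words))
    label = max(range(len(base)), key=lambda k: base[k])
    scores = []
    for i, word in enumerate(words):
        perturbed = list(words[:i]) + [placeholder] + list(words[i + 1:])
        new = prob_fn(" ".join(perturbed))
        # Negative value: removing the word weakened the original prediction
        scores.append((word, i, new[label] - base[label]))
    # Rank by the magnitude of the change, largest first
    return sorted(scores, key=lambda s: abs(s[2]), reverse=True)


def toy_prob(text: str) -> List[float]:
    """Toy binary classifier returning [P(negative), P(positive)]."""
    p_neg = 0.9 if "skip" in text.split() else 0.3
    return [p_neg, 1.0 - p_neg]


ranked = replace1_scores("skip the film and buy the soundtrack".split(), toy_prob)
print(ranked[0][0], round(ranked[0][2], 2))  # skip -0.6
```

Each word costs one extra model call, which is consistent with the ranking stage dominating the 100-instance run time reported below.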

In [5]:
from template_generator.tasks.sentiment_analisys import PosNegTemplateGeneratorApp5

tg = PosNegTemplateGeneratorApp5(model, models)

### Initial number of instances: 5

In [6]:
# Sampling instances
np.random.seed(220)
n_instances = 5
df_sampled = movie_reviews_rt_df.sample(n_instances)

instances = [x for x in df_sampled['text'].values]

In [7]:
tg = PosNegTemplateGeneratorApp5(model, models)
templates = tg.generate_templates(instances, n_masks=2, ranked_words_count=4, min_classification_score=0.8)

Converting texts to sentences...
:: 6 sentences were generated.
Ranking words using Replace-1 Score...
:: Word ranking done.
Filtering instances by contaning a minimmum of words: 5...
:: 6 sentences remaining.
4
Filtering instances by relevant words...
['VERB', 'ADJ']
{word: of, index: 17, tag: ADP, rank_score: -0.00011205673217773438}
{word: wonders, index: 16, tag: NOUN, rank_score: -7.635354995727539e-05}
{word: elegantly, index: 0, tag: ADV, rank_score: -5.036592483520508e-05}
{word: detailed, index: 15, tag: ADJ, rank_score: -1.6689300537109375e-05}
 
['VERB', 'ADJ']
{word: future, index: 2, tag: NOUN, rank_score: -0.06550866365432739}
{word: hopes, index: 4, tag: VERB, rank_score: -0.0017756223678588867}
{word: one, index: 3, tag: NUM, rank_score: 0.0016388297080993652}
{word: for, index: 0, tag: ADP, rank_score: 0.0016148090362548828}
 
['VERB', 'ADJ']
{word: of, index: 15, tag: ADP, rank_score: -0.0016689896583557129}
{word: for, index: 11, tag: ADP, rank_score: -0.0008890032768249512}
{word: more, index: 13, tag: ADJ, rank_score: -0.000647127628326416}
{word: member, index: 14,

#### Execution time for 5 instances: 9.1s

In [8]:
df = tg.to_dataframe()
df

| Unnamed: 0 | label | original_text | masked_text | template_text |
|---|---|---|---|---|


In [9]:
tg.lexicons

{'pos_verb': [], 'neg_verb': [], 'pos_adj': [], 'neg_adj': []}

### Initial number of instances: 100

In [10]:
# Using all 100 instances
instances = [x for x in movie_reviews_rt_df['text'].values]

In [11]:
tg = PosNegTemplateGeneratorApp5(model, models)
templates = tg.generate_templates(instances, n_masks=2, ranked_words_count=4, min_classification_score=0.8)

Converting texts to sentences...
:: 134 sentences were generated.
Ranking words using Replace-1 Score...
:: Word ranking done.
Filtering instances by contaning a minimmum of words: 5...
:: 118 sentences remaining.
4
Filtering instances by relevant words...
['VERB', 'ADJ']
{word: goodies, index: 6, tag: NOUN, rank_score: 0.005269169807434082}
{word: lumps, index: 8, tag: NOUN, rank_score: 0.0032875537872314453}
{word: underestimated, index: 2, tag: ADJ, rank_score: 0.0025653839111328125}
{word: than, index: 7, tag: ADP, rank_score: 0.0022441744804382324}
 
['VERB', 'ADJ']
{word: skip, index: 0, tag: VERB, rank_score: -0.3447835445404053}
{word: and, index: 3, tag: CONJ, rank_score: -0.012676358222961426}
{word: philip, index: 6, tag: NOUN, rank_score: 0.0009868144989013672}
{word: the, index: 1, tag: DET, rank_score: 0.00048273801803588867}
 
['VERB', 'ADJ']
{word: lapses, index: 4, tag: VERB, rank_score: -0.4460386037826538}
{word: into, index: 7, tag: ADP, rank_score: -0.0017092227935791016}
{word: times, index: 2, tag: NOUN, rank_score: 0.0014830231666564941}
{word: the, index: 8, tag: 

#### Execution time for 100 instances: 4min 38.9s

In [12]:
df = tg.to_dataframe()
df

| Unnamed: 0 | label | original_text | masked_text | template_text |
|---|---|---|---|---|
| 0 | 1 | intelligent caustic take on a great writer and dubious human being . | {mask} caustic take on a great writer and dubious {mask} being . | {pos_adj} caustic take on a great writer and dubious {neg_adj} being . |
| 1 | 0 | it's a bad sign in a thriller when you instantly know whodunit . | it 's a {mask} sign in a thriller when you instantly {mask} whodunit . | it 's a {neg_adj} sign in a thriller when you instantly {pos_verb} whodunit . |
| 2 | 0 | falsehoods pile up undermining the movie's reality and stifling its creator's comic voice . | falsehoods {mask} up undermining the movie 's reality and stifling its creator 's {mask} voice . | falsehoods {neg_verb} up undermining the movie 's reality and stifling its creator 's {neg_adj} voice . |
| 3 | 0 | a well-made thoughtful well-acted clunker but a clunker nonetheless . | a {mask} {mask} well-acted clunker but a clunker nonetheless . | a {pos_adj} {neg_adj} well-acted clunker but a clunker nonetheless . |
| 4 | 0 | a long dull procession of despair set to cello music culled from a minimalist funeral . | a long {mask} procession of despair set to cello music culled from a {mask} funeral . | a long {neg_adj} procession of despair set to cello music culled from a {neg_adj} funeral . |
| 5 | 0 | could the country bears really be as bad as its trailers ? | {mask} the country bears really be as {mask} as its trailers ? | {neg_verb} the country bears really be as {neg_adj} as its trailers ? |
| 6 | 0 | as comedic spotlights go notorious c . | as comedic spotlights {mask} {mask} c . | as comedic spotlights {neg_verb} {neg_adj} c . |
| 7 | 1 | the movie is saved from unbearable lightness by the simplicity of the storytelling and the authenticity of the performances . | the movie is {mask} from {mask} lightness by the simplicity of the storytelling and the authenticity of the performances . | the movie is {pos_verb} from {neg_adj} lightness by the simplicity of the storytelling and the authenticity of the performances . |
| 8 | 0 | an empty exercise a florid but ultimately vapid crime melodrama with lots of surface flash but little emotional resonance . | an empty exercise a florid but ultimately {mask} crime melodrama with lots of surface flash but {mask} emotional resonance . | an empty exercise a florid but ultimately {neg_adj} crime melodrama with lots of surface flash but {neg_adj} emotional resonance . |
| 9 | 0 | hugh grant's act is so consuming that sometimes it's difficult to tell who the other actors in the movie are . | hugh grant 's act is so {mask} that sometimes it 's {mask} to tell who the other actors in the movie are . | hugh grant 's act is so {neg_adj} that sometimes it 's {neg_adj} to tell who the other actors in the movie are . |


In [13]:
tg.lexicons

{'pos_verb': ['saved',
  'inspiring',
  'eat',
  'know',
  'is',
  'moviemaking',
  'looking',
  'heartbreaking'],
 'neg_verb': ['examined',
  'does',
  'chokes',
  'depends',
  'thinks',
  'could',
  'should',
  'seeks',
  'lost',
  'otherwise',
  'pile',
  'go',
  'shows',
  'cliched'],
 'pos_adj': ['powerful',
  'in-depth',
  'gorgeous',
  'sound',
  'unflinching',
  'nincompoop',
  'well-made',
  'much',
  'intelligent',
  'pleasant',
  'best',
  'grand-scale',
  'deceptively',
  'riveting',
  'great'],
 'neg_adj': ['minimalist',
  'ridiculous',
  'sardonic',
  'self-indulgent',
  'pompous',
  'vapid',
  'difficult',
  'caustic',
  'unbearable',
  'pessimistic',
  'bad',
  'notorious',
  'drab',
  'dull',
  'human',
  'consuming',
  'undone',
  'thoughtful',
  'comic',
  'little']}

## CheckList

In [14]:
from checklist.editor import Editor
from checklist.test_suite import TestSuite
from checklist.test_types import MFT

In [15]:
lexicons = tg.lexicons
templates = tg.template_texts
masked = tg.masked_texts
labels = [sent.prediction.label for sent in tg.sentences]

editor = Editor()
editor.add_lexicon('pos_verb', lexicons['pos_verb'])
editor.add_lexicon('neg_verb', lexicons['neg_verb'])
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [16]:
for i, (template, label) in enumerate(zip(templates, labels)):
    # Fill the template with every combination of lexicon entries
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    # One MFT per template; every filled sentence is expected to keep `label`
    suite.add(MFT(
        data=t.data,
        labels=int(label),
        capability="Vocabullary",
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabulary"))
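The number of examples predicted per template follows from the lexicon sizes printed by `tg.lexicons` (8 `pos_verb`, 14 `neg_verb`, 15 `pos_adj`, 20 `neg_adj`). A rough sketch of that arithmetic, assuming that a placeholder name repeated in one template receives the same word in both positions (consistent with the 20-example templates) and that `remove_duplicates=True` drops nothing here because the four lexicons share no words. `expected_cases` is a hypothetical helper, not part of CheckList.

```python
import re
from math import prod
from typing import Mapping

# Lexicon sizes read off the tg.lexicons output above
LEXICON_SIZES = {"pos_verb": 8, "neg_verb": 14, "pos_adj": 15, "neg_adj": 20}


def expected_cases(template: str, sizes: Mapping[str, int] = LEXICON_SIZES) -> int:
    """Number of sentences a template expands to: one lexicon choice per
    distinct placeholder name (a repeated name reuses the same word)."""
    names = set(re.findall(r"\{(\w+)\}", template))
    return prod(sizes[name] for name in names)


template1 = "{pos_adj} caustic take on a great writer and dubious {neg_adj} being ."
template5 = "a long {neg_adj} procession of despair set to cello music culled from a {neg_adj} funeral ."
print(expected_cases(template1), expected_cases(template5))  # 300 20
```

These match the "Predicting 300 examples" and "Predicting 20 examples" lines logged for template1 and template5.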

In [17]:
suite.run(model.predict, overwrite=True)

Running Test: MFT with vocabullary - template1
Predicting 300 examples
Running Test: MFT with vocabullary - template2
Predicting 160 examples
Running Test: MFT with vocabullary - template3
Predicting 280 examples
Running Test: MFT with vocabullary - template4
Predicting 300 examples
Running Test: MFT with vocabullary - template5
Predicting 20 examples
Running Test: MFT with vocabullary - template6
Predicting 280 examples
Running Test: MFT with vocabullary - template7
Predicting 280 examples
Running Test: MFT with vocabullary - template8
Predicting 160 examples
Running Test: MFT with vocabullary - template9
Predicting 20 examples
Running Test: MFT with vocabullary - template10
Predicting 20 examples
Running Test: MFT with vocabullary - template11
Predicting 210 examples
Running Test: MFT with vocabullary - template12
Predicting 120 examples
Running Test: MFT with vocabullary - template13
Predicting 210 examples
Running Test: MFT with vocabullary - template14
Predicting 280 examples
Running Test: MFT with vocabullary - template15
Predicting 210 examples
Run

In [18]:
suite.summary()

Vocabullary

Test: MFT with vocabullary - template1
Test cases:      300
Fails (rate):    64 (21.3%)

Example fails:
0.0 deceptively caustic take on a great writer and dubious pompous being .
----
0.1 grand-scale caustic take on a great writer and dubious ridiculous being .
----
0.0 sound caustic take on a great writer and dubious sardonic being .
----


Test: MFT with vocabullary - template2
Test cases:      160
Fails (rate):    24 (15.0%)

Example fails:
1.0 it 's a thoughtful sign in a thriller when you instantly inspiring whodunit .
----
0.9 it 's a thoughtful sign in a thriller when you instantly eat whodunit .
----
1.0 it 's a consuming sign in a thriller when you instantly know whodunit .
----


Test: MFT with vocabullary - template3
Test cases:      280
Fails (rate):    0 (0.0%)


Test: MFT with vocabullary - template4
Test cases:      300
Fails (rate):    13 (4.3%)

Example fails:
0.7 a riveting notorious well-acted clunker but a clunker nonetheless .
----
0.8 a in-depth comic

In [19]:
suite.save('./suites/posneg-approach5.suite')

# Loading the test suite

In [20]:
from checklist.test_suite import TestSuite
suite = TestSuite.from_file('./suites/posneg-approach5.suite')

suite.visual_summary_table()

Please wait as we prepare the table data...

# Test: all templates in a single MFT

In [21]:
lexicons = tg.lexicons
templates = tg.template_texts
labels = [sent.prediction.label for sent in tg.sentences]

editor = Editor()
editor.add_lexicon('pos_verb', lexicons['pos_verb'])
editor.add_lexicon('neg_verb', lexicons['neg_verb'])
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [22]:
data = []
lbl = []
for template, label in zip(templates, labels):
    t = editor.template(template, remove_duplicates=True, labels=int(label))
    data.extend(t.data)
    lbl.extend(t.labels)

suite.add(MFT(
    data=data,
    labels=lbl,
    capability="Vocabullary",
    name="Template Generator - Vocabulary in MFT",
    description="Testing the model for vocabulary capability"))

In [23]:
suite.run(model.predict, overwrite=True)

Running Template Generator - Vocabulary in MFT
Predicting 4362 examples

In [24]:
suite.summary()

Vocabullary

Template Generator - Vocabulary in MFT
Test cases:      4362
Fails (rate):    505 (11.6%)

Example fails:
0.3 this is a movie that is what it is : a intelligent distraction a friday night diversion an excuse to saved popcorn .
----
0.0 sound caustic take on a great writer and dubious unbearable being .
----
0.5 as comedic spotlights shows comic c .
----
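The "Fails (rate)" lines are plain ratios. A sketch of that arithmetic, assuming (as CheckList's MFT does when `labels` is given) that a test case fails whenever the predicted label differs from the expected one. The prediction lists below are synthetic, built only to reproduce the 505 failures out of 4362 cases reported above; `mft_fail_rate` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def mft_fail_rate(preds: Sequence[int], labels: Sequence[int]) -> Tuple[int, float]:
    """Count MFT failures (prediction != expected label) and the rate in percent."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    fails = sum(1 for p, y in zip(preds, labels) if p != y)
    return fails, 100.0 * fails / len(labels)


# Synthetic predictions reproducing the combined summary: 505 failures in 4362 cases
preds = [0] * 505 + [1] * (4362 - 505)
labels = [1] * 4362
fails, rate = mft_fail_rate(preds, labels)
print(f"Fails (rate): {fails} ({rate:.1f}%)")  # Fails (rate): 505 (11.6%)
```

The same ratio gives 21.3% for template1 (64 of 300) and 15.0% for template2 (24 of 160).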




