# Approach 3

Approach 3 is used here to generate templates with a focus on positive and negative templates. One possible application is testing the *Vocabulary* linguistic capability with an **MFT** (Minimum Functionality Test).

The steps of this approach are:

1. Split each instance into sentences
2. Classify the sentences using the *Oracle*
3. Keep only the sentences classified unanimously
4. Keep only the sentences predicted with high confidence
5. Rank the words of each sentence
6. Keep the sentences whose relevant words (verbs or adjectives) are ranked highly
7. Keep the sentences with high confidence in the prediction of the relevant words
8. Replace the relevant words with masks
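
Steps 2 to 4 amount to a filter over the sentences. The sketch below illustrates them under assumptions the notebook does not confirm: every oracle returns a probability vector, and a sentence survives only if every oracle predicts the same label with a top probability greater than the threshold (the log later reports "classification score greater than 0.8"). `filter_unanimous_confident` and the toy keyword oracles are hypothetical stand-ins, not part of `template_generator`.

```python
from typing import Callable, List, Sequence

import numpy as np

# Hypothetical signature: an oracle maps a sentence to a probability vector over labels.
Oracle = Callable[[str], Sequence[float]]


def filter_unanimous_confident(
    sentences: Sequence[str],
    oracles: Sequence[Oracle],
    min_classification_score: float = 0.8,
) -> List[str]:
    """Keep sentences that all oracles assign the same label (step 3)
    with a top probability greater than the threshold (step 4)."""
    kept = []
    for sentence in sentences:
        probs = [np.asarray(oracle(sentence), dtype=float) for oracle in oracles]
        predicted_labels = {int(np.argmax(p)) for p in probs}
        if len(predicted_labels) != 1:
            continue  # oracles disagree: not unanimous
        if min(p.max() for p in probs) <= min_classification_score:
            continue  # at least one oracle is not confident enough
        kept.append(sentence)
    return kept


# Toy keyword-based oracles standing in for the four fine-tuned models.
def oracle_a(sentence: str) -> List[float]:
    return [0.05, 0.95] if "great" in sentence else [0.9, 0.1]


def oracle_b(sentence: str) -> List[float]:
    if "great" in sentence or "fine" in sentence:
        return [0.1, 0.9]
    if "so-so" in sentence:
        return [0.7, 0.3]
    return [0.85, 0.15]


sentences = ["a great film .", "a fine film .", "a so-so film .", "a dull film ."]
kept = filter_unanimous_confident(sentences, [oracle_a, oracle_b])
print(kept)  # ['a great film .', 'a dull film .']
```

"a fine film ." is dropped because the oracles disagree, and "a so-so film ." because one oracle is only 70% confident.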

In [1]:
%config Completer.use_jedi = False
import sys
sys.path.append('../../')

## Loading the dataset, the target model, and the auxiliary models

In [2]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

movie_reviews_rt_df = pd.read_csv('./data/data-rt-1000samples.csv')
movie_reviews_rt_df.head(5)

| Unnamed: 0 | label | text | words |
|---|---|---|---|
| 0 | 1 | drumline ably captures the complicated relationships in a marching band . | 11 |
| 1 | 1 | delivers roughly equal amounts of beautiful movement and inside information . | 11 |
| 2 | 1 | saved from being merely way-cool by a basic credible compassion . | 11 |
| 3 | 1 | this is a movie full of grace and ultimately hope . | 11 |
| 4 | 1 | the imax screen enhances the personal touch of manual animation . | 11 |


In [3]:
import re
import numpy as np
import torch
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def pre_proccess(text):
    """Lowercase the text and replace ASCII punctuation and digits with spaces."""
    text = text.lower()
    # The class spans every character from '!' to '@'; the trailing `()` is an empty group with no effect
    text = re.sub('["\',!-.:-@0-9/]()', ' ', text)
    return text

# Wrapper that adapts the Hugging Face model output to a (labels, probabilities) format
class SentimentAnalisysModelWrapper:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer

    def __predict(self, text_input):
        text_preprocessed = pre_proccess(text_input)
        tokenized = self.tokenizer(text_preprocessed, padding=True, truncation=True, max_length=512,
                                   add_special_tokens=True, return_tensors="pt")

        # Inference only, so gradients are not tracked
        with torch.no_grad():
            logits = self.model(**tokenized)[0]  # first output element: logits, shape (1, num_labels)
        # An explicit dim avoids PyTorch's implicit-dimension deprecation warning
        prob = softmax(logits, dim=-1).numpy()
        pred = np.argmax(prob)

        return pred, prob

    def predict_label(self, text_inputs):
        return self.predict(text_inputs)[0]

    def predict_proba(self, text_inputs):
        return self.predict(text_inputs)[1]

    def predict(self, text_inputs):
        # Accept either a single string or a list of strings
        if isinstance(text_inputs, str):
            text_inputs = [text_inputs]

        preds = []
        probs = []

        for text_input in text_inputs:
            pred, prob = self.__predict(text_input)
            preds.append(pred)
            probs.append(prob[0])

        return np.array(preds), np.array(probs)  # e.g. ([0, 1], [[0.99, 0.01], [0.03, 0.97]])

# Helper function to load and wrap a model from Hugging Face
def load_model(model_name):
    print(f'Loading model {model_name}...')
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    return SentimentAnalisysModelWrapper(model, tokenizer)

# Hugging Face hosted model names
movie_reviews_models = {
    'bert': 'textattack/bert-base-uncased-rotten-tomatoes',
    'albert': 'textattack/albert-base-v2-rotten-tomatoes',
    'distilbert': 'textattack/distilbert-base-uncased-rotten-tomatoes',
    'roberta': 'textattack/roberta-base-rotten-tomatoes',
    'xlnet': 'textattack/xlnet-base-cased-rotten-tomatoes'
}
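
The character class in `pre_proccess` is easier to read once its ranges are expanded. The check below is a sketch (`pre_process_simplified` is a hypothetical name): it confirms that the original pattern replaces exactly the ASCII characters from `!` to `@` (punctuation, digits and symbols such as `?`), that the trailing `()` does not change the result, and that characters outside that range, such as `_` or `[`, are kept. Note that hyphenated words such as `light-hearted` are split before reaching the tokenizer.

```python
import re

# Pattern copied from `pre_proccess`: a character class followed by an empty group `()`.
ORIGINAL_PATTERN = '["\',!-.:-@0-9/]()'
# `!-.` covers ASCII 33-46, `/` is 47, `0-9` is 48-57 and `:-@` is 58-64,
# so together they form the single range `!`..`@`.
SIMPLIFIED_PATTERN = r'[!-@]'


def pre_process_simplified(text: str) -> str:
    """Lowercase, then replace every character from '!' to '@' with a space."""
    return re.sub(SIMPLIFIED_PATTERN, ' ', text.lower())


# The two patterns agree on every ASCII character.
for code in range(128):
    char = chr(code)
    assert re.sub(ORIGINAL_PATTERN, ' ', char) == re.sub(SIMPLIFIED_PATTERN, ' ', char)

sample = "It's GREAT, 10/10! (really)"
print(pre_process_simplified(sample).split())  # ['it', 's', 'great', 'really']
```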

In [4]:
m1 = load_model(movie_reviews_models['albert'])
m2 = load_model(movie_reviews_models['distilbert'])
m3 = load_model(movie_reviews_models['roberta'])
m4 = load_model(movie_reviews_models['xlnet'])

# Models to be used as oracle
models = [m1, m2, m3, m4]
# Target model
model = load_model(movie_reviews_models['bert'])

Loading model textattack/albert-base-v2-rotten-tomatoes...
Loading model textattack/distilbert-base-uncased-rotten-tomatoes...
Loading model textattack/roberta-base-rotten-tomatoes...


Some weights of the model checkpoint at textattack/roberta-base-rotten-tomatoes were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Loading model textattack/xlnet-base-cased-rotten-tomatoes...
Loading model textattack/bert-base-uncased-rotten-tomatoes...


# Generating the templates
The word-ranking method used by `PosNegTemplateGenerator` is the Replace-1 Score.
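
The Replace-1 Score rates a word by how much the model's output changes when that single word is replaced by another token. The sketch below assumes the score is the probability of the originally predicted class after replacement minus the original probability, which matches the sign in the logs (important words such as `loved` get large negative scores). The replacement token `[UNK]`, the function `replace_one_scores` and the toy classifier are assumptions for illustration, not the generator's implementation.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np


def replace_one_scores(
    tokens: Sequence[str],
    predict_proba: Callable[[str], Sequence[float]],
    replacement: str = "[UNK]",  # assumption: the token that stands in for the removed word
) -> List[Tuple[str, int, float]]:
    """Score each word by the change in the predicted class probability
    when that single word is replaced; most negative (most important) first."""
    base = np.asarray(predict_proba(" ".join(tokens)), dtype=float)
    label = int(np.argmax(base))
    scores = []
    for index, word in enumerate(tokens):
        perturbed = list(tokens)
        perturbed[index] = replacement
        probs = np.asarray(predict_proba(" ".join(perturbed)), dtype=float)
        scores.append((word, index, float(probs[label] - base[label])))
    return sorted(scores, key=lambda item: item[2])


def toy_proba(text: str) -> List[float]:
    """Toy classifier: confidently positive only when 'loved' is present."""
    positive = 0.95 if "loved" in text.split() else 0.4
    return [1 - positive, positive]


ranking = replace_one_scores("i loved it !".split(), toy_proba)
print(ranking[0][:2])  # ('loved', 1)
```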

In [5]:
from template_generator.tasks.sentiment_analisys import PosNegTemplateGeneratorApp3

tg = PosNegTemplateGeneratorApp3(model, models)

### Initial number of instances: 5

In [6]:
# Sampling instances
np.random.seed(220)
n_instances = 5
df_sampled = movie_reviews_rt_df.sample(n_instances)

instances = [x for x in df_sampled['text'].values]

In [7]:
templates = tg.generate_templates(instances,  n_masks=2, ranked_words_count=4, min_classification_score=0.8)

Converting texts to sentences...
:: 9 sentences were generated.
Predicting inputs...
:: Sentence predictions done.
Filtering instances classified unanimously...
:: 5 sentences remaining.
Filtering instances by classification score greater than 0.8
:: 5 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
4
Filtering instances by relevant words...
['VERB', 'ADJ']
{word: rejected, index: 19, tag: ADJ, rank_score: -0.00044840574264526367}
{word: partway, index: 0, tag: ADV, rank_score: -0.00021529197692871094}
{word: concoction, index: 6, tag: NOUN, rank_score: -9.399652481079102e-05}
{word: episodes, index: 16, tag: NOUN, rank_score: -8.362531661987305e-05}
 
['VERB', 'ADJ']
{word: loved, index: 24, tag: VERB, rank_score: -0.8720396161079407}
{word: i, index: 23, tag: NOUN, rank_score: -0.0017665624618530273}
{word: it, index: 25, tag: PRON, rank_score: -0.0007870197296142578}
{word: !, index: 26, tag: ., rank_score: 0.0}
 
['VERB', 'ADJ']
{word: incredible, index: 32, tag: ADJ, rank_score: -0.019590258598327637}
{word: 's, index: 28, tag: PR

#### Execution time for 5 instances: 6.1 s

In [8]:
df = tg.to_dataframe()
df

| Unnamed: 0 | label | original_text | masked_text | template_text |
|---|---|---|---|---|
| 0 | 1 | but tongue-in-cheek preposterousness has always been part of for the most part wilde's droll whimsy helps " being earnest " overcome its weaknesses and parker's creative interference . | but tongue-in-cheek preposterousness has always been part of for the most part wilde 's droll whimsy helps \`\` being earnest \`\` {mask} its weaknesses and parker 's {mask} interference . | but tongue-in-cheek preposterousness has always been part of for the most part wilde 's droll whimsy helps \`\` being earnest \`\` {neg_verb} its weaknesses and parker 's {pos_adj} interference . |


In [9]:
tg.lexicons

{'pos_verb': [],
 'neg_verb': ['overcome'],
 'pos_adj': ['creative'],
 'neg_adj': []}

### Initial number of instances: 100

In [10]:
# Using all 100 instances
instances = [x for x in movie_reviews_rt_df['text'].values]

In [11]:
tg = PosNegTemplateGeneratorApp3(model, models)
templates = tg.generate_templates(instances,  n_masks=2, ranked_words_count=4, min_classification_score=0.8)

Converting texts to sentences...
:: 1473 sentences were generated.
Predicting inputs...
:: Sentence predictions done.
Filtering instances classified unanimously...
:: 951 sentences remaining.
Filtering instances by classification score greater than 0.8
:: 893 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
4
Filtering instances by relevant words...
['VERB', 'ADJ']
{word: captures, index: 2, tag: VERB, rank_score: -0.016763925552368164}
{word: ably, index: 1, tag: ADV, rank_score: -0.00023543834686279297}
{word: drumline, index: 0, tag: NOUN, rank_score: -4.45246696472168e-05}
{word: in, index: 6, tag: ADP, rank_score: 3.7610530853271484e-05}
 
['VERB', 'ADJ']
{word: amounts, index: 3, tag: NOUN, rank_score: -0.016751408576965332}
{word: equal, index: 2, tag: ADJ, rank_score: -0.011423766613006592}
{word: beautiful, index: 5, tag: ADJ, rank_score: -0.0007750988006591797}
{word: of, index: 4, tag: ADP, rank_score: -0.00031381845474243164}
 
['VERB', 'ADJ']
{word: of, index: 5, tag: ADP, rank_score: 0.0004849433898925781}
{word: grace, index
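
The ranking log prints, for each sentence, the top `ranked_words_count` words with their POS tags. Comparing it with the resulting dataframe suggests how steps 6 and 8 behave: the first sentence (where `captures` is the only verb or adjective in the top 4) does not become a template, while `equal` and `beautiful` in the second sentence are masked. The sketch below reproduces that behavior under this inferred rule (at least `n_masks` verbs or adjectives among the top `ranked_words_count` words); `mask_relevant_words` is hypothetical, and step 7 (the confidence check on the relevant words) is not modeled.

```python
from typing import Optional, Sequence, Tuple

RELEVANT_TAGS = {"VERB", "ADJ"}


def mask_relevant_words(
    tokens: Sequence[str],
    ranked: Sequence[Tuple[str, int, str]],  # (word, index, POS tag), best-ranked first
    n_masks: int = 2,
    ranked_words_count: int = 4,
) -> Optional[str]:
    """Mask the first `n_masks` verbs/adjectives among the top-ranked words,
    or return None when there are not enough of them (sentence rejected)."""
    top = ranked[:ranked_words_count]
    positions = [index for _, index, tag in top if tag in RELEVANT_TAGS][:n_masks]
    if len(positions) < n_masks:
        return None
    masked = list(tokens)
    for index in positions:
        masked[index] = "{mask}"
    return " ".join(masked)


# Rankings copied from the log above.
delivers = "delivers roughly equal amounts of beautiful movement and inside information .".split()
delivers_ranked = [("amounts", 3, "NOUN"), ("equal", 2, "ADJ"), ("beautiful", 5, "ADJ"), ("of", 4, "ADP")]
print(mask_relevant_words(delivers, delivers_ranked))
# delivers roughly {mask} amounts of {mask} movement and inside information .

drumline = "drumline ably captures the complicated relationships in a marching band .".split()
drumline_ranked = [("captures", 2, "VERB"), ("ably", 1, "ADV"), ("drumline", 0, "NOUN"), ("in", 6, "ADP")]
print(mask_relevant_words(drumline, drumline_ranked))  # None
```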

#### Execution time for 100 instances: 4 min 22.7 s

In [12]:
df = tg.to_dataframe()
df

| Unnamed: 0 | label | original_text | masked_text | template_text |
|---|---|---|---|---|
| 0 | 1 | delivers roughly equal amounts of beautiful movement and inside information . | delivers roughly {mask} amounts of {mask} movement and inside information . | delivers roughly {neg_adj} amounts of {pos_adj} movement and inside information . |
| 1 | 1 | manages to be wholesome and subversive at the same time . | manages to be wholesome and {mask} at the {mask} time . | manages to be wholesome and {pos_adj} at the {neg_adj} time . |
| 2 | 0 | friday after next is a lot more bluster than bite . | friday after {mask} is a lot {mask} bluster than bite . | friday after {neg_adj} is a lot {pos_adj} bluster than bite . |
| 3 | 1 | wow so who knew charles dickens could be so light-hearted ? | wow so who {mask} charles dickens could be so {mask} ? | wow so who {pos_verb} charles dickens could be so {pos_adj} ? |
| 4 | 1 | this is a very fine movie -- go see it . | this is a very {mask} movie -- go {mask} it . | this is a very {pos_adj} movie -- go {neg_verb} it . |
| ... | ... | ... | ... | ... |
| 223 | 0 | guys say mean things and shoot a lot of bullets . | guys {mask} {mask} things and shoot a lot of bullets . | guys {neg_verb} {neg_adj} things and shoot a lot of bullets . |
| 224 | 1 | delia greta and paula rank as three of the most multilayered and sympathetic female characters of the year . | delia greta and paula rank as three of the most {mask} and {mask} female characters of the year . | delia greta and paula rank as three of the most {neg_adj} and {pos_adj} female characters of the year . |
| 225 | 0 | as with so many merchandised-to-the-max movies of this type more time appears to have gone into recruiting the right bands for the playlist and the costuming of the stars than into the script which has a handful of smart jokes and not much else . | as with so many merchandised-to-the-max movies of this type more time appears to have gone into recruiting the right bands for the playlist and the costuming of the stars than into the script which has a handful of {mask} jokes and not {mask} else . | as with so many merchandised-to-the-max movies of this type more time appears to have gone into recruiting the right bands for the playlist and the costuming of the stars than into the script which has a handful of {pos_adj} jokes and not {pos_adj} else . |
| 226 | 1 | the film reminds me of a vastly improved germanic version of my big fat greek wedding -- with better characters some genuine quirkiness and at least a measure of style . | the film {mask} me of a vastly {mask} germanic version of my big fat greek wedding -- with better characters some genuine quirkiness and at least a measure of style . | the film {pos_verb} me of a vastly {pos_verb} germanic version of my big fat greek wedding -- with better characters some genuine quirkiness and at least a measure of style . |


In [13]:
tg.lexicons

{'pos_verb': ['stays',
  'elevate',
  'comes',
  'provides',
  'orchestrates',
  'haunting',
  'rhythms',
  'feel',
  'moviegoing',
  'keep',
  'storytelling',
  'capturing',
  'heartening',
  'improved',
  'riveted',
  'live',
  'appreciate',
  'care',
  'is',
  'being',
  'evokes',
  'filmmaking',
  'offerings',
  'touches',
  'satisfying',
  'keeps',
  'knew',
  'charming',
  'become',
  'riveting',
  'coheres',
  'trust',
  'reminds',
  'heal',
  'photographed',
  'kissing',
  'rock',
  'succeeds',
  'celebrates',
  'proves',
  'surprising',
  'rewarded',
  'rewarding',
  'creating'],
 'neg_verb': ['could',
  'running',
  'expected',
  'resembles',
  'done',
  'escape',
  'smoochy',
  'strikes',
  'sell',
  'avoid',
  'telegraphed',
  'named',
  'screen',
  'supposed',
  'damned',
  'showing',
  'lifts',
  'contrived',
  'simmering',
  'employ',
  'playing',
  'classify',
  'falls',
  'might',
  'dismissed',
  'screaming',
  'offend',
  'wish',
  'tries',
  'waterlogged',
  'tells'
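
Comparing `original_text` and `template_text` in the dataframes above suggests that each lexicon gathers the original words that ended up under placeholders of that name: `overcome` and `creative` in the 5-instance run, or `knew`, `reminds` and `improved` in `pos_verb` here. The hypothetical helper `collect_lexicons` rebuilds that grouping from the two columns. It is a sketch of the relationship, not how `PosNegTemplateGeneratorApp3` computes `lexicons`, and it simply skips rows whose whitespace tokenization differs between the columns (as in the 5-instance row, where `wilde's` became `wilde 's`).

```python
import re
from typing import Dict, List, Sequence, Tuple

PLACEHOLDER = re.compile(r"^\{(\w+)\}$")


def collect_lexicons(rows: Sequence[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group the masked words by the placeholder that replaced them.

    `rows` holds (original_text, template_text) pairs; rows whose whitespace
    tokens do not align one-to-one are skipped in this sketch."""
    lexicons: Dict[str, List[str]] = {}
    for original, template in rows:
        original_tokens, template_tokens = original.split(), template.split()
        if len(original_tokens) != len(template_tokens):
            continue
        for word, slot in zip(original_tokens, template_tokens):
            match = PLACEHOLDER.match(slot)
            if match:
                words = lexicons.setdefault(match.group(1), [])
                if word not in words:
                    words.append(word)
    return lexicons


rows = [
    ("delivers roughly equal amounts of beautiful movement and inside information .",
     "delivers roughly {neg_adj} amounts of {pos_adj} movement and inside information ."),
    ("manages to be wholesome and subversive at the same time .",
     "manages to be wholesome and {pos_adj} at the {neg_adj} time ."),
]
print(collect_lexicons(rows))
# {'neg_adj': ['equal', 'same'], 'pos_adj': ['beautiful', 'subversive']}
```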

## Checklist

In [14]:
from checklist.editor import Editor
from checklist.test_suite import TestSuite
from checklist.test_types import MFT

In [15]:
lexicons = tg.lexicons
templates = tg.template_texts
masked = tg.masked_texts
labels = [sent.prediction.label for sent in tg.sentences]

editor = Editor()
editor.add_lexicon('pos_verb', lexicons['pos_verb'])
editor.add_lexicon('neg_verb', lexicons['neg_verb'])
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [16]:
# One MFT per template: every sentence generated from a template is expected to keep the template's label
for i, (template, label) in enumerate(zip(templates, labels)):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=int(label),
        capability="Vocabullary",
        name=f"Test: MFT with vocabullary - template{i+1}",
        description="Checking if the model can handle vocabulary"))
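
Conceptually, `editor.template` expands each template over the Cartesian product of the lexicons it references (checklist's `product` option, which is on by default), and a placeholder repeated within one template, such as `{pos_adj}` in row 225 of the 100-instance dataframe, receives the same word at each occurrence. The sketch below reproduces only that idea in plain Python. `expand_template` is hypothetical and ignores `remove_duplicates` and checklist's other options, so its counts need not match the `Predicting N examples` lines of the run.

```python
import itertools
import re
from typing import Dict, List, Sequence

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def expand_template(template: str, lexicons: Dict[str, Sequence[str]]) -> List[str]:
    """Fill the placeholders with every combination of lexicon entries.

    A placeholder name that occurs twice receives the same value in both positions."""
    keys = list(dict.fromkeys(PLACEHOLDER.findall(template)))  # unique names, in order
    sentences = []
    for values in itertools.product(*(lexicons[key] for key in keys)):
        sentences.append(template.format(**dict(zip(keys, values))))
    return sentences


toy_lexicons = {"pos_adj": ["fine", "charming"], "neg_verb": ["falls", "tries"]}
print(len(expand_template("a {pos_adj} movie that {neg_verb} .", toy_lexicons)))  # 4
print(expand_template("{pos_adj} jokes and not {pos_adj} else", toy_lexicons))
# ['fine jokes and not fine else', 'charming jokes and not charming else']
```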

In [17]:
suite.run(model.predict, overwrite=True)

Running Test: MFT with vocabullary - template1
Predicting 10564 examples
Running Test: MFT with vocabullary - template2
Predicting 10564 examples
Running Test: MFT with vocabullary - template3
Predicting 10564 examples
Running Test: MFT with vocabullary - template4
Predicting 3342 examples
Running Test: MFT with vocabullary - template5
Predicting 7980 examples
Running Test: MFT with vocabullary - template6
Predicting 105 examples
Running Test: MFT with vocabullary - template7
Predicting 105 examples
Running Test: MFT with vocabullary - template8
Predicting 3342 examples
Running Test: MFT with vocabullary - template9
Predicting 7980 examples
Running Test: MFT with vocabullary - template10
Predicting 139 examples
Running Test: MFT with vocabullary - template11
Predicting 139 examples
Running Test: MFT with vocabullary - template12
Predicting 10564 examples
Running Test: MFT with vocabullary - template13
Predicting 3342 examples
Running Test: MFT with vocabullary - template14
Predicting 139 examples
Running Test: MFT with vocabullary - template15
Predicting 13

In [18]:
suite.summary()

Vocabullary

Test: MFT with vocabullary - template1
Test cases:      10564
Fails (rate):    4470 (42.3%)

Example fails:
0.0 delivers roughly disappointing amounts of comfortable movement and inside information .
----
0.0 delivers roughly few amounts of satisfying movement and inside information .
----
0.0 delivers roughly strained amounts of good movement and inside information .
----


Test: MFT with vocabullary - template2
Test cases:      10564
Fails (rate):    729 (6.9%)

Example fails:
0.1 manages to be wholesome and inventive at the ridiculous time .
----
0.1 manages to be wholesome and able at the stupid time .
----
0.0 manages to be wholesome and artful at the stupid time .
----


Test: MFT with vocabullary - template3
Test cases:      10564
Fails (rate):    1094 (10.4%)

Example fails:
1.0 friday after hard is a lot superior bluster than bite .
----
1.0 friday after dreary is a lot effective bluster than bite .
----
0.9 friday after loud is a lot rewarding bluster than bite .

In [19]:
suite.save('./suites/posneg-approach3.suite')

# Loading the test suite

In [20]:
from checklist.test_suite import TestSuite
suite = TestSuite.from_file('./suites/posneg-approach3.suite')

suite.visual_summary_table()
