# Approach 3

This notebook uses approach 3 to generate templates, focusing on positive and negative templates. One possible application is testing the *Vocabulary* linguistic capability with an **MFT** (Minimum Functionality Test).

The steps in this approach are:

1. Split each instance into sentences
2. Classify the sentences using the *Oracle*
3. Keep only the sentences classified unanimously
4. Keep only the sentences predicted with high confidence
5. Rank the words of each sentence
6. Keep only the sentences whose relevant words (verbs or adjectives) are highly ranked
7. Keep only the sentences whose relevant words are predicted with high confidence
8. Replace the relevant words with masks
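
Steps 2 to 4 can be pictured with a small sketch. It assumes each oracle model returns class probabilities for every sentence, and that "high confidence" means every oracle model assigns at least `min_score` to the agreed label. The function name and the array layout are illustrative, not the actual `template_generator` API.

```python
import numpy as np

def filter_unanimous_confident(probs, min_score=0.8):
    """probs: array of shape (n_models, n_sentences, n_classes).

    Returns the indices of the sentences kept and their agreed labels.
    """
    probs = np.asarray(probs)
    labels = probs.argmax(axis=2)                   # label chosen by each model
    unanimous = (labels == labels[0]).all(axis=0)   # every model picks the same label
    scores = probs.max(axis=2)                      # each model's confidence in its label
    confident = (scores >= min_score).all(axis=0)   # assumption: all models must be confident
    keep = np.flatnonzero(unanimous & confident)
    return keep, labels[0, keep]

# Toy data: 2 oracle models, 3 sentences, 2 classes
toy_probs = [
    [[0.95, 0.05], [0.40, 0.60], [0.10, 0.90]],
    [[0.90, 0.10], [0.70, 0.30], [0.35, 0.65]],
]
kept, kept_labels = filter_unanimous_confident(toy_probs, min_score=0.8)
print(kept, kept_labels)
```

In the toy data, the second sentence is dropped because the models disagree, and the third because one model is below the threshold.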

In [11]:
%config Completer.use_jedi = False
import sys
sys.path.append('../../')

## Loading the dataset, the target model, and the auxiliary models

In [12]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

imdb_df = pd.read_csv('../data/imdb_sampled/data-1000samples.csv')
imdb_df.head(5)

Unnamed: 0,label,text,words
0,0,"Here's example number 87,358 of Hollywood's anti-Biblical bias, so typical of them.<br /><br />Early on, Ray Liotta's wife has did and women are being interviewed for the position of housekeeper. The first interviewee is an old-fashioned-looking (dress, mannerisms, speech) who immediately lays down here strict rules, stating that ""there will be two hours of Bible study ever day.""<br /><br />This is said, of course, to make it sound like reading the Bible is the worse punishment you could ever inflict on someone, especially a kid. Once again, the Bible is equated with stuffy, mean-spirited people. That woman, of course, is dismissed immediately.<br /><br />Naturally, the liberal black woman (Whoopi Goldberg - who else?) is the one who is hired and, voilà, saves the day! <br /><br />Yawn.",127
1,0,"I don't know about the real Cobb but I got the distinct impression that the filmmakers' aim was to try to soften his jagged edges and reputation, not give us a true portrait of the man himself. In the movie, besides a few racist remarks, he's shown to be just another hard-nosed, cantakerous old coot (he's so full of life!) with a heart of gold(more or less). This is also the worst acting I've seen T.L.Jones do(he brings nothing new or subtle to his stereotyped character). He just doesn't flesh out Cobb in a way that pulls me into the movie. Not for one minute did I forget that it was Tommy Lee Jones on the screen pretending to be Ty Cobb. Robert Wuhl didnt impress either. The ""comedic"" elements in this movie were just distracting and didnt ring true at all. A bloody waste of time, it is",149
2,0,"Reba is a very dumb show. You can predict pretty much anything that's about to happen. Barbra Jean is just too stupid. It's like she's not even a character. A show like this should at least have SOMEONE who resembles a real-life person. I guess Barbra Jean represents a retarded person. Keira or whatever her name is, Reba, Brock, they're all stupid! Keira is like the smartest person on the show, and she's still stupid. EVERYONE IS STUPID! That's my opinion on Reba. Since I have said all I can say about this show, I'll just take up the next few lines of text by saying what I am currently saying right now and do it until there's 10 lines. There. Reba gets 2/10.",124
3,1,"""One Crazy Summer"" is the funniest, craziest (not necessarily the best), movie I have ever seen.<br /><br />Just when one crazy scene is done, another emerges. It never lets you rest. Just one thing after another. The soundtrack is great. The songs are the right ones for the scenes.<br /><br />It is also a clean movie. Little that is dirty in it.<br /><br />Of course, it has the story of the guys you wouldn't trust with your lunch money, taking up a challenge, and winning over people with more resources. Who'd want to see it if they failed? There is a serious side, in that parents and children do not live up to each others' dreams. One should always have an open mind, and weigh all the options. This applies both to parents and children. In ""One Crazy Summer"", the parents are wrong. This is not always the case.",149
4,0,"""A young man, recently engaged to be married, is the victim of a traffic accident and dies as a result of his injuries. His father, desperate to revive his son, agrees to let a scientist friend try his experimental soul transmigration process to save him. After the young man returns to life, the father and fiancée notice a dark and violent change in the young man's behavior, leading them to believe something went horribly wrong in the revival process,"" according to the DVD sleeve's synopsis.<br /><br />At one point, Edward Norris (as Philip Bennett) is asked, ""What do you think this is, Boys Town?"" Mr. Norris should know, since he was in ""Boys Town"". ""The Man with Two Lives "" is more like ""Black Friday"" minus Karloff and Lugosi. You do the math. This film might have been a contender, with a re-worked script; it does feature an intriguing final act. After a tepid ""shoot out"", hang in for the drama to pick up with a well-played scene between star Norris and pursuing detective Addison Richards (as George Bradley).<br /><br />**** The Man with Two Lives (1942) Phil Rosen ~ Edward Norris, Eleanor Lawson, Addison Richards",196


In [13]:
import re
import numpy as np
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def pre_proccess(text):
    text = text.lower()
    # Replace punctuation, digits and symbols (ASCII '!' through '@') with spaces
    text = re.sub('["\',!-.:-@0-9/]', ' ', text)
    return text

# Wrapper to adapt output format
class SentimentAnalisysModelWrapper:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        
    def __predict(self, text_input):
        text_preprocessed = pre_proccess(text_input)
        tokenized = self.tokenizer(text_preprocessed, padding=True, truncation=True, max_length=512, 
                                    add_special_tokens = True, return_tensors="pt")
        
        tensor_logits = self.model(**tokenized)
        # Normalize over the class dimension; an explicit dim avoids the implicit-dimension warning
        prob = softmax(tensor_logits[0], dim=1).detach().numpy()
        pred = np.argmax(prob)
        
        return pred, prob
    
    def predict_label(self, text_inputs):
        return self.predict(text_inputs)[0]
        
    def predict_proba(self, text_inputs):
        return self.predict(text_inputs)[1]
        
    def predict(self, text_inputs):
        if isinstance(text_inputs, str):
            text_inputs = [text_inputs]
        
        preds = []
        probs = []

        for text_input in text_inputs:
            pred, prob = self.__predict(text_input)
            preds.append(pred)
            probs.append(prob[0])

        return np.array(preds), np.array(probs) # ([0, 1], [[0.99, 0.01], [0.03, 0.97]])

# Helper function to load and wrap a model from Hugging Face
def load_model(model_name):
    print(f'Loading model {model_name}...')
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    return SentimentAnalisysModelWrapper(model, tokenizer)

# Hugging Face hosted model names 
imdb_models = {
    'bert': 'textattack/bert-base-uncased-imdb', 
    'albert': 'textattack/albert-base-v2-imdb', 
    'distilbert': 'textattack/distilbert-base-uncased-imdb', 
    'roberta': 'textattack/roberta-base-imdb', 
    'xlnet': 'textattack/xlnet-base-cased-imdb'
}
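
A note on the pre-processing in `pre_proccess`: the ranges in its character class (`!-.`, `/`, `0-9`, `:-@`) together cover every ASCII character from `!` to `@`, so all digits and common punctuation become spaces. The sketch below checks this on a sample string; `strip_symbols` is a hypothetical name used only for illustration.

```python
import re

# Same character class as in pre_proccess
PATTERN = '["\',!-.:-@0-9/]'

def strip_symbols(text):
    """Lowercase the text and replace symbols and digits with spaces."""
    return re.sub(PATTERN, ' ', text.lower())

sample = 'Great movie!!! 10/10 <br /> (really)'
cleaned = strip_symbols(sample)
print(repr(cleaned))

# The class is equivalent to the single contiguous range '!' through '@'
print(cleaned == re.sub('[!-@]', ' ', sample.lower()))
```

One side effect worth keeping in mind is that HTML line breaks such as `<br />` are reduced to the bare token `br`, which stays in the text passed to the tokenizer.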

In [14]:
m1 = load_model(imdb_models['albert'])
m2 = load_model(imdb_models['distilbert'])
m3 = load_model(imdb_models['roberta'])
m4 = load_model(imdb_models['xlnet'])

# Models to be used as oracle
models = [m1, m2, m3, m4]
# Target model
model = load_model(imdb_models['bert'])

Loading model textattack/albert-base-v2-imdb...
Loading model textattack/distilbert-base-uncased-imdb...
Loading model textattack/roberta-base-imdb...


Some weights of the model checkpoint at textattack/roberta-base-imdb were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Loading model textattack/xlnet-base-cased-imdb...
Loading model textattack/bert-base-uncased-imdb...


# Generating the templates
The word-ranking method used in the PosNegTemplateGenerator is the Replace-1 Score.
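
As a rough illustration, the Replace-1 Score is commonly defined as the change in the model's output when a single word is swapped for a neutral placeholder. The sketch below assumes that definition; `toy_predict_proba`, the `[UNK]` placeholder, and `replace_1_scores` are hypothetical stand-ins, and the scoring inside `PosNegTemplateGeneratorApp3` may differ.

```python
def toy_predict_proba(text):
    """Hypothetical classifier: P(positive) derived from a tiny keyword list."""
    positive = {'great', 'wonderful'}
    negative = {'boring', 'awful'}
    words = text.lower().split()
    score = 0.5 + 0.2 * sum(w in positive for w in words) - 0.2 * sum(w in negative for w in words)
    return min(max(score, 0.0), 1.0)

def replace_1_scores(words, predict_proba, placeholder='[UNK]'):
    """Score each word by how much the prediction changes when it is replaced."""
    base = predict_proba(' '.join(words))
    scores = []
    for i, word in enumerate(words):
        perturbed = words[:i] + [placeholder] + words[i + 1:]
        scores.append((word, base - predict_proba(' '.join(perturbed))))
    # Words with the largest absolute change come first
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)

ranking = replace_1_scores('a great but boring movie'.split(), toy_predict_proba)
print(ranking[:2])
```

A positive score means the word pushed the prediction toward the positive class, a negative score toward the negative class; words with a score near zero are not considered relevant.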

In [15]:
from template_generator.tasks.sentiment_analisys import PosNegTemplateGeneratorApp3

tg = PosNegTemplateGeneratorApp3(model, models)

### Initial number of instances: 5

In [16]:
# Sampling instances
np.random.seed(220)
n_instances = 5
df_sampled = imdb_df.sample(n_instances)

instances = df_sampled['text'].tolist()

In [17]:
templates = tg.generate_templates(instances, ranked_words_count=4, min_classification_score=0.8)

Converting texts to sentences...
:: 31 sentences were generated.
Predicting inputs...
:: Sentence predictions done.
Filtering instances classified unanimously...
:: 21 sentences remaining.
Filtering instances by classification score greater than 0.8
:: 20 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
Filtering instances by relevant words...
:: 8 sentences remaining.
Filtering instances by relevant words classification score greater than 0.8
:: 1 sentences remaining.


#### Execution time for 5 instances: 1m 19.9s

In [18]:
df = tg.to_dataframe()
df

Unnamed: 0,label,original_text,masked_text,template_text
0,0,They should have ditched the space station and headed for Mars.<br /><br />Major raspberries.,They {mask} have {mask} the space station and headed for Mars. < br / > < br / > Major raspberries .,They {neg_verb} have {neg_verb} the space station and headed for Mars. < br / > < br / > Major raspberries .


In [19]:
tg.lexicons

{'pos_verb': [],
 'neg_verb': ['should', 'ditched'],
 'pos_adj': [],
 'neg_adj': []}
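
The relationship between `masked_text`, `template_text`, and the lexicons can be sketched as follows. The helper `build_template` and its inputs are hypothetical: it assumes each relevant word already carries its own predicted polarity (`pos` or `neg`) and part of speech (`verb` or `adj`), which is what the tags in the output above suggest.

```python
def build_template(tokens, relevant):
    """tokens: list of words.
    relevant: {token index: (polarity, pos)} with polarity in {'pos', 'neg'}
    and pos in {'verb', 'adj'} (assumed inputs)."""
    masked, template = list(tokens), list(tokens)
    lexicons = {'pos_verb': [], 'neg_verb': [], 'pos_adj': [], 'neg_adj': []}
    for i, (polarity, pos) in relevant.items():
        tag = f'{polarity}_{pos}'
        masked[i] = '{mask}'
        template[i] = '{' + tag + '}'
        if tokens[i] not in lexicons[tag]:
            lexicons[tag].append(tokens[i])  # the original word feeds the lexicon
    return ' '.join(masked), ' '.join(template), lexicons

tokens = 'They should have ditched the space station'.split()
relevant = {1: ('neg', 'verb'), 3: ('neg', 'verb')}
masked_text, template_text, lexicons = build_template(tokens, relevant)
print(masked_text)
print(template_text)
print(lexicons['neg_verb'])
```

Each masked word is added to the lexicon named by its tag, so the lexicons grow with the number of templates.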

### Initial number of instances: 1000

In [20]:
# Using all 1000 instances
instances = imdb_df['text'].tolist()

In [21]:
tg = PosNegTemplateGeneratorApp3(model, models)
templates = tg.generate_templates(instances, ranked_words_count=4, min_classification_score=0.8)

Converting texts to sentences...
:: 7973 sentences were generated.
Predicting inputs...
:: Sentence predictions done.
Filtering instances classified unanimously...
:: 5054 sentences remaining.
Filtering instances by classification score greater than 0.8
:: 4587 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
Filtering instances by relevant words...
:: 1758 sentences remaining.
Filtering instances by relevant words classification score greater than 0.8
:: 200 sentences remaining.


#### Execution time for 1000 instances: 248m 21.1s
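
The numbers reported in the log give a sense of how selective the pipeline is and how it scales. The short calculation below uses only the counts and timings printed for the two runs; the variable names are illustrative.

```python
# Sentence counts reported in the log of the 1000-instance run
stages = [
    ('sentences generated', 7973),
    ('unanimous classification', 5054),
    ('classification score > 0.8', 4587),
    ('relevant words', 1758),
    ('relevant-word score > 0.8', 200),
]

# Share of sentences kept by each filter, relative to the previous stage
for (_, before), (name, after) in zip(stages, stages[1:]):
    print(f'{name}: {after}/{before} = {after / before:.1%} kept')

overall = stages[-1][1] / stages[0][1]
print(f'overall: {overall:.1%} of the sentences become templates')

# Wall-clock time per instance, from the reported totals
secs_1000 = 248 * 60 + 21.1
secs_5 = 1 * 60 + 19.9
print(f'{secs_1000 / 1000:.1f} s per instance (1000 instances)')
print(f'{secs_5 / 5:.1f} s per instance (5 instances)')
```

Only about 2.5% of the sentences survive all filters, and the per-instance cost stays at roughly 15 to 16 seconds in both runs, which suggests the run time grows approximately linearly with the number of instances.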

In [22]:
df = tg.to_dataframe()
df

Unnamed: 0,label,original_text,masked_text,template_text
0,1,"One should always have an open mind, and weigh all the options.","One {mask} always have an {mask} mind , and weigh all the options .","One {neg_verb} always have an {pos_adj} mind , and weigh all the options ."
1,1,"It takes someone of Chaplin's skill as a comedian to make something as dreary as trench warfare into such a brilliant comedy, but the irony that he uses in the film makes even the most uncomfortable conditions highly amusing.<br /><br />Like all of the best of Chaplin's films, short films and otherwise, this one is packed with brilliant and memorable scenes, such as the scene where he marks off kills with a piece of chalk on a board in the trench, erasing one when he gets his helmet shot off, the scene where he and his fellow soldiers are sleeping underwater, the opening of the beer bottle and lighting of the cigarette, and of course, the overtaking of the enemy.","It takes someone of Chaplin 's skill as a comedian to make something as dreary as trench warfare into such a brilliant comedy , but the irony that he uses in the film makes even the most uncomfortable conditions highly amusing. < br / > < br / > Like all of the {mask} of Chaplin 's films , short films and otherwise , this one is packed with brilliant and {mask} scenes , such as the scene where he marks off kills with a piece of chalk on a board in the trench , erasing one when he gets his helmet shot off , the scene where he and his fellow soldiers are sleeping underwater , the opening of the beer bottle and lighting of the cigarette , and of course , the overtaking of the enemy .","It takes someone of Chaplin 's skill as a comedian to make something as dreary as trench warfare into such a brilliant comedy , but the irony that he uses in the film makes even the most uncomfortable conditions highly amusing. 
< br / > < br / > Like all of the {pos_adj} of Chaplin 's films , short films and otherwise , this one is packed with brilliant and {pos_adj} scenes , such as the scene where he marks off kills with a piece of chalk on a board in the trench , erasing one when he gets his helmet shot off , the scene where he and his fellow soldiers are sleeping underwater , the opening of the beer bottle and lighting of the cigarette , and of course , the overtaking of the enemy ."
2,0,"This movie illustrates like no other the state of the Australian film industry and everything that's holding it back.<br /><br />Awesome talent, outstanding performances (particularly by Victoria Hill), but a let down in practically every other way.<br /><br />An ""adaptation"" of sorts, it brought nothing new to Macbeth (no, setting it in present-day Australia is not enough), and essentially, completely failed to justify its existence, apart from (let's face it, completely unnecessarily) paying homage to the original work.","This movie illustrates like no other the state of the Australian film industry and everything that 's holding it back. < br / > < br / > Awesome talent , {mask} performances ( particularly by Victoria Hill ) , but a let down in practically every other way. < br / > < br / > An `` adaptation '' of sorts , it brought nothing new to Macbeth ( no , setting it in present-day Australia is not enough ) , and essentially , completely {mask} to justify its existence , apart from ( let 's face it , completely unnecessarily ) paying homage to the original work .","This movie illustrates like no other the state of the Australian film industry and everything that 's holding it back. < br / > < br / > Awesome talent , {pos_adj} performances ( particularly by Victoria Hill ) , but a let down in practically every other way. < br / > < br / > An `` adaptation '' of sorts , it brought nothing new to Macbeth ( no , setting it in present-day Australia is not enough ) , and essentially , completely {neg_verb} to justify its existence , apart from ( let 's face it , completely unnecessarily ) paying homage to the original work ."
3,0,"So any adaptation, if it's not to be a self-indulgent and pointless exercise, needs to at least bring some new interpretation to the work.<br /><br />And that's what this Macbeth fails to do.","So any adaptation , if it 's not to be a self-indulgent and {mask} exercise , needs to at least bring some new interpretation to the work. < br / > < br / > And that 's what this Macbeth {mask} to do .","So any adaptation , if it 's not to be a self-indulgent and {neg_adj} exercise , needs to at least bring some new interpretation to the work. < br / > < br / > And that 's what this Macbeth {neg_verb} to do ."
4,1,Cute Movie feel good movie I had never heard of this movie but ran across it while looking for something to rent.,Cute Movie feel {mask} movie I {mask} never heard of this movie but ran across it while looking for something to rent .,Cute Movie feel {pos_adj} movie I {neg_verb} never heard of this movie but ran across it while looking for something to rent .
...,...,...,...,...
195,1,Danson is outstanding as the title character and edward fox makes a wonderful villain.,Danson is {mask} as the title character and edward fox makes a {mask} villain .,Danson is {pos_adj} as the title character and edward fox makes a {pos_adj} villain .
196,1,OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full,OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the {mask} show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the {mask} show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show 
ever mad full stop.OZ is the greatest show ever mad full,OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the {pos_adj} show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the {pos_adj} show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full stop.OZ is the greatest show ever mad full
197,1,The big budget special effects are spectacular but overshadowed by the jokes from the original comic.<br /><br />Depardieu and Clavier still work brilliantly as a pair while Monica Bellucci makes perfect casting as Cleopatra.,The big budget special effects are spectacular but overshadowed by the jokes from the original comic. < br / > < br / > Depardieu and Clavier still work brilliantly as a pair while Monica Bellucci makes {mask} {mask} as Cleopatra .,The big budget special effects are spectacular but overshadowed by the jokes from the original comic. < br / > < br / > Depardieu and Clavier still work brilliantly as a pair while Monica Bellucci makes {pos_adj} {neg_verb} as Cleopatra .
198,0,"The only positive in the movie is the presence of some talented actors who did not help much text to explode their talents Finally.., iwant to know your comment about this movie guys... is it bad or what?","The {mask} positive in the movie is the presence of some talented actors who did not help much text to explode their talents Finally .. , iwant to know your comment about this movie guys ... is it {mask} or what ?","The {pos_adj} positive in the movie is the presence of some talented actors who did not help much text to explode their talents Finally .. , iwant to know your comment about this movie guys ... is it {neg_adj} or what ?"


In [23]:
tg.lexicons

{'pos_verb': ['inspired',
  'watched',
  'Watching',
  'choreographed',
  'carries',
  'sit',
  'appreciate',
  'touch',
  'email',
  'watching',
  'appreciating',
  'exsists',
  'enjoyed',
  'knew.This',
  'heartfelt',
  'Thank',
  'reunite',
  'evokes',
  'recreates',
  'complimented',
  'sings',
  'recommend',
  'compelling',
  'unfolds',
  'care',
  "'ll",
  'loved',
  'inspiring',
  'amazing',
  'knew',
  'enjoy',
  'works',
  'wrenching'],
 'neg_verb': ['hate',
  'is',
  'panned',
  'cringed',
  'be',
  'Wasted',
  'pretends',
  'ignores',
  'might',
  'spend',
  'was',
  'lack',
  "'s",
  'failed',
  'costumed',
  'skip',
  'suck',
  'broke',
  'tries',
  'expecting',
  'centered',
  'waste',
  'wasted',
  'should',
  'acting',
  'seen',
  'directed',
  'bored',
  'had',
  'fails',
  'looking',
  'forget',
  'did',
  'casting',
  'involved',
  'ruined',
  'degrading',
  'replaced',
  'fail',
  'boring',
  'sickening',
  'avoid',
  'grating',
  'cut',
  'ditched',
  'crap',
  'me

# Using the templates generated by the TemplateGenerator in CheckList

In [24]:
from checklist.editor import Editor
from checklist.test_suite import TestSuite
from checklist.test_types import MFT

In [25]:
lexicons = tg.lexicons
templates = tg.template_texts
masked = tg.masked_texts
labels = [sent.prediction.label for sent in tg.sentences]

editor = Editor()
editor.add_lexicon('pos_verb', lexicons['pos_verb'])
editor.add_lexicon('neg_verb', lexicons['neg_verb'])
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [26]:
# One MFT per generated template; the test names are numbered from 1
for i, (template, label) in enumerate(zip(templates, labels), start=1):
    t = editor.template(template, remove_duplicates=True, labels=int(label))

    suite.add(MFT(
        data=t.data,
        labels=label,
        capability="Vocabullary",
        name=f"Test: MFT with vocabullary - template{i}",
        description="Checking if the model can handle vocabullary"))

In [27]:
suite.run(model.predict, overwrite=True)
suite.save('./suites/posneg-approach3.suite')

Running Test: MFT with vocabullary - template1
Predicting 3710 examples
Running Test: MFT with vocabullary - template2
Predicting 70 examples
Running Test: MFT with vocabullary - template3
Predicting 3710 examples
Running Test: MFT with vocabullary - template4
Predicting 4133 examples
Running Test: MFT with vocabullary - template5
Predicting 3710 examples
Running Test: MFT with vocabullary - template6
Predicting 2574 examples
Running Test: MFT with vocabullary - template7
Predicting 2574 examples
Running Test: MFT with vocabullary - template8
Predicting 78 examples
Running Test: MFT with vocabullary - template9
Predicting 3710 examples
Running Test: MFT with vocabullary - template10
Predicting 4133 examples
Running Test: MFT with vocabullary - template11
Predicting 78 examples
Running Test: MFT with vocabullary - template12
Predicting 2308 examples
Running Test: MFT with vocabullary - template13
Predicting 2574 examples
Running Test: MFT with vocabullary - template14
Predicting 2574 examples
Running Test: MFT with vocabullary - template15
Predicting 70 exa
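
The number of examples differs between templates because each placeholder is filled from its lexicon and the fill-ins are combined: a template with two different placeholders yields roughly the product of the two lexicon sizes, while a template that repeats one placeholder yields one example per lexicon entry. With `remove_duplicates=True`, combinations in which two different placeholders receive the same word appear to be dropped as well. The sketch below mimics this expansion with `itertools.product` on toy lexicons; `expand` is a hypothetical helper, not CheckList's implementation.

```python
from itertools import product
import re

def expand(template, lexicons, remove_duplicates=True):
    """Hypothetical stand-in for template expansion: one example per
    combination of fill-ins; a repeated placeholder reuses the same word."""
    keys = sorted(set(re.findall(r'\{(\w+)\}', template)))
    examples = []
    for combo in product(*(lexicons[k] for k in keys)):
        if remove_duplicates and len(set(combo)) < len(combo):
            continue  # the same word was chosen for two different placeholders
        examples.append(template.format(**dict(zip(keys, combo))))
    return examples

# Toy lexicons; 'wasted' is deliberately present in both
toy_lexicons = {
    'neg_adj': ['boring', 'awful', 'wasted'],
    'neg_verb': ['failed', 'wasted'],
}
two_keys = expand('A {neg_adj} film that {neg_verb} .', toy_lexicons)
one_key = expand('A {neg_adj} and {neg_adj} film .', toy_lexicons)
print(len(two_keys), len(one_key))
```

Under this reading, the small test sizes in the log would correspond to templates that repeat a single placeholder, and the large ones to templates that mix two lexicons.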

# Loading the test suite

In [29]:
from checklist.test_suite import TestSuite
suite = TestSuite.from_file('./suites/posneg-approach3.suite')

suite.visual_summary_table()

Please wait as we prepare the table data...


SuiteSummarizer(stats={'npassed': 0, 'nfailed': 0, 'nfiltered': 0}, test_infos=[{'name': 'Test: MFT with vocab…