# Approach 3

This notebook uses Approach 3 to generate templates, focusing on positive and negative templates. One possible application is testing the "Vocabulary" linguistic capability with an MFT (Minimum Functionality Test).

The steps of this approach are:

1. Split each instance into sentences
2. Classify the sentences with one or more models to help label them
3. Keep only the sentences that all models classify unanimously
4. Keep only the sentences predicted with high confidence
5. Rank the words of each sentence
6. Keep only the sentences that contain relevant words (verbs and adjectives)
7. Replace the relevant words with masks
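
Steps 3 and 4 can be sketched with plain NumPy. This is only an illustration under stated assumptions, not the library's implementation: the function name `filter_unanimous_confident`, the toy predictions, and the choice to require every model's score to exceed the threshold are all hypothetical.

```python
import numpy as np

def filter_unanimous_confident(labels, scores, min_score=0.9):
    """Return indices of sentences on which all models agree and are confident.

    labels: (n_models, n_sentences) predicted class ids
    scores: (n_models, n_sentences) probability of the predicted class
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    unanimous = (labels == labels[0]).all(axis=0)  # step 3: every model agrees
    confident = (scores > min_score).all(axis=0)   # step 4: assumed to apply to every model
    return np.flatnonzero(unanimous & confident)

# Hypothetical predictions from three models for four sentences
labels = [[1, 0, 1, 0],
          [1, 0, 0, 0],
          [1, 0, 1, 0]]
scores = [[0.99, 0.95, 0.80, 0.60],
          [0.97, 0.93, 0.70, 0.91],
          [0.95, 0.98, 0.90, 0.99]]

kept = filter_unanimous_confident(labels, scores)
print(kept)  # sentence 2 lacks agreement, sentence 3 lacks confidence
```

Filtering on agreement first and on confidence second mirrors the order of the steps above; the two masks are independent, so the order does not affect the result in this sketch.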

In [1]:
%config Completer.use_jedi = False
import sys
sys.path.append('../')

## Loading the dataset, the target model, and the auxiliary models

In [2]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

movie_reviews_rt_df = pd.read_csv('./data/data-rt-100samples.csv')
movie_reviews_rt_df.head(5)

Unnamed: 0,label,text,words
0,1,allen's underestimated charm delivers more goodies than lumps of coal .,11
1,0,skip the film and buy the philip glass soundtrack cd .,11
2,0,involving at times but lapses quite casually into the absurd .,11
3,0,while hoffman's performance is great the subject matter goes nowhere .,11
4,1,a flick about our infantilized culture that isn't entirely infantile .,11


In [3]:
import re
import numpy as np
from torch.nn.functional import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def pre_proccess(text):
    text = text.lower()
    # Replace ASCII punctuation (ranges !-. and :-@), digits, and slashes with spaces
    text = re.sub('["\',!-.:-@0-9/]', ' ', text)
    return text

# Wrapper that adapts a Hugging Face model's output to (labels, probabilities) arrays
class SentimentAnalisysModelWrapper:
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        
    def __predict(self, text_input):
        text_preprocessed = pre_proccess(text_input)
        tokenized = self.tokenizer(text_preprocessed, padding=True, truncation=True, max_length=512, 
                                    add_special_tokens = True, return_tensors="pt")
        
        tensor_logits = self.model(**tokenized)
        # Softmax over the class dimension; passing dim explicitly avoids the implicit-dim warning
        prob = softmax(tensor_logits[0], dim=-1).detach().numpy()
        pred = np.argmax(prob)
        
        return pred, prob
    
    def predict_label(self, text_inputs):
        return self.predict(text_inputs)[0]
        
    def predict_proba(self, text_inputs):
        return self.predict(text_inputs)[1]
        
    def predict(self, text_inputs):
        if isinstance(text_inputs, str):
            text_inputs = [text_inputs]
        
        preds = []
        probs = []

        for text_input in text_inputs:
            pred, prob = self.__predict(text_input)
            preds.append(pred)
            probs.append(prob[0])

        return np.array(preds), np.array(probs) # ([0, 1], [[0.99, 0.01], [0.03, 0.97]])

# Helper function to load and wrap a model from Hugging Face
def load_model(model_name):
    print(f'Loading model {model_name}...')
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    return SentimentAnalisysModelWrapper(model, tokenizer)

# Hugging Face hosted model names 
movie_reviews_models = {
    'bert': 'textattack/bert-base-uncased-rotten-tomatoes', 
    'albert': 'textattack/albert-base-v2-rotten-tomatoes', 
    'distilbert': 'textattack/distilbert-base-uncased-rotten-tomatoes', 
    'roberta': 'textattack/roberta-base-rotten-tomatoes', 
    'xlnet': 'textattack/xlnet-base-cased-rotten-tomatoes'
}
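
To make the wrapper's output format concrete, the following NumPy-only sketch turns logits into the `(labels, probabilities)` pair that `predict` returns. The logit values and the helper name `softmax_np` are hypothetical; label 0 is taken to be negative and label 1 positive, as in the dataset sample above.

```python
import numpy as np

def softmax_np(logits):
    """Row-wise softmax over the class axis, shifted for numerical stability."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical logits for two sentences and two classes (negative, positive)
logits = [[4.0, -1.0], [-2.0, 1.5]]

probs = softmax_np(logits)      # shape (2, 2), each row sums to 1
preds = probs.argmax(axis=-1)   # predicted label per sentence
print(preds, probs.round(2))
```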

In [4]:
m1 = load_model(movie_reviews_models['albert'])
m2 = load_model(movie_reviews_models['distilbert'])
m3 = load_model(movie_reviews_models['roberta'])
m4 = load_model(movie_reviews_models['xlnet'])

# Models to be used as oracle
models = [m1, m2, m3, m4]
# Target model
model = load_model(movie_reviews_models['bert'])

Loading model textattack/albert-base-v2-rotten-tomatoes...
Loading model textattack/distilbert-base-uncased-rotten-tomatoes...
Loading model textattack/roberta-base-rotten-tomatoes...


Some weights of the model checkpoint at textattack/roberta-base-rotten-tomatoes were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Loading model textattack/xlnet-base-cased-rotten-tomatoes...
Loading model textattack/bert-base-uncased-rotten-tomatoes...


## Generating the templates

The word-ranking method used by PosNegTemplateGenerator is the Replace-1 Score.
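
A minimal sketch of the idea, assuming the Replace-1 Score follows its usual definition: the drop in the probability of the predicted class when a single word is replaced by an unknown-word placeholder. The function `replace_1_scores`, the `[UNK]` placeholder, and the toy classifier are hypothetical and do not come from the `template_generator` package.

```python
def replace_1_scores(words, predict_proba, label, placeholder="[UNK]"):
    """Score each word by the drop in P(label) when that word alone is replaced."""
    base = predict_proba(" ".join(words))[label]
    scores = []
    for i in range(len(words)):
        perturbed = words[:i] + [placeholder] + words[i + 1:]
        scores.append(base - predict_proba(" ".join(perturbed))[label])
    return scores

# Toy stand-in for a classifier: P(positive) grows with the number of "good" words.
def toy_predict_proba(text):
    hits = sum(w in {"great", "refreshingly"} for w in text.split())
    p_pos = min(0.5 + 0.2 * hits, 0.99)
    return [1.0 - p_pos, p_pos]

words = "a great movie".split()
scores = replace_1_scores(words, toy_predict_proba, label=1)

# Words with the largest drop are the most relevant to the prediction.
ranked = sorted(zip(words, scores), key=lambda pair: pair[1], reverse=True)
print(ranked[0][0])
```

Each word costs one extra model call, so ranking a sentence of n words needs n + 1 predictions, which is consistent with the run time growing quickly as more instances are processed.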

In [5]:
from template_generator.tasks.sentiment_analisys import PosNegTemplateGeneratorApp3

tg = PosNegTemplateGeneratorApp3(model, models)

### Initial number of instances: 5

In [6]:
# Sampling instances
np.random.seed(220)
n_instances = 5
df_sampled = movie_reviews_rt_df.sample(n_instances)

instances = [x for x in df_sampled['text'].values]

In [7]:
templates = tg.generate_templates(instances, n_masks=1, range_words=100)

Converting texts to sentences...
:: 6 sentences were generated.
Predicting inputs...


:: Sentence predictions done.
Filtering instances classified unanimously...
:: 3 sentences remaining.
Filtering instances by classification score greater than 0.9
:: 2 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
Filtering instances by relevant words...
:: 2 sentences remaining.
Filtering instances by relevant words classification score greater than 0.9
:: 2 sentences remaining.


#### Execution time for 5 instances: 16.8s

In [8]:
df = tg.to_dataframe()
df

Unnamed: 0,label,original_text,masked_text,template_text
0,1,elegantly produced and expressively performed the six musical numbers crystallize key plot moments into minutely detailed wonders of dreamlike ecstasy .,elegantly produced and expressively performed the six musical numbers crystallize key plot moments into minutely {mask} wonders of dreamlike ecstasy .,elegantly produced and expressively performed the six musical numbers crystallize key plot moments into minutely {pos_adj} wonders of dreamlike ecstasy .
1,1,by no means a great movie but it is a refreshingly forthright one .,by no means a great movie but it is a refreshingly {mask} one .,by no means a great movie but it is a refreshingly {pos_adj} one .


In [9]:
tg.lexicons

{'pos_verb': [],
 'neg_verb': [],
 'pos_adj': ['detailed', 'forthright'],
 'neg_adj': []}
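
The relation between `original_text`, `masked_text`, `template_text`, and the lexicons can be sketched as follows. This is an illustration, not the library's code: `mask_word` is a hypothetical helper, and taking the polarity prefix from the predicted label of the word itself is an inference from the log line about the "relevant words classification score".

```python
def mask_word(tokens, index, word_label, pos_tag):
    """Build the masked text, the template text, and the lexicon entry for one word.

    word_label: 1 for positive, 0 for negative (assumed to be the word's own prediction)
    pos_tag: "adj" or "verb"
    """
    key = f"{'pos' if word_label == 1 else 'neg'}_{pos_tag}"
    masked = tokens[:index] + ["{mask}"] + tokens[index + 1:]
    template = tokens[:index] + ["{" + key + "}"] + tokens[index + 1:]
    return " ".join(masked), " ".join(template), key, tokens[index]

tokens = "by no means a great movie but it is a refreshingly forthright one .".split()
masked, template, key, word = mask_word(
    tokens, tokens.index("forthright"), word_label=1, pos_tag="adj"
)

# The removed word is stored under the placeholder's key in the lexicons.
lexicons = {"pos_verb": [], "neg_verb": [], "pos_adj": [], "neg_adj": []}
lexicons[key].append(word)
print(template)
```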

### Initial number of instances: 100

In [10]:
# Using all 100 instances
instances = [x for x in movie_reviews_rt_df['text'].values]

In [11]:
templates = tg.generate_templates(instances, n_masks=2, range_words=4, min_classification_score=0.8)

Converting texts to sentences...
:: 134 sentences were generated.
Predicting inputs...


:: Sentence predictions done.
Filtering instances classified unanimously...
:: 96 sentences remaining.
Filtering instances by classification score greater than 0.8
:: 87 sentences remaining.
Ranking words using Replace-1 Score...
:: Word ranking done.
Filtering instances by relevant words...
:: 36 sentences remaining.
Filtering instances by relevant words classification score greater than 0.8
:: 25 sentences remaining.


#### Execution time for 100 instances: 08m 19.7s

In [12]:
df = tg.to_dataframe()
df

Unnamed: 0,label,original_text,masked_text,template_text
0,1,intelligent caustic take on a great writer and dubious human being .,{mask} caustic take on a great writer and dubious {mask} being .,{pos_adj} caustic take on a great writer and dubious {neg_adj} being .
1,0,it's a bad sign in a thriller when you instantly know whodunit .,it 's a {mask} sign in a thriller when you instantly {mask} whodunit .,it 's a {neg_adj} sign in a thriller when you instantly {pos_verb} whodunit .
2,0,falsehoods pile up undermining the movie's reality and stifling its creator's comic voice .,falsehoods {mask} up undermining the movie 's reality and stifling its creator 's {mask} voice .,falsehoods {neg_verb} up undermining the movie 's reality and stifling its creator 's {neg_adj} voice .
3,0,a long dull procession of despair set to cello music culled from a minimalist funeral .,a long {mask} procession of despair set to cello music culled from a {mask} funeral .,a long {neg_adj} procession of despair set to cello music culled from a {neg_adj} funeral .
4,0,could the country bears really be as bad as its trailers ?,{mask} the country bears really be as {mask} as its trailers ?,{neg_verb} the country bears really be as {neg_adj} as its trailers ?
5,1,the movie is saved from unbearable lightness by the simplicity of the storytelling and the authenticity of the performances .,the movie is {mask} from {mask} lightness by the simplicity of the storytelling and the authenticity of the performances .,the movie is {pos_verb} from {neg_adj} lightness by the simplicity of the storytelling and the authenticity of the performances .
6,0,an empty exercise a florid but ultimately vapid crime melodrama with lots of surface flash but little emotional resonance .,an empty exercise a florid but ultimately {mask} crime melodrama with lots of surface flash but {mask} emotional resonance .,an empty exercise a florid but ultimately {neg_adj} crime melodrama with lots of surface flash but {neg_adj} emotional resonance .
7,0,hugh grant's act is so consuming that sometimes it's difficult to tell who the other actors in the movie are .,hugh grant 's act is so {mask} that sometimes it 's {mask} to tell who the other actors in the movie are .,hugh grant 's act is so {neg_adj} that sometimes it 's {neg_adj} to tell who the other actors in the movie are .
8,0,opens at a funeral ends on the protagonist's death bed and doesn't get much livelier in the three hours in between .,opens at a funeral ends on the protagonist 's death bed and {mask} n't get {mask} livelier in the three hours in between .,opens at a funeral ends on the protagonist 's death bed and {neg_verb} n't get {pos_adj} livelier in the three hours in between .
9,1,at its best this is grand-scale moviemaking for a larger-than-life figure an artist who has been awarded mythic status in contemporary culture .,at its best this is {mask} {mask} for a larger-than-life figure an artist who has been awarded mythic status in contemporary culture .,at its best this is {pos_adj} {pos_verb} for a larger-than-life figure an artist who has been awarded mythic status in contemporary culture .


In [13]:
tg.lexicons

{'pos_verb': ['looking',
  'saved',
  'moviemaking',
  'eat',
  'heartbreaking',
  'inspiring',
  'is',
  'know'],
 'neg_verb': ['shows',
  'depends',
  'does',
  'should',
  'lost',
  'otherwise',
  'thinks',
  'cliched',
  'could',
  'pile',
  'chokes',
  'seeks'],
 'pos_adj': ['gorgeous',
  'pleasant',
  'deceptively',
  'forthright',
  'unflinching',
  'in-depth',
  'riveting',
  'powerful',
  'grand-scale',
  'much',
  'detailed',
  'intelligent',
  'nincompoop'],
 'neg_adj': ['dull',
  'unbearable',
  'vapid',
  'pessimistic',
  'difficult',
  'self-indulgent',
  'drab',
  'consuming',
  'human',
  'ridiculous',
  'bad',
  'minimalist',
  'undone',
  'comic',
  'little',
  'pompous']}
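
The lexicons fill the placeholders in each `template_text`. The sketch below shows that expansion in plain Python with `itertools.product`; it is a simplification that treats every placeholder occurrence independently and may differ from how CheckList expands templates. The function `expand_template` and the two-word lexicons are hypothetical.

```python
import itertools
import re

def expand_template(template, lexicons):
    """Fill every {placeholder} with every combination of words from its lexicon."""
    keys = re.findall(r"\{(\w+)\}", template)          # placeholder names, in order
    pattern = re.sub(r"\{\w+\}", "{}", template)       # positional slots for str.format
    combos = itertools.product(*(lexicons[k] for k in keys))
    return [pattern.format(*combo) for combo in combos]

# Small subset of the lexicons, for illustration only
lexicons = {"pos_adj": ["intelligent", "gorgeous"], "neg_adj": ["human", "dull"]}
template = "{pos_adj} caustic take on a great writer and dubious {neg_adj} being ."

sentences = expand_template(template, lexicons)
for sentence in sentences:
    print(sentence)
```

The number of generated sentences is the product of the lexicon sizes for the placeholders in a template, which is why a few dozen templates can yield thousands of test cases.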

## Checklist

In [14]:
import checklist
from checklist.editor import Editor
from checklist.test_suite import TestSuite
from checklist.test_types import MFT

In [15]:
lexicons = tg.lexicons
templates = tg.template_texts
labels = [sent.prediction.label for sent in tg.sentences]

editor = Editor()
editor.add_lexicon('pos_verb', lexicons['pos_verb'])
editor.add_lexicon('neg_verb', lexicons['neg_verb'])
editor.add_lexicon('pos_adj', lexicons['pos_adj'])
editor.add_lexicon('neg_adj', lexicons['neg_adj'])

suite = TestSuite()

In [16]:
data = []
lbl = []
for template, label in zip(templates, labels):
    t = editor.template(template, remove_duplicates=True, labels=int(label))
    data.extend(t.data)
    lbl.extend(t.labels)

suite.add(MFT(
    data=data,
    labels=lbl,
    capability="Vocabulary",
    name="Template Generator - Vocabulary in MFT",
    description="Testing the model for vocabulary capability"
))

In [17]:
suite.run(model.predict, overwrite=True)

Running Template Generator - Vocabulary in MFT
Predicting 2657 examples


In [18]:
suite.summary()

Vocabulary

Template Generator - Vocabulary in MFT
Test cases:      2657
Fails (rate):    233 (8.8%)

Example fails:
0.8 is an inexpressible and human wannabe heartbreaking for that exact niche .
----
0.2 is looking ironic and revelatory of just how drab and money-oriented the record industry really is .
----
0.5 is an inexpressible and consuming wannabe heartbreaking for that exact niche .
----
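
In an MFT, a test case fails when the model's prediction differs from the expected label, and the failure rate is the share of failing cases. The sketch below shows that computation on hypothetical predictions and checks the arithmetic of the rate reported in the summary; `mft_fail_rate` is an illustrative helper, not part of CheckList.

```python
import numpy as np

def mft_fail_rate(preds, labels):
    """Count the test cases whose prediction differs from the expected label."""
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    fails = int((preds != labels).sum())
    return fails, fails / len(labels)

# Hypothetical predictions and expected labels for four test cases
fails, rate = mft_fail_rate([1, 0, 1, 1], [1, 0, 0, 1])
print(fails, rate)

# Rate implied by the counts in the summary: 233 fails out of 2657 cases
reported_rate = 233 / 2657
print(f"{reported_rate:.1%}")
```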




