### All code below uses concatenated sentence pairs in the Substitute Generation step, in order to generate substitutes that are similar in meaning (rather than substitutes that merely fit the context)
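
A minimal sketch of how such a concatenated input could be built, assuming BERT's `[SEP]` and `[MASK]` tokens. The helper name `build_concatenated_input` and the choice to mask only the first whole-word occurrence are illustrative assumptions; the cells in this notebook use `str.replace`, which masks every occurrence of the complex word.

```python
import re


def build_concatenated_input(sentence, complex_word, sep_token="[SEP]", mask_token="[MASK]"):
    """Return '<original sentence> <sep> <sentence with the complex word masked>'."""
    # mask only the first whole-word occurrence, so the input contains exactly one mask
    pattern = r"\b" + re.escape(complex_word) + r"\b"
    masked_sentence = re.sub(pattern, lambda match: mask_token, sentence, count=1)
    return f"{sentence} {sep_token} {masked_sentence}"


example = build_concatenated_input("It will be compulsory for banks.", "compulsory")
print(example)
```

Because the first segment still contains the complex word, the model can attend to it when filling the mask in the second segment, which is presumably why the generated substitutes tend to be closer in meaning.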

In [None]:
from transformers import AutoTokenizer, AutoModelForMaskedLM
from fitbert import FitBert
import pandas as pd
from transformers import pipeline

# read the tsv file
filename = "./data/trial/tsar2022_en_trial_none.tsv"
data = pd.read_csv(filename, sep='\t', header=None, names=["sentence", "complex_word"])



In [2]:
# create an empty dataframe to store the substitutes for evaluation
substitutes_df = pd.DataFrame(columns=["sentence", "complex_word"] + [f"substitute_{i+1}" for i in range(10)])

### Bert-base

In [3]:
# initialize the tokenizer and the models
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
lm_model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# create a fill-mask pipeline, reusing the tokenizer loaded above
fill_mask = pipeline("fill-mask", model=lm_model, tokenizer=tokenizer)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


#### Only Substitute Generation with BERT-base (k=10)

In [4]:
# in each row, for each complex word: 
for index, row in data.iterrows():
       
    # 1. Substitute Generation (SG): perform masking and generate substitutes:
    
    ## print the sentence and the complex word
    sentence, complex_word = row["sentence"], row["complex_word"]
    print(f"Sentence: {sentence}")
    print(f"Complex word: {complex_word}")

    ## in the sentence, replace the complex word with a masked word
    sentence_masked_word = sentence.replace(complex_word, "[MASK]")

    ## concatenate the original sentence and the masked sentence
    tokenizer = fill_mask.tokenizer
    sentences_concat = f"{sentence} {tokenizer.sep_token} {sentence_masked_word}"

    ## generate and rank candidate substitutes for the masked word using the fill_mask pipeline
    top_k = 10
    result = fill_mask(sentences_concat, top_k=top_k)
   
    ## lowercase and print the top-k substitutes
    substitutes = [substitute["token_str"].lower() for substitute in result]
    print(f"SG step: generated substitutes: {substitutes}\n")
    
    
    
    # add the sentence, complex_word, and substitutes to the dataframe 
    substitutes_df.loc[index] = [sentence, complex_word] + substitutes
    
# remove the #34-3 and #35-14 character combinations from the sentences in the dataframe
# (done once after the loop; regex=False makes the literal match explicit)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#34-3 \"", "", regex=False)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#35-14 ", "", regex=False)
    
    

# export the dataframe to a tsv file
substitutes_df.to_csv("./predictions/trial/BertBase_SG.tsv", sep="\t", index=False, header=False)

Sentence: A Spanish government source, however, later said that banks able to cover by themselves losses on their toxic property assets will not be forced to remove them from their books while it will be compulsory for those receiving public help.
Complex word: compulsory
SG step: generated substitutes: ['compulsory', 'mandatory', 'obligatory', 'optional', 'required', 'necessary', 'standard', 'voluntary', 'customary', 'impossible']

Sentence: Rajoy's conservative government had instilled markets with a brief dose of confidence by stepping into Bankia, performing a U-turn on its refusal to spend public money to rescue banks.
Complex word: instilled
SG step: generated substitutes: ['strengthened', 'targeted', 'tested', 'provided', 'created', 'affected', 'addressed', 'attacked', 'punished', 'bred']

Sentence: #34-3 "War maniacs of the South Korean puppet military made another grave provocation to the DPRK in the central western sector of the front on Thursday afternoon.
Complex word: mani

python tsar_eval.py --gold_file .\gold_trial.tsv --predictions_file ./predictions/trial/BertBase_SG.tsv --output_file .\output

#### Substitute Generation with BERT-base, and Substitute Selection steps a-c (k=30, limited to 10 after step 2c)
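
Step a can be sketched in isolation as a small function, which makes it easy to check on a toy list. The function name `filter_substitutes` and the example list are illustrative assumptions, not part of the pipeline below.

```python
import string


def filter_substitutes(substitutes):
    """Drop duplicates and candidates containing punctuation other than a hyphen, keeping order."""
    unwanted = set(string.punctuation) - {"-"}
    kept = []
    for candidate in substitutes:
        if candidate not in kept and not any(char in unwanted for char in candidate):
            kept.append(candidate)
    return kept


filtered = filter_substitutes(["rats", '"', "rats", "##games", "well-known", ","])
print(filtered)
```

Note that `#` is part of `string.punctuation`, so this filter also drops WordPiece fragments such as `##games`.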

In [5]:
from nltk.corpus import wordnet as wn
import spacy
nlp = spacy.load("en_core_web_sm")



In [1]:
import string

In [2]:

# in each row, for each complex word: 
for index, row in data.iterrows():
       
    # 1. Substitute Generation (SG): perform masking and generate substitutes:
    
    ## print the sentence and the complex word
    sentence, complex_word = row["sentence"], row["complex_word"]
    print(f"Sentence: {sentence}")
    print(f"Complex word: {complex_word}")

    ## in the sentence, replace the complex word with a masked word
    sentence_masked_word = sentence.replace(complex_word, "[MASK]")

    ## concatenate the original sentence and the masked sentence
    tokenizer = fill_mask.tokenizer
    sentences_concat = f"{sentence} {tokenizer.sep_token} {sentence_masked_word}"

    ## generate and rank candidate substitutes for the masked word using the fill_mask pipeline
    top_k = 30
    result = fill_mask(sentences_concat, top_k=top_k)
   
    ## lowercase and print the top-k substitutes
    substitutes = [substitute["token_str"].lower() for substitute in result]
    print(f"SG step: generated substitutes: {substitutes}\n")
    
    # 2. Substitute Selection (SS):
    
    # create a punctuation set without the hyphen, in order to retain hyphens in compound substitutes
    punctuation_without_hyphen = set(string.punctuation) - set('-')
    
    # a) remove duplicates and substitutes containing unwanted punctuation from the substitute list
    substitutes_no_dupl = []
    for sub in substitutes:
        if sub not in substitutes_no_dupl and not any(char in punctuation_without_hyphen for char in sub):
            substitutes_no_dupl.append(sub)
    print(f"SS step: a) substitute list without duplicates and undesired punctuation: {substitutes_no_dupl}\n")
    
   
    # b) remove duplicates and inflected forms of the complex word from the substitute list
    ## lemmatize the complex word with spaCy, so that its lemma can be compared with the lemma of each substitute
    doc_complex_word = nlp(complex_word)
    complex_word_lemma = doc_complex_word[0].lemma_
    print(f"complex_word_lemma for complex word '{complex_word}': {complex_word_lemma}\n")


    ## remove duplicates and inflected forms of the complex word from the list with substitutes
    substitutes_no_dupl_complex_word = []
    for substitute in substitutes_no_dupl:
        doc_substitute = nlp(substitute)
        substitute_lemma = doc_substitute[0].lemma_
        if substitute_lemma != complex_word_lemma:
            substitutes_no_dupl_complex_word.append(substitute)
    print(f"SS step: b) substitute list without duplicates and inflected forms of the complex word: {substitutes_no_dupl_complex_word}\n")

    # c) remove antonyms of the complex word from the substitute list
    ## collect the antonyms listed in WordNet for the first synset of the complex word's lemma
    antonyms = set()
    synsets = wn.synsets(complex_word_lemma)
    if synsets:
        for lemma in synsets[0].lemmas():
            antonyms.update(antonym.name().lower() for antonym in lemma.antonyms())

    ## keep only the substitutes whose surface form and lemma are not among these antonyms
    substitutes_no_dupl_complex_word_no_antonym = []
    for substitute in substitutes_no_dupl_complex_word:
        substitute_lemma = nlp(substitute)[0].lemma_
        if substitute in antonyms or substitute_lemma in antonyms:
            print(f"Antonym removed (lemma): {substitute_lemma}")
        else:
            substitutes_no_dupl_complex_word_no_antonym.append(substitute)
    print(f"SS step: c): substitute list without antonyms of the complex word: {substitutes_no_dupl_complex_word_no_antonym}\n")
    
     
    
    # limit the substitutes to the 10 first ones for evaluation
    top_10_substitutes = substitutes_no_dupl_complex_word_no_antonym[:10]
    
    # add the sentence, complex_word, and substitutes to the dataframe 
    substitutes_df.loc[index] = [sentence, complex_word] + top_10_substitutes
    
# remove the #34-3 and #35-14 character combinations from the sentences in the dataframe
# (done once after the loop; regex=False makes the literal match explicit)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#34-3 \"", "", regex=False)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#35-14 ", "", regex=False)
    
    

# export the dataframe to a tsv file
substitutes_df.to_csv("./predictions/trial/BertBase_SG_SS_abc.tsv", sep="\t", index=False, header=False)
    
    

NameError: name 'data' is not defined

python tsar_eval.py --gold_file .\gold_trial.tsv --predictions_file ./predictions/trial/BertBase_SG_SS_abc.tsv --output_file .\output

Result: steps 2a-c improved only MAP@10, and only very slightly.
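
For reference, MAP@k can be sketched as follows. This is one common definition, not necessarily the one implemented in `tsar_eval.py`: the normalization by `min(k, len(gold))` is an assumption of this sketch, and the example predictions and gold set are toy values.

```python
def average_precision_at_k(predictions, gold, k):
    """AP@k for one instance; assumes `predictions` is ranked and contains no duplicates."""
    if not gold:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, candidate in enumerate(predictions[:k], start=1):
        if candidate in gold:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # normalization by min(k, number of gold substitutes) is an assumption of this sketch
    return precision_sum / min(k, len(gold))


def mean_average_precision_at_k(all_predictions, all_gold, k):
    """Average AP@k over all instances."""
    scores = [average_precision_at_k(p, g, k) for p, g in zip(all_predictions, all_gold)]
    return sum(scores) / len(scores)


# toy example: gold substitutes are found at ranks 1 and 3
score = average_precision_at_k(["mandatory", "optional", "required"], {"mandatory", "required"}, k=3)
print(round(score, 4))
```

Under a definition like this, removing an irrelevant candidate from the top 10 can only move relevant candidates up, which is consistent with filtering steps affecting MAP@10 more than the metrics at smaller k.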

#### Substitute Generation with BERT-base, and Substitute Selection steps a-c, and the resulting list with FitBERT

In [7]:
# FitBERT mainly seems to look at syntactic fit
import fitbert

In [8]:

# instantiate a FitBert model
fb_model = FitBert(lm_model)


# in each row, for each complex word: 
for index, row in data.iterrows():
       
    # 1. Substitute Generation (SG): perform masking and generate substitutes:
    
    ## print the sentence and the complex word
    sentence, complex_word = row["sentence"], row["complex_word"]
    print(f"Sentence: {sentence}")
    print(f"Complex word: {complex_word}")

    ## in the sentence, replace the complex word with a masked word
    sentence_masked_word = sentence.replace(complex_word, "[MASK]")

    ## concatenate the original sentence and the masked sentence
    tokenizer = fill_mask.tokenizer
    sentences_concat = f"{sentence} {tokenizer.sep_token} {sentence_masked_word}"

    ## generate and rank candidate substitutes for the masked word using the fill_mask pipeline
    top_k = 30
    result = fill_mask(sentences_concat, top_k=top_k)
   
    ## lowercase and print the top-k substitutes
    substitutes = [substitute["token_str"].lower() for substitute in result]
    print(f"SG step: generated substitutes: {substitutes}\n")
    
    # 2. Substitute Selection (SS):   
    
    # create a punctuation set without hyphen, in order to retain hyphens in compound substitutes
    punctuation_without_hyphen = set(string.punctuation) - set('-')
    
    # a) remove duplicates and substitutes containing unwanted punctuation from the substitute list
    substitutes_no_dupl = []
    for sub in substitutes:
        if sub not in substitutes_no_dupl and not any(char in punctuation_without_hyphen for char in sub):
            substitutes_no_dupl.append(sub)
    print(f"SS step: a) substitute list without duplicates and undesired punctuation: {substitutes_no_dupl}\n")
    

   
    # b) remove duplicates and inflected forms of the complex word from the substitute list
    ## lemmatize the complex word with spaCy, so that its lemma can be compared with the lemma of each substitute
    doc_complex_word = nlp(complex_word)
    complex_word_lemma = doc_complex_word[0].lemma_
    print(f"complex_word_lemma for complex word '{complex_word}': {complex_word_lemma}\n")


    ## remove duplicates and inflected forms of the complex word from the list with substitutes
    substitutes_no_dupl_complex_word = []
    for substitute in substitutes_no_dupl:
        doc_substitute = nlp(substitute)
        substitute_lemma = doc_substitute[0].lemma_
        if substitute_lemma != complex_word_lemma:
            substitutes_no_dupl_complex_word.append(substitute)
    print(f"SS step: b) substitute list without duplicates and inflected forms of the complex word: {substitutes_no_dupl_complex_word}\n")

    # c) remove antonyms of the complex word from the substitute list
    ## collect the antonyms listed in WordNet for the first synset of the complex word's lemma
    antonyms = set()
    synsets = wn.synsets(complex_word_lemma)
    if synsets:
        for lemma in synsets[0].lemmas():
            antonyms.update(antonym.name().lower() for antonym in lemma.antonyms())

    ## keep only the substitutes whose surface form and lemma are not among these antonyms
    substitutes_no_dupl_complex_word_no_antonym = []
    for substitute in substitutes_no_dupl_complex_word:
        substitute_lemma = nlp(substitute)[0].lemma_
        if substitute in antonyms or substitute_lemma in antonyms:
            print(f"Antonym removed (lemma): {substitute_lemma}")
        else:
            substitutes_no_dupl_complex_word_no_antonym.append(substitute)
    print(f"SS step: c): substitute list without antonyms of the complex word: {substitutes_no_dupl_complex_word_no_antonym}\n")
    
    # d) rank the list of substitutes with FitBERT
    sentence_fitbert_masked = sentence_masked_word.replace("[MASK]", "***mask***")   # fitbert uses ***mask*** instead of [MASK] or <mask> 
    sentences_concat_fitbert = f"{sentence} {tokenizer.sep_token} {sentence_fitbert_masked}"
    
    ranked_substitutes = fb_model.rank(sentences_concat_fitbert, substitutes_no_dupl_complex_word_no_antonym)
    print(f"SS step: d) ranked substitutes using FitBert: {ranked_substitutes}\n")
    
    print('-----------------------------------------------------------------------------------------')
    print()
    
    
    # limit the substitutes to the 10 first ones for evaluation
    top_10_substitutes = ranked_substitutes[:10]
    
    # add the sentence, complex_word, and substitutes to the dataframe 
    substitutes_df.loc[index] = [sentence, complex_word] + top_10_substitutes
    
# remove the #34-3 and #35-14 character combinations from the sentences in the dataframe
# (done once after the loop; regex=False makes the literal match explicit)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#34-3 \"", "", regex=False)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#35-14 ", "", regex=False)
    
    

# export the dataframe to a tsv file
substitutes_df.to_csv("./predictions/trial/BertBase_SG_SS_abc_fb.tsv", sep="\t", index=False, header=False)


device: cpu
using custom model: ['BertForMaskedLM']
Sentence: A Spanish government source, however, later said that banks able to cover by themselves losses on their toxic property assets will not be forced to remove them from their books while it will be compulsory for those receiving public help.
Complex word: compulsory
SG step: generated substitutes: ['compulsory', 'mandatory', 'obligatory', 'optional', 'required', 'necessary', 'standard', 'voluntary', 'customary', 'impossible', 'easier', 'only', 'illegal', 'sufficient', 'unnecessary', 'easy', 'normal', 'permitted', 'mandated', 'difficult', 'simple', 'appropriate', 'expensive', 'possible', 'commonplace', 'essential', 'proper', 'available', 'enough', 'affordable']

SS step: a) substitute list without duplicates: ['compulsory', 'mandatory', 'obligatory', 'optional', 'required', 'necessary', 'standard', 'voluntary', 'customary', 'impossible', 'easier', 'only', 'illegal', 'sufficient', 'unnecessary', 'easy', 'normal', 'permitted', 'man

python tsar_eval.py --gold_file .\gold_trial.tsv --predictions_file ./predictions/trial/BertBase_SG_SS_abc_fb.tsv --output_file .\output

Result: only MAP@5 and MAP@10 improved slightly, probably because substitutes starting with ## (WordPiece subword fragments) were ranked at the end of the list, which seems to be the only thing that FitBERT did.

#### Substitute Generation with BERT-base, and Substitute Selection steps a-c, and the resulting list with contextualized embeddings
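
The ranking idea can be illustrated without loading a model: each candidate sentence gets an embedding, and candidates are sorted by cosine similarity to the embedding of the original sentence. The sketch below uses made-up 3-dimensional vectors in place of BERT embeddings; the function names and the toy values are assumptions for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(reference_vector, candidates):
    """`candidates` maps each substitute to the embedding of the sentence containing it."""
    scored = [(word, cosine_similarity(reference_vector, vec)) for word, vec in candidates.items()]
    return [word for word, _ in sorted(scored, key=lambda pair: pair[1], reverse=True)]


# toy vectors standing in for sentence embeddings (made-up values)
reference = [1.0, 0.0, 1.0]
toy_candidates = {
    "mandatory": [0.9, 0.1, 1.0],
    "optional": [0.2, 1.0, 0.1],
    "required": [1.0, 0.5, 0.6],
}
ranking = rank_by_similarity(reference, toy_candidates)
print(ranking)
```

The cell below does the same with the `[CLS]` embedding of each sentence, L2-normalized so that the inner product equals the cosine similarity.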

In [9]:
from transformers import TFAutoModel
import tensorflow as tf
import numpy as np

In [10]:
# Calculate the similarity between the original sentence and each sentence in which the complex word
# is replaced by a candidate substitute from the SG step.
# (The list of sentences with the substitutes filled in is built in the loop below; its print statement is commented out to keep the output readable.)

# load the tokenizer and the TF model once, instead of on every call
embedding_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tf_model = TFAutoModel.from_pretrained("bert-base-uncased")


def calculate_similarity_scores(sentence, sentence_with_substitutes):
    def embed_text(text):
        tokens = embedding_tokenizer(text, padding=True, truncation=True, return_tensors="tf")
        outputs = tf_model(**tokens)
        # use the [CLS] token embedding as the sentence representation
        embeddings = outputs.last_hidden_state[:, 0, :]
        embeddings = tf.nn.l2_normalize(embeddings, axis=1)
        return embeddings

    original_sentence_embedding = embed_text(sentence)
    substitute_sentence_embeddings = embed_text(sentence_with_substitutes)

    # the embeddings are L2-normalized, so their inner product equals the cosine similarity
    cosine_similarity = np.inner(original_sentence_embedding, substitute_sentence_embeddings)
    similarity_scores = cosine_similarity[0]

    return similarity_scores



# in each row, for each complex word: 
for index, row in data.iterrows():
       
    # 1. Substitute Generation (SG): perform masking and generate substitutes:
    
    ## print the sentence and the complex word
    sentence, complex_word = row["sentence"], row["complex_word"]
    print(f"Sentence: {sentence}")
    print(f"Complex word: {complex_word}")

    ## in the sentence, replace the complex word with a masked word
    sentence_masked_word = sentence.replace(complex_word, "[MASK]")

    ## concatenate the original sentence and the masked sentence
    tokenizer = fill_mask.tokenizer
    sentences_concat = f"{sentence} {tokenizer.sep_token} {sentence_masked_word}"

    ## generate and rank candidate substitutes for the masked word using the fill_mask pipeline
    top_k = 30
    result = fill_mask(sentences_concat, top_k=top_k)
   
    ## lowercase and print the top-k substitutes
    substitutes = [substitute["token_str"].lower() for substitute in result]
    print(f"SG step: generated substitutes: {substitutes}\n")
    
    # 2. Substitute Selection (SS):   
    
    # create a punctuation set without hyphen, in order to retain hyphens in compound substitutes
    punctuation_without_hyphen = set(string.punctuation) - set('-')
    
    # a) remove duplicates and substitutes containing unwanted punctuation from the substitute list
    substitutes_no_dupl = []
    for sub in substitutes:
        if sub not in substitutes_no_dupl and not any(char in punctuation_without_hyphen for char in sub):
            substitutes_no_dupl.append(sub)
    print(f"SS step: a) substitute list without duplicates and undesired punctuation: {substitutes_no_dupl}\n")
    
   
    # b) remove duplicates and inflected forms of the complex word from the substitute list
    ## lemmatize the complex word with spaCy, so that its lemma can be compared with the lemma of each substitute
    doc_complex_word = nlp(complex_word)
    complex_word_lemma = doc_complex_word[0].lemma_
    print(f"complex_word_lemma for complex word '{complex_word}': {complex_word_lemma}\n")


    ## remove duplicates and inflected forms of the complex word from the list with substitutes
    substitutes_no_dupl_complex_word = []
    for substitute in substitutes_no_dupl:
        doc_substitute = nlp(substitute)
        substitute_lemma = doc_substitute[0].lemma_
        if substitute_lemma != complex_word_lemma:
            substitutes_no_dupl_complex_word.append(substitute)
    print(f"SS step: b) substitute list without duplicates and inflected forms of the complex word: {substitutes_no_dupl_complex_word}\n")

    # c) remove antonyms of the complex word from the substitute list
    ## collect the antonyms listed in WordNet for the first synset of the complex word's lemma
    antonyms = set()
    synsets = wn.synsets(complex_word_lemma)
    if synsets:
        for lemma in synsets[0].lemmas():
            antonyms.update(antonym.name().lower() for antonym in lemma.antonyms())

    ## keep only the substitutes whose surface form and lemma are not among these antonyms
    substitutes_no_dupl_complex_word_no_antonym = []
    for substitute in substitutes_no_dupl_complex_word:
        substitute_lemma = nlp(substitute)[0].lemma_
        if substitute in antonyms or substitute_lemma in antonyms:
            print(f"Antonym removed (lemma): {substitute_lemma}")
        else:
            substitutes_no_dupl_complex_word_no_antonym.append(substitute)
    print(f"SS step: c): substitute list without antonyms of the complex word: {substitutes_no_dupl_complex_word_no_antonym}\n")
    
    
    # create sentence with the complex word replaced by the substitutes
    sentence_with_substitutes = [sentence.replace(complex_word, sub) for sub in substitutes_no_dupl_complex_word_no_antonym]
    #print(f"List with sentences where complex word is substituted: {sentence_with_substitutes}\n")
    
    
    # d) calculate cosine similarity scores, and rank the substitutes based on their similarity score
    similarity_scores = calculate_similarity_scores(sentence, sentence_with_substitutes)
    #print(f"Similarity scores: {similarity_scores}\n")
    ranked_substitutes_withscores = sorted(zip(substitutes_no_dupl_complex_word_no_antonym, similarity_scores), key=lambda x: x[1], reverse=True)
    #print(f"SS step d) Ranked substitutes, including similarity scores in context: {ranked_substitutes}\n")
    ranked_substitutes = [substitute for substitute, score in ranked_substitutes_withscores]
    print(f"SS step d) Ranked substitutes, based on cosine similarity scores in context: {ranked_substitutes}\n")
        
    print('-----------------------------------------------------------------------------------------')
    print()
    
       
    
    
    # limit the substitutes to the 10 first ones for evaluation
    top_10_substitutes = ranked_substitutes[:10]
    
    # add the sentence, complex_word, and substitutes to the dataframe 
    substitutes_df.loc[index] = [sentence, complex_word] + top_10_substitutes
    
# remove the #34-3 and #35-14 character combinations from the sentences in the dataframe
# (done once after the loop; regex=False makes the literal match explicit)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#34-3 \"", "", regex=False)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#35-14 ", "", regex=False)
    
    

# export the dataframe to a tsv file
substitutes_df.to_csv("./predictions/trial/BertBase_SG_SS_abc_ce.tsv", sep="\t", index=False, header=False)

Sentence: A Spanish government source, however, later said that banks able to cover by themselves losses on their toxic property assets will not be forced to remove them from their books while it will be compulsory for those receiving public help.
Complex word: compulsory
SG step: generated substitutes: ['compulsory', 'mandatory', 'obligatory', 'optional', 'required', 'necessary', 'standard', 'voluntary', 'customary', 'impossible', 'easier', 'only', 'illegal', 'sufficient', 'unnecessary', 'easy', 'normal', 'permitted', 'mandated', 'difficult', 'simple', 'appropriate', 'expensive', 'possible', 'commonplace', 'essential', 'proper', 'available', 'enough', 'affordable']

SS step: a) substitute list without duplicates: ['compulsory', 'mandatory', 'obligatory', 'optional', 'required', 'necessary', 'standard', 'voluntary', 'customary', 'impossible', 'easier', 'only', 'illegal', 'sufficient', 'unnecessary', 'easy', 'normal', 'permitted', 'mandated', 'difficult', 'simple', 'appropriate', 'expen

Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertModel: ['mlm___cls', 'nsp___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


SS step d) Ranked substitutes, based on cosine similarity scores in context: ['mandatory', 'obligatory', 'voluntary', 'permitted', 'mandated', 'illegal', 'required', 'optional', 'unnecessary', 'proper', 'customary', 'expensive', 'possible', 'necessary', 'standard', 'impossible', 'normal', 'easier', 'essential', 'difficult', 'only', 'appropriate', 'easy', 'sufficient', 'simple', 'enough', 'commonplace', 'affordable', 'available']

-----------------------------------------------------------------------------------------

Sentence: Rajoy's conservative government had instilled markets with a brief dose of confidence by stepping into Bankia, performing a U-turn on its refusal to spend public money to rescue banks.
Complex word: instilled
SG step: generated substitutes: ['strengthened', 'targeted', 'tested', 'provided', 'created', 'affected', 'addressed', 'attacked', 'punished', 'bred', 'rewarded', 'empowered', 'shocked', 'introduced', 'silenced', 'improved', 'set', 'defeated', 'treated', '

SS step d) Ranked substitutes, based on cosine similarity scores in context: ['infused', 'prepared', 'hardened', 'silenced', 'reinforced', 'shocked', 'established', 'set', 'secured', 'created', 'delivered', 'rewarded', 'surprised', 'treated', 'bred', 'empowered', 'hit', 'tested', 'addressed', 'attacked', 'introduced', 'released', 'targeted', 'strengthened', 'provided', 'defeated', 'challenged', 'affected', 'punished', 'improved']

-----------------------------------------------------------------------------------------

Sentence: #34-3 "War maniacs of the South Korean puppet military made another grave provocation to the DPRK in the central western sector of the front on Thursday afternoon.
Complex word: maniacs
SG step: generated substitutes: ['criminals', 'crimes', 'machines', '"', 'rats', 'pigs', 'heroes', 'demons', 'robots', 'machine', 'games', 'doctors', 'dogs', 'priests', 'bandits', 'lords', 'monsters', 'soldiers', '##games', 'leaders', 'freaks', 'ants', 'saints', 'people', '##is

SS step d) Ranked substitutes, based on cosine similarity scores in context: ['rats', 'monsters', 'lords', 'demons', 'freaks', 'zombies', 'robots', 'bandits', 'doctors', 'ants', 'pigs', 'priests', 'criminals', 'saints', 'dogs', 'terror', 'leaders', 'people', 'heroes', 'machines', 'soldiers', 'workers', 'elephants', 'fighters', 'machine', 'games', 'crimes', '##games', '##ists', '"']

-----------------------------------------------------------------------------------------

Sentence: The daily death toll in Syria has declined as the number of observers has risen, but few experts expect the U.N. plan to succeed in its entirety.
Complex word: observers
SG step: generated substitutes: ['observers', 'witnesses', 'participants', 'casualties', 'refugees', 'observer', 'visitors', 'monitors', 'victims', 'delegates', 'travelers', 'experts', 'troops', 'inspectors', 'allies', 'diplomats', 'inspections', 'soldiers', 'fatalities', 'workers', 'pilgrims', 'people', 'officials', 'deaths', 'tourists', 's

SS step d) Ranked substitutes, based on cosine similarity scores in context: ['monitors', 'officials', 'casualties', 'travelers', 'people', 'tourists', 'citizens', 'troops', 'diplomats', 'witnesses', 'victims', 'civilians', 'survivors', 'experts', 'soldiers', 'deaths', 'refugees', 'fatalities', 'visitors', 'inspections', 'volunteers', 'participants', 'pilgrims', 'combatants', 'workers', 'inspectors', 'delegates', 'allies']

-----------------------------------------------------------------------------------------

Sentence: An amateur video showed a young girl who apparently suffered shrapnel wounds in her thigh undergoing treatment in a makeshift Rastan hospital while screaming in pain.
Complex word: shrapnel
SG step: generated substitutes: ['stab', 'gunshot', 'bullet', 'knife', 'multiple', 'severe', 'arrow', 'several', 'minor', 'stabbing', 'laser', 'open', 'two', 'rifle', 'serious', 'blunt', 'similar', 'blast', 'numerous', 'penetrating', 'three', 'sword', 'some', 'slash', 'shooting', 

SS step d) Ranked substitutes, based on cosine similarity scores in context: ['arrow', 'bullet', 'blast', 'multiple', 'penetrating', 'three', 'several', 'numerous', 'severe', 'minor', 'two', 'deep', 'serious', 'gunshot', 'blunt', 'stab', 'slash', 'fatal', 'rifle', 'some', 'open', 'shooting', 'painful', 'knife', 'sword', 'the', 'stabbing', 'laser', 'horrible', 'similar']

-----------------------------------------------------------------------------------------

Sentence: A local witness said a separate group of attackers disguised in burqas — the head-to-toe robes worn by conservative Afghan women — then tried to storm the compound.
Complex word: disguised
SG step: generated substitutes: ['disguised', 'dressed', 'clad', 'masked', 'clothed', 'dressing', 'disguise', 'concealed', 'cloak', 'posing', 'appeared', 'dress', 'posed', 'guise', 'costume', 'attire', 'appearing', 'dresses', 'hiding', 'hooded', 'covered', 'draped', 'costumes', ',', 'hidden', 'wrapped', 'undercover', 'lured', 'present



SS step d) Ranked substitutes, based on cosine similarity scores in context: ['masked', 'hooded', 'clad', 'clothed', 'dressed', 'adorned', 'covered', 'concealed', 'posing', 'dressing', 'draped', 'wrapped', 'hidden', 'hiding', ',', 'posed', 'disguise', 'appearing', 'undercover', 'lured', 'appeared', 'present', 'guise', 'cloak', 'costumes', 'costume', 'attire', 'dress', 'dresses']

-----------------------------------------------------------------------------------------

Sentence: Syria's Sunni majority is at the forefront of the uprising against Assad, whose minority Alawite sect is an offshoot of Shi'ite Islam.
Complex word: offshoot
SG step: generated substitutes: ['affiliate', 'extension', 'arm', 'adaptation', 'incarnation', 'ally', 'opponent', 'attack', 'expansion', 'outpost', 'associate', 'advocate', 'enemy', 'aspect', 'origin', 'branch', 'adversary', 'independent', 'alliance', 'ancestor', 'iteration', 'evolution', 'independently', 'antagonist', 'offspring', 'overthrow', 'imitation



SS step d) Ranked substitutes, based on cosine similarity scores in context: ['branch', 'arm', 'affiliate', 'extension', 'associate', 'offspring', 'adversary', 'organization', 'outpost', 'aspect', 'independent', 'iteration', 'advocate', 'incarnation', 'opponent', 'antagonist', 'independently', 'evolution', 'implementation', 'adaptation', 'alliance', 'expansion', 'ancestor', 'origin', 'ally', 'enemy', 'imitation', 'overthrow', 'example', 'attack']

-----------------------------------------------------------------------------------------

Sentence: Although not as rare in the symphonic literature as sharper keys , examples of symphonies in A major are not as numerous as for D major or G major .
Complex word: symphonic
SG step: generated substitutes: ['symphonic', 'orchestral', 'symphony', 'musical', 'classical', 'operatic', 'philharmonic', 'melodic', 'choral', 'music', 'orchestra', 'instrumental', 'concert', 'historical', 'popular', 'concerto', 'harmonic', 'symphonies', 'technical', 'tro



SS step d) Ranked substitutes, based on cosine similarity scores in context: ['orchestral', 'harmonic', 'composers', 'vocal', 'concert', 'popular', 'melodic', 'technical', 'orchestra', 'symphony', 'classical', 'philharmonic', 'traditional', 'musical', 'general', 'thematic', 'instrumental', 'dynamic', 'dramatic', 'historical', 'choral', 'symphonies', 'lyric', 'music', 'operatic', 'concerto', 'liturgical', 'trombone', 'romantic']

-----------------------------------------------------------------------------------------

Sentence: That prompted the military to deploy its largest warship, the BRP Gregorio del Pilar, which was recently acquired from the United States.
Complex word: deploy
SG step: generated substitutes: ['deploy', 'deployed', 'withdraw', 'activate', 'deployment', 'send', 'maintain', 'dispatch', 'use', 'acquire', 'reserve', 'retain', 'move', 'utilize', 'request', 'return', 'include', 'add', 'launch', 'designate', 'fire', 'operate', 'provide', 'establish', 'employ', 'outfit',



SS step d) Ranked substitutes, based on cosine similarity scores in context: ['deployment', 'activate', 'utilize', 'dispatch', 'operate', 'expand', 'include', 'maintain', 'add', 'employ', 'launch', 'use', 'return', 'move', 'provide', 'acquire', 'retain', 'outfit', 'send', 'request', 'build', 'purchase', 'establish', 'reserve', 'retire', 'fire', 'withdraw', 'designate']

-----------------------------------------------------------------------------------------

Sentence: #35-14 UK police were expressly forbidden, at a ministerial level, to provide any assistance to Thai authorities as the case involves the death penalty.
Complex word: authorities
SG step: generated substitutes: ['authorities', 'officials', 'police', '##s', 'citizens', 'forces', 'government', 'officers', 'people', 'governments', 'residents', 'troops', 'authority', 'courts', 'individuals', 'persons', 'organisations', 'policemen', 'civilians', 'nationals', 'protesters', 'institutions', 'interests', 'agencies', 'subjects', '



SS step d) Ranked substitutes, based on cosine similarity scores in context: ['officials', 'personnel', 'citizens', 'agencies', 'nationals', 'individuals', 'police', 'persons', 'bodies', 'residents', 'officers', 'courts', 'civilians', 'people', 'interests', 'organisations', 'government', 'institutions', 'governments', 'subjects', 'policemen', 'forces', 'magistrates', 'troops', 'politicians', 'activists', 'protesters', '##s']

-----------------------------------------------------------------------------------------



python tsar_eval.py --gold_file .\gold_trial.tsv --predictions_file ./predictions/trial/BertBase_SG_SS_abc_ce.tsv --output_file .\output

In [11]:
# Best output so far on MAP@1 and on Potential.
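
For orientation, the two metrics mentioned here can be sketched as follows. This is a minimal illustration and not the logic of `tsar_eval.py`: it assumes that MAP@1 is the share of instances whose top-ranked substitute appears in the gold list, and that Potential@k is the share of instances with at least one gold substitute among the top k predictions. The function names and the toy data are hypothetical.

```python
from typing import List


def map_at_1(predictions: List[List[str]], gold: List[List[str]]) -> float:
    """Share of instances whose first prediction is a gold substitute (assumed definition)."""
    hits = sum(
        1 for preds, golds in zip(predictions, gold)
        if preds and preds[0] in set(golds)
    )
    return hits / len(predictions)


def potential_at_k(predictions: List[List[str]], gold: List[List[str]], k: int = 10) -> float:
    """Share of instances with at least one gold substitute in the top k (assumed definition)."""
    hits = sum(
        1 for preds, golds in zip(predictions, gold)
        if any(pred in set(golds) for pred in preds[:k])
    )
    return hits / len(predictions)


# toy data (hypothetical, not taken from the trial gold file)
toy_predictions = [["mandatory", "required"], ["masked", "hooded"]]
toy_gold = [["mandatory", "required", "obligatory"], ["dressed", "hidden"]]

print(map_at_1(toy_predictions, toy_gold))
print(potential_at_k(toy_predictions, toy_gold))
```

Under these assumed definitions, a system can score the same on MAP@1 as another system while differing on Potential, because Potential looks at the whole top-k list rather than only the first item.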

#### Substitute Generation with BERT-base, Substitute Selection steps a-c, and ranking of the resulting list with BERTScore
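
Selection step a) in the cell below drops duplicate substitutes and substitutes that contain punctuation other than a hyphen. The following stand-alone sketch isolates that filter so it can be tried without loading a model. The function name `filter_substitutes` and the toy input are hypothetical; the filtering rule itself mirrors the loop in the cell.

```python
import string


def filter_substitutes(substitutes):
    """Drop duplicates and tokens containing punctuation other than '-', keeping order."""
    punctuation_without_hyphen = set(string.punctuation) - {"-"}
    seen = set()
    kept = []
    for sub in substitutes:
        if sub in seen:
            continue  # duplicate of an earlier substitute
        if any(char in punctuation_without_hyphen for char in sub):
            continue  # e.g. word-piece fragments such as '##s' or bare punctuation
        seen.add(sub)
        kept.append(sub)
    return kept


# toy input (hypothetical), modelled on tokens that appear in the printed outputs
raw = ["officials", "##s", ",", "head-to-toe", "officials", '"']
print(filter_substitutes(raw))
```

Because `#` is part of `string.punctuation`, word-piece fragments such as `##s` are removed by the same rule that removes commas and quotation marks.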

In [14]:
import bert_score
from bert_score import score

In [15]:


# in each row, for each complex word: 
for index, row in data.iterrows():
       
    # 1. Substitute Generation (SG): perform masking and generate substitutes:
    
    ## print the sentence and the complex word
    sentence, complex_word = row["sentence"], row["complex_word"]
    print(f"Sentence: {sentence}")
    print(f"Complex word: {complex_word}")

    ## in the sentence, replace the complex word with a masked word
    sentence_masked_word = sentence.replace(complex_word, "[MASK]")

    ## concatenate the original sentence and the masked sentence
    tokenizer = fill_mask.tokenizer
    sentences_concat = f"{sentence} {tokenizer.sep_token} {sentence_masked_word}"

    ## generate and rank candidate substitutes for the masked word using the fill_mask pipeline
    top_k = 30
    result = fill_mask(sentences_concat, top_k=top_k)
   
    ## lowercase and print the top-k substitutes
    substitutes = [substitute["token_str"].lower() for substitute in result]
    print(f"SG step: generated substitutes: {substitutes}\n")
    
    # 2. Substitute Selection (SS):   
    
    # create a punctuation set without the hyphen, in order to retain hyphens in compound substitutes
    punctuation_without_hyphen = set(string.punctuation) - set('-')
    
    # a) remove duplicates and substitutes containing unwanted punctuation from the substitute list
    substitutes_no_dupl = []
    for sub in substitutes:
        if sub not in substitutes_no_dupl and not any(char in punctuation_without_hyphen for char in sub):
            substitutes_no_dupl.append(sub)
    print(f"SS step: a) substitute list without duplicates and undesired punctuation: {substitutes_no_dupl}\n")
    
   
    # b) remove duplicates and inflected forms of the complex word from the substitute list
    ## lemmatize the complex word with spaCy, so that its lemma can be compared with the lemma of each substitute
    doc_complex_word = nlp(complex_word)
    complex_word_lemma = doc_complex_word[0].lemma_
    print(f"complex_word_lemma for complex word '{complex_word}': {complex_word_lemma}\n")


    ## remove duplicates and inflected forms of the complex word from the list with substitutes
    substitutes_no_dupl_complex_word = []
    for substitute in substitutes_no_dupl:
        doc_substitute = nlp(substitute)
        substitute_lemma = doc_substitute[0].lemma_
        if substitute_lemma != complex_word_lemma:
            substitutes_no_dupl_complex_word.append(substitute)
    print(f"SS step: b) substitute list without duplicates and inflected forms of the complex word: {substitutes_no_dupl_complex_word}\n")

    # c) remove antonyms of the complex word from the substitute list
    ## collect the antonyms listed in WordNet for the first synset of the complex word
    antonyms = set()
    synsets = wn.synsets(complex_word_lemma)
    if synsets:
        for lemma in synsets[0].lemmas():
            antonyms.update(antonym.name().lower() for antonym in lemma.antonyms())

    ## keep a substitute only if its own lemma is not among those antonyms
    substitutes_no_dupl_complex_word_no_antonym = []
    for substitute in substitutes_no_dupl_complex_word:
        substitute_lemma = nlp(substitute)[0].lemma_
        if substitute_lemma in antonyms:
            print(f"Antonym removed (lemma): {substitute_lemma}")
        else:
            substitutes_no_dupl_complex_word_no_antonym.append(substitute)
    print(f"SS step: c): substitute list without antonyms of the complex word: {substitutes_no_dupl_complex_word_no_antonym}\n")
    
    
    # create sentences with the complex word replaced by the substitutes
    sentences_with_substitutes = [sentence.replace(complex_word, sub) for sub in substitutes_no_dupl_complex_word_no_antonym]
    #print(f"SG step: sentences with substitutes: {sentences_with_substitutes}\n")
    
          
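    # d) use BERTScore for sorting
    ## bert_score.score returns a (precision, recall, F1) tuple of tensors; scores[0] below is the precision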
    scores = bert_score.score([sentence]*len(sentences_with_substitutes), sentences_with_substitutes, lang="en", model_type='bert-base-uncased', verbose=False)
    ranked_substitutes = [substitute for _, substitute in sorted(zip(scores[0].tolist(), substitutes_no_dupl_complex_word_no_antonym), reverse=True)]
    print(f"SS step: d) substitute list sorted by descending BERTScore: {ranked_substitutes}\n")

    
    print('-----------------------------------------------------------------------------------------')
    print()
    
    
    # limit the substitutes to the 10 first ones for evaluation
    top_10_substitutes = ranked_substitutes[:10]
    
    # add the sentence, complex_word, and substitutes to the dataframe 
    substitutes_df.loc[index] = [sentence, complex_word] + top_10_substitutes
    
# remove the #34-3 and #35-14 character combinations from the sentences in the dataframe (once, after the loop)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#34-3 \"", "", regex=False)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#35-14 ", "", regex=False)
    
    

# export the dataframe to a tsv file
substitutes_df.to_csv("./predictions/trial/BertBase_SG_SS_abc_bs.tsv", sep="\t", index=False, header=False)

device: cpu
using custom model: ['BertForMaskedLM']
Sentence: A Spanish government source, however, later said that banks able to cover by themselves losses on their toxic property assets will not be forced to remove them from their books while it will be compulsory for those receiving public help.
Complex word: compulsory
SG step: generated substitutes: ['compulsory', 'mandatory', 'obligatory', 'optional', 'required', 'necessary', 'standard', 'voluntary', 'customary', 'impossible', 'easier', 'only', 'illegal', 'sufficient', 'unnecessary', 'easy', 'normal', 'permitted', 'mandated', 'difficult', 'simple', 'appropriate', 'expensive', 'possible', 'commonplace', 'essential', 'proper', 'available', 'enough', 'affordable']

SS step: a) substitute list without duplicates: ['compulsory', 'mandatory', 'obligatory', 'optional', 'required', 'necessary', 'standard', 'voluntary', 'customary', 'impossible', 'easier', 'only', 'illegal', 'sufficient', 'unnecessary', 'easy', 'normal', 'permitted', 'man

Some weights of the model checkpoint at roberta-large were not used when initializing RobertaModel: ['lm_head.dense.bias', 'lm_head.decoder.weight', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.bias', 'lm_head.dense.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


SS step: d) substitute list sorted by descending BERTScore: ['mandatory', 'required', 'mandated', 'necessary', 'essential', 'voluntary', 'optional', 'permitted', 'appropriate', 'illegal', 'standard', 'unnecessary', 'simple', 'obligatory', 'easier', 'normal', 'expensive', 'sufficient', 'easy', 'affordable', 'possible', 'only', 'enough', 'available', 'commonplace', 'impossible', 'customary', 'difficult', 'proper']

-----------------------------------------------------------------------------------------

Sentence: Rajoy's conservative government had instilled markets with a brief dose of confidence by stepping into Bankia, performing a U-turn on its refusal to spend public money to rescue banks.
Complex word: instilled
SG step: generated substitutes: ['strengthened', 'targeted', 'tested', 'provided', 'created', 'affected', 'addressed', 'attacked', 'punished', 'bred', 'rewarded', 'empowered', 'shocked', 'introduced', 'silenced', 'improved', 'set', 'defeated', 'treated', 'reinforced', 'pre



SS step: d) substitute list sorted by descending BERTScore: ['infused', 'provided', 'empowered', 'treated', 'hit', 'strengthened', 'rewarded', 'reinforced', 'affected', 'challenged', 'set', 'tested', 'surprised', 'created', 'attacked', 'shocked', 'prepared', 'established', 'delivered', 'improved', 'secured', 'targeted', 'silenced', 'punished', 'addressed', 'introduced', 'released', 'hardened', 'defeated', 'bred']

-----------------------------------------------------------------------------------------

Sentence: #34-3 "War maniacs of the South Korean puppet military made another grave provocation to the DPRK in the central western sector of the front on Thursday afternoon.
Complex word: maniacs
SG step: generated substitutes: ['criminals', 'crimes', 'machines', '"', 'rats', 'pigs', 'heroes', 'demons', 'robots', 'machine', 'games', 'doctors', 'dogs', 'priests', 'bandits', 'lords', 'monsters', 'soldiers', '##games', 'leaders', 'freaks', 'ants', 'saints', 'people', '##ists', 'workers', '



SS step: d) substitute list sorted by descending BERTScore: ['freaks', 'criminals', 'lords', 'machines', 'soldiers', 'dogs', 'heroes', 'machine', 'fighters', 'games', 'crimes', 'workers', 'leaders', 'demons', 'terror', 'monsters', '##ists', 'elephants', 'rats', 'doctors', 'people', 'ants', 'bandits', 'robots', 'saints', 'priests', 'zombies', 'pigs', '##games', '"']

-----------------------------------------------------------------------------------------

Sentence: The daily death toll in Syria has declined as the number of observers has risen, but few experts expect the U.N. plan to succeed in its entirety.
Complex word: observers
SG step: generated substitutes: ['observers', 'witnesses', 'participants', 'casualties', 'refugees', 'observer', 'visitors', 'monitors', 'victims', 'delegates', 'travelers', 'experts', 'troops', 'inspectors', 'allies', 'diplomats', 'inspections', 'soldiers', 'fatalities', 'workers', 'pilgrims', 'people', 'officials', 'deaths', 'tourists', 'survivors', 'volun



SS step: d) substitute list sorted by descending BERTScore: ['monitors', 'inspectors', 'diplomats', 'experts', 'inspections', 'witnesses', 'allies', 'officials', 'delegates', 'workers', 'volunteers', 'participants', 'visitors', 'tourists', 'pilgrims', 'travelers', 'troops', 'soldiers', 'survivors', 'combatants', 'citizens', 'refugees', 'civilians', 'people', 'casualties', 'victims', 'fatalities', 'deaths']

-----------------------------------------------------------------------------------------

Sentence: An amateur video showed a young girl who apparently suffered shrapnel wounds in her thigh undergoing treatment in a makeshift Rastan hospital while screaming in pain.
Complex word: shrapnel
SG step: generated substitutes: ['stab', 'gunshot', 'bullet', 'knife', 'multiple', 'severe', 'arrow', 'several', 'minor', 'stabbing', 'laser', 'open', 'two', 'rifle', 'serious', 'blunt', 'similar', 'blast', 'numerous', 'penetrating', 'three', 'sword', 'some', 'slash', 'shooting', 'the', 'painful',



SS step: d) substitute list sorted by descending BERTScore: ['bullet', 'stab', 'gunshot', 'shooting', 'knife', 'multiple', 'stabbing', 'blunt', 'severe', 'deep', 'blast', 'several', 'numerous', 'minor', 'serious', 'two', 'open', 'three', 'slash', 'penetrating', 'some', 'rifle', 'fatal', 'arrow', 'painful', 'sword', 'horrible', 'similar', 'the', 'laser']

-----------------------------------------------------------------------------------------

Sentence: A local witness said a separate group of attackers disguised in burqas — the head-to-toe robes worn by conservative Afghan women — then tried to storm the compound.
Complex word: disguised
SG step: generated substitutes: ['disguised', 'dressed', 'clad', 'masked', 'clothed', 'dressing', 'disguise', 'concealed', 'cloak', 'posing', 'appeared', 'dress', 'posed', 'guise', 'costume', 'attire', 'appearing', 'dresses', 'hiding', 'hooded', 'covered', 'draped', 'costumes', ',', 'hidden', 'wrapped', 'undercover', 'lured', 'present', 'adorned']

SS



SS step: d) substitute list sorted by descending BERTScore: ['masked', 'concealed', 'clothed', 'posing', 'dressed', 'clad', 'adorned', 'wrapped', 'hidden', 'covered', 'dressing', 'hiding', 'draped', 'hooded', 'appearing', 'posed', 'present', 'dress', 'appeared', 'lured', 'disguise', 'undercover', 'dresses', 'cloak', ',', 'costumes', 'costume', 'guise', 'attire']

-----------------------------------------------------------------------------------------

Sentence: Syria's Sunni majority is at the forefront of the uprising against Assad, whose minority Alawite sect is an offshoot of Shi'ite Islam.
Complex word: offshoot
SG step: generated substitutes: ['affiliate', 'extension', 'arm', 'adaptation', 'incarnation', 'ally', 'opponent', 'attack', 'expansion', 'outpost', 'associate', 'advocate', 'enemy', 'aspect', 'origin', 'branch', 'adversary', 'independent', 'alliance', 'ancestor', 'iteration', 'evolution', 'independently', 'antagonist', 'offspring', 'overthrow', 'imitation', 'implementatio



SS step: d) substitute list sorted by descending BERTScore: ['extension', 'affiliate', 'arm', 'offspring', 'iteration', 'incarnation', 'outpost', 'adaptation', 'evolution', 'ally', 'aspect', 'branch', 'ancestor', 'imitation', 'expansion', 'adversary', 'opponent', 'origin', 'associate', 'antagonist', 'enemy', 'example', 'independent', 'alliance', 'advocate', 'implementation', 'organization', 'independently', 'attack', 'overthrow']

-----------------------------------------------------------------------------------------

Sentence: Although not as rare in the symphonic literature as sharper keys , examples of symphonies in A major are not as numerous as for D major or G major .
Complex word: symphonic
SG step: generated substitutes: ['symphonic', 'orchestral', 'symphony', 'musical', 'classical', 'operatic', 'philharmonic', 'melodic', 'choral', 'music', 'orchestra', 'instrumental', 'concert', 'historical', 'popular', 'concerto', 'harmonic', 'symphonies', 'technical', 'trombone', 'romantic



SS step: d) substitute list sorted by descending BERTScore: ['philharmonic', 'symphony', 'orchestral', 'melodic', 'operatic', 'concerto', 'thematic', 'symphonies', 'classical', 'musical', 'dramatic', 'choral', 'harmonic', 'dynamic', 'concert', 'liturgical', 'instrumental', 'music', 'general', 'popular', 'traditional', 'historical', 'lyric', 'romantic', 'orchestra', 'trombone', 'composers', 'technical', 'vocal']

-----------------------------------------------------------------------------------------

Sentence: That prompted the military to deploy its largest warship, the BRP Gregorio del Pilar, which was recently acquired from the United States.
Complex word: deploy
SG step: generated substitutes: ['deploy', 'deployed', 'withdraw', 'activate', 'deployment', 'send', 'maintain', 'dispatch', 'use', 'acquire', 'reserve', 'retain', 'move', 'utilize', 'request', 'return', 'include', 'add', 'launch', 'designate', 'fire', 'operate', 'provide', 'establish', 'employ', 'outfit', 'purchase', 'exp



SS step: d) substitute list sorted by descending BERTScore: ['dispatch', 'activate', 'utilize', 'employ', 'use', 'launch', 'operate', 'send', 'outfit', 'reserve', 'expand', 'add', 'establish', 'purchase', 'maintain', 'provide', 'acquire', 'fire', 'build', 'request', 'retain', 'move', 'designate', 'retire', 'include', 'withdraw', 'deployment', 'return']

-----------------------------------------------------------------------------------------

Sentence: #35-14 UK police were expressly forbidden, at a ministerial level, to provide any assistance to Thai authorities as the case involves the death penalty.
Complex word: authorities
SG step: generated substitutes: ['authorities', 'officials', 'police', '##s', 'citizens', 'forces', 'government', 'officers', 'people', 'governments', 'residents', 'troops', 'authority', 'courts', 'individuals', 'persons', 'organisations', 'policemen', 'civilians', 'nationals', 'protesters', 'institutions', 'interests', 'agencies', 'subjects', 'bodies', 'activis



SS step: d) substitute list sorted by descending BERTScore: ['police', 'officials', 'government', 'magistrates', 'policemen', 'courts', 'civilians', 'troops', 'citizens', 'residents', 'nationals', 'forces', 'activists', 'governments', 'politicians', 'personnel', 'officers', 'institutions', 'protesters', 'agencies', 'people', 'organisations', 'individuals', 'subjects', 'bodies', 'persons', 'interests', '##s']

-----------------------------------------------------------------------------------------



python tsar_eval.py --gold_file .\gold_trial.tsv --predictions_file ./predictions/trial/BertBase_SG_SS_abc_bs.tsv --output_file .\output

Results are comparable to those of the contextual-embeddings model based on regular BERT. MAP is better (only MAP@1 is the same); Potential and Accuracy differ.
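
The ranking in step d) and the cut to ten substitutes can be sketched with plain Python and made-up scores. The dataframe has ten substitute columns, so a row with fewer than ten surviving substitutes would likely not fit the assignment `[sentence, complex_word] + top_10_substitutes`. Padding with empty strings, as done here, is an assumption and not something the notebook does; the function name `rank_and_pad` and the toy scores are hypothetical.

```python
def rank_and_pad(substitutes, scores, k=10, pad=""):
    """Sort substitutes by descending score, keep the top k, and pad to exactly k entries."""
    ranked = [
        sub for _, sub in sorted(
            zip(scores, substitutes),
            key=lambda pair: pair[0],  # sort on the score only; ties keep their input order
            reverse=True,
        )
    ]
    top_k = ranked[:k]
    return top_k + [pad] * (k - len(top_k))


# toy scores (hypothetical, not BERTScore output)
toy_substitutes = ["send", "dispatch", "use"]
toy_scores = [0.91, 0.97, 0.91]
print(rank_and_pad(toy_substitutes, toy_scores, k=5))
```

Sorting on the score alone keeps tied substitutes in their generation order, whereas sorting the `(score, substitute)` pairs directly, as in the cell above, breaks ties by reverse alphabetical order of the substitute.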

### Bert-large

In [16]:
# initialize the tokenizer and the masked language model
tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")
lm_model = AutoModelForMaskedLM.from_pretrained("bert-large-uncased")

# create a fill-mask pipeline, reusing the tokenizer loaded above
fill_mask = pipeline("fill-mask", model=lm_model, tokenizer=tokenizer)


Some weights of the model checkpoint at bert-large-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


#### Only Substitute Generation with BERT-large

In [17]:


# in each row, for each complex word: 
for index, row in data.iterrows():
       
    # 1. Substitute Generation (SG): perform masking and generate substitutes:
    
    ## print the sentence and the complex word
    sentence, complex_word = row["sentence"], row["complex_word"]
    print(f"Sentence: {sentence}")
    print(f"Complex word: {complex_word}")

    ## in the sentence, replace the complex word with a masked word
    sentence_masked_word = sentence.replace(complex_word, "[MASK]")

    ## concatenate the original sentence and the masked sentence
    tokenizer = fill_mask.tokenizer
    sentences_concat = f"{sentence} {tokenizer.sep_token} {sentence_masked_word}"

    ## generate and rank candidate substitutes for the masked word using the fill_mask pipeline
    top_k = 10
    result = fill_mask(sentences_concat, top_k=top_k)
   
    ## lowercase and print the top-k substitutes
    substitutes = [substitute["token_str"].lower() for substitute in result]
    print(f"SG step: generated substitutes: {substitutes}\n")
    
    
    # add the sentence, complex_word, and substitutes to the dataframe
    substitutes_df.loc[index] = [sentence, complex_word] + substitutes

# remove the "#34-3" and "#35-14" prefixes from the sentences in the dataframe
# (done once after the loop instead of on every iteration)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#34-3 \"", "", regex=False)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#35-14 ", "", regex=False)

# export the dataframe to a tsv file
substitutes_df.to_csv("./predictions/trial/BertLarge_SG.tsv", sep="\t", index=False, header=False)

Sentence: A Spanish government source, however, later said that banks able to cover by themselves losses on their toxic property assets will not be forced to remove them from their books while it will be compulsory for those receiving public help.
Complex word: compulsory
SG step: generated substitutes: ['compulsory', 'mandatory', 'obligatory', 'required', 'voluntary', 'optional', 'mandated', 'necessary', 'forbidden', 'permitted']

Sentence: Rajoy's conservative government had instilled markets with a brief dose of confidence by stepping into Bankia, performing a U-turn on its refusal to spend public money to rescue banks.
Complex word: instilled
SG step: generated substitutes: ['provided', 'injected', 'infused', 'presented', 'left', 'supplied', 'delivered', 'endowed', 'impressed', 'fed']

Sentence: #34-3 "War maniacs of the South Korean puppet military made another grave provocation to the DPRK in the central western sector of the front on Thursday afternoon.
Complex word: maniacs
SG 

python tsar_eval.py --gold_file .\gold_trial.tsv --predictions_file ./predictions/trial/BertLarge_SG.tsv --output_file .\output
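
One caveat of the masking step in the cell above: `str.replace` substitutes every match of the string, so a complex word that occurs more than once, or inside a longer word, would produce extra `[MASK]` tokens. A minimal sketch of a stricter alternative, assuming only the first whole-word occurrence should be masked, is shown below. The helper name `build_concatenated_input` and its defaults are hypothetical and not part of the original pipeline.

```python
import re


def build_concatenated_input(sentence, complex_word, sep_token="[SEP]", mask_token="[MASK]"):
    """Return '<sentence> <sep> <sentence with the first whole-word occurrence masked>'."""
    # \b restricts the match to whole words; re.escape protects special characters
    pattern = r"\b" + re.escape(complex_word) + r"\b"
    # a function replacement avoids backslash interpretation in the mask token
    masked, n_replaced = re.subn(pattern, lambda _match: mask_token, sentence, count=1)
    if n_replaced == 0:
        raise ValueError(f"complex word {complex_word!r} not found in sentence")
    return f"{sentence} {sep_token} {masked}"


# toy sentence in which the complex word occurs twice
toy_input = build_concatenated_input("The observers observe the observers.", "observers")
```

The resulting string can be passed to `fill_mask` in the same way as `sentences_concat`.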

In [18]:
# Surprisingly, bert-base scores higher on MAP@1 than bert-large; on the other metrics the results are mixed. TODO: find out why and adapt the approach accordingly.

#### Substitute Generation with BERT-large, and Substitute Selection steps a-c

In [19]:

# in each row, for each complex word: 
for index, row in data.iterrows():
       
    # 1. Substitute Generation (SG): perform masking and generate substitutes:
    
    ## print the sentence and the complex word
    sentence, complex_word = row["sentence"], row["complex_word"]
    print(f"Sentence: {sentence}")
    print(f"Complex word: {complex_word}")

    ## in the sentence, replace the complex word with a masked word
    sentence_masked_word = sentence.replace(complex_word, "[MASK]")

    ## concatenate the original sentence and the masked sentence
    tokenizer = fill_mask.tokenizer
    sentences_concat = f"{sentence} {tokenizer.sep_token} {sentence_masked_word}"

    ## generate and rank candidate substitutes for the masked word using the fill_mask pipeline
    top_k = 30
    result = fill_mask(sentences_concat, top_k=top_k)
   
    ## lowercase and print the top-k substitutes
    substitutes = [substitute["token_str"].lower() for substitute in result]
    print(f"SG step: generated substitutes: {substitutes}\n")
    
    # 2. Substitute Selection (SS):   
    
    # create a punctuation set without hyphen, in order to retain hyphens in compound substitutes
    punctuation_without_hyphen = set(string.punctuation) - set('-')
    
    # a) remove duplicates and substitutes containing unwanted punctuation from the substitute list
    substitutes_no_dupl = []
    for sub in substitutes:
        if sub not in substitutes_no_dupl and not any(char in punctuation_without_hyphen for char in sub):
            substitutes_no_dupl.append(sub)
    print(f"SS step: a) substitute list without duplicates and undesired punctuation: {substitutes_no_dupl}\n")
    

   
    # b) remove duplicates and inflected forms of the complex word from the substitute list
    ## Lemmatize the complex word with spaCy, in order to compare it with the lemmatized substitute later to see if their mutual lemmas are the same
    doc_complex_word = nlp(complex_word)
    complex_word_lemma = doc_complex_word[0].lemma_
    print(f"complex_word_lemma for complex word '{complex_word}': {complex_word_lemma}\n")


    ## remove duplicates and inflected forms of the complex word from the list with substitutes
    substitutes_no_dupl_complex_word = []
    for substitute in substitutes_no_dupl:
        doc_substitute = nlp(substitute)
        substitute_lemma = doc_substitute[0].lemma_
        if substitute_lemma != complex_word_lemma:
            substitutes_no_dupl_complex_word.append(substitute)
    print(f"SS step: b) substitute list without duplicates and inflected forms of the complex word: {substitutes_no_dupl_complex_word}\n")

    # c) remove antonyms of the complex word from the substitute list
    ## collect the antonyms that WordNet lists for the first synset of the complex word
    synsets = wn.synsets(complex_word_lemma)
    antonyms = {antonym.name() for lemma in synsets[0].lemmas() for antonym in lemma.antonyms()} if synsets else set()

    ## keep only the substitutes whose lemma is not one of these antonyms
    substitutes_no_dupl_complex_word_no_antonym = []
    for substitute in substitutes_no_dupl_complex_word:
        # lemmatize the current substitute (instead of reusing the last lemma from step b)
        substitute_lemma = nlp(substitute)[0].lemma_
        if substitute_lemma in antonyms:
            print(f"Antonym removed (lemma): {substitute_lemma}")
        else:
            substitutes_no_dupl_complex_word_no_antonym.append(substitute)
    print(f"SS step: c): substitute list without antonyms of the complex word: {substitutes_no_dupl_complex_word_no_antonym}\n")
    
     
    
    # limit the substitutes to the 10 first ones for evaluation
    top_10_substitutes = substitutes_no_dupl_complex_word_no_antonym[:10]
    
    # add the sentence, complex_word, and substitutes to the dataframe
    substitutes_df.loc[index] = [sentence, complex_word] + top_10_substitutes

# remove the "#34-3" and "#35-14" prefixes from the sentences in the dataframe
# (done once after the loop instead of on every iteration)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#34-3 \"", "", regex=False)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#35-14 ", "", regex=False)

# export the dataframe to a tsv file
substitutes_df.to_csv("./predictions/trial/BertLarge_SG_SS_abc.tsv", sep="\t", index=False, header=False)
    

Sentence: A Spanish government source, however, later said that banks able to cover by themselves losses on their toxic property assets will not be forced to remove them from their books while it will be compulsory for those receiving public help.
Complex word: compulsory
SG step: generated substitutes: ['compulsory', 'mandatory', 'obligatory', 'required', 'voluntary', 'optional', 'mandated', 'necessary', 'forbidden', 'permitted', 'prescribed', '##rricular', 'illegal', 'available', 'beneficial', 'prohibited', 'scheduled', 'obliged', 'tertiary', 'secondary', 'statutory', 'legal', 'free', 'used', 'customary', 'primary', 'authorised', 'canonical', 'standard', 'lawful']

SS step: a) substitute list without duplicates: ['compulsory', 'mandatory', 'obligatory', 'required', 'voluntary', 'optional', 'mandated', 'necessary', 'forbidden', 'permitted', 'prescribed', '##rricular', 'illegal', 'available', 'beneficial', 'prohibited', 'scheduled', 'obliged', 'tertiary', 'secondary', 'statutory', 'leg

python tsar_eval.py --gold_file .\gold_trial.tsv --predictions_file ./predictions/trial/BertLarge_SG_SS_abc.tsv --output_file .\output

Slightly better results on all MAP metrics except MAP@1 (unchanged), and on Potential@5. The rest is unchanged.
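
Selection steps a-c can also be read as one filtering pass over the ranked candidate list. The sketch below condenses them into a single function so the logic can be checked without loading spaCy or WordNet: the lemmatizer is passed in as a callable and the antonyms as a plain set. The function name, its parameters, and the toy inputs are hypothetical, so this is an illustration of the filtering idea rather than the notebook's exact code.

```python
import string

# keep hyphens so that compound substitutes survive the punctuation filter
PUNCTUATION_WITHOUT_HYPHEN = set(string.punctuation) - {"-"}


def filter_substitutes(substitutes, complex_lemma, lemmatize, antonyms):
    """Apply steps a-c to a ranked list of substitutes and preserve the ranking."""
    kept = []
    for substitute in substitutes:
        # a) skip duplicates and candidates containing unwanted punctuation (e.g. '##s', ',')
        if substitute in kept or any(ch in PUNCTUATION_WITHOUT_HYPHEN for ch in substitute):
            continue
        lemma = lemmatize(substitute)
        # b) skip the complex word itself and its inflected forms
        if lemma == complex_lemma:
            continue
        # c) skip antonyms of the complex word
        if lemma in antonyms:
            continue
        kept.append(substitute)
    return kept


# toy example with an identity lemmatizer and a hand-written antonym set
toy_filtered = filter_substitutes(
    ["compulsory", "mandatory", "mandatory", "##rricular", "voluntary", "well-known", ","],
    complex_lemma="compulsory",
    lemmatize=lambda word: word,
    antonyms={"voluntary"},
)
```

In the notebook setting, `lemmatize` could wrap the spaCy call (`lambda w: nlp(w)[0].lemma_`) and `antonyms` could be built from the WordNet lemmas of the complex word.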

#### Substitute Generation with BERT-large, Substitute Selection steps a-c, and ranking of the resulting list with FitBERT

In [20]:

# instantiate a FitBert model
fb_model = FitBert(lm_model)



# in each row, for each complex word: 
for index, row in data.iterrows():
       
    # 1. Substitute Generation (SG): perform masking and generate substitutes:
    
    ## print the sentence and the complex word
    sentence, complex_word = row["sentence"], row["complex_word"]
    print(f"Sentence: {sentence}")
    print(f"Complex word: {complex_word}")

    ## in the sentence, replace the complex word with a masked word
    sentence_masked_word = sentence.replace(complex_word, "[MASK]")

    ## concatenate the original sentence and the masked sentence
    tokenizer = fill_mask.tokenizer
    sentences_concat = f"{sentence} {tokenizer.sep_token} {sentence_masked_word}"

    ## generate and rank candidate substitutes for the masked word using the fill_mask pipeline
    top_k = 30
    result = fill_mask(sentences_concat, top_k=top_k)
   
    ## lowercase and print the top-k substitutes
    substitutes = [substitute["token_str"].lower() for substitute in result]
    print(f"SG step: generated substitutes: {substitutes}\n")
    
    # 2. Substitute Selection (SS):   
    
    # create a punctuation set without hyphen, in order to retain hyphens in compound substitutes
    punctuation_without_hyphen = set(string.punctuation) - set('-')
    
    # a) remove duplicates and substitutes containing unwanted punctuation from the substitute list
    substitutes_no_dupl = []
    for sub in substitutes:
        if sub not in substitutes_no_dupl and not any(char in punctuation_without_hyphen for char in sub):
            substitutes_no_dupl.append(sub)
    print(f"SS step: a) substitute list without duplicates and undesired punctuation: {substitutes_no_dupl}\n")
    

   
    # b) remove duplicates and inflected forms of the complex word from the substitute list
    ## Lemmatize the complex word with spaCy, in order to compare it with the lemmatized substitute later to see if their mutual lemmas are the same
    doc_complex_word = nlp(complex_word)
    complex_word_lemma = doc_complex_word[0].lemma_
    print(f"complex_word_lemma for complex word '{complex_word}': {complex_word_lemma}\n")


    ## remove duplicates and inflected forms of the complex word from the list with substitutes
    substitutes_no_dupl_complex_word = []
    for substitute in substitutes_no_dupl:
        doc_substitute = nlp(substitute)
        substitute_lemma = doc_substitute[0].lemma_
        if substitute_lemma != complex_word_lemma:
            substitutes_no_dupl_complex_word.append(substitute)
    print(f"SS step: b) substitute list without duplicates and inflected forms of the complex word: {substitutes_no_dupl_complex_word}\n")

    # c) remove antonyms of the complex word from the substitute list
    ## collect the antonyms that WordNet lists for the first synset of the complex word
    synsets = wn.synsets(complex_word_lemma)
    antonyms = {antonym.name() for lemma in synsets[0].lemmas() for antonym in lemma.antonyms()} if synsets else set()

    ## keep only the substitutes whose lemma is not one of these antonyms
    substitutes_no_dupl_complex_word_no_antonym = []
    for substitute in substitutes_no_dupl_complex_word:
        # lemmatize the current substitute (instead of reusing the last lemma from step b)
        substitute_lemma = nlp(substitute)[0].lemma_
        if substitute_lemma in antonyms:
            print(f"Antonym removed (lemma): {substitute_lemma}")
        else:
            substitutes_no_dupl_complex_word_no_antonym.append(substitute)
    print(f"SS step: c): substitute list without antonyms of the complex word: {substitutes_no_dupl_complex_word_no_antonym}\n")
    
    
    # d) apply FITBERT to the list of substitutes
    sentence_fitbert_masked = sentence_masked_word.replace("[MASK]", "***mask***")
    sentences_concat_fitbert = f"{sentence} {tokenizer.sep_token} {sentence_fitbert_masked}"
    
    ranked_substitutes = fb_model.rank(sentences_concat_fitbert, substitutes_no_dupl_complex_word_no_antonym)
    print(f"SS step: d) ranked substitutes using FitBert: {ranked_substitutes}\n")
    
    print('-----------------------------------------------------------------------------------------')
    print()
    
    
    
    # limit the substitutes to the 10 first ones for evaluation
    top_10_substitutes = ranked_substitutes[:10]
    
    # add the sentence, complex_word, and substitutes to the dataframe
    substitutes_df.loc[index] = [sentence, complex_word] + top_10_substitutes

# remove the "#34-3" and "#35-14" prefixes from the sentences in the dataframe
# (done once after the loop instead of on every iteration)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#34-3 \"", "", regex=False)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#35-14 ", "", regex=False)

# export the dataframe to a tsv file
substitutes_df.to_csv("./predictions/trial/BertLarge_SG_SS_abc_fb.tsv", sep="\t", index=False, header=False)

device: cpu
using custom model: ['BertForMaskedLM']
Sentence: A Spanish government source, however, later said that banks able to cover by themselves losses on their toxic property assets will not be forced to remove them from their books while it will be compulsory for those receiving public help.
Complex word: compulsory
SG step: generated substitutes: ['compulsory', 'mandatory', 'obligatory', 'required', 'voluntary', 'optional', 'mandated', 'necessary', 'forbidden', 'permitted', 'prescribed', '##rricular', 'illegal', 'available', 'beneficial', 'prohibited', 'scheduled', 'obliged', 'tertiary', 'secondary', 'statutory', 'legal', 'free', 'used', 'customary', 'primary', 'authorised', 'canonical', 'standard', 'lawful']

SS step: a) substitute list without duplicates: ['compulsory', 'mandatory', 'obligatory', 'required', 'voluntary', 'optional', 'mandated', 'necessary', 'forbidden', 'permitted', 'prescribed', '##rricular', 'illegal', 'available', 'beneficial', 'prohibited', 'scheduled', '

python tsar_eval.py --gold_file .\gold_trial.tsv --predictions_file ./predictions/trial/BertLarge_SG_SS_abc_fb.tsv --output_file .\output

In [21]:
# Hardly any change: only marginally better on MAP@5 and MAP@10.

#### Substitute Generation with BERT-large, Substitute Selection steps a-c, and ranking of the resulting list with contextualized embeddings

In [22]:
from transformers import TFAutoModel
import tensorflow as tf
import numpy as np
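
The ranking step in the next cell embeds each sentence with its `[CLS]` vector, L2-normalizes the vectors, and takes the inner product, which for unit-length vectors equals the cosine similarity. A minimal NumPy sketch of that ranking idea, using toy vectors in place of BERT embeddings, is shown below. The function name `rank_by_cosine` and the toy data are hypothetical.

```python
import numpy as np


def rank_by_cosine(reference, candidate_vectors, candidates):
    """Rank candidates by cosine similarity between their vectors and the reference vector."""
    reference = np.asarray(reference, dtype=float)
    candidate_vectors = np.asarray(candidate_vectors, dtype=float)
    # L2-normalize so that the dot product equals the cosine similarity
    reference = reference / np.linalg.norm(reference)
    candidate_vectors = candidate_vectors / np.linalg.norm(candidate_vectors, axis=1, keepdims=True)
    scores = candidate_vectors @ reference
    # sort by descending score; a stable sort keeps the original order for ties
    order = np.argsort(-scores, kind="stable")
    return [(candidates[i], float(scores[i])) for i in order]


# toy example: "b" points in the same direction as the reference, "a" is orthogonal to it
toy_reference = [1.0, 0.0]
toy_vectors = [[0.0, 2.0], [3.0, 0.0], [1.0, 1.0]]
toy_ranking = rank_by_cosine(toy_reference, toy_vectors, ["a", "b", "c"])
```

Because the vectors are normalized first, their magnitude has no influence on the ranking; only the direction matters.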

In [23]:
# Computes the cosine similarity between the original sentence and each sentence in which
# the complex word has been replaced by a candidate substitute from the SG step.
# (The print of the list with substituted sentences further below is commented out to keep the output readable.)

# load the tokenizer and the TensorFlow model once, instead of reloading them on every call
ce_tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")
tf_model = TFAutoModel.from_pretrained("bert-large-uncased")


def calculate_similarity_scores(sentence, sentence_with_substitutes):

    def embed_text(text):
        tokens = ce_tokenizer(text, padding=True, truncation=True, return_tensors="tf")
        outputs = tf_model(**tokens)
        # use the embedding of the [CLS] token as the sentence representation
        embeddings = outputs.last_hidden_state[:, 0, :]
        # L2-normalize, so that the inner product below equals the cosine similarity
        embeddings = tf.nn.l2_normalize(embeddings, axis=1)
        return embeddings

    original_sentence_embedding = embed_text(sentence)
    substitute_sentence_embeddings = embed_text(sentence_with_substitutes)

    cosine_similarity = np.inner(original_sentence_embedding, substitute_sentence_embeddings)
    similarity_scores = cosine_similarity[0]

    return similarity_scores



# in each row, for each complex word: 
for index, row in data.iterrows():
       
    # 1. Substitute Generation (SG): perform masking and generate substitutes:
    
    ## print the sentence and the complex word
    sentence, complex_word = row["sentence"], row["complex_word"]
    print(f"Sentence: {sentence}")
    print(f"Complex word: {complex_word}")

    ## in the sentence, replace the complex word with a masked word
    sentence_masked_word = sentence.replace(complex_word, "[MASK]")

    ## concatenate the original sentence and the masked sentence
    tokenizer = fill_mask.tokenizer
    sentences_concat = f"{sentence} {tokenizer.sep_token} {sentence_masked_word}"

    ## generate and rank candidate substitutes for the masked word using the fill_mask pipeline
    top_k = 30
    result = fill_mask(sentences_concat, top_k=top_k)
   
    ## lowercase and print the top-k substitutes
    substitutes = [substitute["token_str"].lower() for substitute in result]
    print(f"SG step: generated substitutes: {substitutes}\n")
    
    # 2. Substitute Selection (SS):   
    
    # create a punctuation set without hyphen, in order to retain hyphens in compound substitutes
    punctuation_without_hyphen = set(string.punctuation) - set('-')
    
    # a) remove duplicates and substitutes containing unwanted punctuation from the substitute list
    substitutes_no_dupl = []
    for sub in substitutes:
        if sub not in substitutes_no_dupl and not any(char in punctuation_without_hyphen for char in sub):
            substitutes_no_dupl.append(sub)
    print(f"SS step: a) substitute list without duplicates and undesired punctuation: {substitutes_no_dupl}\n")
    
   
    # b) remove duplicates and inflected forms of the complex word from the substitute list
    ## Lemmatize the complex word with spaCy, in order to compare it with the lemmatized substitute later to see if their mutual lemmas are the same
    doc_complex_word = nlp(complex_word)
    complex_word_lemma = doc_complex_word[0].lemma_
    print(f"complex_word_lemma for complex word '{complex_word}': {complex_word_lemma}\n")


    ## remove duplicates and inflected forms of the complex word from the list with substitutes
    substitutes_no_dupl_complex_word = []
    for substitute in substitutes_no_dupl:
        doc_substitute = nlp(substitute)
        substitute_lemma = doc_substitute[0].lemma_
        if substitute_lemma != complex_word_lemma:
            substitutes_no_dupl_complex_word.append(substitute)
    print(f"SS step: b) substitute list without duplicates and inflected forms of the complex word: {substitutes_no_dupl_complex_word}\n")

    # c) remove antonyms of the complex word from the substitute list
    ## collect the antonyms that WordNet lists for the first synset of the complex word
    synsets = wn.synsets(complex_word_lemma)
    antonyms = {antonym.name() for lemma in synsets[0].lemmas() for antonym in lemma.antonyms()} if synsets else set()

    ## keep only the substitutes whose lemma is not one of these antonyms
    substitutes_no_dupl_complex_word_no_antonym = []
    for substitute in substitutes_no_dupl_complex_word:
        # lemmatize the current substitute (instead of reusing the last lemma from step b)
        substitute_lemma = nlp(substitute)[0].lemma_
        if substitute_lemma in antonyms:
            print(f"Antonym removed (lemma): {substitute_lemma}")
        else:
            substitutes_no_dupl_complex_word_no_antonym.append(substitute)
    print(f"SS step: c): substitute list without antonyms of the complex word: {substitutes_no_dupl_complex_word_no_antonym}\n")
    
    
    # create sentence with the complex word replaced by the substitutes
    sentence_with_substitutes = [sentence.replace(complex_word, sub) for sub in substitutes_no_dupl_complex_word_no_antonym]
    #print(f"List with sentences where complex word is substituted: {sentence_with_substitutes}\n")
    
    
    # d) calculate cosine similarity scores, and rank the substitutes based on their similarity score
    similarity_scores = calculate_similarity_scores(sentence, sentence_with_substitutes)
    #print(f"Similarity scores: {similarity_scores}\n")
    ranked_substitutes_withscores = sorted(zip(substitutes_no_dupl_complex_word_no_antonym, similarity_scores), key=lambda x: x[1], reverse=True)
    #print(f"SS step d) Ranked substitutes, including similarity scores in context: {ranked_substitutes_withscores}\n")
    ranked_substitutes = [substitute for substitute, score in ranked_substitutes_withscores]
    print(f"SS step d) Ranked substitutes, based on cosine similarity scores in context: {ranked_substitutes}\n")
        
    print('-----------------------------------------------------------------------------------------')
    print()
    
       
    
    
    # limit the substitutes to the 10 first ones for evaluation
    top_10_substitutes = ranked_substitutes[:10]
    
    # add the sentence, complex_word, and substitutes to the dataframe
    substitutes_df.loc[index] = [sentence, complex_word] + top_10_substitutes

# remove the "#34-3" and "#35-14" prefixes from the sentences in the dataframe
# (done once after the loop instead of on every iteration)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#34-3 \"", "", regex=False)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#35-14 ", "", regex=False)

# export the dataframe to a tsv file
substitutes_df.to_csv("./predictions/trial/BertLarge_SG_SS_abc_ce.tsv", sep="\t", index=False, header=False)

Sentence: A Spanish government source, however, later said that banks able to cover by themselves losses on their toxic property assets will not be forced to remove them from their books while it will be compulsory for those receiving public help.
Complex word: compulsory
SG step: generated substitutes: ['compulsory', 'mandatory', 'obligatory', 'required', 'voluntary', 'optional', 'mandated', 'necessary', 'forbidden', 'permitted', 'prescribed', '##rricular', 'illegal', 'available', 'beneficial', 'prohibited', 'scheduled', 'obliged', 'tertiary', 'secondary', 'statutory', 'legal', 'free', 'used', 'customary', 'primary', 'authorised', 'canonical', 'standard', 'lawful']

SS step: a) substitute list without duplicates: ['compulsory', 'mandatory', 'obligatory', 'required', 'voluntary', 'optional', 'mandated', 'necessary', 'forbidden', 'permitted', 'prescribed', '##rricular', 'illegal', 'available', 'beneficial', 'prohibited', 'scheduled', 'obliged', 'tertiary', 'secondary', 'statutory', 'leg

Some layers from the model checkpoint at bert-large-uncased were not used when initializing TFBertModel: ['mlm___cls', 'nsp___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-large-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


SS step d) Ranked substitutes, based on cosine similarity scores in context: ['obligatory', 'forbidden', 'voluntary', 'legal', 'secondary', 'statutory', 'primary', 'mandatory', 'authorised', 'necessary', 'illegal', 'free', 'available', 'scheduled', 'lawful', 'obliged', 'standard', 'prescribed', 'used', 'beneficial', 'prohibited', 'mandated', 'permitted', 'required', 'optional', 'customary', 'tertiary', 'canonical', '##rricular']

-----------------------------------------------------------------------------------------

Sentence: Rajoy's conservative government had instilled markets with a brief dose of confidence by stepping into Bankia, performing a U-turn on its refusal to spend public money to rescue banks.
Complex word: instilled
SG step: generated substitutes: ['provided', 'injected', 'infused', 'presented', 'left', 'supplied', 'delivered', 'endowed', 'impressed', 'fed', 'gifted', 'struck', 'furnished', 'inspired', 'fitted', 'equipped', 'raised', 'given', 'seeded', 'created', 'rew


SS step d) Ranked substitutes, based on cosine similarity scores in context: ['infused', 'filled', 'injected', 'fitted', 'delivered', 'seeded', 'equipped', 'infected', 'left', 'created', 'fed', 'tested', 'gifted', 'encouraged', 'given', 'inspired', 'raised', 'impressed', 'shaken', 'endowed', 'introduced', 'hit', 'supplied', 'treated', 'offered', 'struck', 'provided', 'rewarded', 'furnished', 'presented']

-----------------------------------------------------------------------------------------

Sentence: #34-3 "War maniacs of the South Korean puppet military made another grave provocation to the DPRK in the central western sector of the front on Thursday afternoon.
Complex word: maniacs
SG step: generated substitutes: ['machines', 'mania', 'criminals', 'killers', 'monsters', 'zombies', 'freaks', 'hysteria', 'demons', 'gods', 'insects', 'sims', 'crimes', 'pigs', 'dogs', 'victims', 'organs', 'heroes', 'fans', 'fantasies', 'dolls', 'leaders', 'puppets', 'followers', '##atics', 'vampires',


SS step d) Ranked substitutes, based on cosine similarity scores in context: ['monsters', 'demons', 'insects', 'vampires', 'zombies', 'freaks', 'dogs', 'pigs', 'gods', 'lovers', 'fantasies', 'dolls', 'mania', 'giants', 'sims', 'leaders', 'machines', 'fans', 'hysteria', 'followers', 'children', 'puppets', 'organs', 'heroes', 'victims', 'killers', 'theorists', 'criminals', 'crimes', '##atics']

-----------------------------------------------------------------------------------------

Sentence: The daily death toll in Syria has declined as the number of observers has risen, but few experts expect the U.N. plan to succeed in its entirety.
Complex word: observers
SG step: generated substitutes: ['observers', 'observer', 'monitors', 'witnesses', 'participants', 'analysts', 'spectators', 'observations', 'journalists', 'fighters', 'observation', '##keepers', 'experts', 'activists', 'casualties', 'volunteers', 'civilians', 'observing', 'people', 'citizens', 'combatants', 'observe', 'individuals


SS step d) Ranked substitutes, based on cosine similarity scores in context: ['monitoring', 'observation', 'troops', 'monitors', 'soldiers', 'analysts', 'experts', 'victims', 'casualties', 'observations', 'refugees', 'fighters', 'observing', 'civilians', 'volunteers', 'individuals', 'observe', 'indicators', 'participants', 'journalists', 'people', 'combatants', 'citizens', 'witnesses', 'instruments', 'activists', '##keepers', 'spectators']

-----------------------------------------------------------------------------------------

Sentence: An amateur video showed a young girl who apparently suffered shrapnel wounds in her thigh undergoing treatment in a makeshift Rastan hospital while screaming in pain.
Complex word: shrapnel
SG step: generated substitutes: ['gunshot', 'bullet', 'stab', 'knife', 'flesh', 'arrow', 'multiple', 'minor', 'blast', 'battle', 'three', 'splinter', 'shot', 'serious', 'exit', 'entry', 'shotgun', 'several', 'two', 'severe', 'shell', 'open', 'burn', 'bomb', 'penet


SS step d) Ranked substitutes, based on cosine similarity scores in context: ['bullet', 'blast', 'arrow', 'serious', 'fatal', 'severe', 'penetrating', 'multiple', 'splinter', 'flesh', 'shotgun', 'gunshot', 'two', 'stabbing', 'several', 'battle', 'inflicted', 'three', 'minor', 'entry', 'exit', 'shot', 'burn', 'open', 'internal', 'knife', 'the', 'stab', 'shell', 'bomb']

-----------------------------------------------------------------------------------------

Sentence: A local witness said a separate group of attackers disguised in burqas — the head-to-toe robes worn by conservative Afghan women — then tried to storm the compound.
Complex word: disguised
SG step: generated substitutes: ['disguised', 'dressed', 'masked', 'disguise', 'dressing', 'armoured', 'clothed', 'concealed', 'clad', 'guise', 'hiding', 'draped', 'posing', 'appearing', ',', 'hidden', 'acting', 'dress', 'painted', 'seated', 'fitted', 'identified', 'covered', 'armed', 'appeared', 'cloak', 'armored', 'undercover', 'portr

SS step d) Ranked substitutes, based on cosine similarity scores in context: ['masked', 'dressed', 'clad', 'covered', 'armed', 'hiding', 'draped', 'posing', 'dressing', 'concealed', 'hidden', 'acting', 'clothed', 'armored', 'undercover', 'seated', 'armoured', ',', 'portrayed', 'appearing', 'disguise', 'depicted', 'cloak', 'painted', 'fitted', 'dress', 'appeared', 'guise', 'identified']

-----------------------------------------------------------------------------------------

Sentence: Syria's Sunni majority is at the forefront of the uprising against Assad, whose minority Alawite sect is an offshoot of Shi'ite Islam.
Complex word: offshoot
SG step: generated substitutes: ['extension', 'affiliate', 'ally', 'arm', 'outpost', 'element', 'opponent', 'enemy', 'aspect', 'enclave', 'heir', 'isolate', 'offspring', 'expression', 'orthodox', 'evolution', 'exception', 'example', 'sect', 'embrace', 'echo', 'approximation', 'adherence', 'adhere', 'adaptation', 'associate', 'issue', 'inhibitor', 'i

SS step d) Ranked substitutes, based on cosine similarity scores in context: ['extension', 'echo', 'arm', 'aspect', 'adhere', 'outpost', 'affiliate', 'heir', 'extreme', 'isolate', 'expression', 'imitation', 'adaptation', 'associate', 'evolution', 'offspring', 'approximation', 'element', 'adherence', 'inhibitor', 'ally', 'opponent', 'orthodox', 'enemy', 'enclave', 'embrace', 'sect', 'example', 'issue', 'exception']

-----------------------------------------------------------------------------------------

Sentence: Although not as rare in the symphonic literature as sharper keys , examples of symphonies in A major are not as numerous as for D major or G major .
Complex word: symphonic
SG step: generated substitutes: ['symphonic', 'orchestral', 'classical', 'symphony', 'philharmonic', 'symphonies', 'musical', 'concerto', 'sonata', 'operatic', 'piano', 'concert', 'chamber', 'music', 'orchestra', 'modern', 'lyrical', 'thematic', 'instrumental', 'continental', 'orchestrated', 'scholarly', '

SS step d) Ranked substitutes, based on cosine similarity scores in context: ['orchestrated', 'chamber', 'symphonies', 'piano', 'romantic', 'sonata', 'concerto', 'symphony', 'orchestral', 'goldberg', 'philharmonic', 'music', 'orchestra', 'instrumental', 'classical', 'scholarly', 'modern', 'concert', 'musical', 'dramatic', 'standard', 'written', 'operatic', 'comparative', 'thematic', 'choral', 'continental', 'poetic', 'lyrical']

-----------------------------------------------------------------------------------------

Sentence: That prompted the military to deploy its largest warship, the BRP Gregorio del Pilar, which was recently acquired from the United States.
Complex word: deploy
SG step: generated substitutes: ['deploy', 'deployed', 'activate', 'deployment', 'dispatch', 'assemble', 'launch', 'send', 'use', 'withdraw', 'utilize', 'operate', 'employ', 'acquire', 'maintain', 'reinforce', 'exercise', 'install', 'build', 'move', 'drill', 'fire', 'construct', 'alert', 'dock', 'establish

SS step d) Ranked substitutes, based on cosine similarity scores in context: ['deployment', 'activate', 'employ', 'utilize', 'ready', 'operate', 'dock', 'dispatch', 'reinforce', 'exercise', 'outfit', 'acquire', 'launch', 'send', 'install', 'assemble', 'use', 'establish', 'move', 'maintain', 'construct', 'station', 'drill', 'build', 'withdraw', 'release', 'fire', 'alert']

-----------------------------------------------------------------------------------------

Sentence: #35-14 UK police were expressly forbidden, at a ministerial level, to provide any assistance to Thai authorities as the case involves the death penalty.
Complex word: authorities
SG step: generated substitutes: ['authorities', 'police', 'officials', '##s', 'magistrates', 'officers', 'forces', 'courts', 'authority', 'people', 'government', 'investigators', 'governments', 'victims', 'ministers', 'operators', 'bodies', 'judges', 'agencies', 'policemen', 'rulers', 'organisations', 'prosecutors', 'elements', 'offences', 'jo

SS step d) Ranked substitutes, based on cosine similarity scores in context: ['agencies', 'officials', 'police', 'government', 'policemen', 'prosecutors', 'officers', 'ministers', 'persons', 'courts', 'bodies', 'governments', 'people', 'residents', 'organisations', 'forces', 'magistrates', 'subjects', 'investigators', 'judges', 'offences', 'rulers', 'prisoners', 'victims', 'journalists', 'elements', 'operators', '##s']

-----------------------------------------------------------------------------------------



In [None]:
python tsar_eval.py --gold_file .\gold_trial.tsv --predictions_file ./predictions/trial/BertLarge_SG_SS_abc_ce.tsv --output_file .\output

Results: a lot worse than for BERT-base.

#### Substitute Generation with BERT-Large, Substitute Selection steps a-c, and ranking of the resulting list with BERTScore
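As a minimal sketch of the concatenated-pair input used in the Substitute Generation step: the helper below builds "original sentence, separator, masked sentence" without loading a model. `build_sg_input` is a hypothetical helper that is not part of this notebook, and the `[MASK]`/`[SEP]` defaults assume a BERT tokenizer. Unlike a plain `str.replace`, it masks only the first whole-word occurrence, which may matter when the complex word also occurs inside a longer word.

```python
import re

def build_sg_input(sentence, complex_word, mask_token="[MASK]", sep_token="[SEP]"):
    """Return '<sentence> <sep> <sentence with the complex word masked>'."""
    # \b restricts the match to whole words; count=1 masks only the first occurrence
    pattern = r"\b" + re.escape(complex_word) + r"\b"
    masked = re.sub(pattern, lambda _: mask_token, sentence, count=1)
    return f"{sentence} {sep_token} {masked}"

# illustrative sentence (made up, not taken from the trial data)
example = build_sg_input("The ship was deployed to deploy aid.", "deploy")
print(example)
```

With a real pipeline, `fill_mask.tokenizer.mask_token` and `fill_mask.tokenizer.sep_token` could be passed in instead of the defaults.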

In [24]:
import bert_score
from bert_score import score

In [25]:


# in each row, for each complex word: 
for index, row in data.iterrows():
       
    # 1. Substitute Generation (SG): perform masking and generate substitutes:
    
    ## print the sentence and the complex word
    sentence, complex_word = row["sentence"], row["complex_word"]
    print(f"Sentence: {sentence}")
    print(f"Complex word: {complex_word}")

    ## in the sentence, replace the complex word with a masked word
    sentence_masked_word = sentence.replace(complex_word, "[MASK]")

    ## concatenate the original sentence and the masked sentence
    tokenizer = fill_mask.tokenizer
    sentences_concat = f"{sentence} {tokenizer.sep_token} {sentence_masked_word}"

    ## generate and rank candidate substitutes for the masked word using the fill_mask pipeline
    top_k = 30
    result = fill_mask(sentences_concat, top_k=top_k)
   
    ## lowercase and print the top-k substitutes
    substitutes = [substitute["token_str"].lower() for substitute in result]
    print(f"SG step: generated substitutes: {substitutes}\n")
    
    # 2. Substitute Selection (SS):   
    
    # create a punctuation set without the hyphen, in order to retain hyphens in compound substitutes
    punctuation_without_hyphen = set(string.punctuation) - set('-')
    
    # a) remove duplicates and substitutes containing unwanted punctuation from the substitute list
    substitutes_no_dupl = []
    for sub in substitutes:
        if sub not in substitutes_no_dupl and not any(char in punctuation_without_hyphen for char in sub):
            substitutes_no_dupl.append(sub)
    print(f"SS step: a) substitute list without duplicates and undesired punctuation: {substitutes_no_dupl}\n")
    

   
    # b) remove the complex word itself and its inflected forms from the substitute list
    ## lemmatize the complex word with spaCy, so that its lemma can be compared with the lemma of each substitute below
    doc_complex_word = nlp(complex_word)
    complex_word_lemma = doc_complex_word[0].lemma_
    print(f"complex_word_lemma for complex word '{complex_word}': {complex_word_lemma}\n")


    ## remove substitutes that share their lemma with the complex word
    substitutes_no_dupl_complex_word = []
    for substitute in substitutes_no_dupl:
        doc_substitute = nlp(substitute)
        substitute_lemma = doc_substitute[0].lemma_
        if substitute_lemma != complex_word_lemma:
            substitutes_no_dupl_complex_word.append(substitute)
    print(f"SS step: b) substitute list without duplicates and inflected forms of the complex word: {substitutes_no_dupl_complex_word}\n")

    # c) remove antonyms of the complex word from the substitute list
    ## collect the antonyms from the first WordNet synset of the complex word once, instead of once per substitute
    synsets = wn.synsets(complex_word_lemma)
    antonyms = {ant.name() for lemma in synsets[0].lemmas() for ant in lemma.antonyms()} if synsets else set()
    substitutes_no_dupl_complex_word_no_antonym = []
    for substitute in substitutes_no_dupl_complex_word:
        ## lemmatize each substitute here; reusing substitute_lemma from step b) would only ever check the last substitute
        substitute_lemma = nlp(substitute)[0].lemma_
        if substitute_lemma in antonyms:
            print(f"Antonym removed (lemma): {substitute_lemma}")
        else:
            substitutes_no_dupl_complex_word_no_antonym.append(substitute)
    print(f"SS step: c): substitute list without antonyms of the complex word: {substitutes_no_dupl_complex_word_no_antonym}\n")
    
    
    # create sentences with the complex word replaced by the substitutes
    sentences_with_substitutes = [sentence.replace(complex_word, sub) for sub in substitutes_no_dupl_complex_word_no_antonym]
    #print(f"SG step: sentences with substitutes: {sentences_with_substitutes}\n")
    
          
    # d) use BERTScore for sorting; bert_score.score returns a (precision, recall, F1) tuple, so scores[0] below is the precision
    scores = bert_score.score([sentence]*len(sentences_with_substitutes), sentences_with_substitutes, lang="en", model_type='bert-large-uncased', verbose=False)
    ranked_substitutes = [substitute for _, substitute in sorted(zip(scores[0].tolist(), substitutes_no_dupl_complex_word_no_antonym), reverse=True)]
    print(f"SS step: d) substitute list sorted by descending BERTScore: {ranked_substitutes}\n")

    
    print('-----------------------------------------------------------------------------------------')
    print()
    
    
    # limit the substitutes to the 10 first ones for evaluation
    top_10_substitutes = ranked_substitutes[:10]
    
    # add the sentence, complex_word, and substitutes to the dataframe 
    substitutes_df.loc[index] = [sentence, complex_word] + top_10_substitutes
    
# remove the "#34-3" and "#35-14" prefixes from the sentences in the dataframe (once, after the loop has finished)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#34-3 \"", "", regex=False)
substitutes_df.iloc[:, 0] = substitutes_df.iloc[:, 0].str.replace("#35-14 ", "", regex=False)
    
    

# export the dataframe to a tsv file
substitutes_df.to_csv("./predictions/trial/BertLarge_SG_SS_abc_bs.tsv", sep="\t", index=False, header=False)

device: cpu
using custom model: ['BertForMaskedLM']
Sentence: A Spanish government source, however, later said that banks able to cover by themselves losses on their toxic property assets will not be forced to remove them from their books while it will be compulsory for those receiving public help.
Complex word: compulsory
SG step: generated substitutes: ['compulsory', 'mandatory', 'obligatory', 'required', 'voluntary', 'optional', 'mandated', 'necessary', 'forbidden', 'permitted', 'prescribed', '##rricular', 'illegal', 'available', 'beneficial', 'prohibited', 'scheduled', 'obliged', 'tertiary', 'secondary', 'statutory', 'legal', 'free', 'used', 'customary', 'primary', 'authorised', 'canonical', 'standard', 'lawful']

SS step: a) substitute list without duplicates: ['compulsory', 'mandatory', 'obligatory', 'required', 'voluntary', 'optional', 'mandated', 'necessary', 'forbidden', 'permitted', 'prescribed', '##rricular', 'illegal', 'available', 'beneficial', 'prohibited', 'scheduled', '

Some weights of the model checkpoint at roberta-large were not used when initializing RobertaModel: ['lm_head.dense.bias', 'lm_head.decoder.weight', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.bias', 'lm_head.dense.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


SS step: d) substitute list sorted by descending BERTScore: ['mandatory', 'required', 'mandated', 'necessary', 'voluntary', 'optional', 'permitted', 'legal', 'prohibited', 'free', 'lawful', 'illegal', 'standard', 'obligatory', 'forbidden', 'tertiary', 'available', 'beneficial', 'authorised', 'customary', 'obliged', 'used', 'prescribed', 'secondary', 'scheduled', 'statutory', 'primary', '##rricular', 'canonical']

-----------------------------------------------------------------------------------------

Sentence: Rajoy's conservative government had instilled markets with a brief dose of confidence by stepping into Bankia, performing a U-turn on its refusal to spend public money to rescue banks.
Complex word: instilled
SG step: generated substitutes: ['provided', 'injected', 'infused', 'presented', 'left', 'supplied', 'delivered', 'endowed', 'impressed', 'fed', 'gifted', 'struck', 'furnished', 'inspired', 'fitted', 'equipped', 'raised', 'given', 'seeded', 'created', 'rewarded', 'hit', 't

SS step: d) substitute list sorted by descending BERTScore: ['infused', 'filled', 'injected', 'fed', 'inspired', 'provided', 'seeded', 'encouraged', 'gifted', 'endowed', 'infected', 'equipped', 'presented', 'shaken', 'treated', 'impressed', 'hit', 'struck', 'furnished', 'left', 'supplied', 'rewarded', 'raised', 'tested', 'created', 'delivered', 'offered', 'given', 'introduced', 'fitted']

-----------------------------------------------------------------------------------------

Sentence: #34-3 "War maniacs of the South Korean puppet military made another grave provocation to the DPRK in the central western sector of the front on Thursday afternoon.
Complex word: maniacs
SG step: generated substitutes: ['machines', 'mania', 'criminals', 'killers', 'monsters', 'zombies', 'freaks', 'hysteria', 'demons', 'gods', 'insects', 'sims', 'crimes', 'pigs', 'dogs', 'victims', 'organs', 'heroes', 'fans', 'fantasies', 'dolls', 'leaders', 'puppets', 'followers', '##atics', 'vampires', 'theorists', 'gi

SS step: d) substitute list sorted by descending BERTScore: ['freaks', 'criminals', 'puppets', 'mania', 'machines', 'dogs', 'heroes', 'killers', 'crimes', 'leaders', 'lovers', 'children', 'fans', 'victims', 'followers', 'sims', 'organs', 'demons', 'hysteria', 'monsters', 'gods', 'dolls', 'theorists', 'giants', '##atics', 'fantasies', 'zombies', 'pigs', 'insects', 'vampires']

-----------------------------------------------------------------------------------------

Sentence: The daily death toll in Syria has declined as the number of observers has risen, but few experts expect the U.N. plan to succeed in its entirety.
Complex word: observers
SG step: generated substitutes: ['observers', 'observer', 'monitors', 'witnesses', 'participants', 'analysts', 'spectators', 'observations', 'journalists', 'fighters', 'observation', '##keepers', 'experts', 'activists', 'casualties', 'volunteers', 'civilians', 'observing', 'people', 'citizens', 'combatants', 'observe', 'individuals', 'indicators', 

SS step: d) substitute list sorted by descending BERTScore: ['monitors', 'experts', 'observations', 'witnesses', 'volunteers', '##keepers', 'analysts', 'activists', 'participants', 'journalists', 'monitoring', 'spectators', 'troops', 'soldiers', 'combatants', 'indicators', 'citizens', 'fighters', 'refugees', 'instruments', 'individuals', 'civilians', 'observation', 'people', 'casualties', 'victims', 'observing', 'observe']

-----------------------------------------------------------------------------------------

Sentence: An amateur video showed a young girl who apparently suffered shrapnel wounds in her thigh undergoing treatment in a makeshift Rastan hospital while screaming in pain.
Complex word: shrapnel
SG step: generated substitutes: ['gunshot', 'bullet', 'stab', 'knife', 'flesh', 'arrow', 'multiple', 'minor', 'blast', 'battle', 'three', 'splinter', 'shot', 'serious', 'exit', 'entry', 'shotgun', 'several', 'two', 'severe', 'shell', 'open', 'burn', 'bomb', 'penetrating', 'the', '

SS step: d) substitute list sorted by descending BERTScore: ['bullet', 'stab', 'gunshot', 'knife', 'multiple', 'stabbing', 'severe', 'blast', 'several', 'splinter', 'minor', 'serious', 'two', 'open', 'internal', 'three', 'battle', 'shell', 'penetrating', 'shotgun', 'shot', 'burn', 'flesh', 'entry', 'fatal', 'arrow', 'inflicted', 'exit', 'the', 'bomb']

-----------------------------------------------------------------------------------------

Sentence: A local witness said a separate group of attackers disguised in burqas — the head-to-toe robes worn by conservative Afghan women — then tried to storm the compound.
Complex word: disguised
SG step: generated substitutes: ['disguised', 'dressed', 'masked', 'disguise', 'dressing', 'armoured', 'clothed', 'concealed', 'clad', 'guise', 'hiding', 'draped', 'posing', 'appearing', ',', 'hidden', 'acting', 'dress', 'painted', 'seated', 'fitted', 'identified', 'covered', 'armed', 'appeared', 'cloak', 'armored', 'undercover', 'portrayed', 'depicted'

SS step: d) substitute list sorted by descending BERTScore: ['masked', 'concealed', 'clothed', 'posing', 'dressed', 'clad', 'hidden', 'covered', 'dressing', 'hiding', 'draped', 'armed', 'appearing', 'acting', 'painted', 'seated', 'identified', 'dress', 'armored', 'appeared', 'depicted', 'portrayed', 'fitted', 'disguise', 'undercover', 'cloak', ',', 'armoured', 'guise']

-----------------------------------------------------------------------------------------

Sentence: Syria's Sunni majority is at the forefront of the uprising against Assad, whose minority Alawite sect is an offshoot of Shi'ite Islam.
Complex word: offshoot
SG step: generated substitutes: ['extension', 'affiliate', 'ally', 'arm', 'outpost', 'element', 'opponent', 'enemy', 'aspect', 'enclave', 'heir', 'isolate', 'offspring', 'expression', 'orthodox', 'evolution', 'exception', 'example', 'sect', 'embrace', 'echo', 'approximation', 'adherence', 'adhere', 'adaptation', 'associate', 'issue', 'inhibitor', 'imitation', 'extre

SS step: d) substitute list sorted by descending BERTScore: ['extension', 'affiliate', 'arm', 'offspring', 'outpost', 'adaptation', 'echo', 'expression', 'evolution', 'ally', 'aspect', 'heir', 'approximation', 'extreme', 'imitation', 'exception', 'opponent', 'element', 'associate', 'enemy', 'enclave', 'embrace', 'example', 'sect', 'adherence', 'inhibitor', 'orthodox', 'isolate', 'adhere', 'issue']

-----------------------------------------------------------------------------------------

Sentence: Although not as rare in the symphonic literature as sharper keys , examples of symphonies in A major are not as numerous as for D major or G major .
Complex word: symphonic
SG step: generated substitutes: ['symphonic', 'orchestral', 'classical', 'symphony', 'philharmonic', 'symphonies', 'musical', 'concerto', 'sonata', 'operatic', 'piano', 'concert', 'chamber', 'music', 'orchestra', 'modern', 'lyrical', 'thematic', 'instrumental', 'continental', 'orchestrated', 'scholarly', 'dramatic', 'goldb

SS step: d) substitute list sorted by descending BERTScore: ['philharmonic', 'symphony', 'orchestral', 'operatic', 'concerto', 'thematic', 'symphonies', 'sonata', 'classical', 'musical', 'dramatic', 'choral', 'written', 'lyrical', 'concert', 'instrumental', 'music', 'modern', 'poetic', 'chamber', 'continental', 'romantic', 'orchestra', 'standard', 'comparative', 'scholarly', 'goldberg', 'piano', 'orchestrated']

-----------------------------------------------------------------------------------------

Sentence: That prompted the military to deploy its largest warship, the BRP Gregorio del Pilar, which was recently acquired from the United States.
Complex word: deploy
SG step: generated substitutes: ['deploy', 'deployed', 'activate', 'deployment', 'dispatch', 'assemble', 'launch', 'send', 'use', 'withdraw', 'utilize', 'operate', 'employ', 'acquire', 'maintain', 'reinforce', 'exercise', 'install', 'build', 'move', 'drill', 'fire', 'construct', 'alert', 'dock', 'establish', 'release', 'ou

SS step: d) substitute list sorted by descending BERTScore: ['dispatch', 'activate', 'utilize', 'station', 'employ', 'use', 'install', 'release', 'launch', 'operate', 'send', 'outfit', 'exercise', 'assemble', 'reinforce', 'dock', 'establish', 'drill', 'construct', 'maintain', 'acquire', 'fire', 'ready', 'build', 'move', 'withdraw', 'deployment', 'alert']

-----------------------------------------------------------------------------------------

Sentence: #35-14 UK police were expressly forbidden, at a ministerial level, to provide any assistance to Thai authorities as the case involves the death penalty.
Complex word: authorities
SG step: generated substitutes: ['authorities', 'police', 'officials', '##s', 'magistrates', 'officers', 'forces', 'courts', 'authority', 'people', 'government', 'investigators', 'governments', 'victims', 'ministers', 'operators', 'bodies', 'judges', 'agencies', 'policemen', 'rulers', 'organisations', 'prosecutors', 'elements', 'offences', 'journalists', 'pers

SS step: d) substitute list sorted by descending BERTScore: ['police', 'officials', 'government', 'magistrates', 'investigators', 'prosecutors', 'policemen', 'courts', 'judges', 'residents', 'prisoners', 'forces', 'governments', 'officers', 'operators', 'ministers', 'agencies', 'journalists', 'rulers', 'people', 'organisations', 'subjects', 'bodies', 'persons', 'elements', 'victims', 'offences', '##s']

-----------------------------------------------------------------------------------------



python tsar_eval.py --gold_file .\gold_trial.tsv --predictions_file ./predictions/trial/BertLarge_SG_SS_abc_bs.tsv --output_file .\output

=========   EVALUATION config.=========
GOLD file = .\gold_trial.tsv
PREDICTION LABELS file = ./predictions/trial/BertLarge_SG_SS_abc_bs.tsv
OUTPUT file = .\output
===============   RESULTS  =============
MAP@1/Potential@1/Precision@1 = 0.6

MAP@3 = 0.3777
MAP@5 = 0.2836
MAP@10 = 0.1713

Potential@3 = 0.7
Potential@5 = 0.9
Potential@10 = 0.9

Accuracy@1@top_gold_1 = 0.3
Accuracy@2@top_gold_1 = 0.5
Accuracy@3@top_gold_1 = 0.5

Results: best result so far for BERT-Large. MAP@3, MAP@5, MAP@10 and Potential@5 are also slightly better than for BERT-base (the rest is the same). These good results are surprising, because the plain results for BERT-Large were worse than the plain results for BERT-base.
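To make the Potential@k figures easier to read, here is a minimal sketch of how such a score could be computed. It assumes that Potential@k is the share of instances for which at least one of the top-k predictions appears among the gold substitutes; this is my reading of the metric, not the code of `tsar_eval.py`. `potential_at_k` is a hypothetical helper and the data is made up for illustration.

```python
def potential_at_k(predictions, gold, k):
    """Share of instances with at least one gold substitute among the top-k predictions."""
    hits = sum(1 for preds, golds in zip(predictions, gold) if set(preds[:k]) & set(golds))
    return hits / len(predictions)

# toy data (made up for illustration, not taken from the trial set or the gold file)
predictions = [["mandatory", "required", "voluntary"], ["freaks", "criminals", "puppets"]]
gold = [{"required", "needed"}, {"lunatics", "fanatics"}]

print(potential_at_k(predictions, gold, 1))
print(potential_at_k(predictions, gold, 3))
```

In the toy data the first instance only scores a hit once k reaches 2, which is the kind of jump visible between Potential@3 and Potential@5 in the results above.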

### Conclusion so far

SS steps a, b and c for both BERT models: keep them in the Substitute Selection method, as they do help a bit.

SS Step FitBERT: remove, due to low scores.

In [None]:
SS steps with contextual embeddings and BERTScore: keep them in Substitute Selection, but the differences are subtle and the results are not always better.

Best performing on MAP@1: BertBase_SG_SS_abc_ce, BertBase_SG_SS_abc_bs and BertLarge_SG_SS_abc_bs; when taking the other metrics into account: BertLarge_SG_SS_abc_bs (see below).

### Please note: I updated the code after running the evaluations, as I now remove the punctuation characters after the SG step.
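The punctuation filter mentioned in this note can be sketched in isolation as follows. This is a simplified, standalone version of SS step a) under the assumption that the hyphen is the only punctuation character worth keeping; `clean_substitutes` is a hypothetical name that does not occur in the notebook.

```python
import string

# hyphens are kept so that compound substitutes such as 'head-to-toe' survive the filter
PUNCT_WITHOUT_HYPHEN = set(string.punctuation) - {"-"}

def clean_substitutes(substitutes):
    """Drop duplicates and any substitute containing punctuation other than '-'."""
    cleaned = []
    for sub in substitutes:
        if sub not in cleaned and not any(ch in PUNCT_WITHOUT_HYPHEN for ch in sub):
            cleaned.append(sub)
    return cleaned

print(clean_substitutes(["police", "##s", ",", "police", "head-to-toe"]))
```

Because `#` is part of `string.punctuation`, this filter also drops WordPiece fragments such as `##s`, which still appear in the ranked lists printed above.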