# Edge Weights Updating
<b>Date:</b> October 21, 2023 \
<b>Author:</b> Dimitris Lymperopoulos \
<b>Description:</b> A notebook containing the functions needed to fine-tune the bipartite graph used for edit generation

## TODOs
* Parallelize the fluency computation (`get_fluency`)

## Install Required Packages

In [None]:
# !pip install numpy
# !pip install pandas
# !pip install networkx
# !pip install scikit-learn
# !pip install matplotlib
# !pip install joblib
# !pip install pylev

In [None]:
# !pip install polyjuice_nlp
# !pip install torch
# !pip install transformers
# !pip install evaluate
# !pip install bert_score
# !python -m spacy download en_core_web

## Imports

In [2]:
# general imports
import numpy as np
import pandas as pd
import networkx as nx

# Metric-related imports
import torch
from transformers import OpenAIGPTTokenizer, OpenAIGPTLMHeadModel
from transformers import GPT2Tokenizer, GPT2LMHeadModel
from evaluate import load
from joblib import Parallel, delayed
from pylev import levenshtein as lev_dist

In [1]:
%run functions/GPT2_functions.ipynb


In [3]:
%run functions/graph_functions.ipynb

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\jimli\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\jimli\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\jimli\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


In [3]:
bertscore = load("bertscore")

## Edge Updating Function

In [5]:
def update_edges(edges, substitutions, lr, baseline_metric_value, current_metric_value):
    """
    A function that takes as input a list of weighted edges along with other parameters, and uses
    these parameters to update the edge weights.

    :param edges: an iterable containing weighted edges as (u, v, weight) tuples
    :param substitutions: a dictionary with edges (u, v) as keys, and their substitution occurrence as values
    :param lr: float value, representing the learning rate for the weight updating
    :param baseline_metric_value: a float, representing the baseline evaluation metric value
    :param current_metric_value: a float, representing the current evaluation metric value
    :returns: a list of tuples, where each tuple represents an updated weighted edge
    """

    updated_edges = list()
    for (u, v, w) in edges:
        # get substitution occurrences for each edge
        edge_subs = substitutions.get((u, v))
        if not edge_subs:
            # no (or zero) recorded substitutions: keep the edge with its old weight instead of dropping it,
            # so that it is not lost when the edges are re-added to the graph
            print("No substitution count found for edge ({}, {}); keeping its weight unchanged".format(u, v))
            updated_edges.append((u, v, w))
            continue
        # updating formula: the metric change is scaled by lr and divided by the number of times the edge was used
        new_w = w - lr * (baseline_metric_value - current_metric_value) / edge_subs   # - for minimizing, + for maximizing
        # add the updated edge to the list
        updated_edges.append((u, v, new_w))

    return updated_edges
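
To make the sign convention of the update concrete, here is a minimal standalone sketch of the same formula for a single edge (the helper name `updated_weight` is hypothetical and not part of the notebook). For a metric that is being minimized, an improvement (`current < baseline`) lowers the weight, so the minimum matching becomes more likely to pick the edge again, while a worse metric raises it.

```python
def updated_weight(w, lr, baseline, current, n_subs):
    """Single-edge version of the update in update_edges (hypothetical helper).

    The change in the metric (baseline - current) is scaled by the learning rate
    and divided by how many times the edge was used for a substitution.
    """
    return w - lr * (baseline - current) / n_subs


# metric improved from 0.5 to 0.3 and the edge was used twice:
# 1.0 - 0.1 * 0.2 / 2 ≈ 0.99, i.e. the edge becomes cheaper
improved = updated_weight(w=1.0, lr=0.1, baseline=0.5, current=0.3, n_subs=2)

# metric got worse (0.5 -> 0.6) with a single substitution:
# 1.0 - 0.1 * (-0.1) / 1 ≈ 1.01, i.e. the edge becomes more expensive
worsened = updated_weight(w=1.0, lr=0.1, baseline=0.5, current=0.6, n_subs=1)

print(improved, worsened)
```

Dividing by the substitution count keeps frequently used edges from absorbing the whole metric change at once.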

## Metric Functions

In [38]:
def get_fluency(data, counter_data):
    """
    A function that takes as input the original and the counter data and returns the average fluency
    difference |fluency(original) - fluency(counter)| between the sentence pairs

    :param data: dataframe containing one column with the original data
    :param counter_data: dataframe containing one column with the counter data
    :returns: float value representing the average fluency difference (lower is better)
    """

    # extract sentences and counter-sentences from the data and check that they are of the same length
    sentences = [elem[0] for elem in data.values.tolist()]
    counter_sentences = [elem[0] for elem in counter_data.values.tolist()]

    assert len(sentences) == len(counter_sentences)

    # load the language model once and score every sentence pair
    cuda = torch.cuda.is_available()
    model, tokenizer = model_init(cuda=cuda)
    avg_fluency, counter = 0, 0
    for sent, counter_sent in zip(sentences, counter_sentences):
        # GPT-2 accepts at most 1024 tokens; this check counts characters, not tokens,
        # so it only approximates the model's input limit
        if len(sent) <= 1024 and len(counter_sent) <= 1024:
            avg_fluency += abs(sent_scoring(model, tokenizer, sent, cuda=cuda)[0] - sent_scoring(model, tokenizer, counter_sent, cuda=cuda)[0])
            counter += 1

    print("Scored {} of {} sentence pairs".format(counter, len(sentences)))
    if counter == 0:
        raise ValueError("No sentence pair was short enough to be scored")
    return avg_fluency / counter
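
The averaging in `get_fluency` can be illustrated independently of GPT-2. The sketch below assumes a hypothetical `score_fn` that maps a sentence to a fluency score (in the notebook this role is played by `sent_scoring(...)[0]`) and uses a character budget as a stand-in for the model's input limit; the toy scorer (word count) is only a placeholder. It raises a clear error when no pair could be scored, since an empty average would otherwise divide by zero.

```python
from typing import Callable, Sequence


def average_fluency_gap(sentences: Sequence[str], counter_sentences: Sequence[str],
                        score_fn: Callable[[str], float], max_chars: int = 1024) -> float:
    """Mean of |score(original) - score(counter)| over the pairs that fit the length budget."""
    if len(sentences) != len(counter_sentences):
        raise ValueError("sentences and counter_sentences must have the same length")

    gaps = [
        abs(score_fn(s) - score_fn(c))
        for s, c in zip(sentences, counter_sentences)
        if len(s) <= max_chars and len(c) <= max_chars  # skip pairs that are too long
    ]
    if not gaps:
        raise ValueError("no sentence pair fits within max_chars")
    return sum(gaps) / len(gaps)


# hypothetical stand-in scorer: the number of words in the sentence
def toy_score(sentence: str) -> float:
    return float(len(sentence.split()))


gap = average_fluency_gap(
    ["The clever boy ran home", "A small town"],
    ["The dumb boy ran", "A big town"],
    score_fn=toy_score,
)
print(gap)  # (|5 - 4| + |3 - 3|) / 2 = 0.5
```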

In [None]:
# df = pd.DataFrame({
#     'sents': [
#         'A great man was standing in a tall and magnificent hill, gazing upon the sad and destructive army',
#         'The clever boy was wondering when the fat dog would return with the big stick',
#         'A small town was standing next to the large river and the tall building'
#     ]
# })

# counter_df = pd.DataFrame({
#     'sents': [
#         'A small man was standing in a large and magnificent hill, gazing upon the happy and destructive army',
#         'The dumb boy was wondering when the slim dog would return with the tinny stick',
#         'A big town was standing next to the tall river and the large building'
#     ]
    
# })

# get_fluency(df, counter_df)

In [18]:
def get_closeness(data, counter_data):
    """
    A function that takes as input the original and the counter data and returns the average (character-level)
    Levenshtein distance as a measure of closeness between the sentence pairs

    :param data: dataframe containing one column with the original data
    :param counter_data: dataframe containing one column with the counter data
    :returns: float value representing the average Levenshtein distance (lower means closer)
    """
        
    # extract sentences and counter-sentences from the data and check that they are of the same length
    sentences = [elem[0] for elem in data.values.tolist()]
    counter_sentences= [elem[0] for elem in counter_data.values.tolist()]

    assert len(sentences) == len(counter_sentences)

    # compute average levenshtein distance as a measurement of closeness
    avg_lev = sum(Parallel(n_jobs=-1)(delayed(lev_dist)(x[0], x[1]) for x in zip(sentences, counter_sentences))) / len(sentences)

    return avg_lev
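
Applied to two strings, `lev_dist` compares them character by character, so `get_closeness` reports an average number of character edits. For reference, here is a minimal pure-Python sketch of the classic dynamic-programming Levenshtein distance (insertions, deletions and substitutions each cost 1); it illustrates the measure and is not the `pylev` implementation.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # previous[j] holds the distance between the current prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))        # 3
print(levenshtein("small town", "big town"))   # 5
```

Because the sketch only indexes and compares elements, passing token lists (e.g. `sentence.split()`) would give a word-level distance instead.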

In [8]:
def get_bertscore(data, counter_data):
    """
    A function that takes as input the original and the counter data and returns the average bertscore
    between the sentence pairs

    :param data: dataframe containing one column with the original data
    :param counter_data: dataframe containing one column with the counter data
    :returns: float value representing the average bertscore
    """
    
    # extract sentences and counter-sentences from the data and check that they are of the same length
    sentences = [elem[0] for elem in data.values.tolist()]
    counter_sentences= [elem[0] for elem in counter_data.values.tolist()]

    assert len(sentences) == len(counter_sentences)

    # compute average bertscore
    avg_bertscore = sum(bertscore.compute(predictions=counter_sentences, references=sentences, model_type="distilbert-base-uncased", nthreads=20)['f1']) / len(sentences)

    return avg_bertscore
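
BERTScore compares contextual token embeddings: each token is greedily matched to its most similar token (by cosine similarity) in the other sentence, precision averages these best matches over the candidate tokens, recall over the reference tokens, and F1 is their harmonic mean; `get_bertscore` then averages that F1 over all pairs. The sketch below reproduces only the matching step on toy, hand-made embedding matrices (hypothetical values); it omits the BERT encoder, the optional IDF weighting and baseline rescaling, so its numbers are not comparable with the `evaluate` metric.

```python
import numpy as np


def greedy_bertscore_f1(cand_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    """F1 of greedy cosine matching between candidate and reference token embeddings.

    cand_emb: (n_cand_tokens, dim), ref_emb: (n_ref_tokens, dim)
    """
    # L2-normalise rows so that dot products are cosine similarities
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = cand @ ref.T                       # (n_cand, n_ref) similarity matrix

    precision = sim.max(axis=1).mean()       # best reference match for each candidate token
    recall = sim.max(axis=0).mean()          # best candidate match for each reference token
    return float(2 * precision * recall / (precision + recall))


# toy 2-d "embeddings": the counter sentence replaces one of the three tokens
reference = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
candidate = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])

print(greedy_bertscore_f1(reference, reference))   # ≈ 1.0 for identical sentences
print(greedy_bertscore_f1(candidate, reference))   # below 1.0
```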

In [14]:
def get_flip_rate(original_p, counter_p):
    """
    A function that takes as input the original predictions and the new ones, and returns the
    flip-rate, i.e. the fraction of examples whose prediction changed.

    :param original_p: list containing the predictions for the original data
    :param counter_p: list containing the predictions for the counter data
    :returns: float value in [0, 1] representing the flip-rate
    """

    # check that predictions and counter_predictions are of the same length
    assert len(original_p) == len(counter_p)

    # compute flip_rate; a plain generator avoids the overhead of spawning joblib workers for a simple comparison
    flip_rate_percent = sum(p != cp for p, cp in zip(original_p, counter_p)) / len(original_p)

    return flip_rate_percent
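
Because the flip-rate is just the fraction of positions where the two prediction lists disagree, it can also be computed with one vectorised NumPy expression. A minimal sketch (the helper name `flip_rate` is hypothetical); the result is a fraction in [0, 1], so multiply by 100 for a percentage.

```python
import numpy as np


def flip_rate(original_preds, counter_preds) -> float:
    """Fraction of examples whose predicted label changed after the edit."""
    original = np.asarray(original_preds)
    counter = np.asarray(counter_preds)
    if original.shape != counter.shape:
        raise ValueError("prediction arrays must have the same shape")
    return float(np.mean(original != counter))


print(flip_rate([0, 1, 1, 0], [1, 1, 0, 0]))  # 2 of 4 labels flipped -> 0.5
```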

In [19]:
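def get_fluency_bertscore(data, counter_data):
    """
    A function that combines fluency and bertscore into a single metric: the harmonic mean of the average
    fluency difference and (1 - average bertscore). Lower values are better for both terms.

    :param data: dataframe containing one column with the original data
    :param counter_data: dataframe containing one column with the counter data
    :returns: float value representing the harmonic mean of the two terms
    """
    fl, bs = get_fluency(data, counter_data), 1 - get_bertscore(data, counter_data)

    # the harmonic mean is undefined (0/0) when both terms are zero
    return 2 * fl * bs / (fl + bs) if fl + bs != 0 else 0.0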
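
The combination above is the harmonic mean 2ab / (a + b) of the two terms. One property worth keeping in mind when minimizing it: since ab / (a + b) never exceeds min(a, b), the harmonic mean is at most twice the smaller term, so a single very small term keeps the combined value small even when the other term is large. A minimal sketch with made-up values (the helper name `harmonic_mean` is hypothetical):

```python
def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two non-negative numbers; defined as 0 when both are 0."""
    if a + b == 0:
        return 0.0
    return 2 * a * b / (a + b)


fluency_gap, one_minus_bertscore = 0.2, 0.3
print(harmonic_mean(fluency_gap, one_minus_bertscore))  # 2 * 0.2 * 0.3 / 0.5 ≈ 0.24

# a tiny fluency gap dominates even though 1 - bertscore is large
print(harmonic_mean(0.01, 0.9))   # ≈ 0.0198, below 2 * 0.01
```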

In [6]:
# def generate_model_agnostic_metrics(data, counter_data):
#     """
#     A function that takes as input the original and the counter data and returns a dictionary with 
#     model-agnostic metrics such as closeness and fluency.

#     :param data: dataframe containing one column with the original data
#     :param counter_data: dataframe containing one column with the counter data
#     :returns: dictionary containing model-agnostic metrics
#     """
    
#     # extract sentences and counter-sentences from the data and check that they are of the same length
#     sentences = [elem[0] for elem in data.values.tolist()]
#     counter_sentences= [elem[0] for elem in counter_data.values.tolist()]

#     assert len(sentences) == len(counter_sentences)

#     sent_length = len(sentences)

#     # compute average levenshtein distance as a measurement of closeness
#     #avg_lev = sum(list(map(lambda x: lev_dist(x[0], x[1]), zip(sentences, counter_sentences)))) / sent_length

#     # compute average fluency
#     model, tokenizer = model_init()
#     #avg_fluency = sum(list(map(lambda x: sent_scoring(model, tokenizer, x)[0], counter_sentences))) / sent_length
#     avg_fluency = sum(
#         list(map(lambda x: abs(sent_scoring(model, tokenizer, x[0])[0] - sent_scoring(model, tokenizer, x[1])[0]), zip(sentences, counter_sentences)))
#     ) / len(sentences)

#     # compute average bertscore
#    # avg_bertscore = 1 - sum(bertscore.compute(predictions=counter_sentences, references=sentences, model_type="distilbert-base-uncased")['f1']) / sent_length
    
#     # create metrics dictionary
#     metrics = {
#         #'levenshtein': avg_lev,     # we want this to be as low as possible
#         'fluency': avg_fluency,     # we want this to be as low as possible (it is the difference |original_fluency - counter_fluency|)
#         #'bertscore': avg_bertscore  #  we want this to be as low as possible
#     }

#     return metrics

In [7]:
# def generate_model_related_metrics(original_p, counter_p):
#     """
#     A function that takes as input the original predictions and the new ones, and returns a dictionary with 
#     model-related metrics such as flip-rate.

#     :param original_p: list containing the predictions for the original data
#     :param counter_p: dataframe containing the predictions for the counter data
#     :returns: dictionary containing model-related metrics
#     """

#     # check that predictions and counter_predictions are of the same length
#     assert len(original_p) == len(counter_p)
    
#     # compute flip_rate
#     flip_rate_percent = sum(x[0] != x[1] for x in zip(original_p, counter_p)) / len(original_p)

#     # create metrics dictionary
#     metrics = {
#         'flip-rate': flip_rate_percent
#     }

#     return metrics

In [17]:
def get_counterfactual_metric(metrics):
    """
    A function that takes as input a dictionary containing evaluation metrics, and returns
    a combination of those metrics. Currently only the fluency value is returned (0 if it is None);
    the commented-out lines below keep alternative combinations.

    :param metrics: dictionary containing different evaluation metrics such as fluency, flip-rate, etc.
    :returns: float value, computed as a combination of the metrics in the given dictionary
    """
    return metrics['fluency'] if metrics['fluency'] is not None else 0
    #return 2 / (1/metrics['bertscore'] + 1/metrics['fluency'])
    #return len(metrics) / sum(1/v for v in metrics.values())   # compute final metric as the harmonic mean of the given metrics

In [44]:
def get_baseline_metric(data, pos, eval_metric, model_required=False, preprocessor=None, model=None, antonyms=False):
    """
    A function that takes as input a dataframe with the textual data, generates counterfactual edits with get_edits,
    whose bipartite graph uses the WordNet distance between words (nodes) as edge weights, and evaluates those
    edits to obtain a baseline metric value.

    :param data: pd.DataFrame() containing one column with the textual data
    :param pos: string that specifies which part-of-speech shall be considered for substitution (noun, verb, adv)
    :param eval_metric: a function that computes the metric which must be optimized during fine-tuning
    :param model_required: boolean value specifying whether a pretrained model is also required for the metric computation
    :param preprocessor: a custom class that implements the necessary preprocessing of the data
    :param model: a pretrained model on the dataset
    :param antonyms: boolean value specifying whether or not to use antonyms in the candidate substitutions
    :returns: a tuple with a float value representing the computed evaluation metric and a dataframe with the counter data
    """

    sents = [elem[0] for elem in data.values.tolist()]
    counter_sents, _, _, _ = get_edits(sents, pos=pos, thresh=3, antonyms=antonyms)

    counter_data_df = pd.DataFrame({
        'counter_sents': counter_sents
    })

    if not model_required:
        # model-agnostic metric: compare the original and the counter sentences directly
        return eval_metric(data, counter_data_df), counter_data_df

    # model-related metric: first process the original data and get model predictions
    processed_data = preprocessor.process(data)
    original_preds = model.predict(processed_data)

    # do the same but for the counterfactual-generated data
    processed_counter_data = preprocessor.process(counter_data_df)
    counter_preds = model.predict(processed_counter_data)

    return eval_metric(original_preds, counter_preds), counter_data_df
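
When `model_required=True`, `get_baseline_metric` only assumes that `preprocessor` exposes a `process(dataframe)` method and that `model` exposes a `predict(processed)` method. The sketch below illustrates that expected interface with two hypothetical toy classes (a lower-casing preprocessor and a keyword "classifier") and computes a flip-rate between original and edited sentences in the same way; none of these classes come from the notebook.

```python
import pandas as pd


class LowercasePreprocessor:
    """Hypothetical preprocessor: returns the first column as a list of lower-cased strings."""

    def process(self, df: pd.DataFrame) -> list:
        return [str(text).lower() for text in df.iloc[:, 0]]


class KeywordModel:
    """Hypothetical 'classifier': predicts 1 if the text contains a positive keyword."""

    def __init__(self, keywords=("great", "clever", "large")):
        self.keywords = keywords

    def predict(self, texts: list) -> list:
        return [int(any(word in text.split() for word in self.keywords)) for text in texts]


def flip_rate(original_preds, counter_preds) -> float:
    return sum(o != c for o, c in zip(original_preds, counter_preds)) / len(original_preds)


data = pd.DataFrame({'sents': ["A great man was standing", "A small town was standing"]})
counter = pd.DataFrame({'counter_sents': ["A small man was standing", "A big town was standing"]})

preprocessor, model = LowercasePreprocessor(), KeywordModel()
original_preds = model.predict(preprocessor.process(data))      # [1, 0]
counter_preds = model.predict(preprocessor.process(counter))    # [0, 0]
print(flip_rate(original_preds, counter_preds))                 # 0.5
```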

## Graph-Related Functions

In [10]:
def create_graph(data, pos, antonyms=False):
    """
    A function that takes as input a dataframe and a part-of-speech tag, and creates a bipartite graph
    with the possible substitution words and their candidates.

    :param data: pd.DataFrame() containing one column with the textual data
    :param pos: string that specifies which part-of-speech shall be considered for substitution (noun, verb, adv)
    :param antonyms: boolean value specifying whether or not to use antonyms in the candidate substitutions
    :returns: a dictionary containing the graph, along with other related features
    """

    sentences = [elem[0] for elem in data.values.tolist()]
    lst = None
    
    # use appropriate function based on pos to get the list of the specified pos words from the data
    if pos == 'adv':
        lst = create_attributes_list(sentences)
    elif pos == 'verb':
        lst = create_verb_list(sentences)
    elif pos == 'noun':
        lst = create_singular_list(sentences) 
    else:
        raise ValueError("pos '{}' is not supported!".format(pos))
    
    syn0 = list(lst)
    syn1 = list(get_antonym_list(lst)) if antonyms else list(lst)

    # synsets of both word sets; d0/d1 map each synset (key) to its word (value)
    all_syn0, d0, ind0 = get_synsets(syn0, pos=pos, return_index=True)
    all_syn1, d1, ind1 = get_synsets(syn1, pos=pos, return_index=True)

    print("Creating Node Names...")
    names0 = ['G0_'+str(i) for i in range(len(all_syn0))]  # give unique names for each synset of the two sets
    names1 = ['G1_'+str(i) for i in range(len(all_syn1))]

    # map each word to the node name of its synset (ind0/ind1 hold the word index of every synset);
    # if a word has several synsets, the last one is kept
    word_to_node0 = dict()
    word_to_node1 = dict()
    for t in zip(names0, ind0):
        word_to_node0[syn0[t[1]]] = t[0]

    for t in zip(names1, ind1):
        word_to_node1[syn1[t[1]]] = t[0]

    combinations_nodes = all_combinations(names0, names1)        # all combinations of names
    combinations_synsets = all_combinations(all_syn0, all_syn1)  # all combinations of synsets
    weights = [1] * len(combinations_nodes)                      # every edge starts with weight 1

    print("Creating Bipartite Graph...")
    G, min_list_nodes = bipartite_graph(names0, names1, combinations_nodes, weights) # create bipartite graph

    graph_dict = {
        'graph': G,
        'min_list_nodes': min_list_nodes,
        'weights': weights,
        'd0': d0,
        'd1': d1,
        'comb_nodes': combinations_nodes,
        'comb_syn': combinations_synsets,
        'word_to_node0': word_to_node0,
        'word_to_node1': word_to_node1
    }

    return graph_dict
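
`bipartite_graph` and `all_combinations` are defined in one of the notebooks loaded with `%run` (presumably `functions/graph_functions.ipynb`) and are not shown here. Assuming `all_combinations` is a Cartesian product, the sketch below shows roughly what `create_graph` sets up: a complete bipartite graph in NetworkX with `G0_*` and `G1_*` node names and an initial weight of 1 on every edge (the node counts are made up). It is an illustration of the structure, not the notebook's implementation.

```python
import itertools

import networkx as nx

# hypothetical node names, mirroring the G0_/G1_ naming used in create_graph
names0 = ['G0_' + str(i) for i in range(3)]
names1 = ['G1_' + str(i) for i in range(2)]

G = nx.Graph()
G.add_nodes_from(names0, bipartite=0)   # left side: synsets of the original words
G.add_nodes_from(names1, bipartite=1)   # right side: synsets of the candidate words

# every left node is connected to every right node, all edges starting with weight 1
combinations_nodes = list(itertools.product(names0, names1))
G.add_weighted_edges_from((u, v, 1) for u, v in combinations_nodes)

print(G.number_of_nodes(), G.number_of_edges())   # 5 6
print(nx.is_bipartite(G))                          # True

# removing edges (as train_graph does) never breaks bipartiteness
G.remove_edges_from([('G0_0', 'G1_0'), ('G0_1', 'G1_1')])
print(nx.is_bipartite(G), G.number_of_edges())    # True 4
```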

In [19]:
def generate_graph_matching(graph_dict):
    """
    A function that takes as input a dictionary containing a graph and other related features, and uses
    a minimum graph matching algorithm to return candidate substitutions, along with other graph features.

    :param graph_dict: a dictionary containing a bipartite graph and other related features
    :returns: a dictionary of feasible substitutions (word to word, in both directions), the synset-to-word mappings d0 and d1,
              and a tuple containing the graph, min_list_nodes and the minimum matching
    """

    # unpack dictionary items
    G = graph_dict['graph']
    min_list_nodes = graph_dict['min_list_nodes']
    d0 = graph_dict['d0']
    d1 = graph_dict['d1']
    combinations_nodes = graph_dict['comb_nodes']
    combinations_synsets = graph_dict['comb_syn']

    # find min weight match
    print("Finding Minimum Match...")
    min_match = minimum_match(G, min_list_nodes)
    match_tuple = dict_to_tuple(min_match)

    # sort each matched pair so that (a, b) and (b, a) coincide, then drop duplicates once
    new_match = remove_duplicates([tuple(sorted(i)) for i in match_tuple])

    # positions of the matched node pairs in the list of all node combinations
    positions = pos_in_list(combinations_nodes, list(new_match))
    substitution_synsets = dict()
    print("Creating Substitution Synsets Dictionary...")
    for i in positions:
        # translate the matched synset pair to words and store the substitution in both directions
        syn_a, syn_b = combinations_synsets[i]
        substitution_synsets[d0[syn_a]] = d1[syn_b]
        substitution_synsets[d1[syn_b]] = d0[syn_a]

    return substitution_synsets, d0, d1, (G, min_list_nodes, new_match)
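
NetworkX matching functions such as `nx.bipartite.minimum_weight_full_matching` return a dictionary that lists every matched pair twice (`u -> v` and `v -> u`). Assuming `minimum_match` returns something similar, the sorting and de-duplication in `generate_graph_matching` reduce it to one tuple per edge. A minimal pure-Python sketch of that normalisation (the helper name `matching_to_edges` is hypothetical):

```python
def matching_to_edges(matching: dict) -> list:
    """Turn a symmetric matching dict {u: v, v: u, ...} into a list of unique sorted edge tuples."""
    edges = []
    seen = set()
    for u, v in matching.items():
        edge = tuple(sorted((u, v)))   # ('G1_0', 'G0_2') and ('G0_2', 'G1_0') both become ('G0_2', 'G1_0')
        if edge not in seen:           # keep the first occurrence, preserving order
            seen.add(edge)
            edges.append(edge)
    return edges


matching = {'G0_2': 'G1_0', 'G1_0': 'G0_2', 'G0_0': 'G1_1', 'G1_1': 'G0_0'}
print(matching_to_edges(matching))   # [('G0_2', 'G1_0'), ('G0_0', 'G1_1')]
```

Sorting works here because every `G0_*` name compares lower than every `G1_*` name, so the left-side node always comes first in each tuple.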

In [23]:
def generate_counterfactuals(graph_dict, data, pos):
    """
    A function that takes as input a dictionary containing graph information, along with a dataframe and a part-of-speech tag,
    and uses them to generate counterfactual edits from the data.

    :param graph_dict: a dictionary containing a bipartite graph and other related features
    :param data: pd.DataFrame() containing one column with the textual data
    :param pos: string that specifies which part-of-speech shall be considered for substitution (noun, verb, adv)
    :returns: a dataframe with the generated counterfactual data, a list of selected edges from the graph and a dictionary containing substitution occurrence
    """
    
    G = graph_dict['graph']
    w2n0 = graph_dict['word_to_node0']
    w2n1 = graph_dict['word_to_node1']
    sentences = [elem[0] for elem in data.values.tolist()]

    # find best matching and generate edits
    substitution_synsets, d0, d1, _ = generate_graph_matching(graph_dict)
    print("Generating Edits...")
    all_swaps, _, _, substitutions = external_swaps(sentences, pos, substitution_synsets, d0, d1, thresh=3)

    counter_data = pd.DataFrame({
        'counter_sents': all_swaps
    })

    # substitutions are keyed by (word, word) pairs in either order; re-key them as (G0 node, G1 node) edges
    subs_as_nodes = dict()
    for (k, v) in substitutions.items():
        if k[0] in w2n0 and k[1] in w2n1:
            subs_as_nodes[(w2n0[k[0]], w2n1[k[1]])] = v
        elif k[1] in w2n0 and k[0] in w2n1:
            subs_as_nodes[(w2n0[k[1]], w2n1[k[0]])] = v
        else:
            print("Substitution {} could not be mapped to a graph edge; skipping it".format(k))

    # collect the current weight of every edge that was used for a substitution
    selected_edges = []
    for (u, v) in subs_as_nodes.keys():
        w = G[u][v]['weight']   # raises KeyError if the edge is not in the graph
        selected_edges.append((u, v, w))

    return counter_data, selected_edges, subs_as_nodes
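
The re-keying step in `generate_counterfactuals` turns word-level substitution counts into counts per graph edge, trying both orientations of each word pair. A minimal standalone sketch with made-up words, node names and counts (the helper name `substitutions_to_edges` and all data are hypothetical):

```python
def substitutions_to_edges(substitutions: dict, word_to_node0: dict, word_to_node1: dict) -> dict:
    """Map {(word_a, word_b): count} to {(G0 node, G1 node): count}, accepting either word order."""
    edges = {}
    for (a, b), count in substitutions.items():
        if a in word_to_node0 and b in word_to_node1:
            edges[(word_to_node0[a], word_to_node1[b])] = count
        elif b in word_to_node0 and a in word_to_node1:
            edges[(word_to_node0[b], word_to_node1[a])] = count
        # pairs that match neither orientation cannot be linked to an edge and are skipped
    return edges


w2n0 = {'great': 'G0_0', 'tall': 'G0_1'}
w2n1 = {'small': 'G1_0', 'short': 'G1_1'}
subs = {('great', 'small'): 2, ('short', 'tall'): 1, ('big', 'tiny'): 4}
print(substitutions_to_edges(subs, w2n0, w2n1))   # {('G0_0', 'G1_0'): 2, ('G0_1', 'G1_1'): 1}
```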

In [43]:
def train_graph(graph_dict, data, pos, eval_metric, preprocessor=None, model=None, learning_rate=0.1, th=0.005, max_iterations=100, model_required=False, baseline_metric=None):
    """
    A function that represents the training process for the graph edges. In every iteration it repeatedly uses the
    graph to generate counter data, evaluates it against the original data (or, if model_required is True, compares
    the model predictions for both), and updates the weights of the edges selected for the substitutions.

    :param graph_dict: a dictionary containing the bipartite graph along with other variables and characteristics
    :param data: a dataframe containing the textual examples we will use to train the graph
    :param pos: a string specifying which part-of-speech shall be considered for substitutions (noun, verb, adv)
    :param eval_metric: a function that computes the metric which must be optimized during fine-tuning
    :param preprocessor: a custom class that implements the necessary preprocessing of the data
    :param model: a pretrained model on the dataset
    :param learning_rate: float value defining how fast or slow the edge weights will be updated
    :param th: float value defining a threshold; training stops once |baseline - current| becomes smaller than it
    :param max_iterations: integer value representing the maximum number of iterations for the training procedure
    :param model_required: boolean value for whether or not to compute model-related metrics
    :param baseline_metric: optional precomputed baseline metric value; if None, it is computed with get_baseline_metric
    :returns: the graph_dictionary with the fine-tuned (post-training) graph along with the rest of its features
    """

    # model-related metrics (e.g. flip-rate) compare against the predictions on the original data,
    # so compute them once before training starts
    original_preds = None
    if model_required:
        original_preds = model.predict(preprocessor.process(data))

    # initialize baseline and current metric
    if baseline_metric is None:
        baseline_metric = get_baseline_metric(data, pos=pos, eval_metric=eval_metric, model_required=model_required, preprocessor=preprocessor, model=model)[0]
    current_metric = baseline_metric + 2 * th   # initialize current_metric so that the diff |baseline - current| is bigger than th

    iterations = 0
    next_baseline_metric = baseline_metric
    while abs(current_metric - baseline_metric) >= th and iterations < max_iterations:
        print("ITERATION {}".format(iterations))

        updated_edges = []
        baseline_metric = next_baseline_metric

        # removing edges never makes a bipartite graph non-bipartite, so this loop effectively ends only when
        # generating or evaluating the counterfactuals raises (presumably once no usable edges are left)
        while nx.is_bipartite(graph_dict['graph']):
            try:
                counter_data, selected_edges, substitutions = generate_counterfactuals(graph_dict, data, pos)

                if not model_required:
                    # compute model-agnostic current_metric value
                    current_metric = eval_metric(data, counter_data)
                else:
                    counter_preds = model.predict(preprocessor.process(counter_data))
                    # compute model-related current_metric value
                    current_metric = eval_metric(original_preds, counter_preds)

                # compute the new weights first, then detach the selected edges so that the next
                # matching has to use different ones; they are re-added after the inner loop
                new_edges = update_edges(selected_edges, substitutions, learning_rate, baseline_metric, current_metric)
                graph_dict['graph'].remove_edges_from(selected_edges)
                updated_edges.extend(new_edges)
            except Exception as e:
                print("Stopping edge updates for this iteration: {}".format(e))
                break

        # put the detached edges back with their updated weights
        graph_dict['graph'].add_weighted_edges_from(updated_edges)

        # update baseline_metric value and iterations
        next_baseline_metric = min(baseline_metric, current_metric)
        iterations += 1

    return graph_dict
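
The outer loop of `train_graph` keeps the best (lowest) metric seen so far as the next baseline, and stops once an iteration changes the metric by less than `th` or after `max_iterations`. The sketch below isolates that stopping rule, replacing graph matching and evaluation with a hypothetical sequence of per-iteration metric values, to show when training halts (the helper name `iterations_until_stop` is hypothetical).

```python
def iterations_until_stop(baseline, metric_values, th=0.005, max_iterations=100):
    """Replay train_graph's stopping rule on a given sequence of per-iteration metric values.

    Returns the number of iterations run and the final baseline (best value seen so far).
    """
    current = baseline + 2 * th          # forces at least one iteration
    next_baseline = baseline
    iterations = 0
    values = iter(metric_values)
    while abs(current - baseline) >= th and iterations < max_iterations:
        baseline = next_baseline
        current = next(values)           # stands in for eval_metric on freshly generated edits
        next_baseline = min(baseline, current)
        iterations += 1
    return iterations, next_baseline


# the metric (lower is better) improves quickly, then plateaus
print(iterations_until_stop(0.50, [0.40, 0.35, 0.348, 0.347]))   # (3, 0.348)
```

Note that the comparison is against the baseline of the current iteration, so a single small step (0.35 to 0.348 here) is enough to stop training even if later iterations could still improve.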

## Testing

In [26]:
# POS = 'adv'
# MAX_ITER = 3
# ANTONYMS = True

# df = pd.DataFrame({
#     'sents': [
#         'A great man was standing in a tall and magnificent hill, gazing upon the sad and destructive army',
#         'The clever boy was wondering when the fat dog would return with the big stick',
#         'A small town was standing next to the large river and the tall building'
#     ]
# })

# gd = create_graph(data=df, pos=POS, antonyms=ANTONYMS)
# trained_gd = train_graph(graph_dict=gd, data=df, pos=POS, eval_metric=get_fluency, max_iterations=MAX_ITER)

In [35]:
# x1 = """
# I went and saw this movie last night after being coaxed to by a few friends of mine. 
# I'll admit that I was reluctant to see it because from what I knew of Ashton Kutcher he was only able to do comedy.
# I was wrong. Kutcher played the character of Jake Fischer very well, and Kevin Costner played Ben Randall with such professionalism. 
# The sign of a good movie is that it can toy with our emotions. This one did exactly that. The entire theater (which was sold out) was 
# overcome by laughter during the first half of the movie, and were moved to tears during the second half. While exiting the theater I 
# not only saw many women in tears, but many full grown men as well, trying desperately not to let anyone see them crying. This movie was great,
# and I suggest that you go see it before you judge."""

# x2 = """
# Maybe I'm reading into this too much, but I wonder how much of a hand Hongsheng had in developing the film. I mean, when a story is told casting the
# main character as himself, I would think he would be a heavy hand in writing, documenting, etc. and that would make it a little biased.<br /><br />
# But...his family and friends also may have had a hand in getting the actual details about Hongsheng's life. I think the best view would have been told
# from Hongsheng's family and friends' perspectives. They saw his transformation and weren't so messed up on drugs that they remember everything.<br /><br />As
# for Hongsheng being full of himself, the consistencies of the Jesus Christ pose make him appear as a martyr who sacrificed his life (metaphorically, of course, 
# he's obviously still alive as he was cast as himself) for his family's happiness. Huh?<br /><br />The viewer sees him at his lowest points while still maintaining 
# a superiority complex. He lies on the grass coming down from (during?) a high by himself and with his father, he contemplates life and has visions of dragons at his
# window, he celebrates his freedom on a bicycle all while outstretching his arms, his head cocked to the side.<br /><br />It's fabulous that he's off of drugs now, but 
# he's no hero. He went from a high point in his career in acting to his most vulnerable point while on drugs to come back somewhere in the middle.<br /><br />This same 
# device is used in Ted Demme's "Blow" where the audience empathizes with the main character who is shown as a flawed hero.<br /><br />However, "Quitting" ("Zuotian") is a 
# film that is recommended, mostly for its haunting soundtrack, superb acting, and landscapes. But, the best part is the feeling that one gets when what we presume to be the
# house of Jia Hongsheng is actually a stage setting for a play. It makes the viewer feel as if Hongsheng's life was merely a play told in many difficult parts.
# """

In [36]:
# from datetime import datetime

# start = datetime.now()
# ld = lev_dist(x1, x2)
# print("Command Execution Time: {}".format(datetime.now() - start))

In [27]:
# counter_data, selected_edges, subs = generate_counterfactuals(trained_gd, df, POS)
# for i in range(df.shape[0]):
#     print("ORIGINAL:")
#     print(df['sents'][i])
#     print("COUNTER:")
#     print(counter_data['counter_sents'][i])
#     print("===============================================================================================================")

In [28]:
# final_metric = get_fluency(df, counter_data)
# baseline_metric, _ = get_baseline_metric(df, pos=POS, eval_metric=get_fluency, antonyms=ANTONYMS)

# print("Baseline metric value: {}".format(baseline_metric))
# print("Fine-tuned metric value: {}".format(final_metric))
# print("Difference: {}".format(abs(baseline_metric - final_metric)))