# Overlap metrics

This section evaluates the generations produced by the various models with overlap metrics (ROUGE, BLEU-1, BLEU-3 and BLEU-4). It then selects the counter-narratives (CNs) that make up the best datasets (Best_LM, Best_D and Best_LM+D).
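The notebook installs `rouge_score` and a `bleu` package, but the scores themselves are computed inside the helper functions imported from `evaluate`. The sketch below is a dependency-free illustration of what the overlap scores measure: sentence-level ROUGE-L F1 and cumulative BLEU-n. It rests on several assumptions, and the repository's scorer may differ on any of them. It tokenizes on whitespace, uses ROUGE-L as the ROUGE variant, and computes BLEU with uniform n-gram weights, a brevity penalty and no smoothing.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    """Cumulative sentence BLEU-max_n (assumption: uniform weights, no smoothing).

    Geometric mean of the clipped n-gram precisions for n = 1..max_n,
    multiplied by the brevity penalty.
    """
    ref, hyp = reference.split(), hypothesis.split()  # assumption: whitespace tokens
    if not hyp:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngram_counts(hyp, n), ngram_counts(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference
        clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: a zero precision zeroes the whole score
        log_precision += math.log(clipped / total) / max_n
    brevity_penalty = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(log_precision)


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence (dynamic programming, two rows)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference: str, hypothesis: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    ref, hyp = reference.split(), hypothesis.split()
    lcs = lcs_length(ref, hyp)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(hyp), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For instance, `bleu(ref, hyp, 1)`, `bleu(ref, hyp, 3)` and `bleu(ref, hyp, 4)` give the three BLEU variants listed above. Without smoothing, a short CN that shares no trigram with the reference gets BLEU-3 = 0.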

```python
!pip install transformers
!pip install datasets
!pip install rouge_score
!pip install bleu
```

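```python
import os
import pandas as pd
from functools import reduce
# Helper functions used below (get_evaluated_and_ranked_dfs, create_best_dataset, ...);
# presumably the repository's local evaluate.py rather than the Hugging Face `evaluate` package
from evaluate import *
```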

```python
# The generated data has one column per generation, named <model>_<decoding><nr>
# (e.g. DialoGPT_bs0, where nr is the number of the generated sequence), plus one
# reference column (CN_ed) used to compute the overlap and one index column.
# path_to_generated_data must point to that CSV file.
generations = pd.read_csv(path_to_generated_data)
```

|   | index | CN_ed | DialoGPT_bs0 | DialoGPT_bs1 | DialoGPT_bs2 | DialoGPT_bs3 | DialoGPT_bs4 | DialoGPT_kp0 | DialoGPT_kp1 | DialoGPT_kp2 | ... | T5_tk0 | T5_tk1 | T5_tk2 | T5_tk3 | T5_tk4 | T5_tp0 | T5_tp1 | T5_tp2 | T5_tp3 | T5_tp4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 746 | Are you suggesting that the 'Prevent' programm... | There is no evidence that Muslims are being 'i... | The vast majority of Muslims live in peaceful ... | The vast majority of Muslims live in peaceful ... | The vast majority of Muslims live in peaceful ... | The vast majority of Muslims live in peaceful ... | The UK government does not have a religious id... | You may think this is a case of 'limiting' the... | I am not sure about Islam, but I do not think ... | ... | This is a story about British society who is b... | Why is it that the U.K. is being islamized? Wh... | Many Muslims are not islamic today and it is h... | Most British citizens are well aware that thei... | The UK government is trying to develop a stron... | What if we were to'submit to every whim of Isl... | What exactly do you mean by being a muslim? Wh... | How about you discuss Islam? | The British Government is being "Islamized" ve... | Is it possible that you are not concerned abou... |
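The `<model>_<decoding><nr>` naming convention described in the comment can be made explicit with a small parser. This is a sketch. `parse_generation_column` is a hypothetical helper, not part of the repository. It assumes that decoding identifiers are lowercase letters immediately followed by the sequence number, as in the column header above.

```python
import re
from typing import Optional

# Assumed pattern: <model>_<decoding><nr>, e.g. "DialoGPT_bs0" or "GPT-2_kp3"
COLUMN_PATTERN = re.compile(r"^(?P<model>.+)_(?P<decoding>[a-z]+)(?P<nr>\d+)$")


def parse_generation_column(name: str) -> Optional[tuple[str, str, int]]:
    """Split a generation column name into (model, decoding, nr).

    Returns None for columns that do not follow the pattern (e.g. "index", "CN_ed").
    """
    match = COLUMN_PATTERN.match(name)
    if match is None:
        return None
    return match["model"], match["decoding"], int(match["nr"])
```

For example, `[c for c in generations.columns if parse_generation_column(c)]` keeps only the generation columns and drops `index` and `CN_ed`.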


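```python
# Score every generation against the reference CN with the overlap metrics and rank
# the generations (only the first six rows, generations[:6], are processed here)
evaluated_data, ranked_data = get_evaluated_and_ranked_dfs(generations[:6])
```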
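This section does not show how `get_evaluated_and_ranked_dfs` orders the generations. The minimal sketch below assumes that each generation of one HS has already been scored. It ranks first by ROUGE and uses BLEU-1, BLEU-3 and BLEU-4 as successive tie-breakers. Both that criterion and the helper `rank_generations` are hypothetical, not the repository's documented behaviour.

```python
def rank_generations(
    scores: dict[str, dict[str, float]],
    keys: tuple[str, ...] = ("rouge", "bleu-1", "bleu-3", "bleu-4"),
) -> list[str]:
    """Return the generation column names of one HS, best first.

    Assumption: compare on the metrics in `keys` order (ROUGE, then the BLEU
    scores as tie-breakers). Python's sort is stable, so exact ties keep
    their input order.
    """
    return sorted(scores, key=lambda col: tuple(scores[col][k] for k in keys), reverse=True)


# Toy scores for one HS (illustrative values, not results from the notebook)
toy_scores = {
    "DialoGPT_bs0": {"rouge": 0.3, "bleu-1": 0.2, "bleu-3": 0.0, "bleu-4": 0.0},
    "BART_bs0": {"rouge": 0.4, "bleu-1": 0.1, "bleu-3": 0.0, "bleu-4": 0.0},
    "T5_tp2": {"rouge": 0.3, "bleu-1": 0.5, "bleu-3": 0.0, "bleu-4": 0.0},
}
ranking = rank_generations(toy_scores)
```

Here `BART_bs0` comes first on ROUGE. `T5_tp2` beats `DialoGPT_bs0` only through the BLEU-1 tie-breaker.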

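```python
# Model and decoding identifiers, as used in the generation column names
models = ['DialoGPT', 'T5', 'BART', 'BERT', 'GPT-2']
decodings = ['bs', 'tp', 'tk', 'kp']
model_deco = [f'{model}_{decoding}' for model in models for decoding in decodings]  # e.g. 'DialoGPT_bs'
```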

```python
# Best_LM dataset (for each HS, one CN per model is selected)
best_lm = create_best_dataset(generations[:6], ranked_data, evaluated_data, models)
best_lm.head(1)
```

|   | index | CN_ed | selected_CN | selected_config | rouge | bleu-1 | bleu-3 | bleu-4 |
|---|---|---|---|---|---|---|---|---|
| 0 | 746 | Are you suggesting that the 'Prevent' programm... | There is no evidence that Muslims are being 'i... | DialoGPT_bs0 | 0.129032 | 0.25 | 0.0 | 0.0 |


```python
# Best_D dataset (for each HS, one CN per decoding mechanism is selected)
best_d = create_best_dataset(generations[:6], ranked_data, evaluated_data, decodings)
best_d.head(1)
```

|   | index | CN_ed | selected_CN | selected_config | rouge | bleu-1 | bleu-3 | bleu-4 |
|---|---|---|---|---|---|---|---|---|
| 0 | 746 | Are you suggesting that the 'Prevent' programm... | Islam is a religion of peace, and the British ... | BART_bs0 | 0.193548 | 0.25 | 0.0 | 0.0 |


```python
# Best_LM+D dataset (for each HS, one CN per model-decoding combination is selected)
best_lmd = create_best_dataset(generations[:6], ranked_data, evaluated_data, model_deco)
best_lmd.head(1)
```

|   | index | CN_ed | selected_CN | selected_config | rouge | bleu-1 | bleu-3 | bleu-4 |
|---|---|---|---|---|---|---|---|---|
| 0 | 746 | Are you suggesting that the 'Prevent' programm... | There is no evidence that Muslims are being 'i... | DialoGPT_bs0 | 0.129032 | 0.25 | 0.0 | 0.0 |
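The three best datasets differ only in the grouping used for selection. Within each HS, the best CN is chosen per model, per decoding mechanism, or per model-decoding pair. The pandas sketch below illustrates that idea under several assumptions. It uses a hypothetical long-format score table with one row per HS and generation column, and it selects by highest ROUGE. `select_best` and the toy scores are illustrative only; they are neither the behaviour of `create_best_dataset` nor real results.

```python
import pandas as pd

# Hypothetical long-format scores: one row per (HS index, generation column)
scores = pd.DataFrame({
    "index": [746, 746, 746, 746],
    "config": ["DialoGPT_bs0", "DialoGPT_kp1", "BART_bs0", "BART_tp2"],
    "rouge": [0.30, 0.10, 0.40, 0.20],
})


def select_best(scores: pd.DataFrame, level: str) -> pd.DataFrame:
    """Keep the highest-ROUGE generation per HS and per `level`.

    level: "model" (like Best_LM), "decoding" (like Best_D) or
    "model_deco" (like Best_LM+D).
    """
    # Split "<model>_<decoding><nr>" into its parts (assumed naming convention)
    parts = scores["config"].str.extract(r"^(?P<model>.+)_(?P<decoding>[a-z]+)\d+$")
    keyed = scores.assign(
        model=parts["model"],
        decoding=parts["decoding"],
        model_deco=parts["model"] + "_" + parts["decoding"],
    )
    # idxmax returns, for each (HS, group), the row label of the best score
    best_rows = keyed.groupby(["index", level])["rouge"].idxmax()
    return keyed.loc[best_rows.values, ["index", level, "config", "rouge"]].reset_index(drop=True)
```

With these toy scores, `select_best(scores, "model")` keeps `BART_bs0` and `DialoGPT_bs0`, while `select_best(scores, "decoding")` keeps one CN each for `bs`, `kp` and `tp`.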


# [Syntactic complexity](https://spacy.io/usage/linguistic-features)

- Maximum Syntactic Depth (MSD): the maximum depth among the dependency trees of the sentences composing a CN.
- Average Syntactic Depth (ASD): the average depth of the dependency trees of the sentences composing a CN.
- Number of Sentences (NST): the number of sentences composing a CN.
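Given a dependency parse, the three measures reduce to a few lines. In this sketch each sentence is a list of head indices: token `i` depends on token `heads[i]`, and the root points to itself, as spaCy's `Token.head` does for the root. Depth is assumed to count the tokens on the longest root-to-leaf path, so a single-token sentence has depth 1. The repository's `get_max_sd`, `get_avg_sd` and `get_nst` may count differently. `tree_depth` and `syntactic_complexity` are hypothetical helpers.

```python
from statistics import mean


def tree_depth(heads: list[int]) -> int:
    """Depth of one dependency tree given each token's head index.

    Assumption: the root is its own head, and depth counts the tokens on the
    longest root-to-leaf path.
    """
    def depth_of(i: int) -> int:
        depth = 1
        while heads[i] != i:  # climb from token i up to the root
            i = heads[i]
            depth += 1
        return depth

    return max(depth_of(i) for i in range(len(heads)))


def syntactic_complexity(sentences: list[list[int]]) -> dict[str, float]:
    """MSD, ASD and NST of one CN, given one head list per sentence."""
    if not sentences:
        raise ValueError("a CN must contain at least one sentence")
    depths = [tree_depth(heads) for heads in sentences]
    return {"msd": max(depths), "asd": mean(depths), "nst": len(sentences)}
```

With spaCy, the input could be built from a parsed `doc` as `[[tok.head.i - sent.start for tok in sent] for sent in doc.sents]`. The subtraction is needed because `Token.head.i` is a document-level index.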

```python
# df: DataFrame with one CN per row in the 'CN' column
df['msd'] = df['CN'].apply(get_max_sd)
df['asd'] = df['CN'].apply(get_avg_sd)
df['nst'] = df['CN'].apply(get_nst)
```

# Novelty

The novelty script takes as input two .txt files containing (i) the reference (gold) data and (ii) the generated data.
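This section does not state what `novelty.py` computes. The sketch below assumes one common formulation. Each generated CN scores 1 minus its highest Jaccard similarity (over lower-cased word sets) with any reference CN, and the scores are averaged over the generated CNs. It also assumes one CN per line in each .txt file. `read_lines`, `jaccard` and `novelty` are hypothetical helpers, not the script's code.

```python
def read_lines(path: str) -> list[str]:
    """Read one CN per non-empty line (assumed file format)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def novelty(references: list[str], generations: list[str]) -> float:
    """Average of 1 - max Jaccard similarity between each generation and the references."""
    if not generations:
        raise ValueError("no generated CNs to score")
    ref_sets = [set(r.lower().split()) for r in references]
    scores = []
    for generated in generations:
        gen_set = set(generated.lower().split())
        closest = max((jaccard(gen_set, r) for r in ref_sets), default=0.0)
        scores.append(1.0 - closest)
    return sum(scores) / len(scores)
```

Under these assumptions, the command-line call corresponds to `novelty(read_lines("gold_data.txt"), read_lines("generated_data.txt"))`. A score near 0 means the generations mostly copy the reference vocabulary.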

```python
%cd /path/to/novelty/script/and_data
```



```python
# reference (gold) file first, generated file second
!python novelty.py gold_data.txt generated_data.txt
```