The paper at https://arxiv.org/pdf/1904.02295.pdf analyzes evaluation metrics for three types of models:


1. Cross-aligned autoencoder (CAAE)
2. Adversarially Regularized Autoencoder (ARAE)
3. Delete and Retrieve Models (DAR)

I will adapt the evaluation methodology and code from this paper for our project.

There are 3 main metrics discussed in this paper. 
1. Style Transfer Intensity
2. Naturalness
3. Content Preservation

All three metrics are evaluated both automatically and by human readers, and the results are stored in the evaluation folder. The automated metrics are computed at the sentence and corpus level, while human evaluation is done only at the sentence level. The number in each model's file name is the value of the hyperparameter that was passed in (e.g. "ARAE_lambda_1.npy" means the lambda hyperparameter was 1).
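The file-naming convention described above can be read back programmatically. The following is a minimal sketch, assuming every file follows the `<model>_<hyperparameter>_<value>.npy` pattern of the "ARAE_lambda_1.npy" example; the helper name `parse_model_file` is hypothetical and not part of the paper's code.

```python
import re
from pathlib import Path


def parse_model_file(file_name):
    """Split a name like 'ARAE_lambda_1.npy' into (model, hyperparameter, value).

    Assumes the pattern <model>_<hyperparameter>_<number>.<extension>.
    """
    stem = Path(file_name).stem  # drops the final extension, e.g. '.npy'
    match = re.fullmatch(r"([A-Za-z]+)_(.+)_([0-9]+(?:\.[0-9]+)?)", stem)
    if match is None:
        raise ValueError("unexpected file name: {}".format(file_name))
    model, hyperparameter, value = match.groups()
    return model, hyperparameter, float(value)


print(parse_model_file("ARAE_lambda_1.npy"))
```

Returning the value as a float keeps integer and decimal settings comparable when sorting or grouping result files.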




In [None]:
from google.colab import drive
import os 
import numpy as np 
import csv 
import pandas as pd
import torch
drive.mount('/content/drive', force_remount=True)

# Folder name
folderName = 'UMass/Spring 2022/COMPSCI685/project/'
assert folderName is not None, "[Error] Please enter folder name."


# Load python files from our folder
import sys
sys.path.append('/content/drive/My Drive/{}'.format(folderName))




Mounted at /content/drive


**Content Preservation**

Word Mover's Distance (WMD) measures the distance between two texts using their word embeddings, and we use it to calculate content preservation. The code for this is in "content_preservation.py". Right now, unmasked content preservation returns a score of infinity for every style-transferred sentence in our output except one. Running it with masked style input and a lexicon shared between input and output fails with an import error that should not occur.
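To make the infinity scores easier to reason about, here is a small self-contained sketch of a relaxed WMD, in which each source word moves all of its mass to its nearest target word (a lower bound on the full WMD). It is not the code in "content_preservation.py"; the toy embeddings and the function name `relaxed_wmd` are invented for illustration. It shows one possible cause of an infinite score, which is only an assumption about our case: if none of a sentence's tokens are in the embedding vocabulary, there is nothing to transport and the distance is undefined.

```python
import math
from collections import Counter

# Toy 2-d embeddings, made up purely for this illustration.
TOY_EMBEDDINGS = {
    "love": (1.0, 0.0),
    "adore": (0.9, 0.1),
    "hate": (-1.0, 0.0),
    "daughter": (0.0, 1.0),
    "girl": (0.1, 0.9),
}


def relaxed_wmd(source_tokens, target_tokens, embeddings):
    """Relaxed WMD: every source word sends its whole mass to the nearest target word."""
    # Drop out-of-vocabulary tokens, since they have no embedding to compare.
    source = [t for t in source_tokens if t in embeddings]
    target = [t for t in target_tokens if t in embeddings]
    if not source or not target:
        # No in-vocabulary words on one side: report an infinite distance.
        return float("inf")
    counts = Counter(source)
    total = sum(counts.values())
    cost = 0.0
    for word, count in counts.items():
        nearest = min(math.dist(embeddings[word], embeddings[other]) for other in target)
        cost += (count / total) * nearest
    return cost


print(relaxed_wmd(["love", "daughter"], ["adore", "girl"], TOY_EMBEDDINGS))
print(relaxed_wmd(["thou", "art"], ["love"], TOY_EMBEDDINGS))
```

If this is what is happening in our pipeline, checking how many output tokens survive the vocabulary filter (for example after tokenization or casing changes) would be a quick diagnostic.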



**Aggregated Metric**


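This section follows the evaluation metric from our professor's paper: the joint score J(ACC,SIM,FL), which combines the binary style label (ACC), the similarity score (SIM), and fluency (FL) of each generated sentence.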
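Written out, the score computed in the J(ACC,SIM,FL) cells of this notebook is the per-sentence product of the three metrics, averaged over the test sentences and scaled to the range 0 to 100. The symbols $x_i$ (the i-th generated sentence) and $N$ (the number of test sentences) are notation introduced here for illustration.

```latex
$$
J(\mathrm{ACC}, \mathrm{SIM}, \mathrm{FL})
  = \frac{100}{N} \sum_{i=1}^{N}
    \mathrm{ACC}(x_i) \cdot \mathrm{SIM}(x_i) \cdot \mathrm{FL}(x_i)
$$
```

Because ACC and FL are binary in this notebook, a sentence contributes its similarity score only when it is both labeled as the target style and judged fluent.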

In [None]:

#Store output and binary label 
labeled_predictions = '/content/drive/MyDrive/UMass/Spring 2022/COMPSCI685/project /Sanity Check/Optimization/002/labeled_en_predictions.csv'
output_metrics = {}
with open(labeled_predictions) as csv_file:
  csv_reader = csv.reader(csv_file, delimiter=',')
  next(csv_reader, None)  # skip the header row
  for row in csv_reader:
    # row[1] is the generated sentence, row[2] its binary style label
    output_metrics[row[1]] = {'acc': row[2]}

print(output_metrics)

{'a jumbled confession can only receive A jumbled absolution.': {'acc': '0'}, "I love the rich Capulet's daughter.": {'acc': '1'}, ', but we must have you to marry us.': {'acc': '1'}, "I'll tell thee more in anon how and where we met, How we fell in love, how we exchanged promises, And how we have engaged ourselves, But now I pray thee, give me leave to marry us.": {'acc': '1'}, ', Holy Saint Francis, this is a changeable!': {'acc': '1'}, ', have you given up so suddenly on Rosaline, whom you loved so quickly?': {'acc': '0'}, 'then young men love with their eyes, not with their hearts.': {'acc': '1'}, 'the groans you made of me still Rings in my old ears.': {'acc': '0'}, "'tis an old tear that is not washed away yet.": {'acc': '1'}, 'if ever thou wast, and this grief was thine, Thou didst all for Rosalind.': {'acc': '1'}, 'thou art changed?': {'acc': '1'}, ', repeat this after me : women can not be true when men are so unreliable.': {'acc': '0'}, 'Often didst thou chide for loving Rosa

In [None]:
!pip install -U sentence-transformers

Collecting sentence-transformers
  Downloading sentence-transformers-2.2.0.tar.gz (79 kB)
Collecting transformers<5.0.0,>=4.6.0
  Downloading transformers-4.19.0-py3-none-any.whl (4.2 MB)
Collecting sentencepiece
  Downloading sentencepiece-0.1.96-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Collecting huggingface-hub
  Downloading huggingface_hub-0.6.0-py3-none-any.whl (84 kB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("cointegrated/roberta-large-cola-krishna2020")

model = AutoModelForSequenceClassification.from_pretrained("cointegrated/roberta-large-cola-krishna2020")




In [None]:
path = "/content/drive/MyDrive/UMass/Spring 2022/COMPSCI685/project /Sanity Check/data/Testing/generated_and_actual_text.csv"
df = pd.read_csv(path)
df

Unnamed: 0,Actual Text,Generated Text
0,Riddling confession finds but riddling shrift .,a jumbled confession can only receive A jumble...
1,Then plainly know my heart's dear love is set ...,I love the rich Capulet's daughter.
2,"As mine on hers , so hers is set on mine , And...",", but we must have you to marry us."
3,"When and where and how We met , we wooed and m...",I'll tell thee more in anon how and where we m...
4,"Holy Saint Francis , what a change is here !",", Holy Saint Francis, this is a changeable!"
...,...,...
1457,That's my good son .,"'tis good, my boy."
1458,"But where hast thou been , then ?",", but where hast thou been?"
1459,I'll tell thee ere thou ask it me again .,ere you ask me again.
1460,Both our remedies Within thy help and holy phy...,thou hast sacred power to cure both.


In [None]:
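from tqdm import tqdm

# Run the classifier on the GPU when one is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
model.eval()

for _, row in tqdm(df.iterrows()):
  # Score the generated sentence, which is also the key used in output_metrics
  generated = row['Generated Text']
  tokenized_generated = tokenizer(generated, return_tensors="pt", truncation=True).to(device)
  with torch.no_grad():
    logits = model(**tokenized_generated).logits

  predicted_class_id = logits.argmax().item()
  # This notebook treats LABEL_1 as "not fluent" and any other label as fluent
  if model.config.id2label[predicted_class_id] == 'LABEL_1':
    output_metrics.setdefault(generated, {})['fluency'] = 0
  else:
    output_metrics.setdefault(generated, {})['fluency'] = 1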



5it [00:02,  2.12it/s]


KeyboardInterrupt: ignored

In [None]:
#Calculating parallel J(ACC,SIM,FL) score

import csv
simi_file_path = '/content/drive/MyDrive/UMass/Spring 2022/COMPSCI685/project /Sanity Check/data/Testing/similarity_score.csv'
labeled_predictions_file_path = '/content/drive/MyDrive/UMass/Spring 2022/COMPSCI685/project /Sanity Check/Optimization/002/labeled_en_predictions.csv'
fluency_file_path = '/content/drive/MyDrive/UMass/Spring 2022/COMPSCI685/project /Sanity Check/data/Testing/fluency.csv'
at_start = True
num_test = None
metric_store = []
corpus_metric = 0


with open(simi_file_path) as simi_file, open(labeled_predictions_file_path) as label_file, open(fluency_file_path) as fluency_file:
  simi_reader = csv.reader(simi_file,delimiter=',')
  label_reader = csv.reader(label_file,delimiter=',')
  fluency_reader = csv.reader(fluency_file,delimiter=',')


  for simi_row,label_row,fluency_row in zip(simi_reader,label_reader,fluency_reader):
    if at_start:
      # Skip the header row of each file
      at_start = False
      continue
    simi = float(simi_row[1])
    label = float(label_row[2])
    # csv.reader yields strings, so convert before comparing; a stored value of 1 maps to fluency 0
    fluency = 0 if int(float(fluency_row[1])) == 1 else 1
    metric_store.append([simi,label,fluency])
  
for i in range(len(metric_store)):
  corpus_metric += (float(metric_store[i][0])*100 * float(metric_store[i][1]) * float(metric_store[i][2]))

corpus_metric /= len(metric_store)

print("J(ACC,SIM,FL) of our parallel data set is {}".format(corpus_metric))




J(ACC,SIM,FL) of our parallel data set is 52.847397639466486


In [None]:
#Calculating pseudo-parallel J(ACC,SIM,FL) score

simi_file_path = '/content/drive/MyDrive/UMass/Spring 2022/COMPSCI685/project /Sanity Check/data/Testing/pseudo_parallel/pseudo_similarity_score.csv'
labeled_predictions_file_path = '/content/drive/MyDrive/UMass/Spring 2022/COMPSCI685/project /Sanity Check/data/Testing/pseudo_parallel/labeled_pseudo_parallel_en_predictions.csv'
fluency_file_path = '/content/drive/MyDrive/UMass/Spring 2022/COMPSCI685/project /Sanity Check/data/Testing/pseudo_parallel/pseudo_fluency.csv'


at_start = True
num_test = None
metric_store = []
corpus_metric = 0


with open(simi_file_path) as simi_file, open(labeled_predictions_file_path) as label_file, open(fluency_file_path) as fluency_file:
  simi_reader = csv.reader(simi_file,delimiter=',')
  label_reader = csv.reader(label_file,delimiter=',')
  fluency_reader = csv.reader(fluency_file,delimiter=',')


  for simi_row,label_row,fluency_row in zip(simi_reader,label_reader,fluency_reader):
    if at_start:
      # Skip the header row of each file
      at_start = False
      continue
    simi = float(simi_row[1])
    label = float(label_row[2])
    # csv.reader yields strings, so convert before comparing; a stored value of 1 maps to fluency 0
    fluency = 0 if int(float(fluency_row[1])) == 1 else 1
    metric_store.append([simi,label,fluency])
  
for i in range(len(metric_store)):
  corpus_metric += (float(metric_store[i][0])*100 * float(metric_store[i][1]) * float(metric_store[i][2]))

corpus_metric /= len(metric_store)

print("J(ACC,SIM,FL) of our pseudo-parallel data set is {}".format(corpus_metric))

J(ACC,SIM,FL) of our pseudo-parallel data set is 27.629975075376205
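The parallel and pseudo-parallel cells above repeat the same arithmetic, so it could be factored into one helper. The sketch below is one way to do that, assuming the three per-sentence values have already been read into equally long lists; the function name `joint_score` is hypothetical. It accepts strings as well as numbers because `csv.reader` yields strings.

```python
def joint_score(similarities, accuracies, fluencies):
    """Mean over sentences of SIM * ACC * FL, scaled to 0-100 as in the cells above."""
    if not (len(similarities) == len(accuracies) == len(fluencies)):
        raise ValueError("all three lists must have the same length")
    if len(similarities) == 0:
        raise ValueError("need at least one sentence")
    products = [
        float(sim) * float(acc) * float(fl)
        for sim, acc, fl in zip(similarities, accuracies, fluencies)
    ]
    return 100.0 * sum(products) / len(products)


# Two toy sentences: the second fails the style check, so it contributes 0.
print(joint_score(["0.8", "0.6"], ["1", "0"], [1, 1]))
```

Checking the list lengths up front matters here because `zip` silently stops at the shortest input, which would hide a mismatch between the similarity, label, and fluency files.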
