# Evaluation Metrics
The metrics we will use are:

**1. Style Transfer Intensity (STI)**

STI measures how much a style transfer model has changed the style of a text sample.

**2. Content Preservation Score (CPS)**

CPS measures how well a style transfer model preserves the original content (meaning) of a text sample.


In [1]:
```python
import pandas as pd
from scipy.stats import wasserstein_distance
from sentence_transformers import SentenceTransformer
from scipy.spatial.distance import cosine
import numpy as np
```

In [2]:
```python
# Load the CSV file
file_path = '../../data/data_for_eval.csv'
df = pd.read_csv(file_path)

# These names must match the column headers in the CSV file
source_label_0_col = 'source_label_0'
source_label_1_col = 'source_label_1'
target_label_0_col = 'target_label_0'
target_label_1_col = 'target_label_1'
predicted_label_0_col = 'predicted_label_0'
predicted_label_1_col = 'predicted_label_1'
```
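Before computing the metrics, it can help to confirm that the expected columns exist and that each `*_label_0`/`*_label_1` pair forms a valid two-class probability distribution. The helper below is a minimal sketch under that assumption; `check_label_columns` and its `tol` parameter are hypothetical names, not part of the original notebook, and the frame it is run on is made up.

```python
import numpy as np
import pandas as pd


def check_label_columns(df: pd.DataFrame, prefixes=("source", "target", "predicted"), tol=1e-3):
    """Hypothetical sanity check: each <prefix>_label_0 / <prefix>_label_1 pair must
    exist, lie in [0, 1], and sum to 1 (within tol) on every row.
    Returns a list of problem descriptions (empty if everything looks fine)."""
    problems = []
    for prefix in prefixes:
        cols = [f"{prefix}_label_0", f"{prefix}_label_1"]
        missing = [c for c in cols if c not in df.columns]
        if missing:
            problems.append(f"{prefix}: missing columns {missing}")
            continue
        probs = df[cols].to_numpy(dtype=float)
        if ((probs < 0) | (probs > 1)).any():
            problems.append(f"{prefix}: probabilities outside [0, 1]")
        bad_rows = np.flatnonzero(np.abs(probs.sum(axis=1) - 1) > tol)
        if bad_rows.size:
            problems.append(f"{prefix}: rows {bad_rows.tolist()} do not sum to 1")
    return problems


# Small hand-made frame for illustration (not the real data)
demo = pd.DataFrame({
    "source_label_0": [0.9, 0.2], "source_label_1": [0.1, 0.8],
    "target_label_0": [0.1, 0.5], "target_label_1": [0.9, 0.4],
    "predicted_label_0": [0.3, 0.6], "predicted_label_1": [0.7, 0.4],
})
print(check_label_columns(demo))  # ['target: rows [1] do not sum to 1']
```

On the notebook's data the call would simply be `check_label_columns(df)`; an empty list means the probability columns are usable as style distributions.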

## For Style Transfer Intensity (STI) calculation using Earth Mover's Distance (EMD)
1. EMD Calculation

EMD is calculated between the style distributions (probabilities) of the source text and the predicted text, as well as between the target text and the predicted text.

2. Style Distributions

The style distribution for each text type (source, target, predicted) is represented by the probabilities that the text is neutral (label 0) or subjective (label 1). These are stored in columns such as `source_label_0` and `source_label_1`.

3. EMD for Source-Predicted

For each pair of source and predicted texts, EMD measures how much the style of the text has shifted after the style transfer. A lower EMD indicates a smaller shift, suggesting that the predicted text retains much of the source text's style.

4. EMD for Target-Predicted

Similarly, EMD between the target and predicted texts measures how close the style of the predicted text is to the desired target style. A lower EMD here indicates that the predicted text closely matches the target style.
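The STI computation depends on how `scipy.stats.wasserstein_distance` reads its inputs: its signature is `wasserstein_distance(u_values, v_values, u_weights=None, v_weights=None)`, and the first two arguments are sample *positions*, not probabilities. A minimal sketch of one way to compare two label-probability vectors, assuming label 0 (neutral) and label 1 (subjective) sit at positions 0 and 1 on a line, passes the probabilities as weights. With only two labels, the EMD then reduces to the difference in the subjective probability, |p_1 - q_1|. The helper name `style_emd` is hypothetical; the last call shows that passing the probabilities as values makes a complete style flip look like no change at all.

```python
from scipy.stats import wasserstein_distance

LABEL_POSITIONS = [0, 1]  # label 0 = neutral, label 1 = subjective (assumed placement)


def style_emd(p, q):
    """EMD between two label-probability vectors p = [p0, p1] and q = [q0, q1].

    The labels are the support points and the probabilities are their weights,
    so for two labels the result equals |p1 - q1|.
    """
    return wasserstein_distance(LABEL_POSITIONS, LABEL_POSITIONS, u_weights=p, v_weights=q)


source = [0.9, 0.1]     # confidently neutral
predicted = [0.1, 0.9]  # confidently subjective: a complete style flip

print(style_emd(source, predicted))  # approximately 0.8, i.e. |0.1 - 0.9|

# Passing the probabilities as *values* compares the point sets {0.9, 0.1} and {0.1, 0.9},
# which are identical, so the flip is invisible:
print(wasserstein_distance(source, predicted))  # 0.0
```

Because the two labels are simply ordered points, this EMD is symmetric and bounded between 0 (same distribution) and 1 (all probability mass moved from one label to the other).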

In [3]:
```python
# Calculate EMD for Style Transfer Intensity (STI)

# 'source_label_0' and 'source_label_1' are the probabilities that the source text is neutral and subjective, respectively.
# The same holds for 'target_label_0'/'target_label_1' and 'predicted_label_0'/'predicted_label_1'.
# wasserstein_distance treats its first two arguments as sample positions, so the two labels are
# placed at positions 0 and 1 and the probabilities are passed as their weights.
label_positions = [0, 1]

emd_source_predicted = [wasserstein_distance(label_positions, label_positions,
                                             u_weights=[row[source_label_0_col], row[source_label_1_col]],
                                             v_weights=[row[predicted_label_0_col], row[predicted_label_1_col]])
                        for _, row in df.iterrows()]
emd_target_predicted = [wasserstein_distance(label_positions, label_positions,
                                             u_weights=[row[target_label_0_col], row[target_label_1_col]],
                                             v_weights=[row[predicted_label_0_col], row[predicted_label_1_col]])
                        for _, row in df.iterrows()]

# Encode the source and predicted texts for the Content Preservation Score (CPS)
model = SentenceTransformer('bert-base-nli-mean-tokens')
source_embeddings = model.encode(df['source_text'].tolist())
predicted_embeddings = model.encode(df['predictions'].tolist())
```

## For Content Preservation Score (CPS) calculation

**1. Compute Sentence Embeddings**

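SentenceTransformer Model: The code uses the pre-trained 'bert-base-nli-mean-tokens' model from the sentence-transformers library, a BERT-base model fine-tuned on natural language inference (NLI) data that mean-pools its token embeddings into a single sentence embedding. Its model card now marks it as deprecated because it produces low-quality sentence embeddings, so a newer pre-trained sentence-transformers model may give more reliable content scores.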

Embeddings for Source and Predicted Texts: The model encodes both the source and predicted texts, converting them into high-dimensional vectors (embeddings) that represent their semantic content.

**2. Calculate Cosine Similarity for Content Preservation**

Cosine Similarity: This metric measures the cosine of the angle between two vectors. In the context of sentence embeddings, a higher cosine similarity indicates greater semantic similarity between texts.

Iterative Comparison: The code iterates over each pair of source and predicted embeddings and computes their cosine similarity as 1 minus the cosine distance returned by scipy's `cosine`; a pair whose distance is undefined (NaN, e.g. because one embedding is all zeros) is scored 0. The similarity ranges from -1 to 1, where 1 means the vectors point in the same direction (high semantic similarity), 0 means they are orthogonal (no similarity), and -1 means they point in opposite directions.
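The cosine similarity described above is cos θ = (u · v) / (‖u‖ ‖v‖). Below is a minimal NumPy sketch of that definition, following the notebook's convention of scoring a pair as 0 when either vector has zero length; the helper name `cosine_similarity` is illustrative only. It also checks that `1 - scipy.spatial.distance.cosine(u, v)` gives the same value, since scipy's function returns the cosine *distance*.

```python
import numpy as np
from scipy.spatial.distance import cosine


def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (||u|| * ||v||); returns 0.0 if either vector is all zeros."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    norm_product = np.linalg.norm(u) * np.linalg.norm(v)
    if norm_product == 0:
        return 0.0
    return float(np.dot(u, v) / norm_product)


print(cosine_similarity([1, 0], [2, 0]))   # 1.0: same direction (vector length is ignored)
print(cosine_similarity([1, 0], [0, 3]))   # 0.0: orthogonal
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0: opposite directions

# scipy's cosine() is a distance, so 1 - cosine(u, v) gives the same similarity
u, v = [0.2, 0.5, 0.1], [0.3, 0.4, 0.0]
print(np.isclose(1 - cosine(u, v), cosine_similarity(u, v)))  # True
```

Because only the angle matters, two sentences whose embeddings differ mainly in magnitude still receive a high content preservation score.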

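In [4]:
```python
# scipy's cosine() returns the cosine *distance*, so similarity = 1 - distance.
# A NaN distance (e.g. when an embedding is all zeros) is scored as 0.
distances = [cosine(source_emb, pred_emb) for source_emb, pred_emb in zip(source_embeddings, predicted_embeddings)]
content_scores = [0.0 if np.isnan(d) else 1 - d for d in distances]
```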

In [5]:
```python
# Combine results in a new DataFrame
evaluation_df = pd.DataFrame({
    "Source Text": df['source_text'],
    "Target Text": df['target_text'],
    "Predicted Text": df['predictions'],
    "EMD Source-Predicted": emd_source_predicted,
    "EMD Target-Predicted": emd_target_predicted,
    "Content Preservation Score": content_scores
})
```
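The per-sentence scores can also be summarised at corpus level. A minimal sketch, assuming the mean, standard deviation and range of each metric column are the summaries wanted; the helper `summarize_metrics` is hypothetical and is shown on a small made-up frame rather than on real results.

```python
import pandas as pd

METRIC_COLUMNS = ["EMD Source-Predicted", "EMD Target-Predicted", "Content Preservation Score"]


def summarize_metrics(evaluation_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per metric with its mean, std, min and max."""
    return evaluation_df[METRIC_COLUMNS].agg(["mean", "std", "min", "max"]).T


# Made-up scores for illustration only
demo = pd.DataFrame({
    "EMD Source-Predicted": [0.0, 0.2, 0.4],
    "EMD Target-Predicted": [0.1, 0.1, 0.4],
    "Content Preservation Score": [1.0, 0.9, 0.8],
})
summary = summarize_metrics(demo)
print(summary)
```

Applied to the notebook's own results, the call would be `summarize_metrics(evaluation_df)`.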

In [6]:
```python
print(evaluation_df.head())
```

```
                                         Source Text  \
0  in april 2009 a brazilian human rights group, ...   
1  the 51 day standoff and ensuing murder of 76 m...   
2  mark oaten (born 8 march 1964, watford) is a d...   
3  another infamous period of colonisation in anc...   
4  photo sequence of astonishing 2005 chicagoland...   

                                         Target Text  \
0  in april 2009 a brazilian human rights group, ...   
1  the 51 day standoff and ensuing deaths of 76 m...   
2  mark oaten (born 8 march 1964, watford) is a l...   
3  another period of colonisation in ancient time...   
4  photo sequence of 2005 chicagoland crash with ...   

                                      Predicted Text  EMD Source-Predicted  \
0  in april 2009 a brazilian human rights group, ...              0.000000   
1  the 51 day standoff and ensuing murder of 76 m...              0.000000   
2  mark oaten (born 8 march 1964, watford) is a l...              0.037568   
3  another per
```

In [7]:
```python
# Save the evaluation DataFrame to a new CSV file
evaluation_csv_path = "../../data/evaluated.csv"
evaluation_df.to_csv(evaluation_csv_path, index=False)

print("Evaluation data saved to:", evaluation_csv_path)
```

```
Evaluation data saved to: ../../data/evaluated.csv
```


# Refined Evaluation metrics