## Semantic diversity

According to recent studies (Tevet and Berant, 2021; Stasaski and Hearst, 2022), lexical-level evaluation metrics alone often fail to capture semantic diversity: texts that share many words can differ in meaning, and texts with different words can have similar meanings (Yarats and Lewis, 2018).
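Here is a small illustration of this point. It uses Jaccard overlap of word sets as a stand-in for a lexical-level metric. This is our choice for illustration, not a metric taken from the cited studies, and `tokens` and `jaccard_overlap` are hypothetical helper names.

```python
import re

def tokens(text):
    # Lowercased word tokens; a deliberately simple tokenizer for illustration
    return re.findall(r"[a-z']+", text.lower())

def jaccard_overlap(a, b):
    # Lexical overlap: size of the intersection over size of the union of word types
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb)

# Same words, different meaning: lexical overlap is maximal
print(jaccard_overlap("The dog bit the man.", "The man bit the dog."))  # 1.0
# Similar meaning, no shared words: lexical overlap is zero
print(jaccard_overlap("The film was great.", "I really enjoyed that movie!"))  # 0.0
```

A purely lexical score treats the first pair as identical and the second as unrelated, which is the opposite of how their meanings relate.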

We tackle this problem by transforming sentences into semantically meaningful sentence embeddings using Sentence-BERT (Reimers and Gurevych, 2019).

We quantify semantic diversity as the dispersion of sentence embeddings in the semantic space. It is measured either as the average pairwise cosine distance between all embedding vectors (D_sem_p) or as the mean cosine distance of each embedding vector to their centroid (D_sem_c).
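The two measures are written out below for reference. Here $e_1, \dots, e_n$ are the sentence embeddings of the $n$ texts, $d$ is the cosine distance, and $c$ is the centroid. The notation is ours, and the definitions follow the description above and the code that computes them.

$$
d(u, v) = 1 - \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}, \qquad c = \frac{1}{n} \sum_{i=1}^{n} e_i
$$

$$
D_{\mathrm{sem\_p}} = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} d(e_i, e_j), \qquad
D_{\mathrm{sem\_c}} = \frac{1}{n} \sum_{i=1}^{n} d(e_i, c)
$$

The factor $\frac{2}{n(n-1)}$ is one over the number of unordered pairs, so $D_{\mathrm{sem\_p}}$ is a plain average over pairs.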


In [1]:
from sentence_transformers import SentenceTransformer
import numpy as np
from scipy.spatial.distance import cosine

def calculate_semantic_diversity(texts, measure='average', model=None):
    """
    Calculate semantic diversity for a set of texts.
    
    Parameters:
    - texts (list of str): The set of texts to be evaluated.
    - measure (str): Type of semantic diversity measure: 'average' for the average
                     pairwise cosine distance (D_sem_p), or 'centroid' for the mean
                     cosine distance to the centroid (D_sem_c).
    - model (SentenceTransformer, optional): A pre-loaded model to reuse across
                     calls; if None, 'all-MiniLM-L6-v2' is loaded.
                     
    Returns:
    - float: The semantic diversity score.
    """
    if measure not in ('average', 'centroid'):
        raise ValueError("Measure must be 'average' or 'centroid'")
    
    # Drop empty or whitespace-only segments (e.g. the trailing '' that
    # str.split('.') produces when a text ends with '.'), which would
    # otherwise be embedded as spurious points
    texts = [t.strip() for t in texts if t.strip()]
    if len(texts) < 2:
        raise ValueError("At least two non-empty texts are required")
    
    # Load the pre-trained Sentence-BERT model unless one was passed in
    if model is None:
        model = SentenceTransformer('all-MiniLM-L6-v2')
    
    # Convert texts into embeddings (one row per text)
    embeddings = model.encode(texts)
    
    if measure == 'average':
        # D_sem_p: average cosine distance over all unordered pairs (i < j)
        distances = []
        for i in range(len(embeddings)):
            for j in range(i + 1, len(embeddings)):
                distances.append(cosine(embeddings[i], embeddings[j]))
        semantic_diversity = np.mean(distances)
    else:
        # D_sem_c: mean cosine distance from each embedding to the centroid
        centroid = np.mean(embeddings, axis=0)
        distances = [cosine(embedding, centroid) for embedding in embeddings]
        semantic_diversity = np.mean(distances)
    
    return float(semantic_diversity)


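Each call of `calculate_semantic_diversity` encodes the texts again and computes the pairwise distances with a Python double loop. The sketch below assumes the embeddings are already available as an `(n, d)` NumPy array, for example from `SentenceTransformer('all-MiniLM-L6-v2').encode(texts)`. It computes both measures with matrix operations. `semantic_diversity_from_embeddings` is a hypothetical name, not part of any library. It should agree with the loop-based version up to floating-point rounding.

```python
import numpy as np

def semantic_diversity_from_embeddings(embeddings, measure='average'):
    """Hypothetical vectorized variant operating on a precomputed (n, d) matrix."""
    if measure not in ('average', 'centroid'):
        raise ValueError("Measure must be 'average' or 'centroid'")
    E = np.asarray(embeddings, dtype=float)
    n = E.shape[0]
    if n < 2:
        raise ValueError("At least two embeddings are required")
    if measure == 'average':
        # Normalize rows so that dot products become cosine similarities
        U = E / np.linalg.norm(E, axis=1, keepdims=True)
        sim = U @ U.T
        # Strict upper triangle = all unordered pairs i < j
        i, j = np.triu_indices(n, k=1)
        return float(np.mean(1.0 - sim[i, j]))
    # Centroid measure: cosine distance of each row to the mean embedding
    c = E.mean(axis=0)
    sims = (E @ c) / (np.linalg.norm(E, axis=1) * np.linalg.norm(c))
    return float(np.mean(1.0 - sims))

# Two orthogonal vectors: their cosine distance is exactly 1
E = np.array([[1.0, 0.0], [0.0, 1.0]])
print(semantic_diversity_from_embeddings(E, 'average'))   # 1.0
print(semantic_diversity_from_embeddings(E, 'centroid'))  # about 0.293, i.e. 1 - 1/sqrt(2)
```

Because the function takes embeddings rather than texts, the model only needs to be loaded, and each text encoded, once per text set, and both measures can then be computed from the same matrix.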


In [4]:

# Example texts: responses to the prompt "Generate a story about love."
# sample1.txt: GPT-3.5 output
file_path1 = "/Users/Vas/Documents/Coding_Projects/BA_Experiment_Tests/Metrics/sample1.txt"
# sample2.txt: GPT-4 output
file_path2 = "/Users/Vas/Documents/Coding_Projects/BA_Experiment_Tests/Metrics/sample2.txt"

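# Split each file into sentence-like segments on '.' (naive segmentation:
# segments keep leading whitespace/newlines and may include empty strings)
with open(file_path1, 'r', encoding="utf-8") as file:
    texts1 = file.read().split('.')
    print(texts1)
with open(file_path2, 'r', encoding="utf-8") as file:
    texts2 = file.read().split('.')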


print("Semantic diversity (average) Text 1:", calculate_semantic_diversity(texts1, 'average'))
print("Semantic diversity (average) Text 2:", calculate_semantic_diversity(texts2, 'average'))

print("Semantic diversity (centroid) Text 1:", calculate_semantic_diversity(texts1, 'centroid'))
print("Semantic diversity (centroid) Text 2:", calculate_semantic_diversity(texts2, 'centroid'))


['Once upon a time, in a quaint little village nestled between rolling hills and whispering forests, there lived two souls destined for each other', ' Their names were Emily and Jack', ' Emily was a lively young woman with a heart as boundless as the open sky, while Jack was a gentle soul, with eyes that held the wisdom of ages', '\n\nTheir paths first crossed on a sunny afternoon in the village square', ' Emily had been selling her handcrafted jewelry at the weekly market, her creations shimmering in the sunlight like fragments of a rainbow', ' Jack, a painter by trade, was captivated by her vibrant spirit and the way she seemed to breathe life into the world around her', '\n\nAs weeks turned into months, their chance encounters at the market grew into intentional meetings', ' They shared stories and dreams beneath the shade of an ancient oak tree, their laughter mingling with the rustle of leaves in the breeze', ' With each passing day, their bond deepened, like the roots of the tree