## Self-BLEU
A high Self-BLEU score indicates that the generated texts are very similar to each other, suggesting low diversity, while a lower Self-BLEU score suggests higher diversity. We therefore report 1 - Self-BLEU, which reverses the scale for a more intuitive reading: the higher the reported score, the higher the diversity.
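
Written out, assuming each text is scored against all remaining texts as references (as the code in this section does), the reported quantity can be sketched as follows. The symbols S (the set of texts), N (its size) and s_i (the i-th text) are notation introduced here for illustration:

```latex
\text{Self-BLEU}(S) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{BLEU}\bigl(s_i,\; S \setminus \{s_i\}\bigr),
\qquad
\text{Diversity}(S) = 1 - \text{Self-BLEU}(S)
```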

The weights parameter in sentence_bleu is set to (0.25, 0.25, 0.25, 0.25), which gives equal importance to 1-gram, 2-gram, 3-gram, and 4-gram matches. These are also the default weights in NLTK.
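
To make the role of the weights concrete, here is a minimal pure-Python sketch of how equal weights combine the four n-gram precisions into one score (a weighted geometric mean times a brevity penalty). The names `ngrams`, `modified_precision` and `simple_bleu` are hypothetical helpers written for this illustration. It is a simplified approximation without smoothing, not NLTK's implementation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(references, candidate, n):
    """Clipped n-gram precision of the candidate against several references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

def simple_bleu(references, candidate, weights=(0.25, 0.25, 0.25, 0.25)):
    """Simplified BLEU (illustrative assumption: no smoothing is applied)."""
    precisions = [modified_precision(references, candidate, n)
                  for n in range(1, len(weights) + 1)]
    # Without smoothing, a single zero precision makes the whole score zero
    if min(precisions) == 0:
        return 0.0
    c = len(candidate)
    # Reference length closest to the candidate length (ties go to the shorter)
    r = min((len(ref) for ref in references), key=lambda l: (abs(l - c), l))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    # Weighted geometric mean of the n-gram precisions
    log_mean = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return brevity_penalty * math.exp(log_mean)

reference = "the cat sat on the mat".split()
print(simple_bleu([reference], reference))
print(simple_bleu([reference], "the dog sat on a mat".split()))
```

The second call shares no 3-grams with the reference, so the unsmoothed score collapses to zero even though several unigrams match. For short segments this pushes Self-BLEU down and the reported 1 - Self-BLEU up.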

In [6]:
from nltk.translate.bleu_score import sentence_bleu

def calculate_self_bleu(texts):
    """
    Calculate the Self-BLEU score for a set of texts.
    
    Parameters:
    - texts (list of str): The set of generated texts to be evaluated.
    
    Returns:
    - float: The average Self-BLEU score of the texts, between 0 and 1
      (0 if texts is empty). Report 1 - score to read it as diversity.
    """
    scores = []
    for i, candidate in enumerate(texts):
        # Consider all other texts as references for the current candidate text
        references = [texts[j].split() for j in range(len(texts)) if i != j]
        candidate_tokens = candidate.split()
        # Calculate the BLEU score for this text against all others
        score = sentence_bleu(references, candidate_tokens, weights=(0.25, 0.25, 0.25, 0.25))
        scores.append(score)
    
    # Calculate the average score across all texts
    average_score = sum(scores) / len(scores) if scores else 0
    return average_score


In [7]:

# Example texts
# Prompt: "Generate a Story about love."
# Output of GPT-3.5
file_path1 = "/Users/Vas/Documents/Coding_Projects/BA_Experiment_Tests/Metrics/sample1.txt"
# Output of GPT-4
file_path2 = "/Users/Vas/Documents/Coding_Projects/BA_Experiment_Tests/Metrics/sample2.txt"

with open(file_path1, 'r', encoding="utf-8") as file:
    text1 = file.read()
with open(file_path2, 'r', encoding="utf-8") as file:
    text2 = file.read()

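# Split each text into sentence-like segments on '.'; Self-BLEU is then
# computed across the segments of a single text
generated_text1 = text1.split('.')
generated_text2 = text2.split('.')
self_bleu_score1 = calculate_self_bleu(generated_text1)
self_bleu_score2 = calculate_self_bleu(generated_text2)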
print(f'1 - Self-BLEU (sample 1): {1 - self_bleu_score1}')
print(f'1 - Self-BLEU (sample 2): {1 - self_bleu_score2}')


1 - Self-BLEU (sample 1): 0.9426469781691708
1 - Self-BLEU (sample 2): 0.9874474642014096
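
One caveat about the splitting step: splitting on '.' leaves an empty or whitespace-only segment after the final period, and such a segment has no n-grams to match. It therefore scores 0 and can pull the average Self-BLEU down, which raises the reported diversity. A minimal sketch of a cleanup step is shown below. `split_into_segments` is a hypothetical helper, not part of the original notebook, and applying it would change the numbers printed above.

```python
def split_into_segments(text, delimiter="."):
    """Split text on a delimiter and drop empty or whitespace-only segments."""
    # Strip surrounding whitespace (including newlines) from every piece
    segments = (segment.strip() for segment in text.split(delimiter))
    # Keep only the pieces that still contain text
    return [segment for segment in segments if segment]

sample = "Love came early. It stayed late.\n"
print(split_into_segments(sample))
```

The resulting list could then be passed to calculate_self_bleu in place of the raw result of text.split('.').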
