# Sentence Style Transfer: T5 vs GPT-4o Mini

This notebook demonstrates English sentence style transfer using a fine-tuned T5 model and OpenAI GPT-4o Mini. It covers informal-to-formal and formal-to-informal transfer, BLEU score evaluation, and LLM-based judging.

## 1. Introduction

Style transfer is the task of rewriting text from one style (e.g., informal) to another (e.g., formal) while preserving meaning. This notebook compares a fine-tuned T5 model and GPT-4o Mini for this task.

## 2. Setup and Imports

Install and import all required libraries for model loading, inference, and evaluation.

In [1]:
!pip install torch
!pip install python-dotenv
!pip install openai
!pip install transformers
!pip install nltk



In [4]:
!pip install --upgrade torch

Collecting torch
  Downloading torch-2.2.2-cp39-none-macosx_10_9_x86_64.whl (150.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 150.8/150.8 MB 13.1 MB/s eta 0:00:00
Installing collected packages: torch
  Attempting uninstall: torch
    Found existing installation: torch 1.13.0
    Uninstalling torch-1.13.0:
      Successfully uninstalled torch-1.13.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.14.0 requires torch==1.13.0, but you have torch 2.2.2 which is incompatible.
ctgan 0.6.0 requires torch<2,>=1.8.0, but you have torch 2.2.2 which is incompatible.
Successfully installed torch-2.2.2


In [4]:
!pip install --upgrade transformers



In [3]:
# Install required packages if running in Colab or a fresh environment
# !pip install transformers openai nltk python-dotenv

import os
import openai
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction



In [4]:
import torch
print(torch.__version__)
import transformers
print(transformers.__version__)

2.8.0
4.57.6


In [3]:
!pip uninstall -y transformers
!pip install transformers

Found existing installation: transformers 4.29.2
Uninstalling transformers-4.29.2:
  Successfully uninstalled transformers-4.29.2
Collecting transformers
  Obtaining dependency information for transformers from https://files.pythonhosted.org/packages/52/f3/ac976fa8e305c9e49772527e09fbdc27cc6831b8a2f6b6063406626be5dd/transformers-5.0.0-py3-none-any.whl.metadata
  Downloading transformers-5.0.0-py3-none-any.whl.metadata (37 kB)
Collecting huggingface-hub<2.0,>=1.3.0 (from transformers)
  Obtaining dependency information for huggingface-hub<2.0,>=1.3.0 from https://files.pythonhosted.org/packages/54/89/bfbfde252d649fae8d5f09b14a2870e5672ed160c1a6629301b3e5302621/huggingface_hub-1.3.7-py3-none-any.whl.metadata
  Downloading huggingface_hub-1.3.7-py3-none-any.whl.metadata (13 kB)
Collecting tokenizers<=0.23.0,>=0.22.0 (from transformers)
  Obtaining dependency information for tokenizers<=0.23.0,>=0.22.0 from https://files.pythonhosted.org/packages/2e/47/174dca0502ef88b28f1c9e06b73ce33500eed

## 3. Load Models

Load the fine-tuned T5 model and configure the OpenAI API key used to call GPT-4o Mini for style transfer.

In [None]:
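# Load the fine-tuned T5 model and configure the OpenAI API key
import os

import openai
from dotenv import load_dotenv
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Read variables such as OPENAI_API_KEY from a local .env file.
# This must run before the key is read with os.getenv below.
load_dotenv()

# Fine-tuned T5 checkpoint
model_name = 'prithivida/informal_to_formal_styletransfer'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Set up the OpenAI API key (from the .env file or the environment)
openai.api_key = os.getenv("OPENAI_API_KEY")
# bool() also treats an empty string as "not loaded", matching the checks in the helper functions
print('OpenAI API Key loaded:', bool(openai.api_key))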

## 4. Define Helper Functions

Define functions for style transfer, BLEU score calculation, and LLM-based judging.

In [6]:
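def t5_style_transfer(sentence, direction="informal_to_formal", max_length=64):
    """Rewrite `sentence` with the fine-tuned T5 model, selecting the direction via a task prefix."""
    # Caveat: the checkpoint name suggests it was trained for informal-to-formal only,
    # so the reverse prefix may not produce a real transfer.
    if direction == "informal_to_formal":
        input_text = f"transfer informal to formal: {sentence}"
    else:
        input_text = f"transfer formal to informal: {sentence}"
    # Tokenize the prompt, truncating inputs longer than max_length tokens
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=max_length)
    # Beam search with 4 beams; stop once all beams have finished
    outputs = model.generate(**inputs, max_length=max_length, num_beams=4, early_stopping=True)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)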

def openai_style_transfer(sentence, direction="informal_to_formal", model_name="gpt-4o-mini"):
    if not openai.api_key:
        return "[OpenAI API key not set]"
    if direction == "informal_to_formal":
        system_prompt = "You are a helpful assistant that rewrites informal English sentences into formal English."
        user_prompt = f"Rewrite this sentence in a formal style: {sentence}"
    else:
        system_prompt = "You are a helpful assistant that rewrites formal English sentences into informal English."
        user_prompt = f"Rewrite this sentence in an informal style: {sentence}"
    response = openai.chat.completions.create(
        model=model_name,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        max_tokens=128
    )
    return response.choices[0].message.content.strip()

def bleu_scores(reference, prediction):
    """Return BLEU-1 to BLEU-4 for a tokenized prediction against a tokenized reference."""
    smoothie = SmoothingFunction().method4
    scores = {}
    for n in range(1, 5):
        # Uniform weights over n-gram orders 1..n, e.g. (0.5, 0.5) for BLEU-2
        weights = tuple([1.0 / n] * n)
        scores[f"BLEU-{n}"] = sentence_bleu([reference], prediction, weights=weights, smoothing_function=smoothie)
    return scores

def llm_judge(original, t5_out, llm_out, direction):
    """Ask GPT-4o to compare the two outputs and score each from 1 to 10."""
    # Return early so no prompt is built when the API cannot be called
    if not openai.api_key:
        return "[OpenAI API key not set]"
    prompt = (
        "You are an expert in English style transfer. Given the following original sentence "
        "and two outputs from different models, judge which output is the better "
        f"{direction.replace('_', ' ')} version. Explain your reasoning and give a score (1-10) "
        "for each output. Respond in no more than two lines.\n\n"
        f"Original: {original}\nFine-tuned T5: {t5_out}\nGPT-4o Mini: {llm_out}\n"
    )
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": "You are a helpful judge for style transfer outputs."},
                  {"role": "user", "content": prompt}],
        max_tokens=256
    )
    return response.choices[0].message.content.strip()
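
BLEU-n in `bleu_scores` weights the n-gram orders 1 through n equally. The sketch below illustrates what those weights do: they form a weighted geometric mean of the n-gram precisions, scaled by a brevity penalty. It is a simplified illustration, not NLTK's implementation: it omits the smoothing that the notebook applies, and the names `uniform_weights`, `combine_precisions`, and the toy precision values are made up for this example.

```python
import math

def uniform_weights(n):
    """Uniform weights over n-gram orders 1..n, e.g. (0.5, 0.5) for BLEU-2."""
    return tuple([1.0 / n] * n)

def combine_precisions(precisions, weights, brevity_penalty=1.0):
    """Weighted geometric mean of n-gram precisions times the brevity penalty.

    Simplification: no smoothing, so any zero precision yields a score of 0.
    """
    if any(p == 0 for p in precisions):
        return 0.0
    log_sum = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return brevity_penalty * math.exp(log_sum)

# Hypothetical precisions for orders 1..4, chosen only for illustration
toy_precisions = [0.5, 0.25, 0.125, 0.0625]

for n in range(1, 5):
    weights = uniform_weights(n)
    score = combine_precisions(toy_precisions[:n], weights)
    print(f"BLEU-{n}: weights={weights}, score={score:.4f}")
```

Because higher-order precisions are usually lower, the score tends to drop as n grows, which matches the pattern in the T5 scores reported in Section 7.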

## 5. Example Sentences

Set up example sentences for informal-to-formal and formal-to-informal style transfer.

In [7]:
# Example sentences for both directions
informal_example = "yo, gimme the doc quick!"
formal_example = "Could you please provide the document at your earliest convenience?"

examples = [
    (informal_example, "informal_to_formal"),
    (formal_example, "formal_to_informal")
]

## 6. Run Style Transfer

Apply both models to the example sentences and collect their outputs.

In [8]:
# Store results for each example
results = []
for sent, direction in examples:
    t5_out = t5_style_transfer(sent, direction)
    llm_out = openai_style_transfer(sent, direction)
    results.append({
        "input": sent,
        "direction": direction,
        "t5_out": t5_out,
        "llm_out": llm_out
    })
results

[{'input': 'yo, gimme the doc quick!',
  'direction': 'informal_to_formal',
  't5_out': 'You should see the doctor quickly.',
  'llm_out': '[OpenAI API key not set]'},
 {'input': 'Could you please provide the document at your earliest convenience?',
  'direction': 'formal_to_informal',
  't5_out': 'Could you please provide the document at your earliest convenience?',
  'llm_out': '[OpenAI API key not set]'}]
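
In the run recorded above, both `llm_out` values are the placeholder string returned when no API key is set, and the T5 output for the formal-to-informal example is identical to its input. A small check along the following lines could flag such cases before evaluation. This is a sketch, not part of the original notebook: `flag_output` and `PLACEHOLDER` are hypothetical names, and the recorded results are copied in so the block runs on its own.

```python
# Placeholder string returned by the helper functions when no API key is set
PLACEHOLDER = "[OpenAI API key not set]"

def flag_output(source, output):
    """Label an output as 'placeholder', 'unchanged', or 'ok'."""
    if output.strip() == PLACEHOLDER:
        return "placeholder"
    # Case-insensitive comparison: an echo of the input is not a style transfer
    if output.strip().lower() == source.strip().lower():
        return "unchanged"
    return "ok"

# Results copied from the cell output above
recorded = [
    {"input": "yo, gimme the doc quick!",
     "direction": "informal_to_formal",
     "t5_out": "You should see the doctor quickly.",
     "llm_out": PLACEHOLDER},
    {"input": "Could you please provide the document at your earliest convenience?",
     "direction": "formal_to_informal",
     "t5_out": "Could you please provide the document at your earliest convenience?",
     "llm_out": PLACEHOLDER},
]

for r in recorded:
    flags = {key: flag_output(r["input"], r[key]) for key in ("t5_out", "llm_out")}
    print(r["direction"], flags)
```

A label of `ok` only means the output differs from the input. The first T5 output, for instance, passes this check even though it changes the meaning of the sentence ("doc" becomes "doctor").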

## 7. Evaluate with BLEU Scores

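Compute BLEU-1 to BLEU-4 scores for each model's output. No gold-standard rewrites are available here, so the input sentence itself serves as the reference. The scores therefore measure n-gram overlap with the input (content preservation), not the quality of the style transfer: a score of 1.0 means the output is identical to the input, so no transfer took place. Because the OpenAI API key was not set in this run, `llm_out` holds the placeholder string, and the `llm_bleu` values of 0 below are not meaningful.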

In [10]:
# Calculate BLEU scores for each result
bleu_results = []
for r in results:
    ref = r["input"].split()
    t5_bleu = bleu_scores(ref, r["t5_out"].split())
    llm_bleu = bleu_scores(ref, r["llm_out"].split())
    bleu_results.append({
        "input": r["input"],
        "direction": r["direction"],
        "t5_bleu": t5_bleu,
        "llm_bleu": llm_bleu
    })
bleu_results

[{'input': 'yo, gimme the doc quick!',
  'direction': 'informal_to_formal',
  't5_bleu': {'BLEU-1': 0.16666666666666669,
   'BLEU-2': 0.07728215553472559,
   'BLEU-3': 0.051142590811078435,
   'BLEU-4': 0.03759340464156993},
  'llm_bleu': {'BLEU-1': 0, 'BLEU-2': 0, 'BLEU-3': 0, 'BLEU-4': 0}},
 {'input': 'Could you please provide the document at your earliest convenience?',
  'direction': 'formal_to_informal',
  't5_bleu': {'BLEU-1': 1.0, 'BLEU-2': 1.0, 'BLEU-3': 1.0, 'BLEU-4': 1.0},
  'llm_bleu': {'BLEU-1': 0, 'BLEU-2': 0, 'BLEU-3': 0, 'BLEU-4': 0}}]
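
The T5 BLEU-1 value of about 0.167 for the first example can be reproduced by hand. After whitespace tokenization, the output has six tokens, and only `the` also appears in the input, giving a clipped unigram precision of 1/6. The output is longer than the input, so the brevity penalty is 1. The sketch below computes unsmoothed BLEU-1 in plain Python; `bleu1` is a hypothetical helper written for this illustration, not NLTK's implementation.

```python
import math
from collections import Counter

def bleu1(reference_tokens, hypothesis_tokens):
    """Unsmoothed BLEU-1: clipped unigram precision times the brevity penalty."""
    if not hypothesis_tokens:
        return 0.0
    ref_counts = Counter(reference_tokens)
    hyp_counts = Counter(hypothesis_tokens)
    # Each hypothesis token counts at most as often as it occurs in the reference
    clipped = sum(min(count, ref_counts[token]) for token, count in hyp_counts.items())
    precision = clipped / len(hypothesis_tokens)
    # Brevity penalty: 1 if the hypothesis is longer than the reference,
    # otherwise exp(1 - r/c), which is also 1 when the lengths are equal
    r, c = len(reference_tokens), len(hypothesis_tokens)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * precision

reference = "yo, gimme the doc quick!".split()
hypothesis = "You should see the doctor quickly.".split()
print(bleu1(reference, hypothesis))

# Identical input and output, as in the second example
same = "Could you please provide the document at your earliest convenience?".split()
print(bleu1(same, same))
```

Whitespace splitting keeps punctuation attached to words (`quick!`, `quickly.`), which further reduces the chance of a match; a proper tokenizer would give somewhat different scores.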

## 8. LLM Judge Evaluation

Let GPT-4o (the model set in `llm_judge`) act as a judge that compares the two outputs and scores each one for tone and style.

In [12]:
# Judge each pair using GPT-4o
judge_results = []
for r in results:
    judge = llm_judge(r["input"], r["t5_out"], r["llm_out"], r["direction"])
    judge_results.append({
        "input": r["input"],
        "direction": r["direction"],
        "judge": judge
    })
judge_results

[{'input': 'yo, gimme the doc quick!',
  'direction': 'informal_to_formal',
  'judge': '[OpenAI API key not set]'},
 {'input': 'Could you please provide the document at your earliest convenience?',
  'direction': 'formal_to_informal',
  'judge': '[OpenAI API key not set]'}]
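
The judge prompt in `llm_judge` names each model next to its output, which may influence the judge's preference. One possible variation, not part of the original notebook, is to present the outputs under neutral labels in random order and map the labels back afterwards. The sketch below only builds the prompt and makes no API call; `build_blind_prompt` and its `rng` parameter are hypothetical names introduced for this illustration.

```python
import random

def build_blind_prompt(original, t5_out, llm_out, direction, rng=random):
    """Build a judge prompt with anonymized, randomly ordered outputs.

    Returns the prompt and a dict mapping each label ("A", "B") to its source.
    """
    candidates = [("t5", t5_out), ("llm", llm_out)]
    rng.shuffle(candidates)  # randomize which model appears first
    labels = ["A", "B"]
    key = {label: name for label, (name, _) in zip(labels, candidates)}
    lines = [
        "You are an expert in English style transfer. Given the original sentence "
        "and two candidate rewrites, judge which is the better "
        f"{direction.replace('_', ' ')} version. Explain your reasoning and give "
        "a score (1-10) for each output. Respond in no more than two lines.",
        "",
        f"Original: {original}",
    ]
    for label, (_, text) in zip(labels, candidates):
        lines.append(f"Output {label}: {text}")
    return "\n".join(lines), key

prompt, key = build_blind_prompt(
    "yo, gimme the doc quick!",
    "You should see the doctor quickly.",
    "[OpenAI API key not set]",
    "informal_to_formal",
    rng=random.Random(0),  # seeded only to make this example repeatable
)
print(prompt)
print(key)
```

The returned `key` lets the scores for "Output A" and "Output B" be attributed to the right model once the judge has responded.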