# Shakespearean Text Generation with Hugging Face Transformers

This notebook shows how to load a pretrained causal language model from the Hugging Face Hub and use its `generate()` method to craft text that mimics Shakespeare's diction. Feel free to adapt the prompts and decoding parameters to experiment with different styles.


In [3]:
%pip install --quiet transformers torch accelerate

Note: you may need to restart the kernel to use updated packages.


## Load a lightweight Shakespeare-friendly model

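The `distilgpt2` checkpoint is a distilled version of GPT-2 (about 82M parameters) that runs on most laptops, including CPU-only machines. It is a general-purpose model that has not been fine-tuned on Shakespeare, so the style has to come from the prompt, and the output is often only loosely coherent. For better results you can swap in larger checkpoints (e.g., `gpt2-medium`, `tiiuae/falcon-7b-instruct`) if your hardware allows it.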


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

MODEL_NAME = "distilgpt2"

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {DEVICE}")

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.to(DEVICE)
model.eval();



In [None]:
def generate_shakespearean_text(
    prompt: str,
    max_new_tokens: int = 120,
    temperature: float = 0.95,
    top_p: float = 0.92,
    repetition_penalty: float = 1.05,
    seed: int | None = 42,
) -> str:
    """Generate Shakespearean-style text using nucleus sampling."""
    if seed is not None:
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)

    inputs = tokenizer(prompt, return_tensors="pt").to(DEVICE)

    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
            repetition_penalty=repetition_penalty,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )

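    # output_ids[0] holds the prompt tokens followed by the new tokens,
    # so the decoded string begins with the original prompt.
    generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)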
    return generated_text
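
To build intuition for the `repetition_penalty` argument used above, here is a minimal pure-Python sketch of the commonly described rule: scores of tokens that already appeared are divided by the penalty when positive and multiplied by it when negative, so they become less likely either way. The helper name `apply_repetition_penalty` and the toy scores are hypothetical, and this is an illustration of the idea rather than the library's actual code.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.05):
    """Return a copy of `logits` with already-seen tokens made less likely."""
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Positive scores shrink toward zero; negative scores move further down.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


# Hypothetical scores for a three-token vocabulary; tokens 0 and 1 were already generated.
example_logits = [2.0, -1.0, 0.5]
penalized = apply_repetition_penalty(example_logits, seen_token_ids=[0, 1], penalty=2.0)
print(penalized)
```

A penalty of 1.0 leaves the scores unchanged, which is why values only slightly above 1 (such as the default 1.05) give a gentle nudge away from loops.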


## Try a few Shakespearean prompts

Adjust the prompt and decoding hyperparameters to steer tone, length, and creativity. Lower `temperature` or `top_p` for more conservative prose; raise them for adventurous language.
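
To see why these two knobs behave this way, the following sketch applies temperature scaling and nucleus (top-p) filtering to a toy distribution in plain Python. The function names and the four toy scores are assumptions made for illustration; the library's internal implementation differs in detail, but the effect on the distribution is the same in spirit.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


# Hypothetical scores for a four-token vocabulary.
sample_logits = [2.0, 1.0, 0.5, -1.0]
cool = softmax_with_temperature(sample_logits, temperature=0.5)  # sharper distribution
warm = softmax_with_temperature(sample_logits, temperature=1.5)  # flatter distribution
nucleus = top_p_filter(softmax_with_temperature(sample_logits), top_p=0.9)

print("T=0.5:", [round(p, 3) for p in cool])
print("T=1.5:", [round(p, 3) for p in warm])
print("top_p=0.9:", [round(p, 3) for p in nucleus])
```

A low temperature concentrates probability on the top token, while top-p removes the unlikely tail entirely; here the least likely token ends up with zero probability.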


In [None]:
prompt = (
    "Compose a soliloquy in Shakespearean English about a knight who doubts "
    "the justice of his king yet remains loyal."
)

sample = generate_shakespearean_text(
    prompt,
    max_new_tokens=100,
    temperature=0.9,
    top_p=0.9,
    repetition_penalty=1.03,
)
print(sample)
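
GPT-2 style checkpoints such as `distilgpt2` are not instruction-tuned, so they tend to continue the prompt rather than obey it. One option is to format the prompt like the opening of a play script and let the model carry on. The sketch below is one possible way to do that; `build_script_prompt`, the speaker name, and the opening line are all hypothetical and only meant to illustrate the idea.

```python
def build_script_prompt(speaker: str, opening_line: str, stage_direction: str = "") -> str:
    """Format a prompt like a play script so a base model can continue it."""
    lines = []
    if stage_direction:
        lines.append(f"[{stage_direction}]")
    lines.append(f"{speaker.upper()}:")
    lines.append(opening_line)
    return "\n".join(lines)


# Hypothetical example values; replace them with your own scene.
script_prompt = build_script_prompt(
    speaker="Knight",
    opening_line="My king commands, and yet my conscience stirs;",
    stage_direction="Enter a KNIGHT, alone",
)
print(script_prompt)
```

The resulting string can be passed to `generate_shakespearean_text` in place of `prompt`; whether it produces better verse than the instruction-style prompt is worth checking empirically.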
