Fine-tune GPT-2 for creative story generation.

In [None]:
import json

from datasets import Dataset
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2Tokenizer,
    Trainer,
    TrainingArguments,
)

In [None]:
# Load the AI-generated stories dataset
with open("ai_generated_stories.json", "r", encoding="utf-8") as file:
    data = json.load(file)
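
The conversion step assumes `data` is a list of objects that each carry a `story` string. A minimal sketch of a check that could run first is shown here; the helper name `extract_stories` and the choice to skip blank or malformed records are assumptions, not part of the original notebook.

```python
def extract_stories(records):
    """Return the non-empty story strings found in `records` (hypothetical helper)."""
    if not isinstance(records, list):
        raise TypeError("expected the JSON file to contain a list of records")
    stories = []
    for index, item in enumerate(records):
        # Accept only dict records whose "story" value is a non-blank string.
        story = item.get("story") if isinstance(item, dict) else None
        if isinstance(story, str) and story.strip():
            stories.append(story.strip())
        else:
            print(f"Skipping record {index}: missing or empty 'story' field")
    return stories


# Small in-memory example; in the notebook this would be `extract_stories(data)`.
sample = [{"story": "Once upon a time."}, {"title": "no story"}, {"story": "   "}]
print(extract_stories(sample))
```

Skipping bad records instead of raising keeps a single malformed entry from aborting the run; raising an error would be the stricter alternative.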

In [None]:
# Convert JSON to Hugging Face dataset format
texts = [item["story"] for item in data]
dataset = Dataset.from_dict({"text": texts})

In [None]:
# Load GPT-2 Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token, so reuse EOS for padding

In [None]:
# Tokenize the dataset: truncate or pad every story to exactly 512 tokens
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=512)

tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=["text"])

# Load GPT-2 Model
model = GPT2LMHeadModel.from_pretrained("gpt2")
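
Padding to `max_length` with EOS as the pad token has a side effect once labels are built: `DataCollatorForLanguageModeling` with `mlm=False` copies `input_ids` into `labels` and sets every position holding the pad id to -100, which the loss ignores. The sketch below mimics that behavior on plain Python lists. `pad_and_mask` is a hypothetical helper, not a library function, and the short token ids are made up for illustration.

```python
EOS_ID = 50256        # GPT-2's <|endoftext|> id, reused here as the pad id
IGNORE_INDEX = -100   # label value that PyTorch's cross-entropy loss skips


def pad_and_mask(token_ids, max_length):
    """Right-pad `token_ids` to `max_length` and build causal-LM labels."""
    ids = list(token_ids[:max_length])            # truncate if too long
    n_pad = max_length - len(ids)
    input_ids = ids + [EOS_ID] * n_pad            # pad on the right with EOS
    attention_mask = [1] * len(ids) + [0] * n_pad
    # Every position whose id equals the pad id is excluded from the loss.
    labels = [IGNORE_INDEX if t == EOS_ID else t for t in input_ids]
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


example = pad_and_mask([10, 20, 30], max_length=5)
print(example["labels"])
```

Because pad and EOS share the same id, a genuine end-of-text token in a story would be masked as well, so under this setup the model gets no training signal for when to stop.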


In [None]:
# Training Arguments
training_args = TrainingArguments(
    output_dir="./gpt2-story-model",
    evaluation_strategy="epoch",  # newer transformers releases name this `eval_strategy`
    save_strategy="epoch",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=3,
    save_total_limit=2,
    logging_dir="./logs",
    logging_steps=500,  # training loss is logged every 500 steps; shorter runs show "No log"
    report_to="none"
)

# Data collator: with mlm=False it builds causal language-modeling labels from input_ids
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

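# Trainer Initialization
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    eval_dataset=tokenized_dataset,  # same data as training, so validation loss is not a held-out estimate
    tokenizer=tokenizer,  # newer releases deprecate this in favor of `processing_class`
    data_collator=data_collator
)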

# Train the model
trainer.train()

# Save the fine-tuned model
model.save_pretrained("./fine_tuned_gpt2_story")
tokenizer.save_pretrained("./fine_tuned_gpt2_story")
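
The cell above passes the same `tokenized_dataset` as both `train_dataset` and `eval_dataset`, so the validation loss is measured on stories the model has already seen. One way to get a held-out estimate is to split the texts before tokenizing. The sketch below does this in plain Python; the helper name `split_texts`, the 20% evaluation fraction, and the seed are illustrative assumptions.

```python
import random


def split_texts(texts, eval_fraction=0.2, seed=42):
    """Shuffle a copy of `texts` and return (train_texts, eval_texts)."""
    if len(texts) < 2:
        raise ValueError("need at least two texts to hold one out")
    shuffled = list(texts)
    random.Random(seed).shuffle(shuffled)          # seeded for reproducibility
    n_eval = max(1, round(len(shuffled) * eval_fraction))
    n_eval = min(n_eval, len(shuffled) - 1)        # always leave training data
    return shuffled[n_eval:], shuffled[:n_eval]


# Stand-in for the notebook's `texts` list.
stories = [f"story {i}" for i in range(10)]
train_texts, eval_texts = split_texts(stories)
print(len(train_texts), len(eval_texts))
```

Each list could then be wrapped with `Dataset.from_dict({"text": ...})` and tokenized as before. The `datasets` library also offers `Dataset.train_test_split` for the same purpose.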


Map:   0%|          | 0/10 [00:00<?, ? examples/s]

`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.


Epoch  Training Loss  Validation Loss
1      No log         2.98549
2      No log         2.671569
3      No log         2.559867
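
Two things in this table can be worked out by hand. The "No log" entries follow from the run being far shorter than `logging_steps=500`, and each validation loss (a mean per-token cross-entropy in nats) converts to a perplexity via `exp(loss)`. The sketch below assumes a single device and no gradient accumulation, and takes the example count from the `Map` progress line (0/10).

```python
import math

num_examples = 10     # from the `Map` progress line
batch_size = 2        # per_device_train_batch_size
num_epochs = 3        # num_train_epochs
logging_steps = 500   # logging_steps

# Optimizer steps for the whole run (single device, no gradient accumulation).
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
print(f"total steps: {total_steps}, first training-loss log at step {logging_steps}")

# Perplexity is the exponential of the mean cross-entropy loss.
validation_losses = [2.98549, 2.671569, 2.559867]
for epoch, loss in enumerate(validation_losses, start=1):
    print(f"epoch {epoch}: loss {loss:.4f} -> perplexity {math.exp(loss):.2f}")
```

Since the evaluation set here is the training set, these perplexities describe fit to the training stories and say little about generalization.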


('./fine_tuned_gpt2_story/tokenizer_config.json',
 './fine_tuned_gpt2_story/special_tokens_map.json',
 './fine_tuned_gpt2_story/vocab.json',
 './fine_tuned_gpt2_story/merges.txt',
 './fine_tuned_gpt2_story/added_tokens.json')

In [None]:
from transformers import pipeline, GPT2Tokenizer, GPT2LMHeadModel

# Load the fine-tuned model and tokenizer
model_path = "./fine_tuned_gpt2_story"
tokenizer = GPT2Tokenizer.from_pretrained(model_path)
model = GPT2LMHeadModel.from_pretrained(model_path)

# Initialize text generation pipeline
story_generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Test prompt
prompt = "Deep within the enchanted forest, a hidden portal shimmered under the moonlight."

# Generate a new story
generated_story = story_generator(prompt, max_length=250, num_return_sequences=1)

# Print the output
print("Generated Story:\n")
print(generated_story[0]["generated_text"])
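
The call above sets only `max_length` and leaves every other decoding setting at its default. Making the settings explicit can help when the output drifts into repetition. The sketch below collects a few standard `generate` arguments in one place; the specific values are untuned guesses, and `generate_story` is a hypothetical wrapper, not part of the `transformers` API.

```python
# Illustrative decoding settings; the values are assumptions, not tuned results.
GENERATION_KWARGS = {
    "max_new_tokens": 200,        # counts generated tokens only, not the prompt
    "do_sample": True,            # sample instead of greedy decoding
    "top_p": 0.92,                # nucleus sampling threshold
    "temperature": 0.9,           # values below 1.0 sharpen the distribution
    "repetition_penalty": 1.2,    # discourage tokens that already appeared
    "no_repeat_ngram_size": 3,    # forbid repeating any 3-gram verbatim
}


def generate_story(generator, prompt, **overrides):
    """Call a text-generation pipeline with the defaults above plus overrides."""
    settings = {"num_return_sequences": 1, **GENERATION_KWARGS, **overrides}
    outputs = generator(prompt, **settings)
    return outputs[0]["generated_text"]
```

With the pipeline from the cell above, a call would look like `generate_story(story_generator, prompt, temperature=0.8)`.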

Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Generated Story:

Deep within the enchanted forest, a hidden portal shimmered under the moonlight. It was full of mystical power and a twisted message in its hold told of a mysterious king who had cursed the people. Only those who knew the king would find it – and they feared the message wouldn't keep them hidden.

Soon, they stumbled across a massive chamber filled with strange objects inside. It haunted them…but their first clue came when they stopped on the trail…


…and soon, there wasn't anything we could do.

The door to the kingdom came undone…when the king locked it…and whispered…...

…in…the night…for the last time…

…never…told…

A strange, powerful voice told them…a tale…and…they couldn't believe it. They didn't know it, of course, and just as they opened the door, a large force vanished…

…
