## **Fine-tuning GPT using RLHF**

Fine-tuning GPT and BERT using Reinforcement Learning (RLHF)
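Overview of RLHF (Reinforcement Learning from Human Feedback):
RLHF fine-tunes a language model using feedback from human evaluators. Human judgments of model outputs are converted into a scalar reward, and the model is then optimized with reinforcement learning to produce outputs that earn a higher reward. The aim is to improve response quality, factual correctness, and alignment with user intent. The cells below walk through this workflow for GPT-2, with a simple keyword-based function standing in for human feedback.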
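
To make the reward-driven update concrete, here is a minimal sketch in plain Python. It is not PPO and not how TRL works internally: it uses the simpler REINFORCE rule on a toy "policy" that picks one of three canned responses. The response list, `toy_reward`, and `train_toy_policy` are hypothetical names invented for this illustration.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate responses (illustration only).
RESPONSES = ["That sounds good.", "I cannot say.", "No idea."]

def toy_reward(text):
    """Stand-in for human feedback: +1 if the text contains "good", else -1."""
    return 1.0 if "good" in text else -1.0

def train_toy_policy(steps=500, lr=0.1, seed=0):
    """Sample a response, score it, and nudge the policy toward rewarded choices."""
    rng = random.Random(seed)
    logits = [0.0] * len(RESPONSES)
    for _ in range(steps):
        probs = softmax(logits)
        choice = rng.choices(range(len(RESPONSES)), weights=probs)[0]
        reward = toy_reward(RESPONSES[choice])
        # REINFORCE: gradient of log pi(choice) w.r.t. logit i is 1[i == choice] - probs[i]
        for i in range(len(logits)):
            grad = (1.0 if i == choice else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return softmax(logits)

final_probs = train_toy_policy()
print({r: round(p, 3) for r, p in zip(RESPONSES, final_probs)})
```

After training, most of the probability mass should sit on the response that the reward favors. RLHF applies the same idea to a full language model, where the "choices" are generated token sequences.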

**Imports**

In [None]:
# trl 0.12 reworked PPOTrainer; pin to the earlier (classic) API
!pip install "trl<0.12" datasets
!pip install tf-keras
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from trl import PPOTrainer
from datasets import load_dataset
import torch



**Load Pre-trained GPT Model and Tokenizer**

In [None]:
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

**Load Dataset**

In [None]:
# Assumption: IMDB reviews stand in as the training corpus; any dataset with a "text" column would work
dataset = load_dataset("imdb", split="train")

**Tokenize the Text**

In [None]:
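tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token; padding fails without one

def tokenize_function(examples):
    # Return plain lists; dataset.map stores them as columns, so return_tensors is not needed
    return tokenizer(examples['text'], truncation=True, padding=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)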


**Define Reward Model**

In [2]:
def reward_function(output_text):
    # Stand-in for human feedback: a keyword check, used only for demonstration.
    # A real RLHF setup would score outputs with a reward model trained on human preference data.
    if "good" in output_text:
        return 1   # Positive reward
    return -1      # Negative reward
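
A quick way to see what this reward actually measures is to score a few strings by hand. The sketch below repeats the keyword check so that it runs on its own; the candidate sentences are hypothetical and were made up for illustration.

```python
def reward_function(output_text):
    """Keyword heuristic: +1 if "good" appears in the text, else -1."""
    return 1 if "good" in output_text else -1

# Hypothetical model outputs, invented for illustration.
candidates = [
    "This movie was good.",
    "This movie was Good.",     # capital G: the substring check is case-sensitive
    "This movie was dull.",
    "The goods arrived late.",  # matches as a substring although the meaning differs
]

scores = {text: reward_function(text) for text in candidates}
for text, score in scores.items():
    print(f"{score:+d}  {text}")
```

The second and fourth cases show why a keyword check is only a placeholder: the policy can learn to emit the substring "good" without the output being better in any meaningful sense.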

**Reinforcement Learning Fine-tuning with PPO**

In [None]:
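# Assumes the classic TRL PPO API (trl < 0.12); later releases changed PPOTrainer.
from trl import AutoModelForCausalLMWithValueHead, PPOConfig

tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")  # PPO needs a value head
config = PPOConfig(batch_size=8, mini_batch_size=8)
trainer = PPOTrainer(
    config=config,
    model=model,
    ref_model=None,  # a frozen reference copy is created automatically
    tokenizer=tokenizer,
    dataset=tokenized_datasets,
    data_collator=lambda rows: {key: [row[key] for row in rows] for key in rows[0]},
)

**Fine-tuning the model using RL**

In [None]:
# PPO loop: generate a response per prompt, score it, then update the policy.
for epoch in range(3):
    for batch in trainer.dataloader:
        # Use the first 16 tokens of each example as the prompt.
        queries = [torch.tensor(ids[:16]).to(trainer.accelerator.device) for ids in batch["input_ids"]]
        responses = trainer.generate(queries, return_prompt=False, max_new_tokens=20, do_sample=True, pad_token_id=tokenizer.eos_token_id)
        texts = tokenizer.batch_decode(responses, skip_special_tokens=True)
        rewards = [torch.tensor(float(reward_function(text))) for text in texts]
        trainer.step(queries, responses, rewards)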
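
For intuition about what a PPO update optimizes, the following is a minimal sketch of the clipped surrogate objective for a single sampled action, written in plain Python. It covers only the policy term; libraries such as TRL typically combine it with a value loss and a KL penalty against a reference model. The function name `clipped_surrogate` and the example numbers are assumptions made for this illustration.

```python
import math

def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO policy objective for one sample: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    ratio = math.exp(logp_new - logp_old)  # r = pi_new(a|s) / pi_old(a|s)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)

# Same positive advantage, increasingly large shifts away from the old policy.
old_logp = math.log(0.10)
for new_prob in (0.10, 0.12, 0.30):
    value = clipped_surrogate(math.log(new_prob), old_logp, advantage=1.0)
    print(new_prob, round(value, 3))
```

Once the probability ratio leaves the range `[1 - clip_eps, 1 + clip_eps]` in the direction the advantage favors, the objective stops growing, which removes the incentive for very large policy changes in one update.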

**Generate Text using the Fine-tuned Model**

In [None]:
input_text = "Once upon a time"
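# Move the inputs to the model's device (the model may be on a GPU after training)
inputs = tokenizer(input_text, return_tensors="pt").to(next(model.parameters()).device)
outputs = model.generate(
    **inputs, max_length=50, num_return_sequences=1,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token, so reuse EOS
)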

print(tokenizer.decode(outputs[0], skip_special_tokens=True))