## Reinforcement Learning Epsilon-Greedy Strategy in Text Generation

This mini-project applies the epsilon-greedy strategy from reinforcement learning (RL) to an NLP task: text generation with a pre-trained language model (GPT-2).

The epsilon-greedy strategy, commonly used in RL, balances exploration (trying a random action) against exploitation (taking the action currently believed to be best): with probability epsilon the agent explores, and otherwise it exploits.
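To make the rule concrete before applying it to text, here is a minimal sketch of epsilon-greedy action selection over a small list of value estimates. The function name `epsilon_greedy_action` and the three `q_values` are illustrative assumptions, not part of the project code.

```python
import random

def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else the argmax."""
    if rng.random() < epsilon:
        # Exploration: every action is equally likely
        return rng.randrange(len(q_values))
    # Exploitation: the action with the highest estimated value
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical value estimates for three actions (action 1 is the best)
q_values = [0.1, 0.7, 0.2]
rng = random.Random(0)  # seeded generator so the run is repeatable
picks = [epsilon_greedy_action(q_values, epsilon=0.6, rng=rng) for _ in range(1000)]
share_best = picks.count(1) / len(picks)
print(f"best action chosen in {share_best:.0%} of steps")
```

With three actions and epsilon = 0.6, the best action should be chosen in roughly 0.4 + 0.6/3 = 60% of steps, because exploration also lands on it a third of the time.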

In our NLP task, the same rule decides at each step whether the next word (strictly, the next GPT-2 token) comes from the model's top prediction (exploitation) or is drawn uniformly at random from the vocabulary (exploration).

Key highlights of the project include:

* Using GPT-2's greedy next-token prediction as the exploitation step, which relies on the language patterns the model has learned.
* Implementing random token selection as the exploration step, which yields more diverse (though often less coherent) text than the model's own predictions.
* Combining the two with an epsilon value of 0.6, so that each step picks a random token with probability 0.6 and the model's prediction with probability 0.4.
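As a rough check on what an epsilon of 0.6 implies, the sketch below works out how many of the 10 generated tokens are expected to be random. It assumes each step explores independently with probability epsilon (as in a per-step `random.random() < epsilon` test); the helper `prob_k_random` is a hypothetical name introduced here.

```python
from math import comb

epsilon = 0.6   # exploration probability used in this project
steps = 10      # number of tokens generated in the usage loop

# Expected number of exploratory (random) tokens
expected_random = epsilon * steps

def prob_k_random(k, n=steps, p=epsilon):
    """Binomial probability of exactly k exploratory steps out of n."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Chance that all ten tokens come from the model (no exploration at all)
prob_no_random = prob_k_random(0)
# Chance that at least half of the tokens are random
prob_at_least_half = sum(prob_k_random(k) for k in range(5, steps + 1))

print(expected_random, prob_no_random, prob_at_least_half)
```

On average 6 of the 10 tokens are random, and a run with no random tokens at all has probability 0.4^10, about 0.0001, so the generated text is expected to be dominated by exploration.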

In [1]:
import random
import numpy as np
from transformers import GPT2LMHeadModel, GPT2Tokenizer


In [2]:
# Load the pre-trained language model and tokenizer
model = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# Set epsilon for exploration
epsilon = 0.6


In [3]:
# Choose the next token with an epsilon-greedy policy and append it to the text
def epsilon_greedy_policy(text, epsilon):
    if random.random() < epsilon:
        # Exploration: pick a vocabulary entry uniformly at random.
        # get_vocab() keys are raw BPE tokens, so a leading space shows up as 'Ġ'.
        possible_actions = list(tokenizer.get_vocab().keys())
        next_word = random.choice(possible_actions)
    else:
        # Exploitation: greedy decoding (do_sample=False) of exactly one new token
        input_ids = tokenizer.encode(text, return_tensors='pt')
        outputs = model.generate(input_ids, max_length=input_ids.shape[1] + 1, do_sample=False)
        next_word_id = outputs[0, -1].item()
        # decode() restores the token's own leading space, if it has one
        next_word = tokenizer.decode([next_word_id])
    # Both branches append the chosen token after a single space
    generated_text = text + ' ' + next_word
    return generated_text

In [4]:
# Usage
initial_text = "The AI model"
for _ in range(10):  # Let's generate 10 words using epsilon-greedy policy
    initial_text = epsilon_greedy_policy(initial_text, epsilon)
    print(initial_text)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


The AI model Ġtowards


The AI model Ġtowards  a
The AI model Ġtowards  a liness
The AI model Ġtowards  a liness ĠMercedes
The AI model Ġtowards  a liness ĠMercedes Ġhazard
The AI model Ġtowards  a liness ĠMercedes Ġhazard  �
The AI model Ġtowards  a liness ĠMercedes Ġhazard  � Ġmilk
The AI model Ġtowards  a liness ĠMercedes Ġhazard  � Ġmilk Software
The AI model Ġtowards  a liness ĠMercedes Ġhazard  � Ġmilk Software reciation
The AI model Ġtowards  a liness ĠMercedes Ġhazard  � Ġmilk Software reciation Ġput
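The `Ġ` characters in the output above come from the exploration branch, which samples raw entries from `tokenizer.get_vocab()`; GPT-2's byte-level BPE writes a leading space as `Ġ`. The sketch below shows one way to turn such raw tokens into readable text. The helpers `readable_token` and `append_token` are hypothetical, and they only handle the space and newline markers; in `transformers`, calling `tokenizer.decode` on token ids (or `tokenizer.convert_tokens_to_string` on raw tokens) covers the full byte mapping.

```python
def readable_token(raw_token):
    """Map GPT-2's visible whitespace markers back to real whitespace.

    Simplifying assumption: only 'Ġ' (space) and 'Ċ' (newline) are handled;
    the real tokenizer maps all 256 byte values.
    """
    return raw_token.replace("Ġ", " ").replace("Ċ", "\n")

def append_token(text, raw_token):
    """Append a raw vocabulary token without inserting an extra separator."""
    return text + readable_token(raw_token)

# Illustrative raw tokens, modeled on the first few steps of the output above
text = "The AI model"
for raw in ["Ġtowards", "Ġa", "liness", "ĠMercedes"]:
    text = append_token(text, raw)
print(text)  # The AI model towards aliness Mercedes
```

Because a token without the `Ġ` marker continues the previous word, `liness` attaches directly to `a` here. Appending tokens without an extra `' '` separator also avoids the double spaces visible in the original output.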
