# Q-Learning for Prompt Selection (Predefined Questions)

This notebook demonstrates how to use Q-learning to choose between an original prompt and its elaborated version, using predefined test questions and predefined feedback.

## Key Components
- **States**: Features extracted from the prompt (image presence, length category, question form, prompt type, complexity)
- **Actions**: Choose between the original prompt and the elaborated prompt
- **Rewards**: Simulated or predefined feedback on which response was better
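
To make the three components concrete, here is a minimal sketch of a two-action Q-table with epsilon-greedy selection. It is not the `QLearningModel` used in this notebook, whose internals are not shown; `TinyQTable` and its method names are hypothetical. It also assumes each prompt is a one-step episode, so the update has no next-state term, and that ties go to the elaborated prompt.

```python
import random
from collections import defaultdict

ORIGINAL, ELABORATED = 0, 1  # hypothetical action indices for this sketch


class TinyQTable:
    """Illustrative two-action Q-table; not the notebook's QLearningModel."""

    def __init__(self, learning_rate=0.1, exploration_rate=0.3, seed=None):
        self.alpha = learning_rate
        self.epsilon = exploration_rate
        self.rng = random.Random(seed)
        # Every unseen state starts with a Q-value of 0.0 for both actions.
        self.q = defaultdict(lambda: [0.0, 0.0])

    def select_action(self, state):
        # Explore with probability epsilon, otherwise exploit the best action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice([ORIGINAL, ELABORATED])
        values = self.q[state]
        # Assumption: ties are resolved in favour of the elaborated prompt.
        return ORIGINAL if values[ORIGINAL] > values[ELABORATED] else ELABORATED

    def update(self, state, action, reward):
        # One-step update (assumed): move Q(s, a) toward the observed reward.
        old = self.q[state][action]
        self.q[state][action] = old + self.alpha * (reward - old)
        return self.q[state][action]


table = TinyQTable(learning_rate=0.1, exploration_rate=0.0)
state = "complexity:simple|prompt_type:descriptive"
table.update(state, ELABORATED, reward=1.0)
print(table.q[state], table.select_action(state))
```

With `exploration_rate=0.0` the sketch always exploits, so a single positive reward for the elaborated prompt is enough to make it the selected action for that state.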

In [11]:
import sys
import os
import json
from IPython.display import display, HTML

# Add the project root to the path
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..')))

from src.flow.prompt_elaborator import PromptElaborator
from src.rl.q_learning import QLearningModel, ORIGINAL_PROMPT, ELABORATED_PROMPT
from src.model.model_catalogue import ModelCatalogue
from src.model.wrappers import ChatModelWrapper

## Initialize Components

In [12]:
# Initialize the prompt elaborator
elaborator = PromptElaborator(model_name="claude_3_sonnet")

# Initialize the Q-learning model
q_model = QLearningModel(
    learning_rate=0.1,
    exploration_rate=0.3,
    save_path="../results/q_learning_notebook"
)

# Initialize a test model
multimodal_models = ModelCatalogue.get_MLLMs()
print("Available multimodal models:")
for name in multimodal_models.keys():
    print(f"- {name}")

test_model_name = "claude_3_sonnet" if "claude_3_sonnet" in multimodal_models else list(multimodal_models.keys())[0]
print(f"\nUsing model: {test_model_name}")
test_model = ChatModelWrapper(multimodal_models[test_model_name]).model

Available multimodal models:
- oai_4o_latest
- oai_chatgpt_latest
- claude_3_sonnet
- claude_3_haiku
- gemini_1.5_flash
- gemini_1.5_8b_flash
- gemini_1.5_pro
- grok_2_vision
- llava_7b
- llava_13b
- llava_34b
- bakllava_7b

Using model: claude_3_sonnet


## Predefined Testing

In [16]:
# Define a list of test prompts for different types of queries
test_prompts = [
    "Explain the concept of artificial intelligence",
    "How to implement a binary search algorithm?"
]

# Define a function to process a single prompt
def process_test_prompt(prompt, feedback=None):
    print(f"\n{'='*80}\nProcessing prompt: {prompt}\n{'='*80}\n")
    
    # Extract features for the state
    features = q_model.extract_features(prompt)
    print("Extracted features:")
    for key, value in features.items():
        print(f"  {key}: {value}")
    
    # Elaborate the prompt
    elaborated_prompt = elaborator(prompt)
    
    print("\nOriginal prompt:")
    print(prompt)
    
    print("\nElaborated prompt:")
    print(elaborated_prompt)
    
    # Select action using Q-learning
    action = q_model.select_action(prompt)
    action_name = "Original" if action == ORIGINAL_PROMPT else "Elaborated"
    print(f"\nQ-learning selected: {action_name} prompt")
    
    # Get responses from both prompts
    print("\nGetting responses...")
    try:
        original_response = test_model.invoke(prompt)
        elaborated_response = test_model.invoke(elaborated_prompt)
        
        # Extract the response text (truncated to 500 characters when printed)
        orig_content = original_response.content
        elab_content = elaborated_response.content
        
        print("\nResponse to original prompt:")
        print(orig_content[:500] + "..." if len(orig_content) > 500 else orig_content)
        
        print("\nResponse to elaborated prompt:")
        print(elab_content[:500] + "..." if len(elab_content) > 500 else elab_content)
        
        # Process feedback if provided
        if feedback is not None:
            print(f"\nPredefined feedback: {feedback}")
            q_model.process_feedback(feedback)
            
            # Show updated stats
            stats = q_model.get_stats()
            print("\nUpdated Q-learning stats:")
            print(f"Total updates: {stats['total_updates']}")
            print(f"Original selections: {stats['original_percentage']:.1f}%")
            print(f"Elaborated selections: {stats['elaborated_percentage']:.1f}%")
            print(f"Exploration rate: {stats['exploration_rate']:.3f}")
            
            # Save the model
            q_model.save()
            print("\nQ-learning model updated and saved")
        else:
            print("\nNo feedback provided - skipping Q-learning update")
        
        return original_response, elaborated_response
    except Exception as e:
        print(f"Error getting responses: {e}")
        return None, None

## Run Tests with Predefined Feedback

Let's run through our test prompts with predefined feedback.
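
How a feedback label becomes a numeric reward is handled inside `q_model.process_feedback`, which this notebook does not show. One plausible mapping, sketched here purely as an assumption, rewards the selected action when it matches the preferred prompt type and penalizes it otherwise. The function name `feedback_to_reward` and the reward values of +1 and -1 are hypothetical.

```python
# Assumed values; the notebook imports these constants from src.rl.q_learning.
ORIGINAL_PROMPT, ELABORATED_PROMPT = 0, 1

FEEDBACK_TO_ACTION = {
    "original": ORIGINAL_PROMPT,
    "elaborated": ELABORATED_PROMPT,
}


def feedback_to_reward(selected_action, feedback):
    """Return +1.0 if the selected action matches the preferred one, else -1.0."""
    preferred = FEEDBACK_TO_ACTION[feedback.strip().lower()]
    return 1.0 if selected_action == preferred else -1.0


print(feedback_to_reward(ELABORATED_PROMPT, "elaborated"))
print(feedback_to_reward(ORIGINAL_PROMPT, "elaborated"))
```

An unknown label raises `KeyError`, which keeps a typo in the feedback list from silently training the model on a wrong reward.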

In [17]:
# Predefined feedback to simulate user preferences
# This could be replaced with your own judgments after seeing the responses
predefined_feedback = [
    "elaborated",  # For explanatory questions, the elaborated prompt is often better
    "original"     # For how-to questions, prefer the original prompt
]

# Test the first prompt to see responses without giving feedback
process_test_prompt(test_prompts[0])

2025-04-04 18:56:32,734 - langchain_aws.llms.bedrock - INFO - Using Bedrock Invoke API to generate response



Processing prompt: Explain the concept of artificial intelligence

Extracted features:
  has_image: False
  length_category: medium
  is_question: False
  prompt_type: descriptive
  complexity: simple


2025-04-04 18:56:42,430 - langchain_aws.chat_models.bedrock - INFO - The message received from Bedrock: Artificial intelligence (AI) is a rapidly evolving field that encompasses a wide range of techniques and technologies aimed at creating intelligent systems that can perceive, learn, reason, and act in ways that mimic or even surpass human capabilities. To effectively explain this complex concept, please provide a detailed and comprehensive response covering the following aspects:

1. Historical Background:
   - Briefly introduce the origins and key milestones in the development of AI, highlighting pioneers and their contributions.
   - Explain the driving forces behind the pursuit of artificial intelligence, such as the desire to automate tasks, enhance decision-making processes, and push the boundaries of human knowledge.

2. Definition and Components:
   - Provide a clear and concise definition of artificial intelligence, highlighting its fundamental goals and characteristics.
   -


Original prompt:
Explain the concept of artificial intelligence

Elaborated prompt:
Artificial intelligence (AI) is a rapidly evolving field that encompasses a wide range of techniques and technologies aimed at creating intelligent systems that can perceive, learn, reason, and act in ways that mimic or even surpass human capabilities. To effectively explain this complex concept, please provide a detailed and comprehensive response covering the following aspects:

1. Historical Background:
   - Briefly introduce the origins and key milestones in the development of AI, highlighting pioneers and their contributions.
   - Explain the driving forces behind the pursuit of artificial intelligence, such as the desire to automate tasks, enhance decision-making processes, and push the boundaries of human knowledge.

2. Definition and Components:
   - Provide a clear and concise definition of artificial intelligence, highlighting its fundamental goals and characteristics.
   - Discuss the main c

2025-04-04 18:56:49,907 - langchain_aws.chat_models.bedrock - INFO - The message received from Bedrock: Artificial Intelligence (AI) is a branch of computer science that focuses on developing machines and systems that can perform tasks that typically require human intelligence, such as learning, problem-solving, decision-making, perception, and reasoning. The goal of AI is to create intelligent agents that can analyze data, recognize patterns, and make decisions or take actions based on that analysis.

AI involves several approaches and techniques, including:

1. Machine Learning: Machine learning is a subset of AI that involves training algorithms to learn from data and make predictions or decisions without being explicitly programmed. This includes supervised learning (using labeled data), unsupervised learning (finding patterns in unlabeled data), and reinforcement learning (learning through trial and error).

2. Neural Networks: Neural networks are a type of machine learning algori

Error getting responses: An error occurred (ThrottlingException) when calling the InvokeModel operation (reached max retries: 4): Too many requests, please wait before trying again.


(None, None)

Now that you've seen the responses, you can run all tests with predefined feedback. Adjust the feedback first if your own judgment of the responses differs.

In [18]:
# Process all test prompts with predefined feedback
for i, prompt in enumerate(test_prompts):
    if i < len(predefined_feedback):
        process_test_prompt(prompt, predefined_feedback[i])

2025-04-04 18:57:05,304 - langchain_aws.llms.bedrock - INFO - Using Bedrock Invoke API to generate response



Processing prompt: Explain the concept of artificial intelligence

Extracted features:
  has_image: False
  length_category: medium
  is_question: False
  prompt_type: descriptive
  complexity: simple


2025-04-04 18:57:12,370 - langchain_aws.chat_models.bedrock - INFO - The message received from Bedrock: Artificial intelligence (AI) is a broad and fascinating field that involves the development of intelligent systems and machines capable of performing tasks that typically require human-like cognition, such as learning, problem-solving, reasoning, perception, and decision-making. To provide a comprehensive explanation of AI, please address the following points:

1. Definition and History:
   - Provide a concise definition of artificial intelligence.
   - Briefly discuss the historical milestones and pioneers in the development of AI.

2. Applications and Domains:
   - Describe some major domains and applications where AI is currently being utilized or has the potential for significant impact.
   - Provide specific examples of how AI is being applied in fields like healthcare, finance, transportation, and entertainment.

3. Techniques and Approaches:
   - Explain the key techniques and


Original prompt:
Explain the concept of artificial intelligence

Elaborated prompt:
Artificial intelligence (AI) is a broad and fascinating field that involves the development of intelligent systems and machines capable of performing tasks that typically require human-like cognition, such as learning, problem-solving, reasoning, perception, and decision-making. To provide a comprehensive explanation of AI, please address the following points:

1. Definition and History:
   - Provide a concise definition of artificial intelligence.
   - Briefly discuss the historical milestones and pioneers in the development of AI.

2. Applications and Domains:
   - Describe some major domains and applications where AI is currently being utilized or has the potential for significant impact.
   - Provide specific examples of how AI is being applied in fields like healthcare, finance, transportation, and entertainment.

3. Techniques and Approaches:
   - Explain the key techniques and approaches used in

2025-04-04 18:57:18,268 - root - ERROR - Error raised by bedrock service: An error occurred (ThrottlingException) when calling the InvokeModel operation (reached max retries: 4): Too many requests, please wait before trying again.
2025-04-04 18:57:18,273 - langchain_aws.llms.bedrock - INFO - Using Bedrock Invoke API to generate response


Error getting responses: An error occurred (ThrottlingException) when calling the InvokeModel operation (reached max retries: 4): Too many requests, please wait before trying again.

Processing prompt: How to implement a binary search algorithm?

Extracted features:
  has_image: False
  length_category: medium
  is_question: True
  prompt_type: procedural
  complexity: simple


2025-04-04 18:57:21,673 - root - ERROR - Error raised by bedrock service: An error occurred (ThrottlingException) when calling the InvokeModel operation (reached max retries: 4): Too many requests, please wait before trying again.


ThrottlingException: An error occurred (ThrottlingException) when calling the InvokeModel operation (reached max retries: 4): Too many requests, please wait before trying again.
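
The runs above were cut short by `ThrottlingException`. The last one propagated out of the cell, which appears to be because the `elaborator(prompt)` call sits outside the `try` block in `process_test_prompt`. One common way to reduce such failures is to retry with exponential backoff. The helper below is a generic sketch, not part of this project: `invoke_with_backoff`, its parameters, and the string check used to detect throttling are all assumptions.

```python
import random
import time


def invoke_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                        is_retryable=lambda exc: "ThrottlingException" in str(exc),
                        sleep=time.sleep):
    """Run call(), retrying retryable failures with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not throttling.
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            # The base wait doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter spreads out retries from concurrent callers.
            sleep(delay + random.uniform(0, delay / 2))
```

In this notebook it could wrap each model call, for example `invoke_with_backoff(lambda: test_model.invoke(prompt))` or `invoke_with_backoff(lambda: elaborator(prompt))`. Adding a pause between consecutive prompts may also help, since each prompt triggers three model calls in quick succession.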

## View Current Q-Learning Statistics

In [19]:
# Get and display current stats
stats = q_model.get_stats()

print("Current Q-Learning Statistics:")
print(f"Total updates: {stats['total_updates']}")
print(f"Original prompt selections: {stats['original_selected']} ({stats['original_percentage']:.1f}%)")
print(f"Elaborated prompt selections: {stats['elaborated_selected']} ({stats['elaborated_percentage']:.1f}%)")
print(f"Current exploration rate: {stats['exploration_rate']:.3f}")
print(f"Q-table size: {stats['q_table_size']} states")

# Show recent history if available
if 'history' in stats and stats['history']:
    print("\nRecent learning history:")
    for i, entry in enumerate(stats['history'][-5:]):
        print(f"Entry {i+1}:")
        print(f"  Action: {entry['action']}")
        print(f"  Reward: {entry['reward']}")
        print(f"  New Q-value: {entry['new_q_value']:.4f}")

Current Q-Learning Statistics:
Total updates: 0
Original prompt selections: 2 (40.0%)
Elaborated prompt selections: 3 (60.0%)
Current exploration rate: 0.300
Q-table size: 2 states


## Examine the Q-Table

In [20]:
# Function to display the Q-table in a readable format
def display_q_table(q_table, limit=10):
    print(f"Q-Table (showing up to {limit} entries):")
    print("-" * 80)
    print(f"{'State':<50} | {'Original Q-Value':>15} | {'Elaborated Q-Value':>15}")
    print("-" * 80)
    
    # Convert the state keys to a more readable format
    for i, (state_key, q_values) in enumerate(q_table.items()):
        if i >= limit:
            print(f"... and {len(q_table) - limit} more entries")
            break
            
        # Make the state key more readable
        readable_state = state_key.replace("|", ", ")
        if len(readable_state) > 50:
            readable_state = readable_state[:47] + "..."
            
        print(f"{readable_state:<50} | {q_values[0]:15.4f} | {q_values[1]:15.4f}")

# Display the Q-table
display_q_table(q_model.q_table)

Q-Table (showing up to 10 entries):
--------------------------------------------------------------------------------
State                                              | Original Q-Value | Elaborated Q-Value
--------------------------------------------------------------------------------
complexity:simple, has_image:False, is_question... |          0.0000 |          0.0000
complexity:simple, has_image:False, is_question... |          0.0000 |          0.0000
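
The two rows above look identical only because the state key is cut off after 47 characters. Judging from the visible prefix and the `replace("|", ", ")` call, the key seems to be the feature dictionary serialized as sorted `name:value` pairs joined by `|`. The sketch below rebuilds a key under that assumption; `make_state_key` is a hypothetical stand-in for the project's private `_get_state_key`, whose actual implementation is not shown.

```python
def make_state_key(features):
    """Serialize a feature dict as sorted 'name:value' pairs joined by '|' (assumed format)."""
    return "|".join(f"{name}:{features[name]}" for name in sorted(features))


# Feature values copied from the two test prompts' printed output.
descriptive = {
    "has_image": False,
    "length_category": "medium",
    "is_question": False,
    "prompt_type": "descriptive",
    "complexity": "simple",
}
procedural = dict(descriptive, is_question=True, prompt_type="procedural")

for features in (descriptive, procedural):
    print(make_state_key(features).replace("|", ", "))
```

Printed in full, the two keys first differ at `is_question`, which is where the truncated table rows stop. Raising the truncation width in `display_q_table` would make the rows distinguishable.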


## Custom Test Prompt

Here you can test the trained model with your own prompt to see which prompt type it selects. No feedback is passed, so the Q-table is not updated.

In [21]:
# Enter your custom prompt here
custom_prompt = "Explain the concept of reinforcement learning in AI"

# Process the custom prompt
process_test_prompt(custom_prompt)

2025-04-04 18:58:27,410 - langchain_aws.llms.bedrock - INFO - Using Bedrock Invoke API to generate response



Processing prompt: Explain the concept of reinforcement learning in AI

Extracted features:
  has_image: False
  length_category: long
  is_question: False
  prompt_type: descriptive
  complexity: simple


2025-04-04 18:58:33,538 - langchain_aws.chat_models.bedrock - INFO - The message received from Bedrock: Reinforcement learning is a fundamental concept in artificial intelligence that deals with how an agent can learn to make optimal decisions by interacting with its environment. I'd like you to provide a detailed and easy-to-understand explanation of this concept, covering the following points:

1. **Introduction**: Begin with a brief overview of what reinforcement learning is and why it's important in AI.

2. **Key Components**: Explain the key components of a reinforcement learning system, such as the agent, environment, actions, rewards, and state transitions.

3. **Learning Process**: Describe how the agent learns from its experiences, highlighting the exploration-exploitation trade-off and the role of rewards in shaping the agent's behavior.

4. **Algorithms**: Provide an overview of some popular reinforcement learning algorithms, such as Q-learning, SARSA, and policy gradient me


Original prompt:
Explain the concept of reinforcement learning in AI

Elaborated prompt:
Reinforcement learning is a fundamental concept in artificial intelligence that deals with how an agent can learn to make optimal decisions by interacting with its environment. I'd like you to provide a detailed and easy-to-understand explanation of this concept, covering the following points:

1. **Introduction**: Begin with a brief overview of what reinforcement learning is and why it's important in AI.

2. **Key Components**: Explain the key components of a reinforcement learning system, such as the agent, environment, actions, rewards, and state transitions.

3. **Learning Process**: Describe how the agent learns from its experiences, highlighting the exploration-exploitation trade-off and the role of rewards in shaping the agent's behavior.

4. **Algorithms**: Provide an overview of some popular reinforcement learning algorithms, such as Q-learning, SARSA, and policy gradient methods. You can

2025-04-04 18:58:41,988 - langchain_aws.chat_models.bedrock - INFO - The message received from Bedrock: Reinforcement learning is a type of machine learning technique that deals with how software agents ought to take actions in an environment to maximize some notion of cumulative reward. It is inspired by the way humans and animals learn from experience through trial-and-error interactions with their environment.

In reinforcement learning, an agent (the learner or decision-maker) interacts with an environment by taking actions and receiving rewards or penalties based on the consequences of those actions. The goal of the agent is to learn a policy, which is a mapping from perceived environmental states to actions, that maximizes the expected cumulative reward over time.

The key elements of reinforcement learning are:

1. Environment: The world or system in which the agent operates and interacts with.
2. Agent: The software entity that takes actions and learns from its experiences.
3. 

Error getting responses: An error occurred (ThrottlingException) when calling the InvokeModel operation (reached max retries: 4): Too many requests, please wait before trying again.


(None, None)

## Exploitation Mode Test

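This cell demonstrates how the model behaves in pure exploitation mode (`exploration_rate=0.0`): it picks the action with the higher Q-value and defaults to the elaborated prompt when the two values are equal.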

In [None]:
# Create a temporary model with no exploration
exploit_model = QLearningModel(
    learning_rate=0.1,
    exploration_rate=0.0,  # No exploration, pure exploitation
    save_path="../results/q_learning_notebook"
)

# Load the existing Q-table
exploit_model._load_q_table()

# Test the model with a new prompt
test_prompt = "What are the advantages of quantum computing over classical computing?"

# Extract features for the state
features = exploit_model.extract_features(test_prompt)
state_key = exploit_model._get_state_key(features)

print("Extracted features:")
for key, value in features.items():
    print(f"  {key}: {value}")

# Elaborate the prompt
elaborated_test_prompt = elaborator(test_prompt)

# Get Q-values for this state
if state_key in exploit_model.q_table:
    q_values = exploit_model.q_table[state_key]
    print(f"\nFound Q-values in table: Original={q_values[0]:.4f}, Elaborated={q_values[1]:.4f}")
else:
    q_values = [0.0, 0.0]
    print("\nNo existing Q-values found for this state. Using default values.")

# Select action based on Q-values (pure exploitation)
if q_values[ORIGINAL_PROMPT] > q_values[ELABORATED_PROMPT]:
    selected_action = ORIGINAL_PROMPT
    selected_prompt = test_prompt
    action_name = "Original"
elif q_values[ELABORATED_PROMPT] > q_values[ORIGINAL_PROMPT]:
    selected_action = ELABORATED_PROMPT
    selected_prompt = elaborated_test_prompt
    action_name = "Elaborated"
else:
    # If Q-values are equal, default to elaborated
    selected_action = ELABORATED_PROMPT
    selected_prompt = elaborated_test_prompt
    action_name = "Elaborated (default for equal Q-values)"

print(f"\nSelected: {action_name} prompt")

# Display both prompts
print("\nOriginal prompt:")
print(test_prompt)

print("\nElaborated prompt:")
print(elaborated_test_prompt)

# Get response for the selected prompt
print(f"\nGetting response for the {action_name.lower()} prompt...")
try:
    response = test_model.invoke(selected_prompt)
    print("\nResponse:")
    print(response.content[:500] + "..." if len(response.content) > 500 else response.content)
except Exception as e:
    print(f"Error getting response: {e}")

## Conclusion

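The Q-learning model learns from predefined feedback which prompt type (original or elaborated) works better for different types of queries. Because the prompts and feedback are fixed in advance, you can train the model without collecting live user judgments. In the run recorded above, Bedrock throttling interrupted the calls before any feedback was processed, so the statistics show zero updates and the Q-table still holds its initial values.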