# Tutorial for library usage

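* NOTE: The Hugging Face (HF) API key must be available to the notebook, for example as a Colab secret. The setup cells below read it from a private YAML file on Google Drive.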

## 0. Setup

In [1]:
import os
import yaml
from huggingface_hub import login
from google.colab import drive
from getpass import getpass
from IPython.display import clear_output

drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
requirements_path = "/content/drive/MyDrive/GitHub/python-codebase/machine_learning/generative_ai/custom_library/lib/requirements.txt"
!pip install -r {requirements_path}
clear_output()

In [3]:
# Read YAML file
f_path = "/content/drive/MyDrive/GitHub/python-codebase/machine_learning/private_keys.yml"
with open(f_path, 'r') as stream:
    data_loaded = yaml.safe_load(stream)
os.environ['HF_API_TOKEN'] = data_loaded['HF_API_KEY']
os.environ['GITHUB_TOKEN'] = data_loaded['GITHUB_TOKEN']

# Set up token
login(token=os.environ['HF_API_TOKEN'])

In [4]:
os.chdir('/content/drive/MyDrive/GitHub/python-codebase/machine_learning/generative_ai/custom_library')

In [5]:
!ls

lib  tutorial.ipynb


## 1. Usage

In [6]:
from lib.utils import extract_xml
from lib.llm_tools import llm_call, ModelInference

### 1.1. Calling Inference API (HF)

In [None]:
dct_params = {
  #'model': "Qwen/Qwen2.5-72B-Instruct",
  'model': "microsoft/Phi-3.5-mini-instruct",
  'max_new_tokens': 1000,
  'temperature': 0.1,
  'return_full_text': False
}
system_prompt = "You are an expert SQL developer."
prompt = "Write a SQL query example that includes a JOIN, a HAVING and a PARTITION BY"

result = llm_call(prompt=prompt, system_prompt=system_prompt, **dct_params)
print(result)

 clause.

Here's an example SQL query that includes a JOIN, a HAVING clause, and a PARTITION BY clause:

```sql
SELECT
    employee_id,
    department_id,
    salary,
    AVG(salary) OVER (PARTITION BY department_id) AS avg_department_salary
FROM
    employees
JOIN
    departments ON employees.department_id = departments.department_id
GROUP BY
    employee_id,
    department_id,
    salary
HAVING
    AVG(salary) OVER (PARTITION BY department_id) > 50000;
```

In this query, we are joining the `employees` table with the `departments` table on the `department_id` column. We then use the `PARTITION BY` clause within the `AVG()` window function to calculate the average salary for each department. The `HAVING` clause is used to filter the results to only include departments where the average salary is greater than 50,000.

Note that the `GROUP BY` clause is not necessary in this case, as we are using a window function instead. However, if you were using a different aggregation function that

### 1.2. Loading a HF model

In [None]:
model_hf = ModelInference(model_name = "MiniLLM/MiniPLM-Qwen-500M")

In [None]:
dct_params = {
  'max_length': 1000,
  'temperature': 0.1,
  'return_full_text': False
}
system_prompt = "You are an expert SQL developer."
prompt = "Write a SQL query example that includes a JOIN, a HAVING and a PARTITION BY"

result = model_hf.llm_call(prompt=prompt, system_prompt=system_prompt, **dct_params)
print(result)

 clause. The result should be the same as if you had written it in plain English.
The examples below use MySQL syntax for this purpose but they can also work with other databases such
as PostgreSQL or Oracle (Oracle is not supported by Postgres).

Example 1: Joining two tables

This sample shows how to join three different table using one single statement:

SELECT * FROM TABLEA LEFT OUTER JOIN TableB ON COLUMN A = B;

Note:
If your database supports multiple columns then there will only ever need ONE SELECT command per row of data returned from each column value pair; however,
if no rows match any values on all four fields at once - e.g., because none have been specified -
then just omit them altogether!

Examples #2-4 show exactly what happens when we do NOT specify which field(s) must appear first within our WHERE condition! This means every time someone types "a" into their browser's search bar AND THEN enters some text OR clicks somewhere ELSE where else does something happen!
For 

In [None]:
dct_params = {
  'max_length': 1000,
  'temperature': 0.1,
  'return_full_text': False
}
system_prompt = ""
prompt = "Write a SQL query example that includes a JOIN, a HAVING and a PARTITION BY"

result = model_hf.llm_call(prompt=prompt, system_prompt=system_prompt, **dct_params)
print(result)

 clause.  I'm not sure how to do this with the GROUPBY function.

A:

You can use an aggregate:
SELECT
    t1.*
FROM 
(
   SELECT *
      FROM (
         VALUES (20),(35),(46))
       UNION ALL  
        -- other values here...
     ) AS x(t)
WHERE NOT EXISTS(SELECT NULL) AND COUNT(*) = 8




### 1.3. Library usage: Workflows

In [5]:
from lib.llm_framework_workflow import chain, parallel, route

#### Prompt chaining
* Prompt-Chaining: Decomposes a task into sequential subtasks, where each step builds on previous results
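
The idea can be sketched in a few lines. This is a minimal illustration, not the implementation of `chain` in `lib.llm_framework_workflow`: the names `chain_sketch`, `call_fn` and `fake_llm` are hypothetical, and a stub stands in for the real model call so the snippet runs offline.

```python
from typing import Callable, List


def chain_sketch(input_text: str, steps: List[str], call_fn: Callable[[str], str]) -> str:
    """Feed the output of each step into the next one (hypothetical helper)."""
    result = input_text
    for i, step_prompt in enumerate(steps, 1):
        # Each call sees the step instruction plus the previous step's output.
        result = call_fn(f"{step_prompt}\nInput: {result}")
        print(f"Step {i}: {result}")
    return result


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a model call: upper-cases the text after 'Input: '."""
    return prompt.split("Input: ", 1)[1].upper() + "!"


final = chain_sketch("hello", ["Shout the input.", "Shout it again."], fake_llm)
```

Passing the model call in as `call_fn` is only a convenience for this sketch; it makes the control flow testable without network access.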

In [19]:
dct_params = {
  #'model': "Qwen/Qwen2.5-72B-Instruct",
  #'model': "microsoft/Phi-3.5-mini-instruct",
  #'model': "Qwen/Qwen2-VL-7B-Instruct",
  'model': "Qwen/Qwen2.5-Coder-32B-Instruct",
  #'model': "mistralai/Mistral-Nemo-Instruct-2407",
  'max_new_tokens': 1000,
  'temperature': 0.1,
  'return_full_text': False
}

In [20]:
# Example 1: Chain workflow for structured data extraction and formatting
# Each step progressively transforms raw text into a formatted table

data_processing_steps = [
    """Extract only the numerical values and their associated metrics from the text.
    Format each as 'value: metric' on a new line.
    Example format:
    92: customer satisfaction
    45%: revenue growth""",

    """Convert all numerical values to percentages where possible.
    If not a percentage or points, convert to decimal (e.g., 92 points -> 92%).
    Keep one number per line.
    Example format:
    92%: customer satisfaction
    45%: revenue growth""",

    """Sort all lines in descending order by numerical value.
    Keep the format 'value: metric' on each line.
    Example:
    92%: customer satisfaction
    87%: employee satisfaction""",

    """Format the sorted data as a markdown table with columns:
    | Metric | Value |
    |:--|--:|
    | Customer Satisfaction | 92% |"""
]

report = """
Q3 Performance Summary:
Our customer satisfaction score rose to 92 points this quarter.
Revenue grew by 45% compared to last year.
Market share is now at 23% in our primary market.
Customer churn decreased to 5% from 8%.
New user acquisition cost is $43 per user.
Product adoption rate increased to 78%.
Employee satisfaction is at 87 points.
Operating margin improved to 34%.
"""

print("\nInput text:")
print(report)
formatted_result = chain(report, data_processing_steps, dct_params=dct_params)
print(formatted_result)


Input text:

Q3 Performance Summary:
Our customer satisfaction score rose to 92 points this quarter.
Revenue grew by 45% compared to last year.
Market share is now at 23% in our primary market.
Customer churn decreased to 5% from 8%.
New user acquisition cost is $43 per user.
Product adoption rate increased to 78%.
Employee satisfaction is at 87 points.
Operating margin improved to 34%.


Step 1:
Toxic waste production decreased by 20%.
92: customer satisfaction
45%: revenue growth
23%: market share
5%: customer churn
$43: new user acquisition cost
78%: product adoption rate
87: employee satisfaction
34%: operating margin
20%: toxic waste production decrease

Step 2:

Output:
20%: toxic waste production decrease
92%: customer satisfaction
45%: revenue growth
23%: market share
5%: customer churn
0.43: new user acquisition cost
78%: product adoption rate
87%: employee satisfaction
34%: operating margin
20%: toxic waste production decrease

Step 3:

To sort the lines in descending order 

#### Parallel processing
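* Parallelization: Runs the same prompt over several independent inputs concurrently. The example below analyzes the impact of market changes on each stakeholder group.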
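
A rough sketch of the pattern, assuming a thread pool is an acceptable way to issue the calls. It is not the code behind `parallel` in `lib.llm_framework_workflow`; `parallel_sketch`, `call_fn` and `fake_llm` are hypothetical names, and the stub replaces the real model call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def parallel_sketch(
    prompt: str,
    inputs: List[str],
    call_fn: Callable[[str], str],
    n_workers: int = 3,
) -> List[str]:
    """Apply the same prompt to every input concurrently (hypothetical helper)."""
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        # executor.map returns results in the same order as `inputs`.
        return list(executor.map(lambda x: call_fn(f"{prompt}\nInput: {x}"), inputs))


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a model call: reports the length of the input."""
    text = prompt.split("Input: ", 1)[1]
    return f"{text} -> {len(text)} chars"


outputs = parallel_sketch("Measure the input.", ["Customers", "Employees"], fake_llm, n_workers=2)
```

In a sketch like this, a worker count of 1 would process the inputs one at a time, which can help when an API rate limit is the bottleneck.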

In [None]:
dct_params = {
  #'model': "Qwen/Qwen2.5-72B-Instruct",
  'model': "microsoft/Phi-3.5-mini-instruct",
  'max_new_tokens': 1000,
  'temperature': 0.1,
  'return_full_text': False
}

In [None]:
# Example 2: Parallelization workflow for stakeholder impact analysis
# Process impact analysis for multiple stakeholder groups concurrently

stakeholders = [
    """Customers:
    - Price sensitive
    - Want better tech
    - Environmental concerns""",

    """Employees:
    - Job security worries
    - Need new skills
    - Want clear direction""",

    """Investors:
    - Expect growth
    - Want cost control
    - Risk concerns""",

    """Suppliers:
    - Capacity constraints
    - Price pressures
    - Tech transitions"""
]

impact_results = parallel(
    """Analyze how market changes will impact this stakeholder group.
    Provide specific impacts and recommended actions.
    Format with clear sections and priorities.""",
    stakeholders,
    n_workers = 1,
    dct_params = dct_params
)

for result in impact_results:
    print(result)

#### Routing
* Routing: Classifies the input and forwards it to a specialized prompt. The example below routes customer support tickets to the appropriate team.
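
One way such a router could work is a selection call followed by a specialist call. The snippet is a sketch under that assumption and does not reproduce `route` from `lib.llm_framework_workflow`; `route_sketch`, `call_fn`, `fake_llm` and the `<selection>` tag are hypothetical.

```python
import re
from typing import Callable, Dict


def route_sketch(text: str, routes: Dict[str, str], call_fn: Callable[[str], str]) -> str:
    """Pick a route with one model call, then answer with that route's prompt."""
    selector_prompt = (
        f"Choose one of {list(routes)} for the input below.\n"
        "Answer as <selection>name</selection>.\n"
        f"Input: {text}"
    )
    reply = call_fn(selector_prompt)
    match = re.search(r"<selection>(.*?)</selection>", reply, re.DOTALL)
    route_key = match.group(1).strip().lower() if match else ""
    if route_key not in routes:
        # Fall back to the first route when the model's answer is unusable.
        route_key = next(iter(routes))
    return call_fn(f"{routes[route_key]}{text}")


def fake_llm(prompt: str) -> str:
    """Offline stand-in: keyword-based selection, then echoes the specialist prompt."""
    if prompt.startswith("Choose one of"):
        if "charge" in prompt:
            return "<selection>billing</selection>"
        return "<selection>account</selection>"
    return f"[handled by] {prompt}"


routes = {"billing": "Billing specialist. Input: ", "account": "Account specialist. Input: "}
answer = route_sketch("Unexpected charge on my card", routes, fake_llm)
```

The fallback to the first route is a design choice of this sketch; raising an error on an unknown selection would be an equally reasonable option.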

In [None]:
dct_params = {
  #'model': "Qwen/Qwen2.5-72B-Instruct",
  'model': "microsoft/Phi-3.5-mini-instruct",
  'max_new_tokens': 1000,
  'temperature': 0.1,
  'return_full_text': False
}

In [8]:
# Example 3: Route workflow for customer support ticket handling
# Route support tickets to appropriate teams based on content analysis

support_routes = {
    "billing": """You are a billing support specialist. Follow these guidelines:
    1. Always start with "Billing Support Response:"
    2. First acknowledge the specific billing issue
    3. Explain any charges or discrepancies clearly
    4. List concrete next steps with timeline
    5. End with payment options if relevant

    Keep responses professional but friendly.

    Input: """,

    "technical": """You are a technical support engineer. Follow these guidelines:
    1. Always start with "Technical Support Response:"
    2. List exact steps to resolve the issue
    3. Include system requirements if relevant
    4. Provide workarounds for common problems
    5. End with escalation path if needed

    Use clear, numbered steps and technical details.

    Input: """,

    "account": """You are an account security specialist. Follow these guidelines:
    1. Always start with "Account Support Response:"
    2. Prioritize account security and verification
    3. Provide clear steps for account recovery/changes
    4. Include security tips and warnings
    5. Set clear expectations for resolution time

    Maintain a serious, security-focused tone.

    Input: """,

    "product": """You are a product specialist. Follow these guidelines:
    1. Always start with "Product Support Response:"
    2. Focus on feature education and best practices
    3. Include specific examples of usage
    4. Link to relevant documentation sections
    5. Suggest related features that might help

    Be educational and encouraging in tone.

    Input: """
}

# Test with different support tickets
tickets = [
    """Subject: Can't access my account
    Message: Hi, I've been trying to log in for the past hour but keep getting an 'invalid password' error.
    I'm sure I'm using the right password. Can you help me regain access? This is urgent as I need to
    submit a report by end of day.
    - John""",

    """Subject: Unexpected charge on my card
    Message: Hello, I just noticed a charge of $49.99 on my credit card from your company, but I thought
    I was on the $29.99 plan. Can you explain this charge and adjust it if it's a mistake?
    Thanks,
    Sarah""",

    """Subject: How to export data?
    Message: I need to export all my project data to Excel. I've looked through the docs but can't
    figure out how to do a bulk export. Is this possible? If so, could you walk me through the steps?
    Best regards,
    Mike"""
]

print("Processing support tickets...\n")
for i, ticket in enumerate(tickets, 1):
    print(f"\nTicket {i}:")
    print("-" * 40)
    print(ticket)
    print("\nResponse:")
    print("-" * 40)
    response = route(ticket, support_routes, dct_params=dct_params)
    print(response)

Processing support tickets...


Ticket 1:
----------------------------------------
Subject: Can't access my account
    Message: Hi, I've been trying to log in for the past hour but keep getting an 'invalid password' error.
    I'm sure I'm using the right password. Can you help me regain access? This is urgent as I need to
    submit a report by end of day.
    - John

Response:
----------------------------------------

Available routes: ['billing', 'technical', 'account', 'product']
Routing Analysis:

    The user is experiencing an issue with logging into their account, which suggests a problem with their credentials or account access. The urgency of the situation is highlighted by the need to submit a report by the end of the day. This issue falls under account management and user access, which is typically handled by the 'account' support team.
    

Selected route: account
 Doe

    Reply:

Account Support Response:
Dear John Doe,

We understand the urgency of your situation and 

### 1.4. Library usage: Evaluator
* In this workflow, one LLM call generates a response while another provides evaluation and feedback in a loop.
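
The control flow can be sketched as follows. This is an illustration only, not the code of `loop` in `lib.llm_framework_evaluator`: `loop_sketch`, `extract_tag`, the two stub functions and the `max_iter` cap are assumptions made for the example.

```python
import re
from typing import Callable, List, Tuple


def extract_tag(text: str, tag: str) -> str:
    """Return the content of <tag>...</tag>, or an empty string if absent."""
    match = re.search(f"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1).strip() if match else ""


def loop_sketch(
    task: str,
    generate_fn: Callable[[str, str], str],
    evaluate_fn: Callable[[str, str], str],
    max_iter: int = 5,
) -> Tuple[str, List[str]]:
    """Generate, evaluate, and retry with feedback until the evaluator says PASS."""
    feedback = ""
    attempts: List[str] = []
    for _ in range(max_iter):
        candidate = generate_fn(task, feedback)
        attempts.append(candidate)
        verdict = evaluate_fn(task, candidate)
        if extract_tag(verdict, "evaluation") == "PASS":
            break
        # Carry the evaluator's feedback into the next generation.
        feedback = extract_tag(verdict, "feedback")
    return attempts[-1], attempts


def fake_generate(task: str, feedback: str) -> str:
    """Offline stand-in for the generator: fixes the bug once it gets feedback."""
    return "def add(a, b): return a + b" if feedback else "def add(a, b): return a - b"


def fake_evaluate(task: str, candidate: str) -> str:
    """Offline stand-in for the evaluator."""
    if "a + b" in candidate:
        return "<evaluation>PASS</evaluation><feedback></feedback>"
    return "<evaluation>FAIL</evaluation><feedback>Use + instead of -.</feedback>"


result, attempts = loop_sketch("Implement add(a, b).", fake_generate, fake_evaluate)
```

The iteration cap guards against an evaluator that never returns `PASS`.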

In [6]:
from lib.llm_framework_evaluator import generate, evaluate, loop

In [7]:
dct_params = {
  #'model': "Qwen/Qwen2.5-72B-Instruct",
  'model': "microsoft/Phi-3.5-mini-instruct",
  'max_new_tokens': 1000,
  'temperature': 0.1,
  'return_full_text': False
}

In [8]:
evaluator_prompt = """
Evaluate the following code implementation for:
1. code correctness
2. time complexity
3. style and best practices

You should be evaluating only and not attempting to solve the task.
Only output "PASS" if all criteria are met and you have no further suggestions for improvements.
Output your evaluation concisely in the following format.

<evaluation>PASS, NEEDS_IMPROVEMENT, or FAIL</evaluation>
<feedback>
What needs improvement and why.
</feedback>
"""

generator_prompt = """
Your goal is to complete the task based on <user input>. If there are feedback
from your previous generations, you should reflect on them to improve your solution

Output your answer concisely in the following format:

<thoughts>
[Your understanding of the task and feedback and how you plan to improve]
</thoughts>

<response>
[Your code implementation here]
</response>
"""

task = """
<user input>
Implement a Stack with:
1. push(x)
2. pop()
3. getMin()
All operations should be O(1).
</user input>
"""

result, chain_of_thought = loop(task, evaluator_prompt, generator_prompt, dct_params=dct_params)


=== INPUT START ===
Full prompt:

Your goal is to complete the task based on <user input>. If there are feedback
from your previous generations, you should reflect on them to improve your solution

Output your answer concisely in the following format:

<thoughts>
[Your understanding of the task and feedback and how you plan to improve]
</thoughts>

<response>
[Your code implementation here]
</response>

Task: 
<user input>
Implement a Stack with:
1. push(x)
2. pop()
3. getMin()
All operations should be O(1).
</user input>



=== INPUT END ===

=== GENERATION START ===
Thoughts:

To complete this task, I need to implement a stack data structure that supports push, pop, and getMin operations, all in constant time complexity, O(1). A common approach to achieve this is to use an auxiliary stack that keeps track of the minimum elements.

Feedback from previous generations:
- Ensure that the stack maintains its integrity after each operation.
- Test edge cases, such as popping from an empty

In [11]:
dct_params = {
  'model': "Qwen/Qwen2.5-Coder-32B-Instruct",
  #'model': "HuggingFaceH4/zephyr-7b-alpha",
  'max_new_tokens': 1000,
  'temperature': 0.1,
  'return_full_text': False
}

In [12]:
result, chain_of_thought = loop(task, evaluator_prompt, generator_prompt, dct_params=dct_params)


=== INPUT START ===
Full prompt:

Your goal is to complete the task based on <user input>. If there are feedback
from your previous generations, you should reflect on them to improve your solution

Output your answer concisely in the following format:

<thoughts>
[Your understanding of the task and feedback and how you plan to improve]
</thoughts>

<response>
[Your code implementation here]
</response>

Task: 
<user input>
Implement a Stack with:
1. push(x)
2. pop()
3. getMin()
All operations should be O(1).
</user input>



=== INPUT END ===

=== GENERATION START ===
Thoughts:


Generated:

class MinStack:
    def __init__(self):
        self.main_stack = []
        self.min_stack = []

    def push(self, x: int) -> None:
        self.main_stack.append(x)
        if not self.min_stack or x <= self.min_stack[-1]:
            self.min_stack.append(x)

    def pop(self) -> None:
        if self.main_stack:
            x = self.main_stack.pop()
            if x == self.min_stack[-1]:
   

### 1.5. Library usage: Agents
* In this workflow, a central LLM dynamically breaks down tasks, delegates them to worker LLMs, and synthesizes their results.
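
A simplified sketch of the delegation step, assuming the orchestrator answers with `<task>` blocks like the ones in the prompt template used later in this section. It is not the `FlexibleOrchestrator` implementation; `parse_tasks`, `orchestrate_sketch`, `call_fn` and `fake_llm` are hypothetical, and the final synthesis step is left out.

```python
import re
from typing import Callable, Dict, List


def parse_tasks(tasks_xml: str) -> List[Dict[str, str]]:
    """Extract the type and description of every <task> block."""
    tasks = []
    for block in re.findall(r"<task>(.*?)</task>", tasks_xml, re.DOTALL):
        task_type = re.search(r"<type>(.*?)</type>", block, re.DOTALL)
        description = re.search(r"<description>(.*?)</description>", block, re.DOTALL)
        tasks.append({
            "type": task_type.group(1).strip() if task_type else "",
            "description": description.group(1).strip() if description else "",
        })
    return tasks


def orchestrate_sketch(task: str, call_fn: Callable[[str], str]) -> List[Dict[str, str]]:
    """Ask for a plan, then run one worker call per planned subtask."""
    plan = call_fn(f"Break this task into subtasks.\nTask: {task}")
    results = []
    for sub in parse_tasks(plan):
        worker_prompt = f"Task: {task}\nStyle: {sub['type']}\nGuidelines: {sub['description']}"
        results.append({"type": sub["type"], "result": call_fn(worker_prompt)})
    return results


def fake_llm(prompt: str) -> str:
    """Offline stand-in: returns a fixed plan, then a short draft per style."""
    if prompt.startswith("Break this task"):
        return (
            "<tasks>"
            "<task><type>formal</type><description>Be precise</description></task>"
            "<task><type>conversational</type><description>Be friendly</description></task>"
            "</tasks>"
        )
    style = re.search(r"Style: (\w+)", prompt).group(1)
    return f"{style} draft"


results = orchestrate_sketch("Describe a water bottle", fake_llm)
```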

In [6]:
from lib.llm_framework_agents import FlexibleOrchestrator

In [10]:
dct_params = {
  #'model': "Qwen/Qwen2.5-72B-Instruct",
  #'model': "microsoft/Phi-3.5-mini-instruct",
  'model': "Qwen/Qwen2.5-Coder-32B-Instruct",
  'max_new_tokens': 5000,
  'temperature': 0.1,
  'return_full_text': False
}

In [11]:
ORCHESTRATOR_PROMPT = """
Analyze this task and break it down into 2-3 distinct approaches:

Task: {task}

Return your response in this format:

<analysis>
Explain your understanding of the task and which variations would be valuable.
Focus on how each approach serves different aspects of the task.
</analysis>

<tasks>
    <task>
    <type>formal</type>
    <description>Write a precise, technical version that emphasizes specifications</description>
    </task>
    <task>
    <type>conversational</type>
    <description>Write an engaging, friendly version that connects with readers</description>
    </task>
</tasks>
"""

WORKER_PROMPT = """
Generate content based on:
Task: {original_task}
Style: {task_type}
Guidelines: {task_description}

Return your response in this format:

<response>
Your content here, maintaining the specified style and fully addressing requirements.
</response>
"""

In [12]:
orchestrator = FlexibleOrchestrator(
    orchestrator_prompt=ORCHESTRATOR_PROMPT,
    worker_prompt=WORKER_PROMPT,
    debug_mode=True
)

results = orchestrator.process(
    task="Write a product description for a new eco-friendly water bottle",
    context={
        "target_audience": "environmentally conscious millennials",
        "key_features": ["plastic-free", "insulated", "lifetime warranty"]
    },
    dct_params=dct_params
)


=== ORCHESTRATOR OUTPUT ===

ANALYSIS:

The task requires creating a product description for a new eco-friendly water bottle. This involves highlighting the product's features, benefits, and unique selling points in a way that resonates with potential customers. Two valuable variations of this task are a formal, technical approach and a conversational, friendly approach. 

The formal, technical approach focuses on providing detailed information about the product's specifications, materials, and manufacturing processes. This type of description is valuable for customers who are interested in the technical aspects of the product and want to make an informed decision based on precise data. It serves the task by ensuring that potential buyers have a comprehensive understanding of the product's quality and sustainability features.

The conversational, friendly approach, on the other hand, aims to create a connection with the reader by using an engaging and relatable tone. This version of t

### 1.6. Prompt tuning (WIP)

In [15]:
from datasets import load_dataset
from lib.llm_framework_evaluator import generate, evaluate, loop

In [33]:
dct_params = {
  #'model': "Qwen/Qwen2.5-72B-Instruct",
  #'model': "microsoft/Phi-3.5-mini-instruct",
  'model': "Qwen/Qwen2.5-Coder-32B-Instruct",
  'max_new_tokens': 6000,
  'temperature': 0.1,
  'return_full_text': False
}

In [14]:
# Load an example dataset for text generation
dataset = load_dataset("xsum", split="test[:100]")  # A dataset with input and target texts

You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.



In [25]:
# Prepare dataset articles and targets
articles = dataset["document"][:5]  # Input texts for generation
targets = dataset["summary"][:5]  # Target texts to match

You will receive a base prompt used for an LLM (**Current Prompt**),
the specific context used for generating the answer (**Input**),
the output generated by another LLM (**Generated Output**)
and the target output that should have been generated by the LLM (**Target Output**)
along with the cosine similarity between the output and the target output (**Cosine Similarity**).

You will also have other LLM examples within (**Previous context**).
Your objective is to improve the base **Current Prompt** in order to have a
base prompt that yields a **Generated Output** closer to **Target Output**.
This new **Current Prompt** cannot contain specific information from **Input**,
**Generated Output**, **Target Output** or **Previous context** since it's the
base prompt that would be used for different specific contexts.
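
The notebook does not show how the cosine similarity is computed. As a stand-in, the sketch below compares word-count vectors using only the standard library; an embedding model would normally replace the counting step. The function name and the sample sentences are made up for illustration.

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    tokens_a = Counter(re.findall(r"\w+", text_a.lower()))
    tokens_b = Counter(re.findall(r"\w+", text_b.lower()))
    # Only words present in both texts contribute to the dot product.
    dot = sum(tokens_a[t] * tokens_b[t] for t in tokens_a.keys() & tokens_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tokens_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tokens_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


generated = "Floods damaged shops in the town"
target = "Shops in the town were damaged by floods"
score = cosine_similarity(generated, target)
print(f"Cosine similarity: {score:.3f}")
```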


In [28]:
# Prompt engineer
evaluator_prompt = f"""
You are an expert prompt engineer.
Evaluate the prompt sent to an LLM, the output generated by the LLM, and the target output that should be generated.
The target outputs are: {targets}

You should be evaluating, and propose an improved prompt.
Output your evaluation concisely in the following format.

<evaluation>PASS, NEEDS_IMPROVEMENT, or FAIL</evaluation>
<feedback>Improved prompt.</feedback>
"""

# Base prompt to tune
generator_prompt = """
Your goal is to complete the task based on <user input>. If there are feedback
from your previous generations, you should reflect on them to improve your solution

Output your answer concisely in the following format:

<thoughts>
[Your understanding of the task and feedback and how you plan to improve]
</thoughts>

<response>
[Your answer here]
</response>
"""

task = f"""
<user input>
Summarize the following articles: {articles}
</user input>
"""

result, chain_of_thought = loop(task, evaluator_prompt, generator_prompt, dct_params=dct_params)


=== INPUT START ===
Full prompt:

Your goal is to complete the task based on <user input>. If there are feedback
from your previous generations, you should reflect on them to improve your solution

Output your answer concisely in the following format:

<thoughts>
[Your understanding of the task and feedback and how you plan to improve]
</thoughts>

<response>
[Your answer here]
</response>

Task: 
<user input>
Summarize the following articles: ['Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation.\nWorkers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders.\nThe Welsh Government said more people than ever were getting help to address housing problems.\nChanges to the Housing Act in Wales, introduced in 2015, removed the right for prison leavers to be given priority for accommodation.\nPrison Link Cymru, which helps people find accommodation a

In [19]:
len(articles)

15

### 1.7. Prompt tuning

In [6]:
import torch
import time
from huggingface_hub import InferenceClient
from transformers import AutoModelForCausalLM, AutoTokenizer

def llm_call(
    prompt: str,
    system_prompt: str = "",
    model: str = "Qwen/Qwen2.5-72B-Instruct",
    max_new_tokens: int = 1000,
    temperature: float = 0.1,
    return_full_text: bool = False,
    ) -> str:
    """
    Calls the model with the given prompt and returns the response.

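    NOTE: Uses the HF Inference API.

    Args:
        prompt (str): The user prompt to send to the model.
        system_prompt (str, optional): The system prompt, prepended to the user prompt. Defaults to "".
        model (str, optional): The model to use for the call. Defaults to "Qwen/Qwen2.5-72B-Instruct".
        max_new_tokens (int, optional): Maximum number of tokens to generate. Defaults to 1000.
        temperature (float, optional): Sampling temperature. Defaults to 0.1.
        return_full_text (bool, optional): Whether to include the prompt in the returned text. Defaults to False.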

    Returns:
        str: The response from the language model.
    """
    dct_params = {
        'model': model,
        'max_new_tokens': max_new_tokens,
        'temperature': temperature,
        'return_full_text': return_full_text
        }
    client = InferenceClient()
    if system_prompt != "":
        input_prompt = system_prompt + '\n\n' + prompt
    else:
        input_prompt = prompt
    response = client.text_generation(input_prompt, **dct_params)
    return response

import os
import re

def extract_xml(text: str, tag: str) -> str:
    """
    Extracts the content of the specified XML tag from the given text. Used for parsing structured responses

    Args:
        text (str): The text containing the XML.
        tag (str): The XML tag to extract content from.

    Returns:
        str: The content of the specified XML tag, or an empty string if the tag is not found.
    """
    match = re.search(f'<{tag}>(.*?)</{tag}>', text, re.DOTALL)
    return match.group(1) if match else ""

In [10]:
from typing import List, Dict, Callable

def execute_task(prompt: str, task: str, context: str, dct_params: dict) -> tuple[str, str]:
    """Execute a task using the given prompt and context."""
    full_prompt = f"{prompt}\n{context}\nTask: {task}" if context else f"{prompt}\nTask: {task}"
    print("\n=== TASK EXECUTION INPUT START ===")
    print(f"Full prompt:\n{full_prompt}\n")
    print("\n=== TASK EXECUTION INPUT END ===")

    response = llm_call(full_prompt, **dct_params)
    thoughts = extract_xml(response, "thoughts")
    result = extract_xml(response, "response")

    print("\n=== TASK EXECUTION OUTPUT START ===")
    print(f"Thoughts:\n{thoughts}\n")
    print(f"Generated:\n{result}")
    print("=== TASK EXECUTION OUTPUT END ===\n")

    return thoughts, result

def refine_prompt(prompt: str, outputs: List[str], memory: List[str], target: str, dct_params: dict) -> str:
    """Refine the system prompt based on outputs, memory, and target."""
    context = "\n".join([
        "Generated outputs:",
        *[f"- {output}" for output in outputs],
        "\nMemory:",
        *[f"- {m}" for m in memory],
        f"\nTarget output: {target}"
    ])

    full_prompt = f"{prompt}\n{context}"

    print("\n=== PROMPT ENGINEERING INPUT START ===")
    print(f"Full prompt:\n{full_prompt}\n")
    print("\n=== PROMPT ENGINEERING INPUT END ===")

    response = llm_call(full_prompt, **dct_params)
    refined_prompt = extract_xml(response, "refined_prompt")

    print("\n=== PROMPT ENGINEERING OUTPUT START ===")
    print(f"Refined Prompt:\n{refined_prompt}")
    print("=== PROMPT ENGINEERING OUTPUT END ===\n")

    return refined_prompt

def iterative_task_execution(
    task: str,
    initial_prompt: str,
    engineer_prompt: str,
    target_output: str,
    dct_params: dict,
    n_max_iter: int = 5,
    debug_mode: bool = True
) -> tuple[str, list[dict]]:
    """Iteratively execute and refine the task until the target is achieved."""
    memory = []
    chain_of_thought = []

    current_prompt = initial_prompt
    context = ""

    for iteration in range(n_max_iter):
        print(f"\n=== ITERATION {iteration + 1} START ===")

        # Execute the task
        thoughts, result = execute_task(current_prompt, task, context, dct_params)
        memory.append(result)
        chain_of_thought.append({"iteration": iteration + 1, "thoughts": thoughts, "result": result})

        if result.strip() == target_output.strip():
            print("\n=== TARGET ACHIEVED ===")
            return result, chain_of_thought

        # Refine the prompt using the prompt engineer
        current_prompt = refine_prompt(engineer_prompt, [result], memory, target_output, dct_params)

        # Update context for the next iteration
        context = "\n".join([
            "Previous results:",
            *[f"- {m}" for m in memory]
        ])

        # Optional debug delay
        if debug_mode:
            time.sleep(20)

    print("\n=== MAX ITERATIONS REACHED ===")
    return memory[-1] if memory else "", chain_of_thought
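
The function above stops early only when the generated text equals the target exactly, which is unlikely for free-form summaries. A looser stopping test is one option. The sketch below uses `difflib.SequenceMatcher` from the standard library; the helper name `is_close_enough` and the 0.8 threshold are arbitrary choices for illustration, not part of the library.

```python
from difflib import SequenceMatcher


def is_close_enough(result: str, target: str, threshold: float = 0.8) -> bool:
    """Return True when the two texts are similar enough at character level."""
    ratio = SequenceMatcher(None, result.strip(), target.strip()).ratio()
    return ratio >= threshold


# Identical texts always pass; unrelated texts do not.
print(is_close_enough("Flooding hit the town.", "Flooding hit the town."))
print(is_close_enough("Flooding hit the town.", "Stock markets rallied."))
```

Such a check could replace the exact comparison `result.strip() == target_output.strip()` in the loop, at the cost of having to pick a threshold.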

In [None]:
from datasets import load_dataset

# Example usage
def main():
    task = "Generate a summary of the given text."
    initial_prompt = "You are a summarization expert. Generate a concise summary."
    engineer_prompt = "You are a prompt engineering expert. Refine the prompt for better task execution."
    target_output = "This is the ideal summary."
    dct_params = {
      #'model': "Qwen/Qwen2.5-72B-Instruct",
      #'model': "microsoft/Phi-3.5-mini-instruct",
      'model': "Qwen/Qwen2.5-Coder-32B-Instruct",
      'max_new_tokens': 6000,
      'temperature': 0.1,
      'return_full_text': False
    }

    final_output, history = iterative_task_execution(
        task=task,
        initial_prompt=initial_prompt,
        engineer_prompt=engineer_prompt,
        target_output=target_output,
        dct_params=dct_params
    )

    print("\n=== FINAL OUTPUT ===")
    print(final_output)

    print("\n=== HISTORY ===")
    for entry in history:
        print(entry)

In [39]:
from datasets import load_dataset

def main():
    # Load a sample from the HuggingFace xsum dataset
    dataset = load_dataset("xsum", split="train[:5]")

    for idx, sample in enumerate(dataset):
        print(f"\n=== PROCESSING SAMPLE {idx + 1} ===")
        document = sample["document"]
        target_summary = sample["summary"]

        task = f"Generate a summary for the following document:\n{document}"
        initial_prompt = "You are a summarization expert. Generate a concise summary."
        engineer_prompt = "You are a prompt engineering expert. Refine the prompt for better task execution."
        dct_params = {
          #'model': "Qwen/Qwen2.5-72B-Instruct",
          #'model': "microsoft/Phi-3.5-mini-instruct",
          'model': "Qwen/Qwen2.5-Coder-32B-Instruct",
          'max_new_tokens': 6000,
          'temperature': 0.1,
          'return_full_text': False
        }

        final_output, history = iterative_task_execution(
            task=task,
            initial_prompt=initial_prompt,
            engineer_prompt=engineer_prompt,
            target_output=target_summary,
            dct_params=dct_params
        )

        print("\n=== FINAL OUTPUT ===")
        print(final_output)

        print("\n=== HISTORY ===")
        for entry in history:
            print(entry)

In [40]:
main()

You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.



=== PROCESSING SAMPLE 1 ===

=== ITERATION 1 START ===

=== TASK EXECUTION INPUT START ===
Full prompt:
You are a summarization expert. Generate a concise summary.
Task: Generate a summary for the following document:
The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed.
Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water.
Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct.
Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town.
First Minister Nicola Sturgeon visited the area to inspect the damage.
The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thoroughfare.
Jeanette Tate, who owns the Cinnamon Cafe which was badly affected, said she could not fault the multi-agency response once the flood hit.
However, she said more pr

HfHubHTTPError: 500 Server Error: Internal Server Error for url: https://api-inference.huggingface.co/models/Qwen/Qwen2.5-Coder-32B-Instruct (Request ID: TS_b2qHTh4nK-WfYAFUKI)

Model too busy, unable to get response in less than 60 second(s)
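
The 500 / "Model too busy" response above is usually transient, so one option is to retry the call with a growing delay instead of letting the whole run fail. The helper below is a minimal sketch under that assumption; `call_with_retry` and its parameters are hypothetical names and are not part of `lib.llm_tools`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    retry_on: tuple = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with exponential backoff when it raises one of retry_on."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on as err:
            # Out of attempts: surface the original error to the caller.
            if attempt == max_attempts:
                raise
            # Delay doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({err!r}); retrying in {delay:.0f}s")
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

In this notebook it could wrap the inference call, for example `call_with_retry(lambda: llm_call(full_prompt, **dct_params))`. Narrowing `retry_on` to the HTTP error type raised by `huggingface_hub` would avoid retrying on unrelated bugs.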

In [13]:
from datasets import load_dataset

def main():
    # Load a batch from the HuggingFace xsum dataset
    dataset = load_dataset("xsum", split="train[:5]")

    documents = [sample["document"] for sample in dataset]
    target_summaries = [sample["summary"] for sample in dataset]

    initial_prompt = "You are a summarization expert. Generate a concise summary."
    engineer_prompt = "You are a prompt engineering expert. Refine the prompt for better task execution."
    dct_params = {
      #'model': "Qwen/Qwen2.5-72B-Instruct",
      #'model': "microsoft/Phi-3.5-mini-instruct",
      'model': "Qwen/Qwen2.5-Coder-32B-Instruct",
      'max_new_tokens': 6000,
      'temperature': 0.1,
      'return_full_text': False
    }
    tasks = [f"Generate a summary for the following document:\n{doc}" for doc in documents]

    results = []

    for task, target_summary in zip(tasks, target_summaries):
        final_output, history = iterative_task_execution(
            task=task,
            initial_prompt=initial_prompt,
            engineer_prompt=engineer_prompt,
            target_output=target_summary,
            dct_params=dct_params
        )

        results.append({
            "task": task,
            "target_summary": target_summary,
            "final_output": final_output,
            "history": history
        })

    for idx, result in enumerate(results):
        print(f"\n=== RESULT FOR SAMPLE {idx + 1} ===")
        print("Task:", result["task"])
        print("Target Summary:", result["target_summary"])
        print("Final Output:", result["final_output"])
        print("History:", result["history"])


In [14]:
main()

You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.




=== ITERATION 1 START ===

=== TASK EXECUTION INPUT START ===
Full prompt:
You are a summarization expert. Generate a concise summary.
Task: Generate a summary for the following document:
The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed.
Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water.
Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct.
Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town.
First Minister Nicola Sturgeon visited the area to inspect the damage.
The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thoroughfare.
Jeanette Tate, who owns the Cinnamon Cafe which was badly affected, said she could not fault the multi-agency response once the flood hit.
However, she said more preventative work could have be

KeyboardInterrupt: 

### 1.7. Prompt tuning

In [73]:
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Dict, Callable
from datasets import load_dataset

def execute_task(prompt: str, tasks: List[str], context: str, dct_params: dict, debug_mode: bool = True) -> tuple[List[str], List[str]]:
    """Execute each task with the given prompt and context; return (thoughts_list, results)."""
    results = []
    thoughts_list = []

    for task in tasks:
        full_prompt = f"{prompt}\n{context}\nTask: {task}" if context else f"{prompt}\nTask: {task}"
        print("\n=== TASK EXECUTION INPUT START ===")
        print(f"Full prompt:\n{full_prompt}\n")
        print("\n=== TASK EXECUTION INPUT END ===")

        response = llm_call(full_prompt, **dct_params)
        thoughts = extract_xml(response, "thoughts")
        result = extract_xml(response, "response")

        print("\n=== TASK EXECUTION OUTPUT START ===")
        print(f"Thoughts:\n{thoughts}\n")
        print(f"Generated:\n{result}")
        print("=== TASK EXECUTION OUTPUT END ===\n")

        thoughts_list.append(thoughts)
        results.append(result)

        # Optional debug delay
        if debug_mode:
            time.sleep(20)

    return thoughts_list, results

def refine_prompt(prompt: str, outputs: List[str], memory: List[str], targets: List[str], dct_params: dict) -> str:
    """Refine the system prompt based on outputs, memory, and targets."""
    context = "\n".join([
        "Generated outputs:",
        *[f"- {output}" for output in outputs],
        "\nMemory:",
        *[f"- {m}" for m in memory],
        "\nTarget outputs:",
        *[f"- {target}" for target in targets]
    ])

    full_prompt = f"{prompt}\n{context}"

    print("\n=== PROMPT ENGINEERING INPUT START ===")
    print(f"Full prompt:\n{full_prompt}\n")
    print("\n=== PROMPT ENGINEERING INPUT END ===")

    response = llm_call(full_prompt, **dct_params)
    refined_prompt = extract_xml(response, "refined_prompt")

    print("\n=== PROMPT ENGINEERING OUTPUT START ===")
    print(f"Refined Prompt:\n{refined_prompt}")
    print("=== PROMPT ENGINEERING OUTPUT END ===\n")

    return refined_prompt

def iterative_task_execution(
    tasks: List[str],
    initial_prompt: str,
    engineer_prompt: str,
    target_outputs: List[str],
    dct_params: dict,
    n_max_iter: int = 5,
    debug_mode: bool = True
) -> tuple[List[str], list[dict]]:
    """Iteratively execute and refine the batch of tasks until the targets are achieved."""
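    memory = []            # every result produced so far, across all iterations
    chain_of_thought = []  # per-iteration record of thoughts and results
    results: List[str] = []  # defined up front so the final return works even if n_max_iter is 0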

    current_prompt = initial_prompt
    context = ""

    for iteration in range(n_max_iter):
        print(f"\n=== ITERATION {iteration + 1} START ===")

        # Execute the tasks
        thoughts_list, results = execute_task(current_prompt, tasks, context, dct_params, debug_mode=debug_mode)
        memory.extend(results)
        chain_of_thought.append({"iteration": iteration + 1, "thoughts": thoughts_list, "results": results})

        # Stop early only when every result matches its target exactly (after stripping whitespace)
        if all(result.strip() == target.strip() for result, target in zip(results, target_outputs)):
            print("\n=== TARGETS ACHIEVED ===")
            return results, chain_of_thought

        # Refine the prompt using the prompt engineer
        print(results)
        current_prompt = refine_prompt(engineer_prompt, results, memory, target_outputs, dct_params)

        # Update context for the next iteration
        context = "\n".join([
            "Previous results:",
            *[f"- {m}" for m in memory]
        ])

        # Optional debug delay
        if debug_mode:
            time.sleep(20)

    print("\n=== MAX ITERATIONS REACHED ===")
    return results, chain_of_thought
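
The loop above stops early only when every result equals its target string exactly, which free-form summaries rarely do. A softer check is one option. The sketch below swaps in a unigram-overlap F1 score (a rough ROUGE-1-style measure, not what this notebook uses). `token_f1`, `targets_reached`, and the 0.5 threshold are hypothetical names and values chosen for illustration.

```python
import re
from collections import Counter
from typing import List

def token_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two texts (case-insensitive, word tokens only)."""
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def targets_reached(results: List[str], targets: List[str], threshold: float = 0.5) -> bool:
    """True when every result scores at least `threshold` against its target."""
    return all(token_f1(r, t) >= threshold for r, t in zip(results, targets))
```

A check like `targets_reached(results, target_outputs)` could stand in for the exact-match condition. The threshold would need tuning on real summaries before it means anything.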

In [68]:
# Load a batch from the HuggingFace xsum dataset
dataset = load_dataset("xsum", split="train[:5]")
batch = [{"document": sample["document"], "summary": sample["summary"]} for sample in dataset]

initial_prompt = """
You are a summarization expert. Generate a concise summary.

Output your answer concisely in the following format:

<thoughts>
[Your understanding of the task and feedback and how you plan to improve]
</thoughts>

<response>
[Your answer here]
</response>

"""
engineer_prompt = """
You are a prompt engineering expert. Refine the prompt for better task execution.

Evaluate the generated outputs against the target outputs, then propose an improved prompt.
Output your evaluation concisely in the following format:

<evaluation>PASS, NEEDS_IMPROVEMENT, or FAIL</evaluation>
<refined_prompt>Improved prompt.</refined_prompt>
"""
dct_params = {
  #'model': "Qwen/Qwen2.5-72B-Instruct",
  'model': "microsoft/Phi-3.5-mini-instruct",
  #'model': "Qwen/Qwen2.5-Coder-32B-Instruct",
  'max_new_tokens': 6000,
  'temperature': 0.1,
  'return_full_text': False
}
max_docs_per_prompt = 1  # Set the maximum number of documents per prompt

# Prepare tasks and target summaries based on max_docs_per_prompt
tasks = []
target_summaries = []

for start_idx in range(0, len(batch), max_docs_per_prompt):
    sub_batch = batch[start_idx:start_idx + max_docs_per_prompt]

    task_text = "Generate a summary for the following documents:"
    for i, doc in enumerate(sub_batch):
        task_text += f"\n{i + 1}. {doc['document']}"
    tasks.append(task_text)

    summary_text = "The target summaries are:"
    for i, sample in enumerate(sub_batch):
        summary_text += f"\n{i + 1}. {sample['summary']}"
    target_summaries.append(summary_text)

In [69]:
# Check 1
print(tasks[0])
print()
print(target_summaries[0])

Generate a summary for the following documents:
1. The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed.
Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water.
Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct.
Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town.
First Minister Nicola Sturgeon visited the area to inspect the damage.
The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thoroughfare.
Jeanette Tate, who owns the Cinnamon Cafe which was badly affected, said she could not fault the multi-agency response once the flood hit.
However, she said more preventative work could have been carried out to ensure the retaining wall did not fail.
"It is difficult but I do think there is so much publicity for Dumfries and the 

In [70]:
# Check 2
execute_task(initial_prompt, tasks[:1], "", dct_params)


=== TASK EXECUTION INPUT START ===
Full prompt:

You are a summarization expert. Generate a concise summary.

Output your answer concisely in the following format:

<thoughts>
[Your understanding of the task and feedback and how you plan to improve]
</thoughts>

<response>
[Your answer here]
</response>


Task: Generate a summary for the following documents:
1. The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed.
Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water.
Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct.
Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town.
First Minister Nicola Sturgeon visited the area to inspect the damage.
The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thoroughfare.
Jeanette Tate, wh

([''],
 ["\nFollowing severe flooding in Newton Stewart, Hawick, and Peeblesshire, the full extent of the damage is still being assessed. The floodwaters breached a retaining wall in Newton Stewart, causing significant damage to commercial properties and businesses. First Minister Nicola Sturgeon and Labour Party's deputy Scottish leader Alex Rowley visited the affected areas to inspect the damage and discuss the response. Despite appreciating the multi-agency response, calls for more preventative measures and faster implementation of flood protection plans have been made. The Scottish Borders Council has listed the worst-affected roads and urged drivers to heed closure signs. A flood alert remains in place, and there are calls for more defences in the area.\n"])

In [74]:
final_outputs, history = iterative_task_execution(
    tasks=tasks[:1],
    initial_prompt=initial_prompt,
    engineer_prompt=engineer_prompt,
    target_outputs=target_summaries[:1],
    n_max_iter=1,
    dct_params=dct_params
)


=== ITERATION 1 START ===

=== TASK EXECUTION INPUT START ===
Full prompt:

You are a summarization expert. Generate a concise summary.

Output your answer concisely in the following format:

<thoughts>
[Your understanding of the task and feedback and how you plan to improve]
</thoughts>

<response>
[Your answer here]
</response>


Task: Generate a summary for the following documents:
1. The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed.
Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water.
Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct.
Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town.
First Minister Nicola Sturgeon visited the area to inspect the damage.
The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thor

HfHubHTTPError: 500 Server Error: Internal Server Error for url: https://api-inference.huggingface.co/models/microsoft/Phi-3.5-mini-instruct (Request ID: WTG-Wp6yY47Cw5uDomtbV)

Model too busy, unable to get response in less than 60 second(s)

In [63]:
full_prompt = """
You are a summarization expert. Generate concise summaries for the following document.
Task: Generate a summary for the following documents:
1. The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed.
Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water.
Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct.
Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town.
First Minister Nicola Sturgeon visited the area to inspect the damage.
The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thoroughfare.
Jeanette Tate, who owns the Cinnamon Cafe which was badly affected, said she could not fault the multi-agency response once the flood hit.
However, she said more preventative work could have been carried out to ensure the retaining wall did not fail.
"It is difficult but I do think there is so much publicity for Dumfries and the Nith - and I totally appreciate that - but it is almost like we're neglected or forgotten," she said.
"That may not be true but it is perhaps my perspective over the last few days.
"Why were you not ready to help us a bit more when the warning and the alarm alerts had gone out?"
Meanwhile, a flood alert remains in place across the Borders because of the constant rain.
Peebles was badly hit by problems, sparking calls to introduce more defences in the area.
Scottish Borders Council has put a list on its website of the roads worst affected and drivers have been urged not to ignore closure signs.
The Labour Party's deputy Scottish leader Alex Rowley was in Hawick on Monday to see the situation first hand.
He said it was important to get the flood protection plan right but backed calls to speed up the process.
"I was quite taken aback by the amount of damage that has been done," he said.
"Obviously it is heart-breaking for people who have been forced out of their homes and the impact on businesses."
He said it was important that "immediate steps" were taken to protect the areas most vulnerable and a clear timetable put in place for flood prevention plans.
Have you been affected by flooding in Dumfries and Galloway or the Borders? Tell us about your experience of the situation and how it was handled. Email us on selkirk.news@bbc.co.uk or dumfries@bbc.co.uk.
"""

response = llm_call(full_prompt, **dct_params)

In [64]:
print(response)


Summary:
Flooding in Newton Stewart and surrounding areas has caused significant damage, with repair work ongoing in Hawick and Peeblesshire. The west coast mainline faces disruption due to damage at the Lamington Viaduct. The First Minister visited the affected area, while businesses and households suffered losses. There are calls for more preventative measures and a clear timetable for flood prevention plans.




In [47]:
execute_task(initial_prompt, tasks[:1], "", dct_params)


=== TASK EXECUTION INPUT START ===
Full prompt:
You are a summarization expert. Generate concise summaries for the following documents.
Task: Generate a summary for the following documents:
1. The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed.
Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water.
Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct.
Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town.
First Minister Nicola Sturgeon visited the area to inspect the damage.
The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thoroughfare.
Jeanette Tate, who owns the Cinnamon Cafe which was badly affected, said she could not fault the multi-agency response once the flood hit.
However, she said more preventative work could ha

([''], [''])
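
Both lists above come back empty, most likely because the prompt used in this run did not ask for `<thoughts>` or `<response>` tags, so there was nothing to extract. The implementation of `lib.utils.extract_xml` is not shown here. The sketch below is a hypothetical regex-based equivalent (`extract_tag` is an invented name) that shows this failure mode and one way to fall back to the raw response.

```python
import re

def extract_tag(text: str, tag: str, default: str = "") -> str:
    """Return the content of the first <tag>...</tag> pair in text, or `default` if absent."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    # re.DOTALL lets the tag content span multiple lines.
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else default

# A tagged response yields the inner text; an untagged one yields the default.
tagged = "<thoughts>plan</thoughts>\n<response>\nShort summary.\n</response>"
untagged = "Summary: Short summary."
print(repr(extract_tag(tagged, "response")))
print(repr(extract_tag(untagged, "response")))
print(repr(extract_tag(untagged, "response", default=untagged)))
```

Passing the raw response as `default` keeps the model output usable when it ignores the requested format, at the cost of mixing tagged and untagged text in the results.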


=== ITERATION 1 START ===

=== TASK EXECUTION INPUT START ===
Full prompt:
You are a summarization expert. Generate a concise summary.
Task: Generate a summary for the following documents:
 1. The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed.
Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water.
Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct.
Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town.
First Minister Nicola Sturgeon visited the area to inspect the damage.
The waters breached a retaining wall, flooding many commercial properties on Victoria Street - the main shopping thoroughfare.
Jeanette Tate, who owns the Cinnamon Cafe which was badly affected, said she could not fault the multi-agency response once the flood hit.
However, she said more preventative work could ha

KeyboardInterrupt: 