# Prompt Testing for a Single-Agent System


In [1]:
"""
This file is used to test different prompts based on the LLM's responses.
In this case, the single-agent scenario is evaluated.

Important notes on constructing prompts for the single agent are given below.

TOOLS AVAILABLE:

    * get_wear -> function to get the torque wear:
        INPUTS: 
            - type of material; the LLM should be able to map the material to its correct letter code [P, N, K]
            - feed rate, a float value in [mm/min] units
            - cutting speed, also a float value in [m/min] units

    * get_quality -> function to determine GOOD/BAD quality:
        INPUTS:
            - cooling, a float value in [%] units
            - feed rate, a float value in [mm/min] units
            - cutting speed, a float value in [m/min] units
            - drill bit material (optional), only necessary for the CCF criteria [N, H, K]
 

BEHAVIOUR OF USER + LLM:

    The user may ask well-structured questions, but the goal is to evaluate the LLM with inexperienced users.
    Listed below are some of the "mistakes" that can be present in a query. Given the correct prompt,
    the LLM should be able to deal with them.

    1) Missing values for mandatory parameters:
        Apart from the drill bit material, every value is required by its corresponding tool.
        The LLM should either:
            1.1) Tell the user that some values are missing and request them, or
            1.2) Tell the user that the values were missing and that it used default ones.

        We need to evaluate whether 1.1) or 1.2) is better overall.

    2) Missing value for drill bit material:
        If the user does not provide this value when asking about quality,
        the LLM should tell the user that it can only evaluate the BEF criteria and not the CCF criteria.
        The LLM should return the BEF quality and ask whether the user also wants the CCF quality,
        which requires the drill bit material.

        If the user asks about tool wear, the LLM should not ask for the drill bit material at any point.

    3) The user gives values without units or with units other than those the tool expects:
        The LLM should tell the user that it is assuming the units defined in the tool,
        or correctly convert these values to the units used by the tool.
"""

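The behaviour described in points 1) and 2) of the notes can be expressed as a small pre-check that runs before a tool is called. The following is only a minimal sketch: the parameter names (`material`, `feed_rate`, `cutting_speed`, `cooling`, `drill_bit_material`) and the helper `check_request` are hypothetical and are not taken from `Tools_LLM`. It also assumes that BEF is always available for a complete quality request and that CCF is added once a drill bit material is given.

```python
# Mandatory inputs per tool, following the notes above.
# The parameter names are hypothetical; the real tool signatures may differ.
REQUIRED = {
    "get_wear": ("material", "feed_rate", "cutting_speed"),
    "get_quality": ("cooling", "feed_rate", "cutting_speed"),
}


def check_request(tool, params):
    """Report missing mandatory values and which quality criteria can be evaluated."""
    missing = [name for name in REQUIRED[tool] if params.get(name) is None]
    criteria = []
    if tool == "get_quality" and not missing:
        criteria.append("BEF")  # assumed to need no drill bit material
        if params.get("drill_bit_material") is not None:
            criteria.append("CCF")  # only possible with a drill bit material
    return {"missing": missing, "criteria": criteria}


# Quality request without a drill bit material: only BEF can be evaluated.
print(check_request("get_quality", {"cooling": 12.5, "feed_rate": 0.13, "cutting_speed": 16.7}))

# Wear request without a feed rate: the value has to be requested or defaulted.
print(check_request("get_wear", {"material": "P", "cutting_speed": 100}))
```

A check like this could also serve as a reference when judging whether the LLM asked for the right values in case 1.1).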

1) Prompts For Only 1 Agent

In [2]:
SINGLE_AGENT_PROMPTS = {

    "Zero-Shot Prompts": [
        # Adjacent string literals are used instead of backslash continuations,
        # so the source indentation does not end up inside the prompt text.
        ("You are a helpful assistant. "
         "You have access to special tools for estimating drill wear and evaluating drilling quality. "
         "However, you should only use these tools if the user explicitly "
         "asks you to perform a calculation for drill wear or drilling quality. "
         "If the user asks general questions or greets you, respond directly without using any tools. "
         "Additionally, if the user does not provide the required parameters for tool calculations, "
         "inform them that the values are missing and ask for them. "
         "If the user omits the drill bit material, tell them that you can only perform the BEF criteria "
         "and that you need the drill bit material for CCF criteria. "
         "Assume default values if necessary and clarify the units being used."
        ),

        ("If the user does not provide mandatory parameters (feed rate, cutting speed, etc.), "
         "kindly ask for the missing information or inform the user that default values will be used. "
         "If the query does not specify units for the values, assume standard "
         "SI units or clarify the units you're using. "
         "Always communicate any assumptions you make to the user."),
    ],
    
    "Few-Shot Prompts": [
        ("You are a helpful assistant responsible for estimating tool wear or evaluating drilling quality based on user input. Here's an example of what you should do:\n\n"
        "User: 'I want to estimate the wear of my tool. I'm using material P with a feed rate of 0.2 and cutting speed of 100.'\n"
        "Assistant: 'Great! I will now calculate the torque wear using material P, feed rate of 0.2, and cutting speed of 100. Please hold on.'\n\n"
        "If the user does not provide values for feed rate or cutting speed, politely inform them of the missing information and ask for the missing values."),
        
        ("Example 2:\n\n"
        "User: 'Can you help me determine the quality of my drilling process? I'm using cooling of 12.5, feed rate 0.13, and cutting speed 16.7.'\n"
        "Assistant: 'To calculate the quality of your drilling, I will evaluate using the BEF criteria since the drill bit material wasn't provided. If you'd like, I can calculate the CCF quality if you provide the material.'\n\n"
        "If any value is missing, prompt the user to provide the missing data or explain that default values will be used."),
    ],
    
    "Chain-of-Thought Prompts": [
        ("You have several tools at your disposal, such as the ones for estimating wear and evaluating drilling quality. When the user queries you, think through the problem step by step.\n\n"
        "For example, if the user asks for wear estimation, you first need to check if the required parameters (material, feed rate, cutting speed) are provided. If they are, use the tool. If any are missing, ask the user for them.\n\n"
        "Similarly, if the user asks for drilling quality evaluation, check if the necessary values (cooling, feed rate, cutting speed) are available. If the drill bit material is missing, explain the limitation of the BEF criteria and ask for the material to calculate CCF quality."),
        
        ("If a parameter is missing or if the units are unclear, always inform the user about the issue. For example, if the feed rate is provided without SI units, tell the user you're assuming SI units unless otherwise specified. You should also make sure the user understands any assumptions you make during the process."),
    ],

    "Plan and Solve Prompts": [
        ("When the user asks a question, follow these steps:\n\n"
        "1. First, identify what the user is asking for: whether it's a wear estimation or quality evaluation.\n"
        "2. Then, check if all required parameters are provided (e.g., feed rate, cutting speed, etc.).\n"
        "3. If any parameter is missing, either ask the user for the missing data or inform them that default values will be used.\n"
        "4. If the drill bit material is missing, explain that only BEF quality can be evaluated and ask for the material if they want the CCF quality.\n"
        "5. Use the corresponding tool for the calculation once you have all the necessary parameters and explain your process to the user."),
        
        ("If the user provides units that aren't in SI or specifies incorrect units, inform them that you're assuming standard SI units unless otherwise stated. Once all inputs are confirmed, proceed with the calculation and provide the result. If the input is incomplete, ask for the missing values or clarify the units."),
    ],

}
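Each prompt type holds more than one system prompt, so it can help to give every prompt a stable identifier before running the experiment. The sketch below is one possible way to do that; `iter_prompts` and `demo_prompts` are hypothetical names used only for illustration. It also collapses runs of spaces and tabs, which can slip into a prompt when a string literal is continued across indented source lines, while leaving the `\n\n` separators of the few-shot examples untouched.

```python
import re


def iter_prompts(prompts_by_type):
    """Yield (prompt_type, index, prompt) with runs of spaces/tabs collapsed."""
    for prompt_type, prompts in prompts_by_type.items():
        for index, prompt in enumerate(prompts):
            cleaned = re.sub(r"[ \t]{2,}", " ", prompt).strip()
            yield prompt_type, index, cleaned


# Small stand-in with the same shape as SINGLE_AGENT_PROMPTS.
demo_prompts = {
    "Zero-Shot Prompts": ["You are a helpful assistant.         Ask for missing values."],
    "Few-Shot Prompts": ["Example:\n\nUser: 'hi'"],
}

for prompt_type, index, prompt in iter_prompts(demo_prompts):
    print(f"{prompt_type} #{index}: {prompt!r}")
```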

2) User Queries by Difficulty

In [3]:
user_queries = {

    # EASY: These queries are straightforward, with correct values and units for material, feed rate, cutting speed, and cooling. No ambiguity in the input, designed to test if the system can handle perfectly structured queries.
    "EASY": [
        ("I want to estimate the wear of my model. I am going to use the material 'P' and I want to have a feed rate value of 0.2 and cutting speed of 100."), 
        ("Can you help me evaluate the quality of my drilling process? I am using cooling of 12.5, feed rate of 0.13, and cutting speed of 16.7."), 
        ("Please estimate the wear for material 'N' with a feed rate of 0.3 and a cutting speed of 120.")
    ],


    # MEDIUM: These scenarios involve:
        # Unit conversion: The feed rate and cutting speed values are given in units other than the ones the tools expect
        # (e.g., meters per minute or inches per minute). The LLM needs to convert them to the units used by the tools
        # (mm/min for feed rate, m/min for cutting speed).

        # Missing values: Some queries are missing parameters (e.g., feed rate or cooling), and the LLM has to either ask for the missing values or use defaults.
    "MEDIUM": [
        ("I want to estimate the wear of my model using material 'K', feed rate of 0.00025 (in meters per minute), and cutting speed of 100 (in mm/min)."),  # Unit conversion test
        ("Can you calculate the drilling quality for cooling 10.5, feed rate 0.2 (in inches/min), and cutting speed 10 (in inches/min)?"),  # Unit conversion test
        ("I want to evaluate my drilling quality with feed rate 0.15 and cutting speed 75, but I forgot to mention the cooling value. Could you assume cooling 15?"),  # Missing parameter, assume default
        ("Estimate wear for material 'P'. I provided the cutting speed 100, but I forgot the feed rate. Can you assume a default of 0.2?"),  # Missing parameter, assume default
        
    ],

    # HARD: These queries require both tools (get_wear and get_quality) to be used together. 
    # Some of the queries also introduce missing values, which will test if the LLM can handle 
    # situations where it has to infer missing information (e.g., assuming a drill bit material).
    "HARD": [
        ("Can you calculate both the wear and the quality of my model? The material is 'P', feed rate is 0.3, cutting speed is 120, and cooling is 12.5. Also, assume the drill bit is 'N' for quality evaluation."),  # Using both tools
        ("I want to estimate the wear for material 'K' with a feed rate of 0.25 and cutting speed of 100, but I also want to know the quality of my drilling with cooling 10. Can you calculate both for me?"),  # Using both tools
        ("I need to evaluate both wear and quality for my model. I'm using material 'N', feed rate of 0.2, and cutting speed of 80, but I forgot the drill bit material for quality. Can you assume 'H' and 10.0 for cooling and calculate both?")  # Missing drill bit material, both tools
    ]
}

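# Override the full set above: this run only uses the EASY queries
# (the results are written to granite_answers_EASY.json).
user_queries = {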

    "EASY": [
        ("I want to estimate the wear of my model. I am going to use the material 'P' and I want to have a feed rate value of 0.2 and cutting speed of 100."), 
        ("Can you help me evaluate the quality of my drilling process? I am using cooling of 12.5, feed rate of 0.13, and cutting speed of 16.7."), 
        ("Please estimate the wear for material 'N' with a feed rate of 0.3 and a cutting speed of 120.")
    ],

}
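For the unit-conversion queries in the MEDIUM set, it is useful to know which values the tools should end up receiving. The sketch below converts the values from those queries into the tool units (mm/min for feed rate, m/min for cutting speed). The helper `convert_speed` and the unit labels are hypothetical and only meant as a reference for checking the LLM's conversions by hand.

```python
# Conversion factors from each supported unit to mm/min (1 inch = 25.4 mm).
TO_MM_PER_MIN = {"mm/min": 1.0, "m/min": 1000.0, "in/min": 25.4}


def convert_speed(value, unit, target):
    """Convert a linear speed between mm/min, m/min and in/min."""
    return value * TO_MM_PER_MIN[unit] / TO_MM_PER_MIN[target]


# Values taken from the two unit-conversion queries in the MEDIUM set.
print("feed rate     :", round(convert_speed(0.00025, "m/min", "mm/min"), 6), "mm/min")
print("cutting speed :", round(convert_speed(100, "mm/min", "m/min"), 6), "m/min")
print("feed rate     :", round(convert_speed(0.2, "in/min", "mm/min"), 6), "mm/min")
print("cutting speed :", round(convert_speed(10, "in/min", "m/min"), 6), "m/min")
```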



3) Run Code

In [4]:
# True to see the logic behind the LLM
# False to just get final response

DETAILED_RESPONSES = False

In [5]:
## DO NOT CHANGE ANYTHING
import json
import time

from langchain.agents import create_agent
from langchain_core.messages import AIMessage
from langchain_ollama import ChatOllama

from Tools_LLM import get_wear, get_quality

llm = ChatOllama(
    model="granite3.3:2b",
    temperature=0.2,
)

time_elapsed = {}   # elapsed seconds, grouped by prompt type
LLM_RESPONSES = {}  # final answers, grouped by prompt type

# Iterate over the prompt types defined for the single agent
for i, prompt_type in enumerate(SINGLE_AGENT_PROMPTS):

    print("PROMPT TYPE: ", prompt_type)

    time_elapsed[prompt_type] = {}
    LLM_RESPONSES[prompt_type] = []

    for prompt in SINGLE_AGENT_PROMPTS[prompt_type]:

        # Build a fresh agent for every system prompt
        agent = create_agent(llm, tools=[get_wear, get_quality], system_prompt=prompt)

        for level in user_queries:

            start_time = time.time()
            print("Level of Query: ", level)

            # Iterate over the user queries of this difficulty level
            for query in user_queries[level]:

                print("USER QUERY: ", query)

                if DETAILED_RESPONSES:
                    # Stream every step so that the tool calls are printed as well
                    for step in agent.stream({"messages": [{"role": "user", "content": query}]}):
                        for update in step.values():
                            for message in update.get("messages", []):
                                message.pretty_print()

                else:
                    final_response = agent.invoke({
                        "messages": [{"role": "user", "content": query}]
                    })

                    # Keep the content of the last AI message as the answer
                    ai_response = None
                    for message in final_response["messages"]:
                        if isinstance(message, AIMessage):
                            ai_response = message.content

                    if ai_response:
                        print(ai_response)  # Print only the AI's response
                    else:
                        print("No AI response found.")  # No AI message in the result

                    # Responses are only stored in this non-detailed mode
                    LLM_RESPONSES[prompt_type].append(ai_response)

            duration = time.time() - start_time

            # NOTE: the key depends only on the prompt-type index i, so every prompt
            # and level overwrites the previous entry. The stored value is the
            # duration of the last (prompt, level) pair of this prompt type.
            time_elapsed[prompt_type][f"id: {i}"] = duration

with open("granite_answers_EASY.json", "w") as f:
    json.dump(LLM_RESPONSES, f, indent=4)

with open("granite_answers_TIME_EASY.json", "w") as f:
    json.dump(time_elapsed, f, indent=4)

PROMPT TYPE:  Zero-Shot Prompts
Level of Query:  EASY
USER QUERY:  I want to estimate the wear of my model. I am going to use the material 'P' and I want to have a feed rate value of 0.2 and cutting speed of 100.
Based on the parameters you provided, your drill bit's wear is estimated as follows:
- Safe: 81.9%
- Fail: 2.7%
This indicates that under your current feed rate of 0.2 mm/min and cutting speed of 100 m/min, drilling material 'P', the drill bit is expected to perform well with a low probability of failure.
USER QUERY:  Can you help me evaluate the quality of my drilling process? I am using cooling of 12.5, feed rate of 0.13, and cutting speed of 16.7.
Of course, I can assist with that. However, to perform a comprehensive evaluation, I would also need the material you're drilling into. Could you provide that information? Also, please note that if the drill bit material is not specified, I'll assume it's for Basic Efficiency Factor (BEF) criteria calculation and won't be able to 
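In the run cell above, each timing is stored under a key built only from the prompt-type index, so the second prompt of a type overwrites the first, and a later difficulty level would overwrite an earlier one. If separate timings per prompt and level are wanted, one option is to key them by both. This is a sketch under assumptions: `time_runs` and `run_query` are hypothetical names, and `run_query` stands in for the call to `agent.invoke`.

```python
import time
from collections import defaultdict


def time_runs(prompts_by_type, queries_by_level, run_query):
    """Measure elapsed seconds for every (prompt index, level) combination."""
    timings = defaultdict(dict)
    for prompt_type, prompts in prompts_by_type.items():
        for index, prompt in enumerate(prompts):
            for level, queries in queries_by_level.items():
                start = time.perf_counter()
                for query in queries:
                    run_query(prompt, query)
                elapsed = time.perf_counter() - start
                # One entry per prompt and level, so nothing is overwritten.
                timings[prompt_type][f"prompt {index} / {level}"] = elapsed
    return dict(timings)


# Dummy data and a dummy runner instead of a real agent call.
demo_prompts = {"Zero-Shot Prompts": ["prompt A", "prompt B"]}
demo_queries = {"EASY": ["query 1", "query 2"], "MEDIUM": ["query 3"]}
demo_timings = time_runs(demo_prompts, demo_queries, lambda prompt, query: None)
print(sorted(demo_timings["Zero-Shot Prompts"]))
```

`time.perf_counter()` is used here instead of `time.time()` because it is a monotonic clock intended for measuring short durations.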

4) Evaluation of Responses

In [6]:
print(time_elapsed)

{'Zero-Shot Prompts': {'id: 0': 33.88163113594055}, 'Few-Shot Prompts': {'id: 1': 35.244433641433716}, 'Chain-of-Thought Prompts': {'id: 2': 47.27047109603882}, 'Plan and Solve Prompts': {'id: 3': 45.09834170341492}}
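To compare the prompt types more easily, the timings printed above can be sorted and normalised per query. The sketch below copies the printed dictionary and assumes that each recorded value covers the three EASY queries of one prompt, which is how the run cell appears to measure it; `summarise` and `QUERIES_PER_RUN` are hypothetical names.

```python
# Timings copied from the output above (seconds).
time_elapsed = {
    "Zero-Shot Prompts": {"id: 0": 33.88163113594055},
    "Few-Shot Prompts": {"id: 1": 35.244433641433716},
    "Chain-of-Thought Prompts": {"id: 2": 47.27047109603882},
    "Plan and Solve Prompts": {"id: 3": 45.09834170341492},
}

QUERIES_PER_RUN = 3  # assumption: one timed run covers the three EASY queries


def summarise(timings, queries_per_run):
    """Return (prompt_type, total_seconds, seconds_per_query), fastest first."""
    rows = []
    for prompt_type, runs in timings.items():
        total = sum(runs.values())
        rows.append((prompt_type, total, total / queries_per_run))
    return sorted(rows, key=lambda row: row[1])


for prompt_type, total, per_query in summarise(time_elapsed, QUERIES_PER_RUN):
    print(f"{prompt_type:<26}{total:7.2f} s total{per_query:7.2f} s/query")
```

With these numbers, the zero-shot prompt is the fastest and the chain-of-thought prompt the slowest. Timing alone says nothing about answer quality, which still has to be judged from the saved responses.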
