# Prompt Testing for Single Agent System


In [None]:
"""
This file is used to test different prompts based on the LLM's responses.
In this case, the single-agent scenario is evaluated.

Important notes on constructing prompts for the single agent are given below.

TOOLS AVAILABLE:

    * get_wear -> function to get the torque wear:
        INPUTS: 
            - type of material; the LLM should be able to map the material to its correct letter code [P, N, K]
            - feed rate, a float value in [mm/min] units
            - cutting speed, also a float value in [m/min] units

    * get_quality -> function to determine GOOD/BAD quality:
        INPUTS:
            - cooling, a float value in [%] units
            - feed rate, a float value in [mm/min] units
            - cutting speed, a float value in [m/min] units
            - drill bit material (optional), only necessary for the CCF criterion [N, H, K]
 

BEHAVIOUR OF USER + LLM:

    The user can ask well-structured questions, but the goal is to evaluate the LLM for inexperienced users.
    Listed below are some of the "mistakes" that can be present in a query. The LLM should be able to deal with them,
    given the correct prompt.

    1) Missing values for mandatory parameters:
        Apart from the drill bit material, all values are required by their corresponding tool.
        The LLM should:
            1.1) Either tell the user that some values are missing and request them.
            1.2) Tell the user that the values were missing and that it used some default ones.

        We need to evaluate if 1.1) or 1.2) is better overall.

    2) Missing value for drill bit material:
        If the user does not provide the drill bit material when asking about quality,
        the LLM should tell the user that it can only evaluate the BEF criterion and not CCF.
        The LLM should return the BEF quality and ask whether the user also wants the CCF result by providing the drill bit material.

        If the user asks about tool wear, the LLM should not ask for the drill bit material at any point.

    3) The user gives values without units or in units other than those the tools expect:
        The LLM should tell the user that it is assuming the units defined in the tool,
        or be able to correctly convert these values to the units used by the tool.
"""
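The missing-value rules in cases 1) and 2) can also be checked deterministically when scoring a response. The following is a minimal sketch, not part of `Tools_LLM`: the parameter names (`material`, `feed_rate`, `cutting_speed`, `cooling`, `drill_bit_material`) and the helper `missing_parameters` are hypothetical and only mirror the tool inputs listed in the notes above.

```python
# Hypothetical parameter names mirroring the tool inputs described in the notes;
# the real signatures in Tools_LLM may differ.
REQUIRED = {
    "get_wear": ("material", "feed_rate", "cutting_speed"),
    "get_quality": ("cooling", "feed_rate", "cutting_speed"),
}
OPTIONAL = {
    "get_wear": (),
    "get_quality": ("drill_bit_material",),  # only needed for the CCF criterion
}


def missing_parameters(tool, provided):
    """Return (missing_required, missing_optional) for one intended tool call."""
    missing_required = [p for p in REQUIRED[tool] if provided.get(p) is None]
    missing_optional = [p for p in OPTIONAL[tool] if provided.get(p) is None]
    return missing_required, missing_optional


# Example: a quality query without cooling and without a drill bit material.
required, optional = missing_parameters(
    "get_quality", {"feed_rate": 0.15, "cutting_speed": 75.0}
)
print(required)  # ['cooling']
print(optional)  # ['drill_bit_material']
```

A non-empty first list corresponds to case 1) (ask or use a default), while a non-empty second list corresponds to case 2) (report BEF only and offer CCF).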

1) Prompts For Only 1 Agent

In [1]:
SINGLE_AGENT_PROMPT = """

You have several tools at your disposal. When needed, please use them to respond to the user's query.
You need to think step by step, such as:
    1) Evaluate what tools are necessary.
    2) Check if all the values are given by the user.
    3) Respond after obtaining the final response.

If the user gives values in other units, you need to convert them into the units used by the tools. 
Additionally, if a value is missing, please tell the user and refrain from calculating without it.

Lastly, make sure that you respond to all the user's questions and only if needed use both tools for the same answer.
"""
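To compare behaviours 1.1) and 1.2) from the notes, one option is to keep a shared base prompt and swap only the sentence about missing values. The sketch below is illustrative: `BASE_PROMPT`, `MISSING_VALUE_POLICIES`, and `build_prompt` are hypothetical names, and the policy wording is an assumption rather than a tested prompt.

```python
# Shared instructions, condensed from the single-agent prompt (wording is illustrative).
BASE_PROMPT = """
You have several tools at your disposal. When needed, use them to respond to the user's query.
Think step by step: decide which tools are necessary, check that all values are given,
and respond after obtaining the final result.
If the user gives values in other units, convert them into the units used by the tools.
"""

# One sentence per behaviour under evaluation (assumed wording).
MISSING_VALUE_POLICIES = {
    # Behaviour 1.1: ask the user for the missing value.
    "ask": (
        "If a mandatory value is missing, tell the user which one it is "
        "and do not call the tool until it is provided."
    ),
    # Behaviour 1.2: fall back to a default and say so.
    "default": (
        "If a mandatory value is missing, tell the user which one it is, "
        "state the default value you are using instead, and then call the tool."
    ),
}


def build_prompt(policy):
    """Combine the shared base prompt with one missing-value policy."""
    return BASE_PROMPT.strip() + "\n\n" + MISSING_VALUE_POLICIES[policy]


for name in MISSING_VALUE_POLICIES:
    print(f"--- {name} ---")
    print(build_prompt(name))
```

Running the same queries once per variant keeps everything else constant, so differences in the responses can be attributed to the missing-value policy.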

2) User Queries by Difficulty

In [2]:
user_queries = {

    # EASY: These queries are straightforward, with correct values and units for material, feed rate, cutting speed, and cooling. No ambiguity in the input, designed to test if the system can handle perfectly structured queries.
    "EASY": [
        ("I want to estimate the wear of my model. I am going to use the material 'P' and I want to have a feed rate value of 0.2 and cutting speed of 100."), 
        ("Can you help me evaluate the quality of my drilling process? I am using cooling of 12.5, feed rate of 0.13, and cutting speed of 16.7."), 
        ("Please estimate the wear for material 'N' with a feed rate of 0.3 and a cutting speed of 120.")
    ],


    # MEDIUM: These scenarios involve:
        # Unit conversion: The feed rate and cutting speed values are given in units other than those the tools expect (e.g., meters per minute or inches per minute). 
        # The LLM needs to convert these to the units used by the tools (mm/min for feed rate, m/min for cutting speed).

        # Missing values: Some queries omit a parameter (e.g., cooling or feed rate) and propose a value to assume; the LLM has to either ask for the missing value or use the suggested default.
    "MEDIUM": [
        ("I want to estimate the wear of my model using material 'K', feed rate of 0.00025 (in meters per minute), and cutting speed of 100 (in mm/min)."),  # Unit conversion test
        ("Can you calculate the drilling quality for cooling 10.5, feed rate 0.2 (in inches/min), and cutting speed 10 (in inches/min)?"),  # Unit conversion test
        ("I want to evaluate my drilling quality with feed rate 0.15 and cutting speed 75, but I forgot to mention the cooling value. Could you assume cooling 15?"),  # Missing parameter, assume default
        ("Estimate wear for material 'P'. I provided the cutting speed 100, but I forgot the feed rate. Can you assume a default of 0.2?"),  # Missing parameter, assume default
        
    ],

    # HARD: These queries require both tools (get_wear and get_quality) to be used together. 
    # Some of the queries also introduce missing values, which will test if the LLM can handle 
    # situations where it has to infer missing information (e.g., assuming a drill bit material).
    "HARD": [
        ("Can you calculate both the wear and the quality of my model? The material is 'P', feed rate is 0.3, cutting speed is 120, and cooling is 12.5. Also, assume the drill bit is 'N' for quality evaluation."),  # Using both tools
        ("I want to estimate the wear for material 'K' with a feed rate of 0.25 and cutting speed of 100, but I also want to know the quality of my drilling with cooling 10. Can you calculate both for me?"),  # Using both tools
        ("I need to evaluate both wear and quality for my model. I'm using material 'N', feed rate of 0.2, and cutting speed of 80, but I forgot the drill bit material for quality. Can you assume 'H' and 10.0 for cooling and calculate both?")  # Missing drill bit material, both tools
    ]
}
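For the MEDIUM unit-conversion queries it helps to have reference values to compare against the arguments the LLM passes to the tools. The following sketch assumes the tool units stated in the notes (feed rate in mm/min, cutting speed in m/min); the helper names and the unit strings are hypothetical.

```python
# Conversion factors from a length-per-minute unit to mm/min.
TO_MM_PER_MIN = {"mm/min": 1.0, "m/min": 1000.0, "in/min": 25.4}


def to_mm_per_min(value, unit):
    """Convert a rate to mm/min (unit assumed for the feed rate)."""
    return value * TO_MM_PER_MIN[unit]


def to_m_per_min(value, unit):
    """Convert a rate to m/min (unit assumed for the cutting speed)."""
    return to_mm_per_min(value, unit) / 1000.0


# First MEDIUM query: feed rate 0.00025 m/min, cutting speed 100 mm/min.
print(round(to_mm_per_min(0.00025, "m/min"), 4))  # 0.25
print(round(to_m_per_min(100, "mm/min"), 4))      # 0.1

# Second MEDIUM query: feed rate 0.2 in/min, cutting speed 10 in/min.
print(round(to_mm_per_min(0.2, "in/min"), 4))     # 5.08
print(round(to_m_per_min(10, "in/min"), 4))       # 0.254
```

With `DETAILED_RESPONSES = True`, the printed tool calls can be compared against these values to see whether the conversion was applied.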





3) Run Code

In [3]:
# True to see the logic behind the LLM
# False to just get final response

DETAILED_RESPONSES = False

In [4]:
## DO NOT CHANGE ANYTHING
import json

from langchain_ollama import ChatOllama
from Tools_LLM import get_wear, get_quality
from langchain.agents import create_agent
import time

llm = ChatOllama(
    model="granite3.3:2b",
    temperature=0.2,
)

agent = create_agent(llm, tools=[get_wear, get_quality], system_prompt=SINGLE_AGENT_PROMPT)

time_elapsed = {}

LLM_RESPONSES = {}


for level in user_queries.keys():

     

    start_time = time.time()
    #iterate over user queries
    print("Level of Query: ", level)

    LLM_RESPONSES[level] = []
    for query in user_queries[level]:

        print("USER QUERY: ", query)

        if DETAILED_RESPONSES:
            # Stream the agent's steps, including all information about tool calls.
            # Note: in this mode the messages are only printed, not stored in LLM_RESPONSES.
            for step in agent.stream({"messages": [{"role": "user", "content": query}]}):
                for update in step.values():
                    for message in update.get("messages", []):
                        message.pretty_print()

        else:
            final_response = agent.invoke({
                "messages": [{"role": "user", "content": query}]
            })
            

            # Extract AI's response
            ai_response = None
            for message in final_response['messages']:
                if "AIMessage" in str(type(message)):  # Check if the message is from the AI
                    ai_response = message.content

            if ai_response:
                print(ai_response)  # Print only the AI's response
            else:
                print("No AI response found.")  # Handle case if AI message is not found
            
            LLM_RESPONSES[level].append(ai_response)


    end_time = time.time()
    duration = end_time - start_time
    
    time_elapsed[level] = duration 

with open("SA_FINAL.json", "w") as f:
    json.dump(LLM_RESPONSES, f, indent=4)

with open("SA_FINAL_TIME.json", "w") as f:
    json.dump(time_elapsed, f, indent=4)

Level of Query:  EASY
USER QUERY:  I want to estimate the wear of my model. I am going to use the material 'P' and I want to have a feed rate value of 0.2 and cutting speed of 100.
USER QUERY:  Can you help me evaluate the quality of my drilling process? I am using cooling of 12.5, feed rate of 0.13, and cutting speed of 16.7.
Based on the drilling parameters you provided - cooling of 12.5%, feed rate of 0.13 mm/min, and cutting speed of 16.7 m/min - your drilling quality can be evaluated as follows:
- Edge Quality will likely be Bad due to insufficient cooling.
- Compression Chips Quality is expected to be Good because the parameters fall within a reasonable range for chip formation.
USER QUERY:  Please estimate the wear for material 'N' with a feed rate of 0.3 and a cutting speed of 120.
The estimated wear for material 'N' with a feed rate of 0.3 and a cutting speed of 120 is as follows:
- Safe: 77.9%
- Fail: 3.4%
Level of Query:  MEDIUM
USER QUERY:  I want to estimate the wear of my

4) Evaluation of Responses

In [None]:
print(time_elapsed)
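Beyond printing the raw timings, a per-level summary makes the runs easier to compare. The sketch below assumes the two dictionaries have the shapes written by the run cell (`LLM_RESPONSES` maps each level to a list of responses, `time_elapsed` maps each level to seconds); `summarize` is a hypothetical helper, and the sample values are made up for illustration, not measured results.

```python
def summarize(responses, elapsed):
    """Per level: number of queries, empty responses, and mean seconds per query."""
    summary = {}
    for level, answers in responses.items():
        n = len(answers)
        empty = sum(1 for answer in answers if not answer)  # None or empty string
        total = elapsed.get(level)
        summary[level] = {
            "queries": n,
            "empty_responses": empty,
            "seconds_per_query": total / n if n and total is not None else None,
        }
    return summary


# Made-up sample data with the same shape as LLM_RESPONSES and time_elapsed.
sample_responses = {"EASY": ["some answer", "", None], "HARD": []}
sample_elapsed = {"EASY": 6.0, "HARD": 0.5}

for level, stats in summarize(sample_responses, sample_elapsed).items():
    print(level, stats)
```

In the notebook, the same function could be called as `summarize(LLM_RESPONSES, time_elapsed)`, or on the contents of `SA_FINAL.json` and `SA_FINAL_TIME.json` after loading them with `json.load`.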