### Ideas for Evaluation
The main goal is to create a reliable, reproducible system that evaluates LLM outputs and determines which prompts work better for our task than others.

- LLM Judges
- Check if the XRIF JSON is valid JSON
- Response Quality
- Try many different requests, add new locations, modify information relevant to locations?
- Try running a large loop of requests all using the same prompt prefix, and see which ones do well and which ones don't. Can your eval function properly identify good vs bad outputs?
- Build some skills/tool specs and test those

add timing to tasks


### Relevant LLM concepts to know
- Skills / Function Calling (Cohere calls them Tools) -> https://docs.cohere.com/v2/docs/tools
- Structured Outputs -> https://docs.cohere.com/v2/docs/structured-outputs
- AI Agents
- RAG -> Retrieval Augmented Generation

In [None]:
%pip install -r requirements.in

In [None]:
%pip freeze > requirements.txt

In [39]:
import cohere
import pandas as pd
import json
import os

In [40]:
COHERE_API_KEY = os.getenv("COHERE_API_KEY")

In [41]:
cohere_client = cohere.ClientV2(api_key=COHERE_API_KEY)

In [42]:
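def prompt_cohere(prompt, model_name="command-r-plus-08-2024"):
    """Send a single user message to Cohere and return the text of the reply."""
    # cohere_client is only read here, so no `global` declaration is needed.
    response = cohere_client.chat(
        model=model_name,
        messages=[{"role": "user", "content": prompt}],
        # Ask the API for a JSON object rather than free-form text.
        response_format={"type": "json_object"},
    )
    return response.message.content[0].text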

In [43]:
prompt_prefix_1 = """
You control a robot that can navigate through a building based on a json instruction format,
you understand several waypoints that have been given to you before (you can use RAG to retrieve
what room numbers or waypoints correspond to which people or semantics).

Here are all the waypoints you have access to:
- E7 1st floor Elevators, floor: 1
- E7 North Entrance, floor: 1
- E7 East Entrance, floor: 1
- E7 South Entrance, floor: 1
- E7 Coffee and Donuts, floor: 1
- Outreach Classroom, floor: 1
- RoboHub Entrance, floor: 1
- Vending Machine, floor: 1
- Room 2106, floor: 2

Extra information:
- Room 2106 is Zach's office
- Coffee and Donuts is referred to as CnD

Prompt: Can you pick something up from Zach's office and drop it off at the RoboHub?

Answer:
{
    "goals": [
        {
            "name": "Room 2106",
            "floor": 2
        },
        {
            "name": "RoboHub Entrance",
            "floor": 1
        }
    ]
}

Prompt:
"""

In [44]:
waypoints_df = pd.read_csv("/Users/amaankler/Documents/FYDP/cookbook/experiments/datasets/uw_e7_floor1.csv")
waypoints_list_str = json.dumps(waypoints_df.to_dict(orient="records"), indent=4)

e7_xrif_with_actions_3 = """
# Role
You control a robot that can navigate through a building based on a json instruction format, you understand several waypoints that have been provided in your context.
Your goal is to interpret the user's requests into this JSON instruction format.


# Context
Here are all the waypoints you have access to:
{waypoints_list}

Here are all the Functions you have access to:
Navigate, speak and wait


# Understanding user prompts
When you are asked to get something, make sure you speak at the exact location where the item is procured.
If someone states their location and you are asked to perform a task, go to them after you have received the item they requested or performed the task they requested.
If a waypoint is not in the given list and does not resemble a waypoint in that list, say that this location does not exist.


# Examples
Example Prompt: Can you pick something up from the c and d and drop it off at the RoboHub?

Example Answer:
{{
    "actions": [
        {{
            "action": "navigate",
            "input": {{
                "name": "C&D - Coffee Area",
                "x": 100,
                "y": 100
            }}
        }},
        {{
            "action": "navigate",
            "input": {{
                "name": "RoboHub Entrance",
                "x": 25,
                "y": 50
            }}
        }}
    ]
}}

Example Prompt: Can you ask Zach for the keys he is at the robohub and drop it off at the ideas clinic? Wait for 10 seconds when you meet Zach so he can give you the keys.

Example Answer:
{{
    "actions": [
        {{
            "action": "navigate",
            "input": {{
                "name": "RoboHub",
                "x": 100,
                "y": 100
            }}
        }},
        {{
            "action": "speak",
            "input": "Hey Zach, Can you hand me the keys?"
        }},
        {{
            "action": "wait",
            "input": 10
        }},
        {{
            "action": "navigate",
            "input": {{
                "name": "Room 1427 - Ideas Clinic",
                "x": 25,
                "y": 75
            }}
        }}
    ]
}}
Example Prompt: can you go to the dairy queen

Example Answer:
{{
  "actions": [
    {{
      "action": "speak",
      "input": "This waypoint does not exist"
    }}
  ]
}}

Prompt: {{query}}
"""

final_prompt = e7_xrif_with_actions_3.format(
    waypoints_list=waypoints_list_str,
)
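
A note on the `{{query}}` placeholder: `str.format` treats doubled braces as escapes, so after the call above the prompt contains the literal text `{query}`, and `run_experiment` further down simply appends the user input after it. If the intent is to substitute the query in place, one option is a plain string replace. The sketch below uses a hypothetical miniature template and a hypothetical helper `fill_query`; neither is part of the notebook.

```python
# Hypothetical miniature of the prompt template. Doubled braces are escapes
# for str.format, so only {waypoints_list} is substituted on the first pass.
template = """Waypoints: {waypoints_list}
Example Answer:
{{
    "actions": []
}}
Prompt: {{query}}
"""

# After formatting, the escapes collapse to single braces and the
# placeholder survives as the literal text "{query}".
stage_one = template.format(waypoints_list='["RoboHub"]')


def fill_query(prompt: str, query: str) -> str:
    """Insert the user query with a plain string replace.

    A second .format() call would fail at this point, because the JSON
    example in the prompt now contains unescaped braces.
    """
    return prompt.replace("{query}", query)


final = fill_query(stage_one, "go to the RoboHub")
print(final)
```

Using `replace` rather than a second `format` call keeps the JSON examples in the prompt untouched.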


In [45]:
waypoints_df = pd.read_csv("/Users/amaankler/Documents/FYDP/cookbook/experiments/datasets/uw_e7_floor1.csv")
waypoints_list_str = json.dumps(waypoints_df.to_dict(orient="records"), indent=4)

e7_xrif_with_actions_4 = """
# Role
You control a robot that navigates through a building using a JSON instruction format. You have access to several pre-defined waypoints and functions. Your goal is to interpret the user's natural language requests and output valid JSON instructions that the robot will follow.

# JSON Output Format
Each output must be valid JSON (following [RFC 8259](https://tools.ietf.org/html/rfc8259)) with the following structure:
{{
  "actions": [
    {{
      "action": "<action_type>",  // Must be one of: "navigate", "speak", "wait"
      "input": <value>  // For "navigate": an object with "name", "x", and "y"; for "speak": a string; for "wait": a number (in seconds)
    }},
    ...
  ]
}}
Ensure there are no trailing commas and all JSON rules are followed.

# Context
- Available Waypoints: {waypoints_list}
- Available Functions: Navigate, speak, and wait.

# Understanding User Prompts
- For requests to retrieve an item, ensure you navigate to the item's location, then speak if required, and then proceed to the destination.
- If a user specifies their current location and requests a task, complete the task and then move to the provided destination.
- If a waypoint mentioned is not in the provided list or does not match any known waypoint—even after attempting fuzzy matching—output an action with `"action": "speak"` and the input `"This waypoint does not exist"`.
- For ambiguous instructions or missing information (e.g., unclear sequence, invalid parameters), output a single "speak" action with an error message explaining the issue (e.g., "Invalid request: missing destination" or "Ambiguous waypoint provided").

# Handling Complex Commands
- Process commands in a left-to-right sequence unless explicitly stated otherwise.
- For commands that include multiple steps, output the actions in the order they should be executed.
- If multiple interpretations exist, choose the interpretation that minimizes backtracking.

# Supported Edge Cases
- **Unknown Waypoint:** If a waypoint is unknown, return:
{{
  "actions": [
    {{
      "action": "speak",
      "input": "This waypoint does not exist"
    }}
  ]
}}
- **Invalid or Ambiguous Commands:** Return a single speak action with an error message.
- **Unsupported Actions:** If any function beyond "navigate", "speak", or "wait" is requested, return:
{{
  "actions": [
    {{
      "action": "speak",
      "input": "Unsupported action requested"
    }}
  ]
}}

# Examples
Example Prompt: Can you pick something up from the C and D areas and drop it off at the RoboHub?
Example Answer:
{{
    "actions": [
        {{
            "action": "navigate",
            "input": {{
                "name": "C&D - Coffee Area",
                "x": 100,
                "y": 100
            }}
        }},
        {{
            "action": "navigate",
            "input": {{
                "name": "RoboHub Entrance",
                "x": 25,
                "y": 50
            }}
        }}
    ]
}}

Example Prompt: Can you ask Zach for the keys? He is at the RoboHub. After receiving the keys, drop them off at the Ideas Clinic. Also, wait for 10 seconds when you meet Zach.
Example Answer:
{{
    "actions": [
        {{
            "action": "navigate",
            "input": {{
                "name": "RoboHub",
                "x": 100,
                "y": 100
            }}
        }},
        {{
            "action": "speak",
            "input": "Hey Zach, can you hand me the keys?"
        }},
        {{
            "action": "wait",
            "input": 10
        }},
        {{
            "action": "navigate",
            "input": {{
                "name": "Room 1427 - Ideas Clinic",
                "x": 25,
                "y": 75
            }}
        }}
    ]
}}

Example Prompt: Please go to the unknown zone and then to the RoboHub.
Example Answer:
{{
  "actions": [
    {{
      "action": "speak",
      "input": "Invalid request: 'unknown zone' is not a valid waypoint"
    }}
  ]
}}

Prompt: {{query}}
"""

e7_xrif_with_actions4 = e7_xrif_with_actions_4.format(
    waypoints_list=waypoints_list_str,
)



In [None]:
waypoints_df = pd.read_csv("/Users/amaankler/Documents/FYDP/cookbook/experiments/datasets/uw_e7_floor1.csv")
waypoints_list_str = json.dumps(waypoints_df.to_dict(orient="records"), indent=4)

e7_xrif_with_actions_5 = """
# Role
You control a robot that navigates through a building using a JSON instruction format. You have access to several pre-defined waypoints and functions. Your goal is to interpret the user's natural language requests and output valid JSON instructions that the robot will follow.

# JSON Output Format
Each output must be valid JSON (following [RFC 8259](https://tools.ietf.org/html/rfc8259)) with the following structure:
{{
  "actions": [
    {{
      "action": "<action_type>",  // Must be one of: "navigate", "speak", "wait"
      "input": <value>  // For "navigate": an object with "name", "x", and "y"; for "speak": a string; for "wait": a number (in seconds)
    }},
    ...
  ]
}}
Ensure there are no trailing commas and all JSON rules are followed.

# Context
- Available Waypoints: {waypoints_list}
- Available Functions: Navigate, speak, and wait.

# Understanding User Prompts
# Advanced Fuzzy Matching for Waypoints

When matching waypoint names from the user's input against the available list, follow these steps:

1. **Normalization:**
   Convert both the user's input and each waypoint name to lowercase.

2. **Extraneous Token Removal:**
   Remove tokens that likely represent building codes or non-essential identifiers. For instance, if a token contains both letters and digits and appears at the beginning of the input, discard it.

3. **Tokenization:**
   Split both the cleaned user input and each known waypoint name into individual words.

4. **Similarity Calculation:**
   For each known waypoint, assess the match by:
   - Checking if every token from the waypoint name appears in the user's input.
   - Alternatively, ensuring that a high percentage (e.g., 80% or more) of the tokens in the waypoint name are present in the input.
   - If there are additional words in front of or after the waypoint name, still treat it as a match: for example, "waterloo robohub" matches the waypoint "RoboHub". In other words, if all of the words of a waypoint name appear in the prompt, treat it as that waypoint.
   - If "E7" or any other building code comes before a waypoint name, ignore it when determining whether the waypoint is valid.
   - "E7 ideas clinic" is the same as "ideas clinic" for waypoint selection and is therefore not unknown.
5. **Selection and Fallback:**
   - If a single waypoint meets or exceeds the similarity threshold, treat it as a match.
   - If multiple waypoints meet the criteria, select the one with the highest similarity score.
   - Only if no known waypoint meets the similarity threshold (80% or more of the waypoint's tokens present, or all words of the waypoint name included in the prompt), output an action with `"action": "speak"` and the input `"This waypoint does not exist"`.


# Handling Complex Commands
- Process commands in a left-to-right sequence unless explicitly stated otherwise.
- For commands that include multiple steps, output the actions in the order they should be executed.
- If multiple interpretations exist, choose the interpretation that minimizes backtracking.

# Supported Edge Cases
- **Unknown Waypoint:** If a waypoint is unknown, and you have applied the **Similarity Calculation** and it does not meet the threshold for any similar waypoint, return:
{{
  "actions": [
    {{
      "action": "speak",
      "input": "This waypoint does not exist"
    }}
  ]
}}
- **Invalid or Ambiguous Commands:** Return a single speak action with an error message.
- **Unsupported Actions:** If any function beyond "navigate", "speak", or "wait" is requested, return:
{{
  "actions": [
    {{
      "action": "speak",
      "input": "Unsupported action requested"
    }}
  ]
}}

# Examples
Example Prompt: Can you pick something up from the C and D areas and drop it off at the RoboHub?
Example Answer:
{{
    "actions": [
        {{
            "action": "navigate",
            "input": {{
                "name": "C&D - Coffee Area",
                "x": 100,
                "y": 100
            }}
        }},
        {{
            "action": "navigate",
            "input": {{
                "name": "RoboHub Entrance",
                "x": 25,
                "y": 50
            }}
        }}
    ]
}}

Example Prompt: Can you ask Zach for the keys? He is at the RoboHub. After receiving the keys, drop them off at the Ideas Clinic. Also, wait for 10 seconds when you meet Zach.
Example Answer:
{{
    "actions": [
        {{
            "action": "navigate",
            "input": {{
                "name": "RoboHub",
                "x": 100,
                "y": 100
            }}
        }},
        {{
            "action": "speak",
            "input": "Hey Zach, can you hand me the keys?"
        }},
        {{
            "action": "wait",
            "input": 10
        }},
        {{
            "action": "navigate",
            "input": {{
                "name": "Room 1427 - Ideas Clinic",
                "x": 25,
                "y": 75
            }}
        }}
    ]
}}

Example Prompt: Please go to the unknown zone and then to the RoboHub.
Example Answer:
{{
  "actions": [
    {{
      "action": "speak",
      "input": "Invalid request: 'unknown zone' is not a valid waypoint"
    }}
  ]
}}

Prompt: {{query}}
"""

e7_xrif_with_actions5 = e7_xrif_with_actions_5.format(
    waypoints_list=waypoints_list_str,
)

print(waypoints_list_str)
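
The fuzzy-matching steps described in the prompt above could also be written as a deterministic reference, which may be useful for checking whether the model picked the waypoint the rules imply. This is a minimal sketch under assumptions: the function names, the regex tokenizer, and the sample waypoint names are illustrative and not part of the notebook, and the building-code rule is applied to waypoint names as well as to the input.

```python
import re
from typing import List, Optional


def _tokens(text: str) -> List[str]:
    """Normalize to lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def _strip_building_code(tokens: List[str]) -> List[str]:
    """Drop a leading token that mixes letters and digits, such as 'e7'."""
    if tokens and re.search(r"[a-z]", tokens[0]) and re.search(r"\d", tokens[0]):
        return tokens[1:]
    return tokens


def match_waypoint(user_text: str, waypoint_names: List[str],
                   threshold: float = 0.8) -> Optional[str]:
    """Return the best-matching waypoint name, or None if none reaches the threshold."""
    input_tokens = set(_strip_building_code(_tokens(user_text)))
    best_name, best_score = None, 0.0
    for name in waypoint_names:
        name_tokens = _strip_building_code(_tokens(name))
        if not name_tokens:
            continue
        # Fraction of the waypoint's tokens that appear in the user's input.
        score = sum(t in input_tokens for t in name_tokens) / len(name_tokens)
        if score >= threshold and score > best_score:
            best_name, best_score = name, score
    return best_name


# Sample names only; the real list comes from the waypoints CSV.
names = ["Silent Study", "RoboHub Entrance", "E7 North Entrance"]
print(match_waypoint("Take me to the E7 silent study.", names))
print(match_waypoint("go to the dairy queen", names))
```

One limitation worth noting: under a strict 80% token rule, a name such as "Room 1427 - Ideas Clinic" would not match the input "ideas clinic", because only two of its four tokens appear.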

In [89]:
waypoints_df = pd.read_csv("/Users/amaankler/Documents/FYDP/cookbook/experiments/datasets/uw_e7_floor1.csv")
waypoints_list_str = json.dumps(waypoints_df.to_dict(orient="records"), indent=4)

e7_xrif_with_actions_6 = """
# Role
You control a robot that navigates through a building using a JSON instruction format. You have access to several pre-defined waypoints and functions. Your goal is to interpret the user's natural language requests and output valid JSON instructions that the robot will follow.

# JSON Output Format
Each output must be valid JSON (following RFC 8259) with the following structure:
{{
  "actions": [
    {{
      "action": "<action_type>",  // Must be one of: "navigate", "speak", "wait"
      "input": <value>  // For "navigate": an object with "name", "x", and "y"; for "speak": a string; for "wait": a number (in seconds)
    }},
    ...
  ]
}}
Ensure there are no trailing commas and all JSON rules are followed.

# Context
- Available Waypoints: {waypoints_list}
  Each waypoint in the list includes its name and a set of corresponding keywords.
- Available Functions: Navigate, speak, and wait.

# Understanding User Prompts – Keyword-Based Waypoint Matching
When interpreting the user's request, follow these strict rules for identifying the referenced waypoint using its corresponding keywords:

1. Ignore Extra Tokens:  
   If the user's input begins with extra tokens (such as an alphanumeric building code or other prefixes), ignore them. For example, treat "E7 ideas clinic" as "ideas clinic".

2. Keyword Matching:  
   Each available waypoint comes with a set of keywords. For example, if a waypoint is defined as "ideas clinic" with keywords like ["ideas", "clinic", "room"], then a valid reference must include at least one of these keywords or one of the words in the waypoint name.  
   - The input "ideas clinic" or "ideas room" should both match this waypoint because they include the core identifier "ideas".

3. Strict Match Requirement:  
   After ignoring extra tokens, if the remaining text does not contain any of the keywords or the words in the waypoint name for a given waypoint, or if more than one waypoint could be inferred without a clear winner, then the reference is considered invalid.  
   In other words, a waypoint is only valid if the cleaned input contains at least one of the defined keywords or the words in the waypoint name for that waypoint and clearly points to one single available waypoint.

4. Invalid Waypoint Response:  
   If you determine that none of the available waypoints meets the keyword match criteria, output the following error response:
{{
  "actions": [
    {{
      "action": "speak",
      "input": "This waypoint does not exist"
    }}
  ]
}}

# Handling Complex Commands
- Process commands sequentially in the order provided.
- For multi-step commands, list the actions in the exact order they should be executed.
- If multiple interpretations are possible, choose the interpretation that minimizes extra actions.

# Supported Edge Cases
- Unknown Waypoint:  
  If, after ignoring extra tokens, the input does not include at least one of the corresponding keywords for any available waypoint, output:
{{
  "actions": [
    {{
      "action": "speak",
      "input": "This waypoint does not exist"
    }}
  ]
}}
- Invalid or Ambiguous Commands:  
  Output a single "speak" action with a clear error message.
- Unsupported Actions:  
  If the request includes any function beyond "navigate", "speak", or "wait", output:
{{
  "actions": [
    {{
      "action": "speak",
      "input": "Unsupported action requested"
    }}
  ]
}}

# Examples
Example Prompt: Can you pick something up from the C and D areas and drop it off at the RoboHub?
Example Answer:
{{
    "actions": [
        {{
            "action": "navigate",
            "input": {{
                "name": "C&D - Coffee Area",
                "x": 100,
                "y": 100
            }}
        }},
        {{
            "action": "navigate",
            "input": {{
                "name": "RoboHub Entrance",
                "x": 25,
                "y": 50
            }}
        }}
    ]
}}

Example Prompt: Can you ask Zach for the keys? He is at the RoboHub. After receiving the keys, drop them off at the Ideas Clinic. Also, wait for 10 seconds when you meet Zach.
Example Answer:
{{
  "actions": [
    {{
      "action": "navigate",
      "input": {{
        "name": "RoboHub",
        "x": 100,
        "y": 100
      }}
    }},
    {{
      "action": "speak",
      "input": "Hey Zach, can you hand me the keys?"
    }},
    {{
      "action": "wait",
      "input": 10
    }},
    {{
      "action": "navigate",
      "input": {{
        "name": "Room 1427 - Ideas Clinic",
        "x": 25,
        "y": 75
      }}
    }}
  ]
}}

Example Prompt: Please go to the unknown zone and then to the RoboHub.
Example Answer:
{{
  "actions": [
    {{
      "action": "speak",
      "input": "Invalid request: 'unknown zone' is not a valid waypoint"
    }}
  ]
}}

"""



e7_xrif_with_actions6 = e7_xrif_with_actions_6.format(
    waypoints_list=waypoints_list_str,
)
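
The keyword rules in this prompt (match on keywords or name words, reject ties) can be sketched as plain Python for comparison with the model's choices. The record fields `name` and `keywords` are assumptions about the waypoint data, since the CSV columns are not shown here, and `keyword_match` is a hypothetical helper. The sample records reuse entries from the hand-written waypoint list in `prompt_prefix_2`.

```python
import re
from typing import Dict, List, Optional


def _words(text: str) -> set:
    """Lowercase the text and return its alphanumeric words as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_match(user_text: str, waypoints: List[Dict]) -> Optional[str]:
    """Return the single waypoint whose name words or keywords best match the text.

    Returns None when nothing matches or when the top score is tied,
    mirroring the "no clear winner" rule in the prompt.
    """
    words = _words(user_text)
    scores = {}
    for wp in waypoints:
        terms = _words(wp["name"])
        for kw in wp.get("keywords", []):
            terms |= _words(kw)
        hits = len(terms & words)
        if hits:
            scores[wp["name"]] = hits
    if not scores:
        return None
    best = max(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else None


# Assumed record shape: {"name": ..., "keywords": [...]}.
sample = [
    {"name": "E7 North Entrance", "keywords": ["entrance"]},
    {"name": "E7 South Entrance", "keywords": ["entrance"]},
    {"name": "Room 2106", "keywords": ["office", "zach", "WEEF"]},
]
print(keyword_match("drop this at zach's office", sample))
print(keyword_match("go to the entrance", sample))
```

Counting overlapping words is a deliberately crude score; generic name words such as "room" would match many inputs, so a stop-word list may be needed in practice.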

In [47]:
prompt_prefix_2 = """
You control a robot that can navigate through a building based on a json instruction format,
you understand several waypoints that have been given to you before (you can use RAG to retrieve
what room numbers or waypoints correspond to which people or semantics).

Here are all the waypoints you have access to:
- E7 1st floor Elevators, floor: 1, keywords = ["elevator"]
- E7 North Entrance, floor: 1, keywords = ["entrance"]
- E7 East Entrance, floor: 1, keywords = ["entrance"]
- E7 South Entrance, floor: 1, keywords = ["entrance"]
- E7 Coffee and Donuts, floor: 1, keywords = ["food and drink", "coffee"]
- Outreach Classroom, floor: 1, keywords = ["classroom"]
- RoboHub Entrance, floor: 1, keywords = ["workshop", "robots", "Brandon"]
- Vending Machine, floor: 1, keywords = ["food and drink"]
- Room 2106, floor: 2, keywords = ["office", "zach", "WEEF"]

Extra information:
- Room 2106 is Zach's office
- Coffee and Donuts is referred to as CnD

Prompt: Can you pick something up from Zach's office and drop it off at the RoboHub?

Answer:
{
    "goals": [
        {
            "name": "Room 2106",
            "keywords": ["office", "zach", "WEEF"],
            "floor": 2
        },
        {
            "name": "RoboHub Entrance",
            "keywords": ["workshop", "robots", "Brandon"],
            "floor": 1
        }
    ]
}

Prompt:
"""

In [48]:
experiments = [
    "Get me an energy drink, I'm in the RoboHub also I am not in the robohub",
    "go to the c and d and say can someone get me a jamacan patty wait 30 seconds then go to the robohub",
    "get me a coffee and go tell john at the south enterance to meet me at robohub in 20 min then meet me i'm in the robohub but before you go wait 30 seconds",
    "go to the robo hub then the ECE lounge",
]




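def run_experiment(prefix, list_of_inputs):
    """Print the model's response for each input appended to the prompt prefix."""
    # Avoid naming the loop variable `input`, which shadows the builtin.
    for user_input in list_of_inputs:
        print(f"Input: {user_input}")
        print(prompt_cohere(f"{prefix}{user_input}"))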
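
To compare prompts across a large loop of requests, and to cover the "add timing" idea from the notes, it may help to collect results instead of printing them. The following is a sketch, not the notebook's method: `run_experiment_collect` and `fake_model` are hypothetical names, and the model call is passed in as a function so the example runs without an API key. In the notebook, `prompt_cohere` could be passed as `ask`.

```python
import json
import time
from typing import Callable, Dict, List


def run_experiment_collect(prefix: str, queries: List[str],
                           ask: Callable[[str], str]) -> List[Dict]:
    """Call ask(prefix + query) per query; record output, JSON validity, latency."""
    records = []
    for query in queries:
        start = time.perf_counter()
        raw = ask(f"{prefix}{query}")
        elapsed = time.perf_counter() - start
        try:
            json.loads(raw)
            valid = True
        except json.JSONDecodeError:
            valid = False
        records.append(
            {"input": query, "output": raw, "valid_json": valid, "seconds": elapsed}
        )
    return records


def fake_model(prompt: str) -> str:
    """Offline stand-in for prompt_cohere, used only for this illustration."""
    return '{"actions": []}' if "robohub" in prompt.lower() else "not json"


rows = run_experiment_collect(
    "Prompt: ", ["go to the RoboHub", "go to the moon"], fake_model
)
for row in rows:
    print(row["input"], row["valid_json"], f"{row['seconds']:.6f}s")
```

The list of dicts can be handed to `pd.DataFrame(rows)` to aggregate validity rates and latencies per prompt prefix.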

In [49]:

prompts_df = pd.read_csv("/Users/amaankler/Documents/FYDP/cookbook/experiments/datasets/Datasets & Prompts - E7 Test Cases.csv")

# Drop rows without a prompt, then draw up to 10 prompts at random.
prompts_df = prompts_df.dropna(subset=["Prompt"]).reset_index(drop=True)
sample_size = min(10, len(prompts_df))
random_prompts = prompts_df["Prompt"].sample(sample_size).tolist()

print(random_prompts)

['"There\'s no good coffee here. Don\'t go to the C&D"', '"Robot, go to [[Silent Study]], say \'Silent Study\' in a calm voice, and wait for 15 minutes."', '"Take me to the E7 silent study."', '"Take me to the E7 silent study."', '"Go to the North Entrance and then proceed to the C&D - Main Entrance."', '"I wonder how many entrances there are to E7. Also, Go to the C&D"', '"Robot, navigate to [[C&D - Hot Food Area]], announce \'C&D Hot Food Area\' clearly, and wait until further instructions."', '"Take me to the E7 silent study."', '"I need to freshen up. Take me two Shower A, then Shower B. Wait 1 minute at Shower B, then we can go."', '"Robot, start at C&D - Snack Area and inspect for spills or trash. Then, proceed to C&D - Cold Drink Area and ensure all drinks are properly stocked. Next, move to the C&D - Pastry Area and check for expired products. After that, visit the C&D - Coffee Area to confirm cleanliness. Finally, go to the C&D - Hot Food Area and sanitize any open surfaces be

In [50]:
def evaluate_output_v1():
    """Ensure the output is valid XRIF JSON and that the prompt contains only relevant information."""
    pass
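
One possible shape for the format check is sketched below. It assumes the `actions` structure described in the later prompts (a list of `navigate`, `speak`, and `wait` steps) and a set of known waypoint names supplied by the caller. The name `evaluate_output_sketch` is hypothetical, and the checks shown are a starting point rather than a complete XRIF validator.

```python
import json
from typing import List, Set, Tuple

ALLOWED_ACTIONS = {"navigate", "speak", "wait"}


def evaluate_output_sketch(raw: str, known_waypoints: Set[str]) -> Tuple[bool, List[str]]:
    """Return (passed, problems) for one model response."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, [f"invalid JSON: {exc.msg}"]

    actions = data.get("actions") if isinstance(data, dict) else None
    if not isinstance(actions, list):
        return False, ['top level must be an object with an "actions" list']

    problems = []
    for i, step in enumerate(actions):
        if not isinstance(step, dict):
            problems.append(f"action {i}: not an object")
            continue
        kind, value = step.get("action"), step.get("input")
        if not isinstance(kind, str) or kind not in ALLOWED_ACTIONS:
            problems.append(f"action {i}: unsupported action {kind!r}")
        elif kind == "navigate":
            if not (isinstance(value, dict) and {"name", "x", "y"} <= value.keys()):
                problems.append(f"action {i}: navigate needs name, x, y")
            elif value["name"] not in known_waypoints:
                problems.append(f"action {i}: unknown waypoint {value['name']!r}")
        elif kind == "speak" and not isinstance(value, str):
            problems.append(f"action {i}: speak needs a string")
        elif kind == "wait" and (isinstance(value, bool) or not isinstance(value, (int, float))):
            problems.append(f"action {i}: wait needs a number")
    return not problems, problems


example = '{"actions": [{"action": "wait", "input": 10}, {"action": "fly", "input": 1}]}'
print(evaluate_output_sketch(example, {"Silent Study"}))
```

Returning the list of problems alongside the pass/fail flag makes it easier to see which rule a prompt variant tends to break.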

In [51]:
# TODO: Build some relevant Cohere Skills to call and then run experiments with them.

Experiment notes:
- Cohere and GPT-4 don't interpret "E7 silent study" as "Silent Study".

In [90]:
run_experiment(e7_xrif_with_actions6, random_prompts)

Input: "There's no good coffee here. Don't go to the C&D"
{
    "actions": [
        {
            "action": "speak",
            "input": "This waypoint does not exist"
        }
    ]
}
Input: "Robot, go to [[Silent Study]], say 'Silent Study' in a calm voice, and wait for 15 minutes."
{
  "actions": [
    {
      "action": "navigate",
      "input": {
        "name": "Silent Study",
        "x": 1443,
        "y": 572
      }
    },
    {
      "action": "speak",
      "input": "Silent Study"
    },
    {
      "action": "wait",
      "input": 900
    }
  ]
}
Input: "Take me to the E7 silent study."
{
  "actions": [
    {
      "action": "navigate",
      "input": {
        "name": "Silent Study",
        "x": 1443,
        "y": 572
      }
    }
  ]
}
Input: "Take me to the E7 silent study."
{
  "actions": [
    {
      "action": "navigate",
      "input": {
        "name": "Silent Study",
        "x": 1443,
        "y": 572
      }
    }
  ]
}
Input: "Go to the North Entrance and 