### Ideas for Evaluation
The main goal is to build a reliable, replicable system that evaluates LLM outputs and determines which prompts work best for our task.

- LLM Judges
- Check that the XRIF JSON is valid JSON
- Response Quality
- Try many different requests, add new locations, modify information relevant to locations?
- Try running a large loop of requests all using the same prompt prefix, and see which ones do well and which ones don't. Can your eval function properly identify good vs bad outputs?
- Build some skills/tool specs and test those

Also add timing to tasks, so the latency of each request is recorded alongside its output.
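
A minimal sketch of how timing could be added. The helper `timed_call` is hypothetical (it is not part of this notebook); it wraps any callable, such as `prompt_cohere`, and returns the result together with the elapsed wall-clock seconds.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds).

    Hypothetical helper for illustration only.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Usage with a stand-in function; swap in the real model call when available.
result, seconds = timed_call(lambda text: text.upper(), "go to the robohub")
print(f"{result!r} took {seconds:.4f}s")
```

`time.perf_counter` is used rather than `time.time` because it is monotonic and intended for measuring short durations.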


### Relevant LLM concepts to know
- Skills / Function Calling (Cohere calls them Tools) -> https://docs.cohere.com/v2/docs/tools
- Structured Outputs -> https://docs.cohere.com/v2/docs/structured-outputs
- AI Agents
- RAG -> Retrieval Augmented Generation

In [None]:
import cohere
import pandas as pd
import json
import os

In [None]:
COHERE_API_KEY = os.getenv("COHERE_API_KEY")

In [17]:
cohere_client = cohere.ClientV2(api_key=COHERE_API_KEY)

In [18]:
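def prompt_cohere(prompt, model_name="command-r-plus-08-2024"):
    """Send a single user message to Cohere and return the reply text (requested as JSON)."""
    # No `global` statement is needed: cohere_client is only read, never reassigned.
    response = cohere_client.chat(
        model=model_name,
        messages=[
            {
                "role": "user",
                "content": prompt
            }
        ],
        response_format={"type": "json_object"},
    )
    return response.message.content[0].text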

In [19]:
prompt_prefix_1 = """
You control a robot that can navigate through a building based on a json instruction format,
you understand several waypoints that have been given to you before (you can use RAG to retrieve
what room numbers or waypoints correspond to which people or semantics).

Here are all the waypoints you have access to:
- E7 1st floor Elevators, floor: 1
- E7 North Entrance, floor: 1
- E7 East Entrance, floor: 1
- E7 South Entrance, floor: 1
- E7 Coffee and Donuts, floor: 1
- Outreach Classroom, floor: 1
- RoboHub Entrance, floor: 1
- Vending Machine, floor: 1
- Room 2106, floor: 2

Extra information:
- Room 2106 is Zach's office
- Coffee and Donuts is referred to as CnD

Prompt: Can you pick something up from Zach's office and drop it off at the RoboHub?

Answer:
{
    "goals": [
        {
            "name": "Room 2106",
            "floor": 2
        },
        {
            "name": "RoboHub Entrance",
            "floor": 1
        }
    ]
}

Prompt:
"""

In [43]:
waypoints_df = pd.read_csv("/content/Datasets & Prompts - E7 First Floor.csv")
waypoints_list_str = json.dumps(waypoints_df.to_dict(orient="records"), indent=4)

e7_xrif_with_actions_3 = """
# Role
You control a robot that can navigate through a building based on a json instruction format, you understand several waypoints that have been provided in your context.
Your goal is to interpret the user's requests into this JSON instruction format.


# Context
Here are all the waypoints you have access to:
{waypoints_list}

Here are all the Functions you have access to:
Navigate, speak and wait


# Understanding user prompts
When you are asked to get something, make sure you speak at the exact location where the item is procured.
If someone states their location and you are asked to perform a task, go to that location after you have received the item they requested or performed the task they requested.
If a waypoint is not in the given list and is not similar to any waypoint in that list, say that this location does not exist.


# Examples
Example Prompt: Can you pick something up from the c and d and drop it off at the RoboHub?

Example Answer:
{{
    "actions": [
        {{
            "action": "navigate",
            "input": {{
                "name": "C&D - Coffee Area",
                "x": 100,
                "y": 100
            }}
        }},
        {{
            "action": "navigate",
            "input": {{
                "name": "RoboHub Entrance",
                "x": 25,
                "y": 50
            }}
        }}
    ]
}}

Example Prompt: Can you ask Zach for the keys he is at the robohub and drop it off at the ideas clinic? Wait for 10 seconds when you meet Zach so he can give you the keys.

Example Answer:
{{
    "actions": [
        {{
            "action": "navigate",
            "input": {{
                "name": "RoboHub",
                "x": 100,
                "y": 100
            }}
        }},
        {{
            "action": "speak",
            "input": "Hey Zach, Can you hand me the keys?"
        }},
        {{
            "action": "wait",
            "input": 10
        }},
        {{
            "action": "navigate",
            "input": {{
                "name": "Room 1427 - Ideas Clinic",
                "x": 25,
                "y": 75
            }}
        }}
    ]
}}
Example Prompt: can you go to the dairy queen

Example Answer:
{{
  "actions": [
    {{
      "action": "speak",
      "input": "This waypoint does not exist"
    }}
  ]
}}

Prompt: {{query}}
"""

final_prompt = e7_xrif_with_actions_3.format(
    waypoints_list=waypoints_list_str,
)


In [56]:
waypoints_df = pd.read_csv("/content/Datasets & Prompts - E7 First Floor.csv")
waypoints_list_str = json.dumps(waypoints_df.to_dict(orient="records"), indent=4)

e7_xrif_with_actions_4 = """
# Role
You control a robot that navigates through a building using a JSON instruction format. You have access to several pre-defined waypoints and functions. Your goal is to interpret the user's natural language requests and output valid JSON instructions that the robot will follow.

# JSON Output Format
Each output must be valid JSON (following [RFC 8259](https://tools.ietf.org/html/rfc8259)) with the following structure:
{{
  "actions": [
    {{
      "action": "<action_type>",  // Must be one of: "navigate", "speak", "wait"
      "input": <value>  // For "navigate": an object with "name", "x", and "y"; for "speak": a string; for "wait": a number (in seconds)
    }},
    ...
  ]
}}
Ensure there are no trailing commas and all JSON rules are followed.

# Context
- Available Waypoints: {waypoints_list}
- Available Functions: Navigate, speak, and wait.

# Understanding User Prompts
- For requests to retrieve an item, ensure you navigate to the item's location, then speak if required, and then proceed to the destination.
- If a user specifies their current location and requests a task, complete the task and then move to the provided destination.
- If a waypoint mentioned is not in the provided list or does not match any known waypoint—even after attempting fuzzy matching—output an action with `"action": "speak"` and the input `"This waypoint does not exist"`.
- For ambiguous instructions or missing information (e.g., unclear sequence, invalid parameters), output a single "speak" action with an error message explaining the issue (e.g., "Invalid request: missing destination" or "Ambiguous waypoint provided").

# Handling Complex Commands
- Process commands in a left-to-right sequence unless explicitly stated otherwise.
- For commands that include multiple steps, output the actions in the order they should be executed.
- If multiple interpretations exist, choose the interpretation that minimizes backtracking.

# Supported Edge Cases
- **Unknown Waypoint:** If a waypoint is unknown, return:
{{
  "actions": [
    {{
      "action": "speak",
      "input": "This waypoint does not exist"
    }}
  ]
}}
- **Invalid or Ambiguous Commands:** Return a single speak action with an error message.
- **Unsupported Actions:** If any function beyond "navigate", "speak", or "wait" is requested, return:
{{
  "actions": [
    {{
      "action": "speak",
      "input": "Unsupported action requested"
    }}
  ]
}}

# Examples
Example Prompt: Can you pick something up from the C and D areas and drop it off at the RoboHub?
Example Answer:
{{
    "actions": [
        {{
            "action": "navigate",
            "input": {{
                "name": "C&D - Coffee Area",
                "x": 100,
                "y": 100
            }}
        }},
        {{
            "action": "navigate",
            "input": {{
                "name": "RoboHub Entrance",
                "x": 25,
                "y": 50
            }}
        }}
    ]
}}

Example Prompt: Can you ask Zach for the keys? He is at the RoboHub. After receiving the keys, drop them off at the Ideas Clinic. Also, wait for 10 seconds when you meet Zach.
Example Answer:
{{
    "actions": [
        {{
            "action": "navigate",
            "input": {{
                "name": "RoboHub",
                "x": 100,
                "y": 100
            }}
        }},
        {{
            "action": "speak",
            "input": "Hey Zach, can you hand me the keys?"
        }},
        {{
            "action": "wait",
            "input": 10
        }},
        {{
            "action": "navigate",
            "input": {{
                "name": "Room 1427 - Ideas Clinic",
                "x": 25,
                "y": 75
            }}
        }}
    ]
}}

Example Prompt: Please go to the unknown zone and then to the RoboHub.
Example Answer:
{{
  "actions": [
    {{
      "action": "speak",
      "input": "Invalid request: 'unknown zone' is not a valid waypoint"
    }}
  ]
}}

Prompt: {{query}}
"""

e7_xrif_with_actions4 = e7_xrif_with_actions_4.format(
    waypoints_list=waypoints_list_str,
)



In [67]:
waypoints_df = pd.read_csv("/content/Datasets & Prompts - E7 First Floor.csv")
waypoints_list_str = json.dumps(waypoints_df.to_dict(orient="records"), indent=4)

e7_xrif_with_actions_5 = """
# Role
You control a robot that navigates through a building using a JSON instruction format. You have access to several pre-defined waypoints and functions. Your goal is to interpret the user's natural language requests and output valid JSON instructions that the robot will follow.

# JSON Output Format
Each output must be valid JSON (following [RFC 8259](https://tools.ietf.org/html/rfc8259)) with the following structure:
{{
  "actions": [
    {{
      "action": "<action_type>",  // Must be one of: "navigate", "speak", "wait"
      "input": <value>  // For "navigate": an object with "name", "x", and "y"; for "speak": a string; for "wait": a number (in seconds)
    }},
    ...
  ]
}}
Ensure there are no trailing commas and all JSON rules are followed.

# Context
- Available Waypoints: {waypoints_list}
- Available Functions: Navigate, speak, and wait.

# Understanding User Prompts
# Advanced Fuzzy Matching for Waypoints

When matching waypoint names from the user's input against the available list, follow these steps:

1. **Normalization:**
   Convert both the user's input and each waypoint name to lowercase.

2. **Extraneous Token Removal:**
   Remove tokens that likely represent building codes or non-essential identifiers. For instance, if a token contains both letters and digits and appears at the beginning of the input, discard it.

3. **Tokenization:**
   Split both the cleaned user input and each known waypoint name into individual words.

4. **Similarity Calculation:**
   For each known waypoint, assess the match by:
   - Checking if every token from the waypoint name appears in the user's input.
   - Alternatively, ensuring that a high percentage (e.g., 80% or more) of the tokens in the waypoint name are present in the input.
   This allows the input to include additional tokens (such as non-essential prefixes) without preventing a match.

5. **Selection and Fallback:**
   - If a single waypoint meets or exceeds the similarity threshold, treat it as a match.
   - If multiple waypoints meet the criteria, select the one with the highest similarity score.
   - If no known waypoint meets the threshold, output an action with `"action": "speak"` and the input `"This waypoint does not exist"`.


# Handling Complex Commands
- Process commands in a left-to-right sequence unless explicitly stated otherwise.
- For commands that include multiple steps, output the actions in the order they should be executed.
- If multiple interpretations exist, choose the interpretation that minimizes backtracking.

# Supported Edge Cases
- **Unknown Waypoint:** If a waypoint is unknown, return:
{{
  "actions": [
    {{
      "action": "speak",
      "input": "This waypoint does not exist"
    }}
  ]
}}
- **Invalid or Ambiguous Commands:** Return a single speak action with an error message.
- **Unsupported Actions:** If any function beyond "navigate", "speak", or "wait" is requested, return:
{{
  "actions": [
    {{
      "action": "speak",
      "input": "Unsupported action requested"
    }}
  ]
}}

# Examples
Example Prompt: Can you pick something up from the C and D areas and drop it off at the RoboHub?
Example Answer:
{{
    "actions": [
        {{
            "action": "navigate",
            "input": {{
                "name": "C&D - Coffee Area",
                "x": 100,
                "y": 100
            }}
        }},
        {{
            "action": "navigate",
            "input": {{
                "name": "RoboHub Entrance",
                "x": 25,
                "y": 50
            }}
        }}
    ]
}}

Example Prompt: Can you ask Zach for the keys? He is at the RoboHub. After receiving the keys, drop them off at the Ideas Clinic. Also, wait for 10 seconds when you meet Zach.
Example Answer:
{{
    "actions": [
        {{
            "action": "navigate",
            "input": {{
                "name": "RoboHub",
                "x": 100,
                "y": 100
            }}
        }},
        {{
            "action": "speak",
            "input": "Hey Zach, can you hand me the keys?"
        }},
        {{
            "action": "wait",
            "input": 10
        }},
        {{
            "action": "navigate",
            "input": {{
                "name": "Room 1427 - Ideas Clinic",
                "x": 25,
                "y": 75
            }}
        }}
    ]
}}

Example Prompt: Please go to the unknown zone and then to the RoboHub.
Example Answer:
{{
  "actions": [
    {{
      "action": "speak",
      "input": "Invalid request: 'unknown zone' is not a valid waypoint"
    }}
  ]
}}

Prompt: {{query}}
"""

e7_xrif_with_actions5 = e7_xrif_with_actions_5.format(
    waypoints_list=waypoints_list_str,
)
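
The fuzzy-matching steps described in the prompt above can also be run outside the model, for example to check whether a test prompt should have matched a waypoint. The following is a rough sketch under stated assumptions: `match_waypoint` and its helpers are hypothetical names, the building-code rule is applied only to the first token of the user input, and the 0.8 threshold is the example value from the prompt.

```python
import re


def _tokens(text):
    """Lowercase the text and split it into alphanumeric tokens (keeping '&')."""
    return re.findall(r"[a-z0-9&]+", text.lower())


def _strip_building_code(tokens):
    """Drop a leading token that mixes letters and digits, e.g. 'e7'."""
    if tokens and re.search(r"[a-z]", tokens[0]) and re.search(r"\d", tokens[0]):
        return tokens[1:]
    return tokens


def match_waypoint(user_input, waypoint_names, threshold=0.8):
    """Return the best-matching waypoint name, or None if none reaches the threshold."""
    input_tokens = set(_strip_building_code(_tokens(user_input)))
    best_name, best_score = None, 0.0
    for name in waypoint_names:
        name_tokens = _tokens(name)
        if not name_tokens:
            continue
        # Fraction of the waypoint's tokens that appear in the user input.
        score = sum(tok in input_tokens for tok in name_tokens) / len(name_tokens)
        if score >= threshold and score > best_score:
            best_name, best_score = name, score
    return best_name


# Illustrative waypoint names; the real list comes from the waypoints CSV.
names = ["Silent Study", "RoboHub Entrance", "North Entrance"]
print(match_waypoint("Take me to the E7 silent study", names))
print(match_waypoint("Can you go to the Dairy Queen?", names))
```

Because the score only counts waypoint tokens found in the input, extra words in the request (including a building code that is not the first token) do not prevent a match.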



In [21]:
prompt_prefix_2 = """
You control a robot that can navigate through a building based on a json instruction format,
you understand several waypoints that have been given to you before (you can use RAG to retrieve
what room numbers or waypoints correspond to which people or semantics).

Here are all the waypoints you have access to:
- E7 1st floor Elevators, floor: 1, keywords = ["elevator"]
- E7 North Entrance, floor: 1, keywords = ["entrance"]
- E7 East Entrance, floor: 1, keywords = ["entrance"]
- E7 South Entrance, floor: 1, keywords = ["entrance"]
- E7 Coffee and Donuts, floor: 1, keywords = ["food and drink", "coffee"]
- Outreach Classroom, floor: 1, keywords = ["classroom"]
- RoboHub Entrance, floor: 1, keywords = ["workshop", "robots", "Brandon"]
- Vending Machine, floor: 1, keywords = ["food and drink"]
- Room 2106, floor: 2, keywords = ["office", "zach", "WEEF"]

Extra information:
- Room 2106 is Zach's office
- Coffee and Donuts is referred to as CnD

Prompt: Can you pick something up from Zach's office and drop it off at the RoboHub?

Answer:
{
    "goals": [
        {
            "name": "Room 2106",
            "keywords": ["office", "zach", "WEEF"],
            "floor": 2
        },
        {
            "name": "RoboHub Entrance",
            "keywords": ["workshop", "robots", "Brandon"],
            "floor": 1
        }
    ]
}

Prompt:
"""

In [48]:
experiments = [
    "Get me an energy drink, I'm in the RoboHub also I am not in the robohub",
    "go to the c and d and say can someone get me a jamacan patty wait 30 seconds then go to the robohub",
    "get me a coffee and go tell john at the south enterance to meet me at robohub in 20 min then meet me i'm in the robohub but before you go wait 30 seconds",
    "go to the robo hub then the ECE lounge",
]




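def run_experiment(prefix, list_of_inputs):
    for user_input in list_of_inputs:  # avoid shadowing the built-in input()
        # Formatted templates end with a literal "{query}" placeholder; older
        # prefixes (prompt_prefix_1/2) have none, so the input is appended instead.
        if "{query}" in prefix:
            full_prompt = prefix.replace("{query}", user_input)
        else:
            full_prompt = f"{prefix}{user_input}"
        print(f"Input: {user_input}")
        print(prompt_cohere(full_prompt))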

In [52]:

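# Load the test prompts and drop rows with no prompt text.
prompts_df = pd.read_csv("/content/Datasets & Prompts - E7 Test Cases.csv")
prompts_df = prompts_df.dropna(subset=["Prompt"]).reset_index(drop=True)

# Sample up to 10 prompts at random (no seed is set, so each run draws a different sample).
sample_size = min(10, len(prompts_df))
random_prompts = prompts_df["Prompt"].sample(sample_size).tolist()

print(random_prompts)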

['"I could really go for a shower."', '"Take me to the Elevators. Wait there for 15 seconds, and then let\'s find the East Entrance."', '"Robot, go to Silent Study and say \'Silent Study\' in a calm voice."', '"Can you go to the Dairy Queen?"', '"Take me to the E7 silent study."', '"Robot, proceed to North Entrance and say \'North Entrance\' out loud."', '"Navigate to Ideas Clinic and then go Assembly Room."', '"Take me to the E7 silent study."', '"I\'m fealing peckish. Can we swing by the C&D - Snack Area and then grab a drink at the C&D - Cold Drink Aeria?"', '"Travel to the E5 Entrance and then find the Microwaves."']


In [23]:
def evaluate_output_v1():
    """Ensure the output is valid XRIF JSON and that the prompt only contains relevant information."""
    pass
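
One possible starting point for the format half of this check is sketched below. It is an assumption-laden sketch, not the final evaluator: `check_xrif` is a hypothetical name, and the rules it enforces are taken from the "JSON Output Format" section of the prompts above (an `actions` list whose items are `navigate`, `speak`, or `wait`).

```python
import json

ALLOWED_ACTIONS = {"navigate", "speak", "wait"}


def check_xrif(raw_text):
    """Return a list of problems found in a model response; an empty list means it passed."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    actions = data.get("actions")
    if not isinstance(actions, list) or not actions:
        return ["'actions' must be a non-empty list"]

    problems = []
    for i, step in enumerate(actions):
        if not isinstance(step, dict):
            problems.append(f"action {i}: not an object")
            continue
        kind, value = step.get("action"), step.get("input")
        if kind not in ALLOWED_ACTIONS:
            problems.append(f"action {i}: unsupported action {kind!r}")
        elif kind == "navigate":
            if not (isinstance(value, dict) and {"name", "x", "y"} <= value.keys()):
                problems.append(f"action {i}: navigate needs 'name', 'x' and 'y'")
        elif kind == "speak":
            if not isinstance(value, str):
                problems.append(f"action {i}: speak input must be a string")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            # kind == "wait": bool is excluded because it is a subclass of int.
            problems.append(f"action {i}: wait input must be a number")
    return problems


print(check_xrif('{"actions": [{"action": "wait", "input": 15}]}'))
print(check_xrif('{"actions": [{"action": "dance", "input": 1}]}'))
```

Returning a list of problems rather than a single boolean makes it easier to tally which kinds of failure each prompt version produces.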

In [24]:
# TODO: Build some relevant Cohere Skills to call and then run experiments with them.

Experiment notes:
 Cohere and GPT-4 don't recognize "E7 silent study" as the "Silent Study" waypoint.

In [68]:
run_experiment(e7_xrif_with_actions5, random_prompts)

Input: "I could really go for a shower."
{
    "actions": [
        {
            "action": "speak",
            "input": "This waypoint does not exist"
        }
    ]
}
Input: "Take me to the Elevators. Wait there for 15 seconds, and then let's find the East Entrance."
{
    "actions": [
        {
            "action": "navigate",
            "input": {
                "name": "Elevators",
                "x": 1709.0,
                "y": 778.0
            }
        },
        {
            "action": "wait",
            "input": 15
        },
        {
            "action": "navigate",
            "input": {
                "name": "East Entrance",
                "x": 1611.0,
                "y": 413.0
            }
        }
    ]
}
Input: "Robot, go to Silent Study and say 'Silent Study' in a calm voice."
{
    "actions": [
        {
            "action": "navigate",
            "input": {
                "name": "Silent Study",
                "x": 1443,
                "y": 572
