# Prompting Experiments
In this notebook, we are experimenting with various few-shot and chain-of-thought prompts that can translate queries such as "Can you get me a coffee" into a sequence of actions and responses that a robot would perform.

Research Ideas:
- Waypoint-based navigation in ambiguous environments such as UWaterloo campus or in a parts storage warehouse
- Spatial Knowledge (coordinates in context)
- Update waypoint info + context automatically
    - "Remember that Zach's office is room 2441"
    - "The best place to get chips is the vending machine"
- What happens when you have a lot of waypoints in the context
- What to do when two waypoints match a simple description
- Ask to clarify if there is unclear information
- Modes -> teleop, nav, conversation-only
- Dynamic Waypoint Generation
    - During manual teleop -> "Remember this location you just passed by, this is called Room 2106"
    - Use camera to pick up potential new waypoints/context -> Robot says "I passed by a fire extinguisher, should I make a new waypoint?"
- Image Embeddings for waypoints 
    - Remember what certain places look like / what points/objects of interest are near that waypoint
- Robot Knowledge Graph (use graph nav notebook)
    - Remember waypoint locations
    - Remember people you've met -> "Hey I'm Zach, go to my office"
- Looping tasks -> "Do this 5 times"
- Scheduling tasks -> "At 2pm, take a picture of the north entrance"
    - eventually -> task duration estimates
    - Check at regular intervals if you have a task you're supposed to do right now
- Test different floors -> ex: warehouse with different sections
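
Two of the ideas above ("two waypoints match a simple description" and "ask to clarify") can be sketched with plain keyword matching. This is only an illustration under assumptions: `WAYPOINTS` is a hypothetical three-entry table shaped like the waypoint dictionaries used later in this notebook, and `match_waypoints` / `resolve` are made-up helper names, not part of XRIF.

```python
from typing import Dict, List

# Hypothetical mini waypoint table (assumption: same shape as the notebook's waypoint dicts).
WAYPOINTS: List[Dict] = [
    {"name": "E7 North Entrance", "floor": 1, "keywords": ["entrance"]},
    {"name": "E7 South Entrance", "floor": 1, "keywords": ["entrance"]},
    {"name": "Vending Machine", "floor": 1, "keywords": ["food and drink", "snacks"]},
]


def match_waypoints(description: str, waypoints: List[Dict]) -> List[Dict]:
    """Return waypoints named in the description, else those sharing a keyword with it."""
    text = description.lower()
    # A full waypoint name is the most specific signal, so it wins over keywords.
    by_name = [wp for wp in waypoints if wp["name"].lower() in text]
    if by_name:
        return by_name
    # Naive substring test; a real system would need tokenization or embeddings.
    return [wp for wp in waypoints if any(kw.lower() in text for kw in wp["keywords"])]


def resolve(description: str, waypoints: List[Dict]) -> Dict:
    """Navigate when exactly one waypoint matches, otherwise ask the user to clarify."""
    matches = match_waypoints(description, waypoints)
    if len(matches) == 1:
        return {"action": "navigate", "input": matches[0]["name"]}
    if not matches:
        return {"action": "speak", "input": "I don't know that place. Can you describe it differently?"}
    options = ", ".join(wp["name"] for wp in matches)
    return {"action": "speak", "input": f"Did you mean one of: {options}?"}


print(resolve("Take me to an entrance", WAYPOINTS))
print(resolve("Go to the E7 North Entrance", WAYPOINTS))
```

The first call matches both entrances and falls back to a clarifying question, while the second is unambiguous because the full name appears in the request.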



In [1]:
! pip install -r requirements.in


[notice] A new release of pip is available: 24.0 -> 25.0.1
[notice] To update, run: pip install --upgrade pip


In [2]:
import os
import argparse
import json
import yaml
from typing import Dict, Any
import dotenv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
import cohere
from transformers import AutoModelForCausalLM, AutoTokenizer, get_linear_schedule_with_warmup
from openai import OpenAI
from haystack import Pipeline, Document
from haystack.utils import Secret
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import HuggingFaceLocalGenerator

sns.set_theme(style="whitegrid")



In [12]:
dotenv.load_dotenv()

COHERE_API_KEY = os.getenv("COHERE_API_KEY")
if COHERE_API_KEY is None:
    raise ValueError("COHERE_API_KEY is not set")

# Models
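We'll use Cohere's Command R+ as a hosted baseline alongside a locally run DeepSeek-R1 distilled model (the cells below load `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`).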

In [3]:
cohere_client = cohere.ClientV2(api_key=COHERE_API_KEY)

In [4]:
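def prompt_cohere(prompt, model_name="command-r-plus-08-2024"):
    """Send a single user message to Cohere and return the reply text."""
    # `cohere_client` is the module-level client created above; reading it needs no `global`.
    response = cohere_client.chat(
        model=model_name,
        messages=[{"role": "user", "content": prompt}],
        # Ask the API to return a JSON object rather than free text.
        response_format={"type": "json_object"},
    )
    return response.message.content[0].text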

### XRIF V1
The goal of XRIF V1 is to be easily serializable (JSON for now), so that it can be generated with simple in-context learning and chain-of-thought prompts. Beyond prompt engineering, fine-tuning experiments could help smaller models produce more accurate XRIF instructions.

In [4]:
e7_xrif_with_waypoint_keywords = """
You control a robot that can navigate through a building based on a json instruction format,
you understand several waypoints that have been given to you before (you can use RAG to retrieve
what room numbers or waypoints correspond to which people or semantics).

Here are all the waypoints you have access to:
- E7 1st floor Elevators, floor: 1, keywords = ["elevator"]
- E7 North Entrance, floor: 1, keywords = ["entrance"]
- E7 East Entrance, floor: 1, keywords = ["entrance"]
- E7 South Entrance, floor: 1, keywords = ["entrance"]
- E7 Coffee and Donuts, floor: 1, keywords = ["food and drink", "coffee"]
- Outreach Classroom, floor: 1, keywords = ["classroom"]
- RoboHub Entrance, floor: 1, keywords = ["workshop", "robots", "Brandon"]
- Vending Machine, floor: 1, keywords = ["food and drink"]
- Room 2106, floor: 2, keywords = ["office", "zach", "WEEF"]

Extra information:
- Room 2106 is Zach's office
- Coffee and Donuts is referred to as CnD

Prompt: Can you pick something up from Zach's office and drop it off at the RoboHub?

Answer:
{
    "goals": [
        {
            "name": "Room 2106",
            "keywords": ["office", "zach", "WEEF"],
            "floor": 2
        },
        {
            "name": "RoboHub Entrance",
            "keywords": ["workshop", "robots", "Brandon"],
            "floor": 1
        }
    ]
}

Prompt:
"""

In [5]:
experiments = [
    "Send a Coffee to Zach's Office",
    "Get me an energy drink, I'm in the RoboHub",
    "go to the south entrance then the north entrance",
]

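def run_experiment(prefix, list_of_inputs):
    """Append each input to the few-shot prefix and print Cohere's reply."""
    for user_input in list_of_inputs:  # renamed to avoid shadowing the built-in `input`
        print(f"Input: {user_input}")
        print(prompt_cohere(f"{prefix}{user_input}"))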

### XRIF V2
V2 introduces action calling, which allows XRIF to go beyond just waypoint navigation.

In [6]:
XRIF_MODES = ["learning", "navigating", "conversation", "teleop"]

In [7]:
waypoint_list_v1 = [
{
    "name": "E7 1st floor Elevators",
    "x": 50,
    "y": 50,
    "floor": 1,
    "keywords": ["elevator"],
},
{
    "name": "E7 North Entrance",
    "x": 50,
    "y": 0,
    "floor": 1,
    "keywords": ["entrance"],
},
{
    "name": "E7 East Entrance",
    "x": 0,
    "y": 50,
    "floor": 1,
    "keywords": ["entrance"],
},
{
    "name": "E7 South Entrance",
    "x": 50,
    "y": 100,
    "floor": 1,
    "keywords": ["entrance"],
},
{
    "name": "E7 Coffee and Donuts",
    "x": 75,
    "y": 75,
    "floor": 1,
    "keywords": ["food and drink", "coffee", "breakfast", "snacks"],
    "aliases": ["CnD", "C and D"]
},
{
    "name": "Room 1427 - Ideas Clinic",
    "x": 25,
    "y": 75,
    "floor": 1,
    "keywords": ["classroom", "workshop", "lab"],
},
{
    "name": "Outreach Classroom",
    "x": 25,
    "y": 25,
    "floor": 1,
    "keywords": ["classroom", "lecture hall"],
},
{
    "name": "RoboHub Entrance",
    "x": 25,
    "y": 50,
    "floor": 1,
    "keywords": ["robots", "workshop", "lab"],
},
{
    "name": "Vending Machine",
    "x": 75,
    "y": 50,
    "floor": 1,
    "keywords": ["food and drink", "snacks"],
},
{
    "name": "Room 2241",
    "x": 100,
    "y": 100,
    "floor": 2,
    "keywords": ["office", "Zach"],
}
]

In [8]:
actions_list_v1 = [
{
    "name": "navigate",
    "input": ["waypoint"],
},
{
    "name": "wait",
    "input": ["seconds"],
},
{
    "name": "speak",
    "input": ["output"],
},
{
    "name": "add_waypoint",
    "input": ["waypoint"],
}
]
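
To make "action calling" concrete, here is a minimal sketch of how a plan using the four actions above could be dispatched. It assumes the plan shape `{"actions": [{"action": ..., "input": ...}]}` used by the example answers in the V2 prompts of this notebook; `make_handlers` and `execute_plan` are hypothetical names, and the handlers only log what a robot would do rather than drive one.

```python
from typing import Any, Callable, Dict, List


def make_handlers(log: List[str]) -> Dict[str, Callable[[Any], None]]:
    """Build stand-in handlers that record each step instead of moving a robot."""

    def navigate(waypoint: Dict[str, Any]) -> None:
        log.append(f"navigate -> {waypoint['name']} (floor {waypoint['floor']})")

    def wait(seconds: float) -> None:
        log.append(f"wait {seconds}s")

    def speak(text: str) -> None:
        log.append(f"say: {text}")

    def add_waypoint(waypoint: Dict[str, Any]) -> None:
        log.append(f"add waypoint {waypoint['name']}")

    return {"navigate": navigate, "wait": wait, "speak": speak, "add_waypoint": add_waypoint}


def execute_plan(plan: Dict[str, Any], handlers: Dict[str, Callable[[Any], None]]) -> None:
    """Check every action name first, then run the steps in order."""
    for step in plan["actions"]:
        if step["action"] not in handlers:
            raise ValueError(f"Unknown action: {step['action']}")
    for step in plan["actions"]:
        handlers[step["action"]](step["input"])


log: List[str] = []
plan = {
    "actions": [
        {"action": "navigate", "input": {"name": "Room 2241", "floor": 2}},
        {"action": "speak", "input": "Hey Zach, can you hand me the keys?"},
        {"action": "wait", "input": 10},
    ]
}
execute_plan(plan, make_handlers(log))
print(log)
```

Checking all action names before running anything means a plan with a hallucinated action is rejected as a whole instead of being half-executed.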

In [9]:
xrif_with_actions_1 = """
You control a robot that can navigate through a building based on a json instruction format, you understand several waypoints that have been given to you before (you can use RAG to retrieve what room numbers or waypoints correspond to which people or semantics).

Here are all the waypoints you have access to:
{{ waypoints_list }}

Here are all the functions you have access to:
{{ actions_list }}

Example Prompt: Can you pick something up from Zach's office and drop it off at the RoboHub?

Example Answer:
{
    "actions": [
        {
            "action": "navigate",
            "input": {
                "name": "Room 2241",
                "x": 100,
                "y": 100,
                "floor": 2,
                "keywords": ["office", "Zach"]
            }
        },
        {
            "action": "navigate",
            "input": {
                "name": "RoboHub Entrance",
                "x": 25,
                "y": 50,
                "floor": 1,
                "keywords": ["robots", "workshop", "lab"]
            }
        }
    ]
}

Example Prompt: Can you ask Zach for the keys and drop it off at the ideas clinic? Wait for 10 seconds when you meet Zach so he can give you the keys.

Example Answer:
{
    "actions": [
        {
            "action": "navigate",
            "input": {
                "name": "Room 2241",
                "x": 100,
                "y": 100,
                "floor": 2,
                "keywords": ["office", "Zach"]
            }
        },
        {
            "action": "speak",
            "input": "Hey Zach, Can you hand me the keys?"
        },
        {
            "action": "wait",
            "input": 10
        },
        {
            "action": "navigate",
            "input": {
                "name": "Room 1427 - Ideas Clinic",
                "x": 25,
                "y": 75,
                "floor": 1,
                "keywords": ["classroom", "workshop", "lab"]
            }
        }
    ]
}

Prompt: {{ query }}
"""
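
The `{{ waypoints_list }}`, `{{ actions_list }}` and `{{ query }}` placeholders are Jinja-style variables, which Haystack's `PromptBuilder` fills in when a pipeline runs. To inspect a rendered prompt without building a pipeline, a rough standard-library stand-in could look like the sketch below. `render_prompt` is a hypothetical helper that only handles the exact `{{ name }}` spelling, and the `demo_*` values are shortened stand-ins, not the notebook's real lists.

```python
import json


def render_prompt(template: str, **values) -> str:
    """Replace each `{{ name }}` placeholder; non-string values are JSON-encoded."""
    rendered = template
    for name, value in values.items():
        text = value if isinstance(value, str) else json.dumps(value, indent=2)
        rendered = rendered.replace("{{ " + name + " }}", text)
    return rendered


# Shortened, hypothetical stand-ins for the template and lists defined in this notebook.
demo_template = "Waypoints:\n{{ waypoints_list }}\n\nFunctions:\n{{ actions_list }}\n\nPrompt: {{ query }}\n"
demo_waypoints = [{"name": "Vending Machine", "x": 75, "y": 50, "floor": 1, "keywords": ["food and drink", "snacks"]}]
demo_actions = [{"name": "navigate", "input": ["waypoint"]}]

prompt = render_prompt(
    demo_template,
    waypoints_list=demo_waypoints,
    actions_list=demo_actions,
    query="Get me some chips",
)
print(prompt)
```

Encoding the lists with `json.dumps` keeps the context in the same notation as the example answers, which may make it easier for the model to copy waypoint fields verbatim.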

In [None]:
e7_xrif_with_actions_2 = """
# Role 
You control a robot that can navigate through a building based on a json instruction format, you understand several waypoints that have been provided in your context.

# Context
Here are all the waypoints you have access to:
{{ waypoints_list }}

Here are all the Functions you have access to:
{{ actions_list }}

# Examples
Example Prompt: Can you pick something up from Zach's office and drop it off at the RoboHub?

Example Answer:
{
    "actions": [
        {
            "action": "navigate",
            "input": {
                "name": "Room 2241",
                "x": 100,
                "y": 100,
                "floor": 2,
                "keywords": ["office", "Zach"]
            }
        },
        {
            "action": "navigate",
            "input": {
                "name": "RoboHub Entrance",
                "x": 25,
                "y": 50,
                "floor": 1,
                "keywords": ["robots", "workshop", "lab"]
            }
        }
    ]
}

Example Prompt: Can you ask Zach for the keys and drop it off at the ideas clinic? Wait for 10 seconds when you meet Zach so he can give you the keys.

Example Answer:
{
    "actions": [
        {
            "action": "navigate",
            "input": {
                "name": "Room 2241",
                "x": 100,
                "y": 100,
                "floor": 2,
                "keywords": ["office", "Zach"]
            }
        },
        {
            "action": "speak",
            "input": "Hey Zach, Can you hand me the keys?"
        },
        {
            "action": "wait",
            "input": 10
        },
        {
            "action": "navigate",
            "input": {
                "name": "Room 1427 - Ideas Clinic",
                "x": 25,
                "y": 75,
                "floor": 1,
                "keywords": ["classroom", "workshop", "lab"]
            }
        }
    ]
}

Prompt: {{ query }}
"""

In [3]:
class LLMGenerator():
  """BM25 retrieval over waypoint documents, prompt building, and local Hugging Face generation."""

  def __init__(self,
               model_name: str,
               task: str,
               waypoints_csv: str,
               promp_metadata_file: str,
               generation_kwargs=None):
    # Avoid a mutable default argument; fall back to the original settings.
    if generation_kwargs is None:
      generation_kwargs = {"max_new_tokens": 5000, "temperature": 0.9}
    self.model_name = model_name
    self.task = task
    self.generation_kwargs = generation_kwargs
    self.waypoints = waypoints_csv
    self.waypoints_list = []
    self.document_store = InMemoryDocumentStore()
    self.prompt_metadata = promp_metadata_file
    self.llm = HuggingFaceLocalGenerator(model=self.model_name, task=self.task, generation_kwargs=self.generation_kwargs)

    # The YAML file must provide `prompt` (the template) and `prompt_name`.
    with open(self.prompt_metadata, 'r') as file:
      prompt_dict = yaml.safe_load(file)
      self.prompt = prompt_dict['prompt']
      self.prompt_name = prompt_dict['prompt_name']

    self.retriever = InMemoryBM25Retriever(document_store=self.document_store)
    self.prompt_builder = PromptBuilder(template=self.prompt)

    # Load the waypoints and require exactly the expected columns, in this order.
    waypoints_df = pd.read_csv(self.waypoints)
    waypoint_cols = ['Location', 'X co-ordinate', 'Y co-ordinate', 'Floor', 'Section', 'Keywords']
    given_waypoints_cols = list(waypoints_df.columns)
    if given_waypoints_cols != waypoint_cols:
      raise ValueError(f"Columns do not match: expected {waypoint_cols}, got {given_waypoints_cols}")

    # One retrievable document per waypoint.
    # Note: the Floor column is validated above but is not part of the document text.
    for _, row in waypoints_df.iterrows():
      self.waypoints_list.append(Document(content=f"Location Name: {row['Location']} X-Coordinate: {row['X co-ordinate']} Y-Coordinate: {row['Y co-ordinate']} Section: {row['Section']} Keywords: {row['Keywords']}"))

    self.document_store.write_documents(self.waypoints_list)

    # Pipeline: retriever -> prompt_builder (as `documents`) -> generator.
    self.pipeline = Pipeline()
    self.pipeline.add_component("retriever", self.retriever)
    self.pipeline.add_component("prompt_builder", self.prompt_builder)
    self.pipeline.add_component("generator", self.llm)
    self.pipeline.connect("retriever", "prompt_builder.documents")
    self.pipeline.connect("prompt_builder", "generator")

  def generate(self, query: str):
    """Run retrieval and generation for one query and return the first reply."""
    res = self.pipeline.run({
      "prompt_builder": {
        "query": query
      },
      "retriever": {
        "query": query
      }
    })

    return res['generator']['replies'][0]

In [4]:
deepseek = LLMGenerator(model_name="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", task="text-generation", waypoints_csv="datasets/uw_e7_floor1.csv", promp_metadata_file="prompts/few_shot_xrif.yaml")

In [5]:
deepseek.generate("Can you pick something up from Zach's office and drop it off at the RoboHub?")

Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Device set to use mps


'<think>\nAlright, so I\'m trying to help the user generate a JSON response for their path planning agent. The prompt they provided is "Can you pick something up from Zach\'s office and drop it off at the RoboHub?" and there\'s a list of waypoints provided. \n\nFirst, I need to understand what each waypoint represents. From the example given, it looks like each location has specific coordinates, keywords, and sections. So, each location\'s information is tied to these key points.\n\nThe user\'s prompt asks to pick something from Zach\'s office. So, I should check if Zach\'s office is listed as a location. Looking at the provided waypoints, the closest is "C&D - Main Entrance". But I notice that some locations are more about their features and keywords than just locations. For instance, "Shower A" has coordinates but no location name. So, I might need to treat that as a separate entity.\n\nSince the user wants to pick something up, I should include the "C&D - Main Entrance" as the locat

In [7]:
some_string = '<think>\nAlright, so I\'m trying to help the user generate a JSON response for their path planning agent. The prompt they provided is "Can you pick something up from Zach\'s office and drop it off at the RoboHub?" and there\'s a list of waypoints provided. \n\nFirst, I need to understand what each waypoint represents. From the example given, it looks like each location has specific coordinates, keywords, and sections. So, each location\'s information is tied to these key points.\n\nThe user\'s prompt asks to pick something from Zach\'s office. So, I should check if Zach\'s office is listed as a location. Looking at the provided waypoints, the closest is "C&D - Main Entrance". But I notice that some locations are more about their features and keywords than just locations. For instance, "Shower A" has coordinates but no location name. So, I might need to treat that as a separate entity.\n\nSince the user wants to pick something up, I should include the "C&D - Main Entrance" as the location to pick from. Then, the target is "RoboHub". So, I\'ll structure the JSON with a "goals" array containing two objects: one for picking up from Zach\'s office and another for dropping it off at RoboHub.\n\nI should make sure the keys are correct and match the example provided. That way, the system can interpret the data properly and apply it to the user\'s query. If I were to write this, I\'d have to map each location to its respective coordinates and keywords to ensure the system can retrieve the correct information accurately.\n</think>\n\n```json\n{\n    "goals": [\n        {\n            "name": "C&D - Main Entrance",\n            "floor": 1,\n            "keywords": {\n                "X-Coordinate": 1965,\n                "Y-Coordinate": 1132,\n                "Section": "C&D",\n                "Keywords": ["Coffee and Donuts, Cafe, Coffee Shop, Store, Snacks, Drinks, 1416"]\n            }\n        },\n        {\n            "name": "RoboHub",\n            "floor": 1,\n            "keywords": {\n                "X-Coordinate": 1339,\n                "Y-Coordinate": 906,\n                "Section": "RoboHub",\n                "Keywords": ["Robotics, 1339, Robot, Drone"]\n            }\n        }\n    ]\n}\n```'

print(some_string)

<think>
Alright, so I'm trying to help the user generate a JSON response for their path planning agent. The prompt they provided is "Can you pick something up from Zach's office and drop it off at the RoboHub?" and there's a list of waypoints provided. 

First, I need to understand what each waypoint represents. From the example given, it looks like each location has specific coordinates, keywords, and sections. So, each location's information is tied to these key points.

The user's prompt asks to pick something from Zach's office. So, I should check if Zach's office is listed as a location. Looking at the provided waypoints, the closest is "C&D - Main Entrance". But I notice that some locations are more about their features and keywords than just locations. For instance, "Shower A" has coordinates but no location name. So, I might need to treat that as a separate entity.

Since the user wants to pick something up, I should include the "C&D - Main Entrance" as the location to pick fro
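
The DeepSeek reply wraps its reasoning in `<think>` tags and puts the answer inside a fenced `json` block, so it cannot be handed to `json.loads` as-is. A minimal post-processing sketch follows; `extract_json` is a hypothetical helper, and it assumes the answer is either a fenced block or bare JSON after the reasoning.

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # three backticks, built this way so the marker does not appear literally
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
JSON_FENCE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(reply: str) -> Optional[Any]:
    """Drop the <think> section, then parse the first fenced block (or the remainder) as JSON."""
    answer = THINK_BLOCK.sub("", reply).strip()
    fence = JSON_FENCE.search(answer)
    candidate = fence.group(1) if fence else answer
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Truncated or malformed replies end up here.
        return None


sample_reply = (
    "<think>\nreasoning goes here\n</think>\n\n"
    + FENCE + 'json\n{"goals": [{"name": "RoboHub", "floor": 1}]}\n' + FENCE
)
print(extract_json(sample_reply))
```

Returning `None` instead of raising keeps a batch of experiments running when a reply is cut off before the closing `</think>` tag or the JSON is invalid.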

# Evaluation
- Inference latency
- RAG / Tool call latency
- Model size (if open source)
- Model cost (if API like Cohere/OpenAI)
- Correctness of outputs:
    - LLM judge → use a separate LLM to check the output of the model under test
        - Safety → check that the prompt does not tell the robot to do anything dangerous (e.g. via sentiment analysis)
    - Formatting → is the output in the correct format (i.e. valid JSON)?
    - Human eval → manual check; possibly build out the action path and have a human accept or reject it
    - XRIF validation → a validation script that checks the XRIF fields are correct
    - Valid waypoints → the output never references a non-existent waypoint
    - Vector search accuracy → does retrieval partition the waypoints accurately?
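
Three of the checks above (inference latency, JSON formatting, valid waypoints) can be automated with a small loop. The sketch below is an assumption-laden illustration, not the project's evaluation script: `evaluate` is a hypothetical helper, it expects the V2 plan shape with an `actions` list, and `fake_generate` is a canned stand-in for a real model call such as `prompt_cohere`.

```python
import json
import time
from typing import Callable, Dict, List, Set


def evaluate(generate: Callable[[str], str], queries: List[str], known_waypoints: Set[str]) -> List[Dict]:
    """Run each query once and record latency, JSON validity and unknown waypoint names."""
    results = []
    for query in queries:
        start = time.perf_counter()
        raw = generate(query)
        latency_s = time.perf_counter() - start

        try:
            plan = json.loads(raw)
            valid_json = True
        except json.JSONDecodeError:
            plan, valid_json = None, False

        # Collect waypoint names from "navigate" steps when the plan has the expected shape.
        steps = plan.get("actions", []) if isinstance(plan, dict) else []
        if not isinstance(steps, list):
            steps = []
        names = [
            step["input"].get("name")
            for step in steps
            if isinstance(step, dict)
            and step.get("action") == "navigate"
            and isinstance(step.get("input"), dict)
        ]
        results.append({
            "query": query,
            "latency_s": latency_s,
            "valid_json": valid_json,
            "unknown_waypoints": [name for name in names if name not in known_waypoints],
        })
    return results


def fake_generate(query: str) -> str:
    """Canned replies standing in for a model (hypothetical data)."""
    canned = {
        "ok": '{"actions": [{"action": "navigate", "input": {"name": "Vending Machine"}}]}',
        "bad": "Sure! Here is the plan",
        "ghost": '{"actions": [{"action": "navigate", "input": {"name": "Shower A"}}]}',
    }
    return canned[query]


results = evaluate(fake_generate, ["ok", "bad", "ghost"], {"Vending Machine"})
for row in results:
    print(row)
```

Keeping one result row per query makes it straightforward to load the list into a `pandas` DataFrame later and compare models on the same inputs.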
