# ETOT VLM Planning

In [None]:
from vlm import BaseVLM
api_key = ""
base_url = ""
erp = ""
model = "gpt-4o"
system_message = "You are a helpful assistant."

## 1. Priori Branching

### prompt

In [2]:
subtask_fewshot = '''
Task background: The robot is in front of the desk and needs to complete a multi-step task.
You need to generate a multi-step task plan based on the provided task instruction, environment_objects, object states and example specifications.

Action Skills Set:
- PICK UP
- PUT ON
- PUT INTO
- OPEN
- CLOSE

Note:
1. Articulated containers (e.g., cabinets, drawers) require OPEN and CLOSE actions; non-articulated containers (e.g., holders) do not. If the task is unrelated to the articulated object, or does not involve placing an object in it, do not open or close the articulated object.
2. Multiple branches are needed when the instruction leaves the choice of object or container open. For example, if two objects need to be placed in three containers and the instruction does not assign containers, there are six branches; if one of three objects needs to be picked up, there are three branches.
3. If there are multiple objects that need to be placed in different containers and different branches need to be generated, the order of object operations remains unchanged, while the corresponding order of containers changes.
4. If there are multiple poses for a specific grasping action, ignore them and summarize them with only one PICK UP without adding any other modifiers.
5. Additionally, when multiple objects are placed into an articulated container, each insertion must follow an OPEN, PICK UP, PUT INTO, CLOSE cycle: the container is opened, the object is picked up and placed inside, and the container is closed before the next object is inserted.

### Example1:
Input:
Instruction: put a toy into the holder.
Environment_object: toy1, toy2, holder, desk
Object_state:
1. [ontop, toy1, desk]
2. [ontop, toy2, desk]
3. [ontop, holder, desk]
Output:
Two branches:
Subtasks of branch 1:
1. [PICK UP, toy1]
2. [PUT INTO, holder]
Subtasks of branch 2:
1. [PICK UP, toy2]
2. [PUT INTO, holder]

### Example2:
Input:
Instruction: put two tennis into the cabinet.
Environment_object: tennis 1, tennis 2, cabinet, desk
Object_state:
1. [ontop, tennis 1, desk]
2. [ontop, tennis 2, desk]
3. [ontop, cabinet, desk]
4. [closed, cabinet]
Output:
One branch:
1. tennis 1 >> cabinet, tennis 2 >> cabinet;
Subtasks of branch 1:
1. [OPEN, cabinet]
2. [PICK UP, tennis 1]
3. [PUT INTO, cabinet]
4. [CLOSE, cabinet]
5. [OPEN, cabinet]
6. [PICK UP, tennis 2]
7. [PUT INTO, cabinet]
8. [CLOSE, cabinet]

### Example3:
Input:
Instruction: put the toy and holder on the drawer, with the toy in the holder.
Environment_object: toy, holder, apple 1, apple 2, drawer, desk
Object_state:
1. [ontop, toy, desk]
2. [ontop, holder, desk]
3. [ontop, drawer, desk]
4. [ontop, apple 1, desk]
5. [ontop, apple 2, desk]
Output:
One branch:
Subtasks of branch 1:
1. [PICK UP, toy]
2. [PUT INTO, holder]
3. [PICK UP, holder]
4. [PUT ON, drawer]

### Example4:
Input:
Instruction: put one apple into the cabinet and another apple on the plate.
Environment_object: apple 1, apple 2, cabinet, plate, drawer, tennis, desk
Object_state:
1. [ontop, apple 1, desk]
2. [ontop, apple 2, desk]
3. [ontop, cabinet, desk]
4. [closed, cabinet]
5. [ontop, plate, desk]
6. [ontop, tennis, desk]
7. [ontop, drawer, desk]
8. [ontop, tennis, drawer]
9. [closed, drawer]
Output:
The order of object operations remains unchanged, while the corresponding order of containers changes.
Each insertion of an object into the cabinet must follow an open–pick-putin–close cycle.
Two branches:
1. apple 1 >> cabinet, apple 2 >> plate;
2. apple 1 >> plate, apple 2 >> cabinet;
Subtasks of branch 1:
1. [OPEN, cabinet]
2. [PICK UP, apple 1]
3. [PUT INTO, cabinet]
4. [CLOSE, cabinet]
5. [PICK UP, apple 2]
6. [PUT ON, plate]
Subtasks of branch 2:
1. [PICK UP, apple 1]
2. [PUT ON, plate]
3. [OPEN, cabinet]
4. [PICK UP, apple 2]
5. [PUT INTO, cabinet]
6. [CLOSE, cabinet]

### Input:
Instruction:
{Instruction}

Environment_object:
{Environment_object}

Object_state:
{Object_state}

Output:

'''
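The branch counts in Note 2 can be checked by enumeration. The following is a minimal sketch, assuming each object goes to a distinct container (the reading that gives six branches for two objects and three containers). `enumerate_branches` is a hypothetical helper for illustration and is not used elsewhere in this notebook.

```python
from itertools import permutations

def enumerate_branches(objects, containers):
    """Assign each object to a distinct container, keeping the object order fixed.

    Hypothetical helper: mirrors Notes 2 and 3, where the order of objects
    stays the same and only the order of containers changes.
    """
    return [list(zip(objects, chosen))
            for chosen in permutations(containers, len(objects))]

branches = enumerate_branches(["apple 1", "apple 2"], ["cabinet", "plate", "drawer"])
for i, branch in enumerate(branches, start=1):
    print(f"{i}. " + ", ".join(f"{obj} >> {dst}" for obj, dst in branch) + ";")
```

With two objects and two containers the same call yields the two branches of Example4. If objects are allowed to share a container, `itertools.product(containers, repeat=len(objects))` would be the analogous enumeration.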

In [1]:
detect_pose = '''
Instruction:
{Instruction}
Does the task instruction explicitly ask for grasp-pose diversity, for example "horizontally" or "vertically"? Output YES or NO.
'''

pose_branching = '''
Task background: The robot is in front of the desk and needs to complete a multi-step task.

Split each original branch further according to the grasp pose of its PICK UP step.
Only PICK UP can branch; other actions cannot. Do not create any new actions.

### Example1:
Input:
Instruction: put one apple into the cabinet horizontally or vertically.
Environment_object: apple 1, apple 2, cabinet, plate, desk
Object_state:
1. [ontop, apple 1, desk]
2. [ontop, apple 2, desk]
3. [ontop, cabinet, desk]
4. [closed, cabinet]
5. [ontop, plate, desk]
Branches:
1. apple 1 >> cabinet;
2. apple 2 >> cabinet;
Subtasks of branch 1:
1. [OPEN, cabinet]
2. [PICK UP, apple 1]
3. [PUT INTO, cabinet]
4. [CLOSE, cabinet]
Subtasks of branch 2:
1. [OPEN, cabinet]
2. [PICK UP, apple 2]
3. [PUT INTO, cabinet]
4. [CLOSE, cabinet]

Output:
Four branches:
1. apple 1 >> cabinet(horizontally);
2. apple 1 >> cabinet(vertically);
3. apple 2 >> cabinet(horizontally);
4. apple 2 >> cabinet(vertically);
Subtasks of branch 1:
1. [OPEN, cabinet]
2. [PICK UP, apple 1][horizontally]
3. [PUT INTO, cabinet]
4. [CLOSE, cabinet]
Subtasks of branch 2:
1. [OPEN, cabinet]
2. [PICK UP, apple 1][vertically]
3. [PUT INTO, cabinet]
4. [CLOSE, cabinet]
Subtasks of branch 3:
1. [OPEN, cabinet]
2. [PICK UP, apple 2][horizontally]
3. [PUT INTO, cabinet]
4. [CLOSE, cabinet]
Subtasks of branch 4:
1. [OPEN, cabinet]
2. [PICK UP, apple 2][vertically]
3. [PUT INTO, cabinet]
4. [CLOSE, cabinet]

### Input:
Instruction:
{Instruction}

Environment_object:
{Environment_object}

Object_state:
{Object_state}

Branches:
{Branches}
Output:

'''
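The pose expansion requested by `pose_branching` is mechanical once a branch is known: copy the plan once per pose and tag only the PICK UP steps. The sketch below shows that transformation on a list of step strings. `expand_poses` is a hypothetical helper, and it assumes every PICK UP in a branch uses the same pose.

```python
import re

# Matches a bare PICK UP step such as "[PICK UP, apple 1]".
PICK_UP = re.compile(r"^\[PICK UP,\s*[^\]]+\]$")

def expand_poses(steps, poses=("horizontally", "vertically")):
    """Return one copy of the plan per grasp pose, tagging only PICK UP steps.

    Illustrative assumption: all PICK UP steps in a branch share one pose.
    """
    return [[f"{step}[{pose}]" if PICK_UP.match(step) else step for step in steps]
            for pose in poses]

plan = ["[OPEN, cabinet]", "[PICK UP, apple 1]", "[PUT INTO, cabinet]", "[CLOSE, cabinet]"]
for variant in expand_poses(plan):
    print(variant)
```

Other actions are passed through unchanged, which matches the prompt's rule that only PICK UP has branching conditions.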

### format

In [4]:
format_check = '''
Task background: The robot is in front of the desk and needs to complete a multi-step task.
You need to check if there are any irregularities in the generated tasks. The task specifications are as follows:
1. Firstly, the plan must consist of the following actions, and any undefined actions will be considered as errors:
Action Skills Set:
- PICK UP
- PUT ON
- PUT INTO
- OPEN
- CLOSE

2. When multiple objects are placed into an articulated container, each insertion must follow an OPEN, PICK UP, PUT INTO, CLOSE cycle: the container is opened, the object is picked up and placed inside, and the container is closed before the next object is inserted.

3. After [PICK UP, object], the next action can only be PUT ON, PUT INTO, or None (PICK UP is the last action), not OPEN or CLOSE. After [PUT ON, obj], [PUT INTO, obj], [CLOSE, obj], or [OPEN, obj], the next action can be any action.

4. Microwave ovens and drawers are articulated objects; holders are not. All of them can serve as containers for PUT INTO. A drawer can also be used as a platform to place objects on.

5. Plans for different branches must be judged separately. There is no relationship between different branches!!!

6. Output format: first output Yes!!! or No!!!, then output the analysis.

7. The PICK UP, OPEN, and CLOSE actions can appear alone in the plan.

8. After PICK UP, you can only PICK UP again after a PUT ON or PUT INTO.

Example1:
Input:
1. [OPEN, drawer]
Output:
Yes!!! The final generated plan meets the specifications. The OPEN action can be executed alone.

Example2:
Input:
1. [CLOSE, drawer]
Output:
Yes!!! The final generated plan meets the specifications. The CLOSE action can be executed alone.

Example3:
Input:
1. [PICK UP, apple]
2. [PUT ON, drawer] (safe place)
Output:
Yes!!! The final generated plan meets the specifications. 
The first action is picking up the apple. After [PICK UP apple], you can do second action [PUT ON, drawer].

Example4:
Input:
Branch 1:
1. [PICK UP, apple 1]
Branch 2:
1. [PICK UP, apple 2]
Output:
Yes!!! Plans for different branches should be judged separately. Each branch plan meets the specifications.
The PICK UP action can be executed alone in each branch.
After picking up an object, the next action can be None(PICK UP is the last action)

Example5:
Input:
Branch 1:
1. [PICK UP, tennis]
2. [PUT INTO, holder]
3. [OPEN, drawer]
4. [PICK UP, apple]
5. [PUT INTO, drawer]
6. [CLOSE, drawer]
Output:
Yes!!! The final generated plan meets the specifications.
The first action is picking up the tennis, so the second action can be [PUT INTO, holder].
After [PUT INTO, holder], the third action can be [OPEN, drawer].
After [OPEN, drawer], the fourth action can be [PICK UP, apple].
After [PICK UP, apple], the fifth action can be [PUT INTO, drawer].
After [PUT INTO, drawer], the sixth action can be [CLOSE, drawer].



Example6:
Input:
MultiVLM Response: In analyzing the images and the scenario, we can infer the issue might have occurred while attempting to pick up the tennis ball after opening the drawer. This action was likely impeded by the drawer being open, hindering the robot's access to the tennis ball on the desk. 

To correct the task plan while adhering to only the action skills set provided, we can modify the steps to move the tennis ball to a safe place before dealing with the drawer.

Here is the corrected task plan:

1. [PICK UP, apple]
2. [PUT INTO, holder]
3. [PICK UP, tennis]
4. [OPEN, drawer]
5. [PUT INTO, drawer]
6. [CLOSE, drawer]

Output:
No!!! The final generated plan encountered an error: the fourth action [OPEN, drawer] cannot be executed directly after the third action [PICK UP, tennis].

Example7:
Input:
MultiVLM Response: 
Correct task:
1. [OPEN, holder]
2. [PICK UP, apple]
3. [PUT INTO, holder]
4. [CLOSE, holder]
Output:
No!!! The holder is not an articulated object and cannot be opened or closed.

Example8:
Input:
Branch 1:
1. [PICK UP, apple 1]
2. [PICK UP, apple 2]
Output:
No!!! The final generated plan encountered an error: the second action pick up apple 2 cannot be executed directly after the first action picking up apple 1

Example9:
Input:
MultiVLM Response: 
Correction:
1. [PICK UP, tennis 1] (move to accessible position)
2. [PUT ON, desk](safe and accessible place)
3. [PICK UP, tennis 2] 
4. [PUT INTO, holder]
5. [PICK UP, holder]
6. [PUT ON, drawer](safe and accessible place)
Output:
Yes!!! The final generated plan meets the specifications. 
After the first action picking up the tennis 1, the second action can be [PUT ON, desk]. 
And after the second action [PUT ON, desk], the third action can be [PICK UP, tennis 2]. 
After the third action picking up the tennis 2, the fourth action can be [PUT INTO, holder]. 
And after the fourth action [PUT INTO, holder], the fifth action can be [PICK UP, holder]. 
And after the fifth action [PICK UP, holder], the sixth action can be [PUT ON, drawer].

### Input:
{task_plan}

Output:

'''
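Several of the rules in `format_check` are mechanical and can also be checked without a model call. The sketch below covers rule 1 (known actions), rules 3 and 8 (what may follow PICK UP), and the articulation part of rule 4 for a single branch. It is an illustrative cross-check and not part of the pipeline in this notebook; the `ARTICULATED` set is an assumption drawn from the objects named in the prompts, and rule 2 is not checked.

```python
import re

SKILLS = {"PICK UP", "PUT ON", "PUT INTO", "OPEN", "CLOSE"}
# Assumption for this sketch: the articulated objects named in the prompts.
ARTICULATED = {"cabinet", "drawer", "microwave"}
# Captures the action and its argument from steps such as "[PUT INTO, holder]".
STEP = re.compile(r"\[\s*([A-Z ]+?)\s*,\s*([^\]]+?)\s*\]")

def check_plan(text):
    """Return (ok, reason) for a single-branch plan given as text."""
    holding = False  # True between a PICK UP and the following PUT ON / PUT INTO
    for number, (action, target) in enumerate(STEP.findall(text), start=1):
        if action not in SKILLS:
            return False, f"step {number}: undefined action {action}"
        if action in {"OPEN", "CLOSE"} and target not in ARTICULATED:
            return False, f"step {number}: {target} is not articulated"
        if holding and action not in {"PUT ON", "PUT INTO"}:
            return False, f"step {number}: {action} cannot follow PICK UP"
        holding = action == "PICK UP"
    return True, "plan meets the specifications"

# The corrected plan from Example6, which the prompt marks as invalid.
example6 = """1. [PICK UP, apple]
2. [PUT INTO, holder]
3. [PICK UP, tennis]
4. [OPEN, drawer]
5. [PUT INTO, drawer]
6. [CLOSE, drawer]"""
print(check_plan(example6))
```

Free-text annotations such as "(safe place)" are ignored because only bracketed `[ACTION, target]` steps are parsed. Multi-branch replies would need to be split per branch first, since rule 5 requires branches to be judged separately.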



In [5]:
def priori_branching(Instruction, Environment_object, Object_state):
    """Generate a branched task plan, re-planning until the format check passes."""
    base_vlm = BaseVLM(api_key, model, system_message, base_url)
    prompt = subtask_fewshot.format(Instruction=Instruction, Environment_object=Environment_object, Object_state=Object_state)
    verdict = "No"  # checker reply; starts as a rejection so the loop runs at least once

    # Rule 6 of format_check asks the checker to lead with Yes!!!/No!!!, so only the
    # start of the reply is tested; a "No!!!" analysis may still contain the word "yes".
    while not verdict.strip().lower().startswith("yes"):
        print("planning!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
        response = base_vlm.call_model(prompt)
        print(response)
        format_prompt = format_check.format(task_plan=response)
        verdict = base_vlm.call_model(format_prompt)
        print(verdict)

    # Expand the accepted plan by grasp pose only when the instruction asks for it.
    prompt = detect_pose.format(Instruction=Instruction)
    pose = base_vlm.call_model(prompt)
    if "yes" in pose.lower():
        prompt = pose_branching.format(Instruction=Instruction, Environment_object=Environment_object, Object_state=Object_state, Branches=response)
        response = base_vlm.call_model(prompt)
        print(response)
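The planning loop in `priori_branching` repeats until the checker accepts a plan, so it never terminates if no plan passes. One possible variant, sketched below, caps the number of attempts. It takes the model as a plain callable so it can be dry-run without an API key; `plan_with_retries`, `max_attempts`, and the stand-in replies are assumptions of this sketch and not part of the `vlm` module.

```python
def plan_with_retries(call_model, plan_prompt, check_template, max_attempts=3):
    """Regenerate a plan until the checker's reply starts with "yes".

    `call_model` is any callable mapping a prompt string to a reply string
    (an assumption of this sketch). Returns the accepted plan, or None if
    every attempt is rejected.
    """
    for _ in range(max_attempts):
        plan = call_model(plan_prompt)
        verdict = call_model(check_template.format(task_plan=plan))
        if verdict.strip().lower().startswith("yes"):
            return plan
    return None

# Stand-in model for a dry run: rejects the first plan, accepts the second.
replies = iter(["plan A", "No!!! bad order", "plan B", "Yes!!! ok"])
print(plan_with_retries(lambda prompt: next(replies), "plan prompt", "{task_plan}"))
```

With a real client, `call_model` could be the `call_model` method of a `BaseVLM` instance, and `check_template` could be `format_check`. Returning `None` leaves the decision about what to do after repeated rejections to the caller.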

### tasks

In [6]:
# Task1
Instruction = "Open the microwave door"
Environment_object = "microwave, tennis, desk"
Object_state = '''
1. [ontop, microwave, desk]
2. [ontop, tennis, desk]
3. [closed, microwave]
'''
priori_branching(Instruction, Environment_object, Object_state)



planning!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
One branch:
Subtasks of branch 1:
1. [OPEN, microwave]
Yes!!! The final generated plan meets the specifications. 
The action [OPEN, microwave] can be executed alone.


In [111]:
# Task2
Instruction = "Reorient the pen and place it into a pen holder"
Environment_object = "pen, pen holder 1, pen holder 2, drawer, desk"
Object_state = '''
1. [ontop, pen, drawer]  
2. [ontop, holder 1, desk]  
3. [ontop, holder 2, desk]  
4. [ontop, drawer, desk]
'''
priori_branching(Instruction, Environment_object, Object_state)

planning!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Given the provided instruction, environment objects, and object states, the task is to reorient the pen and place it into one of the two pen holders. This means there will be two branches since there are two pen holders to choose from. 

There is no articulated object such as a cabinet or drawer that needs to be opened or closed in this task.

### Output:

Two branches:

Subtasks of branch 1 (placing pen into pen holder 1):
1. [PICK UP, pen]
2. [PUT INTO, pen holder 1]

Subtasks of branch 2 (placing pen into pen holder 2):
1. [PICK UP, pen]
2. [PUT INTO, pen holder 2]
No!!! The final generated plan encountered an error: the second action [PICK UP, pen] in branch 2 cannot be executed directly after the first action [PUT INTO, pen holder 1] in branch 1.

### Corrected Plan:

Since different branches should be judged separately, and each branch starts with [PICK UP, pen], here’s the corrected plan for each branch:

Subtasks of bra

In [112]:
# Task3
Instruction = "Pick up the holder horizontally or vertically"
Environment_object = "holder, drawer, desk"
Object_state = '''
1. [ontop, holder, drawer]  
2. [ontop, drawer, desk]
'''
priori_branching(Instruction=Instruction, Environment_object=Environment_object, Object_state=Object_state)

planning!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
### Input:
Instruction:
Pick up the holder horizontally or vertically

Environment_object:
holder, drawer, desk

Object_state:
1. [ontop, holder, drawer]  
2. [ontop, drawer, desk]

### Output:
One branch:
Subtasks of branch 1:
1. [PICK UP, holder]
Yes!!! The final generated plan meets the specifications. The first action is picking up the holder, which can be executed alone.
## Output:
Two branches:
1. holder >> (horizontally)
2. holder >> (vertically)

Subtasks of branch 1:
1. [PICK UP, holder][horizontally]

Subtasks of branch 2:
1. [PICK UP, holder][vertically]


In [113]:
# Task4
Instruction = "Close the drawer"
Environment_object = "drawer, toy, desk"
Object_state = '''
1. [inside, toy, drawer]  
2. [ontop, drawer, desk]
'''
priori_branching(Instruction=Instruction, Environment_object=Environment_object, Object_state=Object_state)

planning!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Since the task is simply to close the drawer and there are no branches because it's just one object involved, the output will be:

One branch:
Subtasks of branch 1:
1. [CLOSE, drawer]
Yes!!! The final generated plan meets the specifications.
The single action to close the drawer can be executed alone.




In [114]:
# Taskd
Instruction = "Pick up a tennis"
Environment_object = "tennis 1, tennis 2, holder, desk"
Object_state = '''
1. [inside, tennis 1, holder]  
2. [ontop, tennis 2, desk]
3. [ontop, holder, desk]
'''
priori_branching(Instruction=Instruction, Environment_object=Environment_object, Object_state=Object_state)

planning!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Two branches:
1. tennis 1 from holder
2. tennis 2 from desk

Subtasks of branch 1:
1. [PICK UP, tennis 1]

Subtasks of branch 2:
1. [PICK UP, tennis 2]
Yes!!! The final generated plan meets the specifications. 
Plans for different branches should be judged separately. Each branch plan meets the specifications.

For branch 1:
- The action is picking up tennis 1 from the holder. The action [PICK UP, tennis 1] can be executed alone.

For branch 2:
- The action is picking up tennis 2 from the desk. The action [PICK UP, tennis 2] can be executed alone.


In [115]:
# Task5
Instruction = "Reorient the pen and place it into a holder"
Environment_object = "pen, holder 1, holder 2, apple, drawer, desk"
Object_state = '''
1. [ontop, pen, drawer]  
2. [ontop, holder 1, desk]  
3. [ontop, holder 2, desk]  
4. [ontop, drawer, desk]
5. [ontop, apple, holder 2]
'''
priori_branching(Instruction=Instruction, Environment_object=Environment_object, Object_state=Object_state)

planning!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Two branches:
1. pen >> holder 1
2. pen >> holder 2

Subtasks of branch 1:
1. [PICK UP, pen]
2. [PUT INTO, holder 1]

Subtasks of branch 2:
1. [PICK UP, pen]
2. [PUT INTO, holder 2]
Yes!!! The final generated plan meets the specifications.

For Branch 1:
1. The first action is picking up the pen. After [PICK UP, pen], the next action is [PUT INTO, holder 1], which is valid.

For Branch 2:
1. The first action is picking up the pen. After [PICK UP, pen], the next action is [PUT INTO, holder 2], which is valid.

Both branch plans are separately analyzed and meet the task specifications. Each branch has valid actions following the rules, so the overall plan is correct.


In [116]:
# Task6
Instruction = "Place the apple and the holder on the drawer, with the apple inside the holder"
Environment_object = "apple, holder, drawer, desk"
Object_state = '''
1. [ontop, holder, desk]  
2. [ontop, apple, desk]  
3. [ontop, drawer, desk]
'''
priori_branching(Instruction=Instruction, Environment_object=Environment_object, Object_state=Object_state)

planning!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
One branch:
Subtasks of branch 1:
1. [PICK UP, apple]
2. [PUT INTO, holder]
3. [PICK UP, holder]
4. [PUT ON, drawer]
Yes!!! The final generated plan meets the specifications.
After the first action [PICK UP, apple], the second action can be [PUT INTO, holder].
After the second action [PUT INTO, holder], the third action can be [PICK UP, holder].
After the third action [PICK UP, holder], the fourth action can be [PUT ON, drawer].


In [None]:
# Task7
Instruction = "Put the apple and the tennis ball in either the drawer or the pen holder, together or separately. Ensure the drawer is closed."
Environment_object = "apple, tennis, holder, drawer, desk"
Object_state = '''
1. [ontop, holder, desk]  
2. [ontop, apple, desk]  
3. [ontop, drawer, desk]
4. [ontop, tennis, desk]
5. [closed, drawer]
'''
priori_branching(Instruction=Instruction, Environment_object=Environment_object, Object_state=Object_state)

planning!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
### Output Branch1 (Apple and Tennis Ball into Drawer):

1. [OPEN, drawer]
2. [PICK UP, apple]
3. [PUT INTO, drawer]
4. [CLOSE, drawer]
5. [OPEN, drawer]
6. [PICK UP, tennis]
7. [PUT INTO, drawer]
8. [CLOSE, drawer]

### Output Branch2 (Apple into Drawer and Tennis Ball into Holder):

1. [OPEN, drawer]
2. [PICK UP, apple]
3. [PUT INTO, drawer]
4. [CLOSE, drawer]
5. [PICK UP, tennis]
6. [PUT INTO, holder]

### Output Branch3 (Apple into Holder and Tennis Ball into Drawer):

1. [PICK UP, apple]
2. [PUT INTO, holder]
3. [OPEN, drawer]
4. [PICK UP, tennis]
5. [PUT INTO, drawer]
6. [CLOSE, drawer]

### Output Branch4 (Apple and Tennis Ball into Holder):

1. [PICK UP, apple]
2. [PUT INTO, holder]
3. [PICK UP, tennis]
4. [PUT INTO, holder]
Yes!!! The final generated plan meets the specifications.

### Analysis

**Branch 1 (Apple and Tennis Ball into Drawer):**

1. [OPEN, drawer] 
   - Opening the drawer.
2. [PICK UP, apple]
   - Picki

## Without Priori Branching

In [118]:
subtask_fewshot = '''
Task background: The robot is in front of the desk and needs to complete a multi-step task.
You need to generate a multi-step task plan based on the provided task instruction, environment_objects, object states and example specifications.

Action Skills Set:
- PICK UP
- PUT ON
- PUT INTO
- OPEN
- CLOSE

Note:
1. There are two types of containers: articulated containers (e.g., cabinets, drawers) require OPEN and CLOSE actions; non-articulated containers (e.g., holders) do not. If the task is unrelated to the articulated object, or does not mention placing an object in it, do not open or close the articulated object.
2. If there are multiple objects that need to be placed in different containers and different branches need to be generated, the order of object operations remains unchanged, while the corresponding order of containers changes.
3. If there are multiple poses for a specific grasping action, ignore them and summarize them with only one PICK UP without adding any other modifiers.
4. Additionally, when multiple objects are placed into an articulated container, each insertion must follow an OPEN, PICK UP, PUT INTO, CLOSE cycle: the container is opened, the object is picked up and placed inside, and the container is closed before the next object is inserted.
### Example1:
Input:
Instruction: put one apple into the cabinet and another apple on the plate.
Environment_object: apple 1, apple 2, cabinet, plate, desk
Object_state:
1. [ontop, apple 1, desk]
2. [ontop, apple 2, desk]
3. [ontop, cabinet, desk]
4. [closed, cabinet]
5. [ontop, plate, desk]
Output:
1. [OPEN, cabinet]
2. [PICK UP, apple 1]
3. [PUT INTO, cabinet]
4. [CLOSE, cabinet]
5. [PICK UP, apple 2]
6. [PUT ON, plate]

### Input:
Instruction:
{Instruction}

Environment_object:
{Environment_object}

Object_state:
{Object_state}

Output:

'''

In [None]:
# Task2
Instruction = "Reorient the pen and place it into a pen holder"
Environment_object = "pen, pen holder 1, pen holder 2, drawer, desk"
Object_state = '''
1. [ontop, pen, drawer]  
2. [ontop, holder 1, desk]  
3. [ontop, holder 2, desk]  
4. [ontop, drawer, desk]
'''
base_vlm = BaseVLM(api_key, model, system_message, base_url)
# Single-shot planning: no branch enumeration and no format check.
prompt = subtask_fewshot.format(Instruction=Instruction, Environment_object=Environment_object, Object_state=Object_state)
response = base_vlm.call_model(prompt)
print(response)
# Ask whether the instruction requests grasp-pose diversity.
prompt = detect_pose.format(Instruction=Instruction)
pose = base_vlm.call_model(prompt)
if "yes" in pose.lower():
    prompt = pose_branching.format(Instruction=Instruction, Environment_object=Environment_object, Object_state=Object_state, Branches=response)
    response = base_vlm.call_model(prompt)
    print(response)
else:
    print(pose)

1. [PICK UP, pen]
2. [PUT INTO, pen, holder 1]
NO
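The reply above uses a three-element step, `[PUT INTO, pen, holder 1]`, while the few-shot examples use two elements. A downstream consumer would likely need to normalize both forms. The sketch below is one way to parse a reply into per-branch `(action, target)` pairs. `parse_branches` is a hypothetical helper, and treating the last element of a step as its target is an assumption.

```python
import re

HEADER = re.compile(r"Subtasks of branch (\d+)", re.IGNORECASE)
STEP = re.compile(r"\[\s*([A-Z ]+?)\s*,\s*([^\]]+?)\s*\]")

def parse_branches(response):
    """Map branch number -> list of (action, target) pairs.

    Steps seen before any "Subtasks of branch N" header are filed under
    branch 1, which covers single-plan replies such as the one above.
    Assumption: in a three-element step, the last element is the target.
    """
    branches, current = {}, 1
    for line in response.splitlines():
        header = HEADER.search(line)
        if header:
            current = int(header.group(1))
            continue
        for action, args in STEP.findall(line):
            target = args.split(",")[-1].strip()
            branches.setdefault(current, []).append((action, target))
    return branches

print(parse_branches("1. [PICK UP, pen]\n2. [PUT INTO, pen, holder 1]"))
```

Only the header wording from the few-shot examples is recognized. Replies that label branches differently, as the Task7 output does, would be filed under a single branch and need their own handling.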
