# ALFWorld + Reflexion

## Utils

### Imports

In [1]:
import requests
from environments.alfworld.common.environment import ALFWorldEnv
from environments.alfworld.common.evaluate_output_parser import ALFWorldEvaluateOutputParser
from environments.alfworld.reflexion.generate_prompts import generate_openai_tools_prompt
from environments.alfworld.reflexion.evaluate_prompts import evaluate_prompt
from environments.alfworld.reflexion.self_reflection_prompts import self_reflection_prompt
import os
from langchain_openai import ChatOpenAI
from operator import itemgetter
from langchain_core.runnables import RunnableLambda
from planning_library.strategies import ReflexionStrategy


%load_ext autoreload
%autoreload 2

### Logging

In [2]:
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "ALFWorld + Reflexion"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"

## Environment

In [3]:
os.environ["ALFWORLD_DATA"] = "data"

if "base_config.yaml" not in os.listdir():
    response = requests.get("https://raw.githubusercontent.com/alfworld/alfworld/master/configs/base_config.yaml")
    response.raise_for_status()  # fail loudly instead of saving an HTTP error page as the config

    with open("base_config.yaml", "wb") as f:
        f.write(response.content)

env = ALFWorldEnv(config_path="base_config.yaml")
env.seed(123)

Initializing AlfredTWEnv...


100%|██████████| 8810/8810 [00:03<00:00, 2851.12it/s]

Overall we have 3553 games in split=train
Training with 3553 games





In [4]:
obs, info = env.reset()
print(obs)

-= Welcome to TextWorld, ALFRED! =-

You are in the middle of a room. Looking quickly around you, you see a bathtubbasin 1, a countertop 1, a drawer 4, a drawer 3, a drawer 2, a drawer 1, a dresser 1, a garbagecan 1, a handtowelholder 2, a handtowelholder 1, a shelf 2, a shelf 1, a sinkbasin 1, a toilet 1, a toiletpaperhanger 1, and a towelholder 1.

Your task is to: put a spraybottle in garbagecan.


## Strategy

### Hyperparameters

In [5]:
# Reflexion hyperparameters
value_threshold = 1.0  # threshold for evaluation; when reached, the loop will exit
max_num_iterations = 10  # maximum number of iterations; when reached, the loop will exit

# other hyperparameters
model_name = "gpt-3.5-turbo"
temperature = 0.8
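
To make the two Reflexion hyperparameters concrete, here is a minimal, dependency-free sketch of the control loop they govern. It is not the `planning_library` implementation: `reflexion_loop`, `run_trial`, `evaluate` and `reflect` are hypothetical names, and the `>=` comparison against `value_threshold` is an assumption about how the threshold is applied.

```python
from typing import Callable, List, Tuple


def reflexion_loop(
    run_trial: Callable[[List[str]], str],
    evaluate: Callable[[str], float],
    reflect: Callable[[str], str],
    value_threshold: float = 1.0,
    max_num_iterations: int = 10,
) -> Tuple[str, float, int]:
    """Toy loop: retry until the score reaches the threshold or the budget runs out."""
    if max_num_iterations < 1:
        raise ValueError("max_num_iterations must be at least 1")

    self_reflections: List[str] = []
    for iteration in range(1, max_num_iterations + 1):
        outcome = run_trial(self_reflections)  # the actor sees all reflections so far
        value = evaluate(outcome)  # the evaluator scores the finished trial
        if value >= value_threshold:  # assumed exit condition
            break
        self_reflections.append(reflect(outcome))  # remember what went wrong
    return outcome, value, iteration


# Stub components: the "trial" succeeds once two reflections are available.
outcome, value, iterations = reflexion_loop(
    run_trial=lambda reflections: "success" if len(reflections) >= 2 else "failure",
    evaluate=lambda result: 1.0 if result == "success" else 0.0,
    reflect=lambda result: "Check the countertop before opening drawers.",
)
print(outcome, value, iterations)
```

With these stubs the loop exits on the third trial because the score reaches `value_threshold`; lowering `max_num_iterations` to 2 would make it stop earlier with a failing score.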

### Actor

The default Actor is powered by [any agent available in LangChain](https://python.langchain.com/docs/modules/agents/agent_types/). Specifically, it expects either a [`BaseSingleActionAgent`](https://api.python.langchain.com/en/latest/agents/langchain.agents.agent.BaseSingleActionAgent.html#langchain.agents.agent.BaseSingleActionAgent) or a [`BaseMultiActionAgent`](https://api.python.langchain.com/en/latest/agents/langchain.agents.agent.BaseMultiActionAgent.html#langchain.agents.agent.BaseMultiActionAgent).

### Prompt

Here is an example of a simple prompt suitable for ALFWorld with Reflexion.

In [6]:
generate_openai_tools_prompt.input_variables

['agent_scratchpad', 'inputs', 'self_reflections']

In [7]:
print(generate_openai_tools_prompt.format(inputs=obs, self_reflections=[], agent_scratchpad=[]))

System: You are ALFRED, an intelligent agent navigating in a household.
Human: -= Welcome to TextWorld, ALFRED! =-

You are in the middle of a room. Looking quickly around you, you see a bathtubbasin 1, a countertop 1, a drawer 4, a drawer 3, a drawer 2, a drawer 1, a dresser 1, a garbagecan 1, a handtowelholder 2, a handtowelholder 1, a shelf 2, a shelf 1, a sinkbasin 1, a toilet 1, a toiletpaperhanger 1, and a towelholder 1.

Your task is to: put a spraybottle in garbagecan.
Human: This might be not your first attempt to fulfill the task. 
            In this case, you will find self-reflective thoughts below. Make sure to pay extra attention to them,
             as they aim to identify and mitigate the exact shortcomings that led to failure in previous trials. 

Human: Good luck!


In [8]:
tools = env.tools
tools

[GoToTool(env=<textworld.gym.envs.textworld_batch.TextworldBatchGymEnv object at 0x2a3503d50>),
 OpenTool(env=<textworld.gym.envs.textworld_batch.TextworldBatchGymEnv object at 0x2a3503d50>),
 CloseTool(env=<textworld.gym.envs.textworld_batch.TextworldBatchGymEnv object at 0x2a3503d50>),
 TakeTool(env=<textworld.gym.envs.textworld_batch.TextworldBatchGymEnv object at 0x2a3503d50>),
 PutTool(env=<textworld.gym.envs.textworld_batch.TextworldBatchGymEnv object at 0x2a3503d50>),
 ToggleTool(env=<textworld.gym.envs.textworld_batch.TextworldBatchGymEnv object at 0x2a3503d50>),
 HeatTool(env=<textworld.gym.envs.textworld_batch.TextworldBatchGymEnv object at 0x2a3503d50>),
 CoolTool(env=<textworld.gym.envs.textworld_batch.TextworldBatchGymEnv object at 0x2a3503d50>),
 CleanTool(env=<textworld.gym.envs.textworld_batch.TextworldBatchGymEnv object at 0x2a3503d50>),
 ExamineTool(env=<textworld.gym.envs.textworld_batch.TextworldBatchGymEnv object at 0x2a3503d50>),
 InventoryTool(env=<textworld.gym.

### Putting It All Together

Let's use the [OpenAI Tools](https://python.langchain.com/docs/modules/agents/agent_types/openai_tools) agent.

In [9]:
from typing import Any, List, Tuple

from langchain.agents.format_scratchpad.openai_tools import format_to_openai_tool_messages
from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser
from langchain_core.messages import AIMessage, BaseMessage
from langchain_core.utils.function_calling import convert_to_openai_tool


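def format_self_reflections(self_reflections: List[Tuple[str, Any]]) -> List[BaseMessage]:
    """Convert stored self-reflections into chat messages for the prompt.

    Only ("content", value) pairs are kept; each value is wrapped in an AIMessage.
    """
    result = []
    for t in self_reflections:
        if t[0] == "content":
            content = t[1]
            message = AIMessage(content=content)
            result.append(message)

    return result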


# copied from langchain.agents.create_openai_tools_agent to allow for custom formatting of `self_reflections` in inputs

llm = ChatOpenAI(model=model_name, temperature=temperature).bind(tools=[convert_to_openai_tool(tool) for tool in tools])

agent = (
    {
        "inputs": itemgetter("inputs"),
        "agent_scratchpad": itemgetter("intermediate_steps") | RunnableLambda(format_to_openai_tool_messages),
        "self_reflections": itemgetter("self_reflections") | RunnableLambda(format_self_reflections),
    }
    | generate_openai_tools_prompt
    | llm
    | OpenAIToolsAgentOutputParser()
)

### Evaluator

The default Evaluator is powered by a [`Runnable`](https://python.langchain.com/docs/expression_language/interface) that takes:

* `inputs`: original inputs
* `intermediate_steps`: a sequence of agent actions
* `agent_outcome`: final answer

The Evaluator may return any value. In this notebook, it returns a float between 0 and 1.

### Prompt

In [10]:
evaluate_prompt.input_variables

['inputs', 'intermediate_steps']

In [11]:
print(evaluate_prompt.format(inputs=obs, intermediate_steps=[]))

System: You are a helpful assistant that judges whether the episodes of household navigation ended in failure or success.
Human: 
You will be given an input and a sequence of intermediate steps from one episode of household navigation. Your goal is to evaluate whether the episode ended in success or in failure.   

Take your time and comment your decision, but make sure to always output either 0 or 1 in the end, where 0 would mean 'the episode ended in failure' and 1 would mean 'the episode ended in success'. 
Use the following format: [[number]].
Human: Here is the input for the current episode:
-= Welcome to TextWorld, ALFRED! =-

You are in the middle of a room. Looking quickly around you, you see a bathtubbasin 1, a countertop 1, a drawer 4, a drawer 3, a drawer 2, a drawer 1, a dresser 1, a garbagecan 1, a handtowelholder 2, a handtowelholder 1, a shelf 2, a shelf 1, a sinkbasin 1, a toilet 1, a toiletpaperhanger 1, and a towelholder 1.

Your task is to: put a spraybottle in garba

### Putting It All Together

In [12]:
evaluator_runnable = (
    {
        "inputs": itemgetter("inputs") | RunnableLambda(itemgetter("inputs")),
        "intermediate_steps": itemgetter("intermediate_steps") | RunnableLambda(format_to_openai_tool_messages),
    }
    | evaluate_prompt
    | ChatOpenAI(model=model_name, temperature=temperature)
    | ALFWorldEvaluateOutputParser()
)
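
The evaluation prompt asks the model to finish with a verdict in the form `[[number]]`, and the last step of the chain turns that text into a score. The internals of `ALFWorldEvaluateOutputParser` are not shown in this notebook, so the following is only a sketch of how such parsing could work. `parse_verdict` is a hypothetical helper, and taking the last match and rejecting values outside 0 to 1 are assumptions.

```python
import re


def parse_verdict(text: str) -> float:
    """Return the last [[number]] found in the text as a float (hypothetical helper)."""
    matches = re.findall(r"\[\[\s*(\d+(?:\.\d+)?)\s*\]\]", text)
    if not matches:
        raise ValueError("no [[number]] verdict found in evaluator output")

    value = float(matches[-1])  # the prompt asks for the verdict "in the end"
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"verdict {value} is outside the expected range 0 to 1")
    return value


failure = parse_verdict("The agent never picked up the spraybottle. [[0]]")
success = parse_verdict("The spraybottle ended up in the garbagecan. [[1]]")
print(failure, success)
```

Raising on a missing verdict, rather than defaulting to 0, keeps formatting problems in the model output visible instead of silently counting them as failed episodes.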

### Self-Reflection

The default Self-Reflection is powered by a [`Runnable`](https://python.langchain.com/docs/expression_language/interface) that takes:

* `inputs`: original inputs
* `intermediate_steps`: a sequence of agent actions

It returns a single self-reflection: a short diagnosis of what went wrong in the current trial, together with a high-level plan for the next one.

### Prompt

In [13]:
self_reflection_prompt.input_variables

['inputs', 'intermediate_steps']

In [14]:
print(
    self_reflection_prompt.format(
        inputs=obs,
        intermediate_steps=[],
    )
)

System: You are an advanced reasoning agent that can self-reflect on their shortcomings when solving reasoning tasks.
Human: 
You will be given your previous trial in the household navigation.

Input:
-= Welcome to TextWorld, ALFRED! =-

You are in the middle of a room. Looking quickly around you, you see a bathtubbasin 1, a countertop 1, a drawer 4, a drawer 3, a drawer 2, a drawer 1, a dresser 1, a garbagecan 1, a handtowelholder 2, a handtowelholder 1, a shelf 2, a shelf 1, a sinkbasin 1, a toilet 1, a toiletpaperhanger 1, and a towelholder 1.

Your task is to: put a spraybottle in garbagecan.
Human: Your intermediate steps:
Human: 
In this trial, you were unsuccessful. 
In a few sentences, diagnose a possible reason for failure and devise a new, concise, high level plan that aims to mitigate the same shortcomings. 
Use complete sentences.


### Putting It All Together

In [15]:
self_reflection_runnable = (
    {
        "inputs": itemgetter("inputs") | RunnableLambda(itemgetter("inputs")),
        "intermediate_steps": lambda x: format_to_openai_tool_messages(x["intermediate_steps"]),
        "agent_outcome": lambda x: x["agent_outcome"].return_values["output"],
    }
    | self_reflection_prompt
    | ChatOpenAI(model=model_name, temperature=temperature)
)
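
Both the evaluator and the self-reflection runnable look up `"inputs"` twice. A plausible reading, assuming these runnables receive the original payload nested under an outer `inputs` key (as in the `{"inputs": {"inputs": obs}}` argument passed to `invoke` later on), is that the first lookup selects the payload and the second selects the observation text. The payload shape below is an assumption made for illustration only.

```python
from operator import itemgetter

# Assumed shape of what the evaluator and self-reflection runnables receive.
payload = {
    "inputs": {"inputs": "Your task is to: put a spraybottle in garbagecan."},
    "intermediate_steps": [],
}

outer = itemgetter("inputs")  # selects the original, still nested, inputs
inner = itemgetter("inputs")  # selects the observation string inside them

observation = inner(outer(payload))
print(observation)
```

In the chains above, the same two lookups are composed with `|` and `RunnableLambda` rather than called directly.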

## Defining strategy

In [16]:
from planning_library.action_executors import GymnasiumActionExecutor
from typing import Dict, Any


def reset_environment(_: Dict[str, Any]) -> None:
    env.reset()


action_executor = GymnasiumActionExecutor(env)
reflexion = ReflexionStrategy.create(
    agent=agent,
    tools=tools,
    action_executor=action_executor,
    evaluator_runnable=evaluator_runnable,
    self_reflection_runnable=self_reflection_runnable,
    value_threshold=value_threshold,
    reset_environment=reset_environment,
    max_iterations=max_num_iterations,
)

In [17]:
reflexion.invoke(
    {"inputs": {"inputs": obs}},
    {"recursion_limit": 10000},
)
env.close()