# Frozen Lake + Reflexion

## Utils

### Imports

In [1]:
import gymnasium as gym
from environments.frozen_lake.common.environment import FrozenLakeEnvWrapper
from environments.frozen_lake.common.evaluate_output_parser import FrozenMapEvaluateOutputParser
from langchain_openai import ChatOpenAI
from gymnasium.envs.toy_text.frozen_lake import generate_random_map
import os
from environments.frozen_lake.reflexion.evaluate_prompts import evaluate_prompt
from environments.frozen_lake.reflexion.generate_prompts import generate_openai_tools_prompt
from environments.frozen_lake.reflexion.self_reflection_prompts import self_reflection_prompt
from planning_library.strategies import ReflexionStrategy
from operator import itemgetter
from langchain_core.runnables import RunnableLambda
from datetime import datetime


%load_ext autoreload
%autoreload 2

### Logging

In [2]:
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Reflexion + Frozen Lake test"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"

## Environment

We use a custom wrapper on top of the Frozen Lake environment from Gymnasium. It changes the following:
* it accepts `AgentAction` from LangChain and internally calls the original environment with the corresponding actions;
* it transforms the observations returned by the original environment: instead of a single number, each observation is the current position (x, y) on the board;
* it supports replaying a given sequence of actions when the environment is reset.
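
To make the observation transform concrete, here is a minimal sketch of how a flat Frozen Lake state index could be mapped to board coordinates. It assumes the standard Gymnasium encoding (`state = row * ncols + col`) and the convention used in the prompts in this notebook, where the first coordinate is X (column) and the second is Y (row). The helper name `state_to_xy` is hypothetical and is not taken from `FrozenLakeEnvWrapper`, whose actual implementation may differ.

```python
from typing import Tuple


def state_to_xy(state: int, ncols: int) -> Tuple[int, int]:
    """Map a flat Frozen Lake state index to (x, y) board coordinates.

    Assumes state = row * ncols + col, with x = column and y = row.
    `state_to_xy` is an illustrative helper, not part of the wrapper.
    """
    if ncols <= 0:
        raise ValueError("ncols must be positive")
    if state < 0:
        raise ValueError("state must be non-negative")
    row, col = divmod(state, ncols)
    return col, row


# On a 4 x 4 board, state 0 is the start cell and state 15 is the goal cell.
start = state_to_xy(0, ncols=4)
goal = state_to_xy(15, ncols=4)
```

Under these assumptions, moving right from a cell increases the first coordinate and moving down increases the second, which matches the description given to the agent.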

In [3]:
env = gym.make("FrozenLake-v1", desc=generate_random_map(size=4), is_slippery=True, render_mode="rgb_array")
env = FrozenLakeEnvWrapper(env)
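env = gym.wrappers.RecordVideo(
    env,
    # %M is minutes and %S is seconds; lowercase %m is the month and %s is a non-portable epoch timestamp
    video_folder=f"reflexion_video/slippery_{datetime.now().strftime('%d-%m-%Y;%H:%M:%S')}",
    episode_trigger=lambda x: True,  # record every episode
)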
env.reset(seed=123)

(0, {'prob': 1})

In [4]:
board = "\n".join("".join(x.decode() for x in y) for y in env.get_wrapper_attr("desc"))
print(board)

SHFF
FHFF
FFFF
FFFG


## Strategy

### Hyperparameters

In [5]:
# Reflexion hyperparameters
value_threshold = 1.0  # threshold for evaluation; when reached, the loop will exit
max_num_iterations = 20  # maximum number of iterations; when reached, the loop will exit

# other hyperparameters
model_name = "gpt-3.5-turbo"
temperature = 0.8
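
The two Reflexion hyperparameters bound the trial loop: it stops as soon as the evaluator's value reaches `value_threshold`, or after `max_num_iterations` trials. The sketch below illustrates that control flow with plain callables. The names `reflexion_loop`, `run_trial`, `evaluate`, and `reflect` are placeholders introduced for illustration; this is not the implementation of `ReflexionStrategy`.

```python
from typing import Any, Callable, List, Optional, Tuple


def reflexion_loop(
    run_trial: Callable[[List[str]], Any],
    evaluate: Callable[[Any], float],
    reflect: Callable[[Any], str],
    value_threshold: float,
    max_num_iterations: int,
) -> Tuple[int, Optional[float], List[str]]:
    """Run trials until the value reaches the threshold or the budget runs out.

    Returns (number of trials run, last value, accumulated self-reflections).
    """
    reflections: List[str] = []
    value: Optional[float] = None
    trials_run = 0
    for _ in range(max_num_iterations):
        trials_run += 1
        trajectory = run_trial(reflections)  # actor sees all past reflections
        value = evaluate(trajectory)  # evaluator scores the finished trial
        if value >= value_threshold:
            break  # threshold reached: stop early
        reflections.append(reflect(trajectory))  # remember what went wrong
    return trials_run, value, reflections


# Toy stand-ins: the "actor" succeeds once it has seen two reflections.
trials, final_value, notes = reflexion_loop(
    run_trial=lambda reflections: len(reflections),
    evaluate=lambda trajectory: 1.0 if trajectory >= 2 else 0.0,
    reflect=lambda trajectory: f"note {trajectory}",
    value_threshold=1.0,
    max_num_iterations=20,
)
```

With the toy stand-ins, the loop exits on the third trial, after two self-reflections have been collected.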

### Actor

The default Actor is powered by [any agent available in LangChain](https://python.langchain.com/docs/modules/agents/agent_types/). Specifically, it expects either a [`BaseSingleActionAgent`](https://api.python.langchain.com/en/latest/agents/langchain.agents.agent.BaseSingleActionAgent.html#langchain.agents.agent.BaseSingleActionAgent) or a [`BaseMultiActionAgent`](https://api.python.langchain.com/en/latest/agents/langchain.agents.agent.BaseMultiActionAgent.html#langchain.agents.agent.BaseMultiActionAgent).

### Prompt

Here is an example of a simple prompt suitable for Frozen Lake with Reflexion.

In [6]:
generate_openai_tools_prompt.input_variables

['agent_scratchpad', 'inputs', 'self_reflections']

In [7]:
print(generate_openai_tools_prompt.format(inputs=board, self_reflections=[], agent_scratchpad=[]))

System: You are an intelligent agent playing a Frozen Lake game.
Human: 
In Frozen Lake game, you move on a 2D grid. Your goal is to cross this grid from start 
to finish without falling into any holes.

You start at location (0,0) (upper left corner) of the frozen lake grid world, and the finish is located at far extent of the world, i.e., (n - 1, n - 1) for n x n grid (lower right corner).

The first coordinate is X axis. 
When you move right, you increase your first coordinate by 1 (can't be bigger than n - 1).
When you move left, you decrease your first coordinate by 1 (can't be lower than 0).

The second coordinate is Y axis.
When you move up, you decrease your second coordinate by 1 (can't be lower than 0).
When you move down, you increase your second coordinate by 1 (can't be bigger than n - 1).

The map is an n x n grid where different types of cells are denoted by different letters:
* S - start cell
* G - goal cell
* F - frozen cell
* H - hole cell

Example for 2 x 2 case:
SH


In [8]:
tools = env.get_wrapper_attr("tools")
tools

[MoveTool(env=<FrozenLakeEnvWrapper<TimeLimit<OrderEnforcing<PassiveEnvChecker<FrozenLakeEnv<FrozenLake-v1>>>>>>)]

### Putting It All Together

Let's use the [OpenAI Tools](https://python.langchain.com/docs/modules/agents/agent_types/openai_tools) agent.

In [9]:
from typing import Any, List, Tuple

from langchain.agents.format_scratchpad.openai_tools import format_to_openai_tool_messages
from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser
from langchain_core.messages import AIMessage, BaseMessage
from langchain_core.utils.function_calling import convert_to_openai_tool


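def format_self_reflections(self_reflections: List[Tuple[str, Any]]) -> List[BaseMessage]:
    """Convert stored self-reflections into chat messages for the prompt.

    Only entries whose first element is "content" are kept; each one becomes an AIMessage.
    """
    result = []
    for t in self_reflections:
        if t[0] == "content":
            content = t[1]
            message = AIMessage(content=content)
            result.append(message)

    return result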


# copied from langchain.agents.create_openai_tools_agent to allow for custom formatting of `self_reflections` in inputs

llm = ChatOpenAI(model=model_name, temperature=temperature).bind(tools=[convert_to_openai_tool(tool) for tool in tools])

agent = (
    {
        "inputs": itemgetter("inputs"),
        "agent_scratchpad": itemgetter("intermediate_steps") | RunnableLambda(format_to_openai_tool_messages),
        "self_reflections": itemgetter("self_reflections") | RunnableLambda(format_self_reflections),
    }
    | generate_openai_tools_prompt
    | llm
    | OpenAIToolsAgentOutputParser()
)

### Evaluator

The default Evaluator is powered by a [`Runnable`](https://python.langchain.com/docs/expression_language/interface) that takes in:

* `inputs`: original inputs
* `intermediate_steps`: a sequence of agent actions
* `agent_outcome`: final answer

The Runnable may return a value of any type. In this case, it returns a float between 0 and 1.
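
As an illustration of how free-form evaluator text could be turned into such a float, here is a minimal sketch that takes the last standalone `0` or `1` in the model's answer as the verdict. The function `parse_verdict` is hypothetical and written only for this example; it is not the implementation of `FrozenMapEvaluateOutputParser`, which may work differently.

```python
import re

# A lone 0 or 1: not part of a longer number or a decimal such as "1.0" or "10".
_VERDICT_PATTERN = re.compile(r"(?<![\d.])[01](?!\.?\d)")


def parse_verdict(text: str) -> float:
    """Return the last standalone 0 or 1 found in `text` as a float.

    Hypothetical helper: assumes the model states its final verdict last.
    """
    matches = _VERDICT_PATTERN.findall(text)
    if not matches:
        raise ValueError("no 0/1 verdict found in evaluator output")
    return float(matches[-1])


success = parse_verdict("The player reached (3,3), which is the goal. 1")
failure = parse_verdict("The player fell into the hole at (1,0).\n0")
```

Taking the last match rather than the first is a deliberate choice here: the commentary that precedes the verdict often mentions coordinates such as `(1,0)`, which would otherwise be mistaken for the answer.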

### Prompt

In [10]:
evaluate_prompt.input_variables

['inputs', 'intermediate_steps']

In [11]:
print(evaluate_prompt.format(inputs=board, intermediate_steps=[]))

System: You are a helpful assistant that judges whether the episodes of playing the Frozen Lake game ended in failure or success.
Human: 
In the Frozen Lake game, you move on a 2D grid. The end goal is to cross this grid from start 
to finish without falling into any holes.
You start at location (0,0) (upper left corner) of the frozen lake grid world, and the finish is located 
at far extent of the world, i.e., (n - 1, n - 1) for n x n grid (lower right corner). 
If you step on the hole, the game ends.

You will be given a map and a sequence of intermediate steps from one episode of playing Frozen Lake game. Your goal is to evaluate whether the episode ended in success (i.e., the goal was obtained) or in failure (i.e., the player stepped into the hole or didn't reach a goal for some reason).   

Take your time and comment your decision, but make sure to always output either 0 or 1 in the end, where 0 would mean 'the episode ended in failure' and 1 would mean 'the episode ended in succe

### Putting It All Together

In [12]:
evaluator_runnable = (
    {
        "inputs": itemgetter("inputs") | RunnableLambda(itemgetter("inputs")),
        "intermediate_steps": itemgetter("intermediate_steps") | RunnableLambda(format_to_openai_tool_messages),
    }
    | evaluate_prompt
    | ChatOpenAI(model=model_name, temperature=temperature)
    | FrozenMapEvaluateOutputParser()
)

### Self-Reflection

The default Self-Reflection component is powered by a [`Runnable`](https://python.langchain.com/docs/expression_language/interface) that takes in:

* `inputs`: original inputs
* `intermediate_steps`: a sequence of agent actions
* `agent_outcome`: final answer

It returns a single self-reflection: a high-level summary of what went wrong in the current trial.

### Prompt

In [13]:
self_reflection_prompt.input_variables

['inputs', 'intermediate_steps']

In [14]:
print(
    self_reflection_prompt.format(
        inputs=board,
        intermediate_steps=[],
    )
)

System: You are an advanced reasoning agent that can self-reflect on their shortcomings when solving reasoning tasks.
Human: 
You will be given your previous trial in the Frozen Lake game.

In Frozen Lake game, you move on a 2D grid. The end goal is to cross this grid from start 
to finish without falling into any holes.

You start at location (0,0) (upper left corner) of the frozen lake grid world, and the finish is located 
at far extent of the world, i.e., (n - 1, n - 1) for n x n grid (lower right corner). 

The first coordinate is X axis. 
When you move right, you increase your first coordinate by 1 (can't be bigger than n - 1).
When you move left, you decrease your first coordinate by 1 (can't be lower than 0).

The second coordinate is Y axis.
When you move up, you decrease your second coordinate by 1 (can't be lower than 0).
When you move down, you increase your second coordinate by 1 (can't be bigger than n - 1).

The map is an n x n grid where different types of cells are den

### Putting It All Together

In [15]:
self_reflection_runnable = (
    {
        "inputs": itemgetter("inputs") | RunnableLambda(itemgetter("inputs")),
        "intermediate_steps": lambda x: format_to_openai_tool_messages(x["intermediate_steps"]),
        "agent_outcome": lambda x: x["agent_outcome"].return_values["output"],
    }
    | self_reflection_prompt
    | ChatOpenAI(model=model_name, temperature=temperature)
)

## Defining strategy

In [16]:
from planning_library.action_executors import GymnasiumActionExecutor
from typing import Dict, Any


def reset_environment(_: Dict[str, Any]) -> None:
    env.reset()


action_executor = GymnasiumActionExecutor(env)
reflexion = ReflexionStrategy.create(
    agent=agent,
    tools=tools,
    action_executor=action_executor,
    evaluator_runnable=evaluator_runnable,
    self_reflection_runnable=self_reflection_runnable,
    value_threshold=value_threshold,
    reset_environment=reset_environment,
    max_iterations=max_num_iterations,
)

In [17]:
env.reset()
reflexion.invoke(
    {"inputs": {"inputs": board}},
    {"recursion_limit": 10000},
)
env.close()

Moviepy - Building video /Users/Alexandra.Eliseeva/PycharmProjects/planning-library/environments/frozen_lake/reflexion_video/slippery_18-03-2024;22:03:1710795792/rl-video-episode-0.mp4.
Moviepy - Writing video /Users/Alexandra.Eliseeva/PycharmProjects/planning-library/environments/frozen_lake/reflexion_video/slippery_18-03-2024;22:03:1710795792/rl-video-episode-0.mp4
Moviepy - Done !
Moviepy - video ready /Users/Alexandra.Eliseeva/PycharmProjects/planning-library/environments/frozen_lake/reflexion_video/slippery_18-03-2024;22:03:1710795792/rl-video-episode-0.mp4

(The same four Moviepy messages repeat for `rl-video-episode-1.mp4` through `rl-video-episode-8.mp4` in the same folder.)