### study

- reflexion is a framework that reinforces language agents through **verbal reinforcement** (linguistic feedback) instead of weight updates. The feedback is stored in the agent's memory.
- reflexion converts feedback from the environment (free-form language or a scalar reward) into linguistic feedback, i.e. a self-reflection. That reflection is provided as context to the LLM agent in the next episode. This helps the agent learn quickly and effectively from prior mistakes, which improves performance on many advanced tasks.
- three distinct models:
    - **actor**: generates text and actions. CoT or ReAct prompting is used for this.
    - **evaluator**: scores the outputs produced by the actor. Different reward functions are used depending on the task (LLMs and rule-based heuristics are used for decision-making tasks).
    - **self-reflection**: generates verbal reinforcement cues that help the actor improve. It uses the reward signal, the current trajectory, and persistent memory.
- steps:
    - define a task
    - generate a trajectory
    - evaluate
    - perform reflection
    - generate the next trajectory
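The three models and the steps above can be sketched as one framework-free loop. This is a minimal illustration, not the paper's code. Assumptions: `actor`, `evaluator` and `reflector` are plain callables standing in for LLM calls, and `reflexion_loop`, `toy_actor` and `toy_reflector` are hypothetical names. The evaluator is a rule-based exact-match heuristic (lowercase, drop punctuation and articles, then compare), which is one possible reward for QA-style tasks.

```python
import re
import string
from typing import Callable, List, Tuple

_PUNCT = set(string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_evaluator(answer: str, gold: str) -> int:
    """Rule-based evaluator: reward 1 if the normalized strings match, else 0."""
    return int(normalize(answer) == normalize(gold))


def reflexion_loop(
    actor: Callable[[str, List[str]], Tuple[str, str]],  # (question, reflections) -> (trajectory, answer)
    evaluator: Callable[[str], int],                     # answer -> reward in {0, 1}
    reflector: Callable[[str, str, int], str],           # (question, trajectory, reward) -> reflection
    question: str,
    max_trials: int = 3,
) -> Tuple[str, List[str]]:
    reflections: List[str] = []  # persistent memory carried across trials
    answer = ""
    for _ in range(max_trials):
        trajectory, answer = actor(question, reflections)  # generate a trajectory
        reward = evaluator(answer)                          # evaluate
        if reward == 1:
            break
        # perform reflection; the actor sees it on the next trial
        reflections.append(reflector(question, trajectory, reward))
    return answer, reflections


# Toy stand-ins for the LLM-backed models (hypothetical, for illustration only).
def toy_actor(question: str, reflections: List[str]) -> Tuple[str, str]:
    answer = "Engineering" if reflections else "Physics"
    return f"Thought: guess the field\nAnswer: {answer}", answer


def toy_reflector(question: str, trajectory: str, reward: int) -> str:
    return "The guess was wrong; read the degree field from the search result."


answer, memory = reflexion_loop(
    toy_actor,
    lambda a: exact_match_evaluator(a, "engineering"),
    toy_reflector,
    "In what field is the degree?",
)
print(answer, len(memory))  # Engineering 1
```

The toy actor fails on the first trial and succeeds once a reflection is in memory. This mirrors the "evaluate, reflect, retry" cycle without depending on any LLM library.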

![Reflexion](https://www.promptingguide.ai/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Freflexion-examples.7558c279.png&w=3840&q=75)


- effective for:
    - sequential decision making
    - reasoning
    - programming

- limitations:
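    - reliance on self-evaluation capabilities: the agent must judge its own performance accurately and produce useful reflections, which is hard on complex tasks
    - long-term memory constraints: the episodic memory has a maximum capacity; more complex tasks may need richer structures such as vector embeddings or SQL databases
    - code generation limitations: test-driven development cannot always specify an accurate input-output mapping (e.g. non-deterministic generator functions, or outputs that depend on hardware)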
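The memory constraint follows from keeping reflections in the prompt: only a bounded number fit. A minimal sketch of such a capped memory, assuming "keep the most recent N" is the eviction policy (`ReflectionMemory` is a hypothetical helper, not part of the Reflexion code):

```python
from collections import deque
from typing import Deque, List


class ReflectionMemory:
    """Keeps only the `capacity` most recent reflections (hypothetical helper)."""

    def __init__(self, capacity: int = 3) -> None:
        # deque with maxlen evicts the oldest entry automatically when full
        self._items: Deque[str] = deque(maxlen=capacity)

    def add(self, reflection: str) -> None:
        self._items.append(reflection.strip())

    def as_list(self) -> List[str]:
        return list(self._items)


memory = ReflectionMemory(capacity=2)
for r in ["r1", "r2", "r3"]:
    memory.add(r)
print(memory.as_list())  # ['r2', 'r3']
```

A sliding window is the simplest policy. Retrieval over an embedding store would replace `as_list` with a similarity search.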

Source: https://www.promptingguide.ai/techniques/reflexion

- the post outlines how to build 3 reflection techniques using LangGraph:
    - simple reflection
    - Reflexion
    - Language Agent Tree Search (LATS)
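Simple reflection is the basic generate, critique, revise cycle. Unlike Reflexion, it has no external evaluator or grounding. The following is a plain-Python sketch, not LangGraph's API. Assumptions: `generate` and `critique` stand in for two LLM prompts (a generator and a critic), the loop runs a fixed number of rounds, and `simple_reflection`, `gen` and `crit` are hypothetical names.

```python
from typing import Callable, List, Tuple


def simple_reflection(
    generate: Callable[[str, List[Tuple[str, str]]], str],  # (task, history) -> draft
    critique: Callable[[str, str], str],                     # (task, draft) -> critique text
    task: str,
    rounds: int = 2,
) -> Tuple[str, List[Tuple[str, str]]]:
    history: List[Tuple[str, str]] = []  # (draft, critique) pairs fed back to the generator
    draft = generate(task, history)
    for _ in range(rounds):
        feedback = critique(task, draft)
        history.append((draft, feedback))
        draft = generate(task, history)  # revise using all prior critiques
    return draft, history


# Toy stand-ins for the two prompts (hypothetical).
def gen(task: str, history: List[Tuple[str, str]]) -> str:
    return f"draft v{len(history) + 1}"


def crit(task: str, draft: str) -> str:
    return f"improve {draft}"


final, hist = simple_reflection(gen, crit, "write a haiku")
print(final)  # draft v3
```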

- thinking types: **System 1** and **System 2**
    - System 1: reactive, instinctive
    - System 2: methodical, reflective
    - what we want: System 2-like behavior
    - how: when applied correctly, reflection can help LLM systems break out of purely System 1 **thinking** patterns and move closer to something exhibiting System 2-like behavior.
    - ![System 1 vs. System 2](https://neurofied.com/wp-content/uploads/2022/05/System-12-White-1.png)


Sources: 
- https://blog.langchain.dev/reflection-agents/
- https://github.com/langchain-ai/langgraph/blob/main/examples/reflexion/reflexion.ipynb

### implement

Thoughts:
1. trying to learn from https://github.com/noahshinn/reflexion
2. actor, evaluator, self-reflection
3. write prompts for roles: actor, evaluator, self-reflection
4. write output_parsers for above
5. execute
    - run agent (actor)
    - evaluate final_answer from actor using evaluator
    - self-reflect on the actor's trajectory plus the evaluator's score to produce a reflection
    - rerun the agent until the evaluator marks the answer correct or n trials (iterations of the while loop) are exhausted
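For step 4, the evaluator's reply has to be turned into a 0/1 reward. A plain substring test for `"1"` would also fire on outputs such as `"10"` or `"0.1"`, so a stricter parser is safer. The following sketch assumes the evaluator is prompted to end with a binary score. `parse_binary_score` is a hypothetical helper, not a LangChain output parser class.

```python
import re

# A standalone 0 or 1: not part of a longer word/number ("10", "a1") or a decimal ("0.5").
_BINARY_SCORE = re.compile(r"(?<![\w.])[01](?!\w|\.\d)")


def parse_binary_score(text: str) -> int:
    """Return the last standalone 0/1 in the evaluator's output; raise if there is none."""
    matches = _BINARY_SCORE.findall(text)
    if not matches:
        raise ValueError(f"no 0/1 score found in evaluator output: {text!r}")
    return int(matches[-1])


print(parse_binary_score("Thought: the answer matches.\nFinal Answer: 1."))  # 1
```

Raising on a missing score, rather than defaulting to 0, makes a malformed evaluator reply visible instead of silently triggering another trial.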

reference: https://github.com/aasthavar/reflexion/blob/main/hotpotqa_runs/prompts.py
1. cot, cot-reflect
2. react, react-reflect
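The `*-reflect` variants differ from the plain ones by injecting past reflections into the actor prompt. The cells below import a `format_reflections` helper from `utils`, but its body is not shown. The following is a guess at what such a helper might do. `render_reflections` and the header wording are hypothetical and may not match the referenced prompts.py.

```python
from typing import List

# Hypothetical header text; the actual *-reflect prompts may word this differently.
REFLECTION_HEADER = (
    "You have attempted to answer the following question before and failed. "
    "Use the reflections below to avoid repeating the same mistakes.\n"
)


def render_reflections(reflections: List[str], header: str = REFLECTION_HEADER) -> str:
    """Render past reflections as a bulleted block to prepend to the actor prompt."""
    if not reflections:
        return ""  # first trial: no reflection section at all
    return header + "Reflections:\n- " + "\n- ".join(r.strip() for r in reflections)


print(render_reflections(["Search for the person first.", " Then check their degree. "]))
```

Returning an empty string on the first trial lets the same prompt template serve both the plain and `-reflect` variants.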

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from utils.reflextion_agent import *

import logging
logging.basicConfig(
    level=logging.INFO,
    format="%(name)s | L:%(lineno)d | %(message)s",
    force=True
)
logger = logging.getLogger(__name__)
logger.info("Invoking main_agent")

__main__ | L:10 | Invoking main_agent


In [3]:
#### test agent
# main_agent = get_main_agent()
# agent_result = main_agent.invoke({
#     "question": "Which character does this protagonist, who secretly loves and marries a member of the rival house, of William Shakespeare's tragedy that has a fictional character Benvolio slay?",
#     "reflections": ""
# })

#### test eval
# evaluator_agent = get_evaluator_agent()
# eval_result = evaluator_agent.invoke({
#     # "answer": "Benvolio", # 0
#     # "answer": "Tybalt", # 1
#     "answer": "Juliet", # 0 # this is a little buggy
#     "question": "Which character does this protagonist, who secretly loves and marries a member of the rival house, of William Shakespeare's tragedy that has a fictional character Benvolio slay?"
# })

#### test self-reflect
# reflect_chain = get_reflexion_chain()
# main_agent_scratchpad = get_main_agent_scratchpad(
#     intermediate_steps=agent_result["intermediate_steps"], 
#     final_answer=agent_result["output"]
# )
# print(main_agent_scratchpad)
# reflect_result = reflect_chain.invoke({
#     "question": "Which character does this protagonist, who secretly loves and marries a member of the rival house, of William Shakespeare's tragedy that has a fictional character Benvolio slay?",
#     "scratchpad": main_agent_scratchpad,
# })
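`get_main_agent_scratchpad` also comes from `utils` and isn't shown. One plausible shape assumes `intermediate_steps` is a list of `(action, observation)` pairs, which is what LangChain's `AgentExecutor` returns when `return_intermediate_steps=True`. `Step` is a hypothetical stand-in for the action record so the sketch runs without LangChain. `format_scratchpad` is likewise a hypothetical name.

```python
from typing import List, NamedTuple, Tuple


class Step(NamedTuple):
    """Hypothetical stand-in for an agent action record (tool, tool_input, log)."""
    tool: str
    tool_input: str
    log: str  # raw "Thought/Action/Action Input" text the model emitted for this step


def format_scratchpad(intermediate_steps: List[Tuple[Step, str]], final_answer: str) -> str:
    """Flatten (action, observation) pairs into a ReAct-style transcript for the reflector."""
    lines: List[str] = []
    for action, observation in intermediate_steps:
        lines.append(action.log.strip())
        lines.append(f"Observation: {observation.strip()}")
    lines.append(f"Final Answer: {final_answer}")
    return "\n".join(lines)


steps = [(
    Step(
        tool="search-wikipedia",
        tool_input="Rastriya Janashakti Party",
        log="Thought: find the department head.\nAction: search-wikipedia\n"
            "Action Input: Rastriya Janashakti Party",
    ),
    "Page: Hari Bahadur Basnet ...",
)]
print(format_scratchpad(steps, "Engineering"))
```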

In [4]:
# question = "Which character does this protagonist, who secretly loves and marries a member of the rival house, of William Shakespeare's tragedy that has a fictional character Benvolio slay?"
# question = """The creator of "Wallace and Gromit" also created what animation comedy that matched animated zoo animals with a soundtrack of people talking about their homes?"""
question = "The head of the Foreign Relations Department of the Rastriya Janashakti Party holds a degree that can be abbreviated MS, M.S., or ScM, in what field?"
# question = """What is represented on the cover art of the studio album that includes the song "Speed of Sound"?"""


reflections = []
evaluate_final_answer = 0
main_agent_intermediate_steps = ""
num_of_steps = 1


while (evaluate_final_answer!=1) and (num_of_steps<=3):
    main_agent = get_main_agent()
    logger.info(f"step: {num_of_steps}. Invoking main_agent")
    main_agent_result = main_agent.invoke({
        "reflections": format_reflections(reflections),
        "question": question,
    })
    
    evaluator_agent = get_evaluator_agent()
    main_agent_final_answer = main_agent_result["output"]
    logger.info(f"step: {num_of_steps}. Invoking evaluator_agent")
    evaluator_result = evaluator_agent.invoke({
        "answer": main_agent_final_answer,
        "question": question,
    })
    
    logger.info(f"evaluator_result_output: {evaluator_result['output']}")
    logger.info(f"main_agent_final_answer: {main_agent_final_answer}")
    
    # compare the stripped output exactly, so replies like "10" are not mistaken for a pass
    evaluate_final_answer = 1 if evaluator_result["output"].strip() == "1" else 0
    if evaluate_final_answer == 1:
        break

    reflect_chain = get_reflexion_chain()
    main_agent_scratchpad = get_main_agent_scratchpad(
        intermediate_steps=main_agent_result["intermediate_steps"], 
        final_answer=main_agent_result["output"]
    )
    logger.info(f"step: {num_of_steps}. Invoking reflect_chain")
    reflect_chain_results = reflect_chain.invoke({
        "question": question,
        "scratchpad": main_agent_scratchpad,
    })
    
    reflections.append(reflect_chain_results)
    logger.info(f"reflections: {reflections}")
    
    num_of_steps += 1

logger.info(f"after while loop exit, main_agent_final_answer: {main_agent_final_answer}")

__main__ | L:15 | step: 1. Invoking main_agent

> Entering new AgentExecutor chain...
Question: The head of the Foreign Relations Department of the Rastriya Janashakti Party holds a degree that can be abbreviated MS, M.S., or ScM, in what field?
Thought: To answer this question, I will need to find information about the head of the Foreign Relations Department of the Rastriya Janashakti Party and their educational background.
Action: search-wikipedia
Action Input: "Rastriya Janashakti Party" "Foreign Relations Department"
Page: Hari Bahadur Basnet
Summary: Hari Bahadur Basnet is a Nepalese politician. He is the head of the Foreign Relations Department of the Rastriya Janashakti Party. Basnet holds a M.Sc. in Engineering.Basnet entered politics in 1970. Three times he was elected to the Rastriya Panchayat. He joined the Rastriya Prajatantra Party when it was formed, and became a Central Committee member of the party. He also headed the International Relations department of RPP.In 2003, he was ap

__main__ | L:23 | step: 1. Invoking evaluator_agent

Based on the information from Wikipedia, the head of the Foreign Relations Department of the Rastriya Janashakti Party, Hari Bahadur Basnet, holds a M.Sc. degree in Engineering.

Final Answer: Hari Bahadur Basnet, the head of the Foreign Relations Department of the Rastriya Janashakti Party, holds a Master of Science (M.Sc.) degree in Engineering.

> Finished chain.

> Entering new AgentExecutor chain...
Thought: I need to verify the reference answer.
Action: wikipedia
Action Input: Hari Bahadur Basnet
Page: Hari Bahadur Basnet
Summary: Hari Bahadur Basnet is a Nepalese politician. He is the head of the Foreign Relations Department of the Rastriya Janashakti Party. Basnet holds a M.Sc. in Engineering.Basnet entered politics in 1970. Three times he was elected to the Rastriya Panchayat. He joined the Rastriya Prajatantra Party when it was formed, and became a Central Committee member of the party. He also headed the Internat

__main__ | L:29 | evaluator_result_output: 1
__main__ | L:30 | main_agent_final_answer: Hari Bahadur Basnet, the head of the Foreign Relations Department of the Rastriya Janashakti Party, holds a Master of Science (M.Sc.) degree in Engineering.
__main__ | L:52 | after while loop exit, main_agent_final_answer: Hari Bahadur Basnet, the head of the Foreign Relations Department of the Rastriya Janashakti Party, holds a Master of Science (M.Sc.) degree in Engineering.

Final Answer: 1

> Finished chain.


In [5]:
reflections

[]

### todo
- try examples that need at least 6-7 trials, e.g. the GSM8K dataset: https://paperswithcode.com/dataset/gsm8k
- write reflections in ReAct style instead of a plain paragraph
- improve evaluation: heuristics or a different LLM
- expand the set of tools
- find a way to evaluate the whole pipeline
- compare CoT and ReAct
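For GSM8K, a rule-based evaluator is straightforward because each reference solution ends with a line `#### <answer>`. The following is a minimal sketch that compares that number with the last number in the model's answer. Assumptions: the model states its final result as the last number it writes, and `gsm8k_reward`, `gsm8k_gold` and `last_number` are hypothetical names.

```python
import re
from typing import Optional


def gsm8k_gold(solution: str) -> str:
    """GSM8K reference solutions end with '#### <answer>'; return the text after the marker."""
    return solution.split("####")[-1].strip().replace(",", "")


def last_number(text: str) -> Optional[str]:
    """Return the last number in a model answer, with thousands separators removed."""
    nums = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    return nums[-1].replace(",", "") if nums else None


def gsm8k_reward(model_answer: str, reference_solution: str) -> int:
    """Reward 1 if the final numbers match numerically, else 0."""
    pred = last_number(model_answer)
    if pred is None:
        return 0  # no number at all counts as a failed trial
    return int(float(pred) == float(gsm8k_gold(reference_solution)))


print(gsm8k_reward("She makes $18 every day.", "9 * 2 = 18 dollars.\n#### 18"))  # 1
```

A numeric reward like this gives the reflection loop an unambiguous success signal, which makes it easier to observe runs that need several trials.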