# Using memory

I have tested the naive case of GPT-3 reading Zork output and directly sending a command. So far I have found two failure modes:
- with no context window, it goes in circles trying the same three things
- with the full command history, it gets caught up in the first problem it can't solve (the boarded door) and tries to solve "how do I say this" instead of "what do I do instead".

Now I will try to further its cognition through the following mechanisms.

### Memory
- short-buffer memory
- summarization
- buffer with summary
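
To make the first mechanism concrete, here is a minimal plain-Python sketch of a short-buffer memory that keeps only the last `k` exchanges. The `WindowBuffer` class and its method names are hypothetical, made up for illustration; the actual experiments use LangChain's memory classes instead.

```python
from collections import deque


class WindowBuffer:
    """Hypothetical short-buffer memory: remembers only the last k exchanges."""

    def __init__(self, k=3):
        # A deque with maxlen silently drops the oldest exchange when full
        self.turns = deque(maxlen=k)

    def save(self, game_text, player_command):
        self.turns.append((game_text, player_command))

    def render(self):
        # Format the remembered turns as Game/Player lines for a prompt
        lines = []
        for game_text, player_command in self.turns:
            lines.append(f"Game: {game_text}")
            lines.append(f"Player: {player_command}")
        return "\n".join(lines)


buffer = WindowBuffer(k=2)
buffer.save("West of House", "open mailbox")
buffer.save("Opening the small mailbox reveals a leaflet.", "take leaflet")
buffer.save("Taken.", "read leaflet")
print(buffer.render())  # only the last two exchanges remain
```

The trade-off is visible right away: anything older than `k` turns is gone, which is exactly what the summarization variants try to compensate for.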

Note: ideally I am using the model 'text-davinci-003' here, but sometimes it's easier to get the 002 model because the API for the better model is currently overloaded. Testing with 'text-curie-001' is also worth trying, as it's much cheaper and might be a better demonstration of the power of prompt design.

In [26]:
# Setup environment
import textworld
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from dotenv import load_dotenv
from time import sleep

from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferMemory

load_dotenv()
llm = OpenAI(model_name="text-davinci-003", temperature=0.0, max_tokens=50, stop=["\n",">","Game:"])


                    stop was transfered to model_kwargs.
                    Please confirm that stop is what you intended.


In [27]:

# Let the environment know what information we want as part of the game state.
infos = textworld.EnvInfos(
    feedback=True,    # Response from the game after typing a text command.
    description=True, # Text describing the room the player is currently in.
    inventory=True,    # Text describing the player's inventory.
    max_score=True,   # Maximum score obtainable in the game.
    score=True,       # Score obtained so far.
)
env = textworld.start('./zork1.z5', infos=infos)
game_state = env.reset()


In [28]:


def go(env, func, max_steps=10, interactive=False, secs=1):
    game_state = env.reset()
    try:
        done = False
        i = 0
        while not done:
            i += 1
            print("#"*60, i)
            scene = game_state.feedback
            if interactive:
                print(scene)
                command = input(">")
            else:
                command = func(scene=scene)
            print(">",command)
            game_state, reward, done = env.step(command)
            sleep(secs)
            if i >= max_steps:
                break
        env.render()  # Final message.
    except KeyboardInterrupt:
        pass
    except Exception as e:
        print(e)

    print("Played {} steps, scoring {} points.".format(game_state.moves, game_state.score))
    understood_commands = game_state.moves / i
    print("Percentage of commands understood: {:.2f}%".format(100 * understood_commands))
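
The `go` loop passes the model's raw completion straight to `env.step`. One way to guard against blank or oddly punctuated commands is a small cleaning step. The sketch below is an assumption on my part, not something used in the runs in this notebook: `clean_command` and its `look` fallback are hypothetical names and choices.

```python
def clean_command(raw, fallback="look"):
    """Hypothetical helper: tidy a raw completion before sending it to the game."""
    stripped = raw.strip()
    # Keep only the first line, in case the model keeps writing
    first_line = stripped.splitlines()[0] if stripped else ""
    # Drop a leading prompt symbol and trailing sentence punctuation
    command = first_line.lstrip("> ").rstrip(".!?").strip()
    # An empty command wastes a turn, so substitute a harmless default
    return command.lower() if command else fallback


print(clean_command(" Look around."))
print(clean_command("   "))
```

If it proves useful, it could be applied inside `go` as `command = clean_command(func(scene=scene))`.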

## Memory

I think we'll end up with the summary+buffer approach, but I want to try them out in order of complexity first. I'm going to let each one run for a while, 10 steps or so, and see how it does.

In [29]:

template = """You are playing a text adventure game. Type commands to play the game.
{chat_history}
Game: {human_input}
Player: """

mem_prompt = PromptTemplate(
    input_variables=["chat_history", "human_input"], 
    template=template
)


In [30]:
def get_mem_npc(memory):
        
    mem_chain = LLMChain(
        llm=llm,
        prompt=mem_prompt, 
        verbose=True, 
        memory=memory,
    )

    def mem_npc(scene):
        return mem_chain.predict(human_input=scene)
    
    return mem_npc

### Simple buffer


In [31]:
from langchain.chains.conversation.memory import ConversationBufferWindowMemory

In [32]:
memory = ConversationBufferWindowMemory(
    memory_key="chat_history", 
    human_prefix="Game", 
    ai_prefix="Player",
    k=3
    )

npc = get_mem_npc(memory)    
zork = textworld.start('./zork1.z5', infos=infos)
go(zork, npc)



############################################################ 1


> Entering new LLMChain chain...
Prompt after formatting:
You are playing a text adventure game. Type commands to play the game.

Game: Copyright (c) 1981, 1982, 1983 Infocom, Inc. All rights reserved.
ZORK is a registered trademark of Infocom, Inc.
Revision 88 / Serial number 840726

West of House
You are standing in an open field west of a white house, with a boarded front door.
There is a small mailbox here.


Player:

> Finished chain.
>  Look around.
############################################################ 2


> Entering new LLMChain chain...
Prompt after formatting:
You are playing a text adventure game. Type commands to play the game.
Game: Copyright (c) 1981, 1982, 1983 Infocom, Inc. All rights reserved.
ZORK is a registered trademark of Infocom, Inc.
Revision 88 / Serial number 840726

West of House
You are standing in an open field west of a white house,

Yeah, it's stuck on the short-term command issue AND it's going in circles. Let's see what happens if we just give it some long-term memory.

### Summarization


In [33]:
from langchain.chains.conversation.memory import ConversationSummaryMemory

In [34]:
memory = ConversationSummaryMemory(
    memory_key="chat_history", 
    human_prefix="Game", 
    ai_prefix="Player",
    llm=llm,
    )
    
npc = get_mem_npc(memory)    
zork = textworld.start('./zork1.z5', infos=infos)
go(zork, npc)


############################################################ 1


> Entering new LLMChain chain...
Prompt after formatting:
You are playing a text adventure game. Type commands to play the game.

Game: Copyright (c) 1981, 1982, 1983 Infocom, Inc. All rights reserved.
ZORK is a registered trademark of Infocom, Inc.
Revision 88 / Serial number 840726

West of House
You are standing in an open field west of a white house, with a boarded front door.
There is a small mailbox here.


Player:

> Finished chain.
>  Look around.
############################################################ 2


> Entering new LLMChain chain...
Prompt after formatting:
You are playing a text adventure game. Type commands to play the game.

Game: West of House
You are standing in an open field west of a white house, with a boarded front door.
There is a small mailbox here.


Player:

> Finished chain.
>  Open mailbox.
###############################

This is okay, but it's getting hung up on syntax and sending empty commands. I think this is because it can't see the dialogue history. Let's combine them now.
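
The combined idea can be sketched in plain Python: keep recent turns verbatim up to a size budget, and fold anything evicted into a running summary. This is a rough illustration, not LangChain's implementation. `SummaryBuffer`, `fake_summarize`, and the word-count budget are all hypothetical stand-ins (a real version would count tokens and call the LLM to summarize).

```python
class SummaryBuffer:
    """Hypothetical summary+buffer memory with a pluggable summarizer."""

    def __init__(self, summarize, max_words=40):
        self.summarize = summarize  # callable(old_summary, dropped_turns) -> str
        self.max_words = max_words  # crude stand-in for a token limit
        self.summary = ""
        self.turns = []

    def _size(self):
        return sum(len(turn.split()) for turn in self.turns)

    def save(self, game_text, player_command):
        self.turns.append(f"Game: {game_text}\nPlayer: {player_command}")
        dropped = []
        # Evict the oldest turns until the verbatim buffer fits the budget
        while self.turns and self._size() > self.max_words:
            dropped.append(self.turns.pop(0))
        if dropped:
            self.summary = self.summarize(self.summary, dropped)

    def render(self):
        parts = [self.summary] if self.summary else []
        return "\n".join(parts + self.turns)


def fake_summarize(old_summary, dropped_turns):
    # Placeholder for an LLM call: just counts what was folded away
    previous = int(old_summary.split()[1]) if old_summary else 0
    return f"Summary: {previous + len(dropped_turns)} earlier turns omitted."


sketch_memory = SummaryBuffer(fake_summarize, max_words=20)
sketch_memory.save("West of House. There is a small mailbox here.", "open mailbox")
sketch_memory.save("Opening the small mailbox reveals a leaflet.", "take leaflet")
print(sketch_memory.render())
```

With a budget this small, the first turn is already replaced by the summary line after the second `save`, while the latest exchange stays verbatim.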

### Summary+buffer

In [35]:
from langchain.chains.conversation.memory import ConversationSummaryBufferMemory

In [36]:
memory = ConversationSummaryBufferMemory(
    memory_key="chat_history", 
    human_prefix="Game", 
    ai_prefix="Player",
    llm=llm,
    max_token_limit=69,
    )
    
npc = get_mem_npc(memory)    
zork = textworld.start('./zork1.z5', infos=infos)
go(zork, npc)

############################################################ 1


> Entering new LLMChain chain...
Prompt after formatting:
You are playing a text adventure game. Type commands to play the game.

Game: Copyright (c) 1981, 1982, 1983 Infocom, Inc. All rights reserved.
ZORK is a registered trademark of Infocom, Inc.
Revision 88 / Serial number 840726

West of House
You are standing in an open field west of a white house, with a boarded front door.
There is a small mailbox here.


Player:

> Finished chain.
>  Look around.
############################################################ 2


> Entering new LLMChain chain...
Prompt after formatting:
You are playing a text adventure game. Type commands to play the game.

Game: West of House
You are standing in an open field west of a white house, with a boarded front door.
There is a small mailbox here.


Player:

> Finished chain.
>  Open mailbox.
###############################

Much better! It's getting stuck on the boarded door problem, but it's not going in circles. Let's see if we can get it to solve that problem.

The best next step from here is to give the model a scratchpad on which to think.

### Scratchpad

Giving the model room to reason is the first step. Let's give it some space to think about its actions.

We do this by creating a "thoughts" prompt which will be chained into the agent prompt. This will allow the model to think about its goals, actions and observations.

The thoughts prompt will take the scene from the game and output the current state of the agent. This will be a list of the agent's goals, actions and observations. Then the agent prompt will take the thoughts and the history of the game and the current scene and output the next action.
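
Stripped of any framework, the two-stage flow described above looks roughly like this. It is a sketch under assumptions: `make_scratch_agent` and `stub_llm` are hypothetical names, and `llm` is taken to be any callable that maps a prompt string to a completion string.

```python
def make_scratch_agent(llm):
    """Hypothetical two-stage agent; `llm` maps a prompt string to text."""

    def act(history, scene):
        # Stage 1: ask for goals, seeding the completion with "My goal is"
        notes_prompt = (
            "You are playing a text adventure game. Write your goals at each step.\n"
            f"{history}\nGame: {scene}\nNotes: My goal is"
        )
        notes = llm(notes_prompt).strip()

        # Stage 2: feed the notes back in and ask for a single command
        command_prompt = (
            "You are playing a text adventure game. Type commands to play the game. "
            "Use your goals to guide you.\n"
            f"{history}\nGame: {scene}\nGoals: {notes}\nPlayer: "
        )
        command = llm(command_prompt).strip()
        return notes, command

    return act


def stub_llm(prompt):
    # Stand-in for a real model so the sketch runs offline
    if prompt.endswith("My goal is"):
        return " to find a way into the house."
    return " open mailbox"


agent = make_scratch_agent(stub_llm)
notes, command = agent("", "West of House. There is a small mailbox here.")
print(notes)
print(command)
```

The point of the split is that the second prompt sees the goals as plain text, so the command is conditioned on an explicit plan rather than on the scene alone.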


In [37]:

scratch_template = """You are playing a text adventure game. Write your goals at each step.
```{chat_history}
Game: {human_input}```
Notes: My goal is"""

scratch_prompt = PromptTemplate(
    input_variables=["chat_history", "human_input"], 
    template=scratch_template
)


command_template = """You are playing a text adventure game. Type commands to play the game. Use your goals to guide you.
{chat_history}
Game: {human_input}
Goals: {notes}
Player: """

command_prompt = PromptTemplate(
    input_variables=["chat_history", "human_input", "notes"],
    template=command_template
)


In [38]:
from langchain.chains import SequentialChain

def get_scratch_npc(memory):
        
    notes_chain = LLMChain(
        llm=llm,
        prompt=scratch_prompt, 
        output_key="notes",
        # verbose=True, 
    )

    command_chain = LLMChain(
        llm=llm,
        prompt=command_prompt,
        output_key="command",
        verbose=True,
    )

    npc_chain = SequentialChain(
        chains=[notes_chain, command_chain],
        input_variables=["chat_history", "human_input"],
        output_variables=["command"],
        memory=memory,
        verbose=True,
    )

    def scratch_npc(scene):
        return npc_chain.run(human_input=scene)
    
    return scratch_npc

In [40]:
memory = ConversationSummaryBufferMemory(
    memory_key="chat_history", 
    human_prefix="Game", 
    ai_prefix="Player",
    llm=llm,
    max_token_limit=69,
    )
    
npc = get_scratch_npc(memory)    
zork = textworld.start('./zork1.z5', infos=infos)
go(zork, npc)

############################################################ 1


> Entering new SequentialChain chain...
Chain 0:
{'notes': ' to explore the area and find a way to enter the house.'}



> Entering new LLMChain chain...
Prompt after formatting:
You are playing a text adventure game. Type commands to play the game. Use your goals to guide you.

Game: Copyright (c) 1981, 1982, 1983 Infocom, Inc. All rights reserved.
ZORK is a registered trademark of Infocom, Inc.
Revision 88 / Serial number 840726

West of House
You are standing in an open field west of a white house, with a boarded front door.
There is a small mailbox here.


Goals:  to explore the area and find a way to enter the house.
Player:

> Finished chain.
Chain 1:
{'command': ' Look around.'}


> Finished chain.
>  Look around.
############################################################ 2


> Entering new SequentialChain chain...
Chain 0:
{'notes'

## Agent