# Playing text-based games with TextWorld and LangChain

### Loading the library

In [1]:
import textworld

TextWorld's documentation can be found at
https://textworld.readthedocs.io/en/stable/.

### Starting a game
Before launching a game, we can tell TextWorld what information we want as part of the game state.

In [2]:
# Let the environment know what information we want as part of the game state.
infos = textworld.EnvInfos(
    feedback=True,    # Response from the game after typing a text command.
    description=True, # Text describing the room the player is currently in.
    inventory=True,   # Text describing the player's inventory.
    max_score=True,   # Maximum score obtainable in the game.
    score=True,       # Score obtained so far.
)

*The full list of information that can be requested is documented at*
https://textworld.readthedocs.io/en/stable/textworld.html#textworld.core.EnvInfos



In [3]:
# We are now ready to start the game.
env = textworld.start('./zork1.z5', infos=infos)

### Getting the initial state

In [4]:
game_state = env.reset()

The variable `game_state` is an instance of `textworld.core.GameState`. It gives us access to various pieces of information about the current state of the game. Here are the most useful properties.

In [5]:
# Response from the parser after entering a text command or resetting a game.
print(game_state.feedback)

Copyright (c) 1981, 1982, 1983 Infocom, Inc. All rights reserved.
ZORK is a registered trademark of Infocom, Inc.
Revision 88 / Serial number 840726

West of House
You are standing in an open field west of a white house, with a boarded front door.
There is a small mailbox here.




In [6]:
# Text describing the room the player is currently in.
# It corresponds to the parser's feedback for the "look" command.
print(game_state.description)

West of House
You are standing in an open field west of a white house, with a boarded front door.
There is a small mailbox here.




In [7]:
# Text describing the player's inventory.
print(game_state.inventory)

You are empty-handed.




In [8]:
# Score received up until now.
print(game_state.score)

0


In [9]:
game_state.score

0

### Sending text commands to the game

In [10]:
game_state, score, done = env.step("open mailbox")
print(game_state.feedback)  # Result of the command.

Opening the small mailbox reveals a leaflet.




As an alternative to `print(game_state.feedback)`, it is more convenient to call:

In [11]:
env.render()

Opening the small mailbox reveals a leaflet.



## Simple LLM play loop

Here we simply feed the output of the TextWorld environment to a language model. This is the naive case: each command is generated from the current feedback alone, independently of the previous turns.

In [12]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from dotenv import load_dotenv
from time import sleep

load_dotenv()
llm = OpenAI(model_name="text-davinci-002", temperature=0.0, max_tokens=50, stop=["\n",">"])


                    stop was transfered to model_kwargs.
                    Please confirm that stop is what you intended.


In [13]:

prompt = PromptTemplate(
    input_variables=["scene"],
    template="""You are playing a text adventure game. Type commands to play the game.
        {scene}
        >"""
)
chain = LLMChain(llm=llm, prompt=prompt)
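
To see what the model actually receives, it helps to render the template by hand. Here is a minimal sketch that uses plain `str.format` as a stand-in for `PromptTemplate` (my assumption is that the `{scene}` slot is filled the same way). `render_prompt` is a hypothetical helper, not part of LangChain. It also dedents the template, because the indented lines in the cell above keep their leading spaces inside the triple-quoted string.

```python
import textwrap

# Same wording as the template above, dedented so that no indentation
# from the notebook cell leaks into the prompt.
TEMPLATE = textwrap.dedent("""\
    You are playing a text adventure game. Type commands to play the game.
    {scene}
    >""")


def render_prompt(scene: str) -> str:
    """Fill the {scene} slot with str.format (stand-in for PromptTemplate)."""
    return TEMPLATE.format(scene=scene.strip())


if __name__ == "__main__":
    print(render_prompt("West of House\nThere is a small mailbox here.\n\n"))
```

Stripping the scene removes the trailing blank lines the game appends, so the `>` prompt sits directly under the last line of game text.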


In [14]:
def simple_play(env, chain):
    game_state = env.reset()
    try:
        done = False
        while not done:
            scene = game_state.feedback
            print(scene)
            command = chain.run(scene=scene)
            # command = input(">")
            print(">",command)
            sleep(2)
            game_state, reward, done = env.step(command)    
        env.render()  # Final message.
    except KeyboardInterrupt:
        pass  # Quit the game.
    except Exception as e:
        print(e)

    print("Played {} steps, scoring {} points.".format(game_state.moves, game_state.score))


In [15]:
simple_play(env, chain)

Copyright (c) 1981, 1982, 1983 Infocom, Inc. All rights reserved.
ZORK is a registered trademark of Infocom, Inc.
Revision 88 / Serial number 840726

West of House
You are standing in an open field west of a white house, with a boarded front door.
There is a small mailbox here.


>  open mailbox
Opening the small mailbox reveals a leaflet.


>  take leaflet
Taken.


> look
West of House
You are standing in an open field west of a white house, with a boarded front door.
There is a small mailbox here.


>  open mailbox
It is already open.


> open door
The door cannot be opened.


> look around
West of House
You are standing in an open field west of a white house, with a boarded front door.
There is a small mailbox here.


>  open mailbox
It is already open.


> open door
The door cannot be opened.


> look around
West of House
You are standing in an open field west of a white house, with a boarded front door.
There is a small mailbox here.


>  open mailbox
It is already open.


> open 
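
The transcript above settles into repeating the same few commands. A small check like the following could stop a run early once that happens. It is only a sketch: `find_cycle` and `max_period` are hypothetical names, and it only looks for an exact repetition of the most recent block of commands.

```python
from typing import List, Optional


def find_cycle(commands: List[str], max_period: int = 5) -> Optional[int]:
    """Return the shortest period p such that the last 2*p commands are the
    same block of p commands played twice, or None if there is no such p."""
    normalized = [c.strip().lower() for c in commands]
    for period in range(1, max_period + 1):
        if len(normalized) < 2 * period:
            break  # Not enough history to compare two blocks of this size.
        recent = normalized[-period:]
        before = normalized[-2 * period:-period]
        if recent == before:
            return period
    return None


if __name__ == "__main__":
    # Commands copied from the transcript above (the truncated last one is left out).
    transcript = [
        " open mailbox", " take leaflet", "look", " open mailbox", "open door",
        "look around", " open mailbox", "open door", "look around", " open mailbox",
    ]
    print(find_cycle(transcript))
```

On the commands from the transcript above it reports a period of 3: "open door", "look around", "open mailbox", over and over.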

### With short-term memory

With only the current scene as context, the model obviously just goes in circles.

Let's try just adding the last command above the feedback to give the language model some context.

In [16]:
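def format_scene(game_state):
    # Append the room description only when it adds something to the feedback.
    # Start from an empty string so the name is always bound.
    description = ""
    if game_state.feedback != game_state.description:
        description = "\n" + game_state.description
    return f""">{game_state.last_command}
```{game_state.feedback}{description}```"""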

In [17]:
def play(env, chain):
    game_state = env.reset()
    try:
        done = False
        while not done:
            if game_state.moves > 0:
                scene = format_scene(game_state)
            else:
                scene = game_state.feedback
            print(scene)
            command = chain.run(scene=scene)
            # command = input(">")
            print(">",command)
            sleep(2)
            game_state, reward, done = env.step(command)    
        env.render()  # Final message.
    except KeyboardInterrupt:
        pass  # Quit the game.
    except Exception as e:
        print(e)

    print("Played {} steps, scoring {} points.".format(game_state.moves, game_state.score))


In [19]:
play(env, chain)

Copyright (c) 1981, 1982, 1983 Infocom, Inc. All rights reserved.
ZORK is a registered trademark of Infocom, Inc.
Revision 88 / Serial number 840726

West of House
You are standing in an open field west of a white house, with a boarded front door.
There is a small mailbox here.


>  open mailbox
>open mailbox
```Opening the small mailbox reveals a leaflet.


West of House
You are standing in an open field west of a white house, with a boarded front door.
There is a small mailbox here.
The small mailbox contains:
  A leaflet

```
> read leaflet
>read leaflet
```(Taken)
"WELCOME TO ZORK!

ZORK is a game of adventure, danger, and low cunning. In it you will explore some of the most amazing territory ever seen by mortals. No computer should be without one!"



West of House
You are standing in an open field west of a white house, with a boarded front door.
There is a small mailbox here.

```
> open mailbox
>open mailbox
```It is already open.


West of House
You are standing in an open fie

Nope, that just primes it to repeat the same command every time.

We need to give it one more step of context: what was the last output? Can we get that from the game state?


In [20]:
game_state

{'last_command': 'open mailbox',
 'raw': 'Opening the small mailbox reveals a leaflet.\n\n',
 'done': False,
 'feedback': 'Opening the small mailbox reveals a leaflet.\n\n',
 'description': 'West of House\nYou are standing in an open field west of a white house, with a boarded front door.\nThere is a small mailbox here.\nThe small mailbox contains:\n  A leaflet\n\n',
 'inventory': 'You are empty-handed.\n\n',
 'score': 0,
 'max_score': 350,
 'won': False,
 'lost': False,
 'moves': 1,
 'location': Obj180: West House Parent82 Sibling15 Child4 Attributes [3, 6, 9, 20] Properties [31, 30, 29, 28, 27, 25, 24, 21, 17, 5]}

Wait, what if I just put that directly in there lol

In [21]:

prompt = PromptTemplate(
    input_variables=["scene"],
    template="""This is a JSON file listing every step of the walkthrough of a text adventure game.
        {scene},
       'last_command': '"""
)

json_chain = LLMChain(llm=llm, prompt=prompt)


In [22]:
def json_play(env, chain):
    game_state = env.reset()
    try:
        done = False
        while not done:
            scene = game_state
            print(scene)
            command = chain.run(scene=scene)
            # command = input(">")
            print(">",command)
            sleep(2)
            game_state, reward, done = env.step(command)
        env.render()  # Final message.
    except KeyboardInterrupt:
        pass
    except Exception as e:
        print(e)

    print("Played {} steps, scoring {} points.".format(game_state.moves, game_state.score))

json_play(env, json_chain)

{'raw': 'Copyright (c) 1981, 1982, 1983 Infocom, Inc. All rights reserved.\nZORK is a registered trademark of Infocom, Inc.\nRevision 88 / Serial number 840726\n\nWest of House\nYou are standing in an open field west of a white house, with a boarded front door.\nThere is a small mailbox here.\n\n', 'feedback': 'Copyright (c) 1981, 1982, 1983 Infocom, Inc. All rights reserved.\nZORK is a registered trademark of Infocom, Inc.\nRevision 88 / Serial number 840726\n\nWest of House\nYou are standing in an open field west of a white house, with a boarded front door.\nThere is a small mailbox here.\n\n', 'description': 'West of House\nYou are standing in an open field west of a white house, with a boarded front door.\nThere is a small mailbox here.\n\n', 'inventory': 'You are empty-handed.\n\n', 'score': 0, 'max_score': 350, 'won': False, 'lost': False, 'moves': 0, 'location': Obj180: West House Parent82 Sibling15 Child4 Attributes [3, 6, 9, 20] Properties [31, 30, 29, 28, 27, 25, 24, 21, 17, 
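
One thing worth noticing: the prompt says "JSON", but printing the state gives a Python dict repr (single quotes, `False`, an unquoted `location` object). Here is a sketch that keeps only the fields with JSON-friendly values and serializes them with `json.dumps`. It assumes the game state can be iterated like a dict (a plain dict stands in for it here), `state_to_json` is a hypothetical helper, and I have not tested whether this changes the model's behaviour.

```python
import json
from typing import Any, Mapping, Tuple

# Value types that json.dumps can serialize without a custom encoder.
JSON_TYPES = (str, int, float, bool, type(None))


def state_to_json(state: Mapping[str, Any], skip: Tuple[str, ...] = ("raw",)) -> str:
    """Serialize the JSON-friendly fields of a dict-like game state."""
    clean = {
        key: value
        for key, value in state.items()
        if key not in skip and isinstance(value, JSON_TYPES)
    }
    return json.dumps(clean, indent=1)


if __name__ == "__main__":
    fake_state = {
        "raw": "Opening the small mailbox reveals a leaflet.\n\n",
        "feedback": "Opening the small mailbox reveals a leaflet.\n\n",
        "score": 0,
        "won": False,
        "location": object(),  # Stand-in for the non-serializable location object.
    }
    print(state_to_json(fake_state))
```

Dropping `raw` avoids sending the same text twice, since it duplicates `feedback` in the states shown above.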

Nope! Okay, time to put LangChain to work and get some memory access going on.

### With memory

Since the Zork interpreter will be conversing with the LLM through a text chat interface, we can use LangChain's conversation memory to keep track of the conversation history (below, a `ConversationBufferMemory` attached to an `LLMChain`). We're essentially treating this as a chatbot problem, where the LLM is the chatbot and the Zork interpreter is the human.

In [23]:
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferMemory
llm = OpenAI(model_name="text-davinci-002", temperature=0.0, max_tokens=50, stop=["\n",">","Game:"])


                    stop was transfered to model_kwargs.
                    Please confirm that stop is what you intended.


In [24]:
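# Note: the backslash right after the opening quotes is not followed by a newline,
# so it ends up in the prompt verbatim, as the formatted prompts below show.
template = """\You are playing a text adventure game. Type commands to play the game.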
{chat_history}
Game: {human_input}
Player:"""

mem_prompt = PromptTemplate(
    input_variables=["chat_history", "human_input"], 
    template=template
)
memory = ConversationBufferMemory(memory_key="chat_history", human_prefix="Game", ai_prefix="Player")

In [25]:
mem_chain = LLMChain(
    llm=llm,
    prompt=mem_prompt, 
    verbose=True, 
    memory=memory,
)

In [26]:
mem_chain.predict(human_input="""West of House
You are standing in an open field west of a white house, with a boarded front door.
There is a small mailbox here.""")



> Entering new LLMChain chain...
Prompt after formatting:
\You are playing a text adventure game. Type commands to play the game.

Game: West of House
You are standing in an open field west of a white house, with a boarded front door.
There is a small mailbox here.
Player:

> Finished chain.


' Open mailbox'

In [27]:
mem_chain.predict(human_input='Opening the small mailbox reveals a leaflet.')



> Entering new LLMChain chain...
Prompt after formatting:
\You are playing a text adventure game. Type commands to play the game.

Game: West of House
You are standing in an open field west of a white house, with a boarded front door.
There is a small mailbox here.
Player:  Open mailbox
Game: Opening the small mailbox reveals a leaflet.
Player:

> Finished chain.


' Read leaflet'

Okay, cool. This is the most rudimentary form of memory, which will remember all the steps and feed the whole conversation back to the model as it goes. This will obviously be very expensive to run a lot of times, but let's try it once to see how far it can get.
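
To make the cost concrete, here is a toy version of what a buffer memory does: store every exchange and replay all of them in each prompt. This is a sketch of the idea, not LangChain's implementation, and `TranscriptBuffer` is a hypothetical name.

```python
from typing import List, Tuple


class TranscriptBuffer:
    """Keeps every (game text, player command) pair and replays them all."""

    def __init__(self, game_prefix: str = "Game", player_prefix: str = "Player") -> None:
        self.game_prefix = game_prefix
        self.player_prefix = player_prefix
        self.turns: List[Tuple[str, str]] = []

    def save(self, game_text: str, command: str) -> None:
        """Record one exchange after the command has been chosen."""
        self.turns.append((game_text, command))

    def history(self) -> str:
        """Render the whole conversation so far, oldest exchange first."""
        lines = []
        for game_text, command in self.turns:
            lines.append(f"{self.game_prefix}: {game_text}")
            lines.append(f"{self.player_prefix}: {command}")
        return "\n".join(lines)


if __name__ == "__main__":
    buffer = TranscriptBuffer()
    buffer.save("There is a small mailbox here.", "open mailbox")
    buffer.save("Opening the small mailbox reveals a leaflet.", "read leaflet")
    print(buffer.history())
    print(len(buffer.history()), "characters of history after 2 turns")
```

The history only ever grows, so every new turn pays again for all the previous ones.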

In [81]:
def go(env, func, max_steps=10, interactive=False, secs=6):
    game_state = env.reset()
    try:
        done = False
        i = 0  # Banner counter; it is never incremented, so every banner prints 0.
        while not done:
            print("#"*60, i)
            scene = game_state.feedback
            if interactive:
                command = input(">")
            else:
                command = func(scene=scene)
            print(">",command)
            sleep(secs)
            game_state, reward, done = env.step(command)
            if game_state.moves > max_steps:
                break
        env.render()  # Final message.
    except KeyboardInterrupt:
        pass
    except Exception as e:
        print(e)

    print("Played {} steps, scoring {} points.".format(game_state.moves, game_state.score))
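
Since `go` accepts any callable that takes a `scene` keyword, the loop can be tried without spending API calls by passing a policy that replays fixed commands. A minimal sketch follows. `ScriptedPolicy` and its `fallback` parameter are hypothetical names I made up for this.

```python
from typing import Iterable, Iterator, List


class ScriptedPolicy:
    """Replays a fixed list of commands, then falls back to a default one."""

    def __init__(self, commands: Iterable[str], fallback: str = "look") -> None:
        self._commands: Iterator[str] = iter(commands)
        self._fallback = fallback
        self.scenes_seen: List[str] = []  # Kept for inspection after a run.

    def __call__(self, scene: str) -> str:
        self.scenes_seen.append(scene)
        # next() with a default never raises once the script is exhausted.
        return next(self._commands, self._fallback)


if __name__ == "__main__":
    policy = ScriptedPolicy(["open mailbox", "read leaflet"])
    for scene in ["West of House", "Opening the small mailbox reveals a leaflet.", "..."]:
        print(">", policy(scene=scene))
```

With the environment from earlier, something like `go(env, ScriptedPolicy(["open mailbox", "read leaflet"]), max_steps=3, secs=0)` should exercise the loop end to end.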


In [82]:
template = """You are playing a text adventure game. Type commands to play the game.
{chat_history}
Game: {human_input}
Player: """

mem_prompt = PromptTemplate(
    input_variables=["chat_history", "human_input"], 
    template=template
)


In [84]:
llm = OpenAI(model_name="text-davinci-003", temperature=0.0, max_tokens=50, stop=["\n",">","Game:"])
memory = ConversationBufferMemory(memory_key="chat_history", human_prefix="Game", ai_prefix="Player")
mem_chain = LLMChain(
    llm=llm,
    prompt=mem_prompt, 
    verbose=True, 
    memory=memory,
)

def mem_npc(scene):
    return mem_chain.predict(human_input=scene)
    
zork = textworld.start('./zork1.z5', infos=infos)

go(zork, mem_npc, max_steps=5)

                    stop was transfered to model_kwargs.
                    Please confirm that stop is what you intended.


############################################################ 0


> Entering new LLMChain chain...
Prompt after formatting:
You are playing a text adventure game. Type commands to play the game.

Game: Copyright (c) 1981, 1982, 1983 Infocom, Inc. All rights reserved.
ZORK is a registered trademark of Infocom, Inc.
Revision 88 / Serial number 840726

West of House
You are standing in an open field west of a white house, with a boarded front door.
There is a small mailbox here.


Player:

> Finished chain.
>  Look around.
############################################################ 0


> Entering new LLMChain chain...
Prompt after formatting:
You are playing a text adventure game. Type commands to play the game.

Game: Copyright (c) 1981, 1982, 1983 Infocom, Inc. All rights reserved.
ZORK is a registered trademark of Infocom, Inc.
Revision 88 / Serial number 840726

West of House
You are standing in an open field west of a white house
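
The completions shown earlier come back with a leading space and, in one case, a trailing period (`' Open mailbox'`, `Look around.`). I have not tested whether the interpreter minds either, but normalizing the command before `env.step` is cheap. A sketch, with `clean_command` as a hypothetical helper:

```python
def clean_command(completion: str) -> str:
    """Turn a raw completion into a single-line, lower-case game command."""
    stripped = completion.strip()
    # Keep only the first line in case the model kept writing past the command.
    first_line = stripped.splitlines()[0] if stripped else ""
    # Drop trailing sentence punctuation, then any whitespace it exposed.
    return first_line.strip().rstrip(".!").strip().lower()


if __name__ == "__main__":
    for raw in [" Open mailbox", " Look around.", ""]:
        print(repr(clean_command(raw)))
```

An empty result is worth checking for before stepping the environment, since an empty completion would otherwise be sent as a blank command.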

### Conclusion

Okay, I'm going to wrap this up for now. So far my conclusion is that full-game memory could be enough to get an interesting bot, although at 0.0 temperature it shows an obvious failure mode of obsessiveness and stubbornness. Which makes sense!

The next step is to wrap all this functionality into a reusable chain, and then run it through a ModelLaboratory to find which models and temperatures are interesting to work with. It would probably also help to shorten the memory with summarization or document embeddings, so as not to use too many tokens. Then once I've got a suitable memory+llm agent, I can start adding interfaces and visualizations.
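
Before reaching for summarization or embeddings, the crudest way to cap token use is a sliding window that keeps only the last few exchanges. This is a different and simpler technique than the two named above, sketched here in plain Python. `WindowBuffer` and `k` are hypothetical names.

```python
from collections import deque
from typing import Deque, Tuple


class WindowBuffer:
    """Remembers only the k most recent (game text, player command) pairs."""

    def __init__(self, k: int = 3) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=k)

    def save(self, game_text: str, command: str) -> None:
        self.turns.append((game_text, command))

    def history(self) -> str:
        return "\n".join(
            f"Game: {game_text}\nPlayer: {command}"
            for game_text, command in self.turns
        )


if __name__ == "__main__":
    buffer = WindowBuffer(k=2)
    buffer.save("There is a small mailbox here.", "open mailbox")
    buffer.save("Opening the small mailbox reveals a leaflet.", "read leaflet")
    buffer.save("(Taken) WELCOME TO ZORK!", "go north")
    print(buffer.history())  # The first exchange has been dropped.
```

The trade-off is that anything older than `k` turns is forgotten entirely, which is exactly what summarization or embeddings would try to avoid.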