## Data for Learning to Speak and Act in a Fantasy Text Adventure Game
 

Facebook AI Research released a dataset for their paper [Learning to Speak and Act in a Fantasy Text Adventure Game](https://arxiv.org/abs/1903.03094).

Here's paper's abstract:

> We introduce a large-scale crowdsourced text adventure game as a research platform for studying grounded dialogue. In it, agents can perceive, emote, and act while conducting dialogue with other agents. Models and humans can both act as characters within the game. We describe the results of training state-of-the-art generative and retrieval models in this setting. We show that in addition to using past dialogue, these models are able to effectively use the state of the underlying world to condition their predictions. In particular, we show that grounding on the details of the local environment, including location descriptions, and the objects (and their affordances) and characters (and their previous actions) present within it allows better predictions of agent behavior and dialogue. We analyze the ingredients necessary for successful grounding in this setting, and how each of these factors relate to agents that can talk and act successfully.

Their data is called the LIGHT dataset (Learning in Interactive Games with Humans and Text).  It contains 663 locations, 3462 objects and 1755 characters.  I have divided this data into training/dev/test splits.


## Load the data

The LIGHT data was released as part of the Facebook's ParlAI system. I extracted the data into several JSON files:
* ```light_environment_train.json``` contains information about the locations, objects, and characters in the text-adventure games.  
* ```light_dialogue_data.json``` contains sample conversations between pairs of characters.   We'll use this later in the semester. 



In [1]:
!wget https://raw.githubusercontent.com/interactive-fiction-class/interactive-fiction-class-data/master/light_dialogue/light_environment_train.json

--2022-01-31 15:03:36--  https://raw.githubusercontent.com/interactive-fiction-class/interactive-fiction-class-data/master/light_dialogue/light_environment_train.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3541467 (3.4M) [text/plain]
Saving to: ‘light_environment_train.json’


2022-01-31 15:03:36 (49.0 MB/s) - ‘light_environment_train.json’ saved [3541467/3541467]



In [2]:
import sys
import os
import json

json_filename = 'light_environment_train.json'

f = open(json_filename)
light_environment = json.load(f)


# LIGHT Environment Data

This section of the Python Notebook will walk you through the LIGHT environment data to show you the different elements of the JSON file.  We will use different pieces of these to fine-tune GPT3 in order to generate new locations and objects for our own text adventure games.


### Categories

The locations in LIGHT environment are grouped by categories. 

```
categories =  light_environment['categories']

categories

 {'11': 'Graveyard',
 '12': 'Wasteland',
 '13': 'Abandoned',
 '14': 'Mountain',
 '15': 'Cave',
 '16': 'Tavern',
 '17': 'Jungle',
 '18': 'Trail',
 '19': 'Town',
 '2': 'Forest',
 '20': 'Dungeon',
 '21': 'Inside Cottage',
 ... }
```


I split the LIGHT environment data into training/dev/test splits based on cateogries.  Here are the categories that ended up in the training partition.

In [None]:
def get_categories(light_environment):
  return light_environment['categories'].values()
categories = get_categories(light_environment)

print("\n".join(categories))

Forest
Shore
Countryside
Port
Swamp
Lake
Graveyard
Abandoned
Cave
Trail
Dungeon
Outside Cottage
Inside Castle
Outside Castle
Inside Church
Outside Church
Inside Temple
Outside Temple
Inside Tower
Outside Tower
Inside Palace
Outside Palace
Farm
city in the clouds
magical realm
netherworld
supernatural
underwater aquapolis



### Rooms

In text-adventure games, locations or settings are often called "rooms".  Rooms have a primary description of the location, a secondary description of the location with its background information, connections to neighboring rooms, and they can contain objects or non-player characters. 

Here's what the data structure looks like for a particular room in LIGHT (room number 62, 'An Unfinished Mausoleum', part of the 'Graveyard' category.

```
rooms = light_environment['rooms']
rooms['62']

{'background': "Bright white stone was all the fad for funerary architecture, once upon a time. It's difficult to understand why someone would abandon such a large and expensive undertaking. If they didn't have the money to finish it, they could have sold the stone, surely - or the mausoleum itself. Maybe they just haven't needed it yet? A bit odd, though, given how old it is. Maybe the gravedigger remembers... if he's sober.",
 'category': 'Graveyard',
 'description': 'Two-and-a-half walls of the finest, whitest stone stand here, weathered by the passing of countless seasons. There is no roof, nor sign that there ever was one. All indications are that the work was abruptly abandoned. There is no door, nor markings on the walls. Nor is there any indication that any coffin has ever lain here... yet.',
 'ex_characters': [204, 75, 156, 720],
 'ex_objects': [1791, 1792, 439],
 'in_characters': [203, 203],
 'in_objects': [1790],
 'neighbors': [108, 109],
 'room_id': 62,
 'setting': 'An Unfinished Mausoleum'}
```

The **in_objects** and **in_characters** are people and things that are explictly mentioned  listed in the description or the backstory.  The **ex_characters** and **ex_objects** are objects that are possibly present but not mentioned directly. These characters and objects are referenced by a numeric ID which are stored in a seperate part of the LIGHT environment file.



Here are the rooms that are in the 'Graveyard' category. 

In [None]:
from collections import defaultdict

rooms_by_id = light_environment['rooms']

rooms_by_category = defaultdict(set)
for room_id in rooms_by_id:
  category = light_environment['rooms'][room_id]['category']
  rooms_by_category[category].add(room_id)

def get_room_name(room_id, rooms_by_id=rooms_by_id):
  return rooms_by_id[room_id]['setting']

def print_rooms_for_category(category, rooms_by_category, rooms_by_id):
  rooms = rooms_by_category[category]
  print(category.capitalize())
  for room_id in rooms:
    print('\t', room_id, '-', get_room_name(room_id))

print_rooms_for_category('Graveyard', rooms_by_category, rooms_by_id)



Graveyard
	 340 - A cursed grave
	 277 - Graveyard
	 158 - the fountain
	 702 - Main street
	 144 - Cemetery
	 100 - Old Crypt
	 162 - Reception area
	 431 - Abandoned workers shed
	 62 - An Unfinished Mausoleum
	 661 - Main graveyard
	 386 - Tombstones of the Kings
	 462 - Dead Tree


### Neighbors

Rooms are connected to other rooms.  The LIGHT dataset stores the connections in a variable called ```light_environment['neighbors']```.  Here is an example of what is information is stored about these connections.

```
 '108': {'connection': 'walking carefully between fallen headstones',
  'destination': 'Fresh Grave',
  'direction': 'West',
  'inverse_id': None,
  'room_id': 62},
 '109': {'connection': 'following a dirt trail behind the mausoleum',
  'destination': 'Dead Tree',
  'direction': 'South',
  'inverse_id': None,
  'room_id': 62},
```

These can be thought of as arcs in a directed graph, where the rooms are nodes, and these elements are the arcs that connect a pair of nodes.  The head of the arc (the ***to node***) is specified by the ```destination``` field (a description rather than an ID), and tail of the arc (the ***from node***) is specified by the ```room_id```.

In [None]:
arcs = light_environment['neighbors']

# Create a dictionary that maps room names ('setting') to IDs
room_names_to_id = {room['setting']:room_id for (room_id,room) in rooms_by_id.items()}


def make_connections(arcs):
#  direction, connected_location, travel_description
  for arc_id, arc in arcs.items():
    try:
      source_id = str(arc['room_id'])
      target_id = str(room_names_to_id[arc['destination']])
      direction = arc['direction']
      travel_description = arc['connection']
      source_name = get_room_name(source_id)
      target_name = get_room_name(target_id)
      # Print out the room connections in the Graveyard
      if source_id in rooms_by_category['Graveyard']:
        print('====')
        print(source_name, '-->', target_name)
        print(direction)
        print(travel_description)
    except:
      pass

make_connections(arcs)

====
An Unfinished Mausoleum --> Dead Tree
South
following a dirt trail behind the mausoleum
====
Old Crypt --> Abandoned workers shed
South
walking down the cobbled path
====
Cemetery --> Main street
West
following the cobblestone path
====
Reception area --> Main graveyard
East
walking
====
Tombstones of the Kings --> Church
North
exiting the graveyard
====
Abandoned workers shed --> Old Crypt
North
walking down the cobbled path
====
Main street --> Cemetery
South
traveling the road south
====
Main street --> Cemetery
East
following the cobblestone path



### Characters 


Characters have a description, a persona (a first person description of who they are and what their motivations might be), a character type (person, creature or object), a location (```in_room_id```) and an an inventory (```carrying_objects```)

The Gravedigger character is listed in the Unfinished Mausoleum's ``in_characters`` variable.  The ``in_characters`` are characters that are explictly mentioned in the location's ``description`` or ``background`` variables.  In this case, the Gravedigger is mentioned in the Unfinished Mausoleum's ``background variahle``. 
```
light_environment['characters']['203']

{'base_form': ['gravedigger'],
 'carrying_objects': [890],
 'char_type': 'person',
 'character_id': 203,
 'corrected_name': 'gravedigger',
 'desc': 'You might want to talk to the gravedigger, specially if your looking for a friend, he might be odd but you will find a friend in him.',
 'ex_room_ids': [100, 349],
 'in_room_ids': [62],
 'is_plural': 0,
 'name': 'gravedigger',
 'orig_room_id': 349,
 'personas': ["I am low paid labor in this town. I do a job that many people shun because of my contact with death. I am very lonely and wish I had someone to talk to who isn't dead."],
 'wearing_objects': [],
 'wielding_objects': []}

 ```
 Here are the ``ex_characters`` from the Unfinished Mausoleum.  They are not explicitly mentioned in the room's description or background, but the annotators thought that these characters were the kinds of characters that might be found there.

```
for id in "204, 75, 156, 720".split(','):
  print(light_environment['characters'][id.strip()]['corrected_name'])

thief
peasant
mouse
bat
```


In [None]:
for id in "204, 75, 156, 720".split(','):
  print(light_environment['characters'][id.strip()]['corrected_name'])

thief
peasant
mouse
bat


Here is the Gravedigger character.  Characters have descriptions, name, and personas.  We'll use personas later in the semester when we look at generating dialogue for characters.

In [None]:
light_environment['characters']['203']

{'base_form': ['gravedigger'],
 'carrying_objects': [890],
 'char_type': 'person',
 'character_id': 203,
 'corrected_name': 'gravedigger',
 'desc': 'You might want to talk to the gravedigger, specially if your looking for a friend, he might be odd but you will find a friend in him.',
 'ex_room_ids': [100, 349],
 'in_room_ids': [62],
 'is_plural': 0,
 'name': 'gravedigger',
 'orig_room_id': 349,
 'personas': ["I am low paid labor in this town. I do a job that many people shun because of my contact with death. I am very lonely and wish I had someone to talk to who isn't dead."],
 'wearing_objects': [],
 'wielding_objects': []}

In [None]:
characters_by_id = light_environment['characters']
characters_by_id['203']

from collections import Counter

def count_character_types(characters_by_id):
  character_types = Counter()
  for character_id in characters_by_id:
    character = characters_by_id[character_id]
    char_type = character['char_type']
    character_types[char_type] += 1
  return character_types

character_types = count_character_types(characters_by_id)
print(character_types)


Counter({'person': 1028, 'creature': 304, 'object': 38})


### Objects

Objects are inanimate things in the game.  They have descriptions, locations, and a set of properties that could be used to govern how a player interacts with them.  The properties of objects in the light dataset are 
* is_container
* is_drink
* is_food
* is_gettable
* is_plural
* is_surface
* is_weapon
* is_wearable

These properties have numeric values associated with them.  The values seem to be something like 0.0 = false, 1.0 = true, 0.5 = possibly. 

Here is an example object:
```
light_environment['objects']['1188']

 {'base_form': ['sword', 'Sword'],
 'desc_entries': 2,
 'descriptions': ['The sword is very old, you would assume it had once belonged to a legendary warrior.',
  "The sword's legend is known by everyone, it is famous throughout the land."],
 'ex_room_ids': [],
 'holding_character_ids': [],
 'in_room_ids': [12],
 'is_container': 0.0,
 'is_drink': 0.0,
 'is_food': 0.0,
 'is_gettable': 1.0,
 'is_plural': 1.0,
 'is_surface': 0.0,
 'is_weapon': 1.0,
 'is_wearable': 0.0,
 'link_entries': 1,
 'name': 'Legendary swords',
 'object_id': 1188}
 ```

In [None]:
light_environment['objects']['1188']

{'base_form': ['sword', 'Sword'],
 'desc_entries': 2,
 'descriptions': ['The sword is very old, you would assume it had once belonged to a legendary warrior.',
  "The sword's legend is known by everyone, it is famous throughout the land."],
 'ex_room_ids': [],
 'holding_character_ids': [],
 'in_room_ids': [12],
 'is_container': 0.0,
 'is_drink': 0.0,
 'is_food': 0.0,
 'is_gettable': 1.0,
 'is_plural': 1.0,
 'is_surface': 0.0,
 'is_weapon': 1.0,
 'is_wearable': 0.0,
 'link_entries': 1,
 'name': 'Legendary swords',
 'object_id': 1188}

In [None]:
obj = light_environment['objects']['1188']
print(obj['name'])
print(obj['object_id'])
for label, value in obj.items():
  if label.startswith('is_') and value == 1.0:
    print(label, value)

Legendary swords
1188
is_gettable 1.0
is_weapon 1.0
is_plural 1.0


In [None]:
objects_by_id = light_environment['objects']

def sort_objects_by_property(objects_by_id):
  objects_by_property = defaultdict(set)
  for object_id, obj in objects_by_id.items(): 
    name = obj['name']
    for label, value in obj.items():
      if label.startswith('is_') and value == 1:
        objects_by_property[label].add(object_id)
  return objects_by_property

objects_by_property = sort_objects_by_property(objects_by_id)

# print 20 objects for each property
for prop in objects_by_property:
  print(prop)
  for counter, object_id in enumerate(objects_by_property[prop]):
    if counter < 20:
      obj_name = objects_by_id[object_id]['name']
      print('\t', obj_name)


is_gettable
	 desk chair
	 brightly colored flowers that grow up to 30 feet tall
	 florescent rocks and shells
	 specks of dust
	 lance
	 trident
	 wood is gathered to make fire
	 brazier
	 wood
	 lush green vegetation
	 trinkets
	 meals
	 cloud barstools
	 mud
	 tree branches
	 large earthenware jug of water for cooking and drinking
	 glistening shades of seashells
	 chair
	 Message in a bottle
	 cargo
is_plural
	 red and white candles
	 brightly colored flowers that grow up to 30 feet tall
	 florescent rocks and shells
	 old statues
	 many treasure chests filled to the brim with gold
	 specks of dust
	 ornate inlay-ed floor decorations
	 crests
	 trinkets
	 meals
	 fire torches
	 cloud barstools
	 tree branches
	 glistening shades of seashells
	 elderberries
	 gentle rolling hills
	 crumbling headstones
	 Rusty manicals
	 elaborate carpet pieces
	 crystals
is_weapon
	 decorative shields
	 garden bench
	 a crossbow
	 plate
	 specks of dust
	 books
	 hoe for digging
	 lance
	 fishing r

# Format Data for Fine-Tuning 

Below, I show how to create data to fine-tune OpenAI.  The OpenAI API documentation has a [guide to fine-tuning models](https://beta.openai.com/docs/guides/fine-tuning) that you should read.   The basic format of fine-tuning data is a JSONL file (one JSON object per line) with two key-value pairs: `prompt:` and `completion:`.

```
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
...
```

In the code below, I'll extract a prompt that contains the `Category` and `Setting` variables from a LIGHT Environment room, and I'll have the completion be the room's `Description`.

In [None]:
import random
def get_room_description(room_id, rooms_by_id, light_environment):
  """
  This generates a prompt and a completion which can be used to fine-tune OpenAI.
  This version just gnnerates 
  """
  prompt = ""
  completion = ""
  prompt += "Category: {category}\n".format(category=rooms_by_id[room_id]['category'].capitalize())
  prompt += "Setting: {setting}\n".format(setting=rooms_by_id[room_id]['setting'].capitalize())
  completion += "Description: {description}\n".format(description=rooms_by_id[room_id]['description'])
  completion += "###\n"

  return prompt, completion


def get_item_list(room_id, rooms_by_id, light_environment):
  """
  inputs: category, location name, location description, number of items
  output: list of item names 
  """
  prompt = ""
  completion = ""
  prompt += "Category: {category}\n".format(category=rooms_by_id[room_id]['category'].capitalize())
  prompt += "Setting: {setting}\n".format(setting=rooms_by_id[room_id]['setting'].capitalize())
  prompt += "Description: {description}\n".format(description=rooms_by_id[room_id]['description'].capitalize())
  prompt += "Number of Items: {n_items}\n".format(n_items=len(rooms_by_id[room_id]['in_objects']))
  item_indices=rooms_by_id[room_id]['in_objects']
  items_list = []
  for i in item_indices:
    items_list.append(light_environment['objects'][str(i)]['name'])

  completion += "Item List: {items}\n".format(items=items_list)
  completion += "###\n"

  return prompt, completion

def get_item_description(room_id, rooms_by_id, light_environment):
  """ 
  inputs: category, location name, location description, item name
  output: item description
  """
  prompt = ""
  completion = ""
  prompt += "Category: {category}\n".format(category=rooms_by_id[room_id]['category'].capitalize())
  prompt += "Setting: {setting}\n".format(setting=rooms_by_id[room_id]['setting'].capitalize())
  prompt += "Description: {description}\n".format(description=rooms_by_id[room_id]['description'].capitalize())

  n_items = len(rooms_by_id[room_id]['in_objects'])
  if n_items == 0:
    return "no items", "no description"


  rand_item = rooms_by_id[room_id]['in_objects'][random.randint(0, (len(rooms_by_id[room_id]['in_objects'])-1))]
  
  print(light_environment['objects'][str(rand_item)]['name'])
  prompt += "Item Name: {item}\n".format(item=light_environment['objects'][str(rand_item)]['name'])

  completion += "Item Description: {description}\n".format(description=light_environment['objects'][str(rand_item)]['descriptions'][0])
  completion += "###\n"

  return prompt, completion

def get_connections(room_id, rooms_by_id, light_environment):
  """
  inputs: category, location name, location description
  output: a list of (direction, location name) tuples
  """
  prompt = ""
  completion = ""
  prompt += "Category: {category}\n".format(category=rooms_by_id[room_id]['category'].capitalize())
  prompt += "Setting: {setting}\n".format(setting=rooms_by_id[room_id]['setting'].capitalize())
  prompt += "Description: {description}\n".format(description=rooms_by_id[room_id]['description'].capitalize())
  neighbors_list = rooms_by_id[room_id]['neighbors']
  connections_str = ""
  for neighbor in neighbors_list:
    direction = light_environment['neighbors'][str(neighbor)]["direction"]
    name = light_environment['neighbors'][str(neighbor)]["destination"]
    connections_str += "(" + direction + "," + name + ")"
  completion += "Connections: {connections}\n".format(connections=connections_str)
  completion += "###\n"

  return prompt, completion

def get_item_property(item, prop_input):
  """
  inputs: item name, item description, property (e.g. gettable)
  output: True or False if the item has that property
  """

  prompt = ""
  completion = ""
  
  prompt += "Item Name: {name}\n".format(name=item['name'].capitalize())
  print(item['descriptions'])
  prompt += "Item Description: {description}\n".format(description=item['descriptions'][0].capitalize())
  prompt += "Property: {p}\n".format(p=prop_input).capitalize()
  
  prop_bool = ""
  prop_num = item[prop_input]
  if prop_num == 0.0:
    prop_bool = "False"
  elif prop_num == 1.0:
     prop_bool = "True"
  else:
    prop_bool = "Possibly"

  completion = prop_input + ": {response}\n".format(response=prop_bool).capitalize()
  
  completion += "###\n"

  return prompt, completion
 



def create_items_finetuning_data(filename='fine_item_list.jsonl'):
  fine_tuning_data = []
  for category in categories:
    rooms = rooms_by_category[category]
    for room_id in rooms:
      data = {}
      prompt, completion = get_item_list(room_id, rooms_by_id, light_environment)
      data['prompt'] = prompt
      data['completion'] = completion
      print(prompt, end="")
      print(completion)
      fine_tuning_data.append(data)

  with open(filename, 'w') as out:
    for data in fine_tuning_data:
        out.write(json.dumps(data))
        out.write('\n')


def create_property_finetuning_data(filename='fine_tuning_properties.jsonl'):
  fine_tuning_data = []
  objects = light_environment['objects']
  properties = ["is_container", "is_drink", "is_food", "is_gettable", "is_plural", "is_surface", "is_weapon", "is_wearable"]
  for item in objects.keys():
    for prop in properties:
      data = {}
      prompt, completion = get_item_property(objects[item], prop)
      data['prompt'] = prompt
      data['completion'] = completion
      print(prompt, end="")
      print(completion)
      fine_tuning_data.append(data)

  with open(filename, 'w') as out:
    for data in fine_tuning_data:
        out.write(json.dumps(data))
        out.write('\n')

create_items_finetuning_data()

# Fine-tune GPT3 with the OpenAI API

Next, we'll perform fine-tuning with this data using OpenAI. 

In [3]:
%%capture
!pip install --upgrade openai
!pip install jsonlines

Once you've got access to the OpenAI API, you can find your OpenAI API key [here](https://beta.openai.com/account/api-keys).

In [4]:
import os
import openai

print('Enter OpenAI API key:')
openai.api_key = input()

os.environ['OPENAI_API_KEY']=openai.api_key

Enter OpenAI API key:
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


In [None]:
!head fine_tuning_location_descriptions.jsonl

head: cannot open 'fine_tuning_location_descriptions.jsonl' for reading: No such file or directory


Next, we'll make the fine tuning API call via the command line.  Here the -m argument gives the model.  There are 4 sizes of GPT3 models.  They go in alphabetical order from smallest to largest.
* Ada 
* Baddage
* Currie
* Davinci

The models as the model sizes increase, so does their quality and their cost.  Davinci is the highest quality and highest cost model.  I recommend starting by fine-tuning smaller models to debug your code first so that you don't rack up costs.

Fine-tuning curie costs about $0.50 for this data.


In [None]:
!openai api fine_tunes.create -t fine_item_list.jsonl -m babbage
#Describe a location (babbage): babbage:ft-cis-700-12-2022-01-27-01-57-21
#connections: babbage:ft-cis-700-12-2022-01-27-03-25-31
#item description: babbage:ft-cis-700-12-2022-01-27-03-55-23
#item list: babbage:ft-cis-700-12-2022-01-29-20-17-03
#properties: babbage:ft-cis-700-12-2022-01-27-04-58-14

Upload progress:   0% 0.00/237k [00:00<?, ?it/s]Upload progress: 100% 237k/237k [00:00<00:00, 338Mit/s]
Uploaded file from fine_item_list.jsonl: file-iIsmDGTq63UzCsOHGY03r9Fh
Created fine-tune: ft-l6ngqPqfMByFkcM4RCGIq6cD
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2022-01-29 20:04:48] Created fine-tune: ft-l6ngqPqfMByFkcM4RCGIq6cD
[2022-01-29 20:05:00] Fine-tune costs $0.12
[2022-01-29 20:05:01] Fine-tune enqueued. Queue number: 0
[2022-01-29 20:05:06] Fine-tune started
[2022-01-29 20:08:13] Completed epoch 1/4
[2022-01-29 20:10:56] Completed epoch 2/4
[2022-01-29 20:13:39] Completed epoch 3/4
[2022-01-29 20:16:22] Completed epoch 4/4
[2022-01-29 20:17:04] Uploaded model: babbage:ft-cis-700-12-2022-01-29-20-17-03
[2022-01-29 20:17:08] Uploaded result file: file-m8V2ae5RNxgG7ZnOsfMum1oU
[2022-01-29 20:17:08] Fine-tune succeeded

Job complete! Status: succeeded 🎉
Try out your fine-tuned model:

openai api complet

You should copy down the fine-tune numbers which look like this:

```
Created fine-tune: ft-VzQpTwfnWAzDXNKgPTFtiZg2

[2022-01-21 23:22:47] Uploaded model: curie:ft-ccb-lab-members-2022-01-21-23-22-46
```

If you forget to write it down, you can list your fine-tuned runs and models this way. These model names aren't mneumonic, so it is probably a good idea to make a note on what your model's inputs and outputs are. 

In [None]:
!openai api fine_tunes.list

{
  "data": [
    {
      "created_at": 1643247061,
      "fine_tuned_model": "curie:ft-cis-700-12-2022-01-27-01-44-33",
      "hyperparams": {
        "batch_size": 1,
        "learning_rate_multiplier": 0.05,
        "n_epochs": 4,
        "prompt_loss_weight": 0.1
      },
      "id": "ft-J5VRRV83MdXsKxUcA7oF0cSY",
      "model": "curie",
      "object": "fine-tune",
      "organization_id": "org-3y6QaChrju7S1Z98cUfL80Zy",
      "result_files": [
        {
          "bytes": 117348,
          "created_at": 1643247877,
          "filename": "compiled_results.csv",
          "id": "file-BPNOahK8GnSS5B9HccyDjhcl",
          "object": "file",
          "purpose": "fine-tune-results",
          "status": "processed",
          "status_details": null
        }
      ],
      "status": "succeeded",
      "training_files": [
        {
          "bytes": 196498,
          "created_at": 1643247060,
          "filename": "fine_tuning_location_descriptions.jsonl",
          "id": "file-uOhiqV8Z

You can run your fine tuned model in the OpenAI Playground.  After the model is finished finetuning you'll find it in the Engine dropdown menu.  

You'll need to give the inputs that we used as the "prompt" in our training data.  In this case we gave it a `Category:` and a `Location:` name.  You can also add the `Description:` part of the completion if you want. You should also set the stop sequence to be `###`. For example, we could type this input into the playground:

```
Category: Dark Forest
Location: Winter's Glade
Description:
```
If you press the "Generate" then your fine tuned model will output something like:

> The forest at night is not a place one would willingly set foot in. It is cold and dark and seems to go on forever. Trees loom over you, blocking out the light of the moon and stars. The only sound is the occasional howl of a wolf and the occasional splashing of water.

If you don't like the description, you can press the "Regerate" button to get other outputs like:

> The winter's glade is a dark and eerie place. It is home to many animals, but little else. The trees are barren and the ground is covered in snow.

Or

> The dark forest is a place where not even a ray of light can pierce the tangled web of branches overhead. Needles from vast numbers of trees protrude at awkward angles, their branches thin and frail, more like twigs than the strong trunks they resemble. Between the branches, a darkness deeper than night reigns. It is from this darkness that the trees themselves appear as ghosts, for the branches do not embrace the earth so much as merely touch it. The tips of the branches move slightly with every breath taken by the trees, and the dark forest seems to breathe along with them.

Or 

> The forest has now turned white.  The trees are barren and dead, their branches thin and broken.  There's a light dusting of snow on the ground, and it looks as if the forest is trying to erase all traces of life from the earth.


You can press the "Code" button to get a snippet of code that you can adapt into your own Python programs.  

```
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.Completion.create(
  model="curie:ft-ccb-lab-members-2022-01-21-23-22-46",
  prompt="Category: Dark Forest\nLocation: Winter's Glade\nDescription:",
  temperature=0.7,
  max_tokens=64,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0,
  stop=["###"]
)
```

Here's an example of how to write a function using the code that the OpenAI API provides.

# TODO: Fine-Tune Additional Models for Text Adventure Games

In this assignment, we'll ask you to fine-tune models to perform the following tasks:
1. Describe a location (I've given you this code.  You can adapt it for other models)
- inputs: category, location name
- output: location description 
2. List the items that are at a location
- inputs: category, location name, location description, number of items
- output: list of item names
3. Describe an item
- inputs: category, location name, location description, item name
- output: item description 
5. List connections from the current location
- inputs: category, location name, location description, and optionally a partial list of existing connections (direction, location name) tuples 
- output: a list of (direction, location name) tuples
4. Get an item's properties
- inputs: item name, item description, property (e.g. gettable)
- output: True or False if the item has that property


In [5]:
def get_location_description(category, location_name, finetuned_model):
  response = openai.Completion.create(
      model=finetuned_model,
      prompt="Category: {category}\nLocation: {location}\nDescription:".format(
          category=category.capitalize(),
          location=location_name.capitalize()
      ),
      temperature=0.7,
      max_tokens=64,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0,
      stop=["###"]
      )
  return response['choices'][0]['text']

# Replace with your model's name
finetuned_model = "babbage:ft-cis-700-12-2022-01-27-01-57-21"
category = "Dark Forest"
location_name = "Winter's Glade"

descripton = get_location_description(category, location_name, finetuned_model)
print(descripton)

 The winter's glade is a hidden location during the dark of the forest. It is covered in undergrowth and surrounded by a dense fog. There are two stone benches where one can sit. The glade is beautiful during the day, with many colorful flowers blooming. It is very peaceful. At night, however


In [6]:
def get_items_at_location(category, setting, description, n_items, finetuned_model):
  response = openai.Completion.create(
  model=finetuned_model,
  prompt="Category: {category}\nSetting: {setting}\nDescription: {description}\nNumber of Items: {n_items}".format(
      category=category.capitalize(),
      setting=setting.capitalize(),
      description=description.capitalize(),
      n_items= n_items
  ),
  temperature=0.7,
  max_tokens=64,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0,
  stop=["###"]
  )
  return response['choices'][0]['text']

# Replace with your model's name
finetuned_model = "babbage:ft-cis-700-12-2022-01-29-20-17-03"
category = "Dark Forest"
setting = "Winter's Glade"
description = "A wintery forest blanketed in snow. On the ground there is a long sword, a metal amulet and an iron box. To the west you see a quarry."
n_items = 3
items = get_items_at_location(category, setting, description, n_items, finetuned_model)
print(items)
print(type(items))
print(len(items))



Item List: ['long sword', 'metal amulet', 'iron box']

<class 'str'>
55


In [11]:
def get_item_description(category, item_name, finetuned_model, location_name="", location_description=""):
  response = openai.Completion.create(
  model=finetuned_model,
  prompt="Category: {category}\nSetting: {setting}\nDescription: {description}\nItem Name: {name}".format(
      category=category.capitalize(),
      setting=setting.capitalize(),
      description=description.capitalize(),
      name=item_name
  ),
  temperature=0.7,
  max_tokens=64,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0,
  stop=["###"]
  )
  return response['choices'][0]['text']

# Replace with your model's name
finetuned_model = "babbage:ft-cis-700-12-2022-01-27-03-55-23"
category = "Dark Forest"
setting = "Winter's Glade"
description = "A wintery forest blanketed in snow. On the ground there is a long sword, a metal amulet and an iron box. To the west you see a quarry."
item_name = "amulet"
# desc = get_item_description(category, item_name, finetuned_model, setting, description)
# print(desc)

In [12]:
def get_connections(category, location_name, location_description, finetuned_model, current_connections=[]):
  prompt = "Category: {category}\nSetting: {setting}\nDescription: {description}\n".format(
      category=category.capitalize(),
      setting=location_name.capitalize(),
      description=location_description.capitalize(),
      n_connections= current_connections
  )
  response = openai.Completion.create(
  model=finetuned_model,
  prompt=prompt,
  temperature=0.4,
  max_tokens=64,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0,
  stop=["###"]
  )
  return response['choices'][0]['text']

# Replace with your model's name
finetuned_model = "babbage:ft-cis-700-12-2022-01-27-03-25-31"
category = "Forest"
setting = "The stream"
description = "The stream is always going. the water moves quickly in some places and slowly in other places. there is a dry spot on each side of the water. in some areas you will see animals drinking the water. there are also fish swimming around in the deeper parts of the stream."
connections = ['(North,The waterfall)']
# connection_results = get_connections(category, setting, description, finetuned_model, connections)


In [13]:
def get_item_property(property_name, item_name, item_description, finetuned_model):
  if property_name == "is_gettable":
    return is_gettable(item_name, item_description, finetuned_model,property_name)
  elif property_name == "is_weapon":
    return is_weapon(item_name, item_description, finetuned_model,property_name)
  elif property_name == "is_surface":
    return is_surface(item_name, item_description, finetuned_model,property_name)
  elif property_name == "is_container":
    return is_container(item_name, item_description, finetuned_model,property_name)
  elif property_name == "is_wearable":
    return is_gettable(item_name, item_description, finetuned_model,property_name)
  elif property_name == "is_drink":
    return is_gettable(item_name, item_description, finetuned_model,property_name)
  elif property_name == "is_food":
    return is_gettable(item_name, item_description, finetuned_model,property_name)
  else:
    pass

def is_gettable(item_name, item_description, finetuned_model,property_name, property_description="A player can pick this item up and add it to their inventory."):
  response = openai.Completion.create(
  model=finetuned_model,
  prompt="Item Name: {item_name}\nDescription: {item_description}\nProperty: {property_name}".format(
      item_name=item_name.capitalize(),
      item_description=item_description.capitalize(),
      #property_description= property_description.capitalize()
      property_name = property_name.capitalize()
  ),
  temperature=0.7,
  max_tokens=64,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0,
  stop=["###"]
  )
  return response['choices'][0]['text']

def is_weapon(item_name, item_description, finetuned_model,property_name, property_description="This item can be used as a weapon."):
  response = openai.Completion.create(
  model=finetuned_model,
  prompt="Item Name: {item_name}\nDescription: {item_description}\nProperty: {property_name}".format(
      item_name=item_name.capitalize(),
      item_description=item_description.capitalize(),
      #property_description= property_description.capitalize()
      property_name = property_name.capitalize()
  ),
  temperature=0.7,
  max_tokens=64,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0,
  stop=["###"]
  )
  return response['choices'][0]['text']

def is_surface(item_name, item_description, finetuned_model,property_name, property_description="Another item can be placed on top of this item."):
  response = openai.Completion.create(
  model=finetuned_model,
  prompt="Item Name: {item_name}\nDescription: {item_description}\nProperty: {property_name}".format(
      item_name=item_name.capitalize(),
      item_description=item_description.capitalize(),
      #property_description= property_description.capitalize()
      property_name = property_name.capitalize()
  ),
  temperature=0.7,
  max_tokens=64,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0,
  stop=["###"]
  )
  return response['choices'][0]['text']

def is_container(item_name, item_description, finetuned_model, property_name,property_description="Other items can be stored inside of this item."):
  response = openai.Completion.create(
  model=finetuned_model,
  prompt="Item Name: {item_name}\nDescription: {item_description}\nProperty: {property_name}".format(
      item_name=item_name.capitalize(),
      item_description=item_description.capitalize(),
      #property_description= property_description.capitalize()
      property_name = property_name.capitalize()
  ),
  temperature=0.7,
  max_tokens=64,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0,
  stop=["###"]
  )
  return response['choices'][0]['text']

def is_wearable(item_name, item_description, finetuned_model,property_name, property_description="This item can be worn as an item of cloting."):
  response = openai.Completion.create(
  model=finetuned_model,
  prompt="Item Name: {item_name}\nDescription: {item_description}\nProperty: {property_name}".format(
      item_name=item_name.capitalize(),
      item_description=item_description.capitalize(),
      #property_description= property_description.capitalize()
      property_name = property_name.capitalize()
  ),
  temperature=0.7,
  max_tokens=64,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0,
  stop=["###"]
  )
  return response['choices'][0]['text']

def is_drink(item_name, item_description, finetuned_model,property_name, property_description="This item is a liquid that can be drunk."):
  response = openai.Completion.create(
  model=finetuned_model,
  prompt="Item Name: {item_name}\nDescription: {item_description}\nProperty: {property_name}".format(
      item_name=item_name.capitalize(),
      item_description=item_description.capitalize(),
      #property_description= property_description.capitalize()
      property_name = property_name.capitalize()
  ),
  temperature=0.7,
  max_tokens=64,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0,
  stop=["###"]
  )
  return response['choices'][0]['text']

#def is_food(item_name, item_description, finetuned_model, property_description="This item can be eaten."):
def is_food(item_name, item_description, finetuned_model, property_name,property_description="This item can be eaten."):
  print('is food')
  response = openai.Completion.create(
  model=finetuned_model,
  prompt="Item Name: {item_name}\nDescription: {item_description}\nProperty: {property_name}".format(
      item_name=item_name.capitalize(),
      item_description=item_description.capitalize(),
      #property_description= property_description.capitalize()
      property_name = property_name.capitalize()
  ),
  temperature=0.7,
  max_tokens=64,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0,
  stop=["###"]
  )
  return response['choices'][0]['text']

# Replace with your model's name
finetuned_model = "babbage:ft-cis-700-12-2022-01-27-04-58-14"
property_name = "is_container"
item_name = "Towering pine trees"
item_description = "The tree is tall and leafy"
# property_results = get_item_property(property_name, item_name, item_description, finetuned_model)
# print("Property Results", property_results)

# TODO: Generate A Game

You now have all of the pieces that you need to generate a game!

Build a game using your automatic methods, and then export it in the same JSON format as the LIGHT Environment Data.  

You'll upload your JSON file to Gradescope along with this notebook.

If you'd like, you can build a game using the same theme and location names as the one that you did in HW1.

In [16]:
import random
describe_location_model = 'babbage:ft-cis-700-12-2022-01-27-01-57-21'
connections_model=  'babbage:ft-cis-700-12-2022-01-27-03-25-31'
item_description_model= 'babbage:ft-cis-700-12-2022-01-27-03-55-23'
item_list_model = 'babbage:ft-cis-700-12-2022-01-29-20-17-03'
property_model = 'babbage:ft-cis-700-12-2022-01-27-04-58-14'

from json import JSONEncoder
class Encoder(JSONEncoder):
        def default(self, o):
            return o.__dict__

class Game():
  rooms = []
  objects = []
  neighbors = []
  def __init__(self, rooms, objects, neighbors):
    self.rooms = rooms
    self.objects = objects
    self.neighbors = neighbors

class Item():
  in_room_ids = []
  description = ""
  name = ""
  properties = {
      'is_container': 0.0,
      'is_drink': 0.0,
      'is_food': 0.0,
      'is_gettable': 0.0,
      'is_surface': 0.0,
      'is_weapon': 0.0,
      'is_wearable': 0.0,
      }
  object_id = 0
 

  def __init__(self, name, id, room_id):
    self.name = name
    self.object_id = id
    self.description = [room_id]


class Neighbor():
  connection = ''
  destination = ''
  direction = ""
  inverse_id = None
 

  def __init__(self, destination, direction, room_id, neighbor_id):
    self.destination = destination
    self.direction = direction
    self.room_id = room_id
    self.neighbor_id = neighbor_id
  
class Room():
  #name
  setting = ""
  category = ""
  description = ""
  neighbors = []
  objects = []
  room_id = 0
  def __init__(self,
               setting,
               category,
               description,
               neighbors,
               objects,
               room_id):
    self.setting = setting
    self.category = category
    self.description = description
    self.room_id = room_id
    self.neighbors = neighbors

def generate_location(category, location, item_count, room_count, neighbor_count, max_items = 1):
    print("Location: ",location)
    r_description = get_location_description(category, location, describe_location_model).strip('\n')
    
    print(r_description)

    num_items = random.randint(1,max_items)
    item_string = get_items_at_location(category, location, r_description, num_items, item_list_model).strip('\n')
    item_list = item_string.split(",")
    item_objs = []
    item_ids = []
    neighbor_objs = []
    neighbor_ids = []
    
    # print(item_list)
    for i in item_list:
      item = Item(i, item_count, room_count)
      item_ids.append(item.object_id)
      item_objs.append(item)
      item_count += 1
      item.description = get_item_description(category, item.name, item_description_model, location, r_description).strip('\n')
      
      for prop in item.properties.keys():
        # print(prop)
        #get_item_property(property_name, item_name, item_description, finetuned_model)
        item.properties[prop] = get_item_property(prop,item.name, item.description, property_model).strip('\n')
        # print(item.properties[prop])
    
    connections = get_connections(category, location, r_description, connections_model,[]).split(")")
    #print('test connect', connections)
    if '\n' in connections:
      connections.remove('\n')
    
    connections = [room.replace("Connections: (", "") for room in connections if 'Connections: (' in room]
    print("Connections are",connections)
    for connect in connections:
      if ',' in connect:
        direction, dest = connect.split(',')
      else:
        continue
      neighbor = Neighbor(dest, direction, room_count, neighbor_count)
      print('Neighbor is',Encoder().encode(neighbor))
      #location.neighbor.append(neighbor.neighbor_id)
      neighbor_objs.append(neighbor)
      neighbor_ids.append(neighbor.neighbor_id)
      neighbor_count+=1

    room = Room(location, category, r_description, neighbor_ids, item_ids, room_count)
    print("Room is",Encoder().encode(room))
    #go back and add room description to neighbor
    for neigh in neighbor_objs:
        if neigh.neighbor_id == room_count:
          neigh.connection = r_description
    room_count = room_count + 1 
    return room, room_count, item_objs, item_count, neighbor_objs, neighbor_count
    
  
def build_game(category="Ancient Greece", initial_location_name="Mount Olympus", room_loops = 2, max_items = 2):
  item_count = 0
  room_count = 0
  neighbor_count = 0
  final_rooms = []
  final_items = []
  final_neighbors = []
  n = 0

  current_room = initial_location_name
  room, room_count, items, item_count, neighbors, neighbor_count = generate_location(category, current_room, item_count, room_count, neighbor_count, max_items)
  final_rooms.append(room)
  final_items.append(items)
  final_neighbors.append(neighbors)
  while not n == room_loops:
    for room_id in room.neighbors:
      for neigh in neighbors:
        if neigh.neighbor_id == room_id:
          room_name = neigh.destination
          break

      current_room = room_name
      room, room_count, items, item_count, neighbors, neighbor_count = generate_location(category, room_name, item_count, room_count, neighbor_count, max_items)
      final_rooms.append(room)
      final_items.append(items)
      final_neighbors.append(neighbors)
    n = n+1
  return Game(final_rooms, final_items, final_neighbors)

game = build_game()
#f_room, f_i, f_n = build_game()
  # return game

##https://stackoverflow.com/questions/26033239/list-of-objects-to-json-with-python
def export_game_json(game, output_filename="my_gpt3_game.json"):
   encoded_json = Encoder().encode(game)
   with open(output_filename, 'w') as out:
          out.write(encoded_json)
          out.write('\n')

export_game_json(game)

Location:  Mount Olympus
 The mountain of olympus is gray and massive. It is covered in fog and has not been seen by the naked eye since the time of the greeks. Many stories are told about what lies underneath the mountain however the most well known is that there is a cave inside the mountain that is home to a pant
Connections are ['north,cave']
Neighbor is {"destination": "cave", "direction": "north", "room_id": 0, "neighbor_id": 0}
Room is {"setting": "Mount Olympus", "category": "Ancient Greece", "description": " The mountain of olympus is gray and massive. It is covered in fog and has not been seen by the naked eye since the time of the greeks. Many stories are told about what lies underneath the mountain however the most well known is that there is a cave inside the mountain that is home to a pant", "room_id": 0, "neighbors": [0]}
Location:  cave
 Long, dark and desolate, the cave is filled with ancient bones and remains. The bones of animals and men are mired in the dark, the bo

# TODO: Evaluation

An important part of NLP and machine learning is determining how good your models are.  It's very tricky to reliably evaluate generation output automatically.  For now, we'll evaluate the predictions of the model.

For your model's attribute predictions, you should compute it's precision and recall for each attribute type on the LIGHT development data.


In [None]:
!wget https://raw.githubusercontent.com/interactive-fiction-class/interactive-fiction-class-data/master/light_dialogue/light_environment_dev.json

--2022-01-31 02:47:06--  https://raw.githubusercontent.com/interactive-fiction-class/interactive-fiction-class-data/master/light_dialogue/light_environment_dev.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 485111 (474K) [text/plain]
Saving to: ‘light_environment_dev.json’


2022-01-31 02:47:06 (10.5 MB/s) - ‘light_environment_dev.json’ saved [485111/485111]



In [None]:
json_filename = 'light_environment_dev.json'

f = open(json_filename)
light_environment_dev = json.load(f)

from collections import defaultdict

def get_property_predictions(json):
  properties = ["is_gettable", "is_weapon", "is_surface", "is_container", "is_wearable", "is_drink", "is_food"]
  i=0
  results = defaultdict(list)
  objs = json['objects']
  for obj_id in objs: 
      i+=1
      for property in properties:

        finetuned_model = "babbage:ft-cis-700-24-2022-01-28-22-54-18"
        property_name = property
        item_name = objs[obj_id]['name']
        item_description = objs[obj_id]['descriptions'][0]
        
        predicted_property = get_item_property(property_name, item_name, item_description, property_model)
        #print("Predicted Results", predicted_property)
        #anything that isn't true is listed as 0 including the possibly items
        if predicted_property.split(':')[1].strip() == 'true':
          binary = 1
        else:
          binary = 0
        results[obj_id].append(binary)

  #print("Binary results are", results)
  return results



    
predictions = get_property_predictions(light_environment_dev)

In [None]:
def sort_objects_by_property(objects_by_id):
  objects_by_property = defaultdict(set)
  for object_id, obj in objects_by_id.items(): 
    name = obj['name']
    for label, value in obj.items():
      if label.startswith('is_') and value == 1:
        objects_by_property[label].add(object_id)
  return objects_by_property

from sklearn.metrics import precision_score, recall_score
f = open('light_environment_dev.json')
light_environment_dev = json.load(f)
gold_standard_objects_by_property = sort_objects_by_property(light_environment_dev['objects'])
# You can modify this function definition
def compute_precision_and_recall_for_each_properites(gold_standard_objects_by_property, predictions):
  properties = ["is_gettable", "is_weapon", "is_surface", "is_container", "is_wearable", "is_drink", "is_food"]
  i = 0
  for property in properties:
    # 0 or 1
    # Replace with your model's name
    #true positive: says get true, is get true
    #false positive: says get true, is get false
    #false negative: says get false, is get true
    #Precision: TP/(TP+FP)
    #Recall: TP/(TP+FN)
    gold = gold_standard_objects_by_property[property]
    predicted = [predictions[obj][i] for obj in predictions.keys()]
    actual = [1 if obj_id in gold else 0 for obj_id in predictions.keys()]
    i+=1
    #print("pred",predicted)
    #print("actual",actual)
    print('Recall for',property,"is: ", recall_score(actual,predicted))
    print('Precision for',property,"is: ", precision_score(actual,predicted))




compute_precision_and_recall_for_each_properites(gold_standard_objects_by_property, predictions)


Recall for is_gettable is:  0.8307692307692308
Precision for is_gettable is:  0.864
Recall for is_weapon is:  0.6428571428571429
Precision for is_weapon is:  0.7105263157894737
Recall for is_surface is:  0.46938775510204084
Precision for is_surface is:  0.6764705882352942
Recall for is_container is:  0.625
Precision for is_container is:  0.5882352941176471
Recall for is_wearable is:  0.7142857142857143
Precision for is_wearable is:  0.8333333333333334
Recall for is_drink is:  0.36363636363636365
Precision for is_drink is:  0.6666666666666666
Recall for is_food is:  0.35714285714285715
Precision for is_food is:  0.7142857142857143
