This example notebook shows how to load the [Undergraduate Games Corpus](https://github.com/barrettrees/undergraduate_games_corpus) and inspect the contents of one of the games that it references.

# Loading the corpus metadata

We provide all metadata in a single [JSON](https://www.json.org/) file, so loading it is straightforward:

In [1]:
import json

with open("corpus.json") as f:
    corpus = json.load(f)
    
print(len(corpus), 'games total')

110 games total
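
As a quick sanity check on the loaded metadata, one might tally games by engine. The sketch below is only an illustration: `sample_corpus` is an invented stand-in for the real `corpus` dictionary (the ARKs and the `OtherEngine` value are made up), and `count_by_engine` is a hypothetical helper. It assumes each entry is a dictionary with an `engine` field, as in the entry inspected in the next section.

```python
from collections import Counter

# Hypothetical stand-in for the loaded `corpus` dictionary; the ARKs and
# engine values here are invented for illustration only.
sample_corpus = {
    "ark:/00000/example1": {"engine": "Twine"},
    "ark:/00000/example2": {"engine": "OtherEngine"},
    "ark:/00000/example3": {"engine": "Twine"},
}

def count_by_engine(corpus):
    """Return a Counter mapping engine name to number of games."""
    # .get() guards against entries without an 'engine' key; this is a
    # defensive assumption, the real corpus may always include the field.
    return Counter(game.get("engine", "unknown") for game in corpus.values())

engine_counts = count_by_engine(sample_corpus)
for engine, n in engine_counts.most_common():
    print(f"{engine}: {n}")
```

With the real data, passing `corpus` instead of `sample_corpus` should give the per-engine breakdown of all loaded games.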


# Inspecting a single game

The `corpus` object maps [archival resource keys (ARKs)](https://en.wikipedia.org/wiki/Archival_Resource_Key) to game metadata objects. Let's look at the entry for the first game that happens to be built on [Twine](https://twinery.org/).

In [2]:
for ark, game in corpus.items():
    if game['engine'] == 'Twine':
        break
        
ark, game

('ark:/13030/qt07t8r6b5',
 {'permalink': 'https://escholarship.org/uc/item/07t8r6b5',
  'title': '',
  'description': '',
  'authors': [{'firstname': 'Bryden', 'lastname': 'Fong'}],
  'engine': 'Twine',
  'tags': [''],
  'license': 'Default Copyright',
  'year': '2019',
  'quarter': 'Summer 2019',
  'files': [{'downloadLink': 'http://escholarship.org/content/qt07t8r6b5/supp/fongbryden_33223_1440184_Twine_game_alpha__1_.html',
    'contentType': 'text/html'}]})
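
Each entry lists its files together with a `contentType`, so the HTML files can be selected explicitly rather than by position. The following is a minimal sketch under that assumption; `html_download_links` is a hypothetical helper and `sample_game` is an invented entry with placeholder URLs, shaped like the one printed above.

```python
def html_download_links(game):
    """Return the download links of a game's files served as text/html."""
    return [
        f["downloadLink"]
        for f in game.get("files", [])          # tolerate a missing 'files' list
        if f.get("contentType") == "text/html"  # keep only HTML files
    ]

# Hypothetical entry shaped like the one above; the URLs are placeholders.
sample_game = {
    "engine": "Twine",
    "files": [
        {"downloadLink": "http://example.org/game.html",
         "contentType": "text/html"},
        {"downloadLink": "http://example.org/assets.zip",
         "contentType": "application/zip"},
    ],
}

links = html_download_links(sample_game)
print(links)
```

Filtering on `contentType` is one possible design choice; it avoids relying on the HTML file being first in the list.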

# Downloading and interpreting game project source files

Twine games are usually distributed as a single HTML file. Let's download the first file listed in the entry above.

In [3]:
import urllib.request
url = game['files'][0]['downloadLink']
with urllib.request.urlopen(url) as u:
    source = u.read()
    
print(source[:1024])
len(source), type(source)

b'<!DOCTYPE html>\n<html>\n<head>\n<meta charset="utf-8">\n<title>Twine game alpha</title>\n<style title="Twine CSS">@-webkit-keyframes appear{0%{opacity:0}to{opacity:1}}@keyframes appear{0%{opacity:0}to{opacity:1}}@-webkit-keyframes fade-in-out{0%,to{opacity:0}50%{opacity:1}}@keyframes fade-in-out{0%,to{opacity:0}50%{opacity:1}}@-webkit-keyframes rumble{50%{-webkit-transform:translateY(-0.2em);transform:translateY(-0.2em)}}@keyframes rumble{50%{-webkit-transform:translateY(-0.2em);transform:translateY(-0.2em)}}@-webkit-keyframes shudder{50%{-webkit-transform:translateX(0.2em);transform:translateX(0.2em)}}@keyframes shudder{50%{-webkit-transform:translateX(0.2em);transform:translateX(0.2em)}}@-webkit-keyframes box-flash{0%{background-color:white;color:white}}@keyframes box-flash{0%{background-color:white;color:white}}@-webkit-keyframes pulse{0%{-webkit-transform:scale(0, 0);transform:scale(0, 0)}20%{-webkit-transform:scale(1.2, 1.2);transform:scale(1.2, 1.2)}40%{-webkit-transform:scale

(304816, bytes)
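
The download arrives as raw `bytes`. If a text view or a local copy is wanted, something like the sketch below could work. It is an illustration only: `sample_source` and `sample_url` are invented stand-ins for `source` and `url`, `local_name` is a hypothetical helper, and decoding as UTF-8 is an assumption based on the `<meta charset="utf-8">` declaration visible in the output above.

```python
import tempfile
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

def local_name(url):
    """Derive a local filename from the last path segment of a URL."""
    return PurePosixPath(urlparse(url).path).name

# Stand-ins for the notebook's `source` and `url`; both are placeholders.
sample_source = b'<!DOCTYPE html>\n<html>\n<head>\n<meta charset="utf-8">\n</head>\n</html>'
sample_url = "http://example.org/content/supp/game_alpha.html"

# The page declares charset="utf-8", so UTF-8 is a reasonable guess.
text = sample_source.decode("utf-8")

# Write the raw bytes unchanged into a fresh temporary directory.
out_dir = Path(tempfile.mkdtemp())
out_path = out_dir / local_name(sample_url)
out_path.write_bytes(sample_source)

print(out_path.name, len(text), type(text).__name__)
```

Keeping the original bytes on disk and decoding only for inspection avoids altering the archived file.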

## Parsing a Twine game to recover passage text and other data

We'll use the [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/) library to parse this HTML file and recover the Twine-specific tags.

In [4]:
!pip install --quiet beautifulsoup4

In [5]:
import bs4

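# Name the parser explicitly so results do not depend on which parsers are installed.
soup = bs4.BeautifulSoup(source, 'html.parser')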

{
    'meta': soup.find('tw-storydata').attrs,
    'contents': list( {
        'meta': passagedata.attrs,
        'contents': ''.join(passagedata.contents)
    } for passagedata in soup.find_all('tw-passagedata'))
}

{'meta': {'name': 'Twine game alpha',
  'startnode': '1',
  'creator': 'Twine',
  'creator-version': '2.3.2',
  'ifid': '8ABC195F-D38B-4B8D-90D3-EEB71E7DA3B0',
  'zoom': '1',
  'format': 'Harlowe',
  'format-version': '3.0.2',
  'options': '',
  'hidden': ''},
 'contents': [{'meta': {'pid': '1',
    'name': 'Continue',
    'tags': '',
    'position': '300,0',
    'size': '100,100'},
   'contents': 'You wake up in an unfamiliar setting, but it seems to be in a high place because it seems frightingly hard to breath compared to home. \nThere is nobody around.\nThere is a strange mist surrounding you that makes it hard to see clearly.\nYour only goal is to find your way back to civilization. \n[[Continue->Explore]] \n(set: $water = 0)\n(set: $bag = 0)'},
  {'meta': {'pid': '2',
    'name': 'Search',
    'tags': '',
    'position': '800,100',
    'size': '100,100'},
   'contents': 'You see a bag laying on the ground next to you. \n[[Search->Items]] the bag?\n[[Explore]] around the area. \n(