### Grammar & Tracery

We are going to learn grammar-based computer text generation with [Tracery](https://tracery.io/), a simple tool made by [Kate Compton](https://www.galaxykate.com/) to make text generation fun and approachable. Tracery was originally written in JavaScript and has since been ported to many other programming languages. We are going to use the Python port by [Allison Parrish](https://www.decontextualize.com/).

You can install it from the command line with `pip`:

In [None]:
pip install tracery

Or run the following cell directly in a Jupyter notebook:

In [1]:
import sys
!{sys.executable} -m pip install tracery

Collecting tracery
  Downloading tracery-0.1.1.tar.gz (8.1 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: tracery
  Building wheel for tracery (pyproject.toml) ... done
  Created wheel for tracery: filename=tracery-0.1.1-py3-none-any.whl size=7729 sha256=95af5b53a586a4e87968282c809791f0a0b9a58ea4fbd8c8f3b2e06c57626700
  Stored in directory: /Users/cqx931/Library/Caches/pip/wheels/c6/9d/1b/e0836970b0149fec3c72192a9d1dca244b00401379e20391ee
Successfully built tracery
Installing collected packages: tracery
Successfully installed tracery-0.1.1


After the installation, you will need to import the tracery library.

In [2]:
import tracery

---
#### Rules & Syntax

A Tracery grammar is a set of rules that defines how a piece of text is generated. Rules can refer to other rules, so expanding one rule can trigger the expansion of others.

In Python, Tracery rules are written as a dictionary: the rule names are the keys and their expansions are the values.

Let's take a look at a very simple example of a Tracery grammar with only one rule.

In [9]:
rules = {
    "origin": "As we may speak"
}

To generate text from this grammar, we need to first create a `Grammar` object from the tracery library by passing the rules as a parameter.

In [10]:
grammar = tracery.Grammar(rules)

After that, we can call the `Grammar` object's `.flatten()` method, passing it the name of the starting rule wrapped in `#` signs:

In [11]:
grammar.flatten('#origin#')

'As we may speak'

This Tracery grammar can produce only one text, which is not very interesting. Let's make it a bit more generative.

In [31]:
rules = {
    "origin":"As we may #verb#",
    "verb":["think","speak","see"]
}

grammar = tracery.Grammar(rules)
grammar.flatten('#origin#')

'As we may see'

Notice that this time the `origin` rule contains some special syntax:

`#verb#`

This syntax tells Tracery that `verb` is the name of a rule: the generator looks up that name in the grammar and replaces `#verb#` with an expansion of that rule.

Under the rule `verb`, instead of a single string we have a list of strings, which provides multiple possibilities for how the rule can be expanded. In that case, Tracery picks one item from the list at random each time the rule is expanded.
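To make the lookup-and-replace idea concrete, here is a toy expander in plain Python. It is a simplified sketch of the idea, not how the `tracery` package is actually implemented: `toy_flatten` is a hypothetical helper that ignores modifiers and variables and has no guard against rules that refer to themselves forever.

```python
import random
import re

# Matches a rule reference such as "#verb#" and captures the rule name.
RULE_REF = re.compile(r"#(\w+)#")


def toy_flatten(text, rules):
    """Replace every #name# in text with an expansion of rules[name].

    Toy illustration only: no modifiers, no [variables], no recursion guard.
    """

    def expand(match):
        options = rules[match.group(1)]
        # A plain string is a rule with exactly one possible expansion.
        if isinstance(options, str):
            options = [options]
        choice = random.choice(options)
        # The chosen text may contain more rule references, so expand it too.
        return toy_flatten(choice, rules)

    return RULE_REF.sub(expand, text)


rules = {
    "origin": "As we may #verb#",
    "verb": ["think", "speak", "see"],
}
print(toy_flatten("#origin#", rules))  # e.g. "As we may speak"
```

Because `expand` calls `toy_flatten` again on whatever it picked, rules that contain other rules are expanded level by level, which is the same nesting idea the real library supports.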

Now, let's add a third rule to our Tracery grammar.

In [58]:
rules = {
    "origin":"As we #modal# #verb#",
    "modal":["may", "might","could","can", "must"],
    "verb":["think","speak","see","dream","hear"]
}

grammar = tracery.Grammar(rules)
grammar.flatten('#origin#')

'As we could see'

As the Tracery grammar gets more complicated, it's helpful to be able to see multiple generative results within one run. A `for` loop comes in handy to fulfill such a task.

In [38]:
for i in range(5):
    print(grammar.flatten("#origin#"))

As we might think
As we could speak
As we must hear
As we could think
As we could hear
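Since each rule is picked independently, this grammar can produce only 5 × 5 = 25 distinct sentences, so repeats become likely in longer runs. For a flat grammar like this one, you can count (or list) every possibility in plain Python with `itertools.product`; this is a side calculation outside Tracery and does not account for nesting, variables, or modifiers.

```python
from itertools import product

modal = ["may", "might", "could", "can", "must"]
verb = ["think", "speak", "see", "dream", "hear"]

# Every pairing of one modal with one verb is a distinct sentence.
sentences = [f"As we {m} {v}" for m, v in product(modal, verb)]

print(len(sentences))  # 25, i.e. 5 modals x 5 verbs
print(sentences[:3])   # ['As we may think', 'As we may speak', 'As we may see']
```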


As your lists get longer, typing all the quotes and commas becomes tedious. Instead, you can write each list as a single string and turn it into a list with the string method `.split()`. The following code rewrites the same example this way:

In [76]:
rules = {
    "origin":"As we #modal# #verb#",
    "modal":"may might could can must".split(" "),
    "verb": "think speak see dream hear".split(" ")
}

grammar = tracery.Grammar(rules)
for i in range(5):
    print(grammar.flatten("#origin#"))

As we may speak
As we could think
As we can think
As we could dream
As we must speak
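One detail about `.split(" ")`: it splits on every single space, so an accidental double space leaves an empty string in the list, and Tracery could then pick that empty string like any other option. Calling `.split()` with no argument splits on any run of whitespace (including line breaks) and avoids this:

```python
words = "may might  could"  # note the accidental double space

# split(" ") splits on every single space, so the double space yields ''.
print(words.split(" "))  # ['may', 'might', '', 'could']

# split() with no argument treats any run of whitespace as one separator.
print(words.split())     # ['may', 'might', 'could']

# It also works across line breaks, which is handy for long lists.
multiline = """may might
could"""
print(multiline.split())  # ['may', 'might', 'could']
```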


If your lists get really long, you can also define them as Python variables first and then use those variables in the rules dictionary:

In [75]:
modal = ["may", "might","could","can", "must"]
verb = ["think","speak","see","dream","hear"]

rules = {
    "origin":"As we #modal# #verb#",
    "modal": modal,
    "verb": verb
}

grammar = tracery.Grammar(rules)
for i in range(5):
    print(grammar.flatten("#origin#"))

As we could see
As we could see
As we could think
As we might dream
As we can think


By convention, the starting rule of a Tracery grammar is called `origin`. Many similar libraries and tools follow this convention, so if you want your grammar to be compatible with other tools, it's best to stick to it. However, you can give the starting rule any name, as long as you pass that name to `grammar.flatten()`.

In [79]:
modal = ["may", "might","could","can", "must"]
verb = ["think","speak","see","dream","hear"]

rules = {
    "start":"As we #modal# #verb#",
    "modal": modal,
    "verb": verb
}

grammar = tracery.Grammar(rules)
for i in range(5):
    print(grammar.flatten("#start#"))

As we could dream
As we might see
As we might dream
As we can think
As we may speak


---
### More Advanced Techniques

#### Nested Rules
Rules can also be nested, and Tracery has further features for creating more complex generative results. Let's look at the following example, based on Gertrude Stein's famous line "Rose is a rose is a rose is a rose," often quoted as "A rose is a rose is a rose."



In [77]:
from tracery.modifiers import base_english

rules = {
    "origin": "[myNoun:#n#]#myNoun.a.capitalize# #subsentence# #subsentence#.",
    "n": ["rose","moon","apple"],
    "subsentence": "is #myNoun.a#"
}

grammar = tracery.Grammar(rules)
grammar.add_modifiers(base_english)

for i in range(5):
    print(grammar.flatten("#origin#"))

A rose is a rose is a rose.
A moon is a moon is a moon.
A rose is a rose is a rose.
An apple is an apple is an apple.
An apple is an apple is an apple.


Here the expansion begins with `#origin#`, which refers to the rule `#subsentence#` twice; `#subsentence#` in turn refers to `#myNoun#`, so the grammar expands one level further.


#### Variables
Another feature demonstrated here is the ability to create variables inside a Tracery grammar. This is useful when something picked at random from a list needs to stay the same throughout the text, such as the noun in the example above.

A variable is declared with square brackets in the form `[name:#rule#]`, as in `[myNoun:#n#]`. Put the declaration inside the `origin` rule, before the places where it is used, so that the variable is available everywhere else in your grammar. This is similar to the concept of a global variable.

Once you have declared a variable inside `[]`, you can refer to it in other rules just like a normal rule, for example `#myNoun#` or `#myNoun.a#`. The difference is that Tracery expands the value once, when the variable is declared, and every later reference reuses that same result.
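The following toy function sketches this declare-once behavior in plain Python. It is a hypothetical, simplified illustration (not the `tracery` package's code): it only understands declarations of the form `[name:#rule#]`, handles all declarations before any references, and does not support modifiers or nesting.

```python
import random
import re

# Matches a declaration such as "[myNoun:#n#]" and captures ("myNoun", "n").
DECLARATION = re.compile(r"\[(\w+):#(\w+)#\]")
# Matches a reference such as "#myNoun#" and captures "myNoun".
REFERENCE = re.compile(r"#(\w+)#")


def toy_flatten_with_variables(text, rules):
    """Toy illustration of Tracery-style variables (not the real implementation)."""
    variables = {}

    def declare(match):
        name, rule = match.groups()
        # Pick ONE value now and remember it for every later #name#.
        variables[name] = random.choice(rules[rule])
        return ""  # the declaration itself produces no text

    def lookup(match):
        name = match.group(1)
        if name in variables:
            return variables[name]  # reuse the stored value
        return random.choice(rules[name])  # ordinary rule: pick fresh each time

    text = DECLARATION.sub(declare, text)
    return REFERENCE.sub(lookup, text)


rules = {"n": ["rose", "moon", "apple"]}
print(toy_flatten_with_variables("[myNoun:#n#]#myNoun# is #myNoun#", rules))
# Both references print the same word, e.g. "moon is moon".
```

By contrast, `"#n# is #n#"` picks from `n` separately for each reference, so it can produce mismatched pairs such as "rose is moon".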



#### Modifiers

Tracery also supports modifiers, which you can add to a rule reference to make the output more grammatically correct. Commonly used modifiers are:

- `.capitalize`: capitalize the first letter
- `.a`: add `a` or `an` before a word
- `.s`: make a word plural
- `.ed`: put a verb in the simple past tense

To use a modifier, add it after the rule name, inside the `#` signs. For example, to add `.a` to the rule `#noun#`, write `#noun.a#`.

You can also chain multiple modifiers on the same rule. They are applied from left to right, so pay attention to their order. In the example above, we first added `.a` to `#myNoun#` and then `.capitalize`, so that the `a` or `an` is capitalized rather than the noun itself.
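The effect of modifier order can be seen with two plain Python functions. `a` and `capitalize` here are simplified stand-ins written for illustration, not the functions from `tracery.modifiers`; composing them in different orders mimics `#myNoun.a.capitalize#` versus `#myNoun.capitalize.a#`:

```python
VOWELS = ("a", "e", "i", "o", "u")


def a(word):
    """Simplified stand-in for .a: choose 'a' or 'an' by the first letter."""
    return ("an " if word[:1].lower() in VOWELS else "a ") + word


def capitalize(text):
    """Simplified stand-in for .capitalize: uppercase only the first character."""
    return text[:1].upper() + text[1:]


# Like #myNoun.a.capitalize#: .a runs first, then .capitalize.
print(capitalize(a("apple")))  # An apple

# Like #myNoun.capitalize.a#: the noun is capitalized instead of the article.
print(a(capitalize("apple")))  # an Apple
```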

To use modifiers, you first need to import them:



In [44]:
from tracery.modifiers import base_english

After creating your `Grammar` object, call its `.add_modifiers()` method to register the modifiers. Do this before you expand the grammar.

In [None]:
grammar.add_modifiers(base_english)

The following example shows that these modifiers are rather limited: they only apply simple, regular spelling rules, so they get irregular forms wrong (for example think -> thought, hear -> heard, speak -> spoke, fish -> fish, goose -> geese, child -> children). For such words, it is better to hard-code the correct forms in your Tracery grammar.

In [78]:
rules = {
    "origin":"#p# #verb.ed#. #n.s# #verb#",
    "p": ["we","I","she","he", "you"],
    "n": ["fish", "goose","child"],
    "verb": ["think","speak","see","dream","hear"]
}
# this grammar uses the .ed and .s modifiers
grammar = tracery.Grammar(rules)
grammar.add_modifiers(base_english)
for i in range(5):
    print(grammar.flatten("#origin#"))

he dreamed. childs dream
I speaked. gooses see
you seed. fishes speak
you heared. gooses see
you dreamed. gooses see
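One way to hard-code irregular forms is to store them in separate rules. The sketch below builds hypothetical `verbPast` and `nPlural` rules from small lookup tables in plain Python, falling back to the naive `ed`/`s` endings only for words not in the tables; the resulting `rules` dictionary can be passed to `tracery.Grammar` just like in the earlier examples.

```python
# Hand-written lookup tables for irregular forms (a small illustrative sample).
IRREGULAR_PAST = {"think": "thought", "speak": "spoke", "see": "saw", "hear": "heard"}
IRREGULAR_PLURAL = {"fish": "fish", "goose": "geese", "child": "children"}

verbs = ["think", "speak", "see", "dream", "hear"]
nouns = ["fish", "goose", "child"]

rules = {
    "origin": "#p# #verbPast#. #nPlural# #verb#",
    "p": ["we", "I", "she", "he", "you"],
    "verb": verbs,
    # Use the table when possible; add "ed"/"s" only for words missing from it.
    "verbPast": [IRREGULAR_PAST.get(v, v + "ed") for v in verbs],
    "nPlural": [IRREGULAR_PLURAL.get(n, n + "s") for n in nouns],
}

print(rules["verbPast"])  # ['thought', 'spoke', 'saw', 'dreamed', 'heard']
print(rules["nPlural"])   # ['fish', 'geese', 'children']
```

Because the forms are precomputed, no `.ed` or `.s` modifier is needed on these rules.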


---
### External Files
This Python version of Tracery can also run any Tracery grammar stored as a JSON file. You can try it out with `nightvale.json` (an example from the official Tracery website) using the following command:

In [None]:
python -m tracery nightvale.json

Or you can load the JSON file in a Jupyter notebook:

In [93]:
import json

# json.load reads and parses the file; "with" closes it afterwards
with open("nightvale.json") as f:
    rules = json.load(f)

grammar = tracery.Grammar(rules)
grammar.add_modifiers(base_english)

print(grammar.flatten('#origin#'))


Don't long for the river because the river is full of honey. You will be made of honey, too, when the time comes. Welcome to Night Vale....This is a story about a hooded woman. You know, the hooded woman who never moans when they see the earth. Well, I was walking with the hooded woman, when we both saw this sky. Glistening, black...well, more of an indigoish black. We backed away because as we legally know, skies are illegal. That was the last we saw of it. And now, the weather. Music plays. Distorted mariachi vocals reverberates into dissonance. You recall the future and pain. You know, I miss the sky. It was just beautiful. I mean, really perfect, for a sky. When the time comes, I hope it comes back. We'll see it, rising, orange...well, more of a blackish purple. But it'll be back. I guess, in the end. If not, but it hasn't always been that way. ...Goodnight, Night Vale, goodnight.




You can also load lists of words from separate files. A good place to find ready-made word lists is Darius Kazemi's [Corpora](https://github.com/dariusk/corpora/tree/master/data) collection.


In [85]:
with open("list.txt") as f:
    myList = f.read().splitlines()  # one word per line, no trailing empty string
print(myList)

['universe', 'star', 'moon', 'sun', 'cloud', 'stardust', 'asteroid', 'galaxy', 'planet', 'earth', 'comet', 'constellation', 'satellite']


If your list is very long, you can also use just a slice of it in your grammar.

In [95]:
myList[:10] # get the first 10 words from the list

['universe',
 'star',
 'moon',
 'sun',
 'cloud',
 'stardust',
 'asteroid',
 'galaxy',
 'planet',
 'earth']
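Slicing always takes the same words from the top of the file. If you would rather use a different random subset each time, `random.sample` picks items without repeats. The sketch below also skips blank lines while reading; it writes a small stand-in for `list.txt` to a temporary folder so that it runs on its own.

```python
import random
import tempfile
from pathlib import Path

# Create a small stand-in for list.txt, including a blank line and a trailing newline.
path = Path(tempfile.mkdtemp()) / "list.txt"
path.write_text("universe\nstar\n\nmoon\nsun\ncloud\n")

with open(path) as f:
    # strip() removes surrounding whitespace; the condition drops blank lines.
    words = [line.strip() for line in f if line.strip()]

print(words)  # ['universe', 'star', 'moon', 'sun', 'cloud']

# Pick 3 different words at random instead of always taking the first ones.
subset = random.sample(words, 3)
print(subset)
```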


You can also write your Tracery grammar in [the official online editor](https://tracery.io/editor/) and export it as a JSON file to load into your Python program later.
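It also works the other way around: a grammar written as a Python dictionary can be saved as JSON with `json.dump`, so it can be shared with JSON-based tools such as the `python -m tracery` command above. A minimal sketch, using a hypothetical file name `my_grammar.json` in a temporary folder:

```python
import json
import tempfile
from pathlib import Path

rules = {
    "origin": "As we #modal# #verb#",
    "modal": ["may", "might", "could", "can", "must"],
    "verb": ["think", "speak", "see", "dream", "hear"],
}

path = Path(tempfile.mkdtemp()) / "my_grammar.json"

# Write the grammar as JSON; indent=2 keeps the file readable.
with open(path, "w") as f:
    json.dump(rules, f, indent=2)

# Reading it back gives an equal dictionary, ready for tracery.Grammar(...).
with open(path) as f:
    loaded = json.load(f)

print(loaded == rules)  # True
```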

---
#### Assignment 1
Due next Thursday

- Choose a format or language style. For example: song lyrics, a haiku, a news headline, a restaurant menu, or any other sentence-based structure you find interesting. Alternatively, you may create a generative short story.
- Select a set of words that fit your chosen format or theme.
- Write a text generator with Tracery that produces text in this format.
- Generate and print 10 outputs from your generator. (Your grammar should be varied enough to produce 10 unique outputs.)
- Save it as `Assignment1.py` and upload it to your GitHub repository
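To check that your grammar really produces 10 unique outputs, you can keep generating until you have collected 10 different ones. `unique_outputs` below is a hypothetical helper, not part of Tracery; it accepts any function that returns a string, and in your assignment you would pass `lambda: grammar.flatten("#origin#")`. The demo uses a stand-in generator so the snippet runs without Tracery installed.

```python
import random


def unique_outputs(generate, n=10, max_tries=1000):
    """Call generate() until n distinct strings are collected.

    Raises ValueError if n distinct outputs are not found within max_tries
    calls, which suggests the grammar needs more variety.
    """
    seen = []
    for _ in range(max_tries):
        text = generate()
        if text not in seen:
            seen.append(text)
            if len(seen) == n:
                return seen
    raise ValueError(f"only found {len(seen)} unique outputs in {max_tries} tries")


MODALS = ["may", "can"]
VERBS = ["think", "see", "hear", "dream", "speak"]


def demo_generate():
    """Stand-in generator with 2 x 5 = 10 possible sentences."""
    return f"As we {random.choice(MODALS)} {random.choice(VERBS)}"


for line in unique_outputs(demo_generate, n=10):
    print(line)
```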