testperanto tutorial 1: our first fake words
-------------------------------------------------

Let's begin with the following piece of code.

In [None]:
from testperanto.config import init_wrig, generate_sentences

config = {"grammar": [
            {"rule": "START -> NN"},
            {"rule": "NN -> (@verbatim apple)"},
            {"rule": "NN -> (@verbatim banana)"}
          ]}
grammar = init_wrig(config)
for sent in generate_sentences(grammar, start_state='START', num_to_generate=10):
    print(sent)

This code defines a simple context-free grammar (CFG) and generates some sentences from it. (WRIG is short for weighted random-access indexed grammar, a generalization of a CFG that we will learn more about in these tutorials.) Nonterminals should always start with a capital letter, whereas terminals should be enclosed in parentheses. Syntactically, these terminals are the only departure from a typical CFG. Each terminal should have the format:

    (@vbox vinput)

where ```vbox``` is the name of the "voicebox" we want to use, and ```vinput``` is the input to that voicebox. In ```testperanto```, the role of a voicebox is to translate generic words into specific words, for instance by mapping something like ```noun.52``` to the word apple. The simplest voicebox is ```verbatim```, which outputs its input unchanged.
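To make the terminal format concrete, here is a minimal plain-Python sketch of expanding a grammar like the one above. It is not ```testperanto```'s implementation: the helpers ```parse_rule``` and ```expand``` and the ```VOICEBOXES``` registry are hypothetical, rules are chosen uniformly at random, and only flat terminals such as ```(@verbatim apple)``` are handled.

```python
import random
import re

# Hypothetical tokenizer: a flat terminal "(@vbox vinput)" or a bare symbol.
# Nested terminals such as "(@nn (STEM noun.52) (COUNT sng))" are NOT handled.
TOKEN = re.compile(r"\(@(\w+)\s+([^()]*)\)|(\S+)")

# Hypothetical voicebox registry; "verbatim" returns its input unchanged.
VOICEBOXES = {"verbatim": lambda vinput: vinput}


def parse_rule(rule):
    """Split 'LHS -> RHS' into the LHS and a list of RHS tokens."""
    lhs, rhs = (side.strip() for side in rule.split("->"))
    tokens = []
    for vbox, vinput, symbol in TOKEN.findall(rhs):
        if vbox:
            tokens.append(("terminal", vbox, vinput.strip()))
        elif symbol[0].isupper():
            tokens.append(("nonterminal", symbol))
        else:
            raise ValueError(f"nonterminal must be capitalized: {symbol!r}")
    return lhs, tokens


def expand(rules, symbol, rng):
    """Expand a nonterminal into words, picking uniformly among its rules."""
    options = [rhs for lhs, rhs in rules if lhs == symbol]
    words = []
    for token in rng.choice(options):
        if token[0] == "terminal":
            _, vbox, vinput = token
            words.append(VOICEBOXES[vbox](vinput))
        else:
            words.extend(expand(rules, token[1], rng))
    return words


config = {"grammar": [
    {"rule": "START -> NN"},
    {"rule": "NN -> (@verbatim apple)"},
    {"rule": "NN -> (@verbatim banana)"},
]}
rules = [parse_rule(entry["rule"]) for entry in config["grammar"]]
rng = random.Random(0)
for _ in range(5):
    print(" ".join(expand(rules, "START", rng)))
```

Each printed line is either ```apple``` or ```banana```, as in the notebook cell.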

Applicable rules are chosen randomly, with probability proportional to their weights. By default, every rule has a weight of ```1.0```, so you should have seen roughly equal numbers of apples and bananas. But we can specify different rule weights:

In [None]:
config = {"grammar": [
            {"rule": "START -> NN"},
            {"rule": "NN -> (@verbatim apple)", "base_weight": 0.2},
            {"rule": "NN -> (@verbatim banana)", "base_weight": 0.8}
          ]}
grammar = init_wrig(config)
for sent in generate_sentences(grammar, start_state='START', num_to_generate=10):
    print(sent)
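Choosing a rule "with probability proportional to its weight" amounts to a weighted random draw. The plain-Python sketch below is not ```testperanto``` code (```pick``` and ```nn_rules``` are hypothetical names); it uses the standard library's ```random.choices```, which accepts unnormalized weights.

```python
import random
from collections import Counter

# Hypothetical (output, base_weight) pairs mirroring the two NN rules above.
nn_rules = [("apple", 0.2), ("banana", 0.8)]


def pick(rules, rng):
    """Choose one rule's output with probability proportional to its weight."""
    outputs = [output for output, _ in rules]
    weights = [weight for _, weight in rules]
    # Only the relative sizes of the weights matter; they need not sum to 1.
    return rng.choices(outputs, weights=weights, k=1)[0]


rng = random.Random(42)
counts = Counter(pick(nn_rules, rng) for _ in range(10_000))
print(counts)  # expect roughly 20% apples and 80% bananas
```

Because only the ratio matters, weights of 1 and 4 would give the same 20/80 split as 0.2 and 0.8.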

This should generate more bananas than apples. Rather than choosing words ourselves, we can have ```testperanto``` invent words for us. For instance:

In [None]:
config = {"grammar": [
            {"rule": "START -> NN"},
            {"rule": "NN -> (@nn (STEM noun.52) (COUNT sng))"}
          ]}
grammar = init_wrig(config)
for sent in generate_sentences(grammar, start_state='START', num_to_generate=5):
    print(sent)
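How might a voicebox turn a generic stem like ```noun.52``` into a fake word? One plausible approach, sketched here under our own assumptions (the syllable inventory, the ```FakeLexicon``` class, and its ```lookup``` method are all hypothetical, not ```testperanto```'s word generator), is to invent a random word the first time a stem is requested and cache it, so the same stem is always rendered the same way.

```python
import random

# Hypothetical syllable inventory for invented words.
ONSETS = ["b", "d", "f", "g", "k", "l", "m", "n", "p", "r", "s", "t", "v", "z"]
VOWELS = ["a", "e", "i", "o", "u"]


class FakeLexicon:
    """Map generic stems such as 'noun.52' to invented words, consistently."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.words = {}   # stem id -> invented word
        self.used = set()  # invented words already assigned to some stem

    def _new_word(self):
        while True:
            n_syllables = self.rng.randint(1, 3)
            word = "".join(self.rng.choice(ONSETS) + self.rng.choice(VOWELS)
                           for _ in range(n_syllables))
            if word not in self.used:  # keep distinct stems distinct
                self.used.add(word)
                return word

    def lookup(self, stem):
        # The first request for a stem invents a word; later requests reuse it.
        if stem not in self.words:
            self.words[stem] = self._new_word()
        return self.words[stem]


lexicon = FakeLexicon(seed=0)
print(lexicon.lookup("noun.52"), lexicon.lookup("noun.34"), lexicon.lookup("noun.52"))
```

The first and third words printed are identical because both come from ```noun.52```.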

The default voicebox theme is ```"english"```, so words are rendered with English morphology. For instance, if we ask for a plural noun, the voicebox adds an ```"s"```.

In [None]:
config = {"grammar": [
            {"rule": "START -> NN"},
            {"rule": "NN -> (@nn (STEM noun.34) (COUNT plu))"}
          ]}
grammar = init_wrig(config)
for sent in generate_sentences(grammar, start_state='START', num_to_generate=5):
    print(sent)
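The features inside a terminal such as ```(@nn (STEM noun.34) (COUNT plu))``` can be read as key-value pairs. Below is a hypothetical sketch of parsing them and applying the plural rule just described; ```parse_terminal```, ```render_noun```, and the one-entry ```LEXICON``` are illustrative stand-ins, not ```testperanto``` functions.

```python
import re

# Matches the start of a terminal, capturing the voicebox name (e.g. "nn").
VBOX = re.compile(r"\(@(\w+)")
# Matches each "(KEY value)" feature pair inside the terminal.
FEATURE = re.compile(r"\((\w+)\s+([^()\s]+)\)")

# Hypothetical lexicon mapping a generic stem to an invented word.
LEXICON = {"noun.34": "blorp"}


def parse_terminal(terminal):
    """Return the voicebox name and a dict of features for one terminal."""
    vbox = VBOX.match(terminal).group(1)
    return vbox, dict(FEATURE.findall(terminal))


def render_noun(features, lexicon):
    """Look up the stem, then add -s when COUNT is plu (toy English rule)."""
    word = lexicon[features["STEM"]]
    return word + "s" if features.get("COUNT") == "plu" else word


vbox, features = parse_terminal("(@nn (STEM noun.34) (COUNT plu))")
print(vbox, features)                  # nn {'STEM': 'noun.34', 'COUNT': 'plu'}
print(render_noun(features, LEXICON))  # blorps
```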

The ```"english"``` theme gives verbs the suffix ```-ize``` and supports some simple tenses and conjugations.

In [None]:
config = {"grammar": [
            {"rule": "START -> VB"},
            {"rule": "VB -> (@verbatim present:) (@vb (STEM verb.281) (COUNT sng) (PERSON 3) (TENSE present))"},
            {"rule": "VB -> (@verbatim perfect:) (@vb (STEM verb.281) (COUNT sng) (PERSON 3) (TENSE perfect))"}
          ]}
grammar = init_wrig(config)
for sent in generate_sentences(grammar, start_state='START', num_to_generate=10):
    print(sent)
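To see how the ```COUNT```, ```PERSON```, and ```TENSE``` features could combine, here is a toy conjugator for an invented ```-ize``` verb. The function ```conjugate```, the root ```blorp```, and the choice of ```has```/```have``` plus a ```-d``` participle for the perfect are our own assumptions; ```testperanto```'s actual output may differ.

```python
def conjugate(root, count, person, tense):
    """Toy English conjugation for an invented verb ending in -ize.

    Assumed rules: present tense adds -s in the third person singular;
    perfect uses 'has'/'have' plus the -d participle.
    """
    verb = root + "ize"
    third_sg = count == "sng" and int(person) == 3  # person may be "3" or 3
    if tense == "present":
        return verb + "s" if third_sg else verb
    if tense == "perfect":
        aux = "has" if third_sg else "have"
        return f"{aux} {verb}d"
    raise ValueError(f"unsupported tense: {tense!r}")


print("present:", conjugate("blorp", "sng", 3, "present"))  # present: blorpizes
print("perfect:", conjugate("blorp", "sng", 3, "perfect"))  # perfect: has blorpized
```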

As an example of an alternative theme, ```testperanto``` also provides a stub of a Romanized ```"japanese"``` theme. Note that we don't need to recreate the grammar; we simply render the same generic words with a different voicebox theme.

In [None]:
import testperanto.wordgenerators
import testperanto.voicebox
for sent in generate_sentences(grammar, start_state='START', num_to_generate=10, vbox_theme="japanese"):
    print(sent)
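The idea that a derivation is independent of the theme can be sketched as follows: store the generic words and their features once, then render them through whichever theme we like. Everything here (```THEMES```, ```render```, the invented words, and the choice to leave nouns uninflected in the toy Japanese theme) is hypothetical and only mirrors the ```vbox_theme``` switch shown above.

```python
# Generic words plus features: this part does not depend on any theme.
derivation = [
    ("nn", {"STEM": "noun.52", "COUNT": "sng"}),
    ("nn", {"STEM": "noun.34", "COUNT": "plu"}),
]


def english_nn(word, features):
    # Toy English rule: plural nouns take -s.
    return word + "s" if features.get("COUNT") == "plu" else word


def japanese_nn(word, features):
    # Japanese nouns are not normally marked for plural number.
    return word


# Hypothetical themes: each supplies its own lexicon and noun inflection.
THEMES = {
    "english": {"lexicon": {"noun.52": "flim", "noun.34": "blorp"}, "nn": english_nn},
    "japanese": {"lexicon": {"noun.52": "kanuro", "noun.34": "tomaki"}, "nn": japanese_nn},
}


def render(derivation, theme_name):
    """Render each generic word with the chosen theme's lexicon and inflection."""
    theme = THEMES[theme_name]
    words = []
    for vbox, features in derivation:
        word = theme["lexicon"][features["STEM"]]
        words.append(theme[vbox](word, features))
    return " ".join(words)


for name in THEMES:
    print(name, "->", render(derivation, name))
# english -> flim blorps
# japanese -> kanuro tomaki
```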