testperanto tutorial 1: our first fake words
-------------------------------------------------

Let's begin with the following piece of code.

In [5]:
from testperanto.config import init_grammar_macro, generate_sentences

config = {"grammar": [
            {"rule": "START -> NN"},
            {"rule": "NN -> (@verbatim apple)"},
            {"rule": "NN -> (@verbatim banana)"}
          ]}
grammar = init_grammar_macro(config)
for sent in generate_sentences(grammar, start_state='START', num_to_generate=10):
    print(sent)

banana
apple
banana
banana
apple
banana
apple
apple
apple
apple





This defines (and generates some sentences from) a simple context-free grammar (CFG). Nonterminals should always start with a capital letter, whereas terminals should be enclosed in parentheses. Syntactically, these terminals are the only departure from a typical CFG. Terminals should be expressed in the format:

    (@vbox vinput)

where ```vbox``` is the name of the "voicebox" we want to use, and ```vinput``` is the input to the voicebox. In ```testperanto```, the role of the voicebox is to translate generic words into specific words, for instance by mapping something like ```noun.52``` to the word ```apple```. The most straightforward voicebox is the ```verbatim``` voicebox, which simply renders its input verbatim.
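To make the rule format concrete, here is a toy sampler in plain Python for the apple/banana grammar above. It is a hypothetical sketch, not how ```testperanto``` is implemented: the helpers ```parse_rule```, ```render_terminal``` and ```expand``` are invented for illustration, and only a ```verbatim```-style voicebox is supported. It shows the two pieces described above: nonterminals are rewritten by choosing a rule, and each ```(@vbox vinput)``` terminal is handed to a voicebox.

```python
import random
import re

# Toy illustration only: these helpers are hypothetical, not testperanto's API.
# A terminal has the form (@vbox vinput); a nonterminal is a capitalized token.
SYMBOL = re.compile(r"\(@\w+ [^()]*\)|[A-Z]\w*")
TERMINAL = re.compile(r"\(@(\w+) ([^()]*)\)")


def parse_rule(rule):
    """Split a rule string such as 'NN -> (@verbatim apple)' into (lhs, rhs_symbols)."""
    lhs, rhs = (side.strip() for side in rule.split("->"))
    return lhs, SYMBOL.findall(rhs)


def render_terminal(symbol):
    """Send vinput to the named voicebox; this sketch only knows 'verbatim'."""
    vbox, vinput = TERMINAL.fullmatch(symbol).groups()
    if vbox != "verbatim":
        raise ValueError(f"voicebox not supported in this sketch: {vbox}")
    return vinput  # the verbatim voicebox returns its input unchanged


def expand(grammar, symbol, rng):
    """Recursively rewrite symbol until only rendered words remain."""
    if symbol.startswith("(@"):
        return [render_terminal(symbol)]
    rhs = rng.choice(grammar[symbol])  # uniform: every rule has the same default weight
    return [word for sub in rhs for word in expand(grammar, sub, rng)]


rules = ["START -> NN", "NN -> (@verbatim apple)", "NN -> (@verbatim banana)"]
grammar = {}
for rule in rules:
    lhs, rhs = parse_rule(rule)
    grammar.setdefault(lhs, []).append(rhs)

rng = random.Random(0)
for _ in range(5):
    print(" ".join(expand(grammar, "START", rng)))
```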

Applicable rules are chosen randomly, with probability proportional to their weights. By default, every rule has the same weight of ```1.0```, so you should have seen a roughly equal proportion of apples and bananas. But we can specify different rule weights:

In [6]:
config = {"grammar": [
            {"rule": "START -> NN"},
            {"rule": "NN -> (@verbatim apple)", "base_weight": 0.2},
            {"rule": "NN -> (@verbatim banana)", "base_weight": 0.8}
          ]}
grammar = init_grammar_macro(config)
for sent in generate_sentences(grammar, start_state='START', num_to_generate=10):
    print(sent)

banana
apple
apple
banana
banana
apple
banana
banana
banana
banana
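"Probability proportional to their weights" means each ```NN``` rule is picked with probability ```base_weight / (sum of the base_weights)```, so here apple should appear about 20% of the time and banana about 80%. Ten samples are too few to see this clearly. The following sketch, which uses Python's ```random.choices``` rather than ```testperanto``` itself (the ```choose``` helper is hypothetical), draws many samples and prints the observed proportions.

```python
import random
from collections import Counter

# Hypothetical sketch of weight-proportional rule selection; not testperanto's code.
# Each entry pairs a rule's output with its base_weight, as in the config above.
nn_rules = [("apple", 0.2), ("banana", 0.8)]


def choose(rules, rng):
    """Pick one rule with probability base_weight / (sum of all base_weights)."""
    options, weights = zip(*rules)
    return rng.choices(options, weights=weights, k=1)[0]


rng = random.Random(42)
counts = Counter(choose(nn_rules, rng) for _ in range(10_000))
for word, n in counts.most_common():
    print(word, n / 10_000)  # roughly 0.8 for banana and 0.2 for apple
```

Because ```random.choices``` normalizes the weights, base weights of ```2``` and ```8``` would give the same 20/80 split in this sketch; only the ratio between weights matters.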





This should generate more bananas than apples. Rather than choosing words ourselves, we can also get ```testperanto``` to come up with words for us. For instance:

In [7]:
config = {"grammar": [
            {"rule": "START -> NN"},
            {"rule": "NN -> (@nn (STEM noun.52) (COUNT sng))"}
          ]}
grammar = init_grammar_macro(config)
for sent in generate_sentences(grammar, start_state='START', num_to_generate=5):
    print(sent)

dagudun
dagudun
dagudun
dagudun
dagudun





The default voicebox theme is ```"english"```, which uses English morphology. For instance, if we ask for a plural noun, it adds an ```"s"```.

In [8]:
config = {"grammar": [
            {"rule": "START -> NN"},
            {"rule": "NN -> (@nn (STEM noun.34) (COUNT plu))"}
          ]}
grammar = init_grammar_macro(config)
for sent in generate_sentences(grammar, start_state='START', num_to_generate=5):
    print(sent)

flaglojals
flaglojals
flaglojals
flaglojals
flaglojals
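The ```(COUNT sng)``` / ```(COUNT plu)``` arguments are features that the voicebox turns into morphology. A minimal sketch of the English plural rule described above could look like this; ```inflect_noun``` is a hypothetical helper, and the stem ```flaglojal``` is simply inferred by removing the ```"s"``` from the output above. Real English pluralization has many more cases, which this toy rule ignores.

```python
# Hypothetical sketch of an "english"-style noun inflector; the feature values
# mirror (COUNT sng) / (COUNT plu) above, but the function itself is invented.
def inflect_noun(stem_word, count):
    """Add the plural suffix -s for COUNT=plu; leave singular nouns unchanged."""
    if count == "sng":
        return stem_word
    if count == "plu":
        return stem_word + "s"
    raise ValueError(f"unknown COUNT value: {count}")


print(inflect_noun("flaglojal", "sng"))  # flaglojal
print(inflect_noun("flaglojal", "plu"))  # flaglojals
```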





The ```"english"``` theme ends verbs with the suffix ```-ize```, and it can handle some simple tenses and conjugations.

In [9]:
config = {"grammar": [
            {"rule": "START -> VB"},
            {"rule": "VB -> (@verbatim present:) (@vb (STEM verb.281) (COUNT sng) (PERSON 3) (TENSE present))"},
            {"rule": "VB -> (@verbatim perfect:) (@vb (STEM verb.281) (COUNT sng) (PERSON 3) (TENSE perfect))"}
          ]}
grammar = init_grammar_macro(config)
for sent in generate_sentences(grammar, start_state='START', num_to_generate=10):
    print(sent)

present: meekanizes
present: meekanizes
present: meekanizes
perfect: meekanized
present: meekanizes
present: meekanizes
perfect: meekanized
perfect: meekanized
perfect: meekanized
present: meekanizes
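The verb outputs combine three things: an invented stem (apparently ```meekan```), the theme's ```-ize``` suffix, and a tense/agreement ending. Here is a toy conjugator that reproduces the two forms shown above. ```conjugate``` is a hypothetical helper, not ```testperanto```'s API, and the non-third-person present form is an assumed English-like extrapolation that the example above does not exercise.

```python
# Hypothetical sketch of "english"-style verb conjugation: stem + "iz" + ending.
# Only the third-person singular present and the perfect appear in the output above.
def conjugate(stem_word, count, person, tense):
    base = stem_word + "iz"
    if tense == "present":
        # 3rd-person singular takes -es (meekaniz + es -> meekanizes);
        # other persons -> "-ize" is an assumption, not shown in the tutorial.
        return base + ("es" if (count, person) == ("sng", 3) else "e")
    if tense == "perfect":
        return base + "ed"  # meekaniz + ed -> meekanized
    raise ValueError(f"tense not covered by this sketch: {tense}")


print(conjugate("meekan", "sng", 3, "present"))  # meekanizes
print(conjugate("meekan", "sng", 3, "perfect"))  # meekanized
```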





As an example of an alternative theme, ```testperanto``` also provides the stub of a Romanized ```"japanese"``` theme. Note that we don't need to recreate the grammar; we just render the generic words with a different voicebox theme.

In [64]:
import testperanto.wordgenerators
import testperanto.voicebox
for sent in generate_sentences(grammar, start_state='START', num_to_generate=10, vbox_theme="japanese"):
    print(sent)

present: reratuhemasu
perfect: reratuhemashita
perfect: reratuhemashita
present: reratuhemasu
perfect: reratuhemashita
present: reratuhemasu
perfect: reratuhemashita
present: reratuhemasu
present: reratuhemasu
present: reratuhemasu
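The idea behind themes is that the grammar only produces generic words plus features (```verb.281```, ```TENSE present```, and so on), and the theme decides both which invented word a stem becomes and which endings express the features. The sketch below illustrates that split with two hard-coded tables whose entries are read off the outputs above (```meekan``` + ```-izes```/```-ized```, ```reratuhe``` + ```-masu```/```-mashita```). ```LEXICONS```, ```SUFFIXES``` and ```render_verb``` are hypothetical names, not ```testperanto```'s data structures.

```python
# Hypothetical sketch: one generic verb rendered through two themes.
# Table contents are inferred from the tutorial outputs, not taken from testperanto.
LEXICONS = {
    "english": {"verb.281": "meekan"},
    "japanese": {"verb.281": "reratuhe"},
}
SUFFIXES = {
    "english": {"present": "izes", "perfect": "ized"},  # 3rd-person singular only
    "japanese": {"present": "masu", "perfect": "mashita"},
}


def render_verb(stem, tense, theme):
    """Look up the theme's word for a generic stem, then add the theme's tense ending."""
    return LEXICONS[theme][stem] + SUFFIXES[theme][tense]


for theme in ("english", "japanese"):
    for tense in ("present", "perfect"):
        print(f"{theme} {tense}: {render_verb('verb.281', tense, theme)}")
```

Swapping the theme changes only the table lookups; the generic input ```verb.281``` with its tense feature stays the same, which mirrors why the grammar above did not need to be rebuilt.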



