Source code for the mezzacotta Generator web page and family of random text generators.
Unlike the predecessors, mezzacotta Cafe and mezzacotta Cinematique, this code is designed to be fully generic, extensible, and customisable to any random text generation context, with a common core of categorised vocabulary files.
Contributors are welcome to submit extensions to the source text files and grammar.
- Fork this repository, add your additions, and submit a pull request.
- You can run each subdirectory's index.php to generate a page of random text. Given the random nature of the generator, however, it may take several runs to see the effects of your changes.
Ideas for generators that we want to make are listed on the project wiki page.
- Andrew Coker - Initial work in Python
- David Morgan-Mar - PHP, HTML, and Python modifications
See also the list of contributors who participated in this project.
This project is licensed under the MIT License - see the LICENSE.txt file for details
- PHP 7.
- Python 3. We use Python 3.6 or 3.7.
- The environment variable PYTHONIOENCODING=UTF-8 must be set. This enables reading and printing UTF-8 characters without special handling.
- album.py requires the following Python packages:
  - flickrapi
  - Pillow
- The main executable file is generique.py.
- Core shareable vocabulary files are in the directory vocabulary.
- Other directories contain specific generators, each with a PHP index file, a base grammar file, and context-dependent vocabulary files (of limited use to other generators).
The Python generator code generique.py takes two or more arguments:
- The first N-1 arguments are base vocabulary file name paths, without the .txt extension. These specify which text generators to run. If more than one base file is given, the outputs of the base files are concatenated with a ~~ separator.
- The final argument is the number of lines of text to generate.
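The argument convention above can be sketched as follows. This is a minimal illustration only, not the code in generique.py; the function name parse_args is hypothetical.

```python
def parse_args(argv):
    """Split generique.py-style arguments into base file paths and a line count.

    Hypothetical helper: the real script may validate its arguments differently.
    """
    if len(argv) < 2:
        raise ValueError("expected at least one base file and a line count")
    # Everything except the last argument is a base vocabulary path (no .txt).
    *base_files, num_lines = argv
    return base_files, int(num_lines)


base_files, num_lines = parse_args(["tavern/base", "test/base", "5"])
print(base_files, num_lines)
```

In a real script the list passed in would be sys.argv[1:].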
From a web browser: Place the code into your web server directory, then point your browser at the top level index.php file. Your web server will need to allow shell execution of Python from within PHP. You may need to edit the Python path in the subdirectory index.php files. The PHP code calls the Python generique.py to generate a number of lines, and then formats the returned lines with HTML for browser display.
From the command line: The generique.py script assumes that it is being called from the base generate directory. Choose a generator subdirectory (e.g. <subdir> = test or tavern) and run either:
- python generique.py <subdir>/base <numlines> (n.b. You may need python3 to specify Python 3.)
- cd <subdir> followed by php index.php
All vocabulary files (including the base vocabulary file) are text files assumed to have .txt extensions. Vocabulary files may contain lines starting with the following directives:
- #: Comments, ignored by the parser.
- @format: A format specifier. See "Format specifiers" below.
- @XXXcase: Specifies how returned text from the current file is to be capitalised. See "Case specifiers" below.
- @quoted: Followed by a decimal probability (0 to 1) that strings returned by the current file will be enclosed in quote marks. Useful if you want "scare quotes" around stuff.
- @include: Loads the following named file, as if it were reproduced in full within the current file. The parser checks that the @format specifier is the same, and otherwise throws an error.
- @replace: Specifies a replacement string using the following syntax: @replace|SEARCH|REPLACE. Any instances of SEARCH in the generated text will be replaced with REPLACE. Useful to tweak output that may generate things like "Christmas Day Eve" (@replace|Day Eve|Eve).
- [text]: Text containing normal vocabulary words and/or replacement strings starting with a dollar sign. See "Text replacement" below. # can also be used on the same line to delineate a comment: [text] # this is a comment.
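As a rough illustration of how such lines might be sorted into comments, directives, and text entries, here is a minimal sketch. The function names parse_vocabulary and apply_replacements are hypothetical, and the real parser in generique.py may handle edge cases (such as a literal # inside text) differently.

```python
def parse_vocabulary(lines):
    """Separate directive lines from text entries, dropping comments."""
    directives, entries = [], []
    for raw in lines:
        # Assumption: everything from the first '#' onwards is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("@"):
            directives.append(line)
        else:
            entries.append(line)
    return directives, entries


def apply_replacements(text, directives):
    """Apply every @replace|SEARCH|REPLACE directive to generated text."""
    for directive in directives:
        if directive.startswith("@replace|"):
            _, search, replace = directive.split("|", 2)
            text = text.replace(search, replace)
    return text


sample = ["# holidays", "@format ~", "@replace|Day Eve|Eve", "Christmas Day # a comment"]
directives, entries = parse_vocabulary(sample)
print(entries)
print(apply_replacements("Christmas Day Eve", directives))
```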
Format specifiers are lines starting with @format. They tell the parser the format of inflected forms of words in the file. Examples:
- @format ~ - None of the following text has inflected word forms. Each line is printed as-is.
- @format ~|S - The following lines of text may occur in two different inflected forms: as-is, and a form which defaults to as-is concatenated with a letter 's' at the end. This is typical of nouns, and provides a plural form. (Some nouns pluralise in a different way, see below for details.)
- @format ~|ER|EST - Three inflected forms, suitable for adjectives with comparative and superlative forms. e.g. {smart, smarter, smartest}. The default comparative is formed by adding 'er' to the end of the word, and the default superlative by adding 'est' to the end of the word. (Some adjectives form comparatives and superlatives in a different way, see below for details.)
- @format ~|S|ED|ING - Four inflected forms, suitable for verb tenses. e.g. {walk, walks, walked, walking}. (Some verbs are irregular, see below for details.)
Example text lines in various formats:
- dog - A noun with a regular plural formed simply by appending 's' (dogs). The plural does not need to be specified.
- fox|foxes, mouse|mice, fish|fish - Nouns with an irregular plural.
- |butter - Noun with no plural form (a mass noun). This syntax ignores all inflections and always returns the word as given.
- smart - An adjective with regular comparative and superlative forms (smarter|smartest).
- big|bigger|biggest, bad|worse|worst, orange|more_orange|most_orange - Adjectives with irregular comparative and superlative forms. (bigger and biggest require an extra 'g' to be inserted.)
- walk - A regular verb with fully regular inflection endings (walks|walked|walking).
- love|loves|loved|loving, eat|eats|ate|eating, have|has|had|having - Irregular and irregularly spelt verbs.
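The default-versus-explicit inflection rules above can be sketched as follows. This is an illustration under stated assumptions, not the parser's actual code: the function name expand_forms is hypothetical, and it assumes that any form missing from a line falls back to the base word plus the lower-cased format suffix.

```python
def expand_forms(format_spec, line):
    """Map each inflection name in the @format spec to a word form."""
    names = format_spec.split("|")  # e.g. ["~", "S", "ED", "ING"]
    if line.startswith("|"):
        # Leading pipe: ignore inflections and always return the word as given.
        word = line[1:]
        return {name: word for name in names}
    parts = line.split("|")
    base = parts[0]
    forms = {}
    for i, name in enumerate(names):
        if i < len(parts):
            forms[name] = parts[i]             # form given explicitly
        else:
            forms[name] = base + name.lower()  # assumed default: append suffix
    return forms


print(expand_forms("~|S", "dog"))
print(expand_forms("~|S", "mouse|mice"))
print(expand_forms("~|S|ED|ING", "walk"))
```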
Case specifiers tell the parser how text should be capitalised. By default, the generated text has the first letter capitalised, but no other letters are changed.
A specifier in a given file capitalises the strings it returns (including text generated by called files lower down the recursion stack), but does not affect text generated by calling files (further up the stack). This enables having text generated with subsections having specific capitalisation. e.g. "The adventurers found the Sword of Truth."
The only case specifier supported so far is:
- @titlecase: This specifies that strings returned by the current file will be capitalised in Title Case.
Text lines may contain words starting with dollar signs. These indicate replacement strings. Examples:
- the $noun - The parser returns the word "the" followed by a random selection from the vocabulary file noun.txt in the vocabulary subdirectory.
- the <subdir>/$noun - The parser returns the word "the" followed by a random selection from the vocabulary file noun.txt in the <subdir> subdirectory.
- the $noun_S - The parser returns the word "the" followed by the plural form of a random selection from the vocabulary file noun.txt. (n.b. the plural forms must be defined in noun.txt - they are not automatically generated)
- a $adjective $noun - The parser returns the word "a" followed by a random selection from the vocabulary file adjective.txt followed by a random selection from the vocabulary file noun.txt.
- a $adjective_ER $noun - The parser returns the word "a" followed by the comparative form of a random selection from the vocabulary file adjective.txt followed by a random selection from the vocabulary file noun.txt.
- a $adjective_EST $noun - The parser returns the word "a" followed by the superlative form of a random selection from the vocabulary file adjective.txt followed by a random selection from the vocabulary file noun.txt.
- etc.
The parser recursively generates random strings from referenced files until it bottoms out, and then returns the entire generated string.
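The recursion described above can be sketched with an in-memory vocabulary standing in for the .txt files. Everything here (the VOCAB dictionary, the generate function, the depth limit of 50) is a hypothetical simplification; inflection suffixes and subdirectory paths are left out.

```python
import random
import re

# Stand-in for vocabulary files: name -> list of text lines.
VOCAB = {
    "base": ["the $adjective $noun"],
    "adjective": ["smart", "big"],
    "noun": ["dog", "$dinosaur"],
    "dinosaur": ["diplodocus"],
}

TOKEN = re.compile(r"\$([a-z]+)")


def generate(name, rng, depth=0):
    """Pick a random line from `name` and expand its $references recursively."""
    if depth > 50:
        # Assumed safety limit against self-referencing vocabulary.
        raise RecursionError("replacement nesting too deep")
    line = rng.choice(VOCAB[name])
    return TOKEN.sub(lambda m: generate(m.group(1), rng, depth + 1), line)


print(generate("base", random.Random()))
```

Expansion stops once a chosen line contains no further $references.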
There are also special @-commands that produce substituted text. The N in each command may be any positive integer:
- @randomN(A,B)+C - Generates a random integer using the formula: (N die rolls of (a random integer from A to B inclusive)) + C. As well as +, the following operators are supported: -, *, / (performs integer division).
- @recentyearN - Generates a year number in the past, biased towards more recent years, and substitutes it in place. N is a number, which is interpreted as a scale factor for the logarithmic probability distribution. Most years generated will be within N years of the present, but there is a long low-probability tail.
- @setN(A,B,C...) - Generates a randomly permuted combination of N items from the set A,B,C... (formatted with commas and a penultimate 'and').
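A possible reading of the @random and @set commands is sketched below. The function names, the regular expression, and several details are assumptions rather than the generator's actual behaviour: the N rolls are summed, the operator and C are treated as optional, and no comma is placed before the final 'and'.

```python
import random
import re

RANDOM_CMD = re.compile(r"@random(\d+)\((-?\d+),(-?\d+)\)(?:([+\-*/])(\d+))?")


def eval_random(command, rng):
    """Evaluate a command such as '@random2(1,6)+3' (assumed: rolls are summed)."""
    m = RANDOM_CMD.fullmatch(command)
    if m is None:
        raise ValueError("not a @random command: " + command)
    n, a, b = int(m.group(1)), int(m.group(2)), int(m.group(3))
    total = sum(rng.randint(a, b) for _ in range(n))
    op = m.group(4)
    if op is None:
        return total
    c = int(m.group(5))
    if op == "+":
        return total + c
    if op == "-":
        return total - c
    if op == "*":
        return total * c
    return total // c  # '/' performs integer division


def eval_set(n, items, rng):
    """Pick n distinct items in random order, joined with commas and 'and'."""
    chosen = rng.sample(items, n)
    if n == 1:
        return chosen[0]
    return ", ".join(chosen[:-1]) + " and " + chosen[-1]


rng = random.Random()
print(eval_random("@random2(1,6)+3", rng))
print(eval_set(3, ["ale", "bread", "cheese", "stew"], rng))
```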
Randomly generated strings may be saved so that they can be reused later in the same text. This is useful to refer back to the same random element.
- $noun=FOO - Saves the generated noun into a variable named FOO.
- *FOO - Substitutes the saved value of FOO into the generated string.
Example: Players get a $fruit=FRUIT. If they eat the *FRUIT, they win.
might generate "Players get a banana. If they eat the banana, they win."
Currently it is not possible to save/reuse multiple inflections.
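The save-and-reuse mechanism can be sketched in two passes: first expand and record each $name=VAR, then substitute each *VAR. The VOCAB dictionary and the fill function are hypothetical stand-ins, and the a/an tidy-up pass is not applied here.

```python
import random
import re

VOCAB = {"fruit": ["banana", "cherry", "mango"]}  # stand-in for fruit.txt

SAVE = re.compile(r"\$([a-z]+)=([A-Z]+)")
RECALL = re.compile(r"\*([A-Z]+)")


def fill(template, rng):
    """Expand $name=VAR once, then reuse the saved value wherever *VAR appears."""
    saved = {}

    def save(match):
        value = rng.choice(VOCAB[match.group(1)])
        saved[match.group(2)] = value
        return value

    text = SAVE.sub(save, template)
    return RECALL.sub(lambda m: saved[m.group(1)], text)


template = "Players get a $fruit=FRUIT. If they eat the *FRUIT, they win."
print(fill(template, random.Random()))
```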
As a final pass, there are some automatic string replacements made to tidy up the generated text.
- If the parser generates the word "a" followed by a word that starts with a vowel, it automatically converts "a" to "an".
- Underscores are converted to spaces. (Underscores are sometimes needed in vocabulary files to separate compound words with inflected forms, otherwise the inflection parsing gets confused.)
- Spaces around hyphens are removed. This is so you can specify prefixes and suffixes with hyphens. For example, you can specify super- as an adjective, and have the string super-badger returned without a space after the hyphen.
- Plus signs and spaces around them are removed. This is so you can specify prefixes and suffixes that join onto words without hyphens. For example, you can specify super+ as an adjective, and have the string superbadger returned without a space or hyphen.
- Spaces before any of the punctuation characters . , ? ! ' : ; ) and spaces after an open parenthesis ( are removed.
- Doubled up punctuation is reduced to a single punctuation character. Conflicting punctuation is resolved to the "strongest" character - e.g. .! becomes !.
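Most of these tidy-up rules can be approximated with a few regular expressions. The sketch below is an assumption-laden illustration, not the generator's code: the function name tidy is hypothetical, the order of the passes is a guess, and the doubled-punctuation rule is omitted because the "strength" ordering is not specified here.

```python
import re


def tidy(text):
    """Apply a simplified version of the final clean-up pass."""
    # "a" followed by a word starting with a vowel becomes "an".
    text = re.sub(r"\b([Aa]) (?=[aeiouAEIOU])", r"\1n ", text)
    # Underscores become spaces.
    text = text.replace("_", " ")
    # Remove spaces around hyphens.
    text = re.sub(r"\s*-\s*", "-", text)
    # Remove plus signs together with the spaces around them.
    text = re.sub(r"\s*\+\s*", "", text)
    # Remove spaces before . , ? ! ' : ; )
    text = re.sub(r"\s+([.,?!':;)])", r"\1", text)
    # Remove spaces after an open parenthesis.
    text = re.sub(r"\(\s+", "(", text)
    return text


print(tidy("a super- badger"))
print(tidy("a super+ badger"))
print(tidy("a apple , more_orange ( really ) !"))
```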
Example: We have two text files:
# dinosaur.txt
@format ~|S
diplodocus|diplodocuses
Tyrannosaurus rex|rexes
# mammal.txt
@format ~|S
cat
dog
horse
rabbit
To make a file animal.txt that can select either dinosaurs or mammals, we could do either:
# animal.txt
@format ~|S
$dinosaur
$mammal
which would select a dinosaur 50% of the time and a mammal 50% of the time (i.e. diplodocus and T. Rex have probability 1/4, while cat, dog, horse, and rabbit all have probability 1/8), or:
# animal.txt
@format ~|S
@include dinosaur
@include mammal
which would select each animal with equal probability 1/6.
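The two sets of probabilities quoted above can be checked with exact fractions. The helper names reference_probs and include_probs are hypothetical; they model only the selection arithmetic, not the parser.

```python
from fractions import Fraction

dinosaurs = ["diplodocus", "Tyrannosaurus rex"]
mammals = ["cat", "dog", "horse", "rabbit"]


def reference_probs(groups):
    """$-references: pick a file uniformly, then an entry uniformly within it."""
    probs = {}
    for group in groups:
        for item in group:
            probs[item] = Fraction(1, len(groups)) * Fraction(1, len(group))
    return probs


def include_probs(groups):
    """@include: all entries are merged into one list and picked uniformly."""
    items = [item for group in groups for item in group]
    return {item: Fraction(1, len(items)) for item in items}


print(reference_probs([dinosaurs, mammals]))
print(include_probs([dinosaurs, mammals]))
```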
Text may be preceded by a string of one or more digits. This indicates that the following text will only be included with probability 0.<digit_string>. The script generates a random number between 0 and 1; if it is greater than the probability, the attached text is skipped. This is useful for additional words such as adjectives that you only want to include sometimes, allowing you to tune the frequency of inclusion.
If the text is followed by a > character and alternate text, the text following the > is treated as an "else" string, and returned if the first text is not selected.
Either text can be a replacement string beginning with $ as specified above. The text in a conditional cannot contain spaces; use underscores in place of spaces - they will be replaced with spaces by the parser.
Examples:
- 4really good - 40% chance of returning "really good"; 60% chance of returning "good".
- 4really>very good - 40% chance of returning "really good"; 60% chance of returning "very good".
- 4$adj>really_very good - 40% chance of returning "[random adjective] good"; 60% chance of returning "really very good".
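The conditional syntax can be sketched as a small token resolver. The name resolve and its regular expression are hypothetical; the random roll is passed in as a parameter so the behaviour is easy to follow, and any $reference or underscore in the result is left for later passes.

```python
import re

# digits, then the conditional text, then an optional ">else" part
CONDITIONAL = re.compile(r"^(\d+)([^\s>]+)(?:>(\S+))?$")


def resolve(token, roll):
    """Resolve one whitespace-free token given a random roll in [0, 1)."""
    m = CONDITIONAL.match(token)
    if m is None:
        return token  # not a conditional, return unchanged
    probability = float("0." + m.group(1))
    text, alternative = m.group(2), m.group(3)
    if roll > probability:
        # Text is skipped: fall back to the "else" string, if any.
        return alternative or ""
    return text


print(resolve("4really>very", 0.3))
print(resolve("4really>very", 0.7))
```

In real use the roll would come from a call such as random.random().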
If a line begins with the caret character ^, it is always selected, rather than a random selection being made from that file. Output produced by this selection is surrounded by ##### markers, as a reminder that it has been selected over other random possibilities.
Using the ^ marker on a line that calls itself produces an infinite loop, which the generator will detect and exit after a number of iterations. Try not to do this.
The sample PHP code demonstrates how to call the Python generator and format returned lines into HTML. dish/index.php shows an example of HTML formatting output from multiple base generators in a single Python call.
The sample PHP code does the following replacements:
- Apostrophes are replaced with curly apostrophes. (We may remove this and rely on curly apostrophes in the vocabulary files.)
- Characters &, <, > are replaced with their respective HTML entities: &amp;, &lt;, &gt;.
PHP is also responsible for the HTML page styling, which can be customised for each generator.