Corpora

What are they?

The pantheon-generator package comes with 6 corpora. Each one is a collection of tokenized texts from which related words will be pulled when generating your pantheon. Think of them as Gene Pools. In alphabetical order they are:

Erotica
Fairytales
Fantasy
Mathematics
Plants and Animals
Sci-Fi

Find example output for each pool here.

How Tos...

Pick your Pantheon's Gene Pool
Define a new Gene Pool
Add your own source material

Pick your Pantheon's Gene Pool

Before you generate any Gods, call tokens.pick_gene_pool() and pass it the name of a corpus. A corpus's name is the name of the directory that contains it, e.g. "sci-fi".

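import tokens
# God and pantheon are assumed to be imported from the package already
# Choose the corpus before creating any Gods
tokens.pick_gene_pool("sci-fi")
# Two parent Gods seed the pantheon
egg_donor = God("art", "philosophy", "XX")
sperm_donor = God("war", "diplomacy", "XY")
pantheon = pantheon(egg_donor, sperm_donor)
pantheon.spawn(5)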
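
Because a corpus is named after its directory, a typo in the name is easy to make. The sketch below shows one way to list the candidate names and validate a choice before calling tokens.pick_gene_pool(). The helpers list_gene_pools() and check_gene_pool() are hypothetical and not part of the package, and the directory that holds the corpus directories is passed in as an argument because its location is an assumption here.

```python
from pathlib import Path


def list_gene_pools(corpora_root):
    """Return the names of the sub-directories of corpora_root, sorted."""
    root = Path(corpora_root)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def check_gene_pool(name, corpora_root):
    """Raise ValueError early if name is not an existing corpus directory."""
    available = list_gene_pools(corpora_root)
    if name not in available:
        raise ValueError(f"Unknown gene pool {name!r}; choose one of {available}")
    return name
```

If the check passes, the returned name can be handed to tokens.pick_gene_pool() as shown above.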

Define a new Gene Pool

A Gene Pool is a collection of tokenized texts. There are ~40 tokenized texts in the /data/corpora/ directory. By combining them in new ways you can produce new Gene Pools. For example, you could combine culinary-poisons.json with dutch-fairy-tales.json and deductive-logic.json to produce a Gene Pool named "eclectic". Here's how:

From a terminal, cd to the pantheon directory and start the Python interpreter.
Declare a list containing the JSON files you want to include in your corpus.
Generate a tokens directory and sources.txt file for your corpus using make_tokens_dir().
Generate the primary tokens list, which consists of plural nouns.
Generate the secondary or "mutant" tokens list, which consists of gerunds.

Here's the code:

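from tokens import *
# List the tokenized texts available to combine
list_tokenized_texts()
# The JSON files to include in the new corpus
sources = ["culinary-poisons.json", "dutch-fairy-tales.json", "deductive-logic.json"]
# Create the tokens directory and sources.txt for "eclectic"
make_tokens_dir("eclectic", sources)
# Secondary ("mutant") tokens: gerunds, tagged VBG
make_tokens_list("eclectic", ["VBG"])
# Primary tokens: plural nouns, tagged NNS
make_tokens_list("eclectic", ["NNS"])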

Your corpus is ready. Select it with tokens.set_tokens_lists(<dirname>), where <dirname> is the name you gave the corpus, here "eclectic".
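
The "NNS" and "VBG" arguments are Penn Treebank part-of-speech tags: NNS marks plural nouns and VBG marks gerunds (and present participles). To illustrate what selecting by tag means, here is a minimal sketch that filters a hand-written list of (word, tag) pairs. It is not the package's implementation; the tagged_words data and the filter_by_tags() helper are made up for this example, and the real JSON format of the tokenized texts may differ.

```python
# Hypothetical pre-tagged input: (word, Penn Treebank tag) pairs.
tagged_words = [
    ("wolves", "NNS"),
    ("poisoning", "VBG"),
    ("castle", "NN"),
    ("witches", "NNS"),
    ("deducing", "VBG"),
    ("quickly", "RB"),
]


def filter_by_tags(tagged, wanted_tags):
    """Keep words whose tag is in wanted_tags, preserving order and dropping repeats."""
    seen = set()
    result = []
    for word, tag in tagged:
        if tag in wanted_tags and word not in seen:
            seen.add(word)
            result.append(word)
    return result


primary = filter_by_tags(tagged_words, ["NNS"])  # plural nouns
mutant = filter_by_tags(tagged_words, ["VBG"])   # gerunds / present participles
print(primary)  # ['wolves', 'witches']
print(mutant)   # ['poisoning', 'deducing']
```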

Add your own source material

You can add your own sources to the /data/corpora directory. Here's how:

Download .txt files into /data/corpora.
From a terminal, cd to the pantheon directory and start the Python interpreter.
Import tokens and call tokenize_texts() to detect the new files automatically and tokenize them into JSON files.

Here's the code:

from tokens import *
# Detect the new .txt files and tokenize them
tokenize_texts()

Your tokenized texts are now available to be included in a new corpus.
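
Before running tokenize_texts(), it can help to see which .txt files have not been tokenized yet. The sketch below assumes that a source named example.txt is tokenized into example.json in the same directory. That naming rule is an assumption, not something the package documents here, and find_untokenized() is a hypothetical helper.

```python
from pathlib import Path


def find_untokenized(corpora_dir):
    """Return names of .txt files in corpora_dir with no same-named .json beside them."""
    root = Path(corpora_dir)
    return sorted(
        txt.name
        for txt in root.glob("*.txt")
        if not txt.with_suffix(".json").exists()
    )
```

Calling find_untokenized() with the path to your corpora directory returns the files that would still need tokenizing under that assumption.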
