# Speculative magic words workbook

By [Allison Parrish](http://www.decontextualize.com/)

(Early draft, incomplete, under construction gif here)

The goal of this notebook is to demonstrate some computational means for exploring the literary genre of the *magic word*. For present purposes, I define a "magic word" as a string of letters that affords a foregrounding of its material properties (e.g., spelling, pronunciation), and suggests some effect beyond meaning alone. The underlying assumption (maybe faulty) is that magic words with similar material properties will also have similar effects, and that by writing computer programs to produce magic words (whether from whole cloth or as variants on other magic words), we can produce *new* magic words with *new* effects.

I don't understand this notebook as a way of *casting* spells, but merely as a way of investigating potential forms. Hence: *speculative* magic words.

The notebook serves as a demonstration of (1) Python string manipulation techniques; and (2) the Pincelate library for grapheme-to-phoneme and phoneme-to-grapheme translation.

Some of these examples will be data-driven, i.e., we need an existing corpus of words. [Download this file](https://github.com/dariusk/corpora/blob/master/data/words/nouns.json) into the same folder as this notebook like so:

In [332]:
!curl -L -O https://raw.githubusercontent.com/dariusk/corpora/master/data/words/nouns.json

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 18192  100 18192    0     0  94259      0 --:--:-- --:--:-- --:--:-- 94750


In [1]:
import json
# open the file in a `with` block so it is closed once the JSON is loaded
with open("nouns.json") as f:
    nouns = [item.lower() for item in json.load(f)['nouns']]

In [2]:
import random

In [3]:
random.choice(nouns)

'juror'

## Orthographic variations

> "[W]riting gave physical permanence to words.... Written words continued to act in one's behalf long after the sound of spoken words had ceased" (Skemer 133)

> "Motion terminates at no other end save its own beginning, in order to cease and rest in it... In the intelligible world... Grammar begins with the letter, from which all writing is derived and into which it is all resolved" (John Scotus Erigena, quoted in Leggott 46)

> "[T]he unit of textual meaning—the letter—lacks meaning itself. The alphabet's semantic vacuum represents a threat to orthodoxy, for into this space competing meaning systems may rush." (Crain 18)

The words in many apotropaic charms exhibit certain kinds of manipulation that we can characterize as *orthographic* in nature—i.e., they have to do with the letters in the words. In this section of the notebook, I show some computer code for performing these transformations explicitly.

The following cell defines a short text that we'll use for testing purposes:

In [165]:
text = "in the beginning was the notebook"

### Cacography

> "In medieval manuscripts, the letters themselves were frequently a source of confusion. [...] The first letters of words can be omitted... while others are doubled up.... Words can be dislocated," "compounded," "contracted," "abbreviated"; "letters vanish. [...] [W]e should also mention the variations made with uppercase and lowercase letters.... May this overview give the reader a small idea of the difficulties encountered by the researcher!" (Lecouteux xxi)

"Cacography" here means writing with mistakes. In medieval grimoires, mistakes were usually introduced as errors in copying, but the presence of errors actually made people perceive the spells as more powerful. We can simulate these errors in Python.

#### Compounding/contracting words

This operation "contracts" two words, smooshing together the first and last parts.

In [166]:
noun1 = random.choice(nouns)
noun2 = random.choice(nouns)

In [167]:
print(noun1, noun2)

words dominion


In [168]:
noun1[:int(len(noun1)/2)] + noun2[int(len(noun2)/2):]

'wonion'

In function form:

In [169]:
def smoosh(a, b):
    return a[:int(len(a)/2)] + b[int(len(b)/2):]
smoosh("allison", "parrish")

'allrish'

#### Dislocation

This operation inserts random spaces, dislocating words from each other.

In [22]:
out = ""
for ch in text:
    if random.random() < 0.1:
        out += " "
    out += ch
print(out)

in the beginning  was the jupy ter notebook


As a function:

In [23]:
def dislocate(s, prob=0.1):
    out = ""
    for ch in s:
        if random.random() < prob:
            out += " "
        out += ch
    return out
dislocate("abracadabra")

'abracadabra'

In [24]:
dislocate("abracadabra", 0.75)

' ab r aca d a b r a'
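
The Lecouteux passage quoted above also mentions omitted first letters, doubled letters, and shifts between uppercase and lowercase. Here's a minimal sketch of those three operations. The function names and the default probabilities are my own choices for illustration, not anything taken from the sources:

```python
import random

def drop_first_letter(word):
    """Omit the first letter of a word (one-letter words are left alone)."""
    return word[1:] if len(word) > 1 else word

def double_letters(s, prob=0.1, rng=random):
    """Double each alphabetic character with probability `prob`."""
    out = []
    for ch in s:
        out.append(ch)
        if ch.isalpha() and rng.random() < prob:
            out.append(ch)
    return "".join(out)

def scramble_case(s, prob=0.5, rng=random):
    """Uppercase each character with probability `prob`."""
    return "".join(ch.upper() if rng.random() < prob else ch for ch in s)

text = "in the beginning was the notebook"
print(" ".join(drop_first_letter(w) for w in text.split()))
print(double_letters(text, prob=0.2))
print(scramble_case(text))
```

Like `dislocate()`, the last two functions are random, so you'll see different output on every run.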

### Coding, transliteration, encryption

#### Character ciphers

In [170]:
def replace_by_char(s, ch_map):
    out = ""
    for ch in s:
        if ch in ch_map:
            out += ch_map[ch]
        else:
            out += ch
    return out

In [171]:
nextch_map = {
    'a': 'b', 'b': 'c', 'c': 'd', 'd': 'e',
    'e': 'f', 'f': 'g', 'g': 'h', 'h': 'i',
    'i': 'j', 'j': 'k', 'k': 'l', 'l': 'm',
    'm': 'n', 'n': 'o', 'o': 'p', 'p': 'q',
    'q': 'r', 'r': 's', 's': 't', 't': 'u',
    'u': 'v', 'v': 'w', 'w': 'x', 'x': 'y',
    'y': 'z', 'z': 'a'
}

In [172]:
replace_by_char("allison parrish", nextch_map)

'bmmjtpo qbssjti'

In [173]:
import codecs
codecs.encode("allison parrish", 'rot13')

'nyyvfba cneevfu'
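
Typing out a 26-entry dictionary by hand gets old fast. As a rough sketch, the same kind of map can be built for any shift amount. The names `shift_map` and `shift_cipher` are made up for this example:

```python
import string

def shift_map(n):
    """Map each lowercase letter to the letter n places later, wrapping at z."""
    letters = string.ascii_lowercase
    return {ch: letters[(i + n) % 26] for i, ch in enumerate(letters)}

def shift_cipher(s, n):
    """Shift lowercase a-z by n places; every other character passes through."""
    return s.translate(str.maketrans(shift_map(n)))

print(shift_cipher("allison parrish", 1))
print(shift_cipher("allison parrish", 13))
```

A shift of 1 should reproduce the `nextch_map` result, and a shift of 13 should match the `rot13` codec for lowercase input.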

#### Mirror writing

> "According to legend, some devil-pacts were written in retrograde to invoke diabolical powers. [...] Artists depicted retrograde writing as demonic. In a 15th c. block book, a demon is shown holding up a tablet on which the sins of the dying man's life are recorded in mirror writing..." (Skemer 121)

In [174]:
# from https://github.com/combatwombat/Lunicode.js/blob/master/lunicode.js
mirror_replacements = {
    'a': 'ɒ', 'b': 'd', 'c': 'ɔ', 'd': 'b', 'e': 'ɘ', 
    'f': 'Ꮈ', 'g': 'ǫ', 'h': 'ʜ', 'i': 'i', 'j': 'ꞁ',
    'k': 'ʞ', 'l': 'l', 'm': 'm', 'n': 'ᴎ', 'o': 'o',
    'p': 'q', 'q': 'p', 'r': 'ɿ', 's': 'ꙅ', 't': 'ƚ',
    'u': 'u', 'v': 'v', 'w': 'w', 'x': 'x', 'y': 'ʏ', 'z': 'ƹ',
    'A': 'A', 'B': 'ᙠ', 'C': 'Ɔ', 'D': 'ᗡ', 'E': 'Ǝ',
    'F': 'ꟻ', 'G': 'Ꭾ', 'H': 'H', 'I': 'I', 'J': 'Ⴑ',
    'K': '⋊', 'L': '⅃', 'M': 'M', 'N': 'Ͷ', 'O': 'O',
    'P': 'ꟼ', 'Q': 'Ọ', 'R': 'Я', 'S': 'Ꙅ', 'T': 'T',
    'U': 'U', 'V': 'V', 'W': 'W', 'X': 'X', 'Y': 'Y', 'Z': 'Ƹ'}

In [175]:
print(text + " " + replace_by_char(text, mirror_replacements))

in the beginning was the notebook iᴎ ƚʜɘ dɘǫiᴎᴎiᴎǫ wɒꙅ ƚʜɘ ᴎoƚɘdooʞ
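
The cell above swaps each letter for a mirrored look-alike but keeps the letters in their original left-to-right order. Text held up to a mirror would also run in the opposite direction. Here's a small sketch that does both steps, using a tiny stand-in map (only four letters) so that it runs on its own. Passing in the full `mirror_replacements` dictionary should work the same way:

```python
# A four-letter subset of a mirror map, for illustration only.
mirror_subset = {'b': 'd', 'd': 'b', 'p': 'q', 'q': 'p'}

def mirror_write(s, ch_map):
    """Reverse the string, then swap each character for its mirror form."""
    return "".join(ch_map.get(ch, ch) for ch in reversed(s))

print(mirror_write("pod", mirror_subset))
```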


#### Mimicking handwriting mistakes and misinterpretations

In [176]:
# suggested in Lecouteux, p. xxi
replacements = {
    'u': ['o', 'n'],
    'st': ['h'],
    'p': ['f'],
    'ni': ['m'],
    'rn': ['m'],
    'in': ['m'],
    'iu': ['m', 'in'],
    'r': ['t', 'z', 'c'],
    'l': ['t'],
    'c': ['t'],
    'd': ['ol']
}

In [177]:
import re
import random

This has to be a little different, because the patterns on the left have varying numbers of characters. So we can't just step straight through the source string character by character.

In [179]:
out = text
for patt, repl in replacements.items():
    out = re.sub(patt,
                 lambda m: random.choice(repl) if random.random() < 0.5 else m.group(),
                 out)
print(text)
print(out)

in the beginning was the notebook
in the beginnmg was the notebook
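
One thing to notice about the loop above: the patterns are applied one after another, so a later pattern can rewrite the output of an earlier one (an `r` that became `c` can then become `t`). If you'd rather have each stretch of the source text miscopied at most once, one option is to combine all of the patterns into a single regular expression. This is a sketch of that alternative, not the approach used elsewhere in this notebook, and the name `miscopy` is my own:

```python
import random
import re

def miscopy(s, replacements, prob=0.5, rng=random):
    """Apply scribal-style substitutions in one left-to-right pass."""
    # Longest patterns first, so a two-letter pattern wins over a
    # one-letter pattern that starts at the same position.
    patterns = sorted(replacements, key=len, reverse=True)
    regex = re.compile("|".join(re.escape(p) for p in patterns))

    def substitute(match):
        # Replace the matched text with probability `prob`.
        if rng.random() < prob:
            return rng.choice(replacements[match.group()])
        return match.group()

    return regex.sub(substitute, s)

sample = {'ni': ['m'], 'in': ['m'], 'st': ['h'], 'd': ['ol']}
print(miscopy("in the beginning was the notebook", sample))
```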


### Abbreviations

> [In magic spells] "we find sequences of letters that can be the initials of words. [...] A passage from the *Gesta Imperatorum* suggests this; in fact we read there the sequence "P P P, S S S, R R R, F F F," meaning, "Pater patriae perditur, sapientia secum sustollitur, ruunt regna Rome ferro, flamma, fame." The series of letters would therefore be a mnemonic means used to retain whole phrases, but in charms it also serves as a way to keep things secret..." (Lecouteux xx)

In [180]:
def abbrev(s, take=1):
    words = s.split()
    return [w[:take] for w in words]
abbrev("hello there how are you?")

['h', 't', 'h', 'a', 'y']

In [181]:
abbrev(text, 2)

['in', 'th', 'be', 'wa', 'th', 'no']

In [182]:
print(''.join(abbrev(text, 2)))

inthbewathno


In [183]:
init_cap = [item.capitalize() for item in abbrev(text, 2)]
print('. '.join(init_cap))

In. Th. Be. Wa. Th. No


### Formatting

According to Skemer, magic words and formulas such as *abracadabra* and *abraxas* were "often written as diminishing and augmenting series of letters"—shaped in "inverted triangles" or "[mandorlas](https://en.wikipedia.org/wiki/Mandorla)" (116).

In [184]:
def triangle(s):
    out = []
    for i in range(len(s)):
        snippet = s[:i+1]
        out.append(snippet)
    return out
print("\n".join(triangle("abracadabra")))
print("\n".join(reversed(triangle("abracadabra"))))

a
ab
abr
abra
abrac
abraca
abracad
abracada
abracadab
abracadabr
abracadabra
abracadabra
abracadabr
abracadab
abracada
abracad
abraca
abrac
abra
abr
ab
a


In [185]:
def mandorla(s):
    return triangle(s)[:-1] + list(reversed(triangle(s)))

In [186]:
print("\n".join(mandorla("abracadabra")))

a
ab
abr
abra
abrac
abraca
abracad
abracada
abracadab
abracadabr
abracadabra
abracadabr
abracadab
abracada
abracad
abraca
abrac
abra
abr
ab
a


In [187]:
from IPython.display import display, HTML

In [188]:
html_src = "<div style='text-align: center'>"
html_src += "<br>".join(mandorla("abracadabra"))
html_src += "</div>"

In [189]:
display(HTML(html_src))

In [190]:
html_src = "<div style='text-align: center'>"
html_src += "<br>".join(mandorla("abracadabra" + replace_by_char("abracadabra", mirror_replacements)))
html_src += "</div>"

In [191]:
display(HTML(html_src))
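
If you're working outside of a notebook (or just want plain text), Python's built-in `str.center()` gets you a similar shape without any HTML. A minimal sketch follows. `triangle()` and `mandorla()` are redefined here only so the example runs on its own, and `centered()` is a name I made up:

```python
def triangle(s):
    """Return the prefixes of s, from shortest to longest."""
    return [s[:i + 1] for i in range(len(s))]

def mandorla(s):
    """Grow to the full word, then shrink back down."""
    return triangle(s)[:-1] + list(reversed(triangle(s)))

def centered(lines):
    """Pad every line to the width of the longest one, centering it."""
    width = max(len(line) for line in lines)
    return [line.center(width) for line in lines]

print("\n".join(centered(mandorla("abracadabra"))))
```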

### Word squares

The [Sator Square](https://en.wikipedia.org/wiki/Sator_Square):

    S A T O R
    A R E P O
    T E N E T
    O P E R A
    R O T A S
    
"Arepo the sower guides the wheels by his work" (Skemer's translation, pp. 116–117), an example of an apotropaic formula that "clearly worked best in writing" (134).

In [192]:
def gen_str(n, alphabet):
    return ''.join([random.choice(alphabet) for i in range(n)])
gen_str(5, alphabet="abcdefghijklmnopqrstuvwxyz")

'knogk'

In [193]:
gen_str(5, alphabet="abracadabra")

'aarad'

In [194]:
def gen_square(n, alphabet='abcdefghijklmnopqrstuvwxyz', start=None):
    if start is None:
        rows = [gen_str(n, alphabet)]
    else:
        assert len(start) == n
        rows = [start]
    for i in range((n - 1) // 2):  # same as int(n/2) for odd n; avoids an extra row for even n
        beg = ""
        end = ""
        for j in range(i+1):
            beg += rows[j][i+1]
            end += rows[j][-i-2]
        row = beg + gen_str(n - ((i+1)*2), alphabet) + ''.join(reversed(end))
        rows.append(row)
    return rows + list(reversed([''.join(reversed(s)) for s in rows[:int(n/2)]]))

In [195]:
print("\n".join(gen_square(5)))

xoqgr
olfug
qfifq
guflo
rgqox


In [52]:
print("\n".join(gen_square(5, alphabet="satorarepotenet", start="sator")))

sator
aarto
trtrt
otraa
rotas


In [53]:
print("\n".join(gen_square(7, start="allison")))

allison
lfyinwo
lybxgns
iixcxii
sngxbyl
owniyfl
nosilla


In [54]:
print()
print("\n".join(gen_square(5, alphabet="😀😄😁😆😅😂🤣😊😙😗😘🥰😍😌😉🙃🙂😇😚😋😛😝😜🤨🧐🤓😎")))


😍🤓😝😛😂
🤓😀😌🤨😛
😝😌😄😌😝
😛🤨😌😀🤓
😂😛😝🤓😍
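
To check that a generated square really has the Sator Square's two properties (it reads the same across and down, and the same backwards from the bottom-right corner), here's a quick sketch. The function names are hypothetical helpers for this notebook:

```python
def is_symmetric(rows):
    """True if the square reads the same across and down."""
    n = len(rows)
    if any(len(row) != n for row in rows):
        return False
    return all(rows[i][j] == rows[j][i] for i in range(n) for j in range(n))

def is_reversible(rows):
    """True if the square is unchanged when rotated by 180 degrees."""
    return [row[::-1] for row in reversed(rows)] == list(rows)

sator = ["sator", "arepo", "tenet", "opera", "rotas"]
print(is_symmetric(sator), is_reversible(sator))
```

Both checks should come back `True` for the Sator Square and for any list of rows returned by `gen_square()`.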


### Numerology

> "Gematria was based on the fact that, in Hebrew, numbers are indicated by letters; this means that each Hebrew word can be given a numerical value, calculated by summing numbers represented by its letters. This allows mystic relations to be established between words having different meanings though identical numerical values..." (Eco 28)

In [55]:
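# only works for the English alphabet (a-z); any other character counts as 0
def letter_value(ch):
    # isalpha() alone is also True for letters like 'é', so check isascii() too
    if not (ch.isascii() and ch.isalpha()):
        return 0
    return ord(ch.lower()) - 96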
letter_value('a')

1

In [56]:
def gematriesque(s):
    return sum([letter_value(ch) for ch in s])
gematriesque('allison')

82

In [57]:
from collections import defaultdict
by_sum = defaultdict(list)
word_to_sum = {}

In [58]:
for item in nouns:
    letter_sum = gematriesque(item)
    word_to_sum[item] = letter_sum
    by_sum[letter_sum].append(item)

In [59]:
by_sum[72]

['ambulance',
 'carrier',
 'dawning',
 'discord',
 'homeland',
 'lifeline',
 'mayor',
 'sending',
 'tendon',
 'tracing']

In [60]:
gematriesque('allison')

82

In [61]:
print("\n".join(by_sum[gematriesque('allison')]))

frenchman
apartheid
artisan
bowling
colors
conflict
glucose
gusto
hallway
indecency
innocence
juror
kangaroo
melodrama
panther
volcano
voltage


## The sound

> "[T]hose who are skilled in the use of incantations, relate that the utterance of the same incantation in its proper language can accomplish what the spell professes to do; but when translated into any other tongue, it is observed to become inefficacious and feeble. And thus it is not the things signified, but the qualities and peculiarities of words, which possess a certain power for this or that purpose..."—Origen (in Richardson and Pick 406–407)

> "The rhyme, repetition and alliteration of charms produced a sonorous effect that appealed to users and had psychological effects. [...] Words in a sacralized and euphonious language like Latin could be soothing to the ear and thus might seem to have an immediate magical effect. [...] Vocalized reading [was] better able to deter evil spirits" (Skemer 153)

> "I don't think I can breathe / With the way you let me down [...] / I don't need the words / I want the sound, sound, sound..." (Jepsen)

A recurring concept with magic words is that *what they sound like matters*. So it would be nice if we had some way to compose magic words based solely on their phonetics. The problem (in English, at least) is creating the *written* form of a word from its sound—i.e., spelling.

Pincelate is a Python library that provides a simple interface for a machine learning model that can sound out English words and spell English words based on how they sound. "Sounding out" here means converting letters ("orthography") to sounds ("phonemes"), and "spelling" means converting sounds to letters (phonemes to orthography). The model is trained on the [CMU Pronouncing Dictionary](http://www.speech.cs.cmu.edu/cgi-bin/cmudict), which means it generally sounds words out as though speaking "standard" North American English, and spells words according to "standard" North American English rules (at least as far as the model itself is accurate).

### Installing Pincelate

You need to install the `tensorflow` and `pincelate` modules. Open up a terminal window and type the following lines:

    pip install tensorflow
    pip install pincelate
    
If you're not using Anaconda, you might also need to install a few other libraries:

    pip install numpy scipy
    
Now import the libraries:

In [62]:
import numpy as np
import pronouncing as pr

Now import Pincelate and instantiate a Pincelate object. (This will load the pre-trained model provided with the package.)

In [63]:
from pincelate import Pincelate

Using TensorFlow backend.


In [64]:
pin = Pincelate()

Later in the notebook, I'm going to use some of Jupyter Notebook's interactive features, so I'll import the libraries here:

In [66]:
import ipywidgets as widgets
from IPython.display import display
from ipywidgets import interact, interactive_output, Layout, HBox, VBox

### Sounding out and spelling

Pincelate is a machine learning model trained on the [CMU Pronouncing Dictionary](http://www.speech.cs.cmu.edu/cgi-bin/cmudict), a database of more than 130,000 English words along with their pronunciations. To get the pronunciation of a word:

In [68]:
pin.soundout("mimsy")

['M', 'IH1', 'M', 'S', 'IY0']

... and to produce a plausible spelling for a word whose sounds you just made up, use the `.spell()` method, passing it a list of Arpabet phonemes:

In [69]:
pin.spell(['B', 'L', 'AH1', 'R', 'F'])

'blurf'

It's important to note that Pincelate's `.soundout()` method will *only* work with letters that appear in the CMU Pronouncing Dictionary's vocabulary. (You need to use lowercase letters only.) So the following will throw an error:

In [70]:
pin.soundout("étui")

KeyError: 'é'

#### Spelling words from random phonemes

Let's invent somewhat plausible neologisms by drawing phonemes at random from the list of Arpabet phonemes. ("Neologism" is a fancy word for "made-up word.") Here's a list of all of the phonemes in the CMU Pronouncing Dictionary, plus examples in use:

        Phoneme Example Translation
        ------- ------- -----------
        AA      odd     AA D
        AE      at      AE T
        AH      hut     HH AH T
        AO      ought   AO T
        AW      cow     K AW
        AY      hide    HH AY D
        B       be      B IY
        CH      cheese  CH IY Z
        D       dee     D IY
        DH      thee    DH IY
        EH      Ed      EH D
        ER      hurt    HH ER T
        EY      ate     EY T
        F       fee     F IY
        G       green   G R IY N
        HH      he      HH IY
        IH      it      IH T
        IY      eat     IY T
        JH      gee     JH IY
        K       key     K IY
        L       lee     L IY
        M       me      M IY
        N       knee    N IY
        NG      ping    P IH NG
        OW      oat     OW T
        OY      toy     T OY
        P       pee     P IY
        R       read    R IY D
        S       sea     S IY
        SH      she     SH IY
        T       tea     T IY
        TH      theta   TH EY T AH
        UH      hood    HH UH D
        UW      two     T UW
        V       vee     V IY
        W       we      W IY
        Y       yield   Y IY L D
        Z       zee     Z IY
        ZH      seizure S IY ZH ER

The cell below has a Python list containing all of these phonemes:

In [71]:
all_phonemes = ['AH', 'N', 'S', 'IH', 'L', 'T', 'R', 'K', 'IY', 'D', 'M',
                'ER', 'Z', 'EH', 'AA', 'AE', 'B', 'P', 'OW', 'F', 'EY',
                'G', 'AO', 'AY', 'V', 'NG', 'UW', 'HH', 'W', 'SH', 'JH',
                'Y', 'CH', 'AW', 'TH', 'UH', 'OY', 'DH', 'ZH']

And then this function will return a random neologism, created from phonemes drawn at random from that list:

In [72]:
def neologism_phonemes():
    return [random.choice(all_phonemes) for item in range(random.randrange(3,10))]

Here's a handful, just to get a taste:

In [73]:
for i in range(5):
    print(neologism_phonemes())

['AA', 'AO', 'Y']
['AE', 'EY', 'UH', 'AE']
['CH', 'AE', 'W', 'SH', 'M']
['Z', 'N', 'IY', 'HH', 'UH', 'G']
['TH', 'TH', 'S', 'Z', 'K', 'M']


That's all well and good! Try sounding out some of these on your own (consult the [Arpabet](https://en.wikipedia.org/wiki/ARPABET) table to find the English sound corresponding to each symbol).

But how do you *spell* these neologisms? Why, with Pincelate's `.spell()` method of course:

In [75]:
pin.spell(neologism_phonemes())

'shoeeaili'

Here's a for loop that generates neologisms and prints them along with their spellings:

In [76]:
for i in range(12):
    phonemes = neologism_phonemes()
    print(pin.spell(phonemes), phonemes)

augued ['AO', 'EY', 'K', 'UW', 'D']
ttunbey ['TH', 'UW', 'N', 'B', 'EY']
kchung ['CH', 'NG', 'F', 'HH']
jzye ['ZH', 'SH', 'EY']
gnong ['G', 'N', 'F', 'N']
ororei ['OW', 'R', 'AO', 'R', 'AY']
eouch ['IH', 'AW', 'CH']
aussings ['AW', 'S', 'Y', 'NG', 'S']
dhm ['HH', 'D', 'M']
ashowachel ['AH', 'ZH', 'AW', 'AO', 'CH', 'IH', 'L']
apsuiou ['AA', 'P', 'ZH', 'UW', 'IY', 'EH', 'UW']
evepothm ['EY', 'EH', 'V', 'AH', 'P', 'M', 'TH']
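
Purely random phonemes often land on clusters like `TH TH S Z K M` that are tough to say out loud. One simple variation (my own suggestion, not a Pincelate feature) is to alternate consonants and vowels. The vowel/consonant split below is my sorting of the 39 phonemes in the table above:

```python
import random

# Vowel/consonant split of the 39 CMU phonemes listed above.
vowels = ['AA', 'AE', 'AH', 'AO', 'AW', 'AY', 'EH', 'ER', 'EY',
          'IH', 'IY', 'OW', 'OY', 'UH', 'UW']
consonants = ['B', 'CH', 'D', 'DH', 'F', 'G', 'HH', 'JH', 'K', 'L', 'M',
              'N', 'NG', 'P', 'R', 'S', 'SH', 'T', 'TH', 'V', 'W', 'Y',
              'Z', 'ZH']

def alternating_phonemes(min_len=3, max_len=9):
    """Return random phonemes that alternate between consonants and vowels."""
    length = random.randint(min_len, max_len)
    start_with_vowel = random.random() < 0.5
    out = []
    for i in range(length):
        # Even positions take the starting type, odd positions the other.
        pick_vowel = (i % 2 == 0) == start_with_vowel
        out.append(random.choice(vowels if pick_vowel else consonants))
    return out

for _ in range(5):
    print(alternating_phonemes())
```

The resulting lists can be handed to `pin.spell()` in the same way as the output of `neologism_phonemes()`.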


### Phoneme features

The examples above use the phoneme as the basic unit of English phonetics. But each phoneme itself has characteristics, and many phonemes have characteristics in common. For example, the phoneme `/B/` has the following characteristics:

* *bilabial*: you put your lips together when you say it
* *stop*: airflow from the lungs is completely obstructed
* *voiced*: your vocal cords are vibrating while you say it

The phoneme `/P/` shares two out of three of these characteristics (it's *bilabial* and a *stop*, but is not voiced). The phoneme `/AE/`, on the other hand, shares *none* of these characteristics. Instead, it has these characteristics:

* *vowel*: your mouth doesn't stop or occlude airflow when making this sound
* *low*: your tongue is low in the mouth
* *front*: your tongue is advanced forward in the mouth
* *unrounded*: your lips are not rounded

These characteristics of phonemes are traditionally called "features." You can look up the features for particular phonemes using the `phone_feature_map` variable in Pincelate's `featurephone` module:

In [77]:
from pincelate.featurephone import phone_feature_map

For example, to get the features for the vowel `/UW/` (vowel sound in "toot"):

In [78]:
phone_feature_map['UW']

('hgh', 'bck', 'rnd', 'vwl')

The features are referred to here with short three-letter abbreviations. Here's a full list:

* `alv`: alveolar
* `apr`: approximant
* `bck`: back
* `blb`: bilabial
* `cnt`: central
* `dnt`: dental
* `fnt`: front
* `frc`: fricative
* `glt`: glottal
* `hgh`: high
* `lat`: lateral
* `lbd`: labiodental
* `lbv`: labiovelar
* `lmd`: low-mid
* `low`: low
* `mid`: mid
* `nas`: nasal
* `pal`: palatal
* `pla`: palato-alveolar
* `rnd`: rounded
* `rzd`: rhoticized
* `smh`: semi-high
* `stp`: stop
* `umd`: upper-mid
* `unr`: unrounded
* `vcd`: voiced
* `vel`: velar
* `vls`: voiceless
* `vwl`: vowel

Additionally, there are two special phoneme features:

* `beg`: beginning of word
* `end`: end of word

... which are found at the beginnings and endings of words.

Internally, Pincelate's model operates on these *phoneme features*, instead of directly on whole phonemes. This allows the model to capture and predict underlying similarities between phonemes.
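
To make the idea of "features in common" concrete, here's a tiny sketch using Python sets. The dictionary below is copied by hand from the three phonemes described in the prose above, using the three-letter abbreviations. It is a toy stand-in, not Pincelate's actual `phone_feature_map`:

```python
# Hand-copied from the descriptions above; not read from Pincelate itself.
toy_features = {
    'B': {'blb', 'stp', 'vcd'},
    'P': {'blb', 'stp', 'vls'},
    'AE': {'vwl', 'low', 'fnt', 'unr'},
}

def shared_features(a, b, feature_map=toy_features):
    """Return the set of features that two phonemes have in common."""
    return feature_map[a] & feature_map[b]

print(sorted(shared_features('B', 'P')))
print(sorted(shared_features('B', 'AE')))
```

`/B/` and `/P/` overlap on two features, while `/B/` and `/AE/` overlap on none, so the second line printed is an empty list.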

Pincelate's `.phonemefeatures()` method works a lot like `.soundout()`, except instead of returning a list of phonemes, it returns a [numpy](https://numpy.org/) array of *phoneme feature probabilities*. This array has one row for each predicted phoneme and one column for each phoneme feature, holding the probability (between 0 and 1) that the feature is a component of that phoneme. To illustrate, here I get the feature array for the word `cat`:

In [79]:
cat_feats = pin.phonemefeatures("cat")

This array has the following shape:

In [80]:
cat_feats.shape

(5, 32)

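... which tells us that there are five predicted phonemes. (The `32` is the total number of possible features. That count includes a `str` feature that shows up in the output further down but is missing from the list above; it appears to mark a stressed vowel.) The word `cat`, of course, has only three phonemes (`/K AE T/`); the extra two are the special "beginning of the word" and "end of the word" phonemes at the beginning and end, respectively.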

### Examining predicted phoneme features

Let's look at the feature probabilities for the first phoneme (after the special "beginning of the word" token at index 0):

In [81]:
cat_feats[1]

array([6.42707571e-04, 2.13692928e-07, 6.62605757e-08, 5.43442347e-10,
       5.47038814e-09, 7.04440527e-06, 1.58982238e-09, 1.66211791e-08,
       3.81101599e-05, 8.24350354e-05, 1.62252746e-07, 5.46323768e-08,
       1.41502560e-10, 5.33169420e-09, 7.31331828e-10, 2.70081146e-05,
       1.83614669e-04, 1.62359720e-05, 2.74244065e-11, 1.44446346e-07,
       3.33543511e-07, 1.91042790e-08, 3.52445828e-09, 4.54965146e-07,
       9.99929667e-01, 7.26780854e-05, 8.35576885e-10, 2.66875286e-04,
       1.75827936e-05, 9.99930263e-01, 9.99974251e-01, 1.87013138e-04])

You can look up the index in this array associated with a particular phoneme feature using Pincelate's `.featureidx()` method:

In [82]:
cat_feats[1][pin.featureidx('vel')]

0.9999302625656128

This tells us that the `vel` (velar) feature for this phoneme is predicted with almost 100% probability, which makes sense, since the phoneme we'd anticipate, `/K/`, is a voiceless velar stop.

The following bit of code steps through each row in this array and prints out the phoneme features with the highest probability in that row, using numpy's `argsort` function:

In [83]:
def idxfeature(pin, idx):
    return pin.orth2phon.target_vocab[idx]
for i, phon in enumerate(cat_feats):
    print("phoneme", i)
    for idx in np.argsort(phon)[::-1][:5]:
        print(idxfeature(pin, idx), phon[idx])
    print()

phoneme 0
beg 1.0
vwl 0.0
vls 0.0
apr 0.0
bck 0.0

phoneme 1
vls 0.999974250793457
vel 0.9999302625656128
stp 0.999929666519165
alv 0.000642707571387291
unr 0.00026687528588809073

phoneme 2
unr 0.9997866749763489
vwl 0.9990422129631042
str 0.9986899495124817
fnt 0.9959463477134705
low 0.9807271957397461

phoneme 3
vls 0.9993033409118652
alv 0.9990631937980652
stp 0.9904974102973938
frc 0.0036416002549231052
end 0.0013078120537102222

phoneme 4
end 0.9997904896736145
fnt 0.0006787743768654764
vwl 0.000589678471442312
unr 0.0005248847301118076
str 0.0003406509349588305
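
The expression `np.argsort(phon)[::-1][:5]` in the cell above packs three steps together: sort the column indices by probability (lowest first), reverse that order, and keep the first five. If that reads as cryptic, here's a plain-Python sketch of the same idea. The feature names and numbers are made up for illustration:

```python
def top_k(row, names, k=5):
    """Return the k (name, probability) pairs with the highest probability."""
    # Sort the indices of the row by their value, highest first.
    order = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
    return [(names[i], row[i]) for i in order[:k]]

# Made-up numbers, for illustration only.
toy_names = ['alv', 'stp', 'vel', 'vls', 'vwl']
toy_row = [0.0006, 0.9999, 0.9993, 0.99997, 0.0001]
print(top_k(toy_row, toy_names, k=3))
```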



We'll come back to a more complete example that shows how to *manipulate* these values below.

### Example: Resizing feature probability arrays

Once you have the phonetic feature probability arrays, you can treat them the same way you'd treat any other numpy array. One thing I like to do is use scipy's image manipulation functions to resample the phonetic feature arrays. This lets us use the same phonetic information to spell a shorter or longer word. In particular, `scipy.ndimage` (`scipy.ndimage.interpolation` in older releases) has a handy [zoom](https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.ndimage.interpolation.zoom.html) function that resamples an array and interpolates it. Normally you'd use this to resize an image, but nothing's stopping us from using it to resize our phonetic feature array.

First, import the function:

In [84]:
from scipy.ndimage import zoom  # the scipy.ndimage.interpolation namespace is deprecated in newer SciPy

Then get some phoneme feature probabilities:

In [85]:
feats = pin.phonemefeatures("alphabet")

Then resize with `zoom()`. The second parameter to `zoom()` is a tuple with the factor by which to scale the dimensions of the incoming array. We only want to scale along the first axis (i.e., the phonemes), keeping the second axis (i.e., the features) constant.

A shorter version of the word:

In [86]:
shorter = zoom(feats, (0.67, 1))
pin.spellfeatures(shorter)

'albaigh'

A longer version:

In [87]:
longer = zoom(feats, (2.0, 1))
pin.spellfeatures(longer)

'all-phafebet'
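
To get an intuition for what resizing along the first axis does, here's a deliberately crude sketch in plain Python that stretches or shrinks a list of rows by repeating or skipping them. This is *not* how `zoom()` works internally: by default `zoom()` uses spline interpolation, which blends neighboring rows instead of copying them, and its index arithmetic differs. Treat `resample_rows()` (a name I made up) as a rough analogy for the `order=0` case only:

```python
def resample_rows(rows, factor):
    """Stretch or shrink a list of rows by repeating or skipping rows."""
    new_len = max(1, round(len(rows) * factor))
    # Each output row copies the input row at the scaled-back position.
    return [rows[min(int(i / factor), len(rows) - 1)] for i in range(new_len)]

# Phoneme labels stand in for rows of feature probabilities.
toy = [['K'], ['AE'], ['T']]
print(resample_rows(toy, 2.0))
print(resample_rows(toy, 0.67))
```

Doubling the length repeats every row once, and shrinking to two thirds drops the last row in this toy case.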

In [88]:
def stretch_words(s, factor=1.0):
    out = []
    for word in s.split():
        word = word.lower()
        vec = pin.phonemefeatures(word)
        if factor < 1.0:
            order = 3
        else:
            order = 0
        zoomed = zoom(vec, (factor, 1), order=order)
        out.append(pin.spellfeatures(zoomed))
    return " ".join(out)
stretch_words("this is a test", factor=1.5)



'theothists aise ah ttestsed'

If you've downloaded this notebook and you're following along running the code, the following cell will create an interactive widget that lets you "stretch" and "shrink" the words that you type into the text box by dragging the slider.

In [89]:
import warnings
warnings.filterwarnings('ignore')
@interact(words="in the beginning was the notebook", factor=(0.1, 4.0, 0.1))
def stretchy(words, factor=1.0):
    print(stretch_words(words, factor))

interactive(children=(Text(value='in the beginning was the notebook', description='words'), FloatSlider(value=…

### Round-trip spelling manipulation

Pincelate actually consists of *two* models: one that knows how to sound out words based on how they're spelled, and another that knows how to spell words from sounds. Pincelate's `.manipulate()` method does a "round trip" re-spelling of a word, passing it through both models to arrive back at the original word. Try it out:

In [102]:
pin.manipulate("poetic")

'poetic'

On the surface, this isn't very interesting! You don't need Pincelate to tell you how to spell a word that you already know how to spell. But the `.manipulate()` method has a handful of parameters that allow you to mess around with the model's internal workings in fun and interesting ways. The first is the `temperature` parameter, which artificially increases or decreases the amount of randomness in the model's output probabilities.

#### Spelling temperature

When the temperature is close to zero, the model will always pick the most likely spelling of the word at each step.

In [103]:
pin.manipulate("poetic", temperature=0.01)

'poetic'

As you increase the temperature to 1.0, the model starts picking values at random according to the underlying probabilities.

In [104]:
pin.manipulate("poetic", temperature=1.0)

'poetic'

At temperatures above 1.0, the model has a higher chance of picking from letters with lower probabilities, producing a more unlikely spelling:

In [105]:
pin.manipulate("poetic", temperature=1.5)

'poettecke'

At a high enough temperature, the model's spelling feels essentially random:

In [106]:
pin.manipulate("poetic", temperature=3.0)

'niquit'
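
If you're curious what "temperature" does to a set of probabilities, here's a generic sketch of temperature scaling applied to a toy three-way choice. This shows the general technique as it's commonly described. I'm not claiming it matches Pincelate's internals line for line, and the numbers are invented:

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a probability distribution by a temperature and renormalize."""
    # Work in log space: divide each log-probability by the temperature.
    scaled = [math.log(p) / temperature for p in probs]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Invented probabilities for three candidate letters.
toy = [0.7, 0.2, 0.1]
for t in (0.1, 1.0, 3.0):
    print(t, [round(p, 3) for p in apply_temperature(toy, t)])
```

At a low temperature nearly all of the probability piles onto the most likely option. At 1.0 the distribution is unchanged, and at higher temperatures it flattens out, which is why unlikely letters start showing up.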

The following interactive widget lets you play with the `temperature` parameter:

In [107]:
@interact(s="your text here", temp=(0.05, 2.5, 0.05))
def tempadjust(s, temp):
    return ' '.join([pin.manipulate(w.lower(), temperature=temp) for w in s.split()])

interactive(children=(Text(value='your text here', description='s'), FloatSlider(value=1.2500000000000002, des…

#### Manipulating letter and phoneme frequencies

The `manipulate` method can take two other parameters: `letters` and `features`. These are dictionaries that map letters or phonetic features to exponential multipliers. When Pincelate is spelling the word, it uses these multipliers to adjust the probability of the corresponding letters in the output. Somewhat unintuitively, positive values reduce the corresponding probability, while negative values increase the probability.

Here's an example to make it clearer. First: respelling a word without the letter `e`:

In [108]:
pin.manipulate("spelling", letters={'e': 10})

'spilling'

Let's do this for a set of randomly selected words from the noun list:

In [112]:
for noun in random.sample(nouns, 10):
    print(noun, pin.manipulate(noun, letters={'e': 20}))

technology taknology
billing billing
righteousness rightions
wardrobe wardrob
bonding bonding
underwear undorwhir
definition difinition
ballet ballatt
formality formality
posting posting


The `features` parameter does the same thing, except it adjusts the probability of particular phoneme features at each step. For example, this makes words more nasal:

In [122]:
pin.manipulate("spelling", features={'nas': -10})

'scnenning'

The following code makes all of the vowels more rounded and further back in the mouth in a list of random nouns:

In [123]:
for noun in random.sample(nouns, 10):
    print(noun, pin.manipulate(noun, features={'bck': -2, 'rnd': -5}))

transmitter traunswaumor
sloth slouthough
inauguration unauguatuon
tracing twaussinug
grocer grossur
culprit kulpurut
witchcraft wocksfaud
detention deutanuon
larceny laursunu
annuity anuunu
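
The sign convention (positive values suppress, negative values boost) is easier to see with numbers. Here's a toy sketch of one way an "exponential multiplier" could behave: each probability is scaled by `exp(-value)` and the results are renormalized. I haven't checked this against Pincelate's source, so treat the formula as an assumption chosen to match the behavior described above, and the function name `bias` as hypothetical:

```python
import math

def bias(probs, multipliers):
    """Scale each option's probability by exp(-multiplier), then renormalize."""
    # Options without a multiplier get exp(0) == 1, i.e. no change.
    weights = {k: p * math.exp(-multipliers.get(k, 0.0))
               for k, p in probs.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Invented probabilities for three candidate letters.
toy = {'e': 0.6, 'i': 0.3, 'a': 0.1}
print(bias(toy, {'e': 10}))    # a positive value makes 'e' very unlikely
print(bias(toy, {'a': -10}))   # a negative value makes 'a' very likely
```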


#### Interactive manipulation tool

The following cells make an interactive tool you can use to play around with temperature, letter probabilities, and phoneme feature probabilities.

In [124]:
import ipywidgets as widgets
from IPython.display import display
from ipywidgets import interact, interactive_output, Layout, HBox, VBox

In [125]:
def manipulate(instr="allison", temp=0.25, **kwargs):
    return ' '.join([
        pin.manipulate(
            w,
            letters={k: v*-1 for k, v in kwargs.items()
                  if k in pin.orth2phon.src_vocab_idx_map.keys()},
            features={k: v*-1 for k, v in kwargs.items()
                      if k in pin.orth2phon.target_vocab_idx_map.keys()},
            temperature=temp
        ) for w in instr.split()]
    )

In [126]:
orth_sliders = {}
phon_sliders = {}
for ch in pin.orth2phon.src_vocab_idx_map.keys():
    if ch in "'-.": continue
    orth_sliders[ch] = widgets.FloatSlider(description=ch,
                               continuous_update=False,
                               value=0,
                               min=-20,
                               max=20,
                               step=0.5,
                               layout=Layout(height="10px"))
for feat in pin.orth2phon.target_vocab_idx_map.keys():
    if feat in ("beg", "end", "cnt", "dnt"): continue
    phon_sliders[feat] = widgets.FloatSlider(description=feat,
                               continuous_update=False,
                               value=0,
                               min=-20,
                               max=20,
                               step=0.5,
                               layout=Layout(height="10px"))
instr = widgets.Text(description='input', value="spelling words with machine learning")
tempslider = widgets.FloatSlider(description='temp', continuous_update=False, value=0.3, min=0.01, max=5, step=0.05)
left_box = VBox(tuple(orth_sliders.values()) + (tempslider,))
right_box = VBox(tuple(phon_sliders.values()))
all_sliders = HBox([left_box, right_box])

out = interactive_output(lambda *args, **kwargs: print(manipulate(*args, **kwargs)),
                         dict(instr=instr, temp=tempslider, **orth_sliders, **phon_sliders))
out.layout.height = "100px"
display(VBox([all_sliders, instr]), out)

VBox(children=(HBox(children=(VBox(children=(FloatSlider(value=0.0, continuous_update=False, description='$', …

Output(layout=Layout(height='100px'))

### Phonetic states

The Pincelate model also produces a "hidden state," which is a single fixed-size vector that represents the total sound of a word. (You can think of this as a point on a Cartesian plane, where words with similar sounds are clustered next to each other.) To get the hidden state of a word, call the `.phonemestate()` method:

In [129]:
pin.phonemestate('abracadabra')

array([ 7.95686364e-01, -8.32179904e-01, -1.32981718e+00,  7.25831270e-01,
       -2.64316416e+00,  1.57794631e+00, -1.49719226e+00,  2.60457993e+00,
       -3.31631720e-01, -6.20785542e-02, -1.07942343e+00, -9.35500801e-01,
        1.13087571e+00, -2.40438804e-02, -3.28609198e-01,  2.97865009e+00,
        5.29175103e-01,  1.03818035e+00, -1.86510909e+00,  1.05075657e+00,
        1.13979602e+00,  2.85125399e+00, -6.54058456e-01,  5.91307104e-01,
        4.18249458e-01,  4.07120883e-01,  2.90681601e-01, -2.21350479e+00,
        6.69969380e-01, -6.35705888e-01, -1.40898752e+00,  1.23353994e+00,
       -4.64894950e-01, -5.61830521e-01, -2.65465081e-01,  6.93497515e+00,
        2.54075122e+00, -3.86470616e-01,  7.37920403e-01, -2.52454400e-01,
        1.13615263e+00,  1.07363796e+00, -3.24268669e-01,  2.30040264e+00,
        1.46473849e+00, -2.06925702e+00, -1.03245997e+00, -1.25596628e-01,
       -1.65496230e+00, -4.91467148e-01, -5.36341250e-01,  4.08115983e-01,
        1.84644151e+00, -

This is a big weird list of numbers (a 256-dimensional vector, to be specific) that doesn't seem meaningful on its own. But we can do some interesting things with it.
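
One way to make "clustered next to each other" concrete is to measure the distance between two vectors. The sketch below uses made-up three-dimensional vectors rather than real model output (the dictionary `toy_states` and the function `distance` are my own illustrative names), so it only shows the arithmetic, not anything about how the model actually places words.

```python
import math

# Toy three-dimensional "states" (made-up numbers, not real model output).
toy_states = {
    "cat": [0.9, 0.1, 0.3],
    "cap": [0.8, 0.2, 0.3],
    "moon": [-0.5, 0.9, -0.7],
}

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.dist(a, b)

near = distance(toy_states["cat"], toy_states["cap"])
far = distance(toy_states["cat"], toy_states["moon"])
print(near < far)  # True
```

With the real model, you would presumably compare the vectors returned by `.phonemestate()` in place of the toy lists.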

#### Blending words

In [146]:
def blend(a, b):
    factor = 0.5
    start = pin.phonemestate(a)
    end = pin.phonemestate(b)
    return pin.spellstate(((start*factor) + (end*(1-factor))))
blend('paper', 'plastic')

'paceter'

In [148]:
for i in range(10):
    worda = random.choice(nouns)
    wordb = random.choice(nouns)
    print(worda, " → ", blend(worda, wordb), " → ", wordb)

unification  →  cenificary  →  cemetery
prophecy  →  parpathey  →  apartheid
concur  →  quante  →  quantity
computing  →  compedings  →  proceedings
serenity  →  soriging  →  lodging
riches  →  bhains  →  bones
vegetation  →  vregantas  →  drunkenness
archery  →  harte  →  height
lineage  →  lilitray  →  illustrator
allegiance  →  alegeon  →  neighbour
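
The `blend()` function above fixes `factor` at 0.5, which gives an even mix of the two words. Here is a small sketch of the same weighted average with the factor exposed as a parameter, written on plain lists so it runs without the model. The function name `mix` is my own, and the two-element vectors are toy values.

```python
def mix(start, end, factor):
    """Weighted average of two vectors, using the same convention as blend():
    factor=1.0 returns start, factor=0.0 returns end."""
    return [s * factor + e * (1 - factor) for s, e in zip(start, end)]

start = [0.0, 0.0]
end = [2.0, 4.0]
steps = [mix(start, end, f) for f in (1.0, 0.5, 0.0)]
print(steps)  # [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
```

Stepping the factor from 1.0 down to 0.0 walks from the first vector to the second. Applied to two phoneme states and passed through `pin.spellstate()`, each step should yield a spelling somewhere between the two words.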


### Variants with noise

> "[M]agic spells come in a wide variety [...]. [W]hat seems to be most important is the sound, which is often based on alliterations and homophones. The use of sounds prompts a series of variations on a single word, such as, "festella, festelle, festelle festelli festello festello, festella festellum," used to banish all kinds of fistulas." (Lecouteux xix)

In [153]:
import numpy as np
state = pin.phonemestate("abracadabra")
pin.spellstate(state + np.random.randn(256))

'avrhokadamba'

In [163]:
def noisy(word, factor=1.0):
    state = pin.phonemestate(word) + np.random.randn(256) * factor
    return pin.spellstate(state)
noisy("allison", 0.5)

'allison'

In [164]:
for i in range(5, 25):
    print(noisy("abracadabra", i * 0.1))

abracadaba
abrakadaba
abrakadaka
abrocadabar
avr-karabera
apraqabakada
habhakabara
abrokalabtabar
abrocagymada
apriqiaviabe
egreckava
arbhemrawabhip
appgplazabaaaba
bbrobodabaridy
a
crccepectabdada
akwawrawanabradlum
ahlchlcheceah
ggrligdmba
alrrkldkrirhirhm
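
To get a feel for why the outputs above drift further from "abracadabra" as the loop goes on, it helps to look at how the size of the noise grows with `factor`. This is a sketch in plain Python (no model, no NumPy); the helper names `noise_vector` and `norm` are my own, and the seed is there only to make the comparison repeatable.

```python
import math
import random

def noise_vector(size, factor, seed):
    """Standard-normal noise scaled by factor, as a plain list."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) * factor for _ in range(size)]

def norm(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vec))

# Same seed, so both vectors share the same underlying random draws.
small = noise_vector(256, 0.5, seed=0)
large = noise_vector(256, 2.0, seed=0)
ratio = norm(large) / norm(small)
print(round(ratio, 6))  # 4.0
```

The length of the noise scales linearly with `factor`. Unscaled standard-normal noise in 256 dimensions has a typical length of around 16 (the square root of 256), so the loop above, which runs `factor` from 0.5 to 2.4, adds noise with lengths of roughly 8 up to roughly 38.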


## Bibliography

Crain, Patricia. *The Story of A: The Alphabetization of America from the New England Primer to the Scarlet Letter*. Stanford University Press, 2000.

Eco, Umberto. *The Search for the Perfect Language*. Blackwell, 1997.

Jepsen, Carly R. “The Sound.” *Dedicated*. By Jepsen, Carly R., et al., 2019. Digital release.

Lecouteux, Claude. *Dictionary of Ancient Magic Words and Spells: From Abraxas to Zoar.* First U.S. edition, Inner Traditions, 2015.

Leggott, Michele J. *Reading Zukofsky’s 80 Flowers*. Johns Hopkins University Press, 1989.

Richardson, Ernest Cushing, and Bernhard Pick, editors. *The Ante-Nicene Fathers: Translations of the Writings of the Fathers down to A.D. 325.* C. Scribner’s Sons, 1905.

Skemer, Don C. *Binding Words: Textual Amulets in the Middle Ages.* Penn State Press, 2010.