# Bobey Dig

By [Allison Parrish](http://www.decontextualize.com/) for [NaNoGenMo 2019](https://github.com/NaNoGenMo/2019)

Another NaNoGenMo entry: *Moby Dick* rewritten with [Pincelate](https://github.com/aparrish/pincelate/) so that it sounds like the narrator has a head cold.

How it's done:

In [1]:
import re

I'm using `pronouncing` to look up words with known pronunciations and `pincelate` to respell them.

In [2]:
from pincelate import Pincelate
import pronouncing as pr

Using TensorFlow backend.


In [3]:
pin = Pincelate()

The `headcold` function uses Pincelate's `.manipulate()` method to change the probabilities of certain phonetic features occurring in the sounded-out text, then respells the word from the result. In this case, I'm eliminating nasal features, slightly turning down fricatives and voicelessness, and boosting stops, voicing, and the probability of the end of the word occurring. (In the values below, positive numbers turn a feature down and negative numbers turn it up.) The intuition here is that when you have a cold, your nose is stuffy, so consonants that are normally nasal come out as the voiced stop at the same place of articulation (e.g., `/m/` becomes `/b/`). Sustained fricatives, especially those requiring high airflow, are also less likely because an inflamed pulmonary system has less capacity overall. I found the values for the `features` parameter through trial and error; they seem to work okay.

In [4]:
def headcold(s):
    return pin.manipulate(
        s,
        features={
            'nas': 500,
            'stp': -2.3,
            'frc': 2.5,
            'vcd': -2.8,
            'vls': 10,
            'end': -1.7},
        temperature=0.15)
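
The nasal-to-stop intuition can be sketched without the model at all. The snippet below is only an illustration on ARPAbet phone strings, not what Pincelate does internally (Pincelate shifts feature probabilities rather than swapping symbols). The `denasalize` helper and its mapping table are hypothetical names introduced here.

```python
# Illustrative only: swap each nasal for the voiced stop made at the
# same place of articulation. Pincelate itself does not work this way.
NASAL_TO_STOP = {
    "M": "B",   # bilabial
    "N": "D",   # alveolar
    "NG": "G",  # velar
}

def denasalize(phones):
    """Replace nasal phones in a space-separated ARPAbet string."""
    return " ".join(NASAL_TO_STOP.get(p, p) for p in phones.split())

print(denasalize("M OW1 B IY0"))   # B OW1 B IY0
print(denasalize("TH AE1 NG K"))   # TH AE1 G K
```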

The `rewrite` function takes a string and looks it up with `pronouncing`. If the word isn't in the dictionary, or if it *is* in the dictionary and its first listed pronunciation contains one of the target sounds, it returns the headcold version; otherwise, it returns the original string. The target sounds are any ARPAbet phones whose symbols include `M`, `N`, `V`, `Z`, `K`, `P`, or `T`, which also catches `NG`, `TH`, and `ZH`. The point here is to leave unchanged any words that shouldn't have their spelling changed by the `headcold` function.

In [5]:
def rewrite(s):
    phones = pr.phones_for_word(s)
    if len(phones) == 0 or re.search(r"[MNVZKPT]", phones[0]):
        return headcold(s)
    else:
        return s
[rewrite(w) for w in "moby dick call me ishmael I am my own cold circulation thank you very much".split()]

['bobey',
 'dig',
 'gall',
 'be',
 'ishbell',
 'I',
 'aigh',
 'bied',
 'owed',
 'goeld',
 'curgulatione',
 'thag',
 'you',
 'veighe',
 'bud']
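
To see which words the regular expression selects, here is a minimal sketch that applies the same pattern to a few hand-written ARPAbet strings. The transcriptions are typed in here rather than looked up, so treat them as assumptions about what `pronouncing` returns. `has_target_sound` is a hypothetical helper, not part of the notebook.

```python
import re

# Same character class as in rewrite(): any phone whose symbol contains
# one of these letters counts, so NG, TH and ZH are caught as well.
TARGET = re.compile(r"[MNVZKPT]")

def has_target_sound(phones):
    """True if a space-separated ARPAbet string would be respelled."""
    return TARGET.search(phones) is not None

# Hand-typed transcriptions, assumed to match the dictionary entries.
examples = {
    "moby": "M OW1 B IY0",
    "thank": "TH AE1 NG K",
    "you": "Y UW1",
    "whale": "W EY1 L",
}
for word, phones in examples.items():
    print(word, has_target_sound(phones))
```

Under these assumed transcriptions, "you" contains none of the target letters, which is consistent with it coming back unchanged in the output above.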

I need everything in lowercase for `pincelate` and `pronouncing` to work, but I want the replaced words to keep their original case. This function takes a string `s` and matches it to the "case pattern" of string `t`; if `s` is longer than `t`, the case of the last character of `t` carries over to the remaining characters. I'll use it below to make the respelled strings returned from `.manipulate()` match the case of the words they replace.

In [6]:
def matchcase(s, t):
    t = t + (t[-1] * max(0, len(s) - len(t)))
    return ''.join([ch1.upper() if ch2.isupper() else ch1 for ch1, ch2 in zip(s, t)])
print(matchcase("hello", "Hello"))
print(matchcase("hello there", "Hello"))
print(matchcase("hello", "Hell"))
print(matchcase("hello", "HELL"))

Hello
Hello there
Hello
HELLO


Finally, the callback for `re.sub` rewrites the lowercased match and then restores the case of the original word:

In [7]:
def replace(match):
    return matchcase(rewrite(match.group().lower()), match.group())
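
The callback can be tried out without loading the model. The sketch below swaps `rewrite` for a hypothetical fixed lookup (`toy_rewrite`, with a two-entry table made up for illustration) so that the case-restoring flow is easy to follow; `matchcase` is repeated from above so the snippet runs on its own.

```python
import re

def matchcase(s, t):
    # same helper as above: pad t with its last character, then copy case
    t = t + (t[-1] * max(0, len(s) - len(t)))
    return ''.join(ch1.upper() if ch2.isupper() else ch1 for ch1, ch2 in zip(s, t))

# Hypothetical stand-in for rewrite(): a fixed lookup instead of the model.
TOY = {"moby": "bobey", "dick": "dig"}

def toy_rewrite(s):
    return TOY.get(s, s)

def toy_replace(match):
    word = match.group()
    # look the word up in lowercase, then restore the original case
    return matchcase(toy_rewrite(word.lower()), word)

result = re.sub(r"\b[A-Za-z']+\b", toy_replace, "MOBY DICK; or Moby")
print(result)  # BOBEY DIG; or Bobey
```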

And now put it all together, performing the replacement on every line of *Moby Dick*. This might take a while! (Make sure you've [downloaded a copy first](https://www.gutenberg.org/ebooks/2489) and put the text file in the same directory as this notebook.)

In [8]:
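orig = []
out = []
with open("./pg2489.txt", encoding="utf-8") as fh:
    for i, line in enumerate(fh):
        # print a progress marker every 1000 lines
        if i % 1000 == 0:
            print(i)
        line = line.strip()
        # respell each run of letters and apostrophes, keeping its case
        transformed = re.sub(r"\b[A-Za-z\']+\b", replace, line)
        out.append(transformed)
        orig.append(line)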

0
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000
12000
13000
14000
15000
16000
17000
18000
19000
20000
21000
22000
23000
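
For a sense of what the pattern in the loop above hands to the callback, here is a small sketch using `re.findall` on a few sample strings. The `WORD` name is introduced here for illustration only.

```python
import re

# Same pattern as in the loop: runs of ASCII letters and apostrophes
# bounded by word boundaries.
WORD = re.compile(r"\b[A-Za-z']+\b")

print(WORD.findall("Some years ago--never mind how long"))
# ['Some', 'years', 'ago', 'never', 'mind', 'how', 'long']
print(WORD.findall("don't"))
# ["don't"]
print(WORD.findall("CHAPTER 1"))
# ['CHAPTER']
```

Digits and punctuation never reach the callback, so they pass through to the output as they are.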


Let's preview:

In [9]:
out[:25]

['\ufeff',
 'BOBEY DIG;',
 'OR THE WHALE',
 '',
 'by Herbes Belville',
 '',
 '',
 '',
 '',
 'CHABTER 1',
 '',
 'Lubiz',
 '',
 '',
 '',
 'Gall beigh Ishbell.  Sub years ago--dever bied how laugh prejised--',
 'havig lidtle or doed buhdie id bied purs, ad tothig baudigular',
 'doed indered be ohd shore, I thoughed I would sail abough a lidtle',
 'ad see the wadrey bard ove the world.  Id eighed a way I haved',
 'ohd drivig off the spleed ad regulating the curgulatione.',
 'Whender I fied bizlev groig grib abowed the bouthe;',
 'whender id eighed a dab, dridge Voveber id bied soul; whender I',
 'fied bizelve ivludersed bosge before covied werghouses,',
 'ad brigg ub the rear ohd every vuderle I beed;',
 'ad aescheabley whender bied highboeds ged such ad upper hadd ohd be,']

Looks good. I'm going to make two versions: the first with just the output, the second with paired lines for comparison.

In [10]:
with open("bobey-dig.txt", "w") as fh:
    fh.write("\n".join(out))

In [11]:
with open("bobey-dig-annotated.txt", "w") as fh:
    for orig_line, out_line in zip(orig, out):
        fh.write(orig_line + "\n" + out_line + "\n\n")