# Phonological Representations and Rules in CLTK

## Phonological Features

We begin by importing the Old English module, which builds on `cltk.phonology.orthophonology`.

In [1]:
from cltk.phonology.old_english.orthophonology import *

Phonological features are represented by Enum classes.  A feature is one such class, and the possible values of the feature are the members of the enumeration.

Conventionally, binary features have members `{neg, pos}`.

In [2]:
Voiced, Voiced.pos, [feature_value for feature_value in Voiced]

(<enum 'Voiced'>, <Voiced.pos: 2>, [<Voiced.neg: 1>, <Voiced.pos: 2>])

In [3]:
[feature_value for feature_value in Manner]

[<Manner.stop: 1>,
 <Manner.fricative: 2>,
 <Manner.affricate: 3>,
 <Manner.nasal: 4>,
 <Manner.lateral: 5>,
 <Manner.trill: 6>,
 <Manner.spirant: 7>,
 <Manner.approximant: 8>]

When a phonological feature is relevant to the sonority of a phoneme, the values must be ordered by *increasing sonority*, as in the previous example.

Feature values may be compared for identity and, through their ordering, for sonority.

In [4]:
Manner.spirant > Manner.trill, Manner.fricative == Manner.fricative, Manner.fricative == Voiced.pos

(True, True, False)
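One way such ordered comparisons could be built on `Enum` is sketched below. It is a stand-alone illustration, not CLTK's actual code: `OrderedFeature` is a hypothetical base class, and `Manner` is cut down to three values.

```python
from enum import Enum


class OrderedFeature(Enum):
    """Hypothetical base class: members are ordered by their declared value."""

    def __lt__(self, other):
        # Only values of the same feature can be ordered against each other.
        if self.__class__ is other.__class__:
            return self.value < other.value
        return NotImplemented

    def __gt__(self, other):
        if self.__class__ is other.__class__:
            return self.value > other.value
        return NotImplemented


class Manner(OrderedFeature):
    # Abbreviated; declared in order of increasing sonority.
    stop = 1
    fricative = 2
    nasal = 3


class Voiced(OrderedFeature):
    neg = 1
    pos = 2


print(Manner.nasal > Manner.stop)
# Equality on Enum members is identity, so equal numbers in
# different features do not make the members equal.
print(Manner.fricative == Voiced.pos)
```

Because equality falls back to identity, `Manner.fricative` and `Voiced.pos` stay distinct even though both carry the value 2 in this sketch.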

## Phonemes

The module provides an object `oe` for Old English ortho-phonology.  It contains, for one, a sound inventory of the language: a list of phonemes.

In [5]:
oe.sound_inventory

[IPA:m Consonantal.pos Place.bilabial Manner.nasal Voiced.pos Aspirated.neg Geminate.neg,
 IPA:n Consonantal.pos Place.alveolar Manner.nasal Voiced.pos Aspirated.neg Geminate.neg,
 IPA:n̥ Consonantal.pos Place.alveolar Manner.nasal Voiced.neg Aspirated.neg Geminate.neg,
 IPA:ŋ Consonantal.pos Place.velar Manner.nasal Voiced.pos Aspirated.neg Geminate.neg,
 IPA:p Consonantal.pos Place.bilabial Manner.stop Voiced.neg Aspirated.neg Geminate.neg,
 IPA:b Consonantal.pos Place.bilabial Manner.stop Voiced.pos Aspirated.neg Geminate.neg,
 IPA:t Consonantal.pos Place.alveolar Manner.stop Voiced.neg Aspirated.neg Geminate.neg,
 IPA:d Consonantal.pos Place.alveolar Manner.stop Voiced.pos Aspirated.neg Geminate.neg,
 IPA:k Consonantal.pos Place.velar Manner.stop Voiced.neg Aspirated.neg Geminate.neg,
 IPA:g Consonantal.pos Place.velar Manner.stop Voiced.pos Aspirated.neg Geminate.neg,
 IPA:t͡ʃ Consonantal.pos Place.post_alveolar Manner.affricate Voiced.neg Aspirated.neg Geminate.neg,
 IPA:d͡ʒ Cons

There is also an alphabet, a mapping from orthographic symbols to phonemes.  Obviously this only makes sense for alphabetic orthographies!

In [6]:
oe.alphabet

{'a': IPA:ɑ Consonantal.neg Height.open Backness.back Roundedness.neg Length.short,
 'ā': IPA:ɑ: Consonantal.neg Height.open Backness.back Roundedness.neg Length.long,
 'æ': IPA:æ Consonantal.neg Height.open Backness.front Roundedness.neg Length.short,
 'ǣ': IPA:æ: Consonantal.neg Height.open Backness.front Roundedness.neg Length.long,
 'b': IPA:b Consonantal.pos Place.bilabial Manner.stop Voiced.pos Aspirated.neg Geminate.neg,
 'c': IPA:k Consonantal.pos Place.velar Manner.stop Voiced.neg Aspirated.neg Geminate.neg,
 'ċ': IPA:t͡ʃ Consonantal.pos Place.post_alveolar Manner.affricate Voiced.neg Aspirated.neg Geminate.neg,
 'd': IPA:d Consonantal.pos Place.alveolar Manner.stop Voiced.pos Aspirated.neg Geminate.neg,
 'ð': IPA:ð Consonantal.pos Place.dental Manner.fricative Voiced.pos Aspirated.neg Geminate.neg,
 'e': IPA:e Consonantal.neg Height.mid Backness.front Roundedness.neg Length.short,
 'ē': IPA:e: Consonantal.neg Height.mid Backness.front Roundedness.neg Length.long,
 'f': IPA:f 

A phoneme is a bundle of phonological features.  Concrete forms also have an IPA symbol.

In [7]:
a

IPA:ɑ Consonantal.neg Height.open Backness.back Roundedness.neg Length.short

A phoneme can be queried for the value of a feature.  If the phoneme is unspecified for that feature, `None` is returned.

In [8]:
a[Height]

<Height.open: 7>

In [9]:
print(a[Manner])

None
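The lookup behaviour can be imitated with a dictionary keyed by feature class. This is only a sketch under that assumption; `Bundle` is a hypothetical stand-in, and the two features are abbreviated.

```python
from enum import Enum


class Height(Enum):
    close = 1
    open = 2


class Manner(Enum):
    stop = 1
    nasal = 2


class Bundle:
    """Hypothetical stand-in for a phoneme: feature class -> feature value."""

    def __init__(self, *values):
        self.features = {type(value): value for value in values}

    def __getitem__(self, feature):
        # dict.get yields None when the bundle is unspecified for the feature.
        return self.features.get(feature)


vowel = Bundle(Height.open)
print(vowel[Height])
print(vowel[Manner])
```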


New phonemes are created using the `phoneme` function, and in some contexts are automatically induced by bare feature values or lists of feature values.

The `phoneme` function accepts one or more lists of feature values, a variable number of individual feature values, or a mix of the two:

In [10]:
phoneme(Manner.stop, Place.palatal), \
phoneme([Manner.stop, Place.palatal]), \
phoneme([Manner.stop], Place.palatal)

(Manner.stop Place.palatal,
 Manner.stop Place.palatal,
 Manner.stop Place.palatal)
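Accepting both call styles amounts to flattening the arguments before building the bundle. The helper below, `collect_features`, is a hypothetical name used only for this sketch and does not reproduce the library's `phoneme` function.

```python
from enum import Enum


class Manner(Enum):
    stop = 1


class Place(Enum):
    palatal = 1


def collect_features(*args):
    """Flatten a mix of feature values and lists of feature values."""
    features = {}
    for arg in args:
        # Treat a bare value as a one-element list.
        values = arg if isinstance(arg, (list, tuple)) else [arg]
        for value in values:
            features[type(value)] = value
    return features


variants = [
    collect_features(Manner.stop, Place.palatal),
    collect_features([Manner.stop, Place.palatal]),
    collect_features([Manner.stop], Place.palatal),
]
print(variants[0] == variants[1] == variants[2])
```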

Python's operator overloading (a simulation of the C++ mechanism) has been grossly abused in order to provide syntactic sugar for various operations on phonemes.  (Unfortunately Python provides only a limited set of overloadable operators, so the symbols aren't always perfectly symmetrical in use.)

The > and < operators compare phonemes in terms of their *sonority*:

In [11]:
a < t, \
a > t, \
a > e, \
a < e

(False, True, True, False)

The << operator *merges* the features of the right-hand argument into *a copy* of the phoneme on the left.

The object on the right can be a full phoneme, a list of features, or just an individual feature.

In the examples below, the first operation simply generates a copy of `a`, the second is a silly copy of `t`, and the last two change the vowel quality of `a`.

In [12]:
a << a, \
a << t, \
a << [Backness.central, Height.close], \
a << Backness.central

(IPA:ɑ Consonantal.neg Height.open Backness.back Roundedness.neg Length.short,
 IPA:t Consonantal.pos Place.alveolar Manner.stop Voiced.neg Aspirated.neg Geminate.neg,
 IPA:ɑ Consonantal.neg Height.close Backness.central Roundedness.neg Length.short,
 IPA:ɑ Consonantal.neg Height.open Backness.central Roundedness.neg Length.short)
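The copy-then-merge behaviour can be sketched with `__lshift__`. The version below is an assumption-laden illustration: `Bundle` is hypothetical, and it handles only a single feature value or a list of values on the right, not a full phoneme.

```python
from enum import Enum


class Backness(Enum):
    front = 1
    central = 2
    back = 3


class Height(Enum):
    close = 1
    open = 2


class Bundle:
    """Hypothetical feature bundle supporting a non-destructive merge."""

    def __init__(self, features):
        self.features = dict(features)

    def __lshift__(self, other):
        values = other if isinstance(other, list) else [other]
        incoming = {type(value): value for value in values}
        # Build a new bundle; the left operand is left untouched.
        return Bundle({**self.features, **incoming})


a = Bundle({Backness: Backness.back, Height: Height.open})
raised = a << [Backness.central, Height.close]
centered = a << Backness.central

print(raised.features[Height], centered.features[Backness])
print(a.features[Backness])  # the original still has its old value
```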

The `<=` and `>=` operators are non-symmetric matching operators.  In effect they are subset relations on feature bundles.

A <= B is true if B contains all the features in A.  A >= B is true if A contains all the features in B.  Again on the right one can use full phonemes, or lists of features, or just a bare feature.  On the left one cannot use a list (Python won't let me overload built-in methods), so if listing several features on the left, the `phoneme` function must be used.

In [13]:
Backness.back <= a, \
a >= [Backness.back, Height.open], \
phoneme(Backness.back, Height.open) <= a,\
e <= a

(True, True, True, False)
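The subset reading of these operators can be sketched as follows. `Bundle` and `as_features` are hypothetical names, and the features are abbreviated; this is not the library's implementation.

```python
from enum import Enum


class Backness(Enum):
    front = 1
    back = 2


class Height(Enum):
    close = 1
    mid = 2
    open = 3


def as_features(obj):
    """Normalize a Bundle, a list of values, or one value to a dict."""
    if isinstance(obj, Bundle):
        return obj.features
    values = obj if isinstance(obj, list) else [obj]
    return {type(value): value for value in values}


class Bundle:
    def __init__(self, *values):
        self.features = {type(value): value for value in values}

    def __le__(self, other):
        # True if every feature of self also appears in other.
        theirs = as_features(other)
        return all(theirs.get(f) is v for f, v in self.features.items())

    def __ge__(self, other):
        # True if every feature of other also appears in self.
        wanted = as_features(other)
        return all(self.features.get(f) is v for f, v in wanted.items())


a = Bundle(Backness.back, Height.open)
e = Bundle(Backness.front, Height.mid)

print(a >= [Backness.back, Height.open])
print(Bundle(Backness.back) <= a)
print(e <= a)
```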

## Phonological Rules

### Rule creation

The >> operator on phonemes creates rules.

In [14]:
rule1 = Length.short >> Length.long
rule1

<cltk.phonology.orthophonology.PhonologicalRule at 0x7fc4cf698e80>

A rule contains a *condition* and an *action*.

The condition for `rule1` is that a phoneme be specified for `Length.short`.  The action is to lengthen the vowel.

We can apply the rule directly, without checking its condition, by calling it with a list of phonemes and a position in that list:

In [15]:
rule1( [a] ,0)

IPA:ɑ Consonantal.neg Height.open Backness.back Roundedness.neg Length.long

In [16]:
rule1( [b] , 0)

IPA:b Consonantal.pos Place.bilabial Manner.stop Voiced.pos Aspirated.neg Geminate.neg Length.long

The result of the second application is of course nonsense, since `Length` is not a feature of consonants.  We ought to instead consult the condition:

In [17]:
rule1.check_environment( [a], 0 ), rule1.check_environment( [b], 0)

(True, False)

So the rule ought to only be applied in the first case:

In [18]:
def apply_rule(rule, phonemes, pos):
    return rule(phonemes, pos) if rule.check_environment(phonemes, pos) else phonemes[pos]

apply_rule(rule1, [a], 0), apply_rule(rule1, [b], 0)


(IPA:ɑ Consonantal.neg Height.open Backness.back Roundedness.neg Length.long,
 IPA:b Consonantal.pos Place.bilabial Manner.stop Voiced.pos Aspirated.neg Geminate.neg)
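The same check-then-apply pattern extends to a whole sequence. The sketch below does not use CLTK: `Rule` is a hypothetical stand-in that pairs a condition with an action, phonemes are plain dictionaries, and `apply_everywhere` is an illustrative helper.

```python
class Rule:
    """Hypothetical rule: a condition on the target plus an action."""

    def __init__(self, condition, action):
        self.condition = condition
        self.action = action

    def check_environment(self, phonemes, pos):
        return self.condition(phonemes[pos])

    def __call__(self, phonemes, pos):
        return self.action(phonemes[pos])


def apply_everywhere(rule, phonemes):
    """Apply the rule at each position where its condition holds."""
    return [
        rule(phonemes, pos) if rule.check_environment(phonemes, pos) else phonemes[pos]
        for pos in range(len(phonemes))
    ]


lengthen = Rule(
    condition=lambda p: p.get("Length") == "short",
    action=lambda p: {**p, "Length": "long"},  # returns a modified copy
)

a = {"IPA": "ɑ", "Length": "short"}
b = {"IPA": "b", "Voiced": "pos"}
result = apply_everywhere(lengthen, [b, a])
print(result)
```

The consonant passes through unchanged because it fails the condition, and the vowel is returned as a lengthened copy.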

Note: the action of a rule can also specify a list of phonemes.

In [19]:
apply_rule(a >> [a, e], [a], 0)

[IPA:ɑ Consonantal.neg Height.open Backness.back Roundedness.neg Length.short,
 IPA:e Consonantal.neg Height.mid Backness.front Roundedness.neg Length.short]

The *left* side of the rule generator can also be an expression of the form: phoneme1 // phoneme2 // ...
This creates a disjunction, a list of possible matches.  

In [20]:
rule2 = a // e >> [Length.long]  # disjunctive list on the left

apply_rule(rule2, [a], 0), \
apply_rule(rule2, [e], 0), \
apply_rule(rule2, [u], 0)

(IPA:ɑ Consonantal.neg Height.open Backness.back Roundedness.neg Length.long,
 IPA:e Consonantal.neg Height.mid Backness.front Roundedness.neg Length.long,
 IPA:u Consonantal.neg Height.close Backness.back Roundedness.pos Length.short)

Note that the rule applied to `a` and `e`, but not to `u`.

### Environments

The `-` operator on phonemes creates environmental specifications.  Concretely, it returns a boolean *function* of the contents of the phonemes immediately preceding and following the target phoneme.

The `-` sign is intended to be iconically related to the underscore used by phonologists.

In [21]:
env = a - x
env

<function cltk.phonology.orthophonology.AbstractPhoneme.__sub__.<locals>.<lambda>(before, _, after)>

Since the environment is just a function, we can call it:

In [22]:
env(a, _, x), env(a, _, y)

(True, False)

The left and right may contain full phonemes, or, as before, feature values or lists of feature values (right side only).

In [23]:
env = Backness.back - [Backness.front, Height.close]
env(a, _, i), env(a, _, e)

(True, False)
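An environment of this kind is just a closure over two tests. The sketch below models phonemes as sets of feature names rather than CLTK objects; `make_env` is a hypothetical factory.

```python
def make_env(before_features, after_features):
    """Build an environment test from two required feature sets."""
    before_features = set(before_features)
    after_features = set(after_features)

    def env(before, _, after):
        # The middle argument (the target phoneme) is ignored,
        # mirroring the underscore of phonological notation.
        return before_features <= before and after_features <= after

    return env


a = {"back", "open"}
i = {"front", "close"}
e = {"front", "mid"}

env = make_env({"back"}, {"front", "close"})
print(env(a, None, i), env(a, None, e))
```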

### Conditional Rules

The bar `|` operator on a phonological rule adds the environment on the right of the operator to the conditions on the left.

In [24]:
rule2 = Consonantal.pos >> Voiced.pos
rule2 = rule2 | env

oe.rules = [rule2]
oe.transcribe('afi'), oe.transcribe('afe')

('ɑvi', 'ɑfe')

Rule2 says that a consonant is voiced in the environment specified by `env` above: when preceded by a back vowel and followed by a high front vowel.  As we can see, the rule fires for `afi` but not `afe`, since `e` is a mid vowel.

### Special (pseudo-)phonemes

There are three special identifiers for use in environments:

* ANY : matches anything, including word boundaries
* W : matches word boundaries (replacing the # of the standard notation)
* S : matches syllable boundaries

Thus the rule:

In [25]:
a >> Length.long | W - ANY

<cltk.phonology.orthophonology.PhonologicalRule at 0x7fc4cf633898>

determines that `a` is always lengthened when word-initial.
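Rule expressions like the one above rely on Python's fixed operator precedence: `//` binds tighter than `-`, which binds tighter than `>>`, which binds tighter than `|`. The standard-library `ast` module can make the implicit grouping visible without importing CLTK. The helper name `grouping` is illustrative, and `ast.unparse` requires Python 3.9 or later.

```python
import ast

OPERATORS = {ast.RShift: ">>", ast.BitOr: "|", ast.Sub: "-", ast.FloorDiv: "//"}


def grouping(expr):
    """Return expr with the parser's implicit grouping written out."""

    def show(node):
        if isinstance(node, ast.BinOp):
            op = OPERATORS[type(node.op)]
            return f"({show(node.left)} {op} {show(node.right)})"
        return ast.unparse(node)

    return show(ast.parse(expr, mode="eval").body)


print(grouping("a >> Length.long | W - ANY"))
print(grouping("sh >> s // k | Backness.back - ANY"))
```

So the rule is built first with `>>`, the environment is built with `-`, and `|` then attaches the environment to the rule, with no parentheses needed.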

### Examples

Here are a few ortho-phonological rules for Old English.  Mostly they are simply phonological rules, but the last one, relating to the digraph `sc`, also involves the orthography.


In [26]:
oe.rules = [
    # intervocalic /f/, /s/, and /θ/ are voiced
    f // s // th >> Voiced.pos | Consonantal.neg - Consonantal.neg,
    
    # /g/ is fricativized when intervocalic
    g >> y | Consonantal.neg - Consonantal.neg,
    g >> y | Voiced.pos - Voiced.pos,

    # word-initial h is just /h/
    h >> h | W - ANY,

    # /h/ is palatalized after a front vowel
    h >> ch | Backness.front - ANY,

    # elsewhere for h
    h >> x,

    # 'sc' is *not* a digraph after a back vowel
    sh >> s // k | Backness.back - ANY
]
len(oe.rules)

7

In [28]:
oe('scip'), oe('ascian'), oe('alafed'), oe('cæg')

('ʃip', 'ɑskiɑn', 'ɑlɑved', 'kæg')
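The three `h` rules above, with their final "elsewhere" case, suggest that rules are tried in order and the first match wins. That ordering is an assumption here, not something the notebook states. The sketch below illustrates it on plain strings, with hypothetical rule tuples and `ç` assumed as the palatal output.

```python
FRONT_VOWELS = set("ieæ")

# Each hypothetical rule: (target symbol, environment test, replacement).
RULES = [
    ("h", lambda before, after: before is None, "h"),          # word-initial
    ("h", lambda before, after: before in FRONT_VOWELS, "ç"),  # after a front vowel
    ("h", lambda before, after: True, "x"),                    # elsewhere
]


def transcribe(word):
    out = []
    for pos, symbol in enumerate(word):
        # Neighbours are read from the input word; None marks a word boundary.
        before = word[pos - 1] if pos > 0 else None
        after = word[pos + 1] if pos + 1 < len(word) else None
        for target, env, replacement in RULES:
            if symbol == target and env(before, after):
                symbol = replacement
                break  # first matching rule wins
        out.append(symbol)
    return "".join(out)


print(transcribe("hus"), transcribe("niht"), transcribe("ahta"))
```

Under first-match ordering the catch-all rule must come last, otherwise it would shadow the more specific environments.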