# Code for Chapter 3: Learning Languages

This notebook provides the code written for and used in Chapter 3 of my dissertation **_SigmaPie_ for subregular and subsequential grammar induction**. All the links will be added soon. :)

# Generators and evaluators: the setup for the experiments

## Step 1: loading dependencies, including _SigmaPie_

In [1]:
import codecs
from random import choice, randint
from pprint import pprint

In [2]:
# accessing SigmaPie toolkit: I know, horrible!
# I promise I'll make it a package soon
%cd local_sigmapie/code/
from main import *
%cd ../..

/home/alenaks/subregular-experiments/local_sigmapie/code

You successfully loaded SigmaPie. 

Formal language classes and grammars available:
	* strictly piecewise: SP(alphabet, grammar, k, data, polar);
	* strictly local: SL(alphabet, grammar, k, data, edges, polar);
	* tier-based strictly local: TSL(alphabet, grammar, k, data, edges, polar, tier);
	* multiple tier-based strictly local: MTSL(alphabet, grammar, k, data, edges, polar).

Alternatively, you can initialize a transducer: FST(states, sigma, gamma, initial, transitions, stout).
Learning algorithm:
	OSTIA: ostia(sample, sigma, gamma).
/home/alenaks/subregular-experiments


## Step 2: defining the general harmonic generator

Here, I describe the artificial harmonic generator (AHG) that I use throughout Chapters 3 and 4 of my dissertation.
It can generate two types of samples:

* Samples of **well-formed words**, i.e. words that don't violate the rules of the harmony; and
* Samples of **underlying -> surface forms**, i.e. pairs where, in the first member, only the first segment of every harmonic class keeps its value (i.e. the feature that needs to be spread is given), and all subsequent members of that class are masked by the name of the class.

### Parameters of the generator

List of the parameters that are available:

* number of strings to be generated;
* harmonic classes and their members (harmonic class is a class of segments that don't co-occur unless there is a blocker in-between them);
* minimal and maximal cluster length of each of the harmonic classes;
* blockers and the new domain that they introduce;
* the probability of observing a blocker (1 / n, where n is a parameter): on average, every n-th cluster will be a blocker.
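The "every n-th cluster" reading can be checked with a short calculation. In the generator below, the blocker check is a `while` loop, so several blockers can follow each other. If each check succeeds with probability p = 1/n, the number of blockers before a harmonic cluster has mean p / (1 - p) = 1/(n - 1), so blockers make up (1/(n - 1)) / (1 + 1/(n - 1)) = 1/n of all clusters. The simulation below is a sketch of just that loop (`blocker_share` is a hypothetical name), ignoring the restarts and the length cut-off of the real generator.

```python
import random

def blocker_share(n: int, steps: int = 200_000, seed: int = 0) -> float:
    """Simulate the blocker loop and return the share of clusters that are blockers."""
    rng = random.Random(seed)
    blockers = harmonic = 0
    for _ in range(steps):
        # as in Harmony._generate: keep adding blockers while randint(1, n) == 1
        while rng.randint(1, n) == 1:
            blockers += 1
        harmonic += 1  # then one harmonic cluster is added
    return blockers / (blockers + harmonic)

print(round(blocker_share(5), 2))  # close to 1/5
```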

In [3]:
class Harmony(object):
    """
    Class defining the toy generator for the harmonic datasets.
    
    Attributes:
        cl_members (dict): dictionary of the type {(harmonic_class_1):class_id_1,
            (harmonic_class_2):class_id_2, ...} that contains info about the present
            harmonic classes. Note that the transparent element can be encoded by 
            a harmonic class containing a single element.
            Example: {("a", "o"):"A", ("b", "p"):"B", ("c"):"C"}
        cl_lengths (dict): dictionary of the type {class_id:(min_len, max_len)},
            where min_len and max_len denote the min and max len of the cluster
            made out of elements of class_id.
            Example: {"A":(1, 3), "B":(2, 4), "C":(4, 8)}
        blockers (dict): dictionary of the type {"b_1":"u_1", "b_2":"u_2", ...} where
            "b" is the blocker, and "u" is the newly introduced value.
            Example: {"t":"p"}
        blocker_prob (int): the chance of observing a blocker; the probability
            is 1/blocker_prob.
            Example: 5
    """
    def __init__(self, cl_members, cl_lengths = None, blockers = None, blocker_prob = 5):
        """
        Init function for the Harmony class.
        """
        self.cl_members = cl_members
        if cl_lengths is not None:
            self.cl_lengths = cl_lengths
        else:
            self.cl_lengths = {i:(1, 3) for i in self.cl_members.values()}
        self.blockers = blockers
        self.blocker_prob = blocker_prob
        

        
    def generate_words(self, n = 3, length = 10):
        """
        Generates n strings of a given length.
        
        Arguments:
            n (int): how many strings need to be generated;
            length (int): length of the strings.
            
        Returns:
            list[str]: n generated strings.
        """
        # check if the harmony rules are well-formed
        if not self._verify_classes():
            raise ValueError("Cannot generate dataset: the sets are overlapping.")
            
        # unpack the dictionary for a quicker lookup
        unpacked = self._unpack_classes()
        generated = [self._generate(unpacked, length) for i in range(n)]
        return generated
    

    def generate_pairs(self, n = 3, length = 10):
        """
        Generates n pairs of strings of a given length.
        
        Arguments:
            n (int): how many strings need to be generated;
            length (int): length of the strings.
            
        Returns:
            list[tuple[str]]: n generated pairs of strings.
        """
        transparent = self._transparent()
        outputs = self.generate_words(n, length)
        inputs = self._mask_words(outputs, transparent)
        return list(zip(inputs, outputs))
        
        
    def _generate(self, unpacked, length):
        """
        Generates a single string of the given length; helper function.
        
        Output type: str
        """
        
        # initialize the specifications of this particular string
        string = ""
        specs = self._specify()
        
        while len(string) < length:
            
            
            # check if we can now output the blocker
            if self.blockers is not None:
                while randint(1, self.blocker_prob) == 1:
                    b = choice(list(self.blockers))
                    string += b
                    
                    if len(string) == length:
                        return string
                    
                    # rewrite the specification because of the blocker
                    if self.blockers[b] not in specs:
                        for spec in specs:
                            if unpacked[spec] == unpacked[self.blockers[b]]:
                                specs.remove(spec)
                                specs.append(self.blockers[b])
                                break
                                
            # make sure that we don't generate a cluster of the same
            # harmonic set as the previous one
            if len(string) > 0:
                change = string[-1] in unpacked
            else:
                change = False
            
            # select and add new possible character as many times as
            # cl_lengths indicate
            if not change:
                newchar = choice(specs)
            else:
                collection = [i for i in specs]
                collection.remove(string[-1])
                newchar = choice(collection)
            freq_b, freq_e = self.cl_lengths[unpacked[newchar]]
            string += newchar * randint(freq_b, freq_e)
            
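            # output; if the string overshot the length, start over
            # with a fresh specification (a blocker may have changed it)
            if len(string) > length:
                string = ""
                specs = self._specify()
            elif len(string) == length:
                return string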
            
            
    def _mask(self, string, transparent):
        """
        Masks all non-initial mentions of the specified allophone: helper function.
        
        Output type: str
        """
        classes = {i:False for i in self.cl_members.keys()}
        undergoers = self._undergoers()
        new = ""
        for s in string:
            if (s in undergoers) and (s not in transparent.values()):
                for c in classes:
                    
                    # rewrite the non-initial mention of the harmonic set member
                    # as its harmony_class_id
                    if s in c and not classes[c]:
                        classes[c] = True
                        new += s
                    elif s in c:
                        new += self.cl_members[c]
            else:
                new += s
        return new

    
    def _mask_words(self, words, transparent):
        """
        Masks every word of a given list; helper function.
        
        Output type: list[str]
        """
        return [self._mask(w, transparent) for w in words]
            
            
    def _undergoers(self):
        """
        Collects all undergoers; helper function.
        
        Output type: list[char]
        """
        items = []
        for i in self.cl_members:
            items.extend(list(i))
        return items
    
    def _transparent(self):
        """
        Collects transparent items, i.e. the members of harmonic
        classes that only contain a single item.
        
        Output type: dict[str:str]
        """
        transparent = dict()
        for i in self.cl_members:
            if len(i) == 1:
                transparent[self.cl_members[i]] = i[0]
        return transparent
        
        
    def _verify_classes(self):
        """
        Verifies that no set (harmonic sets or the set of blockers)
        overlaps with each other.
        
        Output type: bool
        """
        items = self._undergoers()
        if self.blockers is not None:
            block_ok = all([i not in items for i in self.blockers])
        else:
            block_ok = True
        return len(items) == len(set(items)) and block_ok
    
    
    def _unpack_classes(self):
        """
        Creates a dictionary where every harmonizing element 
        is mapped to its harmonic class; helps to optimize 
        the lookup of this information.
        
        Output type: dict
        """
        items = self._undergoers()
        unpacked = {}
        for i in items:
            for j in self.cl_members:
                if i in j:
                    unpacked[i] = self.cl_members[j]
        return unpacked

    
    def _specify(self):
        """
        Randomly selects one member of every harmonic class;
        together they form the initial specification of a string.
        
        Output type: list[char]
        """
        return list(map(choice, self.cl_members.keys()))

### Examples of the data generated by AHG

#### Parallel vowel and consonant harmonies
Harmony of a class "A" that contains "a" and "o" and of a class "B" that contains "b" and "p". Linguistically, these are simultaneous and independent vowel and consonant harmonies.

In [4]:
s1 = {("a", "o"):"A", ("b", "p"):"B"}
h1 = Harmony(s1)

Now, let's generate a sample of well-formed words.

In [5]:
print(h1.generate_words(n = 5, length = 10))

['aapapppapp', 'boobooobbb', 'baaabaabaa', 'booobooboo', 'aabbbabbba']


#### Harmony with a transparent element

Transparent (irrelevant) items, which only introduce a long-distance effect into the dataset, can be modeled by providing an extra harmonic class that contains just a single item.
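Because the transparent class never interacts with the harmony, one way to see the long-distance effect is to delete the transparent segments, i.e. to project the word onto a tier of harmonizing segments. On that tier, the harmony becomes a local restriction on adjacent pairs. A minimal sketch, with illustrative function names (`project`, `locally_harmonic`) and a word of the shape generated by `h2` below:

```python
def project(word: str, tier: set[str]) -> str:
    """Keep only the segments that belong to the tier."""
    return "".join(s for s in word if s in tier)

def locally_harmonic(tier_string: str) -> bool:
    """On the tier, it is enough to check adjacent pairs: no 'ao' or 'oa'."""
    return all({x, y} != {"a", "o"} for x, y in zip(tier_string, tier_string[1:]))

tier = {"a", "o"}
print(project("xxaxxxaxxx", tier))                    # aa
print(locally_harmonic(project("xaxxxxoxxx", tier)))  # False
```

Checking adjacent pairs suffices here because any tier string containing both "a" and "o" must switch from one to the other somewhere.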

In [6]:
s2 = {("a", "o"):"A", ("x"):"X"}
l2 = {"A":(1, 2), "X":(2, 4)}
h2 = Harmony(s2, l2)

Now, let us generate some well-formed words.

In [7]:
print(h2.generate_words(n = 5, length = 10))

['xxxxooxxxx', 'xxxxooxxoo', 'xxaxxxaxxx', 'aaxxxxaaxx', 'xxxaaxxxaa']


#### Parallel vowel and consonant harmonies with a blocking effect

Harmony of a class "A" and of a class "B", where, once "t" has occurred, "b" cannot be observed anymore: class "B" changes its specification to "p". Namely, "t" is a blocker that only allows for "p" after itself.

Additionally, clusters of A-elements are from 1 to 3 segments long, and clusters of B-elements are from 2 to 4 segments long. The probability of observing the blocker is $\frac{1}{4}$ at every step of the generation.
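The effect of the blocker on the current specification can be sketched in isolation. The loop in `Harmony._generate` removes the old value of the blocker's harmonic class and appends the new one; the order of the list does not matter there because values are drawn with `choice`. The list comprehension below is a hypothetical equivalent (`apply_blocker` and `class_of` are illustrative names) that keeps the order instead.

```python
def apply_blocker(specs: list[str], new_value: str, class_of: dict[str, str]) -> list[str]:
    """Replace the value of the blocker's harmonic class by the value
    that the blocker introduces; other classes keep their values."""
    return [new_value if class_of[s] == class_of[new_value] else s for s in specs]

class_of = {"a": "A", "o": "A", "b": "B", "p": "B"}
print(apply_blocker(["o", "b"], "p", class_of))  # ['o', 'p']
```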

In [8]:
s3 = {("a", "o"):"A", ("b", "p"):"B"}
l3 = {"A":(1, 3), "B":(2, 4)}
b3 = {"t":"p"}
p3 = 4
h3 = Harmony(s3, l3, b3, p3)

Let's first generate some well-formed words.

In [9]:
print(h3.generate_words(n = 5, length = 10))

['aaapppaaat', 'opppoooppt', 'ttppptpppo', 'opppoppooo', 'tooppppooo']


## Step 3: Turkish generators and evaluators

I use the following two functions to verify the well-formedness of generated Turkish or fake Turkish words:
  * `backness_harmony` takes a string as input and tells whether that string is well-formed with respect to the rules of Turkish backness harmony;
  * `rounding_harmony` does the same for the rounding harmony.
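Both evaluators, and the generator further below, use a one-character ASCII encoding of the eight Turkish vowels. Reading it off the feature table in `turkish_word`, `I` stands for ı, `U` for ü, and `O` for ö. The sketch below is only a reading aid, not part of the original code: it lists the features and derives the vowel classes that the evaluators use.

```python
# (back, high, rounded) for each ASCII-encoded Turkish vowel,
# following the feature table in turkish_word
TURKISH_VOWELS = {
    "a": (True, False, False),  "I": (True, True, False),   # a, ı
    "o": (True, False, True),   "u": (True, True, True),    # o, u
    "e": (False, False, False), "i": (False, True, False),  # e, i
    "O": (False, False, True),  "U": (False, True, True),   # ö, ü
}

BACK = {v for v, (back, _, _) in TURKISH_VOWELS.items() if back}
FRONT = set(TURKISH_VOWELS) - BACK
HIGH = {v for v, (_, high, _) in TURKISH_VOWELS.items() if high}
ROUNDED = {v for v, (_, _, rounded) in TURKISH_VOWELS.items() if rounded}

print(sorted(BACK), sorted(FRONT))
```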

In [10]:
def backness_harmony(string):
    """
    Tells if a string is well-formed according to rules
    of Turkish backness harmony.
    """
    back_class, front_class = "Iaou", "ieOU"
    front, back = False, False
    
    for v in front_class + back_class:
        if v in string:
            front = True if v in front_class else front
            back = True if v in back_class else back

    return not (front and back)

In [11]:
def rounding_harmony(string):
    """
    Tells if a string is well-formed according to rules
    of Turkish rounding harmony.
    """
    high, low, rounded = "iIuU", "aeoO", "uUoO"
    
    vowels = "".join([v for v in string if v in high + low])
    if len(vowels) < 2:
        return True
    
    ro = vowels[0] in rounded
    
    for v in vowels[1:]:
        if v in low:
            if v in rounded:
                return False
            ro = False
        elif (ro and v not in rounded) or (not ro and v in rounded):
            return False
            
    return True

In [12]:
def backness_and_rounding(string):
    return backness_harmony(string) and rounding_harmony(string)

Additionally, to generate simplified Turkish data, I use `turkish_word` and `generate_turkish_words`, which generate a single word and a dataset, respectively.

Their parameters are:
* `length` is the desired length of the Turkish word;
* `cons` is the "consonant" that separates the vowels;
* `vowel_cluster` is a tuple of integers representing the minimal and maximal length of a vowel cluster;
* `cons_cluster` is a tuple of integers representing the minimal and maximal length of a consonantal cluster;
* `n` (available for `generate_turkish_words` only) is the number of examples that need to be generated.
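After the first vowel, `turkish_word` keeps backness fixed, draws height at random, and keeps the previous rounding only for high vowels, since non-initial low vowels are always unrounded. That single step can be sketched on its own; `next_vowel` is a hypothetical helper, and the letters follow the same ASCII encoding as the code.

```python
VOWELS = {
    # (back, high, rounded) -> ASCII-encoded Turkish vowel
    (True, True, True): "u",   (True, True, False): "I",
    (True, False, True): "o",  (True, False, False): "a",
    (False, True, True): "U",  (False, True, False): "i",
    (False, False, True): "O", (False, False, False): "e",
}

def next_vowel(back: bool, rounded: bool, high: bool) -> tuple[str, bool]:
    """Return the next vowel and the rounding value passed on to later vowels."""
    rounded = rounded if high else False  # non-initial low vowels are unrounded
    return VOWELS[(back, high, rounded)], rounded

# after an initial "o" (back, rounded): a high vowel is "u", a low one is "a"
print(next_vowel(True, True, True), next_vowel(True, True, False))
```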

In [13]:
def turkish_word(length = 10, cons = "x", vowel_cluster = (1, 2),
                 cons_cluster = (0, 3)):
    """
    This generator generates fake Turkish words: namely, the words in which
    the harmonic system and rules of Turkish are preserved, but all consonants
    were substituted by a single given consonant.
    
    Arguments:
    * length (int): a length of a word that needs to be generated;
    * cons (str): a single character (or an empty string if only vowels
                  need to be generated), a "choice" of the consonant 
                  that makes this harmony long-distance;
    * vowel_cluster (tuple[int, int]): a tuple of integers representing
                                       minimal and maximal length of
                                       the vowel cluster;
    * cons_cluster (tuple[int, int]): a tuple of integers representing
                                      minimal and maximal length of
                                      the consonantal cluster.
                                      
    Returns:
    * str: a fake Turkish harmonic word, where all consonants are masked.
    """
    if length < 1:
        raise ValueError("Words cannot be so short.")
    
    vowels = {
        (True, True, True):"u",
        (True, True, False):"I",
        (True, False, True):"o",
        (True, False, False):"a",
        (False, True, True):"U",
        (False, True, False):"i",
        (False, False, True):"O",
        (False, False, False):"e"
    }
    
    backness = choice([True, False])
    height = choice([True, False])
    rounding = choice([True, False])
    
    specs = (backness, height, rounding)
    word = ""
    
    if choice([0, 1]):
        word += cons * randint(*cons_cluster)
            
    while len(word) < length:
        vc = vowels[specs] * randint(*vowel_cluster)
        
        # this part is needed to avoid the word-initial *oo clusters
        if len(vc) > 1 and not height and rounding:
            rounding = False
            vc = vc[0] + vowels[(backness, height, rounding)] * (len(vc) - 1)
            
        word += vc
        word += cons * randint(*cons_cluster)
        
        height = choice([True, False])
        rounding = False if not height else rounding
        specs = (backness, height, rounding)
        
    return word[:length]

In [14]:
def generate_turkish_words(n = 10, length = 10, cons = "x",
                           vowel_cluster = (1, 2), cons_cluster = (1, 3)):
    """
    This generator generates a list of fake Turkish words.
    
    Arguments:
    * n (int): a number of strings that need to be generated;
    ... for the rest of the arguments, see turkish_word.
    
    Outputs:
    * list: the list containing n fake Turkish words.
    """
    return [turkish_word(length, cons, vowel_cluster, cons_cluster) for i in range(n)]

## Step 4: other harmonic evaluators

The function `harmonic_evaluator` below takes two arguments, `data` and `rule`: `data` is a list of words that need to be evaluated, and `rule` is the evaluation function for a concrete harmony. I will use this function to evaluate the performance of the learners on the generated datasets.
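`harmonic_evaluator` prints a truncated percentage, so a sample with 99.9% harmonic words is reported as 99%. For automated comparisons, a variant that returns the ratio might be handier; the sketch below is a hypothetical helper (`harmonic_ratio`) that also avoids dividing by zero on an empty sample.

```python
from typing import Callable, Iterable

def harmonic_ratio(data: Iterable[str], rule: Callable[[str], bool]) -> float:
    """Share of words accepted by `rule` (0.0 for an empty sample)."""
    words = list(data)
    if not words:
        return 0.0
    return sum(1 for w in words if rule(w)) / len(words)

def no_ao(word: str) -> bool:
    """Toy rule: 'a' and 'o' never co-occur."""
    return not ("a" in word and "o" in word)

print(harmonic_ratio(["aax", "oox", "aox", "xx"], no_ao))  # 0.75
```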

In [15]:
def harmonic_evaluator(data, rule):
    """
    Evaluates the provided data with respect to a given
    rule of harmony.
    
    Arguments:
    * data (list[str]): a list of strings that need to be evaluated;
    * rule (function): a function that evaluates a string according
                       to some harmony.
                       
    Results:
    * Prints the report that shows if the data follows the rule.
    """
    correct = 0
    for w in data:
        correct = (correct + 1) if rule(w) else correct
        
    ratio = (correct / len(data))
    print(f"Percentage of harmonic words: {int(ratio * 100)}%.")

### Finnish

`front_harmony` tells whether a given string follows the rule of Finnish vowel (backness) harmony.
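The same check can be phrased with set intersections. The sketch below assumes the encoding used for the Finnish data further below (`A` for ä, `O` for ö); the neutral vowels `e` and `i` belong to neither class, so they combine freely with both. `finnish_harmonic` is an illustrative name.

```python
FRONT = set("AOy")  # ä, ö, y in the ASCII encoding of the dataset
BACK = set("aou")

def finnish_harmonic(word: str) -> bool:
    """True unless the word mixes front and back vowels (e, i are neutral)."""
    letters = set(word)
    return not (letters & FRONT and letters & BACK)

for w in ["talo", "kylA", "pOytA", "kivi", "talA"]:
    print(w, finnish_harmonic(w))
```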

In [16]:
def front_harmony(string):
    """
    Tells if a string is well-formed according to rules
    of Finnish backness harmony.
    """
    front_class, back_class = "AOy", "aou"
    front, back = False, False
    
    for v in front_class + back_class:
        if v in string:
            front = True if v in front_class else front
            back = True if v in back_class else back

    return not (front and back)

### Fake harmonies evaluators

This section will eventually need to be redone.
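One possible direction for that rewrite, sketched under my own assumptions: the evaluators without blockers below all check that at most one member of each harmonic class occurs in a word, so they could be produced by a single factory. `make_harmony_checker` is a hypothetical name, and blockers are not covered.

```python
from typing import Callable, Sequence

def make_harmony_checker(classes: Sequence[Sequence[str]]) -> Callable[[str], bool]:
    """Return a checker that accepts a word iff, for every harmonic class,
    at most one of its members occurs in the word (no blockers)."""
    def check(word: str) -> bool:
        return all(sum(member in word for member in cls) <= 1 for cls in classes)
    return check

single = make_harmony_checker([("a", "o")])              # like single_harmony_no_blockers
double = make_harmony_checker([("a", "o"), ("b", "p")])  # like double_harmony_no_blockers
four = make_harmony_checker([("a", "o", "u", "e")])      # like double_harmony
print(single("aaxa"), double("abp"))  # True False
```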

In [17]:
def single_harmony_no_blockers(string):
    """
    Checks if a single [a, o] harmony is well-formed.
    """
    return not("a" in string and "o" in string)

In [18]:
def single_harmony_with_blockers(string):
    """
    Checks if a single [a, o] harmony with a blocker f:a is well-formed.
    """
    if "f" in string:
        s1 = string[:string.index("f")]
        s2 = string[string.index("f") + 1:]
        return single_harmony_no_blockers(s1) and (not "o" in s2)
    else:
        return single_harmony_no_blockers(string)

In [19]:
def double_harmony(string, group = ["a", "o", "u", "e"]):
    """
    Tells if a string contains only one out of four
    (vowel) classes; check that at most one class
    of vowels occurs within one word.
    
    Arguments:
    * string (str): a string that needs to be verified;
    * group (list[char]): the harmonic class.
    """
    assert len(group) == 4
    classes = 0
    
    for i in group:
        classes = (classes + 1) if i in string else classes
        
    return classes in [0, 1]

In [20]:
def double_harmony_no_blockers(string):
    """
    Checks if a double [a, o] and [b, p] harmony is well-formed.
    """
    vowels = not("a" in string and "o" in string)
    consonants = not("b" in string and "p" in string)
    return vowels and consonants

In [21]:
def double_harmony_with_blockers(string):
    """
    Checks if a double [a, o] and [b, p] harmony with a blocker t:p
    is well-formed.
    """
    if "a" in string and "o" in string:
        return False
    
    if "t" in string:
        s1 = string[:string.index("t")]
        s2 = string[string.index("t") + 1:]
        return double_harmony_no_blockers(s1) and ("b" not in s2)
    else:
        return double_harmony_no_blockers(string)

## Step 5: Word-final devoicing generators and evaluators

The functions `word_final_devoicing` and `generate_wfd` imitate the process of word-final devoicing.
The former generates a single string or a (UR, SF) pair implementing the rule, and the latter generates a dataset of such strings or pairs.

Their arguments are the following:
* `sigma` is a list of symbols that can be used in the words;
* `devoice` contains two tuples, where the first tuple represents voiced obstruents, and the second one stands for their voiceless counterparts;
* `length` is the length of the intended words;
* if `pairs` is True, (UR, SF) pairs will be returned; if False, only the surface forms;
* `n` (available only for `generate_wfd`) is the number of strings or pairs that need to be generated.
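Word-final devoicing is a standard example of a subsequential function, the class of maps that OSTIA learns. As an illustration (a sketch under my own assumptions, not the SigmaPie `FST` interface), the rule can be computed left to right with one symbol of delay: a voiced obstruent is held back until the next symbol shows that it is not final, and the final output step devoices whatever is still pending. `wfd_transduce` is a hypothetical name; its state is either empty or one pending voiced obstruent, so there are finitely many states.

```python
from typing import Optional

def wfd_transduce(word: str, mapping: Optional[dict] = None) -> str:
    """Apply word-final devoicing left to right with one symbol of delay."""
    mapping = {"b": "p"} if mapping is None else mapping
    output = []
    pending = ""  # state: a voiced obstruent whose output is delayed, or ""
    for symbol in word:
        output.append(pending)  # the pending symbol turned out not to be final
        if symbol in mapping:
            pending = symbol    # delay the voiced obstruent
        else:
            output.append(symbol)
            pending = ""
    # final output: devoice a voiced obstruent that ended the word
    output.append(mapping.get(pending, pending))
    return "".join(output)

print(wfd_transduce("abab"))  # abap
```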

In [22]:
def word_final_devoicing(sigma = ("a", "b", "p"), devoice = (("b"), ("p")),
                         length = 10, pairs = False):
    """
    This function generates either a word grammatical with respect to the rule
    of word-final devoicing, or a fake UR -> SF pair.
    
    Arguments: 
    * sigma (list[str]): a list of symbols that can be used in the words;
    * devoice (tuple[tuple, tuple]): the first tuple represents voiced
                                     obstruents, and the second one stands
                                     for their voiceless counterparts;
    * length (int): a length of the intended words;
    * pairs (bool): if True, (UR, SF) pairs will be returned, if False, only
                    the surface forms.
                    
    Outputs:
    * str/tuple: a string or a tuple of strings (depending on the parameter 
                 `pairs`) representing the application of the word-final 
                 devoicing.
    """
    if length < 1:
        raise ValueError("The string has a very weird length.")
        
    before, after = devoice
    string = "".join([choice(sigma) for i in range(length)])
    
    if string[-1] not in before:
        return (string, string) if pairs else string
    
    devoiced = string[:-1] + after[before.index(string[-1])]
    return (string, devoiced) if pairs else devoiced

In [23]:
def generate_wfd(n = 10, sigma = ("a", "b", "p"), devoice = (("b"), ("p")),
                 length = 10, pairs = False):
    """
    Generates a set of strings or pairs that satisfy the rule of
    the word-final devoicing.
    
    Arguments:
    * n (int): the number of strings that need to be generated;
    ... for the rest of the arguments see word_final_devoicing.
    
    Outputs:
    * list: a list of strings or tuples (depending on the parameter `pairs`)
            representing the application of the word-final devoicing.
    """
    return [word_final_devoicing(sigma, devoice, length, pairs) for i in range(n)]

The following function `evaluate_wfd_words` evaluates words with respect to the rules of the word-final devoicing.

In [24]:
def evaluate_wfd_words(data, voiced = ("b")):
    """
    Evaluates the provided words with respect to the rule 
    of the word-final devoicing.
    
    Arguments:
    * data (list[str]): a list of strings that need to be evaluated;
    * voiced (tuple[char]): a list of voiced characters, i.e. those
                            that cannot be word-final.
                       
    Results:
    * Prints the report that shows if the data follows the rule.
    """
    correct = 0
    for w in data:
        
        if not len(w):
            correct += 1
            continue
            
        correct = (correct + 1) if w[-1] not in voiced else correct
        
    ratio = (correct / len(data))
    print(f"Percentage of well-formed words: {int(ratio * 100)}%.")

As before, we can generate some words representing the rule of word-final devoicing and then check whether the evaluator considers them well-formed.

In [25]:
evaluate_wfd_words(generate_wfd(n = 1000, pairs = False))

Percentage of well-formed words: 100%.
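`evaluate_wfd_words` only inspects surface forms. For samples generated with `pairs = True`, which are (UR, SF) tuples, a comparable check could compare each surface form with the devoiced underlying form. Both functions below are hypothetical helpers, not part of the original notebook.

```python
def wfd_pair_ok(ur: str, sf: str, mapping: dict) -> bool:
    """True iff sf equals ur with a word-final voiced obstruent devoiced."""
    if not ur:
        return sf == ""
    return sf == ur[:-1] + mapping.get(ur[-1], ur[-1])

def wfd_pair_ratio(pairs, mapping=None) -> float:
    """Share of (UR, SF) pairs that follow word-final devoicing."""
    mapping = {"b": "p"} if mapping is None else mapping
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(wfd_pair_ok(ur, sf, mapping) for ur, sf in pairs) / len(pairs)

print(wfd_pair_ratio([("abab", "abap"), ("abab", "abab")]))  # 0.5
```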


## Step 6: UTP generator and evaluator

The function `generate_tonal_pattern` takes the length of the string that needs to be generated and returns a random string of high (H) and low (L) tones. `utp_tones` takes that string of tones as input and rewrites it according to the rules of unbounded tone plateauing (UTP): no L tones are allowed in-between two H tones.
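Since the rewrite affects exactly the span from the first H to the last H, it can also be expressed with a single regular expression. The sketch below assumes, as `utp_tones` enforces, that the input contains only H and L; `utp_regex` is an illustrative name. The greedy pattern `H[HL]*H` matches from the first H to the last one, and every tone in that match becomes H.

```python
import re

def utp_regex(tones: str) -> str:
    """Turn every L between the first and the last H into H."""
    # the greedy H[HL]*H spans from the first H to the last H of the string
    return re.sub(r"H[HL]*H", lambda m: "H" * len(m.group(0)), tones)

print(utp_regex("LHLLHL"))  # LHHHHL
```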

In [26]:
def generate_tonal_pattern(length = 5):
    """ Generates a random sequence of tones of a given length. """
    return "".join(choice(["H", "L"]) for i in range(length))

In [27]:
def utp_tones(string):
    """ Rewrites a tonal string with respect to the rules of UTP. """
    
    if not set(string) <= {"H", "L"}:
        raise ValueError(f"Unexpected symbols in the tonal string: {string!r}")
    if not ("H" in string and "L" in string):
        return string
    
    first_h = string.find("H")
    last_h = len(string) - string[::-1].find("H")
    return string[:first_h] + "H" * (last_h - first_h) + string[last_h:]

Then, `generate_utp_strings` generates strings of tones that are well-formed according to the rules of UTP. As before, `n` is the number of strings that need to be generated, and `length` is the length of those strings.

In [28]:
def generate_utp_strings(n = 10, length = 5):
    """ Generates n strings of tones that follow UTP rules. """
    return [utp_tones(generate_tonal_pattern(length)) for i in range(n)]

Finally, `evaluate_utp_strings` calculates the percentage of the input tonal strings that are well-formed with respect to the rules of UTP.

In [29]:
def evaluate_utp_strings(data):
    """ Evaluates the correctness of the given sample of tonal strings. """
    correct = 0
    for w in data:
        correct = (correct + 1) if utp_tones(w) == w else correct
        
    ratio = (correct / len(data))
    print(f"Percentage of well-formed tonal layers: {int(ratio * 100)}%.")

As before, we can verify the correctness of the generator using the evaluation functions.

In [30]:
evaluate_utp_strings(generate_utp_strings(n = 1000))

Percentage of well-formed tonal layers: 100%.


## Step 7: First-last harmony generators and evaluators

In [31]:
def first_last_UR(n = 10, length = 10):
    """ Generates URs of first-last harmony words. """
    strings = []
    for i in range(n):
        new = choice(["a", "o"])
        new += "".join([choice(["a", "o", "x"]) for j in range(length - 2)])
        new += choice(["a", "o"])
        strings.append(new)
    return strings

def first_last(string):
    """ Makes the first and the last segment of the string the same. """
    return string[:-1] + string[0]

def first_last_words(n = 10, length = 10):
    """ Generates n first-last words. """
    return [first_last(w) for w in first_last_UR(n, length)]

In [32]:
def evaluate_first_last_words(data):
    """
    Evaluates the correctness of the given sample of
    first-last harmonic words (surface forms).
    """
    newdata = [i for i in data if len(i) > 1]
    correct = 0
    for w in newdata:
        if w[0] == w[-1]:
            correct += 1
        
    ratio = (correct / len(newdata))
    print(f"Percentage of first-last harmonic words: {int(ratio * 100)}%.")

### Auxiliary functions \[to be eliminated\]

The SP generator needs to be checked with an empty negative alphabet: it's incredibly slow, something is going on.

In [33]:
def generate_sp_empty_word(alphabet, length = 5):
    return "".join([choice(alphabet) for i in range(length)])

def generate_sp_empty(alphabet, n = 10, length = 5):
    return [generate_sp_empty_word(alphabet, length) for i in range(n)]

# Preparing training samples for the experiments

### Experiment 1: Word-final devoicing

#### Artificial grammar: `toy_wfd`

In [34]:
toy_wfd = generate_wfd(n = 1000)
print(toy_wfd[:15])

['bapbabbppp', 'aaapppbaap', 'bppaaapbap', 'apabbpbapp', 'bpbabbbaap', 'appappbbbp', 'bpbbababbp', 'baabpbapap', 'ppbppbpbpp', 'pbppbabaap', 'aababppbbp', 'apabbpppba', 'abpbpbpaap', 'bppapaappa', 'bbabbpbpbp']


#### Raw German data: `german_wfd`

In German, orthography doesn't reflect word-final devoicing. So, first of all, I rewrite all word-final /b/, /d/ and /g/ as /p/, /t/ and /k/, respectively. I also remove words with "non-German" characters. The data comes from the [wordlist by enz](https://github.com/enz/german-wordlist).
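The `if`/`elif` chain used for this rewrite can also be written with a lookup table. A minimal sketch (`FINAL_DEVOICING` and `devoice_final` are illustrative names) that additionally tolerates empty strings, where indexing `word[-1]` would fail:

```python
FINAL_DEVOICING = {"b": "p", "d": "t", "g": "k"}

def devoice_final(word: str) -> str:
    """Rewrite a word-final b/d/g as p/t/k; other words are returned unchanged."""
    if word and word[-1] in FINAL_DEVOICING:
        return word[:-1] + FINAL_DEVOICING[word[-1]]
    return word

print([devoice_final(w) for w in ["rad", "tag", "grab", "haus"]])
```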

In [35]:
german_data = []
with codecs.open('german.txt', encoding='utf-8') as f:
    for line in f:
        word = line.rstrip("\n")  # drop the newline, keep the word intact
        if word != "":
            german_data.append(word)
            
print(len(german_data))
print(german_data[:10], "...")

685618
['Aa', 'Aachener', 'Aachenerin', 'Aachenerinnen', 'Aachenern', 'Aacheners', 'Aaden', 'Aak', 'Aake', 'Aaken'] ...


In [36]:
count_final_b = 0
count_final_d = 0
count_final_g = 0

for i in german_data:
    if i[-1] == "b":
        count_final_b += 1
    elif i[-1] == "d":
        count_final_d += 1
    elif i[-1] == "g":
        count_final_g += 1
        
print("Number of final /b/:", count_final_b) # 1599, or 0.2% of words
print("Number of final /d/:", count_final_d) # 15294, or 2.2% of words
print("Number of final /g/:", count_final_g) # 17098, or 2.5% of words

Number of final /b/: 1599
Number of final /d/: 15294
Number of final /g/: 17098


In [37]:
ban = ['à', 'á', 'â', 'å', 'ç', 'è', 'é', 'ê', 'ë', 'í', 'î', 'ñ', 'ó', 'õ', 'ú',
       'û', 'č', 'ē', 'ī', 'ł', 'ō', 'œ', 'š', 'ū']

german_wfd = []
banned_words = []

for w in german_data:
    
    word = w.lower()
    
    illegal = False
    for b in ban:
        if b in word:
            banned_words.append(word)
            illegal = True
            break
            
    if illegal:
        continue
        
    if word[-1] == "b":
        word = word[:-1] + "p"
    elif word[-1] == "d":
        word = word[:-1] + "t"
    elif word[-1] == "g":
        word = word[:-1] + "k"
        
    german_wfd.append(word)

print(len(german_wfd))
print("Clean dataset:", german_wfd[:15], "...\n")

print(len(banned_words))
print("Banned words:", banned_words[:10], "...")

685147
Clean dataset: ['aa', 'aachener', 'aachenerin', 'aachenerinnen', 'aachenern', 'aacheners', 'aaden', 'aak', 'aake', 'aaken', 'aakerbeere', 'aakerbeeren', 'aakes', 'aaks', 'aal'] ...

471
Banned words: ['abbé', 'abbés', 'abrégé', 'abrégés', 'acheuléen', 'acheuléens', 'agrément', 'agréments', 'ampère', 'ångström'] ...


#### Masked German data: `german_wfd_masked`

Now, let us substitute all segments that are not /p/, /t/, /k/, /b/, /d/, /g/ by "a".
This will later make it possible to test the learning algorithms on data with fewer local dependencies.
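The masking loop below is equivalent to a single substitution with a negated character class. A minimal sketch (`mask_non_stops` is an illustrative name), checked against one of the masked words printed below:

```python
import re

def mask_non_stops(word: str) -> str:
    """Replace every segment other than p, t, k, b, d, g by "a"."""
    return re.sub(r"[^ptkbdg]", "a", word)

print(mask_non_stops("aakerbeere"))  # aakaabaaaa
```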

In [38]:
german_wfd_masked = []
for w in german_wfd:
    new = ""
    for s in w:
        if s in ["p", "t", "k", "b", "d", "g"]:
            new += s
        else:
            new += "a"
    german_wfd_masked.append(new)
    
print(len(german_wfd_masked))
print("Masked words:", german_wfd_masked[10:15], "...")

685147
Masked words: ['aakaabaaaa', 'aakaabaaaaa', 'aakaa', 'aaka', 'aaa'] ...


### Experiment 2: One vowel harmony, no blockers

#### Artificial grammar: `toy_vhnb`

In [39]:
ts2 = {("a", "o"):"A", ("x"):"X"}
tl2 = {"A":(1, 2), "X":(2, 4)}
th2 = Harmony(ts2, tl2)
toy_vhnb = th2.generate_words(n = 1000)
print(toy_vhnb[:15], "...")

['xxaxxaxxaa', 'xxxaxxaaxx', 'axxxxaxxxx', 'xxoxxooxxo', 'xxxxaxxxxa', 'xxxxoxxxoo', 'axxxaaxxxx', 'ooxxxoxxxo', 'aaxxxaxxxa', 'aaxxaxxaxx', 'xxxooxxoxx', 'ooxxxoxxoo', 'xxxxaaxxxa', 'xxxxaxxxxa', 'ooxxxoxxxo'] ...


#### Raw Finnish data: `finnish_harmony`

The next step is to have a dataset from a natural language that implements a single harmony.
Here, I use Finnish data from [this link](https://github.com/douglasbuzatto/WordLists/blob/master/finnish-words.txt).

In [40]:
finnish_data = []
with codecs.open('finnish.txt', encoding='utf-8') as f:
    for line in f:
        if line != "":
            finnish_data.append(line[:-2])
            
print(len(finnish_data))
print(finnish_data[:10], "...")

287699
['/* WP Hardening - 2016-06-19 19:09:32.261648 *', 'a', 'aa', 'aaa', 'aaaaaah', 'aaah', 'aaassa', 'aab', 'aaberge', 'aabraham'] ...


Then I clean the data and filter out the non-harmonic stems. Apart from words with digits, spaces, and punctuation, I also remove words that contain `}`, which stands in this dataset for Swedish `å` and is therefore ill-defined with respect to the harmony. Then I rewrite `{` (ä) as `A` and `|` (ö) as `O` to keep the spelling consistent with the Turkish examples further below. Finally, I separate the harmonic words from the non-harmonic ones.

In [41]:
ban = [' ', '*', '-', '.', '/', '0', '1', '2', '3', '4', '6', '8', '9', ':', '}']

finnish_harmony = []
banned_words = []
non_harmonic = []

for w in finnish_data:
    
    word = w.lower()
    
    illegal = False
    for b in ban:
        if b in word:
            banned_words.append(word)
            illegal = True
            break
            
    if illegal:
        continue
    
    word = word.replace("{", "A")
    word = word.replace("|", "O")
    if front_harmony(word):
        finnish_harmony.append(word)
    else:
        non_harmonic.append(word)

print(len(finnish_harmony))
print("Clean dataset:", finnish_harmony[105000:105015], "...\n")

print(len(banned_words))
print("Banned words:", banned_words[10:15], "...\n")

print(len(non_harmonic))
print("Non-harmonic words:", non_harmonic[:3], "...")

250805
Clean dataset: ['liitettAvAA', 'liitetyksi', 'liitetyt', 'liitetAAn', 'liitingin', 'liito', 'liitoille', 'liitoilleen', 'liitoissa', 'liitoista', 'liitoistaan', 'liitoksen', 'liitoksena', 'liitokset', 'liitoksi'] ...

331
Banned words: ['bl}baer', 'bl}field', 'bl}fieldin', 'bl}sar', 'bl}sare'] ...

36563
Non-harmonic words: ['aakkosjArjestykseen', 'aakkosjArjestyksessA', 'aaltoliikettA'] ...
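The source of `front_harmony` is not shown in this section. As a rough illustration of the generalization it filters on, here is a hypothetical sketch that assumes `A`, `O`, `y` are the front harmonizing vowels, `a`, `o`, `u` the back ones, and `e`, `i` are transparent. The percentage helper is likewise only an assumption about what `harmonic_evaluator` reports.

```python
# Hypothetical sketch, not the notebook's front_harmony.
# Assumed transliteration: "A" = ä, "O" = ö; "e" and "i" are transparent.
FI_FRONT = set("AOy")
FI_BACK = set("aou")

def is_front_back_harmonic(word):
    """True unless the word contains both front and back harmonizing vowels."""
    has_front = any(s in FI_FRONT for s in word)
    has_back = any(s in FI_BACK for s in word)
    return not (has_front and has_back)

def percent_true(words, predicate):
    """Share of words (in %) that satisfy the predicate, rounded to an integer."""
    return round(100 * sum(predicate(w) for w in words) / len(words))

sample = ["liitettAvAA", "liitoissa", "aaltoliikettA", "aakkosjArjestykseen"]
print([is_front_back_harmonic(w) for w in sample])
print(percent_true(sample, is_front_back_harmonic))
```

For these four words the sketch agrees with the split printed above: the first two are in `finnish_harmony`, the last two in `non_harmonic`.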


#### Masked Finnish data: `finnish_harmony_masked`

Finally, I create a dataset in which every segment that does not take part in the harmony (consonants and the transparent vowels `e` and `i`) is masked as `x`.

In [42]:
finnish_harmony_masked = []
for w in finnish_harmony:
    new = ""
    for s in w:
        if s in ["A", "O", "y", "a", "o", "u"]:
            new += s
        else:
            new += "x"
    finnish_harmony_masked.append(new)
    
print(len(finnish_harmony_masked))
print("Masked words:", finnish_harmony_masked[170005:170010], "...")

250805
Masked words: ['xauxaxxxxxoxxxxa', 'xauxaxxxxxoxxxxx', 'xauxaxxxxxxxax', 'xauxaxxxx', 'xauxaxxxxuxxxxx'] ...


### Experiment 3: One vowel harmony with blockers

#### Artificial grammar: `toy_vhwb`

In [43]:
harmonic_classes = {("a", "o"):"A", ("x"):"X"}
blockers = {"f":"a"}
cluster_lengths = {"A":(1, 2), "X":(1, 3)}
blocker_prob = 5
h = Harmony(harmonic_classes, cluster_lengths, blockers, blocker_prob)
toy_vhwb = h.generate_words(n = 1000)
print(toy_vhwb[:15], "...")

['xooxxxfxxa', 'fxaafaxxxf', 'oxxooxooxx', 'aaxxxaaxxx', 'xoxxoofxxa', 'xxxaxxaaxx', 'aaxxxaxxxa', 'xxfaxxxaxx', 'xaxaaxxafx', 'axxaaxxxax', 'xafxaxxxax', 'axxxaxfxxa', 'xxxfaxaafa', 'xoxxxooxoo', 'aaxxaxaaxx'] ...


### Experiment 4: Two vowel harmonies, no blockers

#### Artificial grammar: `toy_shnb`

In [44]:
is2 = {("a", "e", "o", "u"):"A", ("x"):"X"}
il2 = {"A":(1, 2), "X":(2, 4)}
ih2 = Harmony(is2, il2)
toy_shnb = ih2.generate_words(n = 1000)
print(toy_shnb[:15], "...")

['xxxxaxxxxa', 'xxxxuxxxuu', 'xxxuuxxuxx', 'oxxxooxxxx', 'xxxaxxxxaa', 'eexxxexxee', 'xxxxaxxxaa', 'xxxexxxexx', 'xxaxxxxaxx', 'uxxxuxxxxu', 'exxxeexxee', 'uuxxxxuxxu', 'xxuuxxxxuu', 'xxxaxxxxaa', 'xxaaxxxxaa'] ...


### Experiment 5: Two vowel harmonies with vowel blockers

#### Artificial grammar: `toy_mhwb`

In [45]:
toy_mhwb = generate_turkish_words(n = 5000, length = 8, cons_cluster = (0, 3))
toy_mhwb.extend(generate_turkish_words(n = 5000, length = 6, cons_cluster = (0, 3)))
toy_mhwb.extend(generate_turkish_words(n = 5000, length = 4, cons_cluster = (0, 3)))
print(toy_mhwb[:15], "...")

['uxxxaaxx', 'Oexxexxe', 'ixxeexxx', 'xIIxxaxa', 'oxaxxIII', 'eexeexee', 'uuaxxxax', 'xoaIxxax', 'IIxxaaIx', 'xxxIaxxx', 'eexxxixx', 'xxxUxxxU', 'Oexeeiie', 'xxIIxaxx', 'UUUUexxx'] ...


#### Raw Turkish data: `turkish_harmony`

The following is a dataset of Turkish harmony from [here](http://www.swarthmore.edu/SocSci/harmony/public_html/dummyresults.html). I remove non-native Turkish words and other entries containing any of `!`, `-`, `w`, `x`, `A`, and I also filter out the words that violate backness or rounding harmony.

In [46]:
banned = []
non_harmonic = []
turkish_harmony = []

with codecs.open('turkish.txt', encoding='utf-8') as f:
    
    ban = ["!", "-", "w", "x", "A"]
    for line in f:
        if line == "":
            continue
        w = line[:-1]
        
        if any([(i in w) for i in ban]):
            banned.append(w)
            continue
            
        if backness_harmony(w) and rounding_harmony(w):
            w = w.replace("K", "k")
            turkish_harmony.append(w)
        else:
            non_harmonic.append(w)
            
print(len(banned))
print(banned[:30], "...\n")

print(len(non_harmonic))
print(non_harmonic[:30], "...\n")
            
print(len(turkish_harmony))
print(turkish_harmony[:30], "...")

890
['ey-', 'gadr-', 'eG-', 'kesr-', 'tard-', 'keyf-', 'kos-', 'ledel-', 'garb-', 'ekto-', 'ekz-', 'fasl-', 'elektrik-Gi', 'elektro-', 'terkib-', 'abs-', 'lem-', 'koyn-', 'sUlUUk-u', 'hacr-', 'hacz-', 'hadd-', 'tesb-', 'li-', 'kIral-', 'hafid-', 'kriyo-', 'kriz-', 'hakk-', 'kIKr-'] ...

10545
['kesad', 'konukomKu', 'kesafet', 'somaki', 'kesan', 'kesat', 'lagemut', 'lagos', 'fuzuli', 'eyalet', 'rufai', 'ruhulkudUs', 'gaavur', 'gaavurca', 'gabardin', 'gabari', 'gabavet', 'kesedar', 'somye', 'konvansiyon', 'kooperatif', 'koordinasyon', 'sondeyiK', 'gabi', 'gabin', 'eylUUl', 'gabro', 'eylUl', 'eytam', 'gaco'] ...

14434
['som', 'lafazan', 'konuk', 'kekti', 'lafzan', 'konukCu', 'somak', 'laGar', 'laGIm', 'konulmak', 'somruk', 'laGIv', 'konum', 'somun', 'kesb', 'somurdanmak', 'konuk', 'somurmak', 'romanyalI', 'ru', 'ey', 'fuzul', 'gaah', 'eyer', 'gaasIb', 'eyercilik', 'rum', 'eyi', 'rumca', 'eyice'] ...


#### Masked Turkish data: `turkish_harmony_masked`
Then, I simplify the Turkish harmonic data by masking all non-vowels as `x`.

In [47]:
turkish_harmony_masked = []
for w in turkish_harmony:
    new = ""
    for s in w:
        if s in "iIuUaeoO":
            new += s
        else:
            new += "x"
    turkish_harmony_masked.append(new)
    
print(len(turkish_harmony_masked))
print("Masked words:", turkish_harmony_masked[12005:12010], "...")

14434
Masked words: ['xOxxex', 'xaxxaxxIxxax', 'xOxxUx', 'xaxxaxIx', 'xOxex'] ...
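The German, Finnish, and Turkish masking cells above all follow the same pattern: keep a fixed set of symbols and replace everything else with a filler symbol. A hypothetical helper that expresses this once (the name `mask_word` is not part of the notebook) could look like this:

```python
def mask_word(word, keep, filler):
    """Keep the symbols in `keep` and replace every other symbol with `filler`."""
    return "".join(s if s in keep else filler for s in word)

# The symbol sets used by the three masking cells above.
german_keep = set("ptkbdg")
finnish_keep = set("AOyaou")
turkish_keep = set("iIuUaeoO")

print(mask_word("aachener", german_keep, "a"))
print(mask_word("liitettAvAA", finnish_keep, "x"))
print(mask_word("kekti", turkish_keep, "x"))
```

With it, each dataset is a one-liner, e.g. `[mask_word(w, finnish_keep, "x") for w in finnish_harmony]`.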


### Experiment 6: Vowel harmony and consonant harmony, no blockers

#### Artificial grammar: `toy_dhnb`

In [48]:
iss = {("a", "o"):"A", ("b", "p"):"B"}
ihs = Harmony(iss)
toy_dhnb = ihs.generate_words(n = 1000)
print(toy_dhnb[:15], "...")

['boobbbobbo', 'bbbaaabbba', 'pppaaapppa', 'opppopooop', 'abaabababb', 'ooopoopppo', 'bboooboboo', 'aababaaabb', 'oopppooopo', 'bbaabbaaab', 'ooobbbobbo', 'bbooobbboo', 'poppoopopo', 'bbbabbaaab', 'bbaaabbbab'] ...


### Experiment 7: Vowel harmony and consonant harmony with blockers

#### Artificial grammar: `toy_dhwb`

In [49]:
aa = {("a", "o"):"A", ("b", "p"):"B"}
bb = {"A":(1, 2), "B":(1, 2)}
cc = {"t":"p"}
dd = 5
hmm = Harmony(aa, bb, cc, dd)
toy_dhwb = hmm.generate_words(n = 5000)
print(toy_dhwb[:15], "...")

['paatppappt', 'aapattapaa', 'aabaababba', 'ttpappaatt', 'pootpototo', 'optopotppt', 'bboboboboo', 'ootpoopopp', 'opooppoopt', 'bbaatpptap', 'bobtpopoot', 'ppooppoopo', 'pataatpapt', 'optppopoot', 'aabbabbaab'] ...


### Experiment 8: Unbounded tonal plateauing
#### Artificial grammar: `toy_utp`

In [50]:
toy_utp = generate_utp_strings(n = 1000)
print(toy_utp[:15], "...")

['LLHHH', 'LLLLH', 'LHHLL', 'LHHHH', 'HHHHH', 'HHHHL', 'HHHHH', 'HHHHH', 'LLHHL', 'HHHHL', 'HHHHH', 'HHHHH', 'LHHHL', 'HHHLL', 'LHHHH'] ...


### Experiment 9: First-last harmony
#### Artificial grammar: `first_last_data`

In [51]:
first_last_data = first_last_words(n = 5000)
print(first_last_data[:15], "...")

['oaxxaxxoxo', 'oooaaaaaxo', 'aooaaxxooa', 'oooxoooxoo', 'ooooxoxaxo', 'aooaxaxxaa', 'aaxaaxaxoa', 'oaoaoxoxao', 'oaaoxooxoo', 'aaaaoxaxoa', 'ooaooxoxxo', 'oxoaaxxoao', 'aaoaoxaaxa', 'oaaoxooxao', 'axxxoaaaaa'] ...


### Quick reference to the datasets

* **Word-final devoicing**
  * `toy_wfd` (1,000 words)
  * `german_wfd` (685,147 words)
  * `german_wfd_masked` (685,147 words)
  
  
* **Single vowel harmony, no blockers**
  * `toy_vhnb` (1,000 words)
  * `finnish_harmony` (250,805 words)
  * `finnish_harmony_masked` (250,805 words)
  
  
* **Single vowel harmony with blockers**
  * `toy_vhwb` (1,000 words)
  
    
* **Two vowel harmonies, no blockers**
  * `toy_shnb` (1,000 words)
  
  
* **Two vowel harmonies with vowel blockers**
  * `toy_mhwb` (15,000 words)
  * `turkish_harmony` (14,434 words)
  * `turkish_harmony_masked` (14,434 words)
  
  
* **Vowel harmony and consonant harmony, no blockers**
  * `toy_dhnb` (1,000 words)
  
  
* **Vowel harmony and consonant harmony with blockers**
  * `toy_dhwb` (5,000 words)
  
  
* **Unbounded tonal plateauing**
  * `toy_utp` (1,000 words)
  
  
* **First-last harmony**
  * `first_last_data` (5,000 words)

# Strictly local experiments

## Experiment 1: Word-final devoicing

### Artificial grammar

In [52]:
this = "sl1"
globals()[this] = SL(polar = "n")
globals()[this].data = toy_wfd
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
evaluate_wfd_words(globals()[this+"_sample"])
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of well-formed words: 100%.
--------------------------
Generates such strings: ['pbbbpbbapbpbaappppaaabapa', 'pp', 'apbp', 'bba', 'a', 'p', 'bp', 'pa', 'aapaba', 'bapp', 'p', 'apapbba', 'apa', 'bp', 'aap']
--------------------------
Size of the grammar: 2
--------------------------
First 30 restrictions: [('b', '<'), ('>', '<')]
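To make the printed grammar easier to read: with `polar = "n"`, the learned grammar appears to list the 2-factors (bigrams over the alphabet plus the word boundaries `>` and `<`) that never occur in the training data. Below is a minimal from-scratch sketch of that idea. It is not SigmaPie's implementation, and `learn_negative_sl2` is a hypothetical name. On a tiny hand-made sample of words that obey word-final devoicing, it returns the same two restrictions as above: no word-final `b`, and `('>', '<')` because no training word is empty.

```python
from itertools import product

def learn_negative_sl2(data):
    """Return the 2-factors (with boundaries > and <) never attested in `data`."""
    alphabet = sorted({s for w in data for s in w})
    attested = set()
    for w in data:
        padded = [">"] + list(w) + ["<"]
        attested.update(zip(padded, padded[1:]))
    # A boundary can only start (>) or end (<) a factor.
    possible = product([">"] + alphabet, alphabet + ["<"])
    return sorted(f for f in possible if f not in attested)

print(learn_negative_sl2(["aabp", "bbap", "ppba", "pa"]))
```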


### German simplified word-final devoicing

In [53]:
# this = "sl2"
# globals()[this] = SL(polar = "n")
# globals()[this].data = german_wfd_masked
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# evaluate_wfd_words(globals()[this+"_sample"], voiced = ("b", "d", "g"))
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("First 30 restrictions:", globals()[this].grammar[:30])

### German word-final devoicing

In [54]:
# this = "sl3"
# globals()[this] = SL(polar = "n")
# globals()[this].data = german_wfd
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# evaluate_wfd_words(globals()[this+"_sample"], voiced = ("b", "d", "g"))
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("First 30 restrictions:", globals()[this].grammar[:30])

## Experiment 2: Single vowel harmony, no blockers

### Artificial grammar

In [55]:
this = "sl4"
globals()[this] = SL(polar = "n")
globals()[this].data = toy_vhnb
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
harmonic_evaluator(globals()[this+"_sample"], single_harmony_no_blockers)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of harmonic words: 82%.
--------------------------
Generates such strings: ['o', 'o', 'ooooxoxoooxooxoooo', 'o', 'ox', 'aaxxxaaa', 'aa', 'a', 'oxxoxoxoo', 'oo', 'xxo', 'xaxxxaxooxxo', 'axxaa', 'xa', 'o']
--------------------------
Size of the grammar: 3
--------------------------
First 30 restrictions: [('a', 'o'), ('o', 'a'), ('>', '<')]


### Simplified Finnish harmony

In [56]:
# this = "sl5"
# globals()[this] = SL(polar = "n")
# globals()[this].data = finnish_harmony_masked
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# harmonic_evaluator(globals()[this+"_sample"], front_harmony)
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("First 30 restrictions:", globals()[this].grammar[:30])

### Finnish harmony

In [57]:
# this = "sl6"
# globals()[this] = SL(polar = "n")
# globals()[this].data = finnish_harmony
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# harmonic_evaluator(globals()[this+"_sample"], front_harmony)
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("First 30 restrictions:", globals()[this].grammar[:30])

## Experiment 3: Single vowel harmony with blockers

In [58]:
this = "sl7"
globals()[this] = SL(polar = "n")
globals()[this].data = toy_vhwb
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
harmonic_evaluator(globals()[this+"_sample"], single_harmony_with_blockers)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of harmonic words: 81%.
--------------------------
Generates such strings: ['ff', 'f', 'fxf', 'faxoxaafx', 'ff', 'axoxooxox', 'x', 'xoxaaafaaffaaaaa', 'x', 'afaa', 'xxoo', 'xoxfa', 'oxaxa', 'fafaff', 'xaaffff']
--------------------------
Size of the grammar: 4
--------------------------
First 30 restrictions: [('a', 'o'), ('f', 'o'), ('o', 'a'), ('>', '<')]


## Experiment 4: Two vowel harmonies, no blockers

In [59]:
this = "sl8"
globals()[this] = SL(polar = "n")
globals()[this].data = toy_shnb
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
harmonic_evaluator(globals()[this+"_sample"], double_harmony)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of harmonic words: 66%.
--------------------------
Generates such strings: ['eeee', 'uxuxx', 'ux', 'uu', 'exo', 'aa', 'exa', 'x', 'e', 'uxu', 'oo', 'u', 'xxaxaaa', 'axo', 'e']
--------------------------
Size of the grammar: 13
--------------------------
First 30 restrictions: [('a', 'e'), ('a', 'o'), ('a', 'u'), ('e', 'a'), ('e', 'o'), ('e', 'u'), ('o', 'a'), ('o', 'e'), ('o', 'u'), ('u', 'a'), ('u', 'e'), ('u', 'o'), ('>', '<')]


## Experiment 5: Two vowel harmonies with vowel blockers

### Artificial grammar

In [60]:
this = "sl9"
globals()[this] = SL(polar = "n")
globals()[this].data = toy_mhwb
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
harmonic_evaluator(globals()[this+"_sample"], backness_and_rounding)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of harmonic words: 63%.
--------------------------
Generates such strings: ['e', 'UxO', 'UUxxaIIIaI', 'x', 'axe', 'UxaxexOxieei', 'O', 'uxe', 'uxIxoxI', 'iixOUxOee', 'uaIxxix', 'OUe', 'xIIa', 'uaxiixUe', 'Uee']
--------------------------
Size of the grammar: 49
--------------------------
First 30 restrictions: [('I', 'O'), ('I', 'U'), ('I', 'e'), ('I', 'i'), ('I', 'o'), ('I', 'u'), ('O', 'I'), ('O', 'O'), ('O', 'a'), ('O', 'i'), ('O', 'o'), ('O', 'u'), ('U', 'I'), ('U', 'O'), ('U', 'a'), ('U', 'i'), ('U', 'o'), ('U', 'u'), ('a', 'O'), ('a', 'U'), ('a', 'e'), ('a', 'i'), ('a', 'o'), ('a', 'u'), ('e', 'I'), ('e', 'O'), ('e', 'U'), ('e', 'a'), ('e', 'o'), ('e', 'u')]


### Simplified Turkish harmony

In [61]:
# this = "sl10"
# globals()[this] = SL(polar = "n")
# globals()[this].data = turkish_harmony_masked
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# harmonic_evaluator(globals()[this+"_sample"], backness_and_rounding)
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("First 30 restrictions:", globals()[this].grammar[:30])

### Turkish harmony

In [62]:
# this = "sl11"
# globals()[this] = SL(polar = "n")
# globals()[this].data = turkish_harmony
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# harmonic_evaluator(globals()[this+"_sample"], backness_and_rounding)
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("First 30 restrictions:", globals()[this].grammar[:30])

## Experiment 6: Vowel harmony and consonant harmony, no blockers

In [63]:
this = "sl12"
globals()[this] = SL(polar = "n")
globals()[this].data = toy_dhnb
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
harmonic_evaluator(globals()[this+"_sample"], double_harmony_no_blockers)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of harmonic words: 64%.
--------------------------
Generates such strings: ['opabaapopaboo', 'a', 'p', 'boppaa', 'opabbo', 'op', 'aapapo', 'obobooopaappp', 'ob', 'pob', 'p', 'opaabo', 'ba', 'poopa', 'opo']
--------------------------
Size of the grammar: 5
--------------------------
First 30 restrictions: [('a', 'o'), ('b', 'p'), ('o', 'a'), ('p', 'b'), ('>', '<')]


## Experiment 7: Vowel harmony and consonant harmony with blockers

In [64]:
this = "sl13"
globals()[this] = SL(polar = "n")
globals()[this].data = toy_dhwb
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
harmonic_evaluator(globals()[this+"_sample"], double_harmony_with_blockers)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of harmonic words: 63%.
--------------------------
Generates such strings: ['p', 'a', 'bbbbboptab', 'oppooptatatttpp', 'tt', 'a', 'btpopab', 'opaaaa', 'obobtpp', 'topppatttpappobtaabobapot', 'ataabttpab', 'pootpooopott', 'ptapapoboptopap', 'optabaab', 'pob']
--------------------------
Size of the grammar: 6
--------------------------
First 30 restrictions: [('a', 'o'), ('b', 'p'), ('o', 'a'), ('p', 'b'), ('t', 'b'), ('>', '<')]


## Experiment 8: Unbounded tonal plateauing

In [65]:
this = "sl14"
globals()[this] = SL(polar = "n", k = 3)
globals()[this].data = toy_utp
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
evaluate_utp_strings(globals()[this+"_sample"])
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of well-formed tonal layers: 85%.
--------------------------
Generates such strings: ['LHHHLLL', 'LL', 'LH', 'LL', 'HL', 'LL', 'LL', 'LLHHH', 'HH', 'LL', 'LLL', 'LLH', 'HLLLHH', 'LLLHLLLL', 'HL']
--------------------------
Size of the grammar: 5
--------------------------
First 30 restrictions: [('H', 'L', 'H'), ('>', 'H', '<'), ('>', 'L', '<'), ('>', '>', '<'), ('>', '<', '<')]
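The only word-internal restriction in the learned 3-local grammar is `HLH`, so a string such as `HLLLHH` from the sample above is accepted even though it has a low stretch between two high tones. A small sketch makes the gap visible. It assumes that unbounded tonal plateauing means "no L between two Hs", which is my reading of the data rather than the source of `evaluate_utp_strings`; both function names are hypothetical.

```python
import re

def is_utp_wellformed(tones):
    """Assumed definition of unbounded tonal plateauing: no L between two Hs."""
    return re.search(r"HL+H", tones) is None

def passes_hlh_ban(tones):
    """The word-internal part of the learned 3-local grammar: no HLH substring."""
    return "HLH" not in tones

for s in ["LLHHH", "HLH", "HLLLHH", "LLLHLLLL"]:
    print(s, passes_hlh_ban(s), is_utp_wellformed(s))
```

Any fixed window size k runs into the same problem: a string with k or more `L`s between two `H`s contains no banned factor of length k.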


## Experiment 9: First-last harmony

In [66]:
this = "sl15"
globals()[this] = SL(polar = "n", k = 3)
globals()[this].data = first_last_data
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
evaluate_first_last_words(globals()[this+"_sample"])
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of first-last harmonic words: 48%.
--------------------------
Generates such strings: ['aaxooaaxaaxaaoaxxaxo', 'oxao', 'axooaxxoxoxxoxoaa', 'oxxxoxaooaoaxxa', 'oo', 'oxxo', 'oxaaxoxaxxaoaxaxxxxa', 'oxo', 'ooxoaaaxxooaxa', 'ao', 'oxoooo', 'axxxxxaa', 'axxo', 'oa', 'ooooxoxa']
--------------------------
Size of the grammar: 13
--------------------------
First 30 restrictions: [('a', 'x', '<'), ('o', 'x', '<'), ('x', 'x', '<'), ('x', '<', '<'), ('>', 'a', '<'), ('>', 'o', '<'), ('>', 'x', 'a'), ('>', 'x', 'o'), ('>', 'x', 'x'), ('>', 'x', '<'), ('>', '>', 'x'), ('>', '>', '<'), ('>', '<', '<')]
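All training strings printed earlier begin and end with the same vowel, so below I assume that first-last harmony means "the first and the last symbol are identical"; this is an assumption, not the source of `evaluate_first_last_words`. The sketch shows why no 3-local grammar can capture this pattern: strictly local languages are closed under suffix substitution, so any SL-3 grammar that accepts `oxxo` and `axxa` must also accept `oxxa`, whose 3-factors (padded with k - 1 boundary symbols, matching factors such as `('>', '>', 'x')` above) all occur in those two words. The function names are hypothetical.

```python
def is_first_last_harmonic(word):
    """Assumed definition: the first and the last symbol of the word are identical."""
    return len(word) > 0 and word[0] == word[-1]

def k_factors(word, k=3):
    """All k-factors of the word padded with k - 1 boundary symbols on each side."""
    padded = ">" * (k - 1) + word + "<" * (k - 1)
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}

seen = k_factors("oxxo") | k_factors("axxa")
print(k_factors("oxxa") <= seen)       # every factor of "oxxa" is already attested
print(is_first_last_harmonic("oxxa"))  # yet "oxxa" violates first-last harmony
```

This is consistent with the 48% above: the learned grammar constrains only the edges of the word, not the relation between them.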


# Strictly piecewise experiments

## Experiment 1: Word-final devoicing

### Artificial grammar

In [67]:
this = "sp1"
globals()[this] = SP(polar = "n")
globals()[this].data = toy_wfd
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
if not globals()[this].grammar:
    evaluate_wfd_words(generate_sp_empty(globals()[this].alphabet, n = 1000))
else:
    evaluate_wfd_words(globals()[this+"_sample"])
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of well-formed words: 64%.
--------------------------
Generates such strings: ['', 'abpaabpaap', 'paaaap', 'ppapbpa', 'pabaaaa', 'bapppb', 'bbaapppapp', 'paapapp', 'bppbp', 'papppbpaba', 'bbappb', 'bppppbbpbppp', 'appppbba', 'aapppppbpbbp', 'abpaabbappb']
--------------------------
Size of the grammar: 0
--------------------------
First 30 restrictions: []


### German simplified word-final devoicing

In [68]:
# this = "sp2"
# globals()[this] = SP(polar = "n")
# globals()[this].data = german_wfd_masked
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# if not globals()[this].grammar:
#     evaluate_wfd_words(generate_sp_empty(globals()[this].alphabet, n = 1000), voiced = ("b", "d", "g"))
# else:
#     evaluate_wfd_words(globals()[this+"_sample"], voiced = ("b", "d", "g"))
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("First 30 restrictions:", globals()[this].grammar[:30])

### German word-final devoicing

In [69]:
# this = "sp3"
# globals()[this] = SP(polar = "n")
# globals()[this].data = german_wfd
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# if not globals()[this].grammar:
#     evaluate_wfd_words(generate_sp_empty(globals()[this].alphabet, n = 1000), voiced = ("b", "d", "g"))
# else:
#     evaluate_wfd_words(globals()[this+"_sample"], voiced = ("b", "d", "g"))
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("First 30 restrictions:", globals()[this].grammar[:30])

## Experiment 2: Single vowel harmony, no blockers

### Artificial grammar

In [70]:
this = "sp4"
globals()[this] = SP(polar = "n")
globals()[this].data = toy_vhnb
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
if not globals()[this].grammar:
    harmonic_evaluator(generate_sp_empty(globals()[this].alphabet, n = 1000), single_harmony_no_blockers)
else:
    harmonic_evaluator(globals()[this+"_sample"], single_harmony_no_blockers)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of harmonic words: 100%.
--------------------------
Generates such strings: ['', 'aaxaaxxax', 'ooooxxxox', 'axaxaaxaxa', 'xaaaxxxax', 'oooooxoxoxx', 'aaxax', 'axaxaaaaxaaa', 'ooooxxoo', 'xaaaaaaaxx', 'oxooox', 'oxxxxoxx', 'xxxxxa', 'xaxaa', 'xaxaxaa']
--------------------------
Size of the grammar: 2
--------------------------
First 30 restrictions: [('a', 'o'), ('o', 'a')]
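The SP grammars above can be read in a similar way: they appear to list the ordered pairs of symbols `(x, y)` such that `y` never follows `x` at any distance in the training data. Below is a hypothetical from-scratch sketch of a negative SP-2 learner (not SigmaPie's code) that reproduces the grammar just printed on a three-word sample. Because it has no boundary symbols, it also illustrates why the SP learner found no restrictions for word-final devoicing: on a sample that obeys the generalization, every ordered pair of `a`, `b`, `p` is attested.

```python
from itertools import combinations, product

def learn_negative_sp2(data):
    """Ordered pairs (x, y) such that y never follows x, at any distance, in `data`."""
    alphabet = sorted({s for w in data for s in w})
    attested = set()
    for w in data:
        # combinations() keeps positional order, so these are length-2 subsequences.
        attested.update(combinations(w, 2))
    return sorted(p for p in product(alphabet, repeat=2) if p not in attested)

print(learn_negative_sp2(["axa", "xax", "oxo"]))
print(learn_negative_sp2(["abpa", "pbp", "bbp"]))
```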


### Simplified Finnish harmony

In [71]:
# this = "sp5"
# globals()[this] = SP(polar = "n")
# globals()[this].data = finnish_harmony_masked
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# if not globals()[this].grammar:
#     harmonic_evaluator(generate_sp_empty(globals()[this].alphabet, n = 1000), front_harmony)
# else:
#     harmonic_evaluator(globals()[this+"_sample"], front_harmony)
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("First 30 restrictions:", globals()[this].grammar[:30])

### Finnish harmony

In [72]:
# this = "sp6"
# globals()[this] = SP(polar = "n")
# globals()[this].data = finnish_harmony
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# if not globals()[this].grammar:
#     harmonic_evaluator(generate_sp_empty(globals()[this].alphabet, n = 1000), front_harmony)
# else:
#     harmonic_evaluator(globals()[this+"_sample"], front_harmony)
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("First 30 restrictions:", globals()[this].grammar[:30])

## Experiment 3: Single vowel harmony with blockers

In [73]:
this = "sp7"
globals()[this] = SP(polar = "n")
globals()[this].data = toy_vhwb
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
if not globals()[this].grammar:
    harmonic_evaluator(generate_sp_empty(globals()[this].alphabet, n = 1000), single_harmony_with_blockers)
else:
    harmonic_evaluator(globals()[this+"_sample"], single_harmony_with_blockers)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of harmonic words: 83%.
--------------------------
Generates such strings: ['', 'oxfxa', 'oofxf', 'xfaxaaxfffa', 'oxaaf', 'oxofaf', 'xxoaxxax', 'fxfxf', 'axxfxafx', 'xoooxoxxa', 'oxffxxa', 'fxxafa', 'oaafa', 'xfaf', 'xofaafax']
--------------------------
Size of the grammar: 2
--------------------------
First 30 restrictions: [('a', 'o'), ('f', 'o')]


## Experiment 4: Two vowel harmonies, no blockers

In [74]:
this = "sp8"
globals()[this] = SP(polar = "n")
globals()[this].data = toy_shnb
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
if not globals()[this].grammar:
    harmonic_evaluator(generate_sp_empty(globals()[this].alphabet, n = 1000), double_harmony)
else:
    harmonic_evaluator(globals()[this+"_sample"], double_harmony)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of harmonic words: 100%.
--------------------------
Generates such strings: ['', 'uuxuu', 'uxuxuuu', 'exx', 'uuuuuxxuu', 'aaxax', 'eexxxxx', 'exexexx', 'axxxxxxaaa', 'uuxu', 'ex', 'uuxuuxuuuu', 'axxaxaaxaa', 'xaxaa', 'eexeeexxxex']
--------------------------
Size of the grammar: 12
--------------------------
First 30 restrictions: [('a', 'e'), ('a', 'o'), ('a', 'u'), ('e', 'a'), ('e', 'o'), ('e', 'u'), ('o', 'a'), ('o', 'e'), ('o', 'u'), ('u', 'a'), ('u', 'e'), ('u', 'o')]


## Experiment 5: Two vowel harmonies with vowel blockers

### Artificial grammar

In [75]:
this = "sp9"
globals()[this] = SP(polar = "n")
globals()[this].data = toy_mhwb
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
if not globals()[this].grammar:
    harmonic_evaluator(generate_sp_empty(globals()[this].alphabet, n = 1000), backness_and_rounding)
else:
    harmonic_evaluator(globals()[this+"_sample"], backness_and_rounding)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of harmonic words: 72%.
--------------------------
Generates such strings: ['', 'uaxxxaIaaII', 'oxIxaIaI', 'oxuxax', 'uIIxIxIIaIaxaI', 'exx', 'aIIaa', 'UUixee', 'iixieexeexxiex', 'eixiiixixex', 'Oieix', 'xaI', 'Oxxieee', 'iixiiie', 'uuIx']
--------------------------
Size of the grammar: 44
--------------------------
First 30 restrictions: [('I', 'O'), ('I', 'U'), ('I', 'e'), ('I', 'i'), ('I', 'o'), ('I', 'u'), ('O', 'I'), ('O', 'O'), ('O', 'a'), ('O', 'o'), ('O', 'u'), ('U', 'I'), ('U', 'O'), ('U', 'a'), ('U', 'o'), ('U', 'u'), ('a', 'O'), ('a', 'U'), ('a', 'e'), ('a', 'i'), ('a', 'o'), ('a', 'u'), ('e', 'I'), ('e', 'O'), ('e', 'U'), ('e', 'a'), ('e', 'o'), ('e', 'u'), ('i', 'I'), ('i', 'O')]


### Simplified Turkish harmony

In [76]:
# this = "sp10"
# globals()[this] = SP(polar = "n")
# globals()[this].data = turkish_harmony_masked
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# if not globals()[this].grammar:
#     harmonic_evaluator(generate_sp_empty(globals()[this].alphabet, n = 1000), backness_and_rounding)
# else:
#     harmonic_evaluator(globals()[this+"_sample"], backness_and_rounding)
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("First 30 restrictions:", globals()[this].grammar[:30])

### Turkish harmony

In [77]:
# this = "sp11"
# globals()[this] = SP(polar = "n")
# globals()[this].data = turkish_harmony
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# if not globals()[this].grammar:
#     harmonic_evaluator(generate_sp_empty(globals()[this].alphabet, n = 1000), backness_and_rounding)
# else:
#     harmonic_evaluator(globals()[this+"_sample"], backness_and_rounding)
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("First 30 restrictions:", globals()[this].grammar[:30])

## Experiment 6: Vowel harmony and consonant harmony, no blockers

In [78]:
this = "sp12"
globals()[this] = SP(polar = "n")
globals()[this].data = toy_dhnb
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
if not globals()[this].grammar:
    harmonic_evaluator(generate_sp_empty(globals()[this].alphabet, n = 1000), double_harmony_no_blockers)
else:
    harmonic_evaluator(globals()[this+"_sample"], double_harmony_no_blockers)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of harmonic words: 100%.
--------------------------
Generates such strings: ['', 'baababa', 'paaaap', 'oooboooboobobo', 'pppppaapa', 'aapapaaap', 'oppppop', 'bboobb', 'paapapp', 'opopooop', 'ooboboboo', 'aapapppppppppa', 'apapaaap', 'obbob', 'papappa']
--------------------------
Size of the grammar: 4
--------------------------
First 30 restrictions: [('a', 'o'), ('b', 'p'), ('o', 'a'), ('p', 'b')]
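A negative SP grammar like the one above can be read as a list of banned subsequences: `('b', 'p')` is violated whenever a `b` appears anywhere before a `p`, whatever stands between them. The following is a minimal standalone sketch of that membership check, not SigmaPie's own code. The function name `sp_wellformed` is hypothetical, and the sketch ignores word edges and any other options of the `SP` class.

```python
def sp_wellformed(word, grammar):
    """Return True if no banned k-subsequence from `grammar` occurs in `word`.

    Hypothetical helper for illustration; works for restrictions of any length k.
    """
    for restriction in grammar:
        it = iter(word)
        # `symbol in it` advances the iterator past the first match, so this
        # tests whether `restriction` is a (not necessarily contiguous)
        # subsequence of `word`.
        if all(symbol in it for symbol in restriction):
            return False
    return True

sp12_grammar = [('a', 'o'), ('b', 'p'), ('o', 'a'), ('p', 'b')]
print(sp_wellformed("baababa", sp12_grammar))  # True
print(sp_wellformed("bap", sp12_grammar))      # False: b ... p
print(sp_wellformed("axxo", sp12_grammar))     # False: a ... o
```

The same function also covers the k = 3 case used for tonal plateauing, e.g. `sp_wellformed("HLLH", [('H', 'L', 'H')])` returns `False`.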


## Experiment 7: Vowel harmony and consonant harmony with blockers

In [79]:
this = "sp13"
globals()[this] = SP(polar = "n")
globals()[this].data = toy_dhwb
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
if not globals()[this].grammar:
    harmonic_evaluator(generate_sp_empty(globals()[this].alphabet, n = 1000), double_harmony_with_blockers)
else:
    harmonic_evaluator(globals()[this+"_sample"], double_harmony_with_blockers)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of harmonic words: 83%.
--------------------------
Generates such strings: ['', 'bottttpto', 'totoppotp', 'pooptp', 'ottpo', 'bpaatp', 'bpopott', 'obppptot', 'potto', 'atapap', 'aptpata', 'bpttaat', 'pot', 'tto', 'bappatttapa']
--------------------------
Size of the grammar: 4
--------------------------
First 30 restrictions: [('a', 'o'), ('o', 'a'), ('p', 'b'), ('t', 'b')]


## Experiment 8: Unbounded tonal plateauing

In [80]:
this = "sp14"
globals()[this] = SP(polar = "n", k = 3)
globals()[this].data = toy_utp
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 100)
if not globals()[this].grammar:
    evaluate_utp_strings(generate_sp_empty(globals()[this].alphabet, n = 100))
else:
    evaluate_utp_strings(globals()[this+"_sample"])
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of well-formed tonal layers: 100%.
--------------------------
Generates such strings: ['', 'LHHHHHHHL', 'LHLL', 'LHHLLL', 'HHHLLLLLLLLL', 'HLLLLLL', 'LLLLL', 'LHLLLLLL', 'HHHLLLLL', 'LLHLLLL', 'HHHLLL', 'LLHHL', 'LLHHHL', 'LH', 'LLLHH']
--------------------------
Size of the grammar: 1
--------------------------
First 30 restrictions: [('H', 'L', 'H')]


## Experiment 9: First-last harmony

In [81]:
this = "sp15"
globals()[this] = SP(polar = "n")
globals()[this].data = first_last_data
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 100)
if not globals()[this].grammar:
    evaluate_first_last_words(generate_sp_empty(globals()[this].alphabet, n = 100))
else:
    evaluate_first_last_words(globals()[this+"_sample"])
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of first-last harmonic words: 37%.
--------------------------
Generates such strings: ['', 'a', 'xxoa', 'oxxaao', 'xaaoa', 'aaoo', 'xa', 'aooax', 'xaoaoaaaoaxx', 'aaa', 'ooo', 'xoxax', 'oaxaa', 'oa', 'ao']
--------------------------
Size of the grammar: 0
--------------------------
First 30 restrictions: []


# Tier-based strictly local experiments

## Experiment 1: Word-final devoicing

### Artificial grammar

In [82]:
this = "tsl1"
globals()[this] = TSL(polar = "n")
globals()[this].data = toy_wfd
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
evaluate_wfd_words(globals()[this+"_sample"])
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("Tier:", globals()[this].tier)
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of well-formed words: 100%.
--------------------------
Generates such strings: ['abaabbaba', 'p', 'abapabappbpbpbapa', 'ba', 'aappbbbpaabba', 'pbabbba', 'abbppa', 'pbbaabbbbpbappbpba', 'pbp', 'abpbp', 'papa', 'bbaap', 'paaappba', 'p', 'p']
--------------------------
Size of the grammar: 2
--------------------------
Tier: ['a', 'b', 'p']
--------------------------
First 30 restrictions: [('b', '<'), ('>', '<')]
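The TSL grammar above combines a tier with banned k-factors that may mention the edge markers `>` and `<`: `('b', '<')` bans a word-final `b` on the tier, and `('>', '<')` bans an empty tier (which is also why the empty word is absent from this sample). As a rough sketch of how such a grammar accepts or rejects a word, the hypothetical helper below projects the tier, pads it with edge markers, and looks for banned contiguous factors. Padding with k − 1 markers on each side is an assumption of this sketch and may differ from SigmaPie's internals.

```python
def tsl_wellformed(word, tier, grammar, k=2, edges=(">", "<")):
    """Check `word` against a negative TSL grammar (hypothetical helper).

    1. Project the tier: keep only the symbols that belong to `tier`.
    2. Pad the projection with k - 1 edge markers on each side (assumption).
    3. Reject the word if any contiguous k-factor of the padded string is banned.
    """
    start, end = edges
    projection = [s for s in word if s in tier]
    padded = [start] * (k - 1) + projection + [end] * (k - 1)
    factors = {tuple(padded[i:i + k]) for i in range(len(padded) - k + 1)}
    return not (factors & set(grammar))

tsl1_tier = ['a', 'b', 'p']
tsl1_grammar = [('b', '<'), ('>', '<')]
print(tsl_wellformed("abaabbaba", tsl1_tier, tsl1_grammar))  # True
print(tsl_wellformed("ab", tsl1_tier, tsl1_grammar))         # False: *b<
print(tsl_wellformed("", tsl1_tier, tsl1_grammar))           # False: *><
```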


### German simplified word-final devoicing

In [83]:
# this = "tsl2"
# globals()[this] = TSL(polar = "n")
# globals()[this].data = german_wfd_masked
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# evaluate_wfd_words(globals()[this+"_sample"], voiced = ("b", "d", "g"))
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("Tier:", globals()[this].tier)
# print("--------------------------")
# print("First 30 restrictions:", globals()[this].grammar[:30])

### German word-final devoicing

In [84]:
# this = "tsl3"
# globals()[this] = TSL(polar = "n")
# globals()[this].data = german_wfd
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# evaluate_wfd_words(globals()[this+"_sample"], voiced = ("b", "d", "g"))
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("Tier:", globals()[this].tier)
# print("--------------------------")
# print("First 30 restrictions:", globals()[this].grammar[:30])

## Experiment 2: Single vowel harmony, no blockers

### Artificial grammar

In [85]:
this = "tsl4"
globals()[this] = TSL(polar = "n")
globals()[this].data = toy_vhnb
globals()[this].data.append("") # added to eliminate *>< on all tiers
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
harmonic_evaluator(globals()[this+"_sample"], single_harmony_no_blockers)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("Tier:", globals()[this].tier)
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of harmonic words: 100%.
--------------------------
Generates such strings: ['xxaxxaxxa', 'xxoxoxox', 'x', 'xxaxax', 'xxx', 'xaxax', 'xax', 'xx', 'xox', 'xxoxox', 'x', 'ax', 'xx', 'xxo', 'xxxaxx']
--------------------------
Size of the grammar: 2
--------------------------
Tier: ['a', 'o']
--------------------------
First 30 restrictions: [('a', 'o'), ('o', 'a')]


### Simplified Finnish harmony

In [86]:
# this = "tsl5"
# globals()[this] = TSL(polar = "n")
# globals()[this].data = finnish_harmony_masked
# globals()[this].data.append("") # added to eliminate *>< on all tiers
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# harmonic_evaluator(globals()[this+"_sample"], front_harmony)
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("Tier:", globals()[this].tier)
# print("--------------------------")
# print("First 30 restrictions:", globals()[this].grammar[:30])

### Finnish harmony

In [87]:
# this = "tsl6"
# globals()[this] = TSL(polar = "n")
# globals()[this].data = finnish_harmony
# globals()[this].data.append("") # added to eliminate *>< on all tiers
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# harmonic_evaluator(globals()[this+"_sample"], front_harmony)
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("Tier:", globals()[this].tier)
# print("--------------------------")
# print("First 30 restrictions:", globals()[this].grammar[:30])

## Experiment 3: Single vowel harmony with blockers

In [88]:
this = "tsl7"
globals()[this] = TSL(polar = "n")
globals()[this].data = toy_vhwb
globals()[this].data.append("") # added to eliminate *>< on all tiers
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
harmonic_evaluator(globals()[this+"_sample"], single_harmony_with_blockers)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("Tier:", globals()[this].tier)
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of harmonic words: 100%.
--------------------------
Generates such strings: ['xx', 'oxxfx', 'xfxxfxxafxxfxx', 'x', 'xfxxfxx', 'xxx', 'xxoxx', 'faxx', 'xxx', 'xaxxaxfxxaxxaxxx', 'xxxfx', 'xaxx', 'xox', 'xxxffxaxxfxx', 'x']
--------------------------
Size of the grammar: 3
--------------------------
Tier: ['a', 'f', 'o']
--------------------------
First 30 restrictions: [('a', 'o'), ('f', 'o'), ('o', 'a')]


## Experiment 4: Two vowel harmonies, no blockers

In [89]:
this = "tsl8"
globals()[this] = TSL(polar = "n")
globals()[this].data = toy_shnb
globals()[this].data.append("") # added to eliminate *>< on all tiers
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
harmonic_evaluator(globals()[this+"_sample"], double_harmony)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("Tier:", globals()[this].tier)
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of harmonic words: 100%.
--------------------------
Generates such strings: ['xxoxox', 'xxoxxoxx', 'xxoxxx', 'xxoxx', 'xxx', 'xxxoxxoxxooxxoxxx', 'xxxaxxax', 'xx', 'xx', 'xoxxxoxxo', 'xaxx', 'xxoxoxxoxox', 'xxxox', 'xxaaxx', 'xxoxxxoxxoxx']
--------------------------
Size of the grammar: 12
--------------------------
Tier: ['a', 'e', 'o', 'u']
--------------------------
First 30 restrictions: [('a', 'e'), ('a', 'o'), ('a', 'u'), ('e', 'a'), ('e', 'o'), ('e', 'u'), ('o', 'a'), ('o', 'e'), ('o', 'u'), ('u', 'a'), ('u', 'e'), ('u', 'o')]


## Experiment 5: Two vowel harmonies with vowel blockers

### Artificial grammar

In [90]:
this = "tsl9"
globals()[this] = TSL(polar = "n")
globals()[this].data = toy_mhwb
globals()[this].data.append("") # added to eliminate *>< on all tiers
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
harmonic_evaluator(globals()[this+"_sample"], backness_and_rounding)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("Tier:", globals()[this].tier)
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of harmonic words: 100%.
--------------------------
Generates such strings: ['xxexxexxxexxixxexixixxex', 'Uxxx', 'xxxexexxixxxexxexxxixxx', 'xxuxaxxIxxxaxaxaxxaxaxxxIxxIxaxxx', 'xx', 'xx', 'xxaxxax', 'xUxxexxixx', 'xuxaxxaxxaxxIxIx', 'xxoxxuxxaxxx', 'xax', 'xxxOxexxexxxixxexxeiixxixxixxixx', 'xxaxxaxxaxxaxaxxaxxaxx', 'xxaxxxaxxxaxx', '']
--------------------------
Size of the grammar: 48
--------------------------
Tier: ['I', 'O', 'U', 'a', 'e', 'i', 'o', 'u']
--------------------------
First 30 restrictions: [('I', 'O'), ('I', 'U'), ('I', 'e'), ('I', 'i'), ('I', 'o'), ('I', 'u'), ('O', 'I'), ('O', 'O'), ('O', 'a'), ('O', 'i'), ('O', 'o'), ('O', 'u'), ('U', 'I'), ('U', 'O'), ('U', 'a'), ('U', 'i'), ('U', 'o'), ('U', 'u'), ('a', 'O'), ('a', 'U'), ('a', 'e'), ('a', 'i'), ('a', 'o'), ('a', 'u'), ('e', 'I'), ('e', 'O'), ('e', 'U'), ('e', 'a'), ('e', 'o'), ('e', 'u')]


### Simplified Turkish harmony

In [91]:
# this = "tsl10"
# globals()[this] = TSL(polar = "n")
# globals()[this].data = turkish_harmony_masked
# globals()[this].data.append("") # added to eliminate *>< on all tiers
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# harmonic_evaluator(globals()[this+"_sample"], backness_and_rounding)
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("Tier:", globals()[this].tier)
# print("--------------------------")
# print("First 30 restrictions:", globals()[this].grammar[:30])

### Turkish harmony

In [92]:
# this = "tsl11"
# globals()[this] = TSL(polar = "n")
# globals()[this].data = turkish_harmony
# globals()[this].data.append("") # added to eliminate *>< on all tiers
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# harmonic_evaluator(globals()[this+"_sample"], backness_and_rounding)
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("Tier:", globals()[this].tier)
# print("--------------------------")
# print("First 30 restrictions:", globals()[this].grammar[:30])

## Experiment 6: Vowel harmony and consonant harmony, no blockers

In [93]:
this = "tsl12"
globals()[this] = TSL(polar = "n")
globals()[this].data = toy_dhnb
globals()[this].data.append("") # added to eliminate *>< on all tiers
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
harmonic_evaluator(globals()[this+"_sample"], double_harmony_no_blockers)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("Tier:", globals()[this].tier)
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of harmonic words: 71%.
--------------------------
Generates such strings: ['', '', 'b', '', 'pppa', 'p', 'appobo', '', 'p', 'a', 'booo', 'bo', 'a', 'opaa', '']
--------------------------
Size of the grammar: 4
--------------------------
Tier: ['a', 'b', 'o', 'p']
--------------------------
First 30 restrictions: [('a', 'o'), ('b', 'p'), ('o', 'a'), ('p', 'b')]
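One plausible reading of the 71% result: the learner places vowels and consonants on a single tier, so its 2-factor restrictions only compare adjacent tier symbols, and an intervening consonant hides a vowel disagreement (and vice versa). The sketch below checks the sampled word `appobo` both ways. `simplified_harmonic` is a hypothetical stand-in for `double_harmony_no_blockers` that assumes "harmonic" means all vowels agree and all consonants agree; the tier check pads with one edge marker per side, which is also an assumption.

```python
def tier_factors(word, tier):
    """Contiguous 2-factors of the tier projection, padded with '>' and '<'."""
    padded = [">"] + [s for s in word if s in tier] + ["<"]
    return set(zip(padded, padded[1:]))

def simplified_harmonic(word):
    """Hypothetical evaluator: at most one vowel and at most one consonant type."""
    symbols = set(word)
    return len(symbols & {"a", "o"}) <= 1 and len(symbols & {"b", "p"}) <= 1

tsl12_tier = ['a', 'b', 'o', 'p']
tsl12_grammar = [('a', 'o'), ('b', 'p'), ('o', 'a'), ('p', 'b')]

word = "appobo"  # taken from the tsl12 sample above
licit_on_tier = not (tier_factors(word, tsl12_tier) & set(tsl12_grammar))
print(licit_on_tier)              # True: no banned adjacent pair on the tier
print(simplified_harmonic(word))  # False: a/o and p/b both co-occur
```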


## Experiment 7: Vowel harmony and consonant harmony with blockers

In [94]:
this = "tsl14"
globals()[this] = TSL(polar = "n")
globals()[this].data = toy_dhwb
globals()[this].data.append("") # added to eliminate *>< on all tiers
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
harmonic_evaluator(globals()[this+"_sample"], double_harmony_with_blockers)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("Tier:", globals()[this].tier)
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of harmonic words: 69%.
--------------------------
Generates such strings: ['aba', 'pap', 'tabtppatap', 'o', 'b', 'p', 'a', '', '', 'o', 'btttpapa', 'tab', 'oota', 'p', '']
--------------------------
Size of the grammar: 5
--------------------------
Tier: ['a', 'b', 'o', 'p', 't']
--------------------------
First 30 restrictions: [('a', 'o'), ('b', 'p'), ('o', 'a'), ('p', 'b'), ('t', 'b')]


## Experiment 8: First-last harmony

In [95]:
this = "tsl15"
globals()[this] = TSL(polar = "n")
globals()[this].data = first_last_data
globals()[this].data.append("") # added to eliminate *>< on all tiers
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
evaluate_first_last_words(globals()[this+"_sample"])
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("Tier:", globals()[this].tier)
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of first-last harmonic words: 49%.
--------------------------
Generates such strings: ['', '', '', 'oaoxxaaoo', 'ooo', 'oo', 'aoxxa', 'oaaxoaoxxxoxaoxxxxaa', 'aaoaxo', 'oaxa', 'oxooa', 'o', '', '', '']
--------------------------
Size of the grammar: 2
--------------------------
Tier: ['a', 'o', 'x']
--------------------------
First 30 restrictions: [('x', '<'), ('>', 'x')]


## Experiment 9: Unbounded tonal plateauing

In [96]:
this = "tsl13"
globals()[this] = TSL(polar = "n", k = 3)
globals()[this].data = toy_utp
globals()[this].data.append("") # added to eliminate *>< on all tiers
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
evaluate_utp_strings(globals()[this+"_sample"])
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("Tier:", globals()[this].tier)
print("--------------------------")
print("First 30 restrictions:", globals()[this].grammar[:30])

Percentage of well-formed tonal layers: 89%.
--------------------------
Generates such strings: ['HHLL', 'LLHHHL', 'LHHLL', 'HLLHLL', 'LL', 'HHLLH', 'LL', 'HL', 'HHHHH', '', 'LLHHH', 'HH', 'HHH', '', 'LLL']
--------------------------
Size of the grammar: 3
--------------------------
Tier: ['H', 'L']
--------------------------
First 30 restrictions: [('H', 'L', 'H'), ('>', 'H', '<'), ('>', 'L', '<')]
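The restriction `('H', 'L', 'H')` means different things in the two grammars: for SP it bans H…L…H as a subsequence, while on a TSL tier it only bans a contiguous HLH. Since every symbol here is on the tier `['H', 'L']`, the projection is the word itself, so sampled strings such as `HLLHLL` and `HHLLH` pass the TSL grammar although they break plateauing, which is consistent with the 89% score. The helpers below are hypothetical and omit edge markers, which only matter for the single-tone restrictions.

```python
def has_factor(string, banned):
    """True if `banned` occurs as a contiguous substring (a k-factor)."""
    k = len(banned)
    return any(tuple(string[i:i + k]) == banned for i in range(len(string) - k + 1))

def has_subsequence(string, banned):
    """True if `banned` occurs as a possibly discontiguous subsequence."""
    it = iter(string)
    return all(symbol in it for symbol in banned)

banned = ('H', 'L', 'H')
for word in ["HLLHLL", "HHLLH", "LHHHHHHHL"]:
    print(word, has_factor(word, banned), has_subsequence(word, banned))
# HLLHLL False True
# HHLLH False True
# LHHHHHHHL False False
```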


# Multiple tier-based strictly local experiments

## Experiment 1: Word-final devoicing

### Artificial grammar

In [97]:
this = "mtsl1"
globals()[this] = MTSL(polar = "n")
globals()[this].data = toy_wfd
globals()[this].data.append("") # added to eliminate *>< on all tiers
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
evaluate_wfd_words(globals()[this+"_sample"])
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("Grammars:", globals()[this].grammar)

Percentage of well-formed words: 100%.
--------------------------
Generates such strings: ['aba', 'bbbpapa', '', 'a', 'pa', 'aba', 'bpa', '', 'appa', 'bpbabbp', 'app', 'bbpbap', '', 'a', 'apabp']
--------------------------
Size of the grammar: 1
--------------------------
Grammars: {('a', 'b', 'p'): [('b', '<')]}


### German simplified word-final devoicing

In [98]:
# this = "mtsl2"
# globals()[this] = MTSL(polar = "n")
# globals()[this].data = german_wfd_masked
# globals()[this].data.append("") # added to eliminate *>< on all tiers
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# evaluate_wfd_words(globals()[this+"_sample"], voiced = ("b", "d", "g"))
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("Grammars:", globals()[this].grammar)

### German word-final devoicing

In [99]:
# this = "mtsl3"
# globals()[this] = MTSL(polar = "n")
# globals()[this].data = german_wfd
# globals()[this].data.append("") # added to eliminate *>< on all tiers
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# evaluate_wfd_words(globals()[this+"_sample"], voiced = ("b", "d", "g"))
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("Grammars:", globals()[this].grammar)

## Experiment 2: Single vowel harmony, no blockers

### Artificial grammar

In [100]:
this = "mtsl4"
globals()[this] = MTSL(polar = "n")
globals()[this].data = toy_vhnb
globals()[this].data.append("") # added to eliminate *>< on all tiers
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
harmonic_evaluator(globals()[this+"_sample"], single_harmony_no_blockers)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("Grammars:", globals()[this].grammar)

Percentage of harmonic words: 100%.
--------------------------
Generates such strings: ['', 'ox', 'a', 'a', '', 'xx', 'ox', 'oo', 'axxa', 'oxoo', 'aaa', 'oxooox', '', 'ox', 'aax']
--------------------------
Size of the grammar: 1
--------------------------
Grammars: {('a', 'o'): [('o', 'a'), ('a', 'o')]}


### Simplified Finnish harmony

In [101]:
# this = "mtsl5"
# globals()[this] = MTSL(polar = "n")
# globals()[this].data = finnish_harmony_masked
# globals()[this].data.append("") # added to eliminate *>< on all tiers
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# harmonic_evaluator(globals()[this+"_sample"], front_harmony)
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("Grammars:", globals()[this].grammar)

### Finnish harmony

In [102]:
# this = "mtsl6"
# globals()[this] = MTSL(polar = "n")
# globals()[this].data = finnish_harmony
# globals()[this].data.append("") # added to eliminate *>< on all tiers
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# harmonic_evaluator(globals()[this+"_sample"], front_harmony)
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("Grammars:", globals()[this].grammar)

## Experiment 3: Single vowel harmony with blockers

In [103]:
this = "mtsl7"
globals()[this] = MTSL(polar = "n")
globals()[this].data = toy_vhwb
globals()[this].data.append("") # added to eliminate *>< on all tiers
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
harmonic_evaluator(globals()[this+"_sample"], single_harmony_with_blockers)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("Grammars:", globals()[this].grammar)

Percentage of harmonic words: 100%.
--------------------------
Generates such strings: ['oo', 'xaafxa', 'x', 'o', '', 'xxofxaxffa', 'oofx', 'oxffff', 'xfxxxaf', 'xxoo', '', 'aaaf', '', 'a', '']
--------------------------
Size of the grammar: 3
--------------------------
Grammars: {('a', 'f', 'o'): [('o', 'a')], ('f', 'o'): [('f', 'o')], ('a', 'o'): [('a', 'o')]}
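In the MTSL grammar above, each tier carries its own restrictions, and a word is well-formed only if it is well-formed on every tier. The blocker `f` shows up in the tier `('a', 'f', 'o')`: there `*oa` applies only when no `f` intervenes, while `*fo` lives on the separate tier `('f', 'o')`. A minimal sketch of that conjunction, assuming the printed dictionary format and padding each projection with `>` and `<` (both assumptions; `mtsl_wellformed` is a hypothetical name):

```python
def mtsl_wellformed(word, grammars):
    """Return True iff `word` violates no banned 2-factor on any tier.

    `grammars` maps a tier (tuple of symbols) to its banned 2-factors,
    mirroring the printed MTSL grammar dictionary.
    """
    for tier, banned in grammars.items():
        padded = [">"] + [s for s in word if s in tier] + ["<"]
        if set(zip(padded, padded[1:])) & set(banned):
            return False
    return True

mtsl7_grammar = {('a', 'f', 'o'): [('o', 'a')], ('f', 'o'): [('f', 'o')], ('a', 'o'): [('a', 'o')]}
print(mtsl_wellformed("xaafxa", mtsl7_grammar))  # True (from the sample above)
print(mtsl_wellformed("ofa", mtsl7_grammar))     # True: f intervenes on the ('a', 'f', 'o') tier
print(mtsl_wellformed("oa", mtsl7_grammar))      # False: *oa on the ('a', 'f', 'o') tier
print(mtsl_wellformed("afo", mtsl7_grammar))     # False: *fo on the ('f', 'o') tier
```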


## Experiment 4: Two vowel harmonies, no blockers

In [104]:
this = "mtsl8"
globals()[this] = MTSL(polar = "n")
globals()[this].data = toy_shnb
globals()[this].data.append("") # added to eliminate *>< on all tiers
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
harmonic_evaluator(globals()[this+"_sample"], double_harmony)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("Grammars:", globals()[this].grammar)

Percentage of harmonic words: 100%.
--------------------------
Generates such strings: ['xa', 'xaa', 'a', 'uuuxxxuxuuxxxux', 'x', 'oxo', 'o', 'exex', 'oox', 'uuuxx', 'o', '', 'eee', 'uxxuxx', '']
--------------------------
Size of the grammar: 6
--------------------------
Grammars: {('a', 'u'): [('a', 'u'), ('u', 'a')], ('e', 'o'): [('o', 'e'), ('e', 'o')], ('e', 'u'): [('u', 'e'), ('e', 'u')], ('o', 'u'): [('u', 'o'), ('o', 'u')], ('a', 'o'): [('a', 'o'), ('o', 'a')], ('a', 'e'): [('e', 'a'), ('a', 'e')]}


## Experiment 5: Two vowel harmonies with vowel blockers

### Artificial grammar

In [105]:
this = "mtsl9"
globals()[this] = MTSL(polar = "n")
globals()[this].data = toy_mhwb
globals()[this].data.append("") # added to eliminate *>< on all tiers
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
harmonic_evaluator(globals()[this+"_sample"], backness_and_rounding)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("Grammars:", globals()[this].grammar)

Percentage of harmonic words: 98%.
--------------------------
Generates such strings: ['', 'oa', 'IIaI', 'axIIaxIxxIxaaaaI', 'IaxxxI', 'I', '', 'IxxIIIax', 'Oxie', 'o', 'aIxIxaxI', 'uxaxxxIxx', 'xxi', 'a', 'e']
--------------------------
Size of the grammar: 32
--------------------------
Grammars: {('o',): [('o', 'o')], ('e', 'o'): [('o', 'e'), ('e', 'o')], ('i', 'u'): [('u', 'i'), ('i', 'u')], ('U', 'i'): [('i', 'U')], ('O', 'U', 'e', 'i', 'x'): [('O', 'i')], ('I', 'U'): [('U', 'I'), ('I', 'U')], ('a', 'e'): [('e', 'a'), ('a', 'e')], ('a', 'i'): [('a', 'i'), ('i', 'a')], ('O', 'a'): [('a', 'O'), ('O', 'a')], ('O', 'u'): [('u', 'O'), ('O', 'u')], ('I', 'a', 'u'): [('u', 'I')], ('O', 'o'): [('o', 'O'), ('O', 'o')], ('I', 'u'): [('I', 'u')], ('O', 'U'): [('U', 'O')], ('U', 'a'): [('U', 'a'), ('a', 'U')], ('I', 'e'): [('I', 'e'), ('e', 'I')], ('I', 'o'): [('I', 'o')], ('i', 'o'): [('i', 'o'), ('o', 'i')], ('O',): [('O', 'O')], ('O', 'e'): [('e', 'O')], ('I', 'O'): [('I', 'O'), ('O', 'I')]

### Simplified Turkish harmony

In [106]:
# this = "mtsl10"
# globals()[this] = MTSL(polar = "n")
# globals()[this].data = turkish_harmony_masked
# globals()[this].data.append("") # added to eliminate *>< on all tiers
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# harmonic_evaluator(globals()[this+"_sample"], backness_and_rounding)
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("Grammars:", globals()[this].grammar)

### Turkish harmony

In [107]:
# this = "mtsl11"
# globals()[this] = MTSL(polar = "n")
# globals()[this].data = turkish_harmony
# globals()[this].data.append("") # added to eliminate *>< on all tiers
# globals()[this].extract_alphabet()
# globals()[this].learn()
# globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
# harmonic_evaluator(globals()[this+"_sample"], backness_and_rounding)
# print("--------------------------")
# print("Generates such strings:", globals()[this+"_sample"][:15])
# print("--------------------------")
# print("Size of the grammar:", len(globals()[this].grammar))
# print("--------------------------")
# print("Grammars:", globals()[this].grammar)

## Experiment 6: Vowel harmony and consonant harmony, no blockers

In [108]:
this = "mtsl12"
globals()[this] = MTSL(polar = "n")
globals()[this].data = toy_dhnb
globals()[this].data.append("") # added to eliminate *>< on all tiers
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
harmonic_evaluator(globals()[this+"_sample"], double_harmony_no_blockers)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("Grammars:", globals()[this].grammar)

Percentage of harmonic words: 100%.
--------------------------
Generates such strings: ['o', 'ap', 'bbo', 'apaa', 'p', 'bbab', 'o', 'o', 'bo', 'op', '', '', 'po', 'apa', 'apapp']
--------------------------
Size of the grammar: 2
--------------------------
Grammars: {('a', 'o'): [('o', 'a'), ('a', 'o')], ('b', 'p'): [('b', 'p'), ('p', 'b')]}


## Experiment 7: Vowel harmony and consonant harmony with blockers

In [109]:
this = "mtsl13"
globals()[this] = MTSL(polar = "n")
globals()[this].data = toy_dhwb
globals()[this].data.append("") # added to eliminate *>< on all tiers
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
harmonic_evaluator(globals()[this+"_sample"], double_harmony_with_blockers)
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("Grammars:", globals()[this].grammar)

Percentage of harmonic words: 100%.
--------------------------
Generates such strings: ['topttp', 'aapt', 'b', 'oot', 'topot', 'tpotpt', 'op', 'o', 'otppppoppp', 'tooottp', 'b', '', 'a', 'bbto', 'poopo']
--------------------------
Size of the grammar: 4
--------------------------
Grammars: {('b', 'p'): [('p', 'b')], ('b', 'p', 't'): [('b', 'p')], ('a', 'o'): [('a', 'o'), ('o', 'a')], ('b', 't'): [('t', 'b')]}
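The sketch below makes the printed grammar format concrete by showing one way a word could be checked against such a negative grammar. It assumes the format seen in the output above: a dict that maps each tier (a tuple of symbols) to the list of bigrams banned on that tier, with `>` and `<` as word-edge markers, as in the `*><` comments of the cells. The helper `mtsl_accepts` is hypothetical and is not part of SigmaPie's API. `grammar_exp7` is copied from the Experiment 7 output.

```python
def mtsl_accepts(word, grammar, left=">", right="<"):
    """Return True if `word` contains no banned bigram on any tier.

    Hypothetical helper (not SigmaPie's API). Assumes a negative 2-local
    MTSL grammar given as {tier: [banned bigrams]}, with '>' and '<'
    marking the word edges.
    """
    for tier, banned in grammar.items():
        # Project the word onto the tier and add the edge markers.
        projection = [left] + [s for s in word if s in tier] + [right]
        # Collect the adjacent pairs on the tier projection.
        bigrams = set(zip(projection, projection[1:]))
        if bigrams & set(banned):
            return False
    return True


# Grammar copied from the Experiment 7 output above.
grammar_exp7 = {
    ('b', 'p'): [('p', 'b')],
    ('b', 'p', 't'): [('b', 'p')],
    ('a', 'o'): [('a', 'o'), ('o', 'a')],
    ('b', 't'): [('t', 'b')],
}

for w in ["btp", "bp", "pb", "tb", "topttp"]:
    print(w, mtsl_accepts(w, grammar_exp7))
# btp True
# bp False
# pb False
# tb False
# topttp True
```

Under this grammar `btp` is accepted and `bp` is not. On the `('b', 'p', 't')` tier, the intervening `t` keeps `b` and `p` from being adjacent, and this is how the blocker appears in a 2-local tier grammar.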


## Experiment 8: Unbounded tonal plateauing

This experiment cannot be run: the MTSL learner does not support $3$-local grammars. (Even if it did, it would be expected to fail on this pattern.)

## Experiment 9: First-last harmony

In [110]:
this = "mtsl15"
globals()[this] = MTSL(polar = "n")
globals()[this].data = first_last_data
globals()[this].data.append("") # added to eliminate *>< on all tiers
globals()[this].extract_alphabet()
globals()[this].learn()
globals()[this+"_sample"] = globals()[this].generate_sample(n = 1000)
evaluate_first_last_words(globals()[this+"_sample"])
print("--------------------------")
print("Generates such strings:", globals()[this+"_sample"][:15])
print("--------------------------")
print("Size of the grammar:", len(globals()[this].grammar))
print("--------------------------")
print("Grammars:", globals()[this].grammar)

Percentage of first-last harmonic words: 50%.
--------------------------
Generates such strings: ['a', '', 'a', 'oxo', 'aoxoo', 'axaaoxooaooxaoxo', '', 'ooxo', 'a', '', 'oa', '', 'o', 'a', '']
--------------------------
Size of the grammar: 1
--------------------------
Grammars: {('a', 'o', 'x'): [('>', 'x'), ('x', '<')]}
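The learned grammar only bans `x` at the word edges, so it places no restriction on which vowels begin and end a word. The sketch below illustrates this gap. It defines a hypothetical tier checker, `mtsl_accepts`, which is not part of SigmaPie's API and which assumes the `{tier: [banned bigrams]}` format and the `>`/`<` edge markers seen in the output. It also defines a hypothetical `first_last_harmonic`, which assumes, for illustration only, that a word is first-last harmonic when its first and last segments are identical. The notebook's own `evaluate_first_last_words` may define harmony differently.

```python
def mtsl_accepts(word, grammar, left=">", right="<"):
    """Hypothetical check: True if no tier projection has a banned bigram."""
    for tier, banned in grammar.items():
        projection = [left] + [s for s in word if s in tier] + [right]
        if set(zip(projection, projection[1:])) & set(banned):
            return False
    return True


def first_last_harmonic(word):
    """Hypothetical definition: the first and last segments must match.

    Words shorter than two segments count as harmonic.
    """
    return len(word) < 2 or word[0] == word[-1]


# Grammar copied from the Experiment 9 output above.
grammar_exp9 = {('a', 'o', 'x'): [('>', 'x'), ('x', '<')]}

for w in ["oxo", "oa", "aoxoo"]:
    print(w, mtsl_accepts(w, grammar_exp9), first_last_harmonic(w))
# oxo True True
# oa True False
# aoxoo True False
```

Words such as `oa` and `aoxoo` both appear in the generated sample. They pass the grammar but fail the check. The first and last vowels can be separated by any number of `a`s and `o`s on every tier that contains them, so a 2-local tier constraint cannot relate them. This is in line with the 50% score above.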
