
<div style="text-align: right">
    <i>
        AMP 2019 (October 12) <br>
        Alëna Aksënova
    </i>
</div>

# _SigmaPie_ for subregular grammar induction

## Subregular languages in phonology

This toolkit is relevant for anyone who works, or plans to work, with subregular grammars from the perspective of either theoretical linguistics or formal language theory.

**Why should theoretical linguists be interested in formal language theory?** <br>
_Formal language theory_ explains how potentially infinite stringsets, or _formal languages_,
can be generalized to grammars encoding the desired patterns and what properties those
grammars have. It also allows one to compare different grammars with respect to parameters such as expressivity.

The **Chomsky hierarchy** orders the main classes of formal languages by their expressive power.
  * **Regular** grammars are as powerful as finite state devices or regular expressions: they can "count" only up to a certain threshold (no $a^{n}b^{n}$ patterns);
  * **Context-free** grammars have access to a potentially unbounded _stack_ that allows them to reproduce patterns that involve center embedding;
  * **Mildly context-sensitive** grammars are powerful enough to handle some types of cross-serial dependencies such as copying;
  * **Context-sensitive** grammars are restricted to a [memory tape](https://en.wikipedia.org/wiki/Punched_tape) whose length is bounded by a linear function of the length of the input;
  * **Recursively enumerable** grammars are as powerful as Turing machines, that is, as any theoretically possible computer: they can use an unbounded memory tape.



<img src="images/chomhier.png" width="600">


Both phonology and morphology frequently display properties of regular languages.

**Phonology** does not require the power of center embedding. For example, consider a hypothetical harmony where the first vowel agrees with the last vowel, the second vowel agrees with the penultimate vowel, etc.
    
    GOOD: "arugula", "tropicalization", "electrotelethermometer", etc.
    BAD:  any other word violating the rule.


While this pattern is theoretically possible, harmonies of this type are unattested in natural languages.
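
To make the hypothetical pattern concrete, here is a minimal standalone sketch (it is not part of _SigmaPie_). The function name `mirror_harmony` and the choice of "a", "e", "i", "o", "u" as the vowel set are assumptions made for illustration. Note that the check has to compare the whole vowel sequence with its reverse, which is exactly the kind of center-embedded dependency that regular grammars cannot express.

```python
VOWELS = set("aeiou")  # assumed vowel inventory, for illustration only

def mirror_harmony(word):
    """True if the i-th vowel from the left equals the i-th vowel from the right."""
    vowels = [ch for ch in word.lower() if ch in VOWELS]
    return vowels == vowels[::-1]

for word in ["arugula", "tropicalization", "electrotelethermometer", "regular"]:
    print(word, mirror_harmony(word))
```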

**Morphology** avoids center-embedding as well. In [Aksënova et al. (2016)](https://www.aclweb.org/anthology/W16-2019) we show that it is possible to iterate prefixes with the meaning "after" in Russian. In Ilocano, where the same semantics is expressed via a circumfix, its iteration is prohibited.
    
    RUSSIAN: "zavtra" (tomorrow), "posle-zavtra" (the day after tomorrow), 
             "posle-posle-zavtra" (the day after the day after tomorrow), ...
    ILOCANO: "bigat" (morning), "ka-bigat-an" (the next morning),
             <*>"ka-ka-bigat-an-an" (the morning after the next one).
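
The contrast can be sketched in a few lines of standalone Python (not part of _SigmaPie_). The names `RUSSIAN` and `iterated_circumfix` are hypothetical. Prefix iteration is captured by a plain regular expression, whereas the unattested iterated circumfix would require matching the number of prefixes against the number of suffixes, which is an $a^{n}b^{n}$-style dependency.

```python
import re

# Prefix iteration: any number of "posle-" before the stem (a regular pattern).
RUSSIAN = re.compile(r"(posle-)*zavtra")

def iterated_circumfix(word, stem="bigat", pre="ka-", suf="-an"):
    """Hypothetical unattested pattern: n prefixes must be matched by n suffixes."""
    if stem not in word:
        return False
    left, _, right = word.partition(stem)
    n_pre, n_suf = left.count(pre), right.count(suf)
    # Both sides must consist of nothing but affix copies, and the counts must agree.
    return left == pre * n_pre and right == suf * n_suf and n_pre == n_suf

print(bool(RUSSIAN.fullmatch("posle-posle-zavtra")))
print(iterated_circumfix("ka-ka-bigat-an-an"))
print(iterated_circumfix("ka-ka-bigat-an"))
```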


Moreover, a typological review of patterns shows that phonology and morphology do not require the full power of regular languages. As an example of an unattested pattern, [Heinz (2011)](http://jeffreyheinz.net/papers/Heinz-2011-CPF.pdf) provides a language where a word must have an even number of vowels to be well-formed.
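
The even-vowel pattern is regular because a two-state automaton suffices to recognize it. The following standalone sketch illustrates this; the function name `even_vowels` and the vowel set are assumptions made for illustration.

```python
VOWELS = set("aeiou")  # assumed vowel inventory, for illustration only

def even_vowels(word):
    """Two-state automaton: the state flips every time a vowel is read."""
    state = 0  # 0 = even number of vowels seen so far, 1 = odd
    for ch in word.lower():
        if ch in VOWELS:
            state = 1 - state
    return state == 0

for word in ["papa", "banana", ""]:
    print(repr(word), even_vowels(word))
```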


Regular languages can be subdivided into another nested hierarchy of languages of decreasing expressive power: the **subregular hierarchy**.


<img src="images/subreg.png" width="250">


This tutorial and the _SigmaPie_ toolkit currently cover the following classes:
  * strictly piecewise (SP);
  * strictly local (SL);
  * tier-based strictly local (TSL);
  * multiple tier-based strictly local (MTSL).

## Functionality of the toolkit

  * **Learners** extract grammars from stringsets.
  * **Scanners** evaluate strings with respect to a given grammar.
  * **Sample generators** generate stringsets for a given grammar.
  * **FSM constructors** translate subregular grammars to finite state machines.
  * **Polarity converters** switch negative grammars to positive, and vice versa.

In [None]:
%cd
%cd Desktop/SigmaPie/code/

from main import *

## Strictly piecewise languages

**Negative strictly $k$-piecewise (SP)** grammars prohibit the occurrence of subsequences of $k$ symbols, which may be at an arbitrary distance from each other. The value of $k$ defines the size of the window of the grammar, or the length of the longest sequence that the grammar can prohibit. Alternatively, if the grammar is positive, it lists the subsequences that are allowed in well-formed words of the language.

    k = 2
    POLARITY: negative
    GRAMMAR:  ab, ba
    LANGUAGE: accaacc, cbccc, cccacaaaa, ...
              <*>accacba, <*>bcccacbb, <*>bccccccca, ...
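
The example above can be checked with a short standalone sketch (not the toolkit's own scanner). The helper names `is_subsequence` and `sp_scan_negative` are hypothetical.

```python
def is_subsequence(piece, string):
    """True if the symbols of `piece` occur in `string` in order, at any distance."""
    remaining = iter(string)
    # `symbol in remaining` advances the iterator up to the first match.
    return all(symbol in remaining for symbol in piece)

def sp_scan_negative(string, grammar):
    """A string is well-formed if it contains none of the prohibited subsequences."""
    return not any(is_subsequence(piece, string) for piece in grammar)

grammar = ["ab", "ba"]
for s in ["accaacc", "cbccc", "cccacaaaa", "accacba", "bcccacbb", "bccccccca"]:
    print(s, sp_scan_negative(s, grammar))
```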
              
              
In phonology, an example of an SP pattern is _tone plateauing_, discussed in [Jardine (2015,](https://adamjardine.net/files/jardinecomptone-short.pdf) [2016)](https://adamjardine.net/files/jardine2016dissertation.pdf).
For example, in Luganda (Bantu) a low tone (L) cannot intervene between two high tones (H): L is changed to H in such a configuration.
As a result, a prosodic domain cannot have more than one stretch of H tones.

**Luganda verb and noun combinations** (Hyman and Katamba (2010), cited by Jardine (2016))

  * /tw-áa-mú-láb-a, walúsimbi/ $\Rightarrow$ tw-áá-mu-lab-a, walúsimbi <br>
    ‘we saw him, Walusimbi’ <br>
    **HHLLL, LHLL**
    
  * /tw-áa-láb-w-a walúsimbi/ $\Rightarrow$ tw-áá-láb-wá wálúsimbi <br>
    ‘we were seen by Walusimbi’ <br>
    **HHHHHHLL**
    
  * /tw-áa-láb-a byaa=walúsimbi/ $\Rightarrow$ tw-áá-láb-á byáá-wálúsimbi <br>
    ‘we saw those of Walusimbi’ <br>
    **HHHHHHHHLL**
    
This pattern can be described using the negative SP grammar $G_{SP_{neg}} = \{HLH\}$ with $k = 3$.

### Learning tone plateauing pattern

Let us say that `luganda` represents a "toy" example of the tonal plateauing (TP) pattern.

In [None]:
luganda = ["LLLL", "HHLLL", "LHHHLL", "LLLLHHHH"]

Our goal will be to learn the generalization behind TP.

Negative and positive SP grammars are implemented in the package in the `SP()` class.

In [None]:
tp_pattern = SP()

### Attributes of SP grammars
  * `polar` ("p" or "n") is the polarity of the grammar;
  * `alphabet` (list) is the set of symbols that the grammar uses;
  * `grammar` (list of tuples) is the list of allowed or prohibited substructures of the language;
  * `k` (int) is the size of the locality window of the grammar, by default it is $2$;
  * `data` (list of strings) is the learning sample;
  * `fsm` (FSM object) is the finite state device that corresponds to the grammar; in this case, the device is an FSM family constructed according to [Heinz & Rogers (2013)](https://www.aclweb.org/anthology/W13-3007).
  
The initial step is to define the training sample and the alphabet.

In [None]:
tp_pattern.data = luganda
tp_pattern.alphabet = ["H", "L"]

By default, the locality window of the grammar is $2$ and the delimiters are ">" and "<".

In [None]:
print("Locality of the SP grammar:", tp_pattern.k)
print("Delimiters:", tp_pattern.edges)

All these attributes can be directly accessed. For example, let us change the locality of the window from $2$ to $3$:

In [None]:
tp_pattern.k = 3
print("Locality of the SP grammar:", tp_pattern.k)

### Methods for SP grammars
  * `check_polarity()` and `switch_polarity()` display and change the polarity of the grammar;
  * `learn()` extracts prohibited or allowed subsequences from the training sample;
  * `scan(string)` tells if a given string is well-formed with respect to a learned grammar;
  * `extract_alphabet()` collects alphabet based on the provided data;
  * `generate_sample(n, repeat)` generates $n$ strings based on the given grammar; by default, `repeat` is set to False, and repetitions of the generated strings are not allowed, but this parameter can be set to True;
  * `fsmize()` creates the corresponding FSM family by following the steps outlined in [Heinz & Rogers (2013)](https://www.aclweb.org/anthology/W13-3007);
  * `subsequences(string)` returns all $k$-piecewise subsequences of the given string;
  * `generate_all_ngrams()` generates all possible strings of the length $k$ based on the provided alphabet.

**Checking and changing polarity of the grammar**

By default, the grammars are positive. The polarity can be checked by running the `check_polarity` method:

In [None]:
print("Polarity of the grammar:", tp_pattern.check_polarity())

If the polarity needs to be changed, this can be done using the `switch_polarity` method. It will automatically switch the grammar, if one is provided or already extracted, to the opposite one.

In [None]:
tp_pattern.switch_polarity()
print("Polarity of the grammar:", tp_pattern.check_polarity())

**Learning the SP grammar**

The `learn` method extracts allowed or prohibited subsequences from the learning sample based on the polarity of the grammar and the locality window. Currently, $k=3$ and the grammar is negative.

In [None]:
tp_pattern.learn()
print("Extracted grammar:", tp_pattern.grammar)

Indeed, it learned the TP pattern!

$k$-grams are represented as tuples of strings: elements of the alphabet are not restricted to single characters, which allows other representations to be learned as well.
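
The idea behind learning an SP grammar can be sketched in standalone Python: collect every attested $k$-subsequence, and, for a negative grammar, return all $k$-grams over the alphabet that were never attested. This is an illustrative sketch under that assumption, not the toolkit's actual implementation; the names `k_subsequences` and `learn_sp` are hypothetical.

```python
from itertools import combinations, product

def k_subsequences(string, k):
    """All subsequences of length k, as tuples of symbols."""
    return set(combinations(string, k))

def learn_sp(data, alphabet, k, polar="n"):
    """Positive grammar: attested k-subsequences; negative grammar: the unattested ones."""
    attested = set()
    for string in data:
        attested |= k_subsequences(string, k)
    if polar == "p":
        return sorted(attested)
    return sorted(set(product(alphabet, repeat=k)) - attested)

luganda = ["LLLL", "HHLLL", "LHHHLL", "LLLLHHHH"]
print(learn_sp(luganda, ["H", "L"], k=3))  # [('H', 'L', 'H')]
```

On this toy sample, the only unattested 3-subsequence is H...L...H, which is the tonal plateauing restriction.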

**Scanning strings and telling if they are part of the language**

The `scan` method takes a string as input and returns True or False depending on whether the string is contained in the language of the grammar:

In [None]:
tp = ["HHHLLL", "L", "HHL", "LLHLLL"]
no_tp = ["LLLLHLLLLH", "HLLLLLLH", "LLLHLLLHLLLHL"]

print("Tonal plateauing:")
for string in tp:
    print("String", string, "is in L(G):", tp_pattern.scan(string))
    
print("\nNo tonal plateauing:")
for string in no_tp:
    print("String", string, "is in L(G):", tp_pattern.scan(string))

**Generating a data sample**

Based on the learned grammar, a data sample of the desired size can be generated.

In [None]:
sample = tp_pattern.generate_sample(n = 10)
print("Sample:", sample)

**Extracting subsequences**

Finally, the toolkit can also be used to extract subsequences from an input word by passing it to the `subsequences` method.

In [None]:
tp_pattern.k = 3
print("k = 3:", tp_pattern.subsequences("regular"), "\n")
tp_pattern.k = 5
print("k = 5:", tp_pattern.subsequences("regular"))

While SP languages capture multiple long-distance processes such as tone plateauing or some harmonies, they are unable to capture local processes or blocking effects.

## Strictly local languages

**Negative strictly $k$-local (SL)** grammars prohibit the occurrence of consecutive substrings consisting of up to $k$ symbols. The value of $k$ defines the length of the longest substring that cannot be present in a well-formed string of the language. Positive SL grammars define the substrings that can be present in the language.

Importantly, in order to define _first_ and _last_ elements, SL languages use delimiters (">" and "<") that indicate the beginning and the end of the string.

    k = 2
    POLARITY: positive
    GRAMMAR:  >a, ab, ba, b<
    LANGUAGE: ab, abab, abababab, ...
              <*>babab, <*>abaab, <*>bababba, ...
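
The example above can be verified with a standalone sketch of a positive SL scanner (not the toolkit's own code). The helper names `k_factors` and `sl_scan_positive` are hypothetical, and padding the string with $k-1$ delimiters on each side is an assumption of this sketch.

```python
def k_factors(string, k=2, edges=(">", "<")):
    """Consecutive k-grams of the string padded with k-1 delimiters on each side."""
    padded = [edges[0]] * (k - 1) + list(string) + [edges[1]] * (k - 1)
    return [tuple(padded[i:i + k]) for i in range(len(padded) - k + 1)]

def sl_scan_positive(string, grammar, k=2):
    """A string is well-formed if every one of its k-factors is listed in the grammar."""
    return all(factor in grammar for factor in k_factors(string, k))

grammar = {(">", "a"), ("a", "b"), ("b", "a"), ("b", "<")}
for s in ["ab", "abab", "abababab", "babab", "abaab", "bababba"]:
    print(s, sl_scan_positive(s, grammar))
```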

In phonology, changes very frequently involve adjacent segments, and the notion of locality is therefore extremely important. A discussion of local processes in phonology can be found in [Chandlee (2014)](http://dspace.udel.edu/bitstream/handle/19716/13374/2014_Chandlee_Jane_PhD.pdf).


**Russian word-final devoicing**

In Russian, the final obstruent of a word cannot be voiced. <br>
  * "lug" \[luK\] _meadow_ $\Rightarrow$ "lug-a" \[luGa\] _of the meadow_
  * "luk" \[luK\] _onion_ $\Rightarrow$ "luk-a" \[luKa\] _of the onion_
  * "porog" \[paroK\] _doorstep_ $\Rightarrow$ "porog-a" \[paroGa\] _of the doorstep_
  * "porok" \[paroK\] _vice_ $\Rightarrow$ "porok-a" \[paroKa\] _of the vice_

### Learning word-final devoicing

Assume a toy dataset with the following mapping:
  * "a" stands for a vowel;
  * "b" stands for a voiced obstruent;
  * "p" stands for any other consonant.

In [None]:
russian = ["", "ababa", "babbap", "pappa", "pabpaapba", "aap"]

In these terms, the Russian word-final devoicing generalization would be _"do not have "b" at the end of the word"_. However, in order to define "beginning" and "end", we need to use the delimiters ">" and "<".

This pattern can then be described using the SL grammar $G_{SL_{neg}} = \{b<\}$.

Let us initialize an SL object.

In [None]:
wf_devoicing = SL()
wf_devoicing.data = russian

### Attributes of SL grammars
  * `polar` ("p" or "n") is the polarity of the grammar;
  * `alphabet` (list) is the set of symbols that the grammar uses;
  * `grammar` (list of tuples) is the list of allowed or prohibited substructures of the language;
  * `k` (int) is the size of the locality window of the grammar, by default it is $2$;
  * `data` (list of strings) is the learning sample;
  * `edges` (list of two characters) are the delimiters used by the grammar, the default value is ">" and "<";
  * `fsm` (FSM object) is the finite state device that corresponds to the grammar.
  
### Methods defined for SL grammars
  * `check_polarity()` and `switch_polarity()` display and change the polarity of the grammar;
  * `learn()` extracts prohibited or allowed substrings from the training sample;
  * `scan(string)` tells if a given string is well-formed with respect to a learned grammar;
  * `extract_alphabet()` collects alphabet based on the provided data;
  * `generate_sample(n, repeat)` generates $n$ strings based on the given grammar; by default, `repeat` is set to False, and repetitions of the generated strings are not allowed, but this parameter can be set to True;
  * `fsmize()` creates the corresponding FSA;
  * `clean_grammar()` removes useless $k$-grams from the grammar.

**Extracting alphabet and learning SL grammar**

As before, the `learn()` method extracts dependencies from the data. It simply extracts the $k$-grams of the indicated size from the data; the default value of $k$ is $2$.

In [None]:
wf_devoicing.learn()
print("The grammar is", wf_devoicing.grammar)
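
For intuition, positive SL learning with $k = 2$ amounts to collecting the bigrams of every delimited training word. The standalone sketch below illustrates this on the toy dataset; it is not the toolkit's implementation, and the helper name `bigrams` is hypothetical.

```python
def bigrams(string, edges=(">", "<")):
    """Set of adjacent symbol pairs of the string wrapped in delimiters."""
    padded = [edges[0]] + list(string) + [edges[1]]
    return set(zip(padded, padded[1:]))

russian = ["", "ababa", "babbap", "pappa", "pabpaapba", "aap"]

positive = set()
for word in russian:
    positive |= bigrams(word)

print(sorted(positive))
print("b< attested:", ("b", "<") in positive)
```

The bigram `("b", "<")` never occurs in the sample, so a positive grammar learned this way does not license a word-final "b".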

In order to automatically extract the alphabet from the data, it is possible to run `extract_alphabet()`.

In [None]:
print("Original value of the alphabet is", wf_devoicing.alphabet)
wf_devoicing.extract_alphabet()
print("Modified value of the alphabet is", wf_devoicing.alphabet)

**Changing polarity of the grammar**

The grammar printed above is positive. If we want to capture the pattern using restrictions rather than allowed substrings, we can apply `switch_polarity()` to the grammar:

In [None]:
wf_devoicing.switch_polarity()
print("The grammar is", wf_devoicing.grammar)
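
Conceptually, switching polarity takes the complement of the grammar with respect to all possible $k$-grams. The following standalone sketch shows this for $k = 2$ only; the helper names `all_bigrams` and `switch_polarity` are hypothetical stand-ins, not the toolkit's code, and the positive grammar is constructed by hand as "every bigram except b<".

```python
from itertools import product

def all_bigrams(alphabet, edges=(">", "<")):
    """Every well-formed bigram over the alphabet, including delimited ones."""
    start, end = edges
    grams = set(product(alphabet, repeat=2))
    grams |= {(start, s) for s in alphabet}   # word-initial bigrams
    grams |= {(s, end) for s in alphabet}     # word-final bigrams
    grams.add((start, end))                   # the empty string
    return grams

def switch_polarity(grammar, alphabet):
    """Complement of the grammar with respect to all possible bigrams."""
    return all_bigrams(alphabet) - set(grammar)

positive = all_bigrams("abp") - {("b", "<")}
negative = switch_polarity(positive, "abp")
print(negative)
```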

**Scanning strings**

As before, the `scan(string)` method returns True or False depending on the well-formedness of the given string with respect to the learned grammar.

In [None]:
wfd = ["apapap", "papa", "abba"]
no_wfd = ["apab", "apapapb"]

print("Word-final devoicing:")
for string in wfd:
    print("String", string, "is in L(G):", wf_devoicing.scan(string))
    
print("\nNo word-final devoicing:")
for string in no_wfd:
    print("String", string, "is in L(G):", wf_devoicing.scan(string))

**Generating data samples**

If the grammar is non-empty, the data sample can be generated in the same way as before for SP grammars: `generate_sample(n, repeat)`, where `n` is the number of examples that need to be generated, and `repeat` is a flag allowing or prohibiting repetitions of the same strings in the generated data.

In [None]:
sample = wf_devoicing.generate_sample(5, repeat = False)
print(sample)

**Cleaning grammar**

Potentially, a grammar that the user provides can contain "useless" $k$-grams. For example, consider the following grammar:

In [None]:
sl = SL()
sl.grammar = [(">", "a"), ("b", "a"), ("a", "b"), ("b", "<"),
              (">", "g"), ("f", "<"), ("t", "t")]
sl.alphabet = ["a", "b", "g", "f", "t"]

This grammar contains $3$ useless bigrams:
  
  * `(">", "g")` can never be used because nothing can follow "g";
  * `("f", "<")` is useless because there is no way to start a string that would lead to "f";
  * `("t", "t")` has both problems listed above.
  
The `clean_grammar()` method detects and removes such $k$-grams by constructing the corresponding finite state machine and trimming all inaccessible nodes of that FSM.

In [None]:
print("Old grammar:", sl.grammar)
sl.clean_grammar()
print("Clean grammar:", sl.grammar)
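
The trimming idea can be sketched without building a full FSM. For a positive bigram grammar, a bigram is useful only if its left symbol is reachable from ">" and "<" is reachable from its right symbol. This is a standalone illustration under that assumption, not the toolkit's implementation; the names `reachable` and `clean_bigrams` are hypothetical.

```python
def reachable(start, graph):
    """All nodes reachable from `start` by following the edges of `graph`."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def clean_bigrams(grammar, delimiters=(">", "<")):
    """Keep only bigrams that lie on some path from the start to the end delimiter."""
    forward, backward = {}, {}
    for left, right in grammar:
        forward.setdefault(left, set()).add(right)
        backward.setdefault(right, set()).add(left)
    from_start = reachable(delimiters[0], forward)
    to_end = reachable(delimiters[1], backward)
    return [(l, r) for l, r in grammar if l in from_start and r in to_end]

grammar = [(">", "a"), ("b", "a"), ("a", "b"), ("b", "<"),
           (">", "g"), ("f", "<"), ("t", "t")]
print(clean_bigrams(grammar))
```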

Even though SP and SL languages can capture a large portion of phonological well-formedness conditions, there are numerous examples of patterns that require increased complexity. For example, **harmony with a blocking effect** cannot be captured using SP grammars because they will "miss" a blocker, and cannot be encoded via SL grammars because they cannot be used for long-distance processes.

## Tier-based strictly local languages

**Tier-based strictly local (TSL)** grammars operate just like the strictly local ones, but they have the power to _ignore_ a certain set of symbols completely. The symbols that are not ignored are called **tier** symbols, and the ones that do not matter for the well-formedness of strings are the **non-tier** ones [(Heinz et al. 2011)](https://pdfs.semanticscholar.org/b934/bfcc962f65e19ae139426668e8f8054e5616.pdf).

_Example._ Assume that we have the following sets of tier and non-tier symbols.

    tier = [l, r]
    non_tier = [c, d]
    
Non-tier symbols are ignored when the strings are being evaluated by TSL grammars, so the alphabets `tier` and `non_tier` define the following mapping:

  * <b>l</b>cc<b>r</b>dc<b>l</b>cddc<b>rl</b>c $\Rightarrow$ <b>lrlrl</b>
  * <b>rl</b>dcd<b>r</b>cc<b>l</b>dcd<b>r</b>d<b>l</b> $\Rightarrow$ <b>rlrlrl</b>
  * cdcddcdcdcdc $\Rightarrow \epsilon$

The strings on the right-hand side are called _tier images_ of the original strings, because all non-tier symbols are ignored in them. _TSL grammars then are SL grammars that operate over the tier._
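
Computing a tier image amounts to erasing every non-tier symbol. The standalone function below is a sketch of that idea; it mirrors the purpose of the toolkit's `tier_image` method but is not its actual code.

```python
def tier_image(string, tier):
    """Erase all symbols that are not on the tier."""
    return "".join(symbol for symbol in string if symbol in tier)

tier = ["l", "r"]
for s in ["lccrdclcddcrlc", "rldcdrccldcdrdl", "cdcddcdcdcdc"]:
    print(s, "=>", repr(tier_image(s, tier)))
```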

Continuing the example above, let us prohibit "l" following "l" unless "r" intervenes, and also ban "r" following "r" unless "l" intervenes. This gives a toy version of the Latin dissimilation pattern exemplified below. Over the `tier`, $G_{TSL_{neg}} = \{ll, rr\}$ expresses this rule.

Intuitively, TSL grammars make non-local dependencies local by evaluating only a tier image of a string.

**Latin liquid dissimilation**

In Latin, liquids tend to alternate: if the final liquid of the stem is "l", the adjectival affix is realized as "aris". Conversely, if the final liquid is "r", the affix is realized as "alis". Consider the examples below.

  * mi<b>l</b>ita<b>r</b>is \~ <*>mi<b>l</b>ita<b>l</b>is _"military"_
  * f<b>l</b>o<b>r</b>a<b>l</b>is \~ <*>f<b>l</b>o<b>r</b>a<b>r</b>is _"floral"_
  * p<b>l</b>u<b>r</b>a<b>l</b>is \~ <*>p<b>l</b>u<b>r</b>a<b>r</b>is _"plural"_
  
This pattern is _not SP_ because SP grammars cannot exhibit blocking effects, and it is _not SL_ either due to its long-distance nature.
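
A negative TSL grammar with the tier of liquids and the restrictions $\{ll, rr\}$ does capture it. The standalone sketch below evaluates the Latin forms listed above; the function name `tsl_scan_negative` is hypothetical, and delimiters are left out because this particular grammar does not refer to word edges.

```python
def tsl_scan_negative(string, tier, grammar):
    """Project the tier, then reject the string if any banned bigram is adjacent on it."""
    image = [s for s in string if s in tier]
    return not any((a, b) in grammar for a, b in zip(image, image[1:]))

tier = {"l", "r"}
grammar = {("l", "l"), ("r", "r")}

for word in ["militaris", "militalis", "floralis", "floraris", "pluralis", "pluraris"]:
    print(word, tsl_scan_negative(word, tier, grammar))
```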

In [None]:
lat_dissim = TSL()

### Attributes of TSL grammars
  * `polar` ("p" or "n") is the polarity of the grammar;
  * `alphabet` (list) is the set of symbols that the grammar uses;
  * `grammar` (list of tuples) is the list of allowed or prohibited substructures of the language;
  * `k` (int) is the size of the locality window of the grammar, by default it is $2$;
  * `data` (list of strings) is the learning sample;
  * `edges` (list of two characters) are the delimiters used by the grammar, the default value is ">" and "<";
  * `fsm` (FSM object) is the finite state device that corresponds to the grammar;
  * `tier` (list) is the list of the tier symbols.
  
### Methods defined for TSL grammars
  * `check_polarity()` and `switch_polarity()` display and change the polarity of the grammar;
  * `learn()` detects the tier symbols and learns the tier grammar;
  * `tier_image(string)` returns the tier image of a given string;
  * `scan(string)` tells if a given string is well-formed with respect to a learned grammar;
  * `extract_alphabet()` collects alphabet based on the provided data;
  * `generate_sample(n, repeat)` generates $n$ strings based on the given grammar; by default, `repeat` is set to False, and repetitions of the generated strings are not allowed, but this parameter can be set to True;
  * `fsmize()` creates the corresponding FSA;
  * `clean_grammar()` removes useless $k$-grams from the grammar.

### Learning liquid dissimilation

Assume a toy Latin dissimilation dataset in which every non-liquid is masked as "c".

In [None]:
lat_dissim.data = ["ccc", "lccrcccclcr", "lrl", "rcclc"]

We don't need to explicitly provide the alphabet. Instead, it can be extracted from the data by running the `extract_alphabet()` method.

In [None]:
lat_dissim.extract_alphabet()

After the alphabet is extracted and the training sample is provided, we can learn the dependency.

In [None]:
lat_dissim.learn()
print('Tier:   ', lat_dissim.tier)
print('Grammar:', lat_dissim.grammar)

By default, the grammars are positive, but this pattern is clearer when represented as a restriction. We can convert the positive grammar to a negative one with the `switch_polarity()` method.

In [None]:
print("Initial polarity of the grammar:", lat_dissim.check_polarity(), "\n")
lat_dissim.switch_polarity()
print("New polarity of the grammar:", lat_dissim.check_polarity())
print("New grammar:", lat_dissim.grammar)

We can learn a negative grammar directly as well. For example, let us learn a pattern like this:

    aaabaaaa, baaaa, aaaaaba, aaaaaab, ...
    <*>aababaaa, <*>baaaababb, <*>aaaa, ...
    
Put simply, the desired pattern is _a single "b" must be present in a string_. Translated into a linguistically relevant pattern, this corresponds, for example, to _stress culminativity_.

In [None]:
stress = TSL(polar="n")
stress.data = ["aaabaaaa", "baaaa", "aaaaaba", "aaaaaab"]
stress.extract_alphabet()
stress.learn()

print("Tier:    ", stress.tier)
print("Grammar: ", stress.grammar)

The learned negative TSL grammar prohibits an empty tier (stress must be present in a word), and prohibits a tier where there is more than a single stress.
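
The two restrictions just described can be written by hand and tested in a standalone sketch. Here the delimiters are added to the tier image, so that the bigram `(">", "<")` stands for an empty tier. The function name `tsl_scan` and the hand-written grammar are assumptions of this sketch, not the literal output of the toolkit.

```python
def tsl_scan(string, tier, banned, edges=(">", "<")):
    """Project the tier, wrap it in delimiters, and look for banned bigrams."""
    image = [edges[0]] + [s for s in string if s in tier] + [edges[1]]
    return not any(pair in banned for pair in zip(image, image[1:]))

tier = {"b"}
banned = {(">", "<"),   # empty tier: no stress at all
          ("b", "b")}   # two stresses in one word

for s in ["aaabaaaa", "baaaa", "aaaaaab", "aababaaa", "baaaababb", "aaaa"]:
    print(s, tsl_scan(s, tier, banned))
```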

Data sample generation is also available for the class of TSL languages. Repetition of the same items within the dataset can be allowed or prohibited by changing the parameter `repeat`.

In [None]:
print(stress.generate_sample(n=10, repeat=True))

In [None]:
print(stress.generate_sample(n=10, repeat=False))

The implemented learning algorithm for $k$-TSL languages was designed by [Jardine and McMullin (2017)](https://adamjardine.net/files/jardinemcmullin2016tslk.pdf) and builds on [Jardine and Heinz (2016)](http://jeffreyheinz.net/papers/Jardine-Heinz-2016-LTSLL.pdf).

However, there are some phonological processes that require more power than TSL. Some languages have more than just a single long-distance assimilation: for example, separate vowel and consonant harmonies. In this case, one tier is not enough: putting both vowels and consonants on a single tier gives the desired locality neither among vowels nor among consonants. For cases like this, the subregular class of _multiple tier-based strictly local languages_ is especially useful.

## Multiple tier-based strictly local languages

There are numerous examples from the typological literature of phonological patterns whose complexity is beyond the power of TSL languages. Any pattern where several long-distance dependencies affect different sets of elements will require more power than TSL; see McMullin (2016) and Aksënova and Deshmukh (2018) for examples and discussion of those patterns.


**Two sibilant harmonies, only one of them has blockers**

The first example comes from Imdlawn Tashlhiyt (Hansson 2010). Sibilants agree in voicing and anteriority. 

  * <b>s</b>-a<b>s:</b>twa _CAUS-settle_
  * <b>S</b>-fia<b>S</b>r _CAUS-be.full.of.straw_
  * <b>z</b>-bru<b>z:</b>a _CAUS-crumble_
  * <b>Z</b>-m:<b>Z</b>dawl _CAUS-stumble_

However, while voicing harmony can be blocked by voiceless obstruents, these obstruents are transparent for anteriority agreement.

  * <b>s</b>-m<b>X</b>a<b>z</b>aj _CAUS-loathe.each.other_
  * <b>S</b>-<b>q</b>u<b>Z:</b>i _CAUS-be.dislocated_

The blockers need to be projected in order to capture the voicing harmony. However, having those blockers on the tier would make the sibilants no longer adjacent, and would therefore cause problems for the anteriority harmony.


**Vowel harmony and consonant harmony**

In Bukusu, vowels agree in height, whereas "l" assimilates to "r" if followed by "r" somewhere further in the word (Odden 1994).

  * <b>r</b><i>ee</i>b-<i>e</i><b>r</b>- _ask-APPL_
  * <b>l</b><i>i</i>m-<i>i</i><b>l</b>- _cultivate-APPL_
  * <b>r</b><i>u</i>m-<i>i</i><b>r</b>- _send-APPL_
  
The tier containing both vowels and liquids would not capture this picture. Intervening vowels would make the liquid spreading non-local over the tier, and intervening liquids would cause vowels to be potentially far away from each other over the tier.


**Multiple tier-based strictly local** grammars are a conjunction of multiple TSL grammars: they consist of several tiers, and restrictions defined for every one of those tiers. For example, consider the following toy example.


    Good strings: aaabbabba, oppopooo, aapapapp, obooboboboobbb, ...
    Bad strings:  <*>aabaoob, <*>paabab, <*>obabooo, ...
    Generalization: if a string contains "a", it cannot contain "o", and vice versa;
                    if a string contains "p", it cannot contain "b", and vice versa.
                    
Two tiers are required to encode this pattern: a tier of vowels ("o" and "a"), and a tier of consonants ("p" and "b"). Consider the following MTSL grammar:

$G_{MTSL_{neg}} = \{
                      T_1 = [a, o], G_1 = [ao, oa];
                      T_2 = [b, p], G_2 = [pb, bp]
                   \}$
    
It restricts the strings of its language to those with consistent choices of vowels and consonants.
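
Because an MTSL grammar is a conjunction of TSL grammars, a string is well-formed only if every tier accepts it. The standalone sketch below encodes the grammar above as a dictionary from tiers to restrictions and scans the toy strings; the function name `mtsl_scan` is hypothetical, and this is not the toolkit's implementation.

```python
def mtsl_scan(string, grammar):
    """Accept the string only if no tier image contains a banned bigram."""
    for tier, banned in grammar.items():
        image = [s for s in string if s in tier]
        if any(pair in banned for pair in zip(image, image[1:])):
            return False
    return True

grammar = {
    ("a", "o"): [("a", "o"), ("o", "a")],   # vowel tier
    ("b", "p"): [("p", "b"), ("b", "p")],   # consonant tier
}

for s in ["aaabbabba", "oppopooo", "aapapapp", "aabaoob", "paabab", "obabooo"]:
    print(s, mtsl_scan(s, grammar))
```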

### Learning independent vowel and consonant harmonies


In [None]:
data = ['aabbaabb', 'abab', 'aabbab', 'abaabb', 'aabaab', 'abbabb', 'ooppoopp',
        'opop', 'ooppop', 'opoopp', 'oopoop', 'oppopp', 'aappaapp', 'apap',
        'aappap', 'apaapp', 'aapaap', 'appapp', 'oobboobb', 'obob', 'oobbob',
        'oboobb', 'ooboob', 'obbobb', 'aabb', 'ab', 'aab', 'abb', 'oopp', 'op',
        'oop', 'opp', 'oobb', 'ob', 'oob', 'obb', 'aapp', 'ap', 'aap', 'app',
        'aaa', 'ooo', 'bbb', 'ppp', 'a', 'o', 'b', 'p', '']

The first step is to initialize an MTSL object.

In [None]:
harmony = MTSL()

### Attributes of MTSL grammars
  * `polar` ("p" or "n") is the polarity of the grammar;
  * `alphabet` (list) is the set of symbols that the grammar uses;
  * `grammar` (dict) maps every tier to the list of substructures allowed or prohibited on that tier;
  * `k` (int) is the size of the locality window of the grammar, by default it is $2$;
  * `data` (list of strings) is the learning sample;
  * `edges` (list of two characters) are the delimiters used by the grammar, the default value is ">" and "<".

### Methods defined for MTSL grammars
  * `check_polarity()` and `switch_polarity()` display and change the polarity of the grammar;
  * `learn()` detects the tiers and learns the grammar of every tier;
  * `scan(string)` tells if a given string is well-formed with respect to a learned grammar;
  * `extract_alphabet()` collects alphabet based on the provided data.

Now we can initialize the `data` and `alphabet` attributes of the MTSL class, and apply the `learn` method to learn the tiers and the grammars that correspond to them.

In [None]:
harmony.data = data
harmony.extract_alphabet()
harmony.learn()

The value of the attribute `grammar` is represented in the following way:

    G = {
            tier_1 (tuple): tier_1_restrictions (list),
            tier_2 (tuple): tier_2_restrictions (list),
                ...
            tier_n (tuple): tier_n_restrictions (list)
        }

In [None]:
for i in harmony.grammar:
    print("Tier:", i)
    print("Restrictions:", harmony.grammar[i], "\n")


The grammar learned by default is positive and rather verbose; it can be easily converted to a negative one with the `switch_polarity` method.

In [None]:
print("Old polarity:", harmony.check_polarity())
harmony.switch_polarity()
print("New polarity:", harmony.check_polarity(), "\n")

for i in harmony.grammar:
    print("Tier:", i)
    print("Restrictions:", harmony.grammar[i], "\n")

As before, the `scan` method tells whether the given string is well-formed with respect to the learned grammar.

In [None]:
good = ["apapappa", "appap", "popo", "bbbooo"]
bad = ["aoap", "popppa", "pabp", "popoa"]

for s in good:
    print("String", s, "is good:", harmony.scan(s))
print()
for s in bad:
    print("String", s, "is good:", harmony.scan(s))

**Current state of the MTSL-related research**

We are currently doing the theoretical work of extending the learning algorithm for MTSL languages from capturing $2$-local dependencies to $k$-local ones. This module of the toolkit will therefore be updated as the theoretical work on this language class progresses.

## Future work

transducers


<img src="images/fig.png" width="400">

**Acknowledgements** 

I am very grateful to _Thomas Graf_, _Jeffrey Heinz_ and _Aniello De Santo_ whose input on different parts of this project was extremely helpful.

**Bibliography**

  * Kaplan and Kay (1994)
  * Karttunen et al. (1992)
  * Shieber (1985)
  * Heinz (2011)
  * Aksënova et al. (2016)
  * Jardine (2015, 2016)
  * Hyman and Katamba (2010)
  * Heinz, Jeffrey and James Rogers. 2013. Learning subregular classes of languages with factored deterministic automata. In Proceedings of the 13th Meeting on the Mathematics of Language (MoL 13), pages 64–71, Sofia, Bulgaria. Association for Computational Linguistics.
  * Chandlee (2014)
  * [Heinz et al. (2011)](https://pdfs.semanticscholar.org/b934/bfcc962f65e19ae139426668e8f8054e5616.pdf)
  * [Jardine and McMullin (2017)](https://adamjardine.net/files/jardinemcmullin2016tslk.pdf)
  * [Jardine and Heinz (2016)](http://jeffreyheinz.net/papers/Jardine-Heinz-2016-LTSLL.pdf)
  * McMullin, Kevin James. 2016. Tier-based locality in long-distance phonotactics: learnability and typology. Doctoral dissertation, University of British Columbia.
  * Aksënova, A.I., & Deshmukh, S.A. (2018). Formal Restrictions On Multiple Tiers.
  * Hansson, Gunnar Olafur. 2010b. Long-distance voicing assimilation in Berber: spreading and/or agreement? In Proceedings of the 2010 annual conference of the Canadian Linguistic Association. Ottawa, Canada: Canadian Linguistic Association.
  * Odden, David. 1994. Adjacency parameters in phonology. Language 70:289–330.