## Multi-tiered sequence representation

### Pseudo words and phonological rules

When dealing with questions of historical (but also synchronic) phonology, linguists often fail to make clear to themselves which questions they ultimately want to investigate. Of the large pool of potential questions that can be asked in phonology, we consider the following three tasks as especially important, particularly from the perspective of computational applications:

1. (pseudo) word generation: the generation, by whatever means, of all *possible* words of a given language
2. phonological rule induction: the induction (inference) of rules that trigger phonological changes in
  - diachrony (from a source language to a target language), but also in
  - synchrony (from a set of source forms to target forms within the same language)
3. (pseudo) word prediction: the prediction of the possible outcome of a phonological process by which a source word form is converted, using a set of phonological rules, into a target form, again in
  - diachrony (from an ancestral form to a descendant form), and also in
  - synchrony (from a source form, or underlying form, to a target form, or surface form)
  
These tasks do not cover all of phonology and phonological theory. They do, however, capture many aspects addressed by different schools of phonology, including generative phonology, Optimality Theory, and classical historical linguistics.

The task of (pseudo) word generation can be seen as fundamental to many approaches to phonology. Phonology is often assumed to define phonotactic rules by which a set of symbols (the phonemes of the language) can be combined into valid forms of the language under investigation. Native speakers would readily accept such forms as possible words of their language. This assumption can often be found in the phonological literature, even if authors rarely address it explicitly. In traditional grammars, for example, we often find a chapter on phonotactics. There, the authors describe the general syllable structure with well-known formulas such as `(C)(C)V(C)`, where C refers to consonants and V refers to vowels. The same assumption also underlies more theoretical accounts of phonology, where scholars try to explain why a certain language does not allow for specific words.
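To make the notion of a syllable template concrete, the following minimal Python sketch turns the formula `(C)(C)V(C)` into a regular expression over a CV skeleton. For illustration only, it assumes a toy consonant/vowel classification (the hypothetical `VOWELS` set). It also treats each character as one segment; real data would require a proper segmentation into sounds.

```python
import re

# Hypothetical toy classification: every character not listed here
# is treated as a consonant. Real data would need proper segmentation.
VOWELS = set("aeiou")

# The template (C)(C)V(C): optional C, optional C, obligatory V, optional C.
TEMPLATE = re.compile(r"C?C?VC?")


def skeleton(syllable: str) -> str:
    """Convert a syllable into its CV skeleton, e.g. 'pat' -> 'CVC'."""
    return "".join("V" if segment in VOWELS else "C" for segment in syllable)


def fits_template(syllable: str) -> bool:
    """Check whether the whole syllable matches the (C)(C)V(C) template."""
    return TEMPLATE.fullmatch(skeleton(syllable)) is not None


for form in ["a", "pa", "pra", "prat", "prats", "pt"]:
    print(form, skeleton(form), fits_template(form))
```

The template only inspects the CV skeleton, so it cannot distinguish *pra* from *rpa*: both reduce to `CCV`.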

Word generation is also reflected in spoken language itself. We can easily test this by asking native speakers whether a word we have just created artificially could be a possible word in their native tongue. At the extreme end, some languages of the world have such restrictive phonotactic systems that the set of possible words or syllables can be listed almost exhaustively. In Mandarin Chinese, for example, the number of possible syllables (which in most cases also correspond to morphemes) is limited to about 1600 different forms. Of these, however, only about 1200 are actually realized in the language. If you ask a Mandarin speaker whether *ká* is a valid word in Mandarin Chinese, most would probably say that it is, even though no morpheme or word with this form exists. But if you ask whether *tré* is a valid form, they would almost certainly deny it.

In languages with a more complex phonology, like German, it is more difficult to determine which forms are potential German words. Pseudo words play an important role in psycholinguistic experiments ([Keuleers and Brysbaert 2010](:bib:Keyleers2010)), so the unbiased generation of pseudo words in different languages is of crucial practical importance for research. Classical phonology, too, discusses which words are possible in a given language even though they are not attested. Software for pseudo-word generation exists (Keuleers and Brysbaert 2010), but most of the time, psycholinguists create their pseudo-word candidates manually, usually by shuffling syllables. The inference, description, and comparison of the phonotactic rules underlying a given language are a typical example of the fundamental research with which classical phonology should be concerned. It is therefore surprising that pseudo-word generation and phonotactic description are rarely addressed within the same framework. We think it is crucial, especially for computer-assisted approaches to historical linguistics (but also to synchronic phonology), to emphasize the similarity between these tasks. We should work on unified frameworks in which they can be tackled from both a computational and a theoretical perspective.
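The manual procedure of shuffling syllables can be sketched in a few lines of Python. The sketch assumes a hypothetical, pre-syllabified toy lexicon. It simply recombines attested syllables and discards candidates that happen to be existing words. It is not the algorithm of any published pseudo-word generator.

```python
import itertools
import random

# Hypothetical toy lexicon of German words, already split into syllables.
LEXICON = [
    ("ta", "fel"),
    ("gar", "ten"),
    ("ha", "se"),
    ("ro", "se"),
]


def shuffle_pseudo_words(lexicon, n_syllables=2, seed=None):
    """Recombine attested syllables into pseudo-word candidates."""
    real_words = {"".join(word) for word in lexicon}
    syllables = sorted({syl for word in lexicon for syl in word})
    candidates = {
        "".join(combo)
        for combo in itertools.product(syllables, repeat=n_syllables)
    }
    # Discard candidates that coincide with words of the lexicon.
    pseudo = sorted(candidates - real_words)
    random.Random(seed).shuffle(pseudo)
    return pseudo


print(shuffle_pseudo_words(LEXICON, seed=1)[:5])
```

Such a procedure only reuses attested syllables. By itself, it therefore cannot tell us which new syllables native speakers would accept.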

The two remaining tasks (rule inference and word prediction) are tightly connected with phonotactics, even if this may not be obvious at first sight. Part of the connection is that the phonotactics of a given language license a specific output, which needs to conform to the phonotactic rules of the target language. More specifically, however, phonetic or phonological context is central to both phonotactics and phonological change. From the phonotactic perspective, we can say that a language like German does not allow for voiced plosives (`[`b, d, g`]`) at the end of a word. From the perspective of phonological change, we can say that all voiced plosives in German words are devoiced when they occur at the end of a word (`[b, d, g] > [p, t, k] / _$`). This rule results in morphonological alternations of voiced and voiceless plosives between German singular and plural forms (`[` h u n t `]` "the dog" vs. `[` h u n d ə `]` "the dogs").
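A rule of the form `[b, d, g] > [p, t, k] / _$` can be applied mechanically to segmented forms. The minimal Python sketch below assumes that words are given as lists of segments, as in `[ h u n d ]`. The end of the list plays the role of `$`. The function name `final_devoicing` is our own and purely illustrative.

```python
# Voiced plosives mapped to their voiceless counterparts.
DEVOICE = {"b": "p", "d": "t", "g": "k"}


def final_devoicing(segments: list[str]) -> list[str]:
    """Apply [b, d, g] > [p, t, k] / _$ to a segmented word form."""
    if segments and segments[-1] in DEVOICE:
        # Only the word-final segment is affected (context _$).
        return segments[:-1] + [DEVOICE[segments[-1]]]
    return list(segments)


print(final_devoicing(["h", "u", "n", "d"]))       # ['h', 'u', 'n', 't']
print(final_devoicing(["h", "u", "n", "d", "ə"]))  # ['h', 'u', 'n', 'd', 'ə']
```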

Rules and patterns such as final devoicing seem easy enough to handle in any framework. However, we have seen that the role of *context* (in whatever form it occurs) in synchronic and diachronic phonology can be quite complex. Models of word generation in which the probability of finding an element at some position depends only on the preceding segment will therefore often fail hopelessly. We could handle final devoicing in German with such models by simply assuming a word generator that never ends a word in a voiced plosive. We would, however, fail in various other tasks, be they related to phonotactics or to historical sound change. +++ maybe one example +++
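The following Python sketch illustrates the kind of model meant here: a generator in which each segment depends only on the preceding one. It uses `^` and `$` as hypothetical word-boundary symbols and is trained on a toy list of segmented German-like forms (our own illustrative data). It never produces a word-final voiced plosive, simply because no training word ends in `b`, `d`, or `g`.

```python
import random
from collections import defaultdict

# Hypothetical toy training data: segmented surface forms.
WORDS = [
    ["h", "u", "n", "t"],
    ["h", "u", "n", "d", "ə"],
    ["t", "a", "k"],
    ["t", "a", "g", "ə"],
    ["l", "i", "p"],
    ["l", "i", "b", "ə"],
]


def train_bigrams(words):
    """Collect, for each segment, the list of segments observed after it."""
    transitions = defaultdict(list)
    for word in words:
        padded = ["^"] + word + ["$"]
        for prev, nxt in zip(padded, padded[1:]):
            transitions[prev].append(nxt)
    return transitions


def generate(transitions, rng):
    """Sample a word segment by segment until the end symbol '$' is drawn.

    Terminates with probability 1, since every segment in the training
    data is eventually followed by '$' within its word.
    """
    word, current = [], "^"
    while True:
        current = rng.choice(transitions[current])
        if current == "$":
            return word
        word.append(current)


model = train_bigrams(WORDS)
rng = random.Random(42)
samples = ["".join(generate(model, rng)) for _ in range(5)]
print(samples)
```

The model is convenient for exactly the reason that limits it: it only ever looks one segment back.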

It is unfortunate that we cannot use simple models in which the probability of finding a given sound depends only on the preceding sound. These models, usually called first-order Markov models, are very well investigated. A large number of tools is available that could be used directly to investigate language data in empirical frameworks.
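For reference, the first-order model can be written down explicitly. Assume that a word $w = s_1 \ldots s_n$ is padded with boundary symbols $s_0$ (word start) and $s_{n+1}$ (word end). Its probability then factorizes into transitions between adjacent segments. The transition probabilities are usually estimated from bigram counts $c(\cdot)$ in a training corpus (maximum-likelihood estimate):

$$
P(w) = \prod_{i=1}^{n+1} P(s_i \mid s_{i-1}),
\qquad
\hat{P}(s_i \mid s_{i-1}) = \frac{c(s_{i-1}\, s_i)}{\sum_{x} c(s_{i-1}\, x)}
$$

Suppose a voiced plosive is never followed by the word-end symbol in the training data. Then $\hat{P}(s_{n+1} \mid s_n) = 0$ for every word ending in [b, d, g], so such words receive probability zero. This is how a first-order model captures final devoicing. Any restriction spanning more than two adjacent symbols remains out of its reach.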

## Problems of current accounts in synchronic and diachronic phonology

If we consider how phonology deals with word generation, rule inference, and word prediction, we can identify several problems. These problems do not pertain to the theory of restrictions and rules (our tasks 1 and 2) *per se*, but rather to the procedures that linguists use to infer those restrictions and rules. The general problem is that inferring restrictions and rules is a hard task. We always deal with a multitude of possibilities, and linguists aim to find the *best* ones among them. Yet they are rarely concrete about what would qualify as an "optimal" restriction or rule for describing a given language. Linguists invoke various criteria:

- *cross-linguistic considerations*: rules recurring in many languages, and thus having predictive force beyond the description of a single language;
- *parsimony*: the claim that an account requires fewer assumptions than a competing account;
- *elegance*: as we often find it in mathematics, although there is no real criterion for judging the elegance of a given proposal.

In general, the question of how to infer the rules is largely ignored in most approaches. One reason is probably that heuristic procedures confronting scholars with all possible explanations for restrictions and rules would deprive them of the fun of coming up with new rules themselves. As a result, most accounts in phonology are restricted to anecdotal, often only seemingly formal explanations of carefully selected phenomena. Discussions rarely focus on the data. Instead, they rely on carefully chosen examples by which scholars try to show that their system explains the few phenomena considered in a given study better than previous accounts. What we need, however, are rigorous empirical proposals that no longer focus on exceptions and extremely challenging cases. Such proposals should instead prove their usefulness based on a clear-cut set of principles that could ideally be applied to as many languages as possible. Ideally, each new account of phonological modeling should be tested on a large range of languages. It should also show that it handles the tasks of word generation (1) and word prediction (3) better than previous accounts.

What counts as *better* in this context is of course difficult to assess. We think, however, that a valid phonological theory should be based on parsimony with respect to a large sample of different languages. It should further be empirical and exhaustive: it should not only explain the most challenging restrictions or exceptions, but ideally account for a very large number of words in the languages under consideration.

What phonology needs, furthermore, is a clear-cut procedure for rule inference that can be automated, at least to some degree, similar to recent attempts in mathematics to find proofs computationally (+++ reference +++). Automation would make sure that the initial inference of rules is not biased by the phonological school from which a proposal is derived.
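As a very small illustration of what an automated, school-neutral first step could look like, the Python sketch below counts segment correspondences between aligned source and target forms. It records each correspondence together with its right-hand context and reports changes that are exceptionless in that context. It assumes hypothetical, pre-aligned toy data of equal length (underlying vs. surface forms). It also considers only the immediately following segment, so it sketches the idea rather than offering a general rule-induction algorithm.

```python
from collections import defaultdict

# Hypothetical toy data: aligned (underlying, surface) pairs of equal length.
PAIRS = [
    (["h", "u", "n", "d"], ["h", "u", "n", "t"]),
    (["h", "u", "n", "d", "ə"], ["h", "u", "n", "d", "ə"]),
    (["t", "a", "g"], ["t", "a", "k"]),
    (["t", "a", "g", "ə"], ["t", "a", "g", "ə"]),
    (["l", "i", "b"], ["l", "i", "p"]),
    (["l", "i", "b", "ə"], ["l", "i", "b", "ə"]),
]


def induce_rules(pairs):
    """Return changes 'x > y / _ctx' that are exceptionless in their context."""
    # outcomes[(source segment, following segment)] -> observed targets
    outcomes = defaultdict(set)
    for source, target in pairs:
        for i, (s, t) in enumerate(zip(source, target)):
            context = source[i + 1] if i + 1 < len(source) else "$"
            outcomes[(s, context)].add(t)
    rules = []
    for (s, context), targets in sorted(outcomes.items()):
        # Keep contexts in which the segment always changes in the same way.
        if len(targets) == 1 and s not in targets:
            rules.append(f"{s} > {next(iter(targets))} / _{context}")
    return rules


print(induce_rules(PAIRS))  # ['b > p / _$', 'd > t / _$', 'g > k / _$']
```

Merging the three induced correspondences into the single rule `[b, d, g] > [p, t, k] / _$` would require knowing that b, d, and g form a natural class. That is information the segment sequence alone does not provide.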

Inferring rules that condition sound changes or restrict the possible words of a given language is difficult because the phenomena we are dealing with are not entirely *simple*. What can follow a given sound in a language does not always depend only on the sound that precedes it. Furthermore, we may have to accept that extra-phonological factors also influence what we accept as possible words and what we reject.

As an example, consider four-consonant clusters in German, which can only occur after short vowels, as in *Herbst* "autumn", *färbst* "(you) color", *stirbst* "(you) die", or *kalbst* "(you) give birth to a calf". The sequences *rbst* and *lbst* are extreme clusters in German. They are very rare and, apart from *Herbst*, restricted to verbs, where *-st* marks the second person singular. If we create the pseudo word *nurbst*, German speakers can clearly pronounce it, and they would likely accept it as a possible German word form. However, based on my intuition as a native speaker, I would predict that speakers would have a harder time accepting it as "German" as a noun, *der Nurbst*, than as a verb form, *du nurbst*. The reason is that the consonant combination is more often encountered in morphologically complex forms than within a single morpheme. In complex forms, speakers detect the verbal ending of the second person singular and interpret it accordingly. Within a single morpheme, *Herbst* is our only example in German.

We would thus assume that restrictions and rules may, at least to some degree, have a stochastic component that cannot be derived from phonology alone. If this is the case, we need models that allow for the inclusion of "extra-phonological" information, both for word generation and for rule induction.
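To illustrate what including extra-phonological information might mean in practice, the Python sketch below annotates each segment of the four German examples with a hypothetical morpheme tier, giving every segment the index of its morpheme. It then checks, for each form, whether the final cluster *-bst* spans a morpheme boundary. The segmentation into letters and the morpheme indices are our own simplified annotation.

```python
# Hypothetical two-tier representation: a segment tier and a morpheme tier
# that assigns each segment the index of the morpheme it belongs to.
FORMS = {
    "Herbst": (list("herbst"), [0, 0, 0, 0, 0, 0]),
    "färbst": (list("färbst"), [0, 0, 0, 0, 1, 1]),
    "stirbst": (list("stirbst"), [0, 0, 0, 0, 0, 1, 1]),
    "kalbst": (list("kalbst"), [0, 0, 0, 0, 1, 1]),
}


def cluster_spans_boundary(segments, morphemes, cluster="bst"):
    """Check whether a word-final cluster crosses a morpheme boundary."""
    n = len(cluster)
    if "".join(segments[-n:]) != cluster:
        return False
    # The cluster spans a boundary if its segments belong to different morphemes.
    return len(set(morphemes[-n:])) > 1


for word, (segments, morphemes) in FORMS.items():
    print(word, cluster_spans_boundary(segments, morphemes))
```

A model with access to the morpheme tier could, in principle, assign *du nurbst* a higher probability than *der Nurbst*. A model that only sees the segment tier has no way to tell the two apart.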




## Regular expressions and multiple levels of representation

+++ discuss ccv templates in grammars, point to biology with profiles of regular expressions, mention multi-level representations +++

## Enhanced modeling of sound sequences with help of multi-tiered sequence models

+++ describe basic idea of problem that a sequence does not have memory +++

