# LELA60331 Computational Linguistics 1 Week 2

In today's session we are going to tackle some problems in computational morphology. We will start by applying what we learnt last week to a morphology problem. We will then learn about a few more regular expression functions in Python and use these, along with our new knowledge of finite state transducers, to work through some further morphology problems.


As we move through the problems you might find it useful to refer to this: https://www.dataquest.io/wp-content/uploads/2019/03/python-regular-expressions-cheat-sheet.pdf

Before that we need to import the Python re library.

In [None]:
import re

### Concatenative Morphology: Esperanto

The constructed language Esperanto uses simple affixation to form nouns as seen in the following table.

| Form | Gloss |
| --- | --- |
| hundo | 'dog' |
| hundino | 'female dog' |
| hundinoj | 'female dogs' |
| hundinego | 'big female dog' |
| hundegino | 'big female dog' |
| hundigetino | 'big small female dog' |
| hundinetego | 'big small female dog' |

Problem 1a: Write a finite state acceptor that will accept all of the nouns seen in the table. All nouns include the noun-forming suffix -o and can also include the following:
- the feminine marker -in
- the diminutive -et
- the augmentative (marking size) -eg
- the plural marker -j

Problem 1b: Write a single regular expression to implement your FSA.
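As a reminder of how an acceptor relates to a regular expression, here is a minimal sketch for a toy language that has nothing to do with the Esperanto data: an "a", then any number of "b"s, then a "c". The state names, the `TRANSITIONS` table and the `accepts` helper are all made up for illustration.

```python
import re

# Hypothetical toy FSA: accepts "a", then zero or more "b", then "c".
TRANSITIONS = {
    ("q0", "a"): "q1",
    ("q1", "b"): "q1",
    ("q1", "c"): "q2",
}
START, FINAL = "q0", {"q2"}

def accepts(word):
    """Return True if the toy FSA is in a final state after reading word."""
    state = START
    for symbol in word:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition defined for this symbol: reject
            return False
    return state in FINAL

# The same toy language written as a regular expression.
toy_re = re.compile("ab*c")

for word in ["ac", "abbc", "ab", "abca"]:
    print(word, accepts(word), bool(toy_re.fullmatch(word)))
```

Note that `match` only anchors the pattern at the start of the string, so `toy_re.match("abca")` succeeds even though the FSA rejects that string. To get acceptor-like behaviour, either use `fullmatch` or end your pattern with `$`.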

In [None]:
re1 = re.compile("")
# Your pattern should accept these forms
re1.match("hundo")
re1.match("hundino")
re1.match("hundinoj")
re1.match("hundinego")
re1.match("hundegino")
re1.match("hundigetino")
re1.match("hundinetego")
# Your pattern should not accept these forms
re1.match("hundonetegi")
re1.match("hundonitegi")
re1.match("hundi")
re1.match("hundoje")

### re.sub()

A very useful function for us is going to be re.sub. It finds all occurrences of a pattern in a string and replaces each one with the replacement you provide, returning the result as a new string:

In [None]:
sentence1="I wanted some exercise so I walked to work today and was tired afterwards"

In [None]:
re.sub('ed','ing',sentence1)
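To see exactly what a call like this returns, the sketch below runs it on shorter made-up strings and adds two variations: the optional `count` argument, which limits the number of replacements, and the `\b` word-boundary anchor. The variable names and example strings are illustrative only.

```python
import re

text = "I walked home and was tired"

# Every occurrence of 'ed' is replaced, including the one inside 'tired'.
all_replaced = re.sub('ed', 'ing', text)
print(all_replaced)

# count=1 replaces only the first occurrence.
first_only = re.sub('ed', 'ing', text, count=1)
print(first_only)

# Without an anchor, 'ed' is also replaced at the start of 'education'.
unanchored = re.sub('ed', 'ing', "education ended")
print(unanchored)

# \b restricts the match to 'ed' at the end of a word.
# The raw string (r'...') stops Python reading \b as a backspace character.
anchored = re.sub(r'ed\b', 'ing', "education ended")
print(anchored)
```

The original string is never modified; `re.sub` always returns a new string, so remember to store or print the result.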

For lots of applications of re.sub you will need to "cascade" substitutions: apply one substitution and take its output as the input to the next, as in the following commands, which turn "I like both dogs and cats" into "I like both lions and tigers":

In [None]:
sentence2 = "I like both dogs and cats"
s2 = re.sub('dogs','lions',sentence2)
s3 = re.sub('cats','tigers',s2)
print(s3)
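When a cascade gets longer, it can be tidier to keep the substitutions in a list and apply them in a loop. The sketch below is one possible way to do this; the `cascade` helper and the `rules` list are hypothetical names, not part of the re library.

```python
import re

# Hypothetical list of (pattern, replacement) rules, applied in order.
rules = [
    ('dogs', 'lions'),
    ('cats', 'tigers'),
]

def cascade(text, rules):
    """Apply each substitution to the output of the previous one."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

print(cascade("I like both dogs and cats", rules))

# Order matters: the second rule sees the output of the first.
print(cascade("ab", [('a', 'b'), ('b', 'a')]))
```

The second call prints "aa" rather than "ba", because the first rule turns "ab" into "bb" before the second rule runs. Keep this in mind whenever one rule's output could match a later rule's pattern.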

Problem 2: Write a regular expression or series of regular expressions to translate the first sentence of "Crime and Punishment" from past to present tense.

In [None]:
opening_sentence = "On an exceptionally hot evening early in July a young man came out of the garret in which he lodged in S. Place and walked slowly, as though in hesitation, towards K. bridge."

In [None]:
re.sub('','',opening_sentence)

### Groups

Grouping is a very powerful technique for picking out substrings from a string that matches a specified pattern. It is done using parentheses.

In [None]:
re.findall("(.*)(s)", "cats")
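Besides `re.findall`, you can inspect groups through the match object returned by `re.search` or `re.match`. The following is a small sketch using the same pattern and string as above; the variable name `m` is arbitrary.

```python
import re

m = re.search("(.*)(s)", "cats")

print(m.group(0))  # the whole match
print(m.group(1))  # the first group
print(m.group(2))  # the second group
print(m.groups())  # all groups as a tuple

# findall returns one tuple of groups per match.
print(re.findall("(.*)(s)", "cats"))
```

If the pattern does not match at all, `re.search` returns `None`, so check the result before calling `.group()` on it.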

Problem 3: In this example grouping has separated a single noun from the plural marker -s. Rewrite the pattern so that it finds and similarly separates all the plural nouns in sentence3:

In [None]:
sentence3="I like dogs, cats and rabbits"

In [None]:
re.findall("(.*)(s)", sentence3)

### Combining sub with groups
The re.sub function and grouping become particularly powerful when they are combined. You can use parentheses to capture a particular substring within a pattern and then use it in your replacement string within sub. For example:


In [None]:
opening_sentence = "a young man came out of the garret in which he lodged in S. Place and walked slowly towards K. bridge."

In [None]:
re.sub('([a-z]+)ed','is \\1ing',opening_sentence)
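The doubled backslash in `'\\1'` is needed because Python would otherwise interpret `\1` as an escape sequence. The sketch below, on a shortened made-up string, shows two equivalent ways of writing the same substitution: a raw string, and the more explicit `\g<1>` form. The variable names are illustrative.

```python
import re

text = "he lodged and walked"

# Three ways of referring to group 1 in the replacement string.
escaped = re.sub('([a-z]+)ed', 'is \\1ing', text)
raw = re.sub(r'([a-z]+)ed', r'is \1ing', text)
explicit = re.sub(r'([a-z]+)ed', r'is \g<1>ing', text)

print(raw)
print(escaped == raw == explicit)
```

The `\g<1>` form is mainly useful when the group reference is immediately followed by a digit, where `\1` followed by `0` would be read as a reference to group 10.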

You can include more than one group in a pattern, and groups can be nested inside other groups. They are numbered from left to right according to the position of their opening bracket. So for ((a)(b)), group 1 would be "ab", group 2 would be "a" and group 3 would be "b", as in the following:

In [None]:
re.sub('^((a)(b))$','\\1 \\2 \\3',"ab")
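If you are unsure how the groups in a pattern are numbered, a quick check is to look at the tuple returned by `.groups()` on a match object. This is a small sketch on a made-up hyphenated word; the pattern is only an illustration of nesting.

```python
import re

# Group 1 is the outer bracket; groups 2 and 3 are nested inside it.
m = re.match(r'((\w+)-(\w+))', "stir-fry")

# .groups() lists the groups in numbering order: group 1, group 2, group 3.
print(m.groups())
```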

Problem 4a: Use sub combined with groups to convert the sentence "man bites dog" into "dog bites man"

In [None]:
sentence = "man bites dog"
print(re.sub('','',sentence))

Problem 4b: Use sub combined with groups to convert the sentence "man strokes dog" into "dog is stroked by man"

In [None]:
sentence = "man strokes dog"
print(re.sub('','',sentence))

### Prosodically governed concatenation

In the lecture this week we encountered the superlative -est ending in English, which can only be applied to monosyllabic or disyllabic words:

dumb -> dumbest

timid -> timidest

fantastic -> \*fantasticest

Note - a syllable has the form V, CV, VC or CVC where C is a consonant (or cluster of consonants) and V is a vowel.
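Under this definition every syllable contains exactly one vowel slot, so a rough way to count syllables is to count runs of vowels. The sketch below does this with `re.findall`. It assumes that only a, e, i, o and u are vowels and that adjacent vowels belong to one syllable, which is a simplification for English spelling. The `count_syllables` helper is a made-up name.

```python
import re

def count_syllables(word):
    """Rough syllable count: the number of maximal vowel runs in the word."""
    return len(re.findall('[aeiou]+', word))

for word in ["dumb", "timid", "fantastic"]:
    print(word, count_syllables(word))
```

This only counts syllables. It does not attach the suffix, so it is a way of checking your intuitions about the data rather than an answer to the problems below.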

Problem 5a: Write a finite state transducer that appropriately performs this concatenation.

Problem 5b: Write a re.sub function that appropriately performs this concatenation. The same input and output patterns should work for all of the cases below.

In [None]:
# vowels = [aeiou]
# consonants = [qwrtypsdfghjklzxcvbnm]

# Your pattern should add suffix here
re.sub("","","dumb")

In [None]:
# Your pattern should add suffix here
re.sub("","","timid")

In [None]:
# Your pattern should NOT add suffix here
re.sub("","","fantastic")


### Orthographic changes during concatenation

In the lecture we also saw the example of German diminutive suffixation:

hund -> hündchen

haus -> häuschen

blatt -> blättchen

maus -> mäuschen


Problem 6a: Write a finite-state transducer that performs diminutive suffixation in German for the examples above.

Problem 6b: Write an re.sub function that performs diminutive suffixation in German for the examples below.

In [None]:
re.sub("","","hund")

In [None]:
re.sub("","","haus")

In [None]:
re.sub("","","blatt")

In [None]:
re.sub("","","maus")

If you find these straightforward, then try rewriting them as a series of cascading operations that performs all of them on the input s1.

In [None]:
s1 = "hund haus blatt maus"
s2 = re.sub("","",s1)
s3= re.sub("","",s2)
s4 = re.sub("","",s3)
s4

### Broken plurals in Arabic

The table below shows four different patterns of "broken" pluralisation in Arabic.

| Group | sg | pl | Gloss |
| --- | --- | --- | --- |
| a | ɣurfah | ɣuraf | ‘room’ |
| | rukbah | rukab | ‘knee’ |
| | luʕbah | luʕab | ‘toy’ |
| | ʔusrah | ʔusar | ‘family’ |
| | nusxah | nusax | ‘copy’ |
| b | ħikmah | ħikam | ‘wisdom’ |
| | qitʕtʕah | qitʕatʕ | ‘female cat’ |
| | fitnah | fitan | ‘temptation’ |
| | miħnah | miħan | ‘ordeal’ |
| | sikkah | sikak | ‘rail’ |
| c | qalb | quluːb | ‘heart’ |
| | baħθ | buħuːθ | ‘research’ |
| | taqs | tuquːs | ‘weather’ |
| | qasʕr | qusʕuːr | ‘castle’ |
| | ʕilm | ʕuluːm | ‘science’ |
| d | faːkihah | fawaːkih | ‘fruit’ |
| | baːrid͡ʒah | bawaːrid͡ʒ | ‘battleship’ |
| | raːʔiħah | rawaːʔiħ | ‘smell’ |
| | ʕaːtʕifah | ʕawaːtʕif | ‘emotion’ |
| | naːfiðah | nawaːfið | ‘window’ |




Problem 7a: Figure out the relationship between the singular and plural forms for each of the four groups in the table above.

Problem 7b [optional]: Write a finite-state transducer that takes in the singular forms and outputs the plural forms for one of the groups above.

Problem 7c: Write an re.sub function that takes in the singular forms and outputs the plural forms for one of the groups above.   

Problem 7d (to do in your own time): Solve problems 7b [optional] and 7c but for the other groups. If you are really feeling ambitious, try to solve all groups with a single re.sub call.
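One practical point before you start: the transcriptions use IPA symbols, which are ordinary characters to Python but fall outside ranges such as `[a-z]`. The sketch below illustrates this. The strings are written with `\u` escapes so that the code points are unambiguous, and it assumes the tie bar in d͡ʒ is the combining character U+0361.

```python
import re

word = "\u0263urfah"  # ɣurfah, with the first letter written as an escape

# The IPA letter ɣ is not in the range a-z, so [a-z] skips it.
ascii_letters = re.findall("[a-z]", word)
print(ascii_letters)

# The dot matches any single character (except a newline), including ɣ.
any_chars = re.findall(".", word)
print(any_chars)

# Assumption: the affricate is d + combining tie bar (U+0361) + ʒ,
# which Python counts as three separate characters.
affricate = "d\u0361\u0292"
print(len(affricate))
```

If your consonant class is a character set, you will therefore need to list the IPA symbols explicitly or use a more general pattern such as `.`.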

#### Group 1

In [None]:
re.sub('','',"ɣurfah")


In [None]:
re.sub('','',"rukbah")


In [None]:
re.sub('','',"luʕbah")


In [None]:
re.sub('','',"ʔusrah")

In [None]:
re.sub('','',"nusxah")

#### Group 2

In [None]:
re.sub('','',"ħikmah")

In [None]:
re.sub('','',"qitʕtʕah")

In [None]:
re.sub('','',"fitnah")

In [None]:
re.sub('','',"miħnah")

In [None]:
re.sub('','',"sikkah")

#### Group 3

In [None]:
re.sub('','',"qalb")

In [None]:
re.sub('','',"baħθ")

In [None]:
re.sub('','',"taqs")

In [None]:
re.sub('','',"qasʕr")

In [None]:
re.sub('','',"ʕilm")

#### Group 4

In [None]:
re.sub('','',"faːkihah")

In [None]:
re.sub('','',"baːrid͡ʒah")

In [None]:
re.sub('','',"raːʔiħah")

In [None]:
re.sub('','',"ʕaːtʕifah")

In [None]:
re.sub('','',"naːfiðah")

### Reduplication

As I said in the week 1 lecture, the Python regex package is more powerful than a finite state machine. An example of this is that finite state machines cannot perform unbounded reduplication, but a Python regex can.
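The extra power comes from backreferences, which can be used inside the pattern itself as well as in a replacement string. As a minimal sketch on made-up strings (not the Bambara data), the pattern below recognises any string that consists of some sequence followed by an exact copy of itself. That is something no finite state acceptor can do for sequences of unbounded length.

```python
import re

# (.+) captures some non-empty sequence; \1 then demands the same sequence again.
copy_re = re.compile(r'(.+)\1')

for s in ["abab", "abcabc", "abca"]:
    print(s, bool(copy_re.fullmatch(s)))
```

A finite state machine has only a fixed number of states, so it cannot remember an arbitrarily long first half in order to compare it with the second half.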

We saw an example of unbounded reduplication from Bambara, where a sequence of nouns was duplicated after a marker "-o-" to produce a particular meaning.

Problem 8 (to do in your own time): Write input and output regular expressions that will perform the Bambara operations seen in the lecture for the noun sequences below. Check that this works for noun sequences of any length. Make sure that you understand why this isn't possible with a finite state transducer.


In [None]:
re.sub('','','wulu')

In [None]:
re.sub('','','wulu-nyinina')

In [None]:
re.sub('','','wulu-nyinina-filela')