### 0. Preparation

If you're using [Google Colab](https://colab.research.google.com/?utm_source=scs-index), first upload this notebook (under the "Upload" tab). In Colab, open the "Files" panel on the left, create a new folder called "data" (right-click and select "New folder"), and then upload the train, dev, and test files into it (right-click the "data" folder and select "Upload").

### 1. Reading the data

We are given data files in the following format:

```
'd          d                    d
Bulls       b ʊ l z              b ʊ l z
Chinatowns  t͡ʃ aɪ n ə t aʊ n z   t͡ʃ aɪ . n ə . t aʊ n z
I'll        aɪ l                 aɪ l
Icelander   aɪ s l ə n d ə       aɪ s . l ə n . d ə
```

This is a [TSV](https://www.loc.gov/preservation/digital/formats/fdd/fdd000533.shtml) file with three tab-separated (`'\t'`) columns:

1. Orthographic word
2. IPA transcription
3. Syllabified IPA transcription

All IPA symbols are separated by single spaces, and syllable boundaries are marked by a `.` symbol.

We recommend first writing a function `read_line()`, which takes a line (i.e. a string consisting of three tab-separated fields) as input, e.g.:

```
"Chinatowns  t͡ʃ aɪ n ə t aʊ n z   t͡ʃ aɪ . n ə . t aʊ n z"
```

and converts it into a Python dictionary having the following format:

```
{"orth": "Chinatowns",
 "ipa": ["t͡ʃ", "aɪ", "n", "ə", "t", "aʊ", "n", "z"],
 "syll": [(["t͡ʃ", "aɪ"], 0, 2), (["n", "ə"], 2, 4), (["t", "aʊ", "n", "z"], 4, 8)]}
```

Apart from `syll`, the fields are self-explanatory. The `syll` field is a list of tuples, each holding a syllable, its start index, and its end index (1 + the index of its final symbol) in the `ipa` list. E.g. the syllable `"n", "ə"` in the example above starts at index 2 and ends at index 4:

```
IPA:   "t͡ʃ", "aɪ", "n", "ə", "t", "aʊ", "n", "z"
index:  0     1     2    3    4    5     6    7
```
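The parsing steps described above could be sketched roughly as follows. This is only one possible approach, not a reference solution: it assumes every line has exactly three tab-separated fields and that `.` never occurs inside an IPA symbol.

```python
def read_line(line):
    """Parse one 'orth<TAB>ipa<TAB>syllabified ipa' line into a dictionary."""
    # Assumption: exactly three tab-separated fields per line.
    orth, ipa, syll = line.rstrip("\n").split("\t")

    syllables = []
    start = 0
    # Assumption: "." only ever marks a syllable boundary.
    for chunk in syll.split("."):
        symbols = chunk.split()
        if not symbols:
            continue  # ignore empty chunks, e.g. from a stray boundary marker
        end = start + len(symbols)  # end index is exclusive
        syllables.append((symbols, start, end))
        start = end

    return {"orth": orth, "ipa": ipa.split(), "syll": syllables}


entry = read_line("Chinatowns\tt͡ʃ aɪ n ə t aʊ n z\tt͡ʃ aɪ . n ə . t aʊ n z\n")
print(entry["syll"])
```

Tracking a running `start` index avoids searching for each syllable inside the `ipa` list, which would be ambiguous when a word contains the same syllable twice.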

After implementing the function `read_line()`, you can implement a function `read_data()`, which reads a file into a list of dictionaries. 

Use `read_data()` to read the training, development and test data and store the result in variables `train_data`, `dev_data` and `test_data`. 
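A minimal sketch of `read_data()` might look like the following. The notebook does not name the data files, so the demo writes a throwaway file (`toy.tsv`, a made-up name) instead of reading real data. A compact line parser is repeated inside the block so that it runs on its own.

```python
import os
import tempfile


def read_line(line):
    # Compact line parser, repeated here so that this sketch is self-contained.
    orth, ipa, syll = line.rstrip("\n").split("\t")
    sylls, start = [], 0
    for chunk in syll.split("."):
        symbols = chunk.split()
        sylls.append((symbols, start, start + len(symbols)))
        start += len(symbols)
    return {"orth": orth, "ipa": ipa.split(), "syll": sylls}


def read_data(path):
    """Read a TSV file into a list of dictionaries, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [read_line(line) for line in f if line.strip()]


# Demo on a temporary file; "toy.tsv" is a hypothetical name.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.tsv")
    with open(path, "w", encoding="utf-8") as f:
        f.write("Bulls\tb ʊ l z\tb ʊ l z\n")
        f.write("I'll\taɪ l\taɪ l\n")
    toy_data = read_data(path)

print(len(toy_data), toy_data[0]["orth"])
```

With the real files, the call would be something like `train_data = read_data("data/<your training file>")`, and likewise for `dev_data` and `test_data`. Opening the file with `encoding="utf-8"` matters here because the IPA symbols are non-ASCII.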

In [2]:
# your code here

### 2. Baseline

Today we will implement a trivial baseline syllabifier function `baseline()`. It contains no phonological insight. Instead, it simply chops the input word into "syllables" of 2 symbols each. E.g. given the input:

```
t͡ʃ aɪ n ə t aʊ n z
```

the baseline function would syllabify:

```
t͡ʃ aɪ . n ə . t aʊ . n z
```

**Note:** If the input contains an odd number of IPA symbols, then the final symbol should constitute a singleton syllable. E.g., `aɪ s l ə n d ə -> aɪ s . l ə . n d . ə`.

Given the input list:

```
["t͡ʃ", "aɪ", "n", "ə", "t", "aʊ", "n", "z"]
```

`baseline()` should return:

```
[(["t͡ʃ", "aɪ"], 0, 2), (["n", "ə"], 2, 4), (["t", "aʊ"], 4, 6), (["n", "z"], 6, 8)]
```
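One way to sketch this chunking, assuming the input is a plain list of IPA symbols as in the example above, is to step through the list two positions at a time:

```python
def baseline(ipa):
    """Chop a list of IPA symbols into 'syllables' of (at most) 2 symbols."""
    syllables = []
    for start in range(0, len(ipa), 2):
        # min() handles odd-length input: the last syllable gets one symbol.
        end = min(start + 2, len(ipa))
        syllables.append((ipa[start:end], start, end))
    return syllables


even = baseline(["t͡ʃ", "aɪ", "n", "ə", "t", "aʊ", "n", "z"])
odd = baseline(["aɪ", "s", "l", "ə", "n", "d", "ə"])
print(even)
print(odd)
```

For the odd-length word, the last tuple is `(["ə"], 6, 7)`, which is the singleton case mentioned in the note above.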

In [3]:
# your code here

### 3. Evaluation

We will evaluate the performance of the baseline system using [F1-score](https://deepai.org/machine-learning-glossary-and-terms/f-score).

![](https://upload.wikimedia.org/wikipedia/commons/2/26/Precisionrecall.svg)

E.g. given gold standard syllabified strings:

```
[(["t͡ʃ", "aɪ"], 0, 2), (["n", "ə"], 2, 4), (["t", "aʊ", "n", "z"], 4, 8)]
[(["aɪ", "s"], 0, 2), (["l", "ə", "n"], 2, 5), (["d", "ə"], 5, 7), (["s"], 7, 8)]
```

and a baseline system output:

```
[(["t͡ʃ", "aɪ"], 0, 2), (["n", "ə"], 2, 4), (["t", "aʊ"], 4, 6), (["n", "z"], 6, 8)]
[(["aɪ", "s"], 0, 2), (["l", "ə"], 2, 4), (["n", "d"], 4, 6), (["ə", "s"], 6, 8)]
```

we have 3 true positives (system syllables that match a gold syllable exactly, including start and end index):

```
["t͡ʃ", "aɪ"], ["n", "ə"], ["aɪ", "s"] 
```
and 5 false positives (system syllables that are not in the gold standard):

```
["t", "aʊ"], ["n", "z"], ["l", "ə"], ["n", "d"], ["ə", "s"]
``` 

and 4 false negatives (gold syllables that are missing from the system output):

```
["t", "aʊ", "n", "z"], ["l", "ə", "n"], ["d", "ə"], ["s"]
``` 

This results in precision:

$$P = \frac{\text{truepos}}{\text{truepos} + \text{falsepos}} = \frac{3}{8}$$

and recall:

$$R = \frac{\text{truepos}}{\text{truepos} + \text{falseneg}} = \frac{3}{7}$$

giving F1-score:

$$F_1 = \frac{2 \cdot P \cdot R}{P + R} = 0.4$$

You should implement a function `evaluate()` which takes two lists as input: (1) a list of system output syllabifications, and (2) a list of gold standard syllabifications, both in the tuple format shown above. The function computes the F1-score and returns it.

**Note:** You should sum up the true positive, false positive and false negative counts over the entire dataset before computing the F1-score.
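The counting scheme could be sketched as below. Two choices here are assumptions rather than part of the task description: syllables are compared by their `(start, end)` span only, which is sufficient when system and gold syllabify the same IPA sequence, and the function returns `0.0` when there are no true positives, to avoid dividing by zero.

```python
def evaluate(system, gold):
    """Return the F1-score, with counts summed over the whole dataset."""
    tp = fp = fn = 0
    for sys_sylls, gold_sylls in zip(system, gold):
        # Assumption: the (start, end) span identifies a syllable within a word.
        sys_spans = {(start, end) for _, start, end in sys_sylls}
        gold_spans = {(start, end) for _, start, end in gold_sylls}
        tp += len(sys_spans & gold_spans)
        fp += len(sys_spans - gold_spans)
        fn += len(gold_spans - sys_spans)

    if tp == 0:
        return 0.0  # assumption: define F1 as 0 when nothing matches
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = [
    [(["t͡ʃ", "aɪ"], 0, 2), (["n", "ə"], 2, 4), (["t", "aʊ", "n", "z"], 4, 8)],
    [(["aɪ", "s"], 0, 2), (["l", "ə", "n"], 2, 5), (["d", "ə"], 5, 7), (["s"], 7, 8)],
]
system = [
    [(["t͡ʃ", "aɪ"], 0, 2), (["n", "ə"], 2, 4), (["t", "aʊ"], 4, 6), (["n", "z"], 6, 8)],
    [(["aɪ", "s"], 0, 2), (["l", "ə"], 2, 4), (["n", "d"], 4, 6), (["ə", "s"], 6, 8)],
]
f1 = evaluate(system, gold)
print(round(f1, 3))  # 0.4
```

Using sets per word keeps the comparison local to each word, so a span such as `(0, 2)` in one word is never matched against the same span in another word.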

In [4]:
# your code here