# Chinese translation of ancient poem

An ancient scroll was found in a cavern just south of the historical city of Xi'an. It presumably contains some wise prose written in Chinese. You have been tasked with translating the text into English.

## Files

In folder `datasets/chinese` you will find two files:

```python
import os
from pathlib import Path
import pandas as pd

dirpath = Path("../datasets/chinese")
os.listdir(dirpath)
```

```
['word_list.txt', 'ancient_poem.txt']
```

### ancient_poem.txt
`ancient_poem.txt` contains the text found in the cavern. To help us get started, the text has been transcribed into [Simplified Chinese](https://eriksen.com/language/simplified-vs-traditional-chinese/) so that we can use the other file to translate it.


### word_list.txt
`word_list.txt` contains words in __Simplified Chinese__, each together with its [pinyin pronunciation](https://resources.allsetlearning.com/chinese/pronunciation/Introduction_to_pinyin) and its __English translation__. In this task we will not make use of the pinyin pronunciation.

Here we print the first few lines to show what it looks like:

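```python
# print the first three lines of the file
# (encoding is given explicitly so the Chinese characters are read correctly on any platform)
with open(dirpath / "word_list.txt", encoding="utf-8") as f:
    for i in range(3):
        print(f.readline())
```

```
哭 ku1 cry

生气 sheng1qi4 get angry, take offence

选择 xuan3ze2 select, to pick, choose

```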


Note that this file has a format that cannot be read easily into a dataframe with `pd.read_csv`. The columns are separated by whitespace, but the English translation column also contains whitespace. It is easier to first load the data into a list of lists and then load that list into a dataframe.
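The list-of-lists approach could look roughly like the sketch below. It uses the three sample lines printed above in place of the real file, and the names `sample_file`, `rows` and `translations` are illustrative choices, not part of the assignment.

```python
import io

import pandas as pd

# Stand-in for open(dirpath / "word_list.txt", encoding="utf-8"),
# holding only the three sample lines shown above.
sample_file = io.StringIO(
    "哭 ku1 cry\n"
    "生气 sheng1qi4 get angry, take offence\n"
    "选择 xuan3ze2 select, to pick, choose\n"
)

rows = []
for line in sample_file:
    line = line.strip()
    if not line:
        continue  # skip empty lines
    # Split on the first two runs of whitespace only,
    # so the English text stays in one piece.
    rows.append(line.split(maxsplit=2))

# Each inner list becomes one row of the dataframe.
translations = pd.DataFrame(rows, columns=["chinese", "pinyin", "english"])
translations = translations.set_index("chinese")
print(translations)
```

This sketch assumes every non-empty line has all three fields; a line with fewer fields would need separate handling.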

# Assignment

Your task is the following:

1. Build a translation dataframe from `word_list.txt`.
   - The index should hold the Chinese word.
   - The dataframe should have two columns: `pinyin` and `english`.
1. Open `ancient_poem.txt`, iterate over each line, and translate each word.
1. The end result should be a string variable containing the English translation.

# Hints

### Translate a single word first

Before trying to translate the poem, try translating a word and see what the returned value looks like.
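As a rough illustration of what a lookup returns, the sketch below uses a small hand-made table rather than the real `word_list.txt`. The name `translations` is an assumption made for this example.

```python
import pandas as pd

# A tiny hand-made translation table (not loaded from word_list.txt).
translations = pd.DataFrame(
    {"pinyin": ["ku1", "sheng1qi4"], "english": ["cry", "get angry, take offence"]},
    index=["哭", "生气"],
)

row = translations.loc["哭"]              # the whole row: a pandas Series
word = translations.loc["哭", "english"]  # a single cell: a plain string

print(type(row).__name__)
print(word)
```

Selecting both the row label and the column name gives a value that can be added directly to a string. If the index contained the same word more than once, `.loc` would return several rows instead of a single value.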

### Handling whitespace and commas in the English translation text

`str.split()` has an optional argument `maxsplit`, which limits the number of splits performed. This lets you keep the whitespace inside the English translation intact.
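To see the effect of `maxsplit`, compare the two calls below on one of the sample lines from `word_list.txt`. The variable names are illustrative only.

```python
line = "生气 sheng1qi4 get angry, take offence"

# Without maxsplit, the English translation is broken into separate words.
parts_all = line.split()
print(parts_all)
# ['生气', 'sheng1qi4', 'get', 'angry,', 'take', 'offence']

# With maxsplit=2, only the first two gaps are split, giving three fields.
parts_limited = line.split(maxsplit=2)
print(parts_limited)
# ['生气', 'sheng1qi4', 'get angry, take offence']
```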

### Making the english translated string
```python
# you can start with an empty string
poem_english = ""
# you can add words (and whitespace) to a string like this
poem_english += "word" + " "
# you can add a linebreak with "\n"
poem_english += "\n"
```

### Handling symbols ,.!
Some of the "words" in the text do not have a translation, such as the exclamation marks (!). Can we use exception handling for this scenario?

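```python
example_word = "!"
try:
    # try to translate example_word here; the lookup below assumes your
    # translation dataframe is called `translations` (a hypothetical name)
    poem_english += translations.loc[example_word, "english"]
except KeyError:
    # no translation found, so we just use the original word
    poem_english += example_word
```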
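
Putting the hints together, a translation loop might look something like the sketch below. It rests on assumptions: the table is hand-made, the sample lines are invented for illustration and are not the content of `ancient_poem.txt`, and words are assumed to be separated by whitespace.

```python
import pandas as pd

# A tiny hand-made translation table (not loaded from word_list.txt).
translations = pd.DataFrame(
    {"pinyin": ["ku1", "xuan3ze2"], "english": ["cry", "select, to pick, choose"]},
    index=["哭", "选择"],
)

# Invented sample lines, NOT the content of ancient_poem.txt.
# Assumption: words are separated by whitespace.
sample_lines = ["选择 哭 !", "哭 哭"]

poem_english = ""
for line in sample_lines:
    for word in line.split():
        try:
            # .loc raises KeyError when the word is not in the index
            poem_english += translations.loc[word, "english"] + " "
        except KeyError:
            # no translation found, so keep the original symbol
            poem_english += word + " "
    poem_english += "\n"

print(poem_english)
```

Catching only `KeyError` keeps other problems, such as a misspelled column name in a different kind of lookup or a file that fails to open, visible instead of silently hiding them.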
---
    

```python
# Your code starts here!
```