# Chapter 7 - Training Flair Embeddings

This Jupyter notebook accompanies Chapter 7 - Training Flair Embeddings. It collects all of the chapter's practical code snippets and exercises so that you can follow the book's examples more easily.

## Training Flair embeddings on the world’s smallest language

### Preparing the dictionary

In [18]:
from flair.data import Dictionary

# Load Flair's default character dictionary
dictionary = Dictionary.load('chars')

In [19]:
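from flair.data import Dictionary

# Alternatively, build a custom dictionary from scratch. Toki Pona is written
# with only 14 letters, so we add them in both lower and upper case...
dictionary = Dictionary()
toki_pona_symbols = 'ptksmnljwaeiou'
toki_pona_symbols += toki_pona_symbols.upper()

# ...plus the question mark, the full stop, and the space character
for c in toki_pona_symbols + '?. ':
    dictionary.add_item(c)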
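
The custom dictionary is essentially a lookup table from characters to integer IDs. The `<unk>` markers in the generated samples later in this notebook suggest that characters outside the dictionary are mapped to an unknown symbol. Below is a minimal pure-Python sketch of that idea. The `CharVocab` class is hypothetical and is not Flair's `Dictionary` implementation.

```python
class CharVocab:
    """Hypothetical character-to-index vocabulary with an <unk> fallback."""

    def __init__(self, symbols, unk="<unk>"):
        # Reserve index 0 for the unknown symbol
        self.idx2item = [unk]
        self.item2idx = {unk: 0}
        for symbol in symbols:
            if symbol not in self.item2idx:
                self.item2idx[symbol] = len(self.idx2item)
                self.idx2item.append(symbol)

    def encode(self, text):
        # Characters outside the vocabulary map to the <unk> index (0)
        return [self.item2idx.get(c, 0) for c in text]

    def decode(self, ids):
        return "".join(self.idx2item[i] for i in ids)


letters = "ptksmnljwaeiou"
vocab = CharVocab(letters + letters.upper() + "?. ")
ids = vocab.encode("toki, pona!")
print(len(vocab.idx2item))  # 1 <unk> + 28 letters + 3 symbols = 32
print(vocab.decode(ids))    # toki<unk> pona<unk>
```

The comma and the exclamation mark are not in the Toki Pona symbol set, so both come back as `<unk>`.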

### Preparing the corpus

In [20]:
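import requests

# Download the Toki Pona corpus and split it into lines
response = requests.get("https://git.io/J1dgd")
response.raise_for_status()

sentences = response.text.splitlines()
one_tenth_corp_len = len(sentences) // 10

# First 10% for testing, next 10% for validation, remaining 80% for training
test, valid, train = (
    sentences[:one_tenth_corp_len],
    sentences[one_tenth_corp_len:one_tenth_corp_len*2],
    sentences[one_tenth_corp_len*2:])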
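
The same 10/10/80 slicing can be checked on a toy list. This sketch wraps it in a hypothetical `split_corpus` helper. It uses `toy_`-prefixed names so that it does not overwrite the real `test`, `valid` and `train` variables.

```python
def split_corpus(sentences):
    """Split lines into 10% test, 10% validation and 80% training,
    using the same slicing as the cell above."""
    tenth = len(sentences) // 10
    return (sentences[:tenth],
            sentences[tenth:tenth * 2],
            sentences[tenth * 2:])


toy = [f"sentence {i}" for i in range(25)]
toy_test, toy_valid, toy_train = split_corpus(toy)
print(len(toy_test), len(toy_valid), len(toy_train))  # 2 2 21
```

Because of the integer division, any remainder ends up in the training split. The split also follows the order of the file, since the lines are not shuffled first.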

In [21]:
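from tempfile import TemporaryDirectory
from os.path import join
from os import mkdir

# TextCorpus expects a folder containing test.txt, valid.txt and a train/
# subfolder holding one or more training split files
dataset_dir_obj = TemporaryDirectory()
dataset_dir = dataset_dir_obj.name
train_dir = join(dataset_dir, 'train')
mkdir(train_dir)

# Each split is written as a single line of space-separated sentences
with open(join(dataset_dir, "test.txt"), "w", encoding="utf-8") as file:
    file.write(' '.join(test))

with open(join(dataset_dir, "valid.txt"), "w", encoding="utf-8") as file:
    file.write(' '.join(valid))

with open(join(train_dir, "train_split_1"), "w", encoding="utf-8") as file:
    file.write(' '.join(train))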
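
Before handing a folder to `TextCorpus`, it can help to confirm that it has the expected layout: `test.txt`, `valid.txt`, and a `train/` folder with at least one split file. Below is a small standard-library sketch. The `check_corpus_layout` helper is hypothetical and is not part of Flair.

```python
from pathlib import Path
from tempfile import TemporaryDirectory


def check_corpus_layout(corpus_dir):
    """Return a list of problems with a corpus folder laid out as above."""
    root = Path(corpus_dir)
    problems = []
    for name in ("test.txt", "valid.txt"):
        if not (root / name).is_file():
            problems.append(f"missing {name}")
    train_folder = root / "train"
    if not train_folder.is_dir():
        problems.append("missing train/ folder")
    elif not any(p.is_file() for p in train_folder.iterdir()):
        problems.append("train/ folder has no split files")
    return problems


# Build a throwaway folder, check it before and after adding a split file
with TemporaryDirectory() as tmp:
    (Path(tmp) / "train").mkdir()
    (Path(tmp) / "test.txt").write_text("toki")
    (Path(tmp) / "valid.txt").write_text("pona")
    before = check_corpus_layout(tmp)
    (Path(tmp) / "train" / "train_split_1").write_text("mi moku")
    after = check_corpus_layout(tmp)

print(before)  # ['train/ folder has no split files']
print(after)   # []
```

In the notebook itself, `check_corpus_layout(dataset_dir)` should return an empty list.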

In [22]:
from flair.trainers.language_model_trainer import TextCorpus

corpus = TextCorpus(dataset_dir,
                    dictionary,
                    forward=True,
                    character_level=True)

2021-12-01 04:48:34,283 read text file with 1 lines
2021-12-01 04:48:34,291 read text file with 1 lines
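
With `character_level=True`, the corpus is read as a stream of characters, and each character is mapped to its ID in `dictionary`. The `forward` flag selects the reading direction. A backward language model is trained on the same text read in reverse, and we assume here that this is what `forward=False` prepares. The sketch below shows the idea with a plain dict. `encode_chars` and the toy alphabet are hypothetical and do not reproduce Flair's tensor-based implementation.

```python
def encode_chars(text, char_to_id, forward=True, unk_id=0):
    """Map text to character IDs; with forward=False the text is read backwards."""
    if not forward:
        text = text[::-1]
    return [char_to_id.get(c, unk_id) for c in text]


# Toy vocabulary: ID 0 is reserved for unknown characters
alphabet = "ptksmnljwaeiou?. "
char_to_id = {c: i + 1 for i, c in enumerate(alphabet)}

forward_ids = encode_chars("mi moku.", char_to_id)
backward_ids = encode_chars("mi moku.", char_to_id, forward=False)
print(forward_ids)
print(backward_ids)  # the same IDs in reverse order
```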


### Training the language model

In [23]:
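from flair.models import LanguageModel
from flair.trainers.language_model_trainer import (
    LanguageModelTrainer)

# A small forward character-level LSTM: one layer with 64 hidden units
language_model = LanguageModel(dictionary,
                               is_forward_lm=True,
                               hidden_size=64,
                               nlayers=1)

# Train on 25-character sequences, 10 sequences per mini-batch, for up to
# 100 epochs; the best model is saved as best-lm.pt in forward_model_directory
trainer = LanguageModelTrainer(language_model, corpus)
trainer.train('forward_model_directory',
              sequence_length=25,
              mini_batch_size=10,
              max_epochs=100)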

2021-12-01 04:48:34,343 read text file with 1 lines
2021-12-01 04:48:34,349 shuffled
2021-12-01 04:48:34,372 Sequence length is 25
2021-12-01 04:48:34,375 Split 1	 - (04:48:34)
2021-12-01 04:48:35,168 | split   1 /  1 |   100/  124 batches | ms/batch  7.92 | loss  2.34 | ppl    10.33
2021-12-01 04:48:35,358 0 seconds for train split 1
2021-12-01 04:48:35,391 best loss so far 10000.00
2021-12-01 04:48:35,666 ('\nluko kakama. la? li. kapani kmakan iloko seKen okawakama. koli ulasoka keli kakansa. mama. kalasa. tanka li nini mi poma lon pinama. mi. pasannpa. pikenpa? nakalasokajan wanni kasama li. anawalala kikanpa jann epin sona nina. pani. kalal? ni.pa awanapo. seme? li? li toka. ekankawae kala okin lela. ninu lon kali ninasa. eni lini. kalaasa kapaka kanomepa. pikonpa silo okinsi pikelime? ni<unk> keli ni oni. jomen sinaawa. ena. kaekopo. kinaja li. nijan<unk> Nukakemanpa. kali sike. toki kawalalaa. kae mi panapo poki. kala esana. lipana naseka. wanasa jan. nijan anala laPa. kome. kala

2021-12-01 04:48:41,135 -----------------------------------------------------------------------------------------
2021-12-01 04:48:41,137 | end of split   1 /  1 | epoch   5 | time:  1.25s | valid loss  1.36 | valid ppl     3.89 | learning rate 20.0000
2021-12-01 04:48:41,138 -----------------------------------------------------------------------------------------
2021-12-01 04:48:41,146 Epoch time: 1.31
2021-12-01 04:48:41,168 read text file with 1 lines
2021-12-01 04:48:41,173 shuffled
2021-12-01 04:48:41,194 Sequence length is 25
2021-12-01 04:48:41,197 Split 1	 - (04:48:41)
2021-12-01 04:48:41,938 | split   1 /  1 |   100/  124 batches | ms/batch  7.39 | loss  1.15 | ppl     3.16
2021-12-01 04:48:42,113 0 seconds for train split 1
2021-12-01 04:48:42,161 best loss so far  1.36
2021-12-01 04:48:42,529 ('\n ni li kama. sina pini? ni li pimeja ni<unk> waso muli ni li kiwen ni li pini. kule li ni li ike sina. sina tawa sina. koki li pini ke pi. ni<unk> ni<unk> soweni. ni. sinsini. ni l

2021-12-01 04:48:48,147 best loss so far  1.23
2021-12-01 04:48:48,482 ('\n ni<unk> kalama suli ni li pini. tenpo insa mi pimi ni li pini. ni. sina kai li ken pi poki. ni. ko pimi. tenpo li kipisi. kule mi. jan ike ma sina. jan ma tomo ken pona. jan ike insa tomo mi. o ken tan tomo poki li pona. pipi pi moki e moku e ni<unk> lipu ni li pini. sina jo e kon. mani ni li ken pi ni li uta sina. jan pi kalama tan kasi li wile telo ni li li pini en  kon e ikun lon insa insa. sina sona. mi. tomo ni li pakili. meli mi. sina pini. ni li taki pi sowi anu li sewi kepeken tomo ko. uta sina jaki sina. nimi ni li pona. kipisi e lon sina. kin tawa sina li pini. sina ken pi toki e sona. kipisi li ike anu li jan li toki ni li sina. ko ali ni li jaki mute tan insi ni li sinso. nimi li wile e pipi ali pimeja Osuma tomo li tan poki ni li pini li wile pi kiwen Anki. ona li wile e kiwen li pini. pipini jaki. ni. nena. tenpo. pini. sina kiken ken li awen pini. kiwen mute ni ni li pini. ni li pipi tawa wile lo

2021-12-01 04:48:55,857 -----------------------------------------------------------------------------------------
2021-12-01 04:48:55,859 | end of split   1 /  1 | epoch  14 | time:  1.60s | valid loss  1.27 | valid ppl     3.56 | learning rate 20.0000
2021-12-01 04:48:55,860 -----------------------------------------------------------------------------------------
2021-12-01 04:48:55,867 Epoch time: 1.69
2021-12-01 04:48:55,892 read text file with 1 lines
2021-12-01 04:48:55,898 shuffled
2021-12-01 04:48:55,926 Sequence length is 25
2021-12-01 04:48:55,931 Split 1	 - (04:48:55)
2021-12-01 04:48:56,886 | split   1 /  1 |   100/  124 batches | ms/batch  9.50 | loss  1.04 | ppl     2.84
2021-12-01 04:48:57,085 1 seconds for train split 1
2021-12-01 04:48:57,119 best loss so far  1.21
2021-12-01 04:48:57,419 ('\n telo ni li pini li. ijo pi insa pini. kalama kalama pi. pimeja kipisi ni. kipisi. tomo e lipu muli tan ma. tenpo ike insa tomo kiwilo li jelo. nimi ni li ipini. ni. nimi uta pi ni

2021-12-01 04:49:03,040 ('\n ni li pakala e ni<unk> tenpo. tenpo pini li pana e ona. kan pona tan ni<unk> toki  ni li pona tawa pona. kili insa pini li pona. kulupu sona kamo kepeken poki ni li pana e nimi ni li pana anu ni lon poki. pi ni li pana e ken pona tawa tomo ona. mi. jakin telo kinsa toki e kiweki e kiwen li pana e ma. ko ni li ken kepeken en ni li pona ana Menta? toki li kama pini. nimi wile e ona. tenpo piki li ike li ike toki. kun taso li jo e poki. kon pimeja jan pona. kinsa monsuta mi. tenpo kini li alasa e kini. kinse ni li toki. pini pi poki pi wawa lawa sina. poki. kinsa. jelo. ni. ni li kama sona e kijetesantakanu ni li pona tawa tan ni. ni li kepeken. ona li pini. toki kon li ken pana. mi kepeken. e ni<unk> toki Leme? nimi nasa. soweli mi li pona tawa pi la ona. ona li ike li toki li pona. jan pona tawa jan Kama kulupu ni li pona anu kasi. jan Jkantake li ike ni li pini li pakala e kule pini. ni li jo e mani. ni li ken pona. ni<unk> kiwen pi sike jaki jelo. kiwen ke

2021-12-01 04:49:08,791 -----------------------------------------------------------------------------------------
2021-12-01 04:49:08,793 | end of split   1 /  1 | epoch  23 | time:  1.37s | valid loss  1.21 | valid ppl     3.36 | learning rate 20.0000
2021-12-01 04:49:08,795 -----------------------------------------------------------------------------------------
2021-12-01 04:49:08,801 Epoch time: 1.44
2021-12-01 04:49:08,823 read text file with 1 lines
2021-12-01 04:49:08,830 shuffled
2021-12-01 04:49:08,852 Sequence length is 25
2021-12-01 04:49:08,855 Split 1	 - (04:49:08)
2021-12-01 04:49:09,682 | split   1 /  1 |   100/  124 batches | ms/batch  8.25 | loss  1.00 | ppl     2.72
2021-12-01 04:49:09,877 1 seconds for train split 1
2021-12-01 04:49:09,910 best loss so far  1.17
2021-12-01 04:49:10,234 ('\n Kinsika ken awe jaki pimeja ni li jo e ni<unk> tenpo pini ni li lape pipi en kala. wawa ni li suwi Suki. toki. tomo ni. kike ni li pini. ilo jaki. ilo sina. soweli li kute e kiwen

2021-12-01 04:49:16,478 -----------------------------------------------------------------------------------------
2021-12-01 04:49:16,481 | end of split   1 /  1 | epoch  28 | time:  1.34s | valid loss  1.20 | valid ppl     3.31 | learning rate 20.0000
2021-12-01 04:49:16,482 -----------------------------------------------------------------------------------------
2021-12-01 04:49:16,489 Epoch time: 1.41
2021-12-01 04:49:16,516 read text file with 1 lines
2021-12-01 04:49:16,522 shuffled
2021-12-01 04:49:16,545 Sequence length is 25
2021-12-01 04:49:16,548 Split 1	 - (04:49:16)
2021-12-01 04:49:17,368 | split   1 /  1 |   100/  124 batches | ms/batch  8.18 | loss  1.00 | ppl     2.70
2021-12-01 04:49:17,550 1 seconds for train split 1
2021-12-01 04:49:17,582 best loss so far  1.17
2021-12-01 04:49:17,868 ('\n pipi lisoje kama pi ilo pini ni li ike li sina li pona li kama. mi awasike la sina ken pona ala suwi linja. tenpo ni li nimi li pona lukin e ona. jan Kata tan ni<unk> ipini li ope

2021-12-01 04:49:23,443 ('\n soweli mute li ken li jo e kin pi ni linja mensipo. ko sina li kipisi. ike pinin pini tan ni li jo. ni<unk> moje li ken ni. luki pini ni li pini e ko. ken ni li toki. kipi sike pini li kama. toki Lomu pipi ni li pipi. soli jo ala suwi. ona sike ni li pakala. ona li pona. jan lili ni li ken pona pi kilin kon lete. soweli linja kepeken e sike. soweli e linja anu uta li pona toki. ko pipi en ken pimeja li sike insa ni li sinpin pona. tenpo ni li pini. ni<unk> ilo ni li ken linja. kalisi ni. pimeja ni li pona e kike. toki sike pi sike piniki li ken ni li pona e ken poki la pipi ni li ken lon tenpo kiken keme wan. jan kute ona. toki weka pi ken kepeken e sina. sina. kin ken e sitelen ko. ike e sike wan. kiwen lama ni li wile teki. insa ko. mi kesi pini ni li pana e len suwi lukin. ni li ken kon. jan sewi me. kiwen mute e ni<unk> lipu pi. ko seme li ni tan ni<unk> tenpo pini la ona li sona anu pi ni. ken pi pipi ni li pini. sina li ken pini mi. ko seme wan tomo m

2021-12-01 04:49:30,206 -----------------------------------------------------------------------------------------
2021-12-01 04:49:30,212 | end of split   1 /  1 | epoch  37 | time:  2.51s | valid loss  1.01 | valid ppl     2.75 | learning rate 5.0000
2021-12-01 04:49:30,214 -----------------------------------------------------------------------------------------
2021-12-01 04:49:30,223 Epoch time: 2.58
2021-12-01 04:49:30,268 read text file with 1 lines
2021-12-01 04:49:30,275 shuffled
2021-12-01 04:49:30,319 Sequence length is 25
2021-12-01 04:49:30,323 Split 1	 - (04:49:30)
2021-12-01 04:49:31,787 | split   1 /  1 |   100/  124 batches | ms/batch 14.62 | loss  0.83 | ppl     2.29
2021-12-01 04:49:32,075 1 seconds for train split 1
2021-12-01 04:49:32,114 best loss so far  1.01
2021-12-01 04:49:32,452 ('\n sina sona ala e telo mute. ona li sona ala e kon. kasi li pona ala jaki ni lon ma tawa tomo ni. tenpo ali pi lipu mi li kule sewi lukin. lupa li kama luka. mi moku ala e sike ma to

2021-12-01 04:49:38,019 -----------------------------------------------------------------------------------------
2021-12-01 04:49:38,021 | end of split   1 /  1 | epoch  42 | time:  1.23s | valid loss  1.01 | valid ppl     2.75 | learning rate 5.0000
2021-12-01 04:49:38,022 -----------------------------------------------------------------------------------------
2021-12-01 04:49:38,029 Epoch time: 1.29
2021-12-01 04:49:38,051 read text file with 1 lines
2021-12-01 04:49:38,057 shuffled
2021-12-01 04:49:38,079 Sequence length is 25
2021-12-01 04:49:38,082 Split 1	 - (04:49:38)
2021-12-01 04:49:38,853 | split   1 /  1 |   100/  124 batches | ms/batch  7.68 | loss  0.81 | ppl     2.25
2021-12-01 04:49:39,028 0 seconds for train split 1
2021-12-01 04:49:39,066 best loss so far  1.00
2021-12-01 04:49:39,364 ('\n soweli li suwi ike. ona li ken lukin e kon. nanpa sina pi awasi li tawa li alasa e ma<unk> telo laso jelo li pali e mi mute tan ni<unk> ona sijelo seli o kama sona pali la lon poki

2021-12-01 04:49:44,726 -----------------------------------------------------------------------------------------
2021-12-01 04:49:44,729 | end of split   1 /  1 | epoch  47 | time:  1.29s | valid loss  1.03 | valid ppl     2.80 | learning rate 5.0000
2021-12-01 04:49:44,730 -----------------------------------------------------------------------------------------
2021-12-01 04:49:44,738 Epoch time: 1.35
2021-12-01 04:49:44,763 read text file with 1 lines
2021-12-01 04:49:44,768 shuffled
2021-12-01 04:49:44,789 Sequence length is 25
2021-12-01 04:49:44,793 Split 1	 - (04:49:44)
2021-12-01 04:49:45,557 | split   1 /  1 |   100/  124 batches | ms/batch  7.61 | loss  0.86 | ppl     2.36
2021-12-01 04:49:45,732 0 seconds for train split 1
2021-12-01 04:49:45,762 best loss so far  1.00
2021-12-01 04:49:46,047 ('\n jan mute li ken pakala e ona tawa sina li ike li ken sina. len linja li ike li tenpo pipi mute la kon pali ala pan pakala toki ni la telo pimeja ni li seme? komu. keloki ni li ike 

2021-12-01 04:49:52,213 ('\n meli nasa li jo e sike sina utala lukin. mute tu kon li sin li kama waso. mi ken ala tan ma tomo ike. sina wile moku pali kepeken ilo noka. o? kin li suwi mute. mi olin e kili nimi lawa kalama musi ona. meli meli ni li pali e soweli ike li open la ilo ni li lon insa telo lon nasin ni. lon ike lon li piEken ma Towasatu sin. kili waso li toki. jan mute li kon. soweli sama jan seme li jelo pimeja toki tan ni<unk> mi wile ala toki pi lupa en jan mute mute kepeken ilo suli. moku li pana e ma tomo laso suwi li tawa ma sona ala lete e len Lante la kili suwi la mi mute la ma sina li suwi lukin e lipu ni? mi alasa e ma ante ni? nasin palisa li jo e sitelen tawa ma lon anpa sina. supa lon mi ni li lon ala sin. lape. mi wile moku ala e mani lon insa tomo lon apesita. sina lon ala kasi. sina ken ken ala wawa Maku. o ni la ona li seli e moku mute. jan mute li kama kiwen. jan lili mi li nasa e telo nasa pi lipu olije? o lon tomo lape wan. tenpo lipu ni li mu suwi la ona 

2021-12-01 04:50:47,979 -----------------------------------------------------------------------------------------
2021-12-01 04:50:47,981 | end of split   1 /  1 | epoch  56 | time:  1.39s | valid loss  1.00 | valid ppl     2.73 | learning rate 1.2500
2021-12-01 04:50:47,982 -----------------------------------------------------------------------------------------
2021-12-01 04:50:47,990 Epoch time: 1.45
2021-12-01 04:50:48,011 read text file with 1 lines
2021-12-01 04:50:48,027 shuffled
2021-12-01 04:50:48,099 Sequence length is 25
2021-12-01 04:50:48,103 Split 1	 - (04:50:48)
2021-12-01 04:50:48,927 | split   1 /  1 |   100/  124 batches | ms/batch  8.22 | loss  0.78 | ppl     2.18
2021-12-01 04:50:49,117 1 seconds for train split 1
2021-12-01 04:50:49,150 best loss so far  1.00
2021-12-01 04:50:49,432 ('\n sila ala li tawa e telo jo jaki. soweli li wile ala wawa. mi jan Okama ni li utala e lipu sama? jan seme la jan li pakala poli ni. jan li pana e len. toE len jelo li pona tawa telo

2021-12-01 04:50:54,640 -----------------------------------------------------------------------------------------
2021-12-01 04:50:54,643 | end of split   1 /  1 | epoch  61 | time:  1.22s | valid loss  1.00 | valid ppl     2.73 | learning rate 1.2500
2021-12-01 04:50:54,644 -----------------------------------------------------------------------------------------
2021-12-01 04:50:54,650 Epoch time: 1.28
2021-12-01 04:50:54,671 read text file with 1 lines
2021-12-01 04:50:54,676 shuffled
2021-12-01 04:50:54,696 Sequence length is 25
2021-12-01 04:50:54,699 Split 1	 - (04:50:54)
2021-12-01 04:50:55,454 | split   1 /  1 |   100/  124 batches | ms/batch  7.53 | loss  0.77 | ppl     2.16
2021-12-01 04:50:55,626 0 seconds for train split 1
2021-12-01 04:50:55,658 best loss so far  1.00
2021-12-01 04:50:55,924 ('\n meli mi li pini e ona. sina moku lon ma tomo mi. monsuta sina li olin e toki sama telo sona e ni<unk> tenpo telo jaki e linja mute ala. jan utala li lon. uta pimeja li moku e kalam

2021-12-01 04:51:00,836 best loss so far  1.00
2021-12-01 04:51:01,101 ('\n tenpo telo ni li lon. uta ala kon li tawa ma telo suli? ni li open. nasin ona li pali e pini insa toki tu. suno li sike ni. mi sona e soweli? mi pilin ona li Kapuwi la ko ala sama musi ala. nanpa sin li telo. len lawa e ni<unk> wawa linja ni li pali lon insa tomo soweli li pana e e ni<unk> mi ken moku. kasi li walo. soweli li pilin e len sike pimeja ala tan ni<unk> lipu ni li ilo sona seme la mama Kalo li kama tan sewi la mi sona ala e mije ala. jan pali li suli ala tan ma ni la mi lon insa ala. e lipu sina. sina pana e tomo mi. soweli ni li pana e ona. soweli ni li pimeja lili ni la mi wile e wawa lon anpa telo pimeja? tenpo sini la tomo lon poku meli li pakala e sinpin lon esun e musi ike. mun ni li jan Pukute. len telo li jan wan. musi li ike la mi wile tawa tenpo en lipu ni li kama kalama sona ala pona. mun li wile moku e sona. ona li sama sona lon tomo ali. taso li seme? mi jo e lipu pi nasin. pi kili pona

2021-12-01 04:51:06,251 -----------------------------------------------------------------------------------------
2021-12-01 04:51:06,253 | end of split   1 /  1 | epoch  70 | time:  1.20s | valid loss  1.00 | valid ppl     2.71 | learning rate 0.3125
2021-12-01 04:51:06,255 -----------------------------------------------------------------------------------------
2021-12-01 04:51:06,261 Epoch time: 1.25
2021-12-01 04:51:06,281 read text file with 1 lines
2021-12-01 04:51:06,286 shuffled
2021-12-01 04:51:06,307 Sequence length is 25
2021-12-01 04:51:06,309 Split 1	 - (04:51:06)
2021-12-01 04:51:07,046 | split   1 /  1 |   100/  124 batches | ms/batch  7.36 | loss  0.76 | ppl     2.13
2021-12-01 04:51:07,216 0 seconds for train split 1
2021-12-01 04:51:07,246 best loss so far  1.00
2021-12-01 04:51:07,516 ('\n musi sin mi li jo e wlolio sama seme? sina wile e kiwen lili. mi pali e ni<unk> sina panlasa toki e poki kama utala li pali la kulupu li jo e pipi tomo tawa sina? sina mu ala kepek

2021-12-01 04:51:12,744 ('\n soweli Onsa pi. kasi li oka sina. palisi pi olin li pona lukin tawa ilosa. tomo li pali mute. kiwen pilin la mi sona ala e mani li pona mute. kili ma li kama ike? poki li kama esun e soweli lukin la mi alasa. sike ni li jo e. olin. o telo e sitelen li jaki anu pona la o ike li suli. jan mute li sona e mi. lon li moku e lawa mama sona e telo lon. pipi li esun pona lon nasin lawa lon sina? jan Ketesitomi li waso ala suli itele ni lon e toki makesimuwikalu. kili li suli la o pilin lukin e wawa sina? kon li pali e ni<unk> toki Pile li lon tenpo mute ni la ko pona tawa jan sewi mute tan ona tawa sina? mi pilin kon pi tu. mi pini e pan ala jan. sina telo. seli li pana e telo esun ni. mi ken ala seme? mi pana ala e mani li kin pilin e len linja pi lupa la ona li musi li laso li moli. ni li pana e kala moku e sitelen jelo la kulupu piwile lon poka mi. sina ken pana e ona. soweli li kama kulupu. lipu mi li nasa e kijetesantakalu lon tenpo suno ala. ilo nasa mute li 

2021-12-01 04:51:17,962 -----------------------------------------------------------------------------------------
2021-12-01 04:51:17,965 | end of split   1 /  1 | epoch  79 | time:  1.21s | valid loss  1.00 | valid ppl     2.71 | learning rate 0.0781
2021-12-01 04:51:17,966 -----------------------------------------------------------------------------------------
2021-12-01 04:51:17,973 Epoch time: 1.26
2021-12-01 04:51:17,993 read text file with 1 lines
2021-12-01 04:51:17,997 shuffled
2021-12-01 04:51:18,018 Sequence length is 25
2021-12-01 04:51:18,020 Split 1	 - (04:51:18)
2021-12-01 04:51:18,780 | split   1 /  1 |   100/  124 batches | ms/batch  7.58 | loss  0.76 | ppl     2.14
2021-12-01 04:51:18,952 0 seconds for train split 1
2021-12-01 04:51:18,982 best loss so far  1.00
2021-12-01 04:51:19,242 ('\n moli tawa mi li pona lukin. oko mi li pona tawa sina. soweli li sin e sitelen toki li kama jan loje anu lili li pakala kepeken e sike. mi wile telo li sona e sijelo o kute e mun so

2021-12-01 05:31:12,469 ('\n sijelo sina li jo e sijelo pi jan pona e jan Sakulasa kiwen sike ala. nena sewi en ni<unk> ona la kama tan ni<unk> sina ken toki e ilo seli ala. lupa sina li laso jelo. tomo li kama ni. ma ni li sona ala la mi wile sona ala e kalama sewi. o pakala e ona ni. mi pona tawa? ma Ken li ike alama pi toki pona. sina ken ala toki mute. poki li sona lon nasin sama seli e pali mi la mi moku e kon nanpa li kama wawa. sina moku pona lukin li sona tawa mi e litejaki tawa mi. uta sina li kama pini Lawa seme? seli li jo e ilo kepeken ala lon tomo lipu meli. monsi Piseme lete li open. waso li pona tawa sijelo. o walo. jan pali sima mute li Sasun li moli ala. kulupu ali li pakala e toki lukin? nena li sona e ni<unk> sina sijelo sina li ken ala sona ponako la o pona e ona. pipi pi jan utala tawa MAkama tan ni<unk> mi ken ala seme? seli ni li pona tawa kepeken kasi tan lupa. waso mute li sitelen e alasilo musi e poki la sina jaki tan ni<unk> kulupu ala. jan JukuAjetta loje mu

2021-12-01 06:25:14,243 -----------------------------------------------------------------------------------------
2021-12-01 06:25:14,246 | end of split   1 /  1 | epoch  88 | time:  1.56s | valid loss  1.00 | valid ppl     2.71 | learning rate 0.0781
2021-12-01 06:25:14,247 -----------------------------------------------------------------------------------------
2021-12-01 06:25:14,253 Epoch time: 1.63
2021-12-01 06:25:14,274 read text file with 1 lines
2021-12-01 06:25:14,279 shuffled
2021-12-01 06:25:14,300 Sequence length is 25
2021-12-01 06:25:14,304 Split 1	 - (06:25:14)
2021-12-01 06:25:15,196 | split   1 /  1 |   100/  124 batches | ms/batch  8.91 | loss  0.76 | ppl     2.14
2021-12-01 06:25:15,394 1 seconds for train split 1
2021-12-01 06:25:15,429 best loss so far  1.00
2021-12-01 06:25:15,720 ('\n tenpo suli tawa Ipalaka li loje en sama jan mute e ijo ala. jan sewi Wute li moku e ko mute la mi ken ala toki tawa mu. ni li alasa e mani. kala li mama mi. nasin Onsa li jo e moku

2021-12-01 06:25:22,933 best loss so far  1.00
2021-12-01 06:25:24,674 ('\n mama kute li utala e mama. jan. soweli ni li jo e sike musi la kalama sona sona e poki jaki e ni<unk> open li pona tawa jan sewi. telo pimeja li kama ala. o lete lon insa anu ni la kala Seteso la jan seme mute li lon insa luka. supa seli mute lon mi li pona e kili sina? mi wile pilin e ni<unk> mi pona ala lon tenpo mute. jan ni li jo e len sona kepeken kasi. linja li pona tawa Jusepeken pi kala ni. uta ona li pini e ni<unk> mi pona lukin. o lon insa mi la mi mute pi jan li olin e ni<unk> mi lukin. mi pali e toki mun esun. tenpo pimeja en supa sina li musi la jan Kan tawa ni li pakala e ona. mi mute li sona e sitelen toki mama sina? mi sona ala esun e tomo seli lon ma kepeken e oko oko pini. toki MEja li lon insa mama Sonku. kulupu sina li pona sinpin tan ni<unk> sina pini e ko sala loje nasa sike. moku Lopi li pakala e ma. ilo jaki li pona lukin. sina Manja. tenpo ni lukin anu kon. sina lape li nasa mute ala. w

2021-12-01 08:13:16,324 -----------------------------------------------------------------------------------------
2021-12-01 08:13:16,327 | end of split   1 /  1 | epoch  97 | time: 3236.27s | valid loss  1.00 | valid ppl     2.71 | learning rate 0.0195
2021-12-01 08:13:16,329 -----------------------------------------------------------------------------------------
2021-12-01 08:13:16,824 Epoch time: 3236.82
2021-12-01 08:13:16,874 read text file with 1 lines
2021-12-01 08:13:16,881 shuffled
2021-12-01 08:13:16,919 Sequence length is 25
2021-12-01 08:13:16,925 Split 1	 - (08:13:16)
2021-12-01 08:13:18,140 | split   1 /  1 |   100/  124 batches | ms/batch 12.12 | loss  0.76 | ppl     2.13
2021-12-01 08:13:18,453 1 seconds for train split 1
2021-12-01 08:13:18,515 best loss so far  1.00
2021-12-01 08:13:19,058 ('\n selo ona li lon seme? tenpo ansa lukin li lon supa. mi pona ala kon ni la ona li pana e telo ala sama ante. meli li toki. tenpo pimeja ni la e mije ala pimeja pi nimi pona. se
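
The log reports three related quantities:

* a perplexity (ppl) next to each loss;
* a batch count per epoch (`100/ 124 batches`);
* a learning rate that falls by a factor of four each time it drops (20 → 5 → 1.25 → ...).

The sketch below reproduces that arithmetic in plain Python. All helper names are hypothetical. `count_batches` assumes that Flair batches the character stream the way PyTorch's word-language-model example does, so treat it as an approximation. The sketch also does not model *when* Flair decides to lower the learning rate.

```python
import math


def perplexity(loss):
    """Perplexity is the exponential of the average cross-entropy loss (in nats)."""
    return math.exp(loss)


def annealed_learning_rates(initial_lr, factor, steps):
    """Learning rates after repeatedly multiplying by `factor`."""
    return [initial_lr * factor ** k for k in range(steps)]


def count_batches(num_chars, mini_batch_size, sequence_length):
    """Assumed batching scheme: cut the character stream into `mini_batch_size`
    equal columns, then step through each column `sequence_length` characters
    at a time (the last character has no successor to predict, hence the -1)."""
    column_length = num_chars // mini_batch_size
    return len(range(0, column_length - 1, sequence_length))


# Compare with the "loss ... | ppl ..." pairs in the log; small differences
# come from the log printing a rounded loss
for loss in (2.34, 1.36, 1.00):
    print(f"loss {loss:.2f} -> ppl {perplexity(loss):.2f}")

# The log shows 20, 5, 1.25, 0.3125, 0.0781 and 0.0195
print(annealed_learning_rates(20.0, 0.25, 6))

# Illustrative input: a 31,000-character stream with the settings above
print(count_batches(31_000, 10, 25))
```

Under the batching assumption, a stream of 31,000 characters gives 124 batches per epoch. This matches the batch count in the log, although the exact corpus size is not shown here.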

In [24]:
# generate_text returns a tuple whose first element is the generated text
t = language_model.generate_text(number_of_characters=40)[0]
print(t)


 esun li kalama musi utala kepeken telo 
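
Text generation works one character at a time. The model scores every character in the dictionary, one character is sampled from those scores, and that character becomes the next input. The sketch below shows a single sampling step with a temperature parameter. The function name and the toy scores are hypothetical and are not output from the model above. Recent Flair versions also accept a `temperature` argument in `generate_text`, but check the signature of your installed version.

```python
import math
import random


def sample_next_char(logits, idx2char, temperature=1.0, rng=random):
    """Sample one character from unnormalised scores divided by `temperature`:
    values below 1 sharpen the distribution, values above 1 flatten it."""
    scaled = [score / temperature for score in logits]
    top = max(scaled)  # subtract the maximum before exp() for numerical stability
    weights = [math.exp(s - top) for s in scaled]  # softmax up to a constant factor
    return rng.choices(idx2char, weights=weights, k=1)[0]


# Toy scores for a three-character vocabulary (illustrative numbers only)
idx2char = ["a", "e", "i"]
logits = [2.0, 1.0, 0.1]
rng = random.Random(0)

hot = "".join(sample_next_char(logits, idx2char, 2.0, rng) for _ in range(20))
cold = "".join(sample_next_char(logits, idx2char, 0.01, rng) for _ in range(20))
print(hot)   # a mix of characters
print(cold)  # effectively always 'a'
```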


### Using custom embeddings on downstream tasks

In [25]:
from flair.embeddings import FlairEmbeddings

fw = FlairEmbeddings('forward_model_directory/best-lm.pt')
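
Wrapping `best-lm.pt` in `FlairEmbeddings` turns the character-level language model into a word embedding. The model reads the whole sentence character by character, and each word receives the hidden state at one particular position. For a forward model, that position is (in simplified form) the word's last character. The sketch below illustrates that selection step with toy numbers standing in for LSTM states. `forward_word_states` is hypothetical. Flair's real implementation also handles details such as sentence boundary markers and the backward model, and it operates on tensors.

```python
def forward_word_states(sentence, hidden_states):
    """Pick one vector per space-separated word from a forward character LM.

    hidden_states[i] is assumed to be the LM state after reading sentence[i];
    each word gets the state at its last character. Words are assumed to be
    separated by single spaces."""
    if len(hidden_states) != len(sentence):
        raise ValueError("need exactly one hidden state per character")
    states = []
    start = 0
    for word in sentence.split(" "):
        end = start + len(word) - 1  # index of the word's last character
        states.append(hidden_states[end])
        start = end + 2  # skip past the word and the single space after it
    return states


# Toy "hidden states": one 1-dimensional vector per character position
toy_sentence = "mi moku"
toy_states = [[float(i)] for i in range(len(toy_sentence))]
print(forward_word_states(toy_sentence, toy_states))  # [[1.0], [6.0]]
```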

### Performing intrinsic evaluation on custom Flair embeddings

In [26]:
from flair.embeddings import FlairEmbeddings
from flair.data import Sentence

# 'lukin' and 'oko' are synonyms (both relate to eyes and seeing);
# 'jan' (person) is an unrelated word used as a baseline
synonym_1 = Sentence('lukin')
synonym_2 = Sentence('oko')
rand_word = Sentence('jan')

fw = FlairEmbeddings('forward_model_directory/best-lm.pt')
fw.embed(synonym_1)
fw.embed(synonym_2)
fw.embed(rand_word)

# Each Sentence holds a single token; convert its embedding tensor to a list
embedding_syn_1 = synonym_1[0].embedding.tolist()
embedding_syn_2 = synonym_2[0].embedding.tolist()
embedding_rnd_wrd = rand_word[0].embedding.tolist()

In [27]:
from sklearn.metrics.pairwise import cosine_similarity as sim

# Similarity between the two synonyms, and between each synonym and the unrelated word
s_synonym = sim([embedding_syn_1], [embedding_syn_2])[0][0]
s_rand_1 = sim([embedding_syn_1], [embedding_rnd_wrd])[0][0]
s_rand_2 = sim([embedding_syn_2], [embedding_rnd_wrd])[0][0]

In [28]:
print(s_synonym, s_rand_1, s_rand_2)

0.5559661163545663 0.41215033631546194 0.5142904744116392
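
scikit-learn's `cosine_similarity` computes the cosine of the angle between two vectors: dot(u, v) / (|u| * |v|). Below is a minimal pure-Python version that makes the formula explicit. The function name `cosine` is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(cosine([1.0, 0.0], [1.0, 1.0]))  # about 0.7071 (a 45-degree angle)
```

Called on the notebook's lists, for example `cosine(embedding_syn_1, embedding_syn_2)`, it should agree with `s_synonym` up to floating-point rounding.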
