# Old Norse with CLTK

Process Old Norse texts with CLTK. This notebook presents several tools adapted to Old Norse.

In [1]:
# Set your own user path
USER_PATH = "/home/pi"

### Import Old Norse corpora
* old_norse_text_perseus contains several Old Norse books
* old_norse_texts_heimskringla contains the Eddas
* old_norse_models_cltk contains the data for a part-of-speech tagger

By default, corpora are imported into ~/cltk_data.

In [2]:
from cltk.corpus.utils.importer import CorpusImporter
onc = CorpusImporter("old_norse")
onc.import_corpus("old_norse_text_perseus")
onc.import_corpus("old_norse_texts_heimskringla")
onc.import_corpus("old_norse_models_cltk")
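
To check that the imports landed where expected, the following sketch walks the data directory and reports where each corpus was found. The helper `find_corpora` is hypothetical (it is not part of CLTK), and the location `~/cltk_data/old_norse` is an assumption based on the default mentioned above.

```python
import os

CORPORA = [
    "old_norse_text_perseus",
    "old_norse_texts_heimskringla",
    "old_norse_models_cltk",
]

def find_corpora(data_dir, names=CORPORA):
    """Map each corpus name to the first directory of that name under data_dir, or None."""
    found = {name: None for name in names}
    for root, dirs, _files in os.walk(data_dir):
        for d in dirs:
            if d in found and found[d] is None:
                found[d] = os.path.join(root, d)
    return found

# Assumed default location; adjust it if your data lives elsewhere.
for name, path in find_corpora(os.path.expanduser("~/cltk_data/old_norse")).items():
    print(name, "->", path or "not found")
```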

### Configure IPython

To run this notebook, configure IPython so that it can import the modules shipped with the corpora:
```bash
$ ipython profile create
$ ipython locate
$ nano ~/.ipython/profile_default/ipython_config.py
```
Add the following at the end of the file (uncommented):
```python
c.InteractiveShellApp.exec_lines = [
    'import os, sys; sys.path.append(os.path.expanduser("~/cltk_data/old_norse"))'
]
```
And... It's done!
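
If you prefer not to edit the IPython profile, a similar effect can be obtained from inside the notebook. This is only a minimal sketch, and `add_to_sys_path` is a hypothetical helper. Python does not expand `~` in `sys.path` entries, which is why the path goes through `os.path.expanduser` first.

```python
import os
import sys

def add_to_sys_path(path):
    """Expand '~', make the path absolute, and append it to sys.path once."""
    full = os.path.abspath(os.path.expanduser(path))
    if full not in sys.path:
        sys.path.append(full)
    return full

# Same directory as in the profile configuration above.
add_to_sys_path("~/cltk_data/old_norse")
```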

### old_norse_text_perseus

In [3]:
import os

corpus = os.path.join(USER_PATH, "cltk_data/old_norse/text/old_norse_text_perseus/plain_text/Ragnars_saga_loðbrókar_ok_sona_hans")
chapters = []
# Each file in the directory holds one chapter as plain text.
for filename in os.listdir(corpus):
    with open(os.path.join(corpus, filename), encoding="utf-8") as f:
        chapter_text = f.read()
        print(chapter_text[:30])  # preview the first 30 characters
        chapters.append(chapter_text)

Ögmundr er maðr nefndr, er kal
Sigurðr hefir átt sér einn fós
Nú halda þeir þangat, ok er þe
Nú líða stundir fram, ok var s
Herruðr hét jarl ríkr ok ágætr
Í þann tíma réð fyrir Danmörku
Nú er þat eitt sumar, at hann 
Þetta spyrst til skipa Ragnars
Nú ráða þeir þetta með sér, at
HEIMIR í Hlymdölum spyrr nú þe
Nú halda þeir í brott þaðan, þ
Eptir þetta fara þeir Hvítserk
Nú er sú stund var liðin, er á
Nú er þar til máls at taka, er
Nú ráða þeir þat með sér, at þ
Sá atburðr hefir verit út í lö
Nú segir hann, at honum lízt v
Nú er þat eitthvert sinn, at m
Nú berr svá til, at þeir koma 
Eysteinn hefir konungr heitit,
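
`os.listdir` returns file names in arbitrary order, so the previews above are not necessarily in chapter order. The sketch below reads the files in sorted name order and keeps each file name next to its text. `load_chapters` is a hypothetical helper, and whether the file names of this corpus actually sort in chapter order is an assumption that has not been checked here.

```python
import os

def load_chapters(directory):
    """Return (filename, text) pairs for every file in directory, sorted by file name."""
    chapters = []
    for filename in sorted(os.listdir(directory)):
        path = os.path.join(directory, filename)
        if os.path.isfile(path):  # skip sub-directories
            with open(path, encoding="utf-8") as f:
                chapters.append((filename, f.read()))
    return chapters
```

With the variables of the cell above, `load_chapters(corpus)` would return the chapters together with their file names.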


### old_norse_texts_heimskringla

In [4]:
import os
from old_norse.text.old_norse_texts_heimskringla.text_manager import TextLoader

corpus_path = os.path.join(USER_PATH, "cltk_data/old_norse/text/old_norse_texts_heimskringla")
here = os.getcwd()
os.chdir(corpus_path)  # work from the corpus directory while loading
try:
    loader = TextLoader(os.path.join(corpus_path, "Sæmundar-Edda", "Atlakviða"), "txt")
    print(loader.get_available_names())
    complete_text = loader.load()
    print(complete_text[:100])
finally:
    os.chdir(here)  # always restore the original working directory

['Snorra-Edda', '__pycache__', 'Sæmundar-Edda']

Atlakviða

Dauði Atla

Guðrún Gjúkadóttir hefndi bræðra sinna, svá sem frægt er orðit. Hon drap fyr


### POS tagging
Words unknown to the tagger are tagged 'Unk'.

In [5]:
from cltk.tag.pos import POSTag
tagger = POSTag('old_norse')
sent = 'Hlióðs bið ek allar.'
tagger.tag_tnt(sent)

[('Hlióðs', 'Unk'),
 ('bið', 'VBPI'),
 ('ek', 'PRO-N'),
 ('allar', 'Q-A'),
 ('.', '.')]
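
Since the tagger returns plain `(token, tag)` pairs, the unknown words can be collected with a short list comprehension. The sketch below reuses the output shown above as literal data, so it runs without CLTK; `unknown_tokens` is a hypothetical helper.

```python
def unknown_tokens(tagged, unknown_tag="Unk"):
    """Return the tokens whose tag equals unknown_tag."""
    return [token for token, tag in tagged if tag == unknown_tag]

# Output of tagger.tag_tnt(sent) copied from the cell above.
tagged = [('Hlióðs', 'Unk'), ('bið', 'VBPI'), ('ek', 'PRO-N'), ('allar', 'Q-A'), ('.', '.')]
print(unknown_tokens(tagged))  # ['Hlióðs']
```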

### Word tokenizing
For now, the word tokenizer is basic, but Old Norse does not need a sophisticated one.

In [6]:
from cltk.tokenize.word import WordTokenizer
word_tokenizer = WordTokenizer('old_norse')
sentence = "Gylfi konungr var maðr vitr ok fjölkunnigr."
word_tokenizer.tokenize(sentence)

['Gylfi', 'konungr', 'var', 'maðr', 'vitr', 'ok', 'fjölkunnigr', '.']
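
To give an idea of what a basic tokenizer amounts to, here is a regular-expression sketch that separates words from punctuation. It is not CLTK's implementation; `simple_tokenize` is a hypothetical function written for illustration only.

```python
import re
import unicodedata

# A token is either a run of word characters or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def simple_tokenize(text):
    """Split text into words and punctuation marks."""
    # Normalize to NFC so that letters such as 'ö' are single code points.
    text = unicodedata.normalize("NFC", text)
    return TOKEN_PATTERN.findall(text)

print(simple_tokenize("Gylfi konungr var maðr vitr ok fjölkunnigr."))
```

Such a pattern splits hyphenated words and abbreviations at every punctuation mark, which is one reason to rely on the CLTK tokenizer for real work.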

### Old Norse Stop Words
This list of stop words gathers the least informative words of a sentence. You can of course adapt it to your needs.

In [7]:
from nltk.tokenize.punkt import PunktLanguageVars
from cltk.stop.old_norse.stops import STOPS_LIST
sentence = 'Þat var einn morgin, er þeir Karlsefni sá fyrir ofan rjóðrit flekk nökkurn, sem glitraði við þeim'
p = PunktLanguageVars()

tokens = p.word_tokenize(sentence.lower())
[w for w in tokens if w not in STOPS_LIST]

['var',
 'einn',
 'morgin',
 ',',
 'karlsefni',
 'rjóðrit',
 'flekk',
 'nökkurn',
 ',',
 'glitraði']
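
Adapting the list can be done with ordinary set operations. The sketch below is self-contained: `BASE_STOPS` is a small stand-in made of the words that were filtered out in the example above, not the full `STOPS_LIST`, and the choice of words to add or keep is arbitrary.

```python
# Stand-in for STOPS_LIST: only the words removed in the example above.
BASE_STOPS = ['þat', 'er', 'þeir', 'sá', 'fyrir', 'ofan', 'sem', 'við', 'þeim']

def customize_stops(base, extra=(), keep=()):
    """Return a stop-word set with the `extra` words added and the `keep` words removed."""
    return (set(base) | set(extra)) - set(keep)

# Lowercased tokens of the example sentence, written out by hand.
tokens = ['þat', 'var', 'einn', 'morgin', ',', 'er', 'þeir', 'karlsefni', 'sá',
          'fyrir', 'ofan', 'rjóðrit', 'flekk', 'nökkurn', ',', 'sem', 'glitraði',
          'við', 'þeim']

# Also drop commas and 'var', but keep the preposition 'fyrir'.
stops = customize_stops(BASE_STOPS, extra=[',', 'var'], keep=['fyrir'])
print([w for w in tokens if w not in stops])
```

Using a set also makes the membership test faster than scanning a list, which matters on longer texts.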

### Swadesh list for Old Norse
In the following Swadesh list, an item may contain several words when they have similar meanings, and some items are missing because I have not found a corresponding Old Norse word.

In [8]:
from cltk.corpus.swadesh import Swadesh
swadesh = Swadesh('old_norse')
words = swadesh.words()
words[:30]

['ek',
 'þú',
 'hann',
 'vér',
 'þér',
 'þeir',
 'sjá, þessi',
 'sá',
 'hér',
 'þar',
 'hvar',
 'hvat',
 'hvar',
 'hvenær',
 'hvé',
 'eigi',
 'allr',
 'margr',
 'nǫkkurr',
 'fár',
 'annarr',
 'einn',
 'tveir',
 'þrír',
 'fjórir',
 'fimm',
 'stórr',
 'langr',
 'breiðr',
 'þykkr']
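
Items holding several words can be split into their alternatives before further processing. The sketch below assumes that alternatives are separated by commas, as the entry 'sjá, þessi' suggests; `split_alternatives` is a hypothetical helper and the sample is copied from the output above.

```python
def split_alternatives(entry):
    """Split a Swadesh entry into its alternative words, dropping empty pieces."""
    return [word.strip() for word in entry.split(",") if word.strip()]

# First entries of the list printed above.
sample = ['ek', 'þú', 'hann', 'vér', 'þér', 'þeir', 'sjá, þessi', 'sá']

# Number the entries from 1, following their position in the list.
expanded = {i: split_alternatives(entry) for i, entry in enumerate(sample, start=1)}
print(expanded[7])  # ['sjá', 'þessi']
```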

By Clément Besnier, email address: clemsciences@aol.com