# Old Norse with CLTK

Process your Old Norse texts with CLTK. This notebook presents several tools adapted to Old Norse.

### Import Old Norse corpora
* old_norse_text_perseus contains various Old Norse books
* old_norse_texts_heimskringla contains the Eddas
* old_norse_models_cltk contains the data for a part-of-speech tagger

By default, corpora are imported into ~/cltk_data.

In [1]:
import os
from cltk.corpus.utils.importer import CorpusImporter

# expanduser resolves the home directory on Linux, macOS and Windows,
# including the drive letter that HOMEPATH alone does not carry.
USER_PATH = os.path.expanduser("~")

onc = CorpusImporter("old_norse")
onc.import_corpus("old_norse_text_perseus")
onc.import_corpus("old_norse_texts_heimskringla")
onc.import_corpus("old_norse_models_cltk")
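
To check that an import worked, you can look for the corpus directory on disk. The sketch below assumes the `~/cltk_data/old_norse/text/<corpus name>` layout that the paths later in this notebook use. `text_corpus_dir` is a hypothetical helper, not part of cltk, and it only covers the two text corpora.

```python
import os

def text_corpus_dir(corpus_name, base_dir="~/cltk_data"):
    """Return the expected directory of an Old Norse text corpus (assumed layout)."""
    # expanduser turns "~" into the current user's home directory.
    return os.path.join(os.path.expanduser(base_dir), "old_norse", "text", corpus_name)

for name in ("old_norse_text_perseus", "old_norse_texts_heimskringla"):
    path = text_corpus_dir(name)
    status = "found" if os.path.isdir(path) else "missing"
    print(f"{name}: {status} ({path})")
```

A "missing" line usually means the import did not run or that the corpora were stored somewhere else.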

### Configure IPython

Configure IPython if you want to use this notebook:
```bash
$ ipython profile create
$ ipython locate
$ nano ~/.ipython/profile_default/ipython_config.py
```
Add this at the end of the file:
```python
c.InteractiveShellApp.exec_lines = [
    'import os, sys; sys.path.append(os.path.expanduser("~/cltk_data/old_norse"))'
]
```
And... It's done!
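
Python does not expand `~` in `sys.path` entries, so if you add the directory from a script or a notebook cell rather than from the profile, expand it first. A minimal sketch, where `add_to_sys_path` is a hypothetical helper written for this example:

```python
import os
import sys

def add_to_sys_path(directory):
    """Append the expanded, absolute form of `directory` to sys.path once."""
    full_path = os.path.abspath(os.path.expanduser(directory))
    if full_path not in sys.path:
        sys.path.append(full_path)
    return full_path

added = add_to_sys_path("~/cltk_data/old_norse")
print(added)
```

Calling the helper twice with the same directory leaves a single entry in `sys.path`.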

### old_norse_text_perseus

In [2]:
import os

corpus = os.path.join(USER_PATH, "cltk_data/old_norse/text/old_norse_text_perseus/plain_text/Ragnars_saga_loðbrókar_ok_sona_hans")
chapters = []
for filename in os.listdir(corpus):
    # Each file holds the plain text of one chapter.
    with open(os.path.join(corpus, filename), encoding="utf-8") as f:
        chapter_text = f.read()
        print(chapter_text[:30])  # preview the first 30 characters
        chapters.append(chapter_text)

Nú berr svá til, at þeir koma 
HEIMIR í Hlymdölum spyrr nú þe
Ögmundr er maðr nefndr, er kal
Sigurðr hefir átt sér einn fós
Nú ráða þeir þat með sér, at þ
Eysteinn hefir konungr heitit,
Nú halda þeir í brott þaðan, þ
Nú er þat eitt sumar, at hann 
Þetta spyrst til skipa Ragnars
Herruðr hét jarl ríkr ok ágætr
Sá atburðr hefir verit út í lö
Nú er þar til máls at taka, er
Nú ráða þeir þetta með sér, at
Nú segir hann, at honum lízt v
Í þann tíma réð fyrir Danmörku
Nú er þat eitthvert sinn, at m
Nú líða stundir fram, ok var s
Nú halda þeir þangat, ok er þe
Nú er sú stund var liðin, er á
Eptir þetta fara þeir Hvítserk
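
`os.listdir` returns names in arbitrary order, so the chapters may not come back in reading order. Here is a minimal sketch of a loader that sorts the files by name first. `read_chapters` is a hypothetical helper, the demonstration uses made-up file names in a temporary directory, and sorting by name matches chapter order only if the files are named that way, which is an assumption here.

```python
import tempfile
from pathlib import Path

def read_chapters(directory):
    """Read every file in `directory` as UTF-8 text, sorted by file name."""
    paths = sorted(p for p in Path(directory).iterdir() if p.is_file())
    return [p.read_text(encoding="utf-8") for p in paths]

# Demonstration on a throwaway directory, written out of order on purpose.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "02.txt").write_text("Sigurðr", encoding="utf-8")
    Path(tmp, "01.txt").write_text("Heimir", encoding="utf-8")
    demo = read_chapters(tmp)

print(demo)  # ['Heimir', 'Sigurðr']
```

With the corpus imported, `read_chapters(corpus)` would take the place of the loop above.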


### old_norse_texts_heimskringla

In [3]:
from eddas.text_manager import TextLoader
corpus_path = os.path.join(USER_PATH, "cltk_data", "old_norse", "text", "old_norse_texts_heimskringla")
here = os.getcwd()
os.chdir(corpus_path)
loader = TextLoader(os.path.join(corpus_path, "Sæmundar-Edda", "Atlakviða"), "txt")
print(loader.get_available_names())
complete_text = loader.load()
print(complete_text[:100])
os.chdir(here)

['Snorra-Edda', 'Sæmundar-Edda']

Atlakviða

Dauði Atla

Guðrún Gjúkadóttir hefndi bræðra sinna, svá sem frægt er orðit. Hon drap fyr
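
The cell above switches to the corpus directory and back. A context manager is one way to guarantee the switch back even if loading fails. `working_directory` is a hypothetical helper written for this sketch, not part of cltk or `eddas`, and the demonstration runs on a temporary directory.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def working_directory(path):
    """Temporarily change the working directory, then always restore it."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)

start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        inside = os.getcwd()  # we are in the temporary directory here
end = os.getcwd()

print(start == end)  # True
```

In the notebook, the loader calls would go inside `with working_directory(corpus_path):`.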


### POS tagging
Words that the tagger does not recognize are tagged 'Unk'.

In [4]:
from cltk.tag.pos import POSTag
tagger = POSTag('old_norse')
sent = 'Hlióðs bið ek allar.'
tagger.tag_tnt(sent)

[('Hlióðs', 'Unk'),
 ('bið', 'VBPI'),
 ('ek', 'PRO-N'),
 ('allar', 'Q-A'),
 ('.', '.')]
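
To see how much of a sentence the tagger recognized, you can filter the result on the 'Unk' tag. The sketch below copies the output above into a plain list, so it runs without cltk.

```python
# Tagged tokens copied from the output above.
tagged = [("Hlióðs", "Unk"), ("bið", "VBPI"), ("ek", "PRO-N"), ("allar", "Q-A"), (".", ".")]

# Words the tagger could not label.
unknown = [word for word, tag in tagged if tag == "Unk"]

# Share of tokens that received a real tag.
coverage = 1 - len(unknown) / len(tagged)

print(unknown)            # ['Hlióðs']
print(f"{coverage:.0%}")  # 80%
```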

### Word tokenizing
The word tokenizer is basic for now, but Old Norse does not need a sophisticated one.

In [5]:
from cltk.tokenize.word import WordTokenizer
word_tokenizer = WordTokenizer('old_norse')
sentence = "Gylfi konungr var maðr vitr ok fjölkunnigr."
word_tokenizer.tokenize(sentence)

['Gylfi', 'konungr', 'var', 'maðr', 'vitr', 'ok', 'fjölkunnigr', '.']
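
To illustrate why a basic tokenizer is enough here, a single regular expression reproduces the result above for this sentence. This is only a sketch, not cltk's implementation, and `simple_tokenize` is a hypothetical name.

```python
import re
import unicodedata

def simple_tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    # NFC keeps letters such as "ö" as one code point, so \w matches them whole.
    text = unicodedata.normalize("NFC", text)
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenize("Gylfi konungr var maðr vitr ok fjölkunnigr.")
print(tokens)
# ['Gylfi', 'konungr', 'var', 'maðr', 'vitr', 'ok', 'fjölkunnigr', '.']
```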

### Old Norse Stop Words
The stop word list gathers the least significant words of a sentence. You can of course adapt it to your needs.

In [6]:
from nltk.tokenize.punkt import PunktLanguageVars
from cltk.stop.old_norse.stops import STOPS_LIST
sentence = 'Þat var einn morgin, er þeir Karlsefni sá fyrir ofan rjóðrit flekk nökkurn, sem glitraði við þeim'
p = PunktLanguageVars()

tokens = p.word_tokenize(sentence.lower())
[w for w in tokens if w not in STOPS_LIST]

['var',
 'einn',
 'morgin',
 ',',
 'karlsefni',
 'rjóðrit',
 'flekk',
 'nökkurn',
 ',',
 'glitraði']
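
Adapting the list comes down to set operations. In the sketch below, `base_stops` stands in for `STOPS_LIST` and holds only the nine words that the cell above filtered out of this sentence. Dropping punctuation and keeping "ofan" are arbitrary choices for illustration, and the tokenizer is a plain regular expression rather than `PunktLanguageVars`.

```python
import re

# Stand-in for STOPS_LIST: the nine words removed in the output above.
base_stops = ["þat", "er", "þeir", "sá", "fyrir", "ofan", "sem", "við", "þeim"]

# Customize: also drop punctuation, but keep "ofan".
custom_stops = (set(base_stops) | {",", "."}) - {"ofan"}

sentence = ("Þat var einn morgin, er þeir Karlsefni sá fyrir ofan rjóðrit "
            "flekk nökkurn, sem glitraði við þeim")
tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
kept = [w for w in tokens if w not in custom_stops]

print(kept)
# ['var', 'einn', 'morgin', 'karlsefni', 'ofan', 'rjóðrit', 'flekk', 'nökkurn', 'glitraði']
```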

### Swadesh list for Old Norse
In the following Swadesh list, an item may contain several words if they have a similar meaning, and some items are missing because I have not found a corresponding Old Norse word.

In [7]:
from cltk.corpus.swadesh import Swadesh
swadesh = Swadesh('old_norse')
words = swadesh.words()
words[:30]

['ek',
 'þú',
 'hann',
 'vér',
 'þér',
 'þeir',
 'sjá, þessi',
 'sá',
 'hér',
 'þar',
 'hvar',
 'hvat',
 'hvar',
 'hvenær',
 'hvé',
 'eigi',
 'allr',
 'margr',
 'nǫkkurr',
 'fár',
 'annarr',
 'einn',
 'tveir',
 'þrír',
 'fjórir',
 'fimm',
 'stórr',
 'langr',
 'breiðr',
 'þykkr']
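
Judging by the output above, an item with several words is a single comma-separated string such as 'sjá, þessi'. The sketch below splits such items into individual words. It works on the first eight entries copied from the output and assumes the comma is the only separator.

```python
# First eight entries copied from the output above.
items = ["ek", "þú", "hann", "vér", "þér", "þeir", "sjá, þessi", "sá"]

# One list of alternatives per Swadesh item.
alternatives = [[word.strip() for word in item.split(",")] for item in items]

# All words in a single flat list.
flat = [word for group in alternatives for word in group]

print(alternatives[6])  # ['sjá', 'þessi']
print(len(flat))        # 9
```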

### Inflections of Old Norse words: nouns, pronouns and verbs

In [8]:
from cltk.inflection.old_norse import nouns

In [9]:
nouns.decline_strong_feminine_noun("Breðafönn", "Breðafannar", "Breðafannir")

Breðafönn
Breðafönn
Breðafönn
Breðafannar
Breðafannir
Breðafannir
breðöfunnum
Breðafanna


In [10]:
from cltk.inflection.old_norse import pronouns

In [11]:
pronouns.pro_personal_pronouns_thu.declension

[['þú', 'þik', 'þér', 'þín'], ['ér', 'yðr', 'yðr', 'yðar']]
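
The nested list reads as one row per number and one column per case. The order assumed below (singular then plural; nominative, accusative, dative, genitive) is inferred from the forms themselves, not taken from the cltk documentation. The data is copied from the output above so that the sketch runs on its own.

```python
# Declension copied from the output above.
declension = [["þú", "þik", "þér", "þín"], ["ér", "yðr", "yðr", "yðar"]]

NUMBERS = ("singular", "plural")                             # assumed row order
CASES = ("nominative", "accusative", "dative", "genitive")   # assumed column order

# Map (number, case) to the matching form.
labelled = {
    (number, case): form
    for number, forms in zip(NUMBERS, declension)
    for case, form in zip(CASES, forms)
}

print(labelled[("singular", "genitive")])  # þín
print(labelled[("plural", "nominative")])  # ér
```

The eight lines printed by `decline_strong_feminine_noun` above appear to follow the same order, singular first.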

In [12]:
from cltk.inflection.old_norse import verbs

In [13]:
verb = verbs.WeakOldNorseVerb()

In [14]:
help(verb)

Help on WeakOldNorseVerb in module cltk.inflection.old_norse.verbs object:

class WeakOldNorseVerb(OldNorseVerb)
 |  Method resolution order:
 |      WeakOldNorseVerb
 |      OldNorseVerb
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  classify(self)
 |  
 |  past_active(self)
 |      Weak verbs
 |      I
 |      >>> verb = WeakOldNorseVerb()
 |      >>> verb.set_canonic_forms(["kalla", "kallaði", "kallaðinn"])
 |      >>> verb.past_active()
 |      ['kallaða', 'kallaðir', 'kallaði', 'kölluðum', 'kölluðuð', 'kölluðu']
 |      
 |      II
 |      >>> verb = WeakOldNorseVerb()
 |      >>> verb.set_canonic_forms(["mæla", "mælti", "mæltr"])
 |      >>> verb.past_active()
 |      ['mælta', 'mæltir', 'mælti', 'mæltum', 'mæltuð', 'mæltu']
 |      
 |      III
 |      >>> verb = WeakOldNorseVerb()
 |      >>> verb.set_canonic_forms(["telja", "taldi", "talinn"])
 |      >>> verb.past_a

In [15]:
verb.set_canonic_forms(["jafna", "jafnaði", "jafnaðinn"])

In [16]:
verb.present_active()

['jafna', 'jafnar', 'jafnar', 'jöfnum', 'jafnið', 'jafna']

In [17]:
verb.past_active()

['jafnaða', 'jafnaðir', 'jafnaði', 'jöfnuðum', 'jöfnuðuð', 'jöfnuðu']
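
Each list holds six forms, which read as first, second and third person singular followed by the same persons in the plural. To make that visible, the sketch below pairs the past forms with the first six pronouns of the Swadesh list shown earlier. The person order is inferred from the endings, and the forms are copied from the output above.

```python
# The first six Swadesh entries: I, you, he, we, you (plural), they.
PRONOUNS = ["ek", "þú", "hann", "vér", "þér", "þeir"]

# Past active forms copied from the output above.
past_forms = ["jafnaða", "jafnaðir", "jafnaði", "jöfnuðum", "jöfnuðuð", "jöfnuðu"]

lines = [f"{pronoun} {form}" for pronoun, form in zip(PRONOUNS, past_forms)]
for line in lines:
    print(line)
```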

By Clément Besnier, email address: clemsciences@aol.com