# Eddas
Note: to use this **Jupyter notebook**, you need **Python 3.6** or above.

Why this notebook? It lets you easily retrieve and process the Eddas (the Poetic Edda and Snorri's Prose Edda).

For more Old Norse tools, use **cltk** (https://github.com/cltk/cltk.git). Its documentation is at http://docs.cltk.org/en/latest/old_norse.html.

### Configuration

Install the required modules:
```bash
$ pip install -r requirements.txt
```

Install the **kernel** associated with **python3.6** by following [https://ipython.readthedocs.io/en/stable/install/kernel_install.html](https://ipython.readthedocs.io/en/stable/install/kernel_install.html).

Let's test if the import is correct:
```bash
$ python3.6
```

In [18]:
import os
os.chdir("..")

In [1]:
import eddas

It works! So let's continue with some predefined data.

In [2]:
from eddas import reader

The reader module uses the **NLTK** TaggedCorpusReader class. 

#### The POS annotated Völuspá
The POS tagset is available at http://nlp.cs.ru.is/pdf/Tagset.pdf.
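
As a rough orientation, the first letter of each tag encodes the word class. The sketch below decodes only that first letter. The mapping is an assumption based on a reading of the Icelandic tagset and should be checked against the linked PDF. `WORD_CLASSES` and `word_class` are hypothetical names, and this is not the `parse_icepahc` function used later in the notebook.

```python
# Assumed mapping from the first tag letter to a word class.
# Verify it against Tagset.pdf before relying on it.
WORD_CLASSES = {
    "n": "noun",
    "l": "adjective",
    "f": "pronoun",
    "g": "article",
    "t": "numeral",
    "s": "verb",
    "a": "adverb or preposition",
    "c": "conjunction",
}


def word_class(tag):
    """Return a coarse word class for a tag, or 'unknown' if it is unmapped."""
    if not tag:
        return "unknown"
    return WORD_CLASSES.get(tag[0].lower(), "unknown")


# Tags taken from the first stanza of the Völuspá.
for tag in ["NHEE", "SFG1EÞ", "FP1EN", "LVFOSF", "CC", "AA"]:
    print(tag, "->", word_class(tag))
```

Letters missing from the mapping fall back to `'unknown'` rather than raising, so the helper can be run over a whole stanza without special cases.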

In [3]:
pos_voeluspaa = reader.PoeticEddaPOSTaggedReader("Völuspá")

Raw POS-annotated text:

In [4]:
print(pos_voeluspaa.raw()[:300])

1/ta
Hljóðs/nhee bið/sfg1eþ ek/fp1en allar/lvfosf
helgar/lvfosf kindir/nvfo ,/p
meiri/lvfovm ok/cc minni/lvfovm
mögu/nkfo Heimdallar/nkeem ;/p
viltu/sfg2en at/cn ek/fp1em ,/p Valföðr/nkenm ,/p
vel/aa fyr/cn telja/sng
forn/lhfosf spjöll/nhfo fira/nkfe ,/p
þau/fa er/ct fremst/aa of/aa man/sfg1


The first paragraph:

In [5]:
print(pos_voeluspaa.tagged_paras()[0])

[[('1', 'TA')], [('Hljóðs', 'NHEE'), ('bið', 'SFG1EÞ'), ('ek', 'FP1EN'), ('allar', 'LVFOSF')], [('helgar', 'LVFOSF'), ('kindir', 'NVFO'), (',', 'P')], [('meiri', 'LVFOVM'), ('ok', 'CC'), ('minni', 'LVFOVM')], [('mögu', 'NKFO'), ('Heimdallar', 'NKEEM'), (';', 'P')], [('viltu', 'SFG2EN'), ('at', 'CN'), ('ek', 'FP1EM'), (',', 'P'), ('Valföðr', 'NKENM'), (',', 'P')], [('vel', 'AA'), ('fyr', 'CN'), ('telja', 'SNG')], [('forn', 'LHFOSF'), ('spjöll', 'NHFO'), ('fira', 'NKFE'), (',', 'P')], [('þau', 'FA'), ('er', 'CT'), ('fremst', 'AA'), ('of', 'AA'), ('man', 'SFG1EN'), ('.', 'P')]]
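
To make the link between the raw `word/tag` lines and the tuples above explicit, here is a minimal pure-Python sketch of the conversion. It assumes the separator is the last `/` in each token and that tags are upper-cased, which matches the outputs shown. `split_tagged_token` is a hypothetical helper and not part of `eddas.reader`. NLTK's `nltk.tag.str2tuple` does comparable work.

```python
def split_tagged_token(token, sep="/"):
    """Split 'word/tag' on the last separator and upper-case the tag."""
    word, found, tag = token.rpartition(sep)
    if not found:
        # No separator: the token carries no tag.
        return token, None
    return word, tag.upper()


# Second line of the raw output shown earlier.
raw_line = "Hljóðs/nhee bið/sfg1eþ ek/fp1en allar/lvfosf"
tagged_line = [split_tagged_token(tok) for tok in raw_line.split()]
print(tagged_line)
```

Splitting on the last separator keeps punctuation tokens such as `,/p` intact.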


#### The lemmatized Völuspá
Lemmata come from Zoëga's dictionary, which you can retrieve from https://github.com/cltk/old_norse_dictionary_zoega.

In [6]:
lem_voeluspaa = reader.PoeticEddaLemmatizationReader("Völuspá")

Sentences in the corpus are "short lines" in the sense of the Germanic poetic tradition.

Without annotations:

In [7]:
lem_voeluspaa.sents()[3]

['meiri', 'ok', 'minni']

With annotations:

In [8]:
[(token, lemma.lower()) for token, lemma in lem_voeluspaa.tagged_sents()[3]]

[('meiri', 'mjök'), ('ok', 'ok'), ('minni', 'lítill')]
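
Pairs like these can be turned into a small lemma index. The sketch below is one possible way to do it, assuming some tokens may have no lemma, in which case `lemma.lower()` would fail. `index_by_lemma` is a hypothetical helper, and the last pair in the sample is a made-up placeholder that illustrates the missing-lemma case.

```python
from collections import defaultdict


def index_by_lemma(tagged_sentence):
    """Map each lower-cased lemma to the surface forms attached to it."""
    index = defaultdict(list)
    for token, lemma in tagged_sentence:
        # Use "UNK" when the lemma is missing (None or empty string).
        key = lemma.lower() if lemma else "UNK"
        index[key].append(token)
    return dict(index)


# The first three pairs come from the output above.
# ("???", None) is a hypothetical placeholder, not corpus data.
sample = [("meiri", "mjök"), ("ok", "ok"), ("minni", "lítill"), ("???", None)]
lemma_index = index_by_lemma(sample)
print(lemma_index)
```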

#### The syllabified Völuspá
Syllables are separated by a plus sign.

In [9]:
syl_voeluspaa = reader.PoeticEddaSyllabifiedReader("Völuspá")

In [10]:
syl_voeluspaa.tagged_words()[10:20]

[('Heimdallar', 'HEIM+DAL+LAR'),
 ('viltu', 'VI+LTU'),
 ('at', 'AT'),
 ('ek', 'EK'),
 ('Valföðr', 'VAL+FÖÐR'),
 ('vel', 'VEL'),
 ('fyr', 'FYR'),
 ('telja', 'TEL+JA'),
 ('forn', 'FORN'),
 ('spjöll', 'SPJÖLL')]
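
Because the syllable boundary is a plain `+`, counting syllables is a matter of splitting the annotation. The sketch below reuses a few pairs from the output above. `count_syllables` is a hypothetical helper, not part of the `eddas` package.

```python
def count_syllables(syllabified):
    """Count syllables in a '+'-separated string such as 'HEIM+DAL+LAR'."""
    if not syllabified:
        return 0
    return len(syllabified.split("+"))


# Pairs copied from the tagged_words() output above.
sample = [
    ("Heimdallar", "HEIM+DAL+LAR"),
    ("viltu", "VI+LTU"),
    ("at", "AT"),
    ("Valföðr", "VAL+FÖÐR"),
    ("telja", "TEL+JA"),
]
syllable_counts = {word: count_syllables(syl) for word, syl in sample}
print(syllable_counts)
```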

In [11]:
normalized_text = [[[word for word, tag in line] for line in para] for para in lem_voeluspaa.tagged_paras()]
lemmata_text = [[[tag for word, tag in line] for line in para] for para in lem_voeluspaa.tagged_paras()]
pos_text = [[[tag for word, tag in line] for line in para] for para in pos_voeluspaa.tagged_paras()]
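
These three nested lists are later indexed in parallel by stanza, line and token, which only works if both readers produce the same shape. The sketch below shows one way to check that first. `find_misalignments` is a hypothetical helper, and the toy data deliberately drops one tag to trigger a mismatch.

```python
def find_misalignments(left, right):
    """Return (stanza, line) indices where two nested texts differ in shape.

    (None, None) means the number of stanzas differs.
    (i, None) means stanza i has a different number of lines.
    (i, j) means line j of stanza i has a different number of tokens.
    """
    problems = []
    if len(left) != len(right):
        problems.append((None, None))
    for i, (stanza_l, stanza_r) in enumerate(zip(left, right)):
        if len(stanza_l) != len(stanza_r):
            problems.append((i, None))
            continue
        for j, (line_l, line_r) in enumerate(zip(stanza_l, stanza_r)):
            if len(line_l) != len(line_r):
                problems.append((i, j))
    return problems


# Toy data: one stanza with two lines; the last tag is dropped on purpose.
words = [[["meiri", "ok", "minni"], ["mögu", "Heimdallar", ";"]]]
tags = [[["LVFOVM", "CC", "LVFOVM"], ["NKFO", "NKEEM"]]]
print(find_misalignments(words, tags))  # [(0, 1)]
```

An empty result means the parallel indexing is safe. Otherwise the reported positions show where to look before exporting.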

In [12]:
# Import only the name used below instead of a wildcard import.
from pos_icepahc import parse_icepahc

In [16]:
# Build a plain-text report: each line of verse, then one entry per token.
annotations = []
for i, stanza in enumerate(pos_text):
    for j, line in enumerate(stanza):
        # Header: the normalized line of verse.
        annotations.append(" ".join(normalized_text[i][j]))
        for k, pos in enumerate(line):
            normalized = normalized_text[i][j][k]
            # Fall back to "UNK" when the lemma or the POS tag is missing.
            lemma = lemmata_text[i][j][k] or "UNK"
            if pos:
                parsed_pos = parse_icepahc(pos)
            else:
                pos = "UNK"
                parsed_pos = "UNK"
            annotations.append(f"{normalized} [{lemma}] -> {parsed_pos} : {pos}")
        # Separate lines of verse in the output file.
        annotations.append("\n")

In [19]:
# In Python 3 the built-in open() handles text encodings, so codecs is not needed.
with open("edda/voluspa_annotations_besnier.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(annotations))

The Völuspá is currently being annotated. The other poems of the Poetic Edda may also be annotated, so if you want to contribute, go to https://github.com/cltk/old_norse_texts_heimskringla.

By Clément Besnier. Website: https://clementbesnier.fr/, Twitter: clemsciences