# Cleaner

This notebook separates the rundata text from the codes and maps the codes to other information found in the different files provided by rundata.

The first step is to load the data we want to merge into DataFrames. The deeper cleaning process, which separates the codes from the runic text, can be
found in `rundata_utils.py`.

Information from [rundata](http://www.rattsatt.com/rundata/mac/bskr_rdm.pdf) is split mostly into one file per type, where each entry is represented as a sentence starting with the `Signum`, or as we will anglicise it here, the `Code`.


|File Name|Description                            |
|---------|---------------------------------------|
|FVN      |Old West Norse                         |
|FVNX     |Old West Norse in a searchable format  |
|FORNSPR  |Ancient language                       |
|FORNSPRX |Ancient language in a searchable format|
|RUNTEXT  |Transliterated rune text               |
|RUNTEXTX |Rune text in a searchable format       |
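
To illustrate the entry layout described above, here is a minimal sketch of splitting one raw entry into its `Code` and its text. It assumes each entry sits on one line shaped like `<region> <identifier> [markers] <text>`, with the marker characters `†` and `$` taken from the codes shown later in this notebook. `split_entry` and `ENTRY_PATTERN` are hypothetical names; the actual parsing lives in `rundata_utils.py` and may differ.

```python
import re

# Assumed layout: "<region> <identifier> [markers] <text>".
# The marker set († and $) is inferred from the codes in this notebook's output.
ENTRY_PATTERN = re.compile(r"^(?P<code>\S+ \S+(?: [†$]+)?) (?P<text>.*)$")


def split_entry(line):
    """Split one raw entry into (code, text). Hypothetical helper."""
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"Unrecognised entry: {line!r}")
    return match.group("code"), match.group("text")


print(split_entry("Öl 5 † alti auk keti... ... stein eftiR kata faþur sin"))
print(split_entry("By NOR1999;26 arni"))
```

Codes that do not follow the two-token shape would raise a `ValueError` here, which makes unexpected layouts visible instead of silently mis-splitting them.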

This `Code` is universal throughout the files mentioned above, as well as the Excel file `RUNDATA.xls`, which contains metadata about the inscriptions such as their location, material and wear.

So our first task is to use the Excel file to map the transliterations to the English translations of the runes, and to add the metadata we consider important. This gives a more table-like structure that makes the data easier to work with.


In [7]:
import pandas as pd
from rundata_utils import get_table_from_text, get_dataframe_from_excel, material_map

df_en = get_table_from_text('ENGLISH')
df_runx = get_table_from_text('RUNTEXTX')

df_rd = get_dataframe_from_excel()

Merge the dataframes and rename the columns to English.

In [8]:
df = pd.merge(df_runx, df_en, on='Signum', suffixes=('_runx', '_en'))
df = pd.merge(df, df_rd[['Signum','Plats','Stilgruppering', 'Period/Datering', 'Materialtyp']], on='Signum')

df = df.rename(columns={
    'Text_en': 'English', 
    'Text_runx': 'Transliteration', 
    'Signum': 'Code',
    'Plats': 'Location',
    'Stilgruppering': 'Style Grouping',
    'Period/Datering': 'Dating',
    'Materialtyp': 'Material Type'
})
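
`pd.merge` performs an inner join by default, so any `Signum` missing from one of the frames is dropped without warning. The sketch below shows one way to check for that, using small invented frames in place of `df_runx` and `df_en`; the rows are illustrative only and do not come from rundata.

```python
import pandas as pd

# Toy frames standing in for df_runx and df_en; the rows are invented.
left = pd.DataFrame({"Signum": ["A 1", "A 2", "A 3"], "Text": ["abc", "def", "ghi"]})
right = pd.DataFrame({"Signum": ["A 1", "A 3", "A 4"], "Text": ["one", "three", "four"]})

# An outer merge with indicator=True adds a "_merge" column that records
# whether each row came from the left frame, the right frame, or both.
check = pd.merge(left, right, on="Signum", how="outer",
                 suffixes=("_runx", "_en"), indicator=True)

# These are the rows the default inner merge would have discarded.
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Signum", "_merge"]])
```

Running the same check on the real frames before the inner merge would show whether any inscriptions lack a translation or metadata row.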

Translate the `Material Type` values into English using `material_map`.

In [None]:
df['Material Type'] = df['Material Type'].replace(material_map)
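
As a small illustration of how a dictionary-based `replace` behaves, the sketch below uses a hypothetical `example_map`; the real `material_map` is defined in `rundata_utils.py` and its contents are not shown in this notebook. Values without an entry in the mapping pass through unchanged, so they can be listed afterwards.

```python
import pandas as pd

# Hypothetical mapping for illustration; not the real material_map.
example_map = {"sten": "stone", "trä": "wood"}

materials = pd.Series(["sten", "trä", "ben", "sten"])

# replace() swaps whole values that match a key and leaves the rest untouched.
translated = materials.replace(example_map)

# Anything that is not one of the mapped English values was not translated.
untranslated = sorted(set(translated) - set(example_map.values()))
print(translated.tolist())
print(untranslated)
```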

In [9]:
# Settings to show full text of dataframes when printing
pd.set_option('display.max_columns', None)  # or 1000
pd.set_option('display.max_colwidth', None) 
df

Unnamed: 0,Code,Transliteration,English,Location,Style Grouping,Dating,Material Type
0,Öl 1 $,§A s-a... --s- ias satr aiftir siba kuþa sun fultars in hons liþi sati at u -ausa-þ-... fulkin likr hins fulkþu flaistr uisi þat maistar taiþir tulka þruþar traukr i þaimsi huki munat raiþuiþur raþa rukstarkr i tanmarku --ntils iarmunkruntar urkrontari lonti §B {INONIN- HE... ...},"§A This stone is set up in memory of Sibbi Góði/Goði, son of Foldarr, and his retinue set on ... Hidden lies the one whom followed (most know that) the greatest deeds, Þrud's warrior of battles, in this mound. Never will a more honest, hard-fighting 'wagon-Viðurr' upon Endill's expanses rule the land in Denmark. [This stone is placed in memory of Sibbi the good, Fuldarr's son, and his retinue placed on ... ... He lies concealed, he who was followed by the greatest deeds (most men knew that), a chieftain (battle-tree of [the Goddess] Þrúðr) in this howe; Never again shall such a battle-hardened sea-warrior (Viðurr-of-the-Carriage of [the Sea-king] Endill's mighty dominion ( = God of the vessels of the the sea) ), rule unsurpassed over land in Denmark.] §B {In the name of Jesus(?) ...}",Karlevi,RAK,V s 900-t,stone
1,Öl 2 †$,tot-- þ-a- k--kaR ---- ...--- -tain iftiR sabiara bruþur sin kuþ hialbi salu hans,"Dóttir(?), Þegn(?), ...-geirr(?) [had the] stone [erected] in memory of Sæbjôrn, their brother. May God help his soul.",Algutsrums kyrka,Pr3,V,stone
2,Öl 3 †$,...iR bryþr litu r-isa ... ...ftiR ...-----s--unilu...,These brothers had the [stones] raised in memory of their(?) ...,Resmo kyrka,Pr3 - Pr4?,V efter 1050,stone
3,Öl 4 $,...-abi þaiR --tu raisa stein- eftiR rantui moþor sina,"<...-abi> they had the stones raised in memory of Randvé, their mother.",Resmo kyrka,Pr4,V efter 1050,stone
4,Öl 5 †,alti auk keti... ... stein eftiR kata faþur sin,"Aldi and Ketill, (they had) the stone (raised) in memory of Káti, their father.",Bårby,Pr3,V,stone
...,...,...,...,...,...,...,...
5162,UA Fv1914;47 $,krani kerþi half þisi iftir kal filaka sin,"Grani made this vault in memory of Karl/Káll, his partner.",Berezanj,,V,stone
5163,By Fv1970;248,alftan ---t----l-a-----,Halfdan ...,Hagia Sofia,,V/M,stone
5164,By NOR1999;26,arni,Árni,Hagia Sofia,,V?,stone
5165,By NT1984;32 $,§A ... hiaku þir hilfniks min -----... ... en i hafn þesi þir min i-ku runar at hau-sa buta -...hua-... ... --þu suiar þeta leinu ... f...- aþr gailt uan kearu- §B trikir rist runir ... §C asmuntr risti ... ...nar þisar þair isk-... ... þurlifr ---- auk - -...-o-...---t-... -ufruk...r...s--...--...uanfarn,"§A ... they cut(?), the troops men ... but in this harbour the men cut runes in memory of Haursi, a ... vigorous(?) husbandman ... Swedes arranged(?) this on the lion ... He fell(?)/perished(?) before he could gain payment. §B Valiant men carved the runes ... §C Ásmundr carved ... the runes, they Áskell(?) ... Þorleifr(?) ... and ... ... ...","Porto Leone, Pireus",,V,stone


Write out the merged data as a CSV file.

In [10]:
df.to_csv('data/processed/merged.csv', index=False)
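
`to_csv` does not create missing parent directories, so the call above fails if `data/processed` does not exist yet. The sketch below shows one way to guard against that and to read the file back as a sanity check. It writes to a temporary directory and uses a one-row invented frame, both of which are stand-ins for the notebook's real path and data.

```python
from pathlib import Path
import tempfile

import pandas as pd

# One invented row with a non-ASCII code, standing in for the merged frame.
frame = pd.DataFrame({"Code": ["Öl 5 †"], "Material Type": ["stone"]})

# A temporary directory stands in for the notebook's data/processed folder.
out_path = Path(tempfile.mkdtemp()) / "data" / "processed" / "merged.csv"

# Create the parent directories first, since to_csv will not do it.
out_path.parent.mkdir(parents=True, exist_ok=True)
frame.to_csv(out_path, index=False, encoding="utf-8")

# Read the file back to confirm the non-ASCII code survived the round trip.
restored = pd.read_csv(out_path, encoding="utf-8")
print(restored["Code"].iloc[0])
```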