## Create tables for pension funds and their divisions


Create a database for pension funds, starting with a table for the funds themselves. The funds table has the columns fundid, fundname, shortname, and regex (named `regexp` in the code below). The regex column holds a regular expression that determines whether a given string refers to that particular fund.
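
As a minimal sketch of how such a regex column could be used: the pattern and sample strings are taken from the fund list further down, while the helper name `matches_fund` is hypothetical and not part of the notebook.

```julia
# Hypothetical helper: does `text` refer to the fund described by `pattern`?
matches_fund(pattern::AbstractString, text::AbstractString) = occursin(Regex(pattern), text)

# Pattern for the pilots' pension fund, as it appears later in this notebook
pilot_pattern = "(Eftirl[a-z. óð]*flug|FÍA)"

long_hit  = matches_fund(pilot_pattern, "Eftirlaunasj atvinnuflugmanna")  # long name
short_hit = matches_fund(pilot_pattern, "EFÍA")                            # short name
miss      = matches_fund(pilot_pattern, "Festa - lífeyrissjóður")         # a different fund
```

One pattern covers both the long and the short name, which is why a single regex per fund can be enough.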

The second table holds the subfunds, that is, the departments or divisions of a fund. It has the columns subfundid, fundid, subfundname, shortname, subfundtype (shared, labelled Coinsurance in the code below, or private), and regex.
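
The two tables described above could be sketched as plain Julia structs. The field types are assumptions (the notebook keeps these columns in DataFrames), and the subfund's shortname and regex values are illustrative placeholders.

```julia
# Assumed field types; the notebook itself stores these columns in DataFrames.
struct Fund
    fundid::Int
    fundname::String
    shortname::String
    regex::String
end

struct Subfund
    subfundid::Int
    fundid::Int          # refers to Fund.fundid
    subfundname::String
    shortname::String
    subfundtype::String  # e.g. "Private" or "Coinsurance"
    regex::String
end

almenni = Fund(1, "Almenni lífeyrissjóðurinn", "Almenni", "Almenni")

# shortname and regex here are placeholders, not values from the source data
division = Subfund(8, almenni.fundid, "Tryggingadeild", "Tryggingadeild",
                   "Coinsurance", "Tryggingad")
```

The `fundid` field on `Subfund` plays the role of a foreign key into the funds table.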

Start by collecting the values from the latest Excel file from FME, which conveniently lists the funds.

In [None]:
# Create the funds table
using XLSX, DataFrames

path = "C:/Users/Ingolfur/Documents/GitHub/engx-project-group20/skjölin frá Birgi/Files/Arsreikningabok-2021.xlsx"

xf = XLSX.readxlsx(path)

# Return the values in column `col`, starting at `firstrow` and stopping at the first empty cell
function read_column(sheet, firstrow, col)
    res = []
    row = firstrow
    while !ismissing(sheet[row,col])
        push!(res,sheet[row,col])
        row+=1
    end 
    return res
end

sheet_name = "Gögn"

sheet = xf[sheet_name]
firstrow = 2
col = 2
shortname = read_column(sheet, firstrow, col)
col = 3
fundname = read_column(sheet, firstrow, col)
col = 4
subfundname = read_column(sheet, firstrow, col)
col = 5
subfundtype = read_column(sheet, firstrow, col)
col = 1
fullname = read_column(sheet, firstrow, col)
rawdata = DataFrame(fundname = fundname, shortname = shortname, 
        subfundname = subfundname,subfundtype = subfundtype, fullname = fullname)
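
The stop rule used by `read_column` (read until the first empty cell) can be illustrated without a spreadsheet. This is only a sketch on a made-up vector; `leading_values` is a hypothetical name, and `Iterators.takewhile` assumes Julia 1.4 or newer.

```julia
# Keep the leading values up to (not including) the first `missing`,
# mirroring the stop rule of read_column
leading_values(column) = collect(Iterators.takewhile(!ismissing, column))

# Made-up stand-in for one sheet column; the value after the gap is ignored
cells = ["Almenni", "Arion banki", missing, "Birta"]

leading = leading_values(cells)
```

Note that anything below the first gap is silently dropped, so a blank row in the middle of the sheet would truncate the data.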

In [2]:
# Remove duplicates and add an ID
funds = combine(groupby(rawdata,[:fundname,:shortname]),nrow)[:,[:fundname,:shortname]]

funds.fundid .= collect(1:nrow(funds))

funds

Row,fundname,shortname,fundid
Type,Any,Any,Int64
1,Almenni lífeyrissjóðurinn,Almenni,1
2,Arion banki hf.,Arion banki,2
3,Birta lífeyrissjóður,Birta,3
4,Brú Lífeyrissjóður starfsmanna sveitarfélaga,Brú,4
5,Eftirlaunasj atvinnuflugmanna,EFÍA,5
6,Festa - lífeyrissjóður,Festa,6
7,Frjálsi lífeyrissjóðurinn,Frjálsi,7
8,Gildi - lífeyrissjóður,Gildi,8
9,Íslandsbanki hf.,Íslandsbanki,9
10,Íslenski lífeyrissjóðurinn,Íslenski,10
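
The deduplicate-then-number step can be sketched on plain tuples, without DataFrames. The input rows below are made up to imitate raw data in which each fund repeats once per subfund.

```julia
# Made-up raw rows: one (fundname, shortname) pair per subfund, so funds repeat
raw_pairs = [
    ("Almenni lífeyrissjóðurinn", "Almenni"),
    ("Almenni lífeyrissjóðurinn", "Almenni"),
    ("Arion banki hf.", "Arion banki"),
    ("Arion banki hf.", "Arion banki"),
    ("Birta lífeyrissjóður", "Birta"),
]

# unique keeps the first occurrence of each pair and preserves order
distinct_pairs = unique(raw_pairs)

# Number the remaining rows 1, 2, 3, ...
fund_rows = [(fundid = i, fundname = f, shortname = s)
             for (i, (f, s)) in enumerate(distinct_pairs)]
```

Because the IDs follow the order of first appearance, they stay stable only as long as the source file keeps its row order.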


In [3]:
#Add regex - v1.
#Create a version and test it against the fundnames and short names
myregex = r"Lífsval"
#myregex = r"(L|l)íf[a-z. ]*Versl"
regex_expressions = ["Almenni","Arion","Birta","Brú","(Eftirl[a-z. óð]*flug|FÍA)","Festa",
                    "Frjálsi","Gildi","Íslandsbanki","Íslenski","Kvika","Landsbankinn","bænda",
                    "bankam","Rang","Akureyr","Búnaðar","(Reykjav|Rvk)","(ríkis|LSR|Lsr)",
                    "(Tannl|tannl)","((L|l)íf[a-z. óð]*Versl|(L|l)íf[a-z. óð]*versl)",
                    "(L|l)íf[a-z.óð]+ (V|v)estm","Lífsval",
                    "((L|l)ífsverk|(L|l)íf[a-z. óð]*verkf)","Söfnunar","Stapi"]

for fund in funds.fundname
    m = match(myregex, fund)
    if isnothing(m)
        #println("$fund, No match!")
    else
        println("$fund, Fund Match! , $(m.match)")
    end
end

for fund in funds.shortname
    m = match(myregex, fund)
    if isnothing(m)
        #println("$fund, No match!")
    else
        println("$fund, Short Match! , $(m.match)")
    end
end

funds.regexp .= regex_expressions
funds

Lífsval - lífeyrissparnaður, Fund Match! , Lífsval
Lífsval, Short Match! , Lífsval


Row,fundname,shortname,fundid,regexp
Type,Any,Any,Int64,String
1,Almenni lífeyrissjóðurinn,Almenni,1,Almenni
2,Arion banki hf.,Arion banki,2,Arion
3,Birta lífeyrissjóður,Birta,3,Birta
4,Brú Lífeyrissjóður starfsmanna sveitarfélaga,Brú,4,Brú
5,Eftirlaunasj atvinnuflugmanna,EFÍA,5,(Eftirl[a-z. óð]*flug|FÍA)
6,Festa - lífeyrissjóður,Festa,6,Festa
7,Frjálsi lífeyrissjóðurinn,Frjálsi,7,Frjálsi
8,Gildi - lífeyrissjóður,Gildi,8,Gildi
9,Íslandsbanki hf.,Íslandsbanki,9,Íslandsbanki
10,Íslenski lífeyrissjóðurinn,Íslenski,10,Íslenski
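
The cell above tests one regex at a time. A possible way to check all of them at once is to count, for every fund name, how many patterns match it. This is a sketch using the first few rows of the table above; the `problems` check is an assumption and not part of the notebook.

```julia
# Official names and regexes copied from the table above (a few rows only)
fund_names = ["Almenni lífeyrissjóðurinn", "Arion banki hf.",
              "Eftirlaunasj atvinnuflugmanna", "Festa - lífeyrissjóður"]
patterns   = ["Almenni", "Arion", "(Eftirl[a-z. óð]*flug|FÍA)", "Festa"]

# For each name, count how many patterns match it
match_counts = [count(p -> occursin(Regex(p), name), patterns) for name in fund_names]

# Names that are unmatched (count 0) or ambiguous (count > 1)
problems = [name for (name, c) in zip(fund_names, match_counts) if c != 1]
```

An empty `problems` list means every name is claimed by exactly one pattern.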


In [5]:
# OK, now create the fund divisions (subfunds) table
rawdata

subfunds = combine(groupby(rawdata,[:fundname,:subfundname,:subfundtype]),nrow)[:,[:fundname,:subfundname,:subfundtype]]

subfunds.subfundid .= collect(1:nrow(subfunds))

# Map the type to English
subfundtype_map = Dict("Séreign" => "Private", "Samtrygging" => "Coinsurance")
transform!(subfunds, :subfundtype => ByRow(x -> subfundtype_map[x]) => :subfundtype)

# Returns the ID of the fund whose regex matches fundname; errors if no regex or more than one regex matches
function get_fund_id(fundname::String,funds::DataFrame)
    found = false
    fundid = 0
    official_fund_name = ""
    for row in eachrow(funds)
        myregex = Regex(row[:regexp])
        m = match(myregex,fundname)
        if !isnothing(m)
            if found
                error("Ambiguous fund name $fundname: matches both $official_fund_name and $(row[:fundname])")
            else
                fundid = row[:fundid]
                official_fund_name = row[:fundname]
                found = true
            end
        end
    end
    if !found
        error("$fundname is a new fund, need to register in funds db!")
    end
    return fundid
end

#Use the funds dataframe that already exists
get_fund_id(fundname::String) = get_fund_id(fundname,funds)

transform!(subfunds, :fundname => ByRow(get_fund_id) => :fundid)
subfunds

Row,fundname,subfundname,subfundtype,subfundid,fundid
Type,Any,Any,String,Int64,Int64
1,Almenni lífeyrissjóðurinn,Ævisafn I,Private,1,1
2,Almenni lífeyrissjóðurinn,Ævisafn II,Private,2,1
3,Almenni lífeyrissjóðurinn,Ævisafn III,Private,3,1
4,Almenni lífeyrissjóðurinn,Húsnæðissafn,Private,4,1
5,Almenni lífeyrissjóðurinn,Innlánssafn,Private,5,1
6,Almenni lífeyrissjóðurinn,Ríkissafn langt,Private,6,1
7,Almenni lífeyrissjóðurinn,Ríkissafn stutt,Private,7,1
8,Almenni lífeyrissjóðurinn,Tryggingadeild,Coinsurance,8,1
9,Arion banki hf.,Erlend hlutabréf,Private,9,2
10,Arion banki hf.,Innlend skuldabréf,Private,10,2


In [None]:
# Next: regexes, and build a skeleton for the staging tables.
# The regex ID is just the fund regex, then anything, then the division regex
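
The combined pattern described in the comment above (fund regex, then anything, then division regex) might look like the following sketch. The helper name `combined_regex`, the division pattern, and the sample full names are all assumptions made for illustration.

```julia
# Hypothetical helper: fund regex, then any characters, then division regex
combined_regex(fund_pattern, subfund_pattern) = Regex(fund_pattern * ".*" * subfund_pattern)

# "Almenni" is the fund regex used earlier; "Tryggingad" is an assumed division regex
rx = combined_regex("Almenni", "Tryggingad")

hit  = occursin(rx, "Almenni lífeyrissjóðurinn - Tryggingadeild")  # made-up full name
miss = occursin(rx, "Arion banki hf. - Tryggingadeild")            # same division name, wrong fund
```

Requiring the fund pattern first keeps divisions with the same name in different funds apart.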