In [1]:
def parse_columns(cols):
    """
    Parse column names and optional renames. It receives either a string of the form:
    > name (-> rename)?
    
    or a list of such strings.
    
    It returns a list and a dictionary: the list holds the 'name' entries in order; the dictionary
    maps 'name' to 'rename' for each entry that specifies a rename.
    """
    if isinstance(cols, list):
        l = map(parse_columns, cols)
        l_f = []
        d_f = {}
        for i in l:
            l_f.extend(i[0])
            d_f.update(i[1])
        return l_f, d_f
    cols = cols.split('->')
    r = cols[0].strip()
    if len(cols) > 1:
        return [r], {r:cols[1].strip()}
    else:
        return [r], {}

# Totals
Totals is the backbone of the project: it is the class that defines how data is manipulated.

I have identified four ways to abstract Pandas operations:
+ Aggregation
+ Transformation
+ Combination
+ Filtering

To simplify the process I will assume all data comes from functions, which makes the interface uniform. Other metaclasses will decide how to approach this: for instance, data sources automatically create a `get_data` interface, while subclasses automatically concatenate the results of the operations across an arbitrary number of functions.

## Filtering
Filtering is necessary because, for instance, some parties get too few votes to influence the result, or a candidate being elected in one lane invalidates the second preference in a following lane (see the German system).

A row could be filtered based on:
+ A single column (as would happen if a party didn't pass a threshold)
+ A computation on the whole row (german second vote scenario)
+ A consideration on the whole dataframe.

However, the filter can usually be determined from a single column, so I will apply the following standard:

1. The filter is passed to totals as a string argument
2. This is forwarded to lower calls
3. For each row of the result I take the cell in the first column and call filter on the content of the cell
4. If the result is false I remove the row
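
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `keep`, `filter_first_column` and `PASSED_THRESHOLD` are hypothetical names, and the predicate is a plain function rather than the string-based filter that Totals forwards.

```python
import pandas as pd

# Hypothetical threshold result: the parties that passed (assumption for the example).
PASSED_THRESHOLD = {"b", "c"}

def keep(cell):
    """Return True when the row whose first cell is `cell` should stay."""
    return cell in PASSED_THRESHOLD

def filter_first_column(frame, predicate):
    # Evaluate the predicate on the first column only, then drop the rows that fail.
    mask = frame.iloc[:, 0].map(predicate).astype(bool)
    return frame[mask]

votes = pd.DataFrame({"Lista": ["a", "b", "c"], "Voti": [10, 20, 5]})
filtered = filter_first_column(votes, keep)
```

Only the first column is inspected, which matches the single-column case; a row-wide or dataframe-wide filter would need the whole record or frame passed to the predicate.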

## Aggregation
In this mode the number of rows in a dataframe is reduced, according to some criteria, by merging multiple rows together.
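
As a small sketch of what aggregation means here, assuming the criterion is a key column and the merge operation is a sum (the column names are borrowed from the examples in this notebook, the data is made up):

```python
import pandas as pd

votes = pd.DataFrame({
    "Coalizione": ["a", "b", "c", "c", "c"],
    "Voti": [10, 20, 5, 7, 8],
})

# Rows sharing the same key collapse into one row; "Voti" is summed per key.
totals = votes.groupby(["Coalizione"], as_index=False).agg({"Voti": "sum"})
```

The dictionary passed to `agg` plays the role of the `ops` mapping used in the YAML configurations further down.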

## Transformation
When a transformation occurs, the DataFrame's shape and content change radically, by:

+ Applying a function on a column
+ Applying a function on records to obtain a new column
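
Both kinds of transformation can be sketched with plain pandas. The column `Quota` and the share computation are assumptions chosen only to have something to compute:

```python
import pandas as pd

votes = pd.DataFrame({"Lista": ["a", "b"], "Voti": [10, 30]})

# 1. A function applied to a single column (here: upper-casing the list name).
upper = votes.assign(Lista=votes["Lista"].map(str.upper))

# 2. A function applied to each record to obtain a new column (here: vote share).
total = votes["Voti"].sum()
with_share = votes.assign(
    Quota=votes.apply(lambda row: row["Voti"] / total, axis=1)
)
```

`assign` returns a new frame in both cases, so the source data is left untouched.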

## Combination

This mode combines inputs from multiple sources and calls a function on each group of matching entries.

The sources can be treated as:
+ Scalars: these are treated as a single entity and passed as-is to each call, as positional arguments following the dataframe.
+ DataFrames: these are pandas dataframes and need a key over which to aggregate.
All entries with the same key(s) are passed to the same function call. Different dataframes are merged together with an inner merge, so the function must accept a single dataframe as its first positional argument.
+ Series: these aren't proper pandas Series; I take this to mean a dataframe where each key identifies a single row. The row is taken as a dict, the key columns are removed, and the rest is passed as kwargs.

### Todo:
Renaming the results isn't implemented yet

In [2]:
import pandas as pd
import random

def rand():
    return random.random()

df = pd.DataFrame({'c':['a','b','c','c','c'], # coalition the list belongs to
                   'l':['a','b','c','d','e'], # list
                   'vl':[10,20,5,7,8]}) # votes
ds = pd.DataFrame({'c':['a','b','c'], # coalition
                   's':[6,10,12]}) # seats

def df_f():
    return df

def ds_f():
    return ds

def tes_fun(*args, **kwargs):
    for i in args:
        print(i)
    for k, v in kwargs.items():
        print(k,':')
        print(v)

```
combine_fun:
  type: combine
  function: tes_fun
  keys:
    - Coalizione
  args:
    - type: dataframe
      source: df_f
      columns:
        - c -> Coalizione
        - l -> Lista
        - vl -> Voti
    - type: scalar
      source: rand
    - type: series
      source: ds_f
      columns:
        - c -> Coalizione
        - s -> Seggi
```
This code must aggregate based on the coalition and

In [3]:
conf = {'combine_fun': {'args': [{'columns': ['c -> Coalizione',
                                       'l -> Lista',
                                       'vl -> Voti'],
                           'source': 'df_f',
                           'type': 'dataframe'},
                          {'source': 'rand', 'type': 'scalar'},
                          {'columns': ['c -> Coalizione', 's -> Seggi'],
                           'source': 'ds_f',
                           'type': 'series'}],
                 'function': 'tes_fun',
                 'keys': ['Coalizione'],
                 'type': 'combine'}}

In [4]:
df1_cols = parse_columns(conf['combine_fun']['args'][0]['columns'])
se1_cols = parse_columns(conf['combine_fun']['args'][2]['columns'])

In [5]:
df1 = eval(conf['combine_fun']['args'][0]['source'])()[df1_cols[0]].rename(columns=df1_cols[1])
sc1 = eval(conf['combine_fun']['args'][1]['source'])()
se1 = eval(conf['combine_fun']['args'][2]['source'])()[se1_cols[0]].rename(columns=se1_cols[1])

In [6]:
dfs = []
scs = []
ses = []

for dic in conf['combine_fun']['args']:
    if dic["type"]=='dataframe':
        cols = parse_columns(dic['columns'])
        dfs.append(eval(dic['source'])()[cols[0]].rename(columns=cols[1]))
    elif dic["type"]=='series':
        cols = parse_columns(dic['columns'])
        ses.append(eval(dic['source'])()[cols[0]].rename(columns=cols[1]))
    else:
        scs.append(eval(dic['source'])())

In [7]:
keys = conf['combine_fun']['keys']
gps = df1.groupby(keys)
f = eval(conf['combine_fun']['function'])
results = []
for g, frame in gps:
    print("Running:",g)
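    # groupby yields a scalar for a single key on older pandas and a tuple on recent ones
    g = list(g) if isinstance(g, tuple) else [g]
    kwargs = {}
    # For each series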
    m = [ se1[keys[i]] == g[i] for i in range(len(g))]
    fil = m[0]
    for i in m[1:]:
        fil = fil & i
    
    kw_1 = dict(se1[fil].iloc[0])
    for i in keys:
        kw_1.pop(i, None)
    kwargs.update(kw_1)
    
    results.append(f(frame, sc1,**kwargs))

Running: a
  Coalizione Lista  Voti
0          a     a    10
0.7431030431472867
Seggi :
6
Running: b
  Coalizione Lista  Voti
1          b     b    20
0.7431030431472867
Seggi :
10
Running: c
  Coalizione Lista  Voti
2          c     c     5
3          c     d     7
4          c     e     8
0.7431030431472867
Seggi :
12


In [8]:
import copy

def parse_dataframe(conf):
    """
    Generate the information needed to obtain a dataframe
    Conf is a dictionary with the following keys:
    
    """

def gen_totals_combine(conf): #conf is the function configuration, not the generic configuration
    """
    Elements of the dictionary
    
    type: 'combine'
    function: the function to apply
    keys: List[str] the keys over which to aggregate/merge dataframes
    args: List[dict], each dict should have the key "type"; if it is missing, the entry defaults to scalar
    """
    conf = copy.deepcopy(conf)
    lis_f = conf.keys()
    def gen_fun(var):
        def give(self, val):
            setattr(self, var, val)
        return (var, give)
    m = map(gen_fun, lis_f)
    fs = {}
    for i, f in m:
        fs[f'give_{i}'] = f
    return fs


Sezione (polling section):
```
lista_cand:
  type: aggregate
  keys:
    - Lista
    - Candidato 
  source:
    fun: self.get_voti_lista_cand
    options: NoArgs
  ops:
    Voti: sum
  col_types:
    Lista: Partito
    Candidato: Candidato

cand:
  type: aggregate
  keys:
    - Candidato 
  source:
    name: self.get_voti_cand
    options: NoArgs
  ops:
    Voti: sum
  col_types:
    Candidato: Candidato
```
NoArgs stops Totals from forwarding arguments, such as thresholds (`sbarramenti`), to the source function.

Uninominale (single-member constituency):
```
candidato:
  type: aggregate
  keys:
    - Candidato
  source:
    fun: self.subs_sez_candidato
  ops:
    Voti: sum
  col_types:
    Candidato: Candidato

lista_cand:
  type: aggregate
  keys:
    - Candidato
    - Lista
  source:
    fun: self.subs_sez_list
  ops:
    Voti: sum
  col_types:
    Lista: Partito
    Candidato: Candidato

lista:
  type: combine
  function: commons.Hondt
  keys: 
    - Candidato
  args:
    - type: dataframe
      source: 
         fun: self.totals
         args:
           - lista_cand
      columns:
        - Lista
        - Candidato
        - Voti
    - redistribuzione
    - type: series
      source: 
         fun: self.totals
         args:
           - candidato
      columns:
        - Candidato
        - Seggi
  columns:
    - Lista
    - Seggi -> Voti
  col_types:
    Lista: Partito

```
In the args list, any entry that is not a dictionary (for example a string or an integer) should be treated as a scalar.
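
One possible way to apply that rule is a small normalisation pass over `args` before the sources are built. `normalise_arg` is a hypothetical helper, and storing the bare value under `source` is an assumption; only the "defaults to scalar" behaviour comes from the notes above.

```python
def normalise_arg(arg):
    """Wrap a bare YAML value (string, int, ...) into a scalar arg dictionary."""
    if isinstance(arg, dict):
        # A dictionary without an explicit "type" also defaults to scalar.
        return {"type": "scalar", **arg}
    # Assumption: the bare value becomes the source of the scalar.
    return {"type": "scalar", "source": arg}

args = [{"type": "dataframe", "source": "df_f"}, "redistribuzione", 3]
normalised = [normalise_arg(a) for a in args]
```

After this pass every entry is a dictionary with a `type`, so the code that splits dataframes, series and scalars does not need a special case for bare values.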

Plurinominale (multi-member constituency):
```
lista:
  type: aggregate
  keys:
    - Lista
  source:
    fun: self.subs_uni_lista
  ops:
    Voti: sum
  col_types:
    Lista: Partito

coalizione:
  type: aggregate
  keys:
    - Coalizione
  source:
    fun: self.totals
    args:
      - coalizione_raw
    columns:
      - Coalizione
      - Voti
  ops:
    Voti: sum
  col_types:
    Coalizione: Coalizione

coalizione_raw:
  type: transform
  source:
    fun: self.totals
    args:
      - lista
    columns:
      - Lista
      - Voti
  apply:
    - type: column
      column: Lista
      col_type: Partito
      attribute: coalizione
      replace: True
  columns:
    - Lista -> Coalizione
    - Voti
  col_types:
    Coalizione: Coalizione
```

types:
+ aggregate
+ transform
+ combine
+ multi-stage

In [20]:
def gen_fun(i):
    def f(*a, **kw):
        return i
    return f

import copy
a = []
attr = [1,2,3,4,5,6, "Ciao"]

for i in attr:
    a.append(gen_fun(i))

a[2]()


3

I can parse `source` with `source_parse`: it returns a function that accepts \*args and \*\*kwargs and returns the value of interest.
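
`source_parse` is not defined in this notebook, so the following is only a guess at its shape. It assumes that a source is either a bare value or a dictionary with `fun`, `args` and `options`, that functions are looked up in a plain `namespace` dictionary (the real configurations use names such as `self.totals`), and that configured arguments come before forwarded ones.

```python
def source_parse(source, namespace):
    """Turn a source description into a callable accepting *args and **kwargs."""
    if not isinstance(source, dict):
        # Bare values are constants: whatever is forwarded gets ignored.
        return lambda *args, **kwargs: source

    fun = namespace[source["fun"]]
    fixed = list(source.get("args", []))
    # Assumption: "NoArgs" means forwarded arguments are dropped.
    forward = source.get("options") != "NoArgs"

    def call(*args, **kwargs):
        if not forward:
            args, kwargs = (), {}
        # Configured arguments first, forwarded ones after.
        return fun(*fixed, *args, **kwargs)

    return call

def totals(name, *thresholds):
    # Stand-in for the real totals method; it just echoes its inputs.
    return (name, thresholds)

ns = {"totals": totals}
result = source_parse({"fun": "totals", "args": ["lista"]}, ns)(0.03)
constant = source_parse(42, ns)(0.03)
no_args = source_parse(
    {"fun": "totals", "args": ["lista"], "options": "NoArgs"}, ns
)(0.03)
```

The constant branch is the same closure trick as `gen_fun` in the previous cell: the returned function ignores its arguments and hands back the captured value.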

In [9]:
import functools
def total_example_aggregate(self, *sbarramenti):
    keys = keys_fun()
    operations = operations_dict() # column_name: str|function accepting a series
    function = func_gen()
    source = function(*sbarramenti)
    return source.groupby(keys).aggregate(operations)
    

def total_example_transformation_fullDF(self, *sbarramenti):
    function = func_gen()
    source = function(*sbarramenti)
    r = source[columns_fun_before()]
    df = r.rename(columns = rename_fun_before())
    op = fun_operation()
    return op(df)  # operate on the renamed frame, not on the pre-rename selection
    
def total_example_transformation_new_column(self, *sbarramenti):
    function = func_gen()
    source = function(*sbarramenti)
    r = source[columns_fun_before()]
    df = r.rename(columns = rename_fun_before())
    ops = fun_operations() #dict str: function
    for c, fun in ops.items():
        df[c] = df.apply(fun, axis=1)
    return df

def total_example_combination(self, *sbarramenti):
    functions_dfs = funcs_gen_df()
    functions_scs = funcs_gen_sc()
    functions_ses = funcs_gen_se()
    s_dfs = [i(*sbarramenti) for i in functions_dfs]
    s_scs = [i(*sbarramenti) for i in functions_scs]
    s_ses = [i(*sbarramenti) for i in functions_ses]
    keys = keys_fun()
    if len(s_dfs)>1:
        frame = functools.reduce(lambda a, b: pd.merge(a,b, on=list(keys)), s_dfs)
    else: 
        frame = s_dfs[0]
    
    func = func_gen()
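    gps = frame.groupby(list(keys))  # group the merged frame, not the global df
    res = []
    for g, group in gps:
        # groupby yields a scalar for a single key on older pandas and a tuple on recent ones
        g = list(g) if isinstance(g, tuple) else [g]
        kwargs = {}
        
        for ser in s_ses:
            # Select the single row of this "series" that matches every key of the group
            m = [ser[keys[i]] == g[i] for i in range(len(g))]
            fil = m[0]
            for i in m[1:]:
                fil = fil & i
            kw_1 = dict(ser[fil].iloc[0])
            for i in keys:
                kw_1.pop(i, None)
            kwargs.update(kw_1)

        res.append(func(group, *s_scs, **kwargs))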
    return pd.concat(res, ignore_index=True)
    

def total_example_multi(self, *sbarramenti):
    d = locals()
    
    

def total_framer(self, *sbarramenti):
    f = which_function()
    r = f(*sbarramenti)
    r = r[columns_fun_after()]
    r = r.rename(columns = rename_fun_after())
    t_cell = column_type() #type of the cells in the first column
    def filter_g(record):
        obj = Hub.get_instance(t_cell, record.iloc[0])
        return obj.filter(geo_loc   = self,
                          total_name = "nome_funz",
                          filters   = sbarramenti, 
                          line      = record, 
                          dataframe = r)
    
    f = r.apply(filter_g, axis=1)
    return r[f]

+ keys_fun: it's a list of strings in the yaml
+ func_gen:  
+ operations_dict
+ 

In [10]:
def tots_syntax_sugar_parse(conf):
    """Totals uses a single function 
    with many subfunctions.

    Manually making exceptions would be 
    cumbersome as it'd involve a lot of:

    - type: fun
      name: totals
      args:
        - actual_name

    So, just for totals I'll add the 
    option to replace the above with:

    - totals: actual_name

    This will be translated by 
    this function
    """
    if not isinstance(conf, dict):
        return conf
    
    if "totals" in conf:
        n = conf['totals']
        del conf['totals']
        conf['type'] = 'fun'
        conf['args'] = \
            [n] + conf.get('args', [])
    return {k:tots_syntax_sugar_parse(v)
              for k, v in conf.items()}

In [11]:
tots_syntax_sugar_parse({'args': [10, {'name': 'ciao', 'type': 'att'}],
 'kwargs': {'a': {'args': ['tanto_per'],
                  'options': 'NoForward',
                  'totals': 'ciao_f'}},
 'name': 'test',
 'type': 'fun'})

{'args': [10, {'name': 'ciao', 'type': 'att'}],
 'kwargs': {'a': {'args': ['ciao_f', 'tanto_per'],
   'options': 'NoForward',
   'type': 'fun'}},
 'name': 'test',
 'type': 'fun'}