# Fast Tutorial 1 - Algorithm Aggregation

In this notebook, I will explain how to use the *embedded_voting* package in the context of epistemic social choice and algorithm aggregation.

Common algorithm aggregation rules (average, median, likelihood maximization) need diversity among the algorithms to work well. However, in the real world, it is not rare to have a large group of highly correlated algorithms, which are trained on the same datasets or share the same structure.

With our method, you can take advantage of the dependencies between the algorithms instead of suffering from them. In this first notebook, I will just explain how to use our method. In the following notebooks, I will compare our method with other methods.
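To make the problem with correlated algorithms concrete, here is a small NumPy sketch. It uses made-up numbers and does not use the package. Counting the correlated group as a single voice is only an illustration of why dependencies matter, not the method implemented by *embedded_voting*.

```python
import numpy as np

# Hypothetical toy data: true quality of 3 candidates (candidate 0 is the best).
truth = np.array([1.0, 0.7, 0.2])

# Five correlated algorithms share the same error; two independent ones are exact.
shared_error = np.array([-0.4, 0.3, 0.0])
correlated = np.tile(truth + shared_error, (5, 1))
independent = np.tile(truth, (2, 1))
toy_scores = np.vstack([correlated, independent])  # shape (7, 3)

# Plain average: the shared error is counted five times.
naive_winner = int(np.argmax(toy_scores.mean(axis=0)))

# Illustration only: count the correlated group as a single voice.
collapsed = np.vstack([correlated[:1], independent])
collapsed_winner = int(np.argmax(collapsed.mean(axis=0)))

print(naive_winner, collapsed_winner)  # 1 0
```

The plain average elects candidate 1 because the group's shared mistake dominates, while the collapsed version recovers candidate 0.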

First of all, you need to import the package:

In [1]:
import embedded_voting as ev

## Generator to simulate algorithm results

Then, if you want to aggregate algorithms' outputs, you need to know the outputs of these algorithms. In this notebook, I will use a score generator that simulates a set of algorithms with dependencies.

In the following cell, I create a set of $35$ algorithms: $25$ in the first group, $7$ in the second group, and $3$ isolated algorithms.

In [2]:
groups_sizes = [25, 7, 1, 1, 1]
features = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]

generator = ev.GroupedMixGenerator(groups_sizes, features)
generator.group_noise = 8
generator.independent_noise = .5

_, scores = generator.sample_scores(n_candidates = 20)
print(scores.shape)

(35, 20)


The last command generates a score matrix that contains the outputs given by the algorithms for 20 inputs. To use this method with your own algorithms, put their results in a matrix of shape $n_{voters} \times n_{candidates}$ (one row per algorithm, one column per input).
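If your algorithms' outputs live in separate arrays, a matrix of that shape can be assembled with NumPy. This is a minimal sketch: the algorithm names and values are hypothetical, and each array is assumed to hold one algorithm's scores for the same inputs, in the same order.

```python
import numpy as np

# Hypothetical outputs: one array per algorithm, one entry per input.
outputs = {
    "algo_a": np.array([0.9, 0.1, 0.4, 0.7]),
    "algo_b": np.array([0.8, 0.2, 0.5, 0.6]),
    "algo_c": np.array([0.3, 0.6, 0.9, 0.2]),
}

# Stack the arrays row by row: rows are algorithms, columns are inputs.
my_scores = np.vstack(list(outputs.values()))
print(my_scores.shape)  # (3, 4)
```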


## Find the best alternative

Now, you can simply **create an *Aggregator* object** with the following line:

In [3]:
aggregator = ev.Aggregator()

The following cell shows how to run an "election":

In [4]:
results = aggregator(scores)

Then you can get the results like this:

In [5]:
print("Ranking :",results.ranking_)
print("Winner :",results.winner_)

Ranking : [4, 19, 11, 17, 14, 5, 0, 7, 8, 3, 1, 6, 12, 10, 2, 9, 18, 13, 15, 16]
Winner : 4


You will probably keep using the same *Aggregator* for other elections with the same algorithms, as in the following cell:

In [6]:
for i in range(10):
    _, scores = generator.sample_scores(20)
    print(f'Winner {i+1} : {aggregator(scores).winner_}')

Winner 1 : 13
Winner 2 : 1
Winner 3 : 2
Winner 4 : 6
Winner 5 : 17
Winner 6 : 17
Winner 7 : 16
Winner 8 : 5
Winner 9 : 15
Winner 10 : 9


During each election, the *Aggregator* saves the scores given by the algorithms to learn more about them. However, it does not compute anything with this new data unless it is asked to.

Every now and then, you can retrain your *Aggregator* on the newest data. We advise doing it often when there is little training data. Once you have run enough elections (typically, when you have shown as many candidates as you have algorithms), you no longer need to do it often.

To train your *Aggregator* on the newest data, do the following:

In [7]:
aggregator.retrain()

<embedded_voting.aggregation.aggregator.Aggregator at 0x26dd16a7cc0>

You can also train it on an election's data just before running that election:

In [8]:
results = aggregator(scores, train=True)

For the first election of your aggregator, you do not need to specify that *train* is **True** because the aggregator always does a training step when it is created.
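The retraining advice above (often at first, rarely once as many candidates as algorithms have been seen) can be turned into a simple schedule. The helper below is hypothetical and not part of the package. In particular, the `late_period` of 10 elections is an arbitrary assumption.

```python
def should_retrain(n_candidates_seen, n_algorithms,
                   elections_since_retrain, late_period=10):
    """Hypothetical retraining schedule (not part of embedded_voting).

    Retrain at every election while fewer candidates than algorithms
    have been seen; afterwards, retrain only every `late_period` elections.
    """
    if n_candidates_seen < n_algorithms:
        return True
    return elections_since_retrain >= late_period


# With 35 algorithms and 20 candidates per election:
print(should_retrain(20, 35, 1))   # True: only 20 candidates seen so far
print(should_retrain(40, 35, 1))   # False: enough data, retrained recently
print(should_retrain(40, 35, 10))  # True: enough data, but 10 elections old
```

The returned flag could then decide whether to pass `train=True` to the aggregator call for that election.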

## Fine-tune the aggregation rule

If you want to go further, you can change some aspects of the aggregation rule.

The first thing that you may want to change is the aggregation rule itself. The default one is *FastNash*, but you can try *FastLog*, *FastSum* or *FastMin*, which can give different results.

We advise using *FastNash*, which shows stronger theoretical and experimental results.

In [9]:
aggregator_log = ev.Aggregator(rule=ev.FastLog())
aggregator_sum = ev.Aggregator(rule=ev.FastSum())
aggregator_min = ev.Aggregator(rule=ev.FastMin())
print("FastNash:",aggregator(scores).ranking_)
print("FastLog:",aggregator_log(scores).ranking_)
print("FastSum:",aggregator_sum(scores).ranking_)
print("FastMin:",aggregator_min(scores).ranking_)

FastNash: [9, 5, 1, 15, 13, 17, 10, 6, 3, 4, 7, 12, 8, 16, 2, 19, 18, 0, 11, 14]
FastLog: [9, 5, 1, 15, 13, 17, 10, 6, 3, 4, 7, 12, 8, 16, 2, 19, 18, 0, 11, 14]
FastSum: [9, 5, 1, 15, 13, 17, 10, 6, 3, 4, 7, 12, 8, 16, 2, 19, 18, 0, 11, 14]
FastMin: [15, 9, 17, 1, 5, 13, 10, 4, 6, 3, 8, 16, 2, 7, 0, 19, 18, 12, 11, 14]


You can also use the average rule:

In [10]:
aggregator_avg = ev.Aggregator(rule=ev.SumScores())
results = aggregator_avg(scores)
print(results.ranking_)

[9, 5, 1, 15, 13, 17, 10, 6, 3, 4, 7, 12, 8, 16, 2, 19, 18, 0, 11, 14]
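For reference, ranking candidates by their average score can be written in a few lines of plain NumPy. This is a sketch on made-up scores that shows what the average rule means. It is not the package's implementation of *SumScores*.

```python
import numpy as np

# Hypothetical scores: 3 algorithms (rows) rating 3 candidates (columns).
small_scores = np.array([
    [0.2, 0.9, 0.5],
    [0.1, 0.8, 0.7],
    [0.6, 0.4, 0.3],
])

# Average score of each candidate over all algorithms: [0.3, 0.7, 0.5].
mean_scores = small_scores.mean(axis=0)

# Candidates sorted from highest to lowest average score.
avg_ranking = np.argsort(-mean_scores, kind="stable")
avg_winner = int(avg_ranking[0])
print(avg_ranking.tolist(), avg_winner)  # [1, 2, 0] 1
```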


You can also change the transformation of scores. The default one is the following:

$$
f(s) = \sqrt{\frac{s}{\lVert s \rVert}}
$$

But you can use any transformation you want, such as the identity function $f(s) = s$. In general, a coherent score transformation will not change the results much.

In [11]:
aggregator_id = ev.Aggregator(rule=ev.FastNash(f=lambda x:x))
print(aggregator_id(scores).ranking_)

[9, 5, 1, 15, 13, 17, 10, 6, 3, 4, 7, 12, 8, 16, 2, 19, 18, 0, 11, 14]
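To see what the default transformation does to a score vector, here is a standalone NumPy sketch of $f(s) = \sqrt{s / \lVert s \rVert}$. It assumes that $s$ is one algorithm's vector of non-negative scores, that the vector is non-zero, and that the norm is Euclidean. The function name is hypothetical and the package's own implementation may differ.

```python
import numpy as np

def normalized_sqrt(s):
    """Sketch of f(s) = sqrt(s / ||s||) for one algorithm's score vector.

    Assumptions: Euclidean norm, non-negative scores, non-zero vector.
    """
    s = np.asarray(s, dtype=float)
    return np.sqrt(s / np.linalg.norm(s))

# ||[3, 4]|| = 5, so s / ||s|| = [0.6, 0.8] before the square root.
transformed = normalized_sqrt([3.0, 4.0])
print(transformed ** 2)
```

A function of this shape (array in, array out) could presumably be passed as `f`, in the same way as the identity lambda in the cell above.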
