# Rewards Allocation Notebook
The goal of this notebook is to offer an easy way to process the outputs of the Praise and Sourcecred reward systems, calculate a reward distribution, and analyze the results. It uses mock data and should be considered a work in progress.

### Basic Setup
First, we import the relevant libraries, get the data, and set how many tokens we want to distribute.

In [25]:
from ipyfilechooser import FileChooser

import pandas as pd 
import numpy as np 
import analytics_toolbox as tb

fc_praise = FileChooser('./exampleFiles')
fc_sourcecred = FileChooser('./exampleFiles')
fc_rewardboard = FileChooser('./exampleFiles')

print("== Please choose the Praise CSV file == ")
display(fc_praise)
print("== Please choose the Sourcecred CSV file == ")
display(fc_sourcecred)
print("== Please choose the Rewardboard address list CSV file == ")
display(fc_rewardboard)


== Please choose the Praise CSV file == 


FileChooser(path='/home/dev/Documents/GitHub/praise_RewardAnalysis/exampleFiles', filename='', title='', show_…

== Please choose the Sourcecred CSV file == 


FileChooser(path='/home/dev/Documents/GitHub/praise_RewardAnalysis/exampleFiles', filename='', title='', show_…

== Please choose the Rewardboard address list CSV file == 


FileChooser(path='/home/dev/Documents/GitHub/praise_RewardAnalysis/exampleFiles', filename='', title='', show_…

Now that we have selected the files, we can import them for processing. We can also set the total amount of tokens we want to distribute this period.
Tip: Now that the file paths are set, you can safely click "Cell > Run All Below" in the menu bar to execute everything from here on :)

In [26]:
PRAISE_DATA_PATH = fc_praise.selected
SOURCECRED_DATA_PATH = fc_sourcecred.selected
REWARD_BOARD_ADDRESSES_PATH = fc_rewardboard.selected
NUMBER_OF_PRAISE_REWARD_TOKENS_TO_DISTRIBUTE = 1000

praise_data = pd.read_csv(PRAISE_DATA_PATH)
sourcecred_data = pd.read_csv(SOURCECRED_DATA_PATH)
rewardboard_addresses = pd.read_csv(REWARD_BOARD_ADDRESSES_PATH)

## Praise reward allocation
This method allocates the praise rewards in a very straightforward way: it adds up the value of all dished praise, and then assigns each user their percentage of the total.

In [27]:
def calc_praise_rewards(praiseData, tokensToDistribute):
    #we discard all we don't need and aggregate all the praise for each single user
    slimData = praiseData[['TO', 'FINAL QUANT']].groupby(['TO']).agg('sum').reset_index()
    totalPraisePoints = slimData['FINAL QUANT'].sum()

    slimData['PERCENTAGE'] = slimData['FINAL QUANT']/totalPraisePoints
    slimData['TOKEN TO RECEIVE'] = slimData['PERCENTAGE'] * tokensToDistribute
    return slimData

praise_distribution = calc_praise_rewards(praise_data, NUMBER_OF_PRAISE_REWARD_TOKENS_TO_DISTRIBUTE)
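
To make the proportional allocation concrete, here is a minimal sketch on toy data. The names and values in `toy_praise` are hypothetical; only the column names `TO` and `FINAL QUANT` are taken from the cell above.

```python
import pandas as pd

# Hypothetical toy data using the same column names as the praise CSV.
toy_praise = pd.DataFrame({
    "TO": ["alice#1234", "bob#5678", "alice#1234", "carol#9012"],
    "FINAL QUANT": [10, 30, 20, 40],
})
tokens_to_distribute = 1000

# Sum the praise per receiver, then turn each sum into a share of the total.
per_user = toy_praise.groupby("TO", as_index=False)["FINAL QUANT"].sum()
per_user["PERCENTAGE"] = per_user["FINAL QUANT"] / per_user["FINAL QUANT"].sum()
per_user["TOKEN TO RECEIVE"] = per_user["PERCENTAGE"] * tokens_to_distribute

print(per_user)
```

With 100 praise points in total, the three toy users receive 300, 300 and 400 tokens, and the percentages add up to 1.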

## Combining the Datasets

Now that we have the praise distribution, we combine it with the Sourcecred data into one table.
Before continuing, we also declare some helper methods for cleaning and preparing the data. We'll use them later.

In [28]:
#General Helper func. Puts all the "processing we probably won't need to do later or do differently" in one place
#  -removes the '#' and following from discord names
#  -Some renaming and dropping 
def prepare_praise(praise_data):
    # n is passed by keyword: recent pandas releases only accept the pattern positionally
    praise_data['TO'] = praise_data['TO'].str.split('#', n=1, expand=False).str[0].str.lower()
    praise_data.rename(columns = {'TO':'IDENTITY'}, inplace = True)
    praise_data = praise_data[['IDENTITY', 'PERCENTAGE', 'TOKEN TO RECEIVE']]
    return praise_data

#General Helper func. Puts all the "processing we probably won't need to do later or do differently" in one place
#  -Some renaming and dropping 
#  -changing percentages from 0 - 100 to 0.00-1.00
def prepare_sourcecred(sourcecred_data):
    sourcecred_data.rename(columns = {'%':'PERCENTAGE'}, inplace = True)
    sourcecred_data['IDENTITY'] = sourcecred_data['IDENTITY'].str.lower()
    sourcecred_data['PERCENTAGE'] = sourcecred_data['PERCENTAGE'] / 100
    sourcecred_data = sourcecred_data[['IDENTITY', 'PERCENTAGE', 'TOKEN TO RECEIVE']]
    return sourcecred_data

Now we are ready to go.

In [29]:
#generates a new table with combined percentages and added token rewards
# ISSUE: We need single ids
def combine_datasets(praise_data, sourcecred_data):
    processed_praise = prepare_praise(praise_data)
    processed_sourcecred = prepare_sourcecred(sourcecred_data)
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    combined_dataset = pd.concat([processed_praise, processed_sourcecred], ignore_index=True)

    combined_dataset = combined_dataset.groupby(['IDENTITY']).agg('sum').reset_index()
    #since we just added two percentages (one per source), we halve the sum
    combined_dataset['PERCENTAGE'] = combined_dataset['PERCENTAGE'] / 2


    return combined_dataset


total_period_rewards = combine_datasets(praise_distribution, sourcecred_data)
print(total_period_rewards)

   IDENTITY  PERCENTAGE  TOKEN TO RECEIVE
0      bot0    0.065301       9232.792196
1      bot1    0.035651        874.768741
2     bot10    0.020120        492.898297
3     bot11    0.037099        447.640907
4     bot12    0.015800       2298.660294
5     bot13    0.002305        461.040198
6     bot14    0.044236       5788.186182
7     bot15    0.024810       3033.389150
8     bot16    0.055361       6738.384601
9     bot17    0.000229         45.722995
10    bot18    0.033679        444.571848
11    bot19    0.054887      10977.329015
12     bot2    0.020940        686.917526
13    bot20    0.017429        544.097681
14    bot21    0.017380        759.012890
15    bot22    0.000972        194.322728
16    bot23    0.006579        827.942152
17    bot24    0.016495        232.914568
18    bot25    0.032583       4629.464603
19    bot26    0.027747       1790.681071
20    bot27    0.003505        701.085921
21    bot28    0.030522       2709.090576
22    bot29    0.003544        708

![title](img/praiseFusion.jpg)
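
The combination step can be sketched on toy data as follows. Both input tables and all identities are hypothetical; the sketch only mirrors the steps described above (normalising identities, stacking the tables, summing per identity, and halving the summed percentages).

```python
import pandas as pd

# Hypothetical toy inputs that mimic the two prepared tables.
praise = pd.DataFrame({
    "IDENTITY": ["Alice#1234", "bob#5678"],
    "PERCENTAGE": [0.75, 0.25],
    "TOKEN TO RECEIVE": [750.0, 250.0],
})
sourcecred = pd.DataFrame({
    "IDENTITY": ["alice", "Carol"],
    "PERCENTAGE": [0.5, 0.5],
    "TOKEN TO RECEIVE": [100.0, 100.0],
})

# Normalise identities: drop the Discord discriminator and lower-case the name.
praise["IDENTITY"] = praise["IDENTITY"].str.split("#", n=1).str[0].str.lower()
sourcecred["IDENTITY"] = sourcecred["IDENTITY"].str.lower()

# Stack both tables, sum per identity, then average the two percentages.
combined = (
    pd.concat([praise, sourcecred], ignore_index=True)
    .groupby("IDENTITY", as_index=False)
    .sum()
)
combined["PERCENTAGE"] = combined["PERCENTAGE"] / 2

print(combined)
```

Only identities that normalise to the same string are merged (here `Alice#1234` and `alice`), which is why the open issue about single ids matters for real data.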

## Results Analysis
Let's dive into some data analysis! We'll use the metrics designed and explained by octopus🐙 for now, starting with:

### Allocation percentages
This table shows which percentage of the total rewards is distributed to which top % of users. So "Top 50% -> 0.85" would mean that the top 50% of praisees received 85% of the total rewards.

In [30]:
p_vals = np.array([50,80,90,95,99])
rewards_rp = np.array([tb.resource_percentage(total_period_rewards["TOKEN TO RECEIVE"], p) for p in p_vals])

my_rd_index = [("Top " + str(100 - p) +"%") for p in p_vals]
resource_distribution = pd.DataFrame({"Rewards": rewards_rp}, index = my_rd_index)
print(resource_distribution)

          Rewards
Top 50%  0.899760
Top 20%  0.675479
Top 10%  0.487188
Top 5%   0.363162
Top 1%   0.163062
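
The implementation of `tb.resource_percentage` is not shown in this notebook. As a rough illustration of the idea only, the following sketch computes the share of the total held by the largest recipients. The function `top_share` and the sample amounts are hypothetical, and the toolbox function may handle percentiles and ties differently.

```python
import math

def top_share(values, top_fraction):
    """Share of the total held by the largest `top_fraction` of entries."""
    ordered = sorted(values, reverse=True)
    # Round up, and always include at least one recipient.
    count = max(1, math.ceil(len(ordered) * top_fraction))
    return sum(ordered[:count]) / sum(ordered)

rewards = [50, 20, 10, 10, 5, 5]  # hypothetical token amounts
half = top_share(rewards, 0.5)
print(half)
```

Here the top half consists of three recipients holding 80 of 100 tokens, so the result is 0.8.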


### Gini coefficient
Next we will look at the Gini coefficient. Note that there is some debate about whether we want to use this metric at all, since it is usually employed to measure wealth distribution rather than compensation.

In [31]:
p_vals = np.array([0, 50, 80])
rewards_gc = np.array([tb.gini_gt_p(np.array(total_period_rewards["TOKEN TO RECEIVE"]), p) for p in p_vals])

my_index = ["All", "Top 50%", "Top 20%"]
gini_coefs = pd.DataFrame({"Rewards": rewards_gc}, index = my_index)
print(gini_coefs)

          Rewards
All      0.629154
Top 50%  0.471294
Top 20%  0.311645
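
For reference, a Gini coefficient can be computed from sorted values with the standard rank-weighted formula. This is a minimal, self-contained sketch; `gini` is a hypothetical helper and not the code behind `tb.gini_gt_p`, which, judging by the row labels, additionally restricts the data to values above a given percentile.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n
    weighted = sum(rank * x for rank, x in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

equal = gini([10, 10, 10, 10])       # everyone receives the same amount
concentrated = gini([0, 0, 0, 40])   # one recipient receives everything
print(equal, concentrated)
```

With this formula, an equal split gives 0, and a single recipient holding everything among four gives 0.75, that is (n - 1) / n.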


### Shannon Entropy

[Shannon Entropy](https://en.wikipedia.org/wiki/Entropy_(information_theory)) is a concept from communications theory, which is also used in measuring the diversity of a distribution. The formula for calculating Shannon Entropy among $n$ individuals is
    $$\sum_{k=1}^n -p_k \log_2(p_k),$$
where $p_k$ represents the proportion of the resource that user $k$ received.

Here we compare the actual Shannon Entropy with the maximum possible for the dataset, which is reached when every user receives an equal share. Keep in mind that a Shannon Entropy of 0 would mean one user holds all the rewards.


In [32]:
entropies_df = pd.DataFrame(data = {"Rewards" : tb.calc_shannon_entropies(total_period_rewards["PERCENTAGE"]) }, index = ["Entropy", "Max Entropy", "% of Max"])
print(entropies_df)

              Rewards
Entropy      5.051209
Max Entropy  5.643856
% of Max     0.894993
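
The formula above translates directly into a few lines of Python. This is a sketch with a hypothetical helper `shannon_entropy` and made-up shares, not the implementation of `tb.calc_shannon_entropies`. The maximum for $n$ users is $\log_2(n)$; the reported maximum of 5.643856 is consistent with $\log_2(50)$.

```python
import math

def shannon_entropy(proportions):
    """Entropy in bits; zero proportions contribute nothing to the sum."""
    return sum(-p * math.log2(p) for p in proportions if p > 0)

shares = [0.5, 0.25, 0.25]            # hypothetical reward proportions
entropy = shannon_entropy(shares)
max_entropy = math.log2(len(shares))  # reached when all shares are equal
ratio = entropy / max_entropy
print(entropy, max_entropy, ratio)
```

For these shares the entropy is 1.5 bits, slightly below the maximum of $\log_2(3)$ for three users.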


### Nakamoto Coefficient
Last but not least, the Nakamoto coefficient. The Nakamoto coefficient is defined as the smallest number of accounts that together control at least 50% of the resource. Although its significance relates to the prospect of a 51% attack on a network, which may not be relevant in our context, we can still use it as an intuitive measure of how many individuals received the majority of a resource.

In [33]:
ak_coef_IH = tb.nakamoto_coeff(total_period_rewards, "PERCENTAGE")
print(ak_coef_IH)

11
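
The definition can be sketched as a cumulative sum over shares sorted from largest to smallest. `nakamoto_coefficient` and the sample shares are hypothetical and only illustrate the definition given above; `tb.nakamoto_coeff` may be implemented differently.

```python
def nakamoto_coefficient(proportions, threshold=0.5):
    """Smallest number of largest holders whose combined share reaches the threshold."""
    running = 0.0
    for count, share in enumerate(sorted(proportions, reverse=True), start=1):
        running += share
        if running >= threshold:
            return count
    # Only reached if the shares never add up to the threshold.
    return len(proportions)

shares = [0.4, 0.3, 0.2, 0.1]  # hypothetical reward proportions
coefficient = nakamoto_coefficient(shares)
print(coefficient)
```

In this toy case the largest holder alone has 40%, so two accounts are needed to reach 50%.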


## Quantifier Data
### Praise by Quantifier
Let's take a closer look at each quantifier. In the following step we will use the raw praise data to zoom in on how each quantifier scored the praises:

In [34]:
def data_by_quantifier(praise_data):
    # collect one long-format frame per quantifier and concatenate them once at the end
    # (DataFrame.append was removed in pandas 2.0)
    frames = []
    praise_data.drop(['DATE', 'TO', 'FROM', 'REASON', 'SERVER', 'CHANNEL', 'CORRECTION ADD', 'CORRECTION SUB', 'CORRECTION COMMENT', 'FINAL QUANT'], axis=1, inplace=True)
    # this assumes every quantifier contributes four columns, plus the shared 'ID' column
    num_of_quants = int((praise_data.shape[1] -1)/ 4)
    for i in range(num_of_quants):
        q_name = 'QUANT_' + str(i+1) + '_ID'
        q_value = 'QUANT_' + str(i+1)
        buf = praise_data[['ID', q_name, q_value]].copy()

        buf.rename(columns={q_name: 'QUANT_ID', q_value: 'QUANT_VALUE', 'ID':'PRAISE_ID'}, inplace=True)
        frames.append(buf)

    quant_only = pd.concat(frames, ignore_index=True)

    columnsTitles = ['QUANT_ID', 'PRAISE_ID', 'QUANT_VALUE']
    quant_only.sort_values(['QUANT_ID', 'PRAISE_ID'], inplace=True)
    quant_only =  quant_only.reindex(columns=columnsTitles).reset_index(drop=True)
    return quant_only

quantifier_table = data_by_quantifier(praise_data.copy())
print(quantifier_table)   



                              QUANT_ID  PRAISE_ID  QUANT_VALUE
0     0x000000000000000aafdb2ef4e870c8       1009           21
1     0x000000000000000aafdb2ef4e870c8       1010          144
2     0x000000000000000aafdb2ef4e870c8       1015          144
3     0x000000000000000aafdb2ef4e870c8       1016            0
4     0x000000000000000aafdb2ef4e870c8       1018           55
...                                ...        ...          ...
4495  0x00000000000000fa7c74e8880bb8d8       2489           13
4496  0x00000000000000fa7c74e8880bb8d8       2490          144
4497  0x00000000000000fa7c74e8880bb8d8       2492           21
4498  0x00000000000000fa7c74e8880bb8d8       2493          144
4499  0x00000000000000fa7c74e8880bb8d8       2498            0

[4500 rows x 3 columns]
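
The reshaping above turns a wide table (one column pair per quantifier) into a long table (one row per quantifier and praise). A minimal sketch on toy data, where the addresses, ids, and values are hypothetical and only two quantifier column pairs are assumed:

```python
import pandas as pd

# Hypothetical wide table: one row per praise, one column pair per quantifier.
wide = pd.DataFrame({
    "ID": [1, 2],
    "QUANT_1_ID": ["0xaaa", "0xbbb"],
    "QUANT_1": [13, 21],
    "QUANT_2_ID": ["0xbbb", "0xaaa"],
    "QUANT_2": [8, 34],
})

# Extract each quantifier's column pair under common column names.
frames = []
for i in (1, 2):
    part = wide[["ID", f"QUANT_{i}_ID", f"QUANT_{i}"]].rename(columns={
        "ID": "PRAISE_ID",
        f"QUANT_{i}_ID": "QUANT_ID",
        f"QUANT_{i}": "QUANT_VALUE",
    })
    frames.append(part)

# Stack the pieces, order by quantifier and praise, and fix the column order.
long_table = (
    pd.concat(frames, ignore_index=True)
    .sort_values(["QUANT_ID", "PRAISE_ID"])
    .reset_index(drop=True)[["QUANT_ID", "PRAISE_ID", "QUANT_VALUE"]]
)

print(long_table)
```

Each praise now appears once per quantifier who scored it, which makes per-quantifier counts and statistics simple group operations.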


### Amount of praise quantified
With the above table we can easily see how much praise each quantifier rated.

In [35]:
# naming the index and the counts explicitly does not depend on how value_counts labels its result
quant_praise_distribution = quantifier_table['QUANT_ID'].value_counts().rename_axis('QUANT_ADDRESS').reset_index(name='NUMBER_OF_PRAISES')
print(quant_praise_distribution)

                      QUANT_ADDRESS  NUMBER_OF_PRAISES
0  0x00000000000000f424831cf1d52bd3                463
1  0x000000000000002e0a02e98b08c1fc                462
2  0x000000000000008006d0ae0c65079b                462
3  0x0000000000000034c5757446d67b52                456
4  0x00000000000000fa7c74e8880bb8d8                453
5  0x00000000000000ac2fc7fc8165773f                451
6  0x00000000000000c60a37f0b254bffb                448
7  0x000000000000000aafdb2ef4e870c8                444
8  0x0000000000000022e76f033edab8d2                434
9  0x000000000000006aa85e33191c824a                427


## Total Praise Export
To send the allocations to the Aragon DAO for distribution, we need to put all data together and add the rewards for the reward board and the quantifiers:

In [36]:
## ToDo
def prepare_export_data(total_period_rewards, quantifier_table, rewardboard_addresses):
    final_allocations = pd.DataFrame()
    return final_allocations

final_token_allocations = prepare_export_data(total_period_rewards, quantifier_table, rewardboard_addresses)