# Exploratory Rewards Allocation Notebook
The goal of this notebook is to visualize how changing different parameters of the reward system affects the outcome, in order to inform decision-makers' choices.

### Basic Setup
First, we import the relevant libraries, load the data, and set how many tokens we want to distribute.

In [1]:
from ipyfilechooser import FileChooser

import pandas as pd 
import numpy as np 
import analytics_toolbox as tb

fc_praise = FileChooser('./exampleFiles')
fc_sourcecred = FileChooser('./exampleFiles')

print("== Please choose the Praise CSV file == ")
display(fc_praise)
print("== Please choose the Sourcecred CSV file == ")
display(fc_sourcecred)


== Please choose the Praise CSV file == 


FileChooser(path='C:\Users\Pablo\Documents\TEC\ownRepo\praise_RewardAnalysis\exampleFiles', filename='', title…

== Please choose the Sourcecred CSV file == 


FileChooser(path='C:\Users\Pablo\Documents\TEC\ownRepo\praise_RewardAnalysis\exampleFiles', filename='', title…

Now that we have selected the files, we can import them for processing. We can also set the total amount of tokens we want to distribute this period.
Tip: Once the file paths are set, you can click "Cell > Run All Below" in the menu bar to execute everything from this point on :)

In [2]:
PRAISE_DATA_PATH = fc_praise.selected
SOURCECRED_DATA_PATH = fc_sourcecred.selected
NUMBER_OF_REWARD_TOKENS_TO_DISTRIBUTE = 1000

praise_data = pd.read_csv(PRAISE_DATA_PATH)
sourcecred_data = pd.read_csv(SOURCECRED_DATA_PATH)

## Praise reward allocation
This method allocates the praise rewards in a very straightforward way: it adds up the value of all praise dished out, and then assigns each user their percentage of the total.

In [3]:
def calc_praise_rewards(praiseData, tokensToDistribute):
    #keep only the columns we need and sum the praise received by each user
    slimData = praiseData[['TO', 'FINAL QUANT']].groupby(['TO']).agg('sum').reset_index()
    totalPraisePoints = slimData['FINAL QUANT'].sum()

    slimData['PERCENTAGE'] = slimData['FINAL QUANT']/totalPraisePoints
    slimData['TOKEN TO RECEIVE'] = slimData['PERCENTAGE'] * tokensToDistribute
    return slimData

praise_distribution = calc_praise_rewards(praise_data, NUMBER_OF_REWARD_TOKENS_TO_DISTRIBUTE)
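
To make the proportional allocation concrete, here is a minimal sketch on a made-up praise table. The names and scores are invented for illustration; only the column names `TO` and `FINAL QUANT` come from the cell above.

```python
import pandas as pd

# Toy praise table (made-up values) with the two columns the function relies on.
toy_praise = pd.DataFrame({
    "TO": ["alice#0001", "bob#0002", "alice#0001", "carol#0003"],
    "FINAL QUANT": [10, 30, 20, 40],
})

# Sum the praise score per recipient.
per_user = toy_praise.groupby("TO", as_index=False)["FINAL QUANT"].sum()

# Each recipient's share of the total, then the matching number of tokens.
per_user["PERCENTAGE"] = per_user["FINAL QUANT"] / per_user["FINAL QUANT"].sum()
per_user["TOKEN TO RECEIVE"] = per_user["PERCENTAGE"] * 1000

print(per_user)
```

Because every share is a fraction of the same total, the `TOKEN TO RECEIVE` column always adds up to the number of tokens being distributed.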

## Combining the Datasets

Now that the praise distribution is done, we combine it with the SourceCred data into one table.
But before continuing, we declare some helper functions for cleaning and preparing the data. We'll use them later.

In [4]:
#General helper function. Collects the preprocessing steps that may later change or become unnecessary in one place
#  -removes the '#' and everything after it from Discord names, and lowercases them
#  -renames 'TO' to 'IDENTITY' and drops the columns we don't need
def prepare_praise(praise_data):
    #n must be passed by keyword in recent pandas versions
    praise_data['TO'] = praise_data['TO'].str.split('#', n=1).str[0].str.lower()
    praise_data = praise_data.rename(columns={'TO': 'IDENTITY'})
    #.copy() returns an independent table, so later .loc assignments don't raise SettingWithCopyWarning
    praise_data = praise_data[['IDENTITY', 'PERCENTAGE', 'TOKEN TO RECEIVE']].copy()
    return praise_data

#General helper function. Collects the preprocessing steps that may later change or become unnecessary in one place
#  -renames '%' to 'PERCENTAGE' and drops the columns we don't need
#  -lowercases the identities
#  -rescales percentages from 0-100 to 0.00-1.00
def prepare_sourcecred(sourcecred_data):
    sourcecred_data = sourcecred_data.rename(columns={'%': 'PERCENTAGE'})
    sourcecred_data['IDENTITY'] = sourcecred_data['IDENTITY'].str.lower()
    sourcecred_data['PERCENTAGE'] = sourcecred_data['PERCENTAGE'] / 100
    #.copy() returns an independent table, so later .loc assignments don't raise SettingWithCopyWarning
    sourcecred_data = sourcecred_data[['IDENTITY', 'PERCENTAGE', 'TOKEN TO RECEIVE']].copy()
    return sourcecred_data
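
As a quick illustration of the name cleanup in `prepare_praise`, the sketch below normalizes a few made-up Discord handles. It assumes handles look like `Name#1234`; entries without a `#` simply pass through lowercased.

```python
import pandas as pd

# Made-up Discord handles; the last one has no discriminator.
names = pd.Series(["Alice#1234", "BOB#0042", "carol"])

# Split once on '#', keep the part before it, and lowercase the result.
normalized = names.str.split("#", n=1).str[0].str.lower()

print(normalized.tolist())
```

Lowercasing both the praise and the SourceCred identities matters because the later `groupby` on `IDENTITY` only merges rows whose names match exactly.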

Now we are ready to go.

In [5]:
#generates a new table with combined percentages and added token rewards
# ISSUE: We need single ids
# weight specifies the weighting of the rewards between praise and sourcecred: 0.5 means 50/50; 0.3 means 30% praise, 70% sourcecred
def combine_datasets(praise_data, sourcecred_data, weight):
    processed_praise = prepare_praise(praise_data)
    processed_sourcecred = prepare_sourcecred(sourcecred_data)
    
    #modify token to receive from praise by weight
    processed_praise.loc[:,'TOKEN TO RECEIVE'] = processed_praise['TOKEN TO RECEIVE'].mul(weight)
    processed_praise.loc[:,'PERCENTAGE'] = processed_praise['PERCENTAGE'].mul(weight)
    
    #modify token to receive from sourcecred by (1 - weight)
    processed_sourcecred.loc[:,'TOKEN TO RECEIVE'] = processed_sourcecred['TOKEN TO RECEIVE'].mul(1-weight)
    processed_sourcecred.loc[:,'PERCENTAGE'] = processed_sourcecred['PERCENTAGE'].mul(1-weight)
    
    #DataFrame.append was removed in pandas 2.0, so we stack the two tables with pd.concat
    combined_dataset = pd.concat([processed_praise, processed_sourcecred], ignore_index=True)

    #sum the weighted values of identities that appear in both tables
    combined_dataset = combined_dataset.groupby(['IDENTITY']).agg('sum').reset_index()
    #no division by 2 is needed here: the two weights already add up to 1


    return combined_dataset

total_praise_sets  = {}
weightings = [0.3, 0.4, 0.5, 0.6, 0.7]
for weight in weightings:
    total_period_praise = combine_datasets(praise_distribution.copy(), sourcecred_data.copy(), weight)
    total_praise_sets[weight] = total_period_praise.copy()
print(total_praise_sets)

{0.3:    IDENTITY  PERCENTAGE  TOKEN TO RECEIVE
0      bot0    0.025793        450.170333
1      bot1    0.035706        713.023731
2     bot10    0.058803       2843.955629
3     bot11    0.007898        387.308430
4     bot12    0.015046        402.887558
5     bot13    0.010816       1081.596237
6     bot14    0.000965         96.520399
7     bot15    0.007125        712.547652
8     bot16    0.027119       1491.362066
9     bot17    0.021320       2131.965285
10    bot18    0.024414       2441.398329
11    bot19    0.008545        854.489415
12     bot2    0.011026        387.625718
13    bot20    0.009206         48.552644
14    bot21    0.005763        576.283559
15    bot22    0.007175        282.598962
16    bot23    0.065906       6201.111172
17    bot24    0.025436       2543.596399
18    bot25    0.016210       1365.224038
19    bot26    0.017601       1760.077865
20    bot27    0.017941       1794.143888
21    bot28    0.055730       4850.352069
22    bot29    0.006671     

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(ilocs[0], value, pi)


![title](img/praiseFusion.jpg)
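
The weighting step in `combine_datasets` can be illustrated on a toy example. The identities and percentages below are invented; the sketch assumes that each source's percentages sum to 1 before weighting.

```python
import pandas as pd

weight = 0.3  # share given to praise; sourcecred gets 1 - weight

# Made-up per-source shares; "bob" appears in both tables.
praise = pd.DataFrame({"IDENTITY": ["alice", "bob"], "PERCENTAGE": [0.6, 0.4]})
sourcecred = pd.DataFrame({"IDENTITY": ["bob", "carol"], "PERCENTAGE": [0.5, 0.5]})

# Scale each source by its weight.
praise["PERCENTAGE"] *= weight
sourcecred["PERCENTAGE"] *= 1 - weight

# Stack both tables and sum the rows that share an identity.
combined = (
    pd.concat([praise, sourcecred], ignore_index=True)
    .groupby("IDENTITY", as_index=False)["PERCENTAGE"]
    .sum()
)
print(combined)
```

Since the two weights add up to 1, the combined percentages still sum to 1 and need no further rescaling.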

## Result Analysis
Let's dive into some data analysis! We'll use the metrics designed and explained by octopus🐙 for now, starting with:

### Allocation percentages
This table shows what share of the total rewards goes to each top percentage of users. So "Top 50% -> 0.85" would mean that the top 50% of praisees received 85% of the total rewards.

In [6]:
p_vals = np.array([50,80,90,95,99])
my_rd_index = [("Top " + str(100 - p) +"%") for p in p_vals]
resource_distribution = pd.DataFrame(index = my_rd_index)

for period_split in total_praise_sets.keys():
    rewards_rp = np.array([tb.resource_percentage(total_praise_sets[period_split]["TOKEN TO RECEIVE"].copy(), p) for p in p_vals])
    resource_distribution[period_split] = rewards_rp

print(resource_distribution)

              0.3       0.4       0.5       0.6       0.7
Top 50%  0.854751  0.853956  0.852848  0.851200  0.848490
Top 20%  0.499782  0.499169  0.498316  0.497046  0.494958
Top 10%  0.312557  0.312244  0.311809  0.311162  0.310097
Top 5%   0.226459  0.226012  0.225390  0.224464  0.222941
Top 1%   0.088209  0.088032  0.087784  0.087417  0.086811
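
The implementation of `tb.resource_percentage` is not shown in this notebook. The sketch below is one plausible reading of the metric, not the toolbox's code: `top_share` is a hypothetical helper, and counting values exactly at the percentile threshold as part of the top group is an assumption.

```python
import numpy as np

def top_share(values, p):
    """Share of the total held by entries at or above the p-th percentile.

    Hypothetical helper for illustration; the real toolbox may treat
    ties and the threshold differently.
    """
    values = np.asarray(values, dtype=float)
    threshold = np.percentile(values, p)
    return values[values >= threshold].sum() / values.sum()

rewards = [1, 2, 3, 4, 10]  # made-up token amounts
print(top_share(rewards, 50))  # share held by the top 50%
print(top_share(rewards, 80))  # share held by the top 20%
```

Note that `p` is the percentile cut-off, so `p = 80` corresponds to the "Top 20%" row, mirroring the `100 - p` labels built in the cell above.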


### Gini coefficient
Next we will look at the Gini coefficient. Note that there is some debate about whether we want to use this metric at all, since it is usually employed to measure wealth distribution rather than compensation.

In [7]:
p_vals = np.array([0, 50, 80])
my_index = ["All", "Top 50%", "Top 20%"]
gini_coefs = pd.DataFrame(index = my_index)

for period_split in total_praise_sets.keys():
    rewards_gc = np.array([tb.gini_gt_p(np.array(total_praise_sets[period_split]["TOKEN TO RECEIVE"]), p) for p in p_vals])
    gini_coefs[period_split] = rewards_gc 


print(gini_coefs)

              0.3       0.4       0.5       0.6       0.7
All      0.486962  0.485953  0.484568  0.482550  0.479278
Top 50%  0.261237  0.260871  0.260361  0.259599  0.258366
Top 20%  0.178236  0.178043  0.177774  0.177372  0.176817
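
For readers unfamiliar with the metric, here is a minimal sketch of a Gini computation. Both functions are hypothetical stand-ins: the notebook does not show how `tb.gini_gt_p` works, so restricting the data to values at or above the p-th percentile is an assumption.

```python
import numpy as np

def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    values = np.sort(np.asarray(values, dtype=float))
    n = values.size
    ranks = np.arange(1, n + 1)
    # Rank-based formula for values sorted in ascending order.
    return 2 * np.sum(ranks * values) / (n * values.sum()) - (n + 1) / n

def gini_above_percentile(values, p):
    """Gini among the entries at or above the p-th percentile (assumed meaning)."""
    values = np.asarray(values, dtype=float)
    return gini(values[values >= np.percentile(values, p)])

print(gini([1, 1, 1, 1]))   # everyone receives the same amount
print(gini([0, 0, 0, 1]))   # one user receives everything
```

With `n` users the maximum value is `(n - 1) / n`, so a single recipient among four users gives 0.75 rather than 1.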


### Shannon Entropy

[Shannon Entropy](https://en.wikipedia.org/wiki/Entropy_(information_theory)) is a concept from information theory that is also used to measure the diversity of a distribution. The formula for calculating Shannon Entropy among $n$ individuals is
$$-\sum_{k=1}^n p_k \log_2(p_k),$$
where $p_k$ represents the proportion of the resource that user $k$ received.

Here we compare the actual Shannon Entropy with the maximum possible for the dataset, keeping in mind that a Shannon Entropy of 0 would mean one user holds all the rewards.


In [8]:
entropies_index = ["Entropy", "Max Entropy", "% of Max"]
entropies_df = pd.DataFrame(index = entropies_index)

for period_split in total_praise_sets.keys():
    entrop_arr =  tb.calc_shannon_entropies(total_praise_sets[period_split]["PERCENTAGE"].copy())
    entropies_df[period_split] = entrop_arr
    
print(entropies_df)

                  0.3       0.4       0.5       0.6       0.7
Entropy      5.224874  5.216653  5.182717  5.122839  5.035168
Max Entropy  5.643856  5.643856  5.643856  5.643856  5.643856
% of Max     0.925763  0.924306  0.918294  0.907684  0.892150
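
The formula above can be sketched in a few lines. `shannon_entropy` is a hypothetical helper, not the code behind `tb.calc_shannon_entropies`; the sketch also assumes the maximum entropy is $\log_2(n)$, reached when all $n$ users receive equal shares. Under that assumption, the "Max Entropy" value of 5.643856 in the table would correspond to $n = 50$ identities.

```python
import numpy as np

def shannon_entropy(shares):
    """Shannon entropy in bits of a vector of non-negative shares."""
    p = np.asarray(shares, dtype=float)
    p = p[p > 0]        # terms with p = 0 contribute nothing
    p = p / p.sum()     # normalize in case the shares don't sum to exactly 1
    return float(-np.sum(p * np.log2(p)))

shares = [0.5, 0.25, 0.25]            # made-up reward shares
entropy = shannon_entropy(shares)
max_entropy = np.log2(len(shares))    # entropy of an equal split

print(entropy, max_entropy, entropy / max_entropy)
```

The ratio in the last column is what the "% of Max" row reports: the closer it is to 1, the more evenly the rewards are spread.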


### Nakamoto Coefficient
Last but not least, the Nakamoto coefficient. It is defined as the smallest number of accounts that together control at least 50% of the resource. Although its significance relates to the prospect of a 51% attack on a network, which may not be relevant in our context, we can still use it as an intuitive measure of how many individuals received the majority of a resource.

In [9]:

for period_split in total_praise_sets.keys():
    ak_coef_IH = tb.nakamoto_coeff(total_praise_sets[period_split].copy(), "PERCENTAGE")
    print(str(period_split) + ": " + str(ak_coef_IH))


0.3: 13
0.4: 12
0.5: 12
0.6: 11
0.7: 10
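
The definition above can be sketched as follows. `nakamoto_coefficient` is a hypothetical helper written from the definition in the text, not the implementation of `tb.nakamoto_coeff`.

```python
import numpy as np

def nakamoto_coefficient(shares, threshold=0.5):
    """Smallest number of accounts whose combined share reaches the threshold."""
    p = np.sort(np.asarray(shares, dtype=float))[::-1]  # largest holders first
    cumulative = np.cumsum(p) / p.sum()
    # argmax returns the first position where the condition holds.
    return int(np.argmax(cumulative >= threshold)) + 1

print(nakamoto_coefficient([0.4, 0.3, 0.2, 0.1]))  # two largest accounts reach 70%
print(nakamoto_coefficient([0.6, 0.4]))            # one account already holds 60%
```

A lower value means the rewards are concentrated in fewer hands, which matches the trend in the output above as the praise weight increases.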
