# Rewards Allocation Notebook
The goal of this notebook is to offer an easy way to process the outputs of the Praise and Sourcecred reward systems, calculate a reward distribution, and analyze the results. It uses mock data and should be considered a work in progress.

### Basic Setup
First, we import the relevant libraries, select the data files, and set how many tokens we want to distribute.

In [1]:
from ipyfilechooser import FileChooser

import pandas as pd 
import numpy as np 
import analytics_toolbox as tb

fc_praise = FileChooser('./exampleFiles')
fc_sourcecred = FileChooser('./exampleFiles')

print("== Please choose the Praise CSV file == ")
display(fc_praise)
print("== Please choose the Sourcecred CSV file == ")
display(fc_sourcecred)


== Please choose the Praise CSV file == 


FileChooser(path='C:\Users\Pablo\Documents\TEC\ownRepo\praise_RewardAnalysis\exampleFiles', filename='', title…

== Please choose the Sourcecred CSV file == 


FileChooser(path='C:\Users\Pablo\Documents\TEC\ownRepo\praise_RewardAnalysis\exampleFiles', filename='', title…

Now that we have selected the files, we can import them for processing. We can also set the total amount of tokens we want to distribute this period.
Tip: Now that the file paths are set, you can safely click "Cell > Run all below" in the menu bar to execute everything from here on :)

In [2]:
PRAISE_DATA_PATH = fc_praise.selected
SOURCECRED_DATA_PATH = fc_sourcecred.selected
NUMBER_OF_REWARD_TOKENS_TO_DISTRIBUTE = 1000

praise_data = pd.read_csv(PRAISE_DATA_PATH)
sourcecred_data = pd.read_csv(SOURCECRED_DATA_PATH)

## Praise reward allocation
This method allocates the praise rewards in a very straightforward way: it adds up the value of all dished praise, and then assigns each user their percentage of the total.

In [3]:
def calc_praise_rewards(praiseData, tokensToDistribute):
    # Discard the columns we don't need and aggregate all the praise for each user
    slimData = praiseData[['TO', 'FINAL QUANT']].groupby(['TO']).agg('sum').reset_index()
    totalPraisePoints = slimData['FINAL QUANT'].sum()

    slimData['PERCENTAGE'] = slimData['FINAL QUANT']/totalPraisePoints
    slimData['TOKEN TO RECEIVE'] = slimData['PERCENTAGE'] * tokensToDistribute
    return slimData

praise_distribution = calc_praise_rewards(praise_data, NUMBER_OF_REWARD_TOKENS_TO_DISTRIBUTE)
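
As a small worked example of the proportional split described above, the sketch below applies the same steps to three made-up praise rows. The user names and values are hypothetical and only the `TO` and `FINAL QUANT` columns are modeled; the real CSV may contain more columns.

```python
import pandas as pd

# Hypothetical mock rows; names and values are invented for illustration.
mock_praise = pd.DataFrame({
    "TO": ["alice#0001", "bob#0002", "alice#0001"],
    "FINAL QUANT": [30.0, 50.0, 20.0],
})

# Sum the praise each user received.
per_user = mock_praise.groupby("TO", as_index=False)["FINAL QUANT"].sum()

# Each user's share of all praise, scaled to a budget of 1000 tokens.
per_user["PERCENTAGE"] = per_user["FINAL QUANT"] / per_user["FINAL QUANT"].sum()
per_user["TOKEN TO RECEIVE"] = per_user["PERCENTAGE"] * 1000

print(per_user)
```

Both users end up with 50 praise points each, so each receives half of the budget, and the token column always sums to the amount being distributed.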

## Combining the Datasets

Now that the praise distribution is done, we combine it with the Sourcecred data into one table.
Before continuing, we also declare some helper methods for cleaning and preparing the data. We'll use them later.

In [4]:
#General Helper func. Puts all the "processing we probably won't need to do later or do differently" in one place
#  -removes the '#' and following from discord names
#  -Some renaming and dropping 
def prepare_praise(praise_data):
    # n must be passed by keyword: recent pandas versions no longer accept it positionally
    praise_data['TO'] = praise_data['TO'].str.split('#', n=1, expand=False).str[0].str.lower()
    praise_data.rename(columns = {'TO':'IDENTITY'}, inplace = True)
    praise_data = praise_data[['IDENTITY', 'PERCENTAGE', 'TOKEN TO RECEIVE']]
    return praise_data

#General Helper func. Puts all the "processing we probably won't need to do later or do differently" in one place
#  -Some renaming and dropping 
#  -changing percentages from 0 - 100 to 0.00-1.00
def prepare_sourcecred(sourcecred_data):
    sourcecred_data.rename(columns = {'%':'PERCENTAGE'}, inplace = True)
    sourcecred_data['IDENTITY'] = sourcecred_data['IDENTITY'].str.lower()
    sourcecred_data['PERCENTAGE'] = sourcecred_data['PERCENTAGE'] / 100
    sourcecred_data = sourcecred_data[['IDENTITY', 'PERCENTAGE', 'TOKEN TO RECEIVE']]
    return sourcecred_data
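
To illustrate the name clean-up performed in `prepare_praise`, here is a minimal sketch on a few made-up Discord handles. The handles are hypothetical; the point is only to show what the split-and-lowercase step produces, including for a name without a `#` suffix.

```python
import pandas as pd

# Hypothetical Discord handles, with and without a '#' discriminator.
names = pd.Series(["Griff#1234", "Mount Manu#0042", "nuggan"])

# Keep everything before the first '#', then lowercase for matching.
normalized = names.str.split("#", n=1).str[0].str.lower()

print(normalized.tolist())
# ['griff', 'mount manu', 'nuggan']
```

Lowercasing both datasets matters because the later `groupby` on `IDENTITY` matches names exactly, so `Griff` and `griff` would otherwise count as two different users.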

Now we are ready to go.

In [5]:
#generates a new table with combined percentages and added token rewards
# ISSUE: We need single ids
def combine_datasets(praise_data, sourcecred_data):
    processed_praise = prepare_praise(praise_data)
    processed_sourcecred = prepare_sourcecred(sourcecred_data)
    # DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent
    combined_dataset = pd.concat([processed_praise, processed_sourcecred], ignore_index=True)

    combined_dataset = combined_dataset.groupby(['IDENTITY']).agg('sum').reset_index()
    # Halve the result, since we just added two percentage columns (Praise and Sourcecred) together
    combined_dataset['PERCENTAGE'] = combined_dataset['PERCENTAGE'] / 2


    return combined_dataset


total_period_praise = combine_datasets(praise_distribution, sourcecred_data)
print(total_period_praise)

            IDENTITY  PERCENTAGE  TOKEN TO RECEIVE
0    divine_comedian    0.073840        147.679325
1       eduardovegap    0.013000          0.038300
2        freedumbs00    0.021500          0.062800
3              griff    0.074500          0.193800
4          iviangita    0.123705        219.449883
5     jessicazartler    0.060000          0.150000
6          juankbell    0.028000          0.054500
7            liviade    0.040000          0.089700
8             markop    0.044500          0.103900
9         mount manu    0.035865         71.729958
10            nuggan    0.044304         88.607595
11             pab🐝🐙    0.097046        194.092827
12           santigs    0.026500          0.050100
13               sem    0.036500          0.079900
14            tamara    0.084000          0.221800
15  vitor marthendal    0.139241        278.481013
16      ygg-anderson    0.029000          0.056700
17          zeptimus    0.028500          0.055700
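
The combination step can be followed by hand on a toy example. The sketch below uses two hypothetical, already prepared tables (invented identities and values) and mirrors the stack, sum, and halve logic of `combine_datasets`.

```python
import pandas as pd

# Hypothetical prepared inputs: each table's PERCENTAGE column sums to 1.
praise = pd.DataFrame({
    "IDENTITY": ["alice", "bob"],
    "PERCENTAGE": [0.75, 0.25],
    "TOKEN TO RECEIVE": [750.0, 250.0],
})
sourcecred = pd.DataFrame({
    "IDENTITY": ["bob", "carol"],
    "PERCENTAGE": [0.5, 0.5],
    "TOKEN TO RECEIVE": [500.0, 500.0],
})

# Stack both tables, then add up the rows that belong to the same identity.
combined = pd.concat([praise, sourcecred], ignore_index=True)
combined = combined.groupby("IDENTITY", as_index=False).sum()

# Two distributions were added, so halve the percentages to sum to 1 again.
combined["PERCENTAGE"] = combined["PERCENTAGE"] / 2

print(combined)
```

Here `bob` appears in both systems, so his rows are merged: his percentage becomes (0.25 + 0.5) / 2 = 0.375 and his tokens add up to 750. Users who appear under different names in the two systems would not be merged, which is the "single ids" issue noted in the code comment.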


![title](img/praiseFusion.jpg)

## Result Analysis
Let's dive into some data analysis! We'll use the metrics designed and explained by octopus🐙 for now, starting with:

### Allocation percentages
This table shows which share of the total rewards goes to which top percentage of users. So "Top 50% -> 0.85" would mean that the top 50% of praisees received 85% of the total rewards.

In [6]:
p_vals = np.array([50,80,90,95,99])
rewards_rp = np.array([tb.resource_percentage(total_period_praise["TOKEN TO RECEIVE"], p) for p in p_vals])

my_rd_index = [("Top " + str(100 - p) +"%") for p in p_vals]
resource_distribution = pd.DataFrame({"Rewards": rewards_rp}, index = my_rd_index)
print(resource_distribution)

          Rewards
Top 50%  0.999409
Top 20%  0.838698
Top 10%  0.497335
Top 5%   0.278148
Top 1%   0.278148
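
The implementation of `tb.resource_percentage` is not shown in this notebook. As a rough sketch of how such a "top share" could be computed, the hypothetical helper `top_share` below sums the rewards of everyone at or above the p-th percentile and divides by the total. This is an assumption for illustration, not necessarily what the toolbox does.

```python
import numpy as np

def top_share(values, p):
    """Share of the total held by entries at or above the p-th percentile.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=float)
    threshold = np.percentile(values, p)
    return values[values >= threshold].sum() / values.sum()

rewards = np.array([10.0, 20.0, 30.0, 40.0])
print(top_share(rewards, 50))  # top 50% hold (30 + 40) / 100 = 0.7
```

With only a handful of users, several percentile cut-offs can select the same set of people, which would explain identical values in neighbouring rows such as "Top 5%" and "Top 1%" in the table above.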


### Gini coefficient
Next we will look at the Gini coefficient. Note that there is some debate about whether we want to use this metric at all, since it is usually employed to measure wealth distribution rather than compensation.

In [7]:
p_vals = np.array([0, 50, 80])
rewards_gc = np.array([tb.gini_gt_p(np.array(total_period_praise["TOKEN TO RECEIVE"]), p) for p in p_vals])

my_index = ["All", "Top 50%", "Top 20%"]
gini_coefs = pd.DataFrame({"Rewards": rewards_gc}, index = my_index)
print(gini_coefs)

          Rewards
All      0.747470
Top 50%  0.496318
Top 20%  0.124378
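
For readers unfamiliar with the metric, here is a minimal sketch of one standard way to compute a Gini coefficient from a list of rewards. The function `gini` is a hypothetical helper; `tb.gini_gt_p` may use a different formula, and presumably also restricts the calculation to users above a given percentile.

```python
import numpy as np

def gini(values):
    """Gini coefficient via the sorted-rank formula (hypothetical helper).

    0 means everyone receives the same amount; values close to 1 mean
    the rewards are concentrated in very few hands.
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * x.sum()) - (n + 1) / n

print(gini([1, 1, 1, 1]))  # perfectly equal distribution
print(gini([0, 0, 0, 1]))  # one user holds everything
```

For a sample of size n this formula tops out at (n - 1) / n rather than 1, so the second call yields 0.75 for four users.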


### Shannon Entropy

[Shannon Entropy](https://en.wikipedia.org/wiki/Entropy_(information_theory)) is a concept from communications theory, which is also used in measuring the diversity of a distribution. The formula for calculating Shannon Entropy among $n$ individuals is
    $$\sum_{k=1}^n -p_k \log_2(p_k),$$
where $p_k$ represents the proportion of the resource that user $k$ received.

Here we compare the actual Shannon Entropy with the maximum possible for the dataset, $\log_2(n)$, which is reached when every user receives an equal share. Keep in mind that a Shannon Entropy of 0 would mean one user holds all the rewards.


In [8]:
entropies_df = pd.DataFrame(data = {"Rewards" : tb.calc_shannon_entropies(total_period_praise["PERCENTAGE"]) }, index = ["Entropy", "Max Entropy", "% of Max"])
print(entropies_df)

              Rewards
Entropy      3.905518
Max Entropy  4.169925
% of Max     0.936592
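
The formula above can be checked by hand on a tiny distribution. The sketch below uses a hypothetical helper `shannon_entropy` (not the `tb.calc_shannon_entropies` implementation, which is not shown here) and compares the result with the maximum $\log_2(n)$. Note that the "Max Entropy" of 4.169925 in the table equals $\log_2(18)$, matching the 18 identities in the combined dataset.

```python
import numpy as np

def shannon_entropy(shares):
    """Shannon entropy in bits of a reward distribution (hypothetical helper)."""
    p = np.asarray(shares, dtype=float)
    p = p / p.sum()   # normalize so the shares sum to 1
    p = p[p > 0]      # a share of 0 contributes nothing to the sum
    return float(-np.sum(p * np.log2(p)))

shares = [0.5, 0.25, 0.25]
entropy = shannon_entropy(shares)      # 0.5*1 + 0.25*2 + 0.25*2 = 1.5 bits
max_entropy = np.log2(len(shares))     # reached when all shares are equal

print(entropy, max_entropy, entropy / max_entropy)
```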


### Nakamoto Coefficient
Last but not least, the Nakamoto coefficient. The Nakamoto coefficient is defined as the smallest number of accounts that together control at least 50% of the resource. Although its significance relates to the prospect of a 51% attack on a network, which may not be relevant in our context, we can still use it as an intuitive measure of how many individuals received the majority of a resource.

In [9]:
ak_coef_IH = tb.nakamoto_coeff(total_period_praise, "PERCENTAGE")
print(ak_coef_IH)

5
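
As a sketch of the definition given above, the hypothetical helper `nakamoto` below sorts the shares from largest to smallest and counts how many are needed to reach the threshold. Whether `tb.nakamoto_coeff` treats exactly 50% as sufficient is an assumption here; this version follows the "at least 50%" wording.

```python
import numpy as np

def nakamoto(shares, threshold=0.5):
    """Smallest number of accounts whose combined share reaches the threshold.

    Hypothetical helper for illustration only.
    """
    p = np.sort(np.asarray(shares, dtype=float))[::-1]  # largest first
    cumulative = np.cumsum(p) / p.sum()
    # argmax returns the index of the first True entry
    return int(np.argmax(cumulative >= threshold) + 1)

print(nakamoto([0.4, 0.3, 0.2, 0.1]))  # 0.4 alone is not enough, 0.4 + 0.3 is
```

In this toy distribution the two largest accounts are needed to cross the halfway mark, so the coefficient is 2.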
