# Rewards Allocation Notebook
The goal of this notebook is to offer an easy way to process the outputs of the praise and sourcecred reward systems, analyze the results, and calculate the token reward distribution. It uses mock data and should be considered a work in progress.

### Basic Setup
First, we import the relevant libraries, load the data, and set how many tokens we want to distribute.

In [None]:
from ipyfilechooser import FileChooser

import pandas as pd 
import numpy as np 
import analytics_toolbox as tb

import holoviews as hv
from holoviews import opts
import plotly.graph_objects as go
import plotly.express as px

fc_praise = FileChooser('./exampleFiles')
fc_sourcecred = FileChooser('./exampleFiles')
fc_rewardboard = FileChooser('./exampleFiles')

print("== Please choose the Praise CSV file == ")
display(fc_praise)
print("== Please choose the Sourcecred CSV file == ")
display(fc_sourcecred)
print("== Please choose the Rewardboard address list CSV file == ")
display(fc_rewardboard)


Now that we have selected the files, we can import them for processing. We can also set the total amount of tokens we want to distribute this period. We also set the name we want our output file to have.
Tip: Now that the file paths are set, you can safely click "Cell > Run All Below" in the menu bar to execute everything from here on :)

In [None]:
PRAISE_DATA_PATH = fc_praise.selected
SOURCECRED_DATA_PATH = fc_sourcecred.selected
REWARD_BOARD_ADDRESSES_PATH = fc_rewardboard.selected

praise_data = pd.read_csv(PRAISE_DATA_PATH)
sourcecred_data = pd.read_csv(SOURCECRED_DATA_PATH)
rewardboard_addresses = pd.read_csv(REWARD_BOARD_ADDRESSES_PATH)

In [None]:

NUMBER_OF_PRAISE_REWARD_TOKENS_TO_DISTRIBUTE = 1950
#Right now sourcecred rewards are calculated externally and already specified in the input. This may change in the future.
NUMBER_OF_SOURCECRED_REWARD_TOKENS_TO_DISTRIBUTE = 1950
NUMBER_OF_REWARD_TOKENS_FOR_QUANTIFIERS = 1000
NUMBER_OF_REWARD_TOKENS_FOR_REWARD_BOARD = 100

OUTPUT_FILENAME = "rewards_01"


In [None]:
hv.extension('bokeh')

## Reward allocation

### Praise

This method allocates the praise rewards in a very straightforward way: it adds up the value of all dished praise, and then assigns each praise item its % of the total.

In [None]:
def calc_praise_rewards(praiseData, tokensToDistribute):
    #we discard all we don't need and calculate the % worth of each praise
    slimData = praiseData[['FROM', 'TO', 'FINAL QUANT']].copy()
    totalPraisePoints = slimData['FINAL QUANT'].sum()

    slimData['PERCENTAGE'] = slimData['FINAL QUANT']/totalPraisePoints
    slimData['TOKEN TO RECEIVE'] = slimData['PERCENTAGE'] * tokensToDistribute
    return slimData

praise_distribution = calc_praise_rewards(praise_data.copy(), NUMBER_OF_PRAISE_REWARD_TOKENS_TO_DISTRIBUTE)
#praise_distribution.style
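
As a quick sanity check of the proportional rule, here is a minimal plain-Python sketch with made-up scores. The names `allocate_proportionally` and `toy_scores` are illustrative only and are not part of the notebook; the notebook itself applies the same arithmetic per praise row with pandas.

```python
def allocate_proportionally(scores, tokens_to_distribute):
    """Split a token budget in proportion to each score (illustrative helper)."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("cannot allocate: all scores are zero")
    return {name: tokens_to_distribute * score / total for name, score in scores.items()}

# Made-up praise scores, not taken from the real data.
toy_scores = {"alice": 30, "bob": 50, "carol": 20}
toy_allocation = allocate_proportionally(toy_scores, 1950)
print(toy_allocation)
```

Whatever the scores are, the allocated amounts should add up to the token budget, which makes the sum a cheap check after each run.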


### SourceCred
We do the same procedure, but with the sourcecred data.

In [None]:
def calc_sourcecred_rewards(sourcecredData, tokensToDistribute):
    #we discard all we don't need and calculate each identity's % share of the total
    slimData = sourcecredData[['IDENTITY', 'AMOUNT']].copy()
    totalGrainPoints = slimData['AMOUNT'].sum()

    slimData['PERCENTAGE'] = slimData['AMOUNT']/totalGrainPoints
    slimData['TOKEN TO RECEIVE'] = slimData['PERCENTAGE'] * tokensToDistribute
    return slimData

sourcecred_distribution = calc_sourcecred_rewards(sourcecred_data.copy(), NUMBER_OF_SOURCECRED_REWARD_TOKENS_TO_DISTRIBUTE)
sourcecred_distribution.style

## Preparing and combining the Datasets

Now that we have the distributions done, we can combine them into one table.
But before that, we need to prepare the data and clean it a bit. We also use the chance to generate a table which shows us how much praise each user received. We'll use it later in our analysis.

In [None]:
#General Helper func. Puts all the "processing we probably won't need to do later or do differently" in one place
#  -removes the '#' and following from discord names
#  -Some renaming and dropping 
def prepare_praise(praise_data):
    #pass n by keyword: newer pandas versions no longer accept it positionally
    praise_data['TO'] = (praise_data['TO'].str.split('#', n=1, expand=False).str[0]).str.lower()
    praise_data.rename(columns = {'TO':'IDENTITY'}, inplace = True)
    processed_praise = praise_data[['IDENTITY', 'PERCENTAGE', 'TOKEN TO RECEIVE']]
    praise_by_user = praise_data[['IDENTITY', 'FINAL QUANT', 'PERCENTAGE', 'TOKEN TO RECEIVE']].groupby(['IDENTITY']).agg('sum').reset_index()
    return processed_praise, praise_by_user

#General Helper func. Puts all the "processing we probably won't need to do later or do differently" in one place
#  -Some renaming and dropping 
#  -changing percentages from 0 - 100 to 0.00-1.00
def prepare_sourcecred(sourcecred_data):
    sourcecred_data.rename(columns = {'%':'PERCENTAGE'}, inplace = True)
    sourcecred_data['IDENTITY'] = sourcecred_data['IDENTITY'].str.lower()
    sourcecred_data['PERCENTAGE'] = sourcecred_data['PERCENTAGE'] / 100
    sourcecred_data = sourcecred_data[['IDENTITY', 'PERCENTAGE', 'TOKEN TO RECEIVE']]
    return sourcecred_data


processed_praise, praise_by_user = prepare_praise(praise_distribution.copy())
processed_sourcecred = prepare_sourcecred(sourcecred_data.copy())
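
The name cleanup is what lets praise and sourcecred rows be matched on `IDENTITY` later. A minimal plain-Python sketch of the same rule follows; `normalize_identity` is a hypothetical helper and the handles are made up.

```python
def normalize_identity(raw_name):
    """Drop a Discord-style '#1234' suffix and lowercase the rest (illustrative helper)."""
    return raw_name.split("#", 1)[0].lower()

# Made-up handles, used only to show the effect of the cleanup.
print(normalize_identity("Alice#1234"))  # alice
print(normalize_identity("bob"))         # bob
```

Names that differ in anything other than case or the `#` suffix would still end up as separate identities, so the inputs are assumed to use consistent handles.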


Let's also create a table which will let us focus on the quantifiers. It will show us what value each quantifier gave to each single praise item.

In [None]:
def data_by_quantifier(praise_data):
    praise_data.drop(['DATE', 'TO', 'FROM', 'REASON', 'SERVER', 'CHANNEL', 'CORRECTION ADD', 'CORRECTION SUB', 'CORRECTION COMMENT', 'FINAL QUANT'], axis=1, inplace=True)
    num_of_quants = int((praise_data.shape[1] -1)/ 4)
    quant_frames = []
    for i in range(num_of_quants):
        q_name = 'QUANT_' + str(i+1) + '_ID'
        q_value = 'QUANT_' + str(i+1)
        buf = praise_data[['ID', q_name , q_value ]].copy()
    
        buf.rename(columns={q_name: 'QUANT_ID', q_value: 'QUANT_VALUE', 'ID':'PRAISE_ID'}, inplace=True)
        quant_frames.append(buf)

    #DataFrame.append was removed in pandas 2.0, so we collect the pieces and concatenate once
    quant_only = pd.concat(quant_frames, ignore_index=True)

    columnsTitles = ['QUANT_ID', 'PRAISE_ID', 'QUANT_VALUE']
    quant_only.sort_values(['QUANT_ID', 'PRAISE_ID'], inplace=True)
    quant_only =  quant_only.reindex(columns=columnsTitles).reset_index(drop=True)
    return quant_only

quantifier_rating_table = data_by_quantifier(praise_data.copy())
#quantifier_rating_table.style



Now we are ready to go.

In [None]:
#generates a new table with combined percentages and token rewards
def combine_datasets(praise_data, sourcecred_data):
    #use the function arguments rather than the globals; pd.concat replaces DataFrame.append (removed in pandas 2.0)
    combined_dataset = pd.concat([praise_data, sourcecred_data], ignore_index=True)
    combined_dataset = combined_dataset.groupby(['IDENTITY']).agg('sum').reset_index()

    #since we just added two percentages together
    combined_dataset['PERCENTAGE'] = combined_dataset['PERCENTAGE'] / 2

    return combined_dataset


#To Do: Sort this output
total_period_rewards = combine_datasets(processed_praise.copy(), processed_sourcecred.copy())
total_period_rewards.style
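
To make the merge step concrete, here is a small sketch on made-up rows, assuming both tables share the columns `IDENTITY`, `PERCENTAGE` and `TOKEN TO RECEIVE`. The `toy_*` names are illustrative and not part of the notebook.

```python
import pandas as pd

# Made-up rows: "alice" appears in both sources, "bob" and "carol" in one each.
toy_praise = pd.DataFrame({
    "IDENTITY": ["alice", "bob"],
    "PERCENTAGE": [0.6, 0.4],
    "TOKEN TO RECEIVE": [60.0, 40.0],
})
toy_sourcecred = pd.DataFrame({
    "IDENTITY": ["alice", "carol"],
    "PERCENTAGE": [0.5, 0.5],
    "TOKEN TO RECEIVE": [50.0, 50.0],
})

# Stack both tables, then sum the rows that belong to the same identity.
toy_combined = (
    pd.concat([toy_praise, toy_sourcecred], ignore_index=True)
    .groupby("IDENTITY", as_index=False)
    .sum()
)
# Each source's percentages sum to 1, so halve them to get a share of the combined total.
toy_combined["PERCENTAGE"] = toy_combined["PERCENTAGE"] / 2
print(toy_combined)
```

Halving gives each source equal weight, which matches the token split only when both budgets are equal, as with the 1950/1950 defaults set earlier.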

![title](img/praiseFusion.jpg)

# Results Analysis
Let's dive into some data analysis! We'll use the metrics designed and explained by octopus🐙 for now, starting with:

### Allocation percentages
This table shows which percentage of the total rewards gets distributed to which top % of users. So "Top 50% -> 0.85" would mean that the top 50% of praisees received 85% of the total rewards.

In [None]:
p_vals = np.array([50,80,90,95,99])
rewards_rp = np.array([tb.resource_percentage(total_period_rewards["TOKEN TO RECEIVE"], p) for p in p_vals])

my_rd_index = [("Top " + str(100 - p) +"%") for p in p_vals]
resource_distribution = pd.DataFrame({"Rewards": rewards_rp}, index = my_rd_index)
resource_distribution
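
For intuition, the kind of figure in this table can be reproduced by hand. The sketch below is not the `analytics_toolbox` implementation, whose exact percentile handling is not shown here; `top_share` is a hypothetical helper that simply sums the largest entries.

```python
import math

def top_share(values, top_fraction):
    """Share of the total held by the largest `top_fraction` of entries (illustrative)."""
    ordered = sorted(values, reverse=True)
    # Round up so that at least one entry is always counted.
    n_top = max(1, math.ceil(len(ordered) * top_fraction))
    return sum(ordered[:n_top]) / sum(ordered)

toy_rewards = [50, 20, 10, 10, 5, 5]  # made-up token amounts
print(top_share(toy_rewards, 0.5))    # the top 3 of 6 users hold 80 of 100 tokens
```

Results from the toolbox may differ slightly for small datasets, depending on how it rounds the percentile cut-off.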

### Gini coefficient
Next we will look at the Gini coefficient. Note that there is some debate if we want to use this metric at all, since it is usually employed to measure wealth distribution, and not compensation.

In [None]:
p_vals = np.array([0, 50, 80])
rewards_gc = np.array([tb.gini_gt_p(np.array(total_period_rewards["TOKEN TO RECEIVE"]), p) for p in p_vals])

my_index = ["All", "Top 50%", "Top 20%"]
gini_coefs = pd.DataFrame({"Rewards": rewards_gc}, index = my_index)
gini_coefs
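
As a reference point for reading these numbers, here is a minimal sketch of the Gini coefficient over a whole list of values, using the standard rank-weighted formula. `gini` is an illustrative helper, not the toolbox function; judging by the labels, `tb.gini_gt_p` additionally restricts the data to users above a percentile, which this sketch does not do.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly equal (illustrative)."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n, with ranks i = 1..n
    weighted = sum(rank * x for rank, x in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

print(gini([10, 10, 10, 10]))  # everyone receives the same amount
print(gini([0, 0, 0, 40]))     # one user receives everything
```

With `n` users the largest possible value is `(n - 1) / n`, so small groups can never reach exactly 1.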

### Shannon Entropy

[Shannon Entropy](https://en.wikipedia.org/wiki/Entropy_(information_theory)) is a concept from communications theory, which is also used in measuring the diversity of a distribution. The formula for calculating Shannon Entropy among $n$ individuals is
    $$\sum_{k=1}^n -p_k \log_2(p_k),$$
where $p_k$ represents the proportion of the resource that user $k$ received.

Here we compare the actual Shannon Entropy with the maximum possible for the dataset, $\log_2(n)$, keeping in mind that a Shannon Entropy of 0 would mean one user holds all the rewards.


In [None]:
entropies_df = pd.DataFrame(data = {"Rewards" : tb.calc_shannon_entropies(total_period_rewards["PERCENTAGE"]) }, index = ["Entropy", "Max Entropy", "% of Max"])
entropies_df
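
The three rows of this table can be reproduced with a few lines of plain Python. This is a sketch of the formula above, not the toolbox code; `shannon_entropy` and `entropy_summary` are illustrative names, and reading "% of Max" as the ratio of the two values is an assumption.

```python
import math

def shannon_entropy(shares):
    """Entropy in bits of shares that sum to 1; zero shares contribute nothing."""
    return sum(-p * math.log2(p) for p in shares if p > 0)

def entropy_summary(shares):
    """Return (entropy, maximum entropy, entropy as a fraction of the maximum)."""
    entropy = shannon_entropy(shares)
    max_entropy = math.log2(len(shares))  # reached when every share equals 1/n
    ratio = entropy / max_entropy if max_entropy > 0 else 0.0
    return entropy, max_entropy, ratio

print(entropy_summary([0.25, 0.25, 0.25, 0.25]))  # evenly spread rewards
print(entropy_summary([1.0, 0.0, 0.0, 0.0]))      # one user holds everything
```

The closer the ratio is to 1, the more evenly the rewards are spread across users.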

### Nakamoto Coefficient
Last but not least, the Nakamoto coefficient. The Nakamoto coefficient is defined as the smallest number of accounts that together control at least 50% of the resource. Although its significance relates to the prospect of a 51% attack on a network, which may not be relevant in our context, we can still use it as an intuitive measure of how many individuals received the majority of a resource.

In [None]:
ak_coef_IH = tb.nakamoto_coeff(total_period_rewards, "PERCENTAGE")
ak_coef_IH
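
The definition above translates directly into a short loop. This sketch assumes the "at least 50%" reading given in the text; `nakamoto_coefficient` is an illustrative helper, not the toolbox function, which may treat the boundary differently.

```python
def nakamoto_coefficient(shares, threshold=0.5):
    """Smallest number of holders whose combined share reaches `threshold` (illustrative)."""
    total = sum(shares)
    if total <= 0:
        raise ValueError("need at least one positive share")
    running = 0.0
    # Walk from the largest holder down until the running share reaches the threshold.
    for count, share in enumerate(sorted(shares, reverse=True), start=1):
        running += share / total
        if running >= threshold:
            return count
    return len(shares)

print(nakamoto_coefficient([60, 25, 15]))      # the largest holder alone has 60%
print(nakamoto_coefficient([30, 30, 20, 20]))  # two holders are needed to reach 50%
```

Because the shares are normalized inside the function, it accepts either raw token amounts or percentages.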

## Praise Data

### Rating distribution
Since praise gets valued on a scale, we can take a look at how often each value of the scale gets assigned by quantifiers.

In [None]:
freq = quantifier_rating_table[['QUANT_VALUE']].value_counts().rename_axis('QUANT_VALUE').reset_index(name='counts').sort_values(by=['QUANT_VALUE'])
freq['QUANT_VALUE'] = freq['QUANT_VALUE'].astype('string')

fig_freq = px.bar(freq, x="QUANT_VALUE", y="counts", labels={"QUANT_VALUE": "Rating","counts": "Number of appearances"}, title="Praise Rating Distribution", width=800, height=300)
fig_freq.show()


### Praise Reward Distribution

We can also take a look at the distribution of the received praise rewards.

In [None]:
pr_distribution = praise_by_user[['IDENTITY', 'PERCENTAGE']].sort_values(by=['PERCENTAGE'], ascending=False)

fig_pr_distribution = px.bar(pr_distribution, x='IDENTITY', y='PERCENTAGE', labels={"IDENTITY": "User","PERCENTAGE": "% of total"}, title="Praise Reward Distribution")
fig_pr_distribution.update_xaxes(showticklabels=False)

fig_pr_distribution.show()


### Praise Flows

Now for something more fun: let's surface the top "praise flows" from the data. Thanks to @inventandchill for this awesome visualization! 
On one side we have the top 20 praise givers separately (modifiable by changing the variable `NUMBER_OF_SENDERS_FLOW`), on the other the top 25 receivers (modifiable by changing the variable `NUMBER_OF_RECEIVERS_FLOW`). The people outside the selection get aggregated into the "REST FROM" and "REST TO" categories.

In [None]:
NUMBER_OF_SENDERS_FLOW = 20 #The left side, the praise senders. X largest ones + one bucket for the rest 
NUMBER_OF_RECEIVERS_FLOW = 25 #The right side, the praise receivers. X largest ones + one bucket for the rest 
praise_flow = tb.prepare_praise_flow(praise_distribution.copy(), n_senders=NUMBER_OF_SENDERS_FLOW, n_receivers=NUMBER_OF_RECEIVERS_FLOW)
#praise_flow
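
The "top n plus one bucket for the rest" idea can be sketched in plain Python. This is only an illustration of the bucketing described above, with made-up totals; `bucket_top_n` is a hypothetical helper, and `tb.prepare_praise_flow` may be implemented differently.

```python
from collections import Counter

def bucket_top_n(totals, n, rest_label):
    """Keep the n largest entries and fold everything else into one bucket (illustrative)."""
    ranked = Counter(totals).most_common()  # (name, value) pairs, largest first
    bucketed = dict(ranked[:n])
    rest = sum(value for _, value in ranked[n:])
    if rest:
        bucketed[rest_label] = rest
    return bucketed

toy_sent = {"alice": 40, "bob": 25, "carol": 10, "dave": 5}  # made-up praise totals
print(bucket_top_n(toy_sent, n=2, rest_label="REST FROM"))
```

Bucketing keeps the diagram readable while preserving the total amount of praise shown.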

In [None]:
%%opts Sankey (cmap='Category10' edge_color='FROM' edge_line_width=0 node_alpha=1.0)
%%opts Sankey [node_sort=False label_position='outer' bgcolor="snow" node_width=40 node_sort=True ]
%%opts Sankey [width=1000 height=800 title="Praise flow for Batch 1. Sum of Praise. Left - praise sender. Right - praise receiver"]
%%opts Sankey [margin=0 padding=0 show_values=True]

hv.Sankey(praise_flow, kdims=["FROM", "TO"], vdims=["FINAL QUANT"])

## SourceCred Data

### SourceCred token Distribution

Next we can see the distribution made by the SourceCred algorithm.

In [None]:

sc_distribution = processed_sourcecred[['IDENTITY', 'PERCENTAGE']].sort_values(by=['PERCENTAGE'], ascending=False)

fig_sc_distribution = px.bar(sc_distribution, x='IDENTITY', y='PERCENTAGE', labels={"IDENTITY": "User","PERCENTAGE": "% of total"}, title="SourceCred Distribution")
fig_sc_distribution.update_xaxes(showticklabels=False)

fig_sc_distribution.show()


## Quantifier Data
Let's take a closer look at each quantifier. In the following step we will use the raw praise data to zoom in on how each quantifier scored the praises:

### Amount of praise quantified
Using the quantifier rating table we built earlier (`quantifier_rating_table`), we can easily see how many praise items each quantifier rated.

In [None]:
#name the index and the counts explicitly so the column names do not depend on the pandas version
quant_praise_distribution = quantifier_rating_table['QUANT_ID'].value_counts().rename_axis('QUANT_ADDRESS').reset_index(name='NUMBER_OF_PRAISES')
#quant_praise_distribution

In [None]:
fig = px.pie(quant_praise_distribution, values='NUMBER_OF_PRAISES', names='QUANT_ADDRESS', title='Amount of praise per quantifier', labels={'NUMBER_OF_PRAISES': 'Praises quant'})
fig.update_layout(showlegend=False)
fig.show()

## Total Token Distribution Visualization and Export
To send the allocations to the DAO for distribution, we need to put all data together and add the rewards for the reward board and the quantifiers. 

First, we will calculate the rewards for the Quantifiers and the Reward Board. This is fairly straightforward, since we distribute the allocated tokens equally.


In [None]:
quantifier_rewards = pd.DataFrame(quant_praise_distribution['QUANT_ADDRESS'].copy())
quantifier_rewards['TOKEN TO RECEIVE'] = NUMBER_OF_REWARD_TOKENS_FOR_QUANTIFIERS / len(quantifier_rewards.index)
#quantifier_rewards

In [None]:
rewardboard_rewards = pd.DataFrame(rewardboard_addresses)
rewardboard_rewards['TOKEN TO RECEIVE'] = NUMBER_OF_REWARD_TOKENS_FOR_REWARD_BOARD / len(rewardboard_rewards.index)
#rewardboard_rewards

Now we can merge them all into one table and save it, ready for distribution!

In [None]:
def prepare_export_data(total_period_rewards, quantifier_rewards, rewardboard_rewards):
    #rename on copies so the caller's tables are left untouched
    quantifier_rewards = quantifier_rewards.rename(columns = {'QUANT_ADDRESS':'IDENTITY'})
    rewardboard_rewards = rewardboard_rewards.rename(columns = {'ID':'IDENTITY'})
    final_allocations = total_period_rewards[['IDENTITY', 'TOKEN TO RECEIVE']]
    #pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    final_allocations = pd.concat([final_allocations, quantifier_rewards, rewardboard_rewards], ignore_index=True)
    #users who appear in several tables get their rewards summed
    final_allocations = final_allocations.groupby(['IDENTITY']).agg('sum').reset_index()
    return final_allocations

final_token_allocations = prepare_export_data(total_period_rewards, quantifier_rewards, rewardboard_rewards)
final_token_allocations.style
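
Before exporting, it can be worth checking that the merged amounts add up to the intended budget. The sketch below uses made-up numbers; `check_total` is a hypothetical helper and not part of the notebook. In the notebook the expected total would be the sum of the four token constants only if the sourcecred input really adds up to its configured amount, which is an assumption here.

```python
def check_total(allocations, expected_total, tolerance=1e-6):
    """Return True if the allocated amounts add up to the expected budget (illustrative)."""
    return abs(sum(allocations.values()) - expected_total) <= tolerance

# Made-up budgets: 100 tokens split by praise share, plus 10 tokens split
# equally between two quantifiers ("alice" and "carol").
toy_final = {"alice": 60.0 + 5.0, "bob": 40.0, "carol": 5.0}
print(check_total(toy_final, 110))
```

A small tolerance is used because the proportional split produces floating-point amounts.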

In [None]:
# path to save the file into
directory = "/home/dev/Documents/GitHub/praise_RewardAnalysis/exampleFiles/"
filepath = directory + OUTPUT_FILENAME + ".csv"

final_token_allocations.to_csv(filepath, index=False, header=False)

That's it! The resulting file can be uploaded to GitHub for future reference and recalculated anytime.