# The Retrieval Pinning Parametric Analysis

This notebook implements the formalism described in the [Retrieval Pinning](https://hackmd.io/@irenegia/retriev)
cryptoecon section. Priors are chosen for the model parameters and propagated
through the model, so that the equilibrium values of the parameters under the
rationality criteria are generated.

The samples can then be filtered by different sequences of agent decisions (e.g. the incidence of $d_2=T$ given $d_1=T$, or the incidence of $d_4=T$ given $d_3, d_2, d_1=T$). Using the equilibrium values, it is possible to estimate the rational upper bound on the model parameters for a given action sequence to occur.
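A minimal sketch of the filtering step, assuming the samples sit in a pandas DataFrame with one boolean column per decision. The toy values and the helper name `conditional_incidence` are illustrative assumptions and are not taken from the notebook's trace.

```python
import pandas as pd

# Toy samples (made up for illustration); each row is one sampled scenario.
toy = pd.DataFrame({
    'd1': [True, True, True, False, True, False],
    'd2': [True, False, True, False, True, False],
})

def conditional_incidence(samples: pd.DataFrame, target: str, given: list) -> float:
    """Share of rows where `target` is True among rows where all `given` columns are True."""
    mask = samples[given].all(axis=1)
    return float(samples.loc[mask, target].mean())

# Incidence of d2=T given d1=T on the toy data.
incidence_d2_given_d1 = conditional_incidence(toy, 'd2', ['d1'])
print(incidence_d2_given_d1)
```

The same helper accepts longer conditioning lists, for example `['d1', 'd2', 'd3']` when the target is `d4`.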

## Dependencies

In [67]:
import pymc3 as pm
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import plotly.express as px
import plotly.figure_factory as ff
from arviz import InferenceData

## PyMC3 Model & Simulation

In [40]:
FIL = float
SEED = 31416
RUN_MODEL = True
TRACE_PATH = '../data/decision_tree_trace.nc'

if RUN_MODEL:
    with pm.Model() as model:

        AVERAGE_PAYMENT: FIL = 1.0
        STD_PAYMENT: FIL = 1.0

        # Observables
        p: FIL = pm.Gamma('payment', mu=AVERAGE_PAYMENT, sd=STD_PAYMENT)
        ## Probability of Retrieving Given Non-Deal
        P_r_d2bar = pm.Uniform('P_r_d2bar', lower=0.5, upper=1)
        ## Probability of Retrieving Given Deal
        P_r_d2 = pm.Uniform('P_r_d2', lower=P_r_d2bar, upper=1)
        ## Probability of Retrieving Given Appeal
        P_r_d3 = pm.Uniform('P_r_d3', lower=0.0, upper=1.0)
        ## Probability of Slashing Given Deal
        P_d4_d2 = pm.Beta('P_d4_d2', mu=0.01, sd=0.01)

        # Unobservables
        ## Expected Number of Appeals on a Deal
        N_a = pm.Gamma('N_a', mu=P_d4_d2, sd=0.005)
        ## Deal Leverage on Utility
        alpha = pm.Uniform('alpha', lower=0.01, upper=2)

        # Parameters
        ## Committee Fee
        m_c = pm.Uniform('m_c', lower=0, upper=10)
        ## Slashing Multiplier
        m_s = pm.Uniform('m_s', lower=0, upper=4_000)

        # Auxiliary Variables
        ## Retrieval Utility
        pi_r_C: FIL = pm.Deterministic('pi_r_C', p / alpha)
        ## Probability of Slashing Given an Appeal
        P_d4_d3 = pm.Uniform('P_d4_d3', lower=P_r_d3, upper=1)
        ## Probability of Retrieving Given that there's no Appeal
        P_r_d3bar = pm.Uniform('P_r_d3bar', lower=0.0, upper=P_r_d3)
        ## Probability of Not Slashing Given a Deal
        P_d4bar_d2 = pm.Deterministic('P_d4bar_d2', 1 - P_d4_d2)
        ## Marginal Probability on Deal
        d_r = P_r_d2 - P_r_d2bar
        ## Equilibrium Committee Fee
        # Interpretation: the maximum committee fee for which
        # d_1 remains rational
        eq_mc = pm.Deterministic('eq_mc', (d_r + alpha + 2 * P_d4_d2 - 1) / N_a)
        ## Equilibrium Slashing Multiplier
        # Interpretation: the maximum effective slashing multiplier for which d_2 remains rational
        eq_ms = pm.Deterministic('eq_ms', 1 / P_d4_d2 - 1)

        # Payoffs & Decisions
        pi_d1_C = pi_r_C * P_r_d2 + p * (2 * P_d4_d2 - N_a * m_c - 1)
        pi_d1_C_bar = pi_r_C * P_r_d2bar
        U_d1 = pm.Deterministic('U_d1', pi_d1_C - pi_d1_C_bar)
        d1 = pm.Deterministic('d1', U_d1 > 0)

        pi_d2_P = p * (P_d4bar_d2 - m_s * P_d4_d2)
        pi_d2_P_bar = 0.0
        U_d2 = pi_d2_P - pi_d2_P_bar
        d2 = pm.Deterministic('d2', d1 & (U_d2 > 0))

        pi_d3_C = pi_r_C * P_r_d3 + p * (P_d4_d3 - m_c)
        pi_d3_C_bar = pi_r_C * P_r_d3bar
        U_d3 = pi_d3_C - pi_d3_C_bar
        d3 = pm.Deterministic('d3', d2 & (U_d3 > 0))

        pi_d4_R = pm.Bernoulli('pi_d4_R', p=1-P_r_d3)
        pi_d4_R_bar = 0  # payoff of not slashing is zero in either case
        U_d4 = pi_d4_R - pi_d4_R_bar
        d4 = pm.Deterministic('d4', d3 & (U_d4 > 0))

        # Draw samples from the model and store the trace
        trace: InferenceData = pm.sample(draws=10_000,
                        chains=4,
                        tune=5_000,
                        nuts={'target_accept':0.99},
                        random_seed=SEED,
                        return_inferencedata=True)
        trace.to_netcdf(TRACE_PATH)
else:
    trace = InferenceData.from_netcdf(TRACE_PATH)

Multiprocess sampling (4 chains in 4 jobs)
CompoundStep
>NUTS: [P_r_d3bar, P_d4_d3, m_s, m_c, alpha, N_a, P_d4_d2, P_r_d3, P_r_d2, P_r_d2bar, payment]
>BinaryGibbsMetropolis: [pi_d4_R]


Sampling 4 chains for 5_000 tune and 10_000 draw iterations (20_000 + 40_000 draws total) took 399 seconds.
There were 49 divergences after tuning. Increase `target_accept` or reparameterize.
The number of effective samples is smaller than 10% for some parameters.


In [41]:
df = trace.posterior.to_dataframe()

## Analysis

In [42]:
metric_0 = df.d1
metric_1 = df.d2[metric_0 == True]
metric_2 = (df.P_r_d2 / df.P_r_d2bar)[metric_0 == True]
#metric_3 = df.payment * (1 - df.P_r_d2) * df.P_d4_d2 / df.pi_r_C
#metric_4 = df.pi_d4_R_bar
metric_5 = df.pi_d4_R

### The effect of successive decisions on the survival of parametric scenarios

Parameter choices should seek to maximize Protocol Adoption, which is measured
by the relative incidence of $d_1$ and $d_2$ given the parameter choices. We
therefore perform a survival analysis in which samples are filtered according
to the sequence of decisions (e.g. What is the incidence of parameter choices
grouped by $d_1$? If $d_1=T$, then what is the incidence grouped by $d_2$? etc.)

Based on the scenario survival, it is possible to estimate ranges in which
specific decisions are going to be uncommon or irrational. Specifically, a low
incidence of the parameters in a given range implies a low likelihood that
agents will follow that decision sequence.
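
The survival filtering can be sketched as a funnel that keeps only the scenarios where each successive decision is true. The toy data and the helper name `survival_table` below are assumptions made for illustration; in the notebook the same logic would apply to the `d1` to `d4` columns of `df`.

```python
import pandas as pd

# Toy scenarios (made up); each later decision is only True where the earlier ones are.
toy = pd.DataFrame({
    'd1': [True, True, True, True, False, True, False, True],
    'd2': [True, True, False, True, False, True, False, False],
    'd3': [True, False, False, True, False, False, False, False],
    'd4': [False, False, False, True, False, False, False, False],
})

def survival_table(samples: pd.DataFrame, decisions: list) -> pd.DataFrame:
    """Count how many scenarios enter and survive each decision in sequence."""
    rows = []
    survivors = samples
    for name in decisions:
        entering = len(survivors)
        survivors = survivors[survivors[name]]  # keep rows where this decision is True
        rows.append({
            'decision': name,
            'entering': entering,
            'surviving': len(survivors),
            'conditional_incidence': len(survivors) / entering if entering else float('nan'),
        })
    return pd.DataFrame(rows)

table = survival_table(toy, ['d1', 'd2', 'd3', 'd4'])
print(table)
```

Each row's `conditional_incidence` is the share of scenarios that pass the decision among those that passed all earlier ones, which is the quantity the faceted histograms split by parameter value.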

#### Studying the effect of $m_s$ on the decisions

We can observe the following phenomena:

- $d_1$: Not affected
    - Doesn't seem to be affected by different values of $m_s$
- $d_2$: Lower values of $m_s$ increase likelihood of $d_2=T$ significantly
    - Most scenarios require $m_s<300$ for $d_2=T$
    - $d_2=T$ becomes especially scarce for $m_s>2000$
    - Given that $m_s$ is an upper limit on how much the collateral can be
    negotiated, setting it between 300 and 2000 is acceptable.
- $d_3$: Affected by $m_s$ in a non-trivial way
    - Overall, the $m_s$ distribution becomes more concentrated toward lower values. The interpretation
    is that lower values of $m_s$ tend to increase the number of appeals.
- $d_4$: Not affected
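
One way to read thresholds such as 300 and 2000 is to measure which share of the $d_2$-rational scenarios falls below a given $m_s$. The sketch below is a prior-only NumPy simulation and not the notebook's trace: it applies only the $d_2$ payoff condition (ignoring the $d_1$ requirement), and the variable names and the helper `share_below` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(31416)
n = 100_000

# Beta prior for the slashing probability, converted from mean/sd to shape parameters.
mu, sd = 0.01, 0.01
kappa = mu * (1 - mu) / sd**2 - 1
p_slash = rng.beta(mu * kappa, (1 - mu) * kappa, size=n)

# Uniform prior for the slashing multiplier, as in the model cell.
m_s = rng.uniform(0, 4_000, size=n)

# d2 payoff condition only: m_s must stay below 1 / P_d4_d2 - 1.
with np.errstate(divide='ignore'):
    d2_rational = m_s < 1 / p_slash - 1

def share_below(threshold: float) -> float:
    """Share of d2-rational scenarios whose m_s is below `threshold`."""
    return float(np.mean(m_s[d2_rational] < threshold))

print(share_below(300), share_below(2_000))
```

On the actual trace, the analogous quantity would be computed from `df.query('d1 == True & d2 == True')['m_s']`.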

In [80]:
var = 'm_s'
(width, height) = (1500, 250)
nbins = 20

fig_df = df
fig = px.histogram(fig_df,
                   x=var,
                   facet_col='d1',
                   histnorm='probability density',
                   marginal='box',
                   width=width,
                   height=height,
                   nbins=nbins)
fig.show()

fig_df = df.query('d1 == True')
fig = px.histogram(fig_df,
                   x=var,
                   facet_col='d2',
                   histnorm='probability density',
                   marginal='box',
                   width=width,
                   height=height,
                   nbins=nbins)
fig.show()

fig_df = df.query('d1 == True & d2 == True')
fig = px.histogram(fig_df,
                   x=var,
                   facet_col='d3',
                   histnorm='probability density',
                   marginal='box',
                   width=width,
                   height=height,
                   nbins=nbins)
fig.show()

fig_df = df.query('d1 == True & d2 == True & d3 == True')
fig = px.histogram(fig_df,
                   x=var,
                   facet_col='d4',
                   histnorm='probability density',
                   marginal='box',
                   width=width,
                   height=height,
                   nbins=nbins)
fig.show()
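
The four plots in this cell follow one pattern: facet by a decision while conditioning on all earlier decisions being true. A small helper can generate those stages; the name `survival_stages` is hypothetical and the plotting call is left as a comment so the sketch runs without Plotly.

```python
def survival_stages(decisions):
    """Return (query, facet) pairs: facet by each decision, given all earlier ones are True."""
    stages = []
    for i, facet in enumerate(decisions):
        # Build the conditioning query from the decisions that come before `facet`.
        query = ' & '.join(f'{d} == True' for d in decisions[:i])
        stages.append((query or None, facet))
    return stages

stages = survival_stages(['d1', 'd2', 'd3', 'd4'])
for query, facet in stages:
    print(facet, '|', query)

# In the notebook, each stage could drive one histogram, for example:
# for query, facet in stages:
#     fig_df = df.query(query) if query else df
#     px.histogram(fig_df, x=var, facet_col=facet, histnorm='probability density',
#                  marginal='box', width=width, height=height, nbins=nbins).show()
```

The first stage has no query because the unconditioned `df` is used when faceting by `d1`.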


#### Studying the effect of $m_c$ on the decisions

We can observe the following phenomena:

- $d_1$: Not affected
- $d_2$: Not affected
- $d_3$: Lower values increase the likelihood of $d_3=T$ significantly
    - The scenarios seem to suggest that the lower, the better for the client, although
    there seems to be a saturation point once $m_c<1.5$
    - It seems that $m_c=5$ can still be acceptable for some clients
- $d_4$: Not affected
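
The dependence of $d_3$ on $m_c$ can be traced back to the $d_3$ payoff in the model cell. As a sketch, writing $\Delta_r$ for `P_r_d3 - P_r_d3bar` and $P_{d_4|d_3}$ for `P_d4_d3` (both notations are assumptions introduced here), and using `pi_r_C = p / alpha` with $p > 0$:

$$
U_{d_3} = \frac{p}{\alpha} \Delta_r + p\left(P_{d_4|d_3} - m_c\right) > 0
\iff
m_c < \frac{\Delta_r}{\alpha} + P_{d_4|d_3}
$$

Because the priors enforce $0 \le \Delta_r \le 1$ and $P_{d_4|d_3} \le 1$, a fee of $m_c = 5$ needs $\Delta_r / \alpha > 5 - P_{d_4|d_3} \ge 4$, so a necessary condition is $\alpha < \Delta_r / 4 \le 0.25$. Under this reading, high fees remain rational only for clients with a small $\alpha$.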

In [82]:
var = 'm_c'
(width, height) = (1500, 250)
nbins = 20

fig_df = df
fig = px.histogram(fig_df,
                   x=var,
                   facet_col='d1',
                   histnorm='probability density',
                   marginal='box',
                   width=width,
                   height=height,
                   nbins=nbins)
fig.show()

fig_df = df.query('d1 == True')
fig = px.histogram(fig_df,
                   x=var,
                   facet_col='d2',
                   histnorm='probability density',
                   marginal='box',
                   width=width,
                   height=height,
                   nbins=nbins)
fig.show()

fig_df = df.query('d1 == True & d2 == True')
fig = px.histogram(fig_df,
                   x=var,
                   facet_col='d3',
                   histnorm='probability density',
                   marginal='box',
                   width=width,
                   height=height,
                   nbins=nbins)
fig.show()

fig_df = df.query('d1 == True & d2 == True & d3 == True')
fig = px.histogram(fig_df,
                   x=var,
                   facet_col='d4',
                   histnorm='probability density',
                   marginal='box',
                   width=width,
                   height=height,
                   nbins=nbins)
fig.show()