# **srGBM-MFPT ESTIMATIONS**

How much time does it take for individual workers to improve their income status? This is the bedrock question beneath the American dream. By employing ideas and techniques from statistical mechanics, namely geometric Brownian motion with stochastic resetting (srGBM) and its mean first-passage time (MFPT), we can provide a disaggregated view of a worker's income timeline. Here you can explore the time required for a United States worker to improve their current income, or input your own data and calculate estimates for other economies.

In this notebook you input data and estimate the time to reach a target income in your own economy. You can provide the data as a csv/xlsx file or enter it directly in a table. For more details on the data needed for these calculations, see [REF TO OUR WORK].



In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import random
from scipy.stats import gmean
import statsmodels.api as sm
from scipy.optimize import minimize
import scipy.stats as st

# Importing the data
First, we need to calculate the parameters that drive the income in your economy. For this, you will need to provide two types of data:

# Dataset 1.

Insert a table as an xlsx/csv document that shows how the share of income owned by the top 1% has evolved over time. The dataset should be called 'data_share' and have two columns. The first column should give the year and be called 'year'. The second should give the share of income owned by the top 1% and be called 'share'. You must include at least two data points. As an example, see the document provided here.
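
As a rough illustration of the layout described above, the sketch below builds a small table with the `year` and `share` columns and checks the stated requirements. The helper name `check_share_table` and the numbers are placeholders invented for this example, not real data. Shares are written as fractions here, on the assumption that they will be compared with the simulated share computed later in the notebook.

```python
import pandas as pd


def check_share_table(df):
    """Check the layout described above: 'year' and 'share' columns, at least two rows."""
    missing = {"year", "share"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if len(df) < 2:
        raise ValueError("at least two data points are required")
    # Sort by year so that later steps can rely on chronological order
    return df.sort_values("year").reset_index(drop=True)


# Placeholder values for illustration only (not real data)
example_share = pd.DataFrame({"year": [2001, 2000, 2002],
                              "share": [0.11, 0.10, 0.12]})
checked_share = check_share_table(example_share)
print(checked_share)
```

Running a check like this before the estimation step makes a wrongly named column fail early with a readable message rather than deep inside the simulation loop.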

In [None]:
from google.colab import files
uploaded = files.upload()

In [None]:
data_share = pd.read_excel('data_share.xlsx',sheet_name='Sheet1')

# Dataset 2. 

Insert a table as an xlsx/csv document that shows how the resetting rate has evolved over time. A good approximation for the resetting rate is the share of workers who left or lost their job in a calendar year. The first column of the table should give the year and be called 'year'. The second should give the value of the resetting rate and be called 'rate'. You must include data for the same years as in Dataset 1. As an example, see this document [downloads a csv for USA top 1 share].
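
Because the two tables must cover the same years, it can help to verify this before estimating anything. The following is a minimal sketch under that assumption; `check_same_years` is a hypothetical helper and the values are placeholders, not real data.

```python
import pandas as pd


def check_same_years(share_df, rate_df):
    """Return both tables sorted by year, or raise if their years differ."""
    share_years = sorted(share_df["year"])
    rate_years = sorted(rate_df["year"])
    if share_years != rate_years:
        raise ValueError("the two tables must cover exactly the same years")
    share_sorted = share_df.sort_values("year").reset_index(drop=True)
    rate_sorted = rate_df.sort_values("year").reset_index(drop=True)
    return share_sorted, rate_sorted


# Placeholder values for illustration only (not real data)
example_share = pd.DataFrame({"year": [2000, 2001, 2002], "share": [0.10, 0.11, 0.12]})
example_rate = pd.DataFrame({"year": [2002, 2000, 2001], "rate": [0.22, 0.20, 0.21]})
share_sorted, rate_sorted = check_same_years(example_share, example_rate)
print(rate_sorted)
```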

In [None]:
from google.colab import files
uploaded = files.upload()


In [None]:
rs_ = pd.read_excel('rs.xlsx')

In [None]:
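data_sh = data_share['share'].to_numpy()  # top 1% income share, one value per year
rs = rs_['rate'].to_numpy()  # resetting rate, one value per year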

# Estimating the srGBM parameters
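Now we are equipped to calculate the income dynamics parameters of your economy. For each year, the cell below simulates incomes over a grid of candidate drift (`mu`) and volatility (`sigma`) values and keeps the pair whose simulated top 1% income share is closest to the observed one. It prints the iteration and year index as it runs, so you can follow how long the estimation is taking.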
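
To make the simulation idea concrete, here is a minimal sketch of a single simulated year and of the top-share calculation. The function names `srgbm_step` and `top_share`, the use of `np.random.default_rng`, and the parameter values are assumptions made for this illustration. The notebook's own loop differs in its details.

```python
import numpy as np


def srgbm_step(incomes, mu, sigma, r, x_reset, rng, dt=1.0):
    """Advance all incomes by one Euler step of GBM, resetting a random subset."""
    n = incomes.size
    reset = rng.random(n) < r * dt            # who is reset during this step
    noise = rng.standard_normal(n)
    new = incomes * (1.0 + mu * dt + sigma * np.sqrt(dt) * noise)
    new[reset] = x_reset                      # reset incomes jump to x_reset
    new[new < 0] = x_reset                    # the Euler step can overshoot below zero
    return new


def top_share(incomes, fraction=0.01):
    """Share of total income held by the richest `fraction` of the population."""
    k = max(1, int(round(fraction * incomes.size)))
    ordered = np.sort(incomes)
    return ordered[-k:].sum() / ordered.sum()


rng = np.random.default_rng(10)
incomes = np.ones(10_000)                     # everyone starts from the same income
for _ in range(20):
    incomes = srgbm_step(incomes, mu=0.05, sigma=0.3, r=0.2, x_reset=1.0, rng=rng)
print(top_share(incomes))
```

Boolean masks are used here instead of index arrays only because they keep the update short. Fitting then amounts to repeating such steps for each candidate `mu` and `sigma` and comparing `top_share` with the observed value.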

In [None]:
iterations = 25

fitted_mu = np.ones([len(data_sh)-1, iterations])
fitted_sigma = np.ones([len(data_sh)-1, iterations])
fitted_ddps = np.ones([len(data_sh)-1, iterations])

for iteration in range(iterations):
    print(f"Iteration: {iteration}")
    min_location = []
    min_pred = []
    min_sh = []
    
    np.random.seed(10 + iteration)  # random.seed() does not affect np.random; seed NumPy once per iteration
    t = len(data_sh)
    people = 10000
    dt = 1
    m = 10
    #sigma = np.sqrt(0.02219277)
    #sigma = 0.2
    trajs = np.ones([m, m, t+1, people])
    
    for real in range(0,t-1,1):
      print(real)
    
      sh = []
      sqerror = []
      pred = []
    
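      if real == 0:
        # data_, mu0 and sig0 are not defined anywhere in this notebook (presumably the
        # starting incomes and the first-year drift and volatility); define them first.
        trajs[:,0,:] = data_
        mus = np.linspace(mu0,mu0,m)
        sigmas = np.linspace(sig0,sig0,m)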
      else:   
        mus = np.linspace(0.001,0.15,m)
        sigmas = np.linspace(0.20,0.70,m)
        
      lista = {}
      
      for muiter, mu in enumerate(mus):
    
          for siter, sigma in enumerate(sigmas):
              
              choice_ = [[0,1],[1-rs[real+1]*dt, rs[real+1]*dt]]
              prob = np.random.choice(a=choice_[0], p=choice_[1], size=people)
              noise = np.random.randn(1,people)
    
              trajs[muiter,siter, real + 1, np.argwhere(prob == 0)] = trajs[muiter,siter, real, np.argwhere(prob == 0)] * (1 + mu * dt + (sigma * np.sqrt(dt)) * noise[0,np.argwhere(prob == 0)])
              trajs[muiter,siter, real + 1, np.argwhere(prob == 1)] = np.min(trajs[muiter,siter, 0, :])
        
              check = trajs[muiter,siter, real + 1, :]
              trajs[muiter,siter, real + 1, np.where(check<0)] = np.min(trajs[muiter,siter, 0, :]) #trajs[ri, 0, np.where(check<0)]
    
              trajs[muiter,siter, real+1, :].sort()
        
              share = np.sum(trajs[muiter, siter, real+1,:][-100:])/np.sum(trajs[muiter, siter, real+1,:])
    
              sqerror.append((share-data_sh[real+1])**2)
    
              sh.append(share)
              
              lista[muiter,siter] = mu, sigma   
              
      min_loc = sqerror.index(min(sqerror))
      min_location.append(list(lista.values())[min_loc])
      min_sh.append(sh[min_loc])
    
    fitted_mu[:, iteration] = np.array(pd.DataFrame(min_location)[0])
    fitted_sigma[:, iteration] = np.array(pd.DataFrame(min_location)[1])
    fitted_ddps[:, iteration] = min_sh


In [None]:
#%% Simple plot, just to check the regimes

plt.figure()
plt.plot(2*np.mean(fitted_mu,1)+((np.mean(fitted_sigma,1))**2)/2, label='2*mu + sigma^2/2')
plt.plot(np.mean(fitted_mu,1), label='mu')
plt.plot(rs, label='resetting rate r')
plt.legend()
plt.show()

# Choosing the starting and target income
Now that you have calculated the parameters of your economy, you can choose the starting income (`x0`), the target income (`ub`), and the value to which incomes are reset (`xr`), and estimate the srGBM-MFPT.
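
For reference, the closed-form expression evaluated by the `mfpt_analytic` function in this notebook can be written as follows, with $r$ the resetting rate, $\mu$ and $\sigma$ the fitted drift and volatility, $x_0$ the starting income, $x_r$ the resetting income, and $u_b$ the target income (`ub` in the code):

$$
q = \frac{\sqrt{(\sigma^2 - 2\mu)^2 + 8 r \sigma^2} + (\sigma^2 - 2\mu)}{2\sigma^2},
\qquad
\langle T \rangle = \frac{1 - (x_0/u_b)^{q}}{r\,(x_r/u_b)^{q}} .
$$

In the special case $x_0 = x_r$, which is the default choice in the cell below, this reduces to $\langle T \rangle = \left[(u_b/x_r)^{q} - 1\right]/r$.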

In [None]:
xr = 1000 #resetting income
x0 = 1000 #initial income
ub = 4000 #target income

In [None]:
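start = int(data_share['year'].min())  # first year in the uploaded share table
end = int(data_share['year'].max())    # last year in the uploaded share table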

year1 = np.arange(start,end+1,1).reshape(end-start+1,1) 
#%% Analytic MFPT function

def mfpt_analytic(r, params):
    
    mu, sigma, x0, ub, xr = params
    
    q = (np.sqrt((sigma**2-2*mu)**2 + 8*r*sigma**2) + (sigma**2-2*mu)) / (2*sigma**2)
    
    Tx0 = (x0/ub)**q
    Tr = (xr/ub)**q
    
    mean_Tr = (1-Tx0) / (r*Tr)
    
    return mean_Tr
#%% Read the fitted parameters

mu_top1 = np.mean(fitted_mu, axis=1)
mu_top1_stderror = np.std(fitted_mu, axis=1)/np.sqrt(iterations)
sigma_top1 = np.mean(fitted_sigma, axis=1)
sigma_top1_stderror = np.std(fitted_sigma, axis=1)/np.sqrt(iterations)

#%% MFPT from specified initial income and target

std_errors = 3
mfpt = np.zeros((len(year1),3))


for y in range(0,len(year1)-1):
    
    r_est = rs[y]
    mu_est = mu_top1[y]
    sigma_est = sigma_top1[y]
    
    mu_upper = mu_top1[y]+std_errors*mu_top1_stderror[y]
    mu_lower = mu_top1[y]-std_errors*mu_top1_stderror[y]
    sigma_upper = sigma_top1[y]+std_errors*sigma_top1_stderror[y]
    sigma_lower = sigma_top1[y]-std_errors*sigma_top1_stderror[y]
         
    params_baseline = np.array([mu_est,sigma_est,x0,ub,xr])
    mfpt[y,0] = mfpt_analytic(r_est,params_baseline)
            
    params_upper = np.array([mu_upper,sigma_upper,x0,ub,xr])
    mfpt[y,1] = mfpt_analytic(r_est,params_upper)
            
    params_lower = np.array([mu_lower,sigma_lower,x0,ub,xr])
    mfpt[y,2] = mfpt_analytic(r_est,params_lower)

#%% Plot MFPT



# Results

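Finally, you can see the results. The plot below shows the estimated srGBM-MFPT for each year: the baseline estimate together with the estimates obtained from the upper and lower parameter bounds.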

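The intervals for the expected time are obtained by re-evaluating the MFPT with the fitted parameters shifted up and down by `std_errors` standard errors. The code above sets `std_errors = 3`; a 95% confidence interval under a normal approximation would correspond to 1.96.

The optimal time is the expected time under policies that minimize the time required for a worker to reach the target income.

The optimal resetting rate is the share of workers changing their job or working status in a year that leads to the minimum expected time to reach the target income.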
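
One way to approximate the optimal resetting rate numerically is to minimize the analytic MFPT over `r` with the other parameters held fixed. The sketch below restates the notebook's formula in a standalone function (`mfpt` is a name chosen here) and uses `scipy.optimize.minimize_scalar`. The parameter values and the search interval are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def mfpt(r, mu, sigma, x0, ub, xr):
    """Analytic MFPT, same expression as mfpt_analytic in this notebook."""
    a = sigma**2 - 2 * mu
    q = (np.sqrt(a**2 + 8 * r * sigma**2) + a) / (2 * sigma**2)
    return (1 - (x0 / ub) ** q) / (r * (xr / ub) ** q)


# Illustrative values only (not fitted to any data)
params = dict(mu=0.02, sigma=0.3, x0=1000, ub=4000, xr=1000)

# The search interval for r is an assumption; adjust it to your data
result = minimize_scalar(lambda r: mfpt(r, **params),
                         bounds=(1e-6, 1.0), method="bounded")
r_opt, t_opt = result.x, result.fun
print(f"optimal resetting rate: {r_opt:.3f}, optimal time: {t_opt:.1f}")
```

With these illustrative values the minimum lies inside the search interval. For other parameters it may sit at one of the bounds, so it is worth checking where `result.x` lands before interpreting it.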



You can download them in csv format here:

In [None]:
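plt.figure()
plt.plot(mfpt[:,1], label='upper parameter bound')
plt.plot(mfpt[:,0], label='baseline')
plt.plot(mfpt[:,2], label='lower parameter bound')
plt.xlabel('year index')
plt.ylabel('MFPT (years)')
plt.legend()
plt.show()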
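
To make the results available as a csv file, as the text above suggests, the MFPT array could be written out along these lines. The column names, the file name `mfpt_results.csv`, and the small stand-in arrays are assumptions for illustration. In the notebook you would pass `year1` and `mfpt` instead.

```python
import numpy as np
import pandas as pd


def mfpt_to_frame(years, mfpt):
    """Arrange the MFPT columns (baseline, upper, lower) in a labelled table."""
    return pd.DataFrame({
        "year": np.asarray(years).ravel(),
        "mfpt_baseline": mfpt[:, 0],
        "mfpt_upper_params": mfpt[:, 1],
        "mfpt_lower_params": mfpt[:, 2],
    })


# Stand-in values for illustration only (not real results)
years_demo = np.arange(2000, 2003).reshape(3, 1)
mfpt_demo = np.array([[10.0, 9.0, 11.0], [12.0, 11.0, 13.0], [14.0, 13.0, 15.0]])

results = mfpt_to_frame(years_demo, mfpt_demo)
results.to_csv("mfpt_results.csv", index=False)
# In Colab, files.download("mfpt_results.csv") from google.colab would then offer the file
```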