# Kleven Best (2012) Replication

In [1]:
# Load libraries
using Parameters
using CSV
using DataFrames
using Plots
using Trapz
using NumericalIntegration
using SpecialFunctions

In [None]:
# (Some imports / temporary)
# using Pkg
# Pkg.add("SpecialFunctions")

## Data setup

Follow code from prepared000.m

### Load and organize data.

The columns are:
* $z$: Income
* $H_y(z)$: CDF of income among the young
* $h_y(z)$: PDF of income among the young
* $H_o(z)$: CDF of income among the old
* $h_o(z)$: PDF of income among the old
* $mtr(z)$: Marginal tax rate at $z$
* $\omega$: Wage rate when old, as a function of ability and effort when young
* $j(\omega)$: PDF of $\omega$
* $J(\omega)$: CDF of $\omega$
* dtapars: Other parameters

The data parameters are:
* e: Static earnings elasticity w.r.t marginal tax rate
* d: Elasticity of old wages with respect to young effort, $\delta = \frac{\partial \omega}{\partial z_y}\frac{z_y}{\omega}$
* $z_y$min: Minimum $z$ parameter for young Pareto distribution
* $\alpha_{zy}$: $\alpha$ parameter for young Pareto distribution
* $z_o$min: Minimum $z$ parameter for old Pareto distribution
* $\alpha_{zo}$: $\alpha$ parameter for old Pareto distribution
* $\gamma$: SWF inequality aversion
* $R$: Exogenous revenue requirement

In [2]:
# Load data
dta = DataFrame(CSV.File("data/statae050.csv", header = false));

ndta = size(dta)[1]; # Data size

# Rename columns
rename!(dta, [:z, :Hzy, :hzy, :Hzo, :hzo, :mtrz, :ω, :jω, :Jω, :dtapars]);


In [3]:
# Add a struct of parameters from the data
@with_kw struct dta_params
    e
    d ::Float64 = 0 # Benchmark scenario where δ=0
    zymin
    αzy
    zomin
    αzo
    γ ::Float64 = 10
    R ::Int64 = 4000
end;

In [4]:
dtapars = dta_params(e = dta.dtapars[1],
                     zymin = dta.dtapars[2],
                     αzy = dta.dtapars[3],
                     zomin = dta.dtapars[4],
                     αzo = dta.dtapars[5]);

### Compute the remaining income distributions.

$\omega(n, z_y/n)$: Function relating wage rate when old to ability and effort when young.

Recall that $z_o = \omega(n, l_y) l_o$. The FOC of the old (Equation 3) allows us to back out the wage rate $\omega$ and construct the distribution $J(\omega)$:

$$1-\tau_o(z_o) - \left(\frac{z_o}{\omega}\right)^{1+1/e}\frac{1}{\omega}=0$$
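
As a rough sketch of that inversion, the FOC exactly as written above can be solved for $\omega$ in closed form, since it rearranges to $\omega^{2+1/e} = z_o^{1+1/e}/(1-\tau_o(z_o))$. The helper names `wage_from_foc` and `foc_residual` and the numeric inputs are illustrative assumptions, not part of the original replication code.

```julia
# Invert the old-age FOC, taken exactly as written above:
#   1 - τ - (z_o/ω)^(1 + 1/e) / ω = 0
# Rearranging gives ω^(2 + 1/e) = z_o^(1 + 1/e) / (1 - τ).
# `wage_from_foc` is a hypothetical helper introduced for illustration.
function wage_from_foc(zo, τ, e)
    k = 1 + 1 / e                       # exponent on z_o/ω in the FOC
    return (zo^k / (1 - τ))^(1 / (k + 1))
end

# Residual of the FOC, used only to check the inversion
foc_residual(zo, τ, ω, e) = 1 - τ - (zo / ω)^(1 + 1 / e) / ω

# Illustrative inputs only (not values from the data file)
ω_implied = wage_from_foc(50_000.0, 0.3, 0.25)
residual = foc_residual(50_000.0, 0.3, ω_implied, 0.25)
```

Applied elementwise to the old income grid and its marginal tax rates, a function like this would produce the $\omega$ grid from which $J(\omega)$ is built.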

Then, we assume that the wage rate is parametrized by:

$$\omega = \omega(n, z_y/n) = \omega_0(n) \cdot \left(\frac{z_y}{n}\right)^\delta $$

where $\omega_0(n)$ is the baseline old wage for ability $n$ with no investment effects $(\delta = 0)$.

With an assumption about $\delta$ and data on $n$ and $z_y$, we can back out $\omega_0(n)$.
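
A minimal sketch of this step, assuming the parametrization above holds exactly: dividing $\omega$ by $(z_y/n)^\delta$ recovers $\omega_0(n)$. The function name `baseline_wage` and the numbers are hypothetical and serve only to illustrate the algebra.

```julia
# Back out the baseline wage ω_0(n) from ω = ω_0(n) * (z_y / n)^δ.
# `baseline_wage` is a hypothetical helper name used for illustration.
baseline_wage(ω, zy, n, δ) = ω / (zy / n)^δ

# With δ = 0 (the benchmark), the baseline wage equals the observed wage.
ω0_benchmark = baseline_wage(100.0, 40_000.0, 80.0, 0.0)

# With δ > 0, multiplying back by (z_y / n)^δ recovers ω.
ω0_career = baseline_wage(100.0, 40_000.0, 80.0, 0.2)
ω_recovered = ω0_career * (40_000.0 / 80.0)^0.2
```

In the benchmark scenario the adjustment is a no-op, which is why $\omega_0$ can be taken directly from the matched $\omega$ values when $\delta = 0$.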

#### First, define a few useful functions

In [30]:
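# Function to make a distribution monotonic
# Values that fail to increase are replaced by a linear interpolation
# between the previous value and the next strictly larger value
function make_monotone(dist)
    new = copy(dist)
    ndata = length(dist)
    for i in 2:ndata
        if new[i] <= new[i-1]
            new[i] = new[i-1]
            # Linear index of the first value above new[i-1]
            # (vec makes this work for both vectors and 1×N matrices)
            nextind = findfirst(>(new[i-1]), vec(new))
            # If no larger value exists, keep the flat value
            nextind === nothing && continue
            next = new[nextind]
            new[i] = (next + new[i-1]*(nextind-i)) / (nextind-i+1)
        end
    end
    return new
end;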

In [31]:
# Function that smooths distributions
# (Same procedure as in Saez)
function smooth_dist(dist, niter)
    old = copy(dist)
    new = copy(dist)
    ndata = length(dist)
    for i in 1:niter
        for j in 2:ndata-1
            new[j] = 0.3*old[j-1] + 0.4*old[j] + 0.3*old[j+1]
        end
        old = copy(new)
    end
    return new
end;
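
To see what a single smoothing pass does, here is a small self-contained sketch of one iteration of the 0.3/0.4/0.3 weighted average. The name `smooth_once` and the toy vectors are assumptions made for illustration. The endpoints are left untouched, and a linear sequence passes through unchanged.

```julia
# One pass of the 0.3/0.4/0.3 moving average over interior points,
# mirroring a single iteration of the smoother above.
# `smooth_once` is a hypothetical name introduced for this illustration.
function smooth_once(x)
    y = copy(x)
    for j in 2:length(x)-1
        y[j] = 0.3 * x[j-1] + 0.4 * x[j] + 0.3 * x[j+1]
    end
    return y
end

# A sequence with a flat spot: the pass spreads the kink to its neighbors
kinked = [1.0, 2.0, 2.0, 4.0, 5.0]
smoothed = smooth_once(kinked)

# A linear sequence is a fixed point of the weighted average
linear = [1.0, 2.0, 3.0, 4.0, 5.0]
linear_smoothed = smooth_once(linear)
```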

#### Next, compute the distributions

In [5]:
# Find the distribution of ω(n, z_y/n)
# Do this by matching the CDF of ω and the CDF of z_y
# (Minimize the distance between the CDF values)

# Get the difference between the Hzy and Jω
ωzdiff = abs.(repeat(dta.Hzy', ndta, 1) - repeat(dta.Jω, 1, ndta));

# Find the minimum difference indices for each Hzy (columnwise)
# ωinds[i] = argmin_j |Hzy[i] - Jω[j]|
_, ωinds = findmin(ωzdiff, dims = 1);
ωinds = getindex.(ωinds[1,:], 1)';

# Create ω distributions of these minimum differences
ωz = dta.ω[ωinds];
Jωzy = dta.Jω[ωinds];
jωzy = dta.jω[ωinds];
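
The matching logic can be illustrated on a toy grid. This is only a sketch with made-up numbers, and the `_toy` variable names are hypothetical. It uses a comprehension with `argmin` rather than the `repeat`/`findmin` construction above, but picks the same nearest-CDF index for each grid point.

```julia
# Toy CDF grids (made-up numbers, not taken from the data file)
H_toy = [0.10, 0.50, 0.90]        # CDF of young income at three grid points
J_toy = [0.05, 0.12, 0.48, 0.95]  # CDF of ω at four grid points
ω_toy = [10.0, 20.0, 30.0, 40.0]  # ω grid corresponding to J_toy

# For each H_toy[i], pick the j that minimizes |H_toy[i] - J_toy[j]|
inds_toy = [argmin(abs.(J_toy .- h)) for h in H_toy]

# ω matched to each young-income grid point
ωz_toy = ω_toy[inds_toy]
```

The comprehension avoids building the full difference matrix, which may matter for larger grids, though for about a thousand grid points either approach should be fine.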

In [6]:
# Find the distribution of z_o(z_y)
# Follow a similar procedure as for finding the distribution of ω,
# this time matching the CDFs of z_y and z_o

# Difference between young z CDF and old z CDF
zozdiff = abs.(repeat(dta.Hzy', ndta, 1) - repeat(dta.Hzo, 1, ndta));

# Find the minimum difference indices for each Hzy (columnwise)
# zoinds[i] = argmin_j |Hzy[i] - Hzo[j]|
_, zoinds = findmin(zozdiff, dims = 1);
zoinds = getindex.(zoinds[1,:], 1)';

# Create z distributions of these minimum differences
zoz = dta.z[zoinds];
Hzoz = dta.Hzo[zoinds];
hzoz = dta.hzo[zoinds];


In [7]:
# Replace z_o(z_y) for top incomes (z >= 150k)

# Split the z vector at 150,000
# (>= so that a grid point at exactly 150,000 is not dropped)
znew1 = dta.z[dta.z .< 150000];
znew2 = dta.z[dta.z .>= 150000];
nz1 = size(znew1)[1];

# Construct z_o(z_y) incomes above 150,000
zohigh = dtapars.zomin * (znew2 / dtapars.zymin).^(dtapars.αzy/dtapars.αzo);

# Append to incomes below 150,000
zoznew = [zoz[1:nz1]; zohigh];
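
The top-income formula appears to come from equating Pareto quantiles: if $H_y(z) = 1 - (z_{y,min}/z)^{\alpha_{zy}}$ and $H_o(z) = 1 - (z_{o,min}/z)^{\alpha_{zo}}$, then $H_o(z_o) = H_y(z_y)$ gives $z_o = z_{o,min}\,(z_y/z_{y,min})^{\alpha_{zy}/\alpha_{zo}}$. The sketch below checks this numerically. The function names and parameter values are illustrative assumptions, not the values stored in `dtapars`.

```julia
# Pareto CDF with scale zmin and shape α
pareto_cdf(z, zmin, α) = 1 - (zmin / z)^α

# Quantile-matched old income: solves H_o(z_o) = H_y(z_y) for z_o
# (`match_old_income` is a hypothetical helper name)
match_old_income(zy, zymin, αzy, zomin, αzo) = zomin * (zy / zymin)^(αzy / αzo)

# Illustrative parameter values only
zymin_ex, αzy_ex = 100_000.0, 2.5
zomin_ex, αzo_ex = 120_000.0, 2.0

zy_ex = 200_000.0
zo_ex = match_old_income(zy_ex, zymin_ex, αzy_ex, zomin_ex, αzo_ex)

# Both incomes should sit at the same quantile of their own distribution
qy = pareto_cdf(zy_ex, zymin_ex, αzy_ex)
qo = pareto_cdf(zo_ex, zomin_ex, αzo_ex)
```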


In [32]:
# Compute ω_0, make ω_0 monotonic, and smooth ω_0
# (in the benchmark δ = 0, so ω_0 equals the matched ω)
ω0 = make_monotone(ωz);
ω0 = smooth_dist(ω0, 2000)

1×1126 Matrix{Float64}:
 67.4914  67.5645  67.6376  67.7107  …  117390.0  1.18119e5  1.18848e5

In [None]:
# Compute the ability levels, make monotone, and smooth

In [None]:
# Compute the ability distributions, make monotone, and smooth

## Optimal tax calculation

### Procedure

In [None]:
# Set parameters for the optimal tax calculation
@with_kw struct params
    γ ::Int64            # SWF inequality aversion parameter
    k ::Int64            # Parametrizes elasticity ε
    e ::Float64 = 1/k    # Elasticity ε = 1/k
    a ::Int64            # Pareto parameter for upper income distribution
    R ::Int64            # Exogenous per-person revenue requirement
end;