# Best and Kleven (2012) Replication

## Data setup

Follow code from prepared000.m.

Note: $\delta$ is set as the parameter d in the Dta_params struct; change it there to use a different value of $\delta$.

Completed and exported into DataPrep.jl as a module.

In [1]:
# Load libraries
using Parameters
using CSV
using DataFrames
using Plots
using Trapz
using NumericalIntegration
using SpecialFunctions

In [2]:
# (Some imports / temporary)
# using Pkg
# Pkg.add("SpecialFunctions")

### Load and organize data.

The columns are:
* $z$: Income
* $H_y(z)$: CDF of income among the young
* $h_y(z)$: PDF of income among the young
* $H_o(z)$: CDF of income among the old
* $h_o(z)$: PDF of income among the old
* $mtr(z)$: Marginal tax rate at $z$
* $\omega$: Wage rate when old, as a function of ability and effort when young
* $j(\omega)$: PDF of $\omega$
* $J(\omega)$: CDF of $\omega$
* dtapars: Other parameters

The data parameters are:
* e: Static earnings elasticity w.r.t. the net-of-tax rate $1-\tau$
* d: Elasticity of old wages to young effort, $\delta = \frac{\partial \omega}{\partial z_y}\frac{z_y}{\omega}$
* $z_y$min: Minimum $z$ parameter for young Pareto distribution
* $\alpha_{zy}$: $\alpha$ parameter for young Pareto distribution
* $z_o$min: Minimum $z$ parameter for old Pareto distribution
* $\alpha_{zo}$: $\alpha$ parameter for old Pareto distribution
* $\gamma$: SWF inequality aversion
* $R$: Exogenous revenue requirement

In [3]:
# Load data
dta = DataFrame(CSV.File("data/statae050.csv", header = false));

ndta = size(dta)[1]; # Data size

# Rename columns
rename!(dta, [:z, :Hzy, :hzy, :Hzo, :hzo, :mtrz, :ω, :jω, :Jω, :dtapars]);


In [4]:
# Add a struct of parameters from the data
@with_kw struct Dta_params
    e
    d ::Float64 = 0 # Benchmark scenario where δ=0
    zymin
    αzy
    zomin
    αzo
    γ ::Float64 = 10
    R ::Int64 = 4000
end;

In [5]:
dtapars = Dta_params(e = dta.dtapars[1],
                     zymin = dta.dtapars[2],
                     αzy = dta.dtapars[3],
                     zomin = dta.dtapars[4],
                     αzo = dta.dtapars[5]);

### Compute the remaining income distributions.

$\omega(n, z_y/n)$: Function relating wage rate when old to ability and effort when young.

Recall that $z_o = \omega(n, l_y) l_o$. The FOC of the old (Equation 3) allows us to back out the wage rate $\omega$ and construct the distribution $J(\omega)$:

$$1-\tau_o(z_o) - \left(\frac{z_o}{\omega}\right)^{1/e}\frac{1}{\omega}=0$$

Then, we assume that the wage rate is parametrized by:

$$\omega = \omega(n, z_y/n) = \omega_0(n) \cdot \left(\frac{z_y}{n}\right)^\delta $$

where $\omega_0(n)$ is the baseline old wage for ability $n$ with no investment effects $(\delta = 0)$.

With an assumption about $\delta$ and data on $n$ and $z_y$, we can back out $\omega_0(n)$.
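
As a minimal sketch of that last step, the parametrization can be inverted as $\omega_0(n) = \omega / (z_y/n)^\delta$. The helper name `back_out_ω0` and all numbers below are illustrative assumptions and are not part of the replication code.

```julia
# Hypothetical helper: invert ω = ω0(n) * (z_y / n)^δ for ω0(n).
back_out_ω0(ω, zy, n, δ) = ω ./ ((zy ./ n) .^ δ)

# Made-up values, for illustration only
ω_obs = [50.0, 80.0, 120.0]
zy    = [400.0, 900.0, 1600.0]
n     = [20.0, 30.0, 40.0]

ω0_benchmark = back_out_ω0(ω_obs, zy, n, 0.0)  # δ = 0: ω0 coincides with ω
ω0_invest    = back_out_ω0(ω_obs, zy, n, 0.2)  # δ > 0: ω0 differs from ω
```

With $\delta = 0$ the baseline wage is just the observed old wage, which is the benchmark case set in the parameter struct.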

#### First, define a few useful functions

In [6]:
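# Function to make a distribution monotonic (non-decreasing).
# Each entry that does not exceed its predecessor is replaced by a linear
# interpolation between the predecessor and the next strictly larger entry.
function make_monotone(dist)
    new = copy(dist)
    ndata = length(dist)
    for i in 2:ndata
        if new[i] <= new[i-1]
            # Linear index of the next entry strictly above new[i-1]
            # (works for vectors and for the 1×N rows passed in below)
            nextind = findnext(>(new[i-1]), vec(new), i + 1)
            if nextind === nothing
                new[i] = new[i-1] # No larger entry remains: hold flat
            else
                next = new[nextind]
                new[i] = (next + new[i-1]*(nextind-i)) / (nextind-i+1)
            end
        end
    end
    return new
end;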

In [7]:
# Function that smooths distributions with a 3-point weighted average
# (weights 0.3, 0.4, 0.3); the two endpoints are left unchanged
# (Same procedure as in Saez)
function smooth_dist(dist, niter)
    old = copy(dist)
    new = copy(dist)
    ndata = length(dist)
    for i in 1:niter
        for j in 2:ndata-1
            new[j] = 0.3*old[j-1] + 0.4*old[j] + 0.3*old[j+1]
        end
        old = copy(new)
    end
    return new
end;
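
To see what one pass of this smoother does, the sketch below applies the same 0.3 / 0.4 / 0.3 weights to a single spike. `smooth_once` is a hypothetical one-iteration stand-in written for this illustration, and the input vector is made up.

```julia
# One pass of the 0.3 / 0.4 / 0.3 weighted average; endpoints are left unchanged.
function smooth_once(v)
    out = copy(v)
    for j in 2:length(v)-1
        out[j] = 0.3 * v[j-1] + 0.4 * v[j] + 0.3 * v[j+1]
    end
    return out
end

spike = [0.0, 0.0, 1.0, 0.0, 0.0]
once  = smooth_once(spike)  # the spike spreads to its two neighbours
twice = smooth_once(once)   # a second pass flattens it further
```

Because the endpoints are never updated, repeated passes pull the interior toward the line between the first and last values, which is worth keeping in mind when `niter` is in the thousands.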

#### Next, compute the distributions

In [8]:
# Find the distribution of ω(n, z_y/n)
# Do this by matching the CDF of ω and the CDF of z_y
# (Minimize the distance between the CDF values)

# Get the difference between the Hzy and Jω
ωzdiff = abs.(repeat(dta.Hzy', ndta, 1) - repeat(dta.Jω, 1, ndta));

# Find the minimum difference indices for each Hzy (columnwise)
# ωinds[i] = argmin_j |Hzy[i] - Jω[j]|
_, ωinds = findmin(ωzdiff, dims = 1);
ωinds = getindex.(ωinds[1,:], 1)';

# Create ω distributions of these minimum differences
ωz = dta.ω[ωinds];
Jωzy = dta.Jω[ωinds];
jωzy = dta.jω[ωinds];
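
The matching step can be checked on a toy example. The sketch below uses broadcasting instead of `repeat` to build the same distance matrix; all values are made up, and the variable names only mirror the ones in the cell above.

```julia
# Toy CDF values (made up): four young-income grid points, three ω grid points
Hzy = [0.05, 0.45, 0.80, 0.95]
Jω  = [0.10, 0.50, 0.90]
ω   = [10.0, 20.0, 30.0]

# Entry (j, i) is |Hzy[i] - Jω[j]|, so each column corresponds to one Hzy value
cdf_dist = abs.(Hzy' .- Jω)

# Row index of the column-wise minimum: the ω grid point with the closest CDF value
_, idx = findmin(cdf_dist, dims = 1)
ωinds = getindex.(vec(idx), 1)

ωz = ω[ωinds]  # ω assigned to each young-income grid point
```

Here `ωinds` is a plain vector rather than a 1×N row, which avoids carrying a transposed shape into later steps; whether that is preferable depends on how downstream code indexes the result.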

In [9]:
# Find the distribution of z_o(z_y)
# Follow a similar procedure as for finding the distribution of ω,
# this time matching the CDFs of z_y and z_o

# Difference between young z CDF and old z CDF
zozdiff = abs.(repeat(dta.Hzy', ndta, 1) - repeat(dta.Hzo, 1, ndta));

# Find the minimum difference indices for each Hzy (columnwise)
# zoinds[i] = argmin_j |Hzy[i] - Hzo[j]|
_, zoinds = findmin(zozdiff, dims = 1);
zoinds = getindex.(zoinds[1,:], 1)';

# Create z distributions of these minimum differences
zoz = dta.z[zoinds];
Hzoz = dta.Hzo[zoinds];
hzoz = dta.hzo[zoinds];


In [10]:
# Replace z_o(z_y) for top incomes z>150k

# Split the z vector at 150,000
# (<= keeps a grid point exactly at the cutoff, so the two pieces cover all of z)
znew1 = dta.z[dta.z.<=150000];
znew2 = dta.z[dta.z.>150000];
nz1 = length(znew1);

# Construct z_o(z_y) incomes above 150,000
zohigh = dtapars.zomin * (znew2 / dtapars.zymin).^(dtapars.αzy/dtapars.αzo);

# Append to incomes below 150,000
zoznew = [zoz[1:nz1]; zohigh];
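
The expression for `zohigh` is what results from equating the two Pareto CDFs, $1-(z_{y,\min}/z_y)^{\alpha_{zy}} = 1-(z_{o,\min}/z_o)^{\alpha_{zo}}$, and solving for $z_o$. The sketch below checks this numerically; the parameter values and the helper names `pareto_cdf` and `zo_of_zy` are assumptions made for illustration (the notebook reads the real parameters from dtapars).

```julia
# Illustrative (made-up) Pareto parameters
zymin, αzy = 20_000.0, 2.0
zomin, αzo = 30_000.0, 1.5

# Pareto CDF with minimum zmin and shape α
pareto_cdf(z, zmin, α) = 1 - (zmin / z)^α

# Old income at the same percentile as young income zy
zo_of_zy(zy) = zomin * (zy / zymin)^(αzy / αzo)

zy = 200_000.0
zo = zo_of_zy(zy)

rank_young = pareto_cdf(zy, zymin, αzy)
rank_old   = pareto_cdf(zo, zomin, αzo)  # matches rank_young up to rounding
```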


In [11]:
# Compute ω_0, make ω_0 monotonic, and smooth ω_0
ω0 = copy(ωz);
ω0 = make_monotone(ωz);
ω0 = smooth_dist(ω0, 2000);

In [13]:
# Compute the ability levels, make monotone, and smooth
# (n = ability)
τ = dta.mtrz./100;
n = (dta.z.^(1/(1+dtapars.e))) .* ((1 .- τ).^(-dtapars.e/(1+dtapars.e)));
n = make_monotone(n');
n = smooth_dist(n', 2000);
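
Algebraically, the expression for `n` above is the inverse of the earnings relation $z = n^{1+e}(1-\tau)^e$. The round trip below illustrates this; the elasticity, tax rate, and function names are made up for the example.

```julia
e = 0.5  # illustrative elasticity, not the value stored in the data

# Same formula as in the cell above, written for scalars
ability(z, τ) = z^(1 / (1 + e)) * (1 - τ)^(-e / (1 + e))

# Earnings relation that the formula inverts
earnings(n, τ) = n^(1 + e) * (1 - τ)^e

n_true, τ = 100.0, 0.3
z = earnings(n_true, τ)
n_back = ability(z, τ)  # recovers n_true up to rounding
```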


In [14]:
# Compute the ability distribution and smooth
fn = diff(dta.Hzy) ./ diff(n);
push!(fn, 0);
fn = smooth_dist(fn, 3000);

# Create the CDF and normalize so the density integrates to 1
Fn = cumul_integrate(n, fn); # Unnormalized CDF
fn = fn/Fn[end]; # Normalize the PDF
Fn = cumul_integrate(n, fn); # Recompute the CDF from the normalized PDF
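
The normalization pattern (integrate, divide by the total mass, integrate again) can be sketched without any packages. `cumtrapz` below is a hypothetical hand-written cumulative trapezoid rule used only for this illustration, and the density is made up.

```julia
# Hypothetical stand-in for a cumulative trapezoid rule
function cumtrapz(x, y)
    out = zeros(length(x))
    for i in 2:length(x)
        out[i] = out[i-1] + 0.5 * (x[i] - x[i-1]) * (y[i] + y[i-1])
    end
    return out
end

n  = collect(range(1.0, 10.0; length = 200))
fn = exp.(-n)          # made-up, unnormalized density

Fn = cumtrapz(n, fn)   # unnormalized CDF
fn = fn / Fn[end]      # rescale so the density integrates to 1
Fn = cumtrapz(n, fn)   # CDF of the normalized density
```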

In [15]:
# Pareto tails for the ability distribution
# Above z = 2,000,000

# Calculate the pareto distribution
haz = (n.*fn)./(1 .- Fn); # Local Pareto parameter n*f(n)/(1-F(n)); equals α for an exact Pareto distribution
pareto_ind = findfirst(dta.z .> 2000000);
pareto_α = haz[pareto_ind];
φ = 1 - Fn[pareto_ind];
pareto_lb = n[pareto_ind] * φ^(1/pareto_α);

# Update the distributions fn, Fn
fn[pareto_ind:end] = pareto_α .* pareto_lb^pareto_α ./ n[pareto_ind:end].^(1+pareto_α);
Fn[pareto_ind:end] = 1 .- (pareto_lb^pareto_α ./ n[pareto_ind:end].^pareto_α);

# Trim the ability distribution at 27,200
ntop = findfirst(n .> 27200) - 1;
n = n[1:ntop];
fnnew = fn[1:ntop];
Fnnew = Fn[1:ntop];
ω0 = ω0[1:ntop];

# Re-normalize distributions
fn = fnnew / Fnnew[ntop];
Fn = Fnnew / Fnnew[ntop];
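
The two identities used for the tail can be verified on an exact Pareto distribution: the ratio $n f(n)/(1-F(n))$ equals the shape parameter, and the lower bound is recovered from a single point as $n_0\,\varphi^{1/\alpha}$ with $\varphi = 1 - F(n_0)$. The parameter values below are made up for the check.

```julia
α, lb = 2.0, 100.0               # made-up tail parameters

f(n) = α * lb^α / n^(1 + α)      # Pareto PDF
F(n) = 1 - (lb / n)^α            # Pareto CDF

n0 = 500.0                       # point at which the tail is attached
haz0 = n0 * f(n0) / (1 - F(n0))  # local Pareto parameter at n0
φ = 1 - F(n0)                    # mass above n0
lb_back = n0 * φ^(1 / α)         # lower bound implied by (n0, φ, α)
```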


In [16]:
# Rebuild the z grid with ntop points, evenly spaced in logs from exp(5) to exp(16.52)
z = zeros(ntop, 1);
for i = 1:ntop
    z[i] = exp(5 + ((i-1) * (11.52/(ntop-1))))
end;
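
For reference, the same grid can be written with `range` and broadcasting. This is only a sketch of an equivalent construction: it returns a plain vector rather than an ntop×1 array, and the grid size 978 is taken from the output shown further down.

```julia
ntop = 978  # grid size reported in the notebook output

# Loop form, as in the cell above
z_loop = [exp(5 + (i - 1) * (11.52 / (ntop - 1))) for i in 1:ntop]

# Equivalent form: points evenly spaced in logs between 5 and 16.52
z_range = exp.(range(5.0, 16.52; length = ntop))

# Consecutive grid points differ by a constant ratio
ratios = z_range[2:end] ./ z_range[1:end-1]
```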

In [19]:
# Define η (from hamiltonian.agedep.m, easier to do here)
ω0prime = diff(ω0) ./ diff(n);
push!(ω0prime, ω0prime[end]);
η = dtapars.d .+ (ω0prime .* n ./ ω0);

978-element Vector{Float64}:
 0.12999643221595142
 0.13093002580318036
 0.13185402090408946
 0.13276853660413762
 0.13367374516959263
 0.13456987262833564
 0.1354571989963407
 0.1363360581516139
 0.13720683735690212
 0.1380699764397559
 0.1389259666376958
 0.13977534912302164
 0.1406187132210153
 ⋮
 1.0978032707906427
 1.0977955398419883
 1.0977870634255364
 1.097777844573296
 1.0977678886049367
 1.097757203095739
 1.0977457978337313
 1.0977336847651533
 1.0977208779297871
 1.0977073933848813
 1.0976932491203388
 1.0969641557622398
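
As a sanity check on the finite-difference construction of η, the sketch below applies the same steps to a made-up power law $\omega_0(n) = 2n^{1.1}$, whose elasticity with respect to $n$ is exactly 1.1. The grid and the functional form are assumptions chosen for illustration.

```julia
n  = collect(100.0:1.0:1000.0)
ω0 = 2.0 .* n .^ 1.1             # made-up power law with elasticity 1.1

# Forward differences, with the last slope repeated to keep the length of n
ω0prime = diff(ω0) ./ diff(n)
push!(ω0prime, ω0prime[end])

δ = 0.0                          # benchmark value
η = δ .+ (ω0prime .* n ./ ω0)    # close to 1.1 at every grid point
```

Forward differences slightly overstate the slope of a convex function, so the computed values sit a little above 1.1 on this grid; a finer grid shrinks the gap.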

In [17]:
# Create a struct of data primitives to return (when modularized)
@with_kw struct Prims_struct
    z
    n
    ω0
    fn
    ntop
    γ
    d
    e
    η
end;

In [18]:
prims = Prims_struct(z, n, ω0, fn, ntop, dtapars.γ, dtapars.d, dtapars.e, η)


Prims_struct
  z: Array{Float64}((978, 1)) [148.4131591025766; 150.17348576416674; … ; 1.4771479259276977e7; 1.4946683593774399e7;;]
  n: Array{Float64}((978,)) [30.263461163541827, 30.51559227444055, 30.767767586748093, 31.02003127691949, 31.27242747156859, 31.52500022271328, 31.77779348321883, 32.0308510825042, 32.284216702574916, 32.5379338544449  …  25574.210424036966, 25749.394175295412, 25925.777917283227, 26103.3698668783, 26282.178296810493, 26462.211535988918, 26643.477969824642, 26825.986040548025, 27009.744247519877, 27194.761147535384]
  ω0: Array{Float64}((978,)) [67.491388, 67.56448299394141, 67.63758654525179, 67.71070722817241, 67.78385365061185, 67.85703447078714, 67.93025841363541, 68.00353428692131, 68.07687099696845, 68.15027756394441  …  37991.02662630184, 38276.713788948335, 38564.54682772191, 38854.54164316284, 39146.714251876445, 39441.080787897605, 39737.65750406192, 40036.46077337941, 40337.50709040601, 40640.8130726083]
  fn: Array{Float64}((978,)) [3.7215242