# Using Parallel Computing for Macroeconomic Forecasting at the Federal Reserve Bank of New York

**Pearl Li** (@pearlzli) <br>
**Federal Reserve Bank of New York** (@FRBNY-DSGE)

June 21, 2017

## Disclaimer

This talk reflects the experience of the author and does not represent an endorsement by the Federal Reserve Bank of New York or the Federal Reserve System of any particular product or service. The views expressed in this talk are those of the author and do not necessarily reflect the position of the Federal Reserve Bank of New York or the Federal Reserve System. Any errors or omissions are the responsibility of the author.

## Outline

1. Overview of DSGE modeling
2. "The forecast step": objectives and challenges
3. Parallelizing the forecast code: <br>
   a. DistributedArrays.jl <br>
   b. `pmap` and blocking
4. Conclusion and next steps

## Overview of DSGE modeling

A DSGE (dynamic stochastic general equilibrium) model is a "micro-founded macro-model", used in both policy and academia for

- Forecasting macroeconomic variables
- Understanding the forces underlying past economic outcomes
- Analyzing the effect of monetary policy

We can represent a DSGE model as a system of two dynamic equations:

- A **transition equation** $$s_t = T(\theta) s_{t-1} + R(\theta) \epsilon_t + C(\theta)$$ expressing how the states $s_t$ evolve over time as a function of past states $s_{t-1}$ and current-period shocks $\epsilon_t$ <br><br>

- A **measurement equation** $$y_t = Z(\theta) s_t + D(\theta)$$ mapping states $s_t$ to observables (data) $y_t$

The state-space matrices $T$, $R$, $C$, $Z$, and $D$ are functions of the time-invariant parameters $\theta$.

| Notation     | Name        | Examples                                   |
| ------------ | ----------- | ------------------------------------------ |
| $s_t$        | States      | Output growth, inflation                   |
| $y_t$        | Observables | Real GDP growth, core PCE inflation        |
| $\epsilon_t$ | Shocks      | Productivity shock, aggregate demand shock |
| $\theta$     | Parameters  | Household discount rate, inflation target  |
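
To make the two equations concrete, here is a minimal sketch that iterates them forward for a toy system with two states, one shock, and one observable. The matrix values and the `simulate` helper are invented for illustration; they are not taken from DSGE.jl or from any estimated model.

```julia
# Toy state-space matrices (illustrative values only, not from a real model)
T = [0.9 0.0; 0.1 0.5]           # transition matrix T(θ), nstates × nstates
R = reshape([1.0, 0.5], 2, 1)    # shock loadings R(θ), nstates × nshocks
C = zeros(2)                     # constant in the transition equation
Z = [1.0 0.0]                    # measurement matrix Z(θ), nobs × nstates
D = [0.5]                        # constant in the measurement equation

# Iterate the transition and measurement equations forward from s0,
# given an nshocks × nperiods matrix of shock realizations
function simulate(T, R, C, Z, D, s0, shocks)
    nperiods = size(shocks, 2)
    states = zeros(length(s0), nperiods)
    obs    = zeros(size(Z, 1), nperiods)
    s_prev = s0
    for t in 1:nperiods
        s_t = T * s_prev + R * shocks[:, t] + C   # transition equation
        states[:, t] = s_t
        obs[:, t]    = Z * s_t + D                # measurement equation
        s_prev = s_t
    end
    return states, obs
end

# A single unit shock in period 1, then no further shocks
shocks = [1.0 0.0 0.0]
states, obs = simulate(T, R, C, Z, D, zeros(2), shocks)
```

Passing the shocks in explicitly keeps the sketch deterministic; in practice they would be random draws.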

[DSGE.jl](https://github.com/FRBNY-DSGE/DSGE.jl) is a package developed by the New York Fed's DSGE team for estimating and forecasting DSGE models in Julia.

(See Erica Moszkowski's [talk](https://www.youtube.com/watch?v=Vd2LJI3JLU0) at JuliaCon 2016.)

DSGE.jl centers around a **model object**

- Each model is a concrete subtype of `AbstractModel`
- Model object stores information about parameters, states, computational settings, and more
- Model-agnostic methods are defined for `AbstractModel`s: e.g. `optimize`
- Then use **method dispatch** to call model-specific functions:
  + e.g. `optimize` calls `measurement` to get measurement matrices $Z$ and $D$ for a particular model
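
The dispatch pattern described above can be sketched with two toy models. Everything here (`AbstractToyModel`, `ToyModelA`, `ToyModelB`, `predict_observables`, and the matrix values) is hypothetical and only mimics the idea; it is not DSGE.jl's actual type hierarchy. The sketch uses Julia 0.6+ syntax (`abstract type`, `struct`), whereas the cells in this notebook use the older `type` keyword.

```julia
# Hypothetical toy hierarchy, standing in for AbstractModel and its subtypes
abstract type AbstractToyModel end
struct ToyModelA <: AbstractToyModel end
struct ToyModelB <: AbstractToyModel end

# Model-specific methods: each model supplies its own measurement matrices Z and D
measurement(::ToyModelA) = ([1.0 0.0], [0.5])
measurement(::ToyModelB) = ([1.0 1.0], [0.0])

# Model-agnostic method: written once for the abstract type, it relies on
# dispatch to pick the right `measurement` method for the model it is given
function predict_observables(m::AbstractToyModel, s)
    Z, D = measurement(m)
    return Z * s + D
end

s = [2.0, 3.0]
y_a = predict_observables(ToyModelA(), s)
y_b = predict_observables(ToyModelB(), s)
```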

A (stripped-down) concrete subtype of `AbstractModel`

In [None]:
type Model990{T} <: AbstractModel{T}
    # Time-invariant parameters
    parameters::ParameterVector{T}

    # Dictionaries mapping state/shock/etc. names to indices
    # e.g. endogenous_states[:π_t] = 2 means that π_t is the second state in s_t
    endogenous_states::OrderedDict{Symbol,Int}
    exogenous_shocks::OrderedDict{Symbol,Int}
    observables::OrderedDict{Symbol,Int}
    equilibrium_conditions::OrderedDict{Symbol,Int}

    # Model specification and subspecification
    spec::String
    subspec::String

    # Computational settings
    settings::Dict{Symbol,Setting}
end

We are interested in

- **Estimation step:** sample from the posterior distribution $\mathbb{P}(\theta\ |\ y_{1:T})$ of the parameters $\theta$
  + Data $\to$ distribution of parameters
  + Already done!
- **Forecast step:** use the estimated parameter draws to forecast, compute impulse responses and shock decompositions, and more
  + Distribution of parameters $\to$ distribution of future states (and more)
  + Focus of this talk

## "The forecast step": objectives and challenges

In the estimation step, we generated a large number of parameter draws from their posterior distribution. For each draw $\theta^{(j)}$, we might want to compute the following *products*:

- **Smoothed history:** Estimate historical states $s_{t|T}$ (where $T$ is the last data period and $t \le T$)
- **Forecast:** Iterate the state space forward to get future states $s_{T+h|T}$
- **Shock decomposition:** Decompose $s_{t|T}$ into a weighted sum of accumulated shocks $\epsilon^{(i)}_{1:t|T}$ (where $i$ indexes the particular shock, e.g. productivity)
- **Impulse response:** Compute $\frac{\partial s_{1:H}}{\partial \epsilon^{(i)}_1}$, the response of states to a shock $\epsilon^{(i)}$ at time 1
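
As a rough illustration of the forecast and impulse response products, the sketch below works directly with the transition equation. It makes simplifying assumptions: future shocks are set to their mean of zero, so the forecast is the expected path, and the matrices are toy values. The helper names `forecast_states` and `impulse_response` are made up here and are not the DSGE.jl functions.

```julia
# Toy transition-equation matrices (illustrative values only)
T = [0.9 0.0; 0.1 0.5]
R = reshape([1.0, 0.5], 2, 1)
C = [0.1, 0.0]

# Expected path of future states: iterate s_{T+h|T} = T * s_{T+h-1|T} + C,
# starting from the last smoothed state s_T, with all future shocks at zero
function forecast_states(T, C, s_T, horizon)
    states = zeros(length(s_T), horizon)
    s_prev = s_T
    for h in 1:horizon
        s_prev = T * s_prev + C
        states[:, h] = s_prev
    end
    return states
end

# Response of the states at horizons 1..H to a unit realization of shock i
# at time 1: column h is T^(h-1) * R[:, i]
function impulse_response(T, R, i, horizon)
    irf = zeros(size(T, 1), horizon)
    resp = R[:, i]
    for h in 1:horizon
        irf[:, h] = resp
        resp = T * resp
    end
    return irf
end

fcast = forecast_states(T, C, [1.0, 0.5], 4)
irf   = impulse_response(T, R, 1, 4)
```

The constant $C$ drops out of the impulse response because it does not depend on the shock.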

Figure 1: Forecast of real GDP growth (fan chart not included)

Figure 2: Shock decomposition of real GDP growth

<img src="shockdec.jpg" width=750px>

Want to minimize

1. Computational time
   + "Whole shebang" (three conditional types, all products) took ~70 minutes using our MATLAB code
<br>
2. Memory usage
   + (e.g. for computing smoothed historical states) 229 quarters $\times$ 84 states $\times$ 20,000 draws

Naive implementation: for loop

In [None]:
for θ_j in parameter_draws
    # Compute state space matrices under θ_j
    update!(model, θ_j)
    system = compute_system(model)
        
    # Estimate historical states
    kal = filter(model, data, system)
    histstates, histshocks, histpseudo, s_T = 
        smooth(model, data, system, kal)
        
    # Forecast future states
    forecaststates, forecastobs, forecastpseudo, forecastshocks = 
        forecast(model, system, s_T)

    ...
        
    # Write forecast outputs
    write_forecast_outputs(...)
end

## Parallelizing the forecast code

Preview of results: benchmark times relative to MATLAB, normalized so that MATLAB = 1.00 (smaller is better)

| Test                                         | MATLAB (2014a) | Julia (0.4.5) |
| -------------------------------------------- | -------------- | ------------- |
| Smoothing                                    | 1.00           | 0.38          |
| Forecasting                                  | 1.00           | 0.24          |
| All forecast outputs (modal parameters)      | 1.00           | 0.10          |
| **All forecast outputs (full distribution)** | 1.00*          | **0.22**      |

*Run in MATLAB 2009a

Two approaches considered

1. Distributed storage, i.e. using [DistributedArrays.jl](https://github.com/JuliaParallel/DistributedArrays.jl)
2. `pmap` and "blocking"

DistributedArrays.jl

- Solution for storing arrays too large for one machine
- `DArray` storage distributed across multiple processes
- Each process operates on the part of the array it owns $\implies$ natural parallelization

In [3]:
# Add processes and load package on all processes
worker_procs = addprocs(5)
@everywhere using DistributedArrays

# Initialize DArray, distributing along the second dimension across all 
# 5 processes
arr_size = (2, 25, 2)
arr_div  = [1, 5, 1]
arr = drand(arr_size, worker_procs, arr_div)
nothing

In [4]:
# Query a worker process for its local indices into arr
worker_id = worker_procs[1]
remotecall_fetch(localindexes, worker_id, arr)

(1:2,1:5,1:2)

In [5]:
# Return worker's local array
remotecall_fetch(localpart, worker_id, arr)

2×5×2 Array{Float64,3}:
[:, :, 1] =
 0.57127  0.39564    0.209463  0.440857  0.56174
 0.75199  0.0457825  0.842128  0.793163  0.88565

[:, :, 2] =
 0.598975  0.318283  0.404738   0.0311527  0.116038
 0.146935  0.57351   0.0344157  0.212597   0.429647

In [6]:
# Remove worker processes
rmprocs(worker_procs)

:ok

Using `DArray`s in the forecast step

- Distribute parameter draws among worker processes
- Each process will compute all outputs for the draws it owns
- Use both:
  + Lower-level functions (e.g. `smooth`) which operate on one draw
  + Higher-level functions (`smooth_all`) which, given many draws, call lower-level function on each

`DArray` implementation: all functions have `DArray` input arguments and return `DArray`s

In [None]:
worker_procs = addprocs(50)

# Load draws and compute systems for each draw θ_j
parameter_draws = load_draws(model, worker_procs)
systems = prepare_systems(model, parameter_draws)

# Estimate historical states
kals = filter_all(model, data, systems)
histstates, histshocks, histpseudo, s_Ts =
    smooth_all(model, data, systems, kals; procs = worker_procs)

# Forecast future states
forecaststates, forecastshocks, forecastobs, forecastpseudo =
    forecast_all(model, systems, s_Ts, procs = worker_procs)

...

# Write forecast outputs
write_forecast_outputs(...)

rmprocs(worker_procs)

Disadvantage #1: draw assignment 

- Must explicitly assign draws to processes
- `DArray`s must be divided equally among processes
- What if number of draws isn't divisible by number of processes? Have to throw out remainder
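
A quick worked example of the remainder problem. The process count of 48 is hypothetical, chosen only because it does not divide 20,000 evenly:

```julia
ndraws = 20_000
nprocs = 48    # hypothetical process count that does not divide ndraws

# Equal-sized split: each process gets the quotient, the remainder is left over
draws_per_proc, leftover = divrem(ndraws, nprocs)

# Number of draws actually used if the remainder is thrown out
ndraws_used = draws_per_proc * nprocs
```

Here each process would own 416 draws and 32 draws would be discarded, leaving 19,968 in use.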

Disadvantage #2: unwieldy `DArray` construction

```
DArray(init, dims[, procs, dist])
```

- `init` function maps a tuple of local indices to the local part of the array
- Can only initialize one `DArray` for each call to the `init` function
- But what we want for `smooth_all` is to return four `DArray`s: `histstates`, `histshocks`, `histpseudo`, and `s_Ts`
- Result: ugly code...

In [None]:
# Initialize one big DArray with all outputs
out = DArray((ndraws, nstates + nshocks + npseudo + 1, nperiods), 
             procs, [nprocs, 1, 1]) do I
    
    # Initialize local part of array
    localpart = zeros(map(length, I)...)
    
    # Determine which draws i belong to this process
    draw_inds = first(I)
    ndraws_local = length(draw_inds)

    for i in draw_inds
        # Call smooth on draw i 
        states, shocks, pseudo, s_T = smooth(model, data, systems[i], kals[i])

        # Compute index of draw i into local array
        i_local = mod(i-1, ndraws_local) + 1

        # Assign smooth outputs to local array
        localpart[i_local, states_range,  :] = states
        localpart[i_local, shocks_range,  :] = shocks
        localpart[i_local, pseudo_range,  :] = pseudo
        localpart[i_local, statesT_range, states_range] = s_T
    end
        
    return localpart    
end

In [None]:
# Convert SubArrays to DArrays
states = convert(DArray, out[1:ndraws, states_range, 1:nperiods])
shocks = convert(DArray, out[1:ndraws, shocks_range, 1:nperiods])
pseudo = convert(DArray, out[1:ndraws, pseudo_range, 1:nperiods])
s_Ts = DArray((ndraws,), procs, [nprocs]) do I
    Vector{S}[convert(Array, slice(out, i, statesT_range, states_range)) for i in first(I)]
end

Figure 3: `smooth_all` result before indexing out `SubArray`s

<img src="darray.gif">

Disadvantage #3: computational time

- Parameter draws live on the processes they've been assigned to; difficult to reallocate
- Sometimes some compute nodes are busier than others
- Bottleneck effect: since `smooth_all` must return before `forecast_all` can begin, each stage takes as long as its **slowest process**

`pmap` + blocking

- Divide parameter draws into "blocks" (typically 20,000 draws into 20 blocks)
- Read in one block at a time
- For each block, parallel map `forecast_one_draw` (computes all forecast outputs for a single draw) over the draws in that block
- When `pmap` returns, write current block's results to disk
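
The first step, dividing draws into blocks, might look something like the sketch below. `block_ranges` is a hypothetical helper written for this illustration; it is not the DSGE.jl function that computes block indices. Unlike the equal-split requirement of `DArray`s, it spreads any remainder over the first few blocks instead of discarding it.

```julia
# Hypothetical helper: split draw indices 1:ndraws into nblocks contiguous
# ranges whose lengths differ by at most one
function block_ranges(ndraws::Int, nblocks::Int)
    base, extra = divrem(ndraws, nblocks)
    ranges = UnitRange{Int}[]
    start = 1
    for b in 1:nblocks
        # The first `extra` blocks each absorb one leftover draw
        len = base + (b <= extra ? 1 : 0)
        push!(ranges, start:(start + len - 1))
        start += len
    end
    return ranges
end

blocks = block_ranges(20_000, 20)   # 20 blocks of 1,000 draws each
uneven = block_ranges(10, 3)        # 1:4, 5:7, 8:10
```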

`pmap` and blocking implementation

In [None]:
# Get indices of draws corresponding to each block
block_indices = forecast_block_inds(model)
nblocks = length(block_indices)

for block = 1:nblocks
    # Load draws for this block
    parameter_draws = load_draws(model, block)

    # Compute forecast outputs for each draw in block
    forecast_outputs = pmap(θ_j -> forecast_one_draw(model, θ_j, data),
                            parameter_draws)

    # Write results for this block
    write_forecast_outputs(model, forecast_outputs, 
                           block_number = block)
end

In [None]:
function forecast_one_draw(model::AbstractModel, θ_j::Vector{Float64},
                           data::Matrix{Float64})
    # Compute state space matrices under θ_j
    update!(model, θ_j)
    system = compute_system(model)
    
    # Estimate historical states
    kal = filter(model, data, system)
    histstates, histshocks, histpseudo, s_T = 
        smooth(model, data, system, kal)

    # Forecast future states
    forecaststates, forecastobs, forecastpseudo, forecastshocks =
        forecast(model, system, s_T)

    ...
    
    # Assign results to dictionary to be returned
    forecast_outputs = Dict{Symbol, Array{Float64}}()
    ...
    return forecast_outputs
end

Advantages

- `pmap` handles assigning draws to worker processes automatically (`DArray`s disadvantage #1)
- Don't need to implement functions like `smooth_all` which handle multiple draws (disadvantage #2)
- Takes advantage of the independence of draws: a busy worker simply processes fewer draws instead of holding up the others (disadvantage #3)
- Computing in blocks reduces memory usage when `pmap` returns results to originator process
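
The overall pattern can be tried out with a self-contained toy version. In this sketch `toy_forecast` and `run_blocks` are hypothetical stand-ins (for `forecast_one_draw` and the block loop, respectively), and "writing to disk" is replaced by keeping one summary number per block. It assumes Julia 0.7 or later, where `pmap` lives in the `Distributed` standard library; with no workers added, `pmap` simply runs on the main process.

```julia
using Distributed   # Julia 0.7+; in Julia 0.5/0.6 `pmap` is available from Base

# Hypothetical stand-in for `forecast_one_draw`: one small Dict per draw.
# If workers were added with `addprocs`, this definition would need
# `@everywhere` so that every worker process knows about it.
toy_forecast(θ) = Dict(:mean => sum(θ) / length(θ))

function run_blocks(draws, blocks)
    block_means = Float64[]
    for block in blocks
        # Only one block's worth of per-draw results is held in memory at a time;
        # pmap hands draws to whichever process is free
        outputs = pmap(toy_forecast, draws[block])

        # Stand-in for writing the block's results to disk
        push!(block_means, sum(o[:mean] for o in outputs) / length(outputs))
    end
    return block_means
end

# 10 toy "draws" of 3 parameters each, processed in 2 blocks of 5
draws = [fill(Float64(j), 3) for j in 1:10]
block_means = run_blocks(draws, [1:5, 6:10])
```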

Take advantage of [writing to a subset](https://github.com/JuliaIO/HDF5.jl/blob/master/doc/hdf5.md#reading-and-writing-data) of an HDF5 dataset:

In [None]:
function write_forecast_block(file::JldFile, arr::Array{Float64}, 
                              block_indices::Range{Int})

    # Access JLD file's underlying HDF5 file
    pfile = file.plain
    
    # Get reference to existing HDF5 dataset named "arr"
    dataset = HDF5.d_open(pfile, "arr")
    
    # Write to subset of dataset corresponding to given block indices
    ndims = length(size(dataset))
    dataset[block_indices, fill(Colon(), ndims-1)...] = arr
end

Result: more natural, readable, efficient, and **beautiful** code

## Conclusion and next steps

Long-term goals

- Make DSGE.jl less specific to us (the New York Fed)
- Move to Julia 0.6 and 1.0
- Be responsible contributors to the Julia package ecosystem

Ongoing work

- Forecasting under alternative monetary policy rules
- Forecast evaluation and decomposing changes in forecasts
- Estimating nonlinear models using the [tempered particle filter](https://web.sas.upenn.edu/schorf/files/2016/10/HS-TemperedParticleFilter-PaperAppendix-1mlvlvu.pdf) (Herbst & Schorfheide 2017)

Acknowledgments

- New York Fed DSGE team:
  + Marco Del Negro, Marc Giannoni, Abhi Gupta, Erica Moszkowski, Sara Shahanaghi, Micah Smith

- QuantEcon collaborators:
  + Zac Cranko, Spencer Lyon, John Stachurski, Pablo Winant

## Thank you! Questions?