(time-nowcasting)=
# Introduction to Nowcasting

Nowcasting is the prediction of economic indicators in the present, the very near future, and the very recent past. It is usually needed when there is a lag in collecting and compiling information on those indicators, so only partial information is available. For GDP, it can take over a year after the end of the so-called reference period for the final version of the data to be released. You'll most often hear about nowcasting in the context of the weather and GDP, two variables that people care deeply about!

For most of this chapter, we'll code up a nowcast, following an example based on Federal Reserve Board economist Chad Fulton's [notes](http://www.chadfulton.com/topics/statespace_large_dynamic_factor_models.html).

Let's import the packages we'll need for this chapter:

In [None]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.graphics import tsaplots
from rich import inspect
from numpy.random import Generator, PCG64
import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.dates as mdates
import warnings

# Plot settings
plt.style.use(
    "https://github.com/aeturrell/coding-for-economists/raw/main/plot_style.txt"
)
mpl.rcParams.update({"lines.linewidth": 1.2})
# Set max rows displayed for readability
pd.set_option("display.max_rows", 8)

# Set seed for random numbers
seed_for_prng = 78557
prng = Generator(PCG64(seed_for_prng))
warnings.filterwarnings('ignore')

## Nowcasting

We need to begin with some definitions. A *reference period* is the time period of interest to the nowcast, i.e. the time period for which you would like to estimate a data point. The *data vintage* is the date on which a particular version of the data was issued. A nowcast can be made with a vintage from before, during, or after the reference period, and for data that are revised, new vintages can keep arriving long *after* the reference period has ended. Finally, the *frequency* is how often reference periods occur. For nowcasting GDP, the frequency is typically quarterly.

There are three parts to a typical nowcast:

1. the forecast (an estimate of the data point made shortly *before* the reference period)
2. the nowcast (an estimate made *during* the reference period)
3. the backcast (an estimate made *after* the reference period, but before the official data are released)
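To make the three sub-periods concrete, here is a minimal sketch that labels an estimate by comparing the date it is made on with a quarterly reference period. The helper `classify_estimate` is hypothetical (it is not from Fulton's notes or any library); it only relies on pandas' `Period.start_time` and `Period.end_time`.

```python
import pandas as pd


def classify_estimate(as_of: str, reference_period: str) -> str:
    """Hypothetical helper: label an estimate made on `as_of` for a quarterly reference period."""
    period = pd.Period(reference_period, freq="Q")
    date = pd.Timestamp(as_of)
    if date < period.start_time:
        return "forecast"  # the reference period has not started yet
    if date <= period.end_time:
        return "nowcast"  # we are inside the reference period
    return "backcast"  # the reference period is over but data may not be out yet


for as_of in ["2020-03-15", "2020-05-01", "2020-08-10"]:
    print(as_of, classify_estimate(as_of, "2020Q2"))
```

This sketch ignores the fact that backcasting stops once the first official estimate is published; a fuller version would also need the release date of the target series.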


In [None]:
# TODO remove input

line_pos = [0, 3, 9, 12]
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
labels = [r"$t-1$", r"$t$", r"$t+1 \longrightarrow$"]

text_height_1 = 5
text_height_2 = 2

fig, ax = plt.subplots(figsize=(5, 2))
ax.set_xlim(line_pos[0], line_pos[-1])
ax.set_ylim(line_pos[0], 6)
ax.axvline(3, color="k", linestyle=":")
ax.axvline(9, color="k", linestyle=":")
ax.annotate("Forecast", xy=(1.5, text_height_1), ha="center")
ax.annotate("True Nowcast", xy=(6, text_height_1), ha="center")
ax.annotate("Backcast", xy=(10.5, text_height_1), ha="center")
ax.annotate("Reference period", xy=(6, text_height_2), ha="center")
ax.axvspan(xmin=line_pos[0], xmax=line_pos[1], facecolor=colors[0], alpha=0.3)
ax.axvspan(xmin=line_pos[1], xmax=line_pos[2], facecolor=colors[1], alpha=0.3)
ax.axvspan(xmin=line_pos[2], xmax=line_pos[3], facecolor=colors[2], alpha=0.3)
ax.set_xticks(line_pos[1:])
ax.set_xticklabels(labels)
ax.set_yticklabels([])
ax.set_yticks([])
ax.set_xlabel("Time (reference period)")
ax.spines['right'].set_visible(True)
ax.spines['top'].set_visible(True)
ax.set_title("The different periods of a nowcast", fontsize=10)
plt.show()

In the figure above, the same model is used in all three sub-periods; the only difference is whether the reference period has begun or ended yet.

Nowcasting is, in many ways, more complicated than forecasting. With forecasting, the *information set*, $\Omega$, that you're using is often fixed at a point in time (say, at time $t-1$), and you plug that information into a model, $f$, to try to predict the value of your target variable, $y$, at time $t$, i.e. $y_{t}$. With nowcasting, the problem becomes *dynamic*. We could be at $t-1$, $t$, or even $t+1$ (or anywhere in between), but we still want to use the *same* model to nowcast the value of $y$ at $t$! This is a tall order for any model: it has to cope with an information set that changes over time.
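One compact way to write this difference down, using the symbols above plus a vintage date $v$ (a notation introduced here for illustration), is:

$$
\underbrace{\hat{y}_{t} = f\left(\Omega_{t-1}\right)}_{\text{forecast}}
\qquad \text{versus} \qquad
\underbrace{\hat{y}_{t \mid v} = f\left(\Omega_{v}\right)}_{\text{nowcast}}, \quad \text{for any } v \text{ before, during, or after } t,
$$

where $\Omega_{v}$ is everything that has been published up to date $v$. Assuming nothing is ever discarded, the information sets are nested, $\Omega_{v} \subseteq \Omega_{v'}$ whenever $v \le v'$, so the same $f$ sees progressively more data as $v$ moves from $t-1$ towards $t+1$.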

Nowcasts should be able to be constructed *whenever* new data are available, that is, whenever the information set, $\Omega$, is updated. This could happen several times during the nowcasting period (in the forecast, nowcast, or backcast sub-periods).
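A minimal sketch of "re-run the nowcast whenever $\Omega$ changes", assuming a hypothetical log of releases (the series names, reference periods, and dates below are made up for illustration): the information set at a given date is just the set of rows published on or before that date.

```python
import pandas as pd

# A hypothetical release log: each row is one published observation
releases = pd.DataFrame(
    {
        "series": ["gdp", "survey", "sales", "survey", "sales", "survey"],
        "reference_period": ["2019Q4", "2020-04", "2020-04", "2020-05", "2020-05", "2020-06"],
        "release_date": pd.to_datetime(
            ["2020-03-31", "2020-04-30", "2020-05-11", "2020-05-31", "2020-06-01", "2020-06-30"]
        ),
    }
)


def information_set(releases: pd.DataFrame, as_of) -> pd.DataFrame:
    """Rows of `releases` published on or before the date `as_of`."""
    return releases[releases["release_date"] <= pd.Timestamp(as_of)]


# Step through each day and flag the days on which the information set grows,
# i.e. the days on which a (hypothetical) nowcast would be updated
previous_size = 0
for as_of in pd.date_range("2020-04-01", "2020-06-30", freq="D"):
    omega = information_set(releases, as_of)
    if len(omega) > previous_size:
        print(f"{as_of.date()}: update nowcast using {len(omega)} observations")
        previous_size = len(omega)
```

Storing the release date alongside each observation is what makes it possible to reconstruct exactly what was known at any past date, which is also what you need to evaluate a nowcasting model fairly.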

Nowcasts have to deal with two particular challenges:

- Nowcasts should be able to deal with mixed-frequency data, so that they can use information that is released between and within reference periods (for example, monthly surveys when the target, GDP, is quarterly).
- Nowcasts should be able to deal with "ragged edges" in the data: different variables in the information set have different release dates and publication lags, so an information set collected at any point in time will inevitably have some of the most recent observations missing.
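Both challenges show up as missing values once the data are lined up in a single table. The sketch below assumes made-up publication lags and uses simulated numbers: every series is stored at the highest (monthly) frequency, quarterly GDP only gets a value in the last month of each quarter, and observations that would not yet be published at a June 2020 vintage are blanked out.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # simulated values, for illustration only
months = pd.period_range("2019-10", "2020-06", freq="M")

# Store every series at the highest (monthly) frequency
panel = pd.DataFrame(
    {name: rng.normal(size=len(months)) for name in ["gdp", "survey", "sales"]},
    index=months,
)

# Mixed frequency: quarterly GDP is only recorded in the last month of each quarter
panel.loc[panel.index.month % 3 != 0, "gdp"] = np.nan

# Ragged edge: hypothetical publication lags (in months) as seen from a June 2020 vintage
vintage = pd.Period("2020-06", freq="M")
publication_lags = {"survey": 1, "sales": 2, "gdp": 4}
for name, lag in publication_lags.items():
    panel.loc[panel.index > vintage - lag, name] = np.nan

print(panel.round(2))

# The latest available observation differs by series: that is the ragged edge
last_obs = panel.apply(lambda col: col.last_valid_index())
print(last_obs)
```

Holding everything at monthly frequency with gaps, rather than aggregating, is the layout that state-space methods such as the dynamic factor model used later in this chapter can work with directly.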

As a concrete example of the new data that arrive while a nowcast is in operation (and of these two challenges), imagine we are trying to nowcast quarterly GDP, that every month there is a survey of growth expectations, and that every three weeks, on a Monday, there is a release on sales (which has some predictive power for GDP). Now imagine that we want to make a nowcast of Q2 GDP.

In [None]:
# TODO remove input

# Use unambiguous ISO dates
time_min = "2019-11-30"
periods = 50
weekly_range = pd.date_range("2020-01-01", freq="W", periods=periods)
months = mdates.MonthLocator(interval=1)
months_fmt = mdates.DateFormatter('%b')

weeks = mdates.WeekdayLocator(byweekday=mdates.MO)

# Start dates of Q1-Q4 2020, as timestamps so they sit correctly on the date axis
date_lines = pd.date_range("2020-01-01", freq="QS", periods=4)

locs_release_sales = pd.date_range(time_min, freq="W-MON", periods=periods)
locs_survey = pd.date_range(time_min, freq=pd.offsets.MonthEnd(), periods=periods)

text_height = 7

fig, ax = plt.subplots(figsize=(7, 2))
ax.set_xlim(weekly_range.min()-pd.DateOffset(months=1), weekly_range.max())
ax.set_ylim(0, 8)
for date_val in date_lines:
    ax.axvline(date_val, color="k", linestyle=":", alpha=0.3)
annotations = ["  Q1", "  Q2 (ref. period)", "  Q3", "  Q4"]
for i, tation in enumerate(annotations):
    ax.annotate(tation, xy=(date_lines[i], text_height), ha="left", fontsize=9)
ax.annotate("  (Q1 GDP\n  1st estimate released)", xy=(date_lines[2], 6), ha="left", va="center", fontsize=7)
ax.annotate("  (Q2 GDP 1st estimate released)", xy=(date_lines[3], 6), ha="left", va="center", fontsize=7)
for val in locs_release_sales[::3]:
    ax.annotate("Sales Release", xy=(val, 0), xytext=(val, 1), fontsize=6, rotation=90, ha="center")
for val in locs_survey:
    ax.annotate("Survey", xy=(val, 0), xytext=(val, 3), fontsize=6, rotation=90, ha="center")
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.xaxis.set_major_locator(months)
ax.xaxis.set_major_formatter(months_fmt)
ax.xaxis.set_minor_locator(weeks)
ax.yaxis.set_ticks([])
ax.yaxis.set_ticklabels([])
ax.tick_params(axis='x', which="minor", bottom=True)
ax.set_title("An example of high-frequency information releases relevant to a GDP nowcast", fontsize=10)
plt.show()

In the example above there are three time series, each with a different frequency: the first estimates of GDP (quarterly, released with a lag of roughly one quarter), which we might use as an autoregressive term in a model; the sales releases, which come out every three weeks on a Monday; and the monthly survey releases. (In the diagram, the minor tick marks denote Mondays.)
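The release calendar in this example can be generated with pandas date offsets. This is a sketch under the example's assumptions (sales every third Monday starting on Monday 6 January 2020, a survey at each month end); `releases_within` is a hypothetical helper that picks out the releases landing inside the Q2 2020 reference period.

```python
import pandas as pd

# Sales: every third Monday, starting on Monday 6 January 2020
sales_releases = pd.date_range("2020-01-06", "2020-12-31", freq=pd.offsets.Week(3, weekday=0))
# Survey: the last day of each month
survey_releases = pd.date_range("2020-01-01", "2020-12-31", freq=pd.offsets.MonthEnd())

reference_period = pd.Period("2020Q2", freq="Q")


def releases_within(dates: pd.DatetimeIndex, period: pd.Period) -> pd.DatetimeIndex:
    """Hypothetical helper: release dates that fall inside the reference period."""
    return dates[(dates >= period.start_time) & (dates <= period.end_time)]


for name, dates in [("sales", sales_releases), ("survey", survey_releases)]:
    inside = releases_within(dates, reference_period)
    print(f"{name} releases during {reference_period}: {[d.date() for d in inside]}")
```

Because the two calendars don't line up, the number and timing of new observations within the quarter differ by series, which is exactly why a nowcast has to be re-run at irregular intervals.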

A good nowcasting model needs to be able to incorporate all of the information in the example above, *and* be able to run at any point in time shown.

There is a range of methods for nowcasting, but the most popular broad classes are probably:

- *bridge models*, which use timely indicators to forecast a target variable. They create a "bridge" between higher-frequency series and a lower-frequency target by aggregating the high-frequency indicators to the target's frequency (for example, averaging three monthly values into one quarterly value) and then regressing the target on the aggregated indicators. Months of the current quarter that have not yet been published are usually filled in with forecasts from auxiliary models, so bridge models tend to produce iterated forecasts.
- *MIDAS*, or mixed-data sampling, regressions, which produce direct multi-step forecasts, weight the lags of the high-frequency indicators using a parsimonious lag polynomial (for example, an exponential Almon polynomial), and can include leads of the high-frequency indicators (observations from within the current reference period). There is typically no explicit time aggregation.
- *dynamic factor models*, which suppose that a small number of unobserved "factors" can explain a substantial portion of the variation and dynamics in a larger number of observed variables. Most usefully for nowcasting, lower-frequency variables can be estimated from the high-frequency indicators, and because these models are usually estimated in state-space form with the Kalman filter, missing observations such as ragged edges are handled naturally.


For more on the differences between MIDAS and bridge equations, see Schumacher (2016), "A comparison of MIDAS and bridge equations", *International Journal of Forecasting*.