*This study was conducted for skills demonstration purposes only*

# **Forecasting the UK Construction Sector with Macroeconomic Indicators**
# Section 5. Modeling

This section builds on the Exploratory Data Analysis (EDA) insights to develop predictive models for forecasting UK construction sector trends from macroeconomic indicators. Using the correlations and time lags identified there, we implement lagged regression and Vector Autoregression (VAR) models to capture the dynamic relationships between macroeconomic variables (e.g., GDP, inflation) and construction variables (construction output, material prices, new contracts). The models account for black swan events (e.g., the 2008 crisis, Brexit, COVID-19) via dummy variables, with the aim of producing accurate forecasts and addressing the research questions on predictive power and lagged effects.

## Research Questions

1. How do construction-related variables (e.g., output, material costs, new contracts) correlate with economic indicators (e.g., GDP growth, interest rates, inflation, employment rates)?

2. Can macroeconomic indicators predict construction trends?

3. Do these macroeconomic indicators impact construction activity immediately, or with a time lag?
<br> If so, what is the typical delay between an economic change and a response in construction output or material prices?

4. Can macroeconomic indicators be used to accurately forecast future construction trends?
<br> How effective are models such as lagged regression or VAR in making such predictions?

5. How have so-called 'black swan' events (e.g., Brexit, COVID-19) influenced the construction industry?

## Suitable models and techniques review

To address the research questions, models that handle time-series data and lagged relationships are needed.

- **Cross-Correlation Analysis**
<br>Identifies the lag at which two time-series (e.g., GDP and construction output) exhibit the strongest correlation.
<br>Computes the correlation coefficient between a macroeconomic indicator and a construction indicator at various lag lengths (e.g., 0 to 12 months).

- **Engle-Granger cointegration test**
<br>Checks whether two non-stationary time series are linked by a stable long-term relationship.
<br>Regresses one series on the other, then tests the residuals for stationarity (using the ADF test).

- **Granger Causality Test**
<br>Tests whether one time-series (e.g., GDP) can predict another (e.g., construction output) at specific lags.
<br>Assesses whether lagged values of one variable improve predictions of another, indicating predictive (Granger) causality and the relevant lag length.

- **Lagged Regression**
<br>Suitable for capturing the effect of lagged macroeconomic variables on construction indicators.
<br>Can be used for prediction with time lags.

- **Vector Autoregression (VAR) with Lag Selection**
<br>Models multivariate time-series and identifies optimal lags for all variables simultaneously.
<br>VAR models each variable as a function of its own lags and lags of other variables, with lag length determined by criteria like AIC or BIC.

- **Distributed Lag Models (DLM)**
<br>Explicitly models the effect of a predictor’s lagged values on the dependent variable.
<br>Regresses a construction indicator (e.g., material prices) on multiple lagged values of a macroeconomic indicator (e.g., CPIH).

- **ARIMAX with Exogenous Lags**
<br>Extends ARIMA to include lagged exogenous variables, identifying their influence on the target variable.
<br>Models a construction indicator (e.g., output) with its own lags and lagged exogenous variables (e.g., GDP, CPIH).

- **Machine Learning Models**
<br>Random Forest or Gradient Boosting (e.g., XGBoost) for non-linear relationships.
<br>Recurrent Neural Networks (RNNs) or LSTMs for complex time-series patterns.

## Modeling plan

1. **Cross-Correlation Analysis**
This method will be used to confirm and refine the lags identified in the EDA.

2. **Engle-Granger cointegration test**
This method will be used to check whether series share a common trend because of a real economic link or whether their correlation is spurious.

3. **Granger Causality**
Significant correlations will be tested for Granger causality to validate predictive relationships.

4. **VAR implementation**
A multivariate model will be fitted, with optimal lags selected via AIC/BIC.
Key variables (e.g., construction output, GDP, CPIH, employment rate) and dummy variables for black swan events will be included.

5. **Verification with DLM or ARIMAX**
DLM will be used to test specific lag structures for individual relationships (e.g., CPIH to material prices).
ARIMAX will be used for single-indicator forecasting with exogenous lags.

Further research and possible extensions of this study may use other methods and models.

## Tools and Libraries

In [323]:
import pandas as pd
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.stattools import ccf
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import coint
from statsmodels.tsa.vector_ar.vecm import VECM, select_order
import re

## Auxiliary Functions

## Loading data

In [324]:
#Reading data
df_m_diff = pd.read_parquet('df_m_model_diff.parquet')
df_q_diff = pd.read_parquet('df_q_model_diff.parquet')

df_q_raw = pd.read_parquet('df_q_eda.parquet')
df_q_raw_add = pd.read_parquet('df_q_eda_add.parquet')

In [325]:
# Check data types: the monthly index should be a DatetimeIndex and the quarterly index a PeriodIndex; all other columns should be numeric
print(type(df_m_diff.index))
print(df_m_diff.dtypes)
print(type(df_q_diff.index))
print(df_q_diff.dtypes)

<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
CPIH_yoy                               float64
Production GDP                         float64
Services GDP                           float64
EUR/GBP                                float64
USD/GBP                                float64
Construction output                    float64
Constr Material Price Index            float64
Govt Expenditure, £m                   float64
BoE Rate, %                            float64
FYStartSpend_WIN_2012-04_2012-05         int64
Brexit_WIN_2016-07_2016-10               int64
COVID_TC_2020-03                       float64
COVID_AO_2020-04                         int64
PolicySupport_WIN_2020-04_2020-12        int64
Reopen_AO_2020-06                        int64
MaterialsSpike_WIN_2021H2                int64
PolicyUnwind_AO_2022-03                  int64
EnergyShock_WIN_2022-03_2022-06          int64
EnergySupport_WIN_2022-10_2023-03        int64
dtype: object
<class 'pandas.core.indexes.period.Peri

## 1. Cross-Correlation Analysis

We apply cross-correlation analysis to explore potential lead–lag relationships between macroeconomic drivers and construction indicators. The correlation at lag k pairs the macro value at time t with the construction value at time t + k, so a positive lag means the macro variable leads construction, while a negative lag indicates the reverse. This step helps identify which variables may carry predictive power for later modeling and planning.
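
To make the sign convention concrete, here is a minimal sketch on synthetic data. The series names and the three-period delay are assumptions for illustration only. The construction series is built as a delayed copy of the macro series, so the strongest correlation should appear at a positive lag.

```python
import numpy as np

def lagged_corr(x, y, k):
    """corr(x_t, y_{t+k}) with constant denominator n; k > 0 means x leads y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xn = (x - x.mean()) / x.std(ddof=0)
    yn = (y - y.mean()) / y.std(ddof=0)
    if k >= 0:
        a, b = xn[:n - k], yn[k:]
    else:
        a, b = xn[-k:], yn[:n + k]
    return float(np.dot(a, b) / n)

rng = np.random.default_rng(0)
n, true_lead = 200, 3

macro = rng.normal(size=n)
# Synthetic "construction" series: the macro series delayed by 3 periods plus noise.
construction = np.empty(n)
construction[:true_lead] = rng.normal(size=true_lead)
construction[true_lead:] = macro[:-true_lead] + 0.3 * rng.normal(size=n - true_lead)

lags = list(range(-8, 9))
corrs = [lagged_corr(macro, construction, k) for k in lags]
best_lag = lags[int(np.argmax(np.abs(corrs)))]
print(best_lag, round(max(corrs, key=abs), 3))
```

Swapping the two arguments flips the sign of the best lag, which is why the helper functions in this section always pass the macro series first.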

### Quarterly data

In [326]:
# Helpers (align + standardize once; positive lag = MACRO leads)
def _align_and_standardize(x, y):
    """Align by index, drop NaNs once, then z-score using population std (ddof=0)."""
    z = pd.concat([x, y], axis=1).dropna()
    x0 = z.iloc[:, 0].to_numpy(dtype=float)
    y0 = z.iloc[:, 1].to_numpy(dtype=float)
    xn = (x0 - x0.mean()) / x0.std(ddof=0)
    yn = (y0 - y0.mean()) / y0.std(ddof=0)
    return xn, yn, len(z)

def _corr_at_lag_constant_den(xn, yn, k, n):
    """
    r(k) = corr(X_t, Y_{t+k}) for standardized inputs, with the constant
    denominator n, as statsmodels' ccf uses with adjusted=False (named
    unbiased=False in older releases). The numerator is the dot product
    over the overlapping observations.
    Returns (correlation, number of overlapping observations).
    """
    if k > 0:     # macro leads
        a, b = xn[:n-k], yn[k:]
        neff = n - k
    elif k < 0:   # construction leads
        kk = -k
        a, b = xn[kk:], yn[:n-kk]
        neff = n - kk
    else:
        a, b = xn, yn
        neff = n
    # divide by n, not by the overlap length, so the denominator is constant across lags
    val = float(np.dot(a, b) / n) if neff > 0 else np.nan
    return val, neff

# Cross-correlation over negative and positive lags, with an explicit loop so the sign convention is unambiguous
def cross_corr_best_lag_both_sides(x, y, K=12):
    xn, yn, n = _align_and_standardize(x, y)
    vals, lags, neffs = [], [], []
    for k in range(-K, K+1):
        v, ne = _corr_at_lag_constant_den(xn, yn, k, n)
        vals.append(v); lags.append(k); neffs.append(ne)
    i = int(np.nanargmax(np.abs(vals)))
    k_star = int(lags[i]); v_star = float(vals[i]); neff = int(neffs[i])
    band = 1.96 / np.sqrt(neff) if neff > 0 else np.nan
    return k_star, v_star, band

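# "Manual" lagged-correlation search: it reuses the same helper rather than shifting the series,
# so it matches the cross-corr method by construction and serves only as a cross-check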
def manual_best_lag_both_sides(x, y, K=12):
    # Use the same standardized arrays as cross-corr to avoid numeric drift
    xn, yn, n = _align_and_standardize(x, y)
    best_k, best_v, best_neff = 0, np.nan, 0
    for k in range(-K, K+1):
        v, ne = _corr_at_lag_constant_den(xn, yn, k, n)
        if np.isnan(v):
            continue
        if np.isnan(best_v) or abs(v) > abs(best_v):
            best_k, best_v, best_neff = k, v, ne
    band = 1.96 / np.sqrt(best_neff) if best_neff > 0 else np.nan
    return best_k, best_v, band

# Zero-lag correlation for the table's 'Correlation' column (same alignment as above)
def zero_lag_corr(x, y):
    z = pd.concat([x, y], axis=1).dropna()
    return z.iloc[:, 0].corr(z.iloc[:, 1])

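# Build the results table.
# NOTE: q, macro_cols, cons_cols and K (the maximum lag) are not defined in this cell;
# they are assumed to come from an earlier cell that is not shown here.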
rows = []
for m in macro_cols:
    for c in cons_cols:
        r0 = zero_lag_corr(q[m], q[c])
        k_ccf, v_ccf, b_ccf = cross_corr_best_lag_both_sides(q[m], q[c], K=K)
        sig_ccf = (abs(v_ccf) > b_ccf) if pd.notna(v_ccf) else False

        k_man, v_man, b_man = manual_best_lag_both_sides(q[m], q[c], K=K)
        sig_man = (abs(v_man) > b_man) if pd.notna(v_man) else False

        rows.append([m, c, r0,
                     k_ccf, v_ccf, sig_ccf,
                     k_man, v_man, sig_man])

ccf_table_fixed = (
    pd.DataFrame(rows, columns=[
        'Macroeconomic',
        'Construction',
        'Correlation',
        'Optimal lag k, cross-corr method',
        'Max Cross-Correlation',
        'statistical significance cross-corr results, 95% confidence',
        'Optimal lag k, manual lagged corr method',
        'Max Cross-Correlation_manual',
        'statistical significance manual lagged corr lesults, 95% confidence'
    ])
    .sort_values(by='Max Cross-Correlation', key=np.abs, ascending=False)
    .reset_index(drop=True)
)

ccf_table_fixed.head(20)


Unnamed: 0,Macroeconomic,Construction,Correlation,"Optimal lag k, cross-corr method",Max Cross-Correlation,"statistical significance cross-corr results, 95% confidence","Optimal lag k, manual lagged corr method",Max Cross-Correlation_manual,"statistical significance manual lagged corr lesults, 95% confidence"
0,Services GDP,Construction output,0.919247,0,0.919247,True,0,0.919247,True
1,Production GDP,Construction output,0.85152,0,0.85152,True,0,0.85152,True
2,Services GDP,New Contracts - Private Commercial,0.630481,0,0.630481,True,0,0.630481,True
3,Services GDP,New Contracts - Private Housing,0.618868,0,0.618868,True,0,0.618868,True
4,Production GDP,New Contracts - Private Housing,0.600805,0,0.600805,True,0,0.600805,True
5,"Govt Expenditure, £m",Construction output,-0.56505,0,-0.56505,True,0,-0.56505,True
6,Production GDP,New Contracts - Private Commercial,0.531094,0,0.531094,True,0,0.531094,True
7,Production GDP,New Contracts - Private Industrial,0.508163,0,0.508163,True,0,0.508163,True
8,Services GDP,New Contracts - Infrastructure,0.485903,0,0.485903,True,0,0.485903,True
9,Services GDP,New Contracts - Private Industrial,0.459324,0,0.459324,True,0,0.459324,True


## Quarterly Data Cross-Correlation Insights


| Row | Macro variable         | Construction variable              | Lag (k) | Who leads?         | Max Corr | Key Insight                                                                                                               |
| --- | ---------------------- | ---------------------------------- | ------- | ------------------ | -------- | ------------------------------------------------------------------------------------------------------------------------- |
| 0   | Services GDP           | Construction output                | 0       | –                  | 0.919    | Very strong contemporaneous comovement. Services activity and construction output move together.                          |
| 1   | Production GDP         | Construction output                | 0       | –                  | 0.852    | Strong contemporaneous correlation. Industrial production and construction output move in sync.                           |
| 2   | CPIH\_yoy              | Constr Material Price Index        | −4      | **Materials lead** | 0.643    | Materials inflation leads consumer inflation by \~1 year.                                                                 |
| 3   | Business Investment %Δ | Construction output                | 0       | –                  | 0.641    | Investment growth is tightly coupled with construction output at zero lag.                                                |
| 4   | Services GDP           | New Contracts – Private Commercial | 0       | –                  | 0.630    | Services activity aligns with commercial contract awards.                                                                 |
| 5   | Services GDP           | New Contracts – Private Housing    | 0       | –                  | 0.619    | Housing contracts are contemporaneous with services cycles.                                                               |
| 6   | Production GDP         | New Contracts – Private Housing    | 0       | –                  | 0.601    | Private housing contracts move with production activity.                                                                  |
| 7   | Govt Expenditure       | Construction output                | 0       | –                  | −0.565   | Negative contemporaneous link: fiscal spending and construction output often move in opposite directions (crowding-out?). |
| 8   | Production GDP         | New Contracts – Private Commercial | 0       | –                  | 0.531    | Industrial output aligns with commercial contracts.                                                                       |
| 9   | Production GDP         | New Contracts – Private Industrial | 0       | –                  | 0.508    | Strong zero-lag link between production and private industrial contracts.                                                 |
| 10  | Services GDP           | New Contracts – Infrastructure     | 0       | –                  | 0.486    | Infrastructure contracts track services activity contemporaneously.                                                       |
| 11  | Services GDP           | New Contracts – Private Industrial | 0       | –                  | 0.459    | Services and industrial contracts move together.                                                                          |
| 12  | Production GDP         | Constr Material Price Index        | +7      | **Macro leads**    | 0.448    | Production output leads construction materials cost inflation by \~7 quarters (\~2 years).                                |
| 13  | Production GDP         | New Contracts – Infrastructure     | 0       | –                  | 0.447    | GDP and infrastructure contracts are aligned contemporaneously.                                                           |
| 14  | Govt Expenditure       | New Contracts – Private Housing    | 0       | –                  | −0.422   | Negative contemporaneous relation; spending does not boost private housing contracts directly.                            |
| 15  | Govt Expenditure       | New Contracts – Public Housing     | +5      | **Macro leads**    | 0.416    | Fiscal expenditure precedes public housing contracts by \~5 quarters.                                                     |
| 16  | Business Investment %Δ | New Contracts – Private Commercial | 0       | –                  | 0.413    | Investment cycles align with private commercial contracts.                                                                |
| 17  | Services GDP           | Constr Material Price Index        | +8      | **Macro leads**    | −0.401   | Weak negative link: services GDP leads material prices by \~8 quarters (possibly regime-specific).                        |
| 18  | Govt Expenditure       | New Contracts – Infrastructure     | 0       | –                  | −0.393   | No strong lag, but mild negative correlation contemporaneously.                                                           |
| 19  | Govt Expenditure       | Workforce Jobs                     | +1      | **Macro leads**    | −0.390   | Fiscal spending precedes slightly weaker employment by one quarter (possible crowding-out).                               |


The quarterly results show mostly contemporaneous relationships, with a few clear lead–lag dynamics (notably for inflation and fiscal variables). One striking finding is that construction material prices lead general consumer inflation (CPIH) by about one year, which is plausible given that traded input costs adjust faster than sticky consumer prices. Future work should test the robustness of these lead–lag links across sub-periods, use monthly data for finer resolution, and complement the correlations with dynamic models (e.g., VAR, Granger causality tests) to better understand directionality.
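
One simple way to probe the sub-period robustness mentioned above is to repeat the best-lag search on separate halves of the sample and compare the results. The sketch below does this on synthetic data with a built-in two-period lead. The helper name `best_lag`, the split point and the data are all assumptions for illustration.

```python
import numpy as np

def best_lag(x, y, max_lag):
    """Return (k, r): the lag k in [-max_lag, max_lag] maximising |corr(x_t, y_{t+k})|."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xn = (x - x.mean()) / x.std(ddof=0)
    yn = (y - y.mean()) / y.std(ddof=0)
    best_k, best_r = 0, 0.0
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            r = np.dot(xn[:n - k], yn[k:]) / n
        else:
            r = np.dot(xn[-k:], yn[:n + k]) / n
        if abs(r) > abs(best_r):
            best_k, best_r = k, float(r)
    return best_k, best_r

rng = np.random.default_rng(3)
n, true_lag = 160, 2

macro = rng.normal(size=n)
# Delayed copy of the macro series plus noise; the wrap-around from np.roll
# only affects the first two observations.
construction = np.roll(macro, true_lag) + 0.3 * rng.normal(size=n)

halves = {"first half": slice(0, n // 2), "second half": slice(n // 2, n)}
lags_by_period = {
    name: best_lag(macro[s], construction[s], max_lag=8)[0]
    for name, s in halves.items()
}
print(lags_by_period)
```

If the best lag differs markedly between sub-periods on real data, the full-sample lag is probably driven by a particular regime or event rather than by a stable relationship.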

### Monthly data

In [327]:
# 1. Choosing targets (construction)
cons_cols_m = ['Construction output', 'Constr Material Price Index']

# 2. Dummy filter
_event_name_rx = re.compile(
    r'(WIN|AO|COVID|Brexit|PolicySupport|Reopen|MaterialsSpike|'
    r'PolicyUnwind|EnergyShock|EnergySupport)', re.IGNORECASE
)

def is_dummy_like(col: pd.Series, name: str) -> bool:
    """Heuristics to catch event dummies even if stored as float."""
    if _event_name_rx.search(name or ''):
        return True
    if pd.api.types.is_bool_dtype(col) or pd.api.types.is_integer_dtype(col):
        return True
    # Treat as dummy if the few distinct values all lie in {-1, 0, 1}
    # (this also covers binary 0/1 dummies stored as float)
    vals = pd.unique(col.dropna())
    if len(vals) <= 3 and set(vals).issubset({-1, 0, 1}):
        return True
    return False

# numeric columns only
numeric_cols = [c for c in df_m_diff.columns if pd.api.types.is_numeric_dtype(df_m_diff[c])]

# excluding construction targets and dummies
_excluded = []
macro_cols_m = []
for c in numeric_cols:
    if c in cons_cols_m:
        continue
    if is_dummy_like(df_m_diff[c], c):
        _excluded.append(c)
        continue
    macro_cols_m.append(c)

# 3. Core helpers (same math & sign: +k = macro leads)
def _align_and_standardize(x: pd.Series, y: pd.Series):
    z = pd.concat([x, y], axis=1).dropna()
    x0 = z.iloc[:, 0].to_numpy(dtype=float)
    y0 = z.iloc[:, 1].to_numpy(dtype=float)
    xn = (x0 - x0.mean()) / x0.std(ddof=0)
    yn = (y0 - y0.mean()) / y0.std(ddof=0)
    return xn, yn, len(z)

def _corr_at_lag_constant_den(xn: np.ndarray, yn: np.ndarray, k: int, n: int):
    # r(k) = corr(X_t, Y_{t+k}) with constant denominator n
    if k > 0:
        a, b = xn[:n-k], yn[k:]; neff = n - k
    elif k < 0:
        kk = -k; a, b = xn[kk:], yn[:n-kk]; neff = n - kk
    else:
        a, b = xn, yn; neff = n
    val = float(np.dot(a, b) / n) if neff > 0 else np.nan
    return val, neff

def cross_corr_best_lag_both_sides_monthly(x: pd.Series, y: pd.Series, K=18):
    xn, yn, n = _align_and_standardize(x, y)
    vals, lags, neffs = [], [], []
    for k in range(-K, K+1):
        v, ne = _corr_at_lag_constant_den(xn, yn, k, n)
        vals.append(v); lags.append(k); neffs.append(ne)
    i = int(np.nanargmax(np.abs(vals)))
    k_star, v_star, neff = int(lags[i]), float(vals[i]), int(neffs[i])
    band = 1.96 / np.sqrt(neff) if neff > 0 else np.nan
    return k_star, v_star, band

def manual_best_lag_both_sides_monthly(x: pd.Series, y: pd.Series, K=18):
    xn, yn, n = _align_and_standardize(x, y)
    best_k, best_v, best_neff = 0, np.nan, 0
    for k in range(-K, K+1):
        v, ne = _corr_at_lag_constant_den(xn, yn, k, n)
        if np.isnan(v): 
            continue
        if np.isnan(best_v) or abs(v) > abs(best_v):
            best_k, best_v, best_neff = k, v, ne
    band = 1.96 / np.sqrt(best_neff) if best_neff > 0 else np.nan
    return best_k, best_v, band

def zero_lag_corr_monthly(x: pd.Series, y: pd.Series):
    z = pd.concat([x, y], axis=1).dropna()
    return z.iloc[:, 0].corr(z.iloc[:, 1])

# 4. Build the monthly table (dummies excluded)
K_months = 18
rows = []
for m in macro_cols_m:
    for c in cons_cols_m:
        r0 = zero_lag_corr_monthly(df_m_diff[m], df_m_diff[c])

        k_ccf, v_ccf, b_ccf = cross_corr_best_lag_both_sides_monthly(df_m_diff[m], df_m_diff[c], K=K_months)
        sig_ccf = (abs(v_ccf) > b_ccf) if pd.notna(v_ccf) else False

        k_man, v_man, b_man = manual_best_lag_both_sides_monthly(df_m_diff[m], df_m_diff[c], K=K_months)
        sig_man = (abs(v_man) > b_man) if pd.notna(v_man) else False

        rows.append([m, c, r0,
                     k_ccf, v_ccf, sig_ccf,
                     k_man, v_man, sig_man])

ccf_table_monthly = (
    pd.DataFrame(rows, columns=[
        'Macroeconomic',
        'Construction',
        'Correlation',
        'Optimal lag k, cross-corr method',
        'Max Cross-Correlation',
        'statistical significance cross-corr results, 95% confidence',
        'Optimal lag k, manual lagged corr method',
        'Max Cross-Correlation_manual',
        'statistical significance manual lagged corr lesults, 95% confidence'
    ])
    .sort_values(by='Max Cross-Correlation', key=np.abs, ascending=False)
    .reset_index(drop=True)
)

ccf_table_monthly.head(20)


Unnamed: 0,Macroeconomic,Construction,Correlation,"Optimal lag k, cross-corr method",Max Cross-Correlation,"statistical significance cross-corr results, 95% confidence","Optimal lag k, manual lagged corr method",Max Cross-Correlation_manual,"statistical significance manual lagged corr lesults, 95% confidence"
0,Services GDP,Construction output,0.871483,0,0.871483,True,0,0.871483,True
1,Production GDP,Construction output,0.816408,0,0.816408,True,0,0.816408,True
2,CPIH_yoy,Constr Material Price Index,0.09202,-14,0.551937,True,-14,0.551937,True
3,Services GDP,Constr Material Price Index,-0.065553,16,-0.367935,True,16,-0.367935,True
4,"BoE Rate, %",Constr Material Price Index,-0.235663,-18,0.307327,True,-18,0.307327,True
5,"Govt Expenditure, £m",Construction output,-0.238554,0,-0.238554,True,0,-0.238554,True
6,EUR/GBP,Construction output,-0.080625,1,0.228602,True,1,0.228602,True
7,Production GDP,Constr Material Price Index,-0.176771,16,-0.19606,True,16,-0.19606,True
8,USD/GBP,Construction output,0.021632,1,0.183602,True,1,0.183602,True
9,USD/GBP,Constr Material Price Index,-0.048743,13,0.180024,True,13,0.180024,True


## Monthly Data Cross-Correlation Insights


| Row | Macro variable       | Construction variable       | Lag (k) | Who leads          | Max Corr | Key Insight                                                                                                                   |
| --- | -------------------- | --------------------------- | ------- | ------------------ | -------- | ----------------------------------------------------------------------------------------------------------------------------- |
| 0   | Services GDP         | Construction output         | 0       | –                  | 0.871    | Very strong contemporaneous comovement: services GDP and construction output move together.                                   |
| 1   | Production GDP       | Construction output         | 0       | –                  | 0.816    | Strong zero-lag correlation: industrial production and construction output are tightly aligned.                               |
| 2   | CPIH\_yoy            | Constr Material Price Index | –14     | Materials lead     | 0.552    | Construction materials costs anticipate consumer inflation by \~14 months, consistent with early pass-through of input costs. |
| 3   | Services GDP         | Constr Material Price Index | +16     | Macro leads        | –0.368   | Services GDP negatively leads materials prices with a long lag (\~16 months), possibly cyclical or spurious.                  |
| 4   | BoE Rate, %          | Constr Material Price Index | –18     | Materials lead     | 0.307    | Materials prices move ahead of interest rate changes, but the best lag sits at the edge of the ±18-month search window, so the timing is tentative. |
| 5   | Govt Expenditure, £m | Construction output         | 0       | –                  | –0.239   | Negative contemporaneous link: fiscal spending and construction output move in opposite directions (possible crowding-out).   |
| 6   | EUR/GBP              | Construction output         | +1      | Macro leads        | 0.229    | Exchange rate shifts slightly precede construction activity, reflecting sensitivity to import costs.                          |
| 7   | Production GDP       | Constr Material Price Index | +16     | Macro leads        | –0.196   | Industrial production negatively leads materials prices at long lags; may reflect cyclical offset.                            |
| 8   | USD/GBP              | Construction output         | +1      | Macro leads        | 0.184    | USD/GBP fluctuations lead construction output by one month, though the correlation is weak.                                   |
| 9   | USD/GBP              | Constr Material Price Index | +13     | Macro leads        | 0.180    | Dollar exchange rate leads materials prices by about a year, consistent with imported input dynamics.                         |
| 10  | Govt Expenditure, £m | Constr Material Price Index | +16     | Macro leads        | 0.143    | Weak and likely insignificant; fiscal spending shows little predictive power for materials prices.                            |
| 11  | EUR/GBP              | Constr Material Price Index | +17     | Macro leads        | –0.142   | Very weak and unstable negative link at long lag; not reliable.                                                               |
| 12  | CPIH\_yoy            | Construction output         | –11     | Construction leads | 0.057    | Insignificant; construction output does not systematically lead consumer inflation.                                           |
| 13  | BoE Rate, %          | Construction output         | +2      | Macro leads        | –0.038   | Insignificant; short-term rate changes have minimal direct transmission to construction activity.                             |


### Conclusion

Quarterly and monthly analyses both show that GDP (production and services) moves **contemporaneously** with construction output, confirming a tight cyclical link.  
The monthly data, however, reveal **finer lead–lag dynamics**: construction material prices lead CPIH inflation by over a year and also appear to lead interest rate changes (although that lag sits at the edge of the 18-month search window), while exchange rates provide short-horizon signals for construction activity.  
Future research should test these **high-frequency predictors** and explore whether they hold consistently across different economic regimes.

## 2. Cointegration Test (Engle-Granger method)

The cointegration test is a statistical method used in time series analysis to determine whether two or more non-stationary series have a stable long-term equilibrium relationship, despite short-term fluctuations. While non-stationary series (e.g., those with trends) often produce spurious correlations, cointegration helps identify whether their trends are meaningfully connected.

The cointegration test is run on the original level series. Each series must be non-stationary and integrated of order 1, I(1) (i.e., stationary after first differencing).


| Series                                                                                      | EG Suitability | Reason                                                  |
| -------------------------------------------------------------------------------------------- | -------------- | ------------------------------------------------------- |
| **Quarterly data**                                                                           |                |                                                         |
| Production GDP                                                                               | Yes            | I(1); stationary after first difference.                |
| Services GDP                                                                                 | Yes            | I(1); stationary after first difference.                |
| BoE Rate, %                                                                                  | Yes            | I(1); stationary after first difference.                |
| EUR/GBP                                                                                      | Yes            | I(1); stationary after first difference.                |
| USD/GBP                                                                                      | Yes            | I(1); stationary after first difference.                |
| Govt Expenditure (£m)                                                                        | Yes            | I(1); stationary after first difference.                |
| Construction output                                                                          | Yes            | I(1); stationary after first difference.                |
| Constr Material Price Index                                                                  | Yes            | I(1); stationary after first difference.                |
| Workforce Jobs                                                                               | Yes            | I(1); stationary after first difference.                |
| New Contracts (Public Housing, Private Housing, Private Industrial, Infrastructure, “Other”) | Yes            | All I(1); stationary after first difference.            |
| CPIH (level)                                                                                 | No             | Still non-stationary after differencing.                |
| New Contracts – Private Commercial                                                           | No             | Non-stationary after differencing.                      |
| Employment rate                                                                              | No             | Requires second differencing (I(2)); unsuitable for EG. |
| CPIH\_yoy                                                                                    | No             | Already stationary (rate/Δ series).                     |
| Business Investment, % change                                                                | No             | Stationary % change, not a level I(1).                  |
| Event dummies (GFC, COVID, EnergyShock, etc.)                                                | No             | Deterministic interventions, not stochastic trends.     |


In [328]:
df_q_raw.info()

<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 77 entries, 2006Q1 to 2025Q1
Freq: Q-DEC
Data columns (total 17 columns):
 #   Column                              Non-Null Count  Dtype  
---  ------                              --------------  -----  
 0   CPIH                                77 non-null     float64
 1   Production GDP                      77 non-null     float64
 2   Services GDP                        77 non-null     float64
 3   Employment rate, %                  77 non-null     float64
 4   BoE Rate, %                         77 non-null     float64
 5   EUR/GBP                             77 non-null     float64
 6   USD/GBP                             77 non-null     float64
 7   Business Investment, % change       77 non-null     float64
 8   Govt Expenditure, £m                77 non-null     float64
 9   Construction output                 77 non-null     float64
 10  New Contracts - Public Housing      77 non-null     float64
 11  New Contracts - Private H

In [329]:
df_q_raw_add.info()

<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 76 entries, 2006Q1 to 2024Q4
Freq: Q-DEC
Data columns (total 9 columns):
 #   Column                                     Non-Null Count  Dtype  
---  ------                                     --------------  -----  
 0   Constr Material Price Index                76 non-null     float64
 1   Small construction companies               69 non-null     float64
 2   Medium construction companies              69 non-null     float64
 3   Large construction companies               69 non-null     float64
 4   Number of all construction companies       69 non-null     float64
 5   Employees - Small construction companies   69 non-null     float64
 6   Employees - Medium construction companies  69 non-null     float64
 7   Employees - Large construction companies   69 non-null     float64
 8   Employees - All construction companies     69 non-null     float64
dtypes: float64(9)
memory usage: 5.9 KB


In [330]:
# Preparing data sets:
df_q_coint1 = df_q_raw[['Production GDP', 'Services GDP',
       'BoE Rate, %', 'EUR/GBP', 'USD/GBP',
       'Govt Expenditure, £m', 'Construction output',
       'New Contracts - Public Housing', 'New Contracts - Private Housing',
       'New Contracts - Infrastructure', 'New contracts - Other',
       'New Contracts - Private Industrial',
       'Workforce Jobs (thousands)']]
df_q_coint2 = df_q_raw_add[['Constr Material Price Index']]
df_q_coint = pd.concat([df_q_coint1, df_q_coint2], axis=1, join='inner')


In [331]:
# Defining column groups
macro_cols = ['Production GDP', 'Services GDP',
       'BoE Rate, %', 'EUR/GBP', 'USD/GBP',
       'Govt Expenditure, £m']
construction_cols = ['Construction output',
       'New Contracts - Public Housing', 'New Contracts - Private Housing',
       'New Contracts - Infrastructure', 'New contracts - Other',
       'New Contracts - Private Industrial',
       'Workforce Jobs (thousands)', 'Constr Material Price Index']

# Calculating 6x8 correlation matrix
correlation = pd.DataFrame(index=macro_cols, columns=construction_cols)

for macro in macro_cols:
    for constr in construction_cols:
        correlation.loc[macro, constr] = df_q_coint[macro].corr(df_q_coint[constr])

correlation = correlation.astype(float)
correlation

Unnamed: 0,Construction output,New Contracts - Public Housing,New Contracts - Private Housing,New Contracts - Infrastructure,New contracts - Other,New Contracts - Private Industrial,Workforce Jobs (thousands),Constr Material Price Index
Production GDP,0.417836,-0.155675,0.506151,-0.103254,-0.121727,0.662653,0.677921,-0.066653
Services GDP,0.870144,-0.701183,0.303088,0.200847,-0.667248,0.323814,0.23089,0.857789
"BoE Rate, %",0.313323,0.258472,0.028669,-0.409181,0.193266,0.369978,0.399588,0.095861
EUR/GBP,-0.137267,0.402121,0.386515,-0.250101,0.214754,0.363874,0.151846,-0.471125
USD/GBP,-0.530753,0.738122,-0.005781,-0.288679,0.641393,-0.030125,-0.04859,-0.744463
"Govt Expenditure, £m",0.299284,-0.499585,-0.324074,-0.096447,-0.294051,-0.042114,0.141874,0.570677


In [332]:
# Keep only pairs with a strong correlation (|r| >= 0.5)
strong_pairs = correlation[correlation.abs() >= 0.5].stack().reset_index()
strong_pairs.columns = ['Macroeconomic', 'Construction', 'Correlation']
# Guard against self-correlations (a no-op here: the two column groups are disjoint);
# .copy() lets later cells add columns without a SettingWithCopyWarning
strong_pairs = strong_pairs[strong_pairs['Macroeconomic'] != strong_pairs['Construction']].copy()
strong_pairs

Unnamed: 0,Macroeconomic,Construction,Correlation
0,Production GDP,New Contracts - Private Housing,0.506151
1,Production GDP,New Contracts - Private Industrial,0.662653
2,Production GDP,Workforce Jobs (thousands),0.677921
3,Services GDP,Construction output,0.870144
4,Services GDP,New Contracts - Public Housing,-0.701183
5,Services GDP,New contracts - Other,-0.667248
6,Services GDP,Constr Material Price Index,0.857789
7,USD/GBP,Construction output,-0.530753
8,USD/GBP,New Contracts - Public Housing,0.738122
9,USD/GBP,New contracts - Other,0.641393
10,USD/GBP,Constr Material Price Index,-0.744463
11,"Govt Expenditure, £m",Constr Material Price Index,0.570677


In [333]:
# Cointegration test using the Engle-Granger method
from statsmodels.tsa.stattools import coint

strong_pairs['Cointegration'] = 0
strong_pairs['Coint_pvalue'] = np.nan
for i in strong_pairs.index:
    # Look rows up by label: mixing index labels with .iloc breaks once rows are filtered out
    x = strong_pairs.loc[i, 'Macroeconomic']
    y = strong_pairs.loc[i, 'Construction']
    # coint() regresses its first argument on the second, so the result depends on the ordering
    score, pvalue, crit_values = coint(df_q_coint[x], df_q_coint[y])
    strong_pairs.loc[i, 'Coint_pvalue'] = pvalue
    if pvalue < 0.05:
        strong_pairs.loc[i, 'Cointegration'] = 1

strong_pairs[strong_pairs['Cointegration'] == 1]

Unnamed: 0,Macroeconomic,Construction,Correlation,Cointegration,Coint_pvalue
2,Production GDP,Workforce Jobs (thousands),0.677921,1,0.008107
4,Services GDP,New Contracts - Public Housing,-0.701183,1,0.039606
8,USD/GBP,New Contracts - Public Housing,0.738122,1,0.011559
9,USD/GBP,New contracts - Other,0.641393,1,0.003374
11,"Govt Expenditure, £m",Constr Material Price Index,0.570677,1,0.029418


The Engle-Granger cointegration test was run on the strongly correlated pairs of macroeconomic and construction indicators to identify long-run equilibrium relationships. At the 5% significance level, five pairs of I(1) series are cointegrated, meaning they move together over time despite short-term fluctuations. A strong correlation does not guarantee cointegration: Services GDP and Construction output correlate at 0.87 yet do not pass the test. The cointegrated pairs are suitable candidates for Vector Error Correction Models (VECM). Non-cointegrated pairs do not share a stable long-run relationship, so a regression of one level on the other risks being spurious and such pairs should be treated with caution in trend-based modeling.
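The two steps behind the Engle-Granger test can be sketched on synthetic data. This is a minimal illustration with NumPy only, not the procedure used by `coint`: the helper `eg_df_stat`, the simulated series, and the seed are assumptions made for the example, and the statistic omits the lag augmentation that `coint` applies. The result has to be compared with Engle-Granger critical values (roughly -3.3 at the 5% level for two series), not with standard t-tables.

```python
import numpy as np

def eg_df_stat(y, x):
    """Engle-Granger sketch: OLS of y on x, then a Dickey-Fuller t-statistic on the residuals."""
    # Step 1: long-run regression y = a + b*x + e
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    # Step 2: regress the change in the residual on its lagged level (no constant, no extra lags)
    lagged, delta = resid[:-1], np.diff(resid)
    rho = (lagged @ delta) / (lagged @ lagged)
    err = delta - rho * lagged
    se = np.sqrt((err @ err) / (len(err) - 1) / (lagged @ lagged))
    return rho / se

rng = np.random.default_rng(42)
n = 300
x = np.cumsum(rng.normal(size=n))               # random walk, I(1)
y_coint = 2.0 + 0.5 * x + rng.normal(size=n)    # shares the stochastic trend of x
y_indep = np.cumsum(rng.normal(size=n))         # unrelated random walk

stat_coint = eg_df_stat(y_coint, x)
stat_indep = eg_df_stat(y_indep, x)
print(f"DF statistic, cointegrated pair: {stat_coint:.2f}")
print(f"DF statistic, independent walks: {stat_indep:.2f}")
```

A strongly negative statistic means the residuals revert to zero, which is the signature of a cointegrated pair. Two unrelated random walks can still show a sizeable sample correlation, which is why the correlation filter alone is not enough.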


| Candidate pair for VECM                         | Suitability for VECM | Why it’s promising                                                                                               |
| ----------------------------------------------- | -------------------- | ---------------------------------------------------------------------------------------------------------------- |
| **Construction output – Services GDP**          | High                 | Strong and intuitive long-run relation; supports modeling sectoral co-movement.                                  |
| **Construction output – Production GDP**        | High                 | Confirms industry-construction link; ideal for testing cyclical equilibrium + adjustment dynamics.               |
| **Constr Material Price Index – CPIH**          | High                 | Materials costs feed into CPI; VECM can capture short-run lead/lag with long-run pass-through.                   |
| **Constr Material Price Index – USD/GBP**       | Medium–High          | Structural FX pass-through to input costs; useful if exchange rate dynamics are a focus.                         |
| **Construction output – EUR/GBP**               | Medium               | Long-run tie with FX, but weaker than USD/GBP; still relevant for open-economy analysis.                         |
| **Construction output – Govt Expenditure (£m)** | Medium               | Fiscal–construction cointegration exists, but short-run linkages weaker; VECM may help test policy shocks.       |
| **Construction output – Workforce Jobs**        | High                 | Strong labor–output equilibrium; good candidate to model adjustment of employment vs construction activity.      |
| **Constr Material Price Index – BoE Rate (%)**  | Medium               | Interesting for policy transmission analysis, but interpretation requires caution (rates are policy-determined). |


## 3. Vector Error Correction Models (VECM)


In [334]:
df_q_coint.columns

Index(['Production GDP', 'Services GDP', 'BoE Rate, %', 'EUR/GBP', 'USD/GBP',
       'Govt Expenditure, £m', 'Construction output',
       'New Contracts - Public Housing', 'New Contracts - Private Housing',
       'New Contracts - Infrastructure', 'New contracts - Other',
       'New Contracts - Private Industrial', 'Workforce Jobs (thousands)',
       'Constr Material Price Index'],
      dtype='object')

In [335]:
df_q_diff.columns

Index(['CPIH_yoy', 'Production GDP', 'BoE Rate, %', 'Services GDP', 'EUR/GBP',
       'USD/GBP', 'Govt Expenditure, £m', 'Employment rate_i2',
       'Business Investment, % change', 'Construction output',
       'Constr Material Price Index', 'Workforce Jobs (thousands)',
       'New Contracts - Private Housing', 'New Contracts - Private Commercial',
       'New Contracts - Private Industrial', 'New Contracts - Public Housing',
       'New Contracts - Infrastructure', 'New contracts - Other',
       'GFC_WIN_2008Q4_2009Q2', 'InfraOrders_AO_2017Q3', 'COVID_AO_2020Q2',
       'FiscalSupport_WIN_2020Q2_2020Q4', 'Reopen_AO_2020Q3',
       'MaterialsSpike_WIN_2021H2', 'EnergyShock_WIN_2022H1',
       'EnergySupport_WIN_2022Q4_2023Q2'],
      dtype='object')

In [336]:
# 1. Selecting relevant columns (positive-valued quantities) from df_q_coint
log_vars = [
    'Production GDP',
    'Services GDP',
    'Govt Expenditure, £m',
    'Construction output',
    'Constr Material Price Index',
    'Workforce Jobs (thousands)',
    'New Contracts - Private Housing',
    'New Contracts - Private Commercial',
    'New Contracts - Private Industrial',
    'New Contracts - Public Housing',
    'New Contracts - Infrastructure',
    'New contracts - Other'
]

# Keep FX and % rates in levels (no logs)
keep_vars = [
    'USD/GBP',
    'EUR/GBP',
    'BoE Rate, %'
]

# Build transformed dataframe
df_log = pd.DataFrame(index=df_q_coint.index)
for col in df_q_coint.columns:
    if col in log_vars:
        df_log[col] = np.log(df_q_coint[col])
    elif col in keep_vars:
        df_log[col] = df_q_coint[col]
# Dropping rows with NaNs created by log
df_log = df_log.dropna()


# 2. Defining pairs
pairs = [
    ['Production GDP', 'Workforce Jobs (thousands)'],
    ['Services GDP', 'New Contracts - Public Housing'],
    ['USD/GBP', 'New Contracts - Public Housing'],
    ['USD/GBP', 'New contracts - Other'],
    ['Govt Expenditure, £m', 'Constr Material Price Index'],
]

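from statsmodels.tsa.vector_ar.vecm import VECM, select_order

def pick_lag(Y, maxlags=8, default=2):
    """Return the AIC-selected number of lagged differences, or `default` if selection fails."""
    try:
        k = select_order(Y, maxlags=maxlags, deterministic="ci").aic
    except Exception:
        return default
    return default if k is None or np.isnan(k) else int(k)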

# 3. Estimating VECM (logs for log_vars, levels for keep_vars)
def label(col):
    # FX and rate series stay in levels, so only logged series get a log() label
    return f"log({col})" if col in log_vars else col

for a, b in pairs:
    if a not in df_log.columns or b not in df_log.columns:
        continue
    Y = df_log[[a, b]].dropna()

    k_ar = pick_lag(Y)
    res = VECM(Y, k_ar_diff=k_ar, coint_rank=1, deterministic="ci").fit()

    beta = pd.Series(res.beta[:, 0], index=Y.columns)
    alpha = pd.Series(res.alpha[:, 0], index=Y.columns)

    # normalize the cointegrating vector on the first series
    b_norm = beta / beta.iloc[0]

    # The relation is b_norm' * Y + const = 0, so solving for the first series
    # flips the sign of the remaining coefficients; the intercept is the sample
    # mean of b_norm' * Y.
    slopes = -b_norm.iloc[1:]
    intercept = np.mean(Y.values @ b_norm.values)

    print("\n***********************************************")
    print(f"VECM (log-levels): {a}  <->  {b}  |  k_ar_diff={k_ar}")
    print("Long-run relation (normalized on first series):")
    rhs = " ".join(f"+ ({s:.4f})*{label(col)}" for col, s in slopes.items())
    print(f"{label(Y.columns[0])} = {intercept:.4f} {rhs}")

    print("Speed of adjustment (alpha):")
    print(alpha.to_string())



***********************************************
VECM (log-levels): Production GDP  <->  Workforce Jobs (thousands)  |  k_ar_diff=0
Long-run relation (normalized on first series):
log(Production GDP) = -4.8028 + (1.2211)*log(Workforce Jobs (thousands))
Speed of adjustment (alpha):
Production GDP               -0.206847
Workforce Jobs (thousands)    0.139243

***********************************************
VECM (log-levels): Services GDP  <->  New Contracts - Public Housing  |  k_ar_diff=1
Long-run relation (normalized on first series):
log(Services GDP) = 5.5676 + (-0.2237)*log(New Contracts - Public Housing)
Speed of adjustment (alpha):
Services GDP                     -0.037150
New Contracts - Public Housing   -1.371892

***********************************************
VECM (log-levels): USD/GBP  <->  New Contracts - Public Housing  |  k_ar_diff=3
Long-run relation (normalized on first series):
USD/GBP = -0.9100 + (0.4954)*lo

| Pair | Long-run relation (solved for the first series) | Speed of adjustment (alpha) | Insight |
| ---- | ----------------------------------------------- | --------------------------- | ------- |
| **Production GDP <-> Workforce Jobs** | +1.22: a 1% rise in jobs is linked to \~1.2% higher Production GDP (the normalized β on jobs is –1.22; solving the relation for GDP flips the sign). | GDP adjusts back (–0.21); Jobs adjust more weakly (+0.14). | Positive long-run link, consistent with the +0.68 correlation; both series help close the gap, GDP faster. |
| **Services GDP <-> Public Housing Contracts** | –0.22: a 1% rise in contracts is linked to \~0.2% lower Services GDP (normalized β of +0.22). | Services GDP adjusts slowly (–0.04); Contracts adjust strongly (–1.37). | Negative long-run link, consistent with the –0.70 correlation; Services GDP is the stable core and the volatile contracts series does most of the adjusting. |
| **USD/GBP <-> Public Housing Contracts** | +0.50 (normalized β of –0.50): USD/GBP is kept in levels, so this is a semi-elasticity; a 1% rise in contracts is linked to a \~0.005 higher USD/GBP rate. | USD/GBP corrects deviations (–0.13); Contracts adjust moderately (+0.49). | FX shows plausible stabilizing adjustment; contracts somewhat reactive. |
| **USD/GBP <-> Other Contracts** | +0.61 (normalized β of –0.61): a 1% rise in contracts is linked to a \~0.006 higher USD/GBP rate. | USD/GBP adjusts modestly (–0.08); Contracts adjust moderately (+0.47). | Exchange rate dynamics consistent with the previous pair; contracts volatile but less extreme. |
| **Govt Expenditure <-> Constr. Materials Price Index** | +1.06 (normalized β of –1.06): a 1% rise in materials cost is linked to \~1.1% higher Govt expenditure. | Govt spending adjusts clearly (–0.23); Materials adjust very little (+0.02). | Fiscal side absorbs shocks; materials prices behave as weakly exogenous. |
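As a worked example of how β and α translate into a long-run slope and an adjustment speed, the sketch below uses the first pair (Production GDP and Workforce Jobs). It assumes the estimated cointegrating vector, normalized on GDP, is (1, -1.2211) and takes the α values from the printed results. The persistence formula 1 + β'α holds exactly only when the model has no lagged differences (`k_ar_diff=0`, as for this pair); for the other pairs it would be an approximation at best.

```python
import math

# Values copied from the first printed VECM block (Production GDP <-> Workforce Jobs, k_ar_diff=0)
beta = (1.0, -1.2211)            # cointegrating vector, normalized on Production GDP
alpha = (-0.206847, 0.139243)    # speed-of-adjustment coefficients

# Solving beta[0]*log(GDP) + beta[1]*log(Jobs) + const = 0 for log(GDP) flips the sign of beta[1]
elasticity = -beta[1] / beta[0]

# With no lagged differences, the equilibrium error follows
# ect_t = (1 + beta'alpha) * ect_{t-1} + noise
persistence = 1.0 + sum(b * a for b, a in zip(beta, alpha))
half_life = math.log(0.5) / math.log(persistence)

print(f"Long-run elasticity of GDP w.r.t. jobs: {elasticity:.2f}")
print(f"Share of a deviation remaining after one quarter: {persistence:.2f}")
print(f"Half-life of a deviation: {half_life:.1f} quarters")
```

Because both series adjust, the gap closes faster than the GDP coefficient alone would suggest: about 38% of a deviation is removed each quarter, against about 21% from GDP's own adjustment.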
