### Analyse stationarity of different macroeconomic series

**The purpose of this notebook is to illustrate the complexities of stationarity analysis.**  
**This analysis is not sufficient to assess whether GDP growth or the unemployment rate of an individual country is stationary.**

**Review of analysis from an OpenAI model**:  
The most common stationarity tests, when applied to macro-level data such as unemployment or GDP in levels, often fail to   
reject non‐stationarity. In other words, it is *normal* for a raw unemployment or GDP series to come back   
“non‐stationary” in the standard ADF/KPSS/PP tests, especially if you are only feeding in about 20–25 yearly   
data points and not detrending first.

Some things to keep in mind:

1. **Macro indicators often trend and/or have breaks.** Unemployment and GDP tend to have structural breaks,  
trends, or strong cycles. If you run ADF on raw levels of a trending series or a series with breaks, you will often  
fail to reject the null of non‐stationarity. That’s why you may see large p‐values (the red cells in your heatmap).

2. **Short sample = low power.** If your windows run from, say, 2001 to 2024, that is only 24 annual observations,  
and the shortest window used here (2014 to 2024) has just 11. Many stationarity tests have low power in short samples:  
they often fail to reject a unit root even when the series is in fact stationary. Even with monthly data you would need  
to be cautious; with yearly data you have far fewer points, so a “non‐stationary” result may say more about the sample size than about the series.  

3. **ADF vs. KPSS vs. Zivot‐Andrews.**  
   - ADF has a null of *non*‐stationarity (a unit root), so a large p‐value means “we cannot reject that it’s non‐stationary.”  
   - KPSS has the opposite null (it assumes stationarity), so the interpretation of the p‐value flips: a small p‐value is evidence *against* stationarity.  
   - Zivot‐Andrews keeps the unit-root null but allows for one structural break in the intercept and/or the trend, with the break date estimated from the data. This can help if you suspect, for example,  
     a major shift in unemployment around the financial crisis or the COVID period.
     
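Hence, *nothing is necessarily “wrong”* with the results. They are likely telling you that, in a window this short,   
the tests cannot distinguish unemployment in levels (2001 to 2024) from a non‐stationary series. If your goal is to model these series in a stationary framework,   
you would transform them before running the stationarity tests again: difference them if they appear to have a unit root, or detrend them (or include a trend term in the test)   
if they look stationary around a trend. You should then start seeing more green cells, indicating that you *can* reject non‐stationarity.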

In [1]:
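import pandas as pd
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss, zivot_andrews
import requests
import warnings
from statsmodels.tools.sm_exceptions import InterpolationWarning

# Optionally suppress warnings:
# - InterpolationWarning: raised by kpss when the test statistic falls outside its
#   p-value lookup table (the reported p-value is then capped at 0.01 or 0.10)
# - RuntimeWarning: generic numerical warnings (e.g. divide-by-zero)
warnings.filterwarnings("ignore", category=InterpolationWarning)
warnings.filterwarnings("ignore", category=RuntimeWarning)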

# Ensure no limitations in Jupyter Notebook output
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)



### A. Retrieve macroeconomic data

In [2]:
# Parametrisation
countries = [
    "AUT",  # Austria
    "BEL",  # Belgium
    "BGR",  # Bulgaria
    "CYP",  # Cyprus
    "CZE",  # Czechia
    "DEU",  # Germany
    "DNK",  # Denmark
    "EA20", # Euro area (20 countries)
    "ESP",  # Spain
    "EST",  # Estonia
    "EU27", # European Union (27 countries)
    "FIN",  # Finland
    "FRA",  # France
    "GBR",  # United Kingdom
    "GRC",  # Greece
    "HRV",  # Croatia
    "HUN",  # Hungary
    "IRL",  # Ireland
    "ITA",  # Italy
    "LTU",  # Lithuania
    "LUX",  # Luxembourg
    "LVA",  # Latvia
    "MLT",  # Malta
    "NLD",  # Netherlands
    "POL",  # Poland
    "PRT",  # Portugal
    "ROM",  # Romania
    "SVK",  # Slovakia
    "SVN",  # Slovenia
    "SWE",  # Sweden
]

macroeconomic_variable = 'OVGD'  # AMECO variable code: 'OVGD' is GDP at constant prices, 'ZUTN' the unemployment rate

In [3]:
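def retrieve_macro_series(countries, macroeconomic_variable):
    """Download an annual AMECO series for several countries from the ECB SDMX REST service.

    Returns a DataFrame with years (int) as the index and one column per area.
    """
    # Build the series key; '+' requests several countries in one call
    series = f"AME/A.{'+'.join(countries)}.1.0.0.0.{macroeconomic_variable}"
    url = 'https://sdw-wsrest.ecb.europa.eu/service/data/'

    # Headers used as content negotiation to return data in (SDMX-)JSON format
    headers = {'Accept': 'application/json'}
    response = requests.get(f'{url}{series}', headers=headers, timeout=60)
    response.raise_for_status()  # fail on HTTP errors instead of on JSON decoding
    r = response.json()

    # Parse the SDMX-JSON response
    date_list = r['structure']['dimensions']['observation'][0]['values']
    dates = {i: v['id'] for i, v in enumerate(date_list)}
    areas = [v['name'] for v in r['structure']['dimensions']['series'][1]['values']]

    # Collect one Series per area first: assigning columns one by one to an empty
    # DataFrame would align every later column on the first column's years
    data = {}
    for area_idx, area in enumerate(areas):
        s_key = f'0:{area_idx}:0:0:0:0:0'
        observations = r['dataSets'][0]['series'][s_key]['observations']
        data[area] = pd.Series({dates[int(obs_idx)]: v[0] for obs_idx, v in observations.items()})

    df = pd.DataFrame(data)
    df.index = df.index.astype(int)

    return df.sort_index()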
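The parsing step relies on the SDMX-JSON layout: observation positions map to years via `structure.dimensions.observation`, and each series key such as `0:3:0:0:0:0:0` carries the area position in its second field. The offline sketch below runs the same parsing logic on a tiny hand-made payload; the payload mimics only the fields `retrieve_macro_series` reads, its values are invented, and real ECB responses contain many more fields. The helper names `build_series_key` and `parse_sdmx_json` are hypothetical. Building the frame from a dict of Series keeps years that the first area lacks, which assigning columns one at a time to an empty DataFrame would drop.

```python
import pandas as pd


def build_series_key(countries, variable):
    # Same key pattern as retrieve_macro_series: '+' joins several countries
    return f"AME/A.{'+'.join(countries)}.1.0.0.0.{variable}"


def parse_sdmx_json(r):
    """Turn an SDMX-JSON dict into a years-by-areas DataFrame."""
    date_list = r["structure"]["dimensions"]["observation"][0]["values"]
    dates = {i: v["id"] for i, v in enumerate(date_list)}
    areas = [v["name"] for v in r["structure"]["dimensions"]["series"][1]["values"]]
    data = {}
    for area_idx, area in enumerate(areas):
        obs = r["dataSets"][0]["series"][f"0:{area_idx}:0:0:0:0:0"]["observations"]
        data[area] = pd.Series({dates[int(k)]: v[0] for k, v in obs.items()})
    df = pd.DataFrame(data)  # union of all areas' years
    df.index = df.index.astype(int)
    return df.sort_index()


# Hypothetical toy payload: only the fields the parser reads, invented values.
# Austria (the first area) has no 2000 observation; Belgium has both years.
toy = {
    "structure": {"dimensions": {
        "observation": [{"values": [{"id": "2000"}, {"id": "2001"}]}],
        "series": [{"values": [{"name": "A"}]},
                   {"values": [{"name": "Austria"}, {"name": "Belgium"}]}],
    }},
    "dataSets": [{"series": {
        "0:0:0:0:0:0:0": {"observations": {"1": [2.0]}},
        "0:1:0:0:0:0:0": {"observations": {"0": [1.0], "1": [3.0]}},
    }}],
}

print(build_series_key(["AUT", "BEL"], "OVGD"))  # AME/A.AUT+BEL.1.0.0.0.OVGD
print(parse_sdmx_json(toy))
```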

In [4]:
# Annual GDP growth rates (year-on-year percentage change of the 'OVGD' series)
gdp_df = retrieve_macro_series(countries, 'OVGD')
gdp_df = gdp_df.pct_change(fill_method=None)

# Unemployment rate, kept in levels
unemployment_df = retrieve_macro_series(countries, 'ZUTN')
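`pct_change(fill_method=None)` computes x_t / x_{t-1} - 1 and leaves gaps as NaN instead of forward-filling across them. A small worked example on hypothetical GDP levels, with the log difference shown for comparison (it approximates the growth rate for small changes):

```python
import numpy as np
import pandas as pd

# Hypothetical GDP levels with one missing year
levels = pd.Series([100.0, 102.0, np.nan, 105.0], index=[2000, 2001, 2002, 2003])

growth = levels.pct_change(fill_method=None)  # 2001: 102 / 100 - 1 = 0.02
log_growth = np.log(levels).diff()            # 2001: log(1.02), about 0.0198

print(pd.DataFrame({"level": levels, "pct_change": growth, "log_diff": log_growth}))
```

Both 2002 and 2003 come out as NaN: 2002 has no level, and 2003 has no previous level to compare with. Those NaNs are later removed by `dropna()` inside each test window.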

In [5]:
gdp_df.loc[2000: 2024]

Year,Austria,Belgium,Bulgaria,Cyprus,Czech Republic,FR. Germany,Denmark,Euro area (20 countries),Spain,Estonia,European Union (27 countries),Finland,France,United Kingdom,Greece,Croatia,Hungary,Ireland,Italy,Lithuania,Luxembourg,Latvia,Malta,Netherlands,Poland,Portugal,Romania,Slovakia,Slovenia,Sweden
2000,0.031895,0.037167,0.045872,0.059653,0.040107,0.028773,0.03724,0.038944,0.052006,0.100877,0.038939,0.057537,0.04141,0.043417,0.041378,0.029453,0.044094,0.094035,0.038821,0.034186,0.069381,0.05841,0.064134,0.042198,0.046563,0.038162,0.024614,0.007893,0.034979,0.046311
2001,0.01317,0.010996,0.038238,0.039526,0.029172,0.016365,0.0095,0.021853,0.039191,0.058807,0.021758,0.026403,0.018994,0.025727,0.046504,0.031112,0.04062,0.053058,0.020065,0.064841,0.030744,0.064606,-0.007572,0.023236,0.012337,0.019437,0.052182,0.029251,0.028287,0.013608
2002,0.014844,0.017069,0.058719,0.037229,0.015135,-0.002283,0.004565,0.009468,0.027555,0.069331,0.011106,0.016873,0.010678,0.017957,0.046832,0.058033,0.047301,0.058994,0.002699,0.067199,0.032254,0.076655,0.027411,0.002458,0.019013,0.007709,0.05703,0.044172,0.032826,0.022777
2003,0.011416,0.010379,0.052371,0.026233,0.033008,-0.005299,0.004411,0.007532,0.029393,0.075962,0.009081,0.020118,0.009678,0.031523,0.057968,0.055684,0.039388,0.030139,0.000665,0.105519,0.026194,0.084291,0.036933,0.000978,0.035234,-0.009305,0.023412,0.048568,0.03195,0.018809
2004,0.025653,0.035712,0.065106,0.050263,0.047363,0.011624,0.027764,0.023121,0.031145,0.068007,0.025924,0.040054,0.028683,0.024579,0.053778,0.041705,0.049631,0.067881,0.01474,0.06502,0.042319,0.087251,0.004073,0.020163,0.050908,0.017887,0.10428,0.053893,0.04546,0.041795
2005,0.023204,0.023217,0.070564,0.048531,0.06375,0.008857,0.023596,0.017718,0.035512,0.095231,0.019247,0.027773,0.018886,0.027327,0.011835,0.043271,0.043003,0.057398,0.007624,0.077315,0.024829,0.116153,0.028813,0.02034,0.032608,0.007819,0.046681,0.064851,0.038541,0.027932
2006,0.032693,0.025523,0.068026,0.047138,0.066232,0.038557,0.038165,0.033215,0.040434,0.097626,0.034944,0.040196,0.02714,0.023807,0.064434,0.050815,0.039339,0.049878,0.017995,0.073954,0.060167,0.12825,0.023361,0.035374,0.062021,0.01625,0.080288,0.089256,0.059088,0.04676
2007,0.037752,0.036769,0.066543,0.05098,0.054888,0.028901,0.009871,0.030114,0.035335,0.075709,0.03156,0.053128,0.025305,0.026249,0.035068,0.050489,0.003324,0.053102,0.014623,0.11078,0.080987,0.104146,0.050364,0.038853,0.067605,0.025066,0.072339,0.108187,0.071392,0.032249
2008,0.014533,0.004469,0.061295,0.036468,0.026123,0.009104,-0.004172,0.004164,0.007671,-0.051253,0.006411,0.007844,0.003802,-0.002488,0.000575,0.019686,0.009934,-0.044841,-0.010231,0.025994,-0.003002,-0.033887,0.044107,0.021168,0.043837,0.003192,0.093074,0.053634,0.033722,-0.009231
2009,-0.035863,-0.019065,-0.033471,-0.020153,-0.047983,-0.055452,-0.049745,-0.044827,-0.037681,-0.146302,-0.043487,-0.08076,-0.028246,-0.046205,-0.041193,-0.068139,-0.067402,-0.050958,-0.053051,-0.148386,-0.032389,-0.1604,-0.013953,-0.036653,0.026151,-0.031221,-0.055167,-0.055054,-0.075906,-0.042556


In [6]:
unemployment_df.loc[2000: 2024]

Year,Austria,Belgium,Bulgaria,Cyprus,Czech Republic,FR. Germany,Denmark,Euro area (20 countries),Spain,Estonia,European Union (27 countries),Finland,France,United Kingdom,Greece,Croatia,Hungary,Ireland,Italy,Lithuania,Luxembourg,Latvia,Malta,Netherlands,Poland,Portugal,Romania,Slovakia,Slovenia,Sweden
2000,3.8,7.1,19.6,4.9,8.8,7.4,4.6,,13.9,14.6,9.9,9.9,8.6,5.459822,11.6,15.6,6.2,4.5,10.7,16.4,2.4,14.5,6.6,3.6,16.8,4.8,8.9,18.8,6.8,6.7
2001,3.9,6.7,23.6,3.9,8.2,7.5,4.6,,10.6,13.0,9.5,9.2,7.8,5.099012,11.0,16.0,5.5,4.2,9.7,17.3,2.3,13.9,6.9,2.8,19.0,4.8,8.3,19.3,6.2,5.0
2002,4.3,7.6,21.1,3.6,7.3,8.2,4.6,,11.5,11.2,9.8,9.2,7.9,5.188219,10.6,15.0,5.6,4.7,9.1,13.7,2.9,12.6,6.9,3.4,20.7,6.0,10.5,18.7,6.3,5.2
2003,4.6,8.3,15.9,4.3,7.8,9.3,5.4,,11.5,10.3,9.9,9.1,8.5,5.012749,10.0,14.2,5.7,4.8,8.8,12.5,3.7,11.7,7.6,4.5,20.4,7.5,8.5,17.6,6.7,5.8
2004,5.9,8.5,14.1,4.7,8.3,10.2,5.5,,11.0,10.1,10.1,8.9,8.9,4.753824,10.8,13.7,5.9,4.7,8.1,10.9,5.1,11.8,7.2,5.6,19.7,7.8,9.9,18.2,6.3,6.6
2005,6.0,8.6,11.7,5.3,7.9,10.5,4.8,,9.2,8.0,9.8,8.5,8.9,4.830368,10.2,12.8,7.0,4.6,7.8,8.3,4.5,10.1,6.9,7.2,18.5,9.0,8.8,16.3,6.5,7.6
2006,5.7,8.4,10.5,4.6,7.2,9.6,3.9,,8.5,5.9,8.8,7.8,8.8,5.422897,9.2,11.3,7.3,4.8,6.9,5.8,4.7,7.1,6.8,6.1,14.4,9.1,8.9,13.4,6.0,7.2
2007,5.3,7.6,8.0,3.9,5.3,8.1,3.8,,8.2,4.6,7.7,7.0,8.0,5.332777,8.6,9.9,7.2,5.0,6.2,4.3,4.1,6.2,6.5,5.2,10.0,9.5,7.8,11.1,4.9,6.3
2008,4.4,7.1,6.5,3.7,4.4,7.0,3.7,,11.3,5.5,7.4,6.5,7.4,5.685386,8.0,8.6,7.6,6.8,6.8,5.8,5.1,7.8,6.0,4.5,7.4,9.0,7.1,9.5,4.4,6.3
2009,5.7,8.0,7.9,5.4,6.7,7.3,6.4,9.7,17.9,13.5,9.3,8.3,9.1,7.613899,9.8,9.2,9.7,12.6,7.9,13.8,5.1,17.7,6.9,5.4,8.5,11.2,8.4,12.0,5.9,8.5


### B. Analyse stationarity

In [7]:
def stationarity_testing(
    df, 
    test='adf', 
    start_year_range=range(2001, 2015), 
    end_year=2024, 
    **kwargs
):
    """
    Apply a stationarity test to each time series (DataFrame column) over sliding windows.
    
    Parameters:
    -----------
    df : pd.DataFrame
        Time series data with (integer) years as the index and series (e.g., countries) as columns.
    test : str, optional (default='adf')
        Stationarity test to use. Options are:
          - 'adf': Augmented Dickey–Fuller test
          - 'kpss': KPSS test
          - 'zivot': Zivot–Andrews test
    start_year_range : iterable, optional (default=range(2001, 2015))
        Iterable of starting years for the sliding window.
    end_year : int, optional (default=2024)
        The end year for the sliding window.
    **kwargs :
        Additional keyword arguments to pass to the chosen test function.
        
    Returns:
    --------
    pd.DataFrame
        A DataFrame with series as rows and sliding window labels as columns, 
        containing the p-values from the stationarity tests. A value of exactly 0.5
        is a placeholder meaning the test raised an error for that window.
    """
    test_key = test.lower()
    # Validate the test name up front, so a typo is not swallowed by the
    # error handling inside the loop
    if test_key not in ('adf', 'kpss', 'zivot'):
        raise ValueError(f"Unknown test: {test}. Choose 'adf', 'kpss', or 'zivot'.")

    results = {}
    
    for country in df.columns:
        p_values_by_window = {}
        for start_year in start_year_range:
            # Select the sub-series for the current window.
            ts_sub = df.loc[start_year:end_year, country].dropna()
            if len(ts_sub) < 2:
                continue

            try:
                # Use explicit if/elif to choose the test; each returns the p-value as element [1].
                if test_key == 'adf':
                    p_value = adfuller(ts_sub, **kwargs)[1]
                elif test_key == 'kpss':
                    p_value = kpss(ts_sub, **kwargs)[1]
                else:  # 'zivot'
                    p_value = zivot_andrews(ts_sub, **kwargs)[1]
            except Exception:
                # Placeholder when the test fails (e.g. the window is too short)
                p_value = 0.5

            window_label = f"{start_year}-{end_year}"
            p_values_by_window[window_label] = p_value
        
        results[country] = p_values_by_window
    
    return pd.DataFrame(results).T

def highlight_pvalue(val, test='adf'):
    """
    Color cells based on p-value and test type:
      - For ADF or Zivot–Andrews (null: unit root): p < 0.05 => stationary (green)
      - For KPSS (null: stationarity): p < 0.05 => non-stationary (red)
      - 0.05 <= p < 0.10 is orange for every test
    Backgrounds use 40% opacity with black text; NaN values and unknown tests are left unstyled.
    """
    if pd.isna(val):
        return ""
    
    if test.lower() in ['adf', 'zivot']:
        # Null: non-stationary
        if val < 0.05:
            bg_color = "rgba(0, 128, 0, 0.4)"     # Green if we reject unit root
        elif val < 0.10:
            bg_color = "rgba(255, 165, 0, 0.4)"   # Orange
        else:
            bg_color = "rgba(255, 0, 0, 0.4)"     # Red
    elif test.lower() == 'kpss':
        # Null: stationarity
        # So small p-value => we reject stationarity => series likely non-stationary => color it red
        if val < 0.05:
            bg_color = "rgba(255, 0, 0, 0.4)"     # Red
        elif val < 0.10:
            bg_color = "rgba(255, 165, 0, 0.4)"   # Orange
        else:
            bg_color = "rgba(0, 128, 0, 0.4)"     # Green
    else:
        return ""  # Unknown test: no styling rather than an empty CSS value
    
    return f"background-color: {bg_color}; color: black;"

In [8]:
# Test annual GDP growth (options: adf, kpss, zivot)
test = 'adf'

df_results = stationarity_testing(gdp_df.loc[2000:2024], test=test)

df_results_styled = (
    df_results
    .style
    .format("{:.2%}")
    .map(lambda x: highlight_pvalue(x, test=test))
)

df_results_styled

Country,2001-2024,2002-2024,2003-2024,2004-2024,2005-2024,2006-2024,2007-2024,2008-2024,2009-2024,2010-2024,2011-2024,2012-2024,2013-2024,2014-2024
Austria,0.21%,0.08%,0.00%,0.00%,0.00%,0.00%,14.57%,18.41%,15.32%,0.05%,2.35%,22.66%,45.38%,0.20%
Belgium,0.01%,0.01%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,4.52%,0.00%,62.52%,29.30%,26.92%
Bulgaria,0.50%,0.50%,0.56%,0.37%,0.15%,0.04%,0.00%,0.00%,67.53%,0.02%,0.04%,0.06%,0.01%,0.00%
Cyprus,2.08%,2.58%,3.26%,3.89%,4.54%,5.33%,5.99%,7.77%,0.00%,35.34%,30.02%,6.32%,13.22%,9.68%
Czech Republic,0.19%,0.97%,0.06%,0.42%,0.20%,0.05%,16.10%,23.98%,2.76%,0.30%,2.50%,7.75%,0.42%,1.06%
FR. Germany,0.00%,0.00%,0.00%,0.00%,0.01%,0.01%,0.01%,0.01%,0.00%,0.31%,0.05%,0.30%,0.51%,1.10%
Denmark,0.02%,0.02%,0.03%,0.05%,0.08%,0.06%,0.25%,0.21%,0.00%,0.00%,23.59%,9.76%,17.64%,0.60%
Euro area (20 countries),0.00%,0.00%,0.00%,0.00%,0.00%,0.00%,35.01%,44.69%,21.49%,8.80%,8.41%,0.06%,48.84%,0.18%
Spain,0.05%,9.05%,0.25%,0.13%,0.16%,0.16%,96.10%,0.36%,59.36%,0.67%,19.81%,0.70%,21.14%,1.34%
Estonia,0.94%,3.47%,3.26%,3.42%,2.14%,1.07%,0.82%,2.93%,0.00%,1.50%,0.83%,1.69%,2.99%,4.69%


In [9]:
# Test the unemployment rate in levels (options: adf, kpss, zivot)
test = 'adf'

df_results = stationarity_testing(unemployment_df.loc[2000:2024], test=test)

df_results_styled = (
    df_results
    .style
    .format("{:.2%}")
    .map(lambda x: highlight_pvalue(x, test=test))
)

df_results_styled

Country,2001-2024,2002-2024,2003-2024,2004-2024,2005-2024,2006-2024,2007-2024,2008-2024,2009-2024,2010-2024,2011-2024,2012-2024,2013-2024,2014-2024
Austria,0.28%,0.04%,0.88%,1.40%,1.97%,3.36%,20.75%,8.23%,2.13%,2.07%,2.94%,6.71%,11.51%,13.16%
Belgium,75.93%,81.54%,60.21%,57.37%,0.02%,69.92%,81.11%,83.22%,97.69%,59.04%,100.00%,87.51%,72.18%,48.20%
Bulgaria,99.80%,87.39%,51.19%,82.18%,81.91%,85.38%,89.87%,89.40%,92.10%,93.01%,86.63%,58.24%,0.47%,0.05%
Cyprus,29.45%,5.16%,16.20%,16.64%,9.82%,87.54%,88.49%,17.27%,100.00%,98.43%,97.91%,94.43%,0.00%,2.89%
Czech Republic,99.32%,99.90%,99.90%,99.38%,0.00%,0.35%,0.00%,87.52%,85.69%,58.71%,65.05%,27.74%,6.84%,2.73%
FR. Germany,64.84%,31.34%,100.00%,99.82%,100.00%,0.12%,6.90%,36.32%,0.45%,0.94%,30.30%,41.51%,18.55%,6.17%
Denmark,31.84%,23.78%,43.43%,44.80%,0.04%,0.39%,0.00%,0.12%,0.01%,0.67%,0.00%,0.01%,18.56%,0.00%
Euro area (20 countries),99.36%,99.36%,99.36%,99.36%,99.36%,99.36%,99.36%,99.36%,99.36%,96.25%,95.80%,90.48%,39.42%,22.15%
Spain,19.26%,5.43%,55.78%,11.52%,37.77%,7.73%,51.15%,72.49%,98.67%,95.98%,95.17%,84.97%,49.71%,7.46%
Estonia,20.08%,26.76%,5.63%,0.02%,2.33%,40.23%,36.52%,40.50%,40.51%,0.00%,0.26%,4.55%,5.77%,13.46%


In [10]:
# Test the unemployment rate in levels (options: adf, kpss, zivot)
test = 'zivot'

df_results = stationarity_testing(unemployment_df.loc[2000:2024], test=test)

df_results_styled = (
    df_results
    .style
    .format("{:.2%}")
    .map(lambda x: highlight_pvalue(x, test=test))
)

df_results_styled

Country,2001-2024,2002-2024,2003-2024,2004-2024,2005-2024,2006-2024,2007-2024,2008-2024,2009-2024,2010-2024,2011-2024,2012-2024,2013-2024,2014-2024
Austria,25.77%,0.30%,5.42%,9.82%,11.07%,19.40%,14.50%,44.41%,21.22%,38.01%,50.00%,50.00%,0.00%,50.00%
Belgium,3.31%,6.73%,50.00%,9.78%,50.00%,nan%,nan%,1.15%,50.00%,50.00%,50.00%,51.04%,99.09%,99.59%
Bulgaria,0.07%,9.11%,0.07%,50.00%,nan%,nan%,50.00%,50.00%,0.00%,50.00%,50.00%,98.31%,99.39%,96.21%
Cyprus,50.00%,50.00%,94.89%,50.00%,nan%,50.00%,50.00%,nan%,nan%,nan%,0.00%,50.00%,50.00%,50.00%
Czech Republic,0.00%,0.03%,50.00%,0.01%,50.00%,50.00%,50.00%,nan%,50.00%,50.00%,50.00%,50.00%,99.39%,99.12%
FR. Germany,50.00%,50.00%,93.49%,0.00%,50.00%,50.00%,50.00%,50.00%,22.93%,0.83%,87.97%,50.00%,50.00%,50.00%
Denmark,50.00%,50.00%,nan%,50.00%,0.12%,50.00%,nan%,0.00%,nan%,50.00%,50.00%,0.00%,50.00%,50.00%
Euro area (20 countries),nan%,nan%,nan%,nan%,nan%,nan%,nan%,nan%,nan%,nan%,90.54%,50.00%,50.00%,50.00%
Spain,50.00%,50.00%,50.00%,50.00%,50.00%,50.00%,50.00%,50.00%,nan%,50.00%,50.00%,50.00%,50.00%,50.00%
Estonia,56.85%,63.57%,65.95%,59.86%,50.00%,nan%,50.00%,nan%,nan%,75.79%,0.00%,50.00%,0.00%,94.91%
