# Identifying (volatility-)regimes in the EUR/USD spot exchange rate using clustering algorithms: An Oil and Gas Perspective on Parity Conditions.
Seminar in Applied Financial Economics: Applied Econometrics of FX Markets - Prof. Dr. Reitz
<br>
**Josef Fella and Robert Hennings**
<br>
Christian Albrechts University of Kiel
<br>
*josef.fella@stu.uni-kiel.de and robert.hennings@stu.uni-kiel.de*
<br>
GitHub: https://github.com/RobertHennings/Seminar
<br>
Kiel - 14.11.2025

**!!!DO NOT HIT EXECUTE ALL CELLS AS THE FITTING (SEE DOWN BELOW) WILL TAKE A LONG TIME AND WRITES FILES!!!**

**Group 08**
<br>
Josef Fella: stu245231, Quantitative Finance
<br>
Robert Hennings: stu236320, Quantitative Finance

## Outline
1. Research Hypothesis
<br>
1.1 Energy Commodity Price Shocks: The Pass-Through Effect and Implications for Monetary Policy
<br>
1.2 Research Hypothesis
<br>
2. Theoretical Framework
<br>
2.1 A simple model of exchange rates and commodity prices
<br>
2.2 Theoretical Framework
<br>
3. Model Results
<br>
3.1 Regime identification - Model comparison and selection
<br>
4. Conclusion and Discussion
<br>
4.1 Seminar Project Summary
<br>
4.2 Seminar Project Limitations
<br>
4.3 Future Research
<br>
5. Appendix
<br>
5.1 Abbreviations
<br>
5.2 Systematic Literature Overview: Main Approaches
<br>
5.3 Figures and Tables
<br>
5.4 Definitions and Data
<br>

## Short description of the notebook contents
The contents of this Jupyter notebook produce the main results for the chapter Model Results. The notebook relies on the file model.py, which contains all the detailed data-generating procedures. These procedures are omitted here to avoid confusion in case the data cannot be loaded properly from the various sources, which can fail for a number of reasons.
<br>
In this notebook, the clustering techniques are fitted to the data and the main inference is extracted.

Import dependencies/packages:

In [1]:
import os
import datetime as dt
import pandas as pd
import numpy as np

Set global config settings:

**!!!!CHANGE WORKING DIRECTORY HERE!!!!**

In [None]:
SEMINAR_PATH = r"/Users/Josef/Desktop/Seminar"
# Example: r"/Users/Robert_Hennings/Uni/Master/Seminar"

In [None]:
SEMINAR_CODE_PATH = rf"{SEMINAR_PATH}/src/seminar_code"
MODELS_PATH = rf"{SEMINAR_CODE_PATH}/models"
PRESENTATION_DATA = rf"{SEMINAR_PATH}/reports/presentation_latex_version/data"

# Change working directory to seminar code path
print(os.getcwd())
os.chdir(SEMINAR_CODE_PATH) # <- needed to be able to import the ModelObject class from model.architecture
print(os.getcwd())

/Users/Robert_Hennings/Uni/Master/Seminar/notebooks
/Users/Robert_Hennings/Uni/Master/Seminar/src/seminar_code


In [None]:
from model.architecture import ModelObject # <- import the ModelObject class

Import all the clustering algorithms used (implemented in scikit-learn), the Markov-switching regression from statsmodels, and the silhouette score used for evaluation:

In [None]:
from sklearn.cluster import KMeans
from sklearn.cluster import AgglomerativeClustering
from sklearn.cluster import DBSCAN
from sklearn.cluster import MeanShift
from sklearn.mixture import GaussianMixture
from sklearn.cluster import Birch
from sklearn.cluster import AffinityPropagation
from sklearn.cluster import OPTICS
from sklearn.cluster import MiniBatchKMeans
from statsmodels.tsa.regime_switching.markov_regression import MarkovRegression
from sklearn.metrics import silhouette_score

## 1) Loading the data

Since we test our approach on several datasets and compare the results across them, all benchmark datasets are stored in a list that we loop through.

In [6]:
N_REGIMES = 2 # <- fix the number of regimes
spot_rate = ["EUR/USD"] # <- main variable to be used

file_name = r"chap_04_model_input_data_list.xlsx"
full_file_path = rf"{PRESENTATION_DATA}/{file_name}"
data_list = [] # as we have a few benchmarking datasets, these will be stored in a list
with pd.ExcelFile(full_file_path) as xls:
    for sheet_name in xls.sheet_names:
        data = pd.read_excel(xls, sheet_name=sheet_name, index_col=0)
        data_list.append(data)

# Ensure that every data item in the data_list is of type pd.DataFrame
for i, data in enumerate(data_list):
    if isinstance(data, pd.Series):
        data_list[i] = data.to_frame()
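At the top of the fitting loop below, each dataset is split into the EUR/USD series (endog) and any remaining columns (exog). A minimal sketch of that split as a standalone, testable function; `split_endog_exog` and the toy column names are hypothetical and not part of model.py:

```python
import pandas as pd


def split_endog_exog(data: pd.DataFrame, spot_rate_name: str = "EUR/USD"):
    """Split a dataset into the spot-rate series (endog) and the other columns (exog).

    Closely mirrors the first lines of the fitting loop; returns exog=None
    when there are no columns besides the spot-rate column.
    """
    # The first column whose name contains the spot-rate label is the target
    endog_cols = [col for col in data.columns if spot_rate_name in col]
    if not endog_cols:
        raise KeyError(f"No column containing {spot_rate_name!r} found")
    endog = data[endog_cols[0]]
    # Every column that does not contain the spot-rate label is treated as exogenous
    exog_cols = [col for col in data.columns if spot_rate_name not in col]
    exog = data[exog_cols] if exog_cols else None
    return endog, exog


# Toy dataset with made-up column names and values (illustration only)
toy = pd.DataFrame(
    {"EUR/USD_vol": [0.1, 0.2, 0.3], "Brent_vol": [1.0, 2.0, 3.0]},
    index=pd.date_range("2020-01-01", periods=3, freq="D"),
)
endog, exog = split_endog_exog(toy)
print(endog.name, list(exog.columns))
```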

## 2) Fitting the clustering algorithms and saving the fitted models as well as inference

The following code section produces the main results of the seminar project: the algorithms are fitted to the various datasets, each fitted model is saved as a .pkl (pickle) file, and an accompanying .json file is saved alongside it, containing all the necessary inference and transparency parameters. This in particular enables a transparent reproduction of all presented results.
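The actual saving logic lives in `ModelObject.save_model` and `ModelObject.get_full_model_info` in model.py and is not shown here. As a rough, stdlib-only illustration of the general ".pkl model plus .json sidecar" pattern, here is a sketch; `save_model_with_metadata`, `load_model_with_metadata`, the dict standing in for a fitted model and the metadata keys are all hypothetical:

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_model_with_metadata(model, metadata: dict, folder: Path, file_name: str):
    """Write <file_name>.pkl (the model object) and <file_name>.json (metadata) side by side."""
    folder.mkdir(parents=True, exist_ok=True)
    pkl_path = folder / f"{file_name}.pkl"
    json_path = folder / f"{file_name}.json"
    with open(pkl_path, "wb") as f:
        pickle.dump(model, f)
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2)
    return pkl_path, json_path


def load_model_with_metadata(folder: Path, file_name: str):
    """Read back the pickled model and its JSON metadata."""
    with open(folder / f"{file_name}.pkl", "rb") as f:
        model = pickle.load(f)
    with open(folder / f"{file_name}.json", encoding="utf-8") as f:
        metadata = json.load(f)
    return model, metadata


# Hypothetical fitted state of a two-regime model (a plain dict stands in for a fitted estimator)
fitted_model = {"model_name": "ThresholdRegimeModel", "threshold": 0.5}
metadata = {"model_name": fitted_model["model_name"], "n_regimes": 2, "predicted_labels": [0, 1, 1]}

with tempfile.TemporaryDirectory() as tmp:
    save_model_with_metadata(fitted_model, metadata, Path(tmp), "ThresholdRegimeModel_demo")
    reloaded_model, reloaded_meta = load_model_with_metadata(Path(tmp), "ThresholdRegimeModel_demo")
    files_written = sorted(p.name for p in Path(tmp).iterdir())

print(files_written)
```

Keeping the metadata in JSON means the labels and scores can be read without unpickling, which matters because pickle files should only be loaded from trusted sources.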

**!!!DO NOT EXECUTE THIS CELL BELOW AS THE FITTING WILL TAKE A LONG TIME AND WRITES FILES!!!**

In [None]:
for data in data_list:
    # Separate endog and exog variables
    endog_col_name = [col for col in data.columns if spot_rate[0] in col][0]
    endog = data[endog_col_name]
    # First determine if the input data has multiple features or is univariate
    if data.shape[1] > 1: # we have external variables
        exog_col_names = [col for col in data.columns if spot_rate[0] not in col]
        exog = data[exog_col_names]
    else: # no external variables to be set
        exog = None
    print(f"endog:\n{endog}\nexog: {exog}")
    # Set up the models
    kmeans = KMeans(n_clusters=N_REGIMES, n_init=10, random_state=42)  # n_init is a constructor argument in scikit-learn
    agg = AgglomerativeClustering(n_clusters=N_REGIMES)
    dbscan = DBSCAN(eps=1.5, min_samples=8)
    ms = MeanShift()
    msm = MarkovRegression(
        endog=endog,
        exog=exog,
        k_regimes=N_REGIMES,
        trend='c',  # or 'n' for no constant
        switching_trend=True,
        switching_exog=True,
        switching_variance=True,
    )
    gmm = GaussianMixture(n_components=N_REGIMES, random_state=42)
    birch = Birch(n_clusters=N_REGIMES)
    affinity = AffinityPropagation()
    optics = OPTICS()
    minibatch_kmeans = MiniBatchKMeans(n_clusters=N_REGIMES, random_state=42)
    # In the below dict, additional fit kwargs for each model class can be specified
    # that will be passed to the fit() method of the respective model class
    fit_kwargs_dict = {
        "KMeans": {},  # KMeans.fit() does not accept n_init; it is set in the constructor above
        "AgglomerativeClustering": {},
        "DBSCAN": {},
        "MeanShift": {},
        "MarkovRegression": {"em_iter": 10, "search_reps": 20},
        "MarkovAutoregression": {"em_iter": 10, "search_reps": 20},
    }
    # fit_kwargs_dict = optimised_fit_kwargs_dict
    # We store them in a list that we will loop through
    models_list = [kmeans, agg, dbscan, ms, msm, gmm, birch, affinity, optics, minibatch_kmeans]
    for model in models_list:
        # 1) Initialize a new instance for each model class
        model_object_instance = ModelObject() # Initialize a new instance for each model
        # 2) Set the model
        model_object_instance.set_model_object(model_object=model)
        # 3) Set the data - Here we only want an In-sample comparison
        # therefore train and test data are the same
        model_object_instance.set_data(
            training_data=data,
            testing_data=data
        )
        # 4) Fit the model
        # based on the model class name extract additional parameters for the fit
        model_name = model.__class__.__name__
        fit_kwargs = fit_kwargs_dict.get(model_name, {})
        model_object_instance.fit(**fit_kwargs)
        # 5) Predict the labels - In sample forecast
        predicted_labels = model_object_instance.predict()
        # 6) Evaluate the model - pick the desired score to evaluate
        # Here we could also think of providing a list with multiple functions at once
        evaluation_score = model_object_instance.evaluate(metric_function_list=[silhouette_score])
        # Save the model and the model info with dynamic names based on the model class name
        timestamp = dt.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        file_name = f"{model.__class__.__name__}_{timestamp}"
        # First save the fitted model object itself to a .pkl file - we can also read it in later
        model_object_instance.save_model(
            file_format=r".pkl",
            file_path=MODELS_PATH,
            model_file_name=file_name
            )
        # Second save all the related metadata/information about the model - from here the relevant metadata will be pulled
        model_object_instance.get_full_model_info(
            save=True,
            return_info_dict=False,
            file_format=r".json",
            file_path=MODELS_PATH,
            file_name=file_name
            )
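Each fitted model is scored with `silhouette_score` in step 6 of the loop. For observation $i$ with mean intra-cluster distance $a(i)$ and mean distance to the nearest other cluster $b(i)$, the silhouette is $s(i) = (b(i) - a(i)) / \max(a(i), b(i))$, and the score is the mean of $s(i)$. A plain NumPy sketch for small samples (it builds an $n \times n$ distance matrix, so it is not meant for the full daily series); `silhouette_score_np` is a hypothetical name, and which features `ModelObject.evaluate` actually passes to the metric is defined in model.py:

```python
import numpy as np


def silhouette_score_np(x, labels):
    """Mean silhouette coefficient with Euclidean distance.

    Assumes at least two clusters; singleton clusters get s(i) = 0,
    following the convention used by scikit-learn.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)  # 1-D input becomes (n, 1)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix
    dist = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    scores = np.zeros(len(x))
    for i in range(len(x)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            continue  # singleton cluster: s(i) stays 0
        a = dist[i, same].mean()  # mean distance within the own cluster
        b = min(
            dist[i, labels == other].mean()
            for other in np.unique(labels)
            if other != labels[i]
        )  # mean distance to the nearest other cluster
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return scores.mean()


values = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
score_good = silhouette_score_np(values, [0, 0, 0, 1, 1, 1])  # well-separated regimes
score_bad = silhouette_score_np(values, [0, 1, 0, 1, 0, 1])   # labels ignore the structure
print(score_good, score_bad)
```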

## 3) Extract the inference from the saved model summary files

From all the saved model inference files (.json files), the predicted in-sample class labels (either 0 or 1) were extracted and collected in a single file, which is read in and displayed below.
<br>
<br>
NOTE: The columns correspond to the fitted clustering algorithms and the values are the class labels 0 or 1. Many NaN values are expected in some columns, since the datasets have different dimensions (i.e., lengths/numbers of observations); the gathering function joins all results regardless of their dimensions to simplify the presentation of the results.
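The gathering function itself is defined in model.py. Assuming it combines the label series column-wise on their date index, the NaN pattern follows directly from how pandas aligns series with different date ranges. A small illustration with made-up labels and hypothetical column names:

```python
import pandas as pd

# Two hypothetical label series from datasets with different sample periods
labels_a = pd.Series(
    [0, 0, 1, 1],
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
    name="KMeans_dataset_a",
)
labels_b = pd.Series(
    [1, 0],
    index=pd.date_range("2020-01-03", periods=2, freq="D"),
    name="KMeans_dataset_b",
)

# concat along axis=1 takes the union of both indices (outer join) and fills gaps with NaN;
# the shorter column is upcast to float because integers cannot hold NaN
combined = pd.concat([labels_a, labels_b], axis=1)
print(combined)

# dropna() with the default how="any" keeps only dates on which every column has a label
print(combined.dropna())
```

The last line is relevant for the UIP step below: calling `.dropna()` on the full label table restricts all algorithms to their common sample period.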

In [27]:
file_name = r"chap_04_predicted_labels_df.xlsx"
full_file_path = rf"{PRESENTATION_DATA}/{file_name}"
predicted_labels_df = pd.read_excel(full_file_path, index_col=0)
print(predicted_labels_df.head(n=30))

            Birch_2025-10-15 17:20:24  \
1983-11-14                        NaN   
1983-11-15                        NaN   
1983-11-16                        NaN   
1983-11-17                        NaN   
1983-11-18                        NaN   
1983-11-21                        NaN   
1983-11-22                        NaN   
1983-11-23                        NaN   
1983-11-24                        NaN   
1983-11-25                        NaN   
1983-11-28                        NaN   
1983-11-29                        NaN   
1983-11-30                        NaN   
1983-12-01                        NaN   
1983-12-02                        NaN   
1983-12-05                        NaN   
1983-12-06                        NaN   
1983-12-07                        NaN   
1983-12-08                        NaN   
1983-12-09                        NaN   
1983-12-12                        NaN   
1983-12-13                        NaN   
1983-12-14                        NaN   
1983-12-15      

## 4) Testing the regime conditional standard UIP-relationship

Next, having read in the class labels (0 or 1) for every historical day, we split the standard UIP data into subsamples according to these labels and run the standard UIP regression for each algorithm and each regime.
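Written out, and as a sketch of the notation only (the variables `EUR` and `i_diff_EUR` are constructed in model.py and the data file), the regression run below for algorithm $m$ and regime $r$ is

$$
y_t = \alpha^{(m,r)} + \beta^{(m,r)} x_t + \varepsilon_t, \qquad t \in \mathcal{T}^{(m)}_r = \{\, t : \hat{S}^{(m)}_t = r \,\}, \quad r \in \{0, 1\},
$$

where $y_t$ is the `EUR` column, $x_t$ the interest rate differential `i_diff_EUR`, and $\hat{S}^{(m)}_t$ the label assigned to day $t$ by algorithm $m$. Assuming $y_t$ is the spot rate change over the horizon matched to the interest differential and both use the same quotation convention, the standard UIP null hypothesis is $\alpha^{(m,r)} = 0$ and $\beta^{(m,r)} = 1$.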

In [None]:
import statsmodels.api as sm

file_name = r"chap_04_uip_data_df.xlsx"
full_file_path = rf"{PRESENTATION_DATA}/{file_name}"
uip_data_df = pd.read_excel(full_file_path, index_col=0)

# Define a simple wrapper function for the statsmodels OLS regression
def run_uip_regression(
        dep_var: str,
        indep_var: str,
        data: pd.DataFrame,
        cov_type: str="nonrobust",
        use_t: bool=True
        ):
    """Regress data[dep_var] on a constant and data[indep_var]; return the fitted OLS results."""
    X = sm.add_constant(data[indep_var])
    y = data[dep_var]
    model = sm.OLS(y, X).fit(cov_type=cov_type, use_t=use_t)
    return model

**!!!NOTE: Use robust standard-errors in the UIP-regression!!!**
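The `cov_type="HC1"` option used below corresponds to White's heteroskedasticity-consistent covariance matrix with a degrees-of-freedom scaling, $\widehat{V}_{HC1} = \frac{n}{n-k}(X'X)^{-1} X' \operatorname{diag}(\hat{e}_i^2) X (X'X)^{-1}$. To make the formula concrete, a NumPy sketch on simulated heteroskedastic data; `ols_hc1` is a hypothetical helper, while the notebook itself relies on statsmodels:

```python
import numpy as np


def ols_hc1(y, x):
    """OLS of y on a constant and x, returning coefficients and HC1 standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    xtx_inv = np.linalg.inv(X.T @ X)
    # X' diag(e^2) X, computed without building the n x n diagonal matrix
    meat = X.T @ (X * resid[:, None] ** 2)
    cov_hc1 = n / (n - k) * xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(cov_hc1))


rng = np.random.default_rng(0)
x = rng.normal(size=500)
# Heteroskedastic noise: the error variance grows with |x|
y = 0.5 + 1.0 * x + rng.normal(size=500) * (0.5 + np.abs(x))
beta, se_hc1 = ols_hc1(y, x)
print(beta, se_hc1)
```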

In [None]:
# Reindex the predicted labels to the UIP data index; dropna() (default how='any') keeps only dates on which every model has a label
predicted_labels_df = predicted_labels_df.reindex(uip_data_df.index).dropna()
uip_data_df = uip_data_df.reindex(predicted_labels_df.index).dropna()
# Now run the UIP regression for each identified regime
currency_pairs = ["EUR"]
regime_uip_results = {}
for model_name in predicted_labels_df.columns:
    regime_uip_results[model_name] = {}
    for regime in predicted_labels_df[model_name].unique():
        regime_data = uip_data_df[predicted_labels_df[model_name] == regime]
        regime_uip_results[model_name][regime] = {}
        for currency in currency_pairs:
            dep_var = f'{currency}'
            indep_var = f'i_diff_{currency}'
            if len(regime_data) < 10:  # Skip regimes with too few data points
                continue
            model = run_uip_regression(dep_var=dep_var, indep_var=indep_var, data=regime_data, cov_type="HC1")
            regime_uip_results[model_name][regime][currency] = model
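The cell above splits the UIP data with boolean masks. An equivalent pattern, sketched here with pandas `groupby` and plain NumPy least squares, makes the minimum-observation guard explicit; `regime_ols` and the toy data are hypothetical, and the notebook itself uses statsmodels OLS with HC1 errors for inference:

```python
import numpy as np
import pandas as pd


def regime_ols(data, labels, dep_var, indep_var, min_obs=10):
    """Fit y = a + b * x separately for each regime label; skip regimes with fewer than min_obs rows."""
    aligned = data.join(labels.rename("regime"), how="inner").dropna()
    results = {}
    for regime, regime_data in aligned.groupby("regime"):
        if len(regime_data) < min_obs:
            continue  # too few observations for a meaningful regression
        X = np.column_stack([np.ones(len(regime_data)), regime_data[indep_var].to_numpy()])
        coef, *_ = np.linalg.lstsq(X, regime_data[dep_var].to_numpy(), rcond=None)
        results[regime] = {"const": coef[0], indep_var: coef[1], "n_obs": len(regime_data)}
    return results


idx = pd.date_range("2020-01-01", periods=40, freq="D")
x = np.linspace(-1, 1, 40)
labels = pd.Series([0] * 20 + [1] * 15 + [2] * 5, index=idx)  # regime 2 has only 5 observations
y = np.where(labels.to_numpy() == 0, 0.1 + 2.0 * x, -0.2 - 1.0 * x)
toy = pd.DataFrame({"EUR": y, "i_diff_EUR": x}, index=idx)
res = regime_ols(toy, labels, dep_var="EUR", indep_var="i_diff_EUR")
print(res)
```

Aligning per model like this keeps every date for which that particular model has a label, whereas the cell above first drops all dates on which any model is missing a label.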

In [31]:
# Extract the estimated coefficients and save them in a master table along the model name and regime
uip_identified_regimes_results_list = []
for model_name, regimes in regime_uip_results.items():
    for regime, currencies in regimes.items():
        for currency, model in currencies.items():
            estimated_params_df = pd.DataFrame(model.summary().tables[1].data)
            estimated_params_df.columns = estimated_params_df.iloc[0]
            estimated_params_df = estimated_params_df[1:]
            estimated_params_df.columns = ["param"] + estimated_params_df.columns[1:].tolist()
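            # Transfer all columns to numeric where possible (replaces the deprecated errors='ignore')
            estimated_params_df = estimated_params_df.copy()  # avoid assigning into a slice
            for col in estimated_params_df.columns:
                try:
                    estimated_params_df[col] = pd.to_numeric(estimated_params_df[col])
                except (ValueError, TypeError):
                    pass  # keep non-numeric columns (e.g. "param") unchanged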
            estimated_params_df["model_name"] = model_name
            estimated_params_df["regime"] = regime
            uip_identified_regimes_results_list.append(estimated_params_df)
# Save the results
uip_identified_regimes_results_df = pd.concat(uip_identified_regimes_results_list, axis=0).reset_index(drop=True)
# Disentangle the confidence upper and lower columns
uip_identified_regimes_results_df = uip_identified_regimes_results_df.rename(
    columns={
        "[0.025": "ci_lower",
        "0.975]": "ci_upper",
    })
print(uip_identified_regimes_results_df)
# For each algorithm, up to four rows belong together: for each of the two
# regimes, the constant and the interest rate differential coefficient
# (fewer rows if a regime was skipped for having fewer than 10 observations).


errors='ignore' is deprecated and will raise in a future version. Use to_numeric without passing `errors` and catch exceptions explicitly instead



`kurtosistest` p-value may be inaccurate with fewer than 20 observations; only n=13 observations were given.






           param          coef   std err      t  P>|t|  ci_lower  ci_upper  \
0          const -1.311000e-05  0.000075 -0.176   0.86 -0.000000  0.000000   
1     i_diff_EUR  1.500000e-05  0.000055  0.273  0.785 -0.000093  0.000000   
2          const  4.000000e-04  0.001000  0.567  0.572 -0.001000  0.002000   
3     i_diff_EUR -2.000000e-04  0.000000 -0.412  0.681 -0.001000  0.001000   
4          const  4.955000e-07  0.000000  0.001  0.999 -0.001000  0.001000   
...          ...           ...       ...    ...    ...       ...       ...   
2447  i_diff_EUR -3.000000e-04  0.000000 -1.059  0.291 -0.001000  0.000000   
2448       const -1.000000e-04  0.000000 -1.169  0.243 -0.000000  0.000090   
2449  i_diff_EUR  2.000000e-04  0.000079  2.016  0.044  0.000004  0.000000   
2450       const  9.020000e-05  0.000098  0.924  0.356 -0.000000  0.000000   
2451  i_diff_EUR -2.000000e-04  0.000076 -1.983  0.047 -0.000000 -0.000002   

                                       model_name  regime  
0  


