Based on paper: https://d-nb.info/1248317343/34

## Data Scrapping

### 1. Output Gap Data extraction

First I got the quarterly GDP for the period [Office for National Statistics]:
https://www.ons.gov.uk/economy/grossdomesticproductgdp/timeseries/ybha/qna

I got the yearly output gap [Office for Budget Responsibility (OBR)]
https://obr.uk/public-finances-databank-2024-25/

Using the quarterly estimates developed [OBR: Output gap measurement: judgement and uncertainty] I replicated the shape of the quarterly output gaps in %.
https://obr.uk/docs/dlm_uploads/WorkingPaperNo5.pdf

In [1]:
import pandas as pd
import datetime as dt

# Use the raw URL from the GitHub repository
xlsx_url = "https://raw.githubusercontent.com/guri99uy/ST449_Project/52611de9d475e711c4c917c4d5ca137427404612/outputgap.xlsx"


# Load the Excel file
df_outputgap = pd.read_excel(xlsx_url, engine='openpyxl')  # Ensure you specify the 'openpyxl' engine for .xlsx files

# Define a function to parse QQYYYY
def parse_qqyyyy(qqyyyy):
    # Extract the quarter and year
    quarter = int(qqyyyy[1])
    year = int(qqyyyy[2:])
    
    # Map the quarter to the first month of that quarter
    quarter_start_month = {1: 1, 2: 4, 3: 7, 4: 10}
    month = quarter_start_month[quarter]
    
    # Create a datetime object for the first day of the quarter
    return dt.datetime(year, month, 1)

# Apply the function to the first column 'QQYYYY' to convert it to datetime
df_outputgap['QQYYYY'] = df_outputgap['QQYYYY'].apply(parse_qqyyyy)
# Rename a single column, e.g., 'OldName' to 'NewName'
df_outputgap.rename(columns={'QQYYYY': 'Date'}, inplace=True)

#Get Date in Quarters
df_outputgap['Date'] = pd.to_datetime(df_outputgap['Date'])
df_outputgap['Quarter'] = df_outputgap['Date'].dt.to_period('Q')
df_outputgap = df_outputgap.drop(columns=['Date'])

df_outputgap['GDP_Pot (m£)'] = df_outputgap['GDP_Pot (m£)'].round(0).astype(int)
df_outputgap['Output_gap (%)'] = df_outputgap['Output_gap (%)'].round(2)

# Display the first few rows of the transformed DataFrame
print(df_outputgap.head())
print(df_outputgap.tail())

   GDP_Real (m£)  GDP_Pot (m£)  Output_gap (%) Quarter
0         127119        130233            2.45  1987Q3
1         129815        133288            2.68  1987Q4
2         133283        137215            2.95  1988Q1
3         136630        141576            3.62  1988Q2
4         140801        145602            3.41  1988Q3
    GDP_Real (m£)  GDP_Pot (m£)  Output_gap (%) Quarter
77         372900        372629           -0.07  2006Q4
78         376958        378202            0.33  2007Q1
79         386144        387920            0.46  2007Q2
80         389291        392366            0.79  2007Q3
81         392244        396777            1.16  2007Q4


### 2. Interest Rate
Got .xlsx file from [Bank of Engalnd]
https://www.bankofengland.co.uk/boeapps/database/Bank-Rate.asp


In [2]:
import pandas as pd
import datetime as dt

# Raw URL of the Excel file
url = "https://raw.githubusercontent.com/guri99uy/ST449_Project/7715079b32be2ea0b9e2e77a3f7b81244f85720f/Bank_Rate.xlsx"
df_interest_rate = pd.read_excel(url, engine='openpyxl')


# Rename columns for easier access (optional)
df_interest_rate.columns = ['Date', 'Interest_rate']

# Convert the 'Date_Changed' column to datetime format
def parse_date(date_str):
    # Handle the format '07 Nov 24' as 'DD MMM YY'
    return dt.datetime.strptime(date_str, '%d %b %y')

df_interest_rate['Date'] = df_interest_rate['Date'].apply(parse_date)

# Check if 'Rate' column is string type, and process accordingly
if df_interest_rate['Interest_rate'].dtype == 'object':
    # Clean the 'Rate' column (replace commas with dots and convert to float)
    df_interest_rate['Interest_rate'] = df_interest_rate['Rate'].str.replace(',', '.').astype(float)
else:
    # Ensure the 'Rate' column is numeric
    df_interest_rate['Interest_rate'] = pd.to_numeric(df_interest_rate['Interest_rate'], errors='coerce')

# Display the processed DataFrame
print("\nEvery Interest rate by Bank of England:")
print(df_interest_rate.head())




Every Interest rate by Bank of England:
        Date  Interest_rate
0 2024-11-07           4.75
1 2024-08-01           5.00
2 2023-08-03           5.25
3 2023-06-22           5.00
4 2023-05-11           4.50


Lets process the data to: 
1. Get the quarter average
2. Assign missing quarters with the last value

In [3]:
import pandas as pd

# Assuming df_interest_rate is the DataFrame with 'Date_Changed' and 'Rate'
# Ensure 'Date_Changed' is a datetime column
df_interest_rate['Date'] = pd.to_datetime(df_interest_rate['Date'])

# Create a column for the quarter and year as strings for grouping
df_interest_rate['Quarter'] = df_interest_rate['Date'].dt.to_period('Q')

# Group by the 'Quarter' column and calculate the average interest rate
quarterly_avg_rate = (
    df_interest_rate.groupby('Quarter', as_index=False)['Interest_rate']
    .mean()
    .rename(columns={'Interest_rate': 'Avg_Interest_Rate'})
)

full_quarters = pd.period_range('1975Q1', '2007Q4', freq='Q')
quarterly_avg_rate['Quarter'] = pd.PeriodIndex(quarterly_avg_rate['Quarter'], freq='Q')
quarterly_avg_rate = quarterly_avg_rate.set_index('Quarter').reindex(full_quarters)

# Fill missing values with the value from the previous quarter
quarterly_avg_rate['Avg_Interest_Rate'] = quarterly_avg_rate['Avg_Interest_Rate'].ffill()
quarterly_avg_rate.reset_index(inplace=True)
quarterly_avg_rate.rename(columns={'index': 'Quarter'}, inplace=True)

# Filter 1997 - 2007
Quarterly_interest_rates = quarterly_avg_rate[
    (quarterly_avg_rate['Quarter'] >= '1987Q3') & (quarterly_avg_rate['Quarter'] <= '2007Q4')
]
Quarterly_interest_rates.reset_index(inplace=True)
Quarterly_interest_rates = Quarterly_interest_rates.drop(columns=['index'])

# Display
print(Quarterly_interest_rates.head())



  Quarter  Avg_Interest_Rate
0  1987Q3              9.880
1  1987Q4              8.880
2  1988Q1              8.630
3  1988Q2              8.080
4  1988Q3             10.755


### 3. Inflation
Source?
Relevant comments:


In [4]:
import pandas as pd

# GitHub raw URL for inflation
url = "https://raw.githubusercontent.com/guri99uy/ST449_Project/c87d1b581f0af98f2a813a9c6134160303e74883/inf_Data.csv"
inflation = pd.read_csv(url)

# Rename columns
inf_data = inflation.rename(columns={"Implied GDP deflator at market prices: SA Index": "GDP Deflator"})
inf_data.rename(columns={"Title": "Quarter"}, inplace=True)

# Change Quarter
inf_data["Quarter"] = inf_data["Quarter"].str.replace(r"(\d{4})\sQ(\d)", r"\1Q\2", regex=True)

print(inf_data.head())


  Quarter  GDP Deflator
0  1987Q3       35.8724
1  1987Q4       36.2206
2  1988Q1       36.5950
3  1988Q2       37.3205
4  1988Q3       37.9849


### 4. Merge relevant data
1. Output Gap
2. Interest rate
3. Inflation
   

In [5]:
# Convert 'Quarter' column in all datasets to period type
Quarterly_interest_rates['Quarter'] = pd.PeriodIndex(Quarterly_interest_rates['Quarter'], freq='Q')
df_outputgap['Quarter'] = pd.PeriodIndex(df_outputgap['Quarter'], freq='Q')
inf_data['Quarter'] = pd.PeriodIndex(inf_data['Quarter'], freq='Q')

# Merge the datasets
merged_df = pd.merge(Quarterly_interest_rates, df_outputgap, on='Quarter', how='inner')  # Inner join
merged_df = pd.merge(merged_df, inf_data, on='Quarter', how='inner')  # Inner join

# Display the merged DataFrame
print(merged_df.head())


print(merged_df.tail())

  Quarter  Avg_Interest_Rate  GDP_Real (m£)  GDP_Pot (m£)  Output_gap (%)  \
0  1987Q3              9.880         127119        130233            2.45   
1  1987Q4              8.880         129815        133288            2.68   
2  1988Q1              8.630         133283        137215            2.95   
3  1988Q2              8.080         136630        141576            3.62   
4  1988Q3             10.755         140801        145602            3.41   

   GDP Deflator  
0       35.8724  
1       36.2206  
2       36.5950  
3       37.3205  
4       37.9849  
   Quarter  Avg_Interest_Rate  GDP_Real (m£)  GDP_Pot (m£)  Output_gap (%)  \
75  2006Q2               4.50         367042        366712           -0.09   
76  2006Q3               4.75         370883        370824           -0.02   
77  2006Q4               5.00         372900        372629           -0.07   
78  2007Q1               5.25         376958        378202            0.33   
79  2007Q2               5.50         3

# Linear Model

In [6]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

In [19]:
# Load your dataset
# Assuming `merged_df` is already prepared as described
# merged_df = pd.read_csv('path_to_your_dataset.csv')

# Define Variables
merged_df['Lag_y1'] = merged_df['Output_gap (%)'].shift(1)
merged_df['Lag_y2'] = merged_df['Output_gap (%)'].shift(2)
merged_df['Lag_pi1'] = merged_df['GDP Deflator'].shift(1)
merged_df['Lag_pi2'] = merged_df['GDP Deflator'].shift(2)
merged_df['Lag_i1'] = merged_df['Avg_Interest_Rate'].shift(1)
merged_df['Lag_i2'] = merged_df['Avg_Interest_Rate'].shift(2)

# Drop rows with NaN values created due to lagging
merged_df = merged_df.dropna()

print(merged_df)

   Quarter  Avg_Interest_Rate  GDP_Real (m£)  GDP_Pot (m£)  Output_gap (%)  \
4   1988Q3          10.755000         140801        145602            3.41   
5   1988Q4          12.880000         144969        149469            3.10   
6   1989Q1          12.880000         148623        152205            2.41   
7   1989Q2          13.750000         151262        153833            1.70   
8   1989Q3          13.823333         155370        156753            0.89   
..     ...                ...            ...           ...             ...   
75  2006Q2           4.500000         367042        366712           -0.09   
76  2006Q3           4.750000         370883        370824           -0.02   
77  2006Q4           5.000000         372900        372629           -0.07   
78  2007Q1           5.250000         376958        378202            0.33   
79  2007Q2           5.500000         386144        387920            0.46   

    GDP Deflator  Lag_y1  Lag_y2  Lag_pi1  Lag_pi2  Lag_i1  Lag

In [8]:
# Linear Model (SVAR) Implementation

X_y = merged_df[['Lag_y1', 'Lag_pi1', 'Lag_i1', 'Lag_i2']]
y_y = merged_df['Output_gap (%)']

X_pi = merged_df[['Output_gap (%)', 'Lag_y1', 'Lag_y2', 'Lag_pi1', 'Lag_pi2', 'Lag_i1']]
y_pi = merged_df['GDP Deflator']

model_y = LinearRegression()
model_pi = LinearRegression()

    # Fit Models
model_y.fit(X_y, y_y)
model_pi.fit(X_pi, y_pi)


In [34]:
"""# Linear Model (SVAR) Implementation
def linear_model():
    X_y = merged_df[['Lag_y1', 'Lag_pi1', 'Lag_i1', 'Lag_i2']]
    y_y = merged_df['Output_gap (%)']

    X_pi = merged_df[['Output_gap (%)', 'Lag_y1', 'Lag_y2', 'Lag_pi1', 'Lag_pi2', 'Lag_i1']]
    y_pi = merged_df['GDP Deflator']

    model_y = LinearRegression()
    model_pi = LinearRegression()

    # Fit Models
    model_y.fit(X_y, y_y)
    model_pi.fit(X_pi, y_pi)

    def predict_y(input_data):
        return model_y.predict(input_data)

    def predict_pi(input_data):
        return model_pi.predict(input_data)

    return predict_y, predict_pi

model_y, model_pi = linear_model()"""

"# Linear Model (SVAR) Implementation\ndef linear_model():\n    X_y = merged_df[['Lag_y1', 'Lag_pi1', 'Lag_i1', 'Lag_i2']]\n    y_y = merged_df['Output_gap (%)']\n\n    X_pi = merged_df[['Output_gap (%)', 'Lag_y1', 'Lag_y2', 'Lag_pi1', 'Lag_pi2', 'Lag_i1']]\n    y_pi = merged_df['GDP Deflator']\n\n    model_y = LinearRegression()\n    model_pi = LinearRegression()\n\n    # Fit Models\n    model_y.fit(X_y, y_y)\n    model_pi.fit(X_pi, y_pi)\n\n    def predict_y(input_data):\n        return model_y.predict(input_data)\n\n    def predict_pi(input_data):\n        return model_pi.predict(input_data)\n\n    return predict_y, predict_pi\n\nmodel_y, model_pi = linear_model()"

In [20]:
#EXAMPLE USAGE

# Suppose we have a single new observation (just as an example)
# For predicting the output gap (y):
new_data_y = pd.DataFrame({
    'Lag_y1': [0.5],
    'Lag_pi1': [1.9],
    'Lag_i1': [2.0],
    'Lag_i2': [1.8]
})

# For predicting the GDP Deflator (pi):
new_data_pi = pd.DataFrame({
    'Output_gap (%)': [0.6],
    'Lag_y1': [0.5],
    'Lag_y2': [0.4],
    'Lag_pi1': [1.9],
    'Lag_pi2': [1.8],
    'Lag_i1': [2.0]
})

# Get predictions
predicted_output_gap = model_y.predict(new_data_y)
predicted_gdp_deflator = model_pi.predict(new_data_pi)

print("example usage")
print("Predicted Output Gap:", predicted_output_gap)
print("Predicted GDP Deflator:", predicted_gdp_deflator)


example usage
Predicted Output Gap: [2.03781251]
Predicted GDP Deflator: [2.4236471]


# ANN

In [10]:
import torch
import torch.nn as nn
import torch.optim as optim

class ANN(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super(ANN, self).__init__()
        self.hidden = nn.Linear(input_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, 1)
        self.activation = nn.Tanh()

    def forward(self, x):
        x = self.activation(self.hidden(x))
        x = self.output(x)
        return x

# Nonlinear Model (ANN with PyTorch)
def nonlinear_model():
    X_y = merged_df[['Lag_y1', 'Lag_pi1', 'Lag_i1', 'Lag_i2']].values
    y_y = merged_df['Output_gap (%)'].values

    X_pi = merged_df[['Output_gap (%)', 'Lag_y1', 'Lag_y2', 'Lag_pi1', 'Lag_pi2', 'Lag_i1']].values
    y_pi = merged_df['GDP Deflator'].values

    # Split Data
    X_y_train, X_y_val, y_y_train, y_y_val = train_test_split(X_y, y_y, test_size=0.15, random_state=42)
    X_pi_train, X_pi_val, y_pi_train, y_pi_val = train_test_split(X_pi, y_pi, test_size=0.15, random_state=42)

    # Initialize Models
    model_y = ANN(input_dim=X_y.shape[1], hidden_dim=3)
    model_pi = ANN(input_dim=X_pi.shape[1], hidden_dim=4)

    # Loss Function and Optimizer
    criterion = nn.MSELoss()
    optimizer_y = optim.Adam(model_y.parameters(), lr=0.01)
    optimizer_pi = optim.Adam(model_pi.parameters(), lr=0.01)

    # Convert data to PyTorch tensors
    X_y_train_tensor, y_y_train_tensor = torch.tensor(X_y_train, dtype=torch.float32), torch.tensor(y_y_train, dtype=torch.float32)
    X_pi_train_tensor, y_pi_train_tensor = torch.tensor(X_pi_train, dtype=torch.float32), torch.tensor(y_pi_train, dtype=torch.float32)

    # Train Output Gap Model
    for epoch in range(100):
        model_y.train()
        optimizer_y.zero_grad()
        predictions = model_y(X_y_train_tensor).squeeze()
        loss = criterion(predictions, y_y_train_tensor)
        loss.backward()
        optimizer_y.step()

    # Train Inflation Model
    for epoch in range(100):
        model_pi.train()
        optimizer_pi.zero_grad()
        predictions = model_pi(X_pi_train_tensor).squeeze()
        loss = criterion(predictions, y_pi_train_tensor)
        loss.backward()
        optimizer_pi.step()

    def predict_y(input_data):
        input_tensor = torch.tensor(input_data, dtype=torch.float32)
        with torch.no_grad():
            return model_y(input_tensor).numpy()

    def predict_pi(input_data):
        input_tensor = torch.tensor(input_data, dtype=torch.float32)
        with torch.no_grad():
            return model_pi(input_tensor).numpy()

    return predict_y, predict_pi

model_y_ann, model_pi_ann = nonlinear_model()


# Environment

In [22]:
import gymnasium as gym
import numpy as np
import pandas as pd
from gymnasium.spaces import box

class PaperBasedEconomyEnv(gym.Env):
    def __init__(
        self,
        df,
        model_y,            # e.g. a LinearRegression() for Output Gap
        model_pi,           # e.g. a LinearRegression() for Inflation
        lookback_periods=2,
        inflation_target=2.0,
        output_gap_target=0.0
    ):
        super(PaperBasedEconomyEnv, self).__init__()

        # Store initial DataFrame and reset index
        self.df = df.reset_index(drop=True)
        self.model_y = model_y       # The trained model for predicting Output Gap
        self.model_pi = model_pi     # The trained model for predicting Inflation
        self.lookback_periods = lookback_periods
        self.inflation_target = inflation_target
        self.output_gap_target = output_gap_target

        # Column references for readability
        self.cols = {
            'inflation': 'GDP Deflator',
            'output_gap': 'Output_gap (%)',
            'interest_rate': 'Avg_Interest_Rate'
        }

        # Action space: choose the next interest rate
        self.action_space = box.Box(low=0.0, high= 20.0, shape=(1,), dtype=np.float32)

        # Observation space: for each of the last N periods, we store
        # [inflation, output_gap, interest_rate], plus 1 extra for interest_rate(t-1)
        obs_space_size = 3 * self.lookback_periods
        self.observation_space = box.Box(
            low=-np.inf, high=np.inf, shape=(obs_space_size,), dtype=np.float32
        )

        self.current_idx = self.lookback_periods
        self.done = False
    def reset(self, seed=None, options=None):
        # e.g. np.random.seed(seed)
        super().reset(seed=seed)  # if the parent class has a reset with seeding

        self.current_idx = self.lookback_periods
        self.done = False

        obs = self._get_state()
        info = {}
        return obs, info


    def step(self, action):
        """
        1. Parse the current state
        2. Construct the feature vectors for each model
        3. Predict next_output_gap and next_inflation
        4. Compute reward
        5. Append new row to DataFrame
        6. Advance time index
        7. Return (state, reward, done, info)
        """
        if self.done:
            raise RuntimeError("Environment is done. Call reset().")

        # Action is the chosen interest rate for this step
        interest_rate = float(action[0])

        # Current state (shape: (3*lookback_periods + 1,))
        state = self._get_state()

        # ---------------------------------------------------------
        # 1) Parse the state for clarity
        #    Example for lookback=2:
        #    state = [
        #        inflation(t-2), output_gap(t-2), interest_rate(t-2),
        #        inflation(t-1), output_gap(t-1), interest_rate(t-1),
        #        interest_rate(t-1)    <-- appended
        #    ]
        # ---------------------------------------------------------
        # Let's label them (assuming lookback_periods=2)
        inflation_t2 = state[0]      # inflation(t-2)
        output_gap_t2 = state[1]     # output_gap(t-2)
        interest_rate_t2 = state[2]  # interest_rate(t-2)
        inflation_t1 = state[3]      # inflation(t-1)
        output_gap_t1 = state[4]     # output_gap(t-1)
        interest_rate_t1 = state[5]  # interest_rate(t-1)
        # state[6] is interest_rate(t-1) repeated in your original code
        # (You might want to adjust that logic—see explanation below.)

        # ---------------------------------------------------------
        # 2) Construct model_y features
        #    Based on how model_y was trained:
        #        X_y columns = ['Lag_y1', 'Lag_pi1', 'Lag_i1', 'Lag_i2']
        #
        #    This implies:
        #       Lag_y1 = output_gap(t-1)
        #       Lag_pi1 = inflation(t-1)
        #       Lag_i1 = interest_rate(t-1)
        #       Lag_i2 = interest_rate(t-2)
        # ---------------------------------------------------------
        features_y = pd.DataFrame({
            'Lag_y1': [output_gap_t1],
            'Lag_pi1': [inflation_t1],
            'Lag_i1': [interest_rate_t1],
            'Lag_i2': [interest_rate_t2]
        })

        # Predict the next output gap
        next_output_gap = self.model_y.predict(features_y)[0]

        # ---------------------------------------------------------
        # 3) Construct model_pi features
        #    Based on how model_pi was trained:
        #        X_pi columns = [
        #           'Output_gap (%)',
        #           'Lag_y1', 'Lag_y2',
        #           'Lag_pi1', 'Lag_pi2',
        #           'Lag_i1'
        #        ]
        #
        #    This implies:
        #       'Output_gap (%)' = *current* output gap used for next inflation
        #       Lag_y1 = output_gap(t-1)
        #       Lag_y2 = output_gap(t-2)
        #       Lag_pi1 = inflation(t-1)
        #       Lag_pi2 = inflation(t-2)
        #       Lag_i1 = interest_rate(t-1)
        #
        #    Here, we have a choice to use the brand-new next_output_gap or
        #    the last known output_gap(t-1). In some setups, we feed the
        #    newly predicted output_gap back in. That is an economic modeling
        #    choice—just be consistent with your original training approach.
        # ---------------------------------------------------------
        features_pi = pd.DataFrame({
            'Output_gap (%)': [next_output_gap],  # or output_gap_t1 if you prefer
            'Lag_y1': [output_gap_t1],
            'Lag_y2': [output_gap_t2],
            'Lag_pi1': [inflation_t1],
            'Lag_pi2': [inflation_t2],
            'Lag_i1': [interest_rate_t1]
        })

        # Predict the next inflation
        next_inflation = self.model_pi.predict(features_pi)[0]

        # ---------------------------------------------------------
        # 4) Compute reward
        #    Negative MSE-like penalty from inflation & output gap deviation
        # ---------------------------------------------------------
        reward = -(
            0.5 * (next_inflation - self.inflation_target) ** 2
            + 0.5 * (next_output_gap - self.output_gap_target) ** 2
        )

        # ---------------------------------------------------------
        # 5) Append new row to the DataFrame
        # ---------------------------------------------------------
        new_row = {
            self.cols['inflation']: next_inflation,
            self.cols['output_gap']: next_output_gap,
            self.cols['interest_rate']: interest_rate
        }

        self.df = pd.concat([self.df, pd.DataFrame([new_row])], ignore_index=True)
        print(self.df.tail(10))

        # ---------------------------------------------------------
        # 6) Advance time index and check termination
        # ---------------------------------------------------------
        self.current_idx += 1
        self.done = (self.current_idx >= len(self.df))

        # 7) Return the new state, reward, done, truncated, info

        truncated = False

        return self._get_state(), reward, self.done, truncated, {}

    def _get_state(self):
        """
        Builds an array of shape (3*lookback_periods,).
        For lookback=2, we collect:
          [inflation(t-2), gap(t-2), rate(t-2),
           inflation(t-1), gap(t-1), rate(t-1),
           inflation(t), gap(t), rate(t)]
        """
        state = []
        for lag in range(self.lookback_periods, 0, -1):
            state.append(
                self.df[self.cols['inflation']].iloc[self.current_idx - lag]
            )
            state.append(
                self.df[self.cols['output_gap']].iloc[self.current_idx - lag]
            )
            state.append(
                self.df[self.cols['interest_rate']].iloc[self.current_idx - lag]
            )

        state.append(self.df[self.cols['inflation']].iloc[self.current_idx])
        state.append(self.df[self.cols['output_gap']].iloc[self.current_idx])
        state.append(self.df[self.cols['interest_rate']].iloc[self.current_idx])

        return np.array(state, dtype=np.float32)

# ----------------------------
# Example usage
# ----------------------------
if __name__ == "__main__":

    # Suppose model_y and model_pi are scikit-learn models, e.g.:
    # model_y = LinearRegression().fit(X_y, y_y)
    # model_pi = LinearRegression().fit(X_pi, y_pi)
    #
    # Or if you're returning functions (predict_y, predict_pi) from linear_model(),
    # you can wrap them in small classes that define a .predict() method.
    # For demonstration, let's assume they are already fitted regressions.
    class MockModel:
        def predict(self, X):
            # Some dummy logic; in reality, this would be your real model
            return np.ones(shape=(X.shape[0],)) * 0.3


    mock_model_y = MockModel()
    mock_model_pi = MockModel()

    env = PaperBasedEconomyEnv(merged_df.tail(6)['Quarter', ], mock_model_y, mock_model_pi, lookback_periods=2)
    state = env.reset()

    for step_i in range(10):
        action = env.action_space.sample()  # Random interest rate
        state, reward, done, truncated, info = env.step(action)
        print(f"Step={step_i}, State={state}, Reward={reward}, Truncated={truncated}, Done={done}")
        if done:
            break

#state = [inflation(t-2), gap(t-2), rate(t-2), inflation(t-1), gap(t-1), rate(t-1)]

  Quarter  Avg_Interest_Rate  GDP_Real (m£)  GDP_Pot (m£)  Output_gap (%)  \
0  2006Q1           4.500000       361213.0      361972.0            0.21   
1  2006Q2           4.500000       367042.0      366712.0           -0.09   
2  2006Q3           4.750000       370883.0      370824.0           -0.02   
3  2006Q4           5.000000       372900.0      372629.0           -0.07   
4  2007Q1           5.250000       376958.0      378202.0            0.33   
5  2007Q2           5.500000       386144.0      387920.0            0.46   
6     NaT          13.297031            NaN           NaN            0.30   

   GDP Deflator  Lag_y1  Lag_y2  Lag_pi1  Lag_pi2  Lag_i1  Lag_i2  
0       64.3541    0.37    0.22  63.9196  63.6163    4.50    4.50  
1       65.1635    0.21    0.37  64.3541  63.9196    4.50    4.50  
2       65.6609   -0.09    0.21  65.1635  64.3541    4.50    4.50  
3       65.6670   -0.02   -0.09  65.6609  65.1635    4.75    4.50  
4       65.7761   -0.07   -0.02  65.6670  6

# Train

In [38]:
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

env = PaperBasedEconomyEnv(
    df=df,
    model_y=model_y,
    model_pi=model_pi,
    lookback_periods=2
)

In [39]:
venv = DummyVecEnv([lambda: env])  # wrap your environment

model = PPO(
    policy="MlpPolicy",
    env=venv,
    verbose=1,
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # You can tune many hyperparameters here
)
model.learn(total_timesteps=10_000)


Using cuda device




-----------------------------
| time/              |      |
|    fps             | 288  |
|    iterations      | 1    |
|    time_elapsed    | 7    |
|    total_timesteps | 2048 |
-----------------------------
------------------------------------------
| time/                   |              |
|    fps                  | 238          |
|    iterations           | 2            |
|    time_elapsed         | 17           |
|    total_timesteps      | 4096         |
| train/                  |              |
|    approx_kl            | 9.887619e-06 |
|    clip_fraction        | 0            |
|    clip_range           | 0.2          |
|    entropy_loss         | -1.42        |
|    explained_variance   | 2.49e-05     |
|    learning_rate        | 0.0003       |
|    loss                 | 3.85e+08     |
|    n_updates            | 10           |
|    policy_gradient_loss | -1.58e-05    |
|    std                  | 1            |
|    value_loss           | 9.87e+08     |
----------------

<stable_baselines3.ppo.ppo.PPO at 0x22f3cdba930>

In [40]:
obs = venv.reset()
for i in range(20):
    action, _ = model.predict(obs)
    obs, rewards, dones, infos = venv.step(action)
    print(f"Step: {i}, Action: {action}, Reward: {rewards}, Done: {dones}")
    if dones[0]:  # If the environment has ended
        obs = venv.reset()

Step: 0, Action: [[1.2814724]], Reward: [-1.6167372], Done: [False]
Step: 1, Action: [[0.]], Reward: [-1.7719061], Done: [False]
Step: 2, Action: [[0.28407082]], Reward: [-1.9454159], Done: [False]
Step: 3, Action: [[0.6685879]], Reward: [-2.1372666], Done: [False]
Step: 4, Action: [[1.008123]], Reward: [-2.3474584], Done: [False]
Step: 5, Action: [[0.]], Reward: [-6.053782], Done: [False]
Step: 6, Action: [[0.]], Reward: [-5.4552526], Done: [False]
Step: 7, Action: [[0.8367362]], Reward: [-6.2043986], Done: [False]
Step: 8, Action: [[0.]], Reward: [-6.716893], Done: [False]
Step: 9, Action: [[0.04842628]], Reward: [-6.1741657], Done: [False]
Step: 10, Action: [[0.24442597]], Reward: [-11.368131], Done: [False]
Step: 11, Action: [[0.0120679]], Reward: [-11.839943], Done: [False]
Step: 12, Action: [[1.0403042]], Reward: [-11.57516], Done: [False]
Step: 13, Action: [[0.4320868]], Reward: [-13.421782], Done: [False]
Step: 14, Action: [[0.65328854]], Reward: [-12.391499], Done: [False]
Ste