<a href="https://colab.research.google.com/github/anunknownpleasure/Pricing-assets-with-deep-learning/blob/main/Asset_pricing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Explanation of the Notebook

This notebook demonstrates an approach to asset pricing using a Generative Adversarial Network (GAN) framework, incorporating Fama-French factors and macroeconomic indicators. The goal is to learn a Stochastic Discount Factor (SDF) that can price a set of test assets (Fama-French 25 portfolios).

## 1. Installing Libraries

This section installs the necessary Python libraries for data fetching, manipulation, and model building.
- `getFamaFrenchFactors`: To download Fama-French factor data.
- `pandas_datareader`: To download macroeconomic data from FRED and Fama-French portfolio data.
- `lxml`: A dependency for `pandas_datareader`.
- `optuna`: For hyperparameter tuning.

In [None]:
!pip install -q getFamaFrenchFactors pandas_datareader lxml optuna

## 2. Data Import and Preprocessing

This section handles the loading and initial processing of the required financial and macroeconomic data.

### 2a. Importing Fama-French 5 factor data

We import the monthly Fama-French 5 factors (Market minus Risk-Free, Small-Minus-Big, High-Minus-Low, Robust-Minus-Weak, Conservative-Minus-Aggressive) and the Risk-Free Rate. The `Mkt-RF`, `SMB`, `HML`, and `RF` columns are divided by 100 to convert from percentage to decimal form; `RMW` and `CMA` are left as returned by the loader.

In [None]:
import numpy as np
import pandas as pd
import getFamaFrenchFactors

print("Successfully imported getFamaFrenchFactors!")
print(dir(getFamaFrenchFactors))

Successfully imported getFamaFrenchFactors!


In [None]:
from getFamaFrenchFactors import famaFrench5Factor


# Get the factors
factors_df = famaFrench5Factor()

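# Divide by 100 to convert from percentage to decimal form.
# Note: RMW and CMA are not rescaled here, so in the output below they are
# roughly 100x larger than the other columns. Check the units returned by
# famaFrench5Factor() and scale all six columns consistently.
factors_df[['Mkt-RF', 'SMB', 'HML', 'RF']] = factors_df[['Mkt-RF', 'SMB', 'HML', 'RF']] / 100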



In [None]:
factors_df.head(20)

Unnamed: 0,date_ff_factors,Mkt-RF,SMB,HML,RMW,CMA,RF
0,1963-07-31,-3.9e-05,-4.8e-05,-8.1e-05,0.0064,-0.0115,2.7e-05
1,1963-08-31,0.000508,-8e-05,0.00017,0.004,-0.0038,2.5e-05
2,1963-09-30,-0.000157,-4.3e-05,0.0,-0.0078,0.0015,2.7e-05
3,1963-10-31,0.000254,-0.000134,-4e-06,0.0279,-0.0225,2.9e-05
4,1963-11-30,-8.6e-05,-8.5e-05,0.000173,-0.0043,0.0227,2.7e-05
5,1963-12-31,0.000183,-0.000189,-2.1e-05,0.0012,-0.0025,2.9e-05
6,1964-01-31,0.000227,1e-05,0.000163,0.0021,0.0148,3e-05
7,1964-02-29,0.000155,3.3e-05,0.000281,0.0011,0.0081,2.6e-05
8,1964-03-31,0.000141,0.000141,0.000329,-0.0203,0.0298,3.1e-05
9,1964-04-30,1.1e-05,-0.000148,-5.4e-05,-0.0132,-0.0113,2.9e-05


We adjust the start date to 1964-01-31 so that the factor data begins in the same month as the macroeconomic data.

In [None]:
# Drop the first six rows (1963-07 to 1963-12). The explicit copy avoids
# pandas' SettingWithCopyWarning on the assignment below.
FFdata = factors_df.iloc[6:].copy()
FFdata['date_ff_factors'] = pd.to_datetime(FFdata['date_ff_factors'])
FFdata = FFdata.set_index('date_ff_factors')


### 2b. Importing the macroeconomic data

Macroeconomic indicators are fetched from the FRED database using `pandas_datareader`. The selected indicators include measures of the term spread, default spread, industrial production, unemployment rate, and consumer sentiment. The data is aligned to a monthly frequency and forward-filled to handle missing values.

In [None]:
import pandas_datareader.data as web

# --- 1. Define the 5 Long-History Macro Indicators (FRED Tickers) ---
long_macro_tickers = {
    'Term_Spread': 'T10YFFM',      # 10-Yr Yield minus Fed Funds Rate
    'Default_Spread': 'AAAFFM',     # Moody's Aaa Corp Yield minus Fed Funds Rate (used here as the default-spread proxy)
    'Ind_Production': 'INDPRO',     # Industrial Production Index
    'Unemployment': 'UNRATE',       # Civilian Unemployment Rate
    'Consumer_Sentiment': 'UMCSENT' # University of Michigan Consumer Sentiment
}

# --- 2. Define the Time Period
start_date = '1964-01-01'
end_date = '2025-08-31'

# --- 3. Fetch Data from FRED ---
try:
    macro_data = web.DataReader(
        list(long_macro_tickers.values()),
        'fred',
        start=start_date,
        end=end_date
    )
    macro_data.columns = list(long_macro_tickers.keys())

except Exception as e:
    print(f"Error fetching data from FRED: {e}")
    # Fallback: empty frame on a month-end index ('ME' matches the resample call below)
    macro_data = pd.DataFrame(index=pd.date_range(start_date, end_date, freq='ME'))

# --- 4. Align Data to Monthly Frequency ---

# 4a. Forward-Fill any monthly gaps (common in macro data)
macro_data = macro_data.ffill()

# 4b. Ensure all data points are at the end of the month for clean alignment
macro_data = macro_data.resample('ME').last()


# --- 5. Display the result ---
print(f"Macro Data Imported and Aligned ({len(macro_data)} periods, starting {macro_data.index.min().strftime('%Y-%m')}):")
print(macro_data.head())
print("\n... and the tail:")
print(macro_data.tail())

# The resulting 'macro_data' DataFrame is ready for merging with Fama-French data.

Macro Data Imported and Aligned (740 periods, starting 1964-01):
            Term_Spread  Default_Spread  Ind_Production  Unemployment  \
DATE                                                                    
1964-01-31         0.69            0.91         27.7409           5.6   
1964-02-29         0.67            0.88         27.9291           5.4   
1964-03-31         0.79            0.95         27.9291           5.4   
1964-04-30         0.76            0.93         28.3861           5.3   
1964-05-31         0.70            0.91         28.5474           5.1   

            Consumer_Sentiment  
DATE                            
1964-01-31                 NaN  
1964-02-29                99.5  
1964-03-31                99.5  
1964-04-30                99.5  
1964-05-31                98.5  

... and the tail:
            Term_Spread  Default_Spread  Ind_Production  Unemployment  \
DATE                                                                    
2025-04-30        -0.05    

In [None]:
print(macro_data.shape, FFdata.shape)

(740, 5) (740, 6)


### 2c. Importing the FF-portfolios

The Fama-French 25 portfolios sorted by size and book-to-market are imported. These portfolios serve as the test assets that the learned SDF will attempt to price. Missing values are dropped, and returns are converted to decimal form. The index is converted to a datetime object representing the end of the month.

In [None]:
# --- 1. Define Time Period ---
# Must match the start date used for your FF factors and macro data (e.g., 1964-01-01)
start_date = '1964-01-01'
end_date = '2025-08-31'

# --- 2. Fetch the 25 Portfolios (Size x Book-to-Market) ---
# The data is downloaded as a dictionary object
ff_portfolio = web.DataReader(
    '25_Portfolios_5x5',
    'famafrench',
    start=start_date,
    end=end_date
)

df_returns_25 = ff_portfolio[0]

df_returns_25 = df_returns_25.replace([-99.99, -999], np.nan) # Missing values are indicated by -99.99 or -999
df_returns_25.dropna(inplace=True)

df_returns_25 = (df_returns_25/100)# Converting from percentage to fraction

# Convert PeriodIndex to DatetimeIndex at the end of the month
# normalize() keeps a DatetimeIndex at midnight, matching the factor and macro indexes
df_returns_25.index = df_returns_25.index.to_timestamp(how='end').normalize()






### 2d. Combining all the data into a DataFrame

The Fama-French factors, macroeconomic data, and portfolio returns are merged into a single pandas DataFrame based on their date index. Any rows with missing values after the merge are dropped.

In [None]:
# Making the index into a datetime object
macro_data.index = pd.to_datetime(macro_data.index)


# Combine the dataframes using merge on the index

combined_data_FF_macro = pd.merge(FFdata, macro_data, left_index=True, right_index=True, how='inner')  # Combine FF factors and macro data on the date index

combined_data = pd.merge(combined_data_FF_macro, df_returns_25, left_index=True, right_index=True, how='inner')

combined_data.dropna(inplace=True)  # Drops rows with NaN entries (here the first row, where Consumer_Sentiment is missing)

# Display the combined data
print("Combined Data:")
display(combined_data.head())
print("\n... and the tail:")
display(combined_data.tail())

Combined Data:


date_ff_factors,Mkt-RF,SMB,HML,RMW,CMA,RF,Term_Spread,Default_Spread,Ind_Production,Unemployment,...,ME4 BM1,ME4 BM2,ME4 BM3,ME4 BM4,ME4 BM5,BIG LoBM,ME5 BM2,ME5 BM3,ME5 BM4,BIG HiBM
1964-02-29,0.000155,3.3e-05,0.000281,0.0011,0.0081,2.6e-05,0.67,0.88,27.9291,5.4,...,0.025943,0.015619,0.028444,0.072047,0.046121,0.018271,0.005232,0.010194,0.039989,0.037567
1964-03-31,0.000141,0.000141,0.000329,-0.0203,0.0298,3.1e-05,0.79,0.95,27.9291,5.4,...,0.01775,0.029767,0.052497,0.071287,0.007247,0.011575,0.007635,0.036237,0.038382,0.001491
1964-04-30,1.1e-05,-0.000148,-5.4e-05,-0.0132,-0.0113,2.9e-05,0.76,0.93,28.3861,5.3,...,-0.027045,0.003434,0.019784,-0.026384,-0.022805,0.002272,0.014745,0.008082,-0.009054,0.024147
1964-05-31,0.000141,-6.2e-05,0.000181,-0.0015,0.0013,2.6e-05,0.7,0.91,28.5474,5.1,...,0.011914,0.022992,0.013559,0.013281,0.04099,0.020599,0.003304,0.011776,0.042859,0.033968
1964-06-30,0.000127,1.3e-05,6.8e-05,-0.0033,0.001,3e-05,0.67,0.91,28.628,5.2,...,0.010927,0.014771,0.011035,0.024965,0.031119,0.009744,0.028644,0.004437,0.013682,0.024217



... and the tail:


date_ff_factors,Mkt-RF,SMB,HML,RMW,CMA,RF,Term_Spread,Default_Spread,Ind_Production,Unemployment,...,ME4 BM1,ME4 BM2,ME4 BM3,ME4 BM4,ME4 BM5,BIG LoBM,ME5 BM2,ME5 BM3,ME5 BM4,BIG HiBM
2025-04-30,-8.4e-05,-0.000186,-0.00034,-0.0285,-0.0267,3.5e-05,-0.05,1.12,103.6224,4.2,...,-0.008766,-0.012699,-0.020146,-0.039276,-0.072668,0.014106,-0.030129,-0.073867,-0.013472,-0.027941
2025-05-31,0.000606,-7.2e-05,-0.000288,0.0126,0.0251,3.8e-05,0.09,1.21,103.657,4.2,...,0.062577,0.050222,0.035353,0.081175,0.065826,0.078077,0.061296,0.018407,0.026156,0.065684
2025-06-30,0.000486,-2e-06,-0.00016,-0.0319,0.0145,3.4e-05,0.05,1.13,104.2115,4.1,...,0.020351,0.043192,0.024175,0.073815,0.058024,0.055279,0.062451,0.047405,0.036424,0.070109
2025-07-31,0.000198,-1.5e-05,-0.000127,-0.0029,-0.0207,3.4e-05,0.06,1.12,103.8194,4.2,...,0.034206,0.021766,0.009013,-0.001069,-0.019303,0.032949,0.014068,0.012224,0.002333,-0.013744
2025-08-31,0.000185,0.000488,0.000441,-0.0069,0.0207,3.8e-05,-0.07,1.02,103.9203,4.3,...,0.036524,0.016164,0.023481,0.058848,0.071342,0.01161,0.011927,0.030567,0.054257,0.090799


In [None]:
combined_data.columns

Index(['Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA', 'RF', 'Term_Spread',
       'Default_Spread', 'Ind_Production', 'Unemployment',
       'Consumer_Sentiment', 'SMALL LoBM', 'ME1 BM2', 'ME1 BM3', 'ME1 BM4',
       'SMALL HiBM', 'ME2 BM1', 'ME2 BM2', 'ME2 BM3', 'ME2 BM4', 'ME2 BM5',
       'ME3 BM1', 'ME3 BM2', 'ME3 BM3', 'ME3 BM4', 'ME3 BM5', 'ME4 BM1',
       'ME4 BM2', 'ME4 BM3', 'ME4 BM4', 'ME4 BM5', 'BIG LoBM', 'ME5 BM2',
       'ME5 BM3', 'ME5 BM4', 'BIG HiBM'],
      dtype='object')

In [None]:
FF_columns = combined_data.columns[:6]
macro_columns = combined_data.columns[6:11]
portfolio_columns = combined_data.columns[11:]

no_of_FF_features = len(FF_columns)
no_macro_features = len(macro_columns)
no_of_portfolios = len(portfolio_columns)




### 2e. Data Processing

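All 36 columns of the combined data (factors, macro indicators, and portfolio returns) are scaled using `StandardScaler` to have zero mean and unit variance. This is a common preprocessing step for neural networks. Note that the scaler is fit on the full sample, before the train/validation/test split.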

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()


processed_data = scaler.fit_transform(combined_data)
processed_data[:5]

processed_data.shape


(739, 36)
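
Because the scaler above is fit on the full sample, the means and variances applied to early observations already reflect later data. A minimal sketch of one alternative, assuming a chronological 80% training split like the one used later in the notebook, estimates the statistics on the training rows only and applies them to every row. The helper name `standardize_with_train_stats` and the toy array are illustrative and not part of the notebook.

```python
import numpy as np

def standardize_with_train_stats(values, train_fraction=0.8):
    """Standardize columns using the mean/std estimated on the first rows only."""
    values = np.asarray(values, dtype=float)
    n_train = int(len(values) * train_fraction)
    mean = values[:n_train].mean(axis=0)
    std = values[:n_train].std(axis=0)      # population std, as StandardScaler uses
    std = np.where(std == 0, 1.0, std)      # guard against constant columns
    return (values - mean) / std, n_train

# Toy data: 10 periods, one trending column and one alternating column
toy = np.column_stack([np.arange(10.0), np.tile([1.0, -1.0], 5)])
scaled, n_train = standardize_with_train_stats(toy)

print(scaled[:n_train].mean(axis=0))  # close to zero on the training rows
print(scaled[n_train:].mean(axis=0))  # generally non-zero on the later rows
```

The later rows are no longer guaranteed to have zero mean, which is the expected price of keeping test-period information out of the preprocessing step.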

### 2f. Creating rolling windows for LSTM input

A `rolling_window` function is defined to create sequential data for the LSTM layer in the Generator network. Macroeconomic data is processed into 12-month lookback windows. Each window covers months $t-12$ through $t-1$ and is paired with the Fama-French factors and portfolio returns of month $t$, the month immediately after the window ends.

In [None]:
# Function to create rolling windows on a timeseries data

def rolling_window(data, lookback):
  x_rolling = []
  for i in range(len(data) - lookback):
    x_rolling.append(data[i: i + lookback])

  return np.array(x_rolling)

lookback = 12
ff_data = processed_data[:, :no_of_FF_features]
macro_data = processed_data[:, no_of_FF_features: no_of_FF_features + no_macro_features]
portfolio_data = processed_data[:, no_of_FF_features + no_macro_features:]

# Creating rolling window on macro data
X_macro_rolled = rolling_window(macro_data, lookback)

# Aligning the rolling data with FF factor and Portfolio data
X_ff_aligned = ff_data[lookback:]
Y_targets_aligned = portfolio_data[lookback:]
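
The alignment between windows and targets is easiest to check on a toy series. The sketch below re-declares `rolling_window` so that it runs on its own and uses a hypothetical 6-period series whose values equal the period index.

```python
import numpy as np

def rolling_window(data, lookback):
    """Stack overlapping windows data[i : i + lookback] along a new first axis."""
    return np.array([data[i:i + lookback] for i in range(len(data) - lookback)])

# Toy series: 6 periods, 1 feature, values equal to the period index
series = np.arange(6).reshape(-1, 1)
lookback = 3

windows = rolling_window(series, lookback)  # shape (3, 3, 1)
targets = series[lookback:]                 # shape (3, 1)

for window, target in zip(windows, targets):
    print(window.ravel(), "->", target.ravel())
```

Each window holds the three periods before its target (periods 0, 1, 2 are paired with period 3, and so on), so the target period itself never appears inside its own window.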




### 2g. Creating Dataset and DataLoaders

A custom PyTorch `Dataset` (`AssetPricingDataset`) is created to handle the macro time series, Fama-French factors, and target portfolio returns. The data is split into training, validation, and testing sets. `DataLoader` objects are created to efficiently load data in batches during training.

In [None]:
# Creating Dataset

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, Subset # Import Subset

class AssetPricingDataset(Dataset):

    def __init__(self, macro_data, ff_data, target_data):
        # 1. Convert all data to PyTorch tensors
        #    We use .float() for all data as they are continuous variables.
        self.X_macro = torch.tensor(macro_data).float()
        self.X_ff = torch.tensor(ff_data).float()
        self.Y_targets = torch.tensor(target_data).float()

    def __len__(self):
        return len(self.X_macro)

    def __getitem__(self, idx):
        return {
            'macro_X': self.X_macro[idx],    # Shape: [12, 5]
            'ff_X': self.X_ff[idx],          # Shape: [6]
            'target_Y': self.Y_targets[idx]  # Shape: [25]
        }

data = AssetPricingDataset(X_macro_rolled, X_ff_aligned, Y_targets_aligned)
no_of_samples = len(data)
train_size = int(len(data)*0.8)
val_size = int(len(data)*0.1)


# Use Subset to create chronological train, validation, and test datasets
indices = list(range(no_of_samples))
train_indices = indices[:train_size]
val_indices = indices[train_size: train_size+val_size]
test_indices = indices[train_size+val_size:]

train_data = Subset(data, train_indices)
val_data = Subset(data, val_indices)
test_data = Subset(data, test_indices)


BATCH_SIZE = 64

train_dataset = DataLoader(dataset = train_data, batch_size=BATCH_SIZE, shuffle = True, drop_last = True)
val_dataset = DataLoader(dataset = val_data, batch_size=BATCH_SIZE, shuffle = True, drop_last = True)
test_dataset = DataLoader(dataset = test_data, batch_size=BATCH_SIZE, shuffle = False, drop_last = False)
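
The split sizes and batch counts implied by these settings can be worked out by hand. The sketch below does the arithmetic, assuming 739 rows after `dropna` (the shape printed earlier) and a 12-month lookback; the helper names `split_sizes` and `n_batches` are illustrative.

```python
def split_sizes(n_samples, train_frac=0.8, val_frac=0.1):
    """Chronological split sizes; the test set takes whatever remains."""
    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)
    return n_train, n_val, n_samples - n_train - n_val

def n_batches(n, batch_size, drop_last):
    """Number of batches a DataLoader yields for n samples."""
    return n // batch_size if drop_last else -(-n // batch_size)

n_samples = 739 - 12          # rows after dropna minus the lookback
n_train, n_val, n_test = split_sizes(n_samples)

print(n_train, n_val, n_test)
print(n_batches(n_train, 64, drop_last=True),
      n_batches(n_val, 64, drop_last=True),
      n_batches(n_test, 64, drop_last=False))
```

Under these assumptions the split is 581 / 72 / 74 samples. With `drop_last=True`, the training loader yields 9 full batches (5 samples skipped per epoch) and the validation loader yields a single batch of 64 (8 samples skipped), which matters when these loaders are reused for evaluation.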

# Model Architecture

This section defines the architecture of the Generator and Discriminator networks within the GAN framework.

## Generator (SDF Network)

The Generator takes macroeconomic time series (processed by an LSTM) and Fama-French factors as input. It outputs portfolio weights that define the Stochastic Discount Factor (SDF). The SDF is constructed as $M = 1 - \omega'R$, where $\omega$ are the predicted weights and $R$ are the asset returns.

## Discriminator (Conditioning Network)

The Discriminator takes the hidden state from the Generator's LSTM and the Fama-French factors as input. It outputs conditioning instruments ($g$) used in the no-arbitrage pricing condition. If the SDF prices each asset conditional on the available information, so that the conditional expectation of $R_i M$ is zero, then $R_i M$ is also uncorrelated with any function $g$ of that information: $E[R_i M g] = 0$. The Discriminator searches for the instruments $g$ that violate this condition the most.

In [None]:
class Generator(nn.Module):
    """
    Generator (SDF Network)

    Inputs:
    - macro_X: Macro time series [batch, seq_len, macro_dim]
    - ff_X: Fama-French factors [batch, ff_dim]

    Output:
    - omega: SDF portfolio weights [batch, num_assets]
    - h_t: Last-layer LSTM hidden state [batch, macro_dim] (the LSTM's hidden_size is macro_dim)
    """
    def __init__(self, macro_dim, ff_dim, hidden_dim, lstm_layers, num_assets):
        super().__init__()

        # LSTM for processing macro time series
        self.lstm = nn.LSTM(
            input_size=macro_dim,
            hidden_size=macro_dim,
            num_layers=lstm_layers,
            batch_first=True,
            dropout=0.05 if lstm_layers > 1 else 0
        )

        # Feedforward network: [hidden_states + FF_factors] -> portfolio_weights
        self.fc1 = nn.Linear(macro_dim + ff_dim, hidden_dim)
        self.bn1 = nn.BatchNorm1d(hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim // 2)
        self.bn2 = nn.BatchNorm1d(hidden_dim // 2)

        # Output layer: portfolio weights for each asset
        self.fc3 = nn.Linear(hidden_dim//2, num_assets)

        self.activation = nn.ReLU()
        self.dropout = nn.Dropout(0.05)

    def forward(self, macro_X, ff_X):
        # Process macro time series with LSTM
        _, (h_n, c_n) = self.lstm(macro_X)
        h_t = h_n[-1]  # Take last hidden state

        # Combine macro states with FF factors
        combined = torch.cat([h_t, ff_X], dim=1)

        # Feedforward layers
        x = self.fc1(combined)
        x = self.bn1(x)
        x = self.activation(x)
        x = self.dropout(x)

        x = self.fc2(x)
        x = self.bn2(x)
        x = self.activation(x)
        x = self.dropout(x)

        # Output: portfolio weights (omega)
        omega = self.fc3(x)

        return omega, h_t


class Discriminator(nn.Module):
    """
    Discriminator (Conditioning Network)

    Inputs:
    - h_t: Hidden macro states [batch, hidden_dim]
    - ff_X: FF factors [batch, ff_dim]

    Output:
    - g: Conditioning instruments [batch, num_instruments]
    """
    def __init__(self, hidden_dim, ff_dim, num_instruments, hidden_layer):
        super().__init__()

        input_dim = hidden_dim + ff_dim

        self.fc1 = nn.Linear(input_dim, hidden_layer)
        self.bn1 = nn.BatchNorm1d(hidden_layer)
        self.fc2 = nn.Linear(hidden_layer, hidden_layer // 2)
        self.bn2 = nn.BatchNorm1d(hidden_layer // 2)
        self.fc3 = nn.Linear(hidden_layer // 2, num_instruments)
        self.activation = nn.ReLU()
        self.dropout = nn.Dropout(0.05)

    def forward(self, h_t, ff_X):
        # Combine macro states with FF factors
        x = torch.cat([h_t, ff_X], dim=1)

        x = self.fc1(x)
        x = self.bn1(x)
        x = self.activation(x)
        x = self.dropout(x)

        x = self.fc2(x)
        x = self.bn2(x)
        x = self.activation(x)
        x = self.dropout(x)

        # Output: conditioning instruments
        g = torch.tanh(self.fc3(x))  # Bounded in [-1, 1]

        return g

# Loss Function

This section defines the loss functions used to train the Generator and Discriminator networks.

- `calculate_SDF`: Computes the Stochastic Discount Factor $M = 1 - \omega'R$ from the portfolio weights and returns.
- `no_arbitrage_loss`: Calculates the pricing errors from the no-arbitrage condition $E[R_i M g] = 0$. The instruments are first averaged into a single value per observation; the loss is then the square of each asset's batch-average pricing error, averaged across assets.
- `discriminator_loss`: The loss for the Discriminator, which is the negative of the no-arbitrage loss plus a penalty on the squared instruments. Minimizing it maximizes the pricing errors while keeping the instruments small.
- `generator_loss`: The loss for the Generator, which is the no-arbitrage loss plus a penalty on the squared weights.
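
Written out for a batch of $T$ observations, $N$ assets, and $K$ instruments (symbols introduced here for illustration), the quantities computed by these functions are:

$$
M_t = 1 - \sum_{i=1}^{N} \omega_{i,t} R_{i,t}, \qquad \bar g_t = \frac{1}{K} \sum_{k=1}^{K} g_{k,t}
$$

$$
\mathcal{L}_{NA} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{1}{T} \sum_{t=1}^{T} M_t R_{i,t} \bar g_t \right)^2
$$

$$
\mathcal{L}_D = -\mathcal{L}_{NA} + 0.01 \cdot \frac{1}{TK} \sum_{t=1}^{T} \sum_{k=1}^{K} g_{k,t}^2, \qquad
\mathcal{L}_G = \mathcal{L}_{NA} + 0.01 \cdot \frac{1}{TN} \sum_{t=1}^{T} \sum_{i=1}^{N} \omega_{i,t}^2
$$

Because the $K$ instruments are averaged into $\bar g_t$ before they enter the moment condition, every asset is tested against the same single instrument in each period.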

In [None]:
# Loss function

def calculate_SDF(weights, returns):

  weighted_returns = (weights*returns).sum(dim=1, keepdim=True)
  sdf_m = 1 - weighted_returns

  return sdf_m

def no_arbitrage_loss(weights, returns, g_instruments, num_assets):

  sdf = calculate_SDF(weights, returns)
  g_expanded = g_instruments.mean(dim=1, keepdim=True).expand(-1, num_assets)
  pricing_errors = sdf*returns*g_expanded
  mean_errors = pricing_errors.mean(dim = 0)
  loss = (mean_errors**2).mean()
  return loss


def discriminator_loss(weights, returns, g_instruments, num_assets):

  na_loss = no_arbitrage_loss(weights, returns, g_instruments, num_assets)
  D_regularization_term = 0.01*(g_instruments**2).mean()

  return -na_loss + D_regularization_term

def generator_loss(weights, returns, g_instruments, num_assets):

  na_loss = no_arbitrage_loss(weights, returns, g_instruments, num_assets)
  G_regularization_term = 0.01*(weights**2).mean()

  return na_loss + G_regularization_term
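
As a sanity check on the loss, the sketch below mirrors `no_arbitrage_loss` in NumPy on simulated data. It assumes a constant instrument and all-zero weights, in which case the SDF equals 1 everywhere and the loss should reduce to the average squared mean return. The function name `no_arbitrage_loss_np` and the simulated returns are illustrative only.

```python
import numpy as np

def no_arbitrage_loss_np(weights, returns, g_instruments):
    """NumPy mirror of the loss: squared batch-average errors, averaged over assets."""
    sdf = 1.0 - (weights * returns).sum(axis=1, keepdims=True)   # [T, 1]
    g_bar = g_instruments.mean(axis=1, keepdims=True)            # [T, 1]
    pricing_errors = sdf * returns * g_bar                       # [T, N]
    return float((pricing_errors.mean(axis=0) ** 2).mean())

rng = np.random.default_rng(0)
T, N, K = 200, 4, 3
returns = rng.normal(0.01, 0.05, size=(T, N))

g_const = np.ones((T, K))      # constant instrument: unconditional moments only
zero_w = np.zeros((T, N))      # zero weights: the SDF is identically 1

loss_zero = no_arbitrage_loss_np(zero_w, returns, g_const)
benchmark = float((returns.mean(axis=0) ** 2).mean())
print(loss_zero, benchmark)    # the two numbers should agree
```

A check of this kind is a cheap way to confirm that broadcasting across the asset dimension behaves as intended before training.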



# Model Training

This section implements the training loop for the GAN. The Discriminator is trained to maximize the no-arbitrage loss, while the Generator is trained to minimize it. The training process alternates between updating the Discriminator and the Generator.

In [None]:
import torch.optim as optim
# training loop

def training(generator, discriminator, train_loader, epochs, d_lr, g_lr):
  D_optim = optim.Adam(discriminator.parameters(), lr = d_lr)
  G_optim = optim.Adam(generator.parameters(), lr = g_lr)

  d_losses = []
  g_losses = []

  generator.train()
  discriminator.train()

  for epoch in range(epochs):
    epoch_g_loss = 0
    epoch_d_loss = 0
    num_batches = 0

    for batch in train_loader:
      macro_X = batch['macro_X']
      ff_X = batch['ff_X']
      returns = batch['target_Y']

      # Train the discriminator (two updates per batch)
      for i in range(2):
        D_optim.zero_grad()

        # Generate SDF weights
        weights, h_t = generator(macro_X, ff_X)

        # Generate conditioning instruments
        g_inst = discriminator(h_t.detach(), ff_X)

        # Compute discriminator loss (weights are detached to avoid updating generator weights)
        d_loss = discriminator_loss(weights.detach(), returns, g_inst, no_of_portfolios)

        # Backprop on discriminator
        d_loss.backward()
        torch.nn.utils.clip_grad_norm_(discriminator.parameters(), max_norm=1.0)
        D_optim.step()


      # Train the generator
      G_optim.zero_grad()
      # Generate SDF weights
      weights, h_t = generator(macro_X, ff_X)

      # Conditioning instruments
      g_inst = discriminator(h_t.detach(), ff_X)
      # Compute generator loss
      g_loss = generator_loss(weights, returns, g_inst.detach(), no_of_portfolios)
      # Backprop on generator
      g_loss.backward()
      torch.nn.utils.clip_grad_norm_(generator.parameters(), max_norm=1.0)
      G_optim.step()

      epoch_g_loss += g_loss.item()
      epoch_d_loss += d_loss.item()
      num_batches += 1

    # Average losses in an epoch

    avg_g_loss = epoch_g_loss / num_batches
    avg_d_loss = epoch_d_loss / num_batches

    g_losses.append(avg_g_loss)
    d_losses.append(avg_d_loss)

    # Monitor the loss
    if (epoch + 1) % 50 == 0:
      print(f'Epoch [{epoch+1}/{epochs}], G Loss: {avg_g_loss:.6f}, D Loss: {avg_d_loss:.6f}')

  return g_losses, d_losses
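
The alternating update pattern used in the loop can be seen in isolation on a toy saddle-point problem. The sketch below is not the notebook's model: it assumes the hypothetical objective $f(x, y) = xy + \tfrac{1}{2}x^2 - \tfrac{1}{2}y^2$, where `x` plays the minimizing role (Generator) and `y` the maximizing role (Discriminator), with two inner updates for the maximizer as in the training loop.

```python
def alternating_updates(x, y, lr=0.1, steps=200, inner_steps=2):
    """Gradient descent on x and ascent on y for f(x, y) = x*y + 0.5*x**2 - 0.5*y**2."""
    for _ in range(steps):
        for _ in range(inner_steps):   # maximizing player is updated more often
            y += lr * (x - y)          # df/dy = x - y
        x -= lr * (y + x)              # df/dx = y + x
    return x, y

x_final, y_final = alternating_updates(1.0, -1.0)
print(x_final, y_final)   # both approach the saddle point at (0, 0)
```

The quadratic terms play a role similar to the regularization terms in the two loss functions: without them, the updates on the pure bilinear objective $xy$ would circle the saddle point instead of converging to it.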



# Baseline Models

This section defines baseline models for comparison with the GAN.

## 1. Fama-French linear model

This baseline estimates the traditional Fama-French 5-factor model for each portfolio and calculates metrics like Sharpe Ratio and average absolute alpha.



In [None]:
from sklearn.linear_model import LinearRegression

def baseline_fama_french_5(train_loader, test_loader):
    """
    Baseline 1: Fama-French 5-Factor Model

    Traditional factor model:
    R_i,t = α_i + β'·F_t + ε_i,t
    """
    print("\n" + "="*60)
    print("BASELINE 1: Fama-French 5-Factor Model")
    print("="*60)

    # Collect all data
    train_returns = []
    train_ff = []
    test_returns = []
    test_ff = []

    for batch in train_loader:
        train_returns.append(batch['target_Y'].numpy())
        train_ff.append(batch['ff_X'].numpy())

    for batch in test_loader:
        test_returns.append(batch['target_Y'].numpy())
        test_ff.append(batch['ff_X'].numpy())

    train_returns = np.concatenate(train_returns, axis=0)
    train_ff = np.concatenate(train_ff, axis=0)
    test_returns = np.concatenate(test_returns, axis=0)
    test_ff = np.concatenate(test_ff, axis=0)

    # Use first 5 FF factors (excluding RF)
    train_ff = train_ff[:, :5]  # Mkt-RF, SMB, HML, RMW, CMA
    test_ff = test_ff[:, :5]

    # Estimate factor loadings for each portfolio
    betas = []
    alphas = []

    for i in range(train_returns.shape[1]):
        model = LinearRegression()
        model.fit(train_ff, train_returns[:, i])
        betas.append(model.coef_)
        alphas.append(model.intercept_)

    betas = np.array(betas)
    alphas = np.array(alphas)

    # Create SDF = mean-variance efficient combination of factors
    mean_factors = train_ff.mean(axis=0)
    cov_factors = np.cov(train_ff.T)

    try:
        weights = np.linalg.solve(cov_factors, mean_factors)
        train_sdf_returns = (train_ff @ weights).flatten()
        test_sdf_returns = (test_ff @ weights).flatten()
    except np.linalg.LinAlgError:
        # If singular, use equal weights
        train_sdf_returns = train_ff.mean(axis=1)
        test_sdf_returns = test_ff.mean(axis=1)

    # Compute metrics
    train_sharpe = (train_sdf_returns.mean() / train_sdf_returns.std()) * np.sqrt(12)
    test_sharpe = (test_sdf_returns.mean() / test_sdf_returns.std()) * np.sqrt(12)
    avg_alpha = np.abs(alphas).mean()

    print(f"Train Sharpe Ratio: {train_sharpe:.4f}")
    print(f"Test Sharpe Ratio: {test_sharpe:.4f}")
    print(f"Average |Alpha|: {avg_alpha:.6f}")

    return {
        'train_sharpe': train_sharpe,
        'test_sharpe': test_sharpe,
        'avg_alpha': avg_alpha,
        'train_returns': train_sdf_returns,
        'test_returns': test_sdf_returns
    }

## 2. Mean-Variance portfolio

This baseline constructs a simple mean-variance efficient portfolio based on the historical mean and covariance of the portfolio returns in the training data.

In [None]:
def baseline_linear_mv(train_loader, test_loader):
    """
    Baseline 2: Simple Linear Mean-Variance Portfolio

    Maximize: ω'μ - λ·ω'Σω
    """
    print("\n" + "="*60)
    print("BASELINE 2: Linear Mean-Variance Portfolio")
    print("="*60)

    # Collect returns
    train_returns = []
    test_returns = []

    for batch in train_loader:
        train_returns.append(batch['target_Y'].numpy())

    for batch in test_loader:
        test_returns.append(batch['target_Y'].numpy())

    train_returns = np.concatenate(train_returns, axis=0)
    test_returns = np.concatenate(test_returns, axis=0)

    # Estimate mean and covariance
    mean_ret = train_returns.mean(axis=0)
    cov_mat = np.cov(train_returns.T)

    # Solve for tangency portfolio
    try:
        inv_cov = np.linalg.inv(cov_mat)
        weights = inv_cov @ mean_ret
        weights = weights / np.abs(weights).sum()
    except np.linalg.LinAlgError:
        weights = np.ones(train_returns.shape[1]) / train_returns.shape[1]

    # Portfolio returns
    train_portfolio_ret = (train_returns @ weights).flatten()
    test_portfolio_ret = (test_returns @ weights).flatten()

    # Compute Sharpe ratios
    train_sharpe = (train_portfolio_ret.mean() / train_portfolio_ret.std()) * np.sqrt(12)
    test_sharpe = (test_portfolio_ret.mean() / test_portfolio_ret.std()) * np.sqrt(12)

    print(f"Train Sharpe Ratio: {train_sharpe:.4f}")
    print(f"Test Sharpe Ratio: {test_sharpe:.4f}")
    print(f"Portfolio weights range: [{weights.min():.4f}, {weights.max():.4f}]")

    return {
        'train_sharpe': train_sharpe,
        'test_sharpe': test_sharpe,
        'weights': weights,
        'train_returns': train_portfolio_ret,
        'test_returns': test_portfolio_ret
    }
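
The two calculations at the core of this baseline, tangency-style weights and the annualized Sharpe ratio, can be tried on simulated data. The sketch below assumes monthly returns and, like the baseline above, ignores the risk-free rate. It uses `np.linalg.solve` rather than an explicit inverse, which is a substitution made here and not the notebook's code; the function names and the simulated returns are illustrative.

```python
import numpy as np

def tangency_weights(returns):
    """Weights proportional to inv(cov) @ mean, scaled to unit gross exposure."""
    mean_ret = returns.mean(axis=0)
    cov_mat = np.cov(returns.T)
    raw = np.linalg.solve(cov_mat, mean_ret)   # avoids forming the explicit inverse
    return raw / np.abs(raw).sum()

def annualized_sharpe(monthly_returns, periods_per_year=12):
    """Mean over standard deviation, scaled by sqrt(periods per year)."""
    return monthly_returns.mean() / monthly_returns.std() * np.sqrt(periods_per_year)

rng = np.random.default_rng(1)
toy_returns = rng.normal(loc=[0.01, 0.005, 0.008],
                         scale=[0.04, 0.02, 0.03],
                         size=(240, 3))

w = tangency_weights(toy_returns)
in_sample = annualized_sharpe(toy_returns @ w)
equal_weight = annualized_sharpe(toy_returns.mean(axis=1))
print(w, in_sample, equal_weight)
```

In sample, the tangency weights cannot have a lower Sharpe ratio than the equal-weight portfolio, because they maximize it by construction. Out of sample there is no such guarantee, which is consistent with the gap between the train and test figures reported for the baselines.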


# Evaluation

This section defines metrics and evaluates the performance of the trained models.

## 1. Evaluation metrics

The `evaluate_GAN` function calculates several metrics for the GAN model on a given dataset:
- Sharpe Ratio: Measures the risk-adjusted return of the portfolio implied by the SDF weights.
- Cross-sectional R²: Measures how well the model-implied expected returns explain the variation in average realized returns across portfolios.
- Mean Absolute Pricing Error, Max Pricing Error, Mean Squared Pricing Error: Measure the magnitude of the deviations from the no-arbitrage condition.



In [None]:
def evaluate_GAN(generator, test_loader):

    generator.eval()
    all_sdf = []
    all_returns = []
    all_predictions = []

    with torch.no_grad():
        for batch in test_loader:
            macro = batch['macro_X']
            char = batch['ff_X']
            returns = batch['target_Y']

            # Get SDF weights
            sdf_weights, _ = generator(macro, char)

            # Construct SDF
            sdf = 1 - (sdf_weights * returns).sum(dim=1)

            all_sdf.append(sdf.numpy())
            all_returns.append(returns.numpy())
            all_predictions.append(sdf_weights.numpy())

    sdf = np.concatenate(all_sdf)
    returns = np.concatenate(all_returns)
    predictions = np.concatenate(all_predictions)

    # Compute metrics
    results = {}

    # 1. Sharpe Ratio
    portfolio_returns = (predictions * returns).sum(axis=1)
    results['sharpe_ratio'] = (portfolio_returns.mean() / portfolio_returns.std()) * np.sqrt(12)

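    # 2. Cross-sectional R²
    mean_returns = returns.mean(axis=0)
    # Note: this dot product of two length-N vectors is a single scalar, so every
    # portfolio is compared against the same predicted mean return.
    pred_returns = predictions.mean(axis=0) @ mean_returns
    results['cross_sectional_r2'] = 1 - np.sum((mean_returns - pred_returns)**2) / np.sum((mean_returns - mean_returns.mean())**2)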

    # 3. Pricing errors
    pricing_errors = (sdf.reshape(-1, 1) * returns).mean(axis=0)
    results['mean_abs_pricing_error'] = np.abs(pricing_errors).mean()
    results['max_pricing_error'] = np.abs(pricing_errors).max()
    results['mean_squared_pricing_error'] = (pricing_errors**2).mean()

    return results
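
One way to obtain a separate model-implied mean return for each portfolio follows from the moment condition itself. With $M = 1 - F$ and $F = \omega'R$, the condition $E[M R_i] = 0$ rearranges to $E[R_i] = E[F R_i]$. The sketch below uses that reading, which is an assumption about the intended metric and not the notebook's code, and confirms that the resulting residuals equal the pricing errors $E[M R_i]$. The function name and the simulated inputs are illustrative.

```python
import numpy as np

def cross_sectional_r2(weights, returns):
    """R^2 of model-implied mean returns E[F * R_i] against realized mean returns."""
    factor = (weights * returns).sum(axis=1, keepdims=True)   # F_t = omega_t' R_t
    implied_mean = (factor * returns).mean(axis=0)            # one value per asset
    realized_mean = returns.mean(axis=0)
    pricing_errors = realized_mean - implied_mean             # equals mean((1 - F_t) * R_t)
    total = ((realized_mean - realized_mean.mean()) ** 2).sum()
    return 1.0 - (pricing_errors ** 2).sum() / total, pricing_errors

rng = np.random.default_rng(2)
returns = rng.normal(0.01, 0.05, size=(120, 5))
weights = rng.normal(0.0, 0.1, size=(120, 5))

r2, errors = cross_sectional_r2(weights, returns)
sdf = 1.0 - (weights * returns).sum(axis=1, keepdims=True)
print(r2, np.allclose(errors, (sdf * returns).mean(axis=0)))
```

With random weights the resulting $R^2$ can be strongly negative; the value only becomes informative once the weights come from a trained Generator.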

## 2. Model training and Evaluation

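This section trains the GAN model and evaluates it on the training and test datasets using the defined metrics. The first cell trains a fresh Generator and Discriminator for each epoch count in `epochs` and plots the losses; the evaluation cells then use the models from the last run (2000 epochs). The baseline models are evaluated for comparison.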

In [None]:
# Checking the stabilization of training loss vs no of epoch
import matplotlib.pyplot as plt

macro_dim = no_macro_features
ff_dim = no_of_FF_features
num_assets = no_of_portfolios

# Hyperparameters
hidden_dim = 16
lstm_layers = 4
hidden_layer = 16
epochs = [500,1500,2000]
d_lr = 1e-4
g_lr = 1e-4

for epoch in epochs:

  generator = Generator(macro_dim, ff_dim, hidden_dim, lstm_layers, num_assets)

  discriminator =  Discriminator(macro_dim, ff_dim, num_assets, hidden_layer)

  g_losses, d_losses = training(generator, discriminator, train_dataset, epoch, d_lr, g_lr)

  plt.figure(figsize=(10, 5))
  plt.plot(range(epoch), d_losses, label='Discriminator Loss')
  plt.plot(range(epoch), g_losses, label='Generator Loss')
  plt.xlabel('Epoch')
  plt.ylabel('Loss')
  plt.title(f'Discriminator and Generator Loss over {epoch} Epochs')
  plt.legend()
  plt.grid(True)
  plt.show()


In [None]:
Gan_train = evaluate_GAN(generator, train_dataset)

print(f"Train Sharpe Ratio: {Gan_train['sharpe_ratio']:.4f}")
print(f"Train Cross-sectional R²: {Gan_train['cross_sectional_r2']:.4f}")
print(f"Train Mean Absolute Pricing Error: {Gan_train['mean_abs_pricing_error']:.6f}")
print(f"Train Max Pricing Error: {Gan_train['max_pricing_error']:.6f}")
print(f"Train Mean Squared Pricing Error: {Gan_train['mean_squared_pricing_error']:.6f}")

NameError: name 'generator' is not defined

In [None]:
Gan_test = evaluate_GAN(generator, test_dataset)

print(f"Test Sharpe Ratio: {Gan_test['sharpe_ratio']:.4f}")
print(f"Test Cross-sectional R²: {Gan_test['cross_sectional_r2']:.4f}")
print(f"Test Mean Absolute Pricing Error: {Gan_test['mean_abs_pricing_error']:.6f}")
print(f"Test Max Pricing Error: {Gan_test['max_pricing_error']:.6f}")
print(f"Test Mean Squared Pricing Error: {Gan_test['mean_squared_pricing_error']:.6f}")

In [None]:
# Evaluate Fama-French Baseline
ff_results = baseline_fama_french_5(train_dataset, test_dataset)

print("\nFama-French 5-Factor Model Results:")
print(f"  Train Sharpe Ratio: {ff_results['train_sharpe']:.4f}")
print(f"  Test Sharpe Ratio: {ff_results['test_sharpe']:.4f}")
print(f"  Average |Alpha|: {ff_results['avg_alpha']:.6f}")


# Evaluate Mean-Variance Baseline
mv_results = baseline_linear_mv(train_dataset, test_dataset)

print("\nLinear Mean-Variance Portfolio Results:")
print(f"  Train Sharpe Ratio: {mv_results['train_sharpe']:.4f}")
print(f"  Test Sharpe Ratio: {mv_results['test_sharpe']:.4f}")
print(f"  Portfolio weights range: [{mv_results['weights'].min():.4f}, {mv_results['weights'].max():.4f}]")


BASELINE 1: Fama-French 5-Factor Model
Train Sharpe Ratio: 0.2137
Test Sharpe Ratio: -0.4991
Average |Alpha|: 0.011956

Fama-French 5-Factor Model Results:
  Train Sharpe Ratio: 0.2137
  Test Sharpe Ratio: -0.4991
  Average |Alpha|: 0.011956

BASELINE 2: Linear Mean-Variance Portfolio
Train Sharpe Ratio: 0.4009
Test Sharpe Ratio: -0.9752
Portfolio weights range: [-0.1320, 0.1561]

Linear Mean-Variance Portfolio Results:
  Train Sharpe Ratio: 0.4009
  Test Sharpe Ratio: -0.9752
  Portfolio weights range: [-0.1320, 0.1561]


# Hyperparameter Tuning

This section uses the Optuna library to tune the hyperparameters of the Generator and Discriminator networks to maximize the Sharpe Ratio on the validation set.
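
The search draws both learning rates log-uniformly between 1e-5 and 1e-3, which spreads trials evenly across orders of magnitude instead of concentrating them near the upper bound. The sketch below illustrates that idea with the standard library only; it is not Optuna's sampler, and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
draws = [sample_log_uniform(1e-5, 1e-3, rng) for _ in range(1000)]

# 1e-4 is the geometric midpoint of the range, so roughly half of the
# draws should fall below it. Under plain uniform sampling on the same
# range, only about 9% of the draws would.
share_below_1e4 = sum(d < 1e-4 for d in draws) / len(draws)
print(f"share of draws below 1e-4: {share_below_1e4:.2f}")
```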

In [None]:
import optuna

def objective(trial):
  # Define these variables within the objective function for scope
  macro_dim = no_macro_features
  ff_dim = no_of_FF_features
  num_assets = no_of_portfolios


  hidden_dim = trial.suggest_categorical('hidden_dim', [32, 64, 128])
  lstm_layers = trial.suggest_categorical('lstm_layers', [2, 3, 4])
  hidden_layer = trial.suggest_categorical('hidden_layer', [32, 64, 128])
  epochs = trial.suggest_categorical('epochs', [1500, 2000])
  # suggest_loguniform is deprecated; suggest_float(..., log=True) is the equivalent call
  d_lr = trial.suggest_float('d_lr', 1e-5, 1e-3, log=True)
  g_lr = trial.suggest_float('g_lr', 1e-5, 1e-3, log=True)

  generator = Generator(macro_dim, ff_dim, hidden_dim, lstm_layers, num_assets)
  discriminator = Discriminator(macro_dim, ff_dim, num_assets, hidden_layer)

  g_losses, d_losses = training(generator, discriminator, train_dataset, epochs, d_lr, g_lr)

  Gan_val = evaluate_GAN(generator, val_dataset)

  return Gan_val['sharpe_ratio']

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=10)

best_params = study.best_params

[I 2025-11-07 07:34:29,814] A new study created in memory with name: no-name-279f9994-1dce-4e11-b38d-e37dfa32b041


Epoch [50/2000], G Loss: 0.246461, D Loss: -0.236593
Epoch [100/2000], G Loss: 0.098232, D Loss: -0.056987
Epoch [150/2000], G Loss: 0.039525, D Loss: -0.068229
Epoch [200/2000], G Loss: 0.030691, D Loss: -0.046510
Epoch [250/2000], G Loss: 0.021834, D Loss: -0.019553
Epoch [300/2000], G Loss: 0.028254, D Loss: -0.015413
Epoch [350/2000], G Loss: 0.021077, D Loss: -0.011472
Epoch [400/2000], G Loss: 0.038519, D Loss: -0.015835
Epoch [450/2000], G Loss: 0.020822, D Loss: -0.009085
Epoch [500/2000], G Loss: 0.017442, D Loss: -0.014497
Epoch [550/2000], G Loss: 0.007362, D Loss: -0.008964
Epoch [600/2000], G Loss: 0.007189, D Loss: -0.008726
Epoch [650/2000], G Loss: 0.017352, D Loss: -0.002996
Epoch [700/2000], G Loss: 0.013793, D Loss: -0.003462
Epoch [750/2000], G Loss: 0.006153, D Loss: -0.002070
Epoch [800/2000], G Loss: 0.006394, D Loss: -0.003498
Epoch [850/2000], G Loss: 0.005284, D Loss: -0.004008
Epoch [900/2000], G Loss: 0.005891, D Loss: -0.006060
Epoch [950/2000], G Loss: 0.0

[I 2025-11-07 07:37:16,060] Trial 0 finished with value: 3.947732042339764 and parameters: {'hidden_dim': 32, 'lstm_layers': 3, 'hidden_layer': 32, 'epochs': 2000, 'd_lr': 0.0007949367959213507, 'g_lr': 9.430549910236732e-05}. Best is trial 0 with value: 3.947732042339764.


Epoch [2000/2000], G Loss: 0.004018, D Loss: 0.000949




Epoch [50/1500], G Loss: 0.031850, D Loss: -0.031111
Epoch [100/1500], G Loss: 0.083048, D Loss: -0.011794
Epoch [150/1500], G Loss: 0.049727, D Loss: -0.045627
Epoch [200/1500], G Loss: 0.061754, D Loss: -0.065091
Epoch [250/1500], G Loss: 0.058421, D Loss: -0.099139
Epoch [300/1500], G Loss: 0.045320, D Loss: -0.098236
Epoch [350/1500], G Loss: 0.099684, D Loss: -0.031002
Epoch [400/1500], G Loss: 0.127302, D Loss: -0.142089
Epoch [450/1500], G Loss: 0.053239, D Loss: -0.063330
Epoch [500/1500], G Loss: 0.107464, D Loss: -0.030248
Epoch [550/1500], G Loss: 0.099107, D Loss: -0.081872
Epoch [600/1500], G Loss: 0.061889, D Loss: -0.022950
Epoch [650/1500], G Loss: 0.138558, D Loss: -0.012140
Epoch [700/1500], G Loss: 0.033907, D Loss: -0.033618
Epoch [750/1500], G Loss: 0.066413, D Loss: -0.052490
Epoch [800/1500], G Loss: 0.027744, D Loss: -0.094575
Epoch [850/1500], G Loss: 0.011935, D Loss: -0.026548
Epoch [900/1500], G Loss: 0.031637, D Loss: -0.007438
Epoch [950/1500], G Loss: 0.0

[I 2025-11-07 07:39:11,385] Trial 1 finished with value: 2.518514810268689 and parameters: {'hidden_dim': 128, 'lstm_layers': 2, 'hidden_layer': 32, 'epochs': 1500, 'd_lr': 5.497140398898081e-05, 'g_lr': 1.6115120726795183e-05}. Best is trial 0 with value: 3.947732042339764.


Epoch [1500/1500], G Loss: 0.033620, D Loss: -0.097617




Epoch [50/1500], G Loss: 0.015416, D Loss: -0.002978
Epoch [100/1500], G Loss: 0.007299, D Loss: -0.012328
Epoch [150/1500], G Loss: 0.018373, D Loss: -0.011130
Epoch [200/1500], G Loss: 0.005396, D Loss: -0.008504
Epoch [250/1500], G Loss: 0.005485, D Loss: -0.001047
Epoch [300/1500], G Loss: 0.006885, D Loss: -0.001990
Epoch [350/1500], G Loss: 0.003382, D Loss: 0.000548
Epoch [400/1500], G Loss: 0.001889, D Loss: 0.000318
Epoch [450/1500], G Loss: 0.001791, D Loss: 0.000264
Epoch [500/1500], G Loss: 0.001702, D Loss: -0.000723
Epoch [550/1500], G Loss: 0.001367, D Loss: 0.000820
Epoch [600/1500], G Loss: 0.001640, D Loss: 0.000526
Epoch [650/1500], G Loss: 0.000933, D Loss: 0.000254
Epoch [700/1500], G Loss: 0.000703, D Loss: 0.000265
Epoch [750/1500], G Loss: 0.000603, D Loss: 0.000143
Epoch [800/1500], G Loss: 0.000659, D Loss: 0.000091
Epoch [850/1500], G Loss: 0.000425, D Loss: 0.000178
Epoch [900/1500], G Loss: 0.000391, D Loss: 0.000063
Epoch [950/1500], G Loss: 0.000358, D Lo

[I 2025-11-07 07:41:09,451] Trial 2 finished with value: 4.687251743963873 and parameters: {'hidden_dim': 128, 'lstm_layers': 2, 'hidden_layer': 32, 'epochs': 1500, 'd_lr': 6.546437332816166e-05, 'g_lr': 0.00013237191687062074}. Best is trial 2 with value: 4.687251743963873.


Epoch [1500/1500], G Loss: 0.000132, D Loss: 0.000011
Epoch [50/2000], G Loss: 0.153190, D Loss: -0.028060
Epoch [100/2000], G Loss: 0.110416, D Loss: -0.050778
Epoch [150/2000], G Loss: 0.009349, D Loss: -0.011633
Epoch [200/2000], G Loss: 0.012830, D Loss: -0.009716
Epoch [250/2000], G Loss: 0.017409, D Loss: -0.002516
Epoch [300/2000], G Loss: 0.017378, D Loss: -0.006960
Epoch [350/2000], G Loss: 0.014822, D Loss: -0.013337
Epoch [400/2000], G Loss: 0.005235, D Loss: -0.002084
Epoch [450/2000], G Loss: 0.004174, D Loss: 0.000925
Epoch [500/2000], G Loss: 0.005510, D Loss: -0.004460
Epoch [550/2000], G Loss: 0.002324, D Loss: -0.001547
Epoch [600/2000], G Loss: 0.002654, D Loss: 0.000778
Epoch [650/2000], G Loss: 0.003789, D Loss: -0.001291
Epoch [700/2000], G Loss: 0.001149, D Loss: -0.000797
Epoch [750/2000], G Loss: 0.000597, D Loss: -0.000162
Epoch [800/2000], G Loss: 0.001328, D Loss: -0.000039
Epoch [850/2000], G Loss: 0.002937, D Loss: -0.000833
Epoch [900/2000], G Loss: 0.001

[I 2025-11-07 07:44:49,803] Trial 3 finished with value: 6.337192197484838 and parameters: {'hidden_dim': 128, 'lstm_layers': 4, 'hidden_layer': 128, 'epochs': 2000, 'd_lr': 0.0009255789089317319, 'g_lr': 0.00012586546042489945}. Best is trial 3 with value: 6.337192197484838.


Epoch [2000/2000], G Loss: 0.000222, D Loss: 0.000076




Epoch [50/1500], G Loss: 0.008441, D Loss: -0.006920
Epoch [100/1500], G Loss: 0.011815, D Loss: -0.006283
Epoch [150/1500], G Loss: 0.060874, D Loss: -0.014495
Epoch [200/1500], G Loss: 0.012086, D Loss: -0.015112
Epoch [250/1500], G Loss: 0.017199, D Loss: -0.015027
Epoch [300/1500], G Loss: 0.005866, D Loss: -0.001725
Epoch [350/1500], G Loss: 0.011573, D Loss: -0.015434
Epoch [400/1500], G Loss: 0.021807, D Loss: -0.008225
Epoch [450/1500], G Loss: 0.017290, D Loss: -0.000729
Epoch [500/1500], G Loss: 0.013299, D Loss: -0.010950
Epoch [550/1500], G Loss: 0.016804, D Loss: -0.003027
Epoch [600/1500], G Loss: 0.004326, D Loss: -0.003492
Epoch [650/1500], G Loss: 0.006101, D Loss: -0.002289
Epoch [700/1500], G Loss: 0.008365, D Loss: -0.003641
Epoch [750/1500], G Loss: 0.006175, D Loss: -0.005491
Epoch [800/1500], G Loss: 0.003768, D Loss: -0.003835
Epoch [850/1500], G Loss: 0.004558, D Loss: -0.000876
Epoch [900/1500], G Loss: 0.009038, D Loss: -0.001275
Epoch [950/1500], G Loss: 0.0

[I 2025-11-07 07:46:43,438] Trial 4 finished with value: 3.2787893278439086 and parameters: {'hidden_dim': 64, 'lstm_layers': 2, 'hidden_layer': 32, 'epochs': 1500, 'd_lr': 5.1059343031010776e-05, 'g_lr': 6.849217875032407e-05}. Best is trial 3 with value: 6.337192197484838.


Epoch [1500/1500], G Loss: 0.001450, D Loss: 0.000502




Epoch [50/1500], G Loss: 0.001900, D Loss: 0.000730
Epoch [100/1500], G Loss: 0.001570, D Loss: 0.000956
Epoch [150/1500], G Loss: 0.000737, D Loss: 0.001001
Epoch [200/1500], G Loss: 0.000505, D Loss: 0.000867
Epoch [250/1500], G Loss: 0.000571, D Loss: 0.000739
Epoch [300/1500], G Loss: 0.000341, D Loss: 0.000721
Epoch [350/1500], G Loss: 0.000338, D Loss: 0.000515
Epoch [400/1500], G Loss: 0.000287, D Loss: 0.000418
Epoch [450/1500], G Loss: 0.000394, D Loss: 0.000328
Epoch [500/1500], G Loss: 0.000246, D Loss: 0.000153
Epoch [550/1500], G Loss: 0.000356, D Loss: 0.000254
Epoch [600/1500], G Loss: 0.000423, D Loss: 0.000296
Epoch [650/1500], G Loss: 0.000438, D Loss: 0.000023
Epoch [700/1500], G Loss: 0.000227, D Loss: 0.000270
Epoch [750/1500], G Loss: 0.000377, D Loss: 0.000256
Epoch [800/1500], G Loss: 0.000348, D Loss: 0.000134
Epoch [850/1500], G Loss: 0.000232, D Loss: 0.000209
Epoch [900/1500], G Loss: 0.000539, D Loss: 0.000241
Epoch [950/1500], G Loss: 0.000223, D Loss: 0.0

[I 2025-11-07 07:49:12,156] Trial 5 finished with value: 3.988585491778047 and parameters: {'hidden_dim': 32, 'lstm_layers': 4, 'hidden_layer': 32, 'epochs': 1500, 'd_lr': 1.8702068869591523e-05, 'g_lr': 0.00041761390979180304}. Best is trial 3 with value: 6.337192197484838.


Epoch [1500/1500], G Loss: 0.000072, D Loss: 0.000054




Epoch [50/2000], G Loss: 0.001775, D Loss: -0.000561
Epoch [100/2000], G Loss: 0.001484, D Loss: 0.000308
Epoch [150/2000], G Loss: 0.001665, D Loss: -0.000260
Epoch [200/2000], G Loss: 0.001139, D Loss: 0.000572
Epoch [250/2000], G Loss: 0.001138, D Loss: 0.000592
Epoch [300/2000], G Loss: 0.000944, D Loss: -0.000017
Epoch [350/2000], G Loss: 0.000837, D Loss: 0.000113
Epoch [400/2000], G Loss: 0.000796, D Loss: 0.000299
Epoch [450/2000], G Loss: 0.000797, D Loss: 0.000333
Epoch [500/2000], G Loss: 0.001559, D Loss: -0.000308
Epoch [550/2000], G Loss: 0.001284, D Loss: -0.000319
Epoch [600/2000], G Loss: 0.000792, D Loss: 0.000237
Epoch [650/2000], G Loss: 0.000871, D Loss: 0.000361
Epoch [700/2000], G Loss: 0.000503, D Loss: 0.000399
Epoch [750/2000], G Loss: 0.000495, D Loss: 0.000166
Epoch [800/2000], G Loss: 0.000535, D Loss: 0.000132
Epoch [850/2000], G Loss: 0.000269, D Loss: 0.000384
Epoch [900/2000], G Loss: 0.000235, D Loss: 0.000303
Epoch [950/2000], G Loss: 0.000583, D Loss

[I 2025-11-07 07:51:41,051] Trial 6 finished with value: 3.1876489285229055 and parameters: {'hidden_dim': 32, 'lstm_layers': 2, 'hidden_layer': 64, 'epochs': 2000, 'd_lr': 2.1332381443441837e-05, 'g_lr': 0.0006565789746685719}. Best is trial 3 with value: 6.337192197484838.


Epoch [2000/2000], G Loss: 0.000091, D Loss: 0.000006
Epoch [50/2000], G Loss: 0.004165, D Loss: -0.003565
Epoch [100/2000], G Loss: 0.002880, D Loss: 0.000276
Epoch [150/2000], G Loss: 0.002034, D Loss: -0.001283
Epoch [200/2000], G Loss: 0.001235, D Loss: 0.000563
Epoch [250/2000], G Loss: 0.001376, D Loss: 0.000594
Epoch [300/2000], G Loss: 0.000495, D Loss: 0.000840
Epoch [350/2000], G Loss: 0.000360, D Loss: 0.000618
Epoch [400/2000], G Loss: 0.000900, D Loss: 0.000275
Epoch [450/2000], G Loss: 0.000364, D Loss: 0.000242
Epoch [500/2000], G Loss: 0.000248, D Loss: 0.000228
Epoch [550/2000], G Loss: 0.000272, D Loss: 0.000093
Epoch [600/2000], G Loss: 0.000628, D Loss: 0.000042
Epoch [650/2000], G Loss: 0.002592, D Loss: -0.001525
Epoch [700/2000], G Loss: 0.000529, D Loss: 0.000780
Epoch [750/2000], G Loss: 0.000709, D Loss: 0.000148
Epoch [800/2000], G Loss: 0.000866, D Loss: -0.000106
Epoch [850/2000], G Loss: 0.000383, D Loss: -0.000162
Epoch [900/2000], G Loss: 0.000106, D Los

[I 2025-11-07 07:54:09,602] Trial 7 finished with value: 4.4146973345419624 and parameters: {'hidden_dim': 64, 'lstm_layers': 2, 'hidden_layer': 64, 'epochs': 2000, 'd_lr': 8.338713074153563e-05, 'g_lr': 0.0009226063792509045}. Best is trial 3 with value: 6.337192197484838.


Epoch [2000/2000], G Loss: 0.000034, D Loss: 0.000008
Epoch [50/1500], G Loss: 0.010722, D Loss: -0.010871
Epoch [100/1500], G Loss: 0.008924, D Loss: -0.016516
Epoch [150/1500], G Loss: 0.015158, D Loss: -0.004905
Epoch [200/1500], G Loss: 0.010370, D Loss: -0.001247
Epoch [250/1500], G Loss: 0.004217, D Loss: -0.001919
Epoch [300/1500], G Loss: 0.007364, D Loss: -0.003616
Epoch [350/1500], G Loss: 0.004053, D Loss: -0.000404
Epoch [400/1500], G Loss: 0.001556, D Loss: 0.000703
Epoch [450/1500], G Loss: 0.001414, D Loss: -0.000139
Epoch [500/1500], G Loss: 0.001643, D Loss: 0.001132
Epoch [550/1500], G Loss: 0.001707, D Loss: 0.000066
Epoch [600/1500], G Loss: 0.001006, D Loss: 0.000259
Epoch [650/1500], G Loss: 0.000652, D Loss: 0.000483
Epoch [700/1500], G Loss: 0.000498, D Loss: 0.000063
Epoch [750/1500], G Loss: 0.000515, D Loss: 0.000263
Epoch [800/1500], G Loss: 0.000359, D Loss: 0.000209
Epoch [850/1500], G Loss: 0.000496, D Loss: 0.000075
Epoch [900/1500], G Loss: 0.000271, D 

[I 2025-11-07 07:56:42,102] Trial 8 finished with value: 4.33926484495242 and parameters: {'hidden_dim': 64, 'lstm_layers': 4, 'hidden_layer': 128, 'epochs': 1500, 'd_lr': 2.9196178233748825e-05, 'g_lr': 0.00029048010596313564}. Best is trial 3 with value: 6.337192197484838.


Epoch [1500/1500], G Loss: 0.000144, D Loss: 0.000080
Epoch [50/2000], G Loss: 0.004685, D Loss: -0.006986
Epoch [100/2000], G Loss: 0.004346, D Loss: -0.026809
Epoch [150/2000], G Loss: 0.018945, D Loss: -0.000381
Epoch [200/2000], G Loss: 0.003968, D Loss: -0.003956
Epoch [250/2000], G Loss: 0.007748, D Loss: -0.001555
Epoch [300/2000], G Loss: 0.002899, D Loss: -0.008368
Epoch [350/2000], G Loss: 0.006731, D Loss: -0.003182
Epoch [400/2000], G Loss: 0.008784, D Loss: -0.001376
Epoch [450/2000], G Loss: 0.003381, D Loss: -0.002053
Epoch [500/2000], G Loss: 0.005065, D Loss: -0.001385
Epoch [550/2000], G Loss: 0.015457, D Loss: -0.002860
Epoch [600/2000], G Loss: 0.002214, D Loss: 0.000733
Epoch [650/2000], G Loss: 0.002622, D Loss: 0.000314
Epoch [700/2000], G Loss: 0.001568, D Loss: -0.000589
Epoch [750/2000], G Loss: 0.003204, D Loss: -0.000061
Epoch [800/2000], G Loss: 0.002242, D Loss: -0.003292
Epoch [850/2000], G Loss: 0.001831, D Loss: -0.003974
Epoch [900/2000], G Loss: 0.001

[I 2025-11-07 07:59:05,968] Trial 9 finished with value: 4.63107814774796 and parameters: {'hidden_dim': 64, 'lstm_layers': 2, 'hidden_layer': 32, 'epochs': 2000, 'd_lr': 4.073042329519963e-05, 'g_lr': 0.00010017757125833272}. Best is trial 3 with value: 6.337192197484838.


Epoch [2000/2000], G Loss: 0.000226, D Loss: -0.000035


# Saving Best Hyperparameters

This section saves the best hyperparameters found during the Optuna tuning process to a JSON file, allowing them to be easily reloaded later.

In [None]:
print(best_params)

{'hidden_dim': 128, 'lstm_layers': 4, 'hidden_layer': 128, 'epochs': 2000, 'd_lr': 0.0009255789089317319, 'g_lr': 0.00012586546042489945}


In [None]:
import json

# Assuming best_params is already defined from the hyperparameter tuning
# If not, make sure you run the cell above first.

# Define the path where you want to save the file
save_path = '/content/best_hyperparameters.json'

# Save the dictionary to a JSON file
with open(save_path, 'w') as f:
    json.dump(best_params, f)

print(f"Best hyperparameters saved to {save_path}")

Best hyperparameters saved to /content/best_hyperparameters.json
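
Reloading works the same way in reverse. The sketch below round-trips the dictionary printed above through a JSON file and checks that nothing is lost. It writes to a temporary directory rather than `/content` so that it runs outside Colab; the variable name `loaded_params` is an assumption for illustration.

```python
import json
import os
import tempfile

# Values copied from the printed best_params above.
best_params = {
    'hidden_dim': 128,
    'lstm_layers': 4,
    'hidden_layer': 128,
    'epochs': 2000,
    'd_lr': 0.0009255789089317319,
    'g_lr': 0.00012586546042489945,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'best_hyperparameters.json')

    # Save, then read the file back.
    with open(path, 'w') as f:
        json.dump(best_params, f, indent=2)
    with open(path) as f:
        loaded_params = json.load(f)

# Integers stay integers and floats are restored exactly.
print(loaded_params == best_params)

# The reloaded values can then be passed to the constructors used earlier, e.g.
# Generator(macro_dim, ff_dim, loaded_params['hidden_dim'],
#           loaded_params['lstm_layers'], num_assets)
```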
