ACM40960/DeepRecurrentFactorModels

Deep Recurrent Factor Model is a term coined by the authors of the paper Deep Recurrent Factor Model: Interpretable Non-Linear and Time-Varying Multi-Factor Model. The authors challenge the idea of linear factor models for predicting stock returns and use Long Short-Term Memory (LSTM) networks in conjunction with layer-wise relevance propagation (LRP) to construct a time-varying factor model that outperforms equivalent linear models, while providing insight into the relevance of individual factors for the prediction.

This repository provides classes, functions and notebooks to test Deep Recurrent Factor Models on the US stock market.

Welcome to the Deep Recurrent Factor Model + LRP repository. By building on familiar modules such as Keras and TensorFlow, this repository lets you create deep LSTM networks and facilitates LRP through the provided classes and methods.

The provided classes and methods take care of the complex task of backpropagating relevance through any variation of custom Input, LSTM, Dense or Dropout layers. This means you can build deep feed-forward LSTM networks and effortlessly backpropagate relevance scores for predictions. The model builds on top of the Keras Functional API to remain compatible with the functionality that comes with Keras and TensorFlow.

In an example we explore and replicate the approach suggested in the paper Deep Recurrent Factor Model: Interpretable Non-Linear and Time-Varying Multi-Factor Model to test our implementation of custom LSTM models with LRP on the US stock market.

To get started, clone the GitHub repository to your local machine:

git clone https://github.com/ACM40960/DeepRecurrentFactorModels
  • Make sure you have Python 3.11+ installed - if not, download the latest version of Python 3.

  • Install all necessary dependencies:

    cd ./DeepRecurrentFactorModels
    pip install -r requirements.txt

If you want to build your own deep LSTM model, you need to use the Keras Functional API in conjunction with the provided LSTM layer CustomLSTM and the provided model class CustomModel.

The example below shows how to build such a model:

# Build an example model
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.regularizers import L2

# CustomLSTM and CustomModel are the classes provided by this repository;
# import them from the module in which they are defined in your clone.

# Define input dimensions
timesteps = 5
input_dim = 16
input_shape = (timesteps, input_dim)

#1) Create input layer
input_layer = Input(shape=input_shape, name="Input")

#2) Create a CustomLSTM layer
lstm_output, hidden_state, cell_state = CustomLSTM(units=16, return_sequences=False,
                                                   return_state=True,
                                                   kernel_regularizer=L2(0.02),
                                                   name="CustomLSTM_1")(input_layer)
#3) Apply dropout to LSTM output
dropout1 = Dropout(0.2)(lstm_output)

#4) Create a Dense layer
dense1 = Dense(16, kernel_regularizer=L2(0.02), name="Dense_1")(dropout1)

#5) Apply dropout to dense1 output
dropout2 = Dropout(0.2)(dense1)

#6) Create the final output layer
output_layer = Dense(1, kernel_regularizer=L2(0.02), name="Dense_2_Final")(dropout2)

# Create an instance of your custom model
custom_model = CustomModel(inputs=input_layer, outputs=output_layer)

# Compile the model
custom_model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])

# Generate some example training data (replace this with your actual data)
num_samples = 100
X_train = np.random.rand(num_samples, timesteps, input_dim)
y_train = np.random.rand(num_samples, 1)

# Train the model
custom_model.fit(X_train, y_train, epochs=10, batch_size=32)

After fitting the model, we can compute the relevance of each input feature. To backpropagate relevance through the entire network, i.e. from the output layer to the input layer, we follow the approach suggested by Arras et al. (2017). To propagate relevance between layers (including LSTM layers and linear layers), we use the following relevance distribution rule:

$$R_{i\leftarrow j} = \frac{z_i \cdot w_{ij} + \frac{\epsilon \cdot \text{sign}(z_j) + \delta \cdot b_j }{N}}{z_j + \epsilon \cdot \text{sign}(z_j)} \cdot R_j,$$

where

  • $R_j$ is the relevance of node $j$ in the upper layer,
  • $z_i$ is the activation of node $i$ in the lower layer,
  • $z_j$ is the activation of node $j$ in the upper layer,
  • $w_{ij}$ are the weights connecting nodes of the lower and upper layers,
  • $b_j$ is the bias of node $j$ in the upper layer,
  • $N$ is the number of nodes in the lower layer,
  • $\epsilon$ is a small number to avoid division by 0 - it is usually set to 0.001,
  • $\delta$ is a multiplicative factor that is either 1 or 0 (see details).

The rule propagates relevance from a higher layer to a lower layer by assigning each node in the lower layer a fraction of the relevance of the higher layer. For our purposes we initialise the relevance with the final prediction $y_{T+1}$ itself.
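
To make the rule concrete, here is a minimal NumPy sketch of the distribution rule for a single fully connected layer. It is illustrative only, not the code used in this repository; the function name lrp_dense is ours.

import numpy as np

def lrp_dense(z_lower, W, b, R_upper, eps=0.001, delta=1.0):
    """Distribute the relevance R_upper of an upper layer onto the lower layer
    using the epsilon rule from Arras et al. (2017) shown above.

    z_lower : (N,)   activations z_i of the lower layer
    W       : (N, M) weights w_ij connecting lower and upper layer
    b       : (M,)   biases b_j of the upper layer
    R_upper : (M,)   relevance R_j of the upper-layer nodes
    """
    N = z_lower.shape[0]
    z_upper = z_lower @ W + b            # pre-activations z_j of the upper layer
    stab = eps * np.sign(z_upper)        # epsilon * sign(z_j)
    # numerator: z_i * w_ij + (eps * sign(z_j) + delta * b_j) / N
    numerator = z_lower[:, None] * W + (stab + delta * b)[None, :] / N
    # relevance messages R_{i<-j}; summing over j gives the relevance of node i
    messages = numerator / (z_upper + stab)[None, :] * R_upper[None, :]
    return messages.sum(axis=1)

# Toy usage: a single linear output node, relevance initialised with the prediction itself
rng = np.random.default_rng(0)
z = rng.normal(size=4)            # lower-layer activations
W = rng.normal(size=(4, 1))       # weights to the output node
b = rng.normal(size=1)            # bias of the output node
y_pred = z @ W + b                # the prediction y_{T+1}
print(lrp_dense(z, W, b, R_upper=y_pred))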

Here is an illustration of how the relevance is backpropagated through the network. The intensity and size of the blue lines represent the amount of relevance that is propagated to the next layer.

The propagation of relevance through an LSTM cell is not straightforward, as multiple components within a single LSTM cell (signals, gates, etc.) interact with each other to feed the provided input through the system. Thus, we cannot simply use the 'linear' backpropagation rule from above.

For the backpropagation of relevance in a LSTM cell we provide two approaches:

  1. the approach suggested by Arras et al. (2019), which discounts the relevance scores by the 'forget factors' of the LSTM cell at each point in time.

  2. the approach suggested by Arjona-Medina et al. (2019) - A8.4.2, which makes a set of assumptions about the LSTM cell architecture and its characteristics to facilitate relevance propagation without discounting relevance scores through the 'forget factors' of the LSTM cell.

Both approaches use the "signal takes it all" rule to distribute relevance scores across multiplicative connections within the LSTM cell (refer to the paper for details).

Here is an illustration of how the relevance is backpropagated through each LSTM cell.
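
For intuition, here is a minimal sketch of the "signal takes it all" rule applied to one cell-state update. It is illustrative only; the helper lrp_gate_product and the proportional split of the additive terms (stabiliser omitted) are our simplifications, not the repository's implementation.

def lrp_gate_product(signal, gate, R_product):
    """'Signal takes it all': in a multiplicative connection product = gate * signal,
    the full relevance of the product is assigned to the signal, none to the gate."""
    return R_product, 0.0 * gate

# One cell-state update: c_t = f_t * c_prev + i_t * g_t
f_t, i_t = 0.9, 0.6        # forget-gate and input-gate activations
c_prev, g_t = 1.2, -0.4    # previous cell state (signal) and candidate (signal)
R_ct = 1.0                 # relevance arriving at the new cell state c_t

# Split R_ct between the two additive terms in proportion to their contributions
# (same idea as the linear rule above, stabiliser omitted for brevity)
contrib_forget = f_t * c_prev
contrib_input = i_t * g_t
R_forget_term = R_ct * contrib_forget / (contrib_forget + contrib_input)
R_input_term = R_ct * contrib_input / (contrib_forget + contrib_input)

# Within each multiplicative term the signal takes it all
R_c_prev, R_f = lrp_gate_product(c_prev, f_t, R_forget_term)
R_g, R_i = lrp_gate_product(g_t, i_t, R_input_term)
print(R_c_prev, R_g, R_f, R_i)  # relevance flows to c_prev and g_t, none to the gates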

Let us see how to compute relevance and conduct relevance propagation in our custom model:

# Create sample input data to test LRP
input_data = np.random.rand(1, timesteps, input_dim) # sample input

# Compute LRP for the entire network
relevance_arras = custom_model.backpropagate_relevance(input_data, type="arras")   # Arras et al. (2019)
relevance_rudder = custom_model.backpropagate_relevance(input_data, type="rudder") # Arjona-Medina et al. (2019)

As the input to the model has dimensions (timesteps, input_dim) = (5, 16), the resulting relevance scores have the same dimensions.
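
Assuming the returned scores can be reshaped to (timesteps, input_dim) (an assumption for this illustration; the exact return format is documented in the method itself), they can be inspected as a timestep-by-feature heatmap:

import numpy as np
import matplotlib.pyplot as plt

# Assumption: the relevance scores computed above can be arranged as (timesteps, input_dim)
relevance = np.asarray(relevance_arras).reshape(timesteps, input_dim)

plt.imshow(relevance, aspect="auto", cmap="coolwarm")
plt.colorbar(label="Relevance")
plt.xlabel("Feature")
plt.ylabel("Timestep")
plt.title("LRP relevance per feature and timestep")
plt.show()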

Via the aggregate argument, one can decide whether to aggregate the relevance scores of an LSTM layer across timesteps before propagating them to the next lower layer:

  • aggregate = True: the average across all timesteps is propagated to the next layer
  • aggregate = False: the relevance score corresponding to the most recent input is propagated to the next layer

Note: This option is only relevant if you use a CustomLSTM layer with the argument return_sequences set to True.

# do not aggregate relevance across time - propagate the relevance of the most recent input
custom_model.backpropagate_relevance(input_data, aggregate=False, type="arras")

# aggregate relevance across time - propagate the average relevance of features across time
custom_model.backpropagate_relevance(input_data, aggregate=True, type="rudder")

We provide an example of the classes and methods in this repository and test our implementation on the US stock market. Please refer to the following notebook for the analysis and results:

  • Replication of Experiments in 'Deep Factor Models'

  • You can launch the notebook directly in your browser using Binder.

  • You can find the literature review for this analysis here.

  • For a summary of MSE and RMSE scores for deep recurrent factor models, please refer to this Excel file.

  • Presentation slides for the university course can be found here.

  • Arjona-Medina, J. A., Gillhofer, M., Widrich, M., Unterthiner, T., Brandstetter, J., & Hochreiter, S. (2019). Rudder: Return decomposition for delayed rewards. Advances in Neural Information Processing Systems, 32.

  • Arras, L., Montavon, G., Müller, K. R., & Samek, W. (2017). Explaining recurrent neural network predictions in sentiment analysis. arXiv preprint arXiv:1706.07206.

  • Arras, L., Arjona-Medina, J., Widrich, M., Montavon, G., Gillhofer, M., Müller, K. R., ... & Samek, W. (2019). Explaining and interpreting LSTMs. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, 211-238.

  • Chen, A. Y., & Zimmermann, T. (2021). Open source cross-sectional asset pricing. Critical Finance Review, Forthcoming.

  • Nakagawa, K., Ito, T., Abe, M., & Izumi, K. (2019). Deep recurrent factor model: interpretable non-linear and time-varying multi-factor model. arXiv preprint arXiv:1901.11493.
