Enhancing Transformer RNNs with Multiple Temporal Perspectives

Abstract

This project introduces a novel approach to Recurrent Neural Network (RNN) architectures by incorporating multiple temporal perspectives. This method enriches the model's understanding of sequential data, significantly improving context interpretation. By integrating this technique into the Receptance Weighted Key Value (RWKV) architecture, we address its limitation of retaining all historical information within a single hidden state, with only a minimal increase in the number of parameters.
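
To make the idea concrete, here is a minimal sketch of the concept (hypothetical code, not the authors' implementation; the class name MultiPerspectiveState and all parameter names are assumptions). It keeps several decaying views of the recurrent state, each with its own learned per-channel decay, and mixes them with learned weights, so only a handful of parameters are added on top of a single-state recurrence:

import torch
import torch.nn as nn

class MultiPerspectiveState(nn.Module):
    def __init__(self, hidden_size: int, num_perspectives: int = 4):
        super().__init__()
        # Per-perspective, per-channel decay rates: the only new parameters,
        # so the overhead over a single-state recurrence stays small.
        self.decays = nn.Parameter(torch.rand(num_perspectives, hidden_size))
        # Learned weights for collapsing the perspectives back into one state.
        self.mix = nn.Parameter(torch.zeros(num_perspectives))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden), e.g. token embeddings or block outputs.
        batch, seq_len, hidden = x.shape
        states = x.new_zeros(self.decays.shape[0], batch, hidden)
        decays = torch.sigmoid(self.decays)[:, None, :]      # keep decays in (0, 1)
        weights = torch.softmax(self.mix, dim=0)[:, None, None]
        outputs = []
        for t in range(seq_len):
            # Each perspective forgets history at its own learned rate,
            # giving the model several simultaneous views of the past.
            states = decays * states + (1.0 - decays) * x[:, t]
            outputs.append((weights * states).sum(dim=0))
        return torch.stack(outputs, dim=1)                    # (batch, seq_len, hidden)

layer = MultiPerspectiveState(hidden_size=64, num_perspectives=4)
print(layer(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 16, 64])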

Introduction

Transformer networks have achieved remarkable success across NLP tasks, but they face challenges with long sequences and computational efficiency. Our work builds on the RWKV architecture, which combines the parallelizable training of Transformers with the efficient inference of RNNs, and introduces multiple temporal perspectives to enhance its language processing capabilities.

Proposed Approach

Our method maintains diverse temporal views of the text, enabling the model to learn complex patterns effectively without requiring full pre-training from scratch. The approach fine-tunes a small number of additional parameters dedicated to the multiple temporal perspectives, resulting in improved performance across multiple benchmarks.
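
As a rough illustration of this fine-tuning setup (again hypothetical and only a sketch, assuming the new layers are attached under names containing a recognizable substring such as "perspective"), one could freeze the pre-trained weights and optimize only the perspective-specific parameters:

from torch import nn, optim

def finetune_perspective_params(model: nn.Module, lr: float = 1e-4,
                                keyword: str = "perspective"):
    # Hypothetical helper: freeze every pre-trained weight and return an
    # optimizer over only the newly added perspective parameters, identified
    # here (an assumption) by a substring of their parameter name.
    trainable = []
    for name, param in model.named_parameters():
        is_new = keyword in name
        param.requires_grad = is_new
        if is_new:
            trainable.append(param)
    return optim.Adam(trainable, lr=lr)

# optimizer = finetune_perspective_params(pretrained_model_with_perspectives)

Because the pre-trained backbone stays frozen, the number of trainable parameters is limited to the perspective-specific ones, which keeps the fine-tuning cost low.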

Citation

If you use our work in your research, please cite our paper:

@misc{dumitru2024enhancing,
      title={Enhancing Transformer RNNs with Multiple Temporal Perspectives},
      author={Razvan-Gabriel Dumitru and Darius Peteleaza and Mihai Surdeanu},
      year={2024},
      eprint={2402.02625},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

About

Research project on the capabilities of RNNs.
