This project implements and compares three neural network architectures for small-scale time series forecasting:
- Long Short-Term Memory (LSTM)
- Recurrent Neural Network (RNN)
- Temporal Convolutional Network (TCN)
The goal is to forecast continuous time-dependent values using synthetic tabular data. The dataset represents a single time series with random variations in amplitude, frequency, and noise to simulate realistic temporal behavior.
Time series forecasting aims to predict future values based on previously observed data.
In this project, we focus on a small-scale setup—short time series with limited features—to demonstrate the behavior and performance of different sequence modeling architectures.
The workflow includes:
- Generating synthetic time series data.
- Preparing sequences for supervised learning.
- Training the LSTM, RNN, and TCN models.
- Evaluating model performance on test data.
- Visualizing actual vs. predicted results.
LSTM networks are a variant of RNNs designed to overcome the vanishing gradient problem.
They maintain long-term dependencies using a cell state and three gating mechanisms:
- Input gate: Decides which values to update.
- Forget gate: Determines which information to discard.
- Output gate: Controls the output from the cell state.
LSTMs are well-suited for sequential data and can model long-range dependencies effectively.
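As a concrete illustration, a minimal PyTorch forecaster built around `nn.LSTM` might look like the sketch below; the class name, hidden size, and layer count are illustrative assumptions, not the project's exact code:

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """One-step-ahead forecaster: maps a window of past values to the next value."""

    def __init__(self, input_size: int = 1, hidden_size: int = 64, num_layers: int = 1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_size)
        out, _ = self.lstm(x)          # out: (batch, seq_len, hidden_size)
        return self.head(out[:, -1])   # predict from the last time step -> (batch, 1)
```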
A basic RNN maintains a hidden state that captures information from previous time steps.
However, standard RNNs often struggle with long-term dependencies because gradients vanish during backpropagation through time.
They remain useful for shorter sequences and as a baseline for sequence modeling tasks.
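The baseline can be sketched the same way as the LSTM above, with `nn.RNN` swapped in for `nn.LSTM` (again an illustrative sketch, not the project's exact code):

```python
import torch
import torch.nn as nn

class RNNForecaster(nn.Module):
    """Baseline: identical to the LSTM sketch, with a vanilla recurrent layer."""

    def __init__(self, input_size: int = 1, hidden_size: int = 64, num_layers: int = 1):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(x)           # hidden states for every time step
        return self.head(out[:, -1])   # predict from the final hidden state
```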
A TCN uses 1D dilated causal convolutions instead of recurrent connections.
This allows it to:
- Capture long-term dependencies via dilation.
- Process sequences in parallel.
- Avoid vanishing gradient issues common in RNNs.
Unlike RNNs, TCNs do not compute time steps one after another, which typically makes training and inference faster.
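A minimal sketch of the idea, assuming a single input feature and a small stack of dilated layers; production TCNs typically add residual connections and weight normalization, which are omitted here for brevity:

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1D convolution that never looks ahead: pads only on the left."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int, dilation: int):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, seq_len); left-pad so output length equals input length
        return self.conv(nn.functional.pad(x, (self.pad, 0)))

class TinyTCN(nn.Module):
    """Stacked causal convolutions with exponentially growing dilation (1, 2, 4)."""

    def __init__(self, channels: int = 32):
        super().__init__()
        layers, in_ch = [], 1
        for dilation in (1, 2, 4):
            layers += [CausalConv1d(in_ch, channels, kernel_size=3, dilation=dilation), nn.ReLU()]
            in_ch = channels
        self.net = nn.Sequential(*layers)
        self.head = nn.Linear(channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, 1); Conv1d expects (batch, channels, seq_len)
        h = self.net(x.transpose(1, 2))
        return self.head(h[:, :, -1])  # predict from the last time step
```

With kernel size 3 and dilations 1, 2, and 4, the receptive field spans 15 time steps, close to the 20-step windows used in this project.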
The dataset is synthetically generated:
- Time values range from 0 to 200 with step size 0.1.
- The signal combines sine waves of random frequencies and amplitudes.
- Random Gaussian noise is added to introduce variability.
Each target value is predicted from the preceding 20 time steps (sequence length = 20).
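A minimal sketch of the generation and windowing steps, assuming three sine components; the amplitude/frequency ranges and noise scale are illustrative, not the project's exact constants:

```python
import numpy as np

rng = np.random.default_rng(0)

# Time axis: 0 to 200 with step 0.1, as described above (2000 points).
t = np.arange(0, 200, 0.1)

# Sum of sine waves with random frequencies and amplitudes, plus Gaussian noise.
signal = sum(
    rng.uniform(0.5, 2.0) * np.sin(rng.uniform(0.1, 1.0) * t)
    for _ in range(3)
)
signal += rng.normal(scale=0.1, size=t.shape)

def make_sequences(series: np.ndarray, seq_len: int = 20):
    """Slide a window of length seq_len over the series; the next value is the target."""
    X = np.stack([series[i : i + seq_len] for i in range(len(series) - seq_len)])
    y = series[seq_len:]
    return X[..., None], y[..., None]   # add a feature dimension for the models

X, y = make_sequences(signal, seq_len=20)  # X: (1980, 20, 1), y: (1980, 1)
```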
- Loss Function: Huber Loss
- Optimizer: Adam
- Learning Rate: 3e-4
- Epochs: 500
- Batching: full-batch; the entire dataset is used in every update step (practical at this small scale)
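Putting the settings above together, a full-batch loop might look like the following sketch, reusing `X`, `y`, and the model classes from the earlier snippets; the 80/20 split index is an assumption, and `nn.HuberLoss` requires PyTorch 1.9 or newer:

```python
import torch
import torch.nn as nn

split = int(0.8 * len(X))              # illustrative 80/20 train/test split
X_train = torch.tensor(X[:split], dtype=torch.float32)
y_train = torch.tensor(y[:split], dtype=torch.float32)

model = LSTMForecaster()               # or RNNForecaster / TinyTCN
criterion = nn.HuberLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

for epoch in range(500):
    optimizer.zero_grad()
    loss = criterion(model(X_train), y_train)   # full batch: no DataLoader needed
    loss.backward()
    optimizer.step()
```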
Each model predicts future values for the test set.
Plots display:
- Actual vs. Predicted values for both training and testing regions.
- Shaded area to indicate the test portion of the data.
Performance is measured by the Huber loss on the held-out test set.
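A sketch of the evaluation and plotting step, reusing the model, data, and `criterion` from the previous snippets; the shaded region marks the assumed 80/20 test split:

```python
import matplotlib.pyplot as plt
import torch

model.eval()
with torch.no_grad():
    preds = model(torch.tensor(X, dtype=torch.float32)).squeeze(-1).numpy()
    test_loss = criterion(
        model(torch.tensor(X[split:], dtype=torch.float32)),
        torch.tensor(y[split:], dtype=torch.float32),
    ).item()

plt.plot(t[20:], y.squeeze(-1), label="Actual")      # targets start after the first window
plt.plot(t[20:], preds, label="Predicted")
plt.axvspan(t[20 + split], t[-1], alpha=0.2, label="Test region")
plt.title(f"Test Huber loss: {test_loss:.4f}")
plt.legend()
plt.show()
```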
- Python ≥ 3.8
- PyTorch
- NumPy
- Pandas
- Matplotlib
- scikit-learn
Install dependencies:
`pip install torch numpy pandas matplotlib scikit-learn`