ArarForecast

Time series forecasting using the ARAR algorithm. Ref: Brockwell, Peter J., and Richard A. Davis. Introduction to Time Series and Forecasting, Section 10.1, "The ARAR Algorithm". Springer (2016).

Installation

using Pkg
 # dev version
Pkg.add(url = "https://github.com/Akai01/ArarForecast.jl.git")

Usage

Load packages

using CSV
using Downloads
using DataFrames
using TimeSeries
using Dates
using ArarForecast

Download and load the data

dta = CSV.File(Downloads.download("https://raw.githubusercontent.com/Akai01/example-time-series-datasets/main/Data/AirPassengers.csv")) |> DataFrame;
train = filter(row -> row["ds"] < Date(1960,1,1), dta);
test = filter(row -> row["ds"] >= Date(1960,1,1), dta);

Create a TimeArray:

train = (date = train[:,"ds"], data = train[:, "y"]);
train = TimeArray(train; timestamp = :date);
test = (date = test[:,"ds"], test = test[:, "y"]);
test = TimeArray(test; timestamp = :date);
length(test)

## 12

There are different ways to create a TimeArray; see the TimeSeries.jl package.

Forecasting

fc = arar(;y = train, h = 12, freq = Month, level = [80, 95]);
typeof(fc)

## ArarForecast.Forecast

Plot the Forecast Object

p = ArarForecast.plot(;object = fc)

## Plot{Plots.GRBackend() n=6}

using Plots
Plots.plot(p, test)

The accuracy

accuracy(fc, test)

## (me = [-11.275515996902792], mae = [13.065720800742184], mape = [2.8577953269438026], mdae = [8.718991617233343], rmse = 18.21764747304158)
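These fields follow the usual definitions of point-forecast error measures. The sketch below (illustrative Python, not the package's Julia source; it assumes an actual-minus-forecast sign convention and a MAPE in percent, which may differ from the package's exact conventions) shows how each one is computed:

```python
import numpy as np

def accuracy_metrics(forecast, actual):
    """Standard point-forecast error measures (assumed to correspond to the
    fields reported by accuracy; sign and percentage conventions assumed)."""
    forecast, actual = np.asarray(forecast), np.asarray(actual)
    e = actual - forecast                            # forecast errors
    return {
        "me":   np.mean(e),                          # mean error (bias)
        "mae":  np.mean(np.abs(e)),                  # mean absolute error
        "mape": 100 * np.mean(np.abs(e / actual)),   # mean absolute percentage error
        "mdae": np.median(np.abs(e)),                # median absolute error
        "rmse": np.sqrt(np.mean(e ** 2)),            # root mean squared error
    }
```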

Benchmark with the R forecast package's auto.arima

Load the data and create a ts object

library(magrittr)
dta = read.csv("https://raw.githubusercontent.com/Akai01/example-time-series-datasets/main/Data/AirPassengers.csv")%>%
  dplyr::mutate(ds = as.Date(ds))
head(dta)

##           ds   y
## 1 1949-01-31 112
## 2 1949-02-28 118
## 3 1949-03-31 132
## 4 1949-04-30 129
## 5 1949-05-31 121
## 6 1949-06-30 135

train <- dta%>%dplyr::filter(ds < as.Date("1960-01-01"))

train_ts <- train%>%dplyr::select(-ds)%>%
  ts(start = c(1949, 1), frequency = 12)

test <- dta%>%dplyr::filter(ds >= as.Date("1960-01-01"))

test_ts <- test%>%dplyr::select(-ds)%>%
  ts(start = c(1960, 1), frequency = 12)

Train and forecast 12 months ahead:

fc <- forecast::auto.arima(train_ts)%>%
  forecast::forecast(h = 12)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

Plot the forecast

forecast::autoplot(fc) + forecast::autolayer(test_ts)

forecast::accuracy(fc$mean, test_ts)

##                 ME    RMSE      MAE       MPE     MAPE       ACF1 Theil's U
## Test set -16.98639 23.9317 18.52768 -3.933491 4.182395 0.04802038 0.5336134

Accuracy benchmark (MAPE, test set): R forecast::auto.arima 4.18 vs. Julia ArarForecast 2.85.

How Does the ARAR Algorithm Work?

Memory Shortening

The ARAR algorithm applies a memory-shortening transformation to a given time series Yt, t = 1, 2, ..., n if its underlying process is "long-memory", and then fits an autoregressive model to the shortened series.

The algorithm follows five steps to classify Yt and take one of the following three actions:

  • L: declare Yt long memory and transform it to Ỹt = Yt − ϕ̂(τ̂)Yt−τ̂
  • M: declare Yt moderately long memory and transform it to Ỹt = Yt − ϕ̂1Yt−1 − ϕ̂2Yt−2
  • S: declare Yt short memory.

If Yt is declared L or M, the transformed series is classified again, and the process repeats until the series is classified as short memory. At most three transformations are applied; in practice it is rare for a series to require more than two.

    1. For each τ = 1, 2, ..., 15, find the value ϕ̂(τ) of ϕ that minimizes ERR(ϕ, τ) = Σ_{t=τ+1..n} (Yt − ϕYt−τ)² / Σ_{t=τ+1..n} Yt², then define Err(τ) = ERR(ϕ̂(τ), τ) and choose the lag τ̂ to be the value of τ that minimizes Err(τ).
    2. If Err(τ̂) ≤ 8/n, declare Yt a long-memory series (L).
    3. If ϕ̂(τ̂) ≥ 0.93 and τ̂ > 2, declare Yt a long-memory series (L).
    4. If ϕ̂(τ̂) ≥ 0.93 and τ̂ = 1 or 2, declare Yt a moderately long-memory series (M).
    5. If ϕ̂(τ̂) < 0.93, declare Yt a short-memory series (S).
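One pass of this classification can be sketched as follows. This is a minimal, illustrative Python version (the package itself is written in Julia; the function name is invented, and details such as the least-squares AR(2) fit for case M are simplified):

```python
import numpy as np

def shorten_memory_once(y):
    """One pass of ARAR-style memory shortening (illustrative sketch).
    Returns a label in {"L", "M", "S"} and the (possibly) transformed series."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    best = None  # (err, tau, phi)
    # Step 1: for each lag tau, the phi minimizing ERR(phi, tau) is the
    # least-squares coefficient of y[t] on y[t - tau].
    for tau in range(1, 16):
        phi = np.dot(y[tau:], y[:-tau]) / np.dot(y[:-tau], y[:-tau])
        err = np.sum((y[tau:] - phi * y[:-tau]) ** 2) / np.sum(y[tau:] ** 2)
        if best is None or err < best[0]:
            best = (err, tau, phi)
    err, tau, phi = best
    # Steps 2-3: long memory -> difference at the chosen lag
    if err <= 8.0 / n or (phi >= 0.93 and tau > 2):
        return "L", y[tau:] - phi * y[:-tau]
    # Step 4: moderately long memory -> remove a fitted AR(2) component
    if phi >= 0.93:
        X = np.column_stack([y[1:-1], y[:-2]])
        phi1, phi2 = np.linalg.lstsq(X, y[2:], rcond=None)[0]
        return "M", y[2:] - phi1 * y[1:-1] - phi2 * y[:-2]
    # Step 5: short memory -> leave the series unchanged
    return "S", y
```

In the full algorithm this pass is repeated (at most three times) until the label is "S".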

Subset Autoregressive Model:

In the following we describe how the ARAR algorithm fits an autoregressive process to the mean-corrected series Xt = St − S̄, t = k + 1, ..., n, where St, t = k + 1, ..., n, is the memory-shortened version of Yt derived in the five steps above and S̄ is the sample mean of Sk+1, ..., Sn.

The fitted model has the following form:

Xt = ϕ1Xt−1 + ϕl1Xt−l1 + ϕl2Xt−l2 + ϕl3Xt−l3 + Zt,

where Zt ∼ WN(0, σ²). The coefficients ϕj and the white-noise variance σ² can be determined from the Yule-Walker equations for given lags l1, l2, and l3:

ρ̂(1)  = ϕ1 + ϕl1ρ̂(l1 − 1) + ϕl2ρ̂(l2 − 1) + ϕl3ρ̂(l3 − 1)
ρ̂(l1) = ϕ1ρ̂(l1 − 1) + ϕl1 + ϕl2ρ̂(l2 − l1) + ϕl3ρ̂(l3 − l1)
ρ̂(l2) = ϕ1ρ̂(l2 − 1) + ϕl1ρ̂(l2 − l1) + ϕl2 + ϕl3ρ̂(l3 − l2)
ρ̂(l3) = ϕ1ρ̂(l3 − 1) + ϕl1ρ̂(l3 − l1) + ϕl2ρ̂(l3 − l2) + ϕl3

and σ² = γ̂(0)[1 − ϕ1ρ̂(1) − ϕl1ρ̂(l1) − ϕl2ρ̂(l2) − ϕl3ρ̂(l3)], where γ̂(j) and ρ̂(j), j = 0, 1, 2, ..., are the sample autocovariances and autocorrelations of the series Xt.

The algorithm computes the coefficients ϕj for every set of lags with 1 < l1 < l2 < l3 ≤ m, where m is chosen to be 13 or 26, and selects the model for which the Yule-Walker estimate of σ² is minimal.
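This selection step can be sketched as follows (illustrative Python, not the package's Julia source; `fit_subset_ar` is an invented name):

```python
import numpy as np
from itertools import combinations

def fit_subset_ar(s, m=13):
    """Fit a subset AR model at lags (1, l1, l2, l3) by Yule-Walker for every
    lag set with 1 < l1 < l2 < l3 <= m; keep the smallest noise variance.
    Returns (sigma2, lags, coefficients). Illustrative sketch only."""
    x = np.asarray(s, dtype=float) - np.mean(s)  # mean-correct: X_t = S_t - S_bar
    n = len(x)
    # biased sample autocovariances gamma_hat(0..m) and autocorrelations rho_hat
    gamma = np.array([np.dot(x[: n - j], x[j:]) / n for j in range(m + 1)])
    rho = gamma / gamma[0]
    best = None
    for l1, l2, l3 in combinations(range(2, m + 1), 3):
        lags = (1, l1, l2, l3)
        # Yule-Walker system: R[i, j] = rho_hat(|lags[i] - lags[j]|)
        R = np.array([[rho[abs(a - b)] for b in lags] for a in lags])
        rhs = np.array([rho[l] for l in lags])
        phi = np.linalg.solve(R, rhs)
        # sigma2 = gamma_hat(0) * (1 - sum_j phi_j * rho_hat(l_j))
        sigma2 = gamma[0] * (1.0 - np.dot(phi, rhs))
        if best is None or sigma2 < best[0]:
            best = (sigma2, lags, phi)
    return best
```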

Forecasting

The memory-shortening filter obtained in the first step has coefficients Ψ0, Ψ1, ..., Ψk (k ≥ 0), where Ψ0 = 1. The shortened series can therefore be expressed as

St = Ψ(B)Yt = Yt + Ψ1Yt−1 + ... + ΨkYt−k,  (1)

where Ψ(B) = 1 + Ψ1B + ... + ΨkB^k is a polynomial in the backshift operator B.

The subset autoregression found in the second step has coefficients ϕ1, ϕl1, ϕl2 and ϕl3, so the subset AR model for Xt = St − S̄ is

ϕ(B)Xt = Zt,  (2)

where Zt is a white-noise series with zero mean and constant variance and ϕ(B) = 1 − ϕ1B − ϕl1B^l1 − ϕl2B^l2 − ϕl3B^l3. From equations (1) and (2) one obtains

ξ(B)Yt = ϕ(1)S̄ + Zt,  (3)

where ξ(B) = Ψ(B)ϕ(B).

Assuming the fitted model in equation (3) is appropriate and that Zt is uncorrelated with Yj for j < t, one can determine the minimum mean squared error linear predictors PnYn+h of Yn+h in terms of {1, Y1, ..., Yn}, for n > k + l3, from the recursions

PnYn+h = −Σ_{j=1..k+l3} ξjPnYn+h−j + ϕ(1)S̄,  h ≥ 1,

with the initial conditions PnYn+h = Yn+h for h ≤ 0.
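Under these assumptions, the recursion can be sketched in Python (illustrative only; `arar_predict` and its arguments are invented names, not the package API):

```python
import numpy as np

def arar_predict(y, psi, phi, lags, s_bar, h):
    """Forecast h steps ahead by the ARAR recursion (illustrative sketch).
    psi  = (1, psi_1, ..., psi_k): memory-shortening filter coefficients.
    phi, lags: subset-AR coefficients and their lags.
    s_bar: sample mean of the shortened series S_t.
    y must contain at least k + l3 observations."""
    # build phi(B) = 1 - sum_j phi_{l_j} B^{l_j} as a coefficient vector
    phi_poly = np.zeros(max(lags) + 1)
    phi_poly[0] = 1.0
    for c, l in zip(phi, lags):
        phi_poly[l] -= c
    # xi(B) = psi(B) * phi(B): polynomial product via convolution
    xi = np.convolve(psi, phi_poly)
    const = s_bar * np.sum(phi_poly)  # phi(1) * S_bar
    preds = list(y)                   # P_n Y_{n+h} = Y_{n+h} for h <= 0
    for _ in range(h):
        # P_n Y_{n+h} = -sum_{j=1..k+l3} xi_j P_n Y_{n+h-j} + phi(1) S_bar
        nxt = const - sum(xi[j] * preds[-j] for j in range(1, len(xi)))
        preds.append(nxt)
    return np.array(preds[len(y):])
```

For example, with no memory shortening (psi = [1]) and a single AR lag ϕ1 = 0.5, the forecasts decay geometrically toward ϕ(1)S̄/(1 − ϕ1), as the recursion implies.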

Ref: Brockwell, Peter J., and Richard A. Davis. Introduction to Time Series and Forecasting. Springer (2016).