# Stationarity, Differencing and Autocorrelation 

# Time series patterns

Many time series include <font color="#00dd00">**trend**</font>, <font color="#00dd00">**cycles**</font> and <font color="#00dd00">**seasonality**</font>. When choosing a forecasting method, we will first need to identify the time series patterns in the data, and then choose a method that is able to capture the patterns properly.

* <font color="#00dd00">**Trend**</font>   
<font color="#dddd00">A trend exists when there is a long-term increase or decrease in the data.</font> 
  * A trend does not have to be linear. 
  * Sometimes we will refer to a trend as "changing direction", when it might go from an increasing trend to a decreasing trend.    

* <font color="#00dd00">**Seasonal**</font>  
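<font color="#dddd00">A seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. Seasonality is always of a fixed and known frequency.</font>  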

* <font color="#00dd00">**Cyclic**</font>   
<font color="#dddd00">A cycle occurs when the data exhibit rises and falls that are not of a fixed frequency.</font> 
  * These fluctuations are usually due to economic conditions, and are often related to the "business cycle".   
  The duration of these fluctuations is usually at least 2 years.  

Remarks :  
* <font color="#dddd00">If the fluctuations are not of a fixed frequency then they are cyclic; if the frequency is unchanging and associated with some aspect of the calendar, then the pattern is seasonal.</font>   
* In general, the average length of cycles is longer than the length of a seasonal pattern, and the magnitudes of cycles tend to be more variable than the magnitudes of seasonal patterns.    

Reference : [Time series patterns](https://otexts.com/fpp2/tspatterns.html#tspatterns)


  

# <font color="#00dd00">**Stationarity**</font>  

A stationary time series is one whose properties do not depend on the time at which the series is observed.   
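Note : In practice, stationarity means that the mean and standard deviation of the series stay close to fixed constants instead of changing noticeably over time.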

* Time series with trends, or with seasonality, are not stationary: the trend and seasonality will affect the value of the time series at different times.  

* <font color="#dddd00">In general, a stationary time series will have **no predictable patterns** in the long-term.</font> Time plots will show the series to be roughly horizontal (although some cyclic behaviour is possible), with constant variance.

* A time series with cyclic behaviour (but with no trend or seasonality) is stationary. This is because the cycles are not of a fixed length, so before we observe the series we cannot be sure where the peaks and troughs of the cycles will be.  
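
As a rough illustration of the points above, the sketch below compares the mean and standard deviation of the first and second halves of two simulated series. The series, the seed, the slope `0.05` and the helper name `half_summary` are all assumptions made for this example. This is an informal check, not a formal stationarity test.

```python
import random
from statistics import mean, stdev

def half_summary(series):
    """Return (mean, standard deviation) for each half of the series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (mean(first), stdev(first)), (mean(second), stdev(second))

random.seed(0)  # fixed seed so the sketch is reproducible
n = 1000
noise = [random.gauss(0, 1) for _ in range(n)]         # stationary: white noise
trended = [0.05 * t + e for t, e in enumerate(noise)]  # non-stationary: linear trend plus noise

for name, series in [("white noise", noise), ("trended", trended)]:
    (m1, s1), (m2, s2) = half_summary(series)
    print(f"{name:12s} mean: {m1:7.2f} -> {m2:7.2f}   std: {s1:5.2f} -> {s2:5.2f}")
```

The white-noise halves should have nearly the same mean, while the mean of the trended series shifts by roughly 25 between halves (slope `0.05` times the 500-step gap between the half centres).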


# <font color="#00dd00">**Differencing**</font>   

The differenced series is the change between consecutive observations in the original time series, and can be written as  
  
$$ y_t^{'} = y_t-y_{t-1} \text{ . } $$  
The differenced series will have only $T-1$ values, where $T$ is the length of the original time series.


* Differencing of a time series in discrete time is the transformation of the series to a new time series where the values are the differences between consecutive observations. This procedure may be applied consecutively more than once, giving rise to the "first differences", "second differences", etc.  
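
A minimal sketch of first and second differences, using a small made-up series with a curved trend (the values and the helper name `difference` are assumptions for illustration):

```python
def difference(series):
    """Return the first differences y_t - y_{t-1} of a sequence."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]

y = [3, 5, 8, 12, 17, 23]    # hypothetical data with an accelerating trend
first = difference(y)        # [2, 3, 4, 5, 6]
second = difference(first)   # [1, 1, 1, 1]

print(len(y), len(first), len(second))  # 6 5 4
```

Each round of differencing shortens the series by one value. In this example the first differences still trend upwards, while the second differences are constant.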

<p align="center">
<img width="600" src="https://raw.githubusercontent.com/YenLinWu/Time_Series_Model/main/Materials/imgs/Differencing.png">
</p>

* In the above figure, the Google stock price was non-stationary in panel (a), but the daily changes were stationary in panel (b).   

* Differencing can help stabilise the mean of a time series by removing changes in the level of a time series, and therefore eliminating (or reducing) trend and seasonality.  

* The <font color="#00dd00">**ACF**</font> plot is useful for identifying non-stationary time series. 
    * <font color="#dddd00">For a stationary time series, the ACF will drop to zero relatively quickly</font>, while the ACF of non-stationary data decreases slowly.   
    * <font color="#dddd00">**For non-stationary data, the value of $r_1$ is often large and positive**.</font>

<p align="center">
<img width="600" src="https://raw.githubusercontent.com/YenLinWu/Time_Series_Model/main/Materials/imgs/ACF.png">
</p>
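
The contrast described above can be reproduced on simulated data. The sketch below uses a random walk as a stand-in for a non-stationary series such as a stock price; the random walk, the seed and the helper name `autocorr` are assumptions for illustration, not the data behind the figures.

```python
import random

def autocorr(series, k):
    """Sample autocorrelation of `series` at lag k."""
    T = len(series)
    y_bar = sum(series) / T
    num = sum((series[t] - y_bar) * (series[t - k] - y_bar) for t in range(k, T))
    den = sum((v - y_bar) ** 2 for v in series)
    return num / den

random.seed(42)
walk = [0.0]
for _ in range(499):
    walk.append(walk[-1] + random.gauss(0, 1))                  # random walk: non-stationary
changes = [walk[t] - walk[t - 1] for t in range(1, len(walk))]  # first differences

for k in range(1, 6):
    print(f"lag {k}: walk r_k = {autocorr(walk, k):5.2f}   changes r_k = {autocorr(changes, k):5.2f}")
```

The autocorrelations of the walk should stay close to 1 over the first few lags, while those of the differenced series should sit near zero.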

Reference : [Stationarity and differencing](https://otexts.com/fpp2/stationarity.html#stationarity)

# <font color="#00dd00">**Autocorrelation**</font>  
Autocorrelation measures the linear relationship between lagged values of a time series. The autocorrelation coefficient at lag $k$ is  
$$ r_k = \frac{\displaystyle \sum_{t=k+1}^T(y_t-\bar{y})(y_{t-k}-\bar{y})}{\displaystyle \sum_{t=1}^T(y_t-\bar{y})^2} \text{ , }$$  
where $T$ is the length of the time series. For example, $r_1$ measures the relationship between $y_t$ and $y_{t-1}$, $r_2$ measures the relationship between $y_t$ and $y_{t-2}$, and so on.
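
The formula translates almost directly into code. This is a plain-Python sketch (the function name `autocorr` and the toy series are assumptions); the sums run over 0-based indices, so $t = k+1, \dots, T$ becomes `range(k, T)`.

```python
def autocorr(series, k):
    """Sample autocorrelation r_k, following the formula above."""
    T = len(series)
    y_bar = sum(series) / T
    # Numerator: products of deviations that are k steps apart.
    num = sum((series[t] - y_bar) * (series[t - k] - y_bar) for t in range(k, T))
    # Denominator: total sum of squared deviations.
    den = sum((v - y_bar) ** 2 for v in series)
    return num / den

y = [1, 2, 3, 4, 5]    # toy series, mean 3, deviations -2, -1, 0, 1, 2
print(autocorr(y, 1))  # numerator 4, denominator 10
print(autocorr(y, 2))  # numerator -1, denominator 10
```

For this toy series the hand calculation gives $r_1 = 4/10 = 0.4$ and $r_2 = -1/10 = -0.1$.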

* The autocorrelation coefficients are plotted to show the autocorrelation function or <font color="#00dd00">**ACF**</font> plot.  

* The ACF plot shows the autocorrelations, which measure the relationship between $y_t$ and $y_{t-k}$ for different values of $k$.
 
* Trend and seasonality in ACF plots :  
  * The ACF of a **trended** time series tends to have <font color="#dddd00">**positive values that slowly decrease as the lags increase**</font>.  
  * When data are **seasonal**, the autocorrelations will be <font color="#dddd00">**larger for the seasonal lags (at multiples of the seasonal frequency) than for other lags**</font>.
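
Both patterns show up even on very simple artificial data. In the sketch below the two series are assumptions chosen for clarity: a pure linear trend, and a "monthly" series with a spike every 12th observation.

```python
def autocorr(series, k):
    """Sample autocorrelation of `series` at lag k."""
    T = len(series)
    y_bar = sum(series) / T
    num = sum((series[t] - y_bar) * (series[t - k] - y_bar) for t in range(k, T))
    den = sum((v - y_bar) ** 2 for v in series)
    return num / den

trended = [float(t) for t in range(100)]   # assumed example: pure linear trend
seasonal = ([0.0] * 11 + [10.0]) * 10      # assumed example: spike every 12th observation

trend_acf = [autocorr(trended, k) for k in range(1, 11)]      # lags 1..10
seasonal_acf = [autocorr(seasonal, k) for k in range(1, 25)]  # lags 1..24

print([round(r, 2) for r in trend_acf])
print([round(r, 2) for r in seasonal_acf])
```

The first list should start near 1 and decline slowly; the second should be small everywhere except at lags 12 and 24, the multiples of the assumed seasonal period.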

Reference : [ACF and PACF plots](https://otexts.com/fpp2/non-seasonal-arima.html#acf-and-pacf-plots)


# <font color="#00dd00">**Partial Autocorrelation**</font>   

Partial autocorrelations measure the relationship between $y_t$ and $y_{t-k}$ after removing the effects of lags $1, 2, 3, \dots, k-1$. 

* The first partial autocorrelation is identical to the first autocorrelation, because there is nothing between them to remove.    

* Each partial autocorrelation can be estimated as the last coefficient in an autoregressive model. Specifically, $\alpha_k$, the $k$th partial autocorrelation coefficient, is equal to the estimate of $\phi_k$ in an AR($k$) model. 
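
One way to obtain the last AR($k$) coefficient for each $k$ is the Durbin-Levinson recursion, which solves the Yule-Walker equations from the autocorrelations alone. This is a sketch of that particular approach, not necessarily the one used by any given library, and a least-squares AR fit would give slightly different numbers on real data. The function name `pacf_from_acf` and the AR(1)-style input with coefficient `0.6` are assumptions for illustration.

```python
def pacf_from_acf(r, max_lag):
    """Partial autocorrelations for lags 1..max_lag.

    `r[k]` is the autocorrelation at lag k, with r[0] == 1.
    """
    phi_prev = []  # coefficients of the fitted AR(k-1) model
    pacf = []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den  # last coefficient of the AR(k) model
        # Update the first k-1 coefficients, then append the new last one.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# Theoretical ACF of an AR(1) process with coefficient 0.6: r_k = 0.6 ** k.
r = [0.6 ** k for k in range(6)]
print(pacf_from_acf(r, 5))
```

For this input the first partial autocorrelation equals $r_1 = 0.6$ and the remaining ones are zero up to rounding error, which matches the statement that the first partial autocorrelation is identical to the first autocorrelation.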

Reference : [Difference between autocorrelation and partial autocorrelation](https://stats.stackexchange.com/questions/483383/difference-between-autocorrelation-and-partial-autocorrelation)


# Notes 

* If the data are from an ARIMA( $p$, $d$, 0 ) or ARIMA( 0, $d$, $q$ ) model, then the ACF and PACF plots can be helpful in determining the value of $p$ or $q$.  

* The data may follow an <font color="#00dd00">**ARIMA( $p$, $d$, 0 )**</font> model if the ACF and PACF plots of the differenced data show the following patterns :    
  * <font color="#dddd00">The <font color="#00dd00">**ACF**</font> is exponentially decaying or sinusoidal;</font> 
  * <font color="#dddd00">There is a significant spike at lag $p$ in the <font color="#00dd00">**PACF**</font>, but none beyond lag $p$.</font>    

* The data may follow an <font color="#00dd00">**ARIMA( 0, $d$, $q$ )**</font> model if the ACF and PACF plots of the differenced data show the following patterns :    
  * <font color="#dddd00">The <font color="#00dd00">**PACF**</font> is exponentially decaying or sinusoidal;</font>    
  * <font color="#dddd00">There is a significant spike at lag $q$ in the <font color="#00dd00">**ACF**</font>, but none beyond lag $q$.</font>
  
* If $p$ and $q$ are both positive, then the ACF and PACF plots do not help in finding suitable values of $p$ and $q$.
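
To see the ARIMA( $p$, 0, 0 ) pattern on data, the sketch below simulates an AR(2) series and prints its sample ACF and PACF. Everything here is an assumption for illustration: the coefficients `0.5` and `0.3`, the seed, the series length, the helper names, the Yule-Walker (Durbin-Levinson) way of computing the PACF, and the approximate white-noise bound $\pm 1.96/\sqrt{T}$ used to flag spikes.

```python
import math
import random

def autocorr(series, k):
    """Sample autocorrelation of `series` at lag k."""
    T = len(series)
    y_bar = sum(series) / T
    num = sum((series[t] - y_bar) * (series[t - k] - y_bar) for t in range(k, T))
    den = sum((v - y_bar) ** 2 for v in series)
    return num / den

def pacf_from_acf(r, max_lag):
    """Partial autocorrelations for lags 1..max_lag via Durbin-Levinson."""
    phi_prev, pacf = [], []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

random.seed(1)
phi1, phi2 = 0.5, 0.3  # assumed AR(2) coefficients (a stationary choice)
y = [0.0, 0.0]
for _ in range(2200):
    y.append(phi1 * y[-1] + phi2 * y[-2] + random.gauss(0, 1))
y = y[-2000:]          # keep the last 2000 values, dropping the start-up

max_lag = 6
acf = [autocorr(y, k) for k in range(max_lag + 1)]  # acf[0] == 1
pacf = pacf_from_acf(acf, max_lag)                  # pacf[k - 1] is lag k
bound = 1.96 / math.sqrt(len(y))                    # approximate 95% white-noise bound

for k in range(1, max_lag + 1):
    flag = "spike" if abs(pacf[k - 1]) > bound else ""
    print(f"lag {k}: ACF = {acf[k]:5.2f}   PACF = {pacf[k - 1]:5.2f}  {flag}")
```

With $p = 2$ one would expect a slowly decaying ACF and clear PACF spikes at lags 1 and 2 only, although any single lag of a simulated series can cross the bound by chance.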
 