# Time Series

Based on:

- https://onlinecourses.science.psu.edu/stat510/book/export/html/661

Formally: 

> A **univariate time series** is a sequence of measurements of the same variable collected over time. Most often, the measurements are made at regular time intervals.

A **univariate time series**, or simply a **time series**, is thus a sequence of measurements taken at successive, equally spaced points in time:

$$ \{ S_{n} \} = \{ S \; (n \; \Delta t) \}$$

$$ S_{0} = S \; (0 \cdot \Delta t) $$

$$ S_{1} = S \; (1 \cdot \Delta t) $$

$$ S_{2} = S \; (2 \cdot \Delta t) $$

$$ \dots $$


![Time series example 1](img/time_series_ex1.png)

The time interval $\Delta t$ between successive measurements is called the **sampling interval**, and the total time $T$ over which measurements are taken is called the **observation time**.
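As a minimal sketch of these definitions, the snippet below samples a continuous signal at a fixed interval. The sine wave, the value of `dt` and the helper name `sample_signal` are illustrative assumptions, and the observation time is taken here as the span between the first and the last sample ($N \Delta t$ is an equally common convention).

```python
import math

def sample_signal(signal, dt, n_samples):
    """Return [S_0, S_1, ...] with S_n = signal(n * dt)."""
    return [signal(n * dt) for n in range(n_samples)]

# Hypothetical continuous signal: a sine wave with period 1.
dt = 0.25                                    # sampling interval (assumed value)
series = sample_signal(lambda t: math.sin(2 * math.pi * t), dt, 9)

# Time spanned between the first and the last sample.
observation_time = (len(series) - 1) * dt
```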

### Note:

> A **time series** is a list of observations where the ordering matters.

Ordering matters because successive observations depend on one another, so changing the order can change the meaning of the data.

## First remember ...

**Linear methods** interpret all regular structure in a data set through linear correlations. In brief, this means that the intrinsic dynamics of the system are governed by the following paradigm:

> **Linear paradigm:** "*Small causes lead to small effects*"

**Linear equations** can only lead to exponentially decaying (or growing) or (damped) periodically oscillating solutions.

![Linear systems behaviour](img/linear_systems.jpg)

So, if we observe irregular behaviour and assume that the system behaves linearly, the irregularity has to be attributed to some random external input to the system:

$$ S_{n} = x_{n} + \eta_{n} $$


> **Chaos paradigm**: "*Nonlinear chaotic systems can produce irregular data with purely deterministic equations of motion in an autonomous way, i.e. without time dependent inputs*"


**Nonlinear chaotic systems** have **dependence on initial conditions**: *Tiny changes in the input lead to LARGE variations in the output*.

![Nonlinear chaotic systems behaviour](img/dependence_on_initial_conditions.png)

It is important to note that the system is still **deterministic**, in the sense that its variables evolve according to fixed physical rules. It is therefore not random, but it is **highly unpredictable** and subject to **vast variations**.
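A small numerical illustration of this sensitivity, using the logistic map $x \to r \, x \, (1 - x)$ with $r = 4$. The choice of map, the initial condition and the size of the perturbation are assumptions made only for this sketch; they are not taken from the notes above.

```python
def logistic_orbit(x0, r=4.0, n_steps=50):
    """Iterate x -> r * x * (1 - x) starting from x0."""
    orbit = [x0]
    for _ in range(n_steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit

a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-10)      # tiny change in the initial condition

# Distance between the two orbits at each step.
gap = [abs(x - y) for x, y in zip(a, b)]
```

The rule is fully deterministic (the same `x0` always gives the same orbit), yet `gap` grows from about `1e-10` to order one within a few dozen iterations.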

## Time series analysis goals

> We look for signatures of **deterministic nature** of the system

### Important characteristics to consider first

Some important questions to consider when first looking at a time series are:

- Is there a **trend**?
> On average, do the measurements tend to increase (or decrease) over time?

- Is there **seasonality**?
> Is there a **regularly repeating pattern** of highs and lows related to calendar time, such as seasons, quarters, months, days of the week, and so on?

- Are there **outliers**?
> In regression, outliers are far away from your model. With time series data, your outliers are far away from your other data.

- Is there a **long-run cycle** or period unrelated to seasonality factors?

- Is there **constant variance** over time, or is the **variance non-constant**?

- Are there any **abrupt changes** to either the level of the series or the variance?

# Stationarity

We need to know that the numbers we measure correspond to properties of the studied object, up to some measurement error. 

> **Reproducibility** is closely connected to two different notions of **stationarity**.

## First concept of Stationarity (weakest form)

> Stationarity requires that all parameters that are relevant for a system's dynamics have to be fixed and constant during the measurement period (and these parameters should be the same when the experiment is reproduced).

- This is a requirement to be fulfilled not only by the experimental set-up but also by the process taking place in this fixed environment.

- If the process under observation is a probabilistic one, it will be characterised by probability distributions for the variables involved. For a stationary process, these probabilities must not depend on time. The same holds if the process is specified by a set of transition probabilities between different states.

- If there are deterministic rules governing the dynamics, these rules must not change during the time covered by a time series.

### Unfortunately ...

... In most cases we do not have direct access to the system which produces a signal and we cannot establish evidence that its parameters are indeed constant.

## Second concept of stationarity (which is based on the available data itself) 

> A signal is called stationary if all joint probabilities of finding the system at some time in one state and at some later time in another state are independent of time within the observation period, i.e. when calculated from the data.

From [Stationary process - Wikipedia](https://en.wikipedia.org/wiki/Stationary_process)

> Stationary process (a.k.a. a **strict/strictly stationary process** or **strong/strongly stationary process**) is a stochastic process whose unconditional joint probability distribution does not change when shifted in time. Consequently, parameters such as mean and variance also do not change over time.

A time series, as any other measurement, has to provide enough information to determine the quantity of interest unambiguously.

This includes the **constancy of relevant parameters**, but it also requires that **phenomena belonging to the dynamics are contained in the time series sufficiently frequently**, so that the probabilities or other rules can be inferred properly.

### Remarks

> We deal with the problem of **how non-stationarity can be detected for a given data set**, but obviously stationarity is a property which can never be positively established.

There are many processes which are formally stationary when the limit of infinitely long
observation times can be taken, but which behave effectively like non-stationary
processes when studied over finite times, for example: intermittency.

- If the observed signal is quite regular almost all of the time, but contains one very irregular burst every so often, then the time series has to be considered to be non-stationary for our purposes, even in the case where all parameters remained exactly constant but the signal is intermittent.

- Only if the rare events (e.g. the irregular bursts mentioned before) also appear several times in the time series can we speak of an effective independence of the observed joint probabilities and thus of stationarity.

### Note:

Be aware that almost all the methods and results on time series analysis assume the validity of both conditions:

- **The parameters of the system remain constant**.
- **The phenomenon is sufficiently sampled:** the time series should cover a stretch of time which is much longer than the longest characteristic time scale that is relevant for the evolution of the system ... **Remember:** we are looking for **reproducibility**, and it requires that the probabilities or other rules must be inferred properly.

> The concentration of sugar in the blood of a human is driven by the consumption of food and thus roughly follows a 24 hour cycle. If this quantity is recorded over 24 hours or less, the process must be considered non-stationary no matter how many data points have been taken during that time.

### Why do we care about stationarity?

> **Fact:** we try to approach a dynamical phenomenon through a single finite time series, and hence stationarity is a **requirement** of almost all statistical tools for time series data, including the linear ones.

Time series analysis methods can be applied to any sequence of data, including non-stationary data. However:

> When data is not stationary: "**The results cannot be assumed to characterise the underlying system**".

### Methods to deal with non-stationary data

- One way out is to segment the time series into approximately stationary segments.

# Testing stationarity - Practical methods

A series $x_{n}$ is said to be stationary if it fulfills the following condition:

> The **dynamical properties** of the system underlying a signal **must not change** during the observation period.

Then, it must satisfy the following properties:

- The mean is the same for all $n$.
- The variance of $x_{n}$ is the same for all $n$.
- The covariance (and also correlation) between $x_{n}$ and $x_{n-\tau}$ is the same for all $n$.

Where $\tau$ is the **time lag**.
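In symbols, the three properties can be written as below. The names $\mu$, $\sigma^{2}$ and $\gamma_{\tau}$ are introduced here only for illustration; the point is that none of them carries an index $n$, while the covariance may still depend on the lag $\tau$:

$$ \langle x_{n} \rangle = \mu, \qquad \mathrm{Var}(x_{n}) = \sigma^{2}, \qquad \mathrm{Cov}(x_{n}, x_{n-\tau}) = \gamma_{\tau} \qquad \text{for all } n $$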


## Requirements:

- As a first requirement, the time series should cover a stretch of time which is much longer than the longest characteristic time scale that is relevant for the evolution of the system.

Quantitative information can be gained from the power spectrum. The longest relevant time scale can be estimated as the inverse of the lowest frequency which still contains a significant fraction of the total power of the signal.

A time series can be considered stationary only on time scales much larger than this longest relevant time scale.
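The estimate described above can be sketched as follows. This is a rough illustration under stated assumptions, not a definitive procedure: the power spectrum is computed with a naive $O(N^{2})$ discrete Fourier transform, "significant fraction" is arbitrarily taken to mean at least 5% of the total power in a single frequency bin, and the helper names and test signal are hypothetical.

```python
import cmath
import math

def power_spectrum(series, dt):
    """Naive DFT power at the positive frequencies k / (N * dt)."""
    n = len(series)
    mean = sum(series) / n
    centered = [s - mean for s in series]        # drop the zero-frequency term
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * m / n)
                    for m, c in enumerate(centered))
        freqs.append(k / (n * dt))
        power.append(abs(coeff) ** 2)
    return freqs, power

def longest_time_scale(series, dt, fraction=0.05):
    """Inverse of the lowest frequency holding >= `fraction` of the power."""
    freqs, power = power_spectrum(series, dt)
    total = sum(power)
    if total == 0:
        return None                              # constant series: no oscillation
    for f, p in zip(freqs, power):
        if p >= fraction * total:
            return 1.0 / f
    return None                                  # no single bin reaches the threshold

# Hypothetical signal: a slow oscillation (period 20) plus a faster one (period 5).
dt = 1.0
series = [math.sin(2 * math.pi * n / 20) + 0.5 * math.sin(2 * math.pi * n / 5)
          for n in range(200)]
scale = longest_time_scale(series, dt)
```

For real data a library FFT would normally replace the naive transform, and the threshold should be chosen with the noise level in mind.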

## Method 1

> Check if dynamical properties of the system do change over time

It can be checked simply by measuring such properties for several segments of the data set.

> **Note:** Characteristics with known or negligible statistical fluctuations are preferable for this purpose.

The **statistically most stable quantities** are:

- The **mean**
- The **variance**

To detect less obvious non-stationarity, more **subtle quantities** may be needed, such as:

- Spectral components
- Correlations
- Nonlinear statistics

**Summary:**

Try to compute:

- Moving average (rolling average or running average)
- Moving variance (rolling variance or running variance)
- Transition probabilities
- Correlations
- ... Among others

These quantities must not differ beyond their statistical fluctuations.
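A minimal sketch of this check for the mean, assuming roughly uncorrelated samples so that the standard error of a segment mean is $\sqrt{\sigma^{2} / n}$. The function names, the number of segments and the `n_sigma` threshold are illustrative choices; correlated data would need a larger error estimate.

```python
import random
import statistics

def segment_stats(series, n_segments):
    """Return (mean, variance) for consecutive, equal-length segments."""
    size = len(series) // n_segments
    stats = []
    for i in range(n_segments):
        chunk = series[i * size:(i + 1) * size]
        stats.append((statistics.fmean(chunk), statistics.pvariance(chunk)))
    return stats

def mean_is_stable(series, n_segments=4, n_sigma=6.0):
    """True if the segment means agree within n_sigma standard errors."""
    size = len(series) // n_segments
    stats = segment_stats(series, n_segments)
    means = [m for m, _ in stats]
    typical_var = statistics.fmean([v for _, v in stats])
    std_error = (typical_var / size) ** 0.5      # assumes uncorrelated samples
    return max(means) - min(means) <= n_sigma * std_error

# Hypothetical data: white noise, and the same noise with a slow drift in the mean.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(2000)]
drifting = [x + 0.002 * n for n, x in enumerate(noise)]
print(mean_is_stable(noise), mean_is_stable(drifting))
```

The same pattern applies to the variance or any other segment-wise quantity, provided an estimate of its statistical fluctuation is available.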

### What about chaotic systems?

In **experimental chaotic systems**, it is not uncommon for **a parameter drift to result in no visible drift** in the mean or the distribution of values. Linear correlations and the spectrum may also be unaffected. **Only the nonlinear dynamical relations and transition probabilities change appreciably**.

### ... The problem of being sufficiently sampled

> Test for convergence

Whether the data set is a sufficient sample for a particular application, such as the estimate of a characteristic quantity, may be tested by **observing the convergence of that quantity when larger and larger fractions of the available data are used for its computation**:

> An attractor dimension obtained from the first half of the data must not differ substantially from the value determined using the second half, and should agree with the value computed for the whole data set within the estimated statistical errors.

This test is very crude since the convergence of nonlinear statistics can be very slow and, indeed, not much is known about its rate.
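The convergence test can be sketched generically for any estimator. Here the variance stands in for a nonlinear statistic such as an attractor dimension, which is an assumption made to keep the example short; the helper names, the fractions and the tolerance are likewise hypothetical.

```python
import math
import statistics

def convergence_table(series, estimator, fractions=(0.25, 0.5, 0.75, 1.0)):
    """Estimate a quantity on growing prefixes of the series."""
    n = len(series)
    return [(f, estimator(series[:max(2, int(f * n))])) for f in fractions]

def halves_agree(series, estimator, tolerance):
    """Compare the estimates from the first and the second half of the data."""
    mid = len(series) // 2
    first = estimator(series[:mid])
    second = estimator(series[mid:])
    return abs(first - second) <= tolerance

# Hypothetical data; the variance is only a stand-in for the quantity of interest.
series = [math.sin(0.1 * n) for n in range(5000)]
table = convergence_table(series, statistics.pvariance)
consistent = halves_agree(series, statistics.pvariance, tolerance=0.01)
```

In practice the tolerance should come from the estimated statistical error of the quantity, which for nonlinear statistics is often hard to obtain.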

### Remarks

- Unlike in the linear theory, the estimates of nonlinear quantities that we will make later cannot usually be studied sufficiently to rigorously prove their correctness and convergence.

- It is relevant to distinguish between a quantity, such as a mean value, and the way one derives a number for it from a finite sample. The latter is called an estimate, and there can exist different estimates for the same quantity in the same data set, depending on the assumptions made.

## Autocorrelation

For **Mean** and **Variance** the time ordering of the measurements is irrelevant and
thus **they cannot give any information about the time evolution of a system**. 

**Autocorrelation** gives this type of information.

The estimation of the autocorrelations from a time series is straightforward as long
as the lag $\tau$ is small compared to the total length $N$ of the time series. Therefore,
estimates of the autocorrelation are only reasonable for $\tau \ll N$.


If we plot the values $S_{n}$ versus the corresponding values a fixed lag $\tau$ earlier, $S_{n-\tau}$, the autocorrelation $c_{\tau}$ quantifies how these points are distributed.

Cases:

- If they spread out evenly over the plane, then $c_{\tau} = 0$.
- If they tend to crowd along the diagonal $S_{n} = S_{n-\tau}$, then $c_{\tau} > 0$.
- If they are closer to the line $S_{n} = - S_{n - \tau}$, then $c_{\tau} < 0$.

The latter two cases reflect some tendency of $S_{n}$ and $S_{n-\tau}$ to be proportional to each other, which makes it plausible that the autocorrelation function reflects only linear correlations.

Obviously, if a signal is periodic in time, then the autocorrelation function is periodic in the lag $\tau$.
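A minimal sketch of an autocorrelation estimate, assuming the common normalisation in which the covariance between $S_{n}$ and $S_{n-\tau}$ is divided by the variance, so that $c_{0} = 1$. The function name and the test signal are illustrative.

```python
import math

def autocorrelation(series, lag):
    """Estimate c_tau: covariance of S_n and S_{n-lag}, divided by the variance."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    var = sum((s - mean) ** 2 for s in series) / n
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    cov = sum((series[i] - mean) * (series[i - lag] - mean)
              for i in range(lag, n)) / (n - lag)
    return cov / var

# Hypothetical periodic signal with a period of 20 samples.
series = [math.sin(2 * math.pi * n / 20) for n in range(400)]
c_full_period = autocorrelation(series, 20)   # points crowd along S_n = S_{n-tau}
c_half_period = autocorrelation(series, 10)   # points lie along S_n = -S_{n-tau}
```

For this signal the estimate is close to $+1$ at a lag of one full period and close to $-1$ at half a period, which illustrates that the autocorrelation of a periodic signal is itself periodic in the lag.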

![Autocorrelation values and interpretations 1](img/Corr_C14.png)

![Autocorrelation values and interpretations 2](img/moment-correlation-coefficient.jpg)

### Autocorrelations, noise and chaos

Note:

- Stochastic processes have decaying autocorrelations but the rate of decay depends on the properties of the process.
- Autocorrelations of signals from deterministic chaotic systems typically also decay exponentially with increasing lag. 

> Autocorrelations are not characteristic enough to distinguish random from deterministic chaotic signals.

## Fourier transform: power spectrum, periodogram and spectrogram

Instead of describing the statistical properties of a signal in real space one can ask
about its properties in Fourier space. 

The Fourier transform establishes a one-to-one correspondence between the signal at certain times (time domain) and its description in terms of frequencies (frequency domain): how strongly each frequency contributes to the signal, and how the phases of the oscillations are related to one another.

The power spectrum is particularly useful for studying the oscillations of a system. There will be sharper or broader peaks at the dominant frequencies and at their integer multiples, the harmonics.

- Purely periodic or quasi-periodic signals show sharp spectral lines.
- Measurement noise adds a continuous floor to the spectrum. 

> Thus in the spectrum, purely periodic signal and noise are readily distinguished.

- Deterministic chaotic signals may also have sharp spectral lines but even in the absence of noise there will be a continuous part of the spectrum.


This is an immediate consequence of the exponentially decaying autocorrelation function.
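One way to see this, sketched under idealised assumptions that are not part of the notes above (continuous time, and an autocorrelation that decays exactly exponentially with a hypothetical correlation time $\tau_{c}$): by the Wiener-Khinchin theorem the power spectrum is the Fourier transform of the autocorrelation function, so

$$ c(\tau) = e^{-|\tau| / \tau_{c}} \quad \Longrightarrow \quad P(f) = \int_{-\infty}^{\infty} c(\tau) \, e^{-2 \pi i f \tau} \, d\tau = \frac{2 \tau_{c}}{1 + (2 \pi f \tau_{c})^{2}} $$

This is a broad Lorentzian without sharp lines: the power is spread continuously over frequencies up to roughly $1 / (2 \pi \tau_{c})$.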

> Without additional information it is impossible to infer from the spectrum whether the continuous part is due to noise on top of a (quasi-)periodic signal or to chaoticity.

**See:** Thomas S. Parker, Leon Chua - Practical Numerical Algorithms for Chaotic Systems.