(components)=
# Components of time series

The code on this page can be used interactively: click {fa}`rocket` --> {guilabel}`Live Code` in the top right corner, then wait until the message {guilabel}`Python interaction ready!` appears.

This page creates an interactive plot of a simulated time series built from several components. Throughout this section we will discuss these components.


A time series is a sequence of data points indexed in time, collected in order to study a phenomenon. The data are usually collected at fixed time intervals rather than intermittently or at random. In the time domain this fixed interval is called the 'sampling interval'; its reciprocal, used in the frequency domain, is called the 'sampling rate' or 'sampling frequency', expressed for example in Hz.

A time series is denoted as 

$$Y(t) = [Y(t_1), Y(t_2), \ldots{}, Y(t_m)]^T$$

The $Y(t_i)$ are random variables, since the data is affected by noise.

The time instants, also defined as epochs, are $t_i = i \Delta t$, indicating that the samples are equally spaced in time intervals of $\Delta t$. Assuming a unit time interval (i.e., $\Delta t=1$), then $t_i = i$ and we can write the time series as 

$$Y(t) = [Y(1), Y(2), \ldots{}, Y(m)]^T = [Y_1, Y_2, \ldots{}, Y_m]^T$$

```{figure} ./figs/time_series.png
:name: time_series
:width: 700px
:align: center

Example of time series with equally spaced time interval $\Delta t$
```
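
The relation between epochs, sampling interval and sampling frequency can be sketched in a few lines. The values of `dt` and `m` below are arbitrary assumptions chosen for illustration; they do not come from the figure.

```python
import numpy as np

dt = 0.5           # sampling interval (assumed value, e.g. in seconds)
m = 8              # number of samples (assumed value)
fs = 1.0 / dt      # sampling frequency; in Hz if dt is in seconds

# Epochs t_i = i * dt for i = 1, ..., m
i = np.arange(1, m + 1)
t = i * dt

print("epochs:", t)
print("sampling frequency:", fs)
```

With $\Delta t = 1$ the epochs reduce to the indices $i$ themselves, which is the simplification used in the notation $Y_1, Y_2, \ldots, Y_m$.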

A time series can be decomposed as follows:

$$Y(t) = tr(t) + s(t) + o(t) + b(t) + N(t)$$

where we distinguish the following components:

1. $tr(t)$ = trend, provides the general behavior and variation of the process
2. $s(t)$ = seasonality, shows the regular seasonal variations
3. $o(t)$ = offset, is a discontinuity (or jump) in the data
4. $b(t)$ = irregularities and outliers (also referred to as biases), due to unexpected reasons. Irregularities will not be considered in this book.
5. $N(t)$ = noise, can be white or colored noise.

## Trend

The trend is the general pattern of the time series and shows its long-term changes. It is observed when there is an increasing or decreasing slope in the time series.

```{figure} ./figs/trend.png
:name: trend
:width: 600px
:align: center

Monthly time series of global mean sea level measurements using Satellite Altimetry technique. Source image: https://www.cmar.csiro.au/sealevel/sl_hist_last_decades.html
```

{numref}`trend` shows a positive trend (red line) of around $3.5$ mm/year, which in this case indicates sea level rise. This however needs to be further investigated and tested statistically (see {ref}`hypothesis_testing` and also {ref}`modelling_tsa`).

Trend analysis expresses the changes of the variable of interest with respect to time $t$.
Different types of trend are possible; for now we will mainly focus on the linear trend, i.e., the time-dependent variable $Y(t)$ changes at a constant rate over time: $Y_t = y_0 + r t + \epsilon_t$. Other trends are also possible, for example quadratic or log-linear.
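
As a minimal sketch, the intercept $y_0$ and rate $r$ of the linear trend model $Y_t = y_0 + r t + \epsilon_t$ can be estimated with ordinary least squares. The data below are simulated; the true values, the noise level and the monthly sampling are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated monthly epochs over 10 years (assumed sampling)
t = np.arange(0.0, 10.0, 1.0 / 12.0)
y0_true, r_true = 5.0, 3.5            # assumed intercept and rate
y = y0_true + r_true * t + rng.normal(0.0, 2.0, size=t.shape)

# Design matrix: one column for the intercept, one for the rate
A = np.column_stack([np.ones_like(t), t])

# Ordinary least-squares estimate of [y0, r]
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
y0_hat, r_hat = x_hat

print("estimated intercept:", y0_hat)
print("estimated rate:", r_hat)
```

A quadratic trend would only require an extra column `t**2` in the design matrix; the estimation step stays the same.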


## Seasonality

Seasonal variations explain regular fluctuations over a certain period of time (e.g. a year), usually caused by climate and weather conditions (e.g. temperature, rainfall), cycles of seasons, customs, traditional habits, weekends, or holidays. For example, a weekly signal is usually evident in the number of people shopping (more people are likely to go shopping on weekends).

From {numref}`trend` it is also possible to see the seasonal variations: in fact sea levels are higher in summer and lower in winter. The annual warming/cooling cycle is the main contributor to these seasonal variations.

Regular seasonal variations in a time series might be handled by using a sinusoidal model with one or more sinusoids whose frequency may be known or unknown depending on the context. A harmonic model for seasonal variation can be of the following two equivalent forms (using that $\cos(u+v)= \cos u \cos v - \sin u \sin v$):

$$ 
\begin{align*}
Y(t) &= \sum_{k=1} ^p A_k  \cos(k \omega_0  t + \theta_k)  + \epsilon_t\\
&= \sum_{k=1} ^p \left(a_k  \cos(k \omega_0  t) + b_k  \sin(k \omega_0 t) \right)+ \epsilon_t
\end{align*}
$$

with the coefficients $a_k = A_k\cos\theta_k$ and $b_k = -A_k\sin\theta_k$, and where $\omega_0$ is the base (fundamental) frequency of the seasonal variation, which is either fixed or determined by Spectral Analysis methods such as {ref}`dft` or FFT. To be more specific, we can use the {ref}`psd` and {ref}`LS-HE` to determine the unknown frequencies.

The coefficients $a_k$ and $b_k$ can be determined using the least-squares method. From these, the amplitudes and phases of the original sinusoids can be obtained using:

$$ A_k = \sqrt{a_k^2 + b_k^2}, \hspace{1cm} \theta_k = \arctan(-\frac{b_k}{a_k}), \hspace{1cm} k = 1, \ldots{}, p $$
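
The estimation of $a_k$ and $b_k$ and the conversion back to $A_k$ and $\theta_k$ can be sketched as follows, assuming a known fundamental frequency and $p=2$ harmonics. All numerical values are assumptions for illustration. The sketch uses `np.arctan2(-b, a)` rather than a plain arctangent of $-b/a$, so that the phase is returned in the correct quadrant.

```python
import numpy as np

rng = np.random.default_rng(1)

omega0 = 2.0 * np.pi                  # assumed known: 1 cycle per time unit, in rad
A_true = np.array([2.0, 0.5])         # assumed amplitudes for k = 1, 2
theta_true = np.array([0.6, -1.0])    # assumed phases in rad
p = len(A_true)
k = np.arange(1, p + 1)

# Simulate Y(t) = sum_k A_k cos(k w0 t + theta_k) + noise
t = np.arange(0.0, 5.0, 1.0 / 52.0)
phase = omega0 * np.outer(t, k)       # shape (m, p), entries k * w0 * t
y = (A_true * np.cos(phase + theta_true)).sum(axis=1)
y += rng.normal(0.0, 0.1, size=t.shape)

# Design matrix with columns cos(k w0 t) followed by sin(k w0 t)
A_mat = np.hstack([np.cos(phase), np.sin(phase)])
x_hat, *_ = np.linalg.lstsq(A_mat, y, rcond=None)
a_hat, b_hat = x_hat[:p], x_hat[p:]

# Back to amplitude and phase: a = A cos(theta), b = -A sin(theta)
A_hat = np.hypot(a_hat, b_hat)
theta_hat = np.arctan2(-b_hat, a_hat)

print("amplitudes:", A_hat)
print("phases:", theta_hat)
```

The model is linear in $a_k$ and $b_k$ but not in $A_k$ and $\theta_k$, which is why the fit is done in the second form of the harmonic model.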

:::{card} Worked example - seasonality signal

Show that the time series 

$$Y(t)=A \cos(\omega_0 t + \theta)$$ 

with given $\omega_0$, can be rewritten as

$$Y(t)=a \cos(\omega_0 t) + b \sin(\omega_0 t)$$

and derive the formulation of $A$ and $\theta$.

Hint: you might need the cosine addition identity $\cos(u+v)=\cos(u)\cos(v)-\sin(u)\sin(v)$

````{admonition} Solution
:class: tip, dropdown
Using the cosine property to rewrite:

$ Y(t)=A \cos(\omega_0 t + \theta) = A (\cos(\omega_0 t)\cos(\theta)-\sin(\omega_0 t)\sin(\theta)) $

Matching the coefficients of $\cos(\omega_0 t)$ and $\sin(\omega_0 t)$ gives the expressions for $a$ and $b$

$ a = A \cos(\theta) \hspace{1cm} b = -A \sin(\theta)$

Squaring both expressions in order to get rid of the sin and cos

$ a^2 = A^2 \cos^2(\theta) \hspace{1cm} b^2 = A^2 \sin^2(\theta) $

Adding both expressions together

$ a^2 + b^2 = A^2 (\cos^2(\theta) + \sin^2(\theta)) $

Using this property to simplify:

$ \cos^2(\theta) + \sin^2(\theta) = 1 $

$ a^2 + b^2 = A^2 $

Take square root to find A

$ \sqrt{a^2 + b^2} = A $ 

For $\theta$ we rewrite the expression for $b$ and divide it by the expression for $a$

$ a = A \cos(\theta) \hspace{1cm} -b = A \sin(\theta)$

$ \frac{-b}{a} = \frac{\sin(\theta)}{\cos(\theta)} = \tan(\theta) $

$ \theta = \arctan(\frac{-b}{a}) $


[This video](https://youtu.be/8kqQiI4ni68) includes the solution to this exercise. 
````

:::

<!-- (season)=
:::{card} Example - seasonal variations

```{figure} ./figs/sine_wave_1.jpg
:name: trendab
:width: 600px
:align: center

Seasonal variations components: blue line is the time series $Y(t)$; red and green lines represent the contributions $a  \cos(0.5\pi t)$ and  $b   \sin(0.5\pi t)$, respectively.
```

The seasonal variation is given as $y = A \sin(\omega_0 t + \theta)$.

Assume amplitude $A=2$, base frequency $\omega_0=0.5\pi$ and initial phase $\theta = -0.8 \pi$ (rad), see top panel of {numref}`trendab`.

$y(t) = 2 \sin(0.5 \pi t - 0.8\pi)$

The time-delay of the phase is $0.5 t - 0.8 = 0 \Rightarrow t = 1.6 \equiv \theta_t$.

Alternatively we can write 

$y(t) = a  \cos(0.5\pi t) + b   \sin(0.5\pi t)$

where $a = A  \sin(\theta)=-1.1756$ and $b=A  \cos(\theta)=-1.6180$.

::: -->

## Offset (jump)

Offsets are sudden changes in time series. There are different underlying reasons why we encounter offsets in time series. 

```{figure} ./figs/offset.png
:name: offset
:width: 700px
:align: center

Example of time series with two offsets. 
```

As a deterministic sudden change, offsets can be handled by a step function such as a Heaviside step function whose epoch (time instant) can be known or unknown (to be detected) depending on the time series.

In this case the time series is written as 

$$ Y(t) = \sum_{k=1}^q o_k u_k(t)+\epsilon_t$$

where $q$ is the number of offsets (in {numref}`offset` there are two offsets, hence $q=2$), $o_k$ is the size of the $k$-th offset, and each offset occurring at epoch $t_k$ is expressed with a Heaviside step function

$$u_k(t) = \left\{
\begin{array}{ll}
      0 & \text{if} \hspace{0.3cm} t<t_k \\
      1 & \text{if} \hspace{0.3cm} t\geq t_k \\
\end{array} 
\right.  $$
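
A minimal sketch of this offset model, assuming the offset epochs $t_k$ are known: each step function $u_k(t)$ becomes one column of the design matrix, and the offset sizes $o_k$ follow from least squares. The epochs, sizes and noise level are assumed values for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

t = np.linspace(0.0, 10.0, 501)
t_k = np.array([3.0, 7.0])         # assumed known offset epochs (q = 2)
o_true = np.array([4.0, -2.5])     # assumed offset sizes

# u_k(t) = 0 for t < t_k and 1 for t >= t_k, one column per offset
U = (t[:, None] >= t_k[None, :]).astype(float)

# Simulate Y(t) = sum_k o_k u_k(t) + noise
y = U @ o_true + rng.normal(0.0, 0.5, size=t.shape)

# Least-squares estimate of the offset sizes
o_hat, *_ = np.linalg.lstsq(U, y, rcond=None)

print("estimated offsets:", o_hat)
```

When the epochs are unknown they have to be detected first; that step is not covered by this sketch.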

## Noise 

Noise simply refers to random fluctuations in the time series about its typical pattern. In general we can talk about white and colored noise in time series analysis. The following characteristics are associated with noise:

- Noise is not necessarily synonymous with error, but part of the noise is random error.
- Unwanted random variations need to be filtered out in order to detect meaningful information (i.e., a signal) in noise processes.
- Transforming data from the time domain to the frequency domain makes it possible to filter out the frequencies that pollute the data.
- White noise can be decomposed into its constituent components (frequencies), like white light. In principle, white noise contains all wavelengths/colors, each contributing equally to the fluctuations observed in the data.
- Colored noise can seriously affect the analysis of time series and the estimates of their parameters of interest. Short-term colored noise also has a predictive property (used for forecasting).

A purely random process (or white noise process) yields a sequence of uncorrelated zero-mean random variables. This zero-mean random process is of the form

$$ Y(t)=Y_t=\epsilon_t $$

where $\epsilon_t$ is the independent identically distributed (i.i.d.) error at epoch $t$. Therefore, the observation/noise at time $t$ is not dependent on the previous observations.

### Stochastic model

White noise is a stationary zero-mean random process: its expectation is zero (functional model) and its covariance matrix is a scaled identity matrix (stochastic model). The functional and stochastic models of white noise are of the form

$$
\mathbb{E}(Y) =  \mathbb{E} \left[\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_m \end{array}\right] = \left[\begin{array}{c} 0 \\ 0 \\ \vdots \\ 0 \end{array}\right]
$$

and 

$$
\mathbb{D}(Y) =  \Sigma_{Y} = \sigma^2 \left[\begin{array}{cccc} 1 & 0 & \ldots{} & 0 \\ 0 & 1 & \ldots{} & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots{} & 1 \end{array}\right]
$$

The noise can be represented with a Gaussian distribution with mean $\mu=0$ and variance $\sigma^2$, that is $\epsilon(t) \sim \textbf{N}(0, \sigma^2)$.
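
These properties can be checked empirically on a simulated sequence. The sketch below draws Gaussian white noise with an assumed $\sigma$ and compares the sample mean, sample variance and the first few sample autocorrelations with their theoretical values ($0$, $\sigma^2$ and $0$). The helper `sample_autocov` is a hypothetical function written for this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

m = 5000          # number of samples (assumed)
sigma = 1.5       # noise standard deviation (assumed)
eps = rng.normal(0.0, sigma, size=m)

def sample_autocov(y, lag):
    """Biased sample autocovariance of y at a non-negative lag."""
    y = y - y.mean()
    return np.dot(y[: len(y) - lag], y[lag:]) / len(y)

c0 = sample_autocov(eps, 0)                       # sample variance
rho = np.array([sample_autocov(eps, lag) / c0     # autocorrelation, lags 1..5
                for lag in range(1, 6)])

print("sample mean:", eps.mean())
print("sample variance:", c0, "(theory:", sigma**2, ")")
print("autocorrelation at lags 1..5:", rho)
```

Autocorrelations close to zero at all non-zero lags correspond to the zero off-diagonal entries of $\Sigma_Y$.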

:::{card} Example - time series consisting of a trend, annual signal (seasonality), an offset and pure random noise (white noise)

It can be written as 

$$Y(t) = y_0 + rt + a \cos(\omega_0 t) + b \sin(\omega_0 t) + o u_k(t) + \epsilon(t)$$

where 
- $y_0$ is the intercept (e.g. in mm)
- $r$ is the rate (e.g. in mm/year)
- $a$ and $b$ are the coefficients of the seasonal signal (e.g. annual signal)
- $\omega_0$ is the frequency (e.g. 1 cycle/year)
- $o$ is the size of the offset starting at time $t_k$
- $\epsilon(t)$ is the i.i.d. random noise, i.e. $\epsilon(t) \sim \textbf{N}(0, \sigma^2)$.
:::
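
Because this model is linear in $y_0$, $r$, $a$, $b$ and $o$, all five parameters can be estimated in a single least-squares step once $\omega_0$ and $t_k$ are known. The following is a sketch under that assumption, with simulated data and assumed parameter values; the frequency of 1 cycle/year is entered as $2\pi$ rad/year.

```python
import numpy as np

rng = np.random.default_rng(4)

t = np.linspace(0.0, 10.0, 500)       # epochs in years (assumed)
omega0 = 2.0 * np.pi                  # 1 cycle/year in rad/year (assumed known)
t_k = 5.0                             # offset epoch (assumed known)
x_true = np.array([10.0, 3.5, 1.0, 2.0, 5.0])   # y0, r, a, b, o (assumed)
sigma = 0.5                           # noise standard deviation (assumed)

# One design-matrix column per unknown parameter
A = np.column_stack([
    np.ones_like(t),                  # intercept y0
    t,                                # rate r
    np.cos(omega0 * t),               # coefficient a
    np.sin(omega0 * t),               # coefficient b
    (t >= t_k).astype(float),         # offset o via the step function u_k(t)
])

y = A @ x_true + rng.normal(0.0, sigma, size=t.shape)
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

for name, est in zip(["y0", "r", "a", "b", "o"], x_hat):
    print(name, "=", est)
```

Since the covariance matrix of white noise is $\sigma^2 I$, ordinary and weighted least squares give the same estimate here.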

In [10]:

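import numpy as np
import matplotlib.pyplot as plt  # used by plot_data below
import micropip
await micropip.install(["plotly", "nbformat"])
import plotly.graph_objects as go
from ipywidgets import interact, widgets, Layout  # used for the sliders below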


In [None]:

x = np.linspace(0, 10, 500)

def gen_seasons(a=0, b=0, frequency=0):
    y = a * np.cos(frequency*x) + b * np.sin(frequency*x)
    return y

def gen_offset(offset_loc=0, offset_size=0):
    # Step function: 0 before offset_loc, offset_size from offset_loc onwards
    return np.where(x >= offset_loc, float(offset_size), 0.0)

def gen_trend(trend_slope=0):
    y = trend_slope * x
    return y

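def gen_noise(std=0):
    # Seed derived from std, so repeated calls with the same std return the
    # same noise (the total and the noise panel in plot_data then agree)
    np.random.seed(int(std*100))
    y = np.random.normal(0, std, size=x.shape)
    return y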

def generate_data(trend_slope=0, std=0, a=0, b=0, frequency=0, offset_loc=5, offset_size=0):
    y = gen_trend(trend_slope) + gen_noise(std) + gen_offset(offset_loc, offset_size) + gen_seasons(a, b, frequency)
    return y


In [None]:
# Create figure
fig = go.Figure()

In [None]:

# Function to plot the data
def plot_data(trend_slope=0, a=0, b=0, frequency=1, offset_location=5, offset_size=0, standard_dev=0):
    y = generate_data(trend_slope, standard_dev, a, b, frequency, offset_location, offset_size)
    y_trend = gen_trend(trend_slope)
    y_seas = gen_seasons(a, b, frequency)
    y_offs = gen_offset(offset_location, offset_size)
    y_noise = gen_noise(standard_dev)

    fig, axs = plt.subplots(5, 1, figsize=(8, 8))
    # plt.figure(figsize=(10, 6))
    axs[0].plot(x, y, label="Generated Data")

    axs[0].set_title("Simulated time series (trend + seasonality + offset + noise)")
    axs[0].grid(True)
    axs[0].tick_params(axis='x', labelbottom=False)  # Remove x-tick labels

    axs[1].plot(x, y_trend, label="Trend")
    axs[1].grid(True)
    axs[1].legend()
    axs[1].tick_params(axis='x', labelbottom=False)  # Remove x-tick labels
    axs[1].set_ylim([-20, 20])

    # plt.ylabel('Different time series components', loc='bottom')
    axs[2].plot(x, y_seas, label="Seasonality")
    axs[2].grid(True)
    axs[2].legend()
    axs[2].tick_params(axis='x', labelbottom=False)  # Remove x-tick labels
    axs[2].set_ylim([-5.5, 5.5])

    axs[3].plot(x, y_offs, label="Offset")
    axs[3].grid(True)
    axs[3].legend()
    axs[3].tick_params(axis='x', labelbottom=False)  # Remove x-tick labels
    axs[3].set_ylim([-5.5, 5.5])

    axs[4].plot(x, y_noise, label="Noise")
    axs[4].grid(True)
    axs[4].legend()
    axs[4].set_ylim([-3, 3])

    # plt.tight_layout()
    plt.xlabel('Time')
    plt.show()


In [11]:
    
# Creating interactive widgets
style = {'description_width': 'initial'}
interact(plot_data,
         trend_slope=widgets.FloatSlider(value=1, min=-2.0, max=2.0, step=0.05, description="Trend Slope", style=style, layout=Layout(width='40%')),
         a=widgets.FloatSlider(value=0, min=0, max=5.0, step=0.05,description="a", style=style, layout=Layout(width='40%')),
         b=widgets.FloatSlider(value=2, min=0, max=5.0, step=0.05,description="b", style=style, layout=Layout(width='40%')),
         frequency=widgets.FloatSlider(value=4, min=0, max=10.0, step=0.05,description="frequency", style=style, layout=Layout(width='40%')),
         offset_location=widgets.FloatSlider(value=5, min=0, max=10, step=0.1, description="offset location", style=style, layout=Layout(width='40%')),
         offset_size=widgets.FloatSlider(value=5, min=-10, max=10, step=0.1, description="offset size", style=style, layout=Layout(width='40%')),
         standard_dev=widgets.FloatSlider(value=0.5, min=0, max=2, step=0.005,description="Standard deviation", style=style, layout=Layout(width='40%')));
        
