# Running average of time series
When dealing with real data, we often estimate a moment (e.g. the mean) from a single time series.  This demonstration explores the effect that the autocorrelation of the time series has on the quality of this estimate.

### Preamble
Start by importing the Python libraries that we will require

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import lfilter

Next, define a function that returns True when running in a Jupyter Notebook.

In [None]:
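def is_jupyter():
    """Return True if running in a Jupyter Notebook"""
    try:
        # get_ipython() is only defined inside IPython/Jupyter sessions
        return get_ipython().__class__.__name__ == 'ZMQInteractiveShell'
    except NameError:
        # Plain Python interpreter: get_ipython does not exist
        return False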

### User specified parameters

Set the parameters that control the length of the simulated signal and the extent of the plots.

Parameter | Meaning
--------- | -------
<code>L</code> | The number of noise samples to generate (should be large) (e.g. 10000)
<code>P</code> | The number of sample points to show (should be much smaller than <code>L</code>) (e.g. 200)
<code>M</code> | The maximum ACF delay to evaluate (e.g. 40)

In [None]:
L = 10000
P = 200
M = 40

### Plotting functions
For convenience, we define a function for plotting the time series

In [None]:
def plot_fig(points, average, xlabel, ylabel, title, legend, name):
    """
       Function to plot the amplitude of time series data and estimated mean data.
       
       INPUT:
           points  (array-like): The vertical coordinates of data points of time series. 
           average (array-like): The vertical coordinates of data points of estimated mean. 
           xlabel      (string): The label for x-axis.
           ylabel      (string): The label for y-axis.
           title       (string): The title for the figure.
           legend ([legend0, legend1]): The legends for points and average lines.
           name        (string): The name used to save figure.
           
    """
    
    # Create the plot figure
    plt.figure(figsize = (16, 8))
    # Update label font size
    plt.rcParams.update({'font.size': 16})
    
    # Plot the amplitude
    ax = plt.gca()
    ax.plot(np.arange(0, len(points)), points, 'b', 
            marker = 'o', markersize = 3, 
            linestyle = 'dotted', linewidth = 1,
            label = legend[0])
    ax.plot(np.arange(0, len(points)), average, 'r', 
            marker = 'x', markersize = 3,
            linewidth = 1,
            label = legend[1])
    
    # Tidy up the plot and add axes labels
    plt.xlim([0,len(points)-1])
    plt.xlabel(xlabel, fontsize = 20)
    plt.ylabel(ylabel, fontsize = 20)
    plt.legend(prop={'size': 20})
    plt.title(title, fontsize = 20)
    
    # Save the figure to file when run as a script (not in a Jupyter Notebook)
    if not is_jupyter():
        plt.savefig(name)

and one for the autocorrelation.

In [None]:
def stem_fig(X, Y, xlabel, ylabel, title, name):
    """
       Function to create a stem plot
       
       INPUT:
           X  (array-like): The x-positions of the stems. 
           Y  (array-like): The y-values of the stem heads.
           xlabel (string): The label for x-axis.
           ylabel (string): The label for y-axis.
           title  (string): The title for the figure.
           name   (string): The name used to save figure.
           
    """
    
    # Create the plot figure
    plt.figure(figsize = (16, 8))
    plt.rcParams.update({'font.size': 16})
    
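    # Stem plot and its circle marker
    # (the use_line_collection argument is omitted: it is deprecated in recent
    #  Matplotlib releases, where the stems are always drawn as a line collection)
    (markerLines, stemLines, baseLines) = plt.stem(X, Y)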
    plt.setp(stemLines, linewidth=1)
    plt.setp(baseLines, color = 'black', linewidth=1)
    markerLines.set_markersize(4)
    markerLines.set_markerfacecolor('none')
    
    # Tidy up the plot and add axes labels
    plt.xlim([X[0],X[-1]])
    plt.xlabel(xlabel, fontsize = 20)
    plt.ylabel(ylabel, fontsize = 20)
    plt.title(title, fontsize = 20)
    
    # Save the figure to file when run as a script (not in a Jupyter Notebook)
    if not is_jupyter():
        plt.savefig(name)

### Generate noise signal
This function generates the noisy time series that we are going to perform our estimation on.  The value of <code>L</code> needs to be significantly larger than the number of points we are interested in, as this type of filter (an IIR) can have a long transient response.

In [None]:
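def noise_signal(alpha, L):
    """
       Generate L samples of low-pass filtered white noise with a mean of 0.5.

       INPUT:
           alpha (float): Filter pole, 0 <= alpha < 1; larger values give stronger correlation.
           L       (int): The number of samples to generate.
    """
    # Define IIR filter taps: y[n] = alpha*y[n-1] + (1-alpha)*x[n]
    b = [1-alpha]
    a = [1, -alpha]
    
    # Generate a white noise input sequence with a DC offset
    x = np.random.randn(L) + 0.5
    
    # Use this to generate a time varying signal and return it
    return lfilter(b, a, x)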
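
To get a feel for how long the transient lasts, note that the impulse response of this first-order filter is $h[n] = (1-\alpha)\alpha^n$, so the influence of the zero initial state decays as $\alpha^n$.  The sketch below estimates how many samples are needed for that factor to fall below a tolerance.  The helper name <code>transient_length</code> and the tolerance of $10^{-3}$ are illustrative assumptions, not part of the original notebook.

```python
import math

def transient_length(alpha, tol=1e-3):
    """Smallest n for which alpha**n <= tol (hypothetical helper)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # alpha**n <= tol  <=>  n >= log(tol) / log(alpha), as log(alpha) < 0
    return math.ceil(math.log(tol) / math.log(alpha))

# The two filter settings explored in this demonstration
for alpha in (0.15, 0.95):
    print(alpha, transient_length(alpha))
```

For both settings the result is far smaller than <code>L</code> = 10000, which suggests that taking the last <code>P</code> samples of the output avoids the transient.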

### $\alpha = 0.15$
We will explore two different filter settings in this script.  The first setting, with a small $\alpha$, produces noise with low correlation between samples.

In [None]:
alpha = 0.15

y = noise_signal(alpha, L)

# Select the last P points in the output
points = y[-P:]

# Create space for running average
average = np.zeros(P)

# Compute the running average
for n in range(P):
    average[n] = np.sum(points[:n+1])/(n+1)
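
The loop above recomputes the sum from scratch for every sample.  As a possible alternative, the same running average can be obtained from a cumulative sum.  This is a minimal sketch; the function name <code>running_mean</code> is an assumption introduced here for illustration.

```python
import numpy as np

def running_mean(points):
    """Running average: element n is the mean of points[0] ... points[n]."""
    points = np.asarray(points, dtype=float)
    counts = np.arange(1, len(points) + 1)
    return np.cumsum(points) / counts

# Compare against the explicit loop on a small example
example = np.array([1.0, 2.0, 3.0, 4.0])
loop_version = np.array([np.sum(example[:n+1]) / (n+1)
                         for n in range(len(example))])
print(running_mean(example))
print(np.allclose(running_mean(example), loop_version))
```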

### Time series
Here we plot the time series of the noisy signal, together with the average that we would estimate if we were to use all of the data in the plot up to that point.  (So, for example, the red point at sample 20 uses all of the signal from sample 0 up to sample 20 to form the estimate of the mean.)  It can be seen that the average quickly settles down, and is close to the true mean of 0.5 (the DC offset of the input noise, which the filter passes with unit gain).

In [None]:
xlabel = 'Sample'
ylabel = 'Amplitude'
title = 'Time estimation of an average'
legend = ['Time series','Estimate of mean']
name = 'Time_average_low_alpha.pdf'

plot_fig(points, average, xlabel, ylabel, title, legend, name)
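
The true mean and variance of the filtered signal follow from the filter definition: the input has a DC offset of 0.5 and the filter has a DC gain of $(1-\alpha)/(1-\alpha) = 1$, so the mean is 0.5, while for unit-variance white noise the output variance is $(1-\alpha)/(1+\alpha)$.  The sketch below compares these values with a simulation.  The helper names, the fixed random seed and the explicit recursion (written out in place of <code>lfilter</code>) are assumptions made for illustration.

```python
import numpy as np

def theoretical_moments(alpha, input_mean=0.5, input_var=1.0):
    """Steady-state mean and variance of y[n] = alpha*y[n-1] + (1-alpha)*x[n]."""
    dc_gain = (1 - alpha) / (1 - alpha)   # H(z) evaluated at z = 1
    mean = dc_gain * input_mean
    var = input_var * (1 - alpha) ** 2 / (1 - alpha ** 2)
    return mean, var

def simulate(alpha, length, rng):
    """Explicit form of the first-order recursion, starting from a zero state."""
    x = rng.standard_normal(length) + 0.5
    y = np.zeros(length)
    previous = 0.0
    for n in range(length):
        previous = alpha * previous + (1 - alpha) * x[n]
        y[n] = previous
    return y

rng = np.random.default_rng(0)             # fixed seed, assumed for repeatability
alpha = 0.15
y = simulate(alpha, 100_000, rng)[1000:]   # drop the start-up transient
mean, var = theoretical_moments(alpha)
print(f"mean: theory {mean:.3f}, sample {y.mean():.3f}")
print(f"variance: theory {var:.3f}, sample {y.var():.3f}")
```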

### Autocorrelation
The plot shows the theoretical autocovariance of the filter output.  From it, it can be seen that adjacent samples of the time series are only weakly correlated, and that the correlation falls off further as the delay increases.  So, although the samples are not independent, the correlation is small enough that a good estimate of the mean can be quickly formed.

In [None]:
acf_xaxis = np.arange(-M, M+1)
# Theoretical autocovariance of the filter output for unit-variance white noise input
acf = (1-alpha) * np.power(alpha, np.abs(acf_xaxis)) / (1+alpha)

xlabel = 'Delay'
ylabel = 'Autocovariance'
title = 'Autocovariance of time series'
name = 'Autocovariance_low_alpha.pdf'

stem_fig(acf_xaxis, acf, xlabel, ylabel, title, name)
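
The curve plotted above is the theoretical autocovariance.  As a rough check, it can be compared with an estimate formed from simulated data.  The sketch below uses a biased sample autocovariance; the function name <code>sample_autocovariance</code>, the seed and the signal length are assumptions chosen for illustration.

```python
import numpy as np

def sample_autocovariance(y, max_lag):
    """Biased sample autocovariance of y for lags 0 ... max_lag."""
    y = np.asarray(y, dtype=float)
    centred = y - y.mean()
    n = len(y)
    return np.array([np.dot(centred[:n - k], centred[k:]) / n
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(1)    # fixed seed, assumed for repeatability
alpha = 0.15

# Filter white noise (DC offset 0.5) with y[n] = alpha*y[n-1] + (1-alpha)*x[n]
x = rng.standard_normal(50_000) + 0.5
y = np.zeros_like(x)
state = 0.0
for n, sample in enumerate(x):
    state = alpha * state + (1 - alpha) * sample
    y[n] = state
y = y[1000:]                      # discard the start-up transient

lags = np.arange(0, 6)
estimated = sample_autocovariance(y, 5)
theory = (1 - alpha) * alpha ** lags / (1 + alpha)
for k in lags:
    print(k, round(estimated[k], 4), round(theory[k], 4))
```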

### Repeat the process for $\alpha=0.95$
We will repeat this process for a filter that produces signals that exhibit strong correlation between adjacent samples.

In [None]:
alpha = 0.95

y = noise_signal(alpha, L)

# Select the last P points in the output
points = y[-P:]

# Create space for running average
average = np.zeros(P)

# Compute the running average
for n in range(P):
    average[n] = np.sum(points[:n+1])/(n+1)

### Time series
Here we can see that the signal varies much more slowly than in the previous example.  It can also be seen that, over the <code>P</code> samples shown, the running average does not necessarily settle at the true mean of 0.5.  (As new random signals are produced each time, you may need to re-run the cell above and the following cells to see this effect.)

In [None]:
xlabel = 'Sample'
ylabel = 'Amplitude'
title = 'Time estimation of an average'
legend = ['Time series','Estimate of mean']
name = 'Time_average_high_alpha.pdf'

plot_fig(points, average, xlabel, ylabel, title, legend, name)

### Autocorrelation
The autocorrelation plot shows that adjacent samples are very strongly correlated.  This means that a much longer time series would be required in order to be able to get a good estimate of the mean.

In [None]:
acf_xaxis = np.arange(-M, M+1)
acf = (1-alpha) * np.power(alpha, (np.abs(acf_xaxis))) / (1+alpha)

xlabel = 'Delay'
ylabel = 'Autocovariance'
title = 'Autocovariance of time series'
name = 'Autocovariance_high_alpha.pdf'

stem_fig(acf_xaxis, acf, xlabel, ylabel, title, name)
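
One way to quantify how much longer the time series needs to be is through the variance of the sample mean of $N$ correlated samples, $\frac{1}{N}\sum_{k=-(N-1)}^{N-1}\left(1-\frac{|k|}{N}\right)c[k]$, where $c[k]$ is the autocovariance plotted above.  Dividing the signal variance $c[0]$ by this value gives an effective number of independent samples.  The sketch below evaluates both quantities; the function names are assumptions introduced for illustration.

```python
import numpy as np

def variance_of_sample_mean(alpha, n):
    """Variance of the mean of n consecutive samples of the filtered noise."""
    k = np.arange(-(n - 1), n)
    # Theoretical autocovariance, as used for the stem plots
    c = (1 - alpha) * np.power(alpha, np.abs(k)) / (1 + alpha)
    weights = 1 - np.abs(k) / n
    return np.sum(weights * c) / n

def effective_samples(alpha, n):
    """Number of independent samples giving the same variance of the mean."""
    c0 = (1 - alpha) / (1 + alpha)
    return c0 / variance_of_sample_mean(alpha, n)

n = 200   # same as P in this demonstration
for alpha in (0.15, 0.95):
    print(alpha, variance_of_sample_mean(alpha, n), effective_samples(alpha, n))
```

Under these assumptions, the 200 plotted samples behave like roughly 148 independent samples for $\alpha = 0.15$, but only about 6 for $\alpha = 0.95$, so relative to the spread of the signal itself the estimate of the mean is much less reliable in the strongly correlated case.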

© The University of Edinburgh: Produced by D. Laurenson, School of Engineering. Initial code conversion by Xing Zixiao.