# Surrogate data testing

The purpose of the present notebook is to detect the presence of nonlinearities in time series using surrogate data testing.

The idea behind the surrogate data method is to generate artificial time series, the surrogate data, that share selected statistics with the original data (e.g. the same power spectrum) but are of linear stochastic origin. If one can find a discriminating statistic that differs significantly between the original and the surrogate data, one can reject the null hypothesis that the original data stem from a linear process. This is a statistical argument by contradiction: rejecting the null hypothesis is evidence for, not a strict proof of, a nonlinear process.

In [2]:
import numpy as np
from plotly import offline as py
from plotly import graph_objs as go

py.init_notebook_mode(connected=True)

ts1 = np.genfromtxt('ts1.dat', delimiter='\n')
ts2 = np.genfromtxt('ts2.dat', delimiter='\n')

**How do you generate surrogate data?**

There exist different techniques, each of which tests a different null hypothesis:

1. Random Shuffle (RS)
2. Random Phases or Fourier Transform (RP or FT)
3. Amplitude Adjusted Fourier Transform (AAFT)
4. Iterative Amplitude Adjusted Fourier Transform (IAAFT)

### Random shuffle

The null hypothesis of the random shuffle surrogate is that the data to test is uncorrelated white noise.

One can obtain random shuffle surrogate data by applying random permutations to the original time series.

In [5]:
def random_shuffle(x):
    return np.random.permutation(x)

rs1a = random_shuffle(ts1)
rs1b = random_shuffle(ts1)
rs2a = random_shuffle(ts2)
rs2b = random_shuffle(ts2)
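
A quick check on synthetic data illustrates what the shuffle preserves and what it destroys. This is only a sketch: the AR(1) test signal, its coefficient of 0.9, and the helper `lag1_correlation` are assumptions made for illustration and are not part of the notebook's data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test signal: an AR(1) process with strong lag-1 correlation
n = 2000
x = np.zeros(n)
for i in range(1, n):
    x[i] = 0.9 * x[i - 1] + rng.standard_normal()

# Random shuffle surrogate: same values, random order
shuffled = rng.permutation(x)

def lag1_correlation(v):
    """Sample correlation between v[t] and v[t+1]."""
    return np.corrcoef(v[:-1], v[1:])[0, 1]

# The amplitude distribution is preserved exactly ...
same_values = np.array_equal(np.sort(x), np.sort(shuffled))
# ... while the temporal correlation is destroyed
r_orig = lag1_correlation(x)
r_shuf = lag1_correlation(shuffled)
print(same_values, round(r_orig, 2), round(r_shuf, 2))
```

The sorted values agree because a permutation only reorders the samples, whereas the lag-1 correlation of the shuffled series should be close to zero.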

In [23]:
data = [
    go.Scatter(y=ts1, name='x1', line=dict(width=3)),
    go.Scatter(y=rs1a, name='rs1', line=dict(width=1)),
    go.Scatter(y=rs1b, name='rs2', line=dict(width=1)),
    
    go.Scatter(y=ts2, name='x2', xaxis='x2', yaxis='y2', line=dict(width=3)),
    go.Scatter(y=rs2a, name='rs1', xaxis='x2', yaxis='y2', line=dict(width=1)),
    go.Scatter(y=rs2b, name='rs2', xaxis='x2', yaxis='y2', line=dict(width=1))
]

figure = go.Figure(data=data, layout=go.Layout(
    title='Time series',
    showlegend=True,
    yaxis=dict(title='x[n]', domain=[0.55, 1.00]),
    yaxis2=dict(title='x[n]', anchor='x2', domain=[0.00, 0.45])
))

py.iplot(figure)

The random shuffle surrogates cannot be distinguished from the first time series by eye. For the second time series, the original data can be distinguished from the permuted surrogate data.

In [27]:
def autocorrelation(x):
    a = np.correlate(x, x, mode='full')
    
    return a[a.size // 2:]

ac1 = autocorrelation(ts1)
ac1a = autocorrelation(rs1a)
ac1b = autocorrelation(rs1b)

ac2 = autocorrelation(ts2)
ac2a = autocorrelation(rs2a)
ac2b = autocorrelation(rs2b)

In [28]:
data = [
    go.Scatter(y=ac1, name='x1', line=dict(width=3)),
    go.Scatter(y=ac1a, name='rs1', line=dict(width=1)),
    go.Scatter(y=ac1b, name='rs2', line=dict(width=1)),
    
    go.Scatter(y=ac2, name='x2', xaxis='x2', yaxis='y2', line=dict(width=3)),
    go.Scatter(y=ac2a, name='rs1', xaxis='x2', yaxis='y2', line=dict(width=1)),
    go.Scatter(y=ac2b, name='rs2', xaxis='x2', yaxis='y2', line=dict(width=1))
]

figure = go.Figure(data=data, layout=go.Layout(
    title='Autocorrelation',
    showlegend=True,
    yaxis=dict(title='R[k]', domain=[0.55, 1.00]),
    yaxis2=dict(title='R[k]', anchor='x2', domain=[0.00, 0.45])
))

py.iplot(figure)

We can clearly identify the random shuffle surrogates from the autocorrelation.
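
Since the raw output of `np.correlate` scales with the mean and variance of the signal, curves from different series are easier to compare after normalization. The following is a minimal sketch of a normalized variant; the name `normalized_autocorrelation` and the white-noise test input are assumptions for illustration, not part of the original notebook.

```python
import numpy as np

def normalized_autocorrelation(x):
    """Autocorrelation of the mean-removed signal, scaled so that lag 0 equals 1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # 'full' mode returns lags -(n-1)..(n-1); index n-1 is lag 0
    a = np.correlate(x, x, mode='full')[x.size - 1:]
    return a / a[0]

# Hypothetical input: white noise, whose autocorrelation should vanish for lag > 0
rng = np.random.default_rng(0)
noise = rng.standard_normal(1000)
rho = normalized_autocorrelation(noise)
print(rho[:3])
```

With this scaling all curves start at 1, so a surrogate whose curve drops to roughly zero after lag 0 is immediately recognizable as uncorrelated.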

In [29]:
def power_spectrum(x):
    X = np.fft.rfft(x)

    return np.abs(X)**2

In [31]:
ps1 = power_spectrum(ts1)
ps1a = power_spectrum(rs1a)
ps1b = power_spectrum(rs1b)

ps2 = power_spectrum(ts2)
ps2a = power_spectrum(rs2a)
ps2b = power_spectrum(rs2b)

data = [
    go.Scatter(y=ps1, name='x1', line=dict(width=3)),
    go.Scatter(y=ps1a, name='rs1', line=dict(width=1)),
    go.Scatter(y=ps1b, name='rs2', line=dict(width=1)),
    
    go.Scatter(y=ps2, name='x2', xaxis='x2', yaxis='y2', line=dict(width=3)),
    go.Scatter(y=ps2a, name='rs1', xaxis='x2', yaxis='y2', line=dict(width=1)),
    go.Scatter(y=ps2b, name='rs2', xaxis='x2', yaxis='y2', line=dict(width=1))
]

figure = go.Figure(data=data, layout=go.Layout(
    title='Power spectrum',
    showlegend=True,
    yaxis=dict(title='|X[k]|^2', domain=[0.55, 1.00]),
    yaxis2=dict(title='|X[k]|^2', anchor='x2', domain=[0.00, 0.45])
))

py.iplot(figure)

The power spectrum can also be used to differentiate the random shuffle surrogates from the true data.

### Random phases

One can obtain random phase surrogate data by assigning uniformly distributed random phases to the Fourier transform of the true data while keeping the amplitudes. The power spectra of the true data and of the random phase surrogates are therefore equal.

The null hypothesis in this case is that the data originates from a linear Gaussian process.

In [97]:
def random_phases(x):
    N = len(x)
    X = np.fft.fft(x)
    
    # X[0] is the zero-frequency term (proportional to the mean) and is left untouched,
    # so N-1 coefficients carry information about the spectrum.
    # For a real signal the spectrum is conjugate symmetric, X[k] = X[N-k]*,
    # so only (N-1)//2 independent random phases are needed.
    u = np.random.uniform(0, 2*np.pi, (N-1)//2)
    # For even N the Nyquist term X[N//2] keeps its original phase;
    # for odd N there is no Nyquist term.
    nyquist = [np.angle(X[N//2])] if N % 2 == 0 else []
    u = np.concatenate([np.flip(-u), nyquist, u])
    
    X[1:] = np.abs(X[1:]) * np.exp(1j * u)
    
    return np.fft.ifft(X).real

rp1a = random_phases(ts1)
rp1b = random_phases(ts1)

rp2a = random_phases(ts2)
rp2b = random_phases(ts2)

In [98]:
data = [
    go.Scatter(y=ts1, name='x1', line=dict(width=1)),
    go.Scatter(y=rp1a, name='rs1', line=dict(width=1)),
    go.Scatter(y=rp1b, name='rs2', line=dict(width=1)),
    
    go.Scatter(y=ts2, name='x2', xaxis='x2', yaxis='y2', line=dict(width=1)),
    go.Scatter(y=rp2a, name='rs1', xaxis='x2', yaxis='y2', line=dict(width=1)),
    go.Scatter(y=rp2b, name='rs2', xaxis='x2', yaxis='y2', line=dict(width=1))
]

figure = go.Figure(data=data, layout=go.Layout(
    title='Time series',
    showlegend=True,
    yaxis=dict(title='x[n]', domain=[0.55, 1.00]),
    yaxis2=dict(title='x[n]', anchor='x2', domain=[0.00, 0.45])
))

py.iplot(figure)

Again, it is very difficult to distinguish the random phase surrogate data from the true data in the time series plot.

In [96]:
ps1 = power_spectrum(ts1)
ps1a = power_spectrum(rp1a)
ps1b = power_spectrum(rp1b)

ps2 = power_spectrum(ts2)
ps2a = power_spectrum(rp2a)
ps2b = power_spectrum(rp2b)

data = [
    go.Scatter(y=ps1, name='x1', line=dict(width=2)),
    go.Scatter(y=ps1a, name='rs1', line=dict(width=2)),
    go.Scatter(y=ps1b, name='rs2', line=dict(width=2)),
    
    go.Scatter(y=ps2, name='x2', xaxis='x2', yaxis='y2', line=dict(width=2)),
    go.Scatter(y=ps2a, name='rs1', xaxis='x2', yaxis='y2', line=dict(width=2)),
    go.Scatter(y=ps2b, name='rs2', xaxis='x2', yaxis='y2', line=dict(width=2))
]

figure = go.Figure(data=data, layout=go.Layout(
    title='Power spectrum',
    showlegend=True,
    yaxis=dict(title='|X[k]|^2', domain=[0.55, 1.00]),
    yaxis2=dict(title='|X[k]|^2', anchor='x2', domain=[0.00, 0.45])
))

py.iplot(figure)

As expected, the power spectra are an exact match.
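
The match can also be checked numerically instead of by eye. The sketch below uses an alternative formulation based on `np.fft.rfft`, which stores only the non-redundant half of the spectrum so that the conjugate symmetry is handled by `np.fft.irfft`. The function name `random_phases_rfft` and the random-walk test signal are assumptions for illustration.

```python
import numpy as np

def random_phases_rfft(x, rng):
    """Random phase surrogate built on the one-sided spectrum."""
    x = np.asarray(x, dtype=float)
    n = x.size
    X = np.fft.rfft(x)
    # bins 1..k have a free phase; the mean (bin 0) and, for even n,
    # the Nyquist bin stay untouched
    k = (n - 1) // 2
    phases = rng.uniform(0, 2 * np.pi, k)
    X[1:k + 1] = np.abs(X[1:k + 1]) * np.exp(1j * phases)
    return np.fft.irfft(X, n=n)

rng = np.random.default_rng(1)
x = np.cumsum(rng.standard_normal(512))  # hypothetical test signal
s = random_phases_rfft(x, rng)

# Largest deviation between the two power spectra, relative to the peak power
px = np.abs(np.fft.rfft(x)) ** 2
ps = np.abs(np.fft.rfft(s)) ** 2
max_rel_err = np.max(np.abs(ps - px)) / np.max(px)
print(max_rel_err)
```

The deviation should be at the level of floating point round-off, while the surrogate itself differs from the input in the time domain.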

In [99]:
ac1a = autocorrelation(rp1a)
ac1b = autocorrelation(rp1b)
ac2a = autocorrelation(rp2a)
ac2b = autocorrelation(rp2b)

data = [
    go.Scatter(y=ac1, name='x1', line=dict(width=3)),
    go.Scatter(y=ac1a, name='rs1', line=dict(width=1)),
    go.Scatter(y=ac1b, name='rs2', line=dict(width=1)),
    
    go.Scatter(y=ac2, name='x2', xaxis='x2', yaxis='y2', line=dict(width=3)),
    go.Scatter(y=ac2a, name='rs1', xaxis='x2', yaxis='y2', line=dict(width=1)),
    go.Scatter(y=ac2b, name='rs2', xaxis='x2', yaxis='y2', line=dict(width=1))
]

figure = go.Figure(data=data, layout=go.Layout(
    title='Autocorrelation',
    showlegend=True,
    yaxis=dict(title='R[k]', domain=[0.55, 1.00]),
    yaxis2=dict(title='R[k]', anchor='x2', domain=[0.00, 0.45])
))

py.iplot(figure)

The autocorrelation functions of the surrogates look similar to those of the true data, especially for the second dataset.

### Amplitude adjusted Fourier transform

The AAFT surrogate null hypothesis is that the process originates from a linear stochastic process that has undergone a static nonlinear transformation.

In order to obtain AAFT surrogate data from the true data one has to:

1. Rescale the true data to a Gaussian distribution (Gaussianization)
2. Randomize the Fourier phases
3. Invert the Gaussianization
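
The three steps above can be sketched with rank ordering, which is how AAFT is commonly described in the literature; the notebook's own `aaft` cell uses a Box-Cox transform for the Gaussianization instead. The names `aaft_rank` and `_phase_randomize` and the exponentiated AR(1) test signal are assumptions for illustration.

```python
import numpy as np

def _phase_randomize(x, rng):
    """Randomize Fourier phases while keeping the amplitudes (helper)."""
    X = np.fft.rfft(x)
    k = (x.size - 1) // 2
    X[1:k + 1] = np.abs(X[1:k + 1]) * np.exp(1j * rng.uniform(0, 2 * np.pi, k))
    return np.fft.irfft(X, n=x.size)

def aaft_rank(x, rng):
    x = np.asarray(x, dtype=float)
    n = x.size
    # 1. Gaussianization: Gaussian samples arranged in the rank order of x
    ranks_x = np.argsort(np.argsort(x))
    y = np.sort(rng.standard_normal(n))[ranks_x]
    # 2. Randomize the Fourier phases of the Gaussianized series
    y_rp = _phase_randomize(y, rng)
    # 3. Invert the Gaussianization: original values in the rank order of y_rp
    ranks_y = np.argsort(np.argsort(y_rp))
    return np.sort(x)[ranks_y]

# Hypothetical test signal: a linear AR(1) process seen through exp()
rng = np.random.default_rng(2)
g = np.zeros(1000)
for i in range(1, g.size):
    g[i] = 0.8 * g[i - 1] + rng.standard_normal()
x = np.exp(g)
s = aaft_rank(x, rng)
```

Because the last step only reorders the original samples, the surrogate has exactly the same amplitude distribution as the data; its power spectrum matches only approximately.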



In [161]:
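from scipy.stats import boxcox
from scipy.special import inv_boxcox

def aaft(x):
    # Box-Cox requires strictly positive input, so shift the data first
    u = np.abs(x.min()) + 1
    # Gaussianize with a Box-Cox transform (lamda is fitted by maximum likelihood)
    y, lamda = boxcox(x + u)
    y = random_phases(y)
    
    # undo the Box-Cox transform and the shift
    return inv_boxcox(y, lamda) - u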

In [162]:
aaft1a = aaft(ts1)
aaft1b = aaft(ts1)
aaft2a = aaft(ts2)
aaft2b = aaft(ts2)

In [163]:
data = [
    go.Scatter(y=ts1, name='x1', line=dict(width=1)),
    go.Scatter(y=aaft1a, name='rs1', line=dict(width=1)),
    go.Scatter(y=aaft1b, name='rs2', line=dict(width=1)),
    
    go.Scatter(y=ts2, name='x2', xaxis='x2', yaxis='y2', line=dict(width=1)),
    go.Scatter(y=aaft2a, name='rs1', xaxis='x2', yaxis='y2', line=dict(width=1)),
    go.Scatter(y=aaft2b, name='rs2', xaxis='x2', yaxis='y2', line=dict(width=1))
]

figure = go.Figure(data=data, layout=go.Layout(
    title='Time series',
    showlegend=True,
    yaxis=dict(title='x[n]', domain=[0.55, 1.00]),
    yaxis2=dict(title='x[n]', anchor='x2', domain=[0.00, 0.45])
))

py.iplot(figure)

In [164]:
ps1 = power_spectrum(ts1)
ps1a = power_spectrum(aaft1a)
ps1b = power_spectrum(aaft1b)

ps2 = power_spectrum(ts2)
ps2a = power_spectrum(aaft2a)
ps2b = power_spectrum(aaft2b)

data = [
    go.Scatter(y=ps1, name='x1', line=dict(width=3)),
    go.Scatter(y=ps1a, name='rs1', line=dict(width=1)),
    go.Scatter(y=ps1b, name='rs2', line=dict(width=1)),
    
    go.Scatter(y=ps2, name='x2', xaxis='x2', yaxis='y2', line=dict(width=3)),
    go.Scatter(y=ps2a, name='rs1', xaxis='x2', yaxis='y2', line=dict(width=1)),
    go.Scatter(y=ps2b, name='rs2', xaxis='x2', yaxis='y2', line=dict(width=1))
]

figure = go.Figure(data=data, layout=go.Layout(
    title='Power spectrum',
    showlegend=True,
    yaxis=dict(title='|X[k]|^2', domain=[0.55, 1.00]),
    yaxis2=dict(title='|X[k]|^2', anchor='x2', domain=[0.00, 0.45])
))

py.iplot(figure)

In [165]:
ac1a = autocorrelation(aaft1a)
ac1b = autocorrelation(aaft1b)
ac2a = autocorrelation(aaft2a)
ac2b = autocorrelation(aaft2b)

data = [
    go.Scatter(y=ac1, name='x1', line=dict(width=3)),
    go.Scatter(y=ac1a, name='rs1', line=dict(width=1)),
    go.Scatter(y=ac1b, name='rs2', line=dict(width=1)),
    
    go.Scatter(y=ac2, name='x2', xaxis='x2', yaxis='y2', line=dict(width=3)),
    go.Scatter(y=ac2a, name='rs1', xaxis='x2', yaxis='y2', line=dict(width=1)),
    go.Scatter(y=ac2b, name='rs2', xaxis='x2', yaxis='y2', line=dict(width=1))
]

figure = go.Figure(data=data, layout=go.Layout(
    title='Autocorrelation',
    showlegend=True,
    yaxis=dict(title='R[k]', domain=[0.55, 1.00]),
    yaxis2=dict(title='R[k]', anchor='x2', domain=[0.00, 0.45])
))

py.iplot(figure)

The AAFT surrogate data looks amazingly similar to the true data.

### Iterative Amplitude adjusted Fourier transform

The IAAFT surrogate extends the AAFT scheme by iterating the adjustment steps, alternately imposing the power spectrum and the amplitude distribution of the true data, until the autocorrelation function of the surrogate is close to that of the true data. As the AAFT approach already gave quite good results, we do not pursue this route here.
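
For reference, the iteration can be sketched as follows. This is a minimal illustration under stated assumptions rather than a tuned implementation: the function name `iaaft`, the `max_iter` limit, the stopping rule (rank order no longer changes), and the AR(1) test signal are all choices made here.

```python
import numpy as np

def iaaft(x, rng, max_iter=100):
    x = np.asarray(x, dtype=float)
    n = x.size
    target_amp = np.abs(np.fft.rfft(x))  # Fourier amplitudes to impose
    sorted_x = np.sort(x)                # amplitude distribution to impose
    s = rng.permutation(x)               # start from a random shuffle
    prev_ranks = None
    for _ in range(max_iter):
        # Step 1: impose the target spectrum, keep the current phases
        phases = np.angle(np.fft.rfft(s))
        s = np.fft.irfft(target_amp * np.exp(1j * phases), n=n)
        # Step 2: impose the target distribution by rank ordering
        ranks = np.argsort(np.argsort(s))
        s = sorted_x[ranks]
        # Stop once the rank order no longer changes
        if prev_ranks is not None and np.array_equal(ranks, prev_ranks):
            break
        prev_ranks = ranks
    return s

# Hypothetical test signal: AR(1) process
rng = np.random.default_rng(3)
x = np.zeros(1024)
for i in range(1, x.size):
    x[i] = 0.9 * x[i - 1] + rng.standard_normal()
s = iaaft(x, rng)
```

Ending each iteration with the rank-ordering step means the amplitude distribution is matched exactly, while the power spectrum is matched approximately.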

### Evaluation

We visually inspected three different surrogate methods, each testing a different null hypothesis about the process responsible for the data:

1. uncorrelated white noise process (RS)
2. linear Gaussian process (RP)
3. linear Gaussian process with a static nonlinear transform (AAFT)

Except for the AAFT surrogates, the original data could easily be distinguished visually from the surrogate data by inspecting the autocorrelation and the power spectrum. However, a quantitative assessment is still missing.
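
One way to make the comparison quantitative is a rank-based test: compute a discriminating statistic for the data and for many surrogates, and count how often the surrogates are at least as extreme. The sketch below is only an illustration under several assumptions that are not part of the notebook: the statistic (a time-reversal asymmetry measure), the number of surrogates (99), the random phase null hypothesis, and the logistic map used as test data in place of `ts1` and `ts2`.

```python
import numpy as np

def phase_randomized(x, rng):
    """Random phase surrogate with the same power spectrum as x."""
    X = np.fft.rfft(x)
    k = (x.size - 1) // 2
    X[1:k + 1] = np.abs(X[1:k + 1]) * np.exp(1j * rng.uniform(0, 2 * np.pi, k))
    return np.fft.irfft(X, n=x.size)

def time_reversal_asymmetry(x, lag=1):
    """Mean cubed increment; close to zero for time-reversible linear Gaussian processes."""
    d = x[lag:] - x[:-lag]
    return np.mean(d ** 3)

def surrogate_p_value(x, statistic, n_surrogates=99, seed=0):
    """Two-sided rank-based p-value of the data against random phase surrogates."""
    rng = np.random.default_rng(seed)
    t0 = abs(statistic(x))
    ts = np.array([abs(statistic(phase_randomized(x, rng)))
                   for _ in range(n_surrogates)])
    # the data count as one member of the ensemble
    return (1 + np.sum(ts >= t0)) / (n_surrogates + 1)

# Hypothetical nonlinear test data: the logistic map x[i] = 4 x[i-1] (1 - x[i-1])
x = np.empty(2000)
x[0] = 0.3
for i in range(1, x.size):
    x[i] = 4.0 * x[i - 1] * (1.0 - x[i - 1])

p = surrogate_p_value(x, time_reversal_asymmetry)
print(p)
```

With 99 surrogates the smallest attainable p-value is 0.01, so a result at that floor means no surrogate was as extreme as the data. Other statistics or surrogate types (for example AAFT) can be substituted by swapping the corresponding function.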