# Convergent Cross Mapping (CCM) Algorithm
This notebook is an attempt to replicate the results obtained by Sugihara et al. in their [2012 Science publication](http://science.sciencemag.org/content/338/6106/496). The algorithm's performance under various input parameters is then investigated.

## Section 0: Library importing, environment configurations and data importing

### Import relevant packages

In [1]:
# Importing packages
import numpy as np
import os
# Plot.ly visualisations
import plotly
import plotly.offline as pyo # Plot.ly visualisations
import plotly.graph_objs as go # Plot.ly visualisations

### Configure environment

In [2]:
%config InlineBackend.figure_format = 'retina'
np.set_printoptions(precision=3)

# Activate Plotly Offline for Jupyter
pyo.init_notebook_mode(connected=True)

### Define data generation functions

In [3]:
def generate_two_species(gamma_xy, gamma_yx, r_x=3.7, r_y=3.8, N=1000, randomise=False):
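    '''
    Generate values for a coupled two-species logistic difference system.
    gamma_xy scales the effect of Y on X; gamma_yx scales the effect of X on Y.
    If randomise is True, the two coupling parameters are swapped at random intervals.
    '''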
    def switch(a, b):
        '''
        Switches the values of 'a' and 'b'
        '''
        return b, a
    
    def generate_random_periods(N, low, high):
        '''
        Generate an array of cumulative time indices at which the coupling parameters should be switched.
        Inputs:
            N:    Length of the time series (used as the final switching index)
            low:  Lower bound (inclusive) for the uniformly sampled period lengths
            high: Upper bound (exclusive) for the uniformly sampled period lengths
        '''
        # Draw integer period lengths uniformly from [low, high) until they cover N time points
        temp = np.array([])
        while np.sum(temp) < N:
            temp = np.append(temp, np.random.randint(low, high))
        
        # Convert period lengths into cumulative switching indices; cap the last one at N
        temp = np.cumsum(temp)
        temp[-1] = N
        
        return temp
    
    
    ###################
    # Function begins #
    ###################
    
    # Add 20 time points to N to account for burn-in
    N += 20
    
    ex = {
        'X': np.zeros((N,)),
        'Y': np.zeros((N,)),
        'gamma_xy': np.zeros((N,)),
        'gamma_yx': np.zeros((N,)),
    }
    
    ex['X'][0] = np.random.uniform(0, .1)
    ex['Y'][0] = np.random.uniform(0, .1)
    
    # If randomise is True, generate an array of random time indices at which to swap the gamma parameters
    if randomise:
        periods = generate_random_periods(N, 50, 200)
        ex['periods'] = periods
    
    for k in range(N - 1):
        # Swap the coupling parameters once the next switching index is reached
        if randomise and (k == periods[0] - 1):
            gamma_xy, gamma_yx = switch(gamma_xy, gamma_yx)
            periods = np.delete(periods, 0)
            
        ex['X'][k + 1] = ex['X'][k] * (r_x - r_x * ex['X'][k] - gamma_xy * ex['Y'][k])
        ex['Y'][k + 1] = ex['Y'][k] * (r_y - r_y * ex['Y'][k] - gamma_yx * ex['X'][k])
        ex['gamma_xy'][k + 1] = gamma_xy
        ex['gamma_yx'][k + 1] = gamma_yx
        
    ex['X'] = ex['X'][20:]
    ex['Y'] = ex['Y'][20:]
    ex['gamma_xy'] = ex['gamma_xy'][20:]
    ex['gamma_yx'] = ex['gamma_yx'][20:] 
        
    return ex

def generate_delayed_vector(data, embed_dim, delay=1):
    '''
    Generate a delayed embedding vector of the data.
    Input:
        data:      A dictionary containing keys 'X', 'Y', 'gamma_xy', and 'gamma_yx'.
        embed_dim: Embedding dimensions (int)
        delay:     Number of samples between each time series point (int)
    '''
    
    assert embed_dim > 1
    assert delay >= 1

    N = len(data['X'])
    N_embed = N - (embed_dim - 1) * delay

    data_embed = {
        'X': np.zeros((N_embed, embed_dim)),
        'Y': np.zeros((N_embed, embed_dim)),
    }

    for i in range(embed_dim):
        data_embed['X'][:,i] = data['X'][(i * delay):(N_embed + i * delay)]
        data_embed['Y'][:,i] = data['Y'][(i * delay):(N_embed + i * delay)]
        
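    # Align the coupling parameters with the last column (current time) of each embedded row
    data_embed['gamma_xy'] = data['gamma_xy'][(embed_dim - 1) * delay:]
    data_embed['gamma_yx'] = data['gamma_yx'][(embed_dim - 1) * delay:]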
        
    return data_embed

### Define plotting functions

In [4]:
def bar_chart(xval, yval, xtitle, ytitle, title):
    '''
    Generate a bar chart.
    Inputs:
        xval: 1-D array of values for x-axis
        yval: 1-D array of values for y-axis
        xtitle: x-axis title (string)
        ytitle: y-axis title (string)
        title: Chart title (string)
    '''
    # Ensure xval and yval have the same length
    assert len(xval) == len(yval)
    
    # Ensure title variables are of type string
    for var in [xtitle, ytitle, title]:
        assert type(var) is str
    
    # Create trace
    trace = go.Bar(
        x = xval,
        y = yval
    )
    
    layout = go.Layout(
        title = title,
        xaxis = {'title': xtitle},
        yaxis = {'title': ytitle},
        barmode = 'group'
    )
    
    pyo.iplot(go.Figure(data=[trace], layout=layout))

### Define CCM algorithm

In [5]:
def CCM(data, target, k, attractor_viz=False, prediction_corr_viz=False):
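    '''
    Perform the convergent cross-mapping (CCM) algorithm described by Sugihara et al. (2012).
    Inputs:
        data:                Shadow manifold used to find nearest neighbours, a numpy array (N x P)
        target:              Target values to predict (N x P)
        k:                   Number of nearest neighbours (scalar)
        attractor_viz:       If True, animate the nearest neighbours on both attractors (requires P >= 3)
        prediction_corr_viz: If True, plot predictions against target values
    Returns:
        predictions: Predicted target values (N x P)
        causality:   Pearson correlation between the last columns of target and predictions (float)
    '''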
    def euclidean_dist(A, B=None):
        '''
        Calculate the Euclidean distances between rows of matrix A and rows of B.
        If B is None, calculates the pairwise distances between all rows of A.
        Inputs:
            A: A matrix (a x P)
            B: Neighbour coordinates for each row of A (a x k x P), or None
        Returns:
            A distance matrix: (a x a) if B is None, otherwise (a x k), where entry (i, j)
            is the distance from the i-th row of A to its j-th neighbour in B.
        '''
        # Define input matrices with expanded dimensions
        A_expanded = np.expand_dims(A, 2)
        
        # Calculate distance of each point and every other point
        if B is None:
            return np.sqrt(np.sum(np.square(A_expanded - np.transpose(A_expanded, (2, 1, 0))), axis=1))
        else:
            return np.sqrt(np.sum(np.square(np.transpose(A_expanded, (0,2,1)) - B), axis=2))
   
    def kNN(k, data):
        '''
        Return the nearest neighbours to each row in data in the form of a responsibility matrix.
        Inputs:
            k:    Number of nearest neighbours (scalar)
            data: Data to perform k-NN, a numpy array (N x P)
        Returns:
            A responsibility matrix (N x k), listing the indices of the k-nearest neighbours for each row
        '''

        def responsibilities(k, distances):
            '''
            Finds the k-nearest neighbours to each point by index.
            Inputs:
                k:         Number of nearest neighbours (scalar)
                distances: A distance matrix (N x N)
            Returns:
                A responsibility matrix (N x k), listing the indices of the k-nearest neighbours for each row
            '''
            # Column 0 of each sorted row is the point itself (distance 0), so it is skipped
            return np.argsort(distances)[:,1:(k + 1)]

        return responsibilities(k, euclidean_dist(data))

    def predict_target(data, target, responsibilities):
        '''
        Perform a prediction of the target based on a weighting of contemporaneous neighbours of data.
        Inputs:
            data:             Data values (N x P)
            target:           Target values to perform prediction (N x P)
            responsibilities: A responsibility matrix (N x k)
        Returns:
            An array of predicted target values (N x P)
        '''

        def calculate_weights(data, responsibilities):
            '''
            Calculate weights based on the k-nearest neighbours
            Inputs:
                data:             Data values (N x P)
                responsibilities: A responsibility matrix (N x k)
            Returns:
                A matrix of weights (N x k)
            '''
            # Distances from each point to its k nearest neighbours (N x k), nearest first
            distances = euclidean_dist(data, data[responsibilities])

            # Numerator: exp(-d_i / d_1), where d_1 is the nearest-neighbour distance
            # (guarded against division by zero if duplicate points exist)
            nearest = np.maximum(distances[:, :1], np.finfo(float).tiny)
            numerator = np.exp(-np.divide(distances, nearest))

            # Calculate denominator
            denominator = np.sum(numerator, axis=1, keepdims=True)

            # Calculate and return weights
            return np.divide(numerator, denominator)
        
        weights = calculate_weights(data, responsibilities)
        return np.sum(target[responsibilities] * np.expand_dims(weights, axis=2), axis=1)
    
    def visualise_attractor(data, target, responsibilities, predictions):
        '''
        Produce a Plotly animation on a 1 x 2 subplot to visualise nearest neighbours of data and target.
        Only the first three columns are plotted, so P >= 3 is required.
        Inputs:
            data:             Data values (N x P)
            target:           Target values (N x P)
            responsibilities: A responsibility matrix (N x k)
            predictions:      Predicted values for target (N x P)
        '''
        
        # Define colour list
        colour_list = np.array(['#b3b3b3', '#0f3957', '#1f77b4', '#ff7f0e'])

        # Define blank figure
        figure = {
            'data': [],
            'layout': {},
            'frames': []
        }

        # Create layout
        figure['layout'] = {
            'width': 1000,
            'height': 700,
            'scene1': {
                'domain': {
                    'x': [0, 0.45],
                    'y': [0., 1.]
                }
            },
            'scene2': {
                'domain': {
                    'x': [0.55, 1.],
                    'y': [0., 1.]
                }
            },
            'title': 'Visualising Nearest Neighbours on Attractors',
            'showlegend': False
        }
        
        # Define buttons
        figure['layout']['updatemenus'] = [
            {
                'buttons': [
                    {
                        'args': [None, {'frame': {'duration': 1000, 'redraw': False},
                                 'fromcurrent': True, 'transition': {'duration': 0, 'easing': 'quadratic-in-out'}}],
                        'label': 'Play',
                        'method': 'animate'
                    },
                    {
                        'args': [[None], {'frame': {'duration': 0, 'redraw': False}, 'mode': 'immediate',
                        'transition': {'duration': 0}}],
                        'label': 'Pause',
                        'method': 'animate'
                    }
                ],
                'direction': 'left',
                'pad': {'r': 10, 't': 87},
                'showactive': False,
                'type': 'buttons',
                'x': 0.1,
                'xanchor': 'right',
                'y': 0,
                'yanchor': 'top'
            }
        ]
        
        # Define slider dictionary
        slider_dict = {
            'active': 0, # Slider knob's relative starting location
            'pad': {'b': 10, 't': 50}, # Bottom and top padding
            'len': 0.9, # Slider length
            'x': 0.1, # Slider x-position
            'y': 0, # Slider y-position
            'yanchor': 'top', 
            'xanchor': 'left',
            'currentvalue': { # Displays current value selected by slider
                'font': {'size': 20},
                'prefix': 'Time index: ',
                'visible': True,
                'xanchor': 'right'
            },
            'transition': {'duration': 300, 'easing': 'cubic-in-out'},
            'steps': []
        }

        # Create frames
        for i in range(len(predictions)):
            # Define a dictionary for each frame
            frame = {
                'data': [],
                'name': str(i + 1) # Used to connect each frame to slider value
            }
            
            # Create raw data trace
            data_trace = {
                'x': data[:,0],
                'y': data[:,1],
                'z': data[:,2],
                'mode': 'markers',
                'type': 'scatter3d',
                'hoverinfo': 'none',
                'scene': 'scene1',
                'marker': {
                    'size': 4,
                    'color': colour_list[0]
                }
            }
            
            # Create source point trace
            source_trace = {
                'x': [data[i,0]],
                'y': [data[i,1]],
                'z': [data[i,2]],
                'mode': 'markers',
                'type': 'scatter3d',
                'name': 'Source',
                'scene': 'scene1',
                'hoverinfo': 'name',
                'marker': {
                    'size': 10,
                    'symbol': 'diamond',
                    'color': colour_list[2],
                    'line': {'width': 1}
                }
            }
            
            # Create source neighbours trace
            source_neighbour_trace = {
                'x': data[responsibilities[i,:],0],
                'y': data[responsibilities[i,:],1],
                'z': data[responsibilities[i,:],2],
                'mode': 'markers',
                'type': 'scatter3d',
                'name': 'Source Neighbour',
                'scene': 'scene1',
                'hoverinfo': 'name',
                'marker': {
                    'size': 4,
                    'color': colour_list[2],
                }
            }
            
            # Create target trace
            target_trace = {
                'x': target[:,0],
                'y': target[:,1],
                'z': target[:,2],
                'mode': 'markers',
                'type': 'scatter3d',
                'scene': 'scene2',
                'hoverinfo': 'none',
                'marker': {
                    'size': 4,
                    'color': colour_list[0]
                }
            }
            
            # Create destination point trace
            actual_destination_trace = {
                'x': [target[i,0]],
                'y': [target[i,1]],
                'z': [target[i,2]],
                'mode': 'markers',
                'type': 'scatter3d',
                'name': 'Actual Target',
                'scene': 'scene2',
                'hoverinfo': 'name',
                'marker': {
                    'size': 12,
                    'symbol': 'diamond',
                    'color': colour_list[3],
                    'line': {'width': 2}
                }
            }
            
            # Create destination neighbours trace
            destination_neighbour_trace = {
                'x': target[responsibilities[i,:],0],
                'y': target[responsibilities[i,:],1],
                'z': target[responsibilities[i,:],2],
                'mode': 'markers',
                'type': 'scatter3d',
                'name': 'Target Neighbours',
                'scene': 'scene2',
                'hoverinfo': 'name',
                'marker': {
                    'size': 4,
                    'color': colour_list[2],
                }
            } 
            
            # Create predicted destination trace
            predicted_destination_trace = {
                'x': [predictions[i,0]],
                'y': [predictions[i,1]],
                'z': [predictions[i,2]],
                'mode': 'markers',
                'type': 'scatter3d',
                'name': 'Predicted Target',
                'scene': 'scene2',
                'hoverinfo': 'name',
                'marker': {
                    'size': 10,
                    'symbol': 'diamond',
                    'color': colour_list[2],
                    'line': {'width': 2}
                }
            }

            # Append traces to frame
            for trace in [data_trace, source_trace, source_neighbour_trace, \
                          target_trace, destination_neighbour_trace, \
                          actual_destination_trace, predicted_destination_trace]:
                frame['data'].append(trace)
            
            # Append frame to figure
            figure['frames'].append(frame)
            
            # Define slider step
            slider_step = {
                'args': [
                    [str(i + 1)],  # Must match the frame names, which are strings
                    {'frame': {'duration': 300, 'redraw': False},
                     'mode': 'immediate',
                     'transition': {'duration': 0}}
                ],
                'label': i + 1,
                'method': 'animate'
            }
            
            # Append slider step to slider dictionary
            slider_dict['steps'].append(slider_step)
            
        # Add sliders to layout
        figure['layout']['sliders'] = [slider_dict]
        
        # Define figure['data']
        figure['data'] = figure['frames'][0]['data']
        
        # Display the animation inline
        pyo.iplot(figure)
    
    def visualise_predictions(target, predictions):
        '''
        Create a scatterplot visualising predictions vs. target.
        Inputs:
            target:      Target values (N x P)
            predictions: Prediction values (N x P)
        '''
        trace = go.Scatter(
            x = target[:,-1],
            y = predictions[:,-1],
            mode = 'markers',
        )
        
        line_trace = go.Scatter(
            x = [0, 1],
            y = [0, 1],
            mode = 'lines',
            hoverinfo = 'none',
            line = {
                'color': '#000000',
                'dash': 'dash',
                'width': 3
            }
        )
        
        layout = go.Layout(
            title = 'Correlation Plot (r = {})'\
                    .format(np.round(np.corrcoef(target[:,-1], predictions[:,-1])[0,-1], 3)),
            showlegend = False,
            height = 800,
            width = 700,
            xaxis = {'title': 'Target',},
            yaxis = {'title': 'Prediction', 'scaleanchor': 'x'},
        )
        
        figure = go.Figure(data=[trace, line_trace], layout=layout)
        pyo.iplot(figure)
        
    
    ###################
    # Function begins #
    ###################
    
    # Find indices of k-nearest neighbours
    responsibilities = kNN(k, data)
    print 'k-NN complete!'
    
    # Calculate predicted target values
    predictions = predict_target(data, target, responsibilities)
    print 'Predictions complete!'
    
    # Create interactive attractor animation
    if attractor_viz == True:
        visualise_attractor(data, target, responsibilities, predictions)
        
    # Create correlation plot
    if prediction_corr_viz == True:
        visualise_predictions(target, predictions)
        
    # Calculate causality based on correlation
    causality = np.round(np.corrcoef(target[:,-1], predictions[:,-1])[0,-1], 3)
    
    return predictions, causality

## Section 1: Convergent Cross Mapping (CCM) Algorithm for Detecting Causality of Non-linear Dynamical Time Series
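The time series are generated from the following coupled two-species logistic difference equations, where $\gamma_{xy}$ scales the effect of $Y$ on $X$ and $\gamma_{yx}$ scales the effect of $X$ on $Y$: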

$
\begin{align}
X(t + 1) &= X(t) \left[ r_x - r_x X(t) - \gamma_{xy} Y(t) \right] \\
Y(t + 1) &= Y(t) \left[ r_y - r_y Y(t) - \gamma_{yx} X(t) \right]
\end{align}
$

The initial conditions $X(0)$ and $Y(0)$ are drawn from a uniform distribution between 0 and 0.1 (i.e. $U(0, 0.1)$), and the first 20 time steps are discarded as burn-in.

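The delay-embedded time series of $\mathbf{Y}$ and $\mathbf{X}$, with embedding dimension $L = 2$ and lag $\tau = 1$, are stored in the variables '`data`' and '`target`' respectively. To test $\mathbf{X} \rightarrow \mathbf{Y}$, CCM uses nearest neighbours on the shadow manifold of $\mathbf{Y}$ to estimate $\mathbf{X}$: if $\mathbf{X}$ drives $\mathbf{Y}$, information about $\mathbf{X}$ is encoded in the history of $\mathbf{Y}$.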
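A minimal sketch of the delay embedding described above, using a hypothetical helper `delay_embed` on a toy integer series so that the rows are easy to read. It follows the same column ordering as `generate_delayed_vector` (row $t$ is $[s(t), s(t + \tau), \dots, s(t + (L - 1)\tau)]$) and shows that each embedding is $(L - 1)\tau$ rows shorter than the series.

```python
import numpy as np

def delay_embed(series, L=2, tau=1):
    """Return the (N - (L-1)*tau) x L delay-embedding matrix of a 1-D series.

    Row t is [s(t), s(t + tau), ..., s(t + (L-1)*tau)], the same column
    ordering used by generate_delayed_vector.
    """
    series = np.asarray(series)
    n_embed = len(series) - (L - 1) * tau
    return np.column_stack([series[i * tau:i * tau + n_embed] for i in range(L)])

print(delay_embed(np.arange(6), L=2, tau=1))
# [[0 1]
#  [1 2]
#  [2 3]
#  [3 4]
#  [4 5]]
print(delay_embed(np.arange(6), L=3, tau=2))
# [[0 2 4]
#  [1 3 5]]
```

Because the embedding only slices the series, it costs no extra computation beyond copying $L$ shifted views into one matrix.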

The growth-rate parameters for all examples are set to $r_x = 3.7$ and $r_y = 3.8$.

The example below is a unidirectionally coupled system in which $\mathbf{X}$ drives $\mathbf{Y}$ (i.e. $\mathbf{X} \rightarrow \mathbf{Y}$) but $\mathbf{Y}$ has no effect on $\mathbf{X}$. The coupling parameters are therefore set to:

$
\begin{align}
\gamma_{xy} &= 0 \\
\gamma_{yx} &= 0.32
\end{align}
$
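To illustrate what these coupling parameters mean, the sketch below iterates the model equations directly. It assumes fixed initial conditions $X(0) = 0.2$ and $Y(0) = 0.4$ (illustrative values, not the random ones drawn by `generate_two_species`), and the helper name `coupled_logistic` is hypothetical. With $\gamma_{xy} = 0$, changing $Y(0)$ leaves the $\mathbf{X}$ trajectory unchanged, while the $\mathbf{Y}$ trajectory changes.

```python
import numpy as np

def coupled_logistic(x0, y0, n, r_x=3.7, r_y=3.8, gamma_xy=0.0, gamma_yx=0.32):
    """Iterate the coupled two-species logistic difference equations n times.

    gamma_xy scales the effect of Y on X; gamma_yx scales the effect of X on Y.
    """
    x = np.empty(n)
    y = np.empty(n)
    x[0], y[0] = x0, y0
    for t in range(n - 1):
        x[t + 1] = x[t] * (r_x - r_x * x[t] - gamma_xy * y[t])
        y[t + 1] = y[t] * (r_y - r_y * y[t] - gamma_yx * x[t])
    return x, y

# Same X(0), different Y(0): with gamma_xy = 0 the X trajectories are identical
x_a, y_a = coupled_logistic(0.2, 0.4, 50)
x_b, y_b = coupled_logistic(0.2, 0.1, 50)
print(np.array_equal(x_a, x_b))  # True
print(np.array_equal(y_a, y_b))  # False
```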

The strength of causality is inferred from the Pearson correlation coefficient, $r$, between the true and cross-mapped (predicted) values shown in each plot.
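As a worked example of the weighting used inside `CCM` (see `calculate_weights`), the sketch below computes the exponential weights $u_i = \exp(-d_i / d_1)$, normalised to sum to one, for three neighbours, forms the cross-mapped estimate as a weighted average, and scores a toy series with Pearson's $r$. The distances and series values are made-up illustrative numbers, and `simplex_weights` is a hypothetical helper name.

```python
import numpy as np

def simplex_weights(distances):
    """Exponential weights for neighbour distances sorted in ascending order.

    u_i = exp(-d_i / d_1), with d_1 the nearest-neighbour distance, normalised
    so that the weights sum to one.
    """
    d = np.asarray(distances, dtype=float)
    # Guard against a zero nearest-neighbour distance (duplicate points)
    u = np.exp(-d / max(d[0], np.finfo(float).tiny))
    return u / u.sum()

# Three neighbours of one point on Y's shadow manifold (illustrative distances)
w = simplex_weights([0.1, 0.2, 0.4])
# X at the neighbours' time indices (illustrative); the estimate is their weighted average
x_at_neighbours = np.array([0.50, 0.60, 0.90])
x_hat = np.dot(w, x_at_neighbours)

# Over a whole series, causality is scored as Pearson's r between true and estimated X
x_true = np.array([0.20, 0.50, 0.70, 0.90])
x_pred = np.array([0.25, 0.45, 0.75, 0.85])
r = np.corrcoef(x_true, x_pred)[0, 1]
print(np.round(w, 3), round(x_hat, 3), round(r, 3))
```

The nearest neighbour always receives weight proportional to $e^{-1}$, and neighbours twice as far away receive $e^{-2}$, so the estimate is dominated by the closest points.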

### Generate time-series ($N = 10{,}000$, $L = 2$, $\tau = 1$)

In [6]:
# Generate data (N = 10001 so that the L = 2 embedding yields 10,000 points)
source = generate_delayed_vector(generate_two_species(gamma_xy=0, gamma_yx=0.32, N=10001), embed_dim=2)
                                
# Create data and target variables
data = source['Y']
target = source['X']

print 'Data shape: ', data.shape

Data shape:  (10000, 2)


### Detect causality of $\mathbf{X} \rightarrow \mathbf{Y}$

In [7]:
CCM(data=data, target=target, k=3, prediction_corr_viz=True);

k-NN complete!
Predictions complete!


### Detect causality of $\mathbf{Y} \rightarrow \mathbf{X}$

In [8]:
CCM(data=target, target=data, k=3, prediction_corr_viz=True);

k-NN complete!
Predictions complete!


### Section Summary
As seen above, the CCM algorithm detects the unidirectional causality $\mathbf{X} \rightarrow \mathbf{Y}$, with a correlation coefficient of $r = 0.795$ between the true and cross-mapped values of $\mathbf{X}$, and does not detect comparable causality in the reverse direction.

The algorithm's sensitivity to its input parameters is investigated next. The parameters investigated are:
1. Time series length, $N$
2. Number of nearest neighbours, $k$
3. Embedding dimension, $L$
4. Embedding delay, $\tau$

## Section 2: CCM Algorithm Performance with Time Series Length

The CCM algorithm is tested on time series of different lengths generated from the same dynamical system (with new random initial conditions for each run). In each case, the causality $\mathbf{X} \rightarrow \mathbf{Y}$ is measured.
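The cells in this section repeat the same generate, embed and cross-map steps for each $N$. As a self-contained sketch, the whole pipeline can be written as a loop. It assumes fixed initial conditions $X(0) = 0.2$ and $Y(0) = 0.4$, a 20-step burn-in, $L = 2$, $\tau = 1$ and $k = 3$. The helpers `coupled_logistic` and `cross_map_skill` are hypothetical and re-implement the steps (masking each point's own distance instead of skipping the first sorted column), so the printed values may differ slightly from the charts in this notebook.

```python
import numpy as np

def coupled_logistic(n, x0=0.2, y0=0.4, r_x=3.7, r_y=3.8, gamma_xy=0.0, gamma_yx=0.32):
    """Iterate the coupled two-species logistic difference equations n times."""
    x, y = np.empty(n), np.empty(n)
    x[0], y[0] = x0, y0
    for t in range(n - 1):
        x[t + 1] = x[t] * (r_x - r_x * x[t] - gamma_xy * y[t])
        y[t + 1] = y[t] * (r_y - r_y * y[t] - gamma_yx * x[t])
    return x, y

def cross_map_skill(source, target, L=2, tau=1, k=3):
    """Pearson r between target and its estimate from the shadow manifold of source."""
    n = len(source) - (L - 1) * tau
    # Delay embedding of source: row t is [s(t), s(t + tau), ..., s(t + (L-1)tau)]
    M = np.column_stack([source[i * tau:i * tau + n] for i in range(L)])
    # Target value at the time of each row's last column
    truth = target[(L - 1) * tau:]
    # Pairwise Euclidean distances; a point is never its own neighbour
    dist = np.sqrt(((M[:, None, :] - M[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)
    nbrs = np.argsort(dist, axis=1)[:, :k]
    d = dist[np.arange(n)[:, None], nbrs]
    # Exponential weights relative to the nearest-neighbour distance
    u = np.exp(-d / np.maximum(d[:, :1], np.finfo(float).tiny))
    w = u / u.sum(axis=1, keepdims=True)
    estimate = (w * truth[nbrs]).sum(axis=1)
    return np.corrcoef(truth, estimate)[0, 1]

x, y = coupled_logistic(1020)
x, y = x[20:], y[20:]  # discard a 20-step burn-in

results = {}
for N in (100, 1000):
    r_x_to_y = cross_map_skill(y[:N], x[:N])  # X -> Y: estimate X from Y's shadow manifold
    r_y_to_x = cross_map_skill(x[:N], y[:N])  # Y -> X: estimate Y from X's shadow manifold
    results[N] = (r_x_to_y, r_y_to_x)
    print(N, round(r_x_to_y, 3), round(r_y_to_x, 3))
```

Computing both directions in the same loop makes the asymmetry that CCM relies on visible at each library size.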

In [9]:
# Generate variable to store calculated causality
causality = []

### Detect causality of $\mathbf{X} \rightarrow \mathbf{Y}$ $(N = 100)$

In [10]:
# Generate data
source = generate_delayed_vector(generate_two_species(gamma_xy=0, gamma_yx=0.32, N=101), embed_dim=2)
                                
# Create data and target variables
data = source['Y']
target = source['X']

print 'Data shape: ', data.shape

Data shape:  (100, 2)


In [11]:
_, temp = CCM(data=data, target=target, k=3, prediction_corr_viz=True)
causality.append(temp)

k-NN complete!
Predictions complete!


### Detect causality of $\mathbf{X} \rightarrow \mathbf{Y}$ ($N = 1{,}000$)

In [12]:
# Generate data
source = generate_delayed_vector(generate_two_species(gamma_xy=0, gamma_yx=0.32, N=1001), embed_dim=2)
                                
# Create data and target variables
data = source['Y']
target = source['X']

print 'Data shape: ', data.shape

Data shape:  (1000, 2)


In [13]:
_, temp = CCM(data=data, target=target, k=3, prediction_corr_viz=True)
causality.append(temp)

k-NN complete!
Predictions complete!


### Detect causality of $\mathbf{X} \rightarrow \mathbf{Y}$ ($N = 10{,}000$)

In [14]:
# Generate data
source = generate_delayed_vector(generate_two_species(gamma_xy=0, gamma_yx=0.32, N=10001), embed_dim=2)
                                
# Create data and target variables
data = source['Y']
target = source['X']

print 'Data shape: ', data.shape

Data shape:  (10000, 2)


In [15]:
_, temp = CCM(data=data, target=target, k=3, prediction_corr_viz=True)
causality.append(temp)

k-NN complete!
Predictions complete!


### Section Summary
Causality is detected more accurately and more consistently as the time-series length increases, which is the convergence property that gives CCM its name. For the $N = 100$ example, $r$ varied considerably across runs (from about 0.67 to 0.82) depending on the initial conditions, whereas the longer time series produced much less variable $r$ values.

In [16]:
bar_chart(xval=['N=100', 'N=1,000', 'N=10,000'], \
          yval=causality, \
          xtitle='Time-Series Length, N', \
          ytitle='Calculated causality, r', \
          title='CCM Performance with Time-Series Length (N)')

## Section 3: CCM Algorithm Performance with Number of Nearest Neighbours

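The performance of the CCM algorithm is tested with varying numbers of nearest neighbours, $k$, used in the k-nearest-neighbour search. In each case, the causality $\mathbf{X} \rightarrow \mathbf{Y}$ is measured. For reference, Sugihara et al. use the $E + 1$ nearest neighbours for an $E$-dimensional embedding, which corresponds to the $k = L + 1 = 3$ used in the other sections.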
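The `kNN` helper inside `CCM` assumes that each point sorts first in its own row of the distance matrix and skips column 0. A minimal alternative sketch, with the hypothetical name `nearest_neighbours`, masks the diagonal with infinity instead, so a point is never returned as its own neighbour even when duplicate points tie at distance zero. The toy points are illustrative.

```python
import numpy as np

def nearest_neighbours(points, k):
    """Indices of the k nearest neighbours of each row, excluding the row itself."""
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        points = points[:, None]  # treat a 1-D input as N points in one dimension
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Masking the diagonal guarantees self-exclusion, even for duplicate points
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k]

# Four points on a line with no tied distances
print(nearest_neighbours([0.0, 1.0, 3.0, 7.0], k=2))
# [[1 2]
#  [0 2]
#  [1 0]
#  [2 1]]

# Two duplicate points: each is returned as the other's nearest neighbour
print(nearest_neighbours([0.0, 0.0, 5.0], k=1)[:2])
# [[1]
#  [0]]
```

Increasing $k$ simply takes more columns from the same sorted rows, so the cost of trying several values of $k$ is dominated by the single distance computation.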

In [17]:
# Generate variable to store calculated causality
causality = []

### Generate time-series $(N = 100)$

In [18]:
# Generate data
source = generate_delayed_vector(generate_two_species(gamma_xy=0, gamma_yx=0.32, N=101), embed_dim=2)
                                
# Create data and target variables
data = source['Y']
target = source['X']

print 'Data shape: ', data.shape

Data shape:  (100, 2)


### Detect causality of $\mathbf{X} \rightarrow \mathbf{Y}$  $(k = 1, 2, 3, 4, 5)$

In [19]:
_, temp = CCM(data=data, target=target, k=1, prediction_corr_viz=False)
causality.append(temp)

k-NN complete!
Predictions complete!


In [20]:
_, temp = CCM(data=data, target=target, k=2, prediction_corr_viz=False)
causality.append(temp)

k-NN complete!
Predictions complete!


In [21]:
_, temp = CCM(data=data, target=target, k=3, prediction_corr_viz=False)
causality.append(temp)

k-NN complete!
Predictions complete!


In [22]:
_, temp = CCM(data=data, target=target, k=4, prediction_corr_viz=False);
causality.append(temp)

k-NN complete!
Predictions complete!


In [23]:
_, temp = CCM(data=data, target=target, k=5, prediction_corr_viz=False);
causality.append(temp)

k-NN complete!
Predictions complete!


### Section Summary
For this example, the number of nearest neighbours, $k$, has only a weak effect on the calculated causality.

In [24]:
bar_chart(xval=[1, 2, 3, 4, 5], \
          yval=causality, \
          xtitle='Number of nearest neighbours, k', \
          ytitle='Calculated causality, r', \
          title='CCM Performance with Number of Nearest Neighbours (k) for N = 100')

## Section 4: CCM Algorithm Performance with Embedding Dimension

The performance of the CCM algorithm is investigated with various embedding dimensions, $L$. The same raw time series is reused for every value of $L$, and in each case the causality $\mathbf{X} \rightarrow \mathbf{Y}$ is measured.

Values of $L = 2, 3, 4, 5, 6$ were investigated, with $N = 100$.

In [25]:
# Generate variable to store calculated causality
causality = []

### Generate time-series

In [26]:
# Generate raw data
raw = generate_two_species(gamma_xy=0, gamma_yx=0.32, N=101)

### Detect causality of $\mathbf{X} \rightarrow \mathbf{Y}$ $(L = 2)$

In [27]:
# Generate source data
source = generate_delayed_vector(raw, embed_dim=2)
                                
# Create data and target variables
data = source['Y']
target = source['X']

print 'Data shape: ', data.shape

Data shape:  (100, 2)


In [28]:
_, temp = CCM(data=data, target=target, k=3, prediction_corr_viz=False);
causality.append(temp)

k-NN complete!
Predictions complete!


### Detect causality of $\mathbf{X} \rightarrow \mathbf{Y}$ $(L = 3)$

In [29]:
# Generate data
source = generate_delayed_vector(raw, embed_dim=3)
                                
# Create data and target variables
data = source['Y']
target = source['X']

print 'Data shape: ', data.shape

Data shape:  (99, 3)


In [30]:
_, temp = CCM(data=data, target=target, k=3, prediction_corr_viz=False);
causality.append(temp)

k-NN complete!
Predictions complete!


### Detect causality of $\mathbf{X} \rightarrow \mathbf{Y}$ $(L = 4)$

In [31]:
# Generate data
source = generate_delayed_vector(raw, embed_dim=4)
                                
# Create data and target variables
data = source['Y']
target = source['X']

print 'Data shape: ', data.shape

Data shape:  (98, 4)


In [32]:
_, temp = CCM(data=data, target=target, k=3, prediction_corr_viz=False);
causality.append(temp)

k-NN complete!
Predictions complete!


### Detect causality of $\mathbf{X} \rightarrow \mathbf{Y}$ $(L = 5)$

In [33]:
# Generate data
source = generate_delayed_vector(raw, embed_dim=5)
                                
# Create data and target variables
data = source['Y']
target = source['X']

print 'Data shape: ', data.shape

Data shape:  (97, 5)


In [34]:
_, temp = CCM(data=data, target=target, k=3, prediction_corr_viz=False);
causality.append(temp)

k-NN complete!
Predictions complete!


### Detect causality of $\mathbf{X} \rightarrow \mathbf{Y}$ $(L = 6)$

In [35]:
# Generate data
source = generate_delayed_vector(raw, embed_dim=6)
                                
# Create data and target variables
data = source['Y']
target = source['X']

print 'Data shape: ', data.shape

Data shape:  (96, 6)


In [36]:
_, temp = CCM(data=data, target=target, k=3, prediction_corr_viz=False);
causality.append(temp)

k-NN complete!
Predictions complete!


### Section Summary
Increasing the embedding dimension, $L$, tends to decrease the calculated causality. The size of this effect varies considerably between datasets: sometimes it is negligible, and at other times it changes the result substantially.

In [37]:
bar_chart(xval=[2, 3, 4, 5, 6], \
          yval=causality, \
          xtitle='Embedding dimensions, L', \
          ytitle='Calculated causality, r', \
          title='CCM Performance with Embedding Dimension (L) for N = 100')

## Section 5: CCM Algorithm Performance with Embedding Delay

The performance of the CCM algorithm is investigated with various embedding delays, $\tau$. The same raw time series is reused for every value of $\tau$, and in each case the causality $\mathbf{X} \rightarrow \mathbf{Y}$ is measured.

Values of $\tau = 1, 2, 3, 4, 5, 10, 50$ were investigated, with $N = 100$ and $L = 2$.
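Each increase in $\tau$ also shortens the embedded series, because every delay vector spans $(L - 1)\tau$ time steps. With the raw series of 101 points and $L = 2$ used in this section, the number of embedded points works out as:

$
\begin{align}
N_{\text{embed}} &= N - (L - 1)\tau \\
\tau = 10: \quad N_{\text{embed}} &= 101 - (2 - 1) \times 10 = 91 \\
\tau = 50: \quad N_{\text{embed}} &= 101 - (2 - 1) \times 50 = 51
\end{align}
$

This agrees with the data shapes printed in the cells of this section, so at $\tau = 50$ the nearest neighbours are drawn from roughly half as many points as at $\tau = 1$.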

In [38]:
# Generate variable to store calculated causality
causality = []

### Generate time-series

In [39]:
# Generate raw data
raw = generate_two_species(gamma_xy=0, gamma_yx=0.32, N=101)

### Detect causality of $\mathbf{X} \rightarrow \mathbf{Y}$ $(\tau = 1)$

In [40]:
# Generate source data
source = generate_delayed_vector(raw, embed_dim=2, delay=1)
                                
# Create data and target variables
data = source['Y']
target = source['X']

print 'Data shape: ', data.shape

Data shape:  (100, 2)


In [41]:
_, temp = CCM(data=data, target=target, k=3, prediction_corr_viz=False);
causality.append(temp)

k-NN complete!
Predictions complete!


### Detect causality of $\mathbf{X} \rightarrow \mathbf{Y}$ $(\tau = 2)$

In [42]:
# Generate source data
source = generate_delayed_vector(raw, embed_dim=2, delay=2)
                                
# Create data and target variables
data = source['Y']
target = source['X']

print 'Data shape: ', data.shape

Data shape:  (99, 2)


In [43]:
_, temp = CCM(data=data, target=target, k=3, prediction_corr_viz=False);
causality.append(temp)

k-NN complete!
Predictions complete!


### Detect causality of $\mathbf{X} \rightarrow \mathbf{Y}$ $(\tau = 3)$

In [44]:
# Generate source data
source = generate_delayed_vector(raw, embed_dim=2, delay=3)
                                
# Create data and target variables
data = source['Y']
target = source['X']

print 'Data shape: ', data.shape

Data shape:  (98, 2)


In [45]:
_, temp = CCM(data=data, target=target, k=3, prediction_corr_viz=False);
causality.append(temp)

k-NN complete!
Predictions complete!


### Detect causality of $\mathbf{X} \rightarrow \mathbf{Y}$ $(\tau = 4)$

In [46]:
# Generate source data
source = generate_delayed_vector(raw, embed_dim=2, delay=4)
                                
# Create data and target variables
data = source['Y']
target = source['X']

print 'Data shape: ', data.shape

Data shape:  (97, 2)


In [47]:
_, temp = CCM(data=data, target=target, k=3, prediction_corr_viz=False);
causality.append(temp)

k-NN complete!
Predictions complete!


### Detect causality of $\mathbf{X} \rightarrow \mathbf{Y}$ $(\tau = 5)$

In [48]:
# Generate source data
source = generate_delayed_vector(raw, embed_dim=2, delay=5)
                                
# Create data and target variables
data = source['Y']
target = source['X']

print 'Data shape: ', data.shape

Data shape:  (96, 2)


In [49]:
_, temp = CCM(data=data, target=target, k=3, prediction_corr_viz=False);
causality.append(temp)

k-NN complete!
Predictions complete!


### Detect causality of $\mathbf{X} \rightarrow \mathbf{Y}$ $(\tau = 10)$

In [50]:
# Generate source data
source = generate_delayed_vector(raw, embed_dim=2, delay=10)
                                
# Create data and target variables
data = source['Y']
target = source['X']

print 'Data shape: ', data.shape

Data shape:  (91, 2)


In [51]:
_, temp = CCM(data=data, target=target, k=3, prediction_corr_viz=False);
causality.append(temp)

k-NN complete!
Predictions complete!


### Detect causality of $\mathbf{X} \rightarrow \mathbf{Y}$ $(\tau = 50)$

In [52]:
# Generate source data
source = generate_delayed_vector(raw, embed_dim=2, delay=50)
                                
# Create data and target variables
data = source['Y']
target = source['X']

print 'Data shape: ', data.shape

Data shape:  (51, 2)


In [53]:
_, temp = CCM(data=data, target=target, k=3, prediction_corr_viz=False);
causality.append(temp)

k-NN complete!
Predictions complete!


### Section Summary
The embedding delay, $\tau$, plays an important role in the calculated causality. Empirically, the chart below shows $r$ decreasing as $\tau$ increases. Yet, based on the literature, a common choice for $\tau$ is the delay at which the mutual information between $x(t)$ and $x(t + \tau)$ reaches its first local minimum.
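A minimal sketch of that heuristic, assuming a simple histogram (plug-in) estimator of the mutual information between $x(t)$ and $x(t + \tau)$ with 16 bins. The bin count and the helper names `delayed_mutual_information` and `first_local_minimum` are assumptions, and histogram-based estimates can be sensitive to the number of bins.

```python
import numpy as np

def delayed_mutual_information(series, max_lag, bins=16):
    """Histogram estimate of I(x(t); x(t + lag)) in nats, for lag = 1..max_lag."""
    s = np.asarray(series, dtype=float)
    mi = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        joint, _, _ = np.histogram2d(s[:-lag], s[lag:], bins=bins)
        p_xy = joint / joint.sum()
        p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of x(t)
        p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of x(t + lag)
        nz = p_xy > 0                          # skip empty cells (0 * log 0 = 0)
        mi[lag - 1] = np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x * p_y)[nz]))
    return mi

def first_local_minimum(mi):
    """Return the lag (1-based) of the first local minimum of mi, or None if there is none."""
    for i in range(1, len(mi) - 1):
        if mi[i] < mi[i - 1] and mi[i] <= mi[i + 1]:
            return i + 1
    return None

# Example on a single logistic-map series (illustrative initial value)
x = np.empty(1000)
x[0] = 0.2
for t in range(len(x) - 1):
    x[t + 1] = 3.7 * x[t] * (1 - x[t])

mi = delayed_mutual_information(x, max_lag=10)
print(np.round(mi, 3))
print(first_local_minimum(mi))
```

`first_local_minimum` returns `None` when the curve decreases monotonically over the lags examined, in which case `max_lag` would need to be increased.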

In [54]:
bar_chart(xval=['tau=1', 'tau=2', 'tau=3', 'tau=4', 'tau=5', 'tau=10', 'tau=50'], \
          yval=causality, \
          xtitle='Embedding delay, tau', \
          ytitle='Calculated causality, r', \
          title='CCM Performance with Embedding Delay (tau) for N = 100')

## Summary
This notebook has demonstrated that causality can be detected from time series generated by a non-linear, chaotic dynamical system. The effect of four input parameters on the algorithm's ability to detect causality was investigated. The results of this investigation are summarised as follows:

1. *Time series length, $N$*: Increasing $N$ yields more accurate and more consistent estimates of causality.
2. *Number of nearest neighbours, $k$*: Has only a weak effect on the calculated causality.
3. *Embedding dimension, $L$*: Increasing $L$ tends to decrease the calculated causality; the strength of this effect varies widely between time series.
4. *Embedding delay, $\tau$*: Increasing $\tau$ decreases the calculated causality.