This notebook presents the similarity measures implemented in similarity_measures.py.

In [1]:
import similarity_measures as sim
import numpy as np

In [2]:
series1 = np.array([1, 0, 1, 1, 1, 0, 4, -1, -2, 0, 1, -8, 9, 4, 2])
series2 = np.array([0, 0, 2, -4, 1, 1, 2, -4, -5, 1, 2, -5, 4, 2, 1])

series1_normalized = sim.normalize(series1)
series2_normalized = sim.normalize(series2)
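The notebook does not show what `sim.normalize` does. The distances reported further down are consistent with each series being scaled to unit Euclidean (L2) length, so the sketch below assumes that behaviour. `l2_normalize` is a hypothetical stand-in written for illustration, not the module's function.

```python
import numpy as np

def l2_normalize(series):
    """Scale a series to unit Euclidean length.

    Assumption: this mirrors sim.normalize; the source does not confirm it.
    """
    series = np.asarray(series, dtype=float)
    norm = np.linalg.norm(series)
    if norm == 0:
        raise ValueError("cannot normalize an all-zero series")
    return series / norm

series1 = np.array([1, 0, 1, 1, 1, 0, 4, -1, -2, 0, 1, -8, 9, 4, 2])
series1_unit = l2_normalize(series1)

# After scaling, the vector has length 1 (up to floating-point rounding).
print(np.linalg.norm(series1_unit))
```

Scaling to unit length removes differences in overall amplitude, so distance measures applied afterwards compare shape rather than scale.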

<h2>Pearson's Correlation</h2>

In [3]:
print(sim.pearson_correlation.__doc__)


    Compute the Pearson correlation coefficient between two series

    Quantifies the degree of linear relationship between time series.

    Args:
        series1 (numpy.ndarray): First series
        series2 (numpy.ndarray): Second series

    Returns:
        Pearson correlation coefficient between the two series
    


In [4]:
sim.pearson_correlation(series1, series2)

0.7612272716777063
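For readers who want to see the computation behind the coefficient, here is a minimal NumPy sketch of the standard Pearson formula. `pearson` is a hypothetical helper, not the module's implementation; on the two example series it gives roughly the same value as the output above.

```python
import numpy as np

series1 = np.array([1, 0, 1, 1, 1, 0, 4, -1, -2, 0, 1, -8, 9, 4, 2])
series2 = np.array([0, 0, 2, -4, 1, 1, 2, -4, -5, 1, 2, -5, 4, 2, 1])

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # centre both series
    yc = y - y.mean()
    return float(np.dot(xc, yc) / np.sqrt(np.dot(xc, xc) * np.dot(yc, yc)))

r = pearson(series1, series2)
print(round(r, 4))  # 0.7612
```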

<h2>Manhattan Distance</h2>

In [5]:
print(sim.manhattan_distance.__doc__)


    Compute the City Block (Manhattan) distance between two series

    Quantifies the absolute magnitude of the difference between time series.

    Args:
        series1 (numpy.ndarray): First series
        series2 (numpy.ndarray): Second series

    Returns:
        City Block (Manhattan) distance between the two series
    


In [6]:
sim.manhattan_distance(series1_normalized, series2_normalized)

2.2165663629115637
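The Manhattan distance is the sum of absolute element-wise differences. The sketch below is a hedged illustration: `manhattan` is a hypothetical helper, and the unit-L2 scaling is an assumption about `sim.normalize` that the source does not confirm, although it is consistent with the value above.

```python
import numpy as np

series1 = np.array([1, 0, 1, 1, 1, 0, 4, -1, -2, 0, 1, -8, 9, 4, 2])
series2 = np.array([0, 0, 2, -4, 1, 1, 2, -4, -5, 1, 2, -5, 4, 2, 1])

# Assumption: sim.normalize scales each series to unit L2 norm.
a = series1 / np.linalg.norm(series1)
b = series2 / np.linalg.norm(series2)

def manhattan(x, y):
    """Sum of absolute differences between corresponding points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y)))

d_manhattan = manhattan(a, b)
print(round(d_manhattan, 4))  # 2.2166
```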

<h2>Euclidean Distance</h2>

In [7]:
print(sim.euclidean_distance.__doc__)


    Compute the Euclidean distance between two series

    Quantifies the straight-line (L2) magnitude of the difference between time series.

    Args:
        series1 (numpy.ndarray): First series
        series2 (numpy.ndarray): Second series

    Returns:
        Euclidean distance between the two series
    


In [8]:
sim.euclidean_distance(series1_normalized, series2_normalized)

0.7401975233558657
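The Euclidean distance is the square root of the summed squared differences. The sketch below assumes, as before, that `sim.normalize` scales to unit L2 norm (not confirmed by the source); `euclidean` is a hypothetical helper. For unit-length vectors the distance and the cosine similarity are tied together by d = sqrt(2 - 2*cos), which the last lines check.

```python
import numpy as np

series1 = np.array([1, 0, 1, 1, 1, 0, 4, -1, -2, 0, 1, -8, 9, 4, 2])
series2 = np.array([0, 0, 2, -4, 1, 1, 2, -4, -5, 1, 2, -5, 4, 2, 1])

# Assumption: sim.normalize scales each series to unit L2 norm.
a = series1 / np.linalg.norm(series1)
b = series2 / np.linalg.norm(series2)

def euclidean(x, y):
    """Square root of the sum of squared differences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

d_euclidean = euclidean(a, b)
print(round(d_euclidean, 4))  # 0.7402

# For unit vectors: d^2 = 2 - 2 * cos(angle between them).
cos_ab = float(np.dot(a, b))
print(np.isclose(d_euclidean, np.sqrt(2.0 - 2.0 * cos_ab)))  # True
```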

<h2>Cosine Similarity</h2>

In [9]:
print(sim.cosine_similarity.__doc__)


    Compute the Cosine similarity between two series

    Args:
        series1 (numpy.ndarray): First series
        series2 (numpy.ndarray): Second series

    Returns:
        Cosine similarity between the two series
    


In [10]:
sim.cosine_similarity(series1, series2)

0.7260538132089214
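Cosine similarity is the dot product of the two series divided by the product of their lengths, so it depends only on the angle between them. A minimal sketch follows; the helper below is written for illustration and is not the module's code.

```python
import numpy as np

series1 = np.array([1, 0, 1, 1, 1, 0, 4, -1, -2, 0, 1, -8, 9, 4, 2])
series2 = np.array([0, 0, 2, -4, 1, 1, 2, -4, -5, 1, 2, -5, 4, 2, 1])

def cosine_similarity(x, y):
    """Dot product divided by the product of the two L2 norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

cos_sim = cosine_similarity(series1, series2)
print(round(cos_sim, 4))  # 0.7261

# Rescaling either series by a positive factor leaves the result unchanged.
print(np.isclose(cos_sim, cosine_similarity(10 * series1, series2)))  # True
```

This scale invariance is why the notebook can call the function on the raw series rather than the normalized ones.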

<h2>Mutual Information</h2>

In [11]:
print(sim.mutual_information.__doc__)


    Compute the Mutual Information between two series

    Measure of the amount of mutual dependence between two random variables.

    Args:
        series1 (numpy.ndarray): First series
        series2 (numpy.ndarray): Second series

    Returns:
        Mutual Information between the two series
    


In [12]:
sim.mutual_information(series1, series2)

1.6159220638351663
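One common way to estimate mutual information for integer-valued series is the plug-in estimator: treat each distinct value as a symbol, count joint and marginal frequencies, and sum p(x, y) * log2(p(x, y) / (p(x) p(y))). The sketch below assumes that estimator and base-2 logarithms (bits). The source does not state which estimator the module uses, but on the example series this sketch gives about the same value as the output above. `mutual_information_bits` is a hypothetical name.

```python
from collections import Counter

import numpy as np

series1 = np.array([1, 0, 1, 1, 1, 0, 4, -1, -2, 0, 1, -8, 9, 4, 2])
series2 = np.array([0, 0, 2, -4, 1, 1, 2, -4, -5, 1, 2, -5, 4, 2, 1])

def mutual_information_bits(x, y):
    """Plug-in mutual information (in bits) of two discrete series."""
    x = np.asarray(x).tolist()
    y = np.asarray(y).tolist()
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * np.log2(c * n / (count_x[a] * count_y[b]))
    return float(mi)

mi = mutual_information_bits(series1, series2)
print(round(mi, 4))  # 1.6159
```

With only 15 samples and many distinct values, a plug-in estimate like this is biased upwards, so it is best read as a relative score rather than an absolute quantity.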

<h2>Transfer Entropy</h2>

In [13]:
print(sim.transfer_entropy.__doc__)


    Compute the Transfer Entropy between two series

    Quantify information transfer between an information
    source and destination, conditioning out shared history effects.

    Args:
        series1 (numpy.ndarray): First series
        series2 (numpy.ndarray): Second series

    Returns:
        Transfer Entropy between the two series
    


In [14]:
sim.transfer_entropy(series1, series2)

0.3076923076923077
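Transfer entropy asks how much knowing the source's previous value improves the prediction of the target's next value beyond what the target's own history already provides. The sketch below is a plug-in estimate under several assumptions that the source does not confirm: series1 is the source, series2 is the target, the target history length is k = 2, and logarithms are base 2. With those choices the example series give 4/13, about 0.3077, in line with the output above. `transfer_entropy_bits` is a hypothetical name.

```python
from collections import Counter

import numpy as np

series1 = np.array([1, 0, 1, 1, 1, 0, 4, -1, -2, 0, 1, -8, 9, 4, 2])
series2 = np.array([0, 0, 2, -4, 1, 1, 2, -4, -5, 1, 2, -5, 4, 2, 1])

def transfer_entropy_bits(source, target, k=2):
    """Plug-in transfer entropy source -> target with target history length k."""
    source = np.asarray(source).tolist()
    target = np.asarray(target).tolist()
    # One sample per time step t: (next target value, target history, source at t-1)
    samples = [
        (target[t], tuple(target[t - k:t]), source[t - 1])
        for t in range(k, len(target))
    ]
    total = len(samples)
    n_full = Counter(samples)
    n_hist_src = Counter((h, s) for _, h, s in samples)
    n_next_hist = Counter((y, h) for y, h, _ in samples)
    n_hist = Counter(h for _, h, _ in samples)
    te = 0.0
    for (y, h, s), c in n_full.items():
        p_given_hist_and_source = c / n_hist_src[(h, s)]
        p_given_hist = n_next_hist[(y, h)] / n_hist[h]
        te += (c / total) * np.log2(p_given_hist_and_source / p_given_hist)
    return float(te)

te = transfer_entropy_bits(series1, series2)
print(round(te, 4))  # 0.3077
```

A constant source carries no information, so `transfer_entropy_bits([0] * 15, series2)` evaluates to 0.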

<h2>Conditional Entropy</h2>

In [15]:
print(sim.conditional_entropy.__doc__)


    Compute the Conditional Entropy between two series

    Measure of the amount of information required to describe a
    random variable series1 given knowledge of another random variable series2

    Args:
        series1 (numpy.ndarray): First series
        series2 (numpy.ndarray): Second series

    Returns:
        Conditional Entropy between the two series
    


In [16]:
sim.conditional_entropy(series1, series2)

0.8243018651066856
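Conditional entropy can be computed from plug-in entropies as H(A | B) = H(A, B) - H(B). The sketch below assumes discrete symbols and base-2 logarithms; the helper names are hypothetical. On the example data, H(series2 | series1) from this sketch is about 0.8243, which agrees with the output above, while H(series1 | series2) is about 1.0667. That suggests, though the source does not confirm it, that the call conditions on its first argument.

```python
from collections import Counter

import numpy as np

series1 = np.array([1, 0, 1, 1, 1, 0, 4, -1, -2, 0, 1, -8, 9, 4, 2])
series2 = np.array([0, 0, 2, -4, 1, 1, 2, -4, -5, 1, 2, -5, 4, 2, 1])

def entropy_bits(symbols):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable symbols."""
    counts = np.array(list(Counter(symbols).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def conditional_entropy_bits(target, given):
    """H(target | given) = H(target, given) - H(given)."""
    target = np.asarray(target).tolist()
    given = np.asarray(given).tolist()
    return entropy_bits(list(zip(target, given))) - entropy_bits(given)

h_2_given_1 = conditional_entropy_bits(series2, series1)
h_1_given_2 = conditional_entropy_bits(series1, series2)
print(round(h_2_given_1, 4))  # 0.8243
print(round(h_1_given_2, 4))  # 1.0667
```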

<h2>Dynamic Time Warping Distance</h2>

In [17]:
print(sim.dynamic_time_warping_distance.__doc__)


    Compute the Dynamic Time Warping distance between two series

    Dynamic time warping is an algorithm used to measure similarity between
    two sequences which may vary in time or speed.
    It works as follows:
        1. Divide the two series into equal points.
        2. Calculate the Euclidean distance between the first point in the
            first series and every point in the second series. Store the minimum
            distance calculated. (This is the ‘time warp’ stage.)
        3. Move to the second point and repeat step 2, continuing point by
            point until all points are exhausted.
        4. Repeat steps 2 and 3 with the second series as the reference.
        5. Add up all the stored minimum distances; the sum is the
            similarity estimate for the two series.

    Args:
        series1 (numpy.ndarray): First series
        series2 (numpy.ndarray): Second series

    Returns:
        Dynamic time warping distance between the two series

In [18]:
sim.dynamic_time_warping_distance(series1_normalized, series2_normalized)

2.2165663629115637
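The textbook formulation of dynamic time warping fills a cumulative-cost table by dynamic programming, which is not the same procedure as the step list in the docstring above. The sketch below shows that textbook version with an absolute-difference cost. `dtw_distance` is a hypothetical name, and the sketch is not claimed to reproduce the module's result. Note that the value above coincides with the Manhattan distance reported earlier; with an absolute-difference cost, that is the cost of the unwarped diagonal path, which is an upper bound on the dynamic-programming result for equal-length series.

```python
import numpy as np

def dtw_distance(x, y):
    """Classic O(len(x) * len(y)) dynamic time warping with |a - b| as local cost."""
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)  # cumulative cost table
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of: step in x only, step in y only, step in both.
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])

# The second sequence repeats its first value; warping absorbs the repeat.
print(dtw_distance([0, 1, 2], [0, 0, 1, 2]))  # 0.0

series1 = np.array([1, 0, 1, 1, 1, 0, 4, -1, -2, 0, 1, -8, 9, 4, 2])
series2 = np.array([0, 0, 2, -4, 1, 1, 2, -4, -5, 1, 2, -5, 4, 2, 1])
a = series1 / np.linalg.norm(series1)  # assumption: unit L2 scaling
b = series2 / np.linalg.norm(series2)
dtw_ab = dtw_distance(a, b)
manhattan_ab = float(np.sum(np.abs(a - b)))
print(dtw_ab <= manhattan_ab + 1e-12)  # True
```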

<h2>Principal Component Distance</h2>

In [19]:
print(sim.principal_component_distance.__doc__)


    Compute the distance of the first k principal components between two series

    Computes the difference between time series mapped into the first k PCs that
    explain the majority of the variance.

    Args:
        series1 (numpy.ndarray): First series
        series2 (numpy.ndarray): Second series
        k (int): Number of Principal Components
            Defaults to 2

    Returns:
        Distance of values mapped into the first k principal components
    


In [20]:
sim.principal_component_distance(series1_normalized, series2_normalized)

0.6807759689722824

<h2>Spearman's Correlation</h2>

In [21]:
print(sim.spearman_correlation.__doc__)


    Compute the Spearman correlation coefficient between two series

    Measures the strength of monotonic relationships between time series.

    Args:
        series1 (numpy.ndarray): First series
        series2 (numpy.ndarray): Second series

    Returns:
        Spearman correlation coefficient between the two series
    


In [22]:
sim.spearman_correlation(series1, series2)

0.8003745272736548
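Spearman's coefficient is the Pearson correlation of the ranks, with tied values sharing the average of the ranks they occupy. The sketch below illustrates this with hypothetical helpers (`average_ranks`, `spearman`); it is not the module's code, though on the example series it gives about the same value as the output above.

```python
import numpy as np

series1 = np.array([1, 0, 1, 1, 1, 0, 4, -1, -2, 0, 1, -8, 9, 4, 2])
series2 = np.array([0, 0, 2, -4, 1, 1, 2, -4, -5, 1, 2, -5, 4, 2, 1])

def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Pearson correlation computed on the average ranks of x and y."""
    rx = average_ranks(x)
    ry = average_ranks(y)
    return float(np.corrcoef(rx, ry)[0, 1])

rho = spearman(series1, series2)
print(round(rho, 4))  # 0.8004
```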

<h2>Kendall's Tau</h2>

In [23]:
print(sim.kendall_tau.__doc__)


    Compute the Kendall Tau coefficient between two series

    Non-parametric measure of relationship between time series.

    Args:
        series1 (numpy.ndarray): First series
        series2 (numpy.ndarray): Second series

    Returns:
        Kendall Tau correlation coefficient between the two series
    


In [24]:
sim.kendall_tau(series1, series2)

0.7182430061427789
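Kendall's tau compares every pair of time points and counts whether the two series move in the same direction (concordant) or in opposite directions (discordant). The tau-b variant divides by a term that discounts tied pairs. The sketch below implements tau-b directly in NumPy; `kendall_tau_b` is a hypothetical name and the choice of the tau-b variant is an assumption, although on the example series it gives about the same value as the output above.

```python
import numpy as np

series1 = np.array([1, 0, 1, 1, 1, 0, 4, -1, -2, 0, 1, -8, 9, 4, 2])
series2 = np.array([0, 0, 2, -4, 1, 1, 2, -4, -5, 1, 2, -5, 4, 2, 1])

def kendall_tau_b(x, y):
    """Kendall tau-b: (concordant - discordant) / sqrt(untied_x * untied_y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)  # every unordered pair once
    dx = np.sign(x[i] - x[j])
    dy = np.sign(y[i] - y[j])
    concordant_minus_discordant = np.sum(dx * dy)  # tied pairs contribute 0
    untied_in_x = np.sum(dx != 0)
    untied_in_y = np.sum(dy != 0)
    return float(concordant_minus_discordant / np.sqrt(untied_in_x * untied_in_y))

tau = kendall_tau_b(series1, series2)
print(round(tau, 4))  # 0.7182
```

The pairwise comparison is quadratic in the series length, which is fine for short series like these but slow for long ones.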