![Clarify Logo](https://global-uploads.webflow.com/5e81e464dad44d3a9a32d1f4/5ed10fc3f1ff8467f4466786_logo.svg)

# Welcome to the Pattern Recognition tutorial using Data Science algorithms with Clarify! 🤹

<img src="https://raw.githubusercontent.com/clarify/data-science-tutorials/main/media/pattern_recognition/patterns.png" alt="Additional Options"  />

## Prerequisites 
This tutorial picks up where the [basic tutorial on using Python with Clarify](https://colab.research.google.com/github/clarify/data-science-tutorials/blob/main/tutorials/Introduction.ipynb) leaves off, diving deeper into how to discover patterns in your data and how to visualise them in [Clarify](https://www.clarify.io).


## What we will do

1. [Read and plot item data from Clarify](#read)
2. [Apply a Pattern Recognition Algorithm](#apply)
3. [Create Patterns](#create)
4. [Write signal data and metadata to Clarify](#write)
5. [Publish Signals to create Items](#publish)
6. [Visualise the results in Clarify](#visualise)

---
Other resources:
* [Basic tutorial on using Python with Clarify](https://colab.research.google.com/github/clarify/data-science-tutorials/blob/main/tutorials/Introduction.ipynb)
* [API reference](https://docs.clarify.io/reference/http)
* [SDK documentation](https://clarify.github.io/pyclarify/)
* [Pattern Recognition algorithm](https://matrixprofile.org)

<a name="read"></a>
# Read and plot item data from Clarify

<img src="https://raw.githubusercontent.com/clarify/data-science-tutorials/main/media/pattern_recognition/readdata.png" alt="clarify illustration"  />

## Read Item data


First we need to install some packages, like PyClarify. [PyClarify](https://pypi.org/project/pyclarify/) gives you a fast and easy way to get your data from Clarify and write data back. This way you can analyze your data - for example apply a pattern recognition algorithm - and write back your results. 

> The specifics of how to use PyClarify are covered in the "Basic tutorial on using Python with Clarify", which you can find [here](https://colab.research.google.com/github/clarify/data-science-tutorials/blob/main/tutorials/Introduction.ipynb). Therefore we will only briefly mention what the methods do, without going into detail. 

In [None]:
!pip install pyclarify
!pip install numpy
!pip install matrixprofile
!pip install plotly
!pip install --upgrade nbformat
!pip install pandas


In [None]:
import datetime
from pyclarify import APIClient, SignalInfo, DataFrame
import plotly.graph_objects as go
import plotly.express as px
import math

Download your credentials, choose an item in which you want to discover patterns of interest, and copy the item id. To read the item data for the period you want, choose a start date and an end date.

From the `APIClient` class you can use the `select_items` method to get the item's data and metadata. Use the `filter` parameter to choose one or more items.
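
The cell below requests the data in several calls, because (as its comments note) the API returns data with a window of 40 days. As a standalone illustration of that idea, here is a minimal sketch that splits a date range into consecutive half-open windows of at most 40 days. The helper name `split_into_windows` and the `example_*` variables are hypothetical and not part of PyClarify; each pair could be used as the `notBefore` and `before` values of one call.

```python
import datetime

def split_into_windows(start, end, max_days=40):
    """Split [start, end) into consecutive windows of at most max_days days."""
    step = datetime.timedelta(days=max_days)
    windows = []
    window_start = start
    while window_start < end:
        # The last window is shortened so it never runs past `end`
        window_end = min(window_start + step, end)
        windows.append((window_start, window_end))
        window_start = window_end
    return windows

fmt = "%Y-%m-%dT%H:%M:%SZ"
example_start = datetime.datetime.strptime("2021-04-17T03:22:06Z", fmt)
example_end = datetime.datetime.strptime("2021-08-12T22:00:06Z", fmt)

windows = split_into_windows(example_start, example_end)
for window_start, window_end in windows:
    print(window_start.strftime(fmt), "->", window_end.strftime(fmt))
```

Unlike the cell below, which divides the range into equally sized blocks, this sketch uses full 40-day windows and a shorter final one. Either way, every call stays within the window size.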

In [None]:
client = APIClient("./clarify-credentials.json")

# Add the item id, start date and end date
item_id = "<item-id>"
start_date = "<starting date>"  # example: "2021-04-17T03:22:06Z"
end_date = "<ending date>"      # example: "2021-08-12T22:00:06Z"

# The API returns data with a window of 40 days. 
# For this reason we will create multiple calls using 'blocks' of dates.
start = datetime.datetime.strptime(start_date,"%Y-%m-%dT%H:%M:%SZ")
end = datetime.datetime.strptime(end_date,"%Y-%m-%dT%H:%M:%SZ")
dt = end - start

hours = dt.days * 24  + dt.seconds//3600 + 1
blocks = math.ceil(hours / 960)
delta = (end - start)/blocks

date_list = [start.strftime("%Y-%m-%dT%H:%M:%S")+'Z']

for block in range(1, blocks + 1):
    time = (start + block * delta).strftime("%Y-%m-%dT%H:%M:%S")+'Z'
    date_list.append(time)

values = []
dates = []

for date in range(len(date_list)-1):

    params = {
        "items": {"include": False, "filter": {"id": {"$in": [item_id]}}},                    # items -> include = False : don't include metadata
        "data": {"include": True , "notBefore": date_list[date], "before": date_list[date+1]} # data  -> include = True : include data
    }

    # Get signal data
    response = client.select_items(params=params)
    
    # Pass data in a list
    values.extend(list(response.result.data.series.values())[0])
    dates.extend(response.result.data.times)

## Plot your data 
When you run the cell below, you should see a plot similar to the one you see in Clarify. Keep in mind that the shape of the plot changes slightly as you zoom in and out in Clarify.

In [None]:
fig = go.Figure([go.Scatter(x=dates, y=values )])
fig.show()

Time to mark the pattern of interest, also called _query_, in the plot.

Define the start and end date of the query. The key here is to be specific: the timestamps must match samples in your data exactly. 
Zoom in on your plot to get the exact date and time.
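
If the timestamp you read off the plot does not match a sample exactly, a lookup with `dates.index(...)` raises a `ValueError`. The sketch below shows one way to find the closest sample instead. It assumes that the timestamps are `datetime` objects sorted in ascending order; `nearest_index`, `sample_dates` and `wanted` are hypothetical names used only for this illustration.

```python
import bisect
import datetime

def nearest_index(sorted_dates, target):
    """Return the index of the timestamp in sorted_dates closest to target."""
    pos = bisect.bisect_left(sorted_dates, target)
    if pos == 0:
        return 0
    if pos == len(sorted_dates):
        return len(sorted_dates) - 1
    before, after = sorted_dates[pos - 1], sorted_dates[pos]
    # Pick whichever neighbour is closer; ties go to the earlier sample
    return pos if (after - target) < (target - before) else pos - 1

utc = datetime.timezone.utc
sample_dates = [
    datetime.datetime(2021, 11, 21, 14, 55, second, tzinfo=utc)
    for second in (0, 10, 20, 30)
]
wanted = datetime.datetime(2021, 11, 21, 14, 55, 13, tzinfo=utc)

idx = nearest_index(sample_dates, wanted)
print(idx, sample_dates[idx])
```

With real data you could pass `dates` in place of `sample_dates` and use the returned index as the `start` or `end` of the query.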

In [None]:
# Example of start_date and end_date:
# print(dates) 
start_date = datetime.datetime(2021, 11, 21, 14, 55, 59,  tzinfo=datetime.timezone.utc)
end_date = datetime.datetime(2021, 11, 21, 14, 56, 27, tzinfo=datetime.timezone.utc)

# Check that the start date and end date are in the list dates (both should print True)
print(start_date in dates)
print(end_date in dates)

# Find index
start = dates.index(start_date)
end = dates.index(end_date)

# Query of interest
query_values = values[start : end]
query_dates = dates[start : end]

fig = go.Figure()
fig.add_trace(go.Scatter(x=dates, y=values,
                    mode='lines',
                    name='data'))

fig.add_trace(go.Scatter(x=query_dates, y=query_values,
                    mode='lines',
                    name='pattern'))

fig.show()

<a name="apply"></a>
# Apply a Pattern Recognition Algorithm
To make life easier for you, we have already written some code which finds matching patterns.
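
For intuition about the `mass2` call used in the cell below: MASS is generally described as computing a distance profile, that is, the z-normalised Euclidean distance between the query and every subsequence of the same length in the time series. The sketch below computes such a profile the slow, naive way with NumPy. It is an illustration of the concept under that assumption, not the library's implementation, and `znorm`, `naive_distance_profile` and the toy arrays are hypothetical names.

```python
import numpy as np

def znorm(x):
    """Z-normalise a 1D array (zero mean, unit standard deviation)."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)

def naive_distance_profile(ts, query):
    """Distance between the z-normalised query and each z-normalised window of ts."""
    ts = np.asarray(ts, dtype=float)
    q = znorm(query)
    m = len(q)
    return np.array(
        [np.linalg.norm(znorm(ts[i:i + m]) - q) for i in range(len(ts) - m + 1)]
    )

toy_ts = np.array([0, 1, 3, 1, 0, 0, 2, 6, 2, 0, 5, 5], dtype=float)
toy_query = np.array([0, 1, 3, 1, 0], dtype=float)

profile = naive_distance_profile(toy_ts, toy_query)
print(np.round(profile, 3))
```

In this toy series the distance is (close to) zero at index 0, where the query itself sits, and at index 5, where the same shape appears at twice the height. Z-normalisation makes the comparison insensitive to level and scale, which is why the constraints on variance and on minimum and maximum values in the code below are useful as an extra filter.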

In [None]:
# To find the patterns we use the mass2 algorithm from the matrix profile foundation: 
# Website: https://matrixprofile.org
# Github repo: https://github.com/matrix-profile-foundation/matrixprofile

import numpy as np
from matrixprofile.algorithms import mass2


def percentage(percent, number):
    """
    Calculates the percentage of a number.

    Parameters
    ----------
    percent : int
        Specifying the percentage from a number
    number : float64
        Number for which to calculate the percentage

    Returns
    -------
    float64
        Percentage of a number.
    """

    return (percent * number) / 100


def limits_query(query, percent_min, percent_max):
    """
    Calculates the percentages of the minimum and maximum value of the query.

    Parameters
    ----------
    query : Series
        Specific time series query
    percent_min : int
        Specifying the percentage from the minimum value of the query
    percent_max : int
        Specifying the percentage from the maximum value of the query

    Returns
    -------
    list of float64
        1D list with the percentages of two numbers
    """

    minimum = np.amin(query)
    maximum = np.amax(query)

    return [minimum - (percent_min * minimum)/100, maximum + (percent_max * maximum)/100]


def find_pattern_index(mass, limit, ex_zone, loop, d=None):
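    """
    Find the starting indexes of the best matching patterns.

    Parameters
    ----------
    mass : numpy.ndarray
        1D array containing the mass values
    limit : int
        Number of patterns to discover
    ex_zone : int
        Amount of overlap (use m for zero overlap)
    loop : bool
        If True, also return the copy of mass in which the found patterns are masked out
    d : array_like of int, optional
        Pattern indexes found in a previous call; newly found indexes are appended to them

    Returns
    -------
    numpy.ndarray
        1D array with indexes of the pattern starting locations. If loop is True, a tuple
        (masked copy of mass, indexes) is returned instead.
    """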

    limit = len(mass) if limit > len(mass) else limit
    mass_cop = np.copy(mass)

    if d is not None:
        start = len(d) 
        z = np.zeros(limit - start, dtype='int')  
        d = np.hstack((d, z))

    else:
        d = np.zeros(limit, dtype='int')   
        start = 0      

    for i in range(start, limit):
        minVal = np.inf
        minIdx = -1
        for j, val in enumerate(mass_cop):
            if not np.isinf(val) and val < minVal and val != -1:
                minVal = val
                minIdx = j
        d[i] = minIdx
        mass_cop[max([minIdx - ex_zone, 0]):min([minIdx + ex_zone, len(mass_cop)])] = np.inf

    if loop:
        return mass_cop, d

    return d


def constraints(query, ts, patterns, percent_var=40, percent_min=100, percent_max=100):
    """
    Removes patterns which don't satisfy our constraints. For the combination percent_var = 0,
    percent_min = 100, percent_max = 100 we don't have any constraints and we get all the patterns which were found.

    Parameters
    ----------
    query : Series
        Specific time series query.
    ts : Series
        Time series containing the query.
    patterns : array_like of int
        Output from find_pattern_index. 1D array with the indexes of the pattern starting locations.
    percent_var : int, default 40
        Constraint No1 : specifying the variance (height) of the patterns that we want to find by using the variance of
        the query and calculating the percentage of it. If percent_var = 0 we have no constraint, if percent_var = 100
        we have patterns which have the same or higher variance than the query.
    percent_min : int, default 100
        Constraint No2 : specifying the lower limit of the patterns by using percentage. If percent_min = 0 the patterns
        that we find will be at the same level or higher than the query. If percent_min = 100 we have no constraint.
    percent_max : int, default 100
        Constraint No3 : specifying the upper limit of the patterns by using percentage. If percent_max = 0 the patterns
        that we find will be at the same level or lower than the query. If percent_max = 100 we have no constraint.

    Returns
    -------
    list of int
        1D list with the indexes of the pattern.
    """

    var_threshold = percentage(percent_var, np.var(query))
    limits = limits_query(query, percent_min, percent_max)

    new_patterns = []

    for pattern in patterns:
        patt = ts[pattern: pattern + len(query)]
        if (pattern != -1) and np.amin(patt) >= limits[0] and np.amax(patt) <= limits[1] and \
                np.var(patt) >= var_threshold:
            new_patterns.append(pattern)

    return new_patterns


def get_patterns(query, ts, limit, percent_var=40, percent_min=100, percent_max=100, ex_zone=None, loop=False):
    """
    Discovers the best matching patterns in ts.

    Parameters
    ----------
    query : numpy array
        Specific time series query.
    ts : numpy array
        Time series to compare the query against.
    limit : int
        The number of patterns to discover.
    percent_var : int, default 40
        Constraint No1 : specifying the variance (height) of the patterns that we want to find by using the variance of
        the query and calculating the percentage of it. If percent_var = 0 we have no constraint, if percent_var = 100
        we have patterns which have the same or higher variance than the query.
    percent_min : int, default 100
        Constraint No2 : specifying the lower limit of the patterns by using percentage. If percent_min = 0 the patterns
        that we find will be at the same level or higher than the query. If percent_min = 100 we have no constraint.
    percent_max : int, default 100
        Constraint No3 : specifying the upper limit of the patterns by using percentage. If percent_max = 0 the patterns
        that we find will be at the same level or lower than the query. If percent_max = 100 we have no constraint.
    ex_zone : int, default m - (m//10)
        Amount of overlap (use m for zero overlap).
    loop : bool, default False
        If set to True, the function will increase the limit if the number of patterns it found is smaller than limit. 
        If set to False and 'n' patterns are removed by the constraints, the function returns limit-n patterns. 

    Returns
    -------
    list of dict of {str : int}
        A 1D list of dictionaries with "start" and "end" indexes of the recognized patterns.
    """

    m = len(query)

    if ex_zone is None:
        ex_zone = m - (m//10)

    mass = mass2(ts, query)
    pattern_number = 0

    if loop:
        cop_mass, patterns = find_pattern_index(mass, limit, ex_zone, loop)
        patterns = constraints(query, ts, patterns, percent_var, percent_min, percent_max)
        pattern_number = len(patterns)

        if pattern_number < limit:
            for attempt in range(2):
                new_limit = limit + (limit - pattern_number + attempt)
                cop_mass, patterns = find_pattern_index(cop_mass, new_limit, ex_zone, loop, patterns)
                patterns = constraints(query, ts, patterns, percent_var, percent_min, percent_max)
                # Update the count so that the check below can stop the retries early
                pattern_number = len(patterns)
                if pattern_number >= limit:
                    break
    
    else:
        patterns = find_pattern_index(mass, limit, ex_zone, loop)
        patterns = constraints(query, ts, patterns, percent_var, percent_min, percent_max)

    
    return [{"start": int(i), "end": int(i)+m} for i in patterns]

<a name="create"></a>
# Create Patterns

To create patterns using a query, we will use the `get_patterns` function created in the cell above. 


<img src="https://raw.githubusercontent.com/clarify/data-science-tutorials/main/media/pattern_recognition/pattern.png" alt="clarify illustration"  />

Feel free to change the parameters of the `get_patterns` function. You will notice that the returned patterns will change! 
Change the values of: `limit`, `percent_var`, `percent_min`, `percent_max` and `loop` to make the algorithm more or less strict depending on your needs.
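
To get a feeling for what the percentages mean, the sketch below repeats the arithmetic of `limits_query` and of the variance threshold on a small made-up query. The function name `query_limits` and the toy values are hypothetical and only serve as an illustration.

```python
import numpy as np

def query_limits(query, percent_min, percent_max):
    """Same arithmetic as limits_query above, repeated here for illustration."""
    minimum = np.amin(query)
    maximum = np.amax(query)
    lower = minimum - (percent_min * minimum) / 100
    upper = maximum + (percent_max * maximum) / 100
    return lower, upper

toy_query = np.array([10.0, 14.0, 20.0, 12.0])

lower, upper = query_limits(toy_query, percent_min=50, percent_max=40)
# Variance threshold for percent_var = 50, as computed in constraints()
var_threshold = 50 * np.var(toy_query) / 100

print(lower, upper, var_threshold)
```

For this toy query (minimum 10, maximum 20, variance 14), a candidate pattern is kept only if all its values lie between 5 and 28 and its variance is at least 7. Lower percentages for `percent_min` and `percent_max` narrow that band, and a higher `percent_var` demands more variation.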

In [None]:
index_of_pattern = get_patterns(query = query_values, ts = values, limit = 40, percent_var= 50 , percent_min=50, percent_max= 40, loop=True, ex_zone=len(query_values))
index_of_pattern

Once you have found some patterns, it is time to plot them.

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=dates, y=values,
                    mode='lines',
                    name='data'))

for number, pattern in enumerate(index_of_pattern, start=1):
    # Each pattern is a dict with "start" and "end" indexes
    start = pattern["start"]
    end = pattern["end"]

    fig.add_trace(go.Scatter(x=dates[start:end], y=values[start:end],
                        mode='lines',
                        name=f'pattern {number}'))


fig.show()

<a name="write"></a>
# Write signal data and metadata to Clarify

If you are satisfied with your results, you can send your patterns to Clarify, share your knowledge and make comments about what you discovered! 

To do that, create two lists. One with all your `pattern values` and one with the [`enums`](https://colab.research.google.com/github/clarify/data-science-tutorials/blob/main/tutorials/Introduction.ipynb#write). For the enums we will use 0-1 values. The value 0 will represent that there is no pattern, and 1 that there is a pattern.
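
As a simplified illustration of the 0-1 idea, the sketch below builds such a list with NumPy by setting every sample inside a pattern's `start` to `end` range to 1. It is not the exact logic of the cell below (which also marks the sample just before each pattern that does not start at index 0), and `pattern_mask` and the toy ranges are hypothetical names.

```python
import numpy as np

def pattern_mask(n_samples, pattern_ranges):
    """Return a list of 0/1 values: 1 inside any pattern range, 0 elsewhere."""
    mask = np.zeros(n_samples, dtype=int)
    for pattern_range in pattern_ranges:
        # Half-open range: "start" is included, "end" is not
        mask[pattern_range["start"]:pattern_range["end"]] = 1
    return mask.tolist()

toy_ranges = [{"start": 2, "end": 4}, {"start": 6, "end": 8}]
toy_mask = pattern_mask(10, toy_ranges)
print(toy_mask)
```

The result always has exactly one value per sample, regardless of the order of the ranges or whether they touch or overlap.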


In [None]:
pattern_values = []
enum_values = []
pattern_dates = []
previous_end = 0

index_of_pattern = sorted(index_of_pattern, key = lambda i: i['start'])

for pattern in index_of_pattern:
    start = pattern["start"]
    end = pattern["end"]

    if start == 0:
        enum_values.extend([0] * (start - previous_end))
        enum_values.extend([1] * (end - start))
    else:
        enum_values.extend([0] * (start - previous_end - 1))
        enum_values.extend([1] * (end - start + 1))

    pattern_values.extend([None] * (start - previous_end))
    pattern_values.extend( values[start:end] )
    
    previous_end = end

    # for plots
    pattern_dates.extend(dates[start:end])

enum_values.extend([0] * (len(values) - previous_end))
pattern_values.extend([None] * (len(values) - previous_end))

Let's make sure that everything is correct and that these are indeed the patterns that you want.
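
Before plotting, a quick length check can catch alignment mistakes early. The sketch below assumes that every series you write back needs exactly one value per timestamp (an assumption about the data you send, not a documented PyClarify check); `check_aligned` and the toy lists are hypothetical names.

```python
def check_aligned(times, *series):
    """Raise ValueError if any series length differs from the number of timestamps."""
    for position, values_list in enumerate(series):
        if len(values_list) != len(times):
            raise ValueError(
                f"series {position} has {len(values_list)} values, expected {len(times)}"
            )
    return True

toy_times = ["t0", "t1", "t2", "t3"]
toy_pattern_values = [None, 1.5, 2.5, None]
toy_enum_values = [0, 1, 1, 0]

print(check_aligned(toy_times, toy_pattern_values, toy_enum_values))
```

With the lists from this notebook, the call would be `check_aligned(dates, pattern_values, enum_values)`.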

In [None]:
patt = [i for i in pattern_values if i is not None]
fig = px.line(x=pattern_dates, y=patt)
fig.show()

In [None]:
# Zero values indicate that there are no patterns in the time series.
# One values indicate that there is a matching pattern. 
fig = px.scatter(x=dates, y=enum_values)
fig.show()

If everything looks good, you are ready to write your pattern data back to Clarify using the `insert` method.

Use the `save_signals` method to add metadata that describes your data and makes your signal easier to find.

Choose a signal name, add a description of your data, some labels, an engineering unit, gap detection to make gaps in your data easy to spot, and more. 
For more information about signal metadata, click [here](https://docs.clarify.io/reference/signal).

In [None]:
# Choose an input id for your two signals
input_pattern_id = "<input_pattern_id>"
input_enum_id = "<input_enum_id>"

# Example of adding signal metadata
pattern_metadata = SignalInfo(name="Pattern Data", description="Pattern for passing bikes item", labels={"data-source": ["ML on passing bikes data"], "location":["Data Science Analysis Office"],  "Pattern": ["Passing Bikes"]}, gapDetection="PT1H", engUnit="nbr/h")
enum_metadata = SignalInfo(name="Enum for Pattern Data", description="Enum pattern for passing bikes item", labels={"data-source": ["ML on passing bikes data"], "location":["Data Science Analysis Office"],  "Pattern": ["Passing Bikes"]},  type='enum', enumValues = {0: " ", 1: "Pattern"})

save_signals_response = client.save_signals(params={"inputs": {input_pattern_id: pattern_metadata, input_enum_id: enum_metadata}, "createOnly": False})

data = DataFrame(
     times=dates,
     series={input_pattern_id: pattern_values, input_enum_id: enum_values},
)

insert_response = client.insert(data=data)

print(f"save signals response: {save_signals_response} \n insert response: {insert_response}")


<a name="publish"></a>
# Publish Signals to create Items

Publish your signals to create Items. Go to [Clarify](https://clarifyapp.clarify.io/), find the integration you used to download your credentials, click on view signals, find your signal, scroll down to __System info__ and copy the signal id. Do this for both signals and paste the ids into the cell below.


For more information on how to do that, click [here](https://github.com/clarify/data-science-tutorials/blob/main/tutorials/Introduction.ipynb).

In [None]:
response = client.publish_signals(
    params={
        "itemsBySignal": {"<signal_pattern_id>": pattern_metadata, "<signal_enum_id>": enum_metadata},
        "createOnly": False,
    }
)
print(response.json())

Under Items you can see your newly created Items.

<img src="https://raw.githubusercontent.com/clarify/data-science-tutorials/main/media/pattern_recognition/Items.png" alt="Add Items in Your Timeline" width= "600px;" />

<a name="visualise"></a>
# Visualise the results in Clarify

The last chapter of the [Introduction](https://colab.research.google.com/github/clarify/data-science-tutorials/blob/main/tutorials/Introduction.ipynb) tutorial called ["Visualise the data in Clarify"](https://colab.research.google.com/github/clarify/data-science-tutorials/blob/main/tutorials/Introduction.ipynb#bonus) describes all the steps needed to see your data in a Timeline. For a refresher, click [here](https://colab.research.google.com/github/clarify/data-science-tutorials/blob/main/tutorials/Introduction.ipynb). This section describes how you can visualise your own ML results in a powerful way in Clarify. 

## Add Data and share your Timeline

Create a Timeline with the new items. For more information on how to do that, click [here](https://colab.research.google.com/github/clarify/data-science-tutorials/blob/main/tutorials/Introduction.ipynb#bonus).

Share your timeline with selected or all members of your organization.

<img src="https://raw.githubusercontent.com/clarify/data-science-tutorials/main/media/pattern_recognition/item.png" alt="Add Items in Your Timeline" width= "500px;" />

## Customize your Timeline

1. Click on the bottom right icon in your timeline. From there you can customize the look of your timeline!
2. Add colors, change names, hide or reveal items.

<img src="https://raw.githubusercontent.com/clarify/data-science-tutorials/main/media/pattern_recognition/customize.gif" alt="Customize your Timeline">

> Tip: More customization can be done by clicking on the item name above the plotted data. 

<table><tr>
<td> <img src="https://raw.githubusercontent.com/clarify/data-science-tutorials/main/media/pattern_recognition/tip_1.png" alt="Additional Options" width= "1000px;"> </td>
</tr></table>

## Add Comments

Share your knowledge and discoveries!
You can select a period to add comments, labels and images.

<img src="https://raw.githubusercontent.com/clarify/data-science-tutorials/main/media/pattern_recognition/comments.gif" alt="Add Comments">


> Tip: Once you select a period in your timeline you can also view statistics and export data from that period! 

<table class="center"><tr>
<td> <img src="https://raw.githubusercontent.com/clarify/data-science-tutorials/main/media/pattern_recognition/tip3.png" alt="Additional Options" width="400px;" /> </td>
<td> <img src="https://raw.githubusercontent.com/clarify/data-science-tutorials/main/media/pattern_recognition/tip4.png" alt="View Statistics" width="400px;"/> </td>
</tr></table>


**Where to go next**

* [Forecasting](https://colab.research.google.com/github/clarify/data-science-tutorials/blob/main/tutorials/Forecasting.ipynb)
* [Google Cloud Hosting](https://colab.research.google.com/github/clarify/data-science-tutorials/blob/main/tutorials/Google%20Cloud%20Hosting.ipynb)