![Clarify Logo](https://global-uploads.webflow.com/5e81e464dad44d3a9a32d1f4/5ed10fc3f1ff8467f4466786_logo.svg)

**Welcome to this basic tutorial on using Python with Clarify!**

<img src="https://raw.githubusercontent.com/searis/data-science-tutorials/update_ui/media/introduction/light.png" alt="clarify doodle" width="400">


## What you need

1. A Clarify account (with admin rights)
2. A working Integration with Signal(s)
3. An Item (published Signal)

## What we will do
1. [Get credentials from Clarify](#credentials)
2. [Read data from our APIs](#read)
3. [Write data back to Clarify (as a signal)](#write)
4. [Add data to the new Signal](#process)
5. [(Bonus) Visualise the data in Clarify](#bonus)

--- 
Other resources:
* [API reference](https://docs.clarify.io/reference/http)
* [SDK documentation](https://searis.github.io/pyclarify/)
* [Intro to Python Notebooks](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html#notebook-user-interface)

<a name="credentials"></a>
## Get credentials from Clarify

First, you need to connect this notebook with your Clarify account. To do this, download your credentials from the admin panel in Clarify. 

<img src="https://raw.githubusercontent.com/searis/data-science-tutorials/update_ui/media/introduction/get_credentials.gif" alt="Getting credentials">


1. To access the admin panel, click on your organization (located in the top right corner) and go to the integrations menu.
2. Click the integration containing your signal and download the `clarify-credentials.json` file.
3. The final step is to upload the file to this workspace.
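
Once the file is uploaded, a quick check that it is present and parses as JSON can save a confusing error later. The sketch below uses only the standard library. The helper name `check_credentials_file` is hypothetical and not part of PyClarify, and it makes no assumption about which keys the file contains.

```python
import json
from pathlib import Path


def check_credentials_file(path="./clarify-credentials.json"):
    """Return the parsed credentials file, or raise a readable error.

    Hypothetical helper for this tutorial, not part of PyClarify.
    """
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(
            f"No credentials file at {file_path}. Upload it to the workspace first."
        )
    with file_path.open(encoding="utf-8") as handle:
        content = json.load(handle)  # raises json.JSONDecodeError on a broken file
    if not isinstance(content, dict):
        raise ValueError("Expected the credentials file to hold a JSON object.")
    return content
```

Calling `check_credentials_file()` with no arguments looks for the file in the current working directory, which is where an uploaded file usually ends up in a hosted notebook.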


<a name="read"></a>
## Read data from our APIs
We will be using the PyClarify SDK for authentication, reading `Items` and writing `Signals` to the Clarify app. 

<img src="https://raw.githubusercontent.com/searis/data-science-tutorials/update_ui/media/introduction/light-mono.png" alt="clarify doodle" width="400">

Run the block below to install the [PyClarify SDK](https://searis.github.io/pyclarify/).

In [None]:
!pip install pyclarify

We will split reading items into two parts:
* Reading the *meta data* information of your signals
* Reading the *data* of your signals

Do note that this can be done in a single request; for this tutorial we split them to keep things simple. 

The SDK mirrors the Clarify API, so [the reference document](https://docs.clarify.io/reference) is a good resource if you come across any issues or want to see the capabilities of the API.

To be able to read `Items`, we need to create a client to the API:

In [None]:
from pyclarify import APIClient
#insert the file path to your credentials below
client = APIClient("./clarify-credentials.json")

#### Reading Meta data
Your items contain information about all sorts of things, such as the location of the item, the engineering unit it displays, the sample interval and so forth. You can also [create your own labels](https://docs.clarify.io/reference#signal) and add whatever you want to keep your items neat and organised. We will explore that further in the [writing items section](#write). 

> The API returns at most 10 items per request. Use the `skip` attribute to skip the first x `items`. 
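
As a rough illustration of paging with `skip`, the sketch below builds one request dictionary per page. The helper `paged_requests` is hypothetical, the page size of 10 is taken from the note above, and the total number of items is something you would have to know or estimate yourself. Each dictionary has the same shape as the requests used in this tutorial and could be passed to `ItemSelect(**request)`.

```python
def paged_requests(total_items, page_size=10):
    """Yield one meta data request per page of `page_size` items."""
    for skip in range(0, total_items, page_size):
        yield {
            "items": {"include": True, "skip": skip},
            "times": {},
            "series": {},
        }


# 25 items need three pages, starting at offsets 0, 10 and 20
pages = list(paged_requests(total_items=25))
print([page["items"]["skip"] for page in pages])
```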


For now we create an empty request:

In [None]:
from pyclarify.models.requests import ItemSelect
empty_request = {
  "items": {
    "include": True, 
    "skip": None
  }, 
  "times": {
  }, 
  "series": {
  }
}
meta_data_params = ItemSelect(**empty_request)

# Send request to API
response = client.select_items(meta_data_params)
item_dict = response.result.items

# Print result
for item_id, meta_data in item_dict.items():
  print(f"ID: {item_id} \t Name: {meta_data.name}")

Here you can see the name and ID of the `Items` your `clarify-credentials.json` has access to. 

> If you were expecting other `Items` you may want to download a different credentials file. 

The block below prints a complete list of meta data your last `Item` contains:

In [None]:
for value in meta_data:
  print(value)

<a name="reading_values"></a>
#### Reading data
To read the values of an `Item` we need to know its ID. For simplicity it is currently set to the last `Item` retrieved by the empty request. 

You can select any of the ids that are displayed above, by setting `item_id` manually.

> The API currently only supports 40 days of data in a single request. Some `Items` might not have data in the last 40 days, thus there might be a need to manually set the start time of when to retrieve data. This can be done by specifying a `notBefore` variable in the request. You can also specify a `before` variable to set the ending time of the data.
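
If you need more than 40 days of data, one option is to split the period into several requests. The sketch below is a minimal example using only the standard library. The helper `forty_day_windows` is hypothetical, it assumes both timestamps are in UTC, and each resulting dictionary is meant to be used as the `times` part of a request.

```python
from datetime import datetime, timedelta, timezone


def forty_day_windows(start, end, max_days=40):
    """Split [start, end) into consecutive windows of at most `max_days` days.

    Assumes `start` and `end` are timezone-aware datetimes in UTC.
    """
    windows = []
    window_start = start
    while window_start < end:
        window_end = min(window_start + timedelta(days=max_days), end)
        windows.append(
            {
                "notBefore": window_start.strftime("%Y-%m-%dT%H:%M:%SZ"),
                "before": window_end.strftime("%Y-%m-%dT%H:%M:%SZ"),
            }
        )
        window_start = window_end
    return windows


example_start = datetime(2021, 1, 1, tzinfo=timezone.utc)
example_end = datetime(2021, 3, 13, 1, tzinfo=timezone.utc)
for window in forty_day_windows(example_start, example_end):
    print(window)
```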



In [None]:
reading_data_request = {
  "items": {
    "include": False,
    "filter": {
      "id": {
        "$in": [
          item_id
        ]
      }
    }
  },
  "times": {
    "notBefore": "2021-03-13T01:00:00Z", #starting timestamp
    "before": None #ending timestamp (default is 40 days from starting)
  },
  "series": {
    "items": True,
  }
}

data_params = ItemSelect(**reading_data_request)

response = client.select_items(data_params)
data = response.result.data
print(data)

Clarify data frames have two attributes:
* **times:** `List[datetime]` - A list of the shared timestamp of the retrieved `Items`. 
* **series:** `Dict[InputID, NumericalValuesType]` - A dictionary containing ids of `Items` as a key and a list of numerical values as values.

> For more information on data frames in Clarify [see here](https://docs.clarify.io/reference/data-frame-1).
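
To make the structure concrete, here is a minimal sketch with plain Python objects standing in for a Clarify data frame. The ID `item_a` and all values are made up for illustration, and the `None` entry represents a timestamp where the item has no value.

```python
from datetime import datetime, timezone

# Stand-ins for `data.times` and `data.series` (invented values)
example_times = [
    datetime(2021, 3, 13, 1, tzinfo=timezone.utc),
    datetime(2021, 3, 13, 2, tzinfo=timezone.utc),
    datetime(2021, 3, 13, 3, tzinfo=timezone.utc),
]
example_series = {"item_a": [1.5, None, 2.5]}

# Pair every timestamp with the value at the same position, dropping gaps
points = [
    (timestamp, value)
    for timestamp, value in zip(example_times, example_series["item_a"])
    if value is not None
]
print(len(points))
```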

For now, let's visualise the retrieved data with the help of [the Plotly package](https://github.com/plotly/plotly.py).

In [None]:
!pip install -U plotly

In [None]:
import plotly.graph_objects as go

item_name = meta_data.name
times = data.times
series = data.series
values = series[item_id] 

fig = go.Figure()
fig.add_trace(go.Scatter(x=times, y=values))
fig.update_layout(title=item_name)
fig.show()

<a name="write"></a>
## Write data back to Clarify
Now that we have imported an `Item` into the notebook, it's time to send data back to Clarify.

Writing data to Clarify is done in two steps:
* Create a new `Signal`
* Add data to the new `Signal`

Writing meta data can be done by creating a `Signal` and populating it with meta data. The ID of this `Signal` needs to correspond with the ID we use for writing values to it.

<img src="https://raw.githubusercontent.com/searis/data-science-tutorials/update_ui/media/introduction/light-2.png" alt="clarify doodle" width="400">

#### Create a new Signal
The new `Signal` will contain a simple rolling window based on the `Item` we visualized above. First we want to create the meta data for the `Signal` with a `Signal` data structure.

> *Why do we have both `Signals` and `Items`?*<br>
> `Signals` map to the raw sensor that they are receiving data from. They are supposed to be a 1 to 1 mapping in the `Signals` meta data. An `Item` is an abstraction of the `Signal`. The `Item` can have custom meta data and even consist of several `Signals`. 
>
> *Why would you connect several `Signals` to an `Item`?* <br>
> You might change sensors, or even connect a new one to an `Item`. To keep the historical values you can also connect several `Signals`. Clarify will even support *Calculated Items* in the future, which are aggregated from a combination of one or more items.   

In [None]:
from pyclarify import Signal
new_signal_name = f"{item_name}_rolling_mean"
input_id = f"{item_id}_rolling_mean"


new_signal_meta_data = Signal(
    name=new_signal_name,
    description=f"Rolling window with 1d resolution of the signal {item_id}",
    labels={
        "rolling_window": ["1 day"],
        "aggregated": [True],
        "aggregated_from": [item_id]
    },
)
response = client.save_signals(
    inputs={input_id : new_signal_meta_data},
    created_only=False  # False = create new signals and update existing ones, True = only create new signals and leave existing ones untouched
)
print(response)

In [None]:
response.result.signalsByInput

From the response you can see that you have a new `Input ID` and a `Signal ID`. The `Input ID` is the id we will use when selecting the signal we want to write to and the `Signal ID` is only used internally in Clarify and can be disregarded.

> You can now see the `Signal` in Clarify by going to the integration menu and clicking `Show Signals`
<img src="https://raw.githubusercontent.com/searis/data-science-tutorials/update_ui/media/introduction/open_signals.gif" alt="Getting credentials">

<a name="process"></a>
#### Add data to the new Signal
As mentioned we want to write data to this `Signal`. We can use the popular library [Pandas](https://github.com/pandas-dev/pandas) to create a rolling average with a 1 day interval of the data retrieved in [reading values](#reading_values). Then, we will write these values to the newly created `Signal`.

Let us start by importing `pandas` and creating a new data frame with a rolling average. 

In [None]:
import pandas as pd
pd.options.plotting.backend = "plotly"
df = pd.DataFrame(series)
df.index = times

df_rolling_mean = df.rolling('1d').mean()
df_rolling_mean.columns = [input_id]  # name the column after the input ID of the new Signal
merged_df = df.join(df_rolling_mean)
merged_df.plot()
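
To show what `df.rolling('1d').mean()` computes, here is a plain Python sketch of a time-based rolling mean. It assumes that the window for each timestamp covers the preceding day up to and including that timestamp, which we believe matches the pandas default for time-based windows. The function `rolling_mean` and the example values are made up for illustration, and the loop is written for readability rather than speed.

```python
from datetime import datetime, timedelta


def rolling_mean(times, values, window=timedelta(days=1)):
    """Mean of the values whose timestamp lies in (t - window, t] for each t."""
    means = []
    for current in times:
        in_window = [
            value
            for timestamp, value in zip(times, values)
            if current - window < timestamp <= current
        ]
        # `in_window` always contains at least the value at `current`
        means.append(sum(in_window) / len(in_window))
    return means


# Four samples, 12 hours apart (invented values)
sample_times = [datetime(2021, 3, 13) + timedelta(hours=12 * i) for i in range(4)]
sample_values = [2.0, 4.0, 6.0, 8.0]
print(rolling_mean(sample_times, sample_values))  # [2.0, 3.0, 5.0, 7.0]
```

With samples 12 hours apart, each window holds at most two values, because a sample exactly one day old falls just outside the window.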

As mentioned, we use data frames (not to be confused with pandas data frames) to send values to and from Clarify. [Data frames](https://docs.clarify.io/reference#data-frame-1) separate time and values by sharing the same timestamps across all signals, even though a signal might not have a value at a given timestamp. The backend handles this by not writing null values to Clarify. The `series` is a dictionary consisting of `input ids` as keys and lists of values as values. 

To do this we take the index of the `pandas` data frame as timestamps and store them in an array called `times`, and convert the values to a dictionary called `series`. 

In [None]:
from pyclarify import DataFrame
times = df_rolling_mean.index.values.tolist()
series = df_rolling_mean.to_dict(orient="list")
new_df = DataFrame(times=times, series=series)
print(new_df.series)
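
The shared timestamps matter most when several signals are written at once. The sketch below shows one way to align signals with different timestamps on a single time axis, filling the gaps with `None`. The helper `to_times_and_series`, the input IDs and the values are hypothetical and only illustrate the `times` and `series` layout described above.

```python
from datetime import datetime, timezone


def to_times_and_series(points_by_input_id):
    """Align per-signal {timestamp: value} mappings on one shared time axis."""
    shared_times = sorted(
        {timestamp for points in points_by_input_id.values() for timestamp in points}
    )
    aligned_series = {
        input_id: [points.get(timestamp) for timestamp in shared_times]
        for input_id, points in points_by_input_id.items()
    }
    return shared_times, aligned_series


first = datetime(2021, 3, 13, 1, tzinfo=timezone.utc)
second = datetime(2021, 3, 13, 2, tzinfo=timezone.utc)
aligned_times, aligned = to_times_and_series(
    {"signal_a": {first: 1.0, second: 2.0}, "signal_b": {second: 5.0}}
)
print(aligned)  # {'signal_a': [1.0, 2.0], 'signal_b': [None, 5.0]}
```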

Then we send this newly created data frame to Clarify.

In [None]:
response = client.insert(new_df)
print(response)

Before we look at the data we have inserted into Clarify, we should check out one last data type called `Enums`.

#### Enums
Enums are a special type of input that behaves a little differently. Enums are displayed as blocks of data with a single value, which makes them great for displaying events over a certain span of time. To keep things simple we will create an enum with 3 different values based on percentiles of the data. 

We follow the same procedure as above by first creating a new `Signal` and populating it with meta data. 

> Enums are stored as integers or rounded floats in Clarify. If you want to map these enums to strings, e.g. `"normal"`, you can do so by setting `type` to `"enum"` and `enumValues` to the mapping.

In [None]:
# create signal
percentile_signal_name = f"{item_name}_percentile"
percentile_input_id = f"{item_id}_percentile"


percentile_signal_meta_data = Signal(
    name=percentile_signal_name,
    description=f"Percentile enums of the signal {item_id}",
    labels={
        "percentiles": ["95", "75"],
        "aggregated": [True],
        "aggregated_from": [item_id]
    },
    type="enum",
    enumValues={
        "0": "normal",
        "1": "P75",
        "2": "P95"
    }
)
response = client.save_signals(
    inputs={percentile_input_id : percentile_signal_meta_data},
    created_only=False  # False = create new signals and update existing ones, True = only create new signals and leave existing ones untouched
)

Now let's create the values of the series. We will do so by using the pandas [quantile](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.quantile.html) method. 

In [None]:
# set all enums to be zero
merged_df[percentile_input_id] = 0

In [None]:
# set values above 75th percentile to be one
percentile_75 = df.quantile(q=0.75).values[0]
merged_df.loc[merged_df[item_id] > percentile_75, percentile_input_id] = 1

In [None]:
# set values above 95th percentile to be two
percentile_95 = df.quantile(q=0.95).values[0]
merged_df.loc[merged_df[item_id] > percentile_95, percentile_input_id] = 2

In [None]:
merged_df.plot()
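
The thresholding above can also be written without pandas, which may help clarify what the three enum values mean. The sketch below computes a percentile with linear interpolation between neighbouring values and maps each value to 0, 1 or 2. The functions `percentile` and `to_enum` and the example values are made up for illustration.

```python
def percentile(values, q):
    """Percentile with linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def to_enum(value, p75, p95):
    """Map a value to 0 (normal), 1 (above P75) or 2 (above P95)."""
    if value > p95:
        return 2
    if value > p75:
        return 1
    return 0


example_values = [float(v) for v in range(1, 22)]  # 1.0, 2.0, ..., 21.0
p75 = percentile(example_values, 0.75)
p95 = percentile(example_values, 0.95)
print(p75, p95, [to_enum(v, p75, p95) for v in (10.0, 17.0, 21.0)])
```

Checking the higher threshold first mirrors the order of the cells above, where values above the 95th percentile overwrite the 1 that was set for the 75th.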

Again, we follow the same procedure as above by inserting into Clarify.

In [None]:
from pyclarify import DataFrame
times = merged_df.index.values.tolist()
series = {
    percentile_input_id: merged_df[percentile_input_id].values.tolist()
}
new_df = DataFrame(times=times, series=series)
print(new_df.series)

In [None]:
response = client.insert(new_df)
print(response)

<a name="bonus"></a>
## Visualise the data in Clarify

Once your data has been sent to Clarify, it should show up in the `Admin panel` as a `Signal` in your `Integration`.

Publish your `Signal` to make it available as an `Item` in Clarify.


#### Publishing Signals
To view the data we have added in Clarify we need to publish the `Signals`. 

<img src="https://raw.githubusercontent.com/searis/data-science-tutorials/update_ui/media/introduction/publish_signals.gif" alt="publishing signals">

1. Go to Admin -> Integrations
2. Click `Show Signals`
3. Click on a newly created signal
4. Click `Publish`


#### Creating a timeline

Now that all your newly created data is available you can create your very own timeline. 

<img src="https://raw.githubusercontent.com/searis/data-science-tutorials/update_ui/media/introduction/create_timeline.gif" alt="Getting credentials">

Steps:
1. Go to Admin -> Items
2. Select newly published Item
3. Click `Open in Clarify`
4. Click `Open in New Timeline`
5. Add other Items by searching in the menu.


### Where to go next

*   [Pattern Recognition](https://colab.research.google.com/github/searis/data-science-tutorials/blob/pattern_recognition/tutorials/pattern_recognition/pattern_recognition.ipynb)
*   [Forecasting](https://colab.research.google.com/github/searis/data-science-tutorials/blob/main/tutorials/forecast.ipynb)