![Clarify Logo](https://global-uploads.webflow.com/5e81e464dad44d3a9a32d1f4/5ed10fc3f1ff8467f4466786_logo.svg)

# Welcome to this basic tutorial on using Python with Clarify!

<img src="https://raw.githubusercontent.com/clarify/data-science-tutorials/main/media/introduction/light.png" alt="clarify doodle" width="400">


## What you need

1. A Clarify account (with admin rights)
2. A working Integration with Signal(s)
3. An Item (published Signal)

## What we will do
1. [Get credentials from Clarify](#credentials)
2. [Read data from our APIs](#read)
3. [Write data back to Clarify (as a signal)](#write)
4. [Adding data to the new Signal](#process)
5. [(Bonus) Visualise the data in Clarify](#bonus)

--- 
Other resources:
* [API reference](https://docs.clarify.io/api/1.0/)
* [SDK documentation](https://clarify.github.io/pyclarify/)
* [User Guide](https://docs.clarify.io/users/welcome)
* [Developer Guide](https://docs.clarify.io/developers/welcome)
* [Intro to Python Notebooks](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html#notebook-user-interface)

<a name="credentials"></a>
## Get credentials from Clarify

First, you need to connect this notebook with your Clarify account. To do this, download your credentials from the admin panel in Clarify. 

See our [Quickstart on integrations and credentials](https://docs.clarify.io/developers/quickstart/create-integration)
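
Before handing the file to the SDK, it can help to confirm that the downloaded credentials file exists and parses as JSON. The sketch below is only a sanity check: `check_credentials_file` is a hypothetical helper, not part of PyClarify, and it makes no assumption about which keys the file contains.

```python
import json
from pathlib import Path


def check_credentials_file(path):
    """Return the parsed JSON content of a credentials file.

    Hypothetical helper (not part of PyClarify): it only verifies that the
    file exists and holds a JSON object.
    """
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"No credentials file at {file_path}")
    with file_path.open(encoding="utf-8") as handle:
        content = json.load(handle)
    if not isinstance(content, dict):
        raise ValueError("Expected the credentials file to hold a JSON object")
    return content
```

Calling `check_credentials_file("./clarify-credentials.json")` before creating the client gives a clearer error message if the path is wrong.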


We will be using the PyClarify SDK for authentication, reading `Items` and writing `Signals` to the Clarify app. 

In the SDK, the client is the main hub for communication between Clarify and your code. The client is a one-to-one mapping of the [API](https://docs.clarify.io/api/1.0/) with a Pythonic interface. 


<img src="https://raw.githubusercontent.com/clarify/data-science-tutorials/main/media/introduction/light-mono.png" alt="clarify doodle" width="400">

Run the block below to install the [PyClarify SDK](https://clarify.github.io/pyclarify/).

In [None]:
pip install pyclarify devtools

<a name="read"></a>
## Read data and metadata from our APIs

We will split reading items into two parts:
* Reading the *metadata* of your items
* Reading the *data* of your items

To be able to read `Items`, we need to create a client to the API:

In [None]:
from pyclarify import ClarifyClient
# insert the path to your credentials file below
client = ClarifyClient("./clarify-credentials.json")

The client has a method called `select_items` which will be the main method for retrieving information from Clarify. Run the cell below to see the parameters of the method.

In [None]:
?client.select_items

#### Reading Item metadata
Your items carry metadata such as the location of the item, the engineering unit it displays and its sample interval.

You can also [create your own labels](https://docs.clarify.io/api/1.1beta1/methods/admin/select-signals#signal-select-view) to keep your items neat and organised. We will explore that further in the [writing items section](#write). 

> By default, the API returns 10 items per request. Use the `skip` parameter to skip the first x `items`, and increase `limit` to retrieve more items in a single query. 
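
The `skip` and `limit` parameters can be combined to page through all items. The loop below is a minimal sketch of that pattern. `fetch_page` is a hypothetical stand-in for a call such as `client.select_items(skip=..., limit=...)` (check the SDK documentation for the exact signature), and here it is backed by a plain list so that the example runs without a Clarify account.

```python
def fetch_all(fetch_page, limit=10):
    """Collect every record by requesting pages of `limit` records.

    `fetch_page(skip, limit)` is a hypothetical callable that returns a list
    with at most `limit` records, starting at offset `skip`.
    """
    records = []
    skip = 0
    while True:
        page = fetch_page(skip, limit)
        records.extend(page)
        if len(page) < limit:  # a short (or empty) page means we are done
            return records
        skip += limit


# Stand-in data source so the sketch runs offline.
fake_items = [f"item-{n}" for n in range(25)]


def fake_fetch_page(skip, limit):
    return fake_items[skip:skip + limit]


all_items = fetch_all(fake_fetch_page, limit=10)
print(len(all_items))
```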


##### Using the Clarify Client

In [None]:
response = client.select_items()

# print the response as a dict, which is easier to read
response.dict()

In [None]:
# unwrapping our response

items = response.result.data

for item in items:
  print(f"ID: {item.id} \t Name: {item.attributes.name}")

Here you can see the name and ID of each `Item` that your `clarify-credentials.json` has access to. 

The block below prints all the metadata of the last `Item` in the list:

In [None]:
item.attributes

<a name="reading_values"></a>
#### Reading Item data and using a Filter
To read the values of a specific `Item`, we need its ID. For simplicity, it is set to the last `Item` retrieved by the request above. 

You can select any of the IDs displayed above by setting `item_id` manually.

##### Setting up a Filter

In [None]:
from pyclarify import query
item_id = items[-1].id
print(item_id)
filter = query.Filter(fields={"id": query.In(value=[item_id])})

Filters can be combined with logical _AND_ and _OR_ operations, which in Python are written with the `&` and `|` operators. 

The implemented operators are:
- `Equal`
- `NotEqual`
- `Regex`
- `In`
- `NotIn`
- `LessThan`
- `GreaterThan`
- `GreaterThanOrEqual`

The usage is described in the [API reference](https://docs.clarify.io/api/1.1beta1/methods/filter-syntax#compare-operators), but there is an example below. We are using the `to_query()` method to get it in a more readable format.

In [None]:
f1 = query.Filter(fields={"name": query.NotEqual(value="Temperature")})
f2 = query.Filter(fields={"labels.unit-type": query.NotIn(value=["Car", "Storage 3"])})
f3 = query.Filter(fields={"labels.location": query.Regex(value="Ocean")})

# f1 and f2
f4 = f1 & f2
print("f4", f4.to_query())
# (f1 and f2) or f3: & binds more tightly than |
f5 = f1 & f2 | f3
print("f5", f5.to_query())
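
In Python, `&` binds more tightly than `|`, so `f1 & f2 | f3` is evaluated as `(f1 & f2) | f3`. The toy class below is not the PyClarify implementation. It is a minimal sketch, assuming the `$and`/`$or` query form described in the API reference's filter syntax, that shows how operator overloading can build such a query and how parentheses change the result.

```python
class ToyFilter:
    """Toy filter (not PyClarify's class) that combines with & and |."""

    def __init__(self, query):
        self.query = query

    def __and__(self, other):
        return ToyFilter({"$and": [self.query, other.query]})

    def __or__(self, other):
        return ToyFilter({"$or": [self.query, other.query]})


a = ToyFilter({"name": "A"})
b = ToyFilter({"name": "B"})
c = ToyFilter({"name": "C"})

implicit = a & b | c      # parsed as (a & b) | c
explicit = (a & b) | c
regrouped = a & (b | c)   # parentheses change the meaning

print(implicit.query == explicit.query)
print(regrouped.query)
```

When mixing `&` and `|` in real filters, adding explicit parentheses keeps the intent readable.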

##### Selecting data

In [None]:
response = client.select_dataframe(
    filter = filter,
    gte = "2022-06-13T01:00:00Z",  # start timestamp (greater than or equal)
    lt = None  # end timestamp (less than); defaults to 40 days after the start
)
data = response.result.data
df = data.to_pandas() # convert data into pandas DataFrame
df
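
Rather than relying on the default window, the end of the window can be computed explicitly. The sketch below formats a start time and an end time 40 days later in the same timestamp format as the cell above. The helper name `window_from` is hypothetical, and the 40-day span is taken from the comment in the cell above rather than checked against the API.

```python
from datetime import datetime, timedelta, timezone


def window_from(start, days=40):
    """Return (gte, lt) timestamp strings for a window of `days` days.

    Expects a timezone-aware datetime; it is converted to UTC before
    formatting so that the trailing "Z" is correct.
    """
    start_utc = start.astimezone(timezone.utc)
    end_utc = start_utc + timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start_utc.strftime(fmt), end_utc.strftime(fmt)


start = datetime(2022, 6, 13, 1, 0, 0, tzinfo=timezone.utc)
gte, lt = window_from(start)
print(gte, lt)
```

The two strings could then be passed as `gte` and `lt` in the request above.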

##### Result

Clarify data frames have two attributes:
* **times:** `List[datetime]` - The timestamps shared by all retrieved `Items`. 
* **series:** `Dict[InputID, NumericalValuesType]` - A dictionary with the IDs of the `Items` as keys and lists of numerical values as values.

> In addition, it has a handy method called `to_pandas()`, which converts it into a pandas DataFrame. For more information on DataFrames in Clarify, [see here](https://docs.clarify.io/api/1.1beta1/methods/integration/insert#data-frame).
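
To make the structure concrete, the following sketch models a Clarify data frame with plain Python types and turns it into one row per timestamp. The IDs and values are made up for illustration, and the plain list and dict stand in for the SDK's `DataFrame` class.

```python
from datetime import datetime, timezone

# Illustrative stand-in for a Clarify data frame: shared timestamps plus
# one list of values per (made-up) item ID. None marks a missing value.
times = [
    datetime(2022, 6, 13, 1, 0, tzinfo=timezone.utc),
    datetime(2022, 6, 13, 2, 0, tzinfo=timezone.utc),
    datetime(2022, 6, 13, 3, 0, tzinfo=timezone.utc),
]
series = {
    "item_a": [1.0, 2.0, None],
    "item_b": [10.0, None, 30.0],
}

# Every series must line up with the shared list of timestamps.
assert all(len(values) == len(times) for values in series.values())

# One row per timestamp, with one column per item ID.
rows = [
    {"time": stamp, **{key: values[i] for key, values in series.items()}}
    for i, stamp in enumerate(times)
]
for row in rows:
    print(row)
```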

For now, let's visualise the retrieved data with the help of [the Plotly package](https://github.com/plotly/plotly.py).

In [None]:
pip install -U plotly

In [None]:
import pandas as pd
pd.options.plotting.backend = "plotly"
df.plot()

<a name="write"></a>
## Writing data back to Clarify
Now that we have imported an Item into the notebook, it's time to send data back to Clarify.

Writing data to Clarify is done in two steps:
* Create a new `Signal`
* Add data to the new `Signal`

Metadata is written by creating a `Signal` and populating it with metadata. The input ID of this `Signal` must match the ID we later use when writing values to it.

<img src="https://raw.githubusercontent.com/clarify/data-science-tutorials/main/media/introduction/light-2.png" alt="clarify doodle" width="400">

#### Create a new Signal
The new `Signal` will contain a simple rolling mean based on the `Item` we visualised above. First, we create the metadata for the `Signal` using the `Signal` data structure.

> *Why do we have both `Signals` and `Items`?*<br>
> Signals map to the raw sensors they receive data from, and the metadata of a `Signal` is meant to mirror its sensor one to one. An `Item` is an abstraction of the `Signal`: it can have custom metadata and can even consist of several `Signals`. 
>
> *Why would you connect several `Signals` to an `Item`?* <br>
> You might replace a sensor, or connect a new one to an `Item`. Connecting several `Signals` to the same `Item` keeps the historical values. Clarify also supports *Calculated Items*, which are aggregated from a combination of one or more items.   

In [None]:
from pyclarify import Signal
item_name = item.attributes.name
new_signal_name = f"{item_name}_rolling_mean"
input_id = f"{item_id}_rolling_mean"

new_signal_meta_data = Signal(
    name=new_signal_name,
    description=f"Rolling window with 1d resolution of the signal {item_id}",
    labels={
        "rolling_window": ["1 day"],
        "aggregated": [True],
        "aggregated_from": [item_id]
    },
)

response = client.save_signals(input_ids=[input_id], signals=[new_signal_meta_data], create_only=False)

print(response)

signal_id = list(response.result.signalsByInput.values())[0].id

##### Result

From the response you can see that you have a new `Input ID` and a `Signal ID`. The `Input ID` is the ID we will use to select the signal we want to write data to.
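
Taking the first value of `signalsByInput` works when a single signal is saved. When several are saved in one call, looking the entry up by input ID is less fragile. The sketch below uses a plain dictionary with made-up IDs as a stand-in for `response.result.signalsByInput`; the real entries are SDK objects whose `id` is read as an attribute, as in the cell above.

```python
# Stand-in for response.result.signalsByInput (all IDs are made up).
signals_by_input = {
    "abc123_rolling_mean": {"id": "sig-001"},
    "abc123_percentile": {"id": "sig-002"},
}


def signal_id_for(input_id, mapping):
    """Return the signal ID saved for `input_id`, or raise a clear error."""
    if input_id not in mapping:
        raise KeyError(f"No signal was saved for input ID {input_id!r}")
    return mapping[input_id]["id"]


print(signal_id_for("abc123_percentile", signals_by_input))
```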

> You can now see the `Signal` in Clarify by going to the integration menu and clicking `Show Signals`
<video width="320" height="240" controls>
  <source src="https://player.vimeo.com/video/676275415" type="video">
</video>

<iframe width="420" height="315"
src="https://player.vimeo.com/video/676275415">
</iframe>


#### Reading the new Signal metadata
Previously, we used the `select_items` method to read item metadata. You can also read metadata from signals (and items) using the `select_signals` method. The snippet below retrieves the newly created signal.

In [None]:
filter = query.Filter(fields={"id": query.In(value=[signal_id])})
response = client.select_signals(filter=filter)

signal = response.result.data[0]
signal.dict()

<a name="process"></a>
#### Add data to the new Signal
As mentioned, we want to write data to this `Signal`. We can use the popular library [Pandas](https://github.com/pandas-dev/pandas) to create a rolling average with a 1 day window over the data retrieved in [reading values](#reading_values). Then, we will write these values to the newly created `Signal`.

Let us start by importing `pandas` and creating a new DataFrame with a rolling average. 

In [None]:
import pandas as pd
pd.options.plotting.backend = "plotly"
df_rolling_mean = df.rolling('1d').mean()
df_rolling_mean.columns=[input_id]
merged_df = df.join(df_rolling_mean)
merged_df.plot()
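
For a time-indexed frame, `rolling('1d')` uses a time-based window: by default, each output value is the mean of the samples in the 24 hours up to and including its own timestamp. The pure-Python sketch below reproduces that idea on made-up data. It illustrates the windowing only and is not how pandas implements it.

```python
from datetime import datetime, timedelta


def rolling_mean(times, values, window=timedelta(days=1)):
    """Mean of the values whose timestamp lies in (t - window, t] for each t.

    Quadratic in the number of samples, so for illustration only.
    """
    result = []
    for current in times:
        in_window = [
            value
            for stamp, value in zip(times, values)
            if current - window < stamp <= current
        ]
        result.append(sum(in_window) / len(in_window))
    return result


# Made-up samples, 12 hours apart.
start = datetime(2022, 6, 13)
times = [start + timedelta(hours=12 * n) for n in range(4)]
values = [2.0, 4.0, 6.0, 8.0]
print(rolling_mean(times, values))
```

Note that the sample exactly 24 hours before a timestamp falls outside its window, because the left edge is exclusive.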

As mentioned, we use Clarify data frames (not to be confused with pandas DataFrames) to send values to and from Clarify. A Clarify DataFrame separates times and values: all signals share the same timestamps, even though a signal might not have a value at a given timestamp. The backend handles this by not writing null values to Clarify.
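
The shared-timestamp layout can be sketched with plain Python: two series sampled at different times are placed on the union of their timestamps, and `None` fills the positions where a series has no value. The input names and values are made up, and `align` is a hypothetical helper rather than an SDK function.

```python
def align(series_by_input):
    """Place several {timestamp: value} series on one shared time axis.

    Returns (times, series) where every list in `series` has the same
    length as `times`; missing values are represented by None.
    """
    times = sorted({t for points in series_by_input.values() for t in points})
    series = {
        input_id: [points.get(t) for t in times]
        for input_id, points in series_by_input.items()
    }
    return times, series


# Made-up inputs keyed by timestamp strings in one fixed UTC format,
# which therefore sort chronologically.
raw = {
    "input_a": {"2022-06-13T01:00:00Z": 1.0, "2022-06-13T02:00:00Z": 2.0},
    "input_b": {"2022-06-13T02:00:00Z": 20.0, "2022-06-13T03:00:00Z": 30.0},
}
times, series = align(raw)
print(times)
print(series)
```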

We take advantage of a handy method called `from_pandas` to convert a pandas DataFrame to a Clarify DataFrame.

In [None]:
from pyclarify import DataFrame

new_data = DataFrame.from_pandas(df_rolling_mean)


Then we send this newly created data frame to Clarify.

In [None]:
response = client.insert(new_data)
print(response)

#### Enums
Enums are a special type of input that behaves a little differently: they are displayed as blocks of data with a single value. 

This makes them great for displaying events over a period of time. 

To keep things simple, we will create three enum values based on percentiles of the data. 

We follow the same procedure as above by first creating a new `Signal` and populating it with metadata. 

> Enums are stored as integers or rounded floats in Clarify. If you want to map these enums to strings, e.g. `"normal"`, you can do so by setting `valueType` to `"enum"` and setting `enumValues` to a mapping.

In [None]:
# create signal
percentile_signal_name = f"{item_name}_percentile"
percentile_input_id = f"{item_id}_percentile"


percentile_signal_meta_data = Signal(
    name=percentile_signal_name,
    description=f"Percentile enums of the signal {item_id}",
    labels={
        "percentiles": ["95", "75"],
        "aggregated": [True],
        "aggregated_from": [item_id]
    },
    valueType="enum",
    enumValues={
        "0": "normal",
        "1": "P75",
        "2": "P95",
    }
)

response = client.save_signals(input_ids=[percentile_input_id], signals=[percentile_signal_meta_data], create_only=False)
print(response)

##### Enums mapping

Now let's create the values of the series using the pandas [quantile](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.quantile.html) method. 

In [None]:
# set all enums to be zero
merged_df[percentile_input_id] = 0

# set values above 75th percentile to be one
percentile_75 = df.quantile(q=0.75).values[0]
merged_df.loc[merged_df[item_id] > percentile_75, percentile_input_id] = 1


# set values above 95th percentile to be two
percentile_95 = df.quantile(q=0.95).values[0]
merged_df.loc[merged_df[item_id] > percentile_95, percentile_input_id] = 2


merged_df.plot()
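
The thresholding above can be written without pandas to make the logic explicit. The sketch below computes percentiles with linear interpolation between the closest ranks (the default interpolation of `DataFrame.quantile`), assigns the codes 0, 1 and 2, and maps them to the labels used in `enumValues`. The sample values are made up and the helper names are hypothetical.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between the closest ranks."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def encode(values):
    """Return 0 (normal), 1 (above P75) or 2 (above P95) for each value."""
    p75 = percentile(values, 0.75)
    p95 = percentile(values, 0.95)
    return [2 if v > p95 else 1 if v > p75 else 0 for v in values]


# Same mapping as `enumValues` in the signal metadata.
labels = {"0": "normal", "1": "P75", "2": "P95"}
samples = [float(n) for n in range(1, 21)]  # made-up values 1.0 .. 20.0
codes = encode(samples)
print([labels[str(code)] for code in codes])
```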

Now we convert the merged pandas DataFrame to a Clarify DataFrame and insert it into Clarify. Printing the series keys shows which input IDs will be written.

In [None]:
new_df = DataFrame.from_pandas(merged_df)
print(new_df.series.keys())

In [None]:
response = client.insert(new_df)
print(response)

<a name="bonus"></a>
## Visualise the data in Clarify

Once your data has been sent to Clarify, it should show up in the `Admin panel` as a `Signal` in your `Integration`.

Publish your `Signal` to make it available as an `Item` in Clarify.


#### Publishing Signals

In [None]:
#@title See our [quickstart guide]() for more information
from IPython.display import VimeoVideo
v = VimeoVideo(id='676275415', width=800, height=600)
v

In [None]:
import pyclarify
from pyclarify import Item

# pass some metadata from the signal to the item
data_source = ["Jupyter Notebook", "Aggregation"]
location = ["Trondheim", "Norway"]

percentile_item_meta_data = Item(
    name="Percentile Enums",
    description=f"Percentile enums of the signal {item_id}",
    labels={
        "location": location,
        "data-source": data_source,
        "percentiles": [
            "95", 
            "75", 
            "normal"
        ],
        "aggregated": [True],
        "aggregated_from": [item_id],
        "published_automatically": [True],
        "SDK_version": [pyclarify.__version__]
    },
    valueType="enum",
    enumValues={
        "0": "normal",
        "1": "P75",
        "2": "P95"
    },
    gapDetection= "PT1H",
    visible=True
)

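# NOTE: signal_id was captured when the rolling-mean signal was saved. To publish
# the percentile signal instead, use the ID from its own save_signals response.
response = client.publish_signals(signal_ids=[signal_id], items=[percentile_item_meta_data], create_only=False)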

print(response)

### Creating a timeline

Now that all your newly created data is available you can create your very own timeline. 

<img src="https://raw.githubusercontent.com/clarify/data-science-tutorials/main/media/introduction/create_timeline.gif" alt="Creating a timeline">

Steps:
1. Go to Admin -> Items
2. Select the newly published Item
3. Click `Open in Clarify`
4. Click `Open in New Timeline`
5. Add other Items by searching in the menu.


**Where to go next**

*   [Forecasting](https://colab.research.google.com/github/clarify/data-science-tutorials/blob/main/tutorials/Forecasting.ipynb)
*   [Pattern Recognition](https://colab.research.google.com/github/clarify/data-science-tutorials/blob/main/tutorials/Pattern%20Recognition.ipynb)
*   [Hosting with Google Cloud](https://colab.research.google.com/github/clarify/data-science-tutorials/blob/main/tutorials/Google%20Cloud%20Hosting.ipynb)