![Clarify Logo](https://global-uploads.webflow.com/5e81e464dad44d3a9a32d1f4/5ed10fc3f1ff8467f4466786_logo.svg)
<img src="https://uploads-ssl.webflow.com/5f031b98adc00651e28ef04b/6058a5f7b4c86c42885a2c2c_orchest-logo-no-padding.svg" alt="Orchest Logo" width="200"/>

**Welcome to the tutorial on creating and deploying data pipelines with Clarify and Orchest!**

<img src="../media/orchest/orchestration.jpg" alt="clarify orchest" width="400">

In this tutorial we combine the power of [Clarify](https://www.clarify.io/) for data exploration, visualization and collaboration across teams with [Orchest](https://www.orchest.io/) for developing and deploying data pipelines.

# Prerequisites 
This tutorial builds upon the [basic tutorial on using Python with Clarify](https://colab.research.google.com/github/searis/data-science-tutorials/blob/main/tutorials/Introduction.ipynb) and the [Forecasting](https://colab.research.google.com/github/searis/data-science-tutorials/blob/main/tutorials/Forecasting.ipynb) tutorial. We recommend reading both before starting this one.

## What you need

1. A Clarify account (with admin rights)
2. A working Integration with Signal(s): `clarify-credentials.json` uploaded to the environment running the files
3. A working setup of Orchest:
    - You can create a free account and instance on [Orchest Cloud](https://cloud.orchest.io/) and be ready to follow the next steps of the tutorial
    - Alternatively, you can install Orchest for free on your own machine (either locally or in your preferred cloud service)
    - For more details about installation and the different ways to set up Orchest, check the [website](https://www.orchest.io/) and [docs](https://docs.orchest.io/en/latest/)

## What we will do
1. [Initial setup and definitions](#init)
     - [Quickstart](#quickstart)
2. [Pipelines in Orchest](#pipelines)
3. [Read and write to Clarify](#read_write)
4. [Forecast step](#forecast)
5. [Configuring recurring tasks](#cron)
6. [Visualizing the result in Clarify](#visualize)

--- 
Other resources:
* [Clarify API reference](https://docs.clarify.io/reference/http)
* [SDK documentation](https://searis.github.io/pyclarify/)
* [Intro to Python Notebooks](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html#notebook-user-interface)
* [Orchest documentation](https://docs.orchest.io/en/latest/)
* [Merlion - time-series forecast and anomaly detection library](https://opensource.salesforce.com/Merlion/v1.0.1/tutorials.html)

<a name="init"></a>
# Initial setup and definitions

[Clarify](https://www.clarify.io) is a tool for easy data sharing, exploration and collaboration. The [Clarify API](https://docs.clarify.io/docs) simplifies the task of sending time series data and metadata to Clarify. This makes it easy to integrate multiple data sources and build visualizations that can be shared with a team, alongside discussion threads, calculated items, thresholds and many other features. [PyClarify](https://clarify.github.io/pyclarify/user/whatispyclarify.html) is the Python package that facilitates interaction with the Clarify API, allowing you to read, create and update data and metadata in Clarify.

[Orchest](https://www.orchest.io) is a tool for building data pipelines in an easy way. It consists of a web interface that can be accessed either via a local installation or via [Orchest Cloud](https://cloud.orchest.io/). It comes with a Python interface that simplifies the task of sharing data within a pipeline. Orchest allows you to define a runtime environment with the necessary packages, visually construct pipelines, write code using JupyterLab (or the native file editor, VSCode or another editor of choice), run subsets of the pipeline, parametrize the pipeline (with pipeline variables and environment variables) and run pipelines on a cron-job schedule, among other features. More details can be found in the [documentation](https://docs.orchest.io/en/latest/).

Combining the strengths of Orchest and Clarify and their Python interfaces allows you to easily create data pipelines that can be readily made available to your whole team. It also allows you to develop data science workflows that can be scheduled and updated, with the results made easily visible to your whole organization. More details about the basic setup of Clarify and PyClarify can be found in the [basic tutorial on using Python with Clarify](https://colab.research.google.com/github/searis/data-science-tutorials/blob/main/tutorials/Introduction.ipynb).

The first step of this setup is to create a [free account](https://www.clarify.io/signup) on Clarify and follow the steps in the section "Get credentials from Clarify" of the [basic tutorial on using Python with Clarify](https://colab.research.google.com/github/searis/data-science-tutorials/blob/main/tutorials/Introduction.ipynb). The next step is to set up a running Orchest environment. This can be accomplished either by following the [installation](https://docs.orchest.io/en/latest/getting_started/installation.html) steps or by setting up an [Orchest Cloud](https://cloud.orchest.io/) account and instance. The remainder of this tutorial assumes that you have both accounts set up, with the correct `clarify-credential.json` file available in the running environment.

<a name="quickstart"></a>
## Quickstart


To get started quickly with Clarify and Orchest, we have prepared a GitHub repository that can be readily imported into Orchest. This repository includes a template pipeline with steps for reading data from Clarify, forecasting and writing the results back to Clarify. It also includes the setup for the basic build used by the project, including all the necessary Python packages. We will use this template project as an example throughout this tutorial. To use the template you need to:
1. Start your Orchest instance and click on "Projects".
2. Select the option "Import project".
3. In the Git repository field, paste [`https://github.com/clarify/data-science-tutorials-orchest`](https://github.com/clarify/data-science-tutorials-orchest) and, in the project name field, choose a name that follows the naming restrictions.
4. If the import succeeds, a pipeline named "Read, Forecast and Write to Clarify" should appear under Pipelines. Click on this pipeline and wait for the build to complete (it can take a couple of minutes, since it installs multiple Python packages).
5. To get access to the Clarify API, copy or upload your `clarify-credential.json` to the `src` folder, which is a subfolder of the root folder of the new project created from the template on GitHub.
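
Before running the pipeline, it can help to confirm that the credentials file from the last step is where the pipeline steps expect it. The sketch below uses a hypothetical helper, `check_credentials`, which is not part of PyClarify or Orchest. It assumes only that the file holds a JSON object; it does not contact Clarify and does not inspect the individual credential fields.

```python
import json
from pathlib import Path


def check_credentials(path):
    """Return True if `path` exists and contains a JSON object.

    Hypothetical helper for illustration only: it does not contact
    Clarify and does not validate the credential fields themselves.
    """
    file = Path(path)
    if not file.is_file():
        return False
    try:
        with file.open(encoding="utf-8") as handle:
            content = json.load(handle)
    except (json.JSONDecodeError, UnicodeDecodeError):
        # The file is present but is not readable as JSON.
        return False
    return isinstance(content, dict)
```

From the root folder of the project, calling `check_credentials("src/clarify-credential.json")` should return `True` once the file has been uploaded.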

<img src="../media/orchest/successfully_imported.png" alt="Import pipeline"  />

<a name="pipelines"></a>
# Pipelines in Orchest

The fundamental element for building and deploying data workflows in Orchest is the [pipeline](https://docs.orchest.io/en/latest/user_guide/glossary.html#pipeline-definition).

<img src="https://docs.orchest.io/en/latest/_images/visually-construct.png" alt="Orchest pipeline"  />

A pipeline consists of a directed graph of steps, with each step representing a piece of code (for example, a step can be a Python script or a Jupyter notebook). The pipeline can be constructed visually by connecting nodes to each other, and the input and output of each step can be stored in named variables using the [Orchest SDK](https://docs.orchest.io/en/latest/user_guide/sdk/index.html) (available for Python and R, although this tutorial focuses only on the Python interface).
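
To make the "directed graph of steps" idea concrete, here is a minimal sketch that models a pipeline as a dictionary of dependencies and derives a valid run order with Python's standard `graphlib` module. It illustrates the data structure only and is not how Orchest stores or schedules pipelines. Apart from "Read item configuration", the step names are hypothetical placeholders, and the graph is assumed to contain no cycles.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is a step; each value is the set of steps it depends on.
# Only "Read item configuration" is a step name from this tutorial;
# the other names are hypothetical placeholders.
pipeline = {
    "Read item configuration": set(),
    "Read data": {"Read item configuration"},
    "Forecast": {"Read data"},
    "Write forecast": {"Forecast"},
}


def execution_order(graph):
    """Return the steps in an order that respects every connection."""
    return list(TopologicalSorter(graph).static_order())


for position, step in enumerate(execution_order(pipeline), start=1):
    print(position, step)
```

Because every step here has exactly one predecessor, only one order is possible. In a pipeline with parallel branches, several orders can be valid.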

The main way to pass data between steps is to use the `orchest.get_inputs` and `orchest.output` methods. `orchest.get_inputs` returns a dictionary whose keys are the names of the variables that were passed to the current step through its incoming connections. `orchest.output(data, name)` writes a named object that can be read by the steps connected downstream of the current step.
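
The data-passing pattern described above can be sketched as follows. So that the example runs outside of Orchest, it uses `LocalSDK`, a hypothetical in-memory stand-in that mimics only the two calls mentioned here; it is not the Orchest implementation. The step functions, the sample values and the name `item_data_1` are illustrative assumptions.

```python
class LocalSDK:
    """In-memory stand-in that mimics `output` and `get_inputs`.

    Assumption: this is NOT the Orchest implementation. It only lets
    the step functions below run outside of an Orchest pipeline.
    """

    def __init__(self):
        self._outputs = {}

    def output(self, data, name):
        # Store the object under its name, as a producing step would.
        self._outputs[name] = data

    def get_inputs(self):
        # Return a dictionary keyed by the names used by upstream steps.
        return dict(self._outputs)


def read_step(sdk):
    """Upstream step: publish a named object for the next steps."""
    values = [1.0, 2.5, 4.0]  # made-up numbers, not real Clarify data
    sdk.output(values, name="item_data_1")


def mean_step(sdk):
    """Downstream step: look up the object by the name it was given."""
    inputs = sdk.get_inputs()
    values = inputs["item_data_1"]
    return sum(values) / len(values)


sdk = LocalSDK()
read_step(sdk)
print(mean_step(sdk))  # prints 2.5
```

Inside an Orchest step you would `import orchest` and call `orchest.output(...)` and `orchest.get_inputs()` directly. Passing the SDK in as an argument here is only a way to keep the sketch runnable anywhere.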

A pipeline is part of a project, and a project can contain multiple pipelines.

For an overview of other important Orchest concepts, check the [overview](https://docs.orchest.io/en/latest/getting_started/overview.html). For more detail about passing data between steps in a pipeline, check the "Data passing" section of the [quickstart](https://docs.orchest.io/en/latest/user_guide/sdk/python.html#sdk-quickstart-data-passing).

<a name="read_write"></a>
# Read and write to Clarify 

<img src="../media/orchest/example_pipeline.png" alt="Tutorial pipeline"  />


The [PyClarify](https://clarify.github.io/pyclarify/user/whatispyclarify.html) package allows us to communicate with the Clarify API. In order to simplify the task of building pipelines, and also to serve as an example, we have created the nodes available in `src/` and added them to the example pipeline in the tutorial code. This example pipeline is available after successfully importing the example project following the steps in [Quickstart](#quickstart).

The first step of the pipeline is "Read item configuration" (which uses the code in `src/node_config_read_forecast`). It sets up variables using step parameters, which can be edited by clicking on the node and changing the values under `Parameters`. The values configured in the step parameters are passed on to the next nodes, which read data from Clarify according to those values. The default parameters are shown below; edit them according to the specific item that you have access to and the desired forecast configuration.
```
"parameters": {
    "future": 6,
    "item_id": "c5rtq4jsbu8cohpq1k70",
    "lag_days": 2,
    "name": "item_data_1",
    "time_split": 3
}
```

The variable `future` defines the number of points to be predicted into the future by the forecast node. The value should be a non-negative integer (zero or greater). The variable `item_id` defines the ID of the item of interest; it can be obtained from the item viewer or the admin panel in Clarify.
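
Since a wrong parameter value would otherwise only surface later in the pipeline, a configuration step could check `future` and `item_id` up front. The sketch below is a hypothetical helper, not the code in `src/node_config_read_forecast`. It takes any callable that maps a parameter name to its value. Inside Orchest that callable would presumably be `orchest.get_step_param`, which is an assumption here; in the example a plain dictionary lookup is used so that it runs anywhere.

```python
def read_forecast_config(get_param):
    """Collect and validate the `future` and `item_id` parameters.

    `get_param` is any callable mapping a parameter name to its value.
    Hypothetical helper for illustration, not the tutorial's own code.
    """
    future = get_param("future")
    item_id = get_param("item_id")

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(future, bool) or not isinstance(future, int) or future < 0:
        raise ValueError(f"'future' must be a non-negative integer, got {future!r}")
    if not isinstance(item_id, str) or not item_id.strip():
        raise ValueError("'item_id' must be a non-empty string")

    return {"future": future, "item_id": item_id}


# Default values taken from the step parameters shown earlier.
defaults = {"future": 6, "item_id": "c5rtq4jsbu8cohpq1k70"}
print(read_forecast_config(defaults.__getitem__))
```

Raising a `ValueError` early makes the step fail with a readable message instead of passing an invalid value on to the nodes that read data and forecast.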