![Clarify Logo](https://global-uploads.webflow.com/5e81e464dad44d3a9a32d1f4/5ed10fc3f1ff8467f4466786_logo.svg)


# Welcome to the Clarify Forecast tutorial! 📈 

This notebook assumes that you already know how to obtain credentials and an authentication token and how to read data directly from Clarify. It then shows how to perform forecasting and how to write the result back into Clarify.

<img src="../media/forecast/analysis_work.jpg" alt="Additional Options" style="width: 60%;" />






# Prerequisites 
* [Basic tutorial on using Python with Clarify](https://colab.research.google.com/github/searis/data-science-tutorials/blob/main/tutorials/Introduction.ipynb)
    - Check this tutorial for more details about reading and writing data using the PyClarify interface


## What you need
* [Clarify](https://www.clarify.io) Account (with admin rights)
* Credential file `clarify-credentials.json` from Clarify, available to the environment running this notebook.


# What we will do

1. [Initial setup](#init)
    - [Reading meta-data](#init_read_meta)
    - [Reading data](#init_read_data)
2. [Forecasting](#apply)
    - [Single signal forecasting](#apply_single)
3. [Write the forecast in Clarify](#write)
4. [Visualize the forecast result and collaborate with Clarify](#visualize)

---
Other resources:
* [Other tutorials on using Python with Clarify](https://github.com/searis/data-science-tutorials/)
* [API reference](https://docs.clarify.io/reference/http)
* [SDK documentation](https://searis.github.io/pyclarify/)
* [Merlion - time-series forecast and anomaly detection library](https://opensource.salesforce.com/Merlion/v1.0.1/tutorials.html)

# <a name="init"></a>  Initial setup
We will be using the PyClarify SDK for authentication and for reading and writing signals to the Clarify app. The SDK mirrors the Clarify API, so [the reference document](https://docs.clarify.io/reference) is a good resource if you run into any issues or want to see the capabilities of the API.

In [None]:
# install dependencies
!pip install requests pyclarify pandas matplotlib salesforce-merlion 

In [None]:
from pyclarify import APIClient
client = APIClient("./clarify-credentials.json")

## <a name="init_read_meta"></a> Reading meta-data
You can retrieve item data and metadata from the Clarify API. This is useful if you want a list of the items you have access to from within the script you are running. You also need the item IDs to retrieve data from Clarify.

In [None]:
from pyclarify.models.requests import ItemSelect
empty_request = {
  "items": {
    "include": True, 
  }, 
  "times": {
  }, 
  "series": {
  }
}
meta_data_params = ItemSelect(**empty_request)

To obtain the result we call the method `select_items`, which returns a response with a field `result` whose sub-field `items` is a dictionary mapping item IDs to their metadata.

In [None]:
response = client.select_items(meta_data_params)
signal_dict = response.result.items
for signal, meta_data in signal_dict.items():
  print(f"ID: {signal} \t Name: {meta_data.name}")

By default, the `select_items` method returns at most the number of items allowed by the API's limit. To list all the items you have access to, make subsequent calls to the API, each time skipping the items already retrieved by means of the `skip` parameter, until a call returns fewer items than requested.
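The paging loop described above can be sketched independently of the SDK. This is a minimal sketch, assuming a hypothetical callable `fetch_page(skip, limit)` that returns a dictionary of item IDs to metadata for at most `limit` items; we have not verified how `ItemSelect` exposes `skip` and a page size in your PyClarify version, so check the [API reference](https://docs.clarify.io/reference) before wiring it to `client.select_items`.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def fetch_all(fetch_page: Callable[[int, int], Dict[str, T]], page_size: int = 50) -> Dict[str, T]:
    """Collect every item by requesting consecutive pages until a short page arrives.

    `fetch_page(skip, limit)` is a hypothetical callable: it should return a dict
    mapping item IDs to metadata for at most `limit` items, after skipping `skip` items.
    """
    all_items: Dict[str, T] = {}
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        all_items.update(page)
        if len(page) < page_size:  # a partial (or empty) page means we reached the end
            break
        skip += page_size
    return all_items


# Fake in-memory catalog of 120 items, standing in for the Clarify API.
catalog = {f"id{i}": f"signal {i}" for i in range(120)}
keys = list(catalog)


def fake_fetch_page(skip, limit):
    return {k: catalog[k] for k in keys[skip:skip + limit]}


items = fetch_all(fake_fetch_page, page_size=50)
print(len(items))  # 120
```

Stopping on a short page avoids needing a total count from the API, at the cost of one extra empty request when the number of items is an exact multiple of the page size.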

## <a name="init_read_data"></a> Reading data

Now, from the list of items you have access to, choose the ID of the item you want to retrieve data from.

In [None]:
item_id = "<item_id>"  # replace with your own item ID

reading_data_request = {
  "items": {
    "include": True,
    "filter": {
      "id": {
        "$in": [
          item_id
        ]
      }
    }
  },
  "times": {
    "notBefore": "2021-08-07T07:14:19Z"  # only retrieve data from this time onwards
  },
  "series": {
    "items": True,
    "aggregates": False
  }
}

data_params = ItemSelect(**reading_data_request)

response = client.select_items(data_params)
item_name = list(response.result.items.values())[0].name
times = response.result.data.times
series = response.result.data.series

We proceed by converting the data from the PyClarify `DataFrame` structure to a `pandas.DataFrame` so that it can be used by the forecasting library. We also discard the timezone information, because the forecasting library does not support timezones; we keep it in a variable so that it can be reused when inserting the forecast back into Clarify. The following figure shows an example graph obtained by running the plotting code on a particular signal; you should expect a similar-looking graph with your own data.

<img src="../media/forecast/example_data_graph.png" alt="Additional Options" style="width: 40%;" />

In [None]:
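import pandas as pd

# One column per item ID, indexed by time. The timezone is stripped
# because the forecasting library does not support timezone-aware timestamps.
df = pd.DataFrame(series)
df.index = [time.replace(tzinfo=None) for time in times]

# Keep the original timezone so it can be restored when writing back to Clarify.
tzinfo = times[0].tzinfo if len(times) > 0 else None

df.plot()
print(f"Number of data points: {len(times)}")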
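The `tzinfo` saved above is not used again in the rest of this notebook. A minimal sketch of how it could be reattached is shown here, assuming that Merlion reports `time_stamps` as Unix timestamps in seconds and that it interpreted the naive datetimes as UTC when converting them; `restore_timezone` is a hypothetical helper, not part of PyClarify or Merlion.

```python
from datetime import datetime, timedelta, timezone
from datetime import tzinfo as TzInfo  # aliased so the notebook's `tzinfo` variable is not shadowed
from typing import List, Optional


def restore_timezone(time_stamps: List[float], tz: Optional[TzInfo]) -> List[datetime]:
    """Turn Unix timestamps (seconds) back into timezone-aware datetimes.

    Assumes the timestamps came from naive wall-clock times interpreted as UTC,
    so we rebuild that wall-clock time and attach `tz` (UTC if `tz` is None).
    """
    restored = []
    for ts in time_stamps:
        wall_clock = datetime.fromtimestamp(ts, tz=timezone.utc).replace(tzinfo=None)
        restored.append(wall_clock.replace(tzinfo=tz or timezone.utc))
    return restored


# Example: epoch 0 with a +02:00 offset keeps the 00:00 wall-clock time.
example = restore_timezone([0.0, 3600.0], timezone(timedelta(hours=2)))
print(example[0].isoformat())  # 1970-01-01T00:00:00+02:00
```

Once the forecast has been computed, something like `restore_timezone(test_pred.time_stamps, tzinfo)` could be passed as `times` when writing to Clarify. If your data is already in UTC, this makes no difference.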

# <a name="apply"></a> Forecasting

Given a sequence of values in a time window, we might wonder which values the time series is likely to take next. Forecasting is the task of predicting future values of a time series from its values up to a certain time.
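Written out in our own notation (not Merlion's): given the observed values $y_1, \ldots, y_T$, a forecasting model $f$ produces estimates for the next $H$ steps, where $H$ is the forecast horizon. In the code below the test split plays the role of those $H$ future steps, and `max_forecast_steps` in the Merlion configuration acts, as we understand it, as an upper bound on $H$.

$$
\hat{y}_{T+h} = f\left(y_1, y_2, \ldots, y_T\right), \qquad h = 1, 2, \ldots, H
$$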

To do so, we will start by exploring the forecast models available in the library [`merlion`](https://opensource.salesforce.com/Merlion/v1.0.0/index.html). This library encapsulates multiple forecasting methods for single signals and multiple signals, and allows easy modular experimentation with the algorithms, as well as composing them and creating ensembles. We will only show the basic functionality here.

## <a name="apply_single"></a> Single signal forecasting



The basic elements for using the `merlion` forecasting library are the `TimeSeries` data structure, the transformations applied to the data, and the model configuration and forecasting model. In this case we use the `Prophet` forecasting model, so we need to instantiate a `ProphetConfig` object, which defines, for example, the maximum number of forecast steps, the seasonality, and the transformation applied to the data (in this case the `Identity` transformation).

To visualize and validate the forecast, we split the original time-series data into *train* and *test* splits. The variable `number_test_points` defines how many of the most recent points are assigned to the test split, while the remaining, earlier points are used for training. The model is trained using only the *training* split and then evaluated on the held-out *test* split.
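The split above is chronological rather than random: shuffling would let the model see values from the period it is asked to forecast. A minimal pandas sketch of the same split is shown below on a synthetic daily signal; `chronological_split` is a hypothetical helper that mirrors the `df[:-n]` / `df[-n:]` slicing used in the next cell.

```python
import numpy as np
import pandas as pd


def chronological_split(df: pd.DataFrame, number_test_points: int):
    """Split a time-indexed DataFrame into train/test splits without shuffling.

    The last `number_test_points` rows form the test split and all earlier
    rows form the training split.
    """
    if not 0 < number_test_points < len(df):
        raise ValueError("number_test_points must be between 1 and len(df) - 1")
    df = df.sort_index()  # make sure rows are in time order
    return df.iloc[:-number_test_points], df.iloc[-number_test_points:]


# Synthetic daily signal, used only to illustrate the split.
index = pd.date_range("2021-08-07", periods=500, freq="D")
demo = pd.DataFrame({"value": np.sin(np.arange(500) / 24)}, index=index)
train, test = chronological_split(demo, number_test_points=150)
print(len(train), len(test))  # 350 150
print(train.index.max() < test.index.min())  # True
```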

> For an in-depth tutorial of forecasting using `merlion` check the [official documentation](https://opensource.salesforce.com/Merlion/v1.0.1/examples/forecast/1_ForecastFeatures.html). You will find there information about the different models, forecasting with multiple time-series and anomaly detection.
> For more about time-series train/test splitting methods including cross validation, you can check this tutorial [Time based cross validation](https://towardsdatascience.com/time-based-cross-validation-d259b13d42b8)

In [None]:
from merlion.utils import TimeSeries
from merlion.models.forecast.prophet import Prophet, ProphetConfig
from merlion.transform.base import Identity
import matplotlib.pyplot as plt

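number_test_points = 150  # size of the held-out test split (the most recent points)

# Chronological split: the last `number_test_points` rows are used for testing.
test_data = TimeSeries.from_pd(df[-number_test_points:])
train_data = TimeSeries.from_pd(df[:-number_test_points])

# max_forecast_steps must cover the whole test horizon forecast below.
config = ProphetConfig(max_forecast_steps=number_test_points, add_seasonality="auto", transform=Identity())
model = Prophet(config)
model.train(train_data=train_data)

# Forecast at the test timestamps; test_err holds the standard error of the forecast.
test_pred, test_err = model.forecast(time_stamps=test_data.time_stamps)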

Finally, we plot the forecast values together with the training values, including the uncertainty around the forecast. This is all done by calling `model.plot_forecast`. The following picture is an example of a graph obtained by running this code for a given signal. When you run the notebook with your own signal you will obtain a different graph, but with similar elements.

<img src="../media/forecast/example_prediction_graph.png" alt="Additional Options" style="width: 60%;" />


In [None]:
fig, ax = model.plot_forecast(time_series=test_data, time_series_prev=train_data,
                              plot_forecast_uncertainty=True, plot_time_series_prev=True)
plt.show()
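The plot gives a visual check; to evaluate the forecast on the held-out test split with numbers, a minimal NumPy sketch of two common error metrics is shown below. The functions `mae` and `smape` are our own illustrative helpers; Merlion also ships forecast metrics (see `merlion.evaluate.forecast` in its documentation), which you may prefer.

```python
import numpy as np


def mae(actual, predicted) -> float:
    """Mean absolute error between two equally long sequences."""
    actual, predicted = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


def smape(actual, predicted) -> float:
    """Symmetric mean absolute percentage error, in percent (between 0 and 200).

    Points where both the actual and predicted values are zero count as zero error.
    """
    actual, predicted = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    denom = np.abs(actual) + np.abs(predicted)
    ratio = np.divide(2 * np.abs(actual - predicted), denom,
                      out=np.zeros_like(denom), where=denom != 0)
    return float(100 * np.mean(ratio))


print(round(mae([1, 2, 3], [1, 2, 5]), 4))  # 0.6667
print(smape([1, 2, 4], [1, 2, 4]))          # 0.0
```

Applied to this notebook, the arrays could be obtained with Merlion's `TimeSeries.to_pd()`, for example `mae(test_data.to_pd().values.ravel(), test_pred.to_pd().values.ravel())`; treat that conversion as an assumption to check against your Merlion version.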

# <a name="write"></a> Write the forecast in Clarify

We can now write back to Clarify by creating DataFrames and metadata for the generated forecast and calling the `insert` method from `pyclarify`. We write both the main trend of the forecast and the upper and lower bounds associated with its uncertainty.

In [None]:
from pyclarify import Signal, DataFrame

# Store the model configuration as labels so the forecast can be traced back to its settings.
config_dict = config.to_dict()
config_labels = [str({key: value}) for key, value in config_dict.items()]

def write_data_and_metadata(original, new_signal_id, new_name, times, values):
    """Create or update the signal `new_signal_id` and insert `values` at `times`."""
    args = {
        "name": new_name,
        "description": f"Forecast for {original}",
        "labels": {
            "original_item_id": [original],
            "number_points_testing": [str(number_test_points)],
            "forecast_method": ["Prophet"],
            "method_config": config_labels,
        },
    }
    new_signal_meta_data = Signal(**args)

    # created_only=False: create the signal if it is missing, otherwise update its metadata.
    # created_only=True would only create new signals and leave existing ones untouched.
    response = client.save_signals(
        inputs={new_signal_id: new_signal_meta_data},
        created_only=False,
    )

    # Insert the forecast values for the signal.
    new_df = DataFrame(times=times, series={new_signal_id: values})
    response = client.insert(new_df)
    print(response)

forecast_column_id = test_pred.names[0]
column_err = test_err.names[0]
forecast_id = forecast_column_id + "_pred"
forecast_id_upper = forecast_column_id + "_upper"
forecast_id_lower = forecast_column_id + "_lower"

# The bounds are the forecast plus/minus one standard error.
forecast_values = test_pred.univariates[forecast_column_id].values
err_values = test_err.univariates[column_err].values
forecast_upper_values = [x + y for x, y in zip(forecast_values, err_values)]
forecast_lower_values = [x - y for x, y in zip(forecast_values, err_values)]

write_data_and_metadata(item_id, forecast_id, f"Forecast for {item_name}", times=test_pred.time_stamps, values=forecast_values)
write_data_and_metadata(item_id, forecast_id_upper, f"Forecast for {item_name} (upper bound)", times=test_err.time_stamps, values=forecast_upper_values)
write_data_and_metadata(item_id, forecast_id_lower, f"Forecast for {item_name} (lower bound)", times=test_err.time_stamps, values=forecast_lower_values)
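The upper and lower signals above are the forecast plus or minus one standard error (`test_err`, as we understand Merlion's second return value). If the forecast errors are roughly Gaussian, that band covers only about 68% of outcomes. A minimal sketch for choosing the coverage explicitly is shown below; `forecast_bounds` is a hypothetical helper and the Gaussian assumption is ours, not Merlion's or Clarify's.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple


def forecast_bounds(pred: Sequence[float], stderr: Sequence[float],
                    coverage: float = 0.95) -> Tuple[List[float], List[float]]:
    """Return (lower, upper) = pred -/+ z * stderr for a two-sided Gaussian interval.

    coverage=0.6827 gives z of about 1, i.e. the pred +/- stderr bounds used above.
    """
    if len(pred) != len(stderr):
        raise ValueError("pred and stderr must have the same length")
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for 95 % coverage
    lower = [p - z * e for p, e in zip(pred, stderr)]
    upper = [p + z * e for p, e in zip(pred, stderr)]
    return lower, upper


lower, upper = forecast_bounds([10.0, 12.0], [1.0, 2.0], coverage=0.95)
print([round(x, 2) for x in lower], [round(x, 2) for x in upper])
# [8.04, 8.08] [11.96, 15.92]
```

The resulting lists could be passed to `write_data_and_metadata` in place of `forecast_lower_values` and `forecast_upper_values`; if you do so, consider recording the chosen coverage in the signal labels.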

After running the previous steps you should find three new signals (or updated ones, if they already existed) under the integration used in this tutorial, as in the figure below.

<img src="../media/forecast/saved_forecast.png" alt="Additional Options" style="width: 90%;" />

# <a name="visualize"></a> Visualize the forecast result and collaborate with Clarify
<img src="../media/forecast/clarify_forecast_comment.png" alt="Additional Options"  />

Once your data is written via the Clarify API and you have created **items** for the forecast and the bounds of the interval characterizing the uncertainty of the forecast, you can create customized **timelines** with your data. Clarify facilitates the creation of dynamic and responsive graph visualization and collaboration around the generated forecast, for example with the possibility of creating threads of comments on a point or interval of time, as illustrated in the above figure. For more information about visualization and publishing signals on Clarify check the [basic tutorial on using Python with Clarify](https://colab.research.google.com/github/searis/data-science-tutorials/blob/main/tutorials/Introduction.ipynb).

### Where to go next

* [Pattern Recognition](https://colab.research.google.com/github/searis/data-science-tutorials/blob/main/tutorials/Pattern%20Recognition.ipynb)
* [Google Cloud Hosting](https://colab.research.google.com/github/searis/data-science-tutorials/blob/main/tutorials/Google%20Cloud%20Hosting.ipynb)