# Predictions on Time Series with Prescience

The best way to understand how to make predictions on time series in `Prescience` is probably to work through an example. For that purpose, we are going to use a time series that traces the usage of a Ceph cluster, in bytes.

The source is already uploaded.

## The `source` object

We uploaded the raw data from a time series backend (Warp10) and started a `parse task`. This task performs some pre-analysis of the raw data for later processing, along with some type resolution.

The `source` object is almost like your original data, except that it holds metadata computed during the parse task. We won't describe all the metadata here, but it includes several statistics, such as:
* The standard deviation for each column
* The type of data contained in the column (integer, boolean, text, etc.)
* Whether the column can contain `null` values
* etc.
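To make these statistics more concrete, here is a minimal sketch of how such per-column metadata could be computed in plain Python. It is only an illustration: the function `summarize_column`, the type names and the sample columns are assumptions made for this example, not the way the parse task is actually implemented.

```python
import statistics


def summarize_column(values):
    """Return a few descriptive statistics for one column of raw values."""
    present = [v for v in values if v is not None]
    # Infer a coarse type name from the non-null values. bool is checked
    # first because bool is a subclass of int in Python.
    if all(isinstance(v, bool) for v in present):
        kind = "boolean"
    elif all(isinstance(v, int) for v in present):
        kind = "integer"
    elif all(isinstance(v, (int, float)) for v in present):
        kind = "double"
    else:
        kind = "text"
    numeric = kind in ("integer", "double")
    return {
        "type": kind,
        # The column is nullable if at least one value is missing
        "nullable": len(present) < len(values),
        # Population standard deviation, only meaningful for numeric columns
        "stddev": statistics.pstdev(present) if numeric and present else None,
    }


# Hypothetical sample columns (not the real Ceph data)
columns = {
    "time": [1, 2, 3, 4],
    "bytes_used": [10.0, 12.0, None, 14.0],
    "healthy": [True, True, False, True],
}
summary = {name: summarize_column(values) for name, values in columns.items()}
```

Each entry of `summary` then describes one column, which is roughly the kind of information the schema view of a source displays.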

Let's retrieve the existing sources:

In [None]:
from prescience_client import prescience
# Display the list of all sources in your prescience project
prescience.sources().show('html')

Then we fetch this source to explore it and get a brief summary of the information extracted during the `parse task`.

In [None]:
source = prescience.source('ceph-usage-40w-hourly')
source.schema().show('html')

We can even plot the source object to visualize the raw time series. To do so, we need to indicate the name of the column to use as the x-axis (`x`).

In [None]:
%matplotlib inline
# Import the pyplot submodule explicitly so that matplotlib.pyplot is available
import matplotlib.pyplot
matplotlib.pyplot.rcParams["figure.figsize"] = (20, 3)
source.plot(x='time-column')

## The `dataset` object

The `dataset` object contains the same data as your `source`, except that the data has been transformed so that machine learning algorithms can consume it.

The transformation rules won't be described here; all you need to know is that the previously computed statistics are used to choose an appropriate transformation strategy.

In our case, if we plot the dataset we will see that it looks exactly like the source, except that all axes have been standardized (i.e. values have been rescaled so that their mean is 0).
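As an illustration of what standardizing a column means, here is a minimal sketch assuming a z-score style transformation (zero mean and unit standard deviation). The exact transformation applied by the preprocess task is not described here, so the function `standardize` and the sample values are assumptions made for this example.

```python
import statistics


def standardize(values):
    """Rescale values so that their mean is 0 and their standard deviation is 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no variation: map every value to 0
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical usage values in bytes (not the real Ceph data)
usage_bytes = [100.0, 120.0, 140.0, 160.0]
scaled = standardize(usage_bytes)
```

The shape of the curve is unchanged by this rescaling, which is why the plotted dataset looks like the source with different axis values.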

In [None]:
dataset_id = 'dataset-ceph-usage-40w-hourly'
prescience.plot_dataset(dataset_id, plot_test=False)

The `dataset` object is created from a `source` with a `preprocess task`. Another thing to note is that the preprocess task is responsible for creating the `folds` that we previously talked about in the `Problem Definition` part.

You can easily see the created `folds`:

In [None]:
for fold_number in range(3):
    prescience.plot_dataset(dataset_id, fold_number=fold_number)

As you can see, each fold is composed of 2 parts:
* A train part, which is used to train machine learning algorithms
* A test part, which is used to evaluate the relevance of a model

This is a standard process in machine learning: it evaluates how well algorithms perform on data they have never seen before.
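To illustrate the idea of train and test parts on a time series, here is a minimal sketch of one common way to build folds that respect time order (an expanding training window followed by a test window). How the preprocess task actually builds its folds is not described here, so the function `expanding_window_folds` and its parameters are assumptions made for this example.

```python
def expanding_window_folds(n_samples, n_folds, test_size):
    """Return (train_indices, test_indices) pairs that respect time order."""
    folds = []
    for k in range(n_folds):
        # The last fold ends at the end of the series, earlier folds end sooner
        test_end = n_samples - (n_folds - 1 - k) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested folds")
        # Train on everything before the test window, never on later points
        folds.append((range(0, test_start), range(test_start, test_end)))
    return folds


# 10 samples split into 3 folds, each tested on 2 consecutive points
folds = expanding_window_folds(n_samples=10, n_folds=3, test_size=2)
```

In every fold the test indices come strictly after the train indices, so the model is always evaluated on points that lie in its future.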

## The `model` object

The `model` object is created from a `dataset` with a `train task`. However, a train task needs to be launched on a specific machine learning algorithm with defined parameters. That's why there is an intermediate task called the `optimization task`.

The aim of an optimization task is to use the previously created folds of our dataset to train many machine learning algorithms with different hyperparameters, evaluate them and find the best one.
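The principle can be sketched with a toy example: we try several values of one hyperparameter, score each of them on every fold with the mean squared error, and keep the candidate with the lowest average score. The toy model (`moving_average_forecast`), the series and the folds below are assumptions made for this illustration. The real optimization task explores actual machine learning algorithms, and its search strategy is not described here.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def moving_average_forecast(train, horizon, window):
    """Toy model: repeat the mean of the last `window` training values."""
    level = sum(train[-window:]) / window
    return [level] * horizon


def evaluate(series, folds, window):
    """Average MSE of one hyperparameter value over all folds."""
    scores = []
    for train_end, test_end in folds:
        train, test = series[:train_end], series[train_end:test_end]
        predicted = moving_average_forecast(train, len(test), window)
        scores.append(mean_squared_error(test, predicted))
    return sum(scores) / len(scores)


# Hypothetical series and folds, each fold given as (train_end, test_end)
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
folds = [(4, 6), (6, 8), (8, 10)]

# Evaluate every candidate window size and sort by increasing cost
candidates = [1, 2, 4]
results = sorted((evaluate(series, folds, w), w) for w in candidates)
best_score, best_window = results[0]
```

Sorting by cost mirrors what we do when we sort evaluation results by a scoring metric: the best configuration ends up first.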

All evaluation results are stored in Prescience in objects called `evaluation results`. You can display previously computed evaluations as follows:

In [None]:
evaluation_results = prescience.get_evaluation_results(
    dataset_id,
    # Only display results for the wanted horizon
    forecasting_horizon_steps=24,
    # Sort all the results by the selected scoring metric
    sort_column='costs.mse'
)
evaluation_results.show('html')

The first row of the table is the best model and configuration found. If we compare this configuration with that of the deployed model, we will see that they are the same.

In [None]:
# Show the config from the best evaluation results
config_eval = evaluation_results.content[0].config()

import json
print(json.dumps(config_eval.kwargs(), indent=4))

config_eval.show('html')

## Play around with the model

We previously trained a model automatically tuned by Prescience. Its forecast horizon was set to 24 (the base series being sampled at 1-hour intervals) to predict one day ahead.

In [None]:
# Display the list of all models associated to this source
source.tree().show()

In [None]:
# Access the created model on prescience
model_id = 'ceph-usage-40w-hourly-model-24hori'
model = prescience.model(model_id)

# Show the config from the model
config_train = model.config()
print(json.dumps(config_train.kwargs(), indent=4))
config_train.show('html')

## Observing predictions results
Plotting the predicted result of a model is one of the best ways to estimate its relevance. We are going to choose some arbitrary points in our original data and compare the actual series with the one predicted by the newly deployed model.

In [None]:
# Point of the original data from which the prediction starts
ORIGIN_TIMESTAMP = 1560733200000000

# Generate the input payload that will be sent to the model for making the prediction (it uses the initial data to create it)
payload_dict = prescience.generate_payload_dict_for_model(model_id, from_data=ORIGIN_TIMESTAMP)

# Print this payload:
import json
print(json.dumps(payload_dict, indent=4))

### Ask for a single prediction

In [None]:
# Get the prediction of the model
result_dataframe = model.get_dataframe_for_plot_result(payload_dict)

# Plot the prediction of the model
matplotlib.pyplot.rcParams["figure.figsize"] = (10, 4)
result_dataframe.plot()

### Ask for a rolling prediction
We can go a bit further and ask the model to predict even further into the future: each prediction result is used as input for the next forecast.

Rolling forecasts tend to become less accurate with each step, because the errors of earlier predictions feed into the later ones. Usually the result either converges to a fixed value or diverges abruptly.
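The rolling mechanism itself can be sketched in a few lines. This is only an illustration of the general idea: the helper `rolling_forecast` and the stand-in model `naive_drift_model` are assumptions made for this example, and how the `rolling_steps` argument is implemented in the client is not shown here.

```python
def rolling_forecast(predict, history, input_size, rolling_steps):
    """Chain several forecasts, feeding each prediction back as model input."""
    window = list(history[-input_size:])
    forecast = []
    for _ in range(rolling_steps):
        step = predict(window)
        forecast.extend(step)
        # Slide the input window forward over the values just predicted
        window = (window + step)[-input_size:]
    return forecast


def naive_drift_model(window, horizon=3):
    """Hypothetical stand-in model: extend the average slope of the window."""
    slope = (window[-1] - window[0]) / (len(window) - 1)
    return [window[-1] + slope * (i + 1) for i in range(horizon)]


# Hypothetical history (not the real Ceph data)
history = [1.0, 2.0, 3.0, 4.0]
forecast = rolling_forecast(naive_drift_model, history, input_size=4, rolling_steps=2)
```

After the first step, the model no longer sees only observed values: part (and eventually all) of its input is made of its own predictions, which explains why errors accumulate.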

Let's see how this model handles the rolling prediction.

In [None]:
# Get the prediction of the model with a 'rolling' strategy
result_dataframe = model.get_dataframe_for_plot_result(payload_dict, rolling_steps=2)

# Plot the prediction of the model
matplotlib.pyplot.rcParams["figure.figsize"] = (20, 4)
result_dataframe.plot()

As you can see, the deployed model seems to have understood the trend and the seasonality of the underlying time series. This is a first step, but the predictions can be improved. We are going to see what is really happening under the hood and try to improve that prediction.

What would happen if we went even further?

In [None]:
# Get the prediction of the model with a 'rolling' strategy
result_dataframe = model.get_dataframe_for_plot_result(payload_dict, rolling_steps=6)

# Plot the prediction of the model
matplotlib.pyplot.rcParams["figure.figsize"] = (20, 4)
result_dataframe.plot()

The prediction seems to maintain the seasonality with a lower variance and with an increasing trend.