# Implementation tracking tutorial

## Introduction

Implementation tracking is a module that checks whether the recommendations exported to the CRA have actually been implemented. This notebook demonstrates the main steps of the process:

* Prepare data for implementation calculations from the recommendations sent to the CRA and the most recent sensor data.
* Calculate the implementation status.

The results from this pipeline can be exported into the CRA using the functionalities outlined in the [export tutorial](export.ipynb).

## Setup

In [1]:
# Resolve path when used in use case project
import sys
from pathlib import Path

sys.path.insert(0, str(Path("../../").resolve()))

In [2]:
import recommend
print(f'Using {recommend.__version__} version of recommend package')

Using 0.39.0 version of recommend package


In [3]:
import numpy as np
import pandas as pd

## Example data

We will be using five datasets to showcase the ``implementation_tracking`` functionalities.

Previous optimization runs are used to find out when each recommendation was generated, based on its ``run_id``.

In [4]:
from recommend import datasets

cra_runs = datasets.get_sample_runs_cra()
cra_runs[0]

{'id': '1d5b3bdb-ed6e-4ec6-b961-bf00df66149e',
 'timestamp': '2017-09-05T23:00:00Z'}

Previous recommendations sent to the CRA describe what operators were advised to change in the control room.

In [5]:
cra_recs = datasets.get_sample_recommendations_cra()
cra_recs[0]

{'id': 'af1ddb6a-917c-466c-a301-54f1ca7fabb1',
 'value': 3461.7177480471337,
 'tolerance': 400.0,
 'run_id': '1d5b3bdb-ed6e-4ec6-b961-bf00df66149e',
 'tag_id': 'ec60d156-eb79-41b0-a907-ccedf677da9a',
 'target_id': '7d583622-e55e-49df-9978-6d2b4bcaa5c3',
 'is_flagged': False,
 'status': 'Pending'}

State data contains the value of each control before optimization.

In [6]:
cra_states = datasets.get_sample_states_cra()
cra_states[0]

{'id': 'dfa49855-b295-4c8a-a1bc-f21c6e6b9d23',
 'value': 3564.6864600000004,
 'run_id': '1d5b3bdb-ed6e-4ec6-b961-bf00df66149e',
 'tag_id': 'ec60d156-eb79-41b0-a907-ccedf677da9a'}
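
The three record types above are linked by shared keys: a recommendation and a state refer to the same control through ``tag_id``, and both point to their run through ``run_id``. As a minimal sketch of that relationship, the snippet below joins the first sample records with plain pandas. It is not the library's internal implementation, and the column names ``recommended_value``, ``before_recs_value`` and ``run_timestamp`` are chosen here only to mirror the consolidated output used later in this tutorial.

```python
import pandas as pd

RUN_ID = "1d5b3bdb-ed6e-4ec6-b961-bf00df66149e"
TAG_ID = "ec60d156-eb79-41b0-a907-ccedf677da9a"

# First records from the sample outputs above (fields not needed for the join are omitted)
runs = pd.DataFrame([{"id": RUN_ID, "timestamp": "2017-09-05T23:00:00Z"}])
recs = pd.DataFrame(
    [
        {
            "id": "af1ddb6a-917c-466c-a301-54f1ca7fabb1",
            "value": 3461.7177480471337,
            "run_id": RUN_ID,
            "tag_id": TAG_ID,
        }
    ]
)
states = pd.DataFrame(
    [
        {
            "id": "dfa49855-b295-4c8a-a1bc-f21c6e6b9d23",
            "value": 3564.6864600000004,
            "run_id": RUN_ID,
            "tag_id": TAG_ID,
        }
    ]
)

joined = (
    recs.rename(columns={"value": "recommended_value"})
    # Attach the pre-optimization value of the same tag in the same run
    .merge(
        states.drop(columns="id").rename(columns={"value": "before_recs_value"}),
        on=["run_id", "tag_id"],
        how="left",
    )
    # Attach the timestamp at which the run was generated
    .merge(
        runs.rename(columns={"id": "run_id", "timestamp": "run_timestamp"}),
        on="run_id",
        how="left",
    )
)
joined["run_timestamp"] = pd.to_datetime(joined["run_timestamp"], utc=True)
```

A left join keeps every recommendation even when a matching state or run is missing, which makes gaps visible as ``NaN`` rather than silently dropping rows.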

Tag values for the controls are required to evaluate whether the current tag value is consistent with the recommended one.

In [7]:
tag_data = datasets.get_sample_implementation_status_input_data()
tag_data.head()

Unnamed: 0,timestamp,amina_flow,ore_pulp_density,ore_pulp_flow,ore_pulp_ph,starch_flow,total_air_flow,total_column_level
0,2017-09-05 23:00:00+00:00,630.368125,1.744561,410.0,9.564084,3564.68646,2097.828341,2638.478359
1,2017-09-06 02:00:00+00:00,640.858592,1.742618,408.92094,9.575665,3486.914953,2005.392683,3338.673017
2,2017-09-06 05:00:00+00:00,495.941623,1.626334,401.539085,9.500252,3517.23391,1990.812654,2690.721078
3,2017-09-06 08:00:00+00:00,553.022891,1.700219,408.732205,9.595444,3472.47579,2035.502478,2688.994746
4,2017-09-06 11:00:00+00:00,523.073518,1.682749,398.98012,9.533169,2910.032707,2046.846795,3225.537206


Finally, tag metadata is needed to map each tag name to its id.

In [8]:
tag_meta = datasets.get_sample_tags_meta()
tag_meta

MetaDataConfig(...)

## Data preparation with ``collect_recs_vs_actual_implementation``

The first step in implementation tracking is to consolidate the available data sources using the ``collect_recs_vs_actual_implementation`` function.

The ``offset`` argument sets the evaluation period: the time between the recommendation timestamp in ``cra_runs`` and the moment the recommendation is evaluated. With an offset of three hours, a recommendation generated at 23:00 is evaluated against the tag values recorded at 02:00 the next day.

The resulting dataframe contains one row per recommendation sent to the CRA, with run information and three values for the tag:
* ``recommended_value``: Recommendation sent to the operators for the tag.
* ``before_recs_value``: Value of the tag before being optimized (at ``run_timestamp`` - the timestamp used in the recommendations creation).
* ``current_value``: Value of the tag on the implementation tracking evaluation period (at ``timestamp`` - the timestamp used to evaluate the recommendations).

In [9]:
from recommend.implementation_tracker import collect_recs_vs_actual_implementation
implementation_data = collect_recs_vs_actual_implementation(
    cra_recs,
    cra_states,
    cra_runs,
    tag_data,
    tag_meta,
    offset = "3H",
)
implementation_data.head()

Unnamed: 0,tag_id,run_id,recommended_value,before_recs_value,id,run_timestamp,timestamp,current_value
0,ec60d156-eb79-41b0-a907-ccedf677da9a,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,3461.717748,3564.68646,af1ddb6a-917c-466c-a301-54f1ca7fabb1,2017-09-05 23:00:00+00:00,2017-09-06 02:00:00+00:00,3486.914953
1,4206e334-b1ca-4639-8808-e83e3c3be395,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,546.209128,630.368125,7cc8cdac-7fd4-4e24-9d99-610dcd0d73e4,2017-09-05 23:00:00+00:00,2017-09-06 02:00:00+00:00,640.858592
2,4011b8b6-c59b-4e0d-9779-21d8c190b04b,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,400.055094,410.0,2cd1d8f8-d138-4f7e-b88a-fcc3adf9c48b,2017-09-05 23:00:00+00:00,2017-09-06 02:00:00+00:00,408.92094
3,371f1681-6cae-4dc0-abc2-5e8e6a6a116b,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,9.763356,9.564084,51fd0ccf-0dd8-4338-9af7-13a964c7e4d9,2017-09-05 23:00:00+00:00,2017-09-06 02:00:00+00:00,9.575665
4,b57d45f0-7c63-4c5b-8b34-266df9be2e26,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,1.702584,1.744561,f7337e67-45fd-4931-b08e-db7a7138230f,2017-09-05 23:00:00+00:00,2017-09-06 02:00:00+00:00,1.742618
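
The ``timestamp`` and ``current_value`` columns above are consistent with shifting the run timestamp by the offset and reading the tag value at the shifted time. The sketch below reproduces that lookup for ``starch_flow`` with plain pandas. It illustrates the idea under that assumption and is not the library's code; the variable names are illustrative.

```python
import pandas as pd

# Two rows of the starch_flow column from the tag data shown earlier
tag_data = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2017-09-05 23:00:00+00:00", "2017-09-06 02:00:00+00:00"], utc=True
        ),
        "starch_flow": [3564.68646, 3486.914953],
    }
).set_index("timestamp")

run_timestamp = pd.Timestamp("2017-09-05T23:00:00Z")
offset = pd.Timedelta(hours=3)  # a three-hour offset, as in the call above

# Moment at which the recommendation is evaluated
evaluation_timestamp = run_timestamp + offset

# Tag value when the recommendation was generated, and at evaluation time
before_recs_value = tag_data.loc[run_timestamp, "starch_flow"]
current_value = tag_data.loc[evaluation_timestamp, "starch_flow"]
```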


## Implementation evaluation with ``calculate_implementation_status``

The next step is to calculate the implementation status from the output of the previous step, using the ``calculate_implementation_status`` function. It returns an implementation percentage (``implementation_perc``) for each tag and run, expressed as a fraction that is normally between 0 and 1.

This function provides two built-in methods (``"deviation"`` and ``"progress"``) and also accepts custom ones, as shown in the sections below.


### ``"deviation"`` implementation status

The ``"deviation"`` method checks whether ``current_value`` is within a close range of the ``recommended_value``.

The size of the range is set by the ``sensitivity`` parameter, which has two meanings depending on the value of the ``sensitivity_type`` parameter:
* ``"rel"``: range is defined as ``recommended_value`` $\times ( 1 \pm$ ``sensitivity`` $)$
* ``"abs"``: range is defined as ``recommended_value`` $\pm$ ``sensitivity``

The example below uses the ``"rel"`` ``sensitivity_type``.

In [10]:
from recommend.implementation_tracker import calculate_implementation_status
dev_imp = calculate_implementation_status(
    implementation_data,
    method = "deviation",
    sensitivity = 0.05,
    sensitivity_type = "rel",
)
dev_imp.head()

Unnamed: 0,id,tag_id,run_id,implementation_perc
0,af1ddb6a-917c-466c-a301-54f1ca7fabb1,ec60d156-eb79-41b0-a907-ccedf677da9a,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,1.0
1,7cc8cdac-7fd4-4e24-9d99-610dcd0d73e4,4206e334-b1ca-4639-8808-e83e3c3be395,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,0.0
2,2cd1d8f8-d138-4f7e-b88a-fcc3adf9c48b,4011b8b6-c59b-4e0d-9779-21d8c190b04b,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,1.0
3,51fd0ccf-0dd8-4338-9af7-13a964c7e4d9,371f1681-6cae-4dc0-abc2-5e8e6a6a116b,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,1.0
4,f7337e67-45fd-4931-b08e-db7a7138230f,b57d45f0-7c63-4c5b-8b34-266df9be2e26,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,1.0
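
To make the two range definitions concrete, here is a minimal scalar sketch of the check, applied to the first two rows of the sample data. The helper ``is_within_range`` is hypothetical (not part of ``recommend``), and treating the range boundaries as inclusive is an assumption.

```python
def is_within_range(recommended, current, sensitivity, sensitivity_type="rel"):
    """Return True when ``current`` lies inside the range around ``recommended``."""
    if sensitivity_type == "rel":
        # Half-width of the range is a fraction of the recommended value
        half_width = abs(recommended) * sensitivity
    elif sensitivity_type == "abs":
        # Half-width of the range is given directly in tag units
        half_width = sensitivity
    else:
        raise ValueError(f"Unknown sensitivity_type: {sensitivity_type!r}")
    return abs(current - recommended) <= half_width


# starch_flow: deviates by about 25 units, range half-width is about 173 units
starch = is_within_range(3461.717748, 3486.914953, 0.05, "rel")  # True

# amina_flow: deviates by about 95 units, range half-width is about 27 units
amina = is_within_range(546.209128, 640.858592, 0.05, "rel")  # False

# The same amina_flow row with an absolute range of +/- 100 units
amina_abs = is_within_range(546.209128, 640.858592, 100.0, "abs")  # True
```

The first two results agree with the ``implementation_perc`` values of 1.0 and 0.0 shown above for these rows.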


It is also possible to define a different sensitivity for each tag. This is done by passing a dictionary to the ``sensitivity`` parameter, where the keys are the ``tag_id`` and the values are the desired sensitivity for each tag.

In [11]:
sensitivity_dict = {
    "starch_flow": 0.01,
    "amina_flow": 0.1,
    "ore_pulp_flow": 0.05,
    "ore_pulp_ph": 0.03,
    "ore_pulp_density": 0.08,
    "total_air_flow": 0.02,
    "total_column_level": 0.04,
}
dev_imp_dict = calculate_implementation_status(
    implementation_data,
    method = "deviation",
    sensitivity = sensitivity_dict,
    sensitivity_type = "rel",
)
dev_imp_dict.head()

Unnamed: 0,id,tag_id,run_id,implementation_perc
0,af1ddb6a-917c-466c-a301-54f1ca7fabb1,ec60d156-eb79-41b0-a907-ccedf677da9a,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,0.0
1,7cc8cdac-7fd4-4e24-9d99-610dcd0d73e4,4206e334-b1ca-4639-8808-e83e3c3be395,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,0.0
2,2cd1d8f8-d138-4f7e-b88a-fcc3adf9c48b,4011b8b6-c59b-4e0d-9779-21d8c190b04b,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,0.0
3,51fd0ccf-0dd8-4338-9af7-13a964c7e4d9,371f1681-6cae-4dc0-abc2-5e8e6a6a116b,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,0.0
4,f7337e67-45fd-4931-b08e-db7a7138230f,b57d45f0-7c63-4c5b-8b34-266df9be2e26,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,0.0
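
The dictionary above is keyed by tag name, while ``implementation_data`` identifies tags by ``tag_id``. If the sensitivities need to be expressed per ``tag_id``, the keys can be translated first. The sketch below is an illustration under stated assumptions: the name-to-id pairs are read off the sample outputs in this notebook (by matching ``current_value`` at 02:00 against the columns of ``tag_data``), they cover only the five tags visible there, and the helper ``rekey_by_tag_id`` is hypothetical and not part of ``recommend``.

```python
# Name-to-id pairs inferred from the sample outputs in this notebook (partial)
tag_name_to_id = {
    "starch_flow": "ec60d156-eb79-41b0-a907-ccedf677da9a",
    "amina_flow": "4206e334-b1ca-4639-8808-e83e3c3be395",
    "ore_pulp_flow": "4011b8b6-c59b-4e0d-9779-21d8c190b04b",
    "ore_pulp_ph": "371f1681-6cae-4dc0-abc2-5e8e6a6a116b",
    "ore_pulp_density": "b57d45f0-7c63-4c5b-8b34-266df9be2e26",
}


def rekey_by_tag_id(sensitivity_by_name, name_to_id):
    """Hypothetical helper: translate tag-name keys into tag-id keys."""
    missing = sorted(set(sensitivity_by_name) - set(name_to_id))
    if missing:
        # Fail loudly rather than silently dropping a tag's sensitivity
        raise KeyError(f"No tag id known for: {missing}")
    return {name_to_id[name]: value for name, value in sensitivity_by_name.items()}


sensitivity_by_id = rekey_by_tag_id(
    {"starch_flow": 0.01, "amina_flow": 0.1, "ore_pulp_flow": 0.05},
    tag_name_to_id,
)
```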


### ``"progress"`` implementation status

The second method calculates the implementation status as the ratio
$$
\text{implementation status} = \frac{\text{actual change}}{\text{suggested change}}
$$
where
$$
\begin{aligned}
\text{actual change} &= \text{actual value} - \text{original value} \\
\text{suggested change} &= \text{recommended value} - \text{original value}
\end{aligned}
$$

Here the actual value is ``current_value`` and the original value is ``before_recs_value``.

In [12]:
pro_imp = calculate_implementation_status(
    implementation_data,
    method = "progress",
)
pro_imp.head()

Unnamed: 0,id,tag_id,run_id,implementation_perc
0,af1ddb6a-917c-466c-a301-54f1ca7fabb1,ec60d156-eb79-41b0-a907-ccedf677da9a,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,0.755293
1,7cc8cdac-7fd4-4e24-9d99-610dcd0d73e4,4206e334-b1ca-4639-8808-e83e3c3be395,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,-0.124651
2,2cd1d8f8-d138-4f7e-b88a-fcc3adf9c48b,4011b8b6-c59b-4e0d-9779-21d8c190b04b,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,0.108504
3,51fd0ccf-0dd8-4338-9af7-13a964c7e4d9,371f1681-6cae-4dc0-abc2-5e8e6a6a116b,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,0.058121
4,f7337e67-45fd-4931-b08e-db7a7138230f,b57d45f0-7c63-4c5b-8b34-266df9be2e26,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,0.046268
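
As a worked check of the ratio, the sketch below recomputes the first two rows by hand from ``before_recs_value``, ``current_value`` and ``recommended_value``. The helper ``progress_ratio`` is hypothetical, and returning ``NaN`` when no change was suggested is an assumption, not documented library behavior.

```python
import math


def progress_ratio(before, current, recommended):
    """Ratio of the actual change to the suggested change for one recommendation."""
    suggested_change = recommended - before
    if suggested_change == 0:
        # Assumption: with no suggested change the ratio is undefined
        return math.nan
    return (current - before) / suggested_change


# starch_flow: moved about three quarters of the way towards the recommendation
starch = progress_ratio(before=3564.68646, current=3486.914953, recommended=3461.717748)

# amina_flow: moved away from the recommendation, so the ratio is negative
amina = progress_ratio(before=630.368125, current=640.858592, recommended=546.209128)
```

Rounded to six decimals, ``starch`` and ``amina`` match the 0.755293 and -0.124651 reported above.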


With this method, implementation values can fall outside the 0-100% range. To limit the values to that interval, use the ``clip`` argument.

Each ratio is first clipped between 0 and 2. Then, each value above 1 is replaced by 2 - ratio. The reasoning is that overshooting the recommended value by a large amount is as bad as not reaching it: above 100%, the implementation percentage declines until it reaches 0% again.

In [13]:
pro_imp_clip = calculate_implementation_status(
    implementation_data,
    method = "progress",
    clip = True,
)
pro_imp_clip.head()

Unnamed: 0,id,tag_id,run_id,implementation_perc
0,af1ddb6a-917c-466c-a301-54f1ca7fabb1,ec60d156-eb79-41b0-a907-ccedf677da9a,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,0.755293
1,7cc8cdac-7fd4-4e24-9d99-610dcd0d73e4,4206e334-b1ca-4639-8808-e83e3c3be395,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,0.0
2,2cd1d8f8-d138-4f7e-b88a-fcc3adf9c48b,4011b8b6-c59b-4e0d-9779-21d8c190b04b,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,0.108504
3,51fd0ccf-0dd8-4338-9af7-13a964c7e4d9,371f1681-6cae-4dc0-abc2-5e8e6a6a116b,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,0.058121
4,f7337e67-45fd-4931-b08e-db7a7138230f,b57d45f0-7c63-4c5b-8b34-266df9be2e26,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,0.046268
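
The clipping rule described above can be written in a few lines. This is a sketch of the rule as stated in the text, not the library's code, and ``clip_progress`` is a hypothetical name.

```python
def clip_progress(ratio):
    """Map a raw progress ratio into [0, 1], penalizing overshoot symmetrically."""
    # Step 1: limit the ratio to the interval [0, 2]
    clipped = min(max(ratio, 0.0), 2.0)
    # Step 2: values above 1 are reflected, so 1.3 becomes 0.7 and 2.0 becomes 0.0
    return 2.0 - clipped if clipped > 1.0 else clipped


unchanged = clip_progress(0.755293)   # already inside [0, 1]
negative = clip_progress(-0.124651)   # moved the wrong way: 0.0
overshoot = clip_progress(1.3)        # 30% past the recommendation
far_overshoot = clip_progress(2.5)    # more than twice the suggested change: 0.0
```

The first two values correspond to the first two rows of the clipped output above.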


### Custom implementation status

Custom methods for calculating the implementation status can be passed to the ``calculate_implementation_status`` function.

These methods must follow this protocol:

In [14]:
from recommend.implementation_tracker._implementation_calculation import TMethod
import inspect
print(inspect.getsource(TMethod))

class TMethod(tp.Protocol):
    def __call__(
        self,
        implementation_data: pd.DataFrame,
        **kwargs: tp.Any,
    ) -> pd.DataFrame:
        """
        Signature of functions that can be passed as method in
        ``calculate_implementation_status``, where ``kwargs`` includes parameters
        required to calculate the implementation status.

        Expect to receive nans on ``current_value`` column of ``implementation_data``.
        """



Let's illustrate this with a function that considers an implementation complete if ``current_value`` and ``recommended_value`` round to the same integer.

In [15]:
def integer_implementation_method(
    implementation_data: pd.DataFrame,
) -> pd.DataFrame:
    """
    Calculates implementation status by checking whether the recommended value
    and the current value round to the same integer.

    Args:
        implementation_data: data ready for implementation status calculations.

    Returns:
        Dataframe with one row per recommendation and run id with the implementation
        status (``True`` if implemented, ``False`` otherwise)
    """
    implementation_data = implementation_data.copy()
    # Round to the nearest integer; rows with a NaN ``current_value`` compare as False
    implementation_data["implementation_perc"] = (
        np.round(implementation_data["current_value"])
        == np.round(implementation_data["recommended_value"])
    )

    return implementation_data

cust_imp = calculate_implementation_status(
    implementation_data,
    method = integer_implementation_method,
)
cust_imp.head()

Unnamed: 0,tag_id,run_id,recommended_value,before_recs_value,id,run_timestamp,timestamp,current_value,implementation_perc
0,ec60d156-eb79-41b0-a907-ccedf677da9a,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,3461.717748,3564.68646,af1ddb6a-917c-466c-a301-54f1ca7fabb1,2017-09-05 23:00:00+00:00,2017-09-06 02:00:00+00:00,3486.914953,False
1,4206e334-b1ca-4639-8808-e83e3c3be395,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,546.209128,630.368125,7cc8cdac-7fd4-4e24-9d99-610dcd0d73e4,2017-09-05 23:00:00+00:00,2017-09-06 02:00:00+00:00,640.858592,False
2,4011b8b6-c59b-4e0d-9779-21d8c190b04b,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,400.055094,410.0,2cd1d8f8-d138-4f7e-b88a-fcc3adf9c48b,2017-09-05 23:00:00+00:00,2017-09-06 02:00:00+00:00,408.92094,False
3,371f1681-6cae-4dc0-abc2-5e8e6a6a116b,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,9.763356,9.564084,51fd0ccf-0dd8-4338-9af7-13a964c7e4d9,2017-09-05 23:00:00+00:00,2017-09-06 02:00:00+00:00,9.575665,True
4,b57d45f0-7c63-4c5b-8b34-266df9be2e26,1d5b3bdb-ed6e-4ec6-b961-bf00df66149e,1.702584,1.744561,f7337e67-45fd-4931-b08e-db7a7138230f,2017-09-05 23:00:00+00:00,2017-09-06 02:00:00+00:00,1.742618,True
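
The protocol also allows keyword arguments and warns that ``current_value`` may contain NaNs. The sketch below shows one way a custom method could handle both. Everything in it is illustrative: ``absolute_gap_method`` and its ``max_gap`` parameter are hypothetical names, and the function is called directly on a small hand-built frame, since how ``calculate_implementation_status`` forwards extra keyword arguments is not shown in this tutorial.

```python
import numpy as np
import pandas as pd


def absolute_gap_method(implementation_data: pd.DataFrame, **kwargs) -> pd.DataFrame:
    """Hypothetical custom method: implemented when |current - recommended| <= max_gap."""
    max_gap = kwargs.get("max_gap", 1.0)  # hypothetical parameter, in tag units
    result = implementation_data[["id", "tag_id", "run_id"]].copy()
    gap = (
        implementation_data["current_value"] - implementation_data["recommended_value"]
    ).abs()
    # Report 1.0 / 0.0, but keep NaN where the current value is missing
    result["implementation_perc"] = (gap <= max_gap).astype(float).where(gap.notna())
    return result


# Small hand-built frame; ids are placeholders, values are taken from the sample rows
sample = pd.DataFrame(
    {
        "id": ["a", "b", "c"],
        "tag_id": ["t1", "t2", "t3"],
        "run_id": ["r1", "r1", "r1"],
        "recommended_value": [9.763356, 546.209128, 400.055094],
        "current_value": [9.575665, 640.858592, np.nan],
    }
)
custom_status = absolute_gap_method(sample, max_gap=0.5)
```

Returning a float column with NaN for missing sensor data keeps "not implemented" distinguishable from "could not be evaluated".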
