## Notebook for deploying Cognite Function

Run all cells sequentially up to the `Experimental` section to deploy your Cognite Function.

Make your modifications in the `Inputs` section, where you supply the input parameters required to instantiate, calculate and deploy your Cognite Function. The input parameters related to calculations and deployment are stored in `data_dict`. There are three types of input parameters:
- A: Required parameters needed for deployment of any Cognite Function
- B: Optional parameters, stored in `data_dict["optional"]`
- C: Calculation-specific parameters used as input to your calculation function. These should enter `data_dict["calc_params"]` as key-value pairs.

If your Cognite Function is already instantiated, but you want to set up a new schedule, you can omit calling `generate_cf` and skip straight to calling `deploy_cognite_functions` with a modified `data_dict` of parameters that satisfy your scheduled calculation.
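
As a minimal sketch of the schedule-only workflow described above, the snippet below copies an existing `data_dict` and swaps in a new schedule name and input time series before redeploying. The helper `with_new_schedule` and the sample values are hypothetical and not part of the template; only the `schedule_name` and `ts_input_names` keys and the `deploy_cognite_functions` call are taken from this notebook.

```python
import copy


def with_new_schedule(data_dict, schedule_name, ts_input_names=None):
    """Return a copy of data_dict prepared for an additional schedule.

    Hypothetical helper: the original dictionary is left untouched so it
    can be reused for other schedules.
    """
    new_dict = copy.deepcopy(data_dict)
    new_dict["schedule_name"] = schedule_name
    if ts_input_names is not None:
        new_dict["ts_input_names"] = list(ts_input_names)
    return new_dict


# Illustrative subset of an existing data_dict (values are placeholders).
base_dict = {
    "ts_input_names": ["input_ts_a"],
    "function_name": "cf_avg-leakage",
    "schedule_name": "input_ts_a",
    "calc_params": {"tank_volume": 240},
}

new_dict = with_new_schedule(base_dict, "input_ts_b", ["input_ts_b"])

# With an authenticated client, the new schedule could then be set up with:
# deploy_cognite_functions(new_dict, client, single_call=False, scheduled_call=True)
```

Other entries, such as `ts_output`, may need the same treatment depending on your calculation.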

### --- Authentication ---

In [2]:
%load_ext autoreload
%autoreload 2

import pandas as pd
from datetime import datetime
from cognite.client.data_classes import functions
from cognite.client.data_classes.functions import FunctionSchedulesList
from cognite.client.data_classes.functions import FunctionSchedule

from initialize import initialize_client
from deploy_cognite_functions import deploy_cognite_functions
from generate_cf import generate_cf

cdf_env = "dev"

In [3]:
# Set limit on function calls - don't think it's really necessary ...
func_limits = functions.FunctionsLimits(timeout_minutes=60, cpu_cores=0.25, memory_gb=1, runtimes=["py39"], response_size_mb=2)
client = initialize_client(cdf_env)

In [78]:
# client.time_series.delete(external_id="VAL_17-FI-9101-286:VALUE.COPY")
# client.time_series.delete(external_id="CF_IdealPowerConsumption")
# client.time_series.data.retrieve(external_id="pi:156799",
#                                 aggregates="average",
#                                 granularity=f"60s",
#                                 start=pd.to_datetime(datetime(2024,1,15,10,15,32)),
#                                 end=pd.to_datetime(datetime(2024,1,15,10,30,32))).to_pandas()

# ext_id = client.time_series.search(name="VAL_11-LT-95107A:X.Value").as_external_ids()[0]
# df = client.time_series.data.retrieve_dataframe(external_id=ext_id,
#                                                     # start=pd.to_datetime(datetime(2024,1,21), utc=True),
#                                                     # end=pd.to_datetime(datetime(2024,1,24), utc=True),
#                                                     granularity="60s",
#                                                     aggregates="average").index[0]
# df
# client.files.upload(f"../data/VAL_11-PT-92363BX.Value.xlsx", external_id="VAL_11-PT-92363BX.Value", name=f"VAL_11-PT-92363BX.Value.xlsx",
#                                 data_set_id=1832663593546318, mime_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet") # mime_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
# myfile = client.files.download_bytes(external_id="dummy_data")
# client.files.delete(external_id="VAL_17-PI-95709-258VALUE")

In [4]:
client.time_series.search(name="VAL_11-LT-95107A:X.Value")

| | id | external_id | name | is_string | metadata | asset_id | is_step | description | security_categories | data_set_id | created_time | last_updated_time | unit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 557561586839253 | pi:156799 | VAL_11-LT-95107A:X.Value | False | {'compdev': '0', 'location5': '2', 'pointtype'... | 5925083212680193 | False | PH Oil Boost Pump A SO Res | [] | 3355691952342071 | 2021-01-13 11:19:01.296 | 2024-01-25 03:31:48.500 | |
| 1 | 8441371902458428 | pi:156789 | VAL_11-LT-95007A:X.Value | False | {'context2ndEntityMatchConfScore': '0.525', 'c... | 5812070406603317 | False | PH Oil Export Pump SO Res | [] | 3355691952342071 | 2021-01-13 11:15:38.091 | 2024-01-25 03:34:31.984 | |
| 2 | 8121167691208049 | pi:156800 | VAL_11-LT-95107B:X.Value | False | {'compdev': '0', 'location5': '2', 'pointtype'... | 7099831018954753 | False | PH Oil Boost Pump B SO Res | [] | 3355691952342071 | 2021-01-13 11:17:02.294 | 2024-01-26 03:48:44.880 | |
| 3 | 8774028179719513 | pi:156801 | VAL_11-LT-95108A:X.Value | False | {'context2ndEntityMatchConfScore': '0.525', 'c... | 8346194642603386 | False | PH Oil Boost Pump A SO Res | [] | 3355691952342071 | 2021-01-13 11:19:31.924 | 2024-01-25 03:42:16.036 | |
| 4 | 8842657228489718 | VAL_11-LT-95107A:X.CDF.D.AVG.LeakValue | VAL_11-LT-95107A:X.CDF.D.AVG.LeakValue | False | {} | 5925083212680193 | False | | [] | 1832663593546318 | 2024-01-08 14:47:12.455 | 2024-01-08 14:47:12.455 | |
| 5 | 952316718957386 | VAL_11-LT-95107A:X.CDF.H.AVG.LeakValue | VAL_11-LT-95107A:X.CDF.H.AVG.LeakValue | False | {} | 5925083212680193 | False | | [] | 1832663593546318 | 2024-01-02 13:53:25.898 | 2024-01-02 13:53:25.898 | |
| 6 | 376719443289904 | VAL_11-LT-95107A:X.CDF.D.AVG.LeakValue.NEW | VAL_11-LT-95107A:X.CDF.D.AVG.LeakValue.NEW | False | {} | 5925083212680193 | False | | [] | 1832663593546318 | 2024-01-02 11:31:46.674 | 2024-01-02 11:31:46.674 | |
| 7 | 2359087192723580 | VAL_11-LT-95107B:X.CDF.D.AVG.LeakValue | VAL_11-LT-95107B:X.CDF.D.AVG.LeakValue | False | {} | 7099831018954753 | False | | [] | 1832663593546318 | 2024-01-08 14:50:42.278 | 2024-01-08 14:50:42.278 | |
| 8 | 1489339027664454 | pi:156791 | VAL_11-LT-95008A:X.Value | False | {'context2ndEntityMatchConfScore': '0.525', 'c... | 881318544844107 | False | PH Oil Export Pump SO Res | [] | 3355691952342071 | 2021-01-13 11:14:49.285 | 2024-01-25 03:42:42.027 | |
| 9 | 2659829429231746 | pi:156793 | VAL_11-LT-95034A:X.Value | False | {'compdev': '0', 'location5': '2', 'pointtype'... | 6140886841963154 | False | PH Oil Export Pump LO Res | [] | 3355691952342071 | 2021-01-13 11:15:17.667 | 2024-01-26 03:49:26.395 | |


### --- Inputs ---

#### A. Required parameters
- `ts_input_names` (list): 
    - names of input time series (a list, even if there is only one input). Must be given in the same order as the calculations are performed in `transformation.py`
- `ts_output` (dict): 
    - metadata for output time series, currently supporting the following fields:
    1. `names` (list of strings):
        - names of time series output(s)
        - NB: if there are multiple time series outputs, the order of `names` must correspond to the order of `ts_input_names`.
    2. `description` (list of strings):
        - description for each time series output
    3. `unit` (list of strings):
        - units used for each time series output
- `function_name` (string): 
    - name of Cognite Function to deploy (i.e., folder with name `cf_*function_name*`)
- `calculation_function` (string): 
    - name of main calculation function to run, should be defined in transformation.py (in the folder `cf_*function_name*`) as `main_*calculation_function*`
- `schedule_name` (string):
    - name of the schedule to set up for the Cognite Function. NB: make sure the name is unique to avoid overwriting existing schedules for a particular Cognite Function! If setting up multiple schedules for the same Cognite Function, one for each input time series, a good way to keep them organized is to name each schedule after its time series
- `sampling_rate` (string): 
    - sampling rate of input time series
    - given as a value followed by a time unit, e.g., "30s" for 30 seconds, "2m" for 2 minutes, "1h" for 1 hour, etc.
- `cron_interval_min` (string): 
    - minute interval at which to run the schedule (NB: currently only intervals in the range [1, 60) are supported). The number should be provided as a string.
- `backfill_period` (int): 
    - the period (default: number of days) back in time to perform backfilling
- `backfill_hour` (int):
    - the hour of the day to perform backfilling
- `backfill_min_start` (int):
    - performs backfilling for any scheduled call that falls within hour=`backfill_hour` and minute=`[backfill_min_start, backfill_min_start+cron_interval_min]`
- `testing` (bool):
    - defaults to `False`. Set to `True` if running unit tests
- `add_packages` (list): 
    - additional packages required to run the calculations in `transformation.py`
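
The formats of `sampling_rate` and `cron_interval_min` described above can be checked before deployment. The following is a rough sketch using only the standard library; the helper names `parse_sampling_rate` and `check_cron_interval` are hypothetical, and the accepted unit letters (`s`, `m`, `h`, plus `d` for days) are an assumption rather than the template's actual rules.

```python
import re
from datetime import timedelta

# Assumed unit letters; the template may accept more or fewer.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_sampling_rate(rate):
    """Convert a string such as "30s" or "2m" into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", rate)
    if match is None:
        raise ValueError(f"unrecognised sampling rate: {rate!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def check_cron_interval(cron_interval_min):
    """Return the interval as an int, enforcing the supported range [1, 60)."""
    minutes = int(cron_interval_min)
    if not 1 <= minutes < 60:
        raise ValueError("cron_interval_min must be in the range [1, 60)")
    return minutes


sampling = parse_sampling_rate("1m")
interval = check_cron_interval("15")
```

Failing early on a malformed value is cheaper than discovering it after the function has been deployed and scheduled.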

In [84]:
ts_input_names = ["VAL_17-FI-9101-286:VALUE", "VAL_17-PI-95709-258:VALUE", "VAL_11-PT-92363B:X.Value", "VAL_11-XT-95067B:Z.X.Value"] # Inputs to IdealPowerConsumption function # ["VAL_11-XT-95067B:Z.X.Value", 87.8, "CF_IdealPowerConsumption"] # Inputs to WastedEnergy function
# ts_input_names = ["VAL_11-LT-95107A:X.Value"]
ts_output = {"names": ["CF_IdealPowerConsumption"],
             "description": ["Optimal power consumption from equipment"], #["Daily average drainage from pump"]
             "unit": ["J/s"]} #["m3/min"]

function_name = "ideal-power-consumption"
calculation_function = "ideal_power_consumption"
schedule_name = "ipc"#ts_input_names[0]

sampling_rate = "1m"
cron_interval_min = str(15) #
assert 1 <= int(cron_interval_min) < 60
backfill_period = 20
backfill_hour = 20 # 23
backfill_min_start = 0

add_packages = []#["statsmodels"]

In [27]:
# ts_input_names = ["VAL_17-FI-9101-286:VALUE", "VAL_17-PI-95709-258:VALUE", "VAL_11-PT-92363B:X.Value", "VAL_11-XT-95067B:Z.X.Value"] # Inputs to IdealPowerConsumption function # ["VAL_11-XT-95067B:Z.X.Value", 87.8, "CF_IdealPowerConsumption"] # Inputs to WastedEnergy function
ts_input_names = ["VAL_18-LIT-80143:VALUE"]
ts_output = {"names": ["hourly_avg_drainage_test"],
             "description": [None], #["Daily average drainage from pump"]
             "unit": [None]} #["m3/min"]

function_name = "avg-leakage"
calculation_function = "aggregate"
schedule_name = ts_input_names[0]

sampling_rate = "1m" #
cron_interval_min = str(5) #
assert 1 <= int(cron_interval_min) < 60
backfill_period = 3
backfill_hour = 23 # 23
backfill_min_start = 0

add_packages = ["statsmodels"]

#### B. Optional parameters (leave empty if none)
- `historic_start_time` (dictionary):
    - date to start from when performing initial calculation of signal
    - recommended if millions of data points in full historic signal
    - three keys must be specified
    1. `year`
    2. `month`
    3. `day`
- `aggregate` (dictionary):
    - information about any aggregations to perform in the calculation
    - if performing aggregates, two keys **must** be specified:
    1. `period` (string):
        - the time range defining the aggregated period
        - valid values: `["minute", "hour", "day", "month", "year"]`
    2. `type` (string):
        - what type of aggregate to perform
        - valid values: any aggregation supported by `pandas`, e.g., `"mean"`, `"max"`, ... 

In [85]:
optional = {
    "historic_start_time": {
        "year": 2022,
        "month": 10,
        "day": 1
    },
    # "aggregate": {
    #     "period": "hour",
    #     "type": "mean"
    # }
}

#### C. Calculation-specific parameters (leave empty if none)

In [86]:
calc_params = {
    "tank_volume": 240,
    "derivative_value_excl": 0.002,
    "lowess_frac": 0.001,
    "lowess_delta": 0.01,
}

#### Insert parameters into data dictionary

In [87]:
backfill_min_start = min(59, backfill_min_start)

data_dict = {'ts_input_names':ts_input_names,
            'ts_output':ts_output,
            'function_name': f"cf_{function_name}",
            'schedule_name': schedule_name,
            'calculation_function': f"main_{calculation_function}",
            'granularity': sampling_rate,
            'dataset_id': 1832663593546318, # Center of Excellence - Analytics dataset
            'cron_interval_min': cron_interval_min,
            'testing': False,
            'backfill_period': backfill_period, # days by default (if not doing aggregates)
            'backfill_hour': backfill_hour, # 23: backfilling to be scheduled at last hour of day as default
            'backfill_min_start': backfill_min_start, 'backfill_min_end': min(59.9, backfill_min_start + int(cron_interval_min)),
            'optional': optional,
            'calc_params': calc_params
            }
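
The `backfill_hour`, `backfill_min_start` and `backfill_min_end` entries together define a window in which a scheduled call triggers backfilling. A minimal sketch of that check is given below; `in_backfill_window` is a hypothetical function, and treating both ends of the minute range as inclusive is an assumption based on the parameter description, not on the template's code.

```python
from datetime import datetime


def in_backfill_window(call_time, backfill_hour, backfill_min_start, backfill_min_end):
    """Return True if a scheduled call at call_time falls in the backfill window."""
    return (
        call_time.hour == backfill_hour
        and backfill_min_start <= call_time.minute <= backfill_min_end
    )


# Example values: backfilling at hour 20, starting at minute 0, 15-minute schedule.
window = {
    "backfill_hour": 20,
    "backfill_min_start": 0,
    "backfill_min_end": min(59.9, 0 + 15),
}

runs_backfill = in_backfill_window(datetime(2024, 1, 25, 20, 0), **window)
skips_backfill = in_backfill_window(datetime(2024, 1, 25, 20, 30), **window)
```

Under the inclusive assumption, a 15-minute schedule that fires at 20:00 and again at 20:15 would land inside the window twice, so it is worth checking how the template treats the upper bound.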

### --- Instantiate Cognite Function ---

Set up folder structure for the Cognite Function as required by the template.

In [31]:
generate_cf(function_name, add_packages)

Writing __init__.py ...
Writing handler.py ...
Writing transformation.py ...
Created requirements.txt in c:/Users/vetnev/OneDrive - Aker BP/Documents/First Task/opshub-task1/src/cf_avg-leakage
Packages to add:  ['ipykernel', 'pandas', 'numpy', 'openpyxl', 'pytest', 'python-dotenv', 'pyarrow', 'cognite-sdk', 'statsmodels']

Using version ^6.29.0 for ipykernel

Updating dependencies
Resolving dependencies...

Package operations: 28 installs, 0 updates, 0 removals

  • Installing six (1.16.0)
  • Installing asttokens (2.4.1)
  • Installing executing (2.0.1)
  • Installing parso (0.8.3)
  • Installing platformdirs (4.1.0)
  • Installing pure-eval (0.2.2)
  • Installing pywin32 (306)
  • Installing traitlets (5.14.1)
  • Installing wcwidth (0.2.13)
  • Installing colorama (0.4.6)
  • Installing decorator (5.1.1)
  • Installing jedi (0.19.1)
  • Installing jupyter-core (5.7.1)
  • Installing matplotlib-inline (0.1.6)
  • Installing prompt-toolkit (3.0.43)
  • 

### --- Define transformation function ---

In this step, modify `transformation.py` to include your calculations.

### --- Deploy Cognite Function in one go ---

#### Single call

The initial transformation is data-intensive because it covers the historic signal, so running it as a scheduled call will likely time out. Run it once as a separate, single call first.

In [45]:
deploy_cognite_functions(data_dict, client,
                         single_call=True, scheduled_call=False)

Cognite Function created. Waiting for deployment status to be ready ...
Ready for deployement.
Calling Cognite Function individually ...
... Done


#### Scheduled call

Subsequent calls only transform data from the current date, which is far less data-intensive and can be handled by scheduled calls.

In [88]:
deploy_cognite_functions(data_dict, client,
                         single_call=False, scheduled_call=True)

Preparing schedule to start sharp at next minute ...
Setting up Cognite Function schedule at time 2024-01-25 17:27:00+00:00 ...
... Done
