## Notebook for deploying a Cognite Function

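Run all cells sequentially, up to the `Experimental` section, to deploy your Cognite Function. The `CF 1`, `CF 2` and `CF 3` cells in the `Inputs` section are alternative examples that assign the same variables, so run only the cell for the function you want to deploy (or adapt one of them).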

Modifications are made in the `Inputs` section, where you supply the input parameters required to instantiate, calculate and deploy your Cognite Function. The parameters related to calculations and deployment are collected in `data_dict`. There are three types of input parameters:
- A: General parameters required to deploy any Cognite Function
- B: Optional parameters for more detailed specifications
- C: Calculation-specific parameters used by the calculations defined in `transformation.py` in the associated Cognite Function subfolder

If your Cognite Function is already instantiated and you only want to set up a new schedule, skip `generate_cf` and call `deploy_cognite_functions` directly with a `data_dict` modified for the new scheduled calculation.
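A minimal sketch of preparing such a modified dictionary, assuming `data_dict` has already been built in the `Insert parameters into data dictionary` cell. The helper `with_new_schedule` is hypothetical: it deep-copies the dictionary, sets a new schedule name and overrides only keys that already exist, so the original `data_dict` is left untouched.

```python
import copy
from typing import Any


def with_new_schedule(data_dict: dict[str, Any], schedule_name: str,
                      **overrides: Any) -> dict[str, Any]:
    """Return a copy of `data_dict` with a new schedule name and overridden keys.

    Hypothetical helper: deep-copies so nested values such as `ts_output`
    or `optional` are not shared with the original dictionary.
    """
    unknown = set(overrides) - set(data_dict)
    if unknown:
        raise KeyError(f"Unknown data_dict keys: {sorted(unknown)}")
    new_dict = copy.deepcopy(data_dict)
    new_dict["schedule_name"] = schedule_name
    new_dict.update(copy.deepcopy(overrides))
    return new_dict


# Usage in the notebook, once `data_dict` and `client` exist:
# new_dict = with_new_schedule(data_dict, "VAL_11-LT-95107A:X.Value",
#                              ts_input_names=["VAL_11-LT-95107A:X.Value"])
# deploy_cognite_functions(new_dict, client, single_call=False, scheduled_call=True)
```

Rejecting unknown keys guards against typos that would otherwise be silently ignored by the deployment.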

### --- Authentication ---

In [2]:
%load_ext autoreload
%autoreload 2

import pandas as pd

from datetime import datetime
from cognite.client.data_classes import functions


from initialize_cdf_client import initialize_cdf_client
from deploy_cognite_functions import deploy_cognite_functions
from generate_cf import generate_cf
from utilities import dataset_abbreviation

cdf_env = "dev"



In [40]:
# Resource limits for function calls (optional: `func_limits` is not passed to `deploy_cognite_functions` below)
func_limits = functions.FunctionsLimits(timeout_minutes=60, cpu_cores=0.25, memory_gb=1, runtimes=["py39"], response_size_mb=2)
client = initialize_cdf_client(cdf_env)
main_desc = " Produced with Cognite Functions Template."

In [41]:
# Uncomment to delete an existing output time series (e.g. before redeploying):
# client.time_series.delete(external_id="CoEA_WastedEnergy")

### --- Inputs ---

#### A. Required parameters
- `ts_input_names` (list): 
    - names of input time series (a list, even if there is only one input). Must be given in the same order as the calculations are performed in `transformation.py`
- `ts_output` (dict): 
    - metadata for output time series, currently supporting the following fields:
    1. `names` (list of strings):
        - names of time series output(s)
        - NB: if there are multiple output time series, the order of `names` must correspond to the order of `ts_input_names`.
    2. `description` (list of strings):
        - description for each time series output
    3. `unit` (list of strings):
        - units used for each time series output
- `dataset_id` (int):
    - id of dataset to write data to
- `function_name` (string): 
    - name of the Cognite Function to deploy, instantiating a folder `*dataset_abbr*_*function_name*` where *dataset_abbr* is an abbreviation of the name of the dataset to write to (see optional parameters in the next section)
    - for example: `function_name="wasted-energy"` for a Cognite Function that is to calculate wasted energy
- `calculation_function` (string): 
    - name of the main calculation function to run, which must be defined in `transformation.py` (in the folder `*dataset_abbr*_*function_name*`) as `main_*calculation_function*`
    - for example: `calculation_function="wasted_energy"` to use a function `main_wasted_energy` defined in `transformation.py` of the `*dataset_abbr*_wasted-energy` folder
- `schedule_name` (string):
    - name of the schedule to set up for the Cognite Function. NB: make sure the name is unique to avoid overwriting existing schedules for that Cognite Function! If you set up multiple schedules for the same Cognite Function, one per input time series, a good way to keep them organized is to name each schedule after its time series
- `sampling_rate` (string): 
    - sampling rate of input time series
    - given as a value followed by a time unit, e.g., "30s" for 30 seconds, "2m" for 2 minutes, "1h" for 1 hour
- `cron_interval_min` (string): 
    - interval in minutes at which to run the schedule (NB: currently only intervals in [1, 60) are supported). Provide the number as a string.
- `backfill_period` (int): 
    - the period back in time (in days by default) to backfill
- `backfill_hour` (int):
    - the hour of the day at which to perform backfilling
- `backfill_min_start` (int):
    - backfilling is performed for any scheduled call that falls within hour `backfill_hour` and minutes `[backfill_min_start, backfill_min_start + cron_interval_min]`
- `add_packages` (list): 
    - additional packages required to run the calculations defined in `transformation.py` in the Cognite Function subfolder
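The `sampling_rate` format (a value followed by a time unit) and the `cron_interval_min` range can be checked before deployment. The sketch below is an illustration, not the template's own parsing: the helper names `parse_sampling_rate` and `check_cron_interval` are hypothetical, and it assumes the unit suffixes `s`, `m`, `h` (from the examples above) plus `d` for days, which is an extra assumption.

```python
import re
from datetime import timedelta

# Assumed unit suffixes: s, m and h appear in the examples above; d is an extra assumption.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_RATE_PATTERN = re.compile(r"^(\d+)([smhd])$")


def parse_sampling_rate(sampling_rate: str) -> timedelta:
    """Convert a sampling rate such as "30s", "2m" or "1h" into a timedelta."""
    match = _RATE_PATTERN.match(sampling_rate.strip())
    if match is None:
        raise ValueError(f"Invalid sampling rate: {sampling_rate!r}")
    value, unit = int(match.group(1)), match.group(2)
    if value == 0:
        raise ValueError("Sampling rate must be positive")
    return timedelta(**{_UNITS[unit]: value})


def check_cron_interval(cron_interval_min: str) -> int:
    """Validate that the schedule interval is a whole number of minutes in [1, 60)."""
    interval = int(cron_interval_min)
    if not 1 <= interval < 60:
        raise ValueError("cron_interval_min must be in [1, 60)")
    return interval


print(parse_sampling_rate("30s"))  # 0:00:30
print(parse_sampling_rate("1h"))   # 1:00:00
# Approximate number of input points per scheduled run, assuming regular sampling:
print(timedelta(minutes=check_cron_interval("15")) // parse_sampling_rate("1m"))  # 15
```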

##### CF 1: Ideal Power Consumption

In [3]:
ts_input_names = ["VAL_17-FI-9101-286:VALUE", "VAL_17-PI-95709-258:VALUE", "VAL_11-PT-92363B:X.Value", "VAL_11-XT-95067B:Z.X.Value"]  # Inputs to the IdealPowerConsumption function
# ts_input_names = ["VAL_11-LT-95107A:X.Value"]
ts_output = {"names": ["CoEA_IdealPowerConsumption"],
             "description": ["Optimal power consumption from equipment." + main_desc],
             "unit": ["J/s"]}
dataset_id = 1832663593546318

function_name = "ideal-power-consumption"
calculation_function = "ideal_power_consumption"
schedule_name = "ipc"  # alternatively: ts_input_names[0]

sampling_rate = "1m"
cron_interval_min = str(15)  # schedule interval in minutes, as a string
assert 1 <= int(cron_interval_min) < 60

backfill_period = 20
backfill_hour = 11
backfill_min_start = 0
backfill_min_start = min(59, backfill_min_start)  # keep within the hour

add_packages = []  # e.g. ["statsmodels"]

##### CF 2: Wasted energy

In [42]:
ts_input_names = ["VAL_11-XT-95067B:Z.X.Value", 87.8, "CoEA_IdealPowerConsumption"]  # Inputs to the WastedEnergy function: a time series, a constant (87.8) and the output of CF 1
# ts_input_names = ["VAL_11-LT-95107A:X.Value"]
ts_output = {"names": ["CoEA_WastedEnergy"],
             "description": ["Wasted energy from equipment, calculated from ideal power consumption." + main_desc],
             "unit": ["J/s"]}
dataset_id = 1832663593546318

function_name = "wasted-energy"
calculation_function = "wasted_energy"
schedule_name = "we"  # alternatively: ts_input_names[0]

sampling_rate = "1m"
cron_interval_min = str(15)  # schedule interval in minutes, as a string
assert 1 <= int(cron_interval_min) < 60

backfill_period = 20
backfill_hour = 19
backfill_min_start = 0
backfill_min_start = min(59, backfill_min_start)  # keep within the hour

add_packages = []  # e.g. ["statsmodels"]

##### CF 3: Daily average drainage

In [25]:
ts_input_names = ["VAL_18-LIT-80143:VALUE"]
ts_output = {"names": ["VAL_18-LIT-80143.CDF.D.AVG.LeakValue"],
             "description": ["Daily average drainage from pump." + main_desc],
             "unit": ["m3/min"]}
dataset_id = 1832663593546318

function_name = "avg-leakage"
calculation_function = "aggregate"
schedule_name = ts_input_names[0]

sampling_rate = "1m"
cron_interval_min = str(15)  # schedule interval in minutes, as a string
assert 1 <= int(cron_interval_min) < 60

backfill_period = 3
backfill_hour = 15
backfill_min_start = 0
backfill_min_start = min(59, backfill_min_start)  # keep within the hour

add_packages = ["statsmodels"]

#### B. Optional parameters (leave empty if not needed)
- `historic_start_time` (dictionary):
    - date from which to start the initial calculation of the signal
    - recommended if the full historic signal has millions of data points
    - three keys must be specified
    1. `year`
    2. `month`
    3. `day`
- `aggregate` (dictionary):
    - information about any aggregations to perform in the calculation
    - if performing aggregates, two keys **must** be specified:
    1. `period` (string):
        - the time range defining the aggregated period
        - valid values: `["minute", "hour", "day", "month", "year"]`
    2. `type` (string):
        - what type of aggregate to perform
        - valid values: any aggregation supported by `pandas`, e.g., `"mean"`, `"max"`, ...
- `dataset_abbr` (string):
    - abbreviated dataset name, used as a prefix for the Cognite Function name to keep functions organized
    - if not provided, an abbreviated form will be automatically generated from the name of the dataset
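The dataset ID used in this notebook is commented as the *Center of Excellence - Analytics* dataset, and the generated folder is `CoEA_avg-leakage`, so the automatic abbreviation appears to keep the first character of each word. The sketch below reproduces that rule; it is a guess based on this one example, not the implementation of `dataset_abbreviation`, and both helper names are hypothetical.

```python
def abbreviate_dataset_name(dataset_name: str) -> str:
    """Keep the first alphanumeric character of each word, preserving its case.

    Hypothetical rule inferred from "Center of Excellence - Analytics" -> "CoEA";
    separators such as "-" are skipped because they contain no alphanumerics.
    """
    letters = []
    for word in dataset_name.split():
        first = next((char for char in word if char.isalnum()), None)
        if first is not None:
            letters.append(first)
    return "".join(letters)


def function_folder_name(dataset_abbr: str, function_name: str) -> str:
    """Apply the `*dataset_abbr*_*function_name*` naming pattern described above."""
    return f"{dataset_abbr}_{function_name}"


abbr = abbreviate_dataset_name("Center of Excellence - Analytics")
print(abbr)                                       # CoEA
print(function_folder_name(abbr, "avg-leakage"))  # CoEA_avg-leakage
```

A manual `dataset_abbr` (such as the commented-out `"PIts"` below) is still the way to get a specific prefix that a first-letter rule would not produce.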

In [43]:
optional = {
    "historic_start_time": {
        "year": 2022,
        "month": 10,
        "day": 1
    },
    "aggregate": {
        "period": "day",
        "type": "mean"
    },
    # "dataset_abbr": "PIts"  # PI Time Series
}
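To make the `optional` settings concrete, the sketch below validates them, turns `historic_start_time` into a `datetime` and groups timestamped values by the aggregation `period`. It is plain Python for illustration only: the template presumably delegates aggregation to `pandas` (which is why any pandas aggregation name is accepted), treating the start date as UTC is an assumption, the helper names are hypothetical, and only `mean`, `max`, `min` and `sum` are mapped here.

```python
import statistics
from collections import defaultdict
from datetime import datetime, timezone
from typing import Optional

VALID_PERIODS = ("minute", "hour", "day", "month", "year")
# Small subset of aggregation types for illustration; pandas supports many more.
AGGREGATORS = {"mean": statistics.fmean, "max": max, "min": min, "sum": sum}


def historic_start(optional: dict) -> Optional[datetime]:
    """Build the historic start time from year/month/day (UTC is an assumption)."""
    start = optional.get("historic_start_time")
    if not start:
        return None
    missing = {"year", "month", "day"} - set(start)
    if missing:
        raise KeyError(f"historic_start_time is missing keys: {sorted(missing)}")
    return datetime(start["year"], start["month"], start["day"], tzinfo=timezone.utc)


def period_start(ts: datetime, period: str) -> datetime:
    """Truncate a timestamp to the start of its aggregation period."""
    if period not in VALID_PERIODS:
        raise ValueError(f"Invalid period {period!r}; expected one of {VALID_PERIODS}")
    ts = ts.replace(second=0, microsecond=0)
    if period != "minute":
        ts = ts.replace(minute=0)
    if period in ("day", "month", "year"):
        ts = ts.replace(hour=0)
    if period in ("month", "year"):
        ts = ts.replace(day=1)
    if period == "year":
        ts = ts.replace(month=1)
    return ts


def aggregate(points: list[tuple[datetime, float]], aggregate_cfg: dict) -> dict[datetime, float]:
    """Group (timestamp, value) pairs by period and reduce each group."""
    period, agg_type = aggregate_cfg["period"], aggregate_cfg["type"]
    if agg_type not in AGGREGATORS:
        raise ValueError(f"Aggregation {agg_type!r} is not mapped in this sketch")
    groups = defaultdict(list)
    for ts, value in points:
        groups[period_start(ts, period)].append(value)
    return {start: AGGREGATORS[agg_type](values) for start, values in sorted(groups.items())}


example_optional = {"historic_start_time": {"year": 2022, "month": 10, "day": 1},
                    "aggregate": {"period": "day", "type": "mean"}}
print(historic_start(example_optional))  # 2022-10-01 00:00:00+00:00
points = [(datetime(2022, 10, 1, 6, tzinfo=timezone.utc), 1.0),
          (datetime(2022, 10, 1, 18, tzinfo=timezone.utc), 3.0),
          (datetime(2022, 10, 2, 6, tzinfo=timezone.utc), 5.0)]
daily = aggregate(points, example_optional["aggregate"])
print(list(daily.values()))  # [2.0, 5.0]
```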

#### C. Calculation-specific parameters (leave empty if not needed)

In [44]:
calc_params = {
    "tank_volume": 240,
    "derivative_value_excl": 0.002,
    "lowess_frac": 0.001,
    "lowess_delta": 0.01,
}

#### Insert parameters into data dictionary

In [45]:
dataset_abbr = dataset_abbreviation(client, optional, dataset_id)

data_dict = {
    'ts_input_names': ts_input_names,
    'ts_output': ts_output,
    'function_name': f"{dataset_abbr}_{function_name}",
    'schedule_name': schedule_name,
    'calculation_function': f"main_{calculation_function}",
    'granularity': sampling_rate,
    'dataset_id': dataset_id,  # Center of Excellence - Analytics dataset
    'cron_interval_min': cron_interval_min,
    'testing': False,
    'backfill_period': backfill_period,  # days by default (if not doing aggregates)
    'backfill_hour': backfill_hour,  # e.g. 23 schedules backfilling in the last hour of the day
    'backfill_min_start': backfill_min_start,
    'backfill_min_end': min(59.9, backfill_min_start + int(cron_interval_min)),
    'optional': optional,
    'calc_params': calc_params,
}
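The backfilling keys above translate into a simple window test: a scheduled call triggers backfilling when its hour equals `backfill_hour` and its minute lies between `backfill_min_start` and `backfill_min_end`. The sketch below encodes that reading of the parameter descriptions, together with the start of the backfill period (`backfill_period` days back, the default unit stated above). The helper names are hypothetical, and treating both window edges as inclusive follows the bracket notation in the parameter list but is otherwise an assumption.

```python
from datetime import datetime, timedelta, timezone


def in_backfill_window(call_time: datetime, backfill_hour: int,
                       backfill_min_start: int, backfill_min_end: float) -> bool:
    """True if a scheduled call falls inside the backfilling window (edges inclusive)."""
    return (call_time.hour == backfill_hour
            and backfill_min_start <= call_time.minute <= backfill_min_end)


def backfill_start(call_time: datetime, backfill_period: int) -> datetime:
    """Earliest time to recalculate, assuming backfill_period is given in days."""
    return call_time - timedelta(days=backfill_period)


# Values from "CF 2: Wasted energy": backfill_hour=19, backfill_min_start=0 and
# cron_interval_min="15", so backfill_min_end = min(59.9, 0 + 15) = 15.
backfill_min_end = min(59.9, 0 + int("15"))
call = datetime(2024, 1, 26, 19, 0, tzinfo=timezone.utc)
print(in_backfill_window(call, 19, 0, backfill_min_end))                   # True
print(in_backfill_window(call.replace(hour=20), 19, 0, backfill_min_end))  # False
print(backfill_start(call, 20))  # 2024-01-06 19:00:00+00:00
```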

### --- Instantiate Cognite Function ---

Set up the folder structure for the Cognite Function as required by the template.

In [29]:
generate_cf(data_dict, add_packages)

Writing __init__.py ...
Writing handler.py ...
Writing transformation.py ...
Created requirements.txt in c:/Users/vetnev/OneDrive - Aker BP/Documents/First Task/opshub-task1/src/CoEA_avg-leakage
Packages to add:  ['pytest', 'pyarrow', 'statsmodels', 'python-dotenv', 'ipykernel', 'pandas', 'numpy', 'cognite-sdk', 'openpyxl']

Using version ^7.4.4 for pytest

Updating dependencies
Resolving dependencies...

Package operations: 5 installs, 0 updates, 0 removals

  • Installing colorama (0.4.6)
  • Installing iniconfig (2.0.0)
  • Installing packaging (23.2)
  • Installing pluggy (1.4.0)
  • Installing pytest (7.4.4)

Writing lock file

Using version ^15.0.0 for pyarrow

Updating dependencies
Resolving dependencies...

Package operations: 2 installs, 0 updates, 0 removals

  • Installing numpy (1.26.3)
  • Installing pyarrow (15.0.0)

Writing lock file

Using version ^0.14.1 for statsmodels

Updating dependencies
Resolving dependencies...

Package operations: 8 installs, 0 up

### --- Define transformation function ---

**IMPORTANT**: Add your calculations to `transformation.py` before moving on to the next step (deployment).

### --- Deploy Cognite Function in one go ---

#### Scheduled call

Set up a schedule for the Cognite Function.

**NB**: The initial transformation can be data-intensive. You may need to run it locally before deploying the schedule, or limit the start time of the historic calculation with `historic_start_time`.
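Schedules in the Cognite Python SDK are defined by a `cron_expression`, and the deployment log reports that the schedule is prepared to start sharp at the next minute. A sketch of both pieces under assumptions: `cron_interval_min` becomes a standard five-field cron expression that runs every N minutes, and the start time is the current time rounded up to the next whole minute. Neither helper is taken from `deploy_cognite_functions`; the names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def cron_every_n_minutes(cron_interval_min: str) -> str:
    """Five-field cron expression (minute hour day-of-month month day-of-week)."""
    interval = int(cron_interval_min)
    if not 1 <= interval < 60:
        raise ValueError("cron_interval_min must be in [1, 60)")
    # "*/N" in the minute field fires at minutes 0, N, 2N, ... of every hour.
    return f"*/{interval} * * * *"


def next_whole_minute(now: datetime) -> datetime:
    """Round `now` up to the start of the next minute (strictly later than `now`)."""
    return now.replace(second=0, microsecond=0) + timedelta(minutes=1)


print(cron_every_n_minutes("15"))  # */15 * * * *
print(next_whole_minute(datetime(2024, 1, 26, 16, 37, 12, tzinfo=timezone.utc)))
# 2024-01-26 16:38:00+00:00
```

Because `*/N` counts from minute 0 of each hour, an interval that does not divide 60 (for example 7) leaves a shorter gap at the end of each hour, which may be a reason to prefer divisors of 60 such as 15.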

In [46]:
deploy_cognite_functions(data_dict, client,
                         single_call=False, scheduled_call=True)

Cognite Function created. Waiting for deployment status to be ready ...
Ready for deployement.
Preparing schedule to start sharp at next minute ...
Setting up Cognite Function schedule at time 2024-01-26 16:38:00+00:00 ...
... Done


#### Optional: Single call

In [None]:
deploy_cognite_functions(data_dict, client,
                         single_call=True, scheduled_call=False)

Cognite Function created. Waiting for deployment status to be ready ...
Ready for deployement.
Calling Cognite Function individually ...
... Done
