## Notebook for deploying Cognite Function

Run all cells sequentially up to the `Experimental` section to deploy your Cognite Function.

Make your modifications in the `Inputs` section, where you supply the input parameters required for instantiation, calculation and deployment of your Cognite Function. The parameters related to calculation and deployment are stored in `data_dict`. There are two types of input parameters:
- A: General parameters required for deployment of any Cognite Function
- B: Optional (calculation-specific) parameters used as input to your calculation function. These go into `data_dict["calc_params"]` as key-value pairs.

If your Cognite Function is already instantiated and you only want to set up a new schedule, skip `generate_cf` and call `deploy_cognite_functions` directly with a `data_dict` modified to match the new scheduled calculation.
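As a rough sketch of that workflow, assuming `data_dict` has the keys used later in this notebook: the helper `params_for_new_schedule` is hypothetical (it is not part of the template), and `EXAMPLE_INPUT` / `EXAMPLE_OUTPUT` are placeholder names. It only copies the parameters and swaps in the schedule-specific values; the call to `deploy_cognite_functions` is left as a comment because it needs an authenticated client.

```python
import copy


def params_for_new_schedule(base_params, ts_input_name, ts_output_name):
    """Return a copy of base_params adapted for one more schedule (hypothetical helper)."""
    params = copy.deepcopy(base_params)  # leave the original dictionary untouched
    params["ts_input_names"] = [ts_input_name]
    params["ts_output_names"] = [ts_output_name]
    # Naming the schedule after the input time series helps keep schedule names unique
    params["schedule_name"] = ts_input_name
    return params


# Stand-in for the notebook's data_dict, reduced to the keys changed here
base = {
    "ts_input_names": ["VAL_11-LT-95107A:X.Value"],
    "ts_output_names": ["VAL_11-LT-95107A:X.CDF.D.AVG.LeakValue"],
    "schedule_name": "VAL_11-LT-95107A:X.Value",
    "function_name": "cf_daily-avg-drainage",
}
new_params = params_for_new_schedule(base, "EXAMPLE_INPUT", "EXAMPLE_OUTPUT")
print(new_params["schedule_name"])

# With an authenticated client, only the scheduling step would then be needed:
# deploy_cognite_functions(new_params, client, single_call=False, scheduled_call=True)
```

Copying the dictionary rather than editing it in place keeps the parameters of the already deployed schedule available for comparison.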

### --- Authentication ---

In [1]:
import pandas as pd
from cognite.client.data_classes import functions
from cognite.client.data_classes.functions import FunctionSchedulesList
from cognite.client.data_classes.functions import FunctionSchedule

from initialize import initialize_client
from deploy_cognite_functions import deploy_cognite_functions
from generate_cf import generate_cf

cdf_env = "dev"

In [2]:
# Set limits on function calls (`func_limits` is not used further in this notebook, so this is likely optional)
func_limits = functions.FunctionsLimits(timeout_minutes=60, cpu_cores=0.25, memory_gb=1, runtimes=["py39"], response_size_mb=2)
client = initialize_client(cdf_env)

In [9]:
# client.time_series.delete(external_id="VAL_17-FI-9101-286:VALUE.COPY")
# client.time_series.delete(external_id="test_CF")
# client.time_series.delete(external_id="CF_WastedEnergy")

CogniteNotFoundError: Not found: [{'externalId': 'CF_WastedEnergy'}]
The API Failed to process some items.
Successful (2xx): []
Unknown (5xx): []
Failed (4xx): ['CF_WastedEnergy']

### --- Inputs ---

#### A. Required parameters
- `ts_input_names` (list):
    - names of input time series (a list, even if only one input). Must be given in the same order as the calculations are performed in `transformation.py`
- `ts_output_names` (list):
    - names of output time series (also given as a list). NB: if there are multiple output time series, the order of `ts_output_names` must correspond to the order of `ts_input_names`.
- `function_name` (string):
    - name of Cognite Function to deploy (i.e., folder with name `cf_*function_name*`)
- `calculation_function` (string):
    - name of the main calculation function to run; it should be defined in `transformation.py` (in the folder `cf_*function_name*`) as `main_*calculation_function*`
- `schedule_name` (string):
    - name of the schedule to set up for the Cognite Function. NB: make sure the name is unique to avoid overwriting existing schedules for that Cognite Function! When setting up multiple schedules for the same Cognite Function, one per input time series, a good way to keep them organized is to name each schedule after its time series.
- `aggregate` (dictionary):
    - information about any aggregations to perform in the calculation.
    - if **not** performing any aggregates, leave the dictionary empty!
    - if performing aggregates, two keys must be specified:
    1. `period` (string):
        - the time range defining the aggregated period
        - valid values: `["second", "minute", "hour", "day", "month", "year"]`
    2. `type` (string):
        - what type of aggregate to perform
        - valid values: any aggregation supported by `pandas`, e.g., `"mean"`, `"max"`, ... 
- `sampling_rate` (int): 
    - sampling rate of input time series, given in seconds
- `cron_interval_min` (string):
    - minute interval at which to run the schedule, provided as a string (NB: currently only intervals in the range [1, 60) are supported)
- `backfill_days` (int):
    - number of days back in time to perform backfilling
- `backfill_hour` (int):
    - the hour of the day to perform backfilling
- `backfill_min_start` (int):
    - start minute of the backfill window. Backfilling is performed for any scheduled call that falls within hour=`backfill_hour` and minute=`[backfill_min_start, backfill_min_start+cron_interval_min]` (the upper bound is capped at 59.9 when building `data_dict`)
- `testing` (bool):
    - defaults to `False`; set to `True` when running unit tests
- `add_packages` (list):
    - additional packages required to run the calculations in `transformation.py`
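
To illustrate the `aggregate` parameter, here is a minimal sketch of how a `period`/`type` pair could be applied with `pandas`. The mapping `PERIOD_TO_FREQ` and the helper `apply_aggregate` are assumptions made for this example; the template's own handling in `transformation.py` may differ.

```python
import pandas as pd

# Assumed mapping from the `period` values listed above to pandas offset aliases
PERIOD_TO_FREQ = {
    "second": "s", "minute": "min", "hour": "h",
    "day": "D", "month": "MS", "year": "YS",
}


def apply_aggregate(series, aggregate):
    """Resample a time-indexed series according to `aggregate` (hypothetical helper)."""
    if not aggregate:  # empty dictionary: no aggregation requested
        return series
    freq = PERIOD_TO_FREQ[aggregate["period"]]
    return series.resample(freq).agg(aggregate["type"])


index = pd.to_datetime([
    "2023-11-16 00:00", "2023-11-16 12:00",
    "2023-11-17 00:00", "2023-11-17 12:00",
])
series = pd.Series([1.0, 3.0, 5.0, 7.0], index=index)

daily = apply_aggregate(series, {"period": "day", "type": "mean"})
print(daily)
```

Here the two values of each day are averaged, giving one value per day; with an empty dictionary the series is returned unchanged.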

In [3]:
# ts_input_names = ["VAL_17-FI-9101-286:VALUE", "VAL_17-PI-95709-258:VALUE", "VAL_11-PT-92363B:X.Value", "VAL_11-XT-95067B:Z.X.Value"] # Inputs to IdealPowerConsumption function
# ts_input_names = ["VAL_11-XT-95067B:Z.X.Value", 87.8, "CF_IdealPowerConsumption"] # Inputs to WastedEnergy function
ts_input_names = ["VAL_11-LT-95107A:X.Value"]#["VAL_11-PT-92363B:X.Value"]
# ts_output_names = ["VAL_17-FI-9101-286:MULTIPLE.Test", "VAL_17-PI-95709-258:MULTIPLE.Test", "VAL_11-PT-92363B:MULTIPLE.Test"]#, "VAL_11-XT-95067B:MULTIPLE.Test"]
# ts_output_names = ["CF_IdealPowerConsumption"]
# ts_output_names = ["CF_WastedEnergy"]
ts_output_names = ["VAL_11-LT-95107A:X.CDF.D.AVG.LeakValue"]#["VAL_11-PT-92363B:X.HOURLY.AVG.DRAINAGE"] #["TemplateVsCharts_Template"]

function_name = "daily-avg-drainage"
calculation_function = "daily_avg_drainage"
schedule_name = ts_input_names[0]

aggregate = {}
aggregate["period"] = "day"
aggregate["type"] = "mean"

sampling_rate = 60  # seconds
cron_interval_min = str(15)  # minutes, passed as a string
assert 1 <= int(cron_interval_min) < 60
backfill_days = 3
backfill_hour = 14 # 23
backfill_min_start = 30

add_packages = [] #["indsl"]
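
The constraints described for the required parameters can be checked before deployment. The following is only a sketch: `check_inputs` is a hypothetical helper, not part of the template, and it covers just the rules stated in the list above.

```python
VALID_PERIODS = {"second", "minute", "hour", "day", "month", "year"}


def check_inputs(ts_input_names, ts_output_names, aggregate, cron_interval_min):
    """Collect human-readable problems with the required parameters (hypothetical helper)."""
    problems = []
    if not isinstance(ts_input_names, list) or not ts_input_names:
        problems.append("ts_input_names must be a non-empty list")
    if not isinstance(ts_output_names, list) or not ts_output_names:
        problems.append("ts_output_names must be a non-empty list")
    if aggregate:  # an empty dictionary means "no aggregation" and is valid
        if not {"period", "type"} <= set(aggregate):
            problems.append("aggregate must contain the keys 'period' and 'type'")
        elif aggregate["period"] not in VALID_PERIODS:
            problems.append(f"unknown aggregate period: {aggregate['period']!r}")
    if not (isinstance(cron_interval_min, str) and cron_interval_min.isdigit()
            and 1 <= int(cron_interval_min) < 60):
        problems.append("cron_interval_min must be a string holding an integer in [1, 60)")
    return problems


ok = check_inputs(
    ["VAL_11-LT-95107A:X.Value"],
    ["VAL_11-LT-95107A:X.CDF.D.AVG.LeakValue"],
    {"period": "day", "type": "mean"},
    "15",
)
bad = check_inputs("VAL_11-LT-95107A:X.Value", [], {"period": "week"}, 60)
print(ok)
print(bad)
```

Returning a list of problems instead of raising on the first one makes it easier to fix all inputs in a single pass.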

#### B. Optional parameters

In [4]:
tank_volume = 1400
derivative_value_excl = 0.002
lowess_frac = 0.001
lowess_delta = 0.01

#### Insert parameters into data dictionary

In [5]:
backfill_min_start = min(59, backfill_min_start)

data_dict = {
    'ts_input_names': ts_input_names,
    'ts_output_names': ts_output_names,
    'function_name': f"cf_{function_name}",
    'schedule_name': schedule_name,
    'calculation_function': f"main_{calculation_function}",
    'granularity': sampling_rate,
    'dataset_id': 1832663593546318,  # Center of Excellence - Analytics dataset
    'cron_interval_min': cron_interval_min,
    'aggregate': aggregate,
    'testing': False,
    'backfill_days': backfill_days,
    'backfill_hour': backfill_hour,  # 23: backfilling to be scheduled at last hour of day as default
    'backfill_min_start': backfill_min_start,
    'backfill_min_end': min(59.9, backfill_min_start + int(cron_interval_min)),
    'calc_params': {
        'derivative_value_excl': derivative_value_excl,
        'tank_volume': tank_volume,
        'lowess_frac': lowess_frac,
        'lowess_delta': lowess_delta,
        'time_unit': "1m",
    },
}
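
The backfill window defined by `backfill_hour`, `backfill_min_start` and `backfill_min_end` can be illustrated with a small check. This is a sketch under assumptions: `in_backfill_window` is a hypothetical helper, and the inclusive bounds follow the `[start, end]` notation used in the parameter description rather than the template's actual code.

```python
from datetime import datetime


def in_backfill_window(now, backfill_hour, backfill_min_start, backfill_min_end):
    """True if `now` falls inside the daily backfill window (assumed semantics)."""
    return now.hour == backfill_hour and backfill_min_start <= now.minute <= backfill_min_end


backfill_hour = 14
backfill_min_start = 30
cron_interval_min = "15"
# Same expression as in data_dict: the end of the window is capped at 59.9
backfill_min_end = min(59.9, backfill_min_start + int(cron_interval_min))

inside = in_backfill_window(
    datetime(2023, 12, 22, 14, 30), backfill_hour, backfill_min_start, backfill_min_end)
outside = in_backfill_window(
    datetime(2023, 12, 22, 14, 50), backfill_hour, backfill_min_start, backfill_min_end)
print(backfill_min_end, inside, outside)
```

With a 15-minute schedule, a call at 14:30 lands in the window `[30, 45]`, while a call at 14:50 does not.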

### --- Instantiate Cognite Function ---

Set up folder structure for the Cognite Function as required by the template.

In [7]:
generate_cf(function_name, add_packages)

Writing __init__.py ...
Writing handler.py ...
Writing transformation.py ...
Created requirements.txt in c:/Users/vetnev/OneDrive - Aker BP/Documents/First Task/opshub-task1/src/cf_wasted-energy
Packages to add:  ['pandas', 'numpy', 'python-dotenv', 'pytest', 'ipykernel', 'cognite-sdk']

Using version ^2.1.4 for pandas

Updating dependencies
Resolving dependencies...

Package operations: 6 installs, 0 updates, 0 removals

  • Installing six (1.16.0)
  • Installing numpy (1.26.2)
  • Installing python-dateutil (2.8.2)
  • Installing pytz (2023.3.post1)
  • Installing tzdata (2023.3)
  • Installing pandas (2.1.4)

Writing lock file

Using version ^1.26.2 for numpy

Updating dependencies
Resolving dependencies...

No dependencies to install or update

Writing lock file

Using version ^1.0.0 for python-dotenv

Updating dependencies
Resolving dependencies...

Package operations: 1 install, 0 updates, 0 removals

  • Installing python-dotenv (1.0.0)

Writing lock file

Using ve

### --- Define transformation function ---

In this step, modify `transformation.py` to include your calculations.

### --- Deploy Cognite Function in one go ---

#### Single call

The initial transformation is data-intensive, so a scheduled call would likely time out. Run it as a separate single call first.

In [6]:
deploy_cognite_functions(data_dict, client,
                         single_call=True, scheduled_call=False)

Calling Cognite Function individually ...
... Done


#### Scheduled call

For subsequent calls, transformations are only performed on the current date, which is far less data-intensive and can be handled by scheduled calls.

In [7]:
deploy_cognite_functions(data_dict, client,
                         single_call=False, scheduled_call=True)

Setting up Cognite Function schedule at time 2023-12-22 12:41:10.301468 ...
... Done
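
For orientation, a minute interval such as `cron_interval_min = "15"` corresponds to a standard five-field cron expression. The helper `cron_from_interval` below is a hypothetical sketch of that mapping; how `deploy_cognite_functions` actually builds its cron expression is not shown in this notebook.

```python
def cron_from_interval(cron_interval_min):
    """Build a five-field cron expression that fires every N minutes (assumed mapping)."""
    minutes = int(cron_interval_min)
    if not 1 <= minutes < 60:
        raise ValueError("cron_interval_min must be in [1, 60)")
    # Fields: minute, hour, day of month, month, day of week
    return f"*/{minutes} * * * *"


expr = cron_from_interval("15")
print(expr)
```

Note that `*/N` restarts at minute 0 of every hour, so intervals that do not divide 60 evenly give a shorter gap at the hour boundary.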


### --- Experimental ---

In [34]:
import ast

# data[col]: prints pd.Series object
# data[[col]]: prints pd.DataFrame object

myfunc = client.functions.retrieve(external_id="cf_wasted-energy")
my_schedule_id = client.functions.schedules.list(
                name="cf_wasted-energy").to_pandas().id[0]
myfunc.list_calls(schedule_id=my_schedule_id)
test = client.functions.calls.retrieve(call_id=3005253751851002, function_id=84587311037983).get_response()

8236094801741723

In [57]:
import pandas as pd
import json
test = pd.DataFrame([[1,2,3,4,5], [5,6,7,6,5]]).T
ast.literal_eval('{"test": None}')
orig = ast.literal_eval(test[0].to_json())
ast.literal_eval(json.dumps({"test": orig, "gsgg": json.dumps(None)}))

{'test': {'0': 1, '1': 2, '2': 3, '3': 4, '4': 5}, 'gsgg': 'null'}

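# Assumption: `func_drainage` is not defined in the cells above; it is taken here to be the deployed function
func_drainage = client.functions.retrieve(external_id=data_dict["function_name"])
sid = client.functions.schedules.list(function_id=func_drainage.id).to_pandas().id[0]
calls = func_drainage.list_calls(schedule_id=sid, limit=-1).to_pandas()
# retrieve_call expects a single call id, not the whole DataFrame; the last listed call is used here
resp = func_drainage.retrieve_call(id=int(calls.id.iloc[-1])).get_response()
resp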

my_func = client.functions.retrieve(external_id=data_dict["function_name"])
my_schedule_id = client.functions.schedules.list(
            name=data_dict["function_name"]).to_pandas().id[0]
all_calls = my_func.list_calls(
            schedule_id=my_schedule_id, limit=-1).to_pandas()
all_calls.tail()

from datetime import datetime

pd.date_range(start=datetime(2023, 11, 16, 0, 0), end=datetime(2023, 11, 16, 3, 51), freq="min")
extid = client.time_series.list(name="VAL_17-FI-9101-286:VALUE")[0].external_id
ts_orig_all = client.time_series.data.retrieve(external_id=extid, limit=20).to_pandas()
ts_orig_all.head()