# DataHowLab SDK

DataHowLab SDK is a Python package that simplifies and streamlines integration with DataHowLab's API. It gives developers a convenient and efficient way to access Experiments, Variables, Projects and Models and their data, and to make predictions with these models on new data.

### Prerequisites:

- **Python:** Ensure you have Python installed (version 3.9 or higher) on your system.
- **API Key:** Make sure you have a valid DataHowLab API Key. If you don't have an API key, please contact your DataHowLab provider.

### Importing SDK Package

Before using DataHowLab's SDK, make sure you have a valid API key.
* You can store your key in a `.env` file in your working directory as `DHL_API_KEY=your_api_key`. Alternatively, pass the key directly to `APIKeyAuthentication`, like `key = APIKeyAuthentication(api_key="your_api_key")`.
* Then use your key to create your `DataHowLabClient`: `DataHowLabClient(auth_key=key, base_url="https://yourdomain.datahowlab.ch/", verify_ssl=True)`, where `base_url` corresponds to your DataHowLab domain and `verify_ssl` controls whether SSL certificates are verified (verification is enabled by *default*).

*Note*: Disabling SSL verification should only be done in controlled environments, such as on-premises deployments with trusted servers that use self-signed certificates.

In [None]:
%load_ext dotenv
%dotenv
from dhl_sdk import DataHowLabClient, APIKeyAuthentication

# With no argument, the key is read from the DHL_API_KEY env var (loaded from the .env file)
key = APIKeyAuthentication()

# This is an example. Change the next line to your DataHowLab Instance
your_url = "https://yourdomain.datahowlab.ch/"
client = DataHowLabClient(auth_key=key, base_url=your_url, verify_ssl=True)
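
Outside a notebook, the `%dotenv` magics are not available. The following is a minimal sketch of reading the key yourself, assuming it has been exported as the `DHL_API_KEY` environment variable. `read_api_key` is a hypothetical helper, not part of the SDK.

```python
import os


def read_api_key(env=None, name="DHL_API_KEY"):
    """Return the API key stored under `name`, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file")
    return key


# Example with an explicit mapping instead of the real environment
example_key = read_api_key({"DHL_API_KEY": "your_api_key"})
```

The returned string can then be passed on as `APIKeyAuthentication(api_key=example_key)`.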

### Accessing your DataHowLab Data

Using your DataHowLabClient, you can list and access your:

* Products
* Variables
* Experiments
* Recipes
* Projects (Inside Projects you can also access your "Datasets" and "Models")


When you access your database entities, the results are returned as an iterable. You can therefore retrieve each element of your search with the built-in `next` function, or loop over the results, as with any Python iterator. For more information, consult the official Python documentation on [iterables](https://docs.python.org/3/glossary.html#term-iterable), [iterators](https://docs.python.org/3/glossary.html#term-iterator) and the [next](https://docs.python.org/3/library/functions.html#next) function.
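
As a small illustration of this behaviour, the sketch below uses a plain generator as a stand-in for an SDK result (the names and data are hypothetical). It shows that `next` accepts a default value, which avoids a `StopIteration` error when a search returns fewer results than expected.

```python
def fake_results():
    """Stand-in for an SDK result iterator (hypothetical data)."""
    yield from ["Product A", "Product B"]


results = fake_results()
first = next(results)        # first element
second = next(results)       # second element
third = next(results, None)  # iterator exhausted: the default is returned instead of raising StopIteration

# A loop or comprehension consumes the iterator and stops cleanly at the end
remaining = [item for item in fake_results()]
```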

In order to get more information about your entities, you can access the fields `name` and `description` present in all entities. For Products and Variables, you can also access the `code` field. 

In [None]:
# Products can be filtered by code 

products = client.get_products(code = "ExampleProductCode")
product = next(products)

product_name = product.name
product_code = product.code
product_description = product.description

In [None]:
# Variables can be filtered by code, group, or variable type 

variables = client.get_variables(group="ExampleGroup")
variable = next(variables)

#printing the variable name
print(variable.name)

# if the variable was not the one you wanted, you can keep iterating
wanted_variable = next(variables)
print(wanted_variable.name)

In [None]:
# Experiments can be filtered by name or product. 
# If you want to filter by product you should first get your product
# and then use it to filter the experiments.

experiments = client.get_experiments(name="ExampleExperimentName", product=product)
experiment = next(experiments)

In [None]:
# Projects can be filtered by name and project type 
# (by default, project_type = "cultivation", but it can also be "spectroscopy")

projects = client.get_projects(name="exampleproject")
project = next(projects)

### Exporting Experiment Data

After selecting an experiment, you can download the data from that experiment using the `get_data` method, by providing your `client` as an argument. 

The method returns a dictionary in the following format:

    {"Variable Code 1": 
        {
            "timestamps": [0,1,2], 
            "values": [val1, val2, val3] 
        },
    "Variable Code 2":     
        {
            "timestamps": [0, 1, 2], 
            "values": [val1, val2, val3]  
        }
    }

In [None]:
exp_data = experiment.get_data(client)
print(exp_data)
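
For tabular processing, it can help to flatten this dictionary into rows. The following is a minimal sketch using hypothetical data in the documented format. `to_rows` is an illustrative helper, not part of the SDK.

```python
def to_rows(exp_data):
    """Flatten {code: {"timestamps": [...], "values": [...]}} into (code, timestamp, value) tuples."""
    rows = []
    for code, series in exp_data.items():
        for timestamp, value in zip(series["timestamps"], series["values"]):
            rows.append((code, timestamp, value))
    return rows


# Hypothetical data in the documented format
example = {
    "var1": {"timestamps": [0, 1, 2], "values": [5.1, 3.5, 1.3]},
    "var2": {"timestamps": [0], "values": ["A"]},
}
rows = to_rows(example)
```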

You can also access the data inside your datasets through your projects. After you select a project, use the `get_datasets` method to find your dataset. As with `experiments` and `recipes`, you can filter the results by dataset name. After selecting a dataset, download its data using its `get_data(client)` method.
 
This returns a list of dicts, one per experiment. To check the information about each experiment, access the `experiments` field of your dataset: `dataset.experiments`

In [None]:
# You can filter the datasets by name

# example of looping through all datasets to find one with "test" in the name
datasets = project.get_datasets()
for ds in datasets:
    if "test" in ds.name:
        dataset = ds
        break
    
dataset_data = dataset.get_data(client)

### Importing Data to DataHowLab's DataBase

The SDK also allows you to import new experiments into the database (DB). To create a new experiment, you need to create or select a Product, Variables and Data.

If you want to use entities that already exist, get them using the methods shown above.

If you need to create new entities, instantiate them using the `new` method of the corresponding entity class. Here is an example workflow for a new Product:

In [None]:
from dhl_sdk import Product

product = Product.new(name="ExampleProduct", code="EXPRD", description="Example Product")

And here is an example of creating 2 different variables: 

In [None]:
from dhl_sdk import Variable, VariableNumeric, VariableCategorical

variable1 = Variable.new(code="SDKv1", 
                         name="SDK Variable 1", 
                         description="This is a variable created with the SDK", 
                         measurement_unit="l", 
                         variable_group="X Variables", 
                         variable_type=VariableNumeric()
                         )
variable2 = Variable.new(code="SDKv2", 
                         name="SDK Variable 2", 
                         description="This is a variable created with the SDK", 
                         measurement_unit="l", 
                         variable_group="Z Variables", 
                         variable_type=VariableCategorical()
                         )

There are some limitations when creating a new variable, especially when selecting the `variable_group` and `variable_type`. 

To access a list of available Variable Groups, you can use the `get_valid_variable_groups` method from the `Variable` class:

In [None]:
#This will return a list of the valid variable groups, e.g. ["X Variables", "Y Variables", "Z Variables"]

variable_groups = Variable.get_valid_variable_groups(client)
print(variable_groups)

For `variable_type`, you need to use one of the valid options. Consult the documentation of each class to have a better understanding of each type. 

Here are some examples:

In [None]:
from dhl_sdk import VariableLogical, VariableSpectrum, VariableSpectrumXAxis, VariableSpectrumYAxis


# A Numeric variable with default value 3, min 1, and max 5
numeric_type = VariableNumeric(default=3, min=1, max=5)


# A Categorical variable with default value "A", possible values "A", "B", "C", and strict (meaning we can only use the values in the list)
categorical_type = VariableCategorical(default="A", values=["A", "B", "C"], strict=True)


# A Logical variable with default value True
logical_type = VariableLogical(default=True)


# A Spectrum variable uses 2 other classes to define the spectrum: VariableSpectrumXAxis and VariableSpectrumYAxis
# VariableSpectrumXAxis defines the x axis of the spectrum and VariableSpectrumYAxis defines the y axis of the spectrum
x_axis = VariableSpectrumXAxis(dimension=1000, unit="mm", min=0, max=100)
y_axis = VariableSpectrumYAxis(label="Intensity")
spectrum_type = VariableSpectrum(x_axis=x_axis, y_axis=y_axis)

Using the `new` method of each entity will create a local version of the entities.

In order to use them in a new experiment, you need to upload them to the DB. For that, use the `create` method of `DataHowLabClient`. 

This method will validate your new entities and upload them in case everything is correct.

In [None]:
product = client.create(product)
variable1 = client.create(variable1)
variable2 = client.create(variable2)

Variables in the `Feeds/Flows` group are a special case. They must always be linked to other variables in the `X Variables` and `Feed Concentrations` groups. In addition to this link, you must also reference a variable that indicates the initial bioreactor volume. 

The workflow below creates a variable of this type, assuming all linked variables are created with the SDK rather than taken from the DB. If you want to link your new feed to an existing variable, replace the variable creation steps with the instructions above for searching for a particular `Variable`.

In [None]:
# First, we get a variable for the initial bioreactor volume from the DB (this is just an example)

bioreactor_volume = next(client.get_variables(code="Volume"))

# Next, create an X Variable that will be linked to the feed variable, e.g "Glucose"

x_variable = Variable.new(code="GlcSDK",
                          name="Glucose SDK",
                          description="Glucose concentration in the medium",
                          measurement_unit="g/L",
                          variable_group="X Variables",
                          variable_type=VariableNumeric()
                          )

# Then create a Feed Concentration variable, a variable with the concentration of previous metabolites in the feed medium

feed_concentration = Variable.new(code="BolusFeed_GlcSDK",
                                  name="Glucose Feed SDK",
                                  description="Glucose concentration in the feed medium",
                                  measurement_unit="g/L",
                                  variable_group="Feed Concentrations",
                                  variable_type=VariableNumeric()
                                  )

# Import the 2 new variables to the DB
x_variable = client.create(x_variable)
feed_concentration = client.create(feed_concentration)

# We have now established all the variables needed to link to a feed variable. 
# Now we can create the feed variable using the following classes: 

from dhl_sdk import FlowVariableReference, VariableFlow

# First we need to create a FlowVariableReference. 
# This class will contain the ID of the measured variable (X Variable) and the ID of the feed concentration variable. 
# To get the ID of each variable, just use the .id attribute.

glc_reference = FlowVariableReference(measurementId = x_variable.id, concentrationId = feed_concentration.id)

# Next, we create the VariableFlow, a variable type with the reference created before.
bolus_feed_type = VariableFlow.new(flow_type="bolus",
                                   variable_references=[glc_reference],
                                   volume_variable_id=bioreactor_volume.id)


# If you want to create a continuous feed, you need to add a step_size to the VariableFlow. 
# This step_size corresponds to the time between values, in seconds. 
# If, for example, your feed is in L/day, you should set step_size = 86400 (24*60*60)

conti_feed_type = VariableFlow.new(flow_type="conti",
                                   variable_references=[glc_reference],
                                   volume_variable_id=bioreactor_volume.id,
                                   step_size=86400)


# Now that we have the variable type with all the necessary references, we can create a new variable
bolus_glc = Variable.new(code="BolusGlcSDK",
                         name="Bolus Glucose SDK",
                         description="Bolus of glucose",
                         measurement_unit="L",
                         variable_group="Feeds/Flows",
                         variable_type=bolus_feed_type)

# Finally, we can import the new variable to the DB
bolus_glc = client.create(bolus_glc)


For a new experiment, your data needs to be in a specific format:

    {"Variable Code 1": 
        {
            "timestamps": [0,1,2], 
            "values": [val1, val2, val3] 
        },
    "Variable Code 2":     
        {
            "timestamps": [0, 1, 2], 
            "values": [val1, val2, val3]  
        }
    }

In [None]:
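# Keys are the codes of the variables used in the experiment (variable1 and variable2 created above).
# Timestamps are Unix timestamps in seconds; 1600677950 is 2020-09-21T08:45:50Z.
run_data = {
    "SDKv1": {
        "timestamps": [1600677950, 1600764350, 1600850750],
        "values": [5.1, 3.5, 1.3],
    },
    "SDKv2": {
        "timestamps": [1600677950],
        "values": ["A"],
    },
}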
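
Before uploading, a quick local check of the dictionary's shape can catch simple mistakes early. The sketch below assumes that every entry needs both keys and that `timestamps` and `values` must have equal lengths. `check_run_data` is a hypothetical helper and does not replace the validation performed by the SDK.

```python
def check_run_data(data):
    """Return a list of problems found in a {code: {"timestamps", "values"}} dictionary."""
    problems = []
    for code, series in data.items():
        missing = {"timestamps", "values"} - set(series)
        if missing:
            problems.append(f"{code}: missing keys {sorted(missing)}")
            continue
        if len(series["timestamps"]) != len(series["values"]):
            problems.append(f"{code}: timestamps and values differ in length")
    return problems


# Hypothetical examples
good = {"EXv1": {"timestamps": [1, 2], "values": [5.1, 3.5]}}
bad = {"EXv1": {"timestamps": [1, 2], "values": [5.1]}, "EXv2": {"values": ["A"]}}
good_problems = check_run_data(good)
bad_problems = check_run_data(bad)
```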

After collecting the data for your new experiment (like the `run_data` example above), create the experiment with the `Experiment.new` method. Then use the `client.create` method to upload it to the DB.

In [None]:
from dhl_sdk import Experiment

experiment = Experiment.new(name="SDK EXP", 
                            description="new experiment example for sdk", 
                            product=product, 
                            variables=[variable1, variable2], 
                            data_type="run", 
                            data=run_data, 
                            variant="run", 
                            start_time="2020-09-21T08:45:50Z", 
                            end_time="2020-09-30T08:45:50Z")

client.create(experiment)

Note: All timestamps in the new experiment must be Unix timestamps (the number of seconds since the Unix epoch, 1970-01-01T00:00:00Z). 

To convert a datetime to a Unix timestamp, we recommend using Python's `datetime` module: 

In [None]:
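from datetime import datetime, timezone

original_date = "2020-09-21T08:45:50Z"
# strptime returns a naive datetime; mark it as UTC so that timestamp() does not assume local time
original_datetime = datetime.strptime(original_date, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

unix_timestamp = int(original_datetime.timestamp())  # 1600677950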

Additionally, when adding data to a new experiment, all timestamps must be between `start_time` and `end_time` of the experiment to be valid.
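
This range rule can be checked locally before uploading. The following is a minimal sketch, assuming the boundaries are inclusive and that `start_time` and `end_time` are UTC strings in the format used above. `to_unix` and `out_of_range` are hypothetical helpers, not part of the SDK.

```python
from datetime import datetime, timezone


def to_unix(iso_utc):
    """Convert a UTC string such as "2020-09-21T08:45:50Z" to Unix seconds."""
    parsed = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%SZ")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def out_of_range(data, start_time, end_time):
    """Return (code, timestamp) pairs that fall outside [start_time, end_time]."""
    start, end = to_unix(start_time), to_unix(end_time)
    return [
        (code, ts)
        for code, series in data.items()
        for ts in series["timestamps"]
        if not start <= ts <= end
    ]


# Hypothetical data: both timestamps lie inside the experiment window
data = {"EXv1": {"timestamps": [1600677950, 1600764350], "values": [5.1, 3.5]}}
offenders = out_of_range(data, "2020-09-21T08:45:50Z", "2020-09-30T08:45:50Z")
```

An empty `offenders` list means every timestamp lies inside the experiment window.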

If, for example, you want to add a new experiment with the same Product and Variables as an already created experiment, you can do it like this:

In [None]:
experiments = client.get_experiments(name="Experiment to Fork")
old_experiment = next(experiments)

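# Keys must be the codes of the variables in old_experiment.variables ("EXv1" and "EXv2" are examples).
# Timestamps are Unix timestamps in seconds; 1600677950 is 2020-09-21T08:45:50Z.
new_data = {
    "EXv1": {
        "timestamps": [1600677950, 1600764350, 1600850750],
        "values": [5.1, 3.5, 1.3],
    },
    "EXv2": {
        "timestamps": [1600677950],
        "values": ["A"],
    },
}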

new_experiment = Experiment.new(name="Forked Experiment",
                                description="new experiment example for sdk",
                                product=old_experiment.product,
                                variables=old_experiment.variables,
                                data_type="run",
                                data=new_data,
                                variant="run",
                                start_time="2020-09-21T08:45:50Z", 
                                end_time="2020-09-30T08:45:50Z")
client.create(new_experiment)

### Using your models 

DataHowLab's SDK also allows you to use your trained models to run new predictions based on new data. 

To access your models, you need to find your `project` of interest and, inside this project, find your model using the `get_models` method.

In [None]:
projects = client.get_projects(name="exampleproject")
project = next(projects)

models = project.get_models(name="ExampleModel")
model = next(models)

Inside the model, you can access the dataset it uses through `model.dataset` and check the variables used to train the model through `model.model_variables`. The latter returns a list of `Variable` objects. If you only want the variable codes, use the `model.get_model_variables_codes()` method. 

In [None]:
model_dataset = model.dataset 
# you can access the data inside the dataset 
model_data = model_dataset.get_data(client)


variables = model.model_variables
variable_codes = model.get_model_variables_codes()

The available project types are Spectroscopy and Cultivation. Cultivation projects contain Historical and Propagation models.

#### Spectroscopy 

To search your **Spectroscopy** projects, you can add `project_type="spectroscopy"` to the `get_projects` method. 

In [None]:
projects = client.get_projects(name="Spectra", project_type="spectroscopy")
project = next(projects)

models = project.get_models(name="SpectraModel")
model_spectra = next(models)

To use the model, we just need some data. Spectra are passed as a 2D list, where each row is one spectrum and each column corresponds to a wavenumber. 

In [None]:
spectra = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
predictions = model_spectra.predict(spectra)

You can also import this data from a .csv or .xls file; you just need to convert the data to a list of lists. For example, using numpy to read a csv: 

In [None]:
import numpy as np

spectra = np.genfromtxt("demo-spectra.csv", delimiter=',').tolist()  # convert the array to a list of lists
predictions = model_spectra.predict(spectra)
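
If numpy is not available, the standard `csv` module produces the same list-of-lists structure. The following is a minimal sketch, assuming a comma-separated file with numeric cells and no header row. `read_spectra` is a hypothetical helper, and an in-memory string stands in for the file.

```python
import csv
import io


def read_spectra(handle):
    """Parse comma-separated numeric rows into a list of lists of floats."""
    return [[float(cell) for cell in row] for row in csv.reader(handle) if row]


# In-memory stand-in for a file such as open("demo-spectra.csv", newline="")
sample = io.StringIO("1,2,3\n4,5,6\n")
spectra = read_spectra(sample)
```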

#### Cultivation

To search your **Cultivation** projects, you can add `project_type="cultivation"` to the `get_projects` method. 

After that, in the `get_models` method, you can select between `historical` and `propagation` models.

In [None]:
projects = client.get_projects(name="CultivationProject", project_type="cultivation")
project = next(projects)

For Propagation models: 

In [None]:
# by default model_type = "propagation", but you can also filter by model_type = "historical", to get the historical models

models = project.get_models(name="CultivationModel", model_type="propagation")
model_prop = next(models)

The prediction function takes two required arguments, `timestamps` and `inputs`, and two optional arguments, `timestamps_unit` and `config`. 

* `timestamps` is a list of the timestamps at which predictions are expected. These are relative timestamps, meaning that all values are relative to the first one. 
* `inputs` is a dictionary with the variables and the values needed for the predictions. The keys must be the codes of the input variables, and the values must be lists whose required lengths are given by the rules below. 
* `timestamps_unit` is the unit of the timestamps. It can be `s`, `m`, `h` or `d` for seconds, minutes, hours or days.
* `config` uses the `PredictionConfig` class to set the configuration that will be used by the model.
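
To illustrate how the timestamp units relate to each other, the sketch below converts timestamps given in any of the four units to seconds. `to_seconds` and `SECONDS_PER_UNIT` are hypothetical names used only for this illustration. How the SDK handles `timestamps_unit` internally is not shown here.

```python
SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(timestamps, unit):
    """Convert timestamps expressed in `unit` ("s", "m", "h" or "d") to seconds."""
    return [t * SECONDS_PER_UNIT[unit] for t in timestamps]


days = [1, 2, 3, 4, 5, 6, 7]
seconds = to_seconds(days, "d")
```

Under this conversion, the list `[1, 2, 3, 4, 5, 6, 7]` with unit `d` describes the same points in time as `[86400, 172800, 259200, 345600, 432000, 518400, 604800]` with unit `s`.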

The `PredictionConfig` class defines the parameters of the model prediction. Currently the only parameter is `model_confidence`, but more will be added.
* `model_confidence` sets the width of the interval within which the model's predictions are expected to fall, at the specified level of certainty. For example, setting it to 80% captures the range between the 10th and 90th percentiles of the model's output.

If no `config` is provided, the default `model_confidence` (80%) is used. 
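
The relationship between `model_confidence` and the percentiles can be worked out with simple arithmetic: the remaining probability is split equally between the two tails. The sketch below is only an illustration of that arithmetic. `confidence_to_percentiles` is a hypothetical helper, and its range check is an assumption of this sketch.

```python
def confidence_to_percentiles(model_confidence):
    """Return the (lower, upper) percentiles of a central interval of the given width in percent."""
    if not 0 < model_confidence < 100:
        raise ValueError("model_confidence must be between 0 and 100 (exclusive)")
    tail = (100 - model_confidence) / 2
    return tail, 100 - tail


default_interval = confidence_to_percentiles(80)  # (10.0, 90.0)
narrow_interval = confidence_to_percentiles(50)   # (25.0, 75.0)
```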

As an example: 

In [None]:
from dhl_sdk.entities import PredictionConfig

timestamps = [1,2,3,4,5,6,7]
inputs = {"var1": [42], "var2": [0.3], "var3": [0.5], "var4": [0,2,3,3,3,3,3]}

prediction_config = PredictionConfig(model_confidence=80)

result = model_prop.predict(timestamps, inputs, timestamps_unit="d", config = prediction_config)

The `inputs` for **propagation** models must follow these rules:

* All model variables must be present (to check which variables are needed for the prediction, use the `model_prop.model_variables` property).
* Some variable groups need to be complete for the prediction: `Feeds/Flows` and `W`. For these variable groups, the list of values must have the same length as `timestamps`.  
* For the remaining variables (`X`, `Z`, `Feed Concentration`), only the initial value must be provided. 
* Only `Z` variables can have a type other than `numeric`.
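
The length rules above can be checked locally before calling `predict`. The following is a minimal sketch, assuming the caller supplies a mapping from variable code to group label. The group labels, variable codes and the `check_propagation_inputs` helper are all hypothetical.

```python
def check_propagation_inputs(inputs, groups, n_timestamps):
    """Check list lengths in `inputs` against the rules for propagation models.

    `groups` maps each model variable code to its group label (supplied by the caller).
    """
    full_length_groups = {"Feeds/Flows", "W"}
    problems = []
    for code, group in groups.items():
        if code not in inputs:
            problems.append(f"{code}: missing from inputs")
        elif group in full_length_groups and len(inputs[code]) != n_timestamps:
            problems.append(f"{code}: expected {n_timestamps} values")
        elif group not in full_length_groups and len(inputs[code]) < 1:
            problems.append(f"{code}: initial value required")
    return problems


timestamps = [1, 2, 3]
groups = {"var1": "X", "var4": "Feeds/Flows"}
ok = check_propagation_inputs({"var1": [42], "var4": [0, 2, 3]}, groups, len(timestamps))
bad = check_propagation_inputs({"var1": [42], "var4": [0]}, groups, len(timestamps))
```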

The prediction works from the initial given point onwards. 

As for `Historical` models, the prediction functionality works similarly.

The prediction function takes three required arguments, `timestamps`, `steps` and `inputs`, and the same optional arguments as above, `timestamps_unit` and `config`. 

* `timestamps` is a list of the timestamps at which predictions are expected.
* `steps` is a list of steps for prediction in the run, representing the steps from the start of the process. It must have the same length as `timestamps` and start at 0.
* `inputs` is a dictionary with the variables and the values needed for the predictions. The keys must be the codes of the input variables, and the values must be lists whose required lengths are given by the rules below.
* `config` uses the `PredictionConfig` class to set the configuration that will be used by the model. See also the propagation model example above.

As an example: 

In [None]:
models = project.get_models(name="HistoricalModel", model_type="historical")
model_hist = next(models)

timestamps = [86400, 172800, 259200, 345600, 432000, 518400, 604800]
steps = [0,1,2,3,4,5,6]
inputs = {"var1": [42], "var2": [0.3], "var3": [0.5], "var4": [0,2,3,3,3,3,3]}

prediction_config = PredictionConfig(model_confidence=50)

result = model_hist.predict(timestamps, steps, inputs, timestamps_unit="s", config = prediction_config)

The `inputs` for **historical** models must follow these rules:

* All model variables must be present. 
* Some variable groups need to be complete for the prediction: `Feeds/Flows` and `W`. For these variable groups, the list of values must have the same length as `timestamps` and must not contain missing values.
* For `X` variables, the list of values must have the same length as `timestamps`, but missing values are allowed. For better prediction performance, provide as complete a list of input values as possible. 
* For `Z` and `Feed Concentration` variables, only the initial value must be provided. 

The prediction returns only the predictions for the output variables. 

#### Simulation with Propagation+Historical models

The SDK can also be used to simulate how changes to the available experiments could affect the results.
If you have already retrieved your reference experiment (`experiment`), propagation model (`model_prop`), and historical model (`model_hist`), you can edit the experiment data and run predictions with both models.

First, you can simulate the process dynamics using the propagation model.
Note that in your original data, X variables have values for the full duration of the run. Since the propagation model expects only initial values for X variables, the remaining values must be removed.

In [None]:
import pandas as pd

exp_data = experiment.get_data(client)

# Change any variable you wish
exp_data['var4']['values'] = [0,2,3,4,4,3,2]

timestamps = exp_data['var1']['timestamps']
inputs_prop = {var_code: content["values"] for var_code, content in exp_data.items()}

variables = model_prop.model_variables
x_variables = [v.code for v in variables if v.group.code == "X"]

for var_code, values in inputs_prop.items():
    if var_code in x_variables:
        inputs_prop[var_code] = values[:1]  # keep only initial values for X variables
    else:
        inputs_prop[var_code] = [x for x in values if not pd.isna(x)]  # drop missing values

result_prop = model_prop.predict(timestamps, inputs_prop, timestamps_unit="d")
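
The input preparation can also be written without pandas, which makes it easy to try out on toy data. The sketch below assumes missing values appear as `None` or `NaN`. `is_missing` and `prepare_propagation_inputs` are hypothetical helpers, and the data is made up.

```python
import math


def is_missing(value):
    """True for None and float NaN, used here as markers for a missing measurement."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def prepare_propagation_inputs(exp_data, x_codes):
    """Keep the initial value for X variables and drop missing values elsewhere."""
    inputs = {}
    for code, series in exp_data.items():
        values = series["values"]
        if code in x_codes:
            inputs[code] = values[:1]
        else:
            inputs[code] = [v for v in values if not is_missing(v)]
    return inputs


# Hypothetical experiment data: var1 is an X variable, var4 is a feed with one gap
toy_data = {
    "var1": {"timestamps": [0, 1, 2], "values": [42.0, 40.1, 38.7]},
    "var4": {"timestamps": [0, 1, 2], "values": [0.0, float("nan"), 3.0]},
}
toy_inputs = prepare_propagation_inputs(toy_data, x_codes={"var1"})
```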

You can then use the results of the propagation-model simulation to predict critical quality attributes (CQAs) with the historical model.

In [None]:
inputs_hist = inputs_prop.copy()

for variable in result_prop.keys():
    inputs_hist[variable] = result_prop[variable]["values"]

steps = list(range(len(timestamps)))

result_hist = model_hist.predict(timestamps, steps, inputs_hist, timestamps_unit="d")