# DataHowLab SDK

The DataHowLab SDK is a Python package that simplifies integration with DataHowLab's API. It gives developers a convenient way to access Experiments, Variables, Projects and Models and their data, and to make predictions on new data with these models.

### Prerequisites:

- **Python:** Ensure you have Python installed (version 3.9 or higher) on your system.
- **API Key:** Make sure you have a valid DataHowLab API Key. If you don't have an API key, please contact your DataHowLab provider.

### Importing SDK Package

Before you start using the DataHowLab SDK, make sure you have a valid API key.
* You can store your key in a `.env` file in your working directory as `DHL_API_KEY=your_api_key`. Alternatively, pass the key directly to `APIKeyAuthentication`, for example `key = APIKeyAuthentication(api_key="your_api_key")`.
* Then use the key to create your `DataHowLabClient`: `DataHowLabClient(auth_key=key, base_url="https://yourdomain.datahowlab.ch/")`, where `base_url` corresponds to your DataHowLab domain.

In [None]:
%load_ext dotenv
%dotenv
from dhl_sdk import DataHowLabClient, APIKeyAuthentication

# DHL_API_KEY env var is loaded from the .env file
key = APIKeyAuthentication()

# This is an example. Change the next line to your DataHowLab Instance
your_url = "https://yourdomain.datahowlab.ch/"
client = DataHowLabClient(auth_key=key, base_url=your_url)

### Accessing your DataHowLab Data

Using your `DataHowLabClient`, you can list and access your:

* Products
* Variables
* Experiments
* Recipes
* Projects (Inside Projects you can also access your "Datasets" and "Models")


When you access your database entities, all results are returned in iterable form, so you can retrieve each element with the built-in `next` function, just as with a regular Python iterator. For more information, see the official Python documentation on [iterables](https://docs.python.org/3/glossary.html#term-iterable), [iterators](https://docs.python.org/3/glossary.html#term-iterator) and the [next](https://docs.python.org/3/library/functions.html#next) function.
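Because `next` raises `StopIteration` once nothing is left, a search that matches no entity fails on the first call. A minimal sketch of guarding against that, using a plain Python iterator as a stand-in for an SDK result (the `results` values here are illustrative, not SDK output):

```python
# Stand-in for an SDK search result; any Python iterator behaves the same way.
results = iter(["first entity", "second entity"])

first = next(results, None)   # first element
second = next(results, None)  # second element
third = next(results, None)   # iterator is exhausted: the default is returned

if third is None:
    print("No more results")
```

Passing a default as the second argument to `next` avoids the exception, so an empty search can be handled with a simple `is None` check.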

In order to get more information about your entities, you can access the fields `name` and `description` present in all entities. For Products and Variables, you can also access the `code` field. 

In [None]:
# Products can be filtered by code 

products = client.get_products(code = "ExampleProductCode")
product = next(products)

product_name = product.name
product_code = product.code
product_description = product.description

In [None]:
# Variables can be filtered by code, group, or variable type 

variables = client.get_variables(group="ExampleGroup")
variable = next(variables)

# print the variable name
print(variable.name)

# if the variable was not the one you wanted, you can keep iterating
variable = next(variables)
print(variable.name)

wanted_variable = next(variables)

In [None]:
# Experiments can be filtered by name or product. 
# If you want to filter by product you should first get your product
# and then use it to filter the experiments.

experiments = client.get_experiments(name="ExampleExperimentName", product=product)
experiment = next(experiments)

In [None]:
# Projects can be filtered by name and project type 
# (by default, project_type = "cultivation", but it can also be "spectroscopy")

projects = client.get_projects(name="exampleproject")
project = next(projects)

### Exporting Experiment Data

After selecting an experiment, you can download its data with the `get_data` method, passing your `client` as an argument.

The method returns a dictionary in the following format:

    {"Variable Code 1": 
        {
            "timestamps": [0,1,2], 
            "values": [val1, val2, val3] 
        },
    "Variable Code 2":     
        {
            "timestamps": [0, 1, 2], 
            "values": [val1, val2, val3]  
        }
    }

In [None]:
exp_data = experiment.get_data(client)
print(exp_data)
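The dictionary format described above can be flattened into rows for further processing. The following is a small sketch that assumes only the documented shape; `flatten_experiment_data` is a hypothetical helper, not part of the SDK, and `example_data` is made-up data rather than real output.

```python
def flatten_experiment_data(data):
    """Turn {code: {"timestamps": [...], "values": [...]}} into (code, timestamp, value) rows."""
    rows = []
    for code, series in data.items():
        for timestamp, value in zip(series["timestamps"], series["values"]):
            rows.append((code, timestamp, value))
    return rows


# Illustrative data in the documented shape (not real SDK output)
example_data = {
    "VAR1": {"timestamps": [0, 1, 2], "values": [1.1, 1.2, 1.3]},
    "VAR2": {"timestamps": [0], "values": ["A"]},
}

rows = flatten_experiment_data(example_data)
for row in rows:
    print(row)
```

In practice you would pass the result of `experiment.get_data(client)` instead of `example_data`.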

You can also access the data in your datasets through your projects. After selecting a project, use the `get_datasets` method to find your dataset. As with `experiments` and `recipes`, you can filter the results by dataset name. Once you have the dataset, download its data with the dataset's `get_data(client)` method.

This returns a list of dictionaries, one per experiment. To check the information about each experiment, access the `experiments` field of your dataset: `dataset.experiments`.

In [None]:
# You can filter the datasets by name

# Example: loop through all datasets to find the first one with "test" in its name
datasets = project.get_datasets()
for ds in datasets:
    if "test" in ds.name:
        dataset = ds
        break
    
dataset_data = dataset.get_data(client)

### Importing Data to DataHowLab's DataBase

The SDK also lets you import new experiments into the database (DB). Creating a new experiment involves creating or selecting a Product, Variables and Data.

To use entities that already exist, retrieve them with the methods shown above.

To create new entities, instantiate them with the `new` method of the corresponding entity class. Here is an example workflow:

In [None]:
from dhl_sdk import Product, Variable, Experiment, VariableCategorical, VariableNumeric

product = Product.new(name="ExampleProduct", code="EXPRD", description="Example Product")

variable1 = Variable.new(code="SDKv1", 
                         name="SDK Variable 1", 
                         description="This is a variable created with the SDK", 
                         measurement_unit="l", 
                         variable_group="X Variables", 
                         variable_type=VariableNumeric()
                         )
variable2 = Variable.new(code="SDKv2", 
                         name="SDK Variable 2", 
                         description="This is a variable created with the SDK", 
                         measurement_unit="l", 
                         variable_group="Z Variables", 
                         variable_type=VariableCategorical()
                         )

This creates a local version of the entities. To use them in a new experiment, you need to upload them to the DB with the `create` method of `DataHowLabClient`.
This method validates your new entities and uploads them if everything is correct.

In [None]:
product = client.create(product)
variable1 = client.create(variable1)
variable2 = client.create(variable2)

For a new experiment, your data needs to be in a specific format:

    {"Variable Code 1": 
        {
            "timestamps": [0,1,2], 
            "values": [val1, val2, val3] 
        },
    "Variable Code 2":     
        {
            "timestamps": [0, 1, 2], 
            "values": [val1, val2, val3]  
        }
    }

In [None]:
run_data = {
            "SDKv1": {
                "timestamps": [
                    0,
                    86400,
                    172800,
                ],
                "values": [
                    1.1,
                    1.2,
                    1.3
                ]
            },
            "SDKv2": {
                "timestamps": [
                    0],
                "values": [
                    "A"]

            }
}
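Before uploading, it can help to check locally that the dictionary has the shape shown above. The sketch below is not the SDK's own validation; `check_run_data` is a hypothetical helper, and the rule that `timestamps` and `values` have equal length is an assumption drawn from the format example.

```python
def check_run_data(data):
    """Return a list of problems found in a run-data dictionary (empty if none)."""
    problems = []
    for code, series in data.items():
        timestamps = series.get("timestamps")
        values = series.get("values")
        if timestamps is None or values is None:
            problems.append(f"{code}: missing 'timestamps' or 'values'")
            continue
        # Assumption: each timestamp is paired with exactly one value
        if len(timestamps) != len(values):
            problems.append(
                f"{code}: {len(timestamps)} timestamps but {len(values)} values"
            )
    return problems


good = {"SDKv1": {"timestamps": [0, 86400], "values": [1.1, 1.2]}}
bad = {
    "SDKv1": {"timestamps": [0, 86400], "values": [1.1]},
    "SDKv2": {"timestamps": [0]},
}

good_problems = check_run_data(good)
bad_problems = check_run_data(bad)
print(good_problems)
print(bad_problems)
```

An empty list means the dictionary passed these local checks; the server-side validation performed by `client.create` may still apply further rules.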

After collecting the data for your new experiment (like the `run_data` example above), create the experiment with the `Experiment.new` method. Then use the `client.create` method to upload it to the DB.

In [None]:
experiment = Experiment.new(name="SDK EXP", 
                            description="new experiment example for sdk", 
                            product=product, 
                            variables=[variable1, variable2], 
                            data_type="run", 
                            data=run_data, 
                            variant="run", 
                            start_time="2024-01-01T00:00:00Z", 
                            end_time="2024-01-03T00:00:00Z")

client.create(experiment)
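In the example above, the `timestamps` values 0, 86400 and 172800 line up with the three days between `start_time` and `end_time` when read as seconds since the start. That reading is an assumption based on the example values, not a documented rule. Under that assumption, the sketch below derives both the relative timestamps and the ISO strings from Python `datetime` objects; `to_relative_seconds` and `to_iso_utc` are hypothetical helpers, not SDK functions.

```python
from datetime import datetime, timezone


def to_relative_seconds(moments, start):
    """Return each moment as whole seconds elapsed since `start`."""
    return [int((moment - start).total_seconds()) for moment in moments]


def to_iso_utc(moment):
    """Format a timezone-aware datetime as a UTC string such as 2024-01-01T00:00:00Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Illustrative sampling times, one per day
samples = [
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
    datetime(2024, 1, 3, tzinfo=timezone.utc),
]

timestamps = to_relative_seconds(samples, samples[0])
start_time = to_iso_utc(samples[0])
end_time = to_iso_utc(samples[-1])
print(timestamps, start_time, end_time)
```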

If, for example, you want to add a new experiment with the same Product and Variables as an already created experiment, you can do it like this:

In [None]:
experiments = client.get_experiments(name="Experiment to Fork")
old_experiment = next(experiments)

new_data = {
            "SDKv1": {
                "timestamps": [
                    0,
                    86400,
                    172800,
                ],
                "values": [
                    1.1,
                    2.2,
                    3.3
                ]
            },
            "SDKv2": {
                "timestamps": [
                    0],
                "values": [
                    "B"]

            }
}

new_experiment = Experiment.new(name="Forked Experiment",
                                description="new experiment example for sdk",
                                product=old_experiment.product,
                                variables=old_experiment.variables,
                                data_type="run",
                                data=new_data,
                                variant="run",
                                start_time="2024-02-01T00:00:00Z",
                                end_time="2024-02-03T00:00:00Z")
client.create(new_experiment)

### Using your models 

DataHowLab's SDK also allows you to use your trained models to run new predictions based on new data. 

To access your models, you need to find your `project` of interest and, inside this project, find your model using the `get_models` method.

In [None]:
projects = client.get_projects(name="exampleproject")
project = next(projects)

models = project.get_models(name="ExampleModel")
model = next(models)

Inside the model, you can access the dataset it uses through `model.dataset` and check the variables used to train the model through `model.model_variables`. The latter property returns a list of `Variable` objects. If you only want the variable codes, use the `model.get_model_variables_codes()` method.

In [None]:
model_dataset = model.dataset 
# you can access the data inside the dataset 
model_data = model_dataset.get_data(client)


variables = model.model_variables
variable_codes = model.get_model_variables_codes()

The available project types are Spectroscopy and Cultivation. Within Cultivation projects, you can choose between Historical and Propagation models.

#### Spectroscopy 

To search your **Spectroscopy** projects, you can add `project_type="spectroscopy"` to the `get_projects` method. 

In [None]:
projects = client.get_projects(name="Spectra", project_type="spectroscopy")
project = next(projects)

models = project.get_models(name="SpectraModel")
model = next(models)

To use the model, you only need some data. Spectra are passed as a 2D list in which each row is one spectrum and each column corresponds to a wavenumber.

In [None]:
spectra = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
predictions = model.predict(spectra)

You can also import this data from a .csv or .xls file; you only need to convert the data to a list of lists. For example, using NumPy to read a CSV file:

In [None]:
import numpy as np

# genfromtxt returns a NumPy array; convert it to a list of lists
spectra = np.genfromtxt("demo-spectra.csv", delimiter=',').tolist()
predictions = model.predict(spectra)
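If you prefer not to depend on NumPy, the standard library `csv` module can produce the same list-of-lists shape. This is a minimal sketch that assumes a CSV file with no header row and only numeric cells; an in-memory string stands in for the file so the example is self-contained.

```python
import csv
import io

# Stand-in for a file on disk; with a real file, use open("demo-spectra.csv", newline="")
csv_text = "1,2,3\n4,5,6\n"

reader = csv.reader(io.StringIO(csv_text))
# Each CSV row becomes one spectrum; each cell is converted to a float
spectra = [[float(cell) for cell in row] for row in reader]
print(spectra)
```

The resulting `spectra` list can then be passed to `model.predict` in the same way as the hand-written example above.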

#### Cultivation

To search your **Cultivation** projects, you can add `project_type="cultivation"` to the `get_projects` method. 

After that, in the `get_models` method, you can select between `historical` and `propagation` models.

In [None]:
projects = client.get_projects(name="CultivationProject", project_type="cultivation")
project = next(projects)

For Propagation models: 

In [None]:
# by default model_type = "propagation", but you can also filter by model_type = "historical", to get the historical models

models = project.get_models(name="CultivationModel", model_type="propagation")
model = next(models)

The prediction function takes two required arguments, `timestamps` and `inputs`, and one optional argument, `timestamps_unit`.

* `timestamps` is a list of the timestamps at which predictions are expected. The timestamps are relative, meaning that all values are interpreted relative to the first one.
* `inputs` is a dictionary with the variables and the values needed for the predictions. The keys must be the codes of the input variables, and the values must be lists (the rules below describe the required length for each variable group).
* `timestamps_unit` is the unit of the timestamps. It can be `s`, `m`, `h` or `d` for seconds, minutes, hours or days.

As an example: 

In [None]:
timestamps = [1,2,3,4,5,6,7]
inputs = {"var1": [42], "var2": [0.3], "var3": [0.5], "var4": [0,2,3,3,3,3,3]}
result = model.predict(timestamps, inputs, timestamps_unit="d")
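One way to read the `timestamps` and `timestamps_unit` arguments is sketched below. The helpers `to_seconds` and `relative_to_first` are hypothetical and not part of the SDK; they only illustrate how timestamps in a given unit map to seconds and what "relative to the first one" means. The SDK may handle this differently internally.

```python
# Seconds per unit for the four documented unit codes
SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(timestamps, unit):
    """Convert timestamps expressed in `unit` to seconds."""
    return [t * SECONDS_PER_UNIT[unit] for t in timestamps]


def relative_to_first(timestamps):
    """Express every timestamp as an offset from the first one."""
    first = timestamps[0]
    return [t - first for t in timestamps]


days = [1, 2, 3, 4, 5, 6, 7]
in_seconds = to_seconds(days, "d")
offsets = relative_to_first(days)
print(in_seconds)
print(offsets)
```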

The `inputs` for **propagation** models must follow these rules:

* All model variables must be present. (To check which variables are needed for the prediction, use the `model.model_variables` property.)
* Some variable groups must be complete for the prediction: `Flows/Feeds` and `W`. For these variable groups, the list of values must have the same length as `timestamps`.
* For the remaining variables (`X`, `Z`, `Feed Concentration`), only the initial value must be provided.
* Only `Z` variables can have a type other than `numeric`.

The prediction works from the initial given point onwards. 
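The rules above can be checked locally before calling `predict`. The following is a minimal sketch, not part of the SDK: `check_propagation_inputs` is a hypothetical helper, and the `groups` mapping from variable code to variable group is assumed to be prepared by you (the group assignments in the example are invented for illustration).

```python
# Groups whose value lists must cover every timestamp
FULL_LENGTH_GROUPS = {"Flows/Feeds", "W"}


def check_propagation_inputs(inputs, groups, timestamps):
    """Return a list of rule violations (empty if the inputs look consistent)."""
    problems = []
    for code, group in groups.items():
        if code not in inputs:
            problems.append(f"{code}: missing from inputs")
            continue
        count = len(inputs[code])
        if group in FULL_LENGTH_GROUPS:
            if count != len(timestamps):
                problems.append(
                    f"{code}: expected {len(timestamps)} values, got {count}"
                )
        elif count < 1:
            problems.append(f"{code}: initial value is required")
    return problems


timestamps = [1, 2, 3, 4, 5, 6, 7]
# Hypothetical group assignment for the example variables
groups = {"var1": "X", "var2": "X", "var3": "Z", "var4": "Flows/Feeds"}
inputs = {"var1": [42], "var2": [0.3], "var3": [0.5], "var4": [0, 2, 3, 3, 3, 3, 3]}

ok = check_propagation_inputs(inputs, groups, timestamps)
short = check_propagation_inputs({**inputs, "var4": [0, 2, 3]}, groups, timestamps)
print(ok)
print(short)
```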

For `Historical` models, the prediction functionality works similarly.

The prediction function takes three required arguments, `timestamps`, `steps` and `inputs`, and accepts the optional `timestamps_unit` argument.

* `timestamps` is a list of the timestamps at which predictions are expected.
* `steps` is a list of steps for the prediction in the run, representing the steps from the start of the process. The steps must match the length of `timestamps` and start at 0.
* `inputs` is a dictionary with the variables and the values needed for the predictions. The keys must be the codes of the input variables, and the values must be lists (the rules below describe the required length for each variable group).

As an example: 

In [None]:
models = project.get_models(name="HistoricalModel", model_type="historical")
model = next(models)

timestamps = [86400, 172800, 259200, 345600, 432000, 518400, 604800]
steps = [0,1,2,3,4,5,6]
inputs = {"var1": [42], "var2": [0.3], "var3": [0.5], "var4": [0,2,3,3,3,3,3]}
result = model.predict(timestamps, steps, inputs, timestamps_unit="s")

The `inputs` for **historical** models must follow these rules:

* All model variables must be present.
* Some variable groups must be complete for the prediction: `Flows/Feeds` and `W`. For these variable groups, the list of values must have the same length as `timestamps` and must not contain missing values.
* For `X` variables, the list of values must have the same length as `timestamps`, but missing values are allowed. For better prediction performance, provide the most complete list of input values you can.
* For `Z` and `Feed Concentration` variables, only the initial value must be provided.

The prediction returns values for the output variables only.
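Preparing `steps` and partially known `X` values for a historical model can be sketched as follows. Both helpers are hypothetical and not part of the SDK. In particular, how the SDK expects a missing value to be encoded is not stated here, so the use of `None` is an assumption that you should verify against your instance.

```python
def build_steps(timestamps):
    """Create one step per timestamp, starting at 0."""
    return list(range(len(timestamps)))


def pad_with_missing(values, length, missing=None):
    """Extend `values` to `length` entries, filling the tail with `missing`."""
    return list(values) + [missing] * (length - len(values))


timestamps = [86400, 172800, 259200, 345600]
steps = build_steps(timestamps)

# Only the first two measurements of this X variable are known
x_values = pad_with_missing([42, 40.5], len(timestamps))
print(steps)
print(x_values)
```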