# Uncertainty Engine SDK example workflows - Train

This notebook shows how to set up a workflow that trains and saves a machine-learning model using the `TrainModel` node.

Start by importing and initializing the `Client` (for more details about the client, see the `demo_node.ipynb` example).

In [None]:
from uncertainty_engine.client import Client

client = Client(
    email="<your-email>",  # Note: there must be a token associated with this email.
    deployment="<a-deployment-url>",
)

## Viewing the nodes

Once you have initialised the Uncertainty Engine client, you can use the `list_nodes` method to list all available nodes. Each node record includes a `category`; since we are training a machine-learning model, the node we want is `TrainModel`, which sits in the `MachineLearningModels` category. Indexing the records by `id` makes it easy to look a node up by name.

In [21]:
from pprint import pprint

nodes = client.list_nodes()
nodes_by_id = {node["id"]: node for node in nodes}

pprint(sorted(nodes_by_id.keys()))

['Add',
 'AnalyseVariance',
 'AppendDataset',
 'BuildSensorDesigner',
 'CoralScop',
 'CreateChat',
 'CreateDatasetSlice',
 'CreateFEM',
 'CustomPython',
 'Dict',
 'Display',
 'Download',
 'EmbeddingsConfig',
 'FilterDataset',
 'GetContext',
 'Join',
 'KaustWP6PH',
 'LLMConfig',
 'Load',
 'LoadChatHistory',
 'LoadDataset',
 'LoadDocument',
 'LoadModel',
 'LoadMultiple',
 'Message',
 'ModelConfig',
 'Number',
 'PcdProcessing',
 'PredictModel',
 'PromptLLM',
 'Recommend',
 'Save',
 'ScoreModel',
 'ScoreSensorDesign',
 'SimpleLLM',
 'Splitter',
 'StereoReconstruction',
 'SuggestSensorDesign',
 'Text',
 'TrainModel',
 'UncertaintyPlot',
 'Workflow']
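To see only the nodes in one category, you can group the records returned by `list_nodes` on their `category` key. The sketch below assumes every record has `id` and `category` keys, as the `TrainModel` record printed later in this notebook does; the helper name `group_by_category` and the `ExampleNode`/`ExampleCategory` record are hypothetical and used only for illustration.

```python
from collections import defaultdict


def group_by_category(nodes):
    """Group node records (as returned by client.list_nodes()) by their 'category'.

    Hypothetical helper: assumes each record is a dict with 'id' and 'category' keys.
    """
    groups = defaultdict(list)
    for node in nodes:
        groups[node["category"]].append(node["id"])
    # Sort the ids in each category so the output is stable and easy to scan.
    return {category: sorted(ids) for category, ids in groups.items()}


# Hypothetical, trimmed-down records for illustration; real records carry more keys.
example_nodes = [
    {"id": "TrainModel", "category": "MachineLearningModels"},
    {"id": "ModelConfig", "category": "MachineLearningModels"},
    {"id": "ExampleNode", "category": "ExampleCategory"},
]

print(group_by_category(example_nodes)["MachineLearningModels"])
```

With a live client you would presumably call `group_by_category(client.list_nodes())["MachineLearningModels"]` instead of using the hand-written records.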


Once you have found the `TrainModel` node, it can be useful to print its full record, including its `description`, `inputs` and `outputs`.

In [None]:
train_node = nodes_by_id["TrainModel"]

pprint(train_node)

{'cache_url': 'redis-13325.c338.eu-west-2-1.ec2.redns.redis-cloud.com',
 'category': 'MachineLearningModels',
 'cost': 5,
 'description': 'Train a machine-learning model',
 'id': 'TrainModel',
 'image_name': 'uncertainty-engine-train-model-node',
 'inputs': {'config': {'default': None,
                       'description': 'Configuration for the model',
                       'label': 'Model Config',
                       'required': True,
                       'set_in_node': False,
                       'type': 'ModelConfig'},
            'inputs': {'default': None,
                       'description': 'Input dataset for training the model',
                       'label': 'Input Dataset',
                       'required': True,
                       'set_in_node': True,
                       'type': 'CSVDataset'},
            'outputs': {'default': None,
                        'description': 'Output dataset for training the model',
                        'label': 'Output Dat
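Because each entry under `inputs` carries a `required` flag, you can pull out the inputs you must supply before building the graph. This is a minimal sketch; `required_inputs` is a hypothetical helper, and the excerpts below copy only the fields of `TrainModel` and `ModelConfig` that are fully visible in the printed outputs (both outputs are truncated, so the real records may contain more inputs).

```python
def required_inputs(node_info):
    """Return the sorted names of a node's inputs whose 'required' flag is True."""
    return sorted(
        name for name, spec in node_info["inputs"].items() if spec.get("required")
    )


# Excerpt of the TrainModel record printed above (only fully shown inputs).
train_node_excerpt = {
    "id": "TrainModel",
    "inputs": {
        "config": {"required": True, "type": "ModelConfig"},
        "inputs": {"required": True, "type": "CSVDataset"},
    },
}

# Excerpt of the ModelConfig record (printed later in this notebook).
model_config_excerpt = {
    "id": "ModelConfig",
    "inputs": {
        "input_retained_dimensions": {"default": None, "required": False, "type": "int"},
        "input_variance": {"default": None, "required": False, "type": "float"},
        "model_type": {"default": "SingleTaskGPTorch", "required": False},
    },
}

print(required_inputs(train_node_excerpt))
print(required_inputs(model_config_excerpt))
```

On the excerpts, `TrainModel` reports `config` and `inputs` as required, while `ModelConfig` reports nothing required, which is why its defaults can be used as-is.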

Now that we have this information, we can start building the workflow from the right nodes, connecting their inputs and outputs.

## Constructing your train workflow

First, import and initialize the `Graph` object and add the `TrainModel` node.


In [4]:
from uncertainty_engine.graph import Graph
from uncertainty_engine.nodes.base import Node

graph = Graph()

train_model = Node(
    node_name="TrainModel"
)

graph.add_node(train_model, "Train Model")

The `TrainModel` record shows that the node needs a model config: its `config` input has type `ModelConfig`. The node list also contains a `ModelConfig` node in the `MachineLearningModels` category, so let's inspect its inputs and outputs.

In [5]:
model_config_info = nodes_by_id["ModelConfig"]

pprint(model_config_info["inputs"])
pprint(model_config_info["outputs"])

{'input_retained_dimensions': {'default': None,
                               'description': 'Number of dimensions to retain '
                                              'in the input data',
                               'label': 'Input Retained Dimensions',
                               'required': False,
                               'set_in_node': True,
                               'type': 'int'},
 'input_variance': {'default': None,
                    'description': 'Percentage of variance to retain in the '
                                   'input data',
                    'label': 'Input Variance',
                    'required': False,
                    'set_in_node': True,
                    'type': 'float'},
 'model_type': {'default': 'SingleTaskGPTorch',
                'description': 'Type of model to use',
                'label': 'Model Type',
                'required': False,
                'set_in_node': True,
                'type': 'Literal["SingleTask

Now we can define our `ModelConfig` node. As none of its inputs are required, we can use the default parameters for now, and connect its `config` output to the `config` input of `Train Model`.

In [6]:
model_config = Node(
    node_name="ModelConfig",
    label="Model Config",
)

graph.add_node(model_config)

graph.add_edge(
    source="Model Config",
    target="Train Model",
    source_key="config",
    target_key="config"
)
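When a node is created without explicit values, its inputs fall back to the `default` declared in its record (for `ModelConfig`, `model_type` defaults to `SingleTaskGPTorch`). The sketch below reproduces that merge locally so you can see which values will apply; it is an illustration only, since this notebook does not show how the service itself resolves defaults. `resolve_inputs` is a hypothetical helper, and the override value `2` is purely illustrative. By analogy with how `FilterDataset` receives `dataset` and `columns` below, overrides would presumably be passed as keyword arguments to `Node`.

```python
def resolve_inputs(node_info, overrides=None):
    """Merge a node's declared input defaults with explicitly supplied values.

    Raises KeyError for names the node does not declare, which catches typos early.
    """
    overrides = overrides or {}
    unknown = set(overrides) - set(node_info["inputs"])
    if unknown:
        raise KeyError(f"Unknown inputs: {sorted(unknown)}")
    resolved = {name: spec.get("default") for name, spec in node_info["inputs"].items()}
    resolved.update(overrides)
    return resolved


# Excerpt of the ModelConfig inputs printed above (the printed output is truncated,
# so the real node may declare more inputs).
model_config_excerpt = {
    "inputs": {
        "input_retained_dimensions": {"default": None, "type": "int"},
        "input_variance": {"default": None, "type": "float"},
        "model_type": {"default": "SingleTaskGPTorch"},
    }
}

print(resolve_inputs(model_config_excerpt))
print(resolve_inputs(model_config_excerpt, {"input_retained_dimensions": 2}))
```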

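The other inputs `TrainModel` needs are the input (`X`) and output (`y`) datasets, which feed its `inputs` and `outputs` inputs. We can build both from the `quickstart` example dataset, which we read from `data/quickstart.csv` as a string.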

In [7]:
with open("data/quickstart.csv", "r") as file:
    quickstart_csv = file.read()

In [8]:
filter_dataset_info = nodes_by_id["FilterDataset"]

pprint(filter_dataset_info["inputs"])
pprint(filter_dataset_info["outputs"])

{'columns': {'default': None,
             'description': 'The list of columns to keep',
             'label': 'Columns',
             'required': True,
             'set_in_node': True,
             'type': 'list[str]'},
 'dataset': {'default': None,
             'description': 'The dataset to filter',
             'label': 'Dataset',
             'required': True,
             'set_in_node': False,
             'type': 'CSVDataset'}}
{'dataset': {'description': 'The filtered dataset',
             'label': 'Dataset',
             'type': 'CSVDataset'}}


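`FilterDataset` takes a `dataset` and a list of `columns` to keep, so we can split our dataset with two `FilterDataset` nodes: one that keeps the `x` column (the input) and one that keeps the `y` column (the output). We then connect each node's `dataset` output to the matching `inputs` or `outputs` input of `Train Model`.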

In [9]:
filter_dataset1 = Node(
    node_name="FilterDataset",
    dataset={"csv": quickstart_csv},
    columns=["x"]
)

filter_dataset2 = Node(
    node_name="FilterDataset",
    dataset={"csv": quickstart_csv},
    columns=["y"]
)

graph.add_node(filter_dataset1, "Input")
graph.add_node(filter_dataset2, "Output")

graph.add_edge(
    source="Input",
    target="Train Model",
    source_key="dataset",
    target_key="inputs"
)

graph.add_edge(
    source="Output",
    target="Train Model",
    source_key="dataset",
    target_key="outputs"
)
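Before sending the graph to the service, it can help to check locally that the column names you pass to `FilterDataset` exist in the CSV. The sketch below is a local stand-in for the column filtering, written with Python's standard `csv` module; it is not the node's actual implementation, the helper name `filter_columns` is hypothetical, and `sample_csv` is made-up data because the contents of `data/quickstart.csv` are not shown in this notebook.

```python
import csv
import io


def filter_columns(csv_text, columns):
    """Keep only the named columns of a CSV string (a local stand-in for FilterDataset)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    out = io.StringIO()
    # extrasaction="ignore" drops every column that was not requested.
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample data with the same column names as the quickstart dataset.
sample_csv = "x,y\n0.0,1.0\n0.5,1.5\n1.0,2.0\n"
print(filter_columns(sample_csv, ["x"]))
print(filter_columns(sample_csv, ["y"]))
```

Running `filter_columns(quickstart_csv, ["x"])` on the real data is a cheap way to catch a misspelled column name before the workflow runs.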

Now that all the inputs for `TrainModel` are connected, we can use `graph.nodes` to view our nodes in graph format.

In [10]:
pprint(graph.nodes)

{'nodes': {'Input': {'inputs': {'columns': {'node_handle': 'Input_columns',
                                            'node_name': '_'},
                                'dataset': {'node_handle': 'Input_dataset',
                                            'node_name': '_'}},
                     'type': 'FilterDataset'},
           'Model Config': {'inputs': {}, 'type': 'ModelConfig'},
           'Output': {'inputs': {'columns': {'node_handle': 'Output_columns',
                                             'node_name': '_'},
                                 'dataset': {'node_handle': 'Output_dataset',
                                             'node_name': '_'}},
                      'type': 'FilterDataset'},
           'Train Model': {'inputs': {'config': {'node_handle': 'config',
                                                 'node_name': 'Model Config'},
                                      'inputs': {'node_handle': 'dataset',
                                               

## Defining node outputs

Now that our graph has been built, we can add an output node to decide how the result is collected. In this case we want to download the trained model, so we connect the `model` output of `Train Model` to the `file` input of a `Download` node.

In [22]:
download = Node(
    node_name="Download",
    label="Download",
)

graph.add_node(download)

graph.add_edge(
    source="Train Model",
    target="Download",
    source_key="model",
    target_key="file"
)

pprint(graph.nodes)

{'nodes': {'Download': {'inputs': {'file': {'node_handle': 'model',
                                            'node_name': 'Train Model'}},
                        'type': 'Download'},
           'Input': {'inputs': {'columns': {'node_handle': 'Input_columns',
                                            'node_name': '_'},
                                'dataset': {'node_handle': 'Input_dataset',
                                            'node_name': '_'}},
                     'type': 'FilterDataset'},
           'Model Config': {'inputs': {}, 'type': 'ModelConfig'},
           'Output': {'inputs': {'columns': {'node_handle': 'Output_columns',
                                             'node_name': '_'},
                                 'dataset': {'node_handle': 'Output_dataset',
                                             'node_name': '_'}},
                      'type': 'FilterDataset'},
           'Train Model': {'inputs': {'config': {'node_handle': 'config',
              

## Executing a workflow

Create the executable workflow by wrapping our graph in the `Workflow` node and setting `requested_output` to point at the `file` handle of the `Download` node.

In [14]:
from uncertainty_engine.nodes.workflow import Workflow

workflow = Workflow(
    graph=graph.nodes,
    input=graph.external_input,
    external_input_id=graph.external_input_id,
    requested_output={
        "Trained Model": {"node_name": "Download", "node_handle": "file"},
    },
)
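Each `requested_output` entry refers to a node by its label, so a typo (for example `"download"` instead of `"Download"`) would point at a node that does not exist. A minimal local check, assuming the `{'nodes': {...}}` shape that `graph.nodes` prints above; `check_requested_output` is a hypothetical helper and only checks node names, not handles.

```python
def check_requested_output(graph_nodes, requested_output):
    """Return the output labels whose 'node_name' is not a node in graph_nodes['nodes']."""
    present = graph_nodes["nodes"]
    return [
        label
        for label, ref in requested_output.items()
        if ref["node_name"] not in present
    ]


# Trimmed copy of the graph structure printed above.
graph_nodes = {
    "nodes": {
        "Download": {"type": "Download", "inputs": {}},
        "Train Model": {"type": "TrainModel", "inputs": {}},
    }
}

good = {"Trained Model": {"node_name": "Download", "node_handle": "file"}}
typo = {"Trained Model": {"node_name": "download", "node_handle": "file"}}

print(check_requested_output(graph_nodes, good))
print(check_requested_output(graph_nodes, typo))
```

With the real graph you would pass `graph.nodes` and the same dictionary given to `requested_output`.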

Execute the workflow by passing the workflow object to `client.run_node`.

In [15]:
response = client.run_node(workflow)
pprint(response["outputs"])

{'outputs': {'Trained Model': 'https://uncertainty-engine-download-node-dev-databucket-b4nubppbi7hc.s3.amazonaws.com/7555e5ae-1a6c-426b-9e1a-740edd6c1c9f/17f20130-26d5-4161-9842-715af9c80178/content.json?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIA5VITMN4NQIMDQXFJ%2F20250519%2Feu-west-2%2Fs3%2Faws4_request&X-Amz-Date=20250519T140032Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Security-Token=IQoJb3JpZ2luX2VjENb%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCWV1LXdlc3QtMiJHMEUCIQD2dftr%2FtgjZPGhaaUbAYa2evEtg7RvOkjoPkFOyGTRbQIgfbwqYKzvvaFS1zJNJz2Frd3b4lRgECi7rQLpEGxa5FYqvgQIj%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FARAAGgw5MzkwMjc4ODU4NTEiDBkq37jbBYjd%2Bde3SSqSBFjcJKmeUWW0nyuolPDMdCcI43%2FyGGPUlR0VKH9RgBwmK40qzUotmO6mzMO%2BkpSVtvySK6EC3DvkYdDSpjvPh6R6tsQxC%2FsGAnQHBMcTgu5FSS%2FW4CIWrQME8UdUpSl1fgn6ej30buIVxnIVUKUl2ShW4hpCCo5l4IQ7Qw1s7UHdBoxq90KgXrnwt5dAtvmvmCqM6XVbmIb%2BzEnmrHGbPv5nweLa0M21OpV5OeFoEHyhEhQ0IY5wdis6CID2ueZpzqewK%2FmprcA6Jt%2F0ejNl5BX%2BPx6zxUzvFgH5d8jgnh8M89nyFOKWGcs078M7xlNvmQt

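Following the presigned URL downloads the trained model. The link is temporary (`X-Amz-Expires=3600`, i.e. one hour), so download the model soon after the workflow finishes.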
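A presigned S3 URL carries its own validity window in the query string: `X-Amz-Date` is when it was signed and `X-Amz-Expires` is its lifetime in seconds. The sketch below computes the expiry time from those two parameters and streams the file to disk with the standard library. It assumes you have already pulled the `Trained Model` URL string out of the response printed above; the helper names, the `example-bucket` URL and the output filename are hypothetical, while the date and lifetime values are taken from that output.

```python
import shutil
import urllib.request
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_expiry(url):
    """Return the UTC time at which a SigV4 presigned URL stops being valid."""
    query = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at.replace(tzinfo=timezone.utc) + lifetime


def download(url, destination):
    """Stream the object at `url` into a local file and return the destination path."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination


# Hypothetical URL reusing the X-Amz-Date and X-Amz-Expires values from the output above.
example_url = (
    "https://example-bucket.s3.amazonaws.com/content.json"
    "?X-Amz-Date=20250519T140032Z&X-Amz-Expires=3600"
)
print(presigned_expiry(example_url))

# With the real URL (requires network access and an unexpired link):
# download(trained_model_url, "trained_model.json")
```

Checking `presigned_expiry(url) > datetime.now(timezone.utc)` before downloading avoids a confusing access-denied error from an expired link.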