# Tutorial 5: Assembling a workflow automatically using sparc-assemble
This tutorial shows how, given a specific request, sparc-assemble automatically retrieves every workflow that can generate the desired output. The user can then select the workflow that best fits their needs. The chosen workflow is saved in the cwl format.

## Prerequisites
A populated knowledge graph is required to assemble workflows. Please see tutorial 2. Two example knowledge graphs are provided under the resources folder.
Moreover, in order to produce a cwl description of the workflow, each tool must have a cwl file associated with it. For convenience, keep both the json and cwl files of each tool in a '/tools' folder. Please see tutorial 1 for more details. Example tools are provided under the resources/tools folder.

Open Jupyter through MyBinder (https://mybinder.org/v2/gh/jupyter/docker-stacks/main?urlpath=lab/tree/README.ipynb) and manually add the resources (including the tools under resources/tools), scripts and ipynb files to the Jupyter environment. Keep the same structure as the one provided in the repository under tutorials/tutorial_5_assemble_workflow.


## Note
This code relies on interactive Python prompts, which are not compatible with a Jupyter notebook. You therefore need to run the commands in a terminal (File -> New -> Terminal). The steps are detailed below.


## Terminal command lines
First, install the requirements. You only need to install the sparc-assemble package in the terminal (this might take a few minutes):

In [None]:
pip install sparc-assemble

Make sure the ls command returns the following structure: 
assemble_workflow.py  resources  tutorial_5_assemble_workflow.ipynb 
with resources containing the tools and example knowledge graphs.
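
If you prefer to check the layout from Python rather than with ls, a minimal sketch could look like the following. The helper name `missing_entries` is made up for this illustration; the expected entries are the three names listed above.

```python
from pathlib import Path

# Entries expected in the tutorial folder, taken from the `ls` output above.
EXPECTED = ["assemble_workflow.py", "resources", "tutorial_5_assemble_workflow.ipynb"]


def missing_entries(folder=".", expected=EXPECTED):
    """Return the expected entries that are absent from `folder` (hypothetical helper)."""
    root = Path(folder)
    return [name for name in expected if not (root / name).exists()]


missing = missing_entries()
print("Layout looks complete." if not missing else f"Missing: {missing}")
```

An empty result means the three entries are in place; it does not check the contents of the resources folder.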

Then, run the script in the terminal:

In [None]:
python assemble_workflow.py --kg_path <path_to_your_knowledge_graph> --tools_path <path_to_your_tools_folder>

The required arguments are kg_path, the path to your knowledge graph owl file, and tools_path, the path to your tools folder containing the json and cwl files for all the tools. Bear in mind that you are running the script from examples/example_assemble_workflow.
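
To make the two arguments concrete, here is a rough sketch of how a script could declare them with argparse. This is not the actual content of assemble_workflow.py: the function name `build_parser` is hypothetical, and treating `--tools_path` as optional is an assumption based on Example 2, which is run without it.

```python
import argparse


def build_parser():
    """Build a parser with the two arguments described above (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Assemble a workflow from a knowledge graph (sketch)."
    )
    parser.add_argument("--kg_path", required=True,
                        help="Path to the knowledge graph owl file.")
    # Assumption: optional, since Example 2 is run without a tools folder.
    parser.add_argument("--tools_path", default=None,
                        help="Folder containing the json and cwl files of the tools.")
    return parser


# Parse the Example 1 command line without touching sys.argv.
args = build_parser().parse_args(
    ["--kg_path", "resources/kg_example_1.owl", "--tools_path", "resources/tools"]
)
print(args.kg_path, args.tools_path)
```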

Please see the Examples section below for details on how to run the script for each example.

## Examples
**Example 1**:
The first example is based on a knowledge graph containing tools only. The tools are associated with json and cwl files (see tutorial_assemble_workflow/resources). In this example we show how to assemble a workflow that derives the parameter y_file from a dataset.


In [ ]:
python assemble_workflow.py --kg_path resources/kg_example_1.owl --tools_path resources/tools

**Example 2**:
The second example introduces the discovery functionality offered by the tool. In this case, the knowledge graph contains tools and models. The tool and model are associated with json files but not cwl files. In this example we show how to navigate the available workflows and highlight which inputs are available from a dataset and which inputs are missing to derive the parameter of interest 'vm.Membrane/V'.

In [ ]:
python assemble_workflow.py --kg_path resources/kg_example_2.owl

## Steps

**1. Enter request**

As soon as you run the command line, you will be prompted to specify which parameter you would like to derive. If you have followed the previous tutorials, you should have an idea of which outputs are available in the knowledge graph. If you do not, simply give it a try; if your attempt does not return any match, you will be shown the available choices.
Hint: for Example 1, type y_file; for Example 2, type time.

The output search is flexible and uses natural language processing to measure the similarity between your request and the available outputs. If no match is found, you will be shown the available outputs, and you can specify a new request using this information.

Note: At this stage, the request can only be a single parameter.
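
To give an intuition of what matching a request against the available outputs means, here is a small sketch. It does not reproduce the natural language processing used by sparc-assemble: it swaps in plain string similarity from Python's standard difflib module. The function name `match_request`, the cutoff value and the list of outputs are assumptions made for the illustration.

```python
from difflib import get_close_matches


def match_request(request, available_outputs, cutoff=0.6):
    """Return the available outputs that resemble `request`, best match first."""
    # Compare in lower case so that 'Y_file' and 'y_file' are treated alike.
    lowered = {name.lower(): name for name in available_outputs}
    hits = get_close_matches(request.lower(), list(lowered), n=3, cutoff=cutoff)
    return [lowered[hit] for hit in hits]


# Hypothetical list of outputs, using the parameter names from Example 1.
outputs = ["x_file", "y_file"]

matches = match_request("Y_file", outputs)
if matches:
    print("Closest outputs:", matches)
else:
    print("No match. Available outputs:", outputs)
```

When nothing resembles the request, the function returns an empty list, which is the case in which the tool shows you the available outputs.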


**2. Choose your preferred workflow**

Once you have validated your request, the knowledge graph is queried to return every possible workflow that can derive your requested parameter. You will see two nested lists. The first level corresponds to the methods that can compute your requested parameter directly. You can therefore choose which method you want to use to derive your parameter. The second level corresponds to all the possible workflows that use the method in question as the last step. These workflows let you explore the different combinations available to derive your parameter of interest and make your choice depending on input availability. The 'Input' key highlights the required inputs for each workflow, followed by a succession of inputs -> tool -> outputs (i.e., steps), with the required workflow inputs appearing in bold.

Attention: provide the number of the method of your choice first, and then the number of the workflow of your choice under the chosen method.
Hint: for Example 1, enter 1, then 2.
Example 2 stops once the options are displayed, because no cwl files are provided to create a cwl workflow file. You can see that one of the inputs can come from a dataset but two need to be provided by the user. This example showcases the discovery functionality of the tool: it shows two inputs that would require a different dataset or manual input in order to derive the parameter of interest.
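
The two-level choice described in this step can be pictured with the sketch below. The data layout and the helper names `pick` and `describe` are assumptions made for illustration, not the internal representation used by sparc-assemble. Only one workflow is listed, the one analysed in step 3, so the list shown by the tool for Example 1 will contain more entries than this.

```python
# First level: methods that compute the requested parameter directly.
# Second level: workflows that end with that method.
options = [
    {
        "method": "tool_extract_dep_var_sds",
        "workflows": [
            {
                "inputs": ["Dataset ID", "Version ID"],
                "steps": [
                    (["Dataset ID", "Version ID"], "tool_extract_indep_var_sds", ["x_file"]),
                    (["x_file"], "tool_extract_dep_var_sds", ["y_file"]),
                ],
            },
        ],
    },
]


def pick(options, method_number, workflow_number):
    """Select a workflow using the 1-based numbers typed at the prompts."""
    method = options[method_number - 1]
    return method["workflows"][workflow_number - 1]


def describe(workflow):
    """Render a workflow as 'inputs -> tool -> outputs' lines."""
    lines = ["Input: " + ", ".join(workflow["inputs"])]
    for inputs, tool, outputs in workflow["steps"]:
        lines.append(f"{', '.join(inputs)} -> {tool} -> {', '.join(outputs)}")
    return "\n".join(lines)


print(describe(pick(options, 1, 1)))
```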


**3. Provide additional information to link inputs and outputs from cwl files**

In order to assemble the workflow into a cwl file, the cwl tool files are pulled and the information they contain is read. The json descriptions of the tools are simplified contextual descriptions that give a human-readable understanding of what each tool does. The cwl files, however, embed more complex information and require some manual input to link their inputs and outputs. You are therefore prompted with several questions. The idea is, where needed, to link the input(s) and output(s) of workflow steps.

Let's analyse example 1:
The example contains two steps: 
1- Dataset ID, Version ID -> tool_extract_indep_var_sds -> x_file
2- x_file -> tool_extract_dep_var_sds -> y_file

We are first asked whether any input of step 2 needs to be linked to an output of step 1. In this case, yes: the x_file input of step 2 and the x_file output of step 1 represent the same object and should be linked.
Answers: 
Does any input of current step need to be linked to a previous output (yes/no): yes
Input_name: x_file (corresponds to the input of step 2)
Output_name: x_file (corresponds to the output of step 1)

There is no other link to create (enter 'no'). There are no duplicate inputs between the steps, so no further action is needed. If the steps had inputs in common, you could choose whether they represent the same object; if they do not, their names are indexed with the tool name.
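
To show what such a link amounts to in a cwl workflow, here is a hedged sketch that builds the steps section as a Python dictionary. In cwl, a step input is connected to the output of an earlier step by giving it the source `step_id/output_id`. The helper `link`, the cwl file names and the input names `dataset_id` and `version_id` are assumptions; the file written by sparc-assemble may be organised differently.

```python
import json


def link(steps, step_id, input_name, source_step, output_name):
    """Point `input_name` of `step_id` at `output_name` of `source_step`."""
    steps[step_id]["in"][input_name] = f"{source_step}/{output_name}"


# Steps of Example 1. File names and workflow-level input names are hypothetical.
steps = {
    "tool_extract_indep_var_sds": {
        "run": "tool_extract_indep_var_sds.cwl",
        "in": {"dataset_id": "dataset_id", "version_id": "version_id"},
        "out": ["x_file"],
    },
    "tool_extract_dep_var_sds": {
        "run": "tool_extract_dep_var_sds.cwl",
        "in": {},
        "out": ["y_file"],
    },
}

# Same answers as in the prompts above: Input_name x_file, Output_name x_file.
link(steps, "tool_extract_dep_var_sds", "x_file", "tool_extract_indep_var_sds", "x_file")
print(json.dumps(steps, indent=2))
```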


**4. Provide a workflow name and save**

Finally, provide a name for the workflow you assembled, without an extension. For example: 'example_1_workflow'.
A cwl file containing the workflow information is saved in your current working directory under workflows/. You now have a standardised description of the workflow of your choice that can derive your parameter of interest.

Only example 1 can be saved, because example 2 does not contain the cwl files for the tool and model, which are necessary to write the workflow into a cwl file.
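
Once the file is saved, a quick sanity check can confirm that it exists and resembles a cwl workflow. The sketch below is a rough text check under the assumption that the file contains the usual `cwlVersion` field and the `Workflow` class name; it is not a validator, and the function name is hypothetical.

```python
from pathlib import Path


def looks_like_cwl_workflow(path):
    """Return True if `path` exists and mentions the usual cwl workflow markers."""
    file_path = Path(path)
    if not file_path.is_file():
        return False
    text = file_path.read_text()
    return "cwlVersion" in text and "Workflow" in text


# Path built from the example name used in this step.
print(looks_like_cwl_workflow("workflows/example_1_workflow.cwl"))
```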

## Saving the workflow in SDS
The workflow can be saved in the SDS format, a standardised dataset structure that can be used to share workflows between different platforms and make the workflow FAIR. The code below creates an SDS dataset from a template and copies the workflow file into its primary folder. Run it in this jupyter notebook:

In [ ]:
!pip install sparc-me

In [ ]:
import shutil
from sparc_me import Dataset

In [ ]:
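# Name of the cwl file saved in step 4 (adjust it to the name you chose).
workflow_name = "example_1_workflow.cwl"
# Create an SDS dataset from the version 2.0.0 template and write it to ./sds.
dataset = Dataset()
dataset.load_dataset(from_template=True, version="2.0.0")
dataset.save(save_dir="./sds")
# Copy the assembled workflow into the primary folder of the SDS dataset.
shutil.copy('workflows/' + workflow_name, "./sds/primary/")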

Note: change the workflow name to the one you saved in the previous step.
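
If the copy fails because the workflow name is wrong or the primary folder is missing, a slightly more defensive variant of the copy step may help. This is a sketch using only the Python standard library; the function name `copy_workflow` and its default folder names are assumptions that mirror the paths used above.

```python
import shutil
from pathlib import Path


def copy_workflow(workflow_name, workflows_dir="workflows", sds_dir="sds"):
    """Copy a saved workflow into <sds_dir>/primary and return the new path."""
    source = Path(workflows_dir) / workflow_name
    if not source.is_file():
        raise FileNotFoundError(f"No workflow file found at {source}")
    primary = Path(sds_dir) / "primary"
    # Create the primary folder if the template did not provide it.
    primary.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(source, primary))
```

Calling `copy_workflow("example_1_workflow.cwl")` from the tutorial folder would then either return the path of the copied file or raise an error naming the missing file.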

## Running the workflow
Your workflow is now saved in the SDS format. To run it, have a look at tutorial 6.