# GeoDN Course 2: Fundamentals of Geospatial Data and Modeling - Part 2 Geospatial Foundation Models and Workflows #
> Copyright (c) 2024 International Business Machines Corporation

> This software is released under the MIT License.
> https://opensource.org/licenses/MIT

# Section 3 - GeoDN Modeling: running a workflow from the catalogue
This example notebook will walk you through the process of running a workflow using a definition already stored in the workflow catalogue.  This is the way that most GeoDN Modeling workflows should be run.  

To run a workflow from the catalogue, you only need to specify values for the parameters that workflow exposes. The process is as follows:

1) Import packages and specify GeoDN Modeling URLs.
2) Grab an authentication token using the GeoDN SDK.
3) Create a connection to the GeoDN Modeling Workflow API.
4) Browse the available workflows.
5) Select a workflow and update the workflow options.
6) Send it to the cluster to run.
7) Monitor the workflow run status.
8) Download and view the output files.
9) Check a list of past workflow runs.

## 1. Imports and setup

In [None]:
from geodn.modeling import workflow

import os
import json
from PIL import Image

## 2. Authenticate to GeoDN

To authenticate to the GeoDN services, add your username and password to the file `~/geodn-creds` in your home folder, using the format `your-username:your-password`.

Once the file is updated, use the `get_token()` function to get the authentication tokens that will later be used to access the GeoDN services. `get_token()` takes the parameters `username`, `password` and `geodn_modeling_url`: your username, your password and the URL of the GeoDN backend service to connect to, respectively.

These tokens expire after 24 hours. To also return a refresh token, pass `refresh=True` to `get_token()`.

In [None]:
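# Read "username:password" from the credentials file; this relative path
# assumes the notebook sits two folders below your home folder (~/geodn-creds)
with open(os.path.join("..", "..", "geodn-creds"), "r") as file:
    data = file.read().rstrip()
    # Split on the first colon only, so a password containing ':' stays intact
    username, password = data.split(":", 1)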

assert username and password

# Get the tokens
id_token, access_token = workflow.get_token(username, password, geodn_modeling_url=os.environ["GEODN_URL"])
assert id_token and access_token
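Because the tokens expire after 24 hours, a long-running session can fail later with an authentication error. The sketch below is a small convenience helper, not part of the GeoDN SDK, that reports how long a token has left. It assumes the tokens are standard JSON Web Tokens carrying an `exp` claim, which this notebook does not state; the function name `jwt_seconds_remaining` is hypothetical, and it does not verify the token signature.

```python
import base64
import json
import time
from typing import Optional


def jwt_seconds_remaining(token: str, now: Optional[float] = None) -> float:
    """Seconds until the token's `exp` claim; negative once it has expired.

    Assumes `token` is a standard JWT (header.payload.signature). The
    signature is NOT verified: this only reads the expiry for convenience.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("token does not look like a JWT")
    payload_b64 = parts[1]
    # JWTs use unpadded base64url, so restore the padding before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    if "exp" not in claims:
        raise ValueError("token has no 'exp' claim")
    current = time.time() if now is None else now
    return claims["exp"] - current
```

After the cell above has run, `jwt_seconds_remaining(id_token) / 3600` gives the hours left (if the assumption holds); if the result is negative, rerun the authentication cell to get fresh tokens.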

## 3. Connect to GeoDN modeling

Next, connect to the GeoDN Modeling service by passing the token to create a connection to the GeoDN Modeling APIs. This connection lets you submit workflows to the cluster, check their status, access logs and download files.

The arguments `api_url`, `core_url` and `workflow_url` determine which backend services to connect to. Here they are read from the environment variables `GEODN_URL`, `GEODN_CORE_URL` and `GEODN_WORKFLOW_URL`, which you can change if you later need to connect to a different backend service.
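If one of these environment variables is missing, `os.environ[...]` raises a bare `KeyError`. A minimal pre-flight check, written here as a hypothetical helper `missing_env_vars` and not part of the GeoDN SDK, lists every missing or empty variable at once so you can fix them before connecting.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Environment variable names read by the connection cell in this notebook
REQUIRED_VARS = ("GEODN_URL", "GEODN_CORE_URL", "GEODN_WORKFLOW_URL")


def missing_env_vars(
    names: Iterable[str] = REQUIRED_VARS,
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names that are unset or empty in `environ` (default: os.environ)."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]
```

For example, running `missing = missing_env_vars()` and raising a `RuntimeError` when `missing` is non-empty turns a later `KeyError` into a message naming every variable you still need to set.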

In [None]:
# Connect with the GeoDN Modeling APIs
geodn_modelling = workflow.GeoDN_Modeling(
    bearer_token=id_token,
    api_url=os.environ["GEODN_URL"],
    core_url=os.environ["GEODN_CORE_URL"],
    workflow_url=os.environ["GEODN_WORKFLOW_URL"],
)

## 4. Choose a workflow from the catalogue

You can query the workflow catalogue to view the workflows available, and see an example of the payload required to run them.

```python
workflows = geodn_modelling.available_workflows()
```

Alternatively, you can use the notebook UI to browse the available options.

In [None]:
geodn_modelling.available_workflows_ui()

## 5. Get and update workflow options
Once you have picked a workflow, you can retrieve its example payload and adapt it as needed. This example uses the onboarded `explain_air_pollution_v1` workflow.

First, retrieve the payload for the chosen workflow. If you selected the workflow from the dropdown above, the SDK already knows which workflow to use; otherwise, set the workflow name as an attribute, e.g. `geodn_modelling.wf_name = "explain_air_pollution_v1"`. Although `payload_to()` can write the payload to a file, by default it creates a new notebook cell directly below with the payload dictionary, ready for you to update. You can edit the values there before executing that cell, or update them later.
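If you prefer to update the payload later in code, note that reassigning a whole nested dictionary can silently drop sibling options. The sketch below shows one way to merge only the values you want to change; `deep_update` is a hypothetical helper, and the payload keys in the example are invented for illustration (the real keys come from the cell generated by `payload_to()`).

```python
import copy
from typing import Any, Dict


def deep_update(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `overrides` merged in recursively.

    Nested dicts are merged key by key, so overriding one nested option
    leaves its siblings intact; any other value replaces the original.
    """
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical payload shape, for illustration only
example_payload = {"inputs": {"region": "A", "start_date": "2023-01-01"}, "name": "demo"}
updated = deep_update(example_payload, {"inputs": {"region": "B"}})
```

Here `updated` keeps `start_date` while changing `region`, and `example_payload` itself is left unmodified, so you can always return to the original example values.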


In [None]:
geodn_modelling.wf_name = "explain_air_pollution_v1"

In [None]:
# Initialise Workflow Payload
geodn_modelling.payload_to()

## 6. Run the workflow in GeoDN Modeling
Once you have updated all the options you want, send the workflow to the GeoDN Modeling cluster by calling `submit_workflow()` with the payload (the `payload` dictionary defined in the cell generated by `payload_to()`). You should get back a response confirming that the workflow has been submitted. The response includes the `model_run_id`, the identifier for the workflow run you just started; you will use it to monitor the run's status and access its files.


In [None]:
# Check Workflow
geodn_modelling.workflow_to()

In [None]:
resp = geodn_modelling.submit_workflow(payload)
print(json.dumps(resp, indent=1))
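The next section reads the run identifier with `resp['data']['model_run_id']`. If the submission did not succeed, that lookup may fail with a terse `KeyError`; this is an assumption, since the shape of an error response is not shown here. The hypothetical helper below wraps the same lookup and, on failure, raises an error that includes the full response so you can see what came back.

```python
import json
from typing import Any, Mapping


def get_model_run_id(resp: Mapping[str, Any]) -> Any:
    """Extract the run identifier from a `submit_workflow()` response.

    Uses the resp['data']['model_run_id'] path read later in this notebook;
    if it is missing, raises a RuntimeError that shows the whole response.
    """
    try:
        return resp["data"]["model_run_id"]
    except (KeyError, TypeError):
        raise RuntimeError(
            "No model_run_id in response:\n" + json.dumps(resp, indent=1, default=str)
        ) from None
```

Calling `model_run_id = get_model_run_id(resp)` is then a drop-in alternative to the direct lookup in the next cell.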

## 7. Monitoring workflow status

To monitor the status of the workflow run, use the core API's `workflowStatus()` function (called here on the `geodn_modelling` connection). It currently reports the status of each step that is running or has run: `Pending`, `Running`, `Succeeded` or `Failed`. For a failed step it shows an error message, which will often refer you to the logs. The returned JSON (`opr` here) contains more detailed information, such as when each step ran.

In [None]:
model_run_id = resp['data']['model_run_id']

opr = geodn_modelling.workflowStatus(model_run_id=model_run_id)
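Rather than rerunning the status cell by hand, you could poll until every step reaches `Succeeded` or `Failed`. The sketch below is a generic polling loop, not an SDK feature: `wait_for_steps` is hypothetical, and `get_statuses` is a function you would supply that calls `geodn_modelling.workflowStatus(model_run_id=model_run_id)` and turns `opr` into a `{step_name: status}` dictionary. The structure of `opr` is not shown in this notebook, so that mapping is left to you. Because the status only lists steps that are or have been running, a pause between steps could look finished; if `opr` exposes an overall run state, prefer that.

```python
import time
from typing import Callable, Dict

# Step states that mean a step will not change any further
TERMINAL = {"Succeeded", "Failed"}


def wait_for_steps(
    get_statuses: Callable[[], Dict[str, str]],
    interval: float = 30.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll `get_statuses()` until every reported step is Succeeded or Failed.

    Returns the final {step_name: status} mapping, or raises TimeoutError once
    the summed sleep intervals reach `timeout` (call time is not counted).
    """
    waited = 0.0
    while True:
        statuses = get_statuses()
        # An empty mapping means no step has started yet, so keep waiting
        if statuses and all(s in TERMINAL for s in statuses.values()):
            return statuses
        if waited >= timeout:
            raise TimeoutError(f"Steps not finished after {timeout} s: {statuses}")
        sleep(interval)
        waited += interval
```

The `sleep` parameter exists so the loop can be exercised without real waiting; in the notebook you would leave it at its default.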

## 8. Download and view output files
In a GeoDN Modeling workflow, all the data files from a particular run are stored in an S3-compatible bucket. This includes the input files, the final outputs and all intermediate files. To check that the workflow ran correctly, confirm that the expected files have been generated, then download them and take a look.

In [None]:
workflow.fileDownloader(geodn_modelling, model_run_id)
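Once the files are on disk, a quick way to confirm that the expected outputs exist is to compare a list of expected relative paths with what is actually present. The helper below is a hypothetical sketch using only the standard library; this notebook does not say where `fileDownloader` saves files, so point `directory` at wherever they land, and substitute the file names your workflow actually produces.

```python
from pathlib import Path
from typing import Iterable, List, Union


def missing_outputs(directory: Union[str, Path], expected: Iterable[str]) -> List[str]:
    """Return expected relative paths (using '/') that are not files under `directory`."""
    root = Path(directory)
    present = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    return [name for name in expected if name not in present]
```

Any image outputs that are present can then be viewed with `Image.open(path)` from PIL, which is imported at the top of this notebook.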

## 9. Exploring past workflow runs
You can also retrieve a list of your past workflow runs, which you could then use to pull old results. To do this, use the `past_workflows()` function. By default it returns a dataframe, but you can get the raw JSON by passing `output='json'`. Also by default, it filters the list to your own runs.


In [None]:
gdf = geodn_modelling.past_workflows()
gdf
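To pull old results, you need the `model_run_id` of an earlier run, which you can then pass to `workflow.fileDownloader(geodn_modelling, model_run_id)` exactly as in section 8. The sketch below filters run records by workflow name; it assumes the JSON output is an iterable of dictionaries, and the field names `model_run_id` and `name` used as defaults are hypothetical, so inspect the records returned by `past_workflows(output='json')` and adjust `id_key` and `name_key` to match.

```python
from typing import Any, Dict, Iterable, List, Optional


def select_run_ids(
    runs: Iterable[Dict[str, Any]],
    name_contains: Optional[str] = None,
    id_key: str = "model_run_id",  # hypothetical field name
    name_key: str = "name",  # hypothetical field name
) -> List[Any]:
    """Return the ids of runs whose name contains `name_contains` (all if None).

    Records without an id field are skipped.
    """
    selected = []
    for run in runs:
        if id_key not in run:
            continue
        name = str(run.get(name_key, ""))
        if name_contains is None or name_contains in name:
            selected.append(run[id_key])
    return selected
```

If you stay with the default dataframe output, the equivalent is ordinary pandas filtering on whichever columns the dataframe actually contains.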