# Overview

This notebook provides instructions for using the Terrain API to automate some tasks that require a little more work in the CyVerse Discovery Environment (DE).

## Target Audience

- Users interested in automating their tasks.
- Users who want to integrate CyVerse resources into their applications.
- Users looking for an introduction to working with APIs that carries over to other CyVerse-related APIs.

## Why Use Terrain?

The DE itself provides a lot of tools for managing and analyzing data, so one question that may come up is: why bother to use Terrain directly when I can use the DE itself? This is absolutely a valid point, but there are some situations where a graphical user interface can become a bit of a hindrance. Suppose, for example, that you need to launch dozens of analyses that all use the same app with slightly different parameter values. The DE currently provides no way to do this unless the parameters being varied happen to refer to input files. Launching so many similar jobs using a GUI would be tedious and error-prone. If you're making calls to Terrain directly, you can write a short script to quickly launch all of the analyses with all of the required parameter variations.

Of course, there is a trade-off. Making calls directly to the API does take some effort; it means that you have to perform all of these tasks:

1. Authenticate to Terrain.
1. Identify the app that you want to run.
1. Obtain information about the app parameters.
1. Launch the app.
1. Send a notification when the job is done.

The DE manages all of these tasks for you. If the DE suits your needs, by all means, use it. If the tasks that you have to perform become repetitive, however, investing a little time in writing a script to automate job submission might actually save you some time in the long run.
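
As a rough sketch of what such a script might look like, the snippet below builds one analysis submission body per parameter value. The request body format is described later in this notebook. The app ID, parameter ID, output folder, and the `build_sweep_bodies` helper are all hypothetical placeholders, the base body is abbreviated, and nothing here is sent to Terrain.

```python
import copy


def build_sweep_bodies(base_body, parameter_id, values):
    """Return one request body per value, varying a single app parameter."""
    bodies = []
    for value in values:
        # Copy the base body so that each submission gets its own config map.
        body = copy.deepcopy(base_body)
        body["config"][parameter_id] = value
        body["name"] = "{0}-{1}".format(base_body["name"], value)
        bodies.append(body)
    return bodies


# Hypothetical values, for illustration only.
base_body = {
    "config": {},
    "name": "sweep-example",
    "app_id": "hypothetical-app-id",
    "system_id": "de",
    "debug": False,
    "output_dir": "/iplant/home/example-user/analyses",
    "notify": True,
}

sweep_bodies = build_sweep_bodies(base_body, "hypothetical-parameter-id", [0.1, 0.5, 0.9])
for body in sweep_bodies:
    print(body["name"], body["config"])
```

Each body in the resulting list could then be posted to the analysis submission endpoint in a loop.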

## Documentation

Terrain has two primary sources of documentation:

- Latest and Greatest: https://de.cyverse.org/terrain/docs
- Older Documentation: https://cyverse-de.github.io/api

In most cases, you'll want to use the latest documentation because some of the older documentation is out of date. The only time that the older documentation is preferable is when the newer documentation hasn't been written for an endpoint that you want to use. If you use the older documentation, the best place to look is the [endpoint index](https://cyverse-de.github.io/api/endpoint-index.html). This page includes a list of links to all of the older DE documentation.

# Prerequisites

Before actually calling Terrain, we'll have to make sure that we have all of the libraries that we need, and that we have credentials that we can use to let Terrain know who we are.

## Libraries

We'll be making extensive use of the [Requests library](https://requests.readthedocs.io/en/master/), which makes calling APIs quite simple. We'll also need to be able to prompt for a password and pretty-print some data structures. There's also one case where we'll have to serialize some JSON.

In [None]:
import getpass
import json
import pprint
import requests

## Authenticating

Terrain uses OAuth2 for most endpoints. This works well for the DE because it's already integrated with an identity provider that is capable of providing tokens. For direct API calls, however, we needed something a little more convenient. For this purpose, we created a set of endpoints dedicated to obtaining OAuth2 tokens. These endpoints use HTTP basic authentication (that is, a username and password) and, assuming the credentials are valid, return a token that can be used to call other Terrain endpoints. In Python, supporting HTTP basic authentication means that we have to prompt for a username and password. This is where the `getpass` library comes in.

In [None]:
# input() and getpass.getpass() both accept the prompt text directly.
username = input("Username: ")
# getpass hides the password as it is typed.
password = getpass.getpass("Password: ")

Now that we have the username and password, we can obtain the authentication token by calling the `/terrain/token` endpoint.

In [None]:
r = requests.get("https://de.cyverse.org/terrain/token", auth=(username, password))
r.raise_for_status()
token = r.json()['access_token']
auth_headers = {"Authorization": "Bearer " + token}
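
It can help to see what the two kinds of credentials look like in a request. The sketch below builds the `Authorization` header values by hand: the `Basic` form defined by the HTTP basic authentication scheme, which is what passing `auth=(username, password)` asks Requests to send, and the `Bearer` form used in `auth_headers`. The helper names, credentials, and token are made up for illustration.

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP basic authentication header from a username and password."""
    credentials = "{0}:{1}".format(username, password).encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}


def bearer_auth_header(token):
    """Build the bearer token header used for calls after authentication."""
    return {"Authorization": "Bearer " + token}


# Made-up credentials and token, for illustration only.
print(basic_auth_header("Aladdin", "open sesame"))
print(bearer_auth_header("example-token"))
```

Note that base64 is an encoding, not encryption, so basic authentication should only be used over HTTPS, as it is here.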

# Example 1: VICE Analysis

Launching VICE analyses in the DE is actually quite simple, but the simplicity of this task makes it an ideal first example.

## Finding the App

The first step is to find a VICE app to use. For this task, we'll use the app `rstudio-chipqc 1.22`. To launch an analysis, we first need the app ID, and the app search endpoint provides a convenient way to get it.

In [None]:
query_params = {"search": "rstudio-chipqc 1.22"}
r = requests.get("https://de.cyverse.org/terrain/apps", headers=auth_headers, params=query_params)
r.raise_for_status()
pprint.pprint(r.json())
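
The calls in this notebook all share the same base URL, so in a longer script it may be convenient to build endpoint URLs in one place. This is only a sketch: `TERRAIN_BASE_URL` and `terrain_url` are names invented here, not part of Terrain or Requests, and the app ID is a placeholder.

```python
from urllib.parse import quote

TERRAIN_BASE_URL = "https://de.cyverse.org/terrain"


def terrain_url(*segments):
    """Join path segments onto the Terrain base URL, escaping each segment."""
    escaped = [quote(str(segment), safe="") for segment in segments]
    return "/".join([TERRAIN_BASE_URL] + escaped)


print(terrain_url("token"))
print(terrain_url("apps"))
# Placeholder system ID and app ID, for illustration only.
print(terrain_url("apps", "de", "hypothetical-app-id"))
```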

## Obtaining the App Details

Armed with some information about the app, we can now obtain the full app description, which contains all of the information necessary to launch an analysis using the app. The first step in doing that is to obtain the information that we need from the app search above. For this step, we need the system ID and the app ID. The system ID refers to the system that is responsible for managing the app. Currently there are two valid system IDs: `de` and `agave`. Apps that use the system ID, `de`, are defined in and managed by the DE itself. Apps that use the system ID, `agave`, are defined in and managed by the [Tapis API](https://tapis-project.org), formerly known as Agave. Of course, the app ID refers to the app itself.

In [None]:
app_listing = r.json()["apps"][0]
system_id = app_listing["system_id"]
app_id = app_listing["id"]
print("System ID: ", system_id)
print("App ID: ", app_id)
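
Taking the first search result works when the search matches exactly one app. If a search could return several apps, a slightly more defensive approach is to pick the listing by name. The sketch below assumes that each listing carries a `name` field alongside `id` and `system_id`; the sample data and the `find_app_by_name` helper are hypothetical.

```python
def find_app_by_name(search_results, name):
    """Return the single app listing whose name matches exactly."""
    matches = [app for app in search_results.get("apps", []) if app.get("name") == name]
    if len(matches) != 1:
        raise LookupError(
            "expected exactly one app named {0!r}, found {1}".format(name, len(matches))
        )
    return matches[0]


# Hypothetical search results, shaped like the listing used above.
sample_results = {
    "apps": [
        {"id": "hypothetical-id-1", "system_id": "de", "name": "rstudio-chipqc 1.22"},
        {"id": "hypothetical-id-2", "system_id": "de", "name": "another-app"},
    ]
}

app = find_app_by_name(sample_results, "rstudio-chipqc 1.22")
print("System ID: ", app["system_id"])
print("App ID: ", app["id"])
```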

Now we can obtain the full app description.

In [None]:
url = "https://de.cyverse.org/terrain/apps/{0}/{1}".format(system_id, app_id)
r = requests.get(url, headers=auth_headers)
r.raise_for_status()
pprint.pprint(r.json())

The output from this endpoint deserves a little explanation. At the top level, we have the basic app information such as the name, ID, and description of the app. The top level also contains a list labeled `groups`. These groups provide a way to place related parameters on the same panel in the app launch window in the DE. Each group contains a list of parameters, and the parameters themselves provide the information we need to submit the job.

The primary piece of information that we're going to need from this response is the parameter ID for the input file name. We may as well grab it now.

In [None]:
parameter_id = r.json()["groups"][0]["parameters"][0]["id"]
print("Parameter ID: ", parameter_id)
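
For apps with more than one group or parameter, indexing with `[0]` may not pick the parameter you want. The sketch below walks every group and parameter and records where each parameter ID lives. It relies only on the `groups`, `parameters`, and `id` fields described above; the sample app description and the `list_parameter_ids` helper are hypothetical.

```python
def list_parameter_ids(app_description):
    """Return (group index, parameter index, parameter ID) for every parameter."""
    found = []
    for group_index, group in enumerate(app_description.get("groups", [])):
        for param_index, parameter in enumerate(group.get("parameters", [])):
            found.append((group_index, param_index, parameter["id"]))
    return found


# Hypothetical app description containing two groups.
sample_description = {
    "groups": [
        {"parameters": [{"id": "hypothetical-input-param"}]},
        {"parameters": [{"id": "hypothetical-param-a"}, {"id": "hypothetical-param-b"}]},
    ]
}

parameter_locations = list_parameter_ids(sample_description)
for location in parameter_locations:
    print(location)
```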

## Building the Analysis Submission Request Body

The analysis submission endpoint is the first endpoint we've encountered that takes a request body, and this request body needs to be formatted correctly for the analysis submission to succeed. The request body looks something like this:

``` json
{
    "requirements": [
        {
            "min_cpu_cores": 1,
            "min_memory_limit": 2147483648,
            "min_disk_space": 549755813888,
            "step_number": 0
        }
    ],
    "config": {},
    "name": "string",
    "app_id": "string",
    "system_id": "string",
    "debug": false,
    "output_dir": "string",
    "notify": true
}
```

Not all of the available fields are listed in the example JSON above, but I did include all of the required fields and one optional field that I wanted to highlight.

| Parameter Name | Description                                                                              |
| -------------- | ---------------------------------------------------------------------------------------- |
| config         | A map from parameter ID to parameter value.                                              |
| name           | The name of the analysis.                                                                |
| app_id         | The app ID obtained from the app search above.                                           |
| system_id      | The system ID obtained from the app search above.                                        |
| debug          | This parameter can be used to enable debugging, which isn't necessary.                   |
| output_dir     | The path to the folder in the data store where the output files should be placed.        |
| notify         | This parameter can be used to enable or disable job status update notifications.         |
| requirements   | This parameter is used to specify execution system requirements.                         |

So now we have to plug in the values. The example data that we're going to use is in `/iplant/home/shared/workshop_material/terrain_intro/example-data`, so that is the path that we need to use for the input parameter value. In addition to that, it's fairly common for a VICE analysis to require more memory than the default for the app. The reason for this is that VICE apps are interactive; the amount of memory required depends largely upon what is being done with the app. When you're specifying resource requirements, you can optionally specify different requirements for different steps in the analysis. VICE apps always contain exactly one step, so we only have to submit one set of resource requirements. For this simple example, we don't need too much memory, so we're going to ask for 4 GiB (that is, 4 * 2^30 bytes) of RAM.
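
Byte counts of this size are easy to mistype, so it can be worth letting Python do the arithmetic. The `gib` helper below is a small convenience invented for this illustration; it also confirms what the values in the example JSON above correspond to.

```python
GIB = 2 ** 30  # number of bytes in one gibibyte


def gib(count):
    """Convert a number of gibibytes to bytes."""
    return count * GIB


print(gib(4))    # 4294967296, the memory request used in this example
print(gib(2))    # 2147483648, the min_memory_limit in the example JSON
print(gib(512))  # 549755813888, the min_disk_space in the example JSON
```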

Keep in mind that the request body below is written in Python rather than JSON, so it will look slightly different from the JSON request body listed above. The `requests` library will serialize this Python dictionary to JSON for us before sending the request to Terrain.

In [None]:
request_body = {
    "config": {
        parameter_id: "/iplant/home/shared/workshop_material/terrain_intro/example-data"
    },
    "name": "terrain-automation-vice",
    "app_id": app_id,
    "system_id": system_id,
    "debug": False,
    "output_dir": "/iplant/home/" + username + "/analyses",
    "notify": True,
    "requirements": [
        {
            "min_memory_limit": 4 * 2 ** 30,
            "step_number": 0
        }
    ]
}
pprint.pprint(request_body)
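
To see how the Python form differs from the JSON form, we can serialize a small body ourselves with the standard `json` module. This is only an illustration using a cut-down body with made-up contents; when we pass `json=request_body` to Requests, the serialization is done for us.

```python
import json

# A cut-down body, for illustration only.
sample_body = {
    "debug": False,
    "notify": True,
    "requirements": [{"min_memory_limit": 4 * 2 ** 30, "step_number": 0}],
}

serialized = json.dumps(sample_body)
print(serialized)

# Parsing the JSON text gives back an equal Python dictionary.
print(json.loads(serialized) == sample_body)
```

Python's `False` and `True` become JSON's `false` and `true`, and the memory expression is evaluated to a plain number before it is written out.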

Now we can finally submit the analysis.

In [None]:
r = requests.post("https://de.cyverse.org/terrain/analyses", headers=auth_headers, json=request_body)
r.raise_for_status()
pprint.pprint(r.json())

Finally, for a VICE analysis, we need to obtain the URL used to access the running analysis. We can use the analysis listing endpoint to get this information. First, we need to get the analysis ID from the response body of the previous step. With that information we can build and serialize a filter to place in a query parameter and call the endpoint.

In [None]:
query_params = {"filter": json.dumps([{"field":"id","value":r.json()["id"]}])}
r = requests.get("https://de.cyverse.org/terrain/analyses", headers=auth_headers, params=query_params)
r.raise_for_status()
pprint.pprint(r.json())
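
The filter is a JSON document carried inside a query parameter, which is why it has to be serialized first and then URL-encoded. Requests handles the URL-encoding when we pass `params`, but the sketch below performs both steps with the standard library so that the result is visible. The analysis ID is a placeholder, and the exact encoding Requests produces may differ in small details.

```python
import json
from urllib.parse import parse_qs, urlencode

# Placeholder analysis ID, for illustration only.
analysis_id = "hypothetical-analysis-id"

# Step 1: serialize the filter to JSON text.
filter_value = json.dumps([{"field": "id", "value": analysis_id}])

# Step 2: URL-encode it as the value of the "filter" query parameter.
query_string = urlencode({"filter": filter_value})
print(query_string)

# Decoding reverses both steps and recovers the original filter.
decoded_filter = json.loads(parse_qs(query_string)["filter"][0])
print(decoded_filter)
```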