# Example TDS Flow

This notebook outlines an example storage and recall flow for the Terarium Data Service (TDS).

### Contents:

1. [Users and Projects](#Users-and-Projects)
2. [Publications](#Publications)
3. [Models](#Models)
4. [Model Configurations](#Model-Configurations)
5. [Datasets](#Datasets)
6. [Dataset file upload/download](#Dataset-file-upload/download)
7. [Workflows](#Workflows)
8. [Simulations](#Simulations)

In [1]:
import requests
import json
from copy import deepcopy
import pandas as pd
url = 'http://localhost:8001'

## Users and Projects

Create a new user and get its `id`:

In [2]:
username = 'Brandon Rose'
user = {'name': username,
        'email': 'brandon@jataware.com',
        'org': 'Jataware',
        'is_registered': True}

resp = requests.post(f"{url}/persons", json=user)
user_id = resp.json()['id']
resp.json()

{'name': 'Brandon Rose',
 'email': 'brandon@jataware.com',
 'org': 'Jataware',
 'website': None,
 'is_registered': True,
 'id': 2}

Note that we now have an `id` for the user.

Let's create a project:

In [3]:
project =   {
    "active": True,
    "name": "BRose Project",
    "description": "Brandons first project in TDS!",
    "username": username
  }

resp = requests.post(f"{url}/projects", json=project)
project_id = resp.json()['id']
resp.json()

{'id': 2}

Note that we now have an `id` for the project.

## Publications

Now let's add a publication:

In [4]:
publication =   {
  "id": 100,
  "xdd_uri": "some_xdd_uri",
  "title": "Some Paper"
}

resp = requests.post(f"{url}/external/publications", json=publication)
publication_id = resp.json()['id']
resp.json()

{'id': 100}

Let's add this publication as an asset to our project:

In [5]:
resource_type='publications'
resource_id=publication_id

resp = requests.post(f"{url}/projects/{project_id}/assets/{resource_type}/{resource_id}")
resp.json()

{'id': 112}
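Every asset type in this notebook is attached through the same `/projects/{project_id}/assets/{resource_type}/{resource_id}` route. A small helper can build that URL and catch a mistyped type name before the request is sent. This is a sketch: `asset_path` and `ASSET_TYPES` are hypothetical names, and the set of types is taken from the keys of the project-asset listing shown in this notebook, not from TDS documentation.

```python
# Asset types as they appear as keys in the project-asset listing in this notebook.
# Whether the attach route accepts exactly these names is an assumption.
ASSET_TYPES = {
    "datasets",
    "models",
    "model_configurations",
    "publications",
    "simulations",
    "workflows",
}


def asset_path(base_url: str, project_id: int, resource_type: str, resource_id) -> str:
    """Build the URL used to attach an existing asset to a project."""
    if resource_type not in ASSET_TYPES:
        raise ValueError(f"unknown resource type: {resource_type!r}")
    # Strip a trailing slash so the path never contains '//'.
    return f"{base_url.rstrip('/')}/projects/{project_id}/assets/{resource_type}/{resource_id}"


print(asset_path("http://localhost:8001", 2, "publications", 100))
# prints http://localhost:8001/projects/2/assets/publications/100
```

In the notebook, `requests.post(asset_path(url, project_id, 'models', model_id))` would replace the hand-built f-string.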

Let's check that the asset is attached to the project by listing the project's assets:

In [6]:
resp = requests.get(f"{url}/projects/{project_id}/assets")
resp.json()

{'datasets': [],
 'models': [],
 'model_configurations': [],
 'publications': [{'id': 100,
   'xdd_uri': 'some_xdd_uri',
   'title': 'Some Paper'}],
 'simulations': [],
 'workflows': []}
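Rather than reading the listing by eye, you can check it programmatically. A minimal sketch follows; `asset_ids` and `has_asset` are hypothetical helpers that only assume the listing maps each asset type to a list of objects carrying an `id` field, as in the response above.

```python
def asset_ids(assets: dict) -> dict:
    """Map each asset type in a project-asset listing to the ids it contains."""
    return {kind: [item["id"] for item in items] for kind, items in assets.items()}


def has_asset(assets: dict, resource_type: str, resource_id) -> bool:
    """Return True if resource_id is listed under resource_type."""
    return resource_id in asset_ids(assets).get(resource_type, [])


# The listing returned above, copied verbatim.
listing = {
    "datasets": [],
    "models": [],
    "model_configurations": [],
    "publications": [{"id": 100, "xdd_uri": "some_xdd_uri", "title": "Some Paper"}],
    "simulations": [],
    "workflows": [],
}

print(has_asset(listing, "publications", 100))  # True
print(has_asset(listing, "models", 100))        # False
```

After attaching an asset, `has_asset(resp.json(), resource_type, resource_id)` on the listing response gives a yes/no answer.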

## Models
Let's create a model and add it as an asset to our project.

In [7]:
# Load the example SIR model definition from disk
with open('sir.json', 'r') as f:
    model = json.load(f)
print(f"{model['name']}, v{model['model_version']}")

SIR Model, v0.1


In [8]:
resp = requests.post(f"{url}/models", json=model)
model_id = resp.json()['id']
resp.json()

{'id': '75d5f74c-f048-4c6e-93ae-04bf15f942f5'}

We can now add this model as an asset to our project:

In [9]:
resource_type='models'
resource_id=model_id

resp = requests.post(f"{url}/projects/{project_id}/assets/{resource_type}/{resource_id}")
resp.json()

{'id': 113}

Let's confirm the asset was added:

In [10]:
resp = requests.get(f"{url}/projects/{project_id}/assets")
resp.json()

{'datasets': [],
 'models': [],
 'model_configurations': [],
 'publications': [{'id': 100,
   'xdd_uri': 'some_xdd_uri',
   'title': 'Some Paper'}],
 'simulations': [],
 'workflows': []}

Let's list our model descriptions:

In [11]:
resp = requests.get(f"{url}/models/descriptions")
resp.json()

[]

Let's test out a search for this model as well:

In [12]:
query = {
    "match": {
      "description": "Micah"
    }
}

resp = requests.post(f"{url}/models/search", json=query)
resp.json()

[]

Let's now create a provenance link recording that the model was extracted from the publication:

In [13]:
provenance = {
  "relation_type": "EXTRACTED_FROM",
  "left": model_id,
  "left_type": "Model",
  "right": publication_id,
  "right_type": "Publication",
  "user_id": user_id
}

resp = requests.post(f"{url}/provenance", json=provenance)
resp.json()

{'id': 392}
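The provenance payload always carries the same six fields. Wrapping it in a dataclass makes a missing field fail when the object is constructed rather than at the server. `Provenance` is a hypothetical helper, the values below are copied from the cells above, and only the `EXTRACTED_FROM` relation type appears in this notebook, so no other relation types are assumed.

```python
from dataclasses import asdict, dataclass
from typing import Union

Id = Union[int, str]  # the notebook uses integer and UUID-string ids


@dataclass
class Provenance:
    """One provenance edge: `left` is related to `right` via `relation_type`."""
    relation_type: str
    left: Id
    left_type: str
    right: Id
    right_type: str
    user_id: int

    def to_json(self) -> dict:
        # Field names match the payload keys used in the cell above.
        return asdict(self)


edge = Provenance(
    relation_type="EXTRACTED_FROM",
    left="75d5f74c-f048-4c6e-93ae-04bf15f942f5",
    left_type="Model",
    right=100,
    right_type="Publication",
    user_id=2,
)
print(edge.to_json())
```

The result can be passed directly as `json=edge.to_json()` to the `/provenance` POST.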

## Model Configurations

Let's edit the initial value of the susceptible population from 1000 to 2000 and save this as a new model configuration.

In [14]:
model_config = deepcopy(model)
model_config['semantics']['ode']['parameters'][2]

{'id': 'S0',
 'description': 'Total susceptible population at timestep 0',
 'value': 1000}

In [15]:
model_config['semantics']['ode']['parameters'][2]['value'] = 2000
print(model_config['semantics']['ode']['parameters'][2]['value'])

2000
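Indexing `parameters[2]` relies on `S0` being the third parameter in `sir.json`. Looking the parameter up by its `id` is more robust to reordering. In this sketch, `with_parameter` is a hypothetical helper and `toy_model` is a stand-in that keeps only the `semantics.ode.parameters` nesting used here; its `beta` and `gamma` entries are illustrative and not taken from `sir.json`.

```python
from copy import deepcopy


def with_parameter(model: dict, param_id: str, value) -> dict:
    """Return a deep copy of `model` with ODE parameter `param_id` set to `value`.

    The parameter is found by its `id`, not by its position in the list.
    """
    updated = deepcopy(model)  # leave the caller's model untouched
    for param in updated["semantics"]["ode"]["parameters"]:
        if param["id"] == param_id:
            param["value"] = value
            return updated
    raise KeyError(f"no ODE parameter with id {param_id!r}")


# Minimal stand-in for sir.json (hypothetical values except S0).
toy_model = {"semantics": {"ode": {"parameters": [
    {"id": "beta", "value": 0.1},
    {"id": "gamma", "value": 0.05},
    {"id": "S0", "value": 1000},
]}}}

config_model = with_parameter(toy_model, "S0", 2000)
print(config_model["semantics"]["ode"]["parameters"][2]["value"])  # 2000
print(toy_model["semantics"]["ode"]["parameters"][2]["value"])     # 1000
```

Against the real model this would be `model_config = with_parameter(model, 'S0', 2000)`.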


In [16]:
config = {
    "model_id": model_id,
    "name": "SIR example config",
    "description": "Increased susceptible population to 2000 relative to baseline",
    "configuration": model_config
}

resp = requests.post(f"{url}/model_configurations", json=config)
model_config_id = resp.json()['id']
resp.json()

{'id': '0e79c1d1-dcd6-4fb7-8296-3b5d375284fd'}

Let's ensure that the model configuration is correctly attached to the model:

In [17]:
resp = requests.get(f"{url}/models/{model_id}/model_configurations")
resp.json()

[]

## Datasets

Let's add an example dataset.

In [18]:
df = pd.read_csv('example.csv')
df.head()

|   | date | I | R | D | V | H | I_0-9 | I_10-19 | I_20-29 | I_30-39 | ... | N_10-19 | N_20-29 | N_30-39 | N_40-49 | N_50-59 | N_60-69 | N_70-79 | N_80-89 | N_90-99 | N_100+ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-12-13 | 3003438 | 13294197 | 253207 | 30817 | 0 | 75981.0 | 176278.0 | 299356.0 | 271365.0 | ... | 41765338 | 45180252 | 45598625 | 40561654 | 42008241 | 39723180 | 25830576 | 10732464 | 2704216 | 98160 |
| 1 | 2020-12-14 | 3066564 | 13444501 | 254791 | 35355 | 0 | 76432.0 | 176644.0 | 297050.0 | 270940.0 | ... | 41765338 | 45180252 | 45598625 | 40561654 | 42008241 | 39723180 | 25830576 | 10732464 | 2704216 | 98160 |
| 2 | 2020-12-15 | 3074815 | 13635500 | 257164 | 81231 | 0 | 77189.0 | 177292.0 | 296908.0 | 271442.0 | ... | 41765338 | 45180252 | 45598625 | 40561654 | 42008241 | 39723180 | 25830576 | 10732464 | 2704216 | 98160 |
| 3 | 2020-12-16 | 3144789 | 13842160 | 259933 | 236049 | 0 | 76929.0 | 176959.0 | 293334.0 | 269014.0 | ... | 41765338 | 45180252 | 45598625 | 40561654 | 42008241 | 39723180 | 25830576 | 10732464 | 2704216 | 98160 |
| 4 | 2020-12-17 | 3178394 | 14056574 | 262637 | 502932 | 0 | 78317.0 | 179410.0 | 297305.0 | 272638.0 | ... | 41765338 | 45180252 | 45598625 | 40561654 | 42008241 | 39723180 | 25830576 | 10732464 | 2704216 | 98160 |


In [19]:
dataset = {
  "username": username,
  "name": "CDC COVID-19 Vaccination and Case Trends by Age Group",
  "description": "CDC COVID-19 Vaccination and Case Trends by Age Group",
  "file_names": [
    "example.csv"
  ],
  "source": "https://github.com/DARPA-ASKEM/experiments/blob/main/thin-thread-examples/milestone_6month/evaluation/ta1/usa-IRDVHN_age.csv",
  }

In [20]:
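# Describe each CSV column for TDS. For simplicity every column,
# including `date`, is annotated as a float here.
columns = []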
for c in df.columns:
    col = {
      "name": c,
      "data_type": "float",
      "annotations": {},
      "metadata": {},
      "grounding": {}
    }
    columns.append(col)

dataset['columns'] = columns
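Annotating every column as `float` is a simplification; `date`, for instance, is not numeric. The sketch below infers a type per column from the CSV text using only the standard library. The helper names and the `int`, `date`, and `str` labels are assumptions: this notebook only shows `float`, so check which `data_type` values TDS accepts before relying on them.

```python
import csv
import io
from datetime import date


def infer_type(values) -> str:
    """Guess a column type from its string values: int, float, date, or str."""
    def parses_all(parse) -> bool:
        try:
            for v in values:
                parse(v)
            return True
        except ValueError:
            return False

    # Try the narrowest type first.
    if parses_all(int):
        return "int"
    if parses_all(float):
        return "float"
    if parses_all(date.fromisoformat):
        return "date"
    return "str"


def build_columns(csv_text: str) -> list:
    """Build column entries shaped like the ones above, with an inferred data_type."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    return [
        {
            "name": name,
            "data_type": infer_type([row[i] for row in body]),
            "annotations": {},
            "metadata": {},
            "grounding": {},
        }
        for i, name in enumerate(header)
    ]


sample = "date,I,I_0-9\n2020-12-13,3003438,75981.0\n2020-12-14,3066564,76432.0\n"
for col in build_columns(sample):
    print(col["name"], col["data_type"])
# prints: date date / I int / I_0-9 float (one per line)
```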

In [21]:
resp = requests.post(f"{url}/datasets", json=dataset)
dataset_id = resp.json()['id']
resp.json()

{'id': '86e0532b-3cc8-495f-be56-8f9a682caabb'}

Let's add this dataset as an asset to our project:

In [22]:
resource_type='datasets'
resource_id=dataset_id

resp = requests.post(f"{url}/projects/{project_id}/assets/{resource_type}/{resource_id}")
resp.json()

{'id': 114}

In [23]:
resp = requests.get(f"{url}/projects/{project_id}/assets")
resp.json()

{'datasets': [],
 'models': [],
 'model_configurations': [],
 'publications': [{'id': 100,
   'xdd_uri': 'some_xdd_uri',
   'title': 'Some Paper'}],
 'simulations': [],
 'workflows': []}

## Dataset file upload/download

Let's upload the file associated with the dataset. First we need a pre-signed URL for the upload:

In [24]:
query = {'filename': dataset['file_names'][0]}
resp = requests.get(f"{url}/datasets/{dataset_id}/upload-url", params=query)
upload_url = resp.json()['url']
resp.json()

{'url': 'http://minio:9000/askem-staging-data-service/datasets/86e0532b-3cc8-495f-be56-8f9a682caabb/example.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=miniouser%2F20230608%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230608T214832Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=8faf9b1572e844d09521adc7d22b6d59f8e100f1eaf4f8b6c4495d360dd54c85',
 'method': 'PUT'}

Because a pre-signed URL must be used exactly as issued, and its host is `minio` (the name of the MinIO container on the Docker network), you have two options:

- send the `PUT` request from inside the Docker network, or
- add `127.0.0.1 minio` to `/etc/hosts` so that you can reach `minio` directly from your machine.

> Note: this only matters for local development. In production we'll be using S3, so the container hostname won't be an issue.
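You can inspect a pre-signed URL to see why it has to be used exactly as issued. In AWS Signature Version 4 query signing, `X-Amz-Date` is the signing time, `X-Amz-Expires` is how many seconds the URL stays valid, and `X-Amz-SignedHeaders` lists the headers covered by the signature. Since `host` is signed, the request's `Host` header must match the one used at signing (`minio:9000` here), so rewriting the URL to point at `localhost` would invalidate it. `presigned_info` is a hypothetical helper; the URL is the dataset upload URL returned above.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_info(url: str) -> dict:
    """Extract host, signed headers, and expiry time from a SigV4 pre-signed URL."""
    parts = urlsplit(url)
    query = {k: v[0] for k, v in parse_qs(parts.query).items()}
    # X-Amz-Date uses the compact ISO 8601 basic format, always in UTC.
    signed_at = datetime.strptime(query["X-Amz-Date"], "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    expires_at = signed_at + timedelta(seconds=int(query["X-Amz-Expires"]))
    return {
        "host": parts.netloc,
        "signed_headers": query["X-Amz-SignedHeaders"].split(";"),
        "expires_at": expires_at,
    }


info = presigned_info(
    "http://minio:9000/askem-staging-data-service/datasets/"
    "86e0532b-3cc8-495f-be56-8f9a682caabb/example.csv"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Credential=miniouser%2F20230608%2Fus-east-1%2Fs3%2Faws4_request"
    "&X-Amz-Date=20230608T214832Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host"
    "&X-Amz-Signature=8faf9b1572e844d09521adc7d22b6d59f8e100f1eaf4f8b6c4495d360dd54c85"
)
print(info["host"])            # minio:9000
print(info["signed_headers"])  # ['host']
print(info["expires_at"])      # 2023-06-08 22:48:32+00:00
```

With `X-Amz-Expires=3600`, the URL stops working one hour after it was issued, so request a fresh one if an upload fails long after the cell ran.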

In [25]:
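# PUT the raw file bytes to the pre-signed URL; raise if storage rejects the upload
with open('example.csv', 'rb') as file:
    resp = requests.put(upload_url, data=file)
resp.raise_for_status()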

We can now download the file by requesting a pre-signed URL for download:

In [26]:
query = {'filename': dataset['file_names'][0]}
resp = requests.get(f"{url}/datasets/{dataset_id}/download-url", params=query)
download_url = resp.json()['url']
resp.json()

{'url': 'http://minio:9000/askem-staging-data-service/datasets/86e0532b-3cc8-495f-be56-8f9a682caabb/example.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=miniouser%2F20230608%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230608T214832Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=c451843771d42a7fc44caedf465befb1682f531751ff83ddde6047cc45332972',
 'method': 'GET'}

In [27]:
resp = requests.get(download_url)
resp.text[:1000]

'date,I,R,D,V,H,I_0-9,I_10-19,I_20-29,I_30-39,I_40-49,I_50-59,I_60-69,I_70-79,N_0-9,N_10-19,N_20-29,N_30-39,N_40-49,N_50-59,N_60-69,N_70-79,N_80-89,N_90-99,N_100+\n2020-12-13,3003438,13294197,253207,30817,0,75981.0,176278.0,299356.0,271365.0,248207.0,238680.0,170763.0,94649.0,40795692,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160\n2020-12-14,3066564,13444501,254791,35355,0,76432.0,176644.0,297050.0,270940.0,248120.0,239206.0,171200.0,95561.0,40795692,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160\n2020-12-15,3074815,13635500,257164,81231,0,77189.0,177292.0,296908.0,271442.0,248301.0,239795.0,172448.0,96284.0,40795692,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160\n2020-12-16,3144789,13842160,259933,236049,0,76929.0,176959.0,293334.0,269014.0,247213.0,238855.0,171159.0,95813.0,40795692,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160\
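Printing the first 1000 characters only spot-checks the download. To confirm the whole file round-tripped, compare the downloaded bytes with the local file. This is a minimal sketch with hypothetical helper names; a SHA-256 digest is used so the value can also be logged and compared later without keeping both copies around.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def same_content(local: bytes, downloaded: bytes) -> bool:
    """True if the downloaded object is byte-for-byte identical to the local file."""
    # A length mismatch settles it without hashing.
    if len(local) != len(downloaded):
        return False
    return sha256_hex(local) == sha256_hex(downloaded)


local = b"date,I,R\n2020-12-13,3003438,13294197\n"
print(same_content(local, local))       # True
print(same_content(local, local[:10]))  # False
```

In the notebook, pass `open('example.csv', 'rb').read()` and `requests.get(download_url).content` as the two arguments.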

## Workflows
Here we create an example workflow in TDS and add it as an asset to the project.

In [28]:
workflow = {
  "name": "Test Workflow",
  "description": "This is the description",
  "transform": {
    "x": 127.0002,
    "y": 17.1284,
    "z": 2.53
  },
  "nodes": [
    {
      "model": {
        "id": model_id,
        "configurations": [
          {
            "id": model_config_id
          }
        ]
      }
    }
  ],
  "edges": []
}
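The workflow payload nests model and configuration ids inside each node. The sketch below builds that structure from plain ids; `model_node` and `make_workflow` are hypothetical helpers that mirror the keys of the example above and add no fields beyond it.

```python
def model_node(model_id: str, config_ids: list) -> dict:
    """Build a workflow node that references a model and some of its configurations."""
    return {"model": {"id": model_id, "configurations": [{"id": c} for c in config_ids]}}


def make_workflow(name: str, description: str, transform: dict, nodes: list, edges=None) -> dict:
    """Assemble a workflow payload with the same top-level keys as the example above."""
    return {
        "name": name,
        "description": description,
        "transform": transform,
        "nodes": nodes,
        "edges": edges if edges is not None else [],
    }


wf = make_workflow(
    "Test Workflow",
    "This is the description",
    {"x": 127.0002, "y": 17.1284, "z": 2.53},
    [model_node("model-id", ["config-id"])],  # placeholder ids for illustration
)
print(wf["nodes"])
```

With the ids from this notebook, `make_workflow(..., [model_node(model_id, [model_config_id])])` reproduces the `workflow` dict above.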

In [29]:
resp = requests.post(f"{url}/workflows", json=workflow)
workflow_id = resp.json()['id']
resp.json()

{'id': 'a2409c54-7b70-4ff2-b9e3-a2f4b177e1c7'}

Let's add this workflow as an asset to our project:

In [30]:
resource_type='workflows'
resource_id=workflow_id

resp = requests.post(f"{url}/projects/{project_id}/assets/{resource_type}/{resource_id}")
resp.json()

{'id': 115}

## Simulations
Let's create a simulation, update it as it progresses from queued to complete, and store its result file(s).

In [31]:
simulation = {
  "id": "sim-123",
  "execution_payload": {
    "engine": "ciemss",
    "model_config_id": "ba8da8d4-047d-11ee-be56",
    "timespan": {
      "start_epoch": 1672531200,
      "end_epoch": 1703980800,
      "tstep_seconds": 86400
    },
    "num_samples": 100,
    "extra": {}
  },
  "result_files": [],
  "type": "simulation",
  "status": "queued",
  "engine": "ciemss",
  "workflow_id": workflow_id,
  "user_id": user_id,
  "project_id": project_id
}
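The `timespan` values appear to be Unix epoch seconds, and a `tstep_seconds` of 86400 is one day. Converting them makes the simulated window explicit: 2023-01-01 to 2023-12-31 in 364 daily steps. `describe_timespan` is a hypothetical helper, and whether the engine also emits a point at `end_epoch` is not stated here.

```python
from datetime import datetime, timezone


def describe_timespan(start_epoch: int, end_epoch: int, tstep_seconds: int) -> dict:
    """Convert a simulation timespan (Unix seconds) into UTC datetimes and a step count."""
    span = end_epoch - start_epoch
    if span < 0:
        raise ValueError("end_epoch must not be earlier than start_epoch")
    return {
        "start": datetime.fromtimestamp(start_epoch, tz=timezone.utc),
        "end": datetime.fromtimestamp(end_epoch, tz=timezone.utc),
        "steps": span // tstep_seconds,  # number of whole steps between start and end
    }


ts = describe_timespan(1672531200, 1703980800, 86400)
print(ts["start"].date(), ts["end"].date(), ts["steps"])
# prints 2023-01-01 2023-12-31 364
```

The arithmetic: (1703980800 - 1672531200) / 86400 = 31449600 / 86400 = 364.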

In [32]:
resp = requests.post(f"{url}/simulations", json=simulation)
simulation_id = resp.json()['id']
resp.json()

{'id': 'sim-123'}

Let's take a look at this simulation:

In [33]:
requests.get(f"{url}/simulations/{simulation_id}").json()

{'id': 'sim-123',
 'name': None,
 'description': None,
 'timestamp': '2023-06-08T21:48:33.048333',
 'engine': 'ciemss',
 'type': 'simulation',
 'status': 'queued',
 'execution_payload': {'engine': 'ciemss',
  'model_config_id': 'ba8da8d4-047d-11ee-be56',
  'timespan': {'start_epoch': 1672531200,
   'end_epoch': 1703980800,
   'tstep_seconds': 86400},
  'num_samples': 100,
  'extra': {}},
 'start_time': None,
 'completed_time': None,
 'workflow_id': 'a2409c54-7b70-4ff2-b9e3-a2f4b177e1c7',
 'user_id': 2,
 'project_id': 2,
 'result_files': []}

Note that `start_time` and `completed_time` are `None` and `result_files` is empty, because the run is only _queued_ in the Simulation Service.

Once the Simulation Service starts running the simulation, it should update the `status` to `running` and set `start_time` to the time the run actually started. Let's mock that now:

In [34]:
import time

simulation['start_time'] = int(time.time())
simulation['status'] = 'running'

resp = requests.put(f"{url}/simulations/{simulation_id}", json=simulation)
simulation_id = resp.json()['id']
resp.json()

{'id': 'sim-123'}

Let's check the updated simulation:

In [35]:
requests.get(f"{url}/simulations/{simulation_id}").json()

{'id': 'sim-123',
 'name': None,
 'description': None,
 'timestamp': '2023-06-08T21:48:33.148592',
 'engine': 'ciemss',
 'type': 'simulation',
 'status': 'running',
 'execution_payload': {'engine': 'ciemss',
  'model_config_id': 'ba8da8d4-047d-11ee-be56',
  'timespan': {'start_epoch': 1672531200,
   'end_epoch': 1703980800,
   'tstep_seconds': 86400},
  'num_samples': 100,
  'extra': {}},
 'start_time': '2023-06-08T21:48:33+00:00',
 'completed_time': None,
 'workflow_id': 'a2409c54-7b70-4ff2-b9e3-a2f4b177e1c7',
 'user_id': 2,
 'project_id': 2,
 'result_files': []}

Now let's pretend the Simulation Service has finished the run and produced a result file called `result.csv`:

In [36]:
simulation['completed_time'] = int(time.time())
simulation['status'] = 'complete'
simulation['result_files'] = ['result.csv']

resp = requests.put(f"{url}/simulations/{simulation_id}", json=simulation)
simulation_id = resp.json()['id']
resp.json()

{'id': 'sim-123'}

In [37]:
requests.get(f"{url}/simulations/{simulation_id}").json()

{'id': 'sim-123',
 'name': None,
 'description': None,
 'timestamp': '2023-06-08T21:48:33.223895',
 'engine': 'ciemss',
 'type': 'simulation',
 'status': 'complete',
 'execution_payload': {'engine': 'ciemss',
  'model_config_id': 'ba8da8d4-047d-11ee-be56',
  'timespan': {'start_epoch': 1672531200,
   'end_epoch': 1703980800,
   'tstep_seconds': 86400},
  'num_samples': 100,
  'extra': {}},
 'start_time': '2023-06-08T21:48:33+00:00',
 'completed_time': '2023-06-08T21:48:33+00:00',
 'workflow_id': 'a2409c54-7b70-4ff2-b9e3-a2f4b177e1c7',
 'user_id': 2,
 'project_id': 2,
 'result_files': ['result.csv']}
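The queued, running, complete sequence mocked above can be captured in a small helper that stamps the matching time field at each step. This is a sketch: `advance` and `NEXT_STATUS` are hypothetical, times are Unix epoch seconds as sent in the cells above, and only the three statuses shown in this notebook are modeled (no failure or cancellation states are assumed).

```python
import time

# Transitions observed in this notebook only.
NEXT_STATUS = {"queued": "running", "running": "complete"}


def advance(simulation: dict, now=None) -> dict:
    """Return a copy of `simulation` moved to its next status, with the time stamped."""
    current = simulation["status"]
    if current not in NEXT_STATUS:
        raise ValueError(f"no transition defined from status {current!r}")
    stamp = int(time.time()) if now is None else now
    updated = dict(simulation, status=NEXT_STATUS[current])  # shallow copy
    if updated["status"] == "running":
        updated["start_time"] = stamp
    else:
        updated["completed_time"] = stamp
    return updated


sim = {"id": "sim-123", "status": "queued", "start_time": None, "completed_time": None}
sim = advance(sim, now=100)
sim = advance(sim, now=200)
print(sim["status"], sim["start_time"], sim["completed_time"])  # complete 100 200
```

In the notebook, `simulation = advance(simulation)` followed by the same `requests.put(...)` call would replace the manual field edits.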

Uploading and downloading simulation results follows the same pre-signed URL pattern as for datasets.

Let's upload the result file `result.csv` to S3:

In [38]:
query = {'filename': simulation['result_files'][0]}
resp = requests.get(f"{url}/simulations/{simulation_id}/upload-url", params=query)
upload_url = resp.json()['url']
resp.json()

{'url': 'http://minio:9000/askem-staging-data-service/simulations/sim-123/result.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=miniouser%2F20230608%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230608T214833Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=f8350f5b8d4e5dbd3f330cc107884c701e4318e2f55250de5ba74d0f9c609aa4',
 'method': 'PUT'}

In [39]:
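# PUT the result file to the pre-signed URL; raise if storage rejects the upload
with open('result.csv', 'rb') as file:
    resp = requests.put(upload_url, data=file)
resp.raise_for_status()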

Now let's test that it was stored correctly by fetching it:

In [40]:
# Request a pre-signed download URL for the result file, then fetch the file itself
query = {'filename': simulation['result_files'][0]}
resp = requests.get(f"{url}/simulations/{simulation_id}/download-url", params=query)
download_url = resp.json()['url']

resp = requests.get(download_url)
resp.text[:1000].split('\n')

['timestamp,S,I,R',
 '1,100,1,0',
 '2,90,11,0',
 '3,90,1,10',
 '4,80,10,11',
 '5,60,30,21']
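To work with the result rather than just print it, parse the CSV into columns. A minimal sketch; `parse_result` is a hypothetical helper, and values are converted to `float` on the assumption that every result column is numeric, as in the mock `result.csv` above.

```python
import csv
import io


def parse_result(csv_text: str) -> dict:
    """Parse a result CSV into {column name: [float values]}, preserving column order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return columns


text = "timestamp,S,I,R\n1,100,1,0\n2,90,11,0\n3,90,1,10\n4,80,10,11\n5,60,30,21"
result = parse_result(text)
print(result["S"])  # [100.0, 90.0, 90.0, 80.0, 60.0]
```

In the notebook, `parse_result(resp.text)` on the download response gives the same structure.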

Finally, let's add this simulation as an asset to our project:

In [41]:
resource_type='simulations'
resource_id=simulation_id

resp = requests.post(f"{url}/projects/{project_id}/assets/{resource_type}/{resource_id}")
resp.json()

{'id': 116}

In [42]:
resp = requests.get(f"{url}/projects/{project_id}/assets")
resp.json()

{'datasets': [{'id': '86e0532b-3cc8-495f-be56-8f9a682caabb',
   'timestamp': '2023-06-08T21:48:32.711803',
   'username': 'Brandon Rose',
   'name': 'CDC COVID-19 Vaccination and Case Trends by Age Group',
   'description': 'CDC COVID-19 Vaccination and Case Trends by Age Group',
   'data_source_date': None,
   'file_names': ['example.csv'],
   'dataset_url': None,
   'columns': [{'name': 'date',
     'data_type': 'float',
     'format_str': None,
     'annotations': {},
     'metadata': {},
     'grounding': {}},
    {'name': 'I',
     'data_type': 'float',
     'format_str': None,
     'annotations': {},
     'metadata': {},
     'grounding': {}},
    {'name': 'R',
     'data_type': 'float',
     'format_str': None,
     'annotations': {},
     'metadata': {},
     'grounding': {}},
    {'name': 'D',
     'data_type': 'float',
     'format_str': None,
     'annotations': {},
     'metadata': {},
     'grounding': {}},
    {'name': 'V',
     'data_type': 'float',
     'format_str': No