# Example TDS Flow

This notebook outlines an example storage and recall flow for the Terarium Data Service (TDS).

### Contents:

1. [Users and Projects](#Users-and-Projects)
2. [Publications](#Publications)
3. [Models](#Models)
4. [Model Configurations](#Model-Configurations)
5. [Datasets](#Datasets)
6. [Dataset file upload/download](#Dataset-file-upload/download)
7. [Workflows](#Workflows)
8. [Simulations](#Simulations)

In [1]:
import requests
import json
from copy import deepcopy
import pandas as pd
url = 'http://localhost:8001'

## Users and Projects

Create a new user and get its `id`:

In [2]:
username = 'Brandon Rose'
user = {'name': username,
        'email': 'brandon@jataware.com',
        'org': 'Jataware',
        'is_registered': True}

resp = requests.post(f"{url}/persons", json=user)
user_id = resp.json()['id']
resp.json()

{'name': 'Brandon Rose',
 'email': 'brandon@jataware.com',
 'org': 'Jataware',
 'website': None,
 'is_registered': True,
 'id': 4}

Note that we now have an `id` for the user.
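The cells in this notebook read `resp.json()['id']` directly, which raises an unhelpful `KeyError` if a request fails. As a minimal sketch, the hypothetical helper `extract_id` below (not part of TDS or `requests`) checks the status code first. It assumes that TDS signals success with a 2xx status, which this notebook does not state.

```python
def extract_id(status_code, body):
    """Return body['id'] for a successful create request, otherwise raise.

    Hypothetical helper: assumes any 2xx status code means success.
    """
    if not 200 <= status_code < 300:
        raise RuntimeError(f"request failed with HTTP {status_code}: {body!r}")
    if not isinstance(body, dict) or "id" not in body:
        raise KeyError(f"response has no 'id' field: {body!r}")
    return body["id"]


# Offline example using the response shown above.
example_user_id = extract_id(200, {"name": "Brandon Rose", "id": 4})
print(example_user_id)  # 4
```

In the notebook itself the call would look like `extract_id(resp.status_code, resp.json())`.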

Let's create a project:

In [3]:
project =   {
    "active": True,
    "name": "BRose Project",
    "description": "Brandons first project in TDS!",
    "username": username
  }

resp = requests.post(f"{url}/projects", json=project)
project_id = resp.json()['id']
resp.json()

{'id': 3}

Note that we now have an `id` for the project.

## Publications

Now let's add a publication:

In [4]:
publication =   {
  "id": 100,
  "xdd_uri": "some_xdd_uri",
  "title": "Some Paper"
}

resp = requests.post(f"{url}/external/publications", json=publication)
publication_id = resp.json()['id']
resp.json()

{'id': 100, 'xdd_uri': 'some_xdd_uri', 'title': 'Some Paper'}

Let's add this publication as an asset to our project:

In [5]:
resource_type='publications'
resource_id=publication_id

resp = requests.post(f"{url}/projects/{project_id}/assets/{resource_type}/{resource_id}")
resp.json()

{'id': 116}
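The same `projects/{project_id}/assets/{resource_type}/{resource_id}` pattern is reused for every asset type in this notebook. A small sketch of a URL builder follows; `asset_url` and `ASSET_TYPES` are hypothetical names, and the set of types is simply the set of keys that the project assets listing in this notebook returns, not a confirmed list of what the endpoint accepts.

```python
# Assumption: these are the keys returned by GET /projects/{id}/assets in
# this notebook; the endpoint may accept a different set of types.
ASSET_TYPES = {
    "datasets",
    "models",
    "model_configurations",
    "publications",
    "simulations",
    "workflows",
}


def asset_url(base_url, project_id, resource_type, resource_id):
    """Build the URL used to attach a resource to a project."""
    if resource_type not in ASSET_TYPES:
        raise ValueError(f"unknown resource type: {resource_type!r}")
    return f"{base_url}/projects/{project_id}/assets/{resource_type}/{resource_id}"


print(asset_url("http://localhost:8001", 3, "publications", 100))
# http://localhost:8001/projects/3/assets/publications/100
```

Validating the type up front catches typos such as `publication` before a request is sent.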

Let's check that the asset is attached to the project by listing the project's assets:

In [6]:
resp = requests.get(f"{url}/projects/{project_id}/assets")
resp.json()

{'datasets': [],
 'models': [],
 'model_configurations': [],
 'publications': [{'id': 100,
   'xdd_uri': 'some_xdd_uri',
   'title': 'Some Paper'}],
 'simulations': [],
 'workflows': []}
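Instead of inspecting the listing by eye, the check can be done programmatically. This is a minimal sketch that assumes the response shape shown above (a dict mapping each asset type to a list of objects with an `id` field); `has_asset` is a hypothetical helper name.

```python
def has_asset(assets, resource_type, resource_id):
    """Return True if the assets listing contains the given resource."""
    return any(
        item.get("id") == resource_id
        for item in assets.get(resource_type, [])
    )


# Offline example using the listing shown above.
example_assets = {
    "datasets": [],
    "models": [],
    "model_configurations": [],
    "publications": [
        {"id": 100, "xdd_uri": "some_xdd_uri", "title": "Some Paper"}
    ],
    "simulations": [],
    "workflows": [],
}
print(has_asset(example_assets, "publications", 100))  # True
print(has_asset(example_assets, "models", 100))        # False
```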

## Models
Let's create a model and add it as an asset to our project:

In [7]:
# Load the model definition; the context manager closes the file afterwards
with open('sir.json') as f:
    model = json.load(f)
print(f"{model['name']}, v{model['model_version']}")

SIR Model, v0.1


In [8]:
resp = requests.post(f"{url}/models", json=model)
model_id = resp.json()['id']
resp.json()

{'id': '5eace001-bbc6-4d15-ae05-7fc1110e27e5'}

We can now add this model as an asset to our project:

In [9]:
resource_type='models'
resource_id=model_id

resp = requests.post(f"{url}/projects/{project_id}/assets/{resource_type}/{resource_id}")
resp.json()

{'id': 117}

Let's confirm the asset was added:

In [10]:
resp = requests.get(f"{url}/projects/{project_id}/assets")
resp.json()

{'datasets': [],
 'models': [{'id': '5eace001-bbc6-4d15-ae05-7fc1110e27e5',
   'timestamp': '2023-06-14T18:01:11.368000+00:00',
   'name': 'SIR Model',
   'description': 'SIR model created by Ben, Micah, Brandon',
   'schema': 'https://raw.githubusercontent.com/DARPA-ASKEM/Model-Representations/petrinet_v0.1/petrinet/petrinet_schema.json',
   'model_version': '0.1'}],
 'model_configurations': [],
 'publications': [{'id': 100,
   'xdd_uri': 'some_xdd_uri',
   'title': 'Some Paper'}],
 'simulations': [],
 'workflows': []}

Let's list our model descriptions:

In [11]:
resp = requests.get(f"{url}/models/descriptions")
resp.json()

[{'id': '41d286a3-f516-41bf-86b4-544e51d1f8b8',
  'timestamp': '2023-06-12T01:48:39.630000+00:00',
  'name': 'A Test Model',
  'description': 'Test Model Post from Swagger.',
  'schema': 'https://raw.githubusercontent.com/DARPA-ASKEM/Model-Representations/petrinet_v0.2/petrinet/petrinet_schema.json',
  'model_version': '1.0'},
 {'id': 'ebd24f0f-ec16-4531-836e-fdadb376c592',
  'timestamp': '2023-06-14T16:23:57.551000+00:00',
  'name': 'SIR Model',
  'description': 'SIR model created by Ben, Micah, Brandon',
  'schema': 'https://raw.githubusercontent.com/DARPA-ASKEM/Model-Representations/petrinet_v0.1/petrinet/petrinet_schema.json',
  'model_version': '0.1'},
 {'id': 'cd50cb89-7b4f-442f-9d88-e14c48620f33',
  'timestamp': '2023-06-14T16:27:08.290000+00:00',
  'name': 'SIR Model',
  'description': 'SIR model created by Ben, Micah, Brandon',
  'schema': 'https://raw.githubusercontent.com/DARPA-ASKEM/Model-Representations/petrinet_v0.1/petrinet/petrinet_schema.json',
  'model_version': '0.1'

Let's test out a search for this model as well:

In [12]:
query = {
    "match": {
      "description": "Micah"
    }
}

resp = requests.post(f"{url}/models/search", json=query)
resp.json()

[{'id': 'ebd24f0f-ec16-4531-836e-fdadb376c592',
  'timestamp': '2023-06-14T16:23:57.551000+00:00',
  'name': 'SIR Model',
  'description': 'SIR model created by Ben, Micah, Brandon',
  'schema': 'https://raw.githubusercontent.com/DARPA-ASKEM/Model-Representations/petrinet_v0.1/petrinet/petrinet_schema.json',
  'model_version': '0.1'},
 {'id': 'cd50cb89-7b4f-442f-9d88-e14c48620f33',
  'timestamp': '2023-06-14T16:27:08.290000+00:00',
  'name': 'SIR Model',
  'description': 'SIR model created by Ben, Micah, Brandon',
  'schema': 'https://raw.githubusercontent.com/DARPA-ASKEM/Model-Representations/petrinet_v0.1/petrinet/petrinet_schema.json',
  'model_version': '0.1'},
 {'id': '5eace001-bbc6-4d15-ae05-7fc1110e27e5',
  'timestamp': '2023-06-14T18:01:11.368000+00:00',
  'name': 'SIR Model',
  'description': 'SIR model created by Ben, Micah, Brandon',
  'schema': 'https://raw.githubusercontent.com/DARPA-ASKEM/Model-Representations/petrinet_v0.1/petrinet/petrinet_schema.json',
  'model_version
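The search returns several models with the same name, because the model was posted more than once. One way to pick the most recent entry per name is sketched below. `latest_per_name` is a hypothetical helper; it assumes every `timestamp` is an ISO 8601 string with a UTC offset, as in the output above.

```python
from datetime import datetime


def latest_per_name(descriptions):
    """Map each model name to its most recently created description."""
    latest = {}
    for description in descriptions:
        # Assumption: all timestamps carry an offset, so they are comparable.
        created = datetime.fromisoformat(description["timestamp"])
        current = latest.get(description["name"])
        if current is None or created > current[0]:
            latest[description["name"]] = (created, description)
    return {name: description for name, (_, description) in latest.items()}


# Offline example using ids and timestamps from the search output above.
example_results = [
    {"id": "ebd24f0f-ec16-4531-836e-fdadb376c592",
     "timestamp": "2023-06-14T16:23:57.551000+00:00", "name": "SIR Model"},
    {"id": "cd50cb89-7b4f-442f-9d88-e14c48620f33",
     "timestamp": "2023-06-14T16:27:08.290000+00:00", "name": "SIR Model"},
    {"id": "5eace001-bbc6-4d15-ae05-7fc1110e27e5",
     "timestamp": "2023-06-14T18:01:11.368000+00:00", "name": "SIR Model"},
]
print(latest_per_name(example_results)["SIR Model"]["id"])
# 5eace001-bbc6-4d15-ae05-7fc1110e27e5
```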

Let's now create a provenance tie between the model and the publication from which it was extracted:

In [13]:
provenance = {
  "relation_type": "EXTRACTED_FROM",
  "left": model_id,
  "left_type": "Model",
  "right": publication_id,
  "right_type": "Publication",
  "user_id": user_id
}

resp = requests.post(f"{url}/provenance", json=provenance)
resp.json()

{'id': 393}

## Model Configurations

Let's edit the initial value of the susceptible population from 1000 to 2000 and save this as a new model configuration.

In [14]:
model_config = deepcopy(model)
model_config['semantics']['ode']['parameters'][2]

{'id': 'S0',
 'description': 'Total susceptible population at timestep 0',
 'value': 1000}

In [15]:
model_config['semantics']['ode']['parameters'][2]['value'] = 2000
print(model_config['semantics']['ode']['parameters'][2]['value'])

2000
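The cells above address the parameter by its position (`[2]`), which silently changes the wrong value if the parameter order differs. A sketch of a lookup by `id` follows. `with_parameter` is a hypothetical helper, and the two placeholder parameters in the example (`p0`, `p1`) are made up for illustration; only `S0` comes from the model shown above.

```python
from copy import deepcopy


def with_parameter(model, param_id, value):
    """Return a deep copy of the model with one ODE parameter value replaced."""
    updated = deepcopy(model)
    for param in updated["semantics"]["ode"]["parameters"]:
        if param["id"] == param_id:
            param["value"] = value
            return updated
    raise KeyError(f"no parameter with id {param_id!r}")


# Offline example; 'p0' and 'p1' are placeholders, not real model parameters.
example_model = {
    "semantics": {"ode": {"parameters": [
        {"id": "p0", "value": 0.1},
        {"id": "p1", "value": 0.2},
        {"id": "S0", "value": 1000},
    ]}}
}
example_config = with_parameter(example_model, "S0", 2000)
print(example_config["semantics"]["ode"]["parameters"][2]["value"])  # 2000
print(example_model["semantics"]["ode"]["parameters"][2]["value"])   # 1000
```

Because the helper works on a deep copy, the baseline model stays untouched, which matches the intent of the `deepcopy(model)` call above.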


In [16]:
config = {
    "model_id": model_id,
    "name": "SIR example config",
    "description": "Increased susceptible population to 2000 relative to baseline",
    "configuration": model_config
}

resp = requests.post(f"{url}/model_configurations", json=config)
model_config_id = resp.json()['id']
resp.json()

{'id': '7f0ed1ba-d9c1-4d4b-bb83-2b87790aad9c'}

Let's ensure that the model configuration is correctly attached to the model:

In [17]:
resp = requests.get(f"{url}/models/{model_id}/model_configurations")
resp.json()

[{'id': '7f0ed1ba-d9c1-4d4b-bb83-2b87790aad9c',
  'name': 'SIR example config',
  'description': 'Increased susceptible population to 2000 relative to baseline',
  'timestamp': '2023-06-14T18:01:25.537142',
  'model_id': '5eace001-bbc6-4d15-ae05-7fc1110e27e5',
  'configuration': {'name': 'SIR Model',
   'schema': 'https://raw.githubusercontent.com/DARPA-ASKEM/Model-Representations/petrinet_v0.1/petrinet/petrinet_schema.json',
   'description': 'SIR model created by Ben, Micah, Brandon',
   'model_version': '0.1',
   'model': {'states': [{'id': 'S',
      'name': 'Susceptible',
      'grounding': {'identifiers': {'ido': '0000514'}}},
     {'id': 'I',
      'name': 'Infected',
      'grounding': {'identifiers': {'ido': '0000511'}}},
     {'id': 'R',
      'name': 'Recovered',
      'grounding': {'identifiers': {'ido': '0000592'}}}],
    'transitions': [{'id': 'inf',
      'input': ['S', 'I'],
      'output': ['I', 'I'],
      'properties': {'name': 'Infection'}},
     {'id': 'rec',
     

## Datasets

Let's add an example dataset.

In [18]:
df = pd.read_csv('example.csv')
df.head()

Unnamed: 0,date,I,R,D,V,H,I_0-9,I_10-19,I_20-29,I_30-39,...,N_10-19,N_20-29,N_30-39,N_40-49,N_50-59,N_60-69,N_70-79,N_80-89,N_90-99,N_100+
0,2020-12-13,3003438,13294197,253207,30817,0,75981.0,176278.0,299356.0,271365.0,...,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160
1,2020-12-14,3066564,13444501,254791,35355,0,76432.0,176644.0,297050.0,270940.0,...,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160
2,2020-12-15,3074815,13635500,257164,81231,0,77189.0,177292.0,296908.0,271442.0,...,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160
3,2020-12-16,3144789,13842160,259933,236049,0,76929.0,176959.0,293334.0,269014.0,...,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160
4,2020-12-17,3178394,14056574,262637,502932,0,78317.0,179410.0,297305.0,272638.0,...,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160


In [19]:
dataset = {
  "username": username,
  "name": "CDC COVID-19 Vaccination and Case Trends by Age Group",
  "description": "CDC COVID-19 Vaccination and Case Trends by Age Group",
  "file_names": [
    "example.csv"
  ],
  "source": "https://github.com/DARPA-ASKEM/experiments/blob/main/thin-thread-examples/milestone_6month/evaluation/ta1/usa-IRDVHN_age.csv",
  }

In [20]:
columns = []
for c in df.columns:
    col = {
      "name": c,
      "data_type": "float",
      "annotations": {},
      "metadata": {},
      "grounding": {}
    }
    columns.append(col)

dataset['columns'] = columns
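The loop above tags every column as `float`, including `date`. A sketch of a variant that allows per-column overrides is shown below. `build_columns` is a hypothetical helper, and the `"str"` value used in the example is only an assumption: this notebook does not show which `data_type` values TDS accepts besides `float`.

```python
def build_columns(names, default_type="float", type_overrides=None):
    """Build TDS-style column descriptors, one per column name."""
    type_overrides = type_overrides or {}
    return [
        {
            "name": name,
            "data_type": type_overrides.get(name, default_type),
            "annotations": {},
            "metadata": {},
            "grounding": {},
        }
        for name in names
    ]


# Offline example; "str" is an assumed value, not a confirmed TDS data type.
example_columns = build_columns(["date", "I", "R"], type_overrides={"date": "str"})
print([(c["name"], c["data_type"]) for c in example_columns])
# [('date', 'str'), ('I', 'float'), ('R', 'float')]
```

In the notebook the call would be `build_columns(df.columns, ...)`.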

In [21]:
resp = requests.post(f"{url}/datasets", json=dataset)
dataset_id = resp.json()['id']
resp.json()

{'id': '228f2e6f-bf40-4b40-9b36-09dfb6e343a0'}

Let's add this dataset as an asset to our project:

In [22]:
resource_type='datasets'
resource_id=dataset_id

resp = requests.post(f"{url}/projects/{project_id}/assets/{resource_type}/{resource_id}")
resp.json()

{'id': 118}

In [23]:
resp = requests.get(f"{url}/projects/{project_id}/assets")
resp.json()

{'datasets': [{'id': '228f2e6f-bf40-4b40-9b36-09dfb6e343a0',
   'timestamp': '2023-06-14T18:01:35.472324',
   'username': 'Brandon Rose',
   'name': 'CDC COVID-19 Vaccination and Case Trends by Age Group',
   'description': 'CDC COVID-19 Vaccination and Case Trends by Age Group',
   'data_source_date': None,
   'file_names': ['example.csv'],
   'dataset_url': None,
   'columns': [{'name': 'date',
     'data_type': 'float',
     'format_str': None,
     'annotations': {},
     'metadata': {},
     'grounding': {}},
    {'name': 'I',
     'data_type': 'float',
     'format_str': None,
     'annotations': {},
     'metadata': {},
     'grounding': {}},
    {'name': 'R',
     'data_type': 'float',
     'format_str': None,
     'annotations': {},
     'metadata': {},
     'grounding': {}},
    {'name': 'D',
     'data_type': 'float',
     'format_str': None,
     'annotations': {},
     'metadata': {},
     'grounding': {}},
    {'name': 'V',
     'data_type': 'float',
     'format_str': No

## Dataset file upload/download

Let's upload the file associated with the dataset. First we need to get a pre-signed URL for the upload:

In [24]:
query = {'filename': dataset['file_names'][0]}
resp = requests.get(f"{url}/datasets/{dataset_id}/upload-url", params=query)
upload_url = resp.json()['url']
resp.json()

{'url': 'http://localhost:9000/askem-staging-data-service/datasets/228f2e6f-bf40-4b40-9b36-09dfb6e343a0/example.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=miniouser%2F20230614%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230614T180141Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=0aa08153713990343215df2d07a53fa36136cf5360ab9a43c592e63cd7450e02',
 'method': 'PUT'}
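The query string of a pre-signed URL like the one above records when it was signed (`X-Amz-Date`) and for how long it is valid (`X-Amz-Expires`, in seconds). The sketch below computes the expiry time from those two parameters using only the standard library; `presigned_expiry` is a hypothetical helper name, and the example URL is a shortened version of the one above.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_expiry(presigned_url):
    """Return the UTC time at which a SigV4 pre-signed URL stops being valid."""
    params = parse_qs(urlsplit(presigned_url).query)
    signed_at = datetime.strptime(
        params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    return signed_at + timedelta(seconds=int(params["X-Amz-Expires"][0]))


# Shortened example URL carrying the date and lifetime shown above.
example_url = (
    "http://localhost:9000/askem-staging-data-service/datasets/example.csv"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Date=20230614T180141Z&X-Amz-Expires=3600"
)
print(presigned_expiry(example_url).isoformat())
# 2023-06-14T19:01:41+00:00
```

Comparing this value with the current time shows whether a fresh URL has to be requested before uploading.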

Because the signed URL must be used exactly as issued, and the URL uses the name of the `minio` container within the Docker network, you have two options:

- issue the `PUT` request from within the Docker network
- update `/etc/hosts` with `127.0.0.1 minio` so that you can talk directly to `minio`

> Note: this only matters for local development. In production we'll be using S3, so it won't be relevant.

In [25]:
with open('example.csv', 'rb') as file:
    resp = requests.put(upload_url, data=file)

We can now download the file by obtaining a pre-signed URL for download:

In [26]:
query = {'filename': dataset['file_names'][0]}
resp = requests.get(f"{url}/datasets/{dataset_id}/download-url", params=query)
download_url = resp.json()['url']
resp.json()

{'url': 'http://localhost:9000/askem-staging-data-service/datasets/228f2e6f-bf40-4b40-9b36-09dfb6e343a0/example.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=miniouser%2F20230614%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230614T180144Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=653aa2bed4bd981d006021d707fd4495181613816ae482a0465eb016422fe13c',
 'method': 'GET'}

In [27]:
resp = requests.get(download_url)
resp.text[:1000]

'date,I,R,D,V,H,I_0-9,I_10-19,I_20-29,I_30-39,I_40-49,I_50-59,I_60-69,I_70-79,N_0-9,N_10-19,N_20-29,N_30-39,N_40-49,N_50-59,N_60-69,N_70-79,N_80-89,N_90-99,N_100+\n2020-12-13,3003438,13294197,253207,30817,0,75981.0,176278.0,299356.0,271365.0,248207.0,238680.0,170763.0,94649.0,40795692,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160\n2020-12-14,3066564,13444501,254791,35355,0,76432.0,176644.0,297050.0,270940.0,248120.0,239206.0,171200.0,95561.0,40795692,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160\n2020-12-15,3074815,13635500,257164,81231,0,77189.0,177292.0,296908.0,271442.0,248301.0,239795.0,172448.0,96284.0,40795692,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160\n2020-12-16,3144789,13842160,259933,236049,0,76929.0,176959.0,293334.0,269014.0,247213.0,238855.0,171159.0,95813.0,40795692,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160\

## Workflows
Here we add an example workflow to TDS and add it as an asset to the project.

In [28]:
workflow = {
  "name": "Test Workflow",
  "description": "This is the description",
  "transform": {
    "x": 127.0002,
    "y": 17.1284,
    "z": 2.53
  },
  "nodes": [
    {
      "model": {
        "id": model_id,
        "configurations": [
          {
            "id": model_config_id
          }
        ]
      }
    }
  ],
  "edges": []
}

In [29]:
resp = requests.post(f"{url}/workflows", json=workflow)
workflow_id = resp.json()['id']
resp.json()

{'id': 'bc3ee8ef-a84d-4f35-a782-68f6fcff9853'}

Let's add this workflow as an asset to our project:

In [30]:
resource_type='workflows'
resource_id=workflow_id

resp = requests.post(f"{url}/projects/{project_id}/assets/{resource_type}/{resource_id}")
resp.json()

{'id': 119}

## Simulations
Create a simulation and iteratively update as it completes. Store the result file(s).

First, we need to add a model configuration that is not an AMR, because AMRs are not supported yet.

In [31]:
config = {
    "model_id": model_id,
    "name": "SEIRD ACSet",
    "description": "ACSet Configuration",
    "configuration": {"T":[{"tname":"exp"},{"tname":"conv"},{"tname":"rec"},{"tname":"death"}],"S":[{"sname":"S"},{"sname":"E"},{"sname":"I"},{"sname":"R"},{"sname":"D"}],"I":[{"it":1,"is":1},{"it":1,"is":3},{"it":2,"is":2},{"it":3,"is":3},{"it":4,"is":3}],"O":[{"ot":1,"os":2},{"ot":1,"os":3},{"ot":2,"os":3},{"ot":3,"os":4},{"ot":4,"os":5}]}
}

resp = requests.post(f"{url}/model_configurations", json=config)
model_config_acset_id = resp.json()['id']

Now we can kick off a simulation run. We also register the run in TDS right away: this is time sensitive, because the simulation service fails after a few seconds if the simulation does not exist in TDS.

In [38]:
# Ensure that the simulation-service (SciML) is running
sim_service_url = "http://localhost:8080"
simulate_payload = {
    "model_config_id": model_config_acset_id,
    "extra": {
      "initials": {
        "S": 0.49457800495224524,
        "E": 0.26745259325403603,
        "I": 0.4497387877393193,
        "R": 0.32807705995998604,
        "D": 0.8545934885162726
      },
      "params": {
        "exp":   0.16207166221196045,
        "conv":  0.7009195813964052,
        "rec":   0.7040317196117394,
        "death": 0.15807853921067516
       }
    },
    "timespan": {"start": 0, "end": 90},
    "engine": "sciml"
}


sim_resp = requests.post(f"{sim_service_url}/simulate", json=simulate_payload, headers={'Content-Type': 'application/json'})
simulation_id = sim_resp.json()["simulation_id"]

# TDS expects the HMI to post a new simulation after kicking off a job on the sim service
simulation = {
  "id": simulation_id,
  "execution_payload": simulate_payload, 
  "result_files": [],
  "type": "simulation",
  "status": "queued",
  "engine": "sciml",
  "workflow_id": workflow_id,
  "user_id": user_id,
  "project_id": project_id
}

# If the simulation doesn't exist in TDS, the sim service will fail after a few seconds
resp = requests.post(f"{url}/simulations", json=simulation)
print(resp.json())

{'id': 'sciml-00000000-0000-0000-0010-fe5a35606fbd'}


Let's see if the run has completed and, if so, get the URL for the resulting dataset.

In [95]:
# View the results
resp = requests.get(f"{url}/simulations/{simulation_id}")
simulation = resp.json()
print(simulation)
if simulation['status'] in ["running", "queued"]:
    print("Job hasn't completed. Please rerun cell!")
elif simulation['status'] == "error":
    print("The job failed somehow")
else:
    resp = requests.get(f"{url}/simulations/{simulation_id}/download-url?filename=results.csv")
    download_url = resp.json()["url"]
    print("Job complete! Ready to download!")

{'id': 'sciml-00000000-0000-0000-0010-fe5a35606fbd', 'name': None, 'description': None, 'timestamp': '2023-06-14T18:07:29.392396', 'engine': 'sciml', 'type': 'simulation', 'status': 'complete', 'execution_payload': {'engine': 'sciml', 'model_config_id': '2449a8c8-ea78-4ece-9312-e7a78ae35e77', 'timespan': {'start': 0, 'end': 90}, 'num_samples': None, 'extra': {'initials': {'S': 0.49457800495224524, 'E': 0.26745259325403603, 'I': 0.4497387877393193, 'R': 0.32807705995998604, 'D': 0.8545934885162726}, 'params': {'exp': 0.16207166221196045, 'conv': 0.7009195813964052, 'rec': 0.7040317196117394, 'death': 0.15807853921067516}}}, 'start_time': '2023-06-14T18:07:02.982469+00:00', 'completed_time': '2023-06-14T18:07:27.285751+00:00', 'workflow_id': 'bc3ee8ef-a84d-4f35-a782-68f6fcff9853', 'user_id': 4, 'project_id': 3, 'result_files': ['http://localhost:9000/askem-staging-data-service/simulations/sciml-00000000-0000-0000-0010-fe5a35606fbd/result.csv']}
Job complete! Ready to download!


Note that `start_time` and `completed_time` are null and `result_files` is empty while the status is `queued`. `start_time` is set when the status changes to `running`. `completed_time` and `result_files` are set when the job reaches the status `complete`.
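Rather than rerunning the cell by hand, the status can be polled in a loop. The sketch below is a hypothetical helper (`wait_for_simulation`) that takes a function returning the current simulation record, so it can be tried without a running service. It uses the four statuses that appear in this notebook (`queued`, `running`, `error`, `complete`) and, unlike the cell above, treats any other status as unexpected.

```python
import time


def wait_for_simulation(fetch, interval=2.0, timeout=120.0):
    """Poll fetch() until the simulation is complete, failed, or timed out."""
    deadline = time.monotonic() + timeout
    while True:
        simulation = fetch()
        status = simulation["status"]
        if status == "complete":
            return simulation
        if status == "error":
            raise RuntimeError("the simulation job failed")
        if status not in ("queued", "running"):
            raise ValueError(f"unexpected status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"simulation still {status} after {timeout}s")
        time.sleep(interval)


# Offline demonstration with a fake fetch function that replays statuses.
_statuses = iter(["queued", "running", "complete"])
finished = wait_for_simulation(lambda: {"status": next(_statuses)}, interval=0)
print(finished["status"])  # complete
```

In the notebook, `fetch` could be `lambda: requests.get(f"{url}/simulations/{simulation_id}").json()`.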

Uploading and downloading simulation results uses the same pattern as `datasets`. 

Let's download the result file `result.csv` from S3:

In [97]:
print(f"From {download_url}")
pd.read_csv(download_url)

From http://localhost:9000/askem-staging-data-service/simulations/sciml-00000000-0000-0000-0010-fe5a35606fbd/results.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=miniouser%2F20230614%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230614T181844Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=8eac638ca4037f1a7838cf200f5779a68d176e1bea5dd56266a7fc624796965d


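HTTPError: HTTP Error 404: Not Found

The 404 is most likely caused by a filename mismatch: the download URL was requested for `results.csv`, while the `result_files` entry of the simulation points to `result.csv`.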

Finally, let's add this simulation as an asset to our project:

In [None]:
resource_type='simulations'
resource_id=simulation_id

resp = requests.post(f"{url}/projects/{project_id}/assets/{resource_type}/{resource_id}")
resp.json()

In [None]:
resp = requests.get(f"{url}/projects/{project_id}/assets")
resp.json()