# Example TDS Flow

This notebook outlines an example storage and recall flow for the Terarium Data Service (TDS).

### Contents:

1. [Users and Projects](#Users-and-Projects)
2. [Publications](#Publications)
3. [Models](#Models)
4. [Model Configurations](#Model-Configurations)
5. [Datasets](#Datasets)
6. [Dataset file upload/download](#Dataset-file-upload/download)

In [1]:
import requests
import json
from copy import deepcopy
import pandas as pd
url = 'http://localhost:8001'

## Users and Projects

Create a new user and get its `id`:

In [2]:
username = 'Brandon Rose'
user = {'name': username,
        'email': 'brandon@jataware.com',
        'org': 'Jataware',
        'is_registered': True}

resp = requests.post(f"{url}/persons", json=user)
user_id = resp.json()['id']
resp.json()

{'name': 'Brandon Rose',
 'email': 'brandon@jataware.com',
 'org': 'Jataware',
 'website': None,
 'is_registered': True,
 'id': 2}

Note that we now have an `id` for the user.
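
The cells in this notebook read `resp.json()['id']` directly, which fails with an unhelpful `KeyError` if the request was rejected. Here is a minimal sketch of a guard, assuming TDS returns a JSON object containing `id` on success (as the outputs in this notebook show). `extract_id` is a hypothetical helper name, not part of TDS:

```python
def extract_id(resp):
    """Return the `id` field of a response, failing loudly on errors.

    `resp` is expected to behave like a `requests.Response`:
    it needs `raise_for_status()` and `json()`.
    """
    resp.raise_for_status()  # raises on 4xx/5xx status codes
    body = resp.json()
    if not isinstance(body, dict) or "id" not in body:
        raise ValueError(f"response has no 'id' field: {body!r}")
    return body["id"]

# Usage with the cell above (not executed here):
# user_id = extract_id(requests.post(f"{url}/persons", json=user))
```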

Let's create a project:

In [3]:
project =   {
    "active": True,
    "name": "BRose Project",
    "description": "Brandon's first project in TDS!",
    "username": username
  }

resp = requests.post(f"{url}/projects", json=project)
project_id = resp.json()['id']
resp.json()

{'id': 2}

Note that we now have an `id` for the project.

## Publications

Now let's add a publication:

In [4]:
publication =   {
  "id": 100,
  "xdd_uri": "some_xdd_uri",
  "title": "Some Paper"
}

resp = requests.post(f"{url}/external/publications", json=publication)
publication_id = resp.json()['id']
resp.json()

{'id': 100}

Let's add this publication as an asset to our project:

In [5]:
resource_type = 'publications'
resource_id = publication_id

resp = requests.post(f"{url}/projects/{project_id}/assets/{resource_type}/{resource_id}")
resp.json()

{'id': 112}

Let's check that the asset is attached to the project by listing the project's assets:

In [6]:
resp = requests.get(f"{url}/projects/{project_id}/assets")
resp.json()

{'datasets': [],
 'models': [],
 'model_configurations': [],
 'publications': [{'id': 100,
   'xdd_uri': 'some_xdd_uri',
   'title': 'Some Paper'}],
 'simulation_runs': []}
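
The asset endpoint takes the resource type as a path segment, so a typo there only surfaces as an HTTP error. Below is a small sketch that builds the URL and checks the type first. It assumes the keys of the asset listing above are the accepted path segments; only `publications`, `models`, and `datasets` are actually exercised in this notebook. `asset_url` is a hypothetical helper:

```python
# Keys of the asset listing above. Assumption: the same names are the
# resource types accepted in the asset URL path.
RESOURCE_TYPES = {
    "datasets",
    "models",
    "model_configurations",
    "publications",
    "simulation_runs",
}

def asset_url(base_url, project_id, resource_type, resource_id):
    """Build the URL used to attach a resource to a project."""
    if resource_type not in RESOURCE_TYPES:
        raise ValueError(f"unknown resource type: {resource_type!r}")
    return f"{base_url}/projects/{project_id}/assets/{resource_type}/{resource_id}"

# Usage (not executed here):
# resp = requests.post(asset_url(url, project_id, 'publications', publication_id))
```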

## Models

Let's create a model and add it as an asset to our project:

In [7]:
# Load the SIR model definition; the context manager closes the file
with open('sir.json') as f:
    model = json.load(f)
print(f"{model['name']}, v{model['model_version']}")

SIR Model, v0.1


In [8]:
resp = requests.post(f"{url}/models", json=model)
model_id = resp.json()['id']
resp.json()

{'id': '97232ada-2380-4a44-9a86-8ab1b9fc5919'}

We can now add this model as an asset to our project:

In [9]:
resource_type = 'models'
resource_id = model_id

resp = requests.post(f"{url}/projects/{project_id}/assets/{resource_type}/{resource_id}")
resp.json()

{'id': 113}

Let's confirm the asset was added:

In [10]:
resp = requests.get(f"{url}/projects/{project_id}/assets")
resp.json()

{'datasets': [],
 'models': [{'id': '97232ada-2380-4a44-9a86-8ab1b9fc5919',
   'timestamp': '2023-06-01T14:42:42.701000+00:00',
   'name': 'SIR Model',
   'description': 'SIR model created by Ben, Micah, Brandon',
   'schema': 'https://raw.githubusercontent.com/DARPA-ASKEM/Model-Representations/petrinet_v0.1/petrinet/petrinet_schema.json',
   'model_version': '0.1'}],
 'model_configurations': [],
 'publications': [{'id': 100,
   'xdd_uri': 'some_xdd_uri',
   'title': 'Some Paper'}],
 'simulation_runs': []}
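
Rather than reading the listing by eye, the check can be done programmatically. This sketch assumes the response has the shape shown above: a dict keyed by resource type, each holding a list of objects with an `id`. `has_asset` is a hypothetical helper:

```python
def has_asset(assets, resource_type, resource_id):
    """Return True if `resource_id` appears under `resource_type` in the listing."""
    return any(
        item.get("id") == resource_id
        for item in assets.get(resource_type, [])
    )

# Usage (not executed here):
# assets = requests.get(f"{url}/projects/{project_id}/assets").json()
# assert has_asset(assets, 'models', model_id)
```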

Let's list our model descriptions:

In [11]:
resp = requests.get(f"{url}/models/descriptions")
resp.json()

[{'id': '97232ada-2380-4a44-9a86-8ab1b9fc5919',
  'timestamp': '2023-06-01T14:42:42.701000+00:00',
  'name': 'SIR Model',
  'description': 'SIR model created by Ben, Micah, Brandon',
  'schema': 'https://raw.githubusercontent.com/DARPA-ASKEM/Model-Representations/petrinet_v0.1/petrinet/petrinet_schema.json',
  'model_version': '0.1'}]

Let's test out a search for this model as well:

In [12]:
query = {
    "match": {
      "description": "Micah"
    }
}

resp = requests.post(f"{url}/models/search", json=query)
resp.json()

[{'id': '97232ada-2380-4a44-9a86-8ab1b9fc5919',
  'timestamp': '2023-06-01T14:42:42.701000+00:00',
  'name': 'SIR Model',
  'description': 'SIR model created by Ben, Micah, Brandon',
  'schema': 'https://raw.githubusercontent.com/DARPA-ASKEM/Model-Representations/petrinet_v0.1/petrinet/petrinet_schema.json',
  'model_version': '0.1'}]

Let's now create a provenance tie between the model and the publication from which it was extracted:

In [13]:
provenance = {
  "relation_type": "EXTRACTED_FROM",
  "left": model_id,
  "left_type": "Model",
  "right": publication_id,
  "right_type": "Publication",
  "user_id": user_id
}

resp = requests.post(f"{url}/provenance", json=provenance)
resp.json()

{'id': 392}
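
The payload reads as "`left` `relation_type` `right`", so here the model is `EXTRACTED_FROM` the publication. Wrapping that in a function makes the direction harder to get backwards. This is only a sketch that reuses the field names from the cell above; `extracted_from` is a hypothetical helper, not a TDS function:

```python
def extracted_from(model_id, publication_id, user_id):
    """Build a provenance payload stating that a model was extracted from a publication."""
    return {
        "relation_type": "EXTRACTED_FROM",
        "left": model_id,          # the thing that was extracted
        "left_type": "Model",
        "right": publication_id,   # the source it was extracted from
        "right_type": "Publication",
        "user_id": user_id,
    }

# Usage (not executed here):
# resp = requests.post(f"{url}/provenance", json=extracted_from(model_id, publication_id, user_id))
```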

## Model Configurations

Let's edit the initial value of the susceptible population from 1000 to 2000 and save this as a new model configuration.

In [14]:
model_config = deepcopy(model)
model_config['semantics']['ode']['parameters'][2]

{'id': 'S0',
 'description': 'Total susceptible population at timestep 0',
 'value': 1000}

In [15]:
model_config['semantics']['ode']['parameters'][2]['value'] = 2000
print(model_config['semantics']['ode']['parameters'][2]['value'])

2000
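
Indexing the parameter list with `[2]` depends on the order of parameters in `sir.json`. A slightly more robust sketch looks the parameter up by its `id` (`S0` in the output above) and leaves the original model untouched. It assumes the `semantics.ode.parameters` layout used in this notebook; `with_parameter` is a hypothetical helper:

```python
from copy import deepcopy

def with_parameter(model, param_id, value):
    """Return a deep copy of `model` with ODE parameter `param_id` set to `value`."""
    updated = deepcopy(model)
    for param in updated["semantics"]["ode"]["parameters"]:
        if param["id"] == param_id:
            param["value"] = value
            return updated
    raise KeyError(f"no parameter with id {param_id!r}")

# Usage (not executed here):
# model_config = with_parameter(model, 'S0', 2000)
```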


In [16]:
config = {
    "model_id": model_id,
    "name": "SIR example config",
    "description": "Increased susceptible population to 2000 relative to baseline",
    "configuration": model_config
}

resp = requests.post(f"{url}/model_configurations", json=config)
model_config_id = resp.json()['id']
resp.json()

{'id': '6c826c52-75a6-4763-b910-5c32c35e2e3a'}

Let's ensure that the model configuration is correctly attached to the model:

In [17]:
resp = requests.get(f"{url}/models/{model_id}/model_configurations")
resp.json()

[{'id': '6c826c52-75a6-4763-b910-5c32c35e2e3a',
  'name': 'SIR example config',
  'description': 'Increased susceptible population to 2000 relative to baseline',
  'model_id': '97232ada-2380-4a44-9a86-8ab1b9fc5919',
  'model': None}]

## Datasets

Let's add an example dataset.

In [18]:
df = pd.read_csv('example.csv')
df.head()

Unnamed: 0,date,I,R,D,V,H,I_0-9,I_10-19,I_20-29,I_30-39,...,N_10-19,N_20-29,N_30-39,N_40-49,N_50-59,N_60-69,N_70-79,N_80-89,N_90-99,N_100+
0,2020-12-13,3003438,13294197,253207,30817,0,75981.0,176278.0,299356.0,271365.0,...,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160
1,2020-12-14,3066564,13444501,254791,35355,0,76432.0,176644.0,297050.0,270940.0,...,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160
2,2020-12-15,3074815,13635500,257164,81231,0,77189.0,177292.0,296908.0,271442.0,...,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160
3,2020-12-16,3144789,13842160,259933,236049,0,76929.0,176959.0,293334.0,269014.0,...,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160
4,2020-12-17,3178394,14056574,262637,502932,0,78317.0,179410.0,297305.0,272638.0,...,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160


In [19]:
dataset = {
  "name": "CDC COVID-19 Vaccination and Case Trends by Age Group",
  "description": "CDC COVID-19 Vaccination and Case Trends by Age Group",
  "file_names": [
    "example.csv"
  ],
  "source": "https://github.com/DARPA-ASKEM/experiments/blob/main/thin-thread-examples/milestone_6month/evaluation/ta1/usa-IRDVHN_age.csv",
  }

In [20]:
columns = []
for c in df.columns:
    col = {
      "name": c,
      "data_type": "float",
      "annotations": {},
      "metadata": {},
      "grounding": {}
    }
    columns.append(col)

dataset['columns'] = columns
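
The loop above labels every column as `"float"`, including `date`. If per-column types matter, one option is to infer them from a sample row. The sketch below is an illustration only: `"float"` is the only `data_type` value that appears in this notebook, and `"str"` is a placeholder label that should be checked against the values TDS actually accepts. `infer_data_type` and `build_columns` are hypothetical helpers:

```python
def infer_data_type(sample):
    """Classify one cell value as 'float' or (placeholder label) 'str'."""
    try:
        float(sample)
    except (TypeError, ValueError):
        return "str"  # assumed label, not confirmed by the TDS schema
    return "float"

def build_columns(header, first_row):
    """Build the `columns` list from column names and one sample row."""
    return [
        {
            "name": name,
            "data_type": infer_data_type(value),
            "annotations": {},
            "metadata": {},
            "grounding": {},
        }
        for name, value in zip(header, first_row)
    ]

# Usage (not executed here):
# dataset['columns'] = build_columns(df.columns, df.iloc[0])
```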

In [21]:
resp = requests.post(f"{url}/datasets", json=dataset)
dataset_id = resp.json()['id']
resp.json()

{'id': '55a04dc0-4570-4338-8794-c82e408b3fe8'}

Let's add this dataset as an asset to our project:

In [22]:
resource_type = 'datasets'
resource_id = dataset_id

resp = requests.post(f"{url}/projects/{project_id}/assets/{resource_type}/{resource_id}")
resp.json()

{'id': 114}

In [23]:
resp = requests.get(f"{url}/projects/{project_id}/assets")
resp.json()

{'datasets': [{'id': '55a04dc0-4570-4338-8794-c82e408b3fe8',
   'timestamp': '2023-06-01T14:42:53.651316',
   'name': 'CDC COVID-19 Vaccination and Case Trends by Age Group',
   'description': 'CDC COVID-19 Vaccination and Case Trends by Age Group',
   'file_names': ['example.csv'],
   'dataset_url': None,
   'columns': [{'name': 'date',
     'data_type': 'float',
     'format_str': None,
     'annotations': {},
     'metadata': {},
     'grounding': {}},
    {'name': 'I',
     'data_type': 'float',
     'format_str': None,
     'annotations': {},
     'metadata': {},
     'grounding': {}},
    {'name': 'R',
     'data_type': 'float',
     'format_str': None,
     'annotations': {},
     'metadata': {},
     'grounding': {}},
    {'name': 'D',
     'data_type': 'float',
     'format_str': None,
     'annotations': {},
     'metadata': {},
     'grounding': {}},
    {'name': 'V',
     'data_type': 'float',
     'format_str': None,
     'annotations': {},
     'metadata': {},
     'groun

## Dataset file upload/download

Let's upload the file associated with the dataset. First we need to get a pre-signed URL for the upload:

In [24]:
query = {'filename': dataset['file_names'][0]}
resp = requests.get(f"{url}/datasets/{dataset_id}/upload-url", params=query)
upload_url = resp.json()['url']
resp.json()

{'url': 'http://minio:9000/askem-staging-data-service/datasets/55a04dc0-4570-4338-8794-c82e408b3fe8/example.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=miniouser%2F20230601%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230601T144311Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=f03f4050013e6e8ea42d58b90e831e5424692750999b6b1f74fc0dff159f5c51',
 'method': 'PUT'}

Because a pre-signed URL must be used exactly as issued, and the host in this URL is the name of the `minio` container within the Docker network, you have two options:

- Issue the `PUT` request from within the Docker network.
- Update `/etc/hosts` with `127.0.0.1 minio` so that you can talk to `minio` directly.

> Note: this is only relevant for local development. In production we'll be using S3, so it won't apply.
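
The query string of the URL above shows why the host cannot simply be swapped: `X-Amz-SignedHeaders=host` means the host is part of what was signed. The same query string also carries the signing time (`X-Amz-Date`) and lifetime in seconds (`X-Amz-Expires`). Here is a small sketch for inspecting these values, using only the standard library; `presigned_info` is a hypothetical helper and assumes the parameter names seen in the output above:

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

def presigned_info(presigned_url):
    """Extract host, port, and expiry time (UTC) from a pre-signed URL."""
    parts = urlsplit(presigned_url)
    params = parse_qs(parts.query)
    signed_at = datetime.strptime(
        params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    lifetime = timedelta(seconds=int(params["X-Amz-Expires"][0]))
    return {
        "host": parts.hostname,
        "port": parts.port,
        "expires_at": signed_at + lifetime,
    }

# Usage (not executed here):
# info = presigned_info(upload_url)
# print(info["host"], info["expires_at"])
```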

In [25]:
with open('example.csv', 'rb') as file:
    resp = requests.put(upload_url, data=file)

We can now download the file by obtaining a pre-signed URL for the download:

In [26]:
query = {'filename': dataset['file_names'][0]}
resp = requests.get(f"{url}/datasets/{dataset_id}/download-url", params=query)
download_url = resp.json()['url']
resp.json()

{'url': 'http://minio:9000/askem-staging-data-service/datasets/55a04dc0-4570-4338-8794-c82e408b3fe8/example.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=miniouser%2F20230601%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230601T144319Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=e13e77a65361855b3b75ae357bed72146fe6647aa75b18ec01e77cdf09062c99',
 'method': 'GET'}

In [27]:
resp = requests.get(download_url)
resp.text[:1000]

'date,I,R,D,V,H,I_0-9,I_10-19,I_20-29,I_30-39,I_40-49,I_50-59,I_60-69,I_70-79,N_0-9,N_10-19,N_20-29,N_30-39,N_40-49,N_50-59,N_60-69,N_70-79,N_80-89,N_90-99,N_100+\n2020-12-13,3003438,13294197,253207,30817,0,75981.0,176278.0,299356.0,271365.0,248207.0,238680.0,170763.0,94649.0,40795692,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160\n2020-12-14,3066564,13444501,254791,35355,0,76432.0,176644.0,297050.0,270940.0,248120.0,239206.0,171200.0,95561.0,40795692,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160\n2020-12-15,3074815,13635500,257164,81231,0,77189.0,177292.0,296908.0,271442.0,248301.0,239795.0,172448.0,96284.0,40795692,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160\n2020-12-16,3144789,13842160,259933,236049,0,76929.0,176959.0,293334.0,269014.0,247213.0,238855.0,171159.0,95813.0,40795692,41765338,45180252,45598625,40561654,42008241,39723180,25830576,10732464,2704216,98160\
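
To confirm the round trip beyond eyeballing the first 1000 characters, the downloaded bytes can be compared with the local file by hash. This is a minimal sketch using only the standard library; `sha256_of_file` and `matches_local_file` are hypothetical helpers:

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_local_file(path, downloaded_bytes):
    """Return True if `downloaded_bytes` is byte-identical to the file at `path`."""
    return sha256_of_file(path) == hashlib.sha256(downloaded_bytes).hexdigest()

# Usage (not executed here):
# resp = requests.get(download_url)
# assert matches_local_file('example.csv', resp.content)
```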