# DataFlow API walkthrough
Suhas Somnath <br>
4/6/2022 <br>
Oak Ridge National Laboratory

## 0. Prepare to use DataFlow's API:

1. Generate an API Key from DataFlow's web interface

In [2]:
api_key = "Bearer eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjoyLCJleHAiOjE2ODIzOFA4MDB9.U6QU3a9_b9z879d_iIo9e37Whopkqp9Ha08Gyu0Ep58"

2. Encrypt password(s) necessary to activate Globus endpoints securely

Here, the two Globus endpoints (the DataFlow server and the destination) use the same authentication.

In [3]:
enc_pwd = "8yEtYvltC7RVjIz2o1EghgEZ--vpF6/dv7pkvXZwNV--suXOtctdkvPKnjrBUQoNEg=="

3. Import the ``API`` class from the ``ordflow`` package.

Note that the ``ordflow`` package is now available via PyPI, so you can install it via:

``pip install ordflow``

In [None]:
from ordflow import API
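The API key and encrypted password above are pasted directly into the notebook. As a minimal sketch of an alternative, the helper below reads such values from environment variables instead. The function name ``load_secret`` and the variable names ``DATAFLOW_API_KEY`` and ``DATAFLOW_ENC_PWD`` are hypothetical and not part of ``ordflow``:

```python
import os


def load_secret(var_name, env=None):
    """Return the stripped value of an environment variable, or raise if it is unset."""
    # `env` defaults to the real environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    value = env.get(var_name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return value


# Hypothetical variable names; pick whatever suits your setup:
# api_key = load_secret("DATAFLOW_API_KEY")
# enc_pwd = load_secret("DATAFLOW_ENC_PWD")
```

This keeps credentials out of notebooks that may be shared or committed to version control.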

## 1. Instantiate the API

Instantiate the API object with your personal API Key:

In [5]:
api = API(api_key)

Using staging server as default


## 2. Check default settings

In [6]:
response = api.settings_get()
response

{'globus': {'destination_endpoint': '57230a10-7ba2-11e7-8c3b-22000b9923ef'},
 'transport': {'protocol': 'globus'}}

## 3. Update a default setting

Here, we will switch the destination endpoint to ``olcf#dtn`` for illustration purposes.

In [7]:
response = api.settings_set("globus.destination_endpoint", 
                            "ef1a9560-7ca1-11e5-992c-22000b96db58")
response

{'globus': {'destination_endpoint': 'ef1a9560-7ca1-11e5-992c-22000b96db58'},
 'transport': {'protocol': 'globus'}}

Switch the destination endpoint back to ``cades#CADES-OR``, which is the default:

In [8]:
response = api.settings_set("globus.destination_endpoint", 
                            "57230a10-7ba2-11e7-8c3b-22000b9923ef")
response

{'globus': {'destination_endpoint': '57230a10-7ba2-11e7-8c3b-22000b9923ef'},
 'transport': {'protocol': 'globus'}}
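``settings_set`` addresses a setting with a dotted key such as ``globus.destination_endpoint``, while the response is a nested dictionary. The sketch below reads a value back out of such a response using the same dotted notation. ``get_setting`` is a hypothetical helper written for this walkthrough, not an ``ordflow`` function:

```python
def get_setting(settings, dotted_key):
    """Look up a nested value with dotted notation, e.g. 'globus.destination_endpoint'."""
    value = settings
    for part in dotted_key.split("."):
        value = value[part]  # raises KeyError if any level is missing
    return value


# Settings dictionary copied from the response shown above
settings = {'globus': {'destination_endpoint': '57230a10-7ba2-11e7-8c3b-22000b9923ef'},
            'transport': {'protocol': 'globus'}}

print(get_setting(settings, "globus.destination_endpoint"))
```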

## 4. List and view registered instruments

In [9]:
response = api.instrument_list()
response

[{'id': 1,
  'name': 'Cell Cycler',
  'description': 'Coin cell Cycler with environment chamber',
  'instrument_type': None}]

In [10]:
response = api.instrument_info(1)
response

{'id': 1,
 'name': 'Cell Cycler',
 'description': 'Coin cell Cycler with environment chamber',
 'instrument_type': None}

## 5. Check to see if Globus endpoints are active:

In [11]:
response = api.globus_endpoints_active("57230a10-7ba2-11e7-8c3b-22000b9923ef")
response

{'source_activation': {'code': 'AlreadyActivated'},
 'destination_activation': {'code': 'AutoActivated.CachedCredential'}}

## 6. Activate one or both endpoints as necessary:
Because the destination endpoint did not report ``AlreadyActivated``, we activate that specific endpoint.

**Note**: An encrypted password is being used in place of the conventional password for security reasons.

In [12]:
response = api.globus_endpoints_activate("syz", 
                                         enc_pwd, 
                                         encrypted=True, 
                                         endpoint="destination")
response

{'status': 'ok'}

In [13]:
response = api.globus_endpoints_active()
response

{'source_activation': {'code': 'AlreadyActivated'},
 'destination_activation': {'code': 'AlreadyActivated'}}
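The check-then-activate steps in sections 5 and 6 can be sketched as a small helper that inspects the status response and reports which endpoints still need attention. Treating only the ``AlreadyActivated`` code as "nothing to do" is an assumption drawn from the outputs in this notebook, and ``endpoints_needing_activation`` is a hypothetical name:

```python
def endpoints_needing_activation(status):
    """Return endpoint names ('source', 'destination') whose code is not 'AlreadyActivated'."""
    pending = []
    for key, info in status.items():
        # Assumption: any code other than 'AlreadyActivated' warrants explicit activation.
        if info.get("code") != "AlreadyActivated":
            pending.append(key.replace("_activation", ""))
    return pending


# Status dictionary copied from the first check in section 5
first_check = {'source_activation': {'code': 'AlreadyActivated'},
               'destination_activation': {'code': 'AutoActivated.CachedCredential'}}
print(endpoints_needing_activation(first_check))

# Possible usage with the calls shown above. Only endpoint="destination" appears
# in this notebook; passing "source" the same way is an assumption.
# for name in endpoints_needing_activation(api.globus_endpoints_active()):
#     api.globus_endpoints_activate("syz", enc_pwd, encrypted=True, endpoint=name)
```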

## 7. Create a measurement Dataset
This creates a directory at the destination Globus Endpoint:

In [14]:
response = api.dataset_create("My new dataset with nested metadata",
                               metadata={"Sample": "PZT", 
                                         "Microscope": {
                                             "Vendor": "Asylum Research",
                                             "Model": "MFP3D"
                                             },
                                         "Temperature": 373
                                        }
                              )
response

{'id': 19,
 'name': 'My new dataset with nested metadata',
 'creator': {'id': 2, 'name': 'Suhas Somnath'},
 'dataset_files': [],
 'instrument': None,
 'metadata_field_values': [{'id': 15,
   'field_value': 'PZT',
   'field_name': 'Sample',
   'metadata_field': None},
  {'id': 16,
   'field_value': 'Asylum Research',
   'field_name': 'Microscope-Vendor',
   'metadata_field': None},
  {'id': 17,
   'field_value': 'MFP3D',
   'field_name': 'Microscope-Model',
   'metadata_field': None},
  {'id': 18,
   'field_value': '373',
   'field_name': 'Temperature',
   'metadata_field': None}]}

Getting the dataset ID programmatically to use later on:

In [15]:
dataset_id = response['id']
dataset_id

19
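In the response above, the nested ``Microscope`` dictionary comes back as flat fields named ``Microscope-Vendor`` and ``Microscope-Model``, and the numeric temperature comes back as the string ``'373'``. The sketch below reproduces that naming locally so you can predict the field names for your own metadata. It only mirrors what the response shows; it is not DataFlow's actual implementation, and ``flatten_metadata`` is a hypothetical helper:

```python
def flatten_metadata(metadata, parent="", sep="-"):
    """Flatten nested dictionaries, joining keys with `sep` and converting values to strings."""
    flat = {}
    for key, value in metadata.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the parent name along
            flat.update(flatten_metadata(value, name, sep))
        else:
            flat[name] = str(value)
    return flat


metadata = {"Sample": "PZT",
            "Microscope": {"Vendor": "Asylum Research", "Model": "MFP3D"},
            "Temperature": 373}
print(flatten_metadata(metadata))
```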

## 8. Upload data file(s) to Dataset

In [17]:
response = api.file_upload("./AFM_Topography.PNG", dataset_id)
response

using Globus since other file transfer adapters have not been implemented


{'id': 50,
 'name': 'AFM_Topography.PNG',
 'file_length': 201287,
 'file_type': '',
 'created_at': '2022-04-25 22:33:53 UTC',
 'relative_path': '',
 'is_directory': False}

Upload another data file to the same dataset:

In [18]:
response = api.file_upload("./measurement_configuration.txt", dataset_id, relative_path="foo/bar")
response

using Globus since other file transfer adapters have not been implemented


{'id': 51,
 'name': 'measurement_configuration.txt',
 'file_length': 1172,
 'file_type': '',
 'created_at': '2022-04-25 22:34:01 UTC',
 'relative_path': 'foo/bar',
 'is_directory': False}
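When a whole local folder needs to be uploaded, the ``relative_path`` argument shown above can be derived from each file's location inside that folder. The following is a minimal sketch that only builds the list of ``(local_path, relative_path)`` pairs; ``plan_uploads`` is a hypothetical helper, and the commented loop uses only the two ``file_upload`` call forms demonstrated in this section:

```python
from pathlib import Path


def plan_uploads(root):
    """Return (local_path, relative_path) pairs for every file under `root`."""
    root = Path(root)
    plan = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel_dir = path.parent.relative_to(root).as_posix()
            # Files directly under `root` get an empty relative path
            plan.append((str(path), "" if rel_dir == "." else rel_dir))
    return plan


# Possible usage (requires an instantiated `api` and a valid `dataset_id`):
# for local_path, rel_path in plan_uploads("./my_measurement"):
#     if rel_path:
#         api.file_upload(local_path, dataset_id, relative_path=rel_path)
#     else:
#         api.file_upload(local_path, dataset_id)
```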

## 9. Search Dataset:

In [19]:
response = api.dataset_search("nested")
response

{'total': 2,
 'has_more': False,
 'results': [{'id': 18,
   'created_at': '2022-04-25T22:12:47Z',
   'name': 'Dataset with nested metadata 1',
   'dataset_files': [],
   'metadata_field_values': [{'id': 12,
     'field_value': 'PZT',
     'field_name': 'Sample',
     'metadata_field': None},
    {'id': 13,
     'field_value': 'Asylum Research',
     'field_name': 'Microscope-Vendor',
     'metadata_field': None},
    {'id': 14,
     'field_value': 'MFP3D',
     'field_name': 'Microscope-Model',
     'metadata_field': None}]},
  {'id': 19,
   'created_at': '2022-04-25T22:31:07Z',
   'name': 'My new dataset with nested metadata',
   'dataset_files': [{'id': 50,
     'name': 'AFM_Topography.PNG',
     'file_length': 201287,
     'file_type': '',
     'created_at': '2022-04-25 22:33:53 UTC',
     'relative_path': '',
     'is_directory': False},
    {'id': 51,
     'name': 'measurement_configuration.txt',
     'file_length': 1172,
     'file_type': '',
     'created_at': '2022-04-25 22:34:

Parse the response to get the ID of the dataset we are interested in:

In [20]:
dset_id = response['results'][1]['id']
dset_id

19
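Indexing ``response['results'][1]`` depends on the order in which the search happens to return results. As a hedged alternative, the sketch below selects the dataset by its exact name instead. ``find_dataset_id`` is a hypothetical helper that relies only on the ``results``, ``id``, and ``name`` keys visible in the response above:

```python
def find_dataset_id(search_response, name):
    """Return the ID of the single search result whose name matches exactly."""
    matches = [d["id"] for d in search_response["results"] if d["name"] == name]
    if len(matches) != 1:
        raise LookupError(f"Expected exactly one dataset named {name!r}, found {len(matches)}")
    return matches[0]


# Possible usage with the search response from above:
# dset_id = find_dataset_id(response, "My new dataset with nested metadata")
```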

## 10. View this Dataset:
This view shows both the files and metadata contained in a dataset:

In [21]:
response = api.dataset_info(dset_id)
response

{'id': 19,
 'name': 'My new dataset with nested metadata',
 'creator': {'id': 2, 'name': 'Suhas Somnath'},
 'dataset_files': [{'id': 50,
   'name': 'AFM_Topography.PNG',
   'file_length': 201287,
   'file_type': '',
   'created_at': '2022-04-25 22:33:53 UTC',
   'relative_path': '',
   'is_directory': False},
  {'id': 51,
   'name': 'measurement_configuration.txt',
   'file_length': 1172,
   'file_type': '',
   'created_at': '2022-04-25 22:34:01 UTC',
   'relative_path': 'foo/bar',
   'is_directory': False},
  {'id': 52,
   'name': 'foo',
   'file_length': None,
   'file_type': None,
   'created_at': '2022-04-25 22:34:01 UTC',
   'relative_path': '',
   'is_directory': True},
  {'id': 53,
   'name': 'bar',
   'file_length': None,
   'file_type': None,
   'created_at': '2022-04-25 22:34:01 UTC',
   'relative_path': 'foo',
   'is_directory': True}],
 'instrument': None,
 'metadata_field_values': [{'id': 15,
   'field_value': 'PZT',
   'field_name': 'Sample',
   'metadata_field': None},
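The ``dataset_files`` list mixes files and directories, and each entry stores its location in ``relative_path`` separately from its ``name``. A minimal sketch for turning such a response into a plain list of file paths follows. ``list_file_paths`` is a hypothetical helper that uses only the keys visible in the output above:

```python
import posixpath


def list_file_paths(dataset):
    """Return sorted 'relative_path/name' strings for every non-directory entry."""
    paths = []
    for entry in dataset["dataset_files"]:
        if entry["is_directory"]:
            continue  # skip directory entries such as 'foo' and 'bar'
        paths.append(posixpath.join(entry["relative_path"], entry["name"]))
    return sorted(paths)


# Possible usage with the response from above:
# list_file_paths(response)
```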
 

## 11. View files uploaded via DataFlow:
We're not using DataFlow here but just viewing the destination file system.

Datasets are sorted by date:

In [None]:
!ls -hlt ~/dataflow/untitled_instrument/

There may be more than one dataset per day. Here, we only have one.

In [None]:
!ls -hlt ~/dataflow/untitled_instrument/2022-04-06/

Viewing the root directory of the dataset we just created:

In [None]:
!ls -hlt ~/dataflow/untitled_instrument/2022-04-06/135750_atomic_force_microscopy_scan_of_pzt/

We will very soon be able to specify root-level metadata that will be stored in ``metadata.json``.

We can also see the nested directories: ``foo/bar`` where we uploaded the second file:

In [None]:
!ls -hlt ~/dataflow/untitled_instrument/2022-04-06/135750_atomic_force_microscopy_scan_of_pzt/foo/

Looking at the innermost directory, ``bar``:

In [None]:
!ls -hlt ~/dataflow/untitled_instrument/2022-04-06/135750_atomic_force_microscopy_scan_of_pzt/foo/bar
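The ``!ls`` cells above only work where a shell is available. As a rough Python equivalent, the sketch below lists a directory with the most recently modified entries first, similar in spirit to ``ls -t``. ``list_newest_first`` is a hypothetical helper, and the path in the usage comment is simply the one used in this notebook:

```python
from pathlib import Path


def list_newest_first(directory):
    """Return the names of entries in `directory`, newest modification time first."""
    entries = list(Path(directory).expanduser().iterdir())
    entries.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [p.name for p in entries]


# Possible usage, assuming the destination file system is mounted locally:
# list_newest_first("~/dataflow/untitled_instrument/")
```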