# Tiled Python Client Demonstration

This notebook shows examples of a Python client accessing a tiled data server (running on `localhost`).  The server provides two databroker catalogs (`bdp2022` and `20idb_usaxs`) and selected data files from APS beamlines in a nested directory known to the tiled data server as `directory`.

Show two types of Python client:

- The Python [`requests`](https://requests.readthedocs.io/) package
  (instead of [`urllib.request`](https://docs.python.org/3.11/library/urllib.request.html)
  from the Python Standard Library).
  Python programmers with at least intermediate experience may have already
  used `requests` to *scrape* information from a web page, for reasons
  best [summarized](https://stackoverflow.com/questions/2018026) by others.
- TODO: The [`tiled.client`](https://blueskyproject.io/tiled/reference/python-client.html)
  Python library from the Bluesky Framework.  The `tiled.client` makes it easier to work
  directly with data structures such as numpy arrays, pandas DataFrames, and xarray Datasets.

For each type of client, show some specific queries and responses.

* [x] Find all runs in a catalog between these two ISO8601 dates.
* [x] Find run(s) which match given metadata.
* [x] Get overall metadata from given run.
* [x] What are the data streams in this run?
* [x] What is the metadata for this stream?
* [x] Get the data from the data stream named primary (the canonical main data).

## Structure of Bluesky's databroker data

Databroker structures Bluesky data in three distinct levels, each described below.

- databroker *catalog* : a set of bluesky *runs*
- bluesky *run* : data (& metadata) for a single run, including zero or more *streams*
- run *stream* : a set of related data acquisition events (including metadata)
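As a rough mental model only, the three levels can be pictured as nested dictionaries.  This is a sketch, not how databroker or tiled store the data; every uid, stream name, signal name, and value below is a placeholder invented for illustration.

```python
# Hypothetical, heavily simplified model of the three levels.
# All uids, stream names, signal names, and values are placeholders.
catalog = {
    "run-uid-0001": {  # one bluesky run, keyed by its uid
        "metadata": {
            "start": {"plan_name": "take_image"},
            "stop": {"exit_status": "success"},
        },
        "streams": {
            "primary": {"metadata": {}, "data": {"detector": [11, 25, 17]}},
            "baseline": {"metadata": {}, "data": {"temperature": [295.1, 295.2]}},
        },
    },
}

# Walk the hierarchy: catalog -> run -> stream -> signal.
for uid, run in catalog.items():
    for stream_name, stream in run["streams"].items():
        for signal_name, values in stream["data"].items():
            print(f"{uid} / {stream_name} / {signal_name}: {len(values)} values")
```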

### databroker *catalog*

The Bluesky `databroker` stores data from a Bluesky instrument in a [MongoDB *collection*](https://www.mongodb.com/docs/manual/core/databases-and-collections/) which is called a *catalog*.  A single MongoDB server may host several databroker catalogs.  Tiled will access a catalog with a configuration such as this template called `45id_instrument`:

```yaml
  - path: 45id_instrument
    tree: databroker.mongo_normalized:Tree.from_uri
    args:
      uri: mongodb://DB_SERVER.xray.aps.anl.gov:27017/45id_instrument-bluesky
```

The *catalog* contains data & metadata from all the measurement *run*s.

Note: The MongoDB documentation also describes:

> In MongoDB, *databases* hold one or more collections of documents.

To avoid further name confusion, this will be the only time we speak of a MongoDB *database*.

### bluesky *run*

A bluesky [*run*](https://blueskyproject.io/bluesky/documents.html#overview-of-a-run) contains all the data acquired for a single measurement sequence (such as a step scan).  It is stored by databroker as a set of JSON documents.

The [documents](https://blueskyproject.io/event-model/external.html#the-documents) are structured according to the [Bluesky Event Model](https://blueskyproject.io/event-model), briefly summarized here.

The `start` document contains metadata keys describing the run as it begins, such as the assigned [`uid`](https://docs.python.org/3/library/uuid.html#uuid.uuid4) (used by databroker to quickly locate the run) and the `time` (number of seconds since 1970-01-01 00:00:00 UTC).  Additional metadata keys may be supplied by the acquisition sequence code (the *plan*), the instrument, and/or the user.
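To make the `time` key concrete, here is a minimal sketch that converts an epoch value into a readable, timezone-aware timestamp using only the standard library.  The number is the `time` value from the example `start` document shown later in this notebook; the variable names are chosen for illustration.

```python
from datetime import datetime, timezone

# `time` value copied from the example start document shown later in this notebook.
start_time_epoch = 1669223852.495794

# Interpret the epoch seconds as UTC so the result does not depend on the local time zone.
started_utc = datetime.fromtimestamp(start_time_epoch, tz=timezone.utc)
print(started_utc.isoformat())
```

The result is 17:17:32 UTC on 2022-11-23, which corresponds to the 11:17:32 local start time reported for that run.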

The `stop` document contains metadata keys describing how the run ended, including any data streams that were recorded.

The tiled server adds an additional `summary` metadata key to the run that includes content from both `start` and `stop`, including a `datetime` key that is a text version of the date & time when the run started.

There are a couple other document types, used for [external file resources](https://blueskyproject.io/bluesky/hardware.html#external-asset-writing-interface) such as area detector images.

### bluesky data *stream*

During a bluesky run, data is acquired into streams, each with its own name.  It is common for the main data to be acquired in a stream named `primary`.  A stream will have one or more `descriptor` documents and zero or more `event` documents.  The `descriptor` documents, combined by the tiled server into the stream's metadata, describe the signals measured in the stream.  The `event` documents contain each of the time-stamped data acquisition events in the stream.  The tiled server gathers these events together and provides data arrays organized by the signal name.
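The regrouping of events into per-signal arrays can be sketched in a few lines.  This is not the tiled server's implementation, only an illustration of the idea; the event documents below are hypothetical and carry just the keys needed here (`seq_num`, `time`, `data`), with invented signal names.

```python
from collections import defaultdict

# Hypothetical event documents (only the keys needed for this sketch).
events = [
    {"seq_num": 1, "time": 100.0, "data": {"motor": 0.0, "detector": 11}},
    {"seq_num": 2, "time": 101.0, "data": {"motor": 0.5, "detector": 25}},
    {"seq_num": 3, "time": 102.0, "data": {"motor": 1.0, "detector": 17}},
]


def gather_by_signal(event_docs):
    """Collect the values of each signal, in event order, into one list per signal."""
    arrays = defaultdict(list)
    for event in sorted(event_docs, key=lambda doc: doc["seq_num"]):
        for signal_name, value in event["data"].items():
            arrays[signal_name].append(value)
    return dict(arrays)


stream_data = gather_by_signal(events)
print(stream_data)
```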

### metadata

As described above, there are two types of metadata for each run, the *run* metadata and the *stream* metadata.  See the two sections immediately preceding for details.

## `requests` Client

We'll use the `requests` package to query the tiled server's HTTP API by assembling a URI string.  Since the URI consists of several parts, we'll build that string up from its parts in each example below.

The tiled server response is [JSON](https://www.json.org/) formatted, which is readable as a Python dictionary.  We'll let Python handle and report any Exceptions that might occur.  Here, we just import the `requests` package.

The calls will look like:

```py
r = requests.get(uri).json()
```

where `r` is a Python dictionary of the results of the search request.

In [1]:
import requests

As a convenience, make a function that converts a string representation of the date and time in ISO 8601 format into the floating-point Unix epoch time (seconds since 1970-01-01 00:00:00 UTC) needed for tiled's API.  This makes a long function call much shorter.

In [2]:
import datetime

def iso_to_ts(isotime):
    return datetime.datetime.fromisoformat(isotime).timestamp()
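One caveat: when the ISO 8601 string carries no UTC offset, `datetime.fromisoformat()` returns a naive object and `.timestamp()` interprets it in the local time zone of the machine running the client.  A minimal sketch of a reproducible call passes an explicit offset.  The `-05:00` offset (US Central Daylight Time) is an assumption chosen here because it reproduces the epoch values that appear in the URI printed later in this notebook.

```python
import datetime


def iso_to_ts(isotime):
    """Convert an ISO 8601 string to seconds since the Unix epoch."""
    return datetime.datetime.fromisoformat(isotime).timestamp()


# With an explicit UTC offset, the result is the same on any machine.
print(iso_to_ts("2022-05-01T00:00:00-05:00"))  # 1651381200.0
print(iso_to_ts("2022-11-01T00:00:00-05:00"))  # 1667278800.0
```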

We'll search the BDP project's databroker catalog, known to the tiled server (running on workstation `localhost` on port `8000`) by the text name `bdp2022`.

In [3]:
server = "localhost"
port = 8000
catalog = "bdp2022"

### Find runs within range of dates

Define the ends of the time span for the search query:

In [4]:
# Find all runs in a catalog between these two ISO8601 dates.
start_time = "2022-05-01"
end_time = "2022-11-01"
tz = "US/Central"

Using the `requests` package, ask the tiled server for all runs in the catalog that match the time range.

Here, we build up the URI suffix in parts to expose how the search query is constructed.  The response is a Python dictionary.  We won't print the entire dictionary here since it is likely too large to show in full.

In [5]:
uri = (
    f"http://{server}:{port}"  # standard prefix
    "/api/v1/node/search"  # API command
    f"/{catalog}"  # catalog
    "?"  # begin any command options
    "page[limit]=0"  # 0: all matching
    "&"  # separator between any additional options
    f"filter[time_range][condition][since]={iso_to_ts(start_time)}"
    f"&filter[time_range][condition][until]={iso_to_ts(end_time)}"
    f"&filter[time_range][condition][timezone]={tz}"
    "&sort=time"
)
print(f"{uri=}")
r = requests.get(uri).json()

uri='http://localhost:8000/api/v1/node/search/bdp2022?page[limit]=0&filter[time_range][condition][since]=1651381200.0&filter[time_range][condition][until]=1667278800.0&filter[time_range][condition][timezone]=US/Central&sort=time'
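As an alternative to concatenating the query by hand, the options can be kept in a dictionary and encoded with `urllib.parse.urlencode` from the standard library.  This is only a sketch: the `params` name is chosen for illustration, and `safe="[]/"` is passed so the brackets and the slash stay unescaped and the result matches the URI printed above.

```python
from urllib.parse import urlencode

server = "localhost"
port = 8000
catalog = "bdp2022"

# Same query options as above; epoch values copied from the printed URI.
params = {
    "page[limit]": 0,  # 0: all matching
    "filter[time_range][condition][since]": 1651381200.0,
    "filter[time_range][condition][until]": 1667278800.0,
    "filter[time_range][condition][timezone]": "US/Central",
    "sort": "time",
}

# Keep "[", "]", and "/" unescaped so the URI reads like the hand-built one.
query = urlencode(params, safe="[]/")
uri = f"http://{server}:{port}/api/v1/node/search/{catalog}?{query}"
print(uri)
```

Keeping the options in a dictionary makes it easier to add or drop a filter without tracking the `?` and `&` separators.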


Summarize the results (in object `r`):

In [6]:
def print_results_summary(r):
    """We'll use this a few times."""
    xref = dict(First=0, Last=-1)
    for k, v in xref.items():
        md = r["data"][v]["attributes"]["metadata"]
        # md keys: start  stop  summary
        # summary key is composed by tiled server
        plan_name = md["summary"]["plan_name"]
        scan_id = md["summary"]["scan_id"]
        started = md["summary"]["datetime"]
        print(f"{k:5s} run: {started=} {scan_id=} {plan_name=}")

In [7]:
print(f'Search of {catalog=} has {len(r["data"])} runs.')
print_results_summary(r)

Search of catalog='bdp2022' has 397 runs.
First run: started='2022-05-03T08:37:21.510276' scan_id=1596 plan_name='take_image'
Last  run: started='2022-09-08T13:54:25.178280' scan_id=1960 plan_name='push_images'
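Note that indexing `r["data"][0]` raises an `IndexError` when a search matches nothing.  The sketch below shows one way to guard against that, using a hypothetical, trimmed-down response dictionary: the `id` values are placeholders, the summary values are copied from the output above, and `first_and_last` is a helper name invented for this illustration.

```python
# Hypothetical, trimmed-down search response; ids are placeholders and
# the summary values are copied from the output above.
response = {
    "data": [
        {
            "id": "uid-of-first-run",
            "attributes": {"metadata": {"summary": {
                "plan_name": "take_image",
                "scan_id": 1596,
                "datetime": "2022-05-03T08:37:21.510276",
            }}},
        },
        {
            "id": "uid-of-last-run",
            "attributes": {"metadata": {"summary": {
                "plan_name": "push_images",
                "scan_id": 1960,
                "datetime": "2022-09-08T13:54:25.178280",
            }}},
        },
    ]
}


def first_and_last(response):
    """Return the summaries of the first and last runs, or [] if nothing matched."""
    runs = response.get("data", [])
    if not runs:
        return []
    return [runs[i]["attributes"]["metadata"]["summary"] for i in (0, -1)]


for summary in first_and_last(response):
    print(summary["scan_id"], summary["plan_name"], summary["datetime"])

# An empty result is handled without raising IndexError.
print(first_and_last({"data": []}))
```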


### Find runs matching a given plan name

Find run(s) which match some given metadata.  In this search, let's find all the runs that match a given `plan_name`.  Let's use the `take_image` plan from the previous results.

In [8]:
plan_name = "take_image"
print(f"Search for {plan_name=}")

uri = (
    f"http://{server}:{port}"
    "/api/v1/node/search"
    f"/{catalog}"
    "?page[limit]=0"  # 0: all matching
    "&filter[eq][condition][key]=plan_name"
    f'&filter[eq][condition][value]="{plan_name}"'
    "&sort=time"
)
print(f"{uri=}")
r = requests.get(uri).json()

Search for plan_name='take_image'
uri='http://localhost:8000/api/v1/node/search/bdp2022?page[limit]=0&filter[eq][condition][key]=plan_name&filter[eq][condition][value]="take_image"&sort=time'
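The double quotes around `take_image` in the URI suggest that the value is passed as a JSON-encoded literal.  Assuming that is the case (it is inferred from the example above, not confirmed here), `json.dumps` can produce the quoting for strings and leave numbers bare.  The `eq_filter` helper is hypothetical, written only for this sketch.

```python
import json


def eq_filter(key, value):
    """Build the query fragment for an equality filter (hypothetical helper).

    Assumes the server expects the value as a JSON literal, as the
    quoted string in the example above suggests.
    """
    return (
        f"&filter[eq][condition][key]={key}"
        f"&filter[eq][condition][value]={json.dumps(value)}"
    )


print(eq_filter("plan_name", "take_image"))
print(eq_filter("scan_id", 1596))
```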


In [9]:
print(f'Search of {catalog=} has {len(r["data"])} runs.')
print_results_summary(r)

Search of catalog='bdp2022' has 1029 runs.
First run: started='2022-03-22T16:48:41.130881' scan_id=1 plan_name='take_image'
Last  run: started='2022-08-30T15:06:37.662096' scan_id=1959 plan_name='take_image'


### Find runs with given text

Find run(s) that include some text in the metadata.  Here, let's find all the runs that have the text `M9 demo`.

In [10]:
search_text = "M9 demo"
case_sensitive = True
uri = (
    f"http://{server}:{port}"
    "/api/v1/node/search"
    f"/{catalog}"
    "?page[limit]=0"  # 0: all matching
    f"&filter[fulltext][condition][text]={search_text}"
    f"&filter[fulltext][condition][case_sensitive]={str(case_sensitive).lower()}"
    "&sort=time"
)

print(f"{uri=}")
r = requests.get(uri).json()

print(f'Search of {catalog=} has {len(r["data"])} runs.')
print_results_summary(r)

uri='http://localhost:8000/api/v1/node/search/bdp2022?page[limit]=0&filter[fulltext][condition][text]=M9 demo&filter[fulltext][condition][case_sensitive]=true&sort=time'
Search of catalog='bdp2022' has 375 runs.
First run: started='2022-07-15T23:14:54.974411' scan_id=1 plan_name='push_images'
Last  run: started='2022-11-23T11:17:32.495794' scan_id=2035 plan_name='m9_push_images'


### Show a run's metadata

Let's show the various metadata available from a Bluesky *run*.  We'll use the last run from the previous search.

```
http://localhost:8000/api/v1/node/metadata/bdp2022/a1233634-1259-438f-b9f0-f77c26f48f54
```

In [11]:
run = r["data"][-1]  # most recent run from previous results

The `run` object is a dictionary.  The interesting keys are:

key | content
:--- | :---
`id` | `uid`: unique identifier of this `run` (used by databroker to locate the run)
`attributes` | contents of this `run`

The `attributes` contents are a dictionary with these interesting keys (there are other keys, as well):

key | content
:--- | :---
`metadata` | metadata dictionary of this `run`

The `metadata` dictionary has these keys:

key | content
:--- | :---
`start` | Metadata created as the run started (includes user-supplied, scan-specific, facility-specific, and bluesky metadata).  The `start` dictionary keys will vary between runs and catalogs.  Only a few are expected, including: `uid`, `time`, & `versions`.
`stop` | Metadata about how the run ended (exit status and reason if problem, stream names, end time stamp)
`summary` | tiled server provides this additional high-level summary with ISO8601 start date and run duration

Note: the run's data streams are obtained by a different query, using the run's `uid`.  Keep track of the `uid` for that reason.

To show the structure of this dictionary, we let Python display the object's value.

In [12]:
run["attributes"]["metadata"]

{'start': {'uid': 'a1233634-1259-438f-b9f0-f77c26f48f54',
  'time': 1669223852.495794,
  'versions': {'apstools': '1.6.9.dev89+gbf712d9',
   'bluesky': '1.10.0',
   'bluesky_queueserver': '0.0.18',
   'databroker': '1.2.5',
   'epics': '3.5.0',
   'h5py': '3.7.0',
   'matplotlib': '3.6.2',
   'numpy': '1.23.4',
   'ophyd': '1.7.0',
   'pyRestTable': '2020.0.6',
   'spec2nexus': '2021.2.4'},
  'databroker_catalog': 'bdp2022',
  'login_id': 'bdp@mona3.xray.aps.anl.gov',
  'beamline_id': 'BDP',
  'instrument_name': 'APS-U Beamline Data Pipelines project in 2022',
  'proposal_id': 'bdp2022',
  'milestone': 'BDP M9 demo',
  'pid': 61262,
  'scan_id': 2035,
  'plan_type': 'generator',
  'plan_name': 'm9_push_images',
  'purpose': 'publish image frames via PVaccess',
  'num_images': 12000,
  'frame_rate': 1000.0,
  'datetime': '2022-11-23 11:17:32.480612',
  'client': '/clhome/BDP/DM/workflows/example-06/qserver_client.py',
  'session': 'M9 demo'},
 'stop': {'run_start': 'a1233634-1259-438f-b
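Because the `start` keys vary between runs and catalogs, it can help to read the expected keys directly and the optional ones with `dict.get()`.  The sketch below uses an excerpt of the dictionary shown above (most keys omitted); the `title` key is hypothetical and is included only to show the fallback.

```python
# Excerpt of the run dictionary shown above (most keys omitted).
run = {
    "id": "a1233634-1259-438f-b9f0-f77c26f48f54",
    "attributes": {
        "metadata": {
            "start": {
                "uid": "a1233634-1259-438f-b9f0-f77c26f48f54",
                "time": 1669223852.495794,
                "scan_id": 2035,
                "plan_name": "m9_push_images",
                "session": "M9 demo",
            }
        }
    },
}

start = run["attributes"]["metadata"]["start"]

# Keys expected in every start document can be read directly.
uid = start["uid"]

# Optional keys: fall back to a default instead of raising KeyError.
plan_name = start.get("plan_name", "(unknown plan)")
title = start.get("title", "(no title)")  # "title" is a hypothetical key

print(f"{uid=} {plan_name=} {title=}")
```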

### What data streams are available with this run?

Use the last `take_image` run.  The stream names are in the `stop` metadata, where the number of data events is shown for each stream.

```json
                "stop": {
                    "exit_status": "success",
                    "num_events": {
                        "primary": 1
                    },
                    "reason": "",
                    "run_start": "a4edf4b3-8a12-4724-b817-fd45958488da",
                    "time": 1661889999.67191,
                    "uid": "0cb0e643-877f-41b3-96b2-7adf75bef657"
                },
```

In [13]:
uri = (
    f"http://{server}:{port}"
    "/api/v1/node/search"
    f"/{catalog}"
    "?page[limit]=0"  # 0: all matching
    "&filter[eq][condition][key]=plan_name"
    '&filter[eq][condition][value]="take_image"'
    "&sort=time"
)
print(f"{uri=}")
r = requests.get(uri).json()

stop_md = r["data"][-1]["attributes"]["metadata"]["stop"]
streams = list(stop_md["num_events"].keys())
uid = stop_md["run_start"]
print(f'Run {uid=} has {len(streams)} streams: {streams=}')

uri='http://localhost:8000/api/v1/node/search/bdp2022?page[limit]=0&filter[eq][condition][key]=plan_name&filter[eq][condition][value]="take_image"&sort=time'
Run uid='a4edf4b3-8a12-4724-b817-fd45958488da' has 1 streams: streams=['primary']


### What is the metadata for the `primary` stream of this run?

The stream metadata has information about each of the signals acquired in the stream: PV names, units, limits, data type and shape, and the timestamps when each of these values was received.

In [14]:
stream_name = "primary"
uri = (
    f"http://{server}:{port}"
    "/api/v1/node/metadata"
    f"/{catalog}"
    f"/{uid}"
    f"/{stream_name}"
)
print(f"{uri=}")
r = requests.get(uri).json()
print(f"{catalog=} {uid=} metadata for {stream_name=}:")
r['data']['attributes']['metadata']

uri='http://localhost:8000/api/v1/node/metadata/bdp2022/a4edf4b3-8a12-4724-b817-fd45958488da/primary'
catalog='bdp2022' uid='a4edf4b3-8a12-4724-b817-fd45958488da' metadata for stream_name='primary':


{'descriptors': [{'run_start': 'a4edf4b3-8a12-4724-b817-fd45958488da',
   'time': 1661889999.655602,
   'data_keys': {'adsimdet_image': {'shape': [1, 1024, 1024],
     'source': 'PV:bdpSimExample:',
     'dtype': 'array',
     'external': 'FILESTORE:',
     'object_name': 'adsimdet'}},
   'uid': 'f78f5d29-36dc-49e2-9e9a-8eee6120873b',
   'configuration': {'adsimdet': {'data': {'adsimdet_cam_acquire_period': 0.251,
      'adsimdet_cam_acquire_time': 0.25,
      'adsimdet_cam_image_mode': 0,
      'adsimdet_cam_manufacturer': 'Simulated detector',
      'adsimdet_cam_model': 'Basic simulator',
      'adsimdet_cam_num_exposures': 1,
      'adsimdet_cam_num_images': 1,
      'adsimdet_cam_trigger_mode': 0},
     'timestamps': {'adsimdet_cam_acquire_period': 1661889997.531498,
      'adsimdet_cam_acquire_time': 1661889997.520722,
      'adsimdet_cam_image_mode': 1661889997.5001307,
      'adsimdet_cam_manufacturer': 1661889153.029876,
      'adsimdet_cam_model': 1661889153.029901,
      'ad
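A compact way to see which signals a stream holds is to walk the `data_keys` of each descriptor.  The sketch below uses an excerpt of the metadata shown above (most keys omitted); `list_signals` is a helper name invented here.  In the event model, the presence of an `external` key marks data that is stored outside the documents, such as image files.

```python
# Excerpt of the stream metadata shown above (most keys omitted).
stream_md = {
    "descriptors": [
        {
            "data_keys": {
                "adsimdet_image": {
                    "shape": [1, 1024, 1024],
                    "source": "PV:bdpSimExample:",
                    "dtype": "array",
                    "external": "FILESTORE:",
                    "object_name": "adsimdet",
                }
            }
        }
    ]
}


def list_signals(stream_md):
    """Return (name, dtype, shape, is_external) for every signal in the stream."""
    signals = []
    for descriptor in stream_md["descriptors"]:
        for name, info in descriptor["data_keys"].items():
            signals.append(
                (name, info["dtype"], tuple(info["shape"]), "external" in info)
            )
    return signals


for name, dtype, shape, is_external in list_signals(stream_md):
    print(f"{name}: {dtype=} {shape=} {is_external=}")
```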

### Get the data from the data stream named primary (the canonical main data).

To get the data, we need a different API endpoint, `/api/v1/array/full` (so far, we have been using the metadata search endpoint, `/api/v1/node/search`), and we must specify the format of the result.  One format is `json`.  Let's pick some data from the M6 demo (matching the text `M6-gallery`) that has a very long (1-D) list of floating-point numbers.

In [15]:
search_text = "M6-gallery"
case_sensitive = True
uri = (
    f"http://{server}:{port}"
    "/api/v1/node/search"
    f"/{catalog}"
    "?page[limit]=0"  # 0: all matching
    f"&filter[fulltext][condition][text]={search_text}"
    f"&filter[fulltext][condition][case_sensitive]={str(case_sensitive).lower()}"
    "&sort=time"
)
print(f"{uri=}")
r = requests.get(uri).json()

uri='http://localhost:8000/api/v1/node/search/bdp2022?page[limit]=0&filter[fulltext][condition][text]=M6-gallery&filter[fulltext][condition][case_sensitive]=true&sort=time'


Next, pick the last run (`r["data"][-1]`):

In [16]:
uid = r["data"][-1]["attributes"]["metadata"]["start"]["uid"]
data_format = "json"
stream_name = "primary"
data_name = "adpvadet_pva1_execution_time"

uri = (
    f"http://{server}:{port}"
    "/api/v1/array/full"
    f"/{catalog}"
    f"/{uid}"
    f"/{stream_name}"
    "/data"
    f"/{data_name}"
    f"?format={data_format}"
)
print(f"{uri=}")
arr = requests.get(uri).json()

uri='http://localhost:8000/api/v1/array/full/bdp2022/ae762f9c-4933-4aa4-a720-147f4aaab6fd/primary/data/adpvadet_pva1_execution_time?format=json'


In [17]:
print(f"{len(arr)=} {min(arr)=} {max(arr)=}")

len(arr)=10261 min(arr)=0.213429 max(arr)=2.024461
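Since the same URI pattern is needed for every array in a run, the assembly could be wrapped in a small function.  This is a sketch; `array_uri` is a hypothetical helper, and the path layout is simply the one used in the cell above.

```python
def array_uri(server, port, catalog, uid, stream_name, data_name, data_format="json"):
    """Assemble the URI for one full array in a run's stream (hypothetical helper)."""
    return (
        f"http://{server}:{port}"
        "/api/v1/array/full"
        f"/{catalog}/{uid}/{stream_name}"
        "/data"
        f"/{data_name}"
        f"?format={data_format}"
    )


uri = array_uri(
    "localhost",
    8000,
    "bdp2022",
    "ae762f9c-4933-4aa4-a720-147f4aaab6fd",
    "primary",
    "adpvadet_pva1_execution_time",
)
print(uri)
```

The result can then be passed to `requests.get(uri).json()` exactly as before.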


### Get data from a file served by *tiled*.

This file is in the catalog that our tiled server names `directory`.  It happens to be a NeXus (HDF5) data file from the [NeXus documentation](https://manual.nexusformat.org/examples/python/index.html#simple-example-plot).

First find the signal names.  Since this is not a `BlueskyRun`, the results dictionary is a bit different.  Instead of a `data` key (as above), the information we seek is in the `contents` key.

In [18]:
uri = "http://localhost:8000/api/v1/node/full/directory/hdf5/writer_1_3.h5/Scan/data?format=json"
print(f"{uri=}")
r = requests.get(uri).json()
r["contents"]

uri='http://localhost:8000/api/v1/node/full/directory/hdf5/writer_1_3.h5/Scan/data?format=json'


{'counts': {'contents': {},
  'metadata': {'axes': 'two_theta', 'signal': '1', 'units': 'counts'}},
 'two_theta': {'contents': {}, 'metadata': {'units': 'degrees'}}}

Get the `counts` array.

In [19]:
uri = "http://localhost:8000/api/v1/array/full/directory/hdf5/writer_1_3.h5/Scan/data/counts?format=application/json&slice=0:31"
print(f"{uri=}")
counts = requests.get(uri).json()

uri='http://localhost:8000/api/v1/array/full/directory/hdf5/writer_1_3.h5/Scan/data/counts?format=application/json&slice=0:31'


Get the `two_theta` array.

In [20]:
uri = "http://localhost:8000/api/v1/array/full/directory/hdf5/writer_1_3.h5/Scan/data/two_theta?format=application/json&slice=0:31"
print(f"{uri=}")
tth = requests.get(uri).json()

uri='http://localhost:8000/api/v1/array/full/directory/hdf5/writer_1_3.h5/Scan/data/two_theta?format=application/json&slice=0:31'


Print the data in two columns.

In [21]:
print("two_theta  counts")
for x, y in zip(tth, counts):
    print(f"{x}  {y}")

two_theta  counts
17.92608  1037
17.92591  1318
17.92575  1704
17.92558  2857
17.92541  4516
17.92525  9998
17.92508  23819
17.92491  31662
17.92475  40458
17.92458  49087
17.92441  56514
17.92425  63499
17.92408  66802
17.92391  66863
17.92375  66599
17.92358  66206
17.92341  65747
17.92325  65250
17.92308  64129
17.92291  63044
17.92275  60796
17.92258  56795
17.92241  51550
17.92225  43710
17.92208  29315
17.92191  19782
17.92175  12992
17.92158  6622
17.92141  4198
17.92125  2248
17.92108  1321
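For easier reading, the columns can be aligned with format specifications.  A minimal sketch, using the first few rows of the output above as stand-in data (the field widths are an arbitrary choice):

```python
# First few rows copied from the output above.
tth = [17.92608, 17.92591, 17.92575, 17.92558]
counts = [1037, 1318, 1704, 2857]

print(f"{'two_theta':>10s}  {'counts':>6s}")
lines = [f"{x:10.5f}  {y:>6d}" for x, y in zip(tth, counts)]
print("\n".join(lines))
```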


## `tiled.client` Client

In [22]:
from tiled.client import from_uri
from tiled.client.cache import Cache
import tiled.queries
from tiled.utils import tree

TODO

Until then, look at the [Python code file](./pyapi_client.py), used for development.