# Research Object Composer tutorial

This is a [Jupyter Notebook](https://jupyter.org/) demonstrating how a client can use the [Research Object Composer](https://github.com/researchobject/research-object-composer) REST API.

For requirements to run this notebook interactively, see the [README](https://github.com/ResearchObject/research-object-composer/blob/master/README.md). 

The [RO Composer API](https://researchobject.github.io/research-object-composer/api/) is documented using [Swagger OpenAPI](https://swagger.io/docs/specification/about/) 2.0, which means the REST API can be integrated into many programming languages. This notebook uses [Python](https://www.python.org/) directly so that the HTTP details are not hidden.

To execute each cell when running this notebook, select each in order, then click the **▶️Run** button above.

## Python requirements

For the examples below we'll use the Python library [requests](https://pypi.org/project/requests/) to show the HTTP interactions. The examples assume a basic knowledge of [REST services](https://en.wikipedia.org/wiki/Representational_state_transfer).

If the `import` below does not work, run `pip install requests` on the command line where you started Jupyter Notebook.

In [2]:
import requests

RO Composer is meant to be installed on local infrastructure or as a cloud service. The examples below use a demo service hosted by The University of Manchester, which is not supported and may become unavailable in the future.

If you are testing the service locally using _Docker Compose_ (see [README](https://github.com/ResearchObject/research-object-composer/blob/master/README.md)), change `host` below to `http://localhost:8080`, or use the equivalent server name if you are hosting it as a cloud service.

In [3]:
host = "http://openphacts.cs.man.ac.uk:8080"

## Profiles

The RO Composer supports creating research objects for multiple **profiles**. Each profile is [defined internally](https://github.com/ResearchObject/research-object-composer/tree/master/src/main/resources/public/schemas) using [JSON Schema](https://json-schema.org/), but we can query the `/profiles` service to see which profiles are installed:


In [7]:
r = requests.get(host + "/profiles")
r.status_code

200

HTTP status code `200` means **OK**, so let's check the _content type_ of the result:

In [9]:
r.headers["Content-Type"]

'application/hal+json;charset=UTF-8'
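
The `Content-Type` value above carries a `charset` parameter after the media type, so comparing the whole header against a fixed string is fragile. A minimal sketch of a more tolerant check follows. The helper `media_type` is hypothetical and not part of `requests` or RO Composer.

```python
def media_type(content_type):
    """Return the bare media type of a Content-Type header value, lower-cased."""
    # Drop any parameters such as ";charset=UTF-8" that follow the media type
    return content_type.split(";", 1)[0].strip().lower()

# Header value copied from the demo service response above
header = "application/hal+json;charset=UTF-8"
is_hal = media_type(header) == "application/hal+json"
```

In the notebook this could be applied as `media_type(r.headers["Content-Type"])` before calling `r.json()`.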

The API results from RO Composer are JSON that follows the Hypertext Application Language ([HAL](http://stateless.co/hal_specification.html)) patterns for RESTful services. Let's look at the content:

In [47]:
r.json()

{'_embedded': {'researchObjectProfileList': [{'id': 1,
    'name': 'data_bundle',
    'fields': ['data', '_metadata'],
    '_links': {'self': {'href': 'http://openphacts.cs.man.ac.uk:8080/profiles/data_bundle'},
     'schema': {'href': 'http://openphacts.cs.man.ac.uk:8080/schemas/data_bundle.schema.json'},
     'researchObjects': {'href': 'http://openphacts.cs.man.ac.uk:8080/profiles/data_bundle/research_objects'}}},
   {'id': 2,
    'name': 'draft_task',
    'fields': ['input', 'workflow', 'workflow_params', '_metadata'],
    '_links': {'self': {'href': 'http://openphacts.cs.man.ac.uk:8080/profiles/draft_task'},
     'schema': {'href': 'http://openphacts.cs.man.ac.uk:8080/schemas/draft_task.schema.json'},
     'researchObjects': {'href': 'http://openphacts.cs.man.ac.uk:8080/profiles/draft_task/research_objects'}}}]},
 '_links': {'self': {'href': 'http://openphacts.cs.man.ac.uk:8080/profiles'}}}

In HAL, the `_links` section contains related REST resources, in this case only `self`, whose `href` refers back to the HTTP resource we just requested.

The `_embedded` section contains additional REST resources whose properties are partially embedded. Within `researchObjectProfileList` we therefore find the different profiles supported by this service. Let's look at their `name` fields:

In [48]:
profiles = r.json()["_embedded"]['researchObjectProfileList']
[p["name"] for p in profiles]

['data_bundle', 'draft_task']

In this installation, the profile `data_bundle` is for Research Objects containing arbitrary datasets, while `draft_task` is for more specific ROs describing workflow executions. We'll look at the first in detail and see that it expects only the fields `data` and `_metadata`:

In [49]:
bundle_profile = profiles[0]
bundle_profile["fields"]

['data', '_metadata']
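
Picking `profiles[0]` relies on the order in which the service happens to list its profiles. As a sketch of an alternative, the embedded profiles can be indexed by their `name`. The helper `profiles_by_name` is a hypothetical name introduced here, and the sample document is an abbreviated copy of the response shown earlier.

```python
def profiles_by_name(profiles_doc):
    """Index the embedded profiles of a /profiles HAL document by name."""
    embedded = profiles_doc.get("_embedded", {})
    return {p["name"]: p for p in embedded.get("researchObjectProfileList", [])}

# Abbreviated copy of the /profiles response shown earlier (links omitted)
sample = {"_embedded": {"researchObjectProfileList": [
    {"id": 1, "name": "data_bundle", "fields": ["data", "_metadata"]},
    {"id": 2, "name": "draft_task",
     "fields": ["input", "workflow", "workflow_params", "_metadata"]},
]}}

by_name = profiles_by_name(sample)
bundle_fields = by_name["data_bundle"]["fields"]
```

In the notebook, `profiles_by_name(r.json())["data_bundle"]` would select the same profile as `profiles[0]` does on the demo service.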

We can request the underlying [JSON Schema](https://json-schema.org/) to see details of these fields at `/schemas/{name}`, which is linked from `schema` under our profile's `_links`.

In [55]:
links = bundle_profile["_links"]
links

{'self': {'href': 'http://openphacts.cs.man.ac.uk:8080/profiles/data_bundle'},
 'schema': {'href': 'http://openphacts.cs.man.ac.uk:8080/schemas/data_bundle.schema.json'},
 'researchObjects': {'href': 'http://openphacts.cs.man.ac.uk:8080/profiles/data_bundle/research_objects'}}

In [54]:
schema = links["schema"]
schema

{'href': 'http://openphacts.cs.man.ac.uk:8080/schemas/data_bundle.schema.json'}
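
Looking up a link relation and then its `href` is a pattern that recurs with every HAL resource. A minimal sketch of a helper for it follows, assuming each relation holds a single link object as in the responses above (HAL also allows an array of link objects, which this sketch does not handle). The name `link_href` is hypothetical.

```python
def link_href(resource, rel):
    """Return the href of link relation `rel` in a HAL resource."""
    try:
        return resource["_links"][rel]["href"]
    except KeyError:
        # Report the missing relation by name rather than a bare key
        raise KeyError(f"HAL resource has no {rel!r} link") from None

# Links copied from the data_bundle profile shown above
profile = {"_links": {
    "self": {"href": "http://openphacts.cs.man.ac.uk:8080/profiles/data_bundle"},
    "schema": {"href": "http://openphacts.cs.man.ac.uk:8080/schemas/data_bundle.schema.json"},
}}

schema_url = link_href(profile, "schema")
```

With this helper, the two preceding cells collapse to `link_href(bundle_profile, "schema")`.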

In [56]:
schema_response = requests.get(schema["href"])
schema_response.json()

{'$schema': 'http://json-schema.org/draft-07/schema',
 'type': 'object',
 '$baggable': {'data': '/'},
 'properties': {'_metadata': {'$ref': '/schemas/_base.schema.json#/definitions/Metadata'},
  'data': {'type': 'array',
   'items': {'$ref': '/schemas/_base.schema.json#/definitions/RemoteItem'}}}}

You may notice that the JSON Schema defines the `_metadata` and `data` keys by referencing a [base schema](https://github.com/ResearchObject/research-object-composer/blob/master/src/main/resources/public/schemas/_base.schema.json) that is common to all Research Objects. However, we do not need to learn the details of the profile's JSON Schema, as the RO Composer will make individual REST resources for each field.
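
For readers who do want to follow a `$ref` to the base schema, the sketch below resolves it against the URL the profile schema was fetched from. It assumes the schema declares no `$id` of its own, so the retrieval URL serves as the base URI. Only Python's standard `urllib.parse` is used.

```python
from urllib.parse import urljoin, urldefrag

# URL the profile schema was retrieved from, and a $ref found inside it
schema_url = "http://openphacts.cs.man.ac.uk:8080/schemas/data_bundle.schema.json"
ref = "/schemas/_base.schema.json#/definitions/Metadata"

# Resolve the reference against the schema's own URL
absolute = urljoin(schema_url, ref)

# Split into the document to download and the JSON Pointer within it
base_url, pointer = urldefrag(absolute)
```

Here `base_url` is the address of the base schema document, which could be fetched with `requests.get(base_url)`, and `pointer` names the definition inside it.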

The REST resource that collects [research objects for the given profile](https://researchobject.github.io/research-object-composer/api/#operation/listResearchObjectsForProfile) is at `/profiles/{name}/research_objects` and is linked from the `researchObjects` link of the profile:



In [60]:
researchObjects = links["researchObjects"]
researchObjects

{'href': 'http://openphacts.cs.man.ac.uk:8080/profiles/data_bundle/research_objects'}

This resource supports creation using [POST](https://researchobject.github.io/research-object-composer/api/#operation/createResearchObject), which the Swagger API documentation says requires `name` as the identifier.