Merge pull request #139 from c3g/update-docs
Update readthedocs and readme
davidlougheed committed Jul 1, 2020
2 parents 13e2318 + 7b76f75 commit 85917b6
Showing 6 changed files with 130 additions and 21 deletions.
10 changes: 4 additions & 6 deletions .readthedocs.yml
@@ -15,9 +15,7 @@ formats:
- htmlzip

# Optionally set the version of Python and requirements required to build your docs
#python:
# version: 3.7
# install:
# - requirements: requirements.txt
# - requirements: docs/requirements.txt
# - method: pip
python:
  version: 3.7
  install:
    - requirements: requirements.txt
16 changes: 12 additions & 4 deletions README.md
@@ -18,15 +18,24 @@ under the BSD 3-clause license.
CHORD Metadata Service is a service to store epigenomic metadata.

1. Patients service handles anonymized individual’s data (individual id, sex, age or date of birth)
* Data model: aggregated profile from GA4GH Phenopackets Individual and FHIR Patient
* Data model: aggregated profile from GA4GH Phenopackets Individual, FHIR Patient and mCODE Patient.

2. Phenopackets service handles phenotypic and clinical data
* Data model: [GA4GH Phenopackets schema](https://github.com/phenopackets/phenopacket-schema)

3. CHORD service handles metadata about dataset, has relation to phenopackets (one dataset can have many phenopackets)
3. mCode service handles patient's oncology related data.
* Data model: [mCODE data elements](https://mcodeinitiative.org/)

4. Experiments service handles experiment related data.
* Data model: derived from [IHEC Metadata Experiment](https://github.com/IHEC/ihec-ecosystems/blob/master/docs/metadata/2.0/Ihec_metadata_specification.md#experiments)

5. Resources service handles metadata about ontologies used for data annotation.
* Data model: derived from Phenopackets Resource profile

6. CHORD service handles metadata about dataset, has relation to phenopackets (one dataset can have many phenopackets)
* Data model: [DATS](https://github.com/datatagsuite) + [GA4GH DUO](https://github.com/EBISPOT/DUO)

4. Rest api service handles all generic functionality shared among other services
7. Rest api service handles all generic functionality shared among other services


## REST API highlights
@@ -42,7 +51,6 @@ Phenopackets model is mapped to [FHIR](https://www.hl7.org/fhir/) using
To retrieve data in fhir append `?format=fhir` .

* Ingest endpoint: `/private/ingest`.
Example of POST body is in `chord/views_ingest.py` (`METADATA_WORKFLOWS`).


## Install
89 changes: 78 additions & 11 deletions docs/modules/introduction.rst
@@ -2,18 +2,19 @@ Introduction
============

Metadata service is a service to store phenotypic and clinical metadata about the patient and/or biosample.
Data model is partly based on `GA4GH Phenopackets schema <https://github.com/phenopackets/phenopacket-schema>`_.
The data model is partly based on `GA4GH Phenopackets schema <https://github.com/phenopackets/phenopacket-schema>`_ and
extended to support oncology-related metadata and experiments metadata.

The simplified data model of the service is below.

.. image:: ../_static/simple_metadata_service_model_v0.3.0.png
.. image:: ../_static/simple_metadata_service_model_v1.0.png


Technical implementation
------------------------

The service is implemented in Python and Django and uses PostgreSQL database to store the data.
Besides PostgreSQL the data can be indexed and queried in Elasticsearch.
Besides PostgreSQL, the data can be indexed and queried in Elasticsearch.


Architecture
@@ -22,15 +23,27 @@ Architecture
The Metadata Service contains several services that share one API.
Services depend on each other and are separated based on their scope.

**1. Patients service** handles anonymized individual’s data (e.g. individual id, sex, age or date of birth)
**1. Patients service** handles anonymized individual’s data (e.g. individual id, sex, age, or date of birth).

- Data model: aggregated profile from GA4GH Phenopackets Individual and FHIR Patient. It contains all fields of Phenopacket Individual and additional fields from FHIR Patient.
- Data model: aggregated profile from GA4GH Phenopackets Individual, FHIR Patient, and mCODE Patient. It contains all fields of Phenopacket Individual and additional fields from FHIR and mCODE Patient.

**2. Phenopackets service** handles phenotypic and clinical data
**2. Phenopackets service** handles phenotypic and clinical data.

- Data model: GA4GH Phenopackets schema. Currently contains only two out of four Phenopackets top elements - Phenopacket and Interpretation.

**3. CHORD service** handles granular metadata about dataset (e.g. description, where the dataset is located, who are the creators of the dataset, licenses applied to the dataset,
**3. mCode service** handles patient's oncology-related data.

- Data model: mCODE data elements. The elements are grouped in an mCodepacket (similar to a Phenopacket) that contains the patient's cancer-related descriptions, including genomics data, medication statements, and cancer-related procedures.

**4. Experiments service** handles experiment-related data.

- Data model: derived from IHEC metadata `Experiment specification <https://github.com/IHEC/ihec-ecosystems/blob/master/docs/metadata/2.0/Ihec_metadata_specification.md#experiments>`_.

**5. Resources service** handles metadata about ontologies used for data annotation.

- Data model: derived from the Phenopackets schema Resource profile.

**6. CHORD service** handles granular metadata about dataset (e.g. description, where the dataset is located, who are the creators of the dataset, licenses applied to the dataset,
authorization policy, terms of use).
The dataset in the current implementation is one or more phenopackets related to each other through their provenance.

@@ -40,20 +53,23 @@ The dataset in the current implementation is one or more phenopackets related to
- GA4GH DUO is used to capture the terms of use applied to a dataset.


**4. Restapi service** handles all generic functionality shared among other services (e.g. renderers, common serializers, schemas, validators)
**7. Restapi service** handles all generic functionality shared among other services (e.g. renderers, common serializers, schemas, validators)


Metadata standards
------------------

`Phenopackets schema <https://github.com/phenopackets/phenopacket-schema>`_ is used for phenotypic description of patient and/or biosample.

`mCODE data elements <https://mcodeinitiative.org/>`_ are used for oncology-related description of patient.

`DATS standard <https://github.com/datatagsuite>`_ is used for dataset description.

`DUO ontology <https://github.com/EBISPOT/DUO>`_ is used for describing terms of use for a dataset.

`Phenopackets on FHIR Implementation Guide <https://aehrc.github.io/fhir-phenopackets-ig/>`_ is used to map Phenopackets elements to `FHIR <https://www.hl7.org/fhir/>`_ resources.

`IHEC Metadata Experiment <https://github.com/IHEC/ihec-ecosystems/blob/master/docs/metadata/2.0/Ihec_metadata_specification.md#experiments>`_ is used for describing an experiment.

REST API highlights
-------------------
@@ -68,17 +84,22 @@ REST API highlights

- Other available renderers:

- Currently the following classes can be retrieved in FHIR format by appending :code:`?format=fhir`: Phenopacket, Individual, Biosample, PhenotypicFeature, HtsFile, Gene, Variant, Disease, Procedure.
- Currently, the following classes can be retrieved in FHIR format by appending :code:`?format=fhir`: Phenopacket, Individual, Biosample, PhenotypicFeature, HtsFile, Gene, Variant, Disease, Procedure.

- JSON-LD context to schema.org provided for the Dataset class in order to allow for a Google dataset search for Open Access Data: append :code:`?format=json-ld` when querying dataset endpoint.

- Dataset description can also be retrieved in RDF format: append :code:`?format=rdf` when querying the dataset endpoint.
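The renderers listed above are all selected through the :code:`format` query parameter. A minimal Python sketch of building such URLs follows. The helper name ``with_format``, the host, and the endpoint paths are assumptions made for illustration; only the three format values come from the list above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Renderer names taken from the list above.
KNOWN_FORMATS = {"fhir", "json-ld", "rdf"}


def with_format(url: str, fmt: str) -> str:
    """Return `url` with its `format` query parameter set to `fmt`."""
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    parts = urlsplit(url)
    # Keep every existing query parameter except a previous `format`.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "format"
    ]
    query.append(("format", fmt))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical host and endpoint paths, for illustration only.
print(with_format("https://metadata.example.org/api/individuals", "fhir"))
print(with_format("https://metadata.example.org/api/datasets?page=2", "json-ld"))
```

Replacing an existing :code:`format` value instead of appending a second one keeps the resulting URL unambiguous when a renderer is switched.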

**Data ingest**

Currently only the data that follow Phenopackets schema can be ingested.
Ingest workflows are implemented for different types of data within the service.
Ingest endpoint is :code:`/private/ingest`.
Example of POST request body:

**1. Phenopackets data ingest**

The data must follow Phenopackets schema in order to be ingested.

Example of Phenopackets POST request body:

.. code-block::
@@ -111,7 +132,53 @@ Example of POST request body:
}
}

**2. mCode data ingest**

mCODE data elements are based on FHIR datatypes.
Only mCODE-related profiles are ingested.
The data is expected to comply with FHIR Release 4 and to be provided in FHIR Bundles.

Example of mCode FHIR data POST request body:

.. code-block::

    {
        "table_id": "table_unique_id",
        "workflow_id": "mcode_fhir_json",
        "workflow_params": {
            "mcode_fhir_json.json_document": "/path/to/data.json"
        },
        "workflow_outputs": {
            "json_document": "/path/to/data.json"
        }
    }

**3. FHIR data ingest**

At the moment there is no implementation guide from FHIR to Phenopackets.
FHIR data is ingested only partially, where a mapping can be established between a FHIR resource and a Phenopackets element.
Ingestion works for the following FHIR resources: Patient, Observation, Condition, Specimen.
The data is expected to comply with FHIR Release 4 and to be provided in FHIR Bundles.

.. code-block::

    {
        "table_id": "table_unique_id",
        "workflow_id": "fhir_json",
        "workflow_params": {
            "fhir_json.patients": "/path/to/patients.json",
            "fhir_json.observations": "/path/to/observations.json",
            "fhir_json.conditions": "/path/to/conditions.json",
            "fhir_json.specimens": "/path/to/specimens.json"
        },
        "workflow_outputs": {
            "patients": "/path/to/patients.json",
            "observations": "/path/to/observations.json",
            "conditions": "/path/to/conditions.json",
            "specimens": "/path/to/specimens.json"
        }
    }

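
The mCode and FHIR request bodies above share one shape: each key in ``workflow_params`` is an input name prefixed with the workflow id, and ``workflow_outputs`` repeats the same paths under the bare input names. The sketch below assembles such a body in Python. The helper name ``build_ingest_body`` is hypothetical and not part of the service, and the convention is inferred only from the two examples shown here.

```python
import json


def build_ingest_body(table_id: str, workflow_id: str, files: dict) -> dict:
    """Build an ingest request body shaped like the mCode and FHIR examples.

    `files` maps an input name (e.g. "patients") to a file path.
    """
    return {
        "table_id": table_id,
        "workflow_id": workflow_id,
        # Parameter keys are prefixed with the workflow id, as in the examples.
        "workflow_params": {
            f"{workflow_id}.{name}": path for name, path in files.items()
        },
        # In the examples, outputs repeat the same paths under the bare names.
        "workflow_outputs": dict(files),
    }


body = build_ingest_body(
    "table_unique_id",
    "fhir_json",
    {
        "patients": "/path/to/patients.json",
        "observations": "/path/to/observations.json",
    },
)
print(json.dumps(body, indent=2))
```

Sending the body to :code:`/private/ingest` is left out on purpose, since the host and any authentication requirements are not specified here.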
Elasticsearch index (optional)
18 changes: 18 additions & 0 deletions docs/modules/models.rst
@@ -13,6 +13,24 @@ Patients service
.. automodule:: chord_metadata_service.patients.models
:members:

Mcode service
-------------------

.. automodule:: chord_metadata_service.mcode.models
:members:

Experiments service
-------------------

.. automodule:: chord_metadata_service.experiments.models
:members:

Resources service
-------------------

.. automodule:: chord_metadata_service.resources.models
:members:

CHORD service
-------------------

18 changes: 18 additions & 0 deletions docs/modules/views.rst
@@ -13,6 +13,24 @@ Patients service
.. automodule:: chord_metadata_service.patients.api_views
:members:

Mcode service
-------------------

.. automodule:: chord_metadata_service.mcode.api_views
:members:

Experiments service
-------------------

.. automodule:: chord_metadata_service.experiments.api_views
:members:

Resources service
-------------------

.. automodule:: chord_metadata_service.resources.api_views
:members:

CHORD service
-------------------

