This repository stores the LinkML representation of the CRDC Harmonized Data Model (CRDC-H) produced by the Center for Cancer Data Harmonization (CCDH).
This repository includes the LinkML model itself (in YAML format) as well as several artifacts generated automatically by LinkML, including a JSON Schema, a JSON-LD context, a GraphQL description, a CSV description, and ShEx validation shapes.
Model documentation in Markdown can also be generated for this repository, and is currently hosted on GitHub Pages at https://cancerdhc.github.io/ccdhmodel/. A set of Python data classes can also be generated for programmatic use; examples of their use are available in the Example Data repository.
All artifacts can be generated by running `make` in this repository. Running `make clean` will delete existing generated artifacts, allowing them to be regenerated. The Makefile uses Poetry to manage dependencies.
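A typical build session might look like the following (this is a sketch; it assumes Poetry is already installed on your system):

```shell
# Install the project's dependencies into a Poetry-managed virtual environment
poetry install

# Generate all artifacts (JSON Schema, JSON-LD context, GraphQL, CSV, ShEx shapes)
make

# Remove previously generated artifacts and rebuild them from scratch
make clean
make
```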
We use mike to publish documentation to GitHub Pages. Use `mike deploy [version] -p` to push a new version of the documentation to GitHub Pages (via the `gh-pages` branch). `mike deploy [version] latest -p -u` can be used to mark the uploaded version as the latest version, which will be displayed by default.
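For example, publishing documentation for a hypothetical version `1.1` and marking it as the latest might look like this (the version number here is illustrative, not an actual release):

```shell
# Build the docs for version 1.1, commit them to the gh-pages branch, and push (-p)
mike deploy 1.1 -p

# Publish 1.1 and alias it as "latest"; -u updates the alias if it already exists
mike deploy 1.1 latest -p -u
```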
The CRDC-H model is currently being developed in a Google Sheet, which is converted into a LinkML schema at `./model/schema/crdch_model.yaml`. If you would like to use the latest, in-development version of the schema as described in the Google Sheet, you will need to use the sheet2linkml package to regenerate this file by running `make generate-model`.
In order to read a Google Sheet, sheet2linkml needs access to the Google Sheets API, for which you will need to create credentials in the Google Developers Console. Detailed instructions and screenshots are available in the documentation for pygsheets, the package sheet2linkml uses to access Google Sheets. Save the credentials file as `google_api_credentials.json` in the root directory of this project. The first time you run `make generate-model`, you will see a browser page asking you to log in. Follow the instructions; the script will download a token and store it locally. You will not need to log in again when rerunning this command.
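Putting the steps above together, a first-time model regeneration might look like this (the download path for the credentials file is an assumption; use wherever the Google Developers Console saved your credentials):

```shell
# Place the OAuth client credentials in the project root
# (assumed download location; adjust the source path as needed)
mv ~/Downloads/google_api_credentials.json ./google_api_credentials.json

# Regenerate model/schema/crdch_model.yaml from the Google Sheet;
# a browser window will open for a one-time Google login
make generate-model
```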