Home
GMQL is a closed algebra over datasets: each operation applies to one or two datasets, each containing many samples, and its result is a new dataset that also contains many samples. Operations thus build new regions and trace the provenance of each resulting sample by computing its metadata. GMQL is inspired by Pig Latin (a language in the Hadoop family) and is targeted at cloud computing environments.
A GMQL script is expressed as a sequence of GMQL operations with the following structure:
<dataset> = operation(<parameters>) <datasets>
where each dataset stands for a Genomic Data Model (GDM) dataset. Operations are either unary (with one input dataset), or binary (with two input datasets), and construct one result dataset.
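The closure property described above can be illustrated with a toy model. The sketch below is not the GDMS implementation and does not use GMQL syntax; the names `Region`, `Sample`, `Dataset`, `select_regions`, and `union`, as well as the `provenance` metadata attribute, are hypothetical and chosen only for illustration. It shows a unary and a binary operation that each consume datasets and return a new dataset, adding metadata to record where each sample came from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int
    stop: int
    score: float


@dataclass(frozen=True)
class Sample:
    regions: tuple   # tuple of Region
    metadata: tuple  # tuple of (attribute, value) pairs


@dataclass(frozen=True)
class Dataset:
    samples: tuple   # tuple of Sample


def select_regions(dataset, predicate):
    """Unary operation (hypothetical): keep regions matching the predicate."""
    out = []
    for sample in dataset.samples:
        kept = tuple(r for r in sample.regions if predicate(r))
        # Record provenance by extending the sample metadata.
        meta = sample.metadata + (("provenance", "select_regions"),)
        out.append(Sample(kept, meta))
    return Dataset(tuple(out))


def union(left, right):
    """Binary operation (hypothetical): combine the samples of two datasets."""
    def tag(sample, side):
        meta = sample.metadata + (("provenance", "union." + side),)
        return Sample(sample.regions, meta)

    samples = tuple(tag(s, "left") for s in left.samples)
    samples += tuple(tag(s, "right") for s in right.samples)
    return Dataset(samples)


# Made-up example data: two single-sample datasets.
a = Dataset((Sample((Region("chr1", 100, 200, 0.9),
                     Region("chr1", 300, 400, 0.2)),
                    (("cell", "K562"),)),))
b = Dataset((Sample((Region("chr2", 10, 50, 0.7),),
                    (("cell", "HeLa"),)),))

# Because every operation returns a Dataset, operations compose freely.
result = union(select_regions(a, lambda r: r.score > 0.5), b)
```

Since both functions return a `Dataset`, the output of one can be fed directly into another, which mirrors how each line of a script assigns the result of an operation to a new dataset name.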
For Quick Start please refer to:
For detailed GMQL language documentation:
For a look at GDMS architecture:
For importing the GDMS kernel JARs programmatically into Scala applications and scripting GMQL in Scala:
For more information about GDMS repository architecture and repository manager:
The GDMS repository is built on the notion of a dataset. For more information about the data module and the GDM dataset architecture:
A shell API is provided for the GDMS repository to list, add, delete, and alter datasets:
The first step in the installation is to understand the engine configurations. Currently there are two sets of configurations: one for the repository and one for the executor.