Irtazaraza/biograkn

 
 

BioGrakn DN - Disease Networks (DN)

BioGrakn DN is a single knowledge graph of biomedical data describing disease networks, ingested from UniProt, Reactome, DGIdb, DisGeNET, HPA-Tissue, EBI IntAct, Kaneko, Gene Expression Omnibus (GSE27876, GSE43696, GSE63142) and TissueNet.

BioGrakn DN provides a way to query interconnected, heterogeneous biomedical data in a single place. The schema that models the underlying knowledge graph, together with the declarative query language Graql, keeps complex queries concise and readable. Furthermore, Grakn's automated reasoning lets BioGrakn DN act as an intelligent database that infers implicit knowledge from the explicitly stored data. BioGrakn DN can represent biological facts, draw inferences from new findings and enforce research constraints, all at query time.

Quickstart

  1. Download the latest release (size: 2.5 GB).
  2. Unzip the downloaded file.
  3. In a terminal or command prompt, cd into the unzipped folder.
  4. Run ./grakn server start

Interacting With BioGrakn DN

Queries can be run over BioGrakn DN via Graql Console, Grakn Clients and Grakn Workbase.

Via Grakn Workbase

Download the latest release of Grakn Workbase, install and run it.

Read the documentation on Workbase, or watch a short series of videos about using Workbase with the Grakn <> BLAST integration example.

Via Graql Console

From inside the unzipped folder, in a terminal or command prompt, run: ./graql console -k biograkn_dn. The console is then ready to answer your queries.

Via Grakn Clients

Grakn Clients are available for Java, Node.js and Python. Using these clients, you will be able to perform read and write operations over BioGrakn DN. See an example of how this is done in the Grakn <> BLAST integration example, using the Python client.

Understanding the Schema

The schema for the BioGrakn DN knowledge graph defines how the graph is modelled to represent its dataset. To understand the underlying data structure, read through schema.gql or view the visualised schema.

Example Queries

Which protein(s) are encoded by the gene with entrez-id of 100137049?

match
  $gpe (encoding-gene: $ge, encoded-protein: $pr) isa gene-protein-encoding;
  $ge isa gene, has entrez-id "100137049";
limit 10; get;

Proteins encoded by gene with entrez-id of 100137049
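
When this query is issued from one of the Grakn clients, the entrez-id will usually come from a variable rather than being hard-coded. The following Python sketch builds the same query string for an arbitrary gene. The function name `encoded_proteins_query` is hypothetical and not part of BioGrakn DN or any Grakn client; the resulting string still has to be passed to whichever query call your client version provides.

```python
def encoded_proteins_query(entrez_id: str, limit: int = 10) -> str:
    """Build the Graql query for proteins encoded by a given gene.

    Hypothetical helper for illustration only.
    """
    # Entrez gene IDs are integers, so reject anything else instead of
    # interpolating untrusted text into the query string.
    if not entrez_id.isdigit():
        raise ValueError(f"entrez-id must contain only digits: {entrez_id!r}")
    return (
        "match "
        "$gpe (encoding-gene: $ge, encoded-protein: $pr) isa gene-protein-encoding; "
        f'$ge isa gene, has entrez-id "{entrez_id}"; '
        f"limit {int(limit)}; get;"
    )


if __name__ == "__main__":
    print(encoded_proteins_query("100137049"))
```

Validating the identifier before building the string avoids having to rely on any particular escaping behaviour of Graql string literals.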

Which diseases affect the appendix tissue?

Note that the data needed to answer this question is not explicitly stored in the knowledge graph. The protein-disease-association-and-tissue-enhancement-implies-disease-tissue-association rule lets Grakn infer the answer at query time, using the following query.

match
  $ti isa tissue, has tissue-name "appendix";
  $dta (associated-disease: $di, associated-tissue: $ti) isa disease-tissue-association;
limit 10; get;

Diseases that affect appendix tissue

What are the proteins associated with Asthma?

Note that the data needed to answer this question is not explicitly stored in the knowledge graph. The gene-disease-association-and-gene-protein-encoding-protein-disease-association rule lets Grakn infer the answer at query time, using the following query.

match
  $di isa disease, has disease-name "Asthma";
  $pda (associated-protein: $pr, associated-disease: $di) isa protein-disease-association;
limit 10; get;

Proteins associated with Asthma
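
To make the inference concrete, the following plain-Python sketch mimics what the rule's name suggests it does: it joins gene-disease associations with gene-protein encodings on the shared gene to derive protein-disease associations. The gene, protein and disease identifiers are made-up placeholders, not data from BioGrakn DN, and the function name is hypothetical. Grakn's reasoner evaluates the actual rule (defined in the schema) at query time; it does not run code like this.

```python
from typing import Dict, Iterable, Set, Tuple


def infer_protein_disease(
    gene_disease: Iterable[Tuple[str, str]],
    gene_protein: Iterable[Tuple[str, str]],
) -> Set[Tuple[str, str]]:
    """Derive (protein, disease) pairs from pairs that share a gene."""
    # Index the proteins encoded by each gene.
    proteins_by_gene: Dict[str, Set[str]] = {}
    for gene, protein in gene_protein:
        proteins_by_gene.setdefault(gene, set()).add(protein)

    # A disease associated with a gene is associated with every protein
    # that the gene encodes.
    inferred: Set[Tuple[str, str]] = set()
    for gene, disease in gene_disease:
        for protein in proteins_by_gene.get(gene, ()):
            inferred.add((protein, disease))
    return inferred


if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    gene_disease = [("gene-A", "Asthma"), ("gene-B", "Asthma")]
    gene_protein = [("gene-A", "protein-1"), ("gene-A", "protein-2"), ("gene-C", "protein-3")]
    for protein, disease in sorted(infer_protein_disease(gene_disease, gene_protein)):
        print(protein, "->", disease)
```

In this toy input, gene-B has no encoded protein and gene-C has no associated disease, so neither contributes an inferred pair.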

Which diseases are associated with protein interactions taking place in the liver?

This query also makes use of the gene-disease-association-and-gene-protein-encoding-protein-disease-association Rule.

match
  $ti isa tissue, has tissue-name "liver";
  $pr isa protein;
  $pr2 isa protein;
  $pr != $pr2;
  $di isa disease;
  $pl (tissue-context: $ti, biomolecular-process: $ppi) isa process-localisation;
  $ppi (interacting-protein: $pr, interacting-protein: $pr2) isa protein-protein-interaction;
  $pda (associated-protein: $pr, associated-disease: $di) isa protein-disease-association;
limit 30; get;

Diseases associated with protein interactions taking place in the liver

Which drugs and diseases are associated with the same differentially expressed gene, based on comparisons made in the geo-series with the ID GSE27876?

match
  $geo-se isa geo-series, has GEOStudy-id "GSE27876";
  $comp (compared-groups: $geo-comp, containing-study: $geo-se) isa comparison;
  $def (conducted-analysis: $geo-comp, differentially-expressed-gene: $ge) isa differentially-expressed-finding;
  $dgi (target-gene: $ge, interacted-drug: $dr) isa drug-gene-interaction;
  $gda (associated-gene: $ge, associated-disease: $di) isa gene-disease-association;
limit 10; get;

Diseases and drugs associated with the same differentially expressed gene, based on comparisons made in the geo-series with the ID GSE27876

References