WikiPathways

| License | CCZero |

WikiPathways is a database with machine-readable models of biological processes for human and multiple other species [Q21092742,Q102205677]. It comes with a SPARQL endpoint with a human-oriented interface at sparql.wikipathways.org [Q26261238].

WikiPathways RDF has two parts. The first is the GPMLRDF, an RDF representation of the Graphical Pathway Markup Language (GPML) in which the biological pathways are stored in the database. The second is the WPRDF, which represents the biological knowledge itself [Q26261238,Q111656837]. This chapter focuses on the WPRDF only.

Figure of simplified RDF schema:

Entities

The RDF contains all pathways, their datanodes (genes, proteins, metabolites, etc.), author information, molecular descriptors, and more. The main classes are:

  • Pathway: a biological pathway
  • GeneProduct: a gene, a strand of RNA, or a protein.
  • Rna: RNA, e.g. miRNA.
  • Protein: a protein. Post-translational modifications can be indicated with states.
  • Metabolite: metabolites, ions, and other small molecules, including peptides.
  • Interaction: one of several kinds of relation, such as translocation, inhibition, or metabolic conversion (see [Q111656837]).

In all cases, the specific meaning is not strictly defined. In practice, each of the above types is roughly determined by the database identifiers linked to the entity. For example, a UniProt identifier linked to a GeneProduct suggests the entity is actually a protein.
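The classes listed above can be explored directly. The following is a minimal sketch that counts how many entities of each type the RDF contains. It assumes the classes live in the WikiPathways vocabulary namespace, bound here to the prefix wp:.

```sparql
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>

# Count the entities of each main class named in the list above.
SELECT ?type (COUNT(?node) AS ?count) WHERE {
  VALUES ?type {
    wp:Pathway wp:GeneProduct wp:Rna
    wp:Protein wp:Metabolite wp:Interaction
  }
  ?node a ?type .
}
GROUP BY ?type
ORDER BY DESC(?count)
```

Because the types are only loosely defined, the counts are best read as an indication of how curators annotated the entities, not as a strict classification.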

Data model

Because the WikiPathways RDF attaches many properties to each subject (such as a pathway), we can request those properties directly in the same SPARQL query. For example, to extract the pathway title, we add the triple pattern ?pathway dc:title ?pathwaytitle to the WHERE clause and add ?pathwaytitle to the SELECT list. Each added variable makes the returned table wider, so you might need to scroll to the right to see it all.
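As a sketch of the step just described, the query with the extra title pattern could look like this. It assumes the prefixes wp: and dc: are bound as shown, and the LIMIT is added here only to keep the result short.

```sparql
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

# List pathways together with their titles.
SELECT ?pathway ?pathwaytitle WHERE {
  ?pathway a wp:Pathway ;             # subjects typed as pathways
           dc:title ?pathwaytitle .   # the added triple pattern
}
LIMIT 5   # illustrative only; remove to get the full list
```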

Example queries

The simplest SPARQL queries for exploring RDF retrieve the full list of subjects of a particular type. The type is usually stated with the predicate rdf:type or its shorthand a, which can be used interchangeably. See the example below, which lists all pathways.

pathways
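A minimal sketch of such a type query follows, assuming the wp: prefix is bound to the WikiPathways vocabulary. The commented line shows the equivalent shorthand.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX wp:  <http://vocabularies.wikipathways.org/wp#>

SELECT ?pathway WHERE {
  ?pathway rdf:type wp:Pathway .
  # Equivalent shorthand: ?pathway a wp:Pathway .
}
```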

The list is long; these are the first five:

pathways

Asking information for a specific pathway

With this exercise, the RDF will be explored a little more extensively. By combining statements in the SPARQL query, we can link multiple subjects and filter for the content that we want to get back from the service. Important: when filtering for a literal (gene label, organism, etc.), the literal should have the following format: "text"^^xsd:string. For example, the next query returns the title for the pathway with ID WP4846:

pathwayWP4846
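A hedged sketch of a query of this kind is shown below. It assumes the pathway ID is stored with dcterms:identifier and the title with dc:title. Note the typed literal in the format described above.

```sparql
PREFIX wp:      <http://vocabularies.wikipathways.org/wp#>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>

SELECT ?pathway ?title WHERE {
  ?pathway a wp:Pathway ;
           dcterms:identifier "WP4846"^^xsd:string ;  # typed literal match
           dc:title ?title .
}
```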

This returns the following title:

pathwayWP4846

A lipid pathway

For example, we can ask for a list of pathways describing the biology of oxygenated hydrocarbons (LIPID MAPS class LMFA12):

lipidPathways
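One way such a query could be written is sketched below. It assumes that metabolites carry their LIPID MAPS identifier through a wp:bdbLipidMaps predicate and that the class code LMFA12 appears in that identifier. Both are assumptions to verify against the endpoint before relying on the result.

```sparql
PREFIX wp:      <http://vocabularies.wikipathways.org/wp#>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT DISTINCT ?pathway ?title WHERE {
  ?metabolite a wp:Metabolite ;
              dcterms:isPartOf ?pathway ;
              wp:bdbLipidMaps ?lipidID .   # assumed predicate name
  ?pathway a wp:Pathway ;
           dc:title ?title .
  # Keep only lipids whose identifier falls in class LMFA12.
  FILTER regex(str(?lipidID), "LMFA12")
}
```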

This gives:

lipidPathways

A federated SPARQL query

This final example adds an extra level of difficulty by linking the WikiPathways RDF with another database through SPARQL (this is called a federated SPARQL query). In this exercise we will explore the connection between WikiPathways and AOP-Wiki (see this chapter).

The SPARQL query needs a SERVICE clause, which sends the enclosed part of the query to the remote endpoint. The final query will have the following structure:

PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT [variables] WHERE {
  [query WikiPathways]
  SERVICE <https://aopwiki.rdf.bigcat-bioinformatics.org/sparql> {
    [query AOP-Wiki]
  }
}

References