jctoledo edited this page Feb 3, 2012 · 26 revisions

The linked data that forms part of Bio2RDF adheres to a simple set of modeling patterns that permit the different datasets to interoperate syntactically.


Identifiers

Entities

The first step of the RDFization process involves using a consistent identifier scheme so that we can syntactically integrate data across the Bio2RDF network. Bio2RDF identifiers are given by the following URI pattern:

 <code>http://bio2rdf.org/''namespace'':''identifier''</code>

where the namespace is a short name listed in our dataset registry that uniquely identifies the source (dataset/database). The identifier is the (alpha)numeric string assigned to identify that entity. For instance, the gene identified by the number 15275 in the NCBI EntrezGene Database (namespace = geneid) has the following identifier:

 <code>http://bio2rdf.org/''geneid'':''15275''</code>
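The identifier pattern above can be sketched as a small helper. This is only an illustration of the pattern and not part of the Bio2RDF tooling; the function name `bio2rdf_uri` and the constant `BIO2RDF_BASE` are hypothetical.

```python
BIO2RDF_BASE = "http://bio2rdf.org/"  # base taken from the URI pattern above


def bio2rdf_uri(namespace: str, identifier: str) -> str:
    """Build an entity URI of the form http://bio2rdf.org/namespace:identifier."""
    if not namespace or not identifier:
        raise ValueError("namespace and identifier must be non-empty")
    return f"{BIO2RDF_BASE}{namespace}:{identifier}"


# The EntrezGene example from the text.
print(bio2rdf_uri("geneid", "15275"))
```

The helper does not check that the namespace is listed in the dataset registry; a real pipeline would presumably validate it there.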

Vocabulary

The Bio2RDF URI scheme is applied not just to data entries, but also to the vocabulary (types and relations) used to describe these entries. Vocabulary terms follow this pattern:

 <code>http://bio2rdf.org/''namespace''_term:''term''</code>

For example, the gene identified by geneid:15275 is a kind of Gene, as defined by Entrez Gene.

 <code>http://bio2rdf.org/''geneid''_term:''Gene''</code>
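As a rough sketch of how the vocabulary pattern is used, the snippet below builds a term URI and emits the corresponding `rdf:type` statement as a single N-Triples line. The function names `term_uri` and `type_triple` are hypothetical and chosen only for this example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def term_uri(namespace: str, term: str) -> str:
    """Build a vocabulary URI of the form http://bio2rdf.org/namespace_term:term."""
    return f"http://bio2rdf.org/{namespace}_term:{term}"


def type_triple(namespace: str, identifier: str, term: str) -> str:
    """Return an N-Triples line typing an entity with a dataset-specific term."""
    subject = f"http://bio2rdf.org/{namespace}:{identifier}"
    return f"<{subject}> <{RDF_TYPE}> <{term_uri(namespace, term)}> ."


# geneid:15275 is a kind of Gene, as defined by Entrez Gene.
print(type_triple("geneid", "15275", "Gene"))
```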

Descriptions

Minimum Annotations

Each resource should contain the following annotations:

 <code>http://purl.org/dc/terms/title</code>
 a human-readable title as it appears in the source data.
 <code>http://purl.org/dc/terms/identifier</code>
 a string that contains the identifier using the pattern <namespace>:<identifier>.
 <code>rdfs:label</code>
 a Bio2RDF-generated label containing the title followed by the identifier, in the form "title [namespace:identifier]". Most RDF browsers use this label by convention to render the name of a resource instead of its URI.

Taken together, these annotations give the following description:

 <code>
  geneid:15275 
   rdfs:label "Hk1 [geneid:15275]" ;
   dc:title "Hk1" ;
   dc:identifier "geneid:15275" ;
   rdf:type geneid_term:Gene .
 </code>
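The minimum annotations can be derived mechanically from a namespace, an identifier and a title. The following is a minimal sketch under that assumption; `minimum_annotations` and `describe` are hypothetical names, and the output omits prefix declarations just as the example above does.

```python
def minimum_annotations(namespace: str, identifier: str, title: str) -> dict:
    """Return the three minimum annotations, keyed by prefixed predicate."""
    curie = f"{namespace}:{identifier}"
    return {
        "rdfs:label": f"{title} [{curie}]",  # "title [ns:id]"
        "dc:title": title,
        "dc:identifier": curie,
    }


def describe(namespace: str, identifier: str, title: str, type_term: str) -> str:
    """Render the annotations plus an rdf:type statement as Turtle-like text.

    Note: literal values are not escaped here, so titles containing
    double quotes or backslashes would need extra handling.
    """
    lines = [f"{namespace}:{identifier}"]
    for predicate, value in minimum_annotations(namespace, identifier, title).items():
        lines.append(f'  {predicate} "{value}" ;')
    lines.append(f"  rdf:type {namespace}_term:{type_term} .")
    return "\n".join(lines)


print(describe("geneid", "15275", "Hk1", "Gene"))
```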

Datasets, Records and Entities

We recognize at least three kinds of entities found in biological information resources: physical entities, records and datasets.

1. Record

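Records are information objects that contain a set of statements, primarily about a single subject. A record points to its primary subject, and the subject points back to the record that describes it: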

 <code>
  namespace_record:identifier
    bio2rdf_term:has-primary-subject namespace:identifier .
 </code>
 <code>
  namespace:identifier
   bio2rdf_term:is-described-by namespace_record:identifier .
 </code>
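The reciprocal links between a record and its primary subject could be generated along these lines. This is a sketch only; `record_links` is a hypothetical name, and the triples are kept as prefixed names rather than full URIs.

```python
def record_links(namespace: str, identifier: str) -> list:
    """Return the two triples linking a record and the entity it describes."""
    entity = f"{namespace}:{identifier}"
    record = f"{namespace}_record:{identifier}"
    return [
        (record, "bio2rdf_term:has-primary-subject", entity),
        (entity, "bio2rdf_term:is-described-by", record),
    ]


for subject, predicate, obj in record_links("geneid", "15275"):
    print(f"{subject} {predicate} {obj} .")
```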

2. Dataset

Datasets are collections of records.

 <code>
  bio2rdf_dataset:namespace
    bio2rdf_term:has-item namespace_record:identifier .
 </code>
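Dataset membership follows the same shape: one `has-item` statement per record. A minimal sketch, assuming the record identifiers are already known; `dataset_items` is a hypothetical name and the second identifier in the usage line is an arbitrary sample value.

```python
def dataset_items(namespace: str, identifiers: list) -> list:
    """Return one has-item triple per record in the dataset."""
    dataset = f"bio2rdf_dataset:{namespace}"
    return [
        (dataset, "bio2rdf_term:has-item", f"{namespace}_record:{identifier}")
        for identifier in identifiers
    ]


# "15275" comes from the text; "15277" is an arbitrary sample identifier.
for subject, predicate, obj in dataset_items("geneid", ["15275", "15277"]):
    print(f"{subject} {predicate} {obj} .")
```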

Since datasets can be versioned, we describe each version as its own resource and link it to the unversioned dataset:

 <code>
  bio2rdf_dataset:namespace.version
    dc:hasVersion "13" ;
    dc:isPartOf bio2rdf_dataset:namespace .
 </code>

Mappings

This section describes how to create mappings from your dataset-specific vocabulary to the Semanticscience Integrated Ontology (SIO).

Ontologies

Scripts

:Category:Scripts

Serialization

Loading

Loading the RDF database