jctoledo edited this page Feb 3, 2012 · 26 revisions

The linked data that forms part of Bio2RDF subscribes to a simple set of modeling patterns that permit our different datasets to syntactically interoperate. Here we present, through examples, how to add data.

Identifiers

Entities

The first step of the RDFization process involves using a consistent identifier scheme so that we can syntactically integrate data across the Bio2RDF network. Bio2RDF identifiers are given by the following URI pattern:

 <code>http://bio2rdf.org/''namespace'':''identifier''</code>

where the namespace is a short name listed in our dataset registry that uniquely identifies the source (dataset/database). The identifier is the (alpha)numeric string assigned to identify that entity. For instance, the gene identified by the number 15275 in the NCBI EntrezGene Database (namespace = geneid) has the following identifier:

 <code>http://bio2rdf.org/''geneid'':''15275''</code>
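The identifier pattern above can be sketched in a few lines of Python. This is only an illustration: the helper name `bio2rdf_uri` is hypothetical and not part of any Bio2RDF script, and the sketch assumes the namespace and identifier are already in the form used by the source (no normalization or escaping is attempted).

```python
BASE = "http://bio2rdf.org/"


def bio2rdf_uri(namespace: str, identifier: str) -> str:
    """Build an entity URI following http://bio2rdf.org/namespace:identifier."""
    if not namespace or not identifier:
        raise ValueError("namespace and identifier must be non-empty")
    return f"{BASE}{namespace}:{identifier}"


# The Entrez Gene example from the text.
print(bio2rdf_uri("geneid", "15275"))  # http://bio2rdf.org/geneid:15275
```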

Vocabulary

The Bio2RDF URI scheme is applied not just to data entries, but also to the vocabulary (types and relations) used to describe these entries. Vocabulary terms follow this pattern:

 <code>http://bio2rdf.org/''namespace''_term:''term''</code>

For example, the gene identified by geneid:15275 is a kind of Gene, as defined by Entrez Gene. That type is identified by:

 <code>http://bio2rdf.org/''geneid''_term:''Gene''</code>
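As a rough sketch of how the vocabulary pattern combines with the entity pattern, the following Python builds a term URI and emits a single N-Triples statement typing an entity. The function names (`entity_uri`, `term_uri`, `type_triple`) are hypothetical, and the sketch assumes the type comes from the same namespace as the entity, as in the geneid example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def entity_uri(namespace: str, identifier: str) -> str:
    """Entity pattern: http://bio2rdf.org/namespace:identifier"""
    return f"http://bio2rdf.org/{namespace}:{identifier}"


def term_uri(namespace: str, term: str) -> str:
    """Vocabulary pattern: http://bio2rdf.org/namespace_term:term"""
    return f"http://bio2rdf.org/{namespace}_term:{term}"


def type_triple(namespace: str, identifier: str, term: str) -> str:
    """Return one N-Triples line stating that the entity has the given type."""
    subject = entity_uri(namespace, identifier)
    obj = term_uri(namespace, term)
    return f"<{subject}> <{RDF_TYPE}> <{obj}> ."


print(term_uri("geneid", "Gene"))  # http://bio2rdf.org/geneid_term:Gene
print(type_triple("geneid", "15275", "Gene"))
```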

Descriptions

Minimum Annotations

Each resource should contain the following annotations:

 <code>http://purl.org/dc/terms/title</code>
 a human-readable title as it appears in the source data.
 <code>http://purl.org/dc/terms/identifier</code>
 a string that contains the identifier using the pattern <namespace>:<identifier>.
 <code>rdfs:label</code>
 a Bio2RDF-generated label containing the title followed by the identifier, in the form "title [ns:id]". By convention, most RDF browsers use it to render the name of a resource instead of its URI.

Taken together, a minimally annotated resource looks like this (dc: abbreviates http://purl.org/dc/terms/):

 <code>
  geneid:15275 
   rdfs:label "Hk1 [geneid:15275]" ;
   dc:title "Hk1" ;
   dc:identifier "geneid:15275" ;
   rdf:type geneid_term:Gene .
 </code>
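The minimum annotations can be generated mechanically from a namespace, an identifier and a title. The Python below is a minimal sketch under stated assumptions: the function names `escape_literal` and `describe` are hypothetical, prefix declarations are left out just as in the snippet above, and only backslashes and double quotes are escaped in literals.

```python
def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a quoted Turtle literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def describe(namespace: str, identifier: str, title: str, type_term: str) -> str:
    """Return a Turtle-style description carrying the minimum annotations."""
    curie = f"{namespace}:{identifier}"
    safe_title = escape_literal(title)
    lines = [
        curie,
        f' rdfs:label "{safe_title} [{curie}]" ;',  # "title [ns:id]"
        f' dc:title "{safe_title}" ;',
        f' dc:identifier "{curie}" ;',
        f" rdf:type {namespace}_term:{type_term} .",
    ]
    return "\n".join(lines)


print(describe("geneid", "15275", "Hk1", "Gene"))
```

Calling `describe("geneid", "15275", "Hk1", "Gene")` yields the same four statements as the example above.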

Datasets, Records and Entities

We recognize at least three kinds of entities in biological information resources: physical entities, records and datasets.

1. Record

Records are information objects that contain a set of statements, primarily about one subject. A record and its primary subject are linked in both directions:

 <code>
  namespace_record:identifier
    bio2rdf_term:has-primary-subject namespace:identifier .
 </code>
 <code>
  namespace:identifier
   bio2rdf_term:is-described-by namespace_record:identifier .
 </code>
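The two-way link between a record and its primary subject can be produced from the same namespace and identifier. The sketch below assumes, as the pattern above suggests, that the record reuses the entity's identifier under the `namespace_record` prefix; the function name `record_links` is hypothetical.

```python
def record_links(namespace: str, identifier: str) -> list:
    """Return the pair of triples linking a record and its primary subject."""
    entity = f"{namespace}:{identifier}"
    record = f"{namespace}_record:{identifier}"
    return [
        (record, "bio2rdf_term:has-primary-subject", entity),
        (entity, "bio2rdf_term:is-described-by", record),
    ]


for s, p, o in record_links("geneid", "15275"):
    print(f"{s} {p} {o} .")
```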

2. Dataset

Datasets are collections of records.

 <code>
  bio2rdf_dataset:<namespace>
    bio2rdf_term:has-item namespace_record:identifier .
 </code>

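Since datasets can be versioned, we describe each version as its own resource that records the version and points back to the unversioned dataset: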

 <code>
  bio2rdf_dataset:namespace.version
    dc:hasVersion "13" ;
    dc:partOf bio2rdf_dataset:namespace .
 </code>
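The dataset patterns can be sketched the same way. This Python is illustrative only: the function name `dataset_triples` is hypothetical, and it assumes that `namespace.version` means the namespace followed by a dot and the version string (for example `geneid.13`), which the pattern above does not spell out. The predicates are copied as written in the snippets above.

```python
def dataset_triples(namespace: str, record_ids, version=None) -> list:
    """Return triples for a dataset, its records and, optionally, one version."""
    dataset = f"bio2rdf_dataset:{namespace}"
    # One has-item statement per record in the dataset.
    triples = [
        (dataset, "bio2rdf_term:has-item", f"{namespace}_record:{rid}")
        for rid in record_ids
    ]
    if version is not None:
        # Assumption: the versioned dataset is named namespace.version.
        versioned = f"{dataset}.{version}"
        triples.append((versioned, "dc:hasVersion", f'"{version}"'))
        triples.append((versioned, "dc:partOf", dataset))
    return triples


for s, p, o in dataset_triples("geneid", ["15275"], version="13"):
    print(f"{s} {p} {o} .")
```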

Mappings

This section describes how to create mappings from your dataset-specific vocabulary to the Semanticscience Integrated Ontology (SIO).

Ontologies

Scripts

:Category:Scripts

Serialization

Loading

Loading the RDF database