synobu edited this page Sep 26, 2011 · 103 revisions



  • Construction of a new LinkedData DB (linked to the existing major LinkedData DBs).
  • Creating a Linked Data HOWTO for the construction of LinkedData DBs.
  • Writing a "Reasons why we should dive into RDF and Linked Data" manuscript: Why RDF?

Some notes

  • You don't have to write RDF in the RDF/XML format, which is quite complex. Start with the N-Triples format, which is very easy. (by AK)
  • The Turtle format is also easy to handle with Google Refine. If you do not have a big table (<100,000 lines?), start with Google Refine and the Turtle format.
  • You can convert between various RDF formats by using the rapper tool provided in the Redland RDF libraries (Raptor in particular): (by AK)
  • rapper -i ntriples -o rdfxml file.n3 > file.rdf
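As a quick comparison of the two beginner-friendly formats mentioned above, here is the same single triple written in N-Triples and in Turtle (the example.org URIs are made-up placeholders):

```turtle
# N-Triples: one complete triple per line, every URI written out in full
<http://example.org/APP> <http://example.org/symbol> "APP" .

# The same triple in Turtle: @prefix declarations keep the lines short
@prefix ex: <http://example.org/> .
ex:APP ex:symbol "APP" .
```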

Data to publish in LinkedData DB

Alzheimer gene expression data (BH11Ujicha)

LinkedData DB for gene expression analysis on Alzheimer's disease samples (by SO)

Sample ruby code: (by MN)

To be generalized into a framework for gene expression analysis via a facet view of LinkedData (by MN).


Toxicogenomics data (gene expression data)

  • gene expressions (expression value, probe set ID, gene symbol/title, transcript ID, etc.)
  • metadata (dose, dose level, time, organ, in vivo/in vitro, etc.)
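As a sketch, one such expression record and its metadata could be expressed as triples like these (written in Turtle; the tgx: namespace, all predicate names, and all values here are hypothetical placeholders, not an agreed vocabulary):

```turtle
@prefix tgx: <http://example.org/toxicogenomics/> .   # hypothetical namespace

tgx:measurement001
    tgx:probe_set_id     "probe001" ;
    tgx:gene_symbol      "GENE1" ;
    tgx:expression_value "245.8" ;
    tgx:dose             "30 mg/kg" ;
    tgx:dose_level       "high" ;
    tgx:time             "24 hr" ;
    tgx:organ            "liver" .
```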

Proteomics data (Peptide Mass data)

  • peptides mapped to proteins (peptide mass, peptide sequence, protein ID, position, coverage, etc.)
  • metadata (sample, species, analytical instrument, processing software, parameters, searched DB, etc.)
  • proteomics data repositories: PRIDE, PeptideAtlas, Peptidome (the site is closed; only archived data remain)
  • HUPO-PSI ontologies: PRO, MOD, ProPreO, OBI, SepCV, MSCV

Membrane Proteins of Known Structure (ID Mapping Data)

  • mapping of PDB ID, OPM ID, TC ID
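Such an ID mapping could be published as simple triples like these (the ex: namespace, the predicate names, and the ID values below are all made-up placeholders for illustration):

```turtle
@prefix ex: <http://example.org/membrane/> .   # hypothetical namespace

ex:entry001
    ex:pdb_id "1XYZ" ;
    ex:opm_id "1xyz" ;
    ex:tc_id  "1.A.1.1.1" .
```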

Genome Metadata (Habitat, Ontology alignment)

Setup procedure of LinkedData DB

Setup of RDF store - Virtuoso (open source)

Download of Virtuoso open source edition from

RDF (triple) stores were surveyed at



$ sudo yum install gcc gmake autoconf automake libtool flex \
  bison gperf gawk m4 make openssl-devel readline-devel wget
$ tar xvfz virtuoso-opensource-6.1.3.tar.gz
$ cd virtuoso-opensource-6.1.3
$ ./configure --prefix=/usr/local/ --with-readline
$ nice make
# make install

Starting server

$ cd /usr/local/var/lib/virtuoso/db/
$ ls
$ virtuoso-t -df


Access the Conductor menu by pointing your web browser at http://localhost:8890/conductor/. Before accessing the Conductor menu, you need to open port 8890 (e.g., with system-config-securitylevel).


Mac OS X (OS 10.6 Snow Leopard)

Install Gawk

Download Gawk version 3.1.1 (gawk-3.1.1.tar.gz) from

$ cd gawk-3.1.1
$ ./configure
$ make
$ sudo make install

Install Virtuoso Open-Source Edition

Download Virtuoso (the OpenLink Virtuoso source code) from the SourceForge project page. The latest version at the time was 6.1.3 (virtuoso-opensource-6.1.3).

$ cd virtuoso-opensource-6.1.3
$ ./configure
$ make
$ sudo make install

Starting server

$ cd /usr/local/virtuoso-opensource/var/lib/virtuoso/db
$ ls
$ sudo /usr/local/virtuoso-opensource/bin/virtuoso-t -f &


Access the Conductor menu by pointing your web browser at http://localhost:8890/conductor/.


  • INSTALL and README files in the virtuoso-opensource-6.1.3/ directory.

Setup of RDF store - OWLIM

Setup of RDF store - Sesame

QuickStart - Virtuoso

Upload RDF data

Log in as user "dba".

Virtuoso Web interface > RDF (on the top menu) > RDF Store Upload > Choose File and set proper "Named Graph IRI"

Execute SPARQL query

Virtuoso Web interface > RDF (on the top menu) > SPARQL > Set "Default Graph IRI", write your SPARQL query and Execute

Execute SPARQL query via terminal emulator

You can execute a SPARQL query via an HTTP (REST-style) GET request.

Quick guide:

  1. Access the Virtuoso SPARQL Query Form (http://localhost:8890/sparql)
  2. Execute your favorite SPARQL query
  3. Copy the URL of the resulting page
  4. Access that URL with an HTTP GET request: $ curl "URL" (don't forget the double quotation marks!)

Like this:

$ curl "http://localhost:8890/sparql?default-graph-uri=&query=SELECT+%3Fpdb_id%0D%0AWHERE%0D%0A%7B%0D%0A%3Fid+%3Chttp%3A%2F%2Flocalhost%3A3333%2Fpdb_id%3E+%3Fpdb_id%0D%0A%7D%0D%0A&format=text%2Fhtml"


  • You can choose several output formats (&format=***): HTML, XML, JSON, CSV, RDF/XML, …
  • When you want to get the result in JSON format, replace &format=text%2Fhtml with &format=application%2Fjson
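The percent-encoded query string in the curl example above can be built with a few lines of Python (Python 3 shown here, unlike the Python 2 snippets further down; the query and endpoint are the same local placeholders used elsewhere on this page):

```python
from urllib.parse import quote_plus

# The (decoded) query from the curl example above
query = """SELECT ?pdb_id
WHERE
{
?id <http://localhost:3333/pdb_id> ?pdb_id
}"""

# Percent-encode the query and the desired output MIME type
url = ("http://localhost:8890/sparql?default-graph-uri="
       "&query=" + quote_plus(query)
       + "&format=" + quote_plus("application/json"))

print(url)
```

Changing the last `quote_plus` argument (e.g., to "text/csv") selects a different output format, as noted above.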

Reference for SPARQL via scripting languages:

Virtuoso through scripting languages

Please add your codes written in your favorite scripting languages.

Execute SPARQL query with Python (by MM)

import urllib, json

class Connection:

    def __init__(self, url):
        self.base_url = url

    def query(self, q):
        # Build the endpoint URL, asking for JSON results
        q = "sparql?default-graph-uri=" + \
            "&query=" + urllib.quote_plus(q) + \
            "&format=" + urllib.quote_plus("application/json")
        return self._exec_sparql(q)

    def _exec_sparql(self, sparql):
    	data = urllib.urlopen(self.base_url + sparql).read()
        try:
            result = json.loads(data)["results"]["bindings"]
            return result
        except ValueError:
            # The endpoint returned something other than JSON
            return [{ "error": data }]

def main():
    c = Connection("http://localhost:8890/")
    response = c.query("""SELECT ?pdb_id WHERE {
                          ?id <http://localhost:3333/pdb_id> ?pdb_id .
                          }""")
    for r in response:
        print r["pdb_id"]["value"]

if __name__ == "__main__":
    main()

Retrieve data from PDBj with Python (by MM)

import urllib
import rdflib

class PDBjRDF:

    def __init__(self, pdb_id):
    	self.pdb_uri = "" + pdb_id
        self.PDBo = rdflib.Namespace("")

    def get_pdbx_descriptor(self):
    	descriptors = []
        struct_categories = self._get_object(self.pdb_uri, "has_structCategory")
        for uri_s in struct_categories:
            has_structs = self._get_object(uri_s, "has_struct")
            for uri_h in has_structs:
                pdbx_descriptors = self._get_object(uri_h, "struct.pdbx_descriptor")
                for descriptor in pdbx_descriptors:
                    descriptors.append(descriptor)
        return descriptors

    def _get_object(self, uri, predicate):
        g = rdflib.Graph()

        objects = []
        try:
            response = urllib.urlopen(uri)
            g.parse(response, format="xml")
        except Exception:
            return []

        for s, p, o in g.triples((None, self.PDBo[predicate], None)):
            objects.append(o)

        return objects

    def get_uniprot(self):
        links = []
        struct_refcategories = self._get_object(self.pdb_uri, "has_struct_refCategory")
        for uri_s in struct_refcategories:
            has_structrefs = self._get_object(uri_s, "has_struct_ref")
            for uri_h in has_structrefs:
                link_to_uniprots = self._get_object(uri_h, "link_to_uniprot")
                for link in link_to_uniprots:
                    links.append(link)
        return links

def main():
    pdb_ids = ["1AP9", "1C8S", "1FBB", "1M0L", "1X0K"]
    for pdb_id in pdb_ids:
        p = PDBjRDF(pdb_id)
        descriptors = p.get_pdbx_descriptor()
        link_to_uniprots = p.get_uniprot()

        print descriptors, link_to_uniprots

if __name__ == "__main__":
    main()

Execute SPARQL query with Ruby (by YI)

require 'rubygems'
require 'sparql/client'

sparql = SPARQL::Client.new("http://localhost:8890/sparql")

predicate = "<http://localhost:3333/something>"
object = "?o"

result = sparql.query("SELECT ?o WHERE { ?s #{predicate} #{object} }")
result.each do |i|
  p i
end

Execute SPARQL query with Perl (by YAC)


use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use URI::Escape;

my $query = "select ?p, ?o where{ <> ?p ?o}";
my $baseURL = "";

# URI-escape the query and the MIME type before putting them in the URL
my $sparql_query = "query=" . uri_escape($query) . "&debug=on&format=" . uri_escape("text/csv") . "&save=display";
my $sparqlURL = "$baseURL?$sparql_query";

my $ua = LWP::UserAgent->new;
$ua->agent("MyApp/0.1 ");
my $req = HTTP::Request->new(GET => $sparqlURL);
my $res = $ua->request($req);

print $res->content;

LinkedData generation

Conversion of table data to RDF data
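As a minimal sketch of this conversion (Python 3; the example.org base URI, the toy table contents, and the column-to-predicate naming are all made-up placeholders, and literal escaping is only handled for double quotes):

```python
BASE = "http://example.org/"   # hypothetical base URI

def row_to_ntriples(row_id, row):
    """Turn one table row (a dict of column -> value) into N-Triples lines."""
    subject = "<%sentry/%s>" % (BASE, row_id)
    lines = []
    for column, value in sorted(row.items()):
        predicate = "<%sproperty/%s>" % (BASE, column)
        # N-Triples literals are double-quoted; escape embedded quotes
        literal = '"%s"' % str(value).replace('"', '\\"')
        lines.append("%s %s %s ." % (subject, predicate, literal))
    return lines

# A toy two-column table with a single row
table = {"row1": {"gene_symbol": "GENE1", "expression_value": 245.8}}
for row_id, row in table.items():
    for line in row_to_ntriples(row_id, row):
        print(line)
```

The resulting N-Triples file can then be loaded into Virtuoso or converted with rapper as described above.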

Reference for LinkedData design (type/predicate)

Construction of OWL

  • Protege

LinkedData DB for end user

SPARQL (Search)

  • Searching multiple SPARQL end points with one query: SPARQL 1.1
  • See page 98 in Bob DuCharme (2011) Learning SPARQL. O'Reilly.
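A minimal sketch of such a query using the SPARQL 1.1 SERVICE keyword, which forwards part of the graph pattern to a second endpoint (the endpoint URL and both predicates below are placeholders):

```sparql
SELECT ?id ?label ?related
WHERE {
  ?id <http://example.org/property/label> ?label .
  # Match the rest of the pattern against a remote endpoint
  SERVICE <http://other.example.org/sparql> {
    ?id <http://example.org/property/related> ?related .
  }
}
```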

Facet (Aspect view)

Sub-network visualization (Link view)

ChEMBL and Drug Data links

  1. ChEMBL-RDF - Data Packages - the Data Hub
  2. Pharmaceutical Knowledge retrieval through Reasoning of ChEMBL RDF
  3. chem-bla-ics: ChEMBL RDF #1: SPARQL end point
  4. egonw/chembl.rdf - GitHub
  5. ChEMBL-RDF | Kasabi
  6. Journal of Biomedical Semantics | Full text | Linking the Resource Description Framework to cheminformatics and proteochemometrics
  7. LODD Data
  8. Journal of Cheminformatics | Full text | Linked open drug data for pharmaceutical research and development (Fig 2 shows ChEMBL data used in TripleMap)
  9. Original SPARQL end point and SNORQL
  10. (has example SPARQL queries)
  11. Kasabi blog post

Unsolved questions


  • Soichi Ogishima (SO)
  • Mizuki Morita (MM)
  • Yoshinobu Igarashi (YI)
  • Yi-an Chen (YAC)
  • Shin Kawano
  • Kiyoko Kinoshita
  • Shinobu Okamoto (ShO)
  • Mitsuteru Nakao
  • Anna Kokubu
  • Takaaki Mori
  • Erick Antezana
  • Chisato Yamasaki (also interested in BioDBCore)
  • Yukie Akune
  • Yasunori Yamamoto (YY)