## Extracting Ontologies from Wikidata

Wikidata links entities with predicates such as SubClassOf (P279). These links form a classification hierarchy,
but because the statements come from multiple sources, the hierarchy may not follow the rules expected of an ontology hierarchy (for example, it can contain cycles, as shown further down).

OntoBio includes a Wikidata ontology factory, so we can transparently create an Ontology object from Wikidata
and use the same methods available for any other ontobio ontology.

This example is built around [PTSD](https://www.wikidata.org/wiki/Q202387); the queries below start from the anxiety disorder class (Q544006).


In [6]:
from ontobio.ontol_factory import OntologyFactory
f = OntologyFactory()

## OntologyFactory recognizes the prefix wdq for Wikidata queries;
## we use this to build a sub-ontology rooted at a single class
## (currently there is no lazy wrapper for Wikidata, only an eager one, so we limit the size)
ont = f.create('wdq:Q544006') # Anxiety disorder
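
To give a sense of what an eager fetch of a sub-ontology might involve, here is a minimal sketch that builds a SPARQL query for the P279 edges beneath a class. The function name `subclass_closure_query` and the query shape are assumptions for illustration; this is not necessarily the query ontobio issues, and it only covers descendants of the class, not its ancestors.

```python
def subclass_closure_query(qid):
    """Build a SPARQL query string for the P279 edges beneath a Wikidata class.

    Hypothetical helper for illustration; not part of ontobio.
    """
    # Accept only plain item IDs such as 'Q544006'
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError("expected a Wikidata item ID such as 'Q544006'")
    return (
        "SELECT ?child ?parent WHERE {\n"
        # every class reachable from the root via zero or more P279 steps
        f"  ?child wdt:P279* wd:{qid} .\n"
        # the direct superclass edges of those classes
        "  ?child wdt:P279 ?parent .\n"
        "}"
    )

query = subclass_closure_query("Q544006")
print(query)
```

A query like this could be sent to the public Wikidata SPARQL endpoint; the property path `wdt:P279*` is what makes the fetch transitive, which is also why an eager fetch can grow large for broad classes.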



In [8]:
## Find terms starting with Anxiety in the sub-ontology
qids = ont.search('Anxiety%')
qids

[rdflib.term.URIRef('http://www.wikidata.org/entity/Q544006')]
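
The `%` in the search term acts as a wildcard. A minimal sketch of how such a pattern could be matched against labels is shown below; the helper names `like_to_regex` and `search_labels` are hypothetical, and the case-insensitive matching is an assumption of this sketch rather than documented ontobio behavior.

```python
import re

def like_to_regex(pattern):
    """Translate a pattern with '%' wildcards into a compiled regex.

    Hypothetical helper; case-insensitivity is an assumption.
    """
    # Escape the literal parts so characters like '.' are not treated as regex
    parts = [re.escape(p) for p in pattern.split("%")]
    return re.compile(".*".join(parts), re.IGNORECASE)

def search_labels(labels, pattern):
    """Return the labels that match the whole wildcard pattern."""
    rx = like_to_regex(pattern)
    return [label for label in labels if rx.fullmatch(label)]

labels = ["anxiety disorder", "social phobia", "separation anxiety disorder"]
matches = search_labels(labels, "Anxiety%")
print(matches)
```

Because the pattern is anchored at the start, "separation anxiety disorder" is not returned for `Anxiety%`; a pattern such as `%anxiety%` would match it.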

In [10]:
## Traverse up and down from query node in our sub-ontology
nodes = ont.traverse_nodes(qids, up=True, down=True)
labels = [ont.label(n) for n in nodes]
labels[:25]

['selective mutism',
 'specific phobia',
 'identifier',
 'sign',
 'blood phobia',
 'School refusal',
 'mixed disorder as reaction to stress',
 'compulsive act',
 'Olfactory Reference Syndrome',
 'knowledge',
 'Afrophobia',
 'state',
 'disease',
 'separation anxiety disorder',
 'obsessive-compulsive disorder',
 'Telephone phobia',
 'information',
 'Agrizoophobia',
 'arachnophobia',
 'Sexual obsessions',
 'Ornithophobia',
 'Francophobia',
 'Fear of mice',
 'social phobia',
 'Glossophobia']
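
The list mixes very general classes ('knowledge', 'information') with specific phobias because the traversal goes both up and down from the query node. The sketch below illustrates the idea on a toy networkx graph; the `traverse` function, the edge direction (parent to child), and the toy hierarchy are assumptions for illustration, not ontobio's implementation or actual Wikidata statements.

```python
import networkx as nx

# Toy graph; edges point from parent to child (an assumed convention)
g = nx.DiGraph()
g.add_edges_from([
    ("disease", "anxiety disorder"),
    ("anxiety disorder", "specific phobia"),
    ("specific phobia", "arachnophobia"),
    ("anxiety disorder", "social phobia"),
])

def traverse(graph, start_nodes, up=True, down=True):
    """Collect the start nodes plus their ancestors and/or descendants."""
    found = set(start_nodes)
    for n in start_nodes:
        if up:
            found |= nx.ancestors(graph, n)    # everything above n
        if down:
            found |= nx.descendants(graph, n)  # everything below n
    return found

result = traverse(g, ["specific phobia"])
print(sorted(result))
```

Siblings of the start node ("social phobia" here) are not included, since they are neither ancestors nor descendants.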

In [16]:
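## Test for cycles
import networkx as nx
g = ont.get_graph()
def show_cycle(nl):
    ## Print each node in the cycle as "<URI> <label>"
    print(["{} {}".format(n, ont.label(n)) for n in nl])

## Note: enumerating every simple cycle can be slow on large graphs
cycles_list = list(nx.simple_cycles(g))
show_cycle(cycles_list[0])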

['http://www.wikidata.org/entity/Q1347367 ability', 'http://www.wikidata.org/entity/Q151885 concept', 'http://www.wikidata.org/entity/Q9081 knowledge', 'http://www.wikidata.org/entity/Q3695082 sign', 'http://www.wikidata.org/entity/Q853614 identifier', 'http://www.wikidata.org/entity/Q937228 property']
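
When only the existence of a cycle matters, it is not necessary to enumerate all of them. The following sketch rebuilds the cycle printed above as a toy graph and checks it with `nx.is_directed_acyclic_graph` and `nx.find_cycle`. The edge direction (each label pointing to the next in the printed list) and the extra 'disease' to 'state' edge are assumptions made for illustration.

```python
import networkx as nx

# Labels taken from the cycle printed above; edge direction is assumed
labels = ["ability", "concept", "knowledge", "sign", "identifier", "property"]
g = nx.DiGraph()
for a, b in zip(labels, labels[1:] + labels[:1]):
    g.add_edge(a, b)
g.add_edge("disease", "state")  # an unrelated edge that is not part of any cycle

# A proper ontology hierarchy should be a DAG; this one is not
is_dag = nx.is_directed_acyclic_graph(g)

# find_cycle stops at the first cycle instead of listing all of them
try:
    cycle_edges = nx.find_cycle(g)
except nx.NetworkXNoCycle:
    cycle_edges = []

cycle_nodes = [u for u, v in cycle_edges]
print(is_dag, cycle_nodes)
```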


In [11]:
## Show our extract of the sub-ontology as an ascii tree
## (note this is resilient to cycles)
from ontobio.io.ontol_renderers import GraphRenderer
w = GraphRenderer.create('tree')
w.write_subgraph(ont, nodes, query_ids=qids)


ERROR:root:CYCLE: http://www.wikidata.org/entity/Q3695082 already visited in [rdflib.term.URIRef('http://www.wikidata.org/entity/Q35120'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q488383'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q3695082'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q853614'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q937228'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q1347367'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q151885'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q673661'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q7184903'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q11028')]
ERROR:root:CYCLE: http://www.wikidata.org/entity/Q3695082 already visited in [rdflib.term.URIRef('http://www.wikidata.org/entity/Q35120'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q488383'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q3695082'), rdflib.term.URIRef('http://www.wikida

ERROR:root:CYCLE: http://www.wikidata.org/entity/Q11028 already visited in [rdflib.term.URIRef('http://www.wikidata.org/entity/Q35120'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q23312670'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q1920566'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q11028'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q3695082'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q853614'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q937228'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q1347367'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q151885'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q673661'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q7184903')]
ERROR:root:CYCLE: http://www.wikidata.org/entity/Q151885 already visited in [rdflib.term.URIRef('http://www.wikidata.org/entity/Q35120'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q23312670'), rdflib.term.URIRef('http://www.wikid

ERROR:root:CYCLE: http://www.wikidata.org/entity/Q937228 already visited in [rdflib.term.URIRef('http://www.wikidata.org/entity/Q1207505'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q937228'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q1347367'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q151885'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q9081'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q3695082'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q853614')]
ERROR:root:CYCLE: http://www.wikidata.org/entity/Q673661 already visited in [rdflib.term.URIRef('http://www.wikidata.org/entity/Q18205125'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q673661'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q7184903'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q11028'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q3695082'), rdflib.term.URIRef('http://www.wikidata.org/entity/Q853614'), rdflib.term.URIRef('http://www.wikid

. http://www.wikidata.org/entity/Q35120 ! entity
 % http://www.wikidata.org/entity/Q488383 ! object
  % http://www.wikidata.org/entity/Q3695082 ! sign
   % http://www.wikidata.org/entity/Q853614 ! identifier
    % http://www.wikidata.org/entity/Q937228 ! property
     % http://www.wikidata.org/entity/Q1347367 ! ability
      % http://www.wikidata.org/entity/Q151885 ! concept
       % http://www.wikidata.org/entity/Q673661 ! abstraction
        % http://www.wikidata.org/entity/Q7184903 ! abstract object
         % http://www.wikidata.org/entity/Q58778 ! system
          % http://www.wikidata.org/entity/Q3505845 ! state
           % http://www.wikidata.org/entity/Q18479330 ! physical condition
            % http://www.wikidata.org/entity/Q7189713 ! physiological condition
             % http://www.wikidata.org/entity/Q2057971 ! health problem
              % http://www.wikidata.org/entity/Q12136 ! disease
               % http://www.wikidata.org/entity/Q7101840 ! Organic disease
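
One common way to make a tree renderer resilient to cycles is to track the path from the root and stop expanding a node that already appears on it. The sketch below shows that idea; `render_tree`, the toy `children` mapping, and the back edge from 'identifier' to 'sign' are hypothetical, and this is not ontobio's renderer.

```python
def render_tree(children, node, depth=0, path=()):
    """Return indented lines for node and its descendants, cutting cycles."""
    if node in path:
        # Node is already on the path from the root: report it, do not recurse
        return [" " * depth + "% " + node + " (CYCLE, not expanded)"]
    marker = ". " if depth == 0 else "% "
    lines = [" " * depth + marker + node]
    for child in children.get(node, []):
        lines.extend(render_tree(children, child, depth + 1, path + (node,)))
    return lines

# Toy parent -> children mapping with a hypothetical back edge
children = {
    "entity": ["object"],
    "object": ["sign"],
    "sign": ["identifier"],
    "identifier": ["sign"],  # closes a cycle
}
lines = render_tree(children, "entity")
print("\n".join(lines))
```

Tracking the current path rather than a global visited set means a node reachable by two different acyclic routes is still drawn under both parents.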
        

In [4]:
## Render the same subgraph as a PNG image using Graphviz
w = GraphRenderer.create('png')
w.outfile = 'output/anxiety-disorder.png'
w.write_subgraph(ont, nodes, query_ids=qids)


![img](output/anxiety-disorder.png)

## TODO

Fetch the drugs and genes associated with each of these disorders.
