# TreeDLib

In [8]:
%load_ext autoreload
%autoreload 2



We define three classes of operators:
* _NodeSets:_ $S : 2^T \to 2^T$
* _Indicators:_ $I : 2^T \to \{0,1\}^F$
* _Combinators:_ $C : \{0,1\}^F \times \{0,1\}^F \to \{0,1\}^F$

where $T$ is a given input tree, and $F$ is the dimension of the feature space.
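To make the three operator types concrete, here is a minimal sketch on a toy tree. It uses the standard library's `xml.etree.ElementTree` rather than the TreeDLib classes, and the names `children_of`, `word_indicator`, `union` and `VOCAB` are hypothetical, chosen only for illustration. A node set is represented as a list of elements, and $F$ is the size of a small fixed vocabulary.

```python
import xml.etree.ElementTree as ET

# Toy input tree T; every node carries a 'word' attribute.
tree = ET.fromstring(
    '<node word="caused">'
    '<node word="disease"/>'
    '<node word="by"><node word="mutations"/></node>'
    '</node>'
)

# Hypothetical feature space: one dimension per vocabulary word (F = 4).
VOCAB = ['disease', 'by', 'mutations', 'caused']

def children_of(nodes):
    """NodeSet operator: maps a set of nodes to the set of their children."""
    return [child for node in nodes for child in node]

def word_indicator(nodes, vocab):
    """Indicator operator: maps a set of nodes to a binary vector in {0,1}^F."""
    words = {node.get('word') for node in nodes}
    return [1 if w in words else 0 for w in vocab]

def union(u, v):
    """Combinator operator: element-wise OR of two binary feature vectors."""
    return [a | b for a, b in zip(u, v)]

level1 = word_indicator(children_of([tree]), VOCAB)               # children of the root
level2 = word_indicator(children_of(children_of([tree])), VOCAB)  # grandchildren
combined = union(level1, level2)
print(combined)  # [1, 1, 1, 0]
```

Here `combined` marks every vocabulary word found below the root, which is why the last dimension (`caused`, the root itself) stays 0.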

In [9]:
from feature_templates import *
from basic_features import *

## Simple demo: Generating DDLib features

As a first simple demo, let's reproduce the features produced by [ddlib](http://deepdive.stanford.edu/doc/basics/gen_feats.html).

**_Note: there are some noticeable differences that stem from using a dependency tree as the base representation; a simple linear representation of the sentence could be used too, as in DDLib._**

First, let's load a few sample sentences and convert one of them to XML format for testing. We'll also load the feature templates library and tag some candidates (crudely, for now) to play around with:

In [13]:
import lxml.etree as et
from util import load_sentences, tag_candidate
from tree_structs import sentence_to_xmltree, XMLTree
# Wrap in list() so the result can be indexed on both Python 2 and Python 3
dts = list(map(sentence_to_xmltree, load_sentences('test/test1.parsed.tsv')))
dt = dts[1]
tag_candidate(dt.root, ['Autosomal', 'dominant', 'polycystic', 'kidney', 'disease'], 'P1')
tag_candidate(dt.root, ['PKD1'], 'G1')
tag_candidate(dt.root, ['PKD2'], 'G2')
dt.to_str()
dt.render_tree()
root = dt.root

In [14]:
pheno = root.xpath("//*[@cid='P1'][1]")[0]

In [15]:
p = XMLTree(pheno)
p.render_tree()

In [None]:
# TODO:
# - XML -> JSON / visualization; be able to init Tree from XML!
# - Clean up this notebook / code!

def new_root_cid(root, cid):
    
    # Get candidate mention as new root
    # NOTE: This will already contain all of its descendants from the old tree
    new_root = root.xpath("//*[@cid='%s'][1]" % cid)[0]
    
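    # Goal: recursively add parents + all their children, *minus current one!*
    # NOTE: As written, this only appends the immediate parent; it does not yet
    # walk up the remaining ancestors or detach the current node from its parent.
    new_root.append(root.xpath("//*[@cid='%s'][1]/.." % cid)[0])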
    return new_root
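One way the re-rooting described in the comments could be completed is sketched below. This is an assumption about the intended behavior, not the notebook's implementation: it uses the standard library's `xml.etree.ElementTree` instead of `lxml`, and `reroot` is a hypothetical helper name. The idea is to walk from the target node up to the old root and flip each parent/child edge along that path.

```python
import xml.etree.ElementTree as ET

def reroot(root, target):
    """Re-root the tree at `target` by reversing the edges on the path to `root`."""
    # ElementTree nodes have no parent pointer, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    # Path from the target up to the old root: [target, parent, grandparent, ...]
    path = [target]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])

    # Flip each edge: detach the child from its parent, then hang the parent under it.
    for child, parent in zip(path, path[1:]):
        parent.remove(child)
        child.append(parent)
    return target

# Toy tree standing in for a tagged dependency tree.
toy = ET.fromstring(
    '<node word="caused">'
    '<node word="disease" cid="P1"><node word="kidney"/></node>'
    '<node word="mutations"/>'
    '</node>'
)
new_root = reroot(toy, toy.find(".//*[@cid='P1']"))
print(ET.tostring(new_root))
```

After the call, the candidate node keeps its original descendants and gains its former parent as an extra child, with the parent's other children still attached to that parent.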

In [None]:
root.xpath("//*[@cid='P1'][1]/..")

In [None]:
et.tostring(new_root_cid(root, 'P1')[0])

In [None]:
Mention(0)

In [None]:
NGrams(Between(Mention(0), Mention(1)), 'lemma', 3).print_apply(root, ['P1', 'G1'])

In [None]:
load_sentences('test/test1.parsed.tsv')[1].text

In [None]:
def flat_tree(root):
    if root.get('dep_label') is not None:
        s = '[%s]> %s' % (root.get('dep_label'), root.get('word'))
    else:
        s = root.get('word')
    if len(root) > 0:
        s += ' ( %s )' % ', '.join(flat_tree(c) for c in root)
    return s

In [None]:
ft = flat_tree(dt.root)
ft

In [None]:
import re
re.findall(r'disorder.*?caused.*?mutations', ft)
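To see what the flattened string looks like and why a plain regex can then match a path through the tree, here is a small worked example. It repeats `flat_tree` so the block runs on its own, and the toy tree and its dependency labels are made up for illustration; it is built with the standard library's `xml.etree.ElementTree`, which supports the same `get`, `len` and iteration calls used here.

```python
import re
import xml.etree.ElementTree as ET

def flat_tree(root):
    # Same logic as the function above: "[dep_label]> word ( children... )"
    if root.get('dep_label') is not None:
        s = '[%s]> %s' % (root.get('dep_label'), root.get('word'))
    else:
        s = root.get('word')
    if len(root) > 0:
        s += ' ( %s )' % ', '.join(flat_tree(c) for c in root)
    return s

# Hypothetical three-level tree; the root has no dep_label.
toy = ET.fromstring(
    '<node word="caused">'
    '<node word="disorder" dep_label="nsubj"/>'
    '<node word="mutations" dep_label="nmod">'
    '<node word="PKD1" dep_label="nmod"/>'
    '</node>'
    '</node>'
)

toy_ft = flat_tree(toy)
print(toy_ft)
# caused ( [nsubj]> disorder, [nmod]> mutations ( [nmod]> PKD1 ) )

toy_matches = re.findall(r'caused.*?mutations', toy_ft)
print(toy_matches)
# ['caused ( [nsubj]> disorder, [nmod]> mutations']
```

Because the string is a pre-order traversal, a head word always appears before its dependents, so the order of terms in the regex has to follow that traversal order rather than the sentence order.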

In [None]:
xml = dt.to_xml_str()

In [None]:
xml

In [None]:
re.findall(r'<node[^>]*cid="G2"[^>]*/>', xml)

In [None]:
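def node(attribs):
    # Build a regex matching one XML tag that carries all the given attributes.
    # NOTE: Attributes must appear in the tag in the dict's iteration order.
    # The separator allows adjacent attributes (one space) or others in between.
    sep = r'\s(?:[^>]*\s)?'
    return r'<[^>]+' + sep.join('%s="%s"' % (k, v) for k, v in attribs.items()) + r'[^>]*>'

def child_of(attribs):
    # TODO: not implemented yet
    raise NotImplementedError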

In [None]:
re.search(node({'cid':'G2'}), xml)
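The attribute-matching idea can be tried on a small XML string without the rest of the library. The sketch below uses a hypothetical helper, `node_pattern`, which takes an ordered list of `(name, value)` pairs instead of a dict so that the required attribute order is explicit, and escapes the values with `re.escape`. The sample XML is invented for illustration.

```python
import re

def node_pattern(attribs):
    """Regex for a single tag carrying the given attributes, in the given order."""
    # Between two attributes: one whitespace, optionally followed by other attributes.
    sep = r'\s(?:[^>]*\s)?'
    body = sep.join('%s="%s"' % (k, re.escape(v)) for k, v in attribs)
    return r'<[^>]+' + body + r'[^>]*>'

sample_xml = ('<node word="PKD1" cid="G1">'
              '<node word="PKD2" cid="G2" dep_label="conj"/>'
              '</node>')

# Single attribute: finds the tag tagged as candidate G2.
m1 = re.search(node_pattern([('cid', 'G2')]), sample_xml)
print(m1.group(0))
# <node word="PKD2" cid="G2" dep_label="conj"/>

# Two attributes with another attribute in between.
m2 = re.search(node_pattern([('word', 'PKD2'), ('dep_label', 'conj')]), sample_xml)

# Attributes in the wrong order do not match.
m3 = re.search(node_pattern([('cid', 'G2'), ('word', 'PKD2')]), sample_xml)
```

Since `[^>]` never crosses a tag boundary, each match stays inside one opening tag; matching parent/child relations would need a different pattern or a real XML parser.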

In [None]:
dt.root

In [None]:
Indicator(Between(Mention(0), Mention(1)), 'lemma').print_apply(dt.root, ['P1', 'G1'])

In [None]:
for feat in get_generic_mention_features(dt.root, 'G2', ['monogenic']):
    print(feat)

### Table example

In [None]:
# Some wishful thinking...
table_xml = """
<div class="table-wrapper">
    <h3>Causal genomic relationships</h3>
    <table>
        <tr><th>Gene</th><th>Variant</th><th>Phenotype</th></tr>
        <tr><td>ABC</td><td><i>AG34</i></td><td>Headaches during defecation</td></tr>
        <tr><td>BDF</td><td><i>CT2</i></td><td>Defecation during headaches</td></tr>
        <tr><td>XYG</td><td><i>AT456</i></td><td>Defecasomnia</td></tr>
    </table>
</div>
"""
from IPython.display import display_html, HTML
display_html(HTML(table_xml))

In [None]:
tt = TableTree(table_xml)

In [None]:
tt.to_xml_str()

In [None]:
tt.root
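Independently of `TableTree`, the structure that table features would need (which cell sits under which column header) can be recovered with the standard library alone. The following is only a sketch under that assumption, with a shortened, hypothetical table; it does not show how `TableTree` works internally.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the table above.
small_table_xml = """
<table>
    <tr><th>Gene</th><th>Variant</th></tr>
    <tr><td>ABC</td><td><i>AG34</i></td></tr>
    <tr><td>BDF</td><td><i>CT2</i></td></tr>
</table>
"""

table = ET.fromstring(small_table_xml.strip())
rows = table.findall('tr')

def cell_text(cell):
    # itertext() also collects text nested in inline tags such as <i>.
    return ''.join(cell.itertext())

# First row holds the column headers; the remaining rows hold the data cells.
headers = [cell_text(th) for th in rows[0].findall('th')]
records = [
    dict(zip(headers, (cell_text(td) for td in tr.findall('td'))))
    for tr in rows[1:]
]
print(records)
```

Each record maps a column header to the cell text in that row, which is the kind of row/column context a table-aware feature template could condition on.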