# Rule Extraction from Annotated Examples



### Example Ground Truth for learning extraction rules

### Ideal PAS tuple: 
```
Sentence: Full-mouth debridement is not payable on the same date of services as other prophylactic or preventive procedures.
Compounds found in ontology are: ['the same date', 'Full-mouth debridement', 'preventive procedures', 'is not payable']
Syntactic Roles: subj,pred,obj,comp,prep_adv,cord,prep
Ideal Tuples: Full-mouth debridement,payable,the same date,services,not,NA,NA
```
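The `Ideal Tuples` line is positional: each value lines up with the role at the same position in the roles line. A minimal sketch of pairing the two, assuming both lines are comma-separated, that no value contains a comma, and that `NA` marks an empty slot. The helper name `parse_ideal_tuple` is hypothetical and not part of the original notebook.

```python
def parse_ideal_tuple(roles_line, values_line):
    """Pair each syntactic role with the value at the same position."""
    roles = [r.strip() for r in roles_line.split(",")]
    values = [v.strip() for v in values_line.split(",")]
    if len(roles) != len(values):
        raise ValueError("roles and values must have the same length")
    return dict(zip(roles, values))


# Lines copied from the annotated example above
roles_line = "subj,pred,obj,comp,prep_adv,cord,prep"
values_line = "Full-mouth debridement,payable,the same date,services,not,NA,NA"

ideal_tuple = parse_ideal_tuple(roles_line, values_line)
for role, value in ideal_tuple.items():
    print(f"{role:>8}: {value}")
```

Keeping the tuple as a role-keyed dictionary makes it easier to compare against the `tuple` and `slotting_rule` fields of an extraction rule later on.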

In [1]:
import spacy
from spacy.matcher import Matcher
from spacy import displacy
from spacy_pattern_builder import build_dependency_pattern
from spacy_pattern_builder import util
from spacy.matcher import DependencyMatcher
from pprint import pprint
from pprint import PrettyPrinter

In [2]:
sentence="Full-mouth debridement is not payable on the same date of service as other prophylactic or preventive procedures."
nlp = spacy.load('en_core_web_sm')
doc=nlp(sentence)
with doc.retokenize() as retokenizer:
    # For this sample, retokenization uses known token index ranges.
    # In a real application, a domain-specific retokenization module supplies these spans.
    retokenizer.merge(doc[0:4])    # Full-mouth debridement
    retokenizer.merge(doc[8:11])   # the same date
    retokenizer.merge(doc[17:19])  # preventive procedures
    retokenizer.merge(doc[4:8])    # is not payable (this range also covers the following "on")
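The hard-coded index ranges above stand in for a domain-specific retokenization module that the notebook does not show. As a rough sketch of how such ranges might be derived from the ontology compounds, the pure-Python helper below searches a token list for each compound. Everything here is an assumption for illustration: the function name `find_span` is hypothetical, the token list is written out by hand and assumed to match the tokenizer output, and matching ignores whitespace and case.

```python
def find_span(tokens, compound):
    """Return the (start, end) token range (end exclusive) of the first
    occurrence of `compound`, or None if it is not found."""
    target = "".join(compound.split()).lower()
    for start in range(len(tokens)):
        joined = ""
        for end in range(start, len(tokens)):
            joined += tokens[end].lower()
            if joined == target:
                return (start, end + 1)
            if not target.startswith(joined):
                break  # this start position cannot match
    return None


# Hand-written token list, assumed to match the tokenizer output
tokens = ["Full", "-", "mouth", "debridement", "is", "not", "payable", "on",
          "the", "same", "date", "of", "service", "as", "other",
          "prophylactic", "or", "preventive", "procedures", "."]
compounds = ["the same date", "Full-mouth debridement",
             "preventive procedures", "is not payable"]

spans = {compound: find_span(tokens, compound) for compound in compounds}
for compound, span in spans.items():
    print(compound, "->", span)
```

Each resulting range could then be passed to `retokenizer.merge(doc[start:end])`. Under these assumptions the sketch gives `(4, 7)` for "is not payable", whereas the cell above merges `doc[4:8]`.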


In [3]:

displacy.render(doc, style="dep", options={"distance": 140}, jupyter=True)

In [4]:
def extract_matches_from_sentence(doc, token_indices):
    """Build a dependency pattern covering the tokens at token_indices."""
    # Build the pattern with spacy_pattern_builder
    match_tokens = [doc[i] for i in token_indices]
    # Constrain each pattern node by its dependency label only
    feature_dict = {
        'DEP': 'dep_'
    }
    pattern = build_dependency_pattern(doc, match_tokens, feature_dict=feature_dict)
    return pattern

In [5]:
# Extract the Semgrex pattern for <Full-mouth debridement, is not payable, the same date, services>
token_indices = [0, 1, 2, 3, 4]  # Tokens on the shortest dependency path (SDP) covering the desired tokens
linguistic_pattern = extract_matches_from_sentence(doc,token_indices)

In [6]:
print("Semgrex pattern characterizing the subtree is: ")
print(linguistic_pattern)

Semgrex pattern characterizing the subtree is: 
[{'SPEC': {'NODE_NAME': 'node1'}, 'PATTERN': {'DEP': 'ROOT'}}, {'SPEC': {'NODE_NAME': 'node0', 'NBOR_NAME': 'node1', 'NBOR_RELOP': '>'}, 'PATTERN': {'DEP': 'nsubj'}}, {'SPEC': {'NODE_NAME': 'node2', 'NBOR_NAME': 'node0', 'NBOR_RELOP': '$--'}, 'PATTERN': {'DEP': 'pobj'}}, {'SPEC': {'NODE_NAME': 'node3', 'NBOR_NAME': 'node2', 'NBOR_RELOP': '>'}, 'PATTERN': {'DEP': 'prep'}}, {'SPEC': {'NODE_NAME': 'node4', 'NBOR_NAME': 'node3', 'NBOR_RELOP': '>'}, 'PATTERN': {'DEP': 'pobj'}}]
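The printed pattern is hard to read as a flat list. The sketch below turns it into one line per node, assuming the `SPEC`/`PATTERN` layout of spaCy v2's `DependencyMatcher`, where a relation reads as `NBOR_NAME NBOR_RELOP NODE_NAME` (for example, `node1 > node0` means `node1` is the head of `node0`). The helper name `describe_pattern` is hypothetical.

```python
def describe_pattern(pattern):
    """Return one (neighbor, operator, node, dep) tuple per pattern entry.
    The anchor node has no neighbor, so its first two fields are None."""
    edges = []
    for entry in pattern:
        spec = entry["SPEC"]
        dep = entry["PATTERN"].get("DEP")
        neighbor = spec.get("NBOR_NAME")
        operator = spec.get("NBOR_RELOP")
        edges.append((neighbor, operator, spec["NODE_NAME"], dep))
    return edges


# Pattern copied from the output above
linguistic_pattern = [
    {'SPEC': {'NODE_NAME': 'node1'}, 'PATTERN': {'DEP': 'ROOT'}},
    {'SPEC': {'NODE_NAME': 'node0', 'NBOR_NAME': 'node1', 'NBOR_RELOP': '>'}, 'PATTERN': {'DEP': 'nsubj'}},
    {'SPEC': {'NODE_NAME': 'node2', 'NBOR_NAME': 'node0', 'NBOR_RELOP': '$--'}, 'PATTERN': {'DEP': 'pobj'}},
    {'SPEC': {'NODE_NAME': 'node3', 'NBOR_NAME': 'node2', 'NBOR_RELOP': '>'}, 'PATTERN': {'DEP': 'prep'}},
    {'SPEC': {'NODE_NAME': 'node4', 'NBOR_NAME': 'node3', 'NBOR_RELOP': '>'}, 'PATTERN': {'DEP': 'pobj'}},
]

edges = describe_pattern(linguistic_pattern)
for neighbor, operator, node, dep in edges:
    if neighbor is None:
        print(f"{node} [{dep}] (anchor)")
    else:
        print(f"{neighbor} {operator} {node} [{dep}]")
```

Reading the edges this way shows that the pattern constrains only dependency labels, which follows from the `feature_dict` passed to `build_dependency_pattern`.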


In [7]:
# Slotting rules are then constructed from the original example.
# We know from the annotated example that "Full-mouth debridement" has the syntactic label "subj" in our application.
# Thus the slotting rule is appended to the above linguistic pattern.
# The final extraction rule looks like:
extraction_rule = {"sentence": "Full-mouth debridement is not payable on the same date of services as other prophylactic or preventive procedures.", "tuple": {"subj": 0, "pred": 4, "obj": "NA", "comp": "NA", "prep_adv": "NA", "cord": "NA", "prep": "NA"}, "rule_id": -1930383187189869905, "semgrex_pattern": [{"SPEC": {"NODE_NAME": "node4"}, "PATTERN": {"DEP": "ROOT"}}, {"SPEC": {"NODE_NAME": "node3", "NBOR_NAME": "node4", "NBOR_RELOP": ">"}, "PATTERN": {"DEP": "nsubj"}}, {"SPEC": {"NODE_NAME": "node2", "NBOR_NAME": "node3", "NBOR_RELOP": ">"}, "PATTERN": {"DEP": "compound"}}, {"SPEC": {"NODE_NAME": "node0", "NBOR_NAME": "node2", "NBOR_RELOP": ">"}, "PATTERN": {"DEP": "amod"}}], "slotting_rule": {"pred": {"dep": "ROOT", "pos": "AUX", "tag": "VBZ"}, "subj": {"dep": "amod", "pos": "ADJ", "tag": "JJ"}, "obj": [], "prep": [], "comp": [], "cord": [], "prep_adv": []}}
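The notebook does not show how a slotting rule is applied once a pattern has matched. One plausible reading, sketched below, is that each role's constraint (`dep`, `pos`, `tag`) is compared against the matched tokens, and roles with an empty constraint are left as `NA`. The function name `fill_slots` and the token dictionaries are hypothetical. The token attributes are written by hand for illustration and are not parser output.

```python
def fill_slots(slotting_rule, matched_tokens):
    """Assign the first matched token that satisfies each role's constraint.
    Roles with an empty constraint, or with no satisfying token, get "NA"."""
    slots = {}
    for role, constraint in slotting_rule.items():
        slots[role] = "NA"
        if not constraint:  # an empty list means no rule for this role
            continue
        for token in matched_tokens:
            if all(token.get(key) == value for key, value in constraint.items()):
                slots[role] = token["text"]
                break
    return slots


# Slotting rule copied from the extraction rule above
slotting_rule = {
    "pred": {"dep": "ROOT", "pos": "AUX", "tag": "VBZ"},
    "subj": {"dep": "amod", "pos": "ADJ", "tag": "JJ"},
    "obj": [], "prep": [], "comp": [], "cord": [], "prep_adv": [],
}

# Hand-written stand-ins for tokens returned by a pattern match (assumption)
matched_tokens = [
    {"text": "Full", "dep": "amod", "pos": "ADJ", "tag": "JJ"},
    {"text": "debridement", "dep": "nsubj", "pos": "NOUN", "tag": "NN"},
    {"text": "is", "dep": "ROOT", "pos": "AUX", "tag": "VBZ"},
]

slots = fill_slots(slotting_rule, matched_tokens)
print(slots)
```

With real spaCy tokens, the same comparison could use `token.dep_`, `token.pos_`, and `token.tag_` in place of the dictionary lookups.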

In [8]:
pprint(extraction_rule)

{'rule_id': -1930383187189869905,
 'semgrex_pattern': [{'PATTERN': {'DEP': 'ROOT'},
                      'SPEC': {'NODE_NAME': 'node4'}},
                     {'PATTERN': {'DEP': 'nsubj'},
                      'SPEC': {'NBOR_NAME': 'node4',
                               'NBOR_RELOP': '>',
                               'NODE_NAME': 'node3'}},
                     {'PATTERN': {'DEP': 'compound'},
                      'SPEC': {'NBOR_NAME': 'node3',
                               'NBOR_RELOP': '>',
                               'NODE_NAME': 'node2'}},
                     {'PATTERN': {'DEP': 'amod'},
                      'SPEC': {'NBOR_NAME': 'node2',
                               'NBOR_RELOP': '>',
                               'NODE_NAME': 'node0'}}],
 'sentence': 'Full-mouth debridement is not payable on the same date of '
             'services as other prophylactic or preventive procedures.',
 'slotting_rule': {'comp': [],
                   'cord': [],
                   'ob