### Rule-based Matching
spaCy offers a rule-matching tool called `Matcher` that allows you to build a library of token patterns, then match those patterns against a Doc object to return a list of found matches. You can match on any part of the token including text and annotations, and you can add multiple patterns to the same matcher.
In this project, I applied spaCy's rule-based matching to locate the defined terms in a set of documents.
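
As a minimal sketch of how `Matcher` works, the example below matches a two-token pattern and reports the character offsets of the match. It makes two assumptions that differ from the rest of this notebook: it uses `spacy.blank("en")` (tokenizer only, no model download) and therefore matches on the `LOWER` attribute instead of `LEMMA`. The sentence and the pattern name `MEMBER_STATES` are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline (tokenizer only), so no lemmas are available
nlp = spacy.blank("en")

matcher = Matcher(nlp.vocab)
# One pattern = a list of token descriptions; here two consecutive tokens
matcher.add("MEMBER_STATES", [[{"LOWER": "member"}, {"LOWER": "states"}]])

sample = nlp("Member States shall keep a register.")  # made-up sentence

# Each match is (match_id, start_token, end_token); slice the Doc to get a Span
matches = [
    (sample[start:end].text, sample[start:end].start_char, sample[start:end].end_char)
    for _, start, end in matcher(sample)
]
print(matches)
```

With a trained pipeline such as `en_core_web_sm`, the same pattern could be written with `LEMMA` keys, as done in the main function of this notebook.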

#### Import the pandas, NumPy and spaCy libraries

In [1]:
import numpy as np
import pandas as pd
import spacy 
from spacy.matcher import Matcher
nlp = spacy.load('en_core_web_sm')

#### JSON to DataFrame

In [2]:
dt = pd.read_json('definitions.json')  # definitions DataFrame

In [3]:
doc = pd.read_json('documents.json')  # documents DataFrame

In [14]:
doc

|   | doc_id | analyzed_doc |
|---|--------|--------------|
| 0 | 1 | {'text': '3. Member States shall register al... |
| 1 | 2 | {'text': 'The following shall all be regarded ... |


#### Main Function

In [9]:
lst_dict = []  # one record per match, filled in by myfunc

def myfunc(row):
    # Build a token pattern with one {"LEMMA": ...} entry per lemma of the term
    pattern = [{"LEMMA": item} for item in row['term_lemmas']]

    matcher = Matcher(nlp.vocab)
    matcher.add('assignment', [pattern])

    # Run the matcher over every document
    for i in range(len(doc)):
        mydoc = nlp(doc.iloc[i, 1]['text'])  # column 1 is analyzed_doc
        found_matches = matcher(mydoc)

        for match_id, start, end in found_matches:
            span = mydoc[start:end]  # token span covered by the match
            lst_dict.append({'doc_id': doc.iloc[i, 0], 'defenition_id': row['term_id'], 'start_char': span.start_char, 'end_char': span.end_char})
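
Since several patterns can be added to the same matcher, one possible variation is to register every term once and parse each document a single time. The sketch below illustrates that idea under stated assumptions: it uses a blank pipeline and `LOWER` patterns instead of `en_core_web_sm` and `LEMMA`, and the `terms` dictionary, the `find_terms` helper and the sample sentence are hypothetical, not taken from `definitions.json` or `documents.json`.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: tokenizer-only pipeline, so patterns use LOWER instead of LEMMA
nlp = spacy.blank("en")

# Hypothetical terms keyed by a term id (stored as the pattern name)
terms = {"1": ["member", "states"], "2": ["competent", "authority"]}

matcher = Matcher(nlp.vocab)
for term_id, words in terms.items():
    matcher.add(term_id, [[{"LOWER": w} for w in words]])

def find_terms(doc_id, text):
    """Parse the text once and return one record per matched term."""
    parsed = nlp(text)
    rows = []
    for match_id, start, end in matcher(parsed):
        span = parsed[start:end]
        rows.append({
            "doc_id": doc_id,
            "term_id": nlp.vocab.strings[match_id],  # recover the pattern name
            "start_char": span.start_char,
            "end_char": span.end_char,
        })
    return rows

rows = find_terms(1, "Member States shall notify the competent authority.")
print(rows)
```

The trade-off is that the pattern name now carries the term id, so the lookup through `nlp.vocab.strings` replaces the `row['term_id']` access used in `myfunc`.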
    


In [10]:
dt.apply(myfunc, axis=1)  # run myfunc on every row of the definitions DataFrame

# Build the result directly from the list of records (DataFrame.append was removed in pandas 2.0)
my_matches = pd.DataFrame(lst_dict)
my_matches

|   | doc_id | defenition_id | start_char | end_char |
|---|--------|---------------|------------|----------|
| 0 | 1 | 1 | 38 | 54 |
| 1 | 1 | 1 | 172 | 187 |
| 2 | 2 | 2 | 60 | 94 |


`start_char`: the character offset in the document text where the matched term starts

`end_char`: the character offset where the matched term ends (exclusive)
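
Because the offsets are character positions into the document text, slicing the text with them recovers the matched term. The short sketch below shows this in plain Python; the sentence, the offsets and the `extract_span` helper are hypothetical and only illustrate the convention.

```python
def extract_span(text, start_char, end_char):
    """Return the substring covered by a match record (end offset is exclusive)."""
    return text[start_char:end_char]

# Hypothetical document text and match record
text = "The competent authority shall inform the operator."
record = {"start_char": 4, "end_char": 23}

term = extract_span(text, record["start_char"], record["end_char"])
print(term)
```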

In [6]:
# Export the matches as a JSON array with one record per match
my_matches.to_json('my_matches.json', orient='records')
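
To sanity-check an export like this, the file can be read back and each record inspected. The sketch below is one way to do that with the standard library only; it writes the three records shown in the output table above to a temporary file (a stand-in for `my_matches.json`, chosen here so the example does not overwrite anything) and verifies their shape.

```python
import json
import os
import tempfile

# Records copied from the output table above
records = [
    {"doc_id": 1, "defenition_id": 1, "start_char": 38, "end_char": 54},
    {"doc_id": 1, "defenition_id": 1, "start_char": 172, "end_char": 187},
    {"doc_id": 2, "defenition_id": 2, "start_char": 60, "end_char": 94},
]

# Assumption: a temporary path stands in for my_matches.json
path = os.path.join(tempfile.mkdtemp(), "my_matches.json")
with open(path, "w") as fout:
    json.dump(records, fout)

with open(path) as fin:
    loaded = json.load(fin)

# Every record should have the four expected keys and a non-empty span
expected_keys = {"doc_id", "defenition_id", "start_char", "end_char"}
well_formed = all(
    set(r) == expected_keys and r["start_char"] < r["end_char"] for r in loaded
)
print(len(loaded), well_formed)
```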

### Thank you! If you need any other information, please let me know. Ali Ghannadrad