### Rule-based Matching
spaCy offers a rule-matching tool called `Matcher` that lets you build a library of token patterns and match them against a `Doc` object, returning a list of the matches found. Patterns can match on any token attribute, including the text and linguistic annotations, and you can add multiple patterns to the same matcher.
In this project, I used spaCy's rule-based matching to find where the defined terms occur in the documents.
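
As a minimal sketch of the idea, the example below matches a two-token pattern against a short text. It assumes spaCy v3 and uses a blank English pipeline (`spacy.blank("en")`) with `LOWER` patterns so that no model download is needed; the sample text, the `HELLO_WORLD` label and the variable names are made up for illustration. Matching on `LEMMA`, as the main function of this notebook does, needs a pipeline with a lemmatizer such as `en_core_web_sm`.

```python
import spacy
from spacy.matcher import Matcher

# Blank pipeline: tokenizer only, no trained components (assumption for this sketch).
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# One pattern is a list of token descriptions; each dict must match one token, in order.
pattern = [{"LOWER": "hello"}, {"LOWER": "world"}]
matcher.add("HELLO_WORLD", [pattern])  # hypothetical label

sample_doc = nlp("Hello world! I said hello world.")

# Each match is a (match_id, start, end) tuple of token indices; end is exclusive.
results = [
    (nlp.vocab.strings[match_id], start, end, sample_doc[start:end].text)
    for match_id, start, end in matcher(sample_doc)
]
print(results)
```

The `match_id` is a hash of the label, so it is looked up in `nlp.vocab.strings` to get the readable name back.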

#### Import the pandas, NumPy and spaCy libraries

In [23]:
import numpy as np
import pandas as pd
import spacy 
from spacy.matcher import Matcher
nlp = spacy.load('en_core_web_sm')  # loaded as nlp, the name the main function uses

#### JSON to Dataframe

In [24]:
dt = pd.read_json('definitions.json') #Definitions dataframe

In [25]:
doc = pd.read_json('documents.json') #documents dataframe

#### Main Function

In [26]:
lst_dict = []
def myfunc(row):
    # Build one token pattern from the term's lemmas: each lemma must match one token, in order
    pattern = [{"LEMMA": item} for item in row['term_lemmas']]
    matcher = Matcher(nlp.vocab)
    matcher.add('assignment', [pattern])
    for i in range(len(doc)):
        # Column 0 is used as the document id; column 1 holds a dict with the document text
        found_matches = matcher(nlp(doc.iloc[i, 1]['text']))
        for elem in found_matches:
            # elem is a (match_id, start, end) tuple of token indices
            lst_dict.append({'term_id': row['term_id'], 'doc_id': doc.iloc[i, 0], 'loc': elem})
dt.apply(myfunc, axis=1)  # apply myfunc to every row of the definitions dataframe

#create my_matches dataframe
cols = ['term_id', 'doc_id', 'loc']
my_matches = pd.DataFrame(lst_dict, columns=cols)  # DataFrame.append was removed in pandas 2.0

#creating start_token and end_token
#each loc is a (match_id, start, end) tuple returned by the Matcher
def firstchar(myloc):
    return myloc[1]  # start token index
def endchar(myloc):
    return myloc[2]  # end token index (exclusive)
    
my_matches["start_token"] = my_matches['loc'].apply(firstchar)
my_matches["end_token"] = my_matches['loc'].apply(endchar)
my_matches.drop('loc', axis=1, inplace=True)
my_matches


| | term_id | doc_id | start_token | end_token |
|---|---|---|---|---|
| 0 | 1 | 1 | 8 | 10 |
| 1 | 1 | 1 | 29 | 31 |
| 2 | 2 | 2 | 10 | 14 |


`start_token`: the index of the first token of the match
`end_token`: the index one past the last token of the match (the end is exclusive)
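
Because the end index is exclusive, the two values can be used directly as a slice. The sketch below illustrates this with plain Python; the whitespace-split token list and the match tuple are hypothetical stand-ins for a spaCy `Doc` and a real `Matcher` result.

```python
# Whitespace-split tokens stand in for a spaCy Doc here (an assumption for
# illustration; spaCy's tokenizer also splits off punctuation).
tokens = "the borrower shall repay the loan in full".split()

# A hypothetical match tuple in the (match_id, start, end) shape returned by Matcher.
match = (12345, 1, 4)
match_id, start_token, end_token = match

# The end index is exclusive, so slicing yields exactly the matched tokens.
matched = tokens[start_token:end_token]

# The number of matched tokens is simply end minus start.
span_length = end_token - start_token
print(matched, span_length)
```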

In [27]:
#Exporting the json file
my_matches.to_json('my_matches.json', orient='records')  # writes a list with one JSON object per match
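
To sanity-check an export of this shape, it can be read back with the standard library. The sketch below is only an illustration: it uses hand-written records copied from the result table in this notebook and a temporary file, rather than the real `my_matches.json`.

```python
import json
import os
import tempfile

# Records shaped like the rows of my_matches (values copied from the result table).
records = [
    {"term_id": 1, "doc_id": 1, "start_token": 8, "end_token": 10},
    {"term_id": 1, "doc_id": 1, "start_token": 29, "end_token": 31},
    {"term_id": 2, "doc_id": 2, "start_token": 10, "end_token": 14},
]

# Write to a temporary location so the real export is not overwritten.
path = os.path.join(tempfile.mkdtemp(), "my_matches.json")
with open(path, "w") as fout:
    json.dump(records, fout)

# Read the file back and check its structure.
with open(path) as fin:
    loaded = json.load(fin)

expected_keys = {"term_id", "doc_id", "start_token", "end_token"}
for rec in loaded:
    assert set(rec) == expected_keys              # all four fields present
    assert rec["start_token"] < rec["end_token"]  # every span is non-empty
```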

### Thank you! If you need any other information, please let me know. Ali Ghannadrad