Report 1
This page explains the logic behind the code.
This self-contained function is responsible for satisfying requirement 1, "extract a path of dependency relations from the ROOT to a token".
Input: string
Output: dictionary containing list of tokens
The function parses the sentence string into a spaCy Doc object. It then builds a dictionary that maps each token to its head; the root of the sentence, identified with find root, is mapped to None instead.
For each token, a list is then created and filled so that each element i is the head of element i-1, with the token itself at index 0 and the root at index n.
The list is then reversed, so that it runs from the root down to the token, and stored in the dictionary under that token. The dictionary is returned.
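The climb-and-reverse step can be illustrated without spaCy. The following is only a sketch: `paths_from_root` is a hypothetical name, and the head dictionary is written by hand, whereas in the real function it would be built from each token's `head` attribute (spaCy makes the root its own head, which is why the root has to be mapped to None explicitly).

```python
def paths_from_root(heads):
    """Map each token to the list of tokens from the root down to it.

    `heads` maps every token to its head; the root maps to None.
    """
    paths = {}
    for token in heads:
        path = [token]
        # Climb head by head until the root (whose head is None) is reached.
        while heads[path[-1]] is not None:
            path.append(heads[path[-1]])
        path.reverse()  # root first, token last
        paths[token] = path
    return paths


# Hand-written heads for "I saw the man" (an assumption, not parser output).
heads = {"I": "saw", "saw": None, "the": "man", "man": "saw"}
paths = paths_from_root(heads)
print(paths["the"])  # ['saw', 'man', 'the']
```

Strings are used as keys here only for readability; a sentence with repeated words would need token objects or indices instead.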
This function satisfies requirement 2, "extract subtree of dependents given a token".
Input: string, boolean (optional)
Output: dictionary containing list of tokens
This function returns a dictionary whose keys are the tokens and whose values are lists of their descendants. It uses a support list so that each value is a list object rather than the generator that spaCy's Token.subtree returns. The optional boolean (False by default) controls whether the token itself is also included in its list.
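As a rough illustration of the output shape, the sketch below computes the descendants of each token from a hand-written head dictionary. It does not use spaCy, and `subtrees_from_heads` and `include_self` are hypothetical names chosen for this example.

```python
def subtrees_from_heads(heads, include_self=False):
    """Map each token to the list of its descendants, in sentence order."""
    tokens = list(heads)  # dict order is taken to be sentence order

    def is_descendant(token, ancestor):
        # Walk up from `token`; it is a descendant if the walk meets `ancestor`.
        current = heads[token]
        while current is not None:
            if current == ancestor:
                return True
            current = heads[current]
        return False

    return {
        token: [
            t for t in tokens
            if is_descendant(t, token) or (include_self and t == token)
        ]
        for token in tokens
    }


# Hand-written heads for "I saw the man"; the root maps to None.
heads = {"I": "saw", "saw": None, "the": "man", "man": "saw"}
without_self = subtrees_from_heads(heads)
with_self = subtrees_from_heads(heads, include_self=True)
print(without_self["man"])  # ['the']
print(with_self["man"])     # ['the', 'man']
```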
This function satisfies requirement 3, "check if a given list of tokens (segment of a sentence) forms a subtree".
Input: list of tokens, sentence string
Output: boolean
This function checks whether a given sequence of tokens corresponds to any subtree of the sentence. It uses extract_subtrees to extract every subtree of the sentence and compares each one with the list of tokens to check. It uses get_list_from_tree to obtain list objects rather than generators.
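The comparison itself is small. The sketch below assumes the subtrees have already been extracted into a dictionary and that each subtree includes its own top token; both the name `is_subtree` and that inclusion choice are assumptions made for this example.

```python
def is_subtree(segment, subtrees):
    """Return True if `segment` matches one of the subtree lists exactly."""
    # list() also turns generators into comparable list objects.
    return any(list(segment) == list(subtree) for subtree in subtrees.values())


# Hand-written subtrees for "I saw the man", each including its own token.
subtrees = {
    "I": ["I"],
    "saw": ["I", "saw", "the", "man"],
    "the": ["the"],
    "man": ["the", "man"],
}
print(is_subtree(["the", "man"], subtrees))  # True
print(is_subtree(["saw", "the"], subtrees))  # False
```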
This function fulfils the fourth requirement, "identify head of a span, given its tokens".
Input: list of strings, sentence string
Output: the root token, or None
The function uses the support function contains_list to check whether the list of tokens of the sentence contains the list of tokens of the span exactly, in the same order. If a match is found, it creates a new span by slicing the sentence Doc at the matching position and returns the root of that span.
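To show what the root of a slice means, here is a sketch that works on plain lists instead of a spaCy Doc. It assumes the span is a connected piece of the tree and picks the first token whose head lies outside the slice; `span_head` and the index-based `heads` list are hypothetical stand-ins for the span's root attribute.

```python
def span_head(words, heads, start, end):
    """Return the word in words[start:end] whose head lies outside that slice.

    `heads[i]` is the index of the head of token i; the sentence root points
    to itself, as it does in spaCy.
    """
    for i in range(start, end):
        if heads[i] == i or not (start <= heads[i] < end):
            return words[i]
    return None  # empty slice


# Hand-written analysis of "I saw the man" (an assumption, not parser output).
words = ["I", "saw", "the", "man"]
heads = [1, 1, 3, 1]
print(span_head(words, heads, 2, 4))  # man
print(span_head(words, heads, 0, 2))  # saw
```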
This function implements requirement 5, "extract sentence subject, direct object and indirect object spans".
Input: sentence string, optional list of other parameters
Output: dictionary whose keys are the requested dependency types and whose values are lists of the corresponding spans
This function is more general than the one requested, which is implemented as a special case by extract_nsubj_dobj_iobj: it returns the spans for every dependency relation named in the optional parameters.
A dictionary is created with one empty list per requested dependency relation, keyed by the dependency type. Then, for each token in the Doc object, if the token's dependency relation is one of those requested, its span is appended to the list stored under that dependency type.
Finally, the dictionary is returned.
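The grouping step can be sketched on plain lists. This example assumes that the span of a token is the contiguous slice covering its subtree, which the text above does not state; `extract_spans`, `wanted` and the hand-written labels are likewise hypothetical.

```python
def extract_spans(words, deps, heads, wanted=("nsubj", "dobj", "iobj")):
    """Group the subtree span of every token by its dependency label.

    Only labels listed in `wanted` are kept. `heads[i]` is the index of the
    head of token i, with the root pointing to itself.
    """
    def in_subtree(i, top):
        # Climb from token i towards the root, looking for `top` on the way.
        while True:
            if i == top:
                return True
            if heads[i] == i:  # reached the root without meeting `top`
                return False
            i = heads[i]

    spans = {label: [] for label in wanted}
    for i, dep in enumerate(deps):
        if dep in spans:
            members = [j for j in range(len(words)) if in_subtree(j, i)]
            # Slice from the leftmost to the rightmost member of the subtree.
            spans[dep].append(words[members[0]:members[-1] + 1])
    return spans


# Hand-written analysis of "She gave the boy a book" (not parser output).
words = ["She", "gave", "the", "boy", "a", "book"]
deps = ["nsubj", "ROOT", "det", "iobj", "det", "dobj"]
heads = [1, 1, 3, 1, 5, 1]
spans = extract_spans(words, deps, heads)
print(spans["iobj"])  # [['the', 'boy']]
```

Passing a different tuple as `wanted` selects other relations, which mirrors how the special case for subjects and objects is described as a call to the general function.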
Returns a spaCy Doc object after checking that the input string is not empty. It is used throughout the code.
Returns the root token of the sentence by checking every token and looking for the one whose dep_ attribute is "ROOT".
This function returns a list object when given an iterable object. It is used when a generator must be converted into a comparable object.
This function checks whether a list is contained in another list, taking into account all the elements and their order, and returns the index in the larger list at which the first occurrence of the smaller list begins.
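A minimal sketch of such an ordered containment check follows. The name `find_sublist` is hypothetical, and returning -1 when there is no match (or when the smaller list is empty) is an assumption, since the behaviour in that case is not described here.

```python
def find_sublist(big, small):
    """Return the index in `big` where `small` starts as a contiguous run, or -1."""
    if not small:
        return -1
    for start in range(len(big) - len(small) + 1):
        # Compare a window of the same length as `small` at each position.
        if big[start:start + len(small)] == small:
            return start
    return -1


sentence = ["I", "saw", "the", "man"]
print(find_sublist(sentence, ["the", "man"]))  # 2
print(find_sublist(sentence, ["man", "the"]))  # -1
```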
This function uses extract_info to implement the fifth requirement, calling it with the optional string parameters "nsubj", "dobj" and "iobj".