
Report 1

Jhonny Hueller edited this page Apr 27, 2021

Code description:

This page is dedicated to explaining the logic behind this code.

Main functions:

extract_paths_to_tokens

This self-contained function satisfies requirement 1: "extract a path of dependency relations from the ROOT to a token".

Input: string

Output: dictionary mapping each token to a list of tokens

The function parses the sentence string into a spaCy Doc object. It then creates a dictionary that maps each token to its head; the root of the sentence, identified with find_root, is mapped to None instead.

Then, for each token, a list is built by following the head links: the element at index i is the head of the element at index i-1, so the token itself is at index 0 and the root at the last index, n.

The list is then reversed, so that it runs from the ROOT to the token, and stored in the dictionary, which is returned.
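The steps above can be sketched in plain Python so that the example runs without a trained spaCy pipeline. This is a hypothetical sketch, not the page's code: it assumes the parse is given as a list of absolute head indices plus a list of dependency labels (the same convention spaCy accepts when a Doc is built by hand, with the ROOT pointing to itself), and it identifies tokens by index instead of by spaCy Token objects.

```python
from __future__ import annotations


def extract_paths_to_tokens(heads: list[int], deps: list[str]) -> dict[int, list[int]]:
    """Hypothetical sketch: map each token index to its path from the ROOT."""
    root = deps.index("ROOT")  # plays the role of find_root
    # token -> head, with the ROOT mapped to None
    head_of = {i: h for i, h in enumerate(heads)}
    head_of[root] = None

    paths = {}
    for token in head_of:
        path = [token]  # the token itself sits at index 0
        while head_of[path[-1]] is not None:
            path.append(head_of[path[-1]])  # element i is the head of element i-1
        path.reverse()  # now the ROOT comes first and the token last
        paths[token] = path
    return paths
```

For a hand-written parse of "She gave him a book" (heads `[1, 1, 1, 4, 1]`, "gave" labelled ROOT), the path for "a" (index 3) is `[1, 4, 3]`, that is gave → book → a.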

extract_subtrees

This function satisfies requirement 2: "extract subtree of dependents given a token".

Input: string, boolean (optional)

Output: dictionary mapping each token to a list of tokens

This function returns a dictionary whose keys are the tokens and whose values are lists of their descendants. It builds a support list so that each value is a list object rather than the generator returned by spaCy's Token.subtree. The optional boolean (False by default) controls whether the token itself is also included in its list.
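A minimal sketch of the same idea, assuming (as a hypothetical simplification) that the parse is a list of absolute head indices with the ROOT pointing to itself and that tokens are identified by index. In real spaCy code, `list(token.subtree)` yields the token plus its descendants in document order; the sketch reproduces that order and drops the token unless `include_self` is set.

```python
from __future__ import annotations


def extract_subtrees(heads: list[int], include_self: bool = False) -> dict[int, list[int]]:
    """Hypothetical sketch: map each token index to the indices of its descendants."""
    # Build child lists from the head links; the ROOT points to itself, so skip it.
    children = {i: [] for i in range(len(heads))}
    for i, h in enumerate(heads):
        if h != i:
            children[h].append(i)

    subtrees = {}
    for token in children:
        found, stack = [], [token]
        while stack:  # depth-first walk over the token and its descendants
            node = stack.pop()
            found.append(node)
            stack.extend(children[node])
        if not include_self:
            found.remove(token)
        subtrees[token] = sorted(found)  # document order, as Token.subtree yields
    return subtrees
```

Returning plain lists, as the page does, makes the values directly comparable with `==`, which a generator is not.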

token_to_subtree_check

This function satisfies requirement 3: "check if a given list of tokens (segment of a sentence) forms a subtree".

Input: list of tokens, sentence string

Output: boolean

This function checks whether a given sequence of tokens corresponds to any subtree in the sentence. It uses extract_subtrees to extract all the possible subtrees of the sentence and compares each of them with the list of tokens to check, using get_list_from_tree to obtain a list object rather than a generator.
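A hypothetical sketch of the check, runnable without spaCy. It assumes the sentence is given as a list of words plus absolute head indices (ROOT pointing to itself), that a subtree includes its head token, and that tokens are compared by their text; the helper `is_descendant` is introduced here for illustration only.

```python
def is_descendant(heads, j, t):
    """True if token j lies in the subtree rooted at token t (j == t counts)."""
    while j != t:
        if heads[j] == j:  # reached the ROOT without meeting t
            return False
        j = heads[j]
    return True


def token_to_subtree_check(tokens, words, heads):
    """Hypothetical sketch: do `tokens` equal the words of some subtree?"""
    n = len(words)
    for t in range(n):
        # Words of the subtree rooted at t, in sentence order.
        subtree = [words[j] for j in range(n) if is_descendant(heads, j, t)]
        if subtree == list(tokens):  # list() mirrors get_list_from_tree
            return True
    return False
```

Comparing text is the simplest choice, but it can give a false positive when the same word occurs twice in a sentence; comparing token indices would avoid that.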

extract_head_of_span

This function fulfils requirement 4: "identify head of a span, given its tokens".

Input: list of strings, sentence string

Output: the root token, or None

The function uses the support function contains_list to check whether the list of the sentence's tokens contains the span's tokens exactly, as a contiguous sequence in the same order. If a match is found, it creates a new Span by slicing the Doc at the matching position and returns the span's root.
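A hypothetical, spaCy-free sketch of the same logic. It assumes the sentence is a list of words with absolute head indices (ROOT pointing to itself) and returns the head's index instead of a Token; the `-1` "not found" value of contains_list and the helper `depth` are assumptions for illustration. The root is chosen the way spaCy documents `Span.root`: the span token with the shortest path to the ROOT, taking the first one on ties.

```python
def contains_list(big, small):
    """Start index of the first contiguous occurrence of `small` in `big`, or -1."""
    for start in range(len(big) - len(small) + 1):
        if big[start:start + len(small)] == small:
            return start
    return -1


def depth(heads, i):
    """Number of head links between token i and the ROOT."""
    d = 0
    while heads[i] != i:
        i, d = heads[i], d + 1
    return d


def extract_head_of_span(span_words, words, heads):
    """Hypothetical sketch: index of the span's head token, or None if not found."""
    start = contains_list(words, span_words)
    if start == -1 or not span_words:
        return None
    span = range(start, start + len(span_words))
    # min() keeps the first token among those closest to the ROOT.
    return min(span, key=lambda i: depth(heads, i))
```

With "She gave him a book" (heads `[1, 1, 1, 4, 1]`), the span `["a", "book"]` has its head at index 4 ("book"), while `["book", "a"]` is not found and yields None.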

extract_info

This is the function that implements requirement 5: "extract sentence subject, direct object and indirect object spans".

Input: sentence string, optional list of dependency labels to extract

Output: dictionary mapping each requested dependency label to the list of corresponding spans

This function is more general than the one requested: it returns the spans of every dependency relation requested through the optional parameters. The requested function is implemented as a special case by extract_nsubj_dobj_iobj.

A dictionary is created with one list for each dependency relation to extract, keyed by its dependency type. Then, for each token in the Doc object whose dependency type is one of those to extract, its span is appended to the list stored under that dependency type.

Then the dictionary is returned.
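A hypothetical sketch of extract_info and of the extract_nsubj_dobj_iobj special case, runnable without spaCy. Assumptions: the sentence is a list of words with absolute head indices (ROOT pointing to itself) and dependency labels; the labels are passed as extra positional arguments; and a token's "span" is taken to be its whole subtree joined back into text, which is one common reading but not confirmed by the page.

```python
def is_descendant(heads, j, t):
    """True if token j lies in the subtree rooted at token t (j == t counts)."""
    while j != t:
        if heads[j] == j:  # reached the ROOT without meeting t
            return False
        j = heads[j]
    return True


def extract_info(words, heads, deps, *labels):
    """Hypothetical sketch: map each requested label to the spans of its tokens."""
    info = {label: [] for label in labels}  # one list per requested relation
    for i, dep in enumerate(deps):
        if dep in info:
            # Assumed span: the token's subtree, in sentence order.
            span = [words[j] for j in range(len(words)) if is_descendant(heads, j, i)]
            info[dep].append(" ".join(span))
    return info


def extract_nsubj_dobj_iobj(words, heads, deps):
    """The requested special case of extract_info."""
    return extract_info(words, heads, deps, "nsubj", "dobj", "iobj")
```

On a hand-written parse of "She gave him a book" labelled `nsubj, ROOT, iobj, det, dobj`, the special case returns `{"nsubj": ["She"], "dobj": ["a book"], "iobj": ["him"]}`. On real parses, note that spaCy's English pipelines typically label indirect objects `dative` rather than `iobj`, so the label list may need adjusting.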

Support Functions:

get_doc

Returns a spaCy Doc object after checking that the input string is not empty. It is used throughout the code.
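A minimal sketch of such a guard. Passing the pipeline in as the `nlp` argument, and rejecting empty or whitespace-only input with `ValueError`, are assumptions made for illustration; the page does not say how the empty case is handled.

```python
def get_doc(sentence, nlp):
    """Hypothetical sketch: parse `sentence` with a spaCy-style pipeline `nlp`."""
    # Assumption: empty or whitespace-only input is rejected with ValueError.
    if not sentence or not sentence.strip():
        raise ValueError("the input sentence is empty")
    return nlp(sentence)
```

In practice `nlp` would be a pipeline loaded once, for example with `spacy.load("en_core_web_sm")`, and reused across calls, since loading a pipeline is far more expensive than parsing a single sentence.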

find_root

Returns the root token of the sentence by checking all the tokens and looking for the one whose dep_ attribute is "ROOT".
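A hypothetical sketch over a plain list of dependency labels, returning an index instead of a Token. With a real Doc the same scan can be written as `next((t for t in doc if t.dep_ == "ROOT"), None)`. Because spaCy gives every sentence its own ROOT, a multi-sentence Doc yields only the first sentence's root with this approach.

```python
def find_root(deps):
    """Hypothetical sketch: index of the first token labelled ROOT, or None."""
    for i, dep in enumerate(deps):
        if dep == "ROOT":
            return i
    return None  # no ROOT label found
```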

get_list_from_tree

This function returns a list object when given an iterable object. It is used when a generator must be converted into an object that can be compared.
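The short example below, a sketch of the conversion rather than the page's own code, shows why it is needed: a generator never compares equal to a list, and it can be consumed only once.

```python
def get_list_from_tree(iterable):
    """Materialise any iterable (for example a generator) into a list."""
    return list(iterable)


words = ["a", "book"]
gen = (w for w in words)
print(gen == words)                      # False: a generator is not equal to a list
print(get_list_from_tree(gen) == words)  # True once materialised
print(get_list_from_tree(gen))           # []: the generator is now exhausted
```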

contains_list

A function that checks whether a list is contained in another list, taking into account all the elements and their order, and returns the index in the bigger list at which the first occurrence of the smaller list begins.
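A minimal sketch of contiguous, order-preserving containment using slice comparison. Returning `-1` when the smaller list does not occur is an assumption, since the page only describes the successful case.

```python
def contains_list(big, small):
    """Return the index in `big` where the first contiguous, in-order copy of
    `small` starts, or -1 if there is none (the -1 sentinel is an assumption)."""
    n = len(small)
    for start in range(len(big) - n + 1):
        if big[start:start + n] == small:  # same elements, same order
            return start
    return -1
```

For example, `contains_list(["She", "gave", "him", "a", "book"], ["a", "book"])` returns `3`, while `["book", "a"]` returns `-1` because the order differs. The slice check costs O(len(big) · len(small)), which is negligible for sentence-length lists.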

extract_nsubj_dobj_iobj

This function uses extract_info to implement requirement 5, calling it with the optional string parameters "nsubj", "dobj" and "iobj".