In [131]:
import bella
from bella import syntactic_contexts
from bella.dependency_parsers import tweebo, stanford
import networkx as nx
from networkx.algorithms import traversal

def repro_traversaltree(conll, target):
    '''
    Adapted from the traversaltree method in:
    https://github.com/bluemonk482/tdparse/blob/master/src/utilities.py
    
    This is the method TDParse originally used to find the words that are 
    dependency connected to the target.
    
    The one change: traversal.bfs_successors now returns a generator and 
    so no longer has a values() method. We therefore had to guess, to some 
    degree, what the original code did. We assume the values are the second 
    element of each yielded tuple, since applying dict to the list of 
    tuples would make those elements the values.
    '''
    G = nx.Graph()
    for position, token, tag, parser, rel in conll:
        G.add_node(position)
        for position1, token1, tag1, parser1, rel1 in conll:
            if position1 == parser:
                head = position1
        if (parser == 0) or (parser == -1):
            pass
        else:
            try:
                G.add_edge(position, head, label=rel)
            except:
                print(token)
                print(conll)
    target_positions = []
    for position, token, tag, parser, rel in conll:
        if token == target:
            target_positions.append(position)
    positions = [[item for sublist in traversal.bfs_successors(G, target_position) for item in sublist] for target_position in target_positions]
    words = []
    for position in positions:
        for i in position:
            if isinstance(i, str):
                continue
            for d in i:
                d = int(d)
                words.append(conll[d-1][1])
    return words

# Dependency connected words within the TDParse method

The TDParse method uses the words that lie on the same syntactic path as the target. A footnote in the paper adds that proximity is not taken into account, so the whole syntactic tree is used. With most parsers the whole dependency tree of a text covers the whole text. However, the [Tweebo Parser](https://www.aclweb.org/anthology/D14-1108.pdf), which is the parser used in the paper, can split a text into multiple syntactic trees, so the tree that contains the target covers only one utterance of the original text.

We show below that this is how the [original TDParse work](https://www.aclweb.org/anthology/E17-1046.pdf) handled the text. To do so we use the `repro_traversaltree` method shown above, which is an adaptation of the [traversaltree](https://github.com/bluemonk482/tdparse/blob/master/src/utilities.py) method from their codebase.
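The adaptation hinges on how `bfs_successors` behaves in recent networkx releases. The sketch below uses a toy graph that is not part of the TDParse code and assumes networkx 2.x or later, where the call yields `(node, successors)` pairs. Wrapping the result in `dict` gives a mapping whose values are the successor lists, which is the assumption made in `repro_traversaltree`. The names `toy_graph`, `pairs`, and `reachable` are illustrative only.

```python
import networkx as nx
from networkx.algorithms import traversal

# Toy chain graph 1 - 2 - 3 - 4 (illustrative, not TDParse data).
toy_graph = nx.Graph()
toy_graph.add_edges_from([(1, 2), (2, 3), (3, 4)])

# In networkx 2.x and later this is an iterator of (node, [successors]) pairs.
pairs = list(traversal.bfs_successors(toy_graph, 2))

# dict() turns the pairs into a node -> successors mapping, so .values()
# gives the successor lists that repro_traversaltree flattens.
successors = dict(pairs)
reachable = [node for children in successors.values() for node in children]

# Every node reachable from the source appears once; the source does not.
print(sorted(reachable))
```

Flattening the values yields every node reachable from the source except the source itself, which is why the target word never appears in the returned context.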

We first create CoNLL formatted text for the sentence `This bread is tasty but so is sour dough I think`:

In [129]:
from bella.dependency_parsers import TweeboParser, stanford
example_text = ['This bread is tasty but so is sour dough I think']
tweebo_api = TweeboParser()
tweebo_output = tweebo_api.parse_conll(example_text)[0]
tweebo_conll = []
for output in tweebo_output.split('\n'):
    print(output)
    output = output.split('\t')
    tweebo_conll.append((output[0], output[1], output[3], output[6], output[7]))

1	This	_	D	D	_	2	_	_	_
2	bread	_	N	N	_	3	_	_	_
3	is	_	V	V	_	5	CONJ	_	_
4	tasty	_	A	A	_	3	_	_	_
5	but	_	&	&	_	0	_	_	_
6	so	_	R	R	_	7	_	_	_
7	is	_	V	V	_	5	CONJ	_	_
8	sour	_	A	A	_	9	_	_	_
9	dough	_	N	N	_	7	_	_	_
10	I	_	O	O	_	11	_	_	_
11	think	_	V	V	_	0	_	_	_


Given the CoNLL formatted text above, we put it through the adapted TDParse method to find the words connected to the target `bread`:

In [132]:
repro_traversaltree(tweebo_conll, 'bread')

['This', 'is', 'but', 'tasty', 'is', 'so', 'dough', 'sour']

As shown, we get all the words that are in the same syntactic tree as `bread`, except for `bread` itself as it is the target word. The words `I` and `think` are missing because the Tweebo parser placed them in a separate tree.
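The split into separate trees can also be seen directly from the head column. The following is a minimal sketch, not the TDParse implementation: it copies the position, token, and head values from the Tweebo output above into integers and groups the tokens with `nx.connected_components`. The helper name `dependency_trees` is hypothetical.

```python
import networkx as nx

# (position, token, head) triples copied from the Tweebo output above.
# A head of 0 marks the root of a tree.
tweebo_rows = [
    (1, 'This', 2), (2, 'bread', 3), (3, 'is', 5), (4, 'tasty', 3),
    (5, 'but', 0), (6, 'so', 7), (7, 'is', 5), (8, 'sour', 9),
    (9, 'dough', 7), (10, 'I', 11), (11, 'think', 0),
]

def dependency_trees(rows):
    """Group tokens into the separate trees of a dependency parse."""
    graph = nx.Graph()
    for position, _, head in rows:
        graph.add_node(position)
        if head > 0:  # roots have no incoming edge
            graph.add_edge(position, head)
    tokens = {position: token for position, token, _ in rows}
    # Order the trees by their first token, and tokens by sentence position.
    components = sorted(nx.connected_components(graph), key=min)
    return [[tokens[p] for p in sorted(component)] for component in components]

trees = dependency_trees(tweebo_rows)
for tree in trees:
    print(tree)
```

For this sentence the grouping gives two trees: one with the nine words from `This` to `dough`, and one with `I` and `think`.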

## Stanford Parser
To demonstrate the point, we now use the Stanford CoreNLP dependency parser which, like most dependency parsers, does not split the text into multiple syntactic trees:

In [133]:
# This was generated by running Stanford CoreNLP through its Java
# command line interface and specifying CoNLL output
# https://stanfordnlp.github.io/CoreNLP/cmdline.html
stanford_output = '''1	This	_	_	DT	_	2	det	_	_
2	bread	_	_	NN	_	4	nsubj	_	_
3	is	_	_	VBZ	_	4	cop	_	_
4	tasty	_	_	JJ	_	0	root	_	_
5	but	_	_	CC	_	4	cc	_	_
6	so	_	_	RB	_	10	advmod	_	_
7	is	_	_	VBZ	_	10	cop	_	_
8	sour	_	_	JJ	_	10	amod	_	_
9	dough	_	_	NN	_	10	compound	_	_
10	I	_	_	PRP	_	4	conj	_	_
11	think	_	_	VBP	_	4	dep	_	_'''
stanford_conll = []
for output in stanford_output.split('\n'):
    print(output)
    output = output.split('\t')
    stanford_conll.append((output[0], output[1], output[4], output[6], output[7]))
repro_traversaltree(stanford_conll, 'bread')

1	This	_	_	DT	_	2	det	_	_
2	bread	_	_	NN	_	4	nsubj	_	_
3	is	_	_	VBZ	_	4	cop	_	_
4	tasty	_	_	JJ	_	0	root	_	_
5	but	_	_	CC	_	4	cc	_	_
6	so	_	_	RB	_	10	advmod	_	_
7	is	_	_	VBZ	_	10	cop	_	_
8	sour	_	_	JJ	_	10	amod	_	_
9	dough	_	_	NN	_	10	compound	_	_
10	I	_	_	PRP	_	4	conj	_	_
11	think	_	_	VBP	_	4	dep	_	_


['This', 'tasty', 'is', 'but', 'I', 'think', 'so', 'is', 'sour', 'dough']

As shown above, the list contains every word from the text apart from the target word `bread`, as expected. Thus, using the TDParse method with a parser that, unlike the Tweebo parser, produces a single tree per text makes no real use of the dependency context window: the window covers the whole text minus the target, so no syntactic knowledge is added to the feature space.
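The same conclusion can be checked without the traversal code. The following is a minimal sketch, assuming the head indices copied from the Stanford output above; the helper `context_words` is hypothetical and only looks up the connected component that contains the target.

```python
import networkx as nx

# (position, token, head) triples copied from the Stanford output above.
stanford_rows = [
    (1, 'This', 2), (2, 'bread', 4), (3, 'is', 4), (4, 'tasty', 0),
    (5, 'but', 4), (6, 'so', 10), (7, 'is', 10), (8, 'sour', 10),
    (9, 'dough', 10), (10, 'I', 4), (11, 'think', 4),
]

def context_words(rows, target_position):
    """Return the tokens in the target's tree, in sentence order, minus the target."""
    graph = nx.Graph()
    for position, _, head in rows:
        graph.add_node(position)
        if head > 0:  # the root (head 0) has no incoming edge
            graph.add_edge(position, head)
    same_tree = nx.node_connected_component(graph, target_position)
    return [token for position, token, _ in rows
            if position in same_tree and position != target_position]

# 'bread' is at position 2.
context = context_words(stanford_rows, 2)
all_other_words = [token for position, token, _ in stanford_rows if position != 2]

# With a single tree the context is simply the sentence without the target.
print(context == all_other_words)
```

With a single-tree parse the comparison holds, which is the sense in which the context window carries no syntactic information here.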