# UDS and "try" constructions

Caroline Gish | cng18@pitt.edu

This Jupyter Notebook contains all of the work I did with the UDS dataset once I had learned the basics of traversing it.

nbviewer view here: https://nbviewer.org/github/Data-Science-for-Linguists-2022/UDS-child-speech/blob/main/notebooks/UDS_explore_caroline_CRC7.ipynb

### Table of Contents

- [1. Exploring UDS with CRC](https://ondemand.htc.crc.pitt.edu/node/htc-n30.crc.pitt.edu/40835/lab/tree/hw4_yelp/UDS_explore_caroline.ipynb#1.-Exploring-UDS-with-CRC)
- [2. Importing Corpus](https://ondemand.htc.crc.pitt.edu/node/htc-n30.crc.pitt.edu/40835/lab/tree/hw4_yelp/UDS_explore_caroline.ipynb#2.--Importing-corpus)
- [3. Finding "try" sentences](https://ondemand.htc.crc.pitt.edu/node/htc-n30.crc.pitt.edu/40835/lab/tree/hw4_yelp/UDS_explore_caroline.ipynb#3.--Finding-%22try%22-sentences)
    - [3-1. "try" sentences where relation is not xcomp](https://ondemand.htc.crc.pitt.edu/node/htc-n30.crc.pitt.edu/40835/lab/tree/hw4_yelp/UDS_explore_caroline.ipynb#3-1.-%22try%22-sentences-where-relation-is-not-xcomp)
    - [3-2. Back to sentence ewt-dev-539 for exploration](https://ondemand.htc.crc.pitt.edu/node/htc-n30.crc.pitt.edu/40835/lab/tree/hw4_yelp/UDS_explore_caroline.ipynb#3-2.-Back-to-sentence-ewt-dev-539-for-exploration)
- [4. Info on "try" verb sentences](https://ondemand.htc.crc.pitt.edu/node/htc-n30.crc.pitt.edu/40835/lab/tree/hw4_yelp/UDS_explore_caroline.ipynb#4.-Info-on-%22try%22-verb-sentences)
    - [4-1. Trying out helper functions](https://ondemand.htc.crc.pitt.edu/node/htc-n30.crc.pitt.edu/40835/lab/tree/hw4_yelp/UDS_explore_caroline.ipynb#4-1.-Trying-out-helper-functions-(from-Na-Rae---thank-you!))
    - [4-2. Types of relations](https://ondemand.htc.crc.pitt.edu/node/htc-n30.crc.pitt.edu/40835/lab/tree/hw4_yelp/UDS_explore_caroline.ipynb#4-2.-Types-of-relations)
- [5. Specific info for sentences with "try" constructions](https://ondemand.htc.crc.pitt.edu/node/htc-n30.crc.pitt.edu/40835/lab/tree/hw4_yelp/UDS_explore_caroline.ipynb#5.-Specific-info-for-sentences-with-%22try%22-constructions)
    - [5-1. Create DataFrame with sentence and word info](https://ondemand.htc.crc.pitt.edu/node/htc-n30.crc.pitt.edu/40835/lab/tree/hw4_yelp/UDS_explore_caroline.ipynb#5-1.-Create-DataFrame-with-sentence-and-word-info)
- [6. Closer look at node type subspaces (of interest)](https://ondemand.htc.crc.pitt.edu/node/htc-n30.crc.pitt.edu/40835/lab/tree/hw4_yelp/UDS_explore_caroline.ipynb#6.-Closer-look-at-node-type-subspaces-(of-interest))
    - [6-1. Factuality](https://ondemand.htc.crc.pitt.edu/node/htc-n30.crc.pitt.edu/40835/lab/tree/hw4_yelp/UDS_explore_caroline.ipynb#6-1.-Factuality)
    - [6-2. Event structure](https://ondemand.htc.crc.pitt.edu/node/htc-n30.crc.pitt.edu/40835/lab/tree/hw4_yelp/UDS_explore_caroline.ipynb#6-2.-Event-structure)
    - [6-3. Genericity](https://ondemand.htc.crc.pitt.edu/node/htc-n30.crc.pitt.edu/40835/lab/tree/hw4_yelp/UDS_explore_caroline.ipynb#6-3.-Genericity)
    - [6-4. Values for subspaces](https://ondemand.htc.crc.pitt.edu/node/htc-n30.crc.pitt.edu/40835/lab/tree/hw4_yelp/UDS_explore_caroline.ipynb#6-4.-Values-for-subspaces)

## 1. Exploring UDS with CRC

Installation on CRC was done with two commands:

- `module load python/ondemand-jupyter-python3.8`

- `pip install --user git+https://github.com/decompositional-semantics-initiative/decomp.git`

Additional packages:

- `pip install pathlib`
- `pip install ruamel-yaml`
- `pip install pyqtwebengine==5.13`
- `pip install pyqt5==5.13`

In [1]:
%pprint   

from pprint import pprint

Pretty printing has been turned OFF


In [2]:
import pandas as pd

## 2.  Importing corpus

In [3]:
from decomp import UDSCorpus

In [4]:
uds = UDSCorpus()

## 3.  Finding "try" sentences 

In [5]:
# all sentences with the string " try " (space on both sides)

for x in uds:
    sent_text = uds[x].sentence
    if ' try ' in sent_text:    # based on text substring - need spaces because otherwise was getting 'try' as part of bigger string
        print(x, sent_text)

ewt-train-668 Moreover there are now major covert attempts under way to try and bring back to Kabul leading Taliban commanders , who have been living quietly in Pakistan and have taken no part in the Taliban insurgency .
ewt-train-701 In a highly significant move Afghan Pashtun tribes along the Pakistan border warned the Taliban in Quetta and Chaman that if they try and disrupt the elections , they would be resisted .
ewt-train-1195 If part of the public supports it , others tolerate it , many are afraid of it and some try to explain it away by poverty or by a miserable childhood , organized crime will thrive and so will terrorism .
ewt-train-1328 Nevertheless , most European countries still trade with Iran , try to appease it and refuse to read the clear signals .
ewt-train-1330 It is pointless to try to understand the subtle differences between the Sunni terror of Al Qaeda and Hamas and the Shiite terror of Hezbollah , Sadr and other Iranian - inspired enterprises .
ewt-train-2078 We

**In a lot of these sentences "try" functions as a marker of the future - for an experience that has not yet happened.**
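The space-padded substring test used above only matches "try" with a space on both sides, so a sentence-initial "Try" (capitalized, no leading space) would be skipped. As a minimal sketch, assuming a case-insensitive word-boundary regex is acceptable for this search, the same idea could look like the following. The `toy_sentences` dict is a hypothetical stand-in for iterating over `uds`; the entry `toy-1` is invented for illustration.

```python
import re

# Hypothetical stand-in for {sentence_id: uds[sentence_id].sentence}
toy_sentences = {
    "ewt-train-4384": "Try this .",
    "ewt-train-6438": "The Pew researchers tried to transcend the economic argument .",
    "toy-1": "The country is large .",   # invented example, not from the corpus
}

# \b marks a word boundary, so 'try' inside 'country' does not match
TRY_WORD = re.compile(r"\btry\b", re.IGNORECASE)

matches = [sid for sid, text in toy_sentences.items() if TRY_WORD.search(text)]
print(matches)
```

This still only finds the bare form "try"; inflected forms such as "tried" need their own pattern or a lemma-based search.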



In [6]:
# all sentences with string " tried"

for x in uds:
    sent_text = uds[x].sentence
    if ' tried ' in sent_text:    # based on text substring
        print(x, sent_text)

ewt-train-917 Afshari , who was posted in Germany and was responsible for receiving Mujahedeen children during the gulf war , said that when the German government tried to absorb Mujahedeen children into their education system , the Mujahedeen refused .
ewt-train-5584 When Sabeer Bhatia came up with the business plan for the mail service , he tried all kinds of names ending in ' mail ' and finally settled for hotmail as it included the letters " html " - the programming language used to write web pages .
ewt-train-5633 Anyway , I tried to get a preorder but no places were selling it so I missed out on 2/3 weekend events .
ewt-train-6438 The Pew researchers tried to transcend the economic argument .
ewt-train-7233 Have you tried using clockwork recovery ?
ewt-train-8476 I would n't know -- I do n't particularly like scary movies , and I have n't tried this with my pet .
ewt-train-8587 I have done a fair amount of metal casting , but never tried to build my own furnace , but I think your

The following sentence is interesting - it is the kind of "try" construction I am interested in.

> ewt-dev-539 I tried to do it on the HRonline web - site , but the procedure is too complicated .

So to what extent did the action occur?

I will take a closer look at sentence ewt-dev-539, but first I create a variable containing the IDs of all the sentences with the verb "try".

In [7]:
# based on lemma + POS -> only verbs

try_sents = []
for x in uds:
    sent_text = uds[x].sentence
    sent_lem = [(v['lemma'], v['upos']) for (n,v) in uds[x].syntax_nodes.items()]
    sent_lem_set = set(sent_lem)
    if ('try', 'VERB') in sent_lem_set:
        print(x, sent_text)
        try_sents.append(x)

ewt-train-133 Musharraf has been trying to purge his officer corps of the substantial number of al - Qaeda sympathizers .
ewt-train-637 The neighbours are still interfering , but there are signs that rather than undermining Afghanistan 's stability they may now be trying to strengthen it .
ewt-train-668 Moreover there are now major covert attempts under way to try and bring back to Kabul leading Taliban commanders , who have been living quietly in Pakistan and have taken no part in the Taliban insurgency .
ewt-train-688 However powerful hardliners in Tehran may be trying to undermine that strategy and a new issue is likely to deepen the rift with the moderates .
ewt-train-701 In a highly significant move Afghan Pashtun tribes along the Pakistan border warned the Taliban in Quetta and Chaman that if they try and disrupt the elections , they would be resisted .
ewt-train-796 He was trying to buddy with Archibald and impress him .
ewt-train-881 I read of a case not long ago when some peop

In [8]:
len(try_sents)

236

Searching by lemma finds many more "try" constructions (236 sentences), and these are the kinds of sentences I want.
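To illustrate why the lemma-based filter finds more sentences than the substring search, here is a small sketch on toy data. The `toy_nodes` dict only mimics the shape of `syntax_nodes` (node ID mapped to an attribute dict) as printed in this notebook; it is not the real corpus object, and `has_try_verb` is a hypothetical helper name.

```python
# Toy stand-in for uds[x].syntax_nodes: node ID -> attribute dict
toy_nodes = {
    "ewt-train-133-syntax-3": {"form": "been", "lemma": "be", "upos": "AUX"},
    "ewt-train-133-syntax-4": {"form": "trying", "lemma": "try", "upos": "VERB"},
    "ewt-train-133-syntax-6": {"form": "purge", "lemma": "purge", "upos": "VERB"},
}

def has_try_verb(nodes):
    """Return True if any node has lemma 'try' and UPOS 'VERB'."""
    return any(v["lemma"] == "try" and v["upos"] == "VERB" for v in nodes.values())

lemma_hit = has_try_verb(toy_nodes)                            # matches via the lemma of 'trying'
substring_hit = " try " in "Musharraf has been trying to purge"  # space-padded substring test
print(lemma_hit, substring_hit)
```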

In [9]:
try_sents[100]

'ewt-train-8851'

`try_sents` is a list of sentence IDs.

In [10]:
# some info about the sentences

for s in try_sents[:3]:      # first 3
    print()
    for (n, v) in uds[s].syntax_nodes.items():
        print(n, v['form'], v['lemma'], v['xpos'], v['upos'], sep="\t")


ewt-train-133-syntax-1	Musharraf	Musharraf	NNP	PROPN
ewt-train-133-syntax-2	has	have	VBZ	AUX
ewt-train-133-syntax-3	been	be	VBN	AUX
ewt-train-133-syntax-4	trying	try	VBG	VERB
ewt-train-133-syntax-5	to	to	TO	PART
ewt-train-133-syntax-6	purge	purge	VB	VERB
ewt-train-133-syntax-7	his	he	PRP$	PRON
ewt-train-133-syntax-8	officer	officer	NN	NOUN
ewt-train-133-syntax-9	corps	corps	NN	NOUN
ewt-train-133-syntax-10	of	of	IN	ADP
ewt-train-133-syntax-11	the	the	DT	DET
ewt-train-133-syntax-12	substantial	substantial	JJ	ADJ
ewt-train-133-syntax-13	number	number	NN	NOUN
ewt-train-133-syntax-14	of	of	IN	ADP
ewt-train-133-syntax-15	al	al	NNP	PROPN
ewt-train-133-syntax-16	-	-	HYPH	PUNCT
ewt-train-133-syntax-17	Qaeda	Qaeda	NNP	PROPN
ewt-train-133-syntax-18	sympathizers	sympathizer	NNS	NOUN
ewt-train-133-syntax-19	.	.	.	PUNCT

ewt-train-637-syntax-1	The	the	DT	DET
ewt-train-637-syntax-2	neighbours	neighbour	NNS	NOUN
ewt-train-637-syntax-3	are	be	VBP	AUX
ewt-train-637-syntax-4	still	still	RB	ADV
ewt-train

https://universaldependencies.org/format.html

- FORM: Word form or punctuation symbol.
- LEMMA: Lemma or stem of word form.
- UPOS: Universal part-of-speech tag.
- XPOS: Language-specific part-of-speech tag; underscore if not available.
- DEPREL: Universal dependency relation to the HEAD (root iff HEAD = 0) or a defined language-specific subtype of one.
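For reference, a CoNLL-U token line has ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The sketch below splits one such line into a dict. The example line is reconstructed from the values printed for node 4 of ewt-train-133 in this notebook; the FEATS, DEPS and MISC columns are filled with `_` as an assumption, so it is not a verbatim line from the treebank. `parse_conllu_line` is a hypothetical helper, not part of any library.

```python
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]

def parse_conllu_line(line):
    """Split one CoNLL-U token line into a field-name -> value dict."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(CONLLU_FIELDS):
        raise ValueError(f"expected {len(CONLLU_FIELDS)} columns, got {len(values)}")
    return dict(zip(CONLLU_FIELDS, values))

# Illustrative line (FEATS/DEPS/MISC are placeholder underscores)
line = "4\ttrying\ttry\tVERB\tVBG\t_\t0\troot\t_\t_"
token = parse_conllu_line(line)
print(token["form"], token["lemma"], token["upos"], token["xpos"], token["deprel"])
```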

In [11]:
# another grouping of try sentences
# this time (sentence, node id)

try_sents_id = []
for x in uds:
    # check every syntax node; keep the ones with lemma 'try' tagged as VERB
    for (n,v) in uds[x].syntax_nodes.items():
        if v['lemma']=='try' and v['upos']=='VERB':
            try_sents_id.append( (x,n) )    # (sentence, node id for 'try')

In [12]:
try_sents_id[0]

('ewt-train-133', 'ewt-train-133-syntax-4')

`try_sents_id` is a list of (sentence ID, node ID) tuples.

In [13]:
def print_udep(udstree_id):
    "Custom printing function: also prints dependency info"
    print(udstree_id)
    print(uds[udstree_id].sentence)
    edge_dict = {e2:(e1,v['deprel']) for ((e1,e2),v) in uds[udstree_id].syntax_edges().items()}
    for (n,v) in uds[udstree_id].syntax_nodes.items():
        print(n, v['form'], v['lemma'], v['xpos'], v['upos'], sep="\t", end='\t')
        if n in edge_dict:
            print(':'.join(edge_dict[n]))
        else: print('ERROR')

In [14]:
for s, trid in try_sents_id[:3]:
    print()
    print('try verb is found at:', trid)
    print('-----------')
    print_udep(s)


try verb is found at: ewt-train-133-syntax-4
-----------
ewt-train-133
Musharraf has been trying to purge his officer corps of the substantial number of al - Qaeda sympathizers .
ewt-train-133-syntax-1	Musharraf	Musharraf	NNP	PROPN	ewt-train-133-syntax-4:nsubj
ewt-train-133-syntax-2	has	have	VBZ	AUX	ewt-train-133-syntax-4:aux
ewt-train-133-syntax-3	been	be	VBN	AUX	ewt-train-133-syntax-4:aux
ewt-train-133-syntax-4	trying	try	VBG	VERB	ewt-train-133-root-0:root
ewt-train-133-syntax-5	to	to	TO	PART	ewt-train-133-syntax-6:mark
ewt-train-133-syntax-6	purge	purge	VB	VERB	ewt-train-133-syntax-4:xcomp
ewt-train-133-syntax-7	his	he	PRP$	PRON	ewt-train-133-syntax-9:nmod:poss
ewt-train-133-syntax-8	officer	officer	NN	NOUN	ewt-train-133-syntax-9:compound
ewt-train-133-syntax-9	corps	corps	NN	NOUN	ewt-train-133-syntax-6:dobj
ewt-train-133-syntax-10	of	of	IN	ADP	ewt-train-133-syntax-13:case
ewt-train-133-syntax-11	the	the	DT	DET	ewt-train-133-syntax-13:det
ewt-train-133-syntax-12	substantial	substan

In [15]:
try_sents[0]   # first one

'ewt-train-133'

In [16]:
for (n1, n2) in uds['ewt-train-133'].syntax_edges():
    if n1 == 'ewt-train-133-syntax-4':
        print(uds['ewt-train-133'].syntax_edges()[(n1,n2)])
        
# want the edge whose deprel is 'xcomp'; its dependent is 'ewt-train-133-syntax-6'

{'deprel': 'nsubj', 'domain': 'syntax', 'type': 'dependency', 'id': 'ewt-train-133-syntax-1'}
{'deprel': 'aux', 'domain': 'syntax', 'type': 'dependency', 'id': 'ewt-train-133-syntax-2'}
{'deprel': 'aux', 'domain': 'syntax', 'type': 'dependency', 'id': 'ewt-train-133-syntax-3'}
{'deprel': 'xcomp', 'domain': 'syntax', 'type': 'dependency', 'id': 'ewt-train-133-syntax-6'}
{'deprel': 'punct', 'domain': 'syntax', 'type': 'dependency', 'id': 'ewt-train-133-syntax-19'}
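The loop above scans every edge and keeps those whose head is the "try" node. If the same lookup is needed for many nodes, one option is to group the edges by head once. This is a sketch on a hand-copied subset of the ewt-train-133 edges printed in this notebook, keyed by (head, dependent) as in `syntax_edges()`; `dependents_by_head` is a hypothetical helper name.

```python
from collections import defaultdict

# Subset of the ewt-train-133 edges: (head, dependent) -> attributes
toy_edges = {
    ("ewt-train-133-syntax-4", "ewt-train-133-syntax-1"): {"deprel": "nsubj"},
    ("ewt-train-133-syntax-4", "ewt-train-133-syntax-2"): {"deprel": "aux"},
    ("ewt-train-133-syntax-4", "ewt-train-133-syntax-6"): {"deprel": "xcomp"},
    ("ewt-train-133-syntax-6", "ewt-train-133-syntax-5"): {"deprel": "mark"},
}

def dependents_by_head(edges):
    """Map each head node ID to a list of (dependent ID, deprel) pairs."""
    grouped = defaultdict(list)
    for (head, dep), attrs in edges.items():
        grouped[head].append((dep, attrs["deprel"]))
    return dict(grouped)

deps = dependents_by_head(toy_edges)
print(deps["ewt-train-133-syntax-4"])
```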


### 3-1. "try" sentences where relation is not xcomp

- "try and VERB" construction
    - e.g. in ewt-train-668 ("try and bring"), 'bring' is dependent on 'try' as conj
    
- "try NOUN" construction
    - direct-object dependency label ('dobj' in this corpus's output; 'obj' in UD v2).
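The construction types listed above could be told apart from the dependency labels of the nodes that depend on "try". The sketch below is one possible labelling scheme, not something the corpus provides: `classify_try` and its category names are hypothetical, and the direct-object label is checked as both `dobj` (the label that appears in this notebook's output) and `obj`. Only the first label list is copied from the ewt-train-133 output; the other two are illustrative guesses at what such patterns might contain.

```python
def classify_try(deprels):
    """Classify a 'try' instance from the deprels of its dependents.

    deprels: iterable of dependency labels of the nodes headed by 'try'.
    """
    labels = set(deprels)
    if "xcomp" in labels:
        return "xcomp"
    if "conj" in labels:
        return "try and VERB"
    if labels & {"dobj", "obj"}:
        return "try NOUN"
    return "other"

from_output = classify_try(["nsubj", "aux", "aux", "xcomp", "punct"])  # ewt-train-133
conj_case = classify_try(["mark", "cc", "conj"])     # illustrative labels only
noun_case = classify_try(["dobj", "punct"])          # illustrative labels only
print(from_output, conj_case, noun_case, sep=" | ")
```

The order of the checks is a design choice: an instance with both an xcomp and a conj dependent is counted as xcomp here.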

In [17]:
pprint(uds['ewt-train-133'].syntax_edges())

{('ewt-train-133-root-0', 'ewt-train-133-syntax-4'): {'deprel': 'root',
                                                      'domain': 'syntax',
                                                      'id': 'ewt-train-133-syntax-4',
                                                      'type': 'dependency'},
 ('ewt-train-133-syntax-13', 'ewt-train-133-syntax-10'): {'deprel': 'case',
                                                          'domain': 'syntax',
                                                          'id': 'ewt-train-133-syntax-10',
                                                          'type': 'dependency'},
 ('ewt-train-133-syntax-13', 'ewt-train-133-syntax-11'): {'deprel': 'det',
                                                          'domain': 'syntax',
                                                          'id': 'ewt-train-133-syntax-11',
                                                          'type': 'dependency'},
 ('ewt-train-133-syntax-13', 'ewt-train-

In [18]:
for ((n1,n2),v) in uds['ewt-train-133'].syntax_edges().items():
    if n1=='ewt-train-133-syntax-4' and v['deprel']=='xcomp': 
        print(n2)

ewt-train-133-syntax-6


### 3-2. Back to sentence ewt-dev-539 for exploration

In [19]:
# sentence text

print(uds["ewt-dev-539"].name, '   ', uds['ewt-dev-539'].sentence)

ewt-dev-539     I tried to do it on the HRonline web - site , but the procedure is too complicated .


The "doing" is being tried.

Look at dependencies in the sentence:

from https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu

sent_id = email-enronsent30_02-0029

- "try" is root (0)
- "to" is dependent on "do"
- "do" is dependent on "try"

In [20]:
# semantic nodes of ewt-dev-539

for x in uds["ewt-dev-539"].semantics_nodes.keys():
    print(x)
    if uds["ewt-dev-539"].semantics_nodes[x]['frompredpatt']: 
        print(uds["ewt-dev-539"].head(x, ['form', 'lemma']))

ewt-dev-539-semantics-pred-2
(2, ['tried', 'try'])
ewt-dev-539-semantics-pred-4
(4, ['do', 'do'])
ewt-dev-539-semantics-pred-18
(18, ['complicated', 'complicated'])
ewt-dev-539-semantics-arg-1
(1, ['I', 'I'])
ewt-dev-539-semantics-arg-4
(4, ['do', 'do'])
ewt-dev-539-semantics-arg-5
(5, ['it', 'it'])
ewt-dev-539-semantics-arg-11
(11, ['site', 'site'])
ewt-dev-539-semantics-arg-15
(15, ['procedure', 'procedure'])
ewt-dev-539-semantics-pred-root
ewt-dev-539-semantics-arg-0
ewt-dev-539-semantics-arg-author
ewt-dev-539-semantics-arg-addressee


`semantics_nodes` returns argument and predicate nodes together.
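Since the semantics nodes mix the two kinds of node, they can be separated with the `type` attribute that appears in the node dictionaries printed further down in this notebook (for example `'type': 'predicate'`). The sketch uses a toy dict; the value `'argument'` for argument nodes is an assumption, since only the predicate value is shown here, and `split_by_type` is a hypothetical helper.

```python
# Toy stand-in for uds["ewt-dev-539"].semantics_nodes
toy_semantics_nodes = {
    "ewt-dev-539-semantics-pred-2": {"type": "predicate"},
    "ewt-dev-539-semantics-pred-4": {"type": "predicate"},
    "ewt-dev-539-semantics-arg-1": {"type": "argument"},   # assumed value
    "ewt-dev-539-semantics-arg-5": {"type": "argument"},   # assumed value
}

def split_by_type(nodes):
    """Group node IDs by their 'type' attribute."""
    groups = {}
    for node_id, attrs in nodes.items():
        groups.setdefault(attrs["type"], []).append(node_id)
    return groups

groups = split_by_type(toy_semantics_nodes)
print(groups["predicate"])
```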

In [21]:
# just predicate nodes of ewt-dev-539

for pred in uds["ewt-dev-539"].predicate_nodes.keys():
    print(pred)
    if uds["ewt-dev-539"].predicate_nodes[pred]['frompredpatt']: # this value is TRUE for non-performative nodes
        print(uds["ewt-dev-539"].head(pred, ['form', 'lemma']))

ewt-dev-539-semantics-pred-2
(2, ['tried', 'try'])
ewt-dev-539-semantics-pred-4
(4, ['do', 'do'])
ewt-dev-539-semantics-pred-18
(18, ['complicated', 'complicated'])
ewt-dev-539-semantics-pred-root


In [22]:
# A single entry, for "try"

pprint(uds["ewt-dev-539"].predicate_nodes['ewt-dev-539-semantics-pred-2'])

{'domain': 'semantics',
 'event_structure': {'avg_part_duration_lbound-centuries': {'confidence': 1,
                                                            'value': -0.4185},
                     'avg_part_duration_lbound-days': {'confidence': 1,
                                                       'value': -1.3143},
                     'avg_part_duration_lbound-decades': {'confidence': 1,
                                                          'value': -0.4689},
                     'avg_part_duration_lbound-forever': {'confidence': 1,
                                                          'value': -1.6181},
                     'avg_part_duration_lbound-fractions_of_a_second': {'confidence': 1,
                                                                        'value': -1.2363},
                     'avg_part_duration_lbound-hours': {'confidence': 1,
                                                        'value': -1.47},
                     'avg_part_duration_lbou

In [23]:
# A single entry, for "do"

pprint(uds["ewt-dev-539"].predicate_nodes['ewt-dev-539-semantics-pred-4'])

{'domain': 'semantics',
 'event_structure': {'dynamic': {'confidence': 0.5674227476119995,
                                 'value': 0.07712224125862122},
                     'natural_parts': {'confidence': 0.9999988675117493,
                                       'value': -1.0767347812652588},
                     'situation_duration_lbound-centuries': {'confidence': 1,
                                                             'value': -0.685},
                     'situation_duration_lbound-days': {'confidence': 1,
                                                        'value': -1.4659},
                     'situation_duration_lbound-decades': {'confidence': 1,
                                                           'value': -1.0697},
                     'situation_duration_lbound-forever': {'confidence': 1,
                                                           'value': 1.6609},
                     'situation_duration_lbound-fractions_of_a_second': {'confidence': 1,


In [24]:
# relation between predicate "try" and argument "I":

pprint(uds["ewt-dev-539"].semantics_edges()[('ewt-dev-539-semantics-pred-2', 'ewt-dev-539-semantics-arg-1')])

{'domain': 'semantics',
 'frompredpatt': True,
 'id': 'ewt-dev-539-semantics-arg-1',
 'protoroles': {'awareness': {'confidence': 1.0, 'value': 1.3575},
                'change_of_location': {'confidence': 0.2325, 'value': -0.1191},
                'change_of_possession': {'confidence': 1.0, 'value': -0.0},
                'change_of_state': {'confidence': 0.4282, 'value': 0.0066},
                'change_of_state_continuous': {'confidence': 0.4835,
                                               'value': -0.0769},
                'existed_after': {'confidence': 1.0, 'value': 1.3577},
                'existed_before': {'confidence': 1.0, 'value': 1.3586},
                'existed_during': {'confidence': 1.0, 'value': 1.3578},
                'instigation': {'confidence': 1.0, 'value': 1.3572},
                'partitive': {'confidence': 1.0, 'value': -0.0},
                'sentient': {'confidence': 1.0, 'value': 1.3565},
                'volition': {'confidence': 1.0, 'value': 1.3558},
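Each protorole property carries a `value` and a `confidence`. One way to read such an entry is to keep only the properties annotated with high confidence. A sketch, using a few of the numbers printed above; the 0.9 threshold and the helper name `confident_properties` are arbitrary choices for illustration.

```python
# A few protorole entries copied from the output above
protoroles = {
    "awareness": {"confidence": 1.0, "value": 1.3575},
    "change_of_location": {"confidence": 0.2325, "value": -0.1191},
    "change_of_state": {"confidence": 0.4282, "value": 0.0066},
    "volition": {"confidence": 1.0, "value": 1.3558},
}

def confident_properties(props, min_confidence=0.9):
    """Return {property: value} for entries at or above min_confidence."""
    return {name: p["value"] for name, p in props.items()
            if p["confidence"] >= min_confidence}

kept = confident_properties(protoroles)
print(kept)
```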


## 4. Info on "try" verb sentences

In [25]:
# predicate nodes

for pred in uds["ewt-dev-539"].predicate_nodes:
    pprint(pred)

'ewt-dev-539-semantics-pred-2'
'ewt-dev-539-semantics-pred-4'
'ewt-dev-539-semantics-pred-18'
'ewt-dev-539-semantics-pred-root'


In [26]:
# only 2 sentences now for exploration

for s in try_sents[:2]:     
    print('========================================================================', '\n')
    print('SENTENCE:', '\n', uds[s].name, '   ', uds[s].sentence, '\n')
    print('PREDICATES:')
    for pred in uds[s].predicate_nodes.keys():
        print(pred)
        if uds[s].predicate_nodes[pred]['frompredpatt']:
            print(uds[s].head(pred, ['form', 'lemma']))
    print('\n')
    for pred in uds[s].predicate_nodes:
        print('PREDICATE:', ' ', pred, '\n')
        pprint(uds[s].predicate_nodes[pred])
        print()



SENTENCE: 
 ewt-train-133     Musharraf has been trying to purge his officer corps of the substantial number of al - Qaeda sympathizers . 

PREDICATES:
ewt-train-133-semantics-pred-4
(4, ['trying', 'try'])
ewt-train-133-semantics-pred-6
(6, ['purge', 'purge'])
ewt-train-133-semantics-pred-root


PREDICATE:   ewt-train-133-semantics-pred-4 

{'domain': 'semantics',
 'event_structure': {'dynamic': {'confidence': 0.9999978542327881,
                                 'value': 1.2817959785461426},
                     'natural_parts': {'confidence': 0.9999987483024597,
                                       'value': -1.089069128036499},
                     'situation_duration_lbound-centuries': {'confidence': 1,
                                                             'value': -0.3542},
                     'situation_duration_lbound-days': {'confidence': 1,
                                                        'value': -1.1031},
                     'situation_duration_lbound-decade

Big nested dictionaries! Will need to dig in further to see what these dictionaries contain and get specific node type subspaces. 
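One way to make these nested dictionaries easier to scan is to flatten them into dot-joined keys. This is a generic sketch, not a feature of the corpus library; `flatten` is a hypothetical helper, and `toy_pred` is a small fragment copied from the ewt-train-133 "try" predicate output above.

```python
def flatten(d, prefix=""):
    """Flatten nested dicts into {'a.b.c': leaf} with dot-joined keys."""
    flat = {}
    for key, val in d.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(val, dict):
            flat.update(flatten(val, path))
        else:
            flat[path] = val
    return flat

# Fragment of one predicate node
toy_pred = {
    "domain": "semantics",
    "event_structure": {
        "dynamic": {"confidence": 0.9999978542327881, "value": 1.2817959785461426},
        "natural_parts": {"confidence": 0.9999987483024597, "value": -1.089069128036499},
    },
}

flat = flatten(toy_pred)
for path, val in flat.items():
    print(path, val)
```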

In [27]:
print(uds["ewt-dev-539"].predicate_nodes['ewt-dev-539-semantics-pred-4'].keys())
print()
print(uds["ewt-dev-539"].predicate_nodes['ewt-dev-539-semantics-pred-4'].items())

dict_keys(['domain', 'frompredpatt', 'type', 'event_structure', 'genericity', 'factuality', 'time'])

dict_items([('domain', 'semantics'), ('frompredpatt', True), ('type', 'predicate'), ('event_structure', {'dynamic': {'value': 0.07712224125862122, 'confidence': 0.5674227476119995}, 'natural_parts': {'value': -1.0767347812652588, 'confidence': 0.9999988675117493}, 'telic': {'value': -1.076088786125183, 'confidence': 0.9999988675117493}, 'situation_duration_lbound-centuries': {'value': -0.685, 'confidence': 1}, 'situation_duration_ubound-centuries': {'value': -0.685, 'confidence': 1}, 'situation_duration_lbound-days': {'value': -1.4659, 'confidence': 1}, 'situation_duration_ubound-days': {'value': -1.4659, 'confidence': 1}, 'situation_duration_lbound-decades': {'value': -1.0697, 'confidence': 1}, 'situation_duration_ubound-decades': {'value': -1.0697, 'confidence': 1}, 'situation_duration_lbound-forever': {'value': 1.6609, 'confidence': 1}, 'situation_duration_ubound-forever': {'value':

In [28]:
print_udep('ewt-train-133')

ewt-train-133
Musharraf has been trying to purge his officer corps of the substantial number of al - Qaeda sympathizers .
ewt-train-133-syntax-1	Musharraf	Musharraf	NNP	PROPN	ewt-train-133-syntax-4:nsubj
ewt-train-133-syntax-2	has	have	VBZ	AUX	ewt-train-133-syntax-4:aux
ewt-train-133-syntax-3	been	be	VBN	AUX	ewt-train-133-syntax-4:aux
ewt-train-133-syntax-4	trying	try	VBG	VERB	ewt-train-133-root-0:root
ewt-train-133-syntax-5	to	to	TO	PART	ewt-train-133-syntax-6:mark
ewt-train-133-syntax-6	purge	purge	VB	VERB	ewt-train-133-syntax-4:xcomp
ewt-train-133-syntax-7	his	he	PRP$	PRON	ewt-train-133-syntax-9:nmod:poss
ewt-train-133-syntax-8	officer	officer	NN	NOUN	ewt-train-133-syntax-9:compound
ewt-train-133-syntax-9	corps	corps	NN	NOUN	ewt-train-133-syntax-6:dobj
ewt-train-133-syntax-10	of	of	IN	ADP	ewt-train-133-syntax-13:case
ewt-train-133-syntax-11	the	the	DT	DET	ewt-train-133-syntax-13:det
ewt-train-133-syntax-12	substantial	substantial	JJ	ADJ	ewt-train-133-syntax-13:amod
ewt-train-133-syn

So the "try" here is the root, and the "purge" verb is dependent on it as an xcomp (open clausal complement). 

*see: https://universaldependencies.org/u/dep/xcomp.html*

From looking at the CoNLL-U format dataset (manually), I know that in the "try" constructions of interest, the predicate that depends on "try" is attached with the xcomp relation. Let's test again (ID from list of "try" sentences):

In [29]:
print_udep('ewt-train-11637')

ewt-train-11637
I eventually decided to just pay the balance even though the doctor has already been paid , but now the collection agency is trying to say another reversal of $ 160.00 has come through .
ewt-train-11637-syntax-1	I	I	PRP	PRON	ewt-train-11637-syntax-3:nsubj
ewt-train-11637-syntax-2	eventually	eventually	RB	ADV	ewt-train-11637-syntax-3:advmod
ewt-train-11637-syntax-3	decided	decide	VBD	VERB	ewt-train-11637-root-0:root
ewt-train-11637-syntax-4	to	to	TO	PART	ewt-train-11637-syntax-6:mark
ewt-train-11637-syntax-5	just	just	RB	ADV	ewt-train-11637-syntax-6:advmod
ewt-train-11637-syntax-6	pay	pay	VB	VERB	ewt-train-11637-syntax-3:xcomp
ewt-train-11637-syntax-7	the	the	DT	DET	ewt-train-11637-syntax-8:det
ewt-train-11637-syntax-8	balance	balance	NN	NOUN	ewt-train-11637-syntax-6:dobj
ewt-train-11637-syntax-9	even	even	RB	ADV	ewt-train-11637-syntax-16:advmod
ewt-train-11637-syntax-10	though	though	IN	SCONJ	ewt-train-11637-syntax-16:mark
ewt-train-11637-syntax-11	the	the	DT	DET	ewt-tr

"try" here is dependent on "decide", and yep, "say" is dependent on "try" in xcomp relation.

### 4-1. Trying out helper functions (from Na-Rae - thank you!)

In [30]:
def find_xcomp(tree, node1):
    """takes two arguments: tree ID, and node1 ID, which corresponds to the try verb's ID
    returns node ID of the token that's dependent on the try verb with xcomp relation,
    or None if there is no such token"""
    edges = uds[tree].syntax_edges()
    for ((n1,n2),v) in edges.items():
        if n1==node1 and v['deprel']=='xcomp':
            return n2
    return None

In [31]:
find_xcomp('ewt-train-133', 'ewt-train-133-syntax-4') 

'ewt-train-133-syntax-6'

In [32]:
find_xcomp('ewt-train-133', 'ewt-train-133-syntax-2')
# no xcomp dependent node; returns None, so nothing is displayed
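When no xcomp edge exists, `find_xcomp` returns `None`, which is falsy. That lets a caller look the node up once and test the result. A sketch on toy edges; `find_xcomp_in` is a hypothetical variant that takes the edge dict directly so that it runs without the corpus.

```python
def find_xcomp_in(edges, head_id):
    """Return the ID of the xcomp dependent of head_id, or None if absent."""
    for (n1, n2), attrs in edges.items():
        if n1 == head_id and attrs["deprel"] == "xcomp":
            return n2
    return None

# Two edges copied from the ewt-train-133 output in this notebook
toy_edges = {
    ("ewt-train-133-syntax-4", "ewt-train-133-syntax-2"): {"deprel": "aux"},
    ("ewt-train-133-syntax-4", "ewt-train-133-syntax-6"): {"deprel": "xcomp"},
}

for head in ("ewt-train-133-syntax-4", "ewt-train-133-syntax-2"):
    xcomp_id = find_xcomp_in(toy_edges, head)   # looked up once, then tested
    if xcomp_id is None:
        print(head, "-> no xcomp dependent")
    else:
        print(head, "->", xcomp_id)
```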

In [33]:
def get_node_info(s,nodeid):
    "takes sentence_id and node_id, returns form, lemma, xpos, upos"
    ndict= uds[s].syntax_nodes[nodeid]
    return (ndict['form'], ndict['lemma'], ndict['xpos'], ndict['upos'])

In [34]:
get_node_info('ewt-train-133', 'ewt-train-133-syntax-18')

('sympathizers', 'sympathizer', 'NNS', 'NOUN')

In [35]:
def get_form(s,nodeid):
    "takes sentence_id and node_id, returns form"
    ndict= uds[s].syntax_nodes[nodeid]
    return (ndict['form'])

In [36]:
def get_lemma(s,nodeid):
    "takes sentence_id and node_id, returns lemma"
    ndict= uds[s].syntax_nodes[nodeid]
    return (ndict['lemma'])

In [37]:
def get_head_node(s,nodeid):
    "takes sentence_id and node_id, returns its head node and dep relation"
    edge_dict = {e2:(e1,v['deprel']) for ((e1,e2),v) in uds[s].syntax_edges().items()}
    return (edge_dict[nodeid])

In [38]:
get_head_node('ewt-train-133', 'ewt-train-133-syntax-6')
# "purge" is dependent on node 4 'try' in 'xcomp' relation

('ewt-train-133-syntax-4', 'xcomp')
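The same child-to-head mapping can be followed repeatedly to trace a token's path up to the root. A sketch on toy data: the `toy_heads` entries are copied from the ewt-train-133 dependency output in this notebook, and `path_to_root` is a hypothetical helper.

```python
# child -> (head, deprel)
toy_heads = {
    "ewt-train-133-syntax-8": ("ewt-train-133-syntax-9", "compound"),
    "ewt-train-133-syntax-9": ("ewt-train-133-syntax-6", "dobj"),
    "ewt-train-133-syntax-6": ("ewt-train-133-syntax-4", "xcomp"),
    "ewt-train-133-syntax-4": ("ewt-train-133-root-0", "root"),
}

def path_to_root(heads, node_id):
    """Follow head links upward from node_id; return a list of (node, deprel) steps."""
    path = []
    while node_id in heads:
        head, deprel = heads[node_id]
        path.append((node_id, deprel))
        node_id = head
    return path

path = path_to_root(toy_heads, "ewt-train-133-syntax-8")
for node, deprel in path:
    print(node, deprel)
```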

### 4-2. Types of relations

Sentences containing "try" constructions contain predicates dependent on "try" with differing relations.

In [39]:
# 'try' sentences where xcomp node IS NOT found

for (s,trid) in try_sents_id[:30]:
    #print(s,trid)
    if not find_xcomp(s,trid):   
        print(s)
        print(uds[s].sentence)
        print()

ewt-train-668
Moreover there are now major covert attempts under way to try and bring back to Kabul leading Taliban commanders , who have been living quietly in Pakistan and have taken no part in the Taliban insurgency .

ewt-train-701
In a highly significant move Afghan Pashtun tribes along the Pakistan border warned the Taliban in Quetta and Chaman that if they try and disrupt the elections , they would be resisted .

ewt-train-1288
It is trying to play ice hockey by sending a ballerina ice - skater into the ring or to knock out a heavyweight boxer by a chess player .

ewt-train-3431
Let s try before Fri , as I am planning to take that day off

ewt-train-4384
Try this .



In [40]:
# 'try' sentences where xcomp node IS found

for (s,trid) in try_sents_id[:10]:  # out of first 10
    #print(s,trid)
    if find_xcomp(s,trid):   # cases where xcomp node IS found
        print(s)
        print(trid, get_node_info(s,trid))
        xcompid = find_xcomp(s,trid)  # node ID for the xcomp verb
        print(xcompid, get_node_info(s,xcompid)) 
        print(uds[s].sentence)
        print()

ewt-train-133
ewt-train-133-syntax-4 ('trying', 'try', 'VBG', 'VERB')
ewt-train-133-syntax-6 ('purge', 'purge', 'VB', 'VERB')
Musharraf has been trying to purge his officer corps of the substantial number of al - Qaeda sympathizers .

ewt-train-637
ewt-train-637-syntax-22 ('trying', 'try', 'VBG', 'VERB')
ewt-train-637-syntax-24 ('strengthen', 'strengthen', 'VB', 'VERB')
The neighbours are still interfering , but there are signs that rather than undermining Afghanistan 's stability they may now be trying to strengthen it .

ewt-train-688
ewt-train-688-syntax-8 ('trying', 'try', 'VBG', 'VERB')
ewt-train-688-syntax-10 ('undermine', 'undermine', 'VB', 'VERB')
However powerful hardliners in Tehran may be trying to undermine that strategy and a new issue is likely to deepen the rift with the moderates .

ewt-train-796
ewt-train-796-syntax-3 ('trying', 'try', 'VBG', 'VERB')
ewt-train-796-syntax-5 ('buddy', 'buddy', 'VB', 'VERB')
He was trying to buddy with Archibald and impress him .

ewt-tra

## 5. Specific info for sentences with "try" constructions

I want to be able to see the ID, text, and predicate information about these sentences.

In [41]:
# initializing lists

sentence_id     = []
sentence        = []
try_id          = []
try_form        = []
try_lemma       = []
after_try_id    = []
after_try_form  = []
after_try_lemma = []

# populating lists

for (s,trid) in try_sents_id:    
    if find_xcomp(s,trid):   # cases where xcomp node IS found
        xcompid = find_xcomp(s,trid) 
        tr_form = get_form(s,trid).lower()   # lowercased because form could have both cases
        tr_lemma = get_lemma(s,trid)
        after_tr_form = get_form(s,xcompid)
        after_tr_lemma = get_lemma(s,xcompid)
        sent = uds[s].sentence   
        sentence_id.append(s)
        sentence.append(sent)
        try_id.append(trid)
        try_form.append(tr_form)
        try_lemma.append(tr_lemma)
        after_try_id.append(xcompid)
        after_try_form.append(after_tr_form)
        after_try_lemma.append(after_tr_lemma)

In [42]:
# checking

print(try_id[:2])
print(try_lemma[:2])
print(after_try_form[:2])

['ewt-train-133-syntax-4', 'ewt-train-637-syntax-22']
['try', 'try']
['purge', 'strengthen']


Seems good!

### 5-1. Create DataFrame with sentence and word info

In [43]:
# create dataframe from above lists

df = pd.DataFrame(list(zip(sentence_id, sentence, try_id, try_form, try_lemma, after_try_id, after_try_form, after_try_lemma)),
                 columns = ['sentence_id', 'sentence_text', 'try_id', 'try_form', 'try_lemma', 'after_try_id', 'after_try_form', 'after_try_lemma'])
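An alternative to keeping eight parallel lists in sync is to build one dict per row. A list of such dicts can be passed to `pd.DataFrame` directly, and the column names then come from the dict keys. The sketch below stops at the list of rows so that it runs without pandas or the corpus; the `toy_instances` tuples are copied from the first two instances shown in this notebook, and only a subset of the columns is included.

```python
# (sentence_id, try_id, try_form, xcomp_id, xcomp_form)
toy_instances = [
    ("ewt-train-133", "ewt-train-133-syntax-4", "trying",
     "ewt-train-133-syntax-6", "purge"),
    ("ewt-train-637", "ewt-train-637-syntax-22", "trying",
     "ewt-train-637-syntax-24", "strengthen"),
]

rows = []
for sent_id, try_id, try_form, xcomp_id, xcomp_form in toy_instances:
    rows.append({
        "sentence_id": sent_id,
        "try_id": try_id,
        "try_form": try_form.lower(),   # lowercased, as in the list-based version
        "after_try_id": xcomp_id,
        "after_try_form": xcomp_form,
    })

print(rows[0]["after_try_form"], rows[1]["after_try_form"])
# rows could then be turned into a table with pd.DataFrame(rows)
```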

In [44]:
# show dataframe

df

| | sentence_id | sentence_text | try_id | try_form | try_lemma | after_try_id | after_try_form | after_try_lemma |
|---|---|---|---|---|---|---|---|---|
| 0 | ewt-train-133 | Musharraf has been trying to purge his officer... | ewt-train-133-syntax-4 | trying | try | ewt-train-133-syntax-6 | purge | purge |
| 1 | ewt-train-637 | The neighbours are still interfering , but the... | ewt-train-637-syntax-22 | trying | try | ewt-train-637-syntax-24 | strengthen | strengthen |
| 2 | ewt-train-688 | However powerful hardliners in Tehran may be t... | ewt-train-688-syntax-8 | trying | try | ewt-train-688-syntax-10 | undermine | undermine |
| 3 | ewt-train-796 | He was trying to buddy with Archibald and impr... | ewt-train-796-syntax-3 | trying | try | ewt-train-796-syntax-5 | buddy | buddy |
| 4 | ewt-train-881 | I read of a case not long ago when some people... | ewt-train-881-syntax-13 | trying | try | ewt-train-881-syntax-15 | get | get |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 152 | ewt-test-1361 | o and the cheaper the better ( we are trying t... | ewt-test-1361-syntax-10 | trying | try | ewt-test-1361-syntax-12 | save | save |
| 153 | ewt-test-1368 | just pray for her ad=nd try to hlep your dog a... | ewt-test-1368-syntax-6 | try | try | ewt-test-1368-syntax-8 | hlep | hlep |
| 154 | ewt-test-1474 | I have a friend out in Chicago this week , and... | ewt-test-1474-syntax-14 | trying | try | ewt-test-1474-syntax-16 | remember | remember |
| 155 | ewt-test-1817 | When I tried to return it they refused , so I ... | ewt-test-1817-syntax-3 | tried | try | ewt-test-1817-syntax-5 | return | return |


In [45]:
# info

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 157 entries, 0 to 156
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   sentence_id      157 non-null    object
 1   sentence_text    157 non-null    object
 2   try_id           157 non-null    object
 3   try_form         157 non-null    object
 4   try_lemma        157 non-null    object
 5   after_try_id     157 non-null    object
 6   after_try_form   157 non-null    object
 7   after_try_lemma  157 non-null    object
dtypes: object(8)
memory usage: 9.9+ KB


Looks like there are 157 different "try" constructions.

In [46]:
# describe

df.describe()

| | sentence_id | sentence_text | try_id | try_form | try_lemma | after_try_id | after_try_form | after_try_lemma |
|---|---|---|---|---|---|---|---|---|
| count | 157 | 157 | 157 | 157 | 157 | 157 | 157 | 157 |
| unique | 154 | 152 | 157 | 4 | 1 | 157 | 106 | 102 |
| top | ewt-train-7580 | with your leg flopping , just try putting all ... | ewt-train-7580-syntax-13 | trying | try | ewt-train-8817-syntax-3 | get | get |
| freq | 2 | 2 | 1 | 64 | 157 | 1 | 14 | 14 |


Why are there fewer unique sentence IDs (154) and sentence texts (152) than rows (157)?

In [47]:
df.loc[df['sentence_id'] == 'ewt-train-9073']

| | sentence_id | sentence_text | try_id | try_form | try_lemma | after_try_id | after_try_form | after_try_lemma |
|---|---|---|---|---|---|---|---|---|
| 82 | ewt-train-9073 | with your leg flopping , just try putting all ... | ewt-train-9073-syntax-7 | try | try | ewt-train-9073-syntax-8 | putting | put |
| 83 | ewt-train-9073 | with your leg flopping , just try putting all ... | ewt-train-9073-syntax-47 | try | try | ewt-train-9073-syntax-48 | standing | stand |


Looks like sentence ewt-train-9073 appears twice.

In [48]:
df.loc[82].sentence_text

"with your leg flopping , just try putting all your weight in your heel and hang on with only your legs , holding my leg back helps me with flopping the more it 's forward the more it flops :/ work your calf muscles , maybe try standing on the edge of your steps ( on the ball of your foot ) and balance to strengthen those muscles"

So, sentence ewt-train-9073 appears twice because there are two different "try" constructions in it!
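
Rather than looking up one sentence ID at a time, all sentences that contain more than one "try" construction could be listed in one step. The sketch below uses a small hand-made DataFrame (`toy`, a hypothetical name) with the same column names as `df`; the same two calls should work on `df` itself.

```python
import pandas as pd

# Toy stand-in for df: two rows share a sentence_id, one does not.
toy = pd.DataFrame({
    'sentence_id': ['ewt-train-9073', 'ewt-train-9073', 'ewt-train-133'],
    'try_id': ['ewt-train-9073-syntax-7', 'ewt-train-9073-syntax-47',
               'ewt-train-133-syntax-4'],
    'after_try_lemma': ['put', 'stand', 'purge'],
})

# keep=False marks every row of a repeated sentence_id, not just the later copies.
repeated = toy[toy.duplicated('sentence_id', keep=False)]

# Number of "try" constructions per sentence.
per_sentence = toy.groupby('sentence_id').size()

print(repeated)
print(per_sentence)
```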

In [49]:
# different "try" forms (remember, these were lowercased)

df.try_form.value_counts()

trying    64
try       59
tried     33
tries      1
Name: try_form, dtype: int64

In [50]:
# top 15 words after "try"

df.after_try_form.value_counts().head(15)

get          14
make          5
find          5
bite          4
keep          4
do            4
help          3
develop       3
eat           3
googling      2
kill          2
translate     2
explain       2
charge        2
build         2
Name: after_try_form, dtype: int64

Some form of "try" + "get" is the most common "try" predicate construction at 14 instances in the data. 
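
Because `after_try_form` holds surface forms, inflected variants of the same verb are counted separately. Counting on the lemma column, or cross-tabulating the "try" form against the following lemma, gives a view that groups those variants. The rows below are invented for illustration only (they are not taken from the corpus), and `toy`, `form_counts`, `lemma_counts`, and `table` are hypothetical names.

```python
import pandas as pd

# Invented rows for illustration; column names match df.
toy = pd.DataFrame({
    'try_form': ['trying', 'tried', 'try', 'try'],
    'after_try_form': ['get', 'getting', 'get', 'googling'],
    'after_try_lemma': ['get', 'get', 'get', 'google'],
})

# Surface forms split "get" and "getting"; lemmas merge them.
form_counts = toy['after_try_form'].value_counts()
lemma_counts = toy['after_try_lemma'].value_counts()

# Rows: form of "try"; columns: lemma of the dependent verb.
table = pd.crosstab(toy['try_form'], toy['after_try_lemma'])

print(form_counts)
print(lemma_counts)
print(table)
```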

## 6. Closer look at node type subspaces (of interest)

I am interested in analyzing the semantics of both the "try" predicate and the predicate that depends on it in the xcomp relation, specifically the extent to which the events these predicates describe actually occurred.

After reading through the [Universal Decompositional Semantic Types page](https://decomp.readthedocs.io/en/latest/data/semantic-types.html#) and seeing the different types of information available for the predicates and arguments, I figured that factuality, telicity, and the hypothetical attribute of genericity may be interesting to look at more in depth for my purposes.

### 6-1. Factuality 

from http://decomp.io/projects/factuality/

"A central function of natural language is to convey information about the properties of events. Perhaps the most fundamental of these properties is factuality: whether an event happened or not."

"In this line of work, we develop a factuality annotation that incorporates a notion of confidence. This allows us to handle a wide variety of cases where the factuality of an event is unclear."

- node type subspace `factuality`

### 6-2. Event structure

from http://decomp.io/projects/event-structure/

"In this work, we aim to capture the structure of complex events, augmenting existing UDS with a new dataset for event-structural properties that capture information about such things as the subparts of an event, how they are arranged in time, and how events relate to each other and their participants."

"We use this new dataset along with others in UDS to induce an empirical event structure ontology from a generative model based on sentence- and document-level UDS graphs. This ontology is jointly learned with three other ontologies for semantic roles, entities, and event-event relations. In each case, we find that our categories align well with others proposed in the linguistics and computational semantics literature."

- node type subspace `event_structure`

of interest:
- `telic`: whether the event has a clear endpoint

### 6-3. Genericity

from http://decomp.io/projects/genericity/

"An important line of study in formal semantics, philosophy, and AI investigates how language is used to represent knowledge of kinds, regularities and patterns."

"In this line of work, we propose a novel framework for capturing linguistic expressions of generalization. We suggest that linguistic expressions of generalization should be captured in a continuous multi-label system, rather than a multi-class system. We do this by decomposing categories such as EPISODIC, HABITUAL, and GENERIC into simple referential properties of predicates and their arguments."

- node type subspace `genericity`

of interest:
- `pred-hypothetical`

### 6-4. Values for subspaces

In [51]:
# formatted list of sentences, predicates, and 
# information of interest for each predicate
# only first 2 as sample

for (s,trid) in try_sents_id[:2]:   
    if find_xcomp(s,trid):
        print('========================================================================', '\n')
        print('SENTENCE:', '\n', uds[s].name, '\n', uds[s].sentence, '\n')
        print('PREDICATES:')
        for pred in uds[s].predicate_nodes.keys():
            print(pred)
            if uds[s].predicate_nodes[pred]['frompredpatt']:
                print(uds[s].head(pred, ['form', 'lemma']))
        print('\n')
        for pred in uds[s].predicate_nodes:
            print('PREDICATE:', pred)
            #pprint(uds[s].predicate_nodes[pred])
            print()
            for (n, v) in uds[s].predicate_nodes[pred].items():
                if n == 'event_structure':
                    #print(n + ': ', v)
                    #print()
                    for (x, y) in v.items():
                        if x == 'telic':
                            print(x, '\n', 'Value:', y['value'], '\n', 'Confidence:', y['confidence'])
                            print()
                if n == 'genericity':
                    for (x, y) in v.items():
                        if x == 'pred-hypothetical':
                            print(x, '\n', 'Value:', y['value'], '\n', 'Confidence:', y['confidence'])
                            print()
                if n == 'factuality': 
                    for (x, y) in v.items():
                        if x == 'factual':
                            print(x, '\n', 'Value:', y['value'], '\n', 'Confidence:', y['confidence'])
                            print()
                if n == 'time':
                    for (x, y) in v.items():
                        print(x, '\n', 'Value:', y['value'], '\n', 'Confidence:', y['confidence'], '\n')          
            print()


SENTENCE: 
 ewt-train-133 
 Musharraf has been trying to purge his officer corps of the substantial number of al - Qaeda sympathizers . 

PREDICATES:
ewt-train-133-semantics-pred-4
(4, ['trying', 'try'])
ewt-train-133-semantics-pred-6
(6, ['purge', 'purge'])
ewt-train-133-semantics-pred-root


PREDICATE: ewt-train-133-semantics-pred-4

telic 
 Value: 1.2817955017089844 
 Confidence: 0.9999978542327881

pred-hypothetical 
 Value: -0.7177 
 Confidence: 1.0

factual 
 Value: 1.2194 
 Confidence: 1.0

dur-weeks 
 Value: -1.3455 
 Confidence: 1.0 

dur-decades 
 Value: -1.1133 
 Confidence: 1.0 

dur-days 
 Value: -1.2002 
 Confidence: 1.0 

dur-hours 
 Value: -1.1744 
 Confidence: 1.0 

dur-seconds 
 Value: -0.9131 
 Confidence: 1.0 

dur-forever 
 Value: -0.3098 
 Confidence: 1.0 

dur-centuries 
 Value: -0.8337 
 Confidence: 1.0 

dur-instant 
 Value: -0.4054 
 Confidence: 1.0 

dur-years 
 Value: -1.0219 
 Confidence: 1.0 

dur-minutes 
 Value: -1.0265 
 Confidence: 1.0 

dur-months 
 

According to the [published literature on the UDS dataset](https://aclanthology.org/2020.lrec-1.699.pdf), the values for each attribute come from crowd-sourced annotations. Annotators give binary (YES/NO) responses to simple questions about the predicates, and each response is accompanied by a 1-5 rating of how confident the annotator is in that answer; these ratings are then normalized. The confidence score reported in the dataset reflects how reliable the researchers consider the annotators' responses to be: it is lower when the responses are more variable. Confidence is on [0,1].
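
To make the nesting concrete: each predicate node is a dictionary keyed by subspace, then by attribute, and each attribute holds a `value` and a `confidence`. The sketch below rebuilds part of that structure by hand from the printed output for `ewt-train-133-semantics-pred-4`, so it runs without loading UDS. `get_attr` is a hypothetical helper, not part of the decomp library.

```python
# Attribute dict copied by hand from the printed output above
# (predicate ewt-train-133-semantics-pred-4); not loaded from UDS.
node = {
    'event_structure': {
        'telic': {'value': 1.2817955017089844, 'confidence': 0.9999978542327881},
    },
    'genericity': {
        'pred-hypothetical': {'value': -0.7177, 'confidence': 1.0},
    },
    'factuality': {
        'factual': {'value': 1.2194, 'confidence': 1.0},
    },
}

def get_attr(node, subspace, attribute):
    """Hypothetical helper: return (value, confidence), or None if absent."""
    entry = node.get(subspace, {}).get(attribute)
    if entry is None:
        return None
    return entry['value'], entry['confidence']

print(get_attr(node, 'factuality', 'factual'))
print(get_attr(node, 'time', 'dur-days'))  # subspace not in this toy dict
```

Note that the values themselves are not restricted to [0,1] (for example 1.2194 and -0.7177 above); only the confidence is.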

### 6-5. Lists for tracking values

In [52]:

# collect the value of each attribute of interest for every predicate
# in every sentence that has a "try" + xcomp construction
# (the three lists are filled independently of one another)

telic_value         = []
hypothetical_value  = []
factuality_value    = []

for (s, trid) in try_sents_id:
    if find_xcomp(s, trid):
        for pred in uds[s].predicate_nodes:
            node = uds[s].predicate_nodes[pred]
            # a predicate may lack a subspace or attribute, so look it up with .get()
            telic = node.get('event_structure', {}).get('telic')
            if telic is not None:
                telic_value.append(telic['value'])
            hypothetical = node.get('genericity', {}).get('pred-hypothetical')
            if hypothetical is not None:
                hypothetical_value.append(hypothetical['value'])
            factual = node.get('factuality', {}).get('factual')
            if factual is not None:
                factuality_value.append(factual['value'])

In [53]:
print(telic_value[:5])
print(hypothetical_value[:5])
print(factuality_value[:5])

[1.2817955017089844, -1.1354095935821533, -1.0744456052780151, 1.295236349105835, -1.0961610078811646]
[-0.7177, 0.4279, -1.3616, -2.3865, -2.3865]
[1.2194, 0.2707, 1.0498, 1.0616, 1.0834]
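
Since the three lists are filled independently, their positions stop lining up as soon as any predicate lacks one of the attributes. One possible alternative, sketched here under assumptions, is to store one record per predicate and put NaN where an attribute is missing. `collect_records` and `toy_nodes` are hypothetical names, and the second toy predicate (with no annotations) is invented to show the missing-value case.

```python
import math

# (subspace, attribute) pairs of interest, as in the cells above.
WANTED = [
    ('event_structure', 'telic'),
    ('genericity', 'pred-hypothetical'),
    ('factuality', 'factual'),
]

def collect_records(nodes_by_pred):
    """Build one record per predicate, using NaN where an attribute is absent."""
    records = []
    for pred, node in nodes_by_pred.items():
        record = {'pred_id': pred}
        for subspace, attribute in WANTED:
            entry = node.get(subspace, {}).get(attribute)
            record[attribute] = entry['value'] if entry is not None else math.nan
        records.append(record)
    return records

# Toy input: the first node is copied from the printed output above;
# the second is an invented predicate with no annotations.
toy_nodes = {
    'ewt-train-133-semantics-pred-4': {
        'event_structure': {'telic': {'value': 1.2817955017089844,
                                      'confidence': 0.9999978542327881}},
        'genericity': {'pred-hypothetical': {'value': -0.7177, 'confidence': 1.0}},
        'factuality': {'factual': {'value': 1.2194, 'confidence': 1.0}},
    },
    'toy-pred-without-annotations': {},
}

records = collect_records(toy_nodes)
for record in records:
    print(record)
```

A list of records like this can be passed to `pd.DataFrame` to get one row per predicate, with the predicate ID kept alongside its values.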
