# Exploring UDS with CRC

Installing on CRC is done with two commands:

- `module load python/ondemand-jupyter-python3.8`

- `pip install --user git+https://github.com/decompositional-semantics-initiative/decomp.git`

Additional packages:

- `pip install pathlib`
- `pip install ruamel-yaml`
- `pip install pyqtwebengine==5.13`
- `pip install pyqt5==5.13`

In [1]:
%pprint   

from pprint import pprint

Pretty printing has been turned OFF


In [2]:
from decomp import UDSCorpus

In [3]:
uds = UDSCorpus()

In [4]:
# all sentences with "try"
for x in uds:
    sent_text = uds[x].sentence
    if ' try ' in sent_text:    # substring match; the surrounding spaces keep 'try' from matching inside a longer word
        print(x, sent_text)

ewt-train-668 Moreover there are now major covert attempts under way to try and bring back to Kabul leading Taliban commanders , who have been living quietly in Pakistan and have taken no part in the Taliban insurgency .
ewt-train-701 In a highly significant move Afghan Pashtun tribes along the Pakistan border warned the Taliban in Quetta and Chaman that if they try and disrupt the elections , they would be resisted .
ewt-train-1195 If part of the public supports it , others tolerate it , many are afraid of it and some try to explain it away by poverty or by a miserable childhood , organized crime will thrive and so will terrorism .
ewt-train-1328 Nevertheless , most European countries still trade with Iran , try to appease it and refuse to read the clear signals .
ewt-train-1330 It is pointless to try to understand the subtle differences between the Sunni terror of Al Qaeda and Hamas and the Shiite terror of Hezbollah , Sadr and other Iranian - inspired enterprises .
ewt-train-2078 We

**In a lot of these sentences "try" functions as a marker of the future - for an experience that has not yet happened.**
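
The substring test `' try ' in sent_text` used above needs a space on both sides, so it skips a sentence-initial, capitalized "Try". A minimal sketch of a whole-word alternative with the standard `re` module follows; the helper name `mentions_try` and the example sentences are made up for illustration and are not taken from UDS.

```python
import re

# \b is a word boundary, so "try" inside "country" does not match;
# IGNORECASE also catches a sentence-initial "Try".
TRY_PATTERN = re.compile(r"\btry\b", re.IGNORECASE)

def mentions_try(sentence: str) -> bool:
    """Return True if the sentence contains 'try' as a standalone word."""
    return TRY_PATTERN.search(sentence) is not None

# Illustrative sentences (invented for this sketch, tokenized UDS-style)
examples = [
    "Try the other door .",
    "We should try .",
    "The country is large .",
]
for sent in examples:
    # columns: regex result, original substring result, sentence
    print(mentions_try(sent), ' try ' in sent, sent, sep="\t")
```

In the notebook loop this would replace the `if ' try ' in sent_text:` condition with `if mentions_try(sent_text):`.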



In [5]:
# all sentences with "tried"
for x in uds:
    sent_text = uds[x].sentence
    if ' tried ' in sent_text:    # based on text substring
        print(x, sent_text)

ewt-train-917 Afshari , who was posted in Germany and was responsible for receiving Mujahedeen children during the gulf war , said that when the German government tried to absorb Mujahedeen children into their education system , the Mujahedeen refused .
ewt-train-5584 When Sabeer Bhatia came up with the business plan for the mail service , he tried all kinds of names ending in ' mail ' and finally settled for hotmail as it included the letters " html " - the programming language used to write web pages .
ewt-train-5633 Anyway , I tried to get a preorder but no places were selling it so I missed out on 2/3 weekend events .
ewt-train-6438 The Pew researchers tried to transcend the economic argument .
ewt-train-7233 Have you tried using clockwork recovery ?
ewt-train-8476 I would n't know -- I do n't particularly like scary movies , and I have n't tried this with my pet .
ewt-train-8587 I have done a fair amount of metal casting , but never tried to build my own furnace , but I think your

The following sentence is interesting; it is the kind of "try" construction I am interested in:

> ewt-dev-539 I tried to do it on the HRonline web - site , but the procedure is too complicated .

So to what extent did the action occur?

Take a closer look at sentence ewt-dev-539, but first create a variable containing all the sentences with the verb "try".

In [6]:
# based on lemma + POS -> only verbs

try_sents = []
for x in uds:
    sent_text = uds[x].sentence
    sent_lem = [(v['lemma'], v['upos']) for (n,v) in uds[x].syntax_nodes.items()]
    sent_lem_set = set(sent_lem)
    if ('try', 'VERB') in sent_lem_set:
        print(x, sent_text)
        try_sents.append(x)

ewt-train-133 Musharraf has been trying to purge his officer corps of the substantial number of al - Qaeda sympathizers .
ewt-train-637 The neighbours are still interfering , but there are signs that rather than undermining Afghanistan 's stability they may now be trying to strengthen it .
ewt-train-668 Moreover there are now major covert attempts under way to try and bring back to Kabul leading Taliban commanders , who have been living quietly in Pakistan and have taken no part in the Taliban insurgency .
ewt-train-688 However powerful hardliners in Tehran may be trying to undermine that strategy and a new issue is likely to deepen the rift with the moderates .
ewt-train-701 In a highly significant move Afghan Pashtun tribes along the Pakistan border warned the Taliban in Quetta and Chaman that if they try and disrupt the elections , they would be resisted .
ewt-train-796 He was trying to buddy with Archibald and impress him .
ewt-train-881 I read of a case not long ago when some peop

In [7]:
len(try_sents)

236

In [16]:
try_sents[100]

'ewt-train-8851'

**Wow, basing the search on lemma finds many more "try" constructions.**
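
The lemma and POS check can be pulled into a small helper that stops at the first match instead of building a set. This is only a sketch: `has_lemma` is a hypothetical name, and `toy_nodes` is a plain dict standing in for `uds[x].syntax_nodes`, filled with the first six tokens of ewt-train-133 as they appear in this notebook's token listing.

```python
def has_lemma(syntax_nodes, lemma, upos):
    """Return True if any token has the given lemma and universal POS tag."""
    return any(
        attrs.get("lemma") == lemma and attrs.get("upos") == upos
        for attrs in syntax_nodes.values()
    )

# Stand-in for uds["ewt-train-133"].syntax_nodes (first six tokens only)
toy_nodes = {
    "ewt-train-133-syntax-1": {"form": "Musharraf", "lemma": "Musharraf", "xpos": "NNP", "upos": "PROPN"},
    "ewt-train-133-syntax-2": {"form": "has", "lemma": "have", "xpos": "VBZ", "upos": "AUX"},
    "ewt-train-133-syntax-3": {"form": "been", "lemma": "be", "xpos": "VBN", "upos": "AUX"},
    "ewt-train-133-syntax-4": {"form": "trying", "lemma": "try", "xpos": "VBG", "upos": "VERB"},
    "ewt-train-133-syntax-5": {"form": "to", "lemma": "to", "xpos": "TO", "upos": "PART"},
    "ewt-train-133-syntax-6": {"form": "purge", "lemma": "purge", "xpos": "VB", "upos": "VERB"},
}

print(has_lemma(toy_nodes, "try", "VERB"))   # True
print(has_lemma(toy_nodes, "try", "NOUN"))   # False
```

With the real corpus, the same helper could be called as `has_lemma(uds[x].syntax_nodes, 'try', 'VERB')`.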

In [8]:
for s in try_sents[:6]:      # first 6
    print()
    for (n, v) in uds[s].syntax_nodes.items():
        print(n, v['form'], v['lemma'], v['xpos'], v['upos'], sep="\t")


ewt-train-133-syntax-1	Musharraf	Musharraf	NNP	PROPN
ewt-train-133-syntax-2	has	have	VBZ	AUX
ewt-train-133-syntax-3	been	be	VBN	AUX
ewt-train-133-syntax-4	trying	try	VBG	VERB
ewt-train-133-syntax-5	to	to	TO	PART
ewt-train-133-syntax-6	purge	purge	VB	VERB
ewt-train-133-syntax-7	his	he	PRP$	PRON
ewt-train-133-syntax-8	officer	officer	NN	NOUN
ewt-train-133-syntax-9	corps	corps	NN	NOUN
ewt-train-133-syntax-10	of	of	IN	ADP
ewt-train-133-syntax-11	the	the	DT	DET
ewt-train-133-syntax-12	substantial	substantial	JJ	ADJ
ewt-train-133-syntax-13	number	number	NN	NOUN
ewt-train-133-syntax-14	of	of	IN	ADP
ewt-train-133-syntax-15	al	al	NNP	PROPN
ewt-train-133-syntax-16	-	-	HYPH	PUNCT
ewt-train-133-syntax-17	Qaeda	Qaeda	NNP	PROPN
ewt-train-133-syntax-18	sympathizers	sympathizer	NNS	NOUN
ewt-train-133-syntax-19	.	.	.	PUNCT

ewt-train-637-syntax-1	The	the	DT	DET
ewt-train-637-syntax-2	neighbours	neighbour	NNS	NOUN
ewt-train-637-syntax-3	are	be	VBP	AUX
ewt-train-637-syntax-4	still	still	RB	ADV
ewt-train


**back to sentence ewt-dev-539**

In [9]:
# sentence text

print(uds["ewt-dev-539"].name, '   ', uds['ewt-dev-539'].sentence)

ewt-dev-539     I tried to do it on the HRonline web - site , but the procedure is too complicated .


The "doing" is being tried.

Look at the dependencies in the sentence, taken from https://raw.githubusercontent.com/UniversalDependencies/UD_English-EWT/master/en_ewt-ud-dev.conllu (`sent_id = email-enronsent30_02-0029`):

- "try" is root (0)
- "to" is dependent on "do"
- "do" is dependent on "try"
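
The three dependency facts above can be written down as a head map and walked from any token up to the root. This is a hand-built sketch rather than a CoNLL-U parse: token ids 2 ("tried") and 4 ("do") match the node ids printed in this notebook, while id 3 for "to" is inferred from word order and is an assumption. The function name `path_to_root` is hypothetical.

```python
# token id -> head id, restricted to the three facts listed above (0 = root)
heads = {2: 0, 3: 4, 4: 2}
forms = {2: "tried", 3: "to", 4: "do"}

def path_to_root(token_id, heads):
    """Follow head links from a token until the root-attached token is reached."""
    path = [token_id]
    while heads[path[-1]] != 0:
        path.append(heads[path[-1]])
    return path

print(" -> ".join(forms[i] for i in path_to_root(3, heads)))   # to -> do -> tried
```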

In [10]:
# semantic nodes of ewt-dev-539

for x in uds["ewt-dev-539"].semantics_nodes.keys():
    print(x)
    if uds["ewt-dev-539"].semantics_nodes[x]['frompredpatt']: 
        print(uds["ewt-dev-539"].head(x, ['form', 'lemma']))

ewt-dev-539-semantics-pred-2
(2, ['tried', 'try'])
ewt-dev-539-semantics-pred-4
(4, ['do', 'do'])
ewt-dev-539-semantics-pred-18
(18, ['complicated', 'complicated'])
ewt-dev-539-semantics-arg-1
(1, ['I', 'I'])
ewt-dev-539-semantics-arg-4
(4, ['do', 'do'])
ewt-dev-539-semantics-arg-5
(5, ['it', 'it'])
ewt-dev-539-semantics-arg-11
(11, ['site', 'site'])
ewt-dev-539-semantics-arg-15
(15, ['procedure', 'procedure'])
ewt-dev-539-semantics-pred-root
ewt-dev-539-semantics-arg-0
ewt-dev-539-semantics-arg-author
ewt-dev-539-semantics-arg-addressee


`semantics_nodes` returns argument and predicate nodes together.
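
Because the two kinds of node share one mapping, they can be separated again by looking at the node id. The sketch below relies only on the `-semantics-pred-` / `-semantics-arg-` naming pattern visible in the output above; `split_semantics_ids` is a hypothetical helper, and the id list is copied from that output rather than read from the corpus.

```python
def split_semantics_ids(node_ids):
    """Partition semantics node ids into predicate ids and argument ids."""
    preds = [n for n in node_ids if "-semantics-pred-" in n]
    args = [n for n in node_ids if "-semantics-arg-" in n]
    return preds, args

# Node ids as printed for ewt-dev-539
node_ids = [
    "ewt-dev-539-semantics-pred-2",
    "ewt-dev-539-semantics-pred-4",
    "ewt-dev-539-semantics-pred-18",
    "ewt-dev-539-semantics-arg-1",
    "ewt-dev-539-semantics-arg-4",
    "ewt-dev-539-semantics-arg-5",
    "ewt-dev-539-semantics-arg-11",
    "ewt-dev-539-semantics-arg-15",
    "ewt-dev-539-semantics-pred-root",
    "ewt-dev-539-semantics-arg-0",
    "ewt-dev-539-semantics-arg-author",
    "ewt-dev-539-semantics-arg-addressee",
]

preds, args = split_semantics_ids(node_ids)
print(len(preds), "predicates;", len(args), "arguments")   # 4 predicates; 8 arguments
```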

In [11]:
# just predicate nodes of ewt-dev-539

for pred in uds["ewt-dev-539"].predicate_nodes.keys():
    print(pred)
    if uds["ewt-dev-539"].predicate_nodes[pred]['frompredpatt']: # this value is TRUE for non-performative nodes
        print(uds["ewt-dev-539"].head(pred, ['form', 'lemma']))

ewt-dev-539-semantics-pred-2
(2, ['tried', 'try'])
ewt-dev-539-semantics-pred-4
(4, ['do', 'do'])
ewt-dev-539-semantics-pred-18
(18, ['complicated', 'complicated'])
ewt-dev-539-semantics-pred-root


In [12]:
# A single entry, for "try"

pprint(uds["ewt-dev-539"].predicate_nodes['ewt-dev-539-semantics-pred-2'])

{'domain': 'semantics',
 'event_structure': {'avg_part_duration_lbound-centuries': {'confidence': 1,
                                                            'value': -0.4185},
                     'avg_part_duration_lbound-days': {'confidence': 1,
                                                       'value': -1.3143},
                     'avg_part_duration_lbound-decades': {'confidence': 1,
                                                          'value': -0.4689},
                     'avg_part_duration_lbound-forever': {'confidence': 1,
                                                          'value': -1.6181},
                     'avg_part_duration_lbound-fractions_of_a_second': {'confidence': 1,
                                                                        'value': -1.2363},
                     'avg_part_duration_lbound-hours': {'confidence': 1,
                                                        'value': -1.47},
                     'avg_part_duration_lbou

In [13]:
# A single entry, for "do"

pprint(uds["ewt-dev-539"].predicate_nodes['ewt-dev-539-semantics-pred-4'])

{'domain': 'semantics',
 'event_structure': {'dynamic': {'confidence': 0.5674227476119995,
                                 'value': 0.07712224125862122},
                     'natural_parts': {'confidence': 0.9999988675117493,
                                       'value': -1.0767347812652588},
                     'situation_duration_lbound-centuries': {'confidence': 1,
                                                             'value': -0.685},
                     'situation_duration_lbound-days': {'confidence': 1,
                                                        'value': -1.4659},
                     'situation_duration_lbound-decades': {'confidence': 1,
                                                           'value': -1.0697},
                     'situation_duration_lbound-forever': {'confidence': 1,
                                                           'value': 1.6609},
                     'situation_duration_lbound-fractions_of_a_second': {'confidence': 1,


In [19]:
# relation between predicate "try" and argument "I":

pprint(uds["ewt-dev-539"].semantics_edges()[('ewt-dev-539-semantics-pred-2', 'ewt-dev-539-semantics-arg-1')])

{'domain': 'semantics',
 'frompredpatt': True,
 'id': 'ewt-dev-539-semantics-arg-1',
 'protoroles': {'awareness': {'confidence': 1.0, 'value': 1.3575},
                'change_of_location': {'confidence': 0.2325, 'value': -0.1191},
                'change_of_possession': {'confidence': 1.0, 'value': -0.0},
                'change_of_state': {'confidence': 0.4282, 'value': 0.0066},
                'change_of_state_continuous': {'confidence': 0.4835,
                                               'value': -0.0769},
                'existed_after': {'confidence': 1.0, 'value': 1.3577},
                'existed_before': {'confidence': 1.0, 'value': 1.3586},
                'existed_during': {'confidence': 1.0, 'value': 1.3578},
                'instigation': {'confidence': 1.0, 'value': 1.3572},
                'partitive': {'confidence': 1.0, 'value': -0.0},
                'sentient': {'confidence': 1.0, 'value': 1.3565},
                'volition': {'confidence': 1.0, 'value': 1.3558},
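
Each property in a subspace carries both a `value` and a `confidence`, so one way to read a printout like the one above is to drop low-confidence properties and sort the rest by value. The sketch below uses a subset of the `protoroles` entries printed above as a plain dict; the helper name `confident_properties` and the 0.9 cutoff are assumptions chosen for illustration.

```python
# Subset of the protoroles annotations printed above for the edge "try" -> "I"
protoroles = {
    "awareness": {"confidence": 1.0, "value": 1.3575},
    "change_of_location": {"confidence": 0.2325, "value": -0.1191},
    "change_of_possession": {"confidence": 1.0, "value": -0.0},
    "change_of_state": {"confidence": 0.4282, "value": 0.0066},
    "instigation": {"confidence": 1.0, "value": 1.3572},
    "sentient": {"confidence": 1.0, "value": 1.3565},
    "volition": {"confidence": 1.0, "value": 1.3558},
}

def confident_properties(subspace, min_confidence=0.9):
    """Keep properties at or above the confidence cutoff, highest value first."""
    kept = {
        name: ann["value"]
        for name, ann in subspace.items()
        if ann["confidence"] >= min_confidence
    }
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)

for name, value in confident_properties(protoroles):
    print(f"{name:22s}{value: .4f}")
```

For this edge the high-confidence, high-value properties are `awareness`, `instigation`, `sentient` and `volition`, which fits "I" as a deliberate agent of the trying.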


- **compare with other "try" constructions where extent of action completion is different**
- **compare with same types of "try" constructions being used in CHILDES dataset**

- What is "try" doing to the main verb that depends on it? 

In [21]:
for pred in uds["ewt-dev-539"].predicate_nodes:
    pprint(pred)

'ewt-dev-539-semantics-pred-2'
'ewt-dev-539-semantics-pred-4'
'ewt-dev-539-semantics-pred-18'
'ewt-dev-539-semantics-pred-root'


Reminders to make for-looping go more smoothly:

In [73]:
#for pred in uds["ewt-dev-539"].predicate_nodes.keys():
    #print(pred)
    #if uds["ewt-dev-539"].predicate_nodes[pred]['frompredpatt']: # this value is TRUE for non-performative nodes
        #print(uds["ewt-dev-539"].head(pred, ['form', 'lemma']))
        
#for pred in uds["ewt-dev-539"].predicate_nodes:
    #pprint(pred)

#pprint(uds["ewt-dev-539"].predicate_nodes['ewt-dev-539-semantics-pred-4'])



#print(uds["ewt-dev-539"].name, '   ', uds['ewt-dev-539'].sentence)




In [77]:
# only 2 sentences now for exploration

for s in try_sents[:2]:     
    print('========================================================================', '\n')
    print('SENTENCE:', '\n', uds[s].name, '   ', uds[s].sentence, '\n')
    print('PREDICATES:')
    for pred in uds[s].predicate_nodes.keys():
        print(pred)
        if uds[s].predicate_nodes[pred]['frompredpatt']:
            print(uds[s].head(pred, ['form', 'lemma']))
    print('\n')
    for pred in uds[s].predicate_nodes:
        print('PREDICATE:', ' ', pred, '\n')
        pprint(uds[s].predicate_nodes[pred])
        print()



SENTENCE: 
 ewt-train-133     Musharraf has been trying to purge his officer corps of the substantial number of al - Qaeda sympathizers . 

PREDICATES:
ewt-train-133-semantics-pred-4
(4, ['trying', 'try'])
ewt-train-133-semantics-pred-6
(6, ['purge', 'purge'])
ewt-train-133-semantics-pred-root


PREDICATE:   ewt-train-133-semantics-pred-4 

{'domain': 'semantics',
 'event_structure': {'dynamic': {'confidence': 0.9999978542327881,
                                 'value': 1.2817959785461426},
                     'natural_parts': {'confidence': 0.9999987483024597,
                                       'value': -1.089069128036499},
                     'situation_duration_lbound-centuries': {'confidence': 1,
                                                             'value': -0.3542},
                     'situation_duration_lbound-days': {'confidence': 1,
                                                        'value': -1.1031},
                     'situation_duration_lbound-decade

Next step: dig further into the specific node type subspaces.
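
A node's attributes are a nested dict of the form subspace -> property -> `{'confidence', 'value'}`, so a single annotation can be pulled out with two lookups. The sketch below guards against subspaces or properties that a node does not carry; `get_annotation` is a hypothetical helper, and `node_attrs` is a stand-in holding only the two `event_structure` properties visible in the truncated printout for `ewt-train-133-semantics-pred-4`.

```python
def get_annotation(node_attrs, subspace, prop):
    """Return (value, confidence) for subspace/prop, or None if it is absent."""
    ann = node_attrs.get(subspace, {}).get(prop)
    if ann is None:
        return None
    return ann["value"], ann["confidence"]

# Stand-in for uds["ewt-train-133"].predicate_nodes["ewt-train-133-semantics-pred-4"],
# limited to what the truncated output above shows
node_attrs = {
    "domain": "semantics",
    "event_structure": {
        "dynamic": {"confidence": 0.9999978542327881, "value": 1.2817959785461426},
        "natural_parts": {"confidence": 0.9999987483024597, "value": -1.089069128036499},
    },
}

print(get_annotation(node_attrs, "event_structure", "dynamic"))
print(get_annotation(node_attrs, "genericity", "pred-hypothetical"))   # None: not in this stand-in
```

Returning `None` instead of raising `KeyError` keeps a loop over many predicates from stopping at the first node that lacks a given subspace.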

## Factuality 

from http://decomp.io/projects/factuality/

"A central function of natural language is to convey information about the properties of events. Perhaps the most fundamental of these properties is factuality: whether an event happened or not."

"In this line of work, we develop a factuality annotation that incorporates a notion of confidence. This allows us to handle a wide variety of cases where the factuality of an event is unclear."

- node type subspace `factuality`

## Event structure

from http://decomp.io/projects/event-structure/

"In this work, we aim to capture the structure of complex events, augmenting existing UDS with a new dataset for event-structural properties that capture information about such things as the subparts of an event, how they are arranged in time, and how events relate to each other and their participants."

"We use this new dataset along with others in UDS to induce an empirical event structure ontology from a generative model based on sentence- and document-level UDS graphs. This ontology is jointly learned with three other ontologies for semantic roles, entities, and event-event relations. In each case, we find that our categories align well with others proposed in the linguistics and computational semantics literature."

- node type subspace `event_structure`

of interest:
- `telic`: whether the action has a clear endpoint

## Genericity

from http://decomp.io/projects/genericity/

"An important line of study in formal semantics, philosophy, and AI investigates how language is used to represent knowledge of kinds, regularities and patterns."

"In this line of work, we propose a novel framework for capturing linguistic expressions of generalization. We suggest that linguistic expressions of generalization should be captured in a continuous multi-label system, rather than a multi-class system. We do this by decomposing categories such as EPISODIC, HABITUAL, and GENERIC into simple referential properties of predicates and their arguments."

- node type subspace `genericity`

of interest:
- `pred-hypothetical`