This notebook illustrates how to retrieve dialogue relation information from ASER.

First, load the dependencies. The "aser" package must be installed beforehand (https://github.com/HKUST-KnowComp/ASER).

In [1]:
import os
import pickle
from aser.extract.parsed_reader import ParsedReader

In [3]:
# file paths of rid2sids and rid2relation. They can be found in s3://dgl-data/
rid2sids_path = '/home/data/corpora/aser/database/filter_2.0/2/rid2sids.pkl'
rid2relation_path = 'rid2relation.pkl'
# the KG path: root directory that the sids are joined with when reading sentences
processed_path = '/home/data/corpora/aser/data'

parsed_reader = ParsedReader()

In [4]:
with open(rid2sids_path, "rb") as f:
    rid2sids = pickle.load(f)

with open(rid2relation_path, 'rb') as f:
    rid2relation = pickle.load(f)

With rid2sids and rid2relation loaded, each relation can be mapped to the sentences it was extracted from.

The corresponding sentences can be retrieved as follows:

In [54]:
rid = '72a6534ca89e2b548ab406af58b0c43a959973ac'


print('[rid]:', rid)
print('[relations]:', rid2relation[rid])

print('[sids]:', rid2sids[rid])
for sid_pair in rid2sids[rid]:
    h_sid, t_sid = sid_pair
    # Note: h_sid may equal t_sid, in which case the relation holds between two eventualities in the same sentence.
    h_sent = parsed_reader.get_parsed_sentence_and_context(os.path.join(processed_path, h_sid))['sentence']['text']
    t_sent = parsed_reader.get_parsed_sentence_and_context(os.path.join(processed_path, t_sid))['sentence']['text']
    print('[SID-pair]:', sid_pair)
    print('\t[head]:', h_sent)
    print('\t[tail]:', t_sent)


[rid]: 72a6534ca89e2b548ab406af58b0c43a959973ac
[relations]: {'hid': '95f20fc56205d5cffd6448ccf5fbd5667c7c170d',
 'relations': {'Conjunction': 3.0},
 'rid': '72a6534ca89e2b548ab406af58b0c43a959973ac',
 'tid': '2560e1c9b1855043cfaa7d3eaac70a4285517e7a'}
[sids]: [('subtitles/parsed_para/subtitles_990260.jsonl|210314', 'subtitles/parsed_para/subtitles_990260.jsonl|210315'), ('subtitles/parsed_para/subtitles_990260.jsonl|210334', 'subtitles/parsed_para/subtitles_990260.jsonl|210335'), ('subtitles/parsed_para/subtitles_990260.jsonl|210357', 'subtitles/parsed_para/subtitles_990260.jsonl|210358')]
[SID-pair]: ('subtitles/parsed_para/subtitles_990260.jsonl|210314', 'subtitles/parsed_para/subtitles_990260.jsonl|210315')
	[head]: Can I touch you?
	[tail]: And do the things that lovers do?
[SID-pair]: ('subtitles/parsed_para/subtitles_990260.jsonl|210334', 'subtitles/parsed_para/subtitles_990260.jsonl|210335')
	[head]: Can I touch you?
	[tail]: And do the things that lovers do?
[SID-pair]: ('subt
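
The sids printed above look like a relative file path and a number joined by `|`. As a minimal sketch, assuming that reading is right (the meaning of the number, for example a line index or an offset, is not confirmed here), a sid can be taken apart to check whether the head and tail come from the same file. The helper name `split_sid` is hypothetical and not part of the "aser" package.

```python
def split_sid(sid):
    """Split 'relative/path.jsonl|123' into ('relative/path.jsonl', 123).

    Hypothetical helper; assumes the part after the last '|' is an integer.
    """
    path, position = sid.rsplit('|', 1)
    return path, int(position)


# sids copied from the first pair in the output above
h_sid = 'subtitles/parsed_para/subtitles_990260.jsonl|210314'
t_sid = 'subtitles/parsed_para/subtitles_990260.jsonl|210315'

h_path, h_pos = split_sid(h_sid)
t_path, t_pos = split_sid(t_sid)
same_file = h_path == t_path

print(h_path, h_pos, t_pos, same_file)
```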

In most sentence pairs, the head and tail sentences are the same; see the statistics below.

In [45]:
from tqdm import tqdm

# Count sid pairs whose head and tail point to the same sentence vs. different sentences
same = 0
diff = 0
for rid in tqdm(rid2sids):
    for h_sid, t_sid in rid2sids[rid]:
        if h_sid != t_sid:
            diff += 1
        else:
            same += 1
total = same + diff
print('same:{}({:.2f}), diff: {}({:.2f})'.format(same, same/total, diff, diff/total))

100%|██████████| 52296498/52296498 [01:25<00:00, 610439.05it/s]

same:149991894(0.92), diff: 13554304(0.08)
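
Since the full pass takes over a minute, the tally can first be tried on a small hand-made mapping. The sketch below assumes only the structure seen in the notebook (each rid maps to a list of (head sid, tail sid) tuples); the toy rids and sids are invented for illustration, and `count_same_vs_diff` is a hypothetical helper name.

```python
from collections import Counter


def count_same_vs_diff(rid2sids):
    """Tally sid pairs by whether head and tail sids are identical."""
    counts = Counter()
    for sid_pairs in rid2sids.values():
        for h_sid, t_sid in sid_pairs:
            counts['same' if h_sid == t_sid else 'diff'] += 1
    return counts


# Invented toy data with the same shape as rid2sids
toy_rid2sids = {
    'rid_a': [('doc.jsonl|10', 'doc.jsonl|10'), ('doc.jsonl|20', 'doc.jsonl|21')],
    'rid_b': [('doc.jsonl|30', 'doc.jsonl|30')],
}

counts = count_same_vs_diff(toy_rid2sids)
print(counts['same'], counts['diff'])
```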





When the head and tail sentences are the same, you may need the eventuality information to split the sentence into its two parts.  
The rid2relation mapping holds the head and tail eventuality ids (hid and tid) for each relation.
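
One possible way to do the split, sketched under strong assumptions: treat an eventuality as a list of lowercased words and look for that list as a contiguous run in the tokenized sentence. The regex tokenizer is a simple stand-in and not the parser used by ASER, eventuality words need not be contiguous in real sentences, and the sentence and tail words below are made up for illustration. `simple_tokenize` and `find_span` are hypothetical helpers.

```python
import re


def simple_tokenize(sentence):
    """Lowercase and split into word and punctuation tokens (stand-in tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", sentence.lower())


def find_span(tokens, event_words):
    """Return (start, end) of the first contiguous match of event_words, or None."""
    n = len(event_words)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == event_words:
            return start, start + n
    return None


# Made-up sentence containing two clauses
tokens = simple_tokenize("Can I touch you and do the things that lovers do?")

head_span = find_span(tokens, ['can', 'i', 'touch', 'you'])
tail_span = find_span(tokens, ['do', 'the', 'things'])  # hypothetical tail words
missing = find_span(tokens, ['not', 'present'])

print(head_span, tail_span, missing)
```

A `None` result signals that this naive contiguous match is not enough for that eventuality and a different alignment would be needed.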

The eid2eventuality mapping can be loaded as follows.

In [None]:
eid2eventuality_path = 'eid2eventuality.pkl'

with open(eid2eventuality_path, 'rb') as f:
    eid2eventuality = pickle.load(f)

In [75]:
rid = '72a6534ca89e2b548ab406af58b0c43a959973ac'
mapping = rid2relation[rid].to_dict()
h_eid, t_eid = mapping['hid'], mapping['tid']

print(eid2eventuality[h_eid].to_dict())

{'eid': '95f20fc56205d5cffd6448ccf5fbd5667c7c170d', 'pattern': 's-v-o', '_dependencies': [[2, 'aux', 0], [2, 'nsubj', 1], [2, 'dobj', 3]], 'words': ['can', 'i', 'touch', 'you'], 'pos_tags': ['MD', 'PRP', 'VB', 'PRP'], '_ners': ['O', 'O', 'O', 'O'], '_mentions': {}, '_skeleton_dependency_indices': [1, 2], '_skeleton_indices': [1, 2, 3], '_verb_indices': [2], 'raw_sent_mapping': None, '_phrase_segment_indices': [(0, 1), (1, 2), (2, 3), (3, 4)], 'frequency': 197.0}
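
The index fields in the dictionary above can be used to pull out individual words. This is a minimal sketch that assumes `_skeleton_indices` and `_verb_indices` are positions into the `words` list, which matches the printed values but is not confirmed here. The dictionary is a trimmed copy of the output above, and `pick_words` is a hypothetical helper.

```python
# Trimmed copy of the eventuality dictionary printed above
eventuality = {
    'words': ['can', 'i', 'touch', 'you'],
    'pos_tags': ['MD', 'PRP', 'VB', 'PRP'],
    '_skeleton_indices': [1, 2, 3],
    '_verb_indices': [2],
}


def pick_words(eventuality, index_key):
    """Return the words at the positions listed under index_key (assumed word indices)."""
    words = eventuality['words']
    return [words[i] for i in eventuality[index_key]]


skeleton_words = pick_words(eventuality, '_skeleton_indices')
verb_words = pick_words(eventuality, '_verb_indices')

print('text:', ' '.join(eventuality['words']))
print('skeleton:', skeleton_words)
print('verbs:', verb_words)
```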
