## Thematic roles in VerbNet 3.4

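Thematic roles, often abbreviated as ThemRole, are a fundamental component of semantic annotation in VerbNet. They describe the semantic relationship between a predicate and its arguments [1]. For example, in "John broke the window", "John" is the Agent and "the window" is the Patient of "break".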

VerbNet 3.4 introduced significant changes but still lacks comprehensive documentation. There is no thematic role guideline specific to this version, and this work relies on these roles. This section therefore describes the set of thematic roles used here.

```python
import json

data_dir = '../src/data'
output_dir = '../output'
vn_dir = f'{data_dir}/verbnet3.4.json'
AM_model_path = f"{output_dir}/extracted_filtered_STRIPS.json"
thematic_tree_path = f'{data_dir}/vn_semanticrole_hierarchy.json'

# thematic role hierarchy tree
with open(thematic_tree_path, 'r', encoding='utf-8') as f:
    tree_data = json.load(f)

# extracted STRIPS action model
with open(AM_model_path, 'r', encoding='utf-8') as f:
    am_data = json.load(f)
```

### The Standard Thematic Role Hierarchy

The thematic role hierarchy graph on the [Unified Verb Index reference page](https://uvi.colorado.edu/references_page) [2] serves as the standard for this research. This hierarchy contains 41 thematic roles, hereafter referred to as the standard thematic roles.
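The hierarchy file is read as a nested tree. Each node has a `value` (the role name) and an optional `children` list; these are the keys the traversal code below relies on.

Assuming exactly that shape, the sketch computes the depth of each role below the root with a recursive walk. This is one way to make a phrase such as "at the top of the hierarchy" concrete. The helper `role_depths` and the toy tree are hypothetical and do not reproduce the actual UVI hierarchy.

```python
def role_depths(node, depth=0, depths=None):
    """Map each lower-cased role name to its depth (root = 0) in a {value, children} tree."""
    if depths is None:
        depths = {}
    depths[node['value'].lower()] = depth
    for child in node.get('children', []):  # leaf nodes may omit 'children'
        role_depths(child, depth + 1, depths)
    return depths


# Hypothetical toy hierarchy: the structure is illustrative only, not the UVI graph
toy_tree = {
    'value': 'Root',
    'children': [
        {'value': 'Actor', 'children': [{'value': 'Agent'}, {'value': 'Cause'}]},
        {'value': 'Undergoer', 'children': [{'value': 'Patient'}, {'value': 'Theme'}]},
    ],
}

depths = role_depths(toy_tree)
print(depths)
# {'root': 0, 'actor': 1, 'agent': 2, 'cause': 2, 'undergoer': 1, 'patient': 2, 'theme': 2}
```

With the real data, `depths = role_depths(tree_data)` would give the depths. Once `vn_themrole_set` is built in the next section, `{r: depths[r] for r in themrole_tree_set - vn_themrole_set}` would then list the depth of each uninstantiated role.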

```python
# Collect every role name in the hierarchy (lower-cased) by recursive traversal
def traverse(node, themrole_set):
    themrole_set.add(node['value'].lower())
    for child in node.get('children', []):
        traverse(child, themrole_set)
    return themrole_set

themrole_tree_set = traverse(tree_data, set())
print(f"total themroles in themrole hierarchy: {len(themrole_tree_set)}")
```

```
total themroles in themrole hierarchy: 41
```
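`traverse` stores names in a set. If two nodes share the same lower-cased name, they are counted once, and the total drops without any warning.

The check below counts every node and reports repeated names. It assumes the same `value`/`children` shape. The helper `count_role_nodes` is hypothetical, and its toy tree deliberately contains a duplicate.

```python
from collections import Counter


def count_role_nodes(node, counts=None):
    """Count how often each lower-cased role name occurs in a {value, children} tree."""
    if counts is None:
        counts = Counter()
    counts[node['value'].lower()] += 1
    for child in node.get('children', []):
        count_role_nodes(child, counts)
    return counts


# Hypothetical toy tree: 'Agent' and 'agent' collide after lower-casing
toy_tree = {'value': 'Actor', 'children': [
    {'value': 'Agent', 'children': []},
    {'value': 'agent'},  # 'children' may be absent on leaves
]}

counts = count_role_nodes(toy_tree)
total_nodes = sum(counts.values())
duplicates = sorted(name for name, n in counts.items() if n > 1)
print(total_nodes, len(counts), duplicates)  # 3 2 ['agent']
```

Applied to `tree_data`, a node total equal to the set size, with an empty duplicate list, would confirm that no roles were merged.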


### ThemRoles in VerbNet annotation

Although the standard set contains 41 thematic roles, their use in VerbNet annotation is more complex. VerbNet annotation must accommodate the variety of natural language while keeping the thematic role set abstract and high-level. To do so, it frequently uses constants in argument positions. Constants are reserved for specific uses and are not part of the general thematic role set. A complicating factor is that not all constants are explicitly marked as such.

The principle for distinguishing between a thematic role and a constant is as follows: A role is identified as a thematic role if it is explicitly marked with the `ThemRole` type in the annotation or if it is one of the standard thematic roles. Otherwise, it is classified as a constant. A similar logic is applied when determining the arguments for the extracted action model.
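The classification rule can be written as a small pure function. The function `classify_arg` is a hypothetical helper, and the toy role set stands in for the 41 standard roles. Two assumptions follow the notebook code: standard role names are stored lower-cased, and a leading `?` on a role name is stripped before the lookup. `'Constant'` is used here only as an example of a non-`ThemRole` type label.

```python
def classify_arg(arg_type: str, arg_value: str, standard_roles: set[str]) -> str:
    """Classify a semantic-predicate argument as 'themrole' or 'constant'.

    An argument is a thematic role if it is typed 'ThemRole' or if its name is
    one of the standard roles; otherwise it is a constant. Assumes standard_roles
    holds lower-cased names; a leading '?' is stripped before the lookup.
    """
    name = arg_value.removeprefix('?').lower()  # str.removeprefix needs Python 3.9+
    if arg_type == 'ThemRole' or name in standard_roles:
        return 'themrole'
    return 'constant'


# Toy subset of the standard roles (hypothetical, not the full set of 41)
standard = {'agent', 'patient', 'theme'}

print(classify_arg('ThemRole', '?Agent', standard))         # themrole: explicitly typed
print(classify_arg('Constant', 'Theme', standard))          # themrole: standard name, not typed
print(classify_arg('Constant', 'some_constant', standard))  # constant
```

The second call shows why the "standard role" clause is needed. It catches roles that the annotation does not explicitly type as `ThemRole`.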

Applying this rule yields 46 thematic roles in VerbNet. Ten of them do not appear in the standard hierarchy; these include some annotation errors as well as roles the hierarchy does not cover. Since this work relies on the hierarchy to assess the similarity between thematic roles, these non-standard roles are also treated as constants. Conversely, five of the standard thematic roles are never instantiated in the annotation. This aligns with the expectation that VerbNet annotation uses the thematic role most specific to the verb's meaning. All five uninstantiated roles sit at the top of the hierarchy, which suggests that they function as abstract roles.
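The text does not say how similarity is derived from the hierarchy. The sketch below shows one common choice, Wu-Palmer similarity: $2 \cdot \mathrm{depth}(\mathrm{lcs}) / (\mathrm{depth}(a) + \mathrm{depth}(b))$, where the root has depth 1 and lcs is the lowest common subsumer. This is an assumption for illustration, not necessarily the measure used in this work.

The helper names and the toy tree are hypothetical. Role names are assumed to be unique in the tree, because the parent map is keyed by name.

```python
def parent_map(node, parent=None, parents=None):
    """Map each lower-cased role name to its parent's name (the root maps to None)."""
    if parents is None:
        parents = {}
    name = node['value'].lower()
    parents[name] = parent
    for child in node.get('children', []):
        parent_map(child, name, parents)
    return parents


def path_to_root(role, parents):
    """Return [role, parent, grandparent, ..., root]."""
    path = []
    while role is not None:
        path.append(role)
        role = parents[role]
    return path


def wu_palmer(a, b, parents):
    """Wu-Palmer similarity 2*depth(lcs) / (depth(a) + depth(b)), with root depth 1."""
    path_a = path_to_root(a, parents)
    path_b = path_to_root(b, parents)
    ancestors_b = set(path_b)
    lcs = next(r for r in path_a if r in ancestors_b)  # lowest common subsumer
    depth_lcs = len(path_to_root(lcs, parents))
    return 2 * depth_lcs / (len(path_a) + len(path_b))


# Hypothetical toy hierarchy (illustrative structure, not the UVI graph)
toy_tree = {'value': 'Root', 'children': [
    {'value': 'Actor', 'children': [{'value': 'Agent'}, {'value': 'Cause'}]},
    {'value': 'Undergoer', 'children': [{'value': 'Patient'}, {'value': 'Theme'}]},
]}
parents = parent_map(toy_tree)

print(wu_palmer('agent', 'cause', parents))    # 0.6666666666666666 (siblings under 'actor')
print(wu_palmer('agent', 'patient', parents))  # 0.3333333333333333 (share only the root)
print(wu_palmer('agent', 'agent', parents))    # 1.0 (identical roles)
```

Any hierarchy-based measure of this kind is defined only for roles that appear in the tree. This is why non-standard roles have to be handled separately, for example as constants.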

```python
from vn2am.parser import get_VN_entries, get_semantics
from vn2am.utils import remove_themrole_mark

vn_themrole_set = set()
vn_constant_set = set()

vndata = get_VN_entries(vn_dir)
for entry in vndata:
    for frame in entry.get('frames', []):
        # each semantic predicate unpacks as (_, predicate_name, args, bool_value)
        for _, _, args, _ in get_semantics(frame):
            for arg_type, arg_value in args:
                # thematic role: explicitly typed as ThemRole, or a standard role name
                if arg_type == 'ThemRole' or arg_value.removeprefix("?").lower() in themrole_tree_set:
                    vn_themrole_set.add(remove_themrole_mark(arg_value))
                else:
                    # everything else is treated as a constant
                    vn_constant_set.add(arg_value)

print(f"Total number of ThemRoles in VerbNet: {len(vn_themrole_set)}")
print(f"Total number of constants in VerbNet: {len(vn_constant_set)}")
print("ThemRoles in VerbNet only:\n", vn_themrole_set - themrole_tree_set)
print("ThemRoles in hierarchy only:\n", themrole_tree_set - vn_themrole_set)
```

```
Total number of ThemRoles in VerbNet: 46
Total number of constants in VerbNet: 62
ThemRoles in VerbNet only:
 {'destination_time', 'e2', 'path', 'v_final_state', 'circumstance', 'v_manner', 'v_vehicle', 'initial_time', 'e1', 'v_state'}
ThemRoles in hierarchy only:
 {'place', 'participants', 'undergoer', 'property', 'locus'}
```
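Non-standard roles are treated as constants. That reclassification amounts to two set operations: an intersection with the standard roles and a union of the remainder into the constants.

The function `split_roles` is a hypothetical helper, and its toy inputs are only loosely modelled on the output above.

```python
def split_roles(identified_roles, standard_roles, constants):
    """Keep standard roles as thematic roles and move non-standard ones to the constants.

    Returns (thematic_roles, constants) as new sets; the inputs are left unchanged.
    """
    thematic_roles = identified_roles & standard_roles
    non_standard = identified_roles - standard_roles
    return thematic_roles, constants | non_standard


# Toy inputs (hypothetical values)
identified = {'agent', 'theme', 'v_manner', 'e1'}
standard = {'agent', 'patient', 'theme'}
constants = {'some_constant'}

thematic, new_constants = split_roles(identified, standard, constants)
print(sorted(thematic))       # ['agent', 'theme']
print(sorted(new_constants))  # ['e1', 'some_constant', 'v_manner']
```

With the notebook's own sets, `split_roles(vn_themrole_set, themrole_tree_set, vn_constant_set)` would keep the 36 roles shared by VerbNet and the hierarchy (46 identified minus the 10 non-standard ones). Note that `vn_constant_set` stores raw argument values, while the role sets store names normalised by `remove_themrole_mark`. The merged constant set may therefore mix both forms.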


## References

[1] Kipper Schuler, K.: VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon. PhD thesis, University of Pennsylvania (2005)

[2] Unified Verb Index: Reference page. https://uvi.colorado.edu/references_page