## Number of VerbNet 3.4 Classes and Frames

This section reports the number of classes in VerbNet 3.4 and the number of frames attached to each class, based on the original VerbNet annotation.

In [1]:
import json

with open('../src/data/verbnet3.4.json', 'r') as json_file:
    data = json.load(json_file)

### Counting classes

VerbNet is hierarchical: a class can contain one or more subclasses, which can in turn contain subclasses of their own. In the annotation, every subclass appears at least twice within the resource: (i) as a standalone entry and (ii) embedded in its parent’s `subclasses` field. The deeper a subclass is nested, the more often it recurs, because it is also embedded in the records of all its ancestors. Therefore, a straightforward way to obtain all unique classes in VerbNet is to count only the standalone entries of the top-level `VerbNet` list and to disregard the nested `subclasses` annotations.
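To make the duplication concrete, here is a minimal sketch on a hand-made toy resource. It assumes the same layout used in this notebook (a `VerbNet` list whose items carry `class_id` and `subclasses`); the class IDs and the helper `count_occurrences` are invented for illustration and are not taken from the real file.

```python
from collections import Counter
from typing import Optional

# Toy resource, invented for illustration: one parent, one child, one grandchild.
# Every class is a standalone entry AND is embedded in each of its ancestors.
grandchild = {"class_id": "toy-1.0-1-1", "subclasses": []}
child = {"class_id": "toy-1.0-1", "subclasses": [grandchild]}
parent = {"class_id": "toy-1.0", "subclasses": [child]}
toy = {"VerbNet": [parent, child, grandchild]}


def count_occurrences(entries: list, counts: Optional[Counter] = None) -> Counter:
    """Count how often each class_id appears, following nested subclasses."""
    if counts is None:
        counts = Counter()
    for entry in entries:
        counts[entry["class_id"]] += 1
        count_occurrences(entry.get("subclasses", []), counts)
    return counts


occurrences = count_occurrences(toy["VerbNet"])
for class_id, n in sorted(occurrences.items()):
    print(f"{class_id}: {n}")
print(f"Standalone entries: {len(toy['VerbNet'])}")
```

In this toy case the parent occurs once, the child twice, and the grandchild three times, while the number of distinct IDs equals the number of standalone entries.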

To validate this approach, a script was written to unfold all subclasses and check whether the number of unique classes matches the number of standalone entries in the top-level `VerbNet` list.

The script recursively extracts all entries from the `subclasses` field and lists them as independent classes, producing a total of 953 class entries. Of these, 351 are duplicates, leaving 602 unique classes.

In [2]:
def unfold_subclasses(entries: list) -> list:
    """
    Recursively unfolds subclasses into the main entries list.
    """
    unfolded = []
    for entry in entries:
        unfolded.append(entry)
        subclasses = entry.get('subclasses', [])
        if subclasses:
            unfolded.extend(unfold_subclasses(subclasses))
    return unfolded


entries = data.get('VerbNet', [])
unfolded_entries = unfold_subclasses(entries)

print(f"Total unfolded class entries (including subclasses): {len(unfolded_entries)} ")

# For each standalone entry, count how often its class ID occurs in the
# unfolded list; every occurrence beyond the first is a duplicate.
duplicated_classes = 0
for entry in entries:
    class_id = entry.get('class_id', None)
    count = sum(
        1 for unfolded_entry in unfolded_entries
        if unfolded_entry.get('class_id') == class_id
    )
    if count > 1:
        duplicated_classes += count - 1
assert duplicated_classes == len(unfolded_entries) - len(entries), \
    "The number of duplicates doesn't equal the unfolded entries minus the standalone entries"
print(f"Duplicated entries: {duplicated_classes}")

# Collect the distinct class IDs found anywhere in the unfolded list.
unique_classes = {entry.get('class_id', '[no ID]') for entry in unfolded_entries}
print(f"Unique entries: {len(unique_classes)}")

Total unfolded class entries (including subclasses): 953 
Duplicated entries: 351
Unique entries: 602


The number of standalone entries in the top-level `VerbNet` list is also exactly 602. This confirms that VerbNet already provides a distinct record for every class and subclass, and that the nested `subclasses` fields only repeat these records as parent-child cross-references.

Consequently, we report the number of distinct classes in VerbNet as 602.

In [3]:
print(f"Entries in VerbNet: {len(entries)}")
assert len(entries) == len(unique_classes), "The number of entries in VerbNet is different from the number of unique classes"

Entries in VerbNet: 602
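Matching counts show that the two numbers agree; a slightly stronger check compares the IDs themselves. The following is a minimal sketch of such a check on invented toy data, assuming the same `class_id` and `subclasses` fields as above. The helpers `nested_ids` and `missing_standalone` are hypothetical names introduced here.

```python
def nested_ids(entries: list) -> set:
    """Collect the class_id of every subclass nested under the given entries."""
    found = set()
    for entry in entries:
        for sub in entry.get("subclasses", []):
            found.add(sub["class_id"])
            found |= nested_ids([sub])
    return found


def missing_standalone(entries: list) -> set:
    """IDs that occur as nested subclasses but have no standalone record."""
    standalone = {entry["class_id"] for entry in entries}
    return nested_ids(entries) - standalone


# Toy data, invented for illustration (not taken from verbnet3.4.json).
leaf = {"class_id": "toy-2.0-1", "subclasses": []}
root = {"class_id": "toy-2.0", "subclasses": [leaf]}
complete = [root, leaf]   # the subclass also has its own record
incomplete = [root]       # the subclass exists only inside its parent

for name, sample in [("complete", complete), ("incomplete", incomplete)]:
    print(f"{name}: missing standalone records = {sorted(missing_standalone(sample))}")
```

Applied to the real `entries` list, an empty result would indicate that every nested subclass has its own standalone record.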


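Of these 602 classes, 329 are top-level classes, identified here by a `class_id` with exactly two hyphen-separated parts (name and number). The longest `class_id` has five hyphen-separated parts; assuming each additional part marks one level of nesting, subclass hierarchies reach at most three levels below a top-level class.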

In [4]:
top_level_class = set()
max_depth = 0
for entry in entries:
    name = entry.get('class_id', 'unknown')
    # Depth = number of hyphen-separated parts in the ID; top-level IDs have exactly two.
    depth = len(name.split('-'))
    if depth == 2:
        top_level_class.add(name)
    max_depth = max(depth, max_depth)
print(f"Top-level classes: {len(top_level_class)}")
print(f"max depth of subclass: {max_depth}")

Top-level classes: 329
max depth of subclass: 5
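The depth reported here is derived purely from the shape of the class ID. As a rough sketch of that idea, the helpers below (hypothetical names `id_depth` and `parent_id`) compute the depth and the direct parent of an ID. They rely on the same assumption as the cell above, namely that class names themselves contain no hyphen; the sample IDs are made up.

```python
from collections import Counter
from typing import Optional


def id_depth(class_id: str) -> int:
    """Number of hyphen-separated parts in a class ID (2 = top-level)."""
    return len(class_id.split("-"))


def parent_id(class_id: str) -> Optional[str]:
    """Return the ID of the direct parent, or None for a top-level class."""
    if id_depth(class_id) <= 2:
        return None
    return class_id.rsplit("-", 1)[0]


# Sample IDs, made up to follow the "name-number[-sub[-sub...]]" pattern.
sample_ids = ["toy-3.0", "toy-3.0-1", "toy-3.0-1-1", "toy-3.1"]

# How many IDs exist at each depth.
depth_histogram = Counter(id_depth(cid) for cid in sample_ids)
print(dict(sorted(depth_histogram.items())))
print(parent_id("toy-3.0-1-1"))
print(parent_id("toy-3.1"))
```

A histogram like this, computed over the real class IDs, would show how many classes sit at each nesting level rather than only the maximum.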


### Counting frames

Frames are the schemas attached to classes; each frame corresponds to one usage of the verbs in its class. Summed over all class entries, the total number of frames in VerbNet is 1591.

In [5]:
entries = data.get('VerbNet', [])
# Map each class ID to the number of frames attached directly to that class.
class_nframes = {entry['class_id']: len(entry['frames']) for entry in entries}

print(f"Total frames in VerbNet: {sum(class_nframes.values())}")


Total frames in VerbNet: 1591
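Because `class_nframes` keeps one count per class, it can also be summarized per class rather than only as a total. The sketch below shows one possible summary on an invented stand-in dictionary; `summarize_frames` is a hypothetical helper and the numbers are not VerbNet results.

```python
from statistics import mean, median

# Toy stand-in for class_nframes (class_id -> number of frames); values are invented.
toy_nframes = {"toy-4.0": 3, "toy-4.0-1": 1, "toy-4.1": 0, "toy-4.2": 4}


def summarize_frames(nframes: dict) -> dict:
    """Basic per-class statistics; expects at least one class."""
    counts = list(nframes.values())
    return {
        "classes": len(counts),
        "total": sum(counts),
        "mean": mean(counts),
        "median": median(counts),
        "max": max(counts),
        "without_frames": sorted(cid for cid, n in nframes.items() if n == 0),
    }


summary = summarize_frames(toy_nframes)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Passing the real `class_nframes` instead of `toy_nframes` would give the corresponding figures for VerbNet.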
