Coverage of PB-VN mapping is not a strict subset of VerbNet-derived mapping #7

Open
aaronstevenwhite opened this issue Jul 26, 2021 · 9 comments

Comments

@aaronstevenwhite

When comparing pb-vn2.json to a roleset-class mapping derived from VerbNet3.4 itself, I find that the PropBank rolesets in the domain of each mapping are not in a subset relation, as one might expect them to be.

To derive the mapping from VerbNet3.4, I use:

from collections import defaultdict
from verbnet import VerbNetParser

verbnet = VerbNetParser(version="3.4")
     
pb_vn34_map = defaultdict(set)

for cid, clsinfo in verbnet.verb_classes_numerical_dict.items():
    for m in clsinfo.members:
        for pbroleset in m.grouping:
            pb_vn34_map[pbroleset] |= {cid}

pb_vn34_map = dict(pb_vn34_map)

When compared to pb-vn2.json...

import json

with open('semlink/instances/pb-vn2.json') as f:
    semlink_map = json.load(f)
    
pbset_from_verbnet = set(pb_vn34_map)
pbset_from_semlink = set(semlink_map)
    
print('In both SemLink and VerbNet:\t', len(pbset_from_semlink & pbset_from_verbnet))
print('In VerbNet but not SemLink:\t', len(pbset_from_verbnet - pbset_from_semlink))
print('In SemLink but not VerbNet:\t', len(pbset_from_semlink - pbset_from_verbnet))

I observe the following counts:

In both SemLink and VerbNet:	 1854
In VerbNet but not SemLink:	 1360
In SemLink but not VerbNet:	 2323
@kevincstowe
Collaborator

I see, yes, there are a couple of issues here. First, for semlink, we are also using an external file of pb-vn mappings that was generated for a separate project. It doesn't come directly from either resource, which accounts for some of the disjoint entries.

The other, more pressing issue is that PB and VN each seem to have their own ideas about what they map to. For SemLink, we trust PB: the mappings come from what PB says, and from the external file. It's likely that VN has more valid mappings that we could include. Unfortunately, it's probably also likely that they conflict in some places. We'll have to do a little study to find where VN's mappings to PB conflict with PB's to VN, where the two are disjoint, and how we can expand coverage. @ghamzak is this something CU could look into?

For now, I think all I can say is that we trust SemLink (and thus PB) wrt. mappings - anything that looked suspicious was removed in the automated process, and the PB mapping should then be valid.
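A minimal sketch of how such a comparison could start, assuming both mappings are available as dictionaries from PB roleset to a collection of VN class ids. The function name `compare_mappings` and the toy ids are hypothetical and are not real SemLink or VerbNet entries.

```python
def compare_mappings(map_a, map_b):
    """Compare two roleset -> classes mappings on the rolesets they share.

    Returns (agree, conflict): rolesets whose class sets match exactly,
    and rolesets whose class sets differ, with the differences on each side.
    """
    shared = set(map_a) & set(map_b)
    agree, conflict = {}, {}
    for roleset in sorted(shared):
        a, b = set(map_a[roleset]), set(map_b[roleset])
        if a == b:
            agree[roleset] = a
        else:
            conflict[roleset] = {"only_a": a - b, "only_b": b - a}
    return agree, conflict


# Toy data with made-up ids (assumption: values are iterables of class ids).
semlink_toy = {"verb.01": ["A-1"], "verb.02": ["B"], "verb.03": ["C"]}
verbnet_toy = {"verb.01": {"A-1"}, "verb.02": {"B", "D"}, "verb.04": {"E"}}

agree, conflict = compare_mappings(semlink_toy, verbnet_toy)
print(agree)
print(conflict)
```

Rolesets that appear in only one mapping are left out on purpose, since those are already covered by the set differences computed earlier in the thread.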

@aaronstevenwhite
Author

Thanks for the quick reply. Is there any information on how the PB-VN mapping linked above was generated? We are using these mappings in an analysis for a paper, and while it's straightforward to just point to VN3.4 for the mappings that can be derived from it, I'm a bit worried about using the above without being able to cite their provenance. I'm assuming they're not strictly from PB, since PB only contains PB-VN3.2 mappings and some of the classes have been renamed or split in VN3.4 (the original reason I contacted @ghamzak back in March: I had extracted PB-VN3.2 from PB, and was looking for a mapping from VN3.2 to VN3.4 to compose with it).
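To make the composition idea concrete, here is a minimal sketch, assuming a PB-to-VN3.2 mapping and a VN3.2-to-VN3.4 mapping both exist as dictionaries. No such VN3.2-to-VN3.4 file is provided in this thread; the function `compose_mappings` and the toy ids below are hypothetical.

```python
def compose_mappings(pb_to_old, old_to_new):
    """Compose roleset -> old classes with old class -> new classes.

    Returns (composed, unresolved): the composed roleset -> new classes
    mapping, and the old classes per roleset that have no known successor.
    """
    composed, unresolved = {}, {}
    for roleset, old_classes in pb_to_old.items():
        for old in old_classes:
            if old in old_to_new:
                # A split class contributes all of its successors.
                composed.setdefault(roleset, set()).update(old_to_new[old])
            else:
                unresolved.setdefault(roleset, set()).add(old)
    return composed, unresolved


# Toy data with made-up ids: "A" is unchanged, "B" was split, "C" is unknown.
pb_to_old_toy = {"verb.01": {"A"}, "verb.02": {"B", "C"}}
old_to_new_toy = {"A": {"A"}, "B": {"B-1", "B-2"}}

composed, unresolved = compose_mappings(pb_to_old_toy, old_to_new_toy)
print(composed)
print(unresolved)
```

Keeping the unresolved classes separate, rather than dropping them silently, makes it easy to see which rolesets would still need manual correction.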

@MarthaSPalmer

MarthaSPalmer commented Jul 27, 2021 via email

@aaronstevenwhite
Author

Thanks for the quick reply, @MarthaSPalmer.

@MarthaSPalmer

MarthaSPalmer commented Jul 27, 2021 via email

@aaronstevenwhite
Author

Even with the updated mapping, I am running into mismatches. I believe it should be the case that if there is a semlink mapping from a PB roleset to a VN class in pb-vn2.json, the VN class is guaranteed to be in VN3.4, but this does not appear to be the case. For instance, if I compute the set of VN classes from VerbNet and the set of VN classes that at least one PB roleset maps to in semlink as follows...

import json

from verbnet import VerbNetParser

verbnet = VerbNetParser(version="3.4")

with open('semlink/instances/pb-vn2.json') as f:
    semlink = json.load(f)

verbnet_classes = set(verbnet.verb_classes_numerical_dict)
semlink_classes = {vncls for pbroleset, vnclasses in semlink.items() for vncls in vnclasses}

...and then calculate len(semlink_classes - verbnet_classes), I get 52 classes that are mapped to in semlink but are not found in VN3.4. I've included a list below.

31.3-3
39.1-2
72
9.3-2
28
31.3-1
90-1
47.1-1-1
29.5-1-2
45.6
31.3-7
13.2-1-1-1
13.3-1
92.1
100
36.1
107
27
51.1-2
23.1-2
31.3-2
31.3-8
49
31.3-9
9.3-2-1
39.1-3
40.3.1-1
105
26.7-1-1
10.6
26.2
13.6
37.7-2
26.7-2-1
61
31.4-3
59
39.4-1
64
39.3-2
47.5.1-2-1
37.4
39.2-1
51.1-3
39.2-2
31.3-6
39.4-2
22.4-1
33
95
31.3-5
9.2-1

At least some of these (e.g. 10.6, 72, 105), and maybe all of them, are classes and subclasses that are only found in VN3.2. Indeed, in the analysis I originally wanted to use this for, these classes were exactly the mismatching ones that triggered my initial request for a VN3.2 to VN3.4 mapping. At that point, I just went through and hand-corrected the mappings on a by-predicate basis as best I could, but it would be really nice to have a canonical mapping that maps only to VN3.4, since my hand-corrected mapping could be wrong in places.

@aaronstevenwhite
Author

aaronstevenwhite commented Jul 28, 2021

It's my understanding that external_vn2pb.json should have as keys only VN3.4 classes. This does not appear to be the case. When I load that mapping and compare it to the list of classes extracted directly from VN3.4 as in my previous post...

with open('semlink/other_resources/external_vn2pb.json') as f:
    external_vn2pb = json.load(f)

# get the numeric identifier for each class
external_vn2pb_classes = {'-'.join(c.split('-')[1:]) for c in external_vn2pb}

external_vn2pb_classes - verbnet_classes

I get 12 classes found in external_vn2pb.json but not in VN3.4, all of which are subclasses.

39.1-3
39.1-2
30.1-1-1-1
13.3-1
39.2-1
31.4-3
23.1-2
39.2-2
22.4-1
39.3-2
9.2-1
47.5.1-2-1

These all appear to be instances where the base class exists in VN3.4 but the subclass doesn't. Maybe these were cases for which an early version of VN3.4 subclassed an existing class but where that subclass was deleted or promoted to its own class?
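One way to check this observation systematically is to look up the nearest ancestor of each missing subclass that does exist. The sketch below assumes that a subclass id extends its parent's id with a `-N` suffix, as the ids listed above suggest. The helper `nearest_existing_ancestor` is hypothetical, and the `known` set is a small stand-in rather than the real VN3.4 inventory.

```python
def nearest_existing_ancestor(class_id, known_classes):
    """Return the closest ancestor of class_id found in known_classes.

    Ancestors are formed by stripping trailing '-N' segments one at a time,
    e.g. '47.5.1-2-1' -> '47.5.1-2' -> '47.5.1'. Returns None if no
    ancestor is known (or if the id has no '-' segment to strip).
    """
    parts = class_id.split("-")
    for end in range(len(parts) - 1, 0, -1):
        candidate = "-".join(parts[:end])
        if candidate in known_classes:
            return candidate
    return None


# Stand-in for the set of class ids extracted from VerbNet (assumption).
known = {"39.1", "47.5.1", "47.5.1-2"}

for missing in ["39.1-3", "47.5.1-2-1", "72"]:
    print(missing, "->", nearest_existing_ancestor(missing, known))
```

A missing id whose lookup returns `None` has no surviving ancestor under this naming assumption, which would point to a removed or renamed top-level class rather than a dropped subclass.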

This would explain some of the mismatches in my previous post, but 40 classes in pb-vn2.json remain that it does not account for.

@chaitanyamalaviya

Hi, I'm running into the same issues as highlighted above, i.e., classes linked to in pb-vn2.json don't exist in VerbNet. Would appreciate a response. Thanks!

@kevincstowe
Collaborator

kevincstowe commented Aug 9, 2021

Sorry for the delay, but I'm looking into it now. One thing is that the current version is based on VN3.3, rather than 3.4. I don't know if that accounts for all of the mismatches though. I can say that external_vn2pb.json was built separately, and isn't even linked to 3.3, but the incorrect classes should be filtered out when semlink is generated. I'll update when I've found out more.

UPDATE: It's pulling 3.4, so that shouldn't be an issue. But it looks like there was a bug where it wasn't correctly filtering/updating incorrect PB mappings. Fixing and rerunning now.

UPDATE: It appears that was, in fact, the issue. I implemented your test @aaronstevenwhite, and it now returns 0. This test is now included so we can check whether these errors pop up in the future. Note that external_vn2pb.json is still NOT current: it's an older, manually created file, and only supplements these resources. The pb2vn system takes this file as additional information, corrects it where possible, uses it where correct, and removes it where incorrect (as it does with the PropBank frame file mappings).
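The test itself is not shown in this thread. A minimal sketch of what such a check and the "removes where incorrect" step might look like, assuming the mapping is a dictionary from PB roleset to a list of VN class ids, follows. The function names and toy ids are hypothetical and do not reflect the actual pb2vn code.

```python
def unknown_classes(pb_to_vn, verbnet_classes):
    """Return the mapped class ids that are absent from verbnet_classes."""
    mapped = {c for classes in pb_to_vn.values() for c in classes}
    return mapped - set(verbnet_classes)


def drop_unknown_classes(pb_to_vn, verbnet_classes):
    """Keep only class ids present in verbnet_classes; drop emptied rolesets."""
    known = set(verbnet_classes)
    filtered = {}
    for roleset, classes in pb_to_vn.items():
        kept = [c for c in classes if c in known]
        if kept:
            filtered[roleset] = kept
    return filtered


# Toy data with made-up ids.
toy_verbnet = {"X", "X-1", "Y"}
toy_map = {"verb.01": ["X-1"], "verb.02": ["Y", "Z"], "verb.03": ["Z-1"]}

print(len(unknown_classes(toy_map, toy_verbnet)))   # before filtering
cleaned = drop_unknown_classes(toy_map, toy_verbnet)
print(len(unknown_classes(cleaned, toy_verbnet)))   # after filtering
```

This sketch only removes unknown classes; correcting them, for example by remapping a renamed class, would need extra information that the thread does not describe.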
