Auxiliary verb rule for single word semantic lexicon lookup #27

Open
apmoore1 opened this issue Jan 31, 2022 · 1 comment

apmoore1 commented Jan 31, 2022

The aim is to incorporate auxiliary verb rules into the USAS Rule Based Tagger.

Definition of auxiliary verb rules

All POS tags used here are from the CLAWS C7 tagset.

In English (at least in the C version of the semantic tagger) we apply auxiliary verb rules to tokens with the POS tags VB* (be), VD* (do), and VH* (have). The rules determine whether the verb is acting as a main or an auxiliary verb, and the semantic tag is altered accordingly.

An auxiliary verb would normally be given the USAS semantic tag Z5 (grammatical bin), whereas the main verb would be given a non-Z5 tag. For example, in the sentence below (the format is token_USAS semantic tag), the auxiliary verb is "have" and the main verb is "finished":

I_Z8mf have_Z5 finished_T2- my_Z8 lunch_F1 ._PUNC 
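The token_tag format above can be split back into (token, tag) pairs with a few lines of Python. This is only an illustrative sketch; the function name parse_tagged_sentence is hypothetical and is not part of the tagger.

```python
def parse_tagged_sentence(tagged):
    """Split 'token_TAG token_TAG ...' into a list of (token, tag) pairs."""
    pairs = []
    for item in tagged.split():
        # Split on the last underscore so tokens that contain '_' survive.
        token, tag = item.rsplit("_", 1)
        pairs.append((token, tag))
    return pairs


pairs = parse_tagged_sentence("I_Z8mf have_Z5 finished_T2- my_Z8 lunch_F1 ._PUNC")
# Tokens whose semantic tag is Z5 (the grammatical bin).
z5_tokens = [token for token, tag in pairs if tag == "Z5"]
print(z5_tokens)
```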

We have approximately 35 rules in place for amending the semantic tags on be, do, and have after the initial set of potential semantic tags has been applied. An example rule for have is as follows:

VH*[Z5] (RR*n) (RT*n) (XX) (RR*n) (RT*n) V*N

The rule matches on a sequence of POS tags: if VH* (the POS tag for have) is followed by V*N (the POS tag for the word finished), with optional intervening adverbs (R* POS tags) or negation (the XX POS tag), then the rule instructs the tagger to change the semantic tag on the auxiliary verb have to Z5.
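To make the matching concrete, here is a rough Python sketch of how a rule like this could be checked against a list of C7 POS tags. It rests on several assumptions that are not taken from the C tagger: the function name find_have_auxiliaries is hypothetical, the optional slots are simplified to "any number of RR*, RT*, or XX tags in any order" (looser than the ordering in the rule, and the n suffix is not modelled), and the C7 tags for the example sentence were assigned by hand.

```python
from fnmatch import fnmatchcase

# Assumption: tags allowed between the auxiliary and the past participle.
INTERVENING = ("RR*", "RT*", "XX")


def find_have_auxiliaries(pos_tags):
    """Return the indices of VH* tokens that this sketch would re-tag as Z5."""
    aux_indices = []
    for i, tag in enumerate(pos_tags):
        if not fnmatchcase(tag, "VH*"):
            continue
        # Skip over any optional adverbs or negation after the VH* token.
        j = i + 1
        while j < len(pos_tags) and any(
            fnmatchcase(pos_tags[j], pattern) for pattern in INTERVENING
        ):
            j += 1
        # The rule fires only if the next tag is a past participle (V*N).
        if j < len(pos_tags) and fnmatchcase(pos_tags[j], "V*N"):
            aux_indices.append(i)
    return aux_indices


# Hand-assigned C7 tags for "I have finished my lunch ."
tags = ["PPIS1", "VH0", "VVN", "APPGE", "NN1", "."]
print(find_have_auxiliaries(tags))
```

With these tags the sketch reports index 1 (have). A sentence such as "I have lunch" would report nothing, because no V*N tag follows the VH* token.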

For semantic taggers in other languages (the Java versions), we do not have auxiliary/main verb rules in place.

How this rule maps to a spaCy pipeline through the UPOS tagset

Because spaCy POS models output the UPOS tagset, we can use the UPOS AUX tag instead of VB* (be), VD* (do), and VH* (have). Below is the code and output from running the small English spaCy model on the sentence "I have finished my lunch.":

import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp('I have finished my lunch.')
print('Token\tPOS')
for token in doc:
    print(f'{token.text}\t{token.pos_}')

Output:

Token	POS
I	PRON
have	AUX
finished	VERB
my	PRON
lunch	NOUN
.	PUNCT
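One way the AUX tag could then drive the semantic tag choice is sketched below. This is an assumption about a possible design, not the tagger's implementation: the function name apply_aux_rule is hypothetical, and the candidate tag lists are illustrative values rather than entries copied from the semantic lexicon.

```python
def apply_aux_rule(upos, candidate_tags):
    """Reorder candidate USAS tags using the UPOS tag of the token.

    AUX tokens get Z5 first; VERB tokens get any Z5 candidate moved last.
    All other tokens keep their original candidate order.
    """
    non_z5 = [tag for tag in candidate_tags if tag != "Z5"]
    if upos == "AUX":
        return ["Z5"] + non_z5
    if upos == "VERB":
        return non_z5 + [tag for tag in candidate_tags if tag == "Z5"]
    return list(candidate_tags)


# (token, UPOS, illustrative candidate tags) for part of the example sentence.
tokens = [
    ("have", "AUX", ["A9+", "Z5"]),
    ("finished", "VERB", ["T2-"]),
]
for text, upos, candidates in tokens:
    print(text, apply_aux_rule(upos, candidates)[0])
```

Keeping the full reordered list, rather than only the first tag, leaves room for later rules to override the choice.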
perayson commented

I've updated the comment to explain things further. It'd be good to find some evaluation of how accurate the auxiliary verb detection is in spaCy. We described our original approach in this UCREL technical paper: https://ucrel.lancs.ac.uk/papers/techpaper/vol3.pdf
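If we had a gold-standard UPOS annotation to compare against, the accuracy of AUX detection could be summarised with precision and recall. The sketch below assumes two aligned lists of UPOS tags (gold and predicted); the function name aux_precision_recall is hypothetical and the toy tags are made up purely to show the calculation, not a result about spaCy.

```python
def aux_precision_recall(gold_tags, predicted_tags):
    """Return (precision, recall) for the AUX tag over two aligned tag lists."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted tag lists must be the same length")
    pairs = list(zip(gold_tags, predicted_tags))
    true_pos = sum(g == "AUX" and p == "AUX" for g, p in pairs)
    false_pos = sum(g != "AUX" and p == "AUX" for g, p in pairs)
    false_neg = sum(g == "AUX" and p != "AUX" for g, p in pairs)
    # Guard against division by zero when AUX never occurs.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Toy input only: one correct AUX, one spurious AUX, one missed AUX.
gold = ["AUX", "VERB", "AUX"]
predicted = ["AUX", "AUX", "VERB"]
print(aux_precision_recall(gold, predicted))
```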
