Wildcard single word lexicon rule matches (Auto Tag) #28

Open
apmoore1 opened this issue Feb 23, 2022 · 0 comments

apmoore1 commented Feb 23, 2022

This issue proposes supporting wildcard (*) syntax in single word lexicon files. One use case is a rule that matches all punctuation tokens, which should be labelled with the semantic category PUNCT (punctuation).

The wildcard symbol in this syntax would mean that zero or more characters may appear after the word token and/or Part Of Speech (POS) tag. This syntax will therefore hold the same meaning between single word and Multi Word Expression files.

Example

Assuming the single word lexicon file:

lemma    pos     semantic_tags
*kg      num     N3.5
*        punc    PUNCT

In the first case, the rule would match any token ending in kg, so 15kg would be tagged as a measurement (the N3.5 semantic tag). In the second case, the rule would label every token with the punc POS tag as punctuation, using the PUNCT semantic tag.
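
The matching behaviour described in this example could be sketched roughly as follows. This is only an illustration, not the project's implementation: the names LEXICON, wildcard_to_regex, and lookup are hypothetical, and it assumes that * stands for zero or more characters at the position where it appears, that matching is case sensitive, and that the first matching rule wins.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical in-memory form of the lexicon file above:
# (lemma pattern, POS pattern, semantic tags)
LEXICON: List[Tuple[str, str, List[str]]] = [
    ("*kg", "num", ["N3.5"]),
    ("*", "punc", ["PUNCT"]),
]


def wildcard_to_regex(pattern: str) -> "re.Pattern":
    """Compile a lexicon pattern, treating * as zero or more characters.

    Everything other than * is escaped so it is matched literally.
    """
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts))


def lookup(
    token: str,
    pos: str,
    lexicon: List[Tuple[str, str, List[str]]] = LEXICON,
) -> Optional[List[str]]:
    """Return the tags of the first rule whose lemma and POS patterns
    both match the whole token and the whole POS tag, else None.

    First-match-wins ordering is an assumption made for this sketch.
    """
    for lemma_pattern, pos_pattern, tags in lexicon:
        lemma_ok = wildcard_to_regex(lemma_pattern).fullmatch(token)
        pos_ok = wildcard_to_regex(pos_pattern).fullmatch(pos)
        if lemma_ok and pos_ok:
            return tags
    return None


print(lookup("15kg", "num"))   # ['N3.5']
print(lookup(",", "punc"))     # ['PUNCT']
print(lookup("15kg", "noun"))  # None
```

Under these assumptions a bare kg token with the num POS tag would also match the first rule, because * may match zero characters.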

@apmoore1 apmoore1 added the enhancement New feature or request label Feb 23, 2022