df tag in MWE template. #26

Open
apmoore1 opened this issue Jan 31, 2022 · 1 comment

apmoore1 commented Jan 31, 2022

This issue covers incorporating Df tags from MWE templates to enhance the USAS Rule Based Tagger.

Definition of Df tags

A small number (currently 93) of English MWE templates have the tag Df, which stands for default. The Df tag refers to the semantic tag of the first token in the template that starts with a wildcard (*), as found for the matching single word in the single word semantic lexicon, and the Df tag is replaced by that tag in the tagger output. Note that, in the C version of the semantic tagger, only the first part of any slash tag is copied across, and any gender markers (lower case letters) on the single word semantic tag are also removed during the replacement step.
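
The two clean-up rules in the note above (keep only the first part of a slash tag, then drop lower case letters) can be sketched as follows. This is a minimal sketch and not the C implementation: the function name `normalise_default_tag` is hypothetical, and it assumes that every lower case letter in a single word semantic tag is a gender marker, as the definition states.

```python
import re


def normalise_default_tag(single_word_tag: str) -> str:
    """Return the tag that would replace Df, under the assumptions above."""
    # Only the first part of a slash tag is copied across, e.g. A1/Z3 -> A1.
    first_part = single_word_tag.split("/", 1)[0]
    # Assumption: lower case letters are gender markers, so remove them all.
    return re.sub(r"[a-z]", "", first_part)


print(normalise_default_tag("A1/Z3"))
print(normalise_default_tag("S2.1f"))
```

The second call uses an illustrative tag with a trailing gender marker; it is not taken from the lexicon in this issue.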

Example 1

In the MWE template below, the semantic tag Df would be replaced with the semantic tags of the adjective (JJ) token by looking up that token's semantic tags in the single word semantic lexicon:

mwe_template    semantic_tags
*_JJ style_NN1    Df

To make this more concrete, given the text:

The acting style.

And the following single word lexicon:

lemma    pos    semantic_tags
acting    JJ    A1/Z3

Together with the above MWE template, the tokens acting style will be tagged as an MWE with the A1 semantic tag, since only the first part (A1) of the slash tag A1/Z3 is copied across.
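
To illustrate the lookup step in Example 1, here is a rough sketch of finding the first wildcard element of a template and fetching the matching token's entry from the single word lexicon. The function names, the `(lemma, pos)` dictionary layout, and the assumption that template elements line up one-to-one with the matched tokens are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]  # (lemma, pos)


def first_wildcard_index(mwe_template: str) -> Optional[int]:
    """Index of the first template element that starts with a wildcard (*)."""
    for index, element in enumerate(mwe_template.split()):
        if element.startswith("*"):
            return index
    return None


def lookup_default_source(
    mwe_template: str,
    matched_tokens: List[Token],
    lexicon: Dict[Token, str],
) -> Optional[str]:
    """Single word lexicon entry of the token matched by the first wildcard element."""
    index = first_wildcard_index(mwe_template)
    if index is None:
        return None
    # Assumption: template elements and matched tokens are aligned one-to-one.
    return lexicon.get(matched_tokens[index])


lexicon = {("acting", "JJ"): "A1/Z3"}
matched_tokens = [("acting", "JJ"), ("style", "NN1")]
source_tags = lookup_default_source("*_JJ style_NN1", matched_tokens, lexicon)
default_tag = source_tags.split("/", 1)[0]  # first part of the slash tag
print(default_tag)
```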

Example 2

Some MWE templates can include membership of more than one semantic category using the slash (/) notation. Here is an example of how the Df tag is processed in these cases. Given the MWE template:

mwe_template    semantic_tags
*_JJ style_NN1    C1/Df

and the text:

The acting style.

And the following single word lexicon:

lemma    pos    semantic_tags
acting    JJ    A1/Z3

Then the tokens acting style will be tagged as an MWE with the C1/A1 semantic tags.
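
The substitution in both examples can be sketched as replacing each Df part of the template's slash tag with the default tag taken from the single word lexicon. The function name `substitute_df` is hypothetical, and the sketch assumes the default tag has already been reduced to its first slash part (A1 here).

```python
def substitute_df(template_tag: str, default_tag: str) -> str:
    """Replace every Df part of a (possibly slash) template tag with default_tag."""
    parts = template_tag.split("/")
    return "/".join(default_tag if part == "Df" else part for part in parts)


print(substitute_df("Df", "A1"))
print(substitute_df("C1/Df", "A1"))
```

Splitting on the slash before comparing keeps the other categories in the template tag (C1 above) untouched.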

Problems with the definition

As stated in the definition:

The `Df` tag refers to the first token starting with a wildcard (`*`)

This means that an MWE template with a Df tag must contain a word token element starting with a wildcard. If no such token exists in the template, then a warning should be issued.
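
One possible shape for that check is sketched below. The function name `check_df_template` is hypothetical, and the sketch assumes the semantic tags field may hold several space separated tags, each of which may use the slash notation.

```python
import warnings


def check_df_template(mwe_template: str, semantic_tags: str) -> bool:
    """Return False, and warn, if a Df tag is used without a wildcard element."""
    uses_df = any("Df" in tag.split("/") for tag in semantic_tags.split())
    has_wildcard_element = any(
        element.startswith("*") for element in mwe_template.split()
    )
    if uses_df and not has_wildcard_element:
        warnings.warn(
            f"MWE template {mwe_template!r} has a Df tag but no element "
            "starting with a wildcard (*)."
        )
        return False
    return True


check_df_template("*_JJ style_NN1", "C1/Df")       # valid, no warning
check_df_template("acting_JJ style_NN1", "C1/Df")  # issues a warning
```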

perayson commented Feb 8, 2022

I've amended the definition and checked the process against the algorithm as implemented in my C code.
