-
Notifications
You must be signed in to change notification settings - Fork 2
Feature treebank #40
Feature treebank #40
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, but have some questions about business logic.
src/lexeme.js
Outdated
let keepInflections = [] | ||
// iterate through this lexemes inflections and keep only thoes that are disambiguatedBy by the supplied lexeme's inflection | ||
for (let inflection of this.inflections) { | ||
if (inflection.disambiguatedBy(lexeme.inflections[0])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it check only the first inflection of lexeme
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because in my use case, the homonym that is used as a disambiguator specifies that it must be X inflection, so it will have only one. But, I see your point, particularly given your comments on the Components PR about the disambiguate method in Homonym, and think the code will be more flexible if we assume we might have a narrowing down from multiple possibilities, to a few, to a single one. I will go ahead and change this code accordingly. Thank you!
src/lemma.js
Outdated
* @param {Lemma} lemma the lemma to compare | ||
* @return {Boolean} true or false | ||
*/ | ||
isEqual (lemma) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand it correctly, it does check that word and part of speech are the same only. It does not check for full feature equality. So I'm thinking that isEqual
name might be confusing in this case. Maybe it's better to rename it to something that more fully reflects a partiality of lemma comparison (and we might want to use isEqual
later for full feature comparison)? What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kirlat Yes, that's a good point. Is there a good naming convention here that you can suggest? I could use 'matches' or 'mayEqual' but if there is a more standard way to represent this I'd like to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's good if name would reflect an essence of a comparison. So if we're comparing a word and POFS what do really want to know about being equal between the two (from business logic standpoint)? Is it a word? Maybe we can call it matchesWord
or matchesWordPOFS
, or maybe isSameWord
or maybe isSameWordPOFS
.
It's really not easy. I think naming is probably the hardest part of software development :-).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a little searching and apparently linguistically, this scenario is known as a "full homonym" - two words spelled the same with the same part of speech. Since that's what I'm testing here I will name the method accordingly - i.e. isFullHomonym . Sound good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think isFullHomonym
sounds perfect!
src/inflection.js
Outdated
*/ | ||
disambiguatedBy (infl) { | ||
let matched = true | ||
for (let feature of infl.constraints.obligatoryMatches) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since obligatoryMatches
are set separately, I'm thinking if we can handle situations when obligatoryMatches
are not set by mistake gracefully, because that seems to be a thing that is easy to miss. How do we set obligatoryMatches
usually? Do we rely on a language model for that? Can we set constraints on a first disambiguatedBy
call if they are not set yet? Maybe we could have a flag that will indicate if this field has been initialized yet (a flag might be an overkill, but do we need to distinguish between situations when there is no obligatory matches or when obligatory matches have not been initialized)?
Also, what should a function return if no obligatoryMatches
are set (i.e. there is no information about obligatory matches within an inflection at the moment of comparison). Right now in that case a function will return true. Is this correct from business logic point of view? What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't actually set obligatoryMatches on an Inflection in any other cases right now. It's something that was added during some of the inflection table refactoring but didn't get used. I was piggy backing on it for convenience... you are correct in calling this out. I need to think about the possible scenarios a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have additional questions besides Kirill has asked.
@kirlat I made a significant number of changes in response to your feedback and would appreciate another review! Important changes to note:
Let me know what you think when you have a chance. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Bridget, your changes look really good to me!
This is the first in a set of changes to enable disambiguation of morphological parser results with explicit annotations.
This is a porting of functionality from V1, which allowed us to disambiguate with data from treebanks. The treebanks contain morpho-syntactic annotations of specific words in specific texts. These annotations normally identify a specific lemma and inflection for a word. But they do not provide all of the information the parser does, such as the stem and suffix of the word, the conjugation, etc.
So, when we have this data, we normally want to provide a merged view of the lexeme, which retains the information from the parser which wasn't provided in the annotation, but which also narrows the inflections down to the one identified in the annotation.
There are corresponding changes in the components, morph-client and embed-lib repositories.