Feature treebank #40

balmas · 2018-07-30T18:38:35Z

This is the first in a set of changes to enable disambiguation of morphological parser results with explicit annotations.

This is a porting of functionality from V1, which allowed us to disambiguate with data from treebanks. The treebanks contain morpho-syntactic annotations of specific words in specific texts. These annotations normally identify a specific lemma and inflection for a word. But they do not provide all of the information the parser does, such as the stem and suffix of the word, the conjugation, etc.

So, when we have this data, we normally want to provide a merged view of the lexeme, which retains the information from the parser which wasn't provided in the annotation, but which also narrows the inflections down to the one identified in the annotation.

There are corresponding changes in the components, morph-client and embed-lib repositories.

coveralls · 2018-07-30T18:43:21Z

Coverage increased (+0.3%) to 83.136% when pulling 28af398 on feature-treebank into 153e1e1 on master.

kirlat

Looking good, but have some questions about business logic.

kirlat · 2018-07-31T08:56:45Z

src/lexeme.js

+      let keepInflections = []
+      // iterate through this lexemes inflections and keep only thoes that are disambiguatedBy by the supplied lexeme's inflection
+      for (let inflection of this.inflections) {
+        if (inflection.disambiguatedBy(lexeme.inflections[0])) {


Why does it check only the first inflection of lexeme?

Because in my use case, the homonym that is used as a disambiguator specifies that it must be X inflection, so it will have only one. But, I see your point, particularly given your comments on the Components PR about the disambiguate method in Homonym, and think the code will be more flexible if we assume we might have a narrowing down from multiple possibilities, to a few, to a single one. I will go ahead and change this code accordingly. Thank you!

kirlat · 2018-07-31T09:03:27Z

src/lemma.js

+   * @param {Lemma} lemma the lemma to compare
+   * @return {Boolean} true or false
+   */
+  isEqual (lemma) {


If I understand it correctly, it does check that word and part of speech are the same only. It does not check for full feature equality. So I'm thinking that isEqual name might be confusing in this case. Maybe it's better to rename it to something that more fully reflects a partiality of lemma comparison (and we might want to use isEqual later for full feature comparison)? What do you think?

@kirlat Yes, that's a good point. Is there a good naming convention here that you can suggest? I could use 'matches' or 'mayEqual' but if there is a more standard way to represent this I'd like to use it.

I think it's good if name would reflect an essence of a comparison. So if we're comparing a word and POFS what do really want to know about being equal between the two (from business logic standpoint)? Is it a word? Maybe we can call it matchesWord or matchesWordPOFS, or maybe isSameWord or maybe isSameWordPOFS.

It's really not easy. I think naming is probably the hardest part of software development :-).

I did a little searching and apparently linguistically, this scenario is known as a "full homonym" - two words spelled the same with the same part of speech. Since that's what I'm testing here I will name the method accordingly - i.e. isFullHomonym . Sound good?

Yes, I think isFullHomonym sounds perfect!

kirlat · 2018-07-31T09:15:37Z

src/inflection.js

+   */
+  disambiguatedBy (infl) {
+    let matched = true
+    for (let feature of infl.constraints.obligatoryMatches) {


Since obligatoryMatches are set separately, I'm thinking if we can handle situations when obligatoryMatches are not set by mistake gracefully, because that seems to be a thing that is easy to miss. How do we set obligatoryMatches usually? Do we rely on a language model for that? Can we set constraints on a first disambiguatedBy call if they are not set yet? Maybe we could have a flag that will indicate if this field has been initialized yet (a flag might be an overkill, but do we need to distinguish between situations when there is no obligatory matches or when obligatory matches have not been initialized)?
Also, what should a function return if no obligatoryMatches are set (i.e. there is no information about obligatory matches within an inflection at the moment of comparison). Right now in that case a function will return true. Is this correct from business logic point of view? What do you think?

we don't actually set obligatoryMatches on an Inflection in any other cases right now. It's something that was added during some of the inflection table refactoring but didn't get used. I was piggy backing on it for convenience... you are correct in calling this out. I need to think about the possible scenarios a bit.

irina060981

I don't have additional questions besides Kirill has asked.

and disamibugate using all features

balmas · 2018-08-02T16:41:39Z

@kirlat I made a significant number of changes in response to your feedback and would appreciate another review! Important changes to note:

I eliminated the use of the obligatoryMatches property on inflections for disambiguation. In thinking about the business logic behind this I realized that I was using that because we didn't have an easy way to iterate the defined features of an inflection. Really what I wanted to do was iterate through the features on the inflection being used to disambiguate - anything that is defined there has to match, but it can be less specific than the thing it's used to disambiguate. It can't be more specific because then it's a different inflection. So I added a features property on the inflection which keeps the names of all the defined features for easy iteration.
I have changed both Lexeme.disambiguate and Homonym.disambiguate to static functions which return new objects, rather than instance methods which update an existing object. I can imagine scenarios where we might want to retain the original object, so it seems safer to create a new object. I also added the ability for the disambiguation to use a sequence as you suggested. I'm not sure we have a use for it, but the effort of making the code work for it helped me to identify some serious business logic flaws in the previous version. We need to also account for the possibility, for example, that an annotation which is used for disambiguation doesn't match anything in the morphological parser results (because sometimes the parser is just wrong or missing information) and in that case the process of disambiguation actually adds to the object rather than reducing it.

Let me know what you think when you have a chance. Thanks!

kirlat

Hi Bridget, your changes look really good to me!

Bridget Almas added 4 commits July 6, 2018 16:08

add ability to disambiguate a homonym, lexeme and inflection

9dac4da

update build

118f34a

move equality testing in the lemma object

c1236df

Merge branch 'master' into feature-treebank

1b7f36b

balmas requested review from irina060981 and kirlat July 30, 2018 18:38

balmas mentioned this pull request Jul 30, 2018

Feature treebank alpheios-project/embed-lib#6

Merged

kirlat reviewed Jul 31, 2018

View reviewed changes

irina060981 approved these changes Jul 31, 2018

View reviewed changes

Bridget Almas added 4 commits August 1, 2018 09:44

rename Lemma.isEqual to Lemma.isFullHomonym

3c24673

make disambiguate methods static

b8b7f79

track features added to an inflection

5ab1101

and disamibugate using all features

refinements to disambiguation business logic

28af398

balmas mentioned this pull request Aug 2, 2018

Feature treebank alpheios-project/components#153

Merged

kirlat approved these changes Aug 3, 2018

View reviewed changes

balmas merged commit e14d7ad into master Aug 3, 2018

balmas deleted the feature-treebank branch August 3, 2018 12:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feature treebank #40

Feature treebank #40

Uh oh!

balmas commented Jul 30, 2018

Uh oh!

coveralls commented Jul 30, 2018 •

edited

Loading

Uh oh!

kirlat left a comment

Uh oh!

kirlat Jul 31, 2018

Uh oh!

balmas Aug 1, 2018

Uh oh!

kirlat Jul 31, 2018

Uh oh!

balmas Aug 1, 2018

Uh oh!

kirlat Aug 1, 2018

Uh oh!

balmas Aug 1, 2018

Uh oh!

kirlat Aug 1, 2018

Uh oh!

kirlat Jul 31, 2018

Uh oh!

balmas Aug 1, 2018

Uh oh!

irina060981 left a comment

Uh oh!

balmas commented Aug 2, 2018

Uh oh!

kirlat left a comment

Uh oh!

Uh oh!

Feature treebank #40

Feature treebank #40

Uh oh!

Conversation

balmas commented Jul 30, 2018

Uh oh!

coveralls commented Jul 30, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kirlat left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

irina060981 left a comment

Choose a reason for hiding this comment

Uh oh!

balmas commented Aug 2, 2018

Uh oh!

kirlat left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coveralls commented Jul 30, 2018 •

edited

Loading