Old norse nouns #868

Merged
kylepjohnson merged 63 commits into cltk:master on Mar 5, 2019

Conversation

4 participants
@clemsciences
Member

clemsciences commented Feb 6, 2019

Given the nominative singular, the genitive singular and the nominative plural, we can derive the full set of forms of an Old Norse noun (with some exceptions; see the doctests).

To achieve this, several functions were written to handle the relevant phonological rules: r-assimilation or apocope, and i- and u-umlaut (regressive metaphony).
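
To give a rough idea of what such a rule looks like in code, here is a minimal sketch of u-umlaut limited to the stressed `a` → `ǫ` change. The name `apply_u_umlaut` and the `VOWELS` inventory are assumptions made for illustration; they are not taken from `phonemic_rules.py`, and the real module may model syllables quite differently.

```python
# Hypothetical helper for illustration only; not the CLTK implementation.
# Simplified vowel inventory (an assumption, not an exhaustive list).
VOWELS = "aáeéiíoóuúyýæœøǫö"


def apply_u_umlaut(stem: str) -> str:
    """Return the stem with its last vowel changed from 'a' to 'ǫ'.

    Only the stressed a -> ǫ change is modelled. If the last vowel of
    the stem is not 'a', the stem is returned unchanged.
    """
    # Walk backwards to find the last vowel of the stem.
    for index in range(len(stem) - 1, -1, -1):
        if stem[index] in VOWELS:
            if stem[index] == "a":
                return stem[:index] + "ǫ" + stem[index + 1:]
            return stem
    return stem


print(apply_u_umlaut("barn"))  # nominative plural of "barn" is "bǫrn"
```

A table of vowel pairs would be the natural extension for i-umlaut, but that goes beyond this sketch.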

clemsciences added some commits Jan 15, 2019

phonemic_rules.py created so that common sound changes can be process…
…ed here and a basic common prefix searching method is implemented

clemsciences added some commits Feb 1, 2019

Added doctests and dealt with u umlaut by strong masculine nouns with…
… u-stem. extract_common_stem must be improved.
@codecov-io

codecov-io commented Feb 6, 2019

Codecov Report

Merging #868 into master will increase coverage by 0.17%.
The diff coverage is 93.92%.

@@            Coverage Diff             @@
##           master     #868      +/-   ##
==========================================
+ Coverage   89.66%   89.83%   +0.17%     
==========================================
  Files         191      195       +4     
  Lines       12272    12490     +218     
==========================================
+ Hits        11004    11221     +217     
- Misses       1268     1269       +1

| Impacted Files | Coverage Δ |
|---|---|
| cltk/phonology/syllabify.py | 96.65% <100%> (+0.03%) ⬆️ |
| cltk/phonology/old_norse/transcription.py | 95.45% <100%> (+4.71%) ⬆️ |
| cltk/inflection/utils.py | 82.27% <91.66%> (+7.59%) ⬆️ |
| cltk/inflection/old_norse/nouns.py | 93.47% <93.28%> (-6.53%) ⬇️ |
| cltk/inflection/old_norse/phonemic_rules.py | 93.33% <93.33%> (ø) |
| cltk/phonology/utils.py | 80.22% <96.66%> (ø) ⬆️ |
| cltk/tests/test_nlp/test_lemmatize.py | 100% <0%> (ø) ⬆️ |
| cltk/lemmatize/greek/greek.py | 100% <0%> (ø) |
| cltk/lemmatize/backoff.py | 100% <0%> (ø) |

... and 4 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@clemsciences

Member Author

clemsciences commented Feb 6, 2019

I answered in #865

@clemsciences clemsciences added this to In progress in old-norse Feb 7, 2019

@clemsciences

Member Author

clemsciences commented Feb 8, 2019

This pull request contains a new module, cltk/inflection/old_norse/phonemic_rules.py, that I need for another pull request (to come) about verb inflections. So what should become of this one? Should I make one very big pull request for an Old Norse lemmatizer (I don't like this option) or split my code into small, incomprehensible pieces (I don't like that one either :D)?

@Sedictious
Contributor

Sedictious left a comment

The code is clear and while I can't judge the linguistic content of the PR, everything looks great and well-documented. I'd normally agree on maybe breaking this up into smaller pieces, but I think this could still be merged as-is.

self.nucleus = ["u"]


def extract_common_stem(*args):

@Sedictious

Sedictious Feb 13, 2019

Contributor

I really like this idea and I could experiment with something hopefully more accurate. What I have in mind:

  • Strip prefixes and suffixes (if the option is offered)
  • Fairly straightforward brute force: keep a stack of possible phonetic replacements and keep updating both strings, keeping track of the longest common substring.

This doesn't account for inflections (which is half the point anyway), so it should preferably be coupled with stemming.
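
The brute-force idea above could be sketched roughly as follows. This is only one possible reading of it: it rewrites just one of the two forms, and it tracks the longest common prefix rather than an arbitrary substring. The names `common_prefix` and `longest_shared_stem`, the replacement pairs and the `max_variants` cap are all assumptions made for illustration, not part of this pull request.

```python
from typing import List, Tuple


def common_prefix(first: str, second: str) -> str:
    """Return the longest prefix shared by the two strings."""
    length = 0
    for a, b in zip(first, second):
        if a != b:
            break
        length += 1
    return first[:length]


def longest_shared_stem(form_a: str, form_b: str,
                        replacements: List[Tuple[str, str]],
                        max_variants: int = 1000) -> str:
    """Search variants of form_b for the longest prefix shared with form_a.

    Each (old, new) pair is a hypothetical phonetic replacement, for
    example undoing u-umlaut with ("ǫ", "a"). A stack holds the variants
    still to explore; max_variants bounds the search.
    """
    best = common_prefix(form_a, form_b)
    stack = [form_b]
    seen = {form_b}
    while stack and len(seen) <= max_variants:
        current = stack.pop()
        candidate = common_prefix(form_a, current)
        if len(candidate) > len(best):
            best = candidate
        for old, new in replacements:
            # Replace only the first occurrence to create a new variant.
            variant = current.replace(old, new, 1)
            if variant not in seen:
                seen.add(variant)
                stack.append(variant)
    return best


print(longest_shared_stem("barn", "bǫrn", [("ǫ", "a")]))
```

Without the replacement pair, the same two forms share only their first letter, which is the case the replacement stack is meant to handle.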

@clemsciences

clemsciences Feb 14, 2019

Author Member

The idea I had was to align the three forms with each other and see which form is the cheapest to transform into the other two, if we treat addition, deletion and modification as operations that each have a cost.
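
A minimal sketch of that cost-based idea, assuming a unit cost for each addition, deletion and modification (plain Levenshtein distance). The function names `edit_distance` and `cheapest_base_form` are hypothetical and do not come from this pull request; a real implementation would probably weight phonologically motivated changes differently.

```python
from typing import Sequence


def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance with unit costs for insert, delete, substitute."""
    # previous[j] is the distance between the processed part of source
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]
        for j, target_char in enumerate(target, start=1):
            substitution = 0 if source_char == target_char else 1
            current.append(min(
                previous[j] + 1,                 # deletion
                current[j - 1] + 1,              # addition
                previous[j - 1] + substitution,  # modification
            ))
        previous = current
    return previous[-1]


def cheapest_base_form(forms: Sequence[str]) -> str:
    """Return the form with the lowest total cost to reach all other forms."""
    return min(forms,
               key=lambda form: sum(edit_distance(form, other) for other in forms))


# Nominative singular, genitive singular and nominative plural of "dagr".
print(cheapest_base_form(["dagr", "dags", "dagar"]))
```

With unit costs, ties are possible; `min` then returns the first of the tied forms in the order given.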

@@ -483,7 +490,8 @@ class Transcriber:
- firstly, a greedy approximation of the pronunciation of word
- then, use of rules to precise pronunciation of a preprocessed list of transcribed words
"""
def __init__(self, diphthongs_ipa, diphthongs_ipa_class, ipa_class, rules):

def __init__(self, diphthongs_ipa: dict, diphthongs_ipa_class: dict, ipa_class: dict, rules: list):
"""

@Sedictious

Sedictious Feb 13, 2019

Contributor

I agree with the renaming, far more self-explanatory.

@clemsciences

Member Author

clemsciences commented Feb 15, 2019

@kylepjohnson, do you want a new, dedicated CLTK repository for an Old Norse lemmatizer (to be integrated into CLTK core later), or should this pull request go into CLTK core?

@clemsciences

Member Author

clemsciences commented Feb 24, 2019

Hey @kylepjohnson, may I have some feedback?


from enum import Enum, auto
from typing import Union

@kylepjohnson
To decline a noun when you know its nominative singular, genitive singular and nominative plural forms, you can use the following functions.

+--------+-------------------------------+------------------------------+----------------------------+

@kylepjohnson

kylepjohnson Mar 5, 2019

Member

Being narrowed down to just nouns and just for ON, I can better imagine the use for this -- essentially it's an ON noun type.

I will accept this PR; however, I would like to see some work which uses these. For example, what kind of data set can you use -- a lexicon? And do you expect this to help in parsing texts, or for other reasons, too?

@clemsciences

clemsciences Mar 5, 2019

Author Member

Hey @kylepjohnson, this document may summarize what I expect to do with such code.

@kylepjohnson kylepjohnson merged commit 65e6c27 into cltk:master Mar 5, 2019

1 check was pending

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details

kylepjohnson added a commit that referenced this pull request Mar 5, 2019
