Add gender to singular word forms in German #58
- Extended `ParsingContext` to temporarily hold the gender index (`Genus=`, `Genus 1=`, `Genus 2=`, `Genus 3=`, `Genus 4=`).
- Added license to the `ParsingContextText`.
@chmeyer By the way, there was a bug in gender parsing. The forms
- Moved gender parsing to a dedicated class.
- Fixed a bug where gender strings `mn` and `mnf` were not supported.
- Added logging for unrecognized gender strings.
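A minimal sketch of what such a dedicated gender parser could look like. The `Gender` enum and the `GenderTextParser` class are hypothetical stand-ins, not the actual JWKTL types, and reading a combined string such as `mn` as one gender per character is an assumption made for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-in for the library's gender enum.
enum Gender { MASCULINE, FEMININE, NEUTER }

// Hypothetical helper; not the actual JWKTL class.
final class GenderTextParser {
    private static final Logger LOG =
            Logger.getLogger(GenderTextParser.class.getName());

    private GenderTextParser() {}

    // Assumption: each character of the string names one gender,
    // so "mn" yields [MASCULINE, NEUTER]. A null or blank input yields
    // an empty list without a warning.
    static List<Gender> parse(String text) {
        if (text == null) {
            return Collections.emptyList();
        }
        List<Gender> genders = new ArrayList<>();
        for (char c : text.trim().toCharArray()) {
            switch (c) {
                case 'm': genders.add(Gender.MASCULINE); break;
                case 'f': genders.add(Gender.FEMININE); break;
                case 'n': genders.add(Gender.NEUTER); break;
                default:
                    // Report the problem instead of silently dropping it.
                    LOG.warning("Unrecognized gender string: '" + text + "'");
                    return Collections.emptyList();
            }
        }
        return genders;
    }
}
```

Logging the full input rather than only the offending character should make it easier to find the Wiktionary article that needs correcting.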
Special cases with
Fine in this issue and PR, I guess.
@chmeyer Another question. While working on #58 I found that some words do not have a gender for several reasons, such as having only a plural form. Wiktionary authors use genus values I think it would be good to introduce something like What do you think? This would probably not be backwards compatible.
Theoretically, this is not a property related to gender, so I'm not in favor of the Currently, this entry-related property is encoded in Thus, I suggest changing the part of speech of nouns to SINGULARE_TANTUM if there are only singular forms and to PLURALE_TANTUM if there are only plural forms based on the word-form-parsing component. Mind that the part of speech property is also set at other code locations, so we need to make sure in the tests that it won't get overridden. (Or if this yields chaos, we can think about separating out this morphological property into a separate attribute. In the long run, this would be the cleanest option.)
Ok, I see. The reason why I would like to do this is to ensure the completeness of parsing. At the moment "unknown" values are simply mapped to I am interested in fixing both cases, either by fixing the JWKTL code or by correcting articles in the Wiktionary. But to do this, I need to have these problems reported first. For this I'd need to distinguish I'll think about a different solution. Maybe introduce a lower-level
Got it. How about
I think having a special parse-time enum would be better. Here's a suggestion. I'll file an issue concerning What do you think?
- Added a gender property to the Wiktionary word form
OK, using the enum at parsing time is fine.
- Corrected the URLs in the integration test
- Moved noun table extraction to a separate class.
- `DEWiktionaryEntryParserTest` now sets the file name as the page title.
- Moved `(Einzahl)` and `(Mehrzahl)` to their own patterns.
- Parsing `Genus` using a pattern as well.
- Added tests for `Singular?`, `Singular i*`/`Singular i**`, `Singular* i`.
- Added a test for Fote to check the label `Singular`
- Moved index extraction from the matcher to a utility class
- Added a unit test for the pattern/matcher utility class
- Added a unit test for `Singular 1` referencing `Genus`
- Added a unit test for the pattern/matcher utility class
- Reworked `Genus` processing
- Moved handling of `Genus`, `Singular`, `Einzahl`, `Plural`, `Mehrzahl` into separate methods
- Added Gams test where `Singular` referred to `Genus 1`
- Refactored noun table extractor and moved `Genus`, `Singular`, `Einzahl`, `Plural`, `Mehrzahl` to separate handler classes.
- Moved word form case handling into separate classes
- Added forgotten license header
- Introduced `ITemplateParameterHandler`
- Added forgotten license header
- Added unit tests for genus and number handlers
- Fixing the test which was failing due to " " at the end of the file name
- Added tests for case handlers
- Using index `1` for labels without index
- Added tests to check setting and getting genus in noun table handler
- Added Javadocs
- Applied suggestion from code review to case handler tests
- Renamed `ITemplateParameterHandler` to reflect its specifics to `WiktionaryWordForm`
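The pattern-based label handling and the "index `1` for labels without index" rule from the commits above can be sketched roughly as follows. `LabelIndex` and both patterns are hypothetical and only cover the `Genus N` and `Singular N`, `Singular N*`, `Singular N**` label shapes; the actual patterns in the pull request may differ (for example for `Singular* i`):

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical utility; the names and patterns are assumptions.
final class LabelIndex {
    // "Genus" optionally followed by an index 1..4.
    private static final Pattern GENUS =
            Pattern.compile("Genus(?:\\s+([1-4]))?");
    // "Singular" optionally followed by an index 1..4 and up to two stars.
    private static final Pattern SINGULAR =
            Pattern.compile("Singular(?:\\s+([1-4]))?\\*{0,2}");

    private LabelIndex() {}

    static OptionalInt genusIndex(String label) {
        return extract(GENUS, label);
    }

    static OptionalInt singularIndex(String label) {
        return extract(SINGULAR, label);
    }

    // Returns the index captured in group 1, or 1 when the label matches
    // but carries no index. Returns empty when the label does not match.
    private static OptionalInt extract(Pattern pattern, String label) {
        Matcher matcher = pattern.matcher(label.trim());
        if (!matcher.matches()) {
            return OptionalInt.empty();
        }
        String digits = matcher.group(1);
        return OptionalInt.of(digits == null ? 1 : Integer.parseInt(digits));
    }
}
```

Keeping the default-index rule in one place means `Genus` and `Genus 1` are treated identically by every handler that uses the utility.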
Please see the discussion in #57.
- `GrammaticalGender getGender()` to the `IWiktionaryWordForm`.
- `Genus`, `Genus 1`, `Genus 2`, `Genus 3`, `Genus 4`
- `m`, `n` or `f`, log a warning.
- `Singular`
- `Singular 1`, `Singular 1*`, `Singular 1**`
- `Singular 2`, `Singular 2*`, `Singular 2**`
- `Singular 3`, `Singular 3*`, `Singular 3**`
- `Singular 4`, `Singular 4*`, `Singular 4**`
- `null` as gender to the word form.
- `null` as gender to the word form.
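As a rough illustration of the lookup described here, the sketch below assumes that a `Singular N` form takes its gender from the `Genus N` parameter and that a missing entry yields `null`. `NounGender` and `GenusTable` are hypothetical names, not part of JWKTL:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the library's gender enum.
enum NounGender { MASCULINE, FEMININE, NEUTER }

// Hypothetical holder for the genus values seen while parsing one noun table.
final class GenusTable {
    private final Map<Integer, NounGender> genusByIndex = new HashMap<>();

    // Called when a "Genus N" parameter is parsed ("Genus" counts as index 1).
    void setGenus(int index, NounGender gender) {
        genusByIndex.put(index, gender);
    }

    // Called when a "Singular N" form is parsed. Assumption: the form takes
    // the gender stored under the same index, or null if none was stored.
    NounGender genderForSingular(int index) {
        return genusByIndex.get(index);
    }
}
```

For example, after `setGenus(1, NounGender.FEMININE)`, a form labelled `Singular 1` would receive `FEMININE`, while `Singular 2` would receive `null`.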