Documentation and discoverability of additional algorithms #19

bbqsrc · 2020-03-19T10:44:58Z

We have two algorithms at play in divvunspell that don't exist in hfst-ospell:

Case handling
Penalty weighting: a penalty when the first letter differs, a penalty when the last letter differs, and Damerau–Levenshtein distance for the middle letters
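
For readers who want a concrete picture of the penalty weighting described above, here is a minimal sketch. It is not taken from the divvunspell source: the function names, the three penalty constants, and the choice of the optimal string alignment variant of Damerau–Levenshtein are all assumptions made for illustration, and it compares Unicode code points rather than grapheme clusters.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


# Hypothetical values, chosen only for illustration.
FIRST_LETTER_PENALTY = 10.0
LAST_LETTER_PENALTY = 10.0
MIDDLE_EDIT_PENALTY = 5.0


def penalized_weight(word: str, suggestion: str, base_weight: float) -> float:
    """Add first-letter, last-letter and middle-letter penalties to a base weight."""
    penalty = 0.0
    if word[:1] != suggestion[:1]:
        penalty += FIRST_LETTER_PENALTY
    if word[-1:] != suggestion[-1:]:
        penalty += LAST_LETTER_PENALTY
    # Middle letters: everything between the first and last code point.
    penalty += MIDDLE_EDIT_PENALTY * osa_distance(word[1:-1], suggestion[1:-1])
    return base_weight + penalty


print(penalized_weight("kera", "keza", 10.0))
print(penalized_weight("kera", "keras", 10.0))
```

Under these assumed constants, a suggestion that changes only a middle letter gains one middle-edit penalty, while a suggestion that appends a character gains both the last-letter penalty and one middle-edit penalty, because the old last letter becomes part of the middle.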

Things to do to make this good:

Document somewhere sane how the algorithms behave
Add some information to --help either with a link or with the information itself
In the suggestion output for divvunspell, show the penalties, and the unmodified weights, as well as the modified weights
Document how to add the weight information to BHFST files so it can be controlled by the linguist
If possible, add a flag for disabling the penalty weighting algorithm (--no-case-handling already does this to some extent, but the two should be separated into different flags)


nlhowell · 2020-05-11T15:56:44Z

Just a ping: this is important for me; I have orthographic corrections that
specifically apply to the beginnings and ends of words; these are given low (or
even zero!) weight.

hfst-ospell makes the correct suggestions, but divvunspell overrides some of
these with much less appropriate corrections. It would be great if I could add
some information to the .bhfst to modify this.

Here's an example.

Input: кера (final glyph is "cyrillic a")
Correct spelling: кера̄ (final glyph is "cyrillic a" + "combining macron")

Suggested spellings (hfst-ospell):

$ echo 'кера' | hfst-ospell tsez.zhfst -S | head
"кера" is NOT in the lexicon:
Corrections for "кера":
кера̄    1.000000
кека    10.000000
кекра    10.000000
кеза    10.000000
кура    10.000000
кераз    10.000000
кеца    10.000000
кецра    10.000000

Suggested spellings (divvunspell):

$ echo 'кера' | divvunspell -b ddo.bhfst -s | head
Reading from stdin...
Input: кера		[INCORRECT]
кеза		15
кека		15
кекра		15
кеца		15
кецра		15
кура		15
кера̄		16
кераз		25
керо		25
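
To make the difference between the two listings easier to see, the following sketch subtracts the hfst-ospell weight from the divvunspell weight for each suggestion that appears in both. The numbers are copied from the outputs above. It assumes that tsez.zhfst and ddo.bhfst carry the same underlying weights, which the thread does not state, and the variable and function names are made up for this illustration. керо is skipped because it only appears in the divvunspell listing (both listings are truncated by head).

```python
# Weights copied from the two listings above; "\u0304" is the combining macron.
hfst_ospell = {
    "кера\u0304": 1.0,
    "кека": 10.0,
    "кекра": 10.0,
    "кеза": 10.0,
    "кура": 10.0,
    "кераз": 10.0,
    "кеца": 10.0,
    "кецра": 10.0,
}
divvunspell = {
    "кеза": 15.0,
    "кека": 15.0,
    "кекра": 15.0,
    "кеца": 15.0,
    "кецра": 15.0,
    "кура": 15.0,
    "кера\u0304": 16.0,
    "кераз": 25.0,
    "керо": 25.0,
}

# Difference per suggestion, for suggestions present in both listings.
deltas = {
    word: divvunspell[word] - weight
    for word, weight in hfst_ospell.items()
    if word in divvunspell
}


def rank(weights: dict, word: str) -> int:
    """1-based position by ascending weight; ties keep listing order."""
    ordered = sorted(weights, key=weights.get)
    return ordered.index(word) + 1


for word, delta in deltas.items():
    print(f"{word}\t+{delta:g}")

target = "кера\u0304"
print(f"rank of {target}: {rank(hfst_ospell, target)} -> {rank(divvunspell, target)}")
```

On these numbers, the suggestions that change only a middle letter each gain 5, while кера̄ and кераз, which both change the end of the word, each gain 15. That is enough to move the correct spelling from first place to seventh.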

bbqsrc added the enhancement label Mar 19, 2020

bbqsrc self-assigned this Mar 19, 2020

bbqsrc added this to the 1.0 milestone Mar 19, 2020

snomos mentioned this issue Nov 3, 2021

Multiple acceptors and error models #25

Open

9 tasks

Trondtr mentioned this issue May 29, 2024

divvunspell having problems with Cyrillic capital letters (it seems) #39

Closed
