Is this possible? Scan dic file and obtain all forms of all files #1

MonsterMMORPG · 2017-03-02T22:47:27Z

What I want is simple: I would like to obtain all the words that can be derived from a given dictionary entry.

For example, take this entry in the us.dic file:

make/UAGS

I want to obtain every word that can be generated from this word/suffix combination, e.g. made, making, makes, etc.

aarondandy · 2017-03-06T06:08:29Z

This is possible, but really hard! The easiest thing is what I think you want though, to just slap some affixes onto some roots. This is not totally correct but it should get you somewhere. There are loads of corner cases that I don't understand and this example does not even touch on compound words. Hope this helps: https://gist.github.com/aarondandy/aaa622afeeb0cb86b0d4efe697c23be5
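As a rough illustration of the "slap some affixes onto some roots" idea, here is a minimal Python sketch. It is not the code from the gist above and not how the library does it. The sample rules in `SAMPLE_AFF` are written in Hunspell .aff suffix syntax purely for illustration, and the helper names `parse_suffix_rules` and `expand` are hypothetical. It assumes single-character flags, handles suffixes only, and ignores prefixes, compounds, and continuation flags; flags with no sample rule (U and A here) are skipped. An irregular form such as "made" cannot come out of suffix rules like these and would typically need its own dictionary entry.

```python
import re
from collections import defaultdict

# Illustrative suffix rules in Hunspell .aff syntax (an assumption for this
# sketch, not a complete affix file).
SAMPLE_AFF = """\
SFX G Y 2
SFX G e ing e
SFX G 0 ing [^e]
SFX S Y 4
SFX S y ies [^aeiou]y
SFX S 0 s [aeiou]y
SFX S 0 es [sxzh]
SFX S 0 s [^sxzhy]
"""


def parse_suffix_rules(aff_text):
    """Map each flag to a list of (strip, add, condition_regex) tuples."""
    rules = defaultdict(list)
    for line in aff_text.splitlines():
        parts = line.split()
        # Rule lines have at least 5 fields; header lines ("SFX G Y 2") have 4.
        if len(parts) < 5 or parts[0] != "SFX":
            continue
        _, flag, strip, add, cond = parts[:5]
        strip = "" if strip == "0" else strip   # "0" means strip nothing
        add = add.split("/")[0]                 # drop continuation flags
        add = "" if add == "0" else add         # "0" means add nothing
        # The condition is matched against the end of the root.
        rules[flag].append((strip, add, re.compile("(?:" + cond + ")$")))
    return rules


def expand(entry, rules):
    """Expand a .dic entry such as 'make/UAGS' into root plus suffixed forms."""
    root, _, flags = entry.partition("/")
    forms = {root}
    for flag in flags:  # assumes one character per flag
        for strip, add, cond_re in rules.get(flag, []):
            if not root.endswith(strip) or not cond_re.search(root):
                continue
            stem = root[: len(root) - len(strip)]
            forms.add(stem + add)
    return sorted(forms)


print(expand("make/UAGS", parse_suffix_rules(SAMPLE_AFF)))
# ['make', 'makes', 'making']
```

Because Python strings are Unicode, the same approach should carry over to Turkish or Arabic roots as long as the .aff and .dic files are decoded with the encoding they were actually saved in.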

MonsterMMORPG · 2017-03-07T00:28:41Z

Thanks for this code. But I am more interested in a fully working solution, especially one that works with a UTF-8 dictionary. Currently there is the unmunch command, which works on the English dataset, but I need another solution for Arabic and Turkish :D There is also a bash script, but it only prints to the screen and requires a keyword to work.


rianjs · 2017-05-27T13:38:15Z

I worked around this by using Hunspell's unmunch command which will generate all forms of all words. This is probably quicker/easier for a one-off job. (And it enables really fast comparisons against a HashSet<string>--at least an order of magnitude faster than Hunspell itself.)
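A minimal sketch of that lookup idea, in Python rather than C# (a Python `set` plays the role of `HashSet<string>`). It assumes the expanded forms have already been written out one word per line; the in-memory sample below stands in for that file, and `load_word_set` and `is_known` are hypothetical helper names.

```python
import io


def load_word_set(lines):
    """Build a set from an iterable of lines, one word form per line."""
    return {line.strip() for line in lines if line.strip()}


# Stand-in for a file of expanded word forms; with a real file this would be
# something like open(path, encoding="utf-8") instead.
sample_output = io.StringIO("make\nmakes\nmaking\n")
words = load_word_set(sample_output)


def is_known(word):
    """Constant-time membership test against the precomputed forms."""
    return word in words


print(is_known("making"), is_known("makking"))
```

The trade-off is memory: every generated form is held in the set, which can get large for heavily inflected languages.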

MonsterMMORPG · 2017-05-27T20:00:47Z

unmunch doesn't work for UTF-8, e.g. Arabic.


rianjs · 2017-05-28T13:24:22Z

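> unmunch doesn't work for UTF-8

unmunch may not work for non-ASCII characters, or non-Latin characters, or unusual character encodings, but it absolutely works on UTF-8 files. You may want to read up on Unicode and character encodings: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/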
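To make the distinction between characters and encodings concrete, here is a small Python sketch. It says nothing about how unmunch itself behaves. It only shows that the same word can be representable in one encoding and not in another; the sample words and the helper `can_encode` are chosen for illustration.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


turkish = "kapı"   # contains U+0131, dotless i
arabic = "كتاب"

for word in (turkish, arabic):
    for encoding in ("ascii", "iso-8859-1", "iso-8859-9", "utf-8"):
        print(word, encoding, can_encode(word, encoding))

# The same four characters take a different number of bytes per encoding.
print(len(turkish.encode("utf-8")), len(turkish.encode("iso-8859-9")))
```

UTF-8 can represent both words. The Turkish word also fits in ISO-8859-9 but not in ISO-8859-1, and the Arabic word fits in neither, so "non-Latin characters" and "UTF-8" are separate questions.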

MonsterMMORPG · 2017-05-28T13:40:54Z

OK, let's say non-Latin characters, then.


aarondandy added the question label Mar 6, 2017

aarondandy added the enhancement label Jul 20, 2017

aarondandy mentioned this issue Aug 14, 2023

Get words that start with X #79

Open
