suggest output should be in VISL CG3 stream format #29

@flammie

Like all the other steps in the typical grammarchecker pipeline, the last suggest step should default to the linguist-readable VISL CG3 stream format, i.e. the same output that divvun-suggest produces:

$ cat tools/grammarcheckers/modes/smegram.mode
#!/bin/sh

hfst-tokenise -g '/Users/tpi006/github/giellalt/lang-sme/tools/grammarcheckers/tokeniser-gramcheck-gt-desc.pmhfst' \
 | divvun-blanktag '/Users/tpi006/github/giellalt/lang-sme/tools/grammarcheckers/analyser-gt-whitespace.hfst' \
 | vislcg3 -g '/Users/tpi006/github/giellalt/lang-sme/tools/grammarcheckers/valency.bin' \
 | vislcg3 -g '/Users/tpi006/github/giellalt/lang-sme/tools/grammarcheckers/mwe-dis.bin' \
 | cg-mwesplit \
 | divvun-blanktag '/Users/tpi006/github/giellalt/lang-sme/tools/grammarcheckers/analyser-gt-errorwhitespace.hfst' \
 | divvun-cgspell -n 10 -b 15.000000 -w 5000.000000 -u 0.400000 -l '/Users/tpi006/github/giellalt/lang-sme/tools/grammarcheckers/acceptor.default.hfst' -m '/Users/tpi006/github/giellalt/lang-sme/tools/grammarcheckers/errmodel.default.hfst' \
 | vislcg3 -g '/Users/tpi006/github/giellalt/lang-sme/tools/grammarcheckers/valency-postspell.bin' \
 | vislcg3 -g '/Users/tpi006/github/giellalt/lang-sme/tools/grammarcheckers/grc-disambiguator.bin' \
 | vislcg3 -g '/Users/tpi006/github/giellalt/lang-sme/tools/grammarcheckers/spellchecker.bin' \
 | vislcg3 -g '/Users/tpi006/github/giellalt/lang-sme/tools/grammarcheckers/grammarchecker.bin' \
 | divvun-suggest -g '/Users/tpi006/github/giellalt/lang-sme/tools/grammarcheckers/generator-gramcheck-gt-norm.hfstol' -m '/Users/tpi006/github/giellalt/lang-sme/tools/grammarcheckers/errors.xml' -l se
$ echo 'cuolbma' | tools/grammarcheckers/modes/smegram.mode
"<cuolbma>"		cuolbma	→  čuolbma	→  čuolbmá	→  čuolmma	→  čuolbmi	→  čulbme	→  sulbmo	→  čuolbman	→  suolbmu	→  suolmmo	→  čuolbmal (msg: Čállinmeattáhus --- Hápmi ii leat sátnelisttus.)
	"čuolbma" N Sem/Dummytag Sg Nom <W:26.9993> <WA:11.9993> <spelled> "čuolbma"S SUGGESTWF &typo
typo
	"čuolbmat" V TV Ind Prs Sg3 <W:35.3019> <WA:15.3019> <spelled> "čuolbmá"S SUGGESTWF &typo
typo
	"čuolbmái" A CmpN/SgN CmpN/PlG Sem/Hum Sg Acc <W:35.3019> <WA:15.3019> <spelled> "čuolbmá"S SUGGESTWF &typo
typo
	"čuolbmái" A CmpN/SgN CmpN/PlG Sem/Hum Sg Gen <W:35.3019> <WA:15.3019> <spelled> "čuolbmá"S SUGGESTWF &typo
typo
	"čuolbmat" V TV Imprt ConNeg <W:41.6992> <WA:11.6992> <spelled> "čuolmma"S SUGGESTWF &typo
typo
	"čuolbmat" V TV Imprt Sg2 <W:41.6992> <WA:11.6992> <spelled> "čuolmma"S SUGGESTWF &typo
typo
	"čuolbmat" V TV Ind Prs ConNeg <W:41.6992> <WA:11.6992> <spelled> "čuolmma"S SUGGESTWF &typo
typo
	"čuolbma" N Sem/Dummytag Sg Gen <W:41.6992> <WA:11.6992> <spelled> "čuolmma"S SUGGESTWF &typo
typo
	"čuolbma" N Sem/Dummytag Sg Acc <W:41.6992> <WA:11.6992> <spelled> "čuolmma"S SUGGESTWF &typo
typo
	"čuolbmat" V TV Imprt Du2 <W:45.3019> <WA:15.3019> <spelled> "čuolbmi"S SUGGESTWF &typo
typo
	"čuolbmat" V TV PrsPrc CmpNP/None <W:45.3019> <WA:15.3019> <spelled> "čuolbmi"S SUGGESTWF &typo
typo
	"čulbmet" V TV Ind Prs ConNeg <W:55.3019> <WA:15.3019> <spelled> "čulbme"S SUGGESTWF &typo
typo
	"čulbmet" V TV Ind Prs Sg3 <W:55.3019> <WA:15.3019> <spelled> "čulbme"S SUGGESTWF &typo
typo
	"čuolbmat" V TV Ind Prt Pl3 <W:55.3019> <WA:15.3019> <spelled> "čulbme"S SUGGESTWF &typo
typo
	"čuolbmat" V TV Ind Prs Du1 <W:55.3019> <WA:15.3019> <spelled> "čulbme"S SUGGESTWF &typo
typo
	"suolbmut" V IV Ind Prt Pl3 <W:55.3019> <WA:15.3019> <spelled> "sulbmo"S SUGGESTWF &typo
typo
	"čuolbmat" V TV Actio Gen <W:58.0491> <WA:13.0491> <spelled> "čuolbman"S SUGGESTWF &typo
typo
	"čuolbmat" V TV Actio Nom <W:58.0491> <WA:13.0491> <spelled> "čuolbman"S SUGGESTWF &typo
typo
	"čuolbmat" V TV Ind Prt ConNeg <W:58.0491> <WA:13.0491> <spelled> "čuolbman"S SUGGESTWF &typo
typo
	"čuolbmat" V TV PrfPrc <W:58.0491> <WA:13.0491> <spelled> "čuolbman"S SUGGESTWF &typo
typo
	"čuolbma" N Sem/Dummytag Ess <W:58.0491> <WA:13.0491> <spelled> "čuolbman"S SUGGESTWF &typo
typo
	"suolbmut" V IV Ind Prs Sg3 <W:60.3019> <WA:15.3019> <spelled> "suolbmu"S SUGGESTWF &typo
typo
	"suolbmut" V IV PrsPrc CmpNP/None <W:60.3019> <WA:15.3019> <spelled> "suolbmu"S SUGGESTWF &typo
typo
	"suolbmut" V IV Ind Prs ConNeg <W:60.3019> <WA:15.3019> <spelled> "suolmmo"S SUGGESTWF &typo
typo
	"čuolbmalit" V TV Gram/3syll Imprt ConNeg <W:60.3019> <WA:15.3019> <spelled> "čuolbmal"S SUGGESTWF &typo
typo
	"čuolbmalit" V TV Gram/3syll Imprt Sg2 <W:60.3019> <WA:15.3019> <spelled> "čuolbmal"S SUGGESTWF &typo
typo
	"čuolbmalit" V TV Gram/3syll Ind Prs ConNeg <W:60.3019> <WA:15.3019> <spelled> "čuolbmal"S SUGGESTWF &typo
typo
:\n

Currently divvun-runtime outputs JSON instead:

$ echo 'cuolbma' | divvun-runtime run -p tools/grammarcheckers/bundle.drb
[
  {
    "form": "cuolbma",
    "beg": 0,
    "end": 7,
    "err": "typo",
    "msg": [
      "Spelling error",
      "Not in the dictionary"
    ],
    "rep": [
      "čuolbma",
      "čuolbmá",
      "čuolmma",
      "čuolbmi",
      "čuolbman",
      "sulbmo",
      "suolbmu",
      "čulbme",
      "čuolbmal",
      "suolmmo"
    ]
  }
]

Having programmer-readable formats, e.g. JSON, in addition would also be a cool option for any step, though.
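
To illustrate the gap between the two outputs, here is a minimal Python sketch that turns the JSON error objects shown above into something resembling the first summary line of the divvun-suggest output. It is not part of divvun-runtime; the function names `summary_line` and `render` are made up for this example, and it assumes the JSON keys `form`, `msg` and `rep` exactly as they appear in the output above. The individual readings (lemma, tags, weights) are not present in the JSON, so this cannot reproduce the full CG3 stream, which is the point of this request.

```python
import json


def summary_line(err: dict) -> str:
    """Render one JSON error object as a divvun-suggest-like summary line.

    Assumes the keys "form", "msg" and "rep" as seen in the JSON output above.
    """
    # One tab-separated arrow per suggested replacement, in the order given.
    arrows = "".join(f"\t→  {rep}" for rep in err["rep"])
    # Title and description of the message, joined the way divvun-suggest prints them.
    msg = " --- ".join(err["msg"])
    return f'"<{err["form"]}>"\t\t{err["form"]}{arrows} (msg: {msg})'


def render(json_text: str) -> str:
    """Render a JSON list of error objects, one summary line per error."""
    return "\n".join(summary_line(err) for err in json.loads(json_text))
```

Feeding the JSON from the divvun-runtime command above into `render` would give one line per error, but none of the cohort readings that the CG3 stream carries.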