quntoken

New Hungarian tokenizer based on quex and huntoken. This tool is also integrated into the e-magyar language processing system under the name emToken.

Requirements

  • OS: Linux x86-64
  • Python 2.7 as the default python
  • Python 3.5+
  • g++ = 5
  • pytest

Install

git clone https://github.com/dlt-rilmta/quntoken.git
cd quntoken
make prereq
make all

Usage

./quntoken [OPTIONS] [-f FORMAT] FILE

Options

  • -d: Rejoin words that are hyphenated across line ends.
  • -f: Set the output format. Valid formats: xml, json, vert. Default format: xml.
  • -V: Display version number and exit.
  • -h: Display help and exit.
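
To call the tokenizer from a script, the options above can be assembled into a command line. The following is a minimal sketch, not part of quntoken itself: the helper names build_command and tokenize are hypothetical, and the sketch assumes that the binary sits at ./quntoken, that it writes its result to standard output, and that the output is UTF-8 encoded. None of these is stated in this README.

```python
import subprocess

# Output formats listed for the -f option.
VALID_FORMATS = ("xml", "json", "vert")


def build_command(input_path, fmt="xml", dehyphenate=False, binary="./quntoken"):
    """Return the argument list for one quntoken call (hypothetical helper)."""
    if fmt not in VALID_FORMATS:
        raise ValueError("unsupported format: {!r}".format(fmt))
    cmd = [binary]
    if dehyphenate:
        cmd.append("-d")  # rejoin words split at line ends
    cmd += ["-f", fmt, input_path]
    return cmd


def tokenize(input_path, **kwargs):
    """Run quntoken on a file and return what it prints.

    Assumption: the result goes to stdout as UTF-8 text.
    """
    result = subprocess.run(
        build_command(input_path, **kwargs),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.decode("utf-8")
```

For example, build_command("input.txt", fmt="json", dehyphenate=True) produces the same arguments as typing ./quntoken -d -f json input.txt in a shell. Validating the format before starting the process keeps a typo from reaching the binary.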