Tokenisation Benchmark Visualization

This tool collects tokenisation benchmarks for Thai. We aim to include benchmarks for all major algorithms. It also provides features for comparing algorithms and investigating the cases where each one fails.

Related Works

Datasets

| Name | Link | Description |
| --- | --- | --- |
| BEST Validation Set | Link | A validation set that I randomly selected from BEST's training set. |
| Thai National Historical Corpus (TNHC) | Link | Classical Thai literature texts. Some preprocessing steps were applied. |
| Orchid | Link | Thai academic articles. Some preprocessing steps were applied. |
| กลอนตากลม | Link | By คมเพชร เชิงกลอน ภาค สายลม |

How to obtain the benchmark result?

The data sources of this tool are artifacts produced by the Tokenisation Benchmark for Thai script, i.e. eval-details-*.json files. Please open an issue if you would like your benchmark to be included in this tool.
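
As a rough illustration of the naming convention above, the sketch below collects files matching eval-details-*.json from a directory. The helper names and the example directory are hypothetical and are not taken from this repository. The contents of the JSON files are not inspected because their schema is not described here.

```javascript
// Sketch only: helper names and the example directory are hypothetical.
const fs = require('fs');
const path = require('path');

// Mirrors the glob eval-details-*.json as a regular expression.
const EVAL_DETAILS_PATTERN = /^eval-details-.*\.json$/;

// Returns true when a file name follows the eval-details-*.json convention.
function isEvalDetailsFile(fileName) {
  return EVAL_DETAILS_PATTERN.test(fileName);
}

// Returns the sorted paths of all matching files directly inside `dir`.
function listEvalDetailsFiles(dir) {
  return fs
    .readdirSync(dir)
    .filter(isEvalDetailsFile)
    .sort()
    .map((name) => path.join(dir, name));
}

// Example (hypothetical directory):
//   listEvalDetailsFiles('./benchmark-artifacts');
```

Only the top level of the directory is scanned, which keeps the sketch short. A nested layout would need a recursive walk.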

Requirement

  • Node.js v11.4.0
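
To check the installed runtime against this requirement, a small script along the following lines may help. It treats v11.4.0 as a minimum, which is an assumption: the requirement lists the version without saying whether newer releases work. The function names are illustrative only.

```javascript
// Sketch only: treating v11.4.0 as a minimum version is an assumption.
const REQUIRED_VERSION = '11.4.0';

// Parses "v11.4.0" or "11.4.0" into [11, 4, 0].
function parseVersion(version) {
  return version.replace(/^v/, '').split('.').map(Number);
}

// Returns true when `current` is the same as or newer than `required`.
function isAtLeast(current, required) {
  const a = parseVersion(current);
  const b = parseVersion(required);
  for (let i = 0; i < 3; i += 1) {
    if (a[i] !== b[i]) return a[i] > b[i];
  }
  return true;
}

// process.version is the running Node.js version, e.g. "v11.4.0".
if (!isAtLeast(process.version, REQUIRED_VERSION)) {
  console.warn(`Node.js ${process.version} is older than v${REQUIRED_VERSION}`);
}
```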

Development

This project is built with Gatsby. Start a development server with the command below:

$ npm run develop

For production deployment, use scripts/deploy.sh, which prepares a production build and synchronises it with GitHub Pages.

$ ./scripts/deploy.sh

Acknowledgements
