HKIME

HKIME is an ongoing project to build an Input Method Editor (IME) for Cantonese. Progress is documented in the Jupyter notebooks in the notebooks directory.

Our latest milestone is a simple bigram statistical language model, without smoothing, trained on a corpus scraped from Cantonese Wikipedia. We are currently working on training a neural-network (NN) language model.
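To make the idea of a bigram model without smoothing concrete, here is a minimal sketch. It is not the project's actual code: the function names, the choice of single characters as tokens, and the two-sentence toy corpus are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence-boundary markers


def train_bigram(sentences):
    """Count character bigrams; tokens are single characters (an assumption)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = [START] + list(sentence) + [END]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts


def bigram_prob(counts, prev, cur):
    """Maximum-likelihood estimate P(cur | prev) with no smoothing."""
    following = counts.get(prev)
    if not following:
        return 0.0  # unseen history: no probability mass at all
    return following[cur] / sum(following.values())


def sentence_prob(counts, sentence):
    """Product of bigram probabilities over the whole sentence."""
    tokens = [START] + list(sentence) + [END]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= bigram_prob(counts, prev, cur)
    return prob


# Toy corpus standing in for the scraped Wikipedia text.
model = train_bigram(["我哋食飯", "我哋飲茶"])
seen = sentence_prob(model, "我哋食飯")
unseen = sentence_prob(model, "我哋食茶")
```

Because there is no smoothing, any sentence containing a bigram that never occurred in the corpus (here 食 followed by 茶) receives probability zero, even though each character was seen on its own.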

Relevant papers and sources that have greatly aided our progress are listed in resources.md and sources.md.

Core Components of the Project

All of these are currently in progress:

Fuzzy Jyutping / Processing different romanizations
Jyutping Segmentation
Jyut2Char: Jyutping to Character Conversion
Scraping / Corpus Generation
Neural Net Language Model
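
As an illustration of what the Jyutping segmentation component has to solve, the sketch below splits an unspaced, toneless Jyutping string into syllables. It is a toy under stated assumptions, not the project's implementation: the syllable inventory is a tiny hand-picked subset, tones are ignored, and the function name segment is hypothetical.

```python
from functools import lru_cache

# Deliberately tiny, hand-picked subset of toneless Jyutping syllables.
# A real inventory would be far larger; this set is an assumption.
SYLLABLES = frozenset({"nei", "hou", "ngo", "dei", "sik", "faan", "m", "goi"})


def segment(text, syllables=SYLLABLES):
    """Return every way to split `text` into known syllables."""
    max_len = max(len(s) for s in syllables)

    @lru_cache(maxsize=None)
    def from_index(i):
        # All segmentations of text[i:], as a tuple of tuples.
        if i == len(text):
            return ((),)
        results = []
        for j in range(i + 1, min(len(text), i + max_len) + 1):
            piece = text[i:j]
            if piece in syllables:
                for rest in from_index(j):
                    results.append((piece,) + rest)
        return tuple(results)

    return [list(seg) for seg in from_index(0)]


greeting = segment("neihou")
thanks = segment("mgoi")
invalid = segment("xyz")
```

The function returns all valid splits rather than a single one, since an input can be ambiguous; choosing among candidates is the kind of decision a language model could make downstream.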
