SEETM 1.1.0 Release

SEETM (Sinhala-English Equivalent Token Mapper) lets you create equivalent token maps and replace every equivalent token with a single base token. This avoids out-of-vocabulary (OOV) tokens and produces one feature for all equivalent tokens in Sinhala-English code-switching datasets used by Rasa-based conversational AIs.
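
The idea of an equivalent token map can be illustrated with a minimal Python sketch. This is not SEETM's own API or file format: the `TOKEN_MAP` contents and the `build_lookup` and `map_tokens` helpers are hypothetical names chosen for illustration, and the sample spellings are assumed variants of the same word. It only shows how several spellings of a word might collapse into one base token after whitespace tokenization.

```python
from typing import Dict, List

# Hypothetical token map (illustrative only, not SEETM's actual format):
# each base token lists the spellings that should be treated as equivalent.
TOKEN_MAP: Dict[str, List[str]] = {
    "bank": ["බැංකුව", "bankuwa", "benkuwa"],
    "account": ["ගිණුම", "ginuma", "accnt"],
}


def build_lookup(token_map: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert the map so every equivalent token points to its base token."""
    lookup: Dict[str, str] = {}
    for base, equivalents in token_map.items():
        lookup[base.lower()] = base
        for token in equivalents:
            lookup[token.lower()] = base
    return lookup


def map_tokens(text: str, lookup: Dict[str, str]) -> List[str]:
    """Split on whitespace and replace known equivalents with the base token."""
    return [lookup.get(token.lower(), token) for token in text.split()]


if __name__ == "__main__":
    lookup = build_lookup(TOKEN_MAP)
    print(map_tokens("mage bankuwa ginuma", lookup))
    print(map_tokens("මගේ බැංකුව", lookup))
```

Tokens that are not in the map pass through unchanged, so only the listed variants are normalized; a downstream featurizer then sees one token per concept instead of several rare spellings.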

Features

Maps multiple equivalent tokens to a single base token
Fully supports Rasa 2.8.x projects
Provides an easy-to-use CLI
Provides an efficient server-based GUI
Provides a fully functional custom whitespace tokenizer
Fully supports Sinhala in the GUI

What's Cooking?

Mapping suggestions in the SEETM server GUI
Automatically generated mappings

Limitations and Known Issues

The SEETM tokenizer must be added to the Rasa pipeline manually; otherwise the token maps have no effect
IPA-based suggestions may vary slightly depending on the source of the IPA mapping (SEETM uses the CMU dictionary)

Resources and References

CMU Pronouncing Dictionary
eng-to-ipa pip package (GitHub)

📒 Docs: https://seetm.github.io
📦 PyPI: https://pypi.org/project/seetn/1.1.0/
🪵 Full Changelog: https://github.com/SEETM-NLP/seetm/blob/main/CHANGELOG.md

