KS X 1026 Python

A Python implementation of KS X 1026-1.

KS X 1026-1

KS X 1026-1 is a Korean standard that serves as a Hangul processing guide for information interchange. More information is available here.

Installation

KS X 1026 Python is available via PyPI:

pip install ksx1026

or via setup.py:

python setup.py install

Normalizations

Hangul Decomposition

Returns a Johab Modern Hangul Syllable Block for the given Wanseong Modern Hangul Syllable Block.

char S: a single Hangul syllable character. Any other input is returned unchanged.

>>> from ksx1026.normalization import decomposeHangul
>>> c = "\uAC01"
>>> d = decomposeHangul(c)
>>> print(d.encode('raw_unicode_escape'))
b'\\u1100\\u1161\\u11a8'
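The mapping from a precomposed (Wanseong) syllable to its conjoining (Johab) letters follows the standard Unicode Hangul syllable arithmetic. The following is a minimal standalone sketch of that arithmetic, not the code of this package; the function name `decompose_syllable` and the constant names are made up for illustration.

```python
# Standard Unicode Hangul syllable constants.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
S_COUNT = 19 * V_COUNT * T_COUNT  # 11172 precomposed syllables


def decompose_syllable(ch):
    """Hypothetical helper: split one precomposed syllable into L, V and optional T."""
    if len(ch) != 1:
        return ch  # not a single character: return the input unchanged
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed Hangul syllable
    lead, rest = divmod(s_index, V_COUNT * T_COUNT)
    vowel, tail = divmod(rest, T_COUNT)
    result = chr(L_BASE + lead) + chr(V_BASE + vowel)
    if tail:  # tail index 0 means "no final consonant"
        result += chr(T_BASE + tail)
    return result


print(decompose_syllable("\uAC01").encode("raw_unicode_escape"))
```

For U+AC01 the syllable index is 1, so the lead and vowel indices are both 0 and the tail index is 1, which yields U+1100 U+1161 U+11A8 as in the example above.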

Hangul Decomposition String

Returns a Johab Modern Hangul Syllable string for the given Wanseong Modern Hangul Syllable string.

string source: unicode string.

>>> from ksx1026.normalization import decomposeHangulStr
>>> source = "\uAC01\uAC01"
>>> d = decomposeHangulStr(source)
>>> print(d.encode('raw_unicode_escape'))
b'\\u1100\\u1161\\u11a8\\u1100\\u1161\\u11a8'
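As a cross-check that needs only the standard library, Unicode NFD decomposes precomposed Modern Hangul syllables into the same conjoining letters. This sketch uses `unicodedata` rather than this package, and it makes no claim about how the package treats non-Hangul characters, which NFD would also decompose.

```python
import unicodedata

source = "\uAC01\uAC01"
# NFD splits each precomposed Hangul syllable into its L, V, T letters.
decomposed = unicodedata.normalize("NFD", source)
print(decomposed.encode("raw_unicode_escape"))
```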

Hangul Composition

Returns a Wanseong Modern Hangul Syllable Block for the given Johab Modern Hangul Syllable Block. Unlike UAX #15, even when a portion of an Old Hangul Syllable Block is a Modern Hangul Syllable Block, that portion is not transformed to a Wanseong Modern Hangul Syllable Block.

string source: unicode string.

>>> from ksx1026.normalization import composeHangul
>>> source = "\u1100\u1161\u11a8"
>>> d = composeHangul(source)
>>> print(d.encode('raw_unicode_escape'))
b'\\uac01'
>>> source = "\u1100\u1161\u11c3"
>>> d = composeHangul(source)
>>> print(d.encode('raw_unicode_escape'))
b'\\u1100\\u1161\\u11c3'
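The behaviour described above can be approximated by composing a syllable block only when the whole block is modern. The sketch below is an illustration under stated assumptions and not this package's implementation: it treats a syllable block as one or more leading consonants, one or more vowels and zero or more final consonants, and the names `compose_sketch`, `LEADS`, `VOWELS` and `TAILS` are hypothetical.

```python
import re

# Conjoining jamo ranges (modern and old) from the Hangul Jamo and
# Hangul Jamo Extended-A/B blocks.
LEADS = "\u1100-\u115F\uA960-\uA97C"
VOWELS = "\u1160-\u11A7\uD7B0-\uD7C6"
TAILS = "\u11A8-\u11FF\uD7CB-\uD7FB"
SYLLABLE_BLOCK = re.compile(f"[{LEADS}]+[{VOWELS}]+[{TAILS}]*")


def _compose_block(match):
    block = match.group(0)
    if len(block) in (2, 3):
        lead = ord(block[0]) - 0x1100
        vowel = ord(block[1]) - 0x1161
        tail = ord(block[2]) - 0x11A7 if len(block) == 3 else 0
        # A two-letter block has no tail; a three-letter block needs a modern tail.
        tail_ok = tail == 0 if len(block) == 2 else 1 <= tail <= 27
        if 0 <= lead < 19 and 0 <= vowel < 21 and tail_ok:
            return chr(0xAC00 + (lead * 21 + vowel) * 28 + tail)
    return block  # Old Hangul or irregular block: leave it untouched


def compose_sketch(text):
    """Hypothetical helper: compose only fully modern syllable blocks."""
    return SYLLABLE_BLOCK.sub(_compose_block, text)


print(compose_sketch("\u1100\u1161\u11a8").encode("raw_unicode_escape"))
print(compose_sketch("\u1100\u1161\u11c3").encode("raw_unicode_escape"))
```

Matching the whole block first is what keeps U+1100 U+1161 U+11C3 intact: the final consonant U+11C3 is outside the modern range, so the leading U+1100 U+1161 pair is never composed on its own.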

Hangul Recomposition

If one uses a UAX #15 algorithm instead of the above composeHangul function for normalization, an Old Hangul Syllable Block can be decomposed into a Wanseong Modern Hangul Syllable Block and Johab Hangul Letter(s). In such cases, after applying UAX #15 normalization, one can use the following recomposition algorithm to restore a character string in Normalization Form NFC or NFKC to an L V T format.

string source: unicode string.

>>> from ksx1026.normalization import recomposeHangul
>>> source = "\uac00\u11c3"
>>> d = recomposeHangul(source)
>>> print(d.encode('raw_unicode_escape'))
b'\\u1100\\u1161\\u11c3'
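To see where such input comes from, the standard library's NFC can be applied to an Old Hangul block: it composes the leading consonant and vowel and leaves the old final consonant behind. The sketch below then undoes that for one case only, a precomposed syllable directly followed by a final consonant letter. It is an assumption-laden illustration rather than this package's algorithm, and `recompose_sketch` is a hypothetical name.

```python
import re
import unicodedata

# Final consonant letters (modern and old).
TAILS = "\u11A8-\u11FF\uD7CB-\uD7FB"
# A precomposed syllable that is immediately followed by a final consonant letter.
SPLIT_POINT = re.compile(f"[\uAC00-\uD7A3](?=[{TAILS}])")


def recompose_sketch(text):
    """Hypothetical helper: re-expand syllables that NFC composed inside an old block."""
    return SPLIT_POINT.sub(
        lambda m: unicodedata.normalize("NFD", m.group(0)), text
    )


nfc = unicodedata.normalize("NFC", "\u1100\u1161\u11c3")
print(nfc.encode("raw_unicode_escape"))
print(recompose_sketch(nfc).encode("raw_unicode_escape"))
```

Other shapes, such as a precomposed syllable followed by an old vowel letter or preceded by an extra leading consonant, are deliberately not handled here.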

Normalization of Compatibility/Halfwidth Hangul Letters and Hangul-embedded symbols

Normalizes Compatibility/Halfwidth Hangul Letters and Hangul-embedded symbols (NormalizeJamoKDKC).

string source: unicode string.

>>> from ksx1026.normalization import normalizeJamoKDKC
>>> source = "\u3200"
>>> d = normalizeJamoKDKC(source)
>>> print(d.encode('raw_unicode_escape'))
b'(\\u1100\\u1160)'
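For comparison, plain Unicode compatibility normalization maps U+3200 to a parenthesized U+1100 without the jungseong filler U+1160 that appears in the output above. This sketch uses only the standard library's `unicodedata`, not this package.

```python
import unicodedata

source = "\u3200"  # PARENTHESIZED HANGUL KIYEOK
# Compatibility decomposition per UAX #15: "(" + U+1100 + ")".
nfkd = unicodedata.normalize("NFKD", source)
print(nfkd.encode("raw_unicode_escape"))
```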
